# announcements
Hi! Does anyone have an example deployment of a toy repository with celery across different machines/server locations? It is not clear to me how to structure the code and how to configure each part of dagster to make it work in this execution configuration.
hey @Cris. I’m not sure I fully understand the “using different machines/server locations” part of the question? As long as all of the celery workers share the same broker/backend and are watching the same queue, they should be able to pick up work regardless of location / machine
If you have different machine requirements, an option would be to create a separate queue per requirement. lmk if I'm misunderstanding what you're asking
also, are you using the celery executor directly or the celery k8s executor?
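A minimal sketch of the shared-broker point above, assuming the legacy run-config layout for the celery executor (`execution.celery.config.broker` / `backend`); check it against your installed dagster-celery version. The helper name and the URLs are hypothetical. The idea is only that every worker, wherever it runs, is started against the same broker/backend that the run config names:

```python
def celery_run_config(broker_url, backend_url):
    """Build the execution section of a run config.

    Assumption: the legacy schema execution -> celery -> config is what
    your dagster-celery version expects.
    """
    return {
        "execution": {
            "celery": {
                "config": {
                    "broker": broker_url,    # shared by all workers
                    "backend": backend_url,  # shared result backend
                }
            }
        }
    }


# Hypothetical URLs: one Redis host reachable from every machine.
run_config = celery_run_config(
    "redis://broker-host:6379/0",
    "redis://broker-host:6379/1",
)
```

Workers on other machines would also likely need shared (non-local) storage for intermediates, which this sketch leaves out.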
Thanks for the reply! I was referring more to code organization. The docs mention that the celery example needs more configuration regarding code, and I wanted to know what would be a good way to structure the code for the repositories/workers. Perhaps also a related question: should all workers have access to the same code as the dagit instance? Can some workers have separate pieces of code (e.g. 2 repos)?
We want to try with just the celery executor. We don't yet have a dedicated person for devops, and I don't know how much overhead we can handle if we add kubernetes to our stack.
how are you planning on splitting pipeline executions among celery machines / server locations? usually the executor config would be specified on a pipeline-by-pipeline basis via mode definitions
were you thinking of doing it at a lower level, like solid config?
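For the lower-level option, a rough sketch of routing individual solids to queues. It assumes dagster-celery picks the queue from a solid tag named `dagster-celery/queue` and that its default queue is called `dagster`; both are worth verifying for your version. The requirement names and queue names are hypothetical:

```python
# Assumption: tag key that dagster-celery reads to choose a queue for a solid.
QUEUE_TAG = "dagster-celery/queue"

# Hypothetical mapping from machine requirement to queue name.
QUEUES = {
    "default": "dagster",   # assumed default dagster-celery queue
    "gpu": "gpu_queue",
    "high_mem": "high_mem_queue",
}


def queue_tags(requirement):
    """Return the tags dict that would send a solid to the matching queue.

    Unknown requirements fall back to the default queue.
    """
    queue = QUEUES.get(requirement, QUEUES["default"])
    return {QUEUE_TAG: queue}


gpu_tags = queue_tags("gpu")
fallback_tags = queue_tags("something_else")
```

The resulting dict would be passed as `tags=` on the solid definition, and the machines meeting that requirement would run workers that listen only on the matching queue.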
our celery k8s workers don't include the repo code, we've just been including these:
RUN apt-get update -yqq && \
    apt-get install -yqq cron && \
    pip install \
    dagster \
    dagster-graphql \
    dagster-postgres \
    dagster-cron \
    dagster-celery[flower,redis,kubernetes] \
ah but without running on k8s, then yes i think it would need access to the repo code
yup, makes sense — celery w/o k8s sounds good
wait, the workers don't need pipeline code? :0
sorry edited it to “ah but without running on k8s, then yes i think it would need access to the repo code”
But the k8s workers don't need that because you have set up an image with all the code, am I correct?
I see. If I had two or more different repos, should all the code still be in a single image?
sorry for the delay, just checking with some other folks at dagster. so, it should be possible to have multi-repo + dagit + dagster-celery where each repo has its own image that contains only a subset of the pipelines, but i don't think anybody has actually attempted that yet
i think if the question is mainly about code organization, having a workspace with multiple repos (that all share the same image) should work, and we are actively working on ironing out the details for supporting a different image per repo
Hi @Cris! In my experience the rule of thumb is "you don't need k8s unless you already have it" :) It makes sense only for a huge team with, say, a hundred aws nodes in production, when you want to save some money with dynamic load balancing. It usually requires 2-3 dedicated devops people to maintain this monster :) We've currently set up our repo and deployment strategy for dockerized dagster/dagit/celery with ansible on several servers, but it has a big limitation on the dagster-celery workers side: if you deploy your code while a pipeline is running, you'll end up with a pipeline executed half with old code and half with new code. That's frustrating. I'm working on @nate's DinD branch now (so each pipeline could run in a separate versioned docker container), and when I finish we plan to give the dagster team a tour of our infrastructure around dagster. I think you could join if it is interesting for you
Thanks for the reply! That sounds nice, and aligned with what we're doing. I'm currently exploring the celery executor and trying to dockerize our application in order to deploy it more easily with celery. Having separate docker images per pipeline/repo would be ideal, since we deploy many different models in production with quite a heterogeneous set of requirements. It would be nice to isolate the models a little.