# deployment-kubernetes
s
Hello all, I am experimenting with Dagster on Kubernetes, but I have some doubts about the concept of dagster-user-deployments. From what I understand, the deployment is the unit of dependency isolation in Dagster. Is this correct? That would mean that if I want to run pipelines with different or incompatible dependencies, I must organize them in separate deployments. I've also noticed that for each deployment there is a pod running on the k8s cluster. I am wondering how I should organize my pipelines in Dagster, given that I might end up with hundreds of them in the near future and I would like them to be as independent of each other as possible. Does this mean that I need to create a deployment for each pipeline? And would I end up with an always-running pod for each deployment? I tried looking this up in the documentation but could not find a precise answer. If I missed something, I would be grateful to anyone who can point me in the right direction. Thanks!
a
Hi Sebastian 👋 The concept of user-deployments can be a bit confusing at the beginning, but I will try my best to explain. You can think of a user-deployment as “a group of pipelines with the same dependencies”. Every user-deployment can contain multiple pipelines, so you need a new user-deployment only for each such group, not for each pipeline. That said, all the pipelines within a single user-deployment are independent and run in separate pods (no risk of conflicts there). When you start a pipeline run in Dagster, a new pod is created from the same container image as your user-deployment, and your pipeline runs there. Does that help?
s
Thank you @Andrea Giardini for your quick response. What I'm specifically asking about is this pod in particular (highlighted in the screenshot). When the pipelines are executed, I see Kubernetes jobs being created that run the same Docker image as this pod. My concern is the number of these always-running pods as the number of user-deployments increases. I am guessing I will have one pod for each user-deployment. Also, I am still trying to figure out how to organize my Git repository (or repositories). Are there any best practices I should follow?
a
You are right. You will have one user-deployment per group of pipelines. IMO they take minimal resources, and they provide a lot of advantages. You can upgrade or modify every user-deployment independently without affecting the others, and that’s super nice. I wouldn’t get too crazy about this… If you have a large deployment, it is normal to have several user-deployments running. In terms of organizing Git repositories, I would organize them by “group of pipelines with the same dependencies”.
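To make the "group of pipelines with the same dependencies" idea concrete, here is a minimal sketch in plain Python. It does not use any Dagster API; the pipeline names, the pinned requirements, and the `group_by_dependencies` helper are all hypothetical and only illustrate how pipelines sharing an identical dependency set would collapse into one user-deployment, and therefore one always-running pod.

```python
from collections import defaultdict


def group_by_dependencies(pipeline_requirements):
    """Group pipeline names by their exact set of pinned requirements.

    Each resulting group is a candidate user-deployment (one image,
    one always-running pod). Hypothetical helper, not part of Dagster.
    """
    groups = defaultdict(list)
    for name, requirements in pipeline_requirements.items():
        # frozenset makes the dependency set hashable and order-independent
        groups[frozenset(requirements)].append(name)
    # Sort names within each group and the groups themselves for stable output
    return sorted(sorted(names) for names in groups.values())


# Hypothetical pipelines and their pinned requirements (assumed for illustration)
pipelines = {
    "ingest_orders": {"pandas==1.3.5"},
    "ingest_customers": {"pandas==1.3.5"},
    "export_reports": {"pandas==1.3.5"},
    "train_model": {"pandas==1.1.5", "scikit-learn==0.24.2"},
    "score_model": {"pandas==1.1.5", "scikit-learn==0.24.2"},
}

deployments = group_by_dependencies(pipelines)
always_running_pods = len(deployments)

for index, names in enumerate(deployments, start=1):
    print(f"user-deployment {index}: {', '.join(names)}")
print(f"always-running pods: {always_running_pods}")
```

Under these assumptions, five pipelines need only two user-deployments, so the number of always-running pods grows with the number of distinct dependency sets rather than with the number of pipelines.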
s
Thank you Andrea, you have been very helpful!
🙏 1
a
Happy to help!
s
One last question Andrea, is the source code for this Docker Image (https://hub.docker.com/r/dagster/user-code-example) available somewhere? Thanks!
j
The code for the Dagster resources in the image is here: https://github.com/dagster-io/dagster/tree/master/examples/deploy_k8s. There’s some indirection around how we actually build the image, but the Dockerfile is ultimately sourced from here (after the above directory is copied into the build context): https://github.com/dagster-io/dagster/tree/master/python_modules/automation/automation/docker/images/user-code-example
👍 1