Hello all After experimenting with deploying Dagster to ECS dagster #deployment-kubernetes

Jeff Nawrocki

02/15/2023, 4:19 PM

Hello all. After experimenting with deploying Dagster to ECS, I tried deploying to Kubernetes (EKS) and found it easier to get my example working. I have a simple user code deployment based on the cereal example.

I'm still trying to wrap my head around assets vs. ops, but my general understanding is that I would use assets for data tasks and ops for tasks such as sending an email notification. I would also use a job to schedule this cereal asset graph. Does that sound correct?

My other point of confusion is the Kubernetes launchers. In the Dagster user-code-example (which I have also deployed), they build out 3 different launchers. It looks to me like the main difference is that using the k8s launcher will run each step in a pod, whereas if you don't use the launcher, all steps run in the same pod. Is that correct? How important is it to use Celery? I don't really understand when to use it or exactly how it helps.

In the cereal example, I never defined a run launcher and it still runs just fine. If I want to build out asset-only graphs, do I ever really need to dive into these launchers?

Thanks for any help/advice!

jordan

02/15/2023, 4:39 PM

Your choice of launchers and executors will depend on the level of isolation you need for each job and step. If scaling/isolation isn’t a huge concern yet, you’re likely fine sticking with the defaults for a while.

Using the k8s run launcher will launch each job in its own pod. This is how most of our users choose to configure k8s.
https://docs.dagster.io/_apidocs/libraries/dagster-k8s#dagster_k8s.K8sRunLauncher

Using the k8s executor will launch each step within that job in its own pod.
https://docs.dagster.io/_apidocs/libraries/dagster-k8s#dagster_k8s.k8s_job_executor

But the default multiprocess executor (launching each step within that job in its own process in the same pod) is likely sufficient.
https://docs.dagster.io/_apidocs/execution#dagster.multiprocess_executor

It’s not important to use Celery. It's just another option for deployment if your needs justify it. Some users prefer the level of customization it affords them, but most are comfortable with letting Dagster handle the execution.

tl;dr: if you mostly just want to experiment with asset graphs, you can safely stick to the defaults. And if/when you outgrow them, this channel can help you graduate to additional layers of customization.
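To make the difference between the two executors concrete, here is a rough sketch that counts the pods one run would use when the k8s run launcher is in place. The helper `pods_per_run` is hypothetical and is not part of Dagster; it only encodes the rule described above (one pod for the run itself, plus one pod per step if the k8s executor is used).

```python
def pods_per_run(num_steps: int, executor: str) -> int:
    """Hypothetical helper (not a Dagster API): pods used by one run
    when each run is launched in its own pod by the k8s run launcher."""
    run_pod = 1  # the run launcher starts one pod per run
    if executor == "multiprocess_executor":
        # Steps run as separate processes inside the run's pod.
        return run_pod
    if executor == "k8s_job_executor":
        # Each step gets its own pod, in addition to the run's pod.
        return run_pod + num_steps
    raise ValueError(f"unknown executor: {executor}")


print(pods_per_run(3, "multiprocess_executor"))  # 1
print(pods_per_run(3, "k8s_job_executor"))       # 4
```

So for a small asset graph, the default executor keeps everything in one pod per run, and step-level pods only matter once individual steps need their own resources or isolation.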

jordan

02/15/2023, 4:40 PM

Oh, and your understanding for ops/assets/etc. looks broadly accurate to me - #dagster-support can probably help with any specific details that come up around the core Dagster APIs

Jeff Nawrocki

02/15/2023, 5:05 PM

Awesome, thanks @jordan! Really excited about using this.
