Louis Auneau

12/02/2021, 9:25 AM
Hi everyone :dagster-evolution: ! We are currently rolling out Dagster in our company and we are pretty happy with it in “batch” mode. We deployed it on Kubernetes and it handles each batch in a k8s job, which is convenient and scalable. However, we are thinking about using it in more “*event-driven/realtime*” conditions, where we would have:
• sensors running every 1 to 10 seconds to retrieve events from messaging queues and run jobs on each of them,
• external services triggering jobs through the GraphQL API.
Each job is made up of short and light computations packaged in ops. Having a k8s job and pod spawned for each run is not scalable and would weigh heavily on our Kubernetes infrastructure. Also, the overhead of spawning containers is a huge waste of time in this context. We are looking for something more “server”-oriented, with workers ready to receive any computation thrown at them. I was wondering if a Dask or Celery cluster could be the answer? Since each op seems to be executed as a future/task, is the Dagster job graph itself executed in a Kubernetes job because of the K8sRunLauncher? Thank you in advance for your help ☀️
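(Editor's note: for the “external services triggering jobs through the GraphQL API” part, a minimal sketch of the mutation, assuming the Dagster 0.x GraphQL schema's `launchPipelineExecution` field; names should be checked against the schema of your deployed version:)

```graphql
# Sketch only: mutation and variable names assume the Dagster 0.x GraphQL API.
mutation LaunchRun($executionParams: ExecutionParams!) {
  launchPipelineExecution(executionParams: $executionParams) {
    __typename
  }
}
```

An external service would POST this to the Dagit GraphQL endpoint with `executionParams` carrying the pipeline/job selector and run config.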


12/02/2021, 1:59 PM
Hi Louis - this is possible to do in Dagster because of the pluggable nature of the run launcher, but it is not something we have a great out-of-the-box solution for currently. I could absolutely imagine a custom run launcher that uses a pool of standing gRPC servers to launch runs instead of spinning up a new pod for each one (the default run launcher actually works pretty similarly to this, calling out to a gRPC server to launch a run in a subprocess; it just makes some assumptions about the server that wouldn't apply in this case). A user at a community meeting described a setup they built that sounds a lot like what you're describing:
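(Editor's note: the “pool of standing workers” idea can be illustrated framework-agnostically. This is not Dagster's `RunLauncher` API, just a sketch of the architecture; `launch_run` is a hypothetical stand-in for handing a run to a long-lived worker:)

```python
from concurrent.futures import ThreadPoolExecutor

def launch_run(run_id, payload):
    """Hypothetical stand-in for executing one run's op graph."""
    return (run_id, sum(payload))

# A standing pool of workers: created once, reused for every run, so there is
# no per-run process/pod spawn. In the real setup these would be long-lived
# gRPC server processes; threads keep this sketch portable and self-contained.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(launch_run, i, [1, 2, 3]) for i in range(10)]
    results = [f.result() for f in futures]
```

The key property is that the pool outlives any individual run, which is exactly what removes the container-startup overhead described above.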

:next-level-daggy: 1

Louis Auneau

12/02/2021, 2:00 PM
I will have a look, thanks a lot!


12/02/2021, 2:42 PM
I’ll add that the `celery_executor` seems close to what you’re describing. It distributes ops to a pool of standing Celery workers. Note this is distinct from the `celery_k8s_job_executor`, which uses the Celery queue but still does the actual compute in ephemeral Kubernetes jobs. You could consider using the DefaultRunLauncher, which creates run workers as subprocesses in the standing gRPC server; those would then hand out the Celery tasks. We generally caution against heavy use of the DefaultRunLauncher because spinning up a bunch of processes on one box can be an issue, but it may work for you. At least your actual compute isn’t happening on that box; it’s out on the Celery workers.
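(Editor's note: a minimal sketch of what that instance config might look like in `dagster.yaml`, assuming the Dagster 0.x key names; check the deployment docs for your version:)

```yaml
# dagster.yaml (instance config) -- sketch; module/class names per Dagster 0.x.
# Runs start as subprocesses in the standing gRPC server, which then
# dispatches op execution to the Celery workers via the celery_executor
# configured on the job.
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher
```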
:next-level-daggy: 1