Filip Radovic

11/20/2022, 4:02 PM
Hey everyone! Is it possible to run jobs in a Kubernetes cluster entirely inside the user deployment pod, using something like the DefaultRunLauncher? The issue with the default K8sRunLauncher is that it spawns a new pod for every run, which adds a lot of overhead when many small jobs need to be run. I need to run around 100 small jobs each day, produced by partitioning multiple assets, and each job lasts only about 10s, so the overhead of spawning a new pod each time becomes huge. The solution I have in mind is writing a custom launcher that reuses already existing pods for each partitioned asset, but I wanted to know whether there is a simpler solution before I start working on the custom launcher.


11/22/2022, 5:16 PM
Hi Filip, we currently have a DefaultRunLauncher (it’s the default locally, but not when using the Helm chart…) which launches runs into the standing gRPC servers.
You’d be able to specify this in the custom run launcher field of the Helm chart with something like:
      module: dagster
      class: DefaultRunLauncher
      config: {}
We’re also working on ways to make it easy to switch between isolated and non-isolated runs by switching between these two launchers.

Tomas Gatial

01/10/2023, 9:30 AM
Hi @johann, happy to hear about the outlook for RunLauncher switching. I am looking for a way to spawn small jobs at a higher rate alongside the existing k8s job execution architecture. I was just about to start implementing a CustomRunLauncher that combines both K8sRunLauncher and DefaultRunLauncher. Do you think that is a good workaround for now? Is it possible to get an outline of the new functionality?


01/10/2023, 4:03 PM
Hi Tomas, I think the combined run launcher is worth trying for your use case, but it does have some caveats. We actually launched something very similar to that in Dagster Cloud Serverless, where we have more chance to monitor how things are working. We found:
• much lower run start time (you just wait for process init and don’t have to wait for a new container)
• but it’s easy to OOM the standing gRPC server.
As a result, we increased the standing server’s memory from 256 MB to 1 GB. We also introduced a default limit of 1 concurrent run on the standing server, and any other runs are launched in an isolated container.
This was to support arbitrary workloads though. If you know you’ll only be throwing lightweight jobs at it, you can get away with more.
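To make the routing policy described above concrete, here is a minimal sketch in plain Python. It only models the decision (at most N concurrent runs on the standing server, everything else isolated) and is not Dagster code: `HybridRouter`, `route`, `finish`, and the `"standing"` / `"isolated"` labels are hypothetical names made up for illustration. In a real combined launcher, the two outcomes would presumably delegate to DefaultRunLauncher and K8sRunLauncher respectively.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class HybridRouter:
    """Hypothetical routing policy, not part of Dagster.

    Runs go to the standing server until `max_standing_runs` are active
    there; any further runs are sent to an isolated container.
    """

    # Assumption: default of 1 mirrors the limit mentioned in the thread.
    max_standing_runs: int = 1
    _standing: Set[str] = field(default_factory=set)

    def route(self, run_id: str) -> str:
        """Return "standing" or "isolated" for a new run."""
        if len(self._standing) < self.max_standing_runs:
            self._standing.add(run_id)
            return "standing"
        return "isolated"

    def finish(self, run_id: str) -> None:
        """Free the standing-server slot when a run completes (no-op for isolated runs)."""
        self._standing.discard(run_id)


router = HybridRouter()
first = router.route("run-a")   # standing server is free
second = router.route("run-b")  # standing server is busy, so isolate
router.finish("run-a")
third = router.route("run-c")   # slot is free again
```

If you know the jobs are lightweight, raising `max_standing_runs` is the knob that corresponds to "getting away with more", at the cost of a higher risk of running the standing server out of memory.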