# deployment-kubernetes
p
👋 I have many tiny assets that do not require a lot of resources (e.g., a single HTTP request to fetch a small file, parsing and validation, then a save to GCS). These can share a single pod's resources with no problem. I also have a few CPU-intensive assets that I'd like to isolate in separate pods. I believe what I need is a custom implementation of `RunCoordinator` that can launch runs with different `RunLauncher` implementations, potentially chosen based on tags on those jobs/assets, e.g.:
`foo/launcher: k8s`
Is there already support for something like this? I looked at the `RunCoordinator` interface and its different implementations, but I'm not sure I understand how they work; any guidance would be appreciated. Another approach is to use the `K8sExecutor` (which runs one pod per step), but this assumes the use of the `K8sLauncher`, which means every run also gets its own pod. I'd like to avoid that, since it would be overkill for the tiny tasks. Is there another approach I could consider?
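In Dagster's model, the per-tag routing described above would most naturally live in a run launcher rather than a run coordinator: the coordinator decides when a run is launched (immediately or via a queue), while the launcher decides where the run executes. The sketch below shows only the routing logic in plain Python and is not Dagster's API. `Launcher`, `RecordingLauncher`, and `TagRoutingLauncher` are hypothetical names; a real implementation would presumably subclass Dagster's `RunLauncher`, read the tags from the run being launched, and delegate to fully configured inner launchers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

# Tag key taken from the example in the question; purely illustrative.
LAUNCHER_TAG = "foo/launcher"


class Launcher(Protocol):
    """Hypothetical minimal launcher interface (not Dagster's RunLauncher)."""

    def launch(self, run_id: str) -> str: ...


@dataclass
class RecordingLauncher:
    """Stand-in for a real launcher: records the runs it receives."""

    name: str
    launched: List[str] = field(default_factory=list)

    def launch(self, run_id: str) -> str:
        self.launched.append(run_id)
        return self.name


@dataclass
class TagRoutingLauncher:
    """Delegates each run to an inner launcher selected by the run's tags."""

    launchers: Dict[str, Launcher]
    default: str

    def launch(self, run_id: str, tags: Dict[str, str]) -> str:
        # Untagged runs fall back to the default (e.g. a cheap subprocess launcher).
        key = tags.get(LAUNCHER_TAG, self.default)
        if key not in self.launchers:
            raise ValueError(f"unknown launcher {key!r} for run {run_id}")
        return self.launchers[key].launch(run_id)


if __name__ == "__main__":
    subprocess_launcher = RecordingLauncher("subprocess")
    k8s_launcher = RecordingLauncher("k8s")
    router = TagRoutingLauncher(
        {"default": subprocess_launcher, "k8s": k8s_launcher}, default="default"
    )
    print(router.launch("run-1", {}))  # subprocess
    print(router.launch("run-2", {"foo/launcher": "k8s"}))  # k8s
```

Keeping the routing table keyed by tag value means adding another isolation level later is a configuration change rather than new launcher code.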
m
I would be really interested in a way to split up execution of jobs like this!
p
Can you share some details about your use case for something like this?
m
We have a portion of our DAG that's a bunch of small setup steps: pull a spreadsheet from Google Drive, fetch metadata from an API, etc. Many steps can run in parallel, but they're all small, so that first part would be a good fit for running entirely on the same k8s node. After those steps are done, we move into a bunch of more resource-intensive steps that use pandas for data transformation. This section is a big fan-out (1k+ items to process). Running it all on one node would take too long, so we want to split it up, and one step per pod is the best compute model we have. That said, having 5-10 persistent pods work through a queue of these items might be more efficient in terms of k8s resources.
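The "5-10 persistent pods working through a queue" idea is a fixed worker pool, as opposed to one pod per item. A minimal local sketch of that scheduling model follows, using standard-library threads as stand-ins for pods. This is an illustration only: `process_with_worker_pool` is a hypothetical helper, not part of Dagster, and CPU-bound pandas work would need separate processes or pods to actually run in parallel because of the GIL.

```python
import queue
import threading
from typing import Callable, Dict, List


def process_with_worker_pool(
    items: List[int], work: Callable[[int], int], num_workers: int = 5
) -> Dict[int, int]:
    """Drain `items` with a fixed number of long-lived workers."""
    tasks: "queue.Queue[int]" = queue.Queue()
    for item in items:
        tasks.put(item)  # enqueue everything before any worker starts

    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Each worker keeps pulling items until the shared queue is empty,
        # so startup cost is paid num_workers times, not len(items) times.
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return
            value = work(item)
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = process_with_worker_pool(list(range(1000)), lambda x: x * x, num_workers=8)
    print(len(out))  # 1000
```

`concurrent.futures.ThreadPoolExecutor(max_workers=5)` gives the same behavior with less code; the explicit queue just makes the "workers pull items until the queue is empty" pattern visible.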
d
There's a feature request here for supporting multiple run launchers in the same deployment (unfortunately not currently possible, although I could imagine writing a custom run launcher that does this): https://github.com/dagster-io/dagster/issues/11709

Setting the Helm chart to use the `DefaultRunLauncher` would cause it to execute runs as a subprocess within the user code deployment (but, due to the issue above, that would then apply to all your runs). You can change the executor per run, though, which controls where each step is executed; that could let you do certain runs with each asset executing in its own pod using the `k8s_job_executor`.

There's another feature request for varying where each step is executed within a single run: https://github.com/dagster-io/dagster/issues/13266
p
Thanks @Mark Fickett, that sounds like a similar situation to mine. Good to know I’m not alone! @daniel, thanks for those links; I’ll keep an eye on them. The most promising option for me was indeed the k8s executor, but it assumes you’re using the k8s run launcher. I wouldn’t mind having all steps in k8s pods even if only one of them requires it, but that doesn’t seem possible at this time. Do you know if that requirement could be lifted?
d
I agree that restriction is annoying - here's a PR that removes it: https://github.com/dagster-io/dagster/pull/14391
p
Nice! I'll try to kick those tires once it gets merged. Thanks!