# dagster-plus
**Charles:**
Hey 👋 I noticed the experimental support for running multiple agents, and although it's currently not possible to submit runs to specific agents, I'm wondering whether that support is on the roadmap? An example use case: submitting specific ops to on-prem servers (with custom hardware), while everything else goes to a cloud Kubernetes cluster by default.
**Daniel:**
Hi Charles - when you say "submit specific ops", does this mean within a single run, submitting some ops to one place and other ops to another? Or do you mean more like submitting some jobs/runs to different clusters?
**Charles:**
Hey Daniel!
> does this mean within a single run, submitting some ops to one place and other ops to another?
Yes -- op/asset-level granularity would be ideal, but I understand that might be more complex to implement, and I could work around it with job-level granularity. One example where this would be useful: I have a job that runs a training pipeline for an ML model, from feature computation (e.g. k8s), to training (e.g. k8s/EC2 with GPUs), to evaluation (e.g. on-prem with custom inference hardware). I could very well break this out into two different jobs, but it feels like with cloud IO managers, the execution of each asset/op in a job doesn't need to be tied to a single machine or even cluster.
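For illustration, a minimal sketch of that pipeline as a single Dagster job (the op names and return values are hypothetical; where each op actually executes is exactly the open question here):

```python
from dagster import job, op


@op
def compute_features():
    # Feature computation (would ideally run on k8s)
    return {"features": "..."}


@op
def train_model(features):
    # Training (would ideally run on k8s / EC2 with GPUs)
    return {"model": "..."}


@op
def evaluate_model(model):
    # Evaluation (would ideally run on-prem, on custom inference hardware)
    return {"metrics": "..."}


@job
def training_pipeline():
    evaluate_model(train_model(compute_features()))
```

With a cloud IO manager handing outputs between ops, the data dependencies don't require the three steps to share a machine, which is the point above.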
**Daniel:**
What type of agent were you imagining the on-prem server would be running?
(if any)
We've definitely talked about ways to intersperse different ops within a single run to run in different places (some ops running as k8s pods, others in local subprocesses, maybe others as an ECS task, etc. - all communicating together through the event log as part of the same run).
**Charles:**
> What type of agent were you imagining the on-prem server would be running?
Thinking it would be running a Docker agent
> We've definitely talked about ways to intersperse different ops within a single run to run in different places (some ops running as k8s pods, others in local subprocesses, maybe others as an ECS task, etc. - all communicating together through the event log as part of the same run).
That sounds great! Curious if you know whether there's an open issue I could track for the status of that?
**Daniel:**
I thought there was but couldn't find it, so I filed one: https://github.com/dagster-io/dagster/issues/13266
**Charles:**
Thanks Daniel! In the meantime, is it possible to accomplish this at the job level? Say, by having a k8s agent and a Docker agent, and configuring a job to use a `docker_executor`?
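Roughly, the idea would be something like this (a sketch, assuming the dagster-docker integration is installed; the job and op names are made up):

```python
from dagster import job, op
from dagster_docker import docker_executor


@op
def evaluate_on_prem():
    # Work that should run next to the custom hardware
    ...


# Ops in this job execute in Docker containers via docker_executor;
# other jobs in the workspace can keep their default (e.g. k8s) executor.
@job(executor_def=docker_executor)
def on_prem_evaluation_job():
    evaluate_on_prem()
```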
**Daniel:**
That could work, yeah - or you could have the body of the op call out to the other service
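For the second option, a minimal sketch of an op whose body calls out to an external service (the endpoint, payload, and op name here are hypothetical):

```python
import requests

from dagster import op


@op
def evaluate_on_custom_hardware(context, model_uri: str) -> dict:
    # Placeholder URL for an inference service running on the on-prem box;
    # the op itself runs wherever the rest of the job runs.
    response = requests.post(
        "http://onprem-inference.internal/evaluate",
        json={"model_uri": model_uri},
        timeout=600,
    )
    response.raise_for_status()
    metrics = response.json()
    context.log.info(f"Evaluation metrics: {metrics}")
    return metrics
```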
**Charles:**
Perfect, that should be good for now. Really appreciate your help! 🙏