# deployment-kubernetes
o
Hi, is there any way to deploy Dagster such that each solid runs in its own deployment and data is passed between them? For context on why: I'm currently using the K8sRunLauncher and finding that the overhead of launching a pod for each run is bottlenecking the latency of the system. Each run will typically last ~8s under max load, and the minimum time to create any container seems to be about ~10s
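(For reference, the current setup is roughly the stock K8sRunLauncher instance config sketched below -- field names and values here are illustrative and may vary by Dagster version. The relevant point is that each run becomes its own Kubernetes Job/pod, which is where the per-run startup overhead comes from.)

```yaml
# dagster.yaml (instance config) -- rough sketch of the current setup.
# Every run launched through this launcher creates a fresh Kubernetes Job/pod,
# which is the source of the ~10s container-startup overhead described above.
run_launcher:
  module: dagster_k8s
  class: K8sRunLauncher
  config:
    job_namespace: dagster
    service_account_name: dagster
    instance_config_map: dagster-instance
    job_image: my-registry/my-user-code:latest   # illustrative image name
    load_incluster_config: true
```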
d
We don't have this available out of the box currently, but I can imagine a run launcher that launched against a pool of standing deployments. I don't think that would be a huge amount of work to build - it would actually be pretty similar to the DefaultRunLauncher, just running against a different gRPC server. Is it important to you that different solids run in different deployments? Or would you be OK with the isolation being at the pipeline/run level instead (just running against a set of servers that were already running, so you didn't have to worry about startup overhead)?
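For reference, the "set of servers that were already running" piece is something the workspace can already express: dagit and the daemon can be pointed at standing user code gRPC servers by host and port. A rough sketch is below (hostnames, ports, and location names are illustrative); the missing piece being described here would be a run launcher that distributes runs across a pool like this instead of launching a pod per run.

```yaml
# workspace.yaml -- point dagit / the daemon at standing user-code gRPC servers
# instead of launching anything per run. Hostnames and ports are illustrative.
load_from:
  - grpc_server:
      host: user-code-a.dagster.svc.cluster.local
      port: 3030
      location_name: user_code_a
  - grpc_server:
      host: user-code-b.dagster.svc.cluster.local
      port: 3030
      location_name: user_code_b
```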
o
Hmm, yea, solid level isolation isn't important to me.
d
Got it - this might require a custom run launcher, but it's a reasonable feature request to have a standing set of servers that receive requests to launch runs
o
I would still want to be running in k8s, so I think the plan would be to replace the host in the DefaultRunLauncher with the cluster IP for the service of the k8s user code deployment. I guess three questions:
• Is that all that would be necessary?
• Is this how the daemon executes code currently?
• Is it trivial to scale the user code deployment -- i.e. would there be any state in the user code deployment to consider?
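(For the cluster-IP part of that plan, a rough sketch of the Service in front of the user code deployment is below -- names are illustrative, and in practice you'd point the workspace host at the Service's stable DNS name rather than the raw ClusterIP.)

```yaml
# ClusterIP Service fronting the user code gRPC server. Its DNS name
# (user-code.dagster.svc.cluster.local) is what the workspace entry's
# `host` field would reference, rather than a hard-coded cluster IP.
apiVersion: v1
kind: Service
metadata:
  name: user-code
  namespace: dagster
spec:
  type: ClusterIP
  selector:
    app: user-code          # must match the user code deployment's pod labels
  ports:
    - name: grpc
      port: 3030
      targetPort: 3030
```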
d
The DefaultRunLauncher already runs code in the user code deployment - the tricky part would be scaling it up. I think dagit in particular makes some assumptions that there's a single server powering each user code deployment (to do things like detect when the server has changed)
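To make the scaling question concrete: naively it would just mean bumping the replica count on the user code deployment, roughly as sketched below (names, image, and file paths are illustrative). Per the caveat above, dagit assumes a single server per user code deployment, so multiple replicas behind one Service isn't a supported pattern today -- especially mid-rollout, when replicas could briefly be serving different code versions.

```yaml
# Rough sketch: horizontally scaling the standing user code gRPC server.
# Not an officially supported pattern -- dagit assumes one server per
# user code deployment, so replicas > 1 behind a single Service may misbehave.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-code
  namespace: dagster
spec:
  replicas: 3                 # the "scale it up" part of the question
  selector:
    matchLabels:
      app: user-code
  template:
    metadata:
      labels:
        app: user-code
    spec:
      containers:
        - name: user-code
          image: my-registry/my-user-code:latest   # illustrative image
          command: ["dagster", "api", "grpc"]
          args: ["--host", "0.0.0.0", "--port", "3030", "--python-file", "repo.py"]
          ports:
            - containerPort: 3030
```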