What's the best Dagster Cloud execution backend for a batch job with ~10k tasks that each run for a couple of minutes but can almost all run in parallel?
I'm trying to figure out which executor will work best for our pipeline. Ideally it would:
• Have relatively low per-step overhead, since our tasks don't run for very long (locally, Python multiprocessing with the forkserver start method is working well). This may not be achievable once we scale horizontally across many instances.
• Have minimal cost while the pipeline isn't running (we'd have tasks executing maybe 10 out of every 24 hours). I think both EKS and ECS can use Fargate, which allocates pods/instances to match load.
• Scale horizontally as the number of items to process grows. I believe the ECS executor runs one instance per job rather than one per task, which probably rules it out. The EKS executor starts an additional pod per step, I believe, though that's a bit heavyweight for our needs.
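For reference, by "python multiprocess + forkserver" above I mean Dagster's multiprocess executor pointed at the forkserver start method via run config. Roughly this, if I have the schema right (the `max_concurrent` value is just an example):

```yaml
execution:
  config:
    multiprocess:
      # Upper bound on concurrent steps on the single box running the job.
      max_concurrent: 16
      # Use forkserver instead of the default start method to cut
      # per-step process startup cost.
      start_method:
        forkserver: {}
```

This only helps within one machine, though, so it doesn't address the horizontal-scaling requirement on its own.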
A mix of execution models could be nice, though I'm not sure that's practical. For example: run the lightweight setup steps with forkserver, then batch the heavier tasks (say, 1,000 per batch) onto separate dynamically allocated instances, each of which processes its batch in parallel with forkserver.
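To make the batching idea concrete, the fan-out I have in mind is just fixed-size chunking of the task list (illustrative sketch; the batch size and task list are made up):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    # Split the full task list into fixed-size batches.
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Each batch would go to one dynamically allocated instance, which would
# then process its 1,000 tasks in parallel with a local forkserver pool.
tasks = list(range(10_000))
groups = list(batches(tasks, 1_000))
assert len(groups) == 10 and len(groups[0]) == 1_000
```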
I'm not familiar with Dask, and I don't see Dagster Cloud docs about running an agent for Dask, so I'm not sure whether I should be considering it.