Matt Millican
11/28/2022, 10:41 PM
dagster api grpc in a single ECS task we call user_code. We’re using the EcsRunLauncher to launch jobs, so when a run is triggered through Dagit, a sensor, etc., we provision a new ECS task pointing to an ECR image holding our own code repository in order to do the work.
This scales quite well in terms of throughput and resource usage—runs don’t eat each other’s resources since we spin up as many ECS tasks as we like! But it takes about 1-2 minutes for ECS to provision and launch the new task for a run, which is slower than we’d want.
On the opposite extreme, I could imagine simply using the DefaultRunLauncher in the gRPC task so that runs could start near-instantaneously after being triggered. But a single task, even a large one, probably couldn’t handle the volume of runs we’re working with.
Has anyone worked with any sort of in-between? I’m imagining something like an autoscaling ECS service with a bunch of user-code tasks that launch jobs instantly on demand, but how to put this together isn’t obvious to me. A custom run launcher? Or maybe add more user_code gRPC tasks and somehow load-balance traffic to those tasks from dagit / dagster-daemon?
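To make the in-between idea concrete, here is a rough sketch of the routing decision only: send a run to the least-loaded warm task if one has room, and fall back to a slow "cold" launch otherwise. This is plain Python, not Dagster or ECS API; `WarmTask`, `WarmPoolDispatcher`, the task names, and the per-task capacity are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WarmTask:
    """Hypothetical long-lived user-code task that can host several runs."""
    name: str
    capacity: int
    active_runs: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.active_runs) < self.capacity


class WarmPoolDispatcher:
    """Route runs to warm tasks first; record a cold launch when all are full."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.cold_launches = []  # runs that would need a freshly provisioned task

    def dispatch(self, run_id: str) -> str:
        candidates = [t for t in self.tasks if t.has_room()]
        if not candidates:
            # No warm capacity left: this run pays the provisioning delay.
            self.cold_launches.append(run_id)
            return "cold"
        # Pick the warm task with the fewest active runs.
        target = min(candidates, key=lambda t: len(t.active_runs))
        target.active_runs.append(run_id)
        return target.name

    def finish(self, run_id: str) -> None:
        # Free the slot held by a completed run.
        for t in self.tasks:
            if run_id in t.active_runs:
                t.active_runs.remove(run_id)
                return


pool = WarmPoolDispatcher([WarmTask("user-code-a", 2), WarmTask("user-code-b", 2)])
placements = [pool.dispatch(f"run-{i}") for i in range(5)]
print(placements)
```

With two warm tasks of capacity 2, the first four runs start immediately and the fifth falls through to a cold launch, which is the point where an autoscaler would need to add warm tasks.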
Any thoughts welcome 😃
Jeremy
11/28/2022, 11:17 PM
Zach
11/29/2022, 12:07 AM