Matt Millican

11/28/2022, 10:41 PM
Has anyone deployed Dagster to AWS ECS and had success getting runs to start “quickly” at scale (say, within 10-20 sec after the run is requested)? Currently, my team is hosting a gRPC server (dagster api grpc) in a single ECS task. Each run is launched as its own ECS task: when a run is triggered through Dagit, a sensor, etc., we provision a new ECS task pointing to an ECR image holding our own code repository in order to do the work. This scales quite well in terms of throughput and resource usage: runs don’t eat each other’s resources since we spin up as many ECS tasks as we like! But it takes about 1-2 minutes for ECS to provision and launch the new task for a run, which is slower than we’d want. On the opposite extreme, I could imagine launching runs directly inside the gRPC task so that they start near-instantaneously after being triggered. But a single task, even a large one, probably couldn’t handle the volume of runs we’re working with. Has anyone worked with any sort of in-between? I’m imagining something like an autoscaling ECS service with a bunch of user-code tasks that launch jobs instantly on demand, but how to put this together isn’t obvious to me. A custom run launcher? Or maybe add more gRPC tasks and somehow load-balance traffic across them? Any thoughts welcome 😃


11/28/2022, 11:17 PM
have you read through the ECS best practices guide for launching tasks?


11/29/2022, 12:07 AM
whatever happened to the feature the dagster team was working on for supporting non-fargate / EC2 ECS tasks? that should allow for tasks with much faster spin-up (at the cost of having to manage an EC2 fleet, but that could be done with autoscaling)
it's a bit unclear how you link it up to your EC2 ECS cluster though... this page seems to hint at configuring your agent to launch EC2 tasks. I think separately you'd need to create and configure EC2 instances with an ECS agent for the tasks to get launched on
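For anyone trying the EC2 route, here is a rough sketch of what the ECS RunTask request arguments could look like when targeting EC2-backed capacity instead of Fargate. The helper name `build_run_task_kwargs` and the cluster, task definition, and capacity provider names are all hypothetical. The keys are the standard boto3 `ecs.run_task` parameters. How these arguments get handed to Dagster's run launcher is not covered in this thread, so treat this as an illustration of the ECS side only.

```python
from typing import Any, Dict, Optional


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    capacity_provider: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's ecs.run_task targeting EC2 capacity.

    Hypothetical helper for illustration; it is not part of Dagster or boto3.
    """
    kwargs: Dict[str, Any] = {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
    }
    if capacity_provider is not None:
        # Route the task through a capacity provider (for example one backed
        # by an EC2 Auto Scaling group). ECS does not allow launchType and
        # capacityProviderStrategy in the same request.
        kwargs["capacityProviderStrategy"] = [
            {"capacityProvider": capacity_provider, "weight": 1}
        ]
    else:
        # Place the task on container instances already registered
        # with the cluster.
        kwargs["launchType"] = "EC2"
    return kwargs


# Example with made-up names. Task definitions using the awsvpc network mode
# would also need a networkConfiguration entry, omitted here.
example = build_run_task_kwargs("my-cluster", "dagster-run:1", "my-ec2-provider")
# With AWS credentials configured, the request would be sent as:
#   boto3.client("ecs").run_task(**example)
```

The faster start comes from the EC2 instances already being up and registered with the cluster, so this only helps if the fleet keeps enough spare capacity for incoming runs.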