# deployment-ecs
m
Hi everyone, we are using Dagster on ECS (Fargate). The issue I am facing is that the time between "Launching subprocess" and "Executing step" is more than 3 minutes, every time, with every step. I can understand the container taking some time to warm up, but that should only happen the first time. Is this normal, or is there something I am doing wrong?
v
ECS tasks typically take a while to spin up, but even then, 3 minutes seems a little excessive. We typically see a 1-2 minute warm-up, but that's on the AWS side rather than in Dagster. There's a lot happening in the background: the VM needs to pull the Docker image, spin up the container, create the ENI, attach IP addresses, etc.
m
I have 4 assets, each asset runs in its own subprocess, and each one takes almost 3 minutes to start.
v
For reference: https://stackoverflow.com/questions/51618252/how-to-speed-up-deployments-on-aws-fargate The only things that immediately come to mind, without knowing your infrastructure, would be reducing the image size or using zstd-compressed container images. Maybe someone else knows of other best practices they can share.
d
ECS task spin-up time would make sense to me if it were the startup of the whole run task that's slow, but it sounds like this is the time to spin up a subprocess within the task, where that shouldn't matter. Is this a job that you're able to run locally, and do you get the same slow startup times there? Does your Python environment have any very large imports or import-time side effects? Each step process re-imports your job code, which can sometimes be a source of slowness if that takes a long time.
v
Oops, looks like I misread the question 😄
m
@daniel It takes barely 10 seconds locally to run the whole pipeline (Dagster assets and dbt assets), and even locally I use a Docker image and container to test everything.
d
Any chance you’d be able to send us a debug output of a slow run? In the upper right of the timeline page for the run there’s a menu option with “download debug info”
m
d
I took a look at this, and one quick and easy thing you can do, since the ops here don't seem to be running in parallel, is to use the in-process executor, which launches each op in the same process instead of in parallel subprocesses. There are examples in the docs for how to do that: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor Setting this configuration in the Launchpad or in your job config is probably the easiest way:
```yaml
execution:
  config:
    in_process:
```
That won't help with the 3-minute startup time between "Started process for run (pid: 1)." and "Started execution of run for "everything_everywhere_job".", though. Almost the only thing that happens between those two events is Dagster loading your code. You mentioned it runs really fast locally, so the other thing I'd consider is increasing the amount of CPU and/or memory the job task has access to: https://docs.dagster.io/deployment/guides/aws#customizing-cpu-and-memory-in-ecs
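The guide linked above covers setting default task resources on the `EcsRunLauncher` in `dagster.yaml`; a sketch under that assumption (the `1024`/`4096` values are placeholder Fargate CPU units and MiB, not a recommendation):

```yaml
# dagster.yaml (sketch; values are placeholders)
run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher
  config:
    run_resources:
      cpu: "1024"    # Fargate CPU units
      memory: "4096" # MiB
```

The same guide also describes per-job overrides via the `ecs/cpu` and `ecs/memory` job tags, which may be handier if only this one job is underpowered.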
m
Thank you, let me try it and I will let you know.