# deployment-ecs
m
Hi everyone, we are using Dagster on ECS (Fargate). The issue I am facing is that the time between "Launching subprocess" and "Executing step" is more than 3 minutes, every time, with every step. I can understand the container taking some time to warm up, but that should only happen the first time. Is this normal, or is there something I am doing wrong?
v
ECS tasks typically take a while to spin up, but even then, 3 minutes seems a little excessive. We typically see a 1-2 minute warm-up, but that's on the AWS side rather than in Dagster. There's a lot happening in the background: the VM needs to pull the Docker image, spin up the container, create the ENI, attach IP addresses, etc.
m
I have 4 assets, each asset runs in its own subprocess, and each one takes almost 3 minutes to start.
v
For reference: https://stackoverflow.com/questions/51618252/how-to-speed-up-deployments-on-aws-fargate The only things that immediately come to mind, without knowing your infrastructure, would be reducing the image size or using zstd-compressed container images. Maybe someone else knows of other best practices they can share.
d
ECS task spin-up time would make sense to me if it were the startup of the whole run task that's slow, but it sounds like this is the time to spin up a subprocess within the task, where that shouldn't matter. Is this a job that you're able to run locally, and do you get the same slow startup times there? Does your Python environment have any very large imports or import-time side effects? Each step process re-imports your job code, which can sometimes be a source of slowness if that takes a long time.
v
Oops, looks like I misread the question 😄
m
@daniel It takes barely 10 seconds locally to run the whole pipeline (Dagster assets and dbt assets), and even locally I use a Docker image and container to test everything.
d
Any chance you’d be able to send us a debug output of a slow run? In the upper right of the timeline page for the run there’s a menu option with “download debug info”
m
d
I took a look at this, and one quick and easy thing you can do, since the ops here don't seem to be running in parallel, is to use the in-process executor, which launches each op in the same process instead of in parallel subprocesses. There are examples in the docs for how to do that: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor Setting this configuration in the Launchpad or in your job config is probably the easiest way:
```yaml
execution:
  config:
    in_process:
```
That won't help with the 3-minute startup time between "Started process for run (pid: 1)." and "Started execution of run for "everything_everywhere_job".", though. Almost the only thing that happens between those two events is Dagster loading your code. You mentioned it runs really fast locally, so the other thing I'd consider is increasing the amount of CPU and/or memory the job task has access to: https://docs.dagster.io/deployment/guides/aws#customizing-cpu-and-memory-in-ecs
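The guide linked above covers setting default task resources on the `EcsRunLauncher` in `dagster.yaml`; a sketch under that assumption (the `1024`/`4096` values are placeholder Fargate CPU units and MiB, not a recommendation):

```yaml
# dagster.yaml (sketch; values are placeholders)
run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher
  config:
    run_resources:
      cpu: "1024"    # Fargate CPU units
      memory: "4096" # MiB
```

The same guide also describes per-job overrides via the `ecs/cpu` and `ecs/memory` job tags, which may be handier if only this one job is underpowered.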
m
Thank you, let me try it and I will let you know.