On hybrid, we are seeing this: the step worker takes 2 minutes (dagster #dagster-plus)

Abhishek Agrawal

03/21/2023, 5:38 AM

On hybrid, we are seeing this: the step worker takes 2 minutes to actually start. These are ops within a job. Why would it be so slow? Can we make it faster?

Andrea Giardini

03/21/2023, 3:28 PM

how big is the image being pulled?

Abhishek Agrawal

03/21/2023, 3:32 PM

~500MB

Andrea Giardini

03/21/2023, 3:33 PM

what is happening in the k8s cluster in that interval?

Abhishek Agrawal

03/21/2023, 3:52 PM

Thanks for the hint. What I see is that during those 2 minutes, the code location is being loaded again, which is quite an intense process on our side: we make an API call and loop through the response to generate assets and sensors. While that is happening, the STEP_WORKER just waits, and it only starts once the code location has finished loading.

Abhishek Agrawal

03/21/2023, 3:52 PM

It would be great to have control over how frequently Dagster reloads the definitions from the code location file.

Andrea Giardini

03/21/2023, 3:53 PM

Mh? Why is the code location reloaded? It shouldn’t be reloaded for every run.

Abhishek Agrawal

03/21/2023, 3:56 PM

Is this the reason? When a job is starting, I see this log in the Dagit UI: `Executing steps using multiprocess executor`. Does it mean that each op in my job needs its code to be loaded before it executes? I see this behaviour at the start of each op in my job.

Abhishek Agrawal

03/21/2023, 4:02 PM

Can I change it to `in_process`?

prha

03/21/2023, 4:24 PM

Switching to the in-process executor should eliminate the code location reload, at the expense of losing process isolation and parallelism between steps.
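A minimal sketch of how that switch could be expressed, assuming the job relies on Dagster's default executor (`multi_or_in_process_executor`) and does not set its own `executor_def`. The dict is run config selecting that executor's `in_process` branch; the same structure could be entered as YAML in the Launchpad or passed as `run_config` on a sensor's `RunRequest`. The commented alternative pins `in_process_executor` on the job itself, which is what Abhishek uses later in the thread. The job name `my_serial_job` is hypothetical.

```python
# Run config selecting the in-process branch of Dagster's default executor
# (multi_or_in_process_executor). Assumes the job has not overridden
# executor_def; otherwise this "execution" block would not apply.
IN_PROCESS_RUN_CONFIG = {
    "execution": {
        "config": {
            "in_process": {},
        },
    },
}

# Alternative (requires the dagster package): pin the executor on the job.
# `my_serial_job` is a hypothetical name.
#
#   from dagster import in_process_executor, job
#
#   @job(executor_def=in_process_executor)
#   def my_serial_job():
#       ...
```

The run-config route keeps the job definition unchanged and lets individual runs opt in; pinning `executor_def` makes every run of that job serial.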

prha

03/21/2023, 4:28 PM

If you are still using `repository` definitions, you might have some luck deferring some of this cost by using the lazy-loaded constructor. See `lazy_loaded_repository` in https://docs.dagster.io/_apidocs/repositories#dagster.RepositoryDefinition. But that probably depends on what you’re actually querying.
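The `lazy_loaded_repository` example in those docs, as far as I recall, returns a dictionary mapping names to zero-argument functions instead of already-built definitions, so construction is deferred until a definition is actually requested. The sketch below is plain Python, not Dagster's implementation; `LazyDefinitions`, `make_expensive_job` and `make_cheap_job` are hypothetical names used only to illustrate the defer-and-cache idea.

```python
from typing import Callable, Dict, List


class LazyDefinitions:
    """Hypothetical registry illustrating lazy loading: it stores zero-argument
    factory functions and builds each definition only when first requested."""

    def __init__(self, factories: Dict[str, Callable[[], object]]) -> None:
        self._factories = dict(factories)
        self._built: Dict[str, object] = {}

    def names(self) -> List[str]:
        # Listing what exists is cheap: no factory is called here.
        return sorted(self._factories)

    def get(self, name: str) -> object:
        # Build on first access, then serve the cached object afterwards.
        if name not in self._built:
            self._built[name] = self._factories[name]()
        return self._built[name]


build_log: List[str] = []


def make_expensive_job() -> dict:
    # Stand-in for costly construction, e.g. an API call plus a loop over
    # its response to generate many definitions.
    build_log.append("expensive_job")
    return {"name": "expensive_job", "ops": [f"return_n_{i}" for i in range(10_000)]}


def make_cheap_job() -> dict:
    build_log.append("cheap_job")
    return {"name": "cheap_job", "ops": ["say_hello"]}


defs = LazyDefinitions({"expensive_job": make_expensive_job, "cheap_job": make_cheap_job})

if __name__ == "__main__":
    print(defs.names())                  # ['cheap_job', 'expensive_job']
    print(defs.get("cheap_job")["ops"])  # ['say_hello']
    defs.get("cheap_job")                # cached: factory is not called again
    print(build_log)                     # ['cheap_job']
```

As prha notes, this only helps if the expensive work can be pushed into per-definition factories; a single API call that every definition depends on would still run as soon as any of them is requested.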

Abhishek Agrawal

03/21/2023, 4:33 PM

When you said that I might lose code isolation, this is within a job, right? Each job run gets its own k8s pod, as we see in the Dagit UI, so if I have 5 sensors yielding 5 different run requests, k8s will parallelise them on separate pods, but within each pod the steps would run in the same process if I use `in_process`. Did I get it right? @prha

prha

03/21/2023, 4:33 PM

Yes, that’s right. I meant process isolation specifically for steps.

Abhishek Agrawal

03/21/2023, 4:34 PM

I think this wouldn't affect my use case, then. I want the steps to run in series anyway. Would there be any downside, in your opinion?

Andrea Giardini

03/21/2023, 5:17 PM

@prha For my own understanding… why does the multiprocess executor need a location reload and the in_process executor does not? 🤔

prha

03/21/2023, 5:33 PM

I think it’s not a code location reload so much as that there’s just a lot happening at import time, which would happen on a per-process basis.
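prha's point can be reproduced with a small pure-Python simulation (not Dagster code; `heavy_defs` and `IMPORT_LOG` are hypothetical names). A module whose top level does expensive work, like a code location file that calls an API to build assets and sensors, pays that cost once per interpreter: importing it in three separate processes runs it three times, while three imports inside one process run it once, because Python caches loaded modules in `sys.modules`. This roughly mirrors the multiprocess executor, which runs each step in its own child process, versus the in-process executor.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for a heavy code-location file. The module-level code
# runs on import and appends one character to a log file, standing in for an
# expensive API call that builds assets and sensors.
HEAVY_MODULE = textwrap.dedent("""
    import os
    with open(os.environ["IMPORT_LOG"], "a") as log:
        log.write("x")
""")


def run_python(snippet, workdir, log_path):
    """Run a snippet in a brand-new interpreter that can import heavy_defs."""
    env = dict(os.environ, IMPORT_LOG=log_path, PYTHONPATH=workdir)
    subprocess.run([sys.executable, "-c", snippet], env=env, check=True)


def count_module_loads(n_steps=3):
    """Return (loads with one process per step, loads with one shared process)."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "heavy_defs.py"), "w") as f:
            f.write(HEAVY_MODULE)
        multi_log = os.path.join(workdir, "multi.log")
        single_log = os.path.join(workdir, "single.log")

        # Multiprocess-style: every step gets its own interpreter, so the
        # module-level work is repeated once per step.
        for _ in range(n_steps):
            run_python("import heavy_defs", workdir, multi_log)

        # In-process-style: all steps share one interpreter; after the first
        # import, Python returns the cached module from sys.modules.
        run_python(
            f"import importlib\nfor _ in range({n_steps}): importlib.import_module('heavy_defs')",
            workdir,
            single_log,
        )

        with open(multi_log) as f:
            multi = len(f.read())
        with open(single_log) as f:
            single = len(f.read())
        return multi, single


if __name__ == "__main__":
    print(count_module_loads())  # (3, 1)
```

The run worker is presumably a fresh process as well, so even with the in-process executor the code location file would still be loaded once per run.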

Abhishek Agrawal

03/22/2023, 12:48 AM

Hey @prha, when the job is launched, it does the same thing between the `ENGINE_EVENT` and the `RUN_START` events: it runs through the code location file I have provided, which is heavy in our case. Can we do something about this too? I am using `in_process_executor`, so the remaining operations are blazingly fast. @Andrea Giardini