# dagster-plus
a
On Hybrid, we are seeing this: the step worker takes 2 minutes to actually start. These are ops within a job. Why would it be so slow? Can we make it faster?
a
how big is the image being pulled?
a
~500MB
a
what is happening in the k8s cluster in that interval?
a
Thanks for the hint. What I see is that during those 2 minutes, the code location is being loaded again, which is quite an intensive process on our side: we make an API call and loop through the response to generate assets and sensors. While that is happening, the STEP_WORKER just waits, and it only actually starts once the code location has been reloaded.
It would be great to have control over how frequently Dagster reloads the definitions from the code location file.
a
Mh? Why is the code location reloaded? It shouldn’t be reloaded for every run.
a
Is this the reason? When a job is starting, I see this log in the Dagit UI: `Executing steps using multiprocess executor`. Does it mean that each op in my job needs its own code to be loaded before it executes? I see this behaviour at the start of each op in my job.
Can I change it to `in_process`?
p
Switching to the in-process executor should eliminate the code location reload, at the expense of losing process isolation and parallelism.
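For reference, a minimal sketch of switching a single run to in-process execution through run config, assuming the job still uses Dagster's default executor, which in recent Dagster versions lets run config choose between `multiprocess` and `in_process` under `execution.config`. The code-level alternative is to set `executor_def=in_process_executor` on the job itself. Check the exact config shape against the Executors documentation for the Dagster version in use.

```yaml
# Run config (e.g. in the Launchpad): select the in-process option of the
# default executor so all steps run sequentially inside the run worker's process.
execution:
  config:
    in_process: {}
```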
If you are still using `repository` definitions, you might have some luck deferring some of this cost by using the lazy-loaded constructor. See `lazy_loaded_repository` in https://docs.dagster.io/_apidocs/repositories#dagster.RepositoryDefinition. But that probably depends on what you’re actually querying.
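The idea behind lazy loading can be illustrated without Dagster: keep a zero-argument factory per definition and only call it the first time that definition is requested. The `LazyRegistry` class below is a hypothetical, generic sketch, not Dagster's API; it only shows why deferring construction avoids paying for definitions a given process never touches.

```python
from typing import Callable, Dict, List


class LazyRegistry:
    """Hypothetical registry that builds each definition on first request."""

    def __init__(self, factories: Dict[str, Callable[[], object]]):
        self._factories = factories          # name -> zero-argument builder
        self._built: Dict[str, object] = {}  # cache of already-built definitions

    def names(self) -> List[str]:
        # Listing names is cheap: no factory is called here.
        return sorted(self._factories)

    def get(self, name: str) -> object:
        # Only the requested definition is built, and only once per registry.
        if name not in self._built:
            self._built[name] = self._factories[name]()
        return self._built[name]


if __name__ == "__main__":
    build_log: List[str] = []

    def expensive_job() -> str:
        build_log.append("expensive_job")  # stands in for a slow API call
        return "expensive_job definition"

    registry = LazyRegistry(
        {"expensive_job": expensive_job, "cheap_job": lambda: "cheap_job definition"}
    )
    print(registry.names())  # ['cheap_job', 'expensive_job']
    print(build_log)         # []
    registry.get("expensive_job")
    registry.get("expensive_job")
    print(build_log)         # ['expensive_job']
```

Whether this helps in practice depends, as noted above, on whether the expensive work can be tied to individual definitions rather than done once at module import.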
a
When you said that I might lose code isolation, this is within a job, right? Each job run gets its own k8s pod, as we see in the Dagit UI. So if I have 5 sensors yielding 5 different run requests, k8s will parallelise them on separate pods, but within each pod the steps would run in the same process if I use `in_process`. Did I get it right? @prha
p
Yes, that’s right. I meant specifically process isolation for steps.
a
I think this wouldn't affect my use case, then. I want them to run in series. Would there be any downside, in your opinion?
a
@prha For my own understanding… why does the multiprocess executor need a location reload and the in_process executor does not? 🤔
p
I think it’s not a code location reload so much as that there’s just a lot happening at import time, which would happen on a per-process basis.
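The per-process import cost can be reproduced with the standard library alone. In the sketch below, a throwaway module named `heavy_location` (hypothetical, standing in for an expensive code location file) records each time its top-level code runs. Importing it repeatedly in one interpreter runs that code once, because Python caches modules in `sys.modules`; starting a fresh interpreter per "step" runs it every time. This models the behaviour described in the thread (each step process re-imports the code location); it is not Dagster's executor.

```python
import importlib
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for an expensive code location file: its module-level
# code appends one line to a log file each time it is actually executed.
MODULE_SOURCE = """
import os
with open(os.environ["IMPORT_LOG"], "a") as f:
    f.write("imported\\n")
"""


def count_lines(path: Path) -> int:
    return len(path.read_text().splitlines()) if path.exists() else 0


def demo(steps: int = 3):
    """Return (executions when all steps share one process,
    executions when each step gets a fresh process)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        (tmp_dir / "heavy_location.py").write_text(MODULE_SOURCE)

        # In-process: every "step" imports the module in the same interpreter.
        shared_log = tmp_dir / "shared.log"
        old_log = os.environ.get("IMPORT_LOG")
        os.environ["IMPORT_LOG"] = str(shared_log)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            for _ in range(steps):
                importlib.import_module("heavy_location")  # cached after the first call
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("heavy_location", None)
            if old_log is None:
                os.environ.pop("IMPORT_LOG", None)
            else:
                os.environ["IMPORT_LOG"] = old_log

        # Multiprocess-style: every "step" starts a new interpreter and re-imports.
        fresh_log = tmp_dir / "fresh.log"
        env = {**os.environ, "IMPORT_LOG": str(fresh_log), "PYTHONPATH": tmp}
        for _ in range(steps):
            subprocess.run(
                [sys.executable, "-c", "import heavy_location"], env=env, check=True
            )

        return count_lines(shared_log), count_lines(fresh_log)


if __name__ == "__main__":
    print(demo())  # (1, 3)
```

With three steps, the module-level work runs once in the shared process but three times across fresh processes, which is the cost that disappears when all steps run in one process.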
a
Hey @prha, when the job is launched, it does the same thing between the `ENGINE_EVENT` and the `RUN_START` events: it runs through the code location file I have provided, which is heavy in our case. Can we do something about this too? I am using `in_process_executor`, so the remaining operations are blazingly fast. @Andrea Giardini