# ask-community
j
Hi, I basically just used a job to execute an entire legacy pipeline, which ran fine in less than 10 minutes. Now I refactored my job to use `op`s for each sub-function to get proper logging in Dagster. However, now it takes several minutes before each function is executed, and I need some ideas on how to solve this 😄 Between each `Launching subprocess` and `Executing step "X" in subprocess.` it takes 5 minutes, and I have no visibility into what is happening here -.- Starting a subprocess can't take that long, can it? I tailed all the logs while the job was running and didn't see anything related.
a
> Starting a subprocess can’t take that long, or?
This time also includes importing dependencies and creating the `Definitions` / `repository`. To go back to the `pipeline` default of in-process execution, you can set run config in the `execution:` section to select `in_process`, or bind the `in_process_executor` directly to your `job` definition.
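For reference, a minimal sketch of both options (assuming a recent Dagster 1.x API; the op and job names here are hypothetical):

```python
from dagster import in_process_executor, job, op

@op
def my_step():
    ...

# Option 1: bind the in-process executor directly to the job definition.
@job(executor_def=in_process_executor)
def my_job():
    my_step()

# Option 2: keep the default executor and select in-process execution
# per run via run config, e.g. in the Launchpad:
#
#   execution:
#     config:
#       in_process: {}
```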
j
Thanks Alex, I'll give it a try!
5 min overhead per operation still seems excessive, though :D
a
> 5 min overhead per operation
Yea, this is not an expected amount unless you have some very heavy lifting happening at process init / import time. If you are not sure what to attribute the cost to, you can use a profiler like https://github.com/benfred/py-spy, or specifically profile import time using https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPROFILEIMPORTTIME
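As a quick illustration (a minimal sketch; `dagster` here is just an example target, substitute whichever module you suspect is slow):

```python
# Quick-and-dirty: time one suspect import directly.
import time

start = time.perf_counter()
import dagster  # replace with the module you want to measure
elapsed = time.perf_counter() - start
print(f"import took {elapsed:.2f}s")

# More thorough: let the interpreter break down every import, e.g.
#   python -X importtime -c "import your_code_location"
# or set PYTHONPROFILEIMPORTTIME=1 before launching. The report goes
# to stderr; sort by the cumulative column to find the heavy modules.
```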