# ask-community
j
Hi, I basically just used a job to execute an entire legacy pipeline, which ran fine in less than 10 minutes. Now I refactored my job to use `op`s for each sub-function to get proper logging in Dagster. However, now it takes several minutes before each function is executed, and I need some ideas on how to solve this 😄 Between each `Launching subprocess` and `Executing step "X" in subprocess.` it takes 5 minutes, and I have no visibility into what is happening here -.- Starting a subprocess can't take that long, can it? I tailed all the logs while the job was running and didn't see anything related.
a
> Starting a subprocess can’t take that long, or?
This time also includes importing dependencies and creating the `Definitions` / `repository`. To go back to the `pipeline` default of in-process execution, you can set run config in the `execution:` section to select `in_process`, or bind the `in_process_executor` directly to your `job` definition.
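For reference, a minimal sketch of both options (assuming a recent Dagster 1.x API; the op and job names here are hypothetical):

```python
from dagster import in_process_executor, job, op

@op
def my_step():
    ...

# Option 1: bind the in-process executor directly to the job definition.
@job(executor_def=in_process_executor)
def my_job():
    my_step()

# Option 2: keep the default executor and select in-process execution
# per run via run config, e.g. in the Launchpad:
#
#   execution:
#     config:
#       in_process: {}
```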
j
Thanks Alex, I'll give it a try!
5 min overhead per operation still seems excessive, though :D
a
> 5 min overhead per operation
Yea, this is not an expected amount unless you have some very heavy lifting happening at process init / import time. If you are not sure what to attribute the cost to, you can use a profiler like https://github.com/benfred/py-spy, or specifically profile import time using https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPROFILEIMPORTTIME
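As a quick illustration (a minimal sketch; `dagster` here is just an example target, substitute whichever module you suspect is slow):

```python
# Quick-and-dirty: time one suspect import directly.
import time

start = time.perf_counter()
import dagster  # replace with the module you want to measure
elapsed = time.perf_counter() - start
print(f"import took {elapsed:.2f}s")

# More thorough: let the interpreter break down every import, e.g.
#   python -X importtime -c "import your_code_location"
# or set PYTHONPROFILEIMPORTTIME=1 before launching. The report goes
# to stderr; sort by the cumulative column to find the heavy modules.
```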