Chris Nogradi

02/03/2022, 5:38 PM
Sorry for the newbie question: is it expected that Dagit is significantly slower than running from Python all in one process? There seem to be long delays between each op's execution, and I see execute_windows_tail timeouts in the log (Timed out waiting for process to start). The trivial pipeline with 4 ops I am using takes 5 minutes to run in Dagit and 30 seconds in straight Python. Is this operator error?

owen

02/03/2022, 5:41 PM
Just a quick question: when you execute the job in Python, are you executing it in a single process (i.e. with my_job.execute_in_process())? That could account for at least some of the difference, since by default Dagit executes each op in a separate process. Still, I wouldn't expect spinning up the new processes to take that long.
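To get a feel for why one process per op costs more than execute_in_process(), here is a rough stdlib-only sketch. It is not Dagster's executor: it just runs four trivial "ops" either in a loop inside one process or each in a freshly started Python interpreter. The op body and the count of 4 are hypothetical stand-ins for the pipeline above, and real Dagster step processes also have to import Dagster and your code, so their startup cost is likely higher than what this measures.

```python
import subprocess
import sys
import time
from typing import Tuple


def trivial_op(x: int) -> int:
    """Hypothetical stand-in for a cheap op body."""
    return x + 1


def run_in_single_process(n_ops: int) -> Tuple[int, float]:
    """Chain n_ops calls in the current process; return (result, seconds)."""
    start = time.perf_counter()
    value = 0
    for _ in range(n_ops):
        value = trivial_op(value)
    return value, time.perf_counter() - start


def run_process_per_op(n_ops: int) -> Tuple[int, float]:
    """Run each 'op' in a brand-new Python interpreter; return (result, seconds)."""
    start = time.perf_counter()
    value = 0
    for _ in range(n_ops):
        # Same computation as trivial_op, but each step pays for a full interpreter start-up.
        completed = subprocess.run(
            [sys.executable, "-c", f"print({value} + 1)"],
            capture_output=True,
            text=True,
            check=True,
        )
        value = int(completed.stdout.strip())
    return value, time.perf_counter() - start


if __name__ == "__main__":
    for label, runner in [
        ("single process", run_in_single_process),
        ("process per op", run_process_per_op),
    ]:
        result, elapsed = runner(4)
        print(f"{label}: result={result}, elapsed={elapsed:.3f}s")
```

Both variants compute the same result; the difference in elapsed time is roughly the interpreter start-up cost multiplied by the number of ops, which is the kind of overhead the multiprocess executor adds on top of in-process execution.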

Chris Nogradi

02/03/2022, 5:47 PM
Yes, I use my_job.execute_in_process().

owen

02/03/2022, 5:53 PM
Gotcha, that explains the difference; it seems like it's not an issue specific to Dagit. If you don't care about having each op run in a separate process (the main benefit here is step-level parallelism), you can always swap the executor back to the in-process one, either by setting the executor_def argument on your job or by adding a blob to your run config: https://docs.dagster.io/_apidocs/execution#dagster.in_process_executor
I'm still surprised that it takes that long to start a process, though (there should be some overhead, but not more than a few seconds).
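One way to check how much of a process's startup is spent just importing code is to time a fresh interpreter importing the module that defines your job. This is a stdlib-only sketch under that assumption: the module name is a placeholder (a hypothetical my_pipeline in the usage line below), and it measures only interpreter start-up plus import, not anything Dagster does on top of that. For a per-module breakdown, Python's own `-X importtime` flag prints import timings to stderr.

```python
import subprocess
import sys
import time
from typing import List


def cold_import_seconds(module: str, repeats: int = 3) -> List[float]:
    """Time `python -c "import <module>"` in fresh interpreters, `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new interpreter each time, so nothing is already loaded in sys.modules.
        # check=True raises CalledProcessError if the import fails.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # `sys` is loaded at interpreter start-up, so this approximates a bare interpreter.
    baseline = min(cold_import_seconds("sys"))
    # Pass the module that defines your job as the first argument; defaults to json.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    cost = min(cold_import_seconds(target))
    print(f"bare interpreter: {baseline:.3f}s, import {target}: {cost:.3f}s")
```

Run it as e.g. `python cold_import.py my_pipeline`. If the import alone takes several seconds, each separately launched op process would likely pay at least that much before doing any work.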

Chris Nogradi

02/03/2022, 6:10 PM
Thank you @owen. I had to use this:
execution:
  config:
    in_process:
rather than this, as given in the docs:
execution:
  in_process:
And now it uses one process in Dagit, but it still takes much longer than without Dagit. Total time in Dagit was 2.5 minutes (half the multiprocess time, but still 4x more than the command line). I suspect the timeouts on the console are the issue. I'll try to investigate more...