# ask-community

Qumber Ali

11/19/2021, 1:46 PM
Hi all, I'm facing an issue: when I run a Dagster job, lots of
/usr/bin/python3 -c from multiprocessing.spawn
and
multiprocessing.semaphore_tracker
processes start running and the system gets overloaded. With only 8 concurrent jobs, my 16-core / 64 GB RAM system overloads, even though I'm not importing the
multiprocessing
library in my code. Please help me with this.

daniel

11/19/2021, 2:51 PM
By default Dagster runs each op in its own subprocess, which is what uses multiprocessing. You can turn off multiprocessing by changing your job to use the `in_process_executor` instead of the default `multiprocess_executor`:

```python
from dagster import job, in_process_executor

@job(executor_def=in_process_executor)
def my_job():
    ...
```

(but then your ops will execute serially within each job instead of in parallel)

Qumber Ali

11/19/2021, 3:02 PM
OK, but why do only 8 concurrent jobs create so many multiprocessing processes?
How can we optimize our jobs to be more CPU-efficient?

daniel

11/19/2021, 3:04 PM
How many `multiprocessing.semaphore_tracker` processes are there?

Qumber Ali

11/19/2021, 3:05 PM
59

daniel

11/19/2021, 3:06 PM
Are there more `semaphore_tracker` processes than regular Dagster subprocesses? The default behavior would be one subprocess per op.

Qumber Ali

11/19/2021, 3:06 PM
And 70
multiprocessing.spawn
processes.

daniel

11/19/2021, 3:07 PM
I think there's probably about one `semaphore_tracker` process per subprocess; I believe that's something spun up by the `multiprocessing` library itself.
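Daniel's point can be seen with plain stdlib `multiprocessing` (a minimal sketch, not Dagster code): spawning worker processes also starts a helper tracker process alongside them, which is why many Dagster subprocesses can mean many tracker processes.

```python
import multiprocessing as mp

# Minimal stdlib illustration (not Dagster code): using multiprocessing
# with the "spawn" start method (the one visible in the ps output above)
# starts worker processes plus a helper tracker process that cleans up
# leaked semaphores. It is named semaphore_tracker on Python <= 3.7 and
# resource_tracker on Python >= 3.8; roughly one accompanies each
# independently spawned Python program.

def square(x):
    return x * x

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # → [1, 4, 9]
```

While this script runs, `ps aux | grep tracker` would show the helper process the library spun up; no user code imports it directly, matching what Qumber observed.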

Qumber Ali

11/19/2021, 3:08 PM
Yeah, maybe.
Please let me know what to do.

daniel

11/19/2021, 3:08 PM
I'm not totally sure exactly what you should do, but I'm happy to answer questions about how Dagster works to help you understand the problem.
If you don't need to run your ops in parallel, using the `in_process_executor` is one option
(your jobs would still run in parallel).

Qumber Ali

11/19/2021, 3:10 PM
I just have Dagster on my server, nothing else, and I've been facing multiple issues since last week; please help.
The main thing is that if I stop all Dagster jobs, all the
multiprocessing.spawn
and
multiprocessing.semaphore_tracker
processes close too.

daniel

11/19/2021, 3:16 PM
Is using the `in_process_executor` not an option for you? That's the first thing I would try, to see if it improves your performance.
The other thing you could try is to keep using the default `multiprocess_executor` but change the `max_concurrent` parameter here: https://docs.dagster.io/_apidocs/execution#dagster.multiprocess_executor

Qumber Ali

11/19/2021, 5:02 PM
Can you please tell me how I can set this configuration?

```yaml
execution:
  multiprocess:
    config:
      max_concurrent: 4
```

@daniel?

daniel

11/19/2021, 5:09 PM
That's run configuration; you set that in the Launchpad in Dagit: https://docs.dagster.io/concepts/configuration/config-schema
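If retyping the config into the Launchpad each launch is tedious, the same snippet can also be saved to a file and passed on the command line (the file name here is hypothetical, and the exact key nesting may vary by Dagster version):

```yaml
# run_config.yaml — cap the multiprocess executor at 4 concurrent ops
execution:
  multiprocess:
    config:
      max_concurrent: 4
```

Something like `dagster job execute -f my_jobs.py -c run_config.yaml` would then apply it, per the Dagster CLI's `-c`/`--config` option.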

Qumber Ali

11/19/2021, 7:32 PM
Thanks @daniel, the
in_process_executor
worked for me.

daniel

11/19/2021, 7:33 PM
Nice, glad it worked out. We may want to do more on our side to make `multiprocess_executor` less likely to cause problems when using the default config.

Qumber Ali

11/19/2021, 7:33 PM
Can you please tell me one more thing: which is more efficient, logs in files (`~/dagster_home`) or logs in Postgres?

daniel

11/19/2021, 8:18 PM
Postgres is generally more performant
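For reference, switching storage to Postgres is configured in the `dagster.yaml` in your `$DAGSTER_HOME`; a sketch following the Dagster deployment docs (the connection URL is a placeholder):

```yaml
# dagster.yaml — store runs and event logs in Postgres instead of local files
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: "postgresql://user:password@localhost:5432/dagster"

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: "postgresql://user:password@localhost:5432/dagster"
```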

Qumber Ali

11/21/2021, 2:51 PM
Ok, thanks.