Guillaume Onfroy
03/17/2023, 2:31 PM
MultithreadedExecutor which would allow executing steps in parallel within the same process. Currently, when running on Kubernetes using the MultiprocessExecutor, there's a massive cold start delay for each subprocess being started because, from what I understand, it has to reload the entire project and assets, which can lead to extremely long jobs even when the tasks are very lightweight.
Chris Histe
03/17/2023, 2:54 PM
Guillaume Onfroy
03/17/2023, 2:59 PM
alex
03/17/2023, 3:36 PM
forkserver start method
https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor
Guillaume Onfroy
03/17/2023, 3:45 PM
Danny Steffy
03/17/2023, 4:35 PM
preload_modules in the config if the default doesn't load the necessary modules as expected: https://github.com/dagster-io/dagster/discussions/7338
Is that in the forkserver start method config? How can we determine which modules were loaded correctly via forkserver and which we need to list explicitly?
Guillaume Onfroy
03/17/2023, 4:54 PM
execution:
  config:
    multiprocess:
      start_method:
        forkserver:
          preload_modules:
            - dagster
            - dagster_dbt
            - dagster_k8s
            - dagster_postgres
            - ...
I see no difference...
Guillaume Onfroy
03/17/2023, 5:27 PM
alex
03/17/2023, 5:48 PM
Harrison Conlin
03/18/2023, 1:47 AM
alex
03/20/2023, 2:53 PM
"just passed individually to asyncio.run"
correct
Abhishek Agrawal
03/23/2023, 4:23 AM
alex
03/23/2023, 3:00 PM
Abhishek Agrawal
03/23/2023, 8:56 PM
Chris Histe
03/23/2023, 9:03 PM
execution:
  config:
    multiprocess:
      max_concurrent: 15
      start_method:
        forkserver: null
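For intuition about what the forkserver start method and preload_modules do, here is a minimal sketch that uses only Python's standard multiprocessing module, with no Dagster involved. It assumes the Dagster option corresponds to the standard library's forkserver start method, which the thread does not confirm. The function names modules_loaded and check_preload are hypothetical helpers for illustration. The idea: modules imported once in the fork server are inherited by every child forked from it, so a child can report which modules are already present in sys.modules.

```python
import multiprocessing as mp
import sys


def modules_loaded(names):
    """Report which of `names` are already imported in the current process."""
    return {name: name in sys.modules for name in names}


def check_preload(preload, probe):
    """Start a forkserver that preloads `preload`, then ask one child
    process which of the `probe` modules it already has loaded.

    Hypothetical helper. The forkserver start method is Unix only, and the
    preload list only takes effect if it is set before the fork server
    process has been started.
    """
    ctx = mp.get_context("forkserver")
    ctx.set_forkserver_preload(list(preload))
    with ctx.Pool(processes=1) as pool:
        # modules_loaded runs inside the child forked from the server.
        return pool.apply(modules_loaded, (list(probe),))
```

In a script, call check_preload from under an `if __name__ == "__main__":` guard, for example check_preload(["decimal"], ["decimal", "fractions"]), and compare the result with and without the preload list. The same sys.modules check could be logged from inside a step to see what a subprocess already has loaded.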
Chris Histe
03/23/2023, 9:05 PM
"The best case scenario is to pre-load the module that defines all your definitions"
what do you mean by definitions?
And do you have a recommendation to know whether or not dependent libraries are fork safe?
@alex
alex
03/23/2023, 9:14 PM
"would you have any other suggestion for the issue I have raised in my comment?"
probably need to consider optimizations specific to how you are building up your code location
"forkserver: null"
not sure about null working, forkserver: {} should work
"what do you mean by definitions?"
@op, @asset, etc. https://docs.dagster.io/concepts/code-locations
"know whether or not dependent libraries are fork safe?"
No special advice, try it out and/or search the web.
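The forkserver settings discussed above can be sketched as a plain Python dict that mirrors the YAML posted in this thread. This is only an illustration: the helper name forkserver_execution_config is hypothetical, the key layout is copied from the messages above rather than checked against a Dagster schema, and it emits an empty mapping for forkserver (the `{}` form suggested above) instead of null.

```python
import json


def forkserver_execution_config(preload_modules=None, max_concurrent=None):
    """Build the execution config from the thread as a nested dict.

    Hypothetical helper: key names follow the YAML shown in the thread.
    """
    # An empty dict corresponds to `forkserver: {}` rather than `null`.
    forkserver = {}
    if preload_modules:
        forkserver["preload_modules"] = list(preload_modules)

    multiprocess = {"start_method": {"forkserver": forkserver}}
    if max_concurrent is not None:
        multiprocess["max_concurrent"] = max_concurrent

    return {"execution": {"config": {"multiprocess": multiprocess}}}


if __name__ == "__main__":
    # Same values as the config posted above, with {} in place of null.
    print(json.dumps(forkserver_execution_config(max_concurrent=15), indent=2))
```

Passing preload_modules=["dagster", "dagster_dbt"] adds the preload list under forkserver, as in the earlier message. How the resulting dict is supplied to a run is not covered in the thread and is left out here.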
Chris Histe
03/23/2023, 9:48 PM
forkserver gave us a 30 to 40% faster run. Pretty nice. Multithreading would be nicer but for now this will do for our use case.
Jordan
03/23/2023, 11:58 PM
alex
03/24/2023, 2:55 PM
"Do you know if in the future it is planned that this reloading will not be done anymore or that the reloading will be for a smaller part (just the repository and not the whole workspace for example)?"
Directionally we are moving away from repository to Definitions which are limited to one per code location
https://docs.dagster.io/concepts/code-locations
https://github.com/dagster-io/dagster/discussions/10772
so i don’t expect much change in repository.
"since I would have to use a more powerful cluster in terms of CPU and memory resources"
I wouldn’t expect much total CPU cost shift going from one to N code servers. The fixed memory overhead will accumulate across N servers but I would still be surprised if this was meaningful relative to any data computation happening in ops/assets (unless none of the assets/ops do meaningful local compute).