# dagster-feedback
g
Hi. I think Dagster is really missing something like a `MultithreadedExecutor`, which would allow executing steps in parallel within the same process. Currently, when running on Kubernetes using the `MultiprocessExecutor`, there's a massive cold-start delay for each subprocess being started because, from what I understand, it has to reload the entire project and assets, which can lead to extremely long jobs even when the tasks are very lightweight.
c
I asked a related question in #dagster-support yesterday: https://dagster.slack.com/archives/C01U954MEER/p1678979490771579 I'm not quite sure what the cold start is about either, but I think it's a bug.
g
My understanding is that it's not a bug; it's how executors are designed. For each of the assets in your graph, meaning for each subprocess, your entire project (all of the assets) is loaded. Hence the proposal of introducing a multithreaded executor instead, where everything is loaded once.
👀 1
a
The tracking issue is https://github.com/dagster-io/dagster/issues/4041. One mitigation for heavy import-time cost that's possible in the interim is to use the `forkserver` start method: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor
👍 1
g
I tried that option earlier but saw only a very marginal speed gain (a handful of seconds).
d
I saw mentioned here that we might need to add `preload_modules` in the config if the default doesn't load the necessary modules as expected: https://github.com/dagster-io/dagster/discussions/7338 Is that in the `forkserver` start method config? How can we determine which modules were loaded correctly via `forkserver` and which we need to list explicitly?
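One way to see which modules dominate startup is to time imports directly with the standard library; a minimal sketch (here `json` is just a stand-in for a heavy project dependency, and `timed_import` is a hypothetical helper, not a Dagster API). Python's `-X importtime` interpreter flag gives a more complete per-module breakdown.

```python
import importlib
import sys
import time

def timed_import(modname):
    # Measure the wall-clock cost of importing a module. Cached modules
    # return almost instantly, so run this in a fresh process.
    start = time.perf_counter()
    importlib.import_module(modname)
    return time.perf_counter() - start

# `json` is a stand-in for one of your project's heavy dependencies.
elapsed = timed_import("json")

# sys.modules lists everything the current process has loaded, so
# logging it inside a forkserver child shows what was actually preloaded.
loaded = sorted(m for m in sys.modules if m.startswith("json"))
print(f"import json took {elapsed:.4f}s; loaded: {loaded}")
```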
g
Just did a test with:

```yaml
execution:
  config:
    multiprocess:
      start_method:
        forkserver:
          preload_modules:
          - dagster
          - dagster_dbt
          - dagster_k8s
          - dagster_postgres
          - ...
```

I see no difference...
Actually, that's not true. I did some more tests, and I do see up to a 20-25% reduction in the delay between steps when listing modules. But in absolute terms, the cold start between steps is still annoyingly long.
a
The preload module resolution is here: https://github.com/dagster-io/dagster/blame/master/python_modules/dagster/dagster/_core/executor/multiprocess.py#L154-L171 The best-case scenario is to preload the module that defines all your definitions, assuming the common pattern of defining them at import time (and that dependent libraries are fork-safe). In the end, I agree that an in-process async/threaded executor is really what is needed to make lightweight concurrency work better.
h
Out of interest: I know you can use async ops, but how are they handled in Dagster? I assume they are just passed individually to `asyncio.run`.
a
> just passed individually to `asyncio.run`

correct
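That behavior is easy to demonstrate with plain `asyncio`: calling `asyncio.run` per coroutine serializes the work, while scheduling the coroutines on one event loop overlaps the waits. A minimal sketch, with 0.05 s sleeps standing in for I/O-bound op bodies:

```python
import asyncio
import time

async def op(name, delay):
    # Stand-in for an I/O-bound async op body.
    await asyncio.sleep(delay)
    return name

# Each op handed individually to asyncio.run: a fresh event loop per
# call, so the sleeps cannot overlap.
start = time.perf_counter()
sequential_results = [asyncio.run(op(n, 0.05)) for n in ("a", "b", "c")]
sequential = time.perf_counter() - start

# One loop running all coroutines together: the sleeps overlap.
async def run_all():
    return await asyncio.gather(op("a", 0.05), op("b", 0.05), op("c", 0.05))

start = time.perf_counter()
concurrent_results = asyncio.run(run_all())
concurrent = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

This is the gap an in-process async/threaded executor would close: one loop for all lightweight steps instead of one loop (or one subprocess) per step.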
a
@alex I wasn't able to understand how to make use of preload definitions. Our code location code is pretty full-on, and it takes a lot of time at the start of each process. Can I make use of what you're referring to here? For reference: https://dagster.slack.com/archives/C02LJ7G0LAZ/p1679446117427139?thread_ts=1679377131.710799&cid=C02LJ7G0LAZ
a
Module preloading is specific to the multiprocess executor.
a
We had to switch to `in_process`, as multiprocess was taking too long for each step to execute. If not preloading, would you have any other suggestion for the issue I raised in my comment?
c
On my side, I had some higher-priority work, but I'm back on this and going to try out the `forkserver`. Is this the correct YAML config? I'm not sure about the `null`:
```yaml
execution:
  config:
    multiprocess:
      max_concurrent: 15
      start_method:
        forkserver: null
```
> The best case scenario is to pre-load the module that defines all your definitions

What do you mean by definitions? And do you have a recommendation for knowing whether or not dependent libraries are fork-safe? @alex
a
> would you have any other suggestion for the issue I have raised in my comment?

You probably need to consider optimizations specific to how you are building up your code location.

> forkserver: null

Not sure about `null` working; `forkserver: {}` should work.

> what do you mean by definitions?

`@op`, `@asset`, etc. https://docs.dagster.io/concepts/code-locations

> know whether or not dependent libraries are fork safe?

No special advice; try it out and/or search the web.
👌 1
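Putting that correction together with the earlier snippet, the working config would presumably look like this (a sketch; `max_concurrent: 15` is carried over from the original attempt):

```yaml
execution:
  config:
    multiprocess:
      max_concurrent: 15
      start_method:
        forkserver: {}
```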
c
`forkserver` gave us a 30-40% faster run. Pretty nice. Multithreading would be nicer, but for now this will do for our use case.
j
Hi! 👋 I face a similar problem where, at the beginning of each execution, the whole project is reloaded, and this is quite costly (40 seconds of waiting). My project has about 50 repositories (1 repository per client) of variable size (between 5 and 70 jobs, and between 20 and 1000 assets). I am aware that the ideal solution would be to use several workspaces to get much shorter reloads. In reality, I don't really see any reason to do that except to reduce the load, especially since I would have to use a more powerful cluster in terms of CPU and memory resources. @alex Do you know if, in the future, it is planned that this reloading will not be done anymore, or that only a smaller part will be reloaded (just the repository and not the whole workspace, for example)?
a
> Do you know if in the future it is planned that this reloading will not be done anymore or that the reloading will be for a smaller part (just the repository and not the whole workspace for example)?

Directionally, we are moving away from `repository` toward `Definitions`, which are limited to one per code location (https://docs.dagster.io/concepts/code-locations, https://github.com/dagster-io/dagster/discussions/10772), so I don't expect much change in `repository`.

> since I would have to use a more powerful cluster in terms of CPU and memory resources

I wouldn't expect much total CPU cost shift going from one to N code servers. The fixed memory overhead will accumulate across N servers, but I would still be surprised if this was meaningful relative to any data computation happening in ops/assets (unless none of the assets/ops do meaningful local compute).