Daniel Fernandez
Hi, if I define the following DAG: dep_C(dep_A, dep_B), i.e., C executes after A and B finish, I'd want A and B to both be triggered at the same time, in parallel. Right now A and B are independent Python processes that each run on their own cluster, independent of Dagster. However, in my case A and B are not triggered in parallel; rather A runs, then B, and then C. Is this the expected behavior, or am I doing something wrong?
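For reference, a minimal sketch of that DAG shape, assuming the legacy `@solid`/`@pipeline` API (all names here are illustrative, and the solid bodies just stand in for the external jobs):

```python
from dagster import pipeline, solid


@solid
def dep_A(context):
    # Stands in for the external job A kicked off on its own cluster.
    return "a"


@solid
def dep_B(context):
    # Stands in for the external job B kicked off on its own cluster.
    return "b"


@solid
def dep_C(context, a, b):
    # Runs only once both upstream solids have finished.
    context.log.info(f"received {a} and {b}")


@pipeline
def my_pipeline():
    dep_C(dep_A(), dep_B())
```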
sashank
Hey @Daniel Fernandez, do you know which executor you are using?
You might be using the `in_process` executor, but the `multiprocess` executor will give you the behavior you are looking for.
To use the `multiprocess` executor, you can include the following snippet in your run config:

```yaml
execution:
  multiprocess:
```

More info here: https://docs.dagster.io/_apidocs/execution#dagster.multiprocess_executor
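To illustrate, here's a sketch of launching the pipeline above with that run config from Python, assuming the legacy pre-1.0 API (the multiprocess executor needs a `reconstructable` pipeline and a persistent `DagsterInstance`):

```python
from dagster import DagsterInstance, execute_pipeline, reconstructable

# my_pipeline must live at module scope in an importable file so each
# child process can reconstruct it.
result = execute_pipeline(
    reconstructable(my_pipeline),
    run_config={"execution": {"multiprocess": {}}},
    instance=DagsterInstance.get(),  # requires DAGSTER_HOME to be set
)
```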
Daniel Fernandez
@sashank thanks very much for the detailed answer. I see that `multiprocess_executor` has the following parameter: "The `max_concurrent` arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set `max_concurrent` to 0, this is the return value of `multiprocessing.cpu_count()`." It seems that this parameter depends on the cpu_count of the cluster where Dagster is installed. But our idea is that most (if not all) nodes/solids will actually run outside of the Dagster cluster (as Batch jobs, Lambda jobs, Spark jobs, Dask jobs, etc.). Based on this, I think the cpu_count of the Dagster cluster is not the default we'd want. Am I right, or am I missing some of the logic of how this multiprocess executor works?
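If the `cpu_count()` default is a poor fit (for example, solids that mostly sit idle waiting on external clusters), `max_concurrent` can be set explicitly instead; a sketch, assuming the same legacy run-config schema (the value 10 is arbitrary and illustrative):

```python
run_config = {
    "execution": {
        "multiprocess": {
            "config": {
                # 0 (the default) means multiprocessing.cpu_count(); any
                # positive integer caps the number of concurrently
                # running solid processes instead.
                "max_concurrent": 10,
            }
        }
    }
}
```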
@Vinod @Zuber for awareness