# ask-ai
j
We have a dagster asset pipeline that fans out to 100s of assets and then back into a single one at the end. We are finding that the final asset takes a long time to load all of the incoming assets -- and it appears to be using just a single thread. Is there a way to have the executor read from the io manager in parallel with several threads?
s
In Dagster, how many assets (steps) execute at the same time is determined by your job's executor. If you're experiencing slow load times, you might want to make sure your job uses the multiprocess executor, which runs each step in its own process so that independent steps can execute in parallel. The documentation provides an example of how to limit the concurrency for a job with the `max_concurrent` setting, and the same setting can be raised to allow more steps to run at once. Here's a relevant snippet from the documentation:
```python
from dagster import define_asset_job

my_job = define_asset_job(
    name="my_job",
    selection=[...],  # your assets here
    config={
        "execution": {
            "config": {
                "multiprocess": {
                    "max_concurrent": 10,  # Adjust this number to increase parallelism
                },
            },
        },
    },
)
```
By setting the `max_concurrent` parameter in the `multiprocess` executor configuration, you control how many steps, each in its own process, can run at the same time. Note that this governs parallelism across steps: the inputs of a single step, such as a fan-in asset with hundreds of upstream dependencies, are loaded inside that one step's process, so raising `max_concurrent` will not by itself speed up that load.

How fast those inputs load depends on how your I/O manager is implemented. Some I/O managers may not support concurrent reads or may have limitations when it comes to parallelism. If you're using a custom I/O manager, you may need to ensure that it is designed to handle many reads efficiently. If you're still facing performance issues after adjusting the executor configuration, look into the implementation details of your I/O manager, or consider using a different I/O manager that is optimized for concurrent reads.
j
I am using the multiprocess executor, and different assets do run in parallel. But the final asset seems to use a single thread to load all of its incoming assets.