Jeremy Hermann
02/14/2024, 11:27 PM
Scout
02/14/2024, 11:27 PM
```python
from dagster import define_asset_job

my_job = define_asset_job(
    name="my_job",
    selection=[...],  # your assets here
    config={
        "execution": {
            "config": {
                "multiprocess": {
                    "max_concurrent": 10,  # Adjust this number to increase parallelism
                },
            },
        },
    },
)
```
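Conceptually, `max_concurrent` acts as a cap on how many steps execute at the same time. As a minimal sketch of the same idea in plain Python (not Dagster internals; `MAX_CONCURRENT`, `load_input`, and the counters are illustrative names), a bounded semaphore limits how many workers are "inside" at once, no matter how many the pool spawns:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # analogous role to the executor's max_concurrent setting
gate = threading.BoundedSemaphore(MAX_CONCURRENT)

lock = threading.Lock()
active = 0  # workers currently inside the gate
peak = 0    # highest concurrency observed

def load_input(i):
    """Stand-in for loading one asset's input."""
    global active, peak
    with gate:  # at most MAX_CONCURRENT threads pass this point at once
        with lock:
            active += 1
            peak = max(peak, active)
        # ... the actual I/O would happen here ...
        with lock:
            active -= 1
    return i

# The pool offers 10 workers, but the gate keeps concurrency at 3 or below.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(load_input, range(20)))

assert results == list(range(20))
assert peak <= MAX_CONCURRENT
```

Raising `MAX_CONCURRENT` widens the gate; in Dagster, raising `max_concurrent` similarly allows more step processes to run side by side.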
By setting the `max_concurrent` parameter in the `multiprocess` executor configuration, you can control how many processes are spawned to execute the job, which in turn can increase the parallelism of input loading if your I/O manager supports concurrent reads.

However, the actual parallelism achieved will also depend on how your I/O manager is implemented. Some I/O managers do not support concurrent reads or have other limitations around parallelism. If you're using a custom I/O manager, make sure it is designed to handle concurrent access patterns efficiently.

If you're still seeing performance issues after adjusting the executor configuration, look into the implementation details of your I/O manager, or consider switching to an I/O manager that is optimized for concurrent reads.

Jeremy Hermann
02/14/2024, 11:29 PM