Hello, I'm new to Dagster, but it looks wonderful! My data pipeline has a lot of dependencies between its roughly 300 assets, so I think Dagster will handle this as one big job, right? The pipeline starts from source systems, which are loaded into a DB, then transformed, and then moved to a data warehouse. I'm looking to optimize the pipeline run and have been diving into the Dagster docs, but some questions arise: each of the systems I'm connecting to can handle only a certain number of concurrent tasks (some 2, some 8, others 4). So ideally, depending on the resource an asset uses, it would be placed in a separate queue where I can set the concurrency limit. Can Dagster do this?
01/03/2023, 5:07 PM
Hi Joris - do you need these limits to apply across multiple runs? Or would being able to set limits within a single run be sufficient?
Hi Daniel, thank you very much for your reply! I'm looking forward to the release this Thursday and will take another look at the doc page you mentioned.
One more question, to be sure: if I configure the materialization of all assets as one job (besides the dependent assets described above, we have no other assets or ops at the moment), I assume that results in a single run. So to answer your question: it would be nice to set limits within a run.
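For anyone reading along, here is a rough sketch of what per-system limits within a single run could look like. It assumes a Dagster version whose multiprocess executor accepts `tag_concurrency_limits`, and that the job uses the default executor, so the settings sit under `execution.config.multiprocess`. The tag key `source_system` and the system names `crm`, `erp`, and `warehouse` are made up for illustration; only the limits 2, 8, and 4 come from the question above. The block only builds the run config dictionary, so it runs without Dagster installed.

```python
# Hypothetical system names mapped to the concurrency limits from the question.
SYSTEM_LIMITS = {"crm": 2, "erp": 8, "warehouse": 4}


def build_run_config(system_limits, tag_key="source_system", max_concurrent=None):
    """Build run config with one tag concurrency limit per source system.

    Assumes the job uses Dagster's default executor, where multiprocess
    settings live under execution -> config -> multiprocess.
    """
    limits = [
        {"key": tag_key, "value": name, "limit": limit}
        for name, limit in sorted(system_limits.items())
    ]
    multiprocess = {"tag_concurrency_limits": limits}
    if max_concurrent is not None:
        # Optional overall cap on concurrently executing steps in the run.
        multiprocess["max_concurrent"] = max_concurrent
    return {"execution": {"config": {"multiprocess": multiprocess}}}


RUN_CONFIG = build_run_config(SYSTEM_LIMITS, max_concurrent=8)

# Intended use (not executed here, so this sketch stays dependency-free):
#
#   from dagster import asset, define_asset_job
#
#   @asset(op_tags={"source_system": "crm"})
#   def crm_accounts(): ...
#
#   all_assets_job = define_asset_job("all_assets_job", config=RUN_CONFIG)
```

With this shape, each asset tagged `source_system: crm` would count against the limit of 2 inside a run, while assets tagged for other systems are limited independently. Limits that must hold across multiple runs are a separate mechanism and are not covered by this sketch.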