Chris Anderson
02/16/2023, 8:26 PM
.collect call eventually, but I'd like the job to run only one of those pipelines at a time. For a visual: in the example photo below, I'd like the first mapping index of downstream ops to run and finish execution before the next mapping index is started. I tried per-op prioritization, but it seems (correct me if I'm wrong) that it prioritizes launching the op, not its complete execution. Is there an easy way to set concurrency limits on a per-mapping-index basis, or do you have ideas about possible ways to approach this problem?

Chris Anderson
02/16/2023, 8:34 PM

owen
02/16/2023, 9:53 PM
One option is to cap the multiprocess executor at a single concurrent op for the whole job:
@job(config={"execution": {"config": {"multiprocess": {"max_concurrent": 1}}}})
Another option would be to apply per-tag concurrency limits. If you add a tag of {"group": "downstream_one"} to downstream_one, and a similar tag to downstream_two, you should be able to set per-tag concurrency limits:
@job(
    config={
        "execution": {
            "config": {
                "tag_concurrency_limits": [
                    {"key": "group", "value": "downstream_one", "limit": 1},
                    {"key": "group", "value": "downstream_two", "limit": 1},
                ]
            }
        }
    }
)
...
(apologies if the formatting is somewhat off)
This will limit concurrency only for the tagged ops, meaning you can still run other things concurrently in cases where the order doesn't matter.