Mikael Frosthage

02/10/2024, 10:50 PM
I have a partitioned asset. When running backfills, is it possible to limit how many jobs run concurrently? My problem is that one database is a replica, meaning I can only make one query at a time.


02/10/2024, 10:50 PM
Yes, it is possible to limit how many runs execute concurrently when backfilling a partitioned asset in Dagster. You can achieve this by configuring a run queue and using `tag_concurrency_limits` in the run queue configuration. Here's how you can set up concurrency limits for backfills:

1. Configure a run queue: Whether you are using Dagster Cloud or Dagster Open Source, you will need to set up a run queue. The run queue manages the execution of runs.
2. Use `tag_concurrency_limits`: Within the run queue configuration, you can specify `tag_concurrency_limits` to control how many runs carrying a given tag can execute concurrently. You can set limits across all backfills or within each backfill. For example, to limit a specific backfill to one concurrent run, you could use a configuration like this:

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: "my_limited_backfill_tag"
        limit: 1
```

3. Supply tags when launching a backfill: When you launch a backfill from the Dagster UI, you can supply a set of tags in the backfill dialog. These tags are matched against `tag_concurrency_limits` to enforce the concurrency limits.

By setting the concurrency limit to 1 for the tag associated with your database backfill, you ensure that only one of its runs executes at a time, so only one query is made at a time, which respects the limitation of your replica database. For more detailed guidance and examples, you can refer to the documentation on limiting concurrency in data pipelines.
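The "within each backfill" option mentioned above can be sketched as follows. This is a minimal sketch for a Dagster Open Source `dagster.yaml`, assuming that runs launched by a backfill are automatically tagged with `dagster/backfill` and that your Dagster version supports `applyLimitPerUniqueValue`. Please check both against the documentation for your version before relying on it.

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      # Assumption: Dagster adds the "dagster/backfill" tag to every run
      # that a backfill launches, with the backfill ID as the tag value.
      - key: "dagster/backfill"
        value:
          # Apply the limit separately to each distinct backfill ID,
          # rather than to all backfill runs combined.
          applyLimitPerUniqueValue: true
        limit: 1
```

With this variant you would not need to add a custom tag in the backfill dialog, but the limit applies to every backfill, not only the one that queries the replica. If only some backfills touch the replica, the custom tag approach shown earlier is probably the better fit.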