This message was deleted dagster #ask-ai



# ask-ai

Slackbot

02/11/2024, 5:07 PM

This message was deleted.

Scout

02/11/2024, 5:07 PM

Based on the documentation, when you launch a backfill that covers N partitions, Dagster will typically launch N separate runs, one for each partition. This is the default behavior to avoid overwhelming resources, especially when not using a parallel-processing engine like Spark or Snowflake. However, it can be inefficient when unpartitioned downstream assets are involved, as they might be triggered multiple times unnecessarily.

The documentation also mentions that backfills can be launched for a selection of partitioned assets as long as the most upstream assets share the same partitioning. This implies that with a mix of partitioned and unpartitioned assets, the behavior might not be optimal for the unpartitioned assets, since they have no partitioning scheme to dictate how many times they should be materialized.

There are also discussions in the Dagster community and GitHub issues that highlight similar concerns. For example, users have reported that partitioned job backfills do not respect the backfill policies of the respective assets: a run is launched for every single partition instead of honoring a "single_run" backfill policy that would trigger only once per asset.

To address this, look into the backfill policies of your assets and whether they are being respected during the backfill. If the default behavior is causing inefficiencies, you may need to customize the backfill logic or partitioning scheme so that unpartitioned downstream assets are only materialized once. Checking for open issues or discussions on the Dagster GitHub repository that match your scenario could provide further insights or workarounds.

If the behavior you're experiencing is not covered by the documentation or existing community discussions, it may be a good idea to raise a new issue in the Dagster GitHub repository to seek further assistance from the maintainers.
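To make the run-count arithmetic concrete, here is a toy model in plain Python. It is not Dagster API: `plan_backfill_runs` is a hypothetical helper that only mimics how partitions are grouped into runs, under the assumption that every run also selects the unpartitioned downstream asset.

```python
from typing import List


def plan_backfill_runs(partition_keys: List[str], single_run: bool = False) -> List[List[str]]:
    """Toy model (not Dagster API): group partition keys into runs."""
    if single_run:
        # One run that covers the whole range of partitions.
        return [list(partition_keys)] if partition_keys else []
    # Default-style behavior: one run per partition.
    return [[key] for key in partition_keys]


partition_keys = ["2024-02-01", "2024-02-02", "2024-02-03"]

default_runs = plan_backfill_runs(partition_keys)
single_run_plan = plan_backfill_runs(partition_keys, single_run=True)

# Assumption: each run also includes the unpartitioned downstream asset,
# so that asset is materialized once per run.
downstream_materializations_default = len(default_runs)
downstream_materializations_single = len(single_run_plan)

print(downstream_materializations_default, downstream_materializations_single)
```

With three partitions, the per-partition plan touches the downstream asset three times, while the single-run plan touches it once.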

Scout

02/11/2024, 5:41 PM

To overcome the limitation where a backfill triggers materializations for unpartitioned downstream assets multiple times, you can consider providing a custom job definition that handles partitioned and unpartitioned assets differently. Here are some suggestions based on the documentation:

1. Custom Backfill Policy: You can set a custom `backfill_policy` for your assets. For partitioned assets, you can use `BackfillPolicy.single_run` to ensure that the backfill executes as a single run for all partitions. This policy needs to be set on the asset level and requires you to write code that operates on a range of partitions.
2. Separate Jobs for Partitioned and Unpartitioned Assets: Define separate jobs for partitioned and unpartitioned assets. This way, you can control the backfill behavior for each job individually. For the job targeting partitioned assets, you can specify the partitions to materialize. For the job targeting unpartitioned assets, you can ensure it runs only once.
3. Custom Partition Mapping: If you have complex dependencies between partitioned and unpartitioned assets, you might need to provide a custom `PartitionMapping` that defines how partitions of upstream assets map to downstream assets. This allows you to override the default partition dependency rules.
4. Custom Logic in Ops: If you are using ops within your job, you can add custom logic to check if the op is being run as part of a backfill and skip the execution if it's not the first partition (for unpartitioned assets). This requires careful management of execution context and partition keys.
5. Asset Sensors: Use asset sensors to trigger downstream computations based on the materialization of upstream assets. This can help ensure that unpartitioned assets are only materialized when necessary.
6. Manual Triggering: As a last resort, you could manually trigger backfills for partitioned assets and separately trigger a run for unpartitioned assets to ensure they are only materialized once.

Remember that any custom solution should be thoroughly tested to ensure it behaves as expected in your specific use case. If you find that the documentation does not cover your scenario or if you're unsure how to implement a custom solution, consider reaching out to the Dagster community for support or opening an issue on the Dagster GitHub repository for further assistance.
