For AutoMaterialize assets is there a way to constrain to on dagster #dagster-feedback

For AutoMaterialize assets, is there a way to cons...

Chris Comeau

06/13/2023, 5:45 PM

For AutoMaterialize assets, is there a way to constrain to one asset per run? I'm currently working through retry policies and I think it might be helpful to avoid grouping multiple asset ops into one run. You wind up with a lot of runs, but those runs are simple, and the assets are plainly laid out on the in-progress runs rather than seeing "48 assets". For the way I'm looking at things right now, the concept of a run as a grouping of asset ops is adding a layer of indirection that probably isn't needed any more - I'm just looking at asset materialization ops. Assuming that's possible, with AutoMaterialize processing emitting one run per asset, and a run tag with the asset name (node_handle), global asset-level concurrency control could then be done like this:

Copy code

run_queue:
  tag_concurrency_limits:
    - key: "asset"
      value:
        applyLimitPerUniqueValue: true
      limit: 1

I might be thinking of all this the wrong way. Basically the behaviour I'm looking for is: - avoid overlapping execution of the same asset op - always retry assets after failure after some delay, never "give up" in a way that requires manual intervention. (right now I think I have this handled by having AutoMaterializePolicy + FreshnessPolicy with no RetryPolicy, with a retry interval equal to the freshness policy interval). For more frequent "in-run" retries, raising RetryRequested exception. Maybe an easier approach: a different asset config that adjusts the timing here https://docs.dagster.io/concepts/assets/asset-auto-execution#auto-materializing-assets-

Dagster won't try to auto-materialize that asset again until it would auto-materialize it if the failed run had succeeded. I.e., if an asset has a daily freshness policy, and it fails, Dagster won't auto-materialize it again until the next day.

So say we have a daily freshness policy, but want it to retry auto-materialize at 15 minute intervals in case of failure, have that retry_interval as another argument to the FreshnessPolicy. When overdue, and failing, how often to retry (but in a new run).

owen

06/14/2023, 7:12 PM

Hi @Chris Comeau! I appreciate the detailed feedback here. Starting with the conceptual bit, "why even create multi-asset runs?", you're correct that there isn't any requirement for things to work this way, and it's definitely worth considering that as a possibility. We already chop up runs at the code location boundaries, as well as at the intersections between different PartitionsDefinitions, and things would work similarly if we just refused to combine any assets at all. The main arguments for the current behavior off the top of my head are: 1. In some cases, this ends up being more understandable, as if one upstream update requires a lot of downstream work to be done, so once the daemon realizes that this work is necessary, it's nice to see that downstream work grouped together into a single run a. counter-argument: if this is desirable, then maybe we should have a purpose-built UI for this sort of "in-progress" work that can also handle downstream partition and code-location boundary issues. 2. This is a bit of a strange execution model, as it removes some bit of flexibility in the difference between launching a run and executing a step. That is, depending on your execution setup, it might end up being less computationally efficient to launch a bunch of small runs rather than grouping together more steps into a single run. There might be some other things I'm missing there, but I think the original impetus was definitely centered around the first point around observability. Addressing the more specific question about retrying, the ability to configure global op-level concurrency (and therefore global asset-level concurrency) is in progress, and I believe will be released in the next couple of releases. So long story short, breaking each asset into its own separate run will no longer be necessary in order to get the functionality that you're looking for 🙂

👍 1

Chris Comeau

06/15/2023, 8:53 PM

Right now I have global concurrency control working using dask-scheduler's semaphores, where each asset op requires a lease on its asset key (limit 1) during execution. If there's any trouble with the postgres-based approach, that could be another option. One problem with doing this lease acquisition in the op: when multiple overlapping asset ops are started (e.g. by clicking the materialize button too many times), they all show as running while waiting for the first one's lease to be released, which can fill up the default queue of 10 runs easily. It'd be an improvement for that to be caught in the daemon before starting runs for conflicting ops for sure. I'm mostly thinking of basic "refresh staged table in data warehouse from source" patterns here, handling those in the simplest way possible, which might help for people getting started. I actually do have big chains of derived datasets in other areas where it's helpful to see those in the same run for context.

238 Views

Open in Slack

Previous Next