Has anyone run into a case where an `eager` automaterializat dagster #ask-community

Joel Olazagasti

05/16/2023, 8:03 PM

Has anyone run into a case where an `eager` automaterialization policy with DBT assets will kick off multiple jobs with overlapping assets? We're running into this after an overnight sync failure last night, and it's making reconciling the downstream changes difficult. Also, the policy is triggering on assets whose upstream ingestion dependencies are still running. Is there a way to configure the policy so that it won't trigger while an upstream job is still active?

owen

05/16/2023, 9:27 PM

hey @Joel Olazagasti! the first part is definitely unexpected behavior. Were these jobs with overlapping assets all launched around the same time, or were they spread out over time? also, are your dbt assets all part of a single `load_assets_from_dbt...` call, or are there multiple underlying multi-assets?

as for not triggering when the upstream ingestion dependencies are still running, this does make a good deal of sense. would you mind filing a github issue for that? in the meantime, one possible solution might be to use `AutoMaterializePolicy.lazy()`, then apply a freshness policy with a `maximum_lag_minutes` of (e.g.) 60 to the assets downstream of your ingestion assets, such that they will wait a while before consuming the ingestion data. This does mean that the downstream assets would still wait a similar amount of time even if all ingestion tasks had finished, so I do think you're right that waiting for in-flight runs to complete ends up being the cleanest behavior

Joel Olazagasti

05/16/2023, 9:47 PM

They're all loaded from a single call. I think it might be related to the fact that the automaterialize policy triggered a run on a partitioned asset as well as that asset's missed schedule being run when I merged the PR. We run right up against our hourly consumption limit on the API that job consumes, so the simultaneous runs hit the rate limit and caused both jobs to fail. Once I rematerialized that asset it looks like the automaterialize policy tried to kick off jobs for both code version changes & upstream data changes at the same time, which had overlapping assets. Submitted an issue here. As I explained there, this is probably only entirely unwanted behavior in cases like mine where a code change also triggers schedules/backfills. I can imagine cases where folks might have a complex asset that always has upstream assets materializing, in which case they'd want the current behavior if they wished to have continuously up to date data.

owen

05/16/2023, 9:50 PM

Ah interesting -- so you have both a schedule and an auto-materialize policy targeting the same (partitioned) asset?

Joel Olazagasti

05/16/2023, 9:54 PM

That's correct. We only just moved to `1.3+` so I hadn't considered the implications of that. Theoretically, if I didn't care what time a partitioned asset synced, I could forgo scheduling it at all? But yes, I'm loading in my asset defs with `load_assets_from_package_module` and assigning the auto-materialize policy there, where the partitioned asset gets picked up as well. That was mostly out of convenience, but if I need to load in/assign the policy more granularly that's not an issue.

owen

05/16/2023, 10:02 PM

ah that makes sense. assuming the asset is time-partitioned, then forgoing the schedule would mean that the new partition would be materialized basically as soon as it existed (so right at the end of the day, if it's daily partitioned). in general, having two instigators (AMPs / schedules) can cause odd things, as they act pretty much independently. that being said, it's still odd / seemingly incorrect behavior that multiple overlapping sets of assets would be triggered at the same time -- my suspicion is that these runs must have been kicked off on separate ticks of the daemon (as the assets that we plan to materialize on a given tick are de-duped). do you have a screenshot showing these runs? would love to try to get to the bottom of why this would happen

owen

05/16/2023, 10:04 PM

In the next couple releases, we'll be shipping a UI that can give some more insight into why things were kicked off on any given tick, which seems like it would be very useful here

Joel Olazagasti

05/16/2023, 10:10 PM

Ahh, you may be on to something there, they were triggered in the same minute. What info would you need in the screenshots?

owen

05/16/2023, 10:12 PM

mostly just the set of assets in each run, and the run timestamps
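A rough sketch of how that information could be compared once collected: given each run's start time and asset set, list the pairs of runs that share assets and how far apart they started. The run IDs, asset names, and timestamps below are made up for illustration and are not taken from the thread.

```python
from datetime import datetime
from itertools import combinations


def find_overlapping_runs(runs):
    """Return (run_a, run_b, shared_assets, seconds_apart) for each pair of runs sharing assets."""
    overlaps = []
    for (id_a, a), (id_b, b) in combinations(sorted(runs.items()), 2):
        shared = a["assets"] & b["assets"]
        if shared:
            gap = abs((a["started"] - b["started"]).total_seconds())
            overlaps.append((id_a, id_b, sorted(shared), gap))
    return overlaps


# Hypothetical runs: two started 30 seconds apart and share two assets.
runs = {
    "run_1": {"started": datetime(2023, 5, 16, 12, 0, 0), "assets": {"asset_b", "asset_c", "asset_d"}},
    "run_2": {"started": datetime(2023, 5, 16, 12, 0, 30), "assets": {"asset_c", "asset_d"}},
    "run_3": {"started": datetime(2023, 5, 16, 12, 5, 0), "assets": {"asset_e"}},
}

for run_a, run_b, shared, gap in find_overlapping_runs(runs):
    print(f"{run_a} and {run_b} share {shared}, started {gap:.0f}s apart")
```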

Joel Olazagasti

05/16/2023, 10:28 PM

Yeah, let me anonymize the names of assets real quick, and annotate with their dbt hierarchy. The model names are descriptive enough they might leak too much about our internal processes 😂

Joel Olazagasti

05/16/2023, 10:53 PM

Okay, so small caveat to all of this. Because the runs were all being triggered in such a weird order, with dbt assets that were out of date with their code versions, asset D in this picture had a cartesian join somehow, which was resolved when I just reran our whole DBT stack. But the loaded data, asset A, that the rest of the assets in this hierarchy depend on, was correctly loaded in the partitioned run. (Hopefully this all makes sense, it was a lot of moving parts at once, and sparked a team discussion about how many changes should be in a single PR, and why it's important not to admin force push changes before our CI pipeline runs 😬)

owen

05/16/2023, 11:25 PM

gotcha, thanks for the diagram! it seems like, on the first tick after […] was materialized, the logic correctly identified that all downstream assets should be materialized. On the next tick (~30 seconds later), something weird happened which caused the logic to kick off […] and […], but it's really hard to determine exactly what that weird thing was without that observability UI (and looking through the code I can't see how that would be possible w/o at least some materialization occurring in between those two ticks)

owen

05/16/2023, 11:26 PM

right now, code versions are not taken into account by the logic, so I don't think that would have had an effect

Joel Olazagasti

05/16/2023, 11:43 PM

Ahh, okay. I didn't notice when writing this up, but in the minute after the scheduled job started, another DBT asset auto-materialization started. That asset is an upstream dependency of asset D, but from the other side of the asset graph, with no co-dependencies on assets A - C. That must be because when I retriggered the partitioned job run (when our API quota refreshed) I also kicked off a graph of DBT assets to try and assess where the cartesian join was coming from. I only mentioned the code version because some downstream DBT assets ran with a newer code version than their upstream assets, which is what caused the cartesian join.

On a related note: in a case like ours, with a DBT asset that's dependent on 2 distinct DBT asset graphs, it seems like the automaterialization policy will happily run one of those graphs, including the final asset (asset D in this graph) with a new code version, if there's an upstream data change, even if there's code version staleness on the other side of the dependencies. Is that a correct understanding?

And as a follow up, is there a suggested pattern for automated materializations of code changes? Ideally we'd have all of our DBT models and their downstreams run when we've pushed a code change. Until this moment I, mistakenly, believed that the auto-materialization policies would handle that. Our prior plan for that was to implement a job call in CI that dynamically refreshed changed models based on the DBT manifest. I'm not relishing that complexity though, so if there's a better pattern I'd love to learn.
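For reference, the manifest-based idea mentioned above could look roughly like the sketch below. It assumes two parsed dbt `manifest.json` dictionaries (one from the previous deploy, one from the current build) and compares each model's file checksum. The function name `changed_models` and the sample manifests are hypothetical. This only finds the changed models themselves; selecting their downstreams and launching a run would still be separate steps, and dbt's own `state:modified` selector covers a similar comparison.

```python
def changed_models(old_manifest, new_manifest):
    """Return sorted unique_ids of models that are new or whose checksum changed."""
    old_nodes = old_manifest.get("nodes", {})
    changed = []
    for unique_id, node in new_manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        new_sum = node.get("checksum", {}).get("checksum")
        old = old_nodes.get(unique_id)
        if old is None or old.get("checksum", {}).get("checksum") != new_sum:
            changed.append(unique_id)
    return sorted(changed)


# Hypothetical, heavily trimmed manifests for illustration only.
old = {"nodes": {
    "model.proj.a": {"resource_type": "model", "checksum": {"name": "sha256", "checksum": "111"}},
    "model.proj.b": {"resource_type": "model", "checksum": {"name": "sha256", "checksum": "222"}},
}}
new = {"nodes": {
    "model.proj.a": {"resource_type": "model", "checksum": {"name": "sha256", "checksum": "111"}},
    "model.proj.b": {"resource_type": "model", "checksum": {"name": "sha256", "checksum": "999"}},
    "model.proj.c": {"resource_type": "model", "checksum": {"name": "sha256", "checksum": "333"}},
    "test.proj.t": {"resource_type": "test", "checksum": {"name": "none", "checksum": ""}},
}}

print(changed_models(old, new))
```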

Joel Olazagasti

05/16/2023, 11:44 PM

If this is all too hard to follow without introspecting the UI, I can get approval from my higher-ups tomorrow, and either huddle or PM with you with the actual details.
