We're having some issues with (lazy) auto material...
# ask-community
m
We're having some issues with (lazy) auto materialization, where some assets never get triggered. As an example: We have two assets, `users` and `subscriptions`, that are backed by dbt models to pump data from one data store to another with some transformations applied. `subscriptions` depends on `users` in addition to some source assets. Both of them have a freshness policy that requires them to be materialized by 02:15am with a 2 hour max lag. However, neither asset gets an auto-materialization with reason "Required to meet this asset's freshness policy" triggered. `users` happens to get triggered due to a downstream freshness policy at some later point in time, but `subscriptions` becomes and remains overdue. The auto-materialize history is entirely empty for that asset. The expected behaviour would be that both assets get materialized around the same time around 02:15am. We have some other assets that are also affected by this, but this is the simplest case I could find. I unfortunately have not been able to reproduce this locally.
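For readers following along, here is a minimal sketch of the constraint described above, under one reading of it: at the 02:15 deadline the asset must reflect data no older than 00:15 (deadline minus the 2 hour max lag). This is a toy model in plain Python, not Dagster's implementation; the function names and the example dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def required_data_time(deadline: datetime, max_lag: timedelta) -> datetime:
    """Oldest data the asset is allowed to reflect at the deadline."""
    return deadline - max_lag


def is_overdue(
    now: datetime,
    deadline: datetime,
    max_lag: timedelta,
    latest_data_time: Optional[datetime],
) -> bool:
    """True once the deadline has passed and the asset's data is too old.

    `latest_data_time` is the time of the data the asset currently reflects,
    or None if it has never been materialized.
    """
    if now < deadline:
        return False  # the constraint only has to hold from the deadline on
    if latest_data_time is None:
        return True
    return latest_data_time < required_data_time(deadline, max_lag)


# Hypothetical values matching the policy described in the message.
deadline = datetime(2023, 9, 1, 2, 15)
max_lag = timedelta(hours=2)
```

Under this model, an asset last refreshed the previous day is overdue from 02:15 onward, which matches the state `subscriptions` is described as being stuck in.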
The asset subgraph for these two looks like this (`users` has several more source assets it depends on):
o
Hi @Marvin Rösch -- does `users` depend on any observable source assets, or are they all just regular source assets?
m
Hi @owen, I honestly don't quite know 😅 It's all things from a dbt project. The source assets that `users` depends on definitely do not get any observation events from dbt tests, however.
Heya, @owen, sorry to ping you again, but do you have some idea what might be going wrong? The affected assets thankfully aren't too time-critical and we can occasionally update them manually, but we're not particularly confident in other freshness policies being met correctly given this issue
d
Hey Marvin - owen is out until Thursday next week, but I'll check with the team to see if there are any troubleshooting steps we can give you. Not being able to reproduce locally makes this particularly tricky, but we'll see what we can find.
m
Ah, thanks! I think in general a nice long-term improvement would be to have a more comprehensive overview of auto materialization decisions in the web UI. Something where you can look at logs for every evaluation of the auto materialization policy, so it is more transparent why a specific condition apparently wasn't met. Right now it is very easy to identify why a materialization was triggered, but not why it wasn't.
d
Absolutely - that feedback really resonates and is in line with improvements we want to make around making the auto-materialization process much more observable before moving the feature out of the Experimental bucket
m
We have updated to Dagster 1.4.11 in hopes that we could glean something from the new asset daemon logs. Unfortunately, there isn't much new there; the affected assets simply do not get any logs (do we maybe need to lower the minimum log level for more details?). Interestingly, another asset that was previously fine has now started showing this issue. The attached screenshot shows its auto-materialize history. Nothing in the freshness policy, auto-materialization policy or dependencies for the asset has changed since Aug 31, but it stopped getting materialized on Sep 1. We did deploy an update to Dagster 1.4.10 (from 1.4.7) on Aug 31, so it looks like there was some regression there?
Are there any steps we could take to help better understand our issue? Or is there any input on whether https://github.com/dagster-io/dagster/issues/14328 would be implemented? Being able to specify a fixed schedule for auto-materialization would solve a lot of our issues since many of our dbt models more or less only need that and do not quite fit the freshness model.
o
hi @Marvin Rösch! very sorry for the delayed response here, looks like I forgot to hit send on my previous comment:
definitely agree that having more in-depth logging here would be useful, I'm looking into adding some freshness-specific information to those logs. are the evaluations you're showing there for the `users` asset or the `subscriptions` asset?
but to get to the heart of the issue, I think you're correct that a freshness-based solution is potentially overkill for your specific situation. as for that specific GitHub issue, this is something we are investigating, and is certainly something that we want implemented, but ideally in a way that does not conflict too heavily with the existing scheduling system. with that in mind, one pattern that we've recommended for similar situations is a combination of traditional schedules for your root assets ("run these dbt models at x time every morning"), and then eager policies for the downstreams. would this work for your specific use case?
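To illustrate the effect of that pattern, here is a rough sketch in plain Python (not Dagster code, and not how Dagster evaluates policies). It assumes every non-root asset is "eager", meaning it refreshes whenever any of its upstreams was refreshed, and it only models which assets follow a scheduled root run and in what order. The function name `eager_run_order` and the `unrelated` asset are hypothetical.

```python
from graphlib import TopologicalSorter


def eager_run_order(upstreams, scheduled_roots):
    """Return the assets refreshed after the scheduled roots run, upstream first.

    `upstreams` maps each asset name to the names of the assets it depends on.
    """
    # static_order() yields every asset after all of its upstreams.
    order = TopologicalSorter(upstreams).static_order()
    refreshed = set(scheduled_roots)
    result = []
    for asset in order:
        deps = upstreams.get(asset, ())
        # An eager asset follows as soon as any upstream has been refreshed.
        if asset in refreshed or any(dep in refreshed for dep in deps):
            refreshed.add(asset)
            result.append(asset)
    return result


# Asset names from the thread plus one hypothetical unrelated asset.
upstreams = {"users": [], "subscriptions": ["users"], "unrelated": []}
print(eager_run_order(upstreams, {"users"}))
```

In this model only the root needs a schedule; `subscriptions` follows `users` without a freshness policy of its own.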
m
That particular history in my latest screenshot is from another asset which stopped being auto-materialized after we upgraded to 1.4.10. It is like the `subscriptions` asset in that it has a dependency on another asset, but in this case that dependency got materialized just fine. It does not depend on any additional source assets, unlike `subscriptions`. We are already considering creating schedules for the root assets given these issues, but mainly avoided it due to the overhead from adding the schedules for the dbt-generated assets.
o
got it, that makes sense, just logging here that we're still investigating / trying to replicate this behavior, and separately that we're beginning to look into a simpler (more direct) cron-based rule system for these sorts of cases
m
@owen Just wanted to provide an update on this from our side: We have implemented schedules based on the dbt assets since the last message, so we're in the green for now. This also surfaced the likely cause of some of our issues: the freshness policy for some downstream assets required them to be materialized at a time when one of their dependencies did not yet need to be fresh. Hypothetically, that should be covered by the criterion "upstream asset gets materialized when downstream asset needs it for freshness", though, right? I have not been able to reproduce the issue in isolation, unfortunately, but maybe that helps you as a pointer. The other potential troublemaker would be assets where the upstream and downstream asset have the same freshness policy. It appears that our (production) Dagster is getting confused there, or is "unaware" of how dbt takes care of materializing both in the correct order. Again, I could not reproduce this locally.
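The expectation in that criterion can be written down as a small sketch: an upstream asset's effective deadline is the earliest of its own deadline and the deadlines of everything downstream of it. This is a toy model of the expected behaviour in plain Python, not Dagster's actual algorithm; the asset names, the times, and the function `effective_deadlines` are all hypothetical.

```python
from datetime import time


def effective_deadlines(own_deadline, upstreams):
    """Pull each asset's deadline forward to the earliest downstream deadline.

    `own_deadline` maps asset name -> time of day (or None for no policy).
    `upstreams` maps asset name -> names of the assets it depends on.
    """
    effective = dict(own_deadline)
    changed = True
    # Repeat until stable; deadlines only ever move earlier, so this terminates.
    while changed:
        changed = False
        for asset, deps in upstreams.items():
            deadline = effective.get(asset)
            if deadline is None:
                continue
            for dep in deps:
                current = effective.get(dep)
                if current is None or deadline < current:
                    effective[dep] = deadline
                    changed = True
    return effective


# Hypothetical case: the leaf must be fresh at 02:15, its parent only at 06:00.
own = {"a_root": None, "b_mid": time(6, 0), "c_leaf": time(2, 15)}
deps = {"b_mid": ["a_root"], "c_leaf": ["b_mid"]}
print(effective_deadlines(own, deps))
```

Here `b_mid` would be expected to run in time for 02:15 even though its own policy only asks for 06:00, which is the situation described above.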