Hi! I'm testing `@dbt_assets` with AutoMaterialize...
# integration-dbt
n
Hi! I'm testing `@dbt_assets` with AutoMaterializePolicy. The current version supports two values for `dagster_auto_materialize_policy`: `lazy` and `eager`. For local dev environments we need to materialize only a few dbt models via freshness policies. `AutoMaterializePolicy.lazy()` defines `on_missing=True`, which triggers materialization of all models. Is there any way to define a custom `AutoMaterializePolicy` with the new `@dbt_assets`? Or to extend `AutoMaterializePolicy` with a "super lazy" type with `on_missing=False`, `on_new_parent_data=False`, `for_freshness=True`?
s
@owen - what would you think about adding an `on_missing` argument to `lazy`?
o
hi @Nikolaj Galak ! right now, an unmaterialized asset is considered to be an indeterminate amount "out of date", and so the freshness logic will want to materialize the asset immediately (it's almost like it's an infinite amount out of date). In your ideal world, would the "super lazy" behavior basically just wait for the root data to be updated by some external process before propagating those changes downstream? basically like a "don't materialize anything for the first time in order to align with this freshness policy" sort of thing?
n
Hi @owen. Exactly: "don't materialize anything right after the auto-materialize daemon is enabled, wait until the next freshness policy trigger event". This behavior corresponds to treating an asset with no materialization as zero amount out of date.
o
just trying to nail down the definition here (so sorry for being a bit pedantic), but if a missing asset is considered to be completely up to date in this model, then we would never materialize the asset at all (as regardless of what's happening upstream of it, a missing asset would be considered to be in an acceptable state). it seems like the logic might need to be something like: instead of considering missing assets "infinitely out of date" or "zero out of date", we would want to consider them "`current_time - latest_root_materialization_time` out of date". So if you have assets A and B, where A is materialized at 1:00, and it's currently 1:30, and B has a 60 minute freshness policy, then we'll wait to materialize B until closer to 2:00. The question then remains what to do with the root assets (e.g. if A is also not materialized when we turn on the daemon). What would you expect in that case? My intuition is that the roots should probably all be immediately kicked off, so that there's some timestamp to compare against, but interested in your thoughts there. Also, if you're interested in writing up a GitHub issue with some examples and expected behavior, I'm happy to chat in that venue as well
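The arithmetic in the A/B example can be sketched in plain Python. This is only an illustration of the proposed rule, not Dagster's actual implementation: the function names `missing_asset_lag` and `time_until_due` are hypothetical, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta


def missing_asset_lag(current_time: datetime,
                      latest_root_materialization_time: datetime) -> timedelta:
    """Lag of a never-materialized asset under the proposed rule (hypothetical helper)."""
    return current_time - latest_root_materialization_time


def time_until_due(current_time: datetime,
                   latest_root_materialization_time: datetime,
                   maximum_lag: timedelta) -> timedelta:
    """Time left before the freshness window is exceeded (zero if already overdue)."""
    lag = missing_asset_lag(current_time, latest_root_materialization_time)
    return max(maximum_lag - lag, timedelta(0))


# A was materialized at 1:00, it is now 1:30, and B tolerates 60 minutes of lag.
a_materialized = datetime(2023, 1, 1, 1, 0)  # arbitrary date, for illustration only
now = datetime(2023, 1, 1, 1, 30)

lag = missing_asset_lag(now, a_materialized)
due_in = time_until_due(now, a_materialized, timedelta(minutes=60))
due_at = now + due_in
print(f"B is {lag} out of date and becomes due at {due_at:%H:%M}")
```

Under these assumptions B counts as 30 minutes out of date at 1:30, so nothing needs to run until about 2:00, which matches the example.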
n
As I see it, there are two reference points, `latest_root_materialization_time` and `daemon_start_time`, that can be used for the "out of date" calculation. If the root is missing, then the only logical reference point I can think of is `daemon_start_time`. I agree with you that if `latest_root_materialization_time` is not null, then it is a much better reference point than infinite or zero. In your example with A and B models, my thinking is: if any downstream model requires fresh data, all upstream models should be materialized, so the freshness policy on B should trigger materialization of A, but A should not be materialized just because the daemon got started. I'd gladly create a GitHub issue for the idea.
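A minimal sketch of the fallback between the two reference points, in plain Python. It is not Dagster code: `staleness_reference` and `is_due` are hypothetical helpers, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def staleness_reference(latest_root_materialization_time: Optional[datetime],
                        daemon_start_time: datetime) -> datetime:
    """Prefer the latest root materialization; fall back to daemon start if roots are missing."""
    if latest_root_materialization_time is not None:
        return latest_root_materialization_time
    return daemon_start_time


def is_due(current_time: datetime, reference: datetime, maximum_lag: timedelta) -> bool:
    """True once the lag measured from the reference point reaches the allowed maximum."""
    return current_time - reference >= maximum_lag


daemon_start = datetime(2023, 1, 1, 1, 0)  # arbitrary timestamp, for illustration only
max_lag = timedelta(minutes=60)            # B's freshness window from the example

# Root A has never been materialized, so the daemon start time is the reference.
reference = staleness_reference(None, daemon_start)
due_at_130 = is_due(datetime(2023, 1, 1, 1, 30), reference, max_lag)
due_at_200 = is_due(datetime(2023, 1, 1, 2, 0), reference, max_lag)
print(due_at_130, due_at_200)
```

With this rule nothing runs when the daemon starts. Once B's window elapses, B becomes due, and at that point A would be materialized first because B needs it.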