Philippe Laflamme (04/24/2023, 9:00 PM)
I'm using AutoMaterializePolicy, more or less like so:
```python
from dagster import AutoMaterializePolicy, FreshnessPolicy, asset

@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=24 * 60, cron_schedule="2 15 * * *"),
       auto_materialize_policy=AutoMaterializePolicy.lazy())
def upstream_a():
    pass

@asset(partitions_def=some_partitions)
def upstream_b():
    pass

@asset(partitions_def=some_partitions, auto_materialize_policy=AutoMaterializePolicy.eager())
def downstream(upstream_a, upstream_b):
    pass
```
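(Aside: a rough reading of the FreshnessPolicy above, sketched in plain Python rather than Dagster internals; the dates are illustrative.)

```python
from datetime import datetime, timedelta

# Toy arithmetic for the policy above: by each cron tick (15:02 daily),
# upstream_a should incorporate all data from at most 24 hours (the
# max-lag setting) before that tick.
cron_tick = datetime(2023, 4, 24, 15, 2)
max_lag = timedelta(minutes=24 * 60)
must_cover_through = cron_tick - max_lag
print(must_cover_through)  # 2023-04-23 15:02:00
```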
I've seen downstream be launched at least once with created_by:auto_materialize, so this works at least sometimes. But today I noticed a run for upstream_a being automatically triggered and a sensor updating upstream_b more or less concurrently. Both runs produced their corresponding ASSET_MATERIALIZATION events, but no run for downstream has been scheduled (this was more than 30 minutes ago). What are my options to debug this?

owen (04/24/2023, 11:36 PM)
• is some_partitions a TimeWindowPartitionsDefinition (e.g. DailyPartitionsDefinition)?
• when you say that upstream_b is updated by a sensor, what does that sensor look like?
• can you confirm that the partition that you want to get kicked off for downstream is present in upstream_b?
The main reason why a run of downstream would not be kicked off even though it's eager and a parent has updated is that the corresponding upstream partition is missing (and so if a run was kicked off, it'd likely have incomplete data), so that's the first thing I'd want to check.

Philippe Laflamme (04/24/2023, 11:54 PM)
> is some_partitions a TimeWindowPartitionsDefinition (e.g. DailyPartitionsDefinition)?
Yes. It is a DailyPartitionsDefinition with a start_date, a timezone, and end_offset=2.
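(Aside: a plain-Python sketch, not Dagster's code, of what end_offset does to a daily partition set; with end_offset=2 the newest partition key is tomorrow's date.)

```python
from datetime import date, timedelta

def newest_daily_partition(today: date, end_offset: int) -> date:
    # Toy model: with end_offset=0 the newest daily partition is the last
    # complete day (yesterday); each extra unit extends the set by one day.
    return today + timedelta(days=end_offset - 1)

print(newest_daily_partition(date(2023, 4, 24), 0))  # 2023-04-23 (yesterday)
print(newest_daily_partition(date(2023, 4, 24), 2))  # 2023-04-25 (tomorrow)
```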
> when you say that upstream_b is updated by a sensor, what does that sensor look like?
It's a function annotated with @sensor which uses some_job.run_request_for_partition(...), where some_job contains 2 assets, the "second" asset being upstream_b (i.e. upstream_b has its own upstream asset that runs in the same job).
> can you confirm that the partition that you want to get kicked off for downstream is present in upstream_b?
Yes, looking at the Dagster UI, I can see the partition, its materialization event, and the job that materialized it (i.e. some_job).
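(Aside: the missing-parent rule owen described can be pictured with a small plain-Python sketch; the names and the dict shape are made up for illustration, not Dagster internals.)

```python
def eager_decision(partition, parents, is_materialized):
    """Toy model of the eager rule: skip a downstream asset when any
    required parent partition has never been materialized; otherwise
    materialize. `is_materialized` maps (asset_name, partition) -> bool."""
    missing = [p for p in parents if not is_materialized.get((p, partition))]
    if missing:
        return "skip", f"missing parent partitions: {missing}"
    return "materialize", "all parents present"

# upstream_a has landed for the partition, upstream_b has not:
state = {("upstream_a", "2023-04-24"): True}
print(eager_decision("2023-04-24", ["upstream_a", "upstream_b"], state))
```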
Another datapoint (which hopefully is relevant): upstream_a was materialized again 2h later for no apparent reason. It auto-materialized at 4:20pm (which I expected) and then again at 6:02pm, but no other job kicked off around that time.

owen (04/24/2023, 11:55 PM)
owen (04/24/2023, 11:56 PM)
Philippe Laflamme (04/24/2023, 11:57 PM)
upstream_a is the only one with a FreshnessPolicy; downstream is a leaf.

owen (04/24/2023, 11:57 PM)
Philippe Laflamme (04/24/2023, 11:57 PM)
sensor

Philippe Laflamme (04/24/2023, 11:57 PM)
owen (04/24/2023, 11:58 PM)
Philippe Laflamme (04/24/2023, 11:59 PM)
upstream_b consumes its "parent" as a dependency.

owen (04/24/2023, 11:59 PM)
Philippe Laflamme (04/25/2023, 12:04 AM)
some_job was kicked off at 4:19:45pm; upstream_b's materialization event happened at 4:20:57pm. upstream_a was kicked off as an ad-hoc materialization (no job) at 4:20:05pm and its materialization event happened at 4:20:18pm. So upstream_a's event occurred before upstream_b's according to this, but still relatively close to one another, if that matters.

Philippe Laflamme (04/25/2023, 12:06 AM)
Not sure why upstream_a was kicked off though, to be honest. Its freshness was definitely out of date and it's lazy, so I figure that's why?

Philippe Laflamme (04/25/2023, 12:07 AM)
owen (04/25/2023, 12:08 AM)
owen (04/25/2023, 12:10 AM)
Philippe Laflamme (04/25/2023, 2:29 AM)
• (end_offset=1) I don't think it should matter, but it's relevant for the next point
• tomorrow's partition (end_offset=2) was correctly auto-materialized just now (~10:20pm). The sequence was as expected: the sensor detected the data, kicked off a some_job run which materialized upstream_b (and its parent), and then something kicked off a run for downstream. As expected, upstream_a was not materialized since it was within its freshness policy.

Philippe Laflamme (05/29/2023, 2:42 PM)
I've upgraded to 1.3.6 and am still seeing this problem on occasion. One thing to note is that my daemon doesn't run 24/7. When I leave things running, the problem doesn't seem to occur. It usually occurs when I start the daemon after it hasn't been running for several hours (say 12 to 16 hours). In that situation, when the daemon starts, my sensors start scheduling runs for various assets, and the "auto-materialize for freshness" runs get scheduled, but the "eager auto-materialize" assets do not get kicked off after their parent assets get materialized (which were runs scheduled by the sensors).

owen (06/05/2023, 11:52 PM)
… dagster instance migrate before any data gets written, but I think this would help a ton in understanding what's going on here.
Whenever a parent of an eager auto-materialize asset is materialized, that asset will be evaluated to see if it makes sense to materialize it as well. In most cases it will, but there are some exceptions, for example if any of its parents are missing, or if any of its parents have out-of-date data. Regardless of whether it's materialized or skipped, a reason will be recorded and visible in the UI.

owen (06/05/2023, 11:53 PM)
Philippe Laflamme (06/06/2023, 2:05 PM)
owen (06/06/2023, 4:15 PM)
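(Aside: a plain-Python replay of the timestamps from the 12:04 AM message, under the assumption, suggested by owen's missing-parent explanation, that an eager evaluation landing between the two parent materialization events would see upstream_b's partition as missing; this is an illustrative model, not Dagster's actual daemon logic.)

```python
from datetime import datetime

# Parent materialization events as reported in the thread.
events = {
    "upstream_a": datetime(2023, 4, 24, 16, 20, 18),  # 4:20:18pm
    "upstream_b": datetime(2023, 4, 24, 16, 20, 57),  # 4:20:57pm
}

def downstream_would_run(evaluation_time, parents=("upstream_a", "upstream_b")):
    """Assumed model: an eager evaluation only requests downstream when
    every parent already has a materialization event at that moment."""
    return all(events[p] <= evaluation_time for p in parents)

# An evaluation between the two events sees upstream_b as missing:
print(downstream_would_run(datetime(2023, 4, 24, 16, 20, 30)))  # False
# One after both events would request downstream:
print(downstream_would_run(datetime(2023, 4, 24, 16, 21, 0)))   # True
```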