# ask-community
a
I am wondering what's the idiomatic solution to the following scenario: I have a daily-partitioned asset that's kicked off by a sensor when an external success file becomes available. However, there are occasions when the file is delayed and the day rolls over. What would be the preferred way to (attempt to) backfill, say, the last N days in addition to the daily sensor-based schedule?
c
Essentially you need a way to determine the date that corresponds to the external success file. If you had that, you could just issue multiple run requests with partitions matching each date. Could you reasonably gather that information from your file?
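If the success file lands under a dated path, the date can be recovered from the path itself. A minimal sketch, assuming a hypothetical `data/<YYYY-MM-DD>/_SUCCESS` layout (adjust the parsing to whatever naming convention you actually use):

```python
from datetime import date
from pathlib import PurePosixPath

def partition_from_success_file(path: str) -> str:
    """Extract a daily partition key from a dated success-file path.

    Assumes a hypothetical layout like 'data/2024-03-15/_SUCCESS'.
    """
    day = PurePosixPath(path).parent.name        # e.g. "2024-03-15"
    # Round-trip through `date` to validate and normalize the format.
    return date.fromisoformat(day).isoformat()

print(partition_from_success_file("data/2024-03-15/_SUCCESS"))  # 2024-03-15
```

Each recovered date can then become the partition key of its own run request.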
a
The success file is a daily one, so not a ton of info there. I guess a poor man's approach would be to establish: 1. the missing daily partitions, 2. the days where the success file does exist, and then issue run requests for the intersection. Although, I am not entirely sure which API would give me (1). Finally, should this job/op be run on a reasonable cron schedule?
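The intersection logic itself is just set arithmetic. A sketch in plain Python, assuming you can list both sets of dates as ISO strings (the function name and the `lookback_days` window are illustrative, not a Dagster API):

```python
from datetime import date, timedelta

def partitions_to_backfill(materialized: set[str],
                           success_dates: set[str],
                           today: date,
                           lookback_days: int = 7) -> list[str]:
    """Daily partitions in the last N days that have a success file
    but no materialization yet (steps 1 and 2, intersected)."""
    window = {
        (today - timedelta(days=d)).isoformat()
        for d in range(1, lookback_days + 1)
    }
    missing = window - materialized          # step 1: not yet materialized
    return sorted(missing & success_dates)   # step 2: success file exists

print(partitions_to_backfill(
    materialized={"2024-03-14"},
    success_dates={"2024-03-13", "2024-03-14"},
    today=date(2024, 3, 15),
    lookback_days=2,
))  # ['2024-03-13']
```

One run request per returned partition key then covers the delayed days.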
c
Could you check the file metadata to see when it was constructed?
a
Yes. The success files are date-partitioned, so this gives me (2). I am not sure, though, how to programmatically generate which partitions are missing for a given asset (i.e., 1).
c
ah I see. I guess a few things could work here: `context.instance.get_latest_materialization_event` will give you the most recent materialization of your asset. If you don't anticipate needing or wanting to fill in blanks beyond the previous day, that could be used to pinpoint whether that partition needs to be run. Otherwise you could fall back to `context.instance.get_event_records`, with a filter to retrieve only asset materialization events, and launch runs for any files that do not yet have a materialized partition.
a
Ah, lovely! Those look like perfect starting points for my Google searches. Many thanks, Chris!
c
blob salute