#ask-community

Vitaly Markov

05/22/2023, 7:23 AM
Let's talk about internals. 🙂 I keep wondering: is there any way to get the latest materialization timestamp per asset partition without touching Dagster core?
I've spotted `get_asset_records`, which returns the latest materialization event, but only for one partition. It is also affected by backfills, so that event does not always belong to the most recent partition. I see `AssetStatusCacheValue`, which is available in the asset record. It holds information about partition statuses, but no timestamps. I see `get_materialization_count_by_partition`, which is very close, but it returns only `count(id)`, not `max(timestamp)`.
In theory, I can use a sensor cursor, call `get_event_records` directly, and react to materialization events of specific partitions. But I might have "one-to-many" dependencies, where one asset waits for changes in multiple upstream partitions. One isolated materialization event is not enough; I am looking for the full picture.
I've noticed the multi-asset sensor and the clever logic around consumed / unconsumed events there. It might be acceptable in theory, but I am going to have thousands of jobs monitored like this, and the dependency graph might change at any time. I am worried about performance and about the fragility of dependency checks when the asset graph changes between sensor evaluations. Ideally, all checks would run in one super-sensor or custom daemon using one to three SQL selects per tick for all assets in the repo.
Is there anything else I might have missed? Thank you!

claire

05/22/2023, 5:01 PM
Hey Vitaly. Unfortunately, we don't have a built-in way to fetch the latest materialization timestamp per partition from the instance. The easiest way at the moment is probably to call `get_event_records` per asset partition. You can try the multi-asset sensor, since it has built-in methods to fetch the latest unconsumed materialization per partition. The cursor stores partitioning information, so if your partitions change frequently, you may need to reset the cursor when that happens. Otherwise, I'd recommend filing an issue for your use case, since I agree it would be nice to support an instance method that fetches the latest materialization per partition.

Vitaly Markov

05/22/2023, 5:56 PM
Thanks @claire, I have almost finished a custom version of alternative scheduling logic. I had to read and reverse-engineer a lot of Dagster code in the process. In my view, a `partition_keys` table with the latest materialization event per partition (similar to the existing `asset_keys` table) is almost unavoidable. It is crucial for fully supporting all edge cases for auto-materializing assets. For example: https://github.com/dagster-io/dagster/issues/14385 I'll do some more tests and create an issue in a few days. Thank you!
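To make the `partition_keys` idea concrete, here is a minimal sketch of what such a table and its upsert might look like. Everything in it is an assumption for illustration: the column names, the `record_materialization` and `latest_for_asset` helpers, and the use of SQLite (3.24 or newer, for `ON CONFLICT ... DO UPDATE`) are not taken from Dagster.

```python
import sqlite3

# Hypothetical table: one row per (asset_key, partition_key) holding the
# timestamp of the most recent materialization. Not Dagster's schema.
PARTITION_KEYS_SCHEMA = """
CREATE TABLE partition_keys (
    asset_key TEXT NOT NULL,
    partition_key TEXT NOT NULL,
    last_materialization_ts REAL NOT NULL,
    PRIMARY KEY (asset_key, partition_key)
)
"""

# Insert a new row, or keep the larger timestamp if the row already exists,
# so that events arriving out of order never move the timestamp backwards.
UPSERT = """
INSERT INTO partition_keys (asset_key, partition_key, last_materialization_ts)
VALUES (?, ?, ?)
ON CONFLICT (asset_key, partition_key) DO UPDATE
SET last_materialization_ts = MAX(last_materialization_ts,
                                  excluded.last_materialization_ts)
"""


def record_materialization(conn, asset_key, partition_key, ts):
    """Update the per-partition row when a materialization event is stored."""
    conn.execute(UPSERT, (asset_key, partition_key, ts))


def latest_for_asset(conn, asset_key):
    """Return {partition_key: last_materialization_ts} for a single asset."""
    cursor = conn.execute(
        "SELECT partition_key, last_materialization_ts "
        "FROM partition_keys WHERE asset_key = ?",
        (asset_key,),
    )
    return dict(cursor.fetchall())
```

With a table like this, reading the latest timestamp for every partition of an asset becomes a primary-key lookup instead of an aggregate over the whole event log.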

claire

05/22/2023, 6:25 PM
We've definitely thought about creating a new asset keys / partition keys table as you mentioned, which may be necessary to support staleness for partitioned assets. I think this is likely something we will eventually need to implement, but it isn't on our immediate roadmap.

Vitaly Markov

05/23/2023, 11:02 AM
@claire, raised it: https://github.com/dagster-io/dagster/issues/14406 In theory, I could make a PR myself with some guidance and examples of what it should look like according to the latest Dagster standards, especially regarding testing and migrations.