Just reading up on the <https docs dagster io concepts asset dagster #dagster-feedback

Just reading up on the <AutoMaterializePolicy> fea...

Stephen Bailey

05/11/2023, 7:12 PM

Just reading up on the AutoMaterializePolicy features, very cool, and natural extension. Are there plans to extend it to other triggers, like schedule- and event-based? I can imagine the combination of

AutomaterializePolicy.cron_schedule("0 0 * * *")

and

AutomaterializePolicy.eager()

could solve a ton of use cases without ever having to learn about sensors and schedules classes.

johann

05/12/2023, 7:22 PM

Interesting. So far we’ve been thinking of it as a separate thing from sensors and schedules (though they’re all clearly in some instigator category). We have seen confusion arising from combing AutoMaterializePolicies with schedules though

sandy

05/12/2023, 8:25 PM

Is the idea that the assets that you'd want to put

AutoMaterializePolicy.cron_schedule("0 0 * * *")

on are at the root of the asset graph? What determines cadence you end up wanting to refresh an asset like that on? E.g. is source data that the asset is derived from that gets refreshed daily? Is there source data that gets updated continuously, but the asset doesn't need to incorporate it immediately?

Stephen Bailey

05/15/2023, 4:27 PM

I'd say I'm basically trying to remove the overhead of creating jobs, schedules, or sensors in order for a developer to deploy a new asset. Currently, they have to think about not only the asset itself, but also a

define_asset_job

AssetSelection

(or other relative python imports),

schedules

, etc. So deploying a new, simple asset brings a lot Dagster framework overhead.

AutomaterializePolicy

could make it so that any new asset can be deployed by only configuring it at the asset level in sufficiently simple circumstances

sandy

05/16/2023, 12:21 AM

I'd say I'm basically trying to remove the overhead of creating jobs, schedules, or sensors in order for a developer to deploy a new asset.

That makes total sense. I share this goal. What I'm curious about here is how annoying it would be for the developer to express the schedule for the root asset in a "declarative" way, i.e. in terms of either: 1. When source data is available 2. When derived data is required to be up-to-date Example of (1): "refresh the events table whenever the raw_events table is modified"

Copy code

@observable_source_asset(auto_observe_interval_minutes=30)
def raw_events():
    return get_last_modified_timestamp("raw_events_table")

@asset(non_argument_deps={"raw_events"}, auto_materialize_policy=AutoMaterializePolicy.eager())
def events():
    ...

Example of (2): "the events table should never be more than 24 hours out of date"

Copy code

@asset(
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=24 * 60),
    auto_materialize_policy=AutoMaterializePolicy.lazy(),
)
def events():
    ...

Both of these are more code than what you're suggesting and don't map to it 100%, so I'm not convinced they're better. Just trying to understand how far apart your mental model is from the mental model of the current declarative scheduling system.

Stephen Bailey

05/16/2023, 12:57 AM

I guess my mental model sees both "refreshed-on-a-cron-schedule" and "refreshed-with-respect-to-some-dataset-property" as both instances of declarative scheduling? I get that it deviates from the freshness policy mental model, but I'll confess I have trouble mapping freshness policies -- in almost all cases we are continuously receiving / ingesting data, so there's not really a difference between giving it a cron schedule and saying, "check for new data every 30 minutes". The pain I'm seeing is that schedules are being set in places other than where the asset itself is declared, which makes it harder to build and debug.

sandy

05/16/2023, 3:31 AM

Got it - that makes sense

👍 1

sandy

05/16/2023, 10:38 PM

I filed an issue to track this request: https://github.com/dagster-io/dagster/issues/14328

❤️ 2

Vinnie

05/17/2023, 9:47 AM

Piggybacking to say that this would be an amazing feature. As Stephen said, I’d love to say goodbye to jobs entirely and shift the mental model and the entire data platform to an asset-only one. I currently only have asset jobs for the root assets and everything else gets auto-materialized, but removing this initial layer would be even better!

Open in Slack

Previous Next