A question about combining an asset with `output required=Fa dagster #ask-community

Gabe Schine

01/23/2023, 5:23 AM

A question about combining an asset with `output_required=False` with a `FreshnessPolicy` and `asset_reconciliation_sensor`: if the asset job is written to only materialize (`yield Output`) when some source data on the internet changes, but the `FreshnessPolicy` is set to "1 hour", what happens when the last materialization was 10 hours ago but the source data hasn't changed yet?

Gabe Schine

01/23/2023, 5:24 AM

Will the sensor go crazy and launch a run every 30 seconds?

Gabe Schine

01/26/2023, 12:26 AM

It would appear that combining `asset_reconciliation_sensor` with an asset that has a `FreshnessPolicy` and does not always return a result is not a good idea. Once the asset becomes stale, the sensor kicks off a run every time it fires.
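
The behavior described above can be sketched as a toy model. This is not Dagster's implementation: `count_launched_runs` and its parameters are hypothetical names, and it assumes the sensor ticks every 30 seconds (the interval from the question), that each run finishes before the next tick, and that the asset never yields an `Output` because the source data has not changed.

```python
from datetime import datetime, timedelta


def count_launched_runs(
    last_materialized: datetime,
    start: datetime,
    duration: timedelta,
    tick: timedelta = timedelta(seconds=30),      # assumed sensor interval
    maximum_lag: timedelta = timedelta(hours=1),  # the "1 hour" freshness policy
) -> int:
    """Count runs launched when a stale asset triggers a run on every tick."""
    runs = 0
    now = start
    end = start + duration
    while now < end:
        if now - last_materialized > maximum_lag:
            # A run is launched, but the asset skips its output, so
            # last_materialized never advances and the asset stays stale.
            runs += 1
        now += tick
    return runs


start = datetime(2023, 1, 23, 5, 0)
runs = count_launched_runs(
    last_materialized=start - timedelta(hours=10),
    start=start,
    duration=timedelta(hours=1),
)
print(runs)  # 120: one run per 30-second tick for an hour
```

Under these assumptions, the stale asset costs one run per tick until the source data finally changes.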

yuhan

01/30/2023, 7:37 PM

> Will the sensor go crazy and launch a run every 30 seconds?

The sensor can handle kicking off runs every 30 seconds.

yuhan

01/30/2023, 7:37 PM

cc @sandy regarding combining `asset_reconciliation_sensor` and `FreshnessPolicy`

sandy

01/30/2023, 8:40 PM

@owen thoughts?

owen

01/30/2023, 8:44 PM

hi @Gabe Schine! I think the ideal way of handling this situation (which is not currently possible, but something we're actively working on) would be with an observable source asset representing that upstream source data on the internet. The reconciliation sensor currently does not take these source asset versions into account when deciding whether or not to kick off a run, resulting in the behavior you're seeing, but we hope to remedy this in the near future

Gabe Schine

01/30/2023, 11:48 PM

Thanks @owen. The link you included goes to a Slack workspace that I don't have access to. Is that the right link?

owen

01/31/2023, 6:04 PM

@Gabe Schine ah sorry -- definitely not the right link haha, updated it (and here it is again: https://docs.dagster.io/concepts/assets/asset-observations#observable-source-assets)

Gabe Schine

01/31/2023, 7:21 PM

Got it, yes. Thank you. Am I reading it correctly that this is useful if determining the source asset "version" can be done inexpensively compared with downloading the data itself? In my case, I have to download the content and hash it to generate a revision, at which point I may as well materialize the asset itself. Is my thinking logical here?
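
A minimal sketch of the download-and-hash check described above, using only the standard library. The names `content_revision` and `should_materialize` are hypothetical, and the byte strings stand in for content that would really be downloaded. It illustrates the cost concern: computing the revision already requires the full content.

```python
import hashlib
from typing import Optional, Tuple


def content_revision(content: bytes) -> str:
    """Derive a revision identifier by hashing the full downloaded content."""
    return hashlib.sha256(content).hexdigest()


def should_materialize(content: bytes, last_revision: Optional[str]) -> Tuple[bool, str]:
    """Return (changed, revision); changed is True when the hash differs."""
    revision = content_revision(content)
    return revision != last_revision, revision


# Stand-ins for three successive downloads of the same source.
changed_first, rev1 = should_materialize(b"source data v1", None)
changed_second, rev2 = should_materialize(b"source data v1", rev1)
changed_third, rev3 = should_materialize(b"source data v2", rev2)

print(changed_first, changed_second, changed_third)  # True False True
```

Because the whole payload is needed before the hash exists, a separate "version check" step would save nothing over deciding inside the asset itself.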

owen

01/31/2023, 7:27 PM

ah interesting -- that's accurate (the assumption is definitely that it's much easier to generate a version than materialize an asset), so it seems like this might not be a great fit. I'd have to think about this more, but it's possible that we should apply the same treatment to "skipped" assets (such as your case) as we do to "failed" assets. When an asset materialization fails, we "pretend" that the materialization actually succeeded when determining if we should kick off a run (which prevents the sensor from repeatedly kicking off a run for a failing asset). By analogy, if you had a 1hr freshness policy on your asset, the sensor wouldn't care if the asset was skipped or not, it'd just attempt to materialize that asset approximately once an hour
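
The proposed treatment can be illustrated with a toy model comparing the two decision rules. This is only a sketch of the idea, not the sensor's actual logic: `simulate_runs` is a hypothetical function, time is modeled in plain seconds, and it assumes a 30-second tick, a 1-hour freshness policy, and an asset that skips its output on every run.

```python
def simulate_runs(
    treat_skip_as_materialized: bool,
    hours: int = 3,
    tick_s: int = 30,       # assumed sensor interval, in seconds
    max_lag_s: int = 3600,  # 1-hour freshness policy, in seconds
) -> int:
    """Count runs launched over `hours` when every run skips its output."""
    last_materialized = -10 * 3600  # last real materialization: 10 hours before t=0
    last_attempt = None
    runs = 0
    for now in range(0, hours * 3600, tick_s):
        reference = last_materialized
        if treat_skip_as_materialized and last_attempt is not None:
            # "Pretend" the skipped run succeeded, as is done for failed runs.
            reference = max(reference, last_attempt)
        if now - reference > max_lag_s:
            runs += 1
            last_attempt = now
    return runs


print(simulate_runs(treat_skip_as_materialized=False))  # 360: a run on every tick
print(simulate_runs(treat_skip_as_materialized=True))   # 3: roughly one run per hour
```

In this model, counting a skipped run as an attempt reduces the launches from one per tick to about one per freshness window.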

Gabe Schine

01/31/2023, 7:30 PM

right - that'd make more sense for my use-case, and possibly for others as well: if the materialization op is smart enough to determine if an asset should or should not be materialized, in my mind the asset should be considered "still fresh" if the op determines it shouldn't be re-materialized

👍 1

Gabe Schine

01/31/2023, 7:30 PM

my short-term fix is just to use schedules instead of freshness policy, and rely on the reconciliation sensor for downstream asset coordination

👍 1

owen

01/31/2023, 7:33 PM

got it, that's exactly the workaround I'd recommend for now 🙂
