# ask-community
g
A question about combining an asset with `output_required=False` with a `FreshnessPolicy` and `asset_reconciliation_sensor`: if the asset job is written to only materialize (`yield` an `Output`) when some source data on the internet changes, but the `FreshnessPolicy` is set to "1 hour", what happens when the last materialization was 10 hours ago but the source data hasn't changed, yet?
Will the sensor go crazy and launch a run every 30 seconds?
It would appear that combining `asset_reconciliation_sensor` with an asset that has a `FreshnessPolicy` and does not always return a result is not a good idea. The sensor is kicking off a run every time the sensor fires once the asset becomes stale.
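To make the observed behavior concrete, here is a toy model in plain Python. It is not Dagster code and does not use any Dagster API; the function name `count_launches`, the 30-second tick, and the rule "launch whenever the asset is stale" are assumptions taken from the description above.

```python
from datetime import datetime, timedelta


def count_launches(last_materialized, start, end,
                   tick=timedelta(seconds=30),
                   max_lag=timedelta(hours=1)):
    """Hypothetical model: launch one run on every tick at which the asset is stale.

    Every run is assumed to skip (the source data has not changed), so no
    Output is yielded and last_materialized never advances.
    """
    launches = 0
    now = start
    while now < end:
        if now - last_materialized > max_lag:
            launches += 1  # run is launched, but it records no materialization
        now += tick
    return launches


start = datetime(2023, 1, 1, 10, 0, 0)
launches = count_launches(
    last_materialized=start - timedelta(hours=10),
    start=start,
    end=start + timedelta(hours=1),
)
print(launches)  # 120: one launch per 30-second tick for a full hour
```

Under these assumptions the asset never stops being stale, so every tick launches a run.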
y
> Will the sensor go crazy and launch a run every 30 seconds?
The sensor can handle kicking off runs every 30 seconds.
cc @sandy regarding combining `asset_reconciliation_sensor` and `FreshnessPolicy`s
@owen thoughts?
o
hi @Gabe Schine! I think the ideal way of handling this situation (which is not currently possible, but something we're actively working on) would be with an observable source asset representing that upstream source data on the internet. The reconciliation sensor currently does not take these source asset versions into account when deciding whether or not to kick off a run, resulting in the behavior you're seeing, but we hope to remedy this in the near future
g
Thanks @owen. The link you included goes to a Slack workspace that I don't have access to. Is that the right link?
o
@Gabe Schine ah sorry -- definitely not the right link haha, updated it (and here it is again: https://docs.dagster.io/concepts/assets/asset-observations#observable-source-assets)
g
Got it, yes. Thank you. Am I reading it correctly that this is useful if determining the source asset "version" can be done in an inexpensive manner compared with downloading the data itself? The way mine are I have to download the content and hash it to generate a revision, at which point I may as well materialize the asset itself. Is my thinking logical here?
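A minimal sketch of the download-and-hash approach described above, in plain Python. The helper names `content_version` and `should_materialize` are hypothetical, and SHA-256 is an assumed choice of hash; the point is only that the version cannot be computed without already holding the full content.

```python
import hashlib


def content_version(content: bytes) -> str:
    """Derive a revision string from the downloaded bytes (assumed: SHA-256)."""
    return hashlib.sha256(content).hexdigest()


def should_materialize(content: bytes, previous_version):
    """Return (changed, version); changed is True when the hash differs."""
    version = content_version(content)
    return version != previous_version, version


# The content has to be fully downloaded before either call can run.
changed, version = should_materialize(b"example payload", None)
unchanged_again, same_version = should_materialize(b"example payload", version)
```

Here the first call reports a change because there is no previous version, and the second reports none because the bytes are identical.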
o
ah interesting -- that's accurate (the assumption is definitely that it's much easier to generate a version than materialize an asset), so it seems like this might not be a great fit. I'd have to think about this more, but it's possible that we should apply the same treatment to "skipped" assets (such as your case) as we do to "failed" assets. When an asset materialization fails, we "pretend" that the materialization actually succeeded when determining if we should kick off a run (which prevents the sensor from repeatedly kicking off a run for a failing asset). By analogy, if you had a 1hr freshness policy on your asset, the sensor wouldn't care if the asset was skipped or not, it'd just attempt to materialize that asset approximately once an hour
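The idea of treating a skipped run like a failed one can be sketched with a toy model in plain Python. This is not the sensor's actual implementation: the function name `count_launches_with_skip_tracking` and the `last_attempt` bookkeeping are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def count_launches_with_skip_tracking(last_materialized, start, end,
                                      tick=timedelta(seconds=30),
                                      max_lag=timedelta(hours=1)):
    """Hypothetical model: a skipped run still counts as an attempt.

    Staleness is measured from the last attempt rather than from the last
    successful materialization, so a skip resets the clock.
    """
    last_attempt = last_materialized
    launches = 0
    now = start
    while now < end:
        if now - last_attempt > max_lag:
            launches += 1
            last_attempt = now  # pretend the skipped run materialized the asset
        now += tick
    return launches


start = datetime(2023, 1, 1, 10, 0, 0)
launches = count_launches_with_skip_tracking(
    last_materialized=start - timedelta(hours=10),
    start=start,
    end=start + timedelta(hours=3),
)
print(launches)  # 3: roughly one attempt per hour over a three-hour window
```

With the same 30-second tick and 1-hour lag, this model launches about once an hour instead of on every tick.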
g
right - that'd make more sense for my use-case, and possibly for others as well: if the materialization op is smart enough to determine if an asset should or should not be materialized, in my mind the asset should be considered "still fresh" if the op determines it shouldn't be re-materialized
my short-term fix is just to use schedules instead of freshness policy, and rely on the reconciliation sensor for downstream asset coordination
o
got it, that's exactly the workaround I'd recommend for now 🙂