Thanks again to both of you for the detailed responses! @Stephen Bailey I largely agree with the broader point that the FreshnessPolicy is (or should be) the main abstraction users focus on, and that more flexibility in that layer would be useful. One nuance I want to dig into is the idea of assets whose “staleness” is somewhat ambiguous.
This situation is actually quite common in the non-partitioned asset world. For example, say you have an online transactional database table (a source asset) that gets synced to an analytical database table (a regular asset). It is always possible to materialize the downstream table asset to pull in new data. You may be fairly indifferent to how often this sync happens (running it more often may have negligible cost implications, since each run only syncs new rows), so setting a FreshnessPolicy on that particular node may be undesirable. You just want that table updated at whatever times let downstream assets (which do have FreshnessPolicies) get the data they need in time.
I don’t think this pattern (which I would argue is a core use case) can coexist with “The reconciliation sensor should submit only and all stale assets for rerun”. Currently, we treat assets in that category as neither fresh nor stale. They’re not fresh because they have no FreshnessPolicy defined, and they’re not stale because there’s no upstream materialization event to be stale relative to (we just assume that the source asset’s data is always changing). Treating them as stale instead would be annoying: you’d have a bunch of assets marked as stale with no way to fix it. And if we had some default freshness policy for all assets, such an asset would probably be categorized as “always fresh”, which would also be undesirable, since then the reconciliation logic would never execute it.
At the risk of being pedantic, I think it’d be more accurate to say something like “The reconciliation sensor should only submit assets which are not fresh for rerun”. Under that rule, if you’re certain of the behavior you want for the regular asset above (maybe it really should only run once a day), you can simply set a FreshnessPolicy on that asset saying so, which prevents runs from being kicked off more often than you’ve specified. Otherwise, that asset is at the mercy of its downstream assets.
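To make the difference between the two rules concrete, here’s a toy Python model (not Dagster code). `ToyAsset`, `is_fresh`, `is_stale`, and the two selection functions are hypothetical names, and a FreshnessPolicy is reduced to a single maximum-lag number. Under the “only and all stale” rule, the synced analytics table is never selected. Under the “only not fresh” rule it stays eligible, while a daily-policy asset that is currently fresh is held back even though its upstream just updated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyAsset:
    """Hypothetical stand-in for an asset; not a Dagster type."""
    name: str
    # Stand-in for a FreshnessPolicy: maximum acceptable data lag in minutes (None = no policy).
    max_lag_minutes: Optional[float]
    # How far behind (in minutes) this asset's data currently is.
    current_lag_minutes: float
    # True if some upstream asset was materialized after this asset's latest materialization.
    upstream_materialized_since: bool


def is_fresh(asset: ToyAsset) -> bool:
    # Without a policy there is nothing to satisfy, so the asset is never "fresh".
    return asset.max_lag_minutes is not None and asset.current_lag_minutes <= asset.max_lag_minutes


def is_stale(asset: ToyAsset) -> bool:
    # Downstream of a source asset there is no upstream materialization event to be stale against.
    return asset.upstream_materialized_since


def eligible_if_stale(assets: List[ToyAsset]) -> List[str]:
    """Rule A: submit only and all stale assets."""
    return [a.name for a in assets if is_stale(a)]


def eligible_if_not_fresh(assets: List[ToyAsset]) -> List[str]:
    """Rule B: only assets that are not fresh may be submitted."""
    return [a.name for a in assets if not is_fresh(a)]


assets = [
    # Synced from an OLTP source asset: no policy and no upstream materialization events.
    ToyAsset("analytics_table", None, current_lag_minutes=90, upstream_materialized_since=False),
    # Daily policy, currently within its allowed lag, even though its upstream was just updated.
    ToyAsset("daily_report", 24 * 60, current_lag_minutes=120, upstream_materialized_since=True),
]

print(eligible_if_stale(assets))      # ['daily_report']
print(eligible_if_not_fresh(assets))  # ['analytics_table']
```

In the second rule, “not fresh” is only a precondition for eligibility; whether an eligible asset actually gets launched would still depend on what its downstream assets need.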
Some extra miscellaneous points:
• In the nearish future, we’re adding the ability to materialize any/all missing or stale assets (i.e. the things that show up as “stale” in Dagit) on a schedule. This is a more straightforward approach than the current reconciliation logic, but it doesn’t handle the situation described above, for the reasons already given. It also means waiting for your assets to become stale before Dagster fixes them, rather than Dagster anticipating when an asset will become stale and launching runs to keep it up to date before that happens. That said, this may be the kind of simple, “safe” functionality you’re looking for.
• In the mediumish future, we’re planning to move the reconciliation sensor into a daemon. That should make the reconciliation machinery feel less “special” (it’s a bit odd to have to add a special sensor to your repository/Definitions, and putting reconciliation-specific properties on an asset feels very off if the only thing that interprets them is that special sensor), and it also means reconciliation can work across code locations.
• Along those lines, I do think there are some properties that exist purely in the reconciliation domain rather than the freshness domain. The first is “should this asset be kicked off by reconciliation logic at all?” You may have a FreshnessPolicy on an asset indicating when you expect it to be available, but use custom logic to decide when it actually gets kicked off (so you don’t want Dagster launching that asset with its own logic). Another property concerns when an asset should be materialized. In many situations, there’s a large window of time during which it’d be acceptable to materialize an upstream asset and still get all the data you need. In general, given the FreshnessPolicies that are defined, it should be acceptable to wait past the first acceptable moment in order to minimize how many times that upstream asset needs to run (multiple downstream FreshnessPolicies may depend on it at different cadences). The reconciliation logic takes this into account for non-partitioned assets, but you might want a way to tell it not to for specific nodes. I think these properties could be hidden away more elegantly than in a whole new, complex policy to manage, but I did want to point them out.
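The “wait past the first acceptable moment” idea resembles the classic interval-stabbing problem, so here’s a rough sketch under that framing. It assumes (hypothetically) that each downstream freshness deadline can be turned into a window of start times during which one run of the upstream asset would satisfy it. `plan_materializations` and the window values are made up for illustration, and this is not how Dagster’s reconciliation logic is actually implemented. In this simplified model, visiting windows by their latest acceptable time and running as late as each uncovered window allows yields the minimum number of runs.

```python
from typing import List, Tuple

# Hypothetical model: each window (earliest, latest), in minutes from now, is a span of
# start times for one upstream run that would satisfy one downstream freshness deadline.
Window = Tuple[float, float]


def plan_materializations(windows: List[Window]) -> List[float]:
    """Return the fewest run times such that every window contains at least one.

    Greedy interval stabbing: visit windows by their latest acceptable time and, when a
    window is not yet covered, schedule a run as late as that window allows.
    """
    times: List[float] = []
    for earliest, latest in sorted(windows, key=lambda w: w[1]):
        # The last planned run is <= this window's latest (windows are sorted by latest),
        # so it covers this window exactly when it is not before the window's earliest.
        if times and times[-1] >= earliest:
            continue
        times.append(latest)
    return times


# Made-up windows from two downstream assets with different cadences.
windows = [(0, 45), (20, 90), (60, 120), (100, 180)]
print(plan_materializations(windows))  # [45, 120]
```

Here four windows are covered by two runs, whereas one run per window, launched at its earliest acceptable moment, would take four. A per-node “don’t delay this one” property would amount to skipping this optimization for that asset.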