Thanks again to both of you for the detailed responses! @Stephen Bailey I am largely in agreement with the broader point that the FreshnessPolicy is/should be the main abstraction that users focus on, and that more flexibility in that layer would be useful. One nuance that I want to dig into is the idea of assets that have somewhat ambiguous "staleness" values.
This situation is actually quite common in the non-partitioned asset world. For example, if you have an online transactional database table (source asset) which gets sync'd to an analytical database table (regular asset), it is always possible to materialize the downstream table asset in order to pull in new data. It may be the case that you're fairly indifferent to how often this syncing operation takes place (running it more often may have negligible cost implications, as each run only updates new rows), so setting a FreshnessPolicy on that particular node may be undesirable. You just want that table to be updated at whatever times allow downstream assets (which do have FreshnessPolicies) to get the data that they need in time.
I don't think this pattern (which I would argue is a core use case) can coexist with "The reconciliation sensor should submit only and all stale assets for rerun". Currently, we treat assets in that category as neither fresh nor stale. They're not fresh because they have no freshness policy defined, and not stale because there's no upstream materialization event (we're just assuming that the upstream data for the source asset is always changing) and... well, it'd be annoying to have a bunch of assets marked as stale with no way to fix it. If we had some default freshness policy for all assets, these assets would probably be categorized as "always fresh", which would be undesirable, as they'd then never be executed by the reconciliation logic.
At the risk of being too pedantic, I think it'd potentially be more accurate to say something more like "The reconciliation sensor should only submit assets which are not fresh for rerun". In this case, if you are certain of your desired behavior for that aforementioned regular asset (maybe it really should be only run once a day), then you could simply set a FreshnessPolicy on that asset specifying that fact, and prevent runs from being kicked off at a higher frequency than you've specified. Otherwise, that asset is at the whims of its downstream assets.
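To make the difference between the two phrasings concrete, here's a toy model in plain Python. It is not Dagster's API or its actual reconciliation logic: `Asset`, `is_fresh`, `is_stale`, and both rule functions are hypothetical names made up for illustration. The assumptions are that an asset can only be "fresh" relative to a policy, and that it only counts as "stale" when an upstream materialization event is newer than it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Asset:
    name: str
    # Maximum lag (minutes) allowed by a freshness policy; None means no policy.
    max_lag_minutes: Optional[float] = None
    # How far (minutes) the asset's data currently lags behind.
    current_lag_minutes: float = 0.0
    # True only if an upstream materialization event is newer than this asset.
    upstream_materialized_since: bool = False


def is_fresh(asset: Asset) -> bool:
    # Without a policy there is nothing to be "fresh" against.
    if asset.max_lag_minutes is None:
        return False
    return asset.current_lag_minutes <= asset.max_lag_minutes


def is_stale(asset: Asset) -> bool:
    return asset.upstream_materialized_since


def submit_only_and_all_stale(assets: List[Asset]) -> List[str]:
    # Rule 1: "submit only and all stale assets for rerun".
    return [a.name for a in assets if is_stale(a)]


def eligible_when_not_fresh(assets: List[Asset]) -> List[str]:
    # Rule 2: "only submit assets which are not fresh" (eligibility, not obligation).
    return [a.name for a in assets if not is_fresh(a)]


assets = [
    Asset("synced_table"),  # no policy; its source never emits materialization events
    Asset("hourly_report", max_lag_minutes=60, current_lag_minutes=90),
    Asset("daily_rollup", max_lag_minutes=1440, current_lag_minutes=30),
]

print(submit_only_and_all_stale(assets))  # []
print(eligible_when_not_fresh(assets))    # ['synced_table', 'hourly_report']
```

Under the first rule nothing gets submitted, so `hourly_report` has no way to pick up new data through `synced_table`. Under the second, `synced_table` is eligible to run whenever a downstream asset needs it, while `daily_rollup` is protected from extra runs because its policy is currently satisfied.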
Some extra miscellaneous points:
• In the nearish future, we are adding the capability to materialize any/all missing/stale assets (i.e. things that show up as "stale" in Dagit) on a schedule. This is a more straightforward approach to the problem than the current reconciliation logic, but doesn't work for the situation described above (for the reasons detailed above). It also means that you need to wait for your assets to become stale before Dagster fixes them, rather than Dagster anticipating when the asset will become stale and launching runs to keep assets up to date before that happens. However, this may be the sort of nice "safe" functionality you're looking for.
• In the mediumish future, we're planning on moving the reconciliation sensor into a daemon, which should not only make the reconciliation stuff feel less "special" (it's a bit weird to have to add a special sensor to your repository/Definitions, and having reconciliation-specific properties on an asset feels very off if the only thing that interprets those properties is that special sensor), but also allow it to work across code locations.
• Along those lines, I do think that there are some properties that do purely exist in the reconciliation domain, rather than the freshness domain. The first one is the property of "should this asset be allowed to be kicked off by reconciliation logic at all". You may have a FreshnessPolicy on something indicating when you expect it to be available by, but have some custom logic for determining when it needs to get kicked off (and so you don't want Dagster to try to kick off that asset with its own logic). Another property would concern when an asset should be materialized. In many situations, there's a large window of times during which it'd be acceptable to materialize an upstream asset in order to get all of the data that you need. In general, given the FreshnessPolicy that's defined, it should be acceptable to wait past the first acceptable moment in order to minimize the number of times that that upstream asset needs to get executed (multiple downstream freshness policies may depend on this upstream asset at different cadences). The reconciliation logic takes this into account (for non-partitioned assets), but you might want some way of telling it not to for specific nodes. I think these properties can potentially be hidden away in more elegant ways than a whole new complex policy to manage, but I did want to point that out.
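The "wait past the first acceptable moment" idea in the last point can be sketched as a small interval problem. This is only an illustration under simplifying assumptions, not how the reconciliation logic is actually implemented: assume each downstream freshness requirement reduces to a window `(earliest, latest)`, in minutes, during which materializing the upstream asset would satisfy it. `pick_run_times` is a hypothetical helper that delays each run to the latest moment the most urgent uncovered window allows, so that a single run can serve as many windows as possible.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (earliest, latest) acceptable materialization time


def pick_run_times(windows: List[Window]) -> List[float]:
    """Return a smallest set of run times such that every window contains one."""
    run_times: List[float] = []
    # Handle windows in order of their deadline.
    for earliest, latest in sorted(windows, key=lambda w: w[1]):
        # Skip windows already covered by the most recently chosen run.
        if run_times and earliest <= run_times[-1] <= latest:
            continue
        # Otherwise wait as long as this window allows before running.
        run_times.append(latest)
    return run_times


# Four downstream requirements with overlapping windows on one upstream asset.
windows = [(0, 60), (30, 90), (50, 120), (100, 180)]
print(pick_run_times(windows))  # [60, 180]
```

In this example two upstream runs cover all four windows, whereas running at the first acceptable moment of each window (0, 30, 50, 100) would take four. A per-node opt-out would amount to forcing a run at `earliest` for that asset's windows instead.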