hey all – I understand explicit cycles within a graph are disallowed – does anyone have examples of maintaining some arbitrary "state" within an asset/step between runs without introducing an external system?
use case here is a step which submits an API request on behalf of each unique value at most once. every few weeks we have a new raw input file land, which is transformed into dataframe with a bunch of addresses. maybe it grows in size by 5% each run but the bulk of it will not change. for an address we have not seen before, we want to geocode it using Google Maps – but if we have seen it, we just skip that row and use the old value.
if the answer is "use Postgres or something similar" as a cache layer, that's fine – but wanted to check if the community has any clever suggestions without complicating infrastructure substantially.
dagster bot answered by content 1