#dagster-support

Matt Clarke

09/15/2022, 1:38 PM
Hi all, is there any best practice for handling calculations across time partitions that are differential/rolling in nature? For example, if I have some parameter `x` and I want to calculate `dx/dt` or `cumsum(x)`, then I need to know the latest preceding value of `x` to be able to initialise at the leading boundary of my partition.

I've seen `TrailingWindowPartitionMapping` and `AssetObservation` as two ways I might be able to handle this, but I'm curious what current best practice is, or whether someone has any examples of this in practice. Something like `map_overlap` in dask might be needed? https://docs.dask.org/en/stable/generated/dask.array.map_overlap.html#dask.array.map_overlap

Bonus question: what if there may be missing data? For example, a date-partitioned parquet dataset where I have files for the 1st, 2nd, 3rd, and 5th of the month. A small trailing window won't work if there is scope for data to be missing. Some kind of "this partition, plus the latest preceding one which also had data" would be needed.
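To make the requirement concrete, here is a minimal, library-agnostic sketch in plain Python of what I mean. It is not Dagster API. The names `PARTITIONS`, `PartitionResult`, `latest_preceding_key`, `compute_partition`, and `materialize_all` are all hypothetical, and the in-memory dict stands in for the date-partitioned files. Each partition stores its boundary state (last `x` and running total), and the next partition initialises from the latest preceding partition that actually has data. The difference here is a plain first difference (`dx`, not divided by `dt`).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for date-partitioned storage: key -> values of x.
# The 4th of the month is missing on purpose.
PARTITIONS: Dict[date, List[float]] = {
    date(2022, 9, 1): [1.0, 2.0],
    date(2022, 9, 2): [3.0],
    date(2022, 9, 3): [4.0, 5.0],
    date(2022, 9, 5): [6.0],
}


@dataclass
class PartitionResult:
    diff: List[Optional[float]]   # first difference of x (None if no prior value)
    cumsum: List[float]           # running sum continued from earlier partitions
    last_x: Optional[float]       # boundary state handed to the next partition
    running_total: float          # boundary state handed to the next partition


def latest_preceding_key(key: date, available: Iterable[date]) -> Optional[date]:
    """Return the latest key strictly before `key` that has data, if any."""
    earlier = [k for k in available if k < key]
    return max(earlier) if earlier else None


def compute_partition(
    values: List[float], prev: Optional[PartitionResult]
) -> PartitionResult:
    """Compute diff and cumsum for one partition, seeded from `prev`."""
    last_x = prev.last_x if prev else None
    total = prev.running_total if prev else 0.0
    diffs: List[Optional[float]] = []
    sums: List[float] = []
    for x in values:
        diffs.append(None if last_x is None else x - last_x)
        total += x
        sums.append(total)
        last_x = x
    return PartitionResult(diffs, sums, last_x, total)


def materialize_all(raw: Dict[date, List[float]]) -> Dict[date, PartitionResult]:
    """Process partitions in date order, skipping over gaps in the keys."""
    results: Dict[date, PartitionResult] = {}
    for key in sorted(raw):
        prev_key = latest_preceding_key(key, results)
        prev = results[prev_key] if prev_key is not None else None
        results[key] = compute_partition(raw[key], prev)
    return results


results = materialize_all(PARTITIONS)
```

With this data, the partition for the 5th is seeded from the 3rd rather than from the missing 4th, which is the "latest preceding one which also had data" behaviour. The open question is how to express that dependency in Dagster itself.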