Matt Clarke
09/15/2022, 1:38 PM
`x`, and I want to calculate `dx/dt`, or `cumsum(x)`, then I need to know the latest value of `x` to be able to initialise at the leading boundary of my partition. I've seen `TrailingWindowPartitionMapping` and `AssetObservation` as two ways I might be able to handle this, but am curious what current best practice is or if someone has any examples of this in practice. Something like `map_overlap` in dask might be needed? https://docs.dask.org/en/stable/generated/dask.array.map_overlap.html#dask.array.map_overlap
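To make the boundary problem concrete, here is a minimal plain-Python sketch (not Dagster or dask API) of computing a first difference and a running sum one partition at a time. The helper names `diff_partition` and `cumsum_partition`, and the `prev_last` / `prev_total` arguments, are hypothetical; the sketch assumes a unit time step, so the first difference stands in for `dx/dt`, and it assumes the previous partition's last value and running total are available from somewhere (for example an upstream partition or stored metadata).

```python
from itertools import accumulate


def diff_partition(values, prev_last=None):
    """First difference of one partition.

    prev_last is the last value of the preceding partition (hypothetical
    carried-over state). Without it, the first difference is undefined.
    """
    values = list(values)
    if not values:
        return []
    if prev_last is None:
        # No left neighbour: the leading element has nothing to subtract.
        return [None] + [b - a for a, b in zip(values, values[1:])]
    padded = [prev_last] + values
    return [b - a for a, b in zip(padded, padded[1:])]


def cumsum_partition(values, prev_total=0):
    """Running sum of one partition, continued from the previous total."""
    # accumulate(..., initial=...) yields the seed first, so drop it.
    return list(accumulate(values, initial=prev_total))[1:]


# Two consecutive partitions of the same series.
p1, p2 = [1, 2, 4], [7, 11]

d1 = diff_partition(p1)
d2 = diff_partition(p2, prev_last=p1[-1])

c1 = cumsum_partition(p1)
c2 = cumsum_partition(p2, prev_total=c1[-1])
```

Concatenating the per-partition results gives the same answer as running over the whole series at once, which is the property a one-partition trailing window (or `map_overlap` with a depth of one) is meant to preserve.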
Bonus question:
What if there may be missing data? For example, a date-partitioned parquet dataset where I have files for the 1st, 2nd, 3rd, and 5th of the month. A small trailing window won't work if data can be missing. Some kind of "this partition, plus the latest preceding one which also had data" would be needed
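The "latest preceding partition which also had data" lookup can be sketched in plain Python as below. This is only an illustration of the selection logic, not a Dagster partition mapping; the function name `latest_preceding_with_data` and the `available` set (the dates for which a file exists) are assumptions made for the example.

```python
from datetime import date


def latest_preceding_with_data(target, available):
    """Return the most recent date strictly before target that has data.

    available is a hypothetical set of partition dates with files present.
    Returns None when no earlier partition has data.
    """
    return max((d for d in available if d < target), default=None)


# Files exist for the 1st, 2nd, 3rd and 5th; the 4th is missing.
available = {date(2022, 9, 1), date(2022, 9, 2), date(2022, 9, 3), date(2022, 9, 5)}

prev_for_5th = latest_preceding_with_data(date(2022, 9, 5), available)
prev_for_1st = latest_preceding_with_data(date(2022, 9, 1), available)
```

Here the partition for the 5th would be seeded from the 3rd rather than from the missing 4th, and the 1st has no predecessor, so it would need to be treated as the start of the series.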