# ask-community
p
When using "Single Run" to backfill an asset, the asset must be written to support this by fetching the data for the whole range somehow. Once the asset has this data for multiple partitions, what's responsible for "cutting it up" into each partition? Should the asset `yield` one `Output` per partition, or is it the responsibility of the IO manager to figure this out? Specifically in my case, my asset produces `DataFrame`s and the IO manager writes them as parquet (one file per partition): where should I cut up this `DataFrame`?
t
Hi! Technically the answer is "both": the end user should yield one output per partition, and the I/O manager dictates how the data is partitioned and stored, on the assumption that each materialization returns one output per partition.
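To make that division of labor concrete, here is a minimal, library-agnostic sketch of the "cutting up" step. The `partition_dataframe` helper and the `date` partition column are hypothetical (not part of any real Dagster I/O manager API); the idea is just that a custom I/O manager could split a multi-partition `DataFrame` by partition key and write one parquet file per piece:

```python
import pandas as pd

def partition_dataframe(df: pd.DataFrame, partition_col: str) -> dict[str, pd.DataFrame]:
    """Split a combined DataFrame into one DataFrame per partition key.

    An I/O manager could call this and then write each piece to its own
    file, e.g. f"{key}.parquet" (writing is omitted here to keep the
    sketch self-contained).
    """
    return {
        str(key): group.reset_index(drop=True)
        for key, group in df.groupby(partition_col, sort=True)
    }

# Example: a single backfill run fetched three days of data at once.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"],
    "value": [1, 2, 3, 4],
})
pieces = partition_dataframe(df, "date")
# pieces now holds one DataFrame per partition key:
# "2024-01-01" (2 rows), "2024-01-02" (1 row), "2024-01-03" (1 row)
```

Whether this splitting lives in the asset (yielding per-partition outputs) or in the I/O manager is the design choice being discussed here; the split logic itself looks the same either way.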
p
Thanks. So specifically, my asset should produce one `DataFrame` per partition and `yield Output(df)` for each? How do I indicate which partition a particular `DataFrame` is for? Or perhaps what you are saying is that it's the I/O manager's job to figure this out?