# ask-community
p
When using "Single Run" to backfill an asset, the asset must be written to support this by fetching the data for the whole range somehow. Once the asset has this data for multiple partitions, what's responsible for "cutting it up" into each partition? Should the asset `yield` one `Output` per partition, or is it the responsibility of the IO manager to figure this out? Specifically in my case, my asset produces `DataFrame`s and the IO manager writes them as parquet (one file per partition): where should I cut up this `DataFrame`?
t
Hi! Technically the answer is "both": the end user should yield one output per partition, and the I/O manager dictates how the data is partitioned and stored, on the assumption that each materialization returns one output per partition.
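To make that division of labor concrete, here is a minimal, library-agnostic sketch of the "cutting up" step. The `partition_dataframe` helper and the `date` partition column are hypothetical (not part of any real Dagster I/O manager API); the idea is just that a custom I/O manager could split a multi-partition `DataFrame` by partition key and write one parquet file per piece:

```python
import pandas as pd

def partition_dataframe(df: pd.DataFrame, partition_col: str) -> dict[str, pd.DataFrame]:
    """Split a combined DataFrame into one DataFrame per partition key.

    An I/O manager could call this and then write each piece to its own
    file, e.g. f"{key}.parquet" (writing is omitted here to keep the
    sketch self-contained).
    """
    return {
        str(key): group.reset_index(drop=True)
        for key, group in df.groupby(partition_col, sort=True)
    }

# Example: a single backfill run fetched three days of data at once.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"],
    "value": [1, 2, 3, 4],
})
pieces = partition_dataframe(df, "date")
# pieces now holds one DataFrame per partition key:
# "2024-01-01" (2 rows), "2024-01-02" (1 row), "2024-01-03" (1 row)
```

Whether this splitting lives in the asset (yielding per-partition outputs) or in the I/O manager is the design choice being discussed here; the split logic itself looks the same either way.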
p
Thanks. So specifically, my asset should produce one `DataFrame` per partition and `yield Output(df)` for each? How do I indicate which partition a particular `DataFrame` is for? Or perhaps what you are saying is that it's the I/O manager's job to figure this out?