# ask-community
This might be an easy question, but if I have a massive dataset with no discrete known partitions, I would normally handle it via a generator. Generators keep memory usage extremely low and are great for chunking through huge datasets. How can I do this in Dagster without pooling all the data in memory to send to an IO manager? Ideally the IO manager would flush data whenever I `yield` a list of records within the body of the asset. Is this how it works, or how do others do this?
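For reference, outside Dagster the pattern I mean looks roughly like this (a minimal sketch assuming one record per line; `record_batches` is just an illustrative name):

```python
def record_batches(path, batch_size=50_000):
    """Yield fixed-size lists of records so the whole file never sits in memory."""
    batch = []
    with open(path) as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) >= batch_size:
                yield batch  # hand one chunk to the consumer, then reuse the buffer
                batch = []
    if batch:
        yield batch  # flush the final partial batch
```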
I recently explored Dynamic Outputs in greater detail; they may fit this use case. Using the default `fs_io_manager`, the sliced data are individually processed by the compute function, and the usual setup applies whether the function is decorated with `@op` or `@asset`. The only caveat is that the compute function runs only once, so you can use a generator inside it to yield each slice. https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#a-dynamic-job