# ask-community

Alexander Butler

04/18/2023, 7:23 AM
This might be an easy question, but if I have a massive dataset with no discrete known partitions, I would normally handle it with a generator. Generators keep memory usage extremely low and are great for chunking through huge datasets. How can I do this in Dagster without pooling all the data in memory to send to an IO manager? Ideally the IO manager would flush data whenever I `yield` a list of records within the body of the asset. Is this how it works, or how do others do this?
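For reference, a minimal sketch of the generator pattern being described here, in plain Python with no Dagster involved; the newline-delimited file source and the `stream_records`/`chunk_size` names are illustrative assumptions, not part of any Dagster API:

```python
def stream_records(path, chunk_size=10_000):
    """Yield fixed-size lists of records so only one chunk is in memory at a time."""
    buffer = []
    with open(path) as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
            if len(buffer) >= chunk_size:
                yield buffer  # flush a full chunk to the consumer
                buffer = []
    if buffer:
        yield buffer  # flush the final partial chunk
```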

Le Yang

04/18/2023, 1:58 PM
I recently explored Dynamic Outputs in more detail, and they may fit this use case. Using the default `fs_io_manager`, each slice of data is processed individually by the compute function, and the usual setup applies whether the function is decorated with `@op` or `@asset`. The only caveat is that the compute function producing the dynamic outputs runs only once, so you can use a generator for that compute function. https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#a-dynamic-job
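A minimal sketch of that pattern, following the dynamic-graphs docs linked above; the `read_in_chunks` data source and `process_chunk` op are stand-in assumptions for illustration:

```python
from dagster import DynamicOut, DynamicOutput, job, op


def read_in_chunks():
    # Stand-in data source; in practice this would stream from a file or DB.
    for start in range(0, 100, 25):
        yield list(range(start, start + 25))


@op(out=DynamicOut())
def load_chunks():
    # This op body is a generator and runs only once; each yielded
    # DynamicOutput hands a single chunk to the IO manager (the default
    # fs_io_manager pickles it to disk), so the whole dataset never has
    # to sit in memory at the same time.
    for i, chunk in enumerate(read_in_chunks()):
        yield DynamicOutput(chunk, mapping_key=str(i))


@op
def process_chunk(chunk):
    # Runs once per dynamic output, receiving one chunk at a time.
    return len(chunk)


@job
def chunked_job():
    load_chunks().map(process_chunk)
```

The upside of this design is that each chunk becomes its own step, so Dagster can retry or parallelize per chunk rather than reprocessing the whole dataset.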