# ask-community

Alexander Butler

04/18/2023, 7:23 AM
This might be an easy question, but if I have a massive dataset with no discrete known partitions, I would normally handle it with a generator. Generators keep memory usage extremely low and are great for chunking through huge datasets. How can I do this in Dagster without pooling all the data in memory to send to an IO manager? Ideally the IO manager would flush data whenever I `yield` a list of records within the body of the asset. Is this how it works, or how do others do this?
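For reference, a minimal sketch of the generator pattern being described here, in plain Python with no Dagster involved; the newline-delimited file source and the `stream_records`/`chunk_size` names are illustrative assumptions, not part of any Dagster API:

```python
def stream_records(path, chunk_size=10_000):
    """Yield fixed-size lists of records so only one chunk is in memory at a time."""
    buffer = []
    with open(path) as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
            if len(buffer) >= chunk_size:
                yield buffer  # flush a full chunk to the consumer
                buffer = []
    if buffer:
        yield buffer  # flush the final partial chunk
```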

Le Yang

04/18/2023, 1:58 PM
I recently explored Dynamic Outputs in more detail, and they may fit this use case. Using the default `fs_io_manager`, each slice of data is processed individually by the compute function, and the usual setup applies whether the function is decorated with `@op` or `@asset`. The only caveat is that the compute function producing the dynamic outputs runs only once, so you can use a generator for that compute function. https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#a-dynamic-job
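A minimal sketch of that pattern, following the dynamic-graphs docs linked above; the `read_in_chunks` data source and `process_chunk` op are stand-in assumptions for illustration:

```python
from dagster import DynamicOut, DynamicOutput, job, op


def read_in_chunks():
    # Stand-in data source; in practice this would stream from a file or DB.
    for start in range(0, 100, 25):
        yield list(range(start, start + 25))


@op(out=DynamicOut())
def load_chunks():
    # This op body is a generator and runs only once; each yielded
    # DynamicOutput hands a single chunk to the IO manager (the default
    # fs_io_manager pickles it to disk), so the whole dataset never has
    # to sit in memory at the same time.
    for i, chunk in enumerate(read_in_chunks()):
        yield DynamicOutput(chunk, mapping_key=str(i))


@op
def process_chunk(chunk):
    # Runs once per dynamic output, receiving one chunk at a time.
    return len(chunk)


@job
def chunked_job():
    load_chunks().map(process_chunk)
```

The upside of this design is that each chunk becomes its own step, so Dagster can retry or parallelize per chunk rather than reprocessing the whole dataset.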