# dagster-support

PB

04/24/2022, 9:55 PM
Hi all - quick question as I wrap my head around the framework. Ops are by definition meant for small calculations, but IO managers (unless you use the in-memory one) store data somewhere, so isn't there a lot of read/write overhead being introduced? Particularly with big data, do we really want to store/duplicate the data at each small stage of processing, as opposed to only at key checkpoints? What is the reasoning behind this, given the redundancy (a large table per step per run, so storage grows quickly)? Or is the idea that one stays in memory for certain Ops and then stores at others? Thanks heaps

johann

04/25/2022, 1:27 PM
Hi PB - one pattern is, rather than passing a full table directly through IO managers, to use them to pass pointers to wherever your table or other piece of data is stored.
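
A rough sketch of that pointer-passing pattern, in plain Python so it runs without Dagster installed. The function names, file names, and CSV layout are all hypothetical; in a real job each function would presumably be an `@op`, and the returned path string is the only thing the IO manager would have to persist between steps.

```python
import csv
import tempfile
from pathlib import Path


def extract_table(storage_dir: str) -> str:
    """Write a (hypothetical) large table to storage and return only its path."""
    path = Path(storage_dir) / "raw_table.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerows((i, i * 2) for i in range(1000))
    return str(path)  # the pointer, not the data


def filter_table(raw_path: str) -> str:
    """Read the table via its pointer, write a filtered copy, return the new pointer."""
    out_path = Path(raw_path).with_name("filtered_table.csv")
    with open(raw_path, newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row["value"]) >= 1000:
                writer.writerow(row)
    return str(out_path)


def count_rows(table_path: str) -> int:
    """Final step: load the table from its pointer and return a small result."""
    with open(table_path, newline="") as f:
        return sum(1 for _ in csv.DictReader(f))


def run_pipeline() -> int:
    # Only short path strings travel between steps; the table itself stays
    # in the storage location (a temp directory standing in for a warehouse
    # or object store).
    with tempfile.TemporaryDirectory() as storage_dir:
        raw_path = extract_table(storage_dir)
        filtered_path = filter_table(raw_path)
        return count_rows(filtered_path)
```

Calling `run_pipeline()` moves just two path strings between the three steps, so whatever sits between them only has to store a few bytes per step rather than a copy of the table.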

PB

05/03/2022, 9:19 PM
Ok cool thanks!