Hi all - quick q, I'm wrapping my head around the framework.
With regards to Ops and IO managers: given that Ops are, by definition, for small calculations, while IO managers (unless in-memory) store data somewhere - isn't a lot of read/write overhead being introduced? Particularly when working with big data, do we really want to store/duplicate the data at each small stage of processing (as opposed to at key checkpoints)? What's the reasoning behind this, given the redundancy (a large table per step per run adds up quickly)?
Or is the intent that one stays in memory for certain Ops and stores at others?
04/25/2022, 1:27 PM
Hi PB - one pattern, rather than passing a full table through IO managers directly, is to use them to pass pointers to wherever your table/other piece of data is stored.
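A minimal sketch of that pointer-passing pattern (this is a hypothetical standalone class for illustration, not Dagster's actual `IOManager` base class - in Dagster you would subclass `IOManager` and implement `handle_output`/`load_input` similarly): the op writes the big table to storage itself and returns only its path, so the IO manager serializes a tiny string per step instead of the whole table.

```python
import json
import os
import tempfile


class PointerIOManager:
    """Hypothetical IO manager that persists only a lightweight reference
    (e.g. a file path or warehouse table name), never the data itself."""

    def __init__(self, directory):
        self.directory = directory

    def handle_output(self, step_key, pointer):
        # Store just the pointer string for this step's output.
        with open(os.path.join(self.directory, f"{step_key}.json"), "w") as f:
            json.dump({"pointer": pointer}, f)

    def load_input(self, step_key):
        # Downstream steps get the pointer back and read the data themselves.
        with open(os.path.join(self.directory, f"{step_key}.json")) as f:
            return json.load(f)["pointer"]


# Usage: an upstream "op" writes the big data to storage and emits only a path.
workdir = tempfile.mkdtemp()
big_table_path = os.path.join(workdir, "table.csv")
with open(big_table_path, "w") as f:
    f.write("id,value\n1,10\n2,20\n")

mgr = PointerIOManager(workdir)
mgr.handle_output("upstream_op", big_table_path)  # persists ~50 bytes, not the table
pointer = mgr.load_input("upstream_op")           # downstream op receives the path
```

With this shape, per-step storage cost is constant regardless of table size; the big data lives in one place, and only references move between steps.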