geoHeil
03/02/2022, 9:48 AM
"The op-decorated function accepts DataFrames as parameters and returns DataFrames when it completes. An IOManager handles writing and reading the DataFrames to and from persistent storage."
Assuming the DataFrame is petabytes in size, I do not necessarily want to materialize all of this IO. Spark itself builds a DAG of the submitted operations and may apply additional optimizations such as predicate pushdown or projection pruning (AQE). How can I use Dagster ops to define multiple reusable building blocks without materializing the IO between these steps?
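A minimal sketch of one possible approach to the question above. It is an assumption and is not taken from the replies in this thread. The idea is to keep the reusable building blocks as plain functions that take and return a DataFrame, and to chain them before anything is written. The step names, the column names `active` and `id`, and the `PlanRecorder` stand-in are all hypothetical; the stand-in only mimics the laziness of a Spark DataFrame so the sketch runs without a cluster.

```python
from functools import reduce
from typing import Any, Callable

# A "step" is any function that takes a DataFrame and returns a DataFrame.
# On a real Spark DataFrame these calls only extend the logical plan.
Step = Callable[[Any], Any]


def drop_inactive(df):
    # Hypothetical building block; "active" is a made-up column name.
    return df.filter("active = true")


def keep_ids(df):
    # Hypothetical building block; "id" is a made-up column name.
    return df.select("id")


def compose(*steps: Step) -> Step:
    """Chain steps left to right into one DataFrame -> DataFrame function."""
    return lambda df: reduce(lambda acc, step: step(acc), steps, df)


class PlanRecorder:
    """Stand-in for a lazy DataFrame: records operations instead of running them."""

    def __init__(self, plan=()):
        self.plan = tuple(plan)

    def filter(self, condition):
        return PlanRecorder(self.plan + (("filter", condition),))

    def select(self, *cols):
        return PlanRecorder(self.plan + (("select", cols),))


clean_ids = compose(drop_inactive, keep_ids)
result = clean_ids(PlanRecorder())
print(result.plan)  # the recorded, not yet executed, sequence of operations
```

In Dagster, a single op could then call `clean_ids(df)` on a real Spark DataFrame, so the IOManager would only be involved at the boundary of that op and Spark would still see one plan to optimize. The trade-off is that the individual steps are no longer visible as separate ops in Dagster.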
Zach
03/02/2022, 3:53 PM
geoHeil
03/02/2022, 9:41 PM
Zach
03/02/2022, 9:44 PM
geoHeil
03/02/2022, 9:48 PM
Zach
03/02/2022, 9:56 PM
geoHeil
03/03/2022, 8:49 AM
sandy
03/03/2022, 4:05 PM