General Dagster use-case question. We're processing large volumes of data (multiple TB) across many machines using k8s/EC2 and tools like Dask. We'd love to be able to model this in Dagster for a) visibility and b) asset provenance/tracking. How would one model this? It seems like Dagster is geared toward doing all the computation within the Python/Dagster process.
Does "data is passed between computations" mean that data is actually moved around? In my case, I need the data to stay in the Dask cluster and only a reference to be passed.
nickvazz
11/08/2022, 4:33 AM
I have been using the dask_resource directly rather than using it as an executor. I bet creating a Dask IOManager similar to the mem_io_manager might be a way to pass around the futures however you want? 🤷 Still thinking about this myself.
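A minimal stdlib-only sketch of the pattern being described here: keep the live object (e.g. a Dask future) in an in-process store and hand downstream steps only a key, so nothing is serialized or moved off the cluster. The method names mirror Dagster's `IOManager.handle_output`/`load_input`, but this class and its key-based signatures are illustrative only — a real implementation would subclass `dagster.IOManager` and work with Dagster's output/input contexts instead of plain keys.

```python
# Sketch of a reference-passing IO manager, analogous in spirit to
# Dagster's mem_io_manager: outputs are kept in an in-process dict,
# and only a key/reference travels between computation steps.

class ReferenceIOManager:
    """Stores step outputs by key; downstream steps get the same live object."""

    def __init__(self):
        self._store = {}  # key -> live object (e.g. a dask Future)

    def handle_output(self, key, obj):
        # Keep the object in memory; nothing is serialized or copied.
        self._store[key] = obj

    def load_input(self, key):
        # Hand back the exact same object, not a deserialized copy.
        return self._store[key]


manager = ReferenceIOManager()
future_like = object()  # stand-in for a dask.distributed Future
manager.handle_output("step_a.result", future_like)
assert manager.load_input("step_a.result") is future_like  # same object, no copy
```

The key point is that `load_input` returns the identical object, so Dask futures stay attached to the cluster and data never leaves it; Dagster's graph still records the step-to-step lineage.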
yuhan
11/08/2022, 4:33 AM
Yes, the data itself can be moved around, but you can also pass it by reference.