Jaap Langemeijer
07/11/2022, 11:36 AM
[...] k8s_job_executor. Because execution is on Kubernetes, I use the gcs_pickle_io_manager. I am just passing a small amount of data in this fanout. The many calls to the GCS API make this execute slowly: it creates only about 3 outputs per second. Any suggestions or ideas on how to create large fanouts within Kubernetes?
I was thinking about using a local IO manager and mounting a volume to all the pods that will be using the fanout, but I am interested in any alternatives 🙂
Aaron Hoffer
07/11/2022, 8:43 PM
[...] DynamicOutput and creating a fan out from that? It’s possible to use a different IO manager for each output (https://docs.dagster.io/concepts/io-management/io-managers#per-output-io-manager). If you’re creating a new Kubernetes Job with each fan out, I’m not sure you’d be able to use the mem_io_manager or the fs_io_manager, but those would be much faster. (The disadvantage of the mem_io_manager is that the in-memory state isn’t kept on failure.)
A different option: if it’s just a small amount of data you’re using to run the op, you could use something like Redis to manage those states?
Jaap Langemeijer
07/12/2022, 9:41 AM
Aaron Hoffer
07/12/2022, 5:22 PM
I think mem_io_manager will not work when in different pods
Does your fan out create the pods from a single pod? It won’t share the memory between the pods, but if they are created from one pod, it might be able to pass that information to each of the pods?
Aaron Hoffer
07/12/2022, 5:24 PM
I will need to investigate Redis, thanks for the tip
Sounds like you’re using GCS, and their Redis Memorystore is easy to set up and connect to your pod, and easy to write with the redis Python client
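
A minimal sketch of the Redis idea discussed above, assuming the fanout values are small and picklable. It is not a Dagster IO manager: the class name KeyValueFanoutStore, the key layout (prefix:run_id:step_key:mapping_key), and the InMemoryClient stand-in are hypothetical and exist only so the example runs without a server. The only client calls it relies on are set(name, value) and get(name), which the redis Python client's redis.Redis also provides, so a real connection could be passed in instead.

```python
import pickle
from typing import Any, Dict, Optional, Protocol


class KeyValueClient(Protocol):
    """The two calls this sketch needs; redis.Redis offers both."""

    def set(self, name: str, value: bytes) -> Any: ...

    def get(self, name: str) -> Optional[bytes]: ...


class InMemoryClient:
    """Hypothetical stand-in for a Redis connection (no server required)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, name: str, value: bytes) -> bool:
        self._data[name] = value
        return True

    def get(self, name: str) -> Optional[bytes]:
        # Like Redis, return None when the key does not exist.
        return self._data.get(name)


class KeyValueFanoutStore:
    """Stores one small pickled value per fanout branch under a unique key."""

    def __init__(self, client: KeyValueClient, prefix: str = "fanout") -> None:
        self._client = client
        self._prefix = prefix

    def _key(self, run_id: str, step_key: str, mapping_key: str) -> str:
        # Assumed key layout; any scheme works if it is unique per branch.
        return f"{self._prefix}:{run_id}:{step_key}:{mapping_key}"

    def put(self, run_id: str, step_key: str, mapping_key: str, obj: Any) -> None:
        key = self._key(run_id, step_key, mapping_key)
        self._client.set(key, pickle.dumps(obj))

    def load(self, run_id: str, step_key: str, mapping_key: str) -> Any:
        key = self._key(run_id, step_key, mapping_key)
        raw = self._client.get(key)
        if raw is None:
            raise KeyError(key)
        return pickle.loads(raw)


# Usage: the producing step writes one value per branch,
# and each downstream step reads its own value back.
store = KeyValueFanoutStore(InMemoryClient())
for i in range(5):
    store.put("run-1", "fan_out", str(i), {"item": i})

loaded = [store.load("run-1", "fan_out", str(i)) for i in range(5)]
```

Wrapping something like this in an actual IO manager would still need the Dagster side wired up (deriving the key from the output context and registering the resource), which is left out here.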