# deployment-kubernetes
j
I want to run a large fanout of small jobs in parallel using the `k8s_job_executor`. Because execution is on Kubernetes, I use the `gcs_pickle_io_manager`. I am only passing a small amount of data through this fanout, but the many calls to the GCS API make it execute slowly: it creates only about 3 outputs per second. Any suggestions or ideas for creating large fanouts within Kubernetes? I was thinking about using a local IO manager and mounting a volume into all the pods involved in the fanout, but I am interested in any alternatives 🙂
dagster bot responded by community
a
Are you referring to an op that has a `DynamicOutput` and creating a fan-out from that? It’s possible to use a different IO manager for each output (https://docs.dagster.io/concepts/io-management/io-managers#per-output-io-manager). If you’re creating a new Kubernetes Job with each fanned-out op, I’m not sure whether you’d be able to use the `mem_io_manager` or the `fs_io_manager`, but those would be much faster. (The disadvantage of the `mem_io_manager` is that in-memory state isn’t kept on failure.) A different option: if it’s just a small amount of data you’re passing between ops, you could use something like Redis to manage that state.
j
I think the `mem_io_manager` will not work across different pods? I will need to investigate Redis, thanks for the tip
a
> I think mem_io_manager will not work when in different pods

Does your fan-out create the pods from a single pod? It won’t share memory between the pods, but if they are all created from one pod, that pod might be able to pass the information to each of them?
Your file-storage idea is also a reasonable option, as long as you can get all the pods to access the same volume.
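With the `k8s_job_executor`, the shared-volume idea could be sketched via dagster-k8s per-op config tags that mount one PVC into every fanout pod. This is an assumption-laden example, not from the thread: the claim name, volume name, and mount path are made up, and the PVC would need a `ReadWriteMany` access mode for multiple pods to share it.

```python
# Sketch: dagster-k8s config tags mounting a shared PVC into each op's pod.
# Apply with @op(tags=SHARED_VOLUME_TAGS) or @job(tags=SHARED_VOLUME_TAGS),
# then pair with fs_io_manager.configured({"base_dir": "/mnt/fanout"}).
SHARED_VOLUME_TAGS = {
    "dagster-k8s/config": {
        "pod_spec_config": {
            "volumes": [
                {
                    "name": "fanout-scratch",  # hypothetical volume name
                    "persistent_volume_claim": {"claim_name": "fanout-pvc"},
                }
            ]
        },
        "container_config": {
            "volume_mounts": [
                {"name": "fanout-scratch", "mount_path": "/mnt/fanout"}
            ]
        },
    }
}
```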
> I will need to investigate Redis, thanks for the tip

Sounds like you’re using GCS already; Google’s managed Redis, Memorystore, is easy to set up, easy to connect to your pods, and easy to write to with the redis Python client.
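Stashing small fanout payloads in Redis could look like the sketch below. The helper names and key scheme are made up; the client is passed in so the same helpers work against any Redis-compatible store (in a pod you would build it with `redis.Redis(host=..., port=6379)`, pointing at your Memorystore endpoint).

```python
# Sketch: run-scoped Redis hash as a lightweight store for small fanout
# payloads, serialized as JSON. Hypothetical helpers, not a Dagster API.
import json


def put_output(client, run_id: str, mapping_key: str, payload) -> None:
    """Store one fanout output under a hash keyed by the run id."""
    client.hset(f"fanout:{run_id}", mapping_key, json.dumps(payload))


def get_output(client, run_id: str, mapping_key: str):
    """Load one fanout output previously written by put_output."""
    return json.loads(client.hget(f"fanout:{run_id}", mapping_key))
```

Keying by run id keeps concurrent runs from clobbering each other, and a TTL on the hash (e.g. `client.expire(...)`) would keep the store from growing unboundedly.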
❤️ 1