# ask-community
j
Hi guys, when I run dagit -f and run a job locally, I can successfully use joblib (Parallel(n_jobs=-1, verbose=10)(delayed(func)(**kwargs) for kwargs in kwargs_list)). When I run it inside the Docker container, I get an error: joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Any idea why?
I know one possible solution would be to split the work into multiple ops rather than using joblib, but I'd like to avoid the IO overhead.
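For context, a minimal sketch of the pattern in question: joblib running inside a single Dagster op. The op, job, and function names here are invented; the key assumption is that the worker function is defined at module top level, since loky's pickler typically fails on closures or functions it cannot import by reference, which is one common cause of this BrokenProcessPool error.

```python
# Hypothetical sketch (op/job/function names invented): joblib inside
# a single Dagster op. The key assumption is that the worker function
# lives at module top level, so loky's pickler can serialize a
# reference to it; closures or dynamically defined functions are a
# common cause of the "failed to un-serialize" BrokenProcessPool error.
import pandas as pd
from dagster import job, op
from joblib import Parallel, delayed


def process_df(df: pd.DataFrame, factor: float) -> pd.DataFrame:
    # Stand-in for the real compute-intensive transform.
    return df * factor


@op
def crunch_all():
    kwargs_list = [
        {"df": pd.DataFrame({"x": range(1_000)}), "factor": f}
        for f in (1.0, 2.0, 3.0)
    ]
    # Each worker process receives the pickled kwargs plus a reference
    # to the top-level process_df function.
    return Parallel(n_jobs=-1, verbose=10)(
        delayed(process_df)(**kwargs) for kwargs in kwargs_list
    )


@job
def crunch_job():
    crunch_all()
```

If the pickling itself cannot be fixed in the Docker environment, passing backend="threading" to Parallel avoids serialization entirely, at the cost of the GIL for pure-Python code (pandas and NumPy release it for many operations).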
y
Hi! What are you trying to achieve here with joblib?
j
Hi Yuhan, I have a compute-intensive process that deals with several heavy dataframes. I'd therefore like to process them in separate processes to speed things up.
y
Dagster by default runs ops in parallel across multiple processes, if that's what you're looking for.
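A minimal sketch of that Dagster-native approach (names invented), assuming the work can be split into independent chunks: ops with no data dependency on each other run concurrently, each in its own process under the default multiprocess executor for jobs (defaults may vary by Dagster version).

```python
# Minimal sketch (names invented) of Dagster-native parallelism via
# dynamic outputs: independent mapped ops run concurrently, each in
# its own process under the default multiprocess executor.
from dagster import DynamicOut, DynamicOutput, job, op


@op(out=DynamicOut())
def split():
    # Fan out: one DynamicOutput per independent chunk of work.
    for i in range(4):
        yield DynamicOutput(i, mapping_key=str(i))


@op
def heavy_compute(chunk: int) -> int:
    # Stand-in for the expensive per-dataframe transform.
    return chunk * chunk


@op
def merge(results) -> int:
    # Fan in: collect the mapped results.
    return sum(results)


@job
def parallel_job():
    merge(split().map(heavy_compute).collect())
```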
j
I understand. I'd like to know if it's possible to avoid using multiple ops, for two reasons:
1. I'd have to rewrite existing code as ops.
2. My understanding is that Dagster needs to store the IO between ops, which would cause a significant slowdown for me since my dataframes are huge.
Is it possible to run joblib (or any other parallel-processing library) inside an op? If not, how should I deal with point 2?
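On point 2, one option worth noting, assuming everything can run in a single process: pair Dagster's in-process executor with its built-in mem_io_manager, so intermediates between ops stay in memory rather than being pickled to disk. The ops below are placeholders; note this trades away process-level parallelism between ops.

```python
# Hedged sketch (ops are placeholders): if the work does get split
# across ops, pairing the in-process executor with the built-in
# mem_io_manager keeps intermediates in memory instead of pickling
# them to disk, at the cost of process-level parallelism between ops.
import pandas as pd
from dagster import in_process_executor, job, mem_io_manager, op


@op
def load() -> pd.DataFrame:
    return pd.DataFrame({"x": range(1_000_000)})


@op
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the heavy dataframe transform.
    return df * 2


@job(
    executor_def=in_process_executor,
    resource_defs={"io_manager": mem_io_manager},
)
def low_io_job():
    transform(load())
```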