Hi again y'all. I'm building a pipeline that will need to download over a hundred files, and I'm wondering whether I should handle the concurrent downloads with dagster's multiprocess_executor or whether I'm better off using something like aiohttp.
I'm guessing I'll have a huge execution overhead using the executor, but maybe there's a better way I'm not aware of?
Anyway sorry for the noob question, I'm still going through the docs
Lyle
02/18/2020, 4:51 AM
i'm new to dagster myself, but it seems like aiohttp would be advantageous here.
i'm with you on the overhead bit... multiproc seems like overkill for just downloading files.
maybe do both and timeit ¯\_(ツ)_/¯
alex
02/18/2020, 7:05 PM
ya, I would lean toward aiohttp or something designed to do the task well - the multiprocess executor is currently not optimized for anything other than running each step execution in a clean process for isolation
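
For reference, here is a minimal sketch of the aiohttp route suggested above, assuming the downloads all happen inside one pipeline step rather than one step per file. The helper names (`run_bounded`, `download_all`), the default limit of 10 concurrent requests, and the "last URL path segment" file naming are all illustrative assumptions, not anything from dagster or from this thread. aiohttp is a third-party package and has to be installed separately.

```python
import asyncio
from pathlib import Path


async def run_bounded(items, worker, limit=10):
    """Run worker(item) for every item with at most `limit` running at once.

    Results come back in the same order as `items`.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(item):
        async with sem:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def download_all(urls, dest_dir, limit=10):
    """Download every URL into dest_dir and return the written paths."""
    # Third-party dependency (pip install aiohttp), imported here so the
    # generic helper above stays usable without it.
    import aiohttp

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # One shared session lets aiohttp reuse connections across requests.
    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            # Assumed naming scheme: last path segment of the URL.
            name = url.rstrip("/").rsplit("/", 1)[-1] or "index"
            target = dest / name
            async with session.get(url) as resp:
                resp.raise_for_status()
                # Plain blocking file writes; acceptable for a sketch.
                with target.open("wb") as fh:
                    async for chunk in resp.content.iter_chunked(64 * 1024):
                        fh.write(chunk)
            return target

        return await run_bounded(urls, fetch, limit)
```

Something like `asyncio.run(download_all(urls, "downloads"))` would then run the whole batch in a single process, so there is no per-file process startup cost. The semaphore is there so a hundred-plus requests don't all hit the server at once; the right limit depends on the host being downloaded from.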