Alexander Verbitsky

02/16/2022, 6:07 PM
Hi all! We are working with Sentinel-2 satellite images and have a fairly simple pipeline consisting of 4 steps, but it has to be able to process around 1,000,000 dates in one run. Also, if a date has already been processed (the corresponding artifact is already present), we would like that date to be skipped. Is Dagster suitable for such a task, and can it parallelize the processing?


02/17/2022, 6:49 PM
Hi Alexander, thanks for the question. Some tools that might be what you're looking for:
• Dynamic outputs (`DynamicOut`) let you map over each date and skip a date if its artifact is already present.
• Dagster's default executor is the multiprocess executor, which runs independent ops (including mapped ones) in parallel.