# ask-community
Carlos Sanoja:
Hello team, how are you? I have been working on a pipeline that generates subtasks to process blocks of information through the same solids, so the process would be:
info 1 --> solid1 --> solid2 --> solid3
info 2 --> solid1 --> solid2 --> solid3
info 3 --> solid1 --> solid2 --> solid3
...
and so on. To do this I am generating a dynamic output, since I don't know beforehand how many subtasks will be generated (it can be 3, 7, or any number). The problem I have is that when several subtasks are processed in parallel the machine runs out of memory and Dagster kills the processes, because solid1 runs a machine learning model.

My question: looking at the attached image, and taking into account the tests I have done, I notice that Dagster automatically spawns 4 subprocesses and launches the next ones as those finish. In this example I have 7 inputs, so it runs 4 of them and then the remaining 3 once the previous ones are finished. Is there any way to limit or control this parallelization from the top level? Currently I generate the dynamic outputs and use the .map() function to apply the solids to each output. I hope you can help me.
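For reference, a minimal sketch of the fan-out pattern described above, assuming the legacy solid API; the `fan_out` solid, the block values, and the solid bodies here are illustrative, not the actual code:

```python
from dagster import DynamicOutput, DynamicOutputDefinition, pipeline, solid

# Fan out: yield one DynamicOutput per block of information.
# In practice the number of blocks is not known beforehand.
@solid(output_defs=[DynamicOutputDefinition()])
def fan_out(context):
    blocks = ["info1", "info2", "info3"]
    for i, block in enumerate(blocks):
        yield DynamicOutput(block, mapping_key=f"block_{i}")

@solid
def solid1(context, block):
    # Stand-in for the memory-hungry ML model step.
    return block

@solid
def solid2(context, block):
    return block

@solid
def solid3(context, block):
    return block

@pipeline
def process_blocks():
    # .map() applies the solid1 -> solid2 -> solid3 chain to each dynamic output.
    fan_out().map(lambda block: solid3(solid2(solid1(block))))
```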
George Pearse:
Hi @Carlos Sanoja, I think you're after max_concurrent, which you can set in the config: https://docs.dagster.io/_apidocs/execution#dagster.multiprocess_executor
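A minimal sketch of how that config might be supplied, assuming the legacy execute_pipeline API; the value 2 and the process_blocks pipeline from the sketch above are illustrative:

```python
from dagster import DagsterInstance, execute_pipeline

run_config = {
    "execution": {
        "multiprocess": {
            "config": {
                # At most 2 solid subprocesses alive at once.
                # The default of 0 falls back to the machine's CPU count,
                # which would explain the 4-at-a-time behaviour observed above.
                "max_concurrent": 2,
            }
        }
    }
}

# The multiprocess executor needs a persistent DagsterInstance; depending
# on the Dagster version, you may also need filesystem-based IO
# (e.g. fs_io_manager) so subprocesses can exchange outputs.
execute_pipeline(
    process_blocks,
    run_config=run_config,
    instance=DagsterInstance.get(),
)
```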
Carlos Sanoja:
Thank you very much @George Pearse!