Danny

07/02/2020, 7:18 PM
Let's say I'm gathering a list of things in one solid A, and want each thing found to be processed by another solid B. An arbitrary number (N) of things can be found, and it's OK to gather all of them in A into one List output. But because Dagster expects no cycles in the compute graph, B is currently forced to process the entire list rather than run N times. I want B to run once per thing so that, when B runs via celery, I can properly control concurrency etc. Do I need to upgrade B to a pipeline to accomplish this? That would greatly decrease the usability of my dagit Runs page, which could have 10000 entries for things per run of A... Is there a different way to do this?
Just found this: https://github.com/dagster-io/dagster/issues/462. It answers my question.
👍 1
alex

07/02/2020, 7:31 PM
In the meantime, depending on your exact constraints, you could consider changing how solid B works. In the same way you might have a solid that submits a spark job to a cluster, you could turn your solid B into something that submits work to a process pool / celery workers / etc. and then reports the relevant data / events back to dagster.
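A rough sketch of that idea, using only the Python standard library: the body of solid B hands each item to a bounded worker pool and collects one result or error per item, which it could then report back. Everything here is hypothetical (`process_thing`, `process_all`, the `max_workers` value), and a thread pool stands in for the process pool / celery workers mentioned above; none of it is Dagster API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_thing(thing):
    # Hypothetical per-item work; stands in for whatever B does to one thing.
    return thing * 2


def process_all(things, max_workers=4):
    """Fan the list produced by A out to a bounded pool.

    Returns (results, failures), both keyed by the item's index in `things`,
    so the caller can report per-item outcomes.
    """
    results, failures = {}, {}
    # max_workers caps how many items are processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(process_thing, thing): index
            for index, thing in enumerate(things)
        }
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as exc:
                # Record the failure without aborting the remaining items.
                failures[index] = exc
    return results, failures
```

Inside solid B you would call something like `process_all(things_from_a)` and then turn `results` and `failures` into whatever outputs or events you want the run to record. B still appears as a single step in dagit, so per-item visibility depends on what it reports back.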
Danny

07/02/2020, 7:42 PM
Makes sense, thanks!
alex

07/02/2020, 7:58 PM
thanks for commenting with your details on the issue too
👍 1