#announcements

Shaun Ryan

06/28/2020, 8:44 PM
Hi. I'm trying to process documents through an NLP AI pipeline. Are there any examples of a parallel loop workflow? Each document is a block of text that will pass through a dependency pipeline of ML service solids, each returning a payload. I don't want to process the documents sequentially. I figure I need a solid for each ML service (3 in total, each with a REST endpoint), and the pipeline will iterate the documents through in parallel. The ML models are A, B and C, where B and C depend on the output of A. I have a small dev setup with RabbitMQ, Dagster_Celery, Dagit, Dagster and Postgres. I may need to scale it out on a bigger compute platform later. I've been messing about, and so far I can't get Dagit to send the workload out to the workers. I have it set up in docker-compose here: https://github.com/shaunryan/docker-compose/tree/master/dagster I'm using the dagster-celery CLI, e.g.
dagster-celery worker start --config-yaml celery_config.yaml
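For reference, the dependency shape described above (A first, then B and C on A's output, with documents processed concurrently) can be sketched in plain Python. This is only an illustration using the standard library's `concurrent.futures`, not Dagster or dagster-celery; `model_a`, `model_b`, `model_c`, `process_document` and `run_all` are hypothetical stand-ins for the three REST-backed ML services and the loop around them.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three ML services; real versions would
# call each service's REST endpoint and return its payload.
def model_a(text):
    return {"tokens": text.split()}


def model_b(a_out):
    return {"token_count": len(a_out["tokens"])}


def model_c(a_out):
    return {"upper": [t.upper() for t in a_out["tokens"]]}


def process_document(text, service_pool):
    # A must finish first; B and C both depend on its output,
    # so they are submitted together and run side by side.
    a_out = model_a(text)
    future_b = service_pool.submit(model_b, a_out)
    future_c = service_pool.submit(model_c, a_out)
    return {"a": a_out, "b": future_b.result(), "c": future_c.result()}


def run_all(documents, max_workers=4):
    # Separate pools: document tasks block while waiting on B and C,
    # so B and C need workers that are never held by document tasks.
    with ThreadPoolExecutor(max_workers) as doc_pool, \
            ThreadPoolExecutor(max_workers) as service_pool:
        return list(
            doc_pool.map(lambda d: process_document(d, service_pool), documents)
        )


results = run_all(["hello world", "a b c"])
```

`run_all` returns one result per document, in input order. In Dagster the same shape would be expressed as solids wired together in a pipeline, with the executor deciding where each step runs.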

Simon Späti

06/29/2020, 7:49 AM
hey @Shaun Ryan If you have the same solid with different inputs, you can reuse that solid: https://docs.dagster.io/docs/tutorial/advanced_solids#reusable-solids For the parallel part, I had similar issues. I was missing the `max_concurrent` part:
execution:
  multiprocess:
    config:
      max_concurrent: 2
storage:
  s3:
    config:
https://dagster.readthedocs.io/en/0.6.9/sections/deploying/
But be aware that a lot has changed in the latest `0.8` version. I haven't upgraded yet, so I can't tell whether this is still valid there.

alex

06/29/2020, 3:06 PM
ya, make sure to set the `execution` section in the config for the pipeline run. For celery it will look like:
execution:
  celery:
    config:
      broker: <rabbitmq address in your docker setup>
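If the run config is assembled in Python rather than written by hand, the same `execution` section is just a nested mapping. A minimal sketch, assuming a helper named `celery_run_config` (hypothetical) and an example broker URL that is an assumption, not the address from the docker-compose setup above:

```python
import json


def celery_run_config(broker_url):
    """Build a run config mapping holding only the celery execution section."""
    return {"execution": {"celery": {"config": {"broker": broker_url}}}}


# Assumed example URL; substitute the RabbitMQ address from your own setup.
config = celery_run_config("pyamqp://guest@rabbitmq:5672//")

# JSON is also valid YAML, so this output can be pasted into a config file.
print(json.dumps(config, indent=2))
```

The nesting mirrors the YAML keys exactly: `execution`, then `celery`, then `config`, then `broker`.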