Hi. I'm trying to process documents through an NLP AI pipeline (dagster #announcements)

Shaun Ryan

06/28/2020, 8:44 PM

Hi. I'm trying to process documents through an NLP AI pipeline. Are there any examples of a parallel-loop workflow? Each document is a block of text that will pass through a dependency pipeline of ML service solids, each returning a payload. I don't want to process the documents sequentially. I figure I need a solid for each ML service (3 in total, each with a REST endpoint), and the pipeline will iterate the documents through those solids in parallel. The ML models are A, B and C, where B and C depend on the output of A. I have a small dev setup with RabbitMQ, Dagster_Celery, Dagit, Dagster and Postgres. I will possibly need to scale it out on a bigger compute platform later. I've been experimenting, and so far I can't get Dagit to dispatch the workload to the workers. I have it set up in docker-compose here -> https://github.com/shaunryan/docker-compose/tree/master/dagster I'm using the dagster-celery CLI, e.g.

dagster-celery worker start --config-yaml celery_config.yaml
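
To make the dependency shape in the question concrete, here is a minimal plain-Python sketch of it. This is not Dagster code: `model_a`, `model_b` and `model_c` are hypothetical stand-ins for the three REST-backed ML services, and a thread pool stands in for the worker fleet. It only illustrates that B and C both consume A's output, and that documents are independent of one another and so can be processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three ML services. A real version would
# call each service's REST endpoint instead of doing local work.
def model_a(text):
    return {"tokens": text.split()}


def model_b(a_out):
    return {"token_count": len(a_out["tokens"])}


def model_c(a_out):
    return {"upper": [t.upper() for t in a_out["tokens"]]}


def process_document(text):
    # A runs first; B and C both depend on A's payload.
    a_out = model_a(text)
    return {"a": a_out, "b": model_b(a_out), "c": model_c(a_out)}


def process_all(documents, max_workers=4):
    # Documents do not depend on each other, so they can be mapped
    # across workers. pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_document, documents))
```

In a Dagster pipeline each of the three functions would presumably become its own solid, with the executor (multiprocess or Celery) deciding where each step runs.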

Simon Späti

06/29/2020, 7:49 AM

Hey @Shaun Ryan, if you have the same solid with different inputs, you can reuse the same solid: https://docs.dagster.io/docs/tutorial/advanced_solids#reusable-solids For the parallel part, I had similar issues. I was missing the `max_concurrent` part:

execution:
  multiprocess:
    config:
      max_concurrent: 2
storage:
  s3:
    config:

https://dagster.readthedocs.io/en/0.6.9/sections/deploying/

Simon Späti

06/29/2020, 7:51 AM

But be aware that a lot has changed in the latest `0.8` version. I haven't upgraded yet, so I can't tell whether this is still valid there.

alex

06/29/2020, 3:06 PM

ya, make sure to set the `execution` section in the config for the pipeline run. For celery it will look like:

execution:
  celery:
    config:
      broker: <rabbitmq address in your docker setup>
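
If the run config is assembled in code rather than in a YAML file, the two `execution` sections from this thread can be mirrored as nested dictionaries. The sketch below only reproduces the structure shown in the YAML snippets; the helper names and the broker URL in the usage line are hypothetical, and whether your Dagster version accepts exactly these keys should be checked against its docs.

```python
def multiprocess_execution_config(max_concurrent):
    # Mirrors the `execution: multiprocess: config: max_concurrent` YAML.
    return {
        "execution": {
            "multiprocess": {"config": {"max_concurrent": max_concurrent}}
        }
    }


def celery_execution_config(broker_url):
    # Mirrors the `execution: celery: config: broker` YAML.
    # broker_url is whatever address RabbitMQ has in your docker setup.
    return {"execution": {"celery": {"config": {"broker": broker_url}}}}


# Hypothetical broker address, for illustration only.
config = celery_execution_config("pyamqp://guest@rabbitmq:5672//")
```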
