Is there a way to limit concurrency across ops based on a tag? (Dagster #ask-community)

Amadou Crookes

05/05/2022, 7:05 PM

Hi all. Is there a way to limit concurrency across ops based on a tag? I've seen that the QueuedRunCoordinator has this for runs. Is there a built-in way to achieve this for all ops being run? Ideally this would work with k8s as that is the orchestration we are planning to use with dagster.

prha

05/05/2022, 7:12 PM

Hi Amadou! Would it work to use Celery queues to enforce concurrency limits, with ops routed to queues by op tags? You might find this useful: https://docs.dagster.io/_apidocs/libraries/dagster-celery-k8s#dagster_celery_k8s.celery_k8s_job_executor

Amadou Crookes

05/05/2022, 11:13 PM

Thank you for linking to that. If we were to use the celery queues where would the concurrency limits be configured? I don't see mention of that functionality.

prha

05/05/2022, 11:26 PM

The concurrency limits would be set by the number of workers and the concurrency options set per worker

prha

05/05/2022, 11:26 PM

(where worker is celery worker)

Amadou Crookes

05/06/2022, 12:02 AM

Okay, I think I'm following. Below are a few follow-up questions. Some of this might be that I need a deeper understanding of celery+dagster, and I'm happy to go read more if you have resources that might fill in the gaps in my understanding.
1. Is there a way to specify what queue a task gets put on based on tags or other op metadata?
2. Is the celery worker deployment something we have to manage, or is this a dagster config?
3. When a celery worker pulls a task and launches a k8s job, does it not pull another task until that job is done? I.e. will a worker pull task A, launch job A, then pull task B and launch job B before job A is complete? Or will it pull task A, launch job A, and wait until job A is done before pulling task B?

prha

05/06/2022, 12:11 AM

Forgive me, I’m not very familiar with the specifics of celery/dagster either. I do know that some users are using it with some success. (cc @johann) Also, I think that this functionality might come directly to the k8s executor soon. See https://github.com/dagster-io/dagster/issues/6580

Amadou Crookes

05/06/2022, 12:30 AM

No worries. Thanks for sharing that issue. It is slightly different from this use case, as it is a limit across all spawned k8s jobs, but it is also something I am interested in! The "Per-Op limits in Kubernetes" doc has good information on the above questions:
1. You can specify the queue by using the dagster-celery/queue tag (doc).
2. It looks like the celery workers can be configured with the rest of dagster via a helm chart.
3. Based on the content in the deployment architecture, I believe the celery task launches the k8s job and waits for it to complete, but I am still uncertain on that. It would be helpful to get confirmation on this if anyone knows.

johann

05/06/2022, 2:50 PM

"I believe the celery task launches the k8s job and waits for it to complete"

That’s correct.
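Since each Celery task holds its slot until the k8s job finishes, the number of jobs running at once on a queue should be bounded by (number of workers) x (concurrency per worker). The toy simulation below illustrates that arithmetic using only the Python standard library. It does not use Celery or Dagster; `run_queue` and its parameters are hypothetical names, and the sleep stands in for waiting on a k8s job.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_queue(num_tasks, workers, concurrency_per_worker, job_seconds=0.02):
    """Simulate one queue whose task slots block until their job finishes."""
    # Upper bound on simultaneous jobs for this queue.
    slots = workers * concurrency_per_worker
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(i):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(job_seconds)  # stand-in for waiting on the k8s job
        with lock:
            running -= 1
        return i

    # The thread pool plays the role of all worker slots on the queue.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        results = list(pool.map(task, range(num_tasks)))
    return slots, peak, results


slots, peak, results = run_queue(num_tasks=12, workers=2, concurrency_per_worker=2)
print(f"slots={slots} peak={peak} completed={len(results)}")
```

All 12 simulated jobs complete, but the observed peak never exceeds the 4 available slots, which is the behavior the per-queue limit relies on.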

Amadou Crookes

05/06/2022, 3:23 PM

Fantastic. Thank you for the help Johann and Phil!
