# ask-community
l
EDIT: `tag_concurrency_limits` + some clever chunking of my 5000 dynamic ops across ~100 separate job runs might work for this, but it feels hacky to me.

I have a Dagster job with dynamic ops (it spins up 200-5000 of them using the `k8s_executor`). Is there a way to specify a concurrency limit X so that a given job run will only schedule X parallel k8s jobs at a time?

My cluster can handle a large number of parallel k8s jobs (dynamic ops), but for this particular Dagster job I know I'll hit API limits if I don't limit concurrency; the limits come from a library that I'm calling within each dynamic op. Ideally this would be something configured here in `execution.config`:
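Something along these lines, purely as a sketch of what I was hoping for (the `max_concurrent` field is a placeholder name; I haven't confirmed the k8s executor actually accepts it):

```python
# Run config sketch, written as a Python dict (equivalent to YAML run config).
# "max_concurrent" is a placeholder field name for the hoped-for per-run limit.
run_config = {
    "execution": {
        "config": {
            "max_concurrent": 25,  # at most 25 dynamic-op k8s jobs in flight per run
            # or, per the EDIT above, something tag-based (format assumed):
            # "tag_concurrency_limits": [{"key": "api", "value": "my_library", "limit": 25}],
        }
    }
}
```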
p
Glad you found a solution that works for you… is there an ideal API you had in mind for specifying concurrency limits at the op-level? Would you want it to be run-scoped, or globally scoped?
l
Globally scoped or run-scoped is fine. Run-scoped would be cleaner since I could just have a single job with 3,500 dynamic ops within it. My workaround now is:
1. Create a temp table with the 3.5k collections to process.
2. Have a schedule that kicks off a job with 200 dynamic ops every hour, with a forced timeout after 10 hours (inside the ETL logic in the dynamic op). Increment the offset every hour, and go back to offset 0 once I reach the end; cycle through the offsets forever until I turn off the schedule (rough sketch below).
3. Allow 5 jobs running concurrently. Most of these jobs finish within 1-2 hours, so overall op concurrency should never be the full 200 * 5 = 1000 at once.
4. If a collection has already finished processing, the ETL is essentially a no-op; but for the super long-running jobs (30 hours), we'll get to the end of processing after 3 full "cycles". Let me know if that makes sense.

Yikes, this requires lots of babysitting and tuning, so it's not ideal, but it works for now.
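A minimal sketch of what steps 1-2 could look like as a Dagster schedule, assuming the offset is derived from the scheduled hour rather than stored anywhere; the job name, op name, and config fields are placeholders, and the 5-run cap from step 3 would still be enforced separately (e.g. by the run coordinator):

```python
import math

from dagster import RunRequest, schedule

BATCH_SIZE = 200                      # dynamic ops per run (step 2)
TOTAL_COLLECTIONS = 3500              # rows in the temp table (step 1)
NUM_CHUNKS = math.ceil(TOTAL_COLLECTIONS / BATCH_SIZE)  # 18 hourly chunks

# "process_collections_job" (defined elsewhere) and the "load_collections" op
# are placeholder names for this sketch.
@schedule(cron_schedule="0 * * * *", job_name="process_collections_job")
def hourly_chunk_schedule(context):
    # Derive the offset from the scheduled hour so it increments every hour and
    # wraps back to 0 after the last chunk; no external state to babysit.
    hours_since_epoch = int(context.scheduled_execution_time.timestamp() // 3600)
    offset = (hours_since_epoch % NUM_CHUNKS) * BATCH_SIZE
    return RunRequest(
        run_key=None,  # always launch; already-processed collections are a no-op (step 4)
        run_config={
            "ops": {
                "load_collections": {
                    "config": {"offset": offset, "limit": BATCH_SIZE}
                }
            }
        },
    )
```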
p
Would it work to set up celery and use the `celery_k8s_job_executor`? https://docs.dagster.io/_apidocs/libraries/dagster-celery-k8s#dagster_celery_k8s.celery_k8s_job_executor. You could set up queues according to the concurrency constraints you have: https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm-advanced
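Roughly, the queue routing would look something like this sketch (the queue and op names are illustrative; the per-queue worker count, which is what actually caps concurrency, is configured in the Helm chart linked above):

```python
from dagster import job, op
from dagster_celery_k8s import celery_k8s_job_executor

# Route the rate-limited work to a dedicated Celery queue via the
# "dagster-celery/queue" tag. "rate_limited" is a made-up queue name; the real
# limit comes from how many workers the Helm chart runs for that queue.
@op(tags={"dagster-celery/queue": "rate_limited"})
def call_rate_limited_api():
    ...

@job(executor_def=celery_k8s_job_executor)
def process_collections():
    call_rate_limited_api()
```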
l
@prha does this work well with dagster.cloud? I could try it out but I saw some other threads that discouraged celery for production use-cases and encouraged us to stick to k8s executors
d
Yeah the celery k8s executor doesn't work in cloud unfortunately. We're working on a replacement that will give you the same benefits without needing to run celery, but it's still a few weeks out.
👍 1