# deployment-kubernetes
Hi everyone. We’ve been using static Celery workers to run our steps (with the default run launcher). We tried migrating our jobs to CeleryK8sRunLauncher and CeleryK8sExecutor. It works well, but we have many, many steps per job (because of fan-outs), and we don’t want to spin up hundreds of K8s step jobs and pods per Dagster run. Ideally, we’d deploy a run worker (as CeleryK8sRunLauncher does), which would then deploy a limited number of Celery K8s jobs to execute the run (as CeleryExecutor does). Those jobs would be deleted after the run completes. However, it seems that CeleryK8sRunLauncher requires its own K8s executor. Thanks
Hi Roei - if all you care about is maintaining limits or priorities on the number of runs, are you sure that you need Celery at all? We have a built-in run queue that is designed to accomplish that goal, described here: https://docs.dagster.io/deployment/run-coordinator#limiting-run-concurrency
Hi @daniel, thanks for the response (as always 🙂). We use
and it’s working great, and we use that to limit jobs. We’re thinking about ditching Celery, as you said. We’d like to run a worker pod (as
) that will spawn a specific number of step pods - but we’d like them to stay alive and pull tasks (as Celery workers do for the Celery executor) - trading some isolation for less overhead than deploying a pod per step (as we have many of them). Kind of
that pulls for tasks and executes more than one step. Thanks again!
Ah I see, you want like a standing pool of workers to run steps? I've seen that request come up in the context of run workers too
Yes! We kind of implemented that using a static pool of Celery workers. But they live as a Deployment, so they don’t scale the way a per-job worker pool would. Regarding run workers - I think the RunCoordinator is good enough
If you want to run our ancient fork of Dagster, that's the pattern we use 🙂
The issue is it requires a core change due to some threading problems
Not sure if that ever got fixed upstream
Noah, it's been a while so I don't remember the details - I thought your setup was more around a pool of run workers than step workers?
It's both, the actual solids are all just wrappers around celery too. https://github.com/geomagical/dagster-geomagical/blob/main/dagster_geomagical/definitions.py#L44-L50
ops, I should say now 🙂
Thanks, @Noah K 🙂 But I think we’d stick to the upstream Dagster. @daniel, anything that could help us just “group by” step executions to one container? The next step for us would be changing the orchestration a bit by reducing the number of Ops per job, but it’ll limit us a bit, and we prefer to try and solve that on the infra level. I’m sure we’re not the only ones with hundreds of Ops per job 🙂 Thanks again!
We don't have anything like that today unfortunately, but it's a very reasonable feature request that's come up before
I see. How would you suggest implementing such a method?
This would probably involve writing your own executor: https://docs.dagster.io/deployment/executors#executors. You'd likely need to familiarize yourself with the other executors in the Dagster codebase first.
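To make the "standing pool of step workers" idea from this thread concrete, here is a minimal, stdlib-only sketch of the pattern. It is not Dagster's executor API: `run_steps_with_pool`, the `steps` mapping, and `num_workers` are hypothetical names for illustration, threads stand in for long-lived step pods, and it assumes the steps are independent (as in a fan-out) with no dependency ordering.

```python
import queue
import threading
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-in for a step: a zero-argument callable returning a value.
StepFn = Callable[[], Any]


def run_steps_with_pool(
    steps: Dict[str, StepFn], num_workers: int
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run independent steps on a fixed pool of long-lived workers.

    Each worker pulls steps from a shared queue until it receives a
    sentinel, so the worker count stays constant no matter how many
    steps the run contains.
    """
    tasks: queue.Queue = queue.Queue()
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                return
            key, fn = item
            try:
                value = fn()
            except Exception as exc:  # record the failure, keep the worker alive
                with lock:
                    errors[key] = exc
            else:
                with lock:
                    results[key] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for key, fn in steps.items():
        tasks.put((key, fn))
    for _ in threads:  # one sentinel per worker, queued after all steps
        tasks.put(None)
    for t in threads:
        t.join()
    return results, errors


if __name__ == "__main__":
    # 100 fan-out steps handled by only 4 workers.
    fan_out = {f"step_{i}": (lambda i=i: i * i) for i in range(100)}
    done, failed = run_steps_with_pool(fan_out, num_workers=4)
    print(len(done), len(failed))
```

A real executor would also have to respect step dependencies, report step events back to Dagster, and handle retries, which is why reading the existing executors first is the suggested route.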