# deployment-kubernetes
r
Hi everyone. We've been using static Celery workers to run our steps (with the default run launcher). We tried migrating our jobs to CeleryK8sRunLauncher and CeleryK8sExecutor - it's working well, but we have many, many steps per job (because of fan-outs), and we don't want to spin up hundreds of K8s step jobs and pods per Dagster run. Our ideal setup would be to deploy a run worker (as CeleryK8sRunLauncher does) that launches a limited number of Celery K8s jobs to execute the run (the way the Celery executor does), with those jobs deleted after the run completes. It seems like CeleryK8sRunLauncher requires its matching K8s executor, though. Thanks
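For context, a minimal sketch of the setup described above, assuming the dagster-celery-k8s APIs (the op and job names here are made up):

```python
from dagster import job, op
from dagster_celery_k8s import celery_k8s_job_executor


@op
def process_chunk():
    ...


# CeleryK8sRunLauncher (configured on the instance / Helm chart) expects jobs
# to use celery_k8s_job_executor, which submits one Celery task per step and
# has the worker spin up a K8s Job + pod for each of those steps.
@job(executor_def=celery_k8s_job_executor)
def fan_out_job():
    process_chunk()
```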
d
Hi Roei - if all you care about is maintaining limits or priorities on the number of runs, are you sure that you need Celery at all? We have a built-in run queue that is designed to accomplish that goal, described here: https://docs.dagster.io/deployment/run-coordinator#limiting-run-concurrency
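For reference, the limits themselves (max_concurrent_runs / tag_concurrency_limits under the QueuedRunCoordinator) live in the instance's dagster.yaml as described at that link; the Python side is just tagging jobs. A minimal sketch, with an arbitrary tag key:

```python
from dagster import job, op


@op
def do_work():
    ...


# Tagging the job lets an instance-level tag_concurrency_limits entry
# (configured under run_coordinator in dagster.yaml) cap how many runs
# carrying this tag execute at once.
@job(tags={"queue": "heavy-fan-out"})
def heavy_fan_out_job():
    do_work()
```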
r
Hi @daniel, thanks for the response (as always 🙂). We use QueuedRunCoordinator and it's working great - we use it to limit runs. We're thinking about ditching Celery, as you said. We'd like to run a worker pod (as K8sRunLauncher does) that spawns a specific number of step pods - but we'd like those pods to stay alive and pull tasks (as Celery workers do for the Celery executor), lowering the isolation and overhead of deploying a pod per step (as we have many of them). Kind of a k8s_job_executor that polls for tasks and executes more than one step. Thanks again!
d
Ah I see, you want something like a standing pool of workers to run steps? I've seen that request come up in the context of run workers too
r
Yes! We kind of implemented that with a static pool of Celery workers, but they live as a Deployment, so it doesn't scale the way a per-job worker pool would. Regarding run workers - I think the RunCoordinator is enough for us
n
If you want to run our ancient fork of Dagster, that's the pattern we use 🙂
The issue is it requires a core change due to some threading problems
Not sure if that ever got fixed upstream
d
Noah, it's been a while so I don't remember the details - I thought your setup was more around a pool of run workers than step workers?
n
It's both, the actual solids are all just wrappers around celery too. https://github.com/geomagical/dagster-geomagical/blob/main/dagster_geomagical/definitions.py#L44-L50
ops, I should say now 🙂
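For anyone reading along, a generic sketch of that "ops as thin Celery wrappers" pattern (not the linked code itself - the app, broker URL, and task name here are hypothetical):

```python
from celery import Celery
from dagster import op

# Existing standing pool of Celery workers, deployed separately.
celery_app = Celery("workers", broker="pyamqp://guest@rabbitmq//")


@op
def run_remote_step(context):
    # Dagster only orchestrates: the op submits the real work to the
    # worker pool and blocks until the task finishes.
    result = celery_app.send_task("tasks.process_chunk", args=[context.run_id])
    return result.get()
```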
r
Thanks, @Noah K 🙂 But I think we'd rather stick with upstream Dagster. @daniel, is there anything that could help us "group" step executions into one container? Our next step would be changing the orchestration a bit by reducing the number of Ops per job, but that would limit us, and we'd prefer to solve this at the infra level. I'm sure we're not the only ones with hundreds of Ops per job 🙂 Thanks again!
d
We don't have anything like that today unfortunately, but it's a very reasonable feature request that's come up before
r
I see. How would you suggest implementing something like that?
d
This would probably involve writing your own executor: https://docs.dagster.io/deployment/executors#executors. You'd likely want to familiarize yourself with the existing executors in the Dagster codebase first.
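For reference, a very rough skeleton of what a custom executor definition looks like with the @executor API from those docs. The pooled-dispatch logic is only sketched in comments, the names (StandingPoolExecutor, pooled_step_executor) are hypothetical, and the internal import paths may differ between Dagster versions:

```python
from dagster import executor
from dagster.core.executor.base import Executor  # internal path; may vary by version
from dagster.core.execution.retries import RetryMode  # internal path; may vary by version


class StandingPoolExecutor(Executor):
    """Hypothetical executor that hands steps to a fixed pool of long-lived
    workers instead of launching a fresh K8s Job/pod per step."""

    def __init__(self, pool_size):
        self._pool_size = pool_size

    @property
    def retries(self):
        # How step retries are handled; see the in-repo executors
        # (multiprocess, celery, k8s) for real implementations.
        return RetryMode.DISABLED

    def execute(self, plan_context, execution_plan):
        # Walk the execution plan, dispatch ready steps to the worker pool,
        # and yield DagsterEvents as steps start and finish - the in-repo
        # executors show how to drive this with ActiveExecution.
        raise NotImplementedError("sketch only")


@executor(name="standing_pool", config_schema={"pool_size": int})
def pooled_step_executor(init_context):
    return StandingPoolExecutor(init_context.executor_config["pool_size"])
```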