# deployment-kubernetes
r
Hi everyone. We've been using static Celery workers to run our steps (with the default run launcher). We tried migrating our jobs to CeleryK8sRunLauncher and CeleryK8sExecutor - it's working well, but we have many, many steps per job (because of fan-outs), and we don't want to spin up hundreds of K8s step jobs and pods per Dagster run. Our ideal setup would be to deploy a run worker (as CeleryK8sRunLauncher does) that launches a limited number of Celery K8s jobs to execute the run (the way the Celery executor does), with those jobs deleted after the run completes. It seems like CeleryK8sRunLauncher requires its matching K8s executor, though. Thanks
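For context, a minimal sketch of the setup described above, assuming the dagster-celery-k8s APIs (the op and job names here are made up):

```python
from dagster import job, op
from dagster_celery_k8s import celery_k8s_job_executor


@op
def process_chunk():
    ...


# CeleryK8sRunLauncher (configured on the instance / Helm chart) expects jobs
# to use celery_k8s_job_executor, which submits one Celery task per step and
# has the worker spin up a K8s Job + pod for each of those steps.
@job(executor_def=celery_k8s_job_executor)
def fan_out_job():
    process_chunk()
```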
d
Hi Roei - if all you care about is maintaining limits or priorities on the number of runs, are you sure that you need Celery at all? We have a built-in run queue that is designed to accomplish that goal, described here: https://docs.dagster.io/deployment/run-coordinator#limiting-run-concurrency
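For reference, the limits themselves (max_concurrent_runs / tag_concurrency_limits under the QueuedRunCoordinator) live in the instance's dagster.yaml as described at that link; the Python side is just tagging jobs. A minimal sketch, with an arbitrary tag key:

```python
from dagster import job, op


@op
def do_work():
    ...


# Tagging the job lets an instance-level tag_concurrency_limits entry
# (configured under run_coordinator in dagster.yaml) cap how many runs
# carrying this tag execute at once.
@job(tags={"queue": "heavy-fan-out"})
def heavy_fan_out_job():
    do_work()
```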
r
Hi @daniel, thanks for the response (as always 🙂). We use QueuedRunCoordinator and it's working great - we use it to limit runs. We're thinking about ditching Celery, as you said. We'd like to run a worker pod (as K8sRunLauncher does) that spawns a specific number of step pods - but we'd like those pods to stay alive and pull tasks (as Celery workers do for the Celery executor), lowering the isolation and overhead of deploying a pod per step (as we have many of them). Kind of a k8s_job_executor that polls for tasks and executes more than one step. Thanks again!
d
Ah I see, you want something like a standing pool of workers to run steps? I've seen that request come up in the context of run workers too
r
Yes! We kind of implemented that with a static pool of Celery workers, but they live as a Deployment, so it doesn't scale the way a per-job worker pool would. Regarding run workers - I think the RunCoordinator is enough for us
n
If you want to run our ancient fork of Dagster, that's the pattern we use 🙂
The issue is it requires a core change due to some threading problems
Not sure if that ever got fixed upstream
d
Noah, it's been a while so I don't remember the details - I thought your setup was more around a pool of run workers than step workers?
n
It's both, the actual solids are all just wrappers around celery too. https://github.com/geomagical/dagster-geomagical/blob/main/dagster_geomagical/definitions.py#L44-L50
ops, I should say now 🙂
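For anyone reading along, a generic sketch of that "ops as thin Celery wrappers" pattern (not the linked code itself - the app, broker URL, and task name here are hypothetical):

```python
from celery import Celery
from dagster import op

# Existing standing pool of Celery workers, deployed separately.
celery_app = Celery("workers", broker="pyamqp://guest@rabbitmq//")


@op
def run_remote_step(context):
    # Dagster only orchestrates: the op submits the real work to the
    # worker pool and blocks until the task finishes.
    result = celery_app.send_task("tasks.process_chunk", args=[context.run_id])
    return result.get()
```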
r
Thanks, @Noah K 🙂 But I think we'd rather stick with upstream Dagster. @daniel, is there anything that could help us "group" step executions into one container? Our next step would be changing the orchestration a bit by reducing the number of Ops per job, but that would limit us, and we'd prefer to solve this at the infra level. I'm sure we're not the only ones with hundreds of Ops per job 🙂 Thanks again!
d
We don't have anything like that today unfortunately, but it's a very reasonable feature request that's come up before
r
I see. How would you suggest implementing something like that?
d
This would probably involve writing your own executor: https://docs.dagster.io/deployment/executors#executors. You'd likely want to familiarize yourself with the existing executors in the Dagster codebase first.
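For reference, a very rough skeleton of what a custom executor definition looks like with the @executor API from those docs. The pooled-dispatch logic is only sketched in comments, the names (StandingPoolExecutor, pooled_step_executor) are hypothetical, and the internal import paths may differ between Dagster versions:

```python
from dagster import executor
from dagster.core.executor.base import Executor  # internal path; may vary by version
from dagster.core.execution.retries import RetryMode  # internal path; may vary by version


class StandingPoolExecutor(Executor):
    """Hypothetical executor that hands steps to a fixed pool of long-lived
    workers instead of launching a fresh K8s Job/pod per step."""

    def __init__(self, pool_size):
        self._pool_size = pool_size

    @property
    def retries(self):
        # How step retries are handled; see the in-repo executors
        # (multiprocess, celery, k8s) for real implementations.
        return RetryMode.DISABLED

    def execute(self, plan_context, execution_plan):
        # Walk the execution plan, dispatch ready steps to the worker pool,
        # and yield DagsterEvents as steps start and finish - the in-repo
        # executors show how to drive this with ActiveExecution.
        raise NotImplementedError("sketch only")


@executor(name="standing_pool", config_schema={"pool_size": int})
def pooled_step_executor(init_context):
    return StandingPoolExecutor(init_context.executor_config["pool_size"])
```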