# deployment-kubernetes

Stephen Bailey

06/08/2022, 5:35 PM
I'm getting into some situations where a bunch of hourly load jobs get kicked off, triggering auto-scaling and evicting the pods where my jobs are running. I'll get errors like this:
Step <op> finished without success or failure event. Downstream steps will not execute.
When I look at the job, I find:
Warning  TooManyActivePods  26m   job-controller  Too many active pods running after completion count reached
Still learning a good bit about k8s, but I'm wondering whether there's a way to tag the job pods as "do not destroy", or something to that effect?

rex

06/08/2022, 5:40 PM
I believe what we want is https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget. I don’t see why this shouldn’t be a built-in option in OSS and Cloud
I believe one of the downsides here is that you cannot drain a node (i.e. remove all the pods from the node) if you set maxUnavailable to 0 for any of those pods. You’ll have to wait until the job fully terminates before trying that manual drain
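For reference, a rough sketch of what such a PodDisruptionBudget could look like, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The name `dagster-run-pdb` and the label `app: dagster-run` are hypothetical placeholders, not something Dagster sets for you; the selector would have to match whatever labels your run pods actually carry. It also assumes a cluster that serves the `policy/v1` API.

```python
import json


def build_pdb(name, match_labels, max_unavailable=0):
    """Return a PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",  # assumes the cluster serves policy/v1
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # 0 means no matching pod may be evicted voluntarily
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": match_labels},
        },
    }


# Hypothetical name and label; replace with the labels on your run pods.
pdb = build_pdb("dagster-run-pdb", {"app": "dagster-run"})
print(json.dumps(pdb, indent=2))
```

Saving the output to a file and applying it should block evictions that go through the Eviction API (node drains, autoscaler scale-down) for matching pods. It does not protect against involuntary disruptions such as a node failure.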

Stephen Bailey

06/08/2022, 7:29 PM
thanks rex!