I'm running into situations where a bunch of hourly load jobs get kicked off at once, which triggers auto-scaling and evicts the pods where my jobs are running. I get errors like this:
```Step <op> finished without success or failure event. Downstream steps will not execute.```
When I look at the job, I find:
```  Warning  TooManyActivePods  26m   job-controller  Too many active pods running after completion count reached```
Still learning a good bit about k8s, but I'm wondering whether there's a way to tag the job pods as "do not destroy", or something to that effect?

I believe what we want is a PodDisruptionBudget: <https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget>. I don’t see why this shouldn’t be a built-in option in both OSS and Cloud.

I believe one of the downsides here is that you cannot drain a node (i.e. remove all the pods from it) if the PDB covering any of those pods sets `maxUnavailable` to 0. You’ll have to wait until the job fully terminates before attempting that manual drain.
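
For reference, here is a minimal sketch of what a zero-disruption PodDisruptionBudget could look like, written as a small Python script that prints the manifest as JSON (which `kubectl apply` accepts). The budget name, the `dagster` namespace, and the `app.kubernetes.io/component: run_worker` label are assumptions for illustration, so check the labels actually on your run pods with `kubectl get pods --show-labels` first. Two caveats: a PDB only guards against voluntary evictions such as autoscaler scale-down or `kubectl drain`, and as I read the Kubernetes docs, pods not owned by a Deployment, ReplicaSet, StatefulSet, or ReplicationController (Job pods included) are only officially supported with an integer `minAvailable`, so treat the `maxUnavailable: 0` form below as something to verify on your cluster.

```python
import json


def build_pdb(name, namespace, match_labels, max_unavailable=0):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict.

    With max_unavailable=0 the budget allows no voluntary evictions of the
    selected pods, which is also what blocks a node drain.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


if __name__ == "__main__":
    # All three values below are hypothetical; adjust them to your deployment.
    pdb = build_pdb(
        name="dagster-run-pdb",
        namespace="dagster",
        match_labels={"app.kubernetes.io/component": "run_worker"},
    )
    print(json.dumps(pdb, indent=2))
```

Saved as, say, `make_pdb.py`, the output can be piped straight into the cluster with `python make_pdb.py | kubectl apply -f -`.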