Mark Fickett
12/14/2022, 6:49 PM@op
cause a retry if the pod is OOMKilled? From the docs it looks like the retry is handling an exception from the op, and I'm not sure if that handling encapsulates the pod-termination check. I'm looking at an op that failed with "STEP_FAILURE Step my_step failed health check: Discovered failed Kubernetes job dagster-step-1b1e67c47da27410811b5606965fb1c4 for step my_step" and showed Status: OOMKilled in kubectl describe
but ran fine on retry w/ plenty of RAM headroom as far as I can tell.johann
12/19/2022, 3:32 PMMark Fickett
12/19/2022, 3:44 PM