#ask-community

Bolin Zhu

08/10/2022, 5:52 AM
Hi team! On dagster 0.13.7, we are encountering this error sporadically. Could you help us identify some potential root causes?

claire

08/10/2022, 4:16 PM
Hi Bolin. Which run launcher are you using? We've seen this error before when a k8s container restarts before a Dagster job has completed successfully.

Bolin Zhu

08/11/2022, 2:49 AM
@Yingqiu Lee

Yingqiu Lee

08/11/2022, 2:53 AM
we’re using the K8sRunLauncher

johann

08/11/2022, 5:30 PM
Yeah, further up in the events I imagine you’ll see another pod start and emit the RUN_STARTED event. That pod then died for whatever reason (e.g. the node spun down), and Kubernetes has a known issue where it will still try to restart a pod even if you disable retries on it. We don’t support k8s pods restarting like that, so we guard against it with this status check.
You’ll want to:
• investigate why the pod failed in the first place (too much resource usage? etc.)
• if you decide it’s an ephemeral failure, you could consider configuring automatic retries at the dagster level: https://docs.dagster.io/deployment/run-retries
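
For anyone reading later, the status check described above can be pictured roughly like this. It is only an illustrative sketch in plain Python, not Dagster's actual implementation: `RunStatus`, `RunAlreadyStartedError`, `start_run`, and the in-memory `runs` dict are all hypothetical names made up for the example.

```python
from enum import Enum


class RunStatus(Enum):
    """Hypothetical subset of run states, for illustration only."""
    NOT_STARTED = "NOT_STARTED"
    STARTED = "STARTED"


class RunAlreadyStartedError(Exception):
    """Raised when a pod tries to begin a run that has already started."""


def start_run(run_store: dict, run_id: str) -> None:
    """Begin a run only if no earlier pod has already started it."""
    status = run_store[run_id]
    if status is not RunStatus.NOT_STARTED:
        # A restarted pod lands here: the first pod already emitted
        # RUN_STARTED, so the run must not be executed a second time.
        raise RunAlreadyStartedError(
            f"Run {run_id} has status {status.value}; refusing to start it again"
        )
    run_store[run_id] = RunStatus.STARTED


# Stand-in for run storage (an assumption; real storage is a database).
runs = {"run-1": RunStatus.NOT_STARTED}

start_run(runs, "run-1")  # first pod: the run moves to STARTED

try:
    start_run(runs, "run-1")  # pod restarted by Kubernetes: rejected
    restart_blocked = False
except RunAlreadyStartedError:
    restart_blocked = True
```

The point of the sketch is that the second attempt fails loudly instead of silently re-running the job, which is why the error shows up only after the original pod has died and been restarted.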