I have a question: when the k8s pod is terminated or killed... | dagster #ask-community

Gatsby Lee

02/28/2023, 6:14 PM

I have a question. When the k8s pod is terminated or killed for some reason, how can I make the flow move forward and run the final op? I know how to handle a failure inside an op, but I am not sure how to handle an op being killed.

daniel

02/28/2023, 6:19 PM

Hey Gatsby - which k8s pod is being referred to here? Are you using the executor that runs each step in its own pod, and that step pod is getting killed, or is it the pod for the whole run that's getting killed? What's the reason for the pod being terminated? Autoscaling?

Gatsby Lee

02/28/2023, 6:24 PM

I guess it is terminated by autoscaling, although the k8s config sets this annotation:

"annotations": {"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"},
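For context, a minimal sketch of one place such an annotation could live in a Dagster deployment. It assumes the annotation is attached through the `dagster-k8s/config` tag under `pod_template_spec_metadata`; the thread does not say where it is actually configured, and the name `k8s_tags` is only illustrative.

```python
# Sketch only: one possible way to attach the annotation to pods launched
# for a Dagster job. Assumption: the "dagster-k8s/config" tag is used; the
# thread does not confirm where the annotation is really set.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"

k8s_tags = {
    "dagster-k8s/config": {
        "pod_template_spec_metadata": {
            # Annotation values are strings in Kubernetes, hence "false".
            "annotations": {SAFE_TO_EVICT: "false"},
        },
    },
}

# With dagster installed, the dict would be passed to the job definition,
# e.g. @job(tags=k8s_tags).
```

Note that this annotation only asks the cluster autoscaler not to evict the pod during scale-down. Other causes of termination, such as a preempted or failing node, are not affected by it.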

Gatsby Lee

02/28/2023, 6:25 PM

let me check which pod is killed.

Gatsby Lee

02/28/2023, 6:25 PM

( Thank you for your reply 😄 )

daniel

02/28/2023, 6:26 PM

Assuming the run failed, I think the easiest way is probably to set up run retries: https://docs.dagster.io/deployment/run-retries#run-retries - which will create a new run but will be able to pick up where it left off

Gatsby Lee

02/28/2023, 6:27 PM

Screen Shot 2023-02-28 at 10.27.12 AM.png

daniel

02/28/2023, 6:28 PM

Got it - it looks like it's the run pod that's getting interrupted then

Gatsby Lee

02/28/2023, 6:28 PM

Can I make it move forward even though the step is terminated?

daniel

02/28/2023, 6:28 PM

I don't think we currently have a way to do that, unfortunately - the retry strategy can be either FROM_FAILURE (which would start from the op that failed) or ALL_STEPS (which would re-run every step)
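A minimal sketch of how the retry strategy might be selected per job, assuming run retries are enabled on the instance as described in the run-retries docs. The tag keys `dagster/max_retries` and `dagster/retry_strategy` come from those docs; the helper `run_retry_tags` is a hypothetical name used only for illustration.

```python
# Sketch only: builds the tags that the run-retries feature reads from a job.
# Assumption: run retries are already enabled for the Dagster instance.
# `run_retry_tags` is a hypothetical helper, not part of Dagster.
VALID_STRATEGIES = {"FROM_FAILURE", "ALL_STEPS"}


def run_retry_tags(max_retries: int, strategy: str = "FROM_FAILURE") -> dict:
    """Return job tags requesting automatic retries of a failed run."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {sorted(VALID_STRATEGIES)}")
    return {
        "dagster/max_retries": max_retries,
        "dagster/retry_strategy": strategy,
    }


# Retry a failed run up to 3 times, resuming from the op that failed.
retry_tags = run_retry_tags(3)

# With dagster installed, the dict would be passed to the job definition,
# e.g. @job(tags=retry_tags).
```

Neither strategy skips the interrupted op and jumps to the final op; FROM_FAILURE re-executes the failed step in a new run before continuing.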

Gatsby Lee

02/28/2023, 6:28 PM

ic.

Gatsby Lee

02/28/2023, 6:29 PM

thank you!
