Mark Fickett
08/16/2022, 7:35 PM
k8s_job_executor, and the spot instances get interrupted, how will the executor respond -- retry the step on a new instance? Treat it as a failure? (Is that something I would expect to be handled at the k8s control plane, or the job executor?)
alex
08/16/2022, 7:38 PM
op / step pod interruption can be handled by setting a retry policy
https://docs.dagster.io/concepts/ops-jobs-graphs/op-retries#retrypolicy
run pod interruption can be handled by run level retries
https://docs.dagster.io/deployment/run-retries
Mark Fickett
08/16/2022, 7:40 PM
alex
08/16/2022, 7:41 PM