Mark Fickett

08/16/2022, 7:35 PM
If I use EKS on EC2 Spot instances with k8s_job_executor, and the spot instances get interrupted, how will the executor respond -- retry the step on a new instance? Treat it as a failure? (Is that something I would expect to be handled at the k8s control plane, or the job executor?)
alex

08/16/2022, 7:38 PM
op/step pod interruption can be handled by setting a retry policy: https://docs.dagster.io/concepts/ops-jobs-graphs/op-retries#retrypolicy
run pod interruption can be handled by run-level retries: https://docs.dagster.io/deployment/run-retries
Mark Fickett

08/16/2022, 7:40 PM
Thanks! So it will be seen as a failure from Dagster's point of view, but I can also use standard Dagster tools to make execution resilient to those failures.
alex

08/16/2022, 7:41 PM
ya, I would expect our features around this to continue to improve, but generally we want an explicit opt-in to retrying computations, since it's not a given that they are safe to redo if interrupted unexpectedly