# ask-community
a
Hello! I'm exploring best practices for retrying/re-executing failed ops and/or jobs. As I understand it, a rock-solid solution should include both:
• op retries within the same job run, and
• run retries for runs that fail nevertheless.
Then, if we still don't succeed after that, we want to use the run failure sensor to notify us and potentially start another re-execution via the Python API. Does this sound sensible, or are there better approaches, given that we can't afford to leave anything failed, nor to accidentally re-execute an op that has already been re-executed successfully?
c
Hi Arsenii, that sounds like a sensible approach to me. You can configure run retries to re-execute from failure, which ensures that ops that already succeeded are not re-executed.
a
OK, thank you! I'll probably have more questions during the implementation 🙂