# ask-community
Hi Dagster Team! I am currently trying to make my Dagster deployment more resilient to transient network issues. We use the Celery executor and it has been working great for us, but when the connection to the broker is lost, even for a few seconds, the whole run fails with an exception. I have been looking at Run Retries, which may definitely help here, but we already have logic to handle failures in a failure hook we consistently attach to all our ops. Obviously, we do not want the Run Retry logic to retry jobs where the failure hook has had the chance to run. Is there a way to mark a job from inside the hook to ensure Dagster will not retry it? Alternatively, I have been looking into implementing my own run failure sensor that would simply re-dispatch the same job, but I do not think the retry count is accessible from the sensor context, and I do not want to retry failed jobs in an infinite loop. How can I set up Dagster so it retries only the jobs that failed in such a way that my failure hook did not run?
Thanks in advance!
Oh, I just had a look at the Run Retries logic and how it keeps track of the retry count by using a tag on the run. Clever!
Unless there is a native solution, I guess I will implement my own run failure sensor and do something similar.
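Something like this, in plain Python; the tag name `my/retry_number` is just an example I made up, not a Dagster system tag, and the Dagster plumbing around it is omitted:

```python
from typing import Optional

# Example tag name for illustration only -- not a Dagster system tag.
RETRY_TAG = "my/retry_number"


def next_retry_tags(run_tags: dict, max_retries: int = 3) -> Optional[dict]:
    """Return the tags to attach to a re-dispatched run, or None if the
    retry budget is exhausted (so we never retry in an infinite loop)."""
    count = int(run_tags.get(RETRY_TAG, "0"))
    if count >= max_retries:
        return None  # budget exhausted: the sensor should not re-dispatch
    return {**run_tags, RETRY_TAG: str(count + 1)}
```

The sensor would read the failed run's tags, call this, and re-submit the job with the returned tags (or skip when it gets `None`).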
Anyone please? 🥲
Hi VxD. Yep, there currently isn't a way to customize run retries to not run in certain situations.
I think your run failure sensor solution sounds reasonable--within your hook, you could add a tag via `instance.add_run_tags`, and in the run failure sensor, check whether that tag exists.
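The decision logic boils down to something like this (the tag name `failure-hook-ran` is just an example; the actual tagging would go through the Dagster instance, which is omitted here):

```python
# Example tag name for illustration only.
HOOK_TAG = "failure-hook-ran"


def mark_handled(run_tags: dict) -> dict:
    """What the failure hook would record on the run
    (e.g. by adding a tag through the Dagster instance)."""
    return {**run_tags, HOOK_TAG: "true"}


def should_run_retry(run_tags: dict) -> bool:
    """What the run failure sensor would check before re-dispatching:
    only retry runs the failure hook never got a chance to mark."""
    return HOOK_TAG not in run_tags
```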
OK, will do! Thanks Claire for the help! Much appreciated. :dagster-angel:
Note that `add_run_tags` isn't a public API currently, which means it is meant for internal use and can change unexpectedly. You could file a feature request to petition for adding it to the public API.
Sure! In the meantime it's OK, we have our own DB where we keep the status of jobs separately.