# ask-community
VxD:
Hi Dagster Team! I am currently trying to make my Dagster deployment more resilient to transient network issues. We use the `dagster_celery` executor and it has been working great for us, but when the connection to the broker is lost even for a few seconds, the whole run fails with a `DagsterExecutionInterruptedError` exception. I have been looking at Run Retries, which could definitely help here, but we already have logic to handle failures in a `failure_hook` that we consistently attach to all our ops. Obviously, we do not want the Run Retry logic to retry jobs where the failure hook has had a chance to run. Is there a way to mark a job from inside the hook so that Dagster will not retry it?
Alternatively, I have been looking into implementing my own `@run_failure_sensor` that would simply re-dispatch the same job, but I do not think the retry count is accessible from the `RunFailureSensorContext`, and I do not want to retry failed jobs in an infinite loop. How can I set up Dagster so that it retries only those jobs that failed without my `failure_hook` running?
Thanks in advance!
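For context, here is a minimal sketch of the kind of setup being described, assuming the standard `dagster_celery` executor and a job-level failure hook. The op, hook, and job names are illustrative, not from the actual deployment:

```python
from dagster import HookContext, failure_hook, job, op
from dagster_celery import celery_executor

# Illustrative failure hook; the real deployment's handling logic goes here.
@failure_hook
def my_failure_hook(context: HookContext):
    context.log.error(f"Op {context.op.name} failed in run {context.run_id}")
    # ... alerting / bookkeeping ...

@op
def my_op():
    ...

# hooks={...} attaches the hook to every op in the job;
# celery_executor dispatches op executions to Celery workers.
@job(executor_def=celery_executor, hooks={my_failure_hook})
def my_job():
    my_op()
```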
Oh, I just had a look at the `auto_run_reexecution` logic and how it keeps track of the retry count by using a tag on the run. Clever!
Unless there is a native solution, I guess I will implement my own run failure sensor and do something similar.
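A minimal sketch of that approach, assuming `run_failure_sensor` accepts a `request_job` argument (as `run_status_sensor` does) so the sensor can launch new runs; `my_job` refers to the job from the sketch above, and the retry-count tag name is made up for illustration:

```python
from dagster import RunFailureSensorContext, RunRequest, run_failure_sensor

MAX_RETRIES = 3
RETRY_COUNT_TAG = "my_team/retry_count"  # made-up tag name, not a built-in Dagster tag

# Assumes run_failure_sensor accepts request_job so the sensor may launch new
# runs of the failed job; my_job is the job from the sketch above.
@run_failure_sensor(request_job=my_job)
def retry_on_failure(context: RunFailureSensorContext):
    run = context.dagster_run
    retry_count = int(run.tags.get(RETRY_COUNT_TAG, "0"))
    if retry_count >= MAX_RETRIES:
        return  # bail out instead of retrying in an infinite loop
    yield RunRequest(
        run_key=f"{run.run_id}-retry-{retry_count + 1}",  # dedupes repeated evaluations
        run_config=run.run_config,
        # Carrying over the rest of the original run's tags is omitted for brevity.
        tags={RETRY_COUNT_TAG: str(retry_count + 1)},
    )
```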
Anyone please? 🥲
Claire:
Hi VxD. Yep, there currently isn't a way to customize run retries to not run in certain situations.
I think your run failure sensor solution sounds reasonable. Within your hook, you could add a tag via `context.instance.add_run_tags(...)`, and in the run failure sensor, check whether that tag exists.
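A minimal sketch of that suggestion, with a made-up marker tag; note the caveat about `add_run_tags` later in the thread:

```python
from dagster import (
    HookContext,
    RunFailureSensorContext,
    failure_hook,
    run_failure_sensor,
)

HOOK_RAN_TAG = "my_team/failure_hook_ran"  # made-up marker tag

@failure_hook
def my_failure_hook(context: HookContext):
    # ... existing failure-handling logic ...
    # Mark the run so the retry sensor knows this failure was already handled.
    # Caveat: add_run_tags is not a public API (see the note below).
    context.instance.add_run_tags(context.run_id, {HOOK_RAN_TAG: "true"})

@run_failure_sensor
def retry_unhandled_failures(context: RunFailureSensorContext):
    if context.dagster_run.tags.get(HOOK_RAN_TAG) == "true":
        return  # the hook ran, so do not retry this run
    # ... otherwise re-dispatch the run, e.g. as in the earlier sketch ...
```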
VxD:
OK, will do! Thanks for the help, Claire! Much appreciated. :dagster-angel:
Claire:
Though `add_run_tags` isn't a public API currently, which means it is meant for internal use and may change unexpectedly. You could file a feature request to petition for adding it to the public API.
VxD:
Sure! In the meantime it's OK; we have our own DB where we keep the status of jobs separately.