Hi! I currently use a `run_failure_sensor` that monitors all dagster #ask-community

Jordan

03/20/2023, 9:34 PM

Hi! I currently use a `run_failure_sensor` that monitors all my repositories every 30 seconds to notify me. I would like to add a retry system for certain errors only (selected via a regex on `context.failure_event.message`), with a delay between the error and the next execution (for example 10 minutes after the first failure, 1 hour after the second failure, etc.).

The difficulty is scheduling the executions. I had thought of storing the ids of the failed runs in the sensor cursor, then checking at each tick whether a retry should be launched or should wait for a later tick. Since `run_failure_sensor` is a predefined sensor in dagster, I don't have control over its cursor, so I guess I need to define my own sensor that can do this. Maybe there are other alternatives that are easier to set up?

I was going to use `EventRecordsFilter` to list the most recent runs. I notice that it has `after_timestamp` and `after_cursor` parameters. Is one of the two preferable in my case? I would also like to make sure that the `EventRecordsFilter` query does not take too long, especially when the sensor is initialized or when I have to catch up on a lot of runs because the sensor was disabled. Maybe there is a way to limit the number of runs processed to a few tens or hundreds per tick so that the sensor does not reach its maximum time limit.

I had a quick look at the source code to understand how to define my own sensor, but I'm having a bit of trouble identifying the necessary parts, especially since I'd like to avoid using dagster-specific functions that might change in future releases. Do you have a strategy you would advise for implementing this? Thanks in advance
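To make the cursor idea above concrete, here is a rough sketch in plain Python of the bookkeeping it would need. It deliberately uses no dagster calls. The regex, the `BACKOFF_SECONDS` table, the `MAX_PER_TICK` cap and the function names `register_failure` and `pop_due` are all hypothetical and only illustrate the 10 minute / 1 hour example; the cursor is assumed to be a JSON string.

```python
import json
import re
from typing import List, Tuple

# Hypothetical settings: only the 10 min / 1 h delays come from the example above.
RETRYABLE = re.compile(r"timeout|connection reset", re.IGNORECASE)
BACKOFF_SECONDS = [600, 3600]  # delay after the 1st failure, then after the 2nd
MAX_PER_TICK = 100             # cap on retries handed out in a single tick


def register_failure(cursor: str, run_id: str, message: str,
                     attempt: int, failed_at: float) -> str:
    """Record a failed run in the cursor if its message matches the regex
    and the attempt number still has a delay defined for it."""
    state = json.loads(cursor) if cursor else {}
    if RETRYABLE.search(message) and attempt <= len(BACKOFF_SECONDS):
        state[run_id] = {
            "attempt": attempt,
            "due": failed_at + BACKOFF_SECONDS[attempt - 1],
        }
    return json.dumps(state, sort_keys=True)


def pop_due(cursor: str, now: float) -> Tuple[List[str], str]:
    """Return the run ids whose delay has elapsed (oldest first, capped at
    MAX_PER_TICK) together with the cursor that keeps the remaining ones."""
    state = json.loads(cursor) if cursor else {}
    due = sorted(
        (run_id for run_id, info in state.items() if info["due"] <= now),
        key=lambda run_id: state[run_id]["due"],
    )[:MAX_PER_TICK]
    for run_id in due:
        del state[run_id]
    return due, json.dumps(state, sort_keys=True)
```

In a custom sensor, the string would presumably be read from `context.cursor` and written back with `context.update_cursor(...)` at the end of each tick, with one run request emitted per id returned by `pop_due`. Runs that are not yet due simply stay in the cursor until a later tick, and the cap keeps a single tick short when there is a backlog.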
