When a run fails, I'd love a way in Dagit to quickly filter t... dagster #dagster-feedback

Mark Fickett

03/05/2023, 12:52 AM

When a run fails, I'd love a way in Dagit to quickly filter to the logs that tell me why. In recent examples, this means (a) event logs of type run_failure and step_failure, and (b) error logs, ideally filtered to related steps. (It continues to trip me up that `RUN_FAILURE` is not shown when I flip the `error` toggle.)

chris

03/06/2023, 5:38 PM

I’ve also found this annoying when debugging stuff - I feel like there should definitely be a one-click button to filter to just error events rather than having to switch off all other events. Added an issue bc I also feel quite strongly about this: https://github.com/dagster-io/dagster/issues/12741

Mark Fickett

03/23/2023, 1:48 PM

It's hard to find the logs for a step failure, similar to a run failure. (Updating here since chris' ticket got closed.) Our oncall engineer, who has not been as deeply involved with Dagster setup, was debugging a job that failed. I had left instructions to check `type:RUN_FAILURE`. But in this case (I think a k8s pod crashed with an OOM) there was a `type:STEP_FAILURE`. So they didn't know to look for that specific non-error log type. Then when one step fails, all the downstream steps fail with the same log type, so you can't just filter to the one real error. It would be great to make it automatic to find the error message associated with whatever made a job fail.
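
As a rough illustration of the "find whatever made the job fail" idea, here is a minimal sketch. It assumes the run's events have been exported as plain dicts with hypothetical `timestamp`, `type`, `step_key`, and `message` fields. This is not Dagster's API, and the sample events are made up. It treats both `RUN_FAILURE` and `STEP_FAILURE` as failures and picks the earliest one, on the assumption that downstream failures are logged later than the step that actually broke.

```python
from typing import Dict, List, Optional

# Event types treated as failures (names taken from the thread above).
FAILURE_TYPES = {"RUN_FAILURE", "STEP_FAILURE"}


def first_failure(events: List[Dict]) -> Optional[Dict]:
    """Return the earliest failure event, or None if nothing failed.

    Assumes each event is a dict with hypothetical keys
    'timestamp', 'type', 'step_key', and 'message'.
    """
    failures = [e for e in events if e["type"] in FAILURE_TYPES]
    return min(failures, key=lambda e: e["timestamp"], default=None)


# Made-up sample events, for illustration only.
events = [
    {"timestamp": 1.0, "type": "STEP_START", "step_key": "load", "message": "started"},
    {"timestamp": 2.0, "type": "STEP_FAILURE", "step_key": "load", "message": "pod OOM"},
    {"timestamp": 3.0, "type": "STEP_FAILURE", "step_key": "transform", "message": "upstream failed"},
    {"timestamp": 4.0, "type": "RUN_FAILURE", "step_key": None, "message": "run failed"},
]

root = first_failure(events)
if root is not None:
    print(root["step_key"], root["message"])
```

Picking by timestamp is only a heuristic: it would point at the wrong step if failures were logged out of order.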

chris

04/14/2023, 1:38 AM

Hey getting to this a bit late - but I think your analysis here makes sense. Error surfacing is definitely something we’re thinking about a lot right now - and I think this is a very realistic pain point we can do more to address. Will put it on my queue to raise this
