# dagster-feedback
m
When a run fails, I'd love a way in Dagit to quickly filter to the logs that tell me why. In recent examples, this means (a) event logs of type run_failure and step_failure, and (b) error logs, ideally filtered to related steps. (It continues to trip me up that RUN_FAILURE is not shown when I flip the error toggle.)
c
I’ve also found this annoying when debugging stuff - I feel like there should definitely be a one button click to filter just to error events rather than have to switch off all other events. Added an issue bc I also feel quite strongly about this: https://github.com/dagster-io/dagster/issues/12741
m
It's hard to find the logs for a step failure, similar to a run failure. (Updating here since Chris's ticket got closed.) Our oncall engineer, who has not been as deeply involved with our Dagster setup, was debugging a job that failed. I had left instructions to check type:RUN_FAILURE. But in this case (I think a k8s pod crashed with an OOM) there was a type:STEP_FAILURE instead, so they didn't know to look for that specific non-error log type. And when one step fails, all the downstream steps fail with the same log type, so you can't just filter to the one real error. It would be great if finding the error message associated with whatever made a job fail were automatic.
c
Hey getting to this a bit late - but I think your analysis here makes sense. Error surfacing is definitely something we’re thinking about a lot right now - and I think this is a very realistic pain point we can do more to address. Will put it on my queue to raise this