#dagster-support

Bryan Chavez

01/15/2022, 2:28 AM
In a `@run_failure_sensor` context, is there a way to get the original stack trace of the exception? The following only indicates the name of the step that failed:
```python
context.failure_event.message
```

sandy

01/18/2022, 4:11 PM
@yuhan - do you know the answer to this?

yuhan

01/18/2022, 5:35 PM
`context.failure_event.event_specific_data.error` provides the error details.

Bryan Chavez

01/19/2022, 3:59 PM
That returns `None` when I print it out.

yuhan

01/19/2022, 5:13 PM
Do you mind sharing the error info shown in the run's logs? Sensors run outside the job execution process, so that `error` field is a serialized representation of the exception: it is extracted from the original exception by a custom serialization helper and stored in the event log database, and that process may not capture all the information for some exception types.
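To make "serialized exception representation" concrete, here is a minimal stdlib-only sketch that copies a live exception into a plain record: a message, a list of formatted stack frames, the class name, and the chained cause. `ErrorRecord` and `serialize_exception` are hypothetical names; the record only loosely mirrors the fields Dagster's stored error appears to carry (the log output later in this thread shows a message, a stack trace and a chained cause) and is not Dagster's actual helper.

```python
import traceback
from typing import List, NamedTuple, Optional


class ErrorRecord(NamedTuple):
    """Hypothetical plain-data stand-in for a serialized exception."""

    message: str  # e.g. "Exception: test Exception\n"
    stack: List[str]  # one formatted string per traceback frame
    cls_name: str  # exception class name, e.g. "Exception"
    cause: Optional["ErrorRecord"]  # serialized __cause__, if any


def serialize_exception(exc: BaseException) -> ErrorRecord:
    """Copy the parts of a live exception that can be stored as plain strings."""
    message = "".join(traceback.format_exception_only(type(exc), exc))
    stack = traceback.format_tb(exc.__traceback__)
    cause = (
        serialize_exception(exc.__cause__) if exc.__cause__ is not None else None
    )
    # Everything else (the exception object, custom attributes, local variables)
    # is dropped, which is why some details can be lost for unusual exception types.
    return ErrorRecord(message, stack, type(exc).__name__, cause)
```

In this sketch, keeping only strings is what would let the record be stored in a database and read back by a separate process such as a sensor.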

Bryan Chavez

01/19/2022, 9:47 PM
In the UI, it shows up with EVENT TYPE = "STEP_FAILURE" and the following INFO:
```
dagster.core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "op_publish_to_snowflake":

The above exception was caused by the following exception:
Exception: test Exception

Stack Trace:
  File "C:\Users\<RETRACTED>\Envs\khde\lib\site-packages\dagster\core\execution\plan\utils.py", line 47, in solid_execution_error_boundary
    yield
  File "C:\Users\<RETRACTED>\Envs\khde\lib\site-packages\dagster\utils\__init__.py", line 387, in iterate_with_context
    next_output = next(iterator)
  File "C:\Users\<RETRACTED>\PycharmProjects\khc_data_eng_core\python_lib\khde\khde\dagster\dagster_op_utils.py", line 206, in op_publish_to_snowflake
    raise Exception("test Exception")
```

yuhan

01/19/2022, 9:49 PM
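Thank you! I'll try to reproduce this on my end. My hunch is that we may not be storing the stack trace in the DB, so we'll probably need to add that in the persistence layer.
Hi @Bryan Chavez, sorry for the late response.
```python
from dagster import DagsterEventType, run_failure_sensor


@run_failure_sensor
def my_sensor(context):
    run_id = context.dagster_run.run_id
    # all_logs returns a list of event log entries, one per STEP_FAILURE event in the run
    step_failure_events = context.instance.all_logs(
        run_id=run_id, of_type=DagsterEventType.STEP_FAILURE
    )
    for event in step_failure_events:
        # the step-level error (including its stack trace) is on the structured Dagster event
        print(event.dagster_event.event_specific_data.error)
```
This should get you the right stack trace.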
The `failure_event` you got from the sensor context is a pipeline failure event, which doesn't include the detailed step-level error info. So the code above instead fetches the step failure events from the event log using the run's `run_id`. Pipeline failure events don't record this information because multiple failures (e.g. two steps failing at once under multiprocessing) can together cause a single pipeline failure, so we can't tie the pipeline failure event to one step-level exception.
However, if the failure did happen at the pipeline level (e.g. the pipeline fails to start), a helpful stack trace will be available in `context.failure_event.event_specific_data.error`.
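Since one run failure can stem from several step failures, one option is to keep every step's error, keyed by step, instead of picking a single one. The sketch below is duck-typed so it runs without a Dagster instance: it assumes each log entry exposes `step_key` and the `dagster_event.event_specific_data.error` path used elsewhere in this thread, and `errors_by_step` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List


def errors_by_step(step_failure_entries: Iterable[Any]) -> Dict[str, List[Any]]:
    """Group step-level errors by step key; one failed run can have several failed steps."""
    grouped: Dict[str, List[Any]] = {}
    for entry in step_failure_entries:
        # Assumed attribute path: entry.dagster_event.event_specific_data.error.
        # getattr keeps the sketch tolerant of entries missing any link in the chain.
        event = getattr(entry, "dagster_event", None)
        data = getattr(event, "event_specific_data", None)
        error = getattr(data, "error", None)
        if error is None:
            continue  # no structured error info on this entry
        step_key = getattr(entry, "step_key", None) or "<unknown step>"
        grouped.setdefault(step_key, []).append(error)
    return grouped
```

In a sensor, the input would be the list returned by `context.instance.all_logs(run_id, of_type=DagsterEventType.STEP_FAILURE)`; check that the entries in your Dagster version carry `step_key` before relying on the grouping.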

Bryan Chavez

01/22/2022, 1:27 AM
So I'm assuming this would capture all scenarios:
```python
error = context.failure_event.event_specific_data.error
if not error:
    # no run-level error: fall back to the step failure events of this run
    run_id = context.dagster_run.run_id
    event_logs = context.instance.all_logs(
        run_id=run_id,
        of_type=DagsterEventType.STEP_FAILURE,
    ) or []
    error = '<br><br>'.join([e.message for e in event_logs])
```

yuhan

01/22/2022, 1:47 AM
I believe so!

Bryan Chavez

03/15/2022, 11:15 PM
It looks like this implementation is no longer returning the stack trace. Has something changed with the `.message` attribute of step failure event logs?

yuhan

03/15/2022, 11:35 PM
I did a quick grep of the codebase and don't see any obvious recent change. Which version were you on before, and which version are you using now?

Bryan Chavez

03/16/2022, 4:26 PM
Currently on version 0.14.3.
I'm not sure when it stopped working; I only noticed that it's now returning the generic message. I did find this commit: https://github.com/dagster-io/dagster/commit/ea19544fcac1b965aaed34882bff0a2d3055936a
Any feedback on this? I generally just need the stack trace of the exception, not the generic Dagster messages.
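In the log shown earlier, the step-failure error is a Dagster wrapper (`DagsterExecutionStepExecutionError`) whose cause is the user's own exception. One way to keep only the user's stack trace is to follow the `cause` chain to its innermost link. This is a duck-typed sketch assuming the stored error exposes `message`, `stack` (a list of formatted frames) and `cause`; `innermost_error` and `format_user_traceback` are hypothetical helper names, so verify the attribute names against your Dagster version.

```python
from typing import Any, Optional


def innermost_error(error: Optional[Any]) -> Optional[Any]:
    """Follow .cause links to the deepest error (usually the user's own exception)."""
    while error is not None and getattr(error, "cause", None) is not None:
        error = error.cause
    return error


def format_user_traceback(error: Optional[Any]) -> str:
    """Return the innermost error's message followed by its stack, or '' if there is none."""
    root = innermost_error(error)
    if root is None:
        return ""
    stack = "".join(getattr(root, "stack", None) or [])
    # message first, then the frames, mirroring the layout of the UI output above
    return f"{root.message.rstrip()}\n{stack}".rstrip()
```

Applied to a step-failure error, this would skip the generic wrapper message and return only the `Exception: test Exception` part with its frames.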

yuhan

03/21/2022, 5:00 PM
Hi! Sorry, this thread slipped through my support tracking. Looking now.
Hi @Bryan Chavez, what about `.user_message`? Can you get the same error info from that property?

Bryan Chavez

03/22/2022, 12:49 AM
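Yeah, that provided the same error. I had to drill down, but I was able to find it:
```python
# NUMBER_OF_ERRORS is assumed to be defined elsewhere: the max number of step errors to include
run_id = context.dagster_run.run_id
errors = ""  # initialize so the fallback check below never hits an undefined name
step_failure_event_logs = (
    context.instance.all_logs(
        run_id=run_id,
        of_type=DagsterEventType.STEP_FAILURE,
    )
    or []
)
if step_failure_event_logs:
    errors = '<br><br>'.join(
        [
            str(e.dagster_event.event_specific_data.error)
            for e in step_failure_event_logs[:NUMBER_OF_ERRORS]
        ]
    )
if not errors:
    # no step-level errors: the failure happened at the run level (e.g. the run failed to start)
    errors = str(context.failure_event.event_specific_data.error)
```
But I'd like to know why the behavior changed, and to confirm that the above captures all the scenarios.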

yuhan

03/22/2022, 3:27 AM
That should capture all the scenarios. The behavior changed because we were double-logging the same info in both the `user_message` and `message` properties, which used unnecessary storage. The change removed the redundant copy, so you should still be able to get the same info, but through `user_message`.
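Following that explanation, a defensive way to read the error off one step-failure log entry is to prefer the structured `dagster_event.event_specific_data.error` (what Bryan's final snippet uses) and fall back to the entry's `user_message`, instead of relying on `.message`. This is a duck-typed sketch; `error_text` is a hypothetical helper name and the attribute names are the ones mentioned in this thread.

```python
from typing import Any


def error_text(entry: Any) -> str:
    """Best-effort error text for one step-failure log entry."""
    event = getattr(entry, "dagster_event", None)
    data = getattr(event, "event_specific_data", None)
    error = getattr(data, "error", None)
    if error is not None:
        # Structured error info: str() of it is what Bryan's snippet joins together.
        return str(error)
    # Fallback: the human-readable message stored on the log entry itself.
    return getattr(entry, "user_message", "") or ""
```

Joining `error_text(e)` over the entries returned by `all_logs(..., of_type=DagsterEventType.STEP_FAILURE)` would then degrade to the generic message only when no structured error was stored.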
@Dagster Bot discussion Get the original stack trace of the exception in run_failure_sensor

Dagster Bot

06/16/2022, 8:03 AM
Question in the thread has been surfaced to GitHub Discussions for future discoverability: https://github.com/dagster-io/dagster/discussions/8428