
Spencer Nelson

03/14/2023, 6:48 PM
I have a job using the k8s Run Launcher and Multiprocess Executor, which often fails because of a DagsterExecutionInterruptedError. This isn’t because of a termination request; it’s just a mysterious failure because something is killing my process. Maybe it’s out of memory? I’d like advice on how to debug this sort of thing - how can I get more insight into the reason for a failure like this?

johann

03/14/2023, 8:46 PM
Hi Spencer, I created a gh discussion with some debug steps https://github.com/dagster-io/dagster/discussions/12943

Spencer Nelson

03/14/2023, 11:34 PM
Thanks. It turns out the pod was evicted because the node was low on ephemeral storage. This took a lot of investigation to figure out; it would be terrific if dagster-k8s could present more information on failure. I’ll see if I can come up with a more concrete feature request.
It also turns out that the Pod’s status is marked as “Succeeded”, which seems quite wrong; I’m pretty sure this is a bug (or missing feature, at least) in dagster-k8s
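For anyone hitting the same thing, here is a rough sketch of the kind of check that would have surfaced the eviction sooner. It assumes you have the pod's JSON (for example from `kubectl get pod <pod-name> -o json`) loaded as a dict; `diagnose_pod` is a hypothetical helper name, not part of dagster-k8s. It looks only at standard Kubernetes pod status fields: `status.reason` / `status.message` for evictions, and each container's `terminated` state for OOM kills and non-zero exits.

```python
def diagnose_pod(pod: dict) -> list[str]:
    """Return human-readable hints about why a pod stopped (hypothetical helper)."""
    hints = []
    status = pod.get("status") or {}

    # Evicted pods carry reason "Evicted" plus a message naming the scarce
    # resource (e.g. ephemeral-storage or memory).
    if status.get("reason") == "Evicted":
        hints.append(f"evicted: {status.get('message', 'no message')}")

    for cs in status.get("containerStatuses") or []:
        # Prefer the current terminated state; fall back to the previous one
        # if the container has since been restarted.
        terminated = (cs.get("state") or {}).get("terminated") or (
            cs.get("lastState") or {}
        ).get("terminated")
        if not terminated:
            continue
        name = cs.get("name", "unknown")
        reason = terminated.get("reason")
        exit_code = terminated.get("exitCode")
        if reason == "OOMKilled":
            hints.append(f"container {name}: OOMKilled (exit code {exit_code})")
        elif exit_code not in (0, None):
            hints.append(f"container {name}: exited with code {exit_code} ({reason})")

    return hints
```

To use it, something like `kubectl get pod <pod-name> -o json > pod.json` followed by `diagnose_pod(json.load(open("pod.json")))` should work, as long as the pod object has not been garbage-collected yet. An empty list means these fields show nothing suspicious, not that the run was healthy.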

johann

03/15/2023, 3:40 PM
for surfacing more debug info to the UI. Something that’s on our plate for sure