I have a job using the k8s Run Launcher, and Multi...
# ask-community
s
I have a job using the k8s Run Launcher, and Multiprocess Executor, which often fails because of a DagsterExecutionInterruptedError. This isn’t because of a termination request; it’s just a mysterious failure because something is killing my process. Maybe it’s out of memory? I’d like advice on how to debug this sort of thing - how can I get more inspection on the reason for a failure like this?
dagster bot resolve to discussion 1
j
Hi Spencer, I created a gh discussion with some debug steps https://github.com/dagster-io/dagster/discussions/12943
s
Thanks. It turns out the pod was evicted because the node was low on ephemeral storage. This took a lot of investigation to figure out; it would be terrific if dagster-k8s could present more information on failure. I’ll see if I can come up with a more concrete feature request.
It also turns out that the Pod’s status is marked as “Succeeded”, which seems quite wrong; I’m pretty sure this is a bug (or missing feature, at least) in dagster-k8s.
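For anyone hitting the same thing, here is a rough sketch of how the eviction reason could be pulled out of the Pod object by hand. It is not part of dagster-k8s; the helper name `summarize_pod_failure` and the sample pod below are made up for illustration. It assumes you have saved the output of `kubectl get pod <pod-name> -o json` while the pod still exists, and it only reads standard Kubernetes fields (`status.phase`, `status.reason`, `status.message`, and each container's `state.terminated`).

```python
def summarize_pod_failure(pod: dict) -> list[str]:
    """Collect failure hints from a Pod object parsed from `kubectl get pod -o json`.

    Hypothetical helper for manual debugging; not a dagster-k8s API.
    """
    status = pod.get("status", {})
    hints = [f"phase={status.get('phase', 'Unknown')}"]

    # The kubelet sets a pod-level reason/message on eviction (reason "Evicted").
    reason = status.get("reason")
    if reason:
        hints.append(f"pod reason={reason}; message={status.get('message', '')}")

    # Container-level termination details, e.g. reason "OOMKilled" with exit code 137.
    for cs in status.get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        if terminated:
            hints.append(
                f"container {cs.get('name')} terminated: "
                f"reason={terminated.get('reason')}, exitCode={terminated.get('exitCode')}"
            )
    return hints


# Made-up sample resembling an evicted pod; real output will differ.
sample_pod = {
    "status": {
        "phase": "Failed",
        "reason": "Evicted",
        "message": "The node was low on resource: ephemeral-storage.",
        "containerStatuses": [
            {"name": "dagster", "state": {"terminated": {"reason": "Error", "exitCode": 137}}}
        ],
    }
}

for line in summarize_pod_failure(sample_pod):
    print(line)
```

If the pod has already been cleaned up, `kubectl get events` in the namespace may still show the eviction for a while, since events are retained separately from the pod.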
j
for surfacing more debug info to the UI. Something that’s on our plate for sure