# ask-community
Hi team, we have a dagster job stuck in the “starting” phase. We see “[K8sRunLauncher] Kubernetes run worker job created” but then nothing else, even after a day. I couldn’t find the job’s pod in our k8s cluster though. Do you know what might have caused this and how to debug it, please?
We are on 0.14.5.
Hi Hebo - if the pod isn't there, what about the k8s job? Does "kubectl get jobs | grep <job name>" show anything?
Thanks Daniel! Yes, the job exists, with Pods Statuses: 0 Running / 0 Succeeded / 1 Failed
If you describe the job are there any clues?
Other than “Pods Statuses: 0 Running / 0 Succeeded / 1 Failed”, it doesn’t have much info.
Let me also check with our compute team.
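For anyone following along, the checks discussed above (describe the job, look for its pod, look for clues) can be collected into a small script. This is only a sketch: the job name `dagster-run-abc` and the namespace `dagster` are hypothetical placeholders, and it assumes `kubectl` is on the PATH and already pointed at the right cluster. It relies on the `job-name` label that Kubernetes adds to pods created by a Job. Events and failed pods may already have been cleaned up if the run was launched long ago, so empty output does not rule anything out.

```python
import subprocess


def debug_commands(job_name, namespace="default"):
    """Build kubectl invocations for inspecting a failed Kubernetes Job."""
    ns = ["--namespace", namespace]
    return [
        # Job status, conditions, and any events still attached to the Job.
        ["kubectl", "describe", "job", job_name, *ns],
        # Pods created by the Job; Kubernetes labels them with job-name=<job>.
        ["kubectl", "get", "pods", "--selector", f"job-name={job_name}",
         "--output", "wide", *ns],
        # Cluster events that reference the Job object by name.
        ["kubectl", "get", "events", "--field-selector",
         f"involvedObject.name={job_name}", *ns],
        # Logs from a pod of the Job, if one still exists.
        ["kubectl", "logs", f"job/{job_name}", "--all-containers", *ns],
    ]


def run_all(job_name, namespace="default"):
    """Run each command and print its output (stderr if stdout is empty)."""
    for argv in debug_commands(job_name, namespace):
        print("$", " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

Calling `run_all("dagster-run-abc", namespace="dagster")` with your real job name and namespace prints the output of each command in turn.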
I'm having the same issue. Recently upgraded to 0.14.19, running on EKS. This problem appeared only after the upgrade, but I'm not sure it's related.
Here's the output when describing the K8s job:
I launched a new instance of the same run from the Launchpad, and now I see the pod actually gets scheduled, but fails immediately on launch. I don't know what to make of that stack trace.
I see that my user-code image is actually running dagster 0.13.18, which explains why that happened. I'll upgrade the dagster dependency in the image and report back.
Hey assaf - that looks like a mismatch between your dagster-postgres package and your dagster package, yeah. There should be a pin that keeps them in lockstep, but it looks like that might not be getting respected.
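A quick way to spot this kind of mismatch is to print the installed versions inside the user-code image. The sketch below is an illustration, not part of Dagster: the helper names are made up, the package list is an assumption about what the image installs, and it assumes that dagster and its libraries share one version number, as they did for the 0.13.x and 0.14.x releases mentioned in this thread. It needs Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def is_mismatched(versions):
    """True if the installed packages do not all share one version string."""
    present = {v for v in versions.values() if v is not None}
    return len(present) > 1


# Hypothetical package list; adjust to whatever your image installs.
versions = installed_versions(["dagster", "dagster-postgres", "dagster-k8s"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
print("version mismatch" if is_mismatched(versions) else "versions consistent")
```

Running this in both the user-code image and the image that launches runs, then comparing the output, should show whether the two environments have drifted apart.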