Mohammad Nazeeruddin

12/16/2021, 1:30 PM
Hi team, when we execute pipelines, the k8s jobs get stuck and no pod is created. How can we debug this? Can anyone help me with this issue?

daniel

12/16/2021, 1:43 PM
Does describing the job in kubectl give any clues? kubectl describe job <job name here>
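For reference, the same check can be scripted. This is a minimal sketch in Python that assumes two things taken from this conversation rather than from Dagster documentation: that the Job name follows the `dagster-run-<run ID>` pattern quoted later in the thread, and that runs are launched in the `dagster` namespace. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def dagster_job_name(run_id: str) -> str:
    # ASSUMPTION: pattern inferred from the job name quoted in this thread
    # ("dagster-run-<run id>"), not a documented guarantee.
    return f"dagster-run-{run_id}"


def describe_job_command(run_id: str, namespace: str = "dagster") -> list[str]:
    """Build the `kubectl describe job` invocation for a Dagster run."""
    return ["kubectl", "describe", "job", dagster_job_name(run_id), "-n", namespace]


def describe_job(run_id: str, namespace: str = "dagster") -> str:
    """Run kubectl and return stdout on success, or stderr (e.g. a NotFound error)."""
    result = subprocess.run(
        describe_job_command(run_id, namespace),
        capture_output=True,
        text=True,
        check=False,  # a missing Job is a result to inspect, not a crash
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

The Events section at the bottom of the describe output is usually where a Job that exists but never started a pod explains itself. A NotFound error instead means there is no Job object with that name in that namespace at all.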

Mohammad Nazeeruddin

12/16/2021, 1:44 PM
No k8s pods were created, so there is nothing to describe.

daniel

12/16/2021, 1:45 PM
This would be the job, not the pod. I updated the command I wrote above.

Mohammad Nazeeruddin

12/16/2021, 1:47 PM
[root@e88ddc17ffaf centos]# kubectl describe job dagster-run-8d02d3ff-56ca-4240-9b97-3bed5f170e0d -n dagster
Error from server (NotFound): jobs.batch "dagster-run-8d02d3ff-56ca-4240-9b97-3bed5f170e0d" not found
The job was not created, so there is nothing to check. It only shows up in the Dagit UI.

daniel

12/16/2021, 1:51 PM
Does kubectl get jobs show a job with that run ID in its name somewhere?
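One way to answer that without knowing the namespace is to list Jobs in every namespace as JSON and filter by run ID. Below is a hedged Python sketch. Searching all namespaces is an added assumption, in case the run was launched somewhere other than `dagster`. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def jobs_matching_run(jobs_json: str, run_id: str) -> list[tuple[str, str]]:
    """Return (namespace, name) for every Job whose name contains run_id.

    jobs_json is the output of `kubectl get jobs --all-namespaces -o json`.
    """
    matches = []
    for item in json.loads(jobs_json).get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        if run_id in name:
            matches.append((meta.get("namespace", ""), name))
    return matches


def list_all_jobs_json() -> str:
    """Fetch every Job in the cluster (requires kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "jobs", "--all-namespaces", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage: `jobs_matching_run(list_all_jobs_json(), "8d02d3ff-56ca-4240-9b97-3bed5f170e0d")`. An empty list means no Job with that run ID exists anywhere the current kubectl context can see.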

Mohammad Nazeeruddin

12/16/2021, 1:53 PM
Run ID: 8d02d3ff-56ca-4240-9b97-3bed5f170e0d
Are you talking about this run ID?

daniel

12/16/2021, 1:54 PM
That's right

Mohammad Nazeeruddin

12/16/2021, 1:57 PM
Okay, will this ID help with debugging?

daniel

12/16/2021, 1:59 PM
If you describe the job, it might explain why it didn't create a pod.
Dagster created a Kubernetes job (https://kubernetes.io/docs/concepts/workloads/controllers/job/), which should normally create a pod.
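Pods that a Kubernetes Job creates carry an `ownerReferences` entry of kind `Job` that points back at that Job. You can use this to check whether a given Job has produced any pods at all. Here is a minimal Python sketch over the output of `kubectl get pods -n <namespace> -o json`. The function name is hypothetical.

```python
from __future__ import annotations

import json


def pods_for_job(pods_json: str, job_name: str) -> list[str]:
    """Return names of pods whose ownerReferences point at the given Job.

    pods_json is the output of `kubectl get pods -n <namespace> -o json`.
    """
    owned = []
    for pod in json.loads(pods_json).get("items", []):
        meta = pod.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            # The Job controller records itself as the owner of each pod it creates.
            if ref.get("kind") == "Job" and ref.get("name") == job_name:
                owned.append(meta.get("name", ""))
                break
    return owned
```

An empty result for a Job that does exist suggests the Job controller never managed to create a pod. In that case, the events in `kubectl describe job` should say why (for example, a quota or admission error).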

Mohammad Nazeeruddin

12/16/2021, 2:04 PM
But the problem is that it's not creating any job to describe. I got this error because the job/pod was not created: Error from server (NotFound): jobs.batch "dagster-run-8d02d3ff-56ca-4240-9b97-3bed5f170e0d" not found. I only had the dagit, daemon and repos pods.

daniel

12/16/2021, 2:06 PM
What command did you run to make that list?
Jobs are different from pods; you would need to run ‘kubectl get jobs’ to generate the list of jobs.
It would be surprising if there were no job, given the event log message that says it created a job successfully.

Mohammad Nazeeruddin

12/16/2021, 2:09 PM
Yeah, the Dagit UI shows that the job was created, but I don't know what's happening; it doesn't show up in the terminal. I ran kubectl describe job dagster-run-8d02d3ff-56ca-4240-9b97-3bed5f170e0d -n dagster and kubectl get jobs -n dagster, and got: No resources found
Sometimes I get this error: `dagster.core.errors.DagsterExecutionInterruptedError`
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/api.py", line 748, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/usr/local/lib/python3.8/site-packages/dagster/core/executor/in_process.py", line 38, in execute
    yield from iter(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/api.py", line 822, in __iter__
    yield from self.execution_context_manager.prepare_context()
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 447, in generate_setup_events
    obj = next(self.generator)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/context_creation_pipeline.py", line 282, in execution_context_event_generator
    yield from resources_manager.generate_setup_events()
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 447, in generate_setup_events
    obj = next(self.generator)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/resources_init.py", line 227, in resource_initialization_event_generator
    yield from _core_resource_initialization_event_generator(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/resources_init.py", line 160, in _core_resource_initialization_event_generator
    for event in manager.generate_setup_events():
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 447, in generate_setup_events
    obj = next(self.generator)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/resources_init.py", line 285, in single_resource_event_generator
    with user_code_error_boundary(
  File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/errors.py", line 181, in user_code_error_boundary
    with raise_execution_interrupts():
  File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/errors.py", line 151, in raise_execution_interrupts
    with raise_interrupts_as(DagsterExecutionInterruptedError):
  File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/interrupts.py", line 73, in raise_interrupts_as
    raise error_cls()
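The last frames show the error being raised when a context manager is entered (`raise_interrupts_as` in `dagster/utils/interrupts.py`). The general pattern of turning a termination signal into a Python exception can be sketched as follows. This is an illustration, not Dagster's implementation: the class and function names are hypothetical, and it only handles SIGTERM, which is the signal Kubernetes sends to a container when it stops a pod.

```python
import signal
from contextlib import contextmanager


class ExecutionInterrupted(Exception):
    """Hypothetical stand-in for an error such as DagsterExecutionInterruptedError."""


@contextmanager
def raise_on_sigterm(error_cls=ExecutionInterrupted):
    """Convert SIGTERM into error_cls while the block runs.

    Must be used from the main thread: signal.signal() raises ValueError elsewhere.
    """
    def handler(signum, frame):
        raise error_cls()

    previous = signal.signal(signal.SIGTERM, handler)
    try:
        yield
    finally:
        # Restore the old handler; None means it was not set from Python.
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
```

Inside a pod, wrapping the work in `with raise_on_sigterm():` turns a SIGTERM into a catchable exception that can be logged. SIGKILL, which Kubernetes sends if the container is still running after the termination grace period, cannot be caught at all.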

daniel

12/16/2021, 2:18 PM
Sorry where is that error appearing?

Mohammad Nazeeruddin

12/16/2021, 2:22 PM
I got this during pipeline execution:
dagster.core.errors.DagsterExecutionInterruptedError

daniel

12/16/2021, 2:22 PM
That would only happen inside the pod. I thought there was no pod.
m

Mohammad Nazeeruddin

12/16/2021, 2:24 PM
Yes, no pod.

daniel

12/16/2021, 2:24 PM
Something doesn't add up here: the only thing that could have logged that message is code running inside the pod.
It seems like your cluster might be overloaded or otherwise having trouble spinning up pods or keeping them up? That error would happen if the cluster terminated your pod.
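To test that theory, check pod status, which usually records why a pod could not start or was stopped. Below is a rough Python sketch that scans `kubectl get pods -n <namespace> -o json` output for a few common signals:

- an unschedulable Pending pod
- an eviction
- an OOM kill
- exit code 143 (128 + 15, SIGTERM) or 137 (128 + 9, SIGKILL)

These checks are a hand-picked subset, not an exhaustive diagnosis, and the function names are hypothetical.

```python
from __future__ import annotations

import json


def diagnose_pod(pod: dict) -> str | None:
    """Return a short description of a likely problem, or None if none is found."""
    status = pod.get("status", {})
    if status.get("reason") == "Evicted":
        return "evicted: " + status.get("message", "")
    if status.get("phase") == "Pending":
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                return "unschedulable: " + cond.get("message", "")
        return "pending: no scheduling failure reported yet"
    for cs in status.get("containerStatuses", []):
        # Look at the current state first, then the previous run of the container.
        term = cs.get("state", {}).get("terminated") or cs.get("lastState", {}).get("terminated")
        if not term:
            continue
        if term.get("reason") == "OOMKilled":
            return "OOMKilled: container exceeded its memory limit"
        code = term.get("exitCode")
        if code == 143:
            return "exit code 143 (128 + 15): stopped by SIGTERM"
        if code == 137:
            return "exit code 137 (128 + 9): killed by SIGKILL"
        if code not in (0, None):
            return f"exit code {code}"
    return None


def pod_problems(pods_json: str) -> dict[str, str]:
    """Map pod name to a diagnosis for every pod that shows a likely problem."""
    problems = {}
    for pod in json.loads(pods_json).get("items", []):
        diagnosis = diagnose_pod(pod)
        if diagnosis is not None:
            problems[pod.get("metadata", {}).get("name", "")] = diagnosis
    return problems
```

A pod that disappears entirely (for example, because its Job was deleted) will not show up here, so this complements the Job-level checks rather than replacing them.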

Mohammad Nazeeruddin

12/16/2021, 2:40 PM
Okay