# ask-community

Sundara Moorthy

02/06/2023, 4:34 PM
Hi team, we are currently hitting dagster._core.errors.DagsterExecutionInterruptedError on random runs of the repository. Note: Dagster is deployed on GKE, Dagster version 1.1.14. Any help with this issue?
```
dagster._core.errors.DagsterExecutionInterruptedError
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/api.py", line 991, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/executor/in_process.py", line 50, in execute
    output_capture=plan_context.output_capture,
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/api.py", line 1104, in __iter__
    pipeline_context=self.pipeline_context,
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 115, in inner_plan_execution_iterator
    dagster_event_sequence_for_step(step_context)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 349, in dagster_event_sequence_for_step
    raise interrupt_error
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 265, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 383, in core_dagster_event_sequence_for_step
    _step_output_error_checked_user_event_sequence(step_context, user_event_sequence)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 94, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/compute.py", line 177, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/compute.py", line 154, in _yield_compute_results
    user_event_generator,
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 457, in iterate_with_context
    next_output = next(iterator)
```

Sean Davis

02/06/2023, 4:37 PM
You might want to look at the Kubernetes Jobs that were created on GKE, using Kubernetes tooling (kubectl), to see whether they were killed (for example, because they ran out of memory).
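One way to follow up on that suggestion without reading through `kubectl describe` output is to dump the pod status as JSON (`kubectl get pod <pod name> -o json`) and look for a termination reason. The sketch below assumes that JSON is already available as a string. The field names (`status.reason`, `status.containerStatuses[].state.terminated`, `lastState`) come from the Kubernetes Pod API; the helper names and the sample data are hypothetical. Roughly: `OOMKilled` with exit code 137 points at memory, while a pod-level `Evicted` reason or exit code 143 (SIGTERM) points at the node or cluster shutting the pod down.

```python
import json
import signal


def describe_exit_code(code):
    """Exit codes above 128 conventionally mean 'killed by signal (code - 128)'."""
    if code is not None and code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:  # signal number not known on this platform
            return f"killed by signal {code - 128}"
    return f"exited with status {code}"


def pod_termination_summary(pod_json):
    """Summarize why a pod stopped, given `kubectl get pod <name> -o json` output."""
    status = json.loads(pod_json).get("status", {})
    findings = []
    # Pod-level reason, e.g. "Evicted" when the kubelet evicts the pod under node pressure.
    if status.get("reason"):
        findings.append(f"pod: {status['reason']} ({status.get('message', 'no message')})")
    for cs in status.get("containerStatuses", []):
        # "state" is the current state; "lastState" holds the previous one after a restart.
        for key in ("state", "lastState"):
            term = (cs.get(key) or {}).get("terminated")
            if term:
                findings.append(
                    f"{cs['name']} [{key}]: reason={term.get('reason')}, "
                    f"{describe_exit_code(term.get('exitCode'))}"
                )
    return findings


# Hypothetical, trimmed-down pod JSON for illustration only.
sample_pod = json.dumps({
    "status": {
        "containerStatuses": [
            {"name": "dagster", "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
        ]
    }
})
for line in pod_termination_summary(sample_pod):
    print(line)  # dagster [state]: reason=OOMKilled, killed by SIGKILL
```

An empty result means the pod status recorded no termination at all, which by itself does not rule out the pod having been deleted along with its node.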

Sundara Moorthy

02/06/2023, 4:51 PM
```
dagster - ERROR - job_12253f12af72 - 06655061-95eb-45bb-adbb-273558952891 - 1 - RUN_FAILURE - Execution of run for "job_12253f12af72" failed. Execution was interrupted unexpectedly. No user initiated termination request was found, treating as failure.
```
There was no memory issue, and if I re-trigger the run, it works fine.
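An interruption with no memory error and no user termination request is consistent with the pod being shut down gracefully: Kubernetes sends SIGTERM to the container's main process on eviction, node drain, or node preemption (for example, on GKE preemptible or Spot nodes), and SIGKILL only after a grace period. Dagster's run worker appears to treat such a termination signal as an interrupt, which would match the "Execution was interrupted unexpectedly" message. The following is a generic, standard-library-only illustration of that mechanism, not Dagster's own code; `ExecutionInterrupted` and `run_step` are hypothetical names, and it only works on POSIX systems.

```python
import os
import signal
import time


class ExecutionInterrupted(Exception):
    """Hypothetical stand-in for an 'execution interrupted' error."""


def _on_sigterm(signum, frame):
    # Python runs signal handlers in the main thread, so raising here
    # aborts whatever the main thread is executing at that moment.
    raise ExecutionInterrupted(f"received {signal.Signals(signum).name}")


def run_step():
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        # Simulate the kubelet terminating the pod mid-step (POSIX only).
        os.kill(os.getpid(), signal.SIGTERM)
        time.sleep(5)  # stand-in for the step's remaining work
        return "step finished"
    except ExecutionInterrupted as exc:
        return f"interrupted: {exc}"
    finally:
        # Restore whatever SIGTERM handler was installed before.
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)


print(run_step())  # interrupted: received SIGTERM
```

On Kubernetes, the window between SIGTERM and the final SIGKILL is the pod's terminationGracePeriodSeconds, which defaults to 30 seconds.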

Sean Davis

02/06/2023, 5:02 PM
All good questions. I don't have good answers for you... 😟

Sundara Moorthy

02/06/2023, 6:40 PM
Okay. Any leads on who can help me with this?

sean

02/08/2023, 12:27 PM
Hi Sundara, this isn’t my wheelhouse, but I’ve put out a call for a more knowledgeable colleague to help here.

johann

02/08/2023, 4:24 PM
Hi Sundara, the Pod that the run was executing in shut down for some reason. The Node it was on may have been overloaded or shut down, it could have been an OOM, etc. You’ll want to look into your K8s cluster to find the reason. You should be able to find the Kubernetes Job name in the event logs for your run. With that, you can execute:
```
kubectl describe job <job name>
```
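`kubectl describe job` prints this information for a human; to check many runs at once, the same data is available as JSON via `kubectl get job <job name> -o json`. A minimal sketch, assuming that JSON is already in hand: it reads the Job's `status.failed` and `status.succeeded` pod counts and any active `Failed` condition, whose `reason` is typically `BackoffLimitExceeded` or `DeadlineExceeded`. The function name and sample data are hypothetical.

```python
import json


def job_failure_summary(job_json):
    """Summarize a Job from `kubectl get job <job name> -o json` output."""
    status = json.loads(job_json).get("status", {})
    summary = {
        # Number of this Job's pods that reached phase Failed / Succeeded.
        "failed_pods": status.get("failed", 0),
        "succeeded_pods": status.get("succeeded", 0),
        "failure_reasons": [],
    }
    for cond in status.get("conditions", []):
        # A condition's status is the string "True", "False", or "Unknown".
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            summary["failure_reasons"].append(f"{cond.get('reason')}: {cond.get('message')}")
    return summary


# Hypothetical, trimmed-down Job JSON for illustration only.
sample_job = json.dumps({
    "status": {
        "failed": 1,
        "conditions": [
            {
                "type": "Failed",
                "status": "True",
                "reason": "BackoffLimitExceeded",
                "message": "Job has reached the specified backoff limit",
            }
        ],
    }
})
print(job_failure_summary(sample_job))
```

A `Failed` condition only says that the Job gave up, not why its pod stopped, so the pod itself still needs inspecting. The Job controller labels its pods with `job-name=<job name>`, so `kubectl get pods -l job-name=<job name>` lists them, as long as they have not already been deleted.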