# ask-community
r
Hi! We encountered a weird race condition. Some jobs fail at random (every few days) with the following error:
dagster._check.CheckError: Invariant failed. Description: Attempted to mark step <op_name> as complete that was not known to be in flight
  File "/somepath/lib/python3.9/site-packages/dagster/_core/execution/api.py", line 1089, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/somepath/lib/python3.9/site-packages/dagster/_core/executor/step_delegating/step_delegating_executor.py", line 220, in execute
    active_execution.handle_event(dagster_event)
  File "/somepath/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 411, in handle_event
    self.mark_success(step_key)
  File "/somepath/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 346, in mark_success
    self._mark_complete(step_key)
  File "/somepath/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 387, in _mark_complete
    check.invariant(
  File "/somepath/lib/python3.9/site-packages/dagster/_check/__init__.py", line 1627, in invariant
    raise CheckError(f"Invariant failed. Description: {desc}")
Of course the step itself was successful. In that case, the Op itself fired a `STEP_SUCCESS` event at `22:58:01.190`, while we got the error message from the run manager at `22:58:02.332`. The Op itself took about a minute. We use both `K8sRunLauncher` and `K8sExecutor`. Thanks!
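For anyone not familiar with the check in the traceback, here is a minimal sketch of the kind of invariant involved. It is a toy model and not Dagster's actual code: `ToyActiveExecution`, `InvariantError`, and `demo` are hypothetical names made up for illustration. The duplicate-event scenario is only one way such a check could trip, not a confirmed cause of the failure above.

```python
class InvariantError(Exception):
    """Hypothetical stand-in for a failed invariant check."""


class ToyActiveExecution:
    """Toy tracker of which steps are currently in flight (illustration only)."""

    def __init__(self):
        self._in_flight = set()
        self._completed = set()

    def mark_in_flight(self, step_key):
        # Record that the step has been launched and is now running.
        self._in_flight.add(step_key)

    def mark_success(self, step_key):
        # A step can only be completed if it was previously marked in flight.
        if step_key not in self._in_flight:
            raise InvariantError(
                f"Attempted to mark step {step_key} as complete "
                "that was not known to be in flight"
            )
        self._in_flight.remove(step_key)
        self._completed.add(step_key)


def demo():
    """Process a success event twice for the same step and return the error text."""
    execution = ToyActiveExecution()
    execution.mark_in_flight("my_op")
    execution.mark_success("my_op")  # first success event is accepted
    try:
        execution.mark_success("my_op")  # second event finds the step no longer in flight
    except InvariantError as exc:
        return str(exc)
    return None
```

In this toy model, the second success event for the same step raises because the step has already been removed from the in-flight set. An event arriving before the step was ever marked in flight would fail the same way.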
c
which version of dagster are you using?
r
Hi @chris, happened again on the latest