# ask-community
o
Seeing some issues with retries and dynamic fan-outs. First I get this:
dagster._core.definitions.events.RetryRequested
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 224, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 357, in core_dagster_event_sequence_for_step
    for user_event in check.generator(
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 69, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 174, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn):
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 142, in _yield_compute_results
    for event in iterate_with_context(
  File "/usr/local/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 432, in iterate_with_context
    return
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 87, in solid_execution_error_boundary
    raise RetryRequested(
The above exception was caused by the following exception:
dagster._core.errors.DagsterExecutionInterruptedError
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
    yield
  File "/usr/local/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 430, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 73, in _coerce_solid_compute_fn_to_iterator
    result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
  File "/opt/datarwe_nlp/pipelines/freetext_ner_inference/data.py", line 164, in preprocessed_texts_op
    p = multiprocessing.Pool(6)
  File "/usr/local/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 73, in _launch
    os._exit(code)
  File "/usr/local/lib/python3.9/site-packages/dagster/_utils/interrupts.py", line 74, in _new_signal_handler
    raise error_cls()
and then a little later
dagster._check.CheckError: Invariant failed. Description: Attempted to mark step preprocessed_texts.preprocessed_texts_op[8] as complete that was not known to be in flight
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/api.py", line 1035, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/executor/step_delegating/step_delegating_executor.py", line 220, in execute
    active_execution.handle_event(dagster_event)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 402, in handle_event
    self.mark_success(step_key)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 346, in mark_success
    self._mark_complete(step_key)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/execution/plan/active.py", line 387, in _mark_complete
    check.invariant(
  File "/usr/local/lib/python3.9/site-packages/dagster/_check/__init__.py", line 1470, in invariant
    raise CheckError(f"Invariant failed. Description: {desc}")
And in dagit the step is marked as complete
I have worked around this by disabling multiprocessing and decreasing the batch size to keep the runtime equivalent. This results in more step overhead, which ends up being quite a bit with the k8s executor. I can't figure out why the `Pool()` init is sometimes crashing. I can't reproduce it locally at all, and it doesn't even seem to get caught by exception handling.
Is retry behaviour supported in dynamic graphs? If not, is the multiprocessing the cause of the strange errors?
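To make the trade-off concrete, here is a rough sketch of the batching side of the workaround. The helper `iter_batches` and the sample data are hypothetical, not the actual pipeline code. It only shows that a smaller batch size means more batches, and so more dynamically mapped steps, each paying its own step-launch overhead.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` texts (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# Illustrative data only: ten placeholder documents.
texts = [f"doc-{i}" for i in range(10)]

# Larger batches: fewer steps, more work per step.
large = list(iter_batches(texts, 5))

# Smaller batches: more steps, so more per-step overhead on the executor.
small = list(iter_batches(texts, 2))
```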
a
`DagsterExecutionInterruptedError` indicates that the process was sent a signal to stop. Most likely this was Kubernetes terminating the container, potentially for something like moving the pod for autoscaling. We try to set the properties on the job spec to tell k8s not to restart the pod, but some autoscaling systems cause it to restart anyway, which is my best guess for the cause of the second error, the invariant violation.
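As a rough illustration of the mechanism (not Dagster's actual implementation): when a signal handler raises an exception, the exception surfaces wherever the main thread happens to be at that moment. That would explain a traceback ending inside `Pool()` even though nothing in the pool code itself failed. The names `InterruptedBySignal` and `run_step_body` below are hypothetical stand-ins, and the sketch assumes a POSIX system.

```python
import os
import signal
import time


class InterruptedBySignal(Exception):
    """Hypothetical stand-in for an interruption error raised by a signal handler."""


def _raise_on_signal(signum, frame):
    # Runs in the main thread, on top of whatever code was executing.
    raise InterruptedBySignal(f"received signal {int(signum)}")


def run_step_body() -> str:
    """Pretend op body that gets interrupted partway through."""
    previous = signal.signal(signal.SIGTERM, _raise_on_signal)
    try:
        # Simulate the orchestrator terminating the container.
        os.kill(os.getpid(), signal.SIGTERM)
        # Unrelated work; the exception surfaces at or just after the os.kill call.
        time.sleep(0.1)
        return "finished"
    except InterruptedBySignal as exc:
        return f"interrupted: {exc}"
    finally:
        # Restore the previous handler so the sketch has no side effects.
        signal.signal(signal.SIGTERM, previous)


result = run_step_body()
```

In this sketch the body never reaches `return "finished"`. The interruption comes from outside the code being run, which fits with it not reproducing locally.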
o
ahh ok, thanks