Tried out the new k8s_job_op and ran into a couple...
# ask-community
s
Tried out the new k8s_job_op and ran into a couple of things:
1. If you're running with a service account, you need to make sure it has permissions to spin up new jobs. It would be nice to document exactly which permissions are required.
2. The jobs are completing successfully from the point of view of the code, but I'm getting a failure in Dagster with the following error (e.g. when using the busybox example from the docs). It's very possible that this is related to the permissions issue mentioned above:
dagster.core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "k8s_job_op":

  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/execute_plan.py", line 224, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/execute_step.py", line 353, in core_dagster_event_sequence_for_step
    for user_event in check.generator(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/execute_step.py", line 69, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/compute.py", line 174, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn):
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/compute.py", line 142, in _yield_compute_results
    for event in iterate_with_context(
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 408, in iterate_with_context
    return
  File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/utils.py", line 73, in solid_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
dagster_k8s.client.DagsterK8sUnrecoverableAPIError: Unexpected error encountered in Kubernetes API Client.

  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
    yield
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 406, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/compute_generator.py", line 66, in _coerce_solid_compute_fn_to_iterator
    result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/backcompat.py", line 234, in _inner
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster_k8s/ops/k8s_job_op.py", line 211, in k8s_job_op
    wait_for_running_job_to_succeed(
  File "/usr/local/lib/python3.8/site-packages/dagster_k8s/utils.py", line 78, in wait_for_running_job_to_succeed
    return DagsterKubernetesClient.production_client().wait_for_running_job_to_succeed(
  File "/usr/local/lib/python3.8/site-packages/dagster_k8s/client.py", line 290, in wait_for_running_job_to_succeed
    status = k8s_api_retry(
  File "/usr/local/lib/python3.8/site-packages/dagster_k8s/client.py", line 112, in k8s_api_retry
    raise DagsterK8sUnrecoverableAPIError(
d
Hey Stephen, do you recall where that copy-paste was taken from? I think the most useful part of the error is one level deeper in the exception stack.
If this was in a cloud run and you have the run ID / URL handy, we could probably pull the relevant bit out of it.
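(Editor's aside: the point about the useful part of the error sitting deeper in the exception stack can be illustrated with plain Python exception chaining. This is only a sketch: `exception_chain`, `root_cause`, and `make_nested_error` are hypothetical helpers, and the three nested exceptions are placeholders, not the actual Dagster or Kubernetes error types or messages.)

```python
def exception_chain(exc):
    """Return the exceptions from the outermost error down to the innermost cause."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        # Prefer an explicit `raise ... from ...` cause, else the implicit context.
        exc = exc.__cause__ or exc.__context__
    return chain


def root_cause(exc):
    """Return the innermost exception in the chain."""
    return exception_chain(exc)[-1]


def make_nested_error():
    """Build a three-level chain similar in shape to the traceback in this thread.

    All three exceptions are placeholders chosen for illustration.
    """
    try:
        try:
            try:
                raise PermissionError("underlying API error (placeholder)")
            except PermissionError as api_err:
                raise RuntimeError("client wrapper error (placeholder)") from api_err
        except RuntimeError as client_err:
            raise Exception("step execution error (placeholder)") from client_err
    except Exception as step_err:
        return step_err


if __name__ == "__main__":
    for depth, err in enumerate(exception_chain(make_nested_error())):
        print(f"{depth}: {type(err).__name__}: {err}")
```

The traceback pasted above stops at the second level (the client wrapper), which is why the underlying cause is not visible in it.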
s
will dm you
d
Confirmed offline that this is likely a permissions issue. The default permissions that we give in the Dagster Helm chart (which we'll document for this as well, good call) are:
rules:
- apiGroups: ["batch"]
  resources: ["jobs", "jobs/status"]
  verbs: ["*"]
# The empty arg "" corresponds to the core API group
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["*"]
s
amazing, thank you @daniel -- confirmed that this fixed the issue.
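(Editor's aside: for anyone not using the Helm chart, the rules quoted above could be attached to a service account with a Role and RoleBinding. The sketch below only builds the manifests as JSON. The role name, namespace, and service account name are hypothetical placeholders, and `build_rbac` is an illustrative helper, not part of Dagster or dagster-k8s.)

```python
import json

# Rules quoted in the thread (the defaults from the Dagster Helm chart).
RULES = [
    {"apiGroups": ["batch"], "resources": ["jobs", "jobs/status"], "verbs": ["*"]},
    # The empty string "" corresponds to the core API group.
    {"apiGroups": [""], "resources": ["pods", "pods/log", "pods/status"], "verbs": ["*"]},
]


def build_rbac(namespace, service_account, role_name="k8s-job-op-role"):
    """Return a v1 List holding a Role and a RoleBinding for the service account.

    `role_name` is a made-up default; pick whatever fits your cluster.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": RULES,
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Placeholder names; substitute your own namespace and service account.
    print(json.dumps(build_rbac("my-namespace", "my-service-account"), indent=2))
```

The printed JSON should be usable with `kubectl apply -f -`. The wildcard verbs mirror the quoted defaults; a tighter verb list may be enough, but the thread does not say which verbs are strictly required.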