Hi everyone, I am using the <dagster_k8s.execute_k...
# ask-community
d
Hi everyone, I am using the dagster_k8s.execute_k8s_job function to launch a Kubernetes job from within an Op. However, the Dagster run fails after a while with this error:
kubernetes.client.exceptions.ApiException: (400) a container name must be specified for pod 42044fa42ab602a62517dfdbc8a0c5c7-v2jj6
This happens when Dagster tries to use the Kubernetes API to get the logs from that pod. I assume I am missing a Dagster config, namely one of the following (see image), but I don't know which one. Does anyone know how I can specify which container it should try to get logs from? I have inspected the code myself and found where the API calls are made, but I can't see how to specify the container name.
d
Hi Daniel - do you have a full stack trace for the error that you're seeing?
d
dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "deploy_op":
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 266, in dagster_event_sequence_for_step
for step_event in check.generator(step_events):
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 389, in core_dagster_event_sequence_for_step
_step_output_error_checked_user_event_sequence(step_context, user_event_sequence)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 94, in _step_output_error_checked_user_event_sequence
for user_event in user_event_sequence:
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/compute.py", line 177, in execute_core_compute
for step_output in _yield_compute_results(step_context, inputs, compute_fn):
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/compute.py", line 154, in _yield_compute_results
user_event_generator,
File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 460, in iterate_with_context
return
File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 91, in op_execution_error_boundary
) from e
The above exception was caused by the following exception:
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"a container name must be specified for pod 70c9eb1f1777cc40bd1089e1b391bd42-crfg5, choose one of: [dagster api axon-synapse]","reason":"BadRequest","code":400}\n'
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 56, in op_execution_error_boundary
yield
File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 458, in iterate_with_context
next_output = next(iterator)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/compute_generator.py", line 75, in _coerce_solid_compute_fn_to_iterator
result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
File "/opt/dagster/app/dagster_src/graphs/axon_graph.py", line 111, in deploy_op
execute_k8s_job(context, **context.op_config)
File "/usr/local/lib/python3.7/site-packages/dagster/_annotations.py", line 108, in inner
return target(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/dagster_k8s/ops/k8s_job_op.py", line 305, in execute_k8s_job
log_entry = next(log_stream)
File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 163, in stream
resp = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23880, in read_namespaced_pod_log_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
_preload_content, _request_timeout, _host)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
headers=headers)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 245, in GET
query_params=query_params)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 235, in request
raise ApiException(http_resp=r)
This is the stack trace, I think I need to specify a "default container". Currently trying it out 🙂
I can't get it to work 😕 What happens is that the Kubernetes job launched through the Op keeps running independently of the Dagster Run, while the Run fails due to this error. So I see two issues: 1. Fetching logs fails for a pod with multiple containers, since no container is specified. I don't know if this is a bug or a wrong configuration on my end. 2. The K8s job keeps running even if the Op that launched it fails. I would have assumed that the Op failing would kill the job too. This seems like a bug to me.
d
Hm, we might need to add a container_name config field for this purpose
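For anyone following along, here is a minimal sketch of what passing a container name to the log call could look like. It is not the code in dagster_k8s. The helper name `read_pod_log` and its `container_name` parameter are made up for illustration. The only real call it relies on is `read_namespaced_pod_log` from the Kubernetes Python client (the one in the stack trace above), which accepts an optional `container` argument.

```python
from typing import Any, Optional


def read_pod_log(
    core_api: Any,
    pod_name: str,
    namespace: str,
    container_name: Optional[str] = None,
) -> str:
    """Read logs from a pod, optionally from one named container.

    `core_api` is assumed to be a kubernetes.client.CoreV1Api instance.
    This helper and its `container_name` parameter are hypothetical.
    """
    kwargs = {}
    if container_name is not None:
        # Without this argument, the API server rejects the request with
        # a 400 when the pod has more than one container.
        kwargs["container"] = container_name
    return core_api.read_namespaced_pod_log(
        name=pod_name, namespace=namespace, **kwargs
    )
```

With the pod from the error above, the call would look something like `read_pod_log(api, pod_name, namespace, container_name="axon-synapse")`, using one of the container names listed in the 400 response.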
d
Okay so that would mean a feature request?
d
Yeah, kind of a feature request / bug fix blend - we’re at an offsite this week so a bit delayed but this should be a quick add, I imagine we can get it into the release next Thursday. As a workaround you could temporarily fork the op and alter it
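If you do fork the op, one thing a fork could also cover is the second issue from earlier in the thread (the job outliving the failed Op). The sketch below is only an assumption about how that might be done, not how dagster_k8s behaves. `delete_job_on_failure` and `job_name` are hypothetical, and you would need to know the name of the job that was created. The only real call it uses is `delete_namespaced_job` from the Kubernetes Python client.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def delete_job_on_failure(
    batch_api: Any, job_name: str, namespace: str
) -> Iterator[None]:
    """Delete a Kubernetes job if the wrapped block raises.

    `batch_api` is assumed to be a kubernetes.client.BatchV1Api instance.
    `job_name` is a placeholder for the name of the launched job.
    """
    try:
        yield
    except Exception:
        # Foreground propagation asks Kubernetes to delete the job's pods
        # before removing the job object itself.
        batch_api.delete_namespaced_job(
            name=job_name,
            namespace=namespace,
            propagation_policy="Foreground",
        )
        # Re-raise so the Op still fails with the original error.
        raise
```

Inside a forked op, the log-streaming part could then be wrapped in `with delete_job_on_failure(batch_api, job_name, namespace): ...`. Note that if the delete call itself fails, its exception replaces the original one, so a real implementation would probably want to handle that.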
d
Okay, thanks for the heads up 🙂 Should I open a ticket on Github then?
d
That would be great if you’re willing to do that
d
Hi Daniel, I've opened this ticket: https://github.com/dagster-io/dagster/issues/11853
Thanks for your help! 🙂