Carlos Pega
02/03/2023, 1:28 PM
daniel
02/03/2023, 1:30 PM
Carlos Pega
02/03/2023, 1:30 PM
dagster_k8s.client.DagsterK8sUnrecoverableAPIError: Unexpected error encountered in Kubernetes API Client.
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster/_core/executor/step_delegating/step_delegating_executor.py", line 248, in execute
    health_check_result = self._step_handler.check_step_health(
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster_k8s/executor.py", line 264, in check_step_health
    status = self._api_client.get_job_status(
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster_k8s/client.py", line 359, in get_job_status
    return k8s_api_retry(_get_job_status, max_retries=3, timeout=wait_time_between_attempts)
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster_k8s/client.py", line 114, in k8s_api_retry
    raise DagsterK8sUnrecoverableAPIError(
The above exception was caused by the following exception:
kubernetes.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '1c7464ab-3d48-4848-a736-47285ac72638', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 24 Jan 2023 13:51:45 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster_k8s/client.py", line 95, in k8s_api_retry
    return fn()
  File "/home/mehta/.local/lib/python3.9/site-packages/dagster_k8s/client.py", line 356, in _get_job_status
    job = self.batch_api.read_namespaced_job_status(job_name, namespace=namespace)
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/api/batch_v1_api.py", line 2785, in read_namespaced_job_status
    return self.read_namespaced_job_status_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/api/batch_v1_api.py", line 2872, in read_namespaced_job_status_with_http_info
    return self.api_client.call_api(
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/home/mehta/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
daniel
02/03/2023, 1:46 PM
Carlos Pega
02/03/2023, 1:50 PM
daniel
02/03/2023, 1:51 PM
Mark Fickett
02/03/2023, 2:04 PM
Carlos Pega
02/03/2023, 2:23 PM
daniel
02/03/2023, 2:27 PM
WHITELISTED_TRANSIENT_K8S_STATUS_CODES = [
    503,  # Service unavailable
    504,  # Gateway timeout
    500,  # Internal server error
]
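For context, a rough sketch of how a list like this could gate retries. This is not the actual dagster_k8s code: `FakeApiError` and `retry_on_transient` are hypothetical names, and the 401 entry is an assumption for illustration only.

```python
import time

# Status codes treated as transient. 500/503/504 mirror the list above;
# 401 is included purely as an illustrative assumption.
TRANSIENT_STATUS_CODES = {500, 503, 504, 401}


class FakeApiError(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status):
        super().__init__(f"API error ({status})")
        self.status = status


def retry_on_transient(fn, max_retries=3, wait_seconds=0.0):
    """Call fn(), retrying up to max_retries times on transient status codes."""
    attempts_left = max_retries
    while True:
        try:
            return fn()
        except FakeApiError as err:
            # Non-transient errors, or exhausted retries, propagate to the caller.
            if err.status not in TRANSIENT_STATUS_CODES or attempts_left <= 0:
                raise
            attempts_left -= 1
            time.sleep(wait_seconds)
```

With this shape, a call that hits a couple of spurious 401s and then succeeds returns normally, while a status outside the set (for example 403) is raised on the first attempt.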
We could potentially add 401 here as well, as a workaround for that EKS bug (assuming that's what it is), but generally retrying on a 401 wouldn't help, since that typically indicates a permissions issue.
Carlos Pega
02/03/2023, 2:31 PM
daniel
02/03/2023, 2:31 PM
Carlos Pega
02/03/2023, 6:49 PM
Keith Gross
02/21/2023, 7:03 PM
Carlos Pega
03/06/2023, 12:32 PM
• [dagster-k8s] Fixed an issue where pods launched by the k8s_job_executor would sometimes unexpectedly fail due to transient 401 errors in certain Kubernetes clusters.
Keith Gross
04/18/2023, 8:05 PM