Kirk Stennett
04/04/2022, 6:07 PM
dbt_cloud_run_op
and then it ultimately fails when it tries to retry a second time with something like:
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'X', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'X', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'X', 'Date': 'Sun, 03 Apr 2022 14:31:49 GMT', 'Content-Length': '284'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"dagster-step-fcb024c52f02ea006fb2b73294153771-2\" not found","reason":"NotFound","details":{"name":"dagster-step-fcb024c52f02ea006fb2b73294153771-2","group":"batch","kind":"jobs"},"code":404}
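The response body above is a Kubernetes Status object, so the name of the job that was not found can be read straight from it. A minimal sketch, assuming the body is available as a JSON string; the helper name `missing_job_name` is hypothetical and not part of dagster or the Kubernetes client:

```python
import json
from typing import Optional


def missing_job_name(body: str) -> Optional[str]:
    """Return the resource name from a Kubernetes NotFound Status body, else None."""
    status = json.loads(body)
    if status.get("code") != 404 or status.get("reason") != "NotFound":
        return None
    # "details" carries the kind and name of the resource that was not found.
    return status.get("details", {}).get("name")


# Body copied from the error above.
body = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"jobs.batch \\"dagster-step-fcb024c52f02ea006fb2b73294153771-2\\" not found",'
    '"reason":"NotFound","details":{"name":"dagster-step-fcb024c52f02ea006fb2b73294153771-2",'
    '"group":"batch","kind":"jobs"},"code":404}'
)
print(missing_job_name(body))
```

For this error the lookup returns the job with the `-2` suffix, which is the one being read when the 404 is raised.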
It always happens if a job fails twice but is set to retry a few times. In this case it had a job for dagster-step-fcb024c52f02ea006fb2b73294153771 and dagster-step-fcb024c52f02ea006fb2b73294153771-1. Any idea why it couldn't retry more than once?
dagster-step and dagster-step-1 are deleted while dagster-step-2 is triggered to run?
johann
04/04/2022, 6:48 PM
Kirk Stennett
04/04/2022, 8:05 PM
johann
04/04/2022, 8:13 PM
Kirk Stennett
04/04/2022, 8:25 PM
johann
04/04/2022, 8:48 PM
Kirk Stennett
04/04/2022, 9:02 PM
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 785, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/step_delegating/step_delegating_executor.py", line 217, in execute
    plan_context, [step], active_execution
  File "/usr/local/lib/python3.7/site-packages/dagster_k8s/executor.py", line 230, in check_step_health
    job = self._batch_api.read_namespaced_job(namespace=self._job_namespace, name=job_name)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/batch_v1_api.py", line 2657, in read_namespaced_job
    return self.read_namespaced_job_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/batch_v1_api.py", line 2758, in read_namespaced_job_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 244, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
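The traceback shows the 404 propagating out of `read_namespaced_job` inside `check_step_health`. One way a caller could tell a missing job apart from other API failures is to inspect the exception's `status`. This is only a sketch under assumptions, not how dagster_k8s handles it: `read_job_or_none` and `FakeNotFound` are hypothetical names, and the read is passed in as a callable so the example runs without a cluster.

```python
from typing import Any, Callable, Optional


def read_job_or_none(read_job: Callable[[], Any]) -> Optional[Any]:
    """Call read_job(); return None if it raises an error whose status is 404."""
    try:
        return read_job()
    except Exception as exc:
        # The Kubernetes client's ApiException exposes the HTTP code as .status.
        if getattr(exc, "status", None) == 404:
            return None
        # Anything else is a different failure and is re-raised unchanged.
        raise


class FakeNotFound(Exception):
    """Hypothetical stand-in for ApiException, for demonstration only."""

    status = 404


def failing_read() -> Any:
    raise FakeNotFound("jobs.batch not found")


print(read_job_or_none(failing_read))
print(read_job_or_none(lambda: {"name": "dagster-step-example"}))
```

With the real client, the callable would wrap the call from the traceback, for example `lambda: batch_api.read_namespaced_job(namespace=ns, name=job_name)`, where `batch_api` and `ns` are placeholder names.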
johann
04/04/2022, 9:15 PM
Kirk Stennett
04/04/2022, 10:41 PM
johann
04/06/2022, 1:14 AM
dagster/run-id label to run workers and step workers https://github.com/dagster-io/dagster/pull/7167
Kirk Stennett
04/06/2022, 3:24 PM