Roei Jacobovich
06/09/2022, 10:45 PM
RetryRequested using K8sRunLauncher and K8sExecutor. We raise a RetryRequested during our jobs due to known limitations (AWS quotas, for example).
We get a STEP_UP_FOR_RETRY event, but Dagit (and our K8s logs) doesn’t show any sign of actually executing a new pod for that task. Usually, we get an ENGINE_EVENT of Executing step <step_name> in Kubernetes job …, but nothing for the retried steps.
The run always fails with
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
with the following message
jobs.batch \"dagster-step-<something>-1\" not found
It worked well when we used the default run launcher with static Celery executors.
(btw we’re using Dagster 0.14.17).
Thanks.
johann
06/10/2022, 4:15 PM
Dagster Bot
06/10/2022, 4:15 PM
Roei Jacobovich
06/10/2022, 5:08 PM
Vladislav Khokhlov
06/30/2022, 12:11 PM
Roei Jacobovich
06/30/2022, 3:52 PM
backoff on Ops instead. It works, but we're spinning up some compute during the backoff time.
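For context, an in-process backoff of the kind described here can be sketched in plain Python. This is a minimal sketch under assumptions, not the code used in this thread: QuotaExceededError and call_with_backoff are hypothetical names, and nothing here calls Dagster's API. Because the waiting happens inside the running step, the pod stays up for the whole wait, which is the compute cost mentioned above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a transient limit such as an AWS quota error."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (QuotaExceededError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff inside the same process."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt >= max_attempts:
                raise
            # Delay doubles on each failed attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # The process (and its pod) stays alive while sleeping.
            sleep(delay)
            attempt += 1
```

An op body would wrap only the quota-limited call, for example call_with_backoff(lambda: start_cluster()), where start_cluster is a placeholder for whatever hits the limit.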
@johann it would be great if someone could take a look at the issue and the bug 🙏 thanks
Hiroyuki Ota
08/29/2022, 3:31 AM
johann
09/07/2022, 3:33 PM
Hiroyuki Ota
09/13/2022, 1:17 AM
Hiroyuki Ota
09/26/2022, 1:16 AM
Hiroyuki Ota
09/26/2022, 1:17 AM
johann
09/26/2022, 2:27 PM
Hiroyuki Ota
10/26/2022, 12:46 AM
johann
10/26/2022, 3:52 PM
johann
11/04/2022, 8:11 PM
johann
11/10/2022, 4:50 PM
johann
11/10/2022, 9:21 PM