Hi! A scheduled job (in prod, run in isolation mod...
# ask-community
e
Hi! A scheduled job (in prod, run in isolation mode) failed when allocating the resources. After a few minutes with log
ENGINE_EVENT Still waiting for compute resources to spin up.
, it displayed
RUN_FAILURE Agent db8a94e8 detected that run worker failed.
and the run stopped. It is a job that usually runs correctly every two hours for a few seconds, not needing a lot of memory or compute resources. Is there a reason why that run failed? Is there a way to automatically relaunch the job when the run failed for that reason? Thanks! (cc @Kevin Martin)
d
Hi Eliott - to answer your last question, you can configure a default number of run retries for your deployment here: https://docs.dagster.io/dagster-cloud/developing-testing/deployment-settings-reference#run-retries-run_retries Or tag specific jobs with a number of runs that they should retry:
Copy code
from dagster import job


@job(tags={"dagster/max_retries": 3})
def sample_job():
    pass
If you have a link to the run we can take a look at why it failed
e
Hi Daniel! Thanks for the answer! Here is the link to the run