Yevhen Samoilenko

03/06/2023, 10:29 AM
Hi all. It seems like the retry policy doesn't work for asset-based jobs. This code doesn't work (Dagster adds the tags to the job and I can see them in Dagit, but failed runs are not restarted):
from dagster import AssetSelection, DailyPartitionsDefinition, define_asset_job

asset_job = define_asset_job(
    name="extract_load_prod_job",
    tags={
        "dagster/max_retries": 3,
        "dagster/retry_strategy": "FROM_FAILURE",
        "env": "prod",
    },
    selection=AssetSelection.groups("prod"),
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),
)
dagster version 1.1.17, DockerRunLauncher

johann

03/07/2023, 2:23 AM
Have you configured your dagster.yaml to enable run retries?
run_retries:
  enabled: true
https://docs.dagster.io/deployment/run-retries#configuration

Yevhen Samoilenko

03/07/2023, 4:14 PM
Yep, it is enabled
image.png

johann

03/09/2023, 3:13 PM
Do you see any errors in the logs of your daemon process? Particularly with the string EventLogConsumerDaemon.
One possibility: when was your database initialized? If it was before 0.15.0, it may be missing a table required for run retries. If that’s the case, you’ll see logs like
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "kvs" does not exist
The guide to migrating is here: https://docs.dagster.io/deployment/guides/kubernetes/how-to-migrate-your-instance
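For anyone landing here later, a minimal sketch of how one might scan saved daemon logs for this kind of error. It only uses the Python standard library; the function name `missing_tables` and the sample text are illustrative assumptions, not part of Dagster.

```python
import re

# Matches the psycopg2 "relation does not exist" error quoted in this thread.
MISSING_TABLE = re.compile(
    r'psycopg2\.errors\.UndefinedTable\) relation "(?P<table>[^"]+)" does not exist'
)


def missing_tables(log_text: str) -> set[str]:
    """Return the names of tables reported as missing in the given log text."""
    return {m.group("table") for m in MISSING_TABLE.finditer(log_text)}


# Hypothetical log excerpt in the shape the daemon would print.
sample = (
    "dagster.daemon.EventLogConsumerDaemon - ERROR - Caught error:\n"
    "sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) "
    'relation "kvs" does not exist\n'
)
print(missing_tables(sample))
```

If the result contains kvs, the instance schema probably predates the run retries table and needs a migration.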

Yevhen Samoilenko

03/09/2023, 3:18 PM
cool, thanks! and if we are deploying to a single node (using docker)?
yep, this is our case
2023-03-09 15:15:54 +0000 - dagster.daemon.EventLogConsumerDaemon - ERROR - Caught error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "kvs" does not exist
👍 1

johann

03/09/2023, 3:21 PM
Where are you running your DB?

Yevhen Samoilenko

03/09/2023, 3:21 PM
cloud sql
maybe i can do it via docker exec on a running container?

johann

03/09/2023, 3:23 PM
The migration guide should cover you. It’s never a bad idea to make a backup before doing a schema upgrade
❤️ 1
Would you mind filing an issue for us to surface this error better? It should at least be mentioned in the docs for retries
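On the docker exec question above, a rough sketch of what running the schema migration inside an existing container could look like. `dagster instance migrate` is the CLI command for schema migrations; the container name `dagster_daemon` is a hypothetical placeholder, and this assumes the container has DAGSTER_HOME set and can reach the database. Take a backup first.

```python
import shlex


def build_migrate_command(container: str) -> list[str]:
    """Build a `docker exec` call that runs `dagster instance migrate`.

    `container` is assumed to be a running container that has DAGSTER_HOME
    configured and network access to the run storage database.
    """
    return ["docker", "exec", container, "dagster", "instance", "migrate"]


# Hypothetical container name; substitute the one from `docker ps`.
cmd = build_migrate_command("dagster_daemon")
print(shlex.join(cmd))

# To actually run it (not executed here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting issues if it is later passed to subprocess.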

Yevhen Samoilenko

03/09/2023, 3:24 PM
sure!
❤️ 1
I'll do it asap! Thank you, @johann!