Rohan Prasad
01/10/2023, 7:44 PM
We're seeing DagsterExecutionInterruptedError in a few of our jobs … was trying to find what this means in the docs but can’t seem to find it. Would anyone be able to help with how to resolve this?
I found this on GH: https://github.com/dagster-io/dagster/blob/1.0.17/python_modules/dagster/dagster/_core/execution/plan/utils.py#L84-L94 which if I’m reading this correctly means that this error will get thrown if there’s no retry_policy?
CC: @Phil Armour
sandy
01/10/2023, 9:14 PM
Rohan Prasad
01/10/2023, 9:24 PM
Phil Armour
01/10/2023, 9:30 PM
sandy
01/10/2023, 9:31 PM
johann
01/10/2023, 9:34 PM
Rohan Prasad
01/10/2023, 9:36 PM
Caio Tavares
01/10/2023, 9:39 PM
1.0.17
Right after the interrupt error there is a log message from the engine: Ignoring a duplicate run that was started from somewhere other than the run monitor daemon
and the run didn't continue from there.
dagsterDaemon:
  image:
    tag: 1.0.17
  runMonitoring:
    enabled: true
    # Temporary workaround provided by Dagster Support. Revisit this later on.
    maxResumeRunAttempts: -1
  runRetries:
    enabled: true
    maxRetries: 3
johann
01/10/2023, 9:40 PM
Rohan Prasad
01/10/2023, 9:40 PM
Caio Tavares
01/10/2023, 9:40 PM
johann
01/10/2023, 9:44 PM
EventLogConsumerDaemon are relevant for retries not starting
Caio Tavares
01/10/2023, 9:45 PM
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "kvs" does not exist
EventLogConsumerDaemon
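The UndefinedTable error above says the storage database has no kvs relation. As a rough illustration of confirming that a table is missing before digging further, here is a minimal sketch. It uses the standard-library sqlite3 module purely as a stand-in (the database in this thread is Postgres, accessed through psycopg2). `table_exists` is a hypothetical helper and not part of Dagster, and the column layout of the stand-in kvs table is invented for the example.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table called `name` exists in this SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


# In-memory database standing in for the Dagster storage database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")

# Schema that predates the new table: the lookup finds nothing.
before = table_exists(conn, "kvs")

# Stand-in for what a schema migration would do (columns are hypothetical).
conn.execute("CREATE TABLE kvs (key TEXT PRIMARY KEY, value TEXT)")
after = table_exists(conn, "kvs")

print(before, after)  # False True
```

On Postgres a comparable check is `SELECT to_regclass('kvs')`, which returns NULL when the relation does not exist.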
johann
01/10/2023, 9:47 PM
Caio Tavares
01/10/2023, 9:49 PM
johann
01/10/2023, 9:49 PM
kvs table. The above guide will migrate the db to add the new table
Caio Tavares
01/10/2023, 9:50 PM
Phil Armour
01/10/2023, 9:51 PM
Caio Tavares
01/10/2023, 9:52 PM
Rohan Prasad
01/10/2023, 9:52 PM
Caio Tavares
01/11/2023, 2:40 PM
johann
01/11/2023, 3:20 PM
Caio Tavares
01/11/2023, 3:23 PM
johann
01/12/2023, 7:30 PM
Caio Tavares
01/12/2023, 7:31 PM
johann
01/12/2023, 8:21 PM
Caio Tavares
01/12/2023, 8:23 PM
johann
01/12/2023, 10:43 PM
maxRetries: 3 to every failed run (very silly behavior). All of these retries would have been retry_number: 1, since they were retrying a different failure. Eventually if it caught up to the latest runs, it would have retried any of those that failed with retry number 2 and 3.
Caio Tavares
01/13/2023, 8:04 PM
johann
01/23/2023, 5:15 PM
Caio Tavares
01/25/2023, 3:44 PM
johann
01/25/2023, 4:30 PM
Caio Tavares
01/25/2023, 4:30 PM
1.1.13 (core)?
johann
01/25/2023, 4:31 PM
Caio Tavares
01/25/2023, 4:31 PM
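johann's 01/12 explanation of the retry backlog can be modelled roughly as follows. This is a toy model under stated assumptions, not Dagster's retry daemon: `Run`, `launch_retries`, and `root_id` are hypothetical names, every attempt is assumed to fail, and only the `maxRetries: 3` value comes from the thread.

```python
from dataclasses import dataclass
from typing import List

MAX_RETRIES = 3  # mirrors maxRetries: 3 from the Helm values quoted in the thread


@dataclass
class Run:
    root_id: str           # id of the original run this attempt descends from
    retry_number: int = 0  # 0 = original run; 1..MAX_RETRIES = retries
    failed: bool = True    # assumption: every attempt in this toy model fails


def launch_retries(failed_runs: List[Run], max_retries: int = MAX_RETRIES) -> List[Run]:
    """Launch one retry for each failed run that still has retry budget left."""
    return [
        Run(root_id=run.root_id, retry_number=run.retry_number + 1)
        for run in failed_runs
        if run.failed and run.retry_number < max_retries
    ]


# A backlog of distinct original failures: each one is a different failure,
# so each gets its own first retry.
backlog = [Run(root_id=f"run-{i}") for i in range(4)]

first_pass = launch_retries(backlog)
print([r.retry_number for r in first_pass])  # [1, 1, 1, 1]

# Only once those retries fail in turn do retry numbers 2 and 3 appear.
second_pass = launch_retries(first_pass)
third_pass = launch_retries(second_pass)

# After the third retry the budget is exhausted and nothing more is launched.
fourth_pass = launch_retries(third_pass)
print(len(fourth_pass))  # 0
```

The point of the model is that working through a backlog of unrelated failures yields many retries all numbered 1. Higher retry numbers only show up when a retry of the same original run fails again.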