Pablo Beltran
04/14/2023, 6:50 PM
# Experimental feature to add fault tolerance to Dagster runs. The new Monitoring Daemon will
# perform health checks on run workers. If a run doesn't start within the timeout, it will be
# marked as failed. If a run had started but then the run worker crashed, the daemon will attempt
# to resume the run with a new run worker.
runMonitoring:
  enabled: true
  # Timeout for runs to start (avoids runs hanging in STARTED)
  startTimeoutSeconds: 180
  # How often to check on in-progress runs
  pollIntervalSeconds: 120
  # Max number of times to attempt to resume a run with a new run worker. Defaults to 3 if the
  # run launcher supports resuming runs, otherwise defaults to 0.
  maxResumeRunAttempts: 0
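A rough sketch of the per-run decision that the comments in the snippet above describe. This is not Dagster's implementation. `RunSnapshot`, `Action`, and `check_run` are hypothetical names invented for illustration, and the inputs (whether the run has started, whether the worker passed its health check) are assumed to be known at each poll.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"      # leave the run alone until the next poll
    FAIL = "fail"      # mark the run as failed
    RESUME = "resume"  # launch a new run worker to resume the run


@dataclass
class RunSnapshot:
    # Hypothetical view of one in-progress run at a single monitoring poll.
    started: bool                # has the run worker reported that the run started?
    seconds_since_launch: float  # time elapsed since the run was launched
    worker_alive: bool           # outcome of the health check on the run worker
    resume_attempts: int         # resumes already attempted for this run


def check_run(run: RunSnapshot, start_timeout_seconds: float,
              max_resume_run_attempts: int) -> Action:
    """Decide what one monitoring poll would do with one run (illustrative only)."""
    if not run.started:
        # The run never started: fail it once the start timeout has elapsed.
        if run.seconds_since_launch > start_timeout_seconds:
            return Action.FAIL
        return Action.NONE
    if run.worker_alive:
        # Healthy run: nothing to do until the next poll.
        return Action.NONE
    # The run worker crashed after the run started: resume if attempts remain.
    if run.resume_attempts < max_resume_run_attempts:
        return Action.RESUME
    return Action.FAIL
```

With the values posted above (`startTimeoutSeconds: 180`, `maxResumeRunAttempts: 0`), a crashed worker in this sketch always leads to `FAIL`, never `RESUME`, because zero resume attempts are allowed.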
daniel
04/14/2023, 7:17 PM
Pablo Beltran
04/14/2023, 7:43 PM
Pablo Beltran
04/18/2023, 5:41 PM
Pablo Beltran
04/18/2023, 5:41 PM
daniel
05/16/2023, 5:15 PM
daniel
05/16/2023, 6:11 PM
maxResumeRunAttempts: ~
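In YAML, `~` means null, so `maxResumeRunAttempts: ~` leaves the value unset rather than pinning it to `0`. Going only by the chart comment in the original snippet ("Defaults to 3 if the run launcher supports resuming runs, otherwise defaults to 0"), the resolution probably works like the sketch below. The function name is hypothetical, and the assumption that an explicit value always overrides the default is mine, not confirmed by the thread.

```python
from typing import Optional

# Default quoted in the chart comment when the run launcher supports resuming runs.
DEFAULT_WHEN_RESUMABLE = 3


def effective_max_resume_attempts(configured: Optional[int],
                                  launcher_supports_resume: bool) -> int:
    """Hypothetical resolution of maxResumeRunAttempts (None stands for YAML `~`)."""
    if configured is not None:
        # Assumed: an explicit value, including 0, always wins over the default.
        return configured
    return DEFAULT_WHEN_RESUMABLE if launcher_supports_resume else 0
```

Under this reading, an explicit `0` disables resumes even on a launcher that supports them, while `~` lets the launcher-dependent default apply.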