Jakub Zgrzebnicki
02/01/2023, 1:02 PM
Shahab Tasharrofi
02/01/2023, 2:37 PM
run_monitoring:
  enabled: true
  start_timeout_seconds: 300 # ECS runs can take a long time to start (~80 seconds is normal)
  max_resume_run_attempts: 0
  poll_interval_seconds: 120
prha
02/01/2023, 5:03 PM
Jakub Zgrzebnicki
02/02/2023, 6:11 AM
Arnoud van Dommelen
02/22/2023, 1:37 PM
Is poll_interval_seconds the maximum allowed duration of an Op before the monitoring daemon fails the run?
Jakub Zgrzebnicki
02/22/2023, 1:39 PM
Arnoud van Dommelen
02/22/2023, 1:46 PM
run_monitoring:
  enabled: true
  start_timeout_seconds: 600
So this setup does not check for hanging "running" jobs but just jobs that are stuck on "starting"?
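For reference, here is an annotated sketch of the run_monitoring keys mentioned in this thread, as they would sit in dagster.yaml. The comments paraphrase the Dagster run monitoring documentation as best understood; the values are illustrative only, and exact behavior may differ by Dagster version and run launcher.

```yaml
# dagster.yaml (illustrative sketch, not a recommended configuration)
run_monitoring:
  enabled: true
  # How long a run may stay in STARTING before the daemon marks it as failed.
  start_timeout_seconds: 600
  # How often the monitoring daemon checks on in-progress runs.
  # This is a polling cadence, not a limit on how long an op or run may take.
  poll_interval_seconds: 120
  # How many times to try resuming a run whose worker crashed; 0 disables resuming.
  max_resume_run_attempts: 0
```

Read this way, none of the keys above puts a cap on how long a run may remain in the running state.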