Hey there! I noticed that some of our jobs had been stuck in a 'starting' state even though we have runMonitoring enabled and startTimeoutSeconds: 300. What could cause this to happen? We may add some sensors to catch these, but would prefer a non-sensor solution if possible.
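For reference, here is a rough sketch of the check a sensor-style fallback would need to make: flag any run that has sat in the starting state longer than the timeout. This is plain Python rather than Dagster code. `RunInfo`, `find_stuck_starting_runs`, and the `"STARTING"` status string are hypothetical names used for illustration only, and the 300-second value just mirrors the startTimeoutSeconds setting above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Mirrors the startTimeoutSeconds: 300 setting mentioned above (assumption:
# the fallback should use the same threshold as run monitoring).
START_TIMEOUT_SECONDS = 300


@dataclass
class RunInfo:
    """Hypothetical stand-in for a run record; not a Dagster class."""
    run_id: str
    status: str             # e.g. "STARTING", "STARTED", "SUCCESS"
    status_since: datetime  # when the run entered its current status


def find_stuck_starting_runs(runs, now, timeout_seconds=START_TIMEOUT_SECONDS):
    """Return ids of runs that have been in STARTING longer than the timeout."""
    cutoff = now - timedelta(seconds=timeout_seconds)
    return [
        r.run_id
        for r in runs
        if r.status == "STARTING" and r.status_since < cutoff
    ]


# Small usage example with made-up runs.
now = datetime(2023, 8, 30, 15, 17, tzinfo=timezone.utc)
runs = [
    RunInfo("a", "STARTING", now - timedelta(seconds=600)),  # past the timeout
    RunInfo("b", "STARTING", now - timedelta(seconds=60)),   # still within it
    RunInfo("c", "STARTED", now - timedelta(seconds=900)),   # not starting
]
stuck = find_stuck_starting_runs(runs, now)
```

In a real sensor the run records would come from the Dagster instance, and each flagged run would then be failed or alerted on. That wiring is left out here on purpose.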
alex
08/30/2023, 3:17 PM
what version are you on, and what type of deployment are you using?
Pablo Beltran
08/30/2023, 7:39 PM
open source deployment on version 1.3.10
alex
08/30/2023, 8:59 PM
i mean like local, ECS, k8s, docker, etc
Pablo Beltran
08/30/2023, 10:04 PM
Running on k8s
alex
08/30/2023, 10:12 PM
hmm, roughly what percentage of the time does the start timeout work and fail the run vs. when it doesn't?
Do you see any issues with your daemon deployment/pod in its logs or health?
Pablo Beltran
08/31/2023, 12:11 AM
Pretty low percentage, I've only seen this a few times. Much more often, I actually see a case in which the assets are finished but the run gets stuck in the running state.