# ask-community
a
I have a question about run monitoring, specifically about run start timeouts. I see in the docs that the value `180` for `start_timeout_seconds` is the default and doesn't "need to be specified except to override" it, but can I actually have no timeout for the run to start while still leaving `run_monitoring` itself `enabled`? (I need it for general run timeouts.)
@claire @daniel maybe you can give some input on that? I couldn't find any documentation elaborating on that config key. I will try setting it to `0` or `-1` and test; maybe this is how I disable this timeout.
d
I don't think we support setting it to 0 or -1 currently, although it's a very reasonable ask. In the short term, it's a bit inelegant, but would it be an option to set it to a very high value that it will never realistically hit (say, the number of seconds in a year)?
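A minimal sketch of that workaround, assuming the `run_monitoring` block lives in the instance's `dagster.yaml` (Helm deployments may expose the same settings under different value names). The helper `run_monitoring_yaml` is hypothetical and only renders the text fragment; it uses a non-leap year of 365 days.

```python
# One (non-leap) year in seconds: 365 days * 24 h * 60 min * 60 s = 31,536,000.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def run_monitoring_yaml(start_timeout_seconds: int) -> str:
    """Hypothetical helper: render a dagster.yaml `run_monitoring` fragment.

    Run monitoring stays enabled (so general run timeouts keep working), while
    the start timeout is pushed out far enough that it should never trigger.
    """
    return (
        "run_monitoring:\n"
        "  enabled: true\n"
        f"  start_timeout_seconds: {start_timeout_seconds}\n"
    )


if __name__ == "__main__":
    print(run_monitoring_yaml(SECONDS_PER_YEAR))
```

Pasting the printed fragment into `dagster.yaml` keeps monitoring on while effectively disabling only the start timeout.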
a
yes, that's what we did for now
thank you for the reply
@daniel while you're still here, maybe you can advise on what I can do if I have a lot of jobs in the `STARTING` state, but none actually start because the k8s pods the daemon created for them no longer exist (2nd screenshot). I guess these jobs were somehow requested during the `helm upgrade` of the whole setup, and all of them became orphans with no actual pod behind them.
d
This is probably worth a new post, but what I would suggest is to force-terminate the runs.
a
And there is no way to trigger re-execution of all of them?
I can do it one by one, and it works by creating a new k8s job for each run, but there are `1000` of them, hehe.
d
Here's a script that you could use:
```python
from dagster import DagsterInstance, DagsterRunStatus, RunsFilter

# Needs the DAGSTER_HOME environment variable to be set.
instance = DagsterInstance.get()

# Find every run stuck in the STARTING state and mark it as canceled.
starting_runs = instance.get_runs(filters=RunsFilter(statuses=[DagsterRunStatus.STARTING]))
for run in starting_runs:
    instance.report_run_canceled(run)
```
you could run that on the dagit pod
a
That won't help me restart them, but at least I can clear them out this way. Thank you!
d
I've included a fix that will let you set `start_timeout_seconds` to `0` and have it turn off the feature, even if run monitoring is disabled: https://github.com/dagster-io/dagster/pull/14084
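To make the described semantics concrete, here is an illustrative model of the start-timeout check, where `0` means "no start timeout". This is not Dagster's actual code; the function name and signature are hypothetical.

```python
def start_timed_out(elapsed_seconds: float, start_timeout_seconds: int) -> bool:
    """Hypothetical model: should a run stuck in STARTING be failed?

    A timeout of 0 is treated as "feature disabled", so the run is never
    failed for taking too long to start, no matter how long it waits.
    """
    if start_timeout_seconds == 0:
        return False
    return elapsed_seconds > start_timeout_seconds


if __name__ == "__main__":
    print(start_timed_out(181, 180))      # past the default 180 s timeout
    print(start_timed_out(10**9, 0))      # timeout disabled
```

With a positive value, the run is only failed once it has been starting for longer than that many seconds.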
a
So with this, it won't fail runs that stay in the `STARTING` state indefinitely?
d
yeah, it would disable the feature
a
Nice, thanks! Looking forward to having it released.