Unable to terminate job
# dagster-serverless
s
Unable to terminate job. I have a job that got stuck in a “starting” state for the last 33 hours (for a job that is supposed to run every 3 hours). I’ve tried terminating the job, but it persists in the “starting” state. I want to do the “force terminate” but there’s this warning about computational resources. How do I ensure that the resources are cleaned up after doing this? Also, how can I prevent this stuck state in the future or at least get alerted if a job is taking much longer than expected?
And then after I tried force terminating, I got this message. However, now it does look like the job is terminated.
j
the good news is that in the case that the underlying compute is not cleaned up successfully, we eat that cost and it shouldn't be reflected in your metered usage
it is odd that this run got stuck though...
can you share a link to it?
s
I’ll DM you the link.
Is there a way to put a max run time on a job, and would that even work if it required a “Forced Termination”?
j
there isn't a way to do that atm. there's a gh issue open for tracking it: https://github.com/dagster-io/dagster/issues/3666. but i think even with that, if the run isn't terminating you'd still likely have the same issue
s
This happened again today
d
We've added job-level timeouts since this was first posted actually
s
ah cool, I saw that the PR mentioned above was still open so I didn’t think it was addressed yet