# ask-community
y
Hi all! Can we customize job timeouts (a default timeout and per-job timeouts)? Sometimes our dbt/extract jobs get stuck, and they can hang there executing for 20 hours until we notice and cancel them manually. We use DockerRunLauncher.
c
Hi Yevhen. This is something we'd like to add support for in the future, but it doesn't exist at the moment. We do have a couple of issues tracking this:
• https://github.com/dagster-io/dagster/issues/3666
• https://github.com/dagster-io/dagster/issues/4937
In the meantime, you could build a sensor that checks in-progress runs and cancels any that have been running for more than X minutes, as in the sketch below.
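A minimal sketch of what such a "timeout sensor" could look like, assuming a reasonably recent Dagster version (`RunsFilter`/`DagsterRunStatus`; older releases use `PipelineRunsFilter`/`PipelineRunStatus`). The sensor name, interval, and the 4-hour cutoff are arbitrary placeholders, and depending on your version you may need to attach a job target to the sensor:

```python
import time

from dagster import DagsterRunStatus, RunsFilter, sensor

# Example cutoff only -- pick whatever makes sense for your jobs.
MAX_RUNTIME_SECONDS = 4 * 60 * 60


@sensor(minimum_interval_seconds=300)
def cancel_long_running_runs(context):
    instance = context.instance
    # Fetch runs that are currently executing.
    in_progress = instance.get_runs(
        filters=RunsFilter(statuses=[DagsterRunStatus.STARTED])
    )
    now = time.time()
    for run in in_progress:
        # Run stats include the epoch timestamp at which the run started.
        stats = instance.get_run_stats(run.run_id)
        if stats.start_time and now - stats.start_time > MAX_RUNTIME_SECONDS:
            context.log.info(f"Terminating run {run.run_id}: exceeded timeout")
            # Ask the configured run launcher (DockerRunLauncher here) to kill it.
            instance.run_launcher.terminate(run.run_id)
```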
a
@claire I'm hitting the same issue with the ECSRunLauncher, where runs get stuck "in progress". Where does this come from? It happens frequently when scaling up... Is it a database connection issue, where the connection to the run storage is dropped? And how can we avoid it? We want to run approx. 1500 runs in parallel, but this already occurs at around approx. 350 runs (all runs get stuck somehow). Thank you in advance!
j
Hi Arnoud, I’d recommend taking a look at the underlying ECS task status and logs to find the root cause. You also have the option of enabling run monitoring, which will poll the task status and report failures in the UI: https://docs.dagster.io/deployment/run-monitoring. But for debugging, start with the AWS console.
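For reference, run monitoring is turned on in `dagster.yaml`; a sketch along these lines, with example values only (check the linked docs for the fields supported by your Dagster version):

```yaml
# dagster.yaml -- values are illustrative, not recommendations
run_monitoring:
  enabled: true
  start_timeout_seconds: 180   # mark runs as failed if the worker/task never starts
  poll_interval_seconds: 120   # how often to poll run worker (ECS task) status
```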
❤️ 1