The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability.

dagster

Hello, we recently have a pipeline that failed to start due to capacity issues on AWS:
```Caught an unrecoverable error while dequeuing the run. Marking the run as failed and dropping it from the queue
Exception: Task failed. Failure reason: Capacity is unavailable at this time. Please try again later or in a different availability zone```
However, it failed silently without sending us a slack message and there don't seems to be any retry mechanism. Is there anyway I can add a retry for such cases? How about setting up slack alerts?

hi <@U04SLH3C48G> -- re: slack messages, I believe a <https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#run-failure-sensor|run_failure_sensor> is what you're looking for. Re automated retries, you can set that up as described <https://docs.dagster.io/deployment/run-retries#configuration|here>