I am running Dagster using AWS ECS I have the run monitoring dagster #ask-community

Sean Han

08/04/2023, 5:22 AM

I am running Dagster on AWS ECS. I have the run_monitoring configuration in my dagster.yaml as follows. The run was marked as failed, but Dagster didn't send out the Slack message. How can I get Dagster to send a Slack message when this happens again?

run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher
  config:
    include_sidecars: true
    secrets_tag: "" 

run_monitoring:
  enabled: true
  start_timeout_seconds: 180
  max_resume_run_attempts: 0 
  poll_interval_seconds: 120

Here is the error message:

Run 49f436f0-5cde-4506-9343-8ee395ba86a3 has been running for 240.16203594207764 seconds, which is longer than the timeout of 180 seconds to start. Marking run failed

Does anyone know how to fix the issue of the job run hanging at Starting status sometimes?

owen

08/04/2023, 5:37 PM

hi @Sean Han -- are you using a run_failure_sensor, or something else to send these slack alerts? as for the job getting stuck on Starting, this generally indicates a failure in spinning up a run worker. there's a wide variety of reasons that this might happen, but checking the ecs console might be a good place to start to look for relevant logs

Sean Han

08/04/2023, 10:31 PM

I use @slack_on_failure("#{channel}".format(channel=os.getenv("SLACK_CHANNEL")), dagit_base_url=os.getenv("DAGIT_BASE_URL")) on the job.

Sean Han

08/04/2023, 10:32 PM

thanks for your help

Sean Han

08/04/2023, 10:43 PM

If I use @run_failure_sensor, do I need to add it to every job I have?

owen

08/04/2023, 11:09 PM

ah I see, yeah those slack_on_failure hooks only execute within the context of an already-running job, so you'll want to use run_failure_sensor. by default, a run failure sensor will monitor all jobs in the code location that it's set up for, so you should be able to just create a single one

Sean Han

08/05/2023, 3:18 AM

thanks @owen, let me give it a try.
