# ask-community
m
Hi all 🙂 We need the ability to see in Datadog which jobs have been stuck in the 'canceling' state for more than 30 minutes. Any ideas how to do it? Specifically, is there a way to get the state of a job into Datadog?
z
Interested in how other people might do this, but one direction to look in would be to have a sensor that uses the Dagster instance available on the sensor context to query for all canceling jobs, post those to Datadog as a custom metric, and then have an alarm set up in Datadog on that metric. Unfortunately I'm not sure you could use a run_status_sensor, because it seems like you might not be able to get the run's start time from the run status sensor's context, but you could probably do it through a normal sensor that uses the DagsterInstance on the sensor context with an EventRecordsFilter for canceling events. It might look vaguely like this:
from datetime import datetime
from dagster import DagsterEventType, EventRecordsFilter, RunRequest, SensorExecutionContext, SkipReason, sensor

@sensor(job=send_datadog_metric)
def canceling_sensor(context: SensorExecutionContext):
    canceling_ids = []
    canceling_events = context.instance.get_event_records(
        EventRecordsFilter(event_type=DagsterEventType.PIPELINE_CANCELING)
    )
    # a run counts as "stuck" if its CANCELING event is older than 30 minutes
    cutoff = datetime.now().timestamp() - 30 * 60
    for e in canceling_events:
        if e.event_log_entry.timestamp < cutoff:
            canceling_ids.append(e.event_log_entry.run_id)
    if canceling_ids:
        yield RunRequest(run_key=None, run_config={"ops": ...})
    else:
        yield SkipReason("No canceling jobs detected")
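For the Datadog half of the suggestion (publishing the count as a custom metric and alerting on it), here is a minimal sketch of what the send_datadog_metric job could look like, using the DogStatsD client from the datadog Python package. The op name, config shape, agent host/port, and the metric name dagster.runs.stuck_canceling are illustrative assumptions, not anything confirmed in this thread:

from dagster import job, op
from datadog import initialize, statsd

# point DogStatsD at the local Datadog agent (host/port are assumptions)
initialize(statsd_host="localhost", statsd_port=8125)


@op(config_schema={"canceling_run_ids": [str]})
def report_stuck_canceling_runs(context):
    run_ids = context.op_config["canceling_run_ids"]
    # gauge of how many runs have been stuck in CANCELING for > 30 minutes;
    # a Datadog metric monitor on this gauge (> 0) provides the alert
    statsd.gauge("dagster.runs.stuck_canceling", len(run_ids))
    context.log.info(f"reported {len(run_ids)} stuck run(s) to Datadog: {run_ids}")


@job
def send_datadog_metric():
    report_stuck_canceling_runs()

Under these assumptions, the sensor's RunRequest would fill in the elided run_config with something like {"ops": {"report_stuck_canceling_runs": {"config": {"canceling_run_ids": canceling_ids}}}}, and a Datadog metric monitor that fires when dagster.runs.stuck_canceling is above 0 gives the "stuck for more than 30 min" alert.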
m
Thanks a lot Zach! Did it as you suggested 🙂