Mark Fickett
03/14/2022, 8:18 PM
yuhan
03/14/2022, 8:36 PM
> I want to get an alert if the pipeline doesn’t finish successfully.
Dagster has built-in support for job-level alerting called run_status_sensor. You can find some examples here: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#job-failure-sensor
> I want a trend of overall pipeline duration over days (I think Dagster does this), and for different multi-op portions of the DAG (some notional ‘stage 1’, ‘stage 2’ of the pipeline that may map to a subgraph but is larger than an op).
Dagit’s homepage “Factory Floor” provides a view that contains this info. Check out our blog post here: https://dagster.io/blog/dagster-0-14-0-never-felt-like-this-before#new-dagit-homepage-factory-floor-view
> I want to report a metric if a particular op encounters some condition (such as no response from an API), and be able to get an alert if too many ops in a fan-out hit that condition, sliced by some attribute (green widgets had 20 errors, blue widgets had 4k errors).
Run status sensors should be able to address this use case too, since they are a way to listen to job-level events. If they don't cover your case, you can also write a custom sensor that queries the event log directly, such as:
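from dagster import DagsterEventType, EventRecordsFilter, RunRequest, sensor

@sensor(job=my_job)
def custom_dagster_event_sensor(context):
    # Fetch only the most recent event record of the type you care about.
    dagster_event_records = context.instance.get_event_records(
        EventRecordsFilter(
            event_type=DagsterEventType.<...>,  # insert the event that indicates the condition you're interested in
        ),
        ascending=False,
        limit=1,
    )
    # No matching event yet: skip this evaluation without requesting a run.
    if not dagster_event_records:
        return
    yield RunRequest(...)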
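For the "sliced by some attribute" part of the question, here is a minimal plain-Python sketch of the counting and threshold check that a sensor body might perform once it has the events in hand. It does not call Dagster at all. The `(attribute, failed)` event shape, the function names, and the threshold of 100 are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_errors_by_attribute(events: Iterable[Tuple[str, bool]]) -> Counter:
    """Count failed events per attribute value (for example, widget color)."""
    counts: Counter = Counter()
    for attribute, failed in events:
        if failed:
            counts[attribute] += 1
    return counts


def slices_over_threshold(counts: Counter, threshold: int) -> Dict[str, int]:
    """Keep only the slices whose error count exceeds the threshold."""
    return {attr: n for attr, n in counts.items() if n > threshold}


# Hypothetical fan-out results: (widget color, did the API call fail?)
events = (
    [("green", True)] * 20
    + [("blue", True)] * 4000
    + [("green", False)] * 5
)
counts = count_errors_by_attribute(events)
alerts = slices_over_threshold(counts, threshold=100)  # threshold is an assumption
print(alerts)
```

With these sample numbers only the blue slice crosses the threshold, so an alert would fire for blue widgets and not for green ones.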
Mark Fickett
03/14/2022, 8:45 PM
> sensors
Thanks, I'll take a more detailed look at sensors.
> Factory Floor
Cool! That's reporting at the job level, right? So that works for the overall pipeline duration, but not if I have a few areas of my DAG that I want to keep an eye on that don't map to jobs.
yuhan
03/14/2022, 8:46 PM
> One particular op had 500 ERROR-level log lines. Find the op, group the log lines by error message.
Again, as with the other monitoring cases, sensors could be one approach. You could also configure your own logger for this use case. Here is an example: https://docs.dagster.io/concepts/logging/loggers#customizing-loggers
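As a rough illustration of the grouping step, the sketch below uses only the standard library `logging` module: a handler that counts ERROR-level records by message text. It is not Dagster's logger API. The handler class and logger name are hypothetical, and wiring such a handler into a custom Dagster logger is left to the linked docs.

```python
import logging
from collections import Counter


class ErrorGroupingHandler(logging.Handler):
    """Collect ERROR-and-above records and count them by message text."""

    def __init__(self) -> None:
        # The handler level filters out anything below ERROR.
        super().__init__(level=logging.ERROR)
        self.counts: Counter = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        self.counts[record.getMessage()] += 1


log = logging.getLogger("my_op_logger")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output limited to our own print below
handler = ErrorGroupingHandler()
log.addHandler(handler)

log.info("starting op")  # below ERROR, so it is not counted
for _ in range(3):
    log.error("no response from API")
log.error("malformed payload")

# Most frequent error messages first.
for message, n in handler.counts.most_common():
    print(n, message)
```

Grouping on the formatted message works best when variable parts (IDs, URLs) are kept out of the message text; otherwise every line becomes its own group.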
dwall
03/14/2022, 8:47 PM
Zach
03/14/2022, 8:48 PM
Mark Fickett
03/14/2022, 8:49 PM
yuhan
03/14/2022, 8:51 PM
timestamp of STEP_SUCCESS - timestamp of STEP_START
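The subtraction above can be sketched in plain Python, assuming the step events have already been pulled out of the event log as simple `(step_key, event_type, timestamp)` tuples. That tuple shape, the function names, and the step-to-stage mapping are all hypothetical and are not Dagster's record types.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical simplified event: (step_key, event_type, unix_timestamp)
Event = Tuple[str, str, float]


def step_durations(events: Iterable[Event]) -> Dict[str, float]:
    """Duration per step: STEP_SUCCESS timestamp minus STEP_START timestamp."""
    starts: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for step_key, event_type, ts in events:
        if event_type == "STEP_START":
            starts[step_key] = ts
        elif event_type == "STEP_SUCCESS" and step_key in starts:
            durations[step_key] = ts - starts[step_key]
    return durations


def stage_durations(
    durations: Dict[str, float], stage_of: Dict[str, str]
) -> Dict[str, float]:
    """Sum step durations per user-defined stage (a grouping of several ops)."""
    totals: Dict[str, float] = defaultdict(float)
    for step_key, seconds in durations.items():
        totals[stage_of.get(step_key, "unassigned")] += seconds
    return dict(totals)


events = [
    ("fetch", "STEP_START", 100.0),
    ("fetch", "STEP_SUCCESS", 130.0),
    ("parse", "STEP_START", 130.0),
    ("parse", "STEP_SUCCESS", 145.0),
    ("load", "STEP_START", 145.0),
    ("load", "STEP_SUCCESS", 205.0),
]
stage_of = {"fetch": "stage 1", "parse": "stage 1", "load": "stage 2"}

durations = step_durations(events)
stages = stage_durations(durations, stage_of)
print(stages)
```

Note that summing step durations gives total compute time per stage. If the steps in a stage run in parallel, the wall-clock time would instead be the latest STEP_SUCCESS minus the earliest STEP_START within that stage.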
dwall
03/14/2022, 8:51 PM
yuhan
03/29/2022, 12:40 AM