#ask-community

Nitin Madhavan

06/23/2022, 10:04 AM
Hi, I wanted some advice on using the dagster-prometheus pushgateway. I have a sensor that checks a folder for a CSV file and, if one is found, launches a run that processes its rows. I want to track the number of rows processed, so I send the row_count to the pushgateway when the run completes. The problem is that the Prometheus counter/gauge object starts as a fresh instance on every run, so I can't use it as a counter. If I just push the per-run value, Prometheus cannot compute a rate (rows received in the last minute, etc.). How do I track the overall number of rows processed by the pipeline?
Just to elaborate: if one file is processed at 10 AM with 1000 rows, I push rowcount=1000 to the pushgateway. Prometheus scrapes the gateway every 15s and keeps showing rowcount as 1000. If I then get another file at 10:30 with 500 rows, Prometheus expects me to send 1500 (1000 + 500). That would let me keep track of total messages and use rate to find out how many new messages arrived per minute, etc. But in dagster I can't use this pattern: I have to send 500 (instead of 1500), because this is a new process and the previous counter object is gone. So what pattern can I use to keep track of total messages processed by a dagster pipeline over many runs?
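The arithmetic in this example can be written out as a small sketch. It is only a simplified illustration of the numbers in the message (1000 rows, then 500 rows), not a model of how Prometheus evaluates queries; the variable names are made up for the example.

```python
from itertools import accumulate

# Row counts reported by two separate runs (values from the example above).
per_run_rows = [1000, 500]

# What a long-lived counter would expose after each run: a running total.
cumulative = list(accumulate(per_run_rows))

# New rows between consecutive samples, i.e. what a rate-style query needs.
new_rows_between_samples = [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

print(cumulative)                # [1000, 1500]
print(new_rows_between_samples)  # [500]
```

Pushing the per-run values (1000, then 500) loses this information, because the second sample no longer encodes the first.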

sandy

06/24/2022, 10:29 PM
Hi Nitin - I'm not very familiar with Prometheus, but it's possible that sensor cursors would help you out here? https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#sensor-optimizations-using-cursors
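A minimal sketch of how a sensor cursor might carry a running total between ticks, under several assumptions: the sensor itself counts the rows when it detects a file, each CSV has a header row, and the names `count_data_rows`, `advance_cursor`, `build_csv_sensor`, the `total_rows` tag, and the folder layout are all hypothetical. Dagster is imported inside the factory so the bookkeeping helpers stay standard-library only.

```python
import csv
import json
from pathlib import Path
from typing import Optional, Tuple


def count_data_rows(csv_path: Path) -> int:
    """Count rows in a CSV file, assuming the first row is a header."""
    with csv_path.open(newline="") as handle:
        return max(sum(1 for _ in csv.reader(handle)) - 1, 0)


def advance_cursor(cursor: Optional[str], file_name: str, rows: int) -> Tuple[str, int]:
    """Fold one file into the cursor state; return (new_cursor, running_total)."""
    state = json.loads(cursor) if cursor else {"seen": [], "total_rows": 0}
    if file_name not in state["seen"]:
        state["seen"].append(file_name)
        state["total_rows"] += rows
    return json.dumps(state), state["total_rows"]


def build_csv_sensor(job, folder: str):
    """Build a sensor for `job` that watches `folder` (both supplied by the caller)."""
    from dagster import RunRequest, sensor  # imported lazily; requires dagster

    @sensor(job=job)
    def csv_row_total_sensor(context):
        cursor = context.cursor  # None on the first evaluation
        seen = set(json.loads(cursor)["seen"]) if cursor else set()
        for csv_path in sorted(Path(folder).glob("*.csv")):
            if csv_path.name in seen:
                continue
            cursor, total = advance_cursor(
                cursor, csv_path.name, count_data_rows(csv_path)
            )
            # Hand the cumulative total to the run as a tag (tag values are strings).
            yield RunRequest(run_key=csv_path.name, tags={"total_rows": str(total)})
        if cursor:
            context.update_cursor(cursor)

    return csv_row_total_sensor
```

Note that a total kept this way counts rows detected by the sensor, not rows a run finished processing, so it would overcount if a run fails partway through.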

Nitin Madhavan

06/25/2022, 6:55 AM
Thanks. For now, I am just appending the count to a file and reading values when required to get total count. It is a hack and your suggestion seems better. I will have a look. Thanks.
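The file-based workaround could look roughly like the sketch below. It is not the code used in the thread: the JSON state file, the metric name `csv_rows_processed`, the job name `csv_pipeline`, and the gateway address are assumptions, and the push goes through `prometheus_client` directly rather than through a dagster-prometheus resource.

```python
import json
from pathlib import Path


def add_to_running_total(state_path: Path, rows_in_run: int) -> int:
    """Add this run's row count to a persisted total and return the new total."""
    total = 0
    if state_path.exists():
        total = json.loads(state_path.read_text())["total_rows"]
    total += rows_in_run
    # Write to a temporary file first, then swap it in, so a crash mid-write
    # does not leave a truncated state file behind.
    tmp_path = state_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps({"total_rows": total}))
    tmp_path.replace(state_path)
    return total


def push_total(total_rows: int, gateway: str = "localhost:9091") -> None:
    """Push the cumulative total to a Pushgateway (address is an assumption)."""
    # Imported lazily so the bookkeeping above runs without prometheus_client.
    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    gauge = Gauge(
        "csv_rows_processed",
        "Cumulative CSV rows processed across runs",
        registry=registry,
    )
    gauge.set(total_rows)
    push_to_gateway(gateway, job="csv_pipeline", registry=registry)
```

At the end of a run, something like `push_total(add_to_running_total(Path("row_total.json"), row_count))` would send 1500 rather than 500 in the example above. The sketch does no file locking, so two runs finishing at the same moment could race on the state file.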