I have a job that changes a bunch of tables and is triggered dagster #ask-community

Bernardo Cortez

06/07/2022, 6:15 PM

I have a job that changes a bunch of tables and is triggered by a schedule. What is the "dagstier" way of avoiding redundant runs? I.e., how do I keep the job from running twice in one day if something goes wrong (for example, a long queue delays one run so much that it spills into the next day)? Thanks!

johann

06/07/2022, 8:07 PM

Hmm, in most cases the runs are able to use the partition key to only work on the subset of data intended. It sounds like this job always runs on all data?

johann

06/07/2022, 8:08 PM

I don’t think we have first-class support for this, but it could be achieved by querying GraphQL for runs and aborting the run if there’s another before it on the same day. Or by doing something similar inside a sensor, so you can avoid launching the run at all.
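
For illustration, the "is there already a run on the same day?" check could look roughly like the sketch below. It is plain Python with no Dagster imports; the function names `already_ran_on_same_day` and `daily_run_key` are hypothetical, and it assumes run start times are available as Unix timestamps and that "same day" means the same UTC calendar day. The second helper reflects the sensor variant: Dagster sensors skip a `RunRequest` whose `run_key` has already been used, so a date-based key would be one way to get at most one run per day.

```python
from datetime import datetime, timezone
from typing import Iterable


def already_ran_on_same_day(previous_start_times: Iterable[float], now: datetime) -> bool:
    """Return True if any earlier run started on the same UTC day as `now`.

    `previous_start_times` are Unix timestamps (an assumption about how the
    run history is exposed); `now` must be timezone-aware.
    """
    today = now.astimezone(timezone.utc).date()
    return any(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() == today
        for ts in previous_start_times
    )


def daily_run_key(now: datetime) -> str:
    """Hypothetical helper: a key that is identical for every run on one UTC day."""
    return now.astimezone(timezone.utc).date().isoformat()


if __name__ == "__main__":
    now = datetime(2022, 6, 8, 0, 30, tzinfo=timezone.utc)
    delayed = datetime(2022, 6, 8, 0, 5, tzinfo=timezone.utc).timestamp()
    print(already_ran_on_same_day([delayed], now))
    print(daily_run_key(now))
```

A run (or a sensor) would call the first function with the start times of earlier runs of the same job and bail out when it returns True.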

Bernardo Cortez

06/08/2022, 9:18 AM

Hi @johann! Thanks for your answer 😁

Bernardo Cortez

06/08/2022, 9:21 AM

In this case, I have a job running on a schedule that is supposed to look at the 'status' field and process everything still 'pending' on a daily basis. Therefore, I do not want to look at a specific date partition, as 'pending' rows might span several dates.

Bernardo Cortez

06/08/2022, 9:22 AM

Querying GraphQL sounds unidiomatic but effective 😅. Is there any documentation on how to do it?

johann

06/08/2022, 2:30 PM

https://docs.dagster.io/concepts/dagit/graphql#graphql-api

👍 1
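
As a rough sketch of what such a query might look like from Python: the snippet below builds a request body for the `runsOrError` query and extracts start times from a response. The field names (`pipelineName` in the filter, `runId`, `status`, `startTime`) are taken from memory of the GraphQL docs linked above and should be checked against the schema of your Dagster version; the job name `my_job` and the helper names are hypothetical. No network call is made here; the payload would be POSTed as JSON to Dagit's `/graphql` endpoint.

```python
import json
from typing import Any, Dict, List

# Assumed query shape; verify field names against your Dagster GraphQL schema.
RUNS_QUERY = """
query RunsForJob($jobName: String!) {
  runsOrError(filter: {pipelineName: $jobName}, limit: 20) {
    __typename
    ... on Runs {
      results { runId status startTime }
    }
  }
}
"""


def build_payload(job_name: str) -> Dict[str, Any]:
    """Build the JSON body for a GraphQL POST request."""
    return {"query": RUNS_QUERY, "variables": {"jobName": job_name}}


def extract_start_times(response: Dict[str, Any]) -> List[float]:
    """Collect start timestamps, ignoring runs that have not started yet."""
    runs = response["data"]["runsOrError"]
    if runs.get("__typename") != "Runs":
        return []  # e.g. an error type was returned instead of a run list
    return [r["startTime"] for r in runs["results"] if r.get("startTime") is not None]


if __name__ == "__main__":
    # "my_job" is a placeholder job name.
    print(json.dumps(build_payload("my_job"))[:60])
```

The resulting timestamps can then be compared against the current date to decide whether the current run should abort.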
