
Romain

10/04/2022, 8:42 PM
Hi everyone, I'm looking for a way to start a job after a backfill successfully completes. I have found two solutions here: one based on semi-public internal APIs and one based on the Dagster GraphQL API. I would like to know if one of them should be preferred over the other, or if something new has come up since. Context: I'm running a partitioned job that creates files to be uploaded to Redshift; since Redshift has a lot of overhead, I would like to upload all files at once (i.e. after the backfill) rather than one after the other (i.e. after each run).

sandy

10/04/2022, 10:33 PM
@prha - thoughts?

prha

10/04/2022, 11:04 PM
We do have some better-performing queries that were added maybe 3 months ago. Here’s a snippet using a semi-public internal API to fetch the status of all the partitions within your backfill:
from dagster import RunsFilter

def backfill_statuses(instance, backfill_id):
    # Runs launched by a backfill carry the 'dagster/backfill' tag,
    # so filter down to just the runs belonging to this backfill.
    run_partition_data = instance.get_run_partition_data(
        runs_filter=RunsFilter(tags={'dagster/backfill': backfill_id})
    )
    # Map each partition name to the status of its run.
    statuses = {}
    for item in run_partition_data:
        statuses[item.partition] = item.status
    return statuses
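For the "start a job after the backfill" part, a minimal sketch of a sensor built on that helper could look like the following. The redshift_upload_job and the hardcoded backfill id are placeholders, not real APIs, and it assumes item.status above is a DagsterRunStatus:
from dagster import DagsterRunStatus, RunRequest, SkipReason, sensor

@sensor(job=redshift_upload_job)  # hypothetical downstream upload job
def backfill_completion_sensor(context):
    backfill_id = "my_backfill_id"  # placeholder; hardcoded for the sketch
    statuses = backfill_statuses(context.instance, backfill_id)
    if statuses and all(s == DagsterRunStatus.SUCCESS for s in statuses.values()):
        # run_key dedupes repeated ticks, so the job fires once per backfill.
        yield RunRequest(run_key=backfill_id)
    else:
        yield SkipReason(f"Backfill {backfill_id} not finished yet")
One caveat: this only sees partitions that already have runs, so it could fire early while the backfill is still launching runs; comparing against the expected partition count would be safer.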
And here’s the matching GraphQL fragment:
fragment BackfillFragment on PartitionBackfill {
    partitionStatuses {
        results {
            id
            partitionName
            runId
            runStatus
        }
    }
}
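And a rough sketch of using that fragment from Python against a Dagit GraphQL endpoint. The endpoint URL and backfill id are placeholders, and it assumes the partitionBackfillOrError root field:
import requests

DAGIT_GRAPHQL_URL = "http://localhost:3000/graphql"  # placeholder endpoint

QUERY = """
query BackfillStatus($backfillId: String!) {
    partitionBackfillOrError(backfillId: $backfillId) {
        ... on PartitionBackfill {
            ...BackfillFragment
        }
    }
}
fragment BackfillFragment on PartitionBackfill {
    partitionStatuses {
        results {
            id
            partitionName
            runId
            runStatus
        }
    }
}
"""

resp = requests.post(
    DAGIT_GRAPHQL_URL,
    json={"query": QUERY, "variables": {"backfillId": "my_backfill_id"}},
)
results = resp.json()["data"]["partitionBackfillOrError"]["partitionStatuses"]["results"]
statuses = {r["partitionName"]: r["runStatus"] for r in results}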
Neither is really preferred over the other. They’re both pretty stable (no plans to change them), but we don’t have back-compat guarantees for these APIs.

Romain

10/04/2022, 11:19 PM
Thanks a lot, I'm going to give it a try!