Philip Gunning
07/29/2022, 10:22 AM
@graph
def load_assets():
    data_a = get_from_vendor_a()
    write_a_to_db(data_a)
    data_b = get_from_vendor_b()
    write_b_to_db(data_b)
    run_dq_checks(data_a, data_b)
However, instability in our vendor-side APIs can cause the fetch ops to fail, which fails the whole graph.
We want to move to an independent graph for each vendor, but what would be best practice for running the DQ check ops, which need data from both?
Is something like this feasible?
@graph
def load_assets_a():
    data_a = get_from_vendor_a()
    write_a_to_db(data_a)
    return data_a

@graph
def load_assets_b():
    data_b = get_from_vendor_b()
    write_b_to_db(data_b)
    return data_b

@graph
def ab_dq(data_a, data_b):
    run_dq_checks(data_a, data_b)
What should my repo look like if this is the case?
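To make the desired behaviour concrete, here is a minimal plain-Python sketch of the failure isolation being asked about. It deliberately uses no Dagster API. The vendor functions are stubbed stand-ins for the ops above, and `run_load`, `results`, and `dq_status` are hypothetical names introduced only for illustration.

```python
# Plain-Python illustration only; none of this is Dagster API.
# The vendor functions are stubs standing in for the ops in the question.

def get_from_vendor_a():
    return [1, 2, 3]


def get_from_vendor_b():
    # Simulates the unstable vendor API described above.
    raise ConnectionError("vendor B unavailable")


def run_load(fetch):
    """Run one vendor load in isolation; return (ok, data_or_error)."""
    try:
        return True, fetch()
    except Exception as exc:
        return False, exc


def run_dq_checks(data_a, data_b):
    # Stub check: both datasets must be non-empty.
    return len(data_a) > 0 and len(data_b) > 0


# Each vendor load runs independently, so a failure in one
# does not prevent the other from completing.
results = {
    "a": run_load(get_from_vendor_a),
    "b": run_load(get_from_vendor_b),
}

# The DQ checks run only when both loads succeeded.
if all(ok for ok, _ in results.values()):
    dq_status = run_dq_checks(results["a"][1], results["b"][1])
else:
    dq_status = None  # skipped because at least one load failed
```

In this sketch vendor A still loads even though vendor B fails, and the DQ step is skipped rather than failing everything. How that gating would be wired up between separate graphs in the orchestrator is the open part of the question.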