Hunter Kuffel

03/03/2023, 7:55 PM
Hey all! I’m wondering if y’all can either provide me with some guidance on what I’m attempting to set up or help me understand why it may be a fool’s errand.

A pattern we see more than once in our pipelines involves a downstream asset that relies on multiple upstream assets, each scheduled to ingest new data from external sources once per day. If all the upstream assets have new data, then of course I want the downstream asset to materialize and incorporate that new data. However, if at least one of the upstream assets has new data but one or more have failed that day’s run, I still want the downstream asset to materialize: we can fix whatever caused the upstream job to fail (often it’s just a fluky timeout that won’t persist), and the downstream asset can then pick up the new data from that job the next time it runs.

For example, if there are 10 ingestion jobs that run daily and get unioned together into a staging table, and each day one job fails but nine succeed, it’s more important to me that the staging table incorporates those nine new datasets each day than that it stay frozen and out of date just because the Facebook API threw a fluky error.

Looking through the docs, I feel like it should be possible to have some combination of a MultiAssetSensor and a RunStatusSensor that can yield a RunRequest if there’s at least one SUCCESS and enough time has gone by, but I’m struggling to put the pieces together, and I would love some affirmation that such an enterprise is at least feasible. Thanks and have a great weekend!


03/03/2023, 10:28 PM
I think it should be possible to implement this with a standard multi_asset_sensor, i.e. like this example: but modify the logic to take elapsed time into account and not require all assets to be materialized.
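
To make the "include time and not require all assets" idea concrete, here is a rough sketch of just the decision logic in plain Python. It isn't taken from the docs example; the function name `should_request_run`, the `grace_period` parameter, and the shape of the input mapping are all hypothetical, and the policy (run right away if everything is fresh, otherwise wait out a grace period measured from the earliest new materialization) is only one reasonable reading of the question.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def should_request_run(
    latest_new_materialization: Mapping[str, Optional[datetime]],
    now: datetime,
    grace_period: timedelta,
) -> bool:
    """Decide whether the downstream asset should be materialized.

    `latest_new_materialization` (hypothetical shape) maps each upstream
    asset name to the timestamp of its newest not-yet-consumed
    materialization, or None if that asset has produced nothing new.
    """
    fresh = [ts for ts in latest_new_materialization.values() if ts is not None]

    # No upstream asset has new data: nothing to pick up.
    if not fresh:
        return False

    # Every upstream asset has new data: no reason to wait.
    if len(fresh) == len(latest_new_materialization):
        return True

    # Only some upstreams have new data: wait until the grace period has
    # elapsed since the earliest new materialization, then run anyway.
    return now - min(fresh) >= grace_period
```

Inside a multi_asset_sensor body, the mapping could probably be built from `context.latest_materialization_records_by_key()`, which returns `None` for monitored assets with no unconsumed materialization. When the function returns True, the sensor would yield a `RunRequest` and call `context.advance_all_cursors()` so the same materializations aren't counted again. An upstream that failed today and gets fixed later would then show up as new data on the next evaluation.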