I'm trying to write an asset that stores the respo...
# ask-community
I'm trying to write an asset that stores the response of an api call. The api doesn't offer historical data. So, I'd like to have dagster do something like:
each time interval:
  asset queries api
  asset appends {formatted_time_interval: api_response} to a table
[…] isn't a good fit because there is no fixed backfill option. Would this be more like a […] that I continuously expand each time the […] runs? That feels sloppy, especially as the […] would need to do some weird workaround to reconcile the partition keys with the query time.
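For reference, the "append one row per time interval" loop above could look roughly like the sketch below. It is not Dagster-specific: the `snapshots` table, the `interval_key` helper, and the hourly interval are all assumptions made up for illustration. Flooring the query time to the start of its interval is one way to line the key up with whenever the run actually fired.

```python
import json
import sqlite3
from datetime import datetime, timezone


def interval_key(ts: datetime, interval_seconds: int = 3600) -> str:
    """Floor a timezone-aware query time to the start of its interval (UTC)."""
    epoch = int(ts.timestamp())
    floored = epoch - (epoch % interval_seconds)
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")


def append_snapshot(conn: sqlite3.Connection, ts: datetime, response: dict) -> str:
    """Append {interval_key: response} to a table, keeping the first row per interval."""
    key = interval_key(ts)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "interval_key TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    # INSERT OR IGNORE makes a retry within the same interval a no-op,
    # so an already-captured snapshot is never overwritten.
    conn.execute(
        "INSERT OR IGNORE INTO snapshots (interval_key, response) VALUES (?, ?)",
        (key, json.dumps(response)),
    )
    conn.commit()
    return key
```

Whatever runs on the interval would call `append_snapshot(conn, datetime.now(timezone.utc), api_response)`. Keeping the first row per interval is a design choice here; swapping in `INSERT OR REPLACE` would keep the latest one instead.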
Yeah, the tough part is that if you don't have history, then it's technically not backfillable. What I've seen/done before is:
• querying the API
• using a Kafka queue or something Pub/Sub to catch the messages and allow replaying if needed (errors happen)
• dumping the data into blob storage
• building a partitioned asset out of the blob storage data; this new asset can be backfilled.
You can tinker around with whether or not you want a queue, but that's the general pattern I've used when trying to reconcile with an API that doesn't have historical data.
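A minimal sketch of the "dump raw responses, then build a partitioned asset from the dump" part, with the queue left out. A local directory stands in for blob storage, and `land_raw_response`, `rebuild_partition`, and the one-folder-per-day layout are hypothetical names and choices, not anything from Dagster. The idea is that only the landing step touches the API, so the downstream step can be re-run for any day that was captured.

```python
import json
from datetime import datetime
from pathlib import Path


def land_raw_response(root: Path, response: dict, fetched_at: datetime) -> Path:
    """Write one raw API response under a per-day folder (assumed layout)."""
    day_dir = root / fetched_at.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    path = day_dir / (fetched_at.strftime("%H%M%S") + ".json")
    payload = {"fetched_at": fetched_at.isoformat(), "response": response}
    path.write_text(json.dumps(payload))
    return path


def rebuild_partition(root: Path, partition_key: str) -> list:
    """Rebuild one day's records from landed files only; never calls the API."""
    day_dir = root / partition_key
    if not day_dir.is_dir():
        return []  # nothing was captured for that day
    # File names are HHMMSS, so sorting them gives chronological order.
    return [json.loads(p.read_text()) for p in sorted(day_dir.glob("*.json"))]
```

In Dagster terms, `rebuild_partition` would be roughly the body of an asset with a daily partitions definition, called with the run's partition key, while `land_raw_response` would live in the scheduled job that hits the API. Days with no landed files stay empty, which matches the "can't backfill reality" limitation.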
so, in this case would you just define a job that did the first 3 steps? I don't know that backfilling is specifically required.
Likely yeah! Partitions that aren't backfillable are a bit of an anti-pattern since one bad day could ruin all of your data. What are you looking to do with your partitions? ex. load chunks into memory?
Yeah, unfortunately I haven't figured out a way to backfill reality yet :T At a high level, we use dagster to combine a bunch of different time-series data and produce derived concepts. So, it would be nice to be able to integrate this non-backfillable chunk into our current dagster graphs and concepts. We could just manage things like these separately, but the same abstraction is easier for us.