I'm trying to write an asset that stores the response of an API call (dagster #ask-community)


Drew You

04/17/2023, 4:18 PM

I'm trying to write an asset that stores the response of an API call. The API doesn't offer historical data, so I'd like to have Dagster do something like:


each time interval:
  asset queries api
  asset appends {formatted_time_interval: api_response} to a table

The `TimeWindowPartition` isn't a good fit because there is no fixed backfill option. Would this be more like a `DynamicPartition` that I continuously expand each time the `asset` runs? That feels sloppy, especially as the `asset` would need an awkward workaround to reconcile the partition keys with the query time.
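One way to read the loop above without any partitions: each run floors its own query time to the start of the current interval and uses that as the row key, so the "partition key vs. query time" mapping happens inside the write. This is a minimal standard-library sketch under stated assumptions: `fetch`, the `api_snapshots` table, hourly intervals, and SQLite as the storage are all hypothetical stand-ins. In Dagster, a body like this would typically sit inside a plain, unpartitioned `@asset` that a schedule materializes every interval (for example via `define_asset_job` and `ScheduleDefinition`); that wiring is left out so the sketch runs with only the standard library.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Any, Callable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_key(now: datetime, interval: timedelta = timedelta(hours=1)) -> str:
    """Floor a timezone-aware `now` to the start of its interval, as an ISO-8601 string."""
    floored = EPOCH + ((now - EPOCH) // interval) * interval
    return floored.isoformat()


def append_snapshot(
    conn: sqlite3.Connection,
    fetch: Callable[[], Any],  # hypothetical: whatever calls the real API
    now: datetime,
    interval: timedelta = timedelta(hours=1),
) -> str:
    """Query the API once and store {interval_start: response} in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_snapshots ("
        "interval_start TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    key = interval_key(now, interval)
    # One row per interval: a retried run within the same interval overwrites
    # the earlier row instead of adding a duplicate.
    conn.execute(
        "INSERT OR REPLACE INTO api_snapshots (interval_start, response) VALUES (?, ?)",
        (key, json.dumps(fetch())),
    )
    conn.commit()
    return key
```

For example, with hourly intervals a run at 16:18 UTC on 2023-04-17 stores its response under `2023-04-17T16:00:00+00:00`. Overwriting on retry is a design choice; keeping the first response per interval (`INSERT OR IGNORE`) is an equally valid alternative.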

Tim Castillo

04/17/2023, 4:30 PM

Yeah, the tough part is that if you don't have history, then it's technically not backfillable. What I've seen/done before is:
• querying the API
• using a Kafka queue or some other Pub/Sub system to catch the messages and allow replaying if needed (errors happen)
• dumping the data into blob storage
• building a partitioned asset out of the blob storage data; this new asset can be backfilled.
You can tinker with whether or not you want a queue, but that's the general pattern I've used when trying to reconcile with an API that doesn't have historical data.
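The last two steps of that pattern (land raw responses, then build a partitioned asset from what landed) can be sketched with the standard library. Assumptions, all hypothetical: a local directory stands in for blob storage, objects are laid out as `dt=YYYY-MM-DD/HHMMSS.json`, and the queue step is omitted. The point it illustrates is that the daily partition is computed only from stored files, so re-running it for any past day is safe; in Dagster this would roughly correspond to an asset with a `DailyPartitionsDefinition` that reads `context.partition_key`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def land_raw_response(root: Path, response: Any, fetched_at: datetime) -> Path:
    """Write one raw API response under a date-keyed path, standing in for a blob key."""
    day_dir = root / f"dt={fetched_at:%Y-%m-%d}"
    day_dir.mkdir(parents=True, exist_ok=True)
    # Second-level file names: two fetches in the same second would overwrite each other.
    path = day_dir / f"{fetched_at:%H%M%S}.json"
    path.write_text(json.dumps({"fetched_at": fetched_at.isoformat(), "response": response}))
    return path


def build_daily_partition(root: Path, partition_key: str) -> List[Dict[str, Any]]:
    """Rebuild one day's partition only from landed files, so it can be backfilled."""
    day_dir = root / f"dt={partition_key}"
    if not day_dir.exists():
        return []
    # Sorting by file name orders records by fetch time within the day.
    return [json.loads(p.read_text()) for p in sorted(day_dir.glob("*.json"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        land_raw_response(root, {"v": 1}, datetime(2023, 4, 17, 9, 0, tzinfo=timezone.utc))
        land_raw_response(root, {"v": 2}, datetime(2023, 4, 18, 1, 0, tzinfo=timezone.utc))
        print(build_daily_partition(root, "2023-04-17"))
```

Only the landing step depends on "now"; everything downstream of the stored files is reproducible.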


Drew You

04/18/2023, 1:23 PM

So, in this case, would you just define a job that does the first three steps? I don't know that backfilling is specifically required.

Tim Castillo

04/18/2023, 1:31 PM

Likely, yeah! Partitions that aren't backfillable are a bit of an anti-pattern, since one bad day could ruin all of your data. What are you looking to do with your partitions, e.g., load chunks into memory?
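Because a missed run against an API with no history cannot be recovered later, a cheap safeguard is to at least make the loss visible. This is a minimal sketch that lists the intervals in a window with no stored snapshot; it assumes, hypothetically, that snapshots are keyed by the ISO-8601 UTC start of each hourly interval. It does not recover data; it only flags which intervals downstream consumers should treat as missing.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def missing_intervals(
    present_keys: Iterable[str],
    start: datetime,
    end: datetime,
    interval: timedelta = timedelta(hours=1),
) -> List[str]:
    """Return interval-start keys in [start, end) that have no stored snapshot."""
    present = set(present_keys)
    missing = []
    t = start
    while t < end:
        # Keys are assumed to be isoformat() strings of timezone-aware interval starts.
        key = t.isoformat()
        if key not in present:
            missing.append(key)
        t += interval
    return missing


if __name__ == "__main__":
    s = datetime(2023, 4, 17, tzinfo=timezone.utc)
    stored = [(s + timedelta(hours=h)).isoformat() for h in (0, 1, 3)]
    print(missing_intervals(stored, s, s + timedelta(hours=4)))
```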

Drew You

04/18/2023, 1:47 PM

Yeah, unfortunately I haven't figured out a way to backfill reality yet :T At a high level, we use Dagster to combine a bunch of different time-series data sources and produce derived concepts. So it would be nice to be able to integrate this non-backfillable chunk into our current Dagster graphs and concepts. We could just manage things like this separately, but keeping everything under the same abstraction is easier for us.
