I’ve got two questions that are related to retriev...
# ask-community
r
I’ve got two questions related to retrieving data from an external API: 1. What is the best practice for handling rate limits (e.g. during asset backfills)? 2. I want to retrieve only updated data from the API, e.g. using a modified_date field. Should partitions be used, even though an older partition might become out of date when a newer partition is updated?
o
For the rate limiting bit, one possibility would be to limit the number of concurrent runs of your backfill (https://docs.dagster.io/deployment/run-coordinator#limiting-run-concurrency), to avoid having a ton of runs all hit the API at once. For the second bit, I think partitions would likely not be a great fit, because of the issue you mention: an update that lands in a newer partition can leave an older one stale, so it's pretty easy to get things into a weird state. If possible, it might make sense to do a query before hitting the API that tells you the newest modified_date you already have stored, then use that to decide what modified_date to pass in.
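A rough sketch of that "check what you already have, then ask only for newer data" idea, in plain Python. Everything here is an assumption for illustration: the record shape (a dict with `id` and `modified_date`), the helper names, and `fake_api`, which stands in for the real external API and its `modified_date` filter.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-in for the data behind the external API.
DATA = [
    {"id": 1, "modified_date": datetime(2023, 1, 1)},
    {"id": 2, "modified_date": datetime(2023, 1, 2)},
    {"id": 3, "modified_date": datetime(2023, 1, 3)},
]


def fake_api(since: Optional[datetime]) -> List[dict]:
    """Hypothetical API call: return records modified strictly after `since`."""
    return [r for r in DATA if since is None or r["modified_date"] > since]


def newest_modified_date(stored: Iterable[dict]) -> Optional[datetime]:
    """Latest modified_date already in storage, or None if storage is empty."""
    return max((r["modified_date"] for r in stored), default=None)


def fetch_updates(
    stored: Iterable[dict],
    fetch_since: Callable[[Optional[datetime]], List[dict]],
) -> List[dict]:
    """Query storage for the newest record first, then request only newer ones."""
    watermark = newest_modified_date(stored)
    return fetch_since(watermark)
```

With nothing stored, `fetch_updates([], fake_api)` asks for everything; once the first record is stored, only the two newer ones come back. Whether the comparison should be strict (`>`) or inclusive depends on how the real API treats records that share a timestamp.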
r
Thanks for helping @owen! Regarding the second question: would it still make sense to use assets for this purpose, with an IOManager that retrieves all records in load_input while storing only the modified records in handle_output? Or would this go against the design principles of assets, making ops and jobs the better choice in this case?
o
Using that IOManager with software-defined assets (SDAs) makes sense to me, as storing all of the records in handle_output would functionally do the same thing as storing just the modified ones (just in a less efficient way).
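To make the equivalence concrete, here is a minimal sketch of the storage behaviour being described. It is a plain Python class that only mirrors the `handle_output` / `load_input` method names; it deliberately does not import Dagster so it runs on its own. In a real project it would subclass Dagster's `IOManager`, and both methods would also receive a context argument. The class name, the in-memory dict, and the `id` key are all assumptions.

```python
from typing import Dict, List


class UpsertingStore:
    """Hypothetical store: writes upsert by id, reads return every record."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}

    def handle_output(self, records: List[dict]) -> None:
        # Upsert: new ids are added, existing ids are overwritten.
        for rec in records:
            self._rows[rec["id"]] = rec

    def load_input(self) -> List[dict]:
        # Downstream consumers always see the full set of records.
        return sorted(self._rows.values(), key=lambda r: r["id"])


initial = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
modified_only = [{"id": 2, "value": "B"}]
full_after_update = [{"id": 1, "value": "a"}, {"id": 2, "value": "B"}]

# Store only the modified records on the second write...
incremental = UpsertingStore()
incremental.handle_output(initial)
incremental.handle_output(modified_only)

# ...versus re-storing everything on the second write.
full_rewrite = UpsertingStore()
full_rewrite.handle_output(initial)
full_rewrite.handle_output(full_after_update)
```

Both stores end up returning the same records from `load_input()`; the incremental one simply writes less on each run.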
r
I agree, I think it makes sense! Thanks for helping :)