I'm scraping an API that can be partitioned and ca...
# ask-community
I'm scraping an API that can be partitioned and can support 30 simultaneous calls, so I'm looking for ways to parallelize over partitions using asyncio/tasks/gather or anyio/task groups. Would I need to create an executor for this use case or is there a way to do this with built-ins? In essence, I'd like to run ops in parallel using async within a single process. Thanks for any suggestions.
One approach that works for me is to use a "partition range" rather than a single partition as the partition source. From there, I can manage the async parallelism myself (still partitioned by day, but running multiple days simultaneously using task groups) and materialize the partitions directly. So far, that works well.
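For anyone reading later, here is a minimal sketch of the concurrency part of this idea using only the standard library. It is not the poster's actual code: `fetch_day`, `fetch_range`, and `days_in_range` are hypothetical names, the `asyncio.sleep` call stands in for the real API request, and the cap of 30 comes from the limit mentioned in the question. It uses `asyncio.gather` with a semaphore; on Python 3.11+ an `asyncio.TaskGroup` could likely be used in much the same way.

```python
import asyncio
from datetime import date, timedelta

MAX_CONCURRENT_CALLS = 30  # concurrency limit mentioned in the question


def days_in_range(start: date, end: date) -> list[date]:
    """Return every day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


async def fetch_day(day: date) -> dict:
    """Hypothetical stand-in for one API call covering a single day partition."""
    await asyncio.sleep(0.01)  # placeholder for the real network request
    return {"day": day.isoformat(), "rows": []}


async def fetch_range(start: date, end: date) -> dict[str, dict]:
    """Fetch all days in the range concurrently, capped at MAX_CONCURRENT_CALLS."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

    async def fetch_limited(day: date) -> dict:
        # The semaphore ensures at most MAX_CONCURRENT_CALLS requests are in flight.
        async with semaphore:
            return await fetch_day(day)

    results = await asyncio.gather(
        *(fetch_limited(day) for day in days_in_range(start, end))
    )
    # Key results by day so each one can be matched back to its partition.
    return {result["day"]: result for result in results}


def fetch_range_sync(start: date, end: date) -> dict[str, dict]:
    """Synchronous entry point for callers that are not already in an event loop."""
    return asyncio.run(fetch_range(start, end))
```

Everything runs in a single process on one event loop, so no extra executor is needed for the fan-out itself. How the per-day results are then turned into partition materializations depends on your setup and is left out here.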
The partition range approach makes a lot of sense. Another possible solution is to use an in-process executor and yield dynamic outputs, one output per partition. I'm not sure how that compares to your current approach, but it may work for your use case as well.