# announcements
m
I am really excited to dig into the new `collect` and dynamic mapping feature. I wanted to share my use case and make sure it fits. I have a script that takes in a list of IDs and iterates over them, hitting an API to download a file for each ID. The list of IDs varies per run. I also use `concurrent.futures` to download with multiple threads. From looking at the docs and using the test code, I really like how it fans out each interaction, so you can see the dynamic nature of the work the solid is doing. Is collect/dynamic mapping meant to run in serial? I also saw a mention of “retries” in the 0.11.0 blog post - is that an automatic thing? I’d love any feedback on my use case: whether y’all think it fits and/or any gotchas I should consider. Thx! (Please pardon any incorrect use of technical terminology 🙂)
a
Seems like a good fit from what I can tell.
> Is collect/dynamic mapping meant to run in serial?
If you use the default in-process executor, it will execute in serial, but you can use the multiprocess executor or a more complex one to run in parallel: https://docs.dagster.io/concepts/executors#overview
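To make the serial/parallel distinction concrete, here is a stdlib-only analogy, not Dagster code: `map_then_collect` and `slow_square` are hypothetical names, and a thread pool's `max_workers` stands in for the choice of executor. With one worker the mapped steps run one after another; with more workers they overlap, and in both cases the collect step waits for every mapped result.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_then_collect(items: List[T], step: Callable[[T], R], max_workers: int) -> List[R]:
    """Fan out `step` over `items`, then gather ("collect") every result.

    max_workers=1 behaves like a serial executor: one mapped step at a time.
    Larger values let mapped steps overlap, like a parallel executor.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(step, item) for item in items]
        # The collect step only completes once every mapped step has finished.
        return [f.result() for f in futures]


def slow_square(x: int) -> int:
    time.sleep(0.1)  # stand-in for a slow download
    return x * x


if __name__ == "__main__":
    for workers in (1, 4):
        start = time.perf_counter()
        results = map_then_collect([1, 2, 3, 4], slow_square, max_workers=workers)
        elapsed = time.perf_counter() - start
        print(f"max_workers={workers}: {results} in {elapsed:.2f}s")
```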
> “retries” in the 0.11.0 blog post - is that an automatic thing?
Not yet; you have to raise a retry request: https://docs.dagster.io/concepts/solids-pipelines/solid-events#retry-requests
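The "opt-in" nature of retries can be illustrated with a small stdlib imitation. This is a sketch under stated assumptions, not Dagster's API: `RetryRequest`, `run_with_retries`, and `make_flaky_download` are hypothetical names, and the real class and parameters are described on the linked page. The point is that the runner only retries when the step explicitly asks; any other exception fails immediately.

```python
import time
from typing import Callable, TypeVar

R = TypeVar("R")


class RetryRequest(Exception):
    """Hypothetical signal a step raises to ask the runner for another attempt."""

    def __init__(self, max_retries: int = 3, seconds_to_wait: float = 0.0) -> None:
        super().__init__("retry requested")
        self.max_retries = max_retries
        self.seconds_to_wait = seconds_to_wait


def run_with_retries(step: Callable[[], R]) -> R:
    """Re-run `step` only when it raises RetryRequest; other errors fail at once."""
    retries_used = 0
    while True:
        try:
            return step()
        except RetryRequest as request:
            if retries_used >= request.max_retries:
                raise  # retry budget exhausted: surface the failure
            retries_used += 1
            time.sleep(request.seconds_to_wait)


def make_flaky_download(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical download that fails a fixed number of times, then succeeds."""
    calls = {"n": 0}

    def download() -> str:
        calls["n"] += 1
        if calls["n"] <= failures_before_success:
            # Nothing retries automatically: the step opts in by raising
            # RetryRequest instead of letting the network error escape.
            raise RetryRequest(max_retries=2) from ConnectionError("transient error")
        return f"succeeded on call {calls['n']}"

    return download


if __name__ == "__main__":
    print(run_with_retries(make_flaky_download(2)))  # prints "succeeded on call 3"
```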
m
(As @alex just said) One of the great features of Dagster is that the executor can handle the concurrency for you. In my example, Dagster is running a maximum of 2 concurrent instances of my `fetch_github_releases` solid. Note how in the Gantt view I can see which solids ran in parallel, including the relative time each repo takes to download.
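A rough stdlib analogy for a concurrency cap of 2, assuming each repo fetch is an independent task: `fetch_releases`, `ConcurrencyTracker`, and `fetch_all` are hypothetical names, and the sleep stands in for the real download. In Dagster the cap is set in the executor's configuration rather than written into the solid, which is why the per-repo function itself stays simple; the tracker here just confirms the cap holds and records start/end times like a text-only Gantt view.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class ConcurrencyTracker:
    """Counts how many tasks are running at once and remembers the peak."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyTracker":
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        return self

    def __exit__(self, *exc_info) -> None:
        with self._lock:
            self.running -= 1


def fetch_releases(repo: str, tracker: ConcurrencyTracker) -> Tuple[str, float, float]:
    """Hypothetical stand-in for fetch_github_releases; returns (repo, start, end)."""
    with tracker:
        start = time.perf_counter()
        time.sleep(0.05)  # pretend to download this repo's releases
        return repo, start, time.perf_counter()


def fetch_all(repos: List[str], max_concurrent: int = 2) -> Tuple[List[Tuple[str, float, float]], int]:
    """Run one fetch per repo, never more than `max_concurrent` at a time."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        timings = list(pool.map(lambda repo: fetch_releases(repo, tracker), repos))
    return timings, tracker.peak


if __name__ == "__main__":
    timings, peak = fetch_all(["repo-a", "repo-b", "repo-c", "repo-d"])
    t0 = min(start for _, start, _ in timings)
    for repo, start, end in timings:
        # A crude text "Gantt chart": offsets from the first start, in ms.
        print(f"{repo}: {1000 * (start - t0):6.1f} -> {1000 * (end - t0):6.1f} ms")
    print("peak concurrency:", peak)
```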
IMO this makes for simpler (and easier to test) solid code 🙂
m
@alex thank you for the links and feedback! @mrdavidlaing thanks for sharing your example. This is exactly what I need. The more I dig in, the more awesome the platform is. I agree this simplifies the code.