Hello, I have the following use case: - Get list o...
# ask-community
m
Hello, I have the following use case:
• Get a list of strings from a URL (this I completed)
• Download a zip file for each string
• Unzip the file
• Store the result on S3
I'm stuck at the zip file download. I need to do this for 200+ strings. Do I do this in one asset, or can I iterate over the strings and have 200+ assets? I remember something about composite solids in the past, but they are no longer a thing. Also, what would my return types look like? Any feedback or advice is appreciated 🙂
y
i think how you model the assets depends on how you'd like to monitor them going forward. the simplest model is to pass a single output (a list of 200+ strings) to the downstream asset and let that asset loop through it. here's a similar example where the first asset gets a list of records from a url (and converts the list to a pandas dataframe, which you can ignore), and the second asset takes the list and loops through it: https://github.com/dagster-io/dagster/blob/master/examples/quickstart_aws/quickstart_aws/assets/hackernews.py#L13-L27 (it also stores data to s3)
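A minimal sketch of the "single downstream step that loops" idea, in plain Python so it can run without Dagster. The URL pattern and the names `fetch_zip`, `unzip_bytes`, and `download_and_unzip_all` are assumptions for illustration, not taken from the linked example. In a Dagster project the last function would roughly be the body of the downstream asset, and its return type (a dict keyed by string) is one possible answer to the return-type question.

```python
import io
import zipfile
from typing import Callable, Dict, List
from urllib.request import urlopen

# Hypothetical URL pattern; the real endpoint is not given in the thread.
BASE_URL = "https://example.com/files/{name}.zip"


def fetch_zip(name: str) -> bytes:
    """Download the zip archive for one string and return its raw bytes."""
    with urlopen(BASE_URL.format(name=name)) as resp:
        return resp.read()


def unzip_bytes(data: bytes) -> Dict[str, bytes]:
    """Extract an in-memory zip archive into {member name: contents}."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        }


def download_and_unzip_all(
    names: List[str],
    fetch: Callable[[str], bytes] = fetch_zip,
) -> Dict[str, Dict[str, bytes]]:
    """Loop over every string from the upstream step; one entry per string."""
    return {name: unzip_bytes(fetch(name)) for name in names}
```

Passing `fetch` as a parameter keeps the loop testable without network access. For 200+ large archives, holding everything in memory may not be practical, so writing each result to S3 inside the loop could be a better fit.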
m
Thank you for the recommendation. I looked at that example, but it didn't quite suit my needs. I'm currently trying to work with partitions, because the downloaded zip files are per date.
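A rough sketch of what the per-date processing could look like, again in plain Python. It assumes the partition key is a `YYYY-MM-DD` string; in Dagster that value would typically come from `context.partition_key` on an asset defined with a `DailyPartitionsDefinition`. The URL template, `url_for_partition`, and `process_partition` are hypothetical names, and `store` stands in for whatever writes to S3 (for example a thin wrapper around boto3's `put_object`).

```python
import io
import zipfile
from datetime import date
from typing import Callable, List

# Hypothetical URL pattern; the thread does not give the real one.
URL_TEMPLATE = "https://example.com/archive/{day}.zip"


def url_for_partition(partition_key: str) -> str:
    """Validate a YYYY-MM-DD partition key and build that day's zip URL."""
    day = date.fromisoformat(partition_key)  # raises ValueError on bad keys
    return URL_TEMPLATE.format(day=day.isoformat())


def process_partition(
    partition_key: str,
    fetch: Callable[[str], bytes],
    store: Callable[[str, bytes], None],
) -> List[str]:
    """Download one day's zip, unzip it, store each file under a dated key."""
    data = fetch(url_for_partition(partition_key))
    stored = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file contents
            object_key = f"{partition_key}/{info.filename}"
            store(object_key, archive.read(info))
            stored.append(object_key)
    return stored
```

With this shape, each run handles exactly one date, so a failed day can be retried on its own, and the returned list of object keys gives the run something small to report.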