I'm working on a pipeline that reads a list of URLs from an environment variable (I have this as a resource). For each URL we need to run multiple steps:

1. Fetch several data sets from the URL (the data set names also come from an environment variable/resource).
2. Push each fetched data set to Kafka, to a topic with the same name as the data set.

The work for each URL/data-set pair can (and should) run in parallel, and the entire pipeline should run on a schedule. The process looks something like the image below.
What would be the best way to tackle this? Where should I be using assets/graphs/jobs/ops?
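For reference, here is a minimal plain-Python sketch of the fan-out I'm describing, before any Dagster abstractions. The env-var names, comma-separated format, and the `fetch_dataset`/`push_to_kafka` helpers are all placeholders, not my real code:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical env-var format: comma-separated lists (placeholder names/values).
URLS = os.environ.get("SOURCE_URLS", "https://a.example,https://b.example").split(",")
DATASETS = os.environ.get("DATASETS", "orders,users").split(",")

def fetch_dataset(url: str, dataset: str) -> str:
    # Placeholder for the real HTTP fetch of one data set from one URL.
    return f"{dataset}@{url}"

def push_to_kafka(topic: str, payload: str):
    # Placeholder for a real Kafka producer; the topic is the data set name.
    return (topic, payload)

def process(url: str, dataset: str):
    payload = fetch_dataset(url, dataset)
    return push_to_kafka(dataset, payload)

# One unit of work per (URL, data set) pair, all pairs run concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda pair: process(*pair),
                            [(u, d) for u in URLS for d in DATASETS]))
```

So the question is really how to map this URL × data-set fan-out onto Dagster's primitives.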
Thanks for the insights!