Yehuda Ornstein

02/01/2023, 5:02 PM
I'm working on a pipeline that is supposed to use a list of URLs from an environment variable (I have this as a resource). For each URL we need to run multiple steps: 1. get several data sets (the list of data sets to get also comes from an environment variable/resource), and 2. for each data set we get, push the data to Kafka (to a topic with the same name as the data set). The process for each URL/data set can and should run in parallel, and the entire pipeline should run on a schedule. The process looks something like the image below. What would be the best way to tackle this? Where should I be using assets/graphs/jobs/ops? Thanks for the insights!


jamie

02/01/2023, 5:23 PM
Hi @Yehuda Ornstein, you’ll likely want to use ops/graphs/jobs for this, specifically dynamic graphs.

Yehuda Ornstein

02/01/2023, 5:25 PM
Thanks @jamie. I'll take a look