# ask-community
Yehuda Ornstein:
I'm working on a pipeline that should take a list of URLs from an environment variable (I have this as a resource) and, for each URL, run multiple steps: 1. fetch several data sets for that URL (the data sets to fetch are also an environment variable/resource), then 2. for each data set, push the data to Kafka (to a topic with the same name as the data set). The process for each URL/data set can and should run in parallel, and the entire pipeline should run on a schedule. The process looks something like the image below. What would be the best way to tackle this? Where should I be using assets/graphs/jobs/ops? Thanks for the insights!
jamie:
Hi @Yehuda Ornstein, you'll likely want to use ops/graphs/jobs for this, specifically dynamic graphs: https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs
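A minimal sketch of that dynamic-graph approach, under some assumptions: the env var names (`PIPELINE_URLS`, `PIPELINE_DATASETS`), op/job names, and the cron expression are all hypothetical, and the fetch/Kafka-produce logic is stubbed out. Since Dagster's dynamic outputs can't be nested, one op fans out over the URL x data-set cross product, which gives you parallelism across both dimensions in a single `.map()`:

```python
import os
from dagster import DynamicOut, DynamicOutput, ScheduleDefinition, job, op


@op(out=DynamicOut())
def url_dataset_pairs():
    # Hypothetical env vars holding comma-separated lists; a Dagster
    # resource could supply these instead, per the original setup.
    urls = os.environ.get("PIPELINE_URLS", "").split(",")
    datasets = os.environ.get("PIPELINE_DATASETS", "").split(",")
    # Fan out over the cross product so each URL/data-set pair becomes
    # its own mapped op invocation.
    for i, url in enumerate(urls):
        for j, dataset in enumerate(datasets):
            yield DynamicOutput((url, dataset), mapping_key=f"{i}_{j}")


@op
def fetch_and_push(context, pair):
    url, dataset = pair
    # Placeholder for the real fetch + Kafka-produce logic; the topic
    # name mirrors the data-set name, as described in the question.
    context.log.info(f"would fetch {dataset} from {url} and push to topic {dataset}")


@job
def url_pipeline():
    # .map() creates one fetch_and_push step per dynamic output; the
    # default multiprocess executor runs the mapped steps in parallel.
    url_dataset_pairs().map(fetch_and_push)


# Run the whole job on a schedule (hourly here, as an example).
hourly_schedule = ScheduleDefinition(job=url_pipeline, cron_schedule="0 * * * *")
```

Fanning out over the cross product in one op is a design choice, not the only option: you could instead map over URLs only and loop over data sets inside each mapped op, trading some parallelism for fewer steps.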
Yehuda Ornstein:
Thanks @jamie. I'll take a look