# ask-community
Mehdi Nazari
Hello all! What are the best practices for repeating the same pipeline over homogeneous entries, i.e. a few database records that need to go through the same pipeline on every run?
johann
Hi @Mehdi Nazari, there are a few ways you could accomplish this. If you want a single run of the pipeline to process multiple records, you might benefit from the `map` command (https://docs.dagster.io/_apidocs/dynamic#dynamic-graphs-experimental), which lets you call the same solid (and its downstream solids) on each record. Alternatively, you could consider a sensor (https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) that kicks off a new run each time a record is added to the table, though that might not make sense depending on the number of records, etc.
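For reference, a minimal sketch of the dynamic `map` approach. The names `fetch_records`, `process_record`, and `records_pipeline` are illustrative, and the dynamic-output APIs were experimental at the time of this thread, so the import path may differ by Dagster release (some versions expose them under `dagster.experimental`):

```python
from dagster import pipeline, solid

# Import path may vary by Dagster version; older releases exposed these
# under dagster.experimental rather than top-level dagster.
from dagster import DynamicOutput, DynamicOutputDefinition


@solid(output_defs=[DynamicOutputDefinition()])
def fetch_records(_):
    # Hypothetical stand-in for querying the database records.
    for record_id in ["1", "2", "3"]:
        yield DynamicOutput(value=record_id, mapping_key=f"record_{record_id}")


@solid
def process_record(_, record_id):
    # Per-record pipeline logic goes here.
    return record_id


@pipeline
def records_pipeline():
    # map() invokes process_record once per DynamicOutput yielded above,
    # fanning out the same solid (and its downstreams) over each record.
    fetch_records().map(process_record)
```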
Mehdi Nazari
Thanks @johann! `map` seems to be an experimental feature of Dagster at this point, and I'm not sure whether the API will change in future releases; I'd ideally rely on a production-ready capability. So, considering the sensor alternative: what exactly did you mean by it "might not make sense depending on the number of records"?
johann
cc @alex, but I think `map` is pretty stable at this point; `collect` is newer. Regarding the sensor: all I meant is that it launches one pipeline run per record. Since each run corresponds to a process/k8s job/etc. (depending on the run launcher + executor), this may carry high overhead.
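A sketch of the sensor alternative, for comparison. Here `fetch_unprocessed_records` is a hypothetical helper you'd write against your own table, and the `run_config` shape assumes `process_record` accepts a `record_id` config field:

```python
from dagster import RunRequest, sensor


@sensor(pipeline_name="records_pipeline")
def new_record_sensor(context):
    # fetch_unprocessed_records is a hypothetical helper that queries the
    # table for records that haven't been processed yet.
    for record in fetch_unprocessed_records():
        yield RunRequest(
            # run_key dedupes: Dagster skips a RunRequest whose run_key it
            # has already seen, so each record launches at most one run.
            run_key=f"record_{record.id}",
            run_config={
                "solids": {
                    "process_record": {"config": {"record_id": record.id}}
                }
            },
        )
```

This is where the overhead johann mentions shows up: each `RunRequest` becomes a full run, i.e. its own process or k8s job, so a table with many new records per tick translates into many concurrent runs.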