The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability.

dagster

Hi all. Can somebody tell me whether there is a way to implement a pipeline similar to map-reduce?

Can you elaborate on the question a bit? Are you looking for more of a map > shuffle > reduce that Dagster manages? I'd think you could do this by handing the work to Spark and letting the Spark framework manage it, but managing it through Solids would be difficult if you don't know the data splits beforehand. Maybe Dagster parallelism has some possibilities, but I haven't explored it.
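To make the map > shuffle > reduce stages concrete, here is a minimal sketch in plain Python. It uses no Dagster or Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    # Map: emit one (key, value) pair per word in each input line.
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: collapse each key's values into a single result.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(records):
    # Chain the three stages; a real engine would distribute each one.
    return reduce_phase(shuffle_phase(map_phase(records)))
```

The hard part for an orchestrator is the shuffle: the number of groups is only known once the map output exists, which is the "don't know the data splits beforehand" problem.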

It’s a non-goal (currently) for us to support generalized map reduce. We consider that a feature of a compute substrate (like Spark, distributed Dask, Hive, etc) rather than the orchestration substrate.

We may look at supporting some very coarse-grained parallelism (e.g. firing N runs off simultaneously that don’t need to coordinate in a real way), but I would suggest pushing any sort of map-reduce operation down to a layer like Spark, as <@URKA22CG4> suggests.
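As a rough illustration of what "firing N runs off simultaneously" could look like, here is a sketch using only the Python standard library. It is not a Dagster feature or API: `run_partition` is a hypothetical stand-in for launching one independent pipeline run, and `fire_runs` is an assumed helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partition(partition_id):
    # Hypothetical stand-in for launching one independent pipeline run.
    # A real version would call whatever starts a run for this partition.
    return {"partition": partition_id, "status": "success"}


def fire_runs(partition_ids, max_workers=4):
    # Launch the runs concurrently; they never coordinate with each other.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_partition, partition_ids))
```

Because the runs share nothing, there is no shuffle step here, which is what keeps this kind of parallelism cheap for an orchestrator to support.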

If I have to write the logic using Spark or Dask, then what do I need Dagster for in this case? I hoped that I would be able to express my logic in terms of Dagster while the execution would be performed on a Dask cluster. For example, a framework such as <http://prefect.io|prefect.io> supports this.