I've got a question about splitting work and whether Dagster is a good fit for it. Essentially, I'm wondering about split-apply-combine strategies. Imagine I have a large table and want to run an op on each split, or I have many files in S3 and want to run an op on each file, and there are on the order of a million splits or files. Conceptually, what I want is to define the table or the files as an asset (maybe with a dynamic partition per split/file) and then have one unpartitioned downstream asset that is the outcome of combining those one million op results.
Would Dagster be a good fit for this with (dynamic) partitions, or would it be better to use Dagster only to coordinate the work and manage the assets? What I mean is: a single op that takes the entire upstream asset as input, lets, for example, Dask or Spark do the split and combine work on that asset, and finally creates the combined output as a single asset again. Thank you in advance for your insights.
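For comparison, the second option would collapse the fan-out/fan-in into one op. A minimal sketch of that pattern, using the stdlib `ThreadPoolExecutor` to stand in for Dask/Spark (`process_split` is a hypothetical per-split function, and the integer list stands in for the real table or file listing):

```python
from concurrent.futures import ThreadPoolExecutor

def process_split(split: list[int]) -> int:
    # hypothetical per-split op; real work would read one file/table chunk
    return sum(split)

def split_apply_combine(rows: list[int], n_splits: int) -> int:
    # split: carve the "table" into n_splits chunks
    chunks = [rows[i::n_splits] for i in range(n_splits)]
    # apply: run the op on each chunk in parallel (Dask/Spark in practice)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process_split, chunks))
    # combine: reduce the partial results into the single output asset
    return sum(partials)

print(split_apply_combine(list(range(100)), n_splits=8))  # → 4950
```

In this version Dagster would only see two assets (the input and the combined output), and the million-way parallelism would be invisible to it.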