# ask-community

Binoy Shah

08/28/2023, 4:23 PM
Hi, I have a use case where a tree of partitions and sub-partitions needs to be created. The goal is to distribute the processing of each primary partition (date-based) into smaller, size-based chunks, so that my job runs don't rely on huge amounts of memory. How do I add each day's distinct set of dynamic sub-partitions? For example:
Primary partition date 2023-07-21 has 17 chunk sub-partitions (dynamic in nature)
Primary partition date 2023-07-22 has 32 chunk sub-partitions (dynamic in nature)
If I add the chunks as dynamic partitions, I end up with a cartesian product of chunks as partitions, which I don't want. I want each sub-partition to identify itself as a subset of its primary partition. Is there a way to achieve this? Attaching a picture of what I end up with 😞
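The cartesian-product problem described above can be sketched in plain Python. The chunk counts (17 and 32) come from the message; the key names and everything else are illustrative:

```python
from itertools import product

# Per-date chunk counts from the message above (illustrative).
chunks_per_date = {"2023-07-21": 17, "2023-07-22": 32}

# Desired: each date keeps only its own chunks.
desired = [
    (date, f"chunk_{i:02d}")
    for date, n in chunks_per_date.items()
    for i in range(1, n + 1)
]

# What a two-dimensional (date x chunk) partition set actually yields:
# every date is paired with every chunk key that exists on the chunk
# dimension -- a cartesian product.
all_chunk_keys = {
    f"chunk_{i:02d}"
    for n in chunks_per_date.values()
    for i in range(1, n + 1)
}
cartesian = list(product(chunks_per_date, sorted(all_chunk_keys)))

print(len(desired))    # 49 partitions actually wanted (17 + 32)
print(len(cartesian))  # 64 partitions produced (2 dates x 32 chunk keys)
```

The gap widens fast: the cartesian count is (number of dates) × (largest chunk count), while the wanted count is just the sum of per-date chunks.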

sandy

08/28/2023, 6:05 PM
Hi Binoy - this isn't currently possible. Mind filing an issue for it if you'd like us to track the request?

Binoy Shah

08/28/2023, 6:15 PM
I am not sure what the ticket would be. It could be just a UI representation, if we go by a partition-key convention like
2023-08-27|2023-08-27_chunk_01
or there could be an actual partition + sub-partition tree data structure with a
parent_partition
kind of convention. Based on multiple searches on the Dagster channels, I have seen many folks needing to do fork-and-join style work to distribute their workload with Dagster, but everybody did it in their own way. Traditional map/reduce functionality has similar end goals. You know the internals inside out: would something like this even be feasible with Dagster? I'm sure the lineage and partition map would look fantastic with it. @sandy
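The single-dimension key convention suggested here can be prototyped without any framework support: use one dynamic partition set whose keys embed the parent date. A minimal sketch in plain Python; the delimiter and helper names are hypothetical, not a Dagster API:

```python
DELIM = "|"  # hypothetical separator, as in "2023-08-27|chunk_01"

def make_key(date: str, chunk: str) -> str:
    """Encode a parent date and a chunk id into one composite partition key."""
    return f"{date}{DELIM}{chunk}"

def parse_key(key: str) -> tuple[str, str]:
    """Split a composite key back into (parent_date, chunk)."""
    date, chunk = key.split(DELIM, 1)
    return date, chunk

def keys_for_date(keys: list[str], date: str) -> list[str]:
    """Select the sub-partitions belonging to one primary partition."""
    return [k for k in keys if parse_key(k)[0] == date]

keys = [make_key("2023-08-27", f"chunk_{i:02d}") for i in range(1, 4)]
keys += [make_key("2023-08-28", f"chunk_{i:02d}") for i in range(1, 3)]
print(keys_for_date(keys, "2023-08-27"))
# ['2023-08-27|chunk_01', '2023-08-27|chunk_02', '2023-08-27|chunk_03']
```

In Dagster, keys of this shape could back a single `DynamicPartitionsDefinition`, sidestepping the cartesian product entirely, at the cost of the UI rendering them as one flat list rather than a tree.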

sandy

08/28/2023, 9:07 PM
In particular, multi-partition definitions are expected to be a cartesian product. What's currently not possible is having a different set of partitions on one dimension for each element of the other dimension.

Binoy Shah

08/28/2023, 11:27 PM
So for this problem statement (distributed processing by chunking), what Dagster pattern/concept is most suitable? Currently we do have it implemented as a cartesian product of two dimensions, which has produced more than 500K partitions, and I'm not sure Dagster can scale to that level. I am willing to refactor it for better scalability.
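One refactoring direction for this shape of problem is to keep only the date dimension as partitions and do the chunking inside each run as a fan-out/fan-in. In Dagster this control flow maps onto dynamic outputs (`DynamicOut` with `.map`/`.collect`); the sketch below shows the same flow in plain Python, with a hypothetical workload, so the structure is explicit:

```python
# Fan-out / fan-in (map-reduce) for ONE daily partition: chunking happens
# inside the run, so only the date dimension needs to be partitioned.

def fan_out(records: list[int], chunk_size: int) -> list[list[int]]:
    """Split the day's records into size-based chunks (one fan-out branch each)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def process_chunk(chunk: list[int]) -> int:
    """Per-chunk work; only one chunk's worth of data is in memory at a time."""
    return sum(chunk)

def fan_in(results: list[int]) -> int:
    """Join step: combine the mapped chunk results."""
    return sum(results)

# Hypothetical day's workload.
day_records = list(range(100))
chunks = fan_out(day_records, chunk_size=17)
total = fan_in([process_chunk(c) for c in chunks])
print(len(chunks), total)  # 6 4950
```

This keeps the partition count at one per day (365 per year instead of 500K), while per-chunk parallelism comes from the dynamic mapping inside each run rather than from the partition set.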