# ask-community
For partition performance: it seems the size of the dimensions in a `MultiPartitionsDefinition` matters a lot (in the asset materialization step). After shrinking one of the two dimensions (a `StaticPartitionsDefinition`) from ~5000 keys to 3, the materialization time per partition (a 1.3 KB pickle) dropped from 3s to 0.1s. That is a huge loss given how many partitions I have. Assuming the partition dimensions are (5000, 100) and there is no parallelization, the time lost is (3 - 0.1) * 5000 * 100 s ≈ 4 hours * 100 ≈ 400 hours (otherwise it would be 0.1 * 5000 * 100 s ≈ 14 hours). (P.S. I also tried getting the 5000-element string list dynamically, and the time per partition went up to ~15s.) The IO manager is the default pickle-based one, and the Dagster version is 1.3.4. It would be great to learn that I'm doing something wrong, or to see this fixed!
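The estimate above can be reproduced with a quick back-of-the-envelope script. It only restates the quoted numbers (3s vs 0.1s per partition, a (5000, 100) grid, serial execution); the constant names are illustrative.

```python
# Back-of-the-envelope estimate of total materialization time for a
# 2-D partition grid, using the per-partition timings quoted above.
SLOW_S = 3.0               # seconds per partition with the ~5000-key dimension
FAST_S = 0.1               # seconds per partition after shrinking it to 3 keys
N_PARTITIONS = 5000 * 100  # grid shape (5000, 100), assumed to run serially


def hours(seconds: float) -> float:
    """Convert seconds to hours."""
    return seconds / 3600


overhead_h = hours((SLOW_S - FAST_S) * N_PARTITIONS)  # extra time from the overhead
baseline_h = hours(FAST_S * N_PARTITIONS)             # time at 0.1s per partition

print(f"extra time: {overhead_h:.0f} h")  # ~403 h
print(f"baseline:   {baseline_h:.1f} h")  # ~13.9 h
```

The exact figure (~403 h) matches the rounded "~400 hours" in the post.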
Currently I'm nervous about this performance issue. The real task takes ~0.3s per partition (including saving) outside Dagster, so the overhead of the partition abstraction isn't affordable right now.
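To compare against the ~0.3s measured outside Dagster, one way to isolate the saving step is to time a plain pickle write of a similarly sized payload with the standard library. This is a minimal sketch: the payload (140 floats, roughly 1.3 KB pickled) is a hypothetical stand-in for one partition's output, and `time_pickle_save` is an illustrative helper, not part of Dagster.

```python
import os
import pickle
import tempfile
import time


def time_pickle_save(obj, path: str) -> float:
    """Return wall-clock seconds spent pickling `obj` to `path`."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return time.perf_counter() - start


# Hypothetical payload standing in for one partition's ~1.3 KB output.
payload = {"values": [float(i) for i in range(140)]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partition.pkl")
    elapsed = time_pickle_save(payload, path)
    size = os.path.getsize(path)

print(f"{size} bytes written in {elapsed * 1000:.2f} ms")
```

If a write like this takes milliseconds, most of the multi-second per-partition time is likely spent elsewhere in the run rather than in serialization.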
Updated to 1.3.5 and (as expected) the issue remains.
I created an issue on GitHub about this and hope to get a response here or there. Thanks! https://github.com/dagster-io/dagster/issues/14384
Hi @Xiaotian Yu! I commented in the GitHub issue, but the short answer is that this is something we're looking to fix in the next couple of weeks.