Hey there, I was wondering what sort of scale dags...
# ask-community
p
Hey there, I was wondering what sort of scale dagster is able to support in terms of assets/partitions. Is running 100s of assets each with 1000s of partitions per day something that is reasonable? Would the UI be able to handle it?
o
Hi @Pablo Beltran -- generally we recommend a maximum of around 10-20k partitions for any given asset (and somewhere in the thousands of assets total, regardless of the number of partitions per asset). We've seen larger numbers of partitions work, but it definitely can slow down the UI. So in short, the main concern would be the "1000s of partitions per day" bit, as you'd pretty quickly hit some scaling limits. What are these partitions representing? Is it possible to model it in a way that reduces the scale?
p
Are there any plans to improve perf here? If not, this is likely something we would be willing to contribute to if we could get some help/guidance. Are the perf issues just around the UI, or would the other services (daemon, code deployment) also struggle?
o
What would the total number of partitions be, in your estimation? Most of the perf issues center around the UI (there are plenty of views in which we attempt to render something per partition key), but things like backfilling a significant number of partitions at once would also cause issues. What sort of partitions are these? Are they mostly time-based (e.g. one partition per minute)?
p
They are multi-partitioned, with static and daily partitions. These partitions are customer-based, so we would like to be able to track them all individually. We delete partitions after 2 weeks so they don't stick around forever.
o
and how large would the static dimension be? the 10-20k partition limit is per-asset, so if you have a 2-week time frame (14 time partitions), then I think you'd be ok with up to 1000 static partitions.
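The estimate above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch in plain Python, not Dagster code: the function names and the two threshold constants are hypothetical helpers, with the thresholds taken from the "10-20k partitions per asset" guidance quoted in this thread. It assumes a multi-partitioned asset has one partition per (static key, day) pair.

```python
# Rough per-asset guidance quoted in this thread: around 10-20k partitions.
# These constants are illustrative, not values read from Dagster itself.
RECOMMENDED_MAX_LOW = 10_000
RECOMMENDED_MAX_HIGH = 20_000


def total_partitions(static_keys: int, retained_days: int) -> int:
    """Count partitions for one asset: one per (static key, day) pair."""
    return static_keys * retained_days


def assess(total: int) -> str:
    """Compare a partition count against the quoted guidance."""
    if total <= RECOMMENDED_MAX_LOW:
        return "below the recommended range"
    if total <= RECOMMENDED_MAX_HIGH:
        return "within the recommended range"
    return "above the recommended range"


# 1000 customers, partitions deleted after 2 weeks (14 daily partitions).
total = total_partitions(static_keys=1000, retained_days=14)
print(total, assess(total))
```

With 1000 static keys and a 14-day retention window this gives 14,000 partitions per asset, which sits inside the quoted range. Growing either dimension (more customers or longer retention) pushes the total past 20k fairly quickly.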
p
Ok that's great :) Are there any best practices we should follow for running dagster at this scale?
Also, what about the timeline view? Would that be okay with this level of runs per day?
o
1000s of runs per day should be totally fine in the runs view! As for best practices, it really depends on your compute environment. We have some guides for deploying dagster, but beyond that it's mostly just a matter of making sure your instance database, run workers, and other containers have enough resources.