#ask-community

Alex Kan

08/01/2023, 4:39 PM
Did Dagster’s asset loading behavior change between 1.3.14 and 1.4.2? Trying to load some of our code locations after migrating to 1.4.2 and it’s extremely slow, there’s like a 25 sec lag between loading some jobs. Can provide more context if needed

Tim Castillo

08/01/2023, 6:11 PM
Hey Alex! Some more context would be helpful. Are there any specific integrations that you're using that generate assets, notably the dbt one?

Alex Kan

08/01/2023, 6:17 PM
Hey! No, for these assets we’re just using Dagster core and no integrations. We are using `dagster-dbt` elsewhere (in a different code location).
• The only changes were going from 1.3.14 -> 1.4.2 and replacing our usage of `AssetsDefinition.asset_keys` with `AssetsDefinition.keys`
• We do, however, rely on a network call to generate the list of partition keys that we pass to `StaticPartitionsDefinition`
Think I may have figured it out: some of our assets have a ton of static partitions (planning to migrate to dynamic partitions with 1.4). Limiting the # of static partitions seems to let the code location come up as fast as usual.
👌 1

Tim Castillo

08/01/2023, 6:48 PM
Out of curiosity, how many static partitions are we talking about? 👀

Alex Kan

08/01/2023, 6:50 PM
Yeah, I saw y’all added this dupe check since 1.3.14: "Duplicate partition keys passed to StaticPartitionsDefinition will now raise an error." And it’s a ton, some of them have >10k.
🧠 3
The reason for wanting to move our assets to Dynamic Partitions is to improve the performance of the UI + make backfills easier. Anyway I figure the root cause is that Dagster now does a DISTINCT over the partition set which in our case is quite large
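For a sense of scale, a duplicate-key check over a key list of this size can be sketched with the standard library alone. This is only an illustration of the idea, not Dagster's actual implementation; `find_duplicate_keys` and the sample keys are hypothetical.

```python
import time
from collections import Counter


def find_duplicate_keys(partition_keys):
    """Return the keys that occur more than once, in first-seen order."""
    counts = Counter(partition_keys)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical data: 10,000 sequential block numbers as string keys.
keys = [str(block_number) for block_number in range(10_000)]

start = time.perf_counter()
duplicates = find_duplicate_keys(keys)
elapsed = time.perf_counter() - start

print(f"{len(keys)} keys, {len(duplicates)} duplicates, {elapsed:.6f}s")
```

A single pass like this is linear in the number of keys, so timing it against your real key list is a quick way to see whether a check of this shape could account for the lag.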

Tim Castillo

08/01/2023, 6:56 PM
yeah, def supportive of you overriding it.
🎅 1

Alex Kan

08/01/2023, 7:35 PM
Are dynamic partitions also designed to handle this many partitions?

Tim Castillo

08/01/2023, 8:02 PM
It's definitely pushing the limits, but it'll be better than that distinct call that static partitions use. I'll take any feedback about performance on 5-figure partition counts. More users are hitting those numbers and we'll need to optimize. What are the 10k partitions made up of? You'd get better performance by breaking them up into multiple assets, each with a slice of the partitions, or by using multi-partitions.

Alex Kan

08/01/2023, 8:06 PM
For us it makes sense ergonomically to keep things as part of the same asset rather than split into multiple. In our case, the partitions are sequential integers representing block numbers (think blockchain)! It makes sense for the primary key to be the number of the block rather than the date it was mined, so that’s why we aren’t using `DailyPartitionsDefinition`. Also down to look into multi-partitions if it’ll improve performance.
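For reference, the "slices of partitions" idea mentioned earlier in the thread could be sketched as below for sequential block numbers. This is a rough sketch under assumptions: `slice_partition_keys`, the slice size, and the sample keys are hypothetical, and each resulting slice would presumably back its own partitions definition on a separate asset.

```python
def slice_partition_keys(partition_keys, slice_size):
    """Split partition keys into consecutive slices of at most slice_size keys."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        partition_keys[start:start + slice_size]
        for start in range(0, len(partition_keys), slice_size)
    ]


# Hypothetical data: 10,000 sequential block numbers as string keys.
block_keys = [str(block_number) for block_number in range(10_000)]
slices = slice_partition_keys(block_keys, slice_size=2_500)

for index, slice_keys in enumerate(slices):
    print(f"slice {index}: {slice_keys[0]}..{slice_keys[-1]} ({len(slice_keys)} keys)")
```

The trade-off is the one raised above: fewer keys per asset, at the cost of no longer having a single asset keyed by block number.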

owen

08/04/2023, 11:07 PM
hi @Alex Kan, just checking in on this -- I'd personally be at least slightly surprised if that deduplication check was causing noticeable perf impact with a list containing ~10k strings (although I've been surprised before!). Is it possible for you to profile the code location loading (`sudo py-spy record -- python my_repository_file.py` should do the trick) to determine what's taking so long?
👀 1
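If attaching a sampling profiler isn't convenient, the standard library's `cProfile` can give a similar picture in-process. The sketch below is a minimal example under assumptions: `load_definitions` is a hypothetical stand-in for whatever code builds your partitions and assets, not a Dagster function.

```python
import cProfile
import io
import pstats


def load_definitions():
    """Hypothetical stand-in for the code that builds partitions and assets."""
    keys = [str(block_number) for block_number in range(10_000)]
    return len(set(keys))


profiler = cProfile.Profile()
profiler.enable()
result = load_definitions()
profiler.disable()

# Collect the ten entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the call path that dominates loading, which is the question being asked here.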

Alex Kan

08/07/2023, 11:52 PM
Hey, @owen just got around to this and got the profiler to work. Lemme dm you the output