# ask-community
a
Did Dagster’s asset loading behavior change between 1.3.14 and 1.4.2? Trying to load some of our code locations after migrating to 1.4.2 and it’s extremely slow; there’s something like a 25-second lag between loading some jobs. Can provide more context if needed
t
Hey Alex! Some more context would be helpful. Are there any specific integrations that you're using that generate assets, notably the dbt one?
a
Hey! No, for these assets we’re just using Dagster core and no integrations. We are using `dagster-dbt` elsewhere (in a different code location).
• The only changes were going from 1.3.14 -> 1.4.2 and replacing our usage of `AssetDefinition.asset_keys` with `AssetDefinition.keys`.
• We do however rely on a network call to generate the list of partitions used in a `StaticPartitionsDefinition`.
Think I may have figured it out: some of our assets have a ton of static partitions (planning to migrate to dynamic partitions with 1.4). Limiting the # of static partitions seems to let the code location come up as fast as usual.
👌 1
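For context, the setup described above looks roughly like this; `fetch_block_numbers` is a hypothetical stand-in for the network call, and the key count is illustrative:

```python
from dagster import StaticPartitionsDefinition, asset

def fetch_block_numbers() -> list[str]:
    # hypothetical stand-in for the network call that returns the partition keys;
    # in the scenario above it returns >10k block numbers
    return [str(i) for i in range(10_000)]

# every key is enumerated (and, as of 1.4, validated for duplicates) when the code location loads
blocks_partitions = StaticPartitionsDefinition(fetch_block_numbers())

@asset(partitions_def=blocks_partitions)
def block_data(context):
    context.log.info(f"processing block {context.partition_key}")
```

Capping the length of that key list is the workaround described above that made the code location load quickly again.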
t
Out of curiosity, how many static partitions are we talking about? 👀
a
• Yeah, I saw y’all added this dupe check since 1.3.14: “Duplicate partition keys passed to `StaticPartitionsDefinition` will now raise an error.”
• And it’s a ton; some of them have >10k partitions.
🧠 3
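For reference, the quoted dupe check means roughly this now fails at definition time (a minimal sketch; the changelog line above only says an error is raised, not which type):

```python
from dagster import StaticPartitionsDefinition

# As of the change quoted above, duplicate keys are rejected when the definition is built;
# previously they were accepted silently.
StaticPartitionsDefinition(["18000001", "18000002", "18000002"])  # raises on the duplicate key
```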
The reason for wanting to move our assets to dynamic partitions is to improve the performance of the UI + make backfills easier. Anyway, I figure the root cause is that Dagster now does a DISTINCT over the partition set, which in our case is quite large.
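A rough sketch of the dynamic-partitions migration being considered; the partition set name and the place where keys get registered are illustrative:

```python
from dagster import DynamicPartitionsDefinition, asset

# Keys live in the instance's storage rather than being enumerated at load time,
# so the code location no longer has to build (and validate) a 10k+ element list.
blocks_partitions = DynamicPartitionsDefinition(name="blocks")

@asset(partitions_def=blocks_partitions)
def block_data(context):
    context.log.info(f"processing block {context.partition_key}")

# New block numbers would be registered as they appear, e.g. from a sensor or a script:
#   instance.add_dynamic_partitions("blocks", ["18000001", "18000002"])
```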
t
yeah, def supportive of you overriding it.
🎅 1
a
Are dynamic partitions also designed to handle this many partitions?
t
It's definitely pushing the limits, but it'll be better than that distinct call that static partitions use. I'll take any feedback about performance with 5-figure partition counts; more users are hitting those numbers and we'll need to optimize. What are the 10k partitions made up of? You'd get better performance by breaking them up into multiple assets, each covering a slice of the partitions, or by using multi-partitions.
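To illustrate the multi-partitions suggestion: two smaller dimensions can replace one huge static key set, so no single `StaticPartitionsDefinition` carries all 10k+ keys. The bucket sizes below are arbitrary:

```python
from dagster import MultiPartitionsDefinition, StaticPartitionsDefinition, asset

# One coarse "range" dimension crossed with an offset within the range; each dimension
# stays small even though the cross product covers the same block space.
blocks_by_range = MultiPartitionsDefinition(
    {
        "range": StaticPartitionsDefinition([str(i * 100) for i in range(100)]),
        "offset": StaticPartitionsDefinition([str(i) for i in range(100)]),
    }
)

@asset(partitions_def=blocks_by_range)
def block_slice(context):
    # for a multi-partitioned run, the partition key carries one key per dimension
    context.log.info(f"processing {context.partition_key.keys_by_dimension}")
```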
a
For us it makes sense ergonomically to keep things as part of the same asset rather than split into multiple. In our case, the partitions are sequential integers representing block numbers (think blockchain)! It makes sense for the primary key to be the block number rather than the date it was mined, so that’s why we aren’t using `DailyPartitionsDefinition`. Also down to look into multi-partitions as well if it’ll improve performance.
o
hi @Alex Kan, just checking in on this -- I'd personally be at least slightly surprised if that deduplication check was causing noticeable perf impact with a list containing ~10k strings (although I've been surprised before!). Is it possible for you to profile the code location loading (`sudo py-spy record -- python my_repository_file.py` should do the trick) to determine what's taking so long?
👀 1
a
Hey @owen, just got around to this and got the profiler to work. Lemme DM you the output.