# announcements
a
Hi guys, I'm having trouble loading the partition graphs on a schedule page that has a lot of executions. It makes Dagit unresponsive until I get a timeout.
It's a microbatch job that executes every 5 minutes, so that might be pushing some limits.
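(For context, a minimal sketch of what a 5-minute microbatch schedule like this might look like; it assumes the legacy solid/pipeline API that this thread appears to use, and all names and run config below are placeholders rather than Auster's actual setup.)

```python
# Minimal sketch of a 5-minute "microbatch" schedule (placeholder names;
# legacy solid/pipeline API, pre-Dagster-1.0 style as suggested by the thread).
from dagster import pipeline, repository, schedule, solid


@solid
def process_batch(context):
    # Placeholder for the actual microbatch work.
    context.log.info("processing one 5-minute batch")


@pipeline
def microbatch_pipeline():
    process_batch()


@schedule(cron_schedule="*/5 * * * *", pipeline_name="microbatch_pipeline")
def microbatch_schedule(_context):
    # Run config for each tick; empty here for illustration.
    return {}


@repository
def my_repository():
    return [microbatch_pipeline, microbatch_schedule]
```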
s
Hey @Auster Cid - what DB are you using?
a
Postgres on RDS
s
We can take a look at that page’s performance, but it’s also worth trying to upgrade your RDS instance type and seeing if that helps. We’ve seen that improve page load times significantly in the past.
a
Gotcha, I'll give it a try.
Unfortunately it still doesn't load the graphs, @sashank. Granted, I only went from a t3.small to a t3.medium, which is not that big of an upgrade, but getting those graphs back doesn't really justify increasing my hosting costs.
s
cc @dish @Ben Gotow
a
Anyway, this is not a critical feature for me; it just kinda sucks having a page in Dagit that can bottleneck my instance, but I can live with it for now.
But let me know if I can provide any info that can help you.
s
Thanks for trying to up the instance type. We’ll take a closer look at that page and make sure we’re querying + loading data efficiently. Tracking issue here: https://github.com/dagster-io/dagster/issues/3041
b
hey @Auster Cid! ahh sorry to hear the performance is that bad… I was actually just talking with the team about how to debug this and I think there’s a way we can dump the info Dagit is trying to load on your machine and use it to reproduce+fix. Will pull together a few steps after lunch, stay tuned! 🙏
a
Hi @Ben Gotow! Sure thing, happy to help
b
Hi @Auster Cid, thank you! I put together a small bash script that queries the Dagster GraphQL API for the data required by that partition view, which should let us narrow down whether the query time (in Python) or the render time (in the browser) is the problem; if it's the latter, I can load the data into my Dagster instance to profile the UI. Could you open the script and change the variables at the top (the pipeline name, partition set name, etc.)? When you run it, it'll output timings and also write the retrieved data to a folder prefixed `dump-*` in your working directory. If you could send those to me in a DM I'd really appreciate it!
(Should add that the data will contain info that is usually visible from that partition tab, including solid names and materialization / expectation names, but shouldn’t contain any other PII or run configuration)
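(For reference, here is a rough Python sketch of the kind of timing/dump step described above, not the actual script shared in the thread. It assumes a locally running Dagit GraphQL endpoint at http://localhost:3000/graphql, and the query fields, names, and output filename are illustrative placeholders that may differ from the real script.)

```python
# Sketch only: time one GraphQL request for partition-set data against a local
# Dagit instance and dump the raw response to disk. Endpoint URL, repository /
# partition-set names, and the query fields are placeholders.
import json
import time
import urllib.request

DAGIT_GRAPHQL_URL = "http://localhost:3000/graphql"   # assumed local Dagit
PARTITION_SET_NAME = "my_partition_set"               # placeholder
REPOSITORY_LOCATION = "my_location"                    # placeholder
REPOSITORY_NAME = "my_repository"                      # placeholder

# Illustrative query only; the real partition view issues its own queries.
QUERY = """
query PartitionsTiming($repositorySelector: RepositorySelector!, $partitionSetName: String!) {
  partitionSetOrError(repositorySelector: $repositorySelector, partitionSetName: $partitionSetName) {
    ... on PartitionSet {
      name
      partitionsOrError {
        ... on Partitions {
          results { name }
        }
      }
    }
  }
}
"""

variables = {
    "repositorySelector": {
        "repositoryLocationName": REPOSITORY_LOCATION,
        "repositoryName": REPOSITORY_NAME,
    },
    "partitionSetName": PARTITION_SET_NAME,
}

payload = json.dumps({"query": QUERY, "variables": variables}).encode("utf-8")
request = urllib.request.Request(
    DAGIT_GRAPHQL_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Time the round trip so slow queries can be separated from slow rendering.
start = time.monotonic()
with urllib.request.urlopen(request) as response:
    body = response.read()
elapsed = time.monotonic() - start

print(f"GraphQL round trip: {elapsed:.2f}s, response size: {len(body)} bytes")

# Write the raw response so it can be shared and loaded elsewhere for profiling.
with open("dump-partitions.json", "wb") as f:
    f.write(body)
```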