# ask-community
I’ve configured Dagster to use Postgres as its storage, and performance compared to my previous setup with SQLite isn’t particularly great. I’ve investigated a bit and one thing I’m noticing is a high rate of connection churn: connections are constantly being opened and closed. More details in 🧵
Here are Prometheus graphs for 5 minutes of usage in Dagit: simply clicking around, looking at runs, asset partitions, etc. The Dagster daemon is not running, so this is only reading data out of Postgres. The first graph is a 15s sampling of the number of open connections, which shows a peak at 6 but is usually around 3, which seems reasonable. The second graph shows the cumulative number of connections made. You can see that this keeps increasing, and after only 5 minutes we’ve already opened (and closed) 90 connections to Postgres. This seems excessive to me. Is this expected? Normally, database connections are pooled and reused since they are typically a bit expensive to obtain (TCP / TLS, authentication, Postgres “startup sequence”, etc.). Are connections not reused in Dagster?
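To illustrate what I mean by pooling versus churn, here is a minimal, stdlib-only sketch. It is not Dagster's code: `CountingConnector`, `SimplePool`, `run_without_pool`, and `run_with_pool` are hypothetical names made up for this example, and `sqlite3` stands in for Postgres only so the snippet runs anywhere. It counts how many connections get opened for the same number of sequential queries, with and without reuse.

```python
import queue
import sqlite3
from contextlib import contextmanager


class CountingConnector:
    """Hypothetical helper: opens connections and counts how many were created."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        # sqlite3 is only a stand-in for a real Postgres connection here.
        return sqlite3.connect(":memory:")


class SimplePool:
    """Hypothetical, minimal pool: keeps up to `size` idle connections for reuse."""

    def __init__(self, connector, size=3):
        self._connector = connector
        self._idle = queue.LifoQueue(maxsize=size)

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection if any
        except queue.Empty:
            conn = self._connector.connect()  # otherwise open a new one
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)  # hand it back for the next caller
            except queue.Full:
                conn.close()  # pool is full, so discard the extra connection


def run_without_pool(connector, n_queries):
    """Open and close one connection per query (the churn pattern)."""
    for _ in range(n_queries):
        conn = connector.connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    return connector.opened


def run_with_pool(connector, n_queries):
    """Run the same queries, borrowing connections from the pool."""
    pool = SimplePool(connector)
    for _ in range(n_queries):
        with pool.connection() as conn:
            conn.execute("SELECT 1").fetchone()
    return connector.opened


if __name__ == "__main__":
    print("connections opened without pool:", run_without_pool(CountingConnector(), 90))
    print("connections opened with pool:", run_with_pool(CountingConnector(), 90))
```

With sequential queries, the unpooled version opens one connection per query while the pooled version keeps reusing the same one. A real deployment would more likely lean on the pooling built into its database library or an external pooler rather than a hand-rolled class like this.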
Here are the same graphs, over 15 minutes of running ~60 runs and some clicking around in Dagit. You can see the total number of connections made is >7K while we never have more than 10 concurrent connections opened at any time. That’s a rate of about 7 connections per second. Is this a known problem?
Thank you for the high-quality analysis. Could you file an issue for this? I think https://github.com/dagster-io/dagster/issues/8466 is related, so it's up to you whether to append there or start a new top-level one. I think we should definitely expose configurability around pooling. Changing the default behavior will require more thought and care.
Thanks, I’ll re-use that issue for further investigations. FYI: I’ve opened this PR which reduces the problem a bit.