# ask-community
I’ve been using OpenTelemetry traces to analyze the performance of some of our dagster jobs. Something that has surprised me is the number of SQL database connections that get opened and closed. A single op is opening and closing Postgres DB connections 11 times! It appears this has to do with how dagster’s run storage and event log storage work — they have context managers like
with self.connect(): …
which are used for almost all of their methods. Is there a way to trim this down, like pool the connections somehow? Those DB connection calls account for a surprisingly high fraction of our runtime and they seem unnecessary.
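For anyone hitting the same thing, here’s a toy sketch (all class and method names here are invented for illustration, not dagster’s actual API) of why the per-method `with self.connect():` pattern multiplies connection handshakes, and how caching or pooling the connection collapses them:

```python
import contextlib


class NaiveStorage:
    """Mimics the pattern in the storage classes: every method
    opens and closes its own connection."""

    def __init__(self):
        self.connect_count = 0

    @contextlib.contextmanager
    def connect(self):
        self.connect_count += 1  # stands in for a real TCP + auth handshake
        conn = object()          # placeholder for a real DB connection
        try:
            yield conn
        finally:
            pass                 # the real version closes the connection here

    def get_run(self):
        with self.connect():
            pass

    def get_event_records(self):
        with self.connect():
            pass


class PooledStorage(NaiveStorage):
    """Same interface, but connect() hands out one cached connection,
    the way a pool (or pgbouncer) effectively would."""

    def __init__(self):
        super().__init__()
        self._conn = None

    @contextlib.contextmanager
    def connect(self):
        if self._conn is None:
            self.connect_count += 1  # only one handshake, ever
            self._conn = object()
        yield self._conn


naive, pooled = NaiveStorage(), PooledStorage()
for storage in (naive, pooled):
    for _ in range(5):  # five op-level interactions with storage
        storage.get_run()
        storage.get_event_records()

print(naive.connect_count)   # 10 handshakes
print(pooled.connect_count)  # 1 handshake
```

Even in this toy, ten handshakes become one; against a real Postgres server each avoided handshake also saves a round trip plus authentication.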
Here’s a Jaeger screenshot of some of the `connect` spam
It looks like this is quite intentional, but I’m not sure it’s the tradeoff I’d make for my application: https://github.com/dagster-io/dagster/blob/bc0a735c0a78d879fe11a6b1ff3abd8f477efad[…]raries/dagster-postgres/dagster_postgres/event_log/event_log.py
@Spencer Nelson appreciate this is a month back, but is this dagit issue related? https://github.com/dagster-io/dagster/issues/8466 I would love to resolve this issue, there is a fairly large chunk of Azure Log Analytics ingestion cost incurred from these queries too.
Yes, that looks very related. Dagster basically requires a connection pooling frontend like pgbouncer.
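For reference, a minimal pgbouncer config sketch — the database name, hosts, ports, and pool sizes below are placeholders to adapt to your deployment, not values dagster prescribes:

```ini
; pgbouncer.ini (illustrative values)
[databases]
; logical name clients connect to -> the real Postgres server behind it
dagster = host=127.0.0.1 port=5432 dbname=dagster

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: each short-lived connect/query/close from dagster
; borrows a server connection only for the duration of its transaction
pool_mode = transaction
default_pool_size = 20
```

You then point dagster’s Postgres storage config at pgbouncer’s listen port (6432 here) instead of Postgres directly, so the frequent connect/close cycles hit pgbouncer’s cheap client connections rather than opening real server connections each time.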