# ask-community
c
Hey All, we have been setting up our data infra with Dagster on K8s (EKS) and have been really impressed so far with the ease of use and its capabilities. We are running into a minor issue that I thought maybe someone here has seen before. I will include the error/log in a snippet below, but in short, Dagit is throwing EOF errors on some SQL calls. We are using a Postgres RDS instance on AWS as the database with a pretty standard EKS deployment. We do use Istio for network and service management on the EKS cluster.
Operation name: RootWorkspaceQuery

Message: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected

[SQL: SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = %(schema)s AND c.relkind in ('r', 'p')]
[parameters: {'schema': 'public'}]
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)

Path: ["workspaceOrError","locationEntries",0,"locationOrLoadError","repositories",0,"schedules"]

Locations: [{"line":38,"column":15}]
The DB is under very little load. We recently tried adding `keepalives: 1` and `keepalives_idle: 60` to the `postgresqlParams` values for the Helm deployment, but this did not mitigate the issue.
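For anyone following along, here is a rough sketch of how keepalive settings like these typically end up as query parameters on a libpq-style connection URL. The `build_postgres_url` helper, the host, and the credentials are hypothetical, and the `keepalives_interval` and `keepalives_count` values are illustrative only (just `keepalives` and `keepalives_idle` were mentioned above). How the Helm chart actually assembles the URL is an assumption here.

```python
from urllib.parse import quote, urlencode


def build_postgres_url(user, password, host, port, db, params):
    """Hypothetical helper: build a libpq-style URL with extra query params."""
    # Percent-encode credentials so special characters do not break the URL.
    base = f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{db}"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


# libpq TCP keepalive parameters. The first two come from the thread;
# the interval and count values are illustrative assumptions.
keepalive_params = {
    "keepalives": 1,            # enable client-side TCP keepalives
    "keepalives_idle": 60,      # seconds of inactivity before the first probe
    "keepalives_interval": 10,  # seconds between probes
    "keepalives_count": 5,      # lost probes before the connection is considered dead
}

url = build_postgres_url(
    "dagster", "secret", "example.rds.amazonaws.com", 5432, "dagster", keepalive_params
)
print(url)
```

Keepalives only help when something on the path drops idle TCP connections, so they may have no effect if the connection is being closed for another reason.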
d
Hi Caleb - are these errors transient or do they happen every time you load a particular page?
Looking around for this error, it seems there are some additional arguments to our SQLAlchemy engine that might help here (`pool_pre_ping`).
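For context, `pool_pre_ping` is a SQLAlchemy `create_engine` option that tests a pooled connection with a lightweight check on checkout and transparently replaces it if the check fails. The toy below is only a sketch of that idea, not SQLAlchemy's implementation: `PrePingPool` is a hypothetical class, and `sqlite3` stands in for Postgres so the example runs without a server.

```python
import sqlite3


class PrePingPool:
    """Toy single-connection pool that pings before handing out a connection."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a fresh connection
        self._conn = None
        self.reconnects = 0      # how many times a connection was opened

    def _is_alive(self, conn):
        # Lightweight liveness check, analogous to a "SELECT 1" pre-ping.
        try:
            conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def get(self):
        # Replace the pooled connection if it is missing or fails the ping.
        if self._conn is None or not self._is_alive(self._conn):
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))
first = pool.get()
first.close()        # simulate the connection being dropped while idle
second = pool.get()  # the ping fails, so a fresh connection is opened
result = second.execute("SELECT 42").fetchone()[0]
```

With SQLAlchemy itself, the equivalent is passing `pool_pre_ping=True` to `create_engine`. The trade-off is one extra round trip per checkout in exchange for not surfacing stale-connection errors to the caller.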
c
@daniel they are intermittent and don't seem to be page dependent.
When we do a full page refresh they go away for a while.
For what it's worth, I don't recall seeing these when we used the default container-based Postgres included with the Helm chart; we only noticed them once we switched to an external RDS instance.
d
I see a couple of other keepalive-related toggles described here that you could try: https://www.roelpeters.be/error-ssl-syscall-error-eof-detected/. But we can also look into adding that `pool_pre_ping` flag as an option.
@Dagster Bot issue Transient connection issues to an external RDS database
c
@daniel Thanks for the blog post, we actually set those in various combinations already and it didn't seem to help.
If `pool_pre_ping` can be exposed, that would definitely be worth trying, I think.
m
Hi, @Caleb Fornari! I'm actually facing the same issue in a fairly similar setup. Were you able to mitigate the issue?
c
@Mykola Palamarchuk No, we are still experiencing it. It doesn't block us since a reload works, but it's annoying.