Hey All we have been setting up our data infra with Dagster dagster #ask-community

Caleb Fornari

06/17/2022, 5:32 PM

Hey All, we have been setting up our data infra with Dagster on K8s (EKS) and have been really impressed so far with its ease of use and capabilities. We are running into a minor issue that I thought maybe someone here has seen before. I will include the error/log in a snippet below, but in short, Dagit is throwing EOF errors on some SQL calls. We are using a Postgres RDS instance on AWS as the database with a pretty standard EKS deployment. We do use Istio for network and service management on the EKS cluster.

Caleb Fornari

06/17/2022, 5:32 PM

Operation name: RootWorkspaceQuery

Message: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected

[SQL: SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = %(schema)s AND c.relkind in ('r', 'p')]
[parameters: {'schema': 'public'}]
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)

Path: ["workspaceOrError","locationEntries",0,"locationOrLoadError","repositories",0,"schedules"]

Locations: [{"line":38,"column":15}]

Caleb Fornari

06/17/2022, 5:37 PM

The DB is under very little load. We recently tried adding keepalives: 1 and keepalives_idle: 60 to the postgresqlParams values for the Helm deployment, but this did not mitigate the issue.
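
[Editor's note] The keepalive settings mentioned above are libpq connection parameters. Here is a minimal sketch of how such parameters could end up on a Postgres connection URL, assuming the Helm postgresqlParams values are passed through as URL query parameters. The helper name build_pg_url, the host, and the credentials are hypothetical, and keepalives_interval and keepalives_count are additional libpq options that the thread does not mention.

```python
from urllib.parse import quote, urlencode


def build_pg_url(user, password, host, db, port=5432, params=None):
    """Build a postgresql:// URL, appending libpq options as query parameters.

    Hypothetical helper for illustration; not part of Dagster or psycopg2.
    """
    query = f"?{urlencode(params)}" if params else ""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}{query}"
    )


# The first two values are the ones tried in the thread; the last two are
# extra libpq keepalive options added here as an assumption.
keepalive_params = {
    "keepalives": 1,            # enable TCP keepalives on the client socket
    "keepalives_idle": 60,      # seconds of idleness before the first probe
    "keepalives_interval": 10,  # seconds between unanswered probes
    "keepalives_count": 5,      # unanswered probes before the link is dropped
}

url = build_pg_url(
    "dagster", "secret", "example-rds-host", "dagster", params=keepalive_params
)
print(url)
```

Printing the URL this way can help confirm that the parameters are spelled exactly as libpq expects (all lowercase) before they go into the Helm values.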

daniel

06/17/2022, 5:43 PM

Hi Caleb - are these errors transient or do they happen every time you load a particular page?

daniel

06/17/2022, 5:46 PM

Looking around for this error it seems that there are some additional arguments to our engine that might be able to help here (pool_pre_ping)

Caleb Fornari

06/17/2022, 5:46 PM

@daniel they are intermittent and don't seem to be page dependent.

Caleb Fornari

06/17/2022, 5:46 PM

When we do a full page refresh they go away for a while.

Caleb Fornari

06/17/2022, 5:48 PM

For what it's worth I don't recall seeing these when we used the default container based postgres included with Helm, we noticed them once we switched to an external RDS instance instead.

daniel

06/17/2022, 6:01 PM

I see a couple of other keepalive-related toggles described here that you could try: https://www.roelpeters.be/error-ssl-syscall-error-eof-detected/ But we can also look into adding that pool_pre_ping flag as an option
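
[Editor's note] pool_pre_ping is a SQLAlchemy engine option that tests a pooled connection when it is checked out and replaces it if it turns out to be dead. The following is a toy, standard-library-only sketch of that idea, not SQLAlchemy's or Dagster's implementation. The class PrePingPool is hypothetical, sqlite3 stands in for Postgres, and closing the connection by hand stands in for an idle connection being dropped somewhere between Dagit and RDS.

```python
import sqlite3


class PrePingPool:
    """Toy single-connection pool that validates the connection on checkout."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a fresh connection
        self._conn = None
        self.reconnects = 0      # how many dead connections were replaced

    def checkout(self):
        if self._conn is not None:
            try:
                # The "ping": a cheap statement that fails on a dead connection.
                self._conn.execute("SELECT 1")
            except sqlite3.Error:
                self._conn = None
                self.reconnects += 1
        if self._conn is None:
            self._conn = self._connect()
        return self._conn


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))

first = pool.checkout()
first.close()  # simulate the connection being dropped while idle

second = pool.checkout()  # the ping fails, so the pool reconnects
value = second.execute("SELECT 42").fetchone()[0]
print(value, pool.reconnects)
```

In SQLAlchemy itself the equivalent behaviour is requested with create_engine(url, pool_pre_ping=True). Whether and how Dagster exposes that flag is the subject of the issue opened later in this thread.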

daniel

06/17/2022, 6:01 PM

@Dagster Bot issue Transient connection issues to an external RDS database

Dagster Bot

06/17/2022, 6:01 PM

Created issue at: https://github.com/dagster-io/dagster/issues/8462

Caleb Fornari

06/17/2022, 6:25 PM

@daniel Thanks for the blog post, we actually set those in various combinations already and it didn't seem to help.

Caleb Fornari

06/17/2022, 6:26 PM

If pool_pre_ping can be exposed, that would definitely be worth trying, I think.

Mykola Palamarchuk

09/19/2022, 7:22 AM

Hi, @Caleb Fornari! I'm actually facing the same issue in a fairly similar setup. Were you able to mitigate the issue?

Caleb Fornari

09/19/2022, 4:49 PM

@Mykola Palamarchuk No, we are still experiencing it. It doesn't block us, since a reload works, but it's just annoying.
