Alexander Fife
09/08/2022, 1:14 PM
ls -alht /proc/8739/fd
total 0
lrwx------ 1 root root 64 Sep 8 13:12 13 -> 'socket:[23909]'
lrwx------ 1 root root 64 Sep 8 13:12 14 -> 'socket:[24052]'
lrwx------ 1 root root 64 Sep 8 13:12 15 -> 'socket:[23399]'
lrwx------ 1 root root 64 Sep 8 13:09 12 -> 'socket:[23156]'
lrwx------ 1 root root 64 Sep 8 13:08 11 -> 'socket:[23576]'
lrwx------ 1 root root 64 Sep 8 13:06 10 -> 'socket:[22520]'
lrwx------ 1 root root 64 Sep 8 13:03 9 -> 'socket:[21384]'
lrwx------ 1 root root 64 Sep 8 13:01 7 -> 'socket:[21333]'
lrwx------ 1 root root 64 Sep 8 13:01 8 -> 'socket:[23994]'
lrwx------ 1 root root 64 Sep 8 13:00 6 -> 'socket:[21644]'
dr-x------ 2 root root 0 Sep 8 13:00 .
lrwx------ 1 root root 64 Sep 8 13:00 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 8 13:00 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 8 13:00 2 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 8 13:00 3 -> 'socket:[23896]'
lrwx------ 1 root root 64 Sep 8 13:00 4 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Sep 8 13:00 5 -> 'anon_inode:[eventfd]'
dr-xr-xr-x 9 root root 0 Sep 8 12:59 ..
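A listing like the one above can be summarized by descriptor kind, which makes it easier to see whether only the socket count grows between runs. This is a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target process's fd directory; the helper names classify_fd_target and count_fd_kinds are hypothetical, not part of any tool mentioned in this thread.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse kind."""
    if target.startswith("socket:["):
        return "socket"
    if target.startswith("anon_inode:"):
        # 'anon_inode:[eventpoll]' becomes 'anon_inode:eventpoll'
        return "anon_inode:" + target[len("anon_inode:"):].strip("[]")
    if target.startswith("pipe:["):
        return "pipe"
    # Anything else is treated as a path, e.g. /dev/pts/0 or /dev/null
    return "file"


def count_fd_kinds(pid: int) -> Counter:
    """Count the open descriptors of `pid`, grouped by kind."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink()
            continue
        kinds[classify_fd_target(target)] += 1
    return kinds
```

Calling count_fd_kinds(8739) every minute or so and printing the result would show whether the "socket" entry climbs steadily while the other kinds stay flat.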
The longer I let it run, the more sockets get created. I don't know what the next step of debugging is. I've already rebuilt the VPS once, and the same error occurred again.
daniel
09/08/2022, 1:42 PM
Alexander Fife
09/08/2022, 1:44 PM
REDACTED@instance-1:~$ sudo ls -alht /proc/2969627/fd
total 0
dr-x------ 2 REDACTED REDACTED 0 Sep 8 13:21 .
lr-x------ 1 REDACTED REDACTED 64 Sep 8 13:21 0 -> /dev/null
lrwx------ 1 REDACTED REDACTED 64 Sep 8 13:21 1 -> 'socket:[8022942]'
lrwx------ 1 REDACTED REDACTED 64 Sep 8 13:21 2 -> 'socket:[8022942]'
lrwx------ 1 REDACTED REDACTED 64 Sep 8 13:21 3 -> 'socket:[8060216]'
lrwx------ 1 REDACTED REDACTED 64 Sep 8 13:21 4 -> 'socket:[8061390]'
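Whether a descriptor count like the ones listed above is a problem depends on the process's open-file limit. The sketch below compares the two, assuming Linux, where /proc/<pid>/limits contains a "Max open files" row with soft and hard values; the function names parse_max_open_files and fd_headroom are hypothetical.

```python
import os


def parse_max_open_files(limits_text: str):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as 'unlimited'.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def fd_headroom(pid: int):
    """Return (open descriptor count, soft limit) for `pid`."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    used = len(os.listdir(f"/proc/{pid}/fd"))
    return used, soft
```

If the first number returned by fd_headroom keeps approaching the second, the process is leaking descriptors rather than simply running under a limit that is too low.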
I don't know what's up, but I notice that 'anon_inode:[eventpoll]' and 'anon_inode:[eventfd]' aren't there.
Bianca Rosa
09/08/2022, 2:06 PM
E0907 21:26:00.771067497 20 tcp_server_posix.cc:216] Failed accept4: Too many open files
on the workflows_user_code service. We also have a bunch of sensors, and also for tasks failures. The container itself seems to get stuck after 2h of a deployment, for some random reason.
daniel
09/08/2022, 2:32 PM
Alexander Fife
09/08/2022, 2:33 PM
daniel
09/08/2022, 2:34 PM
Alexander Fife
09/08/2022, 2:36 PM
daniel
09/08/2022, 2:37 PM
Alexander Fife
09/08/2022, 2:38 PM
daniel
09/08/2022, 2:52 PM
Alexander Fife
09/08/2022, 3:00 PM
daniel
09/08/2022, 3:01 PM
Alexander Fife
09/08/2022, 3:06 PM
daniel
09/08/2022, 3:06 PM
Alexander Fife
09/08/2022, 3:10 PM
daniel
09/08/2022, 3:11 PM
Alexander Fife
09/08/2022, 3:12 PM
Bianca Rosa
09/08/2022, 3:13 PM
grpcio = ">=1.43.0", so we could be grabbing 1.47.1 - this started happening last week or so.
daniel
09/08/2022, 3:14 PM
Bianca Rosa
09/08/2022, 3:14 PM