Drew Sonne

09/03/2021, 11:46 AM
I'm trying to solve a gRPC issue. I'm getting the following error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1630669409.194800714","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/","file_line":3008,"referenced_errors":[{"created":"@1630669409.194797596","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/","file_line":397,"grpc_status":14}]}" >
I'm trying to debug it myself using grpcurl:
grpcurl -plaintext <host>:4000 list
and I get the following response:
Failed to list services: server does not support the reflection API
To help me debug this, what symbols are exposed on the gRPC endpoint that I could use to test that it's functioning correctly?
This error occurs when I try to reload the workspace or launch dagit. I'm running dagster 0.12.9.
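Since grpcurl can't list services without server reflection, a lower-level first step is to confirm that the port accepts TCP connections at all, because "failed to connect to all addresses" generally means the client never established a connection. The snippet below is a minimal sketch using only the Python standard library. `port_is_open` is a hypothetical helper, and port 4000 comes from the grpcurl command above. It only tests TCP reachability and does not speak gRPC, so a successful connection doesn't prove the server is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Plain TCP reachability only: this does not speak gRPC, so True does not
    prove the gRPC server is actually serving requests.
    """
    try:
        # create_connection resolves host and tries each resolved address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS resolution errors and timeouts
        return False
```

For example, `port_is_open("<host>", 4000)` run from the same network as dagit should return `True` if the port is at least reachable.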

Nilesh Pandey

09/03/2021, 12:22 PM

Drew Sonne

09/03/2021, 12:44 PM
Heya, thanks! I've had a look through and increased the resources on my containers substantially, but it looks like I'm still getting the error.


09/03/2021, 3:07 PM
Hi Drew, we have
dagster api grpc-health-check
which we use for probes. An exit code of 0 means success (it doesn't print anything, though it should).
Are you deploying on Docker? K8s? You could look for error logs on the user deployment containers.
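Because the health check reports success only through its exit code, a small polling loop can show how long the server takes to become healthy. This is a minimal sketch, assuming the `dagster` CLI is on the PATH. The bare command relies on the health check's default connection settings, so you would add whatever host/port options your deployment needs. `wait_until_healthy` and the timeout/interval values are hypothetical.

```python
import subprocess
import time
from typing import Optional, Sequence

# Assumption: append host/port options for your deployment; defaults used here
HEALTH_CHECK_CMD = ["dagster", "api", "grpc-health-check"]


def wait_until_healthy(
    cmd: Sequence[str] = HEALTH_CHECK_CMD,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> Optional[float]:
    """Run cmd repeatedly until it exits with code 0.

    Returns the seconds elapsed until the first successful run, or None if
    timeout_s passes without a success.
    """
    start = time.monotonic()
    while True:
        # Exit code 0 is the only success signal; output is discarded
        result = subprocess.run(list(cmd), capture_output=True)
        if result.returncode == 0:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Running this alongside a fresh container start gives a rough time-to-healthy figure to compare against the probe's timeouts.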

Drew Sonne

09/04/2021, 7:02 AM
I'm on ECS, but nothing besides that stack trace turned up in the logs.
The health check has helped!
I think I've found that the gRPC server takes a long time (on the order of minutes) to reach a healthy state.
I'm trying to add some logging to see where it's lagging.
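One possible cause of a slow start is expensive work done while the repository code is imported, since the gRPC server loads that code when it starts. That cause is an assumption and isn't confirmed in this thread. The snippet below is a minimal sketch that logs how long a single import takes. `timed_import` is a hypothetical helper, and you would pass your own repository module's name. For a per-module breakdown, Python's built-in `python -X importtime -c "import <module>"` (Python 3.7+) is another option.

```python
import importlib
import logging
import time
from types import ModuleType

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("startup-timing")


def timed_import(module_name: str) -> ModuleType:
    """Import module_name and log how long the import took.

    A module already in sys.modules is returned from cache almost instantly,
    so run this in a fresh interpreter to get a meaningful number.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    logger.info("imported %s in %.2fs", module_name, elapsed)
    return module
```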