Hey there!
Trying to debug these errors:
"{"created":"@1692385399.808842105","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1692385399.808841205","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
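For what it's worth, the nested grpc_status of 14 is gRPC's UNAVAILABLE code, which together with "failed to connect to all addresses" suggests the client could not reach the server at all. Below is a rough sketch of how I pull the status out of the nested error. The helper name `collect_statuses` and the partial status table are my own for illustration, not anything from Dagster or the gRPC library:

```python
import json

# Partial table of standard gRPC status codes (numeric value -> name).
# Only a handful are listed here; this is not the full set.
GRPC_STATUS_NAMES = {
    0: "OK",
    1: "CANCELLED",
    4: "DEADLINE_EXCEEDED",
    8: "RESOURCE_EXHAUSTED",
    14: "UNAVAILABLE",
}


def collect_statuses(error):
    """Walk the error and its referenced_errors, returning
    (description, status name) for every entry that has a grpc_status."""
    found = []
    if "grpc_status" in error:
        code = error["grpc_status"]
        found.append((error.get("description", ""),
                      GRPC_STATUS_NAMES.get(code, str(code))))
    for child in error.get("referenced_errors", []):
        found.extend(collect_statuses(child))
    return found


# The error payload from above, without the outer quotes.
RAW_ERROR = (
    '{"created":"@1692385399.808842105",'
    '"description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc",'
    '"file_line":3260,'
    '"referenced_errors":[{"created":"@1692385399.808841205",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/lib/transport/error_utils.cc",'
    '"file_line":167,"grpc_status":14}]}'
)

statuses = collect_statuses(json.loads(RAW_ERROR))
print(statuses)
```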
They keep recurring, and while they do, the deployments are unusable from Dagit and GraphQL queries to Dagit fail.
I have checked the resource usage for the pods, and it stays low (under 50%) the entire time this is happening.
I have tried bumping up timeoutSeconds on the readinessProbe, but it hasn't seemed to change how often these happen.
readinessProbe:
  periodSeconds: 20
  timeoutSeconds: 15
  successThreshold: 1
  failureThreshold: 15
I am also considering switching from dagsterApiGrpcArgs to codeServerArgs, but I don't know if that would make a difference. Any help here would be appreciated.