Hey there!

Trying to debug these errors:
"{"created":"@1692385399.808842105","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1692385399.808841205","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"

They keep recurring, and while they occur the deployments are unusable from dagit and GraphQL queries to dagit fail.
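
For reference, here is a small sketch I used to decode the error payload above. As far as I can tell, `grpc_status` 14 is the standard gRPC code UNAVAILABLE, meaning the client could not reach the server at all. The helper name `collect_grpc_statuses` and the lookup table are my own, not part of Dagster or the grpc library, and the table lists only a few of the standard codes.

```python
import json

# Subset of the standard gRPC status codes (numeric value -> name).
GRPC_STATUS_NAMES = {
    0: "OK",
    4: "DEADLINE_EXCEEDED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}


def collect_grpc_statuses(error):
    """Recursively gather every grpc_status in a gRPC core error dict (hypothetical helper)."""
    statuses = []
    if "grpc_status" in error:
        statuses.append(error["grpc_status"])
    for child in error.get("referenced_errors", []):
        statuses.extend(collect_grpc_statuses(child))
    return statuses


# The error payload from the log, without the outer quotes.
raw_error = (
    '{"created":"@1692385399.808842105","description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,'
    '"referenced_errors":[{"created":"@1692385399.808841205",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}'
)

status_names = [
    GRPC_STATUS_NAMES.get(code, f"UNKNOWN({code})")
    for code in collect_grpc_statuses(json.loads(raw_error))
]
print(status_names)
```

So the failure looks like a connectivity problem between dagit and the code server rather than an error raised inside the user code itself.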


I have checked resource usage for the pods, and it stays low (< 50%) the entire time this is happening.

I have tried bumping up the timeout seconds on the readinessProbe, but it hasn't seemed to change how often these happen.
readinessProbe:
      periodSeconds: 20
      timeoutSeconds: 15
      successThreshold: 1
      failureThreshold: 15
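
To put those probe settings in perspective, this is a rough back-of-the-envelope estimate of how long the pod can keep failing before Kubernetes marks it unready. It assumes every probe attempt fails by timing out and ignores any initial delay or scheduling jitter, so treat the number as approximate.

```python
# Values copied from the readinessProbe config above.
period_seconds = 20
timeout_seconds = 15
failure_threshold = 15

# Assumption: probes start every period_seconds, and each failing probe
# only reports failure once its timeout expires. The pod is marked unready
# when the final (failure_threshold-th) consecutive probe times out.
seconds_until_unready = (failure_threshold - 1) * period_seconds + timeout_seconds

print(f"~{seconds_until_unready} s (~{seconds_until_unready / 60:.1f} min) before the pod is marked unready")
```

With these values that works out to roughly five minutes, so the probe is already very tolerant, and raising the timeout further probably would not change much.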

I am also considering switching from dagsterApiGrpcArgs to codeServerArgs, but I don't know if that would make a difference. Any help here would be appreciated.