Max Wong
12/03/2021, 2:06 AM
Error: Caught an error for run b11d456b-0374-4936-8ff2-e9a0ae9492f5 while removing it from the queue. Marking the run as failed and dropping it from the queue: Exception: Timed out waiting for gRPC server to start with arguments: "/usr/local/bin/python -m dagster.grpc --lazy-load-user-code --socket /tmp/tmpt1i9jqlo --heartbeat --heartbeat-timeout 120 --fixed-server-id bcb5c01a-2f88-4439-ab7e-136ca4e65b67 -f /opt/dagster/dags/repos.py -d /opt/dagster/dags". Most recent connection error: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1638496923.552474091","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1638496923.552472571","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
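For context on what this error reports: the wording ("Timed out waiting for gRPC server to start ... Most recent connection error") suggests that the code server is launched as a subprocess on a Unix socket (`--socket /tmp/tmpt1i9jqlo`) and the caller keeps trying to connect until a deadline passes, then surfaces the last failure. Below is a minimal stdlib sketch of that wait-with-deadline pattern, assuming a plain Unix domain socket. `wait_for_unix_socket` is a hypothetical helper for illustration, not Dagster's actual implementation, which talks gRPC rather than doing a raw socket connect.

```python
import os
import socket
import tempfile
import time


def wait_for_unix_socket(path: str, timeout: float, interval: float = 0.1) -> None:
    """Poll a Unix domain socket until it accepts a connection or `timeout` elapses.

    Hypothetical helper for illustration only; not Dagster's code.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return  # something is listening on the socket, so the server is up
        except OSError as exc:
            last_error = exc  # keep the most recent failure for the timeout message
            time.sleep(interval)
        finally:
            sock.close()
    raise TimeoutError(
        f"Timed out waiting for server on {path}. "
        f"Most recent connection error: {last_error!r}"
    )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "server.sock")

    # Nothing is listening yet, so this gives up after roughly 0.5 s.
    try:
        wait_for_unix_socket(path, timeout=0.5)
    except TimeoutError as exc:
        print(exc)

    # Once a server binds and listens, the same call returns right away.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    wait_for_unix_socket(path, timeout=0.5)
    server.close()
```

In this sketch, a server that takes longer than `timeout` to start listening (for example, because importing its code is slow) produces exactly this kind of "timed out, most recent connection error" failure, even though it might have come up fine with a longer deadline.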
This only happens to a single pipeline, although it’s the heaviest one (in terms of required computing resources). Other pipelines work fine.
daniel
12/03/2021, 3:29 PM
Max Wong
12/03/2021, 3:51 PM