hi. is there a way to increase grpc timeout? Recen...
# ask-community
m
Hi. Is there a way to increase the gRPC timeout? I recently hit this error from a pipeline triggered via a schedule:
Error: Caught an error for run b11d456b-0374-4936-8ff2-e9a0ae9492f5 while removing it from the queue. Marking the run as failed and dropping it from the queue: Exception: Timed out waiting for gRPC server to start with arguments: "/usr/local/bin/python -m dagster.grpc --lazy-load-user-code --socket /tmp/tmpt1i9jqlo --heartbeat --heartbeat-timeout 120 --fixed-server-id bcb5c01a-2f88-4439-ab7e-136ca4e65b67 -f /opt/dagster/dags/repos.py -d /opt/dagster/dags". Most recent connection error: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1638496923.552474091","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1638496923.552472571","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
This only happens with a single pipeline, although it's the heaviest one (in terms of computing resources required). Other pipelines work fine.
d
Hi max, we could make the timeout configurable, but it's currently set at 60 seconds, which is already pretty high. The server starts up before it begins running the pipeline, so the fact that the ops use a lot of computing resources shouldn't affect server startup time. If the server is taking more than 60 seconds just to start, it likely indicates that the box you're running dagster on is very overloaded, which is likely to cause other problems even if we increase the timeout. Maybe a previously started run of the heavy pipeline is still going in the background and slowing everything down? A look at which processes are taking up resources while things are running slowly would probably provide some clues.
If you have a particularly beefy pipeline and want to make sure that it can't affect dagster system components while it's running, one option is to run dagster in a containerized setup like Docker. Another is to run your own gRPC server, so that the daemon doesn't need to start up servers: https://docs.dagster.io/concepts/repositories-workspaces/workspaces#running-your-own-grpc-server
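For the "run your own gRPC server" option, here is a rough sketch of what the workspace entry could look like. It assumes you start the server yourself, on the same box, with something like `dagster api grpc -f /opt/dagster/dags/repos.py -d /opt/dagster/dags --host 0.0.0.0 --port 4266`. The file and working-directory paths are taken from the error message above. The port `4266` and the location name `heavy_pipelines` are placeholders, not values from your setup.

```yaml
# workspace.yaml (sketch; port and location_name are assumed placeholders)
load_from:
  - grpc_server:
      host: localhost              # assumes the server runs on the same box as the daemon
      port: 4266                   # must match the --port passed to `dagster api grpc`
      location_name: "heavy_pipelines"  # hypothetical name shown for this code location
```

With an entry like this, the daemon connects to the already-running server instead of spawning a new one per run, so a slow server startup on a busy box no longer counts against the 60 second timeout.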
m
Ah yes. We have another small Spark pipeline triggering at the same time, and the box only has two vCPUs (OK for our workload). I guess I'll try shifting the start time by a few minutes so they don't overlap. Ty!