#ask-community

Daniel Daum

02/21/2023, 2:46 AM
Hey team, I need some assistance with an issue that seems related to my user_code container and gRPC. I am deploying Dagster in Docker with Docker Compose, running on a single EC2 instance. I've had a working instance for some time now, but after the EC2 instance had some downtime today, I cannot get my Dagster project to start again (when using docker compose). The error occurs almost instantly, just after the postgres container starts up. I rebuilt my containers (with no code changes), and I am now receiving a gRPC UNAVAILABLE error for my user code repository container on startup. I'm wondering whether pip installing a newer version of some dependency is causing this error, or whether changes on the EC2 instance are affecting it. (The downtime was to change from a private IP instance to a public IP instance.) Specific Error:
/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/context.py:602: UserWarning: Error loading repository location repository:dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
The above exception was caused by the following exception:
daemon      | grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
daemon      |   status = StatusCode.UNAVAILABLE
daemon      |   details = "failed to connect to all addresses"
daemon      |   debug_error_string = "{"created":"@1676946492.319074179","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1676946492.319073239","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
My setup:
- 1x Container - UI
- 1x Container - PostgreSQL DB
- 1x Container - Daemon
- 1x Container - User Code (I called it repository)

Here is a GitHub gist with all of the following files: https://gist.github.com/daniel-daum/6d57b3c9f129e1f9f53985d7d14d35dc
- Dockerfile_dagster
- Dockerfile_repo
- docker-compose.yaml
- dagster.yaml
- workspace.yaml
- setup.py
- docker compose stack trace

Notes:
- The project works locally on my Windows machine, both with Docker Compose and without Docker (from source), but not on my Linux instance (Docker). (When I ran docker compose on my local Windows machine I received this same error, but the user_code container still started up properly a second later.)
- This is also printed by the repo container. The AWS EC2 instance metadata IP appears for some reason?
repository  |     raise ReadTimeoutError(
repository  | urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=5.0)
🤖 1
Please ignore, it took some time but I figured out the issue.
It was not Dagster related. It was related to running Docker on EC2 and hitting the EC2 instance metadata hop limit. I bind my AWS credentials on startup, and periodically after that. When my EC2 instance restarted, it reset the metadata hop limit back to 1. That meant that while I could access my AWS credentials via instance metadata on the host, I could not do so from inside a Docker container on that same host (the container is considered another network, so the request has to make an extra hop). This was causing the error on startup, and explains why I was seeing the instance metadata IP address (169.254.169.254) in my stack trace. I set my hop limit back to 3, and was good to go.
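For anyone who hits the same thing, here is a minimal sketch of how the hop-limit problem could be checked from inside the container. It assumes the instance uses IMDSv2 (a session token is requested with a PUT to `/latest/api/token`); the function name `imds_token_reachable` and the 2-second timeout are made up for illustration and are not part of Dagster or the AWS SDK. If it prints `True` on the host but `False` inside the container, the hop limit is a likely suspect.

```python
import urllib.error
import urllib.request

# EC2 instance metadata endpoint (the IP that showed up in the stack trace).
IMDS_BASE_URL = "http://169.254.169.254"


def imds_token_reachable(base_url: str = IMDS_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an IMDSv2 session token can be fetched from base_url.

    Hypothetical helper for diagnosis only. With a hop limit of 1, the token
    response typically never reaches a container, so the request times out.
    """
    request = urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    # Bypass any HTTP proxy from the environment; the metadata endpoint is link-local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            # A usable answer is HTTP 200 with a non-empty token in the body.
            return response.status == 200 and bool(response.read())
    except (urllib.error.URLError, OSError):
        # Covers connection refused, unreachable network, and read timeouts.
        return False


if __name__ == "__main__":
    print("IMDS token reachable:", imds_token_reachable())
```

Run it once on the EC2 host and once inside the user code container (for example via `docker compose exec`) and compare the two results. The hop limit itself is an instance metadata option on the EC2 instance, so it has to be changed on the AWS side rather than in the Dagster or Docker configuration.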
👍 2