Simon Szalai
04/19/2022, 3:12 PM
dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
....
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1650378812.506105369","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3128,"referenced_errors":[{"created":"@1650378812.506104247","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" >
What I checked so far is the following:
1. Logged in to the user code container and ran dagster api grpc -p 3030 --python-file /app/qc_pipeline/repository.py, which worked as expected
2. kubectl describe of the Dagit pod shows a warning: Warning Unhealthy 27m kubelet Readiness probe failed: Get "http://192.168.127.122:80/dagit_info": dial tcp 192.168.127.122:80: connect: connection refused
3. kubectl describe of the user code deployment also shows a warning: Warning Unhealthy 4m17s (x95 over 35m) kubelet Readiness probe failed: (but with no further details here, unlike in the Dagit pod)
4. kubectl logs of the Dagit pod just shows the same error as the Dagit UI
5. Pinged the user deployment container from the dagit container and vice-versa, both worked
I could imagine that it’s some AWS networking configuration that doesn’t allow communication between pods, but I’m not really sure where to even look. I created the cluster with eksctl and didn’t really touch any of the config.
@Lee Littlejohn seemed to have a similar problem, and in another post he mentioned that the problem was that the ‘user code container port was not mapped to the host’. What does this mean exactly? How can I do this?
Any help to point me in the right direction would be greatly appreciated! Thanks a lot!
johann
04/20/2022, 5:43 PM
> 1. Pinged the user deployment container from the dagit container and vice-versa, both worked
As in with the ping command? That uses a fixed port and protocol. You could use dagster api grpc-health-check instead.
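A note on the ping check: ping exercises ICMP only, so a successful ping does not show that the gRPC port accepts TCP connections. Below is a minimal sketch of a port-level check that could be run from the Dagit container, assuming Python is available there. The helper name tcp_port_open is made up for illustration; 3030 is the port used earlier in this thread.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Illustrative helper (not part of Dagster). Any failure to connect,
    including DNS errors, refused connections, and timeouts, yields False.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and timeout errors are subclasses of OSError.
        return False
```

Calling tcp_port_open with the user code Service name (whatever kubectl get svc reports) and port 3030 would be one way to use it. If it returns False while ping succeeds, that would suggest the problem is at the port level (nothing listening, or a rule blocking the port) rather than general unreachability.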
> ‘user code container port was not mapped to the host’. What does this mean
I would guess it means that the grpc server was running on a port that wasn’t externally accessible. We default to 3030 in the helm chart, but we create a K8s Service with that port so usually that’s handled for you. It’s possible that you have IAM or security group rules that aren’t allowing that connection between Fargate containers.
Simon Szalai
04/21/2022, 6:57 PM
dagster api grpc-health-check failed with the same error that I mentioned in the original post.
In general, I would be interested to know which components of Dagster can even run on Fargate. Can dagit and the daemon run there, or just the workloads? Is there an example somewhere that describes how to deploy it to EKS?
johann
04/21/2022, 6:59 PM
johann
04/21/2022, 7:02 PM
Fargate exposed services only run on target type IP mode, and not on node IP mode. The recommended way to check the connectivity from a service running on a managed node and a service running on Fargate is to connect via service name.
https://docs.aws.amazon.com/eks/latest/userguide/fargate.html
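Connecting via service name can be sketched roughly as follows. This assumes the default cluster domain cluster.local, and the service and namespace names in the usage note are hypothetical. service_dns_name and resolve are illustrative helpers, not part of Dagster or any AWS tooling.

```python
import socket
from typing import List


def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes Service.

    Assumes the cluster uses the standard <service>.<namespace>.svc.<domain>
    naming scheme with the given cluster domain.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(host: str, port: int) -> List[str]:
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run from inside the Dagit pod, resolve(service_dns_name("user-code", "dagster"), 3030) (both names hypothetical) would show whether the Service name resolves at all. An empty list would point at DNS or Service configuration rather than at security group rules.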