# deployment-kubernetes
s
Hi friends, I encountered an issue that I’d like to get some support on. I deployed a Dagster instance to AWS EKS using Fargate. The problem is that Dagit apparently cannot connect to the user code. The error I’m getting in the Dagster UI:
dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
....
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1650378812.506105369","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3128,"referenced_errors":[{"created":"@1650378812.506104247","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" >
What I have checked so far:
1. Logged in to the user code container and ran `dagster api grpc -p 3030 --python-file /app/qc_pipeline/repository.py`; it worked as expected.
2. `kubectl describe` of the Dagit pod shows a warning:
Warning  Unhealthy        27m   kubelet            Readiness probe failed: Get "http://192.168.127.122:80/dagit_info": dial tcp 192.168.127.122:80: connect: connection refused
3. `kubectl describe` of the user code deployment also shows a warning (but with no further details, unlike the Dagit pod):
Warning  Unhealthy        4m17s (x95 over 35m)  kubelet            Readiness probe failed:
4. `kubectl logs` of the Dagit pod just shows the same error as the Dagit UI.
5. Pinged the user deployment container from the Dagit container and vice versa; both worked.

I could imagine that it’s some AWS networking configuration that doesn’t allow communication between pods, but I’m not really sure where to look. I created the cluster with `eksctl` and didn’t really touch any of the config. @Lee Littlejohn seemed to have a similar problem, and in another post he mentioned that the problem was that the ‘user code container port was not mapped to the host’. What does this mean exactly? How can I do this? Any help pointing me in the right direction would be greatly appreciated! Thanks a lot!
j
Hi Simon, I’m not familiar with the intricacies of EKS with Fargate but let’s see
> Pinged the user deployment container from the dagit container and vice-versa, both worked
As in with the `ping` command? That uses ICMP, not the TCP port the gRPC server listens on, so it doesn’t confirm that the server is reachable. You could use `dagster api grpc-health-check` instead.
> ‘user code container port was not mapped to the host’. What does this mean
I would guess it means that the gRPC server was running on a port that wasn’t externally accessible. We default to `3030` in the Helm chart, but we create a K8s Service with that port, so usually that’s handled for you. It’s possible that you have IAM or security group rules that aren’t allowing that connection between Fargate containers.
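A quick way to tell "the pod answers ping" apart from "the gRPC port accepts connections" is a plain TCP connect test. The following is only a minimal sketch using the Python standard library; the function name `port_is_reachable` is made up for illustration, and it checks TCP reachability only, not gRPC health (that is what `dagster api grpc-health-check` is for).

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Unlike ping (ICMP), this exercises the actual port, so it fails when a
    security group, network policy, or missing listener blocks the port.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from inside the Dagit container, something like `port_is_reachable("<user-code-service-name>", 3030)` would show whether port 3030 is open from Dagit's point of view; the service name here is a placeholder for whatever the Helm chart created in your cluster.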
s
Hey, thanks for the tips! You are right, ping is probably not checking the right port. I tried `dagster api grpc-health-check`, and it failed with the same error that I mentioned in the original post. In general, I would be interested to know which components of Dagster can even run on Fargate. Can Dagit and the daemon run there, or just the workloads? Is there an example somewhere that describes how to deploy it to EKS?
j
These docs should apply to EKS, we have a lot of users running on it https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm
Nothing jumps out to me from the eks/fargate docs to say this shouldn’t work…
Fargate exposed services only run on target type IP mode, and not on node IP mode. The recommended way to check the connectivity from a service running on a managed node and a service running on Fargate is to connect via service name.
https://docs.aws.amazon.com/eks/latest/userguide/fargate.html
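To illustrate "connect via service name": inside a cluster, a Service is normally reachable under a DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`. The sketch below assumes the default cluster domain `cluster.local`; the helper names and the example service and namespace are hypothetical, not taken from the Dagster Helm chart.

```python
import socket


def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes Service.

    Assumes the cluster uses the standard <service>.<namespace>.svc.<domain> scheme.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve_service(service: str, namespace: str = "default") -> list:
    """Return the IP addresses the Service name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(service_dns_name(service, namespace), None)
    except socket.gaierror:
        # The name does not resolve, e.g. wrong namespace or run outside the cluster.
        return []
    # Each entry's sockaddr tuple starts with the IP address; de-duplicate them.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_service("my-user-code", "dagster")` from the Dagit pod (both names are placeholders) would show whether the Service name resolves at all, which is worth confirming before looking at security groups.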