hi Friends I encountered an issue that I d like to get some dagster #deployment-kubernetes


Simon Szalai

04/19/2022, 3:12 PM

hi Friends, I encountered an issue that I’d like to get some support on. I deployed a Dagster instance to AWS EKS using Fargate. The problem that I have is that it seems Dagit cannot connect to the user code. The error I’m getting on the Dagster UI:


dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
....
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1650378812.506105369","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3128,"referenced_errors":[{"created":"@1650378812.506104247","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" >

What I've checked so far:

1. Logged in to the user code container and ran `dagster api grpc -p 3030 --python-file /app/qc_pipeline/repository.py`; it worked as expected.
2. `kubectl describe` of the Dagit pod shows a warning:

Warning  Unhealthy        27m   kubelet            Readiness probe failed: Get "http://192.168.127.122:80/dagit_info": dial tcp 192.168.127.122:80: connect: connection refused

3. `kubectl describe` of the user code deployment also shows a warning (but with no further details, unlike the Dagit pod):

Warning  Unhealthy        4m17s (x95 over 35m)  kubelet            Readiness probe failed:

4. `kubectl logs` of the Dagit pod just shows the same error as the Dagit UI.
5. Pinged the user deployment container from the Dagit container and vice versa; both worked.

I could imagine that it’s some AWS networking configuration that doesn’t allow communication between pods, but I'm not really sure where to even look. I created the cluster with `eksctl` and didn’t really touch any of the config. @Lee Littlejohn seemed to have a similar problem, and in another post he mentioned that the problem was that the ‘user code container port was not mapped to the host’. What does this mean exactly? How can I do this? Any help pointing me in the right direction would be greatly appreciated! Thanks a lot!

johann

04/20/2022, 5:43 PM

Hi Simon, I’m not familiar with the intricacies of EKS with Fargate but let’s see

1. Pinged the user deployment container from the dagit container and vice-versa, both worked

As in with the `ping` command? That uses its own protocol (ICMP), so it doesn't tell you whether the gRPC port is reachable. You could use `dagster api grpc-health-check` instead.
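To illustrate the difference between a successful ping and a reachable port, here is a minimal Python sketch that only checks whether a TCP connection to a given host and port can be opened. It is not a Dagster API and it does not speak gRPC; the function name `port_is_open` and the host name in the usage comment are hypothetical, and port 3030 is taken from the command in the original post.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage from inside the Dagit pod (hypothetical service name):
#   port_is_open("my-user-code-service", 3030)
```

If this returns False while ping succeeds, the pods can see each other on the network but the port itself is blocked or nothing is listening on it, which would point towards security group rules or the server not being bound to that port.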

‘user code container port was not mapped to the host’. What does this mean

I would guess it means that the grpc server was running on a port that wasn’t externally accessible. We default to

in the helm chart, but we create a K8s Service with that port so usually that’s handled for you. It’s possible that you have IAM or security group rules that aren’t allowing that connection between Fargate containers

Simon Szalai

04/21/2022, 6:57 PM

hey, thanks for the tips! 1. You are right, ping probably is not checking the right port. I tried `dagster api grpc-health-check`, and it failed with the same error that I mentioned in the original post. In general, I would be interested to know which components of Dagster can even run on Fargate. Can dagit and the daemon run there? Or just the workloads? Is there an example somewhere that describes how to deploy it to EKS?

johann

04/21/2022, 6:59 PM

These docs should apply to EKS, we have a lot of users running on it https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm

johann

04/21/2022, 7:02 PM

Nothing jumps out to me from the eks/fargate docs to say this shouldn’t work…

Fargate exposed services only run on target type IP mode, and not on node IP mode. The recommended way to check the connectivity from a service running on a managed node and a service running on Fargate is to connect via service name.

https://docs.aws.amazon.com/eks/latest/userguide/fargate.html
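As a rough sketch of what "connect via service name" looks like in practice: Kubernetes gives each Service a DNS name of the form `<service>.<namespace>.svc.<cluster domain>`, where the cluster domain is `cluster.local` by default. The helper below just builds that address; the function name and the example service and namespace names are hypothetical, and port 3030 is the one used in the original post.

```python
def service_target(
    service: str,
    namespace: str,
    port: int,
    cluster_domain: str = "cluster.local",
) -> str:
    """Build the in-cluster 'host:port' address for a Kubernetes Service."""
    # Standard Kubernetes Service DNS form: <service>.<namespace>.svc.<cluster domain>
    return f"{service}.{namespace}.svc.{cluster_domain}:{port}"


# Hypothetical names, for illustration only:
#   service_target("my-user-code", "dagster", 3030)
#   gives "my-user-code.dagster.svc.cluster.local:3030"
```

Checking connectivity against this name rather than a pod or node IP matches the recommendation quoted above.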
