# deployment-ecs
b
hello! 👋 new to Dagster here and working on deploying a proof of concept at my company on AWS ECS via CDK so we can see if we want to use it to orchestrate dbt. I'm deploying the UI, daemon, and user code as separate services and am running into some issues getting the user code container to stay healthy. It keeps randomly shutting down and restarting (but seems to correctly load my dbt code?). I'm suspecting it might be an AWS thing killing the container due to a failed health check. I know that the UI offers a `/server_info` endpoint for this, but I'm curious: do the daemon and code location servers offer something similar? Unsure how to get ECS to think my container is healthy and keep it running. (did find this old convo from this channel but not sure if a final conclusion was ever reached)
r
the gRPC servers have a `/ping` gRPC endpoint that could be suitable. The daemon is a process, and so it doesn’t have endpoints you can reach.
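If you want ECS itself to only restart the container when the server is actually unhealthy, one option is a container-level health check that exercises that `/ping` endpoint. Here's a rough CDK (Python) sketch of what that could look like; the image, module name, sizes, and timings are placeholders, and it assumes the `dagster api grpc-health-check` CLI (which should exit non-zero when it can't reach the server on that port) is available inside the image:

```python
from aws_cdk import App, Stack, Duration
from aws_cdk import aws_ecs as ecs

app = App()
stack = Stack(app, "DagsterUserCodeSketch")

# Task definition for the user code gRPC server (sizes are illustrative).
task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)

task_def.add_container(
    "usercode",
    # Placeholder image: your image with dagster, dagster-dbt, and the dbt project installed.
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),
    # Serve the code location over gRPC; the module name is an assumption.
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
    health_check=ecs.HealthCheck(
        # Pings the gRPC server from inside the container; a non-zero exit marks the task unhealthy.
        command=["CMD-SHELL", "dagster api grpc-health-check -p 4000 || exit 1"],
        interval=Duration.seconds(30),
        timeout=Duration.seconds(10),
        retries=3,
        # Give the dbt project time to load before failures start counting.
        start_period=Duration.seconds(120),
    ),
)

app.synth()
```

Note this only covers the container-level check; if the service also sits behind a load balancer target group, that health check is configured separately.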
b
thank you!! I managed to get the containers to stay healthy some other way, but now I'm having issues getting the daemon to actually "find" the module the gRPC server is trying to share. I have the port exposed on the user code server and it's running a workspace.yaml file like the one below, but I'm getting a `cannot find module` error:
load_from:
  # Each entry here corresponds to a container that exposes a gRPC server.
  - grpc_server:
      host: usercode
      port: 4000
      location_name: "dagster_example"
what exactly should the `host` and `location_name` be referring to?
r
`location_name` is just the name that will display in the Dagster UI. This can be whatever you want. `host` should be the hostname where the user code gRPC server can be reached on the network that all your services are running in.
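For an ECS deployment specifically, one way to get a stable `host` is to register the user code service in a Cloud Map namespace on the cluster so the webserver and daemon can resolve it by name. A rough CDK (Python) sketch, with placeholder names and image:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs

app = App()
stack = Stack(app, "DagsterServiceDiscoverySketch")

vpc = ec2.Vpc(stack, "Vpc", max_azs=2)
cluster = ecs.Cluster(stack, "Cluster", vpc=vpc)

# Private DNS namespace shared by the webserver, daemon, and user code services.
cluster.add_default_cloud_map_namespace(name="dagster.local")

task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)
task_def.add_container(
    "usercode",
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),  # placeholder image
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
)

ecs.FargateService(
    stack,
    "UserCodeService",
    cluster=cluster,
    task_definition=task_def,
    # Registers the task as usercode.dagster.local inside the VPC.
    cloud_map_options=ecs.CloudMapOptions(name="usercode"),
)

app.synth()
```

With something like that in place, `host` in workspace.yaml would be `usercode.dagster.local` and `port` would stay 4000.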
b
made some more progress on this deployment architecture. Currently have the webserver and code location servers sitting behind their own application load balancers. I've set up the code location server to accept gRPC inbounds and pass the host/port of the ALB to the webserver and daemon to use in their `workspace.yaml` files, but I'm still unable to connect to the code location. Currently it's getting this error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: INTERNAL: ipv4:<ip_of_alb>:<port> Trying to connect an http1.x server".
Any ideas why it's giving this error?
j
Hi Brandon! We’re running into the same issue. We can run our gRPC setup locally in Docker, but we can’t connect remotely from a Dagster daemon running in one ECS service to a Dagster gRPC server running in a different service on the same VPC.
Did you end up resolving this?
Aha! We needed to set `ssl: true` in the `grpc_server` entry.
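For anyone else who lands here: my understanding is that ALBs only support gRPC on HTTPS listeners, so the client configured in workspace.yaml has to open a TLS channel (hence `ssl: true`); otherwise the ALB answers over HTTP/1.x and you get exactly that "Trying to connect an http1.x server" error. A rough sketch of the load balancer side in CDK (Python), where the certificate ARN, names, and image are placeholders and the plaintext-to-target assumption may need adjusting for your setup:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs
from aws_cdk import aws_elasticloadbalancingv2 as elbv2

app = App()
stack = Stack(app, "DagsterGrpcAlbSketch")

vpc = ec2.Vpc(stack, "Vpc", max_azs=2)
cluster = ecs.Cluster(stack, "Cluster", vpc=vpc)

# User code gRPC service (same shape as the earlier sketches; names are placeholders).
task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)
task_def.add_container(
    "usercode",
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
)
service = ecs.FargateService(stack, "UserCodeService", cluster=cluster, task_definition=task_def)

# Internal ALB in front of the gRPC server. gRPC is only forwarded on HTTPS listeners,
# which is why the grpc_server entry in workspace.yaml needs ssl: true.
alb = elbv2.ApplicationLoadBalancer(stack, "UserCodeAlb", vpc=vpc, internet_facing=False)
listener = alb.add_listener(
    "Grpc",
    port=4000,
    protocol=elbv2.ApplicationProtocol.HTTPS,
    # Placeholder certificate ARN; it needs to cover whatever DNS name you point the daemon at.
    certificates=[elbv2.ListenerCertificate.from_arn(
        "arn:aws:acm:REGION:ACCOUNT:certificate/PLACEHOLDER"
    )],
)
listener.add_targets(
    "UserCodeTargets",
    port=4000,
    # Assumes the Dagster gRPC server itself is plaintext behind the ALB.
    protocol=elbv2.ApplicationProtocol.HTTP,
    protocol_version=elbv2.ApplicationProtocolVersion.GRPC,
    targets=[service],
)

app.synth()
```

Again, just a sketch of how we understand it; double-check the target group health check settings for gRPC in your own setup.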