# deployment-ecs
b
hello! 👋 new to Dagster here and working on deploying a proof of concept at my company on AWS ECS via CDK so we can see if we want to use it to orchestrate dbt. I'm deploying the UI, daemon, and user code as separate services and am running into some issues getting the user code container to stay healthy. It keeps randomly shutting down and restarting (but seems to correctly load my dbt code?). I'm suspecting it might be an AWS thing killing the container due to a failed health check. I know that the UI offers a `/server_info` endpoint for this, but I'm curious: do the daemon and code location servers offer something similar? Unsure how to get ECS to think my container is healthy and keep it running. (did find this old convo from this channel but not sure if a final conclusion was ever reached)
r
the gRPC servers have a `/ping` gRPC endpoint that could be suitable. The daemon is a process, and so it doesn’t have endpoints you can reach.
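If you want ECS itself to only restart the container when the server is actually unhealthy, one option is a container-level health check that exercises that `/ping` endpoint. Here's a rough CDK (Python) sketch of what that could look like; the image, module name, sizes, and timings are placeholders, and it assumes the `dagster api grpc-health-check` CLI (which should exit non-zero when it can't reach the server on that port) is available inside the image:

```python
from aws_cdk import App, Stack, Duration
from aws_cdk import aws_ecs as ecs

app = App()
stack = Stack(app, "DagsterUserCodeSketch")

# Task definition for the user code gRPC server (sizes are illustrative).
task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)

task_def.add_container(
    "usercode",
    # Placeholder image: your image with dagster, dagster-dbt, and the dbt project installed.
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),
    # Serve the code location over gRPC; the module name is an assumption.
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
    health_check=ecs.HealthCheck(
        # Pings the gRPC server from inside the container; a non-zero exit marks the task unhealthy.
        command=["CMD-SHELL", "dagster api grpc-health-check -p 4000 || exit 1"],
        interval=Duration.seconds(30),
        timeout=Duration.seconds(10),
        retries=3,
        # Give the dbt project time to load before failures start counting.
        start_period=Duration.seconds(120),
    ),
)

app.synth()
```

Note this only covers the container-level check; if the service also sits behind a load balancer target group, that health check is configured separately.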
b
thank you!! I managed to get the containers to stay healthy some other way, but now I'm having issues getting the daemon to actually "find" the module the gRPC server is trying to share. I have the port exposed on the user code server and it's running a workspace.yaml file like the one below, but I'm getting a `cannot find module` error:
load_from:
  # Each entry here corresponds to a container that exposes a gRPC server.
  - grpc_server:
      host: usercode
      port: 4000
      location_name: "dagster_example"
what exactly should the `host` and `location_name` be referring to?
r
`location_name` is just the name that will display in the Dagster UI. This can be whatever you want. `host` should be the hostname where the user code gRPC server can be reached on the network that all your services are running in.
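For an ECS deployment specifically, one way to get a stable `host` is to register the user code service in a Cloud Map namespace on the cluster so the webserver and daemon can resolve it by name. A rough CDK (Python) sketch, with placeholder names and image:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs

app = App()
stack = Stack(app, "DagsterServiceDiscoverySketch")

vpc = ec2.Vpc(stack, "Vpc", max_azs=2)
cluster = ecs.Cluster(stack, "Cluster", vpc=vpc)

# Private DNS namespace shared by the webserver, daemon, and user code services.
cluster.add_default_cloud_map_namespace(name="dagster.local")

task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)
task_def.add_container(
    "usercode",
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),  # placeholder image
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
)

ecs.FargateService(
    stack,
    "UserCodeService",
    cluster=cluster,
    task_definition=task_def,
    # Registers the task as usercode.dagster.local inside the VPC.
    cloud_map_options=ecs.CloudMapOptions(name="usercode"),
)

app.synth()
```

With something like that in place, `host` in workspace.yaml would be `usercode.dagster.local` and `port` would stay 4000.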
b
made some more progress on this deployment architecture. Currently have the webserver and code location servers sitting behind their own application load balancers. I've set up the code location server to accept gRPC inbounds and pass the host/port of the ALB to the webserver and daemon to use in their `workspace.yaml` files, but I'm still unable to connect to the code location. Currently it's getting this error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: INTERNAL: ipv4:<ip_of_alb>:<port> Trying to connect an http1.x server".
Any ideas why it's giving this error?
j
Hi Brandon! We’re running into the same issue. We can run our gRPC setup locally in Docker, but we can’t connect remotely from a Dagster daemon running in one ECS service to a Dagster gRPC server running in a different service on the same VPC.
Did you end up resolving this?
Aha! We needed to set `ssl: true` in the `grpc_server` entry.
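For anyone else who lands here: my understanding is that ALBs only support gRPC on HTTPS listeners, so the client configured in workspace.yaml has to open a TLS channel (hence `ssl: true`); otherwise the ALB answers over HTTP/1.x and you get exactly that "Trying to connect an http1.x server" error. A rough sketch of the load balancer side in CDK (Python), where the certificate ARN, names, and image are placeholders and the plaintext-to-target assumption may need adjusting for your setup:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs
from aws_cdk import aws_elasticloadbalancingv2 as elbv2

app = App()
stack = Stack(app, "DagsterGrpcAlbSketch")

vpc = ec2.Vpc(stack, "Vpc", max_azs=2)
cluster = ecs.Cluster(stack, "Cluster", vpc=vpc)

# User code gRPC service (same shape as the earlier sketches; names are placeholders).
task_def = ecs.FargateTaskDefinition(stack, "UserCodeTask", cpu=256, memory_limit_mib=512)
task_def.add_container(
    "usercode",
    image=ecs.ContainerImage.from_registry("my-org/dagster-usercode:latest"),
    command=["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "dagster_example"],
    port_mappings=[ecs.PortMapping(container_port=4000)],
)
service = ecs.FargateService(stack, "UserCodeService", cluster=cluster, task_definition=task_def)

# Internal ALB in front of the gRPC server. gRPC is only forwarded on HTTPS listeners,
# which is why the grpc_server entry in workspace.yaml needs ssl: true.
alb = elbv2.ApplicationLoadBalancer(stack, "UserCodeAlb", vpc=vpc, internet_facing=False)
listener = alb.add_listener(
    "Grpc",
    port=4000,
    protocol=elbv2.ApplicationProtocol.HTTPS,
    # Placeholder certificate ARN; it needs to cover whatever DNS name you point the daemon at.
    certificates=[elbv2.ListenerCertificate.from_arn(
        "arn:aws:acm:REGION:ACCOUNT:certificate/PLACEHOLDER"
    )],
)
listener.add_targets(
    "UserCodeTargets",
    port=4000,
    # Assumes the Dagster gRPC server itself is plaintext behind the ALB.
    protocol=elbv2.ApplicationProtocol.HTTP,
    protocol_version=elbv2.ApplicationProtocolVersion.GRPC,
    targets=[service],
)

app.synth()
```

Again, just a sketch of how we understand it; double-check the target group health check settings for gRPC in your own setup.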