# ask-community
d
Hi support, first of all - you've built a fantastic tool! I am struggling to get gRPC work on Cloud Run and am wondering if I'm doing something too exotic. My setup is: • Cloud Run service running a dagit instance • Cloud Run service running a dagster instance • Cloud Run service running a "repository" via command
dagster api grpc --python-file repository.py --host 0.0.0.0 --port "${PORT}"
All three instances point to a Cloud SQL instance, spin up without errors and respond OK (e.g. I can access the dagit UI using the default run.app URL). My issue is that dagit cannot connect to the "repository" instance, with the error below:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1638898936.563075026","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1638898936.563072225","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}" >
  File "/usr/local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 535, in _load_location
    location = self._create_location_from_origin(origin)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 454, in _create_location_from_origin
    return origin.create_location()
  File "/usr/local/lib/python3.9/site-packages/dagster/core/host_representation/origin.py", line 271, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/host_representation/repository_location.py", line 495, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/usr/local/lib/python3.9/site-packages/dagster/api/list_repositories.py", line 14, in sync_list_repositories_grpc
    deserialize_json_to_dagster_namedtuple(api_client.list_repositories()),
  File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 163, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
Both dagit and the dagster daemon are configured to connect to the "repository" instance via the instance's default run.app URL on port 443. When I hit that URL directly, I get
upstream connect error or disconnect/reset before headers. reset reason: protocol error
and though that looks like an error, I attributed it to the fact that I'm simply querying it from a browser instead of a gRPC client, and assumed that it's responding OK. At this point I don't have any ideas how to proceed, what to inspect, or where to turn on debug mode to make dagit and the daemon find my repository gRPC service. I would appreciate help if anyone has had a similar experience or can spot that I'm doing something odd here. Thanks!
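As an aside, the browser's "protocol error" is consistent with a TLS/plaintext or HTTP-version mismatch between client and server. A minimal, Dagster-agnostic sketch of checking whether an endpoint actually completes a TLS handshake (the `speaks_tls` helper is hypothetical, not part of Dagster):

```python
import socket
import ssl
import threading

def speaks_tls(host, port, timeout=2):
    """Return True if the endpoint completes a TLS handshake.

    A server listening in plaintext (e.g. a gRPC server started
    without SSL) never answers the ClientHello, so the handshake
    times out or errors and we return False.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        return False

# Demo against a local plaintext listener, standing in for a
# gRPC server that was started without SSL:
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=srv.accept, daemon=True).start()
print(speaks_tls("127.0.0.1", srv.getsockname()[1]))  # False
srv.close()
```

If this returns True for the run.app URL on 443 (Cloud Run terminates TLS at its edge) while the gRPC client is configured without SSL, that mismatch alone can produce "failed to connect to all addresses".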
The dagit instance has further logs:
2021-12-07 17:48:03 - dagster-daemon - INFO - instance is configured with the following daemons: ['BackfillDaemon', 'SchedulerDaemon', 'SensorDaemon']
 2021-12-07 17:48:03 - SensorDaemon - INFO - Not checking for any runs since no sensors have been started.
 2021-12-07 17:48:03 - BackfillDaemon - INFO - No backfill jobs requested.
 2021-12-07 17:48:03 - SchedulerDaemon - INFO - Checking for new runs for the following schedules: schedule_update_shipping_emissions_db
 /usr/local/lib/python3.9/site-packages/dagster/core/workspace/context.py:538: UserWarning: Error loading repository location Shipping Repository:grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
     status = StatusCode.UNAVAILABLE
     details = "failed to connect to all addresses"
     debug_error_string = "{"created":"@1638899283.913094900","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1638899283.913094200","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
 >

 Stack Trace:
   File "/usr/local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 535, in _load_location
     location = self._create_location_from_origin(origin)
   File "/usr/local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 454, in _create_location_from_origin
     return origin.create_location()
   File "/usr/local/lib/python3.9/site-packages/dagster/core/host_representation/origin.py", line 271, in create_location
     return GrpcServerRepositoryLocation(self)
   File "/usr/local/lib/python3.9/site-packages/dagster/core/host_representation/repository_location.py", line 495, in __init__
     list_repositories_response = sync_list_repositories_grpc(self.client)
   File "/usr/local/lib/python3.9/site-packages/dagster/api/list_repositories.py", line 14, in sync_list_repositories_grpc
     deserialize_json_to_dagster_namedtuple(api_client.list_repositories()),
   File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 163, in list_repositories
     res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
   File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 110, in _query
     response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
   File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
     return _end_unary_response_blocking(state, call, False, None)
   File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
     raise _InactiveRpcError(state)

   warnings.warn(
 Loading repository...
 Serving on http://0.0.0.0:3000 in process 7

   Telemetry:

   As an open source project, we collect usage statistics to inform development priorities. For more
   information, read <https://docs.dagster.io/install#telemetry>.

   We will not see or store solid definitions, pipeline definitions, modes, resources, context, or
   any data that is processed within solids and pipelines.

   To opt-out, add the following to $DAGSTER_HOME/dagster.yaml, creating that file if necessary:

     telemetry:
       enabled: false


   Welcome to Dagster!

   If you have any questions or would like to engage with the Dagster team, please join us on Slack
   (<https://bit.ly/39dvSsF>).
d
Hey David - for debugging grpc server access, there's a
dagster api grpc-health-check
command that you can use to verify that the server is running and accessible at a given host/port, using the same API calls that dagit will need to make in order for it to work. That should help with the browser/gRPC client issue at least, even if it doesn't necessarily explain the specific reason the server isn't reachable. Does that help at all?
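For reference, the invocation would look something like this (the hostname is a hypothetical placeholder for the Cloud Run URL; check `dagster api grpc-health-check --help` for the exact flags your version supports):

```
dagster api grpc-health-check \
  --host my-repository-abc123-ew.a.run.app \
  --port 443
```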
@Dagster Bot docs document the grpc health check CLI command
d
amazing, I'll try that. thanks so far!
I am getting the same error that I can read from dagit's logs:
<_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1638900172.260668700","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1638900172.260667900","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
>
Is there any way you can think of to troubleshoot why the "repository" instance is not available even though it's running and responding through the browser? Do you know of any resource to point me to to get to the bottom of this?
Thanks!
d
just confirming, it behaves differently in the browser than when the server isn't running at all? That "upstream connect error or disconnect/reset before headers. reset reason: protocol error" doesn't appear?
m
is it possible that you've configured Cloud Run to use HTTP/2?
d
> is it possible that you've configured Cloud Run to use HTTP/2?
I have tried both with and without HTTP/2 enabled. Now it's turned off.
> just confirming, it behaves differently in the browser than when the server isn't running at all? That "upstream connect error or disconnect/reset before headers. reset reason: protocol error" doesn't appear?
the error appears when I hit the URL/port from the browser. When I run the health check I get the
failed to connect to all addresses
error
Is this because the server is on 443 and I'm not explicitly enabling ssl on dagit and the daemon?
d
not positive if it applies here, but there is a way to enable ssl in the workspace.yaml, for example:
- grpc_server:
    host: remotehost
    port: 4266
    location_name: 'my_grpc_server'
    ssl: true
that would work if your server was set up in such a way that it's only reachable over SSL
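Applied to the Cloud Run setup above, the workspace.yaml would presumably look something like this (hostname and location name are hypothetical placeholders):

```
load_from:
  - grpc_server:
      host: my-repository-abc123-ew.a.run.app
      port: 443
      location_name: 'shipping_repository'
      ssl: true
```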
d
hmm it is only reachable over SSL
I'll give it a go and update
does the dagster-daemon have an SSL mode (like dagit's --use-ssl)?
d
if you change the workspace.yaml, that would affect the daemon as well
confusingly, dagit's --use-ssl is somewhat different: that's for accessing dagit itself over SSL (not what dagit uses to access the gRPC servers)
and you never access the daemon directly since it isn't a server
d
that fixed it facepalm
thanks for your help
d
oh nice - we should document this better for sure
m
yeah especially as we work towards a documented/supported Cloud Run solution
David, if you feel comfortable contributing some of this work back I think we would be quite receptive 🙂
d
Happy to help. I should have a working production setup today and then can write things up. Where should this live? In content/deployment/guides/cloud-run or the existing content/deployment/guides/gcp as a new section?
m
i think a new guide would be appropriate!