# ask-community
m
Any thoughts here on this error? Running the k8s executor, dagit/dagster version 1.2.7:
Operation name: JobMetadataQuery

Message: Failure loading edgeshare: dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 535, in _load_location
    location = self._create_location_from_origin(origin)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 460, in _create_location_from_origin
    return origin.create_location()
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/origin.py", line 329, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/repository_location.py", line 606, in __init__
    self,
  File "/usr/local/lib/python3.7/site-packages/dagster/_api/snapshot_repository.py", line 29, in sync_get_streaming_external_repositories_data_grpc
    repository_name,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 336, in streaming_external_repository
    defer_snapshots=defer_snapshots,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 166, in _streaming_query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e

The above exception was caused by the following exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1682543922.819565004","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 163, in _streaming_query
    method, request=request_type(**kwargs), timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 152, in _get_streaming_response
    yield from getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
    raise self


Path: ["assetNodes"]

Locations: [{"line":10,"column":3}]
fwiw: a fresh resolves this but then it keeps on happening.
a
StatusCode.DEADLINE_EXCEEDED means it took longer than 60 seconds for the dagit webserver to fetch the workspace snapshot (the representation of the definitions) from the code server via gRPC. Do you have a very large workspace in one code location? Many, many jobs/ops/assets? Otherwise it's possible limited resources are slowing things down. You can set the env var DAGSTER_GRPC_TIMEOUT_SECONDS to change the timeout.
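For reference, a minimal sketch of what that could look like on the container that makes the gRPC calls (the dagit webserver / daemon side); the exact manifest layout and the value of 120 are assumptions, so adapt it to however your Deployment or Helm values are managed:

  # Hypothetical container spec snippet: raise the gRPC client timeout from the default 60s
  env:
    - name: DAGSTER_GRPC_TIMEOUT_SECONDS
      value: "120"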
m
Okay, got it. I have about 150 jobs and 10 sensors running. I also make calls to the server via a GraphQL query to refresh the repo. Yet I see this frequently:
Dagster Reload Response: {'data': {'reloadRepositoryLocation': {'__typename': 'WorkspaceLocationEntry', 'name': 'edgeshare', 'locationOrLoadError': {'__typename': 'PythonError', 'message': 'dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
'}}}}
Would this be the same thing going on?
a
You may need to fetch more of the error object to see the chained exception and what the gRPC status code is, but I would speculate there's a good chance that it's the same thing.
👍 1
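As a rough sketch of what fetching the full error could look like from Python (the stack and cause fields are assumed from the GraphQL PythonError type, and the URL is a placeholder, so verify against your dagit's schema first):

# Hypothetical sketch: reload the code location and print the chained error,
# not just the top-level message, so the underlying grpc status code is visible.
import requests

DAGIT_URL = "http://localhost:3000/graphql"  # placeholder

RELOAD_MUTATION = """
mutation ReloadLocation($name: String!) {
  reloadRepositoryLocation(repositoryLocationName: $name) {
    __typename
    ... on WorkspaceLocationEntry {
      name
      locationOrLoadError {
        __typename
        ... on PythonError {
          message
          stack
          cause { message stack }
        }
      }
    }
  }
}
"""

resp = requests.post(
    DAGIT_URL,
    json={"query": RELOAD_MUTATION, "variables": {"name": "edgeshare"}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # look at cause.message for the underlying grpc status code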
If you have the right securityContext settings, you can use a profiler like py-spy to see what's taking the user code server so long.
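A rough sketch of that, assuming py-spy is installed in the user code image and the gRPC server runs as PID 1 in the container (the pod name is a placeholder):

# Hypothetical commands: dump the code server's thread stacks while a reload is hanging.
# Requires a securityContext that allows ptrace (e.g. the SYS_PTRACE capability).
kubectl exec -it <user-code-pod> -- py-spy dump --pid 1
# or record a flame graph over 30 seconds to see where the time goes
kubectl exec -it <user-code-pod> -- py-spy record --pid 1 --duration 30 -o profile.svg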
m
Okay, yeah, I'm only looking at response["errors"]. I'll expose the whole body.
a
How many ops/assets are in the 150 jobs? Is there any very large metadata attached to them?
There are some performance improvements in 1.3.2, coming out today/tomorrow, that may help.
m
Eh, ~500 ops, no metadata, and only a description.
I'm going to push a build and look at more of the response body
The DAGSTER_GRPC_TIMEOUT_SECONDS increase may have helped, but I'm still not 100% sure yet.