Hi, I am getting this error intermittently
# ask-community
n
Hi, I am getting this error intermittently
Has anyone seen this error before?
n
I am getting this error too, on version 12.10. The error occurs very frequently; because of it I am not able to execute any pipelines.
d
Hi, are there any logs from your user code deployment that you can share that might contain a clue? It seems like the user code deployment pod is crashing, which can happen for a variety of reasons
(assuming that you're using k8s - if not, the logs from wherever your gRPC server is running may have some clues)
n
We don't have a gRPC server. We are using k8s.
d
Are you using the helm chart that we provide?
n
Yes
To add more context, below is the full error we got in the UI:
Copy code
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1631717203.837170554","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3186,"referenced_errors":[{"created":"@1631717203.837169354","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":146,"grpc_status":14}]}"
>
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/utils.py", line 34, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 16, in launch_pipeline_execution
    return _launch_pipeline_execution(graphene_info, execution_params)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 50, in _launch_pipeline_execution
    run = do_launch(graphene_info, execution_params, is_reexecuted)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 34, in do_launch
    pipeline_run = create_valid_pipeline_run(graphene_info, external_pipeline, execution_params)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/execution/run_lifecycle.py", line 54, in create_valid_pipeline_run
    known_state=known_state,
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/external.py", line 120, in get_external_execution_plan_or_raise
    known_state=known_state,
  File "/usr/local/lib/python3.7/site-packages/dagster/core/workspace/context.py", line 195, in get_external_execution_plan
    known_state=known_state,
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/repository_location.py", line 628, in get_external_execution_plan
    known_state=known_state,
  File "/usr/local/lib/python3.7/site-packages/dagster/api/snapshot_execution_plan.py", line 44, in sync_get_external_execution_plan_grpc
    known_state=known_state,
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 157, in execution_plan_snapshot
    execution_plan_snapshot_args
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
n
@Mohammad Nazeeruddin
d
Yeah, what we need are logs from the pod that it's calling out to, to see why it's not returning. If you're using the helm chart, you have a user code deployment that is running a gRPC server (described here: https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#user-code-deployment) - is it possible to share logs from that pod?
👍 1
y
Copy code
/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py:510: UserWarning: Error loading repository location post-labeling:grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1631123325.774403864","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3186,"referenced_errors":[{"created":"@1631123325.774402474","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":146,"grpc_status":14}]}"
>

Stack Trace:
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 507, in _load_location
    location = self._create_location_from_origin(origin)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 426, in _create_location_from_origin
    return origin.create_location()
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/host_representation/origin.py", line 263, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/host_representation/repository_location.py", line 490, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/api/list_repositories.py", line 14, in sync_list_repositories_grpc
    deserialize_json_to_dagster_namedtuple(api_client.list_repositories()),
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/grpc/client.py", line 163, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/home/appuser/.local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/appuser/.local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)

  warnings.warn(
/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py:510: UserWarning: Error loading repository location post-labeling:grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1631123399.672864111","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3186,"referenced_errors":[{"created":"@1631123399.672862085","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":146,"grpc_status":14}]}"
>

Stack Trace:
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 507, in _load_location
    location = self._create_location_from_origin(origin)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/workspace/context.py", line 426, in _create_location_from_origin
    return origin.create_location()
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/host_representation/origin.py", line 263, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/core/host_representation/repository_location.py", line 490, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/api/list_repositories.py", line 14, in sync_list_repositories_grpc
    deserialize_json_to_dagster_namedtuple(api_client.list_repositories()),
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/grpc/client.py", line 163, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/home/appuser/.local/lib/python3.9/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/home/appuser/.local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/appuser/.local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
I also encountered this with k8s at some point. I'm not sure about the reason, but I think it was because the user deployment pod kept restarting due to OOM. I'll post back if I encounter it again.
d
Got it - if it is a resources issue and one of your deployments needs more memory to function, you can control the resources available to each user code deployment by setting a resources key for each deployment in the helm chart - for example:
Copy code
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-example-user-code-1"
      image:
        repository: "<http://docker.io/dagster/user-code-example|docker.io/dagster/user-code-example>"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
      resources: <your resource configuration here>
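The resources key takes a standard Kubernetes resource spec. As a rough sketch (the numbers here are placeholders to tune for your workload, not recommendations), that last line might be filled in like:
Copy code
      resources:
        requests:          # guaranteed to the user code container
          cpu: "250m"
          memory: "512Mi"
        limits:            # pod is OOM-killed if it exceeds the memory limit
          cpu: "500m"
          memory: "1Gi"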
👍 1
n
@daniel, if we update our workspace.yaml with new repositories, do we have to update this file as well? And does that mean we need to redeploy?
d
My understanding is that the helm chart creates your workspace.yaml for you based on the contents of this file
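For example, with the user code deployment above, the generated workspace.yaml would look roughly like this (the exact host name depends on your Helm release, so treat this as a sketch):
Copy code
load_from:
  - grpc_server:
      # service created by the helm chart for the user code deployment
      host: "k8s-example-user-code-1"
      port: 3030
      location_name: "k8s-example-user-code-1"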
m
No, we are loading our workspace.yaml from an EFS path. We didn't configure workspace.yaml inside the helm chart.
d
I didn't think it was possible to use the helm chart with your own workspace.yaml file. What are the contents of your workspace.yaml file?
n
Copy code
load_from:
  - python_file: 
      relative_path: /home/orchestrator/app/domains/udp/refined/sources/sample_source/udp__sample_source.py
  - python_file: 
      relative_path: /home/orchestrator/app/domains/udp/refined/sources/sample_source/udp__sample_product.py
d
Got it. I think this may also help explain your original post about resource issues. If you're not using our user code deployments feature, all of your code is executing in a single pod, so it's a lot more likely that one of them will go down or run out of resources. If you configure the helm chart I mentioned so that it manages your user code deployments for you, each entry in that workspace.yaml will run in its own pod (with resource limits that you can configure), and Kubernetes will automatically restart it if it goes down. So I think what you want to do is configure user code deployments in your helm chart and no longer write your own workspace.yaml file, as sketched below.
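As a sketch (the deployment names and the image are placeholders you'd replace with your own), your two python_file entries could map onto two user code deployments like this:
Copy code
dagster-user-deployments:
  enabled: true
  deployments:
    # one deployment (and therefore one pod) per repository location
    - name: "udp-sample-source"          # placeholder name
      image:
        repository: "<your user code image>"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/home/orchestrator/app/domains/udp/refined/sources/sample_source/udp__sample_source.py"
      port: 3030
    - name: "udp-sample-product"         # placeholder name
      image:
        repository: "<your user code image>"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/home/orchestrator/app/domains/udp/refined/sources/sample_source/udp__sample_product.py"
      port: 3030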