Hi all, I often get a repository not found error...
# ask-community
j
Hi all, I often get a "repository not found" error whenever new code is merged into my Dagster project. Has anyone else experienced this error in Dagit, where they receive a "repository not found" message? I'm not sure why this is happening or how I can prevent it in the future. I am deploying Dagster on a K8s cluster, btw. Thanks for any suggestions. Here are the logs:

UserWarning: Error loading repository location pipeline-a1b58665:
dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/core/workspace/context.py", line 555, in _load_location
    location = self._create_location_from_origin(origin)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/workspace/context.py", line 481, in _create_location_from_origin
    return origin.create_location()
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/origin.py", line 291, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/repository_location.py", line 526, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/usr/local/lib/python3.7/site-packages/dagster/api/list_repositories.py", line 19, in sync_list_repositories_grpc
    api_client.list_repositories(),
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 164, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 110, in _query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1652362522.630459675","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3128,"referenced_errors":[{"created":"@1652362522.630458467","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
>

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 107, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)

location_name=location_name, error_string=error.to_string()
2022-05-12 13:35:22 +0000 - dagit - INFO - Serving dagit on http://0.0.0.0:80 in process 1
b
Are you using a separate helm chart for user code repositories?
d
Hi Jay, when you say "whenever new code is merged into my Dagster Project" - could you elaborate on what exactly you're doing when the code changes? As Ben said, are you using the Dagster helm chart? More information about your setup would help us understand what's going on.
j
By "whenever new code is merged into my Dagster Project" I meant: when I merge new code into my GitLab project. This triggers the CI/CD process, which uninstalls the old release and creates the new pods associated with the release. I am deploying with Helm, following this guide: https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#deploying-dagster-on-helm. The user code repositories portion is handled by values_override.yaml and looks like this:
dagster:
  dagster-user-deployments:
    enabled: true
    deployments:
      - name: "pipeline-1"
        image:
          repository: localhost:5000/repository
          tag: "latest"
          pullPolicy: Never
        dagsterApiGrpcArgs:
          - "--python-file"
          - "dagster_integration/setup.py"
          - "--working-directory"
          - "dagster_integration/src"
        port: 3030
        envConfigMaps:
          - name: pipeline-configmap
d
Does it recover after the error? If you reload the location in Dagit once everything is spun up, does it load correctly?
The expected behavior is that there might be an error initially, but Dagit will poll until the server is ready and then automatically reload it in the UI.
j
Hi Daniel, thanks for your reply, and apologies for the late response. Yes, it does recover 99% of the time when I click "reload workspace". One time, though, I ran into an HTTP 503 error for Dagit. This left Dagit down until I did a fresh reinstall of the pods (helm uninstall namespace -n namespace, then reinstalled). Do you know why the HTTP 503 error may happen? A generic cause of this error is temporary server overload. Can this occur if two code changes land in quick succession, or is it a sign that I should increase the resources of the Dagster user code deployment (https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#dagit)? Thanks for your help, and please let me know if I need to provide any other information.
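[Editor's note: since the CI/CD job described above fully uninstalls the release before reinstalling it, there is a window where neither Dagit nor the user code servers exist, which is consistent with the 503s. A common alternative is to let Helm roll the release in place. A minimal sketch of a GitLab CI deploy job is below; the release name `dagster`, namespace `dagster`, and job layout are assumptions, not taken from this thread.]

    # Hypothetical .gitlab-ci.yml deploy job.
    # `helm upgrade --install` upgrades the existing release in place
    # (creating it if absent) instead of uninstalling it first, so the
    # pods are replaced via a rollout rather than torn down entirely.
    deploy:
      stage: deploy
      script:
        - helm repo add dagster https://dagster-io.github.io/helm
        - helm upgrade --install dagster dagster/dagster --namespace dagster -f values_override.yaml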
d
A 503 in Dagit could mean that the Dagit pod needs more resources, since that's the pod that's becoming unavailable.
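[Editor's note: raising the Dagit pod's resources is done in the same values file. A sketch, mirroring the nesting used in the values_override.yaml above, is below; the `resources` stanza follows the standard Kubernetes requests/limits shape, and the specific values are illustrative assumptions, not recommendations from this thread.]

    dagster:
      dagit:
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi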