# ask-community

Mohammad Nazeeruddin

07/15/2022, 2:07 PM
Hi Team, we have 20 repositories in our workspace, and in one repository we are getting the error below. The remaining repositories work fine, and all repos have the same configuration (using the user-code-deployment Helm chart). All repo services communicate through Istio. One repo is not stable: its pod is in the Running state and the logs show nothing useful, but its behavior is very erratic; sometimes it works and sometimes the repo keeps restarting in the UI.
```
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "{"created":"@1657881937.310747097","description":"Error received from peer ipv4:192.168.158.64:3030","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Deadline Exceeded","grpc_status":4}" >
  File "/usr/local/lib/python3.7/site-packages/dagster/core/workspace/context.py", line 522, in _load_location
    location = self._create_location_from_origin(origin)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/workspace/context.py", line 441, in _create_location_from_origin
    return origin.create_location()
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/origin.py", line 263, in create_location
    return GrpcServerRepositoryLocation(self)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/repository_location.py", line 520, in __init__
    self,
  File "/usr/local/lib/python3.7/site-packages/dagster/api/snapshot_repository.py", line 19, in sync_get_streaming_external_repositories_data_grpc
    repository_name,
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 259, in streaming_external_repository
    external_repository_origin
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 118, in _streaming_query
    yield from response_stream
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
```
Any help on this?
@daniel could you please have a look at this issue?

daniel

07/18/2022, 12:00 PM
It looks like Dagster is taking a long time to load your code. Is it possible to share the code of that repository? If 20 repositories on one machine are running into scaling issues, you may want to look into running each repository location in its own gRPC server/node. The Dagster Kubernetes Helm chart and the Docker/ECS deployment guides all show how to do this.
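With the user-code-deployment Helm chart, each entry in the deployments list becomes its own gRPC server pod. A minimal sketch of a values file, assuming the 0.12.x chart's list shape (repository names and file paths here are hypothetical, not from this thread):

```yaml
deployments:
  - name: "repo-a"
    image:
      repository: pipeline-orchestrator
      tag: v1.10
      pullPolicy: Always
    dagsterApiGrpcArgs:
      - "--python-file"
      - "/home/repos/repo_a/repo_a.py"
    port: 3030
  - name: "repo-b"
    image:
      repository: pipeline-orchestrator
      tag: v1.10
      pullPolicy: Always
    dagsterApiGrpcArgs:
      - "--python-file"
      - "/home/repos/repo_b/repo_b.py"
    port: 3030
```

Each entry gets its own pod and service, so a slow import in one repository does not affect the others.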

Mohammad Nazeeruddin

07/18/2022, 12:08 PM
Followed the document above ^ and was able to deploy most of the repos, but I'm facing this issue for only one repo and getting the above error.

daniel

07/18/2022, 12:10 PM
Which document?
Is it possible to share the code of the repo that's failing? Anything you can think of that would make it sometimes take a long time just to import the code?

Mohammad Nazeeruddin

07/18/2022, 12:20 PM
In this one repo we have 1000+ pipelines. The pipelines load from EFS, and the EFS volume is mounted into the repo's user-code-deployment pod. Sometimes it works fine, but whenever we upload a pipeline the repo goes down, and the same thing happens when the pod restarts. We have 20 repos; 19 work without any error, but only this repo gets the error mentioned above.
I will share code.
```yaml
- name: "client"
  image:
    repository: pipeline-orchestrator
    tag: v1.10
    pullPolicy: Always
  dagsterApiGrpcArgs:
    - "--python-file"
    - "/home/repos/client/client.py"
  port: 3030
```

daniel

07/18/2022, 12:26 PM
I can file an issue about better handling repositories with thousands of pipelines (a way for us to reproduce this problem ourselves locally would be great). In the short term, you may need to find a way to split the pipelines into different repositories. 1000+ pipelines is a lot of pipelines, and you may be hitting a scaling limit.
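One way to split is mechanical: partition the pipeline list into fixed-size shards and generate one repository per shard. A minimal pure-Python sketch of the partitioning step (names and the shard size of 200 are illustrative, not part of any Dagster API):

```python
def shard_pipelines(pipeline_names, shard_size=200):
    """Partition pipeline names into fixed-size groups, one repository per group."""
    return [
        pipeline_names[i:i + shard_size]
        for i in range(0, len(pipeline_names), shard_size)
    ]

# 1000 pipelines split at the size that was observed to load reliably
names = [f"pipeline_{n}" for n in range(1000)]
shards = shard_pipelines(names)
print(len(shards), len(shards[0]))  # 5 200
```

Each shard can then back its own `@repository` definition (or its own user-code deployment), so no single gRPC server has to build all 1000 definitions inside one load deadline.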

Mohammad Nazeeruddin

07/18/2022, 12:31 PM
Yeah, I noticed one thing: when I use 200 pipelines it works fine. Then I increased resources and used 1000+ pipelines, but still no luck, same issue. Yes, you might be correct that it's not able to handle 1000+ pipelines. (We are using version 0.12.14, forgot to mention.)
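As a stopgap while splitting the repo, the load deadline itself can sometimes be raised via the `DAGSTER_GRPC_TIMEOUT_SECONDS` environment variable on the Dagit pod. A sketch of a values-file fragment, assuming the chart version in use supports setting env vars this way (the exact shape varies by chart version, so verify against your chart's values reference):

```yaml
dagit:
  env:
    - name: DAGSTER_GRPC_TIMEOUT_SECONDS
      value: "300"
```

This only buys time for slow imports; it does not fix the underlying cost of constructing 1000+ pipeline definitions in one process.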

daniel

07/18/2022, 12:37 PM
@Dagster Bot scaling issues with a single repository location containing 1000+ pipelines

Dagster Bot

07/18/2022, 12:38 PM
Invalid command. Did you mean to create an issue or a discussion? Try
@dagster_bot <issue|docs|discussion> <title>
👍 1

daniel

07/18/2022, 12:43 PM
@Dagster Bot issue timeouts when loading a single repository location containing 1000+ pipelines

Dagster Bot

07/18/2022, 12:43 PM

Mohammad Nazeeruddin

07/18/2022, 12:45 PM
Thank you @daniel.
Any updates on this, @daniel?