# ask-community
m
Quite frequently, I am unable to retry solids that have errored, and I receive the error below. It seems to happen with solids that are downstream of dynamic outputs, but that may not always be the case. I'm at a loss as to where to start troubleshooting and am hoping someone can point me in the right direction. Right now I am running this on my local machine using the multiprocess executor. The full codebase is here: https://github.com/xmarcosx/dagster-etl
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
d
Hi Marcos - is there possibly a longer stack trace in the dagit process output when this happens? If so would you mind sharing it?
m
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
>

  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/utils.py", line 34, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 11, in launch_pipeline_reexecution
    return _launch_pipeline_execution(graphene_info, execution_params, is_reexecuted=True)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 50, in _launch_pipeline_execution
    run = do_launch(graphene_info, execution_params, is_reexecuted)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 34, in do_launch
    pipeline_run = create_valid_pipeline_run(graphene_info, external_pipeline, execution_params)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/run_lifecycle.py", line 48, in create_valid_pipeline_run
    external_execution_plan = get_external_execution_plan_or_raise(
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/external.py", line 115, in get_external_execution_plan_or_raise
    return graphene_info.context.get_external_execution_plan(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/workspace/context.py", line 190, in get_external_execution_plan
    return self.get_repository_location(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/host_representation/repository_location.py", line 620, in get_external_execution_plan
    execution_plan_snapshot_or_error = sync_get_external_execution_plan_grpc(
  File "/usr/local/lib/python3.8/site-packages/dagster/api/snapshot_execution_plan.py", line 36, in sync_get_external_execution_plan_grpc
    api_client.execution_plan_snapshot(
  File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 153, in execution_plan_snapshot
    res = self._query(
  File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
Hopefully that's helpful! I can dig up more details if needed
d
Definitely helpful - and just to confirm, is this the latest version of Dagster? It appears to be, from looking at your GitHub repo
m
Yup, that's right
d
Hey Marcos - just trying to reproduce this with some stubbed-out data. You said 'It seems to happen with solids that are downstream of dynamic outputs' - do you recall a specific solid where it happened? Thanks
and if you remember how many dynamic outputs were being collected in that particular step, that would also be helpful
the GitHub repo is really helpful - if sending over a dump of your runs and event_logs tables (over DM or email) is an option, that would definitely be enough to reproduce the problem
Ah, actually, I think there is an env var that you can set (which we should document) that increases the limit that you're running into - try setting DAGSTER_GRPC_MAX_RX_BYTES to 20000000
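For reference, a minimal sketch of applying that fix locally - assuming dagit is launched from a shell using a workspace.yaml (adjust for however you actually start dagit and any daemon; the variable needs to be set in the environment of the process receiving the oversized gRPC message, which in this traceback is dagit):

```
export DAGSTER_GRPC_MAX_RX_BYTES=20000000
dagit -w workspace.yaml
```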
@Dagster Bot docs Document the DAGSTER_GRPC_MAX_RX_BYTES environment variable to increase gRPC memory limits
m
Thank you @daniel, setting the DAGSTER_GRPC_MAX_RX_BYTES environment variable did it! For background: I have an external API request that returned ~1,200 unique ids. Those ids are dynamic, and for each one I need to hit a set of additional API endpoints (endpoints A, B, and C). I had a solid that runs the GET to retrieve the ~1,200 ids and dynamically outputs them, and I put those results through .map() to hit endpoints A, B, and C, which can all run in parallel. Dagster has allowed ETL pipelines that once took days to complete in only several hours.
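For reference, a minimal sketch of the shape Marcos describes, using the solid-era dynamic output API. The names (fetch_ids, fetch_endpoint_a/b/c, api_pipeline) and the stubbed id list are hypothetical - this is not the code from the linked repo:

```python
from dagster import DynamicOutput, DynamicOutputDefinition, pipeline, solid


@solid(output_defs=[DynamicOutputDefinition(str)])
def fetch_ids(context):
    # Stand-in for the external API call that returns ~1,200 unique ids.
    for n in range(1200):
        record_id = f"{n:04d}"
        # mapping_key may only contain letters, numbers, and underscores.
        yield DynamicOutput(value=record_id, mapping_key=f"id_{record_id}")


@solid
def fetch_endpoint_a(context, record_id: str):
    context.log.info(f"GET endpoint A for id {record_id}")


@solid
def fetch_endpoint_b(context, record_id: str):
    context.log.info(f"GET endpoint B for id {record_id}")


@solid
def fetch_endpoint_c(context, record_id: str):
    context.log.info(f"GET endpoint C for id {record_id}")


@pipeline
def api_pipeline():
    # Fan out: each dynamic output maps to three downstream solids that can
    # run in parallel under the multiprocess executor.
    ids = fetch_ids()
    ids.map(fetch_endpoint_a)
    ids.map(fetch_endpoint_b)
    ids.map(fetch_endpoint_c)
```

With ~1,200 dynamic outputs and three mapped solids each, the execution plan snapshot for a re-execution gets large, which is presumably what pushed the gRPC response past the default ~10 MB receive limit (10485760 bytes) shown in the error.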