# ask-community
m
Quite frequently, I am unable to retry solids that have errored, and I receive the error below. It seems to happen with solids that are downstream of dynamic outputs, but that may not always be the case. I'm at a loss as to where to start troubleshooting and am hoping someone can point me in the right direction. Right now I am running this on my local machine using the multiprocess executor. The full codebase is here: https://github.com/xmarcosx/dagster-etl
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
d
Hi Marcos - is there possibly a longer stack trace in the dagit process output when this happens? If so would you mind sharing it?
m
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
>

  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/utils.py", line 34, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 11, in launch_pipeline_reexecution
    return _launch_pipeline_execution(graphene_info, execution_params, is_reexecuted=True)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 50, in _launch_pipeline_execution
    run = do_launch(graphene_info, execution_params, is_reexecuted)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 34, in do_launch
    pipeline_run = create_valid_pipeline_run(graphene_info, external_pipeline, execution_params)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/run_lifecycle.py", line 48, in create_valid_pipeline_run
    external_execution_plan = get_external_execution_plan_or_raise(
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/external.py", line 115, in get_external_execution_plan_or_raise
    return graphene_info.context.get_external_execution_plan(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/workspace/context.py", line 190, in get_external_execution_plan
    return self.get_repository_location(
  File "/usr/local/lib/python3.8/site-packages/dagster/core/host_representation/repository_location.py", line 620, in get_external_execution_plan
    execution_plan_snapshot_or_error = sync_get_external_execution_plan_grpc(
  File "/usr/local/lib/python3.8/site-packages/dagster/api/snapshot_execution_plan.py", line 36, in sync_get_external_execution_plan_grpc
    api_client.execution_plan_snapshot(
  File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 153, in execution_plan_snapshot
    res = self._query(
  File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
Hopefully that's helpful! I can dig up more details if needed
d
Definitely helpful - and just to confirm, is this the latest version of Dagster? It appears to be, from looking at your GitHub repo
m
Yup, that's right
d
Hey Marcos - just trying to reproduce this with some stubbed-out data. You said 'It seems to happen with solids that are downstream of dynamic outputs' - do you recall a specific solid where it happened? Thanks
and if you remember how many dynamic outputs were being collected in that particular step, that would also be helpful
the GitHub repo is really helpful - if sending over a dump of your runs and event_logs tables (over DM or email) is an option, that would definitely be enough to reproduce the problem
Ah, actually, I think there is an env var that you can set (which we should document) that increases the limit that you're running into - try setting DAGSTER_GRPC_MAX_RX_BYTES to 20000000
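For reference, a minimal sketch of applying that fix locally - assuming dagit is launched from a shell using a workspace.yaml (adjust for however you actually start dagit and any daemon; the variable needs to be set in the environment of the process receiving the oversized gRPC message, which in this traceback is dagit):

```
export DAGSTER_GRPC_MAX_RX_BYTES=20000000
dagit -w workspace.yaml
```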
@Dagster Bot docs Document the DAGSTER_GRPC_MAX_RX_BYTES environment variable to increase gRPC memory limits
m
Thank you @daniel, setting the DAGSTER_GRPC_MAX_RX_BYTES environment variable did it! For background: I have an external API request that returned ~1,200 unique ids. Those ids are dynamic, and for each one I need to hit a set of additional API endpoints (endpoints A, B, and C). I had a solid that runs the GET to retrieve the ~1,200 ids and dynamically outputs them, and I put those results through .map() to hit endpoints A, B, and C, which can all run in parallel. Dagster has allowed ETL pipelines that once took days to complete in only several hours.
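For reference, a minimal sketch of the shape Marcos describes, using the solid-era dynamic output API. The names (fetch_ids, fetch_endpoint_a/b/c, api_pipeline) and the stubbed id list are hypothetical - this is not the code from the linked repo:

```python
from dagster import DynamicOutput, DynamicOutputDefinition, pipeline, solid


@solid(output_defs=[DynamicOutputDefinition(str)])
def fetch_ids(context):
    # Stand-in for the external API call that returns ~1,200 unique ids.
    for n in range(1200):
        record_id = f"{n:04d}"
        # mapping_key may only contain letters, numbers, and underscores.
        yield DynamicOutput(value=record_id, mapping_key=f"id_{record_id}")


@solid
def fetch_endpoint_a(context, record_id: str):
    context.log.info(f"GET endpoint A for id {record_id}")


@solid
def fetch_endpoint_b(context, record_id: str):
    context.log.info(f"GET endpoint B for id {record_id}")


@solid
def fetch_endpoint_c(context, record_id: str):
    context.log.info(f"GET endpoint C for id {record_id}")


@pipeline
def api_pipeline():
    # Fan out: each dynamic output maps to three downstream solids that can
    # run in parallel under the multiprocess executor.
    ids = fetch_ids()
    ids.map(fetch_endpoint_a)
    ids.map(fetch_endpoint_b)
    ids.map(fetch_endpoint_c)
```

With ~1,200 dynamic outputs and three mapped solids each, the execution plan snapshot for a re-execution gets large, which is presumably what pushed the gRPC response past the default ~10 MB receive limit (10485760 bytes) shown in the error.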