Ahmed Slaoui
11/07/2022, 2:12 PM
I'm getting DagsterUserCodeUnreachableError: Could not reach user code server with a "failed to connect to all addresses" whenever I run backfills that generate a lot of job runs.
I have a daily partitioned job with 13 independent assets, and I'm attempting to backfill 1-2 years of daily runs (several hundred partitions). The asset materializations take a couple of minutes to execute.
A few minutes after launching the backfill, the Backfill status shows Failed with the following error in the thread. Only the runs that managed to get queued before the backfill status error get executed (maybe 10% of the partitions).
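To make the scale concrete, here is a minimal sketch in plain Python (no Dagster APIs) that counts the daily partitions in a two-year window and lists the ones still missing when only about 10% were queued. The date range, the helper names `daily_partition_keys` and `missing_keys`, and the "first 10% launched" assumption are all hypothetical; the thread does not give the real dates or which partitions ran.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_partition_keys(start: date, end: date) -> list[str]:
    """Return one ISO-formatted key per day in the half-open range [start, end)."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


def missing_keys(expected: list[str], launched: set[str]) -> list[str]:
    """Return the expected keys that never got a run, in their original order."""
    return [key for key in expected if key not in launched]


# Hypothetical two-year window; the real backfill range is not stated in the thread.
expected = daily_partition_keys(date(2021, 1, 1), date(2023, 1, 1))

# Assumption for illustration: only the first 10% of partitions were queued.
launched = set(expected[: len(expected) // 10])

remaining = missing_keys(expected, launched)
print(len(expected), len(launched), len(remaining))  # 730 73 657
```

A list like `remaining` could then be used to decide which partitions to select when re-launching the backfill in smaller pieces.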
The error points to a timeout issue, so I tried increasing local_startup_timeout to 600 seconds, with no effect.
Note: We're running Dagster as a service locally with Postgres storage, but the issue was present with the default SQLite storage as well.
Any ideas?
Backfill status: Failed
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_daemon\backfill.py", line 95, in execute_backfill_iteration
    for _run_id in submit_backfill_runs(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_core\execution\backfill.py", line 196, in submit_backfill_runs
    pipeline_run = create_backfill_run(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_core\execution\backfill.py", line 289, in create_backfill_run
    external_execution_plan = repo_location.get_external_execution_plan(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_core\host_representation\repository_location.py", line 706, in get_external_execution_plan
    execution_plan_snapshot_or_error = sync_get_external_execution_plan_grpc(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_api\snapshot_execution_plan.py", line 46, in sync_get_external_execution_plan_grpc
    api_client.execution_plan_snapshot(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_grpc\client.py", line 159, in execution_plan_snapshot
    res = self._query(
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_grpc\client.py", line 115, in _query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1667821417.237000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,"referenced_errors":[{"created":"@1667821417.237000000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
>
  File "E:\code\tasks-runner\venv\lib\site-packages\dagster\_grpc\client.py", line 112, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "E:\code\tasks-runner\venv\lib\site-packages\grpc\_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "E:\code\tasks-runner\venv\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
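The debug_error_string in the traceback above is JSON, so it can be decoded to see exactly when and why the channel failed. The sketch below is illustrative only: the helper name `summarize` and the small status-name table are my own additions, and only the standard library is used. gRPC status 14 is UNAVAILABLE, matching the StatusCode.UNAVAILABLE reported above.

```python
import json
from datetime import datetime, timezone

# Copied verbatim from the debug_error_string in the traceback.
DEBUG_ERROR_STRING = (
    '{"created":"@1667821417.237000000","description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,'
    '"referenced_errors":[{"created":"@1667821417.237000000",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}'
)

# Only the code relevant here; 14 is UNAVAILABLE in the gRPC status code list.
GRPC_STATUS_NAMES = {14: "UNAVAILABLE"}


def summarize(debug_error: str) -> dict:
    """Extract the inner error description, status code, and UTC timestamp."""
    payload = json.loads(debug_error)
    inner = payload["referenced_errors"][0]
    # "created" is an epoch timestamp in seconds, prefixed with "@".
    created = float(payload["created"].lstrip("@"))
    status = inner["grpc_status"]
    return {
        "description": inner["description"],
        "grpc_status": status,
        "status_name": GRPC_STATUS_NAMES.get(status, "UNKNOWN"),
        "created_utc": datetime.fromtimestamp(created, tz=timezone.utc).isoformat(
            timespec="seconds"
        ),
    }


summary = summarize(DEBUG_ERROR_STRING)
print(summary)
```

Decoded this way, the failure is a connection-level UNAVAILABLE raised at 2022-11-07 11:43:37 UTC. The client could not open a connection to the user code server at all, which is different from a call that connected and then exceeded its deadline.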
Alexis Manuel
11/29/2022, 1:38 PM
Ahmed Slaoui
12/01/2022, 5:30 PM
Arthur
01/08/2023, 2:17 AM
Rafael Gomes
01/10/2023, 7:29 PM
Arthur
01/10/2023, 7:40 PM
Rafael Gomes
01/10/2023, 7:45 PM
1.1.3 and was planning to upgrade to the latest version.
Arthur
01/10/2023, 7:45 PM