Hebo Yang
03/09/2023, 5:30 PM
DEADLINE_EXCEEDED with “file”: “src/core/lib/surface/call.cc” please? It seems that the repo is unreachable from Dagit, but we can still see it running sensors.
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "{"created":"@1678381231.466377137","description":"Error received from peer ipv4:172.30.227.74:3030","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}" >
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 122, in _streaming_query
yield from response_stream
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
return self._next()
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
raise self
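The debug_error_string in these traces is a JSON object, so the status and timestamp can be pulled out programmatically when comparing several incidents. A minimal sketch, assuming the string has already been cut out of the surrounding repr; `summarize_debug_error` and `GRPC_STATUS_NAMES` are hypothetical helper names, not part of grpc or dagster, and only the two status codes seen in this thread are mapped:

```python
import json
from datetime import datetime, timezone

# Subset of the gRPC status codes; only the two seen in this thread.
GRPC_STATUS_NAMES = {4: "DEADLINE_EXCEEDED", 14: "UNAVAILABLE"}


def summarize_debug_error(debug_error_string):
    """Return (status_name, created_utc, description) from a gRPC debug string."""
    payload = json.loads(debug_error_string)
    # The status may sit at the top level or inside "referenced_errors".
    candidates = [payload] + payload.get("referenced_errors", [])
    status = next((c["grpc_status"] for c in candidates if "grpc_status" in c), None)
    # "created" looks like "@<epoch seconds>.<fraction>".
    epoch = float(payload["created"].lstrip("@"))
    created = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return GRPC_STATUS_NAMES.get(status, str(status)), created, payload["description"]


example = (
    '{"created":"@1678381231.466377137",'
    '"description":"Error received from peer ipv4:172.30.227.74:3030",'
    '"file":"src/core/lib/surface/call.cc","file_line":966,'
    '"grpc_message":"Deadline Exceeded","grpc_status":4}'
)
name, created, description = summarize_debug_error(example)
print(name, created.isoformat(), description)
```

Status 4 means the server at the peer address was reached but did not answer within the deadline, whereas status 14 (UNAVAILABLE) means no connection could be made at all.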
daniel
03/09/2023, 6:36 PM
Hebo Yang
03/10/2023, 12:14 AM
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 552, in _load_location
location = self._create_location_from_origin(origin)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 476, in _create_location_from_origin
return origin.create_location()
File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/origin.py", line 329, in create_location
return GrpcServerRepositoryLocation(self)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/repository_location.py", line 547, in __init__
list_repositories_response = sync_list_repositories_grpc(self.client)
File "/usr/local/lib/python3.7/site-packages/dagster/_api/list_repositories.py", line 19, in sync_list_repositories_grpc
api_client.list_repositories(),
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 169, in list_repositories
res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 115, in _query
raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1678403306.434323254","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1678403306.434322899","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}" >
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 112, in _query
response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}
daniel
03/10/2023, 12:34 AM
Hebo Yang
03/10/2023, 12:51 AM
grpc_message":"Deadline Exceeded","grpc_status":4 seems to occur more frequently with our repo recently. When this happens, our repo seems to still be processing sensors but is not responding to Dagit. Restarting the repo resolves it.
Digging into the stack trace from Dagit, this seems to be due to a SQL query timeout. Would increasing the Postgres DB timeout help?
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1678897749.619204048","description":"Error received from peer ipv4:172.30.227.74:3030","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
Stack Trace:
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 122, in _streaming_query
yield from response_stream
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
return self._next()
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
raise self
location_name=location_name, error_string=error.to_string()
An error occurred while resolving field Pipeline.runs
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
return executor.execute(resolve_fn, source, info, **args)
File "/usr/local/lib/python3.7/site-packages/graphql/execution/executors/sync.py", line 16, in execute
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/dagster_graphql/schema/pipelines/pipeline.py", line 749, in resolve_runs
self._external_pipeline.name, kwargs.get("limit")
File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 200, in get_run_records_for_job
return self._get(RepositoryDataType.JOB_RUNS, job_name, limit)
File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 59, in _get
self._fetch(data_type, limit)
File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 72, in _fetch
bucket_by=JobBucket(bucket_limit=limit, job_names=job_names),
File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 631, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/instance/__init__.py", line 1315, in get_run_records
filters, limit, order_by, ascending, cursor, bucket_by
File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/runs/sql_run_storage.py", line 452, in get_run_records
rows = self.fetchall(query)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/runs/sql_run_storage.py", line 82, in fetchall
result_proxy = conn.execute(query)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1306, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
self, multiparams, params, execution_options
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1508, in _execute_clauseelement
cache_hit=cache_hit,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1863, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2044, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.QueryCanceled) canceling statement due to statement timeout
[SQL: SELECT subquery.id, subquery.run_body, subquery.status, subquery.create_timestamp, subquery.update_timestamp, subquery.start_time, subquery.end_time
FROM (SELECT runs.id AS id, runs.run_body AS run_body, runs.status AS status, runs.create_timestamp AS create_timestamp, runs.update_timestamp AS update_timestamp, runs.start_time AS start_time, runs.end_time AS end_time, rank() OVER (PARTITION BY runs.pipeline_name ORDER BY runs.id DESC) AS rank
FROM runs
WHERE runs.pipeline_name IN
…
subquery
WHERE subquery.rank <= %(rank_1)s ORDER BY subquery.rank ASC]
…
[parameters:
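The query that hit the statement timeout ranks runs within each job and keeps the newest few per job. A minimal sketch of that pattern, assuming an in-memory SQLite database (version 3.25 or later for window functions) in place of Postgres and a simplified `runs` table with made-up rows; the alias `run_rank` stands in for the `rank` alias in the original SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the runs table; the real table has more columns.
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, pipeline_name TEXT, status TEXT)")
rows = [
    ("job_a", "SUCCESS"),
    ("job_a", "FAILURE"),
    ("job_b", "SUCCESS"),
    ("job_a", "SUCCESS"),
    ("job_b", "SUCCESS"),
]
conn.executemany("INSERT INTO runs (pipeline_name, status) VALUES (?, ?)", rows)

# Same shape as the failing query: rank runs inside each job, newest first,
# then keep only rows whose rank is within the per-job limit.
BUCKETED = """
SELECT subquery.id, subquery.pipeline_name
FROM (SELECT runs.id AS id, runs.pipeline_name AS pipeline_name,
             rank() OVER (PARTITION BY runs.pipeline_name ORDER BY runs.id DESC) AS run_rank
      FROM runs
      WHERE runs.pipeline_name IN (?, ?)) AS subquery
WHERE subquery.run_rank <= ?
ORDER BY subquery.run_rank ASC, subquery.id DESC
"""
latest = conn.execute(BUCKETED, ("job_a", "job_b", 1)).fetchall()
print(latest)
```

Because the rank filter sits outside the subquery, the database may have to rank every run of every listed job before discarding most of them, so the cost can grow with run history even when the limit is small.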
daniel
03/15/2023, 5:09 PM
Hebo Yang
03/15/2023, 5:09 PM
daniel
03/15/2023, 5:10 PM
Hebo Yang
03/15/2023, 5:13 PM
daniel
03/15/2023, 5:16 PM
Hebo Yang
03/15/2023, 5:17 PM
runs = self.context.instance.get_runs(
    RunsFilter(
        job_name=f"fabricator_{source_name}_job",
        statuses=[PipelineRunStatus.SUCCESS],
        tags={EVENT_RECORD_PARTITION_KEY: partition},
    ),
    limit=1,
)
daniel
03/23/2023, 10:25 PM
Hebo Yang
03/23/2023, 10:26 PM
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 552, in _load_location
location = self._create_location_from_origin(origin)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 476, in _create_location_from_origin
return origin.create_location()
File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/origin.py", line 329, in create_location
return GrpcServerRepositoryLocation(self)
File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/repository_location.py", line 583, in __init__
self,
File "/usr/local/lib/python3.7/site-packages/dagster/_api/snapshot_repository.py", line 29, in sync_get_streaming_external_repositories_data_grpc
repository_name,
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 265, in streaming_external_repository
external_repository_origin
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 124, in _streaming_query
raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
The above exception was caused by the following exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "{"created":"@1679605240.260476809","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}" >
File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 122, in _streaming_query
yield from response_stream
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
return self._next()
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
raise self
daniel
03/23/2023, 10:27 PM
Hebo Yang
03/23/2023, 10:29 PM
daniel
03/23/2023, 10:30 PM
Hebo Yang
03/23/2023, 10:30 PM
daniel
03/23/2023, 10:31 PM
Hebo Yang
03/23/2023, 10:32 PM
daniel
03/23/2023, 10:32 PM
Hebo Yang
03/23/2023, 10:32 PM
daniel
03/23/2023, 10:33 PM
Hebo Yang
03/23/2023, 10:33 PM
daniel
03/23/2023, 10:33 PM
Hebo Yang
03/23/2023, 10:34 PM
daniel
03/23/2023, 10:35 PM
Hebo Yang
03/23/2023, 10:36 PM