# dagster-serverless
j
We're getting an intermittent `GraphQLStorageError` that's blocking our auto-materialize jobs, trace in thread:
dagster_cloud_cli.core.errors.GraphQLStorageError: Error in GraphQL response: [{'message': 'Internal Server Error (Trace ID: 3893643480369413991)', 'locations': [{'line': 22, 'column': 13}], 'path': ['eventLogs', 'getEventRecords']}]

  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 262, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 326, in core_dagster_event_sequence_for_step
    step_context.fetch_external_input_asset_records()
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/context/system.py", line 890, in fetch_external_input_asset_records
    self._fetch_input_asset_record(key)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/context/system.py", line 898, in _fetch_input_asset_record
    event = self.instance.get_latest_data_version_record(key)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/instance/__init__.py", line 2566, in get_latest_data_version_record
    observations = self.get_event_records(
  File "/usr/local/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 649, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/instance/__init__.py", line 1779, in get_event_records
    return self._event_storage.get_event_records(event_records_filter, limit, ascending)
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud/storage/event_logs/storage.py", line 472, in get_event_records
    res = self._execute_query(
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud/storage/event_logs/storage.py", line 276, in _execute_query
    res = self._graphql_client.execute(
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 146, in execute
    raise GraphQLStorageError(str(e)) from e

The above exception was caused by the following exception:
dagster_cloud_cli.core.errors.GraphQLStorageError: Error in GraphQL response: [{'message': 'Internal Server Error (Trace ID: 3893643480369413991)', 'locations': [{'line': 22, 'column': 13}], 'path': ['eventLogs', 'getEventRecords']}]

  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 78, in execute
    return self._execute_retry(query, variable_values, headers)
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 190, in _execute_retry
    raise GraphQLStorageError(f"Error in GraphQL response: {str(result['errors'])}")
p
Hi Joel. Thanks for the report. It looks like we’re hitting some performance issues querying for some asset history (the underlying error is a statement timeout)… We’re investigating and will update this thread.
Hi Joel. We deployed some index changes that I think should help address some of the timeouts you’ve been seeing. Let me know if that resolves the issue.
j
Hey Phil, looks like we've avoided any of the same errors since the change, but we've just now hit some
dagster_cloud_cli.core.errors.GraphQLStorageError: HTTPSConnectionPool(host='sealed.agent.dagster.cloud', port=443): Read timed out. (read timeout=60)
p
Taking a look
@Joel Olazagasti is there any more to that trace? Was this from your agent logs?
j
Here's the full trace:
dagster_cloud_cli.core.errors.GraphQLStorageError: HTTPSConnectionPool(host='sealed.agent.dagster.cloud', port=443): Read timed out. (read timeout=60)

  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 273, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 375, in core_dagster_event_sequence_for_step
    for evt in _type_check_and_store_output(step_context, user_event):
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 425, in _type_check_and_store_output
    for output_event in _type_check_output(step_context, step_output_handle, output, version):
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 281, in _type_check_output
    yield DagsterEvent.step_output_event(
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/events/__init__.py", line 767, in step_output_event
    return DagsterEvent.from_step(
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/events/__init__.py", line 413, in from_step
    log_step_event(step_context, event)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/events/__init__.py", line 292, in log_step_event
    step_context.log.log_dagster_event(
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/log_manager.py", line 409, in log_dagster_event
    self.log(level=level, msg=msg, extra={DAGSTER_META_KEY: dagster_event})
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/log_manager.py", line 424, in log
    self._log(level, msg, args, **kwargs)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1599, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 954, in handle
    self.emit(record)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/log_manager.py", line 290, in emit
    handler.handle(dagster_record)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 954, in handle
    self.emit(record)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/instance/__init__.py", line 199, in emit
    self._instance.handle_new_event(event)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/instance/__init__.py", line 1916, in handle_new_event
    self._event_storage.store_event(event)
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud/storage/event_logs/storage.py", line 411, in store_event
    self._execute_query(
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud/storage/event_logs/storage.py", line 289, in _execute_query
    res = self._graphql_client.execute(
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 126, in execute
    raise GraphQLStorageError(str(e)) from e

The above exception was caused by the following exception:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='sealed.agent.dagster.cloud', port=443): Read timed out. (read timeout=60)

  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 78, in execute
    return self._execute_retry(query, variable_values, headers)
  File "/usr/local/lib/python3.8/site-packages/dagster_cloud_cli/core/graphql_client.py", line 154, in _execute_retry
    response = self._session.post(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)

The above exception occurred during handling of the following exception:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='sealed.agent.dagster.cloud', port=443): Read timed out. (read timeout=60)

  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 357, in _raise_timeout
    raise ReadTimeoutError(

The above exception occurred during handling of the following exception:
socket.timeout: The read operation timed out

  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)

The above exception occurred during handling of the following exception:
TypeError: getresponse() got an unexpected keyword argument 'buffering'

  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 457, in _make_request
    httplib_response = conn.getresponse(buffering=True)
p
Thanks for this trace… I think this is a network blip between the serverless agent we’re running for you and the cloud storage. We’ve seen this very sporadically and haven’t fully resolved the underlying issue. There are some steps we could take (e.g. setting idempotency headers on your requests so that the agent can repeat some network calls without duplicating stored events), but I’d like to hold off on that to see whether this is a one-off event or a recurring issue.
j
Affirmative, sounds good by me! I'll let you know if it's a recurring issue
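
For readers unfamiliar with the idempotency-header idea mentioned above, here is a minimal sketch of how it could work. It is not Dagster Cloud's implementation. `FakeEventStore`, `store_event_with_retry`, and the `Idempotency-Key` header name are all hypothetical, and the store is an in-memory stand-in rather than a real HTTP endpoint. The point is that the client reuses one key across retries, so a write whose response was lost to a read timeout is not stored twice.

```python
import uuid


class FakeEventStore:
    """Hypothetical in-memory stand-in for a storage endpoint that
    deduplicates writes on an idempotency key."""

    def __init__(self, lose_first_n_responses=0):
        self.events = []
        self._index_by_key = {}
        self._responses_to_lose = lose_first_n_responses

    def post(self, event, headers):
        key = headers["Idempotency-Key"]  # assumed header name
        # Apply the write only the first time this key is seen.
        if key not in self._index_by_key:
            self.events.append(event)
            self._index_by_key[key] = len(self.events) - 1
        # Simulate a read timeout: the write succeeded, the response was lost.
        if self._responses_to_lose > 0:
            self._responses_to_lose -= 1
            raise TimeoutError("read timed out")
        return {"stored_at": self._index_by_key[key]}


def store_event_with_retry(store, event, max_attempts=3):
    """Retry a write on timeout, reusing one idempotency key so the
    server can recognise repeats of the same logical request."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return store.post(event, headers)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


store = FakeEventStore(lose_first_n_responses=2)
result = store_event_with_retry(store, {"type": "STEP_OUTPUT"})
print(result, len(store.events))
```

Without the shared key, each of the three attempts here would have appended its own copy of the event. With it, the retries are safe to repeat even though the client cannot tell whether the earlier attempts were applied.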