Dennis Schwartz (he/him)
03/04/2024, 9:23 AM
dagster_cloud_cli.core.errors.GraphQLStorageError: Max retries (6) exceeded, too many ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) error responses.
It seems to be somewhat intermittent although it happens in probably half my runs and it's making my head explode.
Any tips on where to look for causes or errors? I have nothing else to go on.
I will post the full error message in the thread.
Dennis Schwartz (he/him)
03/04/2024, 9:24 AM
dagster_cloud_cli.core.errors.GraphQLStorageError: Max retries (6) exceeded, too many ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) error responses.
File "/usr/local/lib/python3.11/site-packages/dagster/_cli/api.py", line 377, in _execute_step_command_body
yield DagsterEvent.step_worker_started(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dagster/_core/events/__init__.py", line 1120, in step_worker_started
log_manager.log_dagster_event(
File "/usr/local/lib/python3.11/site-packages/dagster/_core/log_manager.py", line 420, in log_dagster_event
self.log(level=level, msg=msg, extra={DAGSTER_META_KEY: dagster_event})
File "/usr/local/lib/python3.11/site-packages/dagster/_core/log_manager.py", line 435, in log
self._log(level, msg, args, **kwargs)
File "/usr/local/lib/python3.11/logging/__init__.py", line 1634, in _log
self.handle(record)
File "/usr/local/lib/python3.11/logging/__init__.py", line 1644, in handle
self.callHandlers(record)
File "/usr/local/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
hdlr.handle(record)
File "/usr/local/lib/python3.11/logging/__init__.py", line 978, in handle
self.emit(record)
File "/usr/local/lib/python3.11/site-packages/dagster/_core/log_manager.py", line 301, in emit
handler.handle(dagster_record)
File "/usr/local/lib/python3.11/logging/__init__.py", line 978, in handle
self.emit(record)
File "/usr/local/lib/python3.11/site-packages/dagster/_core/instance/__init__.py", line 237, in emit
self._instance.handle_new_event(event)
File "/usr/local/lib/python3.11/site-packages/dagster/_core/instance/__init__.py", line 2350, in handle_new_event
self._event_storage.store_event(event)
File "/usr/local/lib/python3.11/site-packages/dagster_cloud/storage/event_logs/storage.py", line 519, in store_event
self._execute_query(
File "/usr/local/lib/python3.11/site-packages/dagster_cloud/storage/event_logs/storage.py", line 399, in _execute_query
res = self._graphql_client.execute(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dagster_cloud_cli/core/graphql_client.py", line 135, in execute
raise GraphQLStorageError(
The above exception was caused by the following exception:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
File "/usr/local/lib/python3.11/site-packages/dagster_cloud_cli/core/graphql_client.py", line 81, in execute
return self._execute_retry(query, variable_values, headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dagster_cloud_cli/core/graphql_client.py", line 157, in _execute_retry
response = self._session.post(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
return self.request("POST", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 501, in send
raise ConnectionError(err, request=request)
The above exception occurred during handling of the following exception:
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/http/client.py", line 1390, in getresponse
response.begin()
File "/usr/local/lib/python3.11/http/client.py", line 325, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/http/client.py", line 294, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
The above exception occurred during handling of the following exception:
http.client.RemoteDisconnected: Remote end closed connection without response
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/http/client.py", line 1390, in getresponse
response.begin()
File "/usr/local/lib/python3.11/http/client.py", line 325, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/http/client.py", line 294, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
The above exception occurred during handling of the following exception:
TypeError: HTTPConnection.getresponse() got an unexpected keyword argument 'buffering'
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse(buffering=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dennis Schwartz (he/him)
03/04/2024, 9:25 AM
dagster 1.6.6
dagster-aws 0.22.6
dagster-k8s 0.22.6
dagstermill 0.22.6
They are the latest as far as I'm aware.
Dennis Schwartz (he/him)
03/04/2024, 9:39 AM
Error in Dagster Cloud request (('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))). Retrying now.
Dennis Schwartz (he/him)
03/04/2024, 12:54 PM
dagster-cloud.svc.cluster.local
and it caused the connection errors.
Since not all workloads were scheduled to this node, the issue was sporadic but happened often.
Removing this node from the cluster solved the problem.
Hope this might help someone in the future 🙂
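For anyone chasing a similar node-specific problem, here is a minimal sketch of how one might narrow it down. It is not part of Dagster or dagster-cloud. It assumes you can run a small script in a pod on each node, and that the pod exposes its node name in a `NODE_NAME` environment variable (for example via the Kubernetes downward API). The function names `probe`, `current_node`, and `failures_by_node` are hypothetical, and the host you probe is whatever endpoint your runs fail to reach.

```python
import os
import socket
from collections import Counter


def probe(host, port=443, attempts=5, timeout=3.0,
          connect=socket.create_connection):
    """Try to open a TCP connection `attempts` times; return the failure count.

    `connect` is injectable so the function can be exercised without a network.
    """
    failures = 0
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers DNS failures, refused/reset connections, and timeouts.
            failures += 1
        else:
            conn.close()
    return failures


def current_node():
    """Read the node name from NODE_NAME (an assumed, user-provided env var)."""
    return os.environ.get("NODE_NAME", "unknown-node")


def failures_by_node(results):
    """Sum failure counts per node from (node_name, failures) pairs."""
    totals = Counter()
    for node, failures in results:
        totals[node] += failures
    return dict(totals)
```

Running `probe(...)` from a pod on each node and feeding the `(current_node(), failures)` pairs into `failures_by_node` should make a single misbehaving node stand out, since the other nodes would report few or no failures.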