#ask-community
Edo

08/28/2023, 5:50 AM
Hi, I found my jobs failing to execute this weekend because the server couldn't reach the code location. I'm using a Docker deployment. Is there a way to detect this and send a notification?
daniel

08/28/2023, 3:10 PM
Hi Edo - unfortunately I don't think we currently have built-in alerting for this. Depending on the exact nature of the failure, I could imagine building something on top of our GraphQL API that does this: https://docs.dagster.io/concepts/webserver/graphql#graphql-api
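For illustration, one way to build on the GraphQL API is a small script that runs outside Dagster (for example from cron or a sidecar container), asks the webserver which code locations failed to load, and calls a notification hook. This is only a sketch under assumptions, not a built-in Dagster feature: the endpoint URL, the `notify` callback, and the function names are hypothetical, and the query fields (`workspaceOrError`, `locationEntries`, `locationOrLoadError`) should be checked against the schema of your Dagster version. Whether the webserver reports a given outage also depends on the nature of the failure, since the daemon and the webserver each load code locations on their own.

```python
import json
import urllib.request

# Assumed query shape; verify the field names against your Dagster version's schema.
WORKSPACE_QUERY = """
query CodeLocationStatus {
  workspaceOrError {
    __typename
    ... on Workspace {
      locationEntries {
        name
        loadStatus
        locationOrLoadError {
          __typename
          ... on PythonError { message }
        }
      }
    }
    ... on PythonError { message }
  }
}
"""


def find_unhealthy_locations(payload: dict) -> dict:
    """Return {location_name: error_message} for locations that failed to load."""
    workspace = (payload.get("data") or {}).get("workspaceOrError") or {}
    if workspace.get("__typename") != "Workspace":
        # The workspace query itself failed or returned nothing usable.
        return {"<workspace>": workspace.get("message", "workspace query failed")}
    unhealthy = {}
    for entry in workspace.get("locationEntries", []):
        result = entry.get("locationOrLoadError") or {}
        if result.get("__typename") == "PythonError":
            unhealthy[entry["name"]] = result.get("message", "unknown error")
    return unhealthy


def fetch_workspace(graphql_url: str, timeout: float = 10.0) -> dict:
    """POST the query to the webserver's GraphQL endpoint and decode the JSON reply."""
    body = json.dumps({"query": WORKSPACE_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        graphql_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def check_once(graphql_url: str, notify=print) -> dict:
    """Run one health check; `notify` is a hypothetical hook (email, Slack, ...)."""
    try:
        unhealthy = find_unhealthy_locations(fetch_workspace(graphql_url))
    except (OSError, ValueError) as exc:
        # Network errors (URLError is an OSError) or a non-JSON response.
        unhealthy = {"<webserver>": f"could not query GraphQL endpoint: {exc}"}
    for name, message in unhealthy.items():
        notify(f"Dagster code location {name!r} is unhealthy: {message}")
    return unhealthy
```

A scheduler outside the Dagster containers could call `check_once("http://localhost:3000/graphql")` every few minutes, where the host and port are placeholders for wherever the webserver is exposed. Treating an unreachable webserver as an alert condition is a deliberate choice here, so that a network-wide outage is reported as well.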
Edo

08/29/2023, 1:30 AM
Hi @daniel, thanks for replying. This is the error I get from the daemon container logs:
2023-08-27 14:31:26 +0000 - dagster.daemon.SensorDaemon - WARNING - Could not load location dagster_pipelines to check for sensors due to the following error: dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNKNOWN

Stack Trace:
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/workspace/context.py", line 605, in _load_location
    origin.reload_location(self.instance) if reload else origin.create_location()
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/host_representation/origin.py", line 368, in create_location
    return GrpcServerCodeLocation(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/host_representation/code_location.py", line 590, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_api/list_repositories.py", line 20, in sync_list_repositories_grpc
    api_client.list_repositories(),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 229, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 157, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 140, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory"
	debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory {grpc_status:14, created_time:"2023-08-27T14:30:56.220980518+00:00"}"
>"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2023-08-27T14:30:56.221639446+00:00", grpc_status:2, grpc_message:"Exception calling application: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory\"\n\tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory {grpc_status:14, created_time:\"2023-08-27T14:30:56.220980518+00:00\"}\"\n>"}"
>

Stack Trace:
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 155, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 130, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above exception occurred during handling of the following exception:
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNKNOWN

Stack Trace:
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/server_watcher.py", line 119, in watch_grpc_server_thread
    watch_for_changes()
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/server_watcher.py", line 82, in watch_for_changes
    new_server_id = client.get_server_id(timeout=REQUEST_TIMEOUT)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 214, in get_server_id
    res = self._query("GetServerId", api_pb2.Empty, timeout=timeout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 157, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 140, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory"
	debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory {grpc_status:14, created_time:"2023-08-27T14:30:46.153842679+00:00"}"
>"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2023-08-27T14:30:46.155007022+00:00", grpc_status:2, grpc_message:"Exception calling application: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory\"\n\tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmp4qsxck83: No such file or directory {grpc_status:14, created_time:\"2023-08-27T14:30:46.153842679+00:00\"}\"\n>"}"
>

Stack Trace:
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 155, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_grpc/client.py", line 130, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Does using the GraphQL API mean building some kind of ping and notification program outside of the Dagster environment?
This is the error from the last job that ran before code server became unreachable:
dagster_postgres.utils.DagsterPostgresException: too many retries for DB connection

  File "/usr/local/lib/python3.11/site-packages/dagster/_core/instance/__init__.py", line 203, in emit
    self._instance.handle_new_event(event)
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/instance/__init__.py", line 2083, in handle_new_event
    self._event_storage.store_event(event)
  File "/usr/local/lib/python3.11/site-packages/dagster_postgres/event_log/event_log.py", line 176, in store_event
    with self._connect() as conn:
  File "/usr/local/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster_postgres/utils.py", line 165, in create_pg_connection
    conn = retry_pg_connection_fn(engine.connect)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster_postgres/utils.py", line 129, in retry_pg_connection_fn
    raise DagsterPostgresException("too many retries for DB connection") from exc

The above exception was caused by the following exception:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "dagster_postgresql" to address: Temporary failure in name resolution

(Background on this error at: <https://sqlalche.me/e/20/e3q8>)

  File "/usr/local/lib/python3.11/site-packages/dagster_postgres/utils.py", line 117, in retry_pg_connection_fn
    return fn()
           ^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3264, in connect
    return self._connection_cls(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 147, in __init__
    Connection._handle_dbapi_exception_noconnection(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2426, in _handle_dbapi_exception_noconnection
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
    self._dbapi_connection = engine.raw_connection()
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3288, in raw_connection
    return self.pool.connect()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 452, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1267, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 716, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 284, in _do_get
    return self._create_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 393, in _create_connection
    return _ConnectionRecord(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 678, in __init__
    self.__connect()
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 902, in __connect
    with util.safe_reraise():
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 898, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/create.py", line 637, in connect
    return dialect.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 615, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above exception was caused by the following exception:
psycopg2.OperationalError: could not translate host name "dagster_postgresql" to address: Temporary failure in name resolution


  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
    self._dbapi_connection = engine.raw_connection()
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3288, in raw_connection
    return self.pool.connect()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 452, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1267, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 716, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 284, in _do_get
    return self._create_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 393, in _create_connection
    return _ConnectionRecord(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 678, in __init__
    self.__connect()
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 902, in __connect
    with util.safe_reraise():
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 898, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/create.py", line 637, in connect
    return dialect.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 615, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above exception occurred during handling of the following exception:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='airbyte-proxy', port=8000): Max retries exceeded with url: /api/v1/jobs/get (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd64fec0e10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

  File "/usr/local/lib/python3.11/site-packages/dagster_airbyte/resources.py", line 433, in make_request
    response = requests.request(
               ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)

The above exception occurred during handling of the following exception:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='airbyte-proxy', port=8000): Max retries exceeded with url: /api/v1/jobs/get (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd64fec0e10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))

The above exception occurred during handling of the following exception:
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fd64fec0e10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 415, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(

The above exception occurred during handling of the following exception:
socket.gaierror: [Errno -3] Temporary failure in name resolution

  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The job ran successfully (assets materialized).