# ask-community
j
Hello! Posting this here for any additional ideas to debug this problem. I seem to have an issue with the scheduler for our Dagster daemon. It seems convinced that one of our gRPC servers is not reachable, despite the container running the gRPC service being healthy and running. This is the error the scheduler has been reporting for the last couple of attempted scheduled runs:
```
dagster.core.scheduler.scheduler.DagsterSchedulerError: Unable to reach the user code server for schedule daily_job_ingestion. Schedule will resume execution once the server is available.
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 363, in launch_scheduled_runs_for_schedule
    ) from e
```
In the Workspace section of Dagit, the repository is displayed as "Loaded", and still shows as last loaded 20 minutes ago. I've tried the following already: 1. restarting all services (dagit/daemon/gRPC servers), and 2. manually reloading the gRPC connection in the workspace.
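For anyone hitting something similar: one way to check whether the daemon's environment can actually reach a user code gRPC server is to ping it directly. The sketch below uses Dagster's internal `DagsterGrpcClient` (an internal, unversioned API, so the exact import path and signature may vary between releases), and the host name is a placeholder; only port 4000 comes from the error later in this thread.
```python
# Rough sketch: ping a user code gRPC server from the same environment as the daemon.
# DagsterGrpcClient is an internal Dagster API (import path/signature may vary by
# version), and "user-code-host" is a placeholder for the actual container host.
from dagster.grpc.client import DagsterGrpcClient

client = DagsterGrpcClient(host="user-code-host", port=4000)

try:
    # A successful ping means the channel and server process are reachable, which
    # points the blame at the code the server runs rather than at networking.
    client.ping("healthcheck")
    print("user code server reachable")
except Exception as exc:
    print(f"could not reach user code server: {exc}")
```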
d
Hi Jose - do you have a full stack trace? Usually that DagsterSchedulerError wraps some inner error.
j
Sure thing:
```
dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 356, in launch_scheduled_runs_for_schedule
    debug_crash_flags,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 430, in _schedule_runs_at_time
    scheduled_execution_time=schedule_time,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/core/host_representation/repository_location.py", line 745, in get_external_schedule_execution_data
    scheduled_execution_time,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/api/snapshot_schedule.py", line 57, in sync_get_external_schedule_execution_data_grpc
    else None,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/grpc/client.py", line 278, in external_schedule_execution
    external_schedule_execution_args
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/grpc/client.py", line 117, in _streaming_query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
```
d
There's nothing else underneath that? This is kind of an unreasonable level of nesting, but often under that there's a more specific gRPC error.
j
I can try digging around in our actual logs, give me a moment.
These are from the dagit logs.
d
I bet there's something with a bit more detail in the daemon logs.
j
It looks like the root cause of the issue was the following:
```
2022-05-20T15:47:17.023-04:00 raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
2022-05-20T15:47:17.023-04:00 The above exception was caused by the following exception:
2022-05-20T15:47:17.023-04:00 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
2022-05-20T15:47:17.023-04:00 status = StatusCode.UNKNOWN
2022-05-20T15:47:17.023-04:00 details = "Exception iterating responses: maximum recursion depth exceeded while calling a Python object"
2022-05-20T15:47:17.023-04:00 debug_error_string = "{"created":"@1653076036.997650590","description":"Error received from peer ipv4:<IPADDRESS>:4000","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Exception iterating responses: maximum recursion depth exceeded while calling a Python object","grpc_status":2}"
```
d
Got it - what version of dagster is this?
j
According to our requirements file, it's 0.14.2.
d
Got it - so I think that indicates a bug in your schedule code. If you upgrade to 0.14.4, I believe we will wrap that error in a much nicer stack trace for you.
"maximum recursion depth exceeded while calling a Python object" is very likely referring to something in your schedule function, though.
j
OK, I think this gives us enough to do our own exploring. Thanks!
Following up - it seems like it was indeed an issue in our own code. We will be updating to 0.14.4 as soon as we can 🙂