# ask-community
j
Hello! Posting this here for any additional ideas to debug this problem. I seem to have an issue with the scheduler for our Dagster daemon. It seems convinced that one of our gRPC servers is not reachable, despite the container running the gRPC service being healthy and running. This is the error the scheduler has been reporting for the last couple of attempted scheduled runs:
```
dagster.core.scheduler.scheduler.DagsterSchedulerError: Unable to reach the user code server for schedule daily_job_ingestion. Schedule will resume execution once the server is available.
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 363, in launch_scheduled_runs_for_schedule
    ) from e
```
In the Workspace section of Dagit, the repository is displayed as "Loaded", and still shows as last loaded 20 minutes ago. I've tried the following already: 1. restarting all services (dagit/daemon/gRPC servers), and 2. manually reloading the gRPC connection in the workspace.
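For anyone hitting something similar: one way to check whether the daemon's environment can actually reach a user code gRPC server is to ping it directly. The sketch below uses Dagster's internal `DagsterGrpcClient` (an internal, unversioned API, so the exact import path and signature may vary between releases), and the host name is a placeholder; only port 4000 comes from the error later in this thread.
```python
# Rough sketch: ping a user code gRPC server from the same environment as the daemon.
# DagsterGrpcClient is an internal Dagster API (import path/signature may vary by
# version), and "user-code-host" is a placeholder for the actual container host.
from dagster.grpc.client import DagsterGrpcClient

client = DagsterGrpcClient(host="user-code-host", port=4000)

try:
    # A successful ping means the channel and server process are reachable, which
    # points the blame at the code the server runs rather than at networking.
    client.ping("healthcheck")
    print("user code server reachable")
except Exception as exc:
    print(f"could not reach user code server: {exc}")
```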
d
Hi Jose - do you have a full stack trace? Usually that DagsterSchedulerError wraps some inner error.
j
Sure thing:
```
dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 356, in launch_scheduled_runs_for_schedule
    debug_crash_flags,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/scheduler/scheduler.py", line 430, in _schedule_runs_at_time
    scheduled_execution_time=schedule_time,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/core/host_representation/repository_location.py", line 745, in get_external_schedule_execution_data
    scheduled_execution_time,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/api/snapshot_schedule.py", line 57, in sync_get_external_schedule_execution_data_grpc
    else None,
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/grpc/client.py", line 278, in external_schedule_execution
    external_schedule_execution_args
  File "/root/.pyenv/versions/3.6.10/lib/python3.6/site-packages/dagster/grpc/client.py", line 117, in _streaming_query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
```
d
There's nothing else underneath that? This is kind of an unreasonable level of nesting, but often under that there's a more specific gRPC error.
j
I can try digging around in our actual logs, give me a moment.
These are from the dagit logs.
d
I bet there's something with a bit more detail in the daemon logs.
j
It looks like the root cause of the issue was the following:
```
2022-05-20T15:47:17.023-04:00 raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
2022-05-20T15:47:17.023-04:00 The above exception was caused by the following exception:
2022-05-20T15:47:17.023-04:00 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
2022-05-20T15:47:17.023-04:00 status = StatusCode.UNKNOWN
2022-05-20T15:47:17.023-04:00 details = "Exception iterating responses: maximum recursion depth exceeded while calling a Python object"
2022-05-20T15:47:17.023-04:00 debug_error_string = "{"created":"@1653076036.997650590","description":"Error received from peer ipv4:<IPADDRESS>:4000","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Exception iterating responses: maximum recursion depth exceeded while calling a Python object","grpc_status":2}"
```
d
Got it - what version of dagster is this?
j
According to our requirements file, it's 0.14.2.
d
Got it - so I think that indicates a bug in your schedule code. If you upgrade to 0.14.4, I believe we will wrap that error in a much nicer stack trace for you.
"maximum recursion depth exceeded while calling a Python object" is very likely referring to something in your schedule function, though.
j
OK, I think this gives us enough to do our own exploring. Thanks!
Following up - it seems like it was indeed an issue in our own code. We will be updating to 0.14.4 as soon as we can 🙂