# ask-community
c
I have several multi-asset sensors ticking every minute. At some point they went from completing on every tick to failing on every tick: they were suddenly getting gRPC errors from the code location that indicated a recursion loop, and the logs on my code-location pod seemed to show a loop occurring in some serialization routine (i.e.
L541 -> L557 -> L414 -> L541
)
Copy code
2023-06-01 09:21:37,380 ERROR [/.pyenv/versions/3.9.11/lib/python3.9/site-packages/grpc/_server.py:471] Exception iterating responses: maximum recursion depth exceeded in comparison
Traceback (most recent call last):
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_grpc/server.py", line 620, in ExternalSensorExecution
    serialized_sensor_data = serialize_value(
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 496, in serialize_value
    packed_value = pack_value(val, whitelist_map=whitelist_map)
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 541, in pack_value
    return _pack_value(val, whitelist_map=whitelist_map, descent_path=_root(val))
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 557, in _pack_value
    return serializer.pack(val, whitelist_map, descent_path)
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 414, in pack
    packed[storage_key] = pack_fn(
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 541, in pack_value
    return _pack_value(val, whitelist_map=whitelist_map, descent_path=_root(val))
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 557, in _pack_value
    return serializer.pack(val, whitelist_map, descent_path)
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 414, in pack
    packed[storage_key] = pack_fn(
There weren't any changes I can see that would have occurred in the time window where the ticks went from succeeding to failing. Bouncing the code-location service (killing the associated pod and letting the deployment restore it) made the sensors go back to a working state.
I'm not sure how to reproduce this error. It's not clear to me what conditions caused it, and hence what might cause it to happen again, so I just want to see if anything comes to mind for anyone else.
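For reference, the sensors are all shaped roughly like this sketch (asset keys, job name, and the tag label are placeholders, not our real names):
Copy code
from dagster import (
    AssetKey,
    MultiAssetSensorEvaluationContext,
    RunRequest,
    multi_asset_sensor,
)

# Placeholder tag label; the real constant lives elsewhere in our repo.
TRIGGER_SENSOR_TAG_LABEL = "example/triggered-by-sensor"


@multi_asset_sensor(
    monitored_assets=[AssetKey("upstream_a"), AssetKey("upstream_b")],  # placeholders
    job_name="downstream_job",  # placeholder
    minimum_interval_seconds=60,  # tick every minute
)
def example_multi_asset_sensor(context: MultiAssetSensorEvaluationContext):
    run_requests = []
    # Look at the latest materializations per partition/asset and request runs.
    by_partition = context.latest_materialization_records_by_partition_and_asset()
    for partition, records in by_partition.items():
        if records:
            run_requests.append(
                RunRequest(
                    partition_key=partition,
                    tags={TRIGGER_SENSOR_TAG_LABEL: "true"},
                )
            )
    context.advance_all_cursors()
    return run_requests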
a
is that the full stack trace that you observed?
c
That trace is what's seen in the code-location server logs; it just repeats across those three lines until ending with
Copy code
File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 541, in pack_value
    return _pack_value(val, whitelist_map=whitelist_map, descent_path=_root(val))
  File "/.pyenv/versions/3.9.11/lib/python3.9/site-packages/dagster/_serdes/serdes.py", line 571, in _pack_value
    if isinstance(val, collections.abc.Sequence):
  File "/.pyenv/versions/3.9.11/lib/python3.9/abc.py", line 119, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison
The trace given by the daemon (i.e. in the sensor evaluation) is
Copy code
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNKNOWN

  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/sensor.py", line 517, in _process_tick_generator
    sensor_debug_crash_flags,
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/sensor.py", line 581, in _evaluate_sensor
    instigator_data.cursor if instigator_data else None,
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/code_location.py", line 861, in get_external_sensor_execution_data
    cursor,
  File "/usr/local/lib/python3.7/site-packages/dagster/_api/snapshot_sensor.py", line 72, in sync_get_external_sensor_execution_data_grpc
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 394, in external_sensor_execution
    custom_timeout_message=custom_timeout_message,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 185, in _streaming_query
    e, timeout=timeout, custom_timeout_message=custom_timeout_message
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 142, in _raise_grpc_exception
    ) from e

The above exception was caused by the following exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception iterating responses: maximum recursion depth exceeded in comparison"
	debug_error_string = "{"created":"@1685636762.913646896","description":"Error received from peer ipv4:10.7.233.129:3030","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Exception iterating responses: maximum recursion depth exceeded in comparison","grpc_status":2}"
>

  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 181, in _streaming_query
    method, request=request_type(**kwargs), timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 169, in _get_streaming_response
    yield from getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
    raise self
a
what version are you on?
c
1.2.7
a
are you setting non-trivial run_config on the RunRequests in these sensors?
c
We're setting the partition_key and a tag; here's where we create the RunRequest
Copy code
run_requests.append(
    RunRequest(
        partition_key=partition,
        tags={TRIGGER_SENSOR_TAG_LABEL: "true"},
    )
)
TRIGGER_SENSOR_TAG_LABEL is just a string
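For contrast, a non-trivial run_config on the RunRequest would look something like the hypothetical example below (the op and config names are made up; we don't pass run_config at all):
Copy code
from dagster import RunRequest

partition = "2023-06-01"  # placeholder partition key
TRIGGER_SENSOR_TAG_LABEL = "example/triggered-by-sensor"  # placeholder tag label

# Hypothetical example only: a RunRequest that also carries op-level run_config.
request = RunRequest(
    partition_key=partition,
    tags={TRIGGER_SENSOR_TAG_LABEL: "true"},
    run_config={"ops": {"my_op": {"config": {"batch_size": 100}}}},
)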
a
hmm, well, not sure exactly what went awry here. There have been a few related changes, so upgrading to at least 1.3.3 could potentially help; I would recommend starting there.
The serialization routine is recursive, so that's expected, but exceeding the recursion limit points at an attempt to serialize a recursive (self-referential) data structure, which is unexpected. Not sure what that would be after ruling out run_config.
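As a toy illustration of that failure mode (this is not Dagster's serdes code, just the general pattern), a self-referential value never bottoms out in a recursive pack routine:
Copy code
import sys

# Toy recursive "pack" walk over nested containers -- NOT dagster._serdes.
def pack_value(val):
    if isinstance(val, dict):
        return {key: pack_value(inner) for key, inner in val.items()}
    if isinstance(val, (list, tuple)):
        return [pack_value(inner) for inner in val]
    return val

cyclic = {"tags": {"triggered": "true"}}
cyclic["self"] = cyclic  # the dict now contains itself

try:
    pack_value(cyclic)
except RecursionError as err:
    # Same "maximum recursion depth exceeded" failure mode as the sensor logs.
    print(f"hit the ~{sys.getrecursionlimit()} frame limit: {err}")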
c
When is the serialization routine invoked? We make a call to
Copy code
# context: MultiAssetSensorEvaluationContext
context.latest_materialization_records_by_partition_and_asset()
And
Copy code
context.instance.get_runs(...)
in the body of the sensors as well. I'm not sure if it might happen in these calls, but in the latter case at least, the run filter is very simple (just filtering on a list of particular run IDs).
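Roughly, those calls in the sensor body look like this sketch (the helper name and run IDs are placeholders):
Copy code
from typing import Sequence

from dagster import MultiAssetSensorEvaluationContext, RunsFilter

# Placeholder helper showing the two calls made inside the sensor body.
def _inspect_state(context: MultiAssetSensorEvaluationContext, run_ids: Sequence[str]):
    # Latest materialization records, keyed by partition and then asset key.
    records = context.latest_materialization_records_by_partition_and_asset()
    # The run filter really is this simple: just a list of specific run IDs.
    runs = context.instance.get_runs(filters=RunsFilter(run_ids=list(run_ids)))
    return records, runs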
a
this bit from the stack trace
Copy code
dagster/_grpc/server.py", line 620, in ExternalSensorExecution
    serialized_sensor_data = serialize_value(
points at it being the result of the sensor evaluation, which is one of these: https://github.com/dagster-io/dagster/blame/master/python_modules/dagster/dagster/_core/definitions/sensor_definition.py#L906-L926
we tightened up how exceptions/errors are handled in ExternalSensorExecution in 1.3.3
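Conceptually (this is not the exact internals), the server serializes whatever your sensor evaluation returned before streaming it back to the daemon, so something reachable from that result would have to be self-referential for the recursion limit to trip. A rough sketch, using the private serdes module purely for illustration:
Copy code
from dagster import RunRequest
from dagster._serdes import serialize_value  # private module, for illustration only

# The evaluation result is basically the RunRequests (or SkipReason) you returned.
result = [RunRequest(partition_key="2023-06-01", tags={"example/tag": "true"})]

# If anything reachable from `result` referenced itself, this pack/serialize step
# would recurse without bound, matching the serdes.py frames in the trace above.
serialized = serialize_value(result)
print(serialized[:80])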
d
Any chance you'd be able to share the full code of the sensor that's hitting this?