Félix Tremblay
05/10/2023, 10:04 PM
Chris Comeau
05/10/2023, 10:10 PM
daniel
05/11/2023, 2:34 PM
Félix Tremblay
05/11/2023, 4:56 PM
daniel
05/11/2023, 4:57 PM
Félix Tremblay
05/11/2023, 4:59 PM
Philippe Laflamme
05/29/2023, 7:55 PM
DAGSTER_GRPC_TIMEOUT_SECONDS doesn't seem to help here.
num_submit_workers setting in 1.3.6, but it resulted in this error:
daniel
05/30/2023, 1:11 AM
Philippe Laflamme
05/30/2023, 1:01 PM
DAGSTER_GRPC_TIMEOUT_SECONDS
I've checked my k8s deployment and the environment variable is definitely set:
env:
  - name: DAGSTER_HOME
    value: "/dagster-home"
  - name: POSTGRES_HOSTNAME
    value: "postgres"
  - name: DAGSTER_GRPC_TIMEOUT_SECONDS
    value: "300"
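One way to double-check what the daemon process itself sees is to read the variable from inside the daemon container. This is only a minimal sketch: `grpc_timeout_from_env` is a hypothetical helper, not a Dagster API, and the fallback of 60 is an assumption based on the 60s figure mentioned in this thread.

```python
import os


def grpc_timeout_from_env(env=os.environ, default=60):
    """Return the timeout value visible to this process.

    Hypothetical helper for illustration only. The default of 60 is an
    assumption, not a value taken from Dagster's source.
    """
    raw = env.get("DAGSTER_GRPC_TIMEOUT_SECONDS")
    return int(raw) if raw is not None else default


# Run inside the daemon container to see what the process actually inherits.
print(grpc_timeout_from_env())
```

If this prints 300 in the daemon container, the deployment manifest is being applied as intended, and the early failure has to come from somewhere other than a missing variable.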
And this is what I saw in the logs when the sensor started:
2023-05-29T14:34:20.376278480Z INFO:dagster.daemon.SensorDaemon:Checking for new runs for sensor: rtpd_publications_sensor
and when it failed:
2023-05-29T14:37:05.672024247Z ERROR:dagster.daemon.SensorDaemon:Sensor daemon caught an error for sensor rtpd_publications_sensor
Full trace:
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 520, in _process_tick_generator
    yield from _evaluate_sensor(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_daemon/sensor.py", line 583, in _evaluate_sensor
    sensor_runtime_data = code_location.get_external_sensor_execution_data(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_core/host_representation/code_location.py", line 845, in get_external_sensor_execution_data
    return sync_get_external_sensor_execution_data_grpc(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_api/snapshot_sensor.py", line 63, in sync_get_external_sensor_execution_data_grpc
    api_client.external_sensor_execution(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_grpc/client.py", line 388, in external_sensor_execution
    chunks = list(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_grpc/client.py", line 184, in _streaming_query
    self._raise_grpc_exception(
  File "/app/.venv/lib/python3.10/site-packages/dagster/_grpc/client.py", line 140, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
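For reference, the gap between the two daemon log timestamps quoted above can be worked out with a few lines of Python. This is just a sketch of the arithmetic: `parse_log_timestamp` is a hypothetical helper written for this example, and it truncates the nanosecond fraction to microseconds because `datetime` does not store more precision than that.

```python
from datetime import datetime


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a log timestamp such as 2023-05-29T14:34:20.376278480Z.

    Hypothetical helper: the fraction is cut to 6 digits (microseconds)
    so that strptime's %f directive accepts it.
    """
    base, frac = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{base}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


# The two timestamps copied from the log lines above.
started = parse_log_timestamp("2023-05-29T14:34:20.376278480Z")
failed = parse_log_timestamp("2023-05-29T14:37:05.672024247Z")

elapsed = (failed - started).total_seconds()
print(f"{elapsed:.1f} s")  # prints "165.3 s"
```

That works out to roughly 165 seconds, which is longer than 60s but well short of the configured 300s.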
As you can see, the time between the sensor starting and it failing is more than 60s, but it's also not 300s. I tried bumping that to 600s and I got a similar result (though I don't have the exact timeout on hand at the moment). I'm happy to test this using "Test Sensor" if that should also respect the environment variable setting.
daniel
05/30/2023, 1:11 PM
Philippe Laflamme
05/30/2023, 1:30 PM
DEADLINE_EXCEEDED
Perhaps some other resource is being exhausted here, but there’s nothing in the logs to go on.
daniel
05/30/2023, 1:37 PM
Philippe Laflamme
05/30/2023, 1:39 PM
daniel
05/30/2023, 1:41 PM
Philippe Laflamme
05/30/2023, 1:41 PM
daniel
05/30/2023, 1:42 PM
Philippe Laflamme
05/30/2023, 1:43 PM
daniel
05/30/2023, 1:43 PM
Philippe Laflamme
05/30/2023, 2:28 PM
daniel
05/30/2023, 2:28 PM
Philippe Laflamme
05/30/2023, 2:40 PM
daniel
05/30/2023, 2:40 PM
Philippe Laflamme
05/30/2023, 3:04 PM