Hi everyone. I'm having problems with a sensor running in a code server deployment on k8s: every tick of the sensor fails with a timeout, with the following root exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1687870896.003832471","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
Checking the code server deployment, we can see that it consumes a large amount of CPU compared with other code servers under the same dagster instance. However, I cannot reproduce this problem locally, where the sensor runs with no problems, and I also cannot see any logs from the sensor using the "experimental schedule/sensor logging view" feature. I'm running on dagster
Hey Juan, what is the sensor doing? Are you trying to access some web API that may be timing out due to network issues, bad credentials, or something similar?
Hi, yes, I have to access an FTP server and it hangs at some point. However, the debugging experience for this is quite poor, since I can't view the logs, from
in the sensor UI. Also, I've tested running a script similar to the sensor evaluation directly on the code server's pod, which I believe (please correct me if I am wrong) is where the sensor evaluation runs. The script runs normally with the same access credentials.
In the end, the problem with the sensor was a check, on the sensor job, for whether a materialized dynamic partition had already been created for a specific file. This check was performed once per file, for up to 1000 detected files.
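For anyone hitting the same thing, here is a minimal sketch of the batched alternative: fetch the known partition keys once per tick and compare in memory, instead of making one lookup per file. The helper name `select_new_files` and the `fetch_existing_keys` callable are hypothetical and not taken from the actual sensor. In a Dagster sensor the callable could wrap `context.instance.get_dynamic_partitions(...)`, assuming that checking registered partition keys is enough; if the check needs materialization status, the callable would have to return the materialized keys instead.

```python
from typing import Callable, Iterable, List, Sequence


def select_new_files(
    detected_files: Iterable[str],
    fetch_existing_keys: Callable[[], Sequence[str]],
) -> List[str]:
    """Return the detected files that have no partition yet.

    `fetch_existing_keys` is called exactly once, so the cost of the
    lookup no longer grows with the number of detected files.
    """
    # One lookup for the whole batch instead of one lookup per file.
    existing = set(fetch_existing_keys())

    new_files: List[str] = []
    for name in detected_files:
        if name not in existing:
            # Remember the name so duplicates within the batch are skipped.
            existing.add(name)
            new_files.append(name)
    return new_files


# Possible wiring inside a sensor (the partition set name "files" is hypothetical):
#   new_files = select_new_files(
#       detected_files,
#       lambda: context.instance.get_dynamic_partitions("files"),
#   )
```

With around 1000 detected files, this turns roughly 1000 instance lookups per tick into a single one, followed by set membership checks in memory.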
I could see the logs directly from the pod where the code server was deployed.
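Since the pod logs were the only place where output showed up, one way to find the slow step is to time each phase of the sensor evaluation and write the result to stderr. This is only a sketch: the logger name `sensor_timing` and the `timed` helper are hypothetical, and it assumes the code server's stderr ends up in the pod logs.

```python
import logging
import sys
import time
from contextlib import contextmanager

# Hypothetical logger name; output goes to stderr so it appears in the pod logs.
logger = logging.getLogger("sensor_timing")
if not logger.handlers:
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(message)s"))
    logger.addHandler(handler)
logger.setLevel(logging.INFO)


@contextmanager
def timed(step: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        logger.info("%s took %.3fs", step, time.monotonic() - start)


# Example use inside a sensor body (step names are illustrative):
#   with timed("list FTP files"):
#       ...
#   with timed("check existing partitions"):
#       ...
```

A step that dominates the total time, like the per-file partition check here, then stands out in the pod logs without needing the sensor logging view.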
Ah, glad to hear you’ve solved the problem!
Thank you!