# ask-community
j
Hi everyone. I'm facing some problems with a sensor running in a code server deployment on k8s: every tick of the sensor fails with a timeout, with the following root exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1687870896.003832471","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
Checking on the code server deployment, we can see that it consumes a large amount of CPU, in contrast to other code servers under the same Dagster instance. However, I cannot reproduce this problem locally, where the sensor runs with no issues, and I also cannot see any logs from the sensor using the "experimental schedule/sensor logging view" feature. I'm running Dagster 1.3.3.
s
Hey Juan, what is the sensor doing? Are you trying to access some web API that is maybe timing out due to network issues or bad credentials or something?
j
Hi, yes, I have to access an FTP server and it hangs at some point. However, debugging this is quite difficult, since I can't view the logs from `context.log` in the sensor UI. I've also tested running a script similar to the sensor evaluation directly on the code server's pod, which I believe (please correct me if I am wrong) is where the sensor evaluation runs. The script runs normally with the same access credentials.
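For illustration, a minimal sensor along these lines (placeholder host, credentials, and job name; not the actual code) shows where the FTP call can hang and where the `context.log` lines would appear:

```python
from ftplib import FTP

from dagster import SensorEvaluationContext, SkipReason, sensor


@sensor(job_name="files_job")  # placeholder job name
def ftp_probe_sensor(context: SensorEvaluationContext):
    context.log.info("connecting to FTP server")
    # The listing below is the step that can hang; if it stalls, the tick
    # blows past the gRPC deadline before anything is returned to the daemon.
    with FTP("ftp.example.com") as ftp:   # placeholder host
        ftp.login("user", "password")     # placeholder credentials
        files = ftp.nlst()
    context.log.info(f"found {len(files)} files")
    return SkipReason("sketch only; no runs requested")
```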
In the end, the problem with the sensor was a check for whether a materialized dynamic partition had already been created for each specific file in the sensor's job. This check was being performed once per file, for up to 1000 detected files.
I could see the logs directly from the pod where the code server was deployed.
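The thread doesn't show the final code, but one way to avoid the per-file check described above is to fetch the existing partition keys once per tick and test membership in a set. A minimal sketch, assuming a `DynamicPartitionsDefinition` named "files", a hypothetical `list_ftp_files()` helper, and an asset job wired to that partitions definition:

```python
from dagster import (
    AssetSelection,
    DynamicPartitionsDefinition,
    RunRequest,
    SensorResult,
    asset,
    define_asset_job,
    sensor,
)

files_partitions = DynamicPartitionsDefinition(name="files")


@asset(partitions_def=files_partitions)
def ftp_file(context) -> None:
    # Placeholder: download/process the file named by context.partition_key.
    ...


files_job = define_asset_job(
    "files_job",
    selection=AssetSelection.assets(ftp_file),
    partitions_def=files_partitions,
)


def list_ftp_files() -> list[str]:
    # Hypothetical stand-in for the real FTP listing.
    return ["file_a.csv", "file_b.csv"]


@sensor(job=files_job)
def ftp_files_sensor(context):
    detected = list_ftp_files()

    # One instance call for all existing partition keys, instead of checking
    # each detected file individually (up to 1000 per-file checks per tick).
    existing = set(context.instance.get_dynamic_partitions("files"))
    new_files = [f for f in detected if f not in existing]

    return SensorResult(
        run_requests=[RunRequest(partition_key=f) for f in new_files],
        dynamic_partitions_requests=[files_partitions.build_add_request(new_files)],
    )
```

Whether or not this matches the eventual fix, the underlying point is the same: one instance query per tick scales; one query per detected file does not.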
s
Ah, glad to hear you’ve solved the problem!
j
Thank you!