Looking into sensors and I am getting this error | dagster #ask-community

Dominick Giordano

03/30/2022, 2:28 AM

Looking into sensors and I am getting this error:

grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "{"created":"@1648579523.062155830","description":"Error received from peer unix:/tmp/tmp270j3kna","file":"src/core/lib/surface/call.cc","file_line":903,"grpc_message":"Deadline Exceeded","grpc_status":4}"

I am trying to have a pipeline job run every time a file is added to a very large S3 bucket. My sensor functions currently look like:

def config_dagster_sensors():
    ##Dagster Sensors
    dagster_sensors = []

    def build_sensor() -> SensorDefinition:
        return SensorDefinition(
            name="pipeline_name_sensor",
            pipeline_name="pipeline_name",
            mode="s3",
            minimum_interval_seconds=10,
            evaluation_fn=sensor_fn
        )

    def sensor_fn(context: SensorExecutionContext):
        new_s3_keys = get_s3_keys("bucket_name", since_key=context.cursor)
        if not new_s3_keys:
            yield SkipReason("No new s3 files found for bucket.")
            return
        for s3_key in new_s3_keys:
            yield RunRequest(run_key=s3_key, run_config={}, pipeline_name="pipeline_name")
            context.update_cursor(s3_key)

    dagster_sensors.append(build_sensor())
    return dagster_sensors

I am getting the impression the error comes from the sensor not being able to finish within the 60-second timeout. Is there any way to override this and allow it to finish no matter how long it takes?

johann

03/30/2022, 2:34 PM

Hi Dominick, it’s not currently possible to increase that timeout. However, you should be able to impose a limit on how many items in new_s3_keys you’ll process within a single tick, and use the cursor to just pick up where you left off.

Dominick Giordano

03/30/2022, 2:47 PM

@johann Shouldn’t the cursor be updated after every run request I am calling above and start from that key next iteration?

johann

03/30/2022, 2:49 PM

Yes, your code could just change to something like:

for s3_key in new_s3_keys[:KEYS_PER_TICK_LIMIT]:

johann

03/30/2022, 2:51 PM

You’d just have to experiment with KEYS_PER_TICK_LIMIT to find a reasonable number that doesn’t go over the timeout. cc @prha in case he has any tips here

daniel

03/30/2022, 2:54 PM

The cursor isn't actually updated until the function completely finishes (which is a bit counterintuitive due to the yields)

daniel

03/30/2022, 2:55 PM

And similarly it waits until the function completely finishes before any RunRequests are processed - it's not actually async currently
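A minimal sketch of how the two points above might combine: cap the batch handled per tick, and set the cursor once to the last key of that batch, since the cursor is only persisted after the function returns. The helper name next_batch and the starting limit of 100 are hypothetical, not from the thread; the sensor lines in the trailing comment reuse the names from the original post.

```python
from typing import List, Optional, Tuple

# Assumed starting value; tune it until a tick finishes well inside the timeout.
KEYS_PER_TICK_LIMIT = 100


def next_batch(
    new_keys: List[str], limit: int = KEYS_PER_TICK_LIMIT
) -> Tuple[List[str], Optional[str]]:
    """Return at most `limit` keys plus the cursor value to store afterwards.

    The cursor is the last key in the batch, or None when there is nothing new.
    """
    batch = new_keys[:limit]
    new_cursor = batch[-1] if batch else None
    return batch, new_cursor


# Inside the sensor function from the original post, this could be used as:
#
#     batch, new_cursor = next_batch(new_s3_keys)
#     for s3_key in batch:
#         yield RunRequest(run_key=s3_key, run_config={})
#     if new_cursor is not None:
#         context.update_cursor(new_cursor)
```

Keys left over after the slice are picked up on the next tick, because the stored cursor points at the last key that was actually requested.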

Dominick Giordano

03/30/2022, 3:04 PM

Ok, great. I will mess around with my config. Thanks everyone!

Dominick Giordano

03/30/2022, 3:58 PM

Even when I lower the key limit to 1, it crashes my daemon entirely. I do not need a special kind of config for the daemon to work with sensors, do I? @johann @daniel

daniel

03/30/2022, 3:58 PM

that's surprising, do you have a stack trace for the crash?

daniel

03/30/2022, 4:00 PM

is it a memory issue possibly?

Dominick Giordano

03/30/2022, 4:02 PM

Same error as before, but both memory and CPU of my EC2 instance maxed out, so that could easily be part of it.

Dominick Giordano

03/30/2022, 4:02 PM

When I run the job I am testing with a schedule or just manually in the UI, I do not get anywhere near that kind of usage.

daniel

03/30/2022, 4:02 PM

Are you running the daemon locally?

daniel

03/30/2022, 4:03 PM

If the bucket is massive, it's possible that get_s3_keys is OOMing the process
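If get_s3_keys does enumerate the whole bucket on every tick, one alternative is to ask S3 for a single bounded page instead. A rough sketch, assuming a boto3 S3 client is passed in; the function name list_keys_after is hypothetical. Note that StartAfter orders keys lexicographically, not by upload time, so this only suits buckets whose new keys sort after older ones (for example, date-prefixed keys).

```python
from typing import Any, List, Optional


def list_keys_after(
    s3_client: Any,
    bucket: str,
    since_key: Optional[str] = None,
    max_keys: int = 100,
) -> List[str]:
    """Fetch one bounded page of keys that sort after `since_key`."""
    kwargs = {"Bucket": bucket, "MaxKeys": max_keys}
    if since_key:
        # Only keys lexicographically greater than the cursor are returned.
        kwargs["StartAfter"] = since_key
    response = s3_client.list_objects_v2(**kwargs)
    # "Contents" is absent when the page is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

With boto3 installed, this would be called as list_keys_after(boto3.client("s3"), "bucket_name", since_key=context.cursor), so memory use stays proportional to max_keys and not to the size of the bucket.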

Dominick Giordano

03/30/2022, 4:03 PM

Both are on ECS containers

daniel

03/30/2022, 4:03 PM

Is the user code server running in its own ECS task?

Dominick Giordano

03/30/2022, 4:04 PM

Yes

daniel

03/30/2022, 4:04 PM

huh, that's surprising that the daemon would crash then. Do you mean literally crash? Like the process/task stops?

Dominick Giordano

03/30/2022, 4:05 PM

It seems to have shut down - task is still running

daniel

03/30/2022, 4:06 PM

Any logs from the daemon task that might help explain what's going on?

Dominick Giordano

03/30/2022, 4:06 PM

I think my use case may be better suited to using AWS Lambda to trigger jobs via GraphQL
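A hedged sketch of what the Lambda side of that idea could look like: S3 sends an event notification per new object, and the handler turns each key into a run submission. The function names and the submit_run callable are hypothetical; the actual call to Dagster's GraphQL endpoint (for instance through the dagster-graphql Python client) is left out and assumed to be wrapped by submit_run.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus


def keys_from_s3_event(event: Dict[str, Any]) -> List[str]:
    """Extract object keys from an S3 event notification payload."""
    keys = []
    for record in event.get("Records", []):
        # S3 URL-encodes keys in event notifications (spaces arrive as '+').
        keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys


def handle_event(event: Dict[str, Any], submit_run: Callable[[str], None]) -> int:
    """Submit one run per new object and return how many were requested.

    `submit_run` is a hypothetical callable that would launch the Dagster
    job for a single key, e.g. by posting to the GraphQL endpoint.
    """
    keys = keys_from_s3_event(event)
    for key in keys:
        submit_run(key)
    return len(keys)
```

Because S3 pushes only the new keys, this approach never lists the bucket, which sidesteps both the tick timeout and the memory pressure discussed above.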

Dominick Giordano

03/30/2022, 4:07 PM

The daemon task logs just show the same error as before ^^ and that the daemon shut off:

Exception: Stopping dagster-daemon process since the following threads are no longer sending heartbeats: ['SENSOR']

daniel

03/30/2022, 4:07 PM

Hm, and what version of dagster is this? It should still be heartbeating even if a sensor is timing out

Dominick Giordano

03/30/2022, 4:09 PM

Should be the latest. I just rebuilt the image and it grabs the latest from my requirements file

daniel

03/30/2022, 4:10 PM

We can see if we can reproduce - if it's possible to post or DM the logs from the failed task (for 5-10 minutes or so before it crashed) that would help debug

daniel

03/30/2022, 4:11 PM

Also a bit surprising that the task stayed up after the daemon process died - I would expect it to restart
