#dagster-support

Frank Dekervel

09/07/2022, 7:21 AM
Hello, I'm about to create a sensor that watches cloud storage to launch runs. That's a use case documented in the Dagster docs. It shouldn't be too difficult, but there are some considerations that are probably relevant for everybody doing this: how to do cloud directory listings efficiently and incrementally so that the sensor stays fast, and how to avoid bogging down Dagster by creating a huge number of RunRequests for runs that are already done (I know Dagster skips those runs automatically, but emitting 10K run requests will still take some time). So my question is: is there any example code already out there?

daniel

09/07/2022, 2:54 PM
Hi Frank - there's a bit of guidance on this here, with some examples of using cursors to avoid emitting 10k run requests on every sensor run: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#sensor-optimizations-using-cursors

Frank Dekervel

09/07/2022, 3:06 PM
Thanks, I was aware of that and used it too. But I was hoping to get a head start on this generic use case.

Jon Simpson

09/07/2022, 3:44 PM
The sensor has a cursor that you advance; I'd use that to tell S3/GCS to list only objects modified on or after that date.
aws s3api list-objects-v2 --bucket BUCKET_NAME  --query 'Contents[?LastModified>=`YYYY-MM-DD`].Key'
Then set the cursor to the max LastModified date of the objects returned.
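A rough Python sketch of the cursor idea above, for illustration only. It assumes the listing is a list of dicts with `Key` and `LastModified` (a timezone-aware datetime), which is the shape boto3's `list_objects_v2` returns under `Contents`. The helper name `select_new_objects` is made up here. Note that the `--query` filter in the CLI command runs client-side, so the full listing is still fetched; this sketch only reduces how many run requests get emitted.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def select_new_objects(
    objects: List[Dict], cursor: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Return (keys modified at or after the cursor, updated cursor).

    `cursor` is an ISO-8601 timestamp string, or None on the first run.
    """
    since = datetime.fromisoformat(cursor) if cursor else None

    # Keep objects modified at or after the cursor (same >= as the CLI query).
    fresh = [o for o in objects if since is None or o["LastModified"] >= since]
    fresh.sort(key=lambda o: (o["LastModified"], o["Key"]))
    keys = [o["Key"] for o in fresh]

    if not fresh:
        # Nothing new: leave the cursor where it was.
        return keys, cursor

    # Advance the cursor to the newest modification time seen.
    return keys, fresh[-1]["LastModified"].isoformat()


if __name__ == "__main__":
    listing = [
        {"Key": "a.csv", "LastModified": datetime(2022, 9, 1, tzinfo=timezone.utc)},
        {"Key": "b.csv", "LastModified": datetime(2022, 9, 7, tzinfo=timezone.utc)},
    ]
    keys, new_cursor = select_new_objects(listing, "2022-09-05T00:00:00+00:00")
    print(keys, new_cursor)
```

In a Dagster sensor this would presumably be called with `context.cursor`, yielding one `RunRequest` per key with `run_key` set to the object key, followed by `context.update_cursor(new_cursor)`. Because the comparison is `>=`, objects sharing the newest timestamp come back on the next tick; the `run_key` deduplication keeps those from launching twice.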