Hi! I have a data pipeline that currently has one job and one sensor per client (about 200 clients). Each sensor checks the modification date of the client's Google Sheet and launches the job if that date differs from the cursor. With this many clients the setup uses a lot of CPU, so I would like to optimize it.
I am convinced the solution is a single job with dynamic partitions and a single sensor that triggers the appropriate partition keys. The problem is the sensor: to decide whether to trigger a partition, I need to read the modification date of the Google Sheet, which takes about 1 second per client. The sensor can't afford a 200-second evaluation just to find out which runs to trigger. What could be the way around this? Should each sensor tick consider only a subset of the clients (10, for example), rotating so that after 20 ticks all clients have been covered?
05/16/2023, 2:49 PM
hey @Jordan breaking the sensor into chunks (i.e. only processing 10 clients per tick) is our recommended solution for long-running sensor evaluation functions
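
For reference, here is a minimal sketch of that chunked rotation in plain Python, kept independent of Dagster so it can be run on its own. The names `evaluate_tick` and `get_modified_time` and the JSON cursor layout are hypothetical, not part of any library. In a real sensor the cursor string would presumably come from `context.cursor`, be saved with `context.update_cursor(...)`, and each returned key would become a `RunRequest(partition_key=...)`.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple


def evaluate_tick(
    clients: List[str],
    cursor: Optional[str],
    get_modified_time: Callable[[str], str],
    chunk_size: int = 10,
) -> Tuple[List[str], str]:
    """Check one chunk of clients; return (partition keys to run, new cursor).

    The cursor is a JSON string holding the rotation offset and the last
    modification time seen for each client (a hypothetical layout).
    """
    if not clients:
        return [], json.dumps({"offset": 0, "seen": {}})

    state = json.loads(cursor) if cursor else {"offset": 0, "seen": {}}
    offset = state["offset"] % len(clients)
    seen: Dict[str, str] = state["seen"]

    # Take up to chunk_size clients starting at offset, wrapping around the list.
    size = min(chunk_size, len(clients))
    chunk = [clients[(offset + i) % len(clients)] for i in range(size)]

    to_run = []
    for client in chunk:
        modified = get_modified_time(client)  # the slow (~1 s) call per client
        if seen.get(client) != modified:
            to_run.append(client)
            seen[client] = modified

    new_offset = (offset + size) % len(clients)
    return to_run, json.dumps({"offset": new_offset, "seen": seen})
```

With 200 clients and `chunk_size=10`, each tick makes about 10 slow calls and the rotation covers every client once every 20 ticks. Because nothing has been seen on the first pass, the first rotation triggers every client; seed the `seen` map beforehand if that is not wanted.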