Oliver
05/17/2021, 2:17 AMThe Dagster Daemon runs each sensor evaluation function on a tight loopDoes it mean only one sensor is evaulated at a time?
daniel
05/17/2021, 3:02 AMdaniel
05/17/2021, 3:07 AMOliver
05/17/2021, 3:19 AMOliver
05/17/2021, 7:57 AM2021-05-17 06:21:08 - SensorDaemon - INFO - Checking for new runs for sensor: kinesis007
2021-05-17 06:21:12 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:13 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:14 - SensorDaemon - INFO - Completed launch of run d299d55a-96e4-46de-a055-ce75d8b7b60a for kinesis007
2021-05-17 06:21:14 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:15 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:16 - SensorDaemon - INFO - Completed launch of run d02b9da4-e06d-4632-8afb-0894e211a4f6 for kinesis007
2021-05-17 06:21:17 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:17 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:18 - SensorDaemon - INFO - Completed launch of run a313bcf4-daf1-4ca5-91f0-403f75e05878 for kinesis007
2021-05-17 06:21:18 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:19 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:20 - SensorDaemon - INFO - Completed launch of run 98f32b52-6173-4ce4-973a-86add0f6de90 for kinesis007
2021-05-17 06:21:20 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:21 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:23 - SensorDaemon - INFO - Completed launch of run 578c78a0-24ac-4829-ab4d-e5e451917d2c for kinesis007
2021-05-17 06:21:23 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:25 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:26 - BackfillDaemon - INFO - No backfill jobs requested.
2021-05-17 06:21:26 - SensorDaemon - INFO - Completed launch of run 80ca17ad-0c42-41bd-8dfd-800842fc8321 for kinesis007
2021-05-17 06:21:26 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:28 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:28 - SchedulerDaemon - INFO - Not checking for any runs since no schedules have been started.
2021-05-17 06:21:29 - SensorDaemon - INFO - Completed launch of run 4d737c1e-7e7f-476a-b675-92f7de8e8479 for kinesis007
2021-05-17 06:21:29 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:29 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:31 - SensorDaemon - INFO - Completed launch of run 49ee27a0-0048-4f8d-81dd-603872a52f65 for kinesis007
2021-05-17 06:21:31 - SensorDaemon - INFO - Checking for new runs for sensor: kinesis001
2021-05-17 06:21:34 - SensorDaemon - INFO - Creating new run for kinesis001
the three main issues in the way of making this use case work
1. slow to actually create the jobs eg
2021-05-17 06:21:12 - SensorDaemon - INFO - Creating new run for kinesis007
2021-05-17 06:21:13 - SensorDaemon - INFO - Launching run for kinesis007
2021-05-17 06:21:14 - SensorDaemon - INFO - Completed launch of run d299d55a-96e4-46de-a055-ce75d8b7b60a for kinesis007
2. Sensors are not evaluated in parallel.
3. job not executed on yield.
#1 - probably just inherent in the system and would be hard to reduce
#2 - I think this is most detrimental issue for the use case since it ties the latency of the system to the number of shards.
#3 - would incidentally fix #1 but I think this might cross a grpc boundary so probably quite involved.Oliver
05/17/2021, 7:57 AMdef get_jobs(sensor):
now = time.now()
while True:
if time.now()-start>2:
return
records = get_records()
if len(records)==0:
return
job = make_job(record)
yield job
situtation: the data-proxy-service is fed with ~10mb/s of data. each kinesis shard can handle 2mb/s and so we need 5 streams to handle the load which results in 5 sensors.
through experimentation it was found that each senor iteration can yield ~3 jobs in the 2 seconds it is allocated. these 3 jobs are returned to the calling function. The calling function then iterates over the list and will launch about 1 job every 2 seconds. so for the 3 jobs to be launched and the collection function itself this results in around 8 seconds per sensor evaluation. We have 5 sensors so in total one round of sensor evaluations is taking ~40 seconds.
In 40 seconds the data-queue-proxy is going to push 8 records to the kinesis stream. so for every round of sensor evaluations we are losing 5 places from the head of the kinesis stream in each shard.Oliver
05/17/2021, 7:58 AMdaniel
05/17/2021, 1:14 PMOliver
05/18/2021, 4:18 AM