Tom Reilly
02/27/2023, 11:30 PM
I'm using the EcsRunLauncher, and my QueuedRunCoordinator is set up with max_concurrent_runs set to 750. A sensor requested ~650 job runs, but I never saw more than about 200 in progress at once even though there were hundreds of runs waiting in the queue. I expected to see a larger number of jobs in progress at once. The database I use for run and event storage, as well as my gRPC service, did hit 100% CPU utilization at times. Any advice for getting runs out of the queue faster and keeping in-progress runs closer to the max_concurrent_runs value?
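For reference, a minimal dagster.yaml sketch of the setup described above; only max_concurrent_runs: 750 is quoted from this thread, and the module/class paths are the standard ones for the ECS run launcher and queued run coordinator, so treat the rest as assumptions:

run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 750  # cap on the number of runs in progress at once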
rex
02/28/2023, 12:43 AM
In your dagster.yaml, you can enable threaded evaluation for sensors and schedules:

sensors:
  use_threads: true
  num_workers: 8

schedules:
  use_threads: true
  num_workers: 8
daniel
02/28/2023, 2:08 AM
You can also parallelize dequeuing runs from the queue:

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    dequeue_use_threads: true
    dequeue_num_workers: 8 # This number can be tuned depending on your parallelism needs
Tom Reilly
03/01/2023, 2:22 PM
I have run monitoring configured as:

run_monitoring:
  enabled: true
  start_timeout_seconds: 300

Since enabling threading I'm seeing some failures due to jobs exceeding the start_timeout_seconds. Is this a sign to scale up the daemon?
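One knob that interacts directly with those failures is the run monitoring start timeout itself; a minimal sketch, assuming a longer window is acceptable for ECS task startup under load (600 here is an arbitrary illustrative value, not one taken from this thread):

run_monitoring:
  enabled: true
  start_timeout_seconds: 600  # illustrative value; gives launched ECS tasks more time to reach RUN_START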
daniel
03/01/2023, 2:39 PM

Tom Reilly
03/01/2023, 2:48 PM
I see the [EcsRunLauncher] Launching run in ECS task ENGINE_EVENT with a task arn, and then about 5 min later the run_failure is triggered. The sensor yields a RunRequest for each run. The start_timeout_seconds is 300, but the time between RUN_STARTING and RUN_START is usually around 70 seconds.
daniel
03/01/2023, 3:02 PM

Tom Reilly
03/01/2023, 3:03 PM

daniel
03/01/2023, 3:34 PM

Tom Reilly
03/01/2023, 6:10 PM