Hi, I’m facing an issue with queued runs concurren...
# ask-community
a
Hi, I’m facing an issue with queued-run concurrency. I have set max_concurrent_runs to 50, but the number of runs being dequeued and moving into the in_progress state is relatively low (3 to 5 on average). Any suggestions would be helpful. Thank you. Here is my configuration:
Dagster version: 0.14.9

local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: /home/ubuntu/dagster_home
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      db_name: dagster_prod
      hostname: 127.0.0.1
      password: password
      port: 5432
      username: newuser
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      db_name: dagster_prod
      hostname: 127.0.0.1
      password: password
      port: 5432
      username: newuser
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /home/ubuntu/dagster_home/storage
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      db_name: dagster_prod
      hostname: 127.0.0.1
      password: password
      port: 5432
      username: newuser
scheduler:
  module: dagster.core.scheduler
  class: DagsterDaemonScheduler
  config: {}
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 50
run_launcher:
  module: dagster
  class: DefaultRunLauncher
  config: {}
d
Hi Aman - how long does each run take to finish after it's started? Is the problem coming from the runs finishing quickly and the run dequeuer taking a while to take more runs off of the queue?
a
Run duration isn't fixed: some runs complete in seconds, while others take minutes.
d
What would you say the average completion time is?
(Roughly is fine)
One thing that could help diagnose the problem is to send logs from your dagster-daemon process covering a few minutes while it is pulling runs off the queue; that could help identify where the slowdown is happening. We have some work that we'd like to prioritize soon to improve the throughput of the run queue when lots of runs are being dequeued at once, and the logs would help us tell whether that project would fix the problem you're seeing here.
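To make the "quick runs, slow dequeue" hypothesis concrete: by Little's law, the average number of in-progress runs is roughly the launch rate times the average run duration, capped by max_concurrent_runs. Here is a rough sketch in plain Python. The launch rate and durations below are made-up illustrative values, not measurements of the daemon.

```python
def steady_state_in_progress(
    launch_rate_per_sec: float,
    avg_run_seconds: float,
    max_concurrent_runs: int,
) -> float:
    """Estimate average in-progress runs via Little's law (L = rate * duration).

    The estimate is capped at max_concurrent_runs, since the run queue
    will not launch more runs than that limit allows.
    """
    uncapped = launch_rate_per_sec * avg_run_seconds
    return float(min(uncapped, max_concurrent_runs))


if __name__ == "__main__":
    # ASSUMPTION: the daemon launches about 2 runs per second (hypothetical).
    launch_rate = 2.0

    # Short runs (about 2 seconds each) never get close to the limit of 50.
    print(steady_state_in_progress(launch_rate, 2.0, 50))

    # Long runs (about 150 seconds each) would fill the limit.
    print(steady_state_in_progress(launch_rate, 150.0, 50))
```

If the numbers look anything like this, raising max_concurrent_runs would not change much for short runs, because the launch rate is the bottleneck rather than the concurrency limit.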
a
What would you say the average completion time is?
For one report the average time is under 2 seconds, and for another it is between 2 and 3 minutes. We receive multiple reports, and each one’s completion time depends on the amount of data it carries. The most frequent report (which runs every hour) completes in under 2 seconds.
Let me check the logs.
d
Oh hi again Aman - I just remembered we talked about a similar issue back in April: https://dagster.slack.com/archives/C01U954MEER/p1650562325789019?thread_ts=1650528736.368829&cid=C01U954MEER
a
yes
d
I think the advice from then still applies - we have some work we'd like to do on our side to make the run queue better at handling large numbers of quick runs. I've also seen some users doing similar things who had better luck using dynamic orchestration to have a single run doing many ops, rather than a large number of runs: https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#dynamic-graphs
So in that case you could potentially have an op for each report and periodically kick off the run (rather than having a separate job for each report and needing to go through the run queue for each run)
a
Right now we have an op and a job for each report, and each job runs at a scheduled time.
So instead, we could create an op for each report and call them all in a single job?
d
I think that could help with your run queue latency issues - we've had other users running into similar problems, and that seemed to help. The dynamic orchestration part lets you write an op that does similar logic to what your sensor is doing now
a
Right, let me check it and get back to you if I have more questions.