Hello, i have 4 sensors queuing the runs in queue...
# ask-community
a
Hello, I have 4 sensors queuing runs, and I have set a limit of 20 for each run tag, but at most 3 to 5 runs are in the In Progress state at a time, so processing the runs is taking a long time. Here are my configuration settings. Any suggestion for making it faster would be helpful.
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 50
    tag_concurrency_limits:
    - key: GET_MERCHANT_LISTINGS_ALL_DATA
      limit: 20
    - key: GET_MERCHANT_LISTINGS_ALL_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
      limit: 20
    - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
      limit: 20
    - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
      limit: 20
    - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
      limit: 20
    - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: GET_RESERVED_INVENTORY_DATA
      limit: 20
    - key: GET_RESERVED_INVENTORY_DATA
      limit: 2
      value:
        applyLimitPerUniqueValue: true
    - key: list_inventory_supply
      limit: 20
    - key: list_inventory_supply
      limit: 2
      value:
        applyLimitPerUniqueValue: true
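As a rough mental model of how these limits combine (this is a toy sketch, not Dagster's actual dequeue code; the tag dicts and helper below are made up), a queued run only launches if every one of its tags is under both the per-key limit and the per-unique-value limit:

```python
from collections import Counter

PER_KEY_LIMIT = 20   # mirrors "limit: 20" per tag key in the config above
PER_VALUE_LIMIT = 2  # mirrors "limit: 2" with applyLimitPerUniqueValue

def dequeue(queued, in_progress):
    """Toy model: pick runs, in order, whose tags stay within both limits.

    Each run is represented as a dict of {tag_key: tag_value}.
    """
    key_counts = Counter(k for tags in in_progress for k in tags)
    val_counts = Counter((k, v) for tags in in_progress for k, v in tags.items())
    launched = []
    for tags in queued:
        fits = all(
            key_counts[k] < PER_KEY_LIMIT and val_counts[(k, v)] < PER_VALUE_LIMIT
            for k, v in tags.items()
        )
        if fits:
            launched.append(tags)
            key_counts.update(tags.keys())
            val_counts.update(tags.items())
    return launched

# With a single unique tag value, only 2 of these 5 queued runs can launch,
# no matter how high max_concurrent_runs is:
queue = [{"GET_MERCHANT_LISTINGS_ALL_DATA": "merchant-1"}] * 5
print(len(dequeue(queue, in_progress=[])))  # → 2
```

This is why few unique tag values combined with a low per-unique-value limit can hold the in-progress count far below `max_concurrent_runs: 50`.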
d
Is the problem that jobs aren't going on the queue at the rate you expect (which would imply that your sensors are going slower than they should be) or that you have a large number of queued runs that are coming off the queue slower than you would expect?
If it's the first thing (the sensors being too slow), how many runs is each of your sensors creating on each tick, and how long does each sensor wait between ticks? You can decrease the amount of time between sensor ticks by changing minimum_interval_seconds: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#evaluation-interval
a
No, the second one:
that you have a large number of queued runs that are coming off the queue slower than you would expect?
d
Is it possible to share logs from your run queue daemon from a period of time where its going slower than you would expect? Similar to the SensorDaemon ask from before, but with QueuedRunCoordinatorDaemon instead
a
It would be difficult to get those specific QueuedRunCoordinatorDaemon logs. Also, runs have been coming off the queue slowly ever since they got queued.
d
I think it's going to be difficult to help more without visibility into what the run queue daemon is doing, which would typically be found in those logs
a
I changed my run configuration to this:
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 50
    tag_concurrency_limits:
    - key: GET_MERCHANT_LISTINGS_ALL_DATA
      limit: 30
    - key: GET_MERCHANT_LISTINGS_ALL_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
      limit: 30
    - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
      limit: 30
    - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
      limit: 30
    - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
      limit: 30
    - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: GET_RESERVED_INVENTORY_DATA
      limit: 30
    - key: GET_RESERVED_INVENTORY_DATA
      limit: 5
      value:
        applyLimitPerUniqueValue: true
    - key: list_inventory_supply
      limit: 30
    - key: list_inventory_supply
      limit: 5
      value:
        applyLimitPerUniqueValue: true
d
I can throw out a couple of guesses, but without the logs they're just shots in the dark - one possibility could be that switching from sqlite to postgres would speed things up
a
I think it’s going to be difficult to help more without visibility into what the run queue daemon is doing, which would typically be found in those logs
Ok, let me try to get those logs.
I already switched to postgres, as you suggested earlier for the previous problem I was facing.
d
Was your deployment previously able to handle more runs happening simultaneously? If so, did anything in particular change from when it was working better?
a
1. I updated the dagster version from 0.13.6 to the latest
2. Switched the DB from sqlite to postgres
3. Set the sensor limit on reading asset events to 10
d
And before you did those things, it was processing many more runs in parallel when the run queue was large?
i.e. it was keeping up with that 50 limit that you set?
a
No; the number of queued runs is quite large, so I haven't been able to figure that out well.
d
Got it - to try to figure out the source of the problem, i'm trying to identify if this is a new issue that was introduced recently, or if your dagster deployment has always had this issue. It sounds like it's possible that this is not a new issue?
a
Might be.
I'm trying to get the logs for more clarification.
daniel, here are all the logs I'm able to get
d
OK, so the logs here definitely paint a picture of the number of runs getting enqueued faster than they can be dequeued and run:
Apr 20 08:03:51 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:03:51 - QueuedRunCoordinatorDaemon - INFO - Retrieved 4600 queued runs, checking limits.
Apr 20 08:03:51 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:03:51 - QueuedRunCoordinatorDaemon - INFO - Launched 0 runs.
Apr 20 08:04:07 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:04:07 - QueuedRunCoordinatorDaemon - INFO - Retrieved 4645 queued runs, checking limits.
Apr 20 08:04:07 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:04:07 - QueuedRunCoordinatorDaemon - INFO - Launched 0 runs.
Apr 20 08:04:13 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:04:13 - QueuedRunCoordinatorDaemon - INFO - Retrieved 4661 queued runs, checking limits.
Apr 20 08:04:13 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:04:13 - QueuedRunCoordinatorDaemon - INFO - Launched 0 runs.
Apr 20 08:05:19 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:05:19 - QueuedRunCoordinatorDaemon - INFO - Launched 15 runs.
Apr 20 08:05:21 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:05:21 - QueuedRunCoordinatorDaemon - INFO - Retrieved 4759 queued runs, checking limits.
Apr 20 08:05:42 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:05:42 - QueuedRunCoordinatorDaemon - INFO - Launched 3 runs.
Apr 20 08:07:43 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:07:43 - QueuedRunCoordinatorDaemon - INFO - Launched 16 runs.
Apr 20 08:07:45 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:07:45 - QueuedRunCoordinatorDaemon - INFO - Retrieved 4955 queued runs, checking limits.
Apr 20 08:08:46 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:08:46 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:08:48 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:08:48 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5046 queued runs, checking limits.
Apr 20 08:09:52 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:09:52 - QueuedRunCoordinatorDaemon - INFO - Launched 15 runs.
Apr 20 08:09:53 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:09:53 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5139 queued runs, checking limits.
Apr 20 08:10:14 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:10:14 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:10:16 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:10:16 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5169 queued runs, checking limits.
Apr 20 08:13:23 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:13:23 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:13:44 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:13:44 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5420 queued runs, checking limits.
Apr 20 08:14:47 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:14:47 - QueuedRunCoordinatorDaemon - INFO - Launched 16 runs.
Apr 20 08:14:49 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:14:49 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5504 queued runs, checking limits.
Apr 20 08:15:51 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:15:51 - QueuedRunCoordinatorDaemon - INFO - Launched 16 runs.
Apr 20 08:16:15 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:16:15 - QueuedRunCoordinatorDaemon - INFO - Launched 16 runs.
Apr 20 08:18:17 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:18:17 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:19:44 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:19:44 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:19:46 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:19:46 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5857 queued runs, checking limits.
Apr 20 08:20:48 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:20:48 - QueuedRunCoordinatorDaemon - INFO - Launched 17 runs.
Apr 20 08:20:50 ip-172-31-7-87 dagster-daemon[19942]: 2022-04-20 08:20:50 - QueuedRunCoordinatorDaemon - INFO - Retrieved 5930 queued runs, checking limits.
The number of queued runs keeps going up, seemingly faster than they can be taken off the queue and launched
But each of those ticks says that it's launching several runs (often 16/17 or so)
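Some rough arithmetic on the excerpt above makes the gap concrete (the launch counts and queue sizes are copied from the "Launched N runs" / "Retrieved N queued runs" log lines; the window is approximated from the first and last timestamps):

```python
from datetime import datetime

# Launch counts from each "Launched N runs" line in the excerpt above.
launched = [0, 0, 0, 15, 3, 16, 17, 15, 17, 17, 16, 16, 16, 17, 17, 17]
start = datetime(2022, 4, 20, 8, 3, 51)
end = datetime(2022, 4, 20, 8, 20, 50)
minutes = (end - start).total_seconds() / 60  # ~17-minute window

total_launched = sum(launched)   # 199 runs dequeued over the window
net_growth = 5930 - 4600         # queue grew from 4600 to 5930

dequeue_rate = total_launched / minutes
enqueue_rate = (net_growth + total_launched) / minutes
print(f"dequeue ~{dequeue_rate:.0f}/min, enqueue ~{enqueue_rate:.0f}/min")
# → dequeue ~12/min, enqueue ~90/min
```

So runs are entering the queue roughly an order of magnitude faster than they are being launched, which is why the backlog keeps growing.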
It's possible that you need to either reduce the number of runs coming into the system, or find a way to make your queue limits less restrictive without overwhelming the node where dagster is running. This does seem like a lot of runs to try to work through on a single machine; a lot of customers using dagster in production at load use a system like k8s or ecs that can horizontally scale to run lots of runs in parallel.
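For illustration, moving to a horizontally scaling launcher is a dagster.yaml change along these lines (a sketch only; it assumes the dagster-k8s library and a Kubernetes cluster, and the namespace name is made up):

```yaml
# Hypothetical dagster.yaml fragment: launch each run as its own
# Kubernetes Job, so run capacity scales with the cluster rather than
# being bounded by one machine.
run_launcher:
  module: dagster_k8s
  class: K8sRunLauncher
  config:
    job_namespace: dagster  # illustrative namespace
```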
do your runs typically finish really quickly?
a
do your runs typically finish really quickly?
Each run doesn't have a fixed duration; some finish in 2 to 6 seconds while some can take minutes.
Can I run two separate daemon servers on the same machine, one for requesting and one for processing sensors?
Could that be helpful?
a lot of customers using dagster in production at load use a system like k8s or ecs that can horizontally scale to run lots of runs in parallel
Will also look into this.
d
Each of the runs is already happening in its own process on the same machine (the daemon doesn't wait for the runs to finish)
a
got it