# ask-community
r
Hey team, I've noticed an issue when running backfills for 1k items: the CPU usage of the Dagster database goes to 90% even if only 10 pipelines are running at once, and it stays at 90% until the backfill is over, then suddenly drops to the usual 1-2% CPU usage. Feels weird to have this huge CPU usage when 99% of the runs are already done and only 1% are remaining. Any ideas?
p
Is this on Postgres? Is there anything popping out from pg_stat_activity?
r
This is Postgres, yes. These are the queries and stats as per the GCP console.
This is the top query by total time spent:
SELECT
  runs.run_body
FROM
  runs
JOIN
  run_tags
ON
  runs.run_id = run_tags.run_id
WHERE
  run_tags.key = $1
  AND run_tags.value = $2
  OR run_tags.key = $3
  AND run_tags.value = $4
  OR run_tags.key = $5
  AND run_tags.value = $6
GROUP BY
  runs.run_body,
  runs.id
HAVING
  COUNT(runs.run_id) = $7
ORDER BY
  runs.id DESC
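For readers unfamiliar with this query shape, here is a minimal sketch of what it appears to compute: the runs that carry all of the requested tag pairs. It uses an in-memory SQLite database as a stand-in for Postgres. The table columns are reduced to those the query touches, the tag keys and values are hypothetical, and it assumes the final parameter equals the number of tag pairs being filtered on.

```python
import sqlite3

# In-memory stand-in for the run storage; only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (id INTEGER PRIMARY KEY, run_id TEXT UNIQUE, run_body TEXT);
    CREATE TABLE run_tags (id INTEGER PRIMARY KEY, run_id TEXT, key TEXT, value TEXT);
""")

# Hypothetical runs and tags, for illustration only.
runs = [("run-a", "body-a"), ("run-b", "body-b"), ("run-c", "body-c")]
conn.executemany("INSERT INTO runs (run_id, run_body) VALUES (?, ?)", runs)

tags = [
    ("run-a", "backfill", "bf1"), ("run-a", "partition", "p1"), ("run-a", "team", "x"),
    ("run-b", "backfill", "bf1"), ("run-b", "partition", "p2"), ("run-b", "team", "x"),
    ("run-c", "backfill", "bf1"), ("run-c", "partition", "p1"),
]
conn.executemany("INSERT INTO run_tags (run_id, key, value) VALUES (?, ?, ?)", tags)

# Same shape as the reported query. AND binds tighter than OR, so the WHERE
# clause keeps any tag row that matches one of the three (key, value) pairs.
QUERY = """
    SELECT runs.run_body
    FROM runs JOIN run_tags ON runs.run_id = run_tags.run_id
    WHERE run_tags.key = ? AND run_tags.value = ?
       OR run_tags.key = ? AND run_tags.value = ?
       OR run_tags.key = ? AND run_tags.value = ?
    GROUP BY runs.run_body, runs.id
    HAVING COUNT(runs.run_id) = ?
    ORDER BY runs.id DESC
"""

# Three tag pairs, so a run must contribute three matching tag rows to survive HAVING.
params = ("backfill", "bf1", "partition", "p1", "team", "x", 3)
matching = [row[0] for row in conn.execute(QUERY, params)]
print(matching)  # ['body-a']
```

In this sketch the join has to collect every tag row that matches any one of the pairs before the grouping step can discard runs that do not match all of them, so a pair shared by many runs (such as a backfill-wide tag) enlarges the intermediate result even when few runs are returned.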
And this is the detailed view of that query:
p
This is helpful… I’m trying to narrow down whether this is specific to backfills / the backfill daemon, or just an aggressive polling of runs in dagit. Is it possible to tell whether the CPU spikes (and sustains) as the runs for the backfill are being created, or also after they’re all created and the runs are in flight? Also, does the CPU stay pinned even without any pages on Dagit being open to monitor backfill progress?
r
They are sustained for one hour, until no run belonging to the backfill is actually running, regardless of how many runs are still pending or already done.
Yes, this happens during night hours, so no one is even using Dagit for anything.
Also pipelines themselves last for about 20-25 seconds and are lightweight