I've noticed that run history is taking a long tim...
# ask-community
I've noticed that run history is taking a long time to load for individual jobs. I just took a look at one of the queries, and found that there is a sequential scan in the query used by that page. Is this a query that is already being reviewed by the team?
Ah, this looks like we’re missing an index. We’ll try to get a fix out for this ASAP, thanks for the report!
no problem! looks like
has one on run_id, but not pipeline_name and run_id
@prha I've updated and applied migrations (index is now present), but this query still takes about 10s to run. there's a very small tweak to the query that makes it run in 80ms instead. I'll start a PR
hmmm - I can't reproduce this as reliably as I'd like actually. going to hold off for now. but curious if you still see anything funky. i'm seeing a nested loop in the plan, which likely isn't helping
anyone else experiencing issues loading the runs page for a given job? It pretty reliably times out loading on the first try for us, with the latest migrations applied - I haven't gotten a chance to dig in deeper.
@prha are you tracking any more issues in this area? I'm still getting statement timeouts for
Hey Adam… I was just looking at this query this morning. I think this might be hitting poor query performance when using the run tags table to filter for repository scope. I’m thinking about writing a data migration to move repository scope from tags to a column on the run table to help with this.
the new run tags index is 37 MB on our prod database - not huge by any means, but maybe big enough to not be as fast as it otherwise would be
anyways, let me know if query plan output from a larger database would be useful
Yeah, getting an explain/analyze (if postgres) would be useful on that query
so, the thing I've noticed with plans is that sometimes they are extremely fast, sometimes they take 10+ seconds. Something is clearly being cached, and we pay a performance penalty when that cache is not available. Here's a plan for a good case (cache available), I'll try again in a bit and see if I can hit the other case
Copy code
Limit  (cost=7252.73..7252.74 rows=3 width=1240) (actual time=22.041..22.049 rows=26 loops=1)
  ->  Sort  (cost=7252.73..7252.74 rows=3 width=1240) (actual time=22.039..22.045 rows=26 loops=1)
        Sort Key: runs.id DESC
        Sort Method: top-N heapsort  Memory: 77kB
        ->  Hash Semi Join  (cost=3032.95..7252.71 rows=3 width=1240) (actual time=19.412..21.651 rows=1277 loops=1)
              Hash Cond: ((runs.run_id)::text = (run_tags.run_id)::text)
              ->  Index Scan Backward using idx_runs_by_job on runs  (cost=0.42..4216.89 rows=1223 width=1277) (actual time=0.034..1.503 rows=1277 loops=1)
                    Index Cond: (pipeline_name = 'airbyte_sync_core'::text)
              ->  Hash  (cost=3022.06..3022.06 rows=838 width=37) (actual time=19.369..19.370 rows=6951 loops=1)
                    Buckets: 8192 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 533kB
                    ->  Bitmap Heap Scan on run_tags  (cost=17.14..3022.06 rows=838 width=37) (actual time=1.242..17.626 rows=6951 loops=1)
                          Recheck Cond: ((key = '.dagster/repository'::text) AND (value = 'core@agile-user-code'::text))
                          Heap Blocks: exact=5007
                          ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..16.93 rows=838 width=0) (actual time=0.693..0.693 rows=6951 loops=1)
                                Index Cond: ((key = '.dagster/repository'::text) AND (value = 'core@agile-user-code'::text))
Planning Time: 0.311 ms
Execution Time: 22.242 ms
raw query (just to make sure I'm grabbing what you want):
Copy code
explain analyze SELECT runs.id, runs.run_body, runs.status, runs.create_timestamp, runs.update_timestamp, runs.start_time, runs.end_time 
FROM runs 
WHERE runs.pipeline_name = 'airbyte_sync_core' AND runs.run_id IN (SELECT run_tags.run_id 
FROM run_tags 
WHERE run_tags.key = '.dagster/repository' AND run_tags.value = 'core@agile-user-code') order by runs.id desc
 LIMIT 26;
yup, that’s it.
I think we should be able to eliminate the tag portion altogether for this query, but it adds some complexity and will require schema + data migration