fix(records): materialized page and seen cte#6200
Merged
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
Adjusts the deleteOutdatedRecords batched soft-delete query to avoid pathological PostgreSQL query plans by forcing key CTEs to be materialized, improving join strategy selection and preventing extremely slow nested-loop anti-joins.
Changes:
- Force
pageCTE to beMATERIALIZEDso the planner treats the batch as a bounded driving set. - Force
seenCTE to beMATERIALIZEDto stabilize planning around the unnestedrecord_idsset.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rbwest
approved these changes
May 21, 2026
sapnesh-nango
pushed a commit
that referenced
this pull request
May 24, 2026
the joy of postgres planner. My test with some records earlier showed
that MATERIALIZED was slightly slower but in some worse case scenario it
2 order magnitude worse
query plan before:
```
Update on records r (cost=23.50..19420.04 rows=1 width=112) (actual time=114705.018..114705.024 rows=0 loops=1)
Update on records_p67 r_1
Buffers: shared hit=23076
CTE seen
-> Append (cost=0.44..22.39 rows=150 width=16) (actual time=0.061..18.629 rows=102646 loops=1)
Buffers: shared hit=2054
-> ProjectSet (cost=0.44..2.52 rows=10 width=16) (actual time=0.010..0.010 rows=0 loops=1)
Buffers: shared hit=3
-> Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1 (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=3
-> ProjectSet (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..10.470 rows=102646 loops=1)
Buffers: shared hit=2051
-> Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2 (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..1.889 rows=2053 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=2051
-> ProjectSet (cost=0.42..5.74 rows=40 width=16) (never executed)
-> Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3 (cost=0.42..5.51 rows=4 width=752) (never executed)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
-> ProjectSet (cost=0.14..2.23 rows=10 width=16) (never executed)
-> Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4 (cost=0.14..2.17 rows=1 width=752) (never executed)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
-> Nested Loop Anti Join (cost=1.11..19397.65 rows=1 width=112) (actual time=114705.017..114705.019 rows=0 loops=1)
Join Filter: (s.id = p.id)
Rows Removed by Join Filter: 512446625
Buffers: shared hit=23076
-> Merge Join (cost=1.11..19392.77 rows=1 width=72) (actual time=0.041..68.768 rows=10000 loops=1)
Merge Cond: (p.id = r_1.id)
Join Filter: (p.ctid = r_1.ctid)
Buffers: shared hit=21022
-> Subquery Scan on p (cost=0.56..9659.50 rows=9825 width=68) (actual time=0.028..38.598 rows=10000 loops=1)
Buffers: shared hit=10511
-> Limit (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..29.928 rows=10000 loops=1)
Buffers: shared hit=10511
-> Index Scan using records_p67_connection_id_model_id_key on records_p67 records (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..25.533 rows=10000 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 387
Buffers: shared hit=10511
-> Index Scan using records_p67_connection_id_model_id_key on records_p67 r_1 (cost=0.56..9561.25 rows=9862 width=32) (actual time=0.009..10.623 rows=10387 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
Buffers: shared hit=10511
-> CTE Scan on seen s (cost=0.00..3.00 rows=150 width=56) (actual time=0.000..8.445 rows=51246 loops=10000)
Buffers: shared hit=2054
Planning:
Buffers: shared hit=2
Planning Time: 0.472 ms
Execution Time: 114705.653 ms
```
query plan after
```
Update on records r (cost=9588.51..14924.82 rows=1 width=112) (actual time=87.193..87.200 rows=0 loops=1)
Update on records_p67 r_1
Buffers: shared hit=12587
CTE page
-> Limit (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..11.268 rows=10000 loops=1)
Buffers: shared hit=10528
-> Index Scan using records_p67_connection_id_model_id_key on records_p67 records (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..10.436 rows=10000 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 387
Buffers: shared hit=10528
CTE seen
-> Append (cost=0.44..22.39 rows=150 width=16) (actual time=0.062..19.721 rows=102654 loops=1)
Buffers: shared hit=2059
-> ProjectSet (cost=0.44..2.52 rows=10 width=16) (actual time=0.011..0.011 rows=0 loops=1)
Buffers: shared hit=3
-> Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1 (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.011 rows=0 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=3
-> ProjectSet (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..11.180 rows=102654 loops=1)
Buffers: shared hit=2051
-> Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2 (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..2.080 rows=2053 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=2051
-> ProjectSet (cost=0.42..5.74 rows=40 width=16) (actual time=0.017..0.018 rows=0 loops=1)
Buffers: shared hit=3
-> Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3 (cost=0.42..5.51 rows=4 width=752) (actual time=0.017..0.017 rows=0 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=3
-> ProjectSet (cost=0.14..2.23 rows=10 width=16) (actual time=0.012..0.013 rows=0 loops=1)
Buffers: shared hit=2
-> Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4 (cost=0.14..2.17 rows=1 width=752) (actual time=0.012..0.012 rows=0 loops=1)
Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
Buffers: shared hit=2
-> Nested Loop (cost=4.88..5341.19 rows=1 width=112) (actual time=87.192..87.194 rows=0 loops=1)
Buffers: shared hit=12587
-> Hash Anti Join (cost=4.88..281.82 rows=4912 width=108) (actual time=87.192..87.193 rows=0 loops=1)
Hash Cond: (p.id = s.id)
Buffers: shared hit=12587
-> CTE Scan on page p (cost=0.00..196.50 rows=9825 width=68) (actual time=0.028..14.839 rows=10000 loops=1)
Buffers: shared hit=10528
-> Hash (cost=3.00..3.00 rows=150 width=56) (actual time=68.683..68.684 rows=102654 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 9546kB
Buffers: shared hit=2059
-> CTE Scan on seen s (cost=0.00..3.00 rows=150 width=56) (actual time=0.064..51.240 rows=102654 loops=1)
Buffers: shared hit=2059
-> Tid Scan on records_p67 r_1 (cost=0.00..1.02 rows=1 width=32) (never executed)
TID Cond: (p.ctid = ctid)
Filter: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (p.id = id))
Planning:
Buffers: shared hit=77
Planning Time: 0.706 ms
Execution Time: 88.122 ms
```
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
the joy of postgres planner. My test with some records earlier showed that MATERIALIZED was slightly slower but in some worse case scenario it 2 order magnitude worse
query plan before:
query plan after