Skip to content

fix(records): materialized page and seen cte#6200

Merged
TBonnin merged 1 commit into
masterfrom
tbonnin/nan-5089/fix-outdated-materialized
May 21, 2026
Merged

fix(records): materialized page and seen cte#6200
TBonnin merged 1 commit into
masterfrom
tbonnin/nan-5089/fix-outdated-materialized

Conversation

@TBonnin

@TBonnin TBonnin commented May 21, 2026

Copy link
Copy Markdown
Collaborator

the joy of postgres planner. My test with some records earlier showed that MATERIALIZED was slightly slower but in some worse case scenario it 2 order magnitude worse

query plan before:

Update on records r  (cost=23.50..19420.04 rows=1 width=112) (actual time=114705.018..114705.024 rows=0 loops=1)
  Update on records_p67 r_1
  Buffers: shared hit=23076
  CTE seen
    ->  Append  (cost=0.44..22.39 rows=150 width=16) (actual time=0.061..18.629 rows=102646 loops=1)
          Buffers: shared hit=2054
          ->  ProjectSet  (cost=0.44..2.52 rows=10 width=16) (actual time=0.010..0.010 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1  (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.010 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..10.470 rows=102646 loops=1)
                Buffers: shared hit=2051
                ->  Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2  (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..1.889 rows=2053 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2051
          ->  ProjectSet  (cost=0.42..5.74 rows=40 width=16) (never executed)
                ->  Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3  (cost=0.42..5.51 rows=4 width=752) (never executed)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
          ->  ProjectSet  (cost=0.14..2.23 rows=10 width=16) (never executed)
                ->  Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4  (cost=0.14..2.17 rows=1 width=752) (never executed)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
  ->  Nested Loop Anti Join  (cost=1.11..19397.65 rows=1 width=112) (actual time=114705.017..114705.019 rows=0 loops=1)
        Join Filter: (s.id = p.id)
        Rows Removed by Join Filter: 512446625
        Buffers: shared hit=23076
        ->  Merge Join  (cost=1.11..19392.77 rows=1 width=72) (actual time=0.041..68.768 rows=10000 loops=1)
              Merge Cond: (p.id = r_1.id)
              Join Filter: (p.ctid = r_1.ctid)
              Buffers: shared hit=21022
              ->  Subquery Scan on p  (cost=0.56..9659.50 rows=9825 width=68) (actual time=0.028..38.598 rows=10000 loops=1)
                    Buffers: shared hit=10511
                    ->  Limit  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..29.928 rows=10000 loops=1)
                          Buffers: shared hit=10511
                          ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 records  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..25.533 rows=10000 loops=1)
                                Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                                Filter: (deleted_at IS NULL)
                                Rows Removed by Filter: 387
                                Buffers: shared hit=10511
              ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 r_1  (cost=0.56..9561.25 rows=9862 width=32) (actual time=0.009..10.623 rows=10387 loops=1)
                    Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                    Buffers: shared hit=10511
        ->  CTE Scan on seen s  (cost=0.00..3.00 rows=150 width=56) (actual time=0.000..8.445 rows=51246 loops=10000)
              Buffers: shared hit=2054
Planning:
  Buffers: shared hit=2
Planning Time: 0.472 ms
Execution Time: 114705.653 ms

query plan after

Update on records r  (cost=9588.51..14924.82 rows=1 width=112) (actual time=87.193..87.200 rows=0 loops=1)
  Update on records_p67 r_1
  Buffers: shared hit=12587
  CTE page
    ->  Limit  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..11.268 rows=10000 loops=1)
          Buffers: shared hit=10528
          ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 records  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..10.436 rows=10000 loops=1)
                Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                Filter: (deleted_at IS NULL)
                Rows Removed by Filter: 387
                Buffers: shared hit=10528
  CTE seen
    ->  Append  (cost=0.44..22.39 rows=150 width=16) (actual time=0.062..19.721 rows=102654 loops=1)
          Buffers: shared hit=2059
          ->  ProjectSet  (cost=0.44..2.52 rows=10 width=16) (actual time=0.011..0.011 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1  (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.011 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..11.180 rows=102654 loops=1)
                Buffers: shared hit=2051
                ->  Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2  (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..2.080 rows=2053 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2051
          ->  ProjectSet  (cost=0.42..5.74 rows=40 width=16) (actual time=0.017..0.018 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3  (cost=0.42..5.51 rows=4 width=752) (actual time=0.017..0.017 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.14..2.23 rows=10 width=16) (actual time=0.012..0.013 rows=0 loops=1)
                Buffers: shared hit=2
                ->  Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4  (cost=0.14..2.17 rows=1 width=752) (actual time=0.012..0.012 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2
  ->  Nested Loop  (cost=4.88..5341.19 rows=1 width=112) (actual time=87.192..87.194 rows=0 loops=1)
        Buffers: shared hit=12587
        ->  Hash Anti Join  (cost=4.88..281.82 rows=4912 width=108) (actual time=87.192..87.193 rows=0 loops=1)
              Hash Cond: (p.id = s.id)
              Buffers: shared hit=12587
              ->  CTE Scan on page p  (cost=0.00..196.50 rows=9825 width=68) (actual time=0.028..14.839 rows=10000 loops=1)
                    Buffers: shared hit=10528
              ->  Hash  (cost=3.00..3.00 rows=150 width=56) (actual time=68.683..68.684 rows=102654 loops=1)
                    Buckets: 131072 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 9546kB
                    Buffers: shared hit=2059
                    ->  CTE Scan on seen s  (cost=0.00..3.00 rows=150 width=56) (actual time=0.064..51.240 rows=102654 loops=1)
                          Buffers: shared hit=2059
        ->  Tid Scan on records_p67 r_1  (cost=0.00..1.02 rows=1 width=32) (never executed)
              TID Cond: (p.ctid = ctid)
              Filter: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (p.id = id))
Planning:
  Buffers: shared hit=77
Planning Time: 0.706 ms
Execution Time: 88.122 ms

@linear

linear Bot commented May 21, 2026

Copy link
Copy Markdown

NAN-5089

@TBonnin TBonnin requested a review from Copilot May 21, 2026 00:24
@superagent-security superagent-security Bot added contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis. labels May 21, 2026
@TBonnin TBonnin requested a review from rbwest May 21, 2026 00:24

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Re-trigger cubic

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts the deleteOutdatedRecords batched soft-delete query to avoid pathological PostgreSQL query plans by forcing key CTEs to be materialized, improving join strategy selection and preventing extremely slow nested-loop anti-joins.

Changes:

  • Force page CTE to be MATERIALIZED so the planner treats the batch as a bounded driving set.
  • Force seen CTE to be MATERIALIZED to stabilize planning around the unnested record_ids set.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@TBonnin TBonnin enabled auto-merge May 21, 2026 00:26
@TBonnin TBonnin added this pull request to the merge queue May 21, 2026
Merged via the queue into master with commit 3ea8422 May 21, 2026
32 checks passed
@TBonnin TBonnin deleted the tbonnin/nan-5089/fix-outdated-materialized branch May 21, 2026 00:46
sapnesh-nango pushed a commit that referenced this pull request May 24, 2026
the joy of postgres planner. My test with some records earlier showed
that MATERIALIZED was slightly slower but in some worse case scenario it
2 order magnitude worse

query plan before:
```
Update on records r  (cost=23.50..19420.04 rows=1 width=112) (actual time=114705.018..114705.024 rows=0 loops=1)
  Update on records_p67 r_1
  Buffers: shared hit=23076
  CTE seen
    ->  Append  (cost=0.44..22.39 rows=150 width=16) (actual time=0.061..18.629 rows=102646 loops=1)
          Buffers: shared hit=2054
          ->  ProjectSet  (cost=0.44..2.52 rows=10 width=16) (actual time=0.010..0.010 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1  (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.010 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..10.470 rows=102646 loops=1)
                Buffers: shared hit=2051
                ->  Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2  (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..1.889 rows=2053 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2051
          ->  ProjectSet  (cost=0.42..5.74 rows=40 width=16) (never executed)
                ->  Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3  (cost=0.42..5.51 rows=4 width=752) (never executed)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
          ->  ProjectSet  (cost=0.14..2.23 rows=10 width=16) (never executed)
                ->  Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4  (cost=0.14..2.17 rows=1 width=752) (never executed)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
  ->  Nested Loop Anti Join  (cost=1.11..19397.65 rows=1 width=112) (actual time=114705.017..114705.019 rows=0 loops=1)
        Join Filter: (s.id = p.id)
        Rows Removed by Join Filter: 512446625
        Buffers: shared hit=23076
        ->  Merge Join  (cost=1.11..19392.77 rows=1 width=72) (actual time=0.041..68.768 rows=10000 loops=1)
              Merge Cond: (p.id = r_1.id)
              Join Filter: (p.ctid = r_1.ctid)
              Buffers: shared hit=21022
              ->  Subquery Scan on p  (cost=0.56..9659.50 rows=9825 width=68) (actual time=0.028..38.598 rows=10000 loops=1)
                    Buffers: shared hit=10511
                    ->  Limit  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..29.928 rows=10000 loops=1)
                          Buffers: shared hit=10511
                          ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 records  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.024..25.533 rows=10000 loops=1)
                                Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                                Filter: (deleted_at IS NULL)
                                Rows Removed by Filter: 387
                                Buffers: shared hit=10511
              ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 r_1  (cost=0.56..9561.25 rows=9862 width=32) (actual time=0.009..10.623 rows=10387 loops=1)
                    Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                    Buffers: shared hit=10511
        ->  CTE Scan on seen s  (cost=0.00..3.00 rows=150 width=56) (actual time=0.000..8.445 rows=51246 loops=10000)
              Buffers: shared hit=2054
Planning:
  Buffers: shared hit=2
Planning Time: 0.472 ms
Execution Time: 114705.653 ms
```

query plan after
```
Update on records r  (cost=9588.51..14924.82 rows=1 width=112) (actual time=87.193..87.200 rows=0 loops=1)
  Update on records_p67 r_1
  Buffers: shared hit=12587
  CTE page
    ->  Limit  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..11.268 rows=10000 loops=1)
          Buffers: shared hit=10528
          ->  Index Scan using records_p67_connection_id_model_id_key on records_p67 records  (cost=0.56..9561.25 rows=9825 width=22) (actual time=0.023..10.436 rows=10000 loops=1)
                Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text))
                Filter: (deleted_at IS NULL)
                Rows Removed by Filter: 387
                Buffers: shared hit=10528
  CTE seen
    ->  Append  (cost=0.44..22.39 rows=150 width=16) (actual time=0.062..19.721 rows=102654 loops=1)
          Buffers: shared hit=2059
          ->  ProjectSet  (cost=0.44..2.52 rows=10 width=16) (actual time=0.011..0.011 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260519_connection_id_model_sync_job_id_idx on records_seen_20260519 records_seen_1  (cost=0.44..2.46 rows=1 width=752) (actual time=0.010..0.011 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.44..11.16 rows=90 width=16) (actual time=0.051..11.180 rows=102654 loops=1)
                Buffers: shared hit=2051
                ->  Index Scan using records_seen_20260520_connection_id_model_sync_job_id_idx on records_seen_20260520 records_seen_2  (cost=0.44..10.64 rows=9 width=752) (actual time=0.049..2.080 rows=2053 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2051
          ->  ProjectSet  (cost=0.42..5.74 rows=40 width=16) (actual time=0.017..0.018 rows=0 loops=1)
                Buffers: shared hit=3
                ->  Index Scan using records_seen_20260521_connection_id_model_sync_job_id_idx on records_seen_20260521 records_seen_3  (cost=0.42..5.51 rows=4 width=752) (actual time=0.017..0.017 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=3
          ->  ProjectSet  (cost=0.14..2.23 rows=10 width=16) (actual time=0.012..0.013 rows=0 loops=1)
                Buffers: shared hit=2
                ->  Index Scan using records_seen_20260522_connection_id_model_sync_job_id_idx on records_seen_20260522 records_seen_4  (cost=0.14..2.17 rows=1 width=752) (actual time=0.012..0.012 rows=0 loops=1)
                      Index Cond: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (sync_job_id >= 1806417674))
                      Buffers: shared hit=2
  ->  Nested Loop  (cost=4.88..5341.19 rows=1 width=112) (actual time=87.192..87.194 rows=0 loops=1)
        Buffers: shared hit=12587
        ->  Hash Anti Join  (cost=4.88..281.82 rows=4912 width=108) (actual time=87.192..87.193 rows=0 loops=1)
              Hash Cond: (p.id = s.id)
              Buffers: shared hit=12587
              ->  CTE Scan on page p  (cost=0.00..196.50 rows=9825 width=68) (actual time=0.028..14.839 rows=10000 loops=1)
                    Buffers: shared hit=10528
              ->  Hash  (cost=3.00..3.00 rows=150 width=56) (actual time=68.683..68.684 rows=102654 loops=1)
                    Buckets: 131072 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 9546kB
                    Buffers: shared hit=2059
                    ->  CTE Scan on seen s  (cost=0.00..3.00 rows=150 width=56) (actual time=0.064..51.240 rows=102654 loops=1)
                          Buffers: shared hit=2059
        ->  Tid Scan on records_p67 r_1  (cost=0.00..1.02 rows=1 width=32) (never executed)
              TID Cond: (p.ctid = ctid)
              Filter: ((connection_id = 822600) AND ((model)::text = 'GithubStar'::text) AND (p.id = id))
Planning:
  Buffers: shared hit=77
Planning Time: 0.706 ms
Execution Time: 88.122 ms
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants