one collector per agg request instead of per bucket #2759

PSeitz-dd · 2025-12-04T03:52:34Z (edited by PSeitz)

In this refactoring, a collector knows which parent bucket its data belongs to. This makes it possible to replace the previous approach of one collector per bucket with one collector per request.
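
As a rough illustration of the idea (not tantivy's actual collector API), the sketch below keeps a single collector for the whole request and indexes its state by parent bucket id, instead of cloning one collector per bucket. `AvgCollector`, `collect`, `avg` and the `value_of` closure (standing in for a column lookup) are hypothetical names.

```rust
// Hypothetical aliases for this sketch.
type BucketId = u32;
type DocId = u32;

/// Hypothetical single avg collector shared by all parent buckets.
/// Per-bucket state lives in a Vec indexed by the parent's BucketId.
#[derive(Default)]
struct AvgCollector {
    // Indexed by BucketId: (sum, count).
    per_bucket: Vec<(f64, u64)>,
}

impl AvgCollector {
    /// Collect a batch of docs that all belong to parent bucket `bucket_id`.
    fn collect(&mut self, bucket_id: BucketId, docs: &[DocId], value_of: impl Fn(DocId) -> f64) {
        let idx = bucket_id as usize;
        // Grow lazily so bucket ids seen for the first time get a zeroed slot.
        if idx >= self.per_bucket.len() {
            self.per_bucket.resize(idx + 1, (0.0, 0));
        }
        let entry = &mut self.per_bucket[idx];
        for &doc in docs {
            entry.0 += value_of(doc);
            entry.1 += 1;
        }
    }

    /// Average for one parent bucket, or None if it received no docs.
    fn avg(&self, bucket_id: BucketId) -> Option<f64> {
        self.per_bucket
            .get(bucket_id as usize)
            .filter(|(_, count)| *count > 0)
            .map(|(sum, count)| sum / *count as f64)
    }
}
```

With this layout a terms aggregation with N buckets and an avg sub-aggregation needs one `AvgCollector` rather than N cloned ones; the cost is that the parent must pass its bucket id along with every batch of docs.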

Add PagedTermMap as another TermAggregationMap implementation to reduce memory usage compared to a HashMap

This change also contains an optimization for low-cardinality bucket ids
Remove Clone from the collector (there is only one instance now)
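
The description does not show how PagedTermMap is laid out. One plausible way a paged map can use less memory than a HashMap is sketched below, assuming counts are keyed by a dense term ordinal: values live in fixed-size pages that are allocated lazily, so there is no per-entry key storage or hashing. `PagedTermCounts`, `PAGE_BITS` and the page size are hypothetical and not tantivy's actual types.

```rust
// Hypothetical page geometry: 2^10 = 1024 u32 slots per page.
const PAGE_BITS: u32 = 10;
const PAGE_SIZE: usize = 1 << PAGE_BITS;

/// Hypothetical paged map from term ordinal to doc count.
/// Only pages that contain at least one seen ordinal are allocated.
#[derive(Default)]
struct PagedTermCounts {
    pages: Vec<Option<Box<[u32]>>>,
}

impl PagedTermCounts {
    fn increment(&mut self, term_ord: u64) {
        let page_idx = (term_ord >> PAGE_BITS) as usize;
        let slot = (term_ord as usize) & (PAGE_SIZE - 1);
        if page_idx >= self.pages.len() {
            self.pages.resize_with(page_idx + 1, || None);
        }
        // Allocate the page the first time an ordinal in its range appears.
        let page = self.pages[page_idx]
            .get_or_insert_with(|| vec![0u32; PAGE_SIZE].into_boxed_slice());
        page[slot] += 1;
    }

    fn get(&self, term_ord: u64) -> u32 {
        let page_idx = (term_ord >> PAGE_BITS) as usize;
        let slot = (term_ord as usize) & (PAGE_SIZE - 1);
        match self.pages.get(page_idx) {
            Some(Some(page)) => page[slot],
            _ => 0,
        }
    }

    fn allocated_pages(&self) -> usize {
        self.pages.iter().filter(|p| p.is_some()).count()
    }
}
```

The trade-off in this sketch is that a whole page is paid for as soon as one ordinal in its range appears, so very sparse ordinals could waste memory; a real implementation would have to balance page size against that.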

Future Work

Fetch all values for all buckets once per collector (currently each collect call fetches its own data per bucket)
Improve the performance of the group-by on bucket id in the caching layer
Improve low-cardinality detection
Remove PerRequestAggSegCtx; everything can now be stored in the collector
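
The first item could look roughly like the sketch below, which assumes the collector has already buffered (bucket id, doc id) pairs for all parent buckets: it issues one batched value lookup for every doc, then scatters the values into per-bucket sums, instead of fetching once per bucket. `sum_per_bucket` and the `fetch_values` closure (standing in for a columnar batch read) are hypothetical.

```rust
// Hypothetical aliases for this sketch.
type BucketId = u32;
type DocId = u32;

/// Hypothetical: one value fetch for all (bucket, doc) pairs, then a scatter
/// into per-bucket sums. Panics if a bucket id is >= num_buckets.
fn sum_per_bucket(
    pairs: &[(BucketId, DocId)],
    num_buckets: usize,
    fetch_values: impl Fn(&[DocId], &mut Vec<f64>),
) -> Vec<f64> {
    // Single batched fetch covering the docs of every bucket.
    let docs: Vec<DocId> = pairs.iter().map(|&(_, doc)| doc).collect();
    let mut values = Vec::with_capacity(docs.len());
    fetch_values(docs.as_slice(), &mut values);

    // Scatter: values[i] belongs to pairs[i].
    let mut sums = vec![0.0; num_buckets];
    for (&(bucket, _), value) in pairs.iter().zip(values) {
        sums[bucket as usize] += value;
    }
    sums
}
```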

Performance

Memory and CPU usage of the heavy hitters is drastically reduced.
Term aggs with many terms use a lot less memory.
Additional buffers are used to pass docs, which increases memory consumption for some aggs.

The biggest regression is terms_zipf_1000_with_avg_sub_agg (Avg: 9.1190ms, +39.79%), which should be fixed once all values for all buckets are fetched at once.

Full benchmark results:
average_u64                                    Memory: 21.7 KB (-2.30%)      Avg: 2.9480ms (-3.64%)      Median: 2.9303ms (-3.69%)      [2.8115ms .. 3.1867ms]       
average_f64                                    Memory: 21.7 KB (-2.30%)      Avg: 3.1238ms (-3.17%)      Median: 3.1146ms (-2.94%)      [3.0220ms .. 3.3337ms]       
average_f64_u64                                Memory: 23.0 KB (-0.48%)      Avg: 5.7650ms (-1.43%)      Median: 5.7655ms (-0.97%)      [5.6324ms .. 6.0755ms]       
stats_f64                                      Memory: 21.8 KB (-2.29%)      Avg: 3.1220ms (-2.72%)      Median: 3.1131ms (-2.49%)      [3.0185ms .. 3.3488ms]       
extendedstats_f64                              Memory: 23.0 KB (+3.22%)      Avg: 3.2849ms (-1.79%)      Median: 3.2591ms (-1.38%)      [3.1558ms .. 3.5704ms]       
percentiles_f64                                Memory: 39.4 KB (+37.37%)     Avg: 6.9987ms (-3.80%)      Median: 6.9900ms (-2.82%)      [6.9396ms .. 7.1972ms]       
terms_7                                        Memory: 36.7 KB (+3.00%)      Avg: 2.3431ms (+0.96%)      Median: 2.3425ms (+1.63%)      [2.2808ms .. 2.4372ms]       
terms_all_unique                               Memory: 14.7 MB (-50.24%)     Avg: 6.9163ms (-58.72%)     Median: 6.8885ms (-57.92%)     [6.7612ms .. 7.3554ms]       
terms_150_000                                  Memory: 3.0 MB (-55.96%)      Avg: 6.2613ms (-37.68%)     Median: 6.2406ms (-35.99%)     [6.1016ms .. 6.4347ms]       
terms_many_top_1000                            Memory: 5.2 MB (-33.38%)      Avg: 9.3138ms (-29.50%)     Median: 9.2938ms (-27.81%)     [9.1127ms .. 9.8866ms]       
terms_many_order_by_term                       Memory: 3.0 MB (-55.96%)      Avg: 4.9930ms (-58.18%)     Median: 4.9816ms (-57.86%)     [4.8966ms .. 5.2057ms]       
terms_many_with_top_hits                       Memory: 50.0 MB (-11.50%)     Avg: 96.4471ms (-40.65%)    Median: 94.6858ms (-41.67%)    [91.5090ms .. 113.9275ms]    
terms_all_unique_with_avg_sub_agg              Memory: 56.7 MB (-39.11%)     Avg: 17.5857ms (-74.43%)    Median: 17.4918ms (-74.10%)    [17.0980ms .. 19.1589ms]     
terms_many_with_avg_sub_agg                    Memory: 13.5 MB (-34.50%)     Avg: 15.2693ms (-45.67%)    Median: 15.1739ms (-44.65%)    [14.9133ms .. 17.1725ms]     
terms_status_with_avg_sub_agg                  Memory: 101.3 KB (+65.18%)    Avg: 6.2418ms (+13.02%)     Median: 6.2257ms (+13.12%)     [6.1074ms .. 6.4710ms]       
terms_status_with_histogram                    Memory: 137.1 KB (+28.02%)    Avg: 6.1062ms (+12.70%)     Median: 6.0941ms (+13.20%)     [6.0223ms .. 6.3640ms]       
terms_zipf_1000                                Memory: 69.2 KB (-12.86%)     Avg: 2.2407ms (+5.06%)      Median: 2.2369ms (+5.27%)      [2.2020ms .. 2.3248ms]       
terms_zipf_1000_with_histogram                 Memory: 1.2 MB (+20.39%)      Avg: 24.0328ms (+5.64%)     Median: 23.9883ms (+5.50%)     [23.8058ms .. 25.1477ms]     
terms_zipf_1000_with_avg_sub_agg               Memory: 463.4 KB (+27.36%)    Avg: 9.1190ms (+39.79%)     Median: 9.0712ms (+41.72%)     [8.8888ms .. 9.8575ms]       
terms_many_json_mixed_type_with_avg_sub_agg    Memory: 20.6 MB (-20.72%)     Avg: 25.2443ms (-41.91%)    Median: 25.1315ms (-41.44%)    [24.6717ms .. 26.7138ms]     
cardinality_agg                                Memory: 3.7 MB (-0.01%)       Avg: 30.6559ms (+3.37%)     Median: 29.7482ms (+0.62%)     [28.5122ms .. 35.9346ms]     
terms_status_with_cardinality_agg              Memory: 5.5 MB (+0.78%)       Avg: 74.4157ms (+4.42%)     Median: 72.5500ms (+2.05%)     [69.0952ms .. 87.0695ms]     
range_agg                                      Memory: 25.2 KB (-5.33%)      Avg: 3.4807ms (+7.94%)      Median: 3.3977ms (+5.44%)      [3.2741ms .. 4.0961ms]       
range_agg_with_avg_sub_agg                     Memory: 95.5 KB (+81.82%)     Avg: 7.2579ms (+3.35%)      Median: 7.1267ms (+1.81%)      [6.9252ms .. 8.1604ms]       
range_agg_with_term_agg_status                 Memory: 109.4 KB (+62.59%)    Avg: 6.7002ms (-69.18%)     Median: 6.5399ms (-69.70%)     [6.3694ms .. 7.7095ms]       
range_agg_with_term_agg_many                   Memory: 6.9 MB (+0.32%)       Avg: 14.4014ms (-53.85%)    Median: 14.3027ms (-54.12%)    [13.6193ms .. 15.9501ms]     
histogram                                      Memory: 22.6 KB (+0.67%)      Avg: 3.1483ms (-2.77%)      Median: 3.1182ms (-2.91%)      [3.0445ms .. 3.3583ms]       
histogram_hard_bounds                          Memory: 20.6 KB (+3.46%)      Avg: 1.6982ms (+3.78%)      Median: 1.6709ms (+2.33%)      [1.6003ms .. 1.9916ms]       
histogram_with_avg_sub_agg                     Memory: 122.7 KB (+68.32%)    Avg: 9.6292ms (+5.66%)      Median: 9.5913ms (+6.64%)      [9.3812ms .. 9.9690ms]       
histogram_with_term_agg_status                 Memory: 492.5 KB (+4.78%)     Avg: 12.9491ms (-34.46%)    Median: 12.8700ms (-34.61%)    [12.6297ms .. 13.8100ms]     
avg_and_range_with_avg_sub_agg                 Memory: 82.5 KB (+96.81%)     Avg: 10.1590ms (+3.49%)     Median: 10.0878ms (+3.37%)     [9.9766ms .. 10.6046ms]      
filter_agg_all_query_count_agg                 Memory: 139.3 KB (+21.71%)    Avg: 4.5550ms (+16.74%)     Median: 4.5375ms (+16.39%)     [4.4388ms .. 4.8239ms]       
filter_agg_term_query_count_agg                Memory: 140.0 KB (+21.85%)    Avg: 7.0402ms (+9.54%)      Median: 7.0113ms (+12.72%)     [6.9111ms .. 7.3698ms]
filter_agg_all_query_with_sub_aggs             Memory: 157.1 KB (+19.28%)    Avg: 9.6685ms (+7.96%)      Median: 9.6228ms (+8.09%)      [9.4179ms .. 10.1717ms]      
filter_agg_term_query_with_sub_aggs            Memory: 157.5 KB (+19.23%)    Avg: 12.1100ms (+5.88%)     Median: 12.0301ms (+5.82%)     [11.8841ms .. 12.4835ms]

fulmicoton · 2025-12-11T12:55:59Z

src/aggregation/cached_sub_aggs.rs

+    /// Only used when LOWCARD is true.
+    /// Cache doc ids per bucket for sub-aggregations.
+    ///
+    /// The outer Vec is indexed by BucketId.
+    per_bucket_docs: Vec<Vec<DocId>>,
+    /// Only used when LOWCARD is false.
+    /// For higher cardinalities we use a partitioned approach to store
+    ///
+    /// partitioned Vec<(BucketId, DocId)> pairs to improve grouping locality.
+    partitions: [PartitionEntry; NUM_PARTITIONS],


Why use a boolean for this? I don't understand.

PSeitz-dd · 2025-12-11

What boolean? Do you mean the array?

It's done as a cheap, inexact group_by on bucket_id.
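
In the reviewed struct, LOWCARD appears to be a compile-time flag selecting which of the two fields is used. The sketch below assumes it is a const generic bool and models the high-cardinality path as the cheap, inexact group_by from the reply: pairs are routed to a partition by `bucket_id % NUM_PARTITIONS`, so all docs of one bucket share a partition, but one partition may mix several buckets. `CachedDocs`, `push`, `flush` and the value of `NUM_PARTITIONS` are hypothetical; the stable sort inside `flush` is an addition of this sketch that turns the coarse grouping into exact per-bucket runs, and the real code may process partitions differently.

```rust
// Hypothetical aliases and partition count for this sketch.
type BucketId = u32;
type DocId = u32;
const NUM_PARTITIONS: usize = 16;

/// Hypothetical doc cache for sub-aggregations; LOWCARD picks the layout.
struct CachedDocs<const LOWCARD: bool> {
    // LOWCARD == true: outer Vec indexed by BucketId.
    per_bucket_docs: Vec<Vec<DocId>>,
    // LOWCARD == false: (bucket, doc) pairs partitioned by bucket id.
    partitions: [Vec<(BucketId, DocId)>; NUM_PARTITIONS],
}

impl<const LOWCARD: bool> CachedDocs<LOWCARD> {
    fn new() -> Self {
        CachedDocs {
            per_bucket_docs: Vec::new(),
            partitions: std::array::from_fn(|_| Vec::new()),
        }
    }

    fn push(&mut self, bucket: BucketId, doc: DocId) {
        if LOWCARD {
            let idx = bucket as usize;
            if idx >= self.per_bucket_docs.len() {
                self.per_bucket_docs.resize_with(idx + 1, Vec::new);
            }
            self.per_bucket_docs[idx].push(doc);
        } else {
            // Inexact group_by: same bucket => same partition, not vice versa.
            self.partitions[bucket as usize % NUM_PARTITIONS].push((bucket, doc));
        }
    }

    /// Hands every non-empty (bucket, docs) run to `sink` and empties the cache.
    fn flush(&mut self, mut sink: impl FnMut(BucketId, &[DocId])) {
        if LOWCARD {
            for (bucket, docs) in self.per_bucket_docs.iter_mut().enumerate() {
                if !docs.is_empty() {
                    sink(bucket as BucketId, docs.as_slice());
                    docs.clear();
                }
            }
        } else {
            let mut run: Vec<DocId> = Vec::new();
            for partition in self.partitions.iter_mut() {
                // Stable sort keeps docs in insertion order within a bucket.
                partition.sort_by_key(|&(bucket, _)| bucket);
                let mut i = 0;
                while i < partition.len() {
                    let bucket = partition[i].0;
                    run.clear();
                    while i < partition.len() && partition[i].0 == bucket {
                        run.push(partition[i].1);
                        i += 1;
                    }
                    sink(bucket, run.as_slice());
                }
                partition.clear();
            }
        }
    }
}
```

Making the flag a const generic rather than a runtime field means each instantiation compiles only one branch of `push` and `flush`, which fits the PR's stated goal of reducing dynamic dispatch on the hot path.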

PSeitz-dd force-pushed the bucket_id_agg branch 2 times, most recently from a16d0ff to d97bda9 on December 5, 2025 06:53

improve bench

254314a

PSeitz force-pushed the bucket_id_agg branch 4 times, most recently from 59e9f5a to 856e5d5 on December 8, 2025 00:43

PSeitz and others added 5 commits December 8, 2025 10:20

add more tests for new collection type

0dd6a95

one collector per agg request instead per bucket

2ce4da8

In this refactoring a collector knows in which bucket of the parent their data is in. This allows to convert the previous approach of one collector per bucket to one collector per request. low card bucket optimization

reduce dynamic dispatch, faster term agg

c852bac

use radix map, fix prepare_max_bucket

030554d

use paged term map in term agg use special no sub agg term map impl

specialize columntype in stats

1b56487

PSeitz force-pushed the bucket_id_agg branch 2 times, most recently from 3356fc8 to d76b315 on December 8, 2025 02:34

remove stacktrace bloat, use &mut helper

78bd382

increase cache to 2048

PSeitz force-pushed the bucket_id_agg branch 2 times, most recently from 1f717da to 29add85 on December 8, 2025 02:44

PSeitz-dd force-pushed the bucket_id_agg branch from 29add85 to 411587a on December 8, 2025 06:32

cleanup

1591344

remove clone move data in term req, single doc opt for stats

PSeitz-dd force-pushed the bucket_id_agg branch from 411587a to 1591344 on December 10, 2025 06:34

PSeitz requested review from fulmicoton and trinity-1686a December 10, 2025 06:35

fulmicoton reviewed Dec 11, 2025

add comment

71dc084

PSeitz-dd force-pushed the bucket_id_agg branch from 8a4286c to 71dc084 on December 12, 2025 08:19

share column block accessor

87fe3a3

fulmicoton mentioned this pull request Dec 15, 2025

doc_count in aggregation result is not the actual number of documents #2721

Open
