Use a moving average of cache size estimates #151

djmb · 2024-02-20T15:19:30Z

With small caches, the outlier query is very effective at reducing the
error bars on the cache size estimate. As the cache grows, though, the
entries it covers become a small percentage of the total cache size.

Since the entry sizes can have a long tail, the small number of large
entries in the sample query can have a big effect on the overall
estimate.

To counteract this, we'll use a moving average of the last N estimates.
The estimates will be stored directly in the cache so that they can be
shared amongst all processes.
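
A minimal sketch of the moving average, to make the idea concrete. It is
not the code in this PR: the dict-like `store`, the `size_estimates` key
and the `record_estimate` name are all hypothetical stand-ins.

```python
def record_estimate(store, new_estimate, window, key="size_estimates"):
    """Append an estimate to the shared history and return the moving average.

    `store` is any dict-like object standing in for the shared cache
    (an assumption for this sketch); `key` is a made-up entry name.
    """
    window = max(1, window)          # always keep at least one estimate
    history = list(store.get(key, []))
    history.append(new_estimate)
    history = history[-window:]      # keep only the last `window` estimates
    store[key] = history             # read-modify-write, so not atomic
    return sum(history) / len(history)


store = {}
for estimate in (100, 200, 300, 400):
    average = record_estimate(store, estimate, window=3)
```

Because the history is read, modified and written back as separate
steps, two processes can overwrite each other's update; in this sketch
that would only drop a single estimate from the window.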

We'll choose N so that the last N estimates together have sampled at
least 0.05% of all records in the cache, with a maximum of 50 estimates.
Testing shows this should keep us within roughly a +/-5% error margin.
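
One way the choice of N could be computed is sketched below. The 0.05%
target and the cap of 50 come from the description above; the function
name and the idea that each estimate samples a fixed number of records
(`samples_per_estimate`) are assumptions made for illustration.

```python
TARGET_PER_10K = 5    # 0.05% of records, expressed as 5 per 10,000
MAX_ESTIMATES = 50    # upper bound on the number of stored estimates


def estimates_needed(total_records, samples_per_estimate):
    """Return how many estimates to average over (hypothetical helper)."""
    if total_records <= 0 or samples_per_estimate <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = total_records * TARGET_PER_10K
    denominator = 10_000 * samples_per_estimate
    n = -(-numerator // denominator)
    return max(1, min(n, MAX_ESTIMATES))
```

With 1,000 records sampled per estimate, a cache of 10,000,000 records
would need 5 estimates to reach 0.05%, while a cache of 1,000,000,000
records would hit the cap of 50.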

There's a race condition on writing the moving average back, but that
should be rare and not important.

If the cache is small enough so that the queries sample all the data,
we'll just write the exact value back to the cache and ignore previous
estimates.
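
The exact-value case can be sketched as a reset of the stored history.
The `update_history` name and the `exact` flag are hypothetical; how the
real code detects that the queries covered all the data is not shown in
this description.

```python
def update_history(history, value, window, exact):
    """Return the new list of stored estimates (illustrative only)."""
    if exact:
        # The queries covered every record, so older estimates add nothing.
        return [value]
    # Otherwise append and keep only the last `window` estimates.
    return (list(history) + [value])[-max(1, window):]
```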


djmb force-pushed the size-estimate-moving-average branch from a511c95 to ca7381f Compare February 20, 2024 15:37

djmb force-pushed the size-estimate-moving-average branch from ca7381f to 4e18cce Compare February 23, 2024 13:00

djmb merged commit 3e61bfe into main Feb 23, 2024

djmb deleted the size-estimate-moving-average branch February 23, 2024 13:15
