@djmb djmb commented Feb 20, 2024

With small caches, the outlier query is very effective at reducing the
error bars on the cache size estimate. As the cache grows, though, the
outlier query accounts for only a small percentage of the total cache size.

Since the entry sizes can have a long tail, the small number of large
entries in the sample query can have a big effect on the overall
estimate.

To counteract this, we'll use a moving average of the last N estimates.
The estimates will be stored directly in the cache so that they can be
shared among all processes.

We'll choose N so that the last N estimates together have sampled at
least 0.05% of all records in the cache, with a maximum of 50 estimates.
Testing suggests this should keep us within roughly a +/-5% error margin.
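
As a rough illustration of how N could be derived, here is a minimal Python sketch. It is not the code in this PR: the function name, the `samples_per_estimate` parameter, and the floor of one estimate are assumptions made for the example. Only the 0.05% target and the cap of 50 come from the description above.

```python
# Hypothetical sketch: choose how many past estimates to average.
# Only the 0.05% target and the cap of 50 come from the PR description;
# every name here is an illustrative assumption.

TARGET_SAMPLED_PER_10K = 5  # 0.05% expressed as 5 records per 10,000
MAX_ESTIMATES = 50


def estimates_to_keep(total_records: int, samples_per_estimate: int) -> int:
    """Return N so that N estimates sample >= 0.05% of records, capped at 50."""
    if total_records <= 0 or samples_per_estimate <= 0:
        return 1  # assumed floor: always keep at least the latest estimate
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = total_records * TARGET_SAMPLED_PER_10K
    denominator = 10_000 * samples_per_estimate
    needed = -(-numerator // denominator)
    return max(1, min(MAX_ESTIMATES, needed))
```

Under these assumptions, a cache of 100,000,000 records with 10,000 samples per estimate would need 5 estimates, while a much larger cache would hit the cap of 50.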

There's a race condition when writing the moving average back, but it
should be rare and its impact negligible.

If the cache is small enough that the queries sample all the data,
we'll just write the exact value back to the cache and ignore previous
estimates.
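
The update rule described above could be sketched roughly as follows. This is a hypothetical Python illustration, not the PR's implementation: the function names, the list-based storage, and the integer average are all assumptions. It keeps the last `window` estimates, and resets the history to a single value when the size is known exactly.

```python
# Hypothetical sketch of the moving-average update; all names are
# illustrative assumptions, not the PR's actual code.
from typing import List


def record_estimate(previous: List[int], new_estimate: int,
                    window: int, exact: bool) -> List[int]:
    """Return the estimate history to write back to the cache."""
    if exact:
        # The queries covered all the data, so earlier estimates are discarded.
        return [new_estimate]
    window = max(1, window)  # guard so the slice below never keeps everything
    return (previous + [new_estimate])[-window:]


def moving_average(estimates: List[int]) -> int:
    """Average the stored estimates (integer division, sizes in bytes)."""
    return sum(estimates) // len(estimates)
```

For example, with a window of 2, recording 300 after a history of `[100, 200]` leaves `[200, 300]`, which averages to 250. An exact value replaces the whole history regardless of the window.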

@djmb djmb force-pushed the size-estimate-moving-average branch from a511c95 to ca7381f Compare February 20, 2024 15:37
@djmb djmb force-pushed the size-estimate-moving-average branch from ca7381f to 4e18cce Compare February 23, 2024 13:00
@djmb djmb merged commit 3e61bfe into main Feb 23, 2024
@djmb djmb deleted the size-estimate-moving-average branch February 23, 2024 13:15