Fix broken cache eviction in clockcache #1603
Conversation
Force-pushed from 88dfafb to 7a7b10e
```diff
 	if insertPtr != nil {
-		self.next = next + 1
+		self.next = (startLoc + next + 1) % self.Len()
```
Are we sure we don't need `% self.Len()` on line 189 too? Even if startLoc is the last element?
Assume that self.storage has length 10 and startLoc == 9 (i.e. it points to the last element). When we slice self.storage into postStart and preStart, postStart will contain exactly one item (the last one), and preStart will contain the first 9 items. So even if startLoc is the last element, preStart will never contain that element. This is because Go's slice expressions are half-open: the end index is exclusive.
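To make the slicing behaviour concrete, here is a tiny standalone example (my own sketch, not the cache code itself) showing that with a length of 10 and startLoc == 9 the two slices partition the storage without overlapping:

```go
package main

import "fmt"

func main() {
	// Stand-in for self.storage: 10 elements, indices 0 through 9.
	storage := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	startLoc := 9 // points at the last element

	postStart := storage[startLoc:] // indices 9..9: just the last element
	preStart := storage[:startLoc]  // indices 0..8: end index 9 is excluded

	fmt.Println(postStart) // [9]
	fmt.Println(preStart)  // [0 1 2 3 4 5 6 7 8]
}
```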
The commit I added to simplify the …
Benchmark Result for 7d8d512: Master vs PR
Force-pushed from 7a7b10e to 9b852fd
It looks like the benchmark job confirms the benchmark that I ran. I've removed the commit which reworks the cache.
Just wondering if you managed to benchmark the version of the code that does not split the array?
Is the #1603 (comment) benchmark from before the fix or after the fix? If it's after the fix, then we are losing perf in …
Force-pushed from 9b852fd to fede97f
This benchmark is after the fix. I was curious why we're losing performance, so I took a deeper look at this to better understand it. I think that I understand it now. What follows is a bit of a brain dump into understanding what's going on. The bottom line is: buggy code can look fast, but if it's buggy, it doesn't matter how fast it is.

My first hypothesis was: maybe the code being emitted is different enough to be causing a problem (because we broke some "compiler magic" like bounds check elimination), so I took to the disassembler (using …).

This was before the fix: …

And after the fix: …

That function call is probably not helping, so I replaced … There are more instructions (7 vs 2), but in a branch of code which is not hit very frequently, so I don't think that it could be exactly here that is causing the problem, but there's only one way to find out. So I used …

Then, using pprof's …

Before fix: …

After fix: …

The first thing to note is that practically no time is spent on the line of code that we were looking at in the disassembler. What is also interesting is that where time is spent in the function is different. The explanation for this is relatively simple: the bug that we had would effectively ensure that …

With this insight, I had another theory: because the location we check for an entry to evict is different in the broken and fixed versions, maybe we spend more time looking for an item to evict in the fixed code. I instrumented the code to tell me how many items in …

The final attempt, which yielded fruit, was the realisation that the memory location that we're looking at is always the same in the broken version. In the broken code, most of the time …

The patch:

```diff
diff --git a/pkg/clockcache/cache.go b/pkg/clockcache/cache.go
index 43448ca7..1a36818f 100644
--- a/pkg/clockcache/cache.go
+++ b/pkg/clockcache/cache.go
@@ -162,7 +162,7 @@ func (self *Cache) evict() (insertPtr *element) {
 	// if the value is merely guarded by e.g. `if next >= len(slice) { next = 0 }`
 	// the bounds check will not be elided. Doing the walk like this lowers
 	// eviction time by about a third
-	startLoc := self.next
+	startLoc := 1
 	postStart := self.storage[startLoc:]
 	preStart := self.storage[:startLoc]
 	for i := 0; i < 2; i++ {
```

The result when comparing between current master and this PR with the above patch: …

(If you're curious, the …)

I believe that this explains what changed in our code to cause this change in performance. What it doesn't explain is why this change causes a ~20% change in eviction performance. I suspect that the answer has to do with "magical CPU stuff", like branch prediction, or cache access patterns. I would like to be able to answer this, but I cannot (yet).
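To visualise that last point, here is a small standalone simulation (my own sketch, not code from this PR) of just the index bookkeeping. It assumes, for simplicity, that every eviction finds its victim at relative offset 0 of the walk, i.e. at startLoc itself, and compares how the broken and fixed update rules move self.next:

```go
package main

import "fmt"

func main() {
	const size = 10 // length of the cache's storage in this toy model
	const rel = 0   // assumed relative offset of the victim within the walk

	broken, fixed := 0, 0
	fmt.Println("eviction  broken next  fixed next")
	for i := 1; i <= 6; i++ {
		// Broken rule: treats the offset within the walk as an absolute index.
		broken = rel + 1
		// Fixed rule: converts the offset back to an absolute index and
		// wraps around the storage length.
		fixed = (fixed + rel + 1) % size
		fmt.Printf("%8d  %11d  %10d\n", i, broken, fixed)
	}
	// The broken rule is stuck on index 1, so every eviction examines the
	// same memory location; the fixed rule advances around the ring.
}
```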
This was the benchmark result: #1603 (comment)
Force-pushed from fede97f to 826d148
Description
The clockcache is a simple approximation of an LRU cache. Instead of
maintaining an exact ordering of which elements were least recently
used, the cache simply keeps track of whether each element is "used".
When an item is inserted into the cache, it is counted as unused. When
an item is fetched from the cache, it is counted as used.
The eviction algorithm looks through the elements of the cache one by
one, trying to find one to evict. If the item is used, it marks it as
unused. If it is unused, it can evict the item. The next time an
eviction is required, it can continue with the next element.
Through a bug in index manipulation, if the next item to be evicted was
at index 1 of the storage array, the next eviction would not continue to
index 2. Instead, after successfully evicting an item from index 1, it
would attempt (on the next eviction) to evict the item at index 1
(again).
The above, combined with the fact that items just inserted are marked as
unused, meant that it was possible to end up in a situation where each
insert evicted the previously inserted item.
This caused much woe, and thrashing.
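For illustration, here is a minimal, self-contained sketch of the scheme described above: a used bit per slot, lookups marking slots as used, inserts storing entries as unused, and an eviction walk that clears used bits until it finds a victim and then resumes from the following slot. The names and layout are simplified stand-ins and are not the real clockcache types.

```go
package main

import "fmt"

// slot is a simplified cache entry: a key, a value and a "used" bit.
type slot struct {
	key   string
	value int
	used  bool
}

// clock is a toy clock cache: a fixed ring of slots plus the position at
// which the next eviction will start looking.
type clock struct {
	slots []slot
	next  int
	index map[string]int // key -> position in slots
}

func newClock(size int) *clock {
	return &clock{slots: make([]slot, 0, size), index: make(map[string]int)}
}

// get marks the slot as used, approximating "recently used".
func (c *clock) get(key string) (int, bool) {
	i, ok := c.index[key]
	if !ok {
		return 0, false
	}
	c.slots[i].used = true
	return c.slots[i].value, true
}

// insert stores a new entry as unused, evicting a slot if the ring is full.
func (c *clock) insert(key string, value int) {
	if len(c.slots) < cap(c.slots) {
		c.slots = append(c.slots, slot{key: key, value: value})
		c.index[key] = len(c.slots) - 1
		return
	}
	i := c.evict()
	delete(c.index, c.slots[i].key)
	c.slots[i] = slot{key: key, value: value}
	c.index[key] = i
}

// evict walks the ring from c.next: used slots get a second chance (their
// used bit is cleared), and the first unused slot is the victim. The next
// eviction resumes after the victim, wrapping with % so it never sticks.
func (c *clock) evict() int {
	for {
		i := c.next
		c.next = (c.next + 1) % len(c.slots)
		if c.slots[i].used {
			c.slots[i].used = false
			continue
		}
		return i
	}
}

func main() {
	c := newClock(2)
	c.insert("a", 1)
	c.insert("b", 2)
	c.get("a")       // mark "a" as used
	c.insert("c", 3) // evicts "b"; "a" gets a second chance
	_, okA := c.get("a")
	_, okB := c.get("b")
	fmt.Println(okA, okB) // true false
}
```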
Merge requirements
Please take into account the following non-code changes that you may need to make with your PR: