cache: replace read/dirty with a concurrent hash-trie#6
Draft
twmb wants to merge 21 commits into
Draft
Conversation
- Bump go.mod from 1.18 to 1.24. The code has required at least 1.19 since the switch to generic atomic types (atomic.Pointer[T], atomic.Int64, atomic.Uint32). 1.24 also unlocks maps.Copy, which replaces the hand-rolled copy loop in promote. - newStale: when the main entry's expiry is 0 (infinite cache) or -1 (immediate expiry), base the stale's lifespan on now() instead of producing a stale born at the Unix epoch. This was a latent bug: Get paths typically regenerated the stale via maybeNewStale before it was observed, but the initial stale set by Swap/Set was wrong. - Clean: call e.del() directly instead of c.Delete(k). c.each already promotes when necessary, so every entry the callback sees lives in the read map and needs no dirty-map bookkeeping or per-key relock. - Document that Expire is a no-op on in-flight loads for Cache, Item, and Set.
Test coverage climbs from 84.3% to 100% of statements. The new tests
either target concretely uncovered behavior (Item/Set wrappers, autoclean
lifecycle, Clean no-op on MaxStaleAge<0) or target race windows in the
existing read/dirty implementation (Swap-vs-in-flight-Get,
Delete-during-load, Expire-no-op-during-load, maxAge=0 collapsing, the
promote CAS-retry path, Swap observing the promotingDelete sentinel,
Get's unlocked-miss / locked-hit interleaving, CompareAndSwap against a
just-promoted entry). Single -count=1 -race runs land 100% coverage
reliably on this machine; the stress test driving the tightest windows
runs for ~2s against many goroutines.
The Swap-vs-in-flight-Get test caught a real data race (reported by
-race). After Get's final e.get, the original code returned l.v without
synchronizing with any writer: concurrent Swap's defer writes l.v under
l.mu and calls l.wg.Done, but Get was only waiting on whatever loading
e.p pointed to at the time, which could be Swap's replacement (a
different loading). Add l.wg.Wait() before returning l.v to establish
the happens-before edge with whoever finalized l (either setve or
Swap's defer).
New test files:
- wrappers_test.go: every Item and Set method
- autoclean_test.go: AutoCleanInterval goroutine lifecycle and
StopAutoClean idempotency
- concurrency_test.go: race-targeted tests and the miss/stale
correctness properties called out in the trie port plan
Introduces cache/trie.go: a generic concurrent hash-trie map modeled on
Go's internal/sync.HashTrieMap, but implemented entirely in userspace
(hash/maphash for the runtime's typed hasher, unsafe.Sizeof(uintptr(0))
for pointer-size, no internal/abi or internal/goarch). The trie is not
yet wired into Cache; that happens in a later commit.
Design:
- Root is a lazily-allocated indirect node, atomically swapped on
Clear so readers holding the old root keep operating on it.
- Interior nodes hold 16 atomic child pointers; leaves hold a key,
an atomic value slot (*V), and an atomic overflow pointer for
hash-collision chains.
- Load is lock-free; LoadOrStore and Delete take the parent's
per-bucket mutex. Delete prunes empty parents bottom-up while
holding locks hand-over-hand.
- expand splits a leaf into a subtree on hash-prefix collision; full
hash collisions chain through overflow.
Tests (cache/trie_test.go) cover: zero-value trie, basic store/load,
load-or-store semantics with nil-value initialization, single and
collision-chain delete (head and mid-chain), large-key stress (20k keys)
through expand, full hash collisions via a test-only hashFn hook,
concurrent disjoint keys, concurrent contention on the same key,
concurrent delete-prune, range-during-mutation, and the deleteEntry
lock-retry paths under sustained delete contention.
Coverage is 99.1%. The five uncovered statements are defensive panics
at invariant-violation points (tree deeper than the hash has bits, node
type discriminator corrupted, etc.) that cannot fire without memory
corruption; each is documented inline with the reason it is unreachable.
Also:
- Scale TestMaxIdleAge sleep windows up by ~4x so the timing-based
assertions tolerate -count=3 -race load. The previous 30ms-within-
50ms-window spacing had no margin for scheduler delay.
- Fix TestSwap_CancelsInFlightGet: synchronize on the miss goroutine
actually completing (via a missDone channel) before asserting the
miss ran. Previously the test asserted a flag set by a goroutine
that hadn't been scheduled to run yet under heavy load.
- Modernize remaining int-range for-loops.
The read/dirty/promote structure is replaced by the concurrent hash-trie
from the previous commit. Public API, stale/TTL semantics, singleflight
collapsing, and CompareAndSwap/Delete behavior are preserved.
Shape of the rewrite:
- Cache holds a single `trie[K, loading[V]]` instead of r/mu/dirty/
misses/pd.
- An `ent[K, V]` type alias (Go 1.24 generic aliases) names the trie's
concrete entry type so callers can read `*ent[K, V]` instead of
`*trieEntry[K, loading[V]]` everywhere. All entry-manipulating logic
(entGet, entTryGet, entDel, entLoad, entMaybeNewStale) is free
functions over that alias, since Go generics do not permit methods
on instantiation-specific aliases.
- Delete physically removes the entry via the trie's deleteEntryIf
primitive (with a "still tombstoned under lock" predicate to avoid
clobbering a concurrent Swap resurrection). This lets keys and their
referents be GC'd, preserving TestIssue40999.
- Clean walks the trie, tombstones expired entries, and then calls
deleteEntryIf for each tombstone so the trie's memory footprint does
not grow unboundedly with delete churn.
- Range iterates via the trie walk; no snapshot (matches sync.Map and
the existing Range docstring).
- Swap's fast path CAS on an existing entry's value slot is preserved
without any global lock; the slow path goes through loadOrStoreEntry
and resolves races on the trie's per-bucket mutex.
Added to the trie for Cache's benefit:
- deleteEntryIf(k, pred): removes only if pred returns true under the
parent bucket's lock. Callers use this to avoid the
resurrected-during-delete race.
- Inline chain removal in deleteEntryIf, retiring the separate
trieRemoveFromChain helper (it had a "not found" return path that
became dead after the pred-and-locate restructure).
Two new tests beyond the existing suite:
- TestGet_ConcurrentStaleDuringInFlight: a second Get observes a
stale-returning loading that is already in flight from the first Get
and returns the stale without attempting its own miss.
- TestGet_StaleRefreshCASRace: 16 concurrent Gets on an expired-with-
stale key exercise the slow-path CAS-retry that replaces a finalized
prev loading with a fresh loading carrying a stale snapshot.
- TestClean_SkipsInFlightLoads: Clean must skip entries whose load has
not finalized.
- TestTrie_DeleteRaceAgainstExpand: deleteEntryIf's "slot changed to
non-entry after lock" branch, triggered by concurrent insert on
hash-prefix-colliding keys forcing an expand during delete.
Full suite passes under -race; coverage settles at 98.6%-99.0% of
statements across independent runs. The remaining uncovered lines are
documented defensive panics (invariant violations that require
corruption to fire) and two tightly-timed race branches that need
scheduler interleavings we cannot force from userspace without test
hooks.
Benchmark comparison against the pre-trie baseline is deferred to the
cleanup commit.
Update the design blurb (no more read/dirty promotion), the coverage footnote (97% → ~99% plus a note on the defensive-panic residue), and the microbenchmark section (fresh numbers from the current code on Apple M1; the old table compared sync.Map to the pre-trie cache and is no longer meaningful). Flag that per-op allocations on the insert/swap paths are intrinsic — the singleflight machinery (loading[V] with a Mutex, WaitGroup, and atomic counters) is something sync.Map doesn't carry, and carrying it costs a single heap allocation per inserted key.
Add three tests and one benchmark that exist in sync/map_test.go and
sync/map_bench_test.go but were not in our copy:
- TestConcurrentClear: 10 writers, 10 readers, and 10 Clear goroutines
run concurrently. Correctness is no panic and no phantom keys (keys
we never Stored appearing in Range after the dust settles).
- TestMapClearOneAllocation: asserts Clear allocates ≤1 time. Cache's
Clear replaces the trie root with one fresh indirect node, matching
the sync.Map guarantee.
- TestMapRangeNoAllocations: asserts Range does not allocate. The
benchmarks already showed 0 B/op but an explicit AllocsPerRun test
guards against a regression in the Range closure wiring.
- BenchmarkClear: Clear throughput.
Skipped from stdlib (intentionally):
- TestMapMatchesDeepCopy: a second mapInterface oracle (DeepCopyMap).
Our TestCacheMatchesRWMutex already covers the semantics; the stdlib
keeps a second oracle mainly to catch sync.Map-specific promotion
bugs we no longer have.
- TestMapMatchesHashTrieMap: stdlib's internal hash trie as the
oracle. We are the hash trie.
- TestHashTrieMap{BadHash,TruncHash}: pathological-hash tests. Our
TestTrie_FullHashCollision{,DeleteHead} already exercises full
collisions via the hashFn hook.
- BenchmarkHashTrieMapLoad[Small|Large], LoadOrStore[Large]: trie-
level microbenchmarks. Redundant with the cache-level benchmarks
since Cache is thin over trie.
An audit against internal/sync.HashTrieMap (which trie.go is modeled on) and the prior read/dirty implementation found three interleaving bugs, each reproduced with hammer tests before fixing: 1) trie.deleteEntryIf gave up (returned nil) when the slot it walked to was expanded into an indirect node between its unlocked walk and its lock, silently skipping a delete of a present key (3,109/200k rounds). Upstream's find() retries from the root in this case; we now do too. 2) Tombstoned entries (nil value slot) could be resurrected by Swap/Get via a bare CAS(nil, loading) that nothing serialized against Delete's and Clean's pred-then-unlink, so a concurrent Set/Swap could land in an entry that was then physically unlinked: the swap reported success (Miss = "stored") and the value vanished (66/200k rounds). The old read/dirty code was immune via the promotingDelete sentinel, which made tombstone removal a single CAS; the port lost that atomicity. A nil value slot is now terminal: no code path ever stores over it. Writers that find a tombstone unlink the dead entry and re-create the key as a fresh entry, which makes deleteEntryIf's nil check stable under the bucket lock. Delete also now derives its return value from the loading it actually displaced, so the value reported deleted is the value removed. 3) Get created its entry with loadOrStoreEntry(k, nil) and CAS'd the loading in afterwards; in that window a fresh, never-expired entry was indistinguishable from a tombstone and Clean would evict it (1/500k rounds), also breaking request collapsing for that key. Get now creates entries with the loading already in place; a linked entry never has a nil slot except as a tombstone. Also: CompareAndDelete now physically unlinks like Delete (tombstones no longer leak until Clean); NewSet/NewItem now actually honor AutoCleanInterval (Set.StopAutoClean previously stopped a goroutine that was never started) and Item gains Clean/StopAutoClean; onlyFinalized is collapsed into finalized (the third state it guarded against retired with the read/dirty machinery); stale comments describing promote/read/ dirty are rewritten; TestTrie_DeleteRaceAgainstExpand now asserts the delete succeeds (it previously only asserted no-crash, exercising bug 1 without catching it); regression tests added for 2 and 3.
The expand-race test's hash literals overflowed uintptr on 32-bit platforms, so 'go vet' / compiling the tests failed there (caught by a GOARCH=386 cross-vet; the library itself was fine). Build the constants with shifts off ptrBits so the test exercises the same top-4-bits collision shape at either word size.
Expire could be silently overwritten by a concurrent idle extension: the extension stored a fresh expiry computed from a clock sampled BEFORE loading the expiry, so an Expire whose now()-1 landed between the two samples looked unexpired and was extended over. Extension now (a) loads the expiry before sampling the clock, so any Expire value it observes is necessarily in its past, and (b) CASes against the observed expiry, so an Expire landing between the load and the write wins. Notably, fix (b) alone was not enough — the regression hammer caught the residual (a) window at ~1/2000 rounds. CompareAndSwap/CompareAndDelete now agree with TryGet about what exists: expired entries and errored entries (whose v is just the miss function's placeholder return) no longer match. Get's docs now state the miss function's panic behavior (it runs on an internal goroutine; a panic crashes the process) and the same-key re-entrancy deadlock. New verification layers: - A linearizability checker: batches of concurrent Set/Swap/Delete/ TryGet/CAS/CAD against single keys, verified to be explained by some real-time-consistent sequential ordering. This catches atomicity bugs generically rather than per-interleaving (the prior tombstone- resurrection bug class fails this checker directly). - A trie structural validator (hash placement, overflow chain purity, parent pointers, dead flags, pruning completeness) run at the quiesce point of every concurrent trie test. - GitHub Actions CI: -race on amd64 and arm64 (weaker memory ordering surfaces reordering bugs x86's TSO hides), plus linux/386 without -race (unsupported there) to exercise the 32-bit hash-budget path. The README test-suite paragraph is rewritten to describe the actual layers and the measured 98% coverage.
Implements the second audit round (independent session), verified by a review pass with full interleaving traces and mutation-tested regressions: - now() is monotonic (time.Since of a process epoch, padded 1s clear of the 0 sentinel). This makes the load-expiry-then-sample-clock ordering argument airtight rather than dependent on wall-clock sanity, and immunizes TTLs against NTP steps. Trade-off (suspend pauses aging) documented on epoch. - Idle extension can no longer resurrect expired values: the extension CAS is gated on !expired, fixing the waited-expired path where a MaxAge(0) collapse value was revived for the idle window; and the driving Get's tail no longer extends (it publicly returns Miss, not a Hit, and extension was silently rewriting the MaxAge-born TTL). - Clean and idle extension now contend on the expiry word: Clean claims it (CAS to the expired sentinel) before tombstoning, so a racing extension either wins (entry kept) or loses the same way it loses to Expire, instead of being silently thrown away by the eviction. - Clean no longer evicts entries whose stale is still being served: for born-expired entries (MaxAge(0)/MaxErrorAge(0) store -1) the grace window anchored at the epoch rather than the entry. Pre-existing bug found while verifying the Clean change. - Get's replace decision is re-derived from prev itself, and the stale piggyback is pinned to the slot it read, so a load that finalizes live in the window between Get's slot read and entGet's cannot be demoted to a stale and redundantly re-driven. - Delete and Swap classify the displaced value with a clock sampled at the displacing CAS, not after, so a value removed live is not misreported as expired. - All expiry arithmetic saturates at MaxInt64 (satAdd), so huge Durations mean "effectively forever" instead of wrapping negative into born-expired (or, in Clean, evict-everything). - Doc fixes: NaN / non-comparable key behavior, MaxErrorAge's no-throttling-with-stales caveat, MaxIdleAge best-effort note, MaxStaleAge without MaxAge, autoclean pinning the cache, terminal- tombstone contract on trieEntry, StopAutoClean/Set wording. All four new regression tests fail against the reverted implementation (the redrive test by double-close panic); the full -race suite, the linearizability checker, and the prior round's hammers pass on amd64, with vet clean on linux/386.
Third audit round, independently verified (interleaving traces of every expires-word consumer against the new sentinel; mutation check: the regression tests fail against the reverted implementation): - Clean now claims an entry's expiry word with a dedicated sentinel (expiredClaimed, -2) instead of -1. -1 is the born-expired sentinel, which newStale deliberately re-anchors at now — so a Get snapshotting a stale in the window between Clean's claim and its tombstone CAS would resurrect the certified-dead value as a fresh Stale for up to another MaxStaleAge. entMaybeNewStale treats a claimed word as "no stale". Pinned deterministically by performing Clean's claim by hand (TestClean_ClaimWindowDoesNotResurrectDeadStale) plus a probabilistic hammer against the real Clean. - Born-expired entries (MaxAge(0) / MaxErrorAge(0) store -1) are now reclaimable as soon as their stale dies: their grace window was computed as satAdd(-1, maxStaleAge), anchored at the process epoch, so stale-less dead entries (error churn under MaxErrorAge(0)) could not be reclaimed until process uptime exceeded MaxStaleAge — and never, for huge stale ages. Reclaim is gated on TryGet visibility: value expired and stale dead. - Idle extension re-samples the clock immediately before publishing, shrinking the revive-after-expiry window from "whole deschedule since the expiry check" to a few instructions; the residual window is documented as best effort on MaxIdleAge. - Doc accuracy: Swap/Set "load is canceled" -> "load's result is discarded (the miss function is not interrupted)"; TryGet docs now cover expired-with-valid-stale returning Stale; MaxStaleAge accepts any negative as "forever" and documents the store-anchored stale window of Expired-but-unrefreshed keys; Item/Set Clean docs match Cache.Clean's actual rules.
…orward Fourth audit round (expiry/stale state machine), independently verified: all seven of the audit's repro tests were confirmed red against b200842 and are green now; eleven regression tests are ported into the suite. The round found three root causes composing into five bugs, two High: - The stale field conflated two meanings: the refresh companion (the previous generation, served while a replacement loads or errors) and a value's own post-expiry grace window. Only Set/Swap values got a self window (finalizedLoading aliased the companion to the value); Get- loaded values had none — TryGet missed at the instant of expiry while Get served a stale for the same state — or carried the previous generation, letting TryGet serve an older value than the freshest dead one. Idle extensions moved the expiry but not the frozen window, so long-lived entries died with their promised window already closed. The self window is now derived at read time from the expires word (staleOpen: anchor + MaxStaleAge, where anchor is the expiry, the birth for born-expired values, or the Expire timestamp), used identically by entGet, loadingTryGet, Swap's classification, and Clean's reclaim gate, so every read and refresh agrees by construction. The stale field reverts to a pure refresh companion (in-flight and errored reads only); finalizedLoading stores none. - Born-expired words (-1) destroyed the only datum that could anchor their window: entMaybeNewStale re-anchored a certified-dead value's stale at every Get-driven refresh, serving values of unbounded age as Stale (TryGet: Miss, Get: Stale, no race required). The word now carries the negated birth (-now(); the epoch pad keeps it clear of expiredClaimed), and the window anchors at the birth everywhere. - Expire blindly stored now()-1, moving already-expired words forward — re-anchoring dead stale windows (the invalidation API extending data lifetime) and overwriting Clean's claim sentinel inside its claim window. Expire now CASes and only ever moves the word backward; expiring an expired entry is a no-op, documented. Documented, not changed: Delete/Swap may report a concurrently finalizing load as the removed value (closing it costs more than the return triple is worth), and under MaxErrorAge(0) collapsed waiters on an erroring load re-drive serially (the literal "errors are never cached" contract; sharing completed errors is a design change deserving its own decision). Plus doc accuracy: MaxStaleAge's uniform anchoring, MaxIdleAge shortening on access, errors-cached-forever default, and the expires-word grammar at expiredClaimed.
Fifth audit round (the read-time stale window machinery), one Low: - A read assembled its classification from two separate loads: the expires word, then the clock. A reader descheduled between them while an idle extension landed classified the dead pre-extension word at a fresh clock — TryGet reported Stale (with MaxStaleAge) or Miss (without) for an entry that was live at every instant of its call, and a later TryGet returned Hit with no intervening write, a history no sequential order explains. Get healed itself by re-deriving in its slow loop; TryGet surfaced it. expiresNow now pairs the word with the clock coherently: load, sample, confirm the word unchanged, re-deriving on movement. Each retry re-loads before re-sampling, preserving the ordering that classifies a concurrent Expire as expired rather than extending over it. Both readers use it; Range's and Delete's caller-pinned clocks predate the word load, so that inverted pairing is already conservative. The repros hold the deschedule window open via expiresPairingHook, an unexported nil-by-default test hook (the trie hashFn precedent), and were confirmed red against the pre-fix code. - The finalized classification was hand-mirrored across loadingTryGet, entGet, and Swap's displaced-value report — a divergence class that already bit once. classifyFinalized is now the one shared ladder; entGet's waited-courtesy precheck stays at its call site, and the idle-extension epilogue keys off the classified state. Doc precision: an Expire racing the entry's natural death can land just after it and trim (never grow) the dead value's remaining stale window; non-positive MaxIdleAge is ignored entirely. Plus five pins for cells the suite did not assert: Swap displacing a born-expired value (both sides of the birth-anchored window), Expire on an expired-but-in-window word (byte-identical no-op, positive and negated-birth words), Clean keeping an expired error whose companion is still servable, exact-nanosecond window boundaries across staleOpen, newStale, and stale.expired, and the safe (backward) direction of the paired read.
Sixth audit round (full re-read of cache.go and trie.go plus every test file, all six scenario classes re-traced), one Low: - tryCAS sampled the matched value's liveness once, then allocated the replacement loading, then published with the pointer CAS — the whole allocation (the widest deschedule risk in the call) sat between the liveness sample and the publish. A CAS caller parked there while the value's TTL lapsed still published: TryGet had already certified Miss, and no sequential order explains TryGet=Miss followed by CAS=true (CAS-first requires the TryGet to see the freshly installed value; TryGet-first requires the CAS to fail on an expired match). The window cannot be closed outright: a successful swap takes effect at the pointer CAS, and the clock cannot be read at that exact instant — pre-claiming the expiry word would instead delay observable expiry, and post-CAS rollback can race a reader that already observed the new value. So this follows the MaxIdleAge precedent: tryCAS now re-validates immediately before every CAS attempt, shrinking the window from "alloc + deschedule" to a few instructions, and the residue is documented on CompareAndSwap and CompareAndDelete (the mismatch direction stays exact). The repro holds the deschedule open via casPublishHook (nil-by-default, the expiresPairingHook precedent) and was confirmed red against the pre-fix ordering for both CAS and CAD. The linearizability checker deliberately runs an expiry-free config (expiry is a clock transition, not an operation), so the hook pin is the vehicle for this finding, as in rounds three through five. Doc accuracy: AutoCleanInterval now states that with a negative MaxStaleAge the goroutine is not started at all (Clean would be a no-op; pinned by TestAutoClean_DisabledWhenStaleAgeNegative); Item's CompareAndSwap/CompareAndDelete point at the Cache caveat. Everything else re-verified clean this round: tombstone terminality under the bucket lock, the claim sentinel against every expires-word consumer, derived stale windows being absolute intervals (a late snapshot cannot extend one), prune lock ordering (child linked implies parent not dead), Get's pin-and-re-derive slow path, and the WaitGroup/mutex finalize-once protocol.
Seventh audit round (full re-read of cache.go, trie.go, and every test file; all six scenario classes re-traced), two Lows, both doc accuracy on the same caveat surface: - The sixth round's commit message claims Item's CompareAndSwap and CompareAndDelete "point at the Cache caveat", but that change never made it into the diff: the Item docs carried neither the stale-values-not-considered sentence nor the best-effort expiry caveat. Both now point at the Cache method docs, following the Item.Get "See Cache.Get" precedent. - The Cache caveat described the best-effort residue as "a value whose TTL lapses while the swap is in flight"; the same instructions-wide window also lets a concurrent Expire's kill slip through, and that direction is otherwise unexplainable (Expire-first requires the CAS to fail on the dead match; CAS-first requires the Expire to kill the swapped-in value, which a later TryGet=Hit contradicts). The window is irreducible for Expire exactly as for TTL lapse — Expire writes the expiry word while the publish is a pointer CAS, and the sixth round already weighed and rejected pre-claiming the word and post-CAS rollback — so the caveat now says expiry generally: TTL or Expire. The wide (allocation) side of the Expire direction is exact, not best effort: the pre-publish re-validation loads the expiry word fresh, so an Expire landing while the caller is parked there is caught. Pinned by TestTryCAS_RevalidationCatchesExpire via casPublishHook (the TTL sibling only lapses a deadline), confirmed red against the pre-revalidation ordering for both CAS and CAD. Everything else re-verified clean this round: the claim sentinel against Swap/Delete/Expire/extension (including a benign by-value ABA on the expires word: a claim landing after extend-then-Expire returns the word to its evaluated value is semantically identical to the no-race claim), tombstone-shell unlink-then-recreate under every writer path (never two live entries for one key), walk's single-visit-per-key claim under expand and prune, Clear against in-flight writers on the old root, the error/companion classification ladder, and Clean's conservative companion gate for err-free values (provably never outlives the derived self-window gate).
Eighth audit round (full re-read of cache.go, trie.go, and every test
file; all six scenario classes re-traced), one Low:
- casMatches sampled the clock before loading the expiry word
(!l.expired(now())): now() was evaluated as the argument, then expired
loaded the word. A CAS caller descheduled between the two judged a
fresh word against a stale clock, so a kill that fully completed while
it was parked went unseen: for Expire, the stored word n_e-1 compares
above the stale clock; for a lapsing TTL, the fixed word never falls
below a clock sampled before the lapse. Either way the publish lands
against a value TryGet already certifies as Miss, and no sequential
order explains kill, TryGet=Miss, CAS=true, TryGet=Hit(new) (a
CAS-first order requires the final TryGet to miss the freshly swapped
value; kill-first orders require the CAS to fail on the dead match).
The sixth round closed the allocation-side window with the pre-publish
re-validation, but the re-validation itself kept this instructions-wide
clock-vs-word gap open on both kill modes. casMatches now takes its
(word, clock) pair from expiresNow, exactly as every read path does:
the clock postdates the word it judges, so any kill that completes
before the pair is taken is always seen, and an idle extension racing
the pair re-derives instead of spuriously failing a continuously-live
match (the round-five misclassification class, which is why the
simpler unconfirmed word-then-clock ordering was not an option). The
residue is now exactly the instructions between the re-validation and
the pointer CAS, which is what the CompareAndSwap / CompareAndDelete
docs already document as best effort.
Pinned by TestTryCAS_RevalidationPairsWordAndClock via a counting
expiresPairingHook (parks the re-validation's paired read, the call's
second), covering {CompareAndSwap, CompareAndDelete} x {TTL lapse,
Expire}; all four subtests confirmed red against the pre-fix ordering
with the hook held at the old deschedule point. The linearizability
checker deliberately runs expiry-free, so the hook pin is the vehicle,
as in rounds three through seven.
Everything else re-verified clean this round: every other expiry-word
read is either paired (entGet, TryGet), CAS-guarded (Expire, Clean's
claim, idle extension), pinned to a deliberately conservative clock
(Range, Delete, Swap's displaced-value report), or feeds a retry loop
that re-derives before acting (Get's replace gate); the claim sentinel,
tombstone terminality, walk single-visit, and prune lock ordering all
re-traced clean. Full -race suite green; vet clean on native and
linux/386.
Doc accuracy, from the eighth audit round: Cache.Range promised "calls fn for every cached value" with no concurrency caveat. The trie walk takes no snapshot — the old read/dirty implementation ranged a promoted read map, which did fix the key set at the start of the call, so the port silently weakened Range's consistency while the doc stayed one line. State the actual contract, in sync.Map.Range's terms: no key is visited more than once (pinned by TestConcurrentRange's seen-twice check under churn), a concurrently mutated or expiring key may be reflected from any point during the call (Range classifies every entry at one clock pinned at entry), and Range does not block other methods — fn may re-enter the cache (pinned by TestMapRangeNestedCall). Set.Range points at the Cache caveat, following the Item "See Cache.Get" precedent.
Doc accuracy, from the ninth audit round: the Get family's stale sentence covered only the errored case — 'if stale values are enabled, the currently cached value has an error, and there is an unexpired stale value, this returns the stale value and no error' — wording inherited verbatim from the read/dirty implementation. The third round taught TryGet's doc that an expired value with an open stale window is returned as Stale, and the fourth round made that window a first-class read state (derived from the expiry word, agreed on by every read and refresh), but Get's doc never learned it: a Get on an expired value with an open window returns the value as Stale immediately while driving the refresh, and it re-runs the miss function for an expired key in the first place — neither of which the doc stated. KeyState's Stale doc and MaxStaleAge already describe both kill modes, so Get was the one surface left claiming the narrower contract. Get, Item.Get, and Set.Get now say it the way TryGet does: the miss runs if the key is not yet cached or the cached value has expired, and an unexpired stale — the value errored, or expired with its window open — is returned with no error. Behavior unchanged and already pinned: TestExpires asserts Get returns (v, nil, Stale) for an expired-with-window value, and TestGet_ConcurrentStaleDuringInFlight pins the piggybacked flavor.
Doc accuracy, from the ninth audit round: MaxErrorAge claimed 'with neither option set, a load error is cached forever and never retried'. The fourth round wrote that sentence weighing only MaxAge; it never weighed MaxIdleAge, whose initial-TTL rule in newExpires applies to every entry — the err branch falls through to the same ttl derivation, so with only MaxIdleAge set a load error is cached for the idle age and then retried, not cached forever (verified live: error Hit within the window, Miss after, next Get re-drives). The behavior is the sensible one — an idle-config cache should not pin errors permanently — so the doc moves to the code: the default is MaxAge, or the idle age when only MaxIdleAge is set (errors share the initial TTL but are never idle-extended; extension is gated on err == nil everywhere). The forever claim is rescoped to neither-MaxAge-nor-MaxIdleAge. Pinned by TestMaxErrorAge_DefaultsToIdleInitialTTL: cached error is a Hit within the idle window, Miss after it lapses, and the next Get drives a fresh load. The no-extension half was already pinned for the MaxAge config by TestMaxIdleAge/no_extend_errors; the new test's early probe runs immediately after the load so even a regressed extension could not outlive the post-window probe.
Doc accuracy, from the tenth audit round: Get's first sentence claimed
the miss function runs "if the key is not yet cached or the cached
value has expired" — an exhaustive-reading condition that misses the
third trigger. A cached error whose own age has not lapsed is replaced
and re-driven whenever a valid stale exists: the slow path's
replacement filter passes any finalized errored loading regardless of
expiry, exactly as MaxErrorAge documents ("a cached error suppresses
repeat queries only when there is no stale value to return ... always
drives a refresh, regardless of the error's remaining age"). The ninth
round's rewrite (052940d) brought the expired-with-stale return into
Get's doc but enumerated only the two run triggers, leaving Get
contradicting MaxErrorAge for live errors.
Get, Item.Get, and Set.Get now list the third trigger, with the
MaxErrorAge rationale parenthesized on Cache.Get. Behavior unchanged
and already pinned: TestExpires's stale-while-error block returns
Stale from a Get whose miss function is concurrently executing
(blocked on a channel) while the cached error is unexpired (no error
age is configured there, so the error would otherwise be cached and
served forever).
Doc accuracy, from the tenth audit round: Clear's doc said only "deletes all keys, resetting it to an empty state", leaving its interplay with in-flight loads underivable. Clear swaps the trie root, so a load in flight at the Clear keeps running against the detached tree: its result still reaches every Get waiting on it (the waiters hold the detached entry and its loading directly), but it is never cached — a later Get for the same key misses and drives a fresh, independent load. This is the per-key Delete caveat applied wholesale, but nothing said Clear behaves like a bulk Delete rather than, say, canceling loads or letting their results re-land in the emptied cache. Cache.Clear now states it; Item.Clear and Set.Clear point at it, following the Item.CompareAndSwap "see Cache.X" precedent. Pinned by TestClear_DoesNotCacheInFlightLoads, the Clear sibling of TestDelete_DuringInFlightLoad: with the miss held open across the Clear, the waiter still receives the load's value, and the post-Clear Get re-runs the miss instead of finding the detached result.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the read/dirty/promote backing of
cache.Cachewith a concurrenthash-trie modeled on Go's
internal/sync.HashTrieMap, implemented entirely inuserspace (
hash/maphash+unsafe.Sizeof(uintptr(0)); nointernal/abiorinternal/goarch). Public API is unchanged; all documented semantics arepreserved, with two behavior shifts noted below.
The port is followed by fifteen hardening commits: an adversarial audit run
until it converged (two consecutive passes with zero behavioral findings)
fixed 25 behavioral bugs — every one reproduced red against the pre-fix code —
plus a series of doc inaccuracies. The broad strokes are below; each commit
message is self-sufficient.
Why
per-op cost is decent on average but adversarial workloads pay heavily for
the copy. The trie spreads cost across per-bucket locks and eliminates the
promotion cycle.
sync.Mapmoved to a hash trie in Go 1.24, so the old inspirationin this package's docs was stale.
the storage layer and carry over unchanged.
The port (first six commits)
bump to go 1.24, fix stale-at-epoch, tighten Clean— baselinecleanup. Go modules had claimed 1.18 but the code required 1.19+
(
atomic.Pointer[T],atomic.Int64,atomic.Uint32).newStalewasproducing a stale born at the Unix epoch when the main entry's expiry was
<= 0— now bases onnow().Cleancallse.del()directly instead ofc.Delete(k)sincec.eachhas already promoted.expand test suite; fix data race in Get's post-miss return—drives the old implementation to 100% statement coverage first, so the
port has a behavioral oracle. The new race-targeted tests caught a real
data race in the old code:
Cache.Getreadl.vwith no happens-beforeedge when a concurrent
Swaphad replacede.pwith its own loading.add concurrent hash-trie in isolation— newcache/trie.go+unit tests. Lock-free loads;
LoadOrStore/Deleteunder a per-bucketmutex; bottom-up hand-over-hand pruning; root atomically swapped on
Clear; overflow chains for full hash collisions.back Cache with the hash-trie; retire read/dirty machinery—Cacheholds atrie[K, loading[V]]instead ofr/mu/dirty/misses/pd.Deletephysically removes viadeleteEntryIf; keys staypromptly GC-eligible (
TestIssue40999).README: rewrite for the trie-backed cache— new design blurb,fresh benchmark numbers (table below).
port the stdlib sync.Map tests we were missing—TestConcurrentClear,TestMapClearOneAllocation,TestMapRangeNoAllocations,BenchmarkClear.Hardening (remaining fifteen commits)
tombstone resurrection losing a stored value, and Clean evicting newborn
entries — all hammer-reproduced; plus
CompareAndDeleteleaking tombstonesand
NewSet/NewItemnever starting the autoclean goroutine.extension could silently overwrite
Expireor resurrect expired values;Cleancould discard a racing extension, evict entries whose stale wasstill servable, or — through its claim window — resurrect a certified-dead
value as a fresh Stale. Fixed with a CAS-based claim protocol and a
dedicated claim sentinel; expiry math now saturates and the clock is
monotonic.
stalefield conflated the refresh companion with a value's own post-expirygrace window, so
TryGetandGetdisagreed at expiry, older-than-freshestvalues could be served, and born-expired re-anchoring served values of
unbounded age as Stale. Windows are now derived at read time from the
expires word, so every read, refresh, and Clean agree by construction; and
Expirecan no longer move an expiry forward (the invalidation API couldextend data lifetime).
judgment now uses a coherent (expiry word, clock) pair, and
CompareAndSwap/CompareAndDeletere-validate liveness immediately beforepublishing — closing windows where a read could misclassify a
continuously-live entry or a CAS could succeed on a value
TryGetalreadycertified as gone. The irreducible instructions-wide residue is documented
on CAS/CAD as best effort.
Range's no-snapshot contract, theGetfamily'sstale-return and miss-run conditions,
MaxErrorAge's default underMaxIdleAge,Clear's in-flight-load semantics, and the CAS/CAD caveats.Two behaviors were documented as deliberate rather than changed:
Delete/Swapmay report a concurrently finalizing load as the removedvalue, and
MaxErrorAge(0)waiters re-drive serially ("errors are nevercached").
Semantic shifts vs. read/dirty
Both are documented in the code itself:
Rangeloses its promote-snapshot flavor. Previouslyeachpromotedunder
c.muand ranged a fixed key set. Now Range is a lock-free walk;the contract is
sync.Map.Range's, stated onCache.Range.Cleanno longer serializes onc.mu. Iteration is lock-free; per-keyphysical removal takes only the trie's bucket lock.
Verification
Set/Swap/Delete/TryGet/CAS/CADbatches against single keys, verified explainable by areal-time-consistent sequential order. Expiry-adjacent findings are pinned
by deterministic scheduling hooks instead (unexported, nil outside tests).
pruning completeness) at the quiesce point of every concurrent trie test.
pre-fix code; hammer rates for the probabilistic ones are in the commit
messages.
-raceon amd64 and arm64, plus linux/386 without-racefor the32-bit hash-budget path. Coverage ~98.6–99%; the residue is documented
defensive panics.
go test -race ./...green;go vetclean on native andlinux/386.
Benchmarks
Apple M1, Go 1.26,
-count=5 -benchtime=1s,benchstatcomparing thepre-trie baseline to the port commit. (Measured at 7c63a66; the audit rounds
since then added a pre-publish revalidation and paired clock reads on the
CAS/extension paths — worth a
benchstatre-run before merge if those pathsmatter to a consumer.)
Time per op — geomean -35.24%.
Largest wins:
Regressions:
The
*Collisionregressions are intrinsic: the trie physically removes onDelete(required to keep keys GC-eligible perTestIssue40999) so everyDelete takes the bucket lock; pre-trie deferred physical removal to the
promotion cycle.
LoadMostlyMissespays a small fixed cost for the hash walkon every miss.
Allocations.
AdversarialAllocandAdversarialDeletedrop to 0 B/op(pre-trie: 55 and 36 — the cost of dirty-map promotion copying).
SwapMostlyMissesdrops from 4 allocs/op to 2. Remaining per-op allocationson the insert/swap paths are intrinsic: each operation may install a fresh
loading[V]carrying the singleflight machinery (Mutex,WaitGroup,atomic counters) that a plain
sync.Mapdoes not need.