Skip to content

cache: replace read/dirty with a concurrent hash-trie#6

Draft
twmb wants to merge 21 commits into
mainfrom
trie-cache
Draft

cache: replace read/dirty with a concurrent hash-trie#6
twmb wants to merge 21 commits into
mainfrom
trie-cache

Conversation

@twmb

@twmb twmb commented Apr 21, 2026

Copy link
Copy Markdown
Owner

Summary

Replaces the read/dirty/promote backing of cache.Cache with a concurrent
hash-trie modeled on Go's internal/sync.HashTrieMap, implemented entirely in
userspace (hash/maphash + unsafe.Sizeof(uintptr(0)); no internal/abi or
internal/goarch). Public API is unchanged; all documented semantics are
preserved, with two behavior shifts noted below.

The port is followed by fifteen hardening commits: an adversarial audit run
until it converged (two consecutive passes with zero behavioral findings)
fixed 25 behavioral bugs — every one reproduced red against the pre-fix code —
plus a series of doc inaccuracies. The broad strokes are below; each commit
message is self-sufficient.

Why

  • The read/dirty design amortizes a write-map promotion across operations; the
    per-op cost is decent on average but adversarial workloads pay heavily for
    the copy. The trie spreads cost across per-bucket locks and eliminates the
    promotion cycle.
  • Upstream sync.Map moved to a hash trie in Go 1.24, so the old inspiration
    in this package's docs was stale.
  • Singleflight, stale values, TTLs, idle-age, and error caching all live above
    the storage layer and carry over unchanged.

The port (first six commits)

  1. 56c981f bump to go 1.24, fix stale-at-epoch, tighten Clean — baseline
    cleanup. Go modules had claimed 1.18 but the code required 1.19+
    (atomic.Pointer[T], atomic.Int64, atomic.Uint32). newStale was
    producing a stale born at the Unix epoch when the main entry's expiry was
    <= 0 — now bases on now(). Clean calls e.del() directly instead of
    c.Delete(k) since c.each has already promoted.
  2. d6e9fc3 expand test suite; fix data race in Get's post-miss return
    drives the old implementation to 100% statement coverage first, so the
    port has a behavioral oracle. The new race-targeted tests caught a real
    data race in the old code: Cache.Get read l.v with no happens-before
    edge when a concurrent Swap had replaced e.p with its own loading.
  3. 8e06ee5 add concurrent hash-trie in isolation — new cache/trie.go +
    unit tests. Lock-free loads; LoadOrStore/Delete under a per-bucket
    mutex; bottom-up hand-over-hand pruning; root atomically swapped on
    Clear; overflow chains for full hash collisions.
  4. 10c3ee7 back Cache with the hash-trie; retire read/dirty machinery
    Cache holds a trie[K, loading[V]] instead of r/mu/dirty/
    misses/pd. Delete physically removes via deleteEntryIf; keys stay
    promptly GC-eligible (TestIssue40999).
  5. 7c63a66 README: rewrite for the trie-backed cache — new design blurb,
    fresh benchmark numbers (table below).
  6. 134488b port the stdlib sync.Map tests we were missing
    TestConcurrentClear, TestMapClearOneAllocation,
    TestMapRangeNoAllocations, BenchmarkClear.

Hardening (remaining fifteen commits)

  • Trie/port races (db5c03d): delete-vs-expand silently skipping a delete,
    tombstone resurrection losing a stored value, and Clean evicting newborn
    entries — all hammer-reproduced; plus CompareAndDelete leaking tombstones
    and NewSet/NewItem never starting the autoclean goroutine.
  • Expiry-word protocol (2f95c76, be44fed, b200842): a concurrent idle
    extension could silently overwrite Expire or resurrect expired values;
    Clean could discard a racing extension, evict entries whose stale was
    still servable, or — through its claim window — resurrect a certified-dead
    value as a fresh Stale. Fixed with a CAS-based claim protocol and a
    dedicated claim sentinel; expiry math now saturates and the clock is
    monotonic.
  • Stale-window state machine (be30581, the two High findings): the
    stale field conflated the refresh companion with a value's own post-expiry
    grace window, so TryGet and Get disagreed at expiry, older-than-freshest
    values could be served, and born-expired re-anchoring served values of
    unbounded age as Stale. Windows are now derived at read time from the
    expires word, so every read, refresh, and Clean agree by construction; and
    Expire can no longer move an expiry forward (the invalidation API could
    extend data lifetime).
  • Linearizable reads and CAS (1afccb4, 0617c23, ac4d8d3): every liveness
    judgment now uses a coherent (expiry word, clock) pair, and
    CompareAndSwap/CompareAndDelete re-validate liveness immediately before
    publishing — closing windows where a read could misclassify a
    continuously-live entry or a CAS could succeed on a value TryGet already
    certified as gone. The irreducible instructions-wide residue is documented
    on CAS/CAD as best effort.
  • Docs (the rest): Range's no-snapshot contract, the Get family's
    stale-return and miss-run conditions, MaxErrorAge's default under
    MaxIdleAge, Clear's in-flight-load semantics, and the CAS/CAD caveats.
    Two behaviors were documented as deliberate rather than changed:
    Delete/Swap may report a concurrently finalizing load as the removed
    value, and MaxErrorAge(0) waiters re-drive serially ("errors are never
    cached").

Semantic shifts vs. read/dirty

Both are documented in the code itself:

  • Range loses its promote-snapshot flavor. Previously each promoted
    under c.mu and ranged a fixed key set. Now Range is a lock-free walk;
    the contract is sync.Map.Range's, stated on Cache.Range.
  • Clean no longer serializes on c.mu. Iteration is lock-free; per-key
    physical removal takes only the trie's bucket lock.

Verification

  • Linearizability checker: concurrent Set/Swap/Delete/TryGet/CAS/
    CAD batches against single keys, verified explainable by a
    real-time-consistent sequential order. Expiry-adjacent findings are pinned
    by deterministic scheduling hooks instead (unexported, nil outside tests).
  • Trie structural validator (hash placement, chain purity, parent pointers,
    pruning completeness) at the quiesce point of every concurrent trie test.
  • Every behavioral fix has a regression test confirmed red against the
    pre-fix code; hammer rates for the probabilistic ones are in the commit
    messages.
  • CI: -race on amd64 and arm64, plus linux/386 without -race for the
    32-bit hash-budget path. Coverage ~98.6–99%; the residue is documented
    defensive panics.
  • At HEAD: go test -race ./... green; go vet clean on native and
    linux/386.

Benchmarks

Apple M1, Go 1.26, -count=5 -benchtime=1s, benchstat comparing the
pre-trie baseline to the port commit. (Measured at 7c63a66; the audit rounds
since then added a pre-publish revalidation and paired clock reads on the
CAS/extension paths — worth a benchstat re-run before merge if those paths
matter to a consumer.)

Time per op — geomean -35.24%.

Largest wins:

Benchmark Pre-trie Trie Δ
AdversarialAlloc 350.8 ns 11.4 ns -96.74%
AdversarialDelete 132.2 ns 6.5 ns -95.10%
SwapMostlyMisses 1032 ns 154 ns -85.11%
LoadOrStoreUnique 1449 ns 592 ns -59.17%
LoadAndDeleteBalanced 10.7 ns 4.4 ns -58.66%
LoadAndDeleteUnique 4.3 ns 1.8 ns -58.59%
LoadOrStoreBalanced 702 ns 318 ns -54.77%
CompareAndDeleteMostlyHits 131 ns 91 ns -30.63%
DeleteCollision 5.1 ns 3.6 ns -29.23%
SwapCollision 338 ns 242 ns -28.47%
CompareAndSwapNoExistingKey 2.5 ns 1.9 ns -22.97%

Regressions:

Benchmark Pre-trie Trie Δ
LoadAndDeleteCollision 7.1 ns 67.6 ns +853.35%
CompareAndSwapCollision 10.1 ns 17.6 ns +73.45%
LoadMostlyMisses 3.4 ns 4.6 ns +33.40%

The *Collision regressions are intrinsic: the trie physically removes on
Delete (required to keep keys GC-eligible per TestIssue40999) so every
Delete takes the bucket lock; pre-trie deferred physical removal to the
promotion cycle. LoadMostlyMisses pays a small fixed cost for the hash walk
on every miss.

Allocations. AdversarialAlloc and AdversarialDelete drop to 0 B/op
(pre-trie: 55 and 36 — the cost of dirty-map promotion copying).
SwapMostlyMisses drops from 4 allocs/op to 2. Remaining per-op allocations
on the insert/swap paths are intrinsic: each operation may install a fresh
loading[V] carrying the singleflight machinery (Mutex, WaitGroup,
atomic counters) that a plain sync.Map does not need.

twmb added 21 commits April 21, 2026 12:23
- Bump go.mod from 1.18 to 1.24. The code has required at least 1.19
  since the switch to generic atomic types (atomic.Pointer[T],
  atomic.Int64, atomic.Uint32). 1.24 also unlocks maps.Copy, which
  replaces the hand-rolled copy loop in promote.
- newStale: when the main entry's expiry is 0 (infinite cache) or -1
  (immediate expiry), base the stale's lifespan on now() instead of
  producing a stale born at the Unix epoch. This was a latent bug:
  Get paths typically regenerated the stale via maybeNewStale before
  it was observed, but the initial stale set by Swap/Set was wrong.
- Clean: call e.del() directly instead of c.Delete(k). c.each already
  promotes when necessary, so every entry the callback sees lives in
  the read map and needs no dirty-map bookkeeping or per-key relock.
- Document that Expire is a no-op on in-flight loads for Cache, Item,
  and Set.
Test coverage climbs from 84.3% to 100% of statements. The new tests
either target concretely uncovered behavior (Item/Set wrappers, autoclean
lifecycle, Clean no-op on MaxStaleAge<0) or target race windows in the
existing read/dirty implementation (Swap-vs-in-flight-Get,
Delete-during-load, Expire-no-op-during-load, maxAge=0 collapsing, the
promote CAS-retry path, Swap observing the promotingDelete sentinel,
Get's unlocked-miss / locked-hit interleaving, CompareAndSwap against a
just-promoted entry). Single -count=1 -race runs land 100% coverage
reliably on this machine; the stress test driving the tightest windows
runs for ~2s against many goroutines.

The Swap-vs-in-flight-Get test caught a real data race (reported by
-race). After Get's final e.get, the original code returned l.v without
synchronizing with any writer: concurrent Swap's defer writes l.v under
l.mu and calls l.wg.Done, but Get was only waiting on whatever loading
e.p pointed to at the time, which could be Swap's replacement (a
different loading). Add l.wg.Wait() before returning l.v to establish
the happens-before edge with whoever finalized l (either setve or
Swap's defer).

New test files:
  - wrappers_test.go: every Item and Set method
  - autoclean_test.go: AutoCleanInterval goroutine lifecycle and
    StopAutoClean idempotency
  - concurrency_test.go: race-targeted tests and the miss/stale
    correctness properties called out in the trie port plan
Introduces cache/trie.go: a generic concurrent hash-trie map modeled on
Go's internal/sync.HashTrieMap, but implemented entirely in userspace
(hash/maphash for the runtime's typed hasher, unsafe.Sizeof(uintptr(0))
for pointer-size, no internal/abi or internal/goarch). The trie is not
yet wired into Cache; that happens in a later commit.

Design:
  - Root is a lazily-allocated indirect node, atomically swapped on
    Clear so readers holding the old root keep operating on it.
  - Interior nodes hold 16 atomic child pointers; leaves hold a key,
    an atomic value slot (*V), and an atomic overflow pointer for
    hash-collision chains.
  - Load is lock-free; LoadOrStore and Delete take the parent's
    per-bucket mutex. Delete prunes empty parents bottom-up while
    holding locks hand-over-hand.
  - expand splits a leaf into a subtree on hash-prefix collision; full
    hash collisions chain through overflow.

Tests (cache/trie_test.go) cover: zero-value trie, basic store/load,
load-or-store semantics with nil-value initialization, single and
collision-chain delete (head and mid-chain), large-key stress (20k keys)
through expand, full hash collisions via a test-only hashFn hook,
concurrent disjoint keys, concurrent contention on the same key,
concurrent delete-prune, range-during-mutation, and the deleteEntry
lock-retry paths under sustained delete contention.

Coverage is 99.1%. The five uncovered statements are defensive panics
at invariant-violation points (tree deeper than the hash has bits, node
type discriminator corrupted, etc.) that cannot fire without memory
corruption; each is documented inline with the reason it is unreachable.

Also:
  - Scale TestMaxIdleAge sleep windows up by ~4x so the timing-based
    assertions tolerate -count=3 -race load. The previous 30ms-within-
    50ms-window spacing had no margin for scheduler delay.
  - Fix TestSwap_CancelsInFlightGet: synchronize on the miss goroutine
    actually completing (via a missDone channel) before asserting the
    miss ran. Previously the test asserted a flag set by a goroutine
    that hadn't been scheduled to run yet under heavy load.
  - Modernize remaining int-range for-loops.
The read/dirty/promote structure is replaced by the concurrent hash-trie
from the previous commit. Public API, stale/TTL semantics, singleflight
collapsing, and CompareAndSwap/Delete behavior are preserved.

Shape of the rewrite:
  - Cache holds a single `trie[K, loading[V]]` instead of r/mu/dirty/
    misses/pd.
  - An `ent[K, V]` type alias (Go 1.24 generic aliases) names the trie's
    concrete entry type so callers can read `*ent[K, V]` instead of
    `*trieEntry[K, loading[V]]` everywhere. All entry-manipulating logic
    (entGet, entTryGet, entDel, entLoad, entMaybeNewStale) is free
    functions over that alias, since Go generics do not permit methods
    on instantiation-specific aliases.
  - Delete physically removes the entry via the trie's deleteEntryIf
    primitive (with a "still tombstoned under lock" predicate to avoid
    clobbering a concurrent Swap resurrection). This lets keys and their
    referents be GC'd, preserving TestIssue40999.
  - Clean walks the trie, tombstones expired entries, and then calls
    deleteEntryIf for each tombstone so the trie's memory footprint does
    not grow unboundedly with delete churn.
  - Range iterates via the trie walk; no snapshot (matches sync.Map and
    the existing Range docstring).
  - Swap's fast path CAS on an existing entry's value slot is preserved
    without any global lock; the slow path goes through loadOrStoreEntry
    and resolves races on the trie's per-bucket mutex.

Added to the trie for Cache's benefit:
  - deleteEntryIf(k, pred): removes only if pred returns true under the
    parent bucket's lock. Callers use this to avoid the
    resurrected-during-delete race.
  - Inline chain removal in deleteEntryIf, retiring the separate
    trieRemoveFromChain helper (it had a "not found" return path that
    became dead after the pred-and-locate restructure).

Two new tests beyond the existing suite:
  - TestGet_ConcurrentStaleDuringInFlight: a second Get observes a
    stale-returning loading that is already in flight from the first Get
    and returns the stale without attempting its own miss.
  - TestGet_StaleRefreshCASRace: 16 concurrent Gets on an expired-with-
    stale key exercise the slow-path CAS-retry that replaces a finalized
    prev loading with a fresh loading carrying a stale snapshot.
  - TestClean_SkipsInFlightLoads: Clean must skip entries whose load has
    not finalized.
  - TestTrie_DeleteRaceAgainstExpand: deleteEntryIf's "slot changed to
    non-entry after lock" branch, triggered by concurrent insert on
    hash-prefix-colliding keys forcing an expand during delete.

Full suite passes under -race; coverage settles at 98.6%-99.0% of
statements across independent runs. The remaining uncovered lines are
documented defensive panics (invariant violations that require
corruption to fire) and two tightly-timed race branches that need
scheduler interleavings we cannot force from userspace without test
hooks.

Benchmark comparison against the pre-trie baseline is deferred to the
cleanup commit.
Update the design blurb (no more read/dirty promotion), the coverage
footnote (97% → ~99% plus a note on the defensive-panic residue), and
the microbenchmark section (fresh numbers from the current code on Apple
M1; the old table compared sync.Map to the pre-trie cache and is no
longer meaningful). Flag that per-op allocations on the insert/swap
paths are intrinsic — the singleflight machinery (loading[V] with a
Mutex, WaitGroup, and atomic counters) is something sync.Map doesn't
carry, and carrying it costs a single heap allocation per inserted key.
Add three tests and one benchmark that exist in sync/map_test.go and
sync/map_bench_test.go but were not in our copy:

  - TestConcurrentClear: 10 writers, 10 readers, and 10 Clear goroutines
    run concurrently. Correctness is no panic and no phantom keys (keys
    we never Stored appearing in Range after the dust settles).
  - TestMapClearOneAllocation: asserts Clear allocates ≤1 time. Cache's
    Clear replaces the trie root with one fresh indirect node, matching
    the sync.Map guarantee.
  - TestMapRangeNoAllocations: asserts Range does not allocate. The
    benchmarks already showed 0 B/op but an explicit AllocsPerRun test
    guards against a regression in the Range closure wiring.
  - BenchmarkClear: Clear throughput.

Skipped from stdlib (intentionally):

  - TestMapMatchesDeepCopy: a second mapInterface oracle (DeepCopyMap).
    Our TestCacheMatchesRWMutex already covers the semantics; the stdlib
    keeps a second oracle mainly to catch sync.Map-specific promotion
    bugs we no longer have.
  - TestMapMatchesHashTrieMap: stdlib's internal hash trie as the
    oracle. We are the hash trie.
  - TestHashTrieMap{BadHash,TruncHash}: pathological-hash tests. Our
    TestTrie_FullHashCollision{,DeleteHead} already exercises full
    collisions via the hashFn hook.
  - BenchmarkHashTrieMapLoad[Small|Large], LoadOrStore[Large]: trie-
    level microbenchmarks. Redundant with the cache-level benchmarks
    since Cache is thin over trie.
An audit against internal/sync.HashTrieMap (which trie.go is modeled on)
and the prior read/dirty implementation found three interleaving bugs,
each reproduced with hammer tests before fixing:

1) trie.deleteEntryIf gave up (returned nil) when the slot it walked to
   was expanded into an indirect node between its unlocked walk and its
   lock, silently skipping a delete of a present key (3,109/200k rounds).
   Upstream's find() retries from the root in this case; we now do too.

2) Tombstoned entries (nil value slot) could be resurrected by Swap/Get
   via a bare CAS(nil, loading) that nothing serialized against Delete's
   and Clean's pred-then-unlink, so a concurrent Set/Swap could land in
   an entry that was then physically unlinked: the swap reported success
   (Miss = "stored") and the value vanished (66/200k rounds). The old
   read/dirty code was immune via the promotingDelete sentinel, which
   made tombstone removal a single CAS; the port lost that atomicity.

   A nil value slot is now terminal: no code path ever stores over it.
   Writers that find a tombstone unlink the dead entry and re-create the
   key as a fresh entry, which makes deleteEntryIf's nil check stable
   under the bucket lock. Delete also now derives its return value from
   the loading it actually displaced, so the value reported deleted is
   the value removed.

3) Get created its entry with loadOrStoreEntry(k, nil) and CAS'd the
   loading in afterwards; in that window a fresh, never-expired entry
   was indistinguishable from a tombstone and Clean would evict it
   (1/500k rounds), also breaking request collapsing for that key. Get
   now creates entries with the loading already in place; a linked entry
   never has a nil slot except as a tombstone.

Also: CompareAndDelete now physically unlinks like Delete (tombstones no
longer leak until Clean); NewSet/NewItem now actually honor
AutoCleanInterval (Set.StopAutoClean previously stopped a goroutine that
was never started) and Item gains Clean/StopAutoClean; onlyFinalized is
collapsed into finalized (the third state it guarded against retired
with the read/dirty machinery); stale comments describing promote/read/
dirty are rewritten; TestTrie_DeleteRaceAgainstExpand now asserts the
delete succeeds (it previously only asserted no-crash, exercising bug 1
without catching it); regression tests added for 2 and 3.
The expand-race test's hash literals overflowed uintptr on 32-bit
platforms, so 'go vet' / compiling the tests failed there (caught by a
GOARCH=386 cross-vet; the library itself was fine). Build the constants
with shifts off ptrBits so the test exercises the same top-4-bits
collision shape at either word size.
Expire could be silently overwritten by a concurrent idle extension: the
extension stored a fresh expiry computed from a clock sampled BEFORE
loading the expiry, so an Expire whose now()-1 landed between the two
samples looked unexpired and was extended over. Extension now (a) loads
the expiry before sampling the clock, so any Expire value it observes is
necessarily in its past, and (b) CASes against the observed expiry, so
an Expire landing between the load and the write wins. Notably, fix (b)
alone was not enough — the regression hammer caught the residual (a)
window at ~1/2000 rounds.

CompareAndSwap/CompareAndDelete now agree with TryGet about what exists:
expired entries and errored entries (whose v is just the miss function's
placeholder return) no longer match.

Get's docs now state the miss function's panic behavior (it runs on an
internal goroutine; a panic crashes the process) and the same-key
re-entrancy deadlock.

New verification layers:

- A linearizability checker: batches of concurrent Set/Swap/Delete/
  TryGet/CAS/CAD against single keys, verified to be explained by some
  real-time-consistent sequential ordering. This catches atomicity bugs
  generically rather than per-interleaving (the prior tombstone-
  resurrection bug class fails this checker directly).
- A trie structural validator (hash placement, overflow chain purity,
  parent pointers, dead flags, pruning completeness) run at the quiesce
  point of every concurrent trie test.
- GitHub Actions CI: -race on amd64 and arm64 (weaker memory ordering
  surfaces reordering bugs x86's TSO hides), plus linux/386 without
  -race (unsupported there) to exercise the 32-bit hash-budget path.

The README test-suite paragraph is rewritten to describe the actual
layers and the measured 98% coverage.
Implements the second audit round (independent session), verified by a
review pass with full interleaving traces and mutation-tested regressions:

- now() is monotonic (time.Since of a process epoch, padded 1s clear of
  the 0 sentinel). This makes the load-expiry-then-sample-clock ordering
  argument airtight rather than dependent on wall-clock sanity, and
  immunizes TTLs against NTP steps. Trade-off (suspend pauses aging)
  documented on epoch.

- Idle extension can no longer resurrect expired values: the extension
  CAS is gated on !expired, fixing the waited-expired path where a
  MaxAge(0) collapse value was revived for the idle window; and the
  driving Get's tail no longer extends (it publicly returns Miss, not a
  Hit, and extension was silently rewriting the MaxAge-born TTL).

- Clean and idle extension now contend on the expiry word: Clean claims
  it (CAS to the expired sentinel) before tombstoning, so a racing
  extension either wins (entry kept) or loses the same way it loses to
  Expire, instead of being silently thrown away by the eviction.

- Clean no longer evicts entries whose stale is still being served: for
  born-expired entries (MaxAge(0)/MaxErrorAge(0) store -1) the grace
  window anchored at the epoch rather than the entry. Pre-existing bug
  found while verifying the Clean change.

- Get's replace decision is re-derived from prev itself, and the stale
  piggyback is pinned to the slot it read, so a load that finalizes live
  in the window between Get's slot read and entGet's cannot be demoted
  to a stale and redundantly re-driven.

- Delete and Swap classify the displaced value with a clock sampled at
  the displacing CAS, not after, so a value removed live is not
  misreported as expired.

- All expiry arithmetic saturates at MaxInt64 (satAdd), so huge
  Durations mean "effectively forever" instead of wrapping negative
  into born-expired (or, in Clean, evict-everything).

- Doc fixes: NaN / non-comparable key behavior, MaxErrorAge's
  no-throttling-with-stales caveat, MaxIdleAge best-effort note,
  MaxStaleAge without MaxAge, autoclean pinning the cache, terminal-
  tombstone contract on trieEntry, StopAutoClean/Set wording.

All four new regression tests fail against the reverted implementation
(the redrive test by double-close panic); the full -race suite, the
linearizability checker, and the prior round's hammers pass on amd64,
with vet clean on linux/386.
Third audit round, independently verified (interleaving traces of every
expires-word consumer against the new sentinel; mutation check: the
regression tests fail against the reverted implementation):

- Clean now claims an entry's expiry word with a dedicated sentinel
  (expiredClaimed, -2) instead of -1. -1 is the born-expired sentinel,
  which newStale deliberately re-anchors at now — so a Get snapshotting
  a stale in the window between Clean's claim and its tombstone CAS
  would resurrect the certified-dead value as a fresh Stale for up to
  another MaxStaleAge. entMaybeNewStale treats a claimed word as "no
  stale". Pinned deterministically by performing Clean's claim by hand
  (TestClean_ClaimWindowDoesNotResurrectDeadStale) plus a probabilistic
  hammer against the real Clean.

- Born-expired entries (MaxAge(0) / MaxErrorAge(0) store -1) are now
  reclaimable as soon as their stale dies: their grace window was
  computed as satAdd(-1, maxStaleAge), anchored at the process epoch,
  so stale-less dead entries (error churn under MaxErrorAge(0)) could
  not be reclaimed until process uptime exceeded MaxStaleAge — and
  never, for huge stale ages. Reclaim is gated on TryGet visibility:
  value expired and stale dead.

- Idle extension re-samples the clock immediately before publishing,
  shrinking the revive-after-expiry window from "whole deschedule since
  the expiry check" to a few instructions; the residual window is
  documented as best effort on MaxIdleAge.

- Doc accuracy: Swap/Set "load is canceled" -> "load's result is
  discarded (the miss function is not interrupted)"; TryGet docs now
  cover expired-with-valid-stale returning Stale; MaxStaleAge accepts
  any negative as "forever" and documents the store-anchored stale
  window of Expired-but-unrefreshed keys; Item/Set Clean docs match
  Cache.Clean's actual rules.
…orward

Fourth audit round (expiry/stale state machine), independently verified:
all seven of the audit's repro tests were confirmed red against b200842
and are green now; eleven regression tests are ported into the suite.

The round found three root causes composing into five bugs, two High:

- The stale field conflated two meanings: the refresh companion (the
  previous generation, served while a replacement loads or errors) and a
  value's own post-expiry grace window. Only Set/Swap values got a self
  window (finalizedLoading aliased the companion to the value); Get-
  loaded values had none — TryGet missed at the instant of expiry while
  Get served a stale for the same state — or carried the previous
  generation, letting TryGet serve an older value than the freshest dead
  one. Idle extensions moved the expiry but not the frozen window, so
  long-lived entries died with their promised window already closed.

  The self window is now derived at read time from the expires word
  (staleOpen: anchor + MaxStaleAge, where anchor is the expiry, the
  birth for born-expired values, or the Expire timestamp), used
  identically by entGet, loadingTryGet, Swap's classification, and
  Clean's reclaim gate, so every read and refresh agrees by
  construction. The stale field reverts to a pure refresh companion
  (in-flight and errored reads only); finalizedLoading stores none.

- Born-expired words (-1) destroyed the only datum that could anchor
  their window: entMaybeNewStale re-anchored a certified-dead value's
  stale at every Get-driven refresh, serving values of unbounded age as
  Stale (TryGet: Miss, Get: Stale, no race required). The word now
  carries the negated birth (-now(); the epoch pad keeps it clear of
  expiredClaimed), and the window anchors at the birth everywhere.

- Expire blindly stored now()-1, moving already-expired words forward —
  re-anchoring dead stale windows (the invalidation API extending data
  lifetime) and overwriting Clean's claim sentinel inside its claim
  window. Expire now CASes and only ever moves the word backward;
  expiring an expired entry is a no-op, documented.

Documented, not changed: Delete/Swap may report a concurrently
finalizing load as the removed value (closing it costs more than the
return triple is worth), and under MaxErrorAge(0) collapsed waiters on
an erroring load re-drive serially (the literal "errors are never
cached" contract; sharing completed errors is a design change deserving
its own decision). Plus doc accuracy: MaxStaleAge's uniform anchoring,
MaxIdleAge shortening on access, errors-cached-forever default, and the
expires-word grammar at expiredClaimed.
Fifth audit round (the read-time stale window machinery), one Low:

- A read assembled its classification from two separate loads: the
  expires word, then the clock. A reader descheduled between them while
  an idle extension landed classified the dead pre-extension word at a
  fresh clock — TryGet reported Stale (with MaxStaleAge) or Miss
  (without) for an entry that was live at every instant of its call,
  and a later TryGet returned Hit with no intervening write, a history
  no sequential order explains. Get healed itself by re-deriving in its
  slow loop; TryGet surfaced it.

  expiresNow now pairs the word with the clock coherently: load,
  sample, confirm the word unchanged, re-deriving on movement. Each
  retry re-loads before re-sampling, preserving the ordering that
  classifies a concurrent Expire as expired rather than extending over
  it. Both readers use it; Range's and Delete's caller-pinned clocks
  predate the word load, so that inverted pairing is already
  conservative. The repros hold the deschedule window open via
  expiresPairingHook, an unexported nil-by-default test hook (the trie
  hashFn precedent), and were confirmed red against the pre-fix code.

- The finalized classification was hand-mirrored across loadingTryGet,
  entGet, and Swap's displaced-value report — a divergence class that
  already bit once. classifyFinalized is now the one shared ladder;
  entGet's waited-courtesy precheck stays at its call site, and the
  idle-extension epilogue keys off the classified state.

Doc precision: an Expire racing the entry's natural death can land just
after it and trim (never grow) the dead value's remaining stale window;
non-positive MaxIdleAge is ignored entirely.

Plus five pins for cells the suite did not assert: Swap displacing a
born-expired value (both sides of the birth-anchored window), Expire on
an expired-but-in-window word (byte-identical no-op, positive and
negated-birth words), Clean keeping an expired error whose companion is
still servable, exact-nanosecond window boundaries across staleOpen,
newStale, and stale.expired, and the safe (backward) direction of the
paired read.
Sixth audit round (full re-read of cache.go and trie.go plus every test
file, all six scenario classes re-traced), one Low:

- tryCAS sampled the matched value's liveness once, then allocated the
  replacement loading, then published with the pointer CAS — the whole
  allocation (the widest deschedule risk in the call) sat between the
  liveness sample and the publish. A CAS caller parked there while the
  value's TTL lapsed still published: TryGet had already certified Miss,
  and no sequential order explains TryGet=Miss followed by CAS=true
  (CAS-first requires the TryGet to see the freshly installed value;
  TryGet-first requires the CAS to fail on an expired match).

  The window cannot be closed outright: a successful swap takes effect
  at the pointer CAS, and the clock cannot be read at that exact
  instant — pre-claiming the expiry word would instead delay observable
  expiry, and post-CAS rollback can race a reader that already observed
  the new value. So this follows the MaxIdleAge precedent: tryCAS now
  re-validates immediately before every CAS attempt, shrinking the
  window from "alloc + deschedule" to a few instructions, and the
  residue is documented on CompareAndSwap and CompareAndDelete (the
  mismatch direction stays exact). The repro holds the deschedule open
  via casPublishHook (nil-by-default, the expiresPairingHook precedent)
  and was confirmed red against the pre-fix ordering for both CAS and
  CAD. The linearizability checker deliberately runs an expiry-free
  config (expiry is a clock transition, not an operation), so the hook
  pin is the vehicle for this finding, as in rounds three through five.

Doc accuracy: AutoCleanInterval now states that with a negative
MaxStaleAge the goroutine is not started at all (Clean would be a
no-op; pinned by TestAutoClean_DisabledWhenStaleAgeNegative); Item's
CompareAndSwap/CompareAndDelete point at the Cache caveat.

Everything else re-verified clean this round: tombstone terminality
under the bucket lock, the claim sentinel against every expires-word
consumer, derived stale windows being absolute intervals (a late
snapshot cannot extend one), prune lock ordering (child linked implies
parent not dead), Get's pin-and-re-derive slow path, and the
WaitGroup/mutex finalize-once protocol.
Seventh audit round (full re-read of cache.go, trie.go, and every test
file; all six scenario classes re-traced), two Lows, both doc accuracy
on the same caveat surface:

- The sixth round's commit message claims Item's CompareAndSwap and
  CompareAndDelete "point at the Cache caveat", but that change never
  made it into the diff: the Item docs carried neither the
  stale-values-not-considered sentence nor the best-effort expiry
  caveat. Both now point at the Cache method docs, following the
  Item.Get "See Cache.Get" precedent.

- The Cache caveat described the best-effort residue as "a value whose
  TTL lapses while the swap is in flight"; the same instructions-wide
  window also lets a concurrent Expire's kill slip through, and that
  direction is otherwise unexplainable (Expire-first requires the CAS
  to fail on the dead match; CAS-first requires the Expire to kill the
  swapped-in value, which a later TryGet=Hit contradicts). The window
  is irreducible for Expire exactly as for TTL lapse — Expire writes
  the expiry word while the publish is a pointer CAS, and the sixth
  round already weighed and rejected pre-claiming the word and post-CAS
  rollback — so the caveat now says expiry generally: TTL or Expire.

  The wide (allocation) side of the Expire direction is exact, not
  best effort: the pre-publish re-validation loads the expiry word
  fresh, so an Expire landing while the caller is parked there is
  caught. Pinned by TestTryCAS_RevalidationCatchesExpire via
  casPublishHook (the TTL sibling only lapses a deadline), confirmed
  red against the pre-revalidation ordering for both CAS and CAD.

Everything else re-verified clean this round: the claim sentinel
against Swap/Delete/Expire/extension (including a benign by-value ABA
on the expires word: a claim landing after extend-then-Expire returns
the word to its evaluated value is semantically identical to the
no-race claim), tombstone-shell unlink-then-recreate under every
writer path (never two live entries for one key), walk's
single-visit-per-key claim under expand and prune, Clear against
in-flight writers on the old root, the error/companion classification
ladder, and Clean's conservative companion gate for err-free values
(provably never outlives the derived self-window gate).
Eighth audit round (full re-read of cache.go, trie.go, and every test
file; all six scenario classes re-traced), one Low:

- casMatches sampled the clock before loading the expiry word
  (!l.expired(now())): now() was evaluated as the argument, then expired
  loaded the word. A CAS caller descheduled between the two judged a
  fresh word against a stale clock, so a kill that fully completed while
  it was parked went unseen: for Expire, the stored word n_e-1 compares
  above the stale clock; for a lapsing TTL, the fixed word never falls
  below a clock sampled before the lapse. Either way the publish lands
  against a value TryGet already certifies as Miss, and no sequential
  order explains kill, TryGet=Miss, CAS=true, TryGet=Hit(new) (a
  CAS-first order requires the final TryGet to miss the freshly swapped
  value; kill-first orders require the CAS to fail on the dead match).

  The sixth round closed the allocation-side window with the pre-publish
  re-validation, but the re-validation itself kept this instructions-wide
  clock-vs-word gap open on both kill modes. casMatches now takes its
  (word, clock) pair from expiresNow, exactly as every read path does:
  the clock postdates the word it judges, so any kill that completes
  before the pair is taken is always seen, and an idle extension racing
  the pair re-derives instead of spuriously failing a continuously-live
  match (the round-five misclassification class, which is why the
  simpler unconfirmed word-then-clock ordering was not an option). The
  residue is now exactly the instructions between the re-validation and
  the pointer CAS, which is what the CompareAndSwap / CompareAndDelete
  docs already document as best effort.

  Pinned by TestTryCAS_RevalidationPairsWordAndClock via a counting
  expiresPairingHook (parks the re-validation's paired read, the call's
  second), covering {CompareAndSwap, CompareAndDelete} x {TTL lapse,
  Expire}; all four subtests confirmed red against the pre-fix ordering
  with the hook held at the old deschedule point. The linearizability
  checker deliberately runs expiry-free, so the hook pin is the vehicle,
  as in rounds three through seven.

Everything else re-verified clean this round: every other expiry-word
read is either paired (entGet, TryGet), CAS-guarded (Expire, Clean's
claim, idle extension), pinned to a deliberately conservative clock
(Range, Delete, Swap's displaced-value report), or feeds a retry loop
that re-derives before acting (Get's replace gate); the claim sentinel,
tombstone terminality, walk single-visit, and prune lock ordering all
re-traced clean. Full -race suite green; vet clean on native and
linux/386.
Doc accuracy, from the eighth audit round: Cache.Range promised "calls fn
for every cached value" with no concurrency caveat. The trie walk takes no
snapshot — the old read/dirty implementation ranged a promoted read map,
which did fix the key set at the start of the call, so the port silently
weakened Range's consistency while the doc stayed one line. State the
actual contract, in sync.Map.Range's terms: no key is visited more than
once (pinned by TestConcurrentRange's seen-twice check under churn), a
concurrently mutated or expiring key may be reflected from any point
during the call (Range classifies every entry at one clock pinned at
entry), and Range does not block other methods — fn may re-enter the
cache (pinned by TestMapRangeNestedCall). Set.Range points at the Cache
caveat, following the Item "See Cache.Get" precedent.
Doc accuracy, from the ninth audit round: the Get family's stale sentence
covered only the errored case — 'if stale values are enabled, the
currently cached value has an error, and there is an unexpired stale
value, this returns the stale value and no error' — wording inherited
verbatim from the read/dirty implementation. The third round taught
TryGet's doc that an expired value with an open stale window is returned
as Stale, and the fourth round made that window a first-class read state
(derived from the expiry word, agreed on by every read and refresh), but
Get's doc never learned it: a Get on an expired value with an open window
returns the value as Stale immediately while driving the refresh, and it
re-runs the miss function for an expired key in the first place — neither
of which the doc stated. KeyState's Stale doc and MaxStaleAge already
describe both kill modes, so Get was the one surface left claiming the
narrower contract.

Get, Item.Get, and Set.Get now say it the way TryGet does: the miss runs
if the key is not yet cached or the cached value has expired, and an
unexpired stale — the value errored, or expired with its window open — is
returned with no error. Behavior unchanged and already pinned: TestExpires
asserts Get returns (v, nil, Stale) for an expired-with-window value, and
TestGet_ConcurrentStaleDuringInFlight pins the piggybacked flavor.
Doc accuracy, from the ninth audit round: MaxErrorAge claimed 'with
neither option set, a load error is cached forever and never retried'.
The fourth round wrote that sentence weighing only MaxAge; it never
weighed MaxIdleAge, whose initial-TTL rule in newExpires applies to every
entry — the err branch falls through to the same ttl derivation, so with
only MaxIdleAge set a load error is cached for the idle age and then
retried, not cached forever (verified live: error Hit within the window,
Miss after, next Get re-drives). The behavior is the sensible one — an
idle-config cache should not pin errors permanently — so the doc moves to
the code: the default is MaxAge, or the idle age when only MaxIdleAge is
set (errors share the initial TTL but are never idle-extended; extension
is gated on err == nil everywhere). The forever claim is rescoped to
neither-MaxAge-nor-MaxIdleAge.

Pinned by TestMaxErrorAge_DefaultsToIdleInitialTTL: cached error is a Hit
within the idle window, Miss after it lapses, and the next Get drives a
fresh load. The no-extension half was already pinned for the MaxAge
config by TestMaxIdleAge/no_extend_errors; the new test's early probe
runs immediately after the load so even a regressed extension could not
outlive the post-window probe.
Doc accuracy, from the tenth audit round: Get's first sentence claimed
the miss function runs "if the key is not yet cached or the cached
value has expired" — an exhaustive-reading condition that misses the
third trigger. A cached error whose own age has not lapsed is replaced
and re-driven whenever a valid stale exists: the slow path's
replacement filter passes any finalized errored loading regardless of
expiry, exactly as MaxErrorAge documents ("a cached error suppresses
repeat queries only when there is no stale value to return ... always
drives a refresh, regardless of the error's remaining age"). The ninth
round's rewrite (052940d) brought the expired-with-stale return into
Get's doc but enumerated only the two run triggers, leaving Get
contradicting MaxErrorAge for live errors.

Get, Item.Get, and Set.Get now list the third trigger, with the
MaxErrorAge rationale parenthesized on Cache.Get. Behavior unchanged
and already pinned: TestExpires's stale-while-error block returns
Stale from a Get whose miss function is concurrently executing
(blocked on a channel) while the cached error is unexpired (no error
age is configured there, so the error would otherwise be cached and
served forever).
Doc accuracy, from the tenth audit round: Clear's doc said only
"deletes all keys, resetting it to an empty state", leaving its
interplay with in-flight loads underivable. Clear swaps the trie root,
so a load in flight at the Clear keeps running against the detached
tree: its result still reaches every Get waiting on it (the waiters
hold the detached entry and its loading directly), but it is never
cached — a later Get for the same key misses and drives a fresh,
independent load. This is the per-key Delete caveat applied wholesale,
but nothing said Clear behaves like a bulk Delete rather than, say,
canceling loads or letting their results re-land in the emptied cache.

Cache.Clear now states it; Item.Clear and Set.Clear point at it,
following the Item.CompareAndSwap "see Cache.X" precedent. Pinned by
TestClear_DoesNotCacheInFlightLoads, the Clear sibling of
TestDelete_DuringInFlightLoad: with the miss held open across the
Clear, the waiter still receives the load's value, and the post-Clear
Get re-runs the miss instead of finding the detached result.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant