Follow-up to #210: lock-free dynamic-var deref — a design fork (A: copy-on-write vs B: active-binding counter) #211

@mparrett

Picking up the deref-lock optimization you flagged as a follow-up in #210 ("your top-of-stack-cache / clear-isDynamic-on-empty suggestion is worth doing, but as a follow-up with numbers driving it"). I built one version, measured it, and found that the decision forks two ways. Which way is right depends on let-go's real read/write ratio for dynamic vars, which you'll have a better feel for than I do.

Cost

Since isDynamic is the declaration flag, it stays set for the var's whole life: SetDynamic() is called for every ^:dynamic var at compile time (compiler.go:1700), and any push sets it too. So ExecContext.deref consults the per-context binding stack on every read of a dynamic var, taking bindingStack.mu even when the var has no active binding in this context. For *out*/*ns*/etc. that's a lock on a hot path.

That also rules out a literal "clear-isDynamic-on-empty": clearing it would break the declaration semantic (a ^:dynamic var with no current binding must still report dynamic and must still be checked, because it can be bound at any time). So the lever is the lookup, not the flag.
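For orientation, here is a minimal sketch of the read path described above. It is not the let-go source: only `isDynamic`, `SetDynamic`, `bindingStack.mu`, and `ExecContext.deref` are names from this issue, while the `root` field and the `frames` map layout are assumptions made for illustration.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Var is a stand-in for the real var type; only the fields needed here.
type Var struct {
	isDynamic atomic.Bool // declaration flag: set once, never cleared
	root      any         // assumed name for the root value
}

// SetDynamic marks the var as ^:dynamic for its whole life.
func (v *Var) SetDynamic() { v.isDynamic.Store(true) }

// bindingStack holds one context's bindings behind a mutex.
type bindingStack struct {
	mu     sync.Mutex
	frames map[*Var][]any // assumed layout: per-var stack of bound values
}

// ExecContext owns its own bindingStack (per-context isolation).
type ExecContext struct {
	bindings bindingStack
}

// deref takes the lock for every dynamic var, even when this context
// has no active binding for it. That is the cost being discussed.
func (ec *ExecContext) deref(v *Var) any {
	if !v.isDynamic.Load() {
		return v.root // non-dynamic vars never touch the lock
	}
	ec.bindings.mu.Lock()
	defer ec.bindings.mu.Unlock()
	if st := ec.bindings.frames[v]; len(st) > 0 {
		return st[len(st)-1] // innermost binding wins
	}
	return v.root // lock was taken only to find nothing
}
```

The last `return v.root` is the case that matters for `*out*`-style reads: the var is declared dynamic, nothing is bound, and the mutex is still acquired and released.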

Option A — copy-on-write binding map

Built and measured this. Hold the binding map behind an atomic.Pointer, immutable once published. Reads load it lock-free; writers (push/pop/setCurrent/installSnapshot) serialize on the retained mutex, copy the map, and atomic-swap. Per-context isolation is unchanged. Full suite + -race green.
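A minimal sketch of the copy-on-write shape, assuming a per-var stack of values inside the map. The type and method names (`cowBindings`, `lookup`, `push`, `pop`, `clone`) are hypothetical and do not come from the perf/bound-deref-lock branch; only the technique (immutable map behind an `atomic.Pointer`, writers serialized on a mutex) is what the paragraph above describes.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Var is a stand-in; root is an assumed field name.
type Var struct{ root any }

// cowBindings is a hypothetical copy-on-write binding table.
type cowBindings struct {
	mu  sync.Mutex                     // serializes writers only
	cur atomic.Pointer[map[*Var][]any] // published map; never mutated after Store
}

// lookup is the lock-free read path: one atomic load, one map read.
func (b *cowBindings) lookup(v *Var) (any, bool) {
	m := b.cur.Load()
	if m == nil {
		return nil, false
	}
	st := (*m)[v]
	if len(st) == 0 {
		return nil, false
	}
	return st[len(st)-1], true
}

// clone copies the published map so the writer can edit the copy.
// Callers must hold b.mu.
func (b *cowBindings) clone() map[*Var][]any {
	next := make(map[*Var][]any)
	if old := b.cur.Load(); old != nil {
		for k, st := range *old {
			next[k] = st
		}
	}
	return next
}

// push copies the map and the var's stack, then swaps the pointer.
// These copies are where the extra write-side allocations come from.
func (b *cowBindings) push(v *Var, val any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	next := b.clone()
	old := next[v]
	st := make([]any, len(old)+1) // fresh slice: published data stays untouched
	copy(st, old)
	st[len(old)] = val
	next[v] = st
	b.cur.Store(&next)
}

// pop publishes a map without the var's innermost binding.
func (b *cowBindings) pop(v *Var) {
	b.mu.Lock()
	defer b.mu.Unlock()
	next := b.clone()
	if st := next[v]; len(st) > 1 {
		next[v] = st[:len(st)-1] // shorter view; the backing array is never written
	} else {
		delete(next, v)
	}
	b.cur.Store(&next)
}

// deref falls back to the root value when nothing is bound.
func deref(b *cowBindings, v *Var) any {
	if val, ok := b.lookup(v); ok {
		return val
	}
	return v.root
}
```

Readers that loaded the old pointer keep a consistent snapshot, which is why no read ever needs the mutex. The price is that every `push` and `pop` allocates a new map.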

Reads (benchstat, n=8, local M-series; treat the deltas as the signal, absolutes will differ on your box):

                                  before        after      delta
VarDerefPreviouslyBound          18.2 ns       6.7 ns      -63%   (declared dynamic, no active binding — the common *out* read)
VarDerefPreviouslyBoundParallel  98.2 ns      11.7 ns      -88%
VarDerefBound                    22.8 ns      14.5 ns      -36%   (read inside an active binding)
VarDerefBoundParallel             118 ns      17.6 ns      -85%   (the contention you raised)
VarDerefRoot / RootParallel / DistinctParallel      unchanged (already lock-free)
geomean                                                    -57%

The cost lands on the write side, because every binding establishment now allocates fresh maps:

BindingPushPop    84 ns -> 364 ns   (+335%)    16 B -> 704 B   (+43x)    1 -> 7 allocs

So A fixes every read path, including the parallel-bound contention you called out, but makes (binding [...] ...) establishment more expensive. Whether that's a good trade depends on how binding-heavy real workloads are.

Option B — per-context active-binding counter

Described, not yet built. Keep isDynamic as the declaration flag, but add an atomic count of active bindings to each bindingStack and gate the lock on it: if v.isDynamic.Load() && ec.bindings.count.Load() > 0. A context with no active bindings reads every dynamic var lock-free; the write path is unchanged apart from one atomic add per push or pop. This is closer to the "clear-on-empty" framing: it returns a context to the fast path when its stack drains.

The trade is narrower: B fixes the common unbound read (VarDerefPreviouslyBound, roughly to root speed) at ~zero write cost, but leaves reads inside an active binding (including VarDerefBoundParallel) on the lock, since the counter is non-zero there. I can build and measure it if the trade looks right to you.
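Since B is not built yet, the following is only a sketch of how the gate might look. The `count` field as an `atomic.Int32`, the `frames` map layout, the `root` field, and the `push`/`pop` bodies are all assumptions; only the gate condition itself comes from the description above.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Var is a stand-in; root is an assumed field name.
type Var struct {
	isDynamic atomic.Bool // declaration flag, unchanged by this option
	root      any
}

// bindingStack gains one atomic counter; everything else stays locked.
type bindingStack struct {
	mu     sync.Mutex
	count  atomic.Int32   // active bindings in this context (assumed type)
	frames map[*Var][]any // assumed layout: per-var stack of bound values
}

// push is the existing locked write plus one atomic add.
func (s *bindingStack) push(v *Var, val any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.frames == nil {
		s.frames = make(map[*Var][]any)
	}
	s.frames[v] = append(s.frames[v], val)
	s.count.Add(1)
}

// pop removes the innermost binding and decrements the counter.
func (s *bindingStack) pop(v *Var) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st := s.frames[v]
	if len(st) == 0 {
		return // nothing bound: leave the counter alone
	}
	s.frames[v] = st[:len(st)-1]
	s.count.Add(-1)
}

type ExecContext struct {
	bindings bindingStack
}

// deref takes the lock only when this context has at least one
// active binding. With count == 0 the read is two atomic loads.
func (ec *ExecContext) deref(v *Var) any {
	if v.isDynamic.Load() && ec.bindings.count.Load() > 0 {
		ec.bindings.mu.Lock()
		defer ec.bindings.mu.Unlock()
		if st := ec.bindings.frames[v]; len(st) > 0 {
			return st[len(st)-1]
		}
	}
	return v.root
}
```

Note that the counter is per context, not per var: binding any one dynamic var sends reads of every dynamic var in that context back through the lock until the stack drains. That matches the trade stated above, where reads inside an active binding stay on the slow path.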

The decision point

Which tradeoff fits let-go? A buys lock-free reads everywhere (and kills the bound-parallel contention) at a real write-path regression; B is cheaper and lower-risk but only covers the unbound read. My lean is B as the conservative default: don't regress the write path without evidence the reads it buys are hot. But you raised the bound-parallel contention specifically, which only A addresses, so I didn't want to pick for you.

A is on perf/bound-deref-lock on my fork if you want to check it out and run the benches. Happy to build B for a side-by-side, or to take this whichever direction you prefer once #210 lands (this stacks on it).


    Projects

    Status
    In Progress
