Follow-up to #210: lock-free dynamic-var deref — a design fork (A: copy-on-write vs B: active-binding counter)

Picking up the deref-lock optimization you flagged as a follow-up in #210 ("your top-of-stack-cache / clear-`isDynamic`-on-empty suggestion is worth doing, but as a follow-up with numbers driving it"). I built one version, measured it, and found that the decision forks two ways. Which way is right depends on let-go's real read/write ratio for dynamic vars, and you'll have a better feel for that than I do.

## Cost

Since `isDynamic` is the declaration flag, it stays set for the var's whole life: `SetDynamic()` is called for every `^:dynamic` var at compile time (compiler.go:1700), and any push sets it too. So `ExecContext.deref` consults the per-context binding stack on *every* read of a dynamic var, taking `bindingStack.mu` even when the var has no active binding in this context. For `*out*`/`*ns*`/etc. that's a lock on a hot path.

That also rules out a literal "clear-`isDynamic`-on-empty": clearing it would break the declaration semantics (a `^:dynamic` var with no current binding must still report dynamic and must still be checked, because it can be bound at any time). So the lever is the lookup, not the flag.
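For anyone skimming, here is the rough shape of the read path described above. It is a sketch with stand-in types, not the actual let-go code: `Var`, `bindingStack`, the `frames` field, and the body of `deref` are my assumptions, and only `isDynamic`, `bindingStack.mu`, and `ExecContext.deref` are names taken from the codebase.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Var is a hypothetical stand-in for let-go's var type.
type Var struct {
	isDynamic atomic.Bool // declaration flag: set at compile time, never cleared
	root      any
}

// bindingStack is a hypothetical per-context table of binding frames.
type bindingStack struct {
	mu     sync.Mutex
	frames map[*Var][]any // assumed layout: one value stack per var
}

// ExecContext is a hypothetical stand-in for the execution context.
type ExecContext struct {
	bindings bindingStack
}

// deref takes bindings.mu for every dynamic var, bound or not.
func (ec *ExecContext) deref(v *Var) any {
	if !v.isDynamic.Load() {
		return v.root // non-dynamic vars never touch the lock
	}
	ec.bindings.mu.Lock()
	defer ec.bindings.mu.Unlock()
	if s := ec.bindings.frames[v]; len(s) > 0 {
		return s[len(s)-1] // innermost active binding wins
	}
	return v.root // declared dynamic, nothing bound: the lock was pure overhead
}

func main() {
	out := &Var{root: "stdout"}
	out.isDynamic.Store(true)
	ec := &ExecContext{bindings: bindingStack{frames: map[*Var][]any{}}}
	fmt.Println(ec.deref(out)) // locked read that falls through to the root
}
```

The last `return` is the case both options below target: the var is declared dynamic, the context has nothing bound for it, and the read still pays for the mutex.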

## Option A — copy-on-write binding map

Built and measured this. Hold the binding map behind an `atomic.Pointer`, immutable once published. Reads load it lock-free; writers (push/pop/setCurrent/installSnapshot) serialize on the retained mutex, copy the map, and atomic-swap. Per-context isolation is unchanged. Full suite + `-race` green.

Reads (benchstat, n=8, local M-series; treat the deltas as the signal, absolutes will differ on your box):

```
                                  before        after      delta
VarDerefPreviouslyBound          18.2 ns       6.7 ns      -63%   (declared dynamic, no active binding — the common *out* read)
VarDerefPreviouslyBoundParallel  98.2 ns      11.7 ns      -88%
VarDerefBound                    22.8 ns      14.5 ns      -36%   (read inside an active binding)
VarDerefBoundParallel             118 ns      17.6 ns      -85%   (the contention you raised)
VarDerefRoot / RootParallel / DistinctParallel      unchanged (already lock-free)
geomean                                                    -57%
```

The cost lands on the write side, because every binding establishment now allocates fresh maps:

```
BindingPushPop    84 ns -> 364 ns   (+335%)    16 B -> 704 B   (+43x)    1 -> 7 allocs
```

So A fixes every read path, including the parallel-bound contention you called out, but makes `(binding [...] ...)` establishment more expensive. Whether that's a good trade depends on how binding-heavy real workloads are.
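To make the mechanism concrete without asking you to check out the branch, here is a minimal sketch of the copy-on-write scheme. It is not the code on the branch: `cowBindings`, `lookup`, `push`, `pop`, and the map-of-slices layout are illustrative names and choices of mine, and `setCurrent`/`installSnapshot` are left out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Var is a hypothetical stand-in for let-go's var type.
type Var struct{ root any }

// cowBindings is a hypothetical copy-on-write binding table.
type cowBindings struct {
	mu  sync.Mutex                     // serializes writers only
	cur atomic.Pointer[map[*Var][]any] // published snapshot, never mutated
}

// lookup is the lock-free read path: one atomic load, then plain map access.
func (b *cowBindings) lookup(v *Var) (any, bool) {
	m := b.cur.Load()
	if m == nil {
		return nil, false
	}
	if s := (*m)[v]; len(s) > 0 {
		return s[len(s)-1], true
	}
	return nil, false
}

// clone copies the published map so a writer can edit it privately.
func (b *cowBindings) clone() map[*Var][]any {
	next := map[*Var][]any{}
	if old := b.cur.Load(); old != nil {
		for k, s := range *old {
			next[k] = s
		}
	}
	return next
}

// push publishes a new snapshot with val on top of v's stack.
func (b *cowBindings) push(v *Var, val any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	next := b.clone()
	// Copy the slice as well, so a published slice is never appended in place.
	next[v] = append(append([]any(nil), next[v]...), val)
	b.cur.Store(&next)
}

// pop publishes a new snapshot without v's innermost binding.
func (b *cowBindings) pop(v *Var) {
	b.mu.Lock()
	defer b.mu.Unlock()
	next := b.clone()
	s := next[v]
	if len(s) == 0 {
		return
	}
	if len(s) == 1 {
		delete(next, v)
	} else {
		next[v] = s[:len(s)-1]
	}
	b.cur.Store(&next)
}

// deref falls back to the root value when nothing is bound.
func deref(b *cowBindings, v *Var) any {
	if val, ok := b.lookup(v); ok {
		return val
	}
	return v.root
}

func main() {
	out := &Var{root: "stdout"}
	b := &cowBindings{}
	fmt.Println(deref(b, out))
	b.push(out, "buffer")
	fmt.Println(deref(b, out))
	b.pop(out)
	fmt.Println(deref(b, out))
}
```

The per-push map copy in `clone` is where the extra allocations in `BindingPushPop` come from in a design of this shape; readers never see a map that is still being written.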

## Option B — per-context active-binding counter

Described, not yet built. Keep `isDynamic` as the declaration flag, but add an atomic count of active bindings to each `bindingStack`, and gate the lock on it: `if v.isDynamic.Load() && ec.bindings.count.Load() > 0`. A context with no active bindings reads every dynamic var lock-free; the write path is unchanged but for one atomic add. This is closer to the "clear-on-empty" framing: it returns a context to the fast path when its stack drains.

The trade is narrower: B should fix the common unbound read (`VarDerefPreviouslyBound`, which I'd expect to land near root speed) at ~zero write cost, but it leaves reads *inside* an active binding (including `VarDerefBoundParallel`) on the lock, since the counter is non-zero there. I can build and measure it if the trade looks right to you.
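Since B isn't built yet, here is a minimal sketch of what I have in mind, so the gate is unambiguous. Everything beyond `isDynamic`, `bindingStack`, `mu`, and `ExecContext.deref` is an assumption: the `count` field, the `frames` layout, and the `push`/`pop`/`newExecContext` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Var is a hypothetical stand-in for let-go's var type.
type Var struct {
	isDynamic atomic.Bool
	root      any
}

// bindingStack gains one field: a count of active frames in this context.
type bindingStack struct {
	mu     sync.Mutex
	count  atomic.Int64 // hypothetical: total active bindings across all vars
	frames map[*Var][]any
}

// ExecContext is a hypothetical stand-in for the execution context.
type ExecContext struct {
	bindings bindingStack
}

func newExecContext() *ExecContext {
	return &ExecContext{bindings: bindingStack{frames: map[*Var][]any{}}}
}

// push is the existing locked write plus one atomic add.
func (b *bindingStack) push(v *Var, val any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.frames[v] = append(b.frames[v], val)
	b.count.Add(1)
}

// pop removes the innermost binding and decrements the counter.
func (b *bindingStack) pop(v *Var) {
	b.mu.Lock()
	defer b.mu.Unlock()
	s := b.frames[v]
	if len(s) == 0 {
		return
	}
	b.frames[v] = s[:len(s)-1]
	b.count.Add(-1)
}

// deref only takes the lock when this context has at least one binding.
func (ec *ExecContext) deref(v *Var) any {
	if v.isDynamic.Load() && ec.bindings.count.Load() > 0 {
		ec.bindings.mu.Lock()
		defer ec.bindings.mu.Unlock()
		if s := ec.bindings.frames[v]; len(s) > 0 {
			return s[len(s)-1]
		}
	}
	return v.root // fast path: two atomic loads, no mutex
}

func main() {
	out := &Var{root: "stdout"}
	out.isDynamic.Store(true)
	ec := newExecContext()
	fmt.Println(ec.deref(out)) // count is 0, so no lock is taken
	ec.bindings.push(out, "buffer")
	fmt.Println(ec.deref(out)) // count is 1, so this read locks
	ec.bindings.pop(out)
	fmt.Println(ec.deref(out)) // stack drained, back on the fast path
}
```

Note the counter is per context, not per var: binding any one dynamic var puts every dynamic read in that context back on the lock until the stack drains.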

## The decision point

Which tradeoff fits let-go? A buys lock-free reads everywhere (and kills the bound-parallel contention) at a real write-path regression; B is cheaper and lower-risk but only covers the unbound read. My lean is B as the conservative default: don't regress the write path without evidence the reads it buys are hot. But you raised the bound-parallel contention specifically, which only A addresses, so I didn't want to pick for you.

A is on `perf/bound-deref-lock` on my fork if you want to check it out and run the benches. Happy to build B for a side-by-side, or to take this whichever direction you prefer once #210 lands (this stacks on it).

