fit: honest, monotonic parity gate + OOD safety gate by adrida · Pull Request #50 · adrida/tracer

adrida · 2026-06-14T15:41:22Z

What

Makes the parity gate honest, stable, and safe, and adds a distance-based OOD safety net.

1. Honest held-out gate (was: in-sample)

The gate previously selected the accept threshold and measured its agreement on the same calibration data, so a policy could clear the target by in-sample luck and then break the contract on real traffic. It now certifies on a held-out slice using an exact Clopper-Pearson lower bound, not a point estimate.
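For readers unfamiliar with the bound, here is a minimal, stdlib-only sketch of a one-sided exact Clopper-Pearson lower bound. The function names and the one-sided reading of `alpha` are assumptions made for illustration, not the library's actual code; the bound is found by bisection on the binomial upper tail.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cp_lower_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact Clopper-Pearson lower bound on a proportion.

    Returns the p for which observing k or more agreements out of n
    has probability exactly alpha. Hypothetical helper: suitable for
    n up to roughly a thousand before the float terms overflow.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The tail probability increases with p, so a tail below alpha
        # means the bound lies above mid.
        if binom_tail_ge(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo
```

The bound always sits below the point estimate `k / n`, and the gap widens as `n` shrinks, which is why a small calibration set that clears the target on a point estimate can still fail to certify.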

2. Monotonic in target

A bug made coverage non-monotonic (e.g. 0% at TA=0.90 but 87% at 0.95): the gate kept only the single highest-coverage threshold and gave up if it failed verification. It now verifies candidates in coverage order and deploys the highest-coverage one that clears on the held-out split.
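The difference between the old and new behavior can be illustrated with a small sketch. `Candidate`, `old_gate`, and `new_gate` are hypothetical names, and the pass/fail flags are assumed to be precomputed; the real gate computes them from the calibration data.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    threshold: float
    coverage: float          # fraction of traffic the surrogate would handle
    passes_selection: bool   # cleared the target on the selection split
    passes_heldout: bool     # cleared the target on the held-out split


def old_gate(cands: Sequence[Candidate]) -> Optional[Candidate]:
    """Old behavior: try only the highest-coverage candidate, then give up."""
    passing = [c for c in cands if c.passes_selection]
    if not passing:
        return None
    best = max(passing, key=lambda c: c.coverage)
    return best if best.passes_heldout else None


def new_gate(cands: Sequence[Candidate]) -> Optional[Candidate]:
    """New behavior: walk candidates in descending coverage order and
    deploy the first one that also clears the held-out check."""
    passing = sorted(
        (c for c in cands if c.passes_selection),
        key=lambda c: c.coverage,
        reverse=True,
    )
    for c in passing:
        if c.passes_heldout:
            return c
    return None
```

With one aggressive candidate that fails held-out verification and one conservative candidate that passes, `old_gate` deploys nothing while `new_gate` deploys the conservative one.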

3. Hybrid select+verify (recovers coverage)

A pure 50/50 held-out split was provably honest but too data-hungry, certifying 0% at strict targets on small calibration sets even when the confidence ranking clearly supported partial coverage. The gate now selects the threshold on a 70% slice (using the CP lower bound) and verifies that it generalizes on a held-out 30% (using the point estimate). Validated across seeds with zero held-out contract violations; recovers strong coverage at strict targets that the 50/50 split discarded.
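A rough sketch of the select+verify flow, under stated assumptions: `hybrid_threshold` and its parameters are hypothetical names, `lower_bound` is any caller-supplied function mapping `(agreements, trials)` to a lower confidence bound (for example an exact Clopper-Pearson bound), and the `min_n=40` fallback mirrors the "n<40" figure in the commit message. It is not the library's `_calibrate_threshold`.

```python
import random
from typing import Callable, Optional, Sequence


def hybrid_threshold(
    scores: Sequence[float],
    agrees: Sequence[bool],
    target: float,
    lower_bound: Callable[[int, int], float],
    select_frac: float = 0.7,
    min_n: int = 40,
    seed: int = 0,
) -> Optional[float]:
    """Return the lowest (highest-coverage) accept threshold that clears
    the target, or None to refuse deployment."""
    rows = list(zip(scores, agrees))
    if len(rows) < min_n:
        # Too small to split: gate on the lower bound over the single set.
        select, verify = rows, None
    else:
        random.Random(seed).shuffle(rows)
        cut = int(select_frac * len(rows))
        select, verify = rows[:cut], rows[cut:]

    # Ascending thresholds correspond to descending coverage.
    for thr in sorted({s for s, _ in select}):
        accepted = [a for s, a in select if s >= thr]
        if lower_bound(sum(accepted), len(accepted)) < target:
            continue  # fails selection on the lower bound
        if verify is None:
            return thr
        held = [a for s, a in verify if s >= thr]
        # Verification uses the point estimate on the held-out slice.
        if held and sum(held) / len(held) >= target:
            return thr
    return None
```

Using the lower bound at selection removes in-sample optimism, while the point-estimate check on the 30% slice only has to catch selection flukes, so it does not pay the sample-size penalty of a second confidence bound.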

4. Distance-based OOD safety gate

The parity gate only guarantees agreement on traffic resembling the calibration data; off-distribution inputs (gibberish, off-domain, prompt-injection) still got a confident answer. A new commodity kNN-distance gate defers inputs far from the training distribution regardless of surrogate confidence. It uses a global threshold plus per-predicted-label thresholds, and a per-label threshold can only loosen the global one, never tighten it.
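The idea can be sketched in pure Python. The defaults (`k=10`, 99.5th percentile) follow the figures in the commit message; everything else here is an assumption for illustration, including the function names, the nearest-rank percentile, brute-force neighbor search, and measuring every distance against the full training set.

```python
import math
from collections import defaultdict


def mean_knn_distance(x, points, k, skip_index=None):
    """Mean Euclidean distance from x to its k nearest points."""
    d = sorted(math.dist(x, p) for i, p in enumerate(points) if i != skip_index)
    nearest = d[:k]
    return sum(nearest) / len(nearest)


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    s = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(s)))
    return s[rank - 1]


def fit_ood_gate(embeddings, pred_labels, k=10, q=99.5):
    """Calibrate a global threshold and per-predicted-label thresholds."""
    scores = [
        mean_knn_distance(x, embeddings, k, skip_index=i)  # exclude self
        for i, x in enumerate(embeddings)
    ]
    global_thr = percentile(scores, q)
    by_label = defaultdict(list)
    for s, y in zip(scores, pred_labels):
        by_label[y].append(s)
    # Per-label thresholds may only loosen: never drop below the global one.
    per_label = {y: max(global_thr, percentile(v, q)) for y, v in by_label.items()}
    return {"train": list(embeddings), "k": k, "global": global_thr, "per_label": per_label}


def should_defer(gate, x, pred_label):
    """Defer when x is farther from the training data than the threshold,
    independent of how confident the surrogate is."""
    score = mean_knn_distance(x, gate["train"], gate["k"])
    thr = gate["per_label"].get(pred_label, gate["global"])
    return score > thr
```

For example, after fitting on a tight cluster of embeddings, a point far outside the cluster is deferred under any predicted label, while a point inside the cluster passes through to the parity gate.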

Plus NaN-robustness in acceptor fitting and an OOD-robustness routing test.

Behavior change

At a given target, coverage now reflects an honest held-out lower bound. A surrogate whose true agreement sits just under the target will correctly defer where the old gate over-deployed. frontier.json reports the achievable coverage per target.

Tests

tests/test_gate.py, tests/test_ood.py, updated tests/test_fit.py. Full suite green.

…imate

The parity gate previously selected the accept threshold and reported its teacher agreement on the same calibration set, and accepted accept-all configurations on a raw point estimate. Both let a small or lucky set clear the target and then break the contract on real traffic.

- build_global now certifies only when the Clopper-Pearson lower bound on cal agreement clears the target (was a raw point estimate).
- _calibrate_threshold selects the threshold on a selection split, then verifies it on a held-out split, deploying only if the held-out lower bound clears the target. Falls back to a single-set lower-bound gate when data is too small to split, and refuses rather than guessing.
- Guard NaN/inf surrogate probabilities before the acceptor fit (fixes a crash on degenerate datasets).
- Expose `alpha` (confidence level) through the builders and fit_frontier.
- Add regression tests for low-n refusal, the lower-bound certification, and the NaN guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With the held-out lower-bound gate, a noisy 300-trace set correctly refuses to certify a deployment (teacher noise caps agreement below target). Give this test cleanly separable data so it still exercises the deploy and route path; the refusal case is covered by test_gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extreme out-of-distribution embeddings must route without crashing and produce finite accept scores. Documents that the OSS library defers via the acceptor threshold and does not ship a separate distance-based OOD gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The held-out calibration picked the single highest-coverage threshold that cleared target on the selection split, then returned None if that one threshold failed held-out verification. Because the highest-coverage threshold is also the most aggressive (least likely to generalise), coverage came out non-monotonic in target: banking77 deployed 0% at TA=0.90 but 87% at TA=0.95.

Now verify selection-passing thresholds in coverage order and deploy the highest-coverage one that also clears target on the held-out split. Same Clopper-Pearson maths, same held-out split, the gate just no longer quits after one failed candidate. banking77 frontier is now monotonic (97.5% @0.90 -> 87.1% @0.95 -> 65.9% @0.97 -> 50.4% @0.98).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 50/50 held-out split was provably honest but too data-hungry: on a few-hundred calibration rows it certified 0% at strict targets even though the confidence ranking clearly supported partial coverage (banking77 0% @0.98 despite an oracle ~80%).

Replace it with a hybrid: select the highest-coverage threshold whose Clopper-Pearson LOWER bound clears target on a 70% slice (removes in-sample optimism at selection), then require it to also clear on a held-out 30% by the point estimate (catches selection-bias flukes without demanding CP-tightness on the small verify slice). Falls back to a single-set CP lower-bound gate when n<40.

Validated across seeds on banking77/obside at 0.90-0.98 with zero held-out contract violations; recovers banking77 to 67% @0.98 (was 0%) and Ridge to 68% @0.90 (was 0%).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…uts)

The parity gate only guarantees agreement on traffic that resembles calibration data; off-distribution inputs (gibberish, off-domain, prompt-injection) still got a confident surrogate answer and slipped through.

Add a commodity kNN-distance OOD gate: at fit, calibrate the 99.5th-percentile mean-10-NN distance globally and per predicted label (per-label can only loosen, never tighten below global); at inference, defer any input beyond that, regardless of surrogate confidence. Keyed on input embeddings + predicted label only, NOT the partition cells, so it carries none of the cell-construction IP.

On the Obside 156-case battery it lifts OOD-junk deferral from 0% to ~80% while keeping in-distribution coverage (clean 100% / holdout 90% handled). Semantic near-misses (entity-disentanglement) are deliberately out of scope here; those need the advanced embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


adrida and others added 6 commits June 13, 2026 15:47

adrida marked this pull request as ready for review June 14, 2026 15:47

docs: neutralize dataset names in gate docstring

b1a9da8

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

adrida merged commit 36abcfb into main Jun 14, 2026
5 checks passed

adrida deleted the fix/parity-gate-holdout branch June 14, 2026 17:20

adrida merged 7 commits into main from fix/parity-gate-holdout
