fit: honest, monotonic parity gate + OOD safety gate #50
Merged
…imate

The parity gate previously selected the accept threshold and reported its teacher agreement on the same calibration set, and accepted accept-all configurations on a raw point estimate. Both let a small or lucky set clear the target and then break the contract on real traffic.

- build_global now certifies only when the Clopper-Pearson lower bound on cal agreement clears the target (was a raw point estimate).
- _calibrate_threshold selects the threshold on a selection split, then verifies it on a held-out split, deploying only if the held-out lower bound clears the target. Falls back to a single-set lower-bound gate when data is too small to split, and refuses rather than guessing.
- Guard NaN/inf surrogate probabilities before the acceptor fit (fixes a crash on degenerate datasets).
- Expose `alpha` (confidence level) through the builders and fit_frontier.
- Add regression tests for low-n refusal, the lower-bound certification, and the NaN guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With the held-out lower-bound gate, a noisy 300-trace set correctly refuses to certify a deployment (teacher noise caps agreement below target). Give this test cleanly separable data so it still exercises the deploy and route path; the refusal case is covered by test_gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extreme out-of-distribution embeddings must route without crashing and produce finite accept scores. Documents that the OSS library defers via the acceptor threshold and does not ship a separate distance-based OOD gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The held-out calibration picked the single highest-coverage threshold that cleared target on the selection split, then returned None if that one threshold failed held-out verification. Because the highest-coverage threshold is also the most aggressive (least likely to generalise), coverage came out non-monotonic in target: banking77 deployed 0% at TA=0.90 but 87% at TA=0.95. Now verify selection-passing thresholds in coverage order and deploy the highest-coverage one that also clears target on the held-out split. Same Clopper-Pearson maths, same held-out split; the gate simply no longer quits after one failed candidate. banking77 frontier is now monotonic (97.5% @0.90 -> 87.1% @0.95 -> 65.9% @0.97 -> 50.4% @0.98). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 50/50 held-out split was provably honest but too data-hungry: on a few hundred calibration rows it certified 0% at strict targets even though the confidence ranking clearly supported partial coverage (banking77 0% @0.98 despite an oracle ~80%). Replace it with a hybrid: select the highest-coverage threshold whose Clopper-Pearson LOWER bound clears target on a 70% slice (removes in-sample optimism at selection), then require it to also clear on a held-out 30% by the point estimate (catches selection-bias flukes without demanding CP-tightness on the small verify slice). Falls back to a single-set CP lower-bound gate when n<40. Validated across seeds on banking77/Obside at 0.90-0.98 with zero held-out contract violations; recovers banking77 to 67% @0.98 (was 0%) and Ridge to 68% @0.90 (was 0%). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uts) The parity gate only guarantees agreement on traffic that resembles calibration data; off-distribution inputs (gibberish, off-domain, prompt-injection) still got a confident surrogate answer and slipped through. Add a commodity kNN-distance OOD gate: at fit, calibrate the 99.5th-percentile mean-10-NN distance globally and per predicted label (per-label can only loosen, never tighten below global); at inference, defer any input beyond that threshold, regardless of surrogate confidence. Keyed on input embeddings + predicted label only, NOT the partition cells, so it carries none of the cell-construction IP. On the Obside 156-case battery it lifts OOD-junk deferral from 0% to ~80% while keeping in-distribution coverage (clean 100% / holdout 90% handled). Semantic near-misses (entity-disentanglement) are deliberately out of scope here; those need the advanced embeddings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Makes the parity gate honest, stable, and safe, and adds a distance-based OOD safety net.
1. Honest held-out gate (was: in-sample)
The gate previously selected the accept threshold and measured its agreement on the same calibration data, so a policy could clear the target by in-sample luck and then break the contract on real traffic. It now certifies on a held-out slice using an exact Clopper-Pearson lower bound, not a point estimate.
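For reviewers unfamiliar with the bound, here is a minimal stdlib-only sketch of a one-sided Clopper-Pearson lower bound. The names `cp_lower_bound` and `certifies` are hypothetical, not the functions in this PR, and treating `alpha` as a one-sided error rate is an assumption about how the library interprets it.

```python
from math import comb


def cp_lower_bound(agree: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson lower bound on a binomial proportion.

    Hypothetical helper: returns the p at which P(X >= agree | n, p) == alpha.
    Intended for calibration sets of a few hundred rows.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if agree <= 0:
        return 0.0  # no agreements observed: nothing can be certified

    def upper_tail(p: float) -> float:
        # P(X >= agree) under Binomial(n, p); increases monotonically with p.
        return sum(
            comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(agree, n + 1)
        )

    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotone tail probability
        mid = (lo + hi) / 2
        if upper_tail(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # conservative side of the bracket


def certifies(agree: int, n: int, target: float, alpha: float = 0.05) -> bool:
    """Certify only if the lower bound, not the raw rate, clears the target."""
    return cp_lower_bound(agree, n, alpha) >= target
```

With 10 agreements out of 10 the point estimate is 1.0, but the bound at `alpha=0.05` is about 0.74, so `certifies(10, 10, 0.90)` is false. This is the low-n refusal the gate is meant to produce.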
2. Monotonic in target
A bug made coverage non-monotonic (e.g. 0% at TA=0.90 but 87% at 0.95): the gate kept only the single highest-coverage threshold and gave up if it failed verification. It now verifies candidates in coverage order and deploys the highest-coverage one that clears on the held-out split.
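The fix can be sketched as a loop over candidates rather than a single check. This is an illustration under assumptions: `pick_threshold`, `passes_selection`, and `passes_holdout` are hypothetical names, and the two predicates stand in for whatever checks the gate runs on each split.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Tuple[float, float]  # (threshold, coverage on the selection split)


def pick_threshold(
    candidates: Iterable[Candidate],
    passes_selection: Callable[[float], bool],
    passes_holdout: Callable[[float], bool],
) -> Optional[float]:
    """Return the highest-coverage threshold that clears both checks."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    for threshold, _coverage in ordered:
        if not passes_selection(threshold):
            continue
        if passes_holdout(threshold):
            return threshold
        # The old behaviour returned None here, after the first failure.
    return None  # refuse: no candidate generalised
```

Because a stricter candidate is always tried when a more aggressive one fails, failing verification at the top candidate no longer collapses coverage to zero.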
3. Hybrid select+verify (recovers coverage)
A pure 50/50 held-out split was provably honest but too data-hungry, certifying 0% at strict targets on small calibration sets even when the confidence ranking clearly supported partial coverage. The gate now selects on a 70% slice (CP lower bound) and verifies generalization on a held-out 30% (point estimate). Validated across seeds with zero held-out contract violations; recovers coverage at strict targets that the 50/50 split discarded (banking77: 67% at 0.98, previously 0%).
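A rough sketch of the hybrid procedure follows. Everything here is illustrative: `calibrate_hybrid` is a hypothetical name, `lower_bound` is any caller-supplied one-sided lower confidence bound (Clopper-Pearson in this PR), and the seeded shuffle and the treatment of an empty verify slice as a failure are assumptions, not confirmed library behaviour.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[float, bool]  # (acceptor score, surrogate agrees with teacher)


def _accepted(rows: Sequence[Row], threshold: float) -> Tuple[int, int]:
    """Count (agreements, accepted rows) at or above the threshold."""
    hits = [agrees for score, agrees in rows if score >= threshold]
    return sum(hits), len(hits)


def calibrate_hybrid(
    rows: Sequence[Row],
    target: float,
    lower_bound: Callable[[int, int], float],
    select_frac: float = 0.7,
    min_rows: int = 40,
    seed: int = 0,
) -> Optional[float]:
    """Select on a 70% slice by lower bound, verify on 30% by point estimate."""
    data: List[Row] = list(rows)
    verify: Optional[List[Row]]
    if len(data) < min_rows:
        # Too small to split: single-set lower-bound gate only.
        select, verify = data, None
    else:
        random.Random(seed).shuffle(data)
        cut = int(len(data) * select_frac)
        select, verify = data[:cut], data[cut:]

    # Ascending thresholds means descending coverage, so the first
    # candidate that clears both checks has the highest coverage.
    for threshold in sorted({score for score, _ in select}):
        agree, n = _accepted(select, threshold)
        if n == 0 or lower_bound(agree, n) < target:
            continue
        if verify is None:
            return threshold
        v_agree, v_n = _accepted(verify, threshold)
        if v_n > 0 and v_agree / v_n >= target:  # assumption: empty slice fails
            return threshold
    return None  # refuse rather than guess
```

The asymmetry is the design choice: the strict bound is applied where the threshold is chosen, and the smaller verify slice only has to confirm the point estimate.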
4. Distance-based OOD safety gate
The parity gate only guarantees agreement on traffic resembling the calibration data; off-distribution inputs (gibberish, off-domain, prompt-injection) still got a confident answer. A new commodity kNN-distance gate (global and per-predicted-label thresholds; per-label thresholds can only loosen) defers inputs far from the training distribution regardless of surrogate confidence.
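A brute-force sketch of such a gate, for illustration only. `fit_ood_gate` and `should_defer` are hypothetical names; Euclidean distance, the nearest-rank percentile, and computing per-label thresholds from the same all-neighbour scores are assumptions. The defaults of 10 neighbours and the 99.5th percentile come from the commit message.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def _mean_knn(x: Vec, pool: Sequence[Vec], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest points in pool."""
    nearest = sorted(math.dist(x, p) for p in pool)[:k]
    return sum(nearest) / len(nearest)


def _percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (assumed; the library may interpolate)."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[min(max(rank, 1), len(ordered)) - 1]


def fit_ood_gate(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    k: int = 10,
    q: float = 99.5,
) -> dict:
    """Calibrate a global and a per-label distance threshold. O(n^2) fit."""
    train: List[Vec] = [tuple(e) for e in embeddings]
    # Leave-one-out score: each point against all the others.
    scores = [
        _mean_knn(x, train[:i] + train[i + 1:], k) for i, x in enumerate(train)
    ]
    global_thr = _percentile(scores, q)
    by_label: Dict[Hashable, List[float]] = defaultdict(list)
    for score, label in zip(scores, labels):
        by_label[label].append(score)
    # Per-label thresholds may loosen the gate but never tighten it.
    per_label = {
        label: max(global_thr, _percentile(vals, q))
        for label, vals in by_label.items()
    }
    return {"train": train, "k": k, "global": global_thr, "per_label": per_label}


def should_defer(gate: dict, embedding: Sequence[float], predicted_label) -> bool:
    """Defer when the input is farther from training data than calibrated."""
    score = _mean_knn(tuple(embedding), gate["train"], gate["k"])
    threshold = gate["per_label"].get(predicted_label, gate["global"])
    return score > threshold
```

The decision uses only the embedding and the predicted label, so a confident surrogate answer on a far-away input is still deferred.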
Plus NaN-robustness in acceptor fitting and an OOD-robustness routing test.
Behavior change
At a given target, coverage now reflects an honest held-out lower bound. A surrogate whose true agreement sits just under the target will correctly defer where the old gate over-deployed. The frontier.json shows the achievable coverage per target.
Tests
tests/test_gate.py, tests/test_ood.py, updated tests/test_fit.py. Full suite green.