feat: unified noise vocabulary and universal mock-mismatch reporting #4271

Open
slayerjain wants to merge 3 commits into main from feat/unified-mock-mismatch-reporting

@slayerjain
Member

Describe the changes that are made

  • Introduces a universal mock-mismatch reporting framework (pkg/agent/proxy/integrations/mismatch) so every protocol parser reports a mock miss with one vocabulary, and the same structured data flows to the CLI mismatch table, the test-report yaml (failure_info.unmatched_calls), keploy report output, and the platform APIs the UI consumes.
  • Unifies the noise vocabulary and controls between response assertions and HTTP mock matching:
    • Field-diff paths in mismatch reports use the noise-config grammar (body.<dotted.path>, header.<name>, query.<name>, method, path) — a reported path can be copied verbatim into test.globalNoise or spec.assertions.noise.
    • The body bucket of test.globalNoise (already forwarded to the proxy, previously unconsumed for HTTP) now participates in HTTP mock matching: excluded from req-body-noise detection and allowed as drift under strict enforcement.
    • --schema-noise-strict gets a CLI flag (was config-file-only while the in-cluster replay path force-enables it).
    • req_body_noise learned by --schema-noise-detection is now persisted even when --remove-unused-mocks is off, via a new prune-free mockdb.PersistMockNoise path (previously it was silently discarded at exit unless pruning ran).
  • Communicates the diff correctly on a miss:
    • HTTP misses report the match-cascade phase (no_mocks / no_schema_candidates / body_mismatch / strict_noise_reject / no_match), candidate counts, and per-field diffs with recorded vs live values. The diffs are computed against a schema-match survivor instead of only Levenshtein distance over METHOD + path, replacing the canned "method and path match but headers or body differ" message.
    • The miss log is default-visible (Warn) instead of Debug-only, with phase + next steps.
    • The generic (opaque TCP) parser now emits structured reports (closest candidate by Jaccard + score) instead of a bare error; proxy.GetMockErrors no longer drops misses that lack a structured report, so no protocol's misses vanish from the report.
    • The MOCKS MISMATCH SUMMARY table prints whenever mismatches exist — including green runs (the false-green case: tests demoted to OBSOLETE, or protocols whose misses can't fail a test).
    • Remediation hints reference real commands (keploy record, --update-test-mapping); the previous hint referenced keploy rerecord, which is not a registered command.
    • keploy report renders unmatched outgoing calls before the response diff, since the response diff is usually a downstream symptom of the miss.
  • Extensible by design: protocol parsers build reports via the mismatch.Builder; models.MockFieldDiff/MatchPhase/CandidateCount are additive fields on UnmatchedCall, so k8s-proxy/UI can render them without breaking changes. MySQL/DNS reports keep working unchanged; migrating them (and postgres/mongo/grpc in the integrations repo) to the builder is follow-up work tracked in the next PR.
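
To make the path grammar concrete, here is a minimal sketch of how field diffs keyed by body.<dotted.path> could be computed while honoring noise entries. It is not the PR's JSONFieldDiffs: the FieldDiff type, the flatten and bodyFieldDiffs helpers, and the map[string]bool noise set are all hypothetical, and arrays are treated as opaque leaves for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// FieldDiff is a hypothetical stand-in for models.MockFieldDiff.
type FieldDiff struct {
	Path     string // noise-config grammar, e.g. "body.user.id"
	Recorded string // empty when the field is absent from the recorded body
	Live     string // empty when the field is absent from the live body
}

// flatten walks a decoded JSON value and stores each leaf under a dotted path.
// Arrays are kept as opaque leaves in this sketch.
func flatten(prefix string, v any, out map[string]string) {
	if m, ok := v.(map[string]any); ok {
		for k, child := range m {
			flatten(prefix+"."+k, child, out)
		}
		return
	}
	out[prefix] = fmt.Sprint(v)
}

// bodyFieldDiffs compares two JSON bodies and returns sorted diffs,
// skipping any path listed in noise.
func bodyFieldDiffs(recorded, live []byte, noise map[string]bool) ([]FieldDiff, error) {
	var rec, cur any
	if err := json.Unmarshal(recorded, &rec); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(live, &cur); err != nil {
		return nil, err
	}
	recFlat, liveFlat := map[string]string{}, map[string]string{}
	flatten("body", rec, recFlat)
	flatten("body", cur, liveFlat)

	paths := map[string]bool{}
	for p := range recFlat {
		paths[p] = true
	}
	for p := range liveFlat {
		paths[p] = true
	}

	var diffs []FieldDiff
	for p := range paths {
		r, inRec := recFlat[p]
		l, inLive := liveFlat[p]
		if noise[p] || (inRec && inLive && r == l) {
			continue
		}
		diffs = append(diffs, FieldDiff{Path: p, Recorded: r, Live: l})
	}
	// Sort so the output is stable across runs (map iteration order is random).
	sort.Slice(diffs, func(i, j int) bool { return diffs[i].Path < diffs[j].Path })
	return diffs, nil
}

func main() {
	recorded := []byte(`{"user":{"id":1,"ts":"2024-01-01"}}`)
	live := []byte(`{"user":{"id":2,"ts":"2024-06-12"}}`)
	noise := map[string]bool{"body.user.ts": true} // path as it would appear in a report

	diffs, err := bodyFieldDiffs(recorded, live, noise)
	if err != nil {
		panic(err)
	}
	for _, d := range diffs {
		fmt.Printf("%s: recorded=%s live=%s\n", d.Path, d.Recorded, d.Live)
	}
}
```

Because the reported Path already uses the noise grammar, the same string can serve as the key in the noise set, which is the copy-paste property the description promises.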

Links & References

Closes: NA (gap audit follow-up; happy to link issues if we file them)

🔗 Related PRs

  • Stacked follow-up: deterministic match policy + fuzzy-match audit (next PR)

🐞 Related Issues

  • NA

📄 Related Documents

  • NA

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐞 Bug Fix

Added e2e test pipeline?

  • 👍 yes
  • 🙅 no, because they aren't needed

Added comments for hard-to-understand areas?

  • 👍 yes

Added to documentation?

  • 📜 README.md
  • 📓 Wiki
  • 🙅 no documentation needed (per-protocol matching reference lands as a keploy/docs PR alongside this)

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below

Unit coverage: pkg/agent/proxy/integrations/mismatch (builder, diff helpers), pkg/matcher (JSONFieldDiffs), HTTP matcher (field-diff reports vs schema survivors, learned+user noise exclusion, strict-filter user-noise allowance), mockdb.PersistMockNoise (prune-free persistence + no-op safety), keploy report unmatched-call rendering.

Manual: replay any HTTP test set with a drifted outgoing request body field. Expect a Warn log naming the call, phase and field diff; the same diff in MOCKS MISMATCH SUMMARY; field_diffs in failure_info.unmatched_calls of the report yaml; and keploy report printing the unmatched call above the response diff, with the path copy-pastable into test.globalNoise.

Self Review done?

  • ✅ yes

Any relevant screenshots, recordings or logs?

  • NA

🤖 Generated with Claude Code

@slayerjain requested a review from gouravkrosx as a code owner June 12, 2026 09:11
Copilot AI review requested due to automatic review settings June 12, 2026 09:11
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

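| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2.73ms | 3.49ms | 4.95ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.65ms | 3.42ms | 4.78ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.64ms | 3.47ms | 4.99ms | 100.00 | 0.00% | ✅ PASS |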

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
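
The gating rule stated in this bot comment can be sketched as follows. This is an illustration of the published thresholds only, not the pipeline's actual script: the Run type and both function names are hypothetical, and the "±1% tolerance" is assumed to mean that an RPS of at least 99 is accepted.

```go
package main

import "fmt"

// Run holds one performance run; the type is hypothetical.
type Run struct {
	P50, P90, P99 float64 // latencies in milliseconds
	RPS           float64
	ErrorRate     float64 // percent
}

// runPasses applies the thresholds quoted in the comment.
// Assumption: the 1% tolerance lowers the RPS floor from 100 to 99.
func runPasses(r Run) bool {
	const rpsTarget, tolerance = 100.0, 0.01
	return r.P50 < 5 && r.P90 < 15 && r.P99 < 70 &&
		r.RPS >= rpsTarget*(1-tolerance) && r.ErrorRate < 1
}

// pipelinePasses fails only when two or more runs regress.
func pipelinePasses(runs []Run) bool {
	failed := 0
	for _, r := range runs {
		if !runPasses(r) {
			failed++
		}
	}
	return failed < 2
}

func main() {
	runs := []Run{
		{P50: 2.73, P90: 3.49, P99: 4.95, RPS: 100.02, ErrorRate: 0},
		{P50: 2.65, P90: 3.42, P99: 4.78, RPS: 100.02, ErrorRate: 0},
		{P50: 2.64, P90: 3.47, P99: 4.99, RPS: 100.00, ErrorRate: 0},
	}
	fmt.Println("pipeline passes:", pipelinePasses(runs))
}
```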

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a shared, structured “mock mismatch” reporting vocabulary so mock misses across protocols can be surfaced consistently in CLI output, report YAML (failure_info.unmatched_calls), and downstream platform/UI consumers. It also threads request-body noise vocabulary into HTTP mock matching and ensures learned req_body_noise can be persisted even when pruning is disabled.

Changes:

  • Added a universal mismatch reporting builder (pkg/agent/proxy/integrations/mismatch) plus new structured fields (match_phase, candidate_count, field_diffs) on mismatch models.
  • Upgraded HTTP + Generic parsers to emit structured mismatch diagnostics (phase, closest mock, field-level diffs) and ensured misses without structured reports no longer get dropped.
  • Added prune-free persistence for learned HTTP request-body noise (PersistMockNoise) and exposed --schema-noise-strict as a CLI flag.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pkg/service/report/unmatched_calls_test.go | Adds unit tests for rendering structured unmatched-call output. |
| pkg/service/report/report.go | Renders unmatched outgoing calls ahead of response diffs in keploy report output. |
| pkg/service/replay/utils.go | Adds match-phase details into mismatch failure table output. |
| pkg/service/replay/replay.go | Prints mismatch summary table whenever mismatches exist; updates operator log messaging. |
| pkg/platform/yaml/mockdb/persist_noise_test.go | Adds tests for prune-free learned-noise persistence behavior. |
| pkg/platform/yaml/mockdb/db.go | Extracts atomic gob rewrite helper and adds PersistMockNoise implementation for YAML/JSON/gob. |
| pkg/models/testrun.go | Extends UnmatchedCall with structured mismatch fields for report consumers. |
| pkg/models/errors.go | Introduces MockFieldDiff + match-phase constants and extends MockMismatchReport with structured fields. |
| pkg/matcher/risk.go | Adds JSONFieldDiffs to compute structured JSON diffs with recorded/live values. |
| pkg/matcher/risk_fielddiffs_test.go | Unit tests for JSONFieldDiffs behavior (kinds/noise/truncation). |
| pkg/agent/proxy/proxy.go | Ensures mock misses without a structured report still reach FailureInfo.UnmatchedCalls. |
| pkg/agent/proxy/integrations/mismatch/mismatch.go | New shared mismatch-report builder + helpers for JSON/header/query diffs and default remediation text. |
| pkg/agent/proxy/integrations/mismatch/mismatch_test.go | Unit tests for the mismatch builder and helper diff functions. |
| pkg/agent/proxy/integrations/http/reqbodynoise_test.go | Updates tests for new detectReqBodyNoise signature (user noise support). |
| pkg/agent/proxy/integrations/http/mismatch_report_test.go | Adds tests for HTTP structured mismatch reports against schema survivors + noise handling. |
| pkg/agent/proxy/integrations/http/match.go | Threads user body-noise into matching, emits match diagnostics, and builds structured HTTP mismatch reports. |
| pkg/agent/proxy/integrations/http/match_test.go | Updates mismatch-report tests to new structured diff semantics. |
| pkg/agent/proxy/integrations/http/decode.go | Threads test.globalNoise.body into matcher and logs mismatch diagnostics for HTTP misses. |
| pkg/agent/proxy/integrations/generic/match.go | Adds structured mismatch report generation for the generic (opaque TCP) parser. |
| pkg/agent/proxy/integrations/generic/decode.go | Emits generic mismatch reports and wraps miss errors with structured diagnostics. |
| cli/provider/cmd.go | Adds --schema-noise-strict flag + alias normalization + flag parsing into config. |

Comment thread pkg/matcher/risk.go
Comment on lines +202 to +207
trunc := func(s string) string {
	if maxVal > 0 && len(s) > maxVal {
		return s[:maxVal] + "…"
	}
	return s
}

Member Author

Fixed in 70142df (landed after this review was generated): truncation now backs off to a rune boundary and the doc comment says "at most maxVal bytes (cut at a rune boundary)".
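
For readers without access to 70142df, the following is a minimal sketch of byte-limited truncation that backs off to a rune boundary. It is an independent illustration, not the committed code; the function name truncateAtRune is hypothetical, and a non-positive maxVal is assumed to mean "no limit", mirroring the guard in the snippet under review.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateAtRune keeps at most maxVal bytes of s, cut at a rune boundary,
// and appends an ellipsis when something was removed.
// Assumption: maxVal <= 0 disables truncation.
func truncateAtRune(s string, maxVal int) string {
	if maxVal <= 0 || len(s) <= maxVal {
		return s
	}
	cut := maxVal
	// s[cut] is the first byte that would be dropped. If it is a UTF-8
	// continuation byte, the cut would split a rune, so step back.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + "…"
}

func main() {
	// "é" occupies two bytes, so a 2-byte limit must drop it entirely.
	fmt.Println(truncateAtRune("héllo", 2))
	fmt.Println(truncateAtRune("abcdef", 3))
}
```

A plain s[:maxVal] slice, as in the reviewed snippet, can end in the middle of a multi-byte sequence and produce invalid UTF-8 in the report; the back-off loop avoids that at the cost of sometimes keeping fewer than maxVal bytes.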

Comment thread pkg/agent/proxy/integrations/http/match.go
Comment on lines +197 to +206
// Default-visible: this is the root cause of the test
// failure that follows, so it must not hide at Debug.
h.Logger.Warn("no matching http mock found for outgoing request",
	zap.String("protocol", report.Protocol),
	zap.String("actual", report.ActualSummary),
	zap.String("match_phase", report.MatchPhase),
	zap.Int("candidates", report.CandidateCount),
	zap.String("closest", report.ClosestMock),
	zap.String("diff", report.Diff),
	zap.String("next_steps", report.NextSteps))

Member Author

Field name fixed in 9ae829d (next_steps → next_step, matching the existing operator-facing logs). Keeping Warn deliberately: this log fires only when a mock miss synthesizes a 502 that the app immediately reacts to — it is the root cause of the test failure that follows, not routine diagnostics, and the historical problem was exactly that this signal hid at Debug while users debugged the downstream response diff. Healthy runs without misses log nothing here.

Comment on lines +68 to 81
// Build the universal mismatch report so generic misses show
// up in the mismatch table / report yaml like HTTP and MySQL
// misses do, instead of vanishing as a bare error.
report := buildGenericMismatchReport(ctx, genericRequests, mockDb)
// Default-visible: this miss is the root cause of the test
// failure that follows (the app sees its connection close).
logger.Warn("no matching generic mock found for outgoing call",
	zap.String("protocol", report.Protocol),
	zap.Int("requestCount", len(genericRequests)),
	zap.Int("firstRequestBytes", len(genericRequests[0])),
	zap.String("hint", "Re-record mocks if the wire protocol data has changed"),
	zap.String("closest", report.ClosestMock),
	zap.String("diff", report.Diff),
	zap.String("next_steps", report.NextSteps),
	zap.Binary("preview", preview))

Member Author

Field name fixed in 9ae829d. Keeping Warn for the same reason as the HTTP miss log: a generic miss closes the app's connection with zero bytes, so this line is the only direct evidence of the root cause; at Info it drowns in the per-request flow logs. Intentionally-unmocked hosts should use bypass rules rather than relying on log level.

Comment on lines +769 to +777
if testRunResult {
	r.logger.Warn("Tests passed, but some outgoing calls did not match the recorded mocks.",
		zap.String("test_sets", testSets),
		zap.String("next_steps", "Review the mismatch summary below. Add drifting dynamic fields as noise (test.globalNoise), or re-record the test-set with 'keploy record' if the request structure changed."))
} else {
	r.logger.Info("Some testsets failed due to mock differences.",
		zap.String("test_sets", testSets),
		zap.String("next_steps", "Add drifting dynamic fields as noise (test.globalNoise); if the request structure changed, re-record the test-set with 'keploy record', or refresh mappings with --update-test-mapping."))
}

Member Author

Field name fixed in 9ae829d. Kept Warn for the passed-run branch deliberately: after the benign-DNS filter (70142df), a green run that still has mismatch rows means tests passed while consuming a divergent mock set (e.g. OBSOLETE demotions) — the false-green case this change exists to surface. The failed-run branch stays at Info since the failure itself already carries the signal.

Comment on lines +121 to +146
var genericMocks []*models.Mock
for _, m := range mocks {
	if m.Kind == "Generic" {
		genericMocks = append(genericMocks, m)
	}
}
if len(genericMocks) == 0 {
	return mismatch.NewReport(mismatch.ProtocolGeneric, summary).
		WithPhase(models.MatchPhaseNoMocks, 0).Build()
}

bestIdx, bestSim := -1, -1.0
for idx, mock := range genericMocks {
	if len(mock.Spec.GenericRequests) != len(reqBuffs) {
		continue
	}
	var simSum float64
	for i, reqBuff := range reqBuffs {
		encoded, _ := util.DecodeBase64(mock.Spec.GenericRequests[i].Message[0].Data)
		simSum += fuzzyCheck(encoded, reqBuff)
	}
	if avg := simSum / float64(len(reqBuffs)); avg > bestSim {
		bestSim = avg
		bestIdx = idx
	}
}

Member Author

Fixed in 9ae829d: the report builder now decodes each recorded payload per its stored Message[0].Type (String verbatim, binary via base64) and skips candidates whose payloads fail to decode, instead of unconditionally base64-decoding and discarding the error. The literal "Generic" kind string matches the existing convention in fuzzyMatch/findExactMatch in this file, so left as-is for consistency.
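
The shape of that fix can be sketched as below, assuming a simplified data model. The payload type, the "String" type tag comparison, decodePayload, closestCandidate, and exactSim are all hypothetical stand-ins; the real code works on models.Mock values and uses fuzzyCheck for similarity.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// payload is a hypothetical stand-in for one recorded generic message.
type payload struct {
	Type string // "String" means stored verbatim; anything else is treated as base64 here
	Data string
}

// decodePayload returns the recorded bytes according to the stored type.
func decodePayload(p payload) ([]byte, error) {
	if p.Type == "String" {
		return []byte(p.Data), nil
	}
	return base64.StdEncoding.DecodeString(p.Data)
}

// closestCandidate returns the index and average similarity of the best
// candidate, or (-1, 0) when no candidate is comparable. Candidates with a
// different request count or an undecodable payload are skipped.
func closestCandidate(cands [][]payload, live [][]byte, sim func(a, b []byte) float64) (int, float64) {
	bestIdx, bestSim := -1, 0.0
	if len(live) == 0 {
		return bestIdx, bestSim
	}
	for idx, cand := range cands {
		if len(cand) != len(live) {
			continue
		}
		sum, ok := 0.0, true
		for i, p := range cand {
			recorded, err := decodePayload(p)
			if err != nil {
				ok = false // do not score against nil bytes
				break
			}
			sum += sim(recorded, live[i])
		}
		if !ok {
			continue
		}
		if avg := sum / float64(len(live)); bestIdx == -1 || avg > bestSim {
			bestIdx, bestSim = idx, avg
		}
	}
	return bestIdx, bestSim
}

// exactSim is a trivial similarity used only for this demo.
func exactSim(a, b []byte) float64 {
	if string(a) == string(b) {
		return 1
	}
	return 0
}

func main() {
	cands := [][]payload{
		{{Type: "Binary", Data: "%%%"}},    // invalid base64, skipped
		{{Type: "String", Data: "PING"}},   // stored verbatim
	}
	idx, score := closestCandidate(cands, [][]byte{[]byte("PING")}, exactSim)
	fmt.Println(idx, score)
}
```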

…reporting

Mock misses were nearly undebuggable on the default path: the rich miss
explanation logged at Debug only, the closest-mock diff degenerated to
'method and path match but headers or body differ', misses without a
structured report vanished from FailureInfo.UnmatchedCalls entirely
(generic and others), the console mismatch table only printed when the
whole run failed, its hint referenced a 'keploy rerecord' command that
does not exist, and 'keploy report' rendered none of the stored
mock-miss data. Separately, the noise vocabulary was split: the body
bucket of test.globalNoise was forwarded to the proxy but never
consumed by HTTP mock matching, schemaNoiseStrict had no CLI flag, and
req_body_noise learned by --schema-noise-detection was persisted only
inside the --remove-unused-mocks pruning path, so detection alone
silently discarded everything it learned.

This change introduces a universal mismatch-reporting framework
(pkg/agent/proxy/integrations/mismatch) that all protocol parsers can
use to build structured MockMismatchReports with one vocabulary:
field-diff paths use the same grammar as the noise configuration
(body.<dotted.path>, header.<name>, query.<name>, method, path), so a
reported path can be copied verbatim into test.globalNoise or a
testcase's spec.assertions.noise. Reports now carry the match phase
(how far the cascade got), candidate counts, and per-field diffs with
recorded/live values, and flow uniformly into the CLI mismatch table,
the test-report yaml (FailureInfo.UnmatchedCalls), 'keploy report'
output, and the platform APIs that the UI consumes.

Details:
- pkg/matcher: new JSONFieldDiffs computes value-aware field diffs with
  noise exclusions, shared by response assertions and mock matching.
- HTTP matching: consumes the user's body-noise bucket (lenient
  detection exclusion + strict enforcement allowance), records the
  cascade phase in a matchDiag, and builds field-level reports against
  a schema-match survivor instead of only Levenshtein on METHOD+path.
  The miss log is now default-visible (Warn) with phase and next steps.
- Generic parser: emits structured reports (closest candidate by
  Jaccard with score) instead of a bare error that vanished.
- proxy.GetMockErrors: misses without a structured report are no longer
  dropped; they surface with protocol 'unknown' and a log pointer.
- Replay: the MOCKS MISMATCH SUMMARY prints whenever mismatches exist
  (a green run with mock misses is exactly the false-green case the
  user must see), and remediation hints now reference real commands
  ('keploy record', --update-test-mapping) instead of 'keploy rerecord'.
- keploy report: failed tests render their unmatched outgoing calls
  first (root cause before the downstream response diff).
- --schema-noise-strict gets a CLI flag (previously config-file only,
  while the in-cluster path force-enables it).
- mockdb.PersistMockNoise persists learned req_body_noise without
  pruning; replay calls it when --schema-noise-detection runs without
  --remove-unused-mocks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: slayerjain <shubhamkjain@outlook.com>
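
The commit message above mentions ranking generic candidates by Jaccard similarity. As a rough illustration of that metric, here is a sketch over fixed-size byte shingles. The shingle size of 2 and both helper names are assumptions made for this example; the commit does not state how the parser builds its sets.

```go
package main

import "fmt"

// shingles returns the set of all k-byte windows in b.
// Assumption: fixed k; the real parser may choose k differently.
func shingles(b []byte, k int) map[string]struct{} {
	set := map[string]struct{}{}
	for i := 0; i+k <= len(b); i++ {
		set[string(b[i:i+k])] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B|, treating two empty sets as identical.
func jaccard(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	inter := 0
	for s := range a {
		if _, ok := b[s]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

func main() {
	recorded := shingles([]byte("SELECT 1"), 2)
	live := shingles([]byte("SELECT 2"), 2)
	fmt.Printf("similarity: %.2f\n", jaccard(recorded, live))
}
```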
@slayerjain force-pushed the feat/unified-mock-mismatch-reporting branch from c5a9aea to 0cefc76 June 12, 2026 09:57
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

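| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2.62ms | 3.37ms | 5.08ms | 100.00 | 0.00% | ✅ PASS |
| 2 | 2.58ms | 3.21ms | 4.9ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.61ms | 3.4ms | 4.83ms | 100.00 | 0.00% | ✅ PASS |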

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Five fixes from a multi-agent adversarial review of the unified
mismatch-reporting change, before it ships:

- GetMockErrors's report-less fallback now requires the error chain to
  wrap the new models.ErrNoMockMatched sentinel (http/generic/mysql miss
  paths wrap it). Without the gate, sendMockNotFoundError's blanket
  ErrMockNotFound typing would have surfaced infrastructure failures
  (decompress errors, decode errors) as unmatched calls in reports.
- The MOCKS MISMATCH SUMMARY on fully PASSING runs now excludes DNS
  misses: DNS answers misses with a synthetic response by design, so on
  a green run they are routine — without the filter every healthy run
  with app-startup DNS chatter would print the table.
- The --schema-noise-strict flag read is guarded with the same
  Changed/IsSet pattern as disable-mapping, so the flag default no
  longer clobbers a yaml-only test.schemaNoiseStrict configuration.
- User body-noise entries are normalized to presence-only for mock
  matching: path-based request matching cannot honor value regexes, and
  normalizing (vs ignoring regex-valued entries) keeps the promise that
  a path copied from a mismatch report into test.globalNoise works. The
  in-cluster runner now threads the body bucket like the CLI path.
- JSONFieldDiffs output is deterministic (sorted per kind) so the
  25-diff cap keeps a stable subset, and value truncation cuts at a
  rune boundary instead of mid-UTF-8-sequence.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: slayerjain <shubhamkjain@outlook.com>
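
The sentinel gate described in the first fix of the commit message above relies on Go's standard error wrapping. A minimal sketch, assuming a locally defined sentinel: ErrNoMockMatched here only stands in for models.ErrNoMockMatched, and missError, infraError, and isReportableMiss are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoMockMatched stands in for the models.ErrNoMockMatched sentinel.
var ErrNoMockMatched = errors.New("no mock matched")

// missError shows how a parser's miss path could wrap the sentinel with %w.
func missError(protocol string) error {
	return fmt.Errorf("%s: no matching mock for outgoing call: %w", protocol, ErrNoMockMatched)
}

// infraError models an infrastructure failure that does not wrap the sentinel.
func infraError() error {
	return fmt.Errorf("decompress request body: unexpected EOF")
}

// isReportableMiss is the gate: only errors whose chain contains the
// sentinel should surface as unmatched calls.
func isReportableMiss(err error) bool {
	return errors.Is(err, ErrNoMockMatched)
}

func main() {
	fmt.Println(isReportableMiss(missError("http"))) // wraps the sentinel
	fmt.Println(isReportableMiss(infraError()))      // does not
}
```

Using errors.Is rather than comparing error types keeps the gate working even when callers add further layers of context with %w.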
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

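| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2.99ms | 3.96ms | 5.61ms | 100.00 | 0.00% | ✅ PASS |
| 2 | 2.86ms | 3.77ms | 4.91ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.84ms | 3.73ms | 4.97ms | 100.02 | 0.00% | ✅ PASS |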

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

- Use the codebase's established 'next_step' zap field (singular) for
  remediation guidance in the new miss logs and run summaries, matching
  the existing operator-facing logs.
- mergeNoiseMaps now always returns a fresh map, honoring its documented
  contract instead of aliasing the input when one side is empty.
- buildGenericMismatchReport decodes recorded payloads per their stored
  type (String verbatim, binary base64) instead of unconditionally
  base64-decoding and discarding the error — previously ASCII-recorded
  mocks scored similarity against nil bytes, degrading the
  closest-candidate ranking in miss reports.

The miss logs stay at Warn deliberately: an HTTP/generic mock miss
synthesizes an error the application immediately reacts to, so it is
the root cause of the test failure that follows, not routine
diagnostics. The rune-vs-byte truncation comment was already fixed in
the prior hardening commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: slayerjain <shubhamkjain@outlook.com>
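
The "always returns a fresh map" contract from the commit message above can be illustrated as follows. This is a sketch, not the repository's mergeNoiseMaps: the map[string][]string shape (field path to regex list) is an assumption about the noise type, and the function is named mergeNoise to avoid implying it is the real one.

```go
package main

import "fmt"

// mergeNoise combines two noise maps into a newly allocated map.
// It never returns a or b themselves, even when one side is empty,
// so callers can mutate the result without affecting the inputs.
// Assumption: noise maps are field path -> list of regex strings.
func mergeNoise(a, b map[string][]string) map[string][]string {
	out := make(map[string][]string, len(a)+len(b))
	for k, v := range a {
		out[k] = append([]string(nil), v...) // copy the slice as well
	}
	for k, v := range b {
		out[k] = append(out[k], v...)
	}
	return out
}

func main() {
	learned := map[string][]string{"body.user.ts": {}}
	merged := mergeNoise(learned, nil)
	merged["body.request_id"] = nil // must not leak into learned

	fmt.Println(len(learned), len(merged))
}
```

Returning the non-empty input directly when the other side is empty is a tempting shortcut, but it aliases the caller's map, which is the behavior the commit removes.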
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

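| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2.67ms | 3.34ms | 4.96ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.68ms | 3.5ms | 5.21ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.84ms | 3.82ms | 5.81ms | 100.00 | 0.00% | ✅ PASS |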

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
