Skip to content

fix(metrics): configurable OTel cardinality limit + bound built-in path label#3567

Open
Umang01-hash wants to merge 4 commits into
deps/minor-updates-2026-06-09from
fix/metrics-cardinality-limit
Open

fix(metrics): configurable OTel cardinality limit + bound built-in path label#3567
Umang01-hash wants to merge 4 commits into
deps/minor-updates-2026-06-09from
fix/metrics-cardinality-limit

Conversation

@Umang01-hash

@Umang01-hash Umang01-hash commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

Follow-up to #3564 (which bumps OpenTelemetry to 1.44). Resolves #3566.

OTel Go 1.44 applies a default per-instrument cardinality limit of 2000 attribute sets — past that, new series are silently aggregated into a single otel.metric.overflow="true" series. GoFr previously inherited unlimited cardinality and exposed no way to configure the limit, so the bump would silently regress high-cardinality apps. This PR makes the limit explicit and configurable, and removes GoFr's own unbounded label source.

🔧 New configuration: METRICS_CARDINALITY_LIMIT

Config Description Default
METRICS_CARDINALITY_LIMIT Max distinct attribute sets recorded per metric instrument per collection cycle. Beyond it, extra series collapse into one otel.metric.overflow="true" series. 2000
  • Default 2000 — matches the OTel SDK default (no surprise vs upstream).
  • 0 or negative → unlimited — restores pre-1.44 GoFr behavior.
  • Any N → raise/lower the cap.

Documented in the configuration reference (App table) and the observability guide.

Changes

1. Configurable cardinality limit
Wired via a new exporters.WithCardinalityLimit option → metricSdk.WithCardinalityLimit, fed from METRICS_CARDINALITY_LIMIT (parsed in Container.cardinalityLimit, falling back to the default on an invalid value).

2. Bound the built-in app_http_response path label
The path label is now always the matched route template, never the raw request URL — so cardinality (and the internal routeAttrs cache) is bounded by the number of registered routes:

  • templated route → its template (e.g. /customer/{id})
  • static files → /static (the PathPrefix("/static/") template)
  • anything unmatched, including 404s and static-looking 404s → / (GoFr's catch-all PathPrefix("/"))

Previously these fell back to the raw r.URL.Path, which was unbounded.

3. Docs & tests

  • Config documented (above). Observability note updated to describe the actual path-label behavior.
  • Tests: real overflow guard (TestPrometheus_CardinalityLimitApplied — scrapes an isolated registry, fails if WithCardinalityLimit is removed), config parsing (TestContainer_cardinalityLimit), and an integration test over GoFr's real router topology (TestMetrics_GoFrRouterTopology).

Behavioral change (intentional, documented)

Two label changes for existing dashboards, both to bound cardinality:

  • Keeping OTel's 2000 default means high-cardinality instruments now overflow rather than grow unbounded. Set METRICS_CARDINALITY_LIMIT=0 to restore unlimited.
  • app_http_response's path for static/unmatched requests is now /static / / instead of the raw URL.

Notes for review

…t-in path label

OTel 1.44 applies a default per-instrument cardinality limit of 2000 attribute
sets; beyond it, new series overflow into a single otel.metric.overflow=true
series. GoFr previously inherited unlimited cardinality and exposed no way to
control the limit.

- Add METRICS_CARDINALITY_LIMIT config (default 2000, matching the OTel SDK
  default; 0 or negative = unlimited). Wired via a new WithCardinalityLimit
  option on exporters.Prometheus and metricSdk.WithCardinalityLimit.
- Bound the app_http_response "path" label: static-asset and unmatched (404 /
  non-templated) requests previously used the raw URL, which is unbounded and
  inflates cardinality on a single instrument (and the routeAttrs cache). They
  now collapse to bounded sentinels (/static/* and /*).
- Document METRICS_CARDINALITY_LIMIT and the overflow behavior in the configs
  reference, with a note to set it to 0 to restore unlimited cardinality.

Closes: #3566

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes OpenTelemetry metrics cardinality behavior explicit and configurable in GoFr, and prevents GoFr’s built-in HTTP metrics from introducing unbounded label cardinality via raw URL paths.

Changes:

  • Added a configurable per-instrument cardinality limit via METRICS_CARDINALITY_LIMIT, defaulting to 2000 (OTel SDK default), with <=0 meaning unlimited.
  • Bounded the built-in app_http_response metric’s path label for static-asset and unmatched requests using sentinel values (/static/*, /*) instead of raw URLs.
  • Documented the new configuration option in the configs reference.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/gofr/metrics/exporters/exporter.go Adds WithCardinalityLimit option plumbing and applies sdk/metric.WithCardinalityLimit when constructing the MeterProvider.
pkg/gofr/metrics/exporters/exporter_test.go Adds basic tests to guard the option wiring.
pkg/gofr/http/middleware/metrics.go Introduces bounded sentinel path labels and static-asset detection for app_http_response to avoid unbounded cardinality.
pkg/gofr/http/middleware/metrics_test.go Updates static-file expectations and adds a test ensuring unmatched routes use the bounded sentinel.
pkg/gofr/container/container.go Reads METRICS_CARDINALITY_LIMIT from config and passes it into the Prometheus exporter.
docs/references/configs/page.md Documents METRICS_CARDINALITY_LIMIT, its default, and overflow behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/gofr/http/middleware/metrics.go Outdated
Comment on lines +61 to +63
if strings.HasPrefix(urlPath, "/static") {
return true
}
Comment thread pkg/gofr/http/middleware/metrics.go Outdated
Comment on lines 132 to 139
switch {
case isStaticAsset(r.URL.Path):
path = staticAssetPath
case path == "":
path = unmatchedPath
default:
path = strings.TrimSuffix(path, "/")
}
@Umang01-hash

Copy link
Copy Markdown
Member Author

Reviewed + verified locally (built, ran tests, read the OTel 1.44 SDK source, wrote throwaway tests for the edge cases). Core mechanism is correct — WithCardinalityLimit(0) genuinely means unlimited in the SDK, GoFr's option wins over the env-derived value, and bounding routeAttrs also fixes a real unbounded-map memory leak. Three things to fix:

1. (blocking) Matched, templated routes ending in a "static" extension get mislabeled /static/*. isStaticAsset(r.URL.Path) is the first switch case, so it runs before the matched route template is honored. Confirmed: GET /openapi.json on a registered route records path=/static/*, not /openapi.json. Any real API route ending in .json/.pdf/.html/... is bucketed as a static asset. Move the isStaticAsset check into the path == "" branch so it only applies when no route matched.

2. /static prefix is overbroad. HasPrefix(urlPath, "/static") (no trailing slash) collapses /static-report/static/* (confirmed). Should be /static/.

3. Test doesn't guard the wiring it claims to. TestPrometheus_CardinalityLimitOption only asserts meter != nil — it'd still pass if the metricSdk.WithCardinalityLimit(...) line were deleted. Also no test for the container.go config parsing (valid / invalid→default fallback).

Minor: the app_http_response path-label change (/static/example.js/static/*, raw 404s→/*) is a breaking change for existing dashboards but is only in the PR body/comments — worth a note in the docs too.

…tighten static match, real wiring tests

- Path label resolution now honors a matched route template before applying
  the static/unmatched sentinels, so legitimate endpoints ending in a static
  extension (e.g. /openapi.json) or with a "static"-like prefix (e.g.
  /static-api) keep their template instead of being collapsed to /static/*.
- Tighten isStaticAsset to require the "/static/" prefix (trailing slash) so
  /static-report is no longer swept into the static sentinel.
- Keep the root route labelled "/" instead of an empty string after trimming.
- Replace the meter!=nil-only test with TestPrometheus_CardinalityLimitApplied,
  which records past the limit against an isolated registry and asserts the
  overflow series appears (and is absent when unlimited) — fails if
  WithCardinalityLimit is removed. Extracted newPrometheusMeter to inject the
  registerer for deterministic scraping.
- Extract Container.cardinalityLimit and add tests for valid / zero / invalid
  (fallback-to-default) config parsing.
- Add regression tests: matched static-extension route, root route label,
  unmatched static-asset sentinel, and an isStaticAsset table test.
- Document the app_http_response path-label change and the cardinality limit in
  the observability docs.
@Umang01-hash

Copy link
Copy Markdown
Member Author

Thanks for the careful review — all three addressed in f6cfc4f.

1 (blocking) — matched templated routes mislabeled /static/*. Reordered the path resolution so a matched route template is honored first; the static/unmatched sentinels only apply when no template resolved. GET /openapi.json on a registered route now records path=/openapi.json. Guarded by TestMetrics_MatchedRouteWithStaticExtension. Also keep the root route as / instead of an empty label after trimming (TestMetrics_RootRouteLabel).

2 — /static prefix overbroad. Now requires /static/ (trailing slash), so /static-report is no longer collapsed. Guarded by a TestIsStaticAsset table test.

One consequence worth flagging: since static files are served via PathPrefix("/static/") (a matched route with template /static), honoring the template means those now label as /static rather than the /static/* sentinel — both bounded. The /static/* sentinel now only applies to static-looking requests that match no route (covered by TestMetrics_UnmatchedStaticAssetSentinel). Updated the existing static-file tests accordingly.

3 — tests didn't guard the wiring. Replaced the meter != nil check with TestPrometheus_CardinalityLimitApplied: it records past a limit of 1 against an isolated registry and asserts the overflow series appears (and is absent when unlimited). Verified it fails if metricSdk.WithCardinalityLimit(...) is removed. Extracted newPrometheusMeter to inject the registerer for deterministic scraping. Added TestContainer_cardinalityLimit for the config parsing (valid / 0 / invalid→default).

Minor — docs. Added a note in the observability docs covering the app_http_response path-label change and the cardinality limit / overflow behavior.

@Umang01-hash

Copy link
Copy Markdown
Member Author

Re-reviewed the latest commits — rebuilt, re-ran tests/lint, and stood up a real GoFr server scraping /metrics. The three things I flagged are genuinely fixed and verified on a live server:

  • /openapi.json registered route → path="/openapi.json" (no longer collapsed to /static/*) ✅
  • /static/ prefix tightened (trailing slash) + TestIsStaticAsset
  • TestPrometheus_CardinalityLimitApplied now scrapes a real registry and asserts the otel.metric.overflow series — real wiring guard, fails if the SDK line is removed ✅
  • docs note added; build/vet/gofmt/golangci-lint clean. 👍

One new thing surfaced on deeper inspection that I think should be addressed before merge:

isStaticAsset(), staticAssetPath (/static/*) and unmatchedPath (/*) are effectively dead code in a real GoFr app. GoFr unconditionally registers a catch-all router.PathPrefix("/") (gofr.go:175) whose GetPathTemplate() returns "/", so path is never empty → the path != "" branch always wins and the sentinel branches never execute. Verified on a running server:

  • static files → path="/static" (via the PathPrefix("/static/") template, not isStaticAsset)
  • static-looking 404s (/images/missing.png, /app.js) → path="/", not /static/*
  • invented 404s + method-mismatch (POST/PUT/DELETE) → path="/", not /*

So the description's "collapse to bounded sentinels /static/* and /*" doesn't match real behavior (they collapse to /static and /), and TestMetrics_UnmatchedRouteBoundedLabel / TestMetrics_UnmatchedStaticAssetSentinel only pass because they build routers without GoFr's catch-all — they assert a path that never happens in the integrated framework.

Cardinality is still correctly bounded, so this isn't a correctness bug — it's dead code + a tests/description mismatch. Suggest either:

  1. drop isStaticAsset + both sentinels and rely on the template + root special-case (already bounds everything), and fix the description; or
  2. keep them deliberately as defense-in-depth, but say so and add an integration test that exercises the real GoFr router topology (with the catch-all) so the tests reflect reality.

…mplates

Review follow-up: GoFr registers an unconditional catch-all PathPrefix("/")
(gofr.go), so mux.CurrentRoute always resolves and GetPathTemplate is never
empty in a real app. The isStaticAsset helper and the /static/* and /*
sentinels were therefore dead code — the path-template branch always wins.

- Remove isStaticAsset, staticAssetPath, and unmatchedPath. The path label is
  now always the matched route template, trimmed of a trailing slash, with an
  empty-template fallback to "/" (keeps the bound for routers without the
  catch-all). Cardinality stays bounded by the number of routes.
- Verified on a running GoFr server scraping /metrics: registered routes report
  their template, static files report "/static", and all unmatched/404 requests
  (including static-looking ones like /images/missing.png) report "/" — never
  the raw URL.
- Replace the sentinel-asserting tests (which only passed without GoFr's
  catch-all) with TestMetrics_NoTemplateFallsBackToRoot and an integration test
  TestMetrics_GoFrRouterTopology that exercises the real router topology
  (specific route + /static/ prefix + catch-all).
- Update the observability docs note to describe actual behavior (templates,
  "/static", "/") instead of the removed sentinels.
@Umang01-hash

Copy link
Copy Markdown
Member Author

Good catch — you're right, the catch-all PathPrefix("/") (gofr.go) means a template always resolves, so isStaticAsset + the /static/* and /* sentinels were dead code. Went with option 1 in 323611b.

  • Removed isStaticAsset, staticAssetPath, unmatchedPath. The path label is now always the matched route template, trailing-slash trimmed, with an empty-template fallback to / (keeps the bound for any router without the catch-all).
  • Verified on a running GoFr server scraping /metrics: /hello/hello, /redis/redis, and /no/such/route / /images/missing.png / /app.js all → /. No raw URLs, no /static/* or /*.
  • Replaced the sentinel tests (which only passed without the catch-all) with TestMetrics_NoTemplateFallsBackToRoot and TestMetrics_GoFrRouterTopology — the latter builds the real topology (specific route + /static/ prefix + catch-all) and asserts templates / /static / /.
  • Fixed the description to match (templates, /static, / — not the removed sentinels) and made METRICS_CARDINALITY_LIMIT a prominent section. Observability docs note updated too.

build/vet/golangci-lint clean; changed packages green (the local pkg/gofr failure is just port 2121 being held by another process on my box — unrelated to this diff).

@Umang01-hash

Copy link
Copy Markdown
Member Author

Re-verified the latest commit (drop dead static/unmatched sentinels). Rebuilt, re-ran tests/vet/gofmt/golangci-lint, and stood up a real GoFr server scraping /metrics. The dead-code point is fully resolved and everything checks out.

What changed: isStaticAsset() + the /static/* and /* sentinels were removed; path resolution is now just template → trim trailing slash → fall back to /. The misleading isolated tests were dropped and replaced with TestMetrics_GoFrRouterTopology, which exercises the real router topology (specific route + PathPrefix("/static/") + catch-all PathPrefix("/")). The docs note was corrected to describe actual behavior (/static and /).

Live server /metrics confirms bounded labels, no raw-URL leakage:

  • /static/example.jspath="/static"
  • /openapi.json (registered route) → path="/openapi.json"
  • /customer/42, /customer/777path="/customer/{id}"
  • static-looking 404s, invented 404s, method-mismatch → path="/"
  • no /*, no /static/*, no per-file/per-URL series

Checks: go build, go vet, gofmt, golangci-lint all clean on changed files; affected tests pass. Earlier items remain verified: configurable METRICS_CARDINALITY_LIMIT (0/negative = unlimited, confirmed against the OTel SDK source), the real cardinality-wiring test (registry scrape + otel.metric.overflow assertion), config-parsing test with fallback, and docs in both pages. Also genuinely fixes the original unbounded-cardinality + unbounded routeAttrs cache growth.

LGTM from my side — no outstanding issues. Only operational note: this is stacked on #3564, so #3564 should merge first, after which this retargets to development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants