fix(metrics): configurable OTel cardinality limit + bound built-in path label#3567
fix(metrics): configurable OTel cardinality limit + bound built-in path label#3567Umang01-hash wants to merge 4 commits into
Conversation
…t-in path label OTel 1.44 applies a default per-instrument cardinality limit of 2000 attribute sets; beyond it, new series overflow into a single otel.metric.overflow=true series. GoFr previously inherited unlimited cardinality and exposed no way to control the limit. - Add METRICS_CARDINALITY_LIMIT config (default 2000, matching the OTel SDK default; 0 or negative = unlimited). Wired via a new WithCardinalityLimit option on exporters.Prometheus and metricSdk.WithCardinalityLimit. - Bound the app_http_response "path" label: static-asset and unmatched (404 / non-templated) requests previously used the raw URL, which is unbounded and inflates cardinality on a single instrument (and the routeAttrs cache). They now collapse to bounded sentinels (/static/* and /*). - Document METRICS_CARDINALITY_LIMIT and the overflow behavior in the configs reference, with a note to set it to 0 to restore unlimited cardinality. Closes: #3566
There was a problem hiding this comment.
Pull request overview
This PR makes OpenTelemetry metrics cardinality behavior explicit and configurable in GoFr, and prevents GoFr’s built-in HTTP metrics from introducing unbounded label cardinality via raw URL paths.
Changes:
- Added a configurable per-instrument cardinality limit via
METRICS_CARDINALITY_LIMIT, defaulting to 2000 (OTel SDK default), with<=0meaning unlimited. - Bounded the built-in
app_http_responsemetric’spathlabel for static-asset and unmatched requests using sentinel values (/static/*,/*) instead of raw URLs. - Documented the new configuration option in the configs reference.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/gofr/metrics/exporters/exporter.go | Adds WithCardinalityLimit option plumbing and applies sdk/metric.WithCardinalityLimit when constructing the MeterProvider. |
| pkg/gofr/metrics/exporters/exporter_test.go | Adds basic tests to guard the option wiring. |
| pkg/gofr/http/middleware/metrics.go | Introduces bounded sentinel path labels and static-asset detection for app_http_response to avoid unbounded cardinality. |
| pkg/gofr/http/middleware/metrics_test.go | Updates static-file expectations and adds a test ensuring unmatched routes use the bounded sentinel. |
| pkg/gofr/container/container.go | Reads METRICS_CARDINALITY_LIMIT from config and passes it into the Prometheus exporter. |
| docs/references/configs/page.md | Documents METRICS_CARDINALITY_LIMIT, its default, and overflow behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if strings.HasPrefix(urlPath, "/static") { | ||
| return true | ||
| } |
| switch { | ||
| case isStaticAsset(r.URL.Path): | ||
| path = staticAssetPath | ||
| case path == "": | ||
| path = unmatchedPath | ||
| default: | ||
| path = strings.TrimSuffix(path, "/") | ||
| } |
|
Reviewed + verified locally (built, ran tests, read the OTel 1.44 SDK source, wrote throwaway tests for the edge cases). Core mechanism is correct — 1. (blocking) Matched, templated routes ending in a "static" extension get mislabeled 2. 3. Test doesn't guard the wiring it claims to. Minor: the |
…tighten static match, real wiring tests - Path label resolution now honors a matched route template before applying the static/unmatched sentinels, so legitimate endpoints ending in a static extension (e.g. /openapi.json) or with a "static"-like prefix (e.g. /static-api) keep their template instead of being collapsed to /static/*. - Tighten isStaticAsset to require the "/static/" prefix (trailing slash) so /static-report is no longer swept into the static sentinel. - Keep the root route labelled "/" instead of an empty string after trimming. - Replace the meter!=nil-only test with TestPrometheus_CardinalityLimitApplied, which records past the limit against an isolated registry and asserts the overflow series appears (and is absent when unlimited) — fails if WithCardinalityLimit is removed. Extracted newPrometheusMeter to inject the registerer for deterministic scraping. - Extract Container.cardinalityLimit and add tests for valid / zero / invalid (fallback-to-default) config parsing. - Add regression tests: matched static-extension route, root route label, unmatched static-asset sentinel, and an isStaticAsset table test. - Document the app_http_response path-label change and the cardinality limit in the observability docs.
|
Thanks for the careful review — all three addressed in f6cfc4f. 1 (blocking) — matched templated routes mislabeled 2 — One consequence worth flagging: since static files are served via 3 — tests didn't guard the wiring. Replaced the Minor — docs. Added a note in the observability docs covering the |
|
Re-reviewed the latest commits — rebuilt, re-ran tests/lint, and stood up a real GoFr server scraping
One new thing surfaced on deeper inspection that I think should be addressed before merge:
So the description's "collapse to bounded sentinels Cardinality is still correctly bounded, so this isn't a correctness bug — it's dead code + a tests/description mismatch. Suggest either:
|
…mplates
Review follow-up: GoFr registers an unconditional catch-all PathPrefix("/")
(gofr.go), so mux.CurrentRoute always resolves and GetPathTemplate is never
empty in a real app. The isStaticAsset helper and the /static/* and /*
sentinels were therefore dead code — the path-template branch always wins.
- Remove isStaticAsset, staticAssetPath, and unmatchedPath. The path label is
now always the matched route template, trimmed of a trailing slash, with an
empty-template fallback to "/" (keeps the bound for routers without the
catch-all). Cardinality stays bounded by the number of routes.
- Verified on a running GoFr server scraping /metrics: registered routes report
their template, static files report "/static", and all unmatched/404 requests
(including static-looking ones like /images/missing.png) report "/" — never
the raw URL.
- Replace the sentinel-asserting tests (which only passed without GoFr's
catch-all) with TestMetrics_NoTemplateFallsBackToRoot and an integration test
TestMetrics_GoFrRouterTopology that exercises the real router topology
(specific route + /static/ prefix + catch-all).
- Update the observability docs note to describe actual behavior (templates,
"/static", "/") instead of the removed sentinels.
|
Good catch — you're right, the catch-all
|
|
Re-verified the latest commit ( What changed: Live server
Checks: LGTM from my side — no outstanding issues. Only operational note: this is stacked on #3564, so #3564 should merge first, after which this retargets to |
Summary
Follow-up to #3564 (which bumps OpenTelemetry to 1.44). Resolves #3566.
OTel Go 1.44 applies a default per-instrument cardinality limit of 2000 attribute sets — past that, new series are silently aggregated into a single
otel.metric.overflow="true"series. GoFr previously inherited unlimited cardinality and exposed no way to configure the limit, so the bump would silently regress high-cardinality apps. This PR makes the limit explicit and configurable, and removes GoFr's own unbounded label source.🔧 New configuration:
METRICS_CARDINALITY_LIMITMETRICS_CARDINALITY_LIMITotel.metric.overflow="true"series.20002000— matches the OTel SDK default (no surprise vs upstream).0or negative → unlimited — restores pre-1.44 GoFr behavior.N→ raise/lower the cap.Documented in the configuration reference (App table) and the observability guide.
Changes
1. Configurable cardinality limit
Wired via a new
exporters.WithCardinalityLimitoption →metricSdk.WithCardinalityLimit, fed fromMETRICS_CARDINALITY_LIMIT(parsed inContainer.cardinalityLimit, falling back to the default on an invalid value).2. Bound the built-in
app_http_responsepathlabelThe
pathlabel is now always the matched route template, never the raw request URL — so cardinality (and the internalrouteAttrscache) is bounded by the number of registered routes:/customer/{id})/static(thePathPrefix("/static/")template)/(GoFr's catch-allPathPrefix("/"))Previously these fell back to the raw
r.URL.Path, which was unbounded.3. Docs & tests
path-label behavior.TestPrometheus_CardinalityLimitApplied— scrapes an isolated registry, fails ifWithCardinalityLimitis removed), config parsing (TestContainer_cardinalityLimit), and an integration test over GoFr's real router topology (TestMetrics_GoFrRouterTopology).Behavioral change (intentional, documented)
Two label changes for existing dashboards, both to bound cardinality:
2000default means high-cardinality instruments now overflow rather than grow unbounded. SetMETRICS_CARDINALITY_LIMIT=0to restore unlimited.app_http_response'spathfor static/unmatched requests is now/static//instead of the raw URL.Notes for review
deps/minor-updates-2026-06-09, so the diff shows only these changes and retargets todevelopmentautomatically once chore(deps): consolidate minor dependency updates (2026-06-09) #3564 merges. Merge chore(deps): consolidate minor dependency updates (2026-06-09) #3564 first./static/*and/*sentinels for non-templated requests; those were dead code in a real GoFr app (the catch-allPathPrefix("/")means a template always resolves), so they were removed in favour of the route-template + catch-all behavior described above.