Skip to content

Conversation

fancive
Copy link

@fancive fancive commented Oct 8, 2025

Reject remote write samples with label values exceeding 16MB to prevent crashes and memory issues reported in #16525.

Implementation:

  • Add DefaultMaxLabelValueLength constant (16MB) with TODO for Phase 2
  • Validate BEFORE ToLabels() deserialization to prevent panic
  • V1: Check ts.Labels directly before calling ToLabels()
  • V2: Check symbols referenced by LabelsRefs before calling ToLabels()
  • V1 behavior: Log + increment counter + continue (best-effort)
  • V2 behavior: Append to badRequestErrs → 400 response (per RW 2.0 spec)
  • Tests cover both protocols: normal, at-limit, and exceeds-limit cases
  • Package-level variable allows tests to override limit for comprehensive coverage

Phase 2 will make this configurable via GlobalConfig.LabelValueLengthLimit.


Problem

After PR #16069, Prometheus can crash when receiving label values >16MB via remote write. The crash occurs during protobuf deserialization in ToLabels() due to internal buffer limits in the labels library.

Currently there's no validation to prevent this, allowing malicious or misconfigured clients to crash the receiver.

Solution

Add validation in the remote write handler to reject samples with label values exceeding 16MB before calling ToLabels().

Key Design Decisions

1. Validate before deserialization - Check raw protobuf data to prevent panic:

  • V1: Check ts.Labels array directly
  • V2: Check req.Symbols via LabelsRefs indices

2. Protocol-specific error handling - Follow existing patterns:

  • V1: Log warning + increment counter + continue (matches duplicate label behavior)
  • V2: Return 400 with detailed error message (per RW 2.0 spec)

3. 16MB limit - Matches the implicit limit mentioned in the issue, conservative to avoid breaking users

Why only remote write?

  • Critical path that can cause crashes
  • Scraping already has validation via label_value_length_limit config
  • PromQL functions can be addressed in separate PR
  • Keeps this PR focused and reviewable

Testing

$ go test ./storage/remote/... -run TestWriteHandler_LabelValueLengthLimit -v
--- PASS: TestWriteHandler_LabelValueLengthLimit (0.00s)
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_normal_label_value
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_label_value_at_limit
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_label_value_exceeds_limit
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_normal_label_value
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_label_value_at_limit
    --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_label_value_exceeds_limit
PASS

All existing tests pass.

Future Work (Phase 2)

  1. Make configurable via GlobalConfig.LabelValueLengthLimit
  2. Unify with scrape configuration
  3. Add validation to PromQL functions
  4. Update documentation

Which issue(s) does the PR fix:

Fixes #16525

Does this PR introduce a user-facing change?

[BUGFIX] Remote write: Reject label values exceeding 16MB to prevent crashes.

…#16525)

Reject remote write samples with label values exceeding 16MB to prevent
crashes and memory issues reported in prometheus#16525.

Implementation:
- Add DefaultMaxLabelValueLength constant (16MB) with TODO for Phase 2
- Validate BEFORE ToLabels() deserialization to prevent panic
- V1: Check ts.Labels directly before calling ToLabels()
- V2: Check symbols referenced by LabelsRefs before calling ToLabels()
- V1 behavior: Log + increment counter + continue (best-effort)
- V2 behavior: Append to badRequestErrs → 400 response (per RW 2.0 spec)
- Tests cover both protocols: normal, at-limit, and exceeds-limit cases
- Package-level variable allows tests to override limit for comprehensive coverage

Phase 2 will make this configurable via GlobalConfig.LabelValueLengthLimit.

Fixes prometheus#16525

Signed-off-by: wujunchen <wujunchen@baidu.com>
Signed-off-by: fancivez <fancivez@gmail.com>
Signed-off-by: fancivez <fancivez@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set and check (and document) a global label_value_length_limit
1 participant