storage/remote: add label value length validation to prevent crashes (#16525) #17309
+189
−0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reject remote write samples with label values exceeding 16MB to prevent crashes and memory issues reported in #16525.
Implementation:
Phase 2 will make this configurable via GlobalConfig.LabelValueLengthLimit.
Problem
After PR #16069, Prometheus can crash when receiving label values >16MB via remote write. The crash occurs during protobuf deserialization in
ToLabels()
due to internal buffer limits in the labels library.Currently there's no validation to prevent this, allowing malicious or misconfigured clients to crash the receiver.
Solution
Add validation in the remote write handler to reject samples with label values exceeding 16MB before calling
ToLabels()
.Key Design Decisions
1. Validate before deserialization - Check raw protobuf data to prevent panic:
ts.Labels
array directlyreq.Symbols
viaLabelsRefs
indices2. Protocol-specific error handling - Follow existing patterns:
3. 16MB limit - Matches the implicit limit mentioned in the issue, conservative to avoid breaking users
Why only remote write?
label_value_length_limit
configTesting
$ go test ./storage/remote/... -run TestWriteHandler_LabelValueLengthLimit -v --- PASS: TestWriteHandler_LabelValueLengthLimit (0.00s) --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_normal_label_value --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_label_value_at_limit --- PASS: TestWriteHandler_LabelValueLengthLimit/v1_label_value_exceeds_limit --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_normal_label_value --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_label_value_at_limit --- PASS: TestWriteHandler_LabelValueLengthLimit/v2_label_value_exceeds_limit PASS
All existing tests pass.
Future Work (Phase 2)
GlobalConfig.LabelValueLengthLimit
Which issue(s) does the PR fix:
Fixes #16525
Does this PR introduce a user-facing change?