-
-
Notifications
You must be signed in to change notification settings - Fork 26
Description
Description
The ReDoS detection plugin (regex_redos) produces false positives for patterns that have nested quantifiers but are actually safe due to literal boundary characters.
Example False Positive
Pattern: ^/([_0-9a-zA-Z-]+/)?core/cache/busting/1/core/(.*)
This is flagged as vulnerable with "Nested quantifier in group - causes exponential O(2^n) backtracking", but it's actually safe.
Why It's Safe
The pattern ([_0-9a-zA-Z-]+/)? contains:
- Inner quantifier:
[_0-9a-zA-Z-]+(matches word chars and hyphens) - Literal boundary:
/(NOT in the character set)
The / creates a clear boundary that prevents ambiguous matching. For input like abc/, there's only ONE way to match it: abc + /. No backtracking explosion is possible.
Comparison with Truly Vulnerable Pattern
import re
import time
# SAFE pattern (has boundary)
safe = re.compile(r'^/([_0-9a-zA-Z-]+/)?core/')
# VULNERABLE pattern (no boundary)
vuln = re.compile(r'^/(a+)+$')
# Test with increasing input lengths
for n in [20, 25, 28]:
test = 'a' * n + 'X'
start = time.perf_counter()
safe.match('/' + test)
safe_time = (time.perf_counter() - start) * 1000
start = time.perf_counter()
vuln.match('/' + test)
vuln_time = (time.perf_counter() - start) * 1000
print(f"n={n}: safe={safe_time:.2f}ms, vulnerable={vuln_time:.2f}ms")Output:
n=20: safe=0.01ms, vulnerable=38.0ms
n=25: safe=0.01ms, vulnerable=1244.3ms
n=28: safe=0.01ms, vulnerable=9738.5ms
The safe pattern runs in constant time, while the vulnerable pattern shows exponential growth.
Root Cause
In _check_nested_quantifiers, the code calls _contains_quantifier(subpattern) which returns True for any nested quantifier without checking if there's a safe literal boundary.
Proposed Fix
When detecting nested quantifiers, check if the inner quantifier is followed by a literal character that is NOT in the quantified character set. If so, it's safe:
def _has_safe_boundary(self, parsed):
"""Check if quantifier is followed by a literal not in its charset."""
if len(parsed) < 2:
return False
first_op, first_av = parsed[0]
second_op, second_av = parsed[1]
if first_op not in QUANTIFIERS or second_op != LITERAL:
return False
boundary_char = second_av
_, _, quant_content = first_av
char_set = self._get_quantified_chars(quant_content)
return char_set is not None and boundary_char not in char_setImpact
This false positive affects real-world configs. In one user's nginx config, 158 out of 173 HIGH severity issues were false positives from this bug.