Skip to content

ReDoS detection has false positives for patterns with literal boundaries #99

@dvershinin

Description

@dvershinin

Description

The ReDoS detection plugin (regex_redos) produces false positives for patterns that have nested quantifiers but are actually safe due to literal boundary characters.

Example False Positive

Pattern: ^/([_0-9a-zA-Z-]+/)?core/cache/busting/1/core/(.*)

This is flagged as vulnerable with "Nested quantifier in group - causes exponential O(2^n) backtracking", but it's actually safe.

Why It's Safe

The pattern ([_0-9a-zA-Z-]+/)? contains:

  • Inner quantifier: [_0-9a-zA-Z-]+ (matches word chars and hyphens)
  • Literal boundary: / (NOT in the character set)

The / creates a clear boundary that prevents ambiguous matching. For input like abc/, there's only ONE way to match it: abc + /. No backtracking explosion is possible.

Comparison with Truly Vulnerable Pattern

import re
import time

# SAFE pattern (has boundary)
safe = re.compile(r'^/([_0-9a-zA-Z-]+/)?core/')

# VULNERABLE pattern (no boundary)  
vuln = re.compile(r'^/(a+)+$')

# Test with increasing input lengths
for n in [20, 25, 28]:
    test = 'a' * n + 'X'
    
    start = time.perf_counter()
    safe.match('/' + test)
    safe_time = (time.perf_counter() - start) * 1000
    
    start = time.perf_counter()
    vuln.match('/' + test)
    vuln_time = (time.perf_counter() - start) * 1000
    
    print(f"n={n}: safe={safe_time:.2f}ms, vulnerable={vuln_time:.2f}ms")

Output:

n=20: safe=0.01ms, vulnerable=38.0ms
n=25: safe=0.01ms, vulnerable=1244.3ms
n=28: safe=0.01ms, vulnerable=9738.5ms

The safe pattern runs in constant time, while the vulnerable pattern shows exponential growth.

Root Cause

In _check_nested_quantifiers, the code calls _contains_quantifier(subpattern) which returns True for any nested quantifier without checking if there's a safe literal boundary.

Proposed Fix

When detecting nested quantifiers, check if the inner quantifier is followed by a literal character that is NOT in the quantified character set. If so, it's safe:

def _has_safe_boundary(self, parsed):
    """Check if quantifier is followed by a literal not in its charset."""
    if len(parsed) < 2:
        return False
    
    first_op, first_av = parsed[0]
    second_op, second_av = parsed[1]
    
    if first_op not in QUANTIFIERS or second_op != LITERAL:
        return False
    
    boundary_char = second_av
    _, _, quant_content = first_av
    char_set = self._get_quantified_chars(quant_content)
    
    return char_set is not None and boundary_char not in char_set

Impact

This false positive affects real-world configs. In one user's nginx config, 158 out of 173 HIGH severity issues were false positives from this bug.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions