Comments are causing false positives #817

This issue was already mentioned in https://github.com/decalage2/oletools/issues/90, but I think the problem deserves a specific issue.

Currently, when matching suspicious keywords, there is no attempt to distinguish a regular line of code from a comment, e.g.:
https://github.com/decalage2/oletools/blob/168a92d7c53d972f499356bda7d3335c61710eec/oletools/olevba.py#L2201

I think there are a few ways we could avoid false positives related to comments. One of them would be to change the pattern to look like this:
`r'(?i)^(?:[^']|\b).*\b' + re.escape(keyword) + r'\b'`

The key here is that `^(?:[^']|\b).*` will not match if the line starts with an apostrophe ('). The `|\b` alternative is necessary; otherwise the pattern would not match when the keyword is at the very start of the line:
https://regex101.com/r/CUI2V3/1
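
To make the behaviour of the proposed pattern concrete, here is a minimal Python sketch. The helper name `keyword_pattern` is hypothetical and not part of oletools, and the `re.MULTILINE` flag is an assumption on my part so that `^` anchors at the start of every line of a multi-line module rather than only at the start of the string.

```python
import re

def keyword_pattern(keyword):
    """Build the proposed pattern for one keyword (hypothetical helper)."""
    # re.MULTILINE is an assumption: it lets ^ match at the start of each line.
    return re.compile(
        r"(?i)^(?:[^']|\b).*\b" + re.escape(keyword) + r"\b",
        re.MULTILINE,
    )

pat = keyword_pattern("create")

# A line that starts with an apostrophe is skipped.
comment_hit = pat.search("'I love to create")

# Keyword at the very start of the line, or later in a code line, is still found.
start_hit = pat.search("create an object")
inline_hit = pat.search("x = Create")

# Limitation: leading indentation satisfies [^'], so an indented comment still matches.
indented_hit = pat.search("    'I love to create")
```

As the last case suggests, the pattern as written only skips comments whose apostrophe is in the first column. An indented comment, like the one in the sample below, would probably need the prefix to allow for leading whitespace as well.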

Alternatively, another option would be to remove all comment lines from `vba_code` before running the regex.
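
A rough sketch of that second option follows. Both `strip_comment_lines` and `has_keyword` are hypothetical helpers, not existing oletools functions, and the keyword search is assumed to be a plain case-insensitive word-boundary match. The sketch only drops lines that consist entirely of a comment (optional indentation followed by `'` or `Rem`). Trailing comments after code are deliberately left alone, since an apostrophe can also appear inside a string literal.

```python
import re

# Hypothetical: a line made up only of a comment, with optional indentation.
COMMENT_LINE = re.compile(r"(?i)^[ \t]*(?:'|rem\b).*$", re.MULTILINE)

def strip_comment_lines(vba_code):
    """Blank out whole-line comments, keeping line breaks so line numbers stay stable."""
    return COMMENT_LINE.sub("", vba_code)

def has_keyword(vba_code, keyword):
    """Assumed keyword check: word-boundary search on the comment-free code."""
    code = strip_comment_lines(vba_code)
    return re.search(r"(?i)\b" + re.escape(keyword) + r"\b", code) is not None

sample = (
    "Sub test()\n"
    "    'I love to create\n"
    '    MsgBox "Hello world"\n'
    "End Sub\n"
)

in_comment = has_keyword(sample, "create")   # keyword only appears in a comment
in_code = has_keyword(sample, "MsgBox")      # keyword appears in real code
```

Blanking the comment lines instead of deleting them keeps the line count unchanged, which may matter if results are later reported with line numbers.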

<hr>

**Affected tool:**
olevba and mraptor (possibly others that I haven't used)

**Describe the bug**
Suspicious keywords (e.g. "create") that appear only in comments are causing false positives.

**File/Malware sample to reproduce the bug**
```vba
Sub test()
    'I love to create
    MsgBox "Hello world"
End Sub
```

**How To Reproduce the bug**
Run olevba on the sample.

**Expected behavior**
No threat detected

