-
Notifications
You must be signed in to change notification settings - Fork 191
detectString: correctly compute substring length #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| # technically the start of the full string, but the binary does | ||
| # some optimizations and just ref's inside the full string to save | ||
| # some space | ||
| return count + loc[L_SIZE] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let's walk through this? Because these pair of returns (on line 1032 and 1033) are trying to cover two cases:
- 1032 is supposed to handle the case where we discover the inner string first, and then detectString is called on a VA before that one. Since count is how far we've gotten into scanning a byte range, we use it to determine how big the outer string should be and add that to the length of what we already know.
- 1033 handles vice versa, where we make the longer string first, and then get a ref inside the longer string, so the length returned needs to be shorter. The return could be cleaned up a lot.
So let's workshop this a bit? I'm not fully confident we're covering both cases? But I admit, it's been a long time since I've revisited this particular section of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williballenthin bump :) i know you have nothing else to do in life. ;)
thanks for your help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williballenthin is this still a valuable thing? @rakuy0 would like to walk through how to solve this in the best fashion, but we've heard nothing for months. thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll provide a test case and another explanation shortly. thanks for your patience!
|
I've added a test to demonstrate the incorrect behavior. When run prior to this patch, the output is: ======================================================================
FAIL: test_overlapping_strings (vivisect.tests.teststrings.VivStringsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/workspaces/vivisect/vivisect/tests/teststrings.py", line 30, in test_overlapping_strings
self.eq(self.vw.detectString(0x1100 + i), 0x101 - i)
File "/workspaces/vivisect/vivisect/tests/utils.py", line 16, in eq
self.assertEqual(x, y)
AssertionError: 258 != 256
----------------------------------------------------------------------
Ran 1 test in 0.014swhich indicates that somehow a substring of a string location has a length longer than the string location. I will write up a more detailed description tomorrow. Its later here but wanted to get some initial details over to you for review. |
|
thanks @williballenthin ! |
|
@rakuy0 please consider this also for the pending release. if you still need a walkthrough despite the test case, I can write it up August 5 or 6. |
| # naturally, the overlapping substrings should have | ||
| # sizes smaller than the string at 0x1100. | ||
| for i in range(0x100): | ||
| self.eq(self.vw.detectString(0x1100 + i), 0x101 - i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to tweak your test case a bit? It's failing on CI.
this PR fixes the string length calculation when the requested string is a substring of a longer string that begins somewhere before it. in this case, take the longer string length and subtract the amount they overlap. i can't quite understand what the existing code was supposed to do; any ideas @atlas0fd00m ?
for example, in 294b8db1f2702b60fb2e42fdc50c2cee6a5046112da9a5703a548a4fa50477bc there is the data:
and this instruction:
references the substring two characters into the longer string ("\r\n" versus "\r\n\r\n").