Skip to content

Conversation

@williballenthin
Copy link
Contributor

@williballenthin williballenthin commented Jan 9, 2023

this PR fixes the string length calculation when the requested string is a substring of a longer string that begins somewhere before it. in this case, take the longer string length and subtract the amount they overlap. i can't quite understand what the existing code was supposed to do; any ideas @atlas0fd00m ?

for example, in 294b8db1f2702b60fb2e42fdc50c2cee6a5046112da9a5703a548a4fa50477bc there is the data:

.rodata:00000000004120A2 ; const char asc_4120A2[]
.rodata:00000000004120A2 asc_4120A2      db 0Dh,0Ah              ; DATA XREF: sub_40D0A0+B3↑o
.rodata:00000000004120A2                                         ; sub_404970+2F↑o ...
.rodata:00000000004120A2                 db 0Dh,0Ah,0
.rodata:00000000004120A7 ; const char aHttps[]

and this instruction:

.text:000000000040499F BE A4 20 41 00          mov     esi, (offset asc_4120A2+2) ; char *

references the substring two characters into the longer string ("\r\n" versus "\r\n\r\n").

# technically the start of the full string, but the binary does
# some optimizations and just ref's inside the full string to save
# some space
return count + loc[L_SIZE]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, let's walk through this? Because these pair of returns (on line 1032 and 1033) are trying to cover two cases:

  • 1032 is supposed to handle the case where we discover the inner string first, and then detectString is called on a VA before that one. Since count is how far we've gotten into scanning a byte range, we use it to determine how big the outer string should be and add that to the length of what we already know.
  • 1033 handles vice versa, where we make the longer string first, and then get a ref inside the longer string, so the length returned needs to be shorter. The return could be cleaned up a lot.

So let's workshop this a bit? I'm not fully confident we're covering both cases? But I admit, it's been a long time since I've revisited this particular section of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williballenthin bump :) i know you have nothing else to do in life. ;)
thanks for your help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williballenthin is this still a valuable thing? @rakuy0 would like to walk through how to solve this in the best fashion, but we've heard nothing for months. thanks :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll provide a test case and another explanation shortly. thanks for your patience!

@williballenthin
Copy link
Contributor Author

williballenthin commented Mar 29, 2023

I've added a test to demonstrate the incorrect behavior. When run prior to this patch, the output is:

======================================================================
FAIL: test_overlapping_strings (vivisect.tests.teststrings.VivStringsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/vivisect/vivisect/tests/teststrings.py", line 30, in test_overlapping_strings
    self.eq(self.vw.detectString(0x1100 + i), 0x101 - i)
  File "/workspaces/vivisect/vivisect/tests/utils.py", line 16, in eq
    self.assertEqual(x, y)
AssertionError: 258 != 256

----------------------------------------------------------------------
Ran 1 test in 0.014s

which indicates that somehow a substring of a string location has a length longer than the string location.
There's still a bug in my patch in the case of a substring with a single character at the end of a string location; I'll fix this shortly.
i'll also add some additional cases, such as detecting a larger string that overlaps with a string location and other scenarios.

I will write up a more detailed description tomorrow. Its later here but wanted to get some initial details over to you for review.

@atlas0fd00m
Copy link
Contributor

thanks @williballenthin !
@rakuy0 , you got an eye on this? complex-strings is your baby :)

@williballenthin
Copy link
Contributor Author

@rakuy0 please consider this also for the pending release. if you still need a walkthrough despite the test case, I can write it up August 5 or 6.

# naturally, the overlapping substrings should have
# sizes smaller than the string at 0x1100.
for i in range(0x100):
self.eq(self.vw.detectString(0x1100 + i), 0x101 - i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to tweak your test case a bit? It's failing on CI.

@rakuy0 rakuy0 mentioned this pull request Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants