detectString: correctly compute substring length #562

williballenthin · 2023-01-09T12:43:59Z

this PR fixes the string length calculation when the requested string is a substring of a longer string that begins somewhere before it. in this case, take the longer string length and subtract the amount they overlap. i can't quite understand what the existing code was supposed to do; any ideas @atlas0fd00m ?

for example, in 294b8db1f2702b60fb2e42fdc50c2cee6a5046112da9a5703a548a4fa50477bc there is the data:

.rodata:00000000004120A2 ; const char asc_4120A2[]
.rodata:00000000004120A2 asc_4120A2      db 0Dh,0Ah              ; DATA XREF: sub_40D0A0+B3↑o
.rodata:00000000004120A2                                         ; sub_404970+2F↑o ...
.rodata:00000000004120A2                 db 0Dh,0Ah,0
.rodata:00000000004120A7 ; const char aHttps[]

and this instruction:

.text:000000000040499F BE A4 20 41 00          mov     esi, (offset asc_4120A2+2) ; char *

references the substring two characters into the longer string ("\r\n" versus "\r\n\r\n").
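The arithmetic for this example can be sketched as below. This is a minimal illustration rather than the patch itself: `substring_length` is a hypothetical helper, and the 5-byte size is read off the listing above (`0D 0A 0D 0A 00`, ending just before `aHttps` at 0x4120A7).

```python
def substring_length(loc_va: int, loc_size: int, va: int) -> int:
    """Length of the string starting at va, given a known string location
    covering [loc_va, loc_va + loc_size) that contains va.

    Hypothetical helper for illustration; not part of vivisect.
    """
    if not (loc_va <= va < loc_va + loc_size):
        raise ValueError("va is not inside the known string location")
    # How far into the longer string the reference lands.
    offset = va - loc_va
    return loc_size - offset


# asc_4120A2 occupies 0x4120A2..0x4120A6, so 5 bytes including the NUL.
outer_va, outer_size = 0x4120A2, 5
ref_va = 0x4120A4  # offset asc_4120A2+2

print(substring_length(outer_va, outer_size, ref_va))  # 3: "\r\n" plus the NUL
```

Whether the reported length should count the terminating NUL depends on the convention used for string locations; the sketch simply keeps whatever convention `loc_size` already follows.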

rakuy0 · 2023-01-13T17:43:46Z

vivisect/__init__.py

                            # technically the start of the full string, but the binary does
                            # some optimizations and just ref's inside the full string to save 
                            # some space
-                            return count + loc[L_SIZE]


Hmm, let's walk through this? Because this pair of returns (on lines 1032 and 1033) is trying to cover two cases:

1032 is supposed to handle the case where we discover the inner string first, and then detectString is called on a VA before that one. Since count is how far we've gotten into scanning a byte range, we use it to determine how big the outer string should be and add that to the length of what we already know.

1033 handles vice versa, where we make the longer string first, and then get a ref inside the longer string, so the length returned needs to be shorter. The return could be cleaned up a lot.

So let's workshop this a bit? I'm not fully confident we're covering both cases? But I admit, it's been a long time since I've revisited this particular section of code.
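As a starting point for that walkthrough, here is one possible reading of the two cases, written as a standalone sketch. It is neither the existing code nor the patch: `Location` and `length_with_known_location` are hypothetical names, and a location is modeled as a plain `(start_va, size)` pair instead of vivisect's location tuple.

```python
from typing import Optional, Tuple

# A known string location, modeled as (start_va, size). Illustrative only.
Location = Tuple[int, int]


def length_with_known_location(va: int, count: int, loc: Location) -> Optional[int]:
    """Return the string length at va once a scan, count bytes in, reaches loc.

    va + count is the address at which the scan ran into the known location.
    Returns None if that address is not actually inside loc.
    """
    loc_va, loc_size = loc
    hit = va + count
    if not (loc_va <= hit < loc_va + loc_size):
        return None
    if loc_va >= va:
        # Inner string was discovered first: the bytes scanned before it,
        # plus the size that is already known.
        return (loc_va - va) + loc_size
    # Outer string was created first: only what remains of it from va onward.
    return loc_size - (va - loc_va)


# Case 1: inner string (0x1004, size 6) known, detecting at 0x1000.
print(length_with_known_location(0x1000, 4, (0x1004, 6)))   # 10
# Case 2: outer string (0x1000, size 10) known, detecting at 0x1004.
print(length_with_known_location(0x1004, 1, (0x1000, 10)))  # 6
```

Distinguishing the cases by comparing `loc_va` with `va` is a design choice of this sketch; it avoids relying on the value of the byte at the hit address to tell them apart.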

atlas0fd00m · 2023-02-23

@williballenthin bump :) i know you have nothing else to do in life. ;)
thanks for your help.

atlas0fd00m · 2023-03-29

@williballenthin is this still a valuable thing? @rakuy0 would like to walk through how to solve this in the best fashion, but we've heard nothing for months. thanks :)

williballenthin · 2023-03-29

i'll provide a test case and another explanation shortly. thanks for your patience!

williballenthin · 2023-03-29T19:36:36Z

I've added a test to demonstrate the incorrect behavior. When run prior to this patch, the output is:

======================================================================
FAIL: test_overlapping_strings (vivisect.tests.teststrings.VivStringsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/vivisect/vivisect/tests/teststrings.py", line 30, in test_overlapping_strings
    self.eq(self.vw.detectString(0x1100 + i), 0x101 - i)
  File "/workspaces/vivisect/vivisect/tests/utils.py", line 16, in eq
    self.assertEqual(x, y)
AssertionError: 258 != 256

----------------------------------------------------------------------
Ran 1 test in 0.014s

which indicates that a substring of a string location is somehow reported as longer than the string location itself.
There's still a bug in my patch in the case of a substring consisting of a single character at the end of a string location; I'll fix this shortly.
I'll also add some additional cases, such as detecting a larger string that overlaps with an existing string location, and other scenarios.
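One way to read the numbers in that failure, offered as an assumption and not a confirmed trace: the test expects `0x101 - i`, which implies the location at 0x1100 is 0x101 bytes long, and the failing comparison `258 != 256` corresponds to `i == 1`. The value 258 is what `count + loc[L_SIZE]` would give with `count` equal to 1. The helper names below are hypothetical.

```python
LOC_SIZE = 0x101  # size of the string location at 0x1100, implied by the test


def expected_length(i: int) -> int:
    """Length the test expects for the substring starting i bytes in."""
    return LOC_SIZE - i


def count_plus_size(count: int) -> int:
    """Same shape as the removed return statement: count + loc[L_SIZE]."""
    return count + LOC_SIZE


print(expected_length(1))  # 256, the right-hand side of the assertion
print(count_plus_size(1))  # 258, the value reported by the failing run

# The property the test relies on: no substring is longer than the location.
assert all(expected_length(i) <= LOC_SIZE for i in range(0x100))
```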

I will write up a more detailed description tomorrow. It's late here, but I wanted to get some initial details over to you for review.

atlas0fd00m · 2023-03-30T01:42:13Z

thanks @williballenthin !
@rakuy0 , you got an eye on this? complex-strings is your baby :)

williballenthin · 2024-08-01T19:21:11Z

@rakuy0 please consider this also for the pending release. if you still need a walkthrough despite the test case, I can write it up August 5 or 6.

rakuy0 · 2024-08-12T14:42:25Z

vivisect/tests/teststrings.py

+        # naturally, the overlapping substrings should have
+        # sizes smaller than the string at 0x1100.
+        for i in range(0x100):
+            self.eq(self.vw.detectString(0x1100 + i), 0x101 - i)


You might need to tweak your test case a bit? It's failing on CI.

detectString: correctly compute substring length

25b88ef

williballenthin mentioned this pull request Jan 9, 2023

better extract overlapping strings mandiant/capa#1271

Closed

rakuy0 reviewed Jan 13, 2023

atlas0fd00m and others added 2 commits February 23, 2023 15:33

Merge branch 'master' into patch-15

6ef82c3

vivisect: add test demonstrating overlapping string detection

546febe

Merge branch 'master' into patch-15

d19b7fe

williballenthin mentioned this pull request Apr 3, 2023

v1.1.1 including dynamic import updates #590

Closed

Merge branch 'master' into patch-15

be70841

rakuy0 reviewed Aug 12, 2024

rakuy0 mentioned this pull request Aug 26, 2024

PE/ELF loader fixes #659

Merged
