Donut Malware Analysis
Introduction
The purpose of this blog post is to walk our readers, particularly those who are just stepping into
the realm of malware analysis, through our process of analyzing a unique .NET PE malware
that loads Remcos. This sample features the intriguing ability to switch execution between
managed and unmanaged code, a behavior that is well-known and documented. However, while
this ability is interesting in its own right, the primary focus will be on the analysis process itself.
We will cover not only the tools and analysis process, but also the key questions we, as
malware analysts, ask ourselves when we encounter obstacles.
The motivation behind writing this post is that we want to provide the kind of resource that we
would’ve liked to have seen more of when starting our own careers in malware research. Many
malware analysis reports are written with seasoned analysts in mind as the intended audience,
so details of the analysis process are often mentioned only in passing or omitted entirely.
On the other hand, while there is already a plethora of malware analysis
tutorials and resources geared towards beginners in print and online, they can sometimes be
too contrived to be applicable in more general cases, too small in scope where they cover only a
single tool in isolation without the full context, or written in such a way where it feels more like a
retrospective account of what’s already known rather than a real-time investigation of the
unknown.
We feel that it would be immensely valuable for beginner analysts to follow along a start-to-finish
analysis of an infection chain that demonstrates the practical application of multiple malware
analysis tools and methodologies, while still maintaining the sense of uncertainty and struggle
that often comes with investigating potential false negatives. Thus, we will start from the very
beginning where we don’t even know whether the sample is malicious, and end where we
discover which family of malware is delivered.
It is assumed that the reader is already familiar with some assembly, basic reverse engineering
concepts, and tools such as IDA Pro, x64dbg, and dnSpy.
Table of Contents
Part 1: Analyzing the first stage loader
Statically analyzing C# samples
Using Type References to infer functionality
Tracing the code: decrypting the payload
Tracing the code: using .NET interoperability to invoke native code
Tracing the code: executing the next stage payload
Summary
Part 2: Debugging the execution of native code
Debugging the unmanaged code using x64dbg
Using IDA Pro in parallel to statically analyze the shellcode
Syncing static and dynamic analysis
Dynamic API resolution
Identifying memcpy
Zeroing out memory
Identifying simple resource decryption and obtaining the plaintext
Some functions are not immediately clear
Dynamic API resolution (cont.)
Summary
Part 3: Covering its tracks by patching AMSI and EtwEventWrite
Getting function pointers to AMSI functions
Changing the memory protections of AMSI functions
Program counter (PC)-relative addressing
Disabling AmsiScanBuffer
Restoring the memory protections of AmsiScanBuffer
Disabling AmsiScanString
Disabling EtwEventWrite
Summary
Part 4: Creating and starting the .NET CLR
Copying a PE file into memory
Switch block in the main control code
Using GUIDs to identify interfaces: CLRCreateInstance
Loading type information libraries and creating custom structures
Converting a version string using MultiByteToWideChar
ICLRMetaHost::GetRuntime
Fixing the decompiled code
CLRRuntimeInfoImpl::IsLoadable
CLRRuntimeInfoImpl::GetInterface
ICorRuntimeHost::Start
Summary
Part 5: Loading the next stage payload
Copying the executable to a SAFEARRAY
CorHost::CreateDomain
SafeArrayCreate
Copying the PE file into the SAFEARRAY
Loading the assembly into the app domain
Cleaning up the PE file from memory
SafeArrayDestroy
Summary
Part 6: Running the .NET assembly in the CLR
Creating strings
GetType_2
InvokeMember_3
Summary
Part 7: Process injection
Attempting static analysis of the .NET assembly
Using Process Hacker to analyze .NET assembly behavior
Debugging process injection with x64dbg: Creating the process
Debugging process injection with x64dbg: Allocating and writing to the process memory
Summary
Conclusion
Part 1: Analyzing the first stage loader
The output “PE32 executable (GUI) Intel 80386 Mono/.Net assembly, for MS Windows”
indicates that our sample is a .NET assembly.
A .NET assembly is a DLL or EXE file produced by compiling code written in a .NET language
such as C#. However, instead of containing native x86/x64 machine code, it contains MSIL
(Microsoft Intermediate Language) bytecode. This MSIL gets executed by the .NET runtime,
which compiles the MSIL into native code at runtime.
A good tool for analyzing such files is dnSpy, a .NET debugger, assembly editor, and
decompiler.
When we load our sample into dnSpy, we immediately see that some functions and objects
have names like \u0001, \u0002, etc. instead of human-readable names, which suggests that
the sample is obfuscated. The fact that the sample is obfuscated in itself is not malicious;
companies will often obfuscate their software to protect intellectual property. However, it is still a
fact worth noting.
Figure 1.1.2: Decompiled .NET sample with obfuscated function names in dnSpy
We’ll use de4dot, a tool used for detecting common obfuscators and cleaning up .NET binaries,
by dragging and dropping the binary file onto de4dot.exe. The output should resemble the
output shown in Figure 1.1.3:
Figure 1.1.3: de4dot output
; alternatively, use the de4dot command line
> de4dot -f sample
Figure 1.1.4
So far, we have done a simple initial triage of the sample. Using file, dnSpy, and de4dot, we
have identified the sample as a .NET assembly, determined that it is obfuscated, and
de-obfuscated it. Now that the decompiled code is de-obfuscated, we can start examining it.
Using Type References to infer functionality
A good place to start is to look for any interesting indicators and work our way from there. In this
case, we can get a clue of the functionality of the sample by examining its type references. Type
references are metadata that allow assemblies to refer to classes, functions, or variables
defined in other assemblies.
As malware commonly depends on resources from the internet, we’ll look for any network-related
related keywords. We can see that the sample uses HttpClient:
If we click on Mcwpgr, we are taken to its function definition, where we can see a call to
HttpClient (Figure 1.2.3):
Figure 1.2.3
We can see that the sample creates an HTTP client and downloads a payload from
hxxps://bitbucket[.]org/veloncontinetaker/utencilio/downloads/Tsudun[.]pdf.
Using static analysis to examine the type references, we were able to quickly zero in on
interesting parts of the sample without having to trace the code from the entrypoint.
Now that we’ve identified that the sample attempts to make an HTTP connection, we’d like to
know more about the payload the sample is trying to download. We can do this by setting a
breakpoint in dnSpy and running the debugger.
Fortunately the payload (SHA256:
daba1c39a042aec4791151dbabd726e0627c3789deea3fc81b66be111e7c263e) was still
available at the time of analysis. However, it doesn’t appear to be a PDF file as the file
extension suggests; the payload lacks the expected magic bytes (or the file signature, which
analysts frequently use to identify the file types of payloads), “25 50 44 46 2D” (or “%PDF-”),
at the beginning of the file. In fact, we aren’t really sure what kind of file it is, as it doesn’t have
any recognizable magic bytes, and all of the bytes are within the ASCII range of characters:
We’ll need to return to examining the decompiled code in order to gain more insight into the
payload.
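The two checks described above (looking for the PDF signature and confirming that every byte falls within the ASCII range) are easy to script. The following is a minimal sketch rather than the tooling we used; the function names and the sample buffers are hypothetical, and in practice the data would be read from the downloaded file with open(path, "rb").

```python
PDF_MAGIC = bytes.fromhex("25 50 44 46 2D")  # the bytes of "%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the buffer starts with the PDF file signature."""
    return data.startswith(PDF_MAGIC)


def is_all_ascii(data: bytes) -> bool:
    """Return True if every byte is within the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    # Placeholder buffers, NOT the real payload.
    genuine_pdf_header = b"%PDF-1.7\n"
    mystery_payload = b"QUJDREVGRw=="  # arbitrary ASCII-only bytes
    for name, buf in (("pdf", genuine_pdf_header), ("payload", mystery_payload)):
        print(name, "pdf magic:", looks_like_pdf(buf), "all ascii:", is_all_ascii(buf))
```

A file that fails the signature check but passes the ASCII check, as ours does, is a hint that the content is encoded or encrypted text rather than a raw binary format.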
If we step out of the method that downloads the file, we see what appears to be a very simple
byte conversion, which may be the decryption routine (Figure 1.3.2):
Figure 1.3.2
We can take advantage of dnSpy’s debugging capabilities to execute this code and decrypt the
payload dynamically. Be sure to use a VM when conducting dynamic analysis on a sample. We
recommend using FLARE-VM, which automates setting up a VM for malware analysis.
Let’s set a breakpoint on the line after where the payload is downloaded and decrypted (Figure
1.3.3):
Figure 1.3.3
We can run the sample by clicking ‘Start’ at the top of the window (Figure 1.3.4):
Figure 1.3.4
We’ll hit the breakpoint, at which point the payload will have been downloaded from the internet
and decrypted. We can view the decrypted payload by right-clicking array in the debugger
window > “Show in Memory Window” > “Memory 1” (Figure 1.3.5):
Figure 1.3.5
We can now see the decrypted payload bytes in the Memory Window (Figure 1.3.6):
Figure 1.3.6
Using the debugger, we were able to download the payload and quickly decrypt it
(d2bea59a4fc304fa0249321ccc0667f595f0cfac64fd0d7ac09b297465cda0c4) without having to
implement a script to do so.
If we examine the methods that take in the decrypted payload as an argument, none of them do
byte conversions like we just saw previously, so there probably aren’t any additional layers of
encryption.
From here, we can take one of two options:
1. Start analyzing the decrypted payload.
2. Continue analyzing the other methods of the current sample.
In our initial analysis, we chose to take the latter approach, as taking the former brings the risk
of getting sucked into a rabbit hole and losing sight of the bigger picture.
Let’s examine this method that takes in the decrypted payload bytes as an argument (Figure
1.4.1):
Figure 1.4.1
This method calls a series of other methods, but one of them catches our attention; without
even having to read the code line by line, we can guess that VirtualAlloc is probably getting
called here (Figure 1.4.2):
Figure 1.4.2
But how can a .NET assembly call a native API function from a DLL?
Tracing the code: executing the next stage payload
Let’s examine smethod_2. It takes the arguments of smethod_4 saved in an array, plus the
strings ‘kernel32.dll’ and the mangled ‘VirtualAlloc’ string (Figure 1.5.1):
Figure 1.5.1
It looks like the strings are fed into smethod_8 before they’re passed into smethod_7 (Figure
1.5.2):
Figure 1.5.2
Before looking at the contents of smethod_8, let’s take a look at smethod_7’s parameters
(Figure 1.5.3):
Figure 1.5.3
Judging by the fact that smethod_8 takes in the ‘kernel32.dll’ string and the mangled
‘VirtualAlloc’ string, and returns something of type ‘IntPtr’, smethod_8 probably returns
the function pointer to VirtualAlloc. Since we have a reasonable guess of what
smethod_8 does, we won’t look into its implementation for now unless we have reason to
believe that our guess was incorrect.
Figure 1.5.4
Let’s check out what arguments are passed into VirtualAlloc by setting a breakpoint right
after the array of arguments for VirtualAlloc is set up (Figure 1.5.5):
Figure 1.5.5
When we reach the breakpoint, we can examine array to view the argument values (Figure
1.5.6):
Let’s revisit the method that called the one where the payload was downloaded and decrypted,
as well as the method that invoked VirtualAlloc. We will examine the next method,
Marshal.Copy (Figure 1.5.7):
Figure 1.5.7
This method also calls several other methods, but one of them looks similar to what we just saw
with VirtualAlloc, only this time we see VirtualProtect (Figure 1.5.9):
Figure 1.5.9
Using the same method as before to find the arguments of VirtualAlloc, we find that the
arguments for VirtualProtect are:
| Parameter name | Type | Argument value | Description |
|---|---|---|---|
| lpAddress | LPVOID | 0x06510000 | “The address of the starting page of the region of pages whose access protection attributes are to be changed.” |
| dwSize | SIZE_T | 0x0010AA35 | “The size of the memory region whose protections are to be changed, in bytes.” In this case, this is the size of our downloaded payload. |
Indeed, if we go back to the previous method (Figure 1.5.10), we see that the next few
instructions include calls to Marshal.GetDelegateForFunctionPointer and
DynamicInvoke (which were used to invoke VirtualAlloc and VirtualProtect), and
we can see that the pointer to the allocated memory where our payload was copied to is passed
in as an argument, thus confirming our suspicion that the sample is executing shellcode:
Figure 1.5.10
Through a combination of static and dynamic analysis, we determined that VirtualAlloc and
VirtualProtect were being called, identified their arguments, and discovered that the
sample executes the decrypted payload.
Summary
To summarize this section, we’ve:
1. Identified the sample as a .NET assembly using the file utility
2. De-obfuscated the sample using de4dot
3. Used dnSpy to determine that the sample downloads a payload from the internet
4. Used dnSpy’s debugging capabilities to download the payload and decrypt it
5. Determined that the sample loads the decrypted payload into memory and executes it
The reason we bring this up is that the sample has transitioned execution from managed to
unmanaged code; in the last section, we observed that the sample, which was executed in a
runtime, downloaded a payload and wrote it to memory, converted the pointer to this memory
into a delegate, and invoked it.
Up until now, we were able to step through the managed code with dnSpy, but since execution
hopped to unmanaged code, we’ve lost visibility of the execution flow of the sample and can no
longer control its execution with dnSpy’s debugger. We’ll need to find some way to use a native
debugger, like x64dbg, to attach to the process in order to gain insight on the unmanaged code.
We first load the sample into x64dbg, and then set a breakpoint on
kernel32.VirtualProtect by selecting the ‘Symbols’ tab > Type ‘kernel32.dll’ in the
Module search bar > Type ‘VirtualProtect’ in the Symbol search bar > Right click
‘VirtualProtect’ > Click ‘Toggle Breakpoint’ (Figure 2.1.1):
Figure 2.1.1
Since we know what arguments are passed to VirtualProtect, we can set a conditional
breakpoint by selecting the ‘Breakpoints’ tab > Right-click the breakpoint we created in the
previous step > ‘Edit’ (Figure 2.1.2):
Figure 2.1.2
We’ll create the following conditional breakpoint (note that the arguments are zero-indexed, so
we are getting the second and third arguments of VirtualProtect) (Figure 2.1.3):
Figure 2.1.3
After setting the breakpoint, we run the sample. When we break at our breakpoint, we can see
that the stack now contains the expected arguments of VirtualProtect (Figure 2.1.4).
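To see why zero-indexed arguments map onto the stack the way they do, recall that on entry to a 32-bit stdcall function the dword at ESP is the return address, and the arguments follow at 4-byte intervals. The sketch below models that layout in Python. The helper name is ours, the lpAddress and dwSize values are the ones we observed earlier, and the return address and the last two argument values are placeholders chosen purely for illustration.

```python
import struct


def stdcall_arg(stack: bytes, index: int) -> int:
    """Read zero-indexed argument `index` from a 32-bit stack snapshot.

    `stack` begins at ESP on entry to the callee, so the first dword is the
    return address and argument 0 starts 4 bytes later.
    """
    offset = 4 + 4 * index
    (value,) = struct.unpack_from("<I", stack, offset)
    return value


# Hypothetical snapshot: return address followed by four arguments.
stack_at_entry = struct.pack(
    "<5I",
    0x00401234,  # return address (placeholder)
    0x06510000,  # arg 0: lpAddress
    0x0010AA35,  # arg 1: dwSize
    0x00000040,  # arg 2: flNewProtect (placeholder value)
    0x0019F000,  # arg 3: lpflOldProtect (placeholder value)
)

if __name__ == "__main__":
    for i in range(4):
        print(f"arg {i}: {stdcall_arg(stack_at_entry, i):#010x}")
```

With this layout in mind, index 1 is the second argument and index 2 is the third, which matches the note about zero-indexing above.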
We can now set an execution breakpoint at the beginning of the allocated memory (“Memory
Map” tab > right click 06BD0000 > “Memory Breakpoint” > “Execute” > “Singleshoot”), and
remove the breakpoint we placed on VirtualProtect. We continue execution, and we break
at the first instruction of the shellcode. We have successfully found a way to control the
execution of the unmanaged code using the debugger.
At this point, as we use x64dbg for dynamic analysis, we’ll also start using IDA Pro in parallel to
statically analyze the shellcode.
When analyzing shellcode, it can be very difficult to comprehend the sample based on
static or dynamic analysis alone. Although IDA Pro is capable of decompiling code, it
doesn’t give the whole picture since many variables and arguments are generated dynamically,
and functions are often called by memory reference. We would know the overall control flow of
the program, but we wouldn't know the purpose of the functions without any data, API function
names, or strings to inform us. On the other hand, with just dynamic analysis (or in this case,
x64dbg), it can be very easy to lose the forest for the trees when stepping through assembly
instructions. Thus, we will frequently need to switch between the two tools to put together a
clear picture of what the malware is doing; we’ll use the debugger to step our way through the
instructions, and use the decompiled code as landmarks and signposts to guide us along the
way.
Next, we’ll want to find the decompiled code in IDA Pro that corresponds to the first instructions
in the shellcode so that we can follow along the decompiled code as we use the debugger.
We copy the first set of bytes before the first jump:
Then we search for these bytes in IDA Pro (“Search” in top menu bar > “Next sequence of
bytes”):
Figure 2.2.3: Searching for the first set of instructions in IDA Pro
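The same byte search can be done outside of IDA Pro against a dump of the shellcode, which is handy for confirming that the sequence is unique before relying on it as a landmark. This is a minimal sketch: the helper name is ours, and both the dump contents and the byte pattern below are made up for illustration rather than taken from the sample.

```python
from typing import List


def find_all(haystack: bytes, needle: bytes) -> List[int]:
    """Return every offset at which `needle` occurs in `haystack`."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


if __name__ == "__main__":
    # Placeholder data standing in for the dumped shellcode and the copied bytes.
    pattern = bytes.fromhex("55 8B EC 83 EC 10")
    dump = b"\x90" * 32 + pattern + b"\xCC" * 32
    hits = find_all(dump, pattern)
    print([hex(offset) for offset in hits])
    if len(hits) == 1:
        print("unique match; offset can be used to line up the two tools")
```

If the search returns more than one offset, copying a longer run of bytes from the debugger usually narrows it down to a single match.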
At this moment, we anticipate that readers might be wondering about the functions above the
else block. Our readers might understandably fear that important information would be missed if
those functions are left unanalyzed. We reassure our readers that we often share that same
fear. However, rather than examine each and every function, the approach we prefer to take is
to follow the main execution flow until it either runs too deep to the point where we feel that it
would be good to explore nearby functions, or the execution flow simply does not provide
enough information to understand the sample. For now, let’s dive into it. We can always return
and analyze the surrounding code if necessary. Remember there’s always the option of running
dynamic analysis tools like Process Monitor and fakenet in the background to catch any
interesting behavior should it occur while we’re debugging.
We step into sub_108F76 and see multiple calls to a function, sub_10A51D (Figure 2.3.2).
Figure 2.3.2
If we step over the first call, we can see that the function pointer to kernel32.VirtualAlloc
is returned in EAX (Figure 2.3.3):
Figure 2.3.3
If we step over the next two calls to sub_10A51D, they also yield function pointers to other APIs
(Figures 2.3.4 and 2.3.5). From this, we can probably deduce that sub_10A51D dynamically
resolves API functions and returns their pointers.
Let’s label this function as get_func_ptr to make the decompiled code easier to read (right
click the function name > “Rename global item”):
We can do the same thing in x64dbg by right clicking the address of the function (in our case, it
was 0x06CDA51D) > “Label” > “Label 06CDA51D”:
Finally, let’s also rename the variables that store the returned function pointers (right click
variable name > “Rename lvar”):
Figure 2.3.8: The decompiled code after renaming the local variables containing the returned function
pointers
We’ll step forward until we reach the next function (Figure 2.3.9):
Figure 2.3.9
Looks like we’re calling VirtualAlloc to allocate a memory region of 12288 bytes. When we
step over the call, it returns the address of the allocated memory, 0x06710000 (Figure 2.3.10):
Figure 2.3.10
Let’s label the variable that stores the memory address alloced_mem:
Figure 2.3.11: The decompiled code after renaming the pointer to the memory allocated by
VirtualAlloc
We’ll also label v7, as it’s essentially alloced_mem, and is used in the upcoming functions:
Figure 2.3.12: The decompiled code after renaming the copy of alloced_mem
Lastly, we’ll label the address in x64dbg (right click “06710000” > “Follow in Dump” > right click
“06710000” in the dump pane > “Label Current Address”):
Identifying memcpy
We step forward until we hit the next function, sub_10A992 (Figure 2.4.1):
Figure 2.4.1
Figure 2.4.2
We can guess that this function probably does something to the newly allocated memory, so
let’s watch the memory dump of alloced_mem. However, before we do so, let’s first take a look
at what’s inside 0x06BD0005 (Figure 2.4.3), one of the other arguments passed into
sub_10A992:
Figure 2.4.3
Figure 2.4.5
Apparently, sub_10A992 copied the bytes from 0x06BD0005 to the memory region pointed to
by alloced_mem. The third argument, 0x00107C96, might have been the number of bytes to
be copied. Considering that the function is copying bytes from one area of memory to another,
and the arguments appear to be (1) a pointer to the destination buffer, (2) a pointer to the source
buffer, and (3) an integer denoting the number of bytes to copy, the function that we’re looking at
is most likely memcpy.
Since it’s not immediately clear what these bytes are for, we’ll just continue stepping forward to
the next function and hope that we’ll eventually gain more info. For now though, we’ll label
sub_10A992 as memcpy:
Figure 2.4.6: The decompiled code after renaming sub_10A992 to memcpy
We’ll step to the corresponding instruction where sub_10A9B6 is called (Figure 2.5.2):
Figure 2.5.2
Figure 2.5.3
If we look up 0x0581E984 in the memory map (“Memory Map” tab > Ctrl+G > type
“0x0581E984” in the search bar), we can see that this is an address in the stack:
Figure 2.5.4: The memory region containing 0x0581E984 as shown in the “Memory Map” view of
x64dbg
From this we can probably deduce that sub_10A9B6 manipulates the stack.
Let’s compare and contrast the memory at 0x0581E984 before and after sub_10A9B6 is
called:
Figure 2.5.5
Looks like 32 bytes were zeroed out, which makes sense, considering that the second argument
is 0 and the third argument is 32.
The behavior and signature of this function suggest that we are looking at memset, so we’ll
label it as such:
It’s not immediately obvious what the purpose of zeroing out those bytes is. We’ll have to move
on and hope that the purpose will be more apparent later. We’ll rename the modified variable in
IDA Pro to zeroed_out_mem:
Figure 2.5.8: The decompiled code after renaming the variable containing the zeroed out memory
We’ll also label the address in x64dbg so that we can easily recognize it if we come across it
again:
If we peek into sub_10A68A’s decompiled code and skim over its contents, we’ll notice several
things. The first is that there are multiple calls to __ROL4__, which is the “rotate left” bit
operation. The second is that there are a lot of other bit operations, like bitwise AND. Lastly, we
see that these bitwise operations occur in a do/while loop.
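For readers who have not come across it before, __ROL4__ is the decompiler’s shorthand for rotating a 32-bit value left, where the bits shifted out of the top re-enter at the bottom. A minimal Python equivalent is sketched below; the function name is ours, and it only illustrates the single operation, not the routine inside sub_10A68A.

```python
def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits."""
    value &= 0xFFFFFFFF
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


if __name__ == "__main__":
    # The top bit wraps around to the bottom instead of being discarded.
    print(hex(rol32(0x80000001, 1)))
    print(hex(rol32(0x12345678, 8)))
```

Python integers are unbounded, so the explicit masking with 0xFFFFFFFF is what keeps the result within 32 bits.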
The application of bit operations in an iterative manner suggests that we are looking at a
cryptographic function, and we can expect the function to return plaintext. Let’s step to the
instruction that calls sub_10A68A and check out its arguments:
Figure 2.6.4
The first three arguments appear to be within the chunk of memory that was allocated earlier at
0x06710000 (alloced_mem). Let’s do a before-and-after comparison of the data at all three
addresses:
Figure 2.6.5
Figure 2.6.6
After execution, 0x06710004 and 0x06710014 appear to be unchanged (not shown), but the data at
0x0671023C looks different, which suggests that 0x06710004 holds the key and 0x0671023C holds the
ciphertext that is decrypted in place. The decrypted data appears to be a list of names of common system DLLs.
Figure 2.6.8: The decompiled code after renaming the decryption function
To recap, we have just identified the purpose of a function as a decryption routine based on the
type of operations it uses, and used the debugger to obtain the plaintext from memory.
Figure 2.7.1
Figure 2.7.2
There is no change in the before-after dump comparison (not shown), and this is what the
function returns (Figure 2.7.3):
Figure 2.7.3
It’s not immediately clear what this is, but it’s obviously not a valid memory address, so we’ll just
move on.
We reassure our readers that this happens frequently in malware analysis. When analyzing a
function, we may understand what it does but not know its purpose until later. Since our goal is
to determine the overall purpose of the malware and its method of achieving its purpose (as
opposed to creating a detailed and comprehensive report on every one of its functions), we may
never even find the purpose of copying these bytes at all. This is why we suggest using a
breadth-first-search approach and only focusing on analyzing functions that are obviously vital.
Figure 2.8.1
This time, it returns LoadLibraryA, an important API function commonly called by malware for
loading DLLs:
If we quickly click through the next set of instructions, we notice that we’ve entered a loop
(Figure 2.8.4):
Figure 2.8.4
Here is the corresponding loop in IDA Pro (notice that there is also an inner loop) (Figure 2.8.5):
Figure 2.8.5
Shifting our attention back to the debugger, if we keep an eye on the registers as we iterate
through the loop, we see that EDX (which initially points to the section of memory containing a
list of names of DLLs) increments by 1, and EAX counts the number of times we loop. We then
stop when EDX points to “;”, at which point we break out of the inner loop.
Figure 2.8.6
The registers after several iterations of looping (notice that EDX has advanced forward a few
characters in the string, and EAX keeps track of the number of times we have advanced)
(Figure 2.8.7):
Figure 2.8.7
If we look back at the decompiled code, we can find the corresponding comparison (recall that
one of the operands in the cmp instruction is “3B”, or “59” in decimal):
Figure 2.8.8: One of the loop’s break conditions in the decompiled code
We’ll convert the number to a char (Right click “59” > “Char”) to make the decompiled code
easier to read:
Figure 2.8.9: The decompiled code after converting the break condition value from a decimal value to a
character
Once we break out of the inner loop, we step to where sub_10A324 is called and check out the
arguments that are passed to it (Figure 2.8.10):
Figure 2.8.10
Figure 2.8.11
The arguments are alloced_mem and the string “ole32”. When we step over the function, we
see that EAX points to the base address of ole32.dll:
It appears that sub_10A324 takes the name of a DLL as input, and returns a handle to that
DLL. The timing makes sense, as we had just obtained the function pointer for
LoadLibraryA. We’ll label this function as get_dll_base_addr:
Figure 2.8.13: The decompiled code after renaming sub_10A324
Figure 2.8.14
When we step forward, we end up at the same inner loop as before. As a reminder, this is the
loop as it appears in x64dbg:
Figure 2.8.17
Here are the starting conditions at the beginning of the loop (notice that EDX contains the same
string from the first time we encountered the inner loop, only this time it’s missing the “ole32”
substring):
Figure 2.8.18: The registers in the second iteration of the outer loop, at the top of the inner loop
As we noticed before, after a few iterations, EAX has been incremented a few times, and EDX has
advanced through the string by a number of characters equal to the value of EAX:
Figure 2.8.19: The registers in the second iteration of the outer loop, after a few iterations of the inner
loop
If we set a breakpoint on the instruction right after the end of the inner loop and step forward, we
end up at the get_dll_base_addr function call with the following arguments (Figure 2.8.20):
Figure 2.8.20
Finally, when we step over the function, the base address of oleaut32.dll is returned
(Figure 2.8.21):
Figure 2.8.21
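The behavior of the two loops can be summarized in a few lines of Python. This is a sketch of what we observed in the debugger, not a transcription of the shellcode: the function name is ours, the input contains only the two DLL names we have seen so far (the real buffer holds more), and the way the outer loop terminates is an assumption.

```python
def split_on_delimiter(buf: bytes, delimiter: int = 0x3B):
    """Yield each name in a delimiter-separated list (0x3B is ';')."""
    pos = 0
    while pos < len(buf):  # outer loop: one pass per DLL name
        count = 0  # plays the role of EAX
        # Inner loop: advance one character at a time until ';' is found.
        while pos + count < len(buf) and buf[pos + count] != delimiter:
            count += 1
        yield buf[pos:pos + count].decode("ascii")
        pos += count + 1  # skip past the delimiter


if __name__ == "__main__":
    dll_list = b"ole32;oleaut32"  # shortened, illustrative list
    for name in split_on_delimiter(dll_list):
        print(name)  # each name would then be passed to get_dll_base_addr
```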
Let’s set a breakpoint on the instruction after the end of the outer loop and continue execution.
Figure 2.8.23
Figure 2.8.24
We also notice that we’ve entered another loop (Figure 2.8.25):
Figure 2.8.25
Using the same tactic of looping a few times and watching the registers, we observe that in
each iteration, ECX increments by 4, EBP increments by 1, and a function pointer is returned in
EAX.
Figure 2.8.26
Figure 2.8.27
In the memory dump, if we monitor the address pointed to by ECX during each iteration, we can
visually confirm that the memory after 0x06710030 is getting filled with function pointers.
Dump of 0x06710030 after the tenth iteration (Figure 2.8.28):
Figure 2.8.28
Figure 2.8.29
This is apparently where the malware saves the function pointers of the loaded APIs.
What the malware has accomplished here is loading function pointers into memory so that it can
utilize a common technique called dynamic API resolution. Libraries (i.e. DLLs) contain
functions that are exported for the caller to use, and can be statically or dynamically linked. In
the case of dynamic linking, an executable’s imports are stored in the import table in its PE
header, and the PE loader parses this table in order to import the libraries. Because import
tables are visible to antivirus software, malware authors often use functions such as
LoadLibrary and GetProcAddress to manually load imports without having to expose the
imports in the import table.
These functions are invoked by calling through the pointers stored at fixed offsets within that
memory buffer. In the IDA pseudocode view, it’ll look something like “a1 + <offset>”, where a1 is
a pointer to a memory buffer. We’ll see many examples of this further on.
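To make the “buffer plus offset” idea concrete, the sketch below models the buffer as a Python bytearray, with 4-byte pointers written consecutively starting at offset 0x30, mirroring what we saw being filled in after 0x06710030. The helper names are ours and the pointer values are invented placeholders, not addresses from the sample.

```python
import struct

POINTER_SIZE = 4     # 32-bit process, so each pointer occupies 4 bytes
TABLE_OFFSET = 0x30  # pointers were observed starting at alloced_mem + 0x30


def store_pointers(buf: bytearray, pointers) -> None:
    """Write each resolved pointer into consecutive slots of the table."""
    for i, ptr in enumerate(pointers):
        struct.pack_into("<I", buf, TABLE_OFFSET + i * POINTER_SIZE, ptr)


def load_pointer(buf: bytearray, offset: int) -> int:
    """Read the pointer stored at `offset`, i.e. the value at a1 + offset."""
    return struct.unpack_from("<I", buf, offset)[0]


if __name__ == "__main__":
    alloced_mem = bytearray(0x100)
    # Placeholder addresses standing in for resolved API functions.
    store_pointers(alloced_mem, [0x75A01000, 0x75A02000, 0x75A03000])
    print(hex(load_pointer(alloced_mem, 0x34)))  # second slot in the table
```

When reading the decompiled code, noting which API ended up in which slot lets us translate each “a1 + <offset>” call back into a function name.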
Summary
In this section, we managed to attach a native debugger to the process so that we can control
the execution of the native code, dumped the native code so that we can decompile it in IDA
Pro, and observed that the malware:
We step forward until we encounter the next function, sub_108106 (Figure 3.1.1):
Figure 3.1.1
The function gets called with a single argument, alloced_mem (Figure 3.1.2):
Figure 3.1.2
Going through our usual process of comparing the dumps of memory addresses and checking
EAX for return values doesn’t really reveal anything interesting about this function. We’ll take a
look at the decompiled code and step into the function with the debugger to get a better idea of
what’s going on.
The first function is a call to get_dll_base_addr, which returns the base address of
amsi.dll (Figure 3.1.4):
Figure 3.1.4
We continue stepping forward until we hit the next function, sub_10A3C2 (Figure 3.1.6):
Figure 3.1.6
and returns:
The function appears to take in as arguments a pointer to alloced_mem, the base address of
amsi.dll, and the string “AmsiScanBuffer”, and returns a pointer to AmsiScanBuffer, which
scans a buffer for malware. Let’s update the variables and function accordingly:
Figure 3.1.9: The decompiled code after labeling get_dll_func and amsi_scan_buffer
Changing the memory protections of AMSI functions
The next function gets called dynamically using pointer arithmetic (Figure 3.2.1):
Figure 3.2.1
Figure 3.2.2
Let’s examine the arguments that are passed into VirtualProtect (Figure 3.2.3):
Figure 3.2.3
Referring back to Table 1.5.2, we are setting 12 (0xC) bytes of memory at the start of the
instructions of AmsiScanBuffer (0x73B840E0) to PAGE_EXECUTE_READWRITE (0x40),
and storing the previous protection value at 0x0581E95C.
Setting memory to be both writable and executable provides the ability for threat actors to
execute arbitrary code, so the fact that the memory that stores the instructions of a function
(particularly a function that is an integral component of Windows security) is getting set to both
executable and writable is very suspicious.
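When working through many VirtualProtect calls, it can help to translate the raw protection values into their names and flag the ones that are both writable and executable. The sketch below does this for the basic Windows memory protection constants; the function names are ours, and modifier flags such as PAGE_GUARD are deliberately ignored to keep the example short.

```python
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

# Protections that allow both writing and executing the same memory.
WRITABLE_AND_EXECUTABLE = {0x40, 0x80}


def describe_protection(value: int) -> str:
    """Return the name of the basic protection constant in the low byte."""
    return PAGE_PROTECTIONS.get(value & 0xFF, f"UNKNOWN({value:#x})")


def is_writable_and_executable(value: int) -> bool:
    """Return True if the protection permits both write and execute access."""
    return (value & 0xFF) in WRITABLE_AND_EXECUTABLE


if __name__ == "__main__":
    new_protect = 0x40  # the value passed to VirtualProtect in this step
    print(describe_protection(new_protect), is_writable_and_executable(new_protect))
```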
Figure 3.3.1
It takes no arguments, and when we step over it, it just returns the address of the function:
The decompiled code of sub_10A31A is consistent with what we just observed in the
debugger:
Figure 3.3.3: The decompiled code of sub_10A31A
The function is extremely simple, but what is the purpose of doing this? Let’s turn to the
debugger to see if we can get some clues.
Figure 3.3.5
Here are the same instructions in IDA. Note that the call instruction is actually written as call
$+5:
Putting these two facts together, call $+5 means “push the address immediately after the
call instruction onto the stack, and then jump to that address”.
This might seem like a very roundabout way of pushing the address of the next instruction onto
the stack, but there actually isn’t a more straightforward way of doing so using the 32-bit x86
instruction set; an instruction like push eip+5 is not valid, as EIP cannot be used directly as
an operand.
Let’s turn our attention back to the debugger to observe this in action. call 6CDA31F pushes
0x06CDA31F onto the stack, and then jumps to 0x6CDA31F:
Figure 3.3.7: The operand of the call instruction is also the address of the next instruction
Now that 0x06CDA31F is on the stack, it gets stored in the EAX register with the pop eax
instruction:
And then we subtract 5 from 0x06CDA31F with the sub eax, 5 instruction:
As we observed when we first stepped over sub_10A31A, the end result is that 0x06CDA31A
gets stored in EAX.
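The three instructions can be modeled in a few lines of Python. This is only a sketch of the arithmetic, assuming (as the addresses above suggest) that the call instruction is the first instruction of sub_10A31A and that it is the usual 5-byte near call; the function name emulate_get_pc is our own.

```python
CALL_REL32_LENGTH = 5  # opcode byte plus a 4-byte relative displacement


def emulate_get_pc(function_address):
    """Model `call $+5; pop eax; sub eax, 5` for a function at function_address."""
    stack = []
    # call $+5: push the address of the next instruction, then jump to it.
    next_instruction = function_address + CALL_REL32_LENGTH
    stack.append(next_instruction)
    # pop eax: the pushed address ends up in EAX.
    eax = stack.pop()
    # sub eax, 5: step back over the 5-byte call instruction.
    eax = (eax - CALL_REL32_LENGTH) & 0xFFFFFFFF
    return eax


result = emulate_get_pc(0x06CDA31A)
print(hex(result))  # 0x6cda31a
```

Whatever address the shellcode happens to be loaded at, the same three instructions recover it, which is what makes the code position-independent.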
By calling sub_10A31A, the shellcode can access the resources it needs using an offset
relative to the address of sub_10A31A in memory. If we look at the decompiled code to see
how it’s used, we can see that the address returned by sub_10A31A, which we’ll now call
get_pc, is used in the second argument of memcpy in order to access the address of the
source buffer.
Disabling AmsiScanBuffer
Moving on to the next function, memcpy (Figure 3.4.1):
Figure 3.4.1
Figure 3.4.2
Figure 3.4.3
We’ll use our usual approach of comparing the memory region’s contents before and after the
function call:
0x73B840E0, the destination buffer, before the call to memcpy (Figure 3.4.4):
Figure 3.4.4
0x73B840E0, the destination buffer, after the call to memcpy (Figure 3.4.5):
Figure 3.4.5
This partially answers the question we asked ourselves earlier; it appears that the malware
altered the memory protection value to PAGE_EXECUTE_READWRITE so that it can directly
write to the first 12 bytes of AmsiScanBuffer. However, we still want to determine the
purpose behind this action.
Since these bytes are instructions, we can probably gain insight into what the malware wants to
accomplish by comparing how the disassembled instructions change after memcpy is called.
Using the VM snapshot feature, we go back in time to right before memcpy is called, and
examine the disassembled instructions of AmsiScanBuffer:
Figure 3.4.6
The first instruction ‘mov eax, [esp+18]’ moves the contents of ‘esp+18’ (the offset is
hexadecimal, so this is the sixth argument of AmsiScanBuffer) into EAX. According to the
AmsiScanBuffer documentation, the last argument, result, is a pointer to an AMSI_RESULT
value that receives the result of the buffer scan. The next instruction, ‘and [eax], 0’, sets
that AMSI_RESULT to 0, which, according to amsi.h (Figure 3.4.8), corresponds to
AMSI_RESULT_CLEAN (nothing malicious was detected in the buffer):
Figure 3.4.8
‘xor eax, eax’ just zeroes out EAX. Finally, ‘ret 18’ pops the return address into EIP and
adds 0x18 to ESP, effectively cleaning up 24 bytes from the stack.
In summary, AmsiScanBuffer has been patched in memory to always return
AMSI_RESULT_CLEAN. This is one of many methods used to bypass AMSI.
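To make the effect of the four patched instructions concrete, here is a toy Python emulation of them on a simulated 32-bit stack. The addresses, the stale result value, and the function name run_patched_stub are all made up for illustration; only the instruction semantics come from the disassembly above.

```python
def run_patched_stub(stack, memory, esp):
    """Emulate the patched AmsiScanBuffer prologue on a toy 32-bit stack.

    stack  : dict of address -> dword (return address followed by 6 arguments)
    memory : dict of address -> dword (holds the caller's result slot)
    esp    : stack pointer on entry, pointing at the return address
    """
    eax = stack[esp + 0x18]      # mov eax, [esp+18] : load the sixth argument
    memory[eax] &= 0             # and [eax], 0      : *result = 0 (clean)
    eax = 0                      # xor eax, eax      : return value 0
    return_address = stack[esp]  # ret 18            : pop the return address...
    esp += 4 + 0x18              # ...and discard 24 bytes of arguments
    return eax, return_address, esp


# Hypothetical values, chosen only to exercise the emulation.
ESP = 0x0019F000
RESULT_PTR = 0x0019F100
stack = {ESP: 0x00401000}                # return address
for i in range(6):
    stack[ESP + 4 + 4 * i] = 0           # six 4-byte arguments
stack[ESP + 0x18] = RESULT_PTR           # sixth argument: pointer to the result
memory = {RESULT_PTR: 0xDEADBEEF}        # arbitrary stale value

eax, ret_addr, new_esp = run_patched_stub(stack, memory, ESP)
print(memory[RESULT_PTR], eax, hex(new_esp - ESP))
```

The 24 bytes removed by ‘ret 18’ are exactly the six 4-byte arguments, so the caller sees a normal, successful return with a clean result.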
Figure 3.5.1
Figure 3.5.2
This time, VirtualProtect is called with the following arguments (Figure 3.5.3):
Figure 3.5.3
Like the first call to VirtualProtect, the first argument is a pointer to AmsiScanBuffer,
and the second argument is 12, but the third argument this time is 0x20. If we look up the
Microsoft documentation for VirtualProtect, we see that this value corresponds to
PAGE_EXECUTE_READ. It appears that the malware is restoring the memory protection to its
original value.
Disabling AmsiScanString
Looking back at the decompiled code in IDA, we notice that the combination of
get_dll_func, the call to VirtualProtect via pointer arithmetic to set the memory to
PAGE_EXECUTE_READWRITE, get_pc, memcpy, and the second call to VirtualProtect to
set the memory to PAGE_EXECUTE_READ, is invoked once more:
Figure 3.6.1: The same combination of get_dll_func, VirtualProtect, get_pc, and memcpy is
called again
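The unprotect, overwrite, and re-protect sequence can be summarized with a small Python model. This is a deliberately simplified sketch: ToyRegion stands in for a page of process memory, the placeholder stub bytes are not the bytes used by the sample, and real memory protection has more states than the two modeled here.

```python
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40


class ToyRegion:
    """A tiny stand-in for a region of process memory with a protection flag."""

    def __init__(self, data, protection):
        self.data = bytearray(data)
        self.protection = protection

    def virtual_protect(self, new_protection):
        """Change the protection and return the previous value."""
        old, self.protection = self.protection, new_protection
        return old

    def memcpy(self, offset, src):
        """Copy src into the region; fails unless the region is writable."""
        if self.protection != PAGE_EXECUTE_READWRITE:
            raise PermissionError("region is not writable")
        self.data[offset:offset + len(src)] = src


def patch_function(region, stub):
    old = region.virtual_protect(PAGE_EXECUTE_READWRITE)  # first VirtualProtect
    region.memcpy(0, stub)                                # overwrite the prologue
    region.virtual_protect(PAGE_EXECUTE_READ)             # second VirtualProtect
    return old


# Placeholder contents: 16 filler bytes patched with a 12-byte dummy stub.
target = ToyRegion(b"\x90" * 16, PAGE_EXECUTE_READ)
stub = b"\xCC" * 12
previous = patch_function(target, stub)
print(hex(previous), hex(target.protection), target.data[:12] == stub)
```

After the sequence completes, the region is back to read and execute only, so a later inspection of the protection flags alone would not reveal that the code was modified.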
Since the malware is essentially doing the same thing as before, we won’t go into as much
detail, but it would be good to know what function is getting altered this time. If we quickly
advance forward to the next function call to get_dll_func, we can see that the malware’s
next target function is AmsiScanString, which scans a string for malware:
Figure 3.6.2: The return value of the second call to get_dll_func is AmsiScanString
Figure 3.6.3: The instructions of AmsiScanString after they are overwritten by the dummy code
We’ve finally reached the end of sub_108106, which we’ve identified as a function that
effectively disables AmsiScanBuffer and AmsiScanString.
Disabling EtwEventWrite
Let’s step out of sub_108106 and label it as disable_amsi, and then step into the next
function, sub_108201:
If we look at the decompiled code of sub_108201, we can see that it’s pretty similar to
disable_amsi in that it calls get_dll_base_addr, get_dll_func, and memcpy, so our
intuition tells us that this function may be similar to disable_amsi:
Let’s replace the variable names that IDA Pro initially provided with more appropriate names:
Figure 3.7.5: The decompiled code after labeling the variables containing the base address of
ntdll.dll and the function address of EtwEventWrite
We encounter a function that gets called via pointer arithmetic, which resolves to
VirtualProtect (Figure 3.7.6):
Figure 3.7.6
The arguments are pretty similar to the arguments used in the previous two calls to
VirtualProtect in disable_amsi, except this time, we are setting 4 bytes of
EtwEventWrite to RWX:
Figure 3.7.7: The arguments of VirtualProtect
Figure 3.7.8
If we compare the contents of EtwEventWrite before and after the call to memcpy like we did
for AmsiScanBuffer and AmsiScanString earlier, we can see that the instructions have
been altered so that EtwEventWrite just returns right away. This prevents ETW events from
being written, making it harder for security tools to monitor the activity of this process.
Figure 3.7.9
Figure 3.7.11: The decompiled code after labeling the function disable_etw_event_write
Summary
To recap, the malware disabled several important Windows API functions critical for security by
doing the following:
We also observed the malware employing PC-relative addressing to locate resources such as
the dummy code used to overwrite AmsiScanBuffer, AmsiScanString, and
EtwEventWrite.
Figure 4.1.3
We’ll label the returned value in IDA Pro as alloced_mem_2 for now:
Figure 4.1.4: The decompiled code after renaming the variable that points to the allocated memory
Figure 4.1.5: The decompiled code after renaming the pointer copy
The next function that gets called is memcpy (Figure 4.1.6):
Figure 4.1.6
Figure 4.1.7
Figure 4.1.9
The purpose of these bytes is not immediately clear, so we’ll step forward until the next function
call, sub_10A795 (Figure 4.1.10):
Figure 4.1.10
Figure 4.1.11
Figure 4.1.12
Figure 4.1.14
To an experienced researcher or anyone who regularly works with PE files, the bytes ‘4D 5A’ or
the string ‘MZ’ may seem familiar; these are the magic bytes of PE files.
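When a memory dump has been saved to disk, the same check can be scripted. The Python sketch below looks for the ‘MZ’ magic and, as an extra sanity check that goes beyond what we did in the debugger, follows the e_lfanew field at offset 0x3C to confirm the ‘PE’ signature. The helper name looks_like_pe and the synthetic buffer are our own.

```python
import struct


def looks_like_pe(buf):
    """Return True if buf starts with 'MZ' and e_lfanew points at the PE signature."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    # e_lfanew (offset 0x3C of the DOS header) holds the offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Synthetic buffer for demonstration; not taken from the sample.
header = bytearray(0x100)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(header)))
```

Checking the PE signature as well as the magic bytes reduces false positives, since the two bytes ‘MZ’ can easily occur in unrelated data.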
It’s very suspicious that memory has been allocated and filled with the bytes of a PE file, as it’s
highly likely that it will be executed somehow. We’ll want to take note of this memory address to
see how it’s used later:
The presence of a series of paired cmp and jump instructions, all using the same operand in the
cmp instructions, suggests that this is a switch block. It appears that we have reached this part
of the decompiled code:
Figure 4.2.2: The corresponding decompiled code in IDA Pro
If we step a few instructions into the switch block, we end up at the function inside case 1 and 2:
Before we analyze sub_108D67, we’d like to point out that we won’t attempt to determine the
switch condition for now. We figured that the switch condition is determined by some
configuration setting that resides in this payload. Also, we risk getting sucked into a rabbit hole if
we were to delve into the functions that are inside each of the other cases. We can always come
back to determine what exactly the switch condition is later if the need arises.
Having said that, let’s examine the arguments of sub_108D67 (Figure 4.2.5):
Figure 4.2.5
Figure 4.2.7
Figure 4.2.8
If we keep an eye on the contents of the memory addresses that are passed into sub_108D67,
we notice that only the contents of zeroed_out_mem change:
Figure 4.2.10
Figure 4.2.11
Based on these results, we haven’t really gleaned much information about sub_108D67. Also,
when we step over sub_108D67, we see a flurry of activity in the lower left-hand corner of
x64dbg, and the function execution takes slightly longer than the execution of the other
functions we’ve stepped over. The combination of not knowing what this function does and the
slightly longer execution time compels us to step into sub_108D67 to investigate what it does.
Figure 4.3.2
Figure 4.3.3
CLRCreateInstance is part of the Hosting API, which “enables unmanaged hosts to integrate
the common language runtime (CLR) into their applications.” In other words, it allows a program
written in C or C++ to run a CLR in order to use .NET features and run .NET code.
Why might malware want to do this? It may add flexibility to the infection chain by allowing
malicious actors to easily switch out payloads. Another reason is evasion; different antivirus
strategies are designed to handle different filetypes, so hopping between .NET code and native
code can confuse security products.
Although we do have the values that are passed into CLRCreateInstance, the
documentation doesn’t show what class/interface those values correspond to. We’ll have to do
some detective work.
A quick Google search of “CLSID” yields this Microsoft documentation, which states that “each
COM class is identified by a CLSID, a unique 128-bit GUID” (see this resource to learn about
the Component Object Model (COM)).
Let’s examine the contents of our first two arguments, 0x06710850 and 0x06710860 (Figure
4.3.4):
Figure 4.3.4
Each of them contains 128 bits of data. How exactly are these 128 bits decoded to a GUID?
Microsoft’s documentation states that:
“The order of the beginning four-byte group and the next two two-byte groups is reversed,
whereas the order of the last two-byte group and the closing six-byte group is the same”
So in our case:
● CLSID:
○ Raw: 8D 18 80 92 8E 0E 67 48 B3 0C 7F A8 38 84 E8 DE
○ String: 9280188D-0E8E-4867-B30C-7FA83884E8DE
● RIID:
○ Raw: 9E DB 32 D3 B3 B9 25 41 82 07 A1 48 84 F5 32 16
○ String: D332DB9E-B9B3-4125-8207-A14884F53216
Note that x64dbg can easily convert a series of bytes into a GUID:
1. Highlight bytes and right click
2. “Binary” > “Edit”
3. Click on the “Copy Data” tab
4. In the menu on the left-hand side, click “GUID”
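The conversion can also be scripted outside the debugger. As a minimal sketch, Python’s standard uuid module accepts the raw bytes through its bytes_le parameter, which applies the same mixed byte order described in the quote above; the helper name raw_to_guid is our own.

```python
import uuid


def raw_to_guid(hex_bytes):
    """Convert 16 raw bytes (given as a hex string) into a GUID string."""
    raw = bytes.fromhex(hex_bytes)
    # bytes_le treats the first three groups as little-endian and leaves
    # the last two groups in the order they appear in memory.
    return str(uuid.UUID(bytes_le=raw)).upper()


clsid = raw_to_guid("8D 18 80 92 8E 0E 67 48 B3 0C 7F A8 38 84 E8 DE")
riid = raw_to_guid("9E DB 32 D3 B3 B9 25 41 82 07 A1 48 84 F5 32 16")
print(clsid)  # 9280188D-0E8E-4867-B30C-7FA83884E8DE
print(riid)   # D332DB9E-B9B3-4125-8207-A14884F53216
```

This is handy when many GUIDs need to be decoded from a dump, since the output can be searched directly in the SDK header files.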
While Googling for these GUIDs does yield some code snippets where those GUIDs are fed into
CLRCreateInstance, it doesn’t really provide much information on specifically which of the
classes or interfaces these GUIDs map to, and there isn’t any official documentation from
Microsoft in the results. However, at the bottom of the CLRCreateInstance documentation
page, we see that these structures are defined in metahost.h, which can be obtained by
installing the .NET framework developer pack.
Header files, such as those found in the .NET framework developer pack or the Windows SDK,
are a great complement (or in this case, alternative) to online API documentation; they contain
information such as constants, data types, function declarations (but not implementation
details), and offsets of important data structures, thereby giving semantic meaning to the bytes
that we see during analysis.
The filepath for metahost.h, in our case, is:
"C:\Program Files (x86)\Windows
Kits\NETFXSDK\4.8.1\Include\um\metahost.h"
If we search the contents of metahost.h for some of the identifiers from the
CLRCreateInstance documentation, we find some lines that resemble the GUIDs that we
had put together earlier (Figure 4.3.7):
Figure 4.3.7
At this point, we can conclude that the malware is attempting to create an ICLRMetaHost
interface.
When we step over the call to CLRCreateInstance and watch the contents of
zeroed_out_mem, we can see that CLRCreateInstance saves a pointer to the interface
that it creates:
Figure 4.4.1: The decompiled code after commenting the line where CLRCreateInstance is called
We can expect this pointer to be used to call the ICLRMetaHost interface’s functions.
However, the problem is that IDA is not aware that the type of the structure that is getting
pointed to is ICLRMetaHost; we only know that information because we obtained it through
the debugger. Because IDA has no knowledge of the type, it won’t be able to resolve any of the
interface’s members or functions; they will be shown only as offsets relative to a3 (Figure 4.4.2):
Figure 4.4.2
This obviously makes it difficult to read the decompiled code, but luckily, IDA has a feature that
allows us to define custom structures (here is a helpful primer on the subject). Once we convert
a3 to a pointer to our custom structure, IDA will know how to resolve the offsets as members
and functions. We’ll start off with creating a custom structure with just one member, a pointer to
the ICLRMetaHost interface, for now. As we run across more functions and determine the
types that are returned by these functions, we will incrementally update our custom structure.
This will help IDA automatically resolve functions and members for us in the decompiled code
as we continue analysis.
We’ll first need to load the type library containing the definitions of structures like
ICLRMetaHost, which can be found here (the type library was generated using the instructions
here if you’d like more information on the process).
First, we copy the type library to C:\Program Files\Hex-Rays IDA Pro 7.7\til\pc.
We then navigate to the “Type Libraries” tab in IDA > right click the window pane > “Load type
library” (Figure 4.4.3):
Figure 4.4.3
Figure 4.4.4
Now that we’ve loaded the type library, we can begin creating our custom structure by
navigating to the “Structures” > right click window pane > “Add struct type…” (Figure 4.4.5):
Figure 4.4.5
We’ll just keep the default name struc_1 for now since we’re not certain of the structure’s
purpose or semantic meaning. We can always rename it later when we’ve determined that
information.
Figure 4.4.6: Creating a custom structure called struc_1
We can add a new member to this structure by right clicking the window pane > “Data” (Figure
4.4.7):
Figure 4.4.7
We can then specify the type of the new member by right clicking it > “Set Type…” (Figure
4.4.8):
Figure 4.4.8
Since we know that the first member is a pointer to the new ICLRMetaHost interface, we can
set the type accordingly (Figure 4.4.9):
Figure 4.4.9
We have now added a new member to our custom structure (Figure 4.4.10):
Figure 4.4.10
Lastly, we’ll convert the type of a3 by right clicking it > “Convert to struct *...” (Figure 4.4.11):
Figure 4.4.11
After the type change (notice on line 18 that the function a3 + 12 has automatically been
resolved as a3->iclr_metahost->GetRuntime) (Figure 4.4.14):
Figure 4.4.14
Figure 4.5.1
Figure 4.5.2
0x0581E750 doesn’t seem familiar, so we’ll do a before and after comparison of the memory
dump:
Figure 4.5.3
Figure 4.5.4
It appears that the first 20 bytes are filled with the version string, and the string has been
converted to a wide character string. Our guess is that sub_10A2FF probably uses one of the
function pointers stored somewhere in alloced_mem in order to copy the version string,
convert it to a wide character string, and store the new string in 0x0581E750. If we step into
sub_10A2FF, we see that it calls another function dynamically:
x64dbg helpfully resolves this to MultiByteToWideChar, which is consistent with what we’ve
observed:
Figure 4.5.6: x64dbg resolves the function inside sub_10A2FF as MultiByteToWideChar
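For plain ASCII input, the effect of this conversion can be reproduced in Python by re-encoding the string as UTF-16LE. This is only an approximation of MultiByteToWideChar, which also depends on the code page passed to it, and the version string below is an assumed example of a 10-character runtime version rather than a value read from the sample.

```python
def to_wide(ansi_bytes):
    """Approximate MultiByteToWideChar for ASCII input (no null terminator)."""
    return ansi_bytes.decode("ascii").encode("utf-16-le")


version = b"v4.0.30319"  # assumed example; any 10-character ASCII string behaves the same
wide = to_wide(version)
print(len(version), len(wide))
print(wide[:8].hex(" "))
```

Each ASCII character becomes two bytes, the original byte followed by 00, which is why a 10-character string occupies 20 bytes in the dump.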
Figure 4.5.7
ICLRMetaHost::GetRuntime
Let’s move on to the next function, ICLRMetaHost::GetRuntime (Figure 4.6.1):
Figure 4.6.1
Because we specified the structure of a3 and set the type of its first member as a pointer to
ICLRMetaHost, IDA Pro is able to resolve this function as GetRuntime. However, x64dbg
can’t quite determine exactly what function is being called and can only provide a relative
address (Figure 4.6.2):
Figure 4.6.2
If we try to resolve the function using Ctrl+G and enter the expression, it shows which DLL the
function is from but not the name of the function (Figure 4.6.3):
Figure 4.6.3
To remedy this, we’ll download the symbols for this module (“Symbols” tab > right click
“mscoreei.dll” > “Download Symbols for This Module”) (Figure 4.6.4):
Figure 4.6.4
Figure 4.6.5
According to Microsoft’s documentation for ICLRMetaHost::GetRuntime, the arguments are
as follows:
● pwzVersion: “The .NET Framework compilation version stored in the metadata, in the
format ‘vA.B[.X]’. A, B, and X are decimal numbers that correspond to the major version,
the minor version, and the build number.”
● riid: “The identifier for the desired interface. Currently, the only valid value for this
parameter is IID_ICLRRuntimeInfo.”
● ppRuntime: “A pointer to the ICLRRuntimeInfo interface that corresponds to the
requested runtime.”
Figure 4.6.6
Notice that the top of the stack is not a version string as we might expect from the
documentation, but is instead a pointer to the instance of ICLRMetaHost. This is because the
‘this’ pointer is passed implicitly to non-static member functions by the C++ compiler. This
won’t be seen in the source code or IDA’s pseudocode view, but can be observed in the
disassembly.
Excluding the pointer to the ICLRMetaHost interface, the first argument is the wide character
version string. As stated in the documentation, the second argument is another GUID. However,
there isn’t a need to decode or look up the GUID since IID_ICLRRuntimeInfo is currently
the only valid value. As for the last argument, let’s keep an eye on 0x0581E988 so that we can
get the address to the newly created ICLRRuntimeInfo interface.
Figure 4.6.7
Figure 4.6.9
We’ll also update the custom structure with a pointer to the ICLRRuntimeInfo interface
(Figure 4.6.10):
Figure 4.6.10
Figure 4.6.12
Figure 4.7.1
IDA is having some issues with resolving the function name because of these off_28 labels
(Figure 4.7.2):
Figure 4.7.2
We also notice that IDA has set the functions with the __noreturn keyword (Figure 4.7.3).
Figure 4.7.3
However, if we were to step over those functions in the debugger (not shown), we’d find that
these functions do in fact return, so these labels must be incorrect.
The functions that have been called thus far were loaded into memory dynamically, and should
therefore not be present in the executable. However, when we double click off_28, IDA takes
us to a section within the executable:
Figure 4.7.4: The disassembly view of off_28
Notice that IDA has labeled the bytes as code. Also notice that the function starting at sub_36
doesn’t have a ret instruction, which explains why IDA labeled those functions from earlier as
__noreturn.
The IDA auto-analyzer did not pick up that the function continues beyond the jmp instruction.
We can modify the function boundaries in IDA by undefining the bytes that IDA mistakenly
labeled as code, either by right clicking the instruction > “Undefine”, or pressing the “U” key:
Figure 4.7.5: Undefining the code at off_28
The same section of the executable after labeling the code as data instead of code:
Figure 4.7.6: The disassembly view of off_28 after undefining the code
Next we’ll go back to the decompiled code for the function we were looking at and undefine the
instructions right after the call instruction.
A side-by-side comparison of the decompiled code on the left, and the corresponding original
assembly on the right (Figure 4.7.7):
Figure 4.7.7
After undefining the data that corresponds with the function call (Figure 4.7.8):
Figure 4.7.8
After re-defining the data as code in order to force IDA to re-analyze and decompile the
instructions (Figure 4.7.9):
Figure 4.7.9
We still need to fix the offset by right clicking call ds:off_28[ecx] and selecting dword
ptr [ecx+28h] so that it matches the corresponding instruction in x64dbg (Figure 4.7.10):
Figure 4.7.10
Figure 4.7.11: The decompiled code and disassembly after fixing the function address offset
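The offsets that IDA resolves here are simply indexes into a COM virtual function table: every interface starts with the three IUnknown methods, and in a 32-bit process each slot is 4 bytes wide. The Python sketch below maps an offset back to a method name. The method order is our reading of the interface declarations in metahost.h and should be checked against your own copy of the header; the helper name method_at is our own.

```python
POINTER_SIZE = 4  # each vtable slot is one pointer wide in a 32-bit process

IUNKNOWN = ["QueryInterface", "AddRef", "Release"]

# Method order as we understand it from metahost.h (verify against your SDK).
ICLR_META_HOST = IUNKNOWN + [
    "GetRuntime", "GetVersionFromFile", "EnumerateInstalledRuntimes",
    "EnumerateLoadedRuntimes", "RequestRuntimeLoadedNotification",
    "QueryLegacyV2RuntimeBinding", "ExitProcess",
]
ICLR_RUNTIME_INFO = IUNKNOWN + [
    "GetVersionString", "GetRuntimeDirectory", "IsLoaded", "LoadErrorString",
    "LoadLibrary", "GetProcAddress", "GetInterface", "IsLoadable",
    "SetDefaultStartupFlags", "GetDefaultStartupFlags",
    "BindAsLegacyV2Runtime", "IsStarted",
]


def method_at(vtable, offset):
    """Return the method name stored at the given byte offset in a vtable."""
    index, remainder = divmod(offset, POINTER_SIZE)
    if remainder:
        raise ValueError("offset is not aligned to a vtable slot")
    return vtable[index]


print(method_at(ICLR_META_HOST, 12))       # GetRuntime
print(method_at(ICLR_RUNTIME_INFO, 0x28))  # IsLoadable
```

This also gives a way to cross-check IDA’s output: if the offset in the disassembly and the resolved method name do not agree with the header, the type assigned to the pointer is probably wrong.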
CLRRuntimeInfoImpl::IsLoadable
We resume our analysis at IsLoadable (Figure 4.8.1):
Figure 4.8.1
Figure 4.8.2
Here are the arguments that are passed into CLRRuntimeInfoImpl::IsLoadable (Figure
4.8.3):
Figure 4.8.3
Figure 4.8.4
Figure 4.8.5
Apparently, pbLoadable is set to 0x1, or true, meaning that the runtime is loadable in the
current process.
CLRRuntimeInfoImpl::GetInterface
The next function has also been automatically resolved for us as GetInterface (Figure
4.9.1):
Figure 4.9.1
“Loads the CLR into the current process and returns runtime interface pointers, such as
ICLRRuntimeHost, ICLRStrongName, and IMetaDataDispenserEx.”
and takes as arguments:
● rclsid: “The CLSID interface for the coclass.”
● riid: “The IID of the requested rclsid interface”
● ppUnk: “A pointer to the queried interface.”
Figure 4.9.3
Excluding the ICLRRuntimeInfo pointer, the next two arguments are identifiers, as stated by
the documentation. Let’s check what class and interface corresponds to the identifiers:
We were able to find these GUIDs in mscoree.h, which is included in the .NET framework
developer pack (Figure 4.9.5):
Figure 4.9.5
CB2F6723-AB3A-11D2-9C40-00C04FA30A3E: CLSID_CorRuntimeHost
CB2F6722-AB3A-11D2-9C40-00C04FA30A3E: IID_ICorRuntimeHost
It appears that the malware is attempting to get a pointer to the ICorRuntimeHost interface.
Let’s step over the function call and keep an eye on 0x0581E98C where the pointer of the new
interface will be stored.
Figure 4.9.6
Figure 4.9.7
Figure 4.9.8
Figure 4.9.11
ICorRuntimeHost::Start
We’ll step forward until the next function, which also has the off_28 label (Figure 4.10.1):
Figure 4.10.1
We’ll need to correct the function pointer here as well by changing the call
ds:off_28[ecx] instruction to call dword ptr [ecx+28h]:
A side-by-side comparison of the original assembly on the right, and the corresponding
decompiled code on the left (Figure 4.10.2):
Figure 4.10.2
After changing the instruction and decompiling, the function now gets resolved as Start
(Figure 4.10.3):
Figure 4.10.3
While IDA correctly resolves the function, x64dbg is having some trouble (Figure 4.10.4):
Figure 4.10.4
As before, we can help out x64dbg by downloading the symbols for clr.dll (Figure 4.10.5):
Figure 4.10.5
Now, x64dbg resolves the function as CorHost::Start() (Figure 4.10.6):
Figure 4.10.6
According to the documentation, this method is what actually starts the .NET runtime.
Summary
We’ve spent quite a while stepping past functions and examining their arguments and return
values. Let’s take a moment to recap and list those functions here:
● CLRCreateInstance
● ICLRMetaHost::GetRuntime
● CLRRuntimeInfoImpl::IsLoadable
● CLRRuntimeInfoImpl::GetInterface
● ICorRuntimeHost::Start
We’ll take this opportunity to step back and ask ourselves what the malware is doing here.
Searching for the functions we encountered above yields many articles written about running
.NET assemblies from C/C++ code, or running managed executables inside unmanaged/native
executables. Recall in our earlier discussion regarding unmanaged vs. managed code that
managed code must be run inside a CLR. It looks like the malware uses a series of API
functions to create a CLR, most likely to prepare to run the PE file that was copied into
memory earlier.
Figure 5.1.2
Recall that multi_byte_to_wide_char converts the second argument into a wide char
string and copies it into the address that’s passed in as the third argument:
Figure 5.1.4
0x0581E750 after multi_byte_to_wide_char is called (Figure 5.1.5):
Figure 5.1.5
Let’s step forward until we reach the next function, which x64dbg resolves as
SysAllocString (Figure 5.1.6):
Figure 5.1.6
Figure 5.1.7
According to the documentation, SysAllocString simply “allocates a new string and copies
the passed string into it”. After stepping over the function call, a pointer to the newly allocated
string is returned:
We’ll label this buffer in x64dbg and IDA Pro (Figure 5.1.9):
Figure 5.1.9
The decompiled code after commenting the line where SysAllocString is called (Figure
5.1.10):
Figure 5.1.10
CorHost::CreateDomain
Figure 5.2.2
Returning to the arguments, “3MHR976M” will be the friendly name of the domain, and
0x0581E990 will contain a pointer to the newly created domain. As usual, we’ll confirm with a
before and after:
Figure 5.2.3
Figure 5.2.4
The pointer to the new domain is 0x032C000C, which we’ll label as app_domain (Figure
5.2.5):
Figure 5.2.5
The decompiled code after adding the app_domain member and decompiling (Figure 5.2.8):
Figure 5.2.8
We’ll advance forward until we hit the next function call, which x64dbg resolves as
SysFreeString, which simply frees the friendly domain name string that was created earlier
(Figure 5.2.9):
Figure 5.2.9
The decompiled code after commenting the line where SysFreeString is called (Figure
5.2.10):
Figure 5.2.10
Since the purpose of this call is fairly straightforward, we’ll skip our usual argument analysis and
move on to the next function call, which has been resolved as Unknown_QueryInterface
(Figure 5.2.11):
Figure 5.2.11
Figure 5.2.13
● Raw bytes: DC 96 F6 05 29 2B 63 36 AD 8B C4 38 9C F2 A7 13
● GUID: 05F696DC-2B29-3663-AD8B-C4389CF2A713
The GUID isn’t found in mscoree.h or any of the other header files in the Windows SDK and
.NET developer kit, so we turn to Google. Googling the GUID yields this page, which indicates
that a pointer to the _AppDomain interface is created. This explains why we were not able to
find the GUID in the Windows SDK header files; _AppDomain is a .NET class defined in the
.NET assembly mscorlib.dll.
The fact that a pointer to an _AppDomain interface is requested here is strange, as
CreateDomain also yielded a pointer of type _AppDomain earlier, so this call appears
redundant. We won’t investigate this for now, but we may return to it later if the need arises.
Figure 5.2.14
Figure 5.2.15
We’ll add a new member, app_domain_interface (to differentiate it from the app_domain
member that we added to the structure earlier), to our custom structure (Figure 5.2.16):
Figure 5.2.16
Figure 5.2.17
Figure 5.2.18: The decompiled code after adding app_domain_interface to struc_1 and
decompiling
SafeArrayCreate
We’ll move on to the next function:
Figure 5.3.2
We’re not too interested in the base type of the array at the moment; we may look it up later if
needed. According to the documentation, our array will have one dimension, and if we examine
the contents of 0x0581E748, we see that there will be 0x00110E00 items in the array:
SafeArrayCreate returns “a safe array descriptor, or null if the array could not be created.” In
our case, our descriptor is 0x015B39A0:
Figure 5.3.6
Figure 5.3.7: The decompiled code after renaming the variable containing the address of the SAFEARRAY
Copying the PE file into the SAFEARRAY
When we step forward, we notice that we’re caught in a loop (Figure 5.4.1):
Figure 5.4.1
Figure 5.4.2
Figure 5.4.3
It’s the executable that was written to memory earlier.
Figure 5.4.5
and according to the instructions, each time we loop, we increment ECX and compare it with
whatever is in ebp+524. If we examine the contents of ebp+524, we see a familiar number
(Figure 5.4.6):
Figure 5.4.6
This happens to be the element count from the rgsabound parameter that was passed into the
SafeArrayCreate function earlier.
In summary, this loop copies the executable one byte at a time from the source memory region
into the SAFEARRAY that was just created. This might be puzzling if you’re not familiar with the
SAFEARRAY type; 0x015B39A0 is the pointer to the SAFEARRAY, but the destination of the
move operation from our loop (0x061FC020) is nowhere near that memory address.
Let’s take a moment to read the documentation of the SAFEARRAY type to get an idea of its
structure:
Figure 5.4.7: The members of SAFEARRAY (source)
We can see that a SAFEARRAY is not like a regular array, where the first element is at the very
beginning of the structure. If we read the definition of each of the fields, the first element of the
array is actually pointed to by the pvData field in the SAFEARRAY structure. Let’s confirm by
checking out our SAFEARRAY structure at 0x015B39A0:
USHORT is 2 bytes, and ULONG is 4 bytes on Windows regardless of the executable type. In this
case, the executable type is 32-bit, so pointers are also 4 bytes and no alignment padding is
needed. In total, we expect there to be 12 bytes (2 + 2 + 4 + 4) before the pvData field.
Keeping this in mind, pvData is 0x061FC020.
Let’s follow that address in the dump view:
Figure 5.4.9: Memory dump of 0x061FC020 (pvData) before the for loop
If we set a breakpoint on the instruction right after the end of the loop and hit “Run”, we see that
the memory pointed to by pvData is filled in with the executable:
Figure 5.4.10: Memory dump of 0x061FC020 (pvData) after the for loop
As expected, the SAFEARRAY gets filled with the bytes of the executable.
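The field offsets can be double-checked by parsing the descriptor with Python’s struct module. This sketch assumes the 32-bit layout of a one-dimensional SAFEARRAY; the descriptor bytes are synthetic, built from the pvData and element count seen above, while fFeatures, cbElements, cLocks, and lLbound are placeholder values rather than values read from the sample.

```python
import struct

# 32-bit layout of a one-dimensional SAFEARRAY:
#   cDims, fFeatures      : USHORT (2 bytes each)
#   cbElements, cLocks    : ULONG  (4 bytes each)
#   pvData                : pointer (4 bytes in a 32-bit process)
#   rgsabound[0]          : cElements (ULONG), lLbound (LONG)
SAFEARRAY_32 = struct.Struct("<HHIIIIi")
FIELD_NAMES = ("cDims", "fFeatures", "cbElements", "cLocks",
               "pvData", "cElements", "lLbound")


def parse_safearray(raw):
    """Unpack a 32-bit, one-dimensional SAFEARRAY descriptor into a dict."""
    return dict(zip(FIELD_NAMES, SAFEARRAY_32.unpack_from(raw)))


# Synthetic descriptor; only pvData and cElements mirror the values in the text.
raw = SAFEARRAY_32.pack(1, 0, 1, 0, 0x061FC020, 0x00110E00, 0)
descriptor = parse_safearray(raw)

pv_data_offset = struct.calcsize("<HHII")  # bytes that precede pvData
print(pv_data_offset, hex(descriptor["pvData"]), hex(descriptor["cElements"]))
```

The computed offset of 12 matches the manual count, and the same format string can be reused to parse a descriptor copied out of the debugger’s dump view.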
Figure 5.5.1
Load_3 is the native implementation of AppDomain.Load, the documentation for which states
that this function loads an assembly into the domain.
If we hover over Load_3, IDA displays a pop-up window containing the function signature
(Figure 5.5.2):
Figure 5.5.2
According to this, the last argument is a pointer to a pointer to the mscorlib::_Assembly
interface. Let’s add another member to struc_1:
The custom structure after adding the assembly member (Figure 5.5.4):
Figure 5.5.4
Figure 5.5.5
Figure 5.5.7
Figure 5.5.8
Figure 5.6.2
It looks like we’re moving the contents of dl into ecx+ebp+528 (0x05EB0528) and eax+ecx
(0x061FC020). Remember that those addresses were the source and destination respectively
of the move operation that copied the executable, and that dl is the least significant byte of the
EDX register, which is 0 at the moment:
So this loop zeroes out the memory at both the source and destination addresses that contain
the executable. We can confirm this by setting a breakpoint after the last instruction of the loop,
continuing execution until we break, and checking the contents of both addresses:
Figure 5.6.5
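In higher-level terms, the loop behaves roughly like the Python sketch below. The buffers and their length are stand-ins (the real loop operates on the two memory regions discussed above, and we assume it runs over the full length of the executable); the register names in the comments refer to the two mov instructions.

```python
def wipe(source, destination, length):
    """Zero both buffers one byte per iteration, mirroring the loop above."""
    dl = 0  # low byte of EDX, which is 0 at this point
    for ecx in range(length):
        source[ecx] = dl        # mov [ecx+ebp+528], dl
        destination[ecx] = dl   # mov [eax+ecx], dl


# Small stand-in buffers; the real regions hold the full executable.
src = bytearray(b"MZ\x90\x00\x03\x00\x00\x00")
dst = bytearray(src)
wipe(src, dst, len(src))
print(src.hex(), dst.hex())
```

Because both copies are cleared in the same pass, neither the original buffer nor the SAFEARRAY data can be dumped after this point.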
SafeArrayDestroy
The next function is another dynamically called function (Figure 5.7.1):
Figure 5.7.1
Figure 5.7.3
Once we’ve stepped over this call to SafeArrayDestroy, every trace of the executable is
wiped from memory.
Figure 5.7.4
Summary
In this section, the malware has created a domain, loaded an assembly into the domain, and
removed all traces of the original PE file.
Part 6: Running the .NET assembly in the CLR
We finally return from sub_108D67. We’ll rename it to create_clr_and_load_assembly,
and step into the next function, sub_109346 (Figure 6.1.1):
Figure 6.1.1
Creating strings
Notice that the pointer to zeroed_out_mem, which we had converted into a pointer to our
custom structure in Part 4, is provided as an argument in both
create_clr_and_load_assembly and sub_109346. We’ll need to update the type of the
local variable again so that all functions and member names are resolved correctly (right-click
‘a3’ > Click ‘Convert to struct *...’ > Select ‘struc_1’ > Click ‘OK’):
Before changing the type of the variable in IDA Pro (Figure 6.1.2):
Figure 6.1.2
After changing the type of the variable in IDA Pro (Figure 6.1.3):
Figure 6.1.3
If we step forward in the debugger, the control flow takes us all the way to line 118, which is
another call to multi_byte_to_wide_char (Figure 6.1.4):
Figure 6.1.4
Figure 6.1.5
Figure 6.1.6
The next function is dynamically called, so IDA can’t resolve it (Figure 6.1.8):
Figure 6.1.8
Its only argument is the wide character string that was created earlier (Figure 6.1.10):
Figure 6.1.10
SysAllocString returns the following address that contains our string (Figure 6.1.11):
Figure 6.1.11
Figure 6.1.12: The decompiled code after labeling the variable containing the new string
Figure 6.1.14
This time, it copies the string “cJNe8Pbsx” into the address 0x0581E74C:
Figure 6.1.16
Argument of SysAllocString (Figure 6.1.17):
Figure 6.1.17
Figure 6.1.18
Figure 6.1.19: The decompiled code after labeling the variable containing the second new string
GetType_2
The next function we encounter is GetType_2. This is the COM-visible counterpart of
Assembly.GetType, which native code reaches through the _Assembly interface. It returns an
object that represents the type, which, according to this documentation, includes “class types,
interface types, array types, value types, enumeration types, type parameters, generic type
definitions, and open or closed constructed generic types”:
Figure 6.2.1: The next function, GetType_2
If we hover over GetType_2, IDA shows us its function signature (Figure 6.2.2):
Figure 6.2.2
Knowing this, we’ll add a new member of type _Type * to struc_1 (Figure 6.2.3):
Figure 6.2.3
The decompiled code after adding type and decompiling (Figure 6.2.5):
Figure 6.2.5
Figure 6.2.6
Figure 6.2.8
InvokeMember_3
The next function call we encounter is InvokeMember_3 (Figure 6.3.1):
Figure 6.3.1
Summary
In this section, the malware specified the namespace, class, and method to execute in the CLR
that was created in the previous section.
When analyzing malware, it’s common practice to look up any API functions that are
encountered in order to determine whether they have been used before by malware. In the case
of this malware, a quick Google search for the API sequence used to create a .NET runtime
reveals an article by SonicWall detailing malware that uses native code to create a CLR
for executing a .NET assembly, though it doesn’t address the malware’s origins or family name.
The results of our search also included information about EDR evasion for the purpose of red
team activities. This resource mentions the tool donut, a shellcode generation tool whose
source code matches what we had observed in our own analysis of the sample.
When we first analyzed this sample, we found the donut source code after we analyzed the
code that created and started the CLR, and used the source code to guide our analysis after
that point. However, in this section, we decided to demonstrate how we would have proceeded if
we hadn't found the source code, as we felt that it would be educational to readers, and we may
not always be so fortunate to find the source code when analyzing samples.
Lastly, now that we’ve unveiled the identity of our sample and have access to the source code,
we’d like to take the opportunity to revisit the switch case block that we encountered earlier in
the main function. With the help of the source code, we were able to determine the switch
condition, which is the format of the next stage payload. The function inside each case is the run
function that corresponds to the file format of the next stage payload (Figure 6.4.1):
Figure 6.4.1: Series of if/else statements in the donut source code (source)
Switch block in the main control code in IDA Pro (Figure 6.4.2):
Figure 6.4.2
When we were conducting dynamic analysis of the sample in x64dbg, we landed in case 1,
which corresponds to DONUT_MODULE_NET_DLL.
Part 7: Process injection
In the previous section, we found that the malware invoked a function defined in the .NET DLL
that was loaded into the CLR. We’ll take a moment to statically analyze the .NET DLL that was
loaded into the runtime.
When we click the function .cctor(), we see that the decompiled code is heavily obfuscated.
The control flow is flattened: a switch statement dispatches each block of code, which hides the
real order in which instructions execute. The function names are meaningless, and the whole
program becomes a giant haystack. We must now turn our attention to dynamic analysis, since
static analysis doesn’t yield very much information.
In order to speed up analysis, we should focus on the broader sequence of events that follow
during sample execution. Process Hacker is one of our favorite tools to grasp how the malware
process interacts with other processes.
After capturing a snapshot of our VM, we continue the execution in the debugger without setting
any breakpoints. A new process named InstallUtil.exe pops up under the malware
process that we are debugging:
Figure 7.2.1: A new suspended process InstallUtil.exe is started, as shown in Process Hacker
This new process instantly catches our attention for two reasons: (1) InstallUtil.exe is
colored grey, which means that the process is suspended, and (2) InstallUtil.exe is a
legitimate .NET configuration tool that malware often chooses as a target for injecting
malicious payloads.
To test our hypothesis that the C# code is injecting its next stage payload into
InstallUtil.exe, we right click the process InstallUtil.exe > “Properties” > “Memory”
tab:
Figure 7.2.3: Viewing the process memory of InstallUtil.exe in Process Hacker (cont’d)
We restore our VM snapshot, and are back at the point before InvokeMember_3 is called.
Once it’s called, following the execution of the loaded .NET payload in x64dbg becomes
difficult. However, we do know that the malware creates another process, and must call
certain APIs to accomplish this. To latch onto the malware execution once more, we can set
breakpoints on some functions that are commonly used to create a new process and write to its
memory. Keep in mind that since our debugger is now running a .NET runtime, any breakpoint
hit could originate from the runtime rather than the sample itself.
Having said that, let’s set breakpoints on the following functions (“Symbols” tab > type the
function name in the search bar on the right-hand panel > right click the function name > click
“Toggle Breakpoint”):
● kernelbase.CreateProcessInternalW
● kernelbase.VirtualAllocEx
● kernelbase.VirtualProtectEx
● kernelbase.WriteProcessMemory
Note that when searching for these API functions, x64dbg will often list the kernel32.dll
export first. On modern versions of Windows, the kernel32.dll versions of these functions
are mostly thin stubs that forward to kernelbase.dll, so code that calls kernelbase.dll
directly would never trigger a breakpoint set in kernel32.dll. Therefore, we suggest putting
breakpoints inside kernelbase.dll to ensure we hit all the breakpoints as expected.
We used x64dbg for this tutorial because it has a beginner-friendly UI and is well suited for
analyzing user-mode executables, but we should note that WinDbg offers some advantages. In
particular, it can break on the creation of child processes without having to manually set
breakpoints on APIs like CreateProcessInternalW, and can debug child processes.
WinDbg is also capable of debugging kernel-mode components and rootkits.
After setting our breakpoints, let’s resume execution by clicking “Run”. The first breakpoint we
hit is CreateProcessInternalW.
Another interesting argument passed into the API call is the 7th, which holds the process
creation flags. The value 0x08000004 can be interpreted as “CREATE_NO_WINDOW
(0x08000000) | CREATE_SUSPENDED (0x00000004)”.
The flag CREATE_SUSPENDED indicates that the primary thread of the new process is created
in a suspended state and does not run until it is resumed. Creating a suspended process is a
common technique in process injection, allowing injectors to prepare the payload while the
process remains paused.
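When we see a raw flag value like this in the debugger, it can be decoded by testing each bit against the documented process creation flags. The following is a minimal Python sketch of that idea. The table covers only a subset of the flags in the Microsoft documentation, and the function name decode_creation_flags is our own.

```python
# Subset of the documented process creation flags (dwCreationFlags).
CREATION_FLAGS = {
    0x00000001: "DEBUG_PROCESS",
    0x00000002: "DEBUG_ONLY_THIS_PROCESS",
    0x00000004: "CREATE_SUSPENDED",
    0x00000008: "DETACHED_PROCESS",
    0x00000010: "CREATE_NEW_CONSOLE",
    0x00000200: "CREATE_NEW_PROCESS_GROUP",
    0x00000400: "CREATE_UNICODE_ENVIRONMENT",
    0x00080000: "EXTENDED_STARTUPINFO_PRESENT",
    0x08000000: "CREATE_NO_WINDOW",
}


def decode_creation_flags(value):
    """Return the names of all known flags set in value, lowest bit first."""
    names = []
    remaining = value
    for bit, name in sorted(CREATION_FLAGS.items()):
        if value & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        # Bits that are not in our table are reported as raw hex.
        names.append(hex(remaining))
    return names


print(" | ".join(decode_creation_flags(0x08000004)))
```

For the value observed in this sample, the sketch reports CREATE_SUSPENDED and CREATE_NO_WINDOW, which matches the manual interpretation.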
We can actually watch the new process get spawned in real time by opening Process Hacker,
scrolling down to the x64dbg.exe process tree, then in x64dbg, clicking ‘Execute until return’:
Figure 7.3.4
Figure 7.3.5
Going back to x64dbg, let’s continue execution until we stop at the next breakpoint,
VirtualAllocEx, by hitting “Run” again.
Figure 7.4.1
The arguments of VirtualAllocEx are mostly the same as those of VirtualAlloc, except
that it takes an additional first argument, the handle to the target process. The second
argument is the requested base address of the newly allocated memory, 0x00400000. We can
actually view this memory in Process Hacker (open Process Hacker as administrator > right
click “InstallUtil.exe” > select “Properties” > click the “Memory” tab):
Figure 7.4.2: Viewing the process memory of InstallUtil.exe in Process Hacker
Figure 7.4.3: Viewing the process memory of InstallUtil.exe in Process Hacker (cont’d)
Figure 7.4.4
It appears that another executable is being written to memory. However, if we scroll down to the
end of the allocated memory, we notice that only the first 800 or so bytes are written. When we
hit “Run” in x64dbg again, we end up at another call to WriteProcessMemory, and after we
return, we can see that more data has been written. In total, there are nine calls to
WriteProcessMemory before the full executable is written to memory and we hit the
breakpoint of the next function, VirtualProtectEx. Why would the malware want to write the
executable in multiple chunks as opposed to writing everything in one go? One possible reason
is to ensure that antivirus software doesn't get the whole PE payload when it scans the memory
buffer of the target process.
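To illustrate why a partially written buffer is not yet a complete executable, here is a minimal Python sketch that replays a list of logged writes into a local buffer and checks for a PE header after each step. The function names and the two sample chunks are hypothetical and are not data recovered from the sample. Only the base address 0x00400000 comes from our debugging session.

```python
import struct

BASE = 0x00400000  # base address seen in the VirtualAllocEx call


def replay_writes(base, size, writes):
    """Rebuild a remote buffer from (address, data) pairs, in call order."""
    image = bytearray(size)
    for address, data in writes:
        offset = address - base
        if offset < 0 or offset + len(data) > size:
            raise ValueError(f"write at {address:#x} falls outside the buffer")
        image[offset:offset + len(data)] = data
    return bytes(image)


def looks_like_pe(buf):
    """Check for the 'MZ' magic and a 'PE\\0\\0' signature at e_lfanew."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Hypothetical chunks: a DOS header whose e_lfanew is 0x80, then the PE signature.
dos_header = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x80)
writes = [(BASE, dos_header), (BASE + 0x80, b"PE\x00\x00")]

for count in range(1, len(writes) + 1):
    snapshot = replay_writes(BASE, 0x200, writes[:count])
    print(count, looks_like_pe(snapshot))
```

In this toy example, the buffer only passes the header check once the second chunk has landed. The same reasoning would apply to a scanner that inspects the target's memory between writes.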
Figure 7.4.6
Recall from the Microsoft documentation for VirtualProtectEx that 0x20 corresponds to
PAGE_EXECUTE_READ. When the memory was initially allocated in the new process, its
protections were set to RWX. However, leaving memory with RWX protections is suspicious
and is likely to get flagged by antivirus software, so the protections are changed to R-X to
evade detection.
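As with the process creation flags, the protection value can be looked up programmatically rather than by hand. Below is a minimal Python sketch, assuming the memory protection constants from the Microsoft documentation. The function name describe_protection is our own.

```python
# Base protection values occupy the low byte; exactly one of them is set.
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

# Optional modifiers that can be OR-ed with a base protection value.
PAGE_MODIFIERS = {
    0x100: "PAGE_GUARD",
    0x200: "PAGE_NOCACHE",
    0x400: "PAGE_WRITECOMBINE",
}


def describe_protection(value):
    """Translate a protection constant into its symbolic name(s)."""
    base = PAGE_PROTECTIONS.get(value & 0xFF, hex(value & 0xFF))
    modifiers = [name for bit, name in sorted(PAGE_MODIFIERS.items()) if value & bit]
    return " | ".join([base] + modifiers)


print(describe_protection(0x40))  # protection at allocation time (RWX)
print(describe_protection(0x20))  # protection after VirtualProtectEx (R-X)
```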
Now that the process of writing the full executable to memory is complete, we can dump the
memory at 0x400000 so that we can examine the executable (click “Save…” in the Process
Hacker memory window). The SHA256 of the payload is
6b37f9bc3649f8adf3c282328a667ec050ddf8eab13ab027bf7e210b265273d8.
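For readers who want to confirm that their own dump matches ours, the hash can be computed with a few lines of Python. This is a minimal sketch that uses the standard hashlib module. The file name in the usage comment is a hypothetical placeholder for wherever the dump was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large dumps don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Example usage (hypothetical file name):
# print(sha256_of_file("InstallUtil_0x400000.bin"))
```

Keep in mind that a memory dump may contain trailing padding or differ in size from the file on disk, so a mismatch doesn't necessarily mean a different payload.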
Figure 7.4.7: The output of invoking the strings utility on the PE found in InstallUtil.exe’s process memory
The hard-to-miss ASCII art suggests that the payload may be Remcos, a remote access tool
(RAT) that has legitimate uses but is also commonly used by malicious actors. Having
discovered the malware family of the payload, we finally conclude our analysis.
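If the strings utility isn’t available in the analysis VM, its basic behavior is easy to approximate. The following is a minimal Python sketch that extracts runs of printable ASCII characters, as well as UTF-16LE ("wide") strings, from a buffer. The function names and the sample buffer are our own illustrations, and the sample buffer does not reproduce the actual contents of the payload.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def wide_strings(data, min_len=4):
    """Return runs of at least min_len printable characters stored as UTF-16LE."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]


# Illustrative buffer only, not bytes taken from the real payload.
sample = b"\x00\x01Remcos\x00ab\x00hello world\xff" + "wide".encode("utf-16-le")
print(ascii_strings(sample))
print(wide_strings(sample))
```

To run it against the dump, read the saved file in binary mode and pass its bytes to both functions.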
Summary
In this last section, we:
● Set breakpoints on API functions commonly used when creating and writing to
processes
● Used Process Hacker to follow the creation of a new process
● Dumped the executable from memory, examined the strings, and identified the final
payload as Remcos
Conclusion
In this blog, we demonstrated the application of malware analysis and reverse engineering
concepts while analyzing an infection chain that transitions between managed and unmanaged
code. To recap, the first stage downloads the next stage payload from the internet, decrypts
it into shellcode, and then uses .NET’s interop capabilities to invoke the shellcode. The second
stage, which turned out to be donut-generated shellcode, disables key AV functions, creates
and starts a CLR, loads a .NET assembly, and executes one of its methods, which
creates a process and injects the final payload, Remcos, into it.
The entire infection chain utilized a number of common techniques used by malware, such as
dynamic API resolution, in-memory patching, PC-relative addressing, and process injection. To
conduct our analysis, we:
● Used both static and dynamic analysis in tandem to get a complete picture of the
malware
● Referred to documentation and source code to aid analysis
● Identified COM interfaces using GUIDs
● Used the debugger to step through execution, set breakpoints, examine memory and
registers before and after function calls
● Leveraged IDA Pro’s type library and custom structure features to help with resolving
functions and structure members
● Identified the purpose of functions by examining the arguments and return value in the
debugger
● Watched the creation of a process and examined the contents of its memory using
Process Hacker
What made the infection chain especially interesting was that it made three transitions back and
forth between managed and unmanaged code. We observed the first transition in dnSpy, when
a payload from the internet was downloaded and invoked using .NET’s interop capabilities
(Transition #1 in Figure 8.1.1). We observed the second transition from unmanaged to managed
code in x64dbg and IDA Pro when the malware created a CLR, loaded an assembly, and
invoked a method from the assembly (Transition #2 in Figure 8.1.1). Finally, using Process
Hacker, we observed the third transition when the method invoked in the previous step created
a new process and injected native code into the process’s memory (Transition #3 in Figure 8.1.1).
Figure 8.1.1: The malware made three transitions between managed and unmanaged code
While none of the concepts and tools covered in this blog are new, and all have been
extensively discussed in many tutorials and resources, we hope that seeing their application in
context was educational.
Indicators of Compromise
● SHA256: 1d450fb80ff070385e88ab624a387d72abd9d9898109b5c5ebd35c5002223359:
● File size: 85728 bytes
● File type: PE32 executable (GUI) Intel 80386 Mono/.Net assembly, for MS Windows
● File description: first stage
● SHA256:
daba1c39a042aec4791151dbabd726e0627c3789deea3fc81b66be111e7c263e:
● File size: 2184298 bytes
● File type: ASCII text, with very long lines (65536), with no line terminators
● File name: Tsudun.pdf
● File description: encrypted Donut-generated shellcode
● SHA256: d2bea59a4fc304fa0249321ccc0667f595f0cfac64fd0d7ac09b297465cda0c4:
● File size: 1092149 bytes
● File type: data
● File description: decrypted Donut-generated shellcode
● SHA256: 0684d315df85ee1329c70dc0e84e82a054109ba595e813c2b617cbf07dbfdbd2:
● File size: 1117696 bytes
● File type: PE32 executable (DLL) (console) Intel 80386 Mono/.Net assembly, for MS
Windows
● File description: obfuscated .NET assembly
● SHA256: 6b37f9bc3649f8adf3c282328a667ec050ddf8eab13ab027bf7e210b265273d8
● File size: 532480 bytes
● File type: PE32 executable (GUI) Intel 80386, for MS Windows
● File description: final payload, Remcos