CH 13

This document discusses various data encoding algorithms used by malware to hide configuration information, sensitive strings, and disguise malicious payloads. It describes simple ciphers like Caesar cipher and XOR encoding, as well as common encoding schemes like Base64. It also covers identifying custom encoding schemes in malware and provides two methods for decoding encrypted data - either by manually reprogramming the decoding functions or leveraging functions already implemented in the malware.

Uploaded by

Jayesh Shinde

Practical Malware Analysis

Ch 13: Data Encoding

Revised 4-25-16
The Goal of Analyzing Encoding Algorithms
Reasons Malware Uses Encoding
• Hide configuration information
– Such as C&C domains
• Save information to a staging file
– Before stealing it
• Store strings needed by malware
– Decode them just before they are needed
• Disguise malware as a legitimate tool
– Hide suspicious strings
Simple Ciphers
Why Use Simple Ciphers?
• They are easily broken, but
– They are small, so they fit into space-constrained environments like exploit shellcode
– Less obvious than more complex ciphers
– Low overhead, little impact on performance
• These are obfuscation, not encryption
– They make it difficult to recognize the data,
but can't stop a skilled analyst
Caesar Cipher
• Move each letter forward 3 spaces in the
alphabet
ABCDEFGHIJKLMNOPQRSTUVWXYZ
DEFGHIJKLMNOPQRSTUVWXYZABC
• Example
ATTACK AT NOON
DWWDFN DW QRRQ
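
The shift above can be sketched in Python. This is a minimal illustration, assuming uppercase ASCII letters only; the function name caesar is hypothetical and not taken from the book.

```python
import string

def caesar(text: str, shift: int = 3) -> str:
    """Shift each uppercase letter forward by `shift`; leave other characters alone."""
    alphabet = string.ascii_uppercase
    k = shift % 26
    shifted = alphabet[k:] + alphabet[:k]
    return text.translate(str.maketrans(alphabet, shifted))

print(caesar("ATTACK AT NOON"))       # DWWDFN DW QRRQ
print(caesar("DWWDFN DW QRRQ", -3))   # ATTACK AT NOON
```

A negative shift undoes the encoding, which is why this counts as obfuscation only.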
XOR
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0

• Uses a key to encrypt data
• Uses one bit of data and one bit of the key at a time
• Example: Encode HI with a key of 0x3c
HI = 0x48 0x49 (ASCII encoding)
Data: 0100 1000 0100 1001
Key: 0011 1100 0011 1100
Result: 0111 0100 0111 0101
XOR Reverses Itself
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0

• Example: Encode HI with a key of 0x3c


HI = 0x48 0x49 (ASCII encoding)
Data: 0100 1000 0100 1001
Key: 0011 1100 0011 1100
Result: 0111 0100 0111 0101
• Encode it again
Result: 0111 0100 0111 0101
Key: 0011 1100 0011 1100
Data: 0100 1000 0100 1001
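
The HI example can be checked with a few lines of Python. This is a sketch of single-byte XOR; the helper name xor_bytes is an assumption for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with the same single-byte key."""
    return bytes(b ^ key for b in data)

encoded = xor_bytes(b"HI", 0x3C)
print(encoded.hex())              # 7475
print(xor_bytes(encoded, 0x3C))   # b'HI'
```

Applying the same function twice with the same key returns the original data, so one routine serves as both encoder and decoder.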
Brute-Forcing XOR Encoding
• If the key is a single byte, there are only
256 possible keys
– Error in book; this should be "a.exe"
– PE files begin with MZ
MZ = 0x4d 0x5a
Link Ch 13a
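
A brute-force search for the MZ header might look like the sketch below. It assumes the encoded buffer starts at the beginning of a PE file; the function name and the sample bytes are made up for the demonstration.

```python
def brute_force_xor(data: bytes, magic: bytes = b"MZ"):
    """Return every single-byte key whose decoding of `data` starts with `magic`."""
    hits = []
    for key in range(256):
        if bytes(b ^ key for b in data[:len(magic)]) == magic:
            hits.append(key)
    return hits

# Hypothetical sample: the start of a PE header encoded with key 0x12
sample = bytes(b ^ 0x12 for b in b"MZ\x90\x00\x03")
print([hex(k) for k in brute_force_xor(sample)])   # ['0x12']
```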
Brute-Forcing Many Files
• Look for a common string, like "This Program"
XOR and Nulls
• A null byte reveals the key, because
– 0x00 xor KEY = KEY
• Obviously the key here is 0x12
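
Because nulls encode to the key itself, a frequency count often exposes the key. The sketch below assumes the plaintext contains more null bytes than any other value, which is common in PE files but not guaranteed; the function name is hypothetical.

```python
from collections import Counter

def guess_key_from_nulls(encoded: bytes) -> int:
    """Guess a single-byte XOR key as the most frequent byte in the encoded data."""
    return Counter(encoded).most_common(1)[0][0]

plain = b"MZ" + b"\x00" * 30 + b"PE"          # made-up plaintext with many nulls
encoded = bytes(b ^ 0x12 for b in plain)
print(hex(guess_key_from_nulls(encoded)))     # 0x12
```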
NULL-Preserving Single-Byte XOR Encoding

• Algorithm:
– Use XOR encoding, EXCEPT
– If the plaintext is NULL or the key itself, skip
the byte
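
The algorithm can be sketched as follows. The function name is an assumption; the logic follows the two rules above.

```python
def null_preserving_xor(data: bytes, key: int) -> bytes:
    """XOR with `key`, but leave bytes equal to 0x00 or to the key unchanged."""
    return bytes(b if b in (0, key) else b ^ key for b in data)

encoded = null_preserving_xor(b"MZ\x00\x00\x12", 0x12)
print(encoded.hex())                       # 5f48000012
print(null_preserving_xor(encoded, 0x12))  # b'MZ\x00\x00\x12'
```

Skipping both values keeps the scheme reversible: a null never becomes the key, and the key never becomes a null, so the key no longer appears wherever the plaintext has runs of nulls.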
Identifying XOR Loops in IDA Pro
• Small loops with an XOR instruction inside
1. Start in "IDA View" (seeing code)
2. Click Search, Text
3. Enter xor and Find all occurrences
Three Forms of XOR
• XOR a register with itself, like xor edx, edx
– Innocent, a common way to zero a register
• XOR a register or memory reference with a
constant
– May be an encoding loop, and key is the
constant
• XOR a register or memory reference with a
different register or memory reference
– May be an encoding loop, key less obvious
Base64
• Converts 6 bits into one character in a 64-character alphabet
• There are a few versions, but all use these
62 characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789
• MIME uses + and /
– Also = to indicate padding
Transforming Data to Base64
• Use 3-byte chunks (24 bits)
• Break into four 6-bit fields
• Convert each to Base64
base64encode.org

base64decode.org

• 3 bytes encode to 4
Base64 characters
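
The 3-byte-to-4-character step can be written out by hand. This sketch uses the standard MIME alphabet; the name encode_chunk is hypothetical.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_chunk(chunk: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("chunk must be exactly 3 bytes")
    n = int.from_bytes(chunk, "big")                           # one 24-bit integer
    fields = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]  # four 6-bit fields
    return "".join(B64[f] for f in fields)

print(encode_chunk(b"bot"))   # Ym90
```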
Padding
• If the last chunk has only 2 bytes, a single = is appended
• If the last chunk has only 1 byte, == is appended
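
Python's standard base64 module shows both padding cases. The input strings are arbitrary examples.

```python
import base64

for data in (b"abc", b"ab", b"a"):
    print(data, base64.b64encode(data))
# b'abc' b'YWJj'
# b'ab' b'YWI='
# b'a' b'YQ=='
```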
Example
• URL and cookie are Base64-encoded
Cookie: Ym90NTQxNjQ
• This has 11 characters; padding is omitted
• Some Base64 decoders will fail, but this one just automatically adds the missing padding
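
A strict decoder such as Python's base64.b64decode rejects the unpadded cookie, so the padding has to be restored first. The helper below is a sketch with a hypothetical name.

```python
import base64

def b64decode_unpadded(text: str) -> bytes:
    """Restore any missing = padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)

print(b64decode_unpadded("Ym90NTQxNjQ"))   # b'bot54164'
```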
Finding the Base64 Function
• Look for this "indexing string"
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
• Look for a lone padding character
(typically =) hard-coded into the encoding
function
Decoding the URLs

• Custom indexing string


aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/
• Look for a lone padding character (typically
=) hard-coded into the encoding function
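
One way to decode data that uses a custom indexing string is to translate it back to the standard alphabet and then use an ordinary decoder. The sketch below uses the custom string shown above; the function name is hypothetical, and the padding character is assumed to be the usual =.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
CUSTOM = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

def custom_b64decode(text: str, alphabet: str = CUSTOM) -> bytes:
    """Map each custom-alphabet character to its standard counterpart, then decode."""
    standard_text = text.translate(str.maketrans(alphabet, STANDARD))
    standard_text += "=" * (-len(standard_text) % 4)   # restore omitted padding
    return base64.b64decode(standard_text)
```

A standard decoder applied directly to such data produces garbage, which is often the first hint that the indexing string has been modified.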
Common Cryptographic Algorithms
Strong Cryptography
• Strong enough to resist brute-force attacks
– Ex: SSL, AES, etc.
• Disadvantages of strong encryption
– Large cryptographic libraries required
– May make code less portable
– Standard cryptographic libraries are easily detected
• Via function imports, function matching, or identification of
cryptographic constants
– Symmetric encryption requires a way to hide the key
Recognizing Strings and Imports
• Strings found in malware encrypted with
OpenSSL
Recognizing Strings and Imports
• Microsoft crypto functions usually start
with Crypt or CP or Cert
Searching for Cryptographic Constants

• IDA Pro's FindCrypt2 Plug-in (Link Ch 13c)


– Finds magic constants (binary signatures of
crypto routines)
– Cannot find RC4 or IDEA routines because
they don't use a magic constant
– RC4 is commonly used in malware because it's
small and easy to implement
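
The idea behind constant searching can be illustrated with a small byte-pattern scan. This is not how FindCrypt2 is implemented; it is a sketch that looks for two well-known constants (the start of the AES S-box and the first MD5 initialization word), and the signature table and function name are assumptions.

```python
import struct

SIGNATURES = {
    "AES S-box (first 8 bytes)": bytes.fromhex("637c777bf26b6fc5"),
    "MD5 init word 0x67452301 (little-endian)": struct.pack("<I", 0x67452301),
}

def find_crypto_constants(blob: bytes):
    """Return (name, offset) for the first occurrence of each known constant."""
    hits = []
    for name, signature in SIGNATURES.items():
        offset = blob.find(signature)
        if offset != -1:
            hits.append((name, offset))
    return hits
```

An algorithm such as RC4 has no fixed table to match, which is why a scan like this cannot find it.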
FindCrypt2
• Runs automatically on any new analysis
• Can be run manually from the Plug-In
Menu
Krypto ANALyzer (PEiD Plug-in)
• Download from link Ch 13d
• Has wider range of constants than FindCrypt2
– More false positives
• Also finds Base64 tables and crypto function
imports
Entropy
• Entropy measures disorder
• To calculate it, just count the number of
occurrences of each byte from 0 to 255
– Calculate Pi = probability of byte value i
– Entropy = -(sum of Pi log2(Pi)) for i = 0 to 255, skipping values with Pi = 0 (Link 13e)
• If all the bytes are equally likely, the
entropy is 8 (maximum disorder)
• If all the bytes are the same, the entropy is
zero
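
The calculation can be sketched in Python as Shannon entropy in bits per byte. The function name is an assumption.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Each term is p * log2(1/p), the same as -p * log2(p)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

print(entropy(b"A" * 1000))          # 0.0
print(entropy(bytes(range(256))))    # 8.0
```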
Entropy Demo
• Put output in a file
• Use binwalk -E to analyze the file
• Multiply vertical axis by 8
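#!/usr/bin/env python3
import base64, random, sys

# 10,000 random bytes: entropy close to 8 bits per byte
a = bytes(random.randint(0, 255) for _ in range(10000))

b = base64.b64encode(a)   # 64-character alphabet, about 6 bits per byte
c = base64.b32encode(a)   # 32-character alphabet, about 5 bits per byte
d = base64.b16encode(a)   # 16-character alphabet, about 4 bits per byte
e = b'A' * 10000          # a single repeated byte, entropy 0

# Write raw bytes so the output can be redirected to a file
sys.stdout.buffer.write(a + b + c + d + e)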
Entropy Demo
• Concatenate three images in different
formats
Searching for High-Entropy Content
• IDA Pro Entropy Plugin
• Finds regions of high entropy, indicating
encryption (or compression)
Recommended Parameters
• Chunk size: 64 Max. Entropy: 5.95
– Good for finding many constants
– Including Base64-encoded strings (entropy 6)
• Chunk size: 256 Max. Entropy: 7.9
– Finds very random regions
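
The chunk size and entropy threshold can be illustrated with a simple scan over fixed-size chunks. This is a sketch of the general idea, not the plugin's actual algorithm; the function names and the non-overlapping chunking are assumptions.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def high_entropy_chunks(blob: bytes, chunk_size: int = 64, threshold: float = 5.95):
    """Return (offset, entropy) for each full chunk whose entropy meets the threshold."""
    hits = []
    for offset in range(0, len(blob) - chunk_size + 1, chunk_size):
        h = entropy(blob[offset:offset + chunk_size])
        if h >= threshold:
            hits.append((offset, h))
    return hits

blob = b"A" * 64 + bytes(range(64)) + b"A" * 64
print(high_entropy_chunks(blob))   # [(64, 6.0)]
```

A 64-byte chunk can hold at most 64 distinct values, so its entropy cannot exceed 6, which is why the threshold for that chunk size sits just below 6.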
Entropy Graph
• IDA Pro Entropy Plugin
– Download from link Ch 13g
– Use StandAlone version
– Double-click region, then Calculate, Draw
– Lighter regions have high entropy
– Hover over graph to see numerical value
Custom Encoding
Homegrown Encoding Schemes
• Examples
– One round of XOR, then Base64
– Custom algorithm, possibly similar to a
published cryptographic algorithm
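
The first example, one round of XOR followed by Base64, can be sketched as below. The key value and function names are hypothetical and do not come from any specific sample.

```python
import base64

def encode(data: bytes, key: int) -> bytes:
    """Single-byte XOR, then standard Base64."""
    return base64.b64encode(bytes(b ^ key for b in data))

def decode(text: bytes, key: int) -> bytes:
    """Reverse the layers in the opposite order: Base64 first, then XOR."""
    return bytes(b ^ key for b in base64.b64decode(text))

token = encode(b"bot54164", 0x3C)
print(decode(token, 0x3C))   # b'bot54164'
```

Running a plain Base64 decoder on such data yields bytes that still look random, so the layering hides the strings even from an analyst who recognizes the Base64.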
Identifying Custom Encoding

• This sample makes a bunch of 700 KB files


• Figure out the encoding from the code
• Find CreateFileA and WriteFile
– In function sub_4011A9
• Uses XOR with a pseudorandom stream
Advantages of Custom Encoding to the Attacker

• Can be small and nonobvious


• Harder to reverse-engineer
Decoding
Two Methods
• Reprogram the functions
• Use the functions in the malware itself
Self-Decoding
• Stop the malware in a debugger with data
decoded
• Isolate the decryption function and set a
breakpoint directly after it
• BUT sometimes you can't figure out how
to stop it with the data you need decoded
Manual Programming of Decoding Functions

• Standard functions may be available


PyCrypto Library
• Good for standard algorithms
How to Decrypt Using Malware
