2 BufOverflows
2 BufOverflows
Buffer Overflows
Chester Rebeiro
Payload:
the code which the attacker wants to execute
2
Subvert Execution
• In application software
– SQL Injection
• In system software
– Buffers overflows and overreads
– Heap: double free, use after free
– Integer overflows
– Format string
– Control Flow
• In peripherials
– USB drives; Printers
• In Hardware
– Hardware Trojans
These do not really subvert execution,
• Covert Channels but can lead to confidentiality attacks.
– Can exist in hardware or software
3
Buffer Overflows in the Stack
• We need to first know how a stack is managed
http://insecure.org/stf/smashstack.html 4
Stack in a Program
(when function is executing)
Stack
ESP
ESP Parameters
for function
ESP
ESP return Address
ESP prev frame pointer EBP
Locals of function
ESP
In main In function
push $3 push %ebp
push $2 movl %esp, %ebp %ebp: Frame Pointer
push $1 sub $20, %esp %esp : Stack Pointer
call function
5
Stack Usage (example)
Stack (top to bottom):
address stored data
1000 to 997 3
996 to 993 2
992 to 989 1
Parameters
for function 988 to 985 return address
996 to 993 2
992 to 989 1
What is the output of the following?
988 to 985 return address
• printf(“%x”, buffer2) : 966
• printf(“%x”, &buffer2[10]) 984 to 981 %ebp (stored
frame pointer)
976 à buffer1
(%ebp)980 to 976 buffer1
Therefore buffer2[10] = buffer1[0]
A BUFFER OVERFLOW 975 to 966 buffer2
(%sp) 964
7
Modifying the Return Address
Stack (top to bottom):
buffer2[19] =
address stored data
&arbitrary memory location
1000 to 997 3
8
Stack (top to bottom):
address stored data
1000 to 997 3 Now that we seen how buffer
overflows can skip an instruction,
996 to 993 2
9
Big Picture of the exploit
Fill the stack as follows
(where BA is buffer address) BA
BA
Parameters BA
for function BA
BA
Return Address
BA
frame pointer prev frame pointer
BA
buffer BA
10
Payload
• Lets say the attacker wants to spawn a shell
• ie. do as follows:
11
Step 1 : Get machine codes
12
Step 2: Find Buffer overflow in an
application
Defined on stack
O
O
O
O
o
13
Step 3 :
Put Machine Code in Large String
large_string
shellcode
14
Step 3 (contd) :
Fill up Large String with BA
Address of buffer is BA
large_string
shellcode BA BA BA BA BA BA BA BA
15
Final state of Stack
BA
BA
• Copy large string into buffer
BA
BA
BA
• When strcpy returns the BA
exploit code would be executed BA
BA
buffer
shellcode
large_string
shellcode BA BA BA BA BA BA BA BA BA
buffer Address
BA
16
Putting it all together
17
Buffer overflow in the Wild
• Worm CODERED … released on 13th July 2001
• Infected 3,59,000 computers by 19th July.
18
Defenses
• Eliminate program flaws that could lead to subverting of execution
Safer programming languages; Safer libraries; hardware enhancements;
static analysis
• If can’t eliminate, make it more difficult for malware to subvert
execution
W^X , ASLR, canaries
• If malware still manages to execute, try to detect its execution at
runtime
malware run-time detection techniques using learning techniques, ANN and malware signatures
• If can’t detect at runtime, try to restrict what the malware can do..
– Sandbox system
so that malware affects only part of the system; access control; virtualization; trustzone; SGX
– Track information flow
DIFT; ensure malware does not steal sensitive information
19
Preventing Buffer Overflows
with Canaries and W^X
20
Canaries
• Known (pseudo random) values placed Stack (top to bottom):
on stack to monitor buffer overflows. stored data
• A change in the value of the canary
3
indicates a buffer overflow.
• Will cause a ‘stack smashing’ to be 2
detected
1
ret addr
21
Canaries and gcc
• As on gcc 4.4.5, canaries are not added to functions by default
o Could cause overheads as they are executed for every function
that gets executed
• Canaries can be added into the code by –fstack-protector option
o If -fstack-protector is specified, canaries will get added based on
a gcc heuristic
• For example, buffer of size at-least 8 bytes is allocated
• Use of string operations such as strcpy, scanf, etc.
22
Canaries Example
Without canaries, the return address on stack gets overwritten resulting in a
segmentation fault. With canaries, the program gets aborted due to stack smashing.
23
Canaries Example
Without canaries, the return address on stack gets overwritten resulting in a
segmentation fault. With canaries, the program gets aborted due to stack smashing.
24
Canary Internals
With canaries
25
Non Executable Stacks (W^X)
• In Intel/AMD processors, ND/NX bit present to mark non code
regions as non-executable.
– Exception raised when code in a page marked W^X executes
• Works for most programs
– Supported by Linux kernel from 2004
– Supported by Windows XP service pack 1 and Windows Server 2003
• Called DEP – Data Execution Prevention
• Does not work for some programs that NEED to execute from the
stack.
– Eg. JIT Compiler, constructs assembly code from external data and then
executes it.
(Need to disable the W^X bit, to get this to work)
26
26
Will non executable
stack prevent buffer
overflow attacks ?
https://css.csail.mit.edu/6.858/2010/readings/return-to-libc.pdf 27
27
Return to Libc
(big picture)
BA
BA
BA
BA
Return Address BA
This will not work if ND bit is set
BA
BA
BA
buffer
Exploit code
28
28
Return to Libc
(replace return address to point to a function within libc)
F1 Addr
F1 Addr Stack
F1 Addr
F1 Addr
Heap
Return Address F1 Addr
F1 Addr
F1 Addr Data
F1 Addr
buffer
F1 Addr
Text
So we need to
1. Find the address of system in the program
(does not have to be a user specified function, could be a function
present in one of the linked libraries)
2. Supply an address that points to the string
/bin/sh
30
30
The return-to-libc attack
F1ptr
F1 ptr /bin/bash
Shell ptr
F1ptr
Return Address F1ptr
F1ptr
F1ptr system()
F1ptr In libc
buffer
F1ptr
31
31
Find address of system in the
executable
32
32
Find address of /bin/sh
• Every process stores the enviroment variables at
the bottom of the stack
• We need to find this and extract the string
/bin/sh from it
33
33
Finding the address of the string
/bin/sh
34
The final Exploit Stack
xxx
xxx /bin/sh
0xbfbffe25
dead
Return Address 0x28085260
xxx
xxx system()
xxx In libc
buffer
xxx
35
A clean exit
xxx
xxx /bin/bash
0xbfbffe25
0x281130d0 exit()
Return Address 0x28085260 In libc
xxx
xxx system()
xxx In libc
buffer
xxx
36
Limitation of ret2libc
37
37
Return Oriented Programming
(ROP)
38
Return Oriented Programming Attacks
The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls
(on the x86
39
Target Payload
Lets say this is the payload needed to be executed by an attacker.
40
Step 1: Find Gadgets
• Find gadgets
• A gadget is a short sequence of instructions followed by a return
useful instruction(s)
ret
• Useful instructions : should not transfer control outside the gadget
41
Step 2: Stitching
• Stitch gadgets so that the payload is built
xxx
AG4 movb $0x0, 0x7(%esi)
ret G2
AG3
movl $0xb, %eax
AG2 G4
ret
Return Address AG1 movb $0x0, 0xc(%esi)
ret G3
xxx
xxx
xxx movl %esi, 0x8(%esi)
buffer G1
ret
xxx
Program Binary
Program Stack
AGi: Address of Gadget i 43
Finding Gadgets
• Static analysis of libc
• To find
1. A set of instructions that end in a ret (0xc3)
The instructions can be intended (put in by the compiler) or unintended
2. Besides ret, none of the instructions transfer control out of the
gadget
44
Intended vs Unintended Instructions
• Intended : machine code intentionally put in by the compiler
• Unintended : interpret machine code differently in order to build new
instructions
Machine Code : F7 C7 07 00 00 00 0F 95 45 C3
What the compiler intended..
Highly likely to find many diverse instructions of this form in x86; not so likely to
have such diverse instructions in RISC processors 45
Geometry
• Given an arbitrary string of machine code, what is the
probability that the code can be interpreted as useful
instructions.
– x86 code is highly dense
– RISC processors like (SPARC, ARM, etc.) have low geometry
• Thus finding gadgets in x86 code is considerably more easier
than that of ARM or SPARC
• Fixed length instruction set reduces geometry
46
Finding Gadgets
• Static analysis of libc
• Find any memory location with 0xc3 (RETurn instruction)
• Build a trie data structure with 0xc3 as a root
• Every path (starting from any node, not just the leaf) to the root is a
possible gadget
C
3
child of
00 46
24 89
24 43
94
37 16
47
Finding Gadgets
33 b2 23 12 a0 31 a5 67 22 ab ba 4a 3c c3 ff ee ab 31 11 09
48
Finding Gadgets Algorithm
49
Finding Gadgets Algorithm
pop %edx
deadbeef ret
esp GadgetAdd
51
Stitch
G1
pop %edx
G2
ret
addr G2
esp G1 mov 64(%edx), %eax
ret
+64
stack
52
Store Gadget
• Store the contents of a register to a memory location in the
stack
mov %eax, 24(%edx)
ret
GadgetAddr 2
0
esp GadgetAddr 1 pop %edx
ret
24
stack
53
Gadget for addition
Add the memory pointed
to by %edx to %eax.
The result is stored in %eax
54
Gadget for addition
(put 0xc3 into %edi)
1. First put gadget ptr for 0xC3 into
%edi
2. 0xC3 corresponds to NOP in
addl (%edx), %eax ROP
GadgetAddr3 push %edi 3. Push %edi in gadget 2 just pushes
Gadget_RET
ret 0xc3 back into the stack
GadgetAddr2
Therefore not disturbing the stack
Gadget_RET
0xc3 contents
esp GadgetAddr1
4. Gadget 3 executes as planned
stack pop %edi
ret
55
Unconditional Branch
in ROP
• Changing the %esp causes unconditional
jumps
pop %esp
ret
esp GA
stack
56
Conditional Branches
In x86 instructions conditional branches have 2 parts
57
Step 1 : Set the flags
Find suitable ROPs that set appropriate flags
CMP %eax, %ebx subtraction
RET Affects flags CF, OF, SF, ZF, AF, PF
58
Step 2: Transfer flags to
memory or register
• Using lahf instruction
stores 5 flags (ZF, SF, AF, PF, CF) in the %ah register
59
Step 2: Indirect way to transfer flags to
memory
Several instructions operate using the contents of the flags
ADC %eax, %ebx : add with carry; performs eax <- eax + ebx + CF
(if eax and ebx are 0 initially, then the result will be either 1 or 0 depending on the CF)
RCL %eax, 1
(if eax = 0. then the result is either 0 or 1 depending on CF)
60
Gadget to transfer flags to memory
61
Step 3: Perturb %esp depending
on flag
What we hope to achieve
If (CF is set){
perturb %esp What we have One way of achieving …
}else{ CF stored in a memory location (say X) negate X
leave %esp as it is Current %esp offset = delta & X
} delta, how much to perturb %esp %esp = %esp + offset
62
Turing Complete
• Gadgets can do much more…
invoke libc functions,
invoke system calls, ...
• For x86, gadgets are said to be turning complete
– Can program just about anything with gadgets
• For RISC processors, more difficult to find gadgets
– Instructions are fixed width
– Therefore can’t find unintentional instructions
• Tools available to find gadgets automatically
Eg. ROPGadget (https://github.com/JonathanSalwan/ROPgadget)
Ropper (https://github.com/sashs/Ropper)
63
Address Space Layout Randomization
(ASLR)
64
The Attacker’s Plan
• Find the bug in the source code (for eg. Kernel) that can be
exploited
– Eyeballing
– Noticing something in the patches
– Following CVE
• Use that bug to insert malicious code to perform something
nefarious
– Such as getting root privileges in the kernel
65
Address Space Randomization
• Address space layout
randomization (ASLR)
randomizes the address space
layout of the process
• Each execution would have a
different memory map, thus
making it difficult for the attacker
to run exploits
• Initiated by Linux PaX project in
2001
• Now a default in many operating
systems
67
ASLR in Action
First Run
Another Run
68
ASLR in the Linux Kernel
• Permanent changes can be made by editing the /etc/sysctl.conf file
69
Internals : Making code relocatable
• Load time relocatable
– where the loader modifies a program executable so
that all addresses are adjusted properly
– Relocatable code
• Slow load time since executable code needs to be modified.
• Requires a writeable code segment, which could pose
problems
• PIE : position independent executable
– a.k.a PIC (position independent code)
– code that executes properly irrespective of its absolute address
– Used extensively in shared libraries
• Easy to find a location where to load them without overlapping with
other modules
70
Load Time Relocatable
1
71
Load Time Relocatable
2
72
Load Time Relocatable
73
Load Time Relocatable
74
Load Time Relocatable
Limitations
• Slow load time since executable code needs to be modified
75
PIC Internals
• An additional level of indirection for all global data and
function references
• Uses a lot of relative addressing schemes and a global offset
table (GOT)
• For relative addressing,
– data loads and stores should not be at absolute addresses but must be
relative
With GOT
77
Enforcing Relative Addressing
(example)
With load time relocatable
With PIC
78
Enforcing Relative Addressing
(example)
With load time relocatable
With PIC
Get address of next instruction
to achieve relativeness
79
Advantage of the GOT
• With load time relocatable code, every variable reference would need to
be changed
– Requires writeable code segments
– Huge overheads during load time
– Code pages cannot be shared
• With GOT, the GOT table needs to be constructed just once during the
execution
– GOT is in the data segment, which is writeable
– Data pages are not shared anyway
– Drawback : runtime overheads due to multiple loads
80
An Example of working with GOT
81
Data section
Text section
offset of myglob
in GOT
GOT it!
83
Deep Within the Kernel
loading the executable
(randomizing the data section)
Check if randomize_va_space
is > 1 (it can be 1 or 2)
Finally Randomize
84
Function Calls in PIC
• Theoretically could be done similar with the data…
– call instruction gets location from GOT entry that is filled in during load
time (this process is called binding)
– In practice, this is time consuming. Much more functions than global
variables. Most functions in libraries are unused
• Lazy binding scheme
– Delay binding till invocation of the function
– Uses a double indirection – PLT – procedure linkage table in addition
to GOT
85
The PLT
• Instead of directly calling func, invoke an offset in the
PLT instead.
• PLT is part of the executable text section, and consists
of one entry for each external function the shared
library calls.
1 • Each PLT entry has
a jump location to a specific GOT entry
Preparation of arguments for a ‘resolver’
Call to resolver function
86
First Invocation of Func
First Invocation of fun (steps 2 and 3)
On first invocation of func, PLT[n]
jumps to GOT[n], which simply jumps
back to PLT[n]
1
87
First Invocation of Func
(step 4). Invoke resolver, which resolves
the actual of func,
places this actual address into GOT
and then invokes func
4 3
88
Example of PLT
89
Example of PLT
ebx points to the GOT table
ebx + 0x10 is the offset
corresponding
to set_mylib_int
90
Example of PLT
Jump to the resolver, which
resolves the actual address
of set_mylib_int and fills it
into the GOT
91
Subsequent invocations of Func
3
2
92
Advantages
• Functions are relocatable, therefore good for ASLR
• Functions resolved only on need, therefore saves
time during the load phase
93
Bypassing ASLR
• Brute force
• Return-to-PLT
• Overwriting the GOT
• Timing Attacks
94
Safer Programming Languages,
and Compiler Techniques
95
Other Precautions for buffer overflows
• Enforce memory safety in programming language
– Example java, C# (slow and not feasible for system programming)
• Cannot replace C and C++.
(Too much software already developed in C / C++)
96
Compile Bounds Checking
• Check accesses to each buffer so that it cannot be beyond the
bounds
• In C and C++, bound checking performed at pointer calculation time
or dereference time.
• Requires run-time bound information for each allocated block.
• Two methodologies
– Object based techniques
– Pointer based techniques
These checks are automatically inserted at compile time for all pointer
variables. For non-pointers, this check is not required.
98
Softbound – more details
• pointer arithmetic and assignment
The new pointer inherits the base and bound of the original
pointer
99
Storing Metadata
• Table maintained for metadata
100
Softbound – more details
• Pointers passed to functions
– If pointers are passed by the stack
no issues. The compiler can put information related to metadata onto
the stack
– If pointers passed by registers.
Compiler modifies every function declaration to
add more arguments related to metadata
For each function parameter that is a pointer, the corresponding base
and bound values are also sent to the function
101