Architectural Support for High Speed Protection of Memory Integrity and Confidentiality in Multiprocessor Systems
Weidong Shi, Hsien-Hsin (Sean) Lee, Mrinmoy Ghosh, Chenghuai Lu
Georgia Institute of Technology, Atlanta, GA 30332
Types of Security Attacks
Software-based attacks
- Software reverse engineering, disassembly
- Software patching
Hardware-based physical attacks
- Trace system from system bus, peripheral bus
- Differential power/timing analysis
- Build fake devices, device spoof (MOD chip)
- Modify RAM
- Replay bus signals, fake bus signal injection
- Trigger fake interrupts
XBOX with MOD-chip installed. MOD-chip is a low cost bus snoop and spoof device widely used to break XBOX security.
Shared-Memory MP Security Architecture 2
Cracking the XBOX
Nbridge + GPU Hyper-Transport
Secret Key South Bridge
FPGA based Bus Tracer
P-III
BIOS Flash (some BIOS codes are encrypted)
Find out the key
socket over HT Bus soldered by hackers
MOD Chip (PCB with microcontroller and Flash memory)
BIOS hijacking
Low cost FPGA based bus snooping device
Motivation
Unsolved issues of prior security measures
- Uni-processor based security model
- Protected memory cannot be shared
- Large space and performance overhead in security support
- Some schemes compromise security for performance improvement
Our Work
Protect integrity and confidentiality in a Shared-memory Multiprocessor platform
Agenda
Uni-processor Security Architecture
Platform-oriented Security Architecture
Architectural Support for Shared Memory Integrity and Confidentiality
Evaluation
Conclusions
Insecure Uni-Processor Architecture
Processor Core Caches
Secure Processor
North Bridge
(Mem Controller)
RAM
South Bridge
Ethernet
Mouse
Keyboard
Disk
Secure Uni-Processor Architecture
Processor Core Caches
Trusted Domain UnTrusted Domain
Secure Processor
North Bridge
(Mem Controller)
RAM
South Bridge
Ethernet
Mouse
Keyboard
Disk
Secure Uni-Processor Architecture
Processor Core Root Signature Caches
MAC hash tree
Trusted Domain UnTrusted Domain
Crypto Engine
Secure Processor
North Bridge
(Mem Controller)
South Bridge
RAM (encrypted data & MAC code)
Ethernet
Mouse
Keyboard
Disk
Not directly applicable to a Shared-memory Multiprocessor system
Basics: Integrity Check (MAC Authentication)
Sender
N-bit Plaintext Secret Key Hash/Encryption Hash/Encryption Secret Key
Receiver
M bit MAC
M bit MAC
Exception
- Sender and receiver share the same secret key
- Data tampering is detected using a Message Authentication Code (MAC)
- Any attempt by an adversary to modify the data or forge a valid authentication code is detected, except with negligible probability
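The sender/receiver check above can be sketched in a few lines. This is only an illustration, assuming HMAC-SHA256 as the keyed hash and a 128-bit MAC; the slide does not name the function or the MAC width, and `make_mac`, `verify`, and the key value are hypothetical names chosen here.

```python
import hashlib
import hmac

MAC_BYTES = 16  # M-bit MAC; 128 bits is an assumption for this sketch


def make_mac(key: bytes, plaintext: bytes) -> bytes:
    """Sender side: compute a truncated HMAC-SHA256 over the plaintext."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()[:MAC_BYTES]


def verify(key: bytes, plaintext: bytes, mac: bytes) -> bool:
    """Receiver side: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(key, plaintext), mac)


key = b"shared-secret-key"     # hypothetical key shared by both parties
block = b"cache line contents"
tag = make_mac(key, block)

untampered_ok = verify(key, block, tag)
tampered_ok = verify(key, b"cache line c0ntents", tag)
print(untampered_ok, tampered_ok)
```

A failed comparison corresponds to the "Exception" path in the diagram.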
Platform-oriented Security Architecture
Processor 1 (PE 1)
Processor Core Caches Crypto Engine encrypted data encrypted MAC
Processor n (PE n)
Processor Core Caches Crypto Engine
Cache-to-Cache
- send encrypted data first, followed by encrypted MAC
- receiver decrypts data and verifies integrity
Cache-to-Memory
- send encrypted data and MAC to Nbridge
- Nbridge decrypts the data, verifies its integrity, updates the MAC tree, and stores the encrypted data in the RAM
Need to be protected
RAM
Crypto Engine
MAC Tree Cache
North Bridge (PE 0)
Protection on the RAM MAC Tree
Root MAC MAC
MAC
32B 32B RAM Block RAM Block
32B RAM Block
- An M-ary MAC (message authentication code) tree protects physical memory integrity dynamically (e.g., against replay attacks).
- The root MAC is a signature of the protected memory space.
- The root MAC is kept inside the North Bridge.
- Frequently accessed MAC tree nodes are cached inside the North Bridge.
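A minimal sketch of how such a tree catches a replay, assuming HMAC-SHA256 for every node and an arity of 4 (the slide gives neither). `build_levels` and `verify_block` are hypothetical helper names. Only the root is treated as trusted, standing in for the copy held inside the North Bridge; all other nodes are assumed to live in untrusted RAM.

```python
import hashlib
import hmac

BLOCK = 32  # RAM block size in bytes, as on the slide
ARITY = 4   # M; the slide gives no value, so 4 is an assumption


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def build_levels(key: bytes, ram: bytes) -> list:
    """Return every tree level, leaf MACs first; the last level is [root MAC]."""
    level = [mac(key, ram[i:i + BLOCK]) for i in range(0, len(ram), BLOCK)]
    levels = [level]
    while len(level) > 1:
        level = [mac(key, b"".join(level[i:i + ARITY]))
                 for i in range(0, len(level), ARITY)]
        levels.append(level)
    return levels


def verify_block(key, ram, levels, trusted_root, index) -> bool:
    """Check one RAM block against the trusted root via untrusted stored nodes."""
    node = mac(key, ram[index * BLOCK:(index + 1) * BLOCK])
    for level in levels[:-1]:
        if level[index] != node:
            return False
        group = index // ARITY
        node = mac(key, b"".join(level[group * ARITY:(group + 1) * ARITY]))
        index = group
    return hmac.compare_digest(node, trusted_root)


key = b"north-bridge-mac-key"   # hypothetical key
old_ram = bytes(16 * BLOCK)     # 16 zeroed blocks
old_levels = build_levels(key, old_ram)

# Block 5 is legitimately updated; the trusted root follows the update.
new_ram = old_ram[:5 * BLOCK] + b"\x01" * BLOCK + old_ram[6 * BLOCK:]
new_levels = build_levels(key, new_ram)
root = new_levels[-1][0]        # kept inside the North Bridge

fresh_ok = verify_block(key, new_ram, new_levels, root, 5)
# Replay: the attacker restores the old block AND the old tree nodes.
replay_ok = verify_block(key, old_ram, old_levels, root, 5)
print(fresh_ok, replay_ok)
```

The replayed block is internally consistent with its old MACs, but the recomputed path no longer matches the root, which the attacker cannot reach.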
Platform-oriented Security Architecture
Processor 1 (PE 1)
Processor Core Caches Crypto Engine encrypted data encrypted MAC
Processor n (PE n)
Processor Core Caches Crypto Engine
Cache-to-Cache
- send encrypted data first, followed by encrypted MAC
- receiver decrypts data and verifies integrity
Cache-to-Memory
- send encrypted data and MAC to Nbridge
- Nbridge decrypts the data, verifies its integrity, updates the MAC tree, and stores the encrypted data in the RAM
RAM
Crypto Engine
MAC Tree Cache
Memory-to-Cache
- Nbridge reads encrypted data and MAC from the RAM
- Nbridge decrypts the data, verifies its MAC, re-encrypts the data, and puts the encrypted data and MAC on the shared bus
- receiver decrypts data and verifies integrity
North Bridge (PE 0)
Platform-oriented Security Architecture
Physical memory (RAM) authentication
- MAC Tree
Protected data sharing
- Encryption using bus sequence number and process key
Authentication speculative execution (ASE)
Basics: Counter Mode Encryption
Sender
Init. Counter + 0 Secret Key Block Cipher or Cryptographic Hash Secret Key Block Cipher or Cryptographic Hash
Receiver
Init. Counter + 0
Pseudo-random pad
Pseudo-random pad
Plaintext A
XOR
Ciphertext A
XOR
Plaintext A
To send a data sequence securely:
- Sender and receiver share a secret key and an initial counter value.
- A pseudo-random pad is generated deterministically.
- The counter value does not need to be a secret.
Basics: Counter Mode Encryption
Sender
Init. Counter + 1 Secret Key Block Cipher or Cryptographic Hash Secret Key Block Cipher or Cryptographic Hash
Receiver
Init. Counter + 1
Pseudo-random pad
Pseudo-random pad
Plaintext B
XOR
Ciphertext B
XOR
Plaintext B
Counter values increment coherently for both parties in a predetermined sequence
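The two counter-mode slides can be condensed into a short sketch. It assumes HMAC-SHA256 as the pad generator (the slides allow either a block cipher or a cryptographic hash) and messages no longer than one 32-byte pad; `pad`, `xor`, and the key and counter values are hypothetical.

```python
import hashlib
import hmac


def pad(key: bytes, counter: int, length: int) -> bytes:
    """Pseudo-random pad derived deterministically from key and counter."""
    if length > 32:
        raise ValueError("this sketch produces one 32-byte pad per counter value")
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"shared secret key"   # known to sender and receiver only
init_counter = 1000          # shared, but need not be secret

plaintexts = [b"Plaintext A", b"Plaintext B"]

# Sender: message i is XORed with the pad for counter init_counter + i.
ciphertexts = [xor(p, pad(key, init_counter + i, len(p)))
               for i, p in enumerate(plaintexts)]

# Receiver: regenerates the same pads from the same counter sequence.
recovered = [xor(c, pad(key, init_counter + i, len(c)))
             for i, c in enumerate(ciphertexts)]
print(recovered)
```

Because each pad depends only on the key and the counter, either side can compute it before the data is available.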
How to Encrypt each Transaction?
256-bit Process Key Bus sequence number
Cryptographic Hash
One-Time-Pad (OTP)
Cache Line
Encrypted Data
OTP generation
- Bus sequence number
- Process Key
Bus sequence number
- a 64-bit secret
- initialized after the system is booted
- shared by all the parties connected to the shared bus
- incremented after each transaction
All PEs on the shared bus snoop each bus transaction
OTP can be pre-computed based on an approximate range of bus sequence numbers
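A rough sketch of per-transaction encryption, assuming HMAC-SHA256 as the cryptographic hash and a 32-byte cache line (neither is specified on the slide). The `PE` class and its methods are hypothetical. Each PE keeps its own copy of the bus sequence number and advances it whenever it snoops a transaction.

```python
import hashlib
import hmac
import os

LINE = 32  # cache line size in bytes; an assumption for this sketch


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class PE:
    """A processing element that tracks the bus sequence number by snooping."""

    def __init__(self, process_key: bytes, initial_seq: int):
        self.key = process_key
        self.seq = initial_seq

    def otp(self) -> bytes:
        # Keyed hash of the 64-bit bus sequence number; 32 bytes cover one line.
        return hmac.new(self.key, self.seq.to_bytes(8, "big"), hashlib.sha256).digest()

    def crypt(self, line: bytes) -> bytes:
        # XOR with the pad both encrypts and decrypts.
        return xor(line, self.otp())

    def snoop(self) -> None:
        # Every PE sees every transaction and increments its own copy.
        self.seq = (self.seq + 1) % 2**64


process_key = os.urandom(32)                         # 256-bit process key
initial_seq = int.from_bytes(os.urandom(8), "big")   # set up after boot
pes = [PE(process_key, initial_seq) for _ in range(3)]

line = b"A" * LINE
seen = []
for sender, receiver in [(0, 1), (2, 0)]:
    encrypted = pes[sender].crypt(line)
    seen.append((encrypted, pes[receiver].crypt(encrypted)))
    for pe in pes:
        pe.snoop()
```

The same cache line sent twice yields different ciphertexts, since each transaction consumes a new sequence number.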
Generating Process Key & Bus Sequence Number
Burned inside each PE
Secret Constant
By secure kernel
Process unique ID Secret Constant
Session Key
Encryption (AES)
Session Key
Encryption (AES)
Initiated every time the system boots
Process Key
Initial Bus Sequence Number
The bus sequence number works similarly to the counter in counter-mode encryption
Session Key Generation (Distribution)
Processor PE0 Processor PE1 Processor PE n-1
broadcast random num
receive random num from others
Secure Memory Controller PE n
Random Number PE0 Random Number PE1
Random Number PEn
Secret Hash Key
Hash (SHA256)
Burned inside each PE, same for each PE
During System Boot
128 bit Session Key
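One way to picture the boot-time exchange, as a sketch only: every PE broadcasts a random number, collects the others, and hashes the full set with the burned-in secret. The slide names SHA256 and a secret hash key but not how they are combined; HMAC-SHA256 over the concatenated random numbers, truncated to 128 bits, is an assumption made here, and `session_key` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


def session_key(secret_hash_key: bytes, random_numbers: list) -> bytes:
    """Derive the 128-bit session key from every PE's broadcast random number."""
    digest = hmac.new(secret_hash_key, b"".join(random_numbers),
                      hashlib.sha256).digest()
    return digest[:16]  # truncate to 128 bits (assumed)


SECRET_HASH_KEY = b"burned-in-secret"   # hypothetical; identical in every PE

# During boot each PE broadcasts one random number and receives the others.
broadcast = [secrets.token_bytes(16) for _ in range(4)]   # PE0 .. PE3

# Every PE runs the same computation over the same inputs.
keys = [session_key(SECRET_HASH_KEY, broadcast) for _ in range(4)]

# A later boot uses fresh random numbers and so yields a different key.
next_boot_key = session_key(SECRET_HASH_KEY,
                            [secrets.token_bytes(16) for _ in range(4)])
```

An observer who records the broadcast random numbers still lacks the burned-in secret and cannot reproduce the key.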
Protected Data Sharing Operations
Processor A Processor B
256-bit Process Key
Bus sequence number
256-bit Process Key
Bus sequence number
Cryptographic Hash
Cryptographic Hash
OTP (one-time-pad)
Data Block
Encrypted Data
OTP (one-time-pad)
Encrypted Data
Data Block
OTP Pre-computing
+1,+2, +3,
Process Key Latest Bus sequence number
Data to be transmitted
OTP queue
OTP(0x1234abcd0000) OTP Generation
OTP(0x1234abcd0001)
OTP(0x1234abcd0002) OTP(0x1234abcd001e) OTP(0x1234abcd001e) OTP(0x1234abcd001f) Bus Arbitration Logic
request for bus ownership
Shared Bus
Ownership granted, current bus sequence number = 0x1234abcd001e
- OTP generation is on the critical path
- We can pre-compute the OTPs needed in the neighborhood of the latest bus sequence number
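The OTP queue can be sketched as a sliding window of pads computed ahead of the latest observed bus sequence number. The window of 32 entries mirrors the 0x...0000 to 0x...001f range in the figure but is otherwise an assumption, as are HMAC-SHA256 as the hash and the names `OtpQueue`, `advance`, and `take`.

```python
import hashlib
import hmac

WINDOW = 32  # pads kept ahead of the latest sequence number (assumed)


def otp(key: bytes, seq: int) -> bytes:
    return hmac.new(key, seq.to_bytes(8, "big"), hashlib.sha256).digest()


class OtpQueue:
    def __init__(self, key: bytes, latest_seq: int):
        self.key = key
        self.pads = {}
        self.advance(latest_seq)

    def advance(self, latest_seq: int) -> None:
        """Drop stale pads; pre-compute latest_seq .. latest_seq + WINDOW - 1."""
        self.pads = {s: self.pads.get(s) or otp(self.key, s)
                     for s in range(latest_seq, latest_seq + WINDOW)}

    def take(self, seq: int):
        """Return (pad, hit); a miss computes the pad on the critical path."""
        pad = self.pads.get(seq)
        if pad is not None:
            return pad, True
        return otp(self.key, seq), False


key = b"process-key-placeholder"          # hypothetical process key
queue = OtpQueue(key, 0x1234abcd0000)

# Bus ownership is granted at sequence number 0x1234abcd001e.
pad, hit = queue.take(0x1234abcd001e)
far_pad, far_hit = queue.take(0x1234abcd0040)
print(hit, far_hit)
```

A hit means the XOR can start as soon as ownership is granted; only a sequence number outside the window pays for the hash.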
OTP Pre-Computing
Processor A Processor B
256-bit Process Key
Bus sequence number
256-bit Process Key
Bus sequence number
Cryptographic Hash
Cryptographic Hash
OTP (one-time-pad)
Data Block
Encrypted Data
OTP (one-time-pad)
Encrypted Data
Data Block
Split Transaction of Data and MAC
Sequence Authentication Buffer
ID MAC Valid Verified OTP
Processor A
Processor B
Processor C
Shared Bus: Data(id, seq), Data(id+1, seq+1), MAC(id-3, seq-3), Data(id+2, seq+2), MAC(id, seq)
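A small sketch of how a receiver might track data that arrives before its MAC. The entry fields loosely follow the ID / MAC / Valid / Verified columns of the buffer; the `SAB` class, its method names, and the use of HMAC-SHA256 over the sequence number and data are assumptions made for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass


def make_mac(key: bytes, seq: int, data: bytes) -> bytes:
    return hmac.new(key, seq.to_bytes(8, "big") + data, hashlib.sha256).digest()


@dataclass
class SabEntry:
    seq: int
    data: bytes
    mac_valid: bool = False   # the MAC has arrived
    verified: bool = False    # the MAC matched the data


class SAB:
    def __init__(self, key: bytes):
        self.key = key
        self.entries = {}

    def on_data(self, txn_id: int, seq: int, data: bytes) -> None:
        # Data is usable speculatively as soon as it arrives.
        self.entries[txn_id] = SabEntry(seq, data)

    def on_mac(self, txn_id: int, mac: bytes) -> bool:
        entry = self.entries[txn_id]
        entry.mac_valid = True
        entry.verified = hmac.compare_digest(
            make_mac(self.key, entry.seq, entry.data), mac)
        return entry.verified


key = b"mac-key-placeholder"   # hypothetical key
sab = SAB(key)

# Bus order: Data(10), Data(11), MAC(10), MAC(11); data 11 is altered in flight.
sab.on_data(10, 100, b"line ten")
sab.on_data(11, 101, b"line eleven (tampered)")
ok_10 = sab.on_mac(10, make_mac(key, 100, b"line ten"))
ok_11 = sab.on_mac(11, make_mac(key, 101, b"line eleven"))
print(ok_10, ok_11)
```

Between `on_data` and `on_mac` an entry is present but unverified, which is the window that authentication speculative execution exploits.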
Authentication Speculative Execution (ASE)
Performance side:
- allow execution to continue using un-verified data
- allow execution to continue using results derived from un-verified data
Security side:
- under counter mode, instructions and data may be altered by hackers
- authentication has to be performed in a timely fashion to prevent attacks that flip individual bits of encrypted data/instructions
- memory state should not be altered using results of un-verified data
- an instruction fetch should not be issued to memory if its address is determined by control flow that depends on un-verified data
ASE
[Figure: dataflow graph of the instruction sequence below. Loaded values (Load r3, Load r6) and the registers derived from them carry SAB tags; the Sequential Authentication Buffer (SAB) records for each entry whether its MAC has been fetched and whether it has been verified. Annotations in the figure: "Wait if Icache miss" and, before Save r7, "Wait until all the data sources are verified".]
0: r3 = (addr1)
1: r4 = r3*const1
2: r5 = r4+const2
3: r6 = (addr2)
4: if (r5<r6) {
5: } else {
6: r7 = r6 + r1}
7: (addr3) = r7
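The "compute speculatively, but hold stores" rule can be sketched as a small dependency tracker. This is an illustration only: the `Core` class, its methods, the tag numbers, and the constants are hypothetical, and the sketch follows data dependences alone, not the control-flow rule for instruction fetches.

```python
class Core:
    """Tracks which un-verified SAB entries each register value depends on."""

    def __init__(self):
        self.regs = {}         # register name -> value
        self.pending = {}      # register name -> set of SAB tags it depends on
        self.verified = set()  # SAB tags whose MAC has been checked

    def load(self, dst, value, sab_tag):
        # Loaded data is used immediately, before its MAC is verified.
        self.regs[dst] = value
        self.pending[dst] = {sab_tag}

    def alu(self, dst, fn, *srcs):
        # Results inherit the un-verified sources of all operands.
        self.regs[dst] = fn(*(self.regs[s] for s in srcs))
        self.pending[dst] = set().union(
            *(self.pending.get(s, set()) for s in srcs))

    def mac_verified(self, sab_tag):
        self.verified.add(sab_tag)

    def can_store(self, src):
        # Memory state may change only once every data source is verified.
        return self.pending.get(src, set()) <= self.verified


core = Core()
core.regs["r1"] = 5                              # already-trusted value
core.load("r3", 10, sab_tag=1)                   # 0: r3 = (addr1)
core.alu("r4", lambda a: a * 3, "r3")            # 1: r4 = r3*const1
core.alu("r5", lambda a: a + 2, "r4")            # 2: r5 = r4+const2
core.load("r6", 7, sab_tag=2)                    # 3: r6 = (addr2)
if not core.regs["r5"] < core.regs["r6"]:        # 4: if (r5<r6) ... else
    core.alu("r7", lambda a, b: a + b, "r6", "r1")   # 6: r7 = r6 + r1

before = core.can_store("r7")                    # 7: store must wait
core.mac_verified(2)
after = core.can_store("r7")
print(core.regs["r7"], before, after)
```

Execution runs ahead through instruction 6 on un-verified data, and only the final store waits for the MAC of the line that produced r6.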
Evaluation Methodology
RSIM MP simulator
Benchmarks: Splash, Splash2
- Modified the RSIM simulator to support bus-snoop-based cache coherence
- Added an accurate DRAM model
- Added shared memory support
- Implemented a North Bridge simulator with MAC tree authentication
- Extended the processor model to support performance simulation of the proposed protection, including speculative authentication
Non-Speculative (AIO) vs. ASE
[Figure: Authentication Performance (2P) and Authentication Performance (4P). Bar charts of normalized IPC (y-axis, 0 to 1.2) for AIO and ASE on fft, lu, radix, quicksort, water, mp3d, and the average.]
ASE outperforms in-order execution (AIO) by 80% on both 2-processor (2P) and 4-processor (4P) systems.
Data Confidentiality
[Figure: Performance of Protection on Confidentiality (4P). Bar chart of normalized IPC (y-axis, 0 to 1) for fft, lu, radix, quicksort, water, mp3d, and the average, with no cache, an 8KB seq# cache, and a 32KB seq# cache.]
- 40 to 55% performance loss compared to no security support
- The more cache-to-cache transactions, the faster the execution, thanks to OTP pre-computation
- With a sequence number cache, memory-to-cache operations can be accelerated by ~30%
Conclusions
- Proposed a security scheme to protect confidentiality and integrity for shared memory in a snoop-bus multiprocessor system.
- Proposed a number of techniques to minimize the overhead caused by security protection, including:
  - Physical memory (RAM) authentication
  - Shared bus sequence number based encryption
  - Split transmission of data and MAC
  - Authentication Speculative Execution without violating the rules of authentication safety
- Lightweight secure processor design with novel security design features (offload to North Bridge).
Questions & Answers & Entertaining
That's All Folks!