High Efficiency Counter Mode Security Architecture Via Prediction and Pre-Computation

High Efficiency Counter Mode Security Architecture via Prediction and Precomputation proposes using counter value prediction to pre-compute decryption pads and reduce memory decryption latency. Prediction exploits spatial and temporal locality in memory access counters. Techniques include two-level prediction to increase depth, context-based prediction using previous access patterns, and resetting page root counters for frequently updated data. Evaluation shows prediction achieves 99% hit rates and near-native performance for many workloads.

High Efficiency Counter Mode Security Architecture via Prediction and Precomputation

Weidong Shi, Hsien-Hsin (Sean) Lee, Mrinmoy Ghosh, Chenghuai Lu, Alexandra Boldyreva
School of Electrical and Computer Engineering, Georgia Institute of Technology

Content

Motivation
Related Work
Counter/Decryption Pad Prediction
Profile Prediction Failures
2-Level Prediction
Context Based Prediction
Conclusions

Why Encrypt System Memory?

Protect sensitive data stored in RAM (many simple devices can bypass OS memory protection and access physical memory directly).

Digital Rights Management (the industry has gradually added encryption to each platform component: encrypted PCI-E, encrypted disk, encrypted flash memory, and now a move toward encrypted RAM).

Anti-reverse engineering (most software licenses require users not to reverse engineer the software, and count on users keeping that promise).

Military (a customer of encrypted FPGA chips, with a large amount of embedded military software).

Program randomization (intrusion prevention, CCS 2003).

Different Solutions

[Diagram labels: Micro Controller, Crypto Engine, Flash]

Create a small secure world with limited application scenarios (code signing, BIOS signature verification).

[Diagram labels: Processor Core, Cache, Crypto Engine]

SoC. Memory is on-chip. Applies to limited platforms such as small embedded systems (cell phones).

Configurable system RAM encryption. More usage models.

Related Work

Dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (Micro 2003).

Prefetch-based memory pre-decryption (WASSA 2004).

Prediction-based memory decryption (this paper). Fully exploits the pre-computation capability enabled by counter mode encryption. Uses otherwise idle crypto engine pipeline stages for prediction and pre-computation. Less area overhead than caching and less memory pressure than prefetch-based pre-decryption.

Counter Mode - Encryption

[Figure: counter mode encryption datapath. Labels: Processor Core, Cache Lines, Crypto Engine, Key, VAddr, Counter / Counter+1 / Counter+2, AES Block Cipher, Encryption pad, 16B Cache Line, Encrypted 16B.]

Each memory line has its own counter. Each time a memory line is updated, its counter is incremented.

Counter Mode - Decryption

[Figure: counter mode decryption datapath. Labels: Processor Core, Cache Lines, Crypto Engine, Key, VAddr, Counter+2, AES Block Cipher, Encryption pad, 16B Cache Line, Encrypted 16B.]

The counter has to be fetched for a memory line that misses in L2.
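The two counter mode slides can be illustrated with a minimal sketch. It is not the authors' hardware design: `make_pad`, `write_back`, and `read_line` are hypothetical names, and SHA-256 stands in for the AES block cipher only so that the example runs with the Python standard library. The point it shows is that the pad depends only on the key, the address, and the counter, so it can be computed before the encrypted line arrives.

```python
import hashlib

def make_pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """Stand-in for the AES block cipher (assumption: SHA-256 is used here
    only to keep the sketch dependency-free). Returns a 16-byte pad."""
    seed = key + vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hashlib.sha256(seed).digest()[:16]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_back(key: bytes, vaddr: int, counter: int, plaintext: bytes):
    """Encrypt a 16B line on write-back. The line's counter is incremented
    on every update, so each version of the line gets a fresh pad."""
    new_counter = (counter + 1) % (1 << 64)
    ciphertext = xor_bytes(plaintext, make_pad(key, vaddr, new_counter))
    return new_counter, ciphertext

def read_line(key: bytes, vaddr: int, counter: int, ciphertext: bytes) -> bytes:
    """Decrypt a 16B line once its counter is known."""
    return xor_bytes(ciphertext, make_pad(key, vaddr, counter))
```

Because decryption is a single XOR once the pad exists, the latency that matters is the time to learn the counter and run the cipher, which is what the prediction scheme targets.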

Counter Prediction
Counters exhibit both spatial and temporal coherence. To exploit spatial coherence, memory blocks from the same page start counting from the same initial value (the page root counter).

Page Base Addr    Page Root Counter (64 bits)
0x0000ff00        0xabcddcba12344321
...               ...

Counters of the memory lines in the page:
0xabcddcba123443f1
0xabcddcba12344e0a
...
0xabcddcba12344325
0xabcddcba12344321

Lines fall into three groups: frequently updated data, infrequently updated data, and static data.
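As a rough illustration of the page root counter idea, the sketch below checks which of the example line counters on this slide fall inside a small look-ahead window above the root. The function names are hypothetical, and treating a depth of d as the d+1 values root, root+1, ..., root+d is an assumption, since the slides do not define the window boundaries exactly.

```python
MASK64 = (1 << 64) - 1  # counters are 64 bits wide

def predicted_counters(page_root: int, depth: int) -> list[int]:
    """Candidate counter values: the root plus a small look-ahead window.
    Assumption: a depth of d covers root .. root+d inclusive."""
    return [(page_root + i) & MASK64 for i in range(depth + 1)]

def is_predictable(page_root: int, line_counter: int, depth: int) -> bool:
    """True if the line's counter lies inside the look-ahead window."""
    return ((line_counter - page_root) & MASK64) <= depth

root = 0xabcddcba12344321
for counter in (0xabcddcba123443f1, 0xabcddcba12344e0a,
                0xabcddcba12344325, 0xabcddcba12344321):
    offset = (counter - root) & MASK64
    print(hex(counter), "offset", offset, "hit", is_predictable(root, counter, 5))
```

With a depth of 5, the two counters close to the root (offsets 0 and 4) are covered, while the two lines that have been updated many times are not.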

Use Free Idle Pipeline Stages for Prediction

[Figure: timeline. Memory pipeline: retrieving the counter value and the encrypted line. AES pipeline: idle during the fetch, then generates the decryption pad, which yields the decrypted line.]

Unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles when data misses in L2.

Use Free Idle Pipeline Stages for Prediction

Use the idle pipeline stages to generate decryption pads from predicted counter values (a small window of look-ahead counter values based on the page root counter).

[Figure: timeline. Memory pipeline: retrieving the counter value and the encrypted line. AES pipeline: speculated decryption pads are generated during the fetch, starting with the first speculated pad; E(K,G4) is marked as the correct guess, and the decrypted line is available as soon as the fetch completes.]
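The speculation step can be sketched in software as follows. This is an illustration under assumptions, not the hardware pipeline: the function names are hypothetical, SHA-256 stands in for AES, and the window is taken to be root .. root+depth. Pads are computed for every guessed counter while the fetch is outstanding; when the real counter arrives, a matching pad means decryption needs only an XOR.

```python
import hashlib

MASK64 = (1 << 64) - 1

def make_pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """Stand-in for the AES block cipher (assumption, stdlib only)."""
    seed = key + vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hashlib.sha256(seed).digest()[:16]

def speculate_pads(key: bytes, vaddr: int, page_root: int, depth: int) -> dict:
    """Pre-compute pads for each guessed counter while memory is being read."""
    guesses = [(page_root + i) & MASK64 for i in range(depth + 1)]
    return {g: make_pad(key, vaddr, g) for g in guesses}

def decrypt_on_arrival(key, vaddr, fetched_counter, ciphertext, speculated):
    """Use a speculated pad if one matches; otherwise compute the pad late."""
    pad = speculated.get(fetched_counter)
    hit = pad is not None
    if not hit:
        pad = make_pad(key, vaddr, fetched_counter)  # full cipher latency is exposed
    plaintext = bytes(c ^ p for c, p in zip(ciphertext, pad))
    return plaintext, hit
```

A miss costs nothing in correctness, only latency: the pad is simply computed after the counter arrives, as it would be without prediction.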

Handle Frequent Updates

Each TLB entry is extended with a page root counter and a prediction history vector:

Page Base Addr    Page Root Counter (64 bits)    Prediction History Vector (16 bits)
0x0000ff00        0xabcddcba12344321             ...
...               ...                            ...

The prediction history vector is a shift register in the TLB, for example:

0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0

The counter value prediction logic shifts in one bit per prediction (miss = 1, hit = 0).

If total(miss) > threshold, reset the corresponding page root counter to a new number.

This gives window-based dynamic tracking of the prediction rate for each page. For frequently updated memory blocks, the root counter is reset according to the prediction history vector. All future write-backs count from the new number.
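The history-vector mechanism can be sketched as a small class. The names are hypothetical, the threshold of 8 is an assumed value (the slides do not give one), the new root is passed in by the caller because the slides do not say how it is chosen, and clearing the history after a reset is also an assumption.

```python
class PageEntry:
    """Per-page state kept alongside the TLB entry (illustrative sketch)."""
    WINDOW_BITS = 16  # prediction history window from the slides

    def __init__(self, root_counter: int, threshold: int = 8):
        self.root_counter = root_counter
        self.threshold = threshold  # assumed value, not given in the slides
        self.history = 0            # shift register, miss = 1, hit = 0

    def miss_count(self) -> int:
        """Number of misses currently in the history window."""
        return bin(self.history).count("1")

    def record(self, miss: bool, new_root: int) -> bool:
        """Shift in one prediction outcome; reset the root if misses exceed
        the threshold. Returns True when a reset happened."""
        mask = (1 << self.WINDOW_BITS) - 1
        self.history = ((self.history << 1) | int(miss)) & mask
        if self.miss_count() > self.threshold:
            self.root_counter = new_root  # future write-backs count from here
            self.history = 0              # assumption: start a fresh window
            return True
        return False
```

Feeding the example vector from the slide (four misses in sixteen predictions) leaves the root untouched under this threshold; a sustained run of misses triggers the reset.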

Experiment Setup

Parameter                    Value
L1 I/D Cache                 DM, 8KB
L2 Cache                     4-way, unified, 256KB/1MB
Memory Bus                   200MHz, 8B wide
CPU Clock                    1GHz
AES Latency (256-bit)        64 pipeline stages in total, 1ns each
Prediction Depth             5
Prediction History Window    16 bits

SimpleScalar 3.0.
SPEC2000 INT/FP benchmarks with high L2 miss rates.
Prediction hit rate study (8 billion instructions).
IPC performance (400 million instructions on a representative window).

Prediction Rate

[Figure: two bar charts, "Prediction Hit Rate (256K L2)" and "Prediction Hit Rate (1M L2)", y-axis from 0 to 1.2. Benchmarks: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, and the average. Series: 128K_Counter_#_Cache, 512K_Counter_#_Cache, Pred.]

Prediction hit rate over 8 billion instructions.
No counter number cache is used with prediction.
Prediction depth = 5.
Average prediction hit rate is about 82-83%.
IPC

[Figure: two bar charts, "Normalized IPC (256K L2)" and "Normalized IPC (1M L2)", y-axis from 0 to 1.2. Benchmarks: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, and the average. Series: Counter_Cache_4K, Counter_Cache_128K, Counter_Cache_512K, Pred.]

IPC is normalized to the scenario without decryption.
In general, prediction outperforms the 128K counter cache.
On average, prediction is on par with the 512K counter cache.
Prediction Miss

Reasons for prediction misses:
Prediction depth is too small.
Reset of the page root counter. Memory lines whose counter values are based on the old page root counter cannot be predicted correctly using the new page root counter.

Solutions (details in the next few slides):
Two-level prediction (divide the prediction depth into sub-ranges, increasing the effective prediction depth without adding more predictions).
Page root counter history memorization (predict using both the current page root counter and the previous root counter; this gives only marginal improvement).
Context-based prediction (exploit the temporal coherence of accessing memory locations with coherent update frequency).

Two-level Prediction

[Figure: counter numbers in natural order, divided into four consecutive prediction windows labeled 10, 01, 00, 11.]

Divide the prediction window into ranges (a power of 2).
With 2 bits per line, the prediction depth is effectively quadrupled.
Overhead is about 2KB of on-chip memory for a 64-entry TLB.
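One way to read the two-level scheme is sketched below, with hypothetical function names. The assumption is that each line stores a 2-bit range selector recorded at write-back, and that prediction then tries only the counters inside the selected range, so the number of pads per prediction stays the same while the covered span grows fourfold. Counter wraparound is ignored for brevity. The last function reproduces the overhead estimate, taking 128 lines per page from the later "Why Does It Work?" slide.

```python
RANGE_BITS = 2                # per-line selector bits, from the slide
NUM_RANGES = 1 << RANGE_BITS  # four sub-ranges

def range_id_for(counter: int, page_root: int, window: int):
    """Selector stored with the line at write-back (assumed policy).
    Returns None if the counter lies outside all four ranges."""
    offset = counter - page_root
    if 0 <= offset < NUM_RANGES * window:
        return offset // window
    return None

def two_level_candidates(page_root: int, range_id: int, window: int) -> list[int]:
    """Counters to speculate on: only the window chosen by the selector."""
    base = page_root + range_id * window
    return [base + i for i in range(window)]

def range_bits_overhead_bytes(tlb_entries: int, lines_per_page: int) -> int:
    """On-chip storage for the selectors of all pages mapped by the TLB."""
    return tlb_entries * lines_per_page * RANGE_BITS // 8

print(range_bits_overhead_bytes(64, 128))  # 2048 bytes, i.e. about 2KB
```

Under these assumptions, 64 entries x 128 lines x 2 bits = 16384 bits = 2048 bytes, which matches the "about 2KB" figure on the slide.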

Context Based Prediction

[Figure: counter numbers in natural order, with a single prediction window.]

Store the depth value of the previous line's counter number in a global register (the context register).
Generate new predictions based on the page root counter and the value in the context register.
Can be combined with regular and two-level predictions.
Feed all the predictions into the decryption pipeline.

Why Does It Work?

Memory page (128 lines), accessed by the following loop:

    while (1) {
        for all lines of the page
            write to the line;
        for all lines of the page
            read the line;
    }

Prediction miss rate of memory reads:
Regular prediction (prediction depth = 4): 20% (for each line, 1 miss in every 5 reads)
Context-based prediction: 0.1% (1 miss for every 128*5 reads)
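The intuition behind the loop example can be sketched with a simplified model. It does not reproduce the exact percentages on the slide, since those depend on reset behavior that the slides do not spell out. The assumptions here: the regular window covers root .. root+depth, the context register holds the offset of the previously read line's counter from the page root, and context-based prediction adds exactly one extra candidate, root + context. All names are hypothetical.

```python
LINES_PER_PAGE = 128
DEPTH = 4

def regular_candidates(root: int, depth: int = DEPTH) -> set[int]:
    """Regular prediction: root .. root+depth (assumed window)."""
    return {root + i for i in range(depth + 1)}

class ContextPredictor:
    """Global context register holding the previous line's counter offset."""
    def __init__(self):
        self.context_offset = 0

    def candidates(self, root: int, depth: int = DEPTH) -> set[int]:
        # Regular window plus one guess derived from the context register.
        return regular_candidates(root, depth) | {root + self.context_offset}

    def observe(self, root: int, actual_counter: int) -> None:
        self.context_offset = actual_counter - root

def count_misses(counters, root: int, use_context: bool) -> int:
    """Count prediction misses over one read pass across a page."""
    predictor = ContextPredictor()
    misses = 0
    for actual in counters:
        guesses = predictor.candidates(root) if use_context else regular_candidates(root)
        if actual not in guesses:
            misses += 1
        predictor.observe(root, actual)
    return misses

# After nine write passes every line sits at root+9, outside the regular window.
root = 1000
page = [root + 9] * LINES_PER_PAGE
print(count_misses(page, root, use_context=False))  # 128
print(count_misses(page, root, use_context=True))   # 1
```

Because the loop updates all lines of the page at the same rate, neighboring lines share the same offset. Regular prediction misses on every line once that offset leaves the window, while the context register turns all but the first of those misses into hits.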

Prediction Rate

[Figure: two bar charts, "Prediction Hit Rate (256K L2)" and "Prediction Hit Rate (1M L2)", y-axis from 0 to 1.2. Benchmarks: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, and the average. Series: Regular_Pred, Two-level_Pred, Context + Regular_Pred.]

8 billion instruction window.
Two-level prediction: about 93% prediction hit rate.
Context-based + regular prediction: almost 99% prediction hit rate.

IPC

[Figure: two bar charts, "Normalized IPC (256K L2)" and "Normalized IPC (1M L2)", y-axis from 0 to 1.2. Benchmarks: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, and the average. Series: Regular_Pred, 2level_Pred, Context + Regular_Pred.]

IPC is normalized to the scenario with no decryption.
1-3% loss of performance using the best prediction.

Conclusions

Counter value prediction allows pads to be pre-computed speculatively without counter value caching.

Spatial and temporal coherence of memory update frequency enables effective counter value prediction.

Prediction uses the idle cycles of the pipelined decryption engine.

Counter prediction achieves better performance than some of the large cache settings.

Prediction is complementary to caching techniques.

Questions
