Fully Associative
A memory block can go in any cache slot (many-to-many).
[Figure: memory blocks at decimal addresses 0, 4, 8, ..., 36 can each go to any of the 4 cache slots (slots 0-3).]
Slot selection:
- Check all tags (preferably simultaneously).
- Take a slot with V = 0 (a free slot).
- Otherwise select a slot according to some replacement strategy.
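To make the selection rule concrete, here is a minimal software sketch of a fully associative lookup and placement (the class and method names are ours, not from the slides; real hardware compares all tags in parallel):

class FullyAssociativeCache:
    def __init__(self, num_slots):
        # every slot holds a valid bit V and a tag
        self.slots = [{"valid": False, "tag": None} for _ in range(num_slots)]

    def lookup(self, tag):
        """Check all tags; return the slot index on a hit, None on a miss."""
        for i, slot in enumerate(self.slots):
            if slot["valid"] and slot["tag"] == tag:
                return i
        return None

    def place(self, tag, evict):
        """Prefer a free slot (V = 0); otherwise let the replacement
        strategy 'evict' (e.g. random or LRU) pick a victim slot."""
        for i, slot in enumerate(self.slots):
            if not slot["valid"]:
                self.slots[i] = {"valid": True, "tag": tag}
                return i
        victim = evict(self.slots)
        self.slots[victim] = {"valid": True, "tag": tag}
        return victim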
Set Associative
A memory block goes into a set, and can be placed anywhere within the set (many-to-some).
[Figure: 2-way set associative cache with two sets (0 and 1) of two slots each; memory blocks at decimal addresses 0, 4, 8, ..., 36 map to one of the two sets.]
Slot selection:
- Determine the set from the block address.
- In this set, take a free slot ...
- ... or evict a slot according to some replacement strategy.
Set Associative
Set = ((memory address) DIV (block size)) MOD (number of sets in cache)

Example: In which set does the block located at address 12 (decimal) go?
12 DIV 4 = 3 (block address)
3 MOD 2 = 1 (set 1)
Which slot within the set the block finally goes to depends on occupancy and the replacement strategy.
As with direct mapping, the low-order bits of the block address determine the destination set.
Block address fields: the m-bit block address is subdivided into a tag (the upper m - n bits) and a set index (the lower n bits); the byte offset within the block follows.
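The set formula and the example above can be checked directly; a minimal sketch (function and parameter names are ours):

def set_index(address, block_size, num_sets):
    """Set = ((memory address) DIV (block size)) MOD (number of sets)."""
    block_address = address // block_size
    return block_address % num_sets

# The example from the slide: address 12, block size 4, 2 sets.
assert set_index(12, block_size=4, num_sets=2) == 1   # block 3 -> set 1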
Set Associative
N-way set associative cache
N = number of slots per set, not the number of sets. N is a power of 2; common values are 2, 4, 8.

Extremes:
N = 1: There is only one slot per set, that is, each slot is a set. The set number (and thus the slot) is drawn from the block address. This is the direct mapped cache.
N = B: There is only one set, containing all slots (B = number of blocks in cache = number of slots). This is the fully associative cache.
Set Associative
AMD Opteron Cache
Two-way set associative
Figure from [HP06 p. C-13]
Set Associative
Opteron cache
Cache capacity: 64 kB in 64-byte blocks (1024 slots). The cache is two-way set associative: 512 sets of 2 cache lines each. Hardware: two arrays of 512 cache lines each, that is, each set has one cache line in array 1 and one in array 2. The physical address is 40 bits, divided into a 34-bit block address (subdivided into 25 tag bits and 9 index bits) and a 6-bit byte offset (step 1 in the figure).
Figure: The index selects the set (2^9 = 512), step 2. The two tags of the set are compared against the tag bits; the valid bit must be set for a hit (step 3). On a hit, the corresponding data is delivered from the winning input of a 2:1 multiplexer (step 4) and goes to the Data in port of the CPU. The victim buffer is needed when a cache line has to be written back to main memory (replacement).
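The 25/9/6 field split stated above can be mirrored with a little bit arithmetic; a sketch (the function name and the test address are ours):

def split_opteron_address(paddr):
    """Split a 40-bit physical address into (tag, index, offset)
    for the 64 kB two-way set associative cache with 64-byte blocks:
    25 tag bits | 9 index bits | 6 offset bits."""
    offset = paddr & 0x3F            # low 6 bits: byte within the 64-byte block
    index = (paddr >> 6) & 0x1FF     # next 9 bits: one of 512 sets
    tag = paddr >> 15                # remaining 25 bits
    return tag, index, offset

tag, index, offset = split_opteron_address(0x12_3456_789A)
print(f"tag={tag:#x} index={index} offset={offset}")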
Cache Organization
Where can block 12 go?
[Figure from [HP06 p. C-7]: where can memory block 12 go? Shown for a fully associative, a direct mapped, and a two-way set associative cache of 8 slots.]
Cache Organization
For the previous figure, assume blocks 12 and 20 are used very often. What is the problem?
Fully associative: No special problem. Both blocks can be stored in the cache at the same time.
Direct mapped: Problem! Only one of them can be stored at a time, since both map to the same slot: 12 mod 8 = 20 mod 8 = 4.
Set associative: No special problem. Both blocks can be stored in the same set at the same time.
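A tiny check of the mapping arithmetic, assuming the 8-slot cache of the figure and, for the set associative case, 2 ways (variable names are ours):

slots, ways = 8, 2
sets = slots // ways
for block in (12, 20):
    print(f"block {block}: direct mapped -> slot {block % slots}, "
          f"2-way set associative -> set {block % sets}")
# Both blocks land in slot 4 when direct mapped (conflict), but they
# share set 0, which has two slots, when 2-way set associative (no conflict).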
Cache Organization
Direct Mapped
- Hard allocation (no choice)
- Simple & inexpensive
- No replacement strategy required
- If a process uses two blocks mapping to the same slot, cache misses are high.

Fully Associative
- Full choice
- Expensive searching (hardware) for a free slot
- Replacement strategy required

Set Associative
- Compromise between direct mapped and fully associative
- Some choice
- Replacement strategy required
Replacement Strategies
Strategies for selecting a slot to evict (when necessary)
Random
Victim cache lines are selected randomly: a hardware pseudo-random number generator produces slot numbers.
Least-Recently Used (LRU)
Relies on the temporal locality principle: the least recently used block is hoped to have the smallest likelihood of (re)use. Expensive hardware (a software sketch follows after this list).
First-in, First out (FIFO)
Approximation of LRU by selecting the oldest block (oldest = load time).
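A minimal software model of LRU replacement (illustrative only; real caches implement LRU, or an approximation of it, in hardware):

from collections import OrderedDict

class LRUCache:
    """Software model of an LRU-replaced fully associative cache."""
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.blocks = OrderedDict()          # block address -> data, oldest first

    def access(self, block):
        if block in self.blocks:             # hit: mark block most recently used
            self.blocks.move_to_end(block)
            return "hit"
        if len(self.blocks) >= self.num_slots:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = None            # load the missed block
        return "miss"

cache = LRUCache(2)
print([cache.access(b) for b in [12, 20, 12, 4, 20]])
# ['miss', 'miss', 'hit', 'miss', 'miss'] -- 20 was evicted when 4 was loaded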
Replacement Strategies
Table: data cache misses per 1000 instructions for three replacement strategies and three cache capacities (data collected for the Alpha architecture, block size = 64 bytes; data from [HP06 p. C-10]):

Associativity  Strategy  16 kB   64 kB   256 kB
Two-way        LRU       114.1   103.4   92.2
Two-way        Random    117.3   104.3   92.1
Two-way        FIFO      115.5   103.9   92.5
Four-way       LRU       111.7   102.4   92.1
Four-way       Random    115.1   102.3   92.1
Four-way       FIFO      113.3   103.1   92.5
Eight-way      LRU       109.0    99.7   92.1
Eight-way      Random    111.8   100.5   92.1
Eight-way      FIFO      110.4   100.3   92.5

LRU is best for small caches; there is little difference between the strategies for large caches.
Miss Categories
Compulsory Misses
The very first access to a block can never hit in the cache, so the block must be loaded first. Also called cold-start misses.
Capacity Misses
Owing to the limited capacity of the cache, capacity misses will occur in addition to compulsory misses.
Conflict Misses
In set associative or direct mapped caches, too many blocks may map to the same set (or slot). Also called collision misses.
Coherency Misses
These are owing to cache flushes that keep multiple caches coherent in a multiprocessor. Not considered in this lecture (see the lecture Advanced Computer Architecture).
Cache Optimization
Average memory access time = hit time + miss rate × miss penalty
Reducing miss rate: larger block size, larger cache capacity, higher associativity.
Reducing miss penalty: multi-level caches, read over write.
Reducing hit time: avoiding address translation.
Block Size
The data area gets larger cache lines (but fewer of them); the overall cache capacity remains the same.
+ Reduced miss rate: taking advantage of spatial locality, more accesses will likely go to the same block.
- Increased miss penalty: more bytes have to be fetched from main memory.
- Increased conflict misses: the cache has fewer slots (per set).
- Increased capacity misses: only for small caches. In case of high locality (e.g. repeated accesses to only one byte in a block) the remaining bytes are unused and waste cache capacity.
Common block sizes are 32 ... 128 bytes
Block Size
[Figure: miss rate versus block size, one curve per cache capacity; from HP06 p. C-26.]
Block Size
For the previous figure, assume the memory system takes 80 clock cycles of overhead and then delivers 16 bytes every 2 clock cycles. Assume the hit time to be 1 clock cycle, independent of block size. Which block size has the smallest average memory access time?

Average memory access time = hit time + miss rate × miss penalty

4K cache, 16-byte block: average memory access time = 1 + (8.57 % × 82) = 8.027 clock cycles
4K cache, 32-byte block: average memory access time = 1 + (7.24 % × 84) = 7.082 clock cycles

... and so on for all cache sizes and block sizes (a sketch that reproduces these numbers follows below).
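A sketch of the calculation (function names are ours; the two miss rates are the ones read off the previous figure):

def miss_penalty(block_size):
    """80 cycles overhead, then 16 bytes per 2 cycles (from the example above)."""
    return 80 + 2 * (block_size // 16)

def amat(miss_rate, block_size, hit_time=1):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty(block_size)

# The two cases worked out above:
print(miss_penalty(16), amat(0.0857, 16))   # 82, ~8.027
print(miss_penalty(32), amat(0.0724, 32))   # 84, ~7.082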
Block Size
Average memory access time (in clock cycles) versus block size for four cache capacities; the best (smallest) time per cache capacity is marked green in the original:

Block size  Miss penalty   4K      16K    64K    256K
16          82             8.027   4.231  2.673  1.894
32          84             7.082   3.411  2.134  1.588
64          88             7.160   3.323  1.933  1.449
128         96             8.469   3.659  1.979  1.470
256         112            11.651  4.685  2.288  1.549
Cache Capacity
The cache is enlarged by adding more cache slots.
+ Reduced miss rate, owing to fewer capacity misses.
- Potentially increased hit time, owing to increased complexity.
- Increased hardware & power consumption.
Miss rates for block size 64 bytes (in parentheses: miss rate relative to the next smaller cache):

Cache capacity  Miss rate
4K              7.00 %
16K             2.64 %  (38 %)
64K             1.06 %  (40 %)
256K            0.51 %  (48 %)
Associativity
The higher the associativity, the more slots per set.

+ Reduced miss rate, primarily owing to fewer conflict misses.
- Increased hit time: time is needed for searching the slots of the set.
Rules of Thumb
Eight-way set associative is almost as effective as fully associative. A direct mapped cache with capacity N has about the same miss rate as a two-way set associative cache of capacity N/2.
Common associativities are 1 (direct mapped), 2, 4, 8
Associativity
Miss rate [%] versus degree of associativity for six cache capacities (data from [HP06 p. C-23]):

Cache   1-way  2-way  4-way  8-way
4K      9.8    7.6    7.1    7.1
8K      6.8    4.9    4.4    4.4
16K     4.9    4.1    4.1    4.1
64K     3.7    3.1    3.0    2.9
128K    2.1    1.9    1.9    1.9
512K    0.8    0.7    0.6    0.6
Multi-Level Caches
Building a cache hierarchy.
First-Level Cache (L1): small, high-speed cache, usually located in the CPU.
Second-Level Cache (L2): fast and bigger cache, located close to the CPU (chip set).
Third-Level Cache (L3), optional: separate memory chip between L2 and main memory.
CPU → L1 → L2 → L3 → main memory
Multi-Level Caches
Multi-level caches reduce the average miss penalty because, on a miss, the block can be fetched from the next higher level instead of from main memory.
Distinction between local and global miss rates:

Local miss rate = (number of cache misses) / (number of accesses to this cache), local to a cache (e.g. L1, L2, ...).

Global miss rate = (number of cache misses) / (number of memory references by the CPU), that is, local misses versus global references.
Multi-Level Caches
Example CH.1: Suppose that in 1000 memory references there are 40 misses in L1 and 20 misses in L2. What are the various miss rates?

Local miss rate L1 = global miss rate L1 = 40 / 1000 = 4 %. These 4 % go from L1 to L2.
Local miss rate L2 = 20 / 40 = 50 %.
Global miss rate L2 = 20 / 1000 = 2 %. These 2 % go from L2 to main memory.

The local miss rate of L2 is large because L1 skims the cream of the memory accesses.
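The same bookkeeping in a few lines (variable names are ours):

references = 1000          # memory references by the CPU
l1_misses = 40             # misses in L1 (these accesses go on to L2)
l2_misses = 20             # misses in L2 (these go on to main memory)

local_l1 = l1_misses / references       # L1 is accessed by every reference
local_l2 = l2_misses / l1_misses        # L2 only sees L1's misses
global_l2 = l2_misses / references

print(f"L1 local = global: {local_l1:.0%}")   # 4%
print(f"L2 local: {local_l2:.0%}")            # 50%
print(f"L2 global: {global_l2:.0%}")          # 2%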
Multi-Level Caches
Average memory access time = hit time_L1 + miss rate_L1 × miss penalty_L1

with miss penalty_L1 = hit time_L2 + local miss rate_L2 × miss penalty_L2, hence

Average memory access time = hit time_L1 + miss rate_L1 × (hit time_L2 + local miss rate_L2 × miss penalty_L2)
Multi-Level Caches
Using the miss rates from example CH.1 and the following data: hit time_L1 = 1 clock cycle, hit time_L2 = 10 clock cycles, miss penalty_L2 = 200 clock cycles, the average memory access time is

hit time_L1 + miss rate_L1 × (hit time_L2 + local miss rate_L2 × miss penalty_L2) = 1 + 0.04 × (10 + 0.5 × 200) = 5.4 clock cycles
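A quick cross-check of this calculation (function and parameter names are ours):

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, local_miss_rate_l2, penalty_l2):
    """AMAT = hit_L1 + miss_rate_L1 * (hit_L2 + local_miss_rate_L2 * penalty_L2)."""
    return hit_l1 + miss_rate_l1 * (hit_l2 + local_miss_rate_l2 * penalty_l2)

print(amat_two_level(1, 0.04, 10, 0.5, 200))   # 5.4 clock cycles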
Read over Write
Assume a direct mapped write-through cache with 512 slots, and a four-word write buffer that is not checked on a read miss.
SW R3, 512(R0)    ; store word: mem[512] := R3    (cache slot 0)
LW R1, 1024(R0)   ; load word:  R1 := mem[1024]   (cache slot 0)
LW R2, 512(R0)    ; load word:  R2 := mem[512]    (cache slot 0)

Read-after-write hazard: (1) the data in R3 is placed in the write buffer; (2) the first load causes a read miss, and the cache line is discarded; (3) the second load again causes a read miss. If the write buffer has not completed writing R3 to memory by then, the second load will read an incorrect value from mem[512].

[Figure: CPU, cache, write buffer, and memory.]
Read over Write
Solutions to the previous problem, giving reads priority over writes:

- Read misses wait until the write buffer is empty; only then is the required memory block fetched into the cache.
- Check the contents of the write buffer: if the referenced data is not in the buffer, let the read access continue and fetch the block into the cache. The write buffer is flushed later, when the memory system is available (a sketch of this check follows below).

Also applicable to write-back caches: the dirty block is put into a write buffer that allows inspection in case of a read miss, and read misses check the buffer before going directly to memory.
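A toy model of the second solution, inspecting the write buffer on a read miss (all names are ours; real hardware does this with comparators, not a dictionary):

class WriteBuffer:
    """Toy model of a write buffer that can be inspected on read misses."""
    def __init__(self):
        self.pending = {}                  # address -> value awaiting write-back

    def enqueue(self, address, value):
        self.pending[address] = value

    def read_check(self, address):
        """On a read miss, forward pending data instead of reading stale memory."""
        return self.pending.get(address)

buf = WriteBuffer()
buf.enqueue(512, 0xDEAD)                   # SW R3, 512(R0) buffered, not yet in memory
value = buf.read_check(512)                # LW R2, 512(R0) misses in the cache
print(hex(value) if value is not None else "safe to read memory")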
Address Translation
What addresses are cached, virtual or physical addresses?
A fully virtual cache uses logical (virtual) addresses only; a fully physical cache uses physical addresses only.
[Figure: in a virtual cache, the CPU's virtual address goes directly to the cache; only on a miss is it translated (segment tables / page tables / TLB) into a physical address for memory. In a physical cache, the virtual address is first translated, and the physical address is used both to access the cache and, on a miss, memory.]
Address Translation
Fully virtual cache:
+ No address translation time on a hit.
- The cache must hold copies of the protection information (protection info must otherwise be fetched from the page/segment tables).
- Cache flush on process switch: the same virtual address usually refers to different physical addresses in different processes.
- Shared memory: different virtual addresses refer to the same physical address, leading to copies of the same data in the cache.

Fully physical cache:
+ Behaves very well on shared memory accesses.
- Address translation is always needed (time); hits gain nothing regarding address translation.
Address Translation
Solution: get the best from both virtual and physical caches. There are two issues in accessing a cache:

- Indexing the cache, that is, calculating the target set (or slot, with direct mapping).
- Comparing tags, that is, comparing the tag field with (parts of) the block address.

The page offset (the part that is identical in both the virtual and the physical address space) is used to index the cache. In parallel, the virtual part of the address is translated into the physical address and used for tag comparison. Improved hit time. This is a virtually indexed, physically tagged cache.
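A sketch of the address handling in such a cache, assuming 4 kB pages, 64-byte blocks and 64 sets, so that the index fits entirely within the page offset (all constants and names are ours):

PAGE_OFFSET_BITS = 12     # 4 kB pages (assumption)
BLOCK_OFFSET_BITS = 6     # 64-byte blocks (assumption)
INDEX_BITS = 6            # 64 sets, so index + block offset <= page offset bits

def cache_index(vaddr):
    """Index with untranslated page-offset bits -- no TLB needed for this step."""
    return (vaddr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def physical_tag(paddr):
    """The tag comes from the translated (physical) address."""
    return paddr >> PAGE_OFFSET_BITS

# In hardware, cache_index(vaddr) and the TLB lookup proceed in parallel;
# the physical tag is then compared against the tags stored in the indexed set.
vaddr, paddr = 0x0001_2345, 0x0FED_C345   # page offsets match (0x345)
print(cache_index(vaddr), hex(physical_tag(paddr)))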
Address Translation
Virtually indexed, physically tagged cache
[Figure: the CPU issues a virtual address consisting of page number and page offset. The page offset (word offset and index) addresses the cache's tag and data arrays, while in parallel the page number is translated (TLB, page table) into the frame address. The resulting physical address is compared against the stored tags; misses go to the next memory level.]
Cache Optimization
Summary of basic cache optimizations (data from [HP06 p. C-39]); + = improves the factor, - = hurts it, blank = no impact:

Technique               Hit time  Miss penalty  Miss rate  Complexity  Comment
Larger block size                 -             +          0           Trivial
Larger cache capacity   -                       +          1           Widely used for L2
Higher associativity    -                       +          1           Widely used
Multi-level caches                +                        2           Costly hardware; harder if L1 block size differs from L2 block size
Read over write                   +                        1           Widely used
Address translation     +                                  1           Widely used
Exam Computer Architecture (CA)

Date: March 9th, 2007 (09.03.2007)
Time: 8:30 hrs
Location: ST 025/118, Duisburg-Ruhrort!