Cache Performance

• CPI contributed by cache = CPIc
     = miss rate * number of cycles to handle the miss
• Another important metric:
     Average memory access time = cache hit time * hit rate + miss penalty * (1 - hit rate)
  (a small worked example follows the next slide)

Improving Cache Performance

• To improve cache performance:
   – Decrease miss rate without increasing time to handle the miss
     (more precisely: without increasing average memory access time)
   – Decrease time to handle the miss w/o increasing miss rate
• A slew of techniques: hardware and/or software
   – Increase capacity, associativity etc.
   – Hardware assists (victim caches, write buffers etc.)
   – Tolerating memory latency: prefetching (hardware and software), lock-up free caches
   – O.S. interaction: mapping of virtual pages to decrease cache conflicts
   – Compiler interactions: code and data placement; tiling
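A small worked example of the two metrics from the Cache Performance slide, in C; all the numbers (1-cycle hit, 95% hit rate, 40-cycle miss penalty, 1.3 references per instruction) are illustrative assumptions, not values from these slides:

  #include <stdio.h>

  int main(void) {
      /* Illustrative numbers only. */
      double hit_time       = 1.0;   /* cycles for a cache hit            */
      double hit_rate       = 0.95;
      double miss_penalty   = 40.0;  /* cycles to handle a miss           */
      double refs_per_instr = 1.3;   /* memory references per instruction */

      double miss_rate = 1.0 - hit_rate;

      /* Average memory access time, as defined on the slide. */
      double amat = hit_time * hit_rate + miss_penalty * miss_rate;

      /* CPI contributed by the cache = miss rate * cycles to handle the miss,
         scaled here by references per instruction so it is per instruction. */
      double cpi_cache = refs_per_instr * miss_rate * miss_penalty;

      printf("AMAT = %.2f cycles, cache CPI contribution = %.2f\n",
             amat, cpi_cache);       /* AMAT = 2.95, CPIc = 2.60 */
      return 0;
  }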
Obvious Solutions to Decrease Miss Rate

• Increase cache capacity
   – Yes, but the larger the cache, the slower the access time
   – Limitations for first-level (L1) on-chip caches
   – Solution: cache hierarchies (even on-chip)
   – Increasing L2 capacity can be detrimental on multiprocessor systems because of the increase in coherence misses
• Increase cache associativity
   – Yes, but “law of diminishing returns” (after 4-way for small caches; not sure of the limit for large caches)
   – More comparisons needed, i.e., more logic and therefore longer time to check for hit/miss?
   – Make the cache look more associative than it really is (see later)

What about Cache Block Size?

• For a given application, cache capacity and associativity, there is an optimal cache block size
• Long cache blocks
   – Good for spatial locality (code, vectors)
   – Reduce compulsory misses (implicit prefetching)
   – But take more time to bring from the next level of the memory hierarchy (can be compensated by “critical word first” and subblocks)
   – Increase the possibility of fragmentation (only a fraction of the block is used – or reused)
   – Increase the possibility of false sharing in multiprocessor systems
What about Cache Block Size? (c’ed)

• In general, the larger the cache, the longer the best block size (e.g., 32 or 64 bytes for on-chip caches; 64, 128 or even 256 bytes for large off-chip caches)
• Longer block sizes in I-caches
   – Sequentiality of code
   – Matching with the IF unit

Example of (naïve) Analysis

• Choice between 32B (miss rate m32) and 64B (miss rate m64) block sizes
• Time of a miss:
   – send request + access time + transfer time
   – send request is independent of block size (e.g., 4 cycles)
   – access time can be considered independent of block size (memory interleaving) (e.g., 28 cycles)
   – transfer time depends on bus width. For a bus width of, say, 64 bits, transfer time is twice as long for 64B (say 16 cycles) as for 32B (8 cycles).
   – In this example, 32B is better if (4+28+8) m32 < (4+28+16) m64
Example of (naïve) Analysis (c’ed)

• Case 1. 16 KB cache: m32 = 2.87, m64 = 2.64
   – 2.87 * 40 < 2.64 * 48
• Case 2. 64 KB cache: m32 = 1.35, m64 = 1.06
   – 1.35 * 40 > 1.06 * 48
• 32B better for 16 KB and 64B better for 64 KB (the sketch after the next slide checks this arithmetic)
   – (Of course the example was designed to support the “in general the larger the cache, the longer the best block size” statement of two slides ago.)

Impact of Associativity

• “Old” conventional wisdom
   – Direct-mapped caches are faster; cache access is the bottleneck for on-chip L1; make L1 caches direct-mapped
   – For on-board (L2) caches, direct-mapped are 10% faster.
• “New” conventional wisdom
   – Can make 2-way set-associative caches fast enough for L1. Allows larger caches to be addressed only with page offset bits (see later)
   – Looks like time-wise it does not make much difference for L2/L3 caches, hence provide more associativity (but if caches are extremely large there might not be much benefit)
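A minimal sketch checking the block-size analysis of the last two slides; it plugs the slides’ cycle counts and miss rates into the miss-cost comparison (the miss rates are used exactly as given, without assuming particular units):

  #include <stdio.h>

  /* Cost of misses = miss rate * (send request + access time + transfer time). */
  static double miss_cost(double miss_rate, int transfer_cycles) {
      return miss_rate * (4 + 28 + transfer_cycles);
  }

  int main(void) {
      /* Case 1: 16 KB cache -- 32B wins (114.8 < 126.7). */
      printf("16KB: 32B -> %.1f, 64B -> %.1f\n",
             miss_cost(2.87, 8), miss_cost(2.64, 16));
      /* Case 2: 64 KB cache -- 64B wins (54.0 > 50.9). */
      printf("64KB: 32B -> %.1f, 64B -> %.1f\n",
             miss_cost(1.35, 8), miss_cost(1.06, 16));
      return 0;
  }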
Reducing Cache Misses with more “Associativity” -- Victim Caches

• First example (in this course) of a “hardware assist”
• Victim cache: small fully-associative buffer “behind” the L1 cache and “before” the L2 cache
• Of course one can also exist “behind” L2 and “before” main memory
• Main goal: remove some of the conflict misses in L1 direct-mapped caches (or any cache with low associativity)

[Figure: victim cache organization. The L1 cache is accessed with index + tag. 1. Hit in L1. 2. Miss in L1, hit in the victim cache: send data to the register and swap the two blocks. 3. Miss in both: the block comes from the next level of the memory hierarchy, and the block evicted from L1 (3’) moves into the victim cache.]
Operation of a Victim Cache

• 1. Hit in L1; nothing else needed
• 2. Miss in L1 for block at location b, hit in victim cache at location v: swap contents of b and v (takes an extra cycle)
• 3. Miss in L1, miss in victim cache: load missing item from next level and put in L1; put entry replaced in L1 in victim cache; if victim cache is full, evict one of its entries.
• Victim buffer of 4 to 8 entries for a 32KB direct-mapped cache works well (a sketch of the three cases follows the next slide)

Bringing more Associativity -- Column-associative Caches

• Split (conceptually) direct-mapped cache into two halves
• Probe first half according to index. On hit proceed normally
• On miss, probe 2nd half; if hit, send to register and swap with entry in first half (takes an extra cycle)
• On miss (in both halves) go to next level, load in 2nd half and swap
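A minimal C sketch of the three victim-cache cases, assuming a direct-mapped L1 and a 4-entry fully-associative victim buffer; the sizes, names, and the replacement choice are illustrative assumptions:

  #define L1_SETS    1024   /* direct-mapped L1 (illustrative size)           */
  #define VC_ENTRIES 4      /* 4-8 entries works well for a 32KB DM L1 cache  */

  typedef struct { unsigned long blk; int valid; /* ...data... */ } Line;

  static Line l1[L1_SETS];       /* indexed by the low bits of the block address */
  static Line vc[VC_ENTRIES];    /* fully associative: holds whole block addresses */

  /* Returns 1 on L1 hit, 2 on victim-cache hit, 3 on a miss in both. */
  int access_block(unsigned long blk)     /* blk = block address (addr / block size) */
  {
      unsigned idx = (unsigned)(blk % L1_SETS);

      if (l1[idx].valid && l1[idx].blk == blk)
          return 1;                        /* 1. hit in L1: nothing else needed */

      for (int v = 0; v < VC_ENTRIES; v++)
          if (vc[v].valid && vc[v].blk == blk) {
              Line tmp = l1[idx];          /* 2. miss in L1, hit in VC:         */
              l1[idx]  = vc[v];            /*    swap the two blocks            */
              vc[v]    = tmp;              /*    (takes an extra cycle)         */
              return 2;
          }

      /* 3. miss in both: the block comes from the next level into L1; the block
         evicted from L1 moves into the victim cache (replacing one of its
         entries if it is full -- the replacement choice is arbitrary here).   */
      vc[0] = l1[idx];
      l1[idx].blk = blk;
      l1[idx].valid = 1;
      return 3;
  }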
Skewed-associative Caches

• Have different mappings for the two (or more) banks of the set-associative cache
• First mapping conventional; second one “dispersing” the addresses (XOR a few bits) (see the sketch after the next slide)
• Hit ratio of 2-way skewed as good as 4-way conventional.

Reducing Conflicts -- Page Coloring

• Interaction of the O.S. with the hardware
• In caches where the cache size > page size * associativity, bits of the physical address (besides the page offset) are needed for the index.
• On a page fault, the O.S. selects a mapping such that it tries to minimize conflicts in the cache.
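A minimal sketch of the two skewed mappings, assuming two banks and a simple XOR of the low index bits with a few higher address bits for the second bank; the bank size and the exact hashing bits are illustrative assumptions:

  #define SETS_PER_BANK 512           /* illustrative bank size */

  /* Bank 0: conventional direct mapping on the low bits of the block address. */
  unsigned index_bank0(unsigned long blk) {
      return (unsigned)(blk % SETS_PER_BANK);
  }

  /* Bank 1: "disperse" the addresses by XORing the low index bits with a few
     higher address bits, so blocks that conflict in bank 0 usually map to
     different sets in bank 1. */
  unsigned index_bank1(unsigned long blk) {
      return (unsigned)((blk ^ (blk / SETS_PER_BANK)) % SETS_PER_BANK);
  }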
Options for Page Coloring

• Option 1: assume that the faulting process is using the whole cache
   – Attempt to map the page such that the cache will access data as if it were indexed by virtual addresses
• Option 2: do the same thing but hash with bits of the PID (process identification number)
   – Reduces inter-process conflicts (e.g., prevents pages corresponding to the stacks of various processes from mapping to the same area in the cache)
• Implemented by keeping “bins” of free pages (see the sketch after the next slide)

Tolerating/hiding Memory Latency

• One particular technique: prefetching
• Goal: bring data into the cache just in time for its use
   – Not too early, otherwise cache pollution
   – Not too late, otherwise “hit-wait” cycles
• Under the constraints of (among others)
   – Imprecise knowledge of the instruction stream
   – Imprecise knowledge of the data stream
• Hardware/software prefetching
   – Works well for regular-stride data accesses
   – Difficult when there are pointer-based accesses
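A minimal sketch of Option 1 of page coloring, assuming 4KB pages, a physically indexed direct-mapped cache, and one free-page list (“bin”) per color; the helper function and all sizes are illustrative assumptions:

  #define PAGE_SIZE  4096u
  #define CACHE_SIZE (256u * 1024u)               /* assumed 256KB, direct-mapped        */
  #define NUM_COLORS (CACHE_SIZE / PAGE_SIZE)     /* index bits beyond the page offset   */

  /* Hypothetical helper: pop a free physical page from the bin of a given color. */
  extern void *pop_free_page(unsigned color);

  /* On a page fault for virtual address va, pick a physical page whose color
     matches the color the virtual address would have used, so the cache
     behaves as if it were indexed by virtual addresses (Option 1).           */
  void *alloc_page_for(unsigned long va) {
      unsigned color = (unsigned)((va / PAGE_SIZE) % NUM_COLORS);
      return pop_free_page(color);   /* fall back to another bin if this one is empty */
  }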
Why, What, When, Where

• Why?
   – cf. goals: hide memory latency and/or reduce cache misses
• What
   – Ideally a semantic object
   – Practically a cache block, or a sequence of cache blocks
• When
   – Ideally, just in time.
   – Practically, depends on the prefetching technique
• Where
   – In the cache or in a prefetch buffer

Hardware Prefetching

• Nextline prefetching for instructions
   – Bring the missing block and the next one (if not already there)
• OBL (“one block look-ahead”) for data prefetching
   – As nextline but with more variations -- e.g., depends on whether prefetching was successful the previous time
• Use of special assists:
   – Stream buffers, i.e., FIFO queues to fetch consecutive lines (good for instructions, not that good for data);
   – Stream buffers with hardware stride detection mechanisms;
   – Use of a reference prediction table, etc.
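A minimal sketch of a stride-detecting assist in the reference-prediction-table style, with one entry per load instruction (indexed by its PC); the table size, fields, and the prefetch helper are illustrative assumptions:

  #define RPT_ENTRIES 64

  typedef struct {
      unsigned long pc;         /* load instruction that owns this entry */
      unsigned long last_addr;  /* last address it referenced            */
      long          stride;     /* last observed stride                  */
      int           confident;  /* same stride seen twice in a row       */
  } RptEntry;

  static RptEntry rpt[RPT_ENTRIES];

  extern void prefetch_block(unsigned long addr);   /* hypothetical: queue a prefetch */

  /* Called on every load: update the table and possibly issue a prefetch. */
  void rpt_access(unsigned long pc, unsigned long addr) {
      RptEntry *e = &rpt[(pc >> 2) % RPT_ENTRIES];

      if (e->pc != pc) {                   /* a new load takes over the entry */
          e->pc = pc; e->last_addr = addr; e->stride = 0; e->confident = 0;
          return;
      }
      long stride  = (long)(addr - e->last_addr);
      e->confident = (stride != 0 && stride == e->stride);
      e->stride    = stride;
      e->last_addr = addr;

      if (e->confident)                    /* regular stride: prefetch the next address */
          prefetch_block(addr + (unsigned long)stride);
  }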
Software Prefetching

• Use of special instructions (cache hints: touch in PowerPC, load into register 31 for Alpha, prefetch in recent micros)
• Non-binding prefetch (in contrast with proposals to prefetch into registers)
   – If an exception occurs, the prefetch is ignored.
• Must be inserted by software (compiler analysis) (see the example after the next slide)
• Advantage: no special hardware
• Drawback: more instructions executed.

Techniques to Reduce Cache Miss Penalty

• Give priority to reads -> write buffers
• Send the requested word first -> critical word first or wrap-around strategy
• Sectored (subblock) caches
• Lock-up free (non-blocking) caches
• Cache hierarchy
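A minimal example of the kind of prefetch a compiler (or programmer) could insert, using the GCC/Clang __builtin_prefetch hint; the prefetch distance of 16 elements is an assumption that would be tuned to the memory latency:

  /* Sum an array, prefetching ahead so each block arrives just in time.
     The hint is non-binding: if the address runs past the end of the array
     or the prefetch is dropped, execution is unaffected.                 */
  double sum(const double *a, int n) {
      double s = 0.0;
      for (int i = 0; i < n; i++) {
          __builtin_prefetch(&a[i + 16], 0, 1);   /* read, low temporal locality */
          s += a[i];
      }
      return s;
  }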
Write Policies

• Loads (reads) occur twice as often as stores (writes)
• Miss ratio of reads and miss ratio of writes pretty much the same
• Although it is more important to optimize read performance, write performance should not be neglected
• Write misses can be delayed w/o impeding the progress of execution of subsequent instructions

A Sample of Write Mechanisms

• Fetch-on-write and write-allocate
   – Proceed like on a read miss followed by a write hit.
   – The preferred method for write-back caches.
• No-write-before-hit, no-fetch-on-write and no-write-allocate (called “write-around”)
   – The cache is not modified on write misses
• No-fetch-on-write and write-allocate (“write-validate”)
   – Write directly in the cache and invalidate all other parts of the line being written. Requires a valid bit per writeable entity. Good for initializing a data structure.
   – Assumedly the best policy for elimination of write misses in write-through caches, but more expensive (dirty bits)
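A minimal sketch contrasting what happens on a write miss under the three mechanisms above; the helper functions stand in for the cache machinery and are illustrative assumptions:

  typedef enum { FETCH_ON_WRITE, WRITE_AROUND, WRITE_VALIDATE } WritePolicy;

  /* Hypothetical helpers standing in for the cache machinery. */
  extern void fetch_block(unsigned long blk);                 /* read-miss handling    */
  extern void write_into_cache(unsigned long addr, int v);    /* write-hit handling    */
  extern void write_next_level(unsigned long addr, int v);    /* bypass the cache      */
  extern void allocate_line_invalid_rest(unsigned long blk);  /* per-word valid bits   */

  void write_miss(WritePolicy p, unsigned long addr, int value) {
      unsigned long blk = addr / 32;       /* 32B blocks, illustrative */
      switch (p) {
      case FETCH_ON_WRITE:                 /* fetch-on-write + write-allocate:       */
          fetch_block(blk);                /* proceed like a read miss ...           */
          write_into_cache(addr, value);   /* ... followed by a write hit            */
          break;
      case WRITE_AROUND:                   /* no-write-allocate: cache untouched,    */
          write_next_level(addr, value);   /* the write goes to the next level       */
          break;
      case WRITE_VALIDATE:                 /* write-allocate, no fetch: write into   */
          allocate_line_invalid_rest(blk); /* the line, other words marked invalid   */
          write_into_cache(addr, value);
          break;
      }
  }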
Write Mechanisms (c’ed)

• Write-before-hit, no-fetch-on-write and no-write-allocate (called “write invalidate”)
   – Possible only for direct-mapped write-through caches
   – The data is written in the cache totally in parallel with the checking of the tag. On a miss, the rest of the line is invalidated as in the write-validate case
• A mix of these policies can be (and has been) implemented
   – Dictate the policy on a page-per-page basis (bits set in the page table entry)
   – Have the compiler generate instructions (hints) for dynamic changes in policies (e.g., write-validate on initialization of a data structure)

Write Buffers

• Reads are more important than:
   – Writes to memory if WT cache
   – Replacement of dirty lines if WB
• Hence buffer the writes in write buffers
   – Write buffers = FIFO queues to store data
   – Since writes have a tendency to come in bunches (e.g., on procedure calls, context switches, etc.), write buffers must be “deep”
Write Buffers (c’ed)

• Writes from the write buffer to the next level of the memory hierarchy can proceed in parallel with computation
• Now loads must check the contents of the write buffer; also more complex for cache coherency in multiprocessors
   – Allow read misses to bypass the writes in the write buffer

Coalescing Write Buffers and Write Caches

• Coalescing write buffers
   – Writes to an address (block) already in the write buffer are combined
      • Note the tension between writing the coalescing buffer to memory at a high rate -- more writes -- vs. coalescing to the max -- but the buffer might become full
• Extend write buffers to small fully-associative write caches with a WB strategy and a dirty bit per byte.
   – Not implemented in any machine I know of
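A minimal sketch of a coalescing write buffer: a small set of block-sized entries, drained oldest-first, in which a write to a block that is already queued is merged into the existing entry; the sizes, names, and the drain helper are illustrative assumptions:

  #define WB_ENTRIES      8
  #define WORDS_PER_BLOCK 8                 /* 32B blocks, 4B words (illustrative) */

  typedef struct {
      int           valid;
      unsigned long blk;                    /* block address             */
      unsigned      word_mask;              /* which words hold new data */
      unsigned      data[WORDS_PER_BLOCK];
  } WbEntry;

  static WbEntry wb[WB_ENTRIES];            /* drained oldest-first (order bookkeeping omitted) */

  extern void drain_oldest_entry(void);     /* hypothetical: write one entry to memory and free it */

  /* Insert a write; coalesce if the block is already buffered.
     (Loads must also search this buffer, or bypass it only on a mismatch.) */
  void wb_write(unsigned long blk, int word, unsigned value) {
      for (int i = 0; i < WB_ENTRIES; i++)
          if (wb[i].valid && wb[i].blk == blk) {      /* coalesce with existing entry */
              wb[i].data[word]  = value;
              wb[i].word_mask  |= 1u << word;
              return;
          }
      for (int i = 0; i < WB_ENTRIES; i++)
          if (!wb[i].valid) {                         /* allocate a new entry */
              wb[i] = (WbEntry){1, blk, 1u << word, {0}};
              wb[i].data[word] = value;
              return;
          }
      drain_oldest_entry();                           /* buffer full: free one slot, then retry */
      wb_write(blk, word, value);
  }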
Critical Word First

• Send first, from the next level in the memory hierarchy, the word for which there was a miss
• Send that word directly to the CPU register (or the IF buffer if it’s an I-cache miss) as soon as it arrives
• Need a one-block buffer to hold the incoming block (and shift it) before storing it in the cache

Sectored (or subblock) Caches

• First cache ever (IBM 360/85 in the late 60’s) was a sector cache
   – On a cache miss, send only a subblock, change the tag and invalidate all other subblocks
   – Saves on memory bandwidth
• Reduces the number of tags but requires good spatial locality in the application
• Requires status bits (valid, dirty) per subblock
• Might reduce false sharing in multiprocessors
   – But requires metadata status bits for each subblock
   – Alpha 21164 L2 uses a dirty bit per 16B for a 64B block size
Sector Cache

[Figure: sector cache line layout -- one tag per line; each of subblock 1 through subblock n has its own status bits and data.]

Lock-up Free Caches

• Proposed in the early 1980’s but implemented only recently because quite complex
• Allow the cache to have several outstanding miss requests (hit under miss).
   – A cache miss “happens” during the EX stage, i.e., longer (unpredictable) latency
   – Important not to slow down operations that don’t depend on the results of the load
• Single hit under miss (HP PA 1700) relatively simple
• For several outstanding misses, requires the use of MSHRs (Miss Status Holding Registers).
MSHRs

• The outstanding misses do not necessarily come back in the order they were detected
   – For example, miss 1 can percolate from L1 to main memory while miss 2 is resolved at the L2 level
• Each MSHR must hold information about the particular miss it will handle, such as:
   – Info relative to its placement in the cache
   – Info relative to the “missing” item (word, byte) and where to forward it (CPU register)

Implementation of MSHRs

• Quite a variety of alternatives
   – MIPS R10000, Alpha 21164, Pentium Pro
• One particular way of doing it (see the sketch below):
   – Valid (busy) bit (limited number of MSHRs – structural hazard)
   – Address of the requested cache block
   – Index in the cache where the block will go
   – Comparator (to prevent using the same MSHR for a miss to the same block)
   – If data is to be forwarded to the CPU at the same time as to the cache, needs the addresses of the registers (one per possible word/byte)
   – Valid bits (for writes)
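A minimal C sketch of one MSHR holding the fields just listed; the block geometry, field widths and the lookup helper are illustrative assumptions:

  #define WORDS_PER_BLOCK 8                  /* illustrative block geometry */

  typedef struct {
      int           busy;                    /* valid (busy) bit: entry in use -- running
                                                out of MSHRs is a structural hazard      */
      unsigned long blk_addr;                /* address of the requested cache block     */
      unsigned      cache_index;             /* where the block will go in the cache     */
      /* Where to forward each word once the block returns: one destination
         register per possible word/byte within the block.                   */
      int           dest_reg[WORDS_PER_BLOCK];
      unsigned      write_valid;             /* per-word valid bits for buffered writes  */
  } Mshr;

  /* The "comparator": before allocating a new MSHR, compare the block address
     against all busy entries so a second miss to the same block does not get
     a second MSHR.                                                            */
  int find_matching_mshr(const Mshr *mshrs, int n, unsigned long blk_addr) {
      for (int i = 0; i < n; i++)
          if (mshrs[i].busy && mshrs[i].blk_addr == blk_addr)
              return i;
      return -1;        /* no match: allocate a free entry, or stall if none */
  }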
Cache Hierarchy

• Two, and even three, levels of caches are quite common now
• L2 (or L3, i.e., board-level) very large, but since L1 filters many references, the “local” hit rate might appear low (maybe 50%) (compulsory misses still happen)
• In general L2 caches have longer cache blocks and larger associativity
• In general L2 caches are write-back, write-allocate

Characteristics of Cache Hierarchy

• Multi-level inclusion (MLI) property between the off-board cache (L2 or L3) and the on-chip cache(s) (L1 and maybe L2)
   – L2 contents must be a superset of L1 contents (or at least have room to store these contents if L1 is write-back)
   – If L1 and L2 are on chip, they could be mutually exclusive (and inclusion will be with L3)
   – MLI very important for cache coherence in multiprocessor systems (shields the on-chip caches from unnecessary interference)
• Prefetching at the L2 level is an interesting challenge (made easier if L2 tags are kept on-chip)
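A small worked example of why a 50% “local” L2 hit rate is still very useful; all numbers are illustrative assumptions, not from the slides:

  #include <stdio.h>

  int main(void) {
      /* Illustrative parameters. */
      double l1_hit_rate = 0.95, l1_hit_time = 1.0;    /* cycles                       */
      double l2_local_hit_rate = 0.50;                 /* of the references L1 missed  */
      double l2_time = 10.0, mem_time = 100.0;         /* cycles                       */

      /* L1 is accessed on every reference; only its misses reach L2, and only
         L2's misses reach memory -- so even a 50% local hit rate in L2 halves
         the number of accesses that pay the full memory latency.             */
      double amat = l1_hit_time
                  + (1.0 - l1_hit_rate) * (l2_time
                  + (1.0 - l2_local_hit_rate) * mem_time);

      printf("AMAT with L2 = %.2f cycles\n", amat);    /* 1 + 0.05*(10 + 50) = 4.0 */
      return 0;
  }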
“Virtual” Address Caches

• Will get back to this after we study TLBs
• Virtually addressed, virtually tagged caches
   – Main problem to solve is the synonym problem (2 virtual addresses corresponding to the same physical address).
• Virtually addressed, physically tagged
   – Advantage: can allow cache and TLB accesses concurrently
   – Easy and usually done for small L1, i.e., capacity < (page size * associativity) (see the arithmetic check after the next slide)
   – Can be done for larger caches if the O.S. does a form of page coloring such that the “index” is the same for synonyms

Impact of Branch Prediction on Caches

• If we are on the predicted path and:
   – An I-cache miss occurs, what should we do: stall or fetch?
   – A D-cache miss occurs, what should we do: stall or fetch?
• If we fetch and we are on the right path, it’s a win
• If we fetch and are on the wrong path, it is not necessarily a loss
   – Could be a form of prefetching (if the branch was mispredicted, there is a good chance that that path will be taken later)
   – However, the channel between the cache and the higher levels of the hierarchy is occupied while something more pressing could be waiting for it
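A quick arithmetic check of the capacity vs. page size * associativity condition from the “Virtual” Address Caches slide; the 4KB page size and the cache parameters are illustrative assumptions:

  #include <stdio.h>

  int main(void) {
      unsigned page_size = 4096;        /* assumed 4KB pages     */
      unsigned assoc     = 2;           /* 2-way set associative */
      unsigned capacity  = 8 * 1024;    /* 8KB L1                */

      /* If each way is no larger than a page, the set index and block offset fit
         entirely in the page offset, so the cache can be indexed with virtual
         bits that equal the physical ones and overlap the access with the TLB. */
      if (capacity <= page_size * assoc)
          printf("can be virtually indexed, physically tagged as is\n");
      else
          printf("needs more associativity or O.S. page coloring\n");
      return 0;
  }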