Server: dependability, scalability, efficient throughput. Embedded: strict resource constraints.
Average Selling Price (ASP): component cost + direct cost + indirect cost.
List price: not the ASP. Stores add to the ASP to get their cut; the ASP is typically 50% to 75% of list price.
Classes of computers
SISD: one instruction stream operating on one data stream per clock cycle
MISD: multiple instruction streams operating on a single data set per clock cycle
SIMD: a single instruction stream executed over multiple data streams per clock cycle
ISA: Instruction Set Architecture, the interface between hardware and software
Interface design: 1. Provides convenient functionality to higher levels. 2. Permits an efficient implementation at lower levels.
Provide at least 16 GPRs plus separate floating-point registers.
Caller-saving: the caller saves any registers that it wants to use after the call, then invokes the callee.
Callee-saving: invoke first, then the callee saves the registers.
RISC (Reduced Instruction Set Computer) vs. CISC (Complex Instruction Set Computer)
The MIPS Architecture
1. A simple load-store instruction set
2. Designed for pipelining efficiency, including a fixed instruction set encoding
3. Efficiency as a compiler target
Pipeline
Goal: make a fast CPU. How: take advantage of parallelism; decompose instruction execution into steps.
Pipeline - an implementation technique whereby different instructions are overlapped in execution at the same time.
Make the CPU fast -> decrease CPU time -> improve throughput; improve the efficiency of the resources.
Pipeline: 1. Many stages. 2. Each stage carries out a different part. 3. Stages cooperate at a synchronized clock. 4. Exploit parallelism.
Latency: the amount of time between when an instruction is issued and when it completes.
Throughput: the throughput of a CPU pipeline is the number of instructions completed per second.
Ideal speedup is equal to the number of pipeline stages.
Compared with a single-clock (single-cycle) implementation, pipelining decreases the cycle time. Compared with a multiple-clock (multi-cycle) implementation, pipelining decreases CPI.
Performance issues: 1. Latency. 2. Imbalance among stages. 3. Overhead. 4. Pipeline hazards. 5. Time to "fill" and "drain".
MIPS 5-stage pipeline
Hazard
A hazard is a condition that prevents an instruction in the pipeline from executing its next scheduled pipeline stage.
Hazards can always be resolved by stalling.
1. The stall delays all instructions issued after the stalled instruction, while instructions issued earlier go on proceeding through the pipeline.
2. No new instructions are fetched during a stall.
Exceptions and Interrupts
Exceptions are exceptional events that disrupt the normal flow of a program.
The pipeline must be safely shut down when an exception occurs and then restarted at the offending instruction.
Precise Exceptions vs. Imprecise Exceptions
Case of a multi-cycle implementation:
Latency ---- the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
Initiation interval ---- the number of cycles that must elapse between issuing instructions to the same unit.
Supports multiple outstanding FP operations, so structural hazards can occur.
Solve the write-port conflict: detect and insert stalls by serializing the writes.
1. Track the use of the write port in the ID stage and stall an instruction before it issues.
2. Stall a conflicting instruction when it tries to enter the MEM or WB stage.
Examples: Interrupts - user hitting the keyboard, disk drive asking for attention, arrival of a network packet. Exceptions - divide by zero, overflow, page fault.
Handling exceptions
1. Ignore the problem (imprecise exceptions).
2. Buffer the results and delay commitment. History file - saves the original values of the registers. Future file - stores the newer values of the registers.
3. Keep enough information for the trap handler to create a precise sequence for the exception.
4. Allow instruction issue only if it is known that all previous instructions will complete without causing an exception.
Computer Architecture is the science and art of selecting and inter-connecting hardware components to create computers that meet functional, performance, cost and power goals.
1. Instruction set design. 2. Organization. 3. Hardware.
MTTF: mean time to failure. MTTR: mean time to repair. FIT = 1/MTTF, the failure rate. MTBF = MTTF + MTTR, mean time between failures. Module availability = MTTF/MTBF.
Measuring and Reporting Performance
Execution time (latency), throughput, MIPS.
Criteria of performance evaluation differ among users and designers.
Response time = elapsed time.
CPU time - measures the designer's perception of CPU speed. Further divided into user CPU time and system CPU time.
Throughput - measures the administrator's perception of system performance.
If you improve response time, you usually improve throughput. You can also improve throughput without improving response time.
MIPS is only meaningful when comparing machines with the same instruction set.
SPEC - the System Performance Evaluation Cooperative.
Total execution time: a consistent summary measure.
Fraction enhanced: the fraction of computation time in the original machine that can be converted to take advantage of the enhancement. Speedup = [(1 - ΣFEi) + Σ(FEi/Si)]^-1.
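A quick numeric check of the speedup expression above; a minimal C sketch in which the two enhancement fractions and their speedups are made-up example values, not figures from these notes.

```c
/* Sketch of the generalized Amdahl's Law speedup from the notes:
   Speedup = 1 / ((1 - sum(FEi)) + sum(FEi / Si)). */
#include <stdio.h>

double amdahl_speedup(const double fe[], const double s[], int n) {
    double enhanced = 0.0, scaled = 0.0;
    for (int i = 0; i < n; i++) {
        enhanced += fe[i];        /* fraction of time each enhancement applies to */
        scaled   += fe[i] / s[i]; /* that fraction after speedup Si */
    }
    return 1.0 / ((1.0 - enhanced) + scaled);
}

int main(void) {
    double fe[] = { 0.4, 0.2 };  /* fractions enhanced (hypothetical) */
    double s[]  = { 10.0, 2.0 }; /* speedups of each enhanced fraction */
    printf("overall speedup = %.2f\n", amdahl_speedup(fe, s, 2));
    return 0;
}
```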
Instruction Set Design
The type of internal storage in the CPU: stack, accumulator, GPR architecture.
Three general types of GPR (General-Purpose Register) architectures:
1. Register-Register ------ Load/Store: ALU operations use register operands only. Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32.
2. Register-Memory: IBM 360/370, Intel 80x86, TI TMS320C54x, Motorola 68000.
3. Memory-Memory: VAX.
Metrics            Reg-Reg    Reg-Mem    Mem-Mem
Code density       Lowest     Higher     Highest
Instr. count       Largest    Large      Small
Instr. complexity  Simplest   Complex    Most complex
Instr. length      Fixed      Variable   Large variation
Encoding           Fixed      Hybrid     Variable
CPI                Small      Middle     Large variation
Memory addressing: bit, byte, word.
Designing instruction sets for pipelining
1. Avoid variable instruction lengths and running times whenever possible.
2. Avoid sophisticated addressing modes.
3. Don't allow self-modifying code.
4. Avoid implicitly setting condition codes in instructions.
Structural hazard
Occurs when two or more instructions want to use the same hardware resource in the same cycle.
Overcome by replicating hardware resources:
1. Multiple accesses to the register file -- double bump / dual-port the register file.
2. Multiple accesses to memory -- split IM and DM / multiple memory ports / instruction buffer.
3. Some functional units are not fully pipelined. 4. Some functional units are not pipelined at all. These are allowed in order to reduce the cost and latency of the unit.
Data hazard
Occurs when the pipeline changes the order of read/write accesses to operands compared with the order in sequential execution.
Dependences between "nearby" instructions:
RAW (read after write) - true dependence: B reads before A writes.
WAW (write after write) - output dependence: B writes before A writes.
WAR (write after read) - anti-dependence: B writes before A reads.
1. Insert NOPs by the compiler.
2. Add a hardware interlock: extra hardware to detect stall situations and extra hardware to push bubbles through the pipe.
3. Forwarding (bypass, short-circuiting) -- the forwarding paths "point backwards", from EX/MEM.ALUOUT, MEM/WB.ALUOUT and MEM/WB.LMD.
Load stall -- even with forwarding, a load followed immediately by a use of its result stalls one cycle; avoid it by compiler reordering.
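The load-stall and forwarding notes above can be condensed into a small check. The sketch below is a simplified model, not the full MIPS interlock: the struct fields (is_load, rd, rs, rt) and the one-instruction distance are assumptions for illustration.

```c
/* Decide whether the instruction immediately after a producer needs a
   one-cycle load stall, assuming full EX/MEM and MEM/WB forwarding. */
#include <stdio.h>
#include <stdbool.h>

struct instr { bool is_load; int rd; int rs; int rt; };

/* True when 'next' must stall one cycle: the producer is a load whose
   result (MEM/WB.LMD) is only available at the end of MEM, too late to
   forward into the EX stage of the very next instruction. */
bool load_use_stall(struct instr prod, struct instr next) {
    return prod.is_load && (next.rs == prod.rd || next.rt == prod.rd);
}

int main(void) {
    struct instr lw  = { true, /*rd=*/8, 0, 0 };   /* lw  $8, 0($x)   */
    struct instr add = { false, 10, /*rs=*/8, 9 }; /* add $10, $8, $9 */
    printf("stall needed: %s\n", load_use_stall(lw, add) ? "yes" : "no");
    return 0;
}
```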
Addressing modes: 1. Support at least these three addressing modes: register indirect, displacement, immediate. 2. The size of the address for displacement mode should be at least 12-16 bits. 3. The size of the immediate field should be at least 8-16 bits.
Little endian: Intel. Big endian: IBM, Motorola.
MAKE THE COMMON CASE FAST
Instructions for control flow
1. The most commonly used instructions should be considered first: load, store, add, sub, move R-R, and, shift, compare equal, compare not equal, branch, etc.
2. Conditional branches: nearly 100% of displacements fit within 2^7, so PC-relative branches need displacement fields of at least 8 bits.
3. PC-relative conditional branches dominate the control instructions.
4. Jump-and-link instruction for procedure call.
5. Register-indirect jump for procedure return.
Three areas in which current high-level languages allocate data:
1. Stack: local variables; scalars (single variables).
2. Global data area: global variables, constants; arrays.
3. Heap: dynamic objects; accessed with pointers.
Control hazard
Cause: the branch condition and the branch target PC are not available in time to fetch an instruction on the next clock; the next PC takes time to compute.
1. Freeze or flush the pipeline: hold or delete any instruction after the branch until the branch destination is known. The penalty is fixed.
2. Predict-not-taken (predict-untaken).
3. Predict-taken: only useful when the target is known before the branch outcome; no advantage at all for the MIPS 5-stage pipeline.
4. Delayed branch: the instruction in the delay slot always executes, whether the branch is taken or not taken.
Branch prediction: static branch prediction and dynamic branch prediction.
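The notes above only name dynamic branch prediction; one common concrete scheme is a table of 2-bit saturating counters. The sketch below is illustrative only -- the table size, PC indexing, and the loop-branch test data are assumptions, not something specified in these notes.

```c
/* 2-bit saturating-counter branch predictor (hedged sketch). */
#include <stdio.h>

#define BHT_SIZE 1024               /* direct-mapped branch history table */
static unsigned char bht[BHT_SIZE]; /* 0,1 = predict not taken; 2,3 = predict taken */

int predict(unsigned pc) { return bht[(pc >> 2) % BHT_SIZE] >= 2; }

void update(unsigned pc, int taken) {
    unsigned char *c = &bht[(pc >> 2) % BHT_SIZE];
    if (taken  && *c < 3) (*c)++;   /* move toward "taken" */
    if (!taken && *c > 0) (*c)--;   /* move toward "not taken" */
}

int main(void) {
    /* A loop-closing branch taken 9 times, then not taken once. */
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        int taken = (i < 9);
        correct += (predict(0x400100) == taken);
        update(0x400100, taken);
    }
    printf("correct predictions: %d/10\n", correct);
    return 0;
}
```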
Memory Hierarchy Design
Takes advantage of the principle of locality.
Cache -- small, fast storage used to improve the average access time to slow memory.
Block placement: fully associative, set associative, direct mapped. Sets with n blocks -- n-way set associative.
Block identification: the address is split into a tag, an index and a block offset: TAG | index | offset.
Block replacement -- random, LRU, FIFO.
Write strategy
When data is written into the cache:
1. Write-through cache: the data is also written to memory. Cached data can always be discarded - the most up-to-date data is in memory. Cache control bit: only a valid bit. Memory always has the latest data.
2. Write-back cache: the data is NOT written to memory. Cached data can't just be discarded - it may have to be written back to memory. Cache control bits: both valid and dirty bits. Needs much lower memory bandwidth, since data is often overwritten multiple times.
Write-through advantages: read misses don't result in writes, the memory hierarchy is consistent, and it is simple to implement.
Write-back advantages: writes occur at the speed of the cache, and main-memory bandwidth is smaller when multiple writes occur to the same block.
Write stall --- when the CPU must wait for writes to complete during write-through.
Write buffers -- a small cache that can hold a few values waiting to go to main memory. To avoid stalling on writes, many CPUs use a write buffer. The buffer helps when writes are clustered. It does not entirely eliminate stalls, since the buffer can fill if a burst of writes is larger than the buffer.
Write misses -- if a miss occurs on a write (the block is not present), there are two options:
1. Write allocate - the block is loaded into the cache on a miss before anything else occurs.
2. Write around - the block is only written to main memory; it is not stored in the cache.
In general, write-back caches use write-allocate, and write-through caches use write-around.
Cache performance
CPIexec includes ALU and memory instructions.
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock cycle time
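As a worked instance of the CPU execution time formula just above, the following C sketch plugs in invented workload numbers (instruction count, CPIexec, miss statistics, clock cycle); none of them comes from these notes.

```c
#include <stdio.h>

int main(void) {
    double IC           = 1e9;    /* instructions */
    double CPI_exec     = 1.2;    /* includes ALU and memory instructions */
    double mem_per_inst = 1.3;    /* memory accesses per instruction */
    double miss_rate    = 0.03;
    double miss_penalty = 100.0;  /* cycles */
    double cycle_time   = 0.5e-9; /* seconds */

    double cpu_cycles   = IC * CPI_exec;
    double stall_cycles = IC * mem_per_inst * miss_rate * miss_penalty;
    double cpu_time     = (cpu_cycles + stall_cycles) * cycle_time;
    printf("CPU time = %.3f s (stalls add %.0f%%)\n",
           cpu_time, 100.0 * stall_cycles / cpu_cycles);
    return 0;
}
```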
Summary
Average Memory Access Time
AMAT = Hit time + (Miss rate × Miss penalty)
AMAT = %Inst × (HitTime_Inst + MissRate_Inst × MissPenalty_Inst) + %Data × (HitTime_Data + MissRate_Data × MissPenalty_Data)
CPU time = IC × (AluOps/Inst × CPI_AluOps + MemAcc/Inst × AMAT) × Clock cycle time
Compulsory -- cold-start / first-reference misses.
Capacity -- the cache can't contain all the blocks needed.
Conflict -- in set-associative or direct-mapped caches, too many blocks map to a set. Also called collision / interference misses.
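A minimal C sketch plugging made-up numbers into the AMAT and CPU-time formulas above; none of the rates corresponds to a real machine.

```c
#include <stdio.h>

int main(void) {
    double hit_time = 1.0, miss_rate = 0.05, miss_penalty = 60.0;
    double amat = hit_time + miss_rate * miss_penalty;   /* cycles */

    double IC = 1e9;                  /* instruction count */
    double alu_per_inst = 0.6, cpi_alu = 1.0;
    double mem_per_inst = 0.4;        /* memory accesses per instruction */
    double cc = 1e-9;                 /* clock cycle time, seconds */

    double cpu_time = IC * (alu_per_inst * cpi_alu + mem_per_inst * amat) * cc;
    printf("AMAT = %.1f cycles, CPU time = %.2f s\n", amat, cpu_time);
    return 0;
}
```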
How to improve CPU time
1. Reduce the miss penalty -- 5 techniques
a. Multilevel caches
Add a second-level cache between main memory and a small, fast first-level cache.
AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2(local) × Miss penalty_L2)   (worked in the sketch after this list)
Reduce the miss rate of the second-level cache.
b. Critical word first
Request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block.
c. Read miss before write miss
With a write buffer, writes can be delayed to come after reads. Write-through -- check write-buffer contents before the read; write-back -- copy the dirty block to a write buffer, then do the read.
d. Merging write buffers
Merge multiword writes to one entry. Also reduces stalls due to the write buffer being full.
e. Victim caches
Small fully-associative cache. Hold a few of the most recently replaced blocks. Check on a miss.
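Worked instance of the two-level AMAT formula from item 1.a above; all hit times, miss rates (the L2 figure is a local miss rate) and the memory penalty are invented numbers.

```c
#include <stdio.h>

int main(void) {
    double hit_l1 = 1.0,  mr_l1 = 0.04;   /* per L1 access */
    double hit_l2 = 10.0, mr_l2 = 0.25;   /* local miss rate of L2 */
    double mp_l2  = 100.0;                /* main-memory penalty, cycles */

    double amat = hit_l1 + mr_l1 * (hit_l2 + mr_l2 * mp_l2);
    printf("AMAT = %.2f cycles\n", amat); /* 1 + 0.04*(10+25) = 2.4 */
    return 0;
}
```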
2. Reduce the miss rate -- 5 techniques (assume the total cache size is not changed)
a. Larger block size -- decreases the compulsory miss rate (spatial locality). May increase the miss penalty. Certainly increases conflict misses.
b. Larger cache size -- longer hit time and higher cost. Old rule of thumb: 2 × the size => 25% cut in miss rate.
c. Higher associativity -- decreases conflict misses to improve the miss rate. 2:1 rule: a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. Eight-way set associative ≈ fully associative.
d. Way prediction and pseudo-associativity
Reduces conflict misses and maintains the hit speed of a direct-mapped cache. (Column associative.)
e. Compiler optimizations
Reorder instructions and data: 1) Merging arrays, 2) Loop interchange, 3) Loop fusion, 4) Blocking (loop interchange is sketched after this list).
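Loop interchange, named in item 2.e, can be shown directly; the array size below is arbitrary, and the point is only that the interchanged nest walks the row-major array with stride-1 accesses, exploiting spatial locality in the cache.

```c
#include <stdio.h>

#define N 512
static double x[N][N];

void column_major_walk(void) {   /* before: stride-N accesses, poor locality */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 2.0 * x[i][j];
}

void row_major_walk(void) {      /* after interchange: stride-1 accesses */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];
}

int main(void) {
    column_major_walk();
    row_major_walk();
    printf("x[0][0] = %.1f\n", x[0][0]);
    return 0;
}
```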
3. Reduce the MP and MR via parallelism -- 3 techniques
a. Non-blocking caches (reduce MP)
Continue to supply hits while processing read misses. Used in conjunction with out-of-order execution.
b. Hardware prefetching (reduce MR)
Get data from memory before it is actually needed. Relies on having extra memory bandwidth.
c. Compiler prefetching (reduce MR)
Binding prefetch: requests load directly into a register. Non-binding prefetch: loads into the cache.
4. Reduce the time to hit in the cache -- 4 techniques
a. Small and simple caches
Use small, direct-mapped caches.
b. Avoiding address translation
Translation Lookaside Buffer (TLB). Index with the physical portion of the address -- start the tag access in parallel with translation. Dealing with aliases -- page coloring.
c. Pipelined cache access
d. Trace caches
Find a dynamic sequence of instructions to load into a cache block; cache blocks contain dynamic traces of the executed instructions.
Main memory
Higher bandwidth. Baseline timing: 4 cycles to send the address, 56 cycles to access a word, 4 cycles to send a word.
1. Wider main memory: double or quadruple the width of the cache and the memory. With doubled width, a four-word block takes 2 × (4 + 56 + 4) = 128 cycles.
2. Simple interleaved memory: memory chips are organized in banks to read or write multiple words at a time; a four-word block takes 4 + 56 + 4 × 4 = 76 cycles. Number of banks ≥ number of clock cycles to access a word in a bank.
3. Independent memory banks: an independent memory controller is present for every bank. Avoiding memory-bank conflicts -- use a prime number of memory banks, e.g. 2^n - 1.
Memory Technology
Access time ----- the time between when a read is requested and when the desired word arrives. Cycle time ----- the minimum time between requests to memory.
Random Access Memory
1. SRAM -- caches; no refresh.
2. DRAM -- mainstream main memory; refreshed periodically; the address is divided into 2 halves delivered with RAS and CAS (row/column access strobes).
Embedded processors: Read-Only Memory and Flash memory.
Virtual Memory
Provides the illusion of a very large memory.
Virtual memory space - what the program "sees". Physical memory space - what the program runs in.
Pages - fixed-size blocks; segments - variable-size blocks.
1. Place block: fully associative strategy, anywhere.
2. Find block: segmentation - the offset is added to the segment's physical address; paging - the offset is simply concatenated to the page's physical address.
3. Replace block: the LRU block - minimize page faults.
4. Write strategy: always write back, with a dirty bit.
Pseudo-associative cache
AMATx = HTx + MRx × MPx
HTx = HTy + HRx × C1, where HRx is the rate of hits found in the alternative (second-probe) location
HRx = HRz - HRy = (1 - MRz) - (1 - MRy) = MRy - MRz
HTx = HTy + (MRy - MRz) × C1
MRx = MRz; MPx = MPy = MPz
AMATx = HTy + (MRy - MRz) × C1 + MRz × MPy
x = pseudo-associative, y = 1-way set associative (direct-mapped), z = 2-way set associative; C1 = the clock cycles used on a pseudo-associative hit.
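A numeric sketch of the pseudo-associative AMAT expression above (x = pseudo-associative, y = direct-mapped, z = 2-way); the miss rates, the pseudo-hit cost C1 and the miss penalty are made-up values.

```c
#include <stdio.h>

int main(void) {
    double ht_direct = 1.0;    /* hit time of the direct-mapped cache (HTy) */
    double mr_1way   = 0.05;   /* miss rate, direct-mapped (MRy) */
    double mr_2way   = 0.04;   /* miss rate, 2-way set associative (MRz) */
    double c1        = 2.0;    /* extra cycles on a pseudo (second-probe) hit */
    double mp        = 100.0;  /* miss penalty (MPy) */

    double amat = ht_direct + (mr_1way - mr_2way) * c1 + mr_2way * mp;
    printf("pseudo-associative AMAT = %.2f cycles\n", amat); /* 1 + 0.01*2 + 4 = 5.02 */
    return 0;
}
```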