Instruction Level Parallelism and Superscalar Processors

This document discusses instruction level parallelism and superscalar processors. It begins by defining superscalar processors as being able to initiate and execute common instructions simultaneously and independently. It then explains that superscalar processors improve performance by executing scalar operations concurrently in multiple pipelines using multiple functional units. The document outlines some limitations of instruction level parallelism due to data dependencies, procedural dependencies, and resource conflicts. It also discusses design issues like instruction reordering and register renaming techniques used in superscalar processors to mitigate these dependencies and improve parallel execution.

Uploaded by

John Michael Marasigan


Instruction Level Parallelism and Superscalar Processors
Chapter 14
William Stallings
Computer Organization and Architecture
7th Edition
What is Superscalar?
• Common instructions (arithmetic, load/store, conditional branch) can be initiated simultaneously and executed independently
• Applicable to both RISC & CISC
Why Superscalar?
• Most operations are on scalar quantities (see RISC notes)
• Improve these operations by executing them concurrently in multiple pipelines
• Requires multiple functional units
• Requires re-arrangement of instructions
General Superscalar Organization
Superpipelined
• Many pipeline stages need less than half a clock cycle
• Double internal clock speed gets two tasks per external clock cycle
• Superscalar allows parallel fetch and execute
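
The difference between the two approaches can be put into rough numbers. The sketch below uses the usual idealised cycle-count formulas for a k-stage pipeline with no stalls or branches. These formulas and the values of k, n and m are assumptions chosen for illustration and do not come from the slides.

```python
import math

def base_cycles(k, n):
    """k-stage scalar pipeline, n instructions, one issued per cycle."""
    return k + n - 1

def superscalar_cycles(k, n, m):
    """Superscalar of degree m: m instructions issued together each cycle."""
    return k + math.ceil(n / m) - 1

def superpipelined_cycles(k, n, m):
    """Superpipelined of degree m: one instruction issued every 1/m base cycle."""
    return k + (n - 1) / m

k, n, m = 4, 6, 2   # illustrative values, not taken from the slides
print(base_cycles(k, n))               # 9
print(superscalar_cycles(k, n, m))     # 6
print(superpipelined_cycles(k, n, m))  # 6.5
```

Under these assumptions both designs reach the same steady-state throughput, and the superpipelined machine is only slightly behind at start-up.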
Limitations
• Instruction level parallelism: the degree to which instructions can, in theory, be executed in parallel
• To achieve it:
– Compiler based optimisation
– Hardware techniques
• Limited by
– Data dependency
– Procedural dependency
– Resource conflicts
True Data (Write-Read) Dependency
• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3, r1 (r3 := r1;)
• Can fetch and decode second instruction in parallel with first
• Can NOT execute second instruction until first is finished
Procedural Dependency
• Cannot execute instructions after a (conditional) branch in parallel with instructions before a branch
• Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed (cf. RISC)
• This prevents simultaneous fetches
Resource Conflict
• Two or more instructions requiring access to the same resource at the same time
– e.g. functional units, registers, bus
• Similar to true data dependency, but it is possible to duplicate resources
Effect of Dependencies
Design Issues
• Instruction level parallelism
– Some instructions in a sequence are independent
– Execution can be overlapped or re-ordered
– Governed by data and procedural dependency
• Machine Parallelism
– Ability to take advantage of instruction level parallelism
– Governed by number of parallel pipelines
(Re-)ordering instructions
• Order in which instructions are fetched
• Order in which instructions are executed – instruction issue
• Order in which instructions change registers and memory – commitment or retiring
In-Order Issue In-Order Completion
• Issue instructions in the order they occur
• Not very efficient – not used in practice
• May fetch >1 instruction
• Instructions must stall if necessary
An Example
• I1 requires two cycles to execute
• I3 and I4 compete for the same functional unit
• I5 depends on the value produced by I4
• I5 and I6 compete for the same functional unit
In-Order Issue In-Order Completion (Diagram)
In-Order Issue Out-of-Order Completion (Diagram)
In-Order Issue Out-of-Order Completion
• Output (write-write) dependency
– R3:= R2 + R5; (I1)
– R4:= R3 + 1; (I2)
– R3:= R5 + 1; (I3)
– R6:= R3 + 1; (I4)
– I2 depends on result of I1 - data dependency
– If I3 completes before I1, the input for I4 will be wrong - output dependency: I1&I3-I4
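
The effect of the output dependency can be seen by applying the four register writes in two different orders. The sketch below is a toy model: completion order is represented by applying whole instructions one after another, and the initial register values are arbitrary assumptions.

```python
def run(order, regs):
    """Apply each (destination, function) pair in `order` to a copy of `regs`."""
    regs = dict(regs)
    for dest, compute in order:
        regs[dest] = compute(regs)
    return regs

I1 = ("R3", lambda r: r["R2"] + r["R5"])
I2 = ("R4", lambda r: r["R3"] + 1)
I3 = ("R3", lambda r: r["R5"] + 1)
I4 = ("R6", lambda r: r["R3"] + 1)

start = {"R2": 10, "R5": 20}            # arbitrary initial values
correct = run([I1, I2, I3, I4], start)  # program order
hazard = run([I3, I1, I2, I4], start)   # I3's write lands before I1's
print(correct["R6"], hazard["R6"])      # 22 31
```

In the second run R3 ends up holding I1's result rather than I3's, so I4 reads the wrong value.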
Out-of-Order Issue Out-of-Order Completion
• Decouple decode pipeline from execution pipeline
• Can continue to fetch and decode until this pipeline is full
• When a functional unit becomes available an instruction can be executed
• Since instructions have been decoded, processor can look ahead – instruction window
Out-of-Order Issue Out-of-Order Completion (Diagram)
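
A small scheduler can compare the two issue policies on the six-instruction example. This is a simplified sketch, not a reproduction of the diagrams: it models only the execute stage and allows out-of-order completion in both modes. The functional-unit labels (`u1` to `u4`), the issue width of two and the unbounded instruction window are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    unit: str          # hypothetical functional-unit label
    latency: int = 1   # number of execute cycles
    deps: tuple = ()   # names of instructions whose results are needed

def schedule(program, issue_width=2, in_order=True):
    """Return {name: (issue_cycle, finish_cycle)} for the execute stage."""
    finish = {}        # name -> last cycle in which it executes
    unit_free_at = {}  # unit -> first cycle in which it is free again
    pending = list(program)
    times = {}
    cycle = 0
    while pending:
        cycle += 1
        issued = 0
        for ins in list(pending):
            if issued == issue_width:
                break
            deps_done = all(d in finish and finish[d] < cycle for d in ins.deps)
            unit_free = unit_free_at.get(ins.unit, 1) <= cycle
            if deps_done and unit_free:
                finish[ins.name] = cycle + ins.latency - 1
                unit_free_at[ins.unit] = cycle + ins.latency
                times[ins.name] = (cycle, finish[ins.name])
                pending.remove(ins)
                issued += 1
            elif in_order:
                break  # a stalled instruction blocks everything behind it
    return times

EXAMPLE = [
    Instr("I1", "u1", latency=2),
    Instr("I2", "u2"),
    Instr("I3", "u3"),
    Instr("I4", "u3"),                 # competes with I3
    Instr("I5", "u4", deps=("I4",)),   # needs the value produced by I4
    Instr("I6", "u4"),                 # competes with I5
]

for mode in (True, False):
    result = schedule(EXAMPLE, in_order=mode)
    print("in-order" if mode else "out-of-order", result)
```

In this model, out-of-order issue lets I6 start while I4 and I5 are still waiting, which shortens the execute phase by one cycle.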
Antidependency
• Read-write dependency: I2-I3
– R3:=R3 + R5; (I1)
– R4:=R3 + 1; (I2)
– R3:=R5 + 1; (I3)
– R7:=R3 + R4; (I4)
– I3 should not execute before I2 starts as I2 needs a value in R3 and I3 changes R3
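
The three kinds of data dependency can be found mechanically from the registers each instruction reads and writes. The sketch below classifies every pair in the four-instruction sequence above. The tuple encoding of the instructions and the `classify` helper are hypothetical names introduced for illustration.

```python
from itertools import combinations

# (name, destination register, source registers) for the sequence above
PROGRAM = [
    ("I1", "R3", {"R3", "R5"}),
    ("I2", "R4", {"R3"}),
    ("I3", "R3", {"R5"}),
    ("I4", "R7", {"R3", "R4"}),
]

def classify(program):
    """List (kind, earlier, later, register) for each dependent pair."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(program), 2):
        (na, da, sa), (nb, db, sb) = a, b
        # Registers rewritten strictly between the two instructions.
        between = {d for _, d, _ in program[i + 1:j]}
        if da in sb and da not in between:
            found.append(("true (write-read)", na, nb, da))
        if db in sa and db not in between:
            found.append(("anti (read-write)", na, nb, db))
        if da == db and da not in between:
            found.append(("output (write-write)", na, nb, da))
    return found

for dep in classify(PROGRAM):
    print(dep)
```

Besides the I2-I3 antidependency named on the slide, this pairwise check also reports an I1-I3 antidependency, because I1 reads R3 before I3 overwrites it.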
Register Renaming
• Output and antidependencies occur because register contents may not reflect the correct program flow
• May result in a pipeline stall
• The usual reason is storage conflict
• Registers can be allocated dynamically
Register Renaming example
• R3b:=R3a + R5a (I1)
• R4b:=R3b + 1 (I2)
• R3c:=R5a + 1 (I3)
• R7b:=R3c + R4b (I4)
• A name without a suffix (a, b, c) refers to the logical register
• A name with a suffix refers to the hardware register allocated to it
• Removes antidependency I2-I3 and output dependency I1&I3-I4
• Needs extra registers
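
The renaming in this example can be reproduced with a few lines of code. This is a minimal sketch, assuming that every write to a logical register allocates a fresh hardware copy and that copies are labelled a, b, c and so on. The `rename` helper and the string encoding of the instructions are hypothetical, and the letter suffixes limit the sketch to 26 copies per register.

```python
import re
from string import ascii_lowercase

def rename(program):
    """Give every register write a fresh suffix; reads use the current copy."""
    version = {}  # logical register -> index of its current hardware copy

    def current(reg):
        return f"{reg}{ascii_lowercase[version.setdefault(reg, 0)]}"

    renamed = []
    for dest, expr in program:
        # Sources read whichever copy is current before this instruction writes.
        new_expr = re.sub(r"R\d+", lambda m: current(m.group()), expr)
        version[dest] = version.get(dest, 0) + 1
        renamed.append((current(dest), new_expr))
    return renamed

program = [
    ("R3", "R3 + R5"),   # I1
    ("R4", "R3 + 1"),    # I2
    ("R3", "R5 + 1"),    # I3
    ("R7", "R3 + R4"),   # I4
]
for dest, expr in rename(program):
    print(f"{dest} := {expr}")
```

After renaming, I3 writes R3c while I2 still reads R3b, so the two instructions no longer share a storage location.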
Machine Parallelism
• Duplication of Resources
• Out of order issue
• Renaming
• Not worth duplicating functions without register renaming
• Need instruction window large enough (more than 8)
Speedups Without Procedural Dependencies (with out-of-order issue)
Branch Prediction
• Intel 80486 fetches both next sequential instruction after branch and branch target instruction
• Gives two cycle delay if branch taken (two decode cycles)
RISC - Delayed Branch
• Calculate result of branch before unusable instructions are pre-fetched
• Always execute single instruction immediately following branch
• Keeps pipeline full while fetching new instruction stream
• Not as good for superscalar
– Multiple instructions need to execute in delay slot
• Revert to branch prediction
Superscalar Execution
Pentium 4
• 80486 - CISC
• Pentium – some superscalar components
– Two separate integer execution units
• Pentium Pro – Full blown superscalar
• Subsequent models refine & enhance superscalar design
Pentium 4 Operation
• Fetch instructions from memory in order of static program
• Translate instruction into one or more fixed length RISC instructions (micro-operations)
• Execute micro-ops on superscalar pipeline
– micro-ops may be executed out of order
• Commit results of micro-ops to register set in original program flow order
• Outer CISC shell with inner RISC core
• Inner RISC core pipeline at least 20 stages
– Some micro-ops require multiple execution stages
– cf. five stage pipeline on Pentium
Pentium 4 Pipeline
Stages 1-9
• 1-2 (BTB & I-TLB, F/t): Fetch instructions, static branch prediction, split into 4 micro-ops
• 3-4 (TC): Dynamic branch prediction, sequencing micro-ops
• 5: Feed into out-of-order execution logic
• 6 (R/a): Allocating resources (registers)
• 7-8 (R/a): Renaming registers and removing false dependencies
• 9 (mopQ): Re-ordering micro-ops
Stages 10-20
• 10-14 (Sch): Scheduling (FIFO) and dispatching micro-ops towards available execution unit
• 15-16 (RF): Storing pending operations
• 17 (ALU, Fop): Execution of micro-ops
• 18 (ALU, Fop): Compute flags
• 19 (ALU): Branch check – feedback to stages 3-4
• 20: Retiring instructions
Pentium 4 Block Diagram
