0% found this document useful (0 votes)

67 views10 pages

COE301 Final Solution 162

1. The pipeline has 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory access (MEM), and Write Back (WB). 2. In the ID stage, the instruction is decoded and the operands are read from the register file. 3. In the EX stage, ALU operations are performed on register operands or immediate values. Memory addresses are also calculated. 4. In the MEM stage, data is read or written from data memory for load/store instructions. 5. In the WB stage, the result is written back to the register file.

Uploaded by

Karim Ibrahim

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

67 views10 pages

COE301 Final Solution 162

Uploaded by

Karim Ibrahim

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

COE 301 / ICS 233

Computer Organization

Final Exam – Spring 2017

Friday, May 19, 2017

7:30 – 10 AM

Computer Engineering Department

College of Computer Sciences & Engineering
King Fahd University of Petroleum & Minerals

Student Name: SOLUTION

Student ID:

Q1 / 14 Q2 / 16
Q3 / 15 Q4 / 18
Q5 / 10 Q6 / 20
Q7 / 10
Total / 103

Notes:
• This is a closed book and closed notes exam.
• Mobile phones and tablets are not allowed and should be switched off.
• Using unauthorized information on an exam is a severe violation of
academic honesty. Detected cases will receive a failing grade.
Page 2 of 10

Question 1: True or False

(14 pts) Explain the reason why it is true or false for a full mark.

a) (2 pts) A pipelined datapath must have separate instruction and data memories
because the format of instructions is different from the format of data.

False. It is because the instruction and data memories should be accessed in

parallel during the same cycle. Otherwise, there will be a structural hazard.

b) (2 pts) Allowing ALU instructions to write back their result in the 4th stage rather
than the 5th stage, improves the performance of a MIPS 5-stage pipeline.

False, it will cause a structural hazard. Pipeline performance depends on the rate
of instruction completion, not on the latency of individual instructions.

c) (2 pts) In the MIPS 5-stage pipeline, some but not all RAW data hazards can be
eliminated by forwarding.

True, the Load instruction has a delay that cannot be eliminated by forwarding.

d) (2 pts) Name dependences such as Write-After-Read and Write-After-Write do not

cause any hazard in the 5-stage pipeline and do not require special handling.

True, because the register write-back is done in the last stage of the pipeline and
is done in program order.

e) (2 pts) A directly-mapped cache does not need to store tags because it is directly
indexed. However, a fully-associative cache requires tags because it is not indexed.

False. All caches require tags for block identification, whether the cache is
directly-mapped or fully-associative.

f) (2 pts) When comparing a set-associative with a directly-mapped cache with the same
capacity (data size), the set-associative cache decreases the cache miss rate but
increases the hit time.
True. The set-associative cache reduces the cache miss rate because it provides
more flexibility about block placement and less conflict misses. The hit time is
increased because a multiplexer is used to select a block within the set.

g) (2 pts) Each block in a write-through cache has a Modified bit to indicate whether the
block is modified or not.

False, the modified bit is used by Write-Back caches only to indicate that the
block is modified.
Page 3 of 10

Question 2: Single-Cycle / Multi-Cycle / Pipelined Performance

(16 pts) Compare the performance of a single-cycle / multi-cycle / pipelined processor.
The delay times are as follows:

Instruction memory access time = 500 ps Data memory access time = 500 ps
Instruction Decode and Register read = 300 ps Register write = 200 ps
ALU delay = 300 ps
Ignore the other delays in the multiplexers, wires, etc. Assume the following instruction
mix: 35% ALU, 20% load, 10% store, 25% branch, and 10% jump.
a) (5 pts) Compute the delay for each instruction class for the single-cycle processor.

Instruction Instruction Decode and Data Write Total

ALU
Class Memory Register Read Memory Back Delay
ALU 500 300 300 200 1300
Load 500 300 300 500 200 1800
Store 500 300 300 500 1600
Branch 500 300 300 1100
Jump 500 300 800

b) (2 pts) Compute the clock cycle for the single-cycle processor.

Clock cycle = longest delay = 1800 ps

c) (2 pts) Compute the clock cycle for the multi-cycle processor.

Clock cycle for multi-cycle = max(500, 300, 200) = 500 ps

d) (3 pts) Compute the average CPI for the multi-cycle processor.

Average CPI = 0.35×4 + 0.2×5 + 0.1×4 + 0.25×3 + 0.1×2 = 3.75

e) (2 pts) What is the speedup factor of the multi-cycle over the single-cycle processor?

Speedup = 1800 × 1 / (500 × 3.75) = 1800 / 1875 = 0.96 (Multi-cycle is slower)

f) (2 pts) What is the speedup factor of the pipelined over the single-cycle processor?

Speedup = 1800 / 500 = 3.6 (Pipelined processor is faster)

Page 4 of 10

Question 3: Performance of a MIPS program

(15 pts) The following code fragment processes two single-precision floating-point
arrays A and B, and modifies the array B. Each array consists of 500 single-precision
floats. The addresses of A and B are stored in $a0 and $a1 respectively. Note that there
is an outer loop and an inner loop in the MIPS assembly language code.

li $a2, 500 # $a2 = 500

move $t0, $a0 # $t0 = address of array A
move $t4, $zero # $t4 = 0
mtc1 $zero, $f4 # $f4 = 0 (move to coprocessor 1)

outer: # outer loop

lwc1 $f0, ($t0) # load single-precision float
add.s $f4, $f4, $f0
move $t1, $a1 # $t1 = address of Array B
move $t5, $zero # $t5 = 0

inner: # inner loop

lwc1 $f1, ($t1) # load single-precision float
mul.s $f2, $f1, $f0
div.s $f3, $f2, $f4
swc1 $f3, ($t1) # store single-precision float
addi $t1, $t1, 4
addi $t5, $t5, 1
bne $t5, $a2, inner # end of inner loop

addi $t0, $t0, 4

addi $t4, $t4, 1
bne $t4, $a2, outer # end of outer loop

The above code is executed on a processor with a 2 GHz clock that requires the
following number of cycles for each instruction:

Instruction Cycles

li, move, mtc1, addi 3

lwc1, swc1 5

add.s, mul.s 10

div.s 20

bne 4
Page 5 of 10

a) (3 pts) Count the total number of instructions executed by the above code fragment,
including those executed outside the outer loop.
b) (6 pts) How many cycles does it take to execute the above code?
c) (2 pts) What is the execution time in nanoseconds?
d) (2 pts) What is the average CPI for the above code?
e) (2 pts) What is the MIPS rate for the above code?

Solution:
a) Total Instructions = 4 (outside outer loop) +

7 (outer loop) × 500 +

7 (inner loop) × 500 × 500 = 1,753,504

b) Clock Cycles = 4 × 3 cycle (li, move, mtc1 outside outer loop) +

[1 × 5 (lwc1) + 10 (add.s) + 4 × 3 (move, addi) + 4 (bne) cycles] × 500 iterations +
[2 × 5 (lwc1, swc1) + 10 (mul.s) + 20 (div.s) + 2 × 3 (addi) + 4 (bne)] × 500 × 500
= 12 + 31 × 500 + 50 × 500 × 500 = 12,515,512 cycles

c) Clock cycle = 1 / 2 GHz = 0.5 ns

CPU Execution Time = 12,515,512 cycles × 0.5 ns = 6,257,756 ns

d) Average CPI = 12,515,512 cycles / 1,753,504 instructions = 7.137

e) MIPS Rate = 1,753,504 / (6,257,756 × 10-9 × 106) = 280.02 MIPS

Page 6 of 10

Question 4: MIPS 5-Stage Pipeline

(18 pts) The following is a 5-stage pipeline for the MIPS processor that implements a subset
of the MIPS instruction set, and that supports forwarding and pipeline stall.
IF ID EX MEM WB

ExtOp

Imm16 32

Imm
Ext
32 32 ALU result
0 A
Register File

Rs BusA 1
RA L Address

R
A
2
Instruction

Rt 3 1
U Data 0
RB

Data
BusB
PC

0 Memory
0 32
1 32 32 Data_out 1

D
B
RW BusW 2 Data_in
3
32

Rd2

Rd3

Rd4
0
1
Rd
RegDst
clk
ForwardB ForwardA
Disable PC

Disable IR

Hazard Detect
Rs
Forward
Rt
and Stall
ALUSrc MemRd
RegWr ALUOp MemRd RegWr MemWr RegWr
Stall WBdata
Op
Main Control Signals
& ALU 0
EX

func

MEM
Control Bubble = 0 1

WB
Consider the following MIPS code that is being executed on the above 5-stage pipeline:

I1: lw $t1, 8($t0)

I2: addi $t2, $t1, 2
I3: or $t3, $t1, $t2
I4: lw $t4, -4($t3)
I5: sub $t5, $t3, $t4

a) (8 pts) Complete the following table showing the timing of the above code on the 5-stage
pipeline MIPS processor (IF, ID, EX, MEM, WB). Draw an arrow showing forwarding
between the stage that provides the data and the stage that receives the data. Show all stall
cycles by placing an X in the box to represent a stall cycle.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

I1: LW IF ID EX MEM WB

I2: ADDI IF X ID EX - WB

I3: OR IF ID EX - WB

I4: LW IF ID EX MEM WB

I5: SUB IF X ID EX - WB
Page 7 of 10
b) (10 pts) Complete the following table pertaining to the control signals needed for the
execution of the following MIPS code on the 5-stage pipeline MIPS processor during the
first 9 clock cycles. Use ? for any control signal that is unknown during a given clock
cycle and X for a don't care. Assume that instruction I1:LW is being fetched during clock
cycle 1. The control signals needed in the first clock cycle are shown as an example.

I1: lw $t1, 8($t0)

I2: addi $t2, $t1, 2
I3: or $t3, $t1, $t2
I4: lw $t4, -4($t3)
I5: sub $t5, $t3, $t4

IF ID EX MEM WB

MEM.WBdata (0=ALU,1=MEM)
ExtOp(0=zero,1=sign)
Disable PC (1=disable)

Disable IR (1=disable)

RegDst (0=Rt, 1=Rd)

ALUSrc (0=B, 1=Imm)

ALUOp (add, sub, …)

Clock
Cycle

MEM.RegWr

MEM.MemRd

MEM.MemWr
ForwardA

ForwardB

EX.RegWr

EX.MemRd

WB.RegWr
Stall

1 0 0 ? ? ? ? ? ? ? ? ? ? ? ? ? ?

2 0 0 0 1 0 0 0 ? ? ? ? ? ? ? ? ?

3 1 1 0 1 1 0 1 1 1 add 1 ? ? ? ? ?

4 0 0 0 1 2 0 0 0 0 0 0 1 1 0 1 ?

5 0 0 1 X 3 1 0 1 1 add 0 0 0 0 0 1

6 0 0 0 1 1 0 0 1 0 or 0 1 0 0 0 0

7 1 1 1 X 2 1 1 1 1 add 1 1 0 0 0 1

8 0 0 1 X 3 2 0 0 0 0 0 1 1 0 1 1

9 ? ? ? ? ? ? ? 1 0 sub 0 0 0 0 0 1
Page 8 of 10

Question 5: MIPS 4-Stage Pipeline

(10 pts) Given that the load and store instructions use register-indirect addressing, and
always have a zero offset, there is NO need for the ALU to compute the memory address.
The load and store instructions will have the following format, where Rs is the register
that contains the memory address.

LW Rt, (Rs) # No immediate constant used

SW Rt, (Rs) # No immediate constant used

Accordingly, the MIPS pipeline can be modified to have only 4 stages: IF, ID, EX/MEM,
and WB, as shown below. The modified pipeline MIPS processor eliminates the load
delay and the RAW data hazard due to a load instruction.

a) (4 pts) Provide the control signals values needed for executing the LW instruction.
b) (6 pts) Write the if-statement that will be used for generating the ForwardA signal.

Solution a) Control signals for LW

RegDst ExtOp ALUSrc ALUop MemRd MemWr WBdata RegWr

LW 0=Rt X X X 1 0 1 1

Solution b) ForwardA signal

if (Rs != 0 and Rs == Rd2 and EX/MEM.RegWr == 1) ForwardA = 1;

else if (Rs != 0 and Rs == Rd3 and WB.RegWr == 1) ForwardA = 2;
else ForwardA = 0;
Page 9 of 10

Question 6: Cache Memory

(20 pts) Consider a 2-way set-associative cache with 2048 blocks, where each block has a
size of 64 bytes.
a) (4 pts) Compute the number of sets and the cache data size (do NOT include the valid,
modified, and tag bits).

Number of Sets = 2048 blocks / 2 blocks per set = 1024 sets

Cache data size = 2048 × 64 bytes = 128 KiB = 131,072 Bytes

b) (3 pts) If the memory address consists of 32 bits, find the number of tag bits, index bits,
and block offset bits.

Block offset bits = 6 bits

Index bits = 10 bits (for the 1024 sets)
Tab bits = 32 – 10 – 6 = 16 bits

c) (3 pts) Given that the cache is a write-back cache, compute the total number of bits
required to store the valid, modified, and tag bits in the cache.

Total Valid bits = 2048 bits

Total Modified bits = 2048 bits
Total Tag bits = 2048 × 16 bits = 32,768 bits
Total Valid + Modified + Tag Bits = 36,864 bits

d) (10 pts) Starting with an empty cache, show the tag, index, and way (block 0 or 1) for
each address and indicate whether a hit or a miss. The replacement policy is FIFO.

Address Tag Index Way Hit / Miss

0x001F3A70 0x001F 0x0E9 0 Miss

0x002E3A74 0x002E 0x0E9 1 Miss

0x001F3A80 0x001F 0x0EA 0 Miss

0x001F3A78 0x001F 0x0E9 0 Hit

0x002E3A78 0x002E 0x0E9 1 Hit

0x007F3A80 0x007F 0x0EA 1 Miss

0x001F3A84 0x001F 0x0EA 0 Hit

0x001F3A74 0x001F 0x0E9 0 Hit

Page 10 of 10

Question 7: Cache Performance

(10 pts) A processor runs at 3.0 GHz and has a CPI=1.5 for a perfect cache (i.e. without
including the stall cycles due to cache misses). Assume that load and store instructions
are 25% of the instructions. The processor has an I-cache with a 5% miss rate and a D-
cache with 4% miss rate. The hit time is 1 clock cycle for both caches. Assume that the
time required to transfer a block of data from the main memory to the cache, i.e. miss
penalty, is 40 ns.

a) (5 pts) Compute the number of stall cycles per instruction.

Combined misses per instruction = 5% + 25% × 4% = 0.06

Miss Penalty = 40 ns × 3 GHz = 120 cycles
Memory stall cycles per instruction = 0.06 × 120 = 7.2 cycles

b) (2 pts) Compute the overall CPI.

Overall CPI = 1.5 + 7.2 = 8.7 cycles

c) (3 pts) Compute the average memory access time (AMAT) in ns.

Combined Miss Rate = Combined misses per instruction / 1.25 access per instruction
Combined Miss Rate = 0.06 / 1.25 = 0.048 misses per access

AMAT = 1 + 0.048 × 120 cycles = 6.76 cycles = 6.76 cycles / 3 GHz = 2.253 ns

CMPE361-Final - Sanple
No ratings yet
CMPE361-Final - Sanple
8 pages
Indian Institute of Technology, Kharagpur: Mid-Spring Semester 2021-22
No ratings yet
Indian Institute of Technology, Kharagpur: Mid-Spring Semester 2021-22
4 pages
Illinois Exam2 Practice Solfa08
No ratings yet
Illinois Exam2 Practice Solfa08
4 pages
Midsem Final
No ratings yet
Midsem Final
3 pages
BFE Final Organization Fall 2014 Answer
No ratings yet
BFE Final Organization Fall 2014 Answer
8 pages
ECE 452 Spring 2010 Midterm Exam
No ratings yet
ECE 452 Spring 2010 Midterm Exam
9 pages
L-3rr-l/CSE Date:: Iw Iw
No ratings yet
L-3rr-l/CSE Date:: Iw Iw
30 pages
CENG400 Midterm Fall 2015
No ratings yet
CENG400 Midterm Fall 2015
10 pages
PS4 Solution
No ratings yet
PS4 Solution
6 pages
MIPS Pipeline Homework CS433 Fall 2007
No ratings yet
MIPS Pipeline Homework CS433 Fall 2007
3 pages
CCEE 213 - 2006 - 2007 - II - Final
No ratings yet
CCEE 213 - 2006 - 2007 - II - Final
10 pages
Computer Architecture Exam 2012
No ratings yet
Computer Architecture Exam 2012
2 pages
350 Exam 2 Spring 2024
No ratings yet
350 Exam 2 Spring 2024
7 pages
F10 E1 Solution
No ratings yet
F10 E1 Solution
5 pages
111 Computer Organization - Final
No ratings yet
111 Computer Organization - Final
4 pages
MCQs
No ratings yet
MCQs
19 pages
Coa Applied
No ratings yet
Coa Applied
13 pages
Computer Architecture Exam 2023
No ratings yet
Computer Architecture Exam 2023
3 pages
CSGC 342
No ratings yet
CSGC 342
7 pages
Exam2 Practice Sol
No ratings yet
Exam2 Practice Sol
6 pages
Advanced Computer Architecture: BY Dr. Radwa M. Tawfeek
No ratings yet
Advanced Computer Architecture: BY Dr. Radwa M. Tawfeek
50 pages
CENG400-Midterm-Fall 2014
No ratings yet
CENG400-Midterm-Fall 2014
9 pages
Coss 2
No ratings yet
Coss 2
2 pages
BCN1043 Computer Arc & Org S1 0119
No ratings yet
BCN1043 Computer Arc & Org S1 0119
6 pages
Past Papers
No ratings yet
Past Papers
7 pages
Mid 2
No ratings yet
Mid 2
8 pages
CA Fall 2022 Final Exam
No ratings yet
CA Fall 2022 Final Exam
6 pages
Question 1 (50 Points) Pipelining
No ratings yet
Question 1 (50 Points) Pipelining
3 pages
Computer Systems Architecture Exam Solutions
100% (1)
Computer Systems Architecture Exam Solutions
8 pages
컴구 2021 1 중간고사답안 김성태
No ratings yet
컴구 2021 1 중간고사답안 김성태
23 pages
CO ETE Solution 2024 With Marking Scheme
No ratings yet
CO ETE Solution 2024 With Marking Scheme
10 pages
hw2 2025
No ratings yet
hw2 2025
6 pages
HPC Question Bank
No ratings yet
HPC Question Bank
5 pages
03 - CPU Memory Program Execution Assembly - Exercise Sheet (Solutions) - 1504219614
No ratings yet
03 - CPU Memory Program Execution Assembly - Exercise Sheet (Solutions) - 1504219614
6 pages
MIPS & Cache Memory Study Guide
No ratings yet
MIPS & Cache Memory Study Guide
28 pages
Answer:: Remark
No ratings yet
Answer:: Remark
72 pages
F001 English CLC f1
No ratings yet
F001 English CLC f1
6 pages
CENG400-Final-Fall 2015
No ratings yet
CENG400-Final-Fall 2015
10 pages
CAO Part - C QB With ANS
No ratings yet
CAO Part - C QB With ANS
3 pages
IT3030E Exercise Chap5 v2 Ans
No ratings yet
IT3030E Exercise Chap5 v2 Ans
11 pages
Your Name:: Final Exam
No ratings yet
Your Name:: Final Exam
9 pages
Hpca Pyqp
No ratings yet
Hpca Pyqp
17 pages
Inf2c Cs 201314
No ratings yet
Inf2c Cs 201314
10 pages
2005 Computer Architecture Solutions
No ratings yet
2005 Computer Architecture Solutions
11 pages
Assignment 3 - 2024 HPC
No ratings yet
Assignment 3 - 2024 HPC
2 pages
Sheet 9
No ratings yet
Sheet 9
12 pages
Cs433 Sp12 Midterm Sol
No ratings yet
Cs433 Sp12 Midterm Sol
9 pages
Inf2c Cs 201314 Resit
No ratings yet
Inf2c Cs 201314 Resit
7 pages
Ca Mid1 2017
No ratings yet
Ca Mid1 2017
9 pages
Instructions: Csce 212: Final Exam Spring 2009
No ratings yet
Instructions: Csce 212: Final Exam Spring 2009
5 pages
2023 Contoh Soalan Computer Architecture and Organization
No ratings yet
2023 Contoh Soalan Computer Architecture and Organization
7 pages
Cs433 Fa20 Hw3 Solution
No ratings yet
Cs433 Fa20 Hw3 Solution
15 pages
Archi Second 2013 2014 JCE
No ratings yet
Archi Second 2013 2014 JCE
2 pages
CS 4290/6290: High-Performance Computer Architecture Spring 2004 Midterm Quiz
No ratings yet
CS 4290/6290: High-Performance Computer Architecture Spring 2004 Midterm Quiz
3 pages
Assignment Solution Week11
100% (1)
Assignment Solution Week11
5 pages
April 2005 Withsloution
No ratings yet
April 2005 Withsloution
21 pages
Inf2c Cs 201213
No ratings yet
Inf2c Cs 201213
9 pages
Ultrasonic Sensor HC-SR04 Arduino Guide
No ratings yet
Ultrasonic Sensor HC-SR04 Arduino Guide
6 pages
Iphone Price List
No ratings yet
Iphone Price List
1 page
Dental Materials Foundations and Applications 11th Edition Powers Digital Access
No ratings yet
Dental Materials Foundations and Applications 11th Edition Powers Digital Access
405 pages
Unit 2-Data Storage and Cloud Computing
No ratings yet
Unit 2-Data Storage and Cloud Computing
87 pages
Aci Certification Candidate Handbook 2021
No ratings yet
Aci Certification Candidate Handbook 2021
37 pages
MKI2 ComPro New ENG-1
No ratings yet
MKI2 ComPro New ENG-1
16 pages
2018 Alliance Enhancements: Quick Reference Guide
No ratings yet
2018 Alliance Enhancements: Quick Reference Guide
1 page
Amd 3 Pro 4350G
No ratings yet
Amd 3 Pro 4350G
1 page
Dell Inspiron15 UserManual
No ratings yet
Dell Inspiron15 UserManual
24 pages
Level Pro Service Menu
No ratings yet
Level Pro Service Menu
2 pages
Study and Performance Analysis of IIR Filter For Noise Diminution in Digital Signal Using MATLAB
No ratings yet
Study and Performance Analysis of IIR Filter For Noise Diminution in Digital Signal Using MATLAB
7 pages
Noun Modifier
No ratings yet
Noun Modifier
129 pages
Safwan Updated Resume
No ratings yet
Safwan Updated Resume
1 page
R00 - 16-20MVA - 33-11kV - Datasheet
No ratings yet
R00 - 16-20MVA - 33-11kV - Datasheet
3 pages
Oracle Order Import Troubleshooting
100% (1)
Oracle Order Import Troubleshooting
2 pages
Risk Management
80% (5)
Risk Management
26 pages
Steps On How To Recreate The Logo
No ratings yet
Steps On How To Recreate The Logo
5 pages
Se 4
No ratings yet
Se 4
5 pages
1relation Vs Function Wsh1
No ratings yet
1relation Vs Function Wsh1
4 pages
Course Book July Dec 2020 PART1 PDF
100% (1)
Course Book July Dec 2020 PART1 PDF
140 pages
Forex Insider User Manual PDF
No ratings yet
Forex Insider User Manual PDF
13 pages
Large Recip Intellisys® SG: Operators / Instruction Manual
No ratings yet
Large Recip Intellisys® SG: Operators / Instruction Manual
24 pages
Here's A VBA Code Snippet That Perf
No ratings yet
Here's A VBA Code Snippet That Perf
2 pages
KG-819 Wouxun Manual
No ratings yet
KG-819 Wouxun Manual
26 pages
12 PAS Essentials PSM
No ratings yet
12 PAS Essentials PSM
27 pages
VIPA System 500S: SPEED7 - CPU - 515-2AJ02 - Manual
No ratings yet
VIPA System 500S: SPEED7 - CPU - 515-2AJ02 - Manual
154 pages
AutoCAD 2016 Templates & Sheet Sets
No ratings yet
AutoCAD 2016 Templates & Sheet Sets
3 pages
Mda Tda Manual
No ratings yet
Mda Tda Manual
31 pages
How To Manage Windows
No ratings yet
How To Manage Windows
6 pages
Implementation of Digital Circuit Using VHDL
No ratings yet
Implementation of Digital Circuit Using VHDL
15 pages

COE301 Final Solution 162

Uploaded by

COE301 Final Solution 162

Uploaded by

COE 301 / ICS 233

Final Exam – Spring 2017

Friday, May 19, 2017

Computer Engineering Department

Student Name: SOLUTION

Question 1: True or False

False. It is because the instruction and data memories should be accessed in

d) (2 pts) Name dependences such as Write-After-Read and Write-After-Write do not

Question 2: Single-Cycle / Multi-Cycle / Pipelined Performance

Instruction Instruction Decode and Data Write Total

b) (2 pts) Compute the clock cycle for the single-cycle processor.

Clock cycle = longest delay = 1800 ps

c) (2 pts) Compute the clock cycle for the multi-cycle processor.

Clock cycle for multi-cycle = max(500, 300, 200) = 500 ps

d) (3 pts) Compute the average CPI for the multi-cycle processor.

Average CPI = 0.35×4 + 0.2×5 + 0.1×4 + 0.25×3 + 0.1×2 = 3.75

Speedup = 1800 × 1 / (500 × 3.75) = 1800 / 1875 = 0.96 (Multi-cycle is slower)

Speedup = 1800 / 500 = 3.6 (Pipelined processor is faster)

Question 3: Performance of a MIPS program

li $a2, 500 # $a2 = 500

outer: # outer loop

inner: # inner loop

addi $t0, $t0, 4

li, move, mtc1, addi 3

7 (outer loop) × 500 +

7 (inner loop) × 500 × 500 = 1,753,504

b) Clock Cycles = 4 × 3 cycle (li, move, mtc1 outside outer loop) +

c) Clock cycle = 1 / 2 GHz = 0.5 ns

d) Average CPI = 12,515,512 cycles / 1,753,504 instructions = 7.137

e) MIPS Rate = 1,753,504 / (6,257,756 × 10-9 × 106) = 280.02 MIPS

Question 4: MIPS 5-Stage Pipeline

I1: lw $t1, 8($t0)

I1: lw $t1, 8($t0)

RegDst (0=Rt, 1=Rd)

ALUSrc (0=B, 1=Imm)

ALUOp (add, sub, …)

Question 5: MIPS 4-Stage Pipeline

LW Rt, (Rs) # No immediate constant used

Solution a) Control signals for LW

RegDst ExtOp ALUSrc ALUop MemRd MemWr WBdata RegWr

Solution b) ForwardA signal

if (Rs != 0 and Rs == Rd2 and EX/MEM.RegWr == 1) ForwardA = 1;

Question 6: Cache Memory

Number of Sets = 2048 blocks / 2 blocks per set = 1024 sets

Block offset bits = 6 bits

Total Valid bits = 2048 bits

Address Tag Index Way Hit / Miss

0x001F3A70 0x001F 0x0E9 0 Miss

0x002E3A74 0x002E 0x0E9 1 Miss

0x001F3A80 0x001F 0x0EA 0 Miss

0x001F3A78 0x001F 0x0E9 0 Hit

0x002E3A78 0x002E 0x0E9 1 Hit

0x007F3A80 0x007F 0x0EA 1 Miss

0x001F3A84 0x001F 0x0EA 0 Hit

0x001F3A74 0x001F 0x0E9 0 Hit

Question 7: Cache Performance

a) (5 pts) Compute the number of stall cycles per instruction.

Combined misses per instruction = 5% + 25% × 4% = 0.06

b) (2 pts) Compute the overall CPI.

Overall CPI = 1.5 + 7.2 = 8.7 cycles

c) (3 pts) Compute the average memory access time (AMAT) in ns.

You might also like