HW4S24 - Sol

University of Southern California

Department of Electrical Engineering


EE557 Spring 2024
Instructor: Murali Annavaram
Homework #4, Due: 11:59PM, Tuesday, April 10th
TOTAL SCORE: 127

Problem 1 [Cache Protocols] (20 pts.)


Assume a shared-memory multiprocessor with a number of processor/private-cache units
connected by a shared single-transaction bus. The cache access trace to execute in this problem is
given below (the notation Ri/X and Wi/X means a read and a write operation, respectively, by
processor/cache unit i to block X). The first read request to X, by processor/cache 1, will have
to be serviced by the next level.
R1/X, W1/X, R2/X, W3/X, W2/X, W3/X, W3/X, R3/X, W3/X, R1/X, R2/X
We are going to explore three write-back cache coherence protocols: MSI-invalidate,
MSI-update, and MESI-invalidate. Using the timing model for protocol actions given in Table
1, compare the performance (total number of cycles taken to execute the given trace) of the three
cache coherence protocols and explain what makes one protocol more beneficial than the others.
Assume that all caches are initially empty and that a flush is done before a new access is made. One
snoop action is needed for multiple bus accesses by the same transaction.
Table 1

Request type                                      Time to carry out protocol action
Read/write hit                                    1 cycle
Bus Read                                          30 cycles
Read request serviced by next level               80 cycles
Bus upgrade request                               10 cycles (including read/write hit)
Bus update                                        25 cycles (including read/write hit)
Read block and invalidate other copies (BusRdX)   35 cycles
Flush                                             0 cycles
Snoop action                                      5 cycles

Solution:

MSI-invalidate (Figure 5.10):


Every bus transaction involves a snoop action. The first request has to be serviced by the
next level. Therefore, it first issues a snoop action (5 cycles) and gets the block from the
next level (80 cycles); in total, the block is installed after 85 cycles. The second request
is a bus upgrade (10 cycles) plus a snoop action (5 cycles), for a total of 15 cycles. The
third request reads X, which is serviced by P1's cache since it holds the only copy, and
takes 35 cycles (read from another processor + snoop). The fourth request writes X from P3
and takes 120 cycles (read request serviced by next level + BusRdX + snoop). When processor
2 tries to write X at the fifth request, W2/X takes 40 cycles (BusRdX + snoop). The
following write request by processor 3 also takes 40 cycles (BusRdX + snoop). The next
three requests, W3/X, R3/X, and W3/X, take 1 cycle each (cache hit). At the 10th request,
the only valid copy of block X is in processor 3's local cache. Therefore, the request is
serviced by the private cache of processor 3 and takes 35 cycles (BusRd, read from another
processor, + snoop). While the block is supplied to processor 1, memory is also updated.
Thus, when the 11th request is issued, there are three valid copies of X: in processor 1,
processor 3, and memory. As MSI does not support ownership, there is no way to select one
processor to forward the copy from its private cache to processor 2. Therefore, the 11th
request is serviced by the next level and takes 85 cycles.
In summary, the total number of cycles is 85 + 15 + 35 + 120 + 40 + 40 + 1 + 1 + 1 + 35 +
85 = 458 cycles.
MSI-update (Figure 7.10 (b)):
One snoop action covers multiple bus accesses by the same transaction. A PrWr(S) from the
I state takes two transitions, I->S and S->S. The protocol has the following events for the
memory accesses:
R1/X: BusRd (read request serviced by next level + snoop action = 85); W1/X: write hit (1);
R2/X: BusRd (read request serviced by P1, which has the only copy, + snoop action = 35);
W3/X: BusRd (read request serviced by next level + snoop action + bus update = 110); W2/X:
bus update + snoop action = 30; W3/X: bus update + snoop action = 30; W3/X: bus update +
snoop action = 30; R3/X: read hit = 1; W3/X: bus update + snoop action = 30; R1/X: read
hit = 1; R2/X: read hit = 1. In summary, the entire sequence takes 354 cycles.
MESI-invalidate:
The only difference between MESI and MSI appears at the 2nd request (W1/X). When the 2nd
request is executed, processor 1 has the only copy of X, so the block is in the E
(Exclusive) state. In the E state, a write by the block's own processor can be done without
a bus upgrade. Therefore, the 2nd request takes 1 cycle. Note that this does not happen in
the subsequent accesses, as there are always multiple copies of X in the system. In
summary, the entire sequence takes 444 cycles.

Problem 2 [True/False Sharing] (20 pts.)


Now, for the access sequence given below, assume that each request actually accesses one of
three different variables (A, B, C) within block X (the notation Ri/A and Wi/A means a read
and a write operation, respectively, by processor/cache unit i to variable A in block X):

R1/A, W1/A, W3/B, R1/A, W2/B, W3/B, W3/A, R3/C, W3/C, R1/C, R2/B

Assume that the cache coherence protocol is MSI invalidate.

(a) Refer to Chapter 5.4.6 to learn about true sharing and false sharing miss and Chapter 4.3.5 for
cold miss. Then classify the misses with respect to cold, true sharing, and false sharing misses and
fill the 5th column of Table 2.

(b) Which of the misses could be ignored and still guarantee that the execution is correct?

(c) The data traffic (bytes to be transferred) made by each protocol action is shown in Table 3 and
the block size is 64 B. Calculate the traffic generated by the given access sequence. Then, determine
the fraction of essential traffic. The essential traffic is the data traffic that is made by cache misses
that cannot be ignored.

Solution: (a)
Table 2

Req #   Processor 1   Processor 2   Processor 3   Miss type
1       R1/A                                      Cold
2       W1/A
3                                   W3/B          Cold
4       R1/A                                      False-sharing
5                     W2/B                        Cold
6                                   W3/B          True-sharing
7                                   W3/A
8                                   R3/C
9                                   W3/C
10      R1/C                                      True-sharing
11                    R2/B                        True-sharing

(b) Req 4 can be ignored since it’s false-sharing.


(c) Under the MSI protocol, the traffic made by the sequence, from the 1st to the 11th
access, is (6 + 64) + 10 + (6 + 64) + (6 + 64) + (6 + 64) + (6 + 64) + 0 +
0 + 0 + (6 + 64) + (6 + 64) = 500 bytes.
Since Req 4 can be ignored, the fraction of essential traffic is
430/500 = 0.86 (86%).
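The same arithmetic as a short tally. Table 3 is not reproduced in this excerpt, so the 6-byte command/address overhead per data transfer and the 10-byte upgrade cost are assumptions taken from the per-request terms above.

```python
BLOCK = 64    # block size in bytes (given)
CMD = 6       # assumed command/address bytes per data transfer (from Table 3)
UPGRADE = 10  # assumed bytes for a bus upgrade (from Table 3)

# Traffic per request for the 11 accesses; 0 means a cache hit.
traffic = [CMD + BLOCK, UPGRADE, CMD + BLOCK, CMD + BLOCK, CMD + BLOCK,
           CMD + BLOCK, 0, 0, 0, CMD + BLOCK, CMD + BLOCK]
total = sum(traffic)             # 500 bytes
essential = total - traffic[3]   # drop request 4, the false-sharing miss
print(total, essential, essential / total)   # 500 430 0.86
```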

Problem 3 [Synchronization] (20 pts.)


Exercise 7.2 except (e) in the textbook.
Solution:
(a) That’s because BAR is a monotonically increasing value. If the statement reads BAR
while it is being updated, it might miss the current increment, but will
eventually observe BAR reaching N.
(b)
lock:    LB   R1,bar_lock    // test: ordinary load of the lock byte
         BNEZ R1,lock        // spin while the lock appears taken
         T&S  R1,bar_lock    // test-and-set: attempt to acquire the lock
         BNEZ R1,lock        // retry if another processor got it first
         LW   R2,BAR
         ADDI R2,R2,#1       // increment the barrier counter
         SW   R2,BAR
         SB   R0,bar_lock    // release the lock
barrier: LW   R2,BAR
         BNE  R2,R3,barrier  // spin until BAR reaches N; assume (R3)=N
(c)
F&A R1,BAR
wait: LW R2,BAR
BNE R2,R3,wait //assume R3 contains the value of N
(d) If R3 contains the value of N, then CAS(R3,R0,BAR) implements the condition
atomically. There is actually no need to implement the condition statement atomically,
because the processor that reads "BAR=N" is the one that increments BAR for the last time
(which can only be done one at a time). By that time all other processors will be in their
while loops.
The code can deadlock. Remember that we cannot make any assumption about the relative
speed of processors. For example, the process that arrives at the barrier last and resets
BAR to 0 could be suspended right before starting its while loop. During that time, other
processes may execute the next iteration all the way to the next barrier execution and
start incrementing BAR. When the suspended process is awakened, it will see that BAR is
not zero and will therefore busy-wait on the first barrier. Since it will never reach the
second barrier to increment BAR, all processes will wait at the second barrier forever.
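The single-use fetch-and-add barrier from part (c) can be sketched in Python, modeling the atomic F&A primitive with a lock. The names (`BAR`, `fetch_and_add`, `worker`) and the use of threads are illustrative assumptions, not from the textbook; note this sketch is single-use, which is exactly why it avoids the reuse deadlock described in part (d).

```python
import threading

N = 4        # number of processes that must reach the barrier
BAR = 0      # shared barrier counter
fa_lock = threading.Lock()

def fetch_and_add():
    """Models the atomic F&A primitive: increments BAR, returns the old value."""
    global BAR
    with fa_lock:
        old = BAR
        BAR += 1
    return old

def worker(seen, i):
    fetch_and_add()       # announce arrival at the barrier
    while BAR != N:       # busy-wait until all N processes have arrived
        pass
    seen[i] = BAR         # past the barrier: everyone observes BAR == N

seen = [0] * N
threads = [threading.Thread(target=worker, args=(seen, i)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)   # [4, 4, 4, 4]
```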

Problem 4 [Cache Coherence] (15 pts.)


This problem considers the simple MSI, bus-based snooping protocol for cache
coherence discussed in class. There are several processor/cache nodes connected to the
bus, along with main memory. Each processor has a private L1 cache that is direct
mapped and contains 4 blocks of two words each. The initial state of main memory
and each cache block is shown in the figure below. To simplify the figure, the cache
address tag contains the full address and each word shows only two hex characters
(the rest of the data word is all zeroes, not shown). The lower addressed word of the
block is on the right; i.e., for block B0 of cache P0, the data for address x100 is on the
right, and the data for x104 is on the left. The coherence states are denoted M, S, and I
for Modified, Shared, and Invalid, respectively. Addresses and data are represented in
hexadecimal.

List the cache block states that change for each of the actions below. Treat each as an
independent action applied to the initial state shown; do not accumulate changes
from one part to the next. Simply list the blocks that change state in any cache and
in memory for each action. List your answer in the form P0.B0: (I, 100, 00, 10) to mean
processor cache P0's block B0 is Invalid, the tag holds 100, and the data words are 00 and 10;
for memory, use M:100 (00,00). The action "write addr←data" means write data word data
to address addr. When a block changes to Invalid, assume only the state bit changes, so
your answer should include the tag and data fields' actual (current) values.

(1) P15: read 118
P15.B3: (S, 118, 00, 18) (1 point)

(2) P15: write 100 ← 48


P15.B0: (M, 100, 00, 48) (1 point)

(3) P15: write 118 ← 80


P15.B3: (M, 118, 00, 80) (1 point)
P1.B3: (I, 118, 00, 18) (1 point)

(4) P15: write 108 ← 80


P15.B1: (M, 108, 00, 80) (1 point)
P0.B1: (I, 108, 00, 08) (1 point)

(5) P15: read 110


P0.B2: (S, 110, 00, 30) (1 point)
P15.B2: (S, 110, 00, 30) (1 point)
M: 110 (00, 30) (1 point)

(6) P15: read 128


P1.B1: (S, 128, 00, 68) (1 point)
P15.B1: (S, 128, 00, 68) (1 point)

M: 128 (00, 68) (1 point)

(7) P15: write 110 ← 40


P0.B2: (I, 110, 00, 30) (1 point)
P15.B2: (M, 110, 00, 40) (1 point)
M: 110 (00, 30) (1 point)

Problem 5 [Locks] (12 pts.)


compare-and-swap(R1,R2,L) is an atomic synchronization primitive which atomically
compares the value in memory location L with R1, and if and only if they are equal,
exchanges the values in R2 and L. compare-and-swap(R1,R2,L) can be used to
efficiently emulate many other primitives.

Part A [2 points]
Implement an atomic test-and-set on memory address L in assembly using
compare-and-swap(R1,R2,L) as the only atomic primitive. Let L = 1 when the lock is taken and
L = 0 when it is free, and these will be the only values present at L. You may use any
registers you like.
(HINT: If the location is in a certain state, you do not need to do anything.)

R1 <- 0
R2 <- 1
compare-and-swap(R1,R2,L)

Part B [5 points]
Implement the test-and-test-and-set semantics on memory address L in assembly
using compare-and-swap as the only atomic primitive. Let L=1 when the lock is
taken, and L=0 when it is free. You can use any registers you like as well as ordinary
loads and stores. Include any instructions needed to ensure that the operation
eventually completes successfully, as if you are actually trying to acquire a lock.

      R2 <- 1                     // value to install; R2 stays 1 if the CAS fails
LOCK: R1 <- Load(L)               // test: ordinary load
      BNEZ R1, LOCK               // spin while the lock appears taken
      compare-and-swap(R1,R2,L)   // R1 = 0 here; on success, L<->R2 makes R2 = 0
      BNEZ R2, LOCK               // R2 != 0 means the CAS failed; retry

Part C [5 points]
Use compare-and-swap to implement an atomic fetch-and-increment(R1,L) in
assembly, which atomically copies the old value in L to R1 and then increments the
value in L by 1. Again, you can use any registers you like as well as ordinary loads and
stores. Include any instructions needed to ensure that the operation eventually
completes successfully; i.e., the increment must be guaranteed to occur atomically.
FINC: R1 <- Load(L)               // ordinary load of the old value
      R2 <- R1 + 1
      compare-and-swap(R1,R2,L)   // on success, R2 gets the old L value (= R1)
      BNE R1, R2, FINC            // R2 != R1 means the CAS failed; retry

Note: Syntactical changes are allowed, e.g., instead of R2 <- R1+1 you can use an ADDI
instruction, etc.

Problem 6 [Sequential Consistency] (20 pts.)


For this problem, assume sequential consistency. Consider the following code
fragments executed on two processors:
Initially P=Q=R=S=0
P1 P2
P=5 L=Q
Q=12 M=P
R=8 N=S
S=6 O=R

Part A [4 points]
The processors execute instructions independent of each other. Thus, the order of
execution of instructions cannot be determined a priori if no constraint is placed on the
execution of the instructions. What synchronization is required to ensure that all of P1’s
instructions are to be executed before any of P2’s instructions, as given above? Ensure
your solution makes clear the constituent memory operations used for the synchronization;
i.e., don’t insert just a call to a function or library without the code for that function/library.
Unnecessarily inefficient solutions will not get full credit.

Solution:

Initially P = Q = R = S = 0 and V = 0.

P1          P2
P = 5       while (V == 0);
Q = 12      L = Q
R = 8       M = P
S = 6       N = S
V = 1       O = R

We also accept solutions that use S as a test variable instead of V, e.g. while (S!=6).
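The flag scheme above can be sketched with two Python threads. CPython's global interpreter lock happens to give sequentially consistent behavior for this pattern, so the sketch illustrates the ordering argument; on real hardware the flag write and the spin read would need memory fences. The names `p1`, `p2`, and `out` are illustrative.

```python
import threading

P = Q = R = S = V = 0
out = {}

def p1():
    global P, Q, R, S, V
    P, Q, R, S = 5, 12, 8, 6
    V = 1                  # signal: all of P1's writes are complete

def p2():
    while V == 0:          # spin until P1 signals
        pass
    out["L"], out["M"], out["N"], out["O"] = Q, P, S, R

t2 = threading.Thread(target=p2)
t1 = threading.Thread(target=p1)
t2.start(); t1.start()
t1.join(); t2.join()
print(out)   # {'L': 12, 'M': 5, 'N': 6, 'O': 8}
```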

Part B [4 points]
Suppose we don’t care about the relative order of execution of the sets of instructions, but
we want atomicity in execution; i.e., we want all instructions of one processor to complete
before any instructions are executed in the other processor. What synchronization is
required to ensure this?

Solution:

This can be achieved by placing the code fragments between lock and unlock operations.
These operations can be implemented using Test & Set or any other technique but must
all be to the same location.

P1 P2
Lock(x) Lock(x)
P=5 L=Q
Q = 12 M=P
R=8 N=S
S=6 O=R
Unlock(x) Unlock(x)

Part C [6 points]
On a sequentially consistent system, what values can L, M, N, and O have at the end of
execution for the code in your solution of problem 6, part A? You must explain your
answer for full credit.

Solution:
L = 12, M = 5, N = 6, O = 8. On a sequentially consistent processor, we can assume that
P2’s reads of P, Q, R, and S won’t occur until its while loop terminates; i.e., until P2
sees the value 1 for V. Once P2 sees the value 1 for V, we know that P1’s write to V has
occurred and so we can assume that P1’s writes to P, Q, R, and S are complete. We are
therefore guaranteed that P2’s reads of P, Q, R, and S will see the values written to those
variables by P1.

We also accept solutions that for part A use S as a test variable instead of V.

Part D [6 points]
On a system that is not known to be sequentially consistent, what possible values can
variables L, M, N, and O have at the end of execution, for the code in your solution of
problem 6 part A? You must explain your answer for full credit.

Solution:
L = 0 or 12, M = 0 or 5, N = 0 or 6, O = 0 or 8. Since the system is not known to be
sequentially consistent, it is possible that some or all of P2’s reads of P, Q, R, S actually
occur before its while loop terminates, or some or all of P1’s writes to P, Q, R, S actually
occur after P1’s write to V. So P2 is not guaranteed to see P1’s writes to P, Q, R, S, and
may see only a subset of those writes.

We also accept solutions that for part A use S as a test variable instead of V.

Problem 7 [Cache organization] (20 pts.)
Exercise 7.8 in the textbook with the following modifications:
In the semi-static strategy, if a processor is idle, then it "steals" the task with the highest
task index from another processor that still has tasks pending in its task queue. The original
schedule has higher priority when alternative schedules exist, and a lower-numbered
processor always has higher priority. Find speedups with respect to a single-processor
machine with the same processor.

Solution:
1. T(i) = 1+i
Strategy Execution Time Speedup
Static 28 3.25
Semi-static 28 3.25
Dynamic 28 3.25
Optimum 23 3.96

2. T(i) = 13-i
Strategy Execution Time Speedup
Static 28 3.25
Semi-static 26 3.5
Dynamic 24 3.79
Optimum 23 3.96

To compute the speedup, the execution time on a single processor, which is 91, is divided by
the maximum execution time taken to complete the given tasks across the processors. Examples
of task schedules for each strategy are given in the tables below. Note that the schedules for
the dynamic and optimum strategies can vary. (1) and (2) indicate the cases where the execution
time of each task is T(i) = 1+i and T(i) = 13-i, respectively. The maximum execution time
among the threads is in bold.
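The dynamic entries can be reproduced with a greedy list-scheduling sketch. The task set (13 tasks, i = 0..12, which matches the 91-cycle sequential total) and the 4-processor count are assumptions inferred from the numbers above, since Exercise 7.8 itself is not reproduced here.

```python
def dynamic_time(times, nproc=4):
    """Greedy dynamic scheduling: each task, in index order, goes to the
    processor that frees up earliest (lowest-numbered processor on ties)."""
    finish = [0] * nproc
    for t in times:
        p = finish.index(min(finish))   # index() returns the lowest tied index
        finish[p] += t
    return max(finish)

inc = [1 + i for i in range(13)]        # (1) T(i) = 1 + i, sums to 91
dec = [13 - i for i in range(13)]       # (2) T(i) = 13 - i, sums to 91
print(dynamic_time(inc), dynamic_time(dec))   # 28 24
print(91 / dynamic_time(inc))                 # 3.25 speedup for case (1)
```

Running this reproduces the dynamic rows of both tables: 28 cycles for T(i) = 1+i and 24 cycles for T(i) = 13-i, i.e., the decreasing task sizes let the greedy scheduler pack the tail better.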

