Introduction to Multiprocessor Architecture (CS307)
Fall Semester 2022
Student: Christopher Williams
Sciper: 300174
EXERCISE 02
___
Consider the following program that calls three functions sequentially: funcA, funcB, and funcC.
Assume that you profile this program using a single-core CPU, and measure that funcA, funcB,
and funcC take 15, 80, and 5 seconds, respectively.
void prog(){
funcA();
funcB();
funcC();
}
Questions
1. Suppose that you can choose one of the three functions called in prog, and
optimize it to achieve a certain speedup.
a. According to Amdahl's Law, which function should you choose and put effort into
optimizing and accelerating? [0.5pt]
funcB() because this function has the longest runtime of the three, and hence,
optimizations on this function are more impactful to the total runtime.
b. Suppose that you can accelerate funcA by a speedup of 10x, calculate the overall
speedup of the given program. [0.5pt]
Speedup = old execution time / new execution time. So: overall speedup = 100 / 86.5 =
1.156
c. Suppose that you can accelerate funcB by a speedup of 4x, calculate the overall
speedup of the given program. [0.5pt]
Speedup = old execution time / new execution time. So: overall speedup = 100 / 40 =
2.5
d. Suppose that you can accelerate funcC by a speedup of 20x, calculate the overall
speedup of the given program. [0.5pt]
Speedup = old execution time / new execution time. So: overall speedup = 100 / 95.25
= 1.050
2. Suppose that you run prog in a cluster where up to 128 cores are available. Assume that
you parallelize the function suggested by the Amdahl’s Law, and the speedup you
achieve is linear to the number of cores that you use. Plot the execution time (y-axis)
versus the number of used cores(x-axis). [2pt]
1
𝑆𝑝𝑒𝑒𝑑𝑢𝑝(𝑁) = 𝑃 where P is the parallel part and N is the number of cores, then
(1−𝑃) + 𝑁
we see that if P =0.8 (function B) as suggested by Amdahl’s law and that the initial execution
time is 80 seconds of the total 100 seconds. The Speedup is therefore:
1
𝑆𝑝𝑒𝑒𝑑𝑢𝑝(𝑁) = 0.8 . Then, using Execution time = initial execution time /speedup:
0.2 + 𝑁
0.8
Execution Time = 100 * (0. 2 + 𝑁
). By generating the values using google sheets and
plotting them in a chart, we obtain the following:
3. Suppose that a cloud provider charges users with 1$ per machine per second, and each
machine has a CPU with 32 cores. You can use as many machines as you want, but note
that the cloud provider will charge you full price even if you don’t utilize all available
cores on each machine. The machine has to be rented for the whole program.
a. For a single execution of prog, plot the cost in dollars(y-axis) versus the number
of used cores (x-axis) for a range of 1 and 128 cores by parallelizing funcA, funcB,
and funcC separately. Plot all three curves in a single figure. [2pt]
b. Suppose that you have a budget of 50$. What is the minimum execution time
that you can achieve if you can only parallelize one function? [2pt]
From the spreadsheet that I have made ( link:
https://docs.google.com/spreadsheets/d/1ZgENg6rogu1gucZ8BfDi5yITbw82nBGx19L
JS1T-5Jo/edit?usp=sharing ), one can tell very quickly that minimum execution times
will come when we parallelize funcB, and with a budget of 50$ we can afford 2 CPUs,
using the full 64 cores, we find that the minimum runtime is 21.25 seconds.
4. Name two differences between shared memory and message passing models. [2pt]
i. Shared memory is a relatively faster model in general when compared to
message passing.
ii. Read and writes in the shared memory are explicitly written in code by the
programmer, whereas this is not necessary with message passing models.