Performance Measure
Measuring Performance
• When can we say one computer / architecture / design is better than
others?
• Desktop PC – (execution time of a program)
• Server (transactions per unit time)
• When can we say X is n times faster than Y?
• $\text{Execution time}_Y / \text{Execution time}_X = n$
• $\text{Throughput}_X / \text{Throughput}_Y = n$
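As a minimal sketch of the two ratios above, the snippet below computes n once from execution times and once from throughputs. The function names and the sample numbers are illustrative assumptions, not measurements.

```python
def times_faster_by_time(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y, from execution times."""
    # Lower execution time is better, so Y's time goes in the numerator.
    return exec_time_y / exec_time_x


def times_faster_by_throughput(throughput_x: float, throughput_y: float) -> float:
    """Return n such that X is n times faster than Y, from throughputs."""
    # Higher throughput is better, so X's throughput goes in the numerator.
    return throughput_x / throughput_y


# Hypothetical desktop case: X runs the program in 10 s, Y in 15 s.
n_time = times_faster_by_time(exec_time_x=10.0, exec_time_y=15.0)

# Hypothetical server case: X handles 300 transactions/s, Y handles 200.
n_throughput = times_faster_by_throughput(throughput_x=300.0, throughput_y=200.0)

print(n_time, n_throughput)  # prints 1.5 1.5
```

Note that the two ratios are inverted relative to each other: execution time improves by going down, throughput by going up.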
Measuring Performance
• Typical Performance metrics
• Response Time
• Throughput
• CPU Time
• Wall Clock Time
• Speedup
• Benchmarks
• Toy Programs (e.g., sorting, matrix multiply)
• Synthetic benchmarks (e.g., Dhrystone)
• Benchmark suites (e.g., SPEC06, SPLASH)
Amdahl's Law
• Amdahl's Law defines the speedup that can be gained by improving
some portion of a computer.
• The performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster
mode can be executed.
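In the usual textbook notation, the law can be written as the following equation. The symbol names are an assumption, chosen to match the "Fraction enhanced" and "Speedup enhanced" quantities used in the illustration.

```latex
\[
\text{Speedup}_{\text{overall}}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
      + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]
```

The first term in the denominator is the part of the execution time that is not improved, which is why it bounds the overall speedup.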
Amdahl’s Law - Illustration
Solution
$\text{Fraction}_{\text{enhanced}} = 0.4$
$\text{Speedup}_{\text{enhanced}} = 10$
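A short sketch of how these two values feed into Amdahl's Law, assuming the standard form of the law (overall speedup = 1 / ((1 - fraction) + fraction / speedup)). The function name `amdahl_speedup` is a hypothetical helper, not something defined in these slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # Unenhanced part keeps its time; enhanced part shrinks by speedup_enhanced.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# Values from the illustration: 40% of the time is enhanced, by a factor of 10.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10.0)
print(f"{overall:.4f}")  # prints 1.5625
```

Even though the enhanced mode is 10 times faster, the overall speedup is only about 1.56, because 60% of the execution time is untouched.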
Amdahl’s Law for Parallel Processing
How much speedup can you achieve?
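One way to explore this question is to treat the number of processors as the speedup of the parallel part. The sketch below assumes a hypothetical program in which 90% of the execution time can be parallelized; the function names and that 90% figure are illustrative assumptions.

```python
def parallel_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law with the parallel part sped up by the processor count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")  # a fully parallel program has no bound
    return 1.0 / serial_fraction


PARALLEL_FRACTION = 0.9  # hypothetical: 90% of the time is parallelizable

for n in (1, 2, 4, 16, 1024):
    print(n, round(parallel_speedup(PARALLEL_FRACTION, n), 2))

limit = speedup_limit(PARALLEL_FRACTION)
print("limit:", round(limit, 2))
```

Under this assumption the speedup can never exceed 10, no matter how many processors are added, because the serial 10% still has to run.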
Design Example
Example: Amdahl’s Law
RISC and CISC
• Three concepts are central to understanding and designing computer
architectures:
• the instruction set, RISC, and CISC.
CPU Performance
• The performance of a CPU is the number of programs
it can run in a given time. The more programs it can
run in that time, the faster the CPU is.
• Performance depends on the number of instructions
in a program: more instructions take more time to
execute. It also depends on the number of clock
cycles per instruction.
• For a fixed clock rate, this leaves two ways to improve performance:
• minimize the number of instructions per program, or
• reduce the number of cycles per instruction.
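The two options above can be sketched as a simple product: total clock cycles = instructions per program × cycles per instruction. All instruction counts and CPI values below are hypothetical, chosen only to show the effect of each option.

```python
def total_cycles(instruction_count: int, cycles_per_instruction: float) -> float:
    """Clock cycles needed to run a program: instructions x cycles each."""
    return instruction_count * cycles_per_instruction


# Hypothetical baseline program: 1,000,000 instructions at 4 cycles each.
baseline = total_cycles(1_000_000, 4.0)

# Option 1: fewer instructions per program, same cycles per instruction.
fewer_instructions = total_cycles(600_000, 4.0)

# Option 2: same instruction count, fewer cycles per instruction.
lower_cpi = total_cycles(1_000_000, 1.5)

print(baseline, fewer_instructions, lower_cpi)
```

Either change lowers the cycle count; which one wins depends entirely on how far each factor can be pushed.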
CISC ARCHITECTURE
• CISC stands for Complex Instruction Set Computer.
• The CISC architecture tries to reduce the number of instructions in
a program, thus optimizing the instructions per program.
• This is done by combining many simple instructions into a single
complex one.
Example: MUL instruction
• Ex: MUL 1200, 1201
• This instruction
• first takes two inputs, the memory locations of the two numbers to multiply,
• then performs the multiplication, and
• finally stores the result in the first memory location.
• This reduces the amount of work that the compiler has to do as the
instructions themselves are very high level.
• The program takes up little space in RAM, and most of the work is
done by the hardware when it decodes and executes instructions.
• Because the CPU does more work in a single CISC-style instruction,
each instruction takes more clock cycles.
• Moreover, there are fewer general-purpose registers, because
more transistors are needed to decode the instructions.
RISC ARCHITECTURE
• Reduced Instruction Set Computer (RISC) architectures need more
instructions per program, but they reduce the number of cycles that
each instruction takes to execute.
• Generally, a single instruction in a RISC machine will take only one
CPU cycle.
• Multiplication in a RISC architecture cannot be done with a single
memory-to-memory instruction like the CISC MUL.
• Instead, we first load the data from memory using the LOAD
instruction, then multiply the numbers, and then store the result
back in memory.
• Load A, 1200
• Load B, 1201
• Mul A, B
• Store 1200, A
• In RISC architectures, we can only perform operations on Registers
and not directly on the memory.
• This might seem like a lot of work, but in reality, since each of these
instructions takes only one clock cycle, the whole multiplication
operation is completed in fewer clock cycles.
• Because RISC has a simpler instruction set, complex high-level
operations need to be broken down into many instructions by the compiler.
• This puts a lot of stress on the software and the software designers,
while reducing the work that the hardware needs to do.
• The decoding logic is simple, fewer transistors are required, and more
general-purpose registers can fit into the CPU.
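The difference between the single memory-to-memory MUL and the load/multiply/store sequence can be sketched with a toy simulation. This is an illustrative model only: the dictionary-based memory, the function names `cisc_mul` and `run_risc`, and the initial values 6 and 7 are all assumptions, and no cycle counts are modeled.

```python
def cisc_mul(mem: dict, dst_addr: int, src_addr: int) -> None:
    """CISC-style MUL: read both operands from memory, write result to dst."""
    mem[dst_addr] = mem[dst_addr] * mem[src_addr]


def run_risc(mem: dict, program: list) -> dict:
    """Run a RISC-style program; arithmetic touches registers only."""
    registers = {}
    for op, first, second in program:
        if op == "LOAD":        # LOAD reg, addr
            registers[first] = mem[second]
        elif op == "MUL":       # MUL reg_dst, reg_src
            registers[first] = registers[first] * registers[second]
        elif op == "STORE":     # STORE addr, reg
            mem[first] = registers[second]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


# Same sequence as on the slide: Load A, Load B, Mul A B, Store 1200 A.
risc_program = [
    ("LOAD", "A", 1200),
    ("LOAD", "B", 1201),
    ("MUL", "A", "B"),
    ("STORE", 1200, "A"),
]

cisc_memory = {1200: 6, 1201: 7}   # hypothetical initial values
cisc_mul(cisc_memory, 1200, 1201)  # one instruction

risc_memory = {1200: 6, 1201: 7}
run_risc(risc_memory, risc_program)  # four instructions

print(cisc_memory[1200], risc_memory[1200])  # prints 42 42
```

Both versions leave the same result at address 1200; the RISC version simply makes each memory access and the register-only multiply an explicit, separate step.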
Comparison
• CISC tries to complete an action in as few lines of assembly code as
possible, whereas RISC tries to reduce the time taken for each
instruction to execute.
• A MUL operation on two 8-bit numbers held in registers can take as
many as 77 clock cycles on the 8086, a CISC device, whereas the
complete multiplication operation on a RISC device such as a PIC
takes only 38 cycles.
• Because CISC instructions take more cycles to execute, parallelism
and pipelining of instructions are much harder. In RISC, however,
since instructions generally take one cycle, pipelining is easier.
• The compiler plays an important role in RISC systems, and how well it
performs this “code expansion” can limit performance.
Final word: which is better
• CISC is most often used in automation devices whereas RISC is used
in video and image processing applications.
• When microprocessors and microcontrollers were first being
introduced, they were mostly CISC. This was largely because of the
lack of software support for RISC development.
• Later, a few companies started exploring the RISC architecture,
most notably Apple, but most companies were unwilling to take the
risk on an emerging technology.
Principles of Computer Design
Clock cycle time – depends on the hardware technology used
CPI (cycles per instruction) – depends on organization and ISA
IC (instruction count) – depends on ISA and compiler technology
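These three factors are usually combined in the standard CPU performance equation, shown here as a reminder. The slides' own equation did not survive extraction, so this form is assumed from the three terms listed above.

```latex
\[
\text{CPU time}
  = \text{IC} \times \text{CPI} \times \text{Clock cycle time}
  = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}
\]
```

Because the factors multiply, an improvement in one only helps if it does not worsen the others by a larger factor.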
Example: Basic Performance Analysis