Chapter 2
Performance Measures: Part I
Jens Saak, Scientific Computing II
Time Measurement and Operation Counts
The Single Processor Case

Definition
In general, we call the time elapsed between issuing a command and receiving its
results the runtime, or execution time, of the corresponding process. Some authors
also call it elapsed time or wall clock time.
In the purely sequential case it is closely related to the so-called CPU time of the
process. There, the main contributions are:
- user CPU time: time spent executing the instructions of the process,
- system CPU time: time spent executing operating system routines called by the
  process,
- waiting time: time spent waiting for time slices, completion of I/O, memory
  fetches, ...
That means the time we have to wait for a response from the program includes the
waiting times in addition to the CPU time.
Time Measurement and Operation Counts
Instructions: Timings and Counts

clock rate and cycle time
The clock rate of a processor tells us how often it can switch instructions per
second. Closely related is the (clock) cycle time, i.e., the time elapsed between
two subsequent clock ticks.

Example
A CPU with a clock rate of 3.5 GHz = 3.5 · 10^9 1/s executes 3.5 · 10^9 clock ticks
per second. The length of a clock cycle thus is

    1/(3.5 · 10^9) s = (1/3.5) · 10^{-9} s ≈ 0.29 ns.
Time Measurement and Operation Counts
Instructions: Timings and Counts

Different instructions require different times to execute. This is represented by the
so-called cycles per instruction (CPI) of the corresponding instruction. An average
CPI is associated with a process A and denoted by CPI(A).
This number, together with the number of instructions and the cycle time,
determines the total user CPU time via

    T_{U,CPU}(A) = n_instr(A) · CPI(A) · t_cycle.

Clever choices of the instructions can influence the values of n_instr(A) and
CPI(A); this is the task of compiler optimization.
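To make the formula above concrete, here is a minimal MATLAB sketch; the
instruction count, CPI, and clock rate are made-up assumptions, not measurements:

    % Hypothetical values, chosen only for illustration.
    n_instr = 2e9;           % number of instructions issued by process A
    cpi     = 1.5;           % average cycles per instruction, CPI(A)
    r_cycle = 3.5e9;         % clock rate in 1/s
    t_cycle = 1 / r_cycle;   % cycle time in seconds

    T_ucpu = n_instr * cpi * t_cycle   % user CPU time, here about 0.86 s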
Time Measurement and Operation Counts
MIPS versus FLOPS

A common performance measure of CPU manufacturers is the million instructions
per second (MIPS) rate.
It can be expressed as

    MIPS(A) = n_instr(A) / (T_{U,CPU}(A) · 10^6) = r_cycle / (CPI(A) · 10^6),

where r_cycle is the cycle rate of the CPU.
This measure can be misleading in high performance computing, since higher
instruction throughput does not necessarily mean shorter execution time.
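Continuing the made-up numbers from the previous sketch, the two expressions
above indeed give the same rate:

    % Same hypothetical values as before.
    n_instr = 2e9;  cpi = 1.5;  r_cycle = 3.5e9;
    T_ucpu  = n_instr * cpi / r_cycle;

    mips     = n_instr / (T_ucpu * 1e6)   % about 2333 MIPS
    mips_alt = r_cycle / (cpi * 1e6)      % identical value via the cycle rate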
Time Measurement and Operation Counts
MIPS versus FLOPS

More common for comparisons in scientific computing is the rate of floating point
operations (FLOPS) executed. The MFLOPS rate of a program A can be expressed
as

    MFLOPS(A) = n_FLOPS(A) / (T_{U,CPU}(A) · 10^6)   [1/s],

with n_FLOPS(A) the total number of floating point operations issued by the
program A.
Note that not all floating point operations (see also Chapter 4, winter term) take
the same time to execute. Usually divisions and square roots are much slower. The
MFLOPS rate, however, does not take this into account.
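As an illustration, the MFLOPS rate of a dense matrix-matrix product can be
estimated as follows; this is a minimal sketch that assumes the usual operation
count of 2n^3 flops for the product:

    % Minimal sketch: MFLOPS rate of a dense matrix-matrix product.
    n = 1500;
    A = randn(n);  B = randn(n);

    ct0   = cputime;
    C     = A * B;
    t_cpu = cputime - ct0;             % CPU time spent in the product
    % note: with multithreaded BLAS, cputime may exceed the wall clock time

    n_flops = 2 * n^3;                 % assumed flop count of C = A*B
    mflops  = n_flops / (t_cpu * 1e6)  % MFLOPS rate as defined above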
Time Measurement and Operation Counts
CPU Time versus Execution Time

Example (A simple MATLAB® test)

Input:
    ct0=0;
    A=randn(1500);

    tic
    ct0=cputime;
    pause(2)
    toc
    cputime-ct0

    tic
    ct0=cputime;
    [Q,R]=qr(A);
    toc
    cputime-ct0

Output:
    Elapsed time is 2.000208 seconds.

    ans =
        0.0300

    Elapsed time is 0.733860 seconds.

    ans =
       21.6800

Executed on a 4 x 8-core Xeon® system.
Time Measurement and Operation Counts
CPU Time versus Execution Time

Obviously, in a parallel environment the CPU time can be much higher than the
actual execution time elapsed between start and end of the process. On the other
hand, it can also be much smaller.
The first result is easily explained by the splitting of the execution time into
user/system CPU time and waiting time. The process is mainly waiting for the
sleep system call to return and thus accumulates essentially no active CPU time.
The second result is due to the fact that the activity is distributed across several
cores. Each activity accumulates its own CPU time, and these are summed up to
the total CPU time of the process.
Parallel Cost and Optimality

Definition (Parallel cost and cost-optimality)
The cost of a parallel program with data size n is defined as

    C_p(n) = p · T_p(n).

Here T_p(n) is the parallel runtime of the process, i.e., its execution time on p
processors.
The parallel program is called cost-optimal if

    C_p(n) = T*(n).

Here, T*(n) represents the execution time of the fastest sequential program
solving the same problem.
In practice T*(n) is often approximated by T_1(n).
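A quick check with made-up timings illustrates the definition (both numbers are
assumptions, not measurements):

    % Hypothetical timings: 80 s sequentially, 12 s on p = 8 processors.
    p      = 8;
    T_star = 80;              % fastest sequential runtime T*(n), assumed
    T_p    = 12;              % parallel runtime T_p(n), assumed

    C_p = p * T_p             % cost: 96 processor-seconds > T* = 80,
                              % hence this run is not cost-optimal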
Speedup

The speedup of a parallel program,

    S_p(n) = T*(n) / T_p(n),

is a measure for the acceleration, in terms of execution time, that we can expect
from a parallel program.
The speedup is strictly limited from above by p, since otherwise the parallel
program would motivate a faster sequential algorithm. See [Rauber/Rünger ’10]
for details.
In practice the speedup is often computed with respect to the sequential version
of the code, i.e.,

    S_p(n) ≈ T_1(n) / T_p(n).
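In code, the practical variant of the speedup is just a ratio of measured runtimes;
the timings below are hypothetical:

    % Hypothetical runtimes of the same program on 1, 2, 4, 8, 16 processors.
    T_1 = 80;                       % sequential runtime, used in place of T*
    p   = [ 2,  4,  8, 16];
    T_p = [42, 23, 13, 12];         % assumed parallel runtimes

    S_p = T_1 ./ T_p                % approximate speedups T_1(n)/T_p(n)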
Parallel Efficiency

Usually, the parallel execution of the work a program has to perform comes at the
cost of a certain amount of management of subtasks. Their distribution,
organization, and interdependence lead to a fraction of the total execution that
has to be done extra.

Definition
The fraction of the work that a sequential algorithm would have to perform as
well is described by the parallel efficiency of a program. It is computed as

    E_p(n) = T*(n) / C_p(n) = S_p(n) / p = T*(n) / (p · T_p(n)).

The parallel efficiency is obviously limited from above by E_p(n) = 1, which
represents the perfect speedup of p.
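The efficiency follows directly from the (hypothetical) speedups of the previous
sketch:

    % Speedups from the sketch above, divided by the processor count.
    p   = [ 2,  4,  8, 16];
    S_p = [80/42, 80/23, 80/13, 80/12];

    E_p = S_p ./ p            % e.g. for p = 16: (80/12)/16 ≈ 0.42, far from 1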
Amdahl’s Law

In many situations it is impossible to parallelize the entire program. Certain
fractions remain that need to be performed sequentially. When a (constant)
fraction f of the program needs to be executed sequentially, Amdahl’s law
describes the maximum attainable speedup.
The total parallel runtime T_p(n) then consists of
- f · T*(n), the time for the sequential fraction, and
- (1 − f)/p · T*(n), the time for the fully parallel part.
The best attainable speedup can thus be expressed as

    S_p(n) = T*(n) / (f · T*(n) + (1 − f)/p · T*(n)) = 1 / (f + (1 − f)/p) ≤ 1/f.
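A short sketch showing how quickly the bound saturates for an assumed sequential
fraction f = 0.1:

    % Amdahl's law for a hypothetical sequential fraction of 10 percent.
    f = 0.1;
    p = [1 2 4 8 16 64 1024];

    S_max = 1 ./ (f + (1 - f) ./ p)   % approaches the limit 1/f = 10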
Scalability of Parallel Programs

Question
Is the parallel efficiency of a parallel program independent of the number of
processors p used?

The question is answered by the concept of parallel scalability. Scientific
computing and HPC distinguish two forms of scalability (a small timing sketch
follows the list):
- strong scalability
  captures the dependence of the parallel runtime on the number of processors
  for a fixed total problem size.
- weak scalability
  captures the dependence of the parallel runtime on the number of processors
  for a fixed problem size per processor.
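A minimal sketch of how the two notions are typically evaluated from measured
runtimes; all timings below are made up, and the weak-scaling efficiency
T_1/T_p used here is one common convention:

    % Hypothetical scaling study on p = 1, 2, 4, 8, 16 processors.
    p = [1 2 4 8 16];

    % Strong scaling: total problem size fixed, runtime should drop with p.
    T_strong = [80 42 23 13 9];                % assumed measurements
    E_strong = T_strong(1) ./ (p .* T_strong)  % ideally close to 1

    % Weak scaling: problem size per processor fixed, runtime should stay flat.
    T_weak = [10 10.4 11.1 12.0 13.5];         % assumed measurements
    E_weak = T_weak(1) ./ T_weak               % ideally close to 1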