Learning High Performance Computing with a Pi Cluster
Martin Siebenborn
June 11, 2016
Outline
What is HPC?
Construction manual for a Pi cluster
Teaching HPC to math students
Should I build such a cluster for my projects?
Motivation for high performance computing
One of the driving forces behind the development of supercomputers is numerical simulation, e.g. fluid dynamics.
[Figure: weather prediction. Source: Wikipedia]
Physical principles + numerical methods $\rightarrow$ very large linear systems
Solve $Ax = b$ with $A \in \mathbb{R}^{n \times n}$, $x, b \in \mathbb{R}^n$ for $n \approx 10^9$
Moore's law I
The complexity of integrated circuits doubles every 18 months, e.g. measured by the number of transistors
But this cannot be continued arbitrarily
Physical limits in the manufacturing of semiconductors
By now there are transistors with feature sizes of approx. 14 nm
Compare this to the wavelength of visible light: 400-700 nm
"Computers are not getting faster but wider."
A solution may be cluster computers with fast interconnects
Moore's law II
Source: http://de.wikipedia.org/wiki/Mooresches_Gesetz
Shared and distributed memory systems
[Diagram: a shared memory system (several CPUs with caches attached to one memory and I/O over a bus, kept consistent by cache coherence) next to a distributed memory system (nodes, each with its own CPUs, caches and memory, connected by a network)]
Shared memory system: laptops, smartphones, ...
Distributed memory system: large web servers, databases, supercomputers
Top 500 list of supercomputers
Source: https://en.wikipedia.org/wiki/TOP500
Supercomputers in Germany
Cray XC40 at HLRS Stuttgart
Peak performance: 7420 TFlops
Weight: 61.5 t
Compute nodes: 7712, each with 2 x 12 cores (185,088 cores in total)
Memory per node: 128 GB
Power consumption: 3.2 MW
Source: https://www.hlrs.de/en/systems/cray-xc40-hazel-hen/
A supercomputer in a nutshell
Typical components of supercomputers that we have to imitate:
1 The frontend
A dedicated login node managing user interaction
Accessible from the outside world (e.g. via ssh)
Manage data, compile code, place your program in the execution queue
2 The backend
Nodes dedicated to computing only, not directly accessible
3 Input/output devices
Parallel I/O is problematic and not touched in our small setup
A blueprint of the hardware
[Diagram: rpi01 acts as login node with a 2.5'' USB HDD and a wifi dongle; rpi02 to rpi17 are compute nodes; all nodes are connected through a gigabit ethernet switch and powered via a USB power hub and a power strip]
Material needed
Item                          Amount   Price each (euro)
Raspberry Pi 2 B                17       38.00
Micro SD 16 GB                  17        5.00
USB power charger               17        8.00
Network cable                   17        0.50
RPi cases                       17        6.00
Ethernet switch, 24 port         1      150.00
USB power hub                    1       25.00
2.5'' USB HDD, 1 TB              1       55.00
Wifi dongle                      1       10.00
Wood, screws, cable ties, ...            35.00
Total                                  1252.50
Installation instructions
Basically, we made two Raspbian Wheezy installations
The login node
One compute node that is cloned 16 times
The exact distribution does not matter
However, be sure to have hard-float support
On both installations we create the user pi
We do not want to copy our program 16 times whenever it changes
All compute nodes have to share the same /home/pi folder
Package setup
On the server we need the following packages:
1 usbmount
Automatically mount the external USB HDD after boot
2 openssh-server
Access the server without keyboard or monitor
Passwordless authentication is required later
3 nfs-kernel-server
Share the home folder on the USB HDD with the compute nodes
4 ntp
Server must receive time from the internet and provide it to the
compute nodes
For some scenarios it is important that all compute nodes precisely
share the same time
5 gcc, g++, gfortran, openmpi . . .
Network share
1 On the compute node we install nfs-common, openssh-server,
openmpi
2 On server side modify /etc/exports to
/media/extern_usb/ 192.168.0.1/24(rw,fsid=0,insecure,no_subtree_check,async)
/media/extern_usb/pi_home 192.168.0.1/24(rw,nohide,insecure,no_subtree_check,async)
3 On client side add the following to /etc/fstab
192.168.0.1:/pi_home /home/pi nfs rw,nouser,atime,_netdev,dev,hard,intr,rsize=8192,wsize=8192 0 2
192.168.0.1:/pi_opt /opt nfs ro,nouser,atime,_netdev,dev,hard,intr,rsize=8192,wsize=8192 0 2
4 In /etc/ntp.conf add the login node as ntp server
Getting to know each other
In the file /etc/hosts we add the following lines to get rid of IP addresses:
192.168.0.1 rpi01
192.168.0.2 rpi02
192.168.0.3 rpi03
192.168.0.4 rpi04
192.168.0.5 rpi05
192.168.0.6 rpi06
192.168.0.7 rpi07
192.168.0.8 rpi08
192.168.0.9 rpi09
192.168.0.10 rpi10
192.168.0.11 rpi11
192.168.0.12 rpi12
192.168.0.13 rpi13
192.168.0.14 rpi14
192.168.0.15 rpi15
192.168.0.16 rpi16
192.168.0.17 rpi17
Cloning the compute nodes
Time to clone the compute nodes!
1 Copy the content of /home/pi to the USB HDD and then delete it
locally
2 Clone the SD card for each compute node
Use dd for that
After each cloning, mount the second partition of the SD card and adjust the file /etc/hostname, e.g. to rpi09 for the 9th compute node
Since we use static IP addresses we have to adjust the file
/etc/network/interfaces on each compute node to the correct IP
3 Finally, put the cluster computer together and start the engine
Keeping the machine alive
Changing IP address
The login node uses wifi to connect to the campus network
On each boot it gets a different IP address
Dynamic DNS
We use dynamic DNS to map the current IP to a global domain name
Free services exist, e.g. afraid.org
We use hpc-workshop.mooo.com
Two cron jobs solve the access problem:
wget --no-check-certificate -O - <update-url> >> /tmp/dns.log 2>&1 &
touch /media/extern_usb/pi_home/.stayingalive &> /dev/null
Passwordless SSH
Passwordless SSH connections are mandatory:
So far, we are asked for a password each time we log into the cluster
If we start a parallel program, we would be asked for a password for
every compute node
SSH public key
ssh-keygen -t rsa -b 4096 on login node
Enter empty password (think twice when and where you do this)
Use ssh-copy-id to bring the public part of the key to rpi02 (and
thereby to all compute nodes)
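With passwordless SSH and the shared /home/pi in place, a small MPI hello world is a quick way to check that every compute node can be reached. This is a minimal sketch (not from the original slides); the hostfile name hosts and the process count are assumptions.

/* hello_mpi.c: check that MPI reaches every node.
 * Build: mpicc -o hello_mpi hello_mpi.c
 * Run:   mpirun --hostfile hosts -np 64 ./hello_mpi
 * (hostfile name and -np value are example choices) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count  */
    MPI_Get_processor_name(name, &len);     /* hostname, e.g. rpi05 */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

If each rank prints its rpiXX hostname, the network share, the hosts file and the SSH keys are working together.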
Communication over the network
Data exchanges between processors are conducted via the Message Passing Interface (MPI)
We use Open MPI with its C interface as the MPI implementation
The interface describes a collection of basic exchange routines
Besides point-to-point send and receive we have:
Broadcast, Gather, Scatter, Reduce(op), Alltoall, Allgather
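To illustrate how such collectives are used in practice, here is a small sketch (not code from the workshop); the chunk size and the dummy data values are made up for the example. Rank 0 scatters an array, every rank sums its part, and Reduce collects the total.

/* collectives.c: tiny demo of MPI_Scatter and MPI_Reduce. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4   /* elements per process (example value) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *data = NULL;
    if (rank == 0) {                       /* only the root holds the full array */
        data = malloc((size_t)CHUNK * size * sizeof(double));
        for (int i = 0; i < CHUNK * size; ++i)
            data[i] = 1.0;                 /* dummy values: total sum = CHUNK*size */
    }

    double local[CHUNK];
    MPI_Scatter(data, CHUNK, MPI_DOUBLE,   /* send buffer only matters on root */
                local, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local_sum = 0.0, global_sum = 0.0;
    for (int i = 0; i < CHUNK; ++i)
        local_sum += local[i];

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("global sum = %.1f (expected %d)\n", global_sum, CHUNK * size);
        free(data);
    }
    MPI_Finalize();
    return 0;
}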
Mathematical aspects
A few examples:
Parallel calculation of fractals
Load balancing by graph partitioning
Option pricing with Monte Carlo methods
Large scale linear systems
What does scalability mean?
Informal definitions:
Strong scalability
Fixed problem size
Number of procs is increased
Good scalability: #proc is doubled and runtime is halved
Weak scalability
Most interesting measure for supercomputers
Problem size and number of procs are increased together
Desirable result: #proc and problem size are doubled and the runtime stays constant
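One common way to make these informal notions precise (this formalization is an addition, not taken from the slides) is via speedup and parallel efficiency:
$$S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p},$$
where $T(p)$ is the runtime on $p$ processes for a fixed problem size. Ideal strong scaling means $S(p) \approx p$, i.e. $E(p) \approx 1$; for weak scaling the problem size grows proportionally with $p$ and one hopes that $T(p)$ stays roughly constant.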
Perfect scalability: Mandelbrot set
Examine, for $c \in \mathbb{C}$, the boundedness of
$$z_{n+1} = z_n^2 + c, \qquad z_0 = 0.$$
Is $(z_n)$ bounded as $n \to \infty$?
yes: $c$ becomes white
no: $c$ becomes black
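The per-point test could look as follows in C (an illustrative sketch, not the workshop code); the iteration cap max_iter is a free parameter, and $|z| > 2$ is the usual escape criterion.

#include <complex.h>

/* Escape-time test for one point c of the iteration z_{n+1} = z_n^2 + c, z_0 = 0.
 * Returns 1 if the orbit stays bounded within max_iter steps (point drawn white),
 * 0 otherwise (point drawn black). */
static int is_bounded(double complex c, int max_iter)
{
    double complex z = 0.0;
    for (int n = 0; n < max_iter; ++n) {
        z = z * z + c;
        if (cabs(z) > 2.0)   /* once |z| > 2 the orbit is known to diverge */
            return 0;
    }
    return 1;
}

Since every value of c is tested independently, the pixels can be distributed over the processes with no communication during the computation, which is why this example scales almost perfectly.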
Load balancing
Partition nodes such that each processor has the same load
Cut edges result in network communication $\rightarrow$ minimize this
Example: Simulate elastic phenomena in human body
Geometric model and corresponding matrix system
Volume of the d-dimensional sphere
The unit sphere in $\mathbb{R}^d$ is given by
$$S^d = \{z \in \mathbb{R}^d : \|z\|_2 \le 1\}$$
Generate uniformly distributed points $z \in [-1, 1]^d$
[Figure: uniformly distributed points in $[-1,1]^2$, with the points inside the unit circle highlighted in red]
The volume of $S^d$ is approximated by
$$\mathrm{Vol}(S^d) \approx 2^d \cdot \frac{\#\text{ dots inside}}{\#\text{ dots in total}}$$
Almost perfect scalability
Each compute node works independently on N points
Compute the parallel sum of the number of red dots
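A minimal sketch of how this estimate could be implemented with MPI (not the original workshop code); the dimension D, the number of points N per process, and the use of rand_r with a rank-dependent seed are assumptions made for the example.

/* mc_sphere.c: Monte Carlo estimate of Vol(S^d) on the cluster.
 * Every rank samples N points in [-1,1]^D independently; the hit
 * counts are combined with a single MPI_Reduce at the end. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define D 3          /* dimension (example value)    */
#define N 1000000L   /* points per process (example) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    unsigned int seed = 12345u + (unsigned int)rank;  /* rank-dependent seed */
    long local_hits = 0;

    for (long i = 0; i < N; ++i) {
        double r2 = 0.0;
        for (int k = 0; k < D; ++k) {
            double z = 2.0 * rand_r(&seed) / RAND_MAX - 1.0;  /* z in [-1,1] */
            r2 += z * z;
        }
        if (r2 <= 1.0)        /* point lies inside the unit sphere */
            ++local_hits;
    }

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double vol = pow(2.0, D) * (double)total_hits / ((double)N * size);
        printf("Vol(S^%d) is approximately %f\n", D, vol);
    }
    MPI_Finalize();
    return 0;
}

Because the ranks only communicate once at the very end, the slow network hardly matters here, which matches the almost perfect scalability observed for this example.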
Wind simulation in a city
Simulation of fluid flows through a city
Leads to the solution of large linear systems
Moderate scalability on the Pi cluster
The slow network is the bottleneck
Conclusion
What this cluster is not:
Not suitable for real world problems due to the slow network
Things the students learned:
Many facets of Linux administration and networks
How to implement mathematical methods on a computer using
C++ and MPI
Pitfalls in parallelizing numerical algorithms