The document provides an introduction to cluster computing, defining it as a group of linked computers that work together to enhance performance and availability. It discusses the history, configuration, advantages, types of clusters, and key design challenges associated with cluster computing. Additionally, it covers practical aspects such as secure connections using SSH, file transfer methods, and basic UNIX commands for managing files and directories.

Introduction to Cluster Computing
Dr. Hrachya Astsatryan,
Institute for Informatics and Automation Problems,
National Academy of Sciences of Armenia,
E-mail: hrach@sci.am
Slide 1
1
INTRO TO CLUSTER COMPUTING
Slide 2
Definition

• A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.

• The components of a cluster are commonly, but not always, connected to each other through fast local area networks.

• Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

Slide 3
History
• In the 1960s and 1970s, high-end
mainframes were standard in large
organizations and research
institutions but had limited
processing power and memory.
• In the 1980s, researchers and
engineers began experimenting
with connecting multiple low-cost
computers, often desktop PCs, to
form a cluster, thereby creating a
more powerful computing resource.

Slide 4
Configuration

[Diagram: compute nodes and a file server node connected by a high-speed network; a front-end and a gateway node on a service network; an external network reached through the gateway.]

• Computing nodes - process the user load.
• Front-end - monitor the cluster hardware and software, taking measures to reconfigure it according to any event.
• Service network - where the communication between nodes takes place.
• File server node - where the data is available to all computing nodes.

Slide 5
Key Facts

As of November 2023, there are 60 MPP supercomputers and 440 clusters in the Top500 list.

Slide 6
Advantages

• Cost-Effectiveness - built from commodity hardware, making it a cost-effective alternative to traditional supercomputers.
• Scalability - expand the computing power according to needs.
• Fault Tolerance - if one processor or node fails, the rest of the cluster can continue to function without interruption.
• Performance - combined computational power of multiple nodes working in parallel.
• Easy Maintenance and Upgrades - composed of standard components, so maintenance and upgrades are generally straightforward.
• Flexibility - the hardware and software configuration can be tailored to specific requirements.
• Distributed Data Storage.
• Widely Used Parallel Programming Models.

Slide 7
Design challenges

Which network to use?
• Latency
• Bandwidth
• Price

Which CPU architecture to use?
• Performance (floating point)
• Price

Which node architecture to use?
• Performance: local and remote communication
• Price

Space considerations
• Cooling/ventilation
• Power required

Slide 8
2
CLUSTER TYPES

Slide 9
MAIN TYPES

• High performance computing (HPC)
• Load Balancing
• High Availability

Slide 10
Load balancing

• A load-balancing cluster distributes computational tasks and network traffic evenly across multiple nodes in the cluster.

• It ensures that each node in the cluster receives a fair share of the workload, preventing overloading of specific nodes.

• Load balancing can be implemented at various levels, including application-level, transport-level, and network-level.

• Commonly used with busy FTP and web servers that have a large client base.

• A large number of nodes share the load.


Slide 11
Kinds of clusters – load balancing

[Diagram: a head node distributing requests across several worker nodes.]

• Round Robin - distributes tasks sequentially to each node in a cyclic manner (see the sketch after this list).
• Weighted Round Robin - assigns different weights to nodes based on their processing power, giving more tasks to powerful nodes.
• Least Connections - routes tasks to the node with the fewest active connections, distributing the load evenly.
• Weighted Least Connections - similar to the least connections algorithm, but considers node weights as well.
• ..
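As a concrete illustration of round robin, here is a minimal dispatch sketch in bash; the node names, the task_*.sh scripts, and the assumption of a shared filesystem are all hypothetical, not part of these slides:

nodes=(node1 node2 node3)
i=0
for task in task_*.sh; do
    # pick the next node in the cycle
    target=${nodes[$(( i % ${#nodes[@]} ))]}
    # run the task there in the background (assumes the scripts sit on a shared filesystem)
    ssh "$target" "bash $task" &
    i=$(( i + 1 ))
done
wait   # block until all dispatched tasks have finished

A real load balancer does the same bookkeeping continuously and per request, rather than once over a fixed batch of tasks.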
Slide 12
HPC (Beowulf)

• Started in 1994.

• Donald Becker of NASA assembled the world's first cluster from 16 DX4 PCs and 10 Mb/s Ethernet.

• Also called a Beowulf cluster.

• Built from commodity off-the-shelf hardware.

• Applications like data mining, simulations, parallel processing, weather modelling, computer graphical rendering, etc.

• The Beowulf cluster architecture remains a popular and cost-effective solution for high-performance computing, allowing researchers and organizations to tackle complex computational challenges efficiently.

Slide 13
Kinds of clusters - HPC

[Diagram: a head node connected to several worker nodes.]

Slide 14
Kinds of clusters - HPC

[Diagram: the head node sends data to each worker node.]

Slide 15
HPC clusters

[Diagram: all worker nodes are working on their parts of the job in parallel.]

Slide 16
HPC clusters

[Diagram: each worker node finishes, and the head node gets the results.]

Slide 17
High availability

• Avoid downtime of services

• Avoid single point of failure

• Always with redundancy

• Almost all load-balancing clusters also have HA capability

Slide 18
Menti 1: 1581 5048

Slide 19
HPC user environment

• Operating system: Linux (RedHat/CentOS, Ubuntu, etc.), Unix.


• Access to HPC cluster: ssh
• File transfer: secure ftp (scp)
• Job scheduler: Slurm, PBS, SGE, Loadleveler
• Software management: module
• Compilers: Intel, GNU, PGI
• MPI implementations: OpenMPI, MPICH, MVAPICH, Intel MPI
• Debugging and profiling tools: Totalview, Tau, DDT, Vtune
• Programming Languages: C, C++, Fortran, Python, Perl, R, MATLAB,
Julia
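
To give a feel for how a few of these pieces fit together in a typical session, here is a hedged sketch; the module names and the job script are placeholders, and the exact commands depend on the site's setup:

module avail                # list software provided through the module system
module load gcc openmpi     # load a compiler and an MPI implementation
sbatch my_job.sh            # submit a batch job to the Slurm scheduler
squeue -u $USER             # check the status of your own jobs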

Slide 20
3
ACCESS TO FRONT-END
Slide 21
Secure Connection with SSH
SSH is a cryptographic network protocol for secure remote access and
data exchange.
It's widely used for connecting to HPC clusters, remote servers, and
cloud instances.
SSH is a fundamental tool for maintaining the privacy and integrity of
your interactions with remote machines.
• Encryption: SSH encrypts data in transit, preventing eavesdropping.
• Authentication: It ensures a secure login process using keys or
passwords.
• Secure File Transfer: SSH includes
SCP and SFTP for secure data transfer.

Slide 22
SSH clients

Linux/MacOS
• OpenSSH - most Linux distributions come with OpenSSH pre-installed.

Windows
• PuTTY - a popular open-source SSH client for Windows. It's a lightweight and easy-to-use tool for remote access.
• Cygwin - a large collection of GNU and Open Source tools that provide functionality similar to a Linux distribution.
• PowerShell - a powerful and versatile command-line shell and scripting language developed by Microsoft. PowerShell can also be used as an SSH client on Windows to connect to HPC clusters.
• FileZilla - a graphical SFTP/FTP client, used mainly for file transfer.
Slide 23
Connect to an SSH server
To establish an SSH connection, we can use the ssh command
followed by the remote server's hostname or IP address and your
username:

• ssh username@remote-server

You may be prompted to enter your password or use SSH key-based authentication, depending on the server's configuration.
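
For illustration, a few common variations of the same command; the hostname hpc.example.org, the username jdoe, and the port number are placeholders:

ssh -p 2222 jdoe@hpc.example.org    # connect to a server listening on a non-default port
ssh -X jdoe@hpc.example.org         # enable X11 forwarding for graphical applications
ssh jdoe@hpc.example.org hostname   # run a single command remotely without an interactive shell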

Slide 24
SSH key-based authentication
Key pair generation
• You generate a pair of cryptographic keys - a public key and a
private key.
• The private key should be kept secure on your local machine.
• The public key is placed on the remote server or HPC cluster.

Authentication process
• When you attempt to connect to the remote server, your local SSH
client uses your private key to create a digital signature.
• The server checks if the digital signature matches the public key
stored on the server.
• If they match, you are granted access without the need for a
password.

Slide 25
SSH key-based authentication steps
Generate SSH key pair
• ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa

Copy public key to remote server
• ssh-copy-id user@remote-server

ssh-copy-id appends your public key (usually ~/.ssh/id_rsa.pub) to the
~/.ssh/authorized_keys file on the HPC cluster.

Secure your private key


• chmod 600 ~/.ssh/id_rsa

Test connection
• ssh user@remote-server
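
As an optional convenience, a minimal sketch of a ~/.ssh/config entry; the alias hpc, the hostname, and the username are placeholders:

Host hpc
    HostName hpc.example.org
    User jdoe
    IdentityFile ~/.ssh/id_rsa

With this in place, ssh hpc is enough to connect.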
Slide 26
Best practices for ssh key passwords

Creating a strong passphrase


• Craft a memorable passphrase that's both secure and easy to
remember.
• "Consider using a passphrase with 32 characters or more,
incorporating punctuation marks and number-for-letter
substitutions.

Password managers for convenience


• Utilize a password manager like KeePass or BitWarden with built-in
password generators.

Slide 27
ACCESS
• Install Powershell / Cygwin

• Download pem certificate, https://shorturl.at/acsK8

• ssh -i Private.pem ubuntu@185.127.66.38

Slide 28
4
UNIX COMMANDS
AND HINTS
Slide 29
File system exploration

ls - list current directory contents

cd <directory-to-change-to> - change the current directory


• cd .. - change to “one level higher” in directory tree
• cd (without argument) - change to $HOME
• cd /shared/home

pwd - print full path of the current directory
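
A short example session tying these commands together; the directory names are placeholders:

pwd              # e.g. /shared/home/your_name
ls               # list what is here
cd projects      # step into a sub-directory (assumed to exist)
cd ..            # go back up one level
cd               # jump straight to $HOME from anywhere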

Slide 30
File manipulation
• mkdir <new-directory-name> - create a directory
mkdir your_name

• touch <new-file-name> - create an empty new file

• cp <file-to-copy> <destination> - copy a file

• mv <file-to-move> <destination/new-file-name> - move or


rename a file

• rm <file-to-remove> - delete a file
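
For example, a small sequence using all of these commands on placeholder names:

mkdir results                            # create a directory
touch notes.txt                          # create an empty file
cp notes.txt results/                    # copy it into the directory
mv results/notes.txt results/day1.txt    # rename the copy
rm notes.txt                             # delete the original file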


Slide 31
Permissions

chmod <who><what><which> <file-name> - change file permissions
• who -> u: user , g: group , o: others, a: all
• what -> -:remove permission, +: add permission
• which -> r: read, w: write, x: execute
• example chmod u+x my-batch-job-script.sh adds execution
rights for current user to the file

chgrp - change the group of a file/folder (chown changes the owner)
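
A few illustrative invocations; the file name and group name are placeholders:

chmod u+x my-batch-job-script.sh   # the example above: add execute permission for the user
chmod g+r results.dat              # let members of the group read the file
chmod o-rwx results.dat            # remove all permissions for others
chgrp hpc-users results.dat        # change the file's group to hpc-users (assumed to exist)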


Slide 32
Check file contents

• less <text-file> - see text file (exit with q)

• cat <file-name> - see file content

• head <file-name> - list the first ten lines of the file

• tail -100 <file-name> - show the last 100 lines
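
For example, to inspect the output file of a running job (the file name is a placeholder):

head job-output.log        # show the first ten lines
tail -100 job-output.log   # show the last 100 lines
tail -f job-output.log     # keep following the file as new lines are appended (Ctrl-C to stop)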

Slide 33
Editors Vi

vi <file-name> - (create and) open file with vi

• press i to switch to “edit mode”

• edit your file

• when done, press esc to switch to “normal mode”

• type :wq and press Enter to save (write) the file and exit (quit) the editor

Slide 34
5
FILE TRANSFER TO FRONT-END
Slide 35
FTP
The TCP/IP protocol suite was developed in the late 1970s and early 1980s. The first FTP standard was RFC 114, published in April 1971, before TCP and IP even existed. In 1980 the first standard defining FTP operation over modern TCP/IP was published, at around the same time as the other primary defining standards for TCP/IP.

• FTP was created with the overall goal of allowing indirect use of computers on a network, by making it easy for users to move files from one place to another.
• Like most TCP/IP protocols, FTP is based on a client/server model, with an FTP client on a user machine creating a connection to an FTP server to send and retrieve files to and from the server.
• The main objectives of FTP were to make file transfer simple, and to shield the user from implementation details of how the files are actually moved from one place to another.
Slide 36
FTP model

• FTP server implementations enable simultaneous access by multiple clients.

• Clients use the reliable TCP protocol to connect to a server.

• The FTP server process awaits connections and creates a slave process to handle each connection.

• The slave process accepts and handles a control connection from the client.

Slide 37
FTP: port number and data

• The client uses a random (ephemeral) port number on its side of the initial connection to a server.

• The client contacts the server at a well-known port number (port 21) for the control connection.

• For data transfer, the client listens on another port and sends that port number across the control connection to the server.

• The client waits for the server to form a TCP connection to that specified port. The server uses port 20 for the FTP data transfer.

Slide 38
FTP: connect

ftp <username>@<hostname>

• FTP client

• Web browser

Slide 39
FTP: commands
• CWD - change working directory.

• LIST - list remote files

• MKD - make a remote directory

• PWD - print working directory

• QUIT - terminate the connection

• SIZE - return the size of a file

• USER - send username
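
For orientation, a sketch of how these protocol commands surface in an interactive ftp client session; the hostname, directory, and file names are placeholders:

ftp user@ftp.example.org    # USER (and password) are sent during login
ftp> pwd                    # PWD - print working directory
ftp> ls                     # LIST - list remote files
ftp> cd data                # CWD - change working directory
ftp> mkdir backup           # MKD - make a remote directory
ftp> quit                   # QUIT - terminate the connection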


Slide 40
Copy files to front-end
SCP and SFTP both run over ssh and are thus encrypted.

Linux
• Copy files from your computer to the cluster:
• scp local_filename username@remote_server:
• Copy files from the cluster to your computer:
• scp username@remote_server:/home/username/remote_filename .
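
Two further variants that are often useful; the paths are placeholders:
• scp -r results/ username@remote_server:   (copy a whole directory recursively)
• sftp username@remote_server   (interactive session; transfer files with put and get)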

Windows
• FileZilla

Slide 41
TRANSFER YOUR FILES

Copy the file "file.txt" from the local host to a remote host
• scp -i Private.pem file.txt ubuntu@185.127.66.38:/shared/home/your_dir

Copy the file "file.txt" from a remote host to the local host
• scp -i Private.pem ubuntu@185.127.66.38:/home/ubuntu/file.txt .

Slide 42
