Parallel Computing (CS 633)
January 8, 2024
Preeti Malakar
pmalakar@cse.iitk.ac.in
Logistics
• Class hours: MW 3:30 – 4:45 PM (RM 101)
• Office hour: MTW 5:00 – 5:30 PM (KD 221)
• https://www.cse.iitk.ac.in/users/cs633/2023-24-2
– Lectures will be uploaded after every class
• Announcements/uploads on
– MooKIT
– Course email alias
• Email to the instructor should always be prefixed with
[CS633] in the subject
Grading Policy
Participate actively in class
Switch OFF All Devices
Assignments
• Programming assignments in C
• In a group (group size = 3)
– Send group member information by Jan 14 to
{gsarkar,madhavm}@cse.iitk.ac.in
– Clearly include names, roll numbers, and IITK email IDs
– Subject of email [CS633 Group]
– Change in group formation is not allowed
• Mode of submission will be explained in due course
Assignments
• Credit for early submissions (+5 / day)
– Max credit: +15 / assignment
– Only the date of the final submission will be considered
• Score reduction for late submissions (-3 / day)
– Max 2 late days / assignment
• None of the assignments can be completed in a day!
Plagiarism will NOT be tolerated
Use of AI tools is NOT allowed
Lecture 1
Introduction
Multicore Era
• Intel 4004 (1971): single core, single chip
• Cray X-MP (1982): single core, multiple chips
• Hydra (2000): multiple cores, single chip
• IBM POWER4 (2001): multiple cores, multiple chips
Moore’s Law (1965)
Number of transistors in a chip doubles every 18 months
[Source: Wikipedia]
“However, it must be programmed with a more complicated parallel programming model to obtain maximum performance.”
Trends
[Source: M. Frans Kaashoek, MIT]
top500.org (Nov’23)
~ $600 million
~ 7300 sq. ft.
~ 22 MW power
~ 23000 L water
green500.org (Nov’23)
Metric of interest: Performance per Watt
Top #1 supercomputer
https://www.top500.org/resources/top-systems/
Making of a Supercomputer
Source: energy.gov
Greenest Data Centre?
Source: MIT TR 06/19
“The 149,000 square foot facility built on a hillside overlooking the UC Berkeley campus and San Francisco Bay will house one of the most energy-efficient computing centers anywhere, tapping into the region’s mild climate to cool the supercomputers at the National Energy Research Scientific Computing Center (NERSC) and eliminating the need for mechanical cooling.”
https://www.science.org/content/article/climate-change-threatens-supercomputers
Top Supercomputers from India
Supercomputing in India [topsc.cdacb.in, Jul’23]
Source: www.iitk.ac.in
Credit: Ashish Kuvelkar, CDAC
National Supercomputing Mission Sites
Big Compute
Massively Parallel Codes
Climate simulation of Earth [Credit: NASA]
Discretization
Gridded mesh for a global model [Credit: Tompkins, ICTP]
Numerical Weather Models
• Use numerical methods to solve equations
that govern atmospheric processes
• Are based on fluid dynamics and depend on
observations of meteorological variables
• Are used to obtain nowcasts/forecasts (see the sketch below)
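To make the discretization idea concrete, here is a minimal sketch in C (an illustration added for these notes, not course code; the grid size, coefficients, and boundary handling are all assumptions) of one of the simplest such methods: an explicit finite-difference update for the 1D heat equation u_t = alpha * u_xx. Weather models solve far more complex coupled equations on 3D grids, but the same update-each-cell-from-its-neighbours structure applies.

/* Illustrative sketch (assumption, not course code): an explicit
 * finite-difference scheme for the 1D heat equation u_t = alpha * u_xx.
 * Each time step updates every grid cell from its neighbours, the same
 * pattern weather models apply to much richer equations in 3D. */
#include <stdio.h>

#define N 16                        /* grid points (illustrative) */

int main(void) {
    double u[N], unew[N];
    const double alpha = 0.1, dt = 0.1, dx = 1.0;
    const double c = alpha * dt / (dx * dx);   /* stable while c <= 0.5 */

    for (int i = 0; i < N; i++)     /* initial condition: a heat spike */
        u[i] = (i == N / 2) ? 1.0 : 0.0;

    for (int step = 0; step < 100; step++) {
        for (int i = 1; i < N - 1; i++)        /* interior cells */
            unew[i] = u[i] + c * (u[i-1] - 2.0 * u[i] + u[i+1]);
        unew[0] = u[0];             /* fixed boundary values */
        unew[N-1] = u[N-1];
        for (int i = 0; i < N; i++)
            u[i] = unew[i];
    }

    for (int i = 0; i < N; i++)
        printf("u[%d] = %f\n", i, u[i]);
    return 0;
}

Splitting this grid across processes, with neighbour exchange at the boundaries, is exactly the kind of decomposition this course develops.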
Massively Parallel Simulations
Self-healing material simulation
[Nomura et al., “Nanocarbon synthesis by high-temperature
oxidation of nanoparticles”, Scientific Reports, 2016]
Massively Parallel Analysis
[Nomura et al., “Nanocarbon synthesis by high-temperature
oxidation of nanoparticles”, Scientific Reports, 2016]
Massively Parallel Codes
Cosmological simulation [Credit: ANL]
Massively Parallel Analysis
Virgo Consortium
Computational Science
[Source: Culler, Singh and Gupta]
Big Data
Output Data
• High-energy physics: 10 PB / year (Higgs boson simulation) [Source: CERN]
• Cosmology: 2 PB / simulation, scaled to 786K cores on Mira (Q Continuum simulation) [Source: Salman Habib et al.]
• Climate/weather: 240 TB / simulation (Hurricane simulation) [Source: NASA]
Input Data
[Credit: World Meteorological Organization]
System Architecture Trends
[Credit: Pavan Balaji@ATPESC’17]
I/O trends
NERSC I/O trends [Credit: www.nersc.gov]
Compute vs. I/O trends
[Figure: Byte/FLOP ratio for the #1 supercomputer in the Top500 list, 1997 to 2018; y-axis log scale from 10^-6 to 10^-3, showing the ratio declining over time.]
Why Parallel?
[Slide graphic: “A*”, 20 hours vs. 2 hours, “Not really”]
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.
– Almasi and Gottlieb (1989)
Speedup
Example – Sum of squares of N numbers

Serial:
  for i = 1 to N
    sum += a[i] * a[i]
Cost: O(N)

Parallel (P processes, each with N/P elements):
  for i = 1 to N/P
    sum += a[i] * a[i]
  collate result
Cost: O(N/P) + communication time
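A minimal MPI realization of this example in C might look like the sketch below (illustrative, not reference code; the data values and the size N are assumptions). Each process sums the squares of its N/P elements, and MPI_Reduce collates the partial sums; that reduce is the communication-time term above.

/* Illustrative sketch: parallel sum of squares with MPI.
 * Each of P processes handles N/P elements; MPI_Reduce collates.
 * Assumes N is divisible by P; data is synthetic (all 1.0). */
#include <stdio.h>
#include <mpi.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;           /* this process's share of the work */
    double local = 0.0, total = 0.0;

    for (int i = 0; i < chunk; i++) {
        double a_i = 1.0;           /* stand-in for a[i]; answer is N */
        local += a_i * a_i;
    }

    /* Collate the partial sums: the communication step, typically
     * logarithmic in the number of processes. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares = %f\n", total);

    MPI_Finalize();
    return 0;
}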
Performance Measure
• Speedup: S_P = Time(1 processor) / Time(P processors)
• Efficiency: E_P = S_P / P
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles
#Processes Time (sec) Speedup Efficiency
1 0.025 1 1.00
2 0.013 1.9 0.95
4 0.010 2.5 0.63
8 0.009 2.8 0.35
12 0.007 3.6 0.30
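For example, at 4 processes the table gives S_4 = 0.025 / 0.010 = 2.5 and E_4 = 2.5 / 4 ≈ 0.63. Efficiency falls as P grows because, for a fixed problem size, each process gets less computation while the cost of collating results remains.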
Ideal Speedup
[Figure: speedup vs. number of processors, with linear, superlinear, and sublinear curves.]
Issue – Scalability
[Source: M. Frans Kaashoek, MIT]
Scalability Bottleneck
Performance of weather simulation application
Distributed Memory Systems
• Networked systems
• Distributed memory
  – Local memory
  – Remote memory
• Parallel file system
[Diagram: node, cluster]
Parallel Programming Models
Libraries MPI, TBB, Pthread, OpenMP, …
New languages Haskell, X10, Chapel, …
Extensions Coarray Fortran, UPC, Cilk, OpenCL, …
• Shared memory
– OpenMP, Pthreads, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP (sketched below)
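For instance, a minimal hybrid MPI + OpenMP “hello” in C could look like the sketch below (an illustration for these notes, not course-provided code): MPI creates the distributed-memory processes, and OpenMP spawns shared-memory threads inside each one.

/* Illustrative hybrid MPI + OpenMP sketch: processes across nodes,
 * threads within each process. Build with an MPI compiler and OpenMP
 * enabled, e.g.: mpicc -fopenmp hybrid.c -o hybrid */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank, size;

    /* Request thread support because OpenMP threads will run. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        fprintf(stderr, "warning: limited MPI thread support\n");

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    printf("process %d of %d, thread %d of %d\n",
           rank, size, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}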
This course …
Large-scale Parallel Computing
• Message passing
• Parallel algorithms
• Designing parallel codes
• Performance analysis
Message Passing Paradigm
• Point-to-point (P2P) communications (see the sketch below)
• Collective communications
• Algorithms
• Performance
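As a preview, a minimal point-to-point exchange in C might look like the sketch below (illustrative only; the tag and payload are arbitrary choices): rank 0 sends one integer to rank 1, which receives and prints it.

/* Illustrative P2P sketch: rank 0 sends, rank 1 receives.
 * Run with at least two processes, e.g.: mpirun -np 2 ./p2p */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 633;                /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}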
Profiling
Parallel I/O
[Diagram: compute node rack (2 GB/s links, not shared) → bridge nodes → IB network (4 GB/s, shared) → I/O nodes (128:1 compute-to-I/O ratio) → GPFS filesystem]
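As a taste of what parallel I/O looks like to the programmer, the sketch below (an illustration with assumed names; the file out.dat and the block size are invented) has every MPI process write its own block of one shared file through MPI-IO.

/* Illustrative MPI-IO sketch: each process writes its rank-indexed
 * block of a shared file. File name and block size are invented. */
#include <mpi.h>

#define COUNT 4                     /* ints written per process */

int main(int argc, char **argv) {
    int rank, buf[COUNT];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < COUNT; i++)
        buf[i] = rank * COUNT + i;  /* distinct data per process */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own byte offset in the shared file. */
    MPI_File_write_at(fh, (MPI_Offset)rank * COUNT * sizeof(int),
                      buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}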
Job Scheduling
[Figure: scheduling of jobs from users onto nodes (Wikipedia); example of real supercomputer activity: Argonne National Laboratory Theta jobs]
Supercomputer Activity
Reference Material
• D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1998.
• A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd Ed., Addison-Wesley, 2003.
• M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra, MPI: The Complete Reference, 2nd Ed., Volume 1: The MPI Core, The MIT Press, 1998.
• W. Gropp, E. Lusk, and A. Skjellum, Using MPI, 3rd Ed., The MIT Press, 2014.
• Research papers