High Performance Computing
• What is HPC?
• Who needs high performance systems?
• How do you achieve high performance?
• How to analyze or evaluate performance?
• Power-Performance Tradeoff: Green Computing
• Best architecture/design for a problem
• Parallel Architecture: Design and Programming
• Cloud Computing, FOG/EDGE Computing/IoT
What are Supercomputers Used For?
Scientific simulations
Animated graphics
Analysis of geological data
Nuclear energy research and meteorology
Computational fluid dynamics
Analysis of business data
Online Sales
Analysis of social data
Social media: Facebook, YouTube, LinkedIn, …
How do you achieve high performance?
Performance: FLOPS or MIPS
High Performance => Increase FLOPS
How?
How do you achieve high performance?
How?
Increase the number of FPUs in the system
Increase the number of processors in the system
Increase the amount of registers/cache/RAM in the system
Use a different cache/RAM mapping/management policy
Restructure the program, use a different language
Use a different compiler
Use different algorithms/approaches for the same problem
Constraints: Cost, AMC, Power Consumption
How?
Increase the number of FPUs in the system
Vector Processors (SIMD, SSE, MMX), GPU Accelerators
Increase the number of processors in the system
Core i3/i5/i7, Ryzen R3/R5/R7: Dual/Quad/Hexa/Octa cores
Intel Xeon: 4, 6, 8, 10, 12, 16, 18, 20, 24, 38 cores
Intel Xeon Phi (KNL): 72 cores / 288 threads
AMD Threadripper: 8, 16, 32, 64 cores
Increase the amount of registers/cache/RAM in the system
Big register file/cache: power hungry
RAM/NVRAM/SSD: no disk movement, fast but costly
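The software knobs above (restructuring the program, cache-optimized code) can matter as much as hardware. A minimal C sketch (illustrative only, not from the slides): the same matrix sum, where loop order alone decides whether memory is walked with unit stride or with stride N, and hence how well the cache is used.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 1024

/* C stores 2-D arrays row-major, so the row-by-row loop walks
 * memory contiguously (cache friendly), while the column-by-column
 * loop jumps N doubles per access and misses far more often. */
double sum_row_major(double (*a)[N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++)        /* rows outer              */
        for (int j = 0; j < N; j++)    /* cols inner: unit stride */
            s += a[i][j];
    return s;
}

double sum_col_major(double (*a)[N])
{
    double s = 0.0;
    for (int j = 0; j < N; j++)        /* cols outer              */
        for (int i = 0; i < N; i++)    /* rows inner: stride N    */
            s += a[i][j];
    return s;
}

int main(void)
{
    double (*a)[N] = malloc(sizeof(double[N][N]));
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;
    /* Same result, very different cache behavior and runtime. */
    printf("%f %f\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}
```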
Technology Trends
• Desktop 8086/80386
– Processor, Motherboard, Co-Processor (Floating Point Unit), Graphics Card, RAM, Audio, Ethernet
• Desktop Pentium
– Processor (Co-Processor inside) + Motherboard (Audio, Ethernet) + Graphics Card
• Desktop PIV
– Processor + Motherboard (Graphics + Audio + Ethernet)
• Desktop Core
– Processor (Graphics inside) + Board (Audio, Ethernet)
• Mobile SoC
– Processor + Graphics + Board (almost everything in one chip)
Technology Trend
• Performance is no longer the main issue
– Power, Energy, Cost
– DVFS: run at a lower frequency to reduce power/energy consumption
• Most modern-day servers are
– Underutilized (cores, RAM)
– Same for laptops/desktops/mobiles
• Underutilization
– Wastes resources that could be shared with others
– Sharing methodology: virtualization
– Leads to Cloud Computing
Technology Trend
• Cloud Computing
– Economy: similar to OLA/UBER
– Renting model
• IoT: many things on the Internet
– Control and management at large scale
– Sensors and actuators
• Fog
– Peer computing, multiple levels
• Edge
– Computing at the edge, latency sensitive
Cloud/IoT/Edge/Fog
[Figure: users (e.g., in India) connect through compute nodes to a cloud system (Amazon EC2, Google Cloud, MS Azure) hosted on servers in the USA.]
[Figure: the same setup with edge servers located in India between the Indian users and the USA cloud servers.]
[Figure: fog computing (FC) nodes form intermediate layers between users and the cloud system.]
Technology Trend
• Single processor/single computer
– Single processor with SIMD instructions
• Multi-computer
– Cluster; data needs to travel outside the PC via LAN cables
• Multiprocessor
– Tightly coupled; data need not travel outside the PC or outside the board
• Processor + accelerator
– PCI or board-level communication
• Processor and accelerator on the same chip
– On-chip, high BW, e.g., Intel Core (graphics in chip)
• 3D chips
Quest for Performance
• Pipelining
• Superscalar Architecture
• Out-of-Order Execution
• Caches, SMT
• ISA Advancements
(single-processor techniques: past research)
• Parallelism
– Multi-core processors
– Clusters
– Grid, Cloud Systems
(this is the current and future trend)
Trend of HPC
• HPC system
– Multiple nodes/computers/blades
– Programming model: MPI
• Nodes are multicore
– Nodes have accelerators
– Programming models: OpenMP, OpenCL/CUDA
• Core
– Multithreaded
– With vector instructions
– 4-issue OoO pipelines, multilevel caches
– Programming model: gcc-optimized, vectorized code, OpenMP (a hybrid sketch follows below)
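To make the layering concrete, here is a minimal hybrid sketch (not from the slides; assumes an MPI installation and an OpenMP-capable compiler): MPI distributes work across nodes, OpenMP threads use the cores within a node, and the inner loop is simple enough for the compiler to vectorize.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);          /* MPI: typically one process per node */
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank sums its own slice; OpenMP threads use the cores,
     * and the loop body is vectorizable by the compiler
     * (e.g., build with: mpicc -O3 -fopenmp). */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < N; i += nprocs)
        local += (double)i * 0.5;

    double total = 0.0;              /* MPI: combine results across nodes */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}
```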
Need to study in HPC: User Perspective
• Single processor
– Architecture: core pipeline, core multithreading, cache hierarchy, SIMD
– C/C++ optimization methods: gcc, OpenMP, cache-optimized code
• Multicore node
– Multicore, accelerators, interconnections
– OpenMP model, CUDA model, accelerated model
• HPC server
– Multiple nodes/blades, interconnection, storage
Param Ishan HPC
• HPC system: a data center
• Many racks, many rack servers per rack
• Nodes are multicore: one rack server is one node
[Figure: node/rack server]
Param Ishan SC
• Login nodes:
– 2x CPU login nodes, 1x GPU login node
• Head and management:
– 1 pair of head nodes (in redundant mode), 1x management node
• Compute nodes:
– 126x compute nodes
– 4x high-memory compute nodes
– 16x CPU-GPU hybrid compute nodes
– 16x CPU-MIC hybrid compute nodes
• Network: FDR InfiniBand
• Storage:
– 150TB high-throughput scratch space
– 100TB high-throughput home area
– 50TB archival for long-term data storage
HPC: overall
• Top 500 HPC: multiprocessor, accelerator based
• Applications: programming model, management
• Cost of HPC: initial cost (system: racks, rack servers, SAS), space, AC, …
• Running cost of HPC: AMC, energy, management
• HPC on rent:
– VMs, management, revenue model, cost model
– Cloud model: IaaS, PaaS, SaaS (Infra/Platform/Software)
Processor Trends
In the “old days” of scientific supercomputing, leading-edge high performance systems were specially designed for the HPC market by companies like Cray, CDC, NEC, Fujitsu, or Thinking Machines.
Today the HPC world is dominated by cost-effective, off-the-shelf systems with processors that were not primarily designed for scientific computing.
Stored-program computer architecture (SISD)
During the last decade, multicore processors have superseded traditional single-core designs. In a multicore chip, several processors (“cores”) execute code concurrently.
Performance metrics and benchmarks brought architectural changes, such as L2 caches and floating-point units, to increase speed.
Transistors galore: Moore’s Law
• Increasing chip transistor counts and clock speeds have enabled processor designers to implement many advanced techniques.
• A multitude of concepts have been developed, including the following:
1. Pipelined functional units
2. Instruction-level parallelism (ILP)
3. Superscalar architecture: “direct” instruction-level parallelism, enabling an instruction throughput of more than one per cycle
4. Data parallelism through SIMD instructions. SIMD (Single Instruction Multiple Data) instructions issue identical operations on a whole array of integer or FP operands held in special registers
5. Out-of-order execution: if arguments to instructions are not available in registers “on time,” the hardware can execute later instructions whose operands are ready, out of program order
6. Larger caches
7. The RISC paradigm
8. Multicore processors
9. Pipelining
• Out-of-order execution and compiler optimization must work together to fully exploit superscalarity.
• However, even on the most advanced architectures it is extremely hard for compiler-generated code to achieve a throughput of more than 2–3 instructions per cycle (see the vectorization sketch below).
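As an illustration of points 3–5, a minimal C sketch of a loop that compilers auto-vectorize (assuming gcc on a SIMD-capable CPU; the function name is just for illustration):

```c
#include <stddef.h>

/* SAXPY-style loop: y[i] += a * x[i].
 * With gcc -O3 -march=native this typically compiles to packed
 * SIMD instructions (SSE/AVX), processing several floats per cycle
 * and keeping the pipelined FP units busy. 'restrict' promises the
 * arrays do not alias, which is what lets the compiler vectorize. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```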
Cache mapping
So far we have implicitly assumed that there is no restriction on which cache line can be associated with which memory locations. A cache design that follows this rule is called fully associative. The decision of which cache line to replace next when the cache is full is made by some algorithm implemented in hardware:
- least recently used (LRU)
- NRU (not recently used) or random replacement are also possible
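In contrast, direct-mapped and set-associative caches restrict where a line can go. A minimal sketch of how the hardware splits an address (the geometry here, 64-byte lines and 512 sets, is an assumed example, e.g., a 32 KiB direct-mapped cache):

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6   /* 64-byte line -> 6 offset bits */
#define SET_BITS  9   /* 512 sets     -> 9 index bits  */

int main(void)
{
    /* An address is sliced into tag | set index | line offset.
     * The set index says which set the line must live in; only
     * the tag needs to be stored and compared on a lookup. */
    uint64_t addr   = 0x7ffd12345678ULL;   /* arbitrary example address */
    uint64_t offset = addr & ((1ULL << LINE_BITS) - 1);
    uint64_t set    = (addr >> LINE_BITS) & ((1ULL << SET_BITS) - 1);
    uint64_t tag    = addr >> (LINE_BITS + SET_BITS);

    printf("offset=%llu set=%llu tag=%#llx\n",
           (unsigned long long)offset, (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}
```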
Modern processors
• Multicore processors
• Multithreaded processors
• Vector processors
They follow the SIMD paradigm, which demands that a single machine instruction be automatically applied to a large number of arguments of the same type, i.e., a vector.
Multiprocessors
Mobile SoC Example
• Heterogeneous
• Different hardware for different purposes
• Efficiency in terms of
– Performance
– Energy
• All in one chip
Mobile SoC + Peripherals
Similar to a motherboard-and-components assembly; for every component there are dozens of varieties to choose from.
Mobile SoC
• Apple: A15, M1, M1X, M2
– 2x 3.23GHz (Firestorm) + 4x 2GHz (Icestorm), or 8 cores; Neural Engine, GPU
• Qualcomm: SD888, SD870
– 1x Kryo X1 @2.8, 3x A78 @2.4, 4x A55 @1.8, AI, 5G, GPU
• Samsung: Exynos 9611
– 4x A73 @2.3GHz, 4x A53 @1.7GHz, Mali G72, 5G, codecs
• Huawei: HiSilicon Kirin 9000
– 1x A77 @3.13, 3x A77 @2.54, 4x A55 @2.0, Mali MP24, AI, 5G, neural engine
• MediaTek: Dimensity 1200
– 1x A78 @3.0, 3x A78 @2.6, 4x A55 @2.0, Mali MP24, 5G, AI
• Benchmarking: AnTuTu 9, Geekbench 5, 3DMark
• Saturation of single-processor performance
• Speed limit not to cross: ~4GHz
– The ultimate point
– Power consumption is proportional to the cube of frequency: P = k·f³
• Single processor
– Branch prediction accuracy has gone up to 95%
– L1 cache hit rates have gone up to 80%
– ILP (instruction-level parallelism) exploited by a uniprocessor is up to 8
– Thread/data-level parallelism needs to be exploited
Power Aware Scheduling
• P = ½·C·V²·F = k·F³ (with V–F pairs, V ∝ F)
– Running a processor at 3GHz consumes 27 times more power than running it at 1GHz
• E = k·F³·T
• Running a task at F and at F/3 (say 3GHz and 1GHz):
– E_F = k·F³·T
– E_{F/3} = k·(F/3)³·(3T) = k·F³·T/9 = E_F/9
– 3 times slower but 9 times more energy efficient
• If time permits, reduce the frequency
• If a task has enough slack before its deadline, reduce the frequency (see the sketch below)
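A tiny numeric check of the slide’s model (a sketch; k, F, and T are arbitrary example values, not real processor parameters):

```c
#include <stdio.h>

/* Cubic DVFS model from the slide: P = k*F^3, E = P*T.
 * Running at F/3 takes 3x the time but uses 1/9 the energy. */
int main(void)
{
    const double k = 1.0;   /* arbitrary model constant       */
    const double F = 3.0;   /* GHz                            */
    const double T = 1.0;   /* runtime (seconds) at frequency F */

    double e_full = k * F * F * F * T;              /* E_F = 27 */
    double f3     = F / 3.0;
    double e_slow = k * f3 * f3 * f3 * (3.0 * T);   /* E_{F/3} = 3 */

    printf("E_F = %.2f, E_{F/3} = %.2f, ratio = %.1f\n",
           e_full, e_slow, e_full / e_slow);        /* ratio = 9 */
    return 0;
}
```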
• Application-specific ICs (ASICs)
– Higher performance and lower power than a processor
– But the complexity of ASIC design is very high
– Example: 50MP + UHD video, GPS camera inside a mobile handset
– Fixed for one application
• VLSI technology offers high integration density
• Moore’s Law (Gordon Moore’s 1965 prediction)
– Exponential growth of the number of transistors on an IC
– Doubled every 26 months for the past three decades
• Why more transistors per IC? Smaller transistors, larger dice
• Many applications are highly parallel
– Take benefit of all parallelism (instruction, data, and thread)
• Multiprocessors are
– Flexible, programmable, high performance
• Processors are programmable compared to ASICs
• Flexible in terms of portability compared to ASICs
• Higher performance than a single processor
• Multiprocessors are likely to be cost/power-effective solutions
– They share lots of resources
• A personal room is costlier than a dormitory
• You cannot allocate a bungalow to each student: it would be too costly
– A hostel room with shared facilities is sufficient
– They need not run at very high frequency
– Lots of replication makes them easy to manage and cost-effective to design
– Sharing resources raises many other problems (see the lock sketch below)
• Critical sections
– Lock and barrier design
• Coherence
– Shared data at all places should be the same
• Consistency
– Ordering should be equivalent to some serial order
• One processor interferes with others
– Share efficiently using some policy
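A minimal OpenMP sketch of the critical-section point above (illustrative; without the lock, the shared update races and increments are lost):

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    long counter = 0;

    /* Many threads update one shared counter. The read-modify-write
     * (load, add, store) is not atomic, so without protection two
     * threads can overwrite each other's update. The critical
     * section serializes the update, at some performance cost. */
    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        #pragma omp critical
        counter++;
    }

    printf("counter = %ld\n", counter);   /* 1000000 with the lock */
    return 0;
}
```

In real code a reduction or atomic update would be preferred here; the critical section stands in for the general lock/barrier design problem named above.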
• Many applications are highly parallel
– Take benefit of all parallelism (instruction, data, and thread)
– But most coders write sequential code
– Who will extract the parallelism from applications?
– There is no successful auto-parallelization tool to date
» Attempts: Cetus, SUIF, SolarisCC
• Good news: CNN/DNN Python parallel libraries are quite successful in the GPU domain
• Task scheduling on multiprocessors
– Deterministic task scheduling on a multiprocessor with two or more processors is an NP-complete problem
• Simple example
– 8 tasks with execution times 2, 4, 8, 5, 6, 4, 3, 20
– To be executed non-preemptively on two processors P1 and P2
– So that the overall execution time (makespan) is minimized
– Solution: divide the 8 tasks into two subsets such that the difference of their sums is minimized; this is the partition variant of the Subset Sum Problem
The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target sum T, and the question is to decide whether any subset of the integers sums to precisely T. The problem is known to be NP-complete.
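A small sketch (not from the slides) solving the example instance with the classic subset-sum dynamic program; the reachable sum closest to half the total (52/2 = 26) gives the minimum makespan:

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    /* The slide's example: 8 task times, two processors. */
    int t[] = {2, 4, 8, 5, 6, 4, 3, 20};
    int n = 8, total = 0;
    for (int i = 0; i < n; i++) total += t[i];   /* total = 52 */

    /* reach[s] == true if some subset of the tasks sums to exactly s.
     * Items outermost, sums downward, so each task is used at most once. */
    bool reach[64] = { false };                  /* big enough for 52 */
    reach[0] = true;
    for (int i = 0; i < n; i++)
        for (int s = total; s >= t[i]; s--)
            if (reach[s - t[i]]) reach[s] = true;

    /* Best split: largest reachable sum not exceeding total/2. */
    int best = 0;
    for (int s = total / 2; s >= 0; s--)
        if (reach[s]) { best = s; break; }

    /* Makespan is the heavier side of the split. */
    printf("P1 load = %d, P2 load = %d, makespan = %d\n",
           best, total - best, total - best);
    return 0;
}
```

Here 20 + 6 = 26 (or 20 + 4 + 2), leaving 26 on the other processor, so both finish at time 26, the optimal makespan.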