Introduction to GPU Computing
Dr. Sabbi Vamshi Krishna
Course Outline
GPU Server Board
• GPU connections on the server board.
GPU Architecture
• Detailed study of GPU cores, memory hierarchies, and compute units.
Vector Pipeline
• Understanding vector processing in GPUs.
SIMD vs SIMT
• Comparison of SIMD (Single Instruction, Multiple Data) and SIMT (Single Instruction, Multiple Threads).
Row-Major and Column-Major Order
• Memory layout and access patterns.
GPU Programming Models
• Introduction to different programming models for GPUs.
CUDA Programming Models
• Detailed exploration of CUDA programming paradigms.
CUDA Memories
• Understanding various CUDA memory types (global, shared, local).
Thread Divergence
• Impact of divergent branches on performance.
Page Fault and Zero-Copy Concepts
• Memory management techniques including page faults and zero-copy.
GPU Occupancy
• Concepts of Streaming Multiprocessor (SM) and GPU occupancy.
Profiler
• Tools and techniques for profiling GPU performance.
Performance Optimization Techniques
• Techniques for optimizing GPU performance, such as memory coalescing and occupancy tuning.
Python Numba Programming
• Using Numba for GPU acceleration in Python.
CPU System and GPU System
Server Boards
Placement of Components
• CPU, GPU, RAM, and storage: critical components at the heart of computing infrastructure.
• Boards use different architectures to meet diverse processing needs.

CPU Server Board
• Multiple sockets to accommodate high-performance CPUs.
• Handles multiple threads simultaneously and performs several tasks in parallel.
• Focused on balanced performance, providing robust memory bandwidth, high-speed interconnects, and support for large amounts of RAM.
• The backbone of traditional computing infrastructures.

GPU Server Board
• Includes multiple PCIe slots or specialized connectors for GPUs.
• Equipped with advanced features like NVMe storage for high-speed data access, and networking capabilities to handle large-scale data transfers.

[Board diagrams: CPU compute node and GPU compute node layouts, each showing CPU0/CPU1 sockets, motherboard, and storage.]
Peripheral Component Interconnect Express (PCIe) Slots
General-purpose, high-speed interface standard.
Used to connect various components such as GPUs, storage, and network cards.
Characteristics
Universal standard, compatible with a wide range of devices beyond just GPUs.
PCIe 4.0 runs at 16 GT/s per lane (16 billion transfers per second) and uses 128b/130b encoding: of every 130 bits transmitted, 128 bits are actual data and 2 bits are encoding overhead.
Effective bit rate = (128 / 130) × 16 GT/s ≈ 15.75 Gb/s per lane
Data rate = 15.75 Gb/s ÷ 8 ≈ 1.97 GB/s per lane ≈ 2 GB/s
Bandwidth for multiple lanes: 16 lanes × 2 GB/s = 32 GB/s in each direction (see the measurement sketch below)
Operates on a point-to-point connection model.
Higher latency than specialized interconnects, owing to its general-purpose design.
https://serverfault.com/questions/11633/whats-the-bandwidth-and-form-factor-for-pcie-x1-x4-x8-and-x16
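How much of that theoretical 32 GB/s a system actually delivers can be checked by timing a transfer. Below is a minimal CUDA sketch, assuming a CUDA-capable GPU; the 256 MB buffer size is an arbitrary choice, and error checking is omitted for brevity:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256UL << 20;           // 256 MB test buffer
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost((void **)&h_buf, bytes);     // pinned host memory: pageable copies fall well short of peak
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Compare against the ~32 GB/s ceiling of a PCIe 4.0 x16 slot.
    printf("Host-to-device bandwidth: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}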
NVLink
High-speed interconnect technology developed by Nvidia.
Provides GPU-to-GPU and GPU-to-CPU connectivity in Nvidia GPU architectures.
Designed to overcome the limitations of the PCIe interconnect, particularly in high-performance computing and deep learning applications.
Characteristics
Offers higher bandwidth than PCIe: NVLink 2.0 (used in the Tesla V100) provides 50 GB/s of bidirectional bandwidth per link (25 GB/s in each direction), with six links for up to 300 GB/s per GPU when fully connected; NVLink 3.0 (used in the A100) keeps 50 GB/s per link and doubles the link count to twelve, for total GPU-to-GPU bandwidth up to 600 GB/s.
Designed for low latency, which is critical for applications where GPUs need to exchange data frequently and rapidly.
Supports mesh network topologies: connects GPUs directly to each other, enabling them to communicate at full NVLink speed without going through the CPU as PCIe transfers do (see the peer-access sketch below).
More power-efficient than PCIe and effective in dense computing environments.
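The direct GPU-to-GPU path described above is exposed through CUDA's peer-access API. A minimal sketch, assuming a system with at least two GPUs; note that the runtime reports peer capability whether the physical path is NVLink or PCIe:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can device 0 reach device 1 directly?
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);       // second argument (flags) must be 0
        printf("Peer access 0 -> 1 enabled; copies can now bypass the CPU.\n");
    } else {
        printf("No direct peer path; transfers are staged through host memory.\n");
    }
    return 0;
}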
Computing Device

Multi-Core Device (CPU)
• Primary component of computing architecture.
• Designed with a few powerful cores capable of executing complex instructions.
• Allows for concurrent processing of multiple tasks and can handle different aspects of a workload simultaneously.
• Effective for sequential task execution and complex decision-making processes.
• Able to handle a variety of instructions and execute them with high precision.
• Has inherent limitations when dealing with highly parallel tasks: excels at single-threaded performance but may struggle with workloads that require massive parallelism.

Many-Core Device (GPU)
• Initially designed to accelerate rendering tasks in graphical applications.
• Characterized by a highly parallel structure: thousands of smaller cores designed to handle many operations simultaneously.
• Well-suited for tasks that can be parallelized, such as graphics rendering, complex simulations, and large-scale data processing (see the sketch below).
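As a concrete illustration of the many-core model, here is a minimal CUDA sketch of SAXPY (a hypothetical example, not taken from the slides): work a CPU would process in one long serial loop is spread across roughly a million lightweight GPU threads, one element each:

#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: each of the GPU's many cores does a tiny piece of work.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;      // global thread index
    if (i < n) y[i] = a * x[i] + y[i];                  // guard the final partial block
}

int main() {
    const int n = 1 << 20;                              // one million elements
    float *x = nullptr, *y = nullptr;
    cudaMallocManaged((void **)&x, n * sizeof(float));  // unified memory keeps the sketch short
    cudaMallocManaged((void **)&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);     // 4096 blocks of 256 threads
    cudaDeviceSynchronize();

    printf("y[0] = %.1f (expected 4.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}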
Nvidia GPU Node View
Source: https://www.broadberry.co.uk/tesla-gpu-rackmount-servers
GPU Evolution
The evolution of GPU hardware has been closely linked with changing usage patterns.

Origin
• Came into existence as specialized hardware for graphics computation, with the primary role of accelerating visual output; relying on the CPU for visual rendering (circa 1990) made computers perform slowly.

Driving Forces
• Early GPU development was driven by the appetite for greater realism, more sophisticated effects, higher screen resolutions, and increased frames per second.

Fixed-Function Hardware
• Early GPUs operated with a fixed-function pipeline, dedicated to executing specific graphical tasks.

Programmability
• Programmability was gradually introduced across multiple stages of the pipeline, enabling GPUs to undertake a broader range of tasks beyond graphics alone.

New Capabilities
• Utilizing a GPU in combination with a CPU improved computer speed, since the GPU could conduct several computations simultaneously.

Current Use
• Today, GPUs serve as powerful, programmable accelerators for a wide range of data-parallel workloads, including graphics.

Modern Architecture
• Supports diverse fields such as machine learning, scientific computation, and large-scale simulation, wherever massive parallelism and high throughput are needed.
GPGPU vs GPU
A General-Purpose Graphics Processing Unit (GPGPU) is a graphics processing unit (GPU) that can be used for purposes beyond graphical processing, such as performing computations typically conducted by a Central Processing Unit (CPU).
“Extending the use of the graphics processor to non-graphics workloads is known as General-Purpose GPU (GPGPU) computing.”
Why Called an Accelerator or Co-processor?
“A special-purpose device that supplements the main general-purpose CPU to speed up certain operations.”
“A GPU is an additional hardware component that can perform operations alongside a CPU.”
GPU Types
GPUs come in two flavors.

Integrated GPUs
• A graphics processing engine contained within the CPU.
• Has no dedicated memory; uses the system memory.

Dedicated GPUs
• A GPU on a separate peripheral card.
• Has its own dedicated memory.
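Which flavor a given device is can be checked at runtime. A minimal sketch using CUDA's device-query API; the integrated property distinguishes GPUs that share system memory from dedicated cards with their own memory:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);  // fills name, memory size, integrated flag, etc.
        printf("GPU %d: %s, %.1f GB of %s memory\n",
               d, prop.name, prop.totalGlobalMem / 1e9,
               prop.integrated ? "shared system (integrated)" : "dedicated");
    }
    return 0;
}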
Hardware Performance Comparison: CPU vs GPU
Exploring Key Differences

1. Clock Speed
CPU: high clock speed
GPU: lower clock speed

2. Cores and Threads
CPU: few cores, each individually faster
GPU: many cores, each individually slower

3. Function
CPU: generalized component that handles the main processing functions
GPU: specialized component for parallel computing

4. Processing
CPU: designed for serial instruction processing
GPU: designed for parallel instruction processing (contrasted in the sketch after this list)

5. Suited For
CPU: general-purpose computing applications
GPU: high-performance computing applications

6. Operational Focus
CPU: low latency
GPU: high throughput

7. Interaction with Other Components
CPU: interacts with more computer components, such as RAM, ROM, the basic input/output system (BIOS), and input/output (I/O) ports
GPU: interacts mainly with RAM and the display

8. Versatility
CPU: more versatile, able to execute numerous kinds of tasks
GPU: less versatile, limited to a narrower set of tasks

9. API Limitations
CPU: no API limitations
GPU: restricted to specific APIs

10. Context Switch Latency
CPU: switches slowly between multiple threads
GPU: negligible between warps, since warp scheduling is handled in hardware
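To make difference 4 concrete, here is a short sketch contrasting serial and parallel processing of the same hypothetical scaling operation (not taken from the slides): the CPU version iterates over every element, while the GPU version replaces the loop with one thread per element:

// CPU: one core walks the array serially, element by element.
void scale_cpu(int n, float a, float *v) {
    for (int i = 0; i < n; i++) v[i] *= a;          // n sequential iterations
}

// GPU: the loop disappears; each of n threads handles one element in parallel.
__global__ void scale_gpu(int n, float a, float *v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) v[i] *= a;                           // executed once per thread
}
// Launched as, e.g.: scale_gpu<<<(n + 255) / 256, 256>>>(n, a, d_v);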