Introduction To GP-GPU and CUDA: High Performance Computing Center Hanoi University of Science & Technology

This document provides an introduction to general purpose computing on graphics processing units (GPGPU) and NVIDIA's CUDA programming model. It discusses how GPUs are optimized for parallel processing compared to CPUs, and outlines CUDA's hardware and memory models. The document also gives examples of bioinformatics and weather forecasting applications that can benefit from GPU acceleration.

High Performance Computing Center

Hanoi University of Science & Technology

Introduction to GP-GPU and CUDA

Duong Nhat Tan (dn.nhattan@gmail.com)

2012
Outline

- Overview
- What is GPGPU?
- GPU Computing with CUDA
  - Hardware Model
  - Execution Model
  - Thread Hierarchy
  - Memory Model
- GPU Computing Application Areas
- Summary
Overview

- Scientific computing has the following characteristics:
  - The problems are not trivial.
  - Computers are used to carry out the arithmetic.
  - Programs are always expected to run faster.
  - Examples: weather forecasting, climate change, modeling, simulation, gene prediction, docking…


Several Approaches
- Supercomputers
- Mainframes
- Clusters
- Multi-core/many-core systems


Microprocessor trends
- Many cores running at lower frequencies are fundamentally more power-efficient
- Multi-core (2-8 cores)
  - Intel CPUs: Pentium D, Core Duo, Core 2 Duo, quad-core, Core i3, i5, i7
- Many-core (> 8 cores)
  - GPU (Graphics Processing Unit)

Source: A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, “Optimizing Power Using Transformations,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
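
The power-efficiency claim can be made concrete with the standard model for CMOS dynamic power. The following is a rough sketch, not a derivation taken from the slides: it assumes a perfectly parallel workload, ignores leakage and communication overhead, and the symbols (activity factor \alpha, switched capacitance C, supply voltage V, clock frequency f, reduced voltage V') are introduced here for illustration only.

```latex
P_{\text{dyn}} = \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
P_{N} = N \cdot \alpha\, C\, V'^{2}\, \frac{f}{N}
      = \alpha\, C\, V'^{2} f
      = P_{\text{dyn}} \left(\frac{V'}{V}\right)^{2}
```

Replacing one core at frequency f with N cores at f/N keeps the aggregate throughput the same, but the lower clock allows a lower supply voltage V' < V. If, for example, V'/V = 0.7, the N-core design would draw roughly half the dynamic power (0.7² = 0.49) under these assumptions.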
The development of modern GPUs

- GPU: NVIDIA GeForce GTX 295

  CUDA Cores:                 480 (240 per GPU)
  Graphics Clock (MHz):       576
  Processor Clock (MHz):      1242
  Memory Clock (MHz):         999
  Memory Bandwidth (GB/sec):  223.8
  Benchmark (GFLOPS):         1788.48
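
The GFLOPS and bandwidth figures appear to follow from the other rows of the specification. The worked arithmetic below is a sketch under two assumptions that are not stated on the slide: each CUDA core is counted as issuing 3 single-precision floating-point operations per cycle (a multiply-add plus a multiply), and each of the two GPUs has a 448-bit, double-data-rate memory interface.

```latex
\begin{aligned}
\text{Peak throughput} &= 480\ \text{cores} \times 1.242\ \text{GHz} \times 3\ \tfrac{\text{FLOP}}{\text{cycle}}
  = 1788.48\ \text{GFLOPS} \\[4pt]
\text{Memory bandwidth} &= 2\ \text{GPUs} \times \frac{448\ \text{bit}}{8\ \text{bit/byte}} \times 999\ \text{MHz} \times 2
  \approx 223.8\ \text{GB/s}
\end{aligned}
```

Both numbers are theoretical peaks; sustained performance of real applications is normally lower.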


CPU vs GPU
- CPUs are optimized for high performance on sequential code: transistors are dedicated to data caching and flow control
- GPUs use additional transistors directly for data processing

Book: “Programming Massively Parallel Processors: A Hands-on Approach”


GPU Solutions

- NVIDIA
  - GeForce (gaming/movie playback)
  - Quadro (professional graphics)
  - Tesla (HPC)
- AMD/ATI
  - Radeon (gaming/movie playback)
  - FireStream (HPC)

Figure: AMD FireStream 9170


Motivation

- Cost/performance ratio
- Power supply costs
- Maintenance and operation costs


GPGPU
- GPGPU stands for General-Purpose computation on GPUs
  - An approach that uses the GPU chip on the video card as a coprocessor to accelerate operations that are normally executed on the CPU
- How is GPGPU different from ordinary graphics operations?
  - GPGPU means running various kinds of algorithms on a GPU, not necessarily image processing
  - Examples: FFT, Monte Carlo, data sorting, data mining, and the list continues
- Until 2006, developers had to cast their problems into the graphics domain and solve them using graphics APIs


Parallel Computing with GPU


NVIDIA GPU
- 11/2006: NVIDIA released the G80 architecture together with an application development environment, CUDA
  - Allows developers to write GPGPU applications in high-level programming languages
- G80 architecture:
  - Built from a scalable array of Streaming Multiprocessors (SMs)
  - Each SM contains 8 SPs (Scalar Processors)
  - Each SM can initialize, manage, and execute up to 768 threads

Figure: G80 Architecture


NVIDIA GPU
- G80-based GPU
  - GeForce 8800 GT
    - 14 SMs, equivalent to 112 cores
    - DRAM: 512 MB
- 06/2008
  - GeForce GT 200 series
    - 30 SMs (240 cores)
    - DRAM: 1 GB
  - Tesla
    - 30 SMs (240 cores)
    - DRAM: 4 GB
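
The SM counts and memory sizes listed above can be read from an installed card through the CUDA runtime. The program below is a minimal sketch, not part of the original slides. The helper `totalCores` is a hypothetical name introduced here, and the factor of 8 SPs per SM is an assumption that holds only for the G80 and GT200 generations described in these slides; the runtime reports the number of SMs but not the number of cores per SM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: cores = SMs x scalar processors per SM.
// 8 SPs per SM is valid for G80/GT200 only (assumption from the slides).
int totalCores(int smCount, int spPerSm) {
    return smCount * spPerSm;
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:          %d.%d\n", prop.major, prop.minor);
        printf("  Streaming multiprocessors:   %d\n", prop.multiProcessorCount);
        printf("  Max resident threads per SM: %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Device memory (MB):          %llu\n",
               (unsigned long long)(prop.totalGlobalMem >> 20));
    }

    // Arithmetic from the slides, assuming 8 SPs per SM.
    printf("14 SMs x 8 SPs = %d cores\n", totalCores(14, 8));
    printf("30 SMs x 8 SPs = %d cores\n", totalCores(30, 8));
    return 0;
}
```

Compiled with `nvcc`, this prints one block of properties per CUDA-capable device, followed by the two core counts quoted on the slide.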


Tesla Specification

- Power consumption: 187 W!


GPU Computing with CUDA
- CUDA: Compute Unified Device Architecture
- Application development environment for NVIDIA GPUs
  - Compiler, debugger, profiler, high-level programming languages
  - Libraries (CUBLAS, CUFFT, ...) and code samples
GPU Computing with CUDA
- The GPU is viewed as a compute device that:
  - Is a coprocessor to the CPU (the host)
  - Has its own DRAM (device memory)
- CUDA C is an extension of the C/C++ language
- Data-parallel programming model
  - Executes thousands of threads in parallel on the GPU
  - Synchronization is inexpensive
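
The host/device split and the data-parallel model described above can be illustrated with a vector addition. This is a minimal sketch rather than code from the slides: the names `vecAdd` and `runVecAdd`, the block size of 256 threads, and the input values are all choices made here for illustration. The host allocates device memory, copies the inputs over, launches one thread per element, and copies the result back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Kernel (runs on the device): each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the last, partial block
}

// Hypothetical helper (runs on the host): adds two n-element vectors on the
// GPU and returns the largest absolute error against the expected value 3.0f,
// or -1.0f if a CUDA call fails.
float runVecAdd(int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    float maxErr = -1.0f;

    // The device has its own DRAM, so buffers are allocated there explicitly.
    if (cudaMalloc((void**)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void**)&d_b, bytes) == cudaSuccess &&
        cudaMalloc((void**)&d_c, bytes) == cudaSuccess) {
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                         // threads per block
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

        // cudaMemcpy waits for the kernel to finish before copying back.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            maxErr = 0.0f;
            for (int i = 0; i < n; ++i)
                maxErr = fmaxf(maxErr, fabsf(h_c[i] - 3.0f));
        }
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return maxErr;
}

int main() {
    float err = runVecAdd(1 << 20);  // about one million threads
    if (err < 0.0f) { fprintf(stderr, "CUDA error\n"); return 1; }
    printf("max abs error: %g\n", err);
    return 0;
}
```

The `__global__` qualifier and the `<<<blocks, threads>>>` launch syntax are the CUDA C extensions to C/C++; everything else is ordinary host code. The launch here creates about a million lightweight threads, which is typical of the data-parallel style.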


Hardware implementation
A set of SIMD multiprocessors with on-chip shared memory