Graphics Processing Unit
(GPU) Memory Hierarchy
    Presented by Vu Dinh and Donald MacIntyre
Agenda
●   Introduction to Graphics Processing
●   CPU Memory Hierarchy
●   GPU Memory Hierarchy
●   GPU Architecture Comparison
    o   NVIDIA
    o   AMD (ATI)
● GPU Memory Performance
● Q&A
Brief Graphics Processing History
● Graphics processing has
  evolved from fixed-function
  hardware pipeline units into
  highly programmable
  pipelined units.
● Over time, tasks have been
  moved from the CPU to the
  GPU.
Timeline
● 1980s
  o   Discrete Transistor-Transistor Logic (TTL) frame
      buffer with graphics processed by CPU
● 1990s
  o   Introduction of GPU pipeline - CPU tasks began to
      be moved to GPU
● 2000s
  o   Introduction of Programmable GPU Pipeline
● 2010s
  o   GPUs become general purpose and are also utilized
      for high-performance parallel computation
Movement of Tasks from CPU to GPU
Introduction to Graphic Processing
CPU Memory Hierarchy
      [Figure: NVIDIA Fermi memory hierarchy]
GPU Memory Hierarchy
    Streaming Multiprocessors (SM) Register Files
● Large, unified register file
   (32,768 × 32-bit registers per SM)
● 16 SMs (128 KB register file per
   SM), 32 cores per SM
   → 2 MB across the chip
● 48 warps (1,536 threads) per SM
   → ~21 registers/thread
● Multi-Banked Memory
● Very high bandwidth (~8,000 GB/s)
● ECC protected
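The sizes above are self-consistent; a minimal host-side sketch of the arithmetic (all constants taken from this slide):

```cuda
#include <cstdio>

int main() {
    const int regs_per_sm    = 32768;   // 32-bit registers per SM
    const int sms            = 16;
    const int threads_per_sm = 1536;    // 48 warps x 32 threads

    printf("Register file per SM: %d KB\n", regs_per_sm * 4 / 1024);        // 128 KB
    printf("Chip-wide total:      %d MB\n", regs_per_sm * 4 * sms >> 20);   // 2 MB
    printf("Registers per thread: %d\n",    regs_per_sm / threads_per_sm);  // ~21
    return 0;
}
```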
GPU Memory Hierarchy (Cont.)
                        Shared/L1 Memory
● Configurable 64KB Memory
● 16 KB shared / 48 KB L1
   OR 48 KB shared / 16 KB L1
● Shared memory is shared by a block's
   threads; L1 is private to each SM
● Shared Memory Multi-Banked
● Very low latency (20-30 cycles)
● High bandwidth (1,000+ GB/s)
● ECC protected
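On the software side, the split is selectable per kernel. A minimal sketch, assuming a shared-memory-heavy kernel (all names are illustrative):

```cuda
#include <cuda_runtime.h>

__global__ void stage(const float *in, float *out, int n) {
    __shared__ float tile[12 * 1024];       // 48 KB of shared memory (12K floats)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];   // stage data in fast on-chip memory
    __syncthreads();
    if (i < n) out[i] = tile[threadIdx.x];
}

int main() {
    // Request the 48 KB shared / 16 KB L1 configuration for this kernel.
    cudaFuncSetCacheConfig(stage, cudaFuncCachePreferShared);
    // ... allocate device buffers and launch stage<<<blocks, 256>>>(...) ...
    return 0;
}
```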
GPU Memory Hierarchy (Cont.)
                    Texture & Constant Cache
● 64 KB read-only constant cache
● 12 KB texture cache
● Texture cache throughput: 739.63 GB/s
● Texture cache hit rate: 94.21%
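A minimal sketch of the read-only constant path (names are illustrative): filter taps live in `__constant__` memory, and because every thread in a warp reads the same `coeffs[k]`, the constant cache can broadcast the value.

```cuda
#include <cuda_runtime.h>

__constant__ float coeffs[8];   // hypothetical FIR filter taps

__global__ void fir(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                   // assumes input padded by 8 floats
    float acc = 0.0f;
    for (int k = 0; k < 8; ++k)
        acc += coeffs[k] * in[i + k];     // broadcast read served by the cache
    out[i] = acc;
}

int main() {
    float taps[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    cudaMemcpyToSymbol(coeffs, taps, sizeof(taps));  // fill the constant bank
    return 0;
}
```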
GPU Memory Hierarchy (Cont.)
                           L2 Cache
● 768KB Unified Cache
● Shared among SMs
● ECC protected
● Fast Atomic Memory Operations
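A minimal sketch of the last point: on Fermi, global-memory atomics are performed at the shared L2, which is why a chip-wide histogram like this stays fast and coherent across SMs.

```cuda
__global__ void histogram(const unsigned char *data, unsigned int *bins, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);   // atomic update, visible chip-wide
}
```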
GPU Memory Hierarchy (Cont.)
                      Main Memory (DRAM)
● Accessed by GPU and CPU
● Six 64-bit DRAM channels
● Up to 6GB GDDR5 Memory
● Higher latency (400-800 cycles)
● Throughput: up to 177 GB/s
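The 177 GB/s peak follows from the bus width and data rate; a sketch of the arithmetic (the ~3.7 GT/s effective GDDR5 rate is an assumption chosen to match the figure above):

```cuda
#include <cstdio>

int main() {
    const double bus_bytes = 6 * 64 / 8.0;  // six 64-bit channels = 48 bytes/transfer
    const double data_rate = 3.696e9;       // effective GDDR5 transfers/s (assumed)
    printf("Peak DRAM bandwidth: %.0f GB/s\n", bus_bytes * data_rate / 1e9);
    return 0;
}
```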
Different GPU Memory Hierarchies
● NVIDIA GeForce GTX 580
● AMD Radeon HD 5870
GPU Memory Architecture NVIDIA - Fermi
● On-board GPU memory →
  high-bandwidth GDDR5, 768 MB
  to 6 GB
● L2 shared cache → 512-768
  KB high bandwidth
● L1 cache → one for each
  streaming multiprocessor
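A minimal sketch of how these sizes can be read back at runtime through the CUDA device-properties API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);                        // device 0
    printf("Global memory: %zu MB\n", p.totalGlobalMem >> 20);
    printf("L2 cache:      %d KB\n",  p.l2CacheSize >> 10);
    printf("Shared/block:  %zu KB\n", p.sharedMemPerBlock >> 10);
    printf("Memory bus:    %d-bit\n", p.memoryBusWidth);
    return 0;
}
```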
GPU Memory Architecture - AMD Ring
● Mid-2000s design, used to
  increase memory bandwidth
● Increasing bandwidth
  requires a wider bus
● Ring bus was an attempt to
  avoid long circuit paths and
  their propagation delays
● Two 512-bit links for true bi-
  directional operation
● Delivered 100 GB/s of
  internal bandwidth
GPU Memory Architecture - AMD Hub
● Ring bus wasted power →
  all nodes got data even if
  they did not need it
● Switched hub approach
  reduces power and latency
  since data is sent point to
  point
● AMD increased the internal
  bus width to 2,048 bits
● Maximum bandwidth was
  192 GB/s
GPU Bandwidth
● High bandwidth to main memory is required to support
  multiple cores
● GPUs have relatively small caches
● GPU memory systems are designed for data throughput, with wide
  memory buses
● Much higher bandwidth than typical CPUs, typically 6 to 8 times
GPU Bandwidth (Cont.)
● Bandwidth Use Techniques
  o   Avoid fetching data whenever possible
       Share/reuse data
       Make use of compression
       Perform math calculations instead of fetching
        data when possible → math calculations are not
        limited by memory bandwidth (see the sketch below)
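A sketch of the last technique, assuming a signal-windowing kernel (names illustrative): the weight is recomputed in registers instead of being fetched from a precomputed table in DRAM.

```cuda
__global__ void applyHannWindow(float *signal, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Recompute the window weight with ALU instructions rather than
        // reading a lookup table: the math is not limited by memory bandwidth.
        float w = 0.5f * (1.0f - __cosf(2.0f * 3.14159265f * i / (n - 1)));
        signal[i] *= w;
    }
}
```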
GPU vs. CPU Bandwidth Growth
GPU Latency
● Large register files
● Dedicated shared memory (configurable)
● Multi-banked memory (see the transpose sketch below)
● Reuse data in dedicated memories
● Focus on parallelism
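A sketch of what multi-banked shared memory means in practice, using the classic tile-transpose trick: padding the tile by one column keeps the threads of a warp on different banks.

```cuda
// Assumes a square matrix, width a multiple of 32, and 32x32 thread blocks.
__global__ void transpose(const float *in, float *out, int width) {
    __shared__ float tile[32][33];     // 33, not 32: padding avoids bank conflicts
    int x = blockIdx.x * 32 + threadIdx.x;
    int y = blockIdx.y * 32 + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];    // coalesced read
    __syncthreads();
    x = blockIdx.y * 32 + threadIdx.x;                     // swap block indices
    y = blockIdx.x * 32 + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];   // coalesced write
}
```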
GPU Latency (Cont.)
                           Latency Hiding
● 1,536 threads per SM (48 warps)
● 32 threads per warp (SIMT)
● ~1,000-cycle memory access stalls
● Switch to another warp to hide
   latency
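A back-of-the-envelope sketch of why that works (the one-warp-instruction-per-cycle issue rate is a simplifying assumption):

```cuda
#include <cstdio>

int main() {
    const int stall_cycles = 1000;  // memory access stall from the slide
    const int warps_per_sm = 48;    // 1,536 resident threads / 32 per warp
    // With enough independent warps, each warp only needs a handful of
    // instructions between loads for the scheduler to cover a full stall.
    printf("Instructions per warp between loads: ~%d\n",
           stall_cycles / warps_per_sm);   // ~20
    return 0;
}
```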
 Future of GPU Memory
 ● New manufacturing process → High Bandwidth Memory
 ● Stacking DRAM dies on top of each other thus allowing
   for close proximity between DRAM and processor
● Allows for very
  high bandwidth
  memory bus
● Due to stacking
  will be harder to
  cool
Q&A
      Thank you!