This document provides an overview of GPGPU-Sim, a simulator for GPU microarchitecture. It describes the functional PTX and SASS models, as well as the timing model that simulates a GPU running CUDA kernels. It outlines two demos - the first covers setup and configuration, while the second analyzes scheduling policies by monitoring warp scheduling order. The document also details the GPU architecture modeled, including pipeline stages, the memory system, and branch divergence handling.
GPGPU-Sim Tutorial

Zhen Lin
North Carolina State University
Based on GPGPU-Sim Tutorial and Manual by UBC

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

GPGPU-Sim in a Nutshell
Microarchitecture timing model of contemporary GPUs
Run unmodified CUDA/OpenCL

What GPGPU-Sim Simulates

Functional model
PTX
SASS

Timing model for the compute part of a GPU

Not for CPU or PCIe
Only model microarchitecture timing relevant to compute

Functional model
PTX
A low-level, data-parallel virtual machine and instruction set architecture (ISA)
Between CUDA and hardware ISA (SASS)
Stable ISA that spans multiple GPU generations

SASS/PTXPLUS
Hardware native ISA
PTX -> Translate + Optimize -> SASS
More accurate, but not well supported

CUDA tool chain

Functional Model (PTX)

Scalar ISA
SSA representation: register allocation not done in PTX

Timing Model for GPU Micro-Architecture

GPGPU-Sim simulates the timing model of a GPU running each launched CUDA kernel
Reports stats (e.g. # cycles) for each kernel
Excludes any time spent on data transfer over the PCIe bus
CPU is assumed to be idle when the GPU is working

Compilation Path

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Demo1
Setup
Stats
Configuration

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Overview of the Architecture

Inside a SIMT Core

Pipeline stages

Fetch
Decode
Issue
Read operand
Execution
Writeback

Fetch + Decode
Arbitrate the I-cache among warps
Cache miss handled by fetching again later

Fetched instruction is decoded and then stored in the I-Buffer
1 or more entries / warp
Only warps with vacant entries are considered in fetch
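The fetch rule above (only warps with a vacant I-Buffer entry compete for the I-cache) can be sketched as follows. This is a minimal illustration, not GPGPU-Sim code: the `Warp` class, the two-entry buffer size, and the round-robin arbitration order are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Warp:
    wid: int
    ibuffer_capacity: int = 2              # hypothetical: entries per warp
    ibuffer: list = field(default_factory=list)

    def has_vacant_entry(self) -> bool:
        return len(self.ibuffer) < self.ibuffer_capacity


def select_warp_for_fetch(warps, last_fetched):
    """Round-robin over warps (assumed policy), skipping full I-Buffers."""
    n = len(warps)
    for offset in range(1, n + 1):
        candidate = warps[(last_fetched + offset) % n]
        if candidate.has_vacant_entry():
            return candidate.wid
    return None  # no warp can accept a new instruction this cycle


warps = [Warp(0), Warp(1), Warp(2)]
warps[1].ibuffer = ["add", "mul"]          # warp 1 has no vacant entry
print(select_warp_for_fetch(warps, last_fetched=0))  # 2
```

Warp 1 is next in rotation but is skipped because its I-Buffer is full, so warp 2 gets the fetch slot.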

Issue
Selects a warp with a ready instruction
Acquires the active mask from the TOS (top of stack) of the SIMT stack
Invalidates the I-Buffer entry

Scoreboard
Checks for RAW and WAW dependency hazards
Flags instructions with hazards as not ready in the I-Buffer (masking them out from the scheduler)

Instructions reserve destination registers at issue

Registers are released at writeback
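A small sketch of the scoreboard behaviour described above: destination registers are reserved at issue, released at writeback, and any instruction that reads or writes a reserved register is flagged as having a hazard. The class and method names are hypothetical and chosen only for this example.

```python
class Scoreboard:
    def __init__(self):
        # warp id -> set of registers with a pending write
        self.reserved = {}

    def has_hazard(self, wid, dst, srcs):
        pending = self.reserved.get(wid, set())
        raw = any(r in pending for r in srcs)  # reads a pending write
        waw = dst in pending                   # overwrites a pending write
        return raw or waw

    def reserve(self, wid, dst):
        """Called at issue."""
        self.reserved.setdefault(wid, set()).add(dst)

    def release(self, wid, dst):
        """Called at writeback."""
        self.reserved.get(wid, set()).discard(dst)


sb = Scoreboard()
sb.reserve(0, "R3")                            # e.g. add.s32 R3, R1, R2 issued
print(sb.has_hazard(0, "R5", ["R3", "R4"]))    # True  (RAW on R3)
print(sb.has_hazard(0, "R3", ["R0", "R4"]))    # True  (WAW on R3)
print(sb.has_hazard(0, "R6", ["R1", "R2"]))    # False
sb.release(0, "R3")                            # writeback completes
print(sb.has_hazard(0, "R5", ["R3", "R4"]))    # False
```

Tracking is per warp, so a pending write in one warp does not block an instruction from another warp that uses the same register name.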

December 2012

GPGPU-Sim Tutorial (MICRO 2012) 4: Microarchitecture Model


Read Operand
[Figure: register file split into Bank 0, Bank 1, Bank 2, and Bank 3, holding registers up to R10, R11]

add.s32 R3, R1, R2;    No conflict
mul.s32 R3, R0, R4;    Conflict at bank 0

Operand Collector Architecture (US Patent: 7834881)

Interleave operand fetch from different threads to achieve full utilization
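The two instructions above can be checked with a short sketch. It assumes that register Rn is stored in bank n mod 4, which is consistent with the slide's two cases but is an assumption here, not a statement of how GPGPU-Sim maps registers. The function names are hypothetical.

```python
from collections import Counter

NUM_BANKS = 4  # four banks, as in the slide


def bank_of(reg: str) -> int:
    # Assumption: register Rn lives in bank (n mod NUM_BANKS)
    return int(reg[1:]) % NUM_BANKS


def conflicting_banks(src_regs):
    """Return the banks that more than one source operand must read from."""
    counts = Counter(bank_of(r) for r in src_regs)
    return sorted(bank for bank, uses in counts.items() if uses > 1)


print(conflicting_banks(["R1", "R2"]))  # add.s32 R3, R1, R2 -> []
print(conflicting_banks(["R0", "R4"]))  # mul.s32 R3, R0, R4 -> [0]
```

Under this mapping R0 and R4 both fall in bank 0, so the `mul` needs two reads from one bank, while the `add` reads banks 1 and 2 in parallel.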


Operand Collector
[Figure: operand collector, with input from the instruction issue stage and output to dispatch]


Execution
ALU
Streaming processor (SP)
Special function unit (SFU)

MEM

Shared memory
Local memory
Global memory
Texture memory
Constant memory

ALU Pipelines
SIMD Execution Unit
Fully Pipelined
Each pipe may execute a subset of instructions
Configurable bandwidth and latency (depending on the instruction)
Default: SP + SFU pipes


Memory Unit

Model timing for memory instructions
Support half-warp (16 threads)
Double clock the unit
Each cycle services half the warp
Has a private writeback path

[Figure: AGU, bank conflict, shared memory, access coalescing, MSHR, data cache, constant cache, texture cache, memory port]
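To illustrate half-warp handling and access coalescing, the sketch below splits a 32-thread warp into two groups of 16 and merges each group's addresses into aligned memory segments. The 128-byte segment size and the function names are assumptions made for the example; the real coalescing rules depend on the simulated configuration.

```python
HALF_WARP = 16
SEGMENT_BYTES = 128  # assumption: size of one coalesced memory segment


def coalesce(addresses, segment_bytes=SEGMENT_BYTES):
    """Return the sorted base addresses of the segments touched."""
    return sorted({(a // segment_bytes) * segment_bytes for a in addresses})


def memory_requests(warp_addresses):
    """Coalesce each half-warp (16 threads) separately."""
    halves = [warp_addresses[i:i + HALF_WARP]
              for i in range(0, len(warp_addresses), HALF_WARP)]
    return [coalesce(half) for half in halves]


unit_stride = [4 * t for t in range(32)]   # consecutive 4-byte words
strided = [128 * t for t in range(32)]     # one segment per thread

print(memory_requests(unit_stride))               # [[0], [0]]
print([len(r) for r in memory_requests(strided)])  # [16, 16]
```

With unit stride each half-warp collapses to a single segment access, while the 128-byte stride produces 16 separate accesses per half-warp.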

Writeback
Write result to register file
Scoreboard updates the r-bit

Stack-Based Branch Divergence Hardware

When a branch diverges

New entries are pushed to the SIMT stack

RPC (reconvergence PC) is set to the immediate post-dominator
Active mask indicates which threads are active
PC is sent to the fetch unit

When the RPC is reached

Pop the TOS
PC of the new TOS is sent to the fetch unit
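The push and pop behaviour above can be sketched with a small stack model. It is a simplified illustration under several assumptions: masks are plain integers with one bit per thread, the entry below the new ones is updated to resume at the reconvergence point, and nested reconvergence at the same PC is not handled. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    pc: int
    rpc: Optional[int]   # reconvergence PC; None for the base entry
    mask: int            # active mask, one bit per thread


class SIMTStack:
    def __init__(self, start_pc, mask):
        self.entries = [Entry(start_pc, None, mask)]

    @property
    def top(self):
        return self.entries[-1]

    def diverge(self, taken_pc, fallthrough_pc, taken_mask, rpc):
        current = self.top
        not_taken_mask = current.mask & ~taken_mask
        current.pc = rpc  # this entry resumes at the reconvergence point
        self.entries.append(Entry(fallthrough_pc, rpc, not_taken_mask))
        self.entries.append(Entry(taken_pc, rpc, taken_mask))

    def advance(self, next_pc):
        """Move the TOS to next_pc; pop it if the RPC is reached."""
        self.top.pc = next_pc
        if self.top.rpc is not None and self.top.pc == self.top.rpc:
            self.entries.pop()
        return self.top.pc, self.top.mask  # sent to the fetch unit


stack = SIMTStack(start_pc=0, mask=0b1111)
stack.diverge(taken_pc=10, fallthrough_pc=4, taken_mask=0b0011, rpc=20)
print(stack.top.pc, bin(stack.top.mask))   # 10 0b11
print(stack.advance(20))                   # (4, 12)  -> not-taken path runs next
print(stack.advance(20))                   # (20, 15) -> all threads reconverged
```

After both paths reach PC 20 the two pushed entries are gone and the base entry continues with the full mask 0b1111.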

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Demo2
Software framework overview
Monitor the warp scheduling order
Compare different scheduling policies
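As a rough idea of what comparing scheduling orders looks like, the sketch below prints the warp issue order produced by two policies, loose round-robin and greedy-then-oldest. These two are used purely as illustrations; the slides do not say which policies the demo covers. The model is deliberately simplified: every warp always has a ready instruction (no stalls), and a lower warp id is assumed to mean an older warp.

```python
def loose_round_robin(remaining):
    """Issue order when the scheduler rotates through the warps."""
    remaining = list(remaining)
    order, wid = [], 0
    while any(remaining):
        if remaining[wid] > 0:
            remaining[wid] -= 1
            order.append(wid)
        wid = (wid + 1) % len(remaining)
    return order


def greedy_then_oldest(remaining):
    """Keep issuing from one warp until it runs out, then pick the oldest."""
    remaining = list(remaining)
    order, current = [], None
    while any(remaining):
        if current is None or remaining[current] == 0:
            # Assumption: lowest warp id with work left is the oldest
            current = next(w for w, r in enumerate(remaining) if r > 0)
        remaining[current] -= 1
        order.append(current)
    return order


instructions_per_warp = [2, 2, 2]
print(loose_round_robin(instructions_per_warp))   # [0, 1, 2, 0, 1, 2]
print(greedy_then_oldest(instructions_per_warp))  # [0, 0, 1, 1, 2, 2]
```

In a real study the order would come from the simulator's own trace rather than from a toy model like this one.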

For More Information

http://www.gpgpu-sim.org/

Thanks! Questions?
