
Parameters To Compare GPUs

The document outlines parameters for comparing GPUs, emphasizing the importance of identifying use-case requirements, performance benchmarks, and GPU specifications. It categorizes GPUs based on capabilities such as memory bandwidth and specialized cores, and provides a performance comparison of various GPUs. Additionally, it highlights popular benchmarks for HPC/AI and video rendering performance to aid in selecting the ideal GPU for specific tasks.

Uploaded by

mehtadev557

Parameters to compare GPUs

Introduction
Steps to Identify the Ideal GPU for a Use Case
Categorization of GPU
Performance Comparison of Current Inventory
Some Popular GPU Benchmarks
HPC/AI Performance Benchmarks
Video Rendering Performance Benchmarks
Analytical Processing Performance Benchmarks

Introduction
To determine the best GPU for a specific use case, several considerations should be made, such as the dynamics of the
problem, the optimum or ideal performance for a solution, the cost and power budget, and driver support for the GPU. The software ecosystem
around the GPU hardware should also be considered when making such decisions.
Steps to Identify the Ideal GPU for a Use Case
Identify the primary use case and its compute, memory and network requirements.
Compare the key specifications of the GPUs to determine a hierarchy.
Add prior use-case-specific benchmarks.
Add the price-to-performance ratio to the GPU hierarchy.
Check compatibility of the GPU with existing code base dependencies.
Check software and ecosystem support.
Consider future-proofing.
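
The filtering and price-to-performance steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `Gpu` fields, the function name, and every entry in `candidates` are hypothetical placeholders, and the benchmark scores are invented numbers that stand in for real use-case benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    memory_gb: int
    benchmark_score: float  # result of a use-case benchmark, higher is better
    price_usd: float


def rank_by_price_performance(gpus, min_memory_gb):
    """Drop GPUs that miss the memory requirement, then sort by score per dollar."""
    eligible = [g for g in gpus if g.memory_gb >= min_memory_gb]
    return sorted(eligible, key=lambda g: g.benchmark_score / g.price_usd, reverse=True)


# Hypothetical placeholder entries, not measured data.
candidates = [
    Gpu("gpu-a", 16, 100.0, 1400.0),
    Gpu("gpu-b", 24, 300.0, 2500.0),
    Gpu("gpu-c", 48, 600.0, 6000.0),
    Gpu("gpu-d", 80, 900.0, 15000.0),
]

ranking = rank_by_price_performance(candidates, min_memory_gb=24)
for g in ranking:
    print(f"{g.name}: {g.benchmark_score / g.price_usd:.3f} benchmark points per dollar")
```

With these placeholder values, gpu-a is filtered out by the 24 GB requirement and the remaining cards are ordered gpu-b, gpu-c, gpu-d. Compatibility and ecosystem checks are not modelled here and would still need to be applied to the resulting hierarchy.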
Categorization of GPU
The GPUs should be clearly categorized according to their capabilities in terms of:

Number of parallel compute components, such as CUDA cores (NVIDIA) or Stream Processors (AMD).
Memory bandwidth; higher bandwidth can make a crucial difference in inference servers.
Available dedicated memory and memory type (e.g., GDDR6, HBM2).

Availability and count of specialized cores/accelerators, such as RT Cores and Tensor Cores in NVIDIA GPUs and Ray Accelerators in AMD GPUs.
Use-case benchmarks like MLPerf/DLPerf, or a custom performance benchmark.
Form factor.
Software and features: support for features like NVIDIA's DLSS (Deep Learning Super Sampling) or AMD's FidelityFX Super Resolution, as well as driver stability and the software ecosystem.
Support for APIs like DirectX, Vulkan and OpenGL.
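
The relationship between bus width, memory bandwidth and inference speed mentioned in the list above can be sketched with simple arithmetic. The per-pin data rates (14 and 21 Gbps) and the 14 GB model size are assumptions chosen for illustration; the 384-bit bus width and the 672 and 1008 GB/s results correspond to figures that appear in the comparison table later in this document. The token-rate function is only a rough upper bound, assuming every generated token requires reading all model weights once from GPU memory.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def max_decode_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on memory-bound token generation (assumes one full weight read per token)."""
    return bandwidth_gb_s / model_size_gb


print(peak_bandwidth_gb_s(384, 14.0))        # 672.0
print(peak_bandwidth_gb_s(384, 21.0))        # 1008.0
print(max_decode_tokens_per_s(672.0, 14.0))  # 48.0
```

Real throughput is usually lower than this ceiling, since batching, caching and compute limits are ignored.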

AMD Radeon RX 6800 XT Review | TechSpot
A comparison of this kind, when drawn for HPC/AI or rendering workloads, could give the user an intuitive insight.

Performance Comparison of Current Inventory

Features | GPUs: RTX 8000; A2; A10; A30; A40; A100; L4; L40s; H100 PCIe; H100 NVL; H100 SXM; RTX A6000; RTX 6000 Ada; RTX 4090

Target Audience: Edge / Entry Level; AI Inference and Analytics; AI/HPC; Edge / Entry Level for AI and Data analysis; Generative AI/LLMs/Rendering; Rendering

Price ($): 4000; 1400; 15000; 50000; 1500; 6000; 3500; 2500; 1500

Memory: 48 GB GDDR6; 16GB GDDR6; 24GB GDDR6; 24GB HBM2; 48GB; 40/80 GB HBM2; 24GB GDDR6; 48GB GDDR6 with ECC; 80GB HBM2e; 94GB HBM2e; 80GB HBM2e; 48GB GDDR6; 48GB GDDR6; 24GB GDDR6

Memory Bus Width: 384-bit; 128-bit; 384-bit; 3072-bit; 384-bit; 5120-bit; 192-bit; 384-bit; 5120-bit; 384-bit; 384-bit; 384-bit

Memory Bandwidth (GB/s): 672; 200; 600; 933; 696; 1935 | 2039; 300; 864; 2000; 3900; 3350; 768.0; 768.0; 1008

Memory Clock (MHz): 6251; 6251; 1215; 7251; 1512; 6251; 9001; 1593

GPU: GA102-890; GA102-895; GA100-893FF, GA100-893FFF, GA100-893HH, GA100-893HHH; GH100-200

Clock Speed (Base-Boosted MHz): ?; 1440-1770; 885-1695; 930-1440; 1305-1740; 1065-1410; 795-2040; 1065-2520; 1125-1755; 1410 MHz; 2175 MHz; 2235 MHz

CUDA Cores: 4608; 1280; 3rd gen; 6912; 7424; 18,176; 10752; 18,176; 16,384

Tensor Cores: 576; 40 | Gen 3; 224; 432; 240; 568; 336; 568; 512

RT Cores: 72; 10; 72; ?; 60; 142; 84; 142; 128

Tensor Performance: 119.4 TFLOPS; ?; ?; 312 TFLOPS; ?; 1,466 TFLOPS; 309.7 TFLOPS; 1457.0 TFLOPS; ?

Double Precision (FP64 | FP64 Tensor core) Perf: 5.2 | 10.3; 9.7 | 19.5; 30 | 60; 34 | 67

FP32 Perf (TFLOPS): 14.9; 4.5; 10.3; 19.5; 30.3; 91.6; 60; 67; 38.7; 91.1; 82.58

TF32 Perf: 9 | 18*; 82 TF | 165 TF*; 156 | 312*; 120; 183 | 366*; 989; 835

FP16 (Tensor core) Perf: 18 | 36*; 165 | 330*; 312 | 624*; 242; 362.05 | 733*; 1671; 1979

FP8 Perf: -; 485; 733 | 1,466*; 3341; 3958

INT8: 36 | 144*; 330 | 661*; 624 | 1248*; 485; 733 | 1,466*; 3341; 3958

INT4: 72 | 144*; 661 | 1321*; 733 | 1,466*

RT Core Performance: ?; ?; ?; ?; ?; 212 TFLOPS; 75.6 TFLOPS; 210.6 TFLOPS; ?

Encoders: 1; 1; NA; NA; 2; 3; 0; 0; 0; 1; 3; 2

Decoders: 1; 2; 4; 5; 4; 3; 7; 7; 7; 2; 3; 1

Speciality: 1st Ray tracing GPU; ?; 1 OFA, 1 NVJPEG; ?; 4 JPEG DEC; Transformer Engine; 7 JPEG DEC; 7 JPEG DEC; 7 JPEG DEC; ?; ?; ?

Architecture: Turing; Ampere; Ampere; Ampere; Ampere; Ada Lovelace; Ada Lovelace; Hopper; Hopper; Hopper; Ampere; Ada Lovelace; Ada Lovelace

Series: Quadro; Tesla; Tesla; Tesla; Tesla; Tesla; Tesla; Tesla; Tesla; Tesla; Quadro; Quadro; GeForce

Cooling: Active; Passive; Passive; Passive; Passive; ?; Passive; Passive; Passive; Passive; ?; ?; ?

Power Consumption: 250W [260-295 W]; 40-60W; 150W; 165W; 250W (40GB), 150W-300W (80GB), 400W (SXM); 72; 350; 300-350W; 350-400W; 700W; 300W; 300 W; 450W

Ray Tracing: DECENT; LOW; LOW; GOOD; MEDIUM; GOOD; MEDIUM; GOOD; GOOD

AI Capabilities: LOW; LOW; MEDIUM; GOOD; MEDIUM; GOOD; MEDIUM; MEDIUM; MEDIUM

ML training: LOW; LOW; MEDIUM; GOOD; MEDIUM; GOOD; MEDIUM; MEDIUM; MEDIUM

Rendering Capabilities: LOW; LOW; LOW; GOOD; MEDIUM; GOOD; MEDIUM; MEDIUM; GOOD

Multi Instance Support: NA; NA; NA; 4 MIGs @ 6GB, 2 MIGs @ 12GB, 1 MIGs @ 24GB; Up to 7 GPU instances; NA; NA; NA; NA; NA

NVLink/NVSwitch: Connects 2 at 100 GB/s bi-directional; NA; NA; 1x 3rd Gen NVLink 200GB/s; 1x 3rd Gen NVLink 112.5 GB/s; 3x 3rd Gen NVLink 600GB/s; NA; NA; Connects 2 at 112.5 GB/s (bidirectional); NA; NA

Four part ID (VID:DEVID:SVID:SSID): 10DE:1E78:10DE:13D8; 10DE:25B6:10DE:157E; 10DE:2236:10DE:1482; 10DE:20B7:10DE:1532; 10DE:2235:10DE:145A; 80GB - 10DE:20B5:10DE:1533, 40GB - 10DE:20F1:10DE:145F; 10DE:27B8:10DE:16CA; 10DE:26B9:10DE:1851; 10DE:2230:10DE:1459

Form Factor: 4.4” H x 10.5” L, FHFL, DW; HHHL, SW, (LP) PCIe; FHFL, SW; FHFL, DW; FHFL, DW; 4/8 SXM GPUs in NVIDIA HGX™ A100, PCIe; 1-slot low-profile, PCIe (169mm x 69mm); FHFL, DW; FHFL, DW; FHFL, DW; ?

Interf PCIe PCIe PCIe PCIe PCIe PCIe PCIe PCIe PCIe PCIe
ace 3.0x1 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 4.0
6 x8, x4; x8, x16 x16 x16 x16, x16; x16, x16
3.0 x16; x8; 64GB/ x8;
x8 3.0 x 3.0 s Bi
4.0
16 x16
x16
Power Connector (auxiliary power connector): 8-pin; -; 8-pin; 8-pin; 8-pin; 8-pin; -; 16-pin; 16-pin; 8-pin

https://cloudspacetechnologies-my.sharepoint.com/:x:/g/personal/siddharth_mishra_myrealdata_in/Eff_
b2M2R1pJuy42n1WO2uMB-0rqyo2iBQXOdv55_CxWEg?e=NzojcI


NVIDIA Data Center Platform | Line Card

Precision format support in NVIDIA GPU Architectures

Some Popular GPU Benchmarks


HPC/AI Performance Benchmarks
1. MLPerf: A broad and widely recognized benchmark suite for machine learning performance. MLPerf covers a range of AI tasks
including training and inference for different types of neural networks across various hardware platforms.
2. HPL (High Performance Linpack): Traditionally used to rank supercomputers in the TOP500 list, HPL measures a system's floating-
point computing power by solving a dense system of linear equations, which is relevant for both HPC and certain AI workloads.
3. HPCG (High Performance Conjugate Gradients): Complements HPL by testing computational and data access patterns that are
more characteristic of real-world HPC applications than HPL's dense linear algebra focus.

4. GuideLLM: A tool for evaluating and optimizing the deployment of large language models (LLMs). By simulating real-
world inference workloads, GuideLLM helps users gauge the performance, resource needs, and cost implications of deploying LLMs on
various hardware configurations (for vLLM inference scenarios).
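
As a small worked illustration of how an HPL result (item 2 above) is turned into a floating-point rate, the sketch below uses the operation count commonly associated with HPL for a dense n x n solve, 2/3 n^3 + 3/2 n^2. The function names, the problem size, the run time and the peak figure are hypothetical values chosen for the example, not measurements.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations assumed for solving a dense n x n linear system."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Achieved rate in GFLOPS for a run of size n that took `seconds`."""
    return hpl_flop_count(n) / seconds / 1e9


def efficiency(achieved_gflops: float, peak_gflops: float) -> float:
    """Fraction of the theoretical FP64 peak that the run actually reached."""
    return achieved_gflops / peak_gflops


# Hypothetical run: n = 30000 solved in 10 seconds on a device with a 9700 GFLOPS FP64 peak.
achieved = hpl_gflops(30000, 10.0)
print(f"{achieved:.3f} GFLOPS, {efficiency(achieved, 9700.0):.1%} of peak")
```

Comparing the achieved rate against the FP64 peak from a vendor datasheet gives a quick sense of how much of the hardware a given setup is actually using.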
Video Rendering Performance Benchmarks
1. OctaneRender: Uses GPU rendering to measure how well a GPU can handle photorealistic rendering using the OctaneRender engine.
This is particularly relevant for professionals in visual effects and animation.
2. Blender Benchmark: Open-source 3D rendering software that offers a benchmarking tool for measuring the performance of GPUs
(and CPUs) in rendering tasks. It's widely used due to Blender's popularity in the 3D modeling and animation industry.
3. Redshift Benchmark: Designed for the Redshift rendering engine, this benchmark measures the performance of GPUs in rendering
scenes that are representative of motion pictures and visual effects workloads.
4. V-Ray Benchmark: V-Ray Benchmark is a free tool that measures how fast your system renders. Rendering performance evaluation
can be done using CPUs, NVIDIA GPUs, or a combination of both. Chaos® V-Ray® is a 3D rendering plugin available for all major 3D
design and CAD programs. It works seamlessly with 3ds Max, Cinema 4D, Houdini, Maya, Nuke, Revit, Rhino, SketchUp, and Unreal.
Analytical Processing Performance Benchmarks
1. SPECviewperf: Widely used to evaluate the performance of GPUs in professional visualization applications, including energy, medical,
and financial analysis tasks. It measures the graphics performance of systems in professional applications.
2. Geekbench: Provides both compute and GPU benchmarks that measure the performance of GPUs in various computational tasks,
including those relevant to analytical processing.
3. SiSoftware Sandra: Offers a suite of benchmarks that can test various aspects of GPU performance, including processing capability,
memory bandwidth, and latency, relevant for analytical processing tasks.
