
DATASHEET

NVIDIA A2 TENSOR CORE GPU


Entry-level GPU that brings NVIDIA AI to any server.

Versatile Entry-Level Inference

The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40-60 watt (W) configurable thermal design power (TDP) capability, the A2 brings adaptable inference acceleration to any server.

A2's versatility, compact size, and low power exceed the demands for edge deployments at scale, instantly upgrading existing entry-level CPU servers to handle inference. Servers accelerated with A2 GPUs deliver higher inference performance than CPUs and more efficient intelligent video analytics (IVA) deployments than previous GPU generations, all at an entry-level price point.

NVIDIA-Certified Systems™ featuring A2 GPUs and NVIDIA AI, including the NVIDIA Triton™ Inference Server, deliver breakthrough inference performance across edge, data center, and cloud. They ensure that AI-enabled applications deploy with fewer servers and less power, resulting in easier deployments, faster insights, and significantly lower costs.

Up to 20X More Inference Performance

AI inference is deployed to make consumers' lives more convenient through real-time experiences, and to deliver insights from trillions of endpoint sensors and cameras. Compared to CPU-only servers, servers built with the NVIDIA A2 Tensor Core GPU offer up to 20X more inference performance, instantly upgrading any server to handle modern AI.

SYSTEM SPECIFICATIONS

Peak FP32: 4.5 TF
TF32 Tensor Core: 9 TF | 18 TF¹
BFLOAT16 Tensor Core: 18 TF | 36 TF¹
Peak FP16 Tensor Core: 18 TF | 36 TF¹
Peak INT8 Tensor Core: 36 TOPS | 72 TOPS¹
Peak INT4 Tensor Core: 72 TOPS | 144 TOPS¹
RT Cores: 10
Media engines: 1 video encoder, 2 video decoders (includes AV1 decode)
GPU memory: 16GB GDDR6
GPU memory bandwidth: 200GB/s
Interconnect: PCIe Gen4 x8
Form factor: 1-slot, low-profile PCIe
Max thermal design power (TDP): 40-60W (configurable)
vGPU software support²: NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS)

¹ With sparsity
² Supported in a future vGPU release
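As a quick sanity check of the specification table above, the peak rates scale consistently across precisions: structured sparsity doubles each listed Tensor Core rate, and moving from FP32 to INT8 with sparsity raises the peak rate by 16X (72 TOPS vs. 4.5 TFLOPS; note these are integer ops vs. floating-point ops, so this is a peak-rate ratio, not a measured speedup). A short calculation over the table's numbers:

```python
# Peak rates from the A2 specification table as (dense, sparse) pairs,
# in TFLOPS for floating-point formats and TOPS for integer formats.
peak = {
    "fp32": (4.5, None),  # FP32 has no sparse rate listed
    "tf32": (9.0, 18.0),
    "bf16": (18.0, 36.0),
    "fp16": (18.0, 36.0),
    "int8": (36.0, 72.0),
    "int4": (72.0, 144.0),
}

# Structured sparsity doubles every listed Tensor Core rate.
for fmt, (dense, sparse) in peak.items():
    if sparse is not None:
        assert sparse == 2 * dense, fmt

# Peak-rate ratio of INT8 with sparsity over FP32.
print(peak["int8"][1] / peak["fp32"][0])  # -> 16.0
```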

[Chart: Inference speedup of one NVIDIA A2 Tensor Core GPU versus a dual-socket Xeon Gold 6330N CPU. Computer Vision (EfficientDet-D0): up to 8X | Natural Language Processing (BERT-Large): up to 7X | Text-to-Speech (Tacotron2 + Waveglow): up to 20X]

System configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N @2.2GHz, 512GB DDR4 | Computer Vision: EfficientDet-D0 (COCO, 512x512), TensorRT 8.2, Precision: INT8, BS:8 (GPU); OpenVINO 2021.4, Precision: INT8, BS:8 (CPU) | NLP: BERT-Large (sequence length: 384, SQuAD v1.1), TensorRT 8.2, Precision: INT8, BS:1 (GPU); OpenVINO 2021.4, Precision: INT8, BS:1 (CPU) | Text-to-Speech: Tacotron2 + Waveglow end-to-end pipeline (input length: 128), PyTorch 1.9, Precision: FP16, BS:1 (GPU); PyTorch 1.9, Precision: FP32, BS:1 (CPU)

NVIDIA A2 TENSOR CORE GPU | DATASHEET | 1


Higher IVA Performance for Intelligent Edge
Servers equipped with A2 offer up to 1.3X more performance in intelligent edge use cases,
including smart cities, manufacturing, and retail. NVIDIA A2 GPUs running IVA workloads
result in more efficient deployments with up to 1.6X better price-performance and ten
percent better energy efficiency than previous GPU generations.

A2 Improves Performance by Up to 1.3X Versus T4

[Chart: IVA performance (normalized), relative performance in 1080p30 video streams. ShuffleNet v2: NVIDIA T4 1.0X, NVIDIA A2 1.2X | MobileNet v2: NVIDIA T4 1.0X, NVIDIA A2 1.3X]

System configuration: Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 @2.6GHz, 512GB DDR4, 1x NVIDIA A2 or 1x NVIDIA T4 | Measured performance with DeepStream 5.1. Networks: ShuffleNet-v2 (224x224), MobileNet-v2 (224x224) | The pipeline represents end-to-end performance with video capture and decode, pre-processing, batching, inference, and post-processing.

Lower Power and Configurable TDP: A2 Reduces Power Consumption by Up to 40% Versus T4

[Chart: TDP operating range (watts). NVIDIA A2: 40-60W (configurable) | NVIDIA T4: 70W]

NVIDIA A2 Brings Breakthrough NVIDIA Ampere Architecture Innovations

THIRD-GENERATION TENSOR CORES

The third-generation Tensor Cores in A2 support integer math down to INT4 and floating-point math up to FP32 to deliver high AI training and inference performance. The NVIDIA Ampere architecture also supports TF32 and NVIDIA's automatic mixed precision (AMP) capabilities.
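The datasheet doesn't prescribe a framework, but AMP is typically used through a framework's autocast facility. A minimal sketch, assuming PyTorch; it runs on CPU with bfloat16 for illustration, whereas on an A2 one would typically use device_type="cuda" with float16:

```python
import torch
import torch.nn as nn

# Tiny illustrative model; on an A2 this would be moved to "cuda".
model = nn.Linear(64, 8)
x = torch.randn(4, 64)

# Automatic mixed precision: autocast-eligible ops (such as the
# matmul inside nn.Linear) run in the lower precision, while
# precision-sensitive ops stay in FP32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```

For training, the forward pass and loss are computed under autocast while the backward pass uses gradient scaling to avoid underflow in FP16.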

ROOT OF TRUST SECURITY

Providing security in edge deployments and endpoints is critical for enterprise business operations. A2 optionally supports secure boot through trusted code authentication and hardened rollback protections to protect against malware attacks.

SECOND-GENERATION RT CORES

A2 includes dedicated RT Cores for ray tracing that enable groundbreaking technologies at breakthrough speed, with up to 2X the throughput of the previous generation and the ability to run ray tracing concurrently with shading or denoising.

HARDWARE TRANSCODING PERFORMANCE

Exponential growth in video applications demands real-time, scalable performance, requiring the latest hardware encode and decode capabilities. A2 GPUs use dedicated hardware to fully accelerate video decoding and encoding for the most popular codecs, including H.265, H.264, VP9, and AV1 (decode).
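In practice, these dedicated engines are reached through standard media stacks rather than programmed directly. As one illustration, an FFmpeg build compiled with NVIDIA codec support can offload both sides of a transcode; this is a sketch assuming such a build (file names are hypothetical), not a tuned pipeline:

```shell
# Decode on the GPU (NVDEC), keep decoded frames in GPU memory,
# and re-encode with the hardware H.264 encoder (NVENC).
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
       -i input.mp4 \
       -c:v h264_nvenc -b:v 5M \
       output.mp4
```

Keeping frames in GPU memory between decode and encode avoids round-trips over PCIe, which matters most for multi-stream IVA pipelines.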



Complete Inference Portfolio
NVIDIA offers a complete portfolio of NVIDIA-Certified Systems featuring Ampere Tensor Core GPUs as the inference engine powering NVIDIA AI. A2 Tensor Core GPUs add entry-level inference in a low-profile form factor to the NVIDIA AI portfolio, which already includes the A100 and A30 Tensor Core GPUs. A100 delivers the highest inference performance at every scale, and A30 brings optimal inference performance for mainstream servers. Together, the NVIDIA A2, NVIDIA A30, and NVIDIA A100 Tensor Core GPUs deliver leading inference performance across edge, data center, and cloud.

NVIDIA-CERTIFIED SYSTEMS
NVIDIA-Certified Systems with NVIDIA A2 combine compute acceleration and high-speed, secure networking in systems from leading NVIDIA partners, in configurations validated for optimum performance, reliability, and scale. With NVIDIA-Certified Systems, enterprises can confidently choose performance-optimized hardware solutions to power accelerated computing workloads, from the desktop to the data center to the edge.

Optimized Software and Services for Enterprise


NVIDIA AI Enterprise
NVIDIA AI Enterprise, an end-to-end cloud-native suite of AI and data analytics
software, is certified to run on A2 in hypervisor-based virtual infrastructure with
VMware vSphere. This enables management and scaling of AI and inference
workloads in a hybrid cloud environment.

Learn more

To learn more about the NVIDIA A2 Tensor Core GPU, visit nvidia.com/a2.

© 2022 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Triton, NVIDIA-Certified Systems, and NGC are trademarks and/or registered
trademarks of NVIDIA Corporation in the U.S. and other countries. All other trademarks and copyrights are the property of their respective owners. MAR22
