Lecture HWA

Hardware accelerators are specialized processors designed to perform specific computations more efficiently than general-purpose processors, with examples including GPUs, FPGAs, and ASICs. They offload tasks from CPUs to optimize performance and reduce power consumption, particularly excelling in parallel processing. The choice between FPGAs and ASICs depends on application needs, with FPGAs offering flexibility and faster time-to-market, while ASICs provide higher performance and lower power consumption.

Uploaded by

Athmajan Vu

Hardware Accelerators

Zaheer Khan
• A hardware accelerator is a specialized processor that is designed to accelerate a specific type of computation.
• They are optimized for a particular task or set of tasks, which allows them to perform those tasks faster and
more efficiently than general-purpose processors.
• Some examples of hardware accelerators include:
• graphics processing units (GPUs),
• field-programmable gate arrays (FPGAs),
• application-specific integrated circuits (ASICs).
How do Hardware Accelerators Work?
• Hardware accelerators work by offloading specific types of computations from the general-purpose processor
to a specialized processor.
• This allows the specialized processor to perform the computation more efficiently and with less power
consumption than a general-purpose processor.
• Example: GPUs are optimized for the parallel processing of large amounts of data, which makes them well-
suited for graphics rendering and machine learning.

Flexibility vs Efficiency Trade-off

• The purpose-built architecture of GPUs allows certain calculations to be offloaded from the CPU.
• These calculations follow the “single instruction, multiple data” (SIMD) pattern: the same operation is applied to many data elements at once.
• GPUs are therefore well suited to simple operations on large inputs, whereas CPUs excel at complex operations on small input streams.

• Parallelism
• Task Level
• Instruction Level
• Data Level

• Differences with Vector Processors

❑ CPUs and GPUs have fundamentally different design philosophies
➢ CPUs: low latency, low throughput; high clock frequency, large caches, sophisticated control, powerful ALUs
➢ GPUs: high latency, high throughput; moderate clock frequency, small caches, simple control, (many) energy-efficient ALUs
➢ GPUs require a massive number of threads to tolerate latencies
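The thread-count requirement can be estimated with Little's law (operations in flight = latency × issue rate). The sketch below is only an illustration: the function name and the 400-cycle latency are assumptions, not figures from the lecture.

```python
import math

def threads_to_hide_latency(latency_cycles, issue_interval_cycles):
    """Little's law: operations in flight = latency x issue rate.

    Assumes each thread has one outstanding memory request at a time,
    so this is the number of threads needed to keep the ALUs busy.
    """
    return math.ceil(latency_cycles / issue_interval_cycles)

# Hypothetical numbers: 400-cycle memory latency, one request issued per cycle.
print(threads_to_hide_latency(400, 1))   # 400
# Issuing a request only every 4 cycles needs a quarter as many threads.
print(threads_to_hide_latency(400, 4))   # 100
```

The longer the latency relative to the issue interval, the more threads must be resident, which is why a GPU trades large caches for many cheap thread contexts.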
FPGA vs ASIC

• Starting point. FPGA: one starts out with a large array of logic blocks, clock buffers, PLLs, on-chip RAMs, I/O buffers, (de)serializers, power distribution networks and more. ASIC: development starts further down into the weeds; these components must either be purchased, come as part of a library, or be individually developed for use within any ASIC design.
• Design process. FPGA: simple design process. ASIC: long and complex design process.
• Expenses. FPGA: there are no non-recurring expenses. ASIC: expensive, as it involves the cost of circuit design and mask design.
• Time to market. FPGA: faster “time-to-market” product. ASIC: longer “time-to-market” product.
• Speed. FPGA: slower than ASIC. ASIC: fast.
• Reusability and flexibility. FPGA: reusable and flexible. ASIC: not reusable and not flexible.
• Wastage. FPGA: unavoidable. ASIC: no wastage of hardware.
• Best suited. FPGA: when the required numbers are small. ASIC: when the required numbers are large.
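The "best suited" volume argument can be made concrete with a break-even calculation. This is a minimal sketch under simple assumptions: a one-time non-recurring engineering (NRE) cost for the ASIC, none for the FPGA, and constant per-unit costs. The function name and all prices are hypothetical.

```python
def break_even_units(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest volume at which the ASIC's total cost drops below the FPGA's.

    ASIC total = asic_nre + asic_unit_cost * n
    FPGA total = fpga_unit_cost * n   (no NRE assumed)
    """
    if fpga_unit_cost <= asic_unit_cost:
        return None  # the ASIC never catches up on cost alone
    return asic_nre // (fpga_unit_cost - asic_unit_cost) + 1

# Hypothetical prices: 1,000,000 NRE, 5 per ASIC, 55 per FPGA.
print(break_even_units(1_000_000, 5, 55))   # 20001
```

Below the break-even volume the FPGA is cheaper overall; above it, the ASIC's lower unit cost pays back the up-front design and mask expense.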

• When choosing between an ASIC or FPGA, it is best to ask what the end use application
will be.
• If your application requires constant bug fixes, feature and design changes, and software
flexibility, then FPGAs may be the right solution.
• If your end application requires high performance, smaller device footprint, and
significantly lower power consumption, then ASICs are your best bet.
Accelerator Design
• What to accelerate?
• Decide the operational specifications of the hardware accelerator
• Profile software applications
• Determine the critical path/bottleneck, and frequently used kernels or functions

• How to accelerate?
• Architecture of the accelerator
• Memory hierarchy and I/O interfaces
• CPU-accelerator interfaces
• Programming interfaces

• Acceleration goals/requirements/constraints?
• Maximum latency
• Minimum throughput
• Maximum power consumption
• Cost, time to market, etc.
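For the "what to accelerate" step, a profile can be turned into a rough estimate of the overall gain using Amdahl's law. The sketch below is illustrative: the profile numbers and function names (`load_input`, `fir_filter`, `write_output`) are invented, and a real profile would come from a profiling tool.

```python
def kernel_fractions(profile_seconds):
    """Convert per-function run times into fractions of total run time."""
    total = sum(profile_seconds.values())
    return {name: t / total for name, t in profile_seconds.items()}

def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the run time gets `kernel_speedup`x faster."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

# Hypothetical profile of an application, in seconds per function.
profile = {"load_input": 1.0, "fir_filter": 8.0, "write_output": 1.0}
fractions = kernel_fractions(profile)
hottest = max(fractions, key=fractions.get)

print(hottest, fractions[hottest])                        # fir_filter 0.8
print(round(amdahl_speedup(fractions[hottest], 10), 2))   # 3.57
```

Even a 10x faster kernel yields only about 3.6x overall here, because the remaining 20% of the run time is untouched. This is why the bottleneck is identified before any hardware is designed.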
Accelerator Design
• A few examples of choices in hardware accelerator design
• Types of parallelism exploited
• Fine-grained vs coarse-grained
• Data parallel vs task parallel

• Optimized for high throughput vs low latency, e.g., optimizing the number of tasks completed per unit of time or the execution time of a single task

• Memory organization
• External interfaces
• On-chip memory usage, data buffering schemes
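The throughput-versus-latency choice listed above can be illustrated with an idealized pipeline model. This is a sketch under stated assumptions: no stalls, one task accepted per cycle, and two hypothetical designs whose stage counts and cycle times are invented.

```python
def pipeline_metrics(stages, cycle_time_ns):
    """Latency and throughput of an ideal, fully pipelined unit (no stalls)."""
    latency_ns = stages * cycle_time_ns          # time for one task to pass through
    throughput_per_us = 1000.0 / cycle_time_ns   # one task completes per cycle
    return latency_ns, throughput_per_us

# Design A: a single long 8 ns stage (optimized for latency).
print(pipeline_metrics(1, 8.0))   # (8.0, 125.0)
# Design B: four 2.5 ns stages; pipeline registers add some delay per task.
print(pipeline_metrics(4, 2.5))   # (10.0, 400.0)
```

Design B finishes each individual task later but completes more than three times as many tasks per microsecond, so the right choice depends on which of the two goals the application constrains.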
Parallelism
Why are accelerators faster? They exploit the parallelism in kernels/applications.

Consider vector addition:

• No data dependences between loop iterations

• Explicit data parallelism in this example
• We could instantiate K parallel adders: the N additions then take about N/K steps instead of N, an ideal speedup of K
• Can we really achieve a K-fold speedup?
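The vector-addition loop itself is not reproduced in this text, so the following is a minimal sketch of what such a loop and a K-adder version might look like. The function names and the step-counting model are assumptions made for illustration.

```python
def vector_add(a, b):
    """Reference loop: iteration i touches only a[i] and b[i]."""
    return [a[i] + b[i] for i in range(len(a))]

def vector_add_k_lanes(a, b, k):
    """Model K parallel adders: each step adds up to k elements at once."""
    out, steps = [], 0
    for start in range(0, len(a), k):
        lane_a = a[start:start + k]
        lane_b = b[start:start + k]
        out.extend(x + y for x, y in zip(lane_a, lane_b))
        steps += 1
    return out, steps

a = list(range(16))
b = [10] * 16
result, steps = vector_add_k_lanes(a, b, 4)
assert result == vector_add(a, b)   # same answer as the sequential loop
print(steps)                        # 4
print(len(a) / steps)               # 4.0
```

The model counts only adder steps: N elements take ceil(N/K) steps. It ignores how the operands reach the adders, which is one reason the ideal figure may not be reached in practice.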
Interface Choices
How do data move in and out of the accelerator?
What are the bandwidths needed for the interfaces?
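A back-of-the-envelope bandwidth estimate helps answer the second question. The sketch below assumes a vector-addition accelerator that reads two operands and writes one result per adder per cycle. The function name, the 16-adder count, and the 200 MHz clock are hypothetical.

```python
def required_bandwidth_gb_s(lanes, clock_hz, bytes_per_element, reads=2, writes=1):
    """Bandwidth (GB/s, 1e9 bytes) needed to keep `lanes` adders busy every cycle."""
    bytes_per_cycle = lanes * bytes_per_element * (reads + writes)
    return bytes_per_cycle * clock_hz / 1e9

# Hypothetical accelerator: 16 adders at 200 MHz on 4-byte elements.
print(required_bandwidth_gb_s(16, 200e6, 4))   # 38.4
```

If the interface delivers less than this, the adders sit idle part of the time and the achieved speedup falls below the ideal value.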
