Chapter 8 Folding

Uploaded by

dowoc61946


FOLDING

The objective of folding is to reduce hardware by sharing. The idea is to
time-multiplex several operations of the same type in an algorithm onto a
single functional unit. The cost is throughput, so folding is a method
to trade speed for area.
e.g. y(n) = a(n) + b(n) + c(n)
[Figure: direct implementation with two adders, and the folded design in which
the two addition operations share one adder unit; folding factor N = 2]

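Note: Nl+n is the time instance at which the switch is connected, where
l is the iteration index (l-th iteration) and N is the folding factor.
A delay/storage element is automatically required to hold the intermediate
sum between the two additions.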
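
As a rough sketch of the folded example, the shared adder can be simulated cycle by cycle. The schedule used here (a(n) + b(n) at time 2l, then adding c(n) at time 2l + 1) is an assumption, since the switch assignment is only given in the figure; `folded_sum` and `reg` are illustrative names.

```python
def folded_sum(a, b, c):
    """Compute y(n) = a(n) + b(n) + c(n) with one adder shared over N = 2 cycles."""
    N = 2                  # folding factor: two additions share one adder
    reg = 0                # storage element holding the partial sum
    y = []
    for t in range(N * len(a)):
        l, phase = divmod(t, N)      # iteration index and time partition
        if phase == 0:
            reg = a[l] + b[l]        # time 2l: first addition (assumed schedule)
        else:
            y.append(reg + c[l])     # time 2l + 1: second addition on the same adder
    return y

print(folded_sum([1, 2], [10, 20], [100, 200]))  # [111, 222]
```

Each output takes two adder cycles instead of one, which is the speed-for-area trade described above.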
Folding Transform Basic Idea
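Consider an edge e from node U to node V with w(e) delays, D(U→V) = w(e), where
the functional unit Hu that executes U is pipelined by Pu stages.
After folding, the l-th iterations of nodes U and V are scheduled at time units
Nl+u and Nl+v respectively, where u and v are the folding orders.
The result of the l-th iteration of U is available at Nl+u+Pu and is used by the
(l+w(e))-th iteration of V at N(l+w(e))+v, so the folding equation is:
DF(U→V) = [N(l+w(e))+v] – [Nl+Pu+u]
DF(U→V) = Nw(e) – Pu + v – u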
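
The folding equation is simple enough to evaluate directly. The sketch below does that for a single edge; the function name `folding_delay` and the edge values are hypothetical and chosen only to illustrate the arithmetic.

```python
def folding_delay(N, w, P_u, u, v):
    """Number of delays on the folded edge U -> V: D_F = N*w - P_u + v - u."""
    return N * w - P_u + v - u

# Hypothetical edge: one delay, source unit with P_u = 1, folding orders u = 3, v = 1.
d = folding_delay(N=4, w=1, P_u=1, u=3, v=1)
print(d)  # 4*1 - 1 + 1 - 3 = 1
```

A negative result would mean the folded edge needs a non-causal delay, so that schedule is not realizable as it stands.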

Folding set is an ordered set of operations executed by the same functional unit.
Each folding set contains N entries, some of which may be null operations.
Each functional unit has its own folding set, and more than one valid folding
set may exist for a given unit.
BIQUAD Filter as Example

Note: the folding sets are typically obtained from a scheduling and
allocation algorithm.

The addition operations and multiplication operations are folded onto
one adder unit and one multiplier unit with folding factor N=4. The folding
order of each node is its position in its folding set, counted from 0:
adder: S1 = {4, 2, 3, 1}
multiplier: S2 = {5, 8, 6, 7}
Addition takes 1 u.t. and multiplication takes 2 u.t.,
i.e. PA = 1 and PM = 2.
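
One way to read the folding sets is as a lookup from node number to functional unit, folding order and pipeline depth. The sketch below builds that table from S1 and S2; the helper name `folding_orders` and the dictionary layout are assumptions made for illustration.

```python
# Folding sets from the slide; position in the set is the folding order (0-based).
S1 = [4, 2, 3, 1]   # adder
S2 = [5, 8, 6, 7]   # multiplier
PIPELINE = {"adder": 1, "multiplier": 2}   # P_A = 1, P_M = 2

def folding_orders(folding_sets):
    """Map node -> (unit, folding order, pipeline stages); None marks a null operation."""
    table = {}
    for unit, nodes in folding_sets.items():
        for order, node in enumerate(nodes):
            if node is not None:
                table[node] = (unit, order, PIPELINE[unit])
    return table

orders = folding_orders({"adder": S1, "multiplier": S2})
for node in sorted(orders):
    print(node, orders[node])
```

For instance, node 1 is executed by the adder at folding order 3, and node 5 by the multiplier at folding order 0.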
Folding Equation for Each Edge
[Figure: folding equations for each edge, and the folded biquad filter]

Note: DF(U→V) ≥ 0 must hold for all of the edges.

This solution is in fact obtained after retiming.

Original Biquad Filter DFG

Negative Delay Is Not Valid
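Every edge of the retimed graph must fold to a non-negative delay:
Nwr(e) – Pu + v – u ≥ 0, where wr(e) = w(e) + r(V) – r(U)

Combined:
N(w(e) + r(V) – r(U)) – Pu + v – u ≥ 0

Substituting DF(U→V) for Nw(e) – Pu + v – u (the folding equation of the
original, unretimed edge) and solving for r(U) – r(V):
r(U) – r(V) ≤ DF(U→V) / N

Since retiming values of the nodes are restricted to be integers,
r(U) – r(V) ≤ ⌊DF(U→V) / N⌋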
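
The floor matters when DF(U→V) is negative, because rounding toward zero would give a bound that is too loose. A minimal sketch of the bound follows; the function name `retiming_bound` and the edge values are hypothetical.

```python
def retiming_bound(N, w, P_u, u, v):
    """Upper bound on r(U) - r(V): floor(D_F(U->V) / N), with D_F from the original DFG."""
    d_f = N * w - P_u + v - u
    return d_f // N        # Python's // floors toward negative infinity

# Hypothetical edge with no delays: D_F = 4*0 - 1 + 1 - 3 = -3, floor(-3/4) = -1.
print(retiming_bound(N=4, w=0, P_u=1, u=3, v=1))  # -1
```

Using `int(d_f / N)` instead would truncate -0.75 to 0 and admit retiming values that still leave a negative folded delay.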

Constraint Graph for the Set of Inequalities
A solution is r(1)=-1, r(2)=0, r(3)=-1, r(4)=0,
r(5)=-1, r(6)=-1, r(7)=-2 and r(8)=-1

Note: the same solution can be found by cutset retiming
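
A set of inequalities of the form r(U) – r(V) ≤ c can be solved as a shortest-path problem on the constraint graph. The sketch below uses Bellman-Ford relaxation from an implicit source connected to every node with weight 0; this is one standard way to do it and is not necessarily the method behind the slide. The three-node constraint set is hypothetical, not the biquad's.

```python
def solve_retiming(nodes, constraints):
    """Solve r(U) - r(V) <= c for integer r using Bellman-Ford.

    constraints: list of (U, V, c). Returns dict node -> r, or None if infeasible.
    """
    # Distances from a virtual source that reaches every node with weight 0.
    r = {n: 0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for U, V, c in constraints:      # constraint-graph edge V -> U with weight c
            if r[V] + c < r[U]:
                r[U] = r[V] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after len(nodes) passes: negative cycle, no solution

# Hypothetical constraints, for illustration only.
cons = [("A", "B", -1), ("B", "C", 0), ("C", "A", 1)]
print(solve_retiming(["A", "B", "C"], cons))  # {'A': -1, 'B': 0, 'C': 0}
```

Any solution can be shifted by a constant without violating the inequalities, so the values found this way may differ from another valid solution by such an offset or by more.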

Further Register Minimization – Lifetime Analysis

Lifetime analysis is a procedure used to compute the minimum number of
registers required to implement a DSP algorithm in hardware. The number of
'live' variables at each time unit is computed, and the maximum number of
live variables at any time unit is determined. This maximum is the minimum
number of registers required to implement the DSP algorithm.

'live' variable: a variable is live from the time it is produced
through the time it is consumed.
Lifetime Chart
A DSP algorithm has 3 variables, a, b and c, and a periodicity of 6 cycles.

Convention: a variable is not live during the clock cycle in which it is
produced, but it is live during the clock cycle in which it is consumed.
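
Under this convention a variable produced at cycle p and consumed at cycle q is live in cycles p+1 through q. The sketch below counts live variables per cycle for a single iteration; the lifetimes are hypothetical and are not the ones in the slide's chart.

```python
def live_counts(lifetimes, horizon):
    """Count live variables per cycle.

    lifetimes: dict name -> (produced, consumed). A variable is live in
    cycles produced+1 .. consumed (not live when produced, live when consumed).
    """
    counts = [0] * horizon
    for produced, consumed in lifetimes.values():
        for t in range(produced + 1, min(consumed, horizon - 1) + 1):
            counts[t] += 1
    return counts

# Hypothetical lifetimes, not taken from the slide's chart.
example = {"x": (0, 2), "y": (1, 4), "z": (3, 4)}
print(live_counts(example, horizon=6))  # [0, 1, 2, 1, 2, 0]
```

The maximum of this list, 2 here, is the register count for one isolated iteration; overlap between iterations is not yet accounted for.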
Minimum Number of Registers
Lifetime chart with 3 consecutive iterations
The lifetime of a variable from one iteration may overlap
with the lifetimes of variables from other iterations. When
the periodic nature is taken into account, the minimum
number of registers is 3.
This can also be found by drawing the lifetimes of the variables
for the 0-th iteration only and, for each time partition n≥N,
letting the number of live variables be the sum of the
numbers of variables from the 0-th iteration that are live at cycles
n-kN for all integers k≥0.
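
In steady state, summing the 0-th iteration's live counts at cycles n – kN amounts to folding every live cycle onto its residue modulo N. A minimal sketch under that reading is shown below; `min_registers` is an illustrative name and the lifetimes are hypothetical, not those of the slide's chart.

```python
def min_registers(lifetimes, N):
    """Maximum number of live variables per cycle for a schedule with period N.

    lifetimes: dict name -> (produced, consumed) for the 0-th iteration.
    Live cycles are produced+1 .. consumed; cycle t folds onto t mod N in steady state.
    """
    live = [0] * N
    for produced, consumed in lifetimes.values():
        for t in range(produced + 1, consumed + 1):
            live[t % N] += 1      # iteration k contributes the same pattern at t + k*N
    return max(live)

# Hypothetical lifetimes with period 4, for illustration only.
example = {"x": (0, 3), "y": (2, 5), "z": (3, 6)}
print(min_registers(example, N=4))  # 3
```

In this example a single iteration never has more than 2 live variables, but the overlap with the next iteration raises the requirement to 3.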

Data Allocation Using Forward-Backward
Register Minimization
1. Determine the minimum number of registers using lifetime analysis.
2. Input each variable at the time step corresponding to the beginning of
its lifetime. If multiple variables are input in a given cycle, the variable
with the longest lifetime is allocated to the initial register and the other
variables are allocated to consecutive registers in decreasing order of
lifetime.
3. Each variable is allocated in a forward manner until it is dead or it
reaches the last register. In forward allocation, if register i holds the
variable in the current cycle, then register i+1 holds the same
variable in the next cycle. If register i+1 is not available, then the
variable is allocated to the first available forward register.
4. Since the allocation is periodic, the allocation of the current iteration
repeats itself in subsequent iterations. Thus, if Rj is occupied by
a variable in cycle l, then Rj holds the corresponding variable of the next
iteration in cycle l+N, where N denotes the periodicity of the allocation.
Therefore, we mark ("hash") the position of Rj at time unit l+N as occupied,
for each j and l.
5. For variables that reach the last register and are not yet dead,
the remaining life period is calculated, and these variables are
allocated to a register in a backward manner on a first-come,
first-served basis. If multiple registers are available for
backward allocation, first try to choose a register such that
backward allocation has already been performed from the last
register to this register. If more than one register qualifies,
choose the register with the minimum number of forward registers
among all candidate registers that have a sufficient number of
forward registers to complete the allocation of the variable.
After a variable has been allocated backward, allocate it forward
until it is dead or it again reaches the last register.
6. Repeat steps 4 and 5 as required until the allocation is
complete.

Example
Transpose operation of a 3x3 matrix:
a b c
d e f
g h i

The period is 9.
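
The lifetimes for the transposer can be derived from the input and output orders. The sketch below assumes one sample arrives per cycle in row-major order (a, b, c, ...) and leaves in column-major order (a, d, g, ...), and that the output is delayed by the smallest latency that keeps every lifetime causal. These assumptions and all names are illustrative, since the slide's lifetime table is not reproduced here.

```python
import string

def transpose_lifetimes(n=3):
    """Lifetimes (T_input, T_output) for the samples of an n x n transposer (n <= 5)."""
    names = string.ascii_lowercase[: n * n]      # a, b, c, ... in row-major order
    t_in = {names[r * n + c]: r * n + c for r in range(n) for c in range(n)}
    # Zero-latency output time: element (r, c) leaves in column-major position c*n + r.
    t_zl = {names[r * n + c]: c * n + r for r in range(n) for c in range(n)}
    latency = max(t_in[v] - t_zl[v] for v in names)   # smallest delay with no negative lifetime
    return {v: (t_in[v], t_zl[v] + latency) for v in names}, latency

def min_registers(lifetimes, N):
    """Maximum number of live variables per cycle when the schedule repeats every N cycles."""
    live = [0] * N
    for produced, consumed in lifetimes.values():
        for t in range(produced + 1, consumed + 1):
            live[t % N] += 1
    return max(live)

lifetimes, latency = transpose_lifetimes(3)
print(latency)                       # 4
print(lifetimes["c"])                # (2, 10)
print(min_registers(lifetimes, 9))   # 4
```

Under these assumptions sample g is consumed in the cycle it arrives and needs no register, while c has the longest lifetime, from cycle 2 to cycle 10.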

Allocation Table

Full Procedure to Minimize Registers in a Folded Architecture
1. Perform retiming for folding.
2. Write the folding equations.
3. Use the folding equations to construct a lifetime table.
4. Draw the lifetime chart and determine the required number of registers.
5. Perform forward-backward register allocation.
6. Draw the folded architecture that uses the minimum number of registers.

Biquad Filter Example
[Figures: original DFG; retimed DFG; folding equations; folded architecture
without register minimization]

Minimize Registers
[Figures: lifetime table; lifetime chart indicating that only 2 registers are
required; allocation table; implementation]
