0% found this document useful (0 votes)
37 views18 pages

Chapter 8 Folding

Uploaded by

dowoc61946
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views18 pages

Chapter 8 Folding

Uploaded by

dowoc61946
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

FOLDING

The objective of folding is to reduce hardware by sharing. The idea is to


fold the same operation in an algorithm to a functional unit. Of course,
the performance will suffer. Therefore, folding is a method to trade speed
for space.
e.g. y(n) = a(n) + b(n) + c(n)
Folded design by sharing the two addition
operations with one addition unit
Direct Implementation Folding factor, N=2

Note: Nl+n is an instance at which the switch is connected


A delay/storage element is automatically required l – lth iteration
N – folding order
1
Folding Transform Basic Idea
!
Given D(U→V) = w(e) and Hu is pipelined by Pu
After folding, at l-th iteration, node U and V are scheduled to
Nl+u and Nl+v respectively
Folding equation is:
!
DF(U→V) = [N(l+w(e))+v] – [Nl+Pu+u]
!
DF(U→V) = Nw(e) – Pu + v – u

Folding set is an ordered set of operations executed by the same functional unit.
Each folding set contains N entries, some of which may be null operations.
Each functional unit has its own folding set but the set may have more than one
possibility.
2
BIQUAD Filter as Example

Note: the folding sets


are typically obtained
from a scheduling and
allocation algorithm

Addition operations and multiplication operations are folded to


one adder unit and one multiplier unit, N=4. The folding order is
also shown given by folding sets:
adder multiplier
S1 = {4,2,3,1} S2 = {5, 8, 6, 7}
Addition takes 1u.t. Multiplication takes 2u.t.
i.e. PA = 1 PM = 2
3
Folding Equation for Each Edge
Folded Biquad Filter:

!
Note: DF(U→V) ≥ 0 must hold for all of the edges

This solution is in fact obtained after re-timing 4


Original Biquad Filter DFG

5
Negative Delay Is Not Valid
Nwr(e) – Pu + v – u ≥ 0 wr(e) = w(e) + r(V) – r(U)

Combined
N(w(e) + r(V) – r(U)) – Pu + v – u ≥ 0

Substituting DF(U→V) for Nw(e)-Pu+v-u and


solving for r(U)-r(V)
"
!!(#→%)
r(U) – r(V) ≤
'

Since retiming values of the nodes


are restricted to be integers,
"
!!(#→%)
r(U) – r(V) ≤
'

6
Constraint Graph for the Set of Inequalities
A solution is r(1)=1, r(2)=0, r(3)=-1, r(4)=0,
r(5)=-1, r(6)=-1, r(7)=-2 and r(8)=-1

Note: the same solution can be found by cutset retiming


7
Further Register Minimization – Lifetime Analysis

Lifetime analysis is a procedure used to compute the


minimum number of registers required to implement a
DSP algorithm in hardware. In lifetime analysis, the
number of ‘live’ variables at each time unit is computed,
and the maximum number of live variables at any time
unit is determined. This is the minimum number of
registers required to implement the DSP.

‘live’ variable – a variable is live from the time it is produced


through the time it is consumed.

8
Lifetime Chart
A DSP has 3 variables, a, b and c, and has a periodicity of 6 cycles

A variable is not live during the clock cycle in which it is produced,


and the variable is live during the clock cycle in which it is
consumed
9
Minimum Number of Registers
Lifetime chart with 3 consecutive iterations
The lifetime of a variable from one iteration may overlap
with the lifetime of variables from other iteration. When
the periodic nature is taken into account, the minimum
number of register is 3.
This can also be found by drawing the lifetime of variables
for the 0-th iteration and letting the number of live
variables at the time partitions n≥N be the sum of the
number of live variables due to the 0-th iteration at cycles
n-kN for all integers k≥0.

10
Data Allocation Using Forward-Backward
Register Minimization
1. Determine the minimum number of registers using lifetime analysis.
2. Input each variable at the time step corresponding to the beginning of
its lifetime. If multiple variables are input in a given cycle, these are
allocated to the initial register and the other variables are allocated to
consecutive registers in decreasing order of lifetime.
3. Each variable is allocated in a forward manner until it is dead or it
reaches the last register. In forward allocation, if the register i holds the
variable in the current cycle, then the register i+1 holds the same
variable in the next cycle. If the register i+1 is not available, then the
variable is allocated to the first available forward register.
4. Since the allocation is periodic, the allocation of the current iteration
also repeats itself in subsequent iterations. Thus, if Rj is occupied with
a variable in cycle l, then Rj would occupy the corresponding variable in
cycle l+N, where N denotes the periodicity of the allocation. Therefore,
we “hash” the position for Rj at the time unit l+N for each j and l.
11
continuing
5. For variables that reach the last register and are not yet dead,
the remaining life period is calculated, and these variables are
allocated to a register in a backward manner on a first-come
first served basis. If multiple registers are available for
backward allocation first try to choose a register such that
backward allocation has already been performed from the last
register to this register. In the case where more than one
register qualifies for backward allocation, choose the register
with the minimum number of forward registers among all
candidate registers that have a sufficient number of forward
registers to complete the allocation of the variable. After a
variable has been allocated backward, allocate it forward until it
is dead or it again reaches the last register.
6. Repeat steps 4 and 5 as required until the allocation is
complete.

12
Example
Transpose operation of a 3x3 matrix
𝑎 𝑏 𝑐
𝑑 𝑒 𝑓
𝑔 ℎ 𝑖

period is 9.

13
Allocation Table

14
Full Procedure to Minimize Register
in Folded Architecture
1. Perform retiming for folding
2. Write the folding equations
3. Use the folding equations to construct a lifetime table
4. Draw the lifetime chart and determine the required
number of registers
5. Perform forward-backward register allocation
6. Draw the folded architecture that uses the minimum
number of registers

15
Biquad Filter Example
Original DFG: Retimed DFG:

Folding equations: Folded architecture without register minimization:

16
Minimize Registers
Lifetime table: Allocation:

Lifetime chart indicting only 2 registers


Implementation:
are required:

17
18

You might also like