DSP (2015 Spring) Computation of DFT
Computation of DFT
Efficient algorithms for computing DFT – Fast Fourier Transform.
(a) Compute only a few points out of all N points
(b) Compute all N points
What are the efficiency criteria?
Number of multiplications
Number of additions
Chip area in VLSI implementation
DFT as a Linear Transformation
Matrix representation of DFT
Definition of DFT:
N 1
X (k ) x(n)WNkn , k 0,1, , N 1
n 0
N 1
1
x(n)
N
X (k )WNkn , n 0,1, , N 1
k 0
where
x ( 0) X ( 0)
Let
x (1) X (1)
xN , X N ,
x ( N 1) X ( N 1)
and
1 1 1 1
1 W W N2 WN N 1
N
WN 1 W N2 W N4 W N2( N 1)
( N 1) 2 ( N 1)
1 W N WN W N( N 1)( N 1)
Thus,
X N WN x N N - point DFT
x N WN1X N N - point IDFT
1
WN* X N
N
Because the matrix (transformation) WN has a specific structure and because WNk has par-
ticular values (for some k and n), we can reduce the number of arithmetic operations for
computing this transform.
NCTU EE 1
DSP (2015 Spring) Computation of DFT
Example x[n ] [0 1 2 3]
W40 W40 W40 W40 1 1 1 1
0 3
W4 W4 W4 W4 1 W4 W4 W4
1 2 1 2 3
W4
W40 W42 W44 W46 1 W42 W40 W42
0 3 6 9 3 2 1
W4 W4 W4 W4 1 W4 W4 W4
1 1 1 1
1 j 1 j
1 1 1 1
1 j 1 j
Only additions are needed to compute this specific transform.
(This is a well-known radix-4 FFT)
6
Thus, the DFT of x[n ] is X W x 2 2 j
2
4 4 4
2 2 j
Fast Fourier Transform
-- Highly efficient algorithms for computing DFT
General principle: Divide-and-conquer
Specific properties of WNk
Complex conjugate symmetry: WN kn (WNkn )*
Symmetry: WNk N 2 WNk
Periodicity: WNk N WNk
Particular values of k and n: e.g., radix-4 FFT (no multiplications)
Direct computation of DFT
N 1
X [k ] x[n] WNkn , k 0,1, , N 1
n 0
N 1
Re x[n ] Re WNkn Im x[n ] Im WNkn
n 0 j Re x[n ] Im WN Im x[n ]Re WN
kn
kn
For each k, we need N complex multiplications and N-1 complex additions. 4N real
multiplications and 4N-2 real additions.
NCTU EE 2
DSP (2015 Spring) Computation of DFT
k
We will show how to use the properties of WN to reduce computations.
Radix-2 algorithms: Decimation-in-time; Decimation-in-frequency
Composite N algorithms: Cooley-Tukey; Prime factor
Winograd algorithm
Chirp transform algorithm
Radix-2 Decimation-in-time Algorithms
-- Assume N-point DFT and N 2
Idea: N-point DFT N -point DFT N -point DFT
2 4
N -point DFT
4
N -point DFT N -point DFT
2 4
N -point DFT
4
Sequence: x[0] x[1] x[2] x[3] x[n ] x[ N 1]
2
Even index: x[0] x[2] x[ N 2 ]
Odd index: x[1] x[3] x[ N 1]
N 1
X [k ] x[n]WNkn , k 0,1, , N 1
n 0
x[n ]WNkn x[n ]WNkn
n
even n
odd
n 2 r n 2 r 1
N N
1 1
2 2
x[2r ]WN2 rk x[2r 1]WN( 2 r1) k
r 0 r 0
2
2 j 2 j
WN2 e N
e N /2
WN / 2
N N
1 1
2 2
X [k ] x[2r ]WNrk/ 2 WNk x[2r 1]WNrk/ 2
0
r r 0
N N
-point DFT -point DFT
2 2
G[k ] WNk H [k ]
NCTU EE 3
DSP (2015 Spring) Computation of DFT
Comparison:
(a) Direct computation of N-point DFT (N frequency samples):
~ N 2 complex multiplications and N 2 complex adds
(b) Direct computation of N -point DFT:
2
2 2
~ N complex multiplications and N complex adds
2 2
+ additional N complex multis and N complex adds
2
~ (Total:) N 2 N N N complex multis and adds
2
2 2
(c) log 2 N -stage FFT
Since N 2 , we can further break N -point DFT into two N -point DFT and
2 4
so on.
NCTU EE 4
DSP (2015 Spring) Computation of DFT
At each stage: ~ N complex multis and adds
Total: ~ N log 2 N complex multis and adds (--> N log N )
2
2
Number of Direct Computation: FFT: Speed Im-
points, N Complex Multis Complex Multis provement
Factor
4 16 4 4.0
8 64 12 5.3
16 256 32 8
64 4,096 192 21.3
256 65,536 1,024 64.0
1024 1,048,576 5,120 204.8
Butterfly: Basic unit in FFT
Two multiplications:
NCTU EE 5
DSP (2015 Spring) Computation of DFT
One multiplication:
In-place computations
Only two registers are needed for computing a butterfly unit.
X m [ p ] X m 1 [ p ] W Nr X m 1 [ q]
X m [ q] X m 1 [ p ] W Nr X m 1 [ q ]
Advantage: less storage!
In order to retain the in-place computation property, the input data are accessed in the
bit-reversed order.
Note: The outputs are in the normal order (same as the “position”)
Position Binary equivalent Bit reversed Sequence index
6 110 011 3
2 010 010 2
NCTU EE 6
DSP (2015 Spring) Computation of DFT
Remark: Index 3 input data is placed at position 6.
We may also place the inputs in the normal order; then the outputs are in the bit-reversed
order.
If we try to maintain the normal order of both inputs and outputs, then in-place compu-
tation structure is destroyed.
NCTU EE 7
DSP (2015 Spring) Computation of DFT
Radix-2 Decimation-in-frequency Algorithms
Dividing the output sequence X [k ] into smaller pieces.
N 1
X (k ) x(n )WNkn , k 0,1,, N 1
n 0
If k is even, k 2r .
N 1
N
X [2 r ] x[n ]WNkn , r 0,1, ,
2
1
n 0
N
1
2 N 1
x[n ]WN2 nr N x[n ]WN2 nr n (n N )
2
n 0 n
2
N N
1 1 N
2r n
2
2N
x n WN 2
x[n ]WN2 nr
n 0 n 0 2
2 r[ n N ]
WN 2 WN WN WN2 rn
2 rn rN
N
1
2 N
x[n] x n WN
2 nr
n 0 2
N
1
2 N
x[n ] x n WN 2
nr
n 0 2
Similarly, if k is odd, k 2r 1 .
N
1
2
N
X [2r 1] x[n ] x n WN WN 2
n nr
n 0 2
NCTU EE 8
DSP (2015 Spring) Computation of DFT
N
1
X [2 r ] 2 N nr
x[n] x n WN 2
2
n 0
N
1
2 N n
X [2r 1] x[n] x n
2
WN WN 2
nr
n 0
N
g[n ] x[n ] x n
Let 2
h[n ] x[n ] x n
N
2
We can further break X [2r ] into even and odd groups …
Again, we can reduce the two-multiplication butterfly into one multiplication. Hence, the
computational complexity is bout N log N . The in-place computation property holds if the
2
2
outputs are in bit-reversed order (when inputs are in the normal order).
NCTU EE 9
DSP (2015 Spring) Computation of DFT
FFT for Composite N
-- Cooley-Tukey Algorithm: N N 1 N 2
0 n1 N 1 1
Time index : n N 2 n1 n2
0 n2 N 2 1
Freq. index : k k1 N1k 2 0 k1 N1 1
0 k 2 N 2 1
Remark: n ( n1 , n2 ) and k ( k1 , k 2 )
Goal: Decompose N-point DFT into two stages:
N 1 -point DFT N 2 -point DFT
N 1
X [k ] x[n]WNkn , 0 k N 1
n 0
X [k1 N 1k 2 ]
N 2 1 N1 1
x[ N 2 n1 n2 ] WNk N k N n n 1 1 2 2 1 2
n2 0 n1 0
N 2 1 N1 1
x[ N 2 n1 n2 ] W N kn
N N
W k n 2 1 1 1 2
WNk1N1n2 WNN1N 2k2n1
n2 0 n1 0
WNk1n1 WNk2 n2 1
1 2
N 2 1 N1 1
k1n1 k1n2
x[ N 2 n1 n2 ] WN1 WN WNk22n2
n2 0 n1 0
twiddle
N1 -point factor
N 2 -point
Procedure
(1) Compute N 1 -point DFT: (row transform)
N1 1
G[n2 , k1 ] x[ N 2 n1 n2 ] WNk n 1 1
1
n1 0
(2) Multiply twiddle factors:
~
G[n2 , k1 ] WNk1n2 G[n2 , k1 ]
(3) Compute N 2 -point DFT: (column transform)
N 2 1
~
X [k1 N1k 2 ] G[n2 , k1 ] WNk n 2 2
2
n2 0
NCTU EE 10
DSP (2015 Spring) Computation of DFT
(Computation of N=15-point DFT by means of 3-point and 5-point DFTs.)
NCTU EE 11
DSP (2015 Spring) Computation of DFT
Extension: N N 1 N 2 N
Let ( N ) number of multiplications for N - point DFT
If N N 1 N 2
1. row transform : N 2 ( N1 )
2. twiddle factors : N1 N 2 N
3. column transfrm : N ( N )
1 2
( N ) N 2 ( N1 ) N1 ( N 2 ) N
( N1 ) ( N 2 )
N 1
N1 N2
In general,
(Ni )
( N ) N ( 1)
i 1 Ni
In fact, the term 1 should be 1 because rearranging the butterfly structure
2
would make half of the branches becoming “1”.
Special Case: N 1 N 2 N 2
Radix-2: N 1 N 2 N 2 and log 2 N
( N ) N 1 2 multiplications because ( 2) requires no multiplications.
Radix-4: N1 N 2 N 4 and log 4 N
( N ) N 1 2 multiplications because ( 4 ) requires no multiplications. This FFT
has fewer stages than Radix-2 ==> fewer multiplications.
W40 W40 W40 W40 1 1 1 1 1 1 1 1
0
W42 W43 1 W4 W4 W4 1 j 1 j
1 2 3
W4 W41
W4 2
W40 W42 W44 W46 1 W4 W4 W4 1 1 1 1
2 0
0 1
W46 W49 1 W4 W4 W4 1 j 1 j
3 3 2
W4 W4
NCTU EE 12
DSP (2015 Spring) Computation of DFT
Inverse FFT
IDFT: 1 N 1
(*)
x[n ]
N
X [k ] WNkn
k 0
N 1
DFT:
X [k ] x[n] W Nnk
n 0
Hence, take the conjugate of (*) :
*
1 N 1 kn
x [n ] X [k ] W N
*
N k 0
X [k ] W Nkn
N 1
1 *
N k 0
X * [k ] WNkn
N 1
1
N k 0
1
N
DFT X * ( k )
Take the conjugate of the above equation:
x[ n ]
1
N
DFT X * ( k )
*
1
N
FFT X * ( k ) *
Thus, we can use the FFT algorithm to compute the inverse DFT.
The Goertzel Algorithm
2
j( ) Nk
W kN
N e N
e j 2k 1
N 1 N 1
X [k ] W kN
N x[r ]W
r 0
kr
N x[ r ]WN k ( N r )
r 0
If we define x[n] = 0 for n < 0 and n N
and yk [ n ] x[r ]W
r
k ( N r )
N u[ n r ] x[n] (WN knu[n]) ,
X [ k ] yk [ n ] n N
NCTU EE 13
DSP (2015 Spring) Computation of DFT
1
H k ( z)
1 WN k z 1
If x[n] is complex, we need 4 real multiplications and 4 real additions to compute each
yk[n].
To compute yk[N], we need to compute yk[1], yk[2], …, yk[N-1].
We need 4N real multiplications and 4N real additions to compute X[k].
Remarks:
less efficient than the direct method.
Avoid the computation or storage of the coefficients WNkn .
To reduce the number of multiplications,
1 WNk z 1 1 WNk z 1
H k ( z)
(1 WN k z 1 )(1 WNk z 1 ) 1 2 cos( 2k / N ) z 1 z 2
If x[n] is complex, we only need 2 real multiplications and 4 real additions to implement
the poles of the system.
(The complex multiplication by W Nk needs not be performed at every iteration.)
To compute X[k], we need 2N real multiplications and 4N real additions for the poles
and 4 real multiplications and 4 real additions for the zero.
Remarks:
Avoid the computation or storage of the coefficients WNkn .
Only need to compute and save WNk and cos( 2k / N ) .
NCTU EE 14