
CS5275 Lecture 7: The Fourier Transform

Jonathan Scarlett

March 8, 2025

Acknowledgment. The first version of these notes was prepared by Lau Kang Ruey Gregory and Zhang
Yang for a CS6235 assignment.

Useful references:

• 3Blue1Brown: https://www.youtube.com/playlist?list=PL4VT47y1w7A1-T_VIcufa7mCM3XrSA5DD

• Blog post primers – ‘Fourier Analysis’ under https://www.jeremykun.com/primers/

• Textbook “Principles of Fourier Analysis” by Kenneth B. Howell

• A Very Short Course on Time Series Analysis: https://bookdown.org/rdpeng/timeseriesbook/

• (Not a topic we’ll cover but particularly relevant to TCS) Fourier analysis for analyzing Boolean functions:
https://www.youtube.com/watch?v=lXJP-UkTl-4

Categorization of material:

• Core material: Sections 1–4

• Extra material: Section 5 (DFT and FFT)

(Exam will strongly focus on “Core”. Take-home assessments may occasionally require consulting “Extra”.)

1 Introduction
1.1 Overview
In simple terms, the Fourier transform is a tool that allows us to represent a real-valued function (i.e., a
“signal”, or an “image” if the coordinates are 2D) as a superposition (sum/integral) of sines and cosines (e.g.,
sin(ωx) for various ω) or complex exponentials (e.g., eiωx for various ω) with suitably-chosen amplitudes and
frequencies. An example from 3Blue1Brown (the signal at the top is a weighted combination of the signals
below):

In some cases, the benefit of such representations is immediate, e.g.:

• If a signal that we are interested in has all of its energy in a certain frequency range but gets corrupted by
random noise (which typically spans all frequencies), we can immediately mitigate the noise by setting the
“noise-only” frequency components to zero.

• In audio applications, we can seamlessly perform operations like “increase the bass level” as well as more
complex tasks.

• In telecommunications, different transmitters can transmit their data within different frequency bands
to avoid interfering with each other, and the Fourier transform facilitates this.

In other cases, the Fourier transform may be used in a more indirect or abstract manner, e.g.:

• Sinusoids and exponentials frequently arise as solutions to differential equations, and the Fourier trans-
form can be very useful in finding such solutions more easily.

• When computing Ax for some n × n matrix A and length-n vector x (say), for certain classes of the matrix
A (e.g., circulant), the Fourier transform can be used to perform the multiplication in time $O(n \log n)$
instead of $O(n^2)$.

• Fourier transforms are widely used in probability under the name characteristic functions.

• A Boolean version of the Fourier transform is widely used in the analysis of Boolean functions.

The practical impact of the Fourier transform is staggering – it is constantly used in our phones, comput-
ers, etc., and an algorithm called the ‘Fast Fourier Transform (FFT)’ may well be one of the most invoked
algorithms ever.

1.2 Background on Bases


The Fourier transform can be viewed as a convenient choice of basis for (a class of) real-valued functions.
Since the notion of basis may be more familiar for vectors rather than functions, it’s useful to draw an analogy
between the two.
First consider real vectors in Euclidean space, say $\mathbb{R}^d$, with the usual inner product $\langle u, v \rangle = \sum_{i=1}^{d} u_i v_i$,
and note the following:

• The standard basis is given by $\{e_i\}_{i=1}^{d}$, where $e_i = (0, \dots, 0, 1, 0, \dots, 0)$ with the 1 being in the i-th
position. The meaning of basis is that (i) any $u \in \mathbb{R}^d$ can be written as a linear combination of these,
and (ii) none of the basis vectors can be written as a linear combination of the others.

• Other bases are also possible – for example, in $\mathbb{R}^2$ we could write any vector as a linear combination of
(0, 1) and (1, 1), or a linear combination of $\big(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\big)$ and $\big({-\frac{1}{\sqrt{2}}}, \frac{1}{\sqrt{2}}\big)$.

• Some bases are more “convenient” than others for certain purposes – for example, for a matrix $A \in \mathbb{R}^{n \times n}$,
the eigenvalue decomposition can be viewed as transforming to a basis in which the operation A becomes
coordinate-wise scaling.

• One particularly useful property a basis $\{v_i\}_{i=1}^{d}$ can have is orthonormality: $\langle v_i, v_j \rangle = 0$ for all $i \neq j$
(orthogonal) and $\|v_i\| = 1$ for all $i$ (normalized). In this case, we can write any vector $u \in \mathbb{R}^d$ as
$$u = \sum_{i=1}^{d} \langle u, v_i \rangle v_i,$$
with each term $\langle u, v_i \rangle v_i$ being interpreted as the projection of $u$ onto $v_i$.

Now we move from vectors to real-valued functions, i.e., f (x) with x ∈ R. Here x replaces the role of the
index i ∈ {1, . . . , n} compared to the vector case. The above considerations naturally generalize as follows,
with summations now replaced by integrals:

• We use the following notion of inner product:
$$\langle f, g \rangle = \int_{\mathbb{R}} f(x) g(x)\, dx,$$
and the corresponding norm is given by $\|f\| = \sqrt{\langle f, f \rangle} = \sqrt{\int_{\mathbb{R}} f(x)^2\, dx}$.

• Like with vectors, we can consider forming more complex functions via additions of simpler ones. The
following is an example of a multi-modal function constructed from three Gaussian-like bumps (by
combining more and more bumps, we could clearly produce increasingly complicated functions):
(Figure: a multi-modal function formed by summing three Gaussian-like bumps; x-axis from −5 to 5.)

• The property of orthonormality is also useful here: The functions $\{g_i\}$ are orthonormal if $\langle g_i, g_j \rangle = 0$ for
all $i \neq j$ and $\|g_i\| = 1$ for all $i$, and for any function in the span of the $\{g_i\}$ we can decompose
$$f(x) = \sum_i \langle f, g_i \rangle g_i(x).$$

(Or if f is not in the span, then the right-hand side represents the projection of f onto the span.)

• At first one might think that the space spanned by sines and cosines is limited, but in fact these are
rich enough to model general piecewise continuous functions. The relevant individual sines/cosines (or
complex exponentials) will serve as a convenient orthonormal basis for such functions. (Unlike the vector
case, the number of functions in the basis will be infinite, which is why we used the ambiguous notation
$\sum_i$ above.)

We conclude this discussion by mentioning that allowing complex values will also be useful: For vectors
$u, v \in \mathbb{C}^d$ the inner product is then $\sum_{i=1}^{d} u_i \overline{v_i}$ (with the bar meaning complex conjugation, $\overline{a + bi} = a - bi$),
and for functions $f, g$ the inner product is $\langle f, g \rangle = \int_{\mathbb{R}} f(x) \overline{g(x)}\, dx$, with norm $\|f\| = \sqrt{\int_{\mathbb{R}} |f(x)|^2\, dx}$.

2 Fourier Series
Before proceeding, a quick note on terminology:
• The period T of a sine wave is the width of a full oscillation. The function $f(x) = \sin\big(\frac{2\pi x}{T}\big)$ has period T.

• The (ordinary) frequency ξ is the reciprocal of the period, $\xi = \frac{1}{T}$, e.g., giving $f(x) = \sin(2\pi \xi x)$.

• We will work more with the angular frequency, defined as ω = 2πξ, so that f (x) = sin(ωx). Doing so
has various pros and cons (e.g., removing the need to carry around 2π, but it does lead to having some
“nuisance terms” later on).

In this section, we will consider functions produced by adding sines/cosines whose frequencies are integer
multiples of some base frequency (e.g., sin(x), sin(2x), sin(3x), ...). This turns out to “only” be powerful
enough to produce signals that have a regularly repeating pattern (periodicity), though that pattern itself can
be highly complex. In Section 3 we will extend this idea to allow a continuum of frequencies (e.g., sin(ωx) for
arbitrary choices of ω), which will let us produce general functions that need not be periodic.

2.1 Using Trigonometric Functions (Sines and Cosines)


We first consider the space S of piecewise continuous functions f : [−π, π] → R. The specific interval [−π, π]
is considered for convenience, but this can be generalized to any finite interval [a, b]. We will also view these
functions as being defined on the entire real line R, where outside [−π, π] the function is defined by having
periodic behavior with period 2π. Since the functions in S are real-valued, we define the inner product for S as
$$\langle f, g \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) g(x)\, dx.$$
Notice the restriction to [−π, π], and the division by π, which is related to our use of angular frequency.
The Fourier Series is based on using the following basis for the space S:
$$\Big\{ \frac{1}{\sqrt{2}},\ \sin(x),\ \cos(x),\ \sin(2x),\ \cos(2x),\ \dots \Big\}. \quad (1)$$

We will omit the proof that this is a basis, but we make some comments on it in Appendix B. The orthonormality
property can fairly easily be verified by direct integration (or checked numerically, as sketched below).
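As a sanity check, here is a minimal numerical sketch (not part of the original notes; it assumes numpy and scipy are available) that verifies the orthonormality of the first few basis elements under the inner product above.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    # <f, g> = (1/pi) * integral of f(x) g(x) over [-pi, pi]
    val, _ = quad(lambda x: f(x) * g(x), -np.pi, np.pi)
    return val / np.pi

basis = [lambda x: 1 / np.sqrt(2),
         np.sin,
         np.cos,
         lambda x: np.sin(2 * x),
         lambda x: np.cos(2 * x)]

for i, ei in enumerate(basis):
    for j, ej in enumerate(basis):
        expected = 1.0 if i == j else 0.0
        assert abs(inner(ei, ej) - expected) < 1e-8
print("first 5 basis elements are orthonormal")
```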
Writing the basis elements in Eq. (1) as $\{e_n\}_{n=1}^{\infty}$, we can then decompose any function f ∈ S as:
$$f(x) = \sum_{n=1}^{\infty} \langle f, e_n \rangle e_n(x), \quad (2)$$
or substituting the functions explicitly:
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \big( a_n \cos(nx) + b_n \sin(nx) \big), \quad (3)$$

Figure 1: Illustration of two sine wave signals with different frequencies and their combined signal (first
column), along with their Fourier transforms or decomposition into frequencies (second column).

where for n = 1, 2, . . .
$$a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx, \qquad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\, dx, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\, dx.$$

Equation (3) is called the Fourier series expansion of f , and the terms an , bn are called Fourier coefficients.
We can simplify this decomposition for certain classes of f . For example,

• For an even function f where f (−x) = f (x), one can show that the bn terms are all zero.

• Similarly, for an odd function f where f (−x) = −f (x), one can show that the an terms are all zero.

We see that the Fourier decomposition can be viewed as a decomposition of a function f into a set of frequencies,
mapped by the trigonometric basis functions. Some examples are shown in Fig. 1, where we see that a signal
comprising two sine waves with different frequencies can be readily decomposed into distinct frequencies. In
general, functions that appear less ‘smooth’ with more fluctuations at smaller scales will tend to have larger
high-frequency coefficients (i.e., $a_n$ and $b_n$ for large n).
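To make the coefficient formulas concrete, the following is a small numerical sketch (the choice f(x) = x is purely illustrative) that computes $a_n$ and $b_n$ by quadrature; since this f is odd, the $a_n$ come out as (numerically) zero, and one can show directly that $b_n = 2(-1)^{n+1}/n$.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x   # an odd function, so all a_n should be ~0

def a(n):
    val, _ = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)
    return val / np.pi

def b(n):
    val, _ = quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)
    return val / np.pi

for n in range(1, 5):
    print(n, a(n), b(n))   # a(n) ~ 0, b(n) ~ 2*(-1)**(n+1)/n
```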

2.2 Using Complex Exponentials
The previous section had only considered real-valued periodic functions, and used sines/cosines in the choice
of basis. Here we generalize to consider complex-valued functions, and adopt a basis based on complex
exponentials, which perhaps surprisingly comes out to be neater and more compact to write. Since we now
allow complex values, the preceding choice of inner product generalizes to
$$\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \overline{g(x)}\, dx.$$
(Now we divide by 2π instead of π, which is related to complex numbers having 2 components.)


Similar to Eq. (1), we can introduce the following infinite set of functions:
$$\{1,\ e^{ix},\ e^{-ix},\ e^{i2x},\ e^{-i2x},\ \dots\}. \quad (4)$$

We can again easily verify that these complex exponentials are orthogonal to one another by computing
$\langle \psi_j, \psi_k \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{ijx} e^{-ikx}\, dx$ for each pair of functions $(\psi_j, \psi_k)$ in the set. Similarly, we can verify that they
have unit norm by writing $\|\psi\| = \sqrt{\langle \psi, \psi \rangle}$.
Then, the complex exponential Fourier series can be expressed as follows:
$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}, \quad (5)$$
where for n = 0, ±1, ±2, . . . ,
$$c_n := \hat{f}_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx. \quad (6)$$

Using Euler’s formula eiθ = cos(θ) + i sin(θ), we can relate the complex exponential Fourier series with that
of the trigonometric Fourier series. Specifically, doing so gives the following:
• $c_n = \frac{a_n - i b_n}{2}$, $c_{-n} = \frac{a_n + i b_n}{2}$

• $a_n = c_n + c_{-n}$, $b_n = i(c_n - c_{-n})$

• Hence the symmetry properties: When f is an even function, we have $c_k = c_{-k}$ for k = 1, 2, 3, . . ., and
similarly when f is an odd function, we have $c_k = -c_{-k}$ for k = 1, 2, 3, . . ., zeroing out the sine
components (for even f) and the cosine components (for odd f) accordingly.

2.3 Note on Finite-Term Approximation


In general, if we wanted to actually store the Fourier series of a function on a computer, we would have to
truncate to a finite number of terms. Upon doing this, one might hope that a finite-term approximation of
$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$ is sufficient:
$$f(x) \stackrel{?}{\approx} f_N(x) = \sum_{n=-N}^{N} c_n e^{inx}$$
for suitably chosen N. It turns out that the approximation error is indeed very small for sufficiently “well-
behaved” functions (e.g., ones with relatively smooth behavior and no abrupt changes).
On the other extreme, for discontinuous functions an important (and unfortunate) phenomenon known
as the Gibbs phenomenon is observed, even when N is very large. The issue is that the basis functions

Figure 2: Illustration of Gibbs phenomenon for a square wave. Notice the “ringing” pattern near the discon-
tinuities.

(sines and cosines) are all continuous, so any finite combination of them will also be continuous. As a result,
when we truncate the infinite series, we get a “ringing” pattern causing large approximation errors near the
discontinuity; see Figure 2 for an example using the square wave. By comparison, further away from the
discontinuities, the approximation error is very small.
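The Gibbs phenomenon is easy to reproduce numerically. The following sketch (an illustrative example, not from the notes) uses the known sine coefficients of the square wave f(x) = sign(x), namely $b_n = 4/(\pi n)$ for odd n and 0 for even n, and shows that the overshoot near the discontinuity does not shrink as N grows.

```python
import numpy as np

def partial_sum(x, N):
    # Truncated Fourier series of the square wave sign(x) on [-pi, pi].
    total = np.zeros_like(x)
    for n in range(1, N + 1, 2):   # odd n only
        total += (4 / (np.pi * n)) * np.sin(n * x)
    return total

x = np.linspace(-np.pi, np.pi, 2001)
for N in (11, 101, 1001):
    print(N, partial_sum(x, N).max())   # overshoot stays near ~1.18, not 1
```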

2.4 Parseval’s Theorem and Energy Interpretation


The following result is known as Parseval’s theorem (or sometimes Rayleigh’s identity):
$$\|f\|^2 = \sum_{n=-\infty}^{\infty} |\hat{f}_n|^2 = \sum_{n=-\infty}^{\infty} |\langle f, e^{inx} \rangle|^2. \quad (7)$$
This roughly states that the energy (i.e., sum or integral of the squared values) of a signal is the same as the
energy of its Fourier transform.
To show this property, letting $e_n$ denote the n-th Fourier basis function, we have
$$\begin{aligned}
\|f\|^2 &= \langle f, f \rangle && \text{(definition of norm)} \\
&= \Big\langle \sum_{n=-\infty}^{\infty} \langle f, e_n \rangle e_n,\ \sum_{m=-\infty}^{\infty} \langle f, e_m \rangle e_m \Big\rangle && \text{(Fourier expansion of } f\text{)} \\
&= \sum_{n,m=-\infty}^{\infty} \langle f, e_n \rangle \overline{\langle f, e_m \rangle} \langle e_n, e_m \rangle && \text{(linearity of inner product)} \\
&= \sum_{n=-\infty}^{\infty} \langle f, e_n \rangle \overline{\langle f, e_n \rangle} && \text{(orthonormality of basis)} \\
&= \sum_{n=-\infty}^{\infty} |\langle f, e^{inx} \rangle|^2. && \text{(since } a\bar{a} = |a|^2\text{)} \quad (8)
\end{aligned}$$
Note also that in view of this property, we can interpret $|\langle f, e^{inx} \rangle|^2$ as the amount of energy associated with
the n-th frequency.

3 Fourier Transform
Thus far we have only looked at periodic functions (or alternatively, functions restricted to a finite interval,
[−π, π]). In this section, we move to general functions from R → C, using the periodic case as a building block.
Specifically, we consider a general function f : R → C having the following two properties: (i) it is piecewise
continuous on every finite interval, and (ii) it is absolutely integrable, i.e., $\int_{-\infty}^{\infty} |f(x)|\, dx < \infty$. (In fact, we
will not always require (ii) to hold, e.g., we will still talk about the Fourier transform of a sinusoid.)
The idea will be to apply the Fourier series to the function restricted to a finite range [−L, L] (for some
L > 0), and then consider what happens when L → ∞. We start with the following observations:

• For any L > 0, by generalizing the case that L = π (handled above), we can derive the Fourier series
$f(x) = \sum_{n=-\infty}^{\infty} c_{n,L} e^{i\omega_n x}$, where $\omega_n = n\pi/L$ and $c_{n,L} = \frac{1}{2L} \int_{-L}^{L} f(x) e^{-i\omega_n x}\, dx$.

• Observe that each frequency is a multiple of π/L. We thus define $\Delta_{\omega,L} = \pi/L$, which gives $\omega_n = n\Delta_{\omega,L}$,
and the Fourier coefficient can be expressed as
$$c_{n,L} = \Delta_{\omega,L} \cdot \underbrace{\frac{1}{2\pi} \int_{-L}^{L} f(x) e^{-i\omega_n x}\, dx}_{\text{Denoted by } \hat{f}_L(\omega_n)}. \quad (9)$$

Now consider the limiting case where L → ∞. Then, the integration over [−L, L] approaches integration over
the whole real line,¹ and we also have $\Delta_{\omega,L} \to 0$. This means that the summation defining f(x) approaches
an integral:
$$\begin{aligned}
f(x) &= \lim_{L \to \infty} \sum_{n=-\infty}^{\infty} c_{n,L} e^{i\omega_n x} && (10) \\
&= \lim_{L \to \infty} \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \hat{f}_L(\omega_n) e^{i\omega_n x} \Delta_{\omega,L} && (11) \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) e^{i\omega x}\, d\omega, && (12)
\end{aligned}$$
where $\hat{f}(\omega) = \int_{-\infty}^{\infty} f(x) e^{-i\omega x}\, dx$.
The above analysis leads us to define the Fourier Transform and Inverse Fourier Transform as follows:
$$\hat{f}(\omega) = \mathcal{F}(f(x)) = \int_{-\infty}^{\infty} f(x) e^{-i\omega x}\, dx \quad (13)$$
$$f(x) = \mathcal{F}^{-1}(\hat{f}(\omega)) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) e^{i\omega x}\, d\omega. \quad (14)$$

This pair of operators is the counterpart to the Fourier series for non-periodic functions. The transform F(·)
can still be viewed as a change of basis from a given “natural basis” to the “frequency basis”. Notice that now
all frequency values ω ∈ R play a role, whereas for the Fourier series it was only integer multiples of a base
frequency. Accordingly, we now get f (x) by integrating over all ω, rather than summing over integers n.
As we will see soon, there are several properties in the frequency domain that can be exploited to be able
to solve problems that would be challenging in the original domain.
¹ This is formally justified by the condition $\int_{-\infty}^{\infty} |f(x)|\, dx < \infty$, which we mentioned above.

Note on ordinary frequency vs. angular frequency. The quantity ω above is the angular frequency,
and is related to the ordinary frequency ξ via ω = 2πξ. The distinction is just a minor matter of normalization,
but it’s useful to be aware of the three main variations of the Fourier transform:

• The one above is called non-unitary with angular frequency.

• The unitary version with angular frequency puts a factor $\frac{1}{\sqrt{2\pi}}$ in the expressions for both $\hat{f}$ and $f$, rather
than putting the entire $\frac{1}{2\pi}$ in the expression for $f$. This makes the equations “more symmetric” but can
lead to a lot of “nuisance” $\frac{1}{\sqrt{2\pi}}$ factors.

• The version with ordinary frequency puts the 2π in the exponent instead, which turns out to remove the
need for any pre-factor: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-i2\pi\xi x}\, dx$ and $f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{i2\pi\xi x}\, d\xi$.

Very brief note on multi-input functions: The Fourier transform can also be defined for functions
with inputs in Rd , with R2 being particularly important in image processing. We will stick to single-input
functions in this lecture, but the main properties generally carry over naturally.

4 Examples and Properties


Here we overview some examples and main properties of the Fourier transform – see the tables at https://en.wikipedia.org/wiki/Fourier_transform#Tables_of_important_Fourier_transforms for a useful
summary and a more detailed list.
Before proceeding, we need to introduce a “strange but useful” notion called the Dirac delta function δ(x),
which satisfies the following equality:
$$\delta(x) = \begin{cases} \infty & x = 0 \\ 0 & x \neq 0. \end{cases}$$
To make this more precise, we impose the following “definition” or “convention”:
$$\int_{-\infty}^{\infty} \delta(x)\, dx = 1. \quad (15)$$

We can think of it in the following way: Imagine a rectangle-shaped function whose value is $\frac{1}{w}$ in the interval
[−w/2, w/2] for some w > 0, and 0 elsewhere. This is a perfectly standard function that we could work with
in the usual way. The Dirac delta function captures how such a function behaves in the limit as w → 0 – thus,
it is an infinitely narrow and infinitely high “spike”, but it still integrates to one.
For any function f(x), it follows from (15) that
$$\int_{-\infty}^{\infty} \delta(x - a) f(x)\, dx = f(a). \quad (16)$$

The Dirac delta function is useful when studying Fourier transforms (and also for Linear Time Invariant
systems, which we won’t cover in detail), as two of the examples below demonstrate.

4.1 Examples
Some sketches of common functions and their Fourier transform pairs are given in Fig. 3. We give the details
of one example (the rectangular function); the interested student may want to try some of the others (using
Eq. (13) and Eq. (14)) and/or consult the Wikipedia page for a longer list of examples.

Figure 3: Examples of common functions and their Fourier transform pairs.

Consider the rectangular function
$$f(x) = \begin{cases} 1 & |x| \leq \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$
The Fourier transform is given by
$$\begin{aligned}
\hat{f}(\omega) &= \int_{-1/2}^{1/2} e^{-i\omega x}\, dx \\
&= \Big[ \frac{1}{-i\omega} e^{-i\omega x} \Big]_{-1/2}^{1/2} \\
&= \frac{e^{-i\omega/2} - e^{i\omega/2}}{-i\omega} \\
&= \frac{\sin(\omega/2)}{\omega/2},
\end{aligned}$$
where the last line uses $\sin x = \frac{e^{ix} - e^{-ix}}{2i}$ (from Euler’s identity). Then, introducing the notation $\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$
(the sinc function), we get
$$\hat{f}(\omega) = \mathrm{sinc}\Big( \frac{\omega}{2\pi} \Big).$$
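As a quick numerical check of this example (a sketch, not from the notes), we can evaluate the defining integral (13) by quadrature on a grid of ω values and compare against $\mathrm{sinc}(\omega/2\pi)$; note that numpy's sinc is the normalized one, $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, matching the convention above.

```python
import numpy as np
from scipy.integrate import quad

def ft_rect(omega):
    # Fourier transform of the rectangle on [-1/2, 1/2], split into
    # real and imaginary parts since quad handles real integrands.
    re, _ = quad(lambda x: np.cos(omega * x), -0.5, 0.5)
    im, _ = quad(lambda x: -np.sin(omega * x), -0.5, 0.5)
    return re + 1j * im

for omega in (0.1, 1.0, 5.0, 20.0):
    exact = np.sinc(omega / (2 * np.pi))
    print(omega, ft_rect(omega).real, exact)   # imaginary part is ~0
```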

4.2 Linearity
It is straightforward to see that the Fourier transform is a linear operator, i.e., $\mathcal{F}[af(x) + bg(x)] = a\mathcal{F}[f(x)] + b\mathcal{F}[g(x)]$, since integration is itself linear. The same applies for the inverse Fourier transform.

4.3 Shifting Properties


The following properties are fairly straightforward to prove using the definition of the (inverse) Fourier trans-
form and standard integration techniques like change of variable and/or $e^{a+b} = e^a e^b$: If f(x) has Fourier
transform $\hat{f}(\omega)$, then

• For any constant a, the function f(ax) has Fourier transform $\frac{1}{|a|} \hat{f}(\omega/a)$. This means that when a
function gets narrower in the original domain (i.e., a > 1) it gets broader in the frequency domain, and
vice versa. This is intuitive, because making a function narrower (while otherwise maintaining the same
shape) means that any increases/decreases of the function get sharper/steeper, and faster changes correspond
to higher frequencies.

• For any constant c, the function f(x − c) has Fourier transform $\hat{f}(\omega) e^{-ic\omega}$. Similarly, the function
$f(x) e^{icx}$ has Fourier transform $\hat{f}(\omega - c)$. That is, a shift in one domain translates to multiplication by
a complex exponential in the other.

4.4 Convolution Properties


We can also consider the convolution operation between two functions f and g:
$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y) g(y)\, dy. \quad (17)$$

Intuitively, we can view convolution as a kind of smoothing or averaging operation. It is a very important
concept in both theory and practice; for example:

• We will briefly discuss how it is used in the context of understanding linear time-invariant (LTI) systems
in Section 6.2.

• The Fourier transform applied to a probability density function is known as the characteristic function.
In this context, the relevance of convolution is that if X, Y are independent with density functions fX , fY ,
then the random variable Z = X + Y has density function fZ = fX ∗ fY .

A very useful property of the Fourier transform is that convolution in the original domain corresponds to
multiplication in the Fourier domain:
$$\mathcal{F}(f * g) = \mathcal{F}(f)\, \mathcal{F}(g). \quad (18)$$
That is, the Fourier transform of (f ∗ g)(x) is $\hat{f}(\omega) \hat{g}(\omega)$.


We can see this by considering the inverse Fourier transform of $\hat{f}\hat{g}$:
$$\begin{aligned}
\mathcal{F}^{-1}[\hat{f}\hat{g}](x) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) \hat{g}(\omega) e^{i\omega x}\, d\omega && \text{(by definition of } \mathcal{F}^{-1}\text{)} \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) \Big( \int_{-\infty}^{\infty} g(y) e^{-i\omega y}\, dy \Big) e^{i\omega x}\, d\omega && \text{(by definition of } \hat{g}\text{)} \\
&= \int_{-\infty}^{\infty} g(y) \underbrace{\frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) e^{i\omega(x-y)}\, d\omega}_{f(x-y)}\, dy && \text{(by swapping the order of integration)} \quad (19) \\
&= \int_{-\infty}^{\infty} g(y) f(x - y)\, dy && \text{(by the formula for the inverse FT)} \\
&= (f * g)(x). && (20)
\end{aligned}$$
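The discrete counterpart of this property (made precise in Section 5) is easy to verify numerically. The following sketch zero-pads two sequences so that their linear convolution coincides with a circular one, and checks that the DFT of the convolution equals the product of the DFTs.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
g = rng.standard_normal(64)

direct = np.convolve(f, g)                 # linear convolution, length 127
L = len(f) + len(g) - 1                    # pad length for the FFTs
via_fft = np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(g, L)).real

print(np.allclose(direct, via_fft))        # True
```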

4.5 Fourier Transform vs. its Inverse


Comparing the formulas for the Fourier transform and its inverse in (13) and (14), we see that the two are
nearly identical – the only differences are (i) the $\frac{1}{2\pi}$ factor (this difference even disappears in the ordinary-frequency
version of the Fourier transform discussed after (14)), and (ii) the use of −i vs. i in the exponent. In this sense, the
Fourier transform and inverse Fourier transform are “essentially” doing the same thing.
For this reason, when we see a property in the original domain implying some other property in the
frequency domain, the role of the domains can easily be reversed. For example:

• Convolution in the original domain translates to multiplication in frequency domain;

• Conversely, multiplication in the original domain translates to convolution in the frequency domain.

(Care may just be needed in getting certain signs and scaling factors correct.)

4.6 Derivative Properties


Here we consider how the Fourier transform of a differentiable function f(x) relates to the Fourier transform
of its derivative f′(x). This turns out to be very useful for solving differential equations; see Section 6.3 for
an example.
Technically we need certain smoothness/continuity and absolute integrability conditions, but we will not
go into the details of those, and instead simply state that we can get the following using integration by parts:
$$\begin{aligned}
\mathcal{F}(f'(x)) &= \int_{-\infty}^{\infty} \underbrace{f'(x)}_{dv}\, \underbrace{e^{-i\omega x}}_{u}\, dx \\
&= \underbrace{\big[ f(x) e^{-i\omega x} \big]_{-\infty}^{\infty}}_{0} - \int_{-\infty}^{\infty} f(x) (-i\omega e^{-i\omega x})\, dx \\
&= i\omega \underbrace{\int_{-\infty}^{\infty} f(x) e^{-i\omega x}\, dx}_{\mathcal{F}(f(x))} \\
&= i\omega\, \mathcal{F}(f(x)). \quad (21)
\end{aligned}$$

We can similarly take the n-th order derivative $f^{(n)}$, and obtain $\mathcal{F}(f^{(n)}) = (i\omega)^n \mathcal{F}(f(x))$.
Hence, the derivative of a function, when considered after a Fourier transform to the frequency domain,
simply becomes multiplication (scaling) by iω in the frequency domain; in particular, the magnitude at
frequency ω gets scaled by ω (analogous to how $\frac{d}{dx} \sin(\omega x) = \omega \cos(\omega x)$).
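In discrete form, this property yields “spectral differentiation”: transform, multiply by iω, and invert. The following is a minimal sketch for a smooth periodic signal (the grid and signal are illustrative choices).

```python
import numpy as np

N = 256
t = np.linspace(0, 2 * np.pi, N, endpoint=False)
x = np.sin(3 * t)

# Angular frequency of each DFT bin (integers ..., -2, -1, 0, 1, 2, ...
# here, since the period is 2*pi).
omega = 2 * np.pi * np.fft.fftfreq(N, d=2 * np.pi / N)
dx = np.fft.ifft(1j * omega * np.fft.fft(x)).real

print(np.max(np.abs(dx - 3 * np.cos(3 * t))))   # ~1e-13
```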

4.7 Parseval’s Theorem


Parseval’s theorem applies to the Fourier transform as well, and is stated as follows:
$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\hat{f}(\omega)|^2\, d\omega.$$
In other words, the Fourier transform preserves the (squared) $L^2$ norm, up to a constant related to the
normalization of the basis (in this case $\frac{1}{2\pi}$). Again, the intuition is that the total energy of the time signal
equals the total energy of the frequency signal.

5 Discrete Time
Thus far, we have considered functions f (x) with a real-valued input x ∈ R. Naturally, when we store a signal
on a computer, we need to use finitely many bits, and a natural approach is to only consider a finite number
of x values. (Each f (x) will also need to be “quantized” to finitely many bits, but usually that’s less of an
issue because 64-bit numbers tend to be “accurate enough”.) In addition, some signals/functions are simply
discrete in nature to begin with, and we would still like to have a notion of Fourier transform for them.

For concreteness, we will think of the function input as representing “time”, even though in applications
it could represent other dimensions such as spatial position. In addition, we will switch notation (with apologies if it
is confusing) to choices that are more common when working in discrete time:

• The input x will be renamed to t, representing “time”;

• Instead of a function f (·) with Fourier transform fˆ(·), we will consider a function x(·) with Fourier
transform X(·).

• We will shortly switch from continuous time (t ∈ R) to discrete time (n ∈ Z), and in discrete time we
will use square brackets to highlight this distinction, e.g., x[n].

• Instead of the angular frequency ω, we will consider the ordinary frequency ξ (they are related by
ω = 2πξ). As we mentioned previously, using ordinary frequency, the continuous-time Fourier transform
of a signal x(t) is
$$X(\xi) = \int_{-\infty}^{\infty} x(t) e^{-i2\pi\xi t}\, dt. \quad (22)$$

5.1 Sampling of Continuous Signals


It is useful to first think about the following: If we can measure some real-valued signal x(t) by querying
various t values, how should we do this in a manner that lets us store it on a computer but still retain the
relevant information? By either stopping the measurements at some point or working in blocks of a given
length, we can assume that t is bounded, say t ∈ [0, T ].
The natural strategy is to query x(·) at regular intervals, say $\{0, T_s, 2T_s, 3T_s, \dots, T\}$ for some sampling
interval $T_s$ (which we’ll assume to be such that $\frac{T}{T_s}$ is an integer):
$$x[n] = x(t)\big|_{t = nT_s}. \quad (23)$$

This gives us a discrete-time signal of length $\frac{T}{T_s} + 1$. A key question is then what choice of $T_s$ to use. At first
glance, it seems that any $T_s > 0$ leads to some loss of information, with the loss vanishing only in the limit
as $T_s \to 0$. Remarkably, as long as x(t) does not contain arbitrarily high frequencies, we can in fact have zero
loss for a carefully chosen $T_s$. It is more natural to state the result in terms of $\xi_s = \frac{1}{T_s}$, which we call the
sampling frequency.
Shannon–Nyquist Sampling Theorem: Let x(t) be a continuous-time signal, let X(ξ) be its Fourier
transform, and let B be the highest frequency for which X(ξ) ≠ 0 (i.e., X(ξ) = 0 for all ξ > B; the letter
B stands for “bandwidth”). Let x[n] be the discrete-time signal formed via (23). Then, x(t) can be perfectly
reconstructed from x[n] provided that the sampling frequency $\xi_s$ satisfies $\xi_s > 2B$.
This is a very famous result in signal processing, but we will skip its proof. In short, moving to discrete
time loses nothing (at least mathematically) if we sample above twice the highest frequency of the signal.
(**Optional**) Aliasing. It is also useful to understand what happens if $\xi_s < 2B$ (which is unavoidable
if B = ∞; not all signals are bandlimited). To answer this, it is useful to think of all t ∈ R rather than only
t ∈ [0, T] (e.g., by taking x(t) = 0 outside [0, T]). Then x[n] from (23) is defined for all integers. If we use the
formula for the Fourier transform on x[n], but replace the integral by a sum since x[n] is discrete-time, we get
$$X_s(\xi) = \sum_{n=-\infty}^{\infty} x[n] e^{-i2\pi\xi n T_s}. \quad (24)$$

Alternatively, this can be interpreted as the Fourier transform of $x_s(t)$, defined to have a Dirac delta function
of height x[n] whenever $t = nT_s$ for some integer n. Using the linearity of the Fourier transform and the fact
that the Dirac delta’s Fourier transform is a complex exponential, this interpretation can be shown to lead
to the following (details omitted):
$$X_s(\xi) = \sum_{n=-\infty}^{\infty} X(\xi - n/T_s) = \sum_{n=-\infty}^{\infty} X(\xi - n\xi_s),$$
meaning that the spectrum of $X_s(\cdot)$ is a sum of all shifted copies of X(·), where the shifts are by $\pm\xi_s$, $\pm 2\xi_s$,
and so on. If $\xi_s < 2B$, these shifted copies start to overlap, a phenomenon known as aliasing:

Figure 4: Illustration of aliasing in the frequency domain. (Frequencies here are denoted by f instead of ξ.)

The issue is that high-frequency signals look the same as low-frequency ones when we sample too slowly; an
example is shown as follows (this image and the one above are from Wikipedia):

Figure 5: Illustration of aliasing in the time domain.

The effect of this is that any frequency $\xi \geq \frac{\xi_s}{2}$ gets “folded down” and looks the same as a lower frequency
(e.g., a signal of frequency exactly $\xi_s$ is indistinguishable from a signal of frequency 0, i.e., a constant signal).
Having said this, if the frequency content of X(ξ) is small enough for $\xi \geq \frac{\xi_s}{2}$, the impact of aliasing may be
negligible.
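Aliasing can be seen directly in a few lines of code. Below, a 9 Hz sine sampled at 10 Hz (well below the Nyquist rate of 18 Hz) yields exactly the same samples as a −1 Hz sine, since 9 − 10 = −1; the specific frequencies are illustrative choices.

```python
import numpy as np

fs = 10.0                  # sampling frequency (Hz), too low for a 9 Hz tone
n = np.arange(50)
t = n / fs

high = np.sin(2 * np.pi * 9 * t)
low = np.sin(2 * np.pi * (-1) * t)   # 9 Hz "folds down" to 9 - 10 = -1 Hz

print(np.allclose(high, low))        # True: the two are indistinguishable
```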

5.2 Discrete Fourier Transform (DFT)


We are now ready to define the Discrete Fourier Transform (DFT). One way to view this is to start with $X_s(\xi)$
from (24) and simply evaluate it at a finite number (denoted by N) of frequencies:
$$X[k] = \sum_{n=0}^{N-1} x[n] e^{-i\frac{2\pi}{N}kn}, \qquad k = 0, 1, 2, \dots, N-1. \quad (25)$$

This is another kind of sampling, but now it is sampling in the frequency domain (with intervals of length $\frac{\xi_s}{N}$)
rather than in the time domain. With this definition, it can be shown that the following inverse DFT recovers
the discrete-time signal:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{i\frac{2\pi kn}{N}}, \qquad n = 0, \dots, N-1. \quad (26)$$

Observe that both x[n] and X[k] are now described by just N numbers, making them amenable to storage
and processing on a computer. The following should be noted:

• Any information about x(t) outside $t \in [0, (N-1)T_s]$ is lost (though we could of course form a separate
DFT corresponding to the later segments of the signal). Choosing N to make this consistent with
t ∈ [0, T], we get $N = \frac{T}{T_s} + 1$. (The +1 term may be avoided by instead sampling at the midpoints of
length-$T_s$ regions, but we won’t worry about this.)

• Aliasing may also incur a loss of information, as discussed above.

As we hinted earlier, the DFT is also useful for signals that are simply discrete-valued to begin with (rather
than sampled versions of continuous ones). In such scenarios, the length N is simply the length of the signal
we are given (or perhaps the length of sub-blocks of the signal that we opt to work with).
The DFT has analogs of most of the main properties of the continuous-time Fourier transform (e.g.,
convolution property, Parseval’s theorem, etc.). See https://en.wikipedia.org/wiki/Discrete_Fourier_
transform#Properties for a summary.
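Equation (25) can be transcribed directly into code. The following sketch builds the N × N DFT matrix and checks the result against numpy's FFT; it runs in $O(N^2)$ time and is intended only for illustration.

```python
import numpy as np

def naive_dft(x):
    # Direct O(N^2) implementation of Eq. (25) as a matrix-vector product.
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # the N x N "DFT matrix"
    return W @ x

x = np.random.default_rng(1).standard_normal(128)
print(np.allclose(naive_dft(x), np.fft.fft(x)))   # True
```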

5.3 Fast Fourier Transform (FFT)


The DFT consists of computing N coefficients, each defined as a sum over N terms, so naively the computation
time is $O(N^2)$. (Or treating x[n] as a vector, we can view the DFT as multiplication by an N × N “DFT
matrix”.) It turns out that the naive approach is actually doing a lot of repeated/redundant computation.
The Fast Fourier Transform (FFT) is a (very) famous algorithm that avoids this, and brings the computation
time down to O(N log N ). There are several versions of the FFT, and the one that we present here is due to
Cooley and Tukey.
The general procedure of Cooley and Tukey’s algorithm is to re-express the discrete Fourier transform
(DFT) recursively (divide-and-conquer), to reduce the computation time. We first rewrite the DFT to be

N
X −1
X[k] = x[n]WNkn , (27)
k=0

where
2πkn
WNkn = e−i N . (28)
(k+N )n
It is useful to note that WNkn has a periodicity property: WN = WNkn .
To develop the FFT, we first split the single summation over N samples into 2 summations, each with $\frac{N}{2}$
samples, one for n even and the other for n odd. Substituting $m = \frac{n}{2}$ for n even and $m = \frac{n-1}{2}$ for n odd, we
have
$$X[k] = \sum_{m=0}^{\frac{N}{2}-1} x[2m] W_N^{2mk} + \sum_{m=0}^{\frac{N}{2}-1} x[2m+1] W_N^{(2m+1)k}. \quad (29)$$

To simplify this, observe that
$$W_N^{2mk} = e^{-i\frac{2\pi(2mk)}{N}} = e^{-i\frac{2\pi mk}{N/2}} = W_{N/2}^{mk}. \quad (30)$$

Therefore,
$$X[k] = \sum_{m=0}^{\frac{N}{2}-1} x[2m] W_{N/2}^{mk} + W_N^k \sum_{m=0}^{\frac{N}{2}-1} x[2m+1] W_{N/2}^{mk}. \quad (31)$$
Defining G[k] to be the length-N/2 DFT on the even indices, and H[k] the one on the odd indices, it follows that
$$X[k] = G[k] + W_N^k H[k]. \quad (32)$$

Thus the N-point DFT X[k] can be obtained from two $\frac{N}{2}$-point transforms! Although the frequency index k
ranges over N values, only $\frac{N}{2}$ values of G[k] and H[k] actually need to be computed, since G[k] and H[k] are
periodic with period $\frac{N}{2}$. An illustration is given in Figure 6.

Figure 6: Calculating the N -point DFT from the N/2-point DFT. (Image from https://lutpub.lut.fi/
handle/10024/164024)

Supposing that N is a power of 2, we can repeat the above process on the two $\frac{N}{2}$-point transforms, breaking
them down to $\frac{N}{4}$-point transforms, and so on, until we come down to 2-point transforms. At a recursion depth
of $\log_2 N$ we are down to a length of 1, which is trivial. The O(N log N) scaling then simply comes from
performing O(N) computation at each depth.
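The recursion (32) translates almost line-for-line into code. The following is a teaching sketch of the radix-2 Cooley–Tukey FFT (assuming N is a power of 2, with no attempt at efficiency), using the identity $W_N^{k+N/2} = -W_N^k$ to fill in the second half of the output.

```python
import numpy as np

def fft(x):
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    G = fft(x[0::2])                        # N/2-point DFT of even samples
    H = fft(x[1::2])                        # N/2-point DFT of odd samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors W_N^k
    # X[k] = G[k] + W_N^k H[k]; since W_N^{k+N/2} = -W_N^k and G, H are
    # periodic with period N/2, the second half is G[k] - W_N^k H[k].
    return np.concatenate([G + W * H, G - W * H])

x = np.random.default_rng(2).standard_normal(64)
print(np.allclose(fft(x), np.fft.fft(x)))   # True
```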

6 (**Optional**) Applications
The Fourier transform is extensively used throughout signal processing, communications, machine learning,
theoretical computer science, statistics, and more. We give just a few examples of applications here, without
going into much detail.

6.1 Signal Denoising


A fundamental task in signal processing is denoising – removing as much noise as possible while keeping as
much of the underlying signal as possible. The Fourier transform provides a very natural view for this task –

Figure 7: Signal (red) contaminated with Gaussian white noise (blue shows the noisy signal).

Figure 8: Frequency spectrum of the noisy signal after Fourier transform.

if we know that the signal lies (mostly) in a certain frequency band, then we can filter out other frequencies
and thus attenuate or remove any noise that occurs there. (Of course, any noise at the same frequencies as
the signal will remain, but that tends to be a relatively small amount.)
Figure 7 gives an example of a signal corrupted by noise. The noise makes the signal highly corrupted, with
very erratic oscillations. But since the noise fluctuates so much while the signal only exhibits mild oscillations,
this suggests that the signal lies at low frequencies whereas the noise contains high frequencies that we should
try to remove. The Fourier transform lets us see this much more precisely – see Figure 8. With this picture
in mind, a natural denoising approach is to only keep frequencies whose value exceeds some threshold (e.g.,
the red line in the figure). Other methods are also possible, like keeping all low frequencies. There are still
various design issues like what threshold or frequency range to use, or what prior knowledge (if any) we have
about the signal, etc., but overall this is a very natural and effective approach to denoising.
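A minimal sketch of this thresholding approach is given below; the signal, noise level, and threshold are all illustrative assumptions rather than values from the figures.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)

X = np.fft.fft(noisy)
X[np.abs(X) < 100] = 0           # keep only coefficients above a threshold
denoised = np.fft.ifft(X).real

# Mean squared error drops substantially after thresholding.
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```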

6.2 Linear Time Invariant Systems


In signal processing systems, we are interested in operations that input one signal x(t) (or x[n] in discrete
time) and output another signal y(t). (In fact, denoising as described above would be an example of this!)
An important class of these is linear time-invariant (LTI) systems, which satisfy the linearity property (an
input of ax1 (t) + bx2 (t) gives output ay1 (t) + by2 (t)) and the time invariance property (an input of x(t − τ )
gives output y(t − τ )).
LTI systems are fully described by their impulse response, which is the output when x(t) = δ(t) (the Dirac
delta function) – essentially, giving the system a “flick” and seeing how it responds. Once the impulse response
is known, the output associated with an arbitrary input x(t) is the convolution of x and the impulse response.
(You may wish to try to prove this using the LTI properties and (15).)

By moving to the Fourier view, the convolution gets replaced by multiplication, which can be much easier
to understand, interpret, and design based on.

6.3 Solving Differential Equations


The Fourier transform is a useful tool for solving differential equations efficiently, e.g., in the study of dynamical
systems and control systems. We demonstrate one example below.
In this example, we consider a damped harmonic oscillator shown in Fig 9, which is a particle of mass m
subject to a spring force and a damping force.

Figure 9: Illustration of a Damped Harmonic Oscillator

The motion of the particle can be derived using Newton’s second law to obtain the ordinary differential
equation (ODE):
$$\frac{d^2 x}{dt^2} + 2\gamma \frac{dx}{dt} + \omega_0^2 x(t) = 0. \quad (33)$$
Moreover, if an additional driving force f(t) is introduced that operates on the particle, we can further
generalize this to
$$\frac{d^2 x}{dt^2} + 2\gamma \frac{dx}{dt} + \omega_0^2 x(t) = \frac{f(t)}{m}. \quad (34)$$
To solve for x(t), we first take the Fourier transform of both sides of the above equation. Using the derivative
property (21), the result is
$$-\omega^2 \hat{x}(\omega) + 2i\gamma\omega \hat{x}(\omega) + \omega_0^2 \hat{x}(\omega) = \frac{\hat{f}(\omega)}{m}, \quad (35)$$
where $\hat{x}(\omega)$ and $\hat{f}(\omega)$ are the Fourier transforms of x(t) and f(t) respectively.
Thanks to moving to the Fourier domain, the inconvenient derivative operations are replaced by simpler
and easier-to-analyze operations. Specifically, we have obtained an algebraic equation that can easily be solved:
$$\hat{x}(\omega) = \frac{\hat{f}(\omega)/m}{\omega_0^2 - \omega^2 + 2i\gamma\omega}. \quad (36)$$

Knowing $\hat{x}(\omega)$, we can use the inverse Fourier transform to obtain x(t) (which is not an especially “simple”
or “nice” expression, but it does give the solution we are after, and it can readily be computed numerically):
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\hat{f}(\omega)/m}{\omega_0^2 - \omega^2 + 2i\gamma\omega}\, e^{i\omega t}\, d\omega, \qquad \text{where } \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\, dt. \quad (37)$$
This gives the behavior of the particle under the effect of the external force. In control system design, we
can further design f(t) with ease to realize the desired system behavior.

To summarize, the solution procedure for the driven harmonic oscillator equation consists of (i) taking the
Fourier transform of f(t) to obtain $\hat{f}(\omega)$, (ii) using the above equation to find $\hat{x}(\omega)$ algebraically, and (iii)
performing an inverse Fourier transform to obtain x(t). We mainly leverage the derivative property of the
Fourier transform to solve ODEs in real applications.
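The same 3-step recipe can be carried out numerically with the DFT standing in for the Fourier transform. The following sketch (all parameter values are illustrative assumptions) drives the oscillator with a smooth localized force and checks the result by plugging it back into the ODE; since the FFT implicitly assumes periodicity, this works best when the response has decayed within the time window.

```python
import numpy as np

m, gamma, omega0 = 1.0, 0.2, 2.0
N, dt = 4096, 0.01
t = np.arange(N) * dt
f = np.exp(-((t - 20) ** 2))     # a smooth, localized driving force

omega = 2 * np.pi * np.fft.fftfreq(N, d=dt)   # angular frequency grid
F = np.fft.fft(f)
X = (F / m) / (omega0**2 - omega**2 + 2j * gamma * omega)   # Eq. (36)
x = np.fft.ifft(X).real          # approximate solution x(t)

# Residual check: plug x back into the ODE via finite differences.
dx = np.gradient(x, dt)
d2x = np.gradient(dx, dt)
resid = d2x + 2 * gamma * dx + omega0**2 * x - f / m
print(np.max(np.abs(resid[2:-2])))   # small (limited by finite-difference error)
```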

6.4 Fast Matrix Multiplication


Many problems (including some of those above) can be understood via the following picture:

Figure 10: Problem-solving using Fourier transform

Notably, taking the 3-step approach (transform to Fourier, solve, and then inverse Fourier transform) can be much
easier than the 1-step approach. Here we give an example in the context of matrix multiplication, in which case it’s the
discrete Fourier transform that comes into play.
For an n × n matrix A and an n × 1 vector x, the computation of Ax takes $O(n^2)$ time, and in general
this can’t be improved (since just reading A takes $O(n^2)$ time). But it can be improved for certain classes of
matrices. One such class is circulant matrices, which take the form
$$\begin{pmatrix}
c_0 & c_1 & \cdots & c_{n-2} & c_{n-1} \\
c_{n-1} & c_0 & c_1 & \cdots & c_{n-2} \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
c_1 & c_2 & \cdots & c_{n-1} & c_0
\end{pmatrix}$$

That is, the rows are cyclic shifts of each other, and we always have ‘diagonals’ taking a common value. If we
let F be the n × n matrix that performs the DFT operation, and collect the $c_i$ values into a vector c, then
it can be shown that $A = F^{-1} \mathrm{diag}(Fc)\, F$.² Thus, we can compute Ax in O(n log n) time by 3 invocations of
the FFT algorithm (two for F and one for $F^{-1}$), and using the fact that multiplication by a diagonal matrix
only takes O(n) time.
A circulant matrix is in fact performing a kind of convolution operation (with “wrap around” from index
n+1 back to 1), so this example is essentially a different way of stating that convolution becomes multiplication
after applying the (discrete) Fourier transform.
² In more technical terms, this means that the DFT matrix diagonalizes A.
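A sketch of the resulting fast multiply is given below. Note that scipy's circulant builds the matrix with c as its first column; with that convention, Ax is exactly the circular convolution ifft(fft(c) · fft(x)).

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(4)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)

A = circulant(c)                   # circulant matrix with c as first column
# O(n log n) multiply: three FFT-sized operations plus a pointwise product.
fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

print(np.allclose(A @ x, fast))    # True
```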

6.5 Other
Other applications of the Fourier transform include (but are not limited to) the following:

• In the field of communication systems, FT facilitates the modulation and demodulation of signals,
thereby playing a crucial role in signal transmission and reception.

• It also finds significant usage in image and audio processing, aiding in the analysis, manipulation, and
compression of image and sound signals.

• In machine learning, FT is used for various tasks such as feature encoding, fast calculation, modeling
smoothness assumptions in optimization, and so on.

• As hinted at the start of the lecture, a more abstract use is the analysis of Boolean functions (e.g., see
https://www.youtube.com/watch?v=lXJP-UkTl-4), and there are many others.

Appendix
A (**Optional**) More Detailed Background on Bases
Here we review in more detail some mathematical foundations and core concepts associated with vector spaces
and the notion of basis. We assume that the reader is familiar with some linear algebra and concepts such as
vector spaces, linear independence, and span.

• Basis. A set of vectors $\{v_i\}$ is a basis for a vector space V if the $\{v_i\}$ are linearly independent and
V = span$\{v_i\}$. For continuous vector spaces (e.g., $\mathbb{R}^d$), there are infinitely many possible bases.

• Inner product. The inner product can be thought of as a generalization of the dot product of Euclidean
space. Given V that is a complex linear space and vectors u, v ∈ V, the inner product ⟨u, v⟩ ∈ C satisfies
the following properties:

– ⟨v, v⟩ ≥ 0
– ⟨v, v⟩ = 0 iff v = 0
– ∀u, v, w ∈ V and a, b ∈ C, ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩
– $\langle u, v \rangle = \overline{\langle v, u \rangle}$
• Euclidean space. For $V = \mathbb{C}^n$, the inner product is $\langle u, v \rangle = \sum_{i=1}^{n} u_i \overline{v_i}$ (where $\bar{v}$ denotes the complex
conjugate of v, i.e., $\overline{a + bi} = a - bi$). For $V = \mathbb{R}^n$, this simplifies to $\langle u, v \rangle = \sum_{i=1}^{n} u_i v_i$.

• Norm. The norm associated with the inner product is $\|v\| = \sqrt{\langle v, v \rangle}$. For Euclidean space, this gives
us the usual/familiar notion of the length of a vector.

For studying the Fourier transform, rather than just finite vectors, we need to consider the space of
continuous functions. We can similarly define an inner product for such a space. Let V = S[a, b] be the space
of continuous functions f : [a, b] → C. With the usual definition of sum of functions and multiplication by a
scalar, V is a linear space, and we can similarly define an inner product
$$\langle f, g \rangle = \int_a^b f(x) \overline{g(x)}\, dx. \quad (38)$$

Notice how intuitively the continuous functions over the interval are similar to a limiting case where we
have an “infinite-length” vector and the discrete summation for the inner product replaced by an integral of
the product of the two functions.
We can now introduce orthogonality and orthonormal systems.

• Orthogonality. u and v are orthogonal if ⟨u, v⟩ = 0.

• Orthogonal system. An orthogonal system is a sequence of non-zero vectors $\{u_i\}$ such that $u_i$ and
$u_j$ are orthogonal for all $i \neq j$. Note that $\{u_i\}$ can consist of infinitely many elements – we can call this an
infinite orthogonal system.

• Orthonormal system. An orthonormal system is an orthogonal system that additionally satisfies
$\|u_i\| = 1$ for all $u_i$.

Given an orthonormal system $\{e_i\}$, if $v = \sum_{i=1}^{n} a_i e_i$ then $a_i = \langle v, e_i \rangle$. The proof is straightforward,
following just from the linearity of the inner product and the orthonormality of the system $\{e_i\}$. When v is in the
span of $\{e_i\}$, we have $v = \sum_{i=1}^{n} a_i e_i = \sum_{i=1}^{n} \langle v, e_i \rangle e_i$. This may not always be the case, in which case we can
define a useful concept:

• Orthogonal projection. The orthogonal projection of v onto the span of the orthonormal system $\{e_i\}$
can be defined as $\tilde{v} = \sum_{i=1}^{n} \langle v, e_i \rangle e_i$.

With these, we can then define a closed orthonormal system. Given an infinite orthonormal system $\{e_i\}$
and an inner product space V, we say the system is closed in V if for all v ∈ V,
$$\lim_{n \to \infty} \Big\| v - \sum_{i=1}^{n} \langle v, e_i \rangle e_i \Big\| = 0. \quad (39)$$

With a closed infinite orthonormal system, we can then represent any vector v in terms of the system basis
{ei }. Intuitively, this means that the set {ei } is rich enough to be able to capture all relevant features and
express any vector v. In practice, we may only use a finite number of ei to represent the function, in which case
it becomes an approximation rather than being exact. The approximation generally improves as the number
of terms taken increases, though in some scenarios the improvement may be slow.

B (**Optional**) Convergence Properties for the Fourier Series


We have not shown that the orthogonal systems in Eq. (1) and Eq. (4) are closed, nor that the Fourier series can
“reasonably represent” any piecewise continuous periodic function. Details regarding the pointwise, uniform,
and norm convergence of the Fourier series are beyond our scope, but we briefly and informally mention some
key results here:

• Pointwise convergence. Let f be a periodic, piecewise smooth function on R, and let F S[f ] be the
Fourier series (either using sines/cosines or complex exponentials) for f . Then:

– F S[f ] converges at every point where f is continuous, i.e., $f(x) = FS[f]\big|_x$ at every x where f
is continuous.
– If a is a point where the function is discontinuous, then as the number of Fourier terms in the
partial sum N → ∞, we get $FS_N[f] \to \frac{1}{2}\big( \lim_{x \to a^-} f(x) + \lim_{x \to a^+} f(x) \big)$.

• Uniform convergence. Let f be a continuous, piecewise smooth, periodic function with period p.
Then:

– Its complex exponential Fourier series F S[f ] converges uniformly to f. Furthermore, for any real
value x and any pair of integers M and N with M < 0 < N, we have
$$\Big| f(x) - \sum_{k=M}^{N} c_k e^{i2\pi\omega_k x} \Big| \leq \Big( \frac{1}{\sqrt{|M|}} + \frac{1}{\sqrt{N}} \Big) B,$$
where $B = \frac{1}{2\pi} \big( p \int_{-\pi}^{\pi} |f'(x)|^2\, dx \big)^{\frac{1}{2}}$. Thus, the approximation error becomes low as |M| and N
increase, as long as the derivative is not too large.

– The proof is based on relating coefficients of the Fourier series of f to those of its derivative f ′ , in
the same way as what we did for the Fourier transform in Section 4.6.
– We can also get a better bound for smoother functions, i.e., when we know that a function is
m-times differentiable and $f^{(m)}$ is piecewise continuous. Then the bound becomes
$$\Big| f(x) - \sum_{k=M}^{N} c_k e^{i2\pi\omega_k x} \Big| \leq \Big( \frac{1}{\sqrt{|M|^{2m-1}}} + \frac{1}{\sqrt{N^{2m-1}}} \Big) \tilde{B},$$
where $\tilde{B}$ is a constant proportional to the magnitude of $f^{(m)}$.
