
University College London

Autumn Term 2012

Mathematical Methods 3

Jonny Evans

Last updated January 15, 2013


Contents

I Critical points in finite- and infinite-dimensional problems

1 Calculus with several variables
   1.1 First derivatives
       1.1.1 Partial derivatives
       1.1.2 Directional derivative
       1.1.3 Linear approximation
       1.1.4 The gradient
       1.1.5 The matrix of partial derivatives
       1.1.6 Critical points
   1.2 Second derivatives
       1.2.1 Definition
       1.2.2 The Hessian and Taylor's theorem
       1.2.3 The second derivative test
       1.2.4 Higher dimensions
   1.3 Constrained optimisation
       1.3.1 The geometric idea...
       1.3.2 ...and in practice

2 Calculus of variations
   2.1 Straight lines are shortest paths
       2.1.1 Paths and length
       2.1.2 Straight lines minimise action
       2.1.3 Straight lines minimise length
   2.2 The Euler-Lagrange equations
       2.2.1 Key steps reviewed
       2.2.2 The fundamental theorem of the calculus of variations
       2.2.3 The Euler-Lagrange equation
       2.2.4 Beltrami identity
       2.2.5 Vector-valued functions
       2.2.6 A caveat
   2.3 Examples
       2.3.1 The catenoid
       2.3.2 Area of a frustum
       2.3.3 The brachistochrone
   2.4 Constrained problems
       2.4.1 The algorithm
       2.4.2 The isoperimetric problem
       2.4.3 The catenary
   2.5 Several dimensions
       2.5.1 Euler-Lagrange equation in two variables
       2.5.2 Laplace's equation
       2.5.3 Minimal surface equation

II Partial differential equations

3 Some general theory
   3.1 The definition
   3.2 Quasilinear second order equations in two variables
   3.3 Some basic tricks
       3.3.1 First order
       3.3.2 Second order

4 First order PDE
   4.1 Linear case
   4.2 Linear case, varying coefficients
   4.3 Quasilinear case
       4.3.1 Characteristic vector field
       4.3.2 Cauchy data
       4.3.3 An example: Burgers's equation
       4.3.4 Caustics
   4.4 Fully nonlinear case

5 The heat equation and Laplace's equation
   5.1 The heat equation
       5.1.1 Boundary conditions
       5.1.2 Strategy
       5.1.3 Separation of variables
       5.1.4 Fitting to boundary conditions
       5.1.5 Fitting to initial condition
       5.1.6 The smoothing effect of parabolicity
   5.2 Laplace's equation
       5.2.1 Harmonic functions
       5.2.2 The maximum principle
       5.2.3 Steady temperature distributions
       5.2.4 Solution on a rectangular domain
       5.2.5 Some examples
   5.3 Eigenfunction expansions
   5.4 Discontinuities

6 The wave equation
   6.1 Derivation
   6.2 Boundary conditions
       6.2.1 The bounded string: Fourier's solution
       6.2.2 The infinite string: D'Alembert's solution
       6.2.3 Example
       6.2.4 Propagating signals
       6.2.5 Comparison of Fourier and D'Alembert solutions
   6.3 Some simple waves on an infinite string
       6.3.1 Simple waves on an infinite string
       6.3.2 Reflection and transmission coefficients

III Appendices

A Recap of Fourier theory
   A.1 Fourier series
   A.2 Square-integrable functions and Parseval's theorem
   A.3 Interpretation

B Sage code for diagrams
   B.1 Figure 1.1
   B.2 Figure 1.2
   B.3 Figure 1.3
   B.4 Figure 1.4
   B.5 Figures 4.1 and 4.2
       B.5.1 Parts (a)
       B.5.2 Figure 4.1(b)
       B.5.3 Figure 4.2(b)
   B.6 Figure 4.4
Introduction and notation

This course is an introduction to some of the most ingenious ideas in 18th
century mathematics. Some of these ideas are so important that their
discoverers' names appear on the side of the Eiffel Tower.

If you perform sufficiently well on the final exam, who knows where your name
will end up?

Structure

The course is divided into two parts.


In the first part we will see how to turn some natural physical and geometric
problems into differential equations. Some of these will be ordinary differential
equations (for functions of a single variable); others will be partial differential
equations (for functions of several variables). How do these equations arise?
In ordinary calculus of a single variable, maxima and minima of a function
φ : R → R at x0 ∈ R are found by differentiating and finding the zeros of the
resulting function. The result is an equation for x0:

    (dφ/dx)(x0) = 0.
Now suppose that you’re interested in minimising or maximising something over
a space of all functions! Suddenly you have to differentiate a functional, F (φ)
i.e. a function which eats a function and outputs a number. The resulting
Euler-Lagrange equation
∂F
(φ0 ) = 0
∂φ
is usually a differential equation for φ0 . If φ0 itself is a function of several vari-
ables then it is likely that this Euler-Lagrange equation is a partial differential
equation.
The physical and geometric problems we are interested in are very natural, for
example:
• What is the shortest path between two points in the plane? (i.e. minimise
the length functional over all paths).
• What is the shape in the plane with maximal area amongst shapes of a
fixed perimeter? (i.e. maximise the enclosed area under the constraint of
fixed perimeter).
• What is the function φ : U → R (U ⊂ R2 ) which minimises the surface
area of its graph subject to the boundary condition that φ|∂U is some fixed
function?
The first two give rise to ordinary differential equations (for example, the second
becomes a harmonic oscillator equation for the (derivatives of the) components
of the parametric curve bounding the shape). The last gives a quasilinear
second order partial differential equation. I assume that you know how to solve
harmonic oscillators and other ordinary differential equations, however...
...The second part of the course will develop techniques to solve some partial
differential equations. We begin by treating first order equations (only involving
first partial derivatives). These can be tackled by a technique called the method
of characteristics which originates in geometric optics. In optics with the speed
of light set equal to 1, there is a first order PDE called the eikonal equation

    (∂φ/∂x)² + (∂φ/∂y)² = 1

satisfied by the function φ(x, y) whose value at (x, y) ∈ R² is the length of time
it takes light to travel to (x, y) from some fixed light-emitting curve C ⊂ R² (the
light being emitted in a normal direction to C). The obvious way to solve this is
to draw a straight line connecting C to (x, y) and meeting C at ninety degrees.
Let t be the parameter for this line (x(t), y(t)) and observe that if we define
f(t) = φ(x(t), y(t)) then f(t) satisfies the ODE df/dt = 1 (simply because it
takes light time t to travel t units because the speed of light is 1). I'm not
saying that this is obvious from the eikonal equation, but from the physical
interpretation of the solution φ it’s actually a tautology.
The method of characteristics generalises this example and constructs, for any
first order PDE, a system of “characteristic curves” (in this example the light
rays emitted normally from C) along which the PDE reduces to an ODE.
Finally we move on to tackle second order equations. We will concentrate on
three which arise very naturally in physics and geometry:
• The heat equation

    ∂φ/∂t = ∂²φ/∂x².

• Laplace's equation

    ∂²φ/∂x² + ∂²φ/∂y² = 0.

• The wave equation

    ∂²φ/∂x² = (1/c²) ∂²φ/∂t².

These are linear and therefore highly amenable to solution. They also typify
three different classes of second order equations with vastly different behaviours:
parabolic, elliptic and hyperbolic.
The general strategy of this final part of the course will be the following:
• Find infinitely many special solutions to the equation. Maybe they don't
satisfy the boundary conditions you're interested in, but don't worry. This
step uses a clever idea called separation of variables: you assume the solution
has the form φ(x, y) = X(x)Y(y), that is, a product of two functions
each depending on only one of the variables. Then the PDE reduces to a
pair of ODEs which are easy to solve (see the sketch after this list). Of
course, separation of variables rarely works, but with these simple linear
equations it is fiercely effective.
• Take infinite linear combinations of these special solutions to fit to your
boundary conditions. By linearity of the PDE one can take linear combi-
nations of solutions and obtain a function which still solves the equation.
Fitting infinite sums of special functions (like sine and cosine) to arbitrary
functions is the subject of Fourier theory and indeed this is how Fourier
discovered Fourier theory.
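To see that such special separated solutions really exist, here is a minimal
sketch in Python using the sympy library (a close relative of the Sage used for
the diagrams in Appendix B); the example, my choice, is the heat equation above,
for which X(x)T(t) = e^(−k²t) sin(kx) works for every k.

    import sympy as sp

    x, t, k = sp.symbols('x t k')
    phi = sp.exp(-k**2 * t) * sp.sin(k*x)   # a separated solution X(x)T(t)

    # heat equation: d(phi)/dt - d^2(phi)/dx^2 should vanish identically
    print(sp.simplify(sp.diff(phi, t) - sp.diff(phi, x, 2)))   # 0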
There are also several appendices: the first is a recap of Fourier theory because
we will use lots of Fourier series in the second half of the course; the second
includes much of the Sage code I used to create the diagrams (I also used
Inkscape); the other appendices comprise all the problem sheets for the course.

Videos
There are supplementary videos for this course, available on YouTube. These
cover a mixture of basic material prerequisite for understanding the course and
extension material for those who care to watch it. Links to these videos are in
the relevant places in the notes.

What you should already know


You should be:
• Comfortable computing Fourier series and manipulating trigonometric
expressions. If you're struggling, try working through the Schaum's Outline
Series volume on Fourier Analysis.
• Able to solve a wide variety of ordinary differential equations. At the very
least the simple harmonic equation X″ = λX! If you're struggling, try
working through the Schaum's Outline Series volume on Ordinary
Differential Equations.
• Able to remember and apply Green’s theorem and have a rudimentary
knowledge of vector calculus (div, grad and curl, tangent vectors to curves
etc.).
• Ready to try solving a problem for which you have not been given a precise
template (most important).

Notation
Sections, like this one, which are marked with a left bar contain material I
consider to be nonexaminable, usually proofs. That doesn’t mean you should
ignore them, just that you shouldn’t sit there memorising them getting your
knickers into a twist.
Partial derivatives are written with curly dees, ∂f/∂x. Ordinary derivatives are
written with Latin dees, df/dt. The only difference is that d/dt means the function
being differentiated is a function of a single variable. People who write partial
derivatives with Latin dees are wrong and will be penalised. People who write
ordinary derivatives with curly dees on the basis that single variable is a special
case of several variables are smart, but still wrong, because it's misleading for
the reader. If ever I write d/dt instead of ∂/∂t and you don't know why I did
it, challenge me (there's always the possibility that I made a mistake).
Also, if x(t) is a function of time t, the notation ẋ indicates differentiation with
respect to t. If y(x) is a function of x, the notation y′ indicates differentiation
with respect to x. It's possible that ′ might denote differentiation with respect
to another variable like t. It shouldn't bother you because it will only ever occur
when the function in question is a function of a single variable, so x′ or ẋ just
mean "differentiate x with respect to whatever it depends on".

Acknowledgements
This course owes a lot to the course taught in previous years by Dr R Bowles and
Dr G Esler. Their notes have been indispensable in preparing these notes; many
of the nice questions from their problem sets have made it onto these problem
sets. Arnold’s book “Lectures on Partial Differential Equations” provided most
of the inspiration for my presentation of the method of characteristics; Spiegel’s
“Fourier Analysis” has been useful too: I learned Fourier series from it myself,
long ago. Thanks most of all to my extremely diligent markers (Giancarlo
Grasso, Abbygail Shaw and Huda Ramli) and my wonderful and inquisitive
second years for pointing out so many errors and typos in the lectures, in the
notes and on the question sheets.
Part I

Critical points in finite- and infinite-dimensional problems
Chapter 1

Calculus with several variables

For most of what I say, we’ll take “several” to mean “two” to save on notation.
There are no new ideas introduced by adding more variables. Henceforth we’ll
use (x, y) to denote Cartesian coordinates on the plane R2 .

1.1 First derivatives

1.1.1 Partial derivatives

Given a function f : R² → R, its partial derivatives are what we get by
differentiating with respect to one of the variables x or y and keeping the other
fixed. In other words, to get the partial derivative with respect to x, we restrict
f to the horizontal lines y = y0 (y0 is some constant) and we get a function of
one variable

    f(x, y0),

which we know how to differentiate (assuming it's differentiable!). We define
the first partial derivative of f with respect to x at (x0, y0) as

    ∂f/∂x (x0, y0) ≡ (d/dx)|_{x=x0} f(x, y0)
                   ≡ lim_{h→0} [f(x0 + h, y0) − f(x0, y0)] / h.


Similarly

    ∂f/∂y (x0, y0) ≡ (d/dy)|_{y=y0} f(x0, y)
                   ≡ lim_{k→0} [f(x0, y0 + k) − f(x0, y0)] / k.

Example 1. 1. If f(x, y) = x²y³ then ∂f/∂x = 2xy³ and ∂f/∂y = 3x²y².

2. If f(x, y) = sin(x + y) then ∂f/∂x = cos(x + y) = ∂f/∂y.

We will sometimes abbreviate ∂f/∂x to ∂x f or just fx. Often you will see it written
f,x to distinguish the subscript from an ordinary subscript (like a vector index).
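If you want to check computations like those in Example 1, a computer algebra
system will do them for you. Here is a minimal sketch in Python using the sympy
library (my addition; the notes themselves use Sage, see Appendix B):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**2 * y**3
    print(sp.diff(f, x))   # 2*x*y**3
    print(sp.diff(f, y))   # 3*x**2*y**2

    g = sp.sin(x + y)
    print(sp.diff(g, x))   # cos(x + y)
    print(sp.diff(g, y))   # cos(x + y)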

1.1.2 Directional derivative

Why should we restrict attention to the lines x = x0 or y = y0? Let's pick a
vector v ∈ R² and look at all lines pointing in the direction v, i.e. all lines of
the form

    {p + tv : t ∈ R}.

Suppose we want to compute the derivative of f at p = (x0, y0) in the direction
of v = (v1, v2). Restrict f to the line (x0 + tv1, y0 + tv2) and differentiate with
respect to t to get what we call the directional derivative of f in the direction
v at the point p:

    vp(f) ≡ (d/dt)|_{t=0} f(x0 + tv1, y0 + tv2).

Certainly if v = (1, 0) then we get

    vp(f) = ∂f/∂x (p)

or if v = (0, 1) then we get

    vp(f) = ∂f/∂y (p).
Theorem 2. Suppose that the partial derivatives of f with respect to x and y
exist everywhere and that they vary continuously. Let v = (v1, v2) be a vector
and p ∈ R² be a point. Then

    vp(f) = v1 ∂f/∂x (p) + v2 ∂f/∂y (p).

This theorem is a nice piece of analysis and therefore we won’t prove it. You
can tell it’s nontrivial because it’s not immediately obvious what continuity of
the partial derivatives has to do with anything.

The directional derivatives are just linear combinations of the directional
derivatives in the x- and y-directions. What does this mean geometrically, in
terms of the graph of f?
In single-variable calculus, vectors in R have just one component v. The
directional derivative of a function g : R → R along the vector v at x is
vx(g) = v g′(x). As v varies this traces out a line with slope g′(x), which is
tangent to the graph of g when we translate it to (x, g(x)).
In two variables, the directional derivative of f : R² → R in the direction
v = (v1, v2) at p = (x0, y0) is

    vp(f) = v1 ∂f/∂x (p) + v2 ∂f/∂y (p),

which traces out a plane as v1 and v2 vary. This plane is tangent to the graph
of f when it is translated to the point (x0, y0, f(x0, y0)). In other words, the
geometric content of the above theorem is that the graph of a function with
continuous partial derivatives admits a tangent plane.
Example 3. You cannot drop the assumption of continuity of partial derivatives.
A counterexample would be the function

    f(x, y) = xy/(x² + y²)   when (x, y) ≠ (0, 0),
    f(x, y) = 0              at (x, y) = (0, 0).

Both partial derivatives exist everywhere. They vanish at the origin, but for
example

    ∂f/∂x = (y³ − x²y)/(x² + y²)²

and along the y-axis this is y³/y⁴ = 1/y, which does not tend to zero as y →
0; hence the partial derivatives are not continuous. The function is not even
continuous when restricted to a line which is not x = 0 or y = 0 and hence the
other directional derivatives don't even exist.

To get away from difficulties like this, we will only discuss functions with
continuous first partial derivatives, also called C¹-functions, or once continuously
differentiable functions.
In this case we can define the total derivative at p,

    dp f : R² → R,

which encodes all directional derivatives. The total derivative eats a vector v
and outputs the directional derivative in the v-direction at p, that is

    dp f(v) = vp(f).

Figure 1.1: The graph of the function from Example 3 which has discontinuous
partial derivatives and hence no tangent plane at the origin.

By Theorem 2, at each point this is a linear map:

    dp f(v) = (∂x f) v1 + (∂y f) v2,

which is the product of the row-vector (∂x f  ∂y f) with the column vector
(v1, v2). So we can write df as a row-vector (also known as a covector)

    df = ( ∂x f   ∂y f ).

1.1.3 Linear approximation

The point of introducing the total derivative dp f : R² → R is that it is a good
linear (or "first order") approximation to the original function f : R² → R at
the point p.

We first recall the situation in one dimension. Let g : R → R be a differentiable
function of one variable. By the definition of the derivative we have the following
fact:

    g(x0 + ε) = g(x0) + ε g′(x0) + ε η(ε)

where η(ε) → 0 as ε → 0. In other words, g(x0) + ε g′(x0) is a good approximation
of g(x0 + ε) for small ε. This is actually equivalent to the usual definition because
by rearranging we have

    [g(x0 + ε) − g(x0)] / ε = g′(x0) + η(ε).

Let f : R² → R be a function with continuous partial derivatives. Let (v1, v2)
be a vector and p = (x0, y0) a point. Since the above approximation holds in
all directions, and using Theorem 2 to express the directional derivative in the
(v1, v2)-direction in terms of the partial derivatives, we get

    f(x0 + εv1, y0 + εv2) = f(x0, y0) + ε v1 ∂x f(x0, y0) + ε v2 ∂y f(x0, y0) + ε|v| η(ε|v|)

where η(ε) → 0 as ε → 0. In other words,

    f(x0 + εv1, y0 + εv2) = f(x0, y0) + ε dp f(v) + ε|v| η(ε|v|).

Remark 4. The total derivative dp f is a good approximation to f at p in the
same way that the tangent plane to the graph Graph(f) of f at (x0, y0, f(x0, y0))
in R³ stays close to the graph itself in a neighbourhood of p. In fact, the
tangent space at p to Graph(f) is the graph of the linear map dp f, translated
to the point (x0, y0, f(x0, y0)).

1.1.4 The gradient

Another, equivalent way of packaging the information in the total derivative is
to give the gradient ∇f of the function f. This is the column vector

    (∇f)(p) = ( ∂f/∂x (p) )
              ( ∂f/∂y (p) )

(with more vertical entries if there are more coordinates). Note that by
definition:

    dp f(v) = (∇f)(p) · v.

Lemma 5. The gradient (∇f)(p) points in the direction of maximal increase
of f at p. Moreover, its magnitude equals the directional derivative in the
corresponding unit direction (∇f)(p)/|(∇f)(p)|.

Figure 1.2: The graph of the total derivative dp f translated to the point (p, f (p))
is the tangent plane to Graph(f ) at (p, f (p)).

Proof. Let’s assume that p is not a critical point of f , otherwise the Lemma
is obvious (because both dp f and (∇f )(p) vanish). If v is a unit vector then
|dp f (v)| = |(∇f )(p) · v| = |(∇f )(p)||v| cos(θ) where θ is the angle between v and
(∇f )(p). This is clearly maximal when cos(θ) = 1, that is when θ = 0. The
directional derivative of f in the (∇f )(p)/|(∇f )(p)|-direction is
 
(∇f )(p) (∇f )(p)
dp f = (∇f )(p) · = |(∇f )(p)|.
|(∇f )(p)| |∇f |
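Lemma 5 is also easy to test numerically: sample unit vectors v and compare the
directional derivatives (∇f)(p) · v with |(∇f)(p)|. A minimal sketch in Python,
for an arbitrarily chosen f and p of my own:

    import numpy as np

    # f(x, y) = x^2 + 3xy, an arbitrary example; its gradient is (2x + 3y, 3x)
    def grad_f(x, y):
        return np.array([2*x + 3*y, 3*x])

    g = grad_f(1.0, 2.0)                       # gradient at p = (1, 2)
    thetas = np.linspace(0, 2*np.pi, 100000)
    dirderivs = g[0]*np.cos(thetas) + g[1]*np.sin(thetas)

    print(dirderivs.max(), np.linalg.norm(g))  # both approx 8.544 = |grad f(p)|
    print(thetas[dirderivs.argmax()], np.arctan2(g[1], g[0]))
    # the maximising angle agrees with the direction of the gradient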

1.1.5 The matrix of partial derivatives

In general, for a vector-valued function F = (F1, . . . , Fm) : Rⁿ → Rᵐ there are
n × m partial derivatives, which we write in an m-by-n matrix

    dp F = ( ∂F1/∂x1 (p)   · · ·   ∂F1/∂xn (p) )
           (      ⋮                     ⋮      )
           ( ∂Fm/∂x1 (p)   · · ·   ∂Fm/∂xn (p) )

In particular, for a change of variables F : R² → R²,

    F(x, y) = (u(x, y), v(x, y)),

we define the Jacobian matrix

    ( ∂u/∂x   ∂u/∂y )
    ( ∂v/∂x   ∂v/∂y )

The naturality of writing this as a matrix comes from the following theorem.
Theorem 6 (Chain rule). If F : Rⁿ → Rᵐ and G : Rᵐ → Rℓ are differentiable
functions then

    dp(G ∘ F) = d_{F(p)} G ∘ dp F

where ∘ denotes composition of functions on the left-hand side and matrix
multiplication on the right.

We first make explicit note of a useful special case.

Corollary 7. Suppose that F : R² → R² is a change of coordinates, (u, v) =
F(x, y), and G : R² → R is a function. The function G ∘ F expresses G(u, v)
in terms of the coordinates (x, y), and we have

    ( ∂G/∂x   ∂G/∂y ) = ( ∂G/∂u   ∂G/∂v ) ( ∂u/∂x   ∂u/∂y )
                                          ( ∂v/∂x   ∂v/∂y )

i.e.

    ∂G/∂x = (∂G/∂u)(∂u/∂x) + (∂G/∂v)(∂v/∂x),
    ∂G/∂y = (∂G/∂u)(∂u/∂y) + (∂G/∂v)(∂v/∂y).

Proof of Theorem 6. See http://youtu.be/BtgBCPdT8rM. Let v be a vector
and ε a parameter which we assume takes on small values. We have

    F(p + εv) = F(p) + ε dp F(v) + ε|v| η(ε|v|)

and

    G(u + εw) = G(u) + ε du G(w) + ε|w| ξ(ε|w|)

for some functions η, ξ : R → Rᵐ such that |η(r)| and |ξ(r)| tend to zero as
r → 0. Therefore

    G(F(p + εv)) = G(F(p) + ε dp F(v) + ε|v| η(ε|v|)),

so if we set u = F(p) and εw = ε dp F(v) + ε|v| η(ε|v|), we get

    G(F(p + εv)) = G(F(p)) + d_{F(p)} G( ε dp F(v) + ε|v| η(ε|v|) ) + ε|w| ξ(ε|w|)

which equals

    G(F(p)) + ε d_{F(p)} G ∘ dp F(v) + ε d_{F(p)} G( |v| η(ε|v|) ) + ε|w| ξ(ε|w|)

by linearity of d_{F(p)} G. The term

    d_{F(p)} G( |v| η(ε|v|) ) + |w| ξ(ε|w|)

goes to zero as ε → 0, and since this linear approximation determines our first
derivative we know that

    dp(G ∘ F) = d_{F(p)} G ∘ dp F.

In case you find the level of abstraction in this proof intimidating, I suggest
you try to prove the corollary explicitly in coordinates as it is stated. In case
you find the proof fun, I suggest you go to Analysis 4 to see more where that
came from.
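Corollary 7 can also be checked symbolically. The following Python/sympy sketch
(my addition) uses an arbitrary sample change of coordinates u = xy, v = x + y
and a sample G; both sides of the chain rule come out equal.

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    G = u * v**2                  # a sample function of (u, v)
    U, V = x*y, x + y             # a sample change of coordinates

    # left-hand side: differentiate G(u(x,y), v(x,y)) directly
    lhs = sp.diff(G.subs({u: U, v: V}), x)

    # right-hand side: (dG/du)(du/dx) + (dG/dv)(dv/dx), as in Corollary 7
    rhs = (sp.diff(G, u).subs({u: U, v: V}) * sp.diff(U, x)
           + sp.diff(G, v).subs({u: U, v: V}) * sp.diff(V, x))

    print(sp.simplify(lhs - rhs))  # 0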

1.1.6 Critical points

We are interested in maxima and minima of functions. I just want to recall:


Definition 8. A function f has a local maximum (respectively local minimum)
at p if there is a neighbourhood of p such that f (p) ≥ f (x) (respectively f (p) ≤
f (x)) for all x in the neighbourhood.

In one variable, we have the following theorem:

Theorem 9. Suppose that g : R → R is a differentiable function with a local
maximum or a local minimum at x0 ∈ R. Then g′(x0) = 0.

Geometrically, you drop a horizontal line down onto the graph of g until it hits.
The point where it hits is clearly a local maximum and the horizontal line is
tangent there.

Proof. Let’s prove it for local maxima. Since g is differentiable it can be ap-
proximated by
g(x0 + ) = g(x0 ) + g 0 (x0 ) + η()
where η → 0 as  → 0. If g 0 (x0 ) 6= 0 then it has a sign, ±. We consider small 
with the same sign. When the magnitude of  is sufficiently small, η() can be
made arbitrarily small relative to g 0 (x0 ) and hence the error term η() can be
made arbitrarily smaller in magnitude than g 0 (x0 ).
In particular, if
g(x0 ) + g 0 (x0 ) > g(x0 )
1.1. FIRST DERIVATIVES 23

Figure 1.3: A function with a local maximum. Note that the tangent plane to
the graph at the local maximum is horizontal.

for very small  then


g(x0 + ) = g(x0 ) + g 0 (x0 ) + η() > g(x0 )
because the error term is not big enough to change the inequality. Because we
took  to have the same sign as g 0 (x0 ) we certainly have g(x0 ) + g 0 (x0 ) > g(x0 )
and hence this is a contradiction to local maximality of g at x0 .

In two variables, the geometric idea is the same: you drop a horizontal plane
down onto the graph and when it hits it is tangent (see Figure 1.3).

Theorem 10. If f : R² → R is a C¹-function with a local maximum or local
minimum at p = (x0, y0) then all directional derivatives vanish at p, in other
words dp f = 0.

Proof. Suppose f has a local maximum or minimum at p. Let {(x0 + tv1, y0 +
tv2) : t ∈ R} be the line through p in the direction v = (v1, v2). Restrict f to
this line; it still has a maximum or minimum at t = 0. Hence, by Theorem 9,

    (d/dt)|_{t=0} f(x0 + tv1, y0 + tv2) = 0,

but this is precisely vp(f).



Definition 11 (Critical point). Let f : R² → R be a C¹-function. A point
p ∈ R² with dp f = 0 is called a critical point. Equivalently, all directional
derivatives of f vanish at p, or the tangent plane of the graph of f at p is
horizontal.

Not all critical points are maxima and minima. Even in one variable we have
inflection points. The analogous critical points in two variables are the saddle
points, see Figure 1.4.

1.2 Second derivatives

In one variable there is a sufficient condition for a critical point to be a local
maximum (or a local minimum) in terms of its second derivatives. This is
called the second derivative test and is proved using Taylor's theorem:

Theorem 12 (Taylor's theorem). Suppose that g : R → R is a twice differentiable
function with derivatives g′ and g″. Then

    g(x0 + ε) = g(x0) + ε g′(x0) + (ε²/2) g″(x0) + ε² η(ε)

for some function η which tends to zero as ε → 0.

Corollary 13 (Second derivative test). Suppose that g : R → R is a
differentiable function and that g has a critical point at x0. Suppose moreover
that g″(x0) < 0. Then g has a local maximum at x0.

Proof. From Taylor’s Theorem:

1
g(x0 + ) = g(x0 ) + g 0 (x0 ) + 2 g 00 (x0 ) + 2 η()
2
1
= g(x0 ) + 2 g 00 (x0 ) + 2 η()
2

since g 0 (x0 ) = 0 by the condition that x0 is critical. Now by taking  small


enough, we can ensure that |η()| < 21 |g 00 (x0 )| so if g 00 (x0 ) < 0 then g 00 (x0 ) +
η() < 0 and hence
g(x0 + ) < g(x0 )

for all small . Hence g has a local maximum at x0 .

We will now develop the formalism of second partial derivatives in two vari-
able calculus, prove the analogue of Taylor’s theorem and explain the second
derivative test for critical points in this context.

1.2.1 Definition

The second partial derivative

    ∂²f/∂x²

of a function f : R² → R is defined by differentiating the function ∂f/∂x with
respect to x, keeping y fixed. Similarly one can define partial derivatives

    ∂²f/∂x∂y,   ∂²f/∂y∂x,   ∂²f/∂y².

As before, to avoid analytical subtleties, we will assume that all partial
derivatives exist and are continuous, a property which we will call C². With this
assumption we have the following

Theorem 14 (Symmetry of mixed partial derivatives). If f : R² → R is a
C²-function then

    ∂²f/∂x∂y = ∂²f/∂y∂x.

Given the definition

    vp(f) ≡ (d/dt)|_{t=0} f(x0 + tv1, y0 + tv2)

of the directional derivative in the direction v = (v1, v2) at p = (x0, y0), the
definition of a "second directional derivative" should be clear:

    (d²/dt²)|_{t=0} f(x0 + tv1, y0 + tv2).

Equivalently, consider the directional derivative vp(f) as a function of p,

    v·(f) : R² → R,

and take its directional derivative in the v direction:

    vp(v·(f)).

Henceforth we'll omit the subscript p, so this can be written as

    v²(f).

Of course you could also differentiate v(f) in the w-direction for a different
vector w. Taking v and w to be standard basis vectors, you end up with the
partial derivatives

    ∂²f/∂x²,   ∂²f/∂x∂y,   ∂²f/∂y².

Lemma 15. If v = (v1, v2) then

    v²(f) = v1² ∂²f/∂x² + 2v1v2 ∂²f/∂x∂y + v2² ∂²f/∂y².

Proof. We know that

    v(f) = (v1 ∂x + v2 ∂y) f,

so

    v(v(f)) = (v1 ∂x + v2 ∂y)(v1 ∂x + v2 ∂y) f
            = ( v1² ∂²/∂x² + 2v1v2 ∂²/∂x∂y + v2² ∂²/∂y² ) f

by the binomial theorem.

The generalisation to higher derivatives can be derived similarly and involves
binomial coefficients.

1.2.2 The Hessian and Taylor's theorem

The expression

    v1² ∂²f/∂x² (p) + 2v1v2 ∂²f/∂x∂y (p) + v2² ∂²f/∂y² (p)

is a quadratic form in the coefficients of v. We can write it as the matrix
product

    ( v1  v2 ) ( ∂²f/∂x²    ∂²f/∂x∂y ) ( v1 )
               ( ∂²f/∂x∂y   ∂²f/∂y²  ) ( v2 ).

The two-by-two matrix is called the Hessian of f at p, Hessp(f). The quadratic
form in the coefficients of v is vᵀ Hessp(f) v.

Theorem 16. Suppose that f : R² → R is a C²-function. Then

    f(p + v) = f(p) + dp f(v) + ½ vᵀ Hessp(f) v + |v|² η(|v|)

for some function η(r) which tends to zero as r → 0.

1.2.3 The second derivative test

Theorem 17. Suppose that f : R² → R is a C²-function and suppose moreover
that p is a critical point of f, i.e. dp f = 0. If both eigenvalues of the
matrix Hessp(f) are negative (respectively positive) then p is a local maximum
(respectively local minimum).

Proof. First notice that Hessp(f) is a symmetric matrix. Therefore its
eigenvalues are both real, say λ1, λ2 ∈ R. Suppose they are both negative. We can
diagonalise Hessp(f) by an orthogonal matrix U:

    Uᵀ Hessp(f) U = ( λ1   0  )
                    ( 0    λ2 )

Let e1 = (1, 0) and e2 = (0, 1) denote the basis vectors. Then U e1 is a
λ1-eigenvector for Hessp(f). Taking the Taylor expansion we get

    f(p + εU e1) = f(p) + ε dp f(U e1) + (ε²/2) e1ᵀ Uᵀ Hessp(f) U e1 + ε² η(ε).

Since p is a critical point, the first derivative term vanishes and we get

    f(p + εU e1) = f(p) + (ε²/2) e1ᵀ Uᵀ Hessp(f) U e1 + ε² η(ε).

Using the fact that Uᵀ = U⁻¹ (orthogonality of U), the second derivative term
is

    (ε²/2) e1ᵀ Uᵀ Hessp(f) U e1 = (ε²/2) λ1 < 0,

and similarly for λ2. The error term can be made very small when ε is very small,
so that f(p + εU e1) < f(p). Similarly for e2, and indeed for any
v = v1 U e1 + v2 U e2 we get

    f(p + εv) = f(p) + (ε²/2)(λ1 v1² + λ2 v2²) + · · · ,

which is < f(p) when ε|v| is very small, because λ1, λ2 are both negative. This
proves that p is a local maximum, and the same proof would demonstrate a local
minimum in case both eigenvalues were positive.

Corollary 18. Suppose that f : R² → R is a C²-function and suppose moreover
that p is a critical point of f. If

    det(Hessp(f)) > 0

then p is either a local maximum or a local minimum of f. If moreover

    Trace(Hessp(f)) < 0 (respectively > 0),

then p is a local maximum (respectively local minimum).



Proof. Since Hessp(f) can be diagonalised by an orthogonal matrix U,

    det(Hessp(f)) = det( U ( λ1  0  ) Uᵀ )
                           ( 0   λ2 )
                  = det(U) λ1 λ2 det(Uᵀ)
                  = λ1 λ2,

since det(U) = det(Uᵀ) = ±1.

If det(Hessp(f)) > 0 this means that λ1 and λ2 are nonzero and have the same
sign. Hence Theorem 17 applies.
Of course, we cannot tell whether λ1 and λ2 are both positive or both negative
without further information, and hence we don't know whether p is a maximum
or a minimum. In the case when λ1 and λ2 have the same sign, it suffices to
compute the trace of Hessp(f), since this is equal to λ1 + λ2, which has the same
sign as both λ1 and λ2. This proves the theorem.
Remark 19. When there are eigenvalues equal to zero, the Taylor series
argument doesn't work because the error term dominates the second order terms
in certain directions. We call such critical points degenerate. They are more
complicated.

Definition 20. A critical point p of a function f : R² → R is called
nondegenerate if all the eigenvalues of Hessp(f) are nonzero. Equivalently, if
det(Hessp(f)) ≠ 0.

So what do things look like at a nondegenerate critical point if the determinant
of the Hessian at a critical point is negative? That is, if one of the eigenvalues
is positive and the other is negative? This is called a saddle point.

Example 21 (A saddle point). Consider the function

    f(x, y) = x² − y².

The origin is a critical point and the Hessian is

    Hess0(f) = ( 2    0 )
               ( 0   −2 )

with eigenvalues ±2. We can see from Figure 1.4 that the function has a
maximum along the y-direction and a minimum along the x-direction.
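The second derivative test is mechanical enough to automate. Here is a minimal
Python/sympy sketch (my addition) which finds the critical points of the function
from Example 21 and classifies them via the determinant and trace of the Hessian,
exactly as in Corollary 18:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**2 - y**2                                   # the saddle of Example 21

    critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
    H = sp.hessian(f, (x, y))
    for p in critical:
        Hp = H.subs(p)
        d, tr = Hp.det(), Hp.trace()
        if d > 0:
            kind = 'local minimum' if tr > 0 else 'local maximum'
        elif d < 0:
            kind = 'saddle'
        else:
            kind = 'degenerate'
        print(p, kind)                                # {x: 0, y: 0} saddle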

1.2.4 Higher dimensions

See http://youtu.be/CNZOEPGKzcA.

Figure 1.4: A function with a saddle point.

In higher dimensions the Hessian is a larger matrix, but symmetry of partial
derivatives means that it is still symmetric. More complicated nondegenerate
critical points are possible, but because we can always diagonalise a symmetric
matrix, these critical points will always look like one of a finite list of models
depending on how many positive and negative eigenvalues the Hessian has.
These models are

    x1² + · · · + xm² − x_{m+1}² − · · · − xn²

and are known as quadratic or Morse singularities.

Note that the diagonalisation argument still works: if p is a critical point of
f and u1, . . . , un is an orthonormal basis of eigenvectors for Hessp(f) and
v = v1u1 + · · · + vnun is a vector, then the Taylor expansion is

    f(p + εv) = f(p) + (ε²/2)(λ1v1² + · · · + λnvn²) + · · · ,

so maxima are still points where all λk are negative and minima are still points
where all λk are positive.

1.3 Constrained optimisation

1.3.1 The geometric idea...

We are often interested in solving a maximisation/minimisation problem with
an extra constraint.

Example 22. • How close does the ellipse x²/a² + y²/b² = 1 get to the
point p? This is quite easy: just parametrise the ellipse as x = a cos(t),
y = b sin(t), write the distance between (x(t), y(t)) and p as a function of
t, and find and analyse the stationary points.

• How close does the surface x³ + y³ + z³ = 3xyz get to the point (0, 0, 1)?
It's not so easy to find a convenient parametrisation of this cubic surface!

For the sake of clarity we restrict attention to functions f : R² → R of two
variables and, given a curve C ⊂ R², we restrict the function f to C and obtain
a new constrained function h ≡ f|C. The question we want to answer is: what
are the critical points of h?
There are two ways in which this can happen (see Figure 1.5):
• It is possible that a critical point of f happens to lie on C. For example,
take f(x, y) = x² − y² and the curve C = {x = 0}. The restriction of f to
C is the function h(y) = −y² (where y is the remaining coordinate on C).
This curve passes through the saddle point at (0, 0) and h has a maximum
there.
• It is also possible that f has no critical points and yet f|C has some. For
instance, take f(x, y) = x: since ∂f/∂x = 1 this has no critical points. But
if we restrict to the circle C = {x² + y² = 1} then certainly h(θ) = cos(θ)
has a maximum at θ = 0 and a minimum at θ = π (where θ is the angular
coordinate on the circle). Notice that these two critical points occur at
the points where the level sets of f (the vertical lines) are tangent to the
circle C. This is no coincidence...
Theorem 23 (See Figure 1.5). Let f : R² → R be a C¹-function and C ⊂ R²
a curve in the plane. Let h denote the restriction f|C. Then h has a critical
point at p ∈ C if and only if either
• f has a critical point at p, or
• the level set f⁻¹(f(p)) and the level set C have the same tangent line at
p.
I should add the caveat that we need C to be a nice smooth curve. In
particular, we require that near every point p ∈ C there is a C¹-parametrisation
γ : (−T, T) → C with γ(0) = p and γ̇(0) ≠ 0. Some day you will see a theorem
(maybe you already have) called the implicit function theorem which

Figure 1.5: Examples illustrating Theorem 23. In one case (left), C runs through
a critical point of f and inherits a critical point itself. In the other (right), the
dashed contours of f are tangent to C at the two marked points where h = f|C
has a minimum and a maximum.

tells you that this parametrisation exists provided that C can be written as
g⁻¹(0) for a function g which has no critical points on the level set g⁻¹(0).
Crucially, we don't need to know the parametrisations to apply the theorem,
we just need to know they exist. Similarly, the level sets of f admit local
parametrisations near p provided p is not a critical point of f.
We start with a lemma to help us understand tangent lines to level sets:

Lemma 24. If f : R² → R is a C¹-function and f⁻¹(r) is one of its level
sets (assumed to contain no critical points) then the tangent line to f⁻¹(r)
at p ∈ f⁻¹(r) is precisely the set of vectors v such that dp f(v) = 0.

Proof of Lemma 24. Take a parametrisation γ : (−T, T) → f⁻¹(r) with
γ(0) = p and γ̇ ≠ 0. The tangent vectors we are looking for are precisely
multiples of γ̇(0), so we want to show that these are annihilated by dp f.

We know that

    f ∘ γ ≡ r

because the image of γ is (by definition) inside the level set f⁻¹(r). Therefore
d0(f ∘ γ) = 0. By the chain rule we get

    0 = d0(f ∘ γ) = d_{γ(0)} f(γ̇(0)).

So dp f annihilates γ̇(0), as required.

We also want to show that the kernel is precisely the set of vectors λγ̇(0).
Since p is not a critical point of f we know that there exists a vector v with
dp f(v) ≠ 0. If u is any vector then u = λγ̇(0) + µv for some λ, µ ∈ R and
hence

    dp f(u) = µ dp f(v).

This proves that if dp f(u) = 0 then µ = 0 and u is a multiple of γ̇(0).

Proof of Theorem 23. Take p ∈ C and let γ : (−T, T) → C be a parametrisation
of a neighbourhood of p in the curve C. We are asking when p is a
critical point of h, i.e. when s = 0 is a critical point of s ↦ f(γ(s)). But
f(γ(s)) = (f ∘ γ)(s) and by the chain rule

    d0(f ∘ γ) = dp f(γ̇(0)).

This vanishes if and only if the direction γ̇(0) (which is tangent to C) is
annihilated by dp f. This certainly happens if p is a critical point of f, which
is the first case of the theorem. If we assume that p is not a critical point of
f then dp f(γ̇(0)) = 0 implies that γ̇(0) is tangent to the level set of f passing
through p, by Lemma 24.

1.3.2 ...and in practice

In practice we write C = {g = 0} for some function g. By Lemma 24, the
tangent line to a level set of f at p is the kernel of the linear map dp f (and
similarly for g). If p is a critical point of h = f|_{g⁻¹(0)} then, by Theorem 23, the
level sets of f and g share a tangent line. That means that the two linear maps
dp f and dp g have the same kernel and hence they are proportional:

    dp f = λ dp g for some λ.

We take λ as a new variable (the Lagrange multiplier) and consider the function

    H(x, y, λ) = f(x, y) − λ g(x, y).

Varying with respect to x and y we get critical points when

    dp f − λ dp g = 0,

i.e. when dp f and dp g are proportional. Varying with respect to λ, the critical
point condition is

    ∂H/∂λ = −g(x, y) = 0.

In other words, critical points of H are triples (x, y, λ) such that

Figure 1.6: The setup for Example 25.

• g(x, y) = 0,
• (x, y) is a critical point of h = f|_{g⁻¹(0)}.
Example 25. How close does the circle x² + y² = 1 get to the point (p, q)? We
know the answer: just rescale (p, q) to have length one and you'll find the closest
point to (p, q) on the circle. But for practice, let's work it out using Lagrange
multipliers. Let g(x, y) = x² + y² − 1 and let f(x, y) = (x − p)² + (y − q)², which
measures the (squared) distance to (p, q). In order to minimise f over g⁻¹(0),
we introduce a Lagrange multiplier λ and seek critical points of

    H(x, y, λ) = (x − p)² + (y − q)² + λ(x² + y² − 1).

Computing all the first derivatives we get

    2(x − p) + 2λx = 0,
    2(y − q) + 2λy = 0,
    x² + y² = 1,

which gives

    x = p/(1 + λ),   y = q/(1 + λ),

so

    p²/(1 + λ)² + q²/(1 + λ)² = 1,

or

    λ = √(p² + q²) − 1.

Substituting back gives

    (x, y) = (p, q)/√(p² + q²),

as we suspected.
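The system of equations in Example 25 can also be handed to a computer algebra
system. A Python/sympy sketch (my addition, with p and q kept symbolic):

    import sympy as sp

    x, y, lam, p, q = sp.symbols('x y lambda p q', real=True)
    H = (x - p)**2 + (y - q)**2 + lam*(x**2 + y**2 - 1)

    eqs = [sp.diff(H, v) for v in (x, y, lam)]
    print(sp.solve(eqs, [x, y, lam], dict=True))
    # two critical points, (x, y) = ±(p, q)/sqrt(p**2 + q**2):
    # the nearest and the farthest points of the circle from (p, q)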
Example 26. Suppose you have enough money to buy four square metres of
cardboard and you want to use it to make a lidless box. However, you're allowed
to cut the cardboard into the desired shape before you buy it. How should you cut
it to maximise the volume of the box? Well, you can control the three dimensions
of the box a, b, c, and let's suppose that the (missing) lid would have area bc.
The total surface area is therefore

    2ab + 2ac + bc

and the volume is

    f(a, b, c) = abc.

Take g(a, b, c) = 2ab + 2ac + bc − 4 and introduce a Lagrange multiplier λ. We
need to find the critical points of

    H(a, b, c, λ) = abc − λ(2ab + 2ac + bc − 4)

with respect to the four variables a, b, c, λ.
First, because the situation is symmetric with respect to b and c, we expect that
the maximising configuration will have b = c. To prove this, we differentiate
with respect to b and c and we see that, at a critical point,

    0 = ∂H/∂b = ac − λ(2a + c),
    0 = ∂H/∂c = ab − λ(2a + b).

This gives

    (a − λ)c = 2λa = (a − λ)b,

so b = c unless a = λ. If a = λ then ac − λ(2a + c) = −2a² = 0, so a = 0, but we
are assuming that our box has some height. Therefore we can assume b = c.
The problem is now to find the critical points of

    ab² − λ(4ab + b² − 4),

and the vanishing of partial derivatives with respect to a, b and λ gives

    b² − 4λb = 0,
    2ab − 4λa − 2λb = 0,
    4ab + b² = 4.

Since b ≠ 0, the first and third of these equations become

    b = 4λ,
    a = (4 − b²)/(4b) = (1 − 4λ²)/(4λ),

and hence the second becomes

    1 = 12λ².

Hence the solution is

    a = 1/√3,   b = c = 2/√3.
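The same recipe checks Example 26 by machine. A Python/sympy sketch (my
addition; declaring the variables positive keeps only the physically meaningful
root):

    import sympy as sp

    a, b, c, lam = sp.symbols('a b c lambda', positive=True)
    H = a*b*c - lam*(2*a*b + 2*a*c + b*c - 4)

    eqs = [sp.diff(H, v) for v in (a, b, c, lam)]
    print(sp.solve(eqs, [a, b, c, lam], dict=True))
    # a = 1/sqrt(3), b = c = 2/sqrt(3), lambda = 1/(2*sqrt(3))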
Chapter 2

Calculus of variations

Up to now we have been interested in a very limited class of optimisation
problems: allowing ourselves a finite set of variables (usually one, two or three) and
maximising a function over these. Things become much more interesting when
we have an infinite amount of freedom to vary. For example, we might be
interested in all paths between two points, or all surfaces in space with a given
boundary curve, and we might be interested in minimising length, respectively
area. We will first deal with a classic example: proving that a line between two
points in the plane is the shortest path joining them. Then we will move on to
the more general theory, illustrating it with a plethora of examples.

A very nice introduction to these ideas is provided by the Feynman lecture on
the principle of least action.

2.1 Straight lines are shortest paths

I want to convince you that a straight line is the shortest path between two
points in the plane. Maybe you don’t need much convincing of this fact, but
did you ever really think about what this means? I’m saying that if you fix
two points and consider the space of all possible paths between them (which
is of course an incomprehensibly large, in fact infinite dimensional, space) you
can define a functional on this infinite-dimensional space of paths which takes
each path to its length and this functional has a global minimum at the path
given by a straight line. It suddenly seems like a more intimidating task to
really convince ourselves that this is true, and it seems more surprising that our
human intuition picked up on this without our noticing.


2.1.1 Paths and length

See http://youtu.be/bM_klC-oAzg. It seems like the key step in proving the
statement is in understanding what we mean by a path. Let us fix two points A
and B in the plane with coordinates (A1, A2) and (B1, B2). By a path between
them we mean a map

    γ : [0, 1] → R²

such that γ(0) = A and γ(1) = B. Alternatively, we can project γ(t) to the
two coordinate axes and think of it as a pair of functions γ(t) = (γ1(t), γ2(t))
satisfying γ1(0) = A1, γ1(1) = B1, etc. Now we don't want our paths to jump
around discontinuously in the plane, so we'll certainly require the two functions
γ1 and γ2 to be continuous. In fact, even to define the length of a path we're
going to need to require slightly more:

• we need γ1 and γ2 to be differentiable and the derivatives to be continuous;

• we also want the derivatives γ̇1 and γ̇2 to never vanish simultaneously; in
other words the vector γ̇ is never allowed to vanish.

I'll explain why we need these when we need them.

When we try and define the length of a curved path, the natural thing to do
is to zoom in very closely to the curve and recall from Taylor’s theorem that
on very small scales the path is well-approximated by a line. We can imagine
taking finer and finer polygonal approximations of our path and defining the
length to be the limit of the lengths of the polygons. This is of course the same
as defining length by an integral along the curve, and the infinitesimal length
along the curve (the infinitesimal arc-length) is just dt times the length of the
vector γ̇, where the dot denotes time-differentiation. Using Pythagoras, this is

    L(γ) ≡ ∫₀¹ √(γ̇1² + γ̇2²) dt.

If we wanted to be wholly rigorous we would first show that the definition by
polygonal approximations gave a well-defined number and then prove that this
number could be computed as the above integral. Instead, I'll take the shortcut
of defining the length of a differentiable path to be the above integral.

Remark 27. Note that here we need the path to be differentiable to even write
down the integrand. We also need the integral to be well-defined; if the deriva-
tive of γ is continuous then so is the integrand and we know how to integrate
continuous functions on a closed interval (Riemann integration from Analysis
I/II).

2.1.2 Straight lines minimise action

Actually, the square root really causes us headaches, so I want to get rid of it
for the moment and define the action of a path to be the integral

    α(γ) ≡ ∫₀¹ (γ̇1² + γ̇2²) dt.

We'll prove

Proposition 28. If γ is a straight line between A and B and δ is another path
between A and B then

    α(γ) ≤ α(δ)

with equality if and only if δ ≡ γ.

In other words, straight lines minimise action. We’ll see later how to deduce
the claim about length from this claim about action.

Proof. Define ε(t) = δ(t) − γ(t). Then

    α(δ) = ∫₀¹ |γ̇ + ε̇|² dt
         = ∫₀¹ ( |γ̇|² + 2γ̇ · ε̇ + |ε̇|² ) dt
         = α(γ) + α(ε) + 2 ∫₀¹ γ̇ · ε̇ dt.

Now because γ is linear, γ̇ is constant and hence the final term is

    2 ∫₀¹ γ̇ · ε̇ dt = 2 γ̇ · [ε(t)]₀¹

by the fundamental theorem of calculus. But because ε(0) = ε(1) = 0 (remember
the endpoints of the path are fixed) this term vanishes. Now we see that

    α(δ) = α(γ) + α(ε) ≥ α(γ)

with equality if and only if ε ≡ 0, if and only if δ ≡ γ.

The key idea here was to integrate by parts (i.e. apply the fundamental theorem
of calculus) and use the fact that γ satisfies some highly restrictive equation (γ̇
constant) to get rid of the term linear in ε. We were also lucky enough that
the remaining term was strictly positive, which is what allows us to prove this
incredibly strong result about global minimisation of action by lines in Euclidean
space. It's not usually the case in such variational arguments that we can control
the higher-order terms in ε.
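You can watch Proposition 28 in action numerically. The sketch below (Python,
my addition, with a crude finite-difference quadrature) compares the straight
line from (0, 0) to (1, 1) with a wobbly path having the same endpoints; the
wobble strictly increases the action.

    import numpy as np

    t = np.linspace(0.0, 1.0, 10001)
    dt = t[1] - t[0]

    def action(x, y):
        # approximates the integral of x'(t)^2 + y'(t)^2 dt
        return np.sum(np.diff(x)**2 + np.diff(y)**2) / dt

    straight = action(t, t)                        # gamma(t) = (t, t)
    wobbly = action(t, t + 0.1*np.sin(np.pi*t))    # same endpoints
    print(straight, wobbly)                        # 2.0 < approx 2.049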

2.1.3 Straight lines minimise length

Now we want to argue that straight lines are in fact shortest. If I give you a
differentiable path γ you can reparametrise it. To do this, take your favourite
differentiable, monotonically strictly increasing function φ : [0, 1] → [0, 1] and
form the composition δ = γ ◦ φ, in other words

δ(t) = γ(φ(t))

This gives a new path (differentiable by the chain rule!). Reparametrising
doesn't change the length of a path but it can certainly change the action.

Proposition 29. Any path can be reparametrised so that its action is equal
to the square of its length.

Proof. The parametrisation is defined as follows.

• Let

    s(t) = ∫₀ᵗ |γ̇| dt

be the arc-length after time t. Provided that γ̇ is never zero, s is a
monotonically increasing function of t.

• We normalise and consider the function φ(t) = s(t)/L(γ), which measures
what proportion of the arc-length has been traversed after time t. It is
easy to see that φ(t) is differentiable since, by the fundamental theorem
of calculus, its derivative is

    dφ/dt (t) = |γ̇(t)|/L(γ).

Remember we are assuming that |γ̇| ≠ 0 everywhere.

• Since φ is monotonically increasing, it is a bijection and hence it has
an inverse φ⁻¹ which takes as input a number λ ∈ [0, 1] and outputs the
time t at which γ has traversed an arc-length λL(γ). We will need the
following fact about φ⁻¹:

Lemma 30. If φ : [0, 1] → [0, 1] is a once continuously-differentiable function
whose derivative is always positive then it admits a continuously-differentiable
inverse φ⁻¹ whose derivative is equal to

    dφ⁻¹/dt = 1 / (dφ/dt).

• Therefore φ⁻¹ gives us a reparametrisation. What does the curve δ =
γ ∘ φ⁻¹ do? At time t = 1/2 it moves to the point on γ which is exactly
halfway along (as measured by arc-length).

Now the reparametrised path δ = γ ∘ φ⁻¹ has

    |dδ/dt| = |γ̇| · (dφ⁻¹/dt) = L(γ) (dφ/dt) · (1/(dφ/dt)) = L(γ)

(using |γ̇| = L(γ) dφ/dt and Lemma 30), and hence its action is

    ∫₀¹ |δ̇(t)|² dt = ∫₀¹ L(γ)² dt = L(γ)².

Now it is clear that the action of a straight line is L(γ)². This plus the
previous proposition tells us that the length of a straight line is strictly less
than the length of any other path joining the points!

2.2 The Euler-Lagrange equations

2.2.1 Key steps reviewed

Let us recall the key steps from the last section:

• Define some (infinite-dimensional) function space X; in that section it was
the space of paths connecting A and B.
• Define a functional F : X → R on that space; in that section it was the
action functional.
• Take a supposed critical point γ ∈ X of the functional; in that section it
was the straight line segment.
• Compute F(γ + ε), where γ + ε is a small variation of γ. Usually you can
only compute this to first order in ε; in other words you can only usually
compute the quantity we think of as the directional derivative of F in the
ε-direction. We call this the variation in F associated to the variation ε
of γ (this is where the name variational calculus comes from).
• Wherever the derivative of ε occurs in the resulting integral, integrate by
parts.

Let's do this again for the action of a path:

    α(γ) ≡ ∫₀¹ (γ̇1² + γ̇2²) dt.

We get

    α(γ + ε) = ∫₀¹ ( (γ̇1 + ε̇1)² + (γ̇2 + ε̇2)² ) dt
             = α(γ) + 2 ∫₀¹ (γ̇1 ε̇1 + γ̇2 ε̇2) dt + O(ε²),

so the directional derivative of α in the ε-direction at γ is

    dγ α(ε) = ∫₀¹ (γ̇1 ε̇1 + γ̇2 ε̇2) dt
            = [γ̇1 ε1 + γ̇2 ε2]₀¹ − ∫₀¹ (γ̈1 ε1 + γ̈2 ε2) dt,

where we have integrated by parts to get rid of ε̇s. Note that the boundary term
vanishes because ε(0) = ε(1) = 0.
Recall that a critical point is somewhere that all directional derivatives vanish.
So the condition that γ is a critical point of α is just

    dγ α(ε) = 0 for all ε,

i.e.

    − ∫₀¹ γ̈ · ε dt = 0 for all ε.

We now state the

Theorem 31 (Fundamental theorem of the calculus of variations). Suppose
that y : [0, 1] → Rⁿ is a vector-valued function. If ∫₀¹ y(t) · ε(t) dt = 0 for all
C¹-functions ("variations") ε : [0, 1] → Rⁿ then y ≡ 0.

In particular this implies that γ̈ ≡ 0; in other words the components of γ have
to be linear functions of t. In other words, γ is a linearly parametrised straight
line. So even if we hadn't known the answer was a line to begin with, we could
have worked it out (provided we could solve the differential equation γ̈ = 0).

Remark 32. Notice that the equation we obtained above is a second-order
equation (γ̈ = 0). This will hold in general because of the step where we integrate
by parts. This means that we should always assume our function γ is twice
continuously differentiable (C²) instead of just C¹.

2.2.2 The fundamental theorem of the calculus of variations

In this section we will prove Theorem 31.

Proof. Suppose, to the contrary, that there is a t0 ∈ [0, 1] with y(t0) ≠ 0. We
may as well assume that the component y1(t0) > 0. Because y1(t0) > 0, we
know that y1(t) > 0 for all t in some small interval (t0 − δ, t0 + δ).
Define a "bump function" f : [0, 1] → R which is
• C¹,
• nonnegative everywhere and positive at t0,
• and vanishes outside the interval (t0 − δ, t0 + δ).
Such functions are quite easy to construct. For instance

    f(t) = exp( 1/((t − t0)² − δ²) )   if t ∈ (t0 − δ, t0 + δ),
    f(t) = 0                           otherwise

will do!
Now consider the function ε : [0, 1] → Rⁿ given by

    ε(t) = (f(t), 0, . . . , 0).

Integrating this against y gives

    ∫₀¹ y(t) · ε(t) dt = ∫_{t0−δ}^{t0+δ} f(t) y1(t) dt > 0,

which contradicts the assumption that ∫₀¹ y(t) · ε(t) dt = 0 for all ε.
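The bump function in the proof looks exotic but is perfectly concrete. A Python
sketch of it (my addition; t0 = 0.5 and δ = 0.2 are arbitrary choices):

    import numpy as np

    t0, delta = 0.5, 0.2

    def bump(t):
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        inside = np.abs(t - t0) < delta
        # positive inside (t0 - delta, t0 + delta), decaying to 0 at the ends
        out[inside] = np.exp(1.0 / ((t[inside] - t0)**2 - delta**2))
        return out

    print(bump([0.3, 0.5, 0.7]))   # 0 at the edges, exp(-1/delta**2) at t0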

2.2.3 The Euler-Lagrange equation

Suppose we are interested in a situation where the space X is a space of C²-
functions φ : [a, b] → R of one variable, t. Suppose moreover that the values of
φ are specified at the endpoints of the interval [a, b],
\[ \phi(a) = A, \quad \phi(b) = B. \]
The other functions in the space X can be written
\[ \phi + \epsilon \]
for some variation ε : [a, b] → R with ε(a) = ε(b) = 0.


Let L be a function of three variables
\[ L(p, q, r) \]
and suppose that the functional we want to minimise is
\[ F(\phi) = \int_a^b L\left(t, \phi(t), \dot\phi(t)\right) dt, \]

in other words the integrand depends only on t, φ and its first derivative. For
instance, the integrand in the length and area functionals above only depended
on φ̇. We call the function L the Lagrangian of our problem.
Now perturb φ by a variation ε and expand the integrand to first order in ε
using the chain rule:
\[ L(t, \phi(t)+\epsilon(t), \dot\phi(t)+\dot\epsilon(t)) = L\left(t,\phi(t),\dot\phi(t)\right) + \epsilon(t)\frac{\partial L}{\partial q}\left(t,\phi(t),\dot\phi(t)\right) + \dot\epsilon(t)\frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) + O(\epsilon^2) \]
Integrating we get
\begin{align*}
F(\phi+\epsilon) &= \int_a^b L\left(t, \phi+\epsilon, \dot\phi+\dot\epsilon\right) dt \\
&= F(\phi) + \int_a^b \frac{\partial L}{\partial q}\left(t,\phi,\dot\phi\right)\epsilon(t)\, dt + \int_a^b \frac{\partial L}{\partial r}\left(t,\phi,\dot\phi\right)\dot\epsilon(t)\, dt + O(\epsilon^2) \\
&= F(\phi) + \int_a^b \left( \frac{\partial L}{\partial q}\left(t,\phi(t),\dot\phi(t)\right) - \frac{d}{dt}\frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) \right)\epsilon(t)\, dt + \left[ \frac{\partial L}{\partial r}\,\epsilon \right]_a^b + O(\epsilon^2)
\end{align*}

where we have integrated by parts and picked up a boundary term. The bound-
ary term vanishes because ε(a) = ε(b) = 0. From this we see that the directional
derivative of F in the ε-direction at φ is
\[ \int_a^b \left( \frac{\partial L}{\partial q}\left(t,\phi(t),\dot\phi(t)\right) - \frac{d}{dt}\frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) \right)\epsilon(t)\, dt. \]

If φ is a critical point then all directional derivatives vanish, which means that
the above integral vanishes for all ε. By Theorem 31 this means that
\[ \frac{\partial L}{\partial q}\left(t,\phi(t),\dot\phi(t)\right) - \frac{d}{dt}\left( \frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) \right) = 0 \]
which is now a second-order differential equation for φ. Note that we have
written d/dt and not ∂/∂t because the object being differentiated is actually just a
function of one variable, t.
Because L(p, q, r) is often written L(t, φ, φ̇), the q- and r-derivatives are often
written
\[ \frac{\partial L}{\partial \phi} \quad\text{and}\quad \frac{\partial L}{\partial \dot\phi} \]
which looks extremely confusing the first time you see it, because how can φ and φ̇
be independent variables? You should just think of it as convenient shorthand
for the equation I’ve just written, and henceforth we’ll adopt this notation too.
The Euler-Lagrange equation is then
\[ \frac{\partial L}{\partial \phi} = \frac{d}{dt}\frac{\partial L}{\partial \dot\phi} \tag{2.1} \]
and that’s what you’ll always see written.
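As a quick sanity check, here is a short sketch (assuming Python with the
sympy library) which reproduces equation (2.1) mechanically for the action
Lagrangian L = φ̇² from earlier:

    import sympy as sp
    from sympy.calculus.euler import euler_equations

    t = sp.symbols('t')
    phi = sp.Function('phi')(t)

    # The action Lagrangian L = phidot**2 from the straight-line example.
    L = sp.diff(phi, t)**2

    # euler_equations forms dL/dphi - d/dt(dL/dphidot) = 0 for us:
    print(euler_equations(L, phi, t))
    # -> [Eq(-2*Derivative(phi(t), (t, 2)), 0)], i.e. phi'' = 0 as before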

2.2.4 Beltrami identity

A second-order differential equation is usually not as easy as a first-order dif-


ferential equation. If the Lagrangian L(p, q, r) is independent of p then it turns
out there is a first-order equation we can use instead of the Euler-Lagrange
equation.
Theorem 33. Suppose L(p, q, r) has no p-dependence (i.e. ∂L/∂p = 0). If φ
satisfies the associated Euler-Lagrange equation then
\[ L\left(t,\phi(t),\dot\phi(t)\right) - \dot\phi(t)\frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) = C \]
for some constant C. This is called the Beltrami identity.

Of course, this is usually written
\[ L - \dot\phi\frac{\partial L}{\partial \dot\phi} = C. \]

Proof. Throughout the proof, we write
\[ L, \quad \partial L/\partial p, \quad\text{etc.} \]
instead of
\[ L\left(t,\phi(t),\dot\phi(t)\right), \quad \frac{\partial L}{\partial p}\left(t,\phi(t),\dot\phi(t)\right), \quad\text{etc.} \]
Using the chain rule, we get
\[ \frac{d}{dt}\left( L - \dot\phi(t)\frac{\partial L}{\partial r} \right) = \frac{\partial L}{\partial p} + \dot\phi(t)\frac{\partial L}{\partial q} + \ddot\phi(t)\frac{\partial L}{\partial r} - \ddot\phi(t)\frac{\partial L}{\partial r} - \dot\phi(t)\frac{d}{dt}\frac{\partial L}{\partial r} \]
By assumption ∂L/∂p = 0 so the first term on the right-hand side vanishes.
The two terms with φ̈ cancel and we are left with
\[ \frac{d}{dt}\left( L - \dot\phi(t)\frac{\partial L}{\partial r} \right) = \dot\phi(t)\left( \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial r} \right). \]

The right-hand side is now clearly recognisable as φ̇(t) times the Euler-Lagrange
operator. Since we’re assuming that φ satisfies the Euler-Lagrange equation, we
see that \( \frac{d}{dt}\left( L - \dot\phi(t)\frac{\partial L}{\partial r} \right) = 0 \), so
\[ L\left(t,\phi(t),\dot\phi(t)\right) - \dot\phi(t)\frac{\partial L}{\partial r}\left(t,\phi(t),\dot\phi(t)\right) = C \]
for some constant C, as required.
When you study classical mechanics in the Lagrangian/Hamiltonian setting
(Analytical Dynamics, next term) you will be able to understand this as a
statement of conservation of energy for an autonomous Hamiltonian system.
Remark 34. It seems strange that we can replace the second-order Euler-
Lagrange equation with its two boundary conditions φ(a) = A, φ(b) = B by
a first-order equation with two boundary conditions (usually we only have the
freedom to fix one boundary condition for a first-order equation). The point is
that the Beltrami identity has an undetermined constant, C, which we can fix
using the second boundary condition.
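Here is a sketch (assuming Python with sympy) verifying the identity at the
heart of the proof for a concrete p-independent Lagrangian, the catenoid-type
L(q, r) = q√(1 + r²) which appears in the next section:

    import sympy as sp

    t = sp.symbols('t')
    phi = sp.Function('phi')(t)
    q, r = sp.symbols('q r')

    Lqr = q * sp.sqrt(1 + r**2)        # a Lagrangian with no p-dependence
    subs = {q: phi, r: sp.diff(phi, t)}

    L = Lqr.subs(subs)
    dLdq = Lqr.diff(q).subs(subs)
    dLdr = Lqr.diff(r).subs(subs)

    beltrami = L - sp.diff(phi, t) * dLdr
    euler_lagrange = dLdq - sp.diff(dLdr, t)

    # d/dt(Beltrami expression) equals phidot * (Euler-Lagrange operator),
    # so the Beltrami quantity is constant along any solution:
    print(sp.simplify(sp.diff(beltrami, t)
                      - sp.diff(phi, t) * euler_lagrange))   # -> 0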

2.2.5 Vector-valued functions

We often come across problems where we have to optimise over a space of vector-
valued functions. For instance, showing that a straight line is shortest involved
looking at two-component functions (γ1 (t), γ2 (t)). In this case there is an Euler-
Lagrange equation for each component. Suppose that the vector-valued function
is (f₁(t), . . . , fₙ(t)) and that
\[ L\left(t, f_1(t), \ldots, f_n(t), \dot f_1(t), \ldots, \dot f_n(t)\right) \]
is the Lagrangian. Then the Euler-Lagrange equations are
\[ \frac{\partial L}{\partial f_k} = \frac{d}{dt}\left( \frac{\partial L}{\partial \dot f_k} \right), \qquad k = 1, \ldots, n. \]
This is easy to see by taking variations individually. For example, the variation
(f₁ + ε₁, f₂, . . . , fₙ) gives rise to the first (k = 1) equation. In fact you could think of
this as like the “partial derivative” of the functional in the f₁-direction (very
loosely speaking).
Problem 35. If ∂L/∂t = 0 then a Beltrami identity holds:
\[ L - \sum_{k=1}^n \dot f_k \frac{\partial L}{\partial \dot f_k} = C. \]

2.2.6 A caveat

When we do these basic variational calculations all we are doing is finding the
critical points of a particular functional. It is possible to do a second-derivative
test to figure out if these are local maxima or minima, however most of the
time we want to know when something is a global maximum or minimum, i.e.
it’s the biggest/smallest of all possibilities. The methods we are developing are
purely local and won’t tell us such global information, in the same way that
ordinary calculus won’t tell you global information for an ordinary function.
The theorems we prove therefore have the form: “Assume a global minimum
x of the functional F : X → R were to exist. Then x would be a straight
line/circle/catenary...”

Problem 36. To illustrate the difficulty (in the finite-dimensional setting),


sketch for me the graph of a function f : R2 → R which admits a local minimum
at the origin but has no global minimum.

There is a set of harder analytical techniques which belong to the calcu-


lus of variations which allow one to prove the existence of global maximis-
ers/minimisers. The idea is roughly the following. Suppose the functional
F : X → R you are interested in is bounded (say from below). Take a se-
quence xk ∈ X such that F (xk ) tends to the infimum inf x∈X F (x). Try to
find a convergent subsequence. This subsequence may not have a limit in
the space X - if the space X consists of C²-functions then maybe there is
some loss of differentiability in the limit. Nonetheless, the subsequence has a
limit in some slightly larger space X̄ (using something like the Arzelà-Ascoli
theorem) and is a global minimum there. This implies that it satisfies a
weak form of the Euler-Lagrange equation (weak in the sense that it may
not be differentiable and hence the Euler-Lagrange equation doesn’t even
make sense for it). Now apply some kind of regularity theory to prove that
the limit is actually as smooth as you want it to be, which implies that
the global minimum exists somewhere in X. You will hopefully meet such
compactness and regularity theorems in future functional analysis/calculus
of variations/geometric analysis courses.
Of course in the examples we care about in this course, this technology proves
the existence results we want. In the words of Hilbert:
Every problem in the Calculus of Variations has a solution,
provided the word ‘solution’ is suitably understood.

2.3 Examples

2.3.1 The catenoid

Let φ : [a, b] → R be a function with φ(a) = A, φ(b) = B and φ(x) > 0 for all
x ∈ [a, b]. Consider the graph of this function inside R²

{(x, z) ∈ R2 : z = φ(x)}

and consider this R2 as the (x, z)-plane in R3 . Rotate the graph around the x-
axis and consider the surface it traces out. This is called a surface of revolution.

Figure 2.1: A surface of revolution, obtained by revolving a graph around an axis.

Lemma 37. The area of this surface of revolution can be written as an integral
\[ 2\pi \int_a^b \phi(t)\sqrt{1+\dot\phi(t)^2}\, dt. \]

Proof. See http://youtu.be/p_xiCrZz_IU. In order to define the area of
the surface of revolution, we approximate it as follows. Take a decomposition
of the interval [a, b] into n pieces of width ε = (b − a)/n. Let tₖ = a + kε,
k = 0, . . . , n. We replace the surface by a union of conical frusta Fₖ, k =
0, . . . , n − 1, namely over the interval [tₖ, tₖ₊₁] we define Fₖ to be the surface
of revolution obtained by revolving a segment of tangent line to the graph
of φ at (tₖ, φ(tₖ)) around the x-axis. This frustum looks like a piece of a
conical surface connecting a circle Cₜₖ and a circle Dₜₖ of radius Qₖ and Rₖ
respectively.
The surface area of this frustum is given by
\[ \text{area}(F_k) = \pi d_k (Q_k + R_k) \]
where dₖ is the distance along the surface of the frustum between the two
end circles. It is clear from the picture that Qₖ = φ(tₖ) and, since the radius

Figure 2.2: Approximating a surface of revolution by smaller and smaller conical frusta.

Rₖ is obtained by moving from (tₖ, φ(tₖ)) along a line of slope φ̇(tₖ) for time
ε, we know that Rₖ = Qₖ + εφ̇(tₖ). We can compute dₖ from Pythagoras’s
theorem
\[ d_k = \sqrt{\epsilon^2 + \epsilon^2\dot\phi(t_k)^2} = \epsilon\sqrt{1+\dot\phi(t_k)^2} \]
so the area of the frustum Fₖ is
\[ \pi\epsilon\sqrt{1+\dot\phi(t_k)^2}\left( 2\phi(t_k) + \epsilon\dot\phi(t_k) \right). \]

The area of the whole surface is defined as the limit
\[ \lim_{n\to\infty} \sum_{k=0}^{n-1} \text{area}(F_k) \]
which is
\[ \lim_{n\to\infty} \sum_{k=0}^{n-1} \pi\epsilon\sqrt{1+\dot\phi(t_k)^2}\left( 2\phi(t_k) + \epsilon\dot\phi(t_k) \right) \]
Let us expand this:
\[ \lim_{n\to\infty} \sum_{k=0}^{n-1} 2\pi\epsilon\sqrt{1+\dot\phi(t_k)^2}\,\phi(t_k) + \lim_{n\to\infty} \sum_{k=0}^{n-1} \pi\epsilon^2\sqrt{1+\dot\phi(t_k)^2}\,\dot\phi(t_k). \]

We would like to ignore the second term. Since φ is continuously differen-
tiable, φ̇ is a continuous function on the closed interval [a, b] and hence it is
bounded. That means that \( \pi\sqrt{1+\dot\phi(t_k)^2}\,\dot\phi(t_k) \le C \) for some constant C. So the

expression inside the limit in the second term is bounded above by


\[ \sum_{k=0}^{n-1} C\epsilon^2 = \sum_{k=0}^{n-1} C(b-a)^2/n^2 = C(b-a)^2/n \]

which goes to zero in the limit as n → ∞. Hence we can ignore this term.
The first term is actually just what we would get if we wrote out the definition
of the Riemann integral of the function
\[ 2\pi\phi(t)\sqrt{1+\dot\phi(t)^2} \]
using the sequence of decompositions we began with. This proves the lemma.

Theorem 38. Assume that there is a C²-function φ : [a, b] → R which gives a
surface of revolution of minimal area subject to the boundary conditions φ(a) =
A, φ(b) = B. Then φ has the form
\[ \phi(x) = C\cosh\left( (x-D)/C \right) \]
for some constants C, D to be determined by the boundary conditions.

This curve is called a catenary curve and that’s why its surface of revolution is
called a catenoid. We’ll meet the catenary curve again in Section 2.4.3 below.

Proof. We are seeking to minimise the functional
\[ F(\phi) = \int_a^b \phi(t)\sqrt{1+\dot\phi(t)^2}\, dt \]
(we have dropped the constant factor 2π, which does not affect the critical
points), i.e.
\[ L(p, q, r) = q\sqrt{1+r^2}. \]
Clearly, L is independent of p, so we have Beltrami’s identity
\[ L - \dot\phi\frac{\partial L}{\partial \dot\phi} = C \]
for some constant C. Since \( \partial L/\partial\dot\phi = \phi\dot\phi/\sqrt{1+\dot\phi^2} \), this becomes
\[ \phi\sqrt{1+\dot\phi^2} - \frac{\dot\phi^2\phi}{\sqrt{1+\dot\phi^2}} = C. \]
Multiplying by \( \sqrt{1+\dot\phi^2} \):
\[ (1+\dot\phi^2)\phi - \dot\phi^2\phi = C\sqrt{1+\dot\phi^2} \]
or
\[ \phi = C\sqrt{1+\dot\phi^2}. \]
This rearranges to give
\[ \dot\phi = \sqrt{\frac{\phi^2}{C^2} - 1}. \]
Now
\begin{align*}
t + D' \equiv \int dt &= \int \frac{\dot\phi}{\sqrt{\phi^2/C^2 - 1}}\, dt \\
&= C\int \frac{d(\phi/C)}{\sqrt{\phi^2/C^2 - 1}} \\
&= C\log\left( 2\left( \frac{\phi}{C} + \sqrt{\frac{\phi^2}{C^2} - 1} \right) \right) \\
&= C\log 2 + C\cosh^{-1}\left( \frac{\phi}{C} \right)
\end{align*}
Therefore, setting D = C log 2 − D′, we have
\[ \phi(t) = C\cosh\left( \frac{t-D}{C} \right) \]
as required.
Problem 39. Suppose that a = −1, b = 1. Show that the equal height case
A = B leads us to φ(t) = C cosh(t/C). For what values of A does this have
a solution? Given the interpretation of a minimal surface of revolution as a
soap-film, how do you think we should interpret this last answer?
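As a numerical hint for Problem 39 (a sketch assuming Python with numpy):
in the symmetric case the boundary value is A = C cosh(1/C), and scanning
over C > 0 reveals a smallest attainable A, below which no catenoid spans
the two rings:

    import numpy as np

    C = np.linspace(0.05, 3.0, 100000)
    A = C * np.cosh(1.0 / C)
    i = np.argmin(A)
    print(f"smallest attainable A ~ {A[i]:.4f} at C ~ {C[i]:.4f}")
    # For A below this threshold there is no catenoid (physically, the
    # soap film pops, leaving two flat discs); for A above it there are
    # two values of C solving A = C*cosh(1/C).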

2.3.2 Area of a frustum

See http://youtu.be/OEeJeBE1vg0. A conical frustum is the surface of a


truncated cone. Let Q and R denote the radii of the circles at the top and
bottom of a frustum (assume Q < R). In the proof of Lemma 37 we used
the following formula for the surface area of a frustum:

πd(Q + R)

where d is the distance between the top and bottom circles as measured on
the surface of the frustum. We will demonstrate this formula here, though
the demonstration begins with an unjustified piece of geometric intuition
that I won’t bother to explain.
The point I will not justify is that you can cut the frustum and lay it out
flat. The result is a planar region: a segment of an annulus. You could
demonstrate this by writing down an explicit isometric (distance-preserving)
map between the cut frustum and the planar region, but I won’t do that.
Given this fact, we will compute the area of the planar region.

Let F be the fraction of the annulus which is subtended by the cut frustum
and let P denote the radius of the inner boundary of the annulus. We know
that the distance between the two boundary components of the annulus is d.
We also know that the length of the inner arc is 2πQ and the length of the
outer arc is 2πR. In other words,

2πP F = 2πQ
2π(P + d)F = 2πR.

This gives
\[ F = \frac{R-Q}{d} \]
and
\[ P = \frac{Qd}{R-Q}. \]
Therefore the area of the annulus, being the difference of the areas of two
discs, is
\[ \pi(P+d)^2 - \pi P^2 = \pi d(d+2P) \]

and the area of the segment is a fraction F of this, giving the area of the
frustum as
\begin{align*}
F\cdot\pi d(d+2P) = \pi(R-Q)(d+2P) &= \pi(R-Q)\left( d + \frac{2Qd}{R-Q} \right) \\
&= \pi d(R-Q+2Q) \\
&= \pi d(Q+R)
\end{align*}
as claimed.

2.3.3 The brachistochrone

Figure 2.3: The brachistochrone curve is the optimal shape of wire for a fric-
tionless bead moving under gravity to slide down between two points in order
to minimise the time taken.

From the Greek βράχιστος + χρόνος meaning “shortest” + “time”, the brachis-
tochrone problem is the problem of finding the shape of a wire joining two points
p and q such that a bead running on this wire under gravity (no friction) gets
from p to q in the shortest time (assuming it starts off at rest at p).
Let’s translate p to the origin and suppose that q is at (x0 , y0 ), with x0 > 0 (so
that p and q are not directly above and below one another) and with y0 < 0 (so
that the bead will really fall from p to q). The kinetic energy of a bead with
speed v is ½mv² and the gravitational potential energy of a bead at (x, y) is
mgy. Therefore
\[ \tfrac12 mv^2 + mgy \]
is constant throughout the trajectory. At p the bead is at rest and y = 0 so this
whole expression vanishes identically. This means that we can solve for v:
\[ v = \sqrt{-2gy} \]

We assume that the wire is the graph of some function y(x). Moreover, we’ll
assume that this function is invertible, that is we can express x as a function of
y and dx/dy = (dy/dx)−1 .

The time taken for the bead to move along the wire to q is
\[ T = \int_p^q dt = \int_p^q \frac{ds}{\dot s} \]
where s(t) is the distance travelled, so ṡ = v. Therefore
\begin{align*}
T &= \int_p^q \frac{ds}{v} \\
&= \int_p^q \frac{\sqrt{dx^2+dy^2}}{\sqrt{-2gy}} \\
&= \int_p^q \frac{\sqrt{1+\left(\frac{dy}{dx}\right)^2}}{\sqrt{-2gy}}\, dx
\end{align*}
and we have a calculus of variations problem on our hands: minimise T as a
functional of the curve y(x) (under the assumption that y(0) = 0 and y(x₀) =
y₀). The Lagrangian for our problem is \( L(p, q, r) = \frac{\sqrt{1+r^2}}{\sqrt{-2gq}} \).

The Lagrangian is independent of its first variable and so we have Beltrami’s
identity
\[ \frac{\sqrt{1+(y')^2}}{\sqrt{-2gy}} - y'\frac{\partial}{\partial y'}\left( \frac{\sqrt{1+(y')^2}}{\sqrt{-2gy}} \right) = C \]
for some constant C. This gives
\[ \frac{1}{\sqrt{-2gy}\sqrt{1+(y')^2}}\left( 1 + (y')^2 - (y')^2 \right) = C \]
or
\[ y' = \sqrt{\frac{1}{-2C^2gy} - 1}. \]
Writing this in terms of x(y), we get
\[ \frac{dx}{dy} = \frac{1}{\sqrt{\frac{1}{-2C^2gy} - 1}} = \sqrt{\frac{-y}{A+y}}, \]
where A = 1/(2C²g). We can now integrate this by substituting y = −A sin²(θ)
and using the condition x(0) = 0 to get
\[ x = A\sin^{-1}\sqrt{\frac{-y}{A}} - \sqrt{-y}\sqrt{A+y}. \]

Remark 40. From Figure 2.3 it should be clear that our assumption (that x
is expressible as a function of y) is a poor one. Of course, the end result is
reassuringly multivalued (it’s got a sin⁻¹ and a square root). So although we made an
illegal assumption, the answer is actually correct. This is the beauty of the Euler-
Lagrange equations: their locality - from a global assumption about a function
(that it minimises a functional) you derive an equation for the local behaviour
of that extremal function. In the same way it’s kind of magical that light travels
in such a way as to minimise the time taken during travel, by only obeying local
laws of physics (without knowing beforehand what its endpoint is). If we were
feeling careful we could make this brachistochrone argument more rigorous, but
for the purposes of this course we’ll just be satisfied that we got the right answer.
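For the sceptical, here is a numerical sanity check (a sketch assuming Python
with numpy; A = 2 is an arbitrary choice) that the closed form for x satisfies
dx/dy = √(−y/(A + y)); writing Y = −y for the depth below the start keeps
the square roots real:

    import numpy as np

    A = 2.0                             # A = 1/(2*C**2*g), any positive value
    Y = np.linspace(0.1, 1.5, 2000)     # Y = -y, depth below the start
    x = A * np.arcsin(np.sqrt(Y / A)) - np.sqrt(Y * (A - Y))

    # Central finite differences vs the predicted sqrt(Y/(A-Y)):
    dxdY = (x[2:] - x[:-2]) / (Y[2:] - Y[:-2])
    predicted = np.sqrt(Y[1:-1] / (A - Y[1:-1]))
    print(np.max(np.abs(dxdY - predicted)))   # tiny (~1e-6): identity holds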

2.4 Constrained problems

We have already considered constrained optimisation in the finite-dimensional
setting. Exactly the same idea works in infinite-dimensional functional opti-
misation problems, but it is easier not to have to think in terms of tangencies of
infinite-dimensional hypersurfaces! I will explain how the algorithm works and
give some examples.

2.4.1 The algorithm

The algorithm is the same as usual. You have a functional defined on some
space of functions φ(t). This functional is given by an integral
\[ F(\phi) = \int_a^b L(t, \phi, \dot\phi)\, dt \]
and you want to find its extrema. However, you also want to impose a constraint
on the functions φ. Maybe you want to fix the total arc-length of the graph of
φ,
\[ \int_a^b \sqrt{1+\dot\phi^2}\, dt \]
or the area underneath the graph
\[ \int_a^b \phi(t)\, dt. \]
Notice that these constraints are given by integrals; let’s consider the general
constraint:
\[ \int_a^b M(t, \phi, \dot\phi)\, dt = C. \]

We impose this by introducing a Lagrange multiplier λ ∈ R. The full functional
we want to consider is
\[ F(\phi) = \int_a^b \left( L\left(t,\phi,\dot\phi\right) + \lambda\left( M\left(t,\phi,\dot\phi\right) - \frac{C}{b-a} \right) \right) dt. \]
Varying with respect to λ gives
\[ \frac{\partial F}{\partial \lambda} = \int_a^b M\left(t,\phi,\dot\phi\right) dt - C = 0. \]
Varying with respect to φ gives the usual Euler-Lagrange equation but where L
is replaced by L + λ(M − C):
\[ \frac{\partial}{\partial \phi}\left( L + \lambda(M-C) \right) = \frac{d}{dt}\left( \frac{\partial}{\partial \dot\phi}\left( L + \lambda(M-C) \right) \right). \]

2.4.2 The isoperimetric problem

For an interesting historical survey of this problem and approaches to proving


it, see this article by V. Blåsjö in the American Mathematical Monthly.
The isoperimetric problem is to find a curve in the plane with a fixed length K
which bounds an area A which is maximal amongst all areas bounded by plane
curves of length K. The answer is, of course, a circle of radius K/2π. We will
prove the following:
Theorem 41. If there exists a 2π-periodic (i.e. closed) C²-curve γ : R → R²
of length K which maximises the area it bounds amongst all closed C²-curves of
length K, then it is a circle.

Note that, as usual, we do not prove existence of a maximiser! We will simply


show that the only critical point of a suitable functional on the space of curves
is the circle. In the problem sheets we will use Fourier analysis to prove the
stronger statement that the circle is a maximiser.

Proof of Theorem 41. If C is a curve parametrised by a 2π-periodic C²-function
γ : R → R² and B is the bounded component of R²\C then we seek to maximise
the area integral \( \int\!\!\int_B dx\,dy \) subject to the condition that \( \int_0^{2\pi}\sqrt{\dot\gamma_1^2+\dot\gamma_2^2}\, dt = K \).
Of course, this integral doesn’t look quite right yet: we’re used to integrating
along a curve, not over an area. By Green’s theorem:
\[ \int\!\!\int_B dx\,dy = \int_B d(x\,dy) = \int_C x\,dy = \int_0^{2\pi} \gamma_1(t)\dot\gamma_2(t)\, dt \]
so our functional becomes
\[ F(\gamma) = \int_0^{2\pi} \left( \gamma_1(t)\dot\gamma_2(t) - \lambda\left( \sqrt{\dot\gamma_1^2+\dot\gamma_2^2} - \frac{K}{2\pi} \right) \right) dt \]

where λ is a Lagrange multiplier. Let’s write x and y rather than γ₁ and γ₂.
The Lagrangian is
\[ L(x, y, \dot x, \dot y) = x\dot y - \lambda\left( \sqrt{\dot x^2+\dot y^2} - \frac{K}{2\pi} \right) \]
and the Euler-Lagrange equations are
\begin{align*}
\dot y &= -\frac{d}{dt}\left( \frac{\lambda\dot x}{\sqrt{\dot x^2+\dot y^2}} \right) \\
0 &= \frac{d}{dt}\left( x - \frac{\lambda\dot y}{\sqrt{\dot x^2+\dot y^2}} \right) \\
K &= \int_0^{2\pi} \sqrt{\dot x^2+\dot y^2}\, dt
\end{align*}
(differentiating F(γ) in the γ₁-, γ₂- and λ-directions respectively).


Now we do something cunning, which we also did when we dealt with straight
lines. If we are given a path [a, b] → R², we can reparametrise that path to get
a new path [a, b] → R². This way we can make a particle travel along it at any
speed v(t) we want, provided that
\[ \int_a^b v(t)\, dt = K \]
and we certainly don’t change the length of the path (the distance the particle
travels). For the rigorous justification of this, see Section 2.1.3. In particular, we
can parametrise proportionally to arc-length and define our new time coordinate
\( s(t) = \frac{2\pi}{K}\int_0^t \sqrt{\dot\gamma_1^2+\dot\gamma_2^2}\, dt \). This satisfies
\[ \frac{d}{ds}\int_0^s \sqrt{\dot\gamma_1(\sigma)^2+\dot\gamma_2(\sigma)^2}\, d\sigma = \frac{K}{2\pi} \]
in other words
\[ \sqrt{\left( \frac{d\gamma_1}{ds} \right)^2 + \left( \frac{d\gamma_2}{ds} \right)^2} = \frac{K}{2\pi} \]
or
\[ \sqrt{\dot x^2 + \dot y^2} = \frac{K}{2\pi}. \]
The Euler-Lagrange equations simplify to
\begin{align*}
\dot y &= -\frac{2\pi\lambda}{K}\ddot x \\
\dot x &= \frac{2\pi\lambda}{K}\ddot y
\end{align*}
(note that the other equation is automatically satisfied). We have
\[ \frac{d^3y}{ds^3} = -\frac{K^2}{4\pi^2\lambda^2}\frac{dy}{ds} \]

so ẏ is a harmonic oscillator. This implies that the solution is in fact a circle
(which could be centred anywhere and parametrised starting from any angle).
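A quick symbolic check (assuming Python with sympy) that the third-order
equation really does force circular motion:

    import sympy as sp

    s = sp.symbols('s')
    k = sp.symbols('k', positive=True)   # k stands for K/(2*pi*lambda)
    y = sp.Function('y')

    # y''' = -k**2 * y', the equation derived above:
    sol = sp.dsolve(sp.Eq(y(s).diff(s, 3), -k**2 * y(s).diff(s)))
    print(sol)   # y(s) = C1 + C2*sin(k*s) + C3*cos(k*s)
    # So y (and likewise x) performs simple harmonic motion about a
    # centre: the curve is a circle traversed at constant speed.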

It seems slightly devious that, halfway through a Lagrange-multiplier proof, we


reparametrised so we could ignore the Lagrange-multiplier equation. The ques-
tion I asked myself when I started off writing down this proof was: Why don’t
we restrict attention to arc-length parametrisations from the very beginning? If
you try it, you’ll get the wrong answer. The reason is that the space of paths
parametrised by arc-length is not a nice flat space. If you add an arbitrary per-
turbation γ + ε, you completely ruin the condition that the curve is parametrised
by arc-length. For the same reason, we have to impose the length-fixing con-
straint with a Lagrange multiplier instead of by starting off with the space of
all curves of a fixed length.
So why were we allowed to reparametrise then? Because we knew that, if a
solution of the constrained problem existed, we could reparametrise it afterwards
and what we obtained would still be an extremum of the constrained problem.
So we might as well restrict ourselves to curves parametrised by arc-length, but
we really need to be using the constrained Euler-Lagrange equation.

2.4.3 The catenary

Coming from the Latin word “catena” meaning chain, the catenary is the curve
which follows a hanging chain. More formally, if (a, A) and (b, B) are points
in a vertical plane and there is an infinitely thin chain of uniform density ρ
(mass per unit length) whose endpoints are fixed at these points, hanging freely
under gravity and having fixed length K then the chain fits a curve called the
catenary, given by the graph of a function φ(t), t ∈ [a, b] which is the solution
to a variational problem.
The variational problem is this: minimise the total potential energy of the chain
subject to the condition that its length equals K. The potential energy of a mass
at height y is mgy so each “infinitesimal element” of our chain (x(t), y(t)) =
(t, f (t)) has:
• length equal to \( \sqrt{dx^2+dy^2} = \sqrt{1+\dot f^2}\, dt \),
• mass equal to \( \rho\sqrt{1+\dot f^2}\, dt \),
• potential energy equal to \( \rho g f\sqrt{1+\dot f^2}\, dt \).

Integrating this over the whole interval [a, b] gives the total potential energy
\[ \rho g \int_a^b f(t)\sqrt{1+\dot f(t)^2}\, dt \]

and the total length is
\[ K = \int_a^b \sqrt{1+\dot f(t)^2}\, dt. \]
The functional for the constrained variational problem is therefore
\[ \rho g \int_a^b f(t)\sqrt{1+\dot f(t)^2}\, dt + \lambda\left( \int_a^b \sqrt{1+\dot f(t)^2}\, dt - K \right) \]
where λ is a Lagrange multiplier.


Problem 42. Find the constrained Euler-Lagrange equations for this problem
and show that the solution is a catenary curve.

2.5 Several dimensions


The problems we have considered so far have all concerned optimising functions
of one variable. The resulting Euler-Lagrange equation is a second-order ordi-
nary differential equation. Things become even more interesting if we consider
variational problems with functions of several variables. The theory is almost
identical except that ordinary derivatives become partial derivatives and the
equations become harder!

2.5.1 Euler-Lagrange equation in two variables

The generalisation to many variables will hopefully be clear. We consider a
functional which takes functions φ(x, y) of two variables and outputs an integral
\[ \int\!\!\int L(x, y, \phi, \partial_x\phi, \partial_y\phi)\, dx\, dy \]
where L(p, q, r, s, t) is a function of five variables. The usual Euler-Lagrange
argument applies but we need to use the several-variable version of the chain
rule. This results in the equation
\[ \frac{\partial L}{\partial r} = \frac{\partial}{\partial x}\left( \frac{\partial L}{\partial s} \right) + \frac{\partial}{\partial y}\left( \frac{\partial L}{\partial t} \right) \]
where all the derivatives of L are evaluated at
\[ (p, q, r, s, t) = (x, y, \phi(x,y), (\partial_x\phi)(x,y), (\partial_y\phi)(x,y)). \]
In order to reflect this, the equation is usually written
\[ \frac{\partial L}{\partial \phi} = \frac{\partial}{\partial x}\left( \frac{\partial L}{\partial \phi_x} \right) + \frac{\partial}{\partial y}\left( \frac{\partial L}{\partial \phi_y} \right). \]

where we are using the notation φx = ∂φ/∂x, φy = ∂φ/∂y simply to minimise


the number of ∂s floating around.

2.5.2 Laplace’s equation

Consider a plane region B bounded by a closed curve C and let f : C → R be a
function. We seek a C²-function ϕ : B → R satisfying ϕ|_C = f and minimising
the following integral over all such functions:
\[ F(\varphi) = \int\!\!\int_B \left( (\partial_x\varphi)^2 + (\partial_y\varphi)^2 \right) dx\, dy = \int\!\!\int_B |\nabla\varphi|^2\, dx\, dy. \]

Since
\[ \frac{\partial L}{\partial \varphi} = 0 \]
the Euler-Lagrange equation is
\[ 0 = \frac{\partial}{\partial x}\left( 2\frac{\partial\varphi}{\partial x} \right) + \frac{\partial}{\partial y}\left( 2\frac{\partial\varphi}{\partial y} \right) = 2\Delta\varphi \]
where ∆ denotes the Laplace operator
\[ \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \]

The extremals therefore satisfy ∆ϕ = 0 (Laplace’s equation) and are har-
monic functions. We will meet Laplace’s equation again later as the steady-
temperature case of the heat equation: temperature distributions evolve over
time via the heat equation
\[ \frac{\partial\varphi}{\partial t} = \Delta\varphi \]
and, after a long time when the temperature reaches a steady state so that
∂ϕ/∂t = 0, we achieve a steady-temperature distribution satisfying Laplace’s
equation. You can see from our variational characterisation that the heat equa-
tion is acting to minimise the total gradient of the temperature distribution (as
defined by the functional F (ϕ)). This fits well with our physical intuition.
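Here is a minimal numerical sketch of this picture (assuming Python with
numpy; the unit square, the 50×50 grid and the boundary data are arbitrary
choices of mine). Jacobi relaxation repeatedly replaces each interior value by
the average of its four neighbours, which is the discrete version of Laplace’s
equation, and it drives the gradient functional F(ϕ) down:

    import numpy as np

    n = 50
    phi = np.zeros((n, n))
    phi[0, :] = 1.0      # Dirichlet data f: top edge held at 1, rest at 0

    for _ in range(5000):
        phi[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1]
                                  + phi[1:-1, :-2] + phi[1:-1, 2:])

    # The five-point Laplacian of the interior is now essentially zero:
    lap = (phi[:-2, 1:-1] + phi[2:, 1:-1] + phi[1:-1, :-2]
           + phi[1:-1, 2:] - 4.0 * phi[1:-1, 1:-1])
    print(np.abs(lap).max())   # ~ 0: phi is (discretely) harmonic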

2.5.3 Minimal surface equation

Let C ⊂ R² be a closed curve in the plane bounding a region B and let f : C →
R be a given function. We would like to find an extension ϕ of f to the whole
of B which minimises the surface area of Graph(ϕ). This is precisely what a
soap-film would do if you dipped the graph of f (considered as a wire frame)
into a suitably soapy liquid mixture.
Lemma 43. The surface area of Graph(ϕ) is
\[ \int_B \sqrt{1 + \left( \frac{\partial\varphi}{\partial x} \right)^2 + \left( \frac{\partial\varphi}{\partial y} \right)^2}\, dx\, dy. \]

Proof. I’ll only give a sketch proof. Take a tangent plane to the graph of ϕ at
a point on the graph. Two of the vectors in this tangent plane are (1, 0, ∂ₓf)
and (0, 1, ∂ᵧf). The area of the parallelogram which they span is equal to the
magnitude of their cross-product
\[ (1, 0, \partial_x f) \times (0, 1, \partial_y f) = (-\partial_x f, -\partial_y f, 1) \]
and this magnitude is precisely the integrand. Now approximate the surface by
such parallelograms (living over squares of edge-length ε in R²) and take better
and better approximations (ε → 0). This yields the integral.
Theorem 44. If ϕ is a C²-function with ϕ|_C = f which minimises the surface
area of its graph amongst all such functions then ϕ satisfies
\[ \left( 1 + \left( \frac{\partial\varphi}{\partial y} \right)^2 \right)\frac{\partial^2\varphi}{\partial x^2} + \left( 1 + \left( \frac{\partial\varphi}{\partial x} \right)^2 \right)\frac{\partial^2\varphi}{\partial y^2} = 2\frac{\partial\varphi}{\partial x}\frac{\partial\varphi}{\partial y}\frac{\partial^2\varphi}{\partial x\partial y}. \]

The proof is consigned to one of the problem sets: I could not deprive you of
the pleasure of this calculation.
Part II

Partial differential equations

Chapter 3

Some general theory

3.1 The definition


A partial differential equation (PDE) for a function φ(x₁, . . . , xₙ) of n variables
is a relation between the values of φ and (finitely many of) its partial derivatives
(to arbitrary orders) at each point. That is, the relation never relates the partial
derivatives or values of φ at different points - that would be a difference equation,
and would probably be harder to solve. We will write F(φ)(x, y) = 0 for this
relation at each point (x, y). We should probably write something like
\[ F\left( x, y, \phi(x,y), (\partial_x\phi)(x,y), (\partial_y\phi)(x,y), (\partial_x^2\phi)(x,y), \ldots \right) = 0 \]
where the dots indicate a finite list, but that would be cumbersome.
For example,
\[ F(\phi)(x,y) = \frac{\partial\phi}{\partial x}(x,y) + \phi(x,y)\frac{\partial\phi}{\partial y}(x,y) - xy = 0 \]
or
\[ \left( \frac{\partial\phi}{\partial x} \right)^2 + \left( \frac{\partial\phi}{\partial y} \right)^2 = 1 \]
or
\[ \frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2} \]
are PDEs, where we have already started to omit the argument from each term.
On the other hand,
\[ \partial_x\phi(x,y) = \partial_y\phi(x+1,\ y-1) \]
and
\[ \partial_x\phi + \partial_x^2\phi + \partial_x^3\phi + \cdots = 0 \]


are not PDEs (one is a difference equation, one involves infinitely many deriva-
tives).
The highest order of derivatives which appears in the equation is called the order
of the equation. We will first consider first-order equations, then second-order
equations. But here are some more definitions:
Definition 45.
• A PDE F(φ) = 0 is called linear if
\[ F(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1 F(\phi_1) + \lambda_2 F(\phi_2) \]
for all λ₁, λ₂ ∈ R. In practical terms, this means that F(φ) is a sum of
terms of the form
\[ A\,\partial_1\cdots\partial_k\phi \]
where A depends only on the variables x₁, . . . , xₙ and ∂₁ · · · ∂ₖ is some
string of partial derivatives. We also allow inhomogeneous linear equa-
tions, F(φ) = R where F is linear and R is a function of x₁, . . . , xₙ.
• If, moreover, each coefficient A is just a constant, we say the equation has
constant coefficients.
• A PDE is called quasilinear if, in each term involving highest (say kth)
order partial derivatives ∂₁ · · · ∂ₖφ, the coefficient doesn’t involve any other
kth derivatives. So
\[ \frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y} + x\frac{\partial\phi}{\partial y} = 1 \]
is not quasilinear because it is first order but there is a term involving a
product of first derivatives. The Burgers inviscid fluid equation
\[ \phi\frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial t} = 0 \]
is quasilinear.
• If a PDE is not quasilinear then it is called fully nonlinear. For example
the eikonal equation
\[ \left( \frac{\partial\phi}{\partial x} \right)^2 + \left( \frac{\partial\phi}{\partial y} \right)^2 = 1 \]
important in geometric optics, or the Monge-Ampère equation
\[ \frac{\partial^2\phi}{\partial x^2}\frac{\partial^2\phi}{\partial y^2} - \left( \frac{\partial^2\phi}{\partial x\partial y} \right)^2 \]
important in geometry.

Most of the PDEs we are going to talk about in this course are linear, with the
exception of a few quasilinear and fully nonlinear first order equations (like the
eikonal equation). We will start by talking about first-order equations because
there is a very well-developed theory for them. Then we will talk about some
specific second-order equations. I will take a moment to discuss the second-order
equations we’re going to study.

3.2 Quasilinear second order equations in two variables

The general quasilinear second order equation in two variables (x, y) is
\[ A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} = R \]
where A, B, C, R are arbitrary functions of x, y, φ, ∂ₓφ and ∂ᵧφ.
Definition 46. Define ∆ = B² − 4AC. This is a function of x, y, φ, ∂ₓφ, ∂ᵧφ.
However, sometimes it happens that this function is always positive, always
negative or always zero. For instance, if the equation is linear and has constant
coefficients then ∆ is just a number.
• If ∆ < 0 we say that the equation is elliptic,
• if ∆ = 0 we say that the equation is parabolic,
• and if ∆ > 0 we say that the equation is hyperbolic.
This is only a classification for linear, constant coefficient equations for which
∆ > / = / < 0 makes sense. More generally, it is a guideline - for example, we
expect solutions to quasilinear elliptic equations to share certain properties with
solutions to linear elliptic equations.

Here are some examples of linear equations with constant coefficients.


Example 47. Laplace’s equation
\[ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0 \]
has A = C = 1, B = 0 so ∆ = −4 < 0, which means it is an elliptic equation.
Example 48. The heat equation
\[ \frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2} \]
has A = 1, B = 0, C = 0. Therefore ∆ = 0 and the equation is parabolic.
Example 49. The wave equation
\[ \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{\partial^2\phi}{\partial x^2} \]
has A = 1/c², B = 0, C = −1 so ∆ = 4/c² > 0 and the equation is hyperbolic.

This classification may seem like something quite formal, but the three types
of equations really do display very different kinds of behaviour (and quasilinear
equations of elliptic type really do behave very like linear elliptic equations).
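For constant-coefficient equations the classification is purely mechanical;
here is a tiny helper (a sketch in Python) which reproduces the three
examples above:

    def classify(A: float, B: float, C: float) -> str:
        """Classify A*phi_xx + B*phi_xy + C*phi_yy by Delta = B^2 - 4AC."""
        delta = B**2 - 4*A*C
        if delta < 0:
            return "elliptic"
        if delta == 0:
            return "parabolic"
        return "hyperbolic"

    print(classify(1, 0, 1))    # Laplace's equation: elliptic
    print(classify(1, 0, 0))    # heat equation: parabolic
    print(classify(1, 0, -1))   # wave equation (c = 1): hyperbolic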

Problem 50. Show that the (quasilinear) minimal surface equation
\[ \left( 1 + \left( \frac{\partial\varphi}{\partial y} \right)^2 \right)\frac{\partial^2\varphi}{\partial x^2} + \left( 1 + \left( \frac{\partial\varphi}{\partial x} \right)^2 \right)\frac{\partial^2\varphi}{\partial y^2} = 2\frac{\partial\varphi}{\partial x}\frac{\partial\varphi}{\partial y}\frac{\partial^2\varphi}{\partial x\partial y} \]

from Section 2.5.3 is elliptic.

3.3 Some basic tricks


Let’s restrict our attention to linear, homogeneous equations in two variables
with constant coefficients. We take first- and second-order equations and look
at some basic tricks for solving them. These tricks amount to making clever
changes of coordinates and will generalise to the method of characteristics for
first-order equations. They will also make it clear why ∆ is a natural quantity
to consider for second order equations.

3.3.1 First order

Suppose we have a first-order equation we want to solve:
\[ A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = 0. \]
The expression \( A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} \) is just the partial derivative of φ in the (A, B)-
direction. Consider the family of all lines parallel to the vector (A, B), that is
the lines
\[ \{ (x_0 + tA,\ y_0 + tB) : t \in \mathbb{R} \}. \]
Suppose we have a solution φ and we restrict it to one of these lines:
\[ u(t) = \phi(x_0 + tA,\ y_0 + tB). \]
Then we know
\[ \frac{du}{dt} = A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} \]
so
\[ \frac{du}{dt}(t) + Cu(t) = 0. \]
This is an ordinary differential equation equivalent to
\[ \frac{d}{dt}\left( e^{Ct}u \right) = 0 \]
and has solution
\[ u(t) = Ke^{-Ct}. \]

Suppose that A ≠ 0. Let x₀ = 0. Then, as t and y₀ vary, the point (tA, y₀ + tB)
traces out the whole of R². Indeed, the point (x, y) corresponds to t = x/A,
y₀ = y − xB/A. Therefore
\[ \phi(x,y) = K(y - xB/A)\, e^{-Cx/A}. \]
Note that we are allowing the ‘constant of integration’ K to be an arbitrary
function of y − xB/A = y₀ because y₀ was fixed when we restricted to the
line (At, y₀ + tB). This makes sense because y − xB/A is annihilated by the
directional derivative
\[ A\frac{\partial}{\partial x} + B\frac{\partial}{\partial y}. \]

This basic trick is the idea behind the method of characteristics: using the
original equation, find a system of curves (in this case straight lines) along
which the PDE reduces to an ODE which we can solve.
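A quick symbolic check (assuming Python with sympy) that the formula
above solves the equation for an arbitrary C¹ function K:

    import sympy as sp

    x, y, A, B, C = sp.symbols('x y A B C')
    K = sp.Function('K')
    phi = K(y - x*B/A) * sp.exp(-C*x/A)

    residual = A*phi.diff(x) + B*phi.diff(y) + C*phi
    print(sp.simplify(residual))   # -> 0 for any K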

3.3.2 Second order

Consider the equation
\[ A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} = 0 \tag{3.1} \]
and suppose that ∆ = B² − 4AC > 0, i.e. the equation is hyperbolic. Then the
quadratic polynomial
\[ At^2 + Bt + C = 0 \tag{3.2} \]
has two distinct real roots
\[ t_\pm = \frac{-B \pm \sqrt{\Delta}}{2A} \]
so that At² + Bt + C = A(t − t₋)(t − t₊). Define s± = y + xt± so that
\[ x = \frac{s_+ - s_-}{t_+ - t_-}, \qquad y = \frac{t_+ s_- - t_- s_+}{t_+ - t_-}. \]

Then
\begin{align*}
\frac{\partial}{\partial s_+} &= \frac{\partial x}{\partial s_+}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_+}\frac{\partial}{\partial y} = \frac{1}{t_+ - t_-}\left( \frac{\partial}{\partial x} - t_-\frac{\partial}{\partial y} \right) \\
\frac{\partial}{\partial s_-} &= \frac{\partial x}{\partial s_-}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_-}\frac{\partial}{\partial y} = -\frac{1}{t_+ - t_-}\left( \frac{\partial}{\partial x} - t_+\frac{\partial}{\partial y} \right).
\end{align*}

This means that
\[ \frac{\partial}{\partial s_+}\frac{\partial}{\partial s_-} = -\frac{1}{(t_+-t_-)^2}\left( \frac{\partial}{\partial x} - t_-\frac{\partial}{\partial y} \right)\left( \frac{\partial}{\partial x} - t_+\frac{\partial}{\partial y} \right) = -\frac{1}{A(t_+-t_-)^2}\left( A\frac{\partial^2}{\partial x^2} + B\frac{\partial^2}{\partial x\partial y} + C\frac{\partial^2}{\partial y^2} \right) \]
and hence, if φ solves (3.1),
\[ \frac{\partial^2\phi}{\partial s_+\partial s_-} = 0. \]
The general solution to this equation is a sum of two completely arbitrary dif-
ferentiable functions:
\[ f(s_+) + g(s_-), \]
i.e.
\[ f(y + xt_+) + g(y + xt_-). \]

If the original equation were instead parabolic then the quadratic polynomial
(3.2) would have two identical real roots t₀ and the general solution would be
f(y + xt₀) + xg(y + xt₀). To see this, use the new coordinates (u, v) = (y + t₀x, x)
instead of s±.
If the original equation were elliptic then (3.2) would have two non-real complex-
conjugate roots z and z*. The general solution is once again
\[ f(y + xz) + g(y + xz^*) \]
but the functions f and g (which may be complex) have to be chosen so as to
make their sum real. For instance,
\[ f(t) = g(t) = t \]
would give
\[ y + xz + y + xz^* = 2y + 2x\,\mathrm{Re}(z) \]
as the solution.

Examples

Example 51. Suppose we wish to find the general solution to
\[ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial x\partial y} - 2\frac{\partial^2\phi}{\partial y^2} = e^x. \]
The quadratic equation is t² + t − 2 = (t − 1)(t + 2) and therefore t₋ = −2, t₊ = 1.
Then s₋ = y − 2x and s₊ = y + x and the equation becomes
\[ -9\frac{\partial^2\phi}{\partial s_+\partial s_-} = \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial x\partial y} - 2\frac{\partial^2\phi}{\partial y^2} = e^x. \]
Note that x = (s₊ − s₋)/3 so our equation is
\[ \frac{\partial^2\phi}{\partial s_+\partial s_-} = -\frac{e^{(s_+-s_-)/3}}{9}. \]
Integrating with respect to s₊ and s₋ gives
\[ \phi = e^{(s_+-s_-)/3} + F(s_+) + G(s_-) \]
or
\[ \phi(x,y) = e^x + F(y+x) + G(y-2x). \]

Now let us see how to solve for F and G when given initial conditions φ(x, 0) =
M(x) and ∂φ/∂y(x, 0) = N(x).
Example 52. Say we require φ(x, 0) = x² and ∂φ/∂y(x, 0) = x³. Then
\begin{align*}
x^2 &= e^x + F(x) + G(-2x) \\
x^3 &= F'(x) + G'(-2x)
\end{align*}
Differentiating the first of these gives
\[ 2x = e^x + F'(x) - 2G'(-2x) \]
and now we have two simultaneous equations for F′ and G′. These give
\[ 3F'(x) + e^x = 2(x^3 + x) \]
or
\[ F(x) = \frac{x^4}{6} + \frac{x^2}{3} - \frac{e^x}{3}. \]
Plugging back into the very first equation gives
\[ G(-2x) = x^2 - e^x - F(x) \]
or
\[ G(-2x) = \frac{2x^2}{3} - \frac{x^4}{6} - \frac{2e^x}{3}. \]
Substituting z = −2x gives
\[ G(z) = \frac{z^2}{6} - \frac{z^4}{96} - \frac{2e^{-z/2}}{3}. \]
Hence
\[ \phi(x,y) = e^x + \frac{(x+y)^4}{6} + \frac{(x+y)^2}{3} - \frac{e^{x+y}}{3} + \frac{(y-2x)^2}{6} - \frac{(y-2x)^4}{96} - \frac{2e^{-(y-2x)/2}}{3}. \]
Chapter 4

First order PDE

As usual, I will only deal with two-variable equations, mainly for notational
convenience.

4.1 Linear case

We have already seen how to solve first order linear homogeneous PDE with
constant coefficients in Section 3.3.1. Let’s just quickly extend this to the inho-
mogeneous case. The equation we want to solve is
\[ A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = D(x,y). \]
The trick we employed before works again: we restrict attention to the line
(At, y₀ + Bt), along which the equation for u(t) = φ(At, y₀ + Bt) reduces to an
ODE
\[ \frac{du}{dt} + Cu = D(At,\ y_0 + Bt) \]
or
\[ \frac{d}{dt}\left( e^{Ct}u \right) = e^{Ct}D(At,\ y_0 + Bt) \]
and the solution consists of a particular integral
\[ u(t) = e^{-Ct}\int_0^t e^{Cs}D(As,\ y_0 + Bs)\, ds \]
plus a complementary function
\[ K(y_0)e^{-Ct} \]


with a ‘constant’ depending on y₀, and we can substitute t = x/A, y₀ = y − Bx/A
as before to solve the original problem:
\[ \phi(x,y) = e^{-Cx/A}\int_0^{x/A} e^{Cs}D(As,\ y - Bx/A + Bs)\, ds + K(y - Bx/A)\, e^{-Cx/A}. \]

4.2 Linear case, varying coefficients


Now we allow A, B, C to depend on (x, y) too (but not yet on φ). The equation
we want to solve is
\[ A(x,y)\frac{\partial\phi}{\partial x} + B(x,y)\frac{\partial\phi}{\partial y} + C(x,y)\phi = D(x,y). \]
The operator \( A(x,y)\frac{\partial}{\partial x} + B(x,y)\frac{\partial}{\partial y} \) has an obvious interpretation as a direc-
tional derivative, but for the varying vector field v(x, y) = (A(x,y), B(x,y)).
We need a replacement for the family of straight lines (At, y₀ + tB) in the previous
case.
Definition 53. An integral curve for the vector field v(x, y) is a curve γ : R →
R2 such that
γ̇(t) = v(γ(t)),
in other words, a curve whose tangent vector is everywhere given by v.
Example 54. Suppose that v(x, y) = (A, B), where A and B are constant.
Then the lines (x0 , y0 ) + t(A, B) are integral curves for v.
Example 55. Suppose v(x, y) = (−y, x). Then the family of circles of radius
r, γ(t) = (r cos(t), r sin(t)), are integral curves for v. Indeed,

γ̇(t) = (−r sin(t), r cos(t)) = v(γ(t)).

We’ll sort out the problem of actually finding the integral curves momentarily.
Let’s assume for the moment that we have found them. If γ(t) is a parametri-
sation of an integral curve then
\[ \frac{d}{dt}\phi(\gamma(t)) = d_{\gamma(t)}\phi(\dot\gamma(t)) = d_{\gamma(t)}\phi(v) \]
which is just the directional derivative of φ in the v-direction at γ(t), in other
words
\[ A(\gamma(t))\frac{\partial\phi}{\partial x} + B(\gamma(t))\frac{\partial\phi}{\partial y}. \]
Therefore, setting f(t) = φ(γ(t)), we have
\[ \frac{df}{dt}(t) = D(\gamma(t)) - C(\gamma(t))f(t) \]

which is an ordinary differential equation for f.
So the only difficulty is in working out γ. We have
\[ \frac{d\gamma_1}{dt} = A(\gamma_1(t), \gamma_2(t)), \qquad \frac{d\gamma_2}{dt} = B(\gamma_1(t), \gamma_2(t)) \]
which is a system of two ODEs for two unknowns γ₁, γ₂.
Therefore in order to solve the PDE, we have to solve a pair of coupled ODEs
for γ and then a further ODE to find f along γ. This latter ODE introduces
a ‘constant of integration’, but this ‘constant’ is allowed to vary as you move
from one integral curve γ to another.
Definition 56 (Characteristics). The curves γ(t) are called the characteristics
of our equation. Note that when we solve the system of ODEs for γ we get two
constants of integration and hence a two-parameter family of curves. One of
these parameters can always be fixed because it will correspond to reparametrising
the characteristics.

Let’s do an example.
Example 57. Consider
\[ \frac{\partial\phi}{\partial x} + 2x\frac{\partial\phi}{\partial y} = 1. \]
We have A(x, y) = 1, B(x, y) = 2x and hence
\[ \frac{d\gamma_1}{dt} = 1, \qquad \frac{d\gamma_2}{dt} = 2\gamma_1 \]
which has solution γ₁(t) = t + M and γ₂(t) = (t + M)² + N; in other words,
the integral curves are parabolae. Let us fix M = 0 by translating the time
parameter.
Now for each N define f(t) = φ(t, t² + N) and we need to solve df/dt = 1. This
gives f(t) = t + K(N) or, given that x = t and y − x² = N,
\[ \phi(x,y) = x + K(y - x^2) \]
which is the general solution of our problem.
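A one-line check of this answer (assuming Python with sympy):

    import sympy as sp

    x, y = sp.symbols('x y')
    K = sp.Function('K')
    phi = x + K(y - x**2)

    print(sp.simplify(phi.diff(x) + 2*x*phi.diff(y) - 1))   # -> 0 for any K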


Example 58. Consider
\[ y\frac{\partial\phi}{\partial x} + x\frac{\partial\phi}{\partial y} + xy\phi = 0. \]
We have A(x, y) = y, B(x, y) = x and hence
\[ \frac{d\gamma_1}{dt} = \gamma_2, \qquad \frac{d\gamma_2}{dt} = \gamma_1 \]
or
\[ \frac{d(\gamma_1-\gamma_2)}{dt} = -(\gamma_1-\gamma_2), \qquad \frac{d(\gamma_1+\gamma_2)}{dt} = \gamma_1+\gamma_2 \]
which has solution γ₁ − γ₂ = Me⁻ᵗ, γ₁ + γ₂ = Neᵗ, or
\[ \gamma(t) = \left( \tfrac12\left( Ne^t + Me^{-t} \right),\ \tfrac12\left( Ne^t - Me^{-t} \right) \right) \]
which are the hyperbolae x² − y² = MN. Let us fix M = 1 and solve for
\( f(t) = \phi\left( \tfrac12(Ne^t + e^{-t}),\ \tfrac12(Ne^t - e^{-t}) \right) \):
\begin{align*}
\frac{df}{dt} &= -xyf \\
&= -\tfrac14\left( Ne^t + e^{-t} \right)\left( Ne^t - e^{-t} \right) f(t) \\
&= -\tfrac14\left( N^2e^{2t} - e^{-2t} \right) f(t)
\end{align*}
which has solution
\[ f(t) = K(N)\exp\left( -\tfrac18\left( N^2e^{2t} + e^{-2t} \right) \right) \]
or, using the fact that Neᵗ = x + y, e⁻ᵗ = x − y,
\begin{align*}
\phi(x,y) &= K(x^2-y^2)\exp\left( -\tfrac18\left( (x+y)^2 + (x-y)^2 \right) \right) \\
&= K(x^2-y^2)\exp\left( -\tfrac14\left( x^2 + y^2 \right) \right)
\end{align*}
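Again a quick check (assuming Python with sympy):

    import sympy as sp

    x, y = sp.symbols('x y')
    K = sp.Function('K')
    phi = K(x**2 - y**2) * sp.exp(-(x**2 + y**2)/4)

    print(sp.simplify(y*phi.diff(x) + x*phi.diff(y) + x*y*phi))   # -> 0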

4.3 Quasilinear case


See http://youtu.be/43i-A6lt_MI.
Now we move on to quasilinear equations
\[ A(x,y,\phi)\frac{\partial\phi}{\partial x} + B(x,y,\phi)\frac{\partial\phi}{\partial y} + C(x,y,\phi) = 0. \tag{4.1} \]

This is slightly harder to deal with: instead of using a system of two coupled
ODEs to find the characteristics and then solving the final ODE to work out f
along the characteristics, we end up with a system of three coupled ODEs for
γ1 , γ2 , f all at the same time and we have no obvious way of separating out the
three.

4.3.1 Characteristic vector field

Suppose that φ is a solution to the equation and that (x(t), y(t), f(t)) is the
restriction of this solution to a curve (x(t), y(t)). The system of ODE is
\[ \frac{dx}{dt} = A(x(t), y(t), f(t)) \tag{4.2} \]
\[ \frac{dy}{dt} = B(x(t), y(t), f(t)) \tag{4.3} \]
\[ \frac{df}{dt} = -C(x(t), y(t), f(t)). \tag{4.4} \]
You can see that we are unlikely to be able to solve for x and y first because f
occurs in all three equations.
Suppose this system of equations is satisfied by some family of curves (xs (t), ys (t), zs (t)),
that is: for each s the curve

(xs (t), ys (t), zs (t))

satisfies (4.2)-(4.4). This family of curves traces out some surface in (x, y, z)-
space. Suppose that this surface is the graph of a function z = φ(x, y). A simple
application of the chain rule gives:
\[ \frac{d\phi(x_s(t), y_s(t))}{dt} = \frac{dx_s}{dt}\frac{\partial\phi}{\partial x} + \frac{dy_s}{dt}\frac{\partial\phi}{\partial y} \]
and when we substitute in the right-hand sides of (4.2)-(4.4) we see that φ
satisfies (4.1) identically!
This means that if a function φ(x, y) has the property that its graph
\[ \{ z = \phi(x,y) \} \]
coincides with the surface traced out by
\[ (s,t) \mapsto (x_s(t),\ y_s(t),\ z_s(t)) \]
then φ is a solution to Equation (4.1).


Conversely, if φ is a solution to (4.1) then (x(t), y(t), φ(x(t), y(t))), i.e. z(t) =
φ(x(t), y(t)), is a solution curve to the system (4.2)-(4.4) of ordinary differential
equations, since
\begin{align*}
\frac{dz}{dt} &= \frac{dx}{dt}\frac{\partial\phi}{\partial x} + \frac{dy}{dt}\frac{\partial\phi}{\partial y} \\
&= A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} \\
&= -C
\end{align*}

as required.
Definition 59. The vector field (A, B, −C) in (x, y, z)-space is called the char-
acteristic vector field of the quasilinear equation (4.1). The integral curves of
(A, B, −C) are called characteristic curves and the projections of the character-
istic curves to the xy-plane are called the characteristic projections.

In this language, what we have proved is that (a) the graph of a solution to
(4.1) is a union of characteristic curves, and that (b) a union of characteristic
curves traces out a surface which, when it is a graph, is the graph of a solution
to (4.1).

4.3.2 Cauchy data

We know that the graph of our solution is a union of characteristic curves. We


have to pick an ‘initial condition’ for our PDE. This will take the form of a
curve in (x, y, z)-space.
Cauchy Problem. Given a curve χ : R → R2 in (x, y)-space and a function
f : χ → R defined over that curve, find a solution φ to

Aφx + Bφy + C = 0

such that φ|χ = f . This initial condition is called the Cauchy data and the
solution is called its development. The curve χ is called the Cauchy hypersurface.

Geometrically, solving the Cauchy problem means finding the characteristic


curves which pass through the curve

\[ s \mapsto (\chi_1(s),\ \chi_2(s),\ f(\chi(s))) \]

in (x, y, z)-space. The union of these characteristic curves is called a solution


surface and is parametrised by the coordinates (s, t) (where s is a coordinate on
the Cauchy hypersurface and t is a coordinate along the characteristic curve).

4.3.3 An example: Burgers’s equation

It’s high time we solved an example.



Figure 4.1: (a) The solution surface to ∂φ/∂x + φ ∂φ/∂y = 0 with the Cauchy data
φ(0, y) = y, plotted with the characteristic vector field. This surface is a union of
straight lines which are characteristic curves. (b) The characteristic projections
of this solution. You can see that they begin to cross at (−1, 0).

Example 60. Take the equation
\[ \frac{\partial\phi}{\partial x} + \phi\frac{\partial\phi}{\partial y} = 0 \]
which has A(x, y, z) = 1, B(x, y, z) = z, C(x, y, z) = 0. Therefore the charac-
teristic vector field is
\[ \dot x = 1, \qquad \dot y = z, \qquad \dot z = 0 \]
which has integral curves
\[ x = t + D, \qquad y = Et + F, \qquad z = E \]

for constants D, E, F . Without loss of generality we can take D = 0 because we


can reparametrise t so that x(0) = 0. These characteristic curves are straight
lines, pointing horizontally but with a slope in the xy-plane which increases as
z increases.
Let’s choose some Cauchy data. We’ll take as Cauchy hypersurface the straight
line χ = {(0, y) ∈ R²}, which is parametrised by s ↦ (0, s). We’ll take two
different sets of Cauchy data: φ(0, s) = f(s) with (a) f(s) = s, (b) f(s) = s².
This means that in case (a) we are looking for the characteristic curves which
pass through the points (0, s, s) in (x, y, z)-space and in case (b) we are looking
for the characteristic curves which pass through the points (0, s, s²) in (x, y, z)-
space.
We first deal with case (a). The point (0, s, s) intersects the characteristic curve
(t, Et + F, E) at t = 0 if E = F = s. In other words, the Cauchy data pick out
the characteristic curves
\[ t \mapsto (t,\ st+s,\ s). \]
This means that (s, t) ↦ (t, st+s, s) is a parametrisation of our solution surface
by the coordinates (s, t).
For case (b), the point (0, s, s²) intersects the characteristic curve (t, Et + F, E)
at t = 0 if F = s, E = s². In other words, the Cauchy data pick out the
characteristic curves
\[ t \mapsto (t,\ s^2t+s,\ s^2). \]
This means that (s, t) ↦ (t, s²t+s, s²) is a parametrisation of our solution
surface by the coordinates (s, t).
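The characteristic system can also be integrated numerically; here is a sketch
(assuming Python with scipy) for the Cauchy data of case (b):

    import numpy as np
    from scipy.integrate import solve_ivp

    def char_field(t, w):
        x, y, z = w
        return [1.0, z, 0.0]        # (A, B, -C) = (1, z, 0)

    # One characteristic per value of s, starting on the Cauchy curve:
    for s in np.linspace(-1.0, 1.0, 5):
        sol = solve_ivp(char_field, (0.0, 2.0), [0.0, s, s**2],
                        t_eval=np.linspace(0.0, 2.0, 5))
        print(np.round(sol.y, 3))   # rows: x(t)=t, y(t)=s**2*t+s, z(t)=s**2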

4.3.4 Caustics

The problem with this method is that it produces solution surfaces which are
parametrised by (s, t) instead of by (x, y)! It is quite possible that the surface
S ⊂ R³ traced out as (s, t) vary is not the graph of any function z = φ(x, y).

Figure 4.2: (a) The solution surface to ∂φ/∂x + φ ∂φ/∂y = 0 with Cauchy data
φ(0, y) = y², plotted with the characteristic vector field. You can see the graph
starting to bend over and become double-valued. (b) The characteristic projections
of this solution. You can see that they are all tangent to the red curve 1 + 4xy = 0
and that they are starting to cross one another near that curve.

Indeed, we can see in Figures 4.1 and 4.2 that the Cauchy development is usually
not a graph. Let’s see what goes wrong in the previous example if we try and
express φ as a function of (x, y).

In case (a), the solution surface is parametrised by
\[ (s,t) \mapsto (t,\ st+s,\ s) \]
so y = z(x + 1) and z = y/(x + 1). When x = −1 this doesn’t make sense.


This manifests itself in our picture: all the characteristic projections cross at
the point (−1, 0).
In case (b), the solution surface is parametrised by
\[ (s,t) \mapsto (t,\ s^2t+s,\ s^2) \]
so y = zx + √z. If u = √z then xu² + u − y = 0 and so
\[ z = u^2 = \left( \frac{-1 \pm \sqrt{1+4xy}}{2x} \right)^2. \]

This doesn’t make sense when 1 + 4xy < 0 and, even when 1 + 4xy > 0, it gives
two possible values for z = φ(x, y). This manifests itself in our picture: all the
characteristic projections live in the region 1 + 4xy ≥ 0 and they are all tangent
to the curve 1 + 4xy = 0 (in red in the figure). The solution surface simply
doesn’t extend over the region 1 + 4xy < 0.
The red curve in Figure 4.2 is the envelope of the characteristic curves, in other
words it is the curve to which all of the characteristics are somewhere tangent.
In the case of the eikonal equation of geometric optics, where characteristics
are light trajectories, this envelope shows up as a very bright curve called the
caustic.
Remark 61. The equation from our example is known as Burgers’s equation
(without viscosity) and is usually written
\[ \frac{\partial\phi}{\partial t} + \phi\frac{\partial\phi}{\partial x} = 0. \]
The interpretation is that the x-axis is filled with a fluid and the velocity of fluid
particles at x at time t is φ(x, t). The phenomenon of a single-valued solution
becoming multiple-valued is known as shock-wave formation and corresponds to
the fluid “catching up with itself ” so that fluid particles with different velocities
coexist at the same (x, t)-coordinate.

We want to understand the caustic more generally. It is actually quite easy to


calculate the caustic straight from the parametrisation of the solution surface
by (s, t).

Definition 62. The caustic of a Cauchy development
\[ (s,t) \mapsto (u(s,t),\ v(s,t),\ w(s,t)) \]
is the set of critical values of the map
\[ \pi : \mathbb{R}^2 \to \mathbb{R}^2, \qquad \pi(s,t) = (u(s,t),\ v(s,t)), \]
in other words, the projection under π of the set of points p ∈ R² where
\[ \det\begin{pmatrix} \partial_s u & \partial_t u \\ \partial_s v & \partial_t v \end{pmatrix}(p) = 0. \]

I will quickly explain why the caustic is relevant, then we will proceed to com-
pute it in some examples. Remember that the graph of a function φ : R² → R
is just the surface
\[ (x,y) \mapsto (x,\ y,\ \phi(x,y)). \]
Our solution surfaces have the form
\[ (s,t) \mapsto (u(s,t),\ v(s,t),\ w(s,t)) \]

so that x and y are given implicitly in terms of s and t. Therefore it is not


clear if s, t or w can be given in terms of x and y. If w cannot be expressed in
terms of x and y then the solution surface fails to be the graph of a function:
the surface “folds over itself”. In other words, the projection

π(s, t) = (u(s, t), v(s, t))

which forgets the value of w is not one-to-one. Along the fold, the solutions
surface admits vertical tangencies, that is vectors tangent to the solution
surface which point vertically upward in the w-direction. So we can detect
the bad behaviour of our solution by looking for vertical tangencies to our
solution surface. It turns out that this is precisely the problem of computing
the determinant in the definition of the caustic.
Lemma 63. If the solution surface (s, t) ↦ (u(s,t), v(s,t), w(s,t)) has a
vertical tangency at some point (s₀, t₀) then (u(s₀,t₀), v(s₀,t₀)) is in the caustic.

Proof. Suppose without loss of generality that s₀ = t₀ = 0. Consider the
curves s ↦ (u(s,0), v(s,0), w(s,0)) and t ↦ (u(0,t), v(0,t), w(0,t)). These
are curves on the solution surface passing through the point which has a
vertical tangency. Their velocity vectors:

X = (∂s u(0, 0), ∂s v(0, 0), ∂s w(0, 0)) and Y = (∂t u(0, 0), ∂t v(0, 0), ∂t w(0, 0))

span the tangent space of the solution surface at that point. In other words,
all tangent vectors are linear combinations AX + BY of these two. If one
of the tangent directions AX + BY is vertical then when it is projected

into the (u, v)-plane it vanishes, and so the corresponding linear combination
vanishes: Aπ(X) + Bπ(Y) = 0 (where π is the projection π(u, v, w) = (u, v)).
The vectors π(X) and π(Y) are therefore linearly dependent. The vectors
π(X) and π(Y) are the columns of the matrix
\[ \begin{pmatrix} \partial_s u & \partial_t u \\ \partial_s v & \partial_t v \end{pmatrix}(p) \]
appearing in the definition of the caustic (where p is the point (0, 0)). The
determinant of a matrix vanishes if and only if its columns are linearly dependent,
which proves the lemma.

Example 64. In our earlier example, case (a), we had
\[ \pi(s,t) = (u(s,t),\ v(s,t)) = (t,\ st+s) \]
and so the determinant of the matrix of partial derivatives is
\[ \det\begin{pmatrix} 0 & 1 \\ t+1 & s \end{pmatrix} = -t-1 \]
which vanishes when t = −1. When t = −1, we have (u(s,−1), v(s,−1)) =
(−1, 0) so the caustic is a single point: the point we discovered earlier as the
intersection of all the characteristic projections.
In case (b) we had (u(s,t), v(s,t)) = (t, s²t+s) and so the determinant of the
matrix of partial derivatives is
\[ \det\begin{pmatrix} 0 & 1 \\ 2st+1 & s^2 \end{pmatrix} = -2st-1 \]
which vanishes when 2st = −1. This means that (x, y) is on the caustic if it
has the form (u(−1/2t, t), v(−1/2t, t)) = (t, −1/4t) so the caustic is the curve
{4xy + 1 = 0}.
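The same computation can be done in sympy (a sketch, for case (b)):

    import sympy as sp

    s, t = sp.symbols('s t')
    u, v = t, s**2*t + s                 # projection of the solution surface

    J = sp.Matrix([[u.diff(s), u.diff(t)],
                   [v.diff(s), v.diff(t)]])
    det = sp.expand(J.det())
    print(det)                           # -> -2*s*t - 1
    crit = sp.solve(det, s)[0]           # s = -1/(2t)
    print(sp.simplify(v.subs(s, crit)))  # -> -1/(4*t): the caustic 4xy+1 = 0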
Example 65. Consider the PDE
\[ -\sin\phi\,\frac{\partial\phi}{\partial x} + \cos\phi\,\frac{\partial\phi}{\partial y} = 1 \]
for which the characteristic vector field is
\[ (-\sin(z),\ \cos(z),\ 1). \]
The solution to the characteristic system of ODE is
\[ (x(t), y(t), z(t)) = (x_0 + \cos t,\ y_0 + \sin t,\ t) \]

(reparametrising so that z(0) = 0) and these characteristic curves are helices,


spiraling upward. Suppose that the initial condition is φ(x, 0) = 0 so that the

Cauchy hypersurface is χ(s) = (s, 0) and the Cauchy data is φ(s, 0) = 0. The
characteristic curve (x0 + cos t, y0 + sin t, t) intersects (s, 0, 0) at t = 0 if y0 = 0
and x0 = s − 1, so the solution surface is parametrised by

(s, t) 7→ (s − 1 + cos t, sin t, t)

Clearly the solution is φ(x, y) = sin−1 (y) but this is not single-valued, indeed
there are infinitely many z such that sin z = y. Moreover, the solution goes no
further than y = 1 - this corresponds to the fact that helices start overlapping
when t = π/2 (see Figure 4.3: the characteristics are segments of circles with
centre on the x-axis).
The matrix of first derivatives of the projection of the solution surface to the
(x, y)-plane is
\[ \begin{pmatrix} 1 & -\sin t \\ 0 & \cos t \end{pmatrix} \]
which has determinant cos t. This vanishes precisely along t = π/2 as suspected,
so the caustic is the line y = sin(π/2) = 1.

Figure 4.3: The characteristics of −sin φ ∂φ/∂x + cos φ ∂φ/∂y = 1 are segments
of circles. We see they start to overlap near y = 1.

4.4 Fully nonlinear case


The method works for the fully nonlinear case but the system of ODEs becomes
yet more complicated.
Suppose that G(x, y, u, p, q) is a function of five variables and that φ : R² →
R is a function satisfying
\[ G\left( x, y, \phi(x,y), \partial_x\phi(x,y), \partial_y\phi(x,y) \right) = 0 \]


for all (x, y) ∈ R². Now let γ(t) = (x(t), y(t)) be a curve and restrict φ to γ
to obtain a function u(t) as usual. Let G(t) denote
\[ G(x(t), y(t), u(t), p(t), q(t)) \]
where p(t) = ∂ₓφ(x(t), y(t)), q(t) = ∂ᵧφ(x(t), y(t)). We have
\[ \frac{dG}{dt} = \frac{\partial G}{\partial x}\dot x + \frac{\partial G}{\partial y}\dot y + \frac{\partial G}{\partial u}\dot u + \frac{\partial G}{\partial p}\dot p + \frac{\partial G}{\partial q}\dot q \]
and since \( \dot u = \dot x\frac{\partial\phi}{\partial x} + \dot y\frac{\partial\phi}{\partial y} = \dot x p + \dot y q \) this becomes
\[ \frac{dG}{dt} = \dot x\left( \frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u} \right) + \dot y\left( \frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u} \right) + \dot p\frac{\partial G}{\partial p} + \dot q\frac{\partial G}{\partial q}. \]
This suggests a system of five coupled ODEs for the five quantities
(x(t), y(t), u(t), p(t), q(t)):
\begin{align*}
\frac{dx}{dt} &= \frac{\partial G}{\partial p}, & \dot p &= -\left( \frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u} \right) \\
\frac{dy}{dt} &= \frac{\partial G}{\partial q}, & \dot q &= -\left( \frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u} \right) \\
\dot u &= \frac{\partial G}{\partial p}p + \frac{\partial G}{\partial q}q
\end{align*}
As usual, if we integrate this system of ODEs we will obtain a curve. Taking a
one-parameter family of these integral curves gives a surface in (x, y, u, p, q)-
space and when we project to (x, y, u)-space we obtain a surface which, wher-
ever it is a graph, is the graph of a solution u = φ(x, y).

Example 66. Let us consider the eikonal equation (in units where the speed
of light is 1)
\[ \left( \frac{\partial\phi}{\partial x} \right)^2 + \left( \frac{\partial\phi}{\partial y} \right)^2 = 1 \]
(so G(x, y, u, p, q) = p² + q² − 1) which describes the time φ taken by light emitted
normally by some curve C ⊂ R² to reach a point (x, y) ∈ R². To see that
this description is valid, let’s consider the corresponding system of ODEs:
\begin{align*}
\frac{dx}{dt} &= 2p, & \dot p &= 0 \\
\frac{dy}{dt} &= 2q, & \dot q &= 0 \\
\dot u &= 2(p^2+q^2) = 2
\end{align*}

Figure 4.4: The equidistants of an ellipse form singularities as characteristics cross.

Start from the curve C and choose the initial condition φ|_C = 0. We
see that this determines p and q along C: namely, (p, q) must be (plus or
minus) the unit normal vector to the curve C, because p² + q² = 1 (giving unit
length) and because (p, q) = (∂φ/∂x, ∂φ/∂y) and, by our choice of initial condition,
the directional derivative of φ along C vanishes, so (p, q) is normal to C.
Now along the integral curves of the ODE, p and q do not change and hence
x and y follow the normal line (with speed 2) and the solution to u̇ = 2 is just
u(t) = 2t. If one followed the normal with speed 1, we would get u(t) = t.
This is precisely the statement that the solution of the eikonal equation is the
time taken by light to reach (x, y) from C in a normal direction. The char-
acteristics are straight lines normal to C. Solutions of the eikonal equation
can be very beautiful. Figure 4.4 shows some of the singularities developed
by level sets of φ corresponding to the initial condition C = {x²/4 + y² = 1}
(that is, the equidistants of an ellipse).
Chapter 5

The heat equation and Laplace’s equation

5.1 The heat equation


We want to study the evolution of a temperature distribution which is not in a
steady state. To simplify matters we restrict to one-dimensional distributions -
these are then represented by functions φ : [0, L] × [0, ∞) → R where the first
coordinate is spatial (x) and the second is time (t) and the equation governing
their evolution is the heat equation, a parabolic second order PDE:

$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}.$$

5.1.1 Boundary conditions

We need to impose boundary conditions in the x and t-directions. The
t-boundary conditions are called initial conditions (for obvious reasons!). We
will always impose an initial condition like

φ(x, 0) = F (x)

for some function F , and then seek a solution on [0, L] × [0, ∞).
There are various options for the boundary conditions in the x-direction:
• You can specify φ(0, t) and φ(L, t). This is called the Dirichlet problem.
• Instead of specifying φ along one of these boundaries you can specify
∂φ/∂x(0, t) or ∂φ/∂x(L, t) (in other words the directional derivative of φ in a
direction normal to the boundary). This is called a Neumann boundary


condition. You could also impose ∂φ/∂t(x, 0), i.e. a Neumann initial condition!
But let's not do that.
• Mixtures of the two are perfectly OK.
I was always confused by Neumann boundary conditions. What’s the point?
With Dirichlet it’s clear what it means - like with a soap film filling in a fixed
boundary shape. Here is a physical situation (heat) in which both are clearly
relevant.
• Dirichlet: The boundary points of our interval are kept at fixed tem-
peratures T1, T2, maybe by immersing the region in some vast bath of
refrigerating liquid. Since our interval is one-dimensional, you have to
imagine immersing it in a one-dimensional bath, so that only the
boundary is being refrigerated...
• Neumann: The boundary of our interval is completely insulated from the
outside world. Insulation means that heat cannot flow into or out of the
boundary. Since heat travels down a temperature gradient, this means
that the directional derivative in the normal direction to the boundary
must vanish.

5.1.2 Strategy

We now want to solve the heat equation with given boundary and initial con-
ditions. Fourier solved this problem in his treatise on heat in 1822. His idea
was:
• Find lots of solutions to the heat equation. Make sure they fit the bound-
ary conditions. Maybe they don’t fit your initial condition, but don’t
worry.
• Take linear superpositions of (possibly infinitely many of) these solutions
in such a way as to fit the resulting function to the initial condition.
This works because the heat equation is linear, so if φ1, φ2, . . . are solutions and
A1, A2, . . . are real numbers then $\sum_{k=1}^{\infty} A_k\phi_k$ is also a solution (provided that
the partial sums converge in an appropriate way (uniformly) so that derivatives
commute with $\sum_{k=1}^{\infty}$).

5.1.3 Separation of variables

We find lots of solutions by using a method called separation of variables: we


seek solutions of the form

φ(x, t) = X(x)T(t)



where X(x) is a function of x alone and T(t) is a function of t alone. In this
case the heat equation becomes
$$XT' = X''T.$$
If X and T solve X'' = λX and T' = λT then certainly XT' − X''T = λXT −
λXT = 0, so the problem would be solved. I didn't just pull these equations out
of a hat; the argument usually goes like this: divide through by XT to get
$$\frac{X''}{X} = \frac{T'}{T} = \lambda$$
for some constant λ (it has to be constant because X''/X depends only on x and
T'/T depends only on t). Multiplying up gives the two equations I mentioned.
This has the disadvantage of dividing by XT, which could potentially vanish.
In future we will not be made queasy by this division and, from now on, that's
how we'll derive equations for separation of variables.
Solving for T, we get either
$$T(t) = Ke^{\lambda t} \quad\text{or}\quad K$$
with K constant, according to whether λ ≠ 0 or λ = 0 respectively. The λ = 0
case gives X'' = 0 and hence X = Ax + B. Since these have no time-dependence
(because T is constant) these are called steady solutions.
The λ > 0 solutions have T increasing exponentially in time, which seems very
unphysical! A hot plate, left to cool, doesn't suddenly become hotter. We will
see shortly how our boundary conditions allow us to ignore such solutions.
We can write down the general separated solution by solving the ODEs above:
$$\phi_\lambda(x, t) = \begin{cases} \left(A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda})\right)e^{\lambda t} & \text{if } \lambda < 0 \\ Ax + B & \text{if } \lambda = 0 \\ \left(A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda})\right)e^{\lambda t} & \text{if } \lambda > 0 \end{cases}$$

5.1.4 Fitting to boundary conditions

We fit the separated solutions to the boundary conditions. We specialise to
Dirichlet conditions for simplicity. In the question sheets there will be some
with Neumann conditions; the method is the same.
The Dirichlet conditions we impose are φ(0, t) = S0, φ(L, t) = S1. The steady
solution
$$\phi_0(x, t) = \left(\frac{S_1 - S_0}{L}\right)x + S_0$$
obviously fits these conditions. By setting θ(x, t) = φ(x, t) − φ0(x, t) we get a new
solution θ to the heat equation (because the heat equation is linear) satisfying

the new Dirichlet conditions θ(0, t) = 0 = θ(L, t). In other words, we can assume
without loss of generality that S0 = S1 = 0.
If λ > 0 then X(x) = A cosh(x√λ) + B sinh(x√λ) and so the condition X(0) = 0
means that A = 0. Also, the condition X(L) = 0 gives B sinh(L√λ) = 0, but
sinh only vanishes at 0, hence B = 0 also. Therefore we can ignore separated
solutions with λ > 0 (as we had hoped) because we cannot fit them to our
boundary conditions.
If λ = −p² < 0 then X(x) = A cos(px) + B sin(px) and so the condition X(0) = 0
means that A = 0. Also, the condition X(L) = 0 gives B sin(pL) = 0, and now
sin vanishes at nπ, n ∈ {1, 2, 3, ...}. This means we can have nontrivial solutions
(i.e. with B ≠ 0) whenever pL = nπ. Thus we have separated solutions
$$\phi_n = e^{-n^2\pi^2 t/L^2}\sin(n\pi x/L)$$

for n = 1, 2, 3, ....

5.1.5 Fitting to initial condition

Finally we want to take a (possibly infinite) linear combination of separated
solutions φn to fit to the initial condition φ(x, 0) = F(x). We are allowing φ0
too. So the aim is to write
$$F(x) = \frac{S_1 - S_0}{L}x + S_0 + \sum_{n=1}^{\infty} A_n\sin(n\pi x/L),$$
in other words, we want to expand $G(x) = F(x) - \frac{S_1 - S_0}{L}x - S_0$ as a (half-range)
Fourier sine series. To do this, extend G : [0, L] → R to an odd function
G̃ : [−L, L] → R:
$$\tilde{G}(x) = \begin{cases} G(x) & \text{if } x \in [0, L] \\ -G(-x) & \text{if } x \in [-L, 0] \end{cases}$$
and take
$$A_n = \frac{1}{L}\int_{-L}^{L} \tilde{G}(x)\sin(n\pi x/L)\,dx.$$

No wonder Fourier invented Fourier series!


Example 67. Suppose we want to solve ∂φ/∂t = ∂²φ/∂x² for x ∈ [0, π], subject
to φ(x, 0) = cos x, φ(0, t) = 1, φ(π, t) = −1. We start by setting
$$\phi_0(x, t) = \frac{S_1 - S_0}{\pi}x + S_0 = 1 - \frac{2x}{\pi}$$
and defining θ(x, t) = φ(x, t) − φ0 (x, t). Then θ solves the Dirichlet conditions

θ(x, 0) = cos x − 1 + 2x/π, θ(0, t) = 0 = θ(π, t).



The solution to this problem is
$$\theta(x, t) = \sum_{n=1}^{\infty} A_n e^{-n^2 t}\sin(nx)$$
where An is the nth half-range sinusoidal Fourier coefficient of
$$F(x) = \cos x - 1 + 2x/\pi,$$
which is
$$A_n = \frac{2}{\pi}\frac{(-1)^n + 1}{n(n^2 - 1)}$$
for n ≥ 2 (and A1 = 0, by a direct computation), so that
$$\phi(x, t) = 1 - \frac{2x}{\pi} + \frac{2}{\pi}\sum_{n=2}^{\infty} \frac{(-1)^n + 1}{n(n^2 - 1)}\, e^{-n^2 t}\sin(nx).$$
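As a quick sanity check one can compute the coefficients An of this example by
numerical quadrature and compare with the closed form; a sketch assuming
Python with NumPy/SciPy:

# Check the Fourier coefficients of Example 67 numerically.
import numpy as np
from scipy.integrate import quad

F = lambda x: np.cos(x) - 1 + 2*x/np.pi   # theta's initial condition

for n in range(2, 7):
    An = (2/np.pi)*quad(lambda x: F(x)*np.sin(n*x), 0, np.pi)[0]
    closed = 2*((-1)**n + 1)/(np.pi*n*(n**2 - 1))
    print(n, An, closed)                  # agree to quadrature accuracy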

5.1.6 The smoothing effect of parabolicity

In the limit as t → ∞ all of the sinusoidal oscillations decay exponentially and
we are left with the steady solution A0x + B0. The Fourier modes with large
n decay at a faster rate e^{−n²π²t/L²}. This makes physical sense: heat moves down
a temperature gradient to even out the temperature distribution. The more
gradient there is, the faster the heat moves. The higher n is, the wigglier the
Fourier mode, the steeper the gradient, the faster the heat moves to smooth out
the temperature distribution. This behaviour is typical of parabolic PDEs.

5.2 Laplace’s equation

5.2.1 Harmonic functions

The Laplace operator for functions of n variables x1, . . . , xn is
$$\Delta = \sum_{k=1}^{n} \frac{\partial^2}{\partial x_k^2}.$$

Recall (we only wrote it out for n = 2) that this arises as the Euler-Lagrange
equation for functions defined on a region U ⊂ Rn with specified boundary
values, extremising the functional
$$\int_U \sum_{k=1}^{n}\left(\frac{\partial\phi}{\partial x_k}\right)^2 dx_1\cdots dx_n$$

known as the Dirichlet energy. Functions which are annihilated by ∆ are called
harmonic functions. The problem of finding a harmonic function on a given

region with specified boundary values is called Dirichlet’s problem. The equation
∆φ = 0 satisfied by a harmonic function is called Laplace’s equation; it is a linear
elliptic second-order equation with constant coefficients.
Example 68. Consider harmonic functions on the interval [0, 1]. Since these
satisfy d2 f /dx2 = 0 they are just functions of the form f (x) = ax + b, whose
graphs are straight line segments from (0, b) to (1, a + b).

This example displays behaviour typical of solutions to elliptic problems.


• After fixing the boundary conditions (in this case fixing b and a + b) there
is only a very limited set of solutions (in this case a unique one).
• The solution is heavily dependent on the boundary conditions: if you
change the boundary data locally (i.e. at just one of the boundary points),
the solution is altered everywhere (i.e. arbitrarily close to the other bound-
ary). This is in contrast with the wave equation: we will see that it takes
time for changes in initial conditions to propagate (at the ‘speed of light’).
• The solution displays the following mean value property: ½(f(x) + f(y)) =
f((x + y)/2). Note that the solution is actually linear, but in higher dimensions
it’s only a weaker mean value property which holds, namely that the value
of a harmonic function at the centre of a disc D ⊂ U is equal to the integral
of the function around the boundary of the disc (rescaled by the reciprocal
of the circumference of the disc).

5.2.2 The maximum principle

We alluded above to the mean value property:


Theorem 69 (Mean value property of harmonic functions). If f : U → R is a
harmonic function on a domain U ⊂ R² and D ⊂ U is a disc centred at x of
radius r then
$$f(x) = \frac{\int_0^{2\pi} f(x + re^{i\theta})\,r\,d\theta}{\int_0^{2\pi} r\,d\theta}.$$

This is the analogue of the property
$$f\left(\frac{x+y}{2}\right) = \frac{1}{2}(f(x) + f(y))$$
for harmonic functions on an interval. We won't prove this, but note that it
bears a marked (and non-coincidental) resemblance to Cauchy's integral formula
in complex analysis! This formula has the consequence that the maxima and
minima of a harmonic function are achieved along the boundary of its domain.
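Here is a quick numerical illustration of the mean value property (a sketch in
NumPy; the harmonic function x² − y² is my choice of example):

# Average of a harmonic function over a circle equals its central value.
import numpy as np

f = lambda x, y: x**2 - y**2             # harmonic: f_xx + f_yy = 0
x0, y0, r = 0.7, -0.3, 0.5
theta = np.linspace(0, 2*np.pi, 10001)
avg = np.mean(f(x0 + r*np.cos(theta), y0 + r*np.sin(theta)))
print(avg, f(x0, y0))                    # both equal 0.4 (up to rounding)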
Corollary 70 (Maximum principle). If f : U → R is a harmonic function on
a domain U ⊂ R2 then the maximum and minimum of f are achieved along the
boundary of U. If the maximum or minimum is also achieved in the interior of
U then f is constant.

Proof. Suppose that the maximum is achieved somewhere (say x) in the interior
and that the function is not constant. Then there exists a point y such that
f(y) < f(x). Let U1, . . . , Un be a sequence of discs such that x is at the centre
of U1, y is on the boundary of Un and the centre of Uk lies on the boundary of
Uk−1 (we are making some topological assumptions about the domain
U, like the fact that any two points in U can be connected by such a sequence
of discs - that's a fairly mild condition).
Now let zk be the point at the centre of Uk (so x = z1), let zn+1 = y, and let
rk be the radius of Uk. Since f(p) ≤ f(x) for all p ∈ U we have
$$\frac{\int_{\partial U_1} f(x + r_1 e^{i\theta})\,r_1\,d\theta}{\int_{\partial U_1} r_1\,d\theta} \le \frac{\int_{\partial U_1} f(x)\,r_1\,d\theta}{\int_{\partial U_1} r_1\,d\theta} = f(x)$$
with equality if and only if f(p) = f(x) for all p ∈ ∂U1. By the mean value
property, equality holds, so f(z2) = f(x) (since z2 ∈ ∂U1). Applying the same
argument again and again we eventually get to f(y) = f(x), which is a
contradiction.
Corollary 71 (Uniqueness of solutions to Dirichlet problem). Suppose that
φ1 , φ2 : U → R are harmonic functions on a domain U ⊂ R2 and φ1 |∂U =
φ2 |∂U . Then φ1 ≡ φ2 .

Proof. Take θ = φ1 − φ2. By linearity of the Laplace equation, ∆θ = 0.
Moreover, θ|∂U ≡ 0. By the maximum principle, the maximum and minimum of
θ are achieved along ∂U, where it vanishes, hence it vanishes identically and
φ1 ≡ φ2.

5.2.3 Steady temperature distributions

Suppose that U ⊂ Rn is a region in n-dimensional Euclidean space (we’re


mostly interested in n = 1, 2) and that φ : Rn × R → R is a time-dependent
temperature distribution (also written φ(x, t), where x ∈ U is a spatial variable
and t ∈ R represents time). Then φ evolves according to the heat equation

$$\frac{\partial\phi}{\partial t} = \Delta\phi.$$
We see that harmonic functions correspond to steady temperature distributions,
that is distributions which don’t change over time. From the variational inter-
pretation of Laplace’s equation, these are the distributions which minimise the
total temperature gradient (unsurprising, given that heat flows down a tem-
perature gradient (from hot to cold) and tries to homogenise the temperature,
subject to the boundary conditions).

5.2.4 Solution on a rectangular domain

We will now seek to solve the Laplace equation on a rectangular domain U =


[0, a] × [0, b] ⊂ R2 . That is we need to find a harmonic function on the rectangle
with arbitrary boundary data. Our strategy is the same as for the heat equation:
• Find lots of solutions to the Laplace equation. Maybe they don’t fit your
boundary values, but don’t worry.
• Take linear superpositions of (possibly infinitely many of) these solutions
in such a way as to fit the resulting function to the boundary data.
This works because the Laplace equation is linear, so if φ1, φ2, . . . are solutions
and A1, A2, . . . are real numbers then $\sum_{k=1}^{\infty} A_k\phi_k$ is also a solution (provided
that the partial sums converge in an appropriate way (uniformly) so that ∆
commutes with $\sum_{k=1}^{\infty}$).

Separation of variables

We find lots of solutions by using a method called separation of variables: we


seek solutions of the form

φ(x, y) = X(x)Y (y)

where X(x) is a function of x alone and Y (y) is a function of y alone. Laplace’s


equation becomes
$$X''(x)Y(y) + X(x)Y''(y) = 0 \qquad (5.1)$$
where X'' (respectively Y'') denotes the ordinary second derivative of X (re-
spectively Y) as a function of one variable. Divide (5.1) by XY and we get
$$\frac{X''}{X} = -\frac{Y''}{Y}.$$
The left-hand side depends only on x, the right-hand side only on y and by
equality of the two sides, both expressions must be constant, equal to λ. This
gives the two equations X'' = λX, Y'' = −λY.
There are now three cases:
• λ = 0: In this case X'' = 0 so X = Ax + B, and Y'' = 0 so Y = Cy + D.
• λ > 0: In this case Y is a simple harmonic oscillator so
$$Y = C\cos(y\sqrt{\lambda}) + D\sin(y\sqrt{\lambda})$$
and
$$X = A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda}).$$
• λ < 0: In this case, X and Y are reversed (X becomes trigonometric, Y
becomes hyperbolic):
$$Y = C\cosh(y\sqrt{-\lambda}) + D\sinh(y\sqrt{-\lambda})$$
and
$$X = A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda}).$$

Boundary conditions

We have various possible boundary conditions we can apply along the edges of
our rectangle.
• Dirichlet boundary conditions: where we fix the values of φ(x, 0),
φ(x, b), φ(0, y) or φ(a, y).
• Neumann boundary conditions: where we fix the values of ∂φ/∂x along
a vertical boundary or ∂φ/∂y along a horizontal boundary. More generally,
we would fix the directional derivative of φ in a direction normal to the
boundary.
We can either have pure Dirichlet conditions, pure Neumann conditions or a
mixture (called mixed boundary conditions).

Fitting solutions to boundary data

Now let’s examine the various solutions we found by separation of variables and
see what kinds of boundary condition we can satisfy. We will concentrate on
Dirichlet. We denote by U the rectangular region [0, a] × [0, b] and by ∂U its
boundary.

Fitting to corners

We impose
$$\phi(x, 0) = F_1(x), \quad \phi(x, b) = F_2(x), \quad \phi(0, y) = F_3(y), \quad \phi(a, y) = F_4(y).$$
Let's denote the corner values by:
$$S_{00} = F_1(0) = F_3(0), \quad S_{10} = F_1(a) = F_4(0),$$
$$S_{01} = F_2(0) = F_3(b), \quad S_{11} = F_2(a) = F_4(b)$$

and let’s start by fitting the λ = 0 solutions

φ0 (x, y) = (Ax + B)(Cy + D) = BD + ADx + BCy + ACxy



to this corner data. The four corner values give us four equations for A, B, C, D
which we can solve, and we get
$$\phi_0(x, y) = S_{00} + \frac{S_{10} - S_{00}}{a}x + \frac{S_{01} - S_{00}}{b}y + \frac{S_{11} - S_{10} - S_{01} + S_{00}}{ab}xy.$$
You can check that this satisfies the corner data. Now define

θ(x, y) = φ(x, y) − φ0 (x, y)

and θ is a solution to Laplace’s equation which solves some new boundary con-
ditions, but crucially θ = 0 at all the corners of U .
In other words, without loss of generality, our solution vanishes at the corners
of U .

Patching solutions

We've seen that we can modify F1, F2, F3 and F4 to F1new(x) = F1(x) − φ0(x, 0),
F2new(x) = F2(x) − φ0(x, b), F3new(y) = F3(y) − φ0(0, y) and F4new(y) = F4(y) −
φ0(a, y), which vanish at the corners. So let's split our problem into four pieces:
let's look for solutions θ1, θ2, θ3, θ4 such that:
$$\theta_1(x, 0) = F_1^{new}(x), \quad \theta_1(x, b) = 0, \quad \theta_1(0, y) = 0, \quad \theta_1(a, y) = 0$$
and
$$\theta_2(x, 0) = 0, \quad \theta_2(x, b) = F_2^{new}(x), \quad \theta_2(0, y) = 0, \quad \theta_2(a, y) = 0$$
etc. Note that we can only do this because F1new(0) = F1new(a) = 0, etc.,
otherwise the boundary conditions are discontinuous at the corners.
If we can find θ1, θ2, etc. then θ1 + θ2 + θ3 + θ4 is still a solution to Laplace's
equation and satisfies all the boundary conditions we wanted originally. I'll show
you how to solve for θ1; the other solutions can all be obtained the same way,
or by being clever with symmetries of the rectangle.

Solve for θ1

θ1 will be written as a (possibly infinite) linear combination of separated solu-


tions. We want these separated solutions individually to satisfy X(0) = X(a) =
0 to ensure that θ1 (0, y) = 0 = θ1 (a, y).
If λ > 0 then X(x) = A cosh(x√λ) + B sinh(x√λ). The condition X(0) = 0
means that A = 0 and the condition X(a) = 0 means that B = 0. So there are
no separated solutions with λ > 0 that work.
If λ < 0 then X(x) = A cos(x√−λ) + B sin(x√−λ). The condition X(0) = 0
means that A = 0 and the condition X(a) = 0 means that B sin(a√−λ) = 0,
which means that if B ≠ 0 we need a√−λ = nπ for some n ∈ Z. Thus
λ = −n²π²/a².
We therefore want a combination
$$\theta_1(x, y) = \sum_{n=1}^{\infty}\left(C_n\sinh(n\pi y/a) + D_n\cosh(n\pi y/a)\right)\sin(n\pi x/a)$$
which fits the final two boundary conditions
$$\theta_1(x, 0) = F_1^{new}(x), \qquad \theta_1(x, b) = 0.$$
These give us
$$F_1^{new}(x) = \sum_{n=1}^{\infty} D_n\sin(n\pi x/a)$$
so that Dn are the coefficients in the Fourier half-range sine series of F1new(x), and
$$0 = \sum_{n=1}^{\infty}\left(C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a)\right)\sin(n\pi x/a)$$
which means
$$C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a) = 0,$$
or
$$C_n = -D_n\coth(n\pi b/a).$$
Example 72. If F1new(x) = sin(x) on [0, a = π] × [0, b = 5π] then D1 = 1 and
C1 = −coth(5π), so the solution is
$$\theta_1(x, y) = (-\coth(5\pi)\sinh y + \cosh y)\sin x.$$
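The recipe for θ1 is completely algorithmic, so it can be coded directly. The
sketch below (Python with NumPy/SciPy assumed) computes the Dn by quadrature
and sums the series; it evaluates each summand via the identity
Cn sinh(nπy/a) + Dn cosh(nπy/a) = Dn sinh(nπ(b − y)/a)/sinh(nπb/a), which
follows from Cn = −Dn coth(nπb/a) and avoids cancellation between huge
hyperbolic terms.

# Assemble theta_1 on [0, a] x [0, b] from boundary data F1new on y = 0.
import numpy as np
from scipy.integrate import quad

a, b, N = np.pi, 5*np.pi, 20
F1new = lambda x: np.sin(x)

def theta1(x, y):
    total = 0.0
    for n in range(1, N + 1):
        Dn = (2/a)*quad(lambda s: F1new(s)*np.sin(n*np.pi*s/a), 0, a)[0]
        total += (Dn*np.sinh(n*np.pi*(b - y)/a)/np.sinh(n*np.pi*b/a)
                  * np.sin(n*np.pi*x/a))
    return total

print(theta1(1.0, 0.0), np.sin(1.0))     # theta_1 = F1new along y = 0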

Finding the other θi

You could of course plough through this procedure for each θ. For instance, θ2
is particularly easy: we need to solve

$$0 = \sum_{n=1}^{\infty} D_n\sin(n\pi x/a),$$
which gives Dn = 0, and
$$F_2^{new}(x) = \sum_{n=1}^{\infty} C_n\sinh(n\pi b/a)\sin(n\pi x/a)$$

so that Cn is 1/ sinh(nπb/a) times the nth Fourier coefficient of F2new (x).


Solving for θ3 and θ4 means you need the λ > 0 solutions so that Y is trigono-
metric, but otherwise the solutions are found exactly the same way.
Sometimes, using a bit of geometric intuition (e.g. reflecting the rectangle) will
allow you to guess the solution.

5.2.5 Some examples

Example 73. Here is an example. On the square [0, π]², take F1(x) = sin(x),
F2(x) = 0, F3(y) = 0, F4(y) = sin(y).
We get θ1(x, y) = (−coth(π) sinh y + cosh y) sin x as in the previous example.
However, we could rearrange this using hyperbolic trigonometric identities as
$$\theta_1(x, y) = \frac{\sinh(\pi - y)\sin x}{\sinh\pi}.$$
If we had instead begun with F2(x) = sin(x), F1(x) = 0, the solution would have
been $\frac{\sinh y\,\sin x}{\sinh\pi}$. Clearly these two solutions are related by
y ↦ π − y, which is a reflection in the horizontal line which cuts the rectangle
in two.
We also get $\theta_4(x, y) = \frac{\sinh x\,\sin y}{\sinh\pi}$. So the full solution is
$$\phi(x, y) = \frac{\sinh x\,\sin y}{\sinh\pi} + \frac{\sinh(\pi - y)\sin x}{\sinh\pi}.$$
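One can confirm the closed-form answer symbolically; a sketch using SymPy
(assumed):

# Check that phi of Example 73 is harmonic with the right boundary values.
import sympy as sp

x, y = sp.symbols('x y')
phi = (sp.sinh(x)*sp.sin(y) + sp.sinh(sp.pi - y)*sp.sin(x))/sp.sinh(sp.pi)

print(sp.simplify(sp.diff(phi, x, 2) + sp.diff(phi, y, 2)))   # 0
print(sp.simplify(phi.subs(y, 0)))         # sin(x), i.e. F1
print(sp.simplify(phi.subs(x, sp.pi)))     # sin(y), i.e. F4
print(phi.subs(x, 0), phi.subs(y, sp.pi))  # 0 and 0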
Example 74. On the square [0, π]², take F1(x) = 0, F2(x) = x², F3(y) = 0,
F4(y) = y².
We start by finding a λ = 0 solution to fit the corner data. Consider the function
xy. This takes on the right values at the corners of our square (i.e. the values
0, 0, 0, π²). If φ solves our problem then θ(x, y) = φ(x, y) − xy solves Laplace's
equation with boundary values
$$F_1^{new}(x) = 0, \quad F_2^{new}(x) = x^2 - \pi x, \quad F_3^{new}(y) = 0, \quad F_4^{new}(y) = y^2 - \pi y$$
(in particular, the boundary values are zero at the corners). The sinusoidal
Fourier series of x² − πx on [0, π] (i.e. the half-range series obtained by extending
it to an odd function on [−π, π]) is
$$-\frac{8}{\pi}\sum_{k=1}^{\infty}\frac{\sin((2k-1)x)}{(2k-1)^3}$$
and because the values in the corners are now zero we can apply our earlier
solution to get
$$\theta(x, y) = -\frac{8}{\pi}\sum_{k=1}^{\infty}\left(\frac{\sin((2k-1)x)\sinh((2k-1)y)}{(2k-1)^3\sinh((2k-1)\pi)} + \frac{\sin((2k-1)y)\sinh((2k-1)x)}{(2k-1)^3\sinh((2k-1)\pi)}\right).$$
To get φ rather than φ(x, y) − xy we need to add xy, so the final answer is
$$\phi(x, y) = xy - \frac{8}{\pi}\sum_{k=1}^{\infty}\left(\frac{\sin((2k-1)x)\sinh((2k-1)y)}{(2k-1)^3\sinh((2k-1)\pi)} + \frac{\sin((2k-1)y)\sinh((2k-1)x)}{(2k-1)^3\sinh((2k-1)\pi)}\right).$$

5.3 Eigenfunction expansions


This section is nonexaminable. It attempts to answer the question "Why is
sin(nπx/L) turning up all over the place?".
Consider the vector space Y of functions [0, π] → R and the subspace
$$X = \{F : [0, \pi] \to \mathbb{R} : F(0) = F(\pi) = 0\}$$
of functions on [0, π] which vanish at 0 and π. We can define a linear operator
$$\frac{d^2}{dx^2} : X \to Y$$
which takes F to the second derivative of F. Note that F''(0) and F''(π)
need not vanish! Nonetheless, we can ask for eigenvalues and eigenvectors
(eigenfunctions) of d²/dx², that is functions F ∈ X such that
$$F'' = d^2F/dx^2 = \lambda F$$
for some λ ∈ R. First notice that
$$\int_0^\pi \lambda F^2\,dx = \int_0^\pi FF''\,dx = -\int_0^\pi F'F'\,dx \le 0$$
by integrating by parts. This implies λ ≤ 0. The equation F'' = λF has
solutions A sin(x√−λ) + B cos(x√−λ) and these vanish at 0 and π if and
only if B = 0 and √−λ = n for some n ∈ Z.
Now consider the heat equation in three variables
$$\partial\phi/\partial t = \Delta\phi$$
where φ(t, x, y) is now a function of three variables and ∆ is the Laplacian.
Seeking separated solutions φ = T(t)M(x, y) gives
$$T'/T = \Delta(M)/M = \lambda$$
where λ is constant and hence T = e^{λt}, ∆(M) = λM. So what are the
eigenfunctions of ∆? Well, we worked them out on Sheet 7. As usual there's
a discrete set of them Mn corresponding to a discrete collection of eigenvalues
λn. To replace the usual Fourier expansion of a function F in terms of sines,
there is now an eigenfunction expansion of F as a sum of the eigenfunctions Mn:
$$F(x, y) = \sum_n A_n M_n(x, y)$$
and $\Delta(F) = \sum_n A_n\lambda_n M_n(x, y)$. Thus a solution to the heat equation with
initial condition φ(0, x, y) = F(x, y) is given by
$$\sum_n A_n e^{\lambda_n t} M_n(x, y)$$
and the nth mode of F decays in time with rate λn.
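For a concrete instance, on the square [0, π]² with Dirichlet boundary
conditions the products sin(mx) sin(ny) are eigenfunctions of ∆ with eigenvalue
−(m² + n²). A one-line check with SymPy (the square domain is my illustrative
choice):

# Verify an eigenfunction of the Laplacian on a square.
import sympy as sp

x, y = sp.symbols('x y')
m, n = 3, 2
M = sp.sin(m*x)*sp.sin(n*y)              # vanishes on the boundary
print(sp.simplify((sp.diff(M, x, 2) + sp.diff(M, y, 2))/M))   # -13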


Returning to 1-dimensional problems, we might try to solve the equation
$$\phi_t = f(x)\phi_{xx} + g(x)\phi_x + h(x)\phi$$
by separating variables φ(x, t) = T(t)X(x). The result is
$$T'/T = f(x)X''/X + g(x)X'/X + h(x) = \lambda$$
where λ is constant. Solving problems like
$$f(x)X'' + g(x)X' + h(x)X = \lambda X$$
is the subject of Sturm-Liouville theory. You've already met an equation like
this on Sheet 4, namely Legendre's equation
$$X''(1 - x^2) - 2xX' = -\ell(\ell + 1)X$$
where λ = −ℓ(ℓ + 1) for ℓ ∈ Z are the eigenvalues.


The Sturm-Liouville theory gives conditions on f, g, h which ensure the ex-
istence of an infinite discrete set of eigenvalues λn with eigenfunctions yn.
The key properties which let us mimic Fourier theory are orthogonality,
$$\int_0^\pi y_n(x)y_m(x)\,dx = \delta_{mn}$$
(like the integrals of sine and cosine which let us compute Fourier series),
and completeness, which asserts the existence of an expansion
$$F(x) = \sum_n A_n y_n(x)$$
for any (reasonable, e.g. square-integrable) function. This orthogonality can
be understood as an infinite-dimensional analogue of the following theorem
from linear algebra:
Theorem 75. Suppose A is a symmetric matrix. If v and w are eigenvectors
of A associated to different eigenvalues λ 6= µ then v · w = 0.

Proof. Consider
$$v \cdot Aw = (A^T v)\cdot w = (Av)\cdot w.$$
Since w is a µ-eigenvector of A and v is a λ-eigenvector of A this equation
becomes
$$\mu(v \cdot w) = \lambda(v \cdot w).$$
If µ ≠ λ this implies v · w = 0.

The analogue of the dot product · for the space of functions on [0, π] is the
integral
$$"f \cdot g" = \int_0^\pi f(x)g(x)\,dx.$$
In function space, A is replaced by an appropriate operator like d²/dx². The
analogue of the equation
$$v \cdot Aw = (Av)\cdot w$$
is obtained by integrating by parts twice:
$$\int_0^\pi f(x)\frac{d^2g}{dx^2}(x)\,dx = -\int_0^\pi \frac{df}{dx}(x)\frac{dg}{dx}(x)\,dx = \int_0^\pi \frac{d^2f}{dx^2}(x)g(x)\,dx.$$
(Note that we need f(0) = f(π) = g(0) = g(π) = 0 or other suitable bound-
ary conditions for this integration to work without picking up boundary
terms.)
A better name for “symmetric” in this context is “self-adjoint”: an operator
has an adjoint operator AT defined by

(AT v) · w = v · Aw

and self-adjointness means that A = AT .
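The finite-dimensional analogy can be made literal: discretising d²/dx² on
[0, π] with Dirichlet conditions gives a symmetric matrix, and Theorem 75 then
guarantees orthogonal eigenvectors. A sketch (NumPy assumed):

# Discretise d^2/dx^2 as a symmetric second-difference matrix.
import numpy as np

M = 200
h = np.pi/(M + 1)
A = (np.diag(-2*np.ones(M)) + np.diag(np.ones(M - 1), 1)
     + np.diag(np.ones(M - 1), -1))/h**2

evals, evecs = np.linalg.eigh(A)         # eigh is for symmetric matrices
print(evals[-4:])                        # approximately -16, -9, -4, -1
print(evecs[:, -1] @ evecs[:, -2])       # orthogonality: essentially 0

The eigenvectors themselves approximate the eigenfunctions sin(nx), sampled
at the grid points.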

5.4 Discontinuities
As a final example, I want to tackle the problem of what happens when your
equation becomes discontinuous. For example, maybe you're interested in the
heat equation on a rod [0, 2π] whose thermal diffusivity changes dramatically
halfway along and you model this by using the heat equation
$$\frac{\partial\phi}{\partial t} = \sigma\frac{\partial^2\phi}{\partial x^2}$$
where σ is the discontinuous function
$$\sigma(x) = \begin{cases} \sigma_1 & \text{if } x \in [0, \pi] \\ \sigma_2 & \text{if } x \in [\pi, 2\pi] \end{cases}$$
Let’s figure out what the separated solutions are when σ1 = 1 and σ2 = 2 (so
the right-hand half of the rod conducts heat more quickly than the left-hand
half).

We impose the boundary conditions φ(0, t) = φ(2π, t) = 0. Let’s not impose


an initial condition and just look for the separated solutions. We will also need
extra “boundary conditions” at the discontinuity, namely we will require our
separated solutions and their first derivatives to be continuous at π.
Let's write
$$\phi(x, t) = \begin{cases} \phi_1(x, t) = X_1(x)T_1(t) & \text{on } x \in [0, \pi] \\ \phi_2(x, t) = X_2(x)T_2(t) & \text{on } x \in [\pi, 2\pi] \end{cases}$$
On [0, π] we get T1' = λT1, X1'' = λX1 as usual. On [π, 2π] we get T2' = λT2 and
X2'' = (λ/2)X2. Therefore the separated solutions we seek are
$$X_1(x) = A_1\sin(x\sqrt{-\lambda}) + B_1\cos(x\sqrt{-\lambda}), \qquad X_2(x) = A_2\sin(x\sqrt{-\lambda/2}) + B_2\cos(x\sqrt{-\lambda/2}).$$

The boundary condition φ(0, t) = 0 becomes
$$X_1(0) = 0 \Rightarrow B_1 = 0$$
and the boundary condition φ(2π, t) = 0 becomes
$$X_2(2\pi) = 0 \Rightarrow A_2\sin(2\pi\sqrt{-\lambda/2}) + B_2\cos(2\pi\sqrt{-\lambda/2}) = 0.$$

We have four constants left, A1, A2, B2, λ, and only one equation connecting
them. We need to impose continuity of X and X' at x = π to fix these extra
constants.
We have
$$X_1(\pi) = A_1\sin(\pi\sqrt{-\lambda}),$$
$$X_2(\pi) = A_2\sin(\pi\sqrt{-\lambda/2}) + B_2\cos(\pi\sqrt{-\lambda/2}),$$
so X1(π) = X2(π) implies
$$A_1\sin(\pi\sqrt{-\lambda}) = A_2\sin(\pi\sqrt{-\lambda/2}) + B_2\cos(\pi\sqrt{-\lambda/2}),$$
and
$$X_1'(\pi) = \sqrt{-\lambda}\,A_1\cos(\pi\sqrt{-\lambda}),$$
$$X_2'(\pi) = \sqrt{-\lambda/2}\left(A_2\cos(\pi\sqrt{-\lambda/2}) - B_2\sin(\pi\sqrt{-\lambda/2})\right),$$
so X1'(π) = X2'(π) implies
$$\sqrt{-\lambda}\,A_1\cos(\pi\sqrt{-\lambda}) = \sqrt{-\lambda/2}\left(A_2\cos(\pi\sqrt{-\lambda/2}) - B_2\sin(\pi\sqrt{-\lambda/2})\right).$$

We already know that
$$A_2\sin(2\pi\sqrt{-\lambda/2}) = -B_2\cos(2\pi\sqrt{-\lambda/2}),$$
i.e. $B_2 = -A_2\tan(2\pi\sqrt{-\lambda/2})$, so the two new equations give
$$A_1\sin(\pi\sqrt{-\lambda}) = A_2\left(\sin(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})\cos(\pi\sqrt{-\lambda/2})\right),$$
$$A_1\sqrt{-\lambda}\cos(\pi\sqrt{-\lambda}) = \sqrt{-\lambda/2}\,A_2\left(\cos(\pi\sqrt{-\lambda/2}) + \tan(2\pi\sqrt{-\lambda/2})\sin(\pi\sqrt{-\lambda/2})\right).$$
Dividing the first by the second equation gives
$$\frac{1}{\sqrt{-\lambda}}\tan(\pi\sqrt{-\lambda}) = \frac{\sin(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})\cos(\pi\sqrt{-\lambda/2})}{\sqrt{-\lambda/2}\left(\cos(\pi\sqrt{-\lambda/2}) + \tan(2\pi\sqrt{-\lambda/2})\sin(\pi\sqrt{-\lambda/2})\right)}$$
so
$$\tan(\pi\sqrt{-\lambda}) = \sqrt{2}\,\frac{\tan(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})}{1 + \tan(2\pi\sqrt{-\lambda/2})\tan(\pi\sqrt{-\lambda/2})} = -\sqrt{2}\tan\left(\pi\sqrt{-\lambda/2}\right).$$

So λ has to satisfy this bizarre equation! If we plot the graphs of tan(π√−λ) and
−√2 tan(π√(−λ/2)) as functions of λ, see Figure 5.1, we see that they intersect
at a discrete set of points with abscissae λn. Therefore we get separated solutions
with these λn and the other constants can be calculated from the relations we
have already found.
This ceases to be something we can do explicitly because there is no closed form
expression for the numbers λn.
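The λn can nonetheless be located numerically. One way (a sketch assuming
NumPy/SciPy): substitute µ = √−λ, scan for sign changes of
g(µ) = tan(πµ) + √2 tan(πµ/√2), discard the spurious sign changes at the poles
of tan, and refine each genuine bracket with a root-finder.

# Find the first few eigenvalues lambda_n of the discontinuous rod.
import numpy as np
from scipy.optimize import brentq

def g(mu):
    return np.tan(np.pi*mu) + np.sqrt(2)*np.tan(np.pi*mu/np.sqrt(2))

mus = np.linspace(0.01, 4.0, 40001)
vals = g(mus)
roots = []
for i in range(len(mus) - 1):
    # a sign change with moderate values brackets a root, not a pole
    if vals[i]*vals[i+1] < 0 and abs(vals[i]) < 50 and abs(vals[i+1]) < 50:
        roots.append(brentq(g, mus[i], mus[i+1]))

print([-r**2 for r in roots])            # the eigenvalues lambda_n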
Figure 5.1: The graphs of tan(π√−λ) and −√2 tan(π√(−λ/2)) as functions of
λ. The x-coordinates of the (discrete set of) intersection points are the numbers
we are calling λn.
Chapter 6

The wave equation

6.1 Derivation
Let’s consider a uniform one-dimensional string stretched out straight between
two points (0, 0) and (L, 0). Make a small perturbation of the string so that it
follows the graph of a function u : [0, L] → R with u(0) = u(L) = 0. We will let
go of the string and allow it to vibrate. At time t and above the point x ∈ [0, L]
the height of the string will be φ(x, t). Here’s a claim which we won’t justify,
but which should be intuitively appealing because of Hooke’s law which relates
tension and lengthening:
Claim 76. Assume that the perturbation φ is small and assume the same of its
derivatives. The potential energy in this perturbed string is (to a good approx-
imation) proportional to the difference between its length and the length L of the
unperturbed string, i.e. equal to
$$\tau\int_0^L\left(\sqrt{1 + \left(\frac{\partial\phi}{\partial x}\right)^2} - 1\right)dx$$
where τ is the tension per unit of lengthening. To lowest order in φ (by Taylor
expanding the integrand) this is approximately
$$\frac{\tau}{2}\int_0^L\left(\frac{\partial\phi}{\partial x}\right)^2 dx.$$
Moreover, the total kinetic energy of the string, once it is in motion, is to a
good approximation
$$\frac{\rho}{2}\int_0^L\left(\frac{\partial\phi}{\partial t}\right)^2 dx$$
where ρ is the density in units of mass per unit length (constant by the assump-
tion of uniformity).


We saw on Problem Sheet 5 that, in the Lagrangian formulation of mechanics,
equations of motion can be derived by minimising the Lagrangian given by the
difference between kinetic and potential energy. Using this principle, we will
derive the equation of motion for the string by extremising the functional
$$\int_0^L\left(\frac{\rho}{2}\left(\frac{\partial\phi}{\partial t}\right)^2 - \frac{\tau}{2}\left(\frac{\partial\phi}{\partial x}\right)^2\right)dx.$$
The (two-variable) Euler-Lagrange equation for this functional is
$$\rho\frac{\partial}{\partial t}\left(\frac{\partial\phi}{\partial t}\right) - \tau\frac{\partial}{\partial x}\left(\frac{\partial\phi}{\partial x}\right) = 0,$$
or
$$\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{\partial^2\phi}{\partial x^2}$$
where c = √(τ/ρ). We will see that c has the interpretation of speed for the
waves of vertical disturbance that start to move up and down the string.

6.2 Boundary conditions

We have already seen how to solve the wave equation (and more generally linear
hyperbolic equations with constant coefficients) in Section 3.3.2. The general
solution is
φ(x, t) = F (x + ct) + G(x − ct)
for some arbitrary functions F and G. The only remaining task is to compute
F and G from the initial data of the string at time t = 0. Since there are
two arbitrary functions F and G (or because the wave equation is second
order) we need to know more than just the initial profile u of the string. We
need extra boundary or initial conditions. Above, we assumed fixed endpoints
φ(0, t) = φ(L, t) = 0, but there are other possibilities:
• Maybe the string is infinitely long, but we know v(x) ≡ ∂φ/∂t(x, 0). If we
were just letting go of the string then it would be stationary at the instant
we let it go (though it would immediately start to accelerate). In other
words we would have v ≡ 0. However, we allow ourselves more general
functions v.
• Maybe the string extends infinitely far in only one direction (i.e. along
the interval [0, ∞)), but is fixed so that φ(0, t) = 0.
More generally there are all sorts of boundary conditions you could stick on
your string. We’ll just concern ourselves with the fixed endpoints and infinite
string settings.

6.2.1 The bounded string: Fourier’s solution

We deal first with the case of a string stretched between 0 and L and fixed at
height zero. That is we impose the boundary conditions

φ(0, t) = φ(L, t) = 0.

We will seek solutions to the wave equation by separating variables, that is,
looking for solutions φ(x, t) = X(x)T(t). Substituting this into the wave equation
gives:
$$X''T = \frac{1}{c^2}XT''$$
and so we only need to solve simple harmonic oscillator equations
$$X'' = \lambda X, \qquad T'' = c^2\lambda T.$$

Claim 77. There are no nontrivial solutions satisfying the boundary conditions

φ(0, t) = φ(L, t) = 0

if λ ≥ 0.

Proof. When λ = 0 we have X'' = 0 which means X = Ax + B. For this
to vanish at x = 0, L we need A = B = 0. Similarly if λ > 0 then X =
A cosh(x√λ) + B sinh(x√λ); setting x = 0, L and requiring φ(0, t) and φ(L, t)
to vanish implies
$$A\cosh(0) + B\sinh(0) = 0 \Rightarrow A = 0$$
and
$$B\sinh(L\sqrt{\lambda}) = 0 \Rightarrow B = 0.$$
Hence A = B = 0 and the solution is trivial.

In the remaining case λ < 0 there are many solutions: when √−λ = kπ/L we
have, as usual,
$$X(x) = \sin(k\pi x/L), \qquad T(t) = C_k\cos(kc\pi t/L) + D_k\sin(kc\pi t/L).$$

Because the equation is second-order in time, we need to specify both f(x) =
φ(x, 0) and g(x) = ∂φ/∂t(x, 0). Suppose that
$$f(x) = \sum_{k=1}^{\infty} F_k\sin(k\pi x/L)$$
$$g(x) = \sum_{k=1}^{\infty} G_k\sin(k\pi x/L)$$

are the half-range sinusoidal Fourier series for f and g. Then if
$$\phi(x, t) = \sum_{k=1}^{\infty}\left(C_k\cos(kc\pi t/L) + D_k\sin(kc\pi t/L)\right)\sin(k\pi x/L)$$
we see that f(x) = φ(x, 0) implies
$$\sum_{k=1}^{\infty} F_k\sin(k\pi x/L) = \sum_{k=1}^{\infty} C_k\sin(k\pi x/L)$$
and g(x) = ∂φ/∂t(x, 0) implies
$$\sum_{k=1}^{\infty} G_k\sin(k\pi x/L) = \sum_{k=1}^{\infty} \frac{k\pi c}{L}D_k\sin(k\pi x/L)$$
so Ck = Fk and Dk = LGk/kπc. The solution is therefore
$$\phi(x, t) = \sum_{k=1}^{\infty}\left(F_k\cos(kc\pi t/L) + \frac{LG_k}{k\pi c}\sin(kc\pi t/L)\right)\sin(k\pi x/L).$$

This has an obvious oscillatory behaviour in time which is characteristic of solu-
tions to the wave equation, unlike the heat equation where oscillatory behaviour
is suppressed exponentially. Note that a Fourier mode with higher spatial fre-
quency has correspondingly higher temporal frequency (this is because X and
T satisfy simple harmonic motion with constants differing only by a factor of
c²).
The different summands are called the normal modes of vibration: k = 1 is
called the fundamental mode, k = 2 is called the second harmonic, k = 3 the
third harmonic, etc. Our claim about frequency means that a string vibrating
in its kth mode will move up and down with angular frequency kπc/L.
Example 78. Let's do the example of a plucked string, where the initial con-
dition is
$$f(x) = \begin{cases} mx & \text{if } x \le L/2 \\ m(L - x) & \text{if } x \ge L/2 \end{cases}$$
and g(x) = 0. The half-range Fourier sine series of f(x) is
$$\sum_{n=1}^{\infty}\frac{4mL}{n^2\pi^2}\sin(n\pi/2)\sin\left(\frac{n\pi x}{L}\right)$$
and the Fourier expansion of g vanishes, so the solution is
$$\phi(x, t) = \sum_{n=1}^{\infty}\frac{4mL}{n^2\pi^2}\sin(n\pi/2)\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right).$$
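The solution is easy to synthesise from its partial sums. A sketch (NumPy,
with the arbitrary normalisation L = m = c = 1 and the first N modes):

# Partial sums of the plucked-string solution from Example 78.
import numpy as np

L, m, c, N = 1.0, 1.0, 1.0, 50

def phi(x, t):
    n = np.arange(1, N + 1)
    coeff = 4*m*L*np.sin(n*np.pi/2)/(n*np.pi)**2
    return np.sum(coeff*np.sin(n*np.pi*x/L)*np.cos(n*np.pi*c*t/L))

print(phi(0.25, 0.0))                    # close to f(0.25) = 0.25
print(phi(0.25, 2*L/c))                  # the motion has period 2L/c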

6.2.2 The infinite string: D’Alembert’s solution

In this section we will deal with the case of an infinite string, specifying u(x) =
φ(x, 0) and v(x) = ∂φ/∂t(x, 0). We have already seen how to solve the wave equa-
tion (and more generally linear hyperbolic equations with constant coefficients)
in Section 3.3.2.
Since φ(x, t) = F(x + ct) + G(x − ct) we have
$$u(x) = \phi(x, 0) = F(x) + G(x),$$
$$v(x) = \frac{\partial\phi}{\partial t}(x, 0) = c(F'(x) - G'(x)),$$
and so by the fundamental theorem of calculus
$$\int_0^x v(s)\,ds = c(F(x) - G(x)) + K$$
for some constant K, and hence
$$F(x) = \frac{1}{2}\left(u(x) + \frac{1}{c}\left(\int_0^x v(s)\,ds - K\right)\right),$$
$$G(x) = \frac{1}{2}\left(u(x) - \frac{1}{c}\left(\int_0^x v(s)\,ds - K\right)\right),$$
which gives the general solution
$$\phi(x, t) = F(x + ct) + G(x - ct)$$
$$= \frac{1}{2}(u(x + ct) + u(x - ct)) + \frac{1}{2c}\left(\int_0^{x+ct} - \int_0^{x-ct}\right)v(s)\,ds$$
$$= \frac{1}{2}(u(x + ct) + u(x - ct)) + \frac{1}{2c}\int_{x-ct}^{x+ct} v(s)\,ds,$$
due to D'Alembert.

6.2.3 Example

For a similar example worked out from first principles, see Section 3.3.2.
Example 79. Suppose that φ(x, 0) = sin x and ∂φ/∂t(x, 0) = cos x. Then
$$\phi(x, t) = \frac{1}{2}(\sin(x + ct) + \sin(x - ct)) + \frac{1}{2c}\int_{x-ct}^{x+ct}\cos\xi\,d\xi$$
$$= \frac{1}{2}(\sin(x + ct) + \sin(x - ct)) + \frac{1}{2c}(\sin(x + ct) - \sin(x - ct))$$
$$= \frac{1}{2}(1 + 1/c)\sin(x + ct) + \frac{1}{2}(1 - 1/c)\sin(x - ct).$$
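It is straightforward to confirm symbolically that this solves the wave
equation with the stated initial data; a sketch using SymPy (assumed):

# Check d'Alembert's solution from Example 79.
import sympy as sp

x, t, c = sp.symbols('x t c', positive=True)
phi = sp.Rational(1, 2)*((1 + 1/c)*sp.sin(x + c*t)
                         + (1 - 1/c)*sp.sin(x - c*t))

print(sp.simplify(sp.diff(phi, t, 2)/c**2 - sp.diff(phi, x, 2)))  # 0
print(sp.simplify(phi.subs(t, 0)))                   # sin(x)
print(sp.simplify(sp.diff(phi, t).subs(t, 0)))       # cos(x)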

6.2.4 Propagating signals

Let's try to understand d'Alembert's solution a little better. Assume for the
moment that v ≡ 0 so that the solution is
$$\phi(x, t) = \frac{1}{2}(u(x + ct) + u(x - ct)).$$
At each time t this is a superposition of two terms. Both terms have the same
profile (shape), namely the profile of u (scaled down by a factor of two). As time
progresses one of these profiles seems to move to the right (u(x − ct), because its
argument is decreasing as t increases) and the other (u(x + ct)) seems to move
to the left. For instance, in Example 79 the crests of the ½(1 + 1/c) sin(x + ct)
wave occur at
$$x + ct = 2\pi n + \pi/2.$$
Let xn(t) denote the position of the nth crest at time t. This gives
$$x_n(t) = 2\pi n + \pi/2 - ct$$
so as t increases, xn(t) decreases and the crest moves to the left. We therefore
call this a left-moving wave and the other a right-moving wave.
Example 80. Suppose that
$$u(x) = \begin{cases} 1 & \text{if } |x| < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Then two little square waves move off in either direction. Notice that an observer
standing k metres to the left will only notice the arrival of the square wave after
k/c seconds. In other words, signals propagate at a speed of precisely c.

We represent this causal relationship diagrammatically by drawing the (x, t)-


plane (called a spacetime diagram). See Figure 6.1. In the pictures, c = 1 so
that the slope of a line traced out by a wave front leaving the point x = 0
at time t = 0 is 1. We’ll talk about the waves as light waves (just to aid the
imagination). The lower part of the figure shows as dotted lines the light rays
emitted forward in time from a point situated at the origin and the light which
arrives at that point from the past. The forward light cone is the set of all points
which could be reached from this point by travelling along light rays (maybe
alternating between left- or right-moving rays by a cunning use of mirrors). On
the other hand, light can never reach the region |x| > t because it would have
to travel faster than light to get there. The backward light-cone is the set of all
points from which light could conceivably reach the origin via a cunning system
of mirrors.
In the upper part of the diagram you can see the support of the two square
waves propagating to the left and right. The support of a function is the set of
points where it’s non-zero. The interval [−1, 1] at time 0 is the initial support of
the square wave. As time progresses, the supports of the left- and right-moving
square-waves move left and right along the dotted lines.

Figure 6.1: A spacetime diagram showing light rays (moving at speed c = 1)
emitted from a point at the origin (lower part) and showing the motion of a
left- and right-moving square wave (upper part).

6.2.5 Comparison of Fourier and D’Alembert solutions

We now have two ways of solving the wave equation: by separating variables à
la Fourier and by a clever change of coordinates due to D’Alembert. It is easier
to apply the separated variables method to boundary-value problems because
it is not so easy to express the boundary conditions in terms of the arbitrary
functions F and G from D’Alembert’s solution. Of course the two solutions are
related by a standard trigonometric identity:

$$\sin(px)\sin(pct) = \frac{\cos p(x - ct) - \cos p(x + ct)}{2},$$

i.e. a separated solution can be seen as a superposition of right- and left-moving


waves.

6.3 Some simple waves on an infinite string

6.3.1 Simple waves on an infinite string

One particularly simple wave which we're allowed on an infinite string is the
simple harmonic wave
$$\phi(x, t) = A\cos(k(x - ct) + B)$$
which we could re-express in terms of sin(k(x − ct)) and cos(k(x − ct)) if we were
so inclined. The maximal height A is the amplitude, k is the angular wavenumber,
λ = 2π/k is the wavelength, ω = kc is the angular frequency and T = 2π/ω is
the period. Changing the constant B is called shifting the phase of the wave.
This wave is moving to the right (as time increases, the crest at k(x − ct) + B = 2πn
sits at x = ct + (2πn − B)/k, which is increasing) and you could create a
left-moving wave by using x + ct.
A standing wave, which, as the name suggests, doesn't move, can be obtained by
adding a left-moving and a right-moving simple harmonic wave with the same
amplitude and frequency:
$$\phi(x, t) = A\cos(k(x - ct)) + A\cos(k(x + ct)) = 2A\cos(kx)\cos(kct).$$
Another way to write the simple harmonic wave is as
$$\phi(x, t) = \mathrm{Re}\left(Ae^{iB}\exp(ik(x - ct))\right).$$




6.3.2 Reflection and transmission coefficients

Suppose that there are two strings tied together at the origin and extending
off to infinity in either direction. Suppose that waves travel at speed c− in the
left-hand string and c+ in the right-hand string. Imagine a simple harmonic
wave incoming from the left,
$$Ae^{ik(x - c_-t)}, \qquad x \le 0,$$
and when it hits the discontinuity some of it is reflected and some of it is
transmitted, resulting in a composition of three waves: incoming, reflected and
transmitted, with amplitudes A, R, T and frequencies k, ℓ, m. In other words,
the resulting wave has the form:
$$\phi(x, t) = \begin{cases} Ae^{ik(x - c_-t)} + Re^{i\ell(x + c_-t)} & \text{when } x \le 0 \\ Te^{im(x - c_+t)} & \text{when } x \ge 0 \end{cases}$$

Note that the reflected term is left-moving! This is certainly a solution to
the wave equation away from x = 0: it's a linear superposition of solutions

the wave equation away from x = 0: it’s a linear superposition of solutions
and the equation is linear. At x = 0 it’s not clear that the wave equation
even makes sense because φ might not be twice differentiable there. However,
in physics people often make such guesses, claiming that the equation itself is
discontinuous at the origin and therefore it doesn’t matter that the solution
isn’t twice differentiable there: we simply don’t care what’s going on. There’s
some physical process going on which we don’t understand and which reflects
and transmits the waves, all we can do is observe the reflected and transmitted
waves as they travel out, away from the origin, to regions where we understand
the wave equation! This is characteristic of scattering calculations.
Despite our ignorance, we will try to calculate R, T , ` and m in terms of A and
k. To do this we need to impose some assumptions about φ at x = 0: namely,
we want φ and its first x-derivative to be continuous at x = 0. Then
$$Ae^{-ikc_-t} + Re^{i\ell c_-t} = Te^{-imc_+t}$$
and
$$kAe^{-ikc_-t} + \ell Re^{i\ell c_-t} = mTe^{-imc_+t}.$$
Since the functions $e^{int}$ are linearly independent in the space of all
complex-valued functions, we need kc₋ = −ℓc₋ = mc₊, that is
$$\ell = -k, \qquad m = kc_-/c_+.$$

The two equations become
$$A + R = T$$
and
$$k(A - R) = kTc_-/c_+,$$
which we can solve to get
$$T = \frac{2Ac_+}{c_- + c_+}, \qquad R = \frac{A(c_+ - c_-)}{c_- + c_+}.$$
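A quick numerical check (NumPy assumed) that these values of R, T, ℓ and m
really make φ and ∂φ/∂x continuous at x = 0:

# Verify the matching conditions for reflection and transmission.
import numpy as np

A, k, cm, cp = 1.0, 2.0, 1.0, 3.0        # cm = c_-, cp = c_+
T = 2*A*cp/(cm + cp)
R = A*(cp - cm)/(cm + cp)
l, m = -k, k*cm/cp

t = 0.37                                 # any time will do
left = A*np.exp(-1j*k*cm*t) + R*np.exp(1j*l*cm*t)      # phi at x = 0-
right = T*np.exp(-1j*m*cp*t)                           # phi at x = 0+
print(np.isclose(left, right))           # continuity of phi: True
print(np.isclose(1j*k*A*np.exp(-1j*k*cm*t) + 1j*l*R*np.exp(1j*l*cm*t),
                 1j*m*T*np.exp(-1j*m*cp*t)))  # continuity of phi_x: True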
Part III

Appendices

Appendix A

Recap of Fourier theory

A.1 Fourier series

Let L ∈ (0, ∞) be a positive real number. Fourier theory is concerned with
functions f : R → R which are periodic with period 2L (in the sense that
f(x + 2L) = f(x)) and the attempt to express them in the form
$$f(x) = c + \sum_{n=1}^{\infty} a_n\cos\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{\infty} b_n\sin\left(\frac{n\pi x}{L}\right). \qquad (A.1)$$

The letters m and n will always stand for integers greater than or equal to one.
$$\int_{-L}^{L}\sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right)dx = \begin{cases} 0 & \text{if } m \ne n, \\ L & \text{otherwise.} \end{cases} \qquad (A.2)$$
$$\int_{-L}^{L}\cos\left(\frac{m\pi x}{L}\right)\cos\left(\frac{n\pi x}{L}\right)dx = \begin{cases} 0 & \text{if } m \ne n, \\ L & \text{otherwise.} \end{cases} \qquad (A.3)$$
$$\int_{-L}^{L}\sin\left(\frac{m\pi x}{L}\right)\cos\left(\frac{n\pi x}{L}\right)dx = 0 \qquad (A.4)$$

Assume for one moment that we are allowed to swap integrals and sums¹, for
instance:
$$\int_{-L}^{L}\sum_{n=1}^{\infty} a_n\cos\left(\frac{n\pi x}{L}\right)dx = \sum_{n=1}^{\infty}\int_{-L}^{L} a_n\cos\left(\frac{n\pi x}{L}\right)dx.$$

¹You will see justification for this kind of operation next term in Analysis 4. Specifically,
we need to assume that the Fourier series converges uniformly to f.


Then a function f(x) with Fourier series given by (A.1) has Fourier coefficients
$$c = \frac{1}{2L}\int_{-L}^{L} f(x)\,dx \qquad (A.5)$$
$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx \qquad (A.6)$$
$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx \qquad (A.7)$$
So when do Fourier series exist? Providing f is Riemann-integrable over
[−L, L] you can make sense of the integrals (A.6) and (A.7) and write down a
Fourier series (see Question 5 above).
But when do they actually have anything to do with the function you
started with? Let's define the partial sums
$$S_N(x) = c + \sum_{n=1}^{N} a_n\cos\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{N} b_n\sin\left(\frac{n\pi x}{L}\right).$$

Theorem 81. If f : R → R is a piecewise continuously differentiable function
with period 2L then it admits a Fourier series (A.1) which converges pointwise
to f, i.e. for all x ∈ R such that f is differentiable at x, the difference
$$f(x) - S_N(x)$$
tends to zero as N → ∞. Moreover, when x₀ is a point where f is discontinuous,
the Fourier sums converge to the average of the left and right limits:
$$S_N(x_0) \to \lim_{\epsilon\to 0}\frac{f(x_0 - \epsilon) + f(x_0 + \epsilon)}{2}.$$

A.2 Square-integrable functions and Parseval's theorem

Theorem 82 (Parseval's theorem). If f : R → R is periodic with period 2L
and square-integrable over [−L, L] (meaning that the integral
$$\int_{-L}^{L}|f(x)|^2\,dx$$
exists and is finite) then
$$S_N \to f$$
in the very weak sense that
$$\int_{-L}^{L}|f(x) - S_N(x)|^2\,dx \to 0$$
as N → ∞. In terms familiar from statistics, you can think of SN as a least-
squares approximation to f (but by Fourier sums, rather than polynomials).
Consequently,
$$2c^2 + \sum_{n=1}^{\infty}(a_n^2 + b_n^2) = \frac{1}{L}\int_{-L}^{L}|f(x)|^2\,dx. \qquad (A.8)$$
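Parseval's theorem is easy to test numerically. For f(x) = x on [−π, π] we
have c = 0, an = 0 and bn = 2(−1)^{n+1}/n, and both sides of (A.8) equal
2π²/3. A sketch (NumPy assumed):

# Numerical check of Parseval's identity for f(x) = x, L = pi.
import numpy as np

L = np.pi
n = np.arange(1, 200001)
bn = 2.0*(-1.0)**(n + 1)/n
lhs = np.sum(bn**2)                      # 2c^2 + sum(a_n^2 + b_n^2)
rhs = (1/L)*(2*L**3/3)                   # (1/L) * integral of x^2
print(lhs, rhs)                          # both approximately 6.5797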

A.3 Interpretation
The way you should think about what’s going on is by analogy with finite-
dimensional vector spaces. With a finite-dimensional vector space V you can
pick an inner product ⟨·, ·⟩ : V × V → R which, given two vectors, outputs a
number. The vectors v, w are orthogonal if ⟨v, w⟩ = 0. A basis e1, . . . , en is
orthonormal if ⟨ei, ej⟩ = δij.
In Fourier theory the relevant vector space is the space of periodic functions
with period 2L, square-integrable on [−L, L]. The inner product of f and g is
$$\frac{1}{L}\int_{-L}^{L} f(x)g(x)\,dx.$$
Note that this is a well-defined integral: if the integrals $\int_{-L}^{L} f(x)^2 dx$ and $\int_{-L}^{L} g(x)^2 dx$
are both finite then, by the Cauchy-Schwarz inequality
$$\left(\int_{-L}^{L} f(x)g(x)\,dx\right)^2 \le \left(\int_{-L}^{L} f(x)^2\,dx\right)\left(\int_{-L}^{L} g(x)^2\,dx\right),$$
the inner product is also finite.


With respect to this inner product the "vectors" (i.e. functions)
$$\frac{1}{\sqrt{2}}, \quad \cos\left(\frac{m\pi x}{L}\right), \quad \sin\left(\frac{m\pi x}{L}\right), \qquad m \ge 1$$
are all orthonormal. However, they do not form a basis! Being a basis for a
vector space means that any other vector (in this case function) can be written
as a linear combination of a finite number of basis elements. The Fourier
coefficients are the orthogonal projections of a given square-integrable function
f onto the basis directions and we have certainly seen functions with infinite
Fourier series.
The right way to think about this is that the space of Fourier sums (i.e. Fourier
series with only finitely many nonzero coefficients) is dense in the space of all
square-integrable functions, and that you can approximate (in the least-squares
sense!) arbitrary square-integrable functions by infinite sequences of Fourier
sums SN and let N → ∞.
Appendix B

Sage code for diagrams

B.1 Figure 1.1


y=var('y')
x=var('x')
t=var('t')
P=plot3d(x*y/(x^2+y^2),(x,-1,1),(y,-1,1),plot_points=[150,150],frame=False)
Ax=parametric_plot(vector((t,0,0)),(t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)),(t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)),(t,-0.7,0.7), thickness=3, color='black')
show(P+Ax+Ay+Az)

B.2 Figure 1.2


x=var('x')
y=var('y')
t=var('t')
P=plot3d(x^3-x-y^2,(x,-1,1),(y,-1,1),opacity=0.8,plot_points=[20,20],frame=False)
Q=plot3d(-x,(x,-0.4,0.4),(y,-0.4,0.4),opacity=0.8,color='gray',
plot_points=[10,10],frame=False)
R=parametric_plot3d((0,0,0),(t,0,1),color='black',thickness=15)
show(P+Q+R)

B.3 Figure 1.3


y=var('y')
x=var('x')
t=var('t')
P=plot3d(x^4+y^4-x^2-y^2,(x,-0.5,1),(y,-0.5,1),plot_points=[20,20],
opacity=0.85,frame=False)
Q=plot3d(0,(x,-0.3,0.3),(y,-0.3,0.3),color='gray',plot_points=[20,20],
opacity=0.8,frame=False)
Ax=parametric_plot(vector((t,0,0)),(t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)),(t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)),(t,-0.7,0.7), thickness=3, color='black')
R=parametric_plot3d((0,0,0),(t,0,1),color='black',thickness=15)
show(P+Q+R)


B.4 Figure 1.4


x=var('x')
y=var('y')
t=var('t')
P=plot3d(x^2-y^2,(x,-1,1),(y,-1,1),opacity=0.8,plot_points=[20,20],
frame=False)
Ax=parametric_plot(vector((t,0,0)),(t,-1.1,1.1), thickness=3,
color='black')
Ay=parametric_plot(vector((0,t,0)),(t,-1.1,1.1), thickness=3,
color='black')
Az=parametric_plot(vector((0,0,t)),(t,-0.7,0.7), thickness=3,
color='black')
show(P+Ax+Ay+Az)

B.5 Figures 4.1 and 4.2


B.5.1 Parts (a)
Use one or other of the show commands.

x=var('x')
y=var('y')
z=var('z')
t=var('t')
u=var('u')
p=plot_vector_field3d((1,z,0),(x,-1,1),(y,-1,1),(z,-1,1),frame=False)
q=plot_vector_field3d((1,z,0),(x,-1,1),(y,-1,1),(z,0,1),frame=False)
r=parametric_plot3d((t,u+t*u,u),(t,0,1),(u,-1,1),plot_points=[10,10],
opacity=0.8,frame=False)
s=parametric_plot3d((t,u+t*u^2,u^2),(t,0,1),(u,-1,1),plot_points=[10,10],
opacity=0.8,frame=False)
Ax=parametric_plot(vector((t,0,0)),(t,-1.1,1.1),
thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)),(t,-1.1,1.1),
thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)),(t,-0.7,0.7),
thickness=3, color='black')
Az2=parametric_plot(vector((0,0,t)),(t,0,0.7),
thickness=3, color='black')
show(p+r+Ax+Ay+Az,mesh=True)
show(q+s+Ax+Ay+Az2,mesh=True)

B.5.2 Figure 4.1(b)


t=var('t')
p1=parametric_plot((t,0),(t,-1.2,1),axes=False)
p2=parametric_plot((t,0.1+0.1*t),(t,-1.2,1),axes=False)
p3=parametric_plot((t,0.2+0.2*t),(t,-1.2,1),axes=False)
p4=parametric_plot((t,0.3+0.3*t),(t,-1.2,1),axes=False)
p5=parametric_plot((t,0.4+0.4*t),(t,-1.2,1),axes=False)
p6=parametric_plot((t,0.5+0.5*t),(t,-1.2,1),axes=False)
p7=parametric_plot((t,0.6+0.6*t),(t,-1.2,1),axes=False)
p8=parametric_plot((t,0.7+0.7*t),(t,-1.2,1),axes=False)
p9=parametric_plot((t,0.8+0.8*t),(t,-1.2,1),axes=False)
pA=parametric_plot((t,0.9+0.9*t),(t,-1.2,1),axes=False)
pB=parametric_plot((t,1+1^2*t),(t,-1.2,1),axes=False)
q1=parametric_plot((t,0),(t,-1.2,1),axes=False)
q2=parametric_plot((t,-0.1-0.1*t),(t,-1.2,1),axes=False)
q3=parametric_plot((t,-0.2-0.2*t),(t,-1.2,1),axes=False)
q4=parametric_plot((t,-0.3-0.3*t),(t,-1.2,1),axes=False)
q5=parametric_plot((t,-0.4-0.4*t),(t,-1.2,1),axes=False)
q6=parametric_plot((t,-0.5-0.5*t),(t,-1.2,1),axes=False)

q7=parametric_plot((t,-0.6-0.6*t),(t,-1.2,1),axes=False)
q8=parametric_plot((t,-0.7-0.7*t),(t,-1.2,1),axes=False)
q9=parametric_plot((t,-0.8-0.8*t),(t,-1.2,1),axes=False)
qA=parametric_plot((t,-0.9-0.9*t),(t,-1.2,1),axes=False)
qB=parametric_plot((t,-1-1*t),(t,-1.2,1),axes=False)
Ax=parametric_plot(vector((t,0)),(t,-1.1,1.1),
thickness=3, color='black')
Ay=parametric_plot(vector((0,t)),(t,-1.1,1.1),
thickness=3, color='black')
show(p1+p2+p3+p4+p5+p6+p7+p8+p9+pA+pB+q1+q2+q3+q4+q5+q6+
q7+q8+q9+qA+qB+Ax+Ay)

B.5.3 Figure 4.2(b)


t=var('t')
x=var('x')
y=var('y')
p1=parametric_plot((t,0),(t,-1.2,1),axes=False)
p2=parametric_plot((t,0.1+0.1^2*t),(t,-1.2,1),axes=False)
p3=parametric_plot((t,0.2+0.2^2*t),(t,-1.2,1),axes=False)
p4=parametric_plot((t,0.3+0.3^2*t),(t,-1.2,1),axes=False)
p5=parametric_plot((t,0.4+0.4^2*t),(t,-1.2,1),axes=False)
p6=parametric_plot((t,0.5+0.5^2*t),(t,-1.2,1),axes=False)
p7=parametric_plot((t,0.6+0.6^2*t),(t,-1.2,1),axes=False)
p8=parametric_plot((t,0.7+0.7^2*t),(t,-1.2,1),axes=False)
p9=parametric_plot((t,0.8+0.8^2*t),(t,-1.2,1),axes=False)
pA=parametric_plot((t,0.9+0.9^2*t),(t,-1.2,1),axes=False)
pB=parametric_plot((t,1+1^2*t),(t,-1.2,1),axes=False)
q1=parametric_plot((t,0),(t,-1.2,1),axes=False)
q2=parametric_plot((t,-0.1+0.1^2*t),(t,-1.2,1),axes=False)
q3=parametric_plot((t,-0.2+0.2^2*t),(t,-1.2,1),axes=False)
q4=parametric_plot((t,-0.3+0.3^2*t),(t,-1.2,1),axes=False)
q5=parametric_plot((t,-0.4+0.4^2*t),(t,-1.2,1),axes=False)
q6=parametric_plot((t,-0.5+0.5^2*t),(t,-1.2,1),axes=False)
q7=parametric_plot((t,-0.6+0.6^2*t),(t,-1.2,1),axes=False)
q8=parametric_plot((t,-0.7+0.7^2*t),(t,-1.2,1),axes=False)
q9=parametric_plot((t,-0.8+0.8^2*t),(t,-1.2,1),axes=False)
qA=parametric_plot((t,-0.9+0.9^2*t),(t,-1.2,1),axes=False)
qB=parametric_plot((t,-1+1^2*t),(t,-1.2,1),axes=False)
Ax=parametric_plot(vector((t,0)),(t,-1.1,1.1),
thickness=3, color='black')
Ay=parametric_plot(vector((0,t)),(t,-1.1,1.1),
thickness=3, color='black')
r=implicit_plot(1+4*x*y==0,(x,-1,1),(y,-1,1),color='red',
frame=False)
show(p1+p2+p3+p4+p5+p6+p7+p8+p9+pA+pB+q1+q2+q3+q4+q5+q6+
q7+q8+q9+qA+qB+Ax+Ay+r)

B.6 Figure 4.4


t=var('t')
p_1=parametric_plot((2*cos(t),sin(t)),(t,0,2*pi))
p_2=parametric_plot((2*cos(t)-0.2*cos(t)/(1+sin(t)*sin(t)),sin(t)-0.2*2*sin(t)/(1+sin(t)*sin(t))),(t,0,2*pi))
p_3=parametric_plot((2*cos(t)-0.5*cos(t)/(1+sin(t)*sin(t)),sin(t)-0.5*2*sin(t)/(1+sin(t)*sin(t))),(t,0,2*pi))
p_4=parametric_plot((2*cos(t)-0.7*cos(t)/(1+sin(t)*sin(t)),sin(t)-0.7*2*sin(t)/(1+sin(t)*sin(t))),(t,0,2*pi))
p_5=parametric_plot((2*cos(t)-1*cos(t)/(1+sin(t)*sin(t)),sin(t)-1*2*sin(t)/(1+sin(t)*sin(t))),(t,0,2*pi))
p_6=parametric_plot((2*cos(t)-1.2*cos(t)/(1+sin(t)*sin(t)),sin(t)-1.2*2*sin(t)/(1+sin(t)*sin(t))),(t,0,2*pi))
show(p_1+p_2+p_3+p_4+p_5+p_6)
