Mathematical Methods 3
Jonny Evans
2 Calculus of variations
  2.1 Straight lines are shortest paths
    2.1.1 Paths and length
    2.1.2 Straight lines minimise action
    2.1.3 Straight lines minimise length
  2.2 The Euler-Lagrange equations
    2.2.1 Key steps reviewed
    2.2.2 The fundamental theorem of the calculus of variations
    2.2.3 The Euler-Lagrange equation
    2.2.4 Beltrami identity
    2.2.5 Vector-valued functions
    2.2.6 A caveat
  2.3 Examples
    2.3.1 The catenoid
This course is an introduction to some of the most ingenious ideas in 18th cen-
tury mathematics. Some of these ideas are so important that their discoverers’
names appear on the side of the Eiffel Tower.
If you perform sufficiently well on the final exam, who knows where your name
will end up?
Structure
satisfied by the function φ(x, y) whose value at (x, y) ∈ R² is the length of time
it takes light to travel to (x, y) from some fixed light-emitting curve C ⊂ R² (the
light being emitted in a normal direction to C). The obvious way to solve this is
to draw a straight line connecting C to (x, y) and meeting C at ninety degrees.
Let t be the parameter for this line (x(t), y(t)) and observe that if we define
f(t) = φ(x(t), y(t)) then f(t) satisfies the ODE df/dt = 1 (simply because it
takes light time t to travel t units because the speed of light is 1). I’m not
saying that this is obvious from the eikonal equation, but from the physical
interpretation of the solution φ it’s actually a tautology.
The method of characteristics generalises this example and constructs, for any
first order PDE, a system of “characteristic curves” (in this example the light
rays emitted normally from C) along which the PDE reduces to an ODE.
Finally we move on to tackle second order equations. We will concentrate on
three which arise very naturally in physics and geometry:

• The heat equation

  ∂φ/∂t = ∂²φ/∂x².

• The wave equation

  ∂²φ/∂t² = ∂²φ/∂x².

• Laplace's equation

  ∂²φ/∂x² + ∂²φ/∂y² = 0.

These are linear and therefore highly amenable to solution. They also typify
three different classes of second order equations with vastly different behaviours:
parabolic, hyperbolic and elliptic.
The general strategy of this final part of the course will be the following:
• Find infinitely many special solutions to the equation. Maybe they don’t
satisfy the boundary conditions you’re interested in, but don’t worry. This
step uses a clever idea called separation of variables: you assume the solu-
tion has the form φ(x, y) = X(x)Y (y), that is a product of two functions
each depending on only one of the variables. Then the PDE reduces to a
pair of ODEs which are easy to solve. Of course, separation of variables
rarely works, but with these simple linear equations it is fiercely effective.
• Take infinite linear combinations of these special solutions to fit to your
boundary conditions. By linearity of the PDE one can take linear combi-
nations of solutions and obtain a function which still solves the equation.
Fitting infinite sums of special functions (like sine and cosine) to arbitrary
functions is the subject of Fourier theory and indeed this is how Fourier
discovered Fourier theory.
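The separation-of-variables step is easy to sanity-check symbolically. The sketch below is not part of the notes, and the separated solution family e^(−k²t) sin(kx) is our own choice of example; it verifies that such a product solution really does solve the heat equation:

```python
import sympy as sp

# Separated solution of the heat equation d(phi)/dt = d^2(phi)/dx^2: the ansatz
# phi = X(x)*T(t) leads (for separation constant -k^2) to T = exp(-k**2*t) and
# X = sin(k*x).  Check that the product solves the PDE identically in x, t, k.
x, t, k = sp.symbols('x t k', real=True)
phi = sp.exp(-k**2 * t) * sp.sin(k * x)

residual = sp.diff(phi, t) - sp.diff(phi, x, 2)
print(sp.simplify(residual))  # 0
```

Any linear combination of these solutions over different values of k solves the equation too, which is exactly what the Fourier-series step exploits.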
There are also several appendices: the first is a recap of Fourier theory because
we will use lots of Fourier series in the second half of the course; the second
includes much of the Sage code I used to create the diagrams (I also used
Inkscape); the other appendices comprise all the problem sheets for the course.
Videos
There are supplementary videos for this course, available on YouTube. These
cover a mixture of basic material prerequisite for understanding the course and
extension material for those who care to watch it. Links to these videos are in
the relevant places in the notes.
Notation
Sections, like this one, which are marked with a left bar contain material I
consider to be nonexaminable, usually proofs. That doesn’t mean you should
ignore them, just that you shouldn't sit there memorising them, getting your
knickers in a twist.
Partial derivatives are written with curly dees: ∂f/∂x. Ordinary derivatives are
written with Latin dees: df/dt. The only difference is that d/dt means the function
being differentiated is a function of a single variable. People who write partial
derivatives with Latin dees are wrong and will be penalised. People who write
ordinary derivatives with curly dees on the basis that a single variable is a special
case of several variables are smart, but still wrong, because it's misleading for
the reader. If ever I write d/dt instead of ∂/∂t and you don't know why I did
it, challenge me (there’s always the possibility that I made a mistake).
Also, if x(t) is a function of time t, the notation ẋ indicates differentiation with
respect to t. If y(x) is a function of x, the notation y′ indicates differentiation
with respect to x. It's possible that ′ might denote differentiation with respect
to another variable like t. It shouldn't bother you, because it will only ever occur
when the function in question is a function of a single variable, so x′ or ẋ just
means "differentiate x with respect to whatever it depends on".
Acknowledgements
This course owes a lot to the course taught in previous years by Dr R Bowles and
Dr G Esler. Their notes have been indispensable in preparing these notes; many
of the nice questions from their problem sets have made it onto these problem
sets. Arnold’s book “Lectures on Partial Differential Equations” provided most
of the inspiration for my presentation of the method of characteristics; Spiegel’s
“Fourier Analysis” has been useful too: I learned Fourier series from it myself,
long ago. Thanks most of all to my extremely diligent markers (Giancarlo
Grasso, Abbygail Shaw and Huda Ramli) and my wonderful and inquisitive
second years for pointing out so many errors and typos in the lectures, in the
notes and on the question sheets.
Part I
Chapter 1
Calculus with several variables
For most of what I say, we’ll take “several” to mean “two” to save on notation.
There are no new ideas introduced by adding more variables. Henceforth we’ll
use (x, y) to denote Cartesian coordinates on the plane R2 .
∂f/∂x (x₀, y₀) ≡ (d/dx) f(x, y₀) |ₓ₌ₓ₀
             ≡ lim_{h→0} (f(x₀ + h, y₀) − f(x₀, y₀)) / h.
Similarly

∂f/∂y (x₀, y₀) ≡ (d/dy) f(x₀, y) |ᵧ₌ᵧ₀
             ≡ lim_{k→0} (f(x₀, y₀ + k) − f(x₀, y₀)) / k.
Example 1.
1. If f(x, y) = x²y³ then ∂f/∂x = 2xy³ and ∂f/∂y = 3x²y².
2. If f(x, y) = sin(x + y) then ∂f/∂x = cos(x + y) = ∂f/∂y.
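These computations can be checked with a computer algebra system; the following sympy sketch (not part of the notes) confirms both parts of Example 1:

```python
import sympy as sp

# Symbolic check of Example 1.
x, y = sp.symbols('x y')

f = x**2 * y**3
assert sp.diff(f, x) == 2*x*y**3      # df/dx
assert sp.diff(f, y) == 3*x**2*y**2   # df/dy

g = sp.sin(x + y)
assert sp.diff(g, x) == sp.cos(x + y)
assert sp.diff(g, y) == sp.cos(x + y)
print('all partial derivatives agree with Example 1')
```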
For example, if v = (1, 0) then we get

vₚ(f) = ∂f/∂x (p),

or if v = (0, 1) then we get

vₚ(f) = ∂f/∂y (p).
Theorem 2. Suppose that the partial derivatives of f with respect to x and y
exist everywhere and that they vary continuously. Let v = (v1 , v2 ) be a vector
and p ∈ R2 be a point. Then
vₚ(f) = v₁ ∂f/∂x (p) + v₂ ∂f/∂y (p).
This theorem is a nice piece of analysis and therefore we won’t prove it. You
can tell it’s nontrivial because it’s not immediately obvious what continuity of
the partial derivatives has to do with anything.
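The theorem is easy to test numerically. In the sketch below the function f and the sample point are our own choices, not from the notes; it compares a symmetric difference quotient along v with the combination v₁ ∂f/∂x + v₂ ∂f/∂y:

```python
import math

# Finite-difference check of the directional derivative formula for the C^1
# function f(x, y) = sin(x)*y^2, whose partials are cos(x)*y^2 and 2*y*sin(x).
def f(x, y):
    return math.sin(x) * y**2

def directional_derivative(f, p, v, h=1e-6):
    # Symmetric difference quotient along the vector v at the point p.
    (x, y), (v1, v2) = p, v
    return (f(x + h * v1, y + h * v2) - f(x - h * v1, y - h * v2)) / (2 * h)

p, v = (0.7, 1.3), (2.0, -1.0)
lhs = directional_derivative(f, p, v)
rhs = v[0] * math.cos(p[0]) * p[1]**2 + v[1] * 2 * p[1] * math.sin(p[0])
print(abs(lhs - rhs) < 1e-6)  # True
```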
The directional derivatives are just linear combinations of the directional deriva-
tives in the x- and y-directions. What does this mean geometrically, in terms
of the graph of f ?
In single-variable calculus, vectors in R have just one component v. The
directional derivative of a function g : R → R along the vector v at x is
vx (g) = vg 0 (x). As v varies this traces out a line with slope g 0 (x), which is
tangent to the graph of g when we translate it to (x, g(x)).
In two variables, the directional derivative of f : R² → R in the direction
v = (v1 , v2 ) at p = (x0 , y0 ) is
vₚ(f) = v₁ ∂f/∂x (p) + v₂ ∂f/∂y (p)
which traces out a plane as v1 and v2 vary. This plane is tangent to the graph
of f when it is translated to the point (x0 , y0 , f (x0 , y0 )). In other words, the
geometric content of the above theorem is that the graph of a function with
continuous partial derivatives admits a tangent plane.
Example 3. You cannot drop the assumption of continuity of partial deriva-
tives. A counterexample would be the function
f(x, y) = xy/(x² + y²) when (x, y) ≠ (0, 0),   f(x, y) = 0 at (x, y) = (0, 0).
Both partial derivatives exist everywhere. They vanish at the origin, but for
example
∂f/∂x = (y³ − x²y)/(x² + y²)²
and along the y-axis this is y 3 /y 4 = 1/y which does not tend to zero as y →
0, hence the partial derivatives are not continuous. The function is not even
continuous when restricted to a line which is not x = 0 or y = 0 and hence the
other directional derivatives don’t even exist.
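A quick numerical sketch (our own check, not part of the notes) makes the discontinuity visible: the function vanishes along the axes but is constantly 1/2 along the diagonal y = x:

```python
# The function from Example 3: xy/(x^2 + y^2) away from the origin, 0 at the origin.
def f(x, y):
    if (x, y) == (0, 0):
        return 0.0
    return x * y / (x**2 + y**2)

# Along the axes f vanishes identically, but along the diagonal y = x it is
# constantly 1/2, so f has no limit (and is not continuous) at the origin.
for s in [1.0, 0.1, 1e-3, 1e-9]:
    assert f(s, 0.0) == 0.0 and f(0.0, s) == 0.0
    assert abs(f(s, s) - 0.5) < 1e-12

print(f(1e-9, 1e-9))  # 0.5
```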
To get away from difficulties like this, we will only discuss functions with con-
tinuous first partial derivatives, also called C 1 -functions, or once continuously
differentiable functions.
In this case we can define the total derivative at p,
dp f : R2 → R
which encodes all directional derivatives. The total derivative eats a vector v
and outputs the directional derivative in the v-direction at p, that is
dp f (v) = vp (f ).
Figure 1.1: The graph of the function from Example 3 which has discontinuous
partial derivatives and hence no tangent plane at the origin.
(g(x₀ + ε) − g(x₀))/ε = g′(x₀) + η(ε)
(with more vertical entries if there are more coordinates). Note that by defini-
tion:
dp f (v) = (∇f )(p) · v.
Lemma 5. The gradient (∇f )(p) points in the direction of maximal increase
of f at p. Moreover, its magnitude equals the directional derivative in the cor-
responding unit direction (∇f )(p)/|(∇f )(p)|.
Figure 1.2: The graph of the total derivative dp f translated to the point (p, f (p))
is the tangent plane to Graph(f ) at (p, f (p)).
Proof. Let’s assume that p is not a critical point of f , otherwise the Lemma
is obvious (because both dp f and (∇f )(p) vanish). If v is a unit vector then
|dp f (v)| = |(∇f )(p) · v| = |(∇f )(p)||v| cos(θ) where θ is the angle between v and
(∇f )(p). This is clearly maximal when cos(θ) = 1, that is when θ = 0. The
directional derivative of f in the (∇f )(p)/|(∇f )(p)|-direction is
dₚf((∇f)(p)/|(∇f)(p)|) = (∇f)(p) · (∇f)(p)/|(∇f)(p)| = |(∇f)(p)|.
The naturality of writing this as a matrix comes from the following theorem.
Theorem 6 (Chain rule). If F : Rn → Rm and G : Rm → R` are differentiable
functions then
d_p(G ◦ F) = d_{F(p)}G ◦ d_p F
where ◦ denotes composition of functions on the left-hand side and matrix mul-
tiplication on the right.
and

G(u + w) = G(u) + d_u G(w) + |w|ξ(|w|)

for some functions η : R → Rᵐ and ξ : R → Rˡ such that |η(r)| and |ξ(r)| tend to zero as
r → 0. Therefore

G(F(p + v)) = G(F(p)) + d_{F(p)}G(d_p F(v) + |v|η(|v|)) + |w|ξ(|w|)
which equals

G(F(p)) + d_{F(p)}G(d_p F(v)) + d_{F(p)}G(|v|η(|v|)) + |w|ξ(|w|).

The sum of the last two terms, divided by |v|, goes to zero as |v| → 0, and since
this linear approximation determines our first derivative we know that

d_p(G ◦ F) = d_{F(p)}G ◦ d_p F
In case you find the level of abstraction in this proof intimidating, I suggest
you try to prove the corollary explicitly in coordinates as it is stated. In case
you find the proof fun, I suggest you go to Analysis 4 to see more where that
came from.
Geometrically, you drop a horizontal line down onto the graph of f until it hits.
The point where it hits is clearly a local maximum and the horizontal line is
tangent there.
Proof. Let’s prove it for local maxima. Since g is differentiable it can be ap-
proximated by
g(x0 + ) = g(x0 ) + g 0 (x0 ) + η()
where η → 0 as → 0. If g 0 (x0 ) 6= 0 then it has a sign, ±. We consider small
with the same sign. When the magnitude of is sufficiently small, η() can be
made arbitrarily small relative to g 0 (x0 ) and hence the error term η() can be
made arbitrarily smaller in magnitude than g 0 (x0 ).
In particular, if
g(x0 ) + g 0 (x0 ) > g(x0 )
Figure 1.3: A function with a local maximum. Note that the tangent plane to
the graph at the local maximum is horizontal.
In two variables, the geometric idea is the same: you drop a horizontal plane
down onto the graph and when it hits it is tangent (see Figure 1.3).
Theorem 10. If f : R² → R is a C¹-function with a local maximum or local
minimum at p = (x₀, y₀) then all directional derivatives vanish at p; in other
words, d_p f = 0.
Not all critical points are maxima and minima. Even in one variable we have
inflection points. The analogous critical points in two variables are the saddle
points, see Figure 1.4.
g(x₀ + ε) = g(x₀) + εg′(x₀) + (ε²/2)g″(x₀) + ε²η(ε)

for some function η which tends to zero as ε → 0.
Corollary 13 (Second derivative test). Suppose that g : R → R is a differ-
entiable function and that g has a critical point at x0 . Suppose moreover that
g 00 (x0 ) < 0. Then g has a local maximum at x0 .
g(x₀ + ε) = g(x₀) + εg′(x₀) + ½ε²g″(x₀) + ε²η(ε)
          = g(x₀) + ½ε²g″(x₀) + ε²η(ε)

(using g′(x₀) = 0 at the critical point).
We will now develop the formalism of second partial derivatives in two vari-
able calculus, prove the analogue of Taylor’s theorem and explain the second
derivative test for critical points in this context.
1.2.1 Definition
As before, to avoid analytical subtleties, we will assume that all first and second
partial derivatives exist and are continuous, a property which we will call C².
With this assumption we have the following theorem.
Theorem 14 (Symmetry of mixed partial derivatives). If f : R2 → R is a
C 2 -function then
∂²f/∂x∂y = ∂²f/∂y∂x.
Given v, the directional derivative v(f) : R² → R is itself a function, whose value
at p is vₚ(f). We can therefore differentiate it again in the v-direction, obtaining
vₚ(v(f)); we abbreviate this second directional derivative to v²(f).
Of course you could also differentiate v(f ) in the w-direction for a different
vector w. Taking v and w to be standard basis vectors, you end up with the
partial derivatives
∂²f/∂x², ∂²f/∂x∂y, ∂²f/∂y².
v²(f) = v₁² ∂²f/∂x² + 2v₁v₂ ∂²f/∂x∂y + v₂² ∂²f/∂y².
The expression
v₁² ∂²f/∂x² (p) + 2v₁v₂ ∂²f/∂x∂y (p) + v₂² ∂²f/∂y² (p)
is a quadratic form in the coefficients of v. We can write it as
(v₁ v₂) ( ∂²f/∂x²   ∂²f/∂x∂y ) ( v₁ )
        ( ∂²f/∂x∂y  ∂²f/∂y²  ) ( v₂ )
Proof. First notice that Hessp (f ) is a symmetric matrix. Therefore its eigen-
values are both real, say λ1 , λ2 ∈ R. Suppose they are both negative. We can
diagonalise Hessp (f ) by an orthogonal matrix U :
Uᵀ Hessₚ(f) U = ( λ₁  0  )
                ( 0   λ₂ )
f(p + εUe₁) = f(p) + d_p f(εUe₁) + ½ε² e₁ᵀ Uᵀ Hessₚ(f) U e₁ + ε²η(ε)
Since p is a critical point, the first derivative term vanishes and we get

f(p + εUe₁) = f(p) + ½ε² e₁ᵀ Uᵀ Hessₚ(f) U e₁ + ε²η(ε) = f(p) + ½ε²λ₁ + ε²η(ε),

and similarly for λ₂. The error term can be made very small when ε is very small,
so that f(p + εUe₁) < f(p). Similarly for e₂ and indeed for any v = v₁e₁ + v₂e₂
we get

f(p + v₁Ue₁ + v₂Ue₂) = f(p) + ½(λ₁v₁² + λ₂v₂²) + ⋯
which is < f (p) when |v| is very small because λ1 , λ2 are both negative. This
proves that p is a local maximum and the same proof would demonstrate a local
minimum in case both eigenvalues were positive.
det(Hessp (f )) > 0
f (x, y) = x2 − y 2 .
with eigenvalues ±2. We can see from Figure 1.4 that the function has a maximum
along the y-direction and a minimum along the x-direction.
See http://youtu.be/CNZOEPGKzcA.
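The eigenvalue classification is easy to illustrate numerically. The sketch below uses our own examples rather than ones from the notes: it applies the second derivative test to the Hessians of x² + y², −x² − y² and the saddle x² − y²:

```python
import numpy as np

# Classify a critical point from the eigenvalues of its Hessian.
def classify(hessian):
    eigenvalues = np.linalg.eigvalsh(hessian)  # real, since the Hessian is symmetric
    if np.all(eigenvalues > 0):
        return 'local minimum'
    if np.all(eigenvalues < 0):
        return 'local maximum'
    return 'saddle point'

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # Hessian of x^2+y^2: local minimum
print(classify(np.array([[-2.0, 0.0], [0.0, -2.0]])))  # Hessian of -x^2-y^2: local maximum
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # Hessian of x^2-y^2: saddle point
```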
In higher dimensions the Hessian is a larger matrix, but symmetry of partial
derivatives means that it is still symmetric. More complicated nondegenerate
critical points are possible, but because we can always diagonalise a symmetric
matrix, these critical points will always look like one of a finite list of models
depending on how many positive and negative eigenvalues the Hessian has.
These models are
Σ_{k=1}^{m} x_k² − Σ_{k=m+1}^{n} x_k²
f(p + v) = f(p) + ½(λ₁v₁² + ⋯ + λₙvₙ²) + ⋯,
so maxima are still points where all λk are negative and minima are still points
where all λk are positive.
Figure 1.5: Examples illustrating Theorem 23. In one case (left), C runs through
a critical point of f and inherits a critical point itself. In the other (right), the
dashed contours of f are tangent to C at the two marked points where h = f |C
has a minimum and a maximum.
tells you that this parametrisation exists provided that C can be written as
g −1 (0) for a function g which has no critical points on the level set g −1 (0).
Crucially, we don’t need to know the parametrisations to apply the theorem,
we just need to know they exist. Similarly, the level sets of f admit local
parametrisations near p provided p is not a critical point of f .
We start with a lemma to help us understand tangent lines to level sets:
Lemma 24. If f : R2 → R is a C 1 -function and f −1 (r) is one of its level
sets (assumed to contain no critical points) then the tangent line to f −1 (r)
at p ∈ f −1 (r) is precisely the set of vectors v such that dp f (v) = 0.
We know that
f ◦γ ≡r
because the image of γ is (by definition) inside the level set f −1 (r). Therefore
d0 (f ◦ γ) = 0. By the chain rule we get
0 = d0 (f ◦ γ) = dγ(0) f (γ̇(0)).
d0 (f ◦ γ) = dp f (γ̇(0)).
We take λ as a new variable (the Lagrange multiplier) and consider the function
H(x, y, λ) = f(x, y) − λg(x, y). Varying with respect to x and y, the critical
point condition is

d_p f − λ d_p g = 0,

i.e. d_p f and d_p g are proportional. Varying with respect to λ, the critical
point condition is
∂H/∂λ = −g(x, y) = 0.
In other words, critical points of H are triples (x, y, λ) such that
• g(x, y) = 0,
• (x, y) is a critical point of F = f|_{g⁻¹(0)}.
Example 25. How close does the circle x2 + y 2 = 1 get to the point (p, q)? We
know the answer: just rescale (p, q) to have length one and you’ll find the closest
point to (p, q) on the circle. But for practice, let’s work it out using Lagrange
multipliers. Let g(x, y) = x2 + y 2 − 1 and let f (x, y) = (x − p)2 + (y − q)2 , which
measures the (squared) distance to (p, q). In order to minimise f over g −1 (0),
we introduce a Lagrange multiplier λ and seek critical points of

H(x, y, λ) = (x − p)² + (y − q)² + λ(x² + y² − 1).

Setting the partial derivatives of H with respect to x, y and λ to zero gives

2(x − p) + 2λx = 0
2(y − q) + 2λy = 0
x² + y² = 1
which gives
x = p/(1 + λ),   y = q/(1 + λ),

so

p²/(1 + λ)² + q²/(1 + λ)² = 1
or

λ = √(p² + q²) − 1.
Substituting back gives

(x, y) = (1/√(p² + q²)) (p, q)

as we suspected.
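The same computation can be automated with sympy. The sketch below uses the sample point (p, q) = (3, 4), which is our own choice; it solves the critical point equations and recovers ±(p, q)/√(p² + q²):

```python
import sympy as sp

# Lagrange-multiplier computation from Example 25, for (p, q) = (3, 4).
# The critical points should be the nearest and farthest points of the circle,
# namely +/- (p, q)/sqrt(p^2 + q^2) = +/- (3/5, 4/5).
x, y, lam = sp.symbols('x y lambda', real=True)
p, q = 3, 4

H = (x - p)**2 + (y - q)**2 + lam*(x**2 + y**2 - 1)
equations = [sp.diff(H, var) for var in (x, y, lam)]
solutions = sp.solve(equations, [x, y, lam], dict=True)

print(sorted((s[x], s[y]) for s in solutions))
```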
Example 26. Suppose you have enough money to buy four square metres of
cardboard and you want to use it to make a lidless box. However, you’re allowed
to cut the cardboard into the desired shape before you buy it. How should you cut
it to maximise the volume of the box? Well you can control the three dimensions
of the box a, b, c and let’s suppose that the lid would have area bc. The total
surface area is therefore
2ab + 2ac + bc
and the volume is
f (a, b, c) = abc
Take g(a, b, c) = 2ab + 2ac + bc − 4 and introduce a Lagrange multiplier λ. We
need to find the critical points of

H(a, b, c, λ) = abc − λ(2ab + 2ac + bc − 4)

with respect to the four variables a, b, c, λ.
First, because the situation is symmetric with respect to b and c, we expect that
the minimising configuration will have b = c. To prove this, we differentiate
with respect to b and c and we see that, at a critical point,
0 = ∂H/∂b = ac − λ(2a + c)
0 = ∂H/∂c = ab − λ(2a + b)
This gives
(a − λ)c = 2λa = (a − λ)b,

so b = c unless a = λ. If a = λ then ac − λ(2a + c) = −2a² = 0, so a = 0, but we
are assuming that our box has some height. Therefore we can assume b = c.
The problem is now to find the critical points of

ab² − λ(4ab + b² − 4)
and the vanishing of partial derivatives with respect to a, b and λ gives
b² − 4λb = 0
2ab − 4λa − 2λb = 0
4ab + b² = 4
Since b ≠ 0, the first and third of these equations become

b = 4λ,   a = (4 − b²)/(4b) = (1 − 4λ²)/(4λ),
and hence the second becomes

1 = 12λ².

Hence λ = 1/(2√3) and the solution is

a = 1/√3,   b = c = 2/√3.
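We can verify this critical point symbolically; the sympy sketch below (not part of the notes) checks that a = 1/√3, b = c = 2/√3, λ = 1/(2√3) kills all four partial derivatives of H and uses exactly four square metres of cardboard:

```python
import sympy as sp

a, b, c, lam = sp.symbols('a b c lambda')
H = a*b*c - lam*(2*a*b + 2*a*c + b*c - 4)

# Claimed critical point: a = 1/sqrt(3), b = c = 2/sqrt(3), lambda = 1/(2*sqrt(3)).
point = {a: 1/sp.sqrt(3), b: 2/sp.sqrt(3), c: 2/sp.sqrt(3), lam: 1/(2*sp.sqrt(3))}

# All four partial derivatives of H vanish there:
for var in (a, b, c, lam):
    assert sp.simplify(sp.diff(H, var).subs(point)) == 0

# The box uses exactly four square metres of cardboard:
assert sp.simplify((2*a*b + 2*a*c + b*c).subs(point)) == 4

# The resulting volume, 4/(3*sqrt(3)):
print(sp.simplify((a*b*c).subs(point)))
```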
Chapter 2
Calculus of variations
I want to convince you that a straight line is the shortest path between two
points in the plane. Maybe you don’t need much convincing of this fact, but
did you ever really think about what this means? I’m saying that if you fix
two points and consider the space of all possible paths between them (which
is of course an incomprehensibly large, in fact infinite dimensional, space) you
can define a functional on this infinite-dimensional space of paths which takes
each path to its length and this functional has a global minimum at the path
given by a straight line. It suddenly seems like a more intimidating task to
really convince ourselves that this is true, and it seems more surprising that our
human intuition picked up on this without our noticing.
such that γ(0) = A and γ(1) = B. Alternatively, we can project γ(t) to the
two coordinate axes and think of it as a pair of functions γ(t) = (γ1 (t), γ2 (t))
satisfying γ1 (0) = A1 , γ1 (1) = B1 , etc. Now we don’t want our paths to jump
around discontinuously in the plane so we’ll certainly require the two functions
γ1 and γ2 to be continuous. In fact, even to define the length of a path we’re
going to need to require slightly more:
• we also want the derivatives γ̇₁ and γ̇₂ to never vanish simultaneously; in
other words, the vector γ̇ is never allowed to vanish.
When we try and define the length of a curved path, the natural thing to do
is to zoom in very closely to the curve and recall from Taylor’s theorem that
on very small scales the path is well-approximated by a line. We can imagine
taking finer and finer polygonal approximations of our path and defining the
length to be the limit of the lengths of the polygons. This is of course the same
as defining length by an integral along the curve and the infinitesimal length
along the curve (the infinitesimal arc-length) is just dt times the length of the
vector γ̇ where the dot denotes time-differentiation. Using Pythagoras, this is
L(γ) ≡ ∫₀¹ √(γ̇₁² + γ̇₂²) dt
Remark 27. Note that here we need the path to be differentiable to even write
down the integrand. We also need the integral to be well-defined; if the deriva-
tive of γ is continuous then so is the integrand and we know how to integrate
continuous functions on a closed interval (Riemann integration from Analysis
I/II).
Actually, the square root really causes us headaches, so I want to get rid of it
for the moment and define the action of a path to be the integral
α(γ) ≡ ∫₀¹ (γ̇₁² + γ̇₂²) dt.
We’ll prove
Proposition 28. If γ is a straight line between A and B and δ is another path
between A and B then
α(γ) ≤ α(δ)
with equality if and only if δ ≡ γ.
In other words, straight lines minimise action. We’ll see later how to deduce
the claim about length from this claim about action.
The key idea here was to integrate by parts (i.e. apply the fundamental theorem
of calculus) and use the fact that γ satisfies some highly restrictive equation (γ̇
constant) to get rid of the term linear in ε. We were also lucky enough that
the remaining term was strictly positive, which is what allows us to prove this
incredibly strong result about global minimisation of action by lines in Euclidean
space. It's not usually the case in such variational arguments that we can control
the higher-order terms in ε.
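The proposition can also be checked numerically. The sketch below (with our own choice of endpoints and perturbations) approximates the action by a Riemann sum and confirms that the straight line from (0, 0) to (1, 1) beats nearby paths with the same endpoints:

```python
import math

# Numerical sketch of Proposition 28: the straight line from A = (0, 0) to
# B = (1, 1) has smaller action than perturbed paths with the same endpoints.
def action(path, n=2000):
    # Riemann-sum approximation of the action integral of (dx/dt)^2 + (dy/dt)^2.
    total, h = 0.0, 1.0 / n
    for k in range(n):
        (x0, y0), (x1, y1) = path(k * h), path((k + 1) * h)
        total += ((x1 - x0) / h) ** 2 + ((y1 - y0) / h) ** 2
    return total * h

straight = lambda t: (t, t)
for eps in [0.5, 0.1, 0.01]:
    # Perturbation eps * (sin(pi*t), 0) vanishes at both endpoints.
    perturbed = lambda t, e=eps: (t + e * math.sin(math.pi * t), t)
    assert action(straight) < action(perturbed)

print(round(action(straight), 6))  # the action of the line is its squared length, 2
```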
Now we want to argue that straight lines are in fact shortest. If I give you a
differentiable path γ you can reparametrise it. To do this, take your favourite
differentiable, monotonically strictly increasing function φ : [0, 1] → [0, 1] and
form the composition δ = γ ◦ φ, in other words
δ(t) = γ(φ(t))
Since φ is strictly increasing it has a differentiable inverse φ⁻¹, and the chain
rule gives

(dφ⁻¹/dφ)(dφ/dt) = 1.

By the chain rule again,

dδ/dt = (dγ/dφ)(dφ/dt),

so substituting s = φ(t) into the length integral (using dφ/dt > 0) gives

L(δ) = ∫₀¹ |dγ/dφ| (dφ/dt) dt = L(γ):

length does not depend on the parametrisation.
Now it is clear that the action of a straight line is L(γ)2 . This plus the
previous proposition tells us that the length of a straight line is strictly less
than the length of any other path joining the points!
We get

α(γ + ε) = ∫₀¹ (γ̇₁ + ε̇₁)² + (γ̇₂ + ε̇₂)² dt
         = α(γ) + 2 ∫₀¹ (γ̇₁ε̇₁ + γ̇₂ε̇₂) dt + O(ε²)
i.e.

− ∫₀¹ γ̈ · ε dt = 0 for all ε.
will do!
Now consider the function ε : [0, 1] → Rⁿ given by

ε(t) = (f(t), 0, . . . , 0)
φ(a) = A, φ(b) = B.
φ + ε
L(p, q, r)
in other words the integrand depends only on t, φ and its first derivative. For
instance, the integrand in the length and area functionals above only depended
on φ̇. We call the function L the Lagrangian of our problem.
Now perturb φ by a variation ε and expand the integrand to first order in ε
using the chain rule:

L(t, φ(t) + ε(t), φ̇(t) + ε̇(t)) = L(t, φ(t), φ̇(t)) + ε(t) (∂L/∂q)(t, φ(t), φ̇(t))
                                + ε̇(t) (∂L/∂r)(t, φ(t), φ̇(t)) + O(ε²)
Integrating we get

F(φ + ε) = ∫ₐᵇ L(t, φ + ε, φ̇ + ε̇) dt
         = F(φ) + ∫ₐᵇ (∂L/∂q)(t, φ, φ̇) ε(t) dt + ∫ₐᵇ (∂L/∂r)(t, φ, φ̇) ε̇(t) dt + O(ε²)
         = F(φ) + ∫ₐᵇ [(∂L/∂q)(t, φ(t), φ̇(t)) − (d/dt)(∂L/∂r)(t, φ(t), φ̇(t))] ε(t) dt
                + [ε(t) (∂L/∂r)]_a^b + O(ε²)
where we have integrated by parts and picked up a boundary term. The boundary
term vanishes because ε(a) = ε(b) = 0. From this we see that the directional
derivative of F in the ε-direction at φ is
∫ₐᵇ [(∂L/∂q)(t, φ(t), φ̇(t)) − (d/dt)(∂L/∂r)(t, φ(t), φ̇(t))] ε(t) dt.
If φ is a critical point then all directional derivatives vanish, which means that
the above integral vanishes for all ε. By Theorem 31 this means that

(∂L/∂q)(t, φ(t), φ̇(t)) − (d/dt)(∂L/∂r)(t, φ(t), φ̇(t)) = 0
for the equation I’ve just written, and henceforth we’ll adopt this notation too.
The Euler-Lagrange equation is then
∂L/∂φ = (d/dt)(∂L/∂φ̇).    (2.1)
L − φ̇ (∂L/∂φ̇) = C.
We will often simply write L, ∂L/∂p, etc. instead of L(t, φ(t), φ̇(t)),
(∂L/∂p)(t, φ(t), φ̇(t)), etc.
The right-hand side is now clearly recognisable as φ̇(t) times the Euler-Lagrange
operator. Since we're assuming that φ satisfies the Euler-Lagrange equation, we
see that (d/dt)(L − φ̇(t) ∂L/∂r) = 0, so

L(t, φ(t), φ̇(t)) − φ̇(t) (∂L/∂r)(t, φ(t), φ̇(t)) = C

for some constant C, as required.
When you study classical mechanics in the Lagrangian/Hamiltonian setting
(Analytical Dynamics, next term) you will be able to understand this as a
statement of conservation of energy for an autonomous Hamiltonian system.
Remark 34. It seems strange that we can replace the second-order Euler-
Lagrange equation with its two boundary conditions φ(a) = A, φ(b) = B by
a first-order equation with two boundary conditions (usually we only have the
freedom to fix one boundary condition for a first-order equation). The point is
that the Beltrami identity has an undetermined constant, C, which we can fix
using the second boundary condition.
We often come across problems where we have to optimise over a space of vector-
valued functions. For instance, showing that a straight line is shortest involved
looking at two-component functions (γ1 (t), γ2 (t)). In this case there is an Euler-
Lagrange equation for each component. Suppose that the vector-valued function
is (f1 (t), . . . , fn (t)) and that
L(t, f1 (t), . . . , fn (t), f˙1 (t), . . . , f˙n (t))
is the Lagrangian. Then the Euler-Lagrange equations are
∂L/∂f₁ = (d/dt)(∂L/∂ḟ₁)
   ⋮
∂L/∂fₙ = (d/dt)(∂L/∂ḟₙ)
This is easy to see by taking variations individually. For example, the variation
(f₁ + ε₁, f₂, . . . , fₙ) gives rise to the first equation. In fact you could think of
this as like the “partial derivative” of the functional in the f1 -direction (very
loosely speaking).
Problem 35. If ∂L/∂t = 0 then a Beltrami identity holds:

L − Σ_{k=1}^{n} ḟₖ (∂L/∂ḟₖ) = C.
2.2.6 A caveat
When we do these basic variational calculations all we are doing is finding the
critical points of a particular functional. It is possible to do a second-derivative
test to figure out if these are local maxima or minima, however most of the
time we want to know when something is a global maximum or minimum, i.e.
it’s the biggest/smallest of all possibilities. The methods we are developing are
purely local and won’t tell us such global information, in the same way that
ordinary calculus won’t tell you global information for an ordinary function.
The theorems we prove therefore have the form: “Assume a global minimum
x of the functional F : X → R were to exist. Then x would be a straight
line/circle/catenary...”
2.3 Examples
Let φ : [a, b] → R be a function with φ(a) = A, φ(b) = B and φ(x) > 0 for all
x ∈ [a, b]. Consider the graph of this function inside R²
{(x, z) ∈ R2 : z = φ(x)}
and consider this R2 as the (x, z)-plane in R3 . Rotate the graph around the x-
axis and consider the surface it traces out. This is called a surface of revolution.
Lemma 37. The area of this surface of revolution can be written as an integral
∫ₐᵇ 2π φ(t) √(1 + φ̇(t)²) dt.
where dₖ is the distance along the surface of the frustum between the two
end circles. It is clear from the picture that Qₖ = φ(tₖ) and, since the radius
Rₖ is obtained by moving from (tₖ, φ(tₖ)) along a line of slope φ̇(tₖ) for time
ε, we know that Rₖ = Qₖ + εφ̇(tₖ). We can compute dₖ from Pythagoras's
theorem:

dₖ = √(ε² + ε²φ̇(tₖ)²) = ε√(1 + φ̇(tₖ)²),

so the area of the frustum Fₖ is

πε√(1 + φ̇(tₖ)²) (2φ(tₖ) + εφ̇(tₖ)).
which is

lim_{n→∞} Σ_{k=0}^{n−1} πε√(1 + φ̇(tₖ)²) (2φ(tₖ) + εφ̇(tₖ))
which goes to zero in the limit as n → ∞. Hence we can ignore this term.
The first term is actually just what we would get if we wrote out the definition
of the Riemann integral of the function
q
2πφ(t) 1 + φ̇(t)2
using the sequence of decompositions we began with. This proves the lemma.
This curve is called a catenary curve and that’s why its surface of revolution is
called a catenoid. We’ll meet the catenary curve again in Section 2.4.3 below.
i.e.

L(p, q, r) = q√(1 + r²).
Clearly, L is independent of p, so we have Beltrami's identity

L − φ̇ (∂L/∂φ̇) = C
for some constant C. Since ∂L/∂φ̇ = φφ̇/√(1 + φ̇²), this becomes

φ√(1 + φ̇²) − φφ̇²/√(1 + φ̇²) = C.
Multiplying by √(1 + φ̇²):

(1 + φ̇²)φ − φ̇²φ = C√(1 + φ̇²)

or

φ = C√(1 + φ̇²).
This rearranges to give

φ̇ = √(φ²/C² − 1).
Now

t + D ≡ ∫ dt = ∫ (φ̇ / √(φ²/C² − 1)) dt = C ∫ d(φ/C) / √(φ²/C² − 1)
      = C log(φ/C + √(φ²/C² − 1))
      = C cosh⁻¹(φ/C)

as required.
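We can check the endpoint of this computation with sympy: the catenary φ(t) = C cosh(t/C) satisfies φ = C√(1 + φ̇²), the first-order equation we derived from Beltrami's identity. Both sides are positive, so it is enough to compare their squares:

```python
import sympy as sp

# Check that the catenary phi(t) = C*cosh(t/C) satisfies phi = C*sqrt(1 + phidot^2).
t, C = sp.symbols('t C', positive=True)
phi = C * sp.cosh(t / C)
phidot = sp.diff(phi, t)  # sinh(t/C)

# phi^2 - C^2*(1 + phidot^2) = C^2*(cosh^2 - sinh^2 - 1) = 0.
assert sp.simplify(phi**2 - C**2 * (1 + phidot**2)) == 0
print('the catenary satisfies the Beltrami equation')
```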
Problem 39. Suppose that a = −1, b = 1. Show that the equal height case
A = B leads us to φ(t) = C cosh(t/C). For what values of A does this have
a solution? Given the interpretation of a minimal surface of revolution as a
soap-film, how do you think we should interpret this last answer?
The area of a frustum with end circles of radii Q and R is

πd(Q + R)
where d is the distance between the top and bottom circles as measured on
the surface of the frustum. We will demonstrate this formula here, though
the demonstration begins with an unjustified piece of geometric intuition
that I won’t bother to explain.
The point I will not justify is that you can cut the frustum and lay it out
flat. The result is a planar region: a segment of an annulus. You could
demonstrate this by writing down an explicit isometric (distance-preserving)
map between the cut frustum and the planar region, but I won’t do that.
Given this fact, we will compute the area of the planar region.
Let F be the fraction of the annulus which is subtended by the cut frustum
and let P denote the radius of the inner boundary of the annulus. We know
that the distance between the two boundary components of the annulus is d.
We also know that the length of the inner arc is 2πQ and the length of the
outer arc is 2πR. In other words,
2πP F = 2πQ
2π(P + d)F = 2πR.
This gives
F = (R − Q)/d

and

P = Qd/(R − Q).
Therefore the area of the annulus, being the difference of the areas of two
discs, is

π(P + d)² − πP² = πd(d + 2P)
and the area of the segment is a fraction F of this, giving the area of the
frustum as

π(R − Q)(d + 2P) = π(R − Q)(d + 2Qd/(R − Q))
                 = πd(R − Q + 2Q)
                 = πd(Q + R)
as claimed.
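The formula πd(Q + R) can also be cross-checked against the surface-of-revolution integral from Lemma 37; the sketch below (with sample radii of our own choosing) does this numerically for a straight-line profile:

```python
import math

# Compare pi*d*(Q + R) with the surface-of-revolution integral of
# 2*pi*phi(t)*sqrt(1 + phidot(t)^2) for the linear profile
# phi(t) = Q + (R - Q)*t on [0, 1] (axis length 1).
Q, R = 1.0, 2.5
slope = R - Q
d = math.sqrt(1 + slope**2)  # slant distance between the two end circles

# Midpoint-rule approximation of the area integral (exact for a linear profile).
n = 10000
area = math.fsum(2 * math.pi * (Q + slope * ((k + 0.5) / n)) * d / n
                 for k in range(n))

print(abs(area - math.pi * d * (Q + R)) < 1e-8)  # True
```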
Figure 2.3: The brachistochrone curve is the optimal shape of wire for a fric-
tionless bead moving under gravity to slide down between two points in order
to minimise the time taken.
From the Greek βράχιστος + χρόνος meaning "shortest" + "time", the brachistochrone problem is the problem of finding the shape of a wire joining two points
p and q such that a bead running on this wire under gravity (no friction) gets
from p to q in the shortest time (assuming it starts off at rest at p).
Let’s translate p to the origin and suppose that q is at (x0 , y0 ), with x0 > 0 (so
that p and q are not directly above and below one another) and with y0 < 0 (so
that the bead will really fall from p to q). The kinetic energy of a bead with
speed v is $\frac{1}{2}mv^2$ and the gravitational potential energy of a bead at (x, y) is mgy. Therefore conservation of energy (the bead starts at rest at p, where both vanish) gives
$$\tfrac{1}{2}mv^2 + mgy = 0,\qquad\text{so}\quad v = \sqrt{-2gy}.$$
We assume that the wire is the graph of some function y(x). Moreover, we’ll
assume that this function is invertible, that is we can express x as a function of
y and dx/dy = (dy/dx)−1 .
The time taken for the bead to move along the wire to q is
$$T = \int_p^q dt = \int_p^q \frac{ds}{\dot s} = \int_0^{x_0}\frac{\sqrt{1 + (y')^2}}{\sqrt{-2gy}}\,dx,$$
since $\dot s = v = \sqrt{-2gy}$ and $ds = \sqrt{1 + (y')^2}\,dx$. The integrand $L(y, y') = \sqrt{1 + (y')^2}/\sqrt{-2gy}$ has no explicit x-dependence, so the Beltrami identity $L - y'\,\partial L/\partial y' = C$ gives
$$\frac{1}{\sqrt{-2gy}\sqrt{1 + (y')^2}}\left((1 + (y')^2) - (y')^2\right) = C$$
or
$$y' = \sqrt{\frac{1}{-2C^2gy} - 1}.$$
Writing this in terms of x(y), we get
$$\frac{dx}{dy} = \frac{1}{\sqrt{\frac{1}{-2C^2gy} - 1}} = \frac{\sqrt{-y}}{\sqrt{A + y}},$$
where $A = 1/(2C^2g)$.
Remark 40. From Figure 2.3 it should be clear that our assumption (that x is expressible as a function of y) is a poor one. Of course, the end result is reassuringly multivalued (it's got a $\sin^{-1}$ and a $\sqrt{\cdot}$). So although we made an illegal assumption, the answer is actually correct. This is the beauty of the Euler-Lagrange equations: their locality. From a global assumption about a function (that it minimises a functional) you derive an equation for the local behaviour of that extremal function. In the same way it's kind of magical that light travels in such a way as to minimise the time taken during travel, by only obeying local laws of physics (without knowing beforehand what its endpoint is). If we were feeling careful we could make this brachistochrone argument more rigorous, but for the purposes of this course we'll just be satisfied that we got the right answer.
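As an illustration (not part of the argument above), one can numerically integrate the travel time dt = ds/v, with v = √(−2gy), along two candidate wires joining (0, 0) to (πR, −2R): the cycloid x = R(θ − sin θ), y = −R(1 − cos θ), which is the classical brachistochrone, and the straight line. The quadrature below is a crude midpoint rule, so this is only a sketch:

```python
import math

g = 9.81

def travel_time(path, n=20000):
    """Integrate dt = ds/v along a parametric path(u), u in [0, 1],
    for a bead starting at rest at path(0) (which must have height y = 0)."""
    T = 0.0
    for k in range(n):
        x0, y0 = path(k / n)
        x1, y1 = path((k + 1) / n)
        ds = math.hypot(x1 - x0, y1 - y0)
        v = math.sqrt(-2 * g * 0.5 * (y0 + y1))   # speed from energy conservation
        T += ds / v
    return T

R = 1.0
# Cycloid from (0, 0) to (pi*R, -2*R), parametrised by theta = pi*u
cycloid = lambda u: (R * (math.pi * u - math.sin(math.pi * u)),
                     -R * (1 - math.cos(math.pi * u)))
# Straight line between the same endpoints
line = lambda u: (math.pi * R * u, -2 * R * u)

t_cyc, t_line = travel_time(cycloid), travel_time(line)
print(t_cyc, t_line)   # the cycloid is faster
```

The cycloid time agrees with the closed form π√(R/g) and is visibly shorter than the straight-line time.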
The algorithm is the same as usual. You have a functional defined on some
space of functions φ(t). This functional is given by an integral
$$F(\phi) = \int_a^b L(t, \phi, \dot\phi)\,dt$$
and you want to find its extrema. However, you also want to impose a constraint
on the functions φ. Maybe you want to fix the total arc-length of the graph of
φ,
$$\int_a^b \sqrt{1 + \dot\phi^2}\,dt.$$
Notice that these constraints are given by integrals; let’s consider the general
constraint:
$$\int_a^b M(t, \phi, \dot\phi)\,dt = C.$$
Varying with respect to φ gives the usual Euler-Lagrange equation but where L
is replaced by L + λ(M − C):
$$\frac{\partial}{\partial\phi}\left(L + \lambda(M - C)\right) = \frac{d}{dt}\frac{\partial}{\partial\dot\phi}\left(L + \lambda(M - C)\right).$$
and we certainly don’t change the length of the path (the distance the particle
travels). For the rigorous justification of this, see Section 2.1.3. In particular, we
can parametrise proportionally to arc-length and define our new time coordinate
$$s(t) = \frac{2\pi}{K}\int_0^t \sqrt{\dot\gamma_1^2 + \dot\gamma_2^2}\,dt.$$
This satisfies
$$\frac{d}{ds}\int_0^s \sqrt{\dot\gamma_1(\sigma)^2 + \dot\gamma_2(\sigma)^2}\,d\sigma = \frac{K}{2\pi},$$
in other words
$$\sqrt{\left(\frac{d\gamma_1}{ds}\right)^2 + \left(\frac{d\gamma_2}{ds}\right)^2} = \frac{K}{2\pi}$$
or
$$\sqrt{\dot x^2 + \dot y^2} = \frac{K}{2\pi}.$$
The Euler-Lagrange equations simplify to
$$\dot y = -\frac{2\pi\lambda}{K}\ddot x,\qquad \dot x = \frac{2\pi\lambda}{K}\ddot y$$
(note that the other equation is automatically satisfied). We have
$$\frac{d^3y}{ds^3} = -\frac{K^2}{4\pi^2\lambda^2}\frac{dy}{ds}.$$
Coming from the Latin word “catena” meaning chain, the catenary is the curve
which follows a hanging chain. More formally, if (a, A) and (b, B) are points
in a vertical plane and there is an infinitely thin chain of uniform density ρ
(mass per unit length) whose endpoints are fixed at these points, hanging freely
under gravity and having fixed length K then the chain fits a curve called the
catenary, given by the graph of a function φ(t), t ∈ [a, b] which is the solution
to a variational problem.
The variational problem is this: minimise the total potential energy of the chain
subject to the condition that its length equals K. The potential energy of a mass
at height y is mgy so each “infinitesimal element” of our chain (x(t), y(t)) =
(t, f (t)) has:
• length equal to $\sqrt{dx^2 + dy^2} = \sqrt{1 + \dot f^2}\,dt$,
• mass equal to $\rho\sqrt{1 + \dot f^2}\,dt$,
• potential energy equal to $\rho g f\sqrt{1 + \dot f^2}\,dt$.
Integrating this over the whole interval [a, b] gives the total potential energy
$$\rho g\int_a^b f(t)\sqrt{1 + \dot f(t)^2}\,dt.$$
Since
$$\frac{\partial L}{\partial\varphi} = 0,$$
the Euler-Lagrange equation is
$$0 = \frac{\partial}{\partial x}\left(2\frac{\partial\varphi}{\partial x}\right) + \frac{\partial}{\partial y}\left(2\frac{\partial\varphi}{\partial y}\right) = 2\Delta\varphi,$$
where
$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.$$
Proof. I'll only give a sketch proof. Take a tangent plane to the graph of $\varphi$ at a point on the graph. Two of the vectors in this tangent plane are $(1, 0, \partial_x\varphi)$ and $(0, 1, \partial_y\varphi)$. The area of the parallelogram which they span is equal to the magnitude of their cross-product, $\sqrt{1 + (\partial_x\varphi)^2 + (\partial_y\varphi)^2}$, and this magnitude is precisely the integrand. Now approximate the surface by such parallelograms (living over squares of edge-length $\epsilon$ in $\mathbb{R}^2$) and take better and better approximations ($\epsilon \to 0$). This yields the integral.
Theorem 44. If ϕ is a C 2 -function with ϕ|C = f which minimises the surface
area of its graph amongst all such functions then ϕ satisfies
$$\left(1 + \left(\frac{\partial\varphi}{\partial y}\right)^2\right)\frac{\partial^2\varphi}{\partial x^2} + \left(1 + \left(\frac{\partial\varphi}{\partial x}\right)^2\right)\frac{\partial^2\varphi}{\partial y^2} = 2\frac{\partial\varphi}{\partial x}\frac{\partial\varphi}{\partial y}\frac{\partial^2\varphi}{\partial x\partial y}.$$
The proof is consigned to one of the problem sets: I could not deprive you of
the pleasure of this calculation.
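You can at least test the equation numerically. Scherk's surface φ(x, y) = log(cos y / cos x) is a classical solution of the minimal surface equation (1 + φ_y²)φ_xx + (1 + φ_x²)φ_yy = 2 φ_x φ_y φ_xy; the sketch below checks this by finite differences at an arbitrary sample point:

```python
import math

def scherk(x, y):
    # Scherk's first surface, defined where cos(x) and cos(y) are positive
    return math.log(math.cos(y) / math.cos(x))

def minimal_surface_residual(f, x, y, h=1e-4):
    # Central finite differences for the first and second partials
    fx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    # Left-hand side minus right-hand side of the minimal surface equation
    return (1 + fy**2) * fxx + (1 + fx**2) * fyy - 2 * fx * fy * fxy

print(minimal_surface_residual(scherk, 0.3, 0.4))   # ≈ 0
```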
Part II
Partial differential equations
Chapter 3
Some general theory
$$F\left(x, y, \phi(x, y), (\partial_x\phi)(x, y), (\partial_y\phi)(x, y), (\partial_x^2\phi)(x, y), \ldots\right) = 0$$
where the dots indicate a finite list, but that would be cumbersome.
For example,
$$F(\phi)(x, y) = \frac{\partial\phi}{\partial x}(x, y) + \phi(x, y)\frac{\partial\phi}{\partial y}(x, y) - xy = 0$$
or
$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1$$
or
$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}$$
are PDEs, where we have already started to omit the argument from each term.
On the other hand,
∂x φ(x, y) = ∂y φ(x + 1, y − 1)
and
∂x φ + ∂x2 φ + ∂x3 φ + · · · = 0
are not PDEs (one is a difference equation, one involves infinitely many deriva-
tives).
The highest order of derivatives which appears in the equation is called the order
of the equation. We will first consider first-order equations, then second-order
equations. But here are some more definitions:
Definition 45. • A PDE F (φ) = 0 is called linear if
F (λ1 φ1 + λ2 φ2 ) = λ1 F (φ1 ) + λ2 F (φ2 )
for all λ1 , λ2 ∈ R. In practical terms, this means that F (φ) is a sum of
terms of the form
A∂1 · · · ∂k φ
where A depends only on the variables x1 , . . . , xn and ∂1 . . . ∂k is some
string of partial derivatives. We also allow inhomogeneous linear equa-
tions, F (φ) = R where F is linear and R is a function of x1 , . . . , xn .
• If moreover, each coefficient A is just a constant, we say the equation has
constant coefficients.
• A PDE is called quasilinear if, in each term involving highest (say kth)
order partial derivatives ∂1 · · · ∂k φ, the coefficient doesn’t involve any other
kth derivatives. So
$$\frac{\partial\phi}{\partial x} + x\frac{\partial\phi}{\partial y}\frac{\partial\phi}{\partial y} = 1$$
is not quasilinear because it is first order but there is a term involving a
product of first derivatives. The Burgers inviscid fluid equation
$$\phi\frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial t} = 0$$
is quasilinear.
• If a PDE is not quasilinear then it is called fully nonlinear. For example
the eikonal equation
$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1,$$
important in geometric optics, or the Monge-Ampère equation
$$\frac{\partial^2\phi}{\partial x^2}\frac{\partial^2\phi}{\partial y^2} - \left(\frac{\partial^2\phi}{\partial x\partial y}\right)^2,$$
important in geometry.
Most of the PDEs we are going to talk about in this course are linear, with the
exception of a few quasilinear and fully nonlinear first order equations (like the
eikonal equation). We will start by talking about first-order equations because
there is a very well-developed theory for them. Then we will talk about some
specific second-order equations. I will take a moment to discuss the second-order
equations we’re going to study.
This classification may seem like something quite formal, but the three types
of equations really do display very different kinds of behaviour (and quasilinear
equations of elliptic type really do behave very like linear elliptic equations).
$$A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = 0.$$
The expression $A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y}$ is just the partial derivative of φ in the (A, B)-direction. Consider the family of all lines parallel to the vector (A, B), that is the lines
$$\{(x_0 + tA, y_0 + tB) : t \in \mathbb{R}\}.$$
Suppose we have a solution φ and we restrict it to one of these lines: let $u(t) = \phi(x_0 + tA, y_0 + tB)$. Then we know
$$\frac{du}{dt} = A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y},$$
so
$$\frac{du}{dt}(t) + Cu(t) = 0.$$
This is an ordinary differential equation equivalent to
$$\frac{d}{dt}\left(e^{Ct}u\right) = 0$$
and has solution
$$u(t) = Ke^{-Ct}.$$
Suppose that A ≠ 0. Let x₀ = 0. Then, as t and y₀ vary, the point (tA, y₀ + tB) traces out the whole of ℝ². Indeed, the point (x, y) corresponds to t = x/A, y₀ = y − xB/A. Therefore
$$\phi(x, y) = K(y - Bx/A)e^{-Cx/A}$$
for some function K of one variable.
This basic trick is the idea behind the method of characteristics: using the
original equation, find a system of curves (in this case straight lines) along
which the PDE reduces to an ODE which we can solve.
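To see the trick in action, here is a finite-difference check that a function of the form φ(x, y) = K(y − Bx/A) e^{−Cx/A}, obtained by propagating data along these lines, solves Aφ_x + Bφ_y + Cφ = 0. (The choices K = sin and the values of A, B, C below are arbitrary.)

```python
import math

A, B, C = 2.0, 1.0, 0.5
K = lambda s: math.sin(s)   # any differentiable function of one variable

def phi(x, y):
    # candidate solution built from the characteristic-line trick
    return K(y - B * x / A) * math.exp(-C * x / A)

h = 1e-5
x, y = 0.7, -0.3
phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
residual = A * phi_x + B * phi_y + C * phi(x, y)
print(residual)   # ≈ 0
```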
Consider a constant-coefficient second-order equation
$$A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} = 0$$
and suppose that Δ = B² − 4AC > 0, i.e. the equation is hyperbolic. Then the quadratic equation
$$At^2 + Bt + C = 0 \tag{3.2}$$
has two distinct real roots
$$t_\pm = \frac{-B \pm \sqrt{\Delta}}{2A},$$
so that $At^2 + Bt + C = A(t - t_-)(t - t_+)$. Define $s_\pm = y + xt_\pm$ so that
$$x = \frac{s_+ - s_-}{t_+ - t_-},\qquad y = \frac{t_+s_- - t_-s_+}{t_+ - t_-}.$$
Then
$$\frac{\partial}{\partial s_+} = \frac{\partial x}{\partial s_+}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_+}\frac{\partial}{\partial y} = \frac{1}{t_+ - t_-}\left(\frac{\partial}{\partial x} - t_-\frac{\partial}{\partial y}\right)$$
$$\frac{\partial}{\partial s_-} = \frac{\partial x}{\partial s_-}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_-}\frac{\partial}{\partial y} = -\frac{1}{t_+ - t_-}\left(\frac{\partial}{\partial x} - t_+\frac{\partial}{\partial y}\right).$$
If the original equation were instead parabolic then the quadratic polynomial
(3.2) would have two identical real roots t0 and the general solution would be
f (y +xt0 )+xg(y +xt0 ). To see this, use the new coordinates (u, v) = (y +t0 x, x)
instead of s± .
If the original equation were elliptic then (3.2) would have two non-real complex-conjugate roots z and z*. The general solution is once again
$$f(y + xz) + g(y + xz^*).$$
For example,
$$f(t) = g(t) = t$$
would give
$$y + xz + y + xz^* = 2y + 2x\,\mathrm{Re}(z)$$
as the solution.
Examples
or
$$\phi(x, y) = e^x + F(y + x) + G(y - 2x).$$
Now let us see how to solve for F and G when given initial conditions φ(x, 0) = M(x) and ∂φ/∂y(x, 0) = N(x).
Example 52. Say we require $\phi(x, 0) = x^2$ and $\partial\phi/\partial y(x, 0) = x^3$. Then
$$x^2 = e^x + F(x) + G(-2x)$$
$$x^3 = F'(x) + G'(-2x)$$
and now we have two simultaneous equations for F′ and G′. These give
$$3F'(x) + e^x = 2(x^3 + x)$$
or
$$F(x) = \frac{x^4}{6} + \frac{x^2}{3} - \frac{e^x}{3}.$$
Plugging back into the very first equation gives
$$G(-2x) = x^2 - e^x - F(x)$$
or
$$G(-2x) = \frac{2x^2}{3} - \frac{x^4}{6} - \frac{2e^x}{3}.$$
Substituting z = −2x gives
$$G(z) = \frac{z^2}{6} - \frac{z^4}{96} - \frac{2e^{-z/2}}{3}.$$
Hence
$$\phi(x, y) = e^x + \frac{(x + y)^4}{6} + \frac{(x + y)^2}{3} - \frac{e^{x+y}}{3} + \frac{(y - 2x)^2}{6} - \frac{(y - 2x)^4}{96} - \frac{2e^{-(y-2x)/2}}{3}.$$
Chapter 4
First order PDE
As usual, I will only deal with two-variable equations, mainly for notational
convenience.
We have already seen how to solve first order linear homogeneous PDE with
constant coefficients in Section 3.3.1. Let’s just quickly extend this to the inho-
mogeneous case. The equation we want to solve is
$$A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = D(x, y).$$
The trick we employed before works again: we restrict attention to the line (At, y₀ + Bt), along which the equation for u(t) = φ(At, y₀ + Bt) reduces to an ODE
$$\frac{du}{dt} + Cu = D(At, y_0 + Bt)$$
or
$$\frac{d}{dt}\left(e^{Ct}u\right) = e^{Ct}D(At, y_0 + Bt),$$
and the solution consists of a particular integral
$$u(t) = e^{-Ct}\int_0^t e^{Cs}D(As, y_0 + Bs)\,ds$$
plus a complementary function $K(y_0)e^{-Ct}$.
We’ll sort out the problem of actually finding the integral curves momentarily.
Let’s assume for the moment that we have found them. If γ(t) is a parametri-
sation of an integral curve then
$$\frac{d}{dt}\phi(\gamma(t)) = d_{\gamma(t)}\phi(\dot\gamma(t)) = d_{\gamma(t)}\phi(v),$$
which is just the directional derivative of φ in the v-direction at γ(t), in other words
$$A(\gamma(t))\frac{\partial\phi}{\partial x} + B(\gamma(t))\frac{\partial\phi}{\partial y}.$$
Therefore, setting f (t) = φ(γ(t)), we have
$$\frac{df}{dt}(t) = D(\gamma(t)) - C(\gamma(t))f(t)$$
$$\frac{d\gamma_1}{dt} = A(\gamma_1(t), \gamma_2(t))$$
$$\frac{d\gamma_2}{dt} = B(\gamma_1(t), \gamma_2(t))$$
which is a system of two ODEs for two unknowns γ1 , γ2 .
Therefore in order to solve the PDE, we have to solve a pair of coupled ODEs
for γ and then a further ODE to find f along γ. This latter ODE introduces
a ‘constant of integration’, but this ‘constant’ is allowed to vary as you move
from one integral curve γ to another.
Definition 56 (Characteristics). The curves γ(t) are called the characteristics
of our equation. Note that when we solve the system of ODEs for γ we get two
constants of integration and hence a two-parameter family of curves. One of
these parameters can always be fixed because it will correspond to reparametrising
the characteristics.
Let’s do an example.
Example 57. Consider
$$\frac{\partial\phi}{\partial x} + 2x\frac{\partial\phi}{\partial y} = 1.$$
We have A(x, y) = 1, B(x, y) = 2x and hence
$$\frac{d\gamma_1}{dt} = 1,\qquad \frac{d\gamma_2}{dt} = 2\gamma_1.$$
Solving (and fixing the reparametrisation constant) gives the characteristics γ(t) = (t, t² + c). Along such a curve, df/dt = 1, so f(t) = t + K(c) and hence
$$\phi(x, y) = x + K(y - x^2)$$
for some function K.
$$y\frac{\partial\phi}{\partial x} + x\frac{\partial\phi}{\partial y} + xy\phi = 0.$$
$$\frac{df}{dt} = -xyf = -\frac{1}{4}N\left(e^t + e^{-t}\right)N\left(e^t - e^{-t}\right)f(t) = -\frac{1}{4}N^2\left(e^{2t} - e^{-2t}\right)f(t),$$
which has solution
$$f(t) = K(N)\exp\left(-\frac{1}{8}N^2\left(e^{2t} + e^{-2t}\right)\right).$$
This is slightly harder to deal with: instead of using a system of two coupled
ODEs to find the characteristics and then solving the final ODE to work out f
along the characteristics, we end up with a system of three coupled ODEs for
γ1 , γ2 , f all at the same time and we have no obvious way of separating out the
three.
Suppose that φ is a solution to the equation and that (x(t), y(t), f (t)) is the
restriction of this solution to a curve (x(t), y(t)). The system of ODE is
$$\frac{dx}{dt} = A(x(t), y(t), f(t)) \tag{4.2}$$
$$\frac{dy}{dt} = B(x(t), y(t), f(t)) \tag{4.3}$$
$$\frac{df}{dt} = -C(x(t), y(t), f(t)). \tag{4.4}$$
You can see that we are unlikely to be able to solve for x and y first because f
occurs in all three equations.
Suppose this system of equations is satisfied by some family of curves (xs (t), ys (t), zs (t)),
that is: for each s the curve $t \mapsto (x_s(t), y_s(t), z_s(t))$ satisfies (4.2)-(4.4). This family of curves traces out some surface in (x, y, z)-space. Suppose that this surface is the graph $\{z = \phi(x, y)\}$ of a function φ. A simple application of the chain rule shows that φ then satisfies our equation, since
$$\frac{dz}{dt} = \frac{dx}{dt}\frac{\partial\phi}{\partial x} + \frac{dy}{dt}\frac{\partial\phi}{\partial y} = A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} = -C$$
as required.
Definition 59. The vector field (A, B, −C) in (x, y, z)-space is called the char-
acteristic vector field of the quasilinear equation (4.1). The integral curves of
(A, B, −C) are called characteristic curves and the projections of the character-
istic curves to the xy-plane are called the characteristic projections.
In this language, what we have proved is that (a) the graph of a solution to
(4.1) is a union of characteristic curves, and that (b) a union of characteristic
curves traces out a surface which, when it is a graph, is the graph of a solution
to (4.1).
$$A\phi_x + B\phi_y + C = 0$$
such that φ|_χ = f. This initial condition is called the Cauchy data and the solution is called its development. The curve χ is called the Cauchy hypersurface.
$$\frac{\partial\phi}{\partial x} + \phi\frac{\partial\phi}{\partial y} = 0$$
$$\dot x = 1,\qquad \dot y = z,\qquad \dot z = 0$$
$$x = t + D,\qquad y = Et + F,\qquad z = E$$
4.3.4 Caustics
The problem with this method is that it produces solution surfaces which are
parametrised by (s, t) instead of by (x, y)! It is quite possible that the surface
S ⊂ R3 traced out as (s, t) vary is not the graph of any function z = φ(x, y).
Indeed, we can see in Figures 4.1 and 4.2 that the Cauchy development is usually
not a graph. Let’s see what goes wrong in the previous example if we try and
express φ as a function of (x, y).
$$(s, t) \mapsto (t, st + s, s)$$
$$(s, t) \mapsto (t, s^2t + s, s^2)$$
so $y = zx + \sqrt{z}$. If $u = \sqrt{z}$ then $xu^2 + u - y = 0$ and so
$$z = u^2 = \left(\frac{-1 \pm \sqrt{1 + 4xy}}{2x}\right)^2.$$
This doesn't make sense when 1 + 4xy < 0 and, even when 1 + 4xy > 0, gives two possible values for z = φ(x, y). This manifests itself in our picture: all the characteristic projections live in the region 1 + 4xy ≥ 0 and they are all tangent to the curve 1 + 4xy = 0 (in red in the figure). The solution surface simply doesn't extend beyond the curve 1 + 4xy = 0.
The red curve in Figure 4.2 is the envelope of the characteristic curves, in other
words it is the curve to which all of the characteristics are somewhere tangent.
In the case of the eikonal equation of geometric optics, where characteristics
are light trajectories, this envelope shows up as a very bright curve called the
caustic.
Remark 61. The equation from our example is known as Burgers’s equation
(without viscosity) and is usually written
$$\frac{\partial\phi}{\partial t} + \phi\frac{\partial\phi}{\partial x} = 0.$$
The interpretation is that the x-axis is filled with a fluid and the velocity of fluid
particles at x at time t is φ(x, t). The phenomenon of a single-valued solution
becoming multiple-valued is known as shock-wave formation and corresponds to
the fluid “catching up with itself ” so that fluid particles with different velocities
coexist at the same (x, t)-coordinate.
I will quickly explain why the caustic is relevant, then we will proceed to com-
pute it in some examples. Remember that the graph of a function φ : R2 → R
is just the surface
(x, y) 7→ (x, y, φ(x, y)).
Our solution surfaces have the form $(s, t) \mapsto (u(s, t), v(s, t), w(s, t))$, and the projection to the (u, v)-plane, which forgets the value of w, need not be one-to-one. Along the fold, the solution surface admits vertical tangencies, that is vectors tangent to the solution surface which point vertically upward in the w-direction. So we can detect the bad behaviour of our solution by looking for vertical tangencies to our solution surface. It turns out that this is precisely the problem of computing the determinant in the definition of the caustic.
Lemma 63. If the solution surface (s, t) 7→ (u(s, t), v(s, t), w(s, t)) has a
vertical tangency at some point (s0 , t0 ) then (u(s, t), v(s, t)) is in the caustic.
Proof. The vectors
$$X = (\partial_s u(0, 0), \partial_s v(0, 0), \partial_s w(0, 0))\quad\text{and}\quad Y = (\partial_t u(0, 0), \partial_t v(0, 0), \partial_t w(0, 0))$$
span the tangent space of the solution surface at that point. In other words, all tangent vectors are linear combinations AX + BY of these two. If one of the tangent directions AX + BY is vertical then when it is projected into the (u, v)-plane it vanishes, and so the corresponding linear combination vanishes: Aπ(X) + Bπ(Y) = 0 (where π is the projection π(u, v, w) = (u, v)). The vectors π(X) and π(Y) are therefore linearly dependent. The vectors π(X) and π(Y) are the columns of the matrix
$$\begin{pmatrix}\partial_s u & \partial_t u\\ \partial_s v & \partial_t v\end{pmatrix}(p)$$
appearing in the definition of the caustic (where p is the point (0, 0)). The determinant of a matrix vanishes if and only if its columns are linearly dependent, which proves the lemma.
In case (a) we had (u(s, t), v(s, t)) = (t, st + s), so the determinant of the matrix of partial derivatives is
$$\det\begin{pmatrix}0 & 1\\ t + 1 & s\end{pmatrix} = -t - 1,$$
which vanishes when t = −1. When t = −1, we have (u(s, −1), v(s, −1)) = (−1, 0), so the caustic is a single point: the point we discovered earlier as the intersection of all the characteristic projections.
In case (b) we had (u(s, t), v(s, t)) = (t, s²t + s) and so the determinant of the matrix of partial derivatives is
$$\det\begin{pmatrix}0 & 1\\ 2st + 1 & s^2\end{pmatrix} = -2st - 1,$$
which vanishes when 2st = −1. This means that (x, y) is on the caustic if it has the form (u(−1/2t, t), v(−1/2t, t)) = (t, −1/(4t)), so the caustic is the curve {4xy + 1 = 0}.
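These determinant computations are easy to reproduce numerically. The sketch below recomputes the Jacobian determinant for case (b), (u, v) = (t, s²t + s), by central differences and confirms that it vanishes exactly on the caustic {2st = −1}:

```python
# Case (b): (u, v, w) = (t, s^2 t + s, s^2); expected determinant -2st - 1
def u(s, t): return t
def v(s, t): return s * s * t + s

def jac_det(s, t, h=1e-6):
    us = (u(s + h, t) - u(s - h, t)) / (2 * h)
    ut = (u(s, t + h) - u(s, t - h)) / (2 * h)
    vs = (v(s + h, t) - v(s - h, t)) / (2 * h)
    vt = (v(s, t + h) - v(s, t - h)) / (2 * h)
    return us * vt - ut * vs

s0 = 0.7
t0 = -1 / (2 * s0)         # a point with 2*s0*t0 = -1, i.e. on the caustic
print(jac_det(s0, t0))     # ≈ 0
print(jac_det(1.0, 1.0))   # ≈ -2*1*1 - 1 = -3
```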
Example 65. Consider the PDE
$$-\sin\phi\,\frac{\partial\phi}{\partial x} + \cos\phi\,\frac{\partial\phi}{\partial y} = 1,$$
for which the characteristic vector field is (−sin z, cos z, 1). The Cauchy hypersurface is χ(s) = (s, 0) and the Cauchy data is φ(s, 0) = 0. The characteristic curve (x₀ + cos t, y₀ + sin t, t) intersects (s, 0, 0) at t = 0 if y₀ = 0 and x₀ = s − 1, so the solution surface is parametrised by
$$(s, t) \mapsto (s - 1 + \cos t, \sin t, t).$$
Clearly the solution is $\phi(x, y) = \sin^{-1}(y)$, but this is not single-valued; indeed there are infinitely many z such that sin z = y. Moreover, the solution goes no further than y = 1; this corresponds to the fact that the helices start overlapping when t = π/2 (see Figure 4.3: the characteristics are segments of circles with centre on the x-axis).
The matrix of first derivatives of the projection of the solution surface to the (x, y)-plane is
$$\begin{pmatrix}1 & -\sin t\\ 0 & \cos t\end{pmatrix},$$
which has determinant cos t. This vanishes precisely along t = π/2 as suspected, so the caustic is the line y = sin(π/2) = 1.
for all (x, y) ∈ ℝ². Now let γ(t) = (x(t), y(t)) be a curve and restrict φ to γ to obtain a function u(t) as usual. Let G(t) denote $G(x(t), y(t), u(t), p(t), q(t))$, where $p(t) = \frac{\partial\phi}{\partial x}(\gamma(t))$ and $q(t) = \frac{\partial\phi}{\partial y}(\gamma(t))$. Then
$$\frac{dG}{dt} = \frac{\partial G}{\partial x}\dot x + \frac{\partial G}{\partial y}\dot y + \frac{\partial G}{\partial u}\dot u + \frac{\partial G}{\partial p}\dot p + \frac{\partial G}{\partial q}\dot q$$
and since $\dot u = \dot x\frac{\partial\phi}{\partial x} + \dot y\frac{\partial\phi}{\partial y} = \dot xp + \dot yq$ this becomes
$$\frac{dG}{dt} = \dot x\left(\frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u}\right) + \dot y\left(\frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u}\right) + \dot p\frac{\partial G}{\partial p} + \dot q\frac{\partial G}{\partial q}.$$
This suggests a system of five coupled ODEs for the five quantities (x(t), y(t), u(t), p(t), q(t)):
$$\frac{dx}{dt} = \frac{\partial G}{\partial p},\qquad \dot p = -\left(\frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u}\right)$$
$$\frac{dy}{dt} = \frac{\partial G}{\partial q},\qquad \dot q = -\left(\frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u}\right)$$
$$\dot u = p\frac{\partial G}{\partial p} + q\frac{\partial G}{\partial q}.$$
As usual, if we integrate this system of ODEs we will obtain a curve. Taking a
one-parameter family of these integral curves gives a surface in (x, y, u, p, q)-
space and when we project to (x, y, u)-space we obtain a surface which, wher-
ever it is a graph, is the graph of a solution u = φ(x, y).
Example 66. Let us consider the eikonal equation (in units where the speed
of light is 1)
$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1$$
(so $G(x, y, u, p, q) = p^2 + q^2 - 1$), which describes the time φ taken by light emitted normally by some curve C ⊂ ℝ² to reach a point (x, y) ∈ ℝ². To see that this description is valid, let's consider the corresponding system of ODEs:
$$\frac{dx}{dt} = 2p,\qquad \dot p = 0$$
$$\frac{dy}{dt} = 2q,\qquad \dot q = 0$$
$$\dot u = 2(p^2 + q^2) = 2.$$
Start from the curve C and choose the initial condition φ|_C = 0. We see that this determines p and q along C: namely, (p, q) must be (plus or minus) the unit normal vector to the curve C, because p² + q² = 1 (giving unit length) and because $(p, q) = \left(\frac{\partial\phi}{\partial x}, \frac{\partial\phi}{\partial y}\right)$ and, by our choice of initial condition, the directional derivative of φ along C vanishes, so (p, q) is normal to C. Now along the integral curves of the ODE, p and q do not change and hence x and y follow the normal line (with speed 2) and the solution to u̇ = 2 is just u(t) = 2t. If we followed the normal with speed 1, we would get u(t) = t.
This is precisely the statement that the solution of the eikonal equation is the
time taken by light to reach (x, y) from C in a normal direction. The char-
acteristics are straight lines normal to C. Solutions of the eikonal equation
can be very beautiful. Figure 4.4 shows some of the singularities developed by level sets of φ corresponding to the initial condition $C = \left\{\frac{x^2}{4} + y^2 = 1\right\}$ (that is, the equidistants of an ellipse).
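For a circle instead of an ellipse the equidistants stay smooth, and the solution is simply the distance to the circle. A finite-difference sketch checking that φ(x, y) = √(x² + y²) − 1, the travel time from C = {x² + y² = 1}, satisfies the eikonal equation:

```python
import math

def phi(x, y):
    # time for unit-speed light leaving the unit circle normally to reach (x, y)
    return math.hypot(x, y) - 1.0

h = 1e-6
x, y = 1.3, -0.6
px = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
py = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
print(px**2 + py**2)   # ≈ 1
```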
Chapter 5
The heat equation and Laplace's equation
$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}.$$
φ(x, 0) = F (x)
for some function F , and then seek a solution on [0, L] × [0, ∞).
There are various options for the boundary conditions in the x-direction:
• You can specify φ(0, t) and φ(L, t). This is called the Dirichlet problem.
• Instead of specifying φ along one of these boundaries you can specify $\frac{\partial\phi}{\partial x}(0, t)$ or $\frac{\partial\phi}{\partial x}(L, t)$ (in other words the directional derivative of φ in a direction normal to the boundary). This is called a Neumann boundary condition. You could also impose $\frac{\partial\phi}{\partial t}(x, 0)$, i.e. a Neumann initial condition! But let's not do that.
• Mixtures of the two are perfectly OK.
I was always confused by Neumann boundary conditions. What's the point? With Dirichlet it's clear what it means, like with a soap film filling in a fixed boundary shape. Here is a physical situation (heat) in which both are clearly relevant.
• Dirichlet: The boundary points of our interval are kept at fixed temperatures T₁, T₂, maybe by immersing the region in some vast bath of refrigerating liquid. Since our interval is one-dimensional then you've got to imagine immersing it in a one-dimensional bath, so that only the boundary is being refrigerated...
• Neumann: The boundary of our interval is completely insulated from the
outside world. Insulation means that heat cannot flow into or out of the
boundary. Since heat travels down a temperature gradient, this means
that the directional derivative in the normal direction to the boundary
must vanish.
5.1.2 Strategy
We now want to solve the heat equation with given boundary and initial con-
ditions. Fourier solved this problem in his treatise on heat in 1822. His idea
was:
• Find lots of solutions to the heat equation. Make sure they fit the bound-
ary conditions. Maybe they don’t fit your initial condition, but don’t
worry.
• Take linear superpositions of (possibly infinitely many of) these solutions
in such a way as to fit the resulting function to the initial condition.
This works because the heat equation is linear, so if φ₁, φ₂, ... are solutions and A₁, A₂, ... are real numbers then $\sum_{k=1}^\infty A_k\phi_k$ is also a solution (provided that the partial sums converge in an appropriate way (uniformly) so that derivatives commute with $\sum_{k=1}^\infty$).
$$XT' = X''T.$$
Hence
$$\frac{X''}{X} = \frac{T'}{T} = \lambda$$
for some constant λ (it has to be constant because X″/X depends only on x and T′/T depends only on t). Multiplying up gives the two equations I mentioned. This has the disadvantage of dividing by XT, which could potentially vanish. In future we will not be made queasy by this division and, from now on, that's how we'll derive equations for separation of variables.
Solving for T, we get
$$T(t) = Ke^{\lambda t}.$$
the new Dirichlet conditions θ(0, t) = 0 = θ(L, t). In other words, we can assume
without loss of generality that S0 = S1 = 0.
If λ > 0 then $X(x) = A\cosh(x\sqrt\lambda) + B\sinh(x\sqrt\lambda)$ and so the condition X(0) = 0 means that A = 0. Also, the condition X(L) = 0 gives $B\sinh(L\sqrt\lambda) = 0$, but sinh only vanishes at 0, hence B = 0 also. Therefore we can ignore separated solutions with λ > 0 (as we had hoped) because we cannot fit them to our boundary conditions.
If λ = −p² < 0 then X(x) = A cos(px) + B sin(px) and so the condition X(0) = 0 means that A = 0. Also, the condition X(L) = 0 gives B sin(pL) = 0, and now sin vanishes at nπ, n ∈ {1, 2, 3, ...}. This means we can have nontrivial solutions (i.e. with B ≠ 0) whenever pL = nπ. Thus we have separated solutions
$$\phi_n = e^{-(n\pi/L)^2t}\sin(n\pi x/L)$$
for n = 1, 2, 3, ....
$$\phi_0(x, t) = \frac{S_1 - S_0}{\pi}x + S_0 = 1 - \frac{2x}{\pi}$$
and defining θ(x, t) = φ(x, t) − φ₀(x, t). Then θ solves the Dirichlet conditions
which is
$$\frac{2}{\pi}\frac{(-1)^n + 1}{n(n^2 - 1)}$$
so that
$$\phi(x, t) = 1 - \frac{2x}{\pi} + \frac{2}{\pi}\sum_{n=1}^\infty \frac{(-1)^n + 1}{n(n^2 - 1)}e^{-n^2t}\sin(nx).$$
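This series can be sanity-checked numerically. One caveat: as printed, the n = 1 coefficient is an indeterminate 0/0, so the truncation below starts the sum at n = 2 (an assumption; for odd n the numerator (−1)ⁿ + 1 vanishes anyway). The check confirms the boundary values φ(0, t) = 1, φ(π, t) = −1 and the heat equation itself:

```python
import math

def phi(x, t, nmax=60):
    # steady part plus truncated Fourier tail (sum started at n = 2, see above)
    total = 1 - 2 * x / math.pi
    for n in range(2, nmax + 1):
        coeff = (2 / math.pi) * ((-1)**n + 1) / (n * (n * n - 1))
        total += coeff * math.exp(-n * n * t) * math.sin(n * x)
    return total

# Boundary values, independent of t
print(phi(0.0, 0.1), phi(math.pi, 0.1))   # 1 and -1

# phi_t and phi_xx agree, since each summand satisfies the heat equation
h = 1e-3
x, t = 1.0, 0.2
phi_t  = (phi(x, t + h) - phi(x, t - h)) / (2 * h)
phi_xx = (phi(x + h, t) - 2 * phi(x, t) + phi(x - h, t)) / h**2
print(phi_t - phi_xx)   # ≈ 0
```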
Recall (we only wrote it out for n = 2) that this arises as the Euler-Lagrange
equation for functions defined on a region U ⊂ Rn with specified boundary
values, extremising the functional
$$\int_U \sum_{k=1}^n\left(\frac{\partial\phi}{\partial x_k}\right)^2 dx_1\cdots dx_n,$$
known as the Dirichlet energy. Functions which are annihilated by ∆ are called
harmonic functions. The problem of finding a harmonic function on a given
region with specified boundary values is called Dirichlet’s problem. The equation
∆φ = 0 satisfied by a harmonic function is called Laplace’s equation; it is a linear
elliptic second-order equation with constant coefficients.
Example 68. Consider harmonic functions on the interval [0, 1]. Since these
satisfy d2 f /dx2 = 0 they are just functions of the form f (x) = ax + b, whose
graphs are straight line segments from (0, b) to (1, a + b).
for harmonic functions on an interval. We won’t prove this, but note that it
bears a marked (and non-coincidental) resemblance to Cauchy’s integral formula
in complex analysis! This formula has the consequence that the maxima and
minima of a harmonic function are achieved along the boundary of its domain.
Corollary 70 (Maximum principle). If f : U → R is a harmonic function on
a domain U ⊂ ℝ² then the maximum and minimum of f are achieved along the boundary ∂U.
Proof. Suppose that the maximum is achieved somewhere (say x) in the interior
and that the function is not constant. Then there exists a point y such that
f (y) < f (x). Let U1 , . . . , Un be a sequence of discs such that x is at the centre
of U1 , y is on the boundary of Un and the boundary of Uk passes through the
centre of Uk−1 (we are making some topological assumptions about the domain
U , like the fact that any two points in U can be connected by such a sequence
of discs - that’s a fairly mild condition).
Now let zk be the point at the centre of Uk (so x = z1 ), let zn+1 = y, and let
rk be the radius of Uk . Since f (p) ≤ f (x) for all p ∈ U we have
$$\frac{\int_{\partial U_1} f(x + r_1e^{i\theta})\,r_1\,d\theta}{\int_{\partial U_1} r_1\,d\theta} \le \frac{\int_{\partial U_1} f(x)\,r_1\,d\theta}{\int_{\partial U_1} r_1\,d\theta} = f(x),$$
with equality if and only if f(p) = f(x) for all p ∈ U₁. By the mean value theorem, equality holds, so f(z₁) = f(x). Applying the same argument again and again we eventually get to f(y) = f(x), which is a contradiction.
Corollary 71 (Uniqueness of solutions to Dirichlet problem). Suppose that
φ1 , φ2 : U → R are harmonic functions on a domain U ⊂ R2 and φ1 |∂U =
φ2 |∂U . Then φ1 ≡ φ2 .
$$\frac{\partial\phi}{\partial t} = \Delta\phi.$$
We see that harmonic functions correspond to steady temperature distributions,
that is distributions which don’t change over time. From the variational inter-
pretation of Laplace’s equation, these are the distributions which minimise the
total temperature gradient (unsurprising, given that heat flows down a tem-
perature gradient (from hot to cold) and tries to homogenise the temperature,
subject to the boundary conditions).
Separation of variables
$$\frac{X''}{X} = -\frac{Y''}{Y}.$$
The left-hand side depends only on x, the right-hand side only on y and by
equality of the two sides, both expressions must be constant, equal to λ. This
gives the two equations X 00 = λX, Y 00 = −λY .
There are now three cases:
• λ = 0: In this case X 00 = 0 so X = Ax + B, Y 00 = 0 so Y = Cy + D.
• λ > 0: In this case Y is a simple harmonic oscillator so
$$Y = C\cos(y\sqrt\lambda) + D\sin(y\sqrt\lambda)$$
and
$$X = A\cosh(x\sqrt\lambda) + B\sinh(x\sqrt\lambda).$$
• λ < 0: In this case X is a simple harmonic oscillator, with $Y = C\cosh(y\sqrt{-\lambda}) + D\sinh(y\sqrt{-\lambda})$
and
$$X = A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda}).$$
Boundary conditions
We have various possible boundary conditions we can apply along the edges of
our rectangle.
• Dirichlet boundary conditions: where we fix the values of φ(x, 0),
φ(x, b), φ(−a, y) or φ(a, y).
• Neumann boundary conditions: where we fix the values of $\frac{\partial\phi}{\partial x}$ along a vertical boundary or $\frac{\partial\phi}{\partial y}$ along a horizontal boundary. More generally, we would fix the directional derivative of φ in a direction normal to the boundary.
We can either have pure Dirichlet conditions, pure Neumann conditions or a
mixture (called mixed boundary conditions).
Now let’s examine the various solutions we found by separation of variables and
see what kinds of boundary condition we can satisfy. We will concentrate on
Dirichlet. We denote by U the rectangular region [0, a] × [0, b] and by ∂U its
boundary.
Fitting to corners
We impose
$$\phi_0(x, y) = A + Bx + Cy + Dxy$$
(a harmonic function) and fit it to this corner data. The four corner values give us four equations for A, B, C, D, which we can solve. Then θ = φ − φ₀ is a solution to Laplace's equation which solves some new boundary conditions, but crucially θ = 0 at all the corners of U.
In other words, without loss of generality, our solution vanishes at the corners of U.
Patching solutions
We've seen that we can modify F1, F2, F3 and F4 to F1new(x) = F1(x) − φ0(x, 0), F2new(x) = F2(x) − φ0(x, b), F3new(y) = F3(y) − φ0(0, y) and F4new(y) = F4(y) − φ0(a, y), which vanish at the corners. So let's split our problem into four pieces: let's look for solutions θ1, θ2, θ3, θ4 such that θ1 equals F1new along y = 0 and vanishes on the other three sides of the rectangle, θ2 equals F2new along y = b and vanishes on the other three sides, etc. Note that we can only do this because F1new(0) = F1new(a) = 0, etc., otherwise the boundary conditions are discontinuous at the corners.
If we can find θ1, θ2, etc. then θ1 + θ2 + θ3 + θ4 is still a solution to Laplace's equation and satisfies all the boundary conditions we wanted originally. I'll show you how to solve for θ1: the other solutions can all be obtained the same way, or by being clever with symmetries of the rectangle.
Solve for θ1
so that Dn are the coefficients in the Fourier half-range sine series of F1(x) and
$$0 = \sum_{n=1}^\infty\left(C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a)\right)\sin(n\pi x/a),$$
which means
$$C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a) = 0$$
or
$$C_n = -D_n\coth(n\pi b/a).$$
Example 72. If F1(x) = sin(x) on [0, a = π] × [0, b = 5π] then D1 = 1 and C1 = −coth(5π) so the solution is
$$\theta_1(x, y) = (-\coth(5\pi)\sinh y + \cosh y)\sin x.$$
You could of course plough through this procedure for each θ. For instance, θ2
is particularly easy: we need to solve
$$0 = \sum_{n=1}^\infty D_n\sin(n\pi x/a),$$
Example 73. Here is an example. On the square [0, π]², take F1(x) = sin(x), F2(x) = 0, F3(y) = 0, F4(y) = sin(y).
We get θ1(x, y) = (−coth(π) sinh y + cosh y) sin x as in the previous example. However, we could rearrange this using hyperbolic trigonometric identities as
$$\theta_1(x, y) = \frac{\sinh(\pi - y)\sin x}{\sinh\pi}.$$
If we had instead begun with F2(x) = sin(x), F1(x) = 0, the solution would have been $\frac{\sinh y\sin x}{\sinh\pi}$. Clearly these two solutions are related by y ↦ π − y, which is a reflection in the horizontal line which cuts the rectangle in two.
We also get $\theta_4(x, y) = \frac{\sinh x\sin y}{\sinh\pi}$. So the full solution is
$$\frac{\sinh(\pi - y)\sin x + \sinh x\sin y}{\sinh\pi}.$$
(in particular, the boundary values are zero at the corners).
Now take instead boundary data x² − πx, which also vanishes at the corners.
The Fourier sine series of x² − πx on [0, π] (i.e. the half-range series obtained
by extending it to an odd function on [−π, π]) is
−(8/π) Σ_{k=1}^∞ sin((2k − 1)x) / (2k − 1)³
and because the values in the corners are now zero we can apply our earlier
solution
−(8/π) Σ_{k=1}^∞ [ sin((2k − 1)x) sinh((2k − 1)y) / ((2k − 1)³ sinh((2k − 1)π))
+ sin((2k − 1)y) sinh((2k − 1)x) / ((2k − 1)³ sinh((2k − 1)π)) ]
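We can spot-check the sine-series claim numerically (a sketch of my own): partial sums of the series above converge to x² − πx on [0, π].

```python
import numpy as np

# Partial sums of the claimed sine series for x^2 - pi*x on [0, pi]
def partial_sum(x, K=200):
    n = 2*np.arange(1, K + 1) - 1          # only odd frequencies appear
    return -(8/np.pi) * np.sin(np.outer(x, n)) @ (1.0 / n**3)

x = np.linspace(0, np.pi, 101)
err = np.max(np.abs(partial_sum(x) - (x**2 - np.pi*x)))
assert err < 1e-4
```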
5.3 Eigenfunction expansions
F(x, y) = Σ_n An Mn(x, y) and ∆(F) = Σ_n An λn Mn(x, y). Thus a solution to
the heat equation with initial condition φ(0, x, y) = F(x, y) is given by

φ(t, x, y) = Σ_n An e^{λn t} Mn(x, y)
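To make this concrete, here is a check (my own illustration, not from the notes) using the standard Dirichlet eigenfunctions sin(nx) sin(my) of the Laplacian on the square [0, π]², for which λ = −(n² + m²): each term e^{λt}M solves the heat equation ∂φ/∂t = ∆φ.

```python
import sympy as sp

t, x, y = sp.symbols('t x y')
n, m = 2, 3                                  # sample mode numbers (my choice)
lam = -(n**2 + m**2)                         # corresponding eigenvalue
M = sp.sin(n*x) * sp.sin(m*y)                # eigenfunction on [0, pi]^2

# The eigenfunction equation: Laplacian of M equals lam*M
assert sp.simplify(sp.diff(M, x, 2) + sp.diff(M, y, 2) - lam*M) == 0

# Each separated term exp(lam*t)*M solves the heat equation phi_t = Laplacian(phi)
phi = sp.exp(lam*t) * M
heat_residual = sp.diff(phi, t) - (sp.diff(phi, x, 2) + sp.diff(phi, y, 2))
assert sp.simplify(heat_residual) == 0
```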
(orthogonality), like the integrals of sine and cosine which let us compute
Fourier series, and completeness, which asserts the existence of an expansion

F(x) = Σ_n An yn(x)
Proof. Consider

v · Aw = (A^T v) · w = (Av) · w
The analogue of the dot product · for the space of functions on [0, π] is the
integral

“f · g” = ∫₀^π f(x) g(x) dx.

(Note that we need f(0) = f(π) = g(0) = g(π) = 0, or other suitable boundary
conditions, for this integration to work without picking up boundary terms.)
A better name for “symmetric” in this context is “self-adjoint”: an operator A
has an adjoint operator A^T defined by

(A^T v) · w = v · Aw
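For instance, the second-derivative operator is self-adjoint with respect to this “dot product”, essentially by two integrations by parts. A quick SymPy check of my own, using two sample functions vanishing at 0 and π:

```python
import sympy as sp

x = sp.symbols('x')
# Two test functions vanishing at the ends of [0, pi]
f = x*(sp.pi - x)
g = sp.sin(3*x)

# "f'' . g" and "f . g''" agree, so A = d^2/dx^2 is self-adjoint here
lhs = sp.integrate(sp.diff(f, x, 2)*g, (x, 0, sp.pi))
rhs = sp.integrate(f*sp.diff(g, x, 2), (x, 0, sp.pi))
assert sp.simplify(lhs - rhs) == 0
```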
5.4 Discontinuities
As a final example, I want to tackle the problem of what happens when your
equation becomes discontinuous. For example, maybe you’re interested in the
heat equation on a rod [0, 2π] whose thermal diffusivity changes dramatically
halfway along and you model this by using the heat equation
∂φ/∂t = σ ∂²φ/∂x²

where σ is the discontinuous function

σ(x) = σ1 if x ∈ [0, π],  σ2 if x ∈ [π, 2π].
Let’s figure out what the separated solutions are when σ1 = 1 and σ2 = 2 (so
the right-hand half of the rod conducts heat more quickly than the left-hand
half).
102 CHAPTER 5. THE HEAT EQUATION AND LAPLACE’S EQUATION
On [0, π] we get T1′ = λT1, X1′′ = λX1 as usual. On [π, 2π] we get T2′ = λT2 and
X2′′ = λX2/2. Therefore the separated solutions we seek are

X1(x) = A1 sin(x√−λ) + B1 cos(x√−λ),  X2(x) = A2 sin(x√(−λ/2)) + B2 cos(x√(−λ/2))
X1(0) = 0 ⇒ B1 = 0, so X1(π) = X2(π) implies

A1 sin(π√−λ) = A2 sin(π√(−λ/2)) + B2 cos(π√(−λ/2))
and

X1′(π) = √−λ A1 cos(π√−λ),
X2′(π) = √(−λ/2) (A2 cos(π√(−λ/2)) − B2 sin(π√(−λ/2))),

so X1′(π) = X2′(π) implies

√−λ A1 cos(π√−λ) = √(−λ/2) (A2 cos(π√(−λ/2)) − B2 sin(π√(−λ/2)))
Imposing the remaining boundary condition X2(2π) = 0 gives
B2 = −A2 tan(2π√(−λ/2)); dividing the condition X1(π) = X2(π) by the condition
X1′(π) = X2′(π) then yields

(1/√−λ) tan(π√−λ) = ( sin(π√(−λ/2)) − tan(2π√(−λ/2)) cos(π√(−λ/2)) )
/ ( √(−λ/2) ( cos(π√(−λ/2)) + tan(2π√(−λ/2)) sin(π√(−λ/2)) ) )
so

tan(π√−λ) = √2 ( tan(π√(−λ/2)) − tan(2π√(−λ/2)) ) / ( 1 + tan(2π√(−λ/2)) tan(π√(−λ/2)) )
= −√2 tan(π√(−λ/2))
So λ has to satisfy this bizarre equation! If we plot the graphs of tan(π√−λ) and
−√2 tan(π√(−λ/2)) as functions of λ (see Figure 5.1), we see that they intersect
at a discrete set of points with abscissae λn. Therefore we get separated solutions
with these λn, and the other constants can be calculated from the relations we
have already found.
This ceases to be something we can do explicitly because there is no closed-form
expression for the numbers λn.
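Although there is no closed form, the λn are easy to find numerically. Here is a sketch (mine, not from the notes) that works with s = √−λ, scans for sign changes of tan(πs) + √2 tan(πs/√2), refines them by bisection, and discards the spurious sign changes at the poles of tan:

```python
import numpy as np

# Eigenvalue condition from the text, written in terms of s = sqrt(-lambda) > 0:
# tan(pi*s) = -sqrt(2)*tan(pi*s/sqrt(2)), i.e. g(s) = 0 below.
def g(s):
    return np.tan(np.pi * s) + np.sqrt(2) * np.tan(np.pi * s / np.sqrt(2))

def bisect(f, a, b, steps=100):
    # plain bisection, assuming f(a) and f(b) have opposite signs
    for _ in range(steps):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

# Scan a fine grid for sign changes; a sign change at a pole of tan is
# rejected because the refined point has a huge residual.
grid = np.linspace(0.05, 6.0, 5000)
roots = []
for a, b in zip(grid[:-1], grid[1:]):
    if g(a) * g(b) < 0:
        s = bisect(g, a, b)
        if abs(g(s)) < 1e-6:
            roots.append(s)

lambdas = [-s**2 for s in roots]   # these are the numbers lambda_n
assert len(roots) >= 1 and all(abs(g(s)) < 1e-6 for s in roots)
```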
Figure 5.1: The graphs of tan(π√−λ) and −√2 tan(π√(−λ/2)) as functions of
λ. The x-coordinates of the (discrete set of) intersection points are the numbers
we are calling λn.
Chapter 6

The Wave Equation
6.1 Derivation
Let’s consider a uniform one-dimensional string stretched out straight between
two points (0, 0) and (L, 0). Make a small perturbation of the string so that it
follows the graph of a function u : [0, L] → R with u(0) = u(L) = 0. We will let
go of the string and allow it to vibrate. At time t and above the point x ∈ [0, L]
the height of the string will be φ(x, t). Here’s a claim which we won’t justify,
but which should be intuitively appealing because of Hooke’s law which relates
tension and lengthening:
Claim 76. Assume that the perturbation φ is small and assume the same of its
derivatives. The potential energy in this perturbed string is (to a good approx-
imation) proportional to the difference between its length and the length L of the
unperturbed string, i.e. equal to

τ ∫₀^L ( √(1 + (∂φ/∂x)²) − 1 ) dx

where τ is the tension per unit of lengthening. To lowest order in ∂φ/∂x (by
Taylor expanding the integrand) this is approximately

(τ/2) ∫₀^L (∂φ/∂x)² dx.

Moreover, the total kinetic energy of the string, once it is in motion, is to a
good approximation

(ρ/2) ∫₀^L (∂φ/∂t)² dx

where ρ is the density in units of mass per unit length (constant by the assump-
tion of uniformity).
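The Taylor expansion behind the potential-energy approximation is easy to confirm symbolically (a check of my own, writing s for the slope ∂φ/∂x):

```python
import sympy as sp

s = sp.symbols('s')   # s stands for the slope of the string
# Expand the integrand sqrt(1 + s^2) - 1 around s = 0
expansion = sp.series(sp.sqrt(1 + s**2) - 1, s, 0, 4)
# The leading term is s^2/2, which gives the (tau/2) * integral of
# (d(phi)/dx)^2 formula for the potential energy.
assert expansion.removeO() == s**2 / 2
```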
We have already seen how to solve the wave equation (and more generally linear
hyperbolic equations with constant coefficients) in Section 3.3.2. The general
solution is
φ(x, t) = F (x + ct) + G(x − ct)
for some arbitrary functions F and G. The only remaining task is to compute
F and G from the initial data of the string at time t = 0. Since there are
two arbitrary functions F and G (or because the wave equation is second
order) we need to know more than just the initial profile u of the string. We
need extra boundary or initial conditions. Above, we assumed fixed endpoints
φ(0, t) = φ(L, t) = 0, but there are other possibilities:
• Maybe the string is infinitely long, but we know v(x) ≡ ∂φ/∂t(x, 0). If we
were just letting go of the string then it would be stationary at the instant
we let it go (though it would immediately start to accelerate). In other
words we would have v ≡ 0. However, we allow ourselves more general
functions v.
• Maybe the string extends infinitely far in only one direction (i.e. along
the interval [0, ∞)), but is fixed so that φ(0, t) = 0.
More generally there are all sorts of boundary conditions you could stick on
your string. We’ll just concern ourselves with the fixed endpoints and infinite
string settings.
6.2 Boundary conditions
We deal first with the case of a string stretched between 0 and L and fixed at
height zero. That is we impose the boundary conditions
φ(0, t) = φ(L, t) = 0.
We will seek solutions to the wave equation by separating variables, that is, look-
ing for solutions φ(x, t) = X(x)T(t). Substituting this into the wave equation
gives:

X′′T = (1/c²) XT′′

and so we only need to solve simple harmonic oscillator equations

X′′ = λX,  T′′ = c²λT.
Claim 77. There are no nontrivial solutions satisfying the boundary conditions

φ(0, t) = φ(L, t) = 0

if λ ≥ 0.

Proof (for λ > 0; the case λ = 0 is similar). The general solution is
X(x) = A cosh(x√λ) + B sinh(x√λ), so the boundary conditions give

A cosh(0) + B sinh(0) = 0 ⇒ A = 0

and

B sinh(L√λ) = 0 ⇒ B = 0.

Hence A = B = 0 and the solution is trivial.
In the remaining case λ < 0 there are many solutions: as usual we need
√−λ = kπ/L for a positive integer k. Writing the initial data in half-range sine
series,

f(x) = Σ_{k=1}^∞ Fk sin(kπx/L)

g(x) = Σ_{k=1}^∞ Gk sin(kπx/L)
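Assembling these ingredients (a sketch of my own, using the standard separated-variables solution φ(x, t) = Σk Fk sin(kπx/L) cos(kπct/L) for the case of zero initial velocity): for a plucked triangular profile, the string returns to its initial shape after one period 2L/c.

```python
import numpy as np

L, c, K = 1.0, 1.0, 50          # string length, wave speed, number of modes

def f(x):                        # triangular "plucked" initial profile
    return np.where(x < L/2, x, L - x)

x = np.linspace(0, L, 2001)
dx = x[1] - x[0]
k = np.arange(1, K + 1)

# Half-range sine coefficients F_k of f, by trapezoidal quadrature
S = np.sin(np.pi * np.outer(x, k) / L)           # S[i, j] = sin(k_j pi x_i / L)
w = np.full_like(x, dx); w[0] = w[-1] = dx/2     # trapezoid weights
Fk = (2/L) * (w * f(x)) @ S

# Zero-initial-velocity solution phi(x,t) = sum_k F_k sin(k pi x/L) cos(k pi c t/L)
def phi(t):
    return S @ (Fk * np.cos(k * np.pi * c * t / L))

assert np.max(np.abs(phi(0.0) - f(x))) < 0.01        # series reproduces the pluck
assert np.max(np.abs(phi(2*L/c) - phi(0.0))) < 1e-9  # motion has period 2L/c
```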
In this section we will deal with the case of an infinite string, specifying
u(x) = φ(x, 0) and v(x) = ∂φ/∂t(x, 0). We have already seen how to solve the
wave equation (and more generally linear hyperbolic equations with constant
coefficients) in Section 3.3.2.
Since φ(x, t) = F(x + ct) + G(x − ct) we have

u(x) = F(x) + G(x)
v(x) = c(F′(x) − G′(x))
and so by the fundamental theorem of calculus

∫₀^x v(s) ds = c(F(x) − G(x)) + K

for some constant K. Solving for F and G and substituting back into
φ(x, t) = F(x + ct) + G(x − ct) gives the formula

φ(x, t) = (1/2)(u(x + ct) + u(x − ct)) + (1/2c) ∫_{x−ct}^{x+ct} v(s) ds

due to D’Alembert.
6.2.3 Example
For a similar example worked out from first principles, see Section 3.3.2.
Example 79. Suppose that φ(x, 0) = sin x and ∂φ/∂t(x, 0) = cos x. Then

φ(x, t) = (1/2)(sin(x + ct) + sin(x − ct)) + (1/2c) ∫_{x−ct}^{x+ct} cos ξ dξ
= (1/2)(sin(x + ct) + sin(x − ct)) + (1/2c)(sin(x + ct) − sin(x − ct))
= (1/2)(1 + 1/c) sin(x + ct) + (1/2)(1 − 1/c) sin(x − ct).
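We can verify numerically (my own check, with the sample value c = 2) that the formula from Example 79 really solves φ_tt = c²φ_xx with the stated initial data, using central finite differences:

```python
import numpy as np

c = 2.0  # sample wave speed (my choice; the example leaves c general)

# d'Alembert solution from Example 79
def phi(x, t):
    return 0.5*(1 + 1/c)*np.sin(x + c*t) + 0.5*(1 - 1/c)*np.sin(x - c*t)

# Check the PDE phi_tt = c^2 * phi_xx by central finite differences
h = 1e-4
x, t = 0.7, 0.3   # an arbitrary sample point
phi_tt = (phi(x, t+h) - 2*phi(x, t) + phi(x, t-h)) / h**2
phi_xx = (phi(x+h, t) - 2*phi(x, t) + phi(x-h, t)) / h**2
assert abs(phi_tt - c**2 * phi_xx) < 1e-4

# Check the initial conditions phi(x,0) = sin(x) and phi_t(x,0) = cos(x)
xs = np.linspace(-3, 3, 50)
assert np.max(np.abs(phi(xs, 0.0) - np.sin(xs))) < 1e-12
phi_t0 = (phi(xs, h) - phi(xs, -h)) / (2*h)
assert np.max(np.abs(phi_t0 - np.cos(xs))) < 1e-6
```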
Let’s try to understand d’Alembert’s solution a little better. Assume for the
moment that v ≡ 0 so that the solution is
1
φ(x, t) = (u(x + ct) + u(x − ct)) .
2
At each time t this is a superposition of two terms. Both terms have the same
profile (shape), namely the profile of u (scaled down by a factor of two). As time
progresses one of these profiles seems to move to the right (u(x − ct), because its
argument is decreasing as t increases) and the other (u(x + ct)) seems to move
to the left. For instance, in Example 79 the crests of the ((1 + 1/c)/2) sin(x + ct)
wave occur at

x + ct = 2πn + π/2.
Let xn (t) denote the position of the nth crest at time t. This gives
xn (t) = 2πn + π/2 − ct
so as t increases, xn (t) decreases and the crest moves to the left. We therefore
call this a left-moving wave and the other a right-moving wave.
Example 80. Suppose that

u(x) = 1 if |x| < 1, and u(x) = 0 otherwise.
Then two little square waves move off in either direction. Notice that an observer
standing k metres to the left will only notice the arrival of the square wave after
k/c seconds. In other words, signals propagate at a speed of precisely c.
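This propagation-speed statement is easy to test (a sketch of my own, taking c = 1): an observer sitting 4 units beyond the edge of the pulse sees nothing until t = 4/c.

```python
import numpy as np

c = 1.0  # sample wave speed (my choice)

def u(x):
    return np.where(np.abs(x) < 1, 1.0, 0.0)

# d'Alembert solution with zero initial velocity
def phi(x, t):
    return 0.5*(u(x + c*t) + u(x - c*t))

# An observer at x = -5 sits 4 units beyond the left edge of the pulse
# (which ends at x = -1), so the left-moving wave arrives at t = 4/c.
x0 = -5.0
assert phi(x0, 3.9/c) == 0.0        # nothing has arrived yet
assert phi(x0, 4.1/c) == 0.5        # the half-height square wave has arrived
```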
We now have two ways of solving the wave equation: by separating variables à
la Fourier and by a clever change of coordinates due to D’Alembert. It is easier
to apply the separated variables method to boundary-value problems because
it is not so easy to express the boundary conditions in terms of the arbitrary
functions F and G from D’Alembert’s solution. Of course the two solutions are
related by a standard trigonometric identity:

sin(kπx/L) cos(kπct/L) = (1/2)( sin(kπ(x + ct)/L) + sin(kπ(x − ct)/L) ).
One particularly simple wave which we’re allowed on an infinite string is the
simple harmonic wave

φ(x, t) = a sin(k(x − ct) + B),

which we could re-express in terms of sin(k(x − ct)) and cos(k(x − ct)) if we were
so inclined. The maximal height a is the amplitude, k is the angular wavenumber,
λ = 2π/k is the wavelength, ω = kc is the angular frequency and T = 2π/ω is
the period. Changing the constant B is called shifting the phase of the wave.
This wave is moving to the right (its profile translates towards increasing x as t
increases) and you could create a left-moving wave by using x + ct.
A standing wave, which, as the name suggests, doesn’t move, can be obtained by
adding a left-moving and a right-moving simple harmonic wave with the same
amplitude and frequency:

a sin(k(x − ct)) + a sin(k(x + ct)) = 2a sin(kx) cos(kct).
Suppose that there are two strings tied together at the origin and extending
off to infinity in either direction. Suppose that waves travel at speed c− in the
left-hand string and c+ in the right-hand string. Imagine a simple harmonic
wave incoming from the left,

Ae^{ik(x−c−t)},  x ≤ 0,

together with a reflected wave Re^{iℓ(x+c−t)} for x ≤ 0 and a transmitted wave
Te^{im(x−c+t)} for x ≥ 0. Matching the slopes ∂φ/∂x on either side of the origin
gives

kAe^{−ikc−t} + ℓRe^{iℓc−t} = mTe^{−imc+t}.

Since the functions e^{iνt} (for distinct ν) are linearly independent in the space
of all complex-valued functions we need kc− = −ℓc− = mc+, that is ℓ = −k and
m = kc−/c+. Combining this with continuity of φ at the origin (A + R = T)
and solving, we get

T = 2Ac+/(c− + c+),  R = A(c+ − c−)/(c− + c+).
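A quick symbolic check (mine, not from the notes) that these T and R satisfy both matching conditions at the origin:

```python
import sympy as sp

A, cm, cp = sp.symbols('A c_minus c_plus', positive=True)

# Transmission/reflection coefficients from the text
T = 2*A*cp / (cm + cp)
R = A*(cp - cm) / (cm + cp)

# Continuity of the string at the origin: A + R = T
assert sp.simplify(A + R - T) == 0

# Matching slopes: with l = -k and m = k*cm/cp, the derivative condition
# k*A + l*R = m*T reduces to A - R = (cm/cp)*T
assert sp.simplify((A - R) - (cm/cp)*T) == 0
```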
Part III
Appendices
Appendix A

Recap of Fourier theory
The letters m and n will always stand for integers greater than or equal to one.
∫_{−L}^{L} sin(mπx/L) sin(nπx/L) dx = 0 if m ≠ n, L otherwise.   (A.2)

∫_{−L}^{L} cos(mπx/L) cos(nπx/L) dx = 0 if m ≠ n, L otherwise.   (A.3)

∫_{−L}^{L} sin(mπx/L) cos(nπx/L) dx = 0   (A.4)
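These orthogonality relations are easy to spot-check numerically (my own sketch, with an arbitrary choice of L). The trapezoidal rule is essentially exact here because the integrands are trigonometric polynomials integrated over a full period:

```python
import numpy as np

L = 2.0                          # arbitrary half-period (my choice)
x = np.linspace(-L, L, 20001)
dx = x[1] - x[0]

def integrate(y):
    # trapezoidal rule on the fixed grid
    return (np.sum(y) - 0.5*(y[0] + y[-1])) * dx

for m in range(1, 4):
    for n in range(1, 4):
        ss = integrate(np.sin(m*np.pi*x/L) * np.sin(n*np.pi*x/L))
        cc = integrate(np.cos(m*np.pi*x/L) * np.cos(n*np.pi*x/L))
        sc = integrate(np.sin(m*np.pi*x/L) * np.cos(n*np.pi*x/L))
        expected = L if m == n else 0.0
        assert abs(ss - expected) < 1e-6   # (A.2)
        assert abs(cc - expected) < 1e-6   # (A.3)
        assert abs(sc) < 1e-6              # (A.4)
```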
Assume for one moment that we are allowed to swap integrals and sums¹, for
instance:

∫_{−L}^{L} Σ_{n=1}^∞ an cos(nπx/L) dx = Σ_{n=1}^∞ ∫_{−L}^{L} an cos(nπx/L) dx.
1 You will see justification for this kind of operation next term in Analysis 4.
Then a function f (x) with Fourier series given by (A.1) has Fourier coefficients
c = (1/2L) ∫_{−L}^{L} f(x) dx   (A.5)

an = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx   (A.6)

bn = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx   (A.7)
So when do Fourier series exist? Provided f is Riemann-integrable over
[−L, L] you can make sense of the integrals (A.6) and (A.7) and write down a
Fourier series (see Question 5 above).
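As an illustration (my own, using the sample function f(x) = x on [−π, π]), formulas (A.5)-(A.7) reproduce the classical sawtooth series x = 2 Σ (−1)^{n+1} sin(nx)/n:

```python
import sympy as sp

x = sp.symbols('x')
L = sp.pi
f = x                      # sample function on [-pi, pi]
n = sp.symbols('n', positive=True, integer=True)

# Coefficients from (A.5)-(A.7)
c = sp.integrate(f, (x, -L, L)) / (2*L)
a_n = sp.integrate(f*sp.cos(n*sp.pi*x/L), (x, -L, L)) / L
b_n = sp.integrate(f*sp.sin(n*sp.pi*x/L), (x, -L, L)) / L

assert c == 0                                   # f is odd
assert sp.simplify(a_n) == 0                    # no cosine terms
# b_n = 2(-1)^(n+1)/n, the classical sawtooth coefficients
assert sp.simplify(b_n - 2*(-1)**(n + 1)/n) == 0
```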
But when do they actually have anything to do with the function you
started with? Let’s define the partial sums
SN(x) = c + Σ_{n=1}^N an cos(nπx/L) + Σ_{n=1}^N bn sin(nπx/L)

and ask in what sense the difference f(x) − SN(x) tends to zero as N → ∞.
A.3 Interpretation
The way you should think about what’s going on is by analogy with finite-
dimensional vector spaces. With a finite-dimensional vector space V you can
pick an inner product
h, i : V × V → R
which, given two vectors, outputs a number. The vectors v, w are orthogonal if
hv, wi = 0. A basis e1 , . . . , en is orthonormal if hei , ej i = δij .
In Fourier theory the relevant vector space is the space of periodic functions
with period 2L, square-integrable on [−L, L]. The inner product of f and g is
⟨f, g⟩ = (1/L) ∫_{−L}^{L} f(x) g(x) dx.
Note that this is a well-defined integral: if the integrals ∫_{−L}^{L} f(x)² dx and
∫_{−L}^{L} g(x)² dx are both finite then, by the Cauchy-Schwarz inequality,

( ∫_{−L}^{L} f(x) g(x) dx )² ≤ ( ∫_{−L}^{L} f(x)² dx ) ( ∫_{−L}^{L} g(x)² dx ),
Appendix B

Sage code for diagrams
x=var('x')
y=var('y')
z=var('z')
t=var('t')
u=var('u')
p=plot_vector_field3d((1,z,0),(x,-1,1),(y,-1,1),(z,-1,1),frame=False)
q=plot_vector_field3d((1,z,0),(x,-1,1),(y,-1,1),(z,0,1),frame=False)
r=parametric_plot3d((t,u+t*u,u),(t,0,1),(u,-1,1),plot_points=[10,10],
opacity=0.8,frame=False)
s=parametric_plot3d((t,u+t*u^2,u^2),(t,0,1),(u,-1,1),plot_points=[10,10],
opacity=0.8,frame=False)
Ax=parametric_plot(vector((t,0,0)),(t,-1.1,1.1),
thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)),(t,-1.1,1.1),
thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)),(t,-0.7,0.7),
thickness=3, color='black')
Az2=parametric_plot(vector((0,0,t)),(t,0,0.7),
thickness=3, color='black')
show(p+r+Ax+Ay+Az,mesh=True)
show(q+s+Ax+Ay+Az2,mesh=True)
q7=parametric_plot((t,-0.6-0.6*t),(t,-1.2,1),axes=False)
q8=parametric_plot((t,-0.7-0.7*t),(t,-1.2,1),axes=False)
q9=parametric_plot((t,-0.8-0.8*t),(t,-1.2,1),axes=False)
qA=parametric_plot((t,-0.9-0.9*t),(t,-1.2,1),axes=False)
qB=parametric_plot((t,-1-1*t),(t,-1.2,1),axes=False)
Ax=parametric_plot(vector((t,0)),(t,-1.1,1.1),
thickness=3, color='black')
Ay=parametric_plot(vector((0,t)),(t,-1.1,1.1),
thickness=3, color='black')
show(p1+p2+p3+p4+p5+p6+p7+p8+p9+pA+pB+q1+q2+q3+q4+q5+q6+
q7+q8+q9+qA+qB+Ax+Ay)