Calculus of Variations: Catenary & Soap Film
Introduction
We’ve seen how Whewell solved the problem of the equilibrium shape of a chain hanging between two places, by finding how the forces on a length of chain (the tension at the two ends and its weight) balanced. We’re now going to look at a completely different approach: the equilibrium configuration is an energy minimum, so small deviations from it can only make second-order changes in the gravitational potential energy. Here we’ll find how analyzing that leads to a differential equation for the curve, and how the technique developed can be successfully applied to a vast array of problems.
The gravitational potential energy of the chain is
$$J[y(x)] = \int_{x_1}^{x_2} y\,ds = \int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx, \qquad \left(y' = dy/dx\right)$$
where we have taken the rope density and g both equal to unity for mathematical convenience.
Usually in calculus we minimize a function with respect to a single variable, or several variables. Here the
potential energy is a function of a function, equivalent to an infinite number of variables, and our
problem is to minimize it with respect to arbitrary small variations of that function. In other words, if we
nudge the chain somewhere, and its motion is damped by air or internal friction, it will settle down
again in the catenary configuration.
Formally speaking, there will be no change in that potential energy to leading order if we make an infinitesimal change in the curve, $y(x) \to y(x) + \delta y(x)$ (subject, of course, to keeping the length the same, that is, $\delta \int ds = 0$).
This method of solving the problem is called the calculus of variations: in ordinary calculus, we make an
infinitesimal change in a variable, and compute the corresponding change in a function, and if it’s zero
to leading order in the small change, we’re at an extreme value.
(Nitpicking footnote: actually this assumes the second-order term is nonzero; what about $x^3$ near the origin? But such situations are infrequent in the problems we’re likely to encounter.)
The difference here is that the potential energy of the hanging chain isn’t just a function of a variable, or even of a number of variables: it’s a function of a function. It depends on the position of every point on the chain (in the limit of infinitely small links, that is, or equivalently a continuous rope).
So, we’re looking for the configuration where the potential energy doesn’t change to first order for any
infinitesimal change in the curve of its position, subject to fixed endpoints, and a fixed chain length.
(Interestingly, this problem is also closely related to string theory: as a closed string propagates, its path traces out a “world sheet”, and the string dynamics is determined by that sheet having minimal area.)
Consider a soap film stretched between two coaxial circular rings. Taking the axis of rotational symmetry to be the x-axis, and the radius to be y(x), we need to find the function y(x) that minimizes the total area (ds is measured along the curve of the surface). Think of the soap film as a sequence of rings or collars, of radius y and therefore area 2πy ds. The total area is given by integrating, adding all these incremental collars,
$$J[y(x)] = 2\pi\int_{x_1}^{x_2} y\,ds = 2\pi\int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx,$$
given that $y(x_1) = y_1$, $y(x_2) = y_2$.
(You might be thinking at this point: isn’t this identical to the catenary equation? The answer is yes, but
the chain has an additional requirement—it has a fixed length. The soap film is not constrained in that
way, it can stretch or contract to minimize the total area, so this is a different problem!)
That is, we want δJ = 0 to first order if we make a change $y(x) \to y(x) + \delta y(x)$. Of course, this also means $y'(x) \to y'(x) + \delta y'(x)$, where
$$\delta y' = \delta\left(\frac{dy}{dx}\right) = \frac{d}{dx}\,\delta y.$$
More generally, consider any integral of the form
$$J[y] = \int_{x_1}^{x_2} f(y, y')\,dx, \qquad \left(y' = dy/dx\right).$$
Then under any infinitesimal variation δy(x) (equal to zero at the fixed endpoints)
$$\delta J[y] = \int_{x_1}^{x_2}\left(\frac{\partial f(y,y')}{\partial y}\,\delta y(x) + \frac{\partial f(y,y')}{\partial y'}\,\delta y'(x)\right)dx = 0.$$
To make further progress, we write $\delta y' = \delta(dy/dx) = (d/dx)\,\delta y$, then integrate the second term by parts, remembering δy = 0 at the endpoints, to get
$$\delta J[y] = \int_{x_1}^{x_2}\left(\frac{\partial f(y,y')}{\partial y} - \frac{d}{dx}\frac{\partial f(y,y')}{\partial y'}\right)\delta y(x)\,dx = 0.$$
Since this is true for any infinitesimal variation, we can choose a variation which is only nonzero near one point in the interval, and deduce that
$$\frac{\partial f(y,y')}{\partial y} - \frac{d}{dx}\frac{\partial f(y,y')}{\partial y'} = 0.$$
This general result is called the Euler-Lagrange equation. It’s very important—you’ll be seeing it again.
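Here is a rough numerical check of what this means. It is a sketch under assumed choices: the interval −1 ≤ x ≤ 1, the variation η(x) = cos(πx/2) (which vanishes at both endpoints) and the step ε = 10⁻³ are illustrative and do not come from the text. Take the soap-film integrand f = y√(1 + y′²) with the 2π dropped and vary y → y + εη. For y = cosh x, which satisfies the Euler-Lagrange equation (as derived below), the rate of change dJ/dε at ε = 0 should vanish. For the straight cylinder y = cosh 1 it should not: there it equals ∫η dx = 4/π.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

# Illustrative variation, zero at both endpoints x = -1 and x = +1
def eta(x):
    return math.cos(math.pi * x / 2)

def deta(x):
    return -math.pi / 2 * math.sin(math.pi * x / 2)

def J(y, dy, eps):
    """Soap-film functional (2*pi dropped) for the varied curve y + eps*eta."""
    def integrand(x):
        yy = y(x) + eps * eta(x)
        yp = dy(x) + eps * deta(x)
        return yy * math.sqrt(1 + yp * yp)
    return simpson(integrand, -1.0, 1.0)

def first_variation(y, dy, eps=1e-3):
    """Central-difference estimate of dJ/d(eps) at eps = 0."""
    return (J(y, dy, eps) - J(y, dy, -eps)) / (2 * eps)

catenary = first_variation(math.cosh, math.sinh)
cylinder = first_variation(lambda x: math.cosh(1.0), lambda x: 0.0)
print(f"catenary: {catenary:.2e}  cylinder: {cylinder:.6f}  4/pi: {4 / math.pi:.6f}")
```

For the catenary, the first-order change vanishes analytically (the integrand is the derivative of η sinh x), so the printed value is only the O(ε²) residue of the finite difference.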
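When f(y, y′) has no explicit dependence on x, as in the soap film problem, we can find a first integral. Multiplying the Euler-Lagrange equation by $y' = dy/dx$ gives
$$\frac{\partial f(y,y')}{\partial y}\frac{dy}{dx} - y'\frac{d}{dx}\frac{\partial f(y,y')}{\partial y'} = 0.$$
Since f depends on x only through y and y′,
$$\frac{df}{dx} = \frac{\partial f}{\partial y}\frac{dy}{dx} + \frac{\partial f}{\partial y'}\frac{dy'}{dx},$$
and using this to replace $\dfrac{\partial f(y,y')}{\partial y}\dfrac{dy}{dx}$ in the preceding equation gives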
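$$\frac{df}{dx} - \frac{\partial f}{\partial y'}\frac{dy'}{dx} - y'\frac{d}{dx}\frac{\partial f(y,y')}{\partial y'} = 0,$$
that is,
$$\frac{d}{dx}\left(y'\frac{\partial f}{\partial y'} - f\right) = 0,$$
so
$$y'\frac{\partial f}{\partial y'} - f = \text{constant}.$$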
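The first integral is easy to check numerically. In this sketch the helper name `first_integral` and the finite-difference step are assumptions. It estimates ∂f/∂y′ by a central difference in the y′ slot and evaluates y′∂f/∂y′ − f along y = cosh x, for the soap-film integrand f = y√(1 + y′²) with 2π dropped. For this curve, y′∂f/∂y′ − f = −y/√(1 + y′²) = −cosh x / cosh x, so every value should come out close to −1.

```python
import math

def f(y, yp):
    """Soap-film integrand y*sqrt(1 + y'^2), constant factor 2*pi dropped."""
    return y * math.sqrt(1 + yp * yp)

def first_integral(f, y, yp, h=1e-6):
    """y' * df/dy' - f, with df/dy' estimated by a central difference in y'."""
    df_dyp = (f(y, yp + h) - f(y, yp - h)) / (2 * h)
    return yp * df_dyp - f(y, yp)

# Evaluate along y = cosh x, so y' = sinh x
xs = [-2.0, -1.0, 0.0, 0.5, 1.5, 2.0]
values = [first_integral(f, math.cosh(x), math.sinh(x)) for x in xs]
print(values)
```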
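For the soap film, $f(y, y') = y\sqrt{1+y'^2}$ (dropping the constant factor 2π), so the Euler-Lagrange equation is
$$\sqrt{1+y'^2} - \frac{d}{dx}\left(\frac{yy'}{\sqrt{1+y'^2}}\right) = 0,$$
and the first integral is
$$y'\frac{\partial f}{\partial y'} - f = \frac{yy'^2}{\sqrt{1+y'^2}} - y\sqrt{1+y'^2} = -\frac{y}{\sqrt{1+y'^2}} = \text{constant}.$$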
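We’ll write
$$\frac{y}{\sqrt{1+y'^2}} = a.$$
Rearranging,
$$\left(\frac{dy}{dx}\right)^2 = \left(\frac{y}{a}\right)^2 - 1,$$
or
$$dx = \frac{a\,dy}{\sqrt{y^2 - a^2}}.$$
Integrating, with b a second constant of integration,
$$y = a\cosh\left(\frac{x-b}{a}\right).$$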
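To see how the two constants are fixed by boundary data, here is a sketch for an assumed symmetric case that the text does not treat: two rings of equal radius R at x = ±d. Symmetry gives b = 0, and R = a cosh(d/a) becomes, with u = d/a, the single equation cosh(u)/u = R/d. Because cosh(u)/u has a minimum, there may be two roots or none. The function name `ring_solutions`, the scan range for u and the bisection count are hypothetical choices.

```python
import math

def ring_solutions(R, d, u_max=20.0, steps=4000):
    """All a > 0 with R = a*cosh(d/a): rings of radius R at x = -d and x = +d.
    With u = d/a this is cosh(u)/u = R/d; scan u for sign changes, then bisect."""
    target = R / d

    def g(u):
        return math.cosh(u) / u - target

    roots = []
    du = u_max / steps
    for k in range(1, steps):
        lo, hi = k * du, (k + 1) * du
        if g(lo) * g(hi) < 0:              # a root lies in [lo, hi]
            for _ in range(100):           # refine it by bisection
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(d / (0.5 * (lo + hi)))   # convert u back to a
    return roots

print(ring_solutions(1.0, 0.5))   # two catenoids through the same rings
print(ring_solutions(1.0, 1.0))   # rings too far apart: no catenoid solution
```

When two solutions exist, they are two different catenoids spanning the same rings. Deciding which one is the true minimum is a second-order question that the first-order condition alone does not settle.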
Returning to the hanging chain, its potential energy is
$$J[y(x)] = \int y\,ds = \int y\left(1+y'^2\right)^{1/2}dx,$$
the same as the area function for the soap film (apart from the constant factor 2π). But there’s an important physical difference: the chain has a fixed length. The soap film is free to adjust its “length” to minimize the total area. The chain isn’t; it’s constrained. How do we deal with that?
Lagrange Multipliers
The problem of finding minima (or maxima) of a function subject to constraints was first solved by
Lagrange. A simple example will suffice to show the method.
Imagine we have some smooth curve in the ( x, y ) plane that does not pass through the origin, and we
want to find the point on the curve that is its closest approach to the origin. A standard illustration is to
picture a winding road through a bowl-shaped valley, and ask for the low point on the road. (We’ll also
assume that x determines y uniquely, the road doesn’t double back, etc. If it does, the method below
would give a series of locally closest points to the origin, we would need to go through them one by one
to find the globally closest point.)
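Let’s write the curve, the road, as g(x, y) = 0 (the wiggly red line in the figure below).

[Figure: Road through valley: deep green is valley bottom, hills darken with height.]

We want to minimize f(x, y) = x² + y², the square of the distance from the origin, along the road. At the closest point, a small step along the road changes f only to second order, so there the road just touches the circle f(x, y) = a_min², where a_min is the minimum distance. Therefore, at that point, the curves g(x, y) = 0 and f(x, y) = a_min² are parallel.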
Therefore the normals to the curves are also parallel: ( ∂f / ∂x, ∂f / ∂y ) = λ ( ∂g / ∂x, ∂g / ∂y ) .
(Note: yes, those are the directions of the normals. For an infinitesimal displacement along the curve f(x, y) = constant, 0 = df = (∂f/∂x)dx + (∂f/∂y)dy, so the vector (∂f/∂x, ∂f/∂y) is perpendicular to (dx, dy). This is also analogous to the electric field E = −∇ϕ being perpendicular to the equipotentials.)
The constant λ introduced here is called a Lagrange multiplier. It’s just the ratio of the lengths of the two normal vectors (of course, “normal” here means the vectors are perpendicular to the curves; they are not normalized to unit length!). We can find λ in terms of x, y, but at this point we don’t know their values.
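The equations determining the closest approach to the origin can now be written:
$$\frac{\partial}{\partial x}(f - \lambda g) = 0, \qquad \frac{\partial}{\partial y}(f - \lambda g) = 0, \qquad \frac{\partial}{\partial \lambda}(f - \lambda g) = 0.$$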
(The third equation is just g ( xmin , ymin ) = 0 , meaning we’re on the road.)
The first two equations can be solved to find λ and the ratio x/y; the third equation then gives x and y separately.
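Here is a small numerical sketch of solving these three equations. The example is assumed and deliberately not the exercise below: f = x² + y² with the road g(x, y) = xy − 1 = 0. The function names and the starting guess are hypothetical. The code applies Newton's method to the three partial derivatives of f − λg, using a hand-written Jacobian and a tiny Gaussian-elimination solver. By hand, 2x = λy and 2y = λx together with xy = 1 give the closest point (1, 1) with λ = 2 (and the mirror point (−1, −1)).

```python
def gauss_solve(A, b):
    """Solve A s = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (M[r][n] - sum(M[r][c] * s[c] for c in range(r + 1, n))) / M[r][r]
    return s

def lagrange_newton(x, y, lam, iters=50):
    """Newton's method on d(f - lam*g)/d(x, y, lam) = 0,
    for the illustrative choice f = x^2 + y^2, g = x*y - 1."""
    for _ in range(iters):
        r = [2 * x - lam * y,        # d/dx   (f - lam*g)
             2 * y - lam * x,        # d/dy   (f - lam*g)
             -(x * y - 1)]           # d/dlam (f - lam*g) = -g
        jac = [[2.0, -lam, -y],      # Jacobian of r with respect to (x, y, lam)
               [-lam, 2.0, -x],
               [-y, -x, 0.0]]
        dx, dy, dlam = gauss_solve(jac, [-v for v in r])
        x, y, lam = x + dx, y + dy, lam + dlam
    return x, y, lam

x, y, lam = lagrange_newton(1.2, 0.9, 1.8)
print(x, y, lam)   # should approach the closest point (1, 1) with lambda = 2
```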
Exercise for the reader: work through this for $g(x,y) = x^2 - 2xy - y^2 - 1$. (There are two solutions.)
Lagrange multipliers are widely used in economics, and other useful subjects such as traffic
optimization.
For the hanging chain, we need to minimize the potential energy
$$J[y(x)] = \int y\,ds = \int y\left(1+y'^2\right)^{1/2}dx.$$
The Lagrange multiplier method generalizes in a straightforward way from variables to variable functions. In the curve example above, we minimized $f(x,y) = x^2 + y^2$ subject to the constraint $g(x,y) = 0$. What we need to do now is minimize J[y(x)] subject to the constraint $L[y(x)] - \ell = 0$, where $L[y(x)] = \int ds$ is the length of the curve and ℓ is the fixed length of the chain.
For the minimum curve y(x) and the correct (so far unknown) value of λ, an arbitrary infinitesimal variation of the curve will give zero first-order change in J − λL. We write this as
$$\delta\left\{J[y(x)] - \lambda L[y(x)]\right\} = \delta\int_{x_1}^{x_2}(y - \lambda)\,ds = \delta\int_{x_1}^{x_2}(y-\lambda)\sqrt{1+y'^2}\,dx = 0.$$
Remarkably, the effect of the constraint is to give a simple adjustable parameter, the origin in the y
direction, so that we can satisfy the endpoint and length requirements.
The equation is solved exactly as for the soap film, leading to the first integral
$$\frac{y - \lambda}{\left(1+y'^2\right)^{1/2}} = a.$$
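Rearranging,
$$\left(\frac{dy}{dx}\right)^2 = \left(\frac{y-\lambda}{a}\right)^2 - 1,$$
or
$$dx = \frac{a\,dy}{\sqrt{(y-\lambda)^2 - a^2}},$$
and integrating,
$$y = \lambda + a\cosh\left(\frac{x-b}{a}\right).$$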
Here b is the second constant of integration; the two fixed endpoints and the fixed length give three conditions that determine λ, a and b. In general, the equations must be solved numerically. To get some feel for why this will always work, note that changing a varies how rapidly the cosh curve climbs from its low point at (x, y) = (b, λ + a): increasing a “fattens” the curve. Then, by varying b and λ, we can move that lowest point to the lowest point of the chain (or rather of the catenary, since it might be outside the range covered by the physical chain).
Algebraically, we know the curve can be written as y = a cosh(x/a), although at this stage we don’t know the constant a or where the origin is. What we do know is the length ℓ of the chain, and the horizontal and vertical distances (x2 − x1) and (y2 − y1) between the fixed endpoints. It’s straightforward to calculate that the length of the chain is
$$\ell = a\sinh(x_2/a) - a\sinh(x_1/a),$$
and the vertical distance v between the endpoints is
$$v = a\cosh(x_2/a) - a\cosh(x_1/a),$$
from which, after a little algebra with hyperbolic identities,
$$\ell^2 - v^2 = 4a^2\sinh^2\left(\frac{x_2 - x_1}{2a}\right).$$
All terms in this equation are known except a, which can therefore be found numerically. (This is in Wikipedia, among other places.)
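The numerical step might be sketched as follows. The function name `solve_catenary`, the bisection bracket and the iteration count are assumptions and do not come from the text. Write h = x2 − x1 and t = h/2a. The equation above then becomes sinh(t)/t = √(ℓ² − v²)/h. The left side increases steadily from 1, so bisection on t works whenever the chain is longer than the straight chord. With a known, b follows from subtracting the two endpoint conditions of y = λ + a cosh((x − b)/a), using cosh A − cosh B = 2 sinh((A + B)/2) sinh((A − B)/2). Then λ follows from either endpoint.

```python
import math

def solve_catenary(x1, y1, x2, y2, length):
    """Return (a, b, lam) such that y = lam + a*cosh((x - b)/a) passes through
    (x1, y1) and (x2, y2) with the given arc length. Assumes x2 > x1 and a
    chain longer than the straight chord between the endpoints."""
    h = x2 - x1                      # horizontal separation
    v = y2 - y1                      # vertical separation
    if h <= 0 or length ** 2 <= h ** 2 + v ** 2:
        raise ValueError("need x2 > x1 and length > chord")
    # l^2 - v^2 = 4 a^2 sinh^2(h / 2a); with t = h / 2a this is sinh(t)/t = r
    r = math.sqrt(length ** 2 - v ** 2) / h
    lo, hi = 1e-12, 1.0
    while math.sinh(hi) / hi < r:    # sinh(t)/t increases from 1: widen the bracket
        hi *= 2.0
    for _ in range(200):             # bisection on t
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) / mid < r:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    a = h / (2.0 * t)
    # v = 2a sinh((x1 + x2 - 2b)/2a) sinh(h/2a), solved for b
    b = 0.5 * (x1 + x2) - a * math.asinh(v / (2.0 * a * math.sinh(t)))
    lam = y1 - a * math.cosh((x1 - b) / a)
    return a, b, lam

print(solve_catenary(0.0, 0.0, 1.0, 0.5, 2.0))
```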
Exercise: try applying this reasoning to finding a for the soap film minimization problem. In that case, we know (x1, y1) and (x2, y2), and there is no length conservation requirement. To find a, we must eliminate the unknown b from the equations
$$y_1 = a\cosh\left(\frac{x_1 - b}{a}\right), \qquad y_2 = a\cosh\left(\frac{x_2 - b}{a}\right).$$
This is not difficult, but, in contrast to the chain, it does not give a in terms of y1 − y2; instead, y1 and y2 appear separately. Explain, in terms of the physics of the two systems, why this is so different from the chain.
The Brachistochrone
Suppose you have two points, A and B, with B below A but not directly below it. You have some smooth, let’s
say frictionless, wire, and a bead that slides on the wire. The problem is to curve the wire from A down
to B in such a way that the bead makes the trip as quickly as possible.
This optimal curve is called the “brachistochrone”, which is just the Greek for “shortest time”.
But what, exactly, is this curve, that is, what is y ( x ) , in the obvious notation?
This was the challenge problem posed by Johann Bernoulli to the mathematicians of Europe in a journal
run by Leibniz in June 1696. Isaac Newton was working full-time running the Royal Mint, recoining
England, and hanging counterfeiters. Nevertheless, ending a full day’s work at 4 pm, and finding the
problem delivered to him, he solved it by 4am the next morning, and sent the solution anonymously to
Bernoulli. Bernoulli remarked of the anonymous solution “I recognize the lion by his clawmark”.
Here’s how to solve the problem: we’ll take the starting point A to be the origin, and for convenience measure the y-axis positive downwards. This means the velocity at any point on the path is given by
$$\tfrac{1}{2}mv^2 = mgy, \qquad v = \sqrt{2gy},$$
so the time taken to reach B, at horizontal distance X, is
$$T = \int_A^B \frac{ds}{v} = \int_A^B \frac{ds}{\sqrt{2gy}} = \int_0^X \frac{\sqrt{1+y'^2}}{\sqrt{2gy}}\,dx.$$
Notice that this has the same form as the catenary equation, the only difference being that y is replaced by $1/\sqrt{2gy}$. The integrand does not depend on x, so we have the first integral:
$$y'\frac{\partial f}{\partial y'} - f = \text{constant}, \qquad f = \frac{\sqrt{1+y'^2}}{\sqrt{2gy}}.$$
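That is,
$$\frac{y'^2}{\sqrt{(1+y'^2)\,2gy}} - \frac{\sqrt{1+y'^2}}{\sqrt{2gy}} = -\frac{1}{\sqrt{(1+y'^2)\,2gy}} = \text{constant},$$
so, for some constant a,
$$\left(\frac{dy}{dx}\right)^2 + 1 = \frac{2a}{y}.$$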
Recalling that the curve starts at the origin A, it must begin by going vertically downward, since y = 0 there makes dy/dx infinite. For small enough y, we can approximate by ignoring the 1, so $\sqrt{2a}\,dx \cong \sqrt{y}\,dy$, giving $\sqrt{2a}\,x \cong \tfrac{2}{3}y^{3/2}$. The curve must, however, become horizontal if it gets as far down as y = 2a, and it cannot go below that level.
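Taking the square root,
$$dx = \frac{dy}{\sqrt{2a/y - 1}} = \sqrt{\frac{y}{2a - y}}\,dy.$$
Substituting $y = a(1 - z)$, so that $2a - y = a(1 + z)$ and $dy = -a\,dz$,
$$dx = -a\sqrt{\frac{1 - z}{1 + z}}\,dz.$$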
Now what? We’d prefer for the expression inside the square root to be a perfect square, of course. You may remember from high school trig that $1 + \cos\theta = 2\cos^2(\theta/2)$ and $1 - \cos\theta = 2\sin^2(\theta/2)$. This gives immediately that
$$\frac{1 - \cos\theta}{1 + \cos\theta} = \tan^2\frac{\theta}{2},$$
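so setting $z = \cos\theta$, with $dz = -\sin\theta\,d\theta = -2\sin(\theta/2)\cos(\theta/2)\,d\theta$,
$$dx = -a\tan\frac{\theta}{2}\,dz = 2a\tan\frac{\theta}{2}\sin\frac{\theta}{2}\cos\frac{\theta}{2}\,d\theta = 2a\sin^2\frac{\theta}{2}\,d\theta = a(1 - \cos\theta)\,d\theta.$$
Integrating, and using $y = a(1 - z)$,
$$x = a(\theta - \sin\theta), \qquad y = a(1 - \cos\theta),$$
where we’ve fixed the constant of integration so that the curve goes through the origin (at θ = 0).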
To see what this curve looks like, first ignore the aθ term in x, leaving x = −a sin θ, y = a − a cos θ. Evidently, as θ increases from zero, the point (x, y) goes anticlockwise (as drawn, with y increasing downwards) around a circle of radius a centered at (0, a), starting from the origin. Now, adding the aθ term back in, this circular motion moves steadily to the right, in such a way that the initial direction of the path is vertically down. (For very small θ, $y \approx a\theta^2/2$ and $x \approx a\theta^3/6$.)

Visualizing the total motion as θ steadily increases, the center moves from its original position at (0, a) to the right at speed $a\dot{\theta}$. Meanwhile, the point is moving round the circle anticlockwise at this same speed. Putting together the center’s linear velocity with the corresponding angular velocity, we see that the motion (x(θ), y(θ)) is the path of a point on the rim of a wheel rolling without sliding along a road (upside down in our case, of course). This is a cycloid.
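Here is a quick numerical comparison. It is a sketch, and g = 9.81 m/s², a = 1 m and the step count are assumed values. It sums ds/v along the half-arch 0 ≤ θ ≤ π of the cycloid, with v = √(2gy), and compares the result with the time down a straight incline to the same endpoint (πa, 2a). Along the cycloid, ds = 2a sin(θ/2) dθ and v = 2√(ga) sin(θ/2), so each step contributes √(a/g) dθ and the total should be π√(a/g).

```python
import math

G = 9.81  # assumed value of g in m/s^2

def cycloid_time(a, theta_end, n=20000, g=G):
    """Sum ds/v along x = a(θ - sin θ), y = a(1 - cos θ) (y downward), for a bead
    starting from rest at the origin, using the midpoint rule in θ."""
    dtheta = theta_end / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dtheta
        ds = a * math.hypot(1 - math.cos(th), math.sin(th)) * dtheta  # |(dx, dy)|
        v = math.sqrt(2 * g * a * (1 - math.cos(th)))                 # v = sqrt(2 g y)
        total += ds / v
    return total

def straight_line_time(X, Y, g=G):
    """Time from rest down a straight incline from (0, 0) to (X, Y), Y > 0 downward."""
    L = math.hypot(X, Y)
    # constant acceleration g*Y/L along the incline: L = (1/2)(g*Y/L) T^2
    return L * math.sqrt(2.0 / (g * Y))

a = 1.0
X, Y = math.pi * a, 2 * a            # endpoint of the half-arch, at θ = π
t_cyc = cycloid_time(a, math.pi)
t_line = straight_line_time(X, Y)
print(t_cyc, math.pi * math.sqrt(a / G), t_line)
```

With these values, the cycloid takes about 1.00 s, against roughly 1.19 s for the straight line.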
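Now look again at the first-order variation of the integral,
$$\delta J[y] = \int_0^X\left(\frac{\partial f(y,y')}{\partial y}\,\delta y(x) + \frac{\partial f(y,y')}{\partial y'}\,\delta y'(x)\right)dx = 0.$$
Writing $\delta y' = \delta(dy/dx) = (d/dx)\,\delta y$, and integrating the second term by parts,
$$\delta J[y] = \int_0^X\left(\frac{\partial f(y,y')}{\partial y} - \frac{d}{dx}\frac{\partial f(y,y')}{\partial y'}\right)\delta y(x)\,dx + \left[\frac{\partial f(y,y')}{\partial y'}\,\delta y(x)\right]_0^X = 0.$$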
However, we are now trying to find the fastest time for a given horizontal distance, so the final vertical
distance is an adjustable parameter: δ y ( X ) ≠ 0 .
As before, since δ J [ y ] = 0 for arbitrary δ y, we can still choose a δ y ( x ) which is only nonzero near
some point not at the end, so we must still have
$$\frac{\partial f(y,y')}{\partial y} - \frac{d}{dx}\frac{\partial f(y,y')}{\partial y'} = 0.$$
However, we must also have
$$\frac{\partial f\big(y(X), y'(X)\big)}{\partial y'}\,\delta y(X) = 0$$
to first order for arbitrary infinitesimal δy(X) (imagine a variation δy only nonzero near the endpoint), and this can only be true if ∂f(y, y′)/∂y′ = 0 at x = X.
For the brachistochrone, $f = \sqrt{1+y'^2}/\sqrt{2gy}$, so
$$\frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{2gy\,(1+y'^2)}},$$
and ∂f/∂y′ = 0 at x = X means that y′ = 0: the curve is horizontal at the end x = X. So the curve that delivers the bead a given horizontal distance the fastest is the half-cycloid (inverted), flat at the end. It’s easy to see this fixes the curve uniquely: think of the curve as generated by a rolling wheel. One half-turn of the wheel takes the top point to the bottom in distance X, so θ runs from 0 to π and a = X/π.
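Here is a numerical check of the flat-ending result. It is a sketch, and g = 9.81, X = 1 and the grid of a values are assumptions. Each cycloid x = a(θ − sin θ), y = a(1 − cos θ) from the origin reaches horizontal distance X at the θ_end solving a(θ_end − sin θ_end) = X. Since ds = 2a sin(θ/2) dθ and v = 2√(ga) sin(θ/2) along a cycloid, ds/v = √(a/g) dθ, and the travel time is θ_end√(a/g). Scanning over a, the fastest cycloid should be the one with a = X/π, which ends horizontally at θ = π. The helper names are hypothetical.

```python
import math

G = 9.81    # assumed value of g
X = 1.0     # assumed horizontal distance to cover

def theta_end(a, X):
    """Solve a*(θ - sin θ) = X for θ in (0, 2π) by bisection (needs 2πa > X)."""
    lo, hi = 0.0, 2 * math.pi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if a * (mid - math.sin(mid)) < X:   # θ - sin θ increases on (0, 2π)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def time_to_reach(a, X, g=G):
    """Travel time from rest to horizontal distance X along a cycloid of size a."""
    return theta_end(a, X) * math.sqrt(a / g)   # ds/v = sqrt(a/g) dθ

candidates = [0.2 + 0.001 * k for k in range(2000)]   # a from 0.2 to 2.199
best_a = min(candidates, key=lambda a: time_to_reach(a, X))
print(best_a, X / math.pi)
```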
It turns out (and was proved geometrically by Newton) that the ideal pendulum path is a cycloid.
Thinking in terms of the equivalent bead on a wire problem, with a symmetric cycloid replacing the
circular arc of an ordinary pendulum, if the bead is let go from rest at any point on the wire, it will reach
the center in the same time as from any other point. So a clock with a pendulum constrained to such a
path will keep very good time, and not be sensitive to the amplitude of swing.
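The proof involves similar integrals and tricks to those used above. For a bead released from rest at y = y0 (parameter θ0),
$$T(y_0) = \int\frac{ds}{\sqrt{2g(y - y_0)}} = \int\frac{\sqrt{1+y'^2}}{\sqrt{2g(y - y_0)}}\,dx = \sqrt{\frac{a}{g}}\int_{\theta_0}^{\pi}\sqrt{\frac{1-\cos\theta}{\cos\theta_0 - \cos\theta}}\,d\theta.$$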
This is left as an exercise for the reader. (Hint: you may find
$$\int_a^b\frac{dx}{\sqrt{(x-a)(b-x)}} = \pi$$
to be useful. Can you prove this integral is correct? Why doesn’t it depend on a, b?)
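Before attempting the proof, the equal-time property can be checked numerically. This is a sketch, and a = 1, g = 9.81 and the step count are assumed values. The substitution θ = θ0 + (π − θ0)s² removes the inverse-square-root singularity at θ = θ0. The half-angle forms 1 − cos θ = 2 sin²(θ/2) and cos θ0 − cos θ = 2 sin((θ + θ0)/2) sin((θ − θ0)/2) avoid cancellation near the start. All starting points θ0 should give the same time.

```python
import math

def descent_time(theta0, a=1.0, g=9.81, n=20000):
    """Time to slide from rest at parameter theta0 to the bottom (θ = π) of the
    cycloid x = a(θ - sin θ), y = a(1 - cos θ), with y measured downward.
    Uses θ = θ0 + (π - θ0) s^2 and the midpoint rule in s."""
    span = math.pi - theta0
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        theta = theta0 + span * s * s
        # sqrt((1 - cos θ)/(cos θ0 - cos θ)) written with half-angle products
        num = math.sin(theta / 2)
        den = math.sqrt(math.sin((theta + theta0) / 2) * math.sin((theta - theta0) / 2))
        total += num / den * 2 * span * s / n    # dθ = 2 span s ds
    return math.sqrt(a / g) * total

starts = [0.0, 0.5, 1.5, 2.5, 3.0]
times = [descent_time(t0) for t0 in starts]
print(times)
```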
Exercise: As you well know, a simple harmonic oscillator, a mass on a linear spring with restoring force
−kx , has a period independent of amplitude. Does this mean that a particle sliding on a cycloid is
equivalent to a simple harmonic oscillator? Find out by expressing the motion as an equation F = ma
where the distance variable from the origin is s measured along the curve.
So far we have found the curve y(x) for which an integral of the form
$$J[y] = \int_{x_1}^{x_2} f(y, y')\,dx$$
has a stationary value, and we’ve seen how it works in some two-dimensional curve examples.
But most dynamical systems are parameterized by more than one variable, so we need to know how to go from a curve in (x, y) to one in a space $(x, y_1, y_2, \ldots, y_n)$, and we need to minimize (say)
$$J[y_1, y_2, \ldots, y_n] = \int_{x_1}^{x_2} f(y_1, y_2, \ldots, y_n, y_1', y_2', \ldots, y_n')\,dx.$$
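In fact, the generalization is straightforward: the path deviation simply becomes a vector,
$$\delta\mathbf{y}(x) = \big(\delta y_1(x), \delta y_2(x), \ldots, \delta y_n(x)\big),$$
and
$$\delta J[\mathbf{y}] = \int_{x_1}^{x_2}\sum_{i=1}^{n}\left(\frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i}\,\delta y_i(x) + \frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i'}\,\delta y_i'(x)\right)dx = 0.$$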
Just as before, we take the variation to be zero at the endpoints and integrate by parts, now getting n separate equations for the stationary path:
$$\frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i} - \frac{d}{dx}\frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i'} = 0, \qquad i = 1, \ldots, n.$$
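If f has no explicit x dependence, we can again find a first integral. Multiplying the i-th equation by $y_i' = dy_i/dx$,
$$\frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i}\frac{dy_i}{dx} - y_i'\frac{d}{dx}\frac{\partial f(\mathbf{y},\mathbf{y}')}{\partial y_i'} = 0.$$
Using
$$\frac{df}{dx} = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial f}{\partial y_i'}\frac{dy_i'}{dx}\right),$$
and summing the previous equation over i, we find
$$\frac{d}{dx}\left(\sum_{i=1}^{n} y_i'\frac{\partial f}{\partial y_i'} - f\right) = 0,$$
and the (important!) first integral
$$\sum_{i=1}^{n} y_i'\frac{\partial f}{\partial y_i'} - f = \text{constant}.$$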
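Here is an assumed illustration that does not come from the text. Take n = 2, read x as time, and let f = ½(y1′² + y2′²) − V(y1, y2), with a hypothetical potential V = ½y1² + 2y2² + 0.1y1²y2². The Euler-Lagrange equations become y_i″ = −∂V/∂y_i, and the first integral Σ y_i′ ∂f/∂y_i′ − f = ½(y1′² + y2′²) + V is the familiar total energy. The sketch integrates the equations with a classical fourth-order Runge-Kutta step and reports how far that quantity drifts.

```python
def V(y1, y2):
    # hypothetical potential chosen for illustration
    return 0.5 * y1 * y1 + 2 * y2 * y2 + 0.1 * y1 * y1 * y2 * y2

def grad_V(y1, y2):
    return (y1 + 0.2 * y1 * y2 * y2, 4 * y2 + 0.2 * y1 * y1 * y2)

def first_integral(state):
    y1, y2, p1, p2 = state          # p_i = y_i'
    # sum_i y_i' df/dy_i' - f = (p1^2 + p2^2) - [(p1^2 + p2^2)/2 - V]
    return 0.5 * (p1 * p1 + p2 * p2) + V(y1, y2)

def deriv(state):
    y1, y2, p1, p2 = state
    g1, g2 = grad_V(y1, y2)
    return (p1, p2, -g1, -g2)       # Euler-Lagrange: y_i'' = -dV/dy_i

def rk4_step(state, h):
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

state = (1.0, 0.5, 0.0, 0.3)        # (y1, y2, y1', y2') at x = 0
E0 = first_integral(state)
drift = 0.0
for _ in range(10000):              # x from 0 to 10 in steps of 1e-3
    state = rk4_step(state, 1e-3)
    drift = max(drift, abs(first_integral(state) - E0))
print(E0, drift)
```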