Gerald Folland - Advanced Calculus
The first half of this chapter (§§1.1-4) presents basic facts and concepts concerning geometry, vectors, limits, continuity, and sequences; the material in it is used throughout the later chapters. The second half (§§1.5-8) deals with some of the more technical topological results that underlie calculus. It is quite concise and includes nothing but what is needed in this book. The reader who wishes to proceed quickly to the study of differentiation and integration may scan it quickly and refer back to it as necessary; on the other hand, the reader who wishes to see a more extensive development of this material is referred to books on the foundations of analysis such as DePree and Swartz [5], Krantz [12], or Rudin [18].¹
At the outset, let us review some standard notation and terminology for future
reference:
• Sums: If $a_1, a_2, \ldots, a_k$ are numbers, their sum $a_1 + a_2 + \cdots + a_k$ is denoted by $\sum_1^k a_n$, or by $\sum_{n=1}^k a_n$ if necessary for clarity. The sum need not be started at $n = 1$; more generally, if $j \le k$, we have
$$\sum_j^k a_n = \sum_{n=j}^k a_n = a_j + a_{j+1} + \cdots + a_k.$$
The letters $j$ and $k$ denote the limits of summation; the letter $n$ is analogous to a dummy variable in an integral and may be replaced by any other letter that is not already in use without changing the meaning of the sum. We shall occasionally write simply $\sum a_n$ when the limits of summation are understood.
¹ Numbers in brackets refer to the bibliography at the end of the book.
Chapter 1. Setting the Stage
• Sets: If $S$ and $T$ are two sets, $S \cup T$ and $S \cap T$ denote their union and intersection, respectively, and $S \setminus T$ denotes the set of all elements of $S$ that are not in $T$. The expressions "$S \subset T$" and "$T \supset S$" both mean that $S$ is a subset of $T$, including the possibility that $S = T$, and "$x \in S$" and "$x \notin S$" mean, respectively, that $x$ is or is not an element of $S$. The set of all objects $x$ satisfying a property $P(x)$ is denoted by $\{x : P(x)\}$, and the empty set is denoted by $\varnothing$.
The union and intersection of a family $S_1, S_2, \ldots, S_k$ of sets are denoted by $\bigcup_1^k S_n$ and $\bigcap_1^k S_n$. The conventions for using the symbols $\bigcup$ and $\bigcap$ are the same as those for the summation sign $\sum$ described above.
Intervals of the form (a, b) are called open; intervals of the form [a, b] are
called closed; and intervals of the forms (a, b] and [a, b) are called half-open.
(Of course, the symbol (a, b) is also used to denote the ordered pair whose
first and second members are a and b, respectively; remarkably enough, this
rarely causes any confusion.)
If $\{x_1, \ldots, x_k\}$ is a finite set of real numbers, its largest and smallest elements are denoted by $\max(x_1, \ldots, x_k)$ and $\min(x_1, \ldots, x_k)$, respectively.
• Special functions: In this book, we denote the natural logarithm by log rather than ln, this being the common usage in advanced mathematics. Also, we denote the principal branches of the inverse trig functions by arcsin, arccos, and arctan; arcsin and arccos map $[-1, 1]$ onto $[-\frac12\pi, \frac12\pi]$ and $[0, \pi]$, respectively, and arctan maps $\mathbb{R}$ onto $(-\frac12\pi, \frac12\pi)$.
• Logical symbols: We shall sometimes use the symbols $\Longrightarrow$ and $\Longleftrightarrow$ to denote logical implication and equivalence, respectively. That is, if $A$ and $B$ are mathematical statements, "$A \Longrightarrow B$" is read "$A$ implies $B$" or "If $A$, then $B$," and "$A \Longleftrightarrow B$" is read "$A$ is equivalent to $B$" or "$A$ if and only if $B$." We point out that "$A \Longrightarrow B$" and "not $B \Longrightarrow$ not $A$" are logically equivalent; that is, in order to prove that hypothesis $A$ implies conclusion $B$, one may assume that $B$ is false and show that $A$ is false.
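$$\mathbf{x} = (x_1, x_2, \ldots, x_n).$$
$$|\mathbf{x}| = \sqrt{x_1^2 + \cdots + x_n^2}.$$
$$|\mathbf{a}\cdot\mathbf{b}| \le |\mathbf{a}|\,|\mathbf{b}|.$$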
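This is a quadratic function of $t$. Its minimum value occurs at $t = (\mathbf{a}\cdot\mathbf{b})/|\mathbf{b}|^2$, and that minimum value is
$$\left|\mathbf{a} - \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}\right|^2 = |\mathbf{a}|^2 - \frac{(\mathbf{a}\cdot\mathbf{b})^2}{|\mathbf{b}|^2}.$$
Since this quantity is nonnegative, multiplying through by $|\mathbf{b}|^2$ gives the desired result: $|\mathbf{a}|^2|\mathbf{b}|^2 \ge (\mathbf{a}\cdot\mathbf{b})^2$. □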
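As a numerical sanity check on this argument (not a substitute for the proof), the Python sketch below evaluates the quadratic $|\mathbf{a} - t\mathbf{b}|^2$ at $t^* = (\mathbf{a}\cdot\mathbf{b})/|\mathbf{b}|^2$ for random vectors, confirms that it agrees with the closed form $|\mathbf{a}|^2 - (\mathbf{a}\cdot\mathbf{b})^2/|\mathbf{b}|^2$, and checks Cauchy's inequality up to rounding error. The helper names `dot`, `q`, and `cauchy_data` are illustrative choices, not notation from the text.

```python
import random

def dot(a, b):
    # Euclidean dot product a . b of two vectors given as lists
    return sum(x * y for x, y in zip(a, b))

def q(a, b, t):
    # the quadratic |a - t b|^2 = |a|^2 - 2t(a . b) + t^2 |b|^2, which is never negative
    return sum((x - t * y) ** 2 for x, y in zip(a, b))

def cauchy_data(a, b):
    # minimizer t* = (a . b)/|b|^2, the quadratic evaluated there, and the closed
    # form |a|^2 - (a . b)^2/|b|^2 of the minimum value (requires b != 0)
    t_star = dot(a, b) / dot(b, b)
    closed_form = dot(a, a) - dot(a, b) ** 2 / dot(b, b)
    return t_star, q(a, b, t_star), closed_form

rng = random.Random(0)
for _ in range(1000):
    a = [rng.uniform(-5, 5) for _ in range(4)]
    b = [rng.uniform(-5, 5) for _ in range(4)]
    t_star, direct, closed_form = cauchy_data(a, b)
    scale = 1 + dot(a, a)
    assert abs(direct - closed_form) <= 1e-9 * scale              # same minimum value
    assert q(a, b, t_star + 0.01) >= direct - 1e-12 * scale       # nudging t does not help
    assert dot(a, b) ** 2 <= dot(a, a) * dot(b, b) * (1 + 1e-12)  # Cauchy's inequality
```

When $\mathbf{a}$ is a multiple of $\mathbf{b}$, the minimum value is 0, which is exactly the equality case of Cauchy's inequality.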
$$\left|\int_a^b f(x)g(x)\,dx\right| \le \left[\int_a^b |f(x)|^2\,dx\right]^{1/2}\left[\int_a^b |g(x)|^2\,dx\right]^{1/2},$$
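and similarly for points in the plane. We shall take this as a definition of distance in $n$-space for any $n$:
$$\text{distance between } \mathbf{x} \text{ and } \mathbf{y} = |\mathbf{x} - \mathbf{y}| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}.$$
$$|\mathbf{x} - \mathbf{z}| \le |\mathbf{x} - \mathbf{y}| + |\mathbf{y} - \mathbf{z}|$$
for any $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$. That is, the distance from $\mathbf{x}$ to $\mathbf{z}$ is at most the sum of the distances from $\mathbf{x}$ to $\mathbf{y}$ and from $\mathbf{y}$ to $\mathbf{z}$, for any intermediate point $\mathbf{y}$. Hence the name "triangle inequality": One side of a triangle is at most the sum of the other two sides.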
If we think of two vectors $\mathbf{x}$ and $\mathbf{y}$ as arrows emanating from the same point, we can speak of the angle $\theta$ between them. The familiar formula for $\theta$ in dimensions 2 and 3 remains valid in higher dimensions:
$$\theta = \arccos\left(\frac{\mathbf{x}\cdot\mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|}\right).$$
Cauchy's inequality says that the quotient in parentheses always lies in the interval $[-1, 1]$, so it is indeed the cosine of some number $\theta \in [0, \pi]$.
In particular, the directions of two vectors $\mathbf{x}$ and $\mathbf{y}$ are perpendicular to each other if and only if $\mathbf{x}\cdot\mathbf{y} = 0$. In this case the vectors are said to be orthogonal to each other.
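A minimal sketch of the angle formula, assuming nonzero vectors given as Python lists. The clamp to $[-1, 1]$ only guards against floating-point rounding, since Cauchy's inequality already guarantees that the exact quotient lies in that interval. The function names are hypothetical.

```python
import math

def dot(x, y):
    # x . y = x_1 y_1 + ... + x_n y_n
    return sum(a * b for a, b in zip(x, y))

def angle(x, y):
    # theta = arccos(x . y / (|x| |y|)) in [0, pi], for nonzero x and y
    c = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    c = max(-1.0, min(1.0, c))  # absorb rounding so acos never sees |c| > 1
    return math.acos(c)

def orthogonal(x, y, tol=1e-12):
    # x and y are orthogonal iff x . y = 0 (checked up to a tolerance)
    return abs(dot(x, y)) <= tol

print(angle([1, 0, 0, 0], [0, 1, 0, 0]))   # a right angle in 4-space
print(orthogonal([1, 2, 3], [3, 0, -1]))
```

Because arccos is very sensitive near $\pm1$, parallel vectors such as $(1, 1)$ and $(2, 2)$ give an angle that is tiny but not exactly 0 in floating point.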
In many situations we need to control the magnitude, i.e., the norm, of a vector $\mathbf{x} = (x_1, \ldots, x_n)$, but it is often more convenient to work with the magnitudes of the components $x_j$ of $\mathbf{x}$. In such cases the following inequalities are useful. Let $M$ be the largest of the numbers $|x_1|, \ldots, |x_n|$. Then $M^2 \le x_1^2 + \cdots + x_n^2$ (because $M^2$ is one of the numbers on the right), and $x_1^2 + \cdots + x_n^2 \le nM^2$ (because each number on the left is at most $M^2$). In other words,
$$(1.3)\qquad \max(|x_1|, \ldots, |x_n|) \le |\mathbf{x}| \le \sqrt{n}\,\max(|x_1|, \ldots, |x_n|).$$
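The two-sided estimate (1.3) is easy to test numerically. The sketch below is a hypothetical check, not part of the text: it compares $M = \max_j |x_j|$ with $|\mathbf{x}|$ for random vectors and for the extreme cases in which each inequality becomes an equality.

```python
import math
import random

def max_abs(x):
    # M = max(|x_1|, ..., |x_n|)
    return max(abs(t) for t in x)

def norm(x):
    # |x| = sqrt(x_1^2 + ... + x_n^2)
    return math.sqrt(sum(t * t for t in x))

def satisfies_13(x, rel=1e-12):
    # (1.3): M <= |x| <= sqrt(n) * M, with a tiny relative slack for rounding
    n, M, r = len(x), max_abs(x), norm(x)
    return M <= r * (1 + rel) and r <= math.sqrt(n) * M * (1 + rel)

rng = random.Random(0)
for _ in range(1000):
    x = [rng.uniform(-10, 10) for _ in range(rng.randint(1, 8))]
    assert satisfies_13(x)

# equality cases: a single nonzero component, and all components of equal size
assert max_abs([3.0, 0.0, 0.0]) == norm([3.0, 0.0, 0.0])
assert math.isclose(norm([1.0] * 4), math.sqrt(4) * max_abs([1.0] * 4))
```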
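$$\mathbf{a}\times\mathbf{b} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} = (a_2b_3 - a_3b_2)\mathbf{i} + (a_3b_1 - a_1b_3)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k}.$$
$$\mathbf{a}\times\mathbf{b} = -\mathbf{b}\times\mathbf{a}.$$
$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) + \mathbf{b}\times(\mathbf{c}\times\mathbf{a}) + \mathbf{c}\times(\mathbf{a}\times\mathbf{b}) = \mathbf{0}.$$
$$|\mathbf{a}\times\mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a}\cdot\mathbf{b})^2.$$
($|\mathbf{a}\times\mathbf{b}|^2$ is the sum of the squares of the components of $\mathbf{a}\times\mathbf{b}$. Multiply it out and rearrange the terms to get $|\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a}\cdot\mathbf{b})^2$.) If $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$ ($0 \le \theta \le \pi$), we know that $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$, so
$$|\mathbf{a}\times\mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2(1 - \cos^2\theta) = |\mathbf{a}|^2|\mathbf{b}|^2\sin^2\theta, \quad\text{that is,}\quad |\mathbf{a}\times\mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}|\sin\theta$$
(since $\sin\theta \ge 0$ for $0 \le \theta \le \pi$).
$$\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) = 0;$$
that is, $\mathbf{a}\times\mathbf{b}$ is orthogonal to both $\mathbf{a}$ and $\mathbf{b}$.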
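The component formula and the identities for the cross product can be checked exactly with integer vectors, where no rounding occurs. This is an illustrative sketch; `cross`, `dot`, `add`, and `neg` are hypothetical helper names. The identity with three cross products is commonly called the Jacobi identity.

```python
import random

def cross(a, b):
    # a x b = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(*vectors):
    # componentwise sum of any number of 3-vectors
    return tuple(sum(parts) for parts in zip(*vectors))

def neg(a):
    return tuple(-x for x in a)

rng = random.Random(0)

def rand_vec():
    return tuple(rng.randint(-9, 9) for _ in range(3))

for _ in range(1000):
    a, b, c = rand_vec(), rand_vec(), rand_vec()
    assert cross(a, b) == neg(cross(b, a))                    # a x b = -b x a
    assert add(cross(a, cross(b, c)), cross(b, cross(c, a)),
               cross(c, cross(a, b))) == (0, 0, 0)             # Jacobi identity
    axb = cross(a, b)
    assert dot(axb, axb) == dot(a, a) * dot(b, b) - dot(a, b) ** 2
    assert dot(a, axb) == 0 and dot(b, axb) == 0              # orthogonality
```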
EXERCISES
• The closure of $S$ is the union of $S$ and all its boundary points. It is denoted by $\overline{S}$:
$$\overline{S} = S \cup \partial S.$$
Let us examine these ideas a little more closely. First, notice that the boundary points of $S$ are the same as the boundary points of $S^c$; the definition of boundary point remains unchanged if $S$ and $S^c$ are switched. Moreover, if $\mathbf{x}$ is neither an interior point of $S$ nor an interior point of $S^c$, then $\mathbf{x}$ must be a boundary point of $S$. In other words, given $S \subset \mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$, there are exactly three possibilities: $\mathbf{x}$ is an interior point of $S$, or $\mathbf{x}$ is an interior point of $S^c$, or $\mathbf{x}$ is a boundary point of $S$.
Example 1. Let $S$ be $B(\rho, \mathbf{0})$, the ball of radius $\rho$ about the origin. First, given $\mathbf{x} \in S$, let $r = \rho - |\mathbf{x}|$. If $|\mathbf{y} - \mathbf{x}| < r$, then by the triangle inequality we have $|\mathbf{y}| \le |\mathbf{y} - \mathbf{x}| + |\mathbf{x}| < \rho$, so that $B(r, \mathbf{x}) \subset S$. Therefore, every $\mathbf{x} \in S$ is an interior point of $S$, so $S$ is open. Second, a similar calculation shows that if $|\mathbf{x}| > \rho$ then $B(r, \mathbf{x}) \subset S^c$ where $r = |\mathbf{x}| - \rho$, so every point with $|\mathbf{x}| > \rho$ is an interior point of $S^c$. On the other hand, if $|\mathbf{x}| = \rho$, then $c\mathbf{x} \in S$ for $0 < c < 1$ and $c\mathbf{x} \in S^c$ for $c > 1$, and $|c\mathbf{x} - \mathbf{x}| = |c - 1|\rho$ can be as small as we please, so $\mathbf{x}$ is a boundary point. In other words, the boundary of $S$ is the sphere of radius $\rho$ about the origin, and the closure of $S$ is the closed ball $\{\mathbf{x} : |\mathbf{x}| \le \rho\}$.
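The three-way classification in Example 1 depends only on comparing $|\mathbf{x}|$ with $\rho$, and the step showing that $S$ is open can be spot-checked by sampling points $\mathbf{y}$ with $|\mathbf{y} - \mathbf{x}| < r = \rho - |\mathbf{x}|$. This Python sketch is illustrative only (random sampling proves nothing), and the function names are hypothetical.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def classify(x, rho):
    # Example 1: where x sits relative to S = B(rho, 0)
    r = norm(x)
    if r < rho:
        return "interior point of S"
    if r > rho:
        return "interior point of S^c"
    return "boundary point of S"

def sample_within(x, r, rng):
    # a random y with |y - x| < r: random direction, distance strictly less than r
    d = [rng.gauss(0.0, 1.0) for _ in x]
    nd = norm(d)
    s = 0.999 * r * rng.random()
    return [xi + s * di / nd for xi, di in zip(x, d)]

rng = random.Random(0)
rho = 2.0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # |x| <= sqrt(3) < 2, so x is in S
    y = sample_within(x, rho - norm(x), rng)
    assert norm(y) < rho    # B(r, x) lies inside S, as the triangle inequality predicts
```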
Example 2. Now let $S$ be the ball of radius $\rho$ about the origin together with the "upper hemisphere" of its boundary:
The calculations in Example 1 show that $S^{\rm int}$ is the open ball $B(\rho, \mathbf{0})$; $\partial S$ is the sphere $\{\mathbf{x} : |\mathbf{x}| = \rho\}$, and $\overline{S}$ is the closed ball $\{\mathbf{x} : |\mathbf{x}| \le \rho\}$. The set $S$ is neither open nor closed.
Example 3. In the real line (i.e., $n = 1$), let $S$ be the set of all rational numbers. Since every ball in $\mathbb{R}$ (that is, every interval) contains both rational and irrational numbers, every point of $\mathbb{R}$ is a boundary point of $S$. The set $S$ is neither open nor closed; its interior is empty; and its closure is $\mathbb{R}$.
$$(1.5)\qquad S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \mathbin{\square} 0\},$$
where $\square$ denotes one of the relations $=, <, >, \le, \ge$. (Taking the quantity on the right of $\square$ to be 0 is no restriction; just move all the terms over to the left side.) We anticipate some results from §1.3 in giving the following rule of thumb: Sets defined by strict inequalities are open; sets defined by equalities or weak inequalities are closed. More precisely, if $S$ is given by (1.5) where the function $f$ is continuous, then $S$ is open if $\square$ denotes $<$ or $>$, and $S$ is closed if $\square$ denotes $=$, $\le$, or $\ge$. The reader may feel free to use this rule in doing the exercises.
EXERCISES
1. For each of the following sets $S$ in the plane $\mathbb{R}^2$, do the following: (i) Draw a sketch of $S$. (ii) Tell whether $S$ is open, closed, or neither. (iii) Describe $S^{\rm int}$, $\overline{S}$, and $\partial S$. (These descriptions should be in the same set-theoretic language as the description of $S$ itself given here.)
a. $S = \{(x, y) : 0 < x^2 + y^2 < 4\}$.
b. $S = \{(x, y) : x^2 - x < y < 0\}$.
c. $S = \{(x, y) : x > 0,\ y > 0, \text{ and } x + y > 1\}$.
d. $S = \{(x, y) : y = x^3\}$.
e. $S = \{(x, y) : x > 0 \text{ and } y = \sin(1/x)\}$.
f. $S = \{(x, y) : x^2 + y^2 < 1\} \setminus \{(x, 0) : x < 0\}$.
g. $S = \{(x, y) : x \text{ and } y \text{ are rational numbers in } [0, 1]\}$.
2. Show that for any $S \subset \mathbb{R}^n$, $S^{\rm int}$ is open and $\partial S$ and $\overline{S}$ are both closed. (Hint: Use the fact that balls are open, proved in Example 1.)
5. Show that the boundary of $S$ is the intersection of the closures of $S$ and $S^c$.
6. Give an example of an infinite collection $S_1, S_2, \ldots$ of closed sets whose union $\bigcup_{j=1}^\infty S_j$ is not closed.
7. There are precisely two subsets of $\mathbb{R}^n$ that are both open and closed. What are they?
8. Give an example of a set S such that the interior of S is unequal to the interior
of the closure of S.
9. Show that the ball of radius r about a is contained in the ball of radius r + a
$$\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x}) = L,$$
$$(1.7)\qquad |f(\mathbf{x}) - L| < \epsilon \quad\text{whenever}\quad 0 < \max(|x_1 - a_1|, \ldots, |x_n - a_n|) < \delta'.$$
The equivalence of (1.6) and (1.7) follows from (1.3): If (1.6) is satisfied, then (1.7) is satisfied with $\delta' = \delta/\sqrt{n}$; and if (1.7) is satisfied, then (1.6) is satisfied with $\delta = \delta'$.
More generally, we can consider functions $f$ that are only defined on a subset $S$ of $\mathbb{R}^n$ and points $\mathbf{a}$ that lie in the closure of $S$. The definition of $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x})$ is the same as before except that $\mathbf{x}$ is restricted to lie in the set $S$. It may be necessary, for the sake of clarity, to specify this restriction explicitly; for this purpose we use the notation
$$\lim_{\mathbf{x}\to\mathbf{a},\ \mathbf{x}\in S} f(\mathbf{x}).$$
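In particular, for a function $f$ on the real line we often need to consider the one-sided limits
$$\lim_{x\to a+} f(x) = \lim_{x\to a,\ x>a} f(x), \qquad \lim_{x\to a-} f(x) = \lim_{x\to a,\ x<a} f(x).$$
For example, let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by $f(x) = x + 1$ for $|x| < 1$ and $f(x) = 0$ for $|x| \ge 1$. Then $\lim_{x\to 1} f(x)$ does not exist, but $\lim_{x\to 1-} f(x) = 2$ and $\lim_{x\to 1+} f(x) = 0$.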
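Tabulating values of $f$ on either side of 1 gives a quick, non-rigorous picture of the two one-sided limits in this example. The sketch assumes $f(x) = 0$ for $|x| \ge 1$, as in the definition above; sampled values are evidence, not a proof, and the helper name `one_sided_values` is hypothetical.

```python
def f(x):
    # f(x) = x + 1 for |x| < 1, and f(x) = 0 for |x| >= 1
    return x + 1 if abs(x) < 1 else 0.0

def one_sided_values(func, a, hs):
    # values at a - h (approaching from the left) and a + h (from the right)
    left = [func(a - h) for h in hs]
    right = [func(a + h) for h in hs]
    return left, right

hs = [10.0 ** -k for k in range(1, 9)]
left, right = one_sided_values(f, 1.0, hs)
for h, lv, rv in zip(hs, left, right):
    print(f"h = {h:.0e}:  f(1 - h) = {lv:.8f},  f(1 + h) = {rv}")
```

The left-hand values creep up toward 2 while the right-hand values are identically 0, matching $\lim_{x\to 1-} f(x) = 2$ and $\lim_{x\to 1+} f(x) = 0$.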
Notice that the definition of $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x})$ does not involve the value $f(\mathbf{a})$ at all; only the values of $f$ at points near $\mathbf{a}$ but unequal to $\mathbf{a}$ are relevant. Indeed, $f$ need not even be defined at $\mathbf{a}$, a situation that arises, for example, in the limits that define derivatives. On the other hand, if $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x})$ and $f(\mathbf{a})$ both exist and are equal, that is, if
$$\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x}) = f(\mathbf{a}),$$
then $f$ is said to be continuous at $\mathbf{a}$.
Thus the study of limits and continuity of vector-valued functions is easily reduced to the scalar case, to which we now return our attention.
We often express the relation $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x}) = L$ informally by saying that $f(\mathbf{x})$ approaches $L$ as $\mathbf{x}$ approaches $\mathbf{a}$. In one dimension this works quite well; we can envision $x$ as the location of a particle that moves toward $a$ from the right or the left. But in higher dimensions there are infinitely many different paths along which a particle might move toward $\mathbf{a}$, and for the limit to exist one must get the same result no matter which path is chosen. It is safer to abandon the "dynamic" picture of a particle moving toward $\mathbf{a}$; we should simply think in terms of $f(\mathbf{x})$ being close to $L$ provided that $\mathbf{x}$ is close to $\mathbf{a}$, without reference to any motion.
The argument just given suggests the following line of thought. We wish to know if $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x})$ exists. We look at all the straight lines passing through $\mathbf{a}$ and evaluate the limit of $f(\mathbf{x})$ as $\mathbf{x}$ approaches $\mathbf{a}$ along each of those lines by one-variable techniques; if we always get the same answer $L$, then we should have $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x}) = L$, right? Unfortunately, this doesn't work:
$$g(x, cx) = \frac{cx^3}{x^4 + c^2x^2} = \frac{cx}{x^2 + c^2} \to 0 \quad\text{as } x \to 0,$$
so the limit as $(x, y) \to (0, 0)$ along any other straight line is also 0. But if we approach along a parabola $y = cx^2$, we get
$$g(x, cx^2) = \frac{cx^4}{x^4 + c^2x^4} = \frac{c}{1 + c^2},$$
which can be anything between $-\frac12$ and $\frac12$ as before, so the limit does not exist. (The similarity with Example 1 is not accidental: If $f$ is the function in Example 1, we have $g(x, y) = f(x^2, y)$.)
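Evaluating $g$ along the two families of paths makes the contrast concrete. This sketch takes $g(x, y) = x^2y/(x^4 + y^2)$, which is what the relation $g(x, y) = f(x^2, y)$ gives if $f(x, y) = xy/(x^2 + y^2)$ is the function of Example 1 (as Exercise 4 of this section indicates); the explicit formula for $g$ is inferred from that relation, and the helper names are hypothetical.

```python
def g(x, y):
    # g(x, y) = x^2 y / (x^4 + y^2), defined for (x, y) != (0, 0)
    return x * x * y / (x ** 4 + y * y)

def along_line(c, x):
    # g(x, cx) = cx / (x^2 + c^2), which tends to 0 as x -> 0
    return g(x, c * x)

def along_parabola(c, x):
    # g(x, cx^2) = c / (1 + c^2) for every x != 0
    return g(x, c * x * x)

for c in (0.5, 1.0, 2.0):
    line_vals = [along_line(c, 10.0 ** -k) for k in (2, 4, 6)]
    print(c, line_vals, along_parabola(c, 1e-3), c / (1 + c * c))
```

Along every line the values shrink toward 0, while along each parabola they stay fixed at $c/(1 + c^2)$, so no single limit $L$ can work.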
After looking at examples like this one, one might become discouraged about
the possibility of ever proving that limits do exist! But things are not so bad. If $f$ is a
continuous function, $\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x})$ is simply $f(\mathbf{a})$. Moreover, most of the functions
of several variables that one can easily write down are built up from continuous
functions of one variable by using the arithmetic operations plus composition, and
these operations all preserve continuity (except for division when the denominator
vanishes).
Here are the precise statements and proofs of the fundamental results. (The
reader may wish to skip the proofs; they are of some value as illustrations of the sort
of formal arguments involving limits that are important in more advanced analysis,
but they contribute little to an intuitive understanding of the results.)
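Proof. Let $\epsilon > 0$ and $\mathbf{a} \in U$ be given, and let $\mathbf{b} = \mathbf{f}(\mathbf{a})$. Since $g$ is continuous on $\mathbf{f}(U)$, we can choose $\eta > 0$ so that $|g(\mathbf{y}) - g(\mathbf{b})| < \epsilon$ whenever $|\mathbf{y} - \mathbf{b}| < \eta$. Having chosen this $\eta$, since $\mathbf{f}$ is continuous on $U$ we can find $\delta > 0$ so that $|\mathbf{f}(\mathbf{x}) - \mathbf{b}| < \eta$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$. Thus,
$$|g(\mathbf{f}(\mathbf{x})) - g(\mathbf{f}(\mathbf{a}))| < \epsilon \quad\text{whenever}\quad |\mathbf{x} - \mathbf{a}| < \delta,$$
so $g \circ \mathbf{f}$ is continuous at $\mathbf{a}$. □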
1.10 Theorem. Let $f_1(x, y) = x + y$, $f_2(x, y) = xy$, and $g(x) = 1/x$. Then $f_1$ and $f_2$ are continuous on $\mathbb{R}^2$ and $g$ is continuous on $\mathbb{R} \setminus \{0\}$.
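Proof. To prove continuity of $f_1$ and $f_2$, we need to show that $\lim_{(x,y)\to(a,b)} (x + y) = a + b$ and $\lim_{(x,y)\to(a,b)} xy = ab$ for every $a, b \in \mathbb{R}$. That is, given $\epsilon > 0$ and $a, b \in \mathbb{R}$, we need to find $\delta > 0$ so that if $|x - a| < \delta$ and $|y - b| < \delta$, then (i) $|(x + y) - (a + b)| < \epsilon$ or (ii) $|xy - ab| < \epsilon$, respectively. For (i) we can simply take $\delta = \frac12\epsilon$, for if $|x - a| < \frac12\epsilon$ and $|y - b| < \frac12\epsilon$, then
$$|(x + y) - (a + b)| \le |x - a| + |y - b| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$
For (ii) we can take
$$\delta = \min\left(1, \frac{\epsilon}{2(|a| + 1)}, \frac{\epsilon}{2(|b| + 1)}\right),$$
for if $|x - a| < \delta$ and $|y - b| < \delta$, then $|x| < |a| + 1$, and hence
$$|xy - ab| = |x(y - b) + b(x - a)| \le |x|\,|y - b| + |b|\,|x - a| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$
As for $g$, for $a \ne 0$ we have
$$\left|\frac{1}{x} - \frac{1}{a}\right| = \frac{|x - a|}{|ax|}.$$
Given $\epsilon > 0$, let $\delta$ be the smaller of the numbers $\frac12|a|$ and $\frac12\epsilon a^2$. If $|x - a| < \delta$, then $|a| \le |a - x| + |x| < \frac12|a| + |x|$ and hence $|x| > \frac12|a|$, so
$$\left|\frac{1}{x} - \frac{1}{a}\right| = \frac{|x - a|}{|ax|} < \frac{\frac12\epsilon a^2}{|a|\cdot\frac12|a|} = \epsilon,$$
as desired. □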
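The choices of $\delta$ in this proof can be spot-checked numerically: pick $a$, $b$, $\epsilon$ at random, form $\delta$ exactly as above, and test points within $\delta$. The sketch below is only a sanity check of the algebra, never a replacement for it, and its function names are hypothetical. Points are drawn strictly inside the $\delta$-interval, and $|a|$ is kept at least 0.1 in the reciprocal test so that floating-point error stays far below $\epsilon$.

```python
import random

def delta_sum(eps):
    # (i): delta = eps/2 works for x + y at every (a, b)
    return eps / 2

def delta_product(a, b, eps):
    # (ii): delta = min(1, eps/(2(|a|+1)), eps/(2(|b|+1))) for xy at (a, b)
    return min(1.0, eps / (2 * (abs(a) + 1)), eps / (2 * (abs(b) + 1)))

def delta_reciprocal(a, eps):
    # delta = min(|a|/2, eps a^2 / 2) for 1/x at a != 0
    return min(abs(a) / 2, eps * a * a / 2)

def spot_check(trials=20000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(-100, 100), rng.uniform(-100, 100)
        eps = 10 ** rng.uniform(-6, 1)
        d = delta_sum(eps)
        x, y = a + 0.999 * rng.uniform(-d, d), b + 0.999 * rng.uniform(-d, d)
        if abs((x + y) - (a + b)) >= eps:
            return False
        d = delta_product(a, b, eps)
        x, y = a + 0.999 * rng.uniform(-d, d), b + 0.999 * rng.uniform(-d, d)
        if abs(x * y - a * b) >= eps:
            return False
        a0 = rng.choice((-1, 1)) * rng.uniform(0.1, 100)   # keep |a| >= 0.1
        d = delta_reciprocal(a0, eps)
        x = a0 + 0.999 * rng.uniform(-d, d)
        if abs(1 / x - 1 / a0) >= eps:
            return False
    return True

print(spot_check())
```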
1.11 Corollary. The function $f_3(x, y) = x - y$ is continuous on $\mathbb{R}^2$, and the function $f_4(x, y) = x/y$ is continuous on $\{(x, y) : y \ne 0\}$.
1.3. Limits and Continuity 17
Proof. Combine Theorem 1.10 and Corollary 1.11 with Theorem 1.9. For example, if $f$ and $g$ are continuous functions on $U \subset \mathbb{R}^n$, then $f + g$ is continuous because it is the composition of the continuous map $(f, g)$ from $U$ to $\mathbb{R}^2$ and the continuous map $(x, y) \mapsto x + y$ from $\mathbb{R}^2$ to $\mathbb{R}$. Likewise for the other arithmetic operations. □
In fact, the limit is 0, and this can be established with a little ad hoc estimating. Clearly $|x^2 - y^2| \le x^2 + y^2$, so $|h(x, y)| \le |xy|$. But $xy \to 0$ as $(x, y) \to (0, 0)$, so $h(x, y)$, being no larger in absolute value than $xy$, must also approach 0. Thus $\lim_{(x,y)\to(0,0)} h(x, y) = 0$, and $h$ is continuous at $(0, 0)$.
1.13 Theorem. Suppose $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^k$ is continuous and $U$ is a subset of $\mathbb{R}^k$, and let $S = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{f}(\mathbf{x}) \in U\}$. Then $S$ is open if $U$ is open, and $S$ is closed if $U$ is closed.
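Proof. Suppose $U$ is open. We shall show that $S$ is open by showing that every point $\mathbf{a}$ in $S$ is an interior point of $S$. If $\mathbf{a} \in S$, then $\mathbf{f}(\mathbf{a}) \in U$. Since $U$ is open, some ball centered at $\mathbf{f}(\mathbf{a})$ is contained in $U$; that is, there is a positive number $\epsilon$ such that every $\mathbf{y} \in \mathbb{R}^k$ such that $|\mathbf{y} - \mathbf{f}(\mathbf{a})| < \epsilon$ is in $U$. Since $\mathbf{f}$ is continuous, there is a positive number $\delta$ such that $|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a})| < \epsilon$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$. But this means that $\mathbf{f}(\mathbf{x}) \in U$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$, that is, $\mathbf{x} \in S$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$. Thus $\mathbf{a}$ is an interior point of $S$.
On the other hand, suppose $U$ is closed. Then the complement of $U$ in $\mathbb{R}^k$ is open by Proposition 1.4b, so the set $S' = \{\mathbf{x} : \mathbf{f}(\mathbf{x}) \in U^c\}$ is open by the argument just given. But $S'$ is just the complement of $S$ in $\mathbb{R}^n$, so $S$ is closed by Proposition 1.4b again. □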
EXERCISES
1. For the following functions $f$, show that $\lim_{(x,y)\to(0,0)} f(x, y)$ does not exist.
a- f{x,y) = —.
4 4
3. Let $f(x, y) = x^{-1}\sin(xy)$ for $x \ne 0$. How should you define $f(0, y)$ for $y \in \mathbb{R}$ so as to make $f$ a continuous function on all of $\mathbb{R}^2$?
4. Let $f(x, y) = xy/(x^2 + y^2)$ as in Example 1. Show that, although $f$ is discontinuous at $(0, 0)$, $f(x, a)$ and $f(a, y)$ are continuous functions of $x$ and $y$, respectively, for any $a \in \mathbb{R}$ (including $a = 0$). We say that $f$ is separately continuous in $x$ and $y$.
5. Let $f(x, y) = y(y - x^2)/x^4$ if $0 < y < x^2$, $f(x, y) = 0$ otherwise. At which point(s) is $f$ discontinuous?
6. Let $f(x) = x$ if $x$ is rational, $f(x) = 0$ if $x$ is irrational. Show that $f$ is continuous at $x = 0$ and nowhere else.
7. Let $f(x) = 1/q$ if $x = p/q$ where $p$ and $q$ are integers with no common factors and $q > 0$, and $f(x) = 0$ if $x$ is irrational. At which points, if any, is $f$ continuous?
8. Suppose $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^k$ has the following property: For any open set $U \subset \mathbb{R}^k$, $\{\mathbf{x} : \mathbf{f}(\mathbf{x}) \in U\}$ is an open set in $\mathbb{R}^n$. Show that $\mathbf{f}$ is continuous on $\mathbb{R}^n$. Show also that the same result holds if "open" is replaced by "closed."
9. Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and let $\mathbf{f}$ be a one-to-one mapping from $U$ onto $V$ (so that there is an inverse mapping $\mathbf{f}^{-1} : V \to U$). Suppose that $\mathbf{f}$ and $\mathbf{f}^{-1}$ are both continuous. Show that for any set $S$ whose closure is contained in $U$ we have $\mathbf{f}(\partial S) = \partial(\mathbf{f}(S))$.
1.4 Sequences
numbers, $n$-dimensional vectors, sets, etc. If the $k$th object in the sequence is $X_k$, the sequence as a whole is usually denoted by $\{X_k\}_{k=1}^\infty$, or just by $\{X_k\}_1^\infty$ or even $\{X_k\}$ if there is no possibility of confusion. (We shall comment further on this notation below.) Alternatively, we can write out the sequence as $X_1, X_2, X_3, \ldots$. We speak of a sequence in a set $S$ if the objects of the sequence all belong to $S$.
Example 1.
a. A sequence of numbers: $1, 4, 9, 16, \ldots$. The $k$th term in the sequence is $k^2$, and the sequence as a whole may be written as $\{k^2\}_1^\infty$.
b. A sequence of intervals: $(-1, 1), (-\frac12, \frac12), (-\frac13, \frac13), (-\frac14, \frac14), \ldots$. The $k$th term in the sequence is the interval $(-\frac1k, \frac1k)$, and the sequence as a whole may be written as $\{(-\frac1k, \frac1k)\}_1^\infty$.
$$1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots,$$
in which the first two terms are equal to 1 and each of the remaining terms is the sum of the two preceding ones (that is, $x_k = x_{k-2} + x_{k-1}$).
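The recursion $x_k = x_{k-2} + x_{k-1}$ translates directly into code. The sketch below is illustrative (the names are hypothetical): it generates the sequence and tabulates the ratios $r_k = x_{k+1}/x_k$ of consecutive terms, whose limit is the golden ratio $\varphi = \frac12(1 + \sqrt5)$, as worked out in Exercise 6 of §1.5. The ratios alternate below and above $\varphi$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio, the positive root of x^2 = x + 1

def fibonacci(count):
    # first `count` terms: x_1 = x_2 = 1 and x_k = x_{k-2} + x_{k-1}
    x = [1, 1]
    while len(x) < count:
        x.append(x[-2] + x[-1])
    return x[:count]

def ratios(x):
    # r_k = x_{k+1} / x_k for consecutive terms of the list x
    return [x[k + 1] / x[k] for k in range(len(x) - 1)]

x = fibonacci(31)
r = ratios(x)
print(x[:11])
for k, rk in enumerate(r, start=1):
    # odd-numbered ratios lie below phi, even-numbered ones above it
    assert (rk < PHI) if k % 2 == 1 else (rk > PHI)
print(abs(r[-1] - PHI))
```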
$$-1, 1, -1, 1, -1, 1, \ldots,$$
but its set of values is just the two-element set $\{-1, 1\}$. Since curly brackets are commonly used to specify sets (as we just did with $\{-1, 1\}$), the notation $\{X_k\}_1^\infty$ for a sequence invites confusion with the set whose elements are the $X_k$'s, and for this reason some authors use other notations such as $(X_k)_1^\infty$. However, the notation $\{X_k\}_1^\infty$ is by far the most common one, and in practice it rarely causes problems, so we shall stick with it.
For the remainder of this section we shall be concerned with sequences of numbers or $n$-dimensional vectors. We reserve the letter $n$ for the dimension and use letters such as $k$ and $j$ for the index on a sequence. Thus, for example, if $\{\mathbf{x}_k\}$ is a sequence in $\mathbb{R}^n$, the components of the vector $\mathbf{x}_k$ are $(x_{k1}, \ldots, x_{kn})$.
A sequence $\{\mathbf{x}_k\}$ in $\mathbb{R}^n$ is said to converge to the limit $\mathbf{L}$ if for every $\epsilon > 0$ there is an integer $K$ such that $|\mathbf{x}_k - \mathbf{L}| < \epsilon$ whenever $k > K$; if $\{\mathbf{x}_k\}$ does not converge to any limit, it is said to diverge. If $\{\mathbf{x}_k\}$ converges to $\mathbf{L}$, we write $\mathbf{x}_k \to \mathbf{L}$ or $\mathbf{L} = \lim_{k\to\infty} \mathbf{x}_k$.
We say that $\lim_{k\to\infty} x_k = \infty$ (or $+\infty$) if for every $C > 0$ there is an integer $K$ such that $x_k > C$ whenever $k > K$, and $\lim_{k\to\infty} x_k = -\infty$ if for every $C > 0$ there is an integer $K$ such that $x_k < -C$ whenever $k > K$. (However, a sequence whose limit is $\pm\infty$ is still called divergent.)
It follows easily from the estimates (1.3) that $\mathbf{x}_k \to \mathbf{L}$ if and only if each component of $\mathbf{x}_k$ converges to the corresponding component of $\mathbf{L}$, that is, $x_{km} \to L_m$ for $1 \le m \le n$. The study of convergence of sequences of vectors is thus reducible to the study of convergence of numerical sequences.
Example 4.
a. The sequence $\{1/k\}$ converges to 0, since $|(1/k) - 0| < \epsilon$ whenever $k > 1/\epsilon$.
b. The sequence $\{k^2\}$ diverges; more precisely, $\lim_{k\to\infty} k^2 = \infty$.
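$$\frac{c^k}{k!} = \frac{c^K}{K!}\cdot\frac{c}{K+1}\cdot\frac{c}{K+2}\cdots\frac{c}{k} \le \frac{c^K}{K!}\cdot\frac12\cdot\frac12\cdots\frac12 = \frac{c^K}{K!}\left(\frac12\right)^{k-K}$$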
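The estimate above bounds the tail of $c^k/k!$ by a geometric sequence. The sketch below assumes, as the displayed factors require, that $c > 0$ and that $K$ is an integer with $K \ge 2c$, so that each factor $c/j$ with $j > K$ is at most $\frac12$; the names `term` and `geometric_bound` are hypothetical.

```python
import math

def term(c, k):
    # c^k / k!, built as the product (c/1)(c/2)...(c/k) to avoid huge intermediates
    t = 1.0
    for j in range(1, k + 1):
        t *= c / j
    return t

def geometric_bound(c, K, k):
    # (c^K / K!) * (1/2)^(k - K), valid for k >= K when K >= 2c
    return term(c, K) * 0.5 ** (k - K)

c = 5.0
K = math.ceil(2 * c)          # smallest integer K with K >= 2c
for k in range(K, 80):
    assert term(c, k) <= geometric_bound(c, K, k) * (1 + 1e-12)
print(term(c, 20), term(c, 60), term(c, 100))   # the terms rush toward 0
```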
sequence $\{\mathbf{x}_k\}$ then converges to $\mathbf{a}$, but the sequence $\{\mathbf{f}(\mathbf{x}_k)\}$ does not converge to $\mathbf{f}(\mathbf{a})$.
We have shown that if (a) is true then (b) is true, and that if (a) is false then (b) is false, so the proof is complete. □
EXERCISES
1. For each of the following sequences $\{x_k\}$, find the limit or show that the sequence diverges.
7. Show that the closure of S is the union of S and the set of all its accumulation
points.
1.5 Completeness
The essential properties of the real number system that underlie all the theorems of
calculus are summarized by saying that R is a complete ordered field. We explain
the meaning of these terms one by one:
A field is a set on which the operations of addition, subtraction, multiplication,
and division (by any nonzero number) are defined, subject to all the usual laws of
arithmetic: commutativity, associativity, etc. Besides the real numbers, examples of
fields include the rational numbers and the complex numbers, and there are many
others. (For more precise definitions and more examples, consult a textbook on
abstract algebra such as Birkhoff and Mac Lane [4] or Hungerford [8].)
An ordered field is a field equipped with a binary relation $<$ that is transitive (if $a < b$ and $b < c$, then $a < c$) and antisymmetric (if $a \ne b$, then either $a < b$ or $b < a$, but not both), and interacts with the arithmetic operations in the usual way (if $a < b$ then $a + c < b + c$ for any $c$, and also $ac < bc$ if $c > 0$). The real number and rational number systems are ordered fields (with the usual meaning of "$<$"), but the complex number system is not.
Finally, completeness is what distinguishes the real numbers from the smaller
ordered fields such as the rational numbers and makes possible the transition from
algebra to calculus; it means that there are "no holes" in the real number line. There
are several equivalent ways of stating the completeness property precisely. The one
we shall use as a starting point is the existence of least upper bounds.
If $S$ is a subset of $\mathbb{R}$, an upper bound for $S$ is a number $u$ such that $x \le u$ for all $x \in S$, and a lower bound for $S$ is a number $l$ such that $x \ge l$ for all $x \in S$.
If $S$ has no upper bound, we shall define $\sup S$ to be $+\infty$, and if $S$ has no lower bound, we shall define $\inf S$ to be $-\infty$.
Example 1.
a. If $S$ is the interval $(0, 1]$, then $\sup S = 1$ and $\inf S = 0$.
b. If $S = \{1, \frac12, \frac13, \frac14, \ldots\}$, then $\sup S = 1$ and $\inf S = 0$.
c. If $S = \{1, 2, 3, 4, \ldots\}$, then $\sup S = \infty$ and $\inf S = 1$.
d. If $S$ is the single point $a$, then $\sup S = \inf S = a$.
e. If $S = \{x : x \text{ is rational and } x^2 < 2\}$, then $\sup S = \sqrt2$ and $\inf S = -\sqrt2$. This is an example of a set of rational numbers that has no supremum or infimum within the set of rational numbers.
1.5. Completeness 25
If $S$ has an upper bound, the number $a = \sup S$ is the unique number such that
i. $x \le a$ for every $x \in S$, and
ii. for every $\epsilon > 0$ there exists $x \in S$ with $x > a - \epsilon$.
(i) expresses the fact that $a$ is an upper bound, whereas (ii) expresses the fact that there is no smaller upper bound. In particular, while $\sup S$ may or may not belong to $S$ itself, it always belongs to the closure of $S$. Similarly for $\inf S$ if $S$ is bounded below.
The completeness of the real number system plays a crucial role in establishing the convergence of numerical sequences. The most basic result along these lines is the following. First, some terminology: A sequence $\{x_n\}$ is called bounded if all the numbers $x_n$ are contained in some bounded interval. A sequence $\{x_n\}$ is called increasing if $x_n \le x_m$ whenever $n < m$, and decreasing if $x_n \ge x_m$ whenever $n < m$. A sequence that is either increasing or decreasing is called monotone (or monotonic).
$$x_k = \frac{1}{2}\left(x_{k-1} + \frac{a}{x_{k-1}}\right)$$
Observe that if $x_{k-1} > 0$ then $x_k > 0$ too; since we assume that $x_1 > 0$, every term of this sequence is positive. (In particular, division by zero is never a problem.) We claim that $x_k \to \sqrt{a}$, no matter what initial $x_1$ is chosen.
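Indeed, if we assume that the sequence converges to a nonzero limit $L$, by letting $k \to \infty$ in the recursion formula we see that
$$L = \frac{1}{2}\left(L + \frac{a}{L}\right), \quad\text{or}\quad L = \frac{a}{L},$$
so that $L^2 = a$. Since $x_k > 0$ for every $k$, we must have $L > 0$, and hence $L = \sqrt{a}$. But this argument is without force until we know that $\{x_k\}$ converges to a nonzero limit.
To verify this, observe that for $k \ge 2$,
$$x_k - \sqrt{a} = \frac{(x_{k-1} - \sqrt{a})^2}{2x_{k-1}} \ge 0 \quad\text{and}\quad x_k - x_{k+1} = \frac{x_k^2 - a}{2x_k} \ge 0,$$
so $\{x_k\}_{k\ge2}$ is decreasing and bounded below by $\sqrt{a}$. The convergence to a limit $L \ge \sqrt{a}$ now follows from the monotone sequence theorem. (The verification that $\{x_k\}$ converges is not just a formality; see Exercise 4.)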
The sequence {xk} gives a computationally efficient recursive algorithm
for computing square roots.
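This recursion is commonly known as Heron's (or the Babylonian) method. A minimal Python sketch follows; the function name and the fixed number of steps are illustrative choices. Starting deliberately below $\sqrt2$ shows that the terms from $x_2$ on lie above $\sqrt a$ and decrease toward it, as the estimates above predict.

```python
import math

def heron_iterates(a, x1, steps):
    # x_k = (1/2)(x_{k-1} + a / x_{k-1}); requires a > 0 and x_1 > 0
    if a <= 0 or x1 <= 0:
        raise ValueError("need a > 0 and x1 > 0")
    xs = [x1]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] + a / xs[-1]))
    return xs

xs = heron_iterates(2.0, 0.1, 12)     # start below sqrt(2) on purpose
print(xs[:6])
print(xs[-1], math.sqrt(2.0))
```

Once the iterates are close to $\sqrt a$, the number of correct digits roughly doubles at each step, which is what makes the method computationally efficient.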
1.17 Theorem (The Nested Interval Theorem). Let $I_1 = [a_1, b_1]$, $I_2 = [a_2, b_2]$, $\ldots$ be a sequence of closed, bounded intervals in $\mathbb{R}$. Suppose that (a) $I_1 \supset I_2 \supset I_3 \supset \cdots$, and (b) the length $b_k - a_k$ of $I_k$ tends to 0 as $k \to \infty$. Then there is exactly one point contained in all of the intervals $I_k$.
It should be emphasized that the real point of the nested interval theorem is that the intersection $\bigcap_1^\infty I_k$ is nonempty; the fact that it can contain no more than one point is pretty obvious from the assumption that the length of $I_k$ tends to zero.
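Bisection produces nested intervals of exactly this kind. In the sketch below (an illustration, not part of the text), $I_1 = [1, 2]$ and each $I_{k+1}$ is the half of $I_k$ whose endpoints still satisfy $a^2 \le 2 \le b^2$. Exact rational arithmetic keeps every endpoint rational, so the single point common to all the intervals, $\sqrt2$, lies outside the rationals; this is the kind of "hole" in $\mathbb{Q}$ that completeness fills, as in Example 1e. The function name is hypothetical.

```python
from fractions import Fraction

def sqrt2_intervals(steps):
    # I_1 = [1, 2]; I_{k+1} is the half of I_k that keeps a^2 <= 2 <= b^2
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if m * m <= 2:
            a = m
        else:
            b = m
        intervals.append((a, b))
    return intervals

ivs = sqrt2_intervals(40)
for (a0, b0), (a1, b1) in zip(ivs, ivs[1:]):
    assert a0 <= a1 and b1 <= b0          # (a): each interval lies inside the previous one
for k, (a, b) in enumerate(ivs):
    assert b - a == Fraction(1, 2 ** k)   # (b): the length halves at every step
    assert a * a <= 2 <= b * b            # sqrt(2) stays trapped in every interval
print(float(ivs[-1][0]), float(ivs[-1][1]))
```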
If $\{x_k\}$ is a sequence (in any set, not necessarily $\mathbb{R}$), we may form a subsequence of $\{x_k\}$ by deleting some of the terms and keeping the rest in their original order. More precisely, a subsequence of $\{x_k\}$ is a sequence $\{x_{k_j}\}_{j=1}^\infty$ specified by a one-to-one, increasing map $j \mapsto k_j$ from the set of positive integers into itself. For example, by taking $k_j = 2j$ we obtain the subsequence of even-numbered terms; by taking $k_j = j^2$ we obtain the subsequence of those terms whose index is a perfect square, and so on.
The following theorem is one of the most useful results in the foundations of
analysis; it is one version of the Bolzano-Weierstrass theorem, whose general form
will be found in Theorem 1.21.
Proof. Let $\{x_k\}$ be a bounded sequence, say $x_k \in [a, b]$ for all $k$. Bisect the interval $[a, b]$, that is, consider the two intervals $[a, \frac12(a + b)]$ and $[\frac12(a + b), b]$. At least one of these subintervals must contain $x_k$ for infinitely many $k$; call that subinterval $I_1$. (If both of them contain $x_k$ for infinitely many $k$, pick the one on the left.) Now bisect $I_1$. Again, one of the two halves must contain $x_k$ for infinitely many $k$; call that half $I_2$. Proceeding inductively, we obtain a sequence of intervals $I_j$, each one contained in the preceding one, each one half as long as the preceding one, and each one containing $x_k$ for infinitely many $k$. By the nested interval theorem, there is exactly one point $l$ contained in every $I_j$.
It is now easy to construct a subsequence of $\{x_k\}$ that converges to $l$, as follows. Pick an integer $k_1$ such that $x_{k_1} \in I_1$, then pick $k_2 > k_1$ such that $x_{k_2} \in I_2$, then pick $k_3 > k_2$ such that $x_{k_3} \in I_3$, and so forth. By construction of the $I_j$'s, this process can be continued indefinitely. Since $x_{k_j}$ and $l$ are both in $I_j$, and the length of $I_j$ is $2^{-j}(b - a)$, we have $|x_{k_j} - l| \le 2^{-j}(b - a)$, which tends to 0 as $j \to \infty$; that is, $x_{k_j} \to l$. □
Proof. If $|\mathbf{x}_k| \le C$ for all $k$, then the components $x_{k1}, \ldots, x_{kn}$ all lie in the interval $[-C, C]$. Hence, for each $m = 1, \ldots, n$ we can extract a convergent subsequence from the sequence of $m$th components, $\{x_{km}\}_{k=1}^\infty$. The trouble is that the indices on these subsequences might all be different, so we can't put them together. (We might have chosen the odd-numbered terms for $m = 1$ and the even-numbered terms for $m = 2$, for example.) Instead, we have to proceed inductively. First we choose a subsequence $\{\mathbf{x}_{k_j}\}$ such that the first components converge; then we choose a sub-subsequence $\{\mathbf{x}_{k_{j_i}}\}$ whose second components also converge, and so on until we find a (sub)$^n$sequence whose components all converge. □
Another way to express the completeness of the real number system is to say that every sequence whose terms get closer and closer to each other actually converges. To be more precise, a sequence $\{\mathbf{x}_k\}$ in $\mathbb{R}^n$ is called a Cauchy sequence if $|\mathbf{x}_k - \mathbf{x}_j| \to 0$ as $k, j \to \infty$, that is, if for every $\epsilon > 0$ there exists an integer $K$ such that $|\mathbf{x}_k - \mathbf{x}_j| < \epsilon$ whenever $k > K$ and $j > K$.
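Proof. Suppose $\mathbf{x}_k \to \mathbf{l}$. Since $\mathbf{x}_k - \mathbf{x}_j = (\mathbf{x}_k - \mathbf{l}) - (\mathbf{x}_j - \mathbf{l})$, we have $0 \le |\mathbf{x}_k - \mathbf{x}_j| \le |\mathbf{x}_k - \mathbf{l}| + |\mathbf{x}_j - \mathbf{l}|$. Both terms on the right tend to zero as $k, j \to \infty$; hence so does $|\mathbf{x}_k - \mathbf{x}_j|$. Thus $\{\mathbf{x}_k\}$ is Cauchy.
Now suppose $\{\mathbf{x}_k\}$ is Cauchy. Taking $\epsilon = 1$ in the definition of "Cauchy," we see that there is an integer $K$ such that $|\mathbf{x}_k - \mathbf{x}_j| < 1$ if $k, j > K$. Then $|\mathbf{x}_k| < |\mathbf{x}_{K+1}| + 1$ for all $k > K$, and it follows that the sequence $\{\mathbf{x}_k\}$ is bounded. By Theorem 1.18, there is a subsequence $\{\mathbf{x}_{k_j}\}$ that converges to a limit $\mathbf{l}$. But then since $\{\mathbf{x}_k\}$ is Cauchy, the whole sequence must also converge to $\mathbf{l}$. Indeed, given $\epsilon > 0$, there is an integer $J$ such that $|\mathbf{x}_{k_j} - \mathbf{l}| < \frac12\epsilon$ if $j > J$, and there is an integer $K$ such that $|\mathbf{x}_k - \mathbf{x}_m| < \frac12\epsilon$ if $k, m > K$. Pick an integer $j > J$ such that $k_j > K$; then for $k > K$ we have
$$|\mathbf{x}_k - \mathbf{l}| \le |\mathbf{x}_k - \mathbf{x}_{k_j}| + |\mathbf{x}_{k_j} - \mathbf{l}| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$
Therefore, $\mathbf{x}_k \to \mathbf{l}$. □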
EXERCISES
1. Find $\sup S$ and $\inf S$ for the following sets $S$. Do these numbers belong to $S$ or not?
a. $S = \{x : (2x^2 - 1)(x^2 - 1) < 0\}$.
b. $S = \{(-1)^k + 2^{-k} : k > 0\}$.
c. $S = \{x : \arctan x > 1\}$.
2. Construct a sequence $\{x_k\}$ that has subsequences converging to three different limits.
6. Let $r_k$ be the ratio of the $(k+1)$th term to the $k$th term of the Fibonacci sequence (Example 2, §1.4). (Thus the first few $r_k$'s are $1, 2, \frac32, \frac53, \ldots$.) Our object is to show that $\lim_{k\to\infty} r_k$ is the "golden ratio" $\varphi = \frac12(1 + \sqrt5)$, the positive root of the equation $x^2 = x + 1$.
a. Show that
$$r_{k+1} = 1 + \frac{1}{r_k} \quad\text{and}\quad r_{k+2} = \frac{2r_k + 1}{r_k + 1}.$$
b. Show that $r_k < \varphi$ if $k$ is odd and $r_k > \varphi$ if $k$ is even. Then show that $r_{k+2} - r_k$ is positive if $k$ is odd and negative if $k$ is even. (Hint: For $x > 0$ we have $x^2 < x + 1$ if $x < \varphi$ and $x^2 > x + 1$ if $x > \varphi$.)
c. Show that the subsequences $\{r_{2j-1}\}$ and $\{r_{2j}\}$ of odd- and even-numbered terms both converge to $\varphi$.
7. Let $\{\mathbf{x}_k\}$ be a sequence in $\mathbb{R}^n$ and $\mathbf{a}$ a point in $\mathbb{R}^n$. Show that some subsequence of $\{\mathbf{x}_k\}$ converges to $\mathbf{a}$ if and only if every ball centered at $\mathbf{a}$ contains $\mathbf{x}_k$ for infinitely many values of $k$.
8. Show that every infinite bounded set in $\mathbb{R}^n$ has an accumulation point. (See Exercises 6-7 in §1.4.)
Then the sequence $\{Y_m\}$ is bounded and decreasing, and $\{y_m\}$ is bounded and increasing (because the sup and inf are being taken over fewer and fewer numbers as $m$ increases), so they both converge. The limits $\lim Y_m$ and $\lim y_m$ are called the limit superior and limit inferior of the sequence $\{x_k\}$, respectively; they are denoted by $\limsup_{k\to\infty} x_k$ and $\liminf_{k\to\infty} x_k$:
$$\limsup_{k\to\infty} x_k = \lim_{m\to\infty} Y_m, \qquad \liminf_{k\to\infty} x_k = \lim_{m\to\infty} y_m.$$
9. Show that $\limsup x_k$ is the number $a$ uniquely specified by the following property: For any $\epsilon > 0$, there are infinitely many $k$ for which $x_k > a - \epsilon$ but only finitely many for which $x_k > a + \epsilon$. What is the corresponding condition for $\liminf x_k$?
10. Show that there is a subsequence of $\{x_k\}$ that converges to $\limsup x_k$, and one that converges to $\liminf x_k$.
11. Show that if $a \in \mathbb{R}$ is the limit of some subsequence of $\{x_k\}$, then $\liminf x_k \le a \le \limsup x_k$.
12. Show that $\{x_k\}$ converges if and only if $\limsup x_k = \liminf x_k$, in which case this common value is equal to $\lim x_k$.
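The limit superior and limit inferior discussed in these exercises can be explored numerically. The sketch below assumes the standard definitions $Y_m = \sup\{x_k : k \ge m\}$ and $y_m = \inf\{x_k : k \ge m\}$, the ones consistent with the remark that the sup and inf are taken over fewer and fewer numbers as $m$ increases. The test sequence $x_k = (-1)^k(1 + 1/k)$ is an illustrative choice, with $\limsup x_k = 1$ and $\liminf x_k = -1$; for this particular sequence the sup and inf of each tail are attained at its first two terms, so truncating to a finite list does not change $Y_m$ or $y_m$ as long as $m$ stays well below the list length.

```python
def tail_sup_inf(x, m):
    # (Y_m, y_m) = (sup, inf) of {x_k : k >= m}, taken over the finite list x,
    # where x[0] holds x_1
    tail = x[m - 1:]
    return max(tail), min(tail)

N = 2000
x = [(-1) ** k * (1 + 1 / k) for k in range(1, N + 1)]
Y = [tail_sup_inf(x, m)[0] for m in range(1, 1001)]
y = [tail_sup_inf(x, m)[1] for m in range(1, 1001)]

assert all(Y[i] >= Y[i + 1] for i in range(len(Y) - 1))   # {Y_m} decreasing
assert all(y[i] <= y[i + 1] for i in range(len(y) - 1))   # {y_m} increasing
print(Y[-1], y[-1])   # close to limsup = 1 and liminf = -1
```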
1.6 Compactness
1.22 Theorem. Continuous functions map compact sets to compact sets. That is, suppose that $S$ is a compact subset of $\mathbb{R}^n$ and $\mathbf{f} : S \to \mathbb{R}^m$ is continuous at every point of $S$. Then the set
$$\mathbf{f}(S) = \{\mathbf{f}(\mathbf{x}) : \mathbf{x} \in S\}$$
is also compact.
Proof. Suppose $\{\mathbf{y}_k\}$ is a sequence in the image $\mathbf{f}(S)$. For each $k$ there is a point $\mathbf{x}_k \in S$ such that $\mathbf{y}_k = \mathbf{f}(\mathbf{x}_k)$. Since $S$ is compact, by the Bolzano-Weierstrass theorem the sequence $\{\mathbf{x}_k\}$ has a convergent subsequence $\{\mathbf{x}_{k_j}\}$ whose limit $\mathbf{a}$ lies in $S$. Since $\mathbf{f}$ is continuous at $\mathbf{a}$, by Theorem 1.15 the sequence $\{\mathbf{y}_{k_j}\} = \{\mathbf{f}(\mathbf{x}_{k_j})\}$ converges to the point $\mathbf{f}(\mathbf{a}) \in \mathbf{f}(S)$. Thus, every sequence in $\mathbf{f}(S)$ has a subsequence whose limit lies in $\mathbf{f}(S)$. By the Bolzano-Weierstrass theorem again, $\mathbf{f}(S)$ is compact. □
It is not true, in general, that continuous functions map closed sets to closed
sets, or bounded sets to bounded sets. (See Exercises 1-2.) Only the combination
of closedness and boundedness is preserved.
An immediate consequence of Theorem 1.22 is the fundamental existence theorem
for maxima and minima of real-valued functions.
• $f(x) = \cot \pi x$, $S = (0,1)$. (The values of $f$ range from $-\infty$ to $\infty$.)
Compactness also has another consequence that turns out to be extremely useful
in more advanced mathematical analysis, although its significance may not be very
clear at first sight. (It will not be used elsewhere in this book except in some of the
technical arguments in Appendix B, so it may be regarded as an optional topic.)
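Suppose $S$ is a subset of $\mathbb{R}^n$. A collection $\mathcal{U}$ of subsets of $\mathbb{R}^n$ is called a covering
of $S$ if $S$ is contained in the union of the sets in $\mathcal{U}$. For example, for each $\mathbf{x} \in S$
we could pick an open ball $B_\mathbf{x}$ centered at $\mathbf{x}$; then $\mathcal{U} = \{B_\mathbf{x} : \mathbf{x} \in S\}$ is a covering
of $S$.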
Much of what we have done in this section and the preceding ones can be
generalized from subsets of $\mathbb{R}^n$ to subsets of more general spaces equipped with a
"distance function" that behaves more or less like the Euclidean distance
$d(\mathbf{x}, \mathbf{y}) = |\mathbf{x} - \mathbf{y}|$. (Such spaces are known as metric spaces; see DePree and Swartz [5],
Krantz [12], or Rudin [18].) For example, in studying the geometry of a surface
$S$ in $\mathbb{R}^3$, one might want to take the "distance" between two points $\mathbf{x}, \mathbf{y} \in S$ to
be not the straight-line distance $|\mathbf{x} - \mathbf{y}|$ but the length of the shortest curve on $S$
that joins $\mathbf{x}$ to $\mathbf{y}$. Another class of examples is provided by spaces of functions,
where the "distance" between two functions $f$ and $g$ can be measured in a number
of different ways; we shall say more about this in Chapter 8. In this general setting,
the Bolzano-Weierstrass and Heine-Borel theorems are no longer completely valid.
The conditions on a set $S$ in Theorem 1.21b and Theorem 1.24b still imply that $S$ is
closed and bounded, but not conversely. These conditions are still very important,
however, so a shift in terminology is called for. The condition in Theorem 1.24b
(that every open cover of $S$ has a finite subcover) is usually taken as the definition
of compactness in the general setting, and the condition in Theorem 1.21b (that
every sequence in $S$ has a subsequence that converges in $S$) is called sequential
compactness.
EXERCISES
1. Give an example of
a. a closed set $S \subset \mathbb{R}$ and a continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f(S)$
is not closed;
b. an open set $U \subset \mathbb{R}$ and a continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f(U)$
is not open.
1.7 Connectedness
A set in Rn is said to be connected if it is "all in one piece," that is, if it is not the
union of two nonempty subsets that do not touch each other. The formal definition
is as follows: A set $S \subset \mathbb{R}^n$ is disconnected if it is the union of two nonempty
subsets $S_1$ and $S_2$, neither of which intersects the closure of the other one; in this
case we shall call the pair $(S_1, S_2)$ a disconnection of $S$. The set $S$ is connected
if it is not disconnected.
Example 1. Let
Then the set $S = S_1 \cup S_2$ is disconnected, for the only point common to the
closures of $S_1$ and $S_2$ is $(0,0)$, which belongs to neither $S_1$ nor $S_2$. However,
the set $T = S_1 \cup S_3$ is connected, for $(0,0)$ belongs both to $S_3$ and the closure
of $S_1$; this point "connects" the two pieces of $T$. See Figure 1.2.
1.25 Theorem. The connected subsets of $\mathbb{R}$ are precisely the intervals (open,
half-open, or closed; bounded or unbounded).
The following result, a cousin of Theorem 1.22, gives the basic relation between
continuity and connectedness:
1.26 Theorem. Continuous functions map connected sets to connected sets. That
is, suppose $\mathbf{f} : S \to \mathbb{R}^m$ is continuous at every point of $S$ and $S$ is connected. Then
the set
$$\mathbf{f}(S) = \{\mathbf{f}(\mathbf{x}) : \mathbf{x} \in S\}$$
is also connected.
Then $S_1$ and $S_2$ are nonempty, and their union is $S$. If there were a point $\mathbf{x} \in S_1$
belonging to the closure of $S_2$, $\mathbf{x}$ would be the limit of a sequence $\{\mathbf{x}_k\}$ in $S_2$ by
Theorem 1.14. But then $\mathbf{f}(\mathbf{x}) \in U_1$ and $\mathbf{f}(\mathbf{x}_k) \in U_2$, so $\mathbf{f}(\mathbf{x}) = \lim \mathbf{f}(\mathbf{x}_k)$ would
be in the closure of $U_2$ by Theorem 1.14 again. This is impossible; hence $S_1$ does
not intersect the closure of $S_2$, and likewise, $S_2$ does not intersect the closure of $S_1$.
Thus $S = S_1 \cup S_2$ is disconnected. □
Proof. By Theorems 1.25 and 1.26, $f(V)$ is an interval. It contains $f(a)$ and $f(b)$
and hence contains the entire interval between them. □
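The argument above is purely topological and does not locate a point where an intermediate value is taken. When a numerical approximation is wanted, repeated bisection is a common companion technique (it is not the method of the text): keep halving an interval on whose endpoints $f - v$ has opposite signs. This sketch assumes $f$ is continuous on $[a, b]$ and that $v$ lies between $f(a)$ and $f(b)$; the function name `bisect_value` and the sample functions are hypothetical.

```python
import math


def bisect_value(f, a, b, v=0.0, tol=1e-12, max_iter=200):
    """Approximate a point c in [a, b] with f(c) = v by bisection.

    Assumes f is continuous on [a, b] and v lies between f(a) and f(b);
    each step keeps a half-interval on which f - v still changes sign.
    """
    fa, fb = f(a) - v, f(b) - v
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("v must lie between f(a) and f(b)")
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c) - v
        if fc == 0 or (b - a) / 2 < tol:
            return c
        if fa * fc < 0:    # sign change in [a, c]
            b = c
        else:              # sign change in [c, b]
            a, fa = c, fc
    return (a + b) / 2


# x^3 takes the value 2 somewhere in [1, 2], since 1 < 2 < 8
print(bisect_value(lambda x: x ** 3, 1.0, 2.0, v=2.0))
# cos x - x changes sign on [0, 1], so it vanishes there
print(bisect_value(lambda x: math.cos(x) - x, 0.0, 1.0))
```

Each halving preserves the hypothesis of the theorem on the smaller interval, which is why the nested intervals shrink to a point where $f = v$.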
The following results explain the relation between connectedness and arcwise
connectedness.
Proof. We shall assume that $S$ is disconnected and show that it is not arcwise connected. Accordingly, suppose $(S_1, S_2)$ is a disconnection of $S$. Pick $\mathbf{a} \in S_1$ and
$\mathbf{b} \in S_2$; we claim that there is no continuous $\mathbf{g} : [0,1] \to S$ such that $\mathbf{g}(0) = \mathbf{a}$ and
$\mathbf{g}(1) = \mathbf{b}$. If there were, the set $V = \mathbf{g}([0,1])$ would be connected by Theorems
1.25 and 1.26. But this cannot be so: $V$ is the union of $V \cap S_1$ and $V \cap S_2$; these
sets are nonempty since $\mathbf{a} \in V \cap S_1$ and $\mathbf{b} \in V \cap S_2$, and neither of them intersects
the closure of the other. Hence $S$ is not arcwise connected. □
The converse of Theorem 1.28 is false: A set can be connected without being
arcwise connected. A typical example is
pictured in Figure 1.3. $S$ consists of two pieces, the graph of $\sin(\pi/x)$ and the
vertical line segment. These two sets do not form a disconnection of S, as the line
segment is included in the closure of the graph, but a point on the line segment
cannot be connected to a point on the graph by a continuous curve. The details are
sketched in Exercise 11.
However, open connected sets are arcwise connected:
Proof. Fix a point $\mathbf{a} \in S$. Let $S_1$ be the set of points in $S$ that can be joined to $\mathbf{a}$
by a continuous curve in $S$, and let $S_2$ be the set of points in $S$ that cannot; thus $S_1$
and $S_2$ are disjoint and $S = S_1 \cup S_2$. We shall show that
a. if $\mathbf{x} \in S_1$, then all points sufficiently close to $\mathbf{x}$ are in $S_1$;
b. if $\mathbf{x} \in S$ is in the closure of $S_1$, then $\mathbf{x} \in S_1$.
(a) shows that no point of $S_1$ can be in the closure of $S_2$, and (b) shows that no
point in the closure of $S_1$ can be in $S_2$. Thus $(S_1, S_2)$ will form a disconnection
of $S$, contrary to the assumption that $S$ is connected, unless $S_2$ is empty, which
means that $S$ is arcwise connected.
To prove (a) and (b), we use the fact that $S$ is open, so that if $\mathbf{x} \in S$, there is
a ball $B$ centered at $\mathbf{x}$ that is included in $S$. If $\mathbf{x} \in S_1$, then every $\mathbf{y} \in B$ is also
in $S_1$, for $\mathbf{y}$ can be joined to $\mathbf{a}$ by first joining $\mathbf{x}$ to $\mathbf{a}$ and then joining $\mathbf{y}$ to $\mathbf{x}$ by
the straight line segment from $\mathbf{x}$ to $\mathbf{y}$, which lies in $B$ and hence in $S$. Similarly,
if $\mathbf{x}$ is in the closure of $S_1$, by Theorem 1.14 there is a sequence $\{\mathbf{x}_k\}$ of points in
$S_1$ that converges to $\mathbf{x}$. We have $\mathbf{x}_k \in B$ for $k$ sufficiently large, so again, $\mathbf{x}$ can
be joined to $\mathbf{a}$ by joining $\mathbf{x}_k$ to $\mathbf{a}$ and then joining $\mathbf{x}$ to $\mathbf{x}_k$ by a line segment in $B$;
hence $\mathbf{x} \in S_1$. This completes the proof. □
EXERCISES
1. Show directly from the definition that the following sets are disconnected.
(That is, produce a disconnection for each of them.)
a. The hyperbola $\{(x, y) \in \mathbb{R}^2 : x^2 - y^2 = 1\}$.
b. Any finite set in $\mathbb{R}^n$ with at least two elements.
c. $\{(x,y,z) \in \mathbb{R}^3 : xyz > 0\}$.
4. Suppose $S_1$ and $S_2$ are connected sets in $\mathbb{R}^n$ that contain at least one point in
common. Show that $S_1 \cup S_2$ is connected. Is it true that $S_1 \cap S_2$ must be
connected?
5. Show that an open set in Rn is disconnected if and only if it is the union of two
disjoint nonempty open subsets.
6. Show that a closed set in Rn is disconnected if and only if it is the union of two
disjoint nonempty closed subsets.
10. Suppose $S$ is a connected set in $\mathbb{R}^2$ that contains $(1,3)$ and $(4,-1)$. Show that
$S$ contains at least one point on the line $x = y$. (Hint: Consider $f(x,y) = x - y$.)
The crucial point is that for simple continuity the number $\delta$ may depend on $\mathbf{x}$, but
for uniform continuity it does not. This is a rather subtle point, and the reader
should not be discouraged if its significance is not immediately clear; some very
eminent mathematicians of the past also had trouble with it!
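Some readers may find it enlightening to see these conditions rewritten in a
symbolic way that makes them as concise as possible. We employ the logical symbols $\forall$ and $\exists$, which mean "for all" and "there exists," respectively. With this understanding, the condition for $\mathbf{f}$ to be continuous on $S$ is that
$$(1.31)\quad \forall \epsilon > 0\ \forall \mathbf{x} \in S\ \exists \delta > 0 : \forall \mathbf{y} \in S\quad |\mathbf{x} - \mathbf{y}| < \delta \implies |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| < \epsilon,$$
whereas the condition for $\mathbf{f}$ to be uniformly continuous on $S$ is that
$$(1.32)\quad \forall \epsilon > 0\ \exists \delta > 0 : \forall \mathbf{x}, \mathbf{y} \in S\quad |\mathbf{x} - \mathbf{y}| < \delta \implies |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| < \epsilon.$$
The difference between (1.31) and (1.32) is that the "$\forall \mathbf{x}$" has been interchanged
with the "$\exists \delta$," so that in (1.31) the $\delta$ is allowed to depend on $\mathbf{x}$, whereas in (1.32)
the same $\delta$ must work for every $\mathbf{x}$.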
as $a \to \infty$. To be more precise, let us suppose that $a > 0$ and $h > 0$. Since
$g(a+h) - g(a) = 2ah + h^2 > 2ah$, there is no hope to get $|g(a+h) - g(a)| < \epsilon$
unless $h < \epsilon/(2a)$. Thus, the allowable $\delta$ in (1.31) at $x = a$ must be smaller than
$\epsilon/(2a)$, which gets smaller as $a$ gets larger. On the other hand, $g$ is uniformly
continuous on every bounded interval, because on such an interval there is a
finite upper bound for $|g'|$, and the mean value theorem can be applied as in
Example 1.
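The dependence of $\delta$ on the point can be made explicit for $g(x) = x^2$. A short computation (ours, not the text's) shows that for $a \ge 0$ the largest $\delta$ that works at $x = a$ in (1.31) is $\delta(a) = \sqrt{a^2 + \epsilon} - a = \epsilon/(\sqrt{a^2 + \epsilon} + a)$: if $|x - a| < \delta$ then $|x^2 - a^2| = |x - a|\,|x + a| < \delta(2a + \delta) = \epsilon$, while $x = a + \delta$ gives equality. The sketch tabulates $\delta(a)$ next to the bound $\epsilon/(2a)$ from the text; the name `best_delta` is hypothetical.

```python
import math


def best_delta(a, eps):
    """Largest delta with |x - a| < delta  =>  |x**2 - a**2| < eps, for a >= 0.

    Equals sqrt(a**2 + eps) - a; the rationalized form below avoids the
    cancellation error of subtracting two nearly equal numbers for large a.
    """
    return eps / (math.sqrt(a * a + eps) + a)


eps = 0.01
for a in (1.0, 10.0, 100.0, 1000.0):
    print(f"a={a:7.1f}  delta={best_delta(a, eps):.3e}  eps/(2a)={eps / (2 * a):.3e}")
```

The optimal $\delta$ shrinks roughly like $\epsilon/(2a)$, so no single $\delta > 0$ serves every $a > 0$, which is exactly the failure of (1.32) on $(0, \infty)$.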
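Taking $\delta = 1, \frac12, \frac13, \ldots$, we see that for each positive integer $k$ there exist $\mathbf{x}_k, \mathbf{y}_k \in S$
such that $|\mathbf{x}_k - \mathbf{y}_k| < k^{-1}$ and $|\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k)| \ge \epsilon$. By the Bolzano-Weierstrass
theorem, by passing to a subsequence we may assume that $\{\mathbf{x}_k\}$ converges, say to
$\mathbf{a} \in S$. Since $|\mathbf{x}_k - \mathbf{y}_k| \to 0$, we also have $\mathbf{y}_k \to \mathbf{a}$. But then
$\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k) \to \mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, contradicting the assertion that $|\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k)| \ge \epsilon$. □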
EXERCISES
DIFFERENTIAL CALCULUS
The main theme of this chapter is the theory and applications of differential cal
culus for functions of several variables. The reader is expected to be familiar with
differential calculus for functions of one variable. However, we offer a review of
the one-variable theory that contains a few features that the reader may not have
seen before, and the one-variable theory makes another appearance in the section
on Taylor's theorem.
We begin with an approach to the notion of derivative that is a bit different from
the one usually found in elementary calculus books. This point of view is very
useful in more advanced work, and it is the one that leads to the proper notion of
differentiability for functions of several variables.
The basic idea is that a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x = a$ if it is
approximately linear near $x = a$. Geometrically, this means that the graph of $f$
has a tangent line at $x = a$. Analytically, it means that there is a linear function
$l(x) = mx + b$ satisfying the following two conditions:
• $l(a) = f(a)$, so that $b = f(a) - ma$ and hence $l(x) = f(a) + m(x - a)$;
• the difference $f(x) - l(x)$ tends to zero at a faster rate than $x - a$ as $x \to a$,
that is,
$$\frac{f(x) - l(x)}{x - a} \to 0 \quad \text{as } x \to a.$$
in other words, if $f(a + h)$ is the sum of the linear function $f(a) + mh$ and an error
term that tends to zero more rapidly than $h$ as $h \to 0$. In this case we have
(2.2) „***)
ft.
where $E_1(h)$ and $E_2(h)$ are $o(h)$. Multiplying these equations together yields
where
Clearly $E_3(h)$ is $o(h)$ since $E_1(h)$ and $E_2(h)$ are, so (2.3) is of the form (2.1)
with $f$ replaced by $fg$ and $m = f'(a)g(a) + f(a)g'(a)$. In other words, $fg$ is
differentiable at $a$ and $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$.
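The displayed equations of this argument are missing here. Assuming, as the surrounding text indicates, that the two equations being multiplied are $f(a+h) = f(a) + f'(a)h + E_1(h)$ and $g(a+h) = g(a) + g'(a)h + E_2(h)$, expanding the product and collecting terms gives one way of writing the product and the error term $E_3$:

```latex
\begin{aligned}
f(a+h)\,g(a+h) &= f(a)g(a) + \bigl[f'(a)g(a) + f(a)g'(a)\bigr]h + E_3(h),\\
E_3(h) &= f'(a)g'(a)h^2 + \bigl[f(a) + f'(a)h\bigr]E_2(h)\\
&\quad + \bigl[g(a) + g'(a)h\bigr]E_1(h) + E_1(h)E_2(h).
\end{aligned}
```

Each of the four terms of $E_3(h)$ is $o(h)$: the first because $h^2/h \to 0$, the middle two because $E_1$ and $E_2$ are $o(h)$ while their coefficients stay bounded, and the last because $E_1(h)E_2(h)/h = E_1(h)\cdot[E_2(h)/h] \to 0$.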
The chain rule can also be derived in this way; we shall do so, in a more general
setting, in §2.3.
The Mean Value Theorem. The definition of the derivative involves passing
from the "local" information given by the values of $f(x)$ for $x$ near $a$ to the "infinitesimal" information $f'(a)$, which (intuitively speaking) gives the infinitesimal
change in $f$ corresponding to an infinitesimal change in $x$. To reverse the process
and pass from "infinitesimal" information to "local" information (that is, to extract information about $f$ from a knowledge of $f'$), the principal tool is the mean
value theorem, one of the most important theoretical results of elementary calculus.
The derivation begins with the following result, which is important in its own right.
2.7 Theorem (Mean Value Theorem I). Suppose $f$ is continuous on $[a, b]$ and
differentiable on $(a, b)$. There is at least one point $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Proof. The straight line joining $(a, f(a))$ to $(b, f(b))$ is the graph of the function
$$l(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a),$$
and the assertion is that there is a point $c \in (a, b)$ where the slope of the graph
$y = f(x)$ is the same as the slope of this line, in other words, where the derivative
of the difference $g(x) = f(x) - l(x)$ is zero. But $f$ and $l$ have the same values at
$a$ and $b$, so $g(a) = g(b) = 0$, and the conclusion then follows by applying Rolle's
theorem to $g$. □
The mean value theorem is nonconstructive; that is, although it asserts the existence of a certain point $c \in (a, b)$, it gives no clue as to how to find that point.
Students often find this perplexing at first, but in fact the whole power of the mean
value theorem comes from situations where there is no need to know precisely
where $c$ is. In many applications, one has information about the behavior of $f'$ on
some interval, and one deduces information about $f$ on that same interval. The
following theorem comprises the most important of them.
We say that a function $f$ is increasing (resp. strictly increasing) on an interval
$I$ if $f(a) \le f(b)$ (resp. $f(a) < f(b)$) whenever $a, b \in I$ and $a < b$; similarly for
decreasing and strictly decreasing.
In case the reader feels that we are belaboring the obvious here, we should point
out that the mere differentiability of $f$ at a single point $a$ gives less information
about the behavior of $f$ near $x = a$ than we would like. For example, if $f'(a) > 0$,
it does not follow that $f$ is increasing in some neighborhood of $a$; see Exercises 3
and 4.
2.9 Theorem (Mean Value Theorem II). Suppose that $f$ and $g$ are continuous on
$[a, b]$ and differentiable on $(a, b)$, and $g'(x) \ne 0$ for all $x \in (a, b)$. Then there exists
$c \in (a, b)$ such that
$$\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$
Proof. Let
Then h is continuous on [a, b] and differentiable on (a, b), and h(a) = h(b) = 0.
By Rolle's theorem, there is a point $c \in (a, b)$ such that
Since $g'$ is never 0 on $(a, b)$, we have $g'(c) \ne 0$ and also $g(b) - g(a) \ne 0$ (by the
mean value theorem, since $g(b) - g(a) = g'(\tilde c)(b - a)$ for some $\tilde c \in (a, b)$). Hence
we can divide by both these quantities to obtain the desired result. □
L'Hôpital's Rule. Often one is faced with the evaluation of limits of quotients
$f(x)/g(x)$ where $f$ and $g$ both tend to zero or infinity. The collection of related
results that go under the name of "l'Hôpital's rule" enables one to evaluate such
limits in many cases by examining the quotient of the derivatives, $f'(x)/g'(x)$.
The cases involving the indeterminate form 0/0 can be summarized as follows.
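2.10 Theorem (L'Hôpital's Rule I). Suppose $f$ and $g$ are differentiable functions
on $(a, b)$ and
$$\lim_{x\to a+} f(x) = \lim_{x\to a+} g(x) = 0.$$
If $g'(x) \ne 0$ for $x \in (a, b)$ and $\lim_{x\to a+} f'(x)/g'(x) = L$, then
$$\lim_{x\to a+} \frac{f(x)}{g(x)} = L.$$
Proof. If we (re)define $f(a)$ and $g(a)$ to be 0, then $f$ and $g$ are continuous on the
interval $[a, x]$ for $x < b$. By Theorem 2.9, for each $x \in (a, b)$ there exists $c \in (a, x)$
(depending on $x$) such that
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}.$$
As $x \to a+$, also $c \to a+$, so
$$\lim_{x\to a+} \frac{f(x)}{g(x)} = \lim_{c\to a+} \frac{f'(c)}{g'(c)} = L.$$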
The proof for left-hand limits is similar, and the case of two-sided limits is obtained
by combining right-hand and left-hand limits. Finally, for the case $a = \pm\infty$, we
set $y = 1/x$ and consider the functions $F(y) = f(1/y)$ and $G(y) = g(1/y)$.
Since $F'(y) = -f'(1/y)/y^2$ and $G'(y) = -g'(1/y)/y^2$, we have
$F'(y)/G'(y) = f'(1/y)/g'(1/y)$, so by the results just proved,
Under the conditions of Theorem 2.10, it may well happen that $f'(x)$ and $g'(x)$
tend to zero also, so that the limit of $f'(x)/g'(x)$ cannot be evaluated immediately.
In this case we can apply Theorem 2.10 again to evaluate the limit by examining
$f''(x)/g''(x)$. More generally, if the functions $f, f', \ldots, f^{(k-1)}, g, g', \ldots, g^{(k-1)}$
all tend to zero as $x$ tends to $a+$ or $a-$ or $\pm\infty$, but $f^{(k)}(x)/g^{(k)}(x) \to L$, then
$f(x)/g(x) \to L$.
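A quick numerical sanity check of the repeated rule with $k = 2$, using the hypothetical example $f(x) = 1 - \cos x$, $g(x) = x^2$: both functions and their first derivatives ($\sin x$ and $2x$) tend to zero as $x \to 0$, while $f''(x)/g''(x) = (\cos x)/2 \to 1/2$, so $f(x)/g(x)$ should also approach $1/2$. Values at a few points illustrate the limit but prove nothing.

```python
import math


def f(x):
    return 1 - math.cos(x)


def g(x):
    return x * x


def second_derivative_ratio(x):
    # f''(x) / g''(x) = cos(x) / 2, computed by hand
    return math.cos(x) / 2


for x in (1e-1, 1e-2, 1e-3):
    print(f"x={x:g}  f/g={f(x) / g(x):.9f}  f''/g''={second_derivative_ratio(x):.9f}")
```

For much smaller $x$ the subtraction $1 - \cos x$ loses accuracy in floating point, so a table like this should not be pushed too close to the limit point.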
2.1. Differentiability in One Variable 49
The corresponding result for limits of the form oo/oo is also true.
2.11 Theorem (L'Hôpital's Rule II). Theorem 2.10 remains valid when the hypothesis that $\lim f(x) = \lim g(x) = 0$ (as $x \to a+$, $x \to a-$, etc.) is replaced by
the hypothesis that $\lim |f(x)| = \lim |g(x)| = \infty$.
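Proof. We consider the case of left-hand limits as $x \to a-$; the other cases follow
as in Theorem 2.10.
Given $\epsilon > 0$, we wish to show that $|[f(x)/g(x)] - L| < \epsilon$ provided that $x$ is
sufficiently close to $a$ on the left. Since $f'(x)/g'(x) \to L$ and $|g(x)| \to \infty$, we can
choose $x_0 < a$ so that $g(x) \ne 0$ and
$$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\epsilon}{2} \quad \text{for } x_0 < x < a.$$
For such $x$, Theorem 2.9 applied on $[x_0, x]$ gives $[f(x) - f(x_0)]/[g(x) - g(x_0)] = f'(c)/g'(c)$ for some $c \in (x_0, x)$, so this quotient also lies within $\epsilon/2$ of $L$ (and in particular is bounded). Moreover,
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(x_0)}{g(x) - g(x_0)}\left(1 - \frac{g(x_0)}{g(x)}\right) + \frac{f(x_0)}{g(x)}.$$
As $x \to a-$, we have $g(x_0)/g(x) \to 0$ and $f(x_0)/g(x) \to 0$ because $|g(x)| \to \infty$, so for $x$ sufficiently close to $a$ the last two contributions total less than $\epsilon/2$ in absolute value. Hence
$$\left|\frac{f(x)}{g(x)} - L\right| < \epsilon,$$
as desired. □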
$$\lim_{x\to+\infty} \frac{x^a}{e^x} = \lim_{x\to+\infty} \frac{\log x}{x^a} = \lim_{x\to 0+} \frac{\log x}{x^{-a}} = 0.$$
That is, the exponential function $e^x$ grows more rapidly than any power of $x$ as
$x \to +\infty$, whereas $|\log x|$ grows more slowly than any positive power of $x$ as
$x \to +\infty$ and more slowly than any negative power of $x$ as $x \to 0+$.
Proof. For the first limit, let $k$ be the smallest integer that is $> a$. A $k$-fold application of Theorem 2.11 yields
$$\lim_{x\to+\infty} \frac{x^a}{e^x} = \lim_{x\to+\infty} \frac{a(a-1)\cdots(a-k+1)\,x^{a-k}}{e^x},$$
and the latter limit is zero because $a - k < 0$. For the other two limits, a single
application of Theorem 2.11 suffices:
$$\lim_{x\to+\infty} \frac{\log x}{x^a} = \lim_{x\to+\infty} \frac{1}{a x^a} = 0, \qquad \lim_{x\to 0+} \frac{\log x}{x^{-a}} = \lim_{x\to 0+} \frac{x^a}{-a} = 0.$$
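The sketch below tabulates the three quantities $x^a/e^x$ and $(\log x)/x^a$ as $x \to +\infty$, and $x^a \log x$ as $x \to 0+$, with the hypothetical exponents $a = 3$ (first quantity) and $a = 1/2$ (the other two). To avoid overflow, $x^a/e^x$ is computed as $\exp(a \log x - x)$. As before, a table of values illustrates the limits and proves nothing.

```python
import math


def power_over_exp(x, a):
    """x**a / e**x, computed as exp(a*log(x) - x) to avoid overflow for large x."""
    return math.exp(a * math.log(x) - x)


def log_over_power(x, a):
    """(log x) / x**a, which tends to 0 as x -> +infinity for a > 0."""
    return math.log(x) / x ** a


def power_times_log(x, a):
    """x**a * log x = (log x) / x**(-a), which tends to 0 as x -> 0+ for a > 0."""
    return x ** a * math.log(x)


for x in (1e2, 1e6, 1e12):
    print(f"x={x:g}  x^3/e^x={power_over_exp(x, 3.0):.3e}  log x/x^0.5={log_over_power(x, 0.5):.3e}")
for x in (1e-2, 1e-6, 1e-12):
    print(f"x={x:g}  x^0.5 log x={power_times_log(x, 0.5):.3e}")
```

For small $a$ the decay can set in late: differentiating shows that $(\log x)/x^a$ increases up to $x = e^{1/a}$ (about 22026 when $a = 0.1$) before decreasing to 0.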
We shall prove this for the more general case of functions of several variables in
§2.10.
EXERCISES
1. Suppose that $f$ is differentiable on the interval $I$ and that $f'(x) > 0$ for all
$x \in I$ except for finitely many points at which $f'(x) = 0$. Show that $f$ is
strictly increasing on $I$.
3. Let $f$ be the function in Exercise 2, and let $g(x) = f(x) + \frac12 x$. Show that
$g'(0) > 0$ but that there is no neighborhood of 0 on which $g$ is increasing.
(More precisely, every interval containing 0 has subintervals on which $g$ is
decreasing.)
5. Suppose that $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, and that the
right-hand limit $L = \lim_{x\to a+} f'(x)$ exists. Show that the right-hand derivative
$f'_+(a)$ exists and equals $L$. (Hint: Consider the difference quotients defining
$f'_+(a)$ and use the mean value theorem.) Of course, the analogous result for
left-hand limits at $b$ also holds.
$$\lim_{h\to 0} \frac{f(a + 2h) - 2f(a + h) + f(a)}{h^2} = f''(a), \qquad \lim_{h\to 0} \frac{f(a + 3h) - 3f(a + 2h) + 3f(a + h) - f(a)}{h^3} = f^{(3)}(a).$$
Can you find the generalization to higher derivatives?
7. Show that for any $a, b \in \mathbb{R}$, $\lim_{x\to 0}(1 + ax)^{b/x} = e^{ab}$. (Hint: Take logarithms.)
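8. Suppose $\mathbf{f}$ and $\mathbf{g}$ are differentiable functions on $\mathbb{R}$ with values in $\mathbb{R}^n$.
a. Show that $(\mathbf{f} \cdot \mathbf{g})' = \mathbf{f}' \cdot \mathbf{g} + \mathbf{f} \cdot \mathbf{g}'$.
b. Suppose also that $n = 3$, and show that $(\mathbf{f} \times \mathbf{g})' = \mathbf{f}' \times \mathbf{g} + \mathbf{f} \times \mathbf{g}'$.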
9. Define the function $f$ by $f(x) = e^{-1/x^2}$ if $x \ne 0$, $f(0) = 0$.
2.2. Differentiability in Several Variables 53
a. Show that $\lim_{x\to 0} f(x)/x^n = 0$ for all $n > 0$. (You'll find that a simple-minded application of Theorem 2.10 doesn't work. Try setting $y = 1/x^2$
instead.)
b. Show that $f$ is differentiable at $x = 0$ and that $f'(0) = 0$.
c. Show by induction on $k$ that for $x \ne 0$, $f^{(k)}(x) = P(1/x)e^{-1/x^2}$, where
$P$ is a polynomial of degree $3k$.
d. Show by induction on $k$ that $f^{(k)}(0)$ exists and equals 0 for all $k$. (Use the
results of (a) and (c) to compute the derivative of $f^{(k-1)}$ at $x = 0$ directly
from the definition, as in (b).)
The upshot is that $f$ possesses derivatives of all orders at every point and that
$f^{(k)}(0) = 0$ for all $k$.
10. Exercise 2 shows that it is possible for $f'$ to exist at every point of an interval
$I$ but to have discontinuities. It is an intriguing fact that when $f'$ exists at every
point of $I$, it has the intermediate value property whether or not it is continuous.
More precisely:
Darboux's Theorem. Suppose $f$ is differentiable on $[a, b]$. If $v$ is any number between $f'(a)$ and $f'(b)$, there is a point $c \in (a, b)$ such that $f'(c) = v$.
Prove Darboux's theorem, as follows: To simplify the notation, consider
the case $a = 0$, $b = 1$. Define $h : [0,2] \to \mathbb{R}$ by setting $h(0) = f'(0)$,
$$h(x) = \frac{f(x) - f(0)}{x} \quad (0 < x \le 1), \qquad h(x) = \frac{f(1) - f(x-1)}{2 - x} \quad (1 < x < 2),$$
and $h(2) = f'(1)$. Show that $h$ is continuous on $[0,2]$ and apply the intermediate value theorem to it. (This argument has a simple geometric interpretation,
which you can find if you think of $h(x)$ as the slope of the chord joining a
certain pair of points on the graph of $f$.)
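The two difference quotients displayed in the exercise that asks for their generalization to higher derivatives can be tried numerically. This sketch evaluates them for $f = \exp$ at $a = 0$ (where every derivative equals 1) and for $f = \sin$ at $a = 0$ (where $f'' = 0$ and $f^{(3)} = -1$). It is only an illustration, not a solution: the quotients are forward differences whose error is roughly proportional to $h$, and very small $h$ eventually magnifies rounding error because of the division by $h^2$ or $h^3$.

```python
import math


def second_difference(f, a, h):
    """[f(a+2h) - 2f(a+h) + f(a)] / h^2, which should approach f''(a)."""
    return (f(a + 2 * h) - 2 * f(a + h) + f(a)) / h ** 2


def third_difference(f, a, h):
    """[f(a+3h) - 3f(a+2h) + 3f(a+h) - f(a)] / h^3, which should approach f'''(a)."""
    return (f(a + 3 * h) - 3 * f(a + 2 * h) + 3 * f(a + h) - f(a)) / h ** 3


for h in (1e-1, 1e-2, 1e-3):
    print(f"h={h:g}"
          f"  exp: {second_difference(math.exp, 0.0, h):.6f} {third_difference(math.exp, 0.0, h):.6f}"
          f"  sin: {second_difference(math.sin, 0.0, h):.6f} {third_difference(math.sin, 0.0, h):.6f}")
```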
The simplest notion of derivative for a function of several variables is that of partial
derivatives, which are just the derivatives of the function with respect to each of
its variables when the others are held fixed. That is, the partial derivative of a
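function $f(x_1, \ldots, x_n)$ with respect to the variable $x_j$ is
$$\lim_{h\to 0} \frac{f(x_1, \ldots, x_j + h, \ldots, x_n) - f(x_1, \ldots, x_j, \ldots, x_n)}{h}.$$
The most common notations for the partial derivative just defined are
$$\frac{\partial f}{\partial x_j}, \qquad f_{x_j}, \qquad f_j.$$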
The first one is a modification of the Leibniz notation $df/dx$ for ordinary derivatives with the $d$ replaced by the "curly d" $\partial$. The second one, with the variable of
differentiation indicated merely as a subscript on the function, is often used when
the first one seems too cumbersome. The third one is a variation on the second one
that is used when one does not want to commit oneself to naming the independent
variables but wants to speak of "the partial derivative of $f$ with respect to its $j$th
variable." The notations $f_{x_j}$ and $f_j$ have the disadvantage that they may conflict
with other uses of subscripts (for example, denoting an ordered list of functions
by $f_1, f_2, f_3, \ldots$). It has therefore become increasingly common in advanced mathematics to use the notations $\partial_{x_j} f$ and $\partial_j f$ instead, which are reasonably compact
and at the same time quite unambiguous.
The partial derivatives of a function give information about how the value of
the function changes when just one of the independent variables changes; that is,
they tell how the function varies along the lines parallel to the coordinate axes.
Sometimes this is just what is needed, but often we want something more. We may
want to know how the function behaves when several of the variables are changed at
once; or we may want to consider a new coordinate system, rotated with respect to
the old one, and ask how the function varies along the lines parallel to the new axes.
Do the partial derivatives provide such information? Without additional conditions
on the function, the answer is no.
Clearly $\partial_x f(0,0)$ and $\partial_y f(0,0)$ aren't describing the behavior of $f$ near the origin
very well: when either $x$ or $y$ is varied while the other is held fixed at 0, $f$
doesn't change at all, but when both are varied at once, $f$ can change quite
drastically!
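The formula (2.14) itself is not reproduced in this section. The partial derivatives quoted for it later, $\partial_x f = (y^3 - x^2y)/(x^2+y^2)^2$ and $\partial_y f = (x^3 - xy^2)/(x^2+y^2)^2$, are those of $f(x,y) = xy/(x^2+y^2)$ with $f(0,0) = 0$, so this sketch assumes that form. It evaluates $f$ along both axes and along the diagonal $y = x$ at points approaching the origin.

```python
def f(x, y):
    """Assumed form of (2.14): xy/(x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)


for t in (1e-1, 1e-3, 1e-6):
    # along the x-axis, the y-axis, and the diagonal y = x
    print(f"t={t:g}  f(t,0)={f(t, 0.0)}  f(0,t)={f(0.0, t)}  f(t,t)={f(t, t)}")
```

Since $f$ vanishes on both axes, both difference quotients at the origin are 0, so $\partial_x f(0,0) = \partial_y f(0,0) = 0$; yet $f = 1/2$ at every point of the diagonal other than the origin, so $f$ is not even continuous at $(0,0)$.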
We need to give more thought to what it should mean for a function of several
variables to be differentiable. The right idea is provided by the characterization of
differentiability in one variable that we developed in the preceding section. Namely,
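a function $f(x)$ is differentiable at a point $x = a$ if there is a linear function $l(x)$
such that $l(a) = f(a)$ and the difference $f(x) - l(x)$ tends to zero faster than $x - a$
as $x$ approaches $a$. Now, the general linear¹ function of $n$ variables has the form
$$l(\mathbf{x}) = c_1x_1 + \cdots + c_nx_n + b = \mathbf{c} \cdot \mathbf{x} + b,$$
and the condition $l(\mathbf{a}) = f(\mathbf{a})$ forces $b$ to be $f(\mathbf{a}) - \mathbf{c} \cdot \mathbf{a}$, so that
$l(\mathbf{x}) = f(\mathbf{a}) + \mathbf{c} \cdot (\mathbf{x} - \mathbf{a})$. With this in mind, here is the formal definition.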
A function $f$ defined on an open set $S \subset \mathbb{R}^n$ is called differentiable at a point
$\mathbf{a} \in S$ if there is a vector $\mathbf{c} \in \mathbb{R}^n$ such that
¹Unfortunately the term "linear" has two common meanings as applied to functions: "first-degree
polynomial" and "satisfying $l(a\mathbf{x} + b\mathbf{y}) = al(\mathbf{x}) + bl(\mathbf{y})$." The first meaning (the one used here)
allows a constant term; the second does not. See Appendix A, (A.5).
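What does this mean? First, let us establish the geometric intuition. If $n = 2$,
the graph of the equation $z = f(\mathbf{x})$ (with $\mathbf{x} = (x, y)$) represents a surface in
3-space, and the graph of the equation $z = f(\mathbf{a}) + \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ ($\mathbf{x}$ is the
variable; $\mathbf{a}$ is fixed) represents a plane. These two objects both pass through the
point $(\mathbf{a}, f(\mathbf{a}))$, and at nearby points $\mathbf{x} = \mathbf{a} + \mathbf{h}$ we have
$$z_{\text{surface}} - z_{\text{plane}} = f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a}) \cdot \mathbf{h}.$$
Condition (2.16) says precisely that this difference tends to zero faster than $\mathbf{h}$ as
$\mathbf{h} \to \mathbf{0}$. Geometrically, this means that the plane $z = f(\mathbf{a}) + \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ is
the tangent plane to the surface $z = f(\mathbf{x})$ at $\mathbf{x} = \mathbf{a}$, as indicated in Figure 2.1.
The same interpretation is valid in any number of variables, with a little stretch of
the imagination: The equation $z = f(\mathbf{x})$ represents a "hypersurface" in $\mathbb{R}^{n+1}$ with
coordinates $(x_1, \ldots, x_n, z)$, and the equation $z = f(\mathbf{a}) + \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ represents
its "tangent hyperplane" at $\mathbf{a}$.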
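Condition (2.16) can be watched numerically for a smooth function. This sketch uses the hypothetical example $f(x, y) = e^x \sin y$ at $\mathbf{a} = (0.3, 0.4)$, with gradient $(e^x \sin y, e^x \cos y)$ computed by hand, and shrinks $\mathbf{h}$ along the fixed unit direction $(0.6, 0.8)$. It prints the ratio $|f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a}) \cdot \mathbf{h}|/|\mathbf{h}|$, which should tend to zero (here roughly in proportion to $|\mathbf{h}|$). Checking one direction is of course much weaker than the definition, which requires the ratio to tend to zero however $\mathbf{h} \to \mathbf{0}$.

```python
import math


def f(x, y):
    return math.exp(x) * math.sin(y)


def grad_f(x, y):
    # computed by hand for f(x, y) = e^x sin y
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))


def tangent_plane_error_ratio(a, h):
    """|f(a+h) - f(a) - grad f(a) . h| / |h| for a point a and an increment h in R^2."""
    (a1, a2), (h1, h2) = a, h
    g1, g2 = grad_f(a1, a2)
    error = f(a1 + h1, a2 + h2) - f(a1, a2) - (g1 * h1 + g2 * h2)
    return abs(error) / math.hypot(h1, h2)


a = (0.3, 0.4)
for size in (1e-1, 1e-2, 1e-3, 1e-4):
    h = (0.6 * size, 0.8 * size)   # |h| = size
    print(f"|h|={size:g}  ratio={tangent_plane_error_ratio(a, h):.3e}")
```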
Next, let us establish the connection with partial derivatives and the uniqueness
of the vector $\mathbf{c}$ in (2.15). Suppose $f$ is differentiable at $\mathbf{a}$. If we take the increment
$\mathbf{h}$ in (2.16) to be of the form $\mathbf{h} = (h, 0, \ldots, 0)$ with $h \in \mathbb{R}$, we have $\mathbf{c} \cdot \mathbf{h} = c_1h$
and $|\mathbf{h}| = \pm h$ (depending on the sign of $h$). Thus (2.16) says (after multiplying
through by $-1$ if $h$ is negative) that
The converses of Theorems 2.17 and 2.18 are false. The continuity of $f$ does
not imply the differentiability of $f$ even in dimension $n = 1$ (think of functions like
$f(x) = |x|$ whose graphs have corners). When $n > 1$, the mere existence of the
partial derivatives of $f$ does not imply the differentiability of $f$ either. The example
(2.14) demonstrates this: Its partial derivatives exist, but it is not continuous at the
origin, so it cannot be differentiable there.
To restate what we have just shown: For a function $f$ to be differentiable at $\mathbf{a}$
it is necessary for the partial derivatives $\partial_j f(\mathbf{a})$ to exist, but not sufficient. How,
then, do we know when a function is differentiable? Fortunately, there is a simple
condition, not too much stronger than the existence of the partial derivatives, that
guarantees differentiability.
2.19 Theorem. Let $f$ be a function defined on an open set in $\mathbb{R}^n$ that contains the
point $\mathbf{a}$. Suppose that the partial derivatives $\partial_j f$ all exist on some neighborhood of
$\mathbf{a}$ and that they are continuous at $\mathbf{a}$. Then $f$ is differentiable at $\mathbf{a}$.
Proof. Let's consider the case n = 2, to keep the notation simple. We wish to show
that
for some $c_2$ between 0 and $h_2$. Substituting these results back into (2.21) and then
into the left side of (2.20), we obtain
Now let $\mathbf{h} \to \mathbf{0}$. The expressions in brackets tend to 0 because the partial derivatives $\partial_j f$ are continuous at $\mathbf{a}$, and the ratios $h_1/|\mathbf{h}|$ and $h_2/|\mathbf{h}|$ are bounded by 1 in
absolute value. Thus (2.20) is valid and $f$ is differentiable at $\mathbf{a}$.
The idea for general $n$ is exactly the same. We write $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ as the
sum of $n$ increments, each of which involves a change in only one variable; for
example, the first of them is
We then use the mean value theorem to express each difference in terms of a
partial derivative of $f$ and proceed as before. □
A function $f$ whose partial derivatives $\partial_j f$ all exist and are continuous on an
open set $S$ is said to be of class $C^1$ on $S$. For short, we shall also say that "$f$ is
$C^1$ on $S$" or "$f \in C^1(S)$" and refer to "a $C^1$ function $f$." Theorems 2.17 and 2.19
then say that
The reverse implications are false. We already know that existence of partial deriva
tives does not imply differentiability, and there are differentiable functions whose
derivatives are discontinuous. The standard example in one variable is the function
in Exercise 2, §2.1, and it is easy to generate higher-dimensional examples from
this one.
For most of the elementary functions that we shall work with, the continuity
of the partial derivatives is obvious by inspection, so verifying the differentiability
of a function is usually no problem. For example, for $(x, y) \ne (0,0)$ the partial
derivatives of our old friend (2.14) are
$$\partial_x f(x,y) = \frac{y^3 - x^2y}{(x^2 + y^2)^2}, \qquad \partial_y f(x,y) = \frac{x^3 - xy^2}{(x^2 + y^2)^2},$$
which are continuous at every point except the origin. Thus
$f$ is differentiable at every point except the origin.
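The two formulas can be spot-checked against symmetric difference quotients at a point away from the origin. This sketch again assumes that (2.14) is $f(x,y) = xy/(x^2+y^2)$, the function whose partial derivatives these are; the point $(1, 2)$ and the step size are arbitrary choices.

```python
def f(x, y):
    """Assumed form of (2.14), valid away from the origin."""
    return x * y / (x * x + y * y)


def dx_f(x, y):
    return (y ** 3 - x * x * y) / (x * x + y * y) ** 2


def dy_f(x, y):
    return (x ** 3 - x * y * y) / (x * x + y * y) ** 2


def central_difference(fun, x, y, i, h=1e-5):
    """Symmetric difference quotient in the x-direction (i=0) or the y-direction (i=1)."""
    if i == 0:
        return (fun(x + h, y) - fun(x - h, y)) / (2 * h)
    return (fun(x, y + h) - fun(x, y - h)) / (2 * h)


x0, y0 = 1.0, 2.0
print(dx_f(x0, y0), central_difference(f, x0, y0, 0))   # both close to 6/25 = 0.24
print(dy_f(x0, y0), central_difference(f, x0, y0, 1))   # both close to -3/25 = -0.12
```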
We conclude this section by examining a few ramifications of the notion of
differentiability.
where the error term is negligibly small in comparison with $\mathbf{h}$. If we neglect the
error term, the resulting approximation to the increment $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ is called
the differential of $f$ at $\mathbf{a}$ and is denoted by $df(\mathbf{a}; \mathbf{h})$ or $df_\mathbf{a}(\mathbf{h})$:
This follows from (2.22) and the fact that the partial derivatives obey these rules.
We'll see later how differentials interact with the chain rule.
Differentials are handy for approximating small changes in a function. Here's
an example:
Example 3. A right circular cone has height 5 and base radius 3. (a) About
how much does the volume increase if the height is increased to 5.02 and the
radius is increased to 3.01? (b) If the height is increased to 5.02, by about how
much should the radius be decreased to keep the volume constant?
Solution. The volume of a cone is given by $V = \frac13\pi r^2h$, so
$dV = \frac23\pi rh\,dr + \frac13\pi r^2\,dh$. (a) If $r = 3$, $h = 5$, $dr = .01$, and $dh = .02$, we
have $dV = \frac23\pi(3)(5)(.01) + \frac13\pi(3^2)(.02) = .16\pi \approx .50$. (b) If $r = 3$, $h = 5$,
$dh = .02$, as in (a) we have $dV = 10\pi\,dr + .06\pi$, so $dV = 0$ if $dr = -.006$.
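The differential estimates can be compared with the exact changes. This sketch (function names hypothetical) computes the exact volume change in (a) and, for (b), the exact radius that keeps $V$ fixed when the height becomes 5.02, obtained by solving $\frac13\pi r^2(5.02) = \frac13\pi(3^2)(5)$ for $r$.

```python
import math


def cone_volume(r, height):
    """V = (1/3) pi r^2 h."""
    return math.pi * r * r * height / 3


def cone_differential(r, height, dr, dheight):
    """dV = (2/3) pi r h dr + (1/3) pi r^2 dh."""
    return (2 * math.pi * r * height * dr + math.pi * r * r * dheight) / 3


r0, h0 = 3.0, 5.0

# (a) differential estimate vs. exact change
estimate = cone_differential(r0, h0, 0.01, 0.02)          # 0.16*pi, about 0.503
exact = cone_volume(3.01, 5.02) - cone_volume(r0, h0)     # about 0.504
print(estimate, exact)

# (b) exact radius keeping the volume fixed when the height becomes 5.02
r_exact = math.sqrt(3 * cone_volume(r0, h0) / (math.pi * 5.02))
print(r_exact - r0)                                       # about -0.00598, vs. dr = -0.006
```

In both parts the differential is off only in the third significant figure, which is the kind of accuracy one expects from neglecting terms of second order in $dr$ and $dh$.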
provided that the limit exists. For example, if $\mathbf{u}$ is the unit vector in the positive
$j$th coordinate direction (that is, $\mathbf{u} = (0, \ldots, 1, \ldots, 0)$ with the 1 in the $j$th place),
then $\partial_\mathbf{u} f$ is just the partial derivative $\partial_j f$.
In either case, this quantity tends to 0 as $t \to 0$, which means that $\partial_\mathbf{u} f(\mathbf{a})$ exists
and equals $\nabla f(\mathbf{a}) \cdot \mathbf{u}$. □
direction as $\nabla f(\mathbf{a})$. Thus, $\nabla f(\mathbf{a})$ is the vector whose magnitude is the largest directional derivative of $f$ at $\mathbf{a}$, and whose direction is the direction of that derivative.
In other words, $\nabla f(\mathbf{a})$ points in the direction of steepest increase of $f$ at $\mathbf{a}$, and its
magnitude is the rate of increase of $f$ in that direction.
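The characterization of $\nabla f(\mathbf{a})$ as the direction of steepest increase can be checked by brute force: estimate the directional derivative in many unit directions $\mathbf{u} = (\cos\theta, \sin\theta)$ by a symmetric difference quotient, without using the gradient, and see where the largest value occurs. The function $f(x, y) = x^2 + 3y$ and the point $(1, 1)$ are hypothetical choices; there $\nabla f = (2, 3)$ by hand.

```python
import math


def f(x, y):
    # hypothetical example function (not from the text)
    return x * x + 3 * y


def directional_derivative(fun, a, u, t=1e-5):
    """Central-difference estimate of the derivative of fun at a in the unit direction u."""
    (a1, a2), (u1, u2) = a, u
    return (fun(a1 + t * u1, a2 + t * u2) - fun(a1 - t * u1, a2 - t * u2)) / (2 * t)


a = (1.0, 1.0)
grad = (2 * a[0], 3.0)   # (df/dx, df/dy) = (2x, 3), computed by hand

# scan 3600 equally spaced directions; tuples compare by their first entry
best_val, best_theta = max(
    (directional_derivative(f, a, (math.cos(th), math.sin(th))), th)
    for th in (2 * math.pi * k / 3600 for k in range(3600))
)
print(best_val, math.hypot(*grad))               # largest rate vs. |grad f(a)|
print(best_theta, math.atan2(grad[1], grad[0]))  # its direction vs. direction of grad f(a)
```

The grid only resolves angles to about $2\pi/3600$, so the agreement is to roughly three decimal places in $\theta$, while the largest value matches $|\nabla f(\mathbf{a})| = \sqrt{13}$ much more closely because the maximum is flat.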
EXERCISES
1. For each of the following functions $f$, (i) compute $\nabla f$, (ii) find the directional
derivative of $f$ at the point $(1, -2)$ in the direction (§,§).
a. $f(x,y) = x^2y + \sin \pi xy$.
There are several different but closely related versions of the chain rule for functions of several variables. The most basic one concerns the situation where we have
a function $f(x_1, \ldots, x_n)$ and the variables $x_1, \ldots, x_n$ are themselves functions of
a single real variable $t$. To be precise, suppose $x_j = g_j(t)$, or $\mathbf{x} = \mathbf{g}(t)$; we then
have the composite function $\varphi(t) = f(\mathbf{g}(t))$.
We recall that the derivative g'(t) is defined componentwise:
2.26 Theorem (Chain Rule I). Suppose that $\mathbf{g}(t)$ is differentiable at $t = a$, $f(\mathbf{x})$
is differentiable at $\mathbf{x} = \mathbf{b}$, and $\mathbf{b} = \mathbf{g}(a)$. Then the composite function
$\varphi(t) = f(\mathbf{g}(t))$ is differentiable at $t = a$, and its derivative is given by
2.3. The Chain Rule 63
$$(2.27)\quad \frac{dw}{dt} = \frac{\partial w}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial w}{\partial x_n}\frac{dx_n}{dt}.$$
In the first equation we take $\mathbf{h} = \mathbf{g}(a + u) - \mathbf{g}(a)$. By the second equation, we also
have $\mathbf{h} = u\mathbf{g}'(a) + \mathbf{E}_2(u)$, and we are given that $\mathbf{g}(a) = \mathbf{b}$, so
where
We claim that the error term $E_3(u)$ satisfies $E_3(u)/u \to 0$ as $u \to 0$. Granted this,
we have
It also implies that when $u$ is small we have $|\mathbf{E}_2(u)| < |u|$ and hence
Now the second term in $E_3(u)$, namely $E_1(\mathbf{h})$, becomes negligibly small in comparison to $|\mathbf{h}|$ as $|\mathbf{h}| \to 0$, and the estimate above shows that $|\mathbf{h}|$ in turn is bounded
by a constant times $|u|$, so $E_1(\mathbf{h})$ becomes negligibly small in comparison to $|u|$ as
$u \to 0$, which means that $E_1(\mathbf{h})/u \to 0$ as desired. □
$$\frac{dw}{dt} = \frac{d}{dt} f(t^4 - t, \sin 3t, e^{-2t}) = (\partial_1 f) \cdot (4t^3 - 1) + (\partial_2 f) \cdot (3\cos 3t) + (\partial_3 f) \cdot (-2e^{-2t}),$$
where the partial derivatives $\partial_j f$ are all evaluated at $(t^4 - t, \sin 3t, e^{-2t})$.
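The formula can be checked numerically once a concrete outer function is chosen. The text leaves $f$ unspecified; this sketch uses the hypothetical $f(x, y, z) = xy + z^2$, for which $(\partial_1 f, \partial_2 f, \partial_3 f) = (y, x, 2z)$, and compares the chain-rule value of $dw/dt$ with a symmetric difference quotient of $w(t) = f(t^4 - t, \sin 3t, e^{-2t})$.

```python
import math


def g(t):
    """Inner curve from the example: (t^4 - t, sin 3t, e^{-2t})."""
    return (t ** 4 - t, math.sin(3 * t), math.exp(-2 * t))


def g_prime(t):
    return (4 * t ** 3 - 1, 3 * math.cos(3 * t), -2 * math.exp(-2 * t))


def f(x, y, z):
    # hypothetical outer function; the text does not specify f
    return x * y + z * z


def grad_f(x, y, z):
    return (y, x, 2 * z)


def chain_rule_derivative(t):
    """dw/dt = sum over j of (d_j f)(g(t)) * g_j'(t)."""
    return sum(p * q for p, q in zip(grad_f(*g(t)), g_prime(t)))


def numeric_derivative(t, h=1e-6):
    """Symmetric difference quotient of w(t) = f(g(t))."""
    return (f(*g(t + h)) - f(*g(t - h))) / (2 * h)


print(chain_rule_derivative(0.7), numeric_derivative(0.7))
```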
Suppose now that the variables $x_1, \ldots, x_n$ are differentiable functions, not of
a single real variable $t$, but of a family of variables $\mathbf{t} = (t_1, \ldots, t_m)$; say,
$x_j = g_j(t_1, \ldots, t_m)$, or $\mathbf{x} = \mathbf{g}(\mathbf{t})$. If $f$ is a differentiable function of $\mathbf{x}$, we then have the
composite function $\varphi(\mathbf{t}) = f(\mathbf{g}(\mathbf{t}))$. The chain rule, as stated above, can be used
to compute the partial derivatives of $\varphi$ with respect to the variables $t_k$. Indeed, we
simply fix all but one of those variables and apply the chain rule to the resulting
function of the remaining single variable to obtain
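$$(2.28)\quad \frac{\partial w}{\partial t_k} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_k} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_k}.$$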
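To be precise, this calculation shows that if the partial derivatives $\partial\mathbf{g}/\partial t_k$ exist
at $\mathbf{t} = \mathbf{a}$ and if $f$ is differentiable at $\mathbf{x} = \mathbf{b} = \mathbf{g}(\mathbf{a})$, then the partial derivatives
$\partial\varphi/\partial t_k$ exist at $\mathbf{t} = \mathbf{a}$ and are given by (2.28). It also shows that if $\mathbf{g}$ is of class
$C^1$ near $\mathbf{a}$ and $f$ is of class $C^1$ near $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then $\varphi$ is of class $C^1$, and in
particular is differentiable, near $\mathbf{a}$. Indeed, under these hypotheses, (2.28) shows
that the partial derivatives $\partial\varphi/\partial t_k$ are continuous.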
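Formula (2.28) can be exercised the same way with two inner variables. In this sketch, which uses hypothetical choices throughout, $\mathbf{x} = \mathbf{g}(r, \theta) = (r\cos\theta, r\sin\theta)$ and $f(x, y) = x^2y$; the chain-rule sum $\sum_j (\partial f/\partial x_j)(\partial x_j/\partial t_k)$ is compared with difference quotients of $w = f(\mathbf{g}(r, \theta))$ and, for $\partial w/\partial r$, with the closed form $3r^2\cos^2\theta\sin\theta$ obtained from $w = r^3\cos^2\theta\sin\theta$.

```python
import math


def f(x, y):
    # hypothetical outer function
    return x * x * y


def grad_f(x, y):
    return (2 * x * y, x * x)


def g(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian_g(r, theta):
    # row j holds (dx_j/dr, dx_j/dtheta)
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta), r * math.cos(theta)))


def chain_rule_partials(r, theta):
    """(dw/dr, dw/dtheta) from (2.28): sum over j of (df/dx_j)(dx_j/dt_k)."""
    df = grad_f(*g(r, theta))
    J = jacobian_g(r, theta)
    return tuple(sum(df[j] * J[j][k] for j in range(2)) for k in range(2))


def numeric_partials(r, theta, h=1e-6):
    """Symmetric difference quotients of w(r, theta) = f(g(r, theta))."""
    def w(r_, th_):
        return f(*g(r_, th_))
    return ((w(r + h, theta) - w(r - h, theta)) / (2 * h),
            (w(r, theta + h) - w(r, theta - h)) / (2 * h))


print(chain_rule_partials(2.0, 0.5), numeric_partials(2.0, 0.5))
```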
It is also natural to ask whether the composite function $f \circ \mathbf{g}$ is differentiable
when $f$ and $\mathbf{g}$ are only assumed to be differentiable rather than $C^1$. The answer is
affirmative. When $t$ is only a single real variable, this result is contained in the chain
rule as stated and proved above. The proof for the general case, $\mathbf{t} = (t_1, \ldots, t_m)$,
is almost identical except that the notation is a little messier, and we shall not take
the trouble to write it out. But we shall give a formal statement of the result:
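2.29 Theorem (Chain Rule II). Suppose that $g_1, \ldots, g_n$ are functions of $\mathbf t = (t_1, \ldots, t_m)$ and $f$ is a function of $\mathbf x = (x_1, \ldots, x_n)$. Let $\mathbf b = \mathbf g(\mathbf a)$ and $\varphi = f \circ \mathbf g$. Suppose $g_1, \ldots, g_n$ are differentiable at $\mathbf a$ (resp. of class $C^1$ near $\mathbf a$) and $f$ is differentiable at $\mathbf b$ (resp. of class $C^1$ near $\mathbf b$). Then $\varphi$ is differentiable at $\mathbf a$ (resp. of class $C^1$ near $\mathbf a$), and its partial derivatives are given by
$$\frac{\partial\varphi}{\partial t_k} = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial t_k}, \tag{2.30}$$
where the derivatives $\partial f/\partial x_j$ are evaluated at $\mathbf b$ and the derivatives $\partial\varphi/\partial t_k$ and $\partial x_j/\partial t_k$ are evaluated at $\mathbf a$.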
The chain rule (2.30) has a neat interpretation in terms of differentials. Let $w = f(\mathbf x)$. If we regard $x_1, \ldots, x_n$ as independent variables, we have
On the other hand, if we regard $x_1, \ldots, x_n$ as functions of the variables $t_1, \ldots, t_m$ and $w$ as the composite function $f(\mathbf x(\mathbf t))$, we have
and
The content of the chain rule (2.30) is precisely that this last expression for $dw$ coincides with (2.33). In other words, the differential formalism has the chain rule "built in," just as it does in one variable, where the chain rule $dw/dt = (dw/dx)(dx/dt)$ is just a matter of "canceling the $dx$'s."
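The following sketch writes out the substitution that the differential formalism performs. It uses only (2.28), which justifies the last equality.

```latex
dw = \sum_{j=1}^n \frac{\partial w}{\partial x_j}\,dx_j,
\qquad
dx_j = \sum_{k=1}^m \frac{\partial x_j}{\partial t_k}\,dt_k,
\qquad\text{so}\qquad
dw = \sum_{k=1}^m \Bigl(\sum_{j=1}^n \frac{\partial w}{\partial x_j}\,\frac{\partial x_j}{\partial t_k}\Bigr)dt_k
   = \sum_{k=1}^m \frac{\partial w}{\partial t_k}\,dt_k .
```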
The preceding discussion concerns the situation where the variable $w$ depends on a set of variables $x_j$, and the $x_j$'s depend on a different set of variables $t_k$. However, in many situations the variables on different "levels" can get mixed up with each other. The typical example is as follows. Consider a physical quantity $w = f(x, y, z, t)$ whose value depends on the position $(x, y, z)$ and the time $t$ (temperature, for example, or air pressure in a region of the atmosphere). Consider also a vehicle moving through space, so that its coordinates $(x, y, z)$ are functions of $t$. We wish to know how the quantity $w$ varies in time, as measured by an observer on the vehicle. That is, we are interested in the behavior of the composite function
$$w = f(x(t), y(t), z(t), t).$$
Here $t$ enters not only as a "first-level" variable, as the last argument of $f$, but also as a "second-level" variable through the $t$-dependence of $x$, $y$, $z$.
How should this be handled? There is no real problem; the only final independent variable is $t$, so the chain rule in the form (2.27) can be applied:
$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt} + \frac{\partial w}{\partial t}. \tag{2.34}$$
In the last term we have omitted the derivative $dt/dt$, which of course equals 1. (If this makes you nervous, denote the fourth variable in $f$ by $u$ instead of $t$; then we are considering $w = f(x(t), y(t), z(t), u(t))$ where $u(t) = t$.)
Notice the subtle use of notation. The $dw/dt$ on the left of (2.34) denotes the "total derivative" of $w$, taking into account all the ways in which $w$ depends on $t$. The $\partial w/\partial t$ on the right, by contrast, denotes the partial derivative that involves only the explicit dependence of the function $f$ on its fourth variable $t$. This notation works well enough in this situation, but it becomes inadequate if there is more than one final independent variable.
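The difference between the total derivative $dw/dt$ and the explicit partial $\partial w/\partial t$ is easy to see numerically. This sketch assumes a hypothetical quantity $f(x, y, z, t) = xy + zt$ and a hypothetical path $(\cos t, \sin t, t^2)$. For these choices (2.34) gives $dw/dt = \cos 2t + 3t^2$, whereas $\partial w/\partial t = z = t^2$.

```python
import math

def f(x, y, z, t):
    # Hypothetical quantity depending on position (x, y, z) and time t.
    return x * y + z * t

def path(t):
    # Hypothetical trajectory of the vehicle.
    return (math.cos(t), math.sin(t), t ** 2)

def total_derivative(t, h=1e-6):
    """d/dt of f(x(t), y(t), z(t), t): every occurrence of t varies."""
    w = lambda s: f(*path(s), s)
    return (w(t + h) - w(t - h)) / (2 * h)

def explicit_partial_t(t, h=1e-6):
    """Partial derivative in the 4th slot, with (x, y, z) frozen at path(t)."""
    x, y, z = path(t)
    return (f(x, y, z, t + h) - f(x, y, z, t - h)) / (2 * h)

t0 = 0.7
print(total_derivative(t0), math.cos(2 * t0) + 3 * t0 ** 2)
print(explicit_partial_t(t0), t0 ** 2)
```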
Suppose, for example, that we are studying a function $w = f(x, y, t, s)$, and that $x$ and $y$ are themselves functions of the independent variables $t$ and $s$. Then the analogue of (2.34) would be
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial t},$$
but this is nonsense! The $\partial w/\partial t$'s on the left and on the right denote different things. In such a situation we must use one of the alternative notations for partial derivatives that offer more precision, or perhaps add some subscripts to the $\partial w/\partial t$'s to specify their meaning. In this case, if $x = \varphi(t, s)$ and $y = \psi(t, s)$, we could write
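Proof. Consider the function $\varphi(t) = f(t\mathbf x)$. On the one hand, since $f(t\mathbf x) = t^\alpha f(\mathbf x)$, we have $\varphi'(t) = \alpha t^{\alpha-1}f(\mathbf x) = \alpha t^{-1}f(t\mathbf x)$. On the other, by the chain rule we have
$$\varphi'(t) = \sum_{j=1}^n x_j\,\partial_j f(t\mathbf x) = \mathbf x\cdot\nabla f(t\mathbf x).$$
Setting $t = 1$ and equating the two expressions for $\varphi'(1)$, we obtain the asserted result. □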
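The identity proved above, $\mathbf x\cdot\nabla f(\mathbf x) = \alpha f(\mathbf x)$ for $f$ homogeneous of degree $\alpha$, is easy to test numerically. This is a minimal sketch. `euler_residual` is a hypothetical helper that approximates $\nabla f$ by central differences. The two test functions (degree 3 and degree 1) are our own examples.

```python
import math

def euler_residual(f, x, alpha, h=1e-6):
    """Return x . grad f(x) - alpha * f(x), with grad f from central differences."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return sum(xj * gj for xj, gj in zip(x, grad)) - alpha * f(x)

# Homogeneous of degree 3: cubic(t x) = t^3 cubic(x).
cubic = lambda v: v[0] ** 3 + 2 * v[0] * v[1] ** 2
# Homogeneous of degree 1: the Euclidean norm in R^2.
norm = lambda v: math.hypot(v[0], v[1])

print(euler_residual(cubic, [1.2, -0.7], 3))  # close to 0
print(euler_residual(norm, [3.0, 4.0], 1))    # close to 0
```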
We conclude this section with an additional geometric insight into the meaning of the gradient of a function. If $F$ is a differentiable function of $(x, y, z) \in \mathbb R^3$, the locus of the equation $F(x, y, z) = 0$ is typically a smooth two-dimensional surface $S$ in $\mathbb R^3$. (We shall consider this matter more systematically in Chapter 3.) Suppose that $(x, y, z) = \mathbf g(t)$ is a parametric representation of a smooth curve on $S$. On the one hand, by the chain rule we have $(d/dt)F(\mathbf g(t)) = \nabla F(\mathbf g(t))\cdot\mathbf g'(t)$. On the other hand, since the curve lies on $S$, we have $F(\mathbf g(t)) = 0$ for all $t$ and hence $(d/dt)F(\mathbf g(t)) = 0$. Thus, for any curve on $S$, the gradient of $F$ is orthogonal to the tangent vector to the curve at each point on the curve. Since such curves can go in any direction on the surface, we conclude that at any point $\mathbf a \in S$, $\nabla F(\mathbf a)$ is orthogonal to every vector that is tangent to $S$ at $\mathbf a$. (Of course, this is interesting only if $\nabla F(\mathbf a) \neq 0$.) We summarize:
$$S = \{(x, y, z) \in U : F(x, y, z) = 0\}$$
is a smooth surface. If $\mathbf a \in S$ and $\nabla F(\mathbf a) \neq 0$, then the vector $\nabla F(\mathbf a)$ is perpendicular, or normal, to the surface $S$ at $\mathbf a$.
2.38 Corollary. Under the conditions of the theorem, the equation of the tangent plane to $S$ at $\mathbf a$ is $\nabla F(\mathbf a)\cdot(\mathbf x - \mathbf a) = 0$.
This formula for the tangent plane to a surface agrees with the one we gave in
§2.2 when the surface is the graph of a function f(x,y). The easy verification is
left to the reader (Exercise 5).
A similar result holds if we have two equations $F(x, y, z) = 0$ and $G(x, y, z) = 0$. Each of them (usually) represents a surface, and the intersection of the two surfaces is (usually) a curve. At any point $\mathbf a$ on this curve, the vectors $\nabla F(\mathbf a)$ and $\nabla G(\mathbf a)$ are both perpendicular to the curve, and if they are linearly independent, they span the normal plane to the curve at $\mathbf a$.
These ideas carry over into dimensions other than 3. For $n = 2$, an equation $F(x, y) = 0$ typically represents a curve $C$, and $\nabla F(a, b)$ is normal to $C$ at each $(a, b) \in C$. For $n > 3$, we simply stretch our imagination to say that $\nabla F(\mathbf a)$ is normal to the hypersurface defined by $F(\mathbf x) = 0$ at $\mathbf x = \mathbf a$.
EXERCISES
The mean value theorem for functions of $n$ variables can be stated as follows. We recall that if $\mathbf a$ and $\mathbf b$ are two points in $\mathbb R^n$, the line passing through them can be described parametrically by $\mathbf g(t) = \mathbf a + t(\mathbf b - \mathbf a)$. In particular, the line segment whose endpoints are $\mathbf a$ and $\mathbf b$ is the set of points $\mathbf a + t(\mathbf b - \mathbf a)$ with $0 \le t \le 1$.
2.39 Theorem (Mean Value Theorem III). Let $S$ be a region in $\mathbb R^n$ that contains the points $\mathbf a$ and $\mathbf b$ as well as the line segment $L$ that joins them. Suppose that $f$ is a function defined on $S$ that is continuous at each point of $L$ and differentiable at each point of $L$ except perhaps the endpoints $\mathbf a$ and $\mathbf b$. Then there is a point $\mathbf c$ on $L$ such that
$$f(\mathbf b) - f(\mathbf a) = \nabla f(\mathbf c)\cdot(\mathbf b - \mathbf a).$$
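Proof. Let $\mathbf h = \mathbf b - \mathbf a$; then $L = \{\mathbf a + t\mathbf h : 0 \le t \le 1\}$. Define $\varphi(t) = f(\mathbf a + t\mathbf h)$ for $0 \le t \le 1$. Since $f$ is continuous on $L$, $\varphi$ is continuous on $[0, 1]$. Moreover, by the chain rule, $\varphi$ is differentiable on $(0, 1)$ and
$$\varphi'(t) = \nabla f(\mathbf a + t\mathbf h)\cdot\frac{d}{dt}(\mathbf a + t\mathbf h) = \nabla f(\mathbf a + t\mathbf h)\cdot\mathbf h = \nabla f(\mathbf a + t\mathbf h)\cdot(\mathbf b - \mathbf a).$$
By the one-variable mean value theorem, there is a point $u \in (0, 1)$ such that $\varphi(1) - \varphi(0) = \varphi'(u)\cdot(1 - 0) = \varphi'(u)$. Let $\mathbf c = \mathbf a + u\mathbf h$; then
$$f(\mathbf b) - f(\mathbf a) = \varphi(1) - \varphi(0) = \varphi'(u) = \nabla f(\mathbf c)\cdot(\mathbf b - \mathbf a). \qquad\square$$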
To state the principal corollaries of the mean value theorem, we need a definition. A set $S \subset \mathbb R^n$ is called convex if whenever $\mathbf a, \mathbf b \in S$, the line segment from $\mathbf a$ to $\mathbf b$ also lies in $S$. Clearly every convex set is arcwise connected (line segments are arcs!), but most connected sets are not convex. See Figure 2.4.
so $\mathbf a + t(\mathbf b - \mathbf a) \in B$. (We have used the fact that $t$ and $1 - t$ are both nonnegative when $0 \le t \le 1$.)
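Proof. The line segment from $\mathbf a$ to $\mathbf b$ lies in $S$, and for some $\mathbf c$ on this segment we have $f(\mathbf b) - f(\mathbf a) = \nabla f(\mathbf c)\cdot(\mathbf b - \mathbf a)$. Hence, by Cauchy's inequality, $|f(\mathbf b) - f(\mathbf a)| \le |\nabla f(\mathbf c)|\,|\mathbf b - \mathbf a| \le M|\mathbf b - \mathbf a|$. □
Proof. Pick $\mathbf a \in S$ and take $M = 0$ in Corollary 2.40. We conclude that for every $\mathbf b \in S$, $|f(\mathbf b) - f(\mathbf a)| = 0$, that is, $f(\mathbf b) = f(\mathbf a)$. □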
Figure 2.4: A convex set ($S_1$), a set that is connected but not convex ($S_2$), and a disconnected set.
EXERCISES
1. State and prove two analogues of Rolle's theorem for functions of several variables, whose hypotheses are, respectively, the following:
Let us be clear about what we mean by saying that "it is possible to solve for
y." First, we mean that it is possible to solve in principle, not necessarily that there
is an explicit formula for y. Second, there might be more than one solution, and
obtaining $y$ as a function of the $x_j$'s then involves making a definite choice among
the solutions; moreover, the domain of this function may be smaller than one would
suspect from the original equation.
Example 1.
a. Consider the equation $x - y - y^5 = 0$. It's easy to solve this for $x$ in terms of $y$, $x = y + y^5$, but there is no nice algebraic formula for $y$ in terms of $x$. However, $y + y^5$ is a strictly increasing function of $y$ (its derivative is $1 + 5y^4$, which is positive everywhere), and its values clearly range from $-\infty$ to $\infty$. So for each $x$ there is exactly one $y$ satisfying $x = y + y^5$, and we can call it $g(x)$. The object in such a situation is to use the equation $x = y + y^5$ to study the function $g$.
b. The equation $x^2 + y^2 + z^2 = 1$ can be solved for $z$ as a continuous function of $x$ and $y$ in two ways, $z = \sqrt{1 - x^2 - y^2}$ and $z = -\sqrt{1 - x^2 - y^2}$, both of which are defined only for $x^2 + y^2 \le 1$.
At this stage we are not going to worry about these matters, or about the question of when it is possible to solve the equation at all; such questions will be addressed in Chapter 3. Rather, we shall assume that there is a differentiable function $g(x_1, \ldots, x_n)$, defined for $x_1, \ldots, x_n$ in some region $S \subset \mathbb R^n$, so that the equation $F(x_1, \ldots, x_n, y) = 0$ is satisfied identically when $g(x_1, \ldots, x_n)$ is substituted for $y$:
$$F(x_1, \ldots, x_n, g(x_1, \ldots, x_n)) = 0. \tag{2.43}$$
In this situation we can use the chain rule to compute the partial derivatives of $g$ in terms of the partial derivatives of $F$, simply by differentiating the equation (2.43) with respect to the variables $x_j$:
$$\partial_j F + \partial_{n+1}F\,\frac{\partial g}{\partial x_j} = 0, \qquad\text{so}\qquad \frac{\partial g}{\partial x_j} = -\frac{\partial_j F}{\partial_{n+1}F}. \tag{2.44}$$
Example 1 (continued).
a. Differentiation of the equation $x - y - y^5 = 0$ with respect to $x$ yields $1 - (dy/dx) - 5y^4(dy/dx) = 0$, or $dy/dx = 1/(1 + 5y^4)$. Of course, this gives $dy/dx$ in terms of $y$ instead of $x$, and we don't have a formula for $y$ in terms of $x$, but this is better than nothing!
b. Differentiation of $x^2 + y^2 + z^2 = 1$ with respect to $x$, with $z$ as the dependent variable, gives $2x + 2z(\partial z/\partial x) = 0$, or $\partial z/\partial x = -x/z$. It is easily verified that this formula is correct whether we take $z = \sqrt{1 - x^2 - y^2}$ or $z = -\sqrt{1 - x^2 - y^2}$.
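Part (a) can be checked numerically even without a formula for $g$. The sketch below solves $y + y^5 = x$ by Newton's method and evaluates $dy/dx = 1/(1 + 5y^4)$ from the implicit differentiation above. It then compares that value with a difference quotient of the numerically computed $g$. The helper names are hypothetical. At $x = 2$ the solution is $y = 1$, so $dy/dx = 1/6$.

```python
import math

def g(x, tol=1e-14, max_iter=100):
    """Solve y + y**5 = x for y by Newton's method."""
    # Start on the far side of the root, where y + y^5 is convex (or concave for x < 0),
    # so the Newton iterates move monotonically toward the root.
    y = x if abs(x) < 1 else math.copysign(abs(x) ** 0.2, x)
    for _ in range(max_iter):
        step = (y + y ** 5 - x) / (1 + 5 * y ** 4)
        y -= step
        if abs(step) < tol:
            break
    return y

def dy_dx(x):
    """Implicit differentiation of x - y - y^5 = 0: dy/dx = 1 / (1 + 5 y^4)."""
    y = g(x)
    return 1 / (1 + 5 * y ** 4)

x0, h = 2.0, 1e-6
numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
print(g(x0), dy_dx(x0), numeric)   # approximately 1.0, 0.16667, 0.16667
```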
$$w = x^2 + y^2 - x - y, \qquad \frac{\partial w}{\partial x} = 2x - 1.$$
The usual way to clarify this situation is to put subscripts on the partial derivatives to indicate which variables are being held fixed:
Thus, in Example 2,
$$\Bigl(\frac{\partial w}{\partial x}\Bigr)_y = 2x - 1, \qquad \Bigl(\frac{\partial w}{\partial x}\Bigr)_z = 2z.$$
The preceding ideas work in much the same way when we are given more than one constraint equation. For example, if we are given two equations $F(x, y, u, v) = 0$ and $G(x, y, u, v) = 0$, we may be able to solve them for the two variables $u$ and $v$ in terms of the other two variables $x$ and $y$. In this case the partial derivatives of $u$ and $v$ with respect to $x$, say, can be calculated by differentiating the equations $F = 0$ and $G = 0$, obtaining
$$\partial_x F + \partial_u F\,\frac{\partial u}{\partial x} + \partial_v F\,\frac{\partial v}{\partial x} = 0, \qquad \partial_x G + \partial_u G\,\frac{\partial u}{\partial x} + \partial_v G\,\frac{\partial v}{\partial x} = 0,$$
and then solving these (linear!) equations simultaneously for $\partial u/\partial x$ and $\partial v/\partial x$. By Cramer's rule (Appendix A, (A.54)), the result is
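Cramer's rule for this $2\times 2$ linear system can be written out as a small helper. The function name and argument order are our own (hypothetical). The inputs are the six partial derivatives $\partial_xF, \partial_uF, \partial_vF, \partial_xG, \partial_uG, \partial_vG$ at the point of interest. The check uses the linear constraints $F = u + 2v - x$, $G = 3u - v - y$, which can be solved explicitly as $u = (x + 2y)/7$, $v = (3x - y)/7$. Hence $\partial u/\partial x = 1/7$ and $\partial v/\partial x = 3/7$.

```python
from fractions import Fraction

def implicit_partials_x(Fx, Fu, Fv, Gx, Gu, Gv):
    """Return (du/dx, dv/dx) by solving, with Cramer's rule,
        Fu * ux + Fv * vx = -Fx
        Gu * ux + Gv * vx = -Gx
    """
    det = Fu * Gv - Fv * Gu  # Jacobian determinant of (F, G) with respect to (u, v)
    if det == 0:
        raise ValueError("Jacobian of (F, G) with respect to (u, v) is singular")
    ux = ((-Fx) * Gv - Fv * (-Gx)) / det
    vx = (Fu * (-Gx) - (-Fx) * Gu) / det
    return ux, vx

# F = u + 2v - x, G = 3u - v - y: all partials are constants.
print(implicit_partials_x(*map(Fraction, (-1, 1, 2, 0, 3, -1))))
# (Fraction(1, 7), Fraction(3, 7))
```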
We could solve these equations for $y'$ and $z'$ as they stand. Since we are interested in the answer at $(x, y, z) = (1, 0, 2)$, however, we can simplify matters by substituting in these values right now. The first equation reduces to $7 + z' - 64y' = 0$ and the second one to $2y' = 2 + y' - z'$, or
$$64y' - z' = 7, \qquad y' + z' = 2.$$
Solving these equations yields $y' = \frac{9}{65}$ and $z' = \frac{121}{65}$. Returning to the original question, then, $dy = y'\,dx = \frac{9}{65}(.02) = \frac{9}{3250}$ and $dz = z'\,dx = \frac{121}{65}(.02) = \frac{121}{3250}$.
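The arithmetic at the end of this example is easy to double-check with exact rational arithmetic. The sketch below rearranges the two reduced equations as $64y' - z' = 7$ and $y' + z' = 2$, solves them by Cramer's rule, and multiplies by $dx = .02$. The name `solve_2x2` is a hypothetical helper.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*p + a12*q = b1, a21*p + a22*q = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    p = (b1 * a22 - a12 * b2) / det
    q = (a11 * b2 - b1 * a21) / det
    return p, q

# 64 y' - z' = 7 and y' + z' = 2 at (x, y, z) = (1, 0, 2).
yp, zp = solve_2x2(Fraction(64), Fraction(-1), Fraction(7),
                   Fraction(1), Fraction(1), Fraction(2))
dx = Fraction(2, 100)
print(yp, zp)            # 9/65 121/65
print(yp * dx, zp * dx)  # 9/3250 121/3250
```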
EXERCISES
5. Let $V = \pi r^2h$ and $S = 2\pi r(r + h)$ (the volume and surface area of a circular
cylinder). Compute
where the subscript indicates the variable that is being held fixed.
$$\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial x} = -1,$$
provided that the symbols are interpreted properly. (Part of the problem is to
say what the proper interpretation is.)
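The partial derivatives of $\partial f/\partial x_j$ are denoted
$$\frac{\partial^2 f}{\partial x_i\,\partial x_j} = f_{x_jx_i} = \partial_i\partial_j f$$
if $i \neq j$ and
$$\frac{\partial^2 f}{\partial x_j^2} = f_{x_jx_j} = \partial_j^2 f$$
if $i = j$.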
$$\partial_x g = \sin(x^3 + e^{2y}) + 3x^3\cos(x^3 + e^{2y}), \qquad \partial_y g = 2xe^{2y}\cos(x^3 + e^{2y}).$$
Example 2. Let
This last expression has no limit as $(x, y) \to (0, 0)$ (approaching $(0, 0)$ along different straight lines gives different limits). Thus, we see that $\partial_y\partial_x f$ and $\partial_x\partial_y f$ exist everywhere, are continuous except at the origin, and are equal except at the origin.
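The phenomenon in Example 2 can be reproduced numerically. The function of Example 2 is not shown here, so this sketch assumes the standard counterexample $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$, $f(0, 0) = 0$, which may or may not be the one in the text. For this $f$, $\partial_y\partial_x f(0, 0) = -1$ while $\partial_x\partial_y f(0, 0) = 1$. The two iterated derivatives are approximated by nested central differences, with an inner step much smaller than the outer one.

```python
def f(x, y):
    # Assumed example: f(x, y) = xy(x^2 - y^2)/(x^2 + y^2), with f(0, 0) = 0.
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d_x(x, y, h=1e-7):
    """Central-difference approximation to the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_y(x, y, h=1e-7):
    """Central-difference approximation to the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def dy_of_dx_at_origin(k=1e-3):
    # Differentiate d_x f with respect to y at (0, 0); outer step k >> inner step h.
    return (d_x(0.0, k) - d_x(0.0, -k)) / (2 * k)

def dx_of_dy_at_origin(k=1e-3):
    # Differentiate d_y f with respect to x at (0, 0).
    return (d_y(k, 0.0) - d_y(-k, 0.0)) / (2 * k)

print(dy_of_dx_at_origin(), dx_of_dy_at_origin())  # approximately -1 and 1
```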
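Proof. Since only the variables $x_i$ and $x_j$ are actually involved here, we may as well assume that $n = 2$ and write $\mathbf x = (x, y)$ and $\mathbf a = (a, b)$, so that we are studying the derivatives $\partial_x\partial_y f$ and $\partial_y\partial_x f$. These derivatives can be regarded as limits of second-order difference quotients, so we begin by examining the "difference of differences" obtained when $x$ and $y$ are both changed by an amount $h$:
$$D(h) = f(a + h, b + h) - f(a + h, b) - f(a, b + h) + f(a, b).$$
If we set
$$\varphi(t) = f(a + h, b + t) - f(a, b + t), \qquad \psi(t) = f(a + t, b + h) - f(a + t, b),$$
we have
$$D(h) = \varphi(h) - \varphi(0) = \psi(h) - \psi(0).$$
We apply the (one-variable) mean value theorem twice to the first expression for $D$, obtaining
$$D(h) = h\varphi'(u) = h\bigl[\partial_y f(a + h, b + u) - \partial_y f(a, b + u)\bigr] = h^2\,\partial_x\partial_y f(a + v, b + u),$$
where $u$ and $v$ are numbers between 0 and $h$. Likewise, applying it twice to the second expression gives
$$D(h) = h^2\,\partial_y\partial_x f(a + \tilde u, b + \tilde v),$$
where $\tilde u$ and $\tilde v$ are some other numbers between 0 and $h$. Equating these two expressions and cancelling the $h^2$, we have
$$\partial_x\partial_y f(a + v, b + u) = \partial_y\partial_x f(a + \tilde u, b + \tilde v).$$
Now let $h \to 0$. Then $u, v, \tilde u, \tilde v \to 0$ also. Since $\partial_x\partial_y f$ and $\partial_y\partial_x f$ are assumed continuous at $(a, b)$, we obtain $\partial_x\partial_y f(a, b) = \partial_y\partial_x f(a, b)$. □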
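Once this is known, an elementary but slightly messy inductive argument shows that the analogous result for higher-order derivatives is also true:
$$\partial_{i_1}\partial_{i_2}\cdots\partial_{i_k}f = \partial_{j_1}\partial_{j_2}\cdots\partial_{j_k}f$$
whenever the sequence $\{j_1, \ldots, j_k\}$ is a reordering of the sequence $\{i_1, \ldots, i_k\}$.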
The fact that the order of differentiation in a mixed partial derivative can occasionally matter is a technicality that is of essentially no importance in applications. In fact, by adopting a more sophisticated viewpoint one can prove a theorem to the effect that, under very general conditions, $\partial_i\partial_j f$ and $\partial_j\partial_i f$ are always equal "almost everywhere," which is enough to allow regarding them as equal for all practical purposes.
The chain rule can be used to compute higher-order partial derivatives of composite functions, but there are some pitfalls to be avoided. To be concrete, suppose that $w = f(x, y)$ and that $x$ and $y$ are functions of $s$ and $t$. Assume that all the functions in question are at least of class $C^2$. To begin with, the chain rule for first-order derivatives gives
$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}.$$
of $x$ and $y$, not $x$ and $s$. Rather, $\partial w/\partial x$ is a function of $x$ and $y$ just like $w$, and to differentiate it with respect to $s$ we use the chain rule again; and likewise for $\partial w/\partial y$:
Now we plug these results into (2.49) to get the final answer, which thus contains
quite a few terms. Pitfall number 2: It's easy to forget some of these terms.
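In this situation it's usually advantageous to use the notation $f_x$ and $f_y$ instead of $\partial w/\partial x$ and $\partial w/\partial y$, and likewise for second-order derivatives. This makes (2.48)-(2.50) look a little more manageable:
$$\frac{\partial w}{\partial s} = f_x\frac{\partial x}{\partial s} + f_y\frac{\partial y}{\partial s},$$
$$\frac{\partial^2 w}{\partial s^2} = \frac{\partial f_x}{\partial s}\frac{\partial x}{\partial s} + f_x\frac{\partial^2 x}{\partial s^2} + \frac{\partial f_y}{\partial s}\frac{\partial y}{\partial s} + f_y\frac{\partial^2 y}{\partial s^2},$$
$$\frac{\partial f_x}{\partial s} = f_{xx}\frac{\partial x}{\partial s} + f_{xy}\frac{\partial y}{\partial s}, \qquad \frac{\partial f_y}{\partial s} = f_{yx}\frac{\partial x}{\partial s} + f_{yy}\frac{\partial y}{\partial s},$$
$$\frac{\partial^2 w}{\partial s^2} = f_{xx}\Bigl(\frac{\partial x}{\partial s}\Bigr)^2 + 2f_{xy}\frac{\partial x}{\partial s}\frac{\partial y}{\partial s} + f_{yy}\Bigl(\frac{\partial y}{\partial s}\Bigr)^2 + f_x\frac{\partial^2 x}{\partial s^2} + f_y\frac{\partial^2 y}{\partial s^2}.$$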
Of course, similar results also hold for the other second-order derivatives of w.
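The second-order formula can be checked on a concrete case. This sketch uses our own choice of functions: $f(x, y) = x^2y$, $x = s + t$, $y = st$. The formula above gives $\partial^2 w/\partial s^2 = 2y + 4xt$. Direct expansion of $w = s^3t + 2s^2t^2 + st^3$ gives $6st + 4t^2$, and the two agree.

```python
def w(s, t):
    # w = f(x(s, t), y(s, t)) with f(x, y) = x^2 y, x = s + t, y = s t (our own example).
    x, y = s + t, s * t
    return x * x * y

def d2w_ds2_formula(s, t):
    """f_xx x_s^2 + 2 f_xy x_s y_s + f_yy y_s^2 + f_x x_ss + f_y y_ss."""
    x, y = s + t, s * t
    fx, fy = 2 * x * y, x * x
    fxx, fxy, fyy = 2 * y, 2 * x, 0.0
    xs, ys = 1.0, t        # first s-derivatives of x and y
    xss, yss = 0.0, 0.0    # second s-derivatives of x and y
    return fxx * xs ** 2 + 2 * fxy * xs * ys + fyy * ys ** 2 + fx * xss + fy * yss

def d2w_ds2_numeric(s, t, h=1e-4):
    """Central second difference of w in s."""
    return (w(s + h, t) - 2 * w(s, t) + w(s - h, t)) / (h * h)

print(d2w_ds2_formula(0.8, 1.3), d2w_ds2_numeric(0.8, 1.3))  # both about 13.0
```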
$$\frac{\partial^2 u}{\partial s\,\partial t} = -2t\,[2sf_{xx} + 2tf_{xy}] + 2s\,[2sf_{xy} + 2tf_{yy}] + 2f_y$$
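The calculation of the mixed derivative $\partial^2 u/\partial r\,\partial\theta$ is left to the reader (Exercise 2).
Notice, in particular, that by combining the last two equations and using the identity $\sin^2\theta + \cos^2\theta = 1$, we obtain
$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\theta^2} = f_{xx} + f_{yy}.$$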
The expression on the right, the sum of the pure second partial derivatives of $f$ with respect to a Cartesian coordinate system, turns up in many practical and theoretical applications; it is called the Laplacian of $f$. (We shall encounter it again in Chapter 5.) What we have just accomplished is the calculation of the Laplacian in polar coordinates. We state this result formally, with slightly different notation.
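$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\theta^2}.$$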
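Here is a numerical spot check of the polar form of the Laplacian. The test function $u(x, y) = x^2y + y^3 = r^3\sin\theta$ is our own choice. Its Laplacian is $8y$, so both expressions should be close to 16 at $(x, y) = (1, 2)$. Finite differences with step $10^{-3}$ are used throughout, and the helper names are hypothetical.

```python
import math

def u(x, y):
    # Test function of our own choosing: x^2 y + y^3 = r^3 sin(theta); its Laplacian is 8y.
    return x * x * y + y ** 3

def laplacian_cartesian(x, y, h=1e-3):
    """u_xx + u_yy by central second differences."""
    uxx = (u(x + h, y) - 2 * u(x, y) + u(x - h, y)) / (h * h)
    uyy = (u(x, y + h) - 2 * u(x, y) + u(x, y - h)) / (h * h)
    return uxx + uyy

def laplacian_polar(r, theta, h=1e-3):
    """U_rr + U_r / r + U_thetatheta / r^2 for U(r, theta) = u(r cos theta, r sin theta)."""
    U = lambda rr, tt: u(rr * math.cos(tt), rr * math.sin(tt))
    U_rr = (U(r + h, theta) - 2 * U(r, theta) + U(r - h, theta)) / (h * h)
    U_r = (U(r + h, theta) - U(r - h, theta)) / (2 * h)
    U_tt = (U(r, theta + h) - 2 * U(r, theta) + U(r, theta - h)) / (h * h)
    return U_rr + U_r / r + U_tt / (r * r)

x0, y0 = 1.0, 2.0
print(laplacian_cartesian(x0, y0), laplacian_polar(math.hypot(x0, y0), math.atan2(y0, x0)))
# both close to 8 * y0 = 16
```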
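If $\alpha$ is a multi-index, we define
$$\alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_n!, \qquad \partial^\alpha f = \partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots\partial_n^{\alpha_n}f, \qquad \mathbf x^\alpha = x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}.$$
For example, with $\mathbf x = (x, y, z)$,
$$\partial^{(0,3,0)}f = \frac{\partial^3 f}{\partial y^3}, \qquad \mathbf x^{(2,1,5)} = x^2yz^5.$$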
As the notation xa indicates, multi-indices are handy for writing not only
derivatives but also polynomials in several variables. To illustrate their use, we
present a generalization of the binomial theorem.
2.52 Theorem (The Multinomial Theorem). For any $\mathbf x = (x_1, x_2, \ldots, x_n) \in \mathbb R^n$ and any positive integer $k$,
$$(x_1 + x_2 + \cdots + x_n)^k = \sum_{|\alpha| = k}\frac{k!}{\alpha!}\,\mathbf x^\alpha.$$
$$= \sum_{|\alpha| = k}\frac{k!}{\alpha_1!\,\alpha_2!}\,\mathbf x^\alpha,$$
where we have set $\alpha_1 = j$, $\alpha_2 = k - j$, and $\alpha = (\alpha_1, \alpha_2)$. The general case follows by induction on $n$. Suppose the result is true for $n < N$ and $\mathbf x = (x_1, \ldots, x_N)$. By using the result for $n = 2$ and then the result for $n = N - 1$, we obtain
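For small cases the multinomial theorem can be verified by brute force. The check enumerates the multi-indices $\alpha$ with $|\alpha| = k$, adds up $(k!/\alpha!)\mathbf x^\alpha$, and compares the sum with $(x_1 + \cdots + x_n)^k$. The sketch uses integer inputs so the comparison is exact. The function names are our own.

```python
import itertools
import math

def multi_indices(n, k):
    """All multi-indices alpha with n nonnegative entries and |alpha| = k."""
    return [a for a in itertools.product(range(k + 1), repeat=n) if sum(a) == k]

def multinomial_rhs(x, k):
    """Sum over |alpha| = k of (k!/alpha!) x^alpha, in exact integer arithmetic."""
    total = 0
    for alpha in multi_indices(len(x), k):
        coeff = math.factorial(k)
        for a in alpha:
            coeff //= math.factorial(a)   # each partial quotient is an integer
        term = coeff
        for xj, aj in zip(x, alpha):
            term *= xj ** aj
        total += term
    return total

x = (2, -1, 3)
print(multinomial_rhs(x, 4), sum(x) ** 4)   # 256 256
```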
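9. For $\mathbf x \in \mathbb R^n \setminus \{0\}$, let $F(\mathbf x) = f(r)$ where $f$ is a $C^2$ function on $(0, \infty)$ and $r = |\mathbf x|$. Show that $\partial_1^2F + \cdots + \partial_n^2F = f''(r) + (n - 1)r^{-1}f'(r)$.
10. Derive the following version of the product rule for partial derivatives:
$$\partial^\alpha(fg) = \sum_{\beta + \gamma = \alpha}\frac{\alpha!}{\beta!\,\gamma!}(\partial^\beta f)(\partial^\gamma g).$$
11. Prove the following $n$-dimensional binomial theorem: For all $\mathbf x, \mathbf y \in \mathbb R^n$ we have $(\mathbf x + \mathbf y)^\alpha = \sum_{\beta + \gamma = \alpha}(\alpha!/\beta!\,\gamma!)\,\mathbf x^\beta\mathbf y^\gamma$.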
$$P_{a,k}(h) = \sum_{j=0}^k \frac{f^{(j)}(a)}{j!}h^j \tag{2.53}$$
is called the $k$th-order Taylor polynomial for $f$ based at $a$, and the difference
$$R_{a,k}(h) = f(a + h) - P_{a,k}(h) \tag{2.54}$$
is called the $k$th-order Taylor remainder. The various versions of Taylor's theorem provide formulas or estimates for $R_{a,k}$ that ensure that the Taylor polynomial $P_{a,k}$ is a good approximation to $f$ near $a$. The ones most commonly known involve the stronger assumption that $f$ is of class $C^{k+1}$ and yield the stronger conclusion that the remainder vanishes as rapidly as $|x - a|^{k+1}$. We present two of these, as well as one that yields the more general form of the theorem stated above.
The easiest version of Taylor's theorem to derive is the following.
2.55 Theorem (Taylor's Theorem with Integral Remainder, I). Suppose that $f$ is of class $C^{k+1}$ ($k \ge 0$) on an interval $I \subset \mathbb R$, and $a \in I$. Then the remainder $R_{a,k}$ defined by (2.53)-(2.54) is given by
$$R_{a,k}(h) = \frac{h^{k+1}}{k!}\int_0^1 (1 - t)^k f^{(k+1)}(a + th)\,dt. \tag{2.56}$$
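$$f(a + h) - f(a) = \int_a^{a+h} f'(s)\,ds = h\int_0^1 f'(a + th)\,dt. \tag{2.57}$$
The trick now is to integrate (2.57) by parts, choosing for the antiderivative of the constant function 1 not $t$ but $t - 1$, alias $-(1 - t)$:
$$h\int_0^1 f'(a + th)\,dt = -(1 - t)hf'(a + th)\Big|_0^1 + h^2\int_0^1 (1 - t)f''(a + th)\,dt = hf'(a) + h^2\int_0^1 (1 - t)f''(a + th)\,dt.$$
This gives the theorem for $k = 1$. Integrating by parts once more, now with $-\frac12(1 - t)^2$ as the antiderivative of $1 - t$, we find
$$h^2\int_0^1 (1 - t)f''(a + th)\,dt = \frac{f''(a)}{2}h^2 + \frac{h^3}{2}\int_0^1 (1 - t)^2f'''(a + th)\,dt,$$
and we obtain the theorem for $k = 2$. The pattern is now clear: Integrating (2.57) by parts $k$ times yields (2.56). □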
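Formula (2.56) can be compared with the remainder computed directly from (2.53)-(2.54). The sketch takes $f = \exp$, $a = 0$, $h = 0.5$, $k = 2$ (our own choices). It evaluates the integral with composite Simpson's rule, a standard quadrature rule that the text does not prescribe, and checks that the two values agree. All helper names are hypothetical.

```python
import math

def taylor_poly(derivs_at_a, h):
    """P_{a,k}(h) = sum_j f^(j)(a) h^j / j!, given [f(a), f'(a), ..., f^(k)(a)]."""
    return sum(d * h ** j / math.factorial(j) for j, d in enumerate(derivs_at_a))

def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3

def integral_remainder(f_k1, a, h, k):
    """Right side of (2.56); f_k1 is the (k+1)st derivative of f."""
    integrand = lambda t: (1 - t) ** k * f_k1(a + t * h)
    return h ** (k + 1) / math.factorial(k) * simpson(integrand, 0.0, 1.0)

a, h, k = 0.0, 0.5, 2
direct = math.exp(a + h) - taylor_poly([math.exp(a)] * (k + 1), h)
print(direct, integral_remainder(math.exp, a, h, k))  # both about 0.0237213
```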
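2.58 Theorem (Taylor's Theorem with Integral Remainder, II). Suppose that $f$ is of class $C^k$ ($k \ge 1$) on an interval $I \subset \mathbb R$, and $a \in I$. Then the remainder $R_{a,k}$ defined by (2.53)-(2.54) is given by
$$R_{a,k}(h) = \frac{h^k}{(k-1)!}\int_0^1 (1 - t)^{k-1}\bigl[f^{(k)}(a + th) - f^{(k)}(a)\bigr]\,dt. \tag{2.59}$$
Proof. By Theorem 2.55 with $k$ replaced by $k - 1$,
$$R_{a,k}(h) = R_{a,k-1}(h) - \frac{f^{(k)}(a)}{k!}h^k = \frac{h^k}{(k-1)!}\int_0^1 (1 - t)^{k-1}f^{(k)}(a + th)\,dt - \frac{h^k}{k!}f^{(k)}(a).$$
Since
$$\frac{h^k}{k!}f^{(k)}(a) = \frac{h^k}{(k-1)!}\int_0^1 (1 - t)^{k-1}f^{(k)}(a)\,dt,$$
this gives (2.59). □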
The formulas (2.56) and (2.59) are generally used not to obtain the exact value
of the remainder but to obtain an estimate for it. The main results are in the follow
ing corollaries.
Proof. $f^{(k)}$ is continuous at $a$, so for any $\epsilon > 0$ there exists $\delta > 0$ such that $|f^{(k)}(y) - f^{(k)}(a)| < \epsilon$ when $|y - a| < \delta$. In particular,
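2.61 Corollary. If $f$ is of class $C^{k+1}$ on $I$ and $|f^{(k+1)}(x)| \le M$ for $x \in I$, then
$$|R_{a,k}(h)| \le \frac{M}{(k+1)!}|h|^{k+1} \qquad (a + h \in I).$$
Proof. By (2.56),
$$|R_{a,k}(h)| \le \frac{|h|^{k+1}}{k!}\int_0^1 (1 - t)^kM\,dt = \frac{M}{(k+1)!}|h|^{k+1}. \qquad\square$$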
2.62 Lemma. Suppose $g$ is $k + 1$ times differentiable on $[a, b]$. If $g(a) = g(b)$ and $g^{(j)}(a) = 0$ for $1 \le j \le k$, then there is a point $c \in (a, b)$ such that $g^{(k+1)}(c) = 0$.
Proof. By Rolle's theorem, there is a point $c_1 \in (a, b)$ such that $g'(c_1) = 0$. Since $g'$ is continuous on $[a, c_1]$ and differentiable on $(a, c_1)$, and $g'(a) = g'(c_1) = 0$, there is a point $c_2 \in (a, c_1)$ such that $g''(c_2) = 0$. Proceeding inductively, we find that for $1 \le j \le k + 1$ there is a point $c_j \in (a, c_{j-1})$ such that $g^{(j)}(c_j) = 0$, and the final case $j = k + 1$ is the desired result. □
$$R_{a,k}(h) = \frac{f^{(k+1)}(a + c)}{(k+1)!}h^{k+1} \tag{2.64}$$
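Proof. Let us fix a particular $h$, and suppose for now that $h > 0$. Let
$$g(t) = R_{a,k}(t) - \frac{t^{k+1}}{h^{k+1}}R_{a,k}(h).$$
The coefficient of $t^{k+1}$ is chosen to make $g(h) = 0$, and clearly $g(0) = 0$. Similarly, for $j \le k$ we have
$$g^{(j)}(0) = R_{a,k}^{(j)}(0) = f^{(j)}(a) - P_{a,k}^{(j)}(0) = f^{(j)}(a) - f^{(j)}(a),$$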
so $g^{(j)}(0) = 0$. Therefore, by Lemma 2.62, there is a point $c \in (0, h)$ such that $g^{(k+1)}(c) = 0$. But this is precisely (2.64). The case $h < 0$ is handled similarly by considering the function $t \mapsto g(-t)$ on the interval $[0, |h|]$. □

Corollary 2.61 is obviously an immediate consequence of (2.64). In Theorem 2.55 we assumed that $f$ is of class $C^{k+1}$, but in Theorem 2.63 we needed only the existence, not the continuity, of $f^{(k+1)}$. Actually, in Theorem 2.55 it is enough to assume that $f^{(k+1)}$ is Riemann integrable.

For the convenience of the reader, we recall a few of the most familiar and useful Taylor expansions. They are easily derived from the definition (2.53) and will be used without comment in the rest of the book.

2.65 Proposition. The Taylor polynomials of degree $k$ about $a = 0$ of the functions

Taylor polynomials have many uses. From a practical point of view, they allow one to approximate complicated functions by polynomials that are relatively easy to compute with. On the more theoretical side, it is an important general principle that the behavior of a function $f(x)$ near $x = a$ is largely determined by the first nonvanishing term, apart from the constant term $f(a)$, in its Taylor expansion. That is, if $f'(a) \neq 0$, then the tangent line approximation $f(x) \approx f(a) + f'(a)(x - a)$ is a good one. If $f'(a) = 0$ but $f''(a) \neq 0$, the second-order term is decisive, and so forth. This is the basis for the second-derivative test for local extrema. If $f'(a) = 0$ and $f''(a) \neq 0$, then $f(x) \approx f(a) + \frac12 f''(a)(x - a)^2$, and the expression on the right is a quadratic function with a maximum or minimum at $a$, depending on the sign of $f''(a)$. (See Exercise 9 and §2.8.) The following example illustrates another application of this principle.

Example 1. Use Taylor expansions to evaluate
$$\lim_{x \to 0}\frac{x - \sin x}{x(1 - \cos x)}.$$
Solution. We have
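where the dots denote error terms that vanish faster than $x^6$ as $x \to 0$. Therefore,
$$\frac{x - \sin x}{x(1 - \cos x)} = \frac{\frac16x^3 + \cdots}{\frac12x^3 + \cdots} = \frac{\frac16 + \cdots}{\frac12 + \cdots},$$
and hence
$$\lim_{x \to 0}\frac{x - \sin x}{x(1 - \cos x)} = \frac{1/6}{1/2} = \frac13.$$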
where the expression on the right denotes the result of applying the operation
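$$f(\mathbf a + \mathbf h) = \sum_{j=0}^k \frac{(\mathbf h\cdot\nabla)^jf(\mathbf a)}{j!} + R_{\mathbf a,k}(\mathbf h), \tag{2.67}$$
where formulas for $R_{\mathbf a,k}(\mathbf h)$ can be obtained from the formulas (2.56), (2.59), or (2.64) applied to $g$.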
$$(\mathbf h\cdot\nabla)^j = \sum_{|\alpha| = j}\frac{j!}{\alpha!}\,\mathbf h^\alpha\partial^\alpha.$$
Substituting this into (2.67) and the remainder formulas, we obtain the following:
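$$f(\mathbf a + \mathbf h) = \sum_{|\alpha| \le k}\frac{\partial^\alpha f(\mathbf a)}{\alpha!}\,\mathbf h^\alpha + R_{\mathbf a,k}(\mathbf h), \tag{2.69}$$
where
$$R_{\mathbf a,k}(\mathbf h) = k\sum_{|\alpha| = k}\frac{\mathbf h^\alpha}{\alpha!}\int_0^1 (1 - t)^{k-1}\bigl[\partial^\alpha f(\mathbf a + t\mathbf h) - \partial^\alpha f(\mathbf a)\bigr]\,dt, \tag{2.70}$$
$$R_{\mathbf a,k}(\mathbf h) = (k + 1)\sum_{|\alpha| = k+1}\frac{\mathbf h^\alpha}{\alpha!}\int_0^1 (1 - t)^k\,\partial^\alpha f(\mathbf a + t\mathbf h)\,dt, \tag{2.71}$$
$$R_{\mathbf a,k}(\mathbf h) = \sum_{|\alpha| = k+1}\frac{\partial^\alpha f(\mathbf a + c\mathbf h)}{\alpha!}\,\mathbf h^\alpha \quad\text{for some } c \in (0, 1). \tag{2.72}$$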
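$$f(\mathbf a + \mathbf h) = f(\mathbf a) + \sum_{j=1}^n \partial_jf(\mathbf a)h_j + \frac12\sum_{j,k=1}^n \partial_j\partial_kf(\mathbf a)h_jh_k + R_{\mathbf a,2}(\mathbf h) \tag{2.73}$$
$$= f(\mathbf a) + \sum_{j=1}^n \partial_jf(\mathbf a)h_j + \frac12\sum_{j=1}^n \partial_j^2f(\mathbf a)h_j^2 + \sum_{j<k}\partial_j\partial_kf(\mathbf a)h_jh_k + R_{\mathbf a,2}(\mathbf h). \tag{2.74}$$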
The first of these formulas is (2.67) with $k = 2$; the second one is (2.69). (Every multi-index $\alpha$ of order 2 is either of the form $(\ldots, 2, \ldots)$ or $(\ldots, 1, \ldots, 1, \ldots)$, where the dots denote zero entries, so the sum over $|\alpha| = 2$ in (2.69) breaks up into the last two sums in (2.74).) Notice that the mixed derivatives $\partial_j\partial_k$ ($j \neq k$) occur twice in (2.73) (since $\partial_j\partial_k = \partial_k\partial_j$) but only once in (2.74) (since $j < k$ there). This accounts for the disappearance of the factor of $\frac12$ in the last sum in (2.74).
We also have the following analogue of Corollaries 2.60 and 2.61:
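$$|R_{\mathbf a,k}(\mathbf h)| \le \frac{M}{(k+1)!}\|\mathbf h\|^{k+1},$$
where $\|\mathbf h\| = |h_1| + |h_2| + \cdots + |h_n|$.
Proof. The proof of the first assertion is the same as the proof of Corollary 2.60. As for the second, it follows easily from either (2.71) or (2.72) that
$$|R_{\mathbf a,k}(\mathbf h)| \le M\sum_{|\alpha| = k+1}\frac{|\mathbf h^\alpha|}{\alpha!},$$
and by the multinomial theorem the last sum equals $\|\mathbf h\|^{k+1}/(k+1)!$. □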
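2.76 Lemma. If $P(\mathbf h)$ is a polynomial of degree $\le k$ that vanishes to order $> k$ as $\mathbf h \to 0$ [i.e., $P(\mathbf h)/|\mathbf h|^k \to 0$], then $P = 0$.
Proof. The hypothesis implies that, for each fixed $\mathbf h$, $P(t\mathbf h)/t^k \to 0$ as $t \to 0$. Write $P = P_0 + P_1 + \cdots + P_k$ where $P_j$ is the sum of the terms of order $j$ in $P$; thus
$$P(t\mathbf h) = P_0(\mathbf h) + tP_1(\mathbf h) + \cdots + t^kP_k(\mathbf h).$$
Letting $t \to 0$ shows that $P_0(\mathbf h) = 0$. Then $P(t\mathbf h)/t = P_1(\mathbf h) + \cdots + t^{k-1}P_k(\mathbf h) \to 0$, so $P_1(\mathbf h) = 0$. Next,
$$\frac{P(t\mathbf h)}{t^2} = P_2(\mathbf h) + \cdots + t^{k-2}P_k(\mathbf h),$$
so $P_2(\mathbf h) = 0$ since $P(t\mathbf h)/t^2 \to 0$. Continuing inductively, we conclude that $P_j(\mathbf h) = 0$ for all $j$, so $P = 0$. □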
Theorem 2.77 has the following important practical consequence. If one wants to compute the Taylor expansion of $f$, it may be very tedious to calculate all the derivatives needed in formula (2.69) directly. But if one can find, by any means whatever, a polynomial $Q$ of degree $k$ such that $[f(\mathbf a + \mathbf h) - Q(\mathbf h)]/|\mathbf h|^k \to 0$, then $Q$ must be the Taylor polynomial. This enables one to generate new Taylor expansions from old ones by operations such as substitution, multiplication, etc.
In the last line we have thrown the terms $x^4$, $x^6$, $x^4y$, and $x^2y^2$ into the garbage pail, since they are themselves of order $> 3$. Thus the answer is $1 + y + x^2 +$
EXERCISES
2. Find the Taylor polynomial P\ta(h) and give a constant C such that \Ry^(h)\ <
Cfi4 on the interval \h\ < ^ for each of the following functions.
a. $f(x) = \log x$.
b. $f(x) = \sqrt{x}$.
c. $f(x) = (x + 3)^{-1}$.
3. Show that $|\sin x - x + \frac16x^3| < .08$ for $|x| \le \frac12\pi$. (Hint: $x - \frac16x^3$ is actually the 4th-order Taylor polynomial of $\sin x$.) How large do you have to take $k$ so that the $k$th-order Taylor polynomial of $\sin x$ about $a = 0$ approximates $\sin x$ to within .01 for $|x| \le \frac12\pi$?
5. Find the Taylor polynomial of order 4 based at a = (0, 0) for each of the
following functions. Don't compute any derivatives; use Proposition 2.65.
a. $f(x, y) = x\sin(x + y)$.
b. e ■' eos(.7;~ + y2)
c. $e^{x - 2y}$.
where C and A are positive constants (cf. Exercise 1 in §1.8). Use (2.70) to
show that there is another positive constant C such that
Proof. If $f$ has a local maximum or minimum at $\mathbf a$, then for any unit vector $\mathbf u$, the function $g(t) = f(\mathbf a + t\mathbf u)$ has a local maximum or minimum at $t = 0$, so $g'(0) = \partial_{\mathbf u}f(\mathbf a) = 0$. In particular, $\partial_jf(\mathbf a) = 0$ for all $j$, so $\nabla f(\mathbf a) = 0$. □
How can we tell whether a function has a local maximum or minimum (or neither) at a critical point? For functions of one variable we have the second derivative test: If $f$ is of class $C^2$, then $f$ has a local minimum at $a$ if $f''(a) > 0$ and a local maximum if $f''(a) < 0$. (If $f''(a) = 0$, no conclusion can be drawn.) Something similar happens for functions of $n$ variables, but the situation is a good deal more complicated. The full story involves a certain amount of linear algebra. The reader who is content to consider the case of two variables and wishes to skip the linear algebra may proceed directly to Theorem 2.82.
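Suppose $f$ is a real-valued function of class $C^2$ on some open set $S \subset \mathbb R^n$ and that $f$ has a critical point at $\mathbf a$, i.e., $\nabla f(\mathbf a) = 0$. Instead of one second derivative to examine at $\mathbf a$, we have a whole $n \times n$ matrix of them, called the Hessian of $f$ at $\mathbf a$:
$$H = \begin{pmatrix} \partial_1^2f(\mathbf a) & \partial_1\partial_2f(\mathbf a) & \cdots & \partial_1\partial_nf(\mathbf a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_n\partial_1f(\mathbf a) & \partial_n\partial_2f(\mathbf a) & \cdots & \partial_n^2f(\mathbf a) \end{pmatrix}.$$
The equality of mixed partials (Theorem 2.45) guarantees that this is a symmetric matrix, that is, $H_{ij} = H_{ji}$.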
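By (2.73), the second-order Taylor expansion of $f$ about $\mathbf a$ is
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \sum_{j=1}^n \partial_jf(\mathbf a)k_j + \frac12\sum_{i,j=1}^n H_{ij}k_ik_j + R_{\mathbf a,2}(\mathbf k).$$
(We use $\mathbf k$ rather than $\mathbf h$ for the increment in this section to avoid a notational clash with the Hessian $H$.) If $\nabla f(\mathbf a) = 0$, the first-order sum vanishes, and the second-order sum is $\frac12\sum H_{ij}k_ik_j = \frac12H\mathbf k\cdot\mathbf k$. In short,
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \tfrac12H\mathbf k\cdot\mathbf k + R_{\mathbf a,2}(\mathbf k). \tag{2.80}$$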
Now we can begin to see how to analyze the behavior of $f$ about $\mathbf a$ in terms of the matrix $H$. To start with the simplest situation, suppose it happens that all the mixed partials $\partial_i\partial_jf$ ($i \neq j$) vanish at $\mathbf a$. Denoting $\partial_j^2f(\mathbf a)$ by $\lambda_j$, we then have
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \frac12\sum_{j=1}^n \lambda_jk_j^2 + R_{\mathbf a,2}(\mathbf k).$$
Let us neglect the remainder term for the moment. If all $\lambda_j$ are positive, then $\sum\lambda_jk_j^2 > 0$ for all $\mathbf k \neq 0$, so $f$ has a local minimum; likewise, if all $\lambda_j$ are negative, then $f$ has a local maximum. If some $\lambda_j$ are positive and some are negative, then $\sum\lambda_jk_j^2$ will be positive for some values of $\mathbf k$ and negative for others, so $f$ will have neither a maximum nor a minimum. It's not hard to see that these conclusions remain valid when the remainder term is included; we shall present the details below. Only when some of the $\lambda_j$ are zero is the outcome unclear, and it is precisely in this situation that the remainder term plays a significant role.
This is all very well, but the condition that $\partial_i\partial_jf(\mathbf a) = 0$ for $i \neq j$ is obviously very special. However, it can always be achieved by a suitable rotation of coordinates, that is, by replacing the standard basis for $\mathbb R^n$ with another suitably chosen orthonormal basis. This is the content of the spectral theorem, which says that every symmetric matrix has an orthonormal eigenbasis (see Appendix A, (A.56)-(A.58)). With this result in hand, we arrive at the second-derivative test for functions of several variables.
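Proof. We prove only the first assertion; the argument for the second one is similar. Let $\mathbf u_1, \ldots, \mathbf u_n$ be an orthonormal eigenbasis for $H$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Our assertion is then that $f$ has a local minimum if all the eigenvalues are (strictly) positive but not if some eigenvalue is negative.
If all eigenvalues are positive, let $l$ be the smallest of them. Writing $\mathbf k = c_1\mathbf u_1 + \cdots + c_n\mathbf u_n$ as before, we have
$$H\mathbf k\cdot\mathbf k = \sum_{j=1}^n \lambda_jc_j^2 \ge l\sum_{j=1}^n c_j^2 = l|\mathbf k|^2.$$
But when $\mathbf k$ is near 0, the error term in (2.80) is less than $\frac12l|\mathbf k|^2$ by Corollary 2.75, so
$$f(\mathbf a + \mathbf k) - f(\mathbf a) \ge \tfrac12l|\mathbf k|^2 - |R_{\mathbf a,2}(\mathbf k)| > 0 \quad\text{for small } \mathbf k \neq 0.$$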
In short, if all eigenvalues are positive, then $f$ has a local minimum; if all eigenvalues are negative, then $f$ has a local maximum. If there are two eigenvalues of opposite signs, then $f$ is said to have a saddle point. At a saddle point, $f$ has neither a maximum nor a minimum; its graph goes up in one direction and down in some other direction. The only cases where we can't be sure what's going on are those where all the eigenvalues of $H$ are nonnegative or nonpositive but at least one of them is zero. When that happens, if $\mathbf k$ is an eigenvector with eigenvalue 0 (i.e., $\mathbf k$ is in the nullspace of $H$), the quadratic term in (2.80) vanishes and the remainder term becomes significant. To determine the behavior of $f$ near $\mathbf a$ we then need to look at the higher-order terms in the Taylor expansion.
Some types of critical points are illustrated in Figure 2.5. A critical point for which zero is an eigenvalue of the Hessian matrix $H$ (or equivalently, for which $\det H = 0$, i.e., $H$ is singular) is called degenerate.
In two dimensions it is easy to sort out the various cases:
Example 1. Find and classify the critical points of the function $f(x, y) = xy(12 - 3x - 4y)$.
Solution. We have
$$\partial_xf = y(12 - 6x - 4y), \qquad \partial_yf = x(12 - 3x - 8y).$$
Setting both of these equal to zero and solving gives the critical points $(0, 0)$, $(4, 0)$, $(0, 3)$, and $(\frac43, 1)$. Since $\partial_x^2f = -6y$, $\partial_y^2f = -8x$, and $\partial_x\partial_yf = 12 - 6x - 8y$, Theorem 2.82 shows that the first three of these are saddle points and the last is a local maximum.
The geometry of this example is quite simple. The set where $f = 0$ is the union of the three lines $x = 0$, $y = 0$, and $3x + 4y = 12$. These lines separate the plane into regions on which $f$ is alternately positive and negative. The three saddle points are the points where these lines intersect, and the local maximum is the "peak" in the middle of the triangle defined by these lines.
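The classification in Example 1 can be reproduced by computing the eigenvalues of the $2\times 2$ Hessian directly. For a symmetric matrix with diagonal entries $a, c$ and off-diagonal entry $b$, the eigenvalues are $\frac{a+c}{2} \pm \sqrt{(\frac{a-c}{2})^2 + b^2}$. The function names, and the tolerance used to decide that an eigenvalue is zero, are our own choices in this sketch.

```python
import math

def gradient(x, y):
    # First partials of f(x, y) = x*y*(12 - 3x - 4y).
    return y * (12 - 6 * x - 4 * y), x * (12 - 3 * x - 8 * y)

def hessian_entries(x, y):
    # (f_xx, f_xy, f_yy) for the same f.
    return -6 * y, 12 - 6 * x - 8 * y, -8 * x

def classify(x, y, tol=1e-12):
    a, b, c = hessian_entries(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    lam_min, lam_max = mean - radius, mean + radius
    if lam_min > tol:
        return "local minimum"
    if lam_max < -tol:
        return "local maximum"
    if lam_min < -tol and lam_max > tol:
        return "saddle point"
    return "degenerate"

for p in [(0, 0), (4, 0), (0, 3), (4 / 3, 1)]:
    print(p, gradient(*p), classify(*p))
```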
Example 2. Find and classify the critical points of the function $f(x, y) = y^3 - 3x^2y$.
Solution. We have $\partial_xf = -6xy$ and $\partial_yf = 3y^2 - 3x^2$. Thus, if $\partial_xf = 0$, then either $x = 0$ or $y = 0$, and the equation $\partial_yf = 0$ then forces $x = y = 0$. So $(0, 0)$ is the only critical point. The reader may readily verify that all the second derivatives of $f$ vanish at $(0, 0)$, so Theorem 2.82 is of no use. But since $f(x, y) = y(y - \sqrt3\,x)(y + \sqrt3\,x)$, the lines $y = 0$ and $y = \pm\sqrt3\,x$ separate the plane into six regions on which $f$ is alternately positive and negative, and these regions all meet at the origin. Thus $f$ has neither a maximum nor a minimum at the origin. This configuration is called a "monkey saddle." (The three regions where $f < 0$ provide places for the two legs and tail of a monkey sitting on the graph of $f$ at the origin.)
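The six alternating regions of the monkey saddle can be seen by sampling $f(x, y) = y^3 - 3x^2y$ on a small circle around the origin and counting sign changes. In polar coordinates $f = -r^3\sin 3\theta$, so we expect 6 changes. The samples are offset by half a step so that none falls on the zero lines. This is only an illustrative sketch.

```python
import math

def f(x, y):
    return y ** 3 - 3 * x ** 2 * y

def sign_changes_on_circle(r=0.1, samples=360):
    """Count sign changes of f around a circle of radius r centered at the origin."""
    signs = []
    for i in range(samples):
        theta = 2 * math.pi * (i + 0.5) / samples   # half-step offset avoids theta = 0, 60, ... degrees
        signs.append(f(r * math.cos(theta), r * math.sin(theta)) > 0)
    # signs[i - 1] with i = 0 wraps around to the last sample, so the count is cyclic.
    return sum(signs[i] != signs[i - 1] for i in range(samples))

print(sign_changes_on_circle())  # 6
```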
EXERCISES
1. Find all the critical points of the following functions. Tell whether each nondegenerate critical point is a local maximum, local minimum, or saddle point. If possible, tell whether the degenerate critical points are local extrema too.
4. Let $f(x, y) = (y - x^2)(y - 2x^2)$.
a. Show that the origin is a degenerate critical point of $f$.
b. Show that the restriction of $f$ to any line through the origin (i.e., the function $g(t) = f(at, bt)$ for any $(a, b) \neq (0, 0)$) has a local minimum at the origin, but $f$ does not have a local minimum at the origin. (Hint: Consider the regions where $f > 0$ or $f < 0$.)
5. Let $H$ be the Hessian of $f$. Show that for any unit vector $\mathbf u$, $H\mathbf u\cdot\mathbf u$ is the second directional derivative of $f$ in the direction $\mathbf u$.
of calculus, we shall assume that S is either (i) the closure of an open set with
a smooth or piecewise smooth boundary, or (ii) a smooth submanifold, such as a
curve or surface, defined by one or more constraint equations. (These geometric
notions will be studied in more detail in Chapter 3.)
Suppose, to begin with, that S is the closure of an open set in ℝⁿ and that we
wish to find the absolute maximum or minimum of a differentiable function f on
S. We assume that the boundary of S is a smooth submanifold (a curve if n = 2, a
surface if n = 3) that can be described as the level set of a differentiable function
G, or that it is the union of a finite number of pieces of this form. (For example,
if S is a cube, its boundary is the union of six faces, each of which is a region in a
smooth surface, viz., a plane.) If S is bounded, the extreme values are guaranteed
to exist, and we can proceed as follows.
ii. To find candidates for extreme values on the boundary, we can apply the
techniques for solving extremal problems with constraints presented below.
iii. Finally, we pick the smallest and largest of the values of f at the points
found in steps (i) and (ii); these will be the minimum and maximum of f on
S. There is usually no need to worry about the second derivative test in this
situation.
If S is unbounded, the procedure is the same, but we must add an extra argument to show that the desired extremum actually exists. This must be done on a
case-by-case basis, as there is no general procedure available; however, here are a
couple of simple results that cover many situations in practice and illustrate the sort
of reasoning that must be employed.
a. If f(x) → +∞ as |x| → ∞ (x ∈ S), then f has an absolute minimum but no
absolute maximum on S.
b. If f(x) → 0 as |x| → ∞ (x ∈ S) and there is a point x₀ ∈ S where f(x₀) > 0
(resp. f(x₀) < 0), then f has an absolute maximum (resp. minimum) on S.
Proof. (a) If f(x) → ∞ as |x| → ∞, then clearly f has no maximum. On the
other hand, pick a point x₀ ∈ S and let V = {x ∈ S : f(x) ≤ f(x₀)}. Then V is
closed (by Theorem 1.13) and bounded (since f(x) > f(x₀) when |x| is large). By
the extreme value theorem, f has a minimum on V, say at a ∈ V. But then f(a) is
the absolute minimum of f on S because f(x) > f(x₀) ≥ f(a) for x ∈ S \ V.
The proof of (b) is similar. If f(x₀) > 0, let V = {x ∈ S : f(x) ≥ f(x₀)}. Then
V is closed (by Theorem 1.13) and bounded (since f(x) → 0 as |x| → ∞). By the
extreme value theorem, f has a maximum on V, say at a ∈ V. But then f(a) is the
absolute maximum of f on S because f(x) < f(x₀) ≤ f(a) for x ∈ S \ V. (The case
f(x₀) < 0 follows by applying this argument to −f.) □
Example 1. Find the absolute maximum and minimum values of the function
$$f(x, y) = \frac{x}{x^2 + (y - 1)^2 + 4}$$
on the first quadrant S = {(x, y) : x, y ≥ 0}.
Solution. Clearly f(x, y) ≥ 0 for x, y ≥ 0, and f(0, y) = 0, so the
minimum is zero, achieved at all points on the y-axis. Moreover, f(x, y) is at
most the smaller of 1/x and 1/(2|y − 1|) (the latter because x² + (y − 1)² ≥
2x|y − 1|), so it vanishes as |(x, y)| → ∞. Hence, by Theorem 2.83, f has a
maximum on S, which must occur either in the interior of S or on the positive
x-axis. A short calculation that we leave to the reader shows that the only critical
point of f in the interior of S is at (2, 1), and f(2, 1) = 1/4. Also, f(x, 0) =
x/(x² + 5), and the critical points of this function of one variable are at x = ±√5.
Only x = √5 is relevant for our purposes, and f(√5, 0) = √5/10, which is a bit
less than 1/4. Thus the maximum value of f on S is 1/4.
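The calculation left to the reader can be sanity-checked numerically. The Python sketch below (our own) evaluates the exact gradient of f(x, y) = x/(x² + (y − 1)² + 4) at (2, 1), compares f(2, 1) with f(√5, 0), and runs a crude grid search over the box [0, 20] × [0, 20] with step 1/20. The box and the step are arbitrary assumptions, and a grid search of this kind supports the conclusion but does not prove it.

```python
import math


def f(x, y):
    """f(x, y) = x / (x^2 + (y - 1)^2 + 4), studied on x, y >= 0."""
    return x / (x**2 + (y - 1)**2 + 4)


def grad_f(x, y):
    """Exact partial derivatives of f, from the quotient rule."""
    d = x**2 + (y - 1)**2 + 4
    return ((d - 2 * x**2) / d**2, -2 * x * (y - 1) / d**2)


# The interior critical point and the candidate on the positive x-axis.
print(grad_f(2, 1), f(2, 1))   # (0.0, 0.0) 0.25
print(f(math.sqrt(5), 0))      # sqrt(5)/10, about 0.2236

# Crude sanity check (not a proof): grid search on [0, 20] x [0, 20].
best = max((f(i / 20, j / 20), i / 20, j / 20)
           for i in range(401) for j in range(401))
print(best)                    # (0.25, 2.0, 1.0)
```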
$$S = \{\mathbf{x} : G(\mathbf{x}) = 0\},$$
This is the key to the method. The n equations ∂_j f = λ∂_j G together with the
constraint equation G = 0 give n + 1 equations in the n + 1 variables x₁, ..., xₙ and
λ, and solving them simultaneously will locate the points where local extrema of f
on S can occur. (It will also produce the appropriate values of λ, which are usually
not of much interest, although one may have to find them in the process of solving
for the xⱼ's.) This method is called Lagrange's method, and the parameter λ is called
the Lagrange multiplier for the problem.
The other methods described above involve reducing the original n-variable
problem to an (n − 1)-variable problem, whereas Lagrange's method deals directly
with the original n variables. This may be advantageous when the reduction is awkward or when it would involve breaking some symmetry of the original problem.
The disadvantage is that, whereas the other methods lead to solving n − 1 equations
in n − 1 variables, Lagrange's method requires solving n + 1 equations in n + 1
variables.
Example 2. Let's try out Lagrange's method on the simple problem of maximizing the area of a rectangle with perimeter P. Here f(x, y) = xy and
G(x, y) = 2x + 2y − P, so the equations ∂_x f = λ∂_x G, ∂_y f = λ∂_y G, and
G = 0 become
$$y = 2\lambda, \qquad x = 2\lambda, \qquad 2x + 2y = P.$$
The first two equations give y = x; substituting into the third equation shows
that x = y = P/4, so the maximum of f is P²/16. (Note that the only relevant
values of x and y are 0 ≤ x, y ≤ P/2, so we're working on a compact set and
the existence of the maximum is not in question. The minimum on this set,
namely 0, is achieved when x = 0, y = P/2, or vice versa.)
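A short Python check of this example, assuming the concrete perimeter P = 20 (our choice): the Lagrange system is solved in exact rational arithmetic, and the answer is compared with the reduction method, eliminating y = P/2 − x and scanning x over a grid of step 1/100.

```python
from fractions import Fraction


def lagrange_rectangle(P):
    """Solve y = 2*lam, x = 2*lam, 2x + 2y = P exactly.

    The first two equations give x = y = 2*lam; the constraint then
    reads 8*lam = P.
    """
    lam = Fraction(P) / 8
    x = y = 2 * lam
    return x, y, lam


P = 20
x, y, lam = lagrange_rectangle(P)
print(x, y, lam, x * y)   # 5 5 5/2 25

# Cross-check by eliminating y = P/2 - x and scanning 0 <= x <= P/2.
step = Fraction(1, 100)
best_area = max(k * step * (Fraction(P, 2) - k * step)
                for k in range(int(Fraction(P, 2) / step) + 1))
print(best_area)          # 25, which is P**2 / 16
```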
$$2x = 2\lambda x, \qquad 2y + 1 = 2\lambda y, \qquad x^2 + y^2 = 1.$$
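The system 2x = 2λx, 2y + 1 = 2λy, x² + y² = 1 can be solved by hand: the first equation gives x = 0 or λ = 1; λ = 1 is incompatible with the second equation, so x = 0, y = ±1, and then λ = 3/2 or λ = 1/2. When no such case analysis is available, the n + 1 Lagrange equations can instead be attacked numerically. The sketch below (ours, not from the text) applies plain Newton's method to this particular system. The starting guesses and the iteration count are assumptions, Newton's method can fail from a poor starting point, and the Jacobian of this system, which works out to 8(1 − λ)(x² + y²), is singular when λ = 1.

```python
def solve_linear(A, b):
    """Solve A d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * d[c] for c in range(r + 1, n))
        d[r] = (M[r][n] - s) / M[r][r]
    return d


def system(x, y, lam):
    """Left minus right sides of 2x = 2 lam x, 2y + 1 = 2 lam y, x^2 + y^2 = 1."""
    return [2 * x - 2 * lam * x, 2 * y + 1 - 2 * lam * y, x**2 + y**2 - 1]


def jacobian(x, y, lam):
    """Partial derivatives of system() with respect to (x, y, lam)."""
    return [[2 - 2 * lam, 0.0, -2 * x],
            [0.0, 2 - 2 * lam, -2 * y],
            [2 * x, 2 * y, 0.0]]


def newton(x, y, lam, steps=30):
    """Plain Newton iteration for the Lagrange system (no safeguards)."""
    for _ in range(steps):
        d = solve_linear(jacobian(x, y, lam), [-v for v in system(x, y, lam)])
        x, y, lam = x + d[0], y + d[1], lam + d[2]
    return x, y, lam


print(newton(0.1, 0.9, 1.4))    # close to (0, 1, 1.5)
print(newton(0.1, -0.9, 0.6))   # close to (0, -1, 0.5)
```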
Similar ideas work when there is more than one constraint equation. Let's
consider the case of two equations:
$$S = \{\mathbf{x} : G_1(\mathbf{x}) = G_2(\mathbf{x}) = 0\}.$$
Here G₁ and G₂ are differentiable functions (the subscripts are labels for the functions, not partial derivatives), and we assume that the vectors ∇G₁(x) and ∇G₂(x)
are linearly independent for x ∈ S. (Again, this guarantees that S is a "smooth"
set, as we shall see in Chapter 3.) To find the extreme values of a differentiable
function f on S, we have three methods:
• Solve the equations G₁(x) = G₂(x) = 0 for two of the variables and find
the critical points of the resulting function of the remaining n − 2 variables.
• The n equations ∂_j f = λ∂_j G₁ + μ∂_j G₂ together with the two constraint
equations G₁ = G₂ = 0 can be solved for the n + 2 variables x₁, ..., xₙ, λ,
and μ, yielding the points where local extrema can occur.
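To see the two-constraint version of Lagrange's method at work, here is a small hypothetical example that is not from the text: extremize f(x, y, z) = z on the set where G₁ = x² + y² − 1 = 0 and G₂ = x + y + z = 0. The Python sketch checks the hand-derived Lagrange points and cross-checks them against the first method (parametrize G₁ = 0, eliminate z using G₂ = 0, and scan).

```python
import math


def residual(x, y, z, lam, mu):
    """Largest violation of grad f = lam grad G1 + mu grad G2, G1 = G2 = 0,
    for the hypothetical problem f = z, G1 = x^2 + y^2 - 1, G2 = x + y + z."""
    grad_f = (0.0, 0.0, 1.0)
    grad_g1 = (2 * x, 2 * y, 0.0)
    grad_g2 = (1.0, 1.0, 1.0)
    eqs = [grad_f[j] - lam * grad_g1[j] - mu * grad_g2[j] for j in range(3)]
    eqs += [x**2 + y**2 - 1, x + y + z]
    return max(abs(e) for e in eqs)


# By hand: the z-equation gives mu = 1; then 2*lam*x = 2*lam*y = -1, so
# x = y, and the constraints force x = y = +-1/sqrt(2), z = -2x.
candidates = []
for s in (1, -1):
    x = y = s / math.sqrt(2)
    candidates.append((x, y, -2 * x, -1 / (2 * x), 1.0))

for c in candidates:
    print(c[2], residual(*c))   # z = -sqrt(2), then z = +sqrt(2); residual ~ 0

# First method: parametrize G1 = 0 by (cos t, sin t), eliminate
# z = -(x + y), and scan t on a fine grid.
N = 100_000
zs = [-(math.cos(2 * math.pi * k / N) + math.sin(2 * math.pi * k / N))
      for k in range(N)]
print(min(zs), max(zs))         # about -1.41421 and 1.41421
```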
EXERCISES
$$a = \frac{k^{-1}\sum_{j=1}^{k} x_j y_j - \bar{x}\bar{y}}{k^{-1}\sum_{j=1}^{k} x_j^2 - \bar{x}^2}, \qquad b = \bar{y} - a\bar{x},$$
where x̄ = k⁻¹ Σⱼ xⱼ and ȳ = k⁻¹ Σⱼ yⱼ are the averages of the xⱼ's and
yⱼ's.
11. Let x, y, z be positive variables and a, b, c positive constants. Find the minimum of x + y + z subject to the constraint (a/x) + (b/y) + (c/z) = 1.
12. Find the minimum possible value of the sum of the three linear dimensions
(length, breadth, and width) of a rectangular box whose volume is a given
constant V. Is there a maximum possible value?
13. Find the point on the line through (1,0,0) and (0,1,0) that is closest to the
line through (0,0,0) and (1,1,1). (Hint: Minimize the square of the distance.)
14. Find the maximum possible volume of a rectangular solid if the sum of the
areas of the bottom and the four vertical sides is a constant A, and find the
dimensions of the box that has the maximum volume.
16. Find the maximum value of (xv − yu)² subject to the constraints x² + y² = a²
and u² + v² = b². Do this (a) by Lagrange's method, (b) by the parametrization
x = a cos θ, y = a sin θ, u = b cos φ, v = b sin φ.
17. Let P₁ = (x₁, y₁) and P₂ = (x₂, y₂) be two points in the plane such that
x₁ ≠ x₂ and y₁ > 0 > y₂. A particle travels in a straight line from P₁ to a point
Q on the x-axis with speed v₁, then in a straight line from Q to P₂ with speed
v₂. The point Q is allowed to vary. Use Lagrange's method to show that the
total travel time from P₁ to P₂ is minimized when (sin θ₁)/(sin θ₂) = v₁/v₂,
where θ₁ (resp. θ₂) is the angle between the line P₁Q (resp. QP₂) and the
vertical line through Q. (Hint: Take θ₁, θ₂ as the independent variables.)
18. Let x₁, x₂, ..., xₙ denote nonnegative numbers. For c > 0, maximize the
product x₁x₂⋯xₙ subject to the constraint x₁ + x₂ + ⋯ + xₙ = c, and hence
derive the inequality of geometric and arithmetic means,
$$(x_1 x_2 \cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n},$$
where equality holds if and only if the xⱼ's are all equal.
19. Let A be a symmetric n × n matrix, and let f(x) = (Ax) · x for x ∈ ℝⁿ. Show
that the maximum and minimum of f on the unit sphere {x : |x| = 1} are the
largest and smallest eigenvalues of A.
So far our focus has been on real-valued functions on ℝⁿ, that is, mappings from
ℝⁿ to ℝ. In a number of situations, however, it is useful to consider vector-valued
functions, that is, mappings (or maps, for short) from ℝⁿ to ℝᵐ where n and m
are any positive integers. We shall denote such functions or mappings by boldface
letters such as f:
• In the situation of the chain rule, where w is a function of x₁, ..., xₙ and the
xⱼ's are functions of other variables t₁, ..., t_k, we are dealing with a map
x = g(t) from ℝᵏ to ℝⁿ.
• A map f : ℝⁿ → ℝⁿ can represent a vector field, that is, a map that assigns
to each point x a vector quantity f(x) such as a force or a magnetic field.
You can see that the study of mappings from Rn to Rm is complicated, as the study
of the linear ones already constitutes the subject of linear algebra! However, the
basic ideas of differential calculus generalize easily from the scalar case. The only
bits of linear algebra we need for present purposes are the correspondence between
linear maps and matrices, the notion of addition and multiplication of matrices, and
the notion of determinant; see Appendix A, (A.3)-(A.15) and (A.24)-(A.33).
"Here we use the word "linear" in the more restrictive sense; see Appendix A, (A.5).
There can only be one such matrix L (the reason is given in the next paragraph),
and it is called the (Fréchet) derivative of f at a. Commonly used notations for
it include Df(a), D_a f, f′(a), and df_a. We shall denote it by Df(a). Thus, if f is
differentiable on S, the map Df that assigns to each a ∈ S the derivative Df(a) is
a matrix-valued function on S.
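We need to verify that there is at most one matrix L satisfying (2.84). If L′ is
another such matrix, we have
$$|L\mathbf{h} - L'\mathbf{h}| \le |\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L\mathbf{h}| + |\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L'\mathbf{h}|,$$
so that |Lh − L′h|/|h| → 0 as h → 0. But if L′ ≠ L, we can pick a unit vector u with
Lu ≠ L′u. Setting h = su, we have h → 0 as s → 0, but
$$\frac{|L(s\mathbf{u}) - L'(s\mathbf{u})|}{|s\mathbf{u}|} = |L\mathbf{u} - L'\mathbf{u}| \neq 0.$$
This is a contradiction, so L′ = L.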
In the scalar case m = 1 (where the boldface f is just a scalar function f), the
definition of differentiability above coincides with the old one, and Df(a) is just
∇f(a), considered as a row vector, i.e., a 1 × n matrix. (If we think of ∇f(a) as a
column vector, then Df(a) = [∇f(a)]ᵗ.) Something similar happens when m > 1.
Indeed, a vector v approaches the vector 0 precisely when each of its components
approaches the number 0, so (2.84) is equivalent to the equations
$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{f_j(\mathbf{a}+\mathbf{h}) - f_j(\mathbf{a}) - \boldsymbol{\ell}_j\mathbf{h}}{|\mathbf{h}|} = 0 \qquad (j = 1, \dots, m),$$
where ℓⱼ is the jth row of the matrix L. But these equations say that the components fⱼ are differentiable at x = a and that ∇fⱼ(a) = ℓⱼ. In short, we have:
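$$D\mathbf{f} = \begin{pmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & & \vdots \\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n \end{pmatrix}$$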
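Numerically, the Jacobian matrix Df(a) can be approximated column by column with difference quotients, which makes the row and column bookkeeping concrete: column j approximates ∂_j f(a), and row i approximates ∇f_i(a). The map f below is our own example, the step h = 10⁻⁶ is an arbitrary assumption, and central differences are only approximate.

```python
import math


def jacobian_fd(f, a, h=1e-6):
    """Central-difference approximation to the m x n matrix Df(a).

    Column j approximates d_j f(a); row i therefore approximates the
    gradient of the component f_i.
    """
    m, n = len(f(a)), len(a)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(plus), f(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def f(p):
    """Example map R^2 -> R^2 (our choice): f(x, y) = (x^2 y, x + sin y)."""
    x, y = p
    return [x**2 * y, x + math.sin(y)]


def Df_exact(p):
    """Its Jacobian matrix, with rows grad f_1 and grad f_2."""
    x, y = p
    return [[2 * x * y, x**2], [1.0, math.cos(y)]]


approx, exact = jacobian_fd(f, [1.0, 2.0]), Df_exact([1.0, 2.0])
print(max(abs(u - v) for ru, rv in zip(approx, exact) for u, v in zip(ru, rv)))
```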
2.10. Vector-Valued Functions and Their Derivatives 109
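The general form of the chain rule can now be stated very simply:
$$D(\mathbf{f} \circ \mathbf{g})(\mathbf{a}) = D\mathbf{f}(\mathbf{g}(\mathbf{a}))\, D\mathbf{g}(\mathbf{a}),$$
where the expression on the right is the product of the matrices Df(g(a)) and
Dg(a).
$$\partial_k h_i = (\partial_1 f_i)(\partial_k g_1) + \cdots + (\partial_n f_i)(\partial_k g_n) = \sum_{j=1}^{n} (\partial_j f_i)(\partial_k g_j).$$
(Here the ∂_k g_j are to be evaluated at a, and the ∂_j f_i at g(a).) But ∂_k h_i is the ikth entry
of the matrix Dh, and the sum on the right is the ikth entry of the product matrix
(Df)(Dg), so we are done. □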
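A quick numerical check of the chain rule in matrix form (ours; the maps g : ℝ² → ℝ² and f : ℝ² → ℝ³ below are hypothetical examples chosen only for illustration). The product Df(g(a)) Dg(a) of hand-computed Jacobians is compared with a central-difference approximation to D(f ∘ g)(a).

```python
import math


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]


def jacobian_fd(F, a, h=1e-6):
    """Central-difference approximation to DF(a)."""
    cols = []
    for j in range(len(a)):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        cols.append([(p - m) / (2 * h) for p, m in zip(F(plus), F(minus))])
    return [list(row) for row in zip(*cols)]


def g(t):
    return [t[0] * t[1], t[0] + t[1]]


def Dg(t):
    return [[t[1], t[0]], [1.0, 1.0]]


def f(x):
    return [x[0] ** 2, x[0] * x[1], math.sin(x[1])]


def Df(x):
    return [[2 * x[0], 0.0], [x[1], x[0]], [0.0, math.cos(x[1])]]


a = [0.7, -1.3]
chain = matmul(Df(g(a)), Dg(a))              # Df(g(a)) Dg(a)
direct = jacobian_fd(lambda t: f(g(t)), a)   # D(f o g)(a), numerically
print(max(abs(u - v) for ru, rv in zip(chain, direct) for u, v in zip(ru, rv)))
```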
Since the product of two matrices gives the composition of the linear transformations defined by those matrices, the chain rule just says that the linear approximation of a composition is the composition of the linear approximations.
As we pointed out at the end of §2.1, the mean value theorem is false for vector-valued functions. That is, for a differentiable ℝᵐ-valued function f with m > 1,
given two points a and b there is usually no c on the line segment between a and b
such that f(b) − f(a) = [Df(c)](b − a). However, the main corollary of the mean
value theorem, an estimate on |f(a) − f(b)| in terms of a bound on the derivative
of f, is still valid. To state it, we employ the following terminology: The norm of
a linear mapping A : ℝⁿ → ℝᵐ is the smallest constant C such that |Ax| ≤ C|x|
for all x ∈ ℝⁿ. The norm of A is denoted by ‖A‖; thus,
$$|A\mathbf{x}| \le \|A\|\,|\mathbf{x}| \quad \text{for all } \mathbf{x} \in \mathbb{R}^n.$$
Proof. Given a unit vector u ∈ ℝᵐ, let us consider the scalar-valued function
f_u(x) = u · f(x). Clearly f_u is differentiable on S and ∂_k f_u = u · ∂_k f =
Σ_{j=1}^m u_j ∂_k f_j. By the mean value theorem (2.39) applied to f_u, then, there is a
point c on the line segment between a and b (depending on u) such that
The desired result now follows by taking u to be the unit vector in the direction of
f(b) − f(a), so that u · [f(b) − f(a)] = |f(b) − f(a)|. (Of course, if f(b) − f(a) = 0,
the result is trivial.) □
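The norm ‖A‖ used in this estimate is not usually available in closed form. A standard fact, not proved here, is that ‖A‖² is the largest eigenvalue of AᵗA, which suggests estimating ‖A‖ by power iteration. The sketch below is our own illustration under that assumption. For A with rows (1, 2) and (0, 1), one can check by hand that AᵗA has eigenvalues 3 ± 2√2, so ‖A‖ = √(3 + 2√2) = 1 + √2.

```python
import math


def matvec(A, x):
    """Matrix (list of rows) times vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def operator_norm(A, iters=500):
    """Estimate ||A|| = max{|Ax| : |x| = 1} by power iteration on A^t A.

    Uses the standard fact that ||A||^2 is the largest eigenvalue of A^t A.
    The all-ones starting vector is an arbitrary choice; the method fails
    if it happens to be orthogonal to the dominant eigenvector.
    """
    At = transpose(A)
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))
        size = math.sqrt(sum(v * v for v in y))
        if size == 0.0:
            return 0.0
        x = [v / size for v in y]
    return math.sqrt(sum(v * v for v in matvec(A, x)))


A = [[1.0, 2.0], [0.0, 1.0]]
print(operator_norm(A), 1 + math.sqrt(2))   # approximately equal
```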
(2.89)
(The last notation may look peculiar at first, but it is actually quite handy.) Since
the determinant of a product of two matrices is the product of the determinants, the
chain rule implies that if y = f(x) and x = g(t) (t, x, y ∈ ℝⁿ), then
J_{f∘g}(t) = J_f(g(t)) J_g(t), or
$$\frac{\partial(y_1, \dots, y_n)}{\partial(t_1, \dots, t_n)} = \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)}\,\frac{\partial(x_1, \dots, x_n)}{\partial(t_1, \dots, t_n)}. \tag{2.90}$$
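As a concrete check of (2.90), take (as our own example) g(r, θ) = (r cos θ, r sin θ), whose Jacobian is r, and f(x, y) = (x² − y², 2xy), whose Jacobian is 4(x² + y²). Then (f ∘ g)(r, θ) = (r² cos 2θ, r² sin 2θ), and (2.90) predicts the Jacobian 4r² · r = 4r³. The Python sketch compares both sides at one point.

```python
import math


def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def Jg(r, theta):
    """Derivative of g(r, theta) = (r cos theta, r sin theta); det = r."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def Jf(x, y):
    """Derivative of f(x, y) = (x^2 - y^2, 2xy); det = 4(x^2 + y^2)."""
    return [[2 * x, -2 * y], [2 * y, 2 * x]]


def J_composite(r, theta):
    """Derivative of (f o g)(r, theta) = (r^2 cos 2theta, r^2 sin 2theta)."""
    return [[2 * r * math.cos(2 * theta), -2 * r**2 * math.sin(2 * theta)],
            [2 * r * math.sin(2 * theta), 2 * r**2 * math.cos(2 * theta)]]


r, theta = 1.5, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)
lhs = det2(J_composite(r, theta))
rhs = det2(Jf(x, y)) * det2(Jg(r, theta))
print(lhs, rhs)   # both approximately 4 r^3 = 13.5
```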
Df(x,y,z)-{ 5y_7z
SO
= (10, -
y)
EXERCISES
7. Suppose f and g are differentiable mappings from ℝⁿ to ℝᵐ. Show that their
dot product, h(x) = f(x) · g(x), is a differentiable real-valued function on ℝⁿ,
and that
$$\nabla h(\mathbf{x}) = [D\mathbf{f}(\mathbf{x})]^t\,\mathbf{g}(\mathbf{x}) + [D\mathbf{g}(\mathbf{x})]^t\,\mathbf{f}(\mathbf{x})$$
if we think of ∇h(x), f(x), and g(x) as column vectors. (Here Aᵗ denotes the
transpose of the matrix A; see Appendix A, (A.15).)
8. Suppose that w = f(x, y, t, s) and x and y are also functions of t and s (the
situation depicted in Figure 2.3). The total dependence of w on t and s can be
expressed by writing w = f(g(t, s)) where g(t, s) = (x(t, s), y(t, s), t, s).
Show that the chain rule (2.86), applied to the composite function f ∘ g, yields
the same result as the one obtained in §2.3.
In this chapter we take up the general question of the local solvability of systems
of equations involving nonlinear differentiable functions. The main result is the
implicit function theorem, one of the major theoretical results of advanced calculus. Among other things, it provides the key to answering many questions about
relations between analytic properties of functions and geometric properties of the
sets they define. We shall present some of its applications to the study of geometric transformations, coordinate systems, and various ways of representing curves,
surfaces, and smooth sets of higher dimension.
Simple examples show that it is usually impossible to represent the whole set
S = {x : F(x) = 0} as the graph of a function. For example, if n = 2 and
F(x, y) = x² + y² − 1, the set S is the unit circle. We can represent the upper or
lower semicircle as the graph of f(x) = ±√(1 − x²), and the right or left semicircle
as the graph of g(y) = ±√(1 − y²), but the whole circle is not a graph. Thus, in
order to get reasonable results, we must be content only to represent pieces of S
as graphs. More specifically, our object will be to represent a piece of S in the
neighborhood of a given point a ∈ S as a graph.
Since we want to single out one of the variables as the one to be solved for, we
make a little change of notation: We denote the number of variables by n + 1 and
denote the last variable by y rather than x_{n+1}. We then have the following precise
analytical statement of the problem:
We do not try to specify in advance how big the open sets in question will be; that
will depend strongly on the nature of the function F.
The key to the answer is to look at the linear case. If
the solution is obvious: The equation L(x, y) = 0 can be solved for y if and only
if the coefficient β is nonzero. But near a given point (a, b), every differentiable
function F(x, y) is approximately linear; in fact, if F(a, b) = 0,
If the "small error" were not there, the equation F(x, y) = 0 could be solved for y
precisely when ∂_y F(a, b) ≠ 0. We now show that the condition ∂_y F(a, b) ≠ 0 is
still the appropriate one when the error term is taken into account.
3.1 Theorem (The Implicit Function Theorem for a Single Equation). Let
F(x, y) be a function of class C¹ on some neighborhood of a point (a, b) ∈ ℝⁿ⁺¹.
Suppose that F(a, b) = 0 and ∂_y F(a, b) ≠ 0. Then there exist positive numbers
r₀, r₁ such that the following conclusions are valid.
3.1. The Implicit Function Theorem 115
FIGURE 3.1: The geometry of the implicit function theorem. ∂_y F > 0
in the box, F > 0 on the top side, F < 0 on the bottom side, and
F = 0 on the curve.
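a. For each x in the ball |x − a| < r₀ there is a unique y such that |y − b| < r₁
and F(x, y) = 0. We denote this y by f(x); in particular, f(a) = b.
b. The function f thus defined for |x − a| < r₀ is of class C¹, and its partial
derivatives are given by
$$\partial_j f(\mathbf{x}) = -\frac{\partial_j F(\mathbf{x}, f(\mathbf{x}))}{\partial_y F(\mathbf{x}, f(\mathbf{x}))}. \tag{3.2}$$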
Notes.
i. The number r₀ may be very small, and there is no way to estimate its size
without further hypotheses on F.
ii. The formula (3.2) for ∂_j f is, of course, the one obtained via the chain rule
by differentiating the equation F(x, f(x)) = 0.
Proof. We first prove (a). We may assume that ∂_y F(a, b) > 0 (by replacing F by
−F if necessary). Since ∂_y F is continuous, it remains positive in some neighborhood of (a, b), say for |x − a| ≤ r₁ and |y − b| ≤ r₁. On this set, F(x, y) is a strictly
increasing function of y for each fixed x. In particular, since F(a, b) = 0 we have
F(a, b + r₁) > 0 and F(a, b − r₁) < 0. The continuity of F then implies that for
some r₀ < r₁ we have F(x, b + r₁) > 0 and F(x, b − r₁) < 0 for |x − a| < r₀.
In short, for each x in the ball B = {x : |x − a| < r₀} we have F(x, b − r₁) < 0,
F(x, b + r₁) > 0, and F(x, y) is strictly increasing as a function of y for |y − b| ≤
r₁. It follows from the intermediate value theorem that there is a unique y for each
x ∈ B that satisfies |y − b| < r₁ and F(x, y) = 0, which establishes (a). See
Figure 3.1.
Next we observe that the function y = f(x) thus defined is continuous at x = a;
in other words, for any ε > 0 there is a δ > 0 such that |f(x) − f(a)| < ε
116 Chapter 3. The Implicit Function Theorem and Its Applications
whenever |x − a| < δ. Indeed, the argument just given shows that |f(x) − f(a)| =
|y − b| < r₁ whenever |x − a| < r₀, and we could repeat that argument with r₁
replaced by any smaller number ε to obtain an appropriate δ in place of r₀.
In fact, this argument can also be applied with a replaced by any other point x₀
in the ball B to show that f is continuous at x₀. To recapitulate it briefly: Given
ε > 0, there exists δ > 0 such that if |x − x₀| < δ we have F(x, y₀ − ε) < 0 and
F(x, y₀ + ε) > 0, where y₀ = f(x₀). For each such x there is a unique y such
that |y − y₀| < ε and F(x, y) = 0, and that y is f(x); hence |f(x) − f(x₀)| =
|y − y₀| < ε.
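Now that we know that f is continuous on B, we can show that its partial
derivatives ∂_j f exist on B and are given by (3.2), which also shows that they are
continuous. Given x ∈ B and a (small) real number h, let y = f(x) and
k = f(x + he_j) − f(x), where e_j is the jth standard basis vector. Then
F(x + he_j, y + k) = F(x, y) = 0, so by the mean value theorem applied to
s ↦ F(x + she_j, y + sk) on [0, 1] there is a t ∈ (0, 1) such that
$$\begin{aligned} 0 &= F(\mathbf{x} + h\mathbf{e}_j, y + k) - F(\mathbf{x}, y) \\ &= h\,\partial_j F(\mathbf{x} + th\mathbf{e}_j, y + tk) + k\,\partial_y F(\mathbf{x} + th\mathbf{e}_j, y + tk), \end{aligned}$$
that is, k/h = −∂_j F(x + the_j, y + tk)/∂_y F(x + the_j, y + tk).
Now let h → 0. Since f is continuous we also have k → 0, and then since ∂_j F
and ∂_y F are continuous and ∂_y F ≠ 0, passage to the limit yields (3.2). □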
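The proof of (a) suggests a way to compute f(x): for x in B, F(x, ·) is increasing on [b − r₁, b + r₁] and changes sign there, so bisection locates the root. The sketch below (our own) applies this to the hypothetical example F(x, y) = x² + y² − 1 near (a, b) = (0.6, 0.8), taking r₁ = 0.1 by assumption, and compares a difference quotient of the computed f with formula (3.2), which here gives f′(x) = −x/f(x). If ∂_y F(a, b) < 0, one would first replace F by −F, as in the proof.

```python
import math


def implicit_solve(F, x, b, r1, tol=1e-13):
    """Return the y with |y - b| <= r1 and F(x, y) = 0, by bisection.

    Assumes, as in the proof of Theorem 3.1, that F(x, .) is strictly
    increasing on [b - r1, b + r1] with F(x, b - r1) < 0 < F(x, b + r1).
    """
    lo, hi = b - r1, b + r1
    if not F(x, lo) < 0 < F(x, hi):
        raise ValueError("sign condition fails: x is too far from a")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def F(x, y):
    return x**2 + y**2 - 1       # d_y F = 2y > 0 near (0.6, 0.8)


def f(x):
    return implicit_solve(F, x, b=0.8, r1=0.1)


x, h = 0.65, 1e-5
slope = (f(x + h) - f(x - h)) / (2 * h)
formula = -x / f(x)              # (3.2): -d_x F / d_y F = -2x / (2y)
print(f(x), math.sqrt(1 - x**2))  # agree to about 1e-13
print(slope, formula)            # agree to several decimal places
```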
3.3 Corollary. Let F be a function of class C¹ on ℝⁿ, and let S = {x : F(x) = 0}.
For every a ∈ S such that ∇F(a) ≠ 0 there is a neighborhood N of a such that
S ∩ N is the graph of a C¹ function.
point (a, b) for which F(a, b) = 0. Of course, for this particular F it is easy
to solve for x explicitly, namely x = y² + 1, and this solution is valid
not just locally but globally. Next, ∂_y F(a, b) = 0 precisely when b = 0, so
the implicit function theorem guarantees that the equation F(x, y) = 0 can be
solved uniquely for y near any point (a, b) such that F(a, b) = 0 and b ≠ 0.
In fact, the possible solutions are y = √(x − 1) and y = −√(x − 1). For x very
close to a, only one of these solutions will be very close to b (namely, √(x − 1)
if b > 0 and −√(x − 1) if b < 0), and this solution is the one that figures in
the implicit function theorem. Also, these solutions are defined only for x ≥ 1,
so the number r₀ in the statement of the implicit function theorem is a − 1.
Finally, we have F(1, 0) = 0, but the equation F(x, y) = 0 cannot be solved
uniquely for y as a function of x in any neighborhood of (1, 0): If x > 1 there
are two solutions, both equally close to 0, and if x < 1 there are none.
We now turn to the more general problem of solving several equations simultaneously for some of the variables occurring in them. This will require some
facts about invertible matrices and determinants, for which we refer to Appendix
A, (A.24)-(A.33) and (A.50)-(A.55). To fix the notation, we shall consider k functions F₁, ..., F_k of n + k variables x₁, ..., xₙ, y₁, ..., y_k, and ask when we can
solve the equations
$$\begin{aligned} F_1(x_1, \dots, x_n, y_1, \dots, y_k) &= 0, \\ &\;\;\vdots \\ F_k(x_1, \dots, x_n, y_1, \dots, y_k) &= 0 \end{aligned} \tag{3.4}$$
for the y's in terms of the x's. We shall use vector notation to abbreviate (3.4) as
$$\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}. \tag{3.5}$$
We assume that F is of class C1 near a point (a, b) such that F(a, b) = 0, and we
ask when (3.5) determines y as a C1 function of x in some neighborhood of (a, b).
Again the key to the problem is to consider the linear case,
(3.6) Ax + By + c = 0,
(3.7) £i;
yf, then, after substituting the results into the remaining equation, one solves that
equation for the remaining variable. The main difficulty is in showing that the
implicit function theorem can be applied to the last equation. □
$$x - yu^2 = 0, \qquad xy + uv = 0 \tag{3.10}$$
$$\frac{\partial(x - yu^2,\; xy + uv)}{\partial(u, v)} = \det\begin{pmatrix} -2yu & 0 \\ v & u \end{pmatrix} = -2yu^2,$$
so the implicit function theorem guarantees a local solution near any point
(x₀, y₀, u₀, v₀) at which (3.10) holds provided that −2y₀u₀² ≠ 0, that is, y₀ ≠ 0
and u₀ ≠ 0. Notice that under this condition, the first equation in (3.10) implies that x₀ ≠ 0 and that x₀ and y₀ have the same sign; the second equation
then implies that v₀ ≠ 0 and that u₀ and v₀ have opposite signs.
It is not hard to find the solution explicitly:
the signs of u and v being the same as the signs of u₀ and v₀, respectively. This
solution is valid for all (x, y) in the same quadrant as (x₀, y₀). The problems
that arise if y₀ = 0 or u₀ = 0 are evident: If y₀ = 0, then the formula for u
does not even make sense for y = y₀; if u₀ = 0, then x₀ must also be 0, and
the square roots present the same sort of problem as in Example 1.
EXERCISES
2. Show that the equation x² + 2xy + 3y² = c can be solved either for y as a
C¹ function of x or for x as a C¹ function of y (but perhaps not both) near any
point (a, b) such that a² + 2ab + 3b² = c, provided that c > 0. What happens
if c = 0 or if c < 0?
3. Can the equation (x² + y² + 2z²)^{1/2} = cos z be solved uniquely for y in terms
of x and z near (0, 1, 0)? For z in terms of x and y?
4. Sketch the graph of the equation x − e^{1−x} − y³ = 0 in Example 2. Show
graphically that for each x there is a unique y satisfying this equation, and vice
versa.
Of these, (i) is the simplest, and it is a special case of the other two. Indeed, the curve
given by y = f(x) is the locus of the equation F(x, y) = 0 where F(x, y) =
'The locus of an equation F(x) = c is the set of all x that satisfy the equation.
3.2. Curves in the Plane 121
y − f(x), and it is also the range of the map f(t) = (t, f(t)). The representations
(ii) and (iii) are more flexible, but they are also too general as they stand because
the sets represented by them may not be smooth curves. Consider the following
examples, in which c denotes an arbitrary real constant:
Example 4. The function g(t) = (sin² t, cos² t) is C¹, but its range is the
line segment from (0, 1) to (1, 0). The point g(t) traverses this line segment
from (0, 1) to (1, 0) as t goes from 0 to π/2, then traverses it in the reverse
direction as t goes from π/2 to π, and this pattern is repeated on every interval
[nπ, (n + 1)π].
In these examples, the functions in question are all of class C¹, but the sets they
describe fail to be smooth curves at certain points. However, they share a common
feature: The points where smoothness fails (namely, the origin in Examples 1-3
and the points (0, 1) and (1, 0) in Example 4) are the points where the derivatives
of the relevant functions vanish. That is, the origin is the one and only point where
the gradients ∇F, ∇G, and ∇H vanish, and it is the image under f of the one and
only point (t = 0) where f′ vanishes. Moreover, (0, 1) and (1, 0) are the images
under g of the points t = nπ and t = (n + ½)π where g′(t) = 0.
This suggests that it might be a good idea to impose the extra conditions that
∇F ≠ 0 on the set where F = 0 in (ii) and that f′(t) ≠ 0 in (iii). And indeed, with
the help of the implicit function theorem, it is easy to see that under these extra
conditions the representations (i)-(iii) are all locally equivalent. That is, if a curve
is represented in one of the forms (i)-(iii) and a is a point on the curve, at least a
small piece of the curve including the point a can also be represented in the other
two forms.
We now make this precise. Since (i) is more special than either (ii) or (iii), as
we have observed above, it is enough to see that a curve given by (ii) or (iii) can
also be represented in the form (i).
3.11 Theorem.
a. Let F be a real-valued function of class C¹ on an open set in ℝ², and let S =
{(x, y) : F(x, y) = 0}. If a ∈ S and ∇F(a) ≠ 0, there is a neighborhood N
of a in ℝ² such that S ∩ N is the graph of a C¹ function f (either y = f(x) or
Proof. Part (a) is a special case of Corollary 3.3. As for (b), let f = (φ, ψ). If
f′(t₀) ≠ 0, then either φ′(t₀) ≠ 0 or ψ′(t₀) ≠ 0; let's assume that the former
condition holds. Let F(x, t) = x − φ(t) and x₀ = φ(t₀). Since ∂_t F(x₀, t₀) =
−φ′(t₀) ≠ 0, the implicit function theorem guarantees that the equation x = φ(t)
can be solved for t as a C¹ function of x, say t = ω(x), in some neighborhood of the
point (x₀, t₀). But then (φ(t), ψ(t)) = (x, ψ(ω(x))) for t in some neighborhood I
of t₀; that is, the set {f(t) : t ∈ I} is the graph of the C¹ function f = ψ ∘ ω. (If
ψ′(t₀) ≠ 0 instead, one can make the same argument with x and y switched.) □
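The construction in the proof of (b) can be imitated numerically. For the hypothetical curve f(t) = (t³ + t, t²) (our example, not from the text), φ′(t) = 3t² + 1 never vanishes, so x = φ(t) can be inverted to t = ω(x), and the curve is the graph of y = ψ(ω(x)). In the sketch below, bisection stands in for the inverse function supplied by the implicit function theorem; the bracket [−10, 10] is an assumption that covers the points tested.

```python
def invert_increasing(phi, x, lo, hi, tol=1e-13):
    """Return t in [lo, hi] with phi(t) = x, for phi strictly increasing
    on [lo, hi] and phi(lo) <= x <= phi(hi), by bisection.  This plays the
    role of the local inverse t = omega(x) in the proof."""
    if not phi(lo) <= x <= phi(hi):
        raise ValueError("x is not in phi([lo, hi])")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def phi(t):
    return t**3 + t    # phi'(t) = 3t^2 + 1 never vanishes


def psi(t):
    return t**2


def graph_function(x):
    """y = psi(omega(x)): the curve t -> (phi(t), psi(t)) written as a graph."""
    return psi(invert_increasing(phi, x, -10.0, 10.0))


for t in (-0.7, 0.5, 1.0):
    x = phi(t)
    print(x, graph_function(x), psi(t))   # last two columns agree
```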
With this in mind, we may make the following more formal definition of a
smooth curve: A set S ⊂ ℝ² is a smooth curve if (a) S is connected, and (b)
every a ∈ S has a neighborhood N such that S ∩ N is the graph of a C¹ function
f (either y = f(x) or x = f(y)). This agrees with the notion of smooth curve
introduced at the beginning of this section: The curve described by y = f(x)
has a tangent line at each point (x₀, f(x₀)), and that line is given by an equation
y − f(x₀) = f′(x₀)(x − x₀) whose coefficients depend continuously on x₀.
It should be emphasized that the conditions ∇F ≠ 0 and f′ ≠ 0 in Theorem
3.11 are sufficient for the smoothness of the associated curves but not necessary.
In other words, the condition ∇F(a) = 0 or f′(t₀) = 0 allows the possibility
of non-smoothness at a or f(t₀) but does not guarantee it. For example, suppose
G(x, y) is a C¹ function whose gradient does not vanish on the set S = {(x, y) :
G(x, y) = 0}, so that S is a smooth curve, and let F = G². Then the set where
F = 0 coincides with S, but ∇F = 2G∇G = 0 on S! Similarly, as t ranges over
the interval (−1, 1), the functions f(t) and g(t) = f(t³) describe the same curve,
but g′(0) = 0 no matter what f is.
The following question remains: Suppose S is a subset of ℝ² that is described
in one of the forms (i)-(iii), and suppose that the regularity condition ∇F ≠ 0 on
S (in case (ii)) or f′(t) ≠ 0 for all t ∈ (a, b) (in case (iii)) is satisfied. Theorem
3.11 shows that every sufficiently small piece of S is a smooth curve, but is the
entire set S a smooth curve? In case (i) the answer is clearly yes. However, in cases
(ii) and (iii) the answer may be no.
The trouble in case (ii) is that S may be disconnected. For example, if F =
GH, then S is the union of the sets {(x, y) : G(x, y) = 0} and {(x, y) : H(x, y) =
0}, and these sets may well be disjoint and form a disconnection of S. (Also see
Exercise 6.)
Example 5. Let F(x, y) = (x² + y² − 1)(x² + y² − 2). Then the set where
F = 0 is the union of two disjoint circles centered at the origin. See Figure
3.3.
As for the representation (iii), a set of the form {f(t) : a < t < b} is necessarily
connected if f is continuous (Theorem 1.26). However, the function f(t) may not
be one-to-one, in which case the curve it describes may be traced more than once
(as we observed in Example 4) or may cross itself. These phenomena can happen
even if f′(t) never vanishes. Consequently, the condition f′(t) ≠ 0 is not sufficient
to guarantee that the set S = {f(t) : t ∈ (a, b)} is a smooth curve, only that
the pieces of it obtained by restricting t to small intervals are smooth curves. In
practice, sometimes one simply imposes the extra assumption that f is one-to-one
in order to avoid various pitfalls.
The reader with access to a computer graphics program may find it entertaining
to experiment with examples similar to the ones in this section to obtain a better
understanding of the relations between analytic and geometric properties of functions and to see the various types of singularities that can arise when the regularity
condition ∇F ≠ 0 or f′(t) ≠ 0 is violated.
EXERCISES
1. For each of the following functions F(x,y), determine whether the set S =
{(x, y) : F(x, y) = 0} is a smooth curve. Draw a sketch of S. Examine the
3. For each of the following functions f(t), determine whether the set S = {f(t) :
t ∈ ℝ} is a smooth curve. Draw a sketch of S. Examine the nature of S near
any points f(t) where f′(t) = 0.
a. f(t) = (t² − 1, t + 1).
b. f(t) = (t² − 1, t² + 1).
c. f(t) = (t³ − 1, t³ + 1).
d. f(t) = (cos³ t, sin³ t).
e. f(t) = (cos t + cos 2t, sin t + sin 2t).
4. Let φ(s) = s² if s ≥ 0, φ(s) = −s² if s < 0.
a. Show that φ is of class C¹, even at s = 0.
b. Let f(t) = (φ(cos t), φ(sin t)). Show that {f(t) : t ∈ ℝ} is the square
with vertices at (±1, 0) and (0, ±1). For which values of t is f′(t) = 0?
What are the corresponding points f(t)?
6. Let F₁ and F₂ be C¹ functions on some open set U in the plane, and let F₃ =
F₁F₂. For j = 1, 2, 3, let S_j = {x ∈ U : F_j(x) = 0}.
a. Show that S₃ = S₁ ∪ S₂.
b. Show that if a ∈ S₁ ∩ S₂, then ∇F₃(a) = 0.
In this section we discuss ways of representing smooth surfaces and curves in ℝ³,
with a brief sketch of the situation in higher dimensions.
As before, (i) is a special case of (ii) and (iii), with F(x, y, z) = z − f(x, y) and
f(u, v) = (u, v, f(u, v)), and as before, some additional conditions need to be
imposed in cases (ii) and (iii) in order to guarantee the smoothness of the surface.
The condition in case (ii) is exactly the same as for curves, namely, that
Here x = (x, y, z) and u = (u, v); the variables u and v are the parameters used to
represent the surface S. We can think of them as giving a coordinate system on S,
with the coordinate grid being formed by the images of the lines v = constant and
u = constant, that is, the curves given parametrically by x = f(u, c) and x = f(c, v).
The picture is as in Figure 3.4.
What is the appropriate nondegeneracy condition on the derivatives of f? A first
guess might be that the Fréchet derivative Df (a 3 × 2 matrix) should be nonzero,
but this is not enough. We can obtain more insight by looking at the case where
f is linear, that is, f(u, v) = ua + vb + c for some a, b, c ∈ ℝ³. Typically the
range of such an f is a plane, but if the vectors a and b are linearly dependent
(that is, if one is a scalar multiple of the other), it will only be a line (unless
a = b = 0, in which case it is a single point). Now, for a general smooth f, the
linear approximation to f near a point (u₀, v₀) is f(u, v) ≈ (u − u₀)a + (v − v₀)b + c where the
3.3. Surfaces and Curves in Space 111
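vectors a, b, and c are ∂_u f, ∂_v f, and f evaluated at (u₀, v₀). Hence we are led to
the regularity hypothesis:
$$\partial_u \mathbf{f}(u, v) \text{ and } \partial_v \mathbf{f}(u, v) \text{ are linearly independent at each } (u, v) \in U. \tag{3.13}$$
Since two vectors in ℝ³ are linearly independent if and only if their cross product
is nonzero, (3.13) can be restated as
$$\frac{\partial \mathbf{f}}{\partial u}(u, v) \times \frac{\partial \mathbf{f}}{\partial v}(u, v) \neq \mathbf{0} \quad \text{at each } (u, v) \in U. \tag{3.14}$$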
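Condition (3.14) is easy to test numerically. The sketch below (ours) approximates ∂_u f and ∂_v f by central differences and returns |∂_u f × ∂_v f|. It is applied to the usual spherical-coordinate map (sin φ cos θ, sin φ sin θ, cos φ), an assumed form with θ the longitude and φ the co-latitude, for which |∂_θ f × ∂_φ f| = sin φ vanishes at the poles; and to a hypothetical linear map ua + vb + c with b = 2a, whose range is only a line.

```python
import math


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def partials(f, u, v, h=1e-6):
    """Central-difference approximations to d_u f and d_v f at (u, v)."""
    fu = tuple((p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v)))
    fv = tuple((p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h)))
    return fu, fv


def regularity(f, u, v):
    """|d_u f x d_v f| at (u, v); condition (3.14) asks that this be nonzero."""
    fu, fv = partials(f, u, v)
    return math.sqrt(sum(c * c for c in cross(fu, fv)))


def sphere(theta, phi):
    """Assumed standard form: longitude theta, co-latitude phi, unit sphere."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def flat(u, v):
    """Hypothetical linear map u a + v b + c with b = 2a (a, b dependent)."""
    a, c = (1.0, 2.0, 3.0), (0.0, 0.0, 1.0)
    return tuple(u * ai + v * 2 * ai + ci for ai, ci in zip(a, c))


print(regularity(sphere, 0.3, 1.0), math.sin(1.0))  # approximately equal
print(regularity(sphere, 0.3, 0.0))                 # 0.0 at the north pole
print(regularity(flat, 0.5, -1.0))                  # approximately 0
```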
3.15 Theorem.
a. Let F be a real-valued function of class C¹ on an open set in ℝ³, and let
S = {(x, y, z) : F(x, y, z) = 0}. If a ∈ S and ∇F(a) ≠ 0, there is a
neighborhood N of a in ℝ³ such that S ∩ N is the graph of a C¹ function f
(either z = f(x, y), y = f(x, z), or x = f(y, z)).
128 Chapter 3. The Implicit Function Theorem and Its Applications
b. Let f be a Cl mapping from an open set in R2 into R3. If[dufx dvf] (uo ,^0)7^
0, there is a neighborhood N of (uo,vq) in R2 such that the set {f(u,t>) :
(it, v) £ N} is the graph of a C1 function.
Proof. Part (a) is a special case of Corollary 3.3. As for (b), let $\mathbf{f} = (\varphi, \psi, \chi)$. The components of the cross product $\partial_u\mathbf{f} \times \partial_v\mathbf{f}$ are just the Jacobians $\partial(\psi, \chi)/\partial(u,v)$, $\partial(\chi, \varphi)/\partial(u,v)$, and $\partial(\varphi, \psi)/\partial(u,v)$. Under the hypothesis of (b), at least one of them, let us say $\partial(\varphi, \psi)/\partial(u,v)$, is nonzero at $(u_0, v_0)$. The implicit function theorem then guarantees that the pair of equations $x = \varphi(u,v)$, $y = \psi(u,v)$ can be solved to yield $u$ and $v$ as $C^1$ functions of $x$ and $y$ near $u = u_0$, $v = v_0$, $x = \varphi(u_0, v_0)$, $y = \psi(u_0, v_0)$. Substituting these functions for $u$ and $v$ in the equation $z = \chi(u,v)$ then yields $z$ as a $C^1$ function of $x$ and $y$ whose graph is the range of $\mathbf{f}$ (restricted to a neighborhood of $(u_0, v_0)$). □
Thus the representations (i)-(iii) for surfaces are locally equivalent in the presence of the regularity conditions (3.12) and (3.13); a smooth surface is a connected subset of $\mathbb{R}^3$ that can be locally described in any of these three forms. The potential global problems with the representations (ii) and (iii) are the same as for plane curves; namely, the set where a $C^1$ function $F$ vanishes may be disconnected, and a map $\mathbf{f}$ that is locally one-to-one need not be globally one-to-one.
$\mathbf{f}(\theta, \varphi) = (\sin\varphi\cos\theta,\ \sin\varphi\sin\theta,\ \cos\varphi).$
Here $\theta$ is the longitude and $\varphi$ is the co-latitude, i.e., the latitude as measured from the north pole rather than the equator. The longitude $\theta$ is only well defined up to multiples of $2\pi$, but the co-latitude is usually restricted to the interval $[0, \pi]$. The sphere is a smooth surface, but the map $\mathbf{f}$ does not provide a "good" parametrization of the whole sphere because it is not locally one-to-one when $\sin\varphi = 0$. (That is, the longitude is completely undetermined at the north and south poles.) This degeneracy is also reflected in the tangent vectors $\partial_\theta\mathbf{f}$ and $\partial_\varphi\mathbf{f}$: they are linearly independent when $\sin\varphi \neq 0$, but $\partial_\theta\mathbf{f} = \mathbf{0}$ when $\sin\varphi = 0$.
However, if we restrict $\theta$ and $\varphi$ to the rectangle $-\pi < \theta < \pi$, $0 < \varphi < \pi$, we obtain a good parametrization of the sphere with the "international date line" removed.
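As a numerical check of this degeneracy, the sketch below assumes the unit-sphere parametrization $\mathbf{f}(\theta,\varphi) = (\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi)$ (radius 1 is our assumption) and computes $\partial_\theta\mathbf{f} \times \partial_\varphi\mathbf{f}$ from hand-computed partial derivatives. A short calculation shows that this cross product equals $-\sin\varphi\,\mathbf{f}(\theta,\varphi)$, so its length is $|\sin\varphi|$, which vanishes exactly at the poles.

```python
import math

def f(theta, phi):
    """Assumed unit-sphere parametrization (theta = longitude, phi = co-latitude)."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def d_theta(theta, phi):
    """Partial derivative of f with respect to theta."""
    return (-math.sin(phi) * math.sin(theta), math.sin(phi) * math.cos(theta), 0.0)

def d_phi(theta, phi):
    """Partial derivative of f with respect to phi."""
    return (math.cos(phi) * math.cos(theta), math.cos(phi) * math.sin(theta), -math.sin(phi))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normal(theta, phi):
    """d_theta f x d_phi f, which works out to -sin(phi) * f(theta, phi)."""
    return cross(d_theta(theta, phi), d_phi(theta, phi))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

print(norm(normal(0.4, math.pi / 2)))  # approximately 1 on the equator
print(norm(normal(0.4, 0.0)))          # 0.0 at the north pole
```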
Finally, a few words about finding the tangent plane to a smooth surface $S$ at a point $\mathbf{a} \in S$. In general, the tangent plane is given by the equation $\mathbf{n} \cdot (\mathbf{x} - \mathbf{a}) = 0$, where $\mathbf{n}$ is a (nonzero) normal vector to $S$ at $\mathbf{a}$. We have already observed in Theorem 2.37 that when $S$ is given by an equation $F = 0$, the vector $\nabla F(\mathbf{a})$ is normal to $S$ at $\mathbf{a}$. On the other hand, when $S$ is given parametrically as the range of a map $\mathbf{f}(u,v)$ and $\mathbf{a} = \mathbf{f}(u_0, v_0)$, the vectors $\partial_u\mathbf{f}(u_0, v_0)$ and $\partial_v\mathbf{f}(u_0, v_0)$ are tangent to certain curves in $S$ and hence to $S$ itself at $\mathbf{a}$; we therefore obtain a normal at $\mathbf{a}$ by taking their cross product. In both cases, the conditions on $F$ or $\mathbf{f}$ that guarantee the smoothness of $S$ also guarantee that these normal vectors are nonzero.
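Both recipes can be carried out in a few lines. The sketch below uses the hypothetical example $\mathbf{f}(u,v) = (u, v, u^2 + v^2)$ at $(u,v) = (1,1)$, with partial derivatives entered by hand, and compares the resulting normal with $\nabla F$ for the equivalent description $F(x,y,z) = z - x^2 - y^2 = 0$; both give the plane $z = 2x + 2y - 2$. The function names are illustrative.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tangent_plane(point, fu, fv):
    """Normal n = fu x fv and constant d such that the tangent plane is n . x = d."""
    n = cross(fu, fv)
    if n == (0, 0, 0):
        raise ValueError("fu and fv are linearly dependent; (3.14) fails")
    return n, dot(n, point)

# Hypothetical example: f(u, v) = (u, v, u^2 + v^2) at (u, v) = (1, 1).
u, v = 1, 1
point = (u, v, u**2 + v**2)      # f(1, 1) = (1, 1, 2)
fu = (1, 0, 2 * u)               # partial of f with respect to u at (1, 1)
fv = (0, 1, 2 * v)               # partial of f with respect to v at (1, 1)
n, d = tangent_plane(point, fu, fv)
print(n, d)                      # (-2, -2, 1) -2, i.e. z = 2x + 2y - 2

# The same surface is the zero set of F(x, y, z) = z - x^2 - y^2:
grad_F = (-2 * point[0], -2 * point[1], 1)
print(grad_F)                    # (-2, -2, 1)
```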
i. as a graph, $y = f(x)$ and $z = g(x)$ (or similar expressions with the coordinates permuted), where $f$ and $g$ are $C^1$ functions;
The form (ii) describes the curve as the intersection of the two surfaces $F = 0$ and $G = 0$, and (i) is a special case of (ii) (with $F(x,y,z) = y - f(x)$ and $G(x,y,z) = z - g(x)$) and of (iii) (with $\mathbf{f}(t) = (t, f(t), g(t))$).
By now the reader should be able to guess what the appropriate regularity condition for cases (ii) and (iii) is. In (iii) it is simply that $\mathbf{f}'(t) \neq \mathbf{0}$, and in (ii) it is that
$\nabla F$ and $\nabla G$ are linearly independent at every point of $S$.
(Geometrically, this means that the surfaces F = 0 and G = 0 are nowhere tangent
to each other.) With these conditions we have an analogue of Theorems 3.11 and
3.15. Rather than give another precise statement and proof, we sketch the ideas and
leave the details to the reader (Exercise 7).
First, if $\nabla F$ and $\nabla G$ are linearly independent, then at least one of the Jacobians $\partial(F,G)/\partial(x,y)$, $\partial(F,G)/\partial(x,z)$, and $\partial(F,G)/\partial(y,z)$ must be nonzero; let us say the last one. Then the implicit function theorem guarantees that the equations $F = G = 0$ can be solved for $y$ and $z$ as functions of $x$. Second, if $\mathbf{f}'(t) \neq \mathbf{0}$, then one of the components of $\mathbf{f}'(t)$ must be nonzero; let us say the first one. Then the equation $x = f_1(t)$ can be solved for $t$ in terms of $x$, and then the equations $y = f_2(t)$ and $z = f_3(t)$ yield $y$ and $z$ as functions of $x$. In either case we end up, locally, with the representation (i).
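The first reduction can be carried out numerically with Newton's method, which is one standard way (not necessarily the one behind the proof of the implicit function theorem) to compute implicitly defined functions. The sketch assumes the hypothetical pair $F(x,y,z) = x^2 + y^2 + z^2 - 2$, $G(x,y,z) = z - 1$, whose common zero set is the circle $x^2 + y^2 = 1$ in the plane $z = 1$; near $(0,1,1)$ we have $\partial(F,G)/\partial(y,z) = 2y \neq 0$, so $y$ and $z$ are functions of $x$ there. The name `solve_yz` is ours.

```python
def F(x, y, z):
    return x * x + y * y + z * z - 2

def G(x, y, z):
    return z - 1

def solve_yz(x, y, z, steps=50, tol=1e-14):
    """Newton's method for F = G = 0 in the unknowns (y, z), with x held fixed.

    (y, z) is the starting guess; each step needs d(F,G)/d(y,z) != 0.
    """
    for _ in range(steps):
        a, b = 2 * y, 2 * z          # dF/dy, dF/dz
        c, d = 0.0, 1.0              # dG/dy, dG/dz
        det = a * d - b * c          # the Jacobian d(F,G)/d(y,z)
        if det == 0:
            raise ZeroDivisionError("d(F,G)/d(y,z) vanishes here")
        r1, r2 = F(x, y, z), G(x, y, z)
        dy = (d * r1 - b * r2) / det   # solve the 2x2 linear system by Cramer's rule
        dz = (-c * r1 + a * r2) / det
        y, z = y - dy, z - dz
        if abs(dy) + abs(dz) < tol:
            break
    return y, z

for x in (0.0, 0.3, 0.6):
    print(x, solve_yz(x, 1.0, 1.0))   # expect y = sqrt(1 - x^2), z = 1
```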
Let us say a little more about what can go wrong in case (ii) when $\nabla F$ and $\nabla G$ are linearly dependent. The potential problems are clearly displayed in the following situation: Let $F(x,y,z) = z - \varphi(x,y)$, where $\varphi$ is a $C^1$ function, and let $G(x,y,z) = z$. Then the sets where $F = 0$ and $G = 0$ are smooth surfaces; the former is the graph of $\varphi$, whereas the latter is the $xy$-plane. The intersection of these two surfaces is the curve in the $xy$-plane described by the equation $\varphi(x,y) = 0$. Now, this curve can have all sorts of singularities if there are points on it where $\nabla\varphi = (0,0)$, as we have discussed in §3.2. But since $\nabla F = (-\partial_x\varphi, -\partial_y\varphi, 1)$ and $\nabla G = (0,0,1)$, the points where $\nabla\varphi = (0,0)$ are precisely the points where $\nabla F$ and $\nabla G$ are linearly dependent.
If a curve $S$ is represented parametrically by a function $\mathbf{f}(t)$, the derivative $\mathbf{f}'(t)$ furnishes a tangent vector to $S$ at the point $\mathbf{f}(t)$. On the other hand, if $S$ is given by a pair of equations $F = G = 0$ and $\mathbf{a} \in S$, the vectors $\nabla F(\mathbf{a})$ and $\nabla G(\mathbf{a})$ are both normal to $S$ at $\mathbf{a}$ and hence span the normal plane to $S$ at $\mathbf{a}$. One can therefore obtain a tangent vector to $S$ at $\mathbf{a}$ by taking their cross product.
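For a concrete instance, take the hypothetical circle where $F(x,y,z) = x^2 + y^2 + z^2 - 2$ and $G(x,y,z) = z - 1$ both vanish. The Python sketch below computes $\nabla F(\mathbf{a}) \times \nabla G(\mathbf{a})$ at $\mathbf{a} = (0,1,1)$ and checks that it is orthogonal to both gradients. As a cross-check, the parametrization $\mathbf{f}(t) = (\sin t, \cos t, 1)$ of the same circle passes through $\mathbf{a}$ at $t = 0$ with $\mathbf{f}'(0) = (1,0,0)$, a parallel vector.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grad_F(x, y, z):      # F(x, y, z) = x^2 + y^2 + z^2 - 2
    return (2 * x, 2 * y, 2 * z)

def grad_G(x, y, z):      # G(x, y, z) = z - 1
    return (0, 0, 1)

a = (0, 1, 1)
t = cross(grad_F(*a), grad_G(*a))
print(t)                                          # (2, 0, 0)
print(dot(t, grad_F(*a)), dot(t, grad_G(*a)))     # 0 0
```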
Higher Dimensions. The pattern for representations of curves and surfaces that we have established in this section and the preceding one should be pretty clear by now, and it generalizes readily to higher dimensions. We sketch the main points briefly and leave it to the ambitious reader to work out the details.
The general name for a "smooth $k$-dimensional object" is manifold; thus, a curve is a 1-dimensional manifold and a surface is a 2-dimensional manifold. Here we consider the question of representing $k$-dimensional manifolds in $\mathbb{R}^n$, for any positive integers $k$ and $n$ with $n > k$. The two general forms, corresponding to (ii) and (iii) above for curves and surfaces, are as follows.
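The Nonparametric Form: A $k$-dimensional manifold $S$ in $\mathbb{R}^n$ can be described as the set of simultaneous solutions of $n - k$ equations. That is, given $C^1$ functions $F_1, \ldots, F_{n-k}$ defined on some open set $U \subset \mathbb{R}^n$, or equivalently a $C^1$ mapping $\mathbf{F} = (F_1, \ldots, F_{n-k})$ from $U$ into $\mathbb{R}^{n-k}$, we can consider the set
$S = \{\mathbf{x} \in U : \mathbf{F}(\mathbf{x}) = \mathbf{0}\}$,
or, equivalently, $S = \{\mathbf{x} \in U : F_1(\mathbf{x}) = \cdots = F_{n-k}(\mathbf{x}) = 0\}$.
(3.17) $S = \{\mathbf{f}(\mathbf{u}) : \mathbf{u} \in V\}$.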
EXERCISES
1. For each of the following maps $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3$, describe the surface $S = \{\mathbf{f}(u,v) : (u,v) \in \mathbb{R}^2\}$ and find a description of $S$ as the locus of an equation $F(x,y,z) = 0$. Find the points where $\partial_u\mathbf{f}$ and $\partial_v\mathbf{f}$ are linearly dependent, and describe the singularities of $S$ (if any) at these points.
a. $\mathbf{f}(u,v) = (2u + v,\ u - v,\ 3u)$.
b. $\mathbf{f}(u,v) = (a\cos v,\ b\sin v,\ u)$ $(a, b > 0)$.
c. $\mathbf{f}(u,v) = (\cos u\cosh v,\ \sin u\cosh v,\ \sinh v)$.
d. $\mathbf{f}(u,v) = (u\cos v,\ u\sin v,\ u^2)$.
2. Find an equation for the tangent plane to the following parametrized surfaces at the point $(1, -2, 1)$. (The first step is to find the values of the parameters $u, v$ that yield this point.)
a. $x = e^{u-v}$, $y = u - 3v$, $z = \frac12(u^2 + v^2)$.
b. $x = 1/(u + v)$, $y = -(u + e^v)$, $z = u^3$.
3. Find a parametrization for each of the following surfaces (perhaps involving an angular variable that is defined only up to multiples of $2\pi$).
a. The surface obtained by revolving the curve $z = f(x)$ $(a < x < b)$ in the $xz$-plane around the $z$-axis, where $a > 0$.
b. The surface obtained by revolving the curve $z = f(x)$ $(a < x < b)$ in the $xz$-plane around the $x$-axis, where $f(x) > 0$.
c. The lower sheet of the hyperboloid $z^2 - 2x^2 - y^2 = 1$.
d. The cylinder $x^2 + z^2 = 9$.
4. Find a parametric description of the following lines:
a. The intersection of the planes $x - 2y + z = 3$ and $2x - y - z = -1$.
b. The intersection of the planes $x + 2y = 3$ and $y - 3z = 2$.
5. Let $S$ be the circle formed by intersecting the plane $x + z = 1$ with the sphere $x^2 + y^2 + z^2 = 1$.
a. Find a parametrization of $S$.
b. Find parametric equations for the tangent line to $S$ at the point $(\frac12, -\frac{1}{\sqrt2}, \frac12)$.
6. Let $S$ be the intersection of the cone $z^2 = x^2 + y^2$ and the plane $z = ax + 1$, where $a \in \mathbb{R}$.
a. Show that $S$ is a circle if $a = 0$, an ellipse if $|a| < 1$, a parabola if $|a| = 1$, and a hyperbola if $|a| > 1$.
b. Find a parametrization for $S$ in the first two cases and for the part of $S$ lying above the $xy$-plane in the third case.
7. Give a precise statement and proof of the analogue of Theorem 3.11 for curves in $\mathbb{R}^3$.
In this section we study smooth mappings from $\mathbb{R}^n$ to itself in more detail, with emphasis on geometric intuition for the cases $n = 2$ and $n = 3$.
Suppose $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ is a map of class $C^1$. We can regard $\mathbf{f}$ as a transformation of $\mathbb{R}^n$, that is, an operation that moves the points in $\mathbb{R}^n$ around in some definite fashion. When $n > 1$, such transformations are usually best pictured with "before and after" sketches. That is, if $\mathbf{x} = \mathbf{f}(\mathbf{u})$, we think of $\mathbf{u}$ and $\mathbf{x}$ as living in two separate copies of $\mathbb{R}^n$. We draw a sketch of $\mathbf{u}$-space with some geometric figures in it, such as a grid of coordinate lines, then draw a sketch of $\mathbf{x}$-space with the images of those figures under the transformation $\mathbf{f}$.
[FIGURE 3.7: The transformation $(x,y) = (u^2 - v^2, 2uv)$, showing the image in the $xy$-plane of the coordinate grid in the half-plane $u > 0$.]
lines $v = \pm u$ when $c \neq 0$ and the union of these two lines when $c = 0$. See Figure 3.8.
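The curves in Figure 3.7 can be identified explicitly: eliminating $v$ from $x = c^2 - v^2$, $y = 2cv$ shows that the image of the grid line $u = c > 0$ lies on the parabola $x = c^2 - y^2/4c^2$, and similarly the image of $v = c \neq 0$ lies on $x = y^2/4c^2 - c^2$. The Python sketch below checks these identities in exact rational arithmetic at sample points; the function names are illustrative.

```python
from fractions import Fraction as Fr

def f(u, v):
    """The transformation (x, y) = (u^2 - v^2, 2uv) of Figure 3.7."""
    return (u * u - v * v, 2 * u * v)

def on_image_of_u_line(c, x, y):
    """Does (x, y) satisfy x = c^2 - y^2 / (4 c^2)?  (image of u = c, c > 0)"""
    return x == c * c - y * y / (4 * c * c)

def on_image_of_v_line(c, x, y):
    """Does (x, y) satisfy x = y^2 / (4 c^2) - c^2?  (image of v = c, c != 0)"""
    return x == y * y / (4 * c * c) - c * c

samples = [Fr(k, 3) for k in range(-6, 7)]
c = Fr(2)
print(all(on_image_of_u_line(c, *f(c, t)) for t in samples))   # True
print(all(on_image_of_v_line(c, *f(t, c)) for t in samples))   # True
```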
We can think of mappings from $\mathbb{R}^3$ to itself pictorially in the same way, though the pictures are harder to draw. Figure 3.9 shows what happens to a cube under the transformation $\mathbf{f}(u,v,w) = (-2u, v, \tfrac12 w)$.
Another common interpretation of a map $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ is as a coordinate system on $\mathbb{R}^n$. For example, we usually think of $\mathbf{f}(r, \theta) = (r\cos\theta, r\sin\theta)$ as representing polar coordinates in the plane. In the preceding discussion we thought in terms of moving the points in $\mathbb{R}^n$ around without changing the labeling system (namely, Cartesian coordinates); here we are thinking of leaving the points alone but giving them different labels (polar rather than Cartesian coordinates). It's just a matter of point of view; the same transformation $\mathbf{f}$ can be interpreted either way.
For example, the systems of parabolas and hyperbolas in Figures 3.7 and 3.8 can
one further requirement that is natural to impose, namely, that the inverse mapping $\mathbf{f}^{-1}: V \to U$ should also be of class $C^1$, so that the correspondence is smooth in both directions. Hence, the question arises: Given a $C^1$ transformation $\mathbf{f}: U \to V$, when does $\mathbf{f}$ possess a $C^1$ inverse $\mathbf{f}^{-1}: V \to U$? That is, when can the equation $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ be solved uniquely for $\mathbf{x}$ as a $C^1$ function of $\mathbf{y}$?
This question is clearly closely related to the ones that led to the implicit function theorem, and indeed, if we restrict attention to the solvability of the equation $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ in a small neighborhood of a point, its answer becomes a special case of that theorem. As we did before, we can guess what the answer should be by looking at the linear approximation. If $\mathbf{f}(\mathbf{a}) = \mathbf{b}$, the linear approximation to the equation $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ at the point $(\mathbf{a}, \mathbf{b})$ is $T(\mathbf{x} - \mathbf{a}) = \mathbf{y} - \mathbf{b}$, where the matrix $T$ is the Fréchet derivative $D\mathbf{f}(\mathbf{a})$, and the latter equation can be solved uniquely for $\mathbf{x}$, for every $\mathbf{y}$, precisely when $T$ is invertible, that is, when the Jacobian $\det D\mathbf{f}(\mathbf{a})$ is nonzero. We are therefore led to the following theorem.
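3.18 Theorem (The Inverse Mapping Theorem). Let $U$ and $V$ be open sets in $\mathbb{R}^n$, $\mathbf{a} \in U$, and $\mathbf{b} = \mathbf{f}(\mathbf{a})$. Suppose that $\mathbf{f}: U \to V$ is a mapping of class $C^1$ and the Fréchet derivative $D\mathbf{f}(\mathbf{a})$ is invertible (that is, the Jacobian $\det D\mathbf{f}(\mathbf{a})$ is nonzero). Then there exist neighborhoods $M \subset U$ and $N \subset V$ of $\mathbf{a}$ and $\mathbf{b}$, respectively, so that $\mathbf{f}$ is a one-to-one map from $M$ onto $N$, and the inverse map $\mathbf{f}^{-1}$ from $N$ to $M$ is also of class $C^1$. Moreover, if $\mathbf{y} = \mathbf{f}(\mathbf{x}) \in N$, then $D(\mathbf{f}^{-1})(\mathbf{y}) = [D\mathbf{f}(\mathbf{x})]^{-1}$.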
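Proof. The existence of the inverse map is equivalent to the unique solvability of the equation $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ for $\mathbf{x}$, where $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{f}(\mathbf{x}) - \mathbf{y}$. Since the derivative of $\mathbf{F}$ as a function of $\mathbf{x}$ is just $D\mathbf{f}(\mathbf{x})$, the implicit function theorem (3.9) guarantees that this unique solvability will hold for $(\mathbf{x}, \mathbf{y})$ near $(\mathbf{a}, \mathbf{b})$ provided that $D\mathbf{f}(\mathbf{a})$ is invertible. (In referring to the statement of the implicit function theorem, however, note that the roles of the variables $\mathbf{x}$ and $\mathbf{y}$ have been reversed here.) Moreover, since $\mathbf{f}^{-1}(\mathbf{f}(\mathbf{x})) = \mathbf{x}$ for $\mathbf{x} \in M$, the chain rule gives $D(\mathbf{f}^{-1})(\mathbf{f}(\mathbf{x})) \cdot D\mathbf{f}(\mathbf{x}) = I$, where $I$ is the $n \times n$ identity matrix; in other words, $D(\mathbf{f}^{-1})(\mathbf{y}) = [D\mathbf{f}(\mathbf{x})]^{-1}$ where $\mathbf{y} = \mathbf{f}(\mathbf{x})$. □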
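Theorem 3.18 asserts only that the local inverse exists; to see it in action one can compute $\mathbf{f}^{-1}$ numerically by Newton's method and compare its finite-difference derivative with $[D\mathbf{f}(\mathbf{x})]^{-1}$. The sketch below assumes the hypothetical map $\mathbf{f}(x,y) = (x + y^3, y + x^3)$, for which $\det D\mathbf{f} = 1 - 9x^2y^2$ is nonzero near $\mathbf{a} = (0.3, 0.2)$. Newton's method and the helper names (`inv2`, `local_inverse`, `column`) are our choices for illustration, not part of the theorem.

```python
def f(p):
    """Hypothetical C^1 map f(x, y) = (x + y^3, y + x^3)."""
    x, y = p
    return (x + y**3, y + x**3)

def Df(p):
    """Its Jacobian matrix; det Df = 1 - 9 x^2 y^2."""
    x, y = p
    return ((1.0, 3 * y * y), (3 * x * x, 1.0))

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("Df is singular at this point")
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def local_inverse(target, start, steps=50, tol=1e-14):
    """Solve f(x) = target for x near start by Newton's method."""
    x = start
    for _ in range(steps):
        fx = f(x)
        residual = (fx[0] - target[0], fx[1] - target[1])
        dx = matvec(inv2(Df(x)), residual)
        x = (x[0] - dx[0], x[1] - dx[1])
        if abs(dx[0]) + abs(dx[1]) < tol:
            break
    return x

a = (0.3, 0.2)
b = f(a)
x = local_inverse(b, (0.0, 0.0))      # recovers a, up to rounding

# Central-difference derivative of the inverse at b, one column at a time,
# to be compared with [Df(a)]^{-1}.
h = 1e-6
def column(k):
    e = (h, 0.0) if k == 0 else (0.0, h)
    plus = local_inverse((b[0] + e[0], b[1] + e[1]), a)
    minus = local_inverse((b[0] - e[0], b[1] - e[1]), a)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

J = inv2(Df(a))
print(column(0), [J[0][0], J[1][0]])
print(column(1), [J[0][1], J[1][1]])
```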
$D\mathbf{f}(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det D\mathbf{f}(r, \theta) = r,$
so $\det D\mathbf{f} \neq 0$ on $U$, but $\mathbf{f}$ is not one-to-one since $\mathbf{f}(r, \theta + 2k\pi) = \mathbf{f}(r, \theta)$. It is, however, locally one-to-one, in that it is one-to-one if one restricts $\theta$ to any interval of length less than $2\pi$. (Notice also that the Jacobian of the polar coordinate map vanishes when $r = 0$. This accords with the fact that the polar coordinate map is not even locally invertible there; the angular coordinate is completely undetermined at the origin.)
The question of global invertibility is a delicate one. Consider the following situation: Let $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ be a map whose component functions are all polynomials, and suppose that the Jacobian $\det D\mathbf{f}$ is identically equal to 1. Is $\mathbf{f}$ globally invertible? The answer is so far unknown; this is a famous unsolved problem, a form of the Jacobian conjecture.
We should also point out that the invertibility of $D\mathbf{f}(\mathbf{a})$ is not necessary for the existence of an inverse map, although it is necessary for the differentiability of that inverse. (Example: Let $f(x) = x^3$. Then $f$ has the global inverse $f^{-1}(y) = y^{1/3}$, but $f(0) = f'(0) = 0$ and $f^{-1}$ is not differentiable at $0$.)
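The failure of differentiability in this example shows up in the difference quotients of $f^{-1}(y) = y^{1/3}$ at $0$, which equal $h^{-2/3}$ and blow up as $h \to 0$. A minimal Python sketch follows; the helper `cbrt` is ours, and it takes the real cube root so that negative inputs also work.

```python
import math

def cbrt(y):
    """Real cube root: the global inverse of f(x) = x^3."""
    return math.copysign(abs(y) ** (1 / 3), y)

# Difference quotients (f^{-1}(h) - f^{-1}(0)) / h = h^(-2/3) grow without bound.
for h in (1e-3, 1e-6, 1e-9):
    print(h, cbrt(h) / h)     # roughly 1e2, 1e4, 1e6
```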
EXERCISES
1. For each of the following transformations $(u,v) = \mathbf{f}(x,y)$, (i) compute the Jacobian $\det D\mathbf{f}$, (ii) draw a sketch of the images of some of the lines $x$ = constant and $y$ = constant in the $uv$-plane, (iii) find formulas for the local inverses of $\mathbf{f}$ when they exist.
a. $u = e^x\cos y$, $v = e^x\sin y$.
b. $u = x^2$, $v = y/x$.
c. $u = x^2 + 2xy + y^2$, $v = 2x + 2y$.
2. Let $(u,v) = \mathbf{f}(x,y) = (x - 2y, 2x - y)$.
a. Compute the inverse transformation $(x,y) = \mathbf{f}^{-1}(u,v)$.
b. Find the image in the $uv$-plane of the triangle bounded by the lines $y = x$, $y = -x$, and $y = 1 - 2x$.
c. Find the region in the $xy$-plane that is mapped to the triangle with vertices $(0,0)$, $(-1,2)$, and $(2,1)$ in the $uv$-plane.
5. Find a one-to-one $C^1$ mapping $\mathbf{f}$ from the first quadrant of the $xy$-plane to the first quadrant of the $uv$-plane such that the region where $x^2 < y < 2x^2$ and $1 < xy < 3$ is mapped to a rectangle. Compute the Jacobian $\det D\mathbf{f}$ and the inverse mapping $\mathbf{f}^{-1}$. (Hint: Map all the regions where $ax^2 < y < bx^2$ and $c < xy < d$ to rectangles.)
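6. Let $\mathbf{f}: \mathbb{R}^3 \to \mathbb{R}^3$ be the spherical coordinate map,
$(x, y, z) = \mathbf{f}(r, \varphi, \theta) = (r\sin\varphi\cos\theta,\ r\sin\varphi\sin\theta,\ r\cos\varphi).$
Thus $r$ is the distance to the origin, $\varphi$ is the co-latitude (the angle from the positive $z$-axis), and $\theta$ is the longitude.
a. Describe the surfaces in $xyz$-space that are the images of the planes $r$ = constant, $\varphi$ = constant, and $\theta$ = constant.
b. Compute the derivative $D\mathbf{f}$ and show that $\det D\mathbf{f}(r, \varphi, \theta) = r^2\sin\varphi$.
c. What is the condition on the point $(r_0, \varphi_0, \theta_0)$ for $\mathbf{f}$ to be locally invertible about this point? What is the corresponding condition on $(x_0, y_0, z_0) = \mathbf{f}(r_0, \varphi_0, \theta_0)$?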
In the implicit function theorem and its applications discussed in the preceding
sections, we have drawn consequences from the nonvanishing of various Jacobians.
In this section we consider the opposite situation, in which a Jacobian vanishes
identically.
For motivation, let us first consider the linear case. Let $A$ be an $n \times n$ matrix, and define $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ by $\mathbf{F}(\mathbf{x}) = A\mathbf{x}$ (where $\mathbf{x}$ is considered as a column vector). If $A$ is nonsingular, $\mathbf{F}$ is a one-to-one map from $\mathbb{R}^n$ onto itself, whose inverse is $\mathbf{F}^{-1}(\mathbf{y}) = A^{-1}\mathbf{y}$. However, if $\det A = 0$, the range of $\mathbf{F}$ (namely, the column space of $A$) is a proper linear subspace of $\mathbb{R}^n$, and the components $(f_1, \ldots, f_n)$ of $\mathbf{F}$ satisfy at least one nontrivial linear relation. More precisely, if the rank of $A$ is $k$, where $k < n$, then the range of $\mathbf{F}$ is a $k$-dimensional subspace of $\mathbb{R}^n$, and the components of $\mathbf{F}$ satisfy $n - k$ independent linear relations (namely, the relations satisfied by the rows of $A$).
Example 1. Let
$A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & -1 & 3 \\ 3 & 1 & 2 \end{pmatrix},$
that is,
$f_1(x,y,z) = x + 2y - z, \qquad f_2(x,y,z) = 2x - y + 3z, \qquad f_3(x,y,z) = 3x + y + 2z.$
It is easily verified that $\det A = 0$, that the first two rows of $A$ are independent, and that the third row is the sum of the first two. This last relation means that the functions $f_1, f_2, f_3$ satisfy the linear relation $f_3 = f_1 + f_2$. Equivalently, the range of $\mathbf{F}$ is the plane defined by the equation $y_3 = y_1 + y_2$.
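The linear relations among the components can be computed mechanically: the coefficient vectors $\mathbf{c}$ with $c_1 f_1 + \cdots + c_n f_n = 0$ are exactly the solutions of $A^T\mathbf{c} = \mathbf{0}$, i.e., the null space of the transpose. The sketch below uses exact arithmetic via Python's `fractions` module (the helper names `rref`, `rank`, `null_space`, and `linear_relations` are illustrative) and applies it to the matrix with rows $(1,2,-1)$, $(2,-1,3)$, and their sum $(3,1,2)$.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M (exact arithmetic) and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]    # scale so the pivot is 1
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [x - factor * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots

def rank(M):
    return len(rref(M)[1])

def null_space(M):
    """A basis of {c : M c = 0}, one vector per free (non-pivot) column."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][free]
        basis.append(v)
    return basis

def linear_relations(A):
    """Coefficient vectors c with sum_i c_i * (row i of A) = 0: the null space of A^T."""
    At = [list(col) for col in zip(*A)]
    return null_space(At)

A = [[1, 2, -1], [2, -1, 3], [3, 1, 2]]
print(rank(A))               # 2
print(linear_relations(A))   # one relation: -f1 - f2 + f3 = 0
```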
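Example 2. Let
$A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & -2 \\ -1 & -2 & 1 \end{pmatrix},$
that is,
$f_1(x,y,z) = x + 2y - z, \qquad f_2(x,y,z) = 2x + 4y - 2z, \qquad f_3(x,y,z) = -x - 2y + z.$
Here the rank of $A$ is 1, and the functions $f_j$ satisfy the relations $f_2 = 2f_1$, $f_3 = -f_1$. The range of $\mathbf{F}$ is the line passing through the origin and the point $(1, 2, -1)$.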
More generally, one can consider linear maps $\mathbf{F}: \mathbb{R}^m \to \mathbb{R}^n$ defined by $n \times m$ matrices $A$. The range of such a map is a linear subspace of $\mathbb{R}^n$ whose dimension is the rank of $A$. It must happen when $n > m$, and may happen when $n \le m$, that this subspace is a proper subspace of $\mathbb{R}^n$, in which case the components of $\mathbf{F}$ satisfy nontrivial linear relations.
Now let us return to the study of more general functions. The appropriate analogue of "linear dependence" for nonlinear functions is "functional dependence," which means that the functions in question satisfy a nontrivial functional relation, in other words, that one of them must be expressible as a function of the others. We shall formulate this idea precisely in a way that is appropriate for $C^1$ functions, although the notion of functional dependence does not really depend on any differentiability conditions.
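Suppose $f_1, \ldots, f_n$ are $C^1$ real-valued functions on an open set $U \subset \mathbb{R}^m$. We say that $f_1, \ldots, f_n$ are functionally dependent on $U$ if there is a $C^1$ function $\Phi: \mathbb{R}^n \to \mathbb{R}$ such that
(3.19) $\Phi(f_1(\mathbf{x}), \ldots, f_n(\mathbf{x})) = 0$ and $\nabla\Phi(f_1(\mathbf{x}), \ldots, f_n(\mathbf{x})) \neq \mathbf{0}$ for all $\mathbf{x} \in U$.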
The nonvanishing of $\nabla\Phi$ guarantees, via the implicit function theorem, that the equation $\Phi = 0$ can be solved locally for one of the variables in terms of the others; in other words, one of the functions $f_j$ can be expressed in terms of the remaining ones.
Geometrically, (3.19) means that the range of the map $\mathbf{f} = (f_1, \ldots, f_n)$ is contained in the hypersurface $\{\mathbf{y} : \Phi(\mathbf{y}) = 0\}$ in $\mathbb{R}^n$, so that it is at most $(n-1)$-dimensional. (It might be even smaller, of course; the functions $f_j$ might satisfy other relations in addition to the equation $\Phi(\mathbf{f}(\mathbf{x})) = 0$.)
Example 3. The functions
$f_1(x,y,z) = x + y + z, \qquad f_2(x,y,z) = xy + xz + yz, \qquad f_3(x,y,z) = x^2 + y^2 + z^2$
are functionally dependent on $\mathbb{R}^3$, for $f_3 = f_1^2 - 2f_2$.
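As a sanity check of Example 3, the following Python sketch verifies the identity $f_3 = f_1^2 - 2f_2$ at a few rational points and confirms that $\det D\mathbf{f}$ vanishes there, as Theorem 3.20 predicts for functionally dependent functions. (Indeed the third row of $D\mathbf{f}$, namely $2(x,y,z)$, equals $2f_1$ times the first row minus twice the second.) The sample points are arbitrary.

```python
from fractions import Fraction

def f(x, y, z):
    """The three functions of Example 3."""
    return (x + y + z, x * y + x * z + y * z, x * x + y * y + z * z)

def Df(x, y, z):
    """Jacobian matrix: row j is the gradient of f_j."""
    return ((1, 1, 1),
            (y + z, x + z, x + y),
            (2 * x, 2 * y, 2 * z))

def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, g), (h, i, k) = m
    return a * (e * k - g * i) - b * (d * k - g * h) + c * (d * i - e * h)

points = [(1, 2, 3), (-4, 0, 7), (Fraction(1, 2), Fraction(-2, 3), 5)]
for p in points:
    f1, f2, f3 = f(*p)
    print(p, f3 == f1**2 - 2 * f2, det3(Df(*p)))   # True and 0 at each point
```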
Example 4. The functions $f_1(x,y) = 3x + 1$, $f_2(x,y) = x^2 - y$ are not functionally dependent on any open set in $\mathbb{R}^2$. Indeed, the transformation $\mathbf{f} = (f_1, f_2)$ is a one-to-one map from $\mathbb{R}^2$ onto itself whose inverse $\mathbf{g} = (g_1, g_2)$ is given by $g_1(u,v) = \frac13(u - 1)$, $g_2(u,v) = \frac19(u - 1)^2 - v$; hence the values of $\mathbf{f}$ are not subject to any restrictions.
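The inverse in Example 4 can be checked by composing in both orders. The sketch below does so in exact rational arithmetic, using $g_1(u,v) = \frac13(u-1)$ and $g_2(u,v) = \frac19(u-1)^2 - v$, which is what one gets by solving $u = 3x + 1$, $v = x^2 - y$ for $x$ and $y$. The sample points are arbitrary.

```python
from fractions import Fraction as Fr

def f(x, y):
    """The map of Example 4."""
    return (3 * x + 1, x * x - y)

def g(u, v):
    """Inverse obtained by solving u = 3x + 1, v = x^2 - y for (x, y)."""
    return ((u - 1) / 3, (u - 1) ** 2 / 9 - v)

print(g(*f(Fr(2, 7), Fr(-5, 3))))   # (Fraction(2, 7), Fraction(-5, 3))
print(f(*g(Fr(4), Fr(9, 2))))       # (Fraction(4, 1), Fraction(9, 2))
```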
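Proof. Functional dependence of the $f_j$'s means that there is a $C^1$ function $\Phi$ such that $\Phi(\mathbf{f}(\mathbf{x})) = 0$ and $\nabla\Phi(\mathbf{f}(\mathbf{x})) \neq \mathbf{0}$ for $\mathbf{x} \in U$. Differentiation of the equation $\Phi(\mathbf{f}(\mathbf{x})) = 0$ with respect to the variables $x_1, \ldots, x_n$ via the chain rule yields
$(\partial_1\Phi)(\partial_1 f_1) + \cdots + (\partial_n\Phi)(\partial_1 f_n) = 0,$
$\qquad\vdots$
$(\partial_1\Phi)(\partial_n f_1) + \cdots + (\partial_n\Phi)(\partial_n f_n) = 0,$
where the derivatives of $\Phi$ are evaluated at $\mathbf{f}(\mathbf{x})$ and the derivatives of the $f_j$'s are evaluated at $\mathbf{x}$. Thus, at each $\mathbf{x} \in U$, the system of equations
$\sum_{k=1}^n \partial_j f_k(\mathbf{x})\, y_k = 0 \qquad (j = 1, \ldots, n)$
has a nonzero solution, namely $\mathbf{y} = \nabla\Phi(\mathbf{f}(\mathbf{x}))$. Therefore, its coefficient matrix $(\partial_j f_k(\mathbf{x}))$, which is nothing but the transpose of $D\mathbf{f}(\mathbf{x})$, must be singular, and hence $\det D\mathbf{f}(\mathbf{x}) = 0$. □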
More interesting is the fact that the converse of this theorem is also true: The vanishing of the Jacobian $\det D\mathbf{f}$ implies the functional dependence of the $f_j$'s. We now present a version of this result with an additional hypothesis (the constancy of the rank of $D\mathbf{f}$) that yields a sharper conclusion. We formulate it so that it also covers the case when the number of functions differs from the number of independent variables.
3.21 Theorem. Let $\mathbf{f} = (f_1, \ldots, f_n)$ be a $C^1$ map from a connected open set $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$. Suppose that the matrix $D\mathbf{f}(\mathbf{x})$ has rank $k$ at every $\mathbf{x} \in U$, where $k < n$. Then every $\mathbf{x}_0 \in U$ has a neighborhood $N$ such that $f_1, \ldots, f_n$ are functionally dependent on $N$ and $\mathbf{f}(N)$ is a smooth $k$-dimensional submanifold of $\mathbb{R}^n$.
3.22 Theorem. Let $\mathbf{f} = (f, g, h)$ be a $C^1$ map from a connected open set $U \subset \mathbb{R}^3$ into $\mathbb{R}^3$. Suppose that the matrix $D\mathbf{f}(\mathbf{x})$ has rank $k$ at every $\mathbf{x} \in U$, where $k = 1$ or $2$. Then every $\mathbf{x}_0 \in U$ has a neighborhood $N$ such that the functions $f, g, h$ are functionally dependent on $N$ and $\mathbf{f}(N)$ is a smooth curve (if $k = 1$) or a smooth surface (if $k = 2$).
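(1,1) entry is nonzero, that is, $\partial_x f(\mathbf{x}_0) \neq 0$. By the implicit function theorem, then, the equation $u = f(x,y,z)$ can be solved near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, to yield $x$ as a function of $y$, $z$, and $u$. Then $v = g(x,y,z)$ and $w = h(x,y,z)$ turn into functions of $y$, $z$, and $u$ also. Implicit differentiation of the equations $u = f(x,y,z)$ and $v = g(x,y,z)$ with respect to $y$ (taking $y$, $z$, and $u$ as the independent variables) yields
$0 = (\partial_x f)(\partial_y x) + \partial_y f, \qquad \partial_y v = (\partial_x g)(\partial_y x) + \partial_y g.$
Solving the first equation for $\partial_y x$ and substituting the result into the second equation then yields
$\partial_y v = \dfrac{(\partial_x f)(\partial_y g) - (\partial_y f)(\partial_x g)}{\partial_x f} = \dfrac{\partial(f,g)/\partial(x,y)}{\partial_x f}.$
But since $D\mathbf{f}$ has rank 1, all of its $2 \times 2$ submatrices are singular; therefore, $\partial(f,g)/\partial(x,y) = 0$ and hence $\partial_y v = 0$. Restricting to a convex neighborhood of $(y_0, z_0, u_0)$, we conclude that $v$ is independent of $y$. For exactly the same reason, $v$ is independent of $z$, and $w$ is independent of $y$ and $z$. That is, $v$ and $w$ are functions of $u$ alone, say $v = \varphi(u)$ and $w = \psi(u)$. This shows that $f, g, h$ are functionally dependent, since $g(\mathbf{x}) = \varphi(f(\mathbf{x}))$ and $h(\mathbf{x}) = \psi(f(\mathbf{x}))$, and that the image of a neighborhood of $\mathbf{x}_0$ under $\mathbf{f}$ is the locus of the equations $v = \varphi(u)$, $w = \psi(u)$, which is a smooth curve.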
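Now let us turn to the case $k = 2$. Here some $2 \times 2$ submatrix of $D\mathbf{f}(\mathbf{x}_0)$ is nonsingular; by relabeling the functions and variables, we can assume that it is the one in the upper left corner, so that $\partial(f,g)/\partial(x,y)$ is nonzero at $\mathbf{x}_0$. By the implicit function theorem, the equations $u = f(x,y,z)$ and $v = g(x,y,z)$ can be solved near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, $v = v_0 = g(\mathbf{x}_0)$, to yield $x$ and $y$ as functions of $u$, $v$, and $z$. Taking $u$, $v$, and $z$ as the independent variables, then, we differentiate the equations $u = f(x,y,z)$, $v = g(x,y,z)$, and $w = h(x,y,z)$ implicitly with respect to $z$ to obtain
$0 = (\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) + \partial_z f,$
$0 = (\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) + \partial_z g,$
$\partial_z w = (\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) + \partial_z h,$
or
$(\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) = -\partial_z f,$
$(\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) = -\partial_z g,$
$(\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) - \partial_z w = -\partial_z h.$
These equations may be solved simultaneously for $\partial_z x$, $\partial_z y$, and $\partial_z w$. By Cramer's rule (Appendix A, (A.54)),
$\partial_z w = \dfrac{\partial(f,g,h)/\partial(x,y,z)}{\partial(f,g)/\partial(x,y)} = \dfrac{\det D\mathbf{f}}{\partial(f,g)/\partial(x,y)}.$
Since $D\mathbf{f}$ has rank 2, $\det D\mathbf{f} = 0$, and hence $\partial_z w = 0$.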
We conclude with a few words about the assumption that the rank of $D\mathbf{f}$ is constant. Suppose that $A(\mathbf{x})$ is a matrix whose entries depend continuously on $\mathbf{x} \in U$ ($U$ an open subset of $\mathbb{R}^m$), and the rank of $A(\mathbf{x}_0)$ is $k$. Since a set of linearly independent vectors remains linearly independent if the vectors are perturbed slightly, the rank of $A(\mathbf{x})$ is at least $k$ when $\mathbf{x}$ is sufficiently close to $\mathbf{x}_0$. In other words, for each $k$ the set $\{\mathbf{x} \in U : \operatorname{rank}(A(\mathbf{x})) \ge k\}$ is open. In particular, if $k_0$ is the maximum rank of $A(\mathbf{x})$ as $\mathbf{x}$ ranges over $U$, then $\{\mathbf{x} \in U : \operatorname{rank}(A(\mathbf{x})) = k_0\}$ is open.
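A simple illustration of why constant rank is a genuine assumption: for the hypothetical matrix function $A(x) = \begin{pmatrix} x & 0 \\ 0 & 1 \end{pmatrix}$ on $\mathbb{R}$, the rank is 2 for $x \neq 0$ and 1 at $x = 0$. The set where the rank is at least 2 is open, as the argument above shows, but the set where the rank equals 1 is the single point $\{0\}$, which is not open. The sketch computes these ranks exactly; `A` and `rank2x2` are names we introduce.

```python
from fractions import Fraction

def A(x):
    """Hypothetical continuous matrix function with a rank drop at x = 0."""
    return ((Fraction(x), Fraction(0)), (Fraction(0), Fraction(1)))

def rank2x2(m):
    """Exact rank of a 2x2 matrix."""
    (a, b), (c, d) = m
    if a * d - b * c != 0:
        return 2
    if any(e != 0 for row in m for e in row):
        return 1
    return 0

print([rank2x2(A(x)) for x in (-1, 0, Fraction(1, 10**9))])  # [2, 1, 2]
```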
EXERCISES
1. For each of the following maps $\mathbf{f} = (f, g, h)$, determine whether $f, g, h$ are functionally dependent on some open set $U \subset \mathbb{R}^3$ by examining the Jacobian
INTEGRAL CALCULUS
In this chapter we study the integration of functions of one and several real vari
ables. As we assume that the reader is already familiar with the standard techniques
of integration for functions of one variable, our discussion of integration on the line
is limited to theoretical issues. On the other hand, some of these issues arise also in
higher dimensions, and we shall sometimes invoke the careful treatment of the one-
variable case as an excuse for being somewhat sketchy in developing the theory for
several variables.
In elementary calculus, the term "integral" can refer either to the antiderivative of a function $f$ or to a limit of sums of the form $\sum f(x_j)\,\Delta x_j$; one speaks of indefinite or definite integrals. At the more advanced level, and in particular in this book, "integral" almost always carries the latter meaning. The notion of integration as a sophisticated form of summation is one of the truly fundamental ideas of mathematical analysis, and it arises in many contexts where the connection with differentiation is tenuous or nonexistent.
Recall that for a nonnegative function $f$, the basic geometric interpretation of the integral $\int_a^b f(x)\,dx$ is as the area of the region between the graph of $f$ and the $x$-axis over the interval $[a,b]$. The idea for computing this area is to subdivide the interval $[a,b]$ into small subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{J-1}, x_J]$, with $x_0 = a$ and $x_J = b$, and to approximate the region under the graph of $f$ by a union of rectangles based on the intervals $[x_{j-1}, x_j]$. If we choose the height $h_j$ of the $j$th rectangle to be at most (resp. at least) all the values of $f$ on the interval $[x_{j-1}, x_j]$, the corresponding sum $\sum_1^J h_j(x_j - x_{j-1})$ will be a lower (resp. upper) bound for the area under the graph of $f$. If all goes well, these lower and upper approximations will approach each other as we subdivide the interval $[a,b]$ into smaller and smaller pieces, and their common limit will be the integral of $f$.
Let us make this more precise, introducing some useful definitions along the way. A partition $P$ of the interval $[a,b]$ is a subdivision of $[a,b]$ into nonoverlapping subintervals, specified by giving the subdivision points $x_1, \ldots, x_{J-1}$ along with the endpoints $x_0 = a$ and $x_J = b$. In symbols, we shall write
$P = \{x_0, x_1, \ldots, x_J\}, \qquad a = x_0 < x_1 < \cdots < x_J = b.$
If $P$ and $P'$ are partitions of $[a,b]$, we say that $P'$ is a refinement of $P$ if $P'$ is obtained from $P$ by adding in more subdivision points, that is, if $P \subset P'$.
Observe that if $P$ and $Q$ are any two partitions of $[a,b]$, they can be combined into a single partition $P \cup Q$ whose subdivision points are those of $P$ together with those of $Q$; $P \cup Q$ is a refinement of both $P$ and $Q$.
Now let $f$ be a bounded real-valued function on $[a,b]$. (We make no continuity assumptions on $f$ at this point.) Given a partition $P = \{x_0, \ldots, x_J\}$ of $[a,b]$, for $1 \le j \le J$ we set
(4.1) $m_j = \inf\{f(x) : x_{j-1} \le x \le x_j\}, \qquad M_j = \sup\{f(x) : x_{j-1} \le x \le x_j\}.$
(If $f$ is continuous, $m_j$ and $M_j$ are just the minimum and maximum values of $f$ on $[x_{j-1}, x_j]$, which exist by the extreme value theorem.) We then define the lower Riemann sum $s_P f$ and the upper Riemann sum $S_P f$ corresponding to the partition $P$ by
$s_P f = \sum_{j=1}^J m_j (x_j - x_{j-1}), \qquad S_P f = \sum_{j=1}^J M_j (x_j - x_{j-1}).$
See Figure 4.1, where the lower and upper Riemann sums are the sums of the areas
of the rectangles, an area being counted as negative if the rectangle is below the
x-axis.
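Computing $m_j$ and $M_j$ in general requires knowing infima and suprema, but for an increasing function they are simply $f(x_{j-1})$ and $f(x_j)$. Under that assumption, the following Python sketch (exact arithmetic; the function names are ours) computes $s_P f$ and $S_P f$ for $f(x) = x^2$ on $[0,1]$ and forms the common refinement $P \cup Q$ of two partitions, each represented as a sorted list of points.

```python
from fractions import Fraction

def uniform_partition(a, b, J):
    """The points a = x_0 < x_1 < ... < x_J = b, equally spaced."""
    return [a + Fraction(b - a) * j / J for j in range(J + 1)]

def common_refinement(P, Q):
    """P union Q: the subdivision points of both partitions."""
    return sorted(set(P) | set(Q))

def lower_upper_sums(f, P):
    """(s_P f, S_P f) for an INCREASING f, where m_j = f(x_{j-1}) and M_j = f(x_j)."""
    s = sum(f(P[j - 1]) * (P[j] - P[j - 1]) for j in range(1, len(P)))
    S = sum(f(P[j]) * (P[j] - P[j - 1]) for j in range(1, len(P)))
    return s, S

f = lambda x: x * x          # increasing on [0, 1]
P = uniform_partition(0, 1, 4)
Q = uniform_partition(0, 1, 3)
R = common_refinement(P, Q)
print(lower_upper_sums(f, P))   # (Fraction(7, 32), Fraction(15, 32))
print(lower_upper_sums(f, R))
```

Comparing the sums for $P$, $Q$, and $P \cup Q$ shows the refined lower sum moving up and the refined upper sum moving down.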
If $m$ and $M$ are the infimum and supremum of the values of $f$ over the whole interval $[a,b]$, we clearly have $m_j \ge m$ and $M_j \le M$ for all $j$, and hence
$m(b-a) = \sum_{j=1}^J m(x_j - x_{j-1}) \le s_P f \le S_P f \le \sum_{j=1}^J M(x_j - x_{j-1}) = M(b-a).$
The same argument shows that if one of the subintervals $[x_{j-1}, x_j]$ is subdivided further, the lower sum $s_P f$ can only increase (or stay the same) while the upper sum $S_P f$ can only decrease (or stay the same). In short:
4.3 Lemma. If $P'$ is a refinement of $P$, then $s_{P'} f \ge s_P f$ and $S_{P'} f \le S_P f$.
An immediate consequence of this (apply Lemma 4.3 to the common refinement $P \cup Q$) is that any lower Riemann sum for $f$ is less than or equal to any upper Riemann sum for $f$:
4.4 Lemma. If $P$ and $Q$ are any partitions of $[a,b]$, then $s_P f \le S_Q f$.
$\underline{I}_a^b(f) = \sup_P s_P f, \qquad \overline{I}_a^b(f) = \inf_P S_P f,$
the supremum and infimum being taken over all partitions $P$ of $[a,b]$. By Lemma 4.4, we have $\underline{I}_a^b(f) \le \overline{I}_a^b(f)$. If the upper and lower integrals coincide, $f$ is called Riemann integrable on $[a,b]$, and the common value of the upper and lower integrals is the Riemann integral $\int_a^b f(x)\,dx$. We shall generally omit the eponym "Riemann," as the Riemann integral is the only one we shall use in this book; the name is significant not only for historical reasons but also because it distinguishes the Riemann integral from the more sophisticated Lebesgue integral.
At first sight it would seem difficult to determine whether a function $f$ is integrable and to evaluate its integral, as the definitions involve all possible partitions of $[a,b]$. The following lemma is the key to making these calculations more manageable.
4.5 Lemma. If $f$ is a bounded function on $[a,b]$, the following conditions are equivalent:
a. $f$ is integrable on $[a,b]$.
b. For every $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $S_P f - s_P f < \varepsilon$.
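Proof. If $S_P f - s_P f < \varepsilon$ for some partition $P$, then $\overline{I}_a^b(f) - \underline{I}_a^b(f) < \varepsilon$; if this holds for every $\varepsilon > 0$, it follows that $\overline{I}_a^b(f) = \underline{I}_a^b(f)$, i.e., $f$ is integrable. Conversely, if $f$ is a bounded function and $\varepsilon$ is positive, we can find partitions $Q$ and $Q'$ of $[a,b]$ such that $S_Q f < \overline{I}_a^b(f) + \tfrac12\varepsilon$ and $s_{Q'} f > \underline{I}_a^b(f) - \tfrac12\varepsilon$. Thus, if $f$ is integrable, we have $S_Q f - s_{Q'} f < \varepsilon$. Let $P = Q \cup Q'$; then by Lemma 4.3, $s_{Q'} f \le s_P f \le S_P f \le S_Q f$, so $S_P f - s_P f \le S_Q f - s_{Q'} f < \varepsilon$. □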
The condition (b) in Lemma 4.5 not only gives a workable criterion for integrability but also gives us some leverage for computing the integral of an integrable function $f$. Indeed, for any partition $P$ we have
$s_P f \le \int_a^b f(x)\,dx \le S_P f,$
so if $S_P f - s_P f < \varepsilon$, then $S_P f$ and $s_P f$ are both within $\varepsilon$ of $\int_a^b f(x)\,dx$. The latter quantity is therefore the limit of the sums $S_P f$ or $s_P f$ as $P$ runs through any sequence of partitions such that $S_P f - s_P f \to 0$.
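This suggests a simple procedure: refine until $S_P f - s_P f < \varepsilon$, and the integral is then trapped between the two sums. The sketch below again assumes $f$ is increasing (so the sums are computable from endpoint values) and uses uniform partitions, doubling the number $J$ of subintervals; for increasing $f$ the gap telescopes to $(f(b) - f(a))(b - a)/J$, so the loop terminates. The name `integrate_increasing` is ours.

```python
from fractions import Fraction

def integrate_increasing(f, a, b, eps):
    """Return (s_P f, S_P f) with S_P f - s_P f < eps for an increasing f on [a, b].

    Uses uniform partitions with J = 1, 2, 4, ... subintervals; the integral
    lies between the two returned sums.
    """
    J = 1
    while True:
        P = [a + Fraction(b - a) * j / J for j in range(J + 1)]
        s = sum(f(P[j - 1]) * (P[j] - P[j - 1]) for j in range(1, J + 1))
        S = sum(f(P[j]) * (P[j] - P[j - 1]) for j in range(1, J + 1))
        if S - s < eps:
            return s, S
        J *= 2

s, S = integrate_increasing(lambda x: x * x, 0, 1, Fraction(1, 1000))
print(float(s), float(S))   # the exact integral 1/3 lies between these
```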
We next present the fundamental additivity properties of the integral, which are easy but not quite trivial consequences of the definitions:
4.6 Theorem.
a. Suppose $a < b < c$. If $f$ is integrable on $[a,b]$ and on $[b,c]$, then $f$ is integrable on $[a,c]$, and
(4.7) $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$
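Proof. (a) Given $\varepsilon > 0$, let $P$ and $Q$ be partitions of $[a,b]$ and $[b,c]$, respectively, such that $S_P f - s_P f < \tfrac12\varepsilon$ and $S_Q f - s_Q f < \tfrac12\varepsilon$. Then $P \cup Q$ is a partition of $[a,c]$ and
$s_{P \cup Q} f = s_P f + s_Q f, \qquad S_{P \cup Q} f = S_P f + S_Q f.$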
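It follows that $S_{P \cup Q} f - s_{P \cup Q} f < \varepsilon$, so that $f$ is integrable on $[a,c]$ by Lemma 4.5. Moreover, $\int_a^c f(x)\,dx$ is within $\varepsilon$ of $S_{P \cup Q} f$, and $\int_a^b f(x)\,dx$ and $\int_b^c f(x)\,dx$ are within $\tfrac12\varepsilon$ of $S_P f$ and $S_Q f$, respectively, so $\int_a^c f(x)\,dx$ is within $2\varepsilon$ of $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx$. Since $\varepsilon$ is arbitrary, (4.7) follows.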
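(b) Given $\varepsilon > 0$, choose partitions $P$ and $Q$ of $[a,b]$ such that $S_P f - s_P f < \tfrac12\varepsilon$ and $S_Q g - s_Q g < \tfrac12\varepsilon$, and let $R = P \cup Q$ be the common refinement of $P$ and $Q$. Then by Lemma 4.3 we have $S_R f - s_R f \le S_P f - s_P f$ and $S_R g - s_R g \le S_Q g - s_Q g$. Moreover, the supremum of the sum of two functions is at most the sum of the suprema, and the infimum of the sum is at least the sum of the infima, so
$s_R f + s_R g \le s_R(f + g) \le S_R(f + g) \le S_R f + S_R g.$
Hence, $S_R(f + g) - s_R(f + g) \le (S_R f - s_R f) + (S_R g - s_R g) < \varepsilon$, so $f + g$ is integrable by Lemma 4.5.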
$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx;$
then (4.7) holds no matter how the points $a$, $b$, $c$ are ordered.
The following theorem lists some more standard properties of integrals. They
are all quite easy to derive from the definitions with the help of Lemma 4.5, and we
leave their proofs as Exercises 2-5.
The next criterion for integrability is the one that is most commonly stated in
calculus books. Its proof, however, is frequently omitted because it relies on the
notion of uniform continuity that we studied in §1.8.
Proof. First, $f$ is bounded on $[a,b]$ by Theorem 1.23, so the upper and lower Riemann sums for any partition exist. By Theorem 1.33, $f$ is uniformly continuous on $[a,b]$; thus, given $\varepsilon > 0$, we can find $\delta > 0$ so that $|f(x) - f(y)| < \varepsilon/(b - a)$ whenever $x, y \in [a,b]$ and $|x - y| < \delta$. Let $P$ be any partition of $[a,b]$ whose subintervals $[x_{j-1}, x_j]$ all have length less than $\delta$. Then $|f(x) - f(y)| < \varepsilon/(b - a)$ whenever $x$ and $y$ both lie in the same subinterval, and in particular the maximum and minimum values of $f$ on that subinterval differ by less than $\varepsilon/(b - a)$. But this means that
$S_P f - s_P f = \sum_{j=1}^J (M_j - m_j)(x_j - x_{j-1}) < \dfrac{\varepsilon}{b - a}\sum_{j=1}^J (x_j - x_{j-1}) = \varepsilon,$
so $f$ is integrable by Lemma 4.5. □
4.12 Theorem. If $f$ is bounded on $[a,b]$ and continuous at all except finitely many points in $[a,b]$, then $f$ is integrable on $[a,b]$.
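Proof. Let $y_1, \ldots, y_L$ be the points in $[a,b]$ where $f$ is discontinuous, and let $m$ and $M$ be the infimum and supremum of $\{f(x) : a \le x \le b\}$, the set of values of $f$ on $[a,b]$. Given $\delta > 0$, let $I_l$ be the interval of length $\delta$ centered at $y_l$, intersected with $[a,b]$ ($l = 1, \ldots, L$), and let
$U = \bigcup_{l=1}^L I_l, \qquad V = [a,b] \setminus U^{\mathrm{int}},$
where $U^{\mathrm{int}}$ denotes the interior of $U$.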
where Spf (resp. Sp'f) is the sum of the terms Mj(xj — xx-\) in Spf for which
the interval [xj-i,Xj] is contained in U (resp. V), and likewise for sp'f and Sp/.
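Now, let $\epsilon > 0$ be given. Since $f$ is continuous on each of the closed intervals that constitute $V$, Theorem 4.11 shows that we can make
$$S_P^Vf - s_P^Vf < \tfrac12\epsilon,$$
while
$$S_P^Uf - s_P^Uf = \sum_{[x_{j-1}, x_j] \subset U} (M_j - m_j)(x_j - x_{j-1}) \le (M - m)L\delta,$$
and we can make this less than $\frac12\epsilon$ by taking $\delta < \epsilon/[2L(M - m)]$. In short, for a suitably chosen $P$ we have $S_Pf - s_Pf < \epsilon$, so $f$ is integrable by Lemma 4.5. □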
The preceding argument actually proves more than is stated in Theorem 4.12. It is not necessary that the set of discontinuities of $f$ be finite, only that it can be covered by finitely many intervals $I_1, \dots, I_L$ whose total length is as small as we please. Certain infinite sets, such as convergent sequences, also have this property (Exercise 6). We make it into a formal definition: A set $Z \subset \mathbb{R}$ is said to have zero content if for any $\epsilon > 0$ there is a finite collection of intervals $I_1, \dots, I_L$ such that (i) $Z \subset \bigcup_1^L I_l$, and (ii) the sum of the lengths of the $I_l$'s is less than $\epsilon$. The proof of Theorem 4.12 now yields the following result:
4.13 Theorem. If $f$ is bounded on $[a, b]$ and the set of points in $[a, b]$ at which $f$ is discontinuous has zero content, then $f$ is integrable on $[a, b]$.
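As an illustration of the definition of zero content (for one particular sequence, not the general case of Exercise 6), the Python sketch below covers the points $x_k = 1/k$ by finitely many intervals of total length less than $\epsilon$: one short interval $[0, \epsilon/4]$ swallows the whole tail of the sequence, and each of the finitely many remaining points gets a tiny interval of its own. The helper names are hypothetical.

```python
import math

def cover_reciprocals(eps):
    """Closed intervals covering {1/k : k = 1, 2, ...} with total length < eps."""
    # Choose N with 1/N < eps/4; then every 1/k with k >= N lies in [0, eps/4].
    N = math.floor(4 / eps) + 1
    intervals = [(0.0, eps / 4)]
    # Cover each of the finitely many points 1, 1/2, ..., 1/(N-1) by an interval
    # of length 2r, where r = eps/(4N); together these have length < eps/2.
    r = eps / (4 * N)
    intervals += [(1 / k - r, 1 / k + r) for k in range(1, N)]
    return intervals

def total_length(intervals):
    return sum(hi - lo for lo, hi in intervals)

cover = cover_reciprocals(0.01)
print(len(cover), total_length(cover))
```

The total length is $\epsilon/4 + (N-1)\epsilon/(2N) < \frac34\epsilon$, no matter how small $\epsilon$ is.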
Theorem 4.13 is only a technical refinement of Theorem 4.12, and the reader
should not attach undue importance to it.1 We mention it because its analogue in
higher dimensions does play a significant role in the theory, as we shall see. We
also remark that neither of Theorems 4.10 and 4.13 includes the other; the set of
discontinuities of a monotone function need not have zero content, and there are
continuous functions that are not monotone on any interval.
If $f$ is an integrable function on $[a, b]$, the value of $\int_a^b f(x)\,dx$ is somewhat insensitive to the values of $f$ at individual points, in the following sense:
4.14 Proposition. Suppose $f$ and $g$ are integrable on $[a, b]$ and $f(x) = g(x)$ for all except finitely many points $x \in [a, b]$. Then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.
1 It does, however, point the way toward a necessary and sufficient condition for a function to be
integrable, which we shall describe at the end of §4.8.
4.1. Integration on the Line 155
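Proof. First suppose $g$ is identically zero. That is, we are assuming that $f(x) = 0$ for all $x \in [a, b]$ except for finitely many points $y_1, \dots, y_L$. Let $P_k$ be the partition of $[a, b]$ into $k$ equal subintervals, and take $k$ large enough so that the points $y_l$ all lie in different subintervals. Then
$$0 \le S_{P_k}f \le \frac{2(b-a)}{k}\sum_{l=1}^L \max(f(y_l), 0), \qquad 0 \ge s_{P_k}f \ge \frac{2(b-a)}{k}\sum_{l=1}^L \min(f(y_l), 0),$$
since each $y_l$ lies in at most two of the subintervals, each of length $(b-a)/k$. Both these quantities tend to zero as $k \to \infty$, and hence $\int_a^b f(x)\,dx = 0$.
The general case follows by applying this argument to the difference $f - g$. □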
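A small exact computation makes the estimate in this proof concrete. In the Python sketch below (the function is a hypothetical example, not from the text), $f$ vanishes on $[0, 1]$ except that $f(1/3) = 5$, and the upper and lower sums for $P_k$ are computed with exact rational arithmetic to avoid rounding questions.

```python
from fractions import Fraction

Y, VALUE = Fraction(1, 3), 5   # f(x) = 0 on [0, 1] except f(1/3) = 5

def upper_lower_sums(k):
    """Exact S_{P_k} f and s_{P_k} f for P_k = k equal subintervals of [0, 1]."""
    upper = lower = Fraction(0)
    for j in range(1, k + 1):
        left, right = Fraction(j - 1, k), Fraction(j, k)
        # sup of f on [left, right]: the point 1/3 matters only if it lies there.
        sup = VALUE if left <= Y <= right else 0
        # inf of f is 0, since every subinterval contains points where f = 0.
        inf = 0
        upper += sup * (right - left)
        lower += inf * (right - left)
    return upper, lower

for k in (10, 9, 1000):
    print(k, upper_lower_sums(k))
```

For $k = 9$ the upper sum is $10/9$ rather than $5/9$, because $1/3 = 3/9$ is a subdivision point and so lies in two subintervals; in every case the sums tend to $0$ as $k \to \infty$.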
The main use of Proposition 4.14 is in the context of functions with finitely many discontinuities, as in Theorem 4.12. For such a function $f$ there is often no "right" way to define $f$ at the points where it is discontinuous. Proposition 4.14 assures us that this problem is of no consequence as far as integration is concerned; we may define $f$ at these points however we like, or indeed leave $f$ undefined there, without any effect on $\int_a^b f(x)\,dx$.
Next, we present a general version of the fundamental theorem of calculus. Its
two parts say in effect that differentiating an integral or integrating a derivative
leads back to the original function.
156 Chapter 4. Integral Calculus
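we have
$$\frac{F(y) - F(x)}{y - x} - f(x) = \frac{1}{y - x}\int_x^y [f(t) - f(x)]\,dt.$$
Hence, if $|y - x| < \delta$, we have $|f(t) - f(x)| < \epsilon$ for all $t$ between $y$ and $x$, so
$$\left|\frac{F(y) - F(x)}{y - x} - f(x)\right| \le \frac{1}{|y - x|}\left|\int_x^y \epsilon\,dt\right| = \epsilon.$$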
Since $f$ is integrable, we can make $s_Pf$ and $S_Pf$ as close to $\int_a^b f(x)\,dx$ as we like by choosing $P$ suitably, and the desired result follows immediately. □
$$\sum_{j=1}^J f(t_j)(x_j - x_{j-1})$$
is called a Riemann sum for $f$ associated to the partition $P$. Clearly, if $m_j$ and $M_j$ are as in (4.1) we have $m_j \le f(t_j) \le M_j$, so that
$$s_Pf \le \sum_{j=1}^J f(t_j)(x_j - x_{j-1}) \le S_Pf.$$
Thus, if $f$ is integrable and we choose the partition $P$ so that $s_Pf$ and $S_Pf$ are good approximations to $\int_a^b f(x)\,dx$, all the Riemann sums corresponding to $P$ will also be good approximations to $\int_a^b f(x)\,dx$.
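For an increasing function the inequality $s_Pf \le \sum f(t_j)(x_j - x_{j-1}) \le S_Pf$ is easy to check, because then $m_j = f(x_{j-1})$ and $M_j = f(x_j)$. The Python sketch below assumes $f(x) = e^x$ on $[0, 1]$ with a uniform partition into 20 pieces; the tag choices (midpoint and seeded random) are illustrative.

```python
import math
import random

def riemann_sum(f, xs, tags):
    """Sum of f(t_j)(x_j - x_{j-1}) for partition points xs and tags t_j."""
    return sum(f(t) * (b - a) for a, b, t in zip(xs, xs[1:], tags))

f = math.exp                      # increasing, so m_j = f(x_{j-1}) and M_j = f(x_j)
k = 20
xs = [j / k for j in range(k + 1)]
lower = riemann_sum(f, xs, xs[:-1])   # s_P f: tags at left endpoints
upper = riemann_sum(f, xs, xs[1:])    # S_P f: tags at right endpoints

rng = random.Random(0)
tag_choices = {
    "midpoint": [(a + b) / 2 for a, b in zip(xs, xs[1:])],
    "random": [rng.uniform(a, b) for a, b in zip(xs, xs[1:])],
}
for name, tags in tag_choices.items():
    r = riemann_sum(f, xs, tags)
    print(f"{name}: {lower:.6f} <= {r:.6f} <= {upper:.6f}")
```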
One last question should be addressed: Given an integrable function $f$ on $[a, b]$, for which partitions $P$ do the sums $s_Pf$ and $S_Pf$ furnish a good approximation to $\int_a^b f(x)\,dx$? It might seem that the answer would depend strongly on the nature of the function $f$, but in fact, any partition whose subintervals are sufficiently small will do the job. More precisely:
4.16 Proposition. Suppose $f$ is integrable on $[a, b]$. Given $\epsilon > 0$, there exists $\delta > 0$ such that if $P = \{x_0, \dots, x_J\}$ is any partition of $[a, b]$ satisfying
$$\max_{1 \le j \le J}(x_j - x_{j-1}) < \delta,$$
Proposition 4.16 shows, in particular, that one can always compute $\int_a^b f(x)\,dx$ as the limit as $k \to \infty$ of $s_{P_k}f$ or $S_{P_k}f$, where $P_k$ is the partition of $[a, b]$ into $k$ equal subintervals.
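For an assumed example, $f(x) = x^2$ on $[0, 1]$, the sums for $P_k$ can be computed exactly and seen to squeeze $\int_0^1 x^2\,dx = 1/3$. Since $f$ is increasing there, $m_j$ and $M_j$ are its values at the left and right endpoints. The Python sketch uses exact fractions.

```python
from fractions import Fraction

def lower_upper(k):
    """Exact (s_{P_k} f, S_{P_k} f) for f(x) = x**2 on [0, 1]."""
    h = Fraction(1, k)
    lower = sum((j * h) ** 2 * h for j in range(k))          # left endpoints
    upper = sum((j * h) ** 2 * h for j in range(1, k + 1))   # right endpoints
    return lower, upper

for k in (10, 100, 1000):
    s, S = lower_upper(k)
    print(k, float(s), float(S), float(S - s))
```

Here $S_{P_k}f - s_{P_k}f = 1/k$ exactly, since the two sums differ only in the terms $j = k$ and $j = 0$.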
One final remark: The definite integral, which is defined as a limit of Riemann
sums, may be considered on the intuitive level as a sum of infinitely many infinitesimal
terms. This notion, which is probably quite obvious to the alert reader, is often
not stated explicitly in mathematics texts because of its lack of rigorous meaning.
But the fact is that in many situations — and we shall encounter several of them
later on — the interpretation of the integral as a sum of infinitesimals is the clearest
way to understand what is going on.
EXERCISES
2. Prove Theorem 4.9a. (Hint: Show that $s_P(cf) = c\,s_Pf$ and $S_P(cf) = c\,S_Pf$ if $c > 0$, and $s_P(cf) = c\,S_Pf$ and $S_P(cf) = c\,s_Pf$ if $c < 0$.)
3. Prove Theorem 4.9b. (Hint: Consider partitions of $[a, b]$ for which $c$ and $d$ are among the subdivision points.)
5. Prove Theorem 4.9d. (Hint: To prove that $|f|$ is integrable, show that $S_P|f| - s_P|f| \le S_Pf - s_Pf$. For the inequality $\left|\int f\right| \le \int |f|$, observe that $\pm f \le |f|$ and use Theorem 4.9c.)
6. Let $\{x_k\}$ be a convergent sequence in $\mathbb{R}$. Show that the set $\{x_1, x_2, \dots\}$ has zero content.
7. Let $f$ be an integrable function on $[a, b]$. Suppose that $f(x) \ge 0$ for all $x$ and there is at least one point $x_0 \in [a, b]$ at which $f$ is continuous and strictly positive. Show that $\int_a^b f(x)\,dx > 0$.
8. Let $f$ be an integrable function on $[a, b]$. Prove the following formulas directly from the definitions:
a. For any $c > 0$, $\int_a^b f(x)\,dx = c\int_{a/c}^{b/c} f(cx)\,dx$.
b. $\int_a^b f(x)\,dx = \int_{-b}^{-a} f(-x)\,dx$.
c. For any $c \in \mathbb{R}$, $\int_a^b f(x)\,dx = \int_{a-c}^{b-c} f(x + c)\,dx$.
9. Suppose $g$ and $h$ are continuous functions on $[a, b]$, and $f$ is a continuous function on $\mathbb{R}^2$. Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that if $P = \{x_0, \dots, x_J\}$ is any partition of $[a, b]$ satisfying $\max_{1 \le j \le J}(x_j - x_{j-1}) < \delta$, then
$$\left|\int_a^b f(g(x), h(x))\,dx - \sum_{j=1}^J f(g(x_j'), h(x_j''))(x_j - x_{j-1})\right| < \epsilon$$
for any choice of $x_j', x_j''$ in the interval $[x_{j-1}, x_j]$. (The point is that $x_j'$ and $x_j''$ need not be equal, so the sum in this inequality may not be a genuine Riemann sum for the integral.)
In this section we develop the theory of multiple integrals. The basic ideas are much
the same as for single integrals; the most serious complication comes from the
greater variety of regions over which integration is to be performed. To minimize
the complexity of the notation, we first develop the two-dimensional case and then
sketch the extension to higher dimensions.
Here and in what follows we shall employ the following notation. If $S$ and $T$ are sets, their Cartesian product $S \times T$ is the set of all ordered pairs $(s, t)$ with $s \in S$ and $t \in T$. For example, the plane is the Cartesian product of the line with itself: $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$. This idea extends in the obvious way to products of $n$ sets, with ordered $n$-tuples replacing ordered pairs; for example, $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$. We can also think of $\mathbb{R}^3$ as $\mathbb{R}^2 \times \mathbb{R}$ or as $\mathbb{R} \times \mathbb{R}^2$.
(Thus, a "rectangle" in this sense is always closed, and its sides are always parallel
to the coordinate axes.) A partition of R is a subdivision of R into subrectangles
obtained by partitioning both sides of R. Thus, a partition P is specified by its
subdivision points,
with area
$$\Delta A_{jk} = (x_j - x_{j-1})(y_k - y_{k-1}).$$
the supremum and infimum being taken over all partitions $P$ of $R$. If the lower and upper integrals coincide, $f$ is called (Riemann) integrable on $R$, and the common value of the upper and lower integrals is called the (Riemann) integral of $f$ over $R$ and is denoted by
$$\iint_R f\,dA \quad\text{or}\quad \iint_R f(x, y)\,dx\,dy.$$
easily be adapted to the present situation. However, we have not yet built a satisfactory definition of two-dimensional integrals, because we often wish to integrate functions over regions other than rectangles. The solution to this problem is simple, in principle: To integrate a function $f$ over a bounded region $S \subset \mathbb{R}^2$, we draw a large rectangle $R$ that contains $S$, (re)define $f$ to be zero outside of $S$, and integrate the resulting function over $R$.
To express this neatly, it is convenient to introduce another definition. If $S$ is a subset of $\mathbb{R}^2$ (or $\mathbb{R}^n$, or indeed any set), the characteristic function or indicator function of $S$ is the function $\chi_S$ defined by
$$\chi_S(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in S, \\ 0 & \text{otherwise.} \end{cases}$$
$$\iint_S f\,dA = \iint_R f\chi_S\,dA.$$
It is easily verified that this definition does not depend on the choice of the enveloping rectangle $R$, since the integrand $f\chi_S$ vanishes outside of $S$. (It also does not depend on the values of $f$ outside of $S$. We could just as well assume that $f$ is only defined on $S$ or on some set containing $S$, with the understanding that
The properties of integrals in two dimensions are very similar to those in one;
the following theorem provides a list of the most basic ones. The proof is essentially
identical to that of Theorems 4.6 and 4.9; we leave the details to the interested
reader.
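4.17 Theorem.
a. If $f_1$ and $f_2$ are integrable on the bounded set $S$ and $c_1, c_2 \in \mathbb{R}$, then $c_1f_1 + c_2f_2$ is integrable on $S$, and
$$\iint_S [c_1f_1 + c_2f_2]\,dA = c_1\iint_S f_1\,dA + c_2\iint_S f_2\,dA.$$
c. If $f$ and $g$ are integrable on $S$ and $f(\mathbf{x}) \le g(\mathbf{x})$ for $\mathbf{x} \in S$, then $\iint_S f\,dA \le \iint_S g\,dA$.
d. If $f$ is integrable on $S$, then so is $|f|$, and $\left|\iint_S f\,dA\right| \le \iint_S |f|\,dA$.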
At this point we need to say more about the conditions under which a function is
integrable. In the one-variable situation, we can get along quite well by restricting
attention to continuous functions, but that is not the case here: Even if the function $f$ is continuous, the function $\chi_S$ that enters into the definition of $\iint_S f\,dA$ is not.
The starting point is the analogue of Theorem 4.13. The notion of "zero content" transfers readily to sets in the plane; namely, a set $Z \subset \mathbb{R}^2$ is said to have zero content if for any $\epsilon > 0$ there is a finite collection of rectangles $R_1, \dots, R_M$ such that (i) $Z \subset \bigcup_1^M R_m$, and (ii) the sum of the areas of the $R_m$'s is less than $\epsilon$. We then have:
Proof. The proof is essentially identical to that of Theorem 4.13. That is, one
first shows that $f$ is integrable if $f$ is continuous on all of $R$ by the argument
that proves Theorem 4.11, then encompasses the general case by the argument that
proves Theorem 4.12. Details are left to the reader. □
The notion of "zero content" is considerably more interesting in the plane than on the line, as the sets having this property include not only finite sets but things such as smooth curves (that is, curves parametrized by $C^1$ functions $\mathbf{f} : [a, b] \to \mathbb{R}^2$). The following proposition summarizes the results we will need; see also Exercise 2.
4.19 Proposition.
a. If $Z \subset \mathbb{R}^2$ has zero content and $U \subset Z$, then $U$ has zero content.
b. If $Z_1, \dots, Z_k$ have zero content, then so does $\bigcup_1^k Z_j$.
c. If $\mathbf{f} : (a_0, b_0) \to \mathbb{R}^2$ is of class $C^1$, then $\mathbf{f}([a, b])$ has zero content whenever $a_0 < a < b < b_0$.
Proof. Parts (a) and (b) are easy, and their proofs are left as an exercise. As for (c), let $P_k = \{t_0, \dots, t_k\}$ be the partition of $[a, b]$ into $k$ equal subintervals of length $\delta = (b - a)/k$, and let $C$ be an upper bound for $\{|\mathbf{f}'(t)| : t \in [a, b]\}$. By the mean value theorem applied to the two components $x(t), y(t)$ of $\mathbf{f}(t)$, we have $|x(t) - x(t_j)| \le C\delta$ and $|y(t) - y(t_j)| \le C\delta$ for $t \in [t_{j-1}, t_j]$. In other words, $\mathbf{f}([t_{j-1}, t_j])$ is contained in the square of side length $2C\delta$ centered at $\mathbf{f}(t_j)$. Hence, $\mathbf{f}([a, b])$ is contained in the union of these squares, and the sum of their areas is $k(2C\delta)^2 = 4C^2(b - a)^2/k$. This can be made as small as we please by taking $k$ sufficiently large, so $\mathbf{f}([a, b])$ has zero content. □
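The covering in the proof of Proposition 4.19c is concrete enough to compute. The Python sketch below assumes the unit circle $\mathbf{f}(t) = (\cos t, \sin t)$ on $[0, 2\pi]$, for which $|\mathbf{f}'(t)| = 1$, so $C = 1$ works. It builds the $k$ squares of side $2C\delta$ centered at $\mathbf{f}(t_j)$ and adds up their areas; the small tolerance `tol` in the membership test is an assumed allowance for floating-point rounding.

```python
import math

def covering_squares(k, a=0.0, b=2 * math.pi, C=1.0):
    """Squares (center_x, center_y, half_side) from the proof of Prop. 4.19c."""
    delta = (b - a) / k
    half = C * delta   # each square has side length 2*C*delta
    return [(math.cos(a + j * delta), math.sin(a + j * delta), half)
            for j in range(1, k + 1)]

def in_some_square(p, squares, tol=1e-12):
    """True if the point p lies in at least one of the closed squares."""
    x, y = p
    return any(abs(x - cx) <= h + tol and abs(y - cy) <= h + tol
               for cx, cy, h in squares)

for k in (10, 100, 1000):
    squares = covering_squares(k)
    area = sum((2 * h) ** 2 for _, _, h in squares)   # equals 4 C^2 (b - a)^2 / k
    print(k, area)
```

The total area decreases like $1/k$, while every point of the circle stays covered.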
To apply Theorem 4.18 to the integrand $f\chi_S$ in the definition of $\iint_S f\,dA$, we need to know about the discontinuities of $\chi_S$. The following lemma provides the answer.
In view of Theorem 4.18 and Lemma 4.20, to have a good notion of integration over a set $S$, we should require the boundary of $S$ to have zero content. We make this condition into a formal definition: A set $S \subset \mathbb{R}^2$ is Jordan measurable if it is bounded and its boundary has zero content. (We shall comment further on this nomenclature below.) We shall generally say "measurable" rather than "Jordan measurable," but we advise the reader that in more advanced works the term "measurable" refers to the more general concept of Lebesgue measurability (see §4.8).
By Proposition 4.19, any bounded set whose boundary is a finite union of pieces
of smooth curves is measurable; these are the sets that we almost always encounter
in practice. The following theorem gives a convenient criterion for integrability.
Proof. The only points where $f\chi_S$ can be discontinuous are those points in the closure of $S$ where either $f$ or $\chi_S$ is discontinuous. By Lemma 4.20 and Proposition 4.19b, the set of such points has zero content. By Theorem 4.18, $f\chi_S$ is integrable on any rectangle $R$ containing $S$, and hence $f$ is integrable on $S$. □
Proof. Given $\epsilon > 0$, there is a finite collection of rectangles $R_1, \dots, R_M$ such that $Z \subset \bigcup_1^M R_m$ and the sum of the areas of the $R_m$'s is less than $\epsilon$. By subdividing these rectangles if necessary, we can assume that they have disjoint² interiors and form part of a grid obtained by partitioning some large rectangle $R$. Denoting this partition by $P$, the area of $R_j$ by $|R_j|$, and $\sup_{\mathbf{x}} |f(\mathbf{x})|$ by $C$, we have
Since $\epsilon$ is arbitrary, the desired conclusion follows directly from the definition of the integral. □
4.23 Corollary.
a. Suppose that $f$ is integrable on the set $S \subset \mathbb{R}^2$. If $g(\mathbf{x}) = f(\mathbf{x})$ except for $\mathbf{x}$ in a set of zero content, then $g$ is integrable on $S$ and $\iint_S g\,dA = \iint_S f\,dA$.
b. Suppose that $f$ is integrable on $S$ and on $T$, and $S \cap T$ has zero content. Then $f$ is integrable on $S \cup T$, and $\iint_{S\cup T} f\,dA = \iint_S f\,dA + \iint_T f\,dA$.
Proof. For (a), apply Proposition 4.22 to the function $f - g$. For (b), we are assuming that $f\chi_S$ and $f\chi_T$ are integrable; moreover, by Proposition 4.22, $f\chi_{S\cap T}$ is integrable and its integral is zero. But $f\chi_{S\cup T} = f\chi_S + f\chi_T - f\chi_{S\cap T}$, so the result follows. □
Area. The problem of determining the area of regions in the plane goes back to antiquity. The first effective general method of attacking this problem was provided by the integral calculus in one variable, which yields the area of a region under a graph, or of a region between two graphs. It therefore produces a theory of area for regions that can be broken up into finitely many subregions bounded by graphs of (nice) functions. However, the two-variable theory of integration contains, as a special case, a theory of area (due to the French mathematician Jordan) that encompasses more complicated sorts of regions too. Namely, if $S$ is any Jordan measurable set in the plane, its area is the integral over $S$ of the constant function $f(\mathbf{x}) = 1$:
$$\iint_S 1\,dA = \iint_R \chi_S\,dA.$$
of $R$, which produces a grid of small rectangles that cover $S$. The lower sum for this partition is simply the sum of the areas of the small rectangles that are contained in $S$, whereas the upper sum is the sum of the areas of the small rectangles that intersect $S$. Taking the supremum of the lower sums and the infimum of the upper sums yields quantities that may be called the inner area and outer area of $S$:
$$\underline{A}(S) = \underline{I}_R(\chi_S), \qquad \overline{A}(S) = \overline{I}_R(\chi_S).$$
When these two quantities coincide, that is, when the characteristic function $\chi_S$ is integrable, their common value is the area of $S$. See Figure 4.3.
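The lower and upper sums just described are easy to compute for a concrete set. The Python sketch below assumes $S$ is the closed unit disk and $R = [-1, 1]^2$, partitioned into an $n \times n$ grid of squares. A square lies in $S$ exactly when its farthest corner from the origin does, and it meets $S$ exactly when its nearest point to the origin lies in $S$ (apart from floating-point rounding in borderline cases). The two totals bound the inner area from below and the outer area from above, and both approach $\pi$.

```python
import math

def grid_sums_disk(n):
    """Lower and upper sums of the indicator of the closed unit disk
    for the n-by-n grid partition of [-1, 1]^2."""
    h = 2.0 / n
    inside = touching = 0
    for i in range(n):
        x0, x1 = -1 + i * h, -1 + (i + 1) * h
        for j in range(n):
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # Farthest point of the square from the origin is a corner.
            far2 = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # Nearest point: clamp the origin into the square coordinate-wise.
            nx = 0.0 if x0 <= 0 <= x1 else min(abs(x0), abs(x1))
            ny = 0.0 if y0 <= 0 <= y1 else min(abs(y0), abs(y1))
            near2 = nx * nx + ny * ny
            if far2 <= 1:
                inside += 1      # square contained in S: counts in the lower sum
            if near2 <= 1:
                touching += 1    # square meets S: counts in the upper sum
    return inside * h * h, touching * h * h

for n in (10, 100, 400):
    lo, hi = grid_sums_disk(n)
    print(n, lo, hi, hi - lo)
```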
When do we have $\underline{A}(S) = \overline{A}(S)$? It is not hard to see (Exercises 3-5) that for any bounded set $S$,
• the inner area of $S^{\mathrm{int}}$ plus the outer area of the boundary $\partial S$ equals the outer area of the closure $\overline{S}$.
It follows that the inner and outer areas of $S$ coincide precisely when the outer area of the boundary $\partial S$ is zero. But a moment's thought shows that this is nothing but the condition that $\partial S$ should have zero content. In short, the inner and outer areas of $S$ coincide precisely when $S$ is measurable. This is the explanation for the name "measurable": The measurable sets are the ones that have a well-defined area.
Although the class of Jordan measurable sets is much more extensive than the
class of sets whose area can be computed by one-variable calculus, it is not as big
as we would ideally wish. It does not include all bounded open sets or all compact
sets, for example. Moreover, it does not behave well with respect to passage to
4.2. Integration in Higher Dimensions 165
$$R = [a_1, b_1] \times \cdots \times [a_n, b_n] = \{\mathbf{x} : a_1 \le x_1 \le b_1, \dots, a_n \le x_n \le b_n\}.$$
The $n$-dimensional volume of such a box is the product of the lengths of its sides, $\prod_{j=1}^n (b_j - a_j)$. (Here $\prod$ is the product sign, analogous to $\sum$ for sums.) A partition of such a box is specified by partitioning each of its "sides" $[a_1, b_1], \dots, [a_n, b_n]$.
The notion of "zero content" generalizes to $n$ dimensions in the obvious way: A bounded set $Z \subset \mathbb{R}^n$ has zero content if for any $\epsilon > 0$ there are rectangular boxes $R_1, \dots, R_k$ whose total volume is less than $\epsilon$, such that $Z \subset \bigcup_1^k R_j$. The analogue of Proposition 4.19c is that smooth submanifolds of dimension $k < n$ in $\mathbb{R}^n$ (given parametrically by $C^1$ maps $\mathbf{f} : \mathbb{R}^k \to \mathbb{R}^n$) have zero content.
With these modifications, the definition of integrability and Theorems 4.17,
4.18, and 4.21 work just as in the 2-dimensional case. The element of area dA
becomes an element of $n$-dimensional volume, which may be denoted by $dV_n$, $d^n\mathbf{x}$, or $dx_1 \cdots dx_n$; thus, the notation for $n$-dimensional integrals over a region
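$S \subset \mathbb{R}^n$ is
$$\int\cdots\int_S f\,dV_n = \int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_S f(x_1, \dots, x_n)\,dx_1 \cdots dx_n,$$
where $\int\cdots\int$ is shorthand for a row of $n$ integral signs. When $n = 3$, we usually write $dV$ instead of $dV_3$, the $V$ denoting ordinary 3-dimensional volume.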
We conclude with a useful fact about integrals in any number of dimensions.
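4.24 Theorem (The Mean Value Theorem for Integrals). Let $S$ be a compact, connected, measurable subset of $\mathbb{R}^n$, and let $f$ and $g$ be continuous functions on $S$ with $g \ge 0$. Then there is a point $\mathbf{a} \in S$ such that
$$\int\cdots\int_S f(\mathbf{x})g(\mathbf{x})\,d^n\mathbf{x} = f(\mathbf{a})\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x}.$$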
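Proof. Let $m$ and $M$ be the minimum and maximum values of $f$ on $S$, which exist since $S$ is compact. Since $g \ge 0$, we have $mg \le fg \le Mg$ on $S$, and hence
$$m\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x} \le \int\cdots\int_S f(\mathbf{x})g(\mathbf{x})\,d^n\mathbf{x} \le M\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x}.$$
Thus the quotient $\left(\int\cdots\int_S fg\right)/\left(\int\cdots\int_S g\right)$ lies between $m$ and $M$, so by the intermediate value theorem, it is equal to $f(\mathbf{a})$ for some $\mathbf{a} \in S$. □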
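Here is a one-dimensional numerical instance of Theorem 4.24, under assumptions chosen only for illustration: $S = [0, 1]$, $f(x) = x^2$, $g(x) = x$. The quotient $\int_0^1 x^3\,dx \big/ \int_0^1 x\,dx$ equals $1/2$, and bisection (a computational form of the intermediate value theorem) locates the point $a = 1/\sqrt2$ with $f(a) = 1/2$. The integrals are approximated by the midpoint rule, so the results are approximate.

```python
import math

def midpoint_integral(h, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

def bisect_root(phi, lo, hi, tol=1e-12):
    """Locate a zero of a continuous phi with phi(lo) <= 0 <= phi(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f(x):
    return x * x

def g(x):
    return x   # g >= 0 on S = [0, 1]

# The quotient of the two integrals lies between min f = 0 and max f = 1 on S.
q = midpoint_integral(lambda x: f(x) * g(x), 0.0, 1.0) / midpoint_integral(g, 0.0, 1.0)
# Intermediate value theorem, carried out by bisection: find a with f(a) = q.
a = bisect_root(lambda x: f(x) - q, 0.0, 1.0)
print(q, a, 1 / math.sqrt(2))
```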
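4.25 Corollary. Let $S$ be a compact, connected, measurable subset of $\mathbb{R}^n$, and let $f$ be a continuous function on $S$. Then there is a point $\mathbf{a} \in S$ such that
$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = f(\mathbf{a})\int\cdots\int_S d^n\mathbf{x}.$$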
EXERCISES
5. Let $S$ be a bounded set in $\mathbb{R}^2$. Show that the inner area of $S$ plus the outer area of $\partial S$ equals the outer area of $S$. (Use Exercises 3 and 4.)
6. Let $S$ be the subset of the $x$-axis consisting of the union of the open interval of length $\frac14$ centered at $\frac12$, the open intervals of length $\frac1{16}$ centered at $\frac14$ and $\frac34$, the open intervals of length $\frac1{64}$ centered at $\frac18$, $\frac38$, $\frac58$, and $\frac78$, and so forth. Let $U = S \times (0, 1)$ be the union of the open rectangles of height 1 based on these intervals. Thus $U$ is the union of one rectangle of area $\frac14$, two rectangles of area $\frac1{16}$, four rectangles of area $\frac1{64}$, ..., some of which overlap.
a. Show that $U$ is an open subset of the unit square $R = [0, 1] \times [0, 1]$.
b. Show that the inner area of $U$ is less than $\frac12$.
c. Show that $U$ is dense in $R$ and hence that its outer area is 1. (Use Exercise 4.)
d. Let $V = R \setminus U$. Show that $V$ is a closed set whose inner area is 0 and whose outer area is bigger than $\frac12$.
7. (The Second Mean Value Theorem for Integrals) Suppose $f$ is continuous on $[a, b]$ and $\varphi$ is of class $C^1$ and increasing on $[a, b]$. Show that there is a point $c \in [a, b]$ such that
$$\int_a^b f(x)\varphi(x)\,dx = \varphi(a)\int_a^c f(x)\,dx + \varphi(b)\int_c^b f(x)\,dx.$$
to the latter integral. To remove the condition $\varphi(b) = 0$, show that if the conclusion is true for $f$ and $\varphi$, it is true for $f$ and $\varphi + C$ for any constant $C$.)
If $f$ is integrable on $R$, this double sum approximates the integral $\iint_R f(x, y)\,dx\,dy$. On the other hand, for each fixed $y$, the sum $\sum_{j=1}^J f(x_j, y)\,\Delta x_j$ is a Riemann sum for the single integral $g(y) = \int_a^b f(x, y)\,dx$, and then the sum $\sum_{k=1}^K g(y_k)\,\Delta y_k$ is a Riemann sum for the integral $\int_c^d g(y)\,dy$. Thus, in an approximate sense,
$$\iint_R f\,dA \approx \sum_{k=1}^K\sum_{j=1}^J f(x_j, y_k)\,\Delta x_j\,\Delta y_k \approx \int_c^d\left[\int_a^b f(x, y)\,dx\right] dy.$$
We could also play the same game with $x$ and $y$ switched, obtaining
$$\iint_R f\,dA \approx \int_a^b\left[\int_c^d f(x, y)\,dy\right] dx.$$
here there is one potential pitfall: The integrability of $f$ on $R$ need not imply the integrability of $f(x, y_0)$, as a function of $x$ for fixed $y_0$, on $[a, b]$. The line segment $\{(x, y) : a \le x \le b,\ y = y_0\}$ is a set of zero content, after all, so $f$ could be discontinuous at every point on it, and its behavior as a function of $x$ could be
quite wild. This problem is actually not too serious, and we shall sweep it under
the rug by making the assumption — quite harmless in practice — that it does not
occur. The resulting theorem is as follows. It is sometimes referred to as Fubini's
theorem, although that name belongs more properly to the generalization of the
theorem to Lebesgue integrals.
4.26 Theorem. Let $R = \{(x, y) : a \le x \le b,\ c \le y \le d\}$, and let $f$ be an integrable function on $R$. Suppose that, for each $y \in [c, d]$, the function $f_y$ defined by $f_y(x) = f(x, y)$ is integrable on $[a, b]$, and the function $g(y) = \int_a^b f(x, y)\,dx$ is integrable on $[c, d]$. Then
$$\iint_R f\,dA = \int_c^d\left[\int_a^b f(x, y)\,dx\right] dy. \tag{4.27}$$
The integrals on the right side of (4.27) and (4.28) are called iterated integrals. It is customary to omit the brackets in these integrals and to write, for example,
$$\int_c^d\int_a^b f(x, y)\,dx\,dy,$$
with the understanding that the integration is to be done "from the inside out." That is, the innermost integral $\int_a^b$ corresponds to the innermost differential $dx$, and the integral with respect to the corresponding variable $x$ is to be performed first. Some people find it clearer to write the differentials $dx$ and $dy$ next to the integral signs to which they pertain, thus:
$$\int_c^d dy\int_a^b dx\, f(x, y).$$
If our region of integration is not the whole rectangle $R$ but a subset $S$, the integration effectively stops at the boundary of $S$, and the limits of integration should be adjusted accordingly. For example, if $S$ is bounded above and below by the graphs of two functions,
$$S = \{(x, y) : a \le x \le b,\ \varphi(x) \le y \le \psi(x)\}, \tag{4.29}$$
we have
$$\iint_S f\,dA = \int_a^b\int_{\varphi(x)}^{\psi(x)} f(x, y)\,dy\,dx. \tag{4.30}$$
Here it is essential to integrate first in $y$, then in $x$, since the limits $\varphi(x)$ and $\psi(x)$ furnish part of the $x$-dependence of the integrand for the outer integral $\int_a^b \cdots dx$.
It is important to observe that if $S$ is a region of the form (4.29) where $\varphi$ and $\psi$ are of class $C^1$, and $f$ is continuous on $S$, the hypotheses in Theorem 4.26 that allow integration first in $y$ and then in $x$ are automatically satisfied, so that (4.30) is valid. Indeed, the integrability of $f\chi_S$ on any rectangle $R \supset S$ follows from Proposition 4.19c and Theorem 4.21, and the integrability of the function $(f\chi_S)(x, y)$ as a function of $y$ for fixed $x$ is obvious since it is continuous except at the two points $y = \varphi(x)$ and $y = \psi(x)$.
On the other hand, if S is bounded on the left and right by graphs of functions
of y, we obtain a formula similar to (4.30) with the roles of x and y reversed.
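Formula (4.30) and its counterpart with the roles of $x$ and $y$ reversed can be checked numerically. The Python sketch below assumes the region $S$ between $y = x^2$ and $y = x$ for $0 \le x \le 1$ and the integrand $f(x, y) = xy$; the same region is bounded on the left and right by $x = y$ and $x = \sqrt{y}$ for $0 \le y \le 1$. Both iterated integrals equal $1/24$ exactly, and a nested midpoint rule (an assumed quadrature choice) reproduces that value in either order.

```python
import math

def midpoint(h, a, b, n=200):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x, y):
    return x * y

# Integrate first in y (inner), then in x: limits phi(x) = x**2, psi(x) = x.
dy_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), x * x, x), 0.0, 1.0)

# Integrate first in x (inner), then in y: limits y <= x <= sqrt(y).
dx_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), y, math.sqrt(y)), 0.0, 1.0)

print(dy_dx, dx_dy, 1 / 24)
```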
In general, most of the regions S that arise in practice can be decomposed into a
finite number of pieces $S_1, \dots, S_k$, each of which is of the form (4.29) or of the analogous form with $x$ and $y$ switched. By using the additivity property (Theorem 4.17b), we can reduce the computation of $\iint_S f\,dA$ to the calculation of iterated
integrals on these subregions.
Figure 4.4 may be helpful in interpreting iterated integrals. The sketch on the left symbolizes $\iint \cdots dx\,dy$, in which we integrate first over the horizontal lines that run from the left side to the right side of the region, then integrate over the $y$-interval that comprises the $y$-coordinates of all these lines. Similarly, the sketch on the right symbolizes $\iint \cdots dy\,dx$.
4.3. Multiple Integrals and Iterated Integrals 171
FIGURE 4.5: The regions of integration in Example 1 (left) and Example 2 (right).
$$\int\int_{y^2-1} f(x, y)\,dx\,dy.$$
To integrate first in $y$, we must break up $R$ into its left and right halves:
$$\int\int f(x, y)\,dy\,dx + \int\int f(x, y)\,dy\,dx.$$
The ideas in higher dimensions are entirely similar. The analogue of Theorem 4.26 is that an integral over an $n$-dimensional rectangular solid with sides $[a_1, b_1], \dots, [a_n, b_n]$ can be evaluated as an $n$-fold iterated integral,
$$\int\cdots\int_R f\,dV = \int_{a_n}^{b_n}\cdots\int_{a_1}^{b_1} f(x_1, \dots, x_n)\,dx_1 \cdots dx_n,$$
provided that the indicated integrals exist. The meaning of the iterated integral on the right is that the integration is to be performed first with respect to $x_1$ and last with respect to $x_n$. However, the same formula remains valid with the $n$ integrations performed in any order. The only thing that needs some care is that the integral signs $\int_{a_j}^{b_j}$ must be matched up with the differentials $dx_j$ in the right order so as to get the right limits of integration, and the convention is the same as in the case $n = 2$: The integrations are to be performed in order from innermost to outermost.
When the region of integration is something other than a rectangular solid, setting up the right limits of integration can be rather complicated. A typical situation in 3 dimensions is as follows: The region of integration $S$ is the region in between two graphs,
$$S = \{(x, y, z) : (x, y) \in U,\ \varphi(x, y) \le z \le \psi(x, y)\},$$
based on some region $U$ in the $xy$-plane. The region $U$ in turn is the region between two graphs,
$$U = \{(x, y) : a \le x \le b,\ \sigma(x) \le y \le \tau(x)\}.$$
Then
$$\iiint_S f\,dV = \int_a^b\int_{\sigma(x)}^{\tau(x)}\int_{\varphi(x, y)}^{\psi(x, y)} f(x, y, z)\,dz\,dy\,dx.$$
Example 3. Find the mass of the tetrahedron $T$ formed by the three coordinate planes and the plane $x + y + 2z = 2$ (see Figure 4.6) if the mass density is given by $\rho(x, y, z) = e^{-z}$.
Solution. There are six ways to write the triple integral $\iiint_T e^{-z}\,dV$ as an iterated integral, although only three of them are essentially different, namely,
$$\int_0^2\int_0^{2-x}\int_0^{(2-x-y)/2} e^{-z}\,dz\,dy\,dx, \qquad \int_0^1\int_0^{2-2z}\int_0^{2-y-2z} e^{-z}\,dx\,dy\,dz,$$
$$\int_0^2\int_0^{1-(y/2)}\int_0^{2-y-2z} e^{-z}\,dx\,dz\,dy.$$
(The remaining three can be obtained from these simply by interchanging $x$ and $y$, since $T$ and the density function are invariant under this interchange.) Using the first of these, we obtain
$$\int_0^2\int_0^{2-x}\left(1 - e^{(x+y)/2 - 1}\right)dy\,dx = \int_0^2\left(2e^{(x/2) - 1} - x\right)dx = 2 - 4e^{-1}.$$
The reader may verify that the other two iterated integrals give the same answer.
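As a numerical cross-check of Example 3, the Python sketch below approximates the first of the three iterated integrals with a nested midpoint rule (an assumed quadrature choice; the number of points per level is arbitrary) and compares the result with $2 - 4e^{-1} \approx 0.5285$.

```python
import math

def midpoint(h, a, b, n=60):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

def density(x, y, z):
    return math.exp(-z)

# Order dz dy dx: 0 <= x <= 2, 0 <= y <= 2 - x, 0 <= z <= (2 - x - y)/2.
mass = midpoint(
    lambda x: midpoint(
        lambda y: midpoint(lambda z: density(x, y, z), 0.0, (2 - x - y) / 2),
        0.0, 2 - x),
    0.0, 2.0)

print(mass, 2 - 4 / math.e)
```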
$$\int_a^b\int_c^d f(x, y)\,dy\,dx = \int_c^d\int_a^b f(x, y)\,dx\,dy. \tag{4.31}$$
More precisely, (4.31) is valid if $f$ satisfies the conditions in Theorem 4.26 for
both (4.27) and (4.28) to hold. (See Exercise 13 for an example to demonstrate
the significance of these conditions.) The importance of this result can hardly be
overestimated; it is an extremely powerful tool for evaluating quantities defined by
integrals. We shall see a number of examples in later chapters.
Applications. Double and triple integrals can be used to calculate physical and
geometric quantities in much the same way as single integrals. Here are a few
standard examples:
• If $f(x, y) \ge 0$, the integral $\iint_S f\,dA$ can be interpreted as the volume of the region in $\mathbb{R}^3$ between the graph of $f$ and the $xy$-plane that lies over the base region $S$.
(The reader may wish for a more careful discussion of the meaning of $\rho$. Informally, $\rho(\mathbf{x})$ represents the ratio of the amount of substance in an infinitesimal cube centered at $\mathbf{x}$ to the volume of that cube. To make this rigorous, one should interpret $\rho(\mathbf{x})$ as the limit of the ratio of the amount of substance in a finite cube centered at $\mathbf{x}$ to the volume of that cube as the side length of the cube tends to zero. One can then prove, under suitable hypotheses, that the amount of substance in any region $S$ is $\iiint_S \rho(\mathbf{x})\,d^3\mathbf{x}$. But a complete analysis of these matters is beyond the scope of this book.)
• Suppose that a massive object with mass density $\rho(\mathbf{x})$ occupies the region $S \subset \mathbb{R}^3$, so that its total mass is $m = \iiint_S \rho(\mathbf{x})\,d^3\mathbf{x}$. The center of gravity of the object is the point $\bar{\mathbf{x}}$ whose coordinates are $\bar{x}_j = m^{-1}\iiint_S x_j\rho(\mathbf{x})\,d^3\mathbf{x}$. In the special case where $\rho \equiv 1$, $\bar{\mathbf{x}}$ is the centroid of the region $S$, which is the point whose coordinates are the average values of the coordinate functions on $S$. The center of mass, in general, can be interpreted similarly as the point whose coordinates are the weighted averages of the coordinate functions on $S$ where the weighting is given by the density $\rho$.
• Again suppose that a massive object with mass density $\rho(\mathbf{x})$ occupies the region $S \subset \mathbb{R}^3$, and let $L$ be a line in $\mathbb{R}^3$. The moment of inertia of the body about the line $L$, a quantity that is useful in analyzing rotational motion about $L$, is $\iiint_S d(\mathbf{x})^2\rho(\mathbf{x})\,d^3\mathbf{x}$, where $d(\mathbf{x})$ is the distance from $\mathbf{x}$ to $L$. (For example, if $L$ is the $z$-axis, then $d(x, y, z)^2 = x^2 + y^2$.)
EXERCISES
2. Find the volume of the region above the triangle in the $xy$-plane with vertices $(0, 0)$, $(1, 0)$, and $(0, 1)$, and below the surface $z = 6xy(1 - x - y)$.
3. For the following regions $S \subset \mathbb{R}^2$, express the double integral $\iint_S f\,dA$ in terms of iterated integrals in two different ways.
a. $S$ = the region in the left half plane between the curve $y = x^3$ and the line $y = 4x$.
b. $S$ = the triangle with vertices $(0, 0)$, $(2, 2)$, and $(3, 1)$.
c. $S$ = the region between the parabolas $y = x^2$ and $y = 6 - 4x - x^2$.
4. Express each of the following iterated integrals as a double integral and as an
iterated integral in the opposite order. (That is, find the region of integration
for the double integral and the limits of integration for the other iterated integral.)
i 1/3
a. /0 /J> f(x,y)dydx.
c- J?fi°gXf(x,y)dydx.
5. Evaluate the following iterated integrals. (You may need to reverse the order of
integration.)
a. Jx fi ye2x dx dy.
c. J1 jx,x ye y dy dx.
6. Fill in the blanks: Jo J^x2 f(y) dy dx = jo[ ] dy + J^[ ]dy. The expressions you obtain for the [ ]'s should not contain integral signs.
7. Given a continuous function $g : \mathbb{R} \to \mathbb{R}$, let $h(x) = \int_0^x\int_0^y g(t)\,dt\,dy$. That is, $h$ is obtained by integrating $g$ twice, starting the integration at 0. Show that $h$ can be expressed as a single integral, namely, $h(x) = \int_0^x (x - t)g(t)\,dt$. (Note that $x$ can be treated as a constant here; $y$ and $t$ are the variables of integration.)
8. Let $S \subset \mathbb{R}^3$ be the region between the paraboloid $z = x^2 + y^2$ and the plane $z = 1$. Express the triple integral $\iiint_S f\,dV$ as an iterated integral with the order of integration (a) $z, y, x$; (b) $y, z, x$; (c) $x, y, z$. (That is, find the appropriate limits of integration in each case.)
10. Find the centroid of the tetrahedron bounded by the coordinate planes and the
plane (x/a) + (y/b) + (z/c) = 1.
11. An object with mass density $\rho(x, y, z) = yz$ occupies the cube $\{(x, y, z) : 0 \le x, y, z \le 2\}$. Find its mass and center of mass.
12. A body with charge density $\rho(x, y, z) = 2z$ occupies the region bounded below by the parabolic cylinder $z = x^2 - 3$, above by the plane $z = x - 1$, and on the sides by the planes $y = 0$ and $y = 2$. Find its net charge (total positive charge minus total negative charge).
13. Let $f(x, y) = y^{-2}$ if $0 < x < y < 1$, $f(x, y) = -x^{-2}$ if $0 < y < x < 1$, and $f(x, y) = 0$ otherwise, and let $S$ be the unit square $[0, 1] \times [0, 1]$.
a. Show that $f$ is not integrable on $S$, but that $f(x, y)$ is integrable on $[0, 1]$ as a function of $x$ for each fixed $y$ and as a function of $y$ for each fixed $x$.
b. Show by explicit calculation that the iterated integrals $\int_0^1\int_0^1 f(x, y)\,dx\,dy$ and $\int_0^1\int_0^1 f(x, y)\,dy\,dx$ both exist and are unequal.
4.4. Change of Variables for Multiple Integrals 111
To motivate the ideas in this section, we recall the change-of-variable formula for single definite integrals: If $g$ is a one-to-one function of class $C^1$ on the interval $[a, b]$, then for any continuous function $f$,
$$\int_a^b f(g(u))g'(u)\,du = \int_{g(a)}^{g(b)} f(x)\,dx. \tag{4.32}$$
The proof is a simple matter of combining the chain rule and the fundamental theorem of calculus. Indeed, if $F$ is an antiderivative of $f$, the right side of (4.32) is $F(g(b)) - F(g(a))$, which in turn equals $\int_a^b (F \circ g)'(u)\,du$, and the latter integrand is $f(g(u))g'(u)$. (Formula (4.32) is actually valid when $f$ is merely integrable, but we shall not worry about this refinement here.)
There is one slightly tricky point here, which we point out now because it will
be significant later. If $g$ is an increasing function, (4.32) is fine as it stands, but
if $g$ is decreasing, the endpoints on the integral on the right are in the "wrong"
order, and we might prefer to put them back in the "right" order by introducing a
minus sign: $\int_{g(a)}^{g(b)} = -\int_{g(b)}^{g(a)}$. Since $g$ is increasing or decreasing according as $g'$ is
positive or negative, we could rewrite (4.32) as
$$\int_a^b f(g(u))\,|g'(u)|\,du = \int_{g([a,b])} f(x)\,dx. \qquad (4.33)$$
Here $g([a, b])$ is the interval to which $[a, b]$ is mapped under $g$, and for any interval
$I$ the symbol $\int_I$ means the integral from the left endpoint of $I$ to the right endpoint.
The replacement of g' by | g' | compensates for the extra minus sign that comes from
adjusting the order of the endpoints when g is decreasing.
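The two forms (4.32) and (4.33) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a composite midpoint rule (our choice of quadrature, not the book's), a sample integrand $f(x) = x^2 + 1$, and two hypothetical substitutions on $[0, 1]$, one increasing and one decreasing.

```python
def midpoint(func, a, b, n=20_000):
    """Composite midpoint rule for the integral of func from a to b."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

f = lambda x: x * x + 1.0                      # sample continuous integrand (assumption)

# Increasing substitution g(u) = u^3 + u on [0, 1]; g(0) = 0, g(1) = 2.
g, dg = (lambda u: u ** 3 + u), (lambda u: 3 * u ** 2 + 1)
lhs_inc = midpoint(lambda u: f(g(u)) * dg(u), 0.0, 1.0)
rhs_inc = midpoint(f, g(0.0), g(1.0))

# Decreasing substitution g2(u) = 1 - u^2 on [0, 1]; g2(0) = 1, g2(1) = 0.
g2, dg2 = (lambda u: 1 - u ** 2), (lambda u: -2 * u)
lhs_signed = midpoint(lambda u: f(g2(u)) * dg2(u), 0.0, 1.0)    # (4.32): integral from 1 down to 0
lhs_abs = midpoint(lambda u: f(g2(u)) * abs(dg2(u)), 0.0, 1.0)  # (4.33): |g'| in place of g'
rhs_interval = midpoint(f, 0.0, 1.0)                            # over g2([0, 1]) = [0, 1], left to right

print(lhs_inc, rhs_inc)                   # both approximately 14/3
print(lhs_signed, lhs_abs, rhs_interval)  # approximately -4/3, 4/3, 4/3
```

With the decreasing substitution, the signed version (4.32) produces the integral with reversed endpoints, while the $|g'|$ version (4.33) reproduces the ordinary left-to-right integral over the image interval.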
In practice it is often more convenient to have all the $g$'s on one side of the
equation. If we set $I = g([a, b])$, we have $[a, b] = g^{-1}(I)$, and (4.33) becomes
$$\int_I f(x)\,dx = \int_{g^{-1}(I)} f(g(u))\,|g'(u)|\,du. \qquad (4.34)$$
Our object is to find the analogous formula for multiple integrals. It is natural
to use (4.34) rather than (4.32) as a starting point, since for multiple integrals the
issue of left-to-right or right-to-left disappears and we just speak of integrals over
a region, like the integrals over intervals that appear in (4.34). More precisely,
suppose $G$ is a one-to-one transformation from a region $R$ in $\mathbb{R}^n$ to another region
$S = G(R)$ in $\mathbb{R}^n$; then $R = G^{-1}(S)$, and the formula we are seeking should look like
178 Chapter 4. Integral Calculus
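$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_{G^{-1}(S)} f(G(\mathbf{u}))\,(??)\,d^n\mathbf{u}. \qquad (4.35)$$
[Figure 4.7: a small rectangle with sides $dr$ and $d\theta$ in the $r\theta$-plane and its image under the polar coordinate map, with sides $dr$ and $r\,d\theta$.]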
The missing ingredient is the quantity that will play the role of $|g'(u)|$ in the formula
(4.34).
Now, the $g'(u)$ in (4.32) or (4.34) is the factor that relates the differentials $du$
and $dx$ under the transformation $x = g(u)$. In $n$ variables, the $n$-fold differential
$d^n\mathbf{x} = dx_1 \cdots dx_n$ represents the "element of volume," that is, the volume of an
infinitesimal piece of $n$-space. So the question is: How does the volume of a tiny
piece of $n$-space change when one applies the transformation $G$?
To get a feeling for what is going on, let us look at the polar coordinate map
$$(x, y) = G(r, \theta) = (r\cos\theta, r\sin\theta).$$
A small rectangle in the $r\theta$-plane with lower left corner at $(r, \theta)$ and sides of length
$dr$ and $d\theta$ is mapped to a small region in the $xy$-plane bounded by two line segments of length $dr$ and two circular arcs of length $r\,d\theta$ and $(r + dr)\,d\theta$. When $dr$
and $d\theta$ are very small, this is essentially a rectangle with sides $dr$ and $r\,d\theta$, so its
area is $r\,dr\,d\theta$. In short, a small bit of the $r\theta$-plane with area $dr\,d\theta$ is mapped to a
small bit of the $xy$-plane with area $r\,dr\,d\theta$; see Figure 4.7. Hence, in this case the
missing factor in (4.35) is simply $r$, and (4.35) becomes
$$\iint_S f(x, y)\,dx\,dy = \iint_{G^{-1}(S)} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta. \qquad (4.36)$$
Here $S$ is a region in the $xy$-plane and $R = G^{-1}(S)$ is the corresponding region in
the $r\theta$-plane. Our argument here has been very informal, but this result is correct,
and it gives the formula for computing double integrals in polar coordinates.
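A quick numerical check of (4.36), assuming the midpoint rule as the quadrature: the same integral over the unit disc is computed once in polar coordinates (with the factor $r$) and once on a Cartesian grid. The integrand $x^2 + y^2 + x$ is an arbitrary sample; its exact integral over the unit disc is $\pi/2$.

```python
import math

def polar_integral(F, a, n=300):
    """Midpoint rule in (r, theta) for the integral of F over the disc of radius a,
    including the factor r from (4.36)."""
    dr, dth = a / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += F(r * math.cos(th), r * math.sin(th)) * r
    return total * dr * dth

def cartesian_integral(F, a, n=300):
    """Midpoint rule on a square grid, keeping only cells whose centers lie in the disc."""
    h = 2 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            if x * x + y * y <= a * a:
                total += F(x, y)
    return total * h * h

F = lambda x, y: x * x + y * y + x      # sample integrand (assumption)
polar_val = polar_integral(F, 1.0)
cart_val = cartesian_integral(F, 1.0)
print(polar_val, cart_val, math.pi / 2)
```

Dropping the factor `r` from `polar_integral` gives a visibly wrong answer, which is the point of the missing factor in (4.35). The Cartesian grid converges more slowly because its cells do not fit the circular boundary.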
The case of a linear mapping of the plane is also easy to analyze. Given a
matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $\det A = ad - bc \neq 0$, let $\mathbf{x} = G(\mathbf{u}) = A\mathbf{u}$, that is,
$(x, y) = G(u, v) = (au + bv, cu + dv)$. The transformation $G$ takes the unit
vectors $(1, 0)$ and $(0, 1)$ to the vectors $(a, c)$ and $(b, d)$, so it maps the standard
coordinate grid to a grid of parallelograms with sides parallel to these vectors. In
particular, it maps the square $[0,1] \times [0,1]$ to the parallelogram with vertices at
$(0, 0)$, $(a, c)$, $(b, d)$, and $(a + b, c + d)$, as indicated in Figure 4.8. The area of that
parallelogram is $|ad - bc|$, that is, $|\det A|$. (To see this, think of the plane as sitting
in $\mathbb{R}^3$ and recall the geometric interpretation of the cross product: The area of the
parallelogram is
$$|(a, c, 0) \times (b, d, 0)| = |(0, 0, ad - bc)|,$$
which is nothing but $|\det A|$ (Exercise 8 in §1.1).) As before, we conclude that for
the linear map $G(\mathbf{u}) = A\mathbf{u}$ of $\mathbb{R}^2$, the missing factor in (4.35) should be $|\det A|$:
$$\iint_S f(x, y)\,dx\,dy = |ad - bc| \iint_{G^{-1}(S)} f(au + bv, cu + dv)\,du\,dv.$$
[Figure 4.8: the unit square in the $uv$-plane and its image under $G$, the parallelogram with vertices $(0,0)$, $(a, c)$, $(b, d)$, $(a + b, c + d)$.]
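For a linear map the formula can be checked exactly. In the sketch below (a sample matrix with $a = 2$, $b = 1$, $c = 1$, $d = 3$, chosen only for illustration), the shoelace formula gives the area and centroid of the image parallelogram. For an affine integrand such as $f(x, y) = 1 + x$, the left side is exactly the area times the value of $f$ at the centroid, and the midpoint rule on the unit square is exact for the affine integrand on the right.

```python
def shoelace(vertices):
    """Signed area and centroid of a simple polygon whose vertices are listed in order."""
    A = cx = cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        A += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    A *= 0.5
    return A, (cx / (6 * A), cy / (6 * A))

a, b, c, d = 2.0, 1.0, 1.0, 3.0                 # sample matrix [[a, b], [c, d]] (assumption)
G = lambda u, v: (a * u + b * v, c * u + d * v)
det = a * d - b * c

corners = [G(0, 0), G(1, 0), G(1, 1), G(0, 1)]  # image of the unit square
area, (xbar, ybar) = shoelace(corners)

f = lambda x, y: 1.0 + x                        # affine test integrand (assumption)
lhs = abs(area) * f(xbar, ybar)                 # exact for affine f: area times value at centroid

n = 200
h = 1.0 / n
rhs = abs(det) * h * h * sum(f(*G((i + 0.5) * h, (j + 0.5) * h))
                             for i in range(n) for j in range(n))
print(abs(area), abs(det), lhs, rhs)
```

Both sides come out as $12.5$ here, and the parallelogram's area equals $|\det A| = 5$.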
It is now reasonable to conjecture that the same result should hold for linear
mappings of Rn for any n. We proceed to show that this is correct.
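4.37 Theorem. Let $A$ be an invertible $n \times n$ matrix, and let $G(\mathbf{u}) = A\mathbf{u}$ be the
corresponding linear transformation of $\mathbb{R}^n$. Suppose $S$ is a measurable region in
$\mathbb{R}^n$ and $f$ is an integrable function on $S$. Then $G^{-1}(S) = \{A^{-1}\mathbf{x} : \mathbf{x} \in S\}$ is
measurable and $f \circ G$ is integrable on $G^{-1}(S)$, and
$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = |\det A| \int\cdots\int_{G^{-1}(S)} f(A\mathbf{u})\,d^n\mathbf{u}. \qquad (4.38)$$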
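1. Multiply the $k$th component by a nonzero number $c$, leaving all the other
components alone:
$$G_1(\mathbf{u}) = (u_1, \ldots, u_{k-1}, cu_k, u_{k+1}, \ldots, u_n).$$
2. Add a multiple of the $j$th component to the $k$th component, leaving all the
other components alone:
$$G_2(\mathbf{u}) = (u_1, \ldots, u_{k-1}, u_k + cu_j, u_{k+1}, \ldots, u_n).$$
3. Interchange the $j$th and $k$th components, leaving all the other components alone.

The corresponding matrices $A_1, A_2, A_3$ are obtained by performing the same row
operations on the identity matrix. Since $\det I = 1$, the rules that tell how row
operations affect determinants (see (A.30)) give
$$\det A_1 = c, \qquad \det A_2 = 1, \qquad \det A_3 = -1.$$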
It is easy to verify that (4.38) holds for these three types of transformations.
The first two involve a change in only the $k$th variable, so we can integrate first
with respect to that variable and use (4.34) (or, rather, the simple special cases of
(4.34) discussed in Exercise 8 of §4.1). Thus, for $G_1$ we set $x_k = cu_k$ and obtain
$$\int_{-\infty}^{\infty} f(\ldots, x_k, \ldots)\,dx_k = |c| \int_{-\infty}^{\infty} f(\ldots, cu_k, \ldots)\,du_k.$$
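(The endpoints have to be switched if $c < 0$, which accounts for replacing $c$ by $|c|$,
as in the discussion preceding (4.34).) Likewise, for $G_2$ we set $x_k = u_k + cu_j$ and
obtain
$$\int_{-\infty}^{\infty} f(\ldots, x_k, \ldots)\,dx_k = \int_{-\infty}^{\infty} f(\ldots, u_k + cu_j, \ldots)\,du_k.$$
Finally, for $G_3$, which interchanges $u_j$ and $u_k$, we have
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\ldots, u_k, \ldots, u_j, \ldots)\,du_j\,du_k = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\ldots, u_j, \ldots, u_k, \ldots)\,du_k\,du_j \qquad (4.39)$$
simply because the variables $u_j$ and $u_k$ are dummy variables here. That is, we are
integrating $f$ with respect to its $j$th and $k$th variables, and it doesn't matter what
we call them. Now an integration with respect to the remaining variables, together
with (4.39), gives (4.38) for $G_3$.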
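Step 3: We next verify that if (4.38) is valid for the linear maps $G(\mathbf{u}) = A\mathbf{u}$
and $H(\mathbf{u}) = B\mathbf{u}$, then it is also valid for the composition $(G \circ H)(\mathbf{u}) = AB\mathbf{u}$.
Indeed, applying (4.38) first for $G$ and then for $H$, and using $\det(AB) = (\det A)(\det B)$, we obtain
$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = |\det A| \int\cdots\int_{G^{-1}(S)} f(A\mathbf{u})\,d^n\mathbf{u} = |\det A|\,|\det B| \int\cdots\int_{H^{-1}(G^{-1}(S))} f(AB\mathbf{v})\,d^n\mathbf{v} = |\det AB| \int\cdots\int_{(G \circ H)^{-1}(S)} f(AB\mathbf{v})\,d^n\mathbf{v},$$
as claimed.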
The Final Step: From Step 3, it follows easily by induction that if (4.38) is valid
for $G_1, \ldots, G_k$, then it is also valid for the composition $G_1 \circ \cdots \circ G_k$. Thus, in view
of Step 2, to complete the proof we have merely to observe that every invertible
linear transformation of $\mathbb{R}^n$ is a composition of elementary transformations. This
is equivalent to the fact that every invertible matrix $A$ can be row-reduced to the
identity matrix; see (A.52) (in particular, the equivalence of (a) and (i)) and (A.53)
in Appendix A. □
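The final step rests on row reduction. The following sketch is our own Gauss-Jordan routine in exact rational arithmetic (not code from the book): it records each elementary operation as a matrix, checks that applying them in order turns $A$ into the identity, and multiplies their $|\det|$ values. The product is $1/|\det A|$, which is how the factors $|c|$ from Step 2 combine into $|\det A|$ once the inverses are composed. The sample matrix is arbitrary.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elementary_factors(A):
    """Gauss-Jordan reduction of an invertible matrix A.
    Returns pairs (E, |det E|) with E_m ... E_1 A = I."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    ops = []
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # StopIteration if A is singular
        if piv != col:                                # interchange two rows: |det| = 1
            E = identity(n)
            E[col], E[piv] = E[piv], E[col]
            ops.append((E, Fraction(1)))
            M = matmul(E, M)
        c = M[col][col]
        if c != 1:                                    # scale a row by 1/c: |det| = 1/|c|
            E = identity(n)
            E[col][col] = 1 / c
            ops.append((E, abs(1 / c)))
            M = matmul(E, M)
        for r in range(n):                            # add a multiple of the pivot row: |det| = 1
            if r != col and M[r][col] != 0:
                E = identity(n)
                E[r][col] = -M[r][col]
                ops.append((E, Fraction(1)))
                M = matmul(E, M)
    return ops

A = [[0, 2, 1], [1, 1, 0], [3, 0, 1]]      # sample invertible matrix, det A = -5
P = [[Fraction(x) for x in row] for row in A]
prod_absdet = Fraction(1)
for E, d in elementary_factors(A):
    P = matmul(E, P)
    prod_absdet *= d
print(P == identity(3), prod_absdet)       # the identity is reached; product is 1/|det A|
```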
There is one more simple class of transformations for which the change-of-variable formula is easily established, namely the translations. These are the mappings of the form $G(\mathbf{u}) = \mathbf{u} + \mathbf{b}$ where $\mathbf{b}$ is a fixed vector. Indeed, we just make
the substitution $x_j = u_j + b_j$, $dx_j = du_j$ in each variable separately to conclude
that
$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_{S - \mathbf{b}} f(\mathbf{u} + \mathbf{b})\,d^n\mathbf{u}.$$
we think of the former as a sum of infinitesimal terms $f(\mathbf{x})\,d^n\mathbf{x}$, each of which is
the value of $f$ at a point $\mathbf{x}$ multiplied by the volume $d^n\mathbf{x}$ of an infinitesimal region
$dS$ located at $\mathbf{x}$. Under the transformation $\mathbf{x} = G(\mathbf{u})$, $f(\mathbf{x})$ becomes $f(G(\mathbf{u}))$,
and the region $dS$ is the image under $G$ of another infinitesimal region $dT$ whose
volume is $d^n\mathbf{u}$. But on the infinitesimal level, the differentiable map $G$ is the same
as its linearization:
$$G(\mathbf{u} + d\mathbf{u}) = \mathbf{x} + DG(\mathbf{u})\,d\mathbf{u}.$$
Therefore, by (4.40), the elements of volume $d^n\mathbf{x}$ and $d^n\mathbf{u}$ are related by the formula $d^n\mathbf{x} = |\det DG(\mathbf{u})|\,d^n\mathbf{u}$. Putting this all together, we arrive at the main
theorem.
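4.41 Theorem. Given open sets $U$ and $V$ in $\mathbb{R}^n$, let $G : U \to V$ be a one-to-one
transformation of class $C^1$ whose derivative $DG(\mathbf{u})$ is invertible for all $\mathbf{u} \in U$.
Suppose that $T \subset U$ and $S \subset V$ are measurable sets such that $G(T) = S$. If $f$ is
an integrable function on $S$, then $f \circ G$ is integrable on $T$, and
$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_T f(G(\mathbf{u}))\,|\det DG(\mathbf{u})|\,d^n\mathbf{u}.$$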
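The heuristic $d^n\mathbf{x} = |\det DG(\mathbf{u})|\,d^n\mathbf{u}$ behind Theorem 4.41 can be tested directly in two dimensions: map a tiny square through $G$, measure the area of its image with the shoelace formula applied to points sampled along the image boundary, and divide by the area of the square. The sketch below does this for the polar map ($\det DG = r$) and for the map $(u, v) \mapsto (u^2 - v^2, 2uv)$, whose Jacobian determinant is $4(u^2 + v^2)$. The base points, the side length $10^{-3}$, and the sampling density are arbitrary choices.

```python
import math

def image_area(G, u0, v0, du, dv, m=50):
    """Area of the image of [u0, u0+du] x [v0, v0+dv] under G, via the
    shoelace formula on m sample points per edge of the boundary."""
    boundary = ([(u0 + du * k / m, v0) for k in range(m)] +
                [(u0 + du, v0 + dv * k / m) for k in range(m)] +
                [(u0 + du - du * k / m, v0 + dv) for k in range(m)] +
                [(u0, v0 + dv - dv * k / m) for k in range(m)])
    pts = [G(u, v) for u, v in boundary]
    x0, y0 = pts[0]
    pts = [(x - x0, y - y0) for x, y in pts]      # shift to reduce floating-point cancellation
    twice_area = sum(xa * yb - xb * ya
                     for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2

polar = lambda r, th: (r * math.cos(th), r * math.sin(th))   # det DG = r
square = lambda u, v: (u * u - v * v, 2 * u * v)              # det DG = 4(u^2 + v^2)

h = 1e-3
ratio_polar = image_area(polar, 2.0, 0.3, h, h) / h ** 2      # near r = 2
ratio_square = image_area(square, 1.5, 0.5, h, h) / h ** 2    # near 4(1.5^2 + 0.5^2) = 10
print(ratio_polar, ratio_square)
```

The ratios differ from the determinants only by terms of order $h$, reflecting the variation of $\det DG$ across the small square.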
Notice that the results derived earlier in this section are indeed special cases of
Theorem 4.41. If $G$ is a linear map, $G(\mathbf{u}) = A\mathbf{u}$, then $DG(\mathbf{u}) = A$ for all $\mathbf{u}$,
so $|\det DG(\mathbf{u})| = |\det A|$ is a constant that can be brought outside the integral
sign. And if $G$ is the polar coordinate map, $G(r, \theta) = (r\cos\theta, r\sin\theta)$, then
$\det DG(r, \theta) = r$, so we recover (4.36).
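Let us record the corresponding results for the standard "polar" coordinate systems in $\mathbb{R}^3$, shown in Figure 4.9. Cylindrical coordinates are just polar coordinates in the $xy$-plane with the $z$-coordinate added in,
$$(x, y, z) = G_{\rm cyl}(r, \theta, z) = (r\cos\theta, r\sin\theta, z).$$
It is easily verified that $\det DG_{\rm cyl}(r, \theta, z) = r$ again, so the formula for integration
in cylindrical coordinates is
$$\iiint_S f(x, y, z)\,dx\,dy\,dz = \iiint_{G_{\rm cyl}^{-1}(S)} f(r\cos\theta, r\sin\theta, z)\,r\,dr\,d\theta\,dz.$$
Spherical coordinates are given by
$$(x, y, z) = G_{\rm sph}(r, \varphi, \theta) = (r\sin\varphi\cos\theta, r\sin\varphi\sin\theta, r\cos\varphi).$$
Here $r$ is the distance from the origin, $\theta$ is the longitude, and $\varphi$ is the co-latitude (the
angle from the positive $z$-axis). The reader may check that $\det DG_{\rm sph}(r, \varphi, \theta) =
r^2\sin\varphi$ (Exercise 6c, §3.4), so the formula for integration in spherical coordinates
is
$$\iiint_S f(x, y, z)\,dx\,dy\,dz = \iiint_{G_{\rm sph}^{-1}(S)} f(r\sin\varphi\cos\theta, r\sin\varphi\sin\theta, r\cos\varphi)\,r^2\sin\varphi\,dr\,d\varphi\,d\theta. \qquad (4.44)$$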
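The determinants quoted for cylindrical and spherical coordinates can be confirmed numerically. This sketch approximates $DG$ by central differences (the step size $10^{-6}$ and the sample points are assumptions) and evaluates the $3 \times 3$ determinant; the results should agree with $r$ and $r^2\sin\varphi$ to many digits.

```python
import math

def G_cyl(r, th, z):
    return (r * math.cos(th), r * math.sin(th), z)

def G_sph(r, phi, th):
    # r: distance from the origin, phi: co-latitude, th: longitude
    return (r * math.sin(phi) * math.cos(th),
            r * math.sin(phi) * math.sin(th),
            r * math.cos(phi))

def jacobian(G, p, step=1e-6):
    """Approximate DG(p) by central differences; entry [i][k] is dG_i/dp_k."""
    n = len(p)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        plus, minus = list(p), list(p)
        plus[k] += step
        minus[k] -= step
        for i, (a, b) in enumerate(zip(G(*plus), G(*minus))):
            J[i][k] = (a - b) / (2 * step)
    return J

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

det_cyl = det3(jacobian(G_cyl, (1.5, 2.0, 0.3)))   # expect about r = 1.5
det_sph = det3(jacobian(G_sph, (1.5, 0.7, 2.0)))   # expect about r^2 sin(phi)
print(det_cyl, det_sph, 1.5 ** 2 * math.sin(0.7))
```

The variable order $(r, \varphi, \theta)$ matters for the sign: with this order the spherical determinant is $+r^2\sin\varphi$.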
Example 1. Find the volume and the centroid of the region $S$ above the
surface $z = x^2 + y^2$ and below the plane $z = 4$. (See Figure 4.10.)
Solution. Because of the circular symmetry, it is most convenient to use
polar coordinates. The projection of $S$ onto the $xy$-plane is the disc of radius 2
about the origin, so the volume of $S$ is
$$V = \iiint_S dV = \int_0^2 \int_{r^2}^4 \int_0^{2\pi} r\,d\theta\,dz\,dr = 8\pi.$$
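As a check on Example 1, the sketch below evaluates the cylindrical-coordinate integrals numerically, doing the $\theta$- and $z$-integrations exactly and the $r$-integration by the midpoint rule. It also computes the $z$-moment; carrying out that integral by hand gives $\iiint_S z\,dV = 64\pi/3$, so the centroid is $(0, 0, 8/3)$ (our own computation).

```python
import math

def moments(n=2000):
    """Volume and z-moment of S = {x^2 + y^2 <= z <= 4} in cylindrical coordinates.
    The theta-integral gives 2*pi and the z-integral is done exactly,
    leaving a one-dimensional midpoint rule in r over [0, 2]."""
    dr = 2.0 / n
    vol = mz = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        vol += (4.0 - r * r) * r            # integral of r dz over r^2 <= z <= 4
        mz += 0.5 * (16.0 - r ** 4) * r     # integral of z r dz over the same range
    return 2 * math.pi * vol * dr, 2 * math.pi * mz * dr

V, Mz = moments()
zbar = Mz / V            # the x- and y-coordinates of the centroid vanish by symmetry
print(V / math.pi, zbar) # approximately 8 and 8/3
```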
Example 2. Find the volume of the "ice cream cone" $T$ bounded below by
the cone $z = 2\sqrt{x^2 + y^2}$ and above by the sphere $x^2 + y^2 + z^2 = 1$. (See
Figure 4.10.)
Solution. In spherical coordinates $(r, \varphi, \theta)$, the equation of the cone is
$\tan\varphi = \frac12$ and the equation of the sphere is $r = 1$. Hence the volume is
$$\int_0^1 \int_0^{\tan^{-1}(1/2)} \int_0^{2\pi} r^2\sin\varphi\,d\theta\,d\varphi\,dr = (2\pi)\big[-\cos\varphi\big]_0^{\tan^{-1}(1/2)}\Big[\frac{r^3}{3}\Big]_0^1 = \frac{2\pi}{3}\Big(1 - \frac{2}{\sqrt5}\Big).$$
This can also be done in cylindrical coordinates $(r, \theta, z)$ (note that the meaning
of $r$ has changed here), in which the equation of the cone is $z = 2r$ and the
equation of the sphere is $r^2 + z^2 = 1$. The projection of $T$ onto the $xy$-plane
is the disc $r \le 1/\sqrt5$, so the volume is
$$\int_0^{1/\sqrt5} \int_{2r}^{\sqrt{1 - r^2}} \int_0^{2\pi} r\,d\theta\,dz\,dr = \frac{2\pi}{3}\Big(1 - \frac{2}{\sqrt5}\Big).$$
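Both set-ups of Example 2 can be checked numerically; each reduces to a one-dimensional midpoint rule after the exact inner integrations. Evaluating either iterated integral by hand gives $\frac{2\pi}{3}(1 - 2/\sqrt5) \approx 0.2211$, which is the reference value used below (the grid size is an arbitrary choice).

```python
import math

alpha = math.atan(0.5)          # the cone z = 2*sqrt(x^2 + y^2) is phi = arctan(1/2)
exact = (2 * math.pi / 3) * (1 - 2 / math.sqrt(5))

def spherical_volume(n=2000):
    """Integral of r^2 sin(phi) over 0<=r<=1, 0<=phi<=alpha, 0<=theta<=2pi.
    The r- and theta-integrals give 1/3 and 2*pi; phi uses the midpoint rule."""
    dphi = alpha / n
    return (2 * math.pi / 3) * dphi * sum(math.sin((k + 0.5) * dphi) for k in range(n))

def cylindrical_volume(n=2000):
    """Integral of r over 0<=r<=1/sqrt(5), 2r<=z<=sqrt(1-r^2), 0<=theta<=2pi.
    The z- and theta-integrals are done exactly; r uses the midpoint rule."""
    R = 1 / math.sqrt(5)
    dr = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        total += (math.sqrt(1 - r * r) - 2 * r) * r
    return 2 * math.pi * dr * total

sph, cyl = spherical_volume(), cylindrical_volume()
print(sph, cyl, exact)
```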
2u + v \ ( v — u
xydA = - dv du,
p 'o Jo
2 o
3 *
-I 2
where the columns of the 2x2 matrix are the vectors from the origin to the
two adjacent vertices. Taking this transformation as G in Theorem 4.37 yields
xy dA = 2 2t)(~\s + 2t)dtds.
/o Jo
This integral is essentially the same as the preceding one; the variables (s,t)
and (u, v) are related by u = s, v = 6t.
Example 4. Let $R$ be the region in the first quadrant of the $xy$-plane bounded
by the $x$-axis and the parabolas $x = 1 - \frac14 y^2$, $x = \frac14 y^2 - 1$, and $x = 4 - \frac1{16} y^2$.
(See Figure 4.11.) What is $\iint_R xy\,dx\,dy$?
Solution. Refer back to Example 3 in §3.4: The region $R$ is the image of
the rectangle $\{(u, v) : 1 \le u \le 2,\ 0 \le v \le 1\}$ under the map $G(u, v) =
(u^2 - v^2, 2uv)$. We have $DG(u, v) = \begin{pmatrix} 2u & -2v \\ 2v & 2u \end{pmatrix}$ and hence $\det DG(u, v) = 4(u^2 + v^2)$.
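Pulling the integral back to the rectangle gives $\int_0^1\int_1^2 (u^2 - v^2)(2uv)\cdot 4(u^2 + v^2)\,du\,dv = 8\int_0^1\int_1^2 (u^5 v - u v^5)\,du\,dv = 8\big(\tfrac{21}{4} - \tfrac14\big) = 40$ (our own evaluation). The sketch below confirms this with a midpoint rule on the $(u, v)$ rectangle; the grid size is an arbitrary choice.

```python
def example4(n=300):
    """Midpoint rule over 1 <= u <= 2, 0 <= v <= 1 for x*y*|det DG|,
    where (x, y) = G(u, v) = (u^2 - v^2, 2uv)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = 1.0 + (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            x, y = u * u - v * v, 2 * u * v
            total += x * y * 4 * (u * u + v * v)   # det DG(u, v) = 4(u^2 + v^2) > 0
    return total * h * h

value = example4()
print(value)    # approximately 40
```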
EXERCISES
1. Find the area of the region inside the cardioid $r = 1 + \cos\theta$ (polar coordinates).
2. Find the centroid of the half-cone $\sqrt{x^2 + y^2} \le z \le 1$, $x \ge 0$.
3. Find the volume of the region inside both the sphere $x^2 + y^2 + z^2 = 4$ and the
cylinder $x^2 + y^2 = 1$.
4. Find the volume of the region above the $xy$-plane, below the cone $z = 2 - \sqrt{x^2 + y^2}$, and inside the cylinder $(x - 1)^2 + y^2 = 1$.
5. Find the mass of a right circular cylinder of base radius R and height h if the
mass density is c times the distance from the bottom of the cylinder.
6. Find the volume of the portion of the sphere $x^2 + y^2 + z^2 = 4$ lying above the
plane $z = 1$.
7. Find the mass of a ball of radius R if the mass density is c times the distance
from the boundary of the ball.
8. Find the centroid of the portion of the ball $x^2 + y^2 + z^2 \le 1$ lying in the first
octant ($x, y, z \ge 0$).
9. Find the centroid of the parallelogram bounded by the lines $x - 3y = 0$, $2x + y = 0$, $x - 3y = 10$, and $2x + y = 15$.
10. Calculate $\iint_S (x + y)^4 (x - y)^{-5}\,dA$ where $S$ is the square $-1 \le x + y \le 1$,
$1 \le x - y \le 3$.
11. Find the volume of the ellipsoid $(x + 2y)^2 + (x - 2y + z)^2 + 3z^2 = 1$.
12. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 4$,
and the lines $y = x$, $y = 4x$. Find the area and the centroid of $S$ by using the
transformation $u = xy$, $v = y/x$.
13. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 3$,
$x^2 - y^2 = 1$, and $x^2 - y^2 = 4$. Compute $\iint_S (x^2 + y^2)\,dA$. (Hint: Let
$G(x, y) = (xy, x^2 - y^2)$. What is $|\det DG|$?)
$(\mathbf{y} \in S)$;
is it true that
$$\lim_{\mathbf{x} \to \mathbf{a}} F(\mathbf{x}) = \int\cdots\int_S g(\mathbf{y})\,d^n\mathbf{y}?$$
In other words, can one interchange the operations of integrating with respect to y
and taking a limit with respect to x? Is the limit of the integral equal to the integral
of the limit? In general, the answer is no.
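Example 1. Let
$$f(x, y) = \frac{x^2 y}{(x^2 + y^2)^2} \quad \big((x, y) \neq (0, 0)\big), \qquad f(0, 0) = 0.$$
Evidently $\lim_{x \to 0} f(x, y) = 0$ for each $y$ (although for different reasons when
$y = 0$ or when $y \neq 0$). However, $\lim_{x \to 0} \int_0^1 f(x, y)\,dy \neq 0$; in fact, for $x \neq 0$,
$$\int_0^1 \frac{x^2 y}{(x^2 + y^2)^2}\,dy = \left[-\frac{x^2}{2(x^2 + y^2)}\right]_0^1 = \frac12 - \frac{x^2}{2(x^2 + 1)} = \frac{1}{2(x^2 + 1)} \to \frac12 \quad \text{as } x \to 0.$$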
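Example 1 is easy to see numerically. In this sketch (midpoint rule, grid size chosen by us), $f(x, 0.5)$ shrinks to 0 as $x \to 0$, while the computed integrals over $0 \le y \le 1$ stay close to $1/(2(1 + x^2))$, which tends to $\frac12$ rather than 0.

```python
def f(x, y):
    """f(x, y) = x^2 y / (x^2 + y^2)^2 away from the origin, f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y) ** 2

def F(x, n=100_000):
    """Midpoint-rule approximation of the integral of f(x, y) over 0 <= y <= 1."""
    h = 1.0 / n
    return h * sum(f(x, (k + 0.5) * h) for k in range(n))

pointwise = [f(x, 0.5) for x in (1e-2, 1e-4, 1e-6)]   # tends to 0
integrals = [F(x) for x in (1e-1, 1e-2)]              # close to 1/(2(1 + x^2))
print(pointwise)
print(integrals)
```

For small $x$ the integrand is a narrow spike of height about $1/x$ near $y \approx x$; its mass does not disappear in the limit, which is why the limit and the integral cannot be interchanged.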
4.46 Theorem. Suppose $S$ and $T$ are compact subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively,
and $S$ is measurable. If $f(\mathbf{x}, \mathbf{y})$ is continuous on the set $T \times S = \{(\mathbf{x}, \mathbf{y}) : \mathbf{x} \in
T,\ \mathbf{y} \in S\}$, then the function $F$ defined by (4.45) is continuous on $T$.
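Proof. Given $\epsilon > 0$, we wish to find $\delta > 0$ so that $|F(\mathbf{x}) - F(\mathbf{x}')| < \epsilon$ whenever
$|\mathbf{x} - \mathbf{x}'| < \delta$. Let $|S|$ denote the $n$-dimensional volume of $S$. Since $T \times S$ is
compact, $f$ is uniformly continuous on it by Theorem 1.33, so there is a $\delta > 0$ so
that $|f(\mathbf{x}, \mathbf{y}) - f(\mathbf{x}', \mathbf{y})| < \epsilon/|S|$ whenever $\mathbf{y} \in S$, $\mathbf{x}, \mathbf{x}' \in T$, and $|\mathbf{x} - \mathbf{x}'| < \delta$.
But then
$$|F(\mathbf{x}) - F(\mathbf{x}')| \le \int\cdots\int_S |f(\mathbf{x}, \mathbf{y}) - f(\mathbf{x}', \mathbf{y})|\,d^n\mathbf{y} \le \frac{\epsilon}{|S|}\,|S| = \epsilon. \quad □$$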
Proof. Given a point $\mathbf{x}_0 \in T$, choose $r > 0$ small enough so that $\mathbf{x} \in T$ whenever
$|\mathbf{x} - \mathbf{x}_0| < 2r$. We shall show that $F$ is of class $C^1$ on $B(r, \mathbf{x}_0)$ and prove (4.48)
for $\mathbf{x} \in B(r, \mathbf{x}_0)$; since $\mathbf{x}_0$ is an arbitrary point in $T$, this will establish the theorem.
For the purpose of computing $\partial_{x_j} F$, the other variables $x_k$ ($k \neq j$) play no role, so
we may assume that $m = 1$. In fact, in order to simplify the notation a bit, we shall
also assume that $n = 1$; the proof for general $n$ is exactly the same. Accordingly,
we write $x$ and $y$ instead of $\mathbf{x}$ and $\mathbf{y}$ henceforth.
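For $0 < |h| < r$ and $|x - x_0| < r$, we consider the difference quotient. By the mean value theorem, for each $y$ we have $f(x + h, y) - f(x, y) = h\,\partial_x f(x + th, y)$ for some $t \in (0, 1)$ depending on $y$, so
$$\frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x, y)\,dy = \int_S \big[\partial_x f(x + th, y) - \partial_x f(x, y)\big]\,dy. \qquad (4.49)$$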
The argument now proceeds as in the proof of Theorem 4.46. Since $\partial_x f$ is continuous on the compact set $B(r, x_0) \times S$, it is uniformly continuous there by Theorem
1.33. Thus, given $\epsilon > 0$, we can find $\delta > 0$ so that the integrand on the right of
(4.49) is less than $\epsilon/|S|$ in absolute value for all $y \in S$, $x \in B(r, x_0)$, and $t \in (0, 1)$, whenever
$|h| < \delta$. It follows that
$$\left|\frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x, y)\,dy\right| \le \int_S \frac{\epsilon}{|S|}\,dy = \epsilon \quad \text{for } |h| < \delta,$$
and hence that
$$F'(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = \int_S \partial_x f(x, y)\,dy,$$
as claimed. □
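Theorem 4.47 can be illustrated with a sample integrand, here the hypothetical choice $f(x, y) = e^{-xy^2}$ on $0 \le y \le 1$, which is continuous together with $\partial_x f$. The sketch compares a symmetric difference quotient of $F$ with the integral of $\partial_x f$; both use the same midpoint rule, so they should agree to roughly $10^{-6}$ or better.

```python
import math

def integrate(g, a, b, n=20_000):
    """Composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f = lambda x, y: math.exp(-x * y * y)             # sample integrand (assumption)
fx = lambda x, y: -y * y * math.exp(-x * y * y)   # its partial derivative in x

F = lambda x: integrate(lambda y: f(x, y), 0.0, 1.0)

x0, dx = 0.8, 1e-4
difference_quotient = (F(x0 + dx) - F(x0 - dx)) / (2 * dx)
derivative_under_sign = integrate(lambda y: fx(x0, y), 0.0, 1.0)
print(difference_quotient, derivative_under_sign)
```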
Situations often occur in which the variable $x$ occurs in the limits of integration
as well as the integrand. For simplicity we consider the case where $x$ and $y$ are
scalar variables:
$$F(x) = \int_a^{\varphi(x)} f(x, y)\,dy. \qquad (4.50)$$
We suppose that $f$ is continuous in $x$ and $y$ and of class $C^1$ in $x$ for each $y$, and that
$\varphi$ is of class $C^1$. If $f$ does not depend on $x$, the derivative of $F$ can be computed
by the fundamental theorem of calculus together with the chain rule:
$$\frac{d}{dx} \int_a^{\varphi(x)} f(y)\,dy = f(\varphi(x))\,\varphi'(x).$$
For the more general case (4.50), we can differentiate $F$ by combining this result
with Theorem 4.47 according to the recipe in Exercise 7 of §2.3: Differentiate with
respect to each $x$ in (4.50) in turn while treating the others as constants, and add
the results. The upshot is that
$$F'(x) = f(x, \varphi(x))\,\varphi'(x) + \int_a^{\varphi(x)} \partial_x f(x, y)\,dy.$$
Then
$$h'(x) = (x - x)g(x) + \int_0^x g(t)\,dt = \int_0^x g(t)\,dt,$$
and hence $h''(x) = g(x)$. (Cf. Exercise 7 in §4.3, where this result is approached from a different angle.)
The hypotheses of Theorems 4.46 and 4.47 can be weakened considerably, but
only at the cost of a more intricate proof. More sophisticated theories of integra
tion (see §4.8) furnish a powerful theorem, the so-called dominated convergence
theorem, that generally provides the sharpest results in these situations. The full
statement of this theorem requires more background than we have available here,
but its restriction to the context of Riemann integrable functions is the following
result, in which the crucial condition is the existence of the uniform bound C.
$$\lim_{j \to \infty} \int\cdots\int_S f_j(\mathbf{y})\,d^n\mathbf{y} = \int\cdots\int_S f(\mathbf{y})\,d^n\mathbf{y}.$$
An elementary (but not simple) proof for the case where $S$ is an interval in $\mathbb{R}$
can be found in Lewin [17]. The full dominated convergence theorem can be found
in Bear [3, p. 68], DePree and Swartz [5, p. 194], Jones [9, p. 133], and Rudin [18,
p. 321].
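The role of the uniform bound $C$ in Theorem 4.52 shows up in two simple families on $[0, 1]$: $y^j$, which is bounded by 1 and tends to 0 for $y < 1$, and hypothetical spike functions equal to $j$ on $(0, 1/j)$ and 0 elsewhere, which tend to 0 at every point but admit no uniform bound. The sketch integrates both with a midpoint rule (grid size chosen by us).

```python
def integrate01(g, n=20_000):
    """Composite midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

js = (1, 10, 100)
# |y^j| <= 1: integrals 1/(j+1) tend to 0, the integral of the limit function.
bounded = [integrate01(lambda y, j=j: y ** j) for j in js]
# Spikes: pointwise limit 0, but each integral equals 1, so the conclusion fails.
spikes = [integrate01(lambda y, j=j: float(j) if 0.0 < y < 1.0 / j else 0.0) for j in js]
print(bounded)
print(spikes)
```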
Theorem 4.52 implies the following improvements on Theorems 4.46 and 4.47.
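4.53 Corollary. Let $S$ be a measurable subset of $\mathbb{R}^n$ and $T$ a subset of $\mathbb{R}^m$. Suppose $f(\mathbf{x}, \mathbf{y})$ is a function on $T \times S$ that is integrable as a function of $\mathbf{y} \in S$ for
each $\mathbf{x} \in T$, and let $F$ be defined by (4.45).
a. If $f(\mathbf{x}, \mathbf{y})$ is continuous as a function of $\mathbf{x} \in T$ for each $\mathbf{y} \in S$, and there is
a constant $C$ such that $|f(\mathbf{x}, \mathbf{y})| \le C$ for all $\mathbf{x} \in T$ and $\mathbf{y} \in S$, then $F$ is
continuous on $T$.
b. Suppose $T$ is open. If $f(\mathbf{x}, \mathbf{y})$ is of class $C^1$ as a function of $\mathbf{x} \in T$ for each
$\mathbf{y} \in S$, and there is a constant $C$ such that $|\nabla_{\mathbf{x}} f(\mathbf{x}, \mathbf{y})| \le C$ for all $\mathbf{x} \in T$ and
$\mathbf{y} \in S$, then $F$ is of class $C^1$ on $T$ and (4.48) holds.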
Proof. To prove part (a), by Theorem 1.15 it is enough to show that $F(\mathbf{x}_j) \to F(\mathbf{x})$
whenever $\{\mathbf{x}_j\}$ is a sequence in $T$ converging to $\mathbf{x} \in T$. This follows by applying
the bounded convergence theorem to the sequence of functions $f_j(\mathbf{y}) = f(\mathbf{x}_j, \mathbf{y})$.
Similarly, part (b) is proved by applying the bounded convergence theorem to the
sequence of difference quotients with increments $h_j$, where $\{h_j\}$ is a sequence
tending to zero along one of the coordinate axes. The uniform bound on these quotients is obtained by applying the mean value theorem as in the proof of Theorem
4.47; details are left as Exercise 8. □
EXERCISES
\x -
f[n](x) = ^yy J\
Show that $(f^{[n]})' = f^{[n-1]}$ for $n > 1$ and conclude that $f^{[n]}$ is an $n$th-order
antiderivative of $f$.
7. Let $f$ be any continuous function on $[0,1]$. For $x \in \mathbb{R}$ and $t > 0$, let
f 9 f f (XI I
u{x,t)=t~1/2 e & yy*uf(y)dy, v(x,t)=t -, ,2 2 dy.
I. $\int_a^\infty f(x)\,dx$, where $f$ is integrable over every finite subinterval $[a, b]$.
II. $\int_a^b f(x)\,dx$, where $f$ is integrable over $[c, b]$ for every $c > a$ but is unbounded
near $x = a$.
We study these two types in turn and then consider integrals of more complicated
sorts that can be obtained by combining them.
$$\int_a^\infty f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx.$$
More precisely, the integral $\int_a^\infty f(x)\,dx$ is said to converge if the limit on the right
exists, in which case its value is defined to be that limit; otherwise the integral is
said to diverge, and it is not assigned a numerical value. (However, we may say
that $\int_a^\infty f(x)\,dx = \infty$ if $\int_a^b f(x)\,dx$ grows without bound as $b \to \infty$.)
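Example 1.
a. $\int_0^\infty e^{-x}\,dx = \lim_{b \to \infty} \int_0^b e^{-x}\,dx = \lim_{b \to \infty} \big[-e^{-x}\big]_0^b = 1$, since $\lim_{b \to \infty} e^{-b} = 0$.
b. $\int_0^\infty \cos x\,dx$ diverges, since $\lim_{b \to \infty} \sin b$ does not exist.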
Our main concern here is not with the evaluation of $\int_a^\infty f(x)\,dx$ but with the
more basic question of whether or not it converges. At the outset, we make one
simple but useful remark: If $c > a$, the convergence of $\int_a^\infty f(x)\,dx$ is equivalent to
the convergence of $\int_c^\infty f(x)\,dx$, the difference between the two being the ordinary
integral $\int_a^c f(x)\,dx$. Thus, the convergence of $\int_a^\infty f(x)\,dx$ depends only on the
behavior of $f(x)$ as $x \to \infty$, not on its behavior on a finite interval $[a, c]$.
We first consider the situation when $f \ge 0$. In this case, the integral $\int_a^b f(x)\,dx$
increases along with the upper endpoint $b$, so we can exploit the following variant
of the monotone sequence theorem.
4.54 Lemma. If $\varphi$ is a bounded increasing function on $[a, \infty)$, then $\lim_{x \to \infty} \varphi(x)$
exists and equals $\sup\{\varphi(x) : x \ge a\}$.
Proof. The proof is left to the reader (Exercise 7); it is essentially identical to the
proof of the monotone sequence theorem (1.16). □
By applying Lemma 4.54 to the function $\varphi(x) = \int_a^x f(t)\,dt$, we see that the
integral $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^b f(x)\,dx$ remains bounded as $b \to \infty$. This immediately leads to the basic comparison test for convergence.
4.6. Improper Integrals 195
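4.55 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all sufficiently large $x$. If
$\int_a^\infty g(x)\,dx$ converges, so does $\int_a^\infty f(x)\,dx$. If $\int_a^\infty f(x)\,dx$ diverges, so does
$\int_a^\infty g(x)\,dx$.
Proof. Suppose $0 \le f(x) \le g(x)$ for $x \ge c$. Then for every $b > c$,
$$\int_c^b f(x)\,dx \le \int_c^b g(x)\,dx \le \int_c^\infty g(x)\,dx.$$
The convergence of $\int_c^\infty f(x)\,dx$, and hence of $\int_a^\infty f(x)\,dx$, then follows from Lemma 4.54. The second assertion is equivalent to the first one. □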
4.56 Corollary. Suppose $f \ge 0$, $g > 0$, and $f(x)/g(x) \to l$ as $x \to \infty$. If $0 <
l < \infty$, then $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$ are both convergent or both divergent.
If $l = 0$, the convergence of $\int_a^\infty g(x)\,dx$ implies the convergence of $\int_a^\infty f(x)\,dx$. If
$l = \infty$, the divergence of $\int_a^\infty g(x)\,dx$ implies the divergence of $\int_a^\infty f(x)\,dx$.
Proof. If $0 < l < \infty$, the fact that $f(x)/g(x) \to l$ yields the estimates $f(x) \le
2l\,g(x)$ and $f(x) \ge \frac12 l\,g(x)$ for sufficiently large $x$, so the first assertion follows by
comparing $f$ to a multiple of $g$. If $l = 0$ (resp. $l = \infty$), we have $f(x) \le g(x)$ (resp.
$g(x) \le f(x)$) for sufficiently large $x$, whence the other assertions follow. □
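The functions most often used for comparison in Theorem 4.55 and Corollary
4.56 are the power functions $x^{-p}$. Taking $a = 1$ for convenience, for $p \neq 1$ we
have
$$\int_1^b \frac{dx}{x^p} = \frac{b^{1-p} - 1}{1 - p} \to \begin{cases} \infty & \text{if } p < 1, \\ \dfrac{1}{p - 1} & \text{if } p > 1, \end{cases}$$
and $\int_1^b x^{-1}\,dx = \log b \to \infty$. In short, $\int_1^\infty x^{-p}\,dx$ converges if and only if $p > 1$.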
Combining this fact with Theorem 4.55, we obtain the following handy rule:
4.57 Corollary. If $0 \le f(x) \le Cx^{-p}$ for all sufficiently large $x$, where $p > 1$, then
$\int_a^\infty f(x)\,dx$ converges. If $f(x) \ge cx^{-1}$ ($c > 0$) for all sufficiently large $x$, then
$\int_a^\infty f(x)\,dx$ diverges.
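The dichotomy for $x^{-p}$ can be seen concretely. This sketch evaluates the closed form for $\int_1^b x^{-p}\,dx$, cross-checks it against a midpoint rule on $[1, 100]$, and prints its values for growing $b$; the exponents and endpoints are arbitrary samples.

```python
import math

def power_integral(p, b):
    """Exact value of the integral of x**(-p) over [1, b], for b > 1."""
    if p == 1:
        return math.log(b)
    return (b ** (1 - p) - 1) / (1 - p)

def midpoint(func, a, b, n=50_000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

# Cross-check the closed form on [1, 100] for a few exponents.
for p in (0.5, 1.0, 2.0):
    assert abs(midpoint(lambda x: x ** -p, 1.0, 100.0) - power_integral(p, 100.0)) < 1e-4

# Unbounded growth for p <= 1; approach to 1/(p - 1) = 1 for p = 2.
for p in (0.5, 1.0, 2.0):
    print(p, [power_integral(p, b) for b in (1e2, 1e4, 1e6)])
```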
$$\frac{2x + 14}{x^3 + 1} < \frac{4x}{x^3} = \frac{4}{x^2} \quad \text{for } x > 7.$$
$$\frac{2x + 14}{x^3 + 1} \bigg/ \frac{1}{x^2} \to 2 \quad \text{as } x \to \infty$$
and use Corollary 4.56 with $g(x) = x^{-2}$ to establish the convergence of the
integral over, say, $[1, \infty)$. (The integral over $[0, 1]$ is proper.) Note that we are
not comparing $\int_0^\infty [(2x + 14)/(x^3 + 1)]\,dx$ to $\int_0^\infty x^{-2}\,dx$, which presents an
additional difficulty because $x^{-2}$ is unbounded at $x = 0$; the comparison of
$(2x + 14)/(x^3 + 1)$ with $x^{-2}$ is significant only for large $x$.
It should be noted that the power functions $x^{-p}$ do not quite tell the whole story.
There are functions whose rate of decay at infinity is faster than $x^{-1}$ but slower
than $x^{-p}$ for every $p > 1$, and their integrals may be either convergent or divergent; see
Exercises 4 and 5.
Next we remove the assumption that $f$ is nonnegative, and with a view toward
future applications, we shall allow $f$ to be complex-valued. The question of convergence can often be reduced to the case where $f \ge 0$ via the following result.
The integral $\int_a^\infty f(x)\,dx$ is called absolutely convergent if $\int_a^\infty |f(x)|\,dx$ converges. Theorem 4.55 and its corollaries can be used to test for absolute convergence, by applying them to $|f|$. It is possible, however, for $\int_a^\infty f(x)\,dx$ to converge
even when $\int_a^\infty |f(x)|\,dx$ diverges because of cancellation effects between positive
and negative values. Here is an important example.
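The integral $\int_1^\infty \frac{\sin x}{x}\,dx$ converges, but it is not absolutely convergent (Exercise 8). To see that it converges, we integrate by parts:
$$\int_1^b \frac{\sin x}{x}\,dx = \left[\frac{-\cos x}{x}\right]_1^b - \int_1^b \frac{\cos x}{x^2}\,dx.$$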
Now, $\int_1^\infty |x^{-2}\cos x|\,dx$ converges by Corollary 4.57 since $|x^{-2}\cos x| \le x^{-2}$,
so the integral on the right approaches a finite limit as $b \to \infty$; moreover, since
$|b^{-1}\cos b| \le b^{-1} \to 0$, so does the other term. Hence $\lim_{b \to \infty} \int_1^b x^{-1}\sin x\,dx$
exists, as claimed.
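The convergence of $\int x^{-1}\sin x\,dx$ is slow but visible numerically. The sketch integrates from 0 (the integrand extends continuously to $x = 0$, so this does not affect convergence) up to $b = 200.5\pi$, where $\cos b = 0$ and the boundary term from the integration by parts vanishes. The limit $\int_0^\infty x^{-1}\sin x\,dx = \pi/2$ is a known value not derived in this section. The sketch also shows $\int_0^b x^{-1}|\sin x|\,dx$ growing roughly like $(2/\pi)\log b$, in line with Exercise 8. Step size and endpoints are our own choices.

```python
import math

def midpoint(g, a, b, step=0.01):
    """Composite midpoint rule with step close to `step`."""
    n = max(1, round((b - a) / step))
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

sinc = lambda x: math.sin(x) / x          # midpoints never hit x = 0
b = 200.5 * math.pi                       # cos(b) = 0, so the tail is O(1/b^2)
approx = midpoint(sinc, 0.0, b)

abs_sinc = lambda x: abs(math.sin(x)) / x
abs_small = midpoint(abs_sinc, 0.0, 10 * math.pi)
abs_large = midpoint(abs_sinc, 0.0, 400 * math.pi)
print(approx, math.pi / 2)
print(abs_large - abs_small, (2 / math.pi) * math.log(40))   # similar sizes
```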
Improper Integrals of Type II. In this subsection, all functions in question are
assumed to be defined on $(a, b]$ and integrable on $[c, b]$ for every $c > a$.
The definition of the improper integral in this situation is
$$\int_a^b f(x)\,dx = \lim_{c \to a+} \int_c^b f(x)\,dx.$$
4.59 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all $x$ sufficiently close to
$a$. If $\int_a^b g(x)\,dx$ converges, so does $\int_a^b f(x)\,dx$. If $\int_a^b f(x)\,dx$ diverges, so does
$\int_a^b g(x)\,dx$.
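The functions most often used for comparison in this situation are the power
functions $(x - a)^{-p}$, but now the condition for convergence is $p < 1$ rather than
$p > 1$. Indeed, for $p \neq 1$,
$$\int_c^b (x - a)^{-p}\,dx = \frac{(b - a)^{1-p} - (c - a)^{1-p}}{1 - p} \to \begin{cases} \dfrac{(b - a)^{1-p}}{1 - p} & \text{if } p < 1, \\ \infty & \text{if } p > 1, \end{cases} \quad \text{as } c \to a+,$$
and $\int_c^b (x - a)^{-1}\,dx = \log(x - a)\big|_c^b \to \infty$. Hence the analogue of Corollary 4.57
is as follows: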
4.60 Corollary. If $0 \le f(x) \le C(x - a)^{-p}$ for $x$ near $a$, where $p < 1$, then
$\int_a^b f(x)\,dx$ converges. If $f(x) \ge c(x - a)^{-1}$ ($c > 0$) for $x$ near $a$, then $\int_a^b f(x)\,dx$
diverges.
Theorem 4.58 also remains valid in this situation; that is, absolute convergence
implies convergence.
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{a \to -\infty} \int_a^0 f(x)\,dx + \lim_{b \to \infty} \int_0^b f(x)\,dx.$$
The integral on the left converges only when both of the limits on the right exist
independently of one another; there is no relation between the variables $a$ and $b$.
The same ideas apply to $\int_a^\infty f(x)\,dx$ when $f$ is unbounded at $a$ or to $\int_a^b f(x)\,dx$
when $f$ is unbounded at both $a$ and $b$.
$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \lim_{a \to -\infty,\ b \to \infty} \arctan x\,\Big|_a^b = \frac{\pi}{2} - \Big(-\frac{\pi}{2}\Big) = \pi.$$
Example 9. Let $f(x) = (x^3 - 8x^2)^{-1/3}$, and let us consider $\int_0^9 f(x)\,dx$ and
$\int_0^\infty f(x)\,dx$. The singularities of $f$ occur at $x = 0$ and $x = 8$, so for the first
integral we write
$$\int_0^9 = \int_0^c + \int_c^8 + \int_8^9 \qquad (0 < c < 8).$$
We have $|f(x)| = x^{-2/3}|x - 8|^{-1/3}$, which is approximately $\frac12 x^{-2/3}$ for $x$
near 0 and approximately $\frac14 |x - 8|^{-1/3}$ for $x$ near 8. Hence all three subintegrals are absolutely convergent by Corollary 4.60, and the original integral $\int_0^9$
converges. On the other hand, $f(x)$ is positive for $x > 8$ and $f(x)/x^{-1} =
(1 - 8x^{-1})^{-1/3} \to 1$ as $x \to \infty$, so $\int_9^\infty f(x)\,dx$ diverges by Corollary 4.56. It
follows that $\int_0^\infty f(x)\,dx$ diverges too.
The definition of the improper integral $\int_a^b f(x)\,dx$ given above when $f$ has
a singularity in the interior of $[a, b]$ is a little too restrictive for some purposes.
Consider, for example, $\int_{-1}^1 x^{-1}\,dx$. According to our definition, this integral is to
be considered as the limit of
$$\int_{-1}^{-\delta} \frac{dx}{x} + \int_\epsilon^1 \frac{dx}{x} = \log\delta - \log\epsilon = \log\frac{\delta}{\epsilon} \qquad (4.61)$$
as $\delta$ and $\epsilon$ decrease to 0, and this limit does not exist: When $\delta$ and $\epsilon$ are extremely
small, their ratio can be arbitrarily large or arbitrarily small. However, since $x^{-1}$ is
an odd function, it seems natural to interpret the value of the integral as 0; the negative infinity of $\int_{-1}^0 x^{-1}\,dx$ should exactly cancel the positive infinity of $\int_0^1 x^{-1}\,dx$.
We can achieve this result by modifying (4.61) so as to preserve the symmetry of
the situation, namely, by taking $\delta = \epsilon$, so that $\log(\delta/\epsilon) = 0$.
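These considerations lead to the following definition. Suppose $a < c < b$, and
suppose $f$ is integrable on $[a, c - \epsilon]$ and on $[c + \epsilon, b]$ for all $\epsilon > 0$. The (Cauchy)
principal value of the integral $\int_a^b f(x)\,dx$ is
$$\mathrm{PV}\int_a^b f(x)\,dx = \lim_{\epsilon \to 0+}\left(\int_a^{c - \epsilon} f(x)\,dx + \int_{c + \epsilon}^b f(x)\,dx\right),$$
provided that the limit exists. Of course, if $\int_a^b f(x)\,dx$ converges, its Cauchy principal value is its ordinary value.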
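For the general case, we write $\varphi(x) = \varphi(0) + [\varphi(x) - \varphi(0)]$, obtaining
$$\mathrm{PV}\int_a^b \frac{\varphi(x)}{x}\,dx = \varphi(0)\,\mathrm{PV}\int_a^b \frac{dx}{x} + \int_a^b \frac{\varphi(x) - \varphi(0)}{x}\,dx.$$
We have just seen that the first quantity on the right exists, and the second one is a
proper integral: The integrand is actually continuous on $[a, b]$ if we define its value
at $x = 0$ to be $\varphi'(0)$. □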
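Both the asymmetric limits in (4.61) and the subtraction of $\varphi(0)$ can be checked numerically. In this sketch (midpoint rule, truncation levels chosen by us), the symmetric truncation of $\int_{-1}^1 dx/x$ gives 0, while $\delta = 2\epsilon$ gives approximately $\log 2$. For the sample choice $\varphi(x) = e^x$, the principal value equals the proper integral $\int_{-1}^1 (e^x - 1)/x\,dx$, which we compare with its power series $2\sum_k 1/\big((2k+1)(2k+1)!\big)$.

```python
import math

def midpoint(func, a, b, n=50_000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def truncated(phi, eps, delta):
    """Integral of phi(x)/x over [-1, -delta] and [eps, 1], as in (4.61)."""
    return (midpoint(lambda x: phi(x) / x, -1.0, -delta)
            + midpoint(lambda x: phi(x) / x, eps, 1.0))

one = lambda x: 1.0
sym = truncated(one, 1e-3, 1e-3)      # delta = eps: the two pieces cancel
skew = truncated(one, 1e-3, 2e-3)     # delta = 2 eps: about log 2, not 0

# phi = exp: subtracting phi(0) = 1 leaves a proper integral (n even, so no midpoint at 0).
proper = midpoint(lambda x: (math.exp(x) - 1.0) / x, -1.0, 1.0)
series = 2 * sum(1 / ((2 * k + 1) * math.factorial(2 * k + 1)) for k in range(10))
near_pv = truncated(math.exp, 1e-3, 1e-3)   # differs from the limit by about 2 * eps
print(sym, skew, math.log(2))
print(proper, series, near_pv)
```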
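(Similarly, at infinity the principal value is $\mathrm{PV}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{R \to \infty} \int_{-R}^{R} f(x)\,dx$.)
For example, the integral $\int_{-\infty}^{\infty} x(1 + x^2)^{-1}\,dx$ is divergent because the integrand
is asymptotically equal to $x^{-1}$ as $x \to \pm\infty$, but its principal value is zero because
the integrand is odd.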
EXERCISES
c. $\int_0^\infty x^2 e^{-x}\,dx$.
d. / -z— —dx.
J'S x2 — x — 2
/OO 1
tan — dx.
x
a. / -L-^da.
'1 - x2
a. / £-3/4e-x dx.
Jo
1
b.
o
r
ex _
do;
d.
0
oo
6. Let $f(x) = 1$ on the intervals $[1, 1\frac12]$, $[2, 2\frac14]$, $[3, 3\frac18]$, ..., and $f(x) = 0$ elsewhere.
a. Show that $\int_0^\infty f(x)\,dx$ converges (and is equal to 1) although $f(x) \not\to 0$ as
$x \to \infty$.
8. Prove that $\int_1^\infty x^{-1}|\sin x|\,dx$ diverges. (Hint: Show that there is a constant
$c > 0$ such that $\int_{n\pi}^{(n+1)\pi} x^{-1}|\sin x|\,dx \ge c\int_{n\pi}^{(n+1)\pi} x^{-1}\,dx$ for all $n \ge 1$.)
where the $S_r$'s are a family of measurable sets that fill out $\mathbb{R}^2$ as $r \to \infty$. For
instance, we could take Sr to be the disc of radius r about the origin, or the square
of side length r centered at the origin, or the rectangle of side lengths r and r2
centered at the origin, or the disc of radius r centered at (15, —37), and so on. The
difficulty is evident: There is a bewildering array of possibilities, with no rationale
for choosing one over another and no guarantee that different families Sr will yield
the same limit.
Evidently there is some work to be done, and we shall not give all the details
here. The outcome, in a nutshell, is that everything goes well when the integrand is
nonnegative or when the integral is absolutely convergent, but not otherwise.
4.7. Improper Multiple Integrals 203
With $S$, $f$, and $U_j$ as in (4.63), the integrals $\int\cdots\int_{U_j} f\,dV_n$ exist for all $j$, and
they increase along with $j$ since the sets $U_j$ do. It therefore follows from the monotone sequence theorem that the limit
$$\lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV_n$$
always exists, provided that we allow $+\infty$ as a value, and this limit is an obvious
candidate for the value of the improper integral $\int\cdots\int_S f\,dV_n$.
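Here is the crucial point: Suppose that $\{U_j'\}$ is another sequence of sets satisfying the conditions of (4.63). Then the two limits
$$\lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV_n \quad \text{and} \quad \lim_{j \to \infty} \int\cdots\int_{U_j'} f\,dV_n$$
are equal. Therefore, it makes sense to define the integral of $f$ over $S$ by
$$\int\cdots\int_S f\,dV_n = \lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV_n. \qquad (4.64)$$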
The proof that the limit in (4.64) is independent of the choice of {Uj}, in full
generality, requires the Lebesgue theory of integration. We shall give a proof under
some additional restrictions on S and the Uj's, usually easy to satisfy in practice,
in Appendix B.6 (Theorem B.25).
It is also true that improper multiple integrals of nonnegative functions can be
evaluated as iterated improper integrals under suitable conditions on $S$ and $f$ so
that the latter integrals exist. For example,
We shall not attempt to state a general theorem to cover all the various cases (much
less give a precise proof), but we assure the reader that as long as the integrand is
nonnegative, there is almost never any difficulty.
The analogue of the comparison test, Theorem 4.55, is valid for multiple im
proper integrals, with the same proof. Again the basic comparison functions are
powers of |x|, but the critical exponent depends on the dimension.
4.65 Proposition. For $p > 0$, define $f_p$ on $\mathbb{R}^n \setminus \{0\}$ by $f_p(\mathbf{x}) = |\mathbf{x}|^{-p}$. The integral
of $f_p$ over a ball $\{\mathbf{x} : |\mathbf{x}| < a\}$ is finite if and only if $p < n$; the integral of $f_p$ over
the complement of a ball, $\{\mathbf{x} : |\mathbf{x}| > a\}$, is finite if and only if $p > n$.
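Proof. For $n = 2$, in polar coordinates we have
$$\iint_{\epsilon < |\mathbf{x}| < a} f_p\,dA = \int_\epsilon^a \int_0^{2\pi} r^{-p}\,r\,d\theta\,dr, \qquad \iint_{a < |\mathbf{x}| < b} f_p\,dA = \int_a^b \int_0^{2\pi} r^{-p}\,r\,d\theta\,dr.$$
As $\epsilon \to 0$ and $b \to \infty$ we obtain $2\pi\int_0^a r^{1-p}\,dr$ and $2\pi\int_a^\infty r^{1-p}\,dr$, which are
convergent when $p < 2$ and $p > 2$, respectively.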
The proof for general n is similar, using spherical coordinates and their ana
logues in higher dimensions. The reader is invited to work out the case n = 3 in
Exercise 1. □
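$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy.$$
On the one hand, we can take the approximating regions $U_j$ to be discs centered at
the origin and switch to polar coordinates:
$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy = \lim_{R \to \infty} \int_0^{2\pi}\int_0^R e^{-r^2}\,r\,dr\,d\theta = \lim_{R \to \infty} \pi\big(1 - e^{-R^2}\big) = \pi.$$
On the other hand, we can take the $U_j$ to be squares centered at the origin:
$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy = \lim_{R \to \infty} \int_{-R}^R\int_{-R}^R e^{-x^2} e^{-y^2}\,dx\,dy = \Big(\int_{-\infty}^\infty e^{-x^2}\,dx\Big)\Big(\int_{-\infty}^\infty e^{-y^2}\,dy\Big).$$
The two integrals in parentheses are equal, of course; the name of the variable of
integration is irrelevant. We have shown that
$$\Big(\int_{-\infty}^\infty e^{-x^2}\,dx\Big)^2 = \pi.$$
Since $e^{-x^2} > 0$, we can take the positive square root of both sides to obtain the
magic formula:
4.66 Proposition. $\displaystyle\int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}.$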
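Proposition 4.66 and the polar computation behind it can be confirmed numerically. The sketch truncates the line at $|x| = 8$ (an assumption justified by $e^{-64} \approx 1.6 \times 10^{-28}$) and uses the midpoint rule, which is extremely accurate for smooth, rapidly decaying integrands.

```python
import math

def midpoint(func, a, b, n=20_000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

L = 8.0   # truncation point (assumption): the neglected tails are below 1e-27
line = midpoint(lambda x: math.exp(-x * x), -L, L)                    # one-dimensional integral
disc = 2 * math.pi * midpoint(lambda r: math.exp(-r * r) * r, 0.0, L)  # polar: pi (1 - e^{-L^2})
print(line, math.sqrt(math.pi))
print(line ** 2, disc, math.pi)
```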
Now, what about functions that are not nonnegative? Let us suppose that $S$, $f$,
and $\{U_j\}$ are as in (4.63), but $f$ is merely assumed to be real-valued. The essential
point is that the preceding theory can be applied to $|f|$, so that it makes sense to say
that $\int\cdots\int_S |f|\,dV_n$ converges. If this condition holds, the argument used to prove
Theorem 4.58 shows that $\lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV_n$ exists and that
$$\lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV_n = \int\cdots\int_S f^+\,dV_n - \int\cdots\int_S f^-\,dV_n,$$
where $f^+(\mathbf{x}) = \max[f(\mathbf{x}), 0]$ and $f^-(\mathbf{x}) = \max[-f(\mathbf{x}), 0]$. The integrals on the
right converge by comparison to the integral of $|f|$, and they are independent of
the choice of $\{U_j\}$; hence, so is the limit on the left. In short, if $\int\cdots\int_S |f|\,dV_n$
converges, we may define the improper integral of $f$ over $S$ by formula (4.64); the
limit in question exists and is independent of the choice of approximating sequence
$\{U_j\}$.
The same result holds if $f$ is complex-valued; we simply consider its real and
imaginary parts separately.
In dimensions n > 1, however, there is no general theory of improper integrals
that are convergent but not absolutely convergent. Such integrals, when they arise,
must be defined by specific limiting procedures that are adapted to the situation at
hand.
EXERCISES
2. Determine whether the following improper integrals converge, and evaluate the
ones that do.
dV
/R3 -1- + x
ft
b.
(x2 + y2 + z2fl2
d. // xe~x v dA.
>x>0
2
4. Let $f(x, y) = (x^2 - y^2)(x^2 + y^2)^{-2}$, and let $S$ be the unit square $[0,1] \times [0,1]$.
a. Show that $\iint_S |f|\,dA = \infty$.
b. Show by explicit calculation that the iterated integrals $\int_0^1\int_0^1 f(x, y)\,dx\,dy$
and $\int_0^1\int_0^1 f(x, y)\,dy\,dx$ both exist and are unequal.
In several places in this book we allude to the fact that in advanced analysis, the
Riemann theory of integration that we have developed here is replaced by the more
sophisticated theory due to Lebesgue. Detailed accounts of the Lebesgue integral
can be found in Bear [3], Jones [9], and Rudin [18]. Here we shall content ourselves
with a brief informal description of how it works. (Note: There are several
ways to develop the Lebesgue theory of integration; in some treatments, the characterization
of Lebesgue measure and the Lebesgue integral that we give here are
theorems rather than definitions.) In a few places we need the notion of the sum of
an infinite series, for which the reader is referred to §6.1.
The starting point is a refined concept of $n$-dimensional measure, independent
of any theory of integration. To keep things on a concrete level, let us explain this
concept for the case $n = 2$.
In the Jordan theory of area, described in §4.2, we find the area of a set $S \subset \mathbb{R}^2$
by approximating $S$ from the inside and the outside by unions of rectangles. For
the Lebesgue notion of area, we use a two-step approximation process: We first
approximate $S$ from the inside by compact sets and from the outside by open sets,
then approximate the compact sets from the outside and the open sets from the
inside by unions of rectangles. More precisely, let us agree to call a set that is
the union of a finite collection of rectangles with disjoint interiors a tiled set. The
Lebesgue measure $m(S)$ of a set $S \subset \mathbb{R}^2$ is then defined as follows:
• If $T = \bigcup_{k=1}^{K} R_k$ is a tiled set, where the $R_k$'s are rectangles with disjoint
interiors, the Lebesgue measure $m(T)$ is the sum of the areas of the $R_k$'s.
208 Chapter 4. Integral Calculus
are equal, in which case their common value is the Lebesgue measure $m(S)$.
Note that there is no assumption that the sets in question are bounded (although
compact sets are bounded by definition); the Lebesgue theory applies equally well
to bounded and unbounded sets.
The notion of $n$-dimensional Lebesgue measure for sets in $\mathbb{R}^n$ is entirely similar;
only the terminology needs to be modified a little. Every set that one will ever
meet in "real life" (in particular, every open set, every closed set, every intersection
of countably many open sets, every union of countably many closed sets, and
so on) is Lebesgue measurable.³ Lebesgue measure has the following fundamental
additivity property: If $\{S_j\}$ is a finite or infinite sequence of disjoint Lebesgue
measurable sets, then $\bigcup S_j$ is Lebesgue measurable and $m(\bigcup S_j) = \sum m(S_j)$. In
the Jordan theory, this additivity is guaranteed to hold only for finitely many sets;
the extension to infinitely many sets is the crucial property that allows the Lebesgue
theory to handle various limiting processes more smoothly.
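One consequence of countable additivity is easy to make concrete: a countable set, such as the set of rationals in $[0,1]$, has Lebesgue measure zero. The $k$th point ($k = 1, 2, \ldots$) can be covered by an interval of length $\varepsilon/2^{k+1}$, and these lengths add up to $\varepsilon/2 < \varepsilon$. The Python sketch below builds such a cover for the first few rationals using exact `Fraction` arithmetic. The enumeration order and the finite cutoff `count` are illustrative assumptions.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in [0, 1], listed by increasing q."""
    found, seen = [], set()
    q = 1
    while len(found) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                found.append(r)
                if len(found) == count:
                    break
        q += 1
    return found

def small_cover(count, eps):
    """Cover the k-th rational (k = 1, 2, ...) by a closed interval of length
    eps / 2**(k + 1) centered at it; return the intervals and their total length."""
    intervals = []
    for k, r in enumerate(rationals_in_unit_interval(count), start=1):
        half = eps / 2 ** (k + 2)
        intervals.append((r - half, r + half))
    total = sum(b - a for a, b in intervals)
    return intervals, total

eps = Fraction(1, 1000)
intervals, total = small_cover(50, eps)
print(float(total))   # a little below eps / 2
```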
It is not hard to show that every open set $U \subset \mathbb{R}^n$ is the union of a finite or
countably infinite family of rectangular boxes $R_j$ (intervals when $n = 1$) with disjoint
interiors, and the Lebesgue measure of $U$ is just the sum of the $n$-dimensional
volumes of the boxes. (In general these boxes are not part of a fixed grid of boxes;
if there are infinitely many of them, the diameter of $R_j$ generally tends to zero as
$j \to \infty$.) It follows that a set $S \subset \mathbb{R}^n$ has Lebesgue measure zero if and only if for
every $\varepsilon > 0$, $S$ is contained in the union of a finite or countable family of boxes,
the sum of whose volumes is less than $\varepsilon$. The only difference between this and the
condition that $S$ have zero content is the fact that here we allow an infinite family
³For those who know some set theory: More precisely, one cannot construct Lebesgue nonmeasurable
sets without invoking the axiom of choice.
4.8. Lebesgue Measure and the Lebesgue Integral 209
(The Lebesgue measurability of $f$ is needed so that the terms in this sum are well
defined. One or more of them may be infinite, in which case the value of the
sum is $+\infty$.) The sums $S_n f$ increase with $n$ because the associated partitions of
$[0, \infty)$ become finer and finer, so they have a limit (possibly $+\infty$), which is defined
to be the Lebesgue integral of $f$ (over $\mathbb{R}^n$), denoted by $\int f\,dm$. More generally,
we define the Lebesgue integral of $f$ over any Lebesgue measurable set $S \subset \mathbb{R}^n$,
denoted by $\int_S f\,dm$, to be $\int (f\chi_S)\,dm$. Note that neither the function $f$ nor the
set $S$ needs to be bounded; for nonnegative integrands there are no "improper"
integrals in the Lebesgue theory.
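The definition of the sums $S_n f$ is only partly visible here, so the following Python sketch rests on an assumption. It assumes that $S_n f$ is formed from the partition of $[0,\infty)$ into the intervals $[k2^{-n}, (k+1)2^{-n})$ for $0 \le k < n2^n$ together with $[n, \infty)$. Each interval contributes its left endpoint times the measure of the corresponding level set of $f$, and the $k = 0$ piece has weight $0$. For $f(x) = x^2$ on $S = [0,1]$ those level sets are intervals whose lengths are known exactly. The sums should then increase toward $\int_S f\,dm = 1/3$.

```python
import math

def level_set_measure(a, b):
    """Lebesgue measure of {x in [0, 1] : a <= x**2 < b}, for 0 <= a <= b
    (b may be math.inf); the set is [sqrt(a), sqrt(b)) cut down to [0, 1]."""
    lo = min(math.sqrt(a), 1.0)
    hi = 1.0 if b == math.inf else min(math.sqrt(b), 1.0)
    return hi - lo

def lebesgue_sum(n):
    """S_n f for f(x) = x**2 on S = [0, 1], ASSUMING the range partition
    [k/2**n, (k+1)/2**n), 0 <= k < n*2**n, plus the tail [n, inf),
    with each piece weighted by its left endpoint."""
    step = 2.0 ** -n
    total = 0.0
    for k in range(1, n * 2 ** n):  # the k = 0 piece has weight 0
        total += k * step * level_set_measure(k * step, (k + 1) * step)
    total += n * level_set_measure(float(n), math.inf)
    return total

sums = [lebesgue_sum(n) for n in range(1, 13)]
print(sums[0], sums[-1], 1 / 3)   # the sums increase toward 1/3
```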
Now we drop the assumption that $f \ge 0$. If $f$ is any Lebesgue measurable
function, we write it as the difference of the two nonnegative functions $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, so that $f = f^+ - f^-$.
$\int_S f\,dm$, and the two are equal; but the class of Lebesgue integrable functions is
much bigger than the class of Riemann integrable functions. We conclude with two
additional remarks about the relation between the Lebesgue and Riemann integrals.
The themes of this chapter are (1) integrals over curves and surfaces and (2) differential
operations on vector fields, which combine to yield (3) a group of theorems
relating integrals over curves, surfaces, and regions in space that are among the
most powerful and useful results of advanced calculus.
At the outset, let us explain the term "vector field" in more detail. Let $\mathbf F$ be
an $\mathbb{R}^n$-valued function defined on some subset of $\mathbb{R}^n$. We have encountered such
things in previous chapters, where we generally thought of them as representing
transformations from one region of $\mathbb{R}^n$ to another or coordinate systems on regions
of $\mathbb{R}^n$. In this chapter, however, we think of such an $\mathbf F$ as a function that assigns to
each point $\mathbf x$ in its domain a vector $\mathbf F(\mathbf x)$, represented pictorially as an arrow based
at $\mathbf x$, and we therefore call it a vector field. Two simple vector fields are sketched in
Figure 5.1. The primary physical motivation is the idea of a force field. For example,
$\mathbf F$ could represent a gravitational field, $\mathbf F(\mathbf x)$ being the gravitational force felt
by a unit mass located at $\mathbf x$, or an electric field, $\mathbf F(\mathbf x)$ being the electrostatic force
felt by a unit charge located at $\mathbf x$. There are many other physical interpretations; for
example, in a moving fluid like a stream of water, $\mathbf F(\mathbf x)$ could represent the velocity
of the fluid at position $\mathbf x$. (In all these examples, $\mathbf F(\mathbf x)$ may also depend on other
parameters such as the time $t$.)
212 Chapter 5. Line and Surface Integrals; Vector Analysis
Figure 5.1: The vector fields F(x,y) = (x,y) (left) and F(x,y) =
(-y, x) (right).
set $S \subset \mathbb{R}^n$, we always mean that it is of class $C^k$ on some open set containing $S$.
In this section we discuss integrals over curves, traditionally called "line integrals,"
which are generalizations of ordinary (one-dimensional) integrals over intervals on
the real line. As one would expect, they are based on the idea of cutting up the curve
into many tiny pieces, forming appropriate Riemann sums, and passing to the limit.
However, there are two species of line integrals, appropriate for integrating real-valued
or vector-valued functions, depending on how one adapts the differential $dx$
appearing in $\int_a^b f(x)\,dx$ to the more general situation. Our discussion here will
be on the informal, intuitive level where we think of $dx$ as being an infinitesimal
increment in the variable $x$.
is the vector difference between the two points, and we imagine it as being infinitely
small. We may, however, be more interested in the distance between the two points,
traditionally denoted by ds, which is
$$ds = |d\mathbf x| = \sqrt{dx_1^2 + \cdots + dx_n^2}. \tag{5.2}$$
To give these differentials a precise meaning that can be used for calculations, the
best procedure is to parametrize the curve. Thus, we assume that C is given by
5.1. Arc Length and Line Integrals 213
(The difference between the increment of g and its linear approximation disappears
in the infinitesimal limit.) Moreover,
which is just what one gets by formally multiplying and dividing the expression on
the right of (5.2) by dt.
What happens if we sum up all the infinitesimal increments dx or ds — that
is, if we integrate the differentials dx or ds = |dx| over the curve? Integration of
the vector increments dx just gives the total vector increment, that is, the vector
difference between the initial and final points on the curve:
This is nothing but the fundamental theorem of calculus applied to the components
of g; it is simple but not very exciting. On the other hand, ds is the straight-line
distance between two infinitesimally close points x and x + dx on the curve, and
since smooth curves are indistinguishable from their linear approximations on the
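infinitesimal level, $ds$ is the arc length of the bit of curve between $\mathbf x$ and $\mathbf x + d\mathbf x$.
Adding these up gives the total arc length of the curve:
$$\text{Arc length of } C = \int_C ds = \int_a^b |\mathbf g'(t)|\,dt.$$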
mapping from the interval $[c, d]$ to the interval $[a, b]$. Then the curve $C$ described
by $\mathbf x = \mathbf g(t)$ is also described by $\mathbf x = (\mathbf g \circ \varphi)(u)$, $c \le u \le d$, so we should have
where for the second equality we have used the chain rule. This does indeed agree
with (5.6), by formula (4.34).
The same independence of parametrization holds for the related integral (5.5),
with one subtle but important difference. The integral $\int_a^b \mathbf g'(t)\,dt$ gives the vector
difference between the two endpoints of the curve, which is clearly independent of
the parametrization except insofar as the parametrization determines which is the
initial point and which is the final point. If we choose a new parameter $u$ as above
so that $t$ is a decreasing function of $u$ (thus $a = \varphi(d)$ and $b = \varphi(c)$), then the initial
and final points get switched, and so their difference is multiplied by $-1$.
The issue here is that a parametrization x = g(t) determines an orientation for
the curve C, that is, a determination of which direction along the curve is "forward"
and which direction is "backward," the "forward" direction being the direction in
which the point $\mathbf g(t)$ moves as $t$ increases. The orientation of a curve can be conveniently
indicated in a picture by drawing one or more arrowheads along the curve
that point in the "forward" direction, as indicated in Figure 5.2. The substance of
the preceding paragraph is then that the integral (5.5) depends on the parametrization
only insofar as the parametrization determines a choice of orientation. In
contrast, the arc length of a curve is independent even of the orientation.
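Both facts, invariance of arc length under reparametrization and the sign change of $\int \mathbf g'(t)\,dt$ under a reversal of orientation, can be checked numerically. In the Python sketch below, the helix $\mathbf g(t) = (\cos t, \sin t, t)$, $0 \le t \le 2\pi$, is only an illustrative curve, and the helper names are ours. It computes the length with the original parameter and with $t = u^2$, and it integrates the derivative for the original and the reversed parametrization $t = 2\pi - u$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# Illustrative curve: the helix g(t) = (cos t, sin t, t), 0 <= t <= 2*pi.
def g_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

T = 2 * math.pi
length_t = simpson(lambda t: norm(g_prime(t)), 0.0, T)

# Increasing change of parameter t = phi(u) = u**2, 0 <= u <= sqrt(2*pi);
# by the chain rule, (g o phi)'(u) = g'(u**2) * 2u.
U = math.sqrt(T)
length_u = simpson(lambda u: norm(g_prime(u * u)) * 2 * u, 0.0, U)

def vector_integral(h_prime, a, b):
    """Integral of the vector-valued function h_prime over [a, b], componentwise."""
    return tuple(simpson(lambda s, i=i: h_prime(s)[i], a, b) for i in range(3))

# Decreasing change of parameter t = 2*pi - u reverses the orientation:
# the derivative of u -> g(2*pi - u) is -g'(2*pi - u).
forward = vector_integral(g_prime, 0.0, T)
backward = vector_integral(lambda u: tuple(-c for c in g_prime(T - u)), 0.0, T)

print(length_t, length_u)   # both equal 2*pi*sqrt(2), about 8.8858
print(forward, backward)    # about (0, 0, 2*pi) and (0, 0, -2*pi)
```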
The notion of arc length extends in an obvious way to piecewise smooth curves,
obtained by joining finitely many smooth curves together end-to-end but allowing
corners or cusps at the joining points; we simply compute the lengths of the
smooth pieces and add them up. We can express this more precisely in terms of
parametrizations, as follows: The function $\mathbf g : [a, b] \to \mathbb{R}^n$ is called piecewise
smooth if (i) it is continuous, and (ii) its derivative exists and is continuous except
perhaps at finitely many points $t_j$, at which the one-sided limits $\lim_{t\to t_j\pm}\mathbf g'(t)$ exist.
(Note. In Chapter 8 we shall use the term "piecewise smooth" in a slightly
Remarks.
This is independent of the parametrization and the orientation, by the same chain-rule
calculation that we performed above for the case $f = 1$.
$$\text{Average of } f \text{ over } C = \frac{\int_C f\,ds}{\text{Arc length of } C} = \frac{\int_C f\,ds}{\int_C ds}.$$
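Example 1. What is the centroid of the upper half of the unit circle, $C = \{(x,y) : x^2 + y^2 = 1,\ y \ge 0\}$?
Solution. The centroid of $C$ is the point whose coordinates $(\bar x, \bar y)$ are the averages of $x$ and $y$ over $C$. Clearly $\bar x = 0$ by symmetry. Just to get some practice, let's do the calculation of the arc length of $C$ (which of course is $\pi$) and $\int_C y\,ds$ with two different parametrizations: (i) taking $x$ as the parameter and $y = \sqrt{1 - x^2}$, and (ii) taking the polar angle $\theta$ as the parameter, $x = \cos\theta$, $y = \sin\theta$. (Note that these two parametrizations give opposite orientations on $C$; the first goes from left to right, the second from right to left.)
In the first parametrization, we have
$$dy = \frac{-x\,dx}{\sqrt{1 - x^2}}, \qquad ds = \sqrt{dx^2 + dy^2} = \frac{dx}{\sqrt{1 - x^2}},$$
so
$$\int_C y\,ds = \int_{-1}^{1}\sqrt{1 - x^2}\,\frac{dx}{\sqrt{1 - x^2}} = x\Big|_{-1}^{1} = 2, \qquad \int_C ds = \int_{-1}^{1}\frac{dx}{\sqrt{1 - x^2}} = \arcsin x\Big|_{-1}^{1} = \pi.$$
In the second one, we have $dx = -\sin\theta\,d\theta$, $dy = \cos\theta\,d\theta$, and $ds = \sqrt{dx^2 + dy^2} = d\theta$, so
$$\int_C y\,ds = \int_0^{\pi}\sin\theta\,d\theta = 2, \qquad \int_C ds = \int_0^{\pi} d\theta = \pi.$$
Either way, $\bar y = 2/\pi$.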
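The answer $\bar y = 2/\pi$ can be cross-checked numerically. The Python sketch below computes the arc length and centroid from $ds = |\mathbf g'(t)|\,dt$; the helper names are ours. It does this once with parametrization (ii) and once with the non-uniform parametrization $\theta = \pi s^2$, $0 \le s \le 1$. The second parametrization is chosen only to illustrate that the result does not depend on the parametrization. Parametrization (i) is left out because its $ds$ blows up at $x = \pm 1$, which a plain Simpson rule does not handle well.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def centroid(g, g_prime, a, b):
    """Arc length and centroid of the curve x = g(t), a <= t <= b, using ds = |g'(t)| dt:
    each centroid coordinate is (integral of x_i ds) / (integral of ds)."""
    speed = lambda t: math.hypot(*g_prime(t))
    length = simpson(speed, a, b)
    dim = len(g(a))
    coords = tuple(simpson(lambda t, i=i: g(t)[i] * speed(t), a, b) / length
                   for i in range(dim))
    return length, coords

# Parametrization (ii): x = cos(theta), y = sin(theta), 0 <= theta <= pi.
L1, c1 = centroid(lambda th: (math.cos(th), math.sin(th)),
                  lambda th: (-math.sin(th), math.cos(th)), 0.0, math.pi)

# Same semicircle, traversed at a non-constant speed: theta = pi * s**2, 0 <= s <= 1.
def g2(s):
    return (math.cos(math.pi * s * s), math.sin(math.pi * s * s))

def g2_prime(s):
    w = 2 * math.pi * s  # d(theta)/ds
    return (-w * math.sin(math.pi * s * s), w * math.cos(math.pi * s * s))

L2, c2 = centroid(g2, g2_prime, 0.0, 1.0)
print(L1, c1)   # about pi and (0, 2/pi)
print(L2, c2)   # the same values
```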
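$$\mathbf u\cdot\int_a^b \mathbf F(t)\,dt = \int_a^b \mathbf F(t)\cdot\mathbf u\,dt \le \int_a^b |\mathbf F(t)\cdot\mathbf u|\,dt \le \int_a^b |\mathbf F(t)|\,dt.$$
Here we have applied Theorem 4.9d to the scalar-valued function $\mathbf F(t)\cdot\mathbf u$ and then
invoked Cauchy's inequality. The desired result is obtained by taking $\mathbf u$ to be the
unit vector in the direction of $\int_a^b \mathbf F(t)\,dt$. □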
Of greater interest is a scalar-valued line integral for vector fields, that is, for
$\mathbb{R}^n$-valued functions on $\mathbb{R}^n$. If $C$ is a smooth (or piecewise smooth) curve in $\mathbb{R}^n$
and $\mathbf F$ is a continuous vector field defined on some neighborhood of $C$ in $\mathbb{R}^n$, the
line integral of $\mathbf F$ over $C$ is
$$\int_C \mathbf F\cdot d\mathbf x = \int_a^b \mathbf F(\mathbf g(t))\cdot\mathbf g'(t)\,dt. \tag{5.9}$$
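$$\mathbf t(\mathbf g(t)) = \frac{\mathbf g'(t)}{|\mathbf g'(t)|}, \qquad F_{\text{tang}}(\mathbf x) = \mathbf F(\mathbf x)\cdot\mathbf t(\mathbf x).$$
That is, $\mathbf t(\mathbf x)$ is the unit tangent vector to the curve $C$ in the forward direction at the
point $\mathbf x$, and $F_{\text{tang}}(\mathbf x)$ is the component of $\mathbf F(\mathbf x)$ in the direction of $\mathbf t(\mathbf x)$. Then
$$\mathbf F(\mathbf g(t))\cdot\mathbf g'(t)\,dt = F_{\text{tang}}(\mathbf g(t))\,|\mathbf g'(t)|\,dt = F_{\text{tang}}\,ds,$$
so
$$\int_C \mathbf F\cdot d\mathbf x = \int_C F_{\text{tang}}\,ds.$$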
Remarks.
i. If $\mathbf F$ is a force field, then $\int_C \mathbf F\cdot d\mathbf x$ represents a quantity of energy; it is the
work done by the force on a particle that traverses the curve $C$.
ii. The integrand $\mathbf F\cdot d\mathbf x = F_1\,dx_1 + \cdots + F_n\,dx_n$ in a line integral, with the
$dx$'s included, is often called a differential form, and we speak of integrating
a differential form over a curve. We shall return to this notion in §5.9.
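Formula (5.9) translates directly into code. In the Python sketch below, the fields and curves are illustrative choices and `line_integral` is our own helper name. The rotation field $\mathbf F(x,y) = (-y, x)$ of Figure 5.1 is integrated around the unit circle in both orientations, which changes the sign of the result. The gradient field of $f(x,y) = x^2y$ gives zero around the same closed curve.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def line_integral(F, g, g_prime, a, b):
    """Integral of F . dx over the curve x = g(t), a <= t <= b, computed as the
    ordinary integral of F(g(t)) . g'(t) dt, as in (5.9)."""
    def integrand(t):
        return sum(Fi * dgi for Fi, dgi in zip(F(g(t)), g_prime(t)))
    return simpson(integrand, a, b)

def rotation(p):
    x, y = p
    return (-y, x)                 # F(x, y) = (-y, x), as in Figure 5.1

def grad_f(p):
    x, y = p
    return (2 * x * y, x * x)      # gradient of f(x, y) = x**2 * y

# Unit circle, counterclockwise and clockwise.
ccw_g = lambda t: (math.cos(t), math.sin(t))
ccw_gp = lambda t: (-math.sin(t), math.cos(t))
cw_g = lambda t: (math.cos(t), -math.sin(t))
cw_gp = lambda t: (-math.sin(t), -math.cos(t))

T = 2 * math.pi
print(line_integral(rotation, ccw_g, ccw_gp, 0.0, T))  # about 2*pi
print(line_integral(rotation, cw_g, cw_gp, 0.0, T))    # about -2*pi
print(line_integral(grad_f, ccw_g, ccw_gp, 0.0, T))    # about 0
```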
What does all this boil down to when $n = 1$? In this case, vector fields and
scalar functions are the same thing, and both the scalar and vector versions of line
integrals are just ordinary one-variable integrals. The former, however, is independent
of orientation, whereas the latter depends on orientation. The distinction is the
same as the one between formulas (4.32) and (4.33) in §4.4; it is a question of
$$\int_{[a,b]} f(x)\,dx \quad\text{versus}\quad \int_a^b f(x)\,dx.$$
In the integral on the left we must have $a < b$; but in the integral on the right $a$ and
$b$ can occur in either order, and the sign of the integral depends on the order.
The integral of the first three terms over $[0, 2\pi]$ vanishes, and the integral of the
last one is $\pi$. So $\int_C \mathbf F\cdot d\mathbf x = \pi$.
Note that it doesn't matter which point on C we choose to start and end at.
Instead of taking $t \in [0, 2\pi]$, we could take $t \in [a, a + 2\pi]$ for any $a \in \mathbb{R}$; the
answer is the same since the integral of a trig function over a complete period
is independent of the particular period chosen.
is bounded, then $C$ is called rectifiable, and the arc length $L(C)$ is defined to be
the supremum of $L_P(C)$:
$$L(C) = \sup_P L_P(C).$$
Note that if $P'$ is a refinement of $P$ then $L_{P'}(C) \ge L_P(C)$, by the triangle inequality;
hence the supremum is indeed the appropriate sort of limit. This estimate also
implies that the supremum is unchanged if we consider only partitions containing
a given $c \in (a, b)$ among their subdivision points, and from this it follows that arc
length is additive: If $C_1$ and $C_2$ are the curves parametrized by $\mathbf g(t)$ for $t \in [a, c]$
and $t \in [c, b]$, then $L(C) = L(C_1) + L(C_2)$. See Exercise 8.
We now show that this definition coincides with our previous one for C1 curves.
$$L(C) = \int_a^b |\mathbf g'(t)|\,dt.$$
Proof. For any partition P of [a, b], by (5.5) and Proposition 5.8 we have
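$$L_P(C) = \sum_{j=1}^{J}|\mathbf g(t_j) - \mathbf g(t_{j-1})| = \sum_{j=1}^{J}\left|\int_{t_{j-1}}^{t_j}\mathbf g'(t)\,dt\right| \le \sum_{j=1}^{J}\int_{t_{j-1}}^{t_j}|\mathbf g'(t)|\,dt = \int_a^b |\mathbf g'(t)|\,dt.$$
It follows that $L(C) \le \int_a^b |\mathbf g'(t)|\,dt$, and in particular that $C$ is rectifiable.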
Next, for $r, s \in [a, b]$, let $C_r^s$ be the curve parametrized by $\mathbf g(t)$ with $t \in [r, s]$,
and let $\varphi(s) = L(C_a^s)$. (That is, we consider the length of the curve $C$, starting
at $t = a$, as a function of the right endpoint of the parameter interval.) Suppose
$h > 0$. Since arc length is additive, we have $L(C_s^{s+h}) = \varphi(s + h) - \varphi(s)$, so by
the inequality we have just proved (applied to the curve $C_s^{s+h}$) and the mean value
theorem for integrals,
$$\varphi(s+h) - \varphi(s) \le \int_s^{s+h}|\mathbf g'(t)|\,dt = h|\mathbf g'(\sigma)|,$$
where $\sigma$ is some number between $s$ and $s + h$. On the other hand, $|\mathbf g(s + h) - \mathbf g(s)|$
is $L_P(C_s^{s+h})$ where $P$ is the trivial partition $\{s, s + h\}$, and hence it is no bigger
than $L(C_s^{s+h})$. Combining these estimates and dividing by $h$, we see that
$$\left|\frac{\mathbf g(s+h) - \mathbf g(s)}{h}\right| \le \frac{\varphi(s+h) - \varphi(s)}{h} \le |\mathbf g'(\sigma)|.$$
As $h \to 0$, the quantities on the left and right approach $|\mathbf g'(s)|$, and hence so does
the one in the middle. A slight modification of this argument works also for $h < 0$,
so we conclude that $\varphi$ is differentiable and that $\varphi'(s) = |\mathbf g'(s)|$. The desired result
is now immediate, since $\varphi(a) = 0$:
$$L(C) = \varphi(b) - \varphi(a) = \int_a^b |\mathbf g'(s)|\,ds.$$
□
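Both halves of this proof can be watched numerically. Every inscribed polygon is no longer than $\int_a^b|\mathbf g'(t)|\,dt$, and refining the partition can only increase $L_P(C)$. The Python sketch below uses the ellipse $\mathbf g(t) = (2\cos t, \sin t)$, chosen only as an example. It computes $L_P(C)$ for partitions into $2^k$ equal pieces, each of which refines the previous one, and compares them with the integral.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# Illustrative curve: the ellipse g(t) = (2 cos t, sin t), 0 <= t <= 2*pi.
def g(t):
    return (2 * math.cos(t), math.sin(t))

def g_prime(t):
    return (-2 * math.sin(t), math.cos(t))

def polygon_length(m, a=0.0, b=2 * math.pi):
    """L_P(C) for the partition P of [a, b] into m equal pieces: the total length
    of the inscribed polygon through g(t_0), ..., g(t_m)."""
    pts = [g(a + (b - a) * j / m) for j in range(m + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

integral = simpson(lambda t: math.hypot(*g_prime(t)), 0.0, 2 * math.pi)
# Each partition into 2**k pieces refines the one into 2**(k-1) pieces.
lengths = [polygon_length(2 ** k) for k in range(2, 13)]
print(lengths[0], lengths[-1], integral)
```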
EXERCISES
Use the mean value theorem to express the differences inside the square root in
terms of g' and h', and then use Exercise 9 in §4.1 to give an alternate proof of
Theorem 5.11. (Exactly the same idea works for curves in Rn.)
$$\int_{\partial S}\mathbf F\cdot d\mathbf x \quad\text{or}\quad \int_{\partial S} F_1\,dx_1 + F_2\,dx_2$$
5.2. Green's Theorem 223
the sum of the line integrals of F over the positively oriented closed curves that
make up dS.
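$$\int_{\partial S}\mathbf F\cdot d\mathbf x = \iint_S\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)dA. \tag{5.13}$$
In the more common notation, if we set $\mathbf F = (P, Q)$ and $\mathbf x = (x, y)$,
$$\int_{\partial S} P\,dx + Q\,dy = \iint_S\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA. \tag{5.14}$$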
Proof. First we consider a very restricted class of regions, for which the proof is
quite simple. We shall say that the region $S$ is $x$-simple if it is the region between
the graphs of two functions of $x$, that is, if it has the form
$$S = \{(x, y) : a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x)\}, \tag{5.15}$$
where $\varphi_1$ and $\varphi_2$ are continuous, piecewise smooth functions on $[a, b]$. Likewise,
we say that $S$ is $y$-simple if it has the form
$$S = \{(x, y) : c \le y \le d,\ \psi_1(y) \le x \le \psi_2(y)\}, \tag{5.16}$$
where $\psi_1$ and $\psi_2$ are continuous, piecewise smooth functions on $[c, d]$.
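$$\int_{\partial S} P\,dx = \int_a^b P(x, \varphi_1(x))\,dx - \int_a^b P(x, \varphi_2(x))\,dx.$$
By the fundamental theorem of calculus, $P(x, \varphi_2(x)) - P(x, \varphi_1(x)) = \int_{\varphi_1(x)}^{\varphi_2(x)} \frac{\partial P}{\partial y}(x, y)\,dy$, so
$$\int_{\partial S} P\,dx = -\iint_S \frac{\partial P}{\partial y}\,dA.$$
In exactly the same way, using the representation (5.16) for $S$, we see that
$$\int_{\partial S} Q\,dy = \iint_S \frac{\partial Q}{\partial x}\,dA.$$
(There is no minus sign here, because if we take $y$ as the parameter for the curves
$x = \psi_1(y)$ and $x = \psi_2(y)$, the orientation is wrong for $\psi_1$ and right for $\psi_2$.) Adding
these last two equalities, we obtain the desired result (5.14).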
Figure 5.6: A decomposition of the region in Figure 5.4 into simple subregions.
Thus Green's theorem is established for regions that are both $x$-simple and $y$-simple.
There is now an immediate generalization to a much larger class of regular
regions. Namely, suppose the region $S$ can be cut up into finitely many subregions,
say $S = S_1 \cup \cdots \cup S_k$, where
a. the $S_j$'s may intersect along common edges but have disjoint interiors;
b. each $S_j$ has a piecewise smooth boundary and is both $x$-simple and $y$-simple.
(See Figure 5.6.) Since the $S_j$'s overlap only in a set of zero content, by Corollary
4.23b we have
$$\iint_S\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA = \sum_{j=1}^{k}\iint_{S_j}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA = \sum_{j=1}^{k}\int_{\partial S_j} P\,dx + Q\,dy = \int_{\partial S} P\,dx + Q\,dy,$$
where the last equality holds because the integrals over the parts of the boundaries of the $S_j$'s that are not parts of
the boundary of $S$ all cancel out. In more detail, if $S_i$ and $S_j$ have a common edge
$C$, then $C$ will have one orientation as part of $\partial S_i$ and the opposite orientation
as part of $\partial S_j$, so the two integrals over $C$ that make up parts of $\int_{\partial S_i}$ and $\int_{\partial S_j}$
will cancel each other. Therefore, we obtain Green's theorem for the region $S$ by
applying Green's theorem to the simple regions $S_j$ and adding up the results.
The result we have just obtained is sufficient for most practical purposes, but
it is not definitive. The class of regular regions that can be cut up into simple
subregions does not include all regions with C1 boundary, much less all regions
with piecewise smooth boundary, and it may be difficult to tell whether a given
region has this property. For example, the region
is $x$-simple but cannot be cut up into finitely many $y$-simple subregions because
the graph of $x^3\sin(x^{-1})$ has infinitely many "wiggles." The deduction of the general
case from the special cases considered here requires some additional machinery that
is of interest in its own right; we present it in Appendix B.7 (Theorem B.28). □
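Green's theorem (5.14) can be tested on a concrete case. The Python sketch below takes $P = -y^3$, $Q = x^3$ on the unit disc, an illustrative choice not taken from the text. Then $\partial Q/\partial x - \partial P/\partial y = 3x^2 + 3y^2$, and both sides should equal $3\pi/2$. The boundary is parametrized counterclockwise (the positive orientation), and the double integral is done in polar coordinates.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

P = lambda x, y: -y ** 3
Q = lambda x, y: x ** 3
curl_scalar = lambda x, y: 3 * x ** 2 + 3 * y ** 2   # dQ/dx - dP/dy, computed by hand

# Left side: P dx + Q dy around the unit circle, counterclockwise.
def boundary_integrand(t):
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    return P(x, y) * dx + Q(x, y) * dy

lhs = simpson(boundary_integrand, 0.0, 2 * math.pi)

# Right side: double integral over the disc in polar coordinates (dA = r dr dtheta).
def inner(theta):
    return simpson(lambda r: curl_scalar(r * math.cos(theta), r * math.sin(theta)) * r,
                   0.0, 1.0, 100)

rhs = simpson(inner, 0.0, 2 * math.pi, 100)
print(lhs, rhs, 1.5 * math.pi)
```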
x2 - yexy + 3j/l ^ dA
$$= \iint_S (2x - 3)\,dA = -6\pi.$$
Example 3. It is an amusing and sometimes useful fact that the area of a regular
region $S$ in the plane can be expressed as a line integral over the boundary
$\partial S$. This can be done in many different ways; for instance,
$$\text{Area of } S = \int_{\partial S} x\,dy = -\int_{\partial S} y\,dx = \frac12\int_{\partial S}(x\,dy - y\,dx).$$
Indeed, Green's theorem shows that all of these integrals are equal to $\iint_S 1\,dA$.
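For a polygon the third formula can be evaluated exactly, edge by edge. On the straight edge from $(x_k, y_k)$ to $(x_{k+1}, y_{k+1})$ one finds $\int (x\,dy - y\,dx) = x_k y_{k+1} - x_{k+1} y_k$, which gives the familiar "shoelace" formula. The Python sketch below implements it; the function names are ours. It also evaluates $\frac12\int_{\partial S}(x\,dy - y\,dx)$ for the ellipse $x = a\cos t$, $y = b\sin t$. A clockwise vertex order gives the negative of the area, reflecting the orientation of $\partial S$.

```python
import math

def polygon_area(vertices):
    """(1/2) * sum of (x_k y_{k+1} - x_{k+1} y_k) over the edges of a closed polygon,
    i.e. (1/2) * integral of (x dy - y dx) over its boundary, evaluated exactly on
    each straight edge. Positive for counterclockwise order, negative for clockwise."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return s / 2

def ellipse_area(a, b, m=1000):
    """(1/2) * integral of (x dy - y dx) for x = a cos t, y = b sin t, 0 <= t <= 2*pi,
    by the midpoint rule (the integrand is the constant a*b here)."""
    h = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * h
        x, y = a * math.cos(t), b * math.sin(t)
        dx, dy = -a * math.sin(t), b * math.cos(t)
        total += (x * dy - y * dx) * h
    return total / 2

print(polygon_area([(0, 0), (4, 0), (0, 3)]))   # 6.0
print(ellipse_area(3, 2), 6 * math.pi)
```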
Hence, by applying Green's theorem to the rotated field F, we obtain the following
result:
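$$\int_{\partial S}\mathbf F\cdot\mathbf n\,ds = \iint_S\left(\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y}\right)dA. \tag{5.18}$$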
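Let us see what Green's theorem says when $\mathbf F$ is the gradient of a $C^2$ function
$f$, so that $F_1 = \partial_1 f$ and $F_2 = \partial_2 f$. Formula (5.13) gives
$$\int_{\partial S}\nabla f\cdot d\mathbf x = \iint_S(\partial_1\partial_2 f - \partial_2\partial_1 f)\,dA = 0.$$
This is no surprise; it is easy to see directly that the line integral of a gradient over
any closed curve vanishes. Indeed, if the curve $C$ is parametrized by $\mathbf x = \mathbf g(t)$ with
$\mathbf g(a) = \mathbf g(b)$, then by the chain rule,
$$\int_C \nabla f\cdot d\mathbf x = \int_a^b \nabla f(\mathbf g(t))\cdot\mathbf g'(t)\,dt = \int_a^b \frac{d}{dt}f(\mathbf g(t))\,dt = f(\mathbf g(b)) - f(\mathbf g(a)) = 0.$$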
The formula (5.18) gives a more interesting result. $\nabla f\cdot\mathbf n$ is the directional derivative
of $f$ in the outward normal direction to $\partial S$, or normal derivative of $f$ on $\partial S$,
often denoted by $\partial f/\partial n$; and (5.18) says that
$$\int_{\partial S}\frac{\partial f}{\partial n}\,ds = \iint_S\left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right)dA.$$
The integrand on the right is the Laplacian of $f$, which we encountered in §2.6 and
which will play an important role in §5.6.
EXERCISES
2. Let $S$ be the annulus $1 \le x^2 + y^2 \le 4$. Compute $\int_{\partial S}(xy^2\,dy - x^2y\,dx)$, both
directly and by using Green's theorem.
3. Find the positively oriented simple closed curve $C$ that maximizes the line
integral $\int_C [y^3\,dx + (3x - x^3)\,dy]$.
4. Use Green's theorem as in Example 3 to calculate the area under one arch of
the cycloid described parametrically by $x = R(t - \sin t)$, $y = R(1 - \cos t)$.
7. The point of this exercise is to show how Green's theorem can be used to deduce
a special case of Theorem 4.41. Let $U, V$ be connected open sets in $\mathbb{R}^2$,
and let $\mathbf G : U \to V$ be a one-to-one transformation of class $C^1$ whose derivative
$D\mathbf G(\mathbf u)$ is invertible for all $\mathbf u \in U$. Moreover, let $S$ be a regular region in $V$
with piecewise smooth boundary, let $A$ be its area, and let $T = \mathbf G^{-1}(S)$.
a. The Jacobian $\det D\mathbf G$ is either everywhere positive or everywhere negative
on $U$; why?
b. Suppose $\det D\mathbf G(\mathbf u) > 0$ for all $\mathbf u \in U$. Write $A = -\int_{\partial S} y\,dx$ as in Example
3, make a change of variable to transform this line integral into
a line integral over $\partial T$, and apply Green's theorem to deduce that
$A = \iint_T \det D\mathbf G\,dA$.
c. By a similar argument, show that if $\det D\mathbf G(\mathbf u) < 0$ for all $\mathbf u \in U$, then
$A = -\iint_T \det D\mathbf G\,dA = \iint_T |\det D\mathbf G|\,dA$. Where does the minus sign
come from?
In this section we discuss integrals of functions and vector fields over smooth surfaces
in $\mathbb{R}^3$. Like line integrals, surface integrals come in two varieties, unoriented
and oriented. On a curve the orientation is a matter of deciding which direction
along a curve is "positive"; on a surface it is a matter of deciding which side of the
surface is the "positive" side. The convenient way of specifying the orientation of
5.3. Surface Area and Surface Integrals 229
a smooth surface in R3 is to make a choice of one of the two unit normal vectors
at each point of the surface, in such a way that the choice varies continuously with
the point. The "positive" side of the surface is the one into which the normal arrow
points.
It is important to note that not every surface can be oriented. The standard
example of a nonorientable surface is the Mobius band, which can be constructed
by taking a long strip of paper, giving it a half twist, and gluing the ends together.
(That is, call the two sides of the original strip A and B; the ends are to be glued
together so that side A of one end matches with side B of the other.) A sketch of a
Mobius band is given in Figure 5.7, but the best way to appreciate the features of
the Mobius band is to make one for yourself.
However, if a surface forms part of the boundary of a regular region in R3, it
is always orientable, and the standard specification for the orientation is that the
positive normal vector is the one pointing out of the region.
For a given surface S, it may not be the case that all of S can be represented by a
single parametrization. We shall assume, however, that S can be cut up into finitely
many pieces which each admit a parametrization; it is then enough to consider the
pieces separately. Also, it is usually sufficient to have a good parametrization for a
subset of S whose complement is of lower dimension, such as the one provided by
spherical coordinates on the unit sphere with the "international date line" removed.
To see how to compute surface area on $S$, consider a small rectangle in the $uv$-plane
with vertices $(u, v)$, $(u + \Delta u, v)$, $(u, v + \Delta v)$, and $(u + \Delta u, v + \Delta v)$. Its
image under the map $\mathbf G$ is a small quadrilateral (with curved sides) on the surface
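$S$ whose vertices are $\mathbf G(u, v)$, $\mathbf G(u + \Delta u, v)$, etc. (See Figure 3.4 in §3.3.) In the
limit in which the increments $\Delta u$ and $\Delta v$ become infinitesimals $du$ and $dv$, this
quadrilateral becomes a parallelogram whose sides from the vertex $\mathbf x = \mathbf G(u, v)$ to
the two adjacent vertices are described by the vectors
$$\mathbf G(u + du, v) - \mathbf G(u, v) = \frac{\partial\mathbf G}{\partial u}\,du \quad\text{and}\quad \mathbf G(u, v + dv) - \mathbf G(u, v) = \frac{\partial\mathbf G}{\partial v}\,dv.$$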
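These two vectors are tangent to the surface $S$ at $\mathbf x$, so their cross product is a
vector normal to $S$ at $\mathbf x$, whose magnitude is the area of the parallelogram they
span. Therefore, the element of area on $S$ is given in terms of the parametrization
$\mathbf x = \mathbf G(u, v)$ by
$$dA = \left|\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v}\right|du\,dv, \tag{5.19}$$
and the area of the image $\mathbf G(R)$ of a region $R$ in the $uv$-plane is
$$\text{Area of } \mathbf G(R) = \iint_R\left|\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v}\right|du\,dv. \tag{5.20}$$
Writing $\mathbf G = (x, y, z)$, we have
$$\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v} = \det\begin{pmatrix}\mathbf i & \mathbf j & \mathbf k\\ \partial_u x & \partial_u y & \partial_u z\\ \partial_v x & \partial_v y & \partial_v z\end{pmatrix} = \left(\frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)}\right).$$
Thus,
$$dA = \sqrt{\left(\frac{\partial(y,z)}{\partial(u,v)}\right)^2 + \left(\frac{\partial(z,x)}{\partial(u,v)}\right)^2 + \left(\frac{\partial(x,y)}{\partial(u,v)}\right)^2}\,du\,dv. \tag{5.21}$$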
Computationally, this is usually a horrible mess. (But what did you expect? Arc
length is already problematic; surface area must be worse!)
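Formula (5.20) is easy to evaluate numerically, even when (5.21) is a horrible mess by hand. The Python sketch below approximates $\partial\mathbf G/\partial u$ and $\partial\mathbf G/\partial v$ by central differences and integrates the length of their cross product with nested Simpson rules. The step size, the grid, and the two test surfaces are assumptions made for illustration. For the unit sphere in spherical coordinates the result should be $4\pi$. For the flat piece $(u, v, 2u + 3v)$ over the unit square it should be $|(1,0,2)\times(0,1,3)| = \sqrt{14}$.

```python
import math

def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def partials(G, u, v, h=1e-5):
    """Central-difference approximations to dG/du and dG/dv at (u, v)."""
    Gu = tuple((a - b) / (2 * h) for a, b in zip(G(u + h, v), G(u - h, v)))
    Gv = tuple((a - b) / (2 * h) for a, b in zip(G(u, v + h), G(u, v - h)))
    return Gu, Gv

def surface_area(G, u0, u1, v0, v1, n=100):
    """Formula (5.20): double integral of |dG/du x dG/dv| over [u0, u1] x [v0, v1]."""
    def element(u, v):
        Gu, Gv = partials(G, u, v)
        return math.sqrt(sum(c * c for c in cross(Gu, Gv)))
    return simpson(lambda u: simpson(lambda v: element(u, v), v0, v1, n), u0, u1, n)

sphere = lambda phi, theta: (math.sin(phi) * math.cos(theta),
                             math.sin(phi) * math.sin(theta),
                             math.cos(phi))
plane = lambda u, v: (u, v, 2 * u + 3 * v)

print(surface_area(sphere, 0.0, math.pi, 0.0, 2 * math.pi))  # about 4*pi
print(surface_area(plane, 0.0, 1.0, 0.0, 1.0))               # about sqrt(14)
```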
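As with arc length, we must verify that our informally-derived formula for surface
area really makes sense by checking that it is independent of the parametrization.
Thus, suppose we make a change of variables $(u, v) = \Phi(s, t)$, where $\Phi$ is a
one-to-one $C^1$ map from a region $V$ in the $st$-plane to the region $W$ in the $uv$-plane.
The elements of area are then related by
$$du\,dv = \left|\frac{\partial(u,v)}{\partial(s,t)}\right|ds\,dt,$$
so
$$dA = \sqrt{a^2 + b^2 + c^2}\,\left|\frac{\partial(u,v)}{\partial(s,t)}\right|ds\,dt, \quad\text{where } a = \frac{\partial(y,z)}{\partial(u,v)},\ b = \frac{\partial(z,x)}{\partial(u,v)},\ c = \frac{\partial(x,y)}{\partial(u,v)}.$$
But by the chain rule and the fact that the determinant of a product is the product
of the determinants, we have
$$\frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)} = \frac{\partial(y,z)}{\partial(s,t)},$$
and likewise for the other two terms. Hence, in the $st$-parametrization,
$$dA = \sqrt{\left(\frac{\partial(y,z)}{\partial(s,t)}\right)^2 + \left(\frac{\partial(z,x)}{\partial(s,t)}\right)^2 + \left(\frac{\partial(x,y)}{\partial(s,t)}\right)^2}\,ds\,dt,$$
which is exactly (5.21) with $(u, v)$ replaced by $(s, t)$.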
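$$\frac{\partial\mathbf G}{\partial x}\times\frac{\partial\mathbf G}{\partial y} = -(\partial_x\varphi)\mathbf i - (\partial_y\varphi)\mathbf j + \mathbf k.$$
(Note that our surface is a level set of the function $\Phi(x, y, z) = z - \varphi(x, y)$ and
that $-(\partial_x\varphi)\mathbf i - (\partial_y\varphi)\mathbf j + \mathbf k = \nabla\Phi$; we deduced that $\nabla\Phi$ is normal to the surface
by other means in Theorem 2.37.)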
and by (5.22), the area of the upper hemisphere is obtained by integrating this
function over the unit disc. (Note that this integral is improper, as the integrand
blows up along the boundary of the disc.) Switching to polar coordinates yields
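$$\frac{\partial(y,z)}{\partial(\varphi,\theta)} = \sin^2\varphi\cos\theta, \qquad \frac{\partial(z,x)}{\partial(\varphi,\theta)} = \sin^2\varphi\sin\theta, \qquad \frac{\partial(x,y)}{\partial(\varphi,\theta)} = \cos\varphi\sin\varphi,$$
so the sum of their squares is
$$\sin^4\varphi(\cos^2\theta + \sin^2\theta) + \cos^2\varphi\sin^2\varphi = \sin^2\varphi(\sin^2\varphi + \cos^2\varphi) = \sin^2\varphi.$$
Hence, by (5.21), the area of the sphere is
$$\int_0^{\pi}\!\!\int_0^{2\pi}\sin\varphi\,d\theta\,d\varphi = 4\pi.$$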
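$$\iint_S f\,dA = \iint_W f(\mathbf G(u, v))\left|\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v}\right|du\,dv.$$
For a graph $z = \varphi(x, y)$, $(x, y) \in W$, this becomes
$$\iint_S f\,dA = \iint_W f(x, y, \varphi(x, y))\sqrt{1 + (\partial_x\varphi)^2 + (\partial_y\varphi)^2}\,dx\,dy.$$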
$$\left(\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v}\right)du\,dv = \mathbf n\,dA,$$
where $\mathbf n$ is a unit normal vector to the surface $S$. We have already observed that
$dA$ is independent of the parametrization, and clearly so is $\mathbf n$ up to a factor of $\pm1$.
However, using a different parametrization (for example, interchanging $u$ and $v$)
might result in replacing $\mathbf n$ by $-\mathbf n$. In other words, a parametrization for a surface
$S$ gives a definite orientation for $S$, that is, a specification of which side of $S$ is
the "positive" side.
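Now suppose $S$ is a surface with a specified orientation, and $\mathbf F$ is a continuous
vector field defined on a neighborhood of $S$. The surface integral of $\mathbf F$ over $S$ is
defined to be
$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_W \mathbf F(\mathbf G(u, v))\cdot\left(\frac{\partial\mathbf G}{\partial u}\times\frac{\partial\mathbf G}{\partial v}\right)du\,dv,$$
where $\mathbf G$ is a parametrization compatible with the given orientation.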
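say $z = \varphi(x, y)$. As in the preceding discussion of surface area, we take $x$ and $y$
as the parameters and find that
$$\frac{\partial\mathbf G}{\partial x}\times\frac{\partial\mathbf G}{\partial y} = -(\partial_x\varphi)\mathbf i - (\partial_y\varphi)\mathbf j + \mathbf k.$$
The orientation here is the one with the normal pointing upward, since its $z$ component
is positive. Thus, if $\mathbf F = F_1\mathbf i + F_2\mathbf j + F_3\mathbf k$ and $\mathbf G(x, y) = (x, y, \varphi(x, y))$,
$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_W\Bigl[-F_1(\mathbf G(x,y))\,\partial_x\varphi - F_2(\mathbf G(x,y))\,\partial_y\varphi + F_3(\mathbf G(x,y))\Bigr]dx\,dy. \tag{5.23}$$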
This gives the right orientation since the $z$ component, namely $r$, is positive.
Thus,
$$\iint_S \mathbf F\cdot\mathbf n\,dA = \int_0^{2\pi}\!\!\int_0^1\left[(r\cos\theta)^2(-r\cos\theta) + (r\sin\theta)r(-r\sin\theta) + (r\sin\theta)r\right]dr\,d\theta,$$
whose value is easily found to be $-\frac14\pi$. Alternatively, we could use the representation
$z = \sqrt{x^2 + y^2}$ and use (5.23). The reader may verify that this leads
to
$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_{x^2+y^2<1} \cdots\,dx\,dy,$$
and conversion of this integral to polar coordinates leads to the same $r\theta$-integral
as before.
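The value of the $r\theta$-integral above can be confirmed numerically. Analytically, the terms containing $\cos^3\theta$ and $\sin\theta$ integrate to zero over $[0, 2\pi]$. The middle term gives $-\int_0^1 r^3\,dr\int_0^{2\pi}\sin^2\theta\,d\theta = -\frac14\pi$. The Python sketch below simply evaluates the displayed integrand, term by term, with nested Simpson rules.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def integrand(r, theta):
    # The r-theta integrand displayed in the text, term by term.
    c, s = math.cos(theta), math.sin(theta)
    return (r * c) ** 2 * (-r * c) + (r * s) * r * (-r * s) + (r * s) * r

flux = simpson(lambda theta: simpson(lambda r: integrand(r, theta), 0.0, 1.0),
               0.0, 2 * math.pi)
print(flux, -math.pi / 4)
```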
Finally, as a practical matter we need to extend the ideas in this section from
smooth surfaces to piecewise smooth surfaces. Giving a satisfactory general definition
of a "piecewise smooth surface" is a rather messy business, and we shall
not attempt it. For our present purposes, it will suffice to assume that the surface $S$
under consideration is the union of finitely many pieces $S_1, \ldots, S_k$ that satisfy the
following conditions:
ii. The intersections $S_i \cap S_j$ are either empty or finite unions of smooth curves.
$$\iint_S f\,dA = \sum_{j=1}^{k}\iint_{S_j} f\,dA.$$
Condition (ii) guarantees that the parts of $S$ that are counted more than once on
the right, namely the intersections $S_i \cap S_j$, contribute nothing to the integral, by
Propositions 4.19 and 4.22.
Example 3.
a. Let $S$ be the surface of a cube; then we can take $S_1, \ldots, S_6$ to be the faces
of the cube.
b. Let $S$ be the surface of the cylindrical solid $\{(x, y, z) : x^2 + y^2 \le 1,\ |z| \le 1\}$.
We can write $S = S_1 \cup S_2 \cup S_3$ where $S_1$ and $S_2$ are the discs forming
the top and bottom and $S_3$ is the circular vertical side. $S_1$ and $S_2$ can be
parametrized by $(x, y) \mapsto (x, y, 1)$ and $(x, y) \mapsto (x, y, -1)$ with $x^2 + y^2 \le 1$,
and $S_3$ can be parametrized by $(\theta, z) \mapsto (\cos\theta, \sin\theta, z)$ with $0 \le \theta \le 2\pi$
and $|z| \le 1$. If one wishes to use only one-to-one parametrizations with
compact parameter domains, one can cut $S_3$ further into two pieces, say
the left and right halves defined by $0 \le \theta \le \pi$ and $\pi \le \theta \le 2\pi$.
Remark. In condition (ii) above, we have in mind that the sets Sj will intersect
each other only along their edges, although there is nothing to forbid them from
crossing one another. For example, $S$ could be the union of the two spheres
$S_1 = \{\mathbf x : |\mathbf x| = 1\}$ and $S_2 = \{\mathbf x : |\mathbf x - \mathbf i| = 1\}$. This added generality is largely useless
but also harmless.
EXERCISES
1. Find the area of the part of the surface $z = xy$ inside the cylinder $x^2 + y^2 = a^2$.
2. Find the area of the part of the surface $z = x^2 + y^2$ inside the cylinder $x^2 + y^2 = a^2$.
3. Suppose $0 < a < b$. Find the area of the torus obtained by revolving the circle
$(x - b)^2 + z^2 = a^2$ in the $xz$-plane about the $z$ axis. (Hint: The torus may be
parametrized by $x = (b + a\cos\varphi)\cos\theta$, $y = (b + a\cos\varphi)\sin\theta$, $z = a\sin\varphi$,
with $0 \le \varphi, \theta \le 2\pi$.)
7. Compute $\iint_S (x^2 + y^2 - 2z^2)\,dA$ where $S$ is the unit sphere. Can you find the
answer by symmetry considerations without doing any calculations?
We are already familiar with this notation in connection with the gradient of a $C^1$
function on $\mathbb{R}^n$, which is the vector field defined by
$$\nabla f = (\partial_1 f, \ldots, \partial_n f).$$
(Some authors write $\operatorname{rot}\mathbf F$ instead of $\operatorname{curl}\mathbf F$; "rot" stands for "rotation.") Again,
the curl has a geometric significance that will be explained later, in §5.7.
We shall employ the notations $\operatorname{div}\mathbf F$ and $\operatorname{curl}\mathbf F$ in preference to $\nabla\cdot\mathbf F$ and $\nabla\times\mathbf F$
because they seem to be more readable. In this section we shall also write $\operatorname{grad} f$
instead of $\nabla f$ for the sake of consistency; later we shall use these two notations
interchangeably.
The operators grad, curl, and div satisfy product rules with respect to scalar
multiplication and dot and cross products. As these rules are useful and some of
them are not obvious, it is well to make a list for handy reference. In the following
formulas, $f$ and $g$ are real-valued functions and $\mathbf F$ and $\mathbf G$ are vector fields, all of
class $C^1$.
Equations (5.24) and (5.28) are valid in $\mathbb{R}^n$ for any $n$; the others, which involve
cross products and curls, are restricted to $n = 3$. The proofs of all these formulas
are just a matter of computation; we leave them to the reader as exercises.
We can combine the operations grad, curl, and div pairwise in several ways.
That is, if / and F are of class C2, we can form
It is an important fact that the first two of these always vanish, by the equality
of mixed partials:
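$$\mathrm{curl}(\mathrm{grad}\,f) = (\partial_2\partial_3 f - \partial_3\partial_2 f)\,\mathbf i + (\partial_3\partial_1 f - \partial_1\partial_3 f)\,\mathbf j + (\partial_1\partial_2 f - \partial_2\partial_1 f)\,\mathbf k = \mathbf 0 \tag{5.30}$$
and
$$\mathrm{div}(\mathrm{curl}\,\mathbf F) = \partial_1(\partial_2F_3 - \partial_3F_2) + \partial_2(\partial_3F_1 - \partial_1F_3) + \partial_3(\partial_1F_2 - \partial_2F_1) = 0. \tag{5.31}$$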
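Schematically, we have
$$\{\text{functions}\} \xrightarrow{\ \mathrm{grad}\ } \{\text{vector fields}\} \xrightarrow{\ \mathrm{curl}\ } \{\text{vector fields}\} \xrightarrow{\ \mathrm{div}\ } \{\text{functions}\},$$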
and (5.30) and (5.31) say that the composition of two successive mappings is zero.
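Because (5.30) and (5.31) rest only on the equality of mixed partials, they can be confirmed exactly for polynomials, whose partial derivatives commute on the nose. In the Python sketch below (an illustrative representation of our own, not from the text) a polynomial in $x, y, z$ is a dictionary mapping exponent triples to integer coefficients, and derivatives are computed exactly. The same helpers also confirm, for one sample field, the vector-field identity $\nabla^2\mathbf F = \mathrm{grad}(\mathrm{div}\,\mathbf F) - \mathrm{curl}(\mathrm{curl}\,\mathbf F)$ discussed next. The particular $f$ and $\mathbf F$ are hypothetical examples.

```python
# Polynomials in (x, y, z) are dicts {(a, b, c): coeff}, standing for the sum of coeff * x^a y^b z^c.

def d(poly, j):
    """Exact partial derivative with respect to variable j (0 = x, 1 = y, 2 = z)."""
    out = {}
    for e, c in poly.items():
        if e[j] > 0:
            k = e[:j] + (e[j] - 1,) + e[j + 1:]
            out[k] = out.get(k, 0) + c * e[j]
    return {k: v for k, v in out.items() if v != 0}

def lin(*terms):
    """Linear combination s1*p1 + s2*p2 + ... of (s, p) pairs, with zero terms dropped."""
    out = {}
    for s, p in terms:
        for k, v in p.items():
            out[k] = out.get(k, 0) + s * v
    return {k: v for k, v in out.items() if v != 0}

def grad(f):
    return [d(f, j) for j in range(3)]

def div(F):
    return lin(*[(1, d(F[j], j)) for j in range(3)])

def curl(F):
    return [lin((1, d(F[2], 1)), (-1, d(F[1], 2))),
            lin((1, d(F[0], 2)), (-1, d(F[2], 0))),
            lin((1, d(F[1], 0)), (-1, d(F[0], 1)))]

def lap(p):
    """Scalar Laplacian: sum of the pure second partials."""
    return lin(*[(1, d(d(p, j), j)) for j in range(3)])

# Hypothetical examples: f = 2x^3 y - 5y^2 z^2 + 7xyz and F = (x^2 y, 4x^2 - x z^3, 3xyz + z^2).
f = {(3, 1, 0): 2, (0, 2, 2): -5, (1, 1, 1): 7}
F = [{(2, 1, 0): 1}, {(2, 0, 0): 4, (1, 0, 3): -1}, {(1, 1, 1): 3, (0, 0, 2): 1}]

print(curl(grad(f)))   # [{}, {}, {}]: every component is the zero polynomial
print(div(curl(F)))    # {}: the zero polynomial

vector_lap = [lap(Fi) for Fi in F]   # Laplacian applied componentwise
other_side = [lin((1, g), (-1, c)) for g, c in zip(grad(div(F)), curl(curl(F)))]
print(vector_lap == other_side)      # True
```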
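The third combination, $\mathrm{div}(\mathrm{grad}\,f)$, which makes sense in any number of dimensions, is of fundamental importance for both physical and purely mathematical reasons. It is called the Laplacian of $f$ and is usually denoted by $\nabla^2 f$ or $\Delta f$:
$$\nabla^2 f = \mathrm{div}(\mathrm{grad}\,f) = \partial_1^2 f + \cdots + \partial_n^2 f.$$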
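The last two combinations are of less interest by themselves, but together they yield the Laplacian for vector fields in $\mathbb R^3$:
$$\nabla^2\mathbf F = \mathrm{grad}(\mathrm{div}\,\mathbf F) - \mathrm{curl}(\mathrm{curl}\,\mathbf F), \tag{5.33}$$
where $\nabla^2\mathbf F$ denotes the vector field obtained by applying $\nabla^2$ to each component of $\mathbf F$.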
EXERCISES
7. Prove (5.33).
8. Why is the minus sign in (5.29) there? That is, on grounds of symmetry, without going through any calculations, why must the formula $\mathrm{div}(\mathbf F\times\mathbf G) = \mathbf G\cdot(\mathrm{curl}\,\mathbf F) + \mathbf F\cdot(\mathrm{curl}\,\mathbf G)$ be wrong?
9. Show that for any $C^2$ functions $f$ and $g$, $\mathrm{div}(\mathrm{grad}\,f\times\mathrm{grad}\,g) = 0$.
where $W$ is a regular region in the $xy$-plane and $\varphi_1$ and $\varphi_2$ are piecewise smooth functions on $W$. We define the notions of $yz$-simple and $xz$-simple similarly, and we say that $R$ is simple if it is $xy$-simple, $yz$-simple, and $xz$-simple.
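Suppose now that $R$ is simple. We shall prove the divergence theorem for the region $R$ by considering the components of $\mathbf F$ separately. That is, let $\mathbf F = F_1\mathbf i + F_2\mathbf j + F_3\mathbf k$; we shall show that $\iint_{\partial R} F_3\mathbf k\cdot\mathbf n\,dA = \iiint_R \partial_3F_3\,dV$, and similarly for the other two components. Since $R$ is $xy$-simple, its boundary consists of the top surface $z = \varphi_2(x,y)$, on which $\mathbf k\cdot\mathbf n\,dA = dx\,dy$, the bottom surface $z = \varphi_1(x,y)$, on which $\mathbf k\cdot\mathbf n\,dA = -dx\,dy$, and possibly some vertical sides, on which $\mathbf k\cdot\mathbf n = 0$. Hence, by the fundamental theorem of calculus,
$$\iint_{\partial R} F_3\mathbf k\cdot\mathbf n\,dA = \iint_W F_3(x,y,\varphi_2(x,y))\,dx\,dy - \iint_W F_3(x,y,\varphi_1(x,y))\,dx\,dy = \iint_W\int_{\varphi_1(x,y)}^{\varphi_2(x,y)} \partial_3F_3(x,y,z)\,dz\,dx\,dy = \iiint_R \partial_3F_3(x,y,z)\,dV,$$
as claimed. The proof for $F_1\mathbf i$ and $F_2\mathbf j$ is the same, using the assumptions that $R$ is $yz$-simple and $xz$-simple.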
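The argument above is easy to test numerically on the simplest simple region, the unit cube $[0,1]^3$. The Python sketch below uses the hypothetical field $\mathbf F = x^2y\,\mathbf i + yz^3\,\mathbf j + (x + z^2)\,\mathbf k$, with $\mathrm{div}\,\mathbf F = 2xy + z^3 + 2z$ computed by hand, and approximates both the outward flux through the six faces and the volume integral of $\mathrm{div}\,\mathbf F$ by the midpoint rule (our choice of quadrature). Both should be close to the exact common value $7/4$.

```python
def F(x, y, z):
    """Hypothetical example field."""
    return (x * x * y, y * z ** 3, x + z * z)

def div_F(x, y, z):
    """Divergence of F, computed by hand."""
    return 2 * x * y + z ** 3 + 2 * z

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def volume_integral(g, n=40):
    """Midpoint-rule approximation of the integral of g over the unit cube."""
    h = 1.0 / n
    m = midpoints(n)
    return sum(g(x, y, z) for x in m for y in m for z in m) * h ** 3

def boundary_flux(F, n=40):
    """Outward flux of F through the six faces of the unit cube (midpoint rule on each face)."""
    h = 1.0 / n
    total = 0.0
    for s in midpoints(n):
        for t in midpoints(n):
            total += F(1, s, t)[0] - F(0, s, t)[0]   # faces x = 1 (n = i) and x = 0 (n = -i)
            total += F(s, 1, t)[1] - F(s, 0, t)[1]   # faces y = 1 and y = 0
            total += F(s, t, 1)[2] - F(s, t, 0)[2]   # faces z = 1 and z = 0
    return total * h * h

flux = boundary_flux(F)
vol = volume_integral(div_F)
print(flux, vol)   # both close to 1.75
```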
It now follows that the divergence theorem is valid for regions that can be cut up into finitely many simple regions $R_1, \ldots, R_k$. The integrals of $\mathrm{div}\,\mathbf F$ over the regions $R_1, \ldots, R_k$ add up to the integral over $R$, and the integrals of $\mathbf F\cdot\mathbf n$ over the boundaries $\partial R_1, \ldots, \partial R_k$ add up to the integral over $\partial R$ because the integrals over the portions of the $\partial R_j$'s that are not part of $\partial R$ cancel out. (The reasoning is the same as in the proof of Green's theorem.)
The completion of the proof for general regular regions with smooth boundary,
with indications of how to generalize it to the piecewise smooth case, is given in
Appendix B.7 (Theorem B.30). □
The integral on the right is the flux of $\mathbf F$ across $\partial B_r$ from the inside ($B_r$) to the outside (the complement of $B_r$). If we think of the vector field as representing the flow of some substance through space, the integral represents the amount of substance flowing out of $B_r$ minus the amount of substance flowing in; thus, the condition $\mathrm{div}\,\mathbf F(\mathbf a) > 0$ means that there is a net outflow near $\mathbf a$, in other words, that $\mathbf F$ tends to "diverge" from $\mathbf a$. (The effect is subtle, though: One has to divide the flux by $r^3$ in (5.36) to get something that does not vanish in the limit.) In any
case, the integral in (5.36) is a geometrically defined quantity that is independent
of the choice of coordinates; this gives the promised coordinate-free interpretation
of div F.
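To see the flux-per-volume interpretation at work, here is a minimal numerical sketch in Python (standard library only). It assumes, as the remark about dividing by $r^3$ suggests, that (5.36) expresses $\mathrm{div}\,\mathbf F(\mathbf a)$ as the limit of the flux across $\partial B_r$ divided by the volume $\frac43\pi r^3$ of $B_r$. The field `F`, its hand-computed divergence `div_F`, the point `a`, and the quadrature (a midpoint rule in spherical angles) are all illustrative choices, not taken from the text.

```python
import math

def F(x, y, z):
    """Hypothetical example field."""
    return (x ** 3, y * z, math.sin(z))

def div_F(x, y, z):
    """Divergence of F, computed by hand."""
    return 3 * x ** 2 + z + math.cos(z)

def sphere_flux(F, a, r, n=60):
    """Outward flux of F through the sphere |x - a| = r, by the midpoint rule in spherical angles."""
    dth = math.pi / n            # polar-angle step
    dph = math.pi / n            # azimuthal step (2n steps cover 2*pi)
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        for k in range(2 * n):
            ph = (k + 0.5) * dph
            nvec = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            Fx = F(*(a[m] + r * nvec[m] for m in range(3)))
            F_dot_n = sum(Fx[m] * nvec[m] for m in range(3))
            total += F_dot_n * r * r * math.sin(th) * dth * dph   # dA = r^2 sin(th) dth dph
    return total

a = (0.5, -0.2, 0.3)
ratios = {}
for r in (0.5, 0.1, 0.02):
    ratios[r] = sphere_flux(F, a, r) / (4 / 3 * math.pi * r ** 3)
    print(r, ratios[r])
print("div F(a) =", div_F(*a))
```

The ratio for the largest radius differs noticeably from $\mathrm{div}\,\mathbf F(\mathbf a)$, and it settles toward that value as $r$ shrinks.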
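Among the important consequences of the divergence theorem are the following identities:
$$\iint_{\partial R} f\nabla g\cdot\mathbf n\,dA = \iiint_R \left(f\nabla^2 g + \nabla f\cdot\nabla g\right)dV, \tag{5.38}$$
$$\iint_{\partial R} (f\nabla g - g\nabla f)\cdot\mathbf n\,dA = \iiint_R \left(f\nabla^2 g - g\nabla^2 f\right)dV. \tag{5.39}$$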
EXERCISES
1. Use the divergence theorem to evaluate the surface integral $\iint_S \mathbf F\cdot\mathbf n\,dA$ for the following $\mathbf F$ and $S$, where $S$ is oriented so that the positive normal points out of the region bounded by $S$.
a. F, S as in Exercise 8b in §5.3.
b. F, S as in Exercise 8e in §5.3.
c. $\mathbf F(x,y,z) = x^2\mathbf i + y^2\mathbf j + z^2\mathbf k$; $S$ is the surface of the cube $0 \le x, y, z \le a$.
d. $\mathbf F(x,y,z) = (x/a^2)\mathbf i + (y/b^2)\mathbf j + (z/c^2)\mathbf k$; $S$ is the ellipsoid $(x/a)^2 + (y/b)^2 + (z/c)^2 = 1$.
e. $\mathbf F(x,y,z) = x^2\mathbf i - 2xy\mathbf j + z^2\mathbf k$; $S$ is the surface of the cylindrical solid $\{(x,y,z) : (x,y)\in W,\ 1 \le z \le 2\}$ where $W$ is a smoothly bounded regular region in the plane with area $A$.
2. Let $\mathbf F(x,y,z) = (x^2 + y^2 + z^2)(x\mathbf i + y\mathbf j + z\mathbf k)$ and let $S$ be the sphere of radius $a$ about the origin. Compute $\iint_S \mathbf F\cdot\mathbf n\,dA$ both directly and by the divergence theorem.
where $n_x$ is the $x$-component of the unit outward normal to $\partial R$. (Of course, similar formulas also hold with $x$ replaced by $y$ and $z$.)
In this section we illustrate the uses of the divergence theorem by deriving some important differential equations of mathematical physics. We make a standing assumption that all unspecified mathematical functions that denote physical quantities are smooth enough to ensure the validity of the calculations.
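[Figure 5.8, panels (a) and (b): the surface element $dS$ and the velocity $\mathbf v$.]
$|\mathbf v|\,dt\,dA$, so the amount of substance in the box is $\rho|\mathbf v|\,dt\,dA$. In short, the rate of flow of substance through $dS$ is $\rho|\mathbf v|\,dA$.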
Now suppose, more generally, that the angle from the velocity $\mathbf v$ to the normal $\mathbf n$ to $dS$ is $\theta$. We apply the same reasoning to the box in Figure 5.8b. The vertical height of the box is now $|\cos\theta|$ times the slant height of $dS$, so the volume of the box is $|\mathbf v|\,|\cos\theta|\,dt\,dA = |\mathbf v\cdot\mathbf n|\,dt\,dA$. Therefore, the rate of flow of substance through $dS$ is $\rho\mathbf v\cdot\mathbf n\,dA$ if we take orientation into account, that is, if we count the flow as negative when it goes in across $dS$ in the direction opposite to $\mathbf n$.
Passing from the infinitesimal level to the macroscopic level, we conclude that the rate of flow of substance through a surface $S$ is
$$\iint_S \rho\mathbf v\cdot\mathbf n\,dA.$$
The time-dependent vector field $\mathbf J = \rho\mathbf v$ that occurs here represents the momentum density if $\rho$ is the mass density of the substance, and it represents the current density if the substance is electric charge and $\rho$ is the charge density. Our earlier remarks
about interpreting vector fields in terms of flows really mean thinking of the vector
field as a momentum or current density.
$$\frac{d}{dt}\iiint_R \rho(\mathbf x,t)\,d^3\mathbf x = -\iint_{\partial R}\mathbf J\cdot\mathbf n\,dA.$$
5.6. Some Applications to Physics 245
(The integral on the right is positive when the substance flows out of $R$, i.e., when the amount of substance in $R$ is decreasing; hence the minus sign.) The quantity on the left is the integral over $R$ of $\partial\rho/\partial t$, by Theorem 4.47. We can use the divergence theorem to convert the integral on the right to another integral over $R$, obtaining
$$\iiint_R \frac{\partial\rho}{\partial t}\,d^3\mathbf x = -\iiint_R \mathrm{div}\,\mathbf J\,d^3\mathbf x. \tag{5.40}$$
Now, this relation holds for any region $R$. In particular, let us take $R = B_r$ to be the ball of radius $r$ centered at the point $\mathbf x$. After division of both sides by the volume of $B_r$, (5.40) says that the mean values of $\partial\rho/\partial t$ and $-\mathrm{div}\,\mathbf J$ on $B_r$ are equal. Letting $r \to 0$ and assuming that these functions are continuous, we see that their values at the center $\mathbf x$ are equal. In short, we have
$$\frac{\partial\rho}{\partial t} + \mathrm{div}\,\mathbf J = 0, \tag{5.41}$$
the classic differential equation relating the charge and current densities (or mass
and momentum densities, etc.).
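A discrete version of the argument leading to (5.41) is the basis of finite-volume methods. The Python sketch below is our own illustration, not a method from the text. It treats one space dimension with periodic boundary conditions and a constant velocity $v > 0$, so $J = v\rho$, and updates each cell by the difference of the fluxes through its two faces (upwind fluxes). Because every face flux leaves one cell and enters its neighbor, the sum telescopes and the total amount of substance is conserved up to rounding error, mirroring the role of the divergence theorem. The initial profile, grid size, and time step are hypothetical.

```python
import math

def upwind_step(rho, v, dx, dt):
    """One finite-volume step for d(rho)/dt + dJ/dx = 0 with J = v * rho (v > 0, periodic cells).
    J[i] is the flux through the right face of cell i; J[i - 1] (wrapping at i = 0) enters from the left."""
    J = [v * r for r in rho]
    return [rho[i] - dt / dx * (J[i] - J[i - 1]) for i in range(len(rho))]

n, v = 100, 1.0
dx = 1.0 / n
dt = 0.5 * dx / v          # Courant number 0.5, so the upwind scheme is stable
rho = [math.exp(-100 * ((i + 0.5) * dx - 0.3) ** 2) for i in range(n)]
mass_before = sum(rho) * dx
for _ in range(200):
    rho = upwind_step(rho, v, dx, dt)
mass_after = sum(rho) * dx
print(mass_before, mass_after)   # equal up to rounding error
```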
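This argument is reversible; that is, (5.41) implies that the substance is conserved. Indeed, suppose $R$ is a regular region such that no substance flows in or out of $R$. Integrating (5.41) and using Theorem 4.47 and the divergence theorem, we obtain
$$\frac{d}{dt}\iiint_R \rho\,d^3\mathbf x = \iiint_R \frac{\partial\rho}{\partial t}\,d^3\mathbf x = -\iiint_R \mathrm{div}\,\mathbf J\,d^3\mathbf x = -\iint_{\partial R}\mathbf J\cdot\mathbf n\,dA = 0.$$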
The Heat Equation. We now derive a mathematical model for the transfer
of heat through a substance by diffusion. (If the substance in question is a fluid
like water or air, our model does not take convection effects into account; we must
assume that the fluid is immobile on the macroscopic scale. But our model is valid
for the diffusion of heat in solids as well as in fluids that cannot flow readily, such
as air in a down jacket.) Our model will take the form of a differential equation for
the temperature $u(\mathbf x, t)$ at position $\mathbf x$ and time $t$.
The first basic physical assumption (which may be a simplification of the real-life situation) is that the thermal energy density is proportional to the temperature. The constant of proportionality $\sigma$ is the specific heat density; it is the product of the usual specific heat or heat capacity and the mass density of the substance. The total thermal energy (or "heat," for short) within a region $R$ at time $t$ is then
$$\iiint_R \sigma u(\mathbf x,t)\,d^3\mathbf x.$$
The next assumption is Newton's law of cooling, which says that heat flows
from hotter to colder regions at a rate proportional to the difference in temperature.
In our situation, the precise interpretation of this statement is that the flux of heat
per unit area in the direction of the unit vector n at the point x is proportional to the
directional derivative $\nabla u(\mathbf x)\cdot\mathbf n$ of the temperature in the direction $\mathbf n$, the constant of proportionality being negative since heat flows in the direction of decreasing temperature. Denoting the constant of proportionality by $-K$, then, we see that the flux of heat across an oriented surface $S$ with normal vector $\mathbf n$ is
$$-\iint_S K\nabla u\cdot\mathbf n\,dA.$$
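$$\frac{d}{dt}\iiint_R \sigma u(\mathbf x,t)\,d^3\mathbf x = \iint_{\partial R} K\nabla u(\mathbf x,t)\cdot\mathbf n\,dA + \iiint_R F(\mathbf x,t)\,d^3\mathbf x.$$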
Here $\mathbf n$ denotes the unit outward normal to $\partial R$, as usual, and the minus sign on the
surface integral has disappeared because a positive flow of heat out of R represents
a decrease of heat in R.
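With $F = 0$ and $\sigma$, $K$ constant, this balance law leads (via the divergence theorem, as in the preceding subsection) to $\partial u/\partial t = (K/\sigma)\nabla^2 u$. As a rough illustration, the following Python sketch solves the one-dimensional version $u_t = (K/\sigma)u_{xx}$ on $[0,1]$ with $u = 0$ at both ends by the standard explicit finite-difference scheme (a technique brought in here, not one used in the text) and compares the result with the exact solution $\sin(\pi x)\,e^{-(K/\sigma)\pi^2 t}$. The constants `sigma`, `K`, the grid, and the step count are hypothetical.

```python
import math

def heat_explicit(u0, k, dx, dt, steps):
    """Explicit finite differences for u_t = k u_xx on a uniform grid, with u = 0 at both ends.
    The scheme is stable only when k * dt / dx**2 <= 1/2."""
    lam = k * dt / dx ** 2
    if lam > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    u = list(u0)
    for _ in range(steps):
        interior = [u[i] + lam * (u[i + 1] - 2 * u[i] + u[i - 1]) for i in range(1, len(u) - 1)]
        u = [0.0] + interior + [0.0]
    return u

sigma, K = 2.0, 0.5            # hypothetical specific heat density and conductivity
k = K / sigma                  # the equation becomes u_t = (K / sigma) u_xx when F = 0
n = 50
dx = 1.0 / n
dt = 0.4 * dx ** 2 / k         # k * dt / dx**2 = 0.4, inside the stability limit
steps = 500
xs = [i * dx for i in range(n + 1)]

u = heat_explicit([math.sin(math.pi * x) for x in xs], k, dx, dt, steps)
t = steps * dt
exact = [math.sin(math.pi * x) * math.exp(-k * math.pi ** 2 * t) for x in xs]
err = max(abs(p - q) for p, q in zip(u, exact))
print(t, err)                  # the error is small compared with the peak value exp(-k pi^2 t)
```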
As in the preceding subsection, we bring the $d/dt$ inside the integral and apply the divergence theorem to obtain
$$\iiint_R \left(\sigma\frac{\partial u}{\partial t} - K\nabla^2 u - F\right)d^3\mathbf x = 0.$$
Since this holds for an arbitrary regular region $R$, we conclude as before that
$$\sigma\frac{\partial u}{\partial t} = K\nabla^2 u + F.$$
$$\mathbf E = -\nabla u.$$
(For any points $\mathbf x_1$ and $\mathbf x_2$, $u(\mathbf x_2) - u(\mathbf x_1)$ is the work done in moving a unit positive charge from $\mathbf x_1$ to $\mathbf x_2$ through the field $\mathbf E$.)
If, instead of a single charge at one point, our system of charges consists of
a number of charges located at different points, the electric field (resp. electric
potential) generated by the system is just the sum of the fields (resp. potentials)
generated by the individual charges. We wish to consider the case where there
is a continuous distribution of charge (an idealization, but a useful one) in some
bounded region of space. That is, we are given a charge density function $\rho(\mathbf x)$, a
continuous function that vanishes outside some bounded set R. The field generated
by such a charge distribution is found in the usual way: Chop up the set R into tiny
pieces, treat the charge coming from each piece as a point charge, and add up the
resulting fields or potentials. We shall work primarily with the potentials, for which
the result is
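$$u(\mathbf x) = \iiint \frac{\rho(\mathbf y)}{|\mathbf y - \mathbf x|}\,d^3\mathbf y. \tag{5.43}$$
$$u(\mathbf x) = \iiint \frac{\rho(\mathbf x + \mathbf y)}{|\mathbf y|}\,d^3\mathbf y. \tag{5.44}$$
$$\nabla^2\left(|\mathbf y|^{-1}\right) = 0 \quad\text{for } \mathbf y \ne \mathbf 0. \tag{5.45}$$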
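Proof. We can differentiate $u$ by passing the derivatives under the integral sign. They fall on $\rho$, which is assumed to be of class $C^2$, so $u$ is of class $C^2$ and
$$\nabla^2 u(\mathbf x) = \iiint \frac{\nabla^2\rho(\mathbf x + \mathbf y)}{|\mathbf y|}\,d^3\mathbf y.$$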
(Strictly speaking, Theorem 4.47 does not apply because of the singularity of the integrand at the origin, but this is a minor technicality. One can finesse the problem, for example, by switching to spherical coordinates, in which the $r^2\sin\varphi$ coming from the volume element cancels the $r^{-1}$ of the integrand with room to spare.)
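Here $\nabla^2\rho(\mathbf x + \mathbf y)$ is obtained by differentiating $\rho$ with respect to $\mathbf x$, but the same result is obtained by taking the derivatives with respect to $\mathbf y$, for $\partial_{x_j}[\rho(\mathbf x + \mathbf y)] = (\partial_j\rho)(\mathbf x + \mathbf y) = \partial_{y_j}[\rho(\mathbf x + \mathbf y)]$. We can therefore use Green's formula to transfer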
the derivatives to $|\mathbf y|^{-1}$. We need to take some care, however, since the singularity of $|\mathbf y|^{-1}$ does not remain harmless after being differentiated twice.
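Let us fix the point $\mathbf x$ and choose positive numbers $\epsilon$ and $K$, with $\epsilon < 1$ and $K$ large enough so that $\rho(\mathbf x + \mathbf y) = 0$ if $|\mathbf y| > K - 1$. Let $R_{\epsilon,K} = \{\mathbf y : \epsilon < |\mathbf y| < K\}$. We then have
$$\nabla^2 u(\mathbf x) = \lim_{\epsilon\to 0}\iiint_{R_{\epsilon,K}} \frac{\nabla^2\rho(\mathbf x + \mathbf y)}{|\mathbf y|}\,d^3\mathbf y.$$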
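The integrand has no singularities in the region $R_{\epsilon,K}$, so we can apply Green's formula (5.39) to obtain
$$\nabla^2 u(\mathbf x) = \lim_{\epsilon\to 0}\left[\iiint_{R_{\epsilon,K}} \rho(\mathbf x + \mathbf y)\,\nabla^2(|\mathbf y|^{-1})\,d^3\mathbf y + \iint_{\partial R_{\epsilon,K}} \left(\frac{\nabla\rho(\mathbf x + \mathbf y)}{|\mathbf y|} - \rho(\mathbf x + \mathbf y)\,\nabla(|\mathbf y|^{-1})\right)\cdot\mathbf n\,dA\right].$$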
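The integral over $R_{\epsilon,K}$ on the right vanishes by (5.45). Also, the boundary of $R_{\epsilon,K}$ consists of two pieces, the sphere $|\mathbf y| = K$ and the sphere $|\mathbf y| = \epsilon$, and the integral over $|\mathbf y| = K$ is zero because $\rho(\mathbf x + \mathbf y)$ and its derivatives vanish for $|\mathbf y| > K - 1$. Therefore,
$$\nabla^2 u(\mathbf x) = \lim_{\epsilon\to 0}\left[\iint_{|\mathbf y| = \epsilon} \frac{\nabla\rho(\mathbf x + \mathbf y)\cdot\mathbf n}{|\mathbf y|}\,dA - \iint_{|\mathbf y| = \epsilon} \rho(\mathbf x + \mathbf y)\,\nabla(|\mathbf y|^{-1})\cdot\mathbf n\,dA\right]. \tag{5.47}$$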
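If $C$ is an upper bound for $|\nabla\rho|$, the first term on the right of (5.47) satisfies
$$\left|\iint_{|\mathbf y| = \epsilon} \frac{\nabla\rho(\mathbf x + \mathbf y)\cdot\mathbf n}{|\mathbf y|}\,dA\right| \le \iint_{|\mathbf y| = \epsilon} \frac{C}{\epsilon}\,dA = \frac{C}{\epsilon}\cdot 4\pi\epsilon^2 = 4\pi C\epsilon, \tag{5.48}$$
which vanishes as $\epsilon \to 0$. To evaluate the second term in (5.47), we observe that $\mathbf n = -\epsilon^{-1}\mathbf y$. (See the remark preceding the exercises in §5.5.) An easy calculation gives $\nabla(|\mathbf y|^{-1}) = -\mathbf y/|\mathbf y|^3$, so $\nabla(|\mathbf y|^{-1})\cdot\mathbf n = \epsilon^{-1}|\mathbf y|^2/|\mathbf y|^3 = \epsilon^{-2}$. Therefore, (5.47) and (5.48) show that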
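$$\nabla^2 u(\mathbf x) = -\lim_{\epsilon\to 0}\iint_{|\mathbf y| = \epsilon} \frac{\rho(\mathbf x + \mathbf y)}{\epsilon^2}\,dA = (-4\pi)\lim_{\epsilon\to 0}\left[\frac{1}{4\pi\epsilon^2}\iint_{|\mathbf y| = \epsilon} \rho(\mathbf x + \mathbf y)\,dA\right].$$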
But the expression inside the brackets is just the mean value of $\rho(\mathbf x + \mathbf y)$ on the sphere $|\mathbf y| = \epsilon$, which tends to $\rho(\mathbf x)$ as $\epsilon \to 0$, so the proof is complete. □
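The result just proved, $\nabla^2 u = -4\pi\rho$, can be checked on an explicit example. For a ball of radius $R_0$ with constant charge density $\rho_0$, the potential (5.43) has the standard closed form $u = 2\pi\rho_0(R_0^2 - r^2/3)$ for $r \le R_0$ and $u = \frac43\pi\rho_0R_0^3/r$ for $r \ge R_0$, where $r = |\mathbf x|$. This is a well-known fact that we bring in; it is not derived in the text. The Python sketch below applies a finite-difference Laplacian and compares the result with $-4\pi\rho$. Since this $\rho$ is discontinuous at $r = R_0$, the $C^2$ hypothesis fails there, so we only test points away from the surface of the ball. The values `RHO0` and `R0` are hypothetical.

```python
import math

RHO0, R0 = 0.7, 1.0            # hypothetical uniform charge density and ball radius

def u(x, y, z):
    """Potential (5.43) of a uniformly charged ball, via the standard closed form."""
    r = math.sqrt(x * x + y * y + z * z)
    if r <= R0:
        return 2 * math.pi * RHO0 * (R0 ** 2 - r ** 2 / 3)
    return 4 * math.pi * RHO0 * R0 ** 3 / (3 * r)

def rho(x, y, z):
    """Charge density: RHO0 inside the ball, 0 outside."""
    return RHO0 if math.sqrt(x * x + y * y + z * z) <= R0 else 0.0

def laplacian(f, p, h=1e-3):
    """Seven-point finite-difference approximation of the Laplacian of f at p."""
    total = -6 * f(*p)
    for j in range(3):
        for s in (h, -h):
            q = list(p)
            q[j] += s
            total += f(*q)
    return total / h ** 2

for p in [(0.2, 0.1, -0.3), (-0.5, 0.4, 0.2), (1.5, 0.4, 0.9)]:
    print(p, laplacian(u, p), -4 * math.pi * rho(*p))
```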
5.49 Corollary. The electric field $\mathbf E$ is related to the charge density $\rho$ by $\mathrm{div}\,\mathbf E = 4\pi\rho$.
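density $\rho$, and the current density $\mathbf J$. In suitably normalized units, they are
$$\begin{aligned} \mathrm{div}\,\mathbf E &= 4\pi\rho, & \mathrm{curl}\,\mathbf E &= -\frac{1}{c}\frac{\partial\mathbf B}{\partial t},\\ \mathrm{div}\,\mathbf B &= 0, & \mathrm{curl}\,\mathbf B &= \frac{1}{c}\frac{\partial\mathbf E}{\partial t} + \frac{4\pi}{c}\mathbf J, \end{aligned} \tag{5.50}$$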
where c is the speed of light. This is not the place for a thorough study of Maxwell's
equations and their consequences for physics, but we wish to point out a couple of
features of them in connection with the ideas we have been developing. In what
follows we shall assume that all functions in question are of class C2, so that the
second derivatives make sense and the mixed partials are equal.
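First, Maxwell's equations contain the law of conservation of charge. Indeed, by formula (5.31) we have
$$\frac{\partial\rho}{\partial t} = \frac{1}{4\pi}\,\mathrm{div}\,\frac{\partial\mathbf E}{\partial t} = \frac{c}{4\pi}\,\mathrm{div}(\mathrm{curl}\,\mathbf B) - \mathrm{div}\,\mathbf J = -\mathrm{div}\,\mathbf J,$$
and this is the conservation law in the form (5.41). Second, in a region of space with no charges or currents ($\rho = 0$ and $\mathbf J = \mathbf 0$), by formula (5.33) we have
$$\nabla^2\mathbf E = \mathrm{grad}(\mathrm{div}\,\mathbf E) - \mathrm{curl}(\mathrm{curl}\,\mathbf E) = \mathbf 0 + \frac{1}{c}\,\mathrm{curl}\,\frac{\partial\mathbf B}{\partial t} = \frac{1}{c}\,\frac{\partial}{\partial t}\,\mathrm{curl}\,\mathbf B = \frac{1}{c^2}\frac{\partial^2\mathbf E}{\partial t^2}$$
and
$$\nabla^2\mathbf B = \mathrm{grad}(\mathrm{div}\,\mathbf B) - \mathrm{curl}(\mathrm{curl}\,\mathbf B) = \mathbf 0 - \frac{1}{c}\,\mathrm{curl}\,\frac{\partial\mathbf E}{\partial t} = -\frac{1}{c}\,\frac{\partial}{\partial t}\,\mathrm{curl}\,\mathbf E = \frac{1}{c^2}\frac{\partial^2\mathbf B}{\partial t^2}.$$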
EXERCISES
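5.52 Theorem (Stokes's Theorem). Let $S$ and $\partial S$ be as described above, and let $\mathbf F$ be a $C^1$ vector field defined on some neighborhood of $S$ in $\mathbb R^3$. Then
$$\oint_{\partial S}\mathbf F\cdot d\mathbf x = \iint_S (\mathrm{curl}\,\mathbf F)\cdot\mathbf n\,dA.$$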
Here are the precise definitions: A point $\mathbf x\in S$ is in the interior of $S$ relative to $S_0$ if it has a neighborhood $U$ (in $\mathbb R^3$) such that $U\cap S_0\subset S$; it is in the boundary of $S$ relative to $S_0$ if all of its neighborhoods contain points in $S$ and points in $S_0\setminus S$. $S$ is regular if it is compact and every neighborhood of every (relative) boundary point contains points in the (relative) interior.
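On the other hand, since the formalism of differentials automatically encodes the chain rule,
$$\int_{\partial S} F\,dx = \int_{\partial W} F\left(\frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv\right).$$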
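(In both of these equations, $F$ and its derivatives are evaluated at $\mathbf G(u,v)$.) We apply Green's theorem to this last line integral:
$$\int_{\partial W} F\left(\frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv\right) = \iint_W \left[\frac{\partial}{\partial u}\left(F\frac{\partial x}{\partial v}\right) - \frac{\partial}{\partial v}\left(F\frac{\partial x}{\partial u}\right)\right]du\,dv.$$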
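By the product rule and the chain rule, the integrand on the right equals
$$\begin{aligned} &\left(\frac{\partial F}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial u}\right)\frac{\partial x}{\partial v} + F\frac{\partial^2 x}{\partial u\,\partial v} - \left(\frac{\partial F}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial v}\right)\frac{\partial x}{\partial u} - F\frac{\partial^2 x}{\partial v\,\partial u}\\ &\qquad = \frac{\partial F}{\partial z}\frac{\partial(z,x)}{\partial(u,v)} - \frac{\partial F}{\partial y}\frac{\partial(x,y)}{\partial(u,v)}. \end{aligned}$$
But this is the integrand on the right side of (5.55), so (5.54) is proved.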
5.7. Stokes's Theorem 255
$$\oint_C \mathbf F\cdot d\mathbf x = \iint_{x^2 + y^2 < 1} (1 - 2y)\,dx\,dy = \pi.$$
(No computation is necessary here; the integral of 1 is the area of the disc and the integral of $-2y$ vanishes by symmetry.)
There is an interesting feature of Stokes's theorem that does not appear in its siblings. A closed curve in $\mathbb R^2$ is the boundary of just one regular region in $\mathbb R^2$, and a closed surface in $\mathbb R^3$ is the boundary of just one regular region in $\mathbb R^3$; but a closed curve in $\mathbb R^3$ is the boundary of infinitely many surfaces in $\mathbb R^3$! For example, the unit circle in the $xy$-plane is the boundary of the unit disc in the $xy$-plane, the upper and lower hemispheres of the unit sphere in $\mathbb R^3$, the portion of the paraboloid $z = 1 - x^2 - y^2$ lying above the unit disc, and so forth. Stokes's theorem says that if $C$ is a closed curve in $\mathbb R^3$ and $S$ is any oriented surface bounded by $C$, then
$$\oint_C \mathbf F\cdot d\mathbf x = \iint_S (\mathrm{curl}\,\mathbf F)\cdot\mathbf n\,dA$$
for any $C^1$ vector field $\mathbf F$, provided that the orientations on $C$ and $S$ are compatible.
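A quick numerical check of this independence, as a sketch: for the hypothetical field $\mathbf F = -y\,\mathbf i + x\,\mathbf j + xz\,\mathbf k$ (so $\mathrm{curl}\,\mathbf F = -z\,\mathbf j + 2\mathbf k$, computed by hand), the Python code below evaluates the line integral over the counterclockwise unit circle and the flux of $\mathrm{curl}\,\mathbf F$ through three of the surfaces just mentioned (the disc, the paraboloid, and the upper hemisphere), all with upward normals. The midpoint-rule quadrature and the parametrizations are our own choices. All four numbers should come out close to $2\pi$.

```python
import math

def F(x, y, z):
    """Hypothetical example field."""
    return (-y, x, x * z)

def curl_F(x, y, z):
    """curl F = (d2F3 - d3F2, d3F1 - d1F3, d1F2 - d2F1), computed by hand."""
    return (0.0, -z, 2.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def circle_integral(n=400):
    """Line integral of F over the unit circle in the xy-plane, traversed counterclockwise."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += dot(F(math.cos(t), math.sin(t), 0.0), (-math.sin(t), math.cos(t), 0.0)) * dt
    return total

def surface_flux(point, n_dA, u_max, v_max, n=200):
    """Flux of curl F through the surface (u, v) -> point(u, v) on [0, u_max] x [0, v_max].
    n_dA(u, v) is the upward normal scaled by the area factor, so the flux is the
    double integral of curl F(point(u, v)) . n_dA(u, v) du dv (midpoint rule)."""
    du, dv = u_max / n, v_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            v = (j + 0.5) * dv
            total += dot(curl_F(*point(u, v)), n_dA(u, v)) * du * dv
    return total

results = {
    "circle": circle_integral(),
    # unit disc in polar coordinates (r, t): n dA = (0, 0, r) dr dt
    "disc": surface_flux(lambda r, t: (r * math.cos(t), r * math.sin(t), 0.0),
                         lambda r, t: (0.0, 0.0, r), 1.0, 2 * math.pi),
    # paraboloid z = 1 - x^2 - y^2: n dA = (2x, 2y, 1) r dr dt
    "paraboloid": surface_flux(lambda r, t: (r * math.cos(t), r * math.sin(t), 1 - r * r),
                               lambda r, t: (2 * r * r * math.cos(t), 2 * r * r * math.sin(t), r),
                               1.0, 2 * math.pi),
    # upper hemisphere in spherical angles (p, t): n dA = (sin p cos t, sin p sin t, cos p) sin p dp dt
    "hemisphere": surface_flux(lambda p, t: (math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), math.cos(p)),
                               lambda p, t: (math.sin(p) ** 2 * math.cos(t), math.sin(p) ** 2 * math.sin(t),
                                             math.sin(p) * math.cos(p)),
                               math.pi / 2, 2 * math.pi),
}
for name, value in results.items():
    print(name, value, 2 * math.pi)
```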
theorem again, the latter line integral is equal to $\iint_D \mathrm{curl}\,\mathbf F\cdot\mathbf n\,dA$ where $D$ is the unit disc in the $xy$-plane. Here $\mathbf n = \mathbf k$, so $\mathrm{curl}\,\mathbf F\cdot\mathbf n = 0$ and the integral vanishes!
Here is an analogue of the fact that the integral of the gradient of a function
over any closed curve vanishes:
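Proof. If $\mathbf F$ extends differentiably to the region $R$ inside $S$, this follows from the divergence theorem, since $\mathrm{div}(\mathrm{curl}\,\mathbf F) = 0$ for any $\mathbf F$. However, it is true even if $\mathbf F$ has singularities inside $S$. To see this, draw a small simple closed curve $C$ in $S$ (say, the image of a small circle in the $uv$-plane under a parametrization $\mathbf x = \mathbf G(u,v)$). $C$ divides $S$ into two regular regions $S_1$ and $S_2$, and we have
$$\iint_S \mathrm{curl}\,\mathbf F\cdot\mathbf n\,dA = \iint_{S_1} \mathrm{curl}\,\mathbf F\cdot\mathbf n\,dA + \iint_{S_2} \mathrm{curl}\,\mathbf F\cdot\mathbf n\,dA.$$
On the other hand, if we give $C$ the orientation compatible with $S_1$, Stokes's theorem gives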
EXERCISES
3. Given any nonvertical plane $P$ parallel to the $x$-axis, let $C$ be the curve of intersection of $P$ with the cylinder $x^2 + y^2 = a^2$. Show that $\int_C [(yz - y)\,dx + (xz + x)\,dy] = 2\pi a^2$.
4. Evaluate $\iint_S \mathrm{curl}\,\mathbf F\cdot\mathbf n\,dA$ where $\mathbf F(x,y,z) = y\,\mathbf i + (x - 2x^3z)\,\mathbf j + xy^3\,\mathbf k$ and $S$ is the upper half of the sphere $x^2 + y^2 + z^2 = a^2$.
5. Let $\mathbf F(x,y,z) = 2x\,\mathbf i + 2y\,\mathbf j + (x^2 + y^2 + z^2)\,\mathbf k$ and let $S$ be the lower half of the ellipsoid $(x^2/4) + (y^2/9) + (z^2/27) = 1$. Use Stokes's theorem to calculate the flux of $\mathrm{curl}\,\mathbf F$ across $S$ from the lower side to the upper side.
7. Let $C_r$ denote the circle of radius $r$ about the origin in the $xz$-plane, oriented counterclockwise as viewed from the positive $y$-axis. Suppose $\mathbf F$ is a $C^1$ vector
field on the complement of the y-axis in M3 such that jc F • dx = 5 and
$\mathrm{curl}\,\mathbf F(x,y,z) = 3\mathbf j + (z\mathbf i - x\mathbf k)/(x^2 + z^2)^2$. Compute $\int_{C_r}\mathbf F\cdot d\mathbf x$ for every $r$.
$$\int_{\partial S} \cdots = \iint_S (\nabla f\times\nabla g)\cdot\mathbf n\,dA.$$
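Proof. (a) implies (b): Suppose $C$ starts and ends at $\mathbf a$. Then $C$ has the same initial and final point as the "constant curve" $C_2$ described by $\mathbf x(t) = \mathbf a$, and obviously $\int_{C_2}\mathbf G\cdot d\mathbf x = 0$ since $d\mathbf x = \mathbf 0$ on $C_2$.
(b) implies (a): Suppose $C_1$ and $C_2$ start at $\mathbf a$ and end at $\mathbf b$. Let $C$ be the closed curve obtained by following $C_1$ from $\mathbf a$ to $\mathbf b$ and then $C_2$ backwards from $\mathbf b$ to $\mathbf a$. Then $0 = \oint_C \mathbf G\cdot d\mathbf x = \int_{C_1}\mathbf G\cdot d\mathbf x - \int_{C_2}\mathbf G\cdot d\mathbf x$. □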
A vector field G that satisfies (a) and (b) is called conservative in the region
R. (The word "conservative" has to do with conservation of energy. If we interpret
G as a force field, condition (b) says that the force does no net work on a particle
that returns to its starting point.) A good deal of mathematical physics is based on
the following characterization of conservative vector fields:
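$$\cdots = \frac{1}{h}\int_{L(\mathbf x,\mathbf x + \mathbf h)}\mathbf G\cdot d\mathbf x = \frac{1}{h}\int_0^h G_1(x_1 + t, x_2, \ldots, x_n)\,dt,$$
$$\partial_j G_k - \partial_k G_j = 0 \quad\text{for all } j, k. \tag{5.61}$$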
We observe that when n = 3, the quantities in (5.61) are the components of curl G,
so that (5.61) is equivalent to the condition curl G = 0.
The condition (5.61) is almost sufficient to guarantee that G is a gradient; the
only possible problem arises from the geometry of R, as we shall explain in more
detail below. When R is convex, the problem disappears, and we have the following
result. Our proof will only be complete in dimensions 2 and 3 because it invokes
Green's or Stokes's theorem, but the same idea works in higher dimensions.
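Proof. The idea is similar to the proof of Proposition 5.60, but we do not know yet that condition (a) of Proposition 5.59 is satisfied, so we must be more careful. Pick a base point $\mathbf a$ in $R$, and define $f(\mathbf x)$ for $\mathbf x\in R$ by $f(\mathbf x) = \int_{L(\mathbf a,\mathbf x)}\mathbf G\cdot d\mathbf x$, where $L(\mathbf a,\mathbf x)$ is the line segment from $\mathbf a$ to $\mathbf x$. (We need the hypothesis of convexity so that this line segment lies in $R$.) To show that $\mathbf G(\mathbf x) = \nabla f(\mathbf x)$, let $\mathbf h = (h, 0, \ldots, 0)$ be small enough so that $\mathbf x + \mathbf h\in R$. Let $C$ be the triangular closed curve obtained by following $L(\mathbf a,\mathbf x)$ from $\mathbf a$ to $\mathbf x$, $L(\mathbf x,\mathbf x + \mathbf h)$ from $\mathbf x$ to $\mathbf x + \mathbf h$, and then $L(\mathbf a,\mathbf x + \mathbf h)$ backwards from $\mathbf x + \mathbf h$ to $\mathbf a$. Green's theorem (if $n = 2$), Stokes's theorem (if $n = 3$), or the higher-dimensional version of Stokes's theorem (if $n > 3$; see §5.9) converts $\oint_C \mathbf G\cdot d\mathbf x$ into a double integral over the solid triangle whose boundary is $C$, and the integrand of that double integral vanishes by (5.61). Hence $\oint_C \mathbf G\cdot d\mathbf x = 0$, or in other words,
$$f(\mathbf x + \mathbf h) - f(\mathbf x) = \int_{L(\mathbf x,\mathbf x + \mathbf h)}\mathbf G\cdot d\mathbf x.$$
Now the same argument as in Proposition 5.60 shows that $\partial_1 f = G_1$, and likewise $\partial_j f = G_j$ for the other $j$. □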
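The construction in this proof is easy to carry out numerically. In the Python sketch below (standard library only), $L(\mathbf a,\mathbf x)$ is parametrized as $\mathbf a + t(\mathbf x - \mathbf a)$, $0 \le t \le 1$, so that $f(\mathbf x) = \int_0^1 \mathbf G(\mathbf a + t(\mathbf x - \mathbf a))\cdot(\mathbf x - \mathbf a)\,dt$, and the $t$-integral is approximated by Simpson's rule (our choice). The field $\mathbf G = 2xy\,\mathbf i + (x^2 + \sin z)\,\mathbf j + y\cos z\,\mathbf k$ is a hypothetical example satisfying the symmetry condition $\partial_jG_k = \partial_kG_j$ on all of $\mathbb R^3$; a central-difference gradient of the computed $f$ is compared with $\mathbf G$.

```python
import math

def G(x, y, z):
    """Hypothetical field with d_j G_k = d_k G_j everywhere (it equals grad(x^2 y + y sin z))."""
    return (2 * x * y, x * x + math.sin(z), y * math.cos(z))

def potential(G, a, x, n=200):
    """f(x) = line integral of G along the segment L(a, x), parametrized by a + t (x - a), 0 <= t <= 1.
    The t-integral is computed with composite Simpson's rule (n must be even)."""
    d = [xi - ai for xi, ai in zip(x, a)]

    def integrand(t):
        p = [ai + t * di for ai, di in zip(a, d)]
        return sum(g * di for g, di in zip(G(*p), d))

    h = 1.0 / n
    s = integrand(0.0) + integrand(1.0)
    s += sum((4 if i % 2 else 2) * integrand(i * h) for i in range(1, n))
    return s * h / 3

a = (0.0, 0.0, 0.0)               # base point
x = (0.7, -0.4, 1.1)
eps = 1e-5
num_grad = []
for j in range(3):
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    num_grad.append((potential(G, a, xp) - potential(G, a, xm)) / (2 * eps))
print(num_grad)
print(G(*x))                      # should agree with num_grad to several digits
```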
$$\mathbf G(x,y,z) = \frac{-y\,\mathbf i + x\,\mathbf j}{x^2 + y^2}.$$
The hypothesis on R that should replace convexity in Theorem 5.62 to give the
best result is that every simple closed curve in R is the boundary of a surface lying
entirely in R. (The proof requires more advanced techniques.) The region R in
Example 1 does not have this property; no closed curve that encircles the z-axis
can be the boundary of a surface in R.
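In practice, if $R$ is a rectangular box, to find a function whose gradient is $\mathbf G$ one can proceed in a more simple-minded way than is indicated in the proof of Theorem 5.62. Consider the 2-dimensional case, where $R = [a,b]\times[\alpha,\beta]$ and $\mathbf G(x,y) = P(x,y)\mathbf i + Q(x,y)\mathbf j$. Assuming that $\partial_xQ = \partial_yP$, we begin by integrating $P$ with respect to $x$, including a "constant" of integration that can depend on the other variable $y$:
$$f(x,y) = \int_c^x P(t,y)\,dt + \varphi(y).$$
Here $c$ can be any point in the interval $[a,b]$. Any such $f$ will satisfy $\partial_xf = P$. To obtain $\partial_yf = Q$, differentiate the formula for $f$ with respect to $y$ and use Theorem 4.47:
$$\partial_yf(x,y) = \int_c^x \partial_yP(t,y)\,dt + \varphi'(y) = \int_c^x \partial_tQ(t,y)\,dt + \varphi'(y) = Q(x,y) - Q(c,y) + \varphi'(y).$$
Hence $\partial_yf = Q$ provided that $\varphi'(y) = Q(c,y)$, which determines $\varphi$ up to an additive constant.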
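In $n$ dimensions one proceeds similarly, setting $f(\mathbf x) = \int_c^{x_1} G_1(t, x_2, \ldots, x_n)\,dt + \varphi(x_2, \ldots, x_n)$. Then $\partial_1 f = G_1$. Differentiating this formula with respect to $x_2, \ldots, x_n$ and using the facts that $\partial_jG_1 = \partial_1G_j$, we obtain formulas for $\partial_2\varphi, \ldots, \partial_n\varphi$. The problem is thereby reduced to a similar problem (finding a function with a given gradient) in one less variable, so we can proceed inductively.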
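For polynomial $P$ and $Q$ the two-dimensional procedure can be carried out exactly. The Python sketch below (hypothetical helpers, not from the text) stores polynomials as dictionaries of exponent pairs with exact `Fraction` coefficients, checks the condition $\partial_xQ = \partial_yP$, takes $c = 0$, and returns $f(x,y) = \int_0^x P(t,y)\,dt + \int_0^y Q(0,s)\,ds$. Using $0$ as the base point assumes $0$ lies in $[a,b]$ and $[\alpha,\beta]$; for polynomials the resulting formula is valid on all of $\mathbb R^2$ anyway.

```python
from fractions import Fraction

# A polynomial in (x, y) is a dict {(a, b): coeff} meaning the sum of coeff * x^a * y^b.

def d(p, j):
    """Exact partial derivative with respect to x (j = 0) or y (j = 1)."""
    out = {}
    for e, c in p.items():
        if e[j] > 0:
            k = (e[0] - 1, e[1]) if j == 0 else (e[0], e[1] - 1)
            out[k] = out.get(k, 0) + c * e[j]
    return {k: v for k, v in out.items() if v != 0}

def add(p, q):
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def integrate_x_from_0(p):
    """The polynomial (x, y) -> integral from 0 to x of p(t, y) dt."""
    return {(a + 1, b): Fraction(c) / (a + 1) for (a, b), c in p.items()}

def integrate_y_from_0(p):
    """The polynomial (x, y) -> integral from 0 to y of p(x, s) ds."""
    return {(a, b + 1): Fraction(c) / (b + 1) for (a, b), c in p.items()}

def restrict_x_to_0(p):
    """p(0, y): keep only the terms that do not involve x."""
    return {(a, b): c for (a, b), c in p.items() if a == 0}

def find_potential(P, Q):
    """Return f with grad f = (P, Q), following the procedure in the text with c = 0."""
    if d(Q, 0) != d(P, 1):
        raise ValueError("d_x Q != d_y P, so (P, Q) is not a gradient")
    # f(x, y) = int_0^x P(t, y) dt + phi(y), where phi'(y) = Q(0, y) and phi(0) = 0
    return add(integrate_x_from_0(P), integrate_y_from_0(restrict_x_to_0(Q)))

P = {(1, 1): 2, (0, 0): 3}     # hypothetical P = 2xy + 3
Q = {(2, 0): 1, (0, 3): 4}     # hypothetical Q = x^2 + 4y^3
f = find_potential(P, Q)
print(f)                       # represents x^2 y + 3x + y^4
print(d(f, 0) == P, d(f, 1) == Q)   # True True
```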
Proof. We shall not give the general proof but shall content ourselves with presenting an algorithm for solving $\mathrm{curl}\,\mathbf F = \mathbf G$ when $R$ is a rectangular box, similar to the
5.8. Integrating Vector Derivatives 263
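one given above for solving $\nabla f = \mathbf G$. Suppose that $R = [a_1,b_1]\times[a_2,b_2]\times[a_3,b_3]$ and $\mathbf G$ is a $C^1$ vector field satisfying $\mathrm{div}\,\mathbf G = 0$ on $R$. Unlike the problem of finding a function with a given gradient, whose solution is unique up to an additive constant, there is lots of freedom in choosing an $\mathbf F$ such that $\mathrm{curl}\,\mathbf F = \mathbf G$, for if $\mathrm{curl}\,\mathbf F = \mathbf G$ then also $\mathrm{curl}(\mathbf F + \nabla f) = \mathbf G$ for any smooth function $f$. This gives enough leeway to allow us to assume that the $z$-component of $\mathbf F$ is zero. Thus, let us write $\mathbf G = G_1\mathbf i + G_2\mathbf j + G_3\mathbf k$ and $\mathbf F = F_1\mathbf i + F_2\mathbf j$; we then want
$$-\partial_zF_2 = G_1, \qquad \partial_zF_1 = G_2, \qquad \partial_xF_2 - \partial_yF_1 = G_3.$$
$$F_2 = -\int_c^z G_1(x,y,t)\,dt + \varphi(x,y), \qquad F_1 = \int_c^z G_2(x,y,t)\,dt + \psi(x,y),$$
and plugging these results into the third equation yields $\partial_x\varphi - \partial_y\psi = 4x$. Therefore, one solution (with $\varphi = 2x^2$ and $\psi = 0$) is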
Now, what about the equation $\mathrm{div}\,\mathbf F = g$? Here there are no obstructions to solvability, and there is an enormous amount of freedom in finding a solution. For example, if we wish to solve $\mathrm{div}\,\mathbf F = g$ in a rectangular box in $\mathbb R^n$, we could take
$$\mathbf F(\mathbf x) = \left(\int_{a_1}^{x_1} g(t, x_2, \ldots, x_n)\,dt\right)\mathbf e_1,$$
or similar expressions with the variables permuted; there are many other possibilities. In fact, this problem is so easy that it seems reasonable to make it more interesting by imposing additional conditions on $\mathbf F$. We restrict attention to the three-dimensional situation, but there are similar results in higher dimensions.
The key result here is Theorem 5.46, which shows that we can solve the equation $\mathrm{div}\,\mathbf F = g$ subject to the restriction that $\mathrm{curl}\,\mathbf F = \mathbf 0$. More precisely, suppose $R$ is a bounded open set in $\mathbb R^3$ and $g$ is of class $C^1$ on $\overline R$. (In Theorem 5.46 $g$ was assumed to be $C^2$, but see the remarks following the proof.) Smoothness on $\overline R$ means that $g$ can be extended as a $C^1$ function to an open set containing $\overline R$, and it can be modified outside $\overline R$ so as to vanish outside some bounded set while remaining of class $C^1$. (One multiplies $g$ by a $C^1$ function that is identically 1 on $\overline R$ and vanishes outside some slightly larger region; we omit the details, which are of little importance for this argument.) Hence we may assume that $g$ is $C^1$ on $\mathbb R^3$ and vanishes outside a bounded set. Then, by Theorem 5.46, the function
$$u(\mathbf x) = -\frac{1}{4\pi}\iiint \frac{g(\mathbf y)}{|\mathbf x - \mathbf y|}\,d^3\mathbf y$$
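satisfies $\nabla^2 u = g$, and so the vector field $\mathbf F = \nabla u$ satisfies both $\mathrm{div}\,\mathbf F = g$ and $\mathrm{curl}\,\mathbf F = \mathbf 0$ on $R$.
With this result in hand, we show that the equations $\mathrm{curl}\,\mathbf F = \mathbf G$ and $\mathrm{div}\,\mathbf F = g$ can be solved simultaneously (for the same $\mathbf F$).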
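5.64 Theorem. Let $R$ be a bounded convex open set in $\mathbb R^3$. For any $C^1$ function $g$ on $R$ and any $C^2$ vector field $\mathbf G$ on $R$ such that $\mathrm{div}\,\mathbf G = 0$, there is a $C^2$ vector field $\mathbf F$ on $R$ such that $\mathrm{curl}\,\mathbf F = \mathbf G$ and $\mathrm{div}\,\mathbf F = g$ on $R$.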
There is a companion result to Theorem 5.64: Not every vector field is a gradient, and not every vector field is a curl, but every vector field is the sum of a gradient and a curl. The proof is left to the reader as Exercise 3, where a more precise statement is given.
One might also ask about uniqueness in Theorem 5.64; that is, to what extent is
a vector field determined by its curl and divergence? Clearly, if F satisfies curl F =
G and div F = g, then so does F + H whenever curl H = 0 and div H = 0.
Solutions of the latter pair of equations can be obtained simply by taking $\mathbf H = \nabla\varphi$ where $\varphi$ is any solution of Laplace's equation $\nabla^2\varphi = 0$. Such solutions exist in
great abundance, so the F in Theorem 5.64 is far from unique. However, one can
pin down a unique solution by imposing suitable boundary conditions.
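5.65 Proposition. Let $R$ be a bounded convex open set in $\mathbb R^3$ with piecewise smooth boundary. Suppose $\mathbf H$ is a $C^1$ vector field on $R$ such that $\mathrm{curl}\,\mathbf H = \mathbf 0$ and $\mathrm{div}\,\mathbf H = 0$ on $R$ and $\mathbf H\cdot\mathbf n = 0$ on $\partial R$. Then $\mathbf H$ vanishes identically on $R$.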
field E vanishes only when there are no time-varying magnetic fields present. Only
in this case is E the gradient of a potential function. However, div B = 0 always
(this expresses the fact that there are no "magnetic charges"), so B is the curl of a
vector potential A. We then have
EXERCISES
1. Determine whether each of the following vector fields is the gradient of a function $f$, and if so, find $f$. The vector fields in (a)-(c) are on $\mathbb R^2$; those in (d)-(f) are on $\mathbb R^3$, and the one in (g) is on $\mathbb R^4$. In all cases $\mathbf i$, $\mathbf j$, $\mathbf k$, and $\mathbf l$ denote unit vectors along the positive $x$-, $y$-, $z$-, and $w$-axes.
a. $\mathbf G(x,y) = (2xy + x^2)\mathbf i + (x^2 - y^2)\mathbf j$.
b. $\mathbf G(x,y) = (3y^2 + 5x^4y)\mathbf i + (x^5 - 6xy)\mathbf j$.
c. $\mathbf G(x,y) = (2e^{2x}\sin y - 3y + 5)\mathbf i + (e^{2x}\cos y - 3x)\mathbf j$.
d. $\mathbf G(x,y,z) = (yz - y\sin xy)\mathbf i + (xz - x\sin xy + z\cos yz)\mathbf j + (xy + y\cos yz)\mathbf k$.
e. $\mathbf G(x,y,z) = (y - z)\mathbf i + (x - z)\mathbf j + (x - y)\mathbf k$.
f. $\mathbf G(x,y,z) = 2xy\,\mathbf i + (x^2 + \log z)\mathbf j + ((y + 2)/z)\mathbf k$ ($z > 0$).
g. $\mathbf G(x,y,z,w) = (xw^2 + yzw)\mathbf i + (xzw + yz^2 - 2e^{2y+z})\mathbf j + (xyw + y^2z - e^{2y+z} - w\sin zw)\mathbf k + (xyz + x^2w - z\sin zw)\mathbf l$.
2. Determine whether each of the following vector fields is the curl of a vector field $\mathbf F$, and if so, find such an $\mathbf F$.
a. $\mathbf G(x,y,z) = (x^3 + yz)\mathbf i + (y - 3x^2y)\mathbf j + 4y^2\mathbf k$.
b. $\mathbf G(x,y,z) = (xy + z)\mathbf i + xz\,\mathbf j - (yz + x)\mathbf k$.
c. G(x, y, z) = (xe-x°'z2 - 6x)i + (by + 2z)j + (z - ze~x2z2)k.
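3. Let $R$ be a bounded convex open set in $\mathbb R^3$. Show that for any $C^2$ vector field $\mathbf H$ on $R$ there exist a $C^2$ function $f$ and a $C^2$ vector field $\mathbf G$ such that $\mathbf H = \mathrm{grad}\,f + \mathrm{curl}\,\mathbf G$. (Hint: Solve $\nabla^2 f = \mathrm{div}\,\mathbf H$.)
4. Let $\mathbf F = F_1\mathbf i + F_2\mathbf j$ be a $C^1$ vector field on $S = \mathbb R^2\setminus\{(0,0)\}$ such that $\partial_1F_2 = \partial_2F_1$ on $S$ (but $\mathbf F$ may be singular at the origin).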
5.9. Higher Dimensions and Differential Forms 267
Green's theorem has to do with integrals of vector fields in the plane, and the divergence theorem and Stokes's theorem have to do with integrals of vector fields in 3-space. What happens in dimension $n$? There are a couple of things we can say without too much additional explanation.
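First, the obvious analogue of the divergence theorem holds in $\mathbb R^n$ for any $n > 1$. To wit, if $R$ is a regular region in $\mathbb R^n$ bounded by a piecewise smooth hypersurface $\partial R$, and $\mathbf F$ is a $C^1$ vector field on $R$, then
$$\int_{\partial R}\mathbf F\cdot\mathbf n\,dA = \int_R \mathrm{div}\,\mathbf F\,dV.$$
Here, if $\partial R$ is parametrized by $\mathbf x = \mathbf G(u_1, \ldots, u_{n-1})$ and $\partial_j = \partial/\partial u_j$,
$$\mathbf n\,dA = \det\begin{pmatrix} \mathbf e_1 & \cdots & \mathbf e_n\\ \partial_1G_1 & \cdots & \partial_1G_n\\ \vdots & & \vdots\\ \partial_{n-1}G_1 & \cdots & \partial_{n-1}G_n \end{pmatrix} du_1\cdots du_{n-1},$$
where $\mathbf e_1, \ldots, \mathbf e_n$ are the standard basis vectors for $\mathbb R^n$. (The reader may verify that in the case $n = 2$, these formulas yield Green's theorem in the form (5.18).)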
Second, the analogue of the divergence theorem in dimension 1 is just the fundamental theorem of calculus:
$$f(b) - f(a) = \int_{[a,b]} f'(t)\,dt.$$
On the real line, vector fields are the same thing as functions, and the divergence of
a vector field is just the derivative of a function. A regular region in R is an interval
[a, b], whose boundary is the two-element set {a, b}. Since the boundary is finite,
"integration" over the boundary is just summation, and the minus sign on f(a)
comes from assigning the proper "orientation" to the two points in the boundary.
There are also analogues of Stokes's theorem in higher dimensions, which say that the integral of some gadget $G$ over the boundary of a $k$-dimensional submanifold of $\mathbb R^n$ equals the integral of another gadget formed from the first derivatives of
setting, it is necessary to develop some additional algebraic machinery, the theory
of differential forms. To do so is beyond the scope of this book; what follows is
intended to provide an informal introduction to the ideas involved. For a detailed
treatment of differential forms, we refer the reader to Hubbard and Hubbard [7] and
Weintraub [19].
Roughly speaking, a differential $k$-form is an object whose mission in life is to be integrated over $k$-dimensional sets; thus, 1-forms are designed to be integrated
over curves, 2-forms are designed to be integrated over surfaces, and so on. Here
is how the ideas of vector analysis that we have been studying can be reformulated
in terms of differential forms.
$$\omega = F_1(x_1, \ldots, x_n)\,dx_1 + \cdots + F_n(x_1, \ldots, x_n)\,dx_n,$$
However, not every 1-form is the differential of a function; the necessary condition for $\omega$ to be of the form $df$ is (5.61).
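We note that the set of 1-forms on $\mathbb R^n$ is a vector space. That is, it makes sense to add 1-forms to each other and to multiply them by scalars. In fact, the "scalars" here can be taken to be not just constants but arbitrary continuous functions on $\mathbb R^n$. Thus, if $\alpha = A_1\,dx_1 + \cdots + A_n\,dx_n$ and $\beta = B_1\,dx_1 + \cdots + B_n\,dx_n$ are 1-forms and $f$ is a continuous function,
$$\alpha + \beta = (A_1 + B_1)\,dx_1 + \cdots + (A_n + B_n)\,dx_n, \qquad f\alpha = fA_1\,dx_1 + \cdots + fA_n\,dx_n.$$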
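This operation is just the "built-in chain rule" for differentials of functions, extended to arbitrary 1-forms. To wit, let $x_1, \ldots, x_n$ and $u_1, \ldots, u_k$ be the coordinates on $\mathbb R^n$ and $\mathbb R^k$, respectively. If $\omega = F_1\,dx_1 + \cdots + F_n\,dx_n$ is a 1-form on $\mathbb R^n$, its pullback via $T$ is the 1-form $T^*\omega$ on $\mathbb R^k$ defined by substituting into $\omega$ the expressions for the $x$'s in terms of the $u$'s and the $dx$'s in terms of the $du$'s:
$$T^*\omega = \left(\tilde F_1\frac{\partial x_1}{\partial u_1} + \cdots + \tilde F_n\frac{\partial x_n}{\partial u_1}\right)du_1 + \cdots + \left(\tilde F_1\frac{\partial x_1}{\partial u_k} + \cdots + \tilde F_n\frac{\partial x_n}{\partial u_k}\right)du_k, \tag{5.67}$$
where
$$\tilde F_m(u_1, \ldots, u_k) = F_m(T(u_1, \ldots, u_k)).$$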
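Two special cases are of particular interest. First, the chain rule says that when $\omega = df$, $T^*\omega = d(f\circ T)$. Second, when $k = 1$ so that $T : \mathbb R\to\mathbb R^n$ defines a curve in $\mathbb R^n$, (5.67) becomes
$$T^*\omega = \left(F_1(T(u))\frac{dx_1}{du} + \cdots + F_n(T(u))\frac{dx_n}{du}\right)du.$$
$$\int_{[a,b]}\omega = \int_a^b g(t)\,dt.$$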
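Now, if $\omega = A_1\,dx_1 + \cdots + A_n\,dx_n$ is a 1-form on $\mathbb R^n$ and $C$ is a smooth curve parametrized by $\mathbf x = \mathbf g(t)$, $a \le t \le b$, then $\int_C\omega$ is defined by pulling $\omega$ back to $\mathbb R$ via $\mathbf g$ and integrating the result as before:
$$\int_C\omega = \int_{[a,b]}\mathbf g^*\omega = \int_a^b \left(A_1(\mathbf g(t))\,g_1'(t) + \cdots + A_n(\mathbf g(t))\,g_n'(t)\right)dt.$$
$$\int_C\omega = \int_C\mathbf F\cdot d\mathbf x.$$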
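Here is a small numerical sketch of this definition in Python (our own helper `integrate_1form`, not from the text): the pullback $\mathbf g^*\omega$ is formed pointwise as $\sum_j A_j(\mathbf g(t))\,g_j'(t)$ and integrated in $t$ by the midpoint rule. As a check we take the hypothetical example $\omega = df$ with $f = xyz$ along a helix, where the first special case above, $\mathbf g^*(df) = d(f\circ\mathbf g)$, predicts the answer $f(\mathbf g(b)) - f(\mathbf g(a))$.

```python
import math

def integrate_1form(A, g, dg, a, b, n=1000):
    """Integral of omega = A_1 dx_1 + ... + A_n dx_n over the curve x = g(t), a <= t <= b.
    The pullback g*omega is (sum_j A_j(g(t)) g_j'(t)) dt, integrated here by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += sum(Aj * dgj for Aj, dgj in zip(A(*g(t)), dg(t))) * h
    return total

# Hypothetical example: omega = df with f = xyz, integrated along a helix.
f = lambda x, y, z: x * y * z
A = lambda x, y, z: (y * z, x * z, x * y)          # coefficients of df
g = lambda t: (math.cos(t), math.sin(t), t)
dg = lambda t: (-math.sin(t), math.cos(t), 1.0)

line_integral = integrate_1form(A, g, dg, 0.0, 2.0)
print(line_integral, f(*g(2.0)) - f(*g(0.0)))      # the two values agree closely
```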
But according to (5.69), $dx_j\wedge dx_i = -dx_i\wedge dx_j$ and $dx_i\wedge dx_i = 0$. Thus the terms with $i = j$ in (5.70) drop out, and for $i \ne j$ we can combine the $ij$th and $ji$th terms into one:
$$A_iB_j\,dx_i\wedge dx_j + A_jB_i\,dx_j\wedge dx_i = (A_iB_j - A_jB_i)\,dx_i\wedge dx_j = (A_jB_i - A_iB_j)\,dx_j\wedge dx_i.$$
We have the option of using either of the two expressions on the right, and the usual choice is to use the one where the first index is smaller than the second one. (In $\mathbb R^3$ a different choice is sometimes convenient, as we shall soon see.) Thus, we finally obtain
$$\omega = \sum_{1\le i<j\le n} C_{ij}(x_1, \ldots, x_n)\,dx_i\wedge dx_j, \tag{5.71}$$
where the $C_{ij}$ are continuous functions on $\mathbb R^n$. We note that the number of terms in this sum, that is, the number of pairs $(i,j)$ with $1 \le i < j \le n$, is $\frac12 n(n-1)$. In (5.71) we also have the option of rewriting $dx_i\wedge dx_j$ as $-dx_j\wedge dx_i$ if we so choose.
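The bookkeeping in (5.71) is easy to mechanize. The Python sketch below (a hypothetical helper `wedge`, with coefficients evaluated at a single point) lists the coefficients $C_{ij} = A_iB_j - A_jB_i$, $i < j$, of $\alpha\wedge\beta$, confirms that there are $\frac12 n(n-1)$ of them, and shows that in $\mathbb R^3$ the coefficients of $dy\wedge dz$, $dz\wedge dx = -dx\wedge dz$, and $dx\wedge dy$ are the components of the cross product $\mathbf A\times\mathbf B$.

```python
from itertools import combinations

def wedge(A, B):
    """Coefficients C_ij (1 <= i < j <= n) of alpha ^ beta, where alpha = sum A_i dx_i and
    beta = sum B_j dx_j have coefficients A, B evaluated at one point: C_ij = A_i B_j - A_j B_i."""
    n = len(A)
    return {(i + 1, j + 1): A[i] * B[j] - A[j] * B[i] for i, j in combinations(range(n), 2)}

C5 = wedge([1, 2, 3, 4, 5], [2, 0, -1, 3, 1])
print(len(C5))                 # 10 = n(n - 1)/2 for n = 5

A, B = [1, 2, 3], [4, 5, 6]
C = wedge(A, B)
# Coefficients of dy^dz, dz^dx (= -dx^dz), and dx^dy:
print([C[(2, 3)], -C[(1, 3)], C[(1, 2)]])   # [-3, 6, -3], the cross product A x B
```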
What does this really mean? We have been proceeding purely formally, without
saying what meaning is to be attached to the expressions $dx_i\wedge dx_j$. In the full-dress treatment of this subject, 2-forms are defined to be alternating rank-2 tensor fields over $\mathbb R^n$, but this is somewhat beside the point. For now it is probably best to think of a 2-form on $\mathbb R^n$ simply as a $\frac12 n(n-1)$-tuple of functions, namely the functions $C_{ij}$ in (5.71), and the expressions $dx_i\wedge dx_j$ simply as a convenient set of signposts to mark the various components, just as $\mathbf i$, $\mathbf j$, and $\mathbf k$ are used to mark the components of vector fields in $\mathbb R^3$. The important features of 2-forms are not their
precise algebraic definition but the way they transform under changes of variables
and the way they integrate over surfaces.
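Before proceeding to these matters, however, let us see how things look in the 3-dimensional case. When $n = 3$ we also have $\frac12 n(n-1) = 3$, so 2-forms have 3 components just as vector fields and 1-forms do: This is the "accident" that makes $n = 3$ special! The general 2-form on $\mathbb R^3$ can be written as $C_1\,dy\wedge dz + C_2\,dz\wedge dx + C_3\,dx\wedge dy$, and we identify it with a vector field as follows:
$$C_1\,dy\wedge dz + C_2\,dz\wedge dx + C_3\,dx\wedge dy \;\longleftrightarrow\; C_1\mathbf i + C_2\mathbf j + C_3\mathbf k. \tag{5.72}$$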
Observe carefully how we have set this correspondence up: we have written the basis elements $dx_i\wedge dx_j$ with the variables in cyclic order, rather than the "$i<j$" order we used above, so that the middle term is $dz\wedge dx$ rather than $dx\wedge dz$. Also, we identify the unit vector $\mathbf i$ in the $x$ direction with the 2-form $dy\wedge dz$ from which $dx$ is missing, and likewise for $\mathbf j$ and $\mathbf k$.
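The exterior product in 3 dimensions looks like this: If
$$\alpha = A_1\,dx + A_2\,dy + A_3\,dz, \qquad \beta = B_1\,dx + B_2\,dy + B_3\,dz,$$
then
$$\alpha\wedge\beta = (A_2B_3 - A_3B_2)\,dy\wedge dz + (A_3B_1 - A_1B_3)\,dz\wedge dx + (A_1B_2 - A_2B_1)\,dx\wedge dy.$$
Thus, if we identify $\alpha$ and $\beta$ with vector fields $\mathbf F$ and $\mathbf G$ according to (5.66) and $\alpha\wedge\beta$ with a vector field according to (5.72), the exterior product turns into the cross product:
$$\alpha\wedge\beta \longleftrightarrow \mathbf F\times\mathbf G.$$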
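$$T^*(dx_l\wedge dx_m) = \sum_{i<j}\frac{\partial(x_l,x_m)}{\partial(u_i,u_j)}\,du_i\wedge du_j,$$
so in general, if
$$\omega = \sum_{l<m} C_{lm}(\mathbf x)\,dx_l\wedge dx_m,$$
then
$$T^*\omega = \sum_{l<m}\sum_{i<j} C_{lm}(T(\mathbf u))\,\frac{\partial(x_l,x_m)}{\partial(u_i,u_j)}\,du_i\wedge du_j.$$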
It is a consequence of the chain rule that the pullback operation behaves properly under composition of mappings, namely, $(T_1\circ T_2)^*\omega = T_2^*(T_1^*\omega)$.
We can now show how to integrate 2-forms over surfaces. First consider the simplest case, where the surface is simply a region $D$ in $\mathbb R^2$. If we name the coordinates on $\mathbb R^2$ $x$ and $y$, the general 2-form on $\mathbb R^2$ has the form $\omega = f(x,y)\,dx\wedge dy$, and its integral over $D$ is the obvious thing:
$$\iint_D f(x,y)\,dx\wedge dy = \iint_D f\,dA, \qquad (5.73)$$
the integral on the right being the ordinary double integral of $f$ over $D$. The only subtle point is that the integral on the left is an oriented integral, the orientation being carried in the fact that $dx$ comes before $dy$ in $dx\wedge dy$. If we wrote $dy\wedge dx$ instead, we would introduce a minus sign.
The nice thing about (5.73) is that the change-of-variable formula for double integrals is more or less built into it. Namely, suppose $T:\mathbb R^2\to\mathbb R^2$ is an invertible transformation, say $T(u,v) = (x,y)$. If $\omega = f(x,y)\,dx\wedge dy$, then
$$T^*\omega = f(T(u,v))\,\det DT(u,v)\,du\wedge dv, \qquad\text{so}\qquad \iint_{T(D)}\omega = \iint_D T^*\omega. \qquad (5.74)$$
In other words, the formalism of differential forms produces the necessary Jacobian factor automatically. The change-of-variable formula as we have seen it before involved $|\det DT|$ rather than $\det DT$, but this discrepancy is accounted for by the difference between ordinary integrals and oriented integrals.
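The sign bookkeeping in the oriented change-of-variable formula can be seen numerically. The following sketch is an illustration under stated assumptions, not part of the text. It takes polar coordinates as the transformation, approximates $\det DT$ by central differences, and approximates the double integral by the midpoint rule; the helper names and step sizes are arbitrary choices. Listing the variables in the order $(r,\theta)$ gives $\det DT = r > 0$ and the value $\pi$ for the integral of $dx\wedge dy$ over the unit disk. Listing them as $(\theta, r)$ flips the sign of $\det DT$, and with it the sign of the oriented integral.

```python
import math

def jacobian_det(T, u, v, h=1e-6):
    """Central-difference approximation to det DT(u, v) for T: R^2 -> R^2."""
    xp, yp = T(u + h, v)
    xm, ym = T(u - h, v)
    xq, yq = T(u, v + h)
    xr, yr = T(u, v - h)
    x_u, y_u = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    x_v, y_v = (xq - xr) / (2 * h), (yq - yr) / (2 * h)
    return x_u * y_v - x_v * y_u

def integrate_pullback(f, T, u_range, v_range, m=100):
    """Midpoint-rule value of the oriented integral of T*(f dx^dy) over a rectangle,
    i.e. of f(T(u, v)) * det DT(u, v) du dv, keeping the sign of det DT."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / m, (v1 - v0) / m
    total = 0.0
    for i in range(m):
        u = u0 + (i + 0.5) * du
        for j in range(m):
            v = v0 + (j + 0.5) * dv
            x, y = T(u, v)
            total += f(x, y) * jacobian_det(T, u, v)
    return total * du * dv

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def polar_swapped(theta, r):   # the same map, with the variables in the other order
    return (r * math.cos(theta), r * math.sin(theta))

def one(x, y):
    return 1.0

print(integrate_pullback(one, polar, (0, 1), (0, 2 * math.pi)))          # close to pi
print(integrate_pullback(one, polar_swapped, (0, 2 * math.pi), (0, 1)))  # close to -pi
```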
Now we turn to the case of integrals over a surface $S$ in $\mathbb R^3$. The idea is the same as for line integrals: If $\omega$ is a 2-form on $\mathbb R^3$ and $S$ is a surface parametrized by $\mathbf x = G(u,v)$, $(u,v)\in D\subset\mathbb R^2$, we define $\iint_S\omega$ by pulling $\omega$ back to $D$ via $G$ and using (5.73) to define the resulting integral:
$$\iint_S\omega = \iint_D G^*\omega.$$
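This is independent of the parametrization, in the following sense: If $\tilde G = G\circ T$, where $T:\mathbb R^2\to\mathbb R^2$ is a $C^{(1)}$ transformation, then by (5.74),
$$\iint_{T^{-1}(D)}\tilde G^*\omega = \iint_{T^{-1}(D)} T^*(G^*\omega) = \iint_D G^*\omega.$$
Moreover, if $\omega$ corresponds to the vector field $\mathbf F$ via (5.72), then
$$\iint_S\omega = \iint_S\mathbf F\cdot\mathbf n\,dA.$$
Hence the notion of surface integrals of vector fields in $\mathbb R^3$ also fits into the theory of differential forms.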
Here, as in the case of 2-forms, one can think of the expressions $dx_i\wedge dx_j\wedge dx_k$ simply as formal basis elements, and one can put the indices $i,j,k$ in an order other than $i<j<k$ with the understanding that whenever one interchanges two of the $dx$'s one introduces a minus sign. The number of terms in the sum in (5.75) is the binomial coefficient $n!/3!(n-3)!$. When $n = 3$, this number is 1: All 3-forms on $\mathbb R^3$ have the form
$$\omega = f(x,y,z)\,dx\wedge dy\wedge dz.$$
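If $\alpha = A_1\,dx + A_2\,dy + A_3\,dz$, $\beta = B_1\,dx + B_2\,dy + B_3\,dz$, $\gamma = C_1\,dx + C_2\,dy + C_3\,dz$, and $\omega = W_1\,dy\wedge dz + W_2\,dz\wedge dx + W_3\,dx\wedge dy$, then
$$\alpha\wedge(\beta\wedge\gamma) = (\alpha\wedge\beta)\wedge\gamma = \det\begin{pmatrix}A_1 & A_2 & A_3\\ B_1 & B_2 & B_3\\ C_1 & C_2 & C_3\end{pmatrix} dx\wedge dy\wedge dz,$$
$$\alpha\wedge\omega = \omega\wedge\alpha = (A_1W_1 + A_2W_2 + A_3W_3)\,dx\wedge dy\wedge dz.$$
Thus, if we identify $\alpha,\beta,\gamma$ with the vector fields $\mathbf F,\mathbf G,\mathbf H$ and $\omega$ with the vector field $\mathbf V$, the exterior product turns into the scalar triple product and dot product:
$$\alpha\wedge\beta\wedge\gamma \longleftrightarrow \mathbf F\cdot(\mathbf G\times\mathbf H), \qquad \alpha\wedge\omega \longleftrightarrow \mathbf F\cdot\mathbf V.$$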
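The determinant formula can be checked by expanding $\alpha\wedge\beta\wedge\gamma$ term by term. The expansion uses only two rules: interchanging two $dx$'s introduces a minus sign, and a repeated factor gives 0. In this sketch, which is an illustration with hypothetical helper names, each 1-form on $\mathbb R^3$ is stored as its coefficient triple, and the indices 0, 1, 2 stand for $x, y, z$.

```python
from itertools import product

def perm_sign(idx):
    """Sign of idx as an arrangement of distinct indices; 0 if an index repeats."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for a in range(len(idx)) for b in range(a + 1, len(idx))
                     if idx[a] > idx[b])
    return -1 if inversions % 2 else 1

def wedge3_coeff(A, B, C):
    """Coefficient of dx^dy^dz in (sum A_i dx_i)^(sum B_j dx_j)^(sum C_k dx_k)."""
    return sum(A[i] * B[j] * C[k] * perm_sign((i, j, k))
               for i, j, k in product(range(3), repeat=3))

def cross(G, H):
    return (G[1] * H[2] - G[2] * H[1],
            G[2] * H[0] - G[0] * H[2],
            G[0] * H[1] - G[1] * H[0])

def dot(F, V):
    return sum(f * v for f, v in zip(F, V))

F, G, H = (1, 2, 3), (0, 1, 4), (5, 6, 0)
print(wedge3_coeff(F, G, H), dot(F, cross(G, H)))   # both equal 1 for these triples
```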
$$T^*\omega = f(T(u,v,w))\,\frac{\partial(x,y,z)}{\partial(u,v,w)}\,du\wedge dv\wedge dw.$$
The integral of $\omega$ over a region $D\subset\mathbb R^3$ is defined in the obvious way:
$$\iiint_D f(x,y,z)\,dx\wedge dy\wedge dz = \iiint_D f\,dV,$$
and the change-of-variable formula (for oriented integrals) reads
$$\iiint_{T(D)}\omega = \iiint_D T^*\omega.$$
We have now sketched the whole idea of differential forms in dimension 3.
In dimension $n$ one needs to develop the theory of $k$-forms for all $k\le n$, which requires the machinery of multilinear algebra.
The Exterior Derivative. When the operations of gradient, curl, and divergence are expressed in terms of differential forms, they are all instances of a single operation, denoted by $d$ and called the exterior derivative, which maps $k$-forms on $\mathbb R^n$ into $(k+1)$-forms on $\mathbb R^n$:
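For a 1-form $\omega = A_1\,dx + A_2\,dy + A_3\,dz$ on $\mathbb R^3$ we obtain
$$d\omega = dA_1\wedge dx + dA_2\wedge dy + dA_3\wedge dz = \Bigl(\frac{\partial A_3}{\partial y} - \frac{\partial A_2}{\partial z}\Bigr)dy\wedge dz + \Bigl(\frac{\partial A_1}{\partial z} - \frac{\partial A_3}{\partial x}\Bigr)dz\wedge dx + \Bigl(\frac{\partial A_2}{\partial x} - \frac{\partial A_1}{\partial y}\Bigr)dx\wedge dy.$$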
But this is just the curl! That is, if we identify the 1-form $\omega$ and the 2-form $d\omega$ with vector fields $\mathbf F$ and $\mathbf G$ in the standard way, then $\mathbf G = \operatorname{curl}\mathbf F$. The curl is the exterior derivative on 1-forms in $\mathbb R^3$.
Now suppose that $\omega = A\,dy\wedge dz + B\,dz\wedge dx + C\,dx\wedge dy$ is a 2-form. As the notation in higher dimensions gets messy, we shall write out only the 3-dimensional case:
$$d\omega = dA\wedge dy\wedge dz + dB\wedge dz\wedge dx + dC\wedge dx\wedge dy = \Bigl(\frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z}\Bigr)dx\wedge dy\wedge dz.$$
(For the last equality we have used the fact that an exterior product containing two identical factors vanishes and the fact that the product $dx\wedge dy\wedge dz$ is unchanged by cyclic permutation of its three factors.) If we identify $\omega$ with a vector field $\mathbf F$ and $d\omega$ with a function $g$ as before, we see that $g = \operatorname{div}\mathbf F$. The divergence is the exterior derivative on 2-forms in $\mathbb R^3$.
We observed earlier that $\operatorname{curl}(\nabla f) = 0$ for any function $f$ and $\operatorname{div}(\operatorname{curl}\mathbf F) = 0$ for any vector field $\mathbf F$. The interpretation of these identities in terms of differential forms is that $d(df) = 0$ for any 0-form (function) $f$ and $d(d\omega) = 0$ for any 1-form $\omega$. It is true in general that
$$d(d\omega) = 0 \qquad (5.76)$$
for any $k$-form $\omega$ on $\mathbb R^n$. In all cases the proof of this fact boils down to the equality of mixed partials.
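The remark that $d(d\omega) = 0$ reduces to the equality of mixed partials has a discrete counterpart that is easy to test. In the sketch below, which is an illustration and not part of the text, the partial derivatives in curl and div are replaced by central difference quotients with an arbitrarily chosen step. These difference quotients commute with one another just as mixed partials do, so div(curl F) cancels exactly apart from floating-point rounding. The sample field is an arbitrary choice.

```python
import math

STEP = 1e-3  # step for central differences (an arbitrary choice)

def partial(g, p, axis, h=STEP):
    """Central-difference approximation to the partial derivative of g at p."""
    plus, minus = list(p), list(p)
    plus[axis] += h
    minus[axis] -= h
    return (g(plus) - g(minus)) / (2 * h)

def component(F, i):
    return lambda p: F(p)[i]

def curl(F):
    """Discrete curl of a vector field F: R^3 -> R^3 (returns a new field)."""
    F1, F2, F3 = component(F, 0), component(F, 1), component(F, 2)
    return lambda p: (partial(F3, p, 1) - partial(F2, p, 2),
                      partial(F1, p, 2) - partial(F3, p, 0),
                      partial(F2, p, 0) - partial(F1, p, 1))

def div(F):
    """Discrete divergence of a vector field F: R^3 -> R^3 (returns a function)."""
    return lambda p: sum(partial(component(F, i), p, i) for i in range(3))

def sample_field(p):
    x, y, z = p
    return (y * z**2, math.sin(x) * z, math.exp(x * y))

point = (0.3, -0.7, 1.1)
print(curl(sample_field)(point))       # a genuinely nonzero vector
print(div(curl(sample_field))(point))  # rounding-level, not exactly 0
```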
As an illustration of the exterior derivative, we give the relativistically covariant reformulation of Maxwell's equations (5.50). The key idea is to think of electromagnetism as a phenomenon in 4-dimensional space-time rather than a time-dependent phenomenon in 3-dimensional space. The electric and magnetic fields $\mathbf E = (E_x,E_y,E_z)$ and $\mathbf B = (B_x,B_y,B_z)$ are combined into a single entity, the
electromagnetic field tensor, which we identify in two ways with a 2-form on $\mathbb R^4$:
$$\omega = c(E_x\,dx\wedge dt + E_y\,dy\wedge dt + E_z\,dz\wedge dt) + B_x\,dy\wedge dz + B_y\,dz\wedge dx + B_z\,dx\wedge dy,$$
where $c$ is the speed of light. Also, the current and charge densities $\rho$ and $\mathbf J = (J_x,J_y,J_z)$ are combined into a single entity, the 4-current density, which we identify with a 3-form on $\mathbb R^4$:
$$\gamma = c(J_x\,dy\wedge dz\wedge dt + J_y\,dz\wedge dx\wedge dt + J_z\,dx\wedge dy\wedge dt) - \rho\,dx\wedge dy\wedge dz.$$
The four Maxwell equations (5.50) then turn into the two equations
$$d\omega = 0, \qquad d\omega^* = 4\pi\gamma.$$
The verification of this is a good way for readers to see whether they have learned
how to compute exterior derivatives!
Stokes's Theorem. We can now state the general theorem that encompasses the integral theorems of the preceding sections and their higher dimensional analogues:
$$\int_{\partial M}\omega = \int_M d\omega. \qquad (5.77)$$
$$\partial(\partial M) = 0. \qquad (5.78)$$
The general Stokes theorem shows that (5.76) and (5.78) are in some sense equivalent. Indeed, if $M$ is $k$-dimensional and $\omega$ is a $(k-2)$-form, the Stokes theorem gives
$$\int_{\partial(\partial M)}\omega = \int_{\partial M} d\omega = \int_M d(d\omega).$$
If we accept the geometric fact that $\partial(\partial M) = 0$, then the integral on the left vanishes, and hence so does the integral on the right. But since this happens for every $M$, it follows that $d(d\omega) = 0$. Similarly, if we know that $d(d\omega) = 0$ for every $\omega$, we can conclude that $\partial(\partial M) = 0$. This sort of interplay of algebra, analysis, and geometry is a significant feature of much of modern mathematics.
Chapter 6
INFINITE SERIES
Infinite series are sums with infinitely many terms, of which the most familiar examples are the nonterminating decimal expansions. For instance, the equality $\pi = 3.14159\ldots$ is an abbreviation of the statement that $\pi$ is the sum of the infinite series
$$3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3} + \frac{5}{10^4} + \frac{9}{10^5} + \cdots.$$
The procedure by which one makes sense out of such sums stands alongside differentiation and integration as one of the fundamental limiting processes of mathematical analysis. Just as decimal expansions provide a useful way of obtaining all
real numbers from the finite decimal fractions, infinite series provide a flexible and
powerful way of building complicated functions out of simple ones.
This chapter is devoted to the foundations of the theory of infinite series. In
it we develop the basic facts about series of numbers; then in the next chapter we
proceed to the study of series of functions.
Informally speaking, an infinite series (or just a series, for short) is an expression
of the form
$$\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + \cdots.$$
Here the $a_n$'s can be real numbers, complex numbers, vectors, and so on; for the present, we shall mainly consider the case where they are real numbers.
280 Chapter 6. Infinite Series
$$\sum_{n=5}^{\infty} a_n = a_5 + a_6 + a_7 + \cdots,$$
Before proceeding further, let us record a couple of very simple but important
facts about series.
6.1 Theorem.
a. If the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are convergent, with sums $S$ and $T$, then $\sum_0^\infty(a_n + b_n)$ is convergent, with sum $S + T$.
b. If the series $\sum_0^\infty a_n$ is convergent, with sum $S$, then for any $c\in\mathbb R$ the series $\sum_0^\infty ca_n$ is convergent, with sum $cS$.
c. If the series $\sum_0^\infty a_n$ is convergent, then $\lim_{n\to\infty} a_n = 0$. Equivalently, if $a_n\not\to 0$ as $n\to\infty$, then the series $\sum_0^\infty a_n$ is divergent.
6.1. Definitions and Examples 281
Proof. Let $\{s_k\}$ and $\{t_k\}$ be the sequences of partial sums of the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$, respectively. (a) and (b) follow from the fact that if $s_k\to S$ and $t_k\to T$, then $s_k + t_k\to S + T$ and $cs_k\to cS$. As for (c), we observe that $a_n = s_n - s_{n-1}$. If the series converges to the sum $S$, it follows that $\lim a_n = \lim s_n - \lim s_{n-1} = S - S = 0$. □
At present we are thinking primarily of series whose terms are numbers, but
most of the really significant applications of series come from situations where the
terms $a_n$ depend on a variable $x$. In this case the series $\sum_0^\infty a_n(x)$ may converge
for some values of x and diverge for others, and it defines a function whose domain
is the set of all x for which it converges. We shall explore this idea in more detail
in the next chapter; at this point we recall some familiar examples.
One of the simplest and most useful infinite series is the geometric series, in which the ratio of two successive terms is a constant $x$. That is, the geometric series with initial term $a$ and ratio $x$ is
$$\sum_{n=0}^{\infty} ax^n = a + ax + ax^2 + ax^3 + \cdots.$$
The partial sums $s_k = \sum_0^k x^n$ of the series $\sum_0^\infty x^n$ are easily evaluated. If $x = 1$, then of course $s_k = 1 + 1 + \cdots + 1 = k + 1$. If $x\ne 1$, we observe that
$$s_k = 1 + x + \cdots + x^k,$$
$$xs_k = x + \cdots + x^k + x^{k+1},$$
and subtracting the second equation from the first yields $(1 - x)s_k = 1 - x^{k+1}$. Therefore,
$$s_k = \frac{1 - x^{k+1}}{1 - x}. \qquad (6.2)$$
If $|x| < 1$, then $x^{k+1}\to 0$ as $k\to\infty$, so $s_k\to(1 - x)^{-1}$. It also follows easily from (6.2), or from Theorem 6.1c, that $\{s_k\}$ diverges when $|x|\ge 1$. In short, we have:
6.3 Theorem. The geometric series $\sum_0^\infty x^n$ converges if and only if $|x| < 1$, in which case its sum is $(1 - x)^{-1}$.
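Formula (6.2) can be checked exactly with rational arithmetic. This short sketch (the function names are illustrative) compares the term-by-term partial sums with the closed form. It uses Python's `fractions` module so that no rounding is involved.

```python
from fractions import Fraction

def partial_sum(x, k):
    """s_k = 1 + x + ... + x^k, added up term by term."""
    return sum(x**n for n in range(k + 1))

def closed_form(x, k):
    """The right side of (6.2): (1 - x^(k+1)) / (1 - x), valid for x != 1."""
    return (1 - x**(k + 1)) / (1 - x)

x = Fraction(1, 3)
for k in (0, 1, 5, 20):
    assert partial_sum(x, k) == closed_form(x, k)      # exact agreement
print(float(partial_sum(x, 20)), float(1 / (1 - x)))   # s_20 is already close to 3/2
```

For $x = -1$ the same function gives partial sums $1, 0, 1, 0, \ldots$, which illustrates the divergence when $|x|\ge 1$.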
If it happens that $R_k(x)\to 0$ as $k\to\infty$, we can let $k\to\infty$ in (6.4) to obtain an infinite series expansion of $f(x)$, the Taylor series of $f$ (centered at $x = 0$):
$$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n. \qquad (6.5)$$
One simple sufficient condition to guarantee that $R_k(x)\to 0$ follows from the estimate for the Taylor remainder in Corollary 2.61:
6.6 Theorem. Let $f$ be a function of class $C^\infty$ on the interval $(-c,c)$, where $0 < c\le\infty$.
a. If there exist constants $a, b > 0$ such that $|f^{(k)}(x)|\le ab^kk!$ for all $|x| < c$ and $k\ge 0$, then (6.5) holds for $|x| < \min(c, b^{-1})$.
b. If there exist constants $A, B > 0$ such that $|f^{(k)}(x)|\le AB^k$ for all $|x| < c$ and $k\ge 0$, then (6.5) holds for $|x| < c$.
Proof. By Corollary 2.61, the estimate $|f^{(k)}(x)|\le ab^kk!$ implies the estimate $|R_{k-1}(x)|\le a|bx|^k$ for $|x| < c$. If also $|x| < b^{-1}$, then $|bx|^k\to 0$ as $k\to\infty$, so (6.4) yields the result (a). To deduce (b), we observe that the factorial function grows faster than exponentially (see Example 5 in §1.4), so that for any positive $A$, $B$, and $b$, the sequence $A(B/b)^k/k!$ tends to zero as $k\to\infty$. Letting $a$ be the largest term in this sequence, we have
$$AB^k = \frac{A(B/b)^k}{k!}\,b^kk!\le ab^kk!,$$
so the estimate $|f^{(k)}(x)|\le AB^k$, for a given $A$ and $B$, implies the estimate $|f^{(k)}(x)|\le ab^kk!$ for every $b > 0$ (with $a$ depending on $b$). Hence (b) follows from (a). □
Remark. The interval $(-c,c)$ might not be the whole set where the function $f$ and its derivatives are defined. It may be necessary to restrict $x$ to a proper subinterval of the domain of $f$ to obtain the estimates on $f^{(k)}(x)$ in Theorem 6.6, as Example 2 will show.
Example 1. Let $f(x) = \cos x$. The derivatives $f^{(k)}(x)$ are equal to $\pm\cos x$ or $\pm\sin x$, depending on $k$, so they all satisfy $|f^{(k)}(x)|\le 1$ for all $x$. By Theorem 6.6b, it follows that $\cos x$ is the sum of its Taylor series, $\sum_0^\infty(-1)^nx^{2n}/(2n)!$, for all $x$. For exactly the same reason, $\sin x$ is the sum of its Taylor series, $\sum_0^\infty(-1)^nx^{2n+1}/(2n+1)!$, for all $x$.
Example 2. Let $f(x) = e^x$. Here $f^{(k)}(x) = e^x$ for all $k$. We cannot obtain a good estimate on $f^{(k)}(x)$ that is valid for all $x$ at once, but for $|x| < c$ we have $|f^{(k)}(x)| < e^c$. By Theorem 6.6b, it follows that $e^x$ is the sum of its Taylor series, $\sum_0^\infty x^n/n!$, for $|x| < c$. But $c$ is arbitrary, so in fact $e^x = \sum_0^\infty x^n/n!$ for all $x$.
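Examples 1 and 2 can be watched in action by computing partial sums numerically. The sketch below (the function names are hypothetical) builds each term from the previous one, which avoids computing large factorials. The printed differences from `math.exp` and `math.cos` at the sample point $x = 10$ shrink rapidly as N grows, consistent with Theorem 6.6b.

```python
import math

def taylor_exp(x, N):
    """Partial sum sum_{n=0}^{N} x^n / n! of the Taylor series of e^x."""
    term = total = 1.0
    for n in range(1, N + 1):
        term *= x / n                 # now term == x^n / n!
        total += term
    return total

def taylor_cos(x, N):
    """Partial sum sum_{n=0}^{N} (-1)^n x^(2n) / (2n)! of the Taylor series of cos x."""
    term = total = 1.0
    for n in range(1, N + 1):
        term *= -x * x / ((2 * n - 1) * (2 * n))   # ratio of successive terms
        total += term
    return total

x = 10.0
for N in (5, 10, 20, 40):
    print(N, taylor_exp(x, N) - math.exp(x), taylor_cos(x, N) - math.cos(x))
```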
Finally, we mention one other simple type of series that arises from time to time. Just as $\int f(x)\,dx$ is easy to compute when $f$ is the derivative of a known function, the series $\sum a_n$ is easy to sum when the terms $a_n$ are the differences of a known sequence $\{b_n\}$. That is, suppose $a_0 = b_0$ and $a_n = b_n - b_{n-1}$ for $n\ge 1$; then
$$\sum_{n=0}^{k} a_n = b_0 + (b_1 - b_0) + (b_2 - b_1) + \cdots + (b_k - b_{k-1}) = b_k,$$
so the series $\sum_0^\infty a_n$ converges if and only if the sequence $\{b_n\}$ converges, in which case $\sum_0^\infty a_n = \lim b_n$. Such series are called telescoping series.
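Here is a minimal check of the telescoping identity with exact rational arithmetic. The sequence $b_n = n/(n+1)$ is an illustrative choice, not from the text. It gives $a_0 = 0$ and $a_n = 1/(n(n+1))$ for $n\ge 1$, so the series converges to $\lim b_n = 1$.

```python
from fractions import Fraction

def b(n):
    """Illustrative sequence b_n = n/(n+1), with limit 1."""
    return Fraction(n, n + 1)

def a(n):
    """a_0 = b_0 and a_n = b_n - b_(n-1) for n >= 1."""
    return b(0) if n == 0 else b(n) - b(n - 1)

for k in (1, 10, 100):
    assert sum(a(n) for n in range(k + 1)) == b(k)   # the sum telescopes to b_k
print(a(3), float(sum(a(n) for n in range(1001))))
```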
EXERCISES
1. Find the values of $x$ for which each of the following series converges and compute its sum.
a. $2(x+1) + 4(x+1)^4 + 8(x+1)^7 + \cdots + 2^{n+1}(x+1)^{3n+1} + \cdots$
b. $10x^{-2} + 20x^{-4} + 40x^{-6} + \cdots + 10\cdot 2^nx^{-2(n+1)} + \cdots$
c. $1 + (1-x)/(1+x) + (1-x)^2/(1+x)^2 + \cdots + (1-x)^n/(1+x)^n + \cdots$
d. $\log x + (\log x)^2 + (\log x)^3 + \cdots + (\log x)^n + \cdots$
2. Tell whether each of the following series converges; if it does, find its sum.
ii3,5,_9_i17i
a. x -r 4 T g t jg t 32 t ■
c. $(\sqrt2 - \sqrt1) + (\sqrt3 - \sqrt2) + (\sqrt4 - \sqrt3) + \cdots$
H1
a. 1-51 +
I 11 1-4-1
3 + 1-4J- +i_ 1-5
1 1_L
+
3. Let $f(x) = \log(1 + x)$. Show that the Taylor remainder $R_{0,k}(x)$ (defined by (2.54)) tends to zero as $k\to\infty$ for $-1 < x\le 1$, and conclude that
$$\log(1 + x) = \sum_{n=1}^{\infty}(-1)^{n+1}\frac{x^n}{n}\quad\text{for } -1 < x\le 1.$$
(Hint: Lagrange's formula for $R_{0,k}$ easily yields the desired result when $-\frac12\le x\le 1$ but not when $-1 < x < -\frac12$. For $x < 0$, use the integral formula (2.56) for $R_{0,k}$ and the mean value theorem for integrals to show that $|R_{0,k}(x)| = |x|(x' - x)^k(x' + 1)^{-k-1}$ for some $x'\in(x,0)$, and thence show
4. Given a sequence $\{a_n\}$ of numbers, let $\prod_1^k a_n$ denote the product of the numbers $a_1,\dots,a_k$. The infinite product $\prod_1^\infty a_n$ is said to converge to the number $P$ if the sequence of partial products converges to $P$:
$$\prod_{1}^{\infty} a_n = \lim_{k\to\infty}\prod_{1}^{k} a_n = P.$$
(Note: In many books one finds a more complicated definition that takes account of the peculiar role of the number 0 with regard to multiplication.)
a. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\lim_{n\to\infty} a_n = 1$. (This is the analogue of Theorem 6.1c for products.)
b. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\sum_1^\infty\log a_n$ converges after omission of those terms for which $a_n < 0$. (By (a), there can only be finitely many such terms, and no $a_n$ can be 0.) Conversely, show that if $a_n > 0$ for all $n$ and $\sum_1^\infty\log a_n$ converges to $S$, then $\prod_1^\infty a_n$ converges to $e^S$. (See also Exercise 5 in §6.3.)
In this section we begin the systematic study of the convergence of infinite series by considering series with nonnegative terms. If $a_n\ge 0$ for all $n$, the partial sums $s_k = a_0 + \cdots + a_k$ form an increasing sequence. By the monotone sequence theorem, therefore, the series $\sum_0^\infty a_n$ converges if and only if the partial sums $s_k$ have a finite upper bound. This observation leads to a variety of comparison tests, in which the partial sums $s_k$ are compared to more easily computable quantities that can be shown to be bounded or unbounded.
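6.7 Theorem. Suppose $f$ is a positive, decreasing function on the half-line $[a,\infty)$. Then for any integers $j, k$ with $a\le j < k$,
$$\sum_{n=j+1}^{k} f(n)\le\int_j^k f(x)\,dx\le\sum_{n=j}^{k-1} f(n).$$
Proof. Since $f$ is decreasing, for $n\le x\le n + 1$ we have $f(n)\ge f(x)\ge f(n + 1)$, and hence
$$f(n + 1)\le\int_n^{n+1} f(x)\,dx\le f(n).$$
Summing these inequalities from $n = j$ to $n = k - 1$ gives the result. □
6.8 Corollary. If $f$ is positive and decreasing on $[1,\infty)$, then $\sum_1^\infty f(n)$ converges if and only if $\int_1^\infty f(x)\,dx < \infty$.
Proof. If $\int_1^\infty f(x)\,dx < \infty$, then by Theorem 6.7,
$$\sum_{n=1}^{k} f(n) = f(1) + \sum_{n=2}^{k} f(n)\le f(1) + \int_1^k f(x)\,dx\le f(1) + \int_1^\infty f(x)\,dx,$$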
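so the partial sums are bounded above and hence the series converges. On the other hand, if $\int_1^\infty f(x)\,dx = \infty$, we have
$$\sum_{n=1}^{k-1} f(n)\ge\int_1^k f(x)\,dx\to\infty\quad\text{as } k\to\infty,$$
so the series diverges. □
Of course, a similar result relates $\sum_J^\infty f(n)$ to $\int_J^\infty f(x)\,dx$, for any integer $J$. We chose $J = 1$ because it is appropriate for the following important application.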
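6.9 Theorem. The series $\sum_1^\infty n^{-p}$ converges if $p > 1$ and diverges if $p\le 1$.
Proof. The same is true of the integrals $\int_1^\infty x^{-p}\,dx$, for
$$\int_1^\infty x^{-p}\,dx = \lim_{K\to\infty}\frac{K^{1-p} - 1}{1 - p} = \begin{cases}(p - 1)^{-1} & \text{if } p > 1,\\ \infty & \text{if } p < 1,\end{cases}$$
and $\int_1^\infty x^{-1}\,dx = \lim_{K\to\infty}\log K = \infty$. □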
Theorem 6.7 does more than provide a test for convergence; it also provides an
approximation to the partial sums and the full sum of the series. In the convergent
case, this can be used to provide a numerical approximation to the sum $\sum_1^\infty f(n)$
or an estimate of how many terms must be used for a partial sum to provide a good
approximation; in the divergent case, it can be used to estimate how rapidly the
partial sums grow.
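Suppose, for example, that $f$ is positive and decreasing, and that $\int_1^\infty f(x)\,dx < \infty$. By letting $k\to\infty$ in Theorem 6.7 (with $j = 1$), we obtain
$$\int_1^\infty f(x)\,dx\le\sum_1^\infty f(n)\le f(1) + \int_1^\infty f(x)\,dx,$$
and hence
$$0\le\sum_1^\infty f(n) - \int_1^\infty f(x)\,dx\le f(1).$$
This gives an approximation to the sum $\sum_1^\infty f(n)$ with an error of at most $f(1)$.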
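A better approximation can be obtained by using this estimate not for the whole series but for its tail end:
$$\sum_1^\infty f(n) = \sum_1^{k-1} f(n) + \sum_k^\infty f(n), \qquad \int_k^\infty f(x)\,dx\le\sum_k^\infty f(n)\le f(k) + \int_k^\infty f(x)\,dx.$$
For example, with $f(x) = x^{-4}$ and $k = 10$ we have $\int_{10}^\infty x^{-4}\,dx = \frac{1}{3000}$ and $f(10) = 10^{-4}$, so
$$\sum_1^9 n^{-4} + \frac{1}{3000}\le\sum_1^\infty n^{-4}\le\sum_1^9 n^{-4} + \frac{1}{3000} + 10^{-4}.$$
A bit of work with a pocket calculator yields the value of $\sum_1^9 n^{-4} + \frac{1}{3000}$ as $1.08226\ldots$, so we can conclude that $1.08226 < \sum_1^\infty n^{-4} < 1.08236$. (The exact value of $\sum_1^\infty n^{-4}$ is $\pi^4/90 = 1.0823232\ldots$; see Exercise 3 in §8.3 or Exercise 9a in §8.6.)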
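The tail estimate is easy to reproduce. The sketch below takes $f(x) = x^{-4}$ and splits off the tail at $k = 10$, which is an assumption consistent with the numbers quoted above. It uses $\int_k^\infty x^{-4}\,dx = 1/(3k^3)$ and prints the lower bound, the exact value $\pi^4/90$, and the upper bound.

```python
import math

def f(x):
    return x ** -4

k = 10
head = sum(f(n) for n in range(1, k))   # sum_{n=1}^{k-1} n^-4
tail_integral = 1 / (3 * k ** 3)        # integral of x^-4 from k to infinity
lower = head + tail_integral            # the tail is at least the integral
upper = lower + f(k)                    # the tail is at most f(k) + the integral
exact = math.pi ** 4 / 90
print(f"{lower:.7f} <= {exact:.7f} <= {upper:.7f}")
```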
General Comparison Tests. One can often decide whether a series of nonnegative terms converges by comparing it to a series whose convergence or divergence is known. The general method is as follows.
6.11 Theorem. Suppose $0\le a_n\le b_n$ for $n\ge 0$. If $\sum_0^\infty b_n$ converges, then so does $\sum_0^\infty a_n$. If $\sum_0^\infty a_n$ diverges, then so does $\sum_0^\infty b_n$.
Proof. Let $s_k = \sum_0^k a_n$ and $t_k = \sum_0^k b_n$, so that $0\le s_k\le t_k$. If $\sum_0^\infty b_n$ converges, the numbers $t_k$ form a bounded set; hence so do the numbers $s_k$, so the sequence $\{s_k\}$ converges by the monotone sequence theorem. This proves the first assertion, to which the second one is logically equivalent. □
A couple of remarks are in order concerning this result. First, the convergence
or divergence of a series is unaffected if finitely many terms are deleted from or
added to the series. Hence, the comparison $a_n\le b_n$ only has to be valid for all $n > N$, where $N$ is some (possibly large) positive integer. Second, the convergence or divergence of a series is unaffected if all the terms of the series are multiplied by a nonzero constant. Hence, the comparison $a_n\le b_n$ can be replaced by $a_n\le cb_n$, where $c$ is any positive number.
When $a_n$ is an algebraic function of $n$ (obtained from $n$ by applying various combinations of the arithmetic operations together with the operation of raising to a power, $x\to x^a$), one can usually decide the convergence of $\sum a_n$ by comparing it to one of the series $\sum_1^\infty n^{-p}$ discussed in Theorem 6.9. The rule of thumb, obtained by combining Theorems 6.9 and 6.11, is that if $a_n\ge cn^{-1}$ then $\sum a_n$ diverges, whereas if $a_n\le cn^{-p}$ for some $p > 1$ then $\sum a_n$ converges.
Example 3. Consider the series $\sum(n^2 - 6n + 10)^{-1}$. When $n\ge 12$ we have $6n\le n^2/2$, so
$$\frac{1}{n^2 - 6n + 10}\le\frac{1}{(n^2/2) + 10} < \frac{1}{n^2/2} = \frac{2}{n^2}\qquad(n\ge 12),$$
which gives the desired comparison. However, there is also a simpler way to proceed. The key observation is that when $n$ is large, $-6n + 10$ is negligibly small in comparison with $n^2$, so $(n^2 - 6n + 10)^{-1}$ is practically equal to $n^{-2}$. More precisely,
$$\frac{(n^2 - 6n + 10)^{-1}}{n^{-2}} = \frac{n^2}{n^2 - 6n + 10} = \frac{1}{1 - 6n^{-1} + 10n^{-2}}\to 1\quad\text{as } n\to\infty,$$
which immediately gives the comparison $(n^2 - 6n + 10)^{-1} < 2n^{-2}$ when $n$ is large.
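Both comparisons in Example 3 can be spot-checked numerically. This is a sanity check over a finite range of $n$, not a proof, and the range is an arbitrary choice.

```python
def a(n):
    """Terms 1/(n^2 - 6n + 10) of the series in Example 3."""
    return 1 / (n * n - 6 * n + 10)

# First method: a_n < 2/n^2 once n >= 12 (checked here on a finite range only).
assert all(a(n) < 2 / n**2 for n in range(12, 100_000))

# Second method: the ratio a_n / n^(-2) tends to 1.
for n in (10, 100, 1000, 100_000):
    print(n, a(n) * n**2)
```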
The second method for solving Example 3 can be formulated quite generally;
the result is often called the limit comparison test:
6.12 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are sequences of positive numbers and that $a_n/b_n$ approaches a positive, finite limit as $n\to\infty$. Then the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are either both convergent or both divergent.
Proof. If $a_n/b_n\to l$ as $n\to\infty$, where $0 < l < \infty$, we have $\frac12 l < a_n/b_n < 2l$ when $n$ is large; that is, $a_n < 2lb_n$ and $b_n < (2/l)a_n$. The result therefore follows from Theorem 6.11 and the remarks following it. □
Theorem 6.12 can be extended a little. If $a_n/b_n\to 0$ as $n\to\infty$, then $a_n < b_n$ for large $n$, so the convergence of $\sum b_n$ will imply the convergence of $\sum a_n$. Likewise, if $a_n/b_n\to\infty$, then $a_n > b_n$ for large $n$, so the convergence of $\sum a_n$ will imply the convergence of $\sum b_n$. However, the reverse implications are not valid in these cases.
6.2. Series with Nonnegative Terms 289
6.13 Theorem (The Ratio Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_{n+1}/a_n\le r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$ converges. On the other hand, if $a_{n+1}/a_n\ge 1$ for all sufficiently large $n$, then the series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_{n+1}/a_n$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.
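Proof. If $a_{n+1}/a_n\le r$ for all $n\ge N$, then $a_{N+m}\le r^ma_N$ for all $m\ge 0$, so
$$\sum_{n=N}^{\infty} a_n\le a_N\sum_{m=0}^{\infty} r^m = \frac{a_N}{1 - r} < \infty,$$
and $\sum a_n$ converges by Theorems 6.3 and 6.11. On the other hand, if $a_{n+1}/a_n\ge 1$ then $a_{n+1}\ge a_n$; if this is so for all $n\ge N$, then $a_n\not\to 0$, so $\sum a_n$ cannot converge. This proves (a).
Assertion (b) is a corollary of (a). If $l < 1$, choose $r$ with $l < r < 1$. If $\lim a_{n+1}/a_n = l$, then $a_{n+1}/a_n < r$ for large $n$, so $\sum a_n$ converges. If $l > 1$, then $a_{n+1}/a_n > 1$ for large $n$, so $\sum a_n$ diverges. Finally, if we take $a_n = n^{-p}$, we know that $\sum_1^\infty a_n$ converges if $p > 1$ and diverges if $p\le 1$; but $a_{n+1}/a_n = [n/(n+1)]^p\to 1$ no matter what $p$ is. Hence the test is inconclusive if $l = 1$. □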
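6.14 Theorem (The Root Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_n^{1/n}\le r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$ converges. On the other hand, if $a_n^{1/n}\ge 1$ for all sufficiently large $n$, then the series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_n^{1/n}$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.
Proof. If $a_n^{1/n}\le r$ for $n\ge N$, then $a_n\le r^n$ for $n\ge N$, so $\sum a_n$ converges by comparison with the geometric series $\sum r^n$. If $a_n^{1/n}\ge 1$ for $n\ge N$, then $a_n\ge 1$ for $n\ge N$, so $a_n\not\to 0$ and $\sum a_n$ diverges. Part (b) follows as in the proof of the ratio test. If $a_n^{1/n}\to l < 1$, we pick $r\in(l,1)$ and obtain $a_n^{1/n} < r$ for large $n$, so $\sum a_n$ converges. If $a_n^{1/n}\to l > 1$, then $a_n^{1/n} > 1$ for large $n$, and $\sum a_n$ diverges. Finally, for $a_n = n^{-p}$ we have $a_n^{1/n} = n^{-p/n}\to 1$ for any $p$, so the test is inconclusive when $l = 1$. □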
In the last line of this proof, and in Example 4 below, we use the fact that $\lim_{x\to\infty} x^{1/x} = 1$. To see this, observe that $\log(x^{1/x}) = (\log x)/x$, and $\lim_{x\to\infty}(\log x)/x = 0$ by l'Hôpital's rule.
It can be shown that if $a_{n+1}/a_n$ converges to a limit $l$, then $a_n^{1/n}$ also converges to the same limit; but the convergence of $a_n^{1/n}$ does not imply the convergence of $a_{n+1}/a_n$. (See Example 6.) Thus the root test is, in theory, more powerful than the ratio test. However, the ratio test is often more convenient to use in practice, especially for series whose terms involve factorials or similar sorts of products.
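Example 4. Let $a_n = n^2/2^n$. The ratio test and the root test can both be used to establish the convergence of $\sum_0^\infty a_n$:
$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^2}{2n^2}\to\frac12, \qquad a_n^{1/n} = \frac{(n^{1/n})^2}{2}\to\frac12.$$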
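Example 6. Consider the series $1 + 1 + \frac12 + \frac12 + \frac14 + \frac14 + \cdots$, that is, $a_n = 2^{-n/2}$ if $n$ is even and $a_n = 2^{-(n-1)/2}$ if $n$ is odd. Here $a_{n+1}/a_n$ equals 1 if $n$ is even and $\frac12$ if $n$ is odd, so the ratio test (even the more general form in part (a) of Theorem 6.13) fails; its hypotheses are not satisfied. But the root test works: $a_n^{1/n}$ equals $2^{-1/2}$ if $n$ is even and $2^{-(n-1)/2n}$ if $n$ is odd; both of these expressions converge to $2^{-1/2}$ as $n\to\infty$, so the series converges. (Of course, this can also be proved more simply. By grouping the terms in pairs, one sees that $\sum_0^\infty a_n = 2\sum_0^\infty 2^{-m} = 4$.)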
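The contrast between the ratio test and the root test in this example shows up clearly in a quick computation. The sketch encodes the terms as $a_n = 2^{-\lfloor n/2\rfloor}$, which agrees with the values of $a_{n+1}/a_n$ and $a_n^{1/n}$ quoted in the text. The sample values of $n$ are arbitrary.

```python
def a(n):
    """Terms 1, 1, 1/2, 1/2, 1/4, 1/4, ..., i.e. a_n = 2^(-floor(n/2))."""
    return 2.0 ** -(n // 2)

ratios = [a(n + 1) / a(n) for n in range(8)]
print(ratios)                              # 1.0 and 0.5 alternate: no limit
roots = [a(n) ** (1 / n) for n in (10, 11, 1000, 1001)]
print(roots, 2 ** -0.5)                    # even and odd n both approach 2^(-1/2)
print(sum(a(n) for n in range(200)))       # partial sums approach 4
```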
Raabe's Test. The ratio test and the root test are, in a sense, rather crude, for the indecisive cases where $\lim a_{n+1}/a_n = 1$ or $\lim a_n^{1/n} = 1$ include many commonly encountered series such as $\sum_1^\infty n^{-p}$. The reason for this insensitivity is that the terms of the geometric series $\sum r^n$ either converge to zero exponentially fast (if $r < 1$) or not at all (if $r\ge 1$), so they do not furnish a useful comparison for quantities such as $n^{-p}$ that tend to zero only polynomially fast. However, there is another test, Raabe's test, that is sometimes useful in the case where $\lim a_{n+1}/a_n = 1$. The class of problems for which Raabe's test is effective is rather limited, and
there is another way of attacking the most important of them that we shall present in
§7.6. Hence we view Raabe's test as an optional topic; however, the insight behind
it is of interest in its own right.
The idea is to use the ratios $a_{n+1}/a_n$ to compare the series $\sum a_n$ to one of the series $\sum n^{-p}$ rather than to the geometric series. For the series $\sum n^{-p}$, the ratio of two successive terms is $(n+1)^{-p}/n^{-p} = [1 + (1/n)]^{-p}$. To put this quantity in a form more amenable to comparison, we use the tangent line approximation to the function $f(x) = (1 + x)^{-p}$ at $x = 0$. Since $f'(x) = -p(1 + x)^{-p-1}$ and $f''(x) = p(p + 1)(1 + x)^{-p-2}$, Lagrange's formula for the error term gives
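$$\Bigl(1 + \frac1n\Bigr)^{-p} = 1 - \frac pn + \frac{p(p + 1)}{2(1 + \xi)^{p+2}n^2}\quad\text{for some }\xi\in\Bigl(0,\frac1n\Bigr),$$
so that, for $p > 0$,
$$1 - \frac pn < \frac{(n+1)^{-p}}{n^{-p}} < 1 - \frac pn + \frac{p(p+1)}{2n^2}. \qquad (6.15)$$
Thus, $n[1 - (n+1)^{-p}/n^{-p}]$ is approximately $p$ when $n$ is large. With this in mind, we are ready for the main result.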
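For the comparison series $\sum n^{-p}$ themselves, the quantity $n[1 - a_{n+1}/a_n]$ that Raabe's test examines can be tabulated directly. In this sketch the helper name is hypothetical, and the sample values of $p$ and $n$ are arbitrary.

```python
def raabe_quantity(ratio, n):
    """n * [1 - a_{n+1}/a_n], given ratio = a_{n+1}/a_n."""
    return n * (1 - ratio)

for p in (0.5, 1.0, 2.0, 3.0):
    # For a_n = n^(-p) the ratio a_{n+1}/a_n is ((n + 1)/n)^(-p).
    values = [raabe_quantity(((n + 1) / n) ** -p, n) for n in (10, 1000, 100_000)]
    print(p, values)   # each row approaches p as n grows
```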
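6.16 Theorem (Raabe's Test). Let $\{a_n\}$ be a sequence of positive numbers. Suppose that
$$L = \lim_{n\to\infty} n\Bigl[1 - \frac{a_{n+1}}{a_n}\Bigr]$$
exists. Then $\sum a_n$ converges if $L > 1$ and diverges if $L < 1$.
Proof. If $L > 1$, choose a number $p$ with $1 < p < L$. Then, when $n$ is large, we have $n[1 - (a_{n+1}/a_n)] > p$, that is, $a_{n+1}/a_n < 1 - (p/n)$. Thus, by (6.15),
$$\frac{a_{n+1}}{a_n} < \frac{(n+1)^{-p}}{n^{-p}},\quad\text{that is,}\quad\frac{a_{n+1}}{(n+1)^{-p}} < \frac{a_n}{n^{-p}},$$
so $a_n/n^{-p}$ is eventually decreasing. Hence $a_n\le Cn^{-p}$ for some constant $C$, and $\sum a_n$ converges by Theorems 6.9 and 6.11.
On the other hand, if $L < 1$, choose numbers $p$ and $q$ with $L < q < p < 1$. Then, when $n$ is large, we have $n[1 - (a_{n+1}/a_n)] < q$, that is, $a_{n+1}/a_n > 1 - (q/n)$. If also $n > p(p + 1)/2(p - q)$, we have $p(p + 1)/2n^2 < (p - q)/n$, so by (6.15),
$$\frac{a_{n+1}}{a_n} > 1 - \frac qn > 1 - \frac pn + \frac{p(p+1)}{2n^2} > \frac{(n+1)^{-p}}{n^{-p}}.$$
Hence $a_n/n^{-p}$ is eventually increasing, so $a_n\ge Cn^{-p}$ for some $C > 0$ and all large $n$, and $\sum a_n$ diverges since $p < 1$. □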
The main applications of Raabe's test are to series whose terms involve quotients of factorial-like products. The following example is typical.
l-4-7---(3n
Example 7. Let an = —■ „„_ . ' '--—. We have
?/,23"n!
This tends to 1 as $n\to\infty$ (the dominant term on both top and bottom is $3n^3$), so the ratio test fails. But
Concluding Remarks. Faced with an infinite series $\sum a_n$, how does one decide how to test it for convergence? Some series require more cleverness than others, but the following rules of thumb may be helpful.
• If $a_n$ involves expressions with $n$ in the exponent, try the ratio test or the root test.
• If $a_n$ involves factorial-like products, the ratio test is the best bet. If the ratio test fails because $\lim a_{n+1}/a_n = 1$, try Raabe's test.
• The integral test may be useful when numerical estimates are desired or when
the series is near the borderline between convergence and divergence.
In any case, one should beware of confusing the various sequences that arise in
the study of infinite series. For any infinite series $\sum a_n$, one has the sequence $\{a_n\}$ of terms and the sequence $\{s_k\}$ of partial sums. In the ratio test, one considers the sequence $\{a_{n+1}/a_n\}$ of ratios of successive terms of a series, whereas in the limit comparison test, one considers the sequence $\{a_n/b_n\}$ of ratios of corresponding terms of two different series. Don't mix these sequences up!
EXERCISES
^ \Jn + 1
n2 _ 4n _|_ 5 ■
oo
ne
x 2n2 - n
, 2n8/3
n + 1
n! '
"
3n(2n)!
0
OC'
n!
7.
10n'
oc
\-100
8.
2
oo
1 ■ 3 ■ ■ • (2n + 1)
u
7■
/ ^ ■•(3n + 2)
0
00 / 1 \9
{2n)V
0
00
on |
O Tl.
11. y ■
n
0
( " y2
00
53
) ■
12.
00
E (l/n)].
r-i
13.
1
00
y/n+ 1 - V"
14.
E \fn-+^
1
00
n
15.
E 1
sin _
n2 H--3"
00
n2 [vr + (_1)n]n
16.
E 5
1
00
(2n - 1)
17
it*
1
4.6... (2n + 2)
00
2-4- • • (2n)
19. Suppose an > 0. Show that if ^T an converges, then so does J2 a« f°r any
00
22. By Exercise 20, $\sum_2^\infty 1/[n\log n]$ diverges while $\sum_2^\infty 1/[n(\log n)^2]$ converges.
Use Theorem 6.7 to show that
10*' oo
The point is that for series such as these that are near the borderline between
convergence and divergence, attempts at numerical approximation by adding
6.3. Absolute and Conditional Convergence 295
up the first few terms aren't much use. If you add up the first $10^{40}$ terms of the first series, you get no clue that the series diverges; and if you add up the first $10^{40}$ terms of the second one, the answer you get still differs from the full sum in the second decimal place. (By way of comparison, the universe is around $10^{18}$ seconds old, and the earth contains around $10^{50}$ atoms.)
23. Verify that $x/(x^2 + 1)^2$ is decreasing for $x\ge 3^{-1/2}$, and thence show that $0.38 < \sum_1^\infty n/(n^2 + 1)^2 < 0.41$.
24. Let $c_k = 1 + \frac12 + \cdots + \frac1k - \log k$. Show that the sequence $\{c_k\}$ is positive and decreasing, and hence convergent. ($\lim_{k\to\infty} c_k$ is conventionally denoted by $\gamma$ and is called Euler's constant or the Euler-Mascheroni constant. It is approximately equal to 0.57721; it is conjectured to be transcendental, but at present no one knows whether it is even irrational.)
25. Suppose $a_n\ge 0$ for all $n\ge 0$, and let $L = \limsup a_n^{1/n}$ (see Exercises 9-12 in §1.5). Show that $\sum_0^\infty a_n$ converges if $L < 1$ and diverges if $L > 1$.
We now consider the question of convergence of series whose terms may be either
positive or negative. To a certain extent, this question may be reduced to the study
of series with nonnegative terms, via the notion of absolute convergence.
A series $\sum_0^\infty a_n$ is called absolutely convergent if the series $\sum_0^\infty|a_n|$ converges. For series with nonnegative terms, absolute convergence is the same thing as convergence. For more general series, the basic result is as follows.
6.17 Theorem. Every absolutely convergent series is convergent.
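Proof. Suppose $\sum_0^\infty|a_n|$ converges. Let $s_k = \sum_0^k a_n$ and $S_k = \sum_0^k|a_n|$. The sequence $\{S_k\}$ is convergent and hence Cauchy, so given $\epsilon > 0$, there exists an integer $K$ such that
$$S_k - S_j = |a_{j+1}| + \cdots + |a_k| < \epsilon\quad\text{whenever } k > j > K.$$
But then
$$|s_k - s_j| = |a_{j+1} + \cdots + a_k|\le|a_{j+1}| + \cdots + |a_k| < \epsilon\quad\text{whenever } k > j > K,$$
so the sequence $\{s_k\}$ is also Cauchy. By Theorem 1.20, the sequence $\{s_k\}$, and hence the series $\sum_0^\infty a_n$, is convergent. □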
Important Remark. We can consider series whose terms are complex numbers
or $n$-dimensional vectors instead of real numbers. The definition of absolute convergence is the same, with $|a_n|$ denoting the norm of the vector $a_n$. Theorem 6.17
remains valid in this more general setting, with exactly the same proof.
The converse of Theorem 6.17 is false; a series that is not absolutely convergent
may still converge because of cancellation between the positive and negative terms.
A series that converges but does not converge absolutely is said to be conditionally
convergent.
3 3 """ 5 5 "f"
0
0 0 ""* '
_i\n-l
n * ° *
i
(n — 1)! „ _ , .
k{x)
TV.
x2 x3
■■ + -^1 +Rk(x),
and by Corollary 2.61,
1 (-l)kk\ 1
[k -\- lj! o<i<i (1 ~l~ v "<
$$a_n^+ = \max(a_n, 0), \qquad a_n^- = \max(-a_n, 0).$$
Absolutely convergent series are much more pleasant to deal with than conditionally convergent ones. For one thing, they converge more rapidly; the partial sums $s_k$ of conditionally convergent series tend to provide poor approximations to the full sum unless one takes $k$ very large, because the divergence of $\sum|a_n|$ implies that $a_n$ cannot tend to zero very rapidly as $n\to\infty$. For another thing, the sum of an absolutely convergent series cannot be affected by rearranging the terms, but this is not the case for conditionally convergent series!
Let us explain this mysterious statement in more detail. The terms of a series $\sum_0^\infty a_n$ are presented in a definite order: $a_0, a_1, a_2, \ldots$. We might think of forming a new series by writing down these terms in a different order, such as
$$a_0, a_2, a_1, a_4, a_6, a_3, a_8, a_{10}, a_5, \ldots,$$
where we take the first two even-numbered terms, the first odd-numbered term, the next two even-numbered terms, the next odd-numbered term, and so forth. In general, if $\sigma$ is any one-to-one mapping from the set of nonnegative integers onto itself, we can form the series $\sum_0^\infty a_{\sigma(n)}$, which we call a rearrangement of $\sum_0^\infty a_n$. (The reasons why we would want to do this are perhaps not so clear right now, but we will encounter situations in §6.5 where this issue must be addressed.) The sharp contrast between absolutely and conditionally convergent series with respect to rearrangements is explained in the following two theorems.
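To see that the order really can matter, one can apply the rearrangement above (two even-numbered terms, then one odd-numbered term) to the alternating harmonic series $\sum_0^\infty(-1)^n/(n+1)$, which converges conditionally to $\log 2$. It is a classical fact that this particular rearrangement, $1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \cdots$, has sum $\frac32\log 2$. The sketch below (hypothetical names) compares partial sums only, so the agreement it shows is approximate.

```python
import math

def alt_harmonic(n):
    """a_n = (-1)^n / (n + 1) for n >= 0; the series sums to log 2."""
    return (-1) ** n / (n + 1)

def sigma(m):
    """The rearrangement a_0, a_2, a_1, a_4, a_6, a_3, ... described above.

    Positions come in blocks of three: block b holds the even indices
    4b and 4b + 2, followed by the odd index 2b + 1.
    """
    b, r = divmod(m, 3)
    return (4 * b, 4 * b + 2, 2 * b + 1)[r]

N = 300_000  # number of terms in each partial sum (an arbitrary choice)
original = sum(alt_harmonic(n) for n in range(N))
rearranged = sum(alt_harmonic(sigma(m)) for m in range(N))
print(original, math.log(2))          # both about 0.6931
print(rearranged, 1.5 * math.log(2))  # both about 1.0397
```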
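6.19 Theorem. If $\sum_0^\infty a_n$ is absolutely convergent with sum $S$, then every rearrangement $\sum_0^\infty a_{\sigma(n)}$ is also absolutely convergent, with sum $S$.
Proof. First suppose $a_n\ge 0$ for all $n$. Every term of the rearranged series $\sum a_{\sigma(n)}$ is among the terms of the original series $\sum a_n$, and hence the partial sums of the rearranged series cannot exceed $S$. It follows that the full sum $S'$ of the rearranged series satisfies $S'\le S$. The same reasoning shows that $S\le S'$, so $S' = S$.
Now we do the general case. If $\sum|a_n| < \infty$, we have $\sum|a_{\sigma(n)}| < \infty$ by what we have just proved. Hence, given $\epsilon > 0$, for $k$ sufficiently large we have
$$\sum_{k+1}^\infty|a_n| < \epsilon\quad\text{and}\quad\sum_{k+1}^\infty|a_{\sigma(n)}| < \epsilon.$$
Given such a $k$, let $K$ be the largest of the numbers $\sigma(0),\dots,\sigma(k)$, so that
$$\{\sigma(0),\sigma(1),\dots,\sigma(k)\}\subset\{0,1,\dots,K\}.$$
The elements of $\{0,1,\dots,K\}\setminus\{\sigma(0),\sigma(1),\dots,\sigma(k)\}$ are among the $\sigma(n)$'s with $n\ge k + 1$, so
$$\Bigl|\sum_0^K a_n - \sum_0^k a_{\sigma(n)}\Bigr|\le\sum_{k+1}^\infty|a_{\sigma(n)}| < \epsilon.$$
But then, since $K\ge k$,
$$\Bigl|\sum_0^\infty a_n - \sum_0^k a_{\sigma(n)}\Bigr|\le\sum_{K+1}^\infty|a_n| + \Bigl|\sum_0^K a_n - \sum_0^k a_{\sigma(n)}\Bigr| < 2\epsilon.$$
Since $\epsilon$ is arbitrary, $\sum_0^\infty a_{\sigma(n)} = \sum_0^\infty a_n$. □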
6.20 Theorem. Suppose $\sum_0^\infty a_n$ is conditionally convergent. Given any real number $S$, there is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ that converges to $S$.
6.3. Absolute and Conditional Convergence 299
Proof. By Theorem 6.18, the series $\sum a_n^+$ and $\sum a_n^-$ of positive and negative terms from $\sum a_n$ both diverge; but since $\sum a_n$ converges, we have $a_n \to 0$ as $n \to \infty$. These pieces of information are all we need.
Suppose $S \ge 0$. (A similar argument works for $S < 0$.) We construct the desired rearrangement as follows:
1. Add up the positive terms from the series $\sum a_n$ (in their original order) until the sum exceeds $S$. This is possible since $\sum a_n^+ = \infty$. Stop as soon as the sum exceeds $S$.
2. Now start adding in the negative terms (in their original order) until the sum becomes less than $S$. Again, this is possible since $\sum a_n^- = \infty$. Stop as soon as the sum is less than $S$.
3. Repeat steps 1 and 2 ad infinitum. That is, add in positive terms until the sum is greater than $S$, then add in negative terms until the sum is less than $S$, and so forth. This process never terminates since the series $\sum a_n^+$ and $\sum a_n^-$ both diverge, and sooner or later every term from the original series will be added into the new series. The result is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ of the original series.
We claim that this rearrangement converges to $S$. Indeed, given $\epsilon > 0$, there exists an integer $N$ so that $|a_n| < \epsilon$ if $n > N$. If we choose $K$ large enough so that all the terms $a_0, a_1, \ldots, a_N$ are included among the terms $a_{\sigma(0)}, a_{\sigma(1)}, \ldots, a_{\sigma(K)}$, then $|a_{\sigma(n)}| < \epsilon$ if $n > K$. It follows that the partial sums $\sum_0^k a_{\sigma(n)}$ differ from $S$ by less than $\epsilon$ if $k > K$, because the procedure specifies switching from positive to negative terms or vice versa as soon as the sum is greater than or less than $S$; if the sum became greater than $S + \epsilon$ or less than $S - \epsilon$, we would have added in too many terms of the same sign. Hence the sums $\sum_0^k a_{\sigma(n)}$ converge to $S$. □
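The construction in the proof of Theorem 6.20 is effectively an algorithm. Below is a minimal Python sketch of it, under the assumption that the series is supplied as a callable `a(n)` returning $a_n$ (a hypothetical interface chosen for illustration); for simplicity the sketch skips terms equal to 0 and generates only finitely many terms. It is applied to the alternating harmonic series $\sum_0^\infty (-1)^n/(n+1)$ with the arbitrarily chosen target $S = 1.5$.

```python
def riemann_rearrangement(a, target, n_terms):
    """Greedy rearrangement following the proof of Theorem 6.20 (a sketch).

    a       -- callable giving the n-th term a_n (n = 0, 1, 2, ...) of a
               conditionally convergent series; zero terms are skipped here
    target  -- the real number S the rearranged series should converge to
    n_terms -- number of terms of the rearranged series to generate
    Returns (order, sums): the indices sigma(0), sigma(1), ... and the
    partial sums of the rearranged series.
    """
    def indices(sign):
        # Indices of the positive (sign=+1) or negative (sign=-1) terms,
        # in their original order.
        n = 0
        while True:
            if sign * a(n) > 0:
                yield n
            n += 1

    positives, negatives = indices(+1), indices(-1)
    adding_positive = True          # step 1 starts with positive terms
    order, sums, s = [], [], 0.0
    for _ in range(n_terms):
        n = next(positives) if adding_positive else next(negatives)
        s += a(n)
        order.append(n)
        sums.append(s)
        # Switch sign as soon as the running sum crosses S (steps 1 and 2).
        if adding_positive and s > target:
            adding_positive = False
        elif not adding_positive and s < target:
            adding_positive = True
    return order, sums


# Alternating harmonic series 1 - 1/2 + 1/3 - ... (sum log 2), steered toward 1.5.
order, sums = riemann_rearrangement(lambda n: (-1) ** n / (n + 1), 1.5, 20000)
print(sums[-1])
```

Each index appears at most once in `order`, and the last partial sum lies close to 1.5; changing `target` (negative values included) steers the same terms to a different limit.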
EXERCISES
Show that the sum of this series is $\frac32\log 2$. (Hint: Deduce from Example 2 that $0 + \frac12 + 0 - \frac14 + 0 + \frac16 + 0 - \frac18 + \cdots = \frac12\log 2$ and add this to the result of Example 2.)
4. Let $\sum_0^\infty a_n$ be a convergent series, and let $\sum_0^\infty b_n$ be its rearrangement obtained by interchanging each even-numbered term with the odd-numbered term immediately following it: $a_1 + a_0 + a_3 + a_2 + a_5 + a_4 + \cdots$. Show that $\sum_0^\infty b_n = \sum_0^\infty a_n$.
5. Suppose $a_n > -1$ for all $n$. By suitable applications of Taylor's theorem to the functions $\log(1+x)$ or $e^x$, show the following:
a. $\sum a_n$ is absolutely convergent if and only if $\sum \log(1 + a_n)$ is absolutely convergent. (This is of interest in connection with Exercise 4 of §6.1: If $\sum |a_n| < \infty$, then $\prod (1 + a_n)$ converges.)
b. Let $a_n = (-1)^{n+1}/\sqrt{n}$. Then $\sum_1^\infty a_n$ is conditionally convergent (see Theorem 6.22 below), but $\sum_1^\infty \log(1 + a_n)$ diverges.
The tests we developed in §6.2 for the convergence of series of nonnegative terms
immediately yield tests for the absolute convergence of more general series. We
sum up the most important results:
6.21 Theorem.
a. If $|a_n| \le Cn^{-1-\epsilon}$ for some $C, \epsilon > 0$, then $\sum a_n$ converges absolutely. If $|a_n| \ge Cn^{-1}$ for some $C > 0$, then $\sum a_n$ either converges conditionally or diverges.
b. (The Ratio Test) If $|a_{n+1}/a_n| \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.
c. (The Root Test) If $|a_n|^{1/n} \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.
In the ratio and root tests, the divergence (rather than conditional convergence) when $l > 1$ is guaranteed because $a_n \not\to 0$ in this case; see the proofs of Theorems 6.13 and 6.14. The statements of the ratio and root tests can be sharpened a bit as in Theorems 6.13a and 6.14a.
Warning. It is a common mistake to obtain incorrect results by forgetting the absolute values in Theorem 6.21. For example, the series $\sum_0^\infty (-2)^n$ satisfies $a_{n+1}/a_n = -2$, and $-2 < 1$, but the series diverges!
It remains to investigate criteria that will yield information about conditional
convergence as well as absolute convergence. By far the most commonly used
6.4. More Convergence Tests 301
result of this kind pertains to alternating series, that is, series whose terms alternate in sign. Such a series can be written in the form $\sum (-1)^n a_n$ or $\sum (-1)^{n+1} a_n$ (depending on whether the even- or odd-numbered terms are positive), where $a_n > 0$; we shall consider the first form for the sake of definiteness.
6.22 Theorem (The Alternating Series Test). Suppose the sequence $\{a_n\}$ is decreasing and $\lim_{n\to\infty} a_n = 0$. Then the series $\sum_0^\infty (-1)^n a_n$ is convergent. Moreover, if $s_k$ and $S$ denote the $k$th partial sum and the full sum of this series, we have
$$s_k > S \text{ for } k \text{ even}, \qquad s_k < S \text{ for } k \text{ odd}, \qquad |s_k - S| < a_{k+1} \text{ for all } k.$$
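Proof. The partial sums are $s_k = \sum_0^k (-1)^n a_n$. Since $\{a_n\}$ is decreasing,
$$s_{2m+1} = s_{2m-1} + a_{2m} - a_{2m+1} \ge s_{2m-1}, \qquad s_{2m+2} = s_{2m} - a_{2m+1} + a_{2m+2} \le s_{2m}.$$
Thus the sequence $\{s_{2m-1}\}$ of odd-numbered partial sums is increasing and the sequence $\{s_{2m}\}$ of even-numbered partial sums is decreasing. This monotonicity further yields
$$s_1 \le s_{2m-1} < s_{2m-1} + a_{2m} = s_{2m} \le s_0,$$
so $\{s_{2m-1}\}$ and $\{s_{2m}\}$ are bounded above and below, respectively. By the monotone sequence theorem, these sequences both converge, and since $s_{2m} - s_{2m-1} = a_{2m} \to 0$, their limits are equal. Thus the whole sequence $\{s_k\}$ converges, that is, the series $\sum (-1)^n a_n$ converges. The even-numbered partial sums decrease to the full sum $S$ while the odd-numbered ones increase, so $S < s_{2m}$ and $S > s_{2m-1}$ for all $m$. In particular,
$$0 < S - s_{2m-1} < s_{2m} - s_{2m-1} = a_{2m},$$
$$0 < s_{2m} - S < s_{2m} - s_{2m+1} = a_{2m+1}. \qquad \square$$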
The alternating series test is a useful test for conditional convergence, but the
fact that the difference between a partial sum and the full sum is less in absolute
value than the first neglected term is also of interest in the absolutely convergent
case. (This estimate for the error in replacing the full sum by a partial sum is, in
most cases, accurate to within an order of magnitude.)
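The inequalities of Theorem 6.22 are easy to check numerically. A small Python sketch, using the alternating harmonic series $\sum_0^\infty (-1)^n/(n+1)$ as an illustrative choice (its sum is the classical value $\log 2$): it verifies $s_k > S$ for even $k$, $s_k < S$ for odd $k$, and $|s_k - S| < a_{k+1}$, and then prints the ratio of the actual error to the bound.

```python
import math

def alternating_partial_sums(a, k_max):
    """Partial sums s_0, ..., s_{k_max} of the series sum_{n>=0} (-1)^n a(n)."""
    sums, s = [], 0.0
    for n in range(k_max + 1):
        s += (-1) ** n * a(n)
        sums.append(s)
    return sums

def a(n):
    # a_n = 1/(n+1) is positive, decreasing, and tends to 0.
    return 1 / (n + 1)

S = math.log(2)                          # sum of 1 - 1/2 + 1/3 - 1/4 + ...
sums = alternating_partial_sums(a, 50)
for k, s_k in enumerate(sums):
    # Theorem 6.22: s_k > S for k even, s_k < S for k odd, |s_k - S| < a_{k+1}.
    assert (s_k > S) if k % 2 == 0 else (s_k < S)
    assert abs(s_k - S) < a(k + 1)

# How sharp is the bound? Compare the actual error with the first omitted term.
print(abs(sums[50] - S) / a(51))
```

For this particular series the printed ratio comes out near one half, in line with the remark that the estimate is usually accurate to within an order of magnitude.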
The alternating series test can be applied to a series $\sum (-1)^n a_n$ for which $\lim a_n = 0$ provided that the $a_n$'s decrease from some point onward. (Of course, the inequalities for the partial sums are only valid from that point onward too.) However, the monotonicity condition cannot be dropped entirely, as the following example shows:
$$1 - \frac12 + \frac12 - \frac14 + \frac13 - \frac16 + \cdots + \frac1m - \frac1{2m} + \cdots.$$
Here $a_n \to 0$ as $n \to \infty$, but not monotonically, and the series diverges. (The sum of the first $2m$ terms is $\frac12\left(1 + \frac12 + \frac13 + \cdots + \frac1m\right)$, a partial sum of the divergent series $\frac12\sum_1^\infty \frac1n$.)
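The divergent example above can be checked with exact rational arithmetic. This Python sketch (the helper names are illustrative) confirms that the first $2m$ terms sum to $\frac12\left(1 + \frac12 + \cdots + \frac1m\right)$ and that the terms, although tending to 0, are not monotone.

```python
from fractions import Fraction

def term(n):
    """n-th term (n = 0, 1, 2, ...) of 1 - 1/2 + 1/2 - 1/4 + 1/3 - 1/6 + ...;
    terms 2m - 2 and 2m - 1 are 1/m and -1/(2m)."""
    m = n // 2 + 1
    return Fraction(1, m) if n % 2 == 0 else -Fraction(1, 2 * m)

def partial_sum(count):
    """Exact sum of the first `count` terms."""
    return sum((term(n) for n in range(count)), Fraction(0))

def half_harmonic(m):
    """(1/2)(1 + 1/2 + ... + 1/m)."""
    return Fraction(1, 2) * sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))

print(partial_sum(20) == half_harmonic(10))    # True
print(abs(term(3)), abs(term(4)))              # 1/4 1/3: not monotone
```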
The tests we have developed can be used to analyze a wide variety of power series, that is, series of the form $\sum_0^\infty c_n(x - a)^n$ where $x$ is a real variable. In typical cases, the ratio test or the root test will establish that there is some number $r$ such that the series converges absolutely for $|x - a| < r$ and diverges for $|x - a| > r$. The convergence at the two remaining points $x = a \pm r$ can then be studied by one of the other tests.
test:
Thus the series converges absolutely for $|x - 3| < 4$ and diverges for $|x - 3| > 4$. (The root test would also yield this result.) The two remaining points are where $x - 3 = \pm 4$, that is, $x = -1$ and $x = 7$. At these two points the series reduces to two numerical series (for $x = -1$ and $x = 7$, respectively). The first of these diverges, while the second one converges by the alternating series test. The convergence is only conditional, by the divergence of the first series. Thus the original series converges absolutely for $-1 < x < 7$, converges conditionally at $x = 7$, and diverges elsewhere.
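6.23 Lemma (Summation by Parts). Given two numerical sequences $\{a_n\}$ and $\{b_n\}$, let
$$a'_n = a_n - a_{n-1}, \qquad B_n = b_0 + \cdots + b_n.$$
Then
$$\sum_0^k a_n b_n = -a'_1 B_0 - a'_2 B_1 - \cdots - a'_k B_{k-1} + a_k B_k = -\sum_1^k a'_n B_{n-1} + a_k B_k.$$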
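Since the identity in Lemma 6.23 is purely algebraic, it can be checked exactly. The sketch below uses Python's `fractions.Fraction` and two arbitrarily chosen sequences; it computes the right side $-\sum_1^k a'_n B_{n-1} + a_k B_k$, with $a'_n = a_n - a_{n-1}$ and $B_n = b_0 + \cdots + b_n$, and compares it with $\sum_0^k a_n b_n$.

```python
from fractions import Fraction

def summation_by_parts_rhs(a, b, k):
    """-sum_{n=1}^{k} a'_n B_{n-1} + a_k B_k, with a'_n = a_n - a_{n-1}
    and B_n = b_0 + ... + b_n (the right side of Lemma 6.23)."""
    B, running = [], 0
    for n in range(k + 1):
        running += b[n]
        B.append(running)
    rhs = a[k] * B[k]
    for n in range(1, k + 1):
        rhs -= (a[n] - a[n - 1]) * B[n - 1]
    return rhs

# Two arbitrary test sequences, in exact rational arithmetic.
a = [Fraction(1, n + 1) for n in range(11)]
b = [Fraction((-1) ** n * (n + 2), 3) for n in range(11)]
lhs = sum(a[n] * b[n] for n in range(11))
print(lhs == summation_by_parts_rhs(a, b, 10))   # True
```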
6.25 Theorem (Dirichlet's Test). Let $\{a_n\}$ and $\{b_n\}$ be numerical sequences. Suppose that the sequence $\{a_n\}$ is decreasing and tends to 0 as $n \to \infty$, and that the sums $B_n = b_0 + \cdots + b_n$ are bounded in absolute value by a constant $C$ independent of $n$. Then the series $\sum_0^\infty a_n b_n$ converges.
$$\sum_1^k |a'_n B_{n-1}| \le C\sum_1^k |a'_n| = C[(a_0 - a_1) + (a_1 - a_2) + \cdots + (a_{k-1} - a_k)] = C(a_0 - a_k) \le Ca_0$$
for all $k$. It follows that the series $\sum_1^\infty a'_n B_{n-1}$ is absolutely convergent and hence convergent. □
Dirichlet's test includes the alternating series test as a special case, by taking $b_n = (-1)^n$, for which $B_n = 1$ or $0$ according as $n$ is even or odd. The other situations in which it is most commonly applied are those with $b_n = \sin n\theta$ or $b_n = \cos n\theta$, where $\theta$ is not an integer multiple of $2\pi$. That the hypotheses on $\{b_n\}$ in Dirichlet's test are satisfied in these cases is shown by the following calculation.
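Proof. Since $\theta$ is not a multiple of $2\pi$, we have $e^{i\theta} \ne 1$, and summing the finite geometric series gives
$$\sum_{n=1}^{N} e^{in\theta} = \frac{e^{i\theta}(e^{iN\theta} - 1)}{e^{i\theta} - 1} = \frac{e^{i(N+\frac12)\theta} - e^{\frac12 i\theta}}{2i\sin\frac12\theta}.$$
The asserted formulas follow by taking the real and imaginary parts of both sides. □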
6.27 Corollary. Suppose that the sequence $\{a_n\}$ decreases to 0 as $n \to \infty$. Then the series $\sum_0^\infty a_n\cos n\theta$ converges for all $\theta$ except perhaps for integer multiples of $2\pi$, and the series $\sum_0^\infty a_n\sin n\theta$ converges for all $\theta$.
Proof. The hypotheses of Dirichlet's test are satisfied for $\theta \ne 2\pi j$, for if $b_n$ is either $\cos n\theta$ or $\sin n\theta$, the lemma implies that $|B_n| \le |\csc\frac12\theta|$ for all $n$. (If $\theta = 2\pi j$, the series $\sum a_n\sin n\theta$ converges trivially since $\sin n\theta = 0$ for all $n$.) □
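A numerical look at Corollary 6.27 for the illustrative choices $\theta = 1$ and $a_n = 1/n$. The sketch checks that the partial sums $B_n$ of $\cos n\theta$ and of $\sin n\theta$ stay within $|\csc\frac12\theta|$, and compares a partial sum of $\sum_1^\infty \sin(n\theta)/n$ with $(\pi - \theta)/2$, the classical value of this series for $0 < \theta < 2\pi$ (a Fourier-series fact that is assumed here, not proved in this section).

```python
import math

def B(n, theta, trig):
    """B_n = b_0 + b_1 + ... + b_n with b_j = trig(j * theta)."""
    return sum(trig(j * theta) for j in range(n + 1))

def sine_series(theta, N):
    """Partial sum sum_{n=1}^{N} sin(n*theta)/n; here a_n = 1/n decreases to 0."""
    return sum(math.sin(n * theta) / n for n in range(1, N + 1))

theta = 1.0                               # not an integer multiple of 2*pi
bound = abs(1 / math.sin(theta / 2))      # |csc(theta/2)|, about 2.09
print(max(abs(B(n, theta, math.cos)) for n in range(200)), bound)
print(sine_series(theta, 10000), (math.pi - theta) / 2)
```

The convergence is slow (the error after $N$ terms is of order $1/N$), which is typical of series that converge only by cancellation.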
EXERCISES
In Exercises 1-9, determine the values of x at which the series converges absolutely
or conditionally.
(x + 2)n
oo
2.
l
3 VX
4.
5> ^ (-l)n(x-4)n
,. (2«-3)log(n
oo 1
6. > —;—
9.
10.
2
oo
12.
nl/n
oo
oo
n + 1
14. ^j-ir e —
15. Use the alternating series test to show that $x^{-1}\sin x = 1 - \frac16 x^2 + \frac1{120}x^4 - \frac1{5040}x^6 + E(x)$ where $0 \le E(x) < 0.027$ for $|x| \le \pi$.
16. (Abel's Test) Suppose $\sum a_n$ is a convergent series and $\{b_n\}$ is a decreasing sequence of positive numbers. ($\lim b_n$ need not be zero.) Show that $\sum a_n b_n$ converges. (This can be done by using Dirichlet's test or by modifying the proof of Dirichlet's test.)
17. Show that if $\sum_1^\infty a_n$ converges, then so does $\sum_1^\infty n^{-p}a_n$ for any $p > 0$. For which $p$ can you guarantee absolute convergence without knowing anything more about the $a_n$'s?
$$\sum_{m,n=0}^\infty a_{mn}, \tag{6.28}$$
that is, a series whose terms are indexed by ordered pairs of nonnegative integers.
The difficulty in making precise sense out of such an expression is that it is not
clear what one should mean by a "partial sum." Two obvious candidates are the "square" partial sums and the "triangular" partial sums
$$s_k^\square = \sum_{m,n=0}^{k} a_{mn}, \qquad s_k^\triangle = \sum_{m+n \le k} a_{mn},$$
which are defined by adding up all the terms $a_{mn}$ for which $(m,n)$ lies in the outlined regions in Figure 6.2. (Note that passing from $s_k^\square$ or $s_k^\triangle$ to $s_{k+1}^\square$ or $s_{k+1}^\triangle$ involves adding not just a single term but a finite set of terms to the sum. It is not necessary to specify the order in which these terms are added, as finite addition is commutative.) Clearly there are many other possibilities. Indeed, there are infinitely many ways to enumerate the set of ordered pairs of nonnegative integers, each of which leads to a different notion of "partial sums."
6.5. Double Series; Products of Series 307
[Figure 6.2: regions in the $(m,n)$-plane summed in the square partial sums $s_k^\square$ (left) and the triangular partial sums $s_k^\triangle$ (right).]
There is yet another possibility: One can consider the double series (6.28) as an iterated series, just as one can regard double integrals as iterated integrals. That is, one could interpret (6.28) as
$$\sum_{m=0}^\infty\left(\sum_{n=0}^\infty a_{mn}\right) \quad\text{or}\quad \sum_{n=0}^\infty\left(\sum_{m=0}^\infty a_{mn}\right),$$
in which one forms the ordinary series $\sigma_m = \sum_{n=0}^\infty a_{mn}$ for each $m$ and then adds up the sums to obtain $\sum_{m=0}^\infty \sigma_m$, or similarly with $m$ and $n$ switched. This is different from the partial-sum procedures discussed above because the intermediate steps involve infinite sums rather than finite ones.
How is one to make sense out of all these ways of interpreting (6.28)? The
answer, in a nutshell, is that the situation is similar to that for improper double
integrals discussed in §4.7: For series of positive terms, or for absolutely convergent series, there is no problem, as all interpretations lead to the same answer.
Otherwise, one must proceed with great caution.
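To see why caution is needed, here is a standard example (not taken from the text): $a_{mn} = 1$ if $n = m$, $a_{mn} = -1$ if $n = m + 1$, and $a_{mn} = 0$ otherwise. Every row sums to 0, but the column sums are 1 for $n = 0$ and 0 for $n \ge 1$, so the two iterated series give 0 and 1, while $\sum |a_{mn}| = \infty$. The Python sketch below computes truncations in which every inner sum is already exact.

```python
def a(m, n):
    """Illustrative double series: 1 on the diagonal, -1 just above it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def rows_first(K):
    """sum_{m<K} (sum_n a_mn); each inner sum is already complete at n = m + 1 < K + 2."""
    return sum(sum(a(m, n) for n in range(K + 2)) for m in range(K))

def columns_first(K):
    """sum_{n<K} (sum_m a_mn); each inner sum is already complete at m = n < K + 2."""
    return sum(sum(a(m, n) for m in range(K + 2)) for n in range(K))

def abs_square_sum(k):
    """Square partial sum of sum |a_mn| over 0 <= m, n <= k; it equals 2k + 1."""
    return sum(abs(a(m, n)) for m in range(k + 1) for n in range(k + 1))

print(rows_first(100), columns_first(100), abs_square_sum(100))   # 0 1 201
```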
Let us explain this in more detail. Given any one-to-one correspondence $j \leftrightarrow (m,n)$ between the set of nonnegative integers and the set of ordered pairs of nonnegative integers, we can set $b_j = a_{mn}$ and form the ordinary infinite series $\sum_0^\infty b_j$; we call such a series an ordering of the double series $\sum_{m,n=0}^\infty a_{mn}$. The essential point is that the orderings of $\sum a_{mn}$ are all rearrangements of one another, and we can apply Theorem 6.19.
First, if $a_{mn} \ge 0$, then either all orderings of $\sum a_{mn}$ diverge or all orderings converge, and in the latter case their sums are all equal. Thus, the sum of the series $\sum a_{mn}$ is well defined as a nonnegative number or $+\infty$, independent of the choice of ordering.
Second, without the assumption of positivity, if $\sum |b_j|$ is convergent for one ordering of $\sum a_{mn}$, then the same is true for every ordering. In this case the series
$\sum a_{mn}$ is called absolutely convergent, and by Theorem 6.19 again, all orderings of $\sum a_{mn}$ have the same sum, which we call the sum of the double series $\sum a_{mn}$. Moreover, an argument similar to the proof of Theorem 6.19 shows that the double series $\sum a_{mn}$ is absolutely convergent if and only if the iterated series $\sum_m\left(\sum_n |a_{mn}|\right)$ is convergent, in which case $\sum_{m,n} a_{mn} = \sum_m\left(\sum_n a_{mn}\right)$. (See Exercises 5 and 6.)
Given a double series $\sum a_{mn}$, we can therefore proceed as follows. First we evaluate the series $\sum |a_{mn}|$ by ordering it in some fashion or treating it as an iterated series; if it turns out to be finite, we can then evaluate $\sum a_{mn}$ by ordering it in any fashion or treating it as an iterated series.
What if $\sum a_{mn}$ is not absolutely convergent? Let us separate out the positive and negative terms as we did in Theorem 6.18. The argument in the proof of Theorem 6.18 shows that if $\sum a_{mn}^+ = \infty$ but $\sum a_{mn}^- < \infty$, then all orderings of $\sum a_{mn}$ diverge to $+\infty$; likewise, if $\sum a_{mn}^+ < \infty$ but $\sum a_{mn}^- = \infty$, then all orderings of $\sum a_{mn}$ diverge to $-\infty$. On the other hand, if $\sum a_{mn}^+ = \sum a_{mn}^- = \infty$ but $a_{mn} \to 0$ as $m, n \to \infty$, the proof of Theorem 6.20 shows that various orderings of $\sum a_{mn}$ can converge to any real number. In this case, therefore, we simply cannot make numerical sense out of the expression $\sum a_{mn}$ without specifying more precisely how the summation is to be performed.
An important situation in which double series occur is in multiplying two series
together. The basic result is as follows.
6.29 Theorem. Suppose that $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$ are both absolutely convergent, with sums $A$ and $B$. Then the double series $\sum_{m,n=0}^\infty a_m b_n$ is absolutely convergent, and its sum is $AB$.
Proof. We consider the square partial sums of $\sum a_m b_n$, which are just the products of the partial sums of $\sum a_m$ and $\sum b_n$:
$$\sum_{m,n=0}^{k} a_m b_n = \left(\sum_{m=0}^{k} a_m\right)\left(\sum_{n=0}^{k} b_n\right). \tag{6.30}$$
If we replace $a_m$ and $b_n$ by $|a_m|$ and $|b_n|$ in (6.30), the right side is bounded by the finite quantity $\left(\sum_0^\infty |a_m|\right)\left(\sum_0^\infty |b_n|\right)$, which shows that the double series $\sum a_m b_n$ is absolutely convergent. Then, letting $k \to \infty$ in (6.30), we obtain $\sum a_m b_n = AB$. □
Under the conditions of Theorem 6.29, we are free to use any ordering of $\sum a_m b_n$ that we choose, and in particular, we can use the triangular partial sums rather than the square ones. This is the natural thing to do when considering power
series. Indeed, if $\sum a_n x^n$ and $\sum b_n x^n$ are absolutely convergent for a particular value of $x$, their product is $\sum a_m b_n x^{m+n}$, which can also be expressed as a power series if we group together all the terms involving a given power of $x$. The terms involving $x^j$ are those with $m + n = j$, i.e., those with $m = 0, 1, \ldots, j$ and $n = j - m$. Collecting these terms together yields
$$\left(\sum_0^\infty a_m x^m\right)\left(\sum_0^\infty b_n x^n\right) = \sum_{j=0}^\infty\left(\sum_{m=0}^{j} a_m b_{j-m}\right)x^j.$$
The expression on the right is a power series whose $j$th coefficient is a finite sum of products of the original coefficients; its partial sums are precisely the triangular partial sums of the double series $\sum a_m b_n x^{m+n}$.
The same procedure can also be used for series without an $x$ (by taking $x = 1$, if you like). That is, given two convergent series $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$, we can form the series
$$\sum_{j=0}^\infty\left(\sum_{m=0}^{j} a_m b_{j-m}\right),$$
whose partial sums are the triangular partial sums of the double series $\sum a_m b_n$; it is called the Cauchy product of $\sum a_m$ and $\sum b_n$. As we have seen, if $\sum a_m$ and $\sum b_n$ are absolutely convergent, their Cauchy product is too, and its sum is $\left(\sum a_m\right)\left(\sum b_n\right)$. In fact, the Cauchy product converges to $\left(\sum a_m\right)\left(\sum b_n\right)$ provided that at least one of $\sum a_m$ and $\sum b_n$ is absolutely convergent (see Krantz [12, pp. 109-10], or Rudin [18, p. 74]). However, if $\sum a_m$ and $\sum b_n$ are both conditionally convergent, their Cauchy product may diverge. (See Exercise 4.)
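A short Python sketch of the Cauchy product $c_j = \sum_{m=0}^{j} a_m b_{j-m}$. For two absolutely convergent geometric series (an illustrative choice) the sum of the $c_j$ approaches the product of the sums, $2 \cdot \frac32 = 3$; for the conditionally convergent series $\sum (-1)^n (n+1)^{-1/2}$ multiplied by itself, the terms $c_j$ do not tend to 0, so the Cauchy product diverges.

```python
from fractions import Fraction

def cauchy_product(a, b):
    """Terms c_j = sum_{m=0}^{j} a_m b_{j-m}, for j < min(len(a), len(b))."""
    n = min(len(a), len(b))
    return [sum(a[m] * b[j - m] for m in range(j + 1)) for j in range(n)]

# Two absolutely convergent geometric series: sum 2^-m = 2 and sum 3^-n = 3/2.
a = [Fraction(1, 2 ** m) for m in range(60)]
b = [Fraction(1, 3 ** n) for n in range(60)]
c = cauchy_product(a, b)
print(float(sum(c)))               # should be very close to 2 * 3/2 = 3

# Two conditionally convergent series: a_n = b_n = (-1)^n / sqrt(n + 1).
alt = [(-1) ** n / (n + 1) ** 0.5 for n in range(200)]
c_alt = cauchy_product(alt, alt)
print(min(abs(t) for t in c_alt))  # the terms stay away from 0
```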
EXERCISES
1. By multiplying the geometric series by itself, show that for $|x| < 1$,
a. $(1-x)^{-2} = \sum_0^\infty (n+1)x^n$;
b. $(1-x)^{-3} = \frac12\sum_0^\infty (n+1)(n+2)x^n$.
2. Let $f(x) = \sum_0^\infty x^n/n!$. Show directly from this formula that $f(x)f(y) = f(x+y)$.
3. Verify that the Taylor series of $(1-4x)^{-1/2}$ about $x = 0$ is $\sum_0^\infty (2n)!\,x^n/(n!)^2$ and that this series converges absolutely for $|x| < \frac14$. Then, taking for granted that the sum of this series actually is $(1-4x)^{-1/2}$ (which we shall prove in
§7.3), multiply the series by itself and conclude that for any positive integer $j$,
$$\sum_{m=0}^{j}\frac{(2m)!\,(2j-2m)!}{(m!)^2\,((j-m)!)^2} = 4^j.$$
4. Show that the series $\sum_0^\infty (-1)^n(n+1)^{-1/2}$ is conditionally convergent and that the Cauchy product of this series with itself diverges. (Hint: The maximum of the function $f(x) = (x+1)(j-x+1)$ occurs at $x = \frac12 j$, and hence $(m+1)(j-m+1) \le \left(\frac12 j + 1\right)^2$ for $0 \le m \le j$.)
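5. Show that $\sum_{m,n=0}^\infty a_{mn} = \sum_{m=0}^\infty\left(\sum_{n=0}^\infty a_{mn}\right)$ whenever $a_{mn} \ge 0$ for all $m, n \ge 0$.
6. Suppose $\sum_{m,n=0}^\infty a_{mn}$ is absolutely convergent. Show that the iterated series $\sum_{m=0}^\infty\left(\sum_{n=0}^\infty a_{mn}\right)$ converges to the sum $\sum_{m,n=0}^\infty a_{mn}$. (Use Exercise 5.)
7. Show that $\sum_{m,n=1}^\infty (m+n)^{-p}$ converges if and only if $p > 2$. (Hint: Use triangular partial sums.)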
In this chapter we study the convergence of sequences and series whose terms are
functions of a variable x and improper integrals whose integrand contains x as a
free variable. In all these situations, the study of the resulting function of x may
reveal unpleasant surprises unless we have some control over the way the rate of
convergence varies along with x; the most commonly encountered form of such
control, uniform convergence, is a major theme of this chapter.
We recall that a sequence $\{f_k\}_0^\infty$ of functions is a map that assigns to each nonnegative integer $k$ a function $f_k$. It is implicitly assumed that the functions $f_k$ are all defined on some common domain $S$ (usually a subset of $\mathbb{R}$ or $\mathbb{R}^n$) and all take values in the same space ($\mathbb{R}$, $\mathbb{C}$, or $\mathbb{R}^m$).
What does it mean for a sequence of functions $\{f_k\}$ defined on a set $S \subset \mathbb{R}^n$ to converge to a function $f$ on $S$? The most obvious interpretation is that
$$f_k(x) \to f(x) \text{ as } k \to \infty \text{ for every } x \in S. \tag{7.1}$$
This is, indeed, what is usually meant by the statement "$f_k \to f$ on $S$" when no further qualification is added; when we wish to be very clear about it, we shall say that $f_k \to f$ pointwise on $S$ when (7.1) holds.
Unfortunately, pointwise convergence is a rather badly behaved operation in
the sense that it does not interact well with other limiting operations, such as differentiation and integration. Consider the following group of examples:
312 Chapter 7. Functions Defined by Series and Integrals
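Example 1. Let
$$f_k(x) = \frac1k\arctan kx, \qquad g_k(x) = f_k'(x) = \frac{1}{k^2x^2 + 1}, \qquad h_k(x) = g_k'(x) = \frac{-2k^2x}{(k^2x^2+1)^2}. \tag{7.2}$$
i. $f_k(x) \to 0$ for all $x$, since $|\arctan kx| < \frac12\pi$.
ii. $\lim_{k\to\infty} g_k(x) = g(x) = \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise.}\end{cases}$
iii. $h_k(x) \to 0$ for all $x$. ($h_k(0) = 0$ for all $k$, and if $x \ne 0$, $h_k(x) \approx -2/(k^2x^3)$ for large $k$.)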
Therefore, $g$ is discontinuous even though the $g_k$'s are all continuous; moreover, since $g_k$ is the derivative of $f_k$ and an antiderivative of $h_k$,
Clearly, if we want some theorems to the effect that "the integral of the limit is
the limit of the integrals," or "the derivative of a limit is the limit of the derivatives,"
pointwise convergence is the wrong condition to impose. We now develop a more
stringent notion of convergence that removes some of the pathologies.
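The failures in Example 1 can be observed numerically. The sketch below evaluates $f_k(x) = k^{-1}\arctan kx$, $g_k = f_k'$ and $h_k = g_k'$, and approximates $\int_0^1 h_k(x)\,dx$ with a composite Simpson rule (our choice of quadrature, with an assumed step count). The integral stays close to $g_k(1) - g_k(0) = 1/(k^2+1) - 1 \to -1$ even though $h_k(x) \to 0$ for every $x$; likewise $f_k \to 0$ while $f_k'(0) = g_k(0) = 1$ for every $k$.

```python
import math

def f(k, x):
    return math.atan(k * x) / k                       # f_k(x) = k^{-1} arctan(kx)

def g(k, x):
    return 1 / (k * k * x * x + 1)                    # g_k = f_k'

def h(k, x):
    return -2 * k * k * x / (k * k * x * x + 1) ** 2  # h_k = g_k'

def simpson(func, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3

for k in (1, 10, 100):
    integral = simpson(lambda x: h(k, x), 0.0, 1.0)
    # h_k(1) -> 0 and f_k(1) -> 0, but the integral tends to -1 and f_k'(0) = 1.
    print(k, h(k, 1.0), f(k, 1.0), integral, g(k, 0.0))
```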
The real trouble with pointwise convergence is as follows. The statement "$f_k(x) \to f(x)$ for all $x \in S$" means that, for each $x$, $f_k(x)$ will be close to $f(x)$ provided $k$ is sufficiently large, but the rate of convergence of $f_k(x)$ to $f(x)$ can be very different for different values of $x$. For example, if $g_k$ is as in (7.2), for all $x \ne 0$ we have $g_k(x) \to 0$, so $|g_k(x)| < 10^{-4}$ (say) provided $k$ is sufficiently
large; for $x = 10$, "sufficiently large" means $k \ge 10$, but for $x = 0.1$, it means $k \ge 1000$. If, however, we have some control over the rate of convergence that is independent of the particular point $x$, then many of the pathologies disappear.
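The rate comparison above can be confirmed with exact arithmetic. The sketch finds the smallest $k$ with $g_k(x) = (k^2x^2+1)^{-1} < 10^{-4}$, using `fractions.Fraction` so that rounding cannot affect the borderline cases.

```python
from fractions import Fraction

def first_k(x, tol=Fraction(1, 10**4)):
    """Smallest k >= 1 with g_k(x) = 1/(k^2 x^2 + 1) < tol (x != 0).
    Since g_k(x) decreases in k, every larger k works as well."""
    k = 1
    while Fraction(1) / (k * k * x * x + 1) >= tol:
        k += 1
    return k

print(first_k(Fraction(10)), first_k(Fraction(1, 10)))   # 10 1000
```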
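The precise definition is as follows. A sequence $\{f_k\}$ of functions defined on a set $S \subset \mathbb{R}^n$ is said to converge uniformly on $S$ to the function $f$ if for every $\epsilon > 0$ there is an integer $K$ such that
$$|f_k(x) - f(x)| < \epsilon \quad\text{for all } k > K \text{ and all } x \in S. \tag{7.3}$$
The point here is that the same $K$ will work for every $x \in S$. Another way of writing (7.3) is
$$\sup_{x\in S}|f_k(x) - f(x)| \le \epsilon \quad\text{for all } k > K.$$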
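The geometry of this inequality is indicated in Figure 7.2. Yet another way of expressing uniform convergence is the following, which is sufficiently useful to be displayed as a theorem.
7.5 Theorem. $f_k \to f$ uniformly on $S$ if and only if there is a sequence $\{C_k\}$ of constants such that $|f_k(x) - f(x)| \le C_k$ for all $x \in S$ and $C_k \to 0$ as $k \to \infty$.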
Let us take another look at the examples in (7.2) with regard to uniform convergence. First, the sequence $\{f_k\}$ defined by $f_k(x) = k^{-1}\arctan kx$ converges uniformly to 0 on $\mathbb{R}$, since we can take $C_k = \pi/(2k)$ in Theorem 7.5. Second, the sequence $\{g_k\}$ defined by $g_k(x) = (k^2x^2+1)^{-1}$ does not converge uniformly to its limit $g$ on $\mathbb{R}$; indeed,
$$\sup_{x\in\mathbb{R}}|g_k(x) - g(x)| = \sup_{x \ne 0}\frac{1}{k^2x^2+1} = 1 \quad\text{for every } k.$$
(Notice that the supremum is not actually achieved; the maximum of $(k^2x^2+1)^{-1}$ occurs at $x = 0$, but $g(0) = 1$, so $g_k(0) - g(0) = 0$. See Figure 7.2.) Finally, the sequence $\{h_k\}$ defined by $h_k(x) = -2k^2x(k^2x^2+1)^{-2}$ does not converge uniformly to its limit 0 on $\mathbb{R}$. Indeed, a bit of calculus shows that the minimum and maximum values of $h_k(x)$, achieved at $x = \pm 1/(\sqrt3\,k)$, are $\mp 3\sqrt3\,k/8$, so $\sup_x |h_k(x) - 0|$ actually tends to $\infty$ rather than 0.
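Setting $h_k'(x) = 0$ for $h_k(x) = -2k^2x(k^2x^2+1)^{-2}$ gives $1 - 3k^2x^2 = 0$, so the extreme values occur at $x = \pm1/(\sqrt3\,k)$ and equal $\mp 3\sqrt3\,k/8 \approx \mp 0.65k$. A brute-force grid search in Python (the grid size is an arbitrary choice) agrees with this and shows the linear growth in $k$.

```python
import math

def h(k, x):
    return -2 * k * k * x / (k * k * x * x + 1) ** 2

def grid_minimum(k, points=20000):
    """Brute-force minimum of h_k on an evenly spaced grid in (0, 2/k]; returns (x, h_k(x))."""
    xs = [2 * i / (points * k) for i in range(1, points + 1)]
    x_best = min(xs, key=lambda x: h(k, x))
    return x_best, h(k, x_best)

for k in (1, 5, 50):
    x_best, value = grid_minimum(k)
    # Compare with x = 1/(sqrt(3) k) and the value -3*sqrt(3)*k/8.
    print(k, x_best * k, 1 / math.sqrt(3), value, -3 * math.sqrt(3) * k / 8)
```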
7.1. Sequences and Series of Functions 315
On the other hand, the bad behavior in these examples is all at $x = 0$. The sequences $\{g_k\}$ and $\{h_k\}$ do converge uniformly to 0 on the intervals $[\delta, \infty)$ and $(-\infty, -\delta]$ for any $\delta > 0$. For $g_k$ this is clear:
$$|g_k(x) - 0| \le \frac{1}{\delta^2k^2 + 1} \qquad (x \le -\delta \text{ or } x \ge \delta),$$
and $(\delta^2k^2+1)^{-1} \to 0$ as $k \to \infty$. For $h_k$ we do not get a good estimate for the first few values of $k$, but (by the same bit of calculus as in the preceding paragraph) when $k \ge 1/(\sqrt3\,\delta)$ the function $h_k$ is positive and increasing on $(-\infty, -\delta]$ and negative and increasing on $[\delta, \infty)$, so the maximum of $|h_k|$ on these intervals occurs at the endpoints $\pm\delta$:
$$|h_k(x) - 0| \le \frac{2\delta k^2}{(\delta^2k^2+1)^2} \qquad \left(x \le -\delta \text{ or } x \ge \delta,\ k \ge 1/(\sqrt3\,\delta)\right).$$
The phenomenon exhibited here is quite common. That is, one has a sequence $\{f_k\}$ of functions that converge pointwise to $f$ on a set $S$; the convergence is not uniform on all of $S$ but is uniform on many "slightly smaller" subsets of $S$. The situation we shall encounter most often is where $S$ is an open interval $(a,b)$, and the "bad behavior" occurs near the endpoints, so that the convergence is uniform on $[a + \delta, b - \delta]$ for any $\delta > 0$. In this case, the sequence of constants $C_k$ in Theorem 7.5 will generally depend on $\delta$, as they do in the preceding examples.
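The notion of Cauchy sequence has an obvious adaptation to the context of uniform convergence. Namely, a sequence $\{f_k\}$ of functions on a set $S$ is uniformly Cauchy if for every $\epsilon > 0$ there is an integer $K$ so that
$$|f_j(x) - f_k(x)| < \epsilon \quad\text{for all } j, k > K \text{ and all } x \in S, \tag{7.6}$$
or in other words,
$$\sup_{x\in S}|f_j(x) - f_k(x)| \le \epsilon \quad\text{for all } j, k > K.$$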
7.7 Theorem. The sequence $\{f_k\}$ is uniformly Cauchy on $S$ if and only if there is a function $f$ on $S$ such that $f_k \to f$ uniformly on $S$.
Proof. If $\{f_k\}$ is uniformly Cauchy, then for each $x \in S$ the numerical sequence $\{f_k(x)\}$ is Cauchy. By Theorem 1.20, it has a limit, which we call $f(x)$. Letting $j \to \infty$ in (7.6), we see that $|f_k(x) - f(x)| \le \epsilon$ whenever $k > K$ and $x \in S$, so that $f_k \to f$ uniformly on $S$. Conversely, if $f_k \to f$ uniformly on $S$, we have $|f_k(x) - f(x)| \le C_k$ for all $x \in S$, where $C_k \to 0$ as $k \to \infty$, and
$$< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,$$
which shows that $f$ is continuous at $a$. □
$$s_k(x) = \sum_0^k x^n = \frac{1 - x^{k+1}}{1 - x}, \quad\text{so}\quad s_k(x) - \frac{1}{1-x} = -\frac{x^{k+1}}{1-x}.$$
The following is the most commonly used test for uniform convergence of series:
But $C_k \to 0$ as $k \to \infty$ since the series $\sum M_n$ is convergent, so it follows from Theorem 7.5 that the sequence $\{s_k\}$, i.e., the series $\sum f_n$, is uniformly convergent on $S$. □
Example 3. The M-test gives an easy verification that the geometric series $\sum_0^\infty x^n$ converges uniformly on $[-r, r]$ for any $r < 1$, by taking $M_n = r^n$. ($|x^n| \le r^n$ for $|x| \le r$, and $\sum r^n < \infty$.)
The remarks following Theorem 7.8, to the effect that local uniform convergence is enough to yield continuity, apply to this situation also.
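For the geometric series the M-test bound is sharp. With $M_n = r^n$ the tail bound is $\sum_{n>k} r^n = r^{k+1}/(1-r)$, and since $s_k(x) - 1/(1-x) = -x^{k+1}/(1-x)$, the actual error attains this value at $x = r$. The Python sketch compares the largest error over a grid on $[-r, r]$ (grid size chosen arbitrarily) with the bound, for the sample value $r = 0.9$.

```python
def geometric_uniform_error(r, k, grid=2001):
    """Largest |s_k(x) - 1/(1-x)| over a grid on [-r, r], where s_k(x) = 1 + x + ... + x^k."""
    worst = 0.0
    for i in range(grid):
        x = -r + 2 * r * i / (grid - 1)
        s_k = sum(x ** n for n in range(k + 1))
        worst = max(worst, abs(s_k - 1 / (1 - x)))
    return worst

def m_test_bound(r, k):
    """Tail sum_{n>k} M_n of the M-test majorants M_n = r^n, i.e. r^(k+1)/(1-r)."""
    return r ** (k + 1) / (1 - r)

for k in (5, 20, 80):
    print(k, geometric_uniform_error(0.9, k), m_test_bound(0.9, k))
```

The bound depends on $r$ and blows up as $r \to 1$, which is why the convergence is uniform on each $[-r, r]$ but not on $(-1, 1)$.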
EXERCISES
a.
0
oo
n2 + n
oo -,
c.
1
oo
, v^ cos nx
d. > =—.
Z-, n3
oo ,
e.
oo
f. £„-
l
that does not contain a nonzero integer, and conclude that the sum of the series is a continuous function on $\mathbb{R} \setminus \{\pm1, \pm2, \ldots\}$.
5. Show that the series $\sum_1^\infty \dfrac{(-1)^{n-1}}{x^2 + n}$ converges uniformly on $\mathbb{R}$, although the convergence is conditional at every point.
6. Given a sequence $\{c_n\}$ of real numbers such that $\sum_1^\infty c_n$ converges, consider
oo
E,Jtl t
8. Let $\{f_k\}$ be a sequence of continuous functions on $[a,b]$. Show that if $\{f_k\}$ converges uniformly on $(a,b)$, then it converges uniformly on $[a,b]$.
If $\{f_k\}$ is a sequence of functions on the interval $[a,b]$ and $f_k \to f$ on $[a,b]$, is it true that $\int_a^b f_k(x)\,dx \to \int_a^b f(x)\,dx$? The sequence $\{h_k\}$ in (7.2) shows that the answer is sometimes no. The best general affirmative result in the context of Riemann integration is the bounded convergence theorem that we stated in §4.5. As we indicated there, the proof of that theorem is beyond the scope of this book; however, uniform convergence yields an affirmative result with an easy proof. It works equally well for $n$-dimensional integrals, so we present it in that generality.
This last quantity is the n-dimensional volume of S times Ck, which tends to zero
as k —> oo. □
1.2. Integrals and Derivatives of Sequences and Series 321
Example 1. Let $g_k(x) = k^{-1}\sin kx$. Then $|g_k(x)| \le k^{-1}$ for all $x$, so $g_k \to 0$ uniformly on $\mathbb{R}$. On the other hand, $g_k'(x) = \cos kx$; the sequence $\{\cos kx\}$ does not converge at all for most values of $x$, and when it does (namely, when $x$ is an even multiple of $\pi$) its limit is 1, not 0.
Thus $f(x) = f(a) + \int_a^x g(t)\,dt$. But by the fundamental theorem of calculus, the function on the right is differentiable and its derivative is $g$. □
The example $\{f_k\}$ in (7.2) shows that pointwise convergence of $\{f_k'\}$ is not sufficient to obtain $\lim(f_k') = (\lim f_k)'$. On the other hand, Theorem 7.12 can be extended somewhat. Since differentiability (like continuity) is a local property, it is enough for the convergence of $\{f_k'\}$ to be uniform on a neighborhood of each point, rather than on the whole interval in question. In many situations, the sequence $\{f_k\}$ is defined on an open interval $(a,b)$ and one has uniform convergence of $\{f_k'\}$ on each compact subinterval $[a + \delta, b - \delta]$; this suffices to guarantee that
I El [
b. If the $f_n$'s are of class $C^1$ and the series $\sum f_n'$ converges uniformly on $[a,b]$, then the sum $\sum f_n$ is of class $C^1$ on $[a,b]$ and $\left(\sum f_n\right)' = \sum f_n'$.
EXERCISES
1. Let $f(x) = \sum_1^\infty n^{-2}\sin nx$. Show that $f$ is a continuous function on $\mathbb{R}$ and that $\int_0^{\pi/2} f(x)\,dx = \sum_{n=1,3,5,\ldots} n^{-3} + 2\sum_{n=2,6,10,\ldots} n^{-3}$.
2. Let $f(x) = \sum_1^\infty (x+n)^{-2}$. Show that $f$ is a continuous function on $[0, \infty)$ and that $\int_0^1 f(x)\,dx = 1$.
3. Let $f_k(x) = x\arctan kx$.
a. Show that $\lim_{k\to\infty} f_k(x) = \frac12\pi|x|$.
b. Show that $\lim_{k\to\infty} f_k'(x)$ exists for every $x$, including $x = 0$, but that the convergence is not uniform in any interval containing 0.
4. For each of the series (a-f) in Exercise 2, §7.1, show that the series can be differentiated term-by-term on its interval of convergence (except at the endpoints in (b)).
5. For $x \ne \pm1, \pm2, \ldots$, let $f(x) = 2x\sum_1^\infty (x^2 - n^2)^{-1}$ (see Exercise 4, §7.1). Show that $f$ is of class $C^1$ on its domain and that $f'(x) = -\sum_1^\infty [(x-n)^{-2} + (x+n)^{-2}]$.
6. Let $f$ be a continuous function on $[0, \infty)$ such that $0 \le f(x) \le Cx^{-1-\epsilon}$ for some $C, \epsilon > 0$, and let $a = \int_0^\infty f(x)\,dx$. (The estimate on $f$ implies the convergence of this integral.) Let $f_k(x) = kf(kx)$.
a. Show that $\lim_{k\to\infty} f_k(x) = 0$ for all $x > 0$ and that the convergence is uniform on $[\delta, \infty)$ for any $\delta > 0$.
b. Show that $\lim_{k\to\infty}\int_0^1 f_k(x)\,dx = a$.
c. Show that $\lim_{k\to\infty}\int_0^1 f_k(x)g(x)\,dx = ag(0)$ for any integrable function $g$ on $[0,1]$ that is continuous at 0. (Hint: Write $\int_0^1 = \int_0^\delta + \int_\delta^1$.)
7.3. Power Series 323
7.15 Lemma. If the power series $\sum_0^\infty a_n x^n$ converges for $x = x_0$, then it converges absolutely for all $x$ such that $|x| < |x_0|$.
Proof. The convergence of $\sum a_n x_0^n$ implies that $a_n x_0^n \to 0$, and in particular that $|a_n x_0^n| \le C$ for some constant $C$ independent of $n$. Since
$$|a_n x^n| = |a_n x_0^n|\left|\frac{x}{x_0}\right|^n \le C\left|\frac{x}{x_0}\right|^n,$$
for $|x| < |x_0|$ the series $\sum a_n x^n$ converges absolutely by comparison with the geometric series $\sum |x/x_0|^n$. □
7.16 Theorem. For any power series $\sum_0^\infty a_n x^n$, there is a number $R \in [0, \infty]$, called the radius of convergence of the series, such that the series converges absolutely for $|x| < R$ and diverges for $|x| > R$. (When $R = 0$, this means that the series converges only for $x = 0$; when $R = \infty$, it means that the series converges absolutely for all $x$.)
Proof. Let $R = \sup\{|x_0| : \sum a_n x_0^n \text{ converges}\}$. ($R \ge 0$ since the series always converges at $x_0 = 0$.) Thus $\sum a_n x^n$ diverges if $|x| > R$. On the other hand, if $|x| < R$, there exists $x_0$ such that $|x_0| > |x|$ and $\sum a_n x_0^n$ converges, and then $\sum a_n x^n$ converges absolutely by Lemma 7.15. □
Important Remark. The reader has probably been thinking of an and x as real
numbers, but Theorem 7.16 is valid, with exactly the same proof, when an and x
are complex numbers.
Theorem 7.16 says that the set of all real $x$ such that $\sum a_n x^n$ converges is an open interval centered at 0, possibly together with one or both endpoints, and the set of all complex $x$ such that $\sum a_n x^n$ converges is an open disc centered at 0 in the complex plane, possibly together with some or all of its boundary points. The behavior of the series on the boundary of the region of convergence must be decided on a case-by-case basis.
Consider the power series
$$\sum_1^\infty \frac{x^n}{n^2}, \qquad \sum_0^\infty x^n, \qquad \sum_1^\infty \frac{x^n}{n}.$$
An easy application of the ratio test shows that each of these series converges absolutely for $|x| < 1$ and diverges for $|x| > 1$, so their radius of convergence is 1. The first one is absolutely convergent when $|x| = 1$ by comparison with $\sum n^{-2}$, whereas the second is divergent when $|x| = 1$ because $x^n \not\to 0$ as $n \to \infty$ in that case. The third one is divergent when $x = 1$ but is conditionally convergent at $x = -1$ by the alternating series test. It is also conditionally convergent at all other complex numbers $x$ such that $|x| = 1$, by Dirichlet's test. (Indeed, take $a_n = n^{-1}$ and $b_n = x^n$. Then $b_1 + \cdots + b_n$ is a finite geometric series whose sum equals $x(1 - x^n)/(1 - x)$, and this is bounded by $2/|1 - x|$ for all $n$.)
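The Dirichlet argument for $\sum_1^\infty x^n/n$ on $|x| = 1$ can be watched numerically at the sample point $x = i$. The sketch checks that the partial sums of $x^n$ stay within $2/|1 - x|$, and compares a partial sum of the series with $-\log(1 - x)$ (principal branch, via `cmath.log`), the known value of the sum for $|x| \le 1$, $x \ne 1$; that identification is assumed here rather than proved in this section.

```python
import cmath

def power_partial_sum(x, N):
    """sum_{n=1}^{N} x^n / n for a complex number x (powers built by repeated multiplication)."""
    total, power = 0, 1
    for n in range(1, N + 1):
        power *= x
        total += power / n
    return total

x = 1j                                  # |x| = 1 and x != 1
bound = 2 / abs(1 - x)                  # bound on |b_1 + ... + b_n| with b_n = x^n
partial, power, max_partial = 0, 1, 0.0
for n in range(1, 1000):
    power *= x
    partial += power
    max_partial = max(max_partial, abs(partial))
print(max_partial, bound)
print(power_partial_sum(x, 20000), -cmath.log(1 - x))
```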
The standard tools for determining the radius of convergence of a power series
are the ratio test and the root test. We have already seen how this works in §6.4
(especially Example 2 and Exercises 1-9), so we shall not belabor the point here.
However, see Exercise 1. In fact, a slight extension of the root test yields a formula
for the radius of convergence of an arbitrary power series; see Exercise 4.
Theorem 7.16 shows that any power series converges absolutely within the region $|x| < R$. Equally important is that it converges uniformly on compact subsets of this region.
7.17 Theorem. Let $R$ be the radius of convergence of $\sum_0^\infty a_n x^n$. For any $r < R$, the series $\sum_0^\infty a_n x^n$ converges uniformly on the set $\{x : |x| \le r\}$, and its sum is a continuous function on the set $\{x : |x| < R\}$.
Proof. For $|x| \le r$ we have $|a_n x^n| \le |a_n| r^n$, and the series $\sum |a_n| r^n$ is convergent since $\sum a_n x^n$ is absolutely convergent at $x = r$. The first assertion therefore follows from the Weierstrass M-test, and the second follows from the first by Theorem 7.8. □
7.18 Theorem. Suppose the series $f(x) = \sum_0^\infty a_n x^n$ has radius of convergence $R > 0$.
a. If $-R < a < b < R$, then $\displaystyle\int_a^b f(x)\,dx = \sum_0^\infty a_n\frac{b^{n+1} - a^{n+1}}{n+1}$.
b. If $F$ is any antiderivative of $f$, then $F(x) = F(0) + \sum_0^\infty \frac{a_n}{n+1}x^{n+1}$ for $|x| < R$.
Proof. Assertion (a) follows immediately from Theorems 7.13a and 7.17. The fundamental theorem of calculus then shows that $\sum_0^\infty a_n x^{n+1}/(n+1)$ is an antiderivative of $f$ on $(-R, R)$ (specifically, the one whose value at $x = 0$ is zero), and any other antiderivative differs from this one by a constant. □
Theorem 7.18 gives a way of generating new series expansions from old ones.
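For example, integrating the geometric series for $1/(1+t)$ and $1/(1+t^2)$ term by term, we obtain
$$\log(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{n=0}^\infty \frac{(-1)^n x^{n+1}}{n+1}, \qquad \arctan x = \int_0^x \frac{dt}{1+t^2} = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{2n+1} \qquad (|x| < 1).$$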
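A short numerical check of these two expansions makes the role of the condition $|x| < 1$ concrete. The Python sketch below is only an illustration (the helper names `log1p_series` and `arctan_series` are ours): it compares partial sums with the standard library's `math.log1p` and `math.atan`.

```python
import math

def log1p_series(x, terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(n+1)/(n+1); converges for |x| < 1."""
    return sum((-1) ** n * x ** (n + 1) / (n + 1) for n in range(terms))

def arctan_series(x, terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1)/(2n+1); converges for |x| < 1."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

for x in (0.1, 0.5, 0.9):
    # the errors shrink quickly for small |x| and more slowly as |x| approaches 1
    print(x, log1p_series(x, 200) - math.log1p(x), arctan_series(x, 200) - math.atan(x))
```

Near $|x| = 1$ many more terms are needed, which is why these series are better suited to small $|x|$.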
The series for $\log(1+x)$ is easily obtained from Taylor's theorem (see Exercise 3 in §6.1), but not the series for $\arctan x$; the computation of the high-order derivatives of the latter function is very cumbersome. (Remark: The expansion of $\log(1+x)$ is also valid at $x = 1$, and that of $\arctan x$ is also valid at $x = \pm 1$. However, these facts do not follow from Theorem 7.18. The extra result needed here is Abel's theorem, which we shall present below.)
Theorem 7.18 also offers a technique for expressing definite or indefinite integrals of functions that have no elementary antiderivatives in a computable form.
326 Chapter 7. Functions Defined by Series and Integrals
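For example,
$$\int_0^x \frac{\sin t}{t}\,dt = \int_0^x \sum_{m=0}^\infty \frac{(-1)^m t^{2m}}{(2m+1)!}\,dt = \sum_{m=0}^\infty \frac{(-1)^m x^{2m+1}}{(2m+1)\cdot(2m+1)!}.$$
This gives a precise analytic expression for $\int_0^x t^{-1}\sin t\,dt$ that is valid for all $x$, and the first few terms, $x - \frac{1}{18}x^3 + \frac{1}{600}x^5 - \cdots$, furnish a good numerical approximation to the integral when $x$ is not too large.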
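To see how good such truncations are in practice, the following sketch (our own illustration, with hypothetical helper names) compares the series for $\int_0^x t^{-1}\sin t\,dt$ with a composite Simpson's-rule approximation of the same integral. Simpson's rule is simply a convenient independent check, not part of the text's method.

```python
import math

def si_series(x, terms=30):
    # partial sum of sum_{m>=0} (-1)^m x^(2m+1) / ((2m+1) (2m+1)!)
    return sum((-1) ** m * x ** (2 * m + 1) / ((2 * m + 1) * math.factorial(2 * m + 1))
               for m in range(terms))

def si_simpson(x, n=1000):
    # composite Simpson's rule for the integral of sin(t)/t over [0, x] (n even),
    # using the limiting value 1 at t = 0
    def g(t):
        return 1.0 if t == 0 else math.sin(t) / t
    h = x / n
    s = g(0.0) + g(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

x = 1.0
# three-term approximation, full series, and quadrature
print(x - x ** 3 / 18 + x ** 5 / 600, si_series(x), si_simpson(x))
```

For large $x$ the alternating terms first grow before they shrink, so floating-point cancellation eventually limits the series; that is one concrete sense of "when $x$ is not too large."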
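7.19 Theorem. The radius of convergence of any power series $\sum_0^\infty a_nx^n$ is equal to the radius of convergence of the derived series $\sum_1^\infty na_nx^{n-1}$.
Proof. Let $R$ and $R'$ be the radii of convergence of $\sum_0^\infty a_nx^n$ and $\sum_1^\infty na_nx^{n-1}$, respectively. Suppose $|x| < R'$. Then $\sum na_nx^{n-1}$ is absolutely convergent, and
$$|a_nx^n| = \frac{|x|}{n}\,|na_nx^{n-1}| \le |na_nx^{n-1}| \quad \text{for large } n,$$
so $\sum a_nx^n$ is absolutely convergent; hence $R \ge R'$. On the other hand, suppose $|x| < R$, and pick $r$ with $|x| < r < R$. Then $\sum a_nr^n$ converges, so $|a_nr^n| \le M$ for some constant $M$, and
$$|na_nx^{n-1}| = \frac{n}{r}\,|a_nr^n|\left|\frac{x}{r}\right|^{n-1} \le \frac{M}{r}\,n\left|\frac{x}{r}\right|^{n-1}.$$
Since $|x/r| < 1$, the sequence $|x/r|^{n-1}$ tends to 0 exponentially fast as $n \to \infty$, so $\sum n|x/r|^{n-1}$ converges (by the ratio test, for instance), and hence $\sum na_nx^{n-1}$ converges absolutely. Thus $R' \ge R$. □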
Combining this result with Theorem 7.13b, we obtain the fundamental theorem
on term-by-term differentiation of a power series.
7.20 Theorem. Suppose the radius of convergence of the series $f(x) = \sum_0^\infty a_nx^n$ is $R > 0$. Then the function $f$ is of class $C^\infty$ on the interval $(-R, R)$, and its $k$th derivative may be computed on $(-R, R)$ by differentiating the series $\sum_0^\infty a_nx^n$ termwise $k$ times.
Proof. In view of Theorem 7.19, Theorem 7.13b shows that $f'(x) = \sum_1^\infty na_nx^{n-1}$ for $|x| < R$. It now follows by induction on $k$ that, for any positive integer $k$, $f$ is of class $C^k$ on $(-R, R)$ and that $f^{(k)}$ is the sum of the $k$-times derived series. □
7.21 Corollary. Every power series $\sum_0^\infty a_nx^n$ with a positive radius of convergence is the Taylor series of its sum; that is, if $f(x) = \sum_0^\infty a_nx^n$ for $|x| < R$ $(R > 0)$, then
$$f^{(n)}(x) = \frac{d^n}{dx^n}\left(a_0 + a_1x + \cdots + a_nx^n + \cdots\right) = n!\,a_n + \cdots,$$
where the last set of dots denotes terms containing positive powers of $x$. Setting $x = 0$, we obtain $f^{(n)}(0) = n!\,a_n$. □
7.22 Corollary. If $\sum_0^\infty a_nx^n = \sum_0^\infty b_nx^n$ for $|x| < R$ $(R > 0)$, then $a_n = b_n$ for all $n$.
Proof. We have $a_n = f^{(n)}(0)/n! = b_n$ where $f(x)$ is the common sum of the two series. □
The following examples will illustrate the use of Theorem 7.20. The second one
contains a result of importance in its own right, the binomial formula for fractional
and negative exponents.
Example 4. Suppose we wish to express the sum of the series $\sum_1^\infty x^n/n^2$ in terms of familiar elementary functions. The key is to recognize that this series is related to the geometric series $\sum_0^\infty x^n$, and that the factors of $1/n$ should arise from integrating the latter series. With this in mind, we proceed as follows. Setting $f(x) = \sum_1^\infty x^n/n^2$, we obtain successively
$$xf'(x) = \sum_1^\infty \frac{x^n}{n}, \qquad \frac{d}{dx}\bigl(xf'(x)\bigr) = \sum_1^\infty x^{n-1} = \frac{1}{1-x}, \qquad xf'(x) = -\log(1-x),$$
and, finally,
$$f(x) = -\int_0^x \frac{\log(1-t)}{t}\,dt \qquad (|x| < 1).$$
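As a hedged numerical check of Example 4, the sketch below evaluates partial sums of $\sum_1^\infty x^n/n^2$ and a Simpson's-rule approximation of $-\int_0^x t^{-1}\log(1-t)\,dt$ at a point with $|x| < 1$; the two should agree. The function names are illustrative only.

```python
import math

def f_series(x, terms=2000):
    # partial sum of sum_{n>=1} x^n / n^2 (converges for |x| <= 1)
    return sum(x ** n / n ** 2 for n in range(1, terms + 1))

def f_integral(x, n=2000):
    # composite Simpson's rule (n even) for -∫_0^x log(1 - t)/t dt;
    # the integrand -log(1 - t)/t tends to 1 as t -> 0
    def g(t):
        return 1.0 if t == 0 else -math.log1p(-t) / t
    h = x / n
    s = g(0.0) + g(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

print(f_series(0.5), f_integral(0.5))  # both approximately 0.58224
```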
$$\frac{d^n}{dx^n}(1+x)^a = a(a-1)\cdots(a-n+1)(1+x)^{a-n},$$
so the Taylor series of $(1+x)^a$ about $x = 0$ is
$$\sum_{n=0}^\infty \binom{a}{n}x^n, \qquad \binom{a}{n} = \frac{a(a-1)\cdots(a-n+1)}{n!}$$
(with the understanding that $\binom{a}{0} = 1$). This series is called the binomial series of order $a$. When $a$ is a nonnegative integer $k$, the terms with $n > k$ all vanish since they contain a factor of $(a - k)$, and we obtain the familiar binomial expansion formula for $(1+x)^k$. For other values of $a$, the Taylor series is a genuine infinite series, and one can easily check by the ratio test that its radius of convergence is 1. Our aim is to verify that the sum of this series is actually $(1+x)^a$ for $|x| < 1$.
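The generalized binomial coefficients can be generated by the recursion $\binom{a}{n+1} = \binom{a}{n}\,\frac{a-n}{n+1}$. Here is a minimal Python sketch (the helper names are ours) that sums the binomial series this way and compares it with $(1+x)^a$. It assumes $|x| < 1$, except in the polynomial case where $a$ is a nonnegative integer and the series terminates.

```python
def binom_coeff(a, n):
    # generalized binomial coefficient a(a-1)...(a-n+1)/n!, equal to 1 when n = 0
    c = 1.0
    for j in range(n):
        c *= (a - j) / (j + 1)
    return c

def binomial_series(a, x, terms=200):
    # partial sum of sum_{n>=0} binom(a, n) x^n
    return sum(binom_coeff(a, n) * x ** n for n in range(terms))

print(binomial_series(0.5, 0.44), 1.44 ** 0.5)    # fractional exponent
print(binomial_series(3, 2.0, terms=10), 3 ** 3)  # terminates: coefficients vanish for n > 3
```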
We need the following formulas concerning the generalized binomial coefficients $\binom{a}{n}$:
$$\cdots = af_{a-1}(x).$$
(For the third equality we have made the change of variable $n \to n + 1$.) On the other hand,
Abel's Theorem. Suppose $f(x) = \sum_0^\infty a_nx^n$ is a power series whose radius of convergence $R$ is positive and finite. We have seen that the convergence is uniform on any compact subinterval of $(-R, R)$ and hence that $f$ is continuous on $(-R, R)$. But now suppose that the series converges at one of the endpoints, say $x = R$. Do the uniformity of convergence and the continuity of the sum persist up to this point?
If the series converges absolutely at $x = R$, then the M-test (with $M_n = |a_n|R^n$) shows that the series converges absolutely and uniformly on $[-R, R]$, so its sum is continuous there. But when the convergence is only conditional, a more subtle argument is needed. The necessary tool is the summation-by-parts formula that we used to obtain Dirichlet's test; since we need a slightly different version of that formula than the one given in Lemma 6.23 (namely, formula (7.27)), we shall simply derive it as we proceed.
Let $l \to \infty$: then $A_{l+1} \to 0$ and $x^l$ remains bounded, so the last term on the right disappears and we obtain
Now, given $\epsilon > 0$, we can choose $k$ so large that $|A_n| < \frac12\epsilon$ whenever $n > k$. Since $x \in [0, 1]$, we have $x^{n+1} - x^n \le 0$, so (7.27) yields
$$\left|\sum_{n=k}^\infty a_nx^n\right| \le \epsilon x^k \le \epsilon$$
for all $x \in [0, 1]$ when $k$ is sufficiently large, which establishes the desired uniform convergence. □
Remark. If $\sum a_nR^n$ converges, we already know (Theorem 7.17) that $\sum a_nx^n$ converges uniformly on $[-r, r]$ for any $r < R$. Combining this with Abel's theorem, we see that $\sum a_nx^n$ converges uniformly on $[-r, R]$. (See Exercise 7 in §7.1.)
The continuity of the series at the endpoint can be restated in the following way. Recall that $\lim_{x\to a-} f(x)$ denotes the limit of $f(x)$ as $x$ approaches $a$ from the left.
7.28 Corollary. If $\sum_0^\infty a_n$ converges, then $\lim_{x\to1-}\sum_0^\infty a_nx^n = \sum_0^\infty a_n$.
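For example, since the series for $\arctan x$ converges at $x = 1$, Corollary 7.28 gives
$$\frac{\pi}{4} = \lim_{x\to1-}\arctan x = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} = 1 - \frac13 + \frac15 - \frac17 + \cdots.$$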
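Corollary 7.28 guarantees the value $\pi/4$, but it says nothing about how fast the partial sums approach it. This small sketch (illustrative only) shows that the convergence is quite slow, consistent with the alternating-series error bound $1/(2N+1)$ for the sum of the first $N$ terms:

```python
import math

def leibniz_partial(N):
    # sum_{n=0}^{N-1} (-1)^n / (2n+1)
    return sum((-1) ** n / (2 * n + 1) for n in range(N))

for N in (10, 100, 1000):
    # by the alternating series test the error is below 1/(2N+1)
    print(N, math.pi / 4 - leibniz_partial(N))
```

The error behaves roughly like $1/(4N)$, so each additional correct digit costs about ten times as many terms.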
The converse of Corollary 7.28 is false: The limit $S = \lim_{x\to1-}\sum_0^\infty a_nx^n$ may exist even when $\sum_0^\infty a_n$ diverges. (Example: Take $a_n = (-1)^n$; then $\sum_0^\infty a_nx^n = (1+x)^{-1}$ for $|x| < 1$, so $S = \frac12$.) In this case the series $\sum a_n$ is said to be Abel summable to the sum $S$. Abel summation provides a way of making sense out of certain divergent series that is useful in some situations, one of which we shall discuss in §8.2.
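The example $a_n = (-1)^n$ can be explored numerically. In the sketch below (our own illustration; the truncation length `N` is an assumption, chosen so that $x^N$ is negligible for the values of $x$ used), the ordinary partial sums oscillate, while the truncated power series $\sum a_nx^n$ approaches $1/(1+x)$ and hence $\frac12$ as $x \to 1^-$:

```python
def abel_partial(a, x, N=50_000):
    """Truncated power series sum_{n<N} a(n) x^n, intended for 0 <= x < 1."""
    return sum(a(n) * x ** n for n in range(N))

def alt(n):
    # a_n = (-1)^n, whose ordinary partial sums oscillate 1, 0, 1, 0, ...
    return (-1) ** n

partial_sums = [sum(alt(k) for k in range(m + 1)) for m in range(6)]
print(partial_sums)  # [1, 0, 1, 0, 1, 0]
for x in (0.9, 0.99, 0.999):
    print(x, abel_partial(alt, x), 1 / (1 + x))
```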
EXERCISES
5. Show that each of the following functions of x admits a power series expansion
on some interval centered at the origin. Find the expansion and give its interval
of validity.
a. $\int_0^x e^{-t^2}\,dt$.
b. $\int_0^x \cos t^2\,dt$.
c. $\int_0^x t^{-1}\log(1 + 2t)\,dt$.
6. Use the series expansions in Exercise 5 to calculate the following integrals to three decimal places, and prove the accuracy of your answer.
a. $\int_0^1 e^{-t^2}\,dt$.
b. $\int_0^1 \cos t^2\,dt$.
c. $\int_0^{1/4} t^{-1}\log(1 + 2t)\,dt$.
7. Let $f(x) = \sum_0^\infty a_nx^n$ be a power series with positive radius of convergence. Show that $f(-x) = f(x)$ (resp. $f(-x) = -f(x)$) for all $x$ in the interval of convergence if and only if $a_n = 0$ for all odd $n$ (resp. all even $n$).
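Show that the series
$$1 + \frac{x^3}{2\cdot3} + \frac{x^6}{2\cdot3\cdot5\cdot6} + \cdots + \frac{x^{3n}}{2\cdot3\cdot5\cdot6\cdots(3n-1)(3n)} + \cdots$$
converges for all $x$ and that its sum $f(x)$ satisfies $f''(x) = xf(x)$.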
10. Express the sums of the following series in terms of elementary functions and (perhaps) their antiderivatives in the manner of Example 4.
7.4. The Complex Exponential and Trig Functions 333
The series $\sum_0^\infty z^n/n!$ converges absolutely for every complex number $z$, by the ratio test, so we can use it to define the exponential function for a complex variable:
$$e^z = \sum_{n=0}^\infty \frac{z^n}{n!}.$$
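This extended exponential function still obeys the basic law of exponents. Indeed, by Theorem 6.29,
$$e^ze^w = \sum_{n=0}^\infty \sum_{j=0}^n \frac{z^jw^{n-j}}{j!\,(n-j)!} = \sum_{n=0}^\infty \frac{(z+w)^n}{n!} = e^{z+w}.$$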
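Taking $z = ix$ with $x$ real, we find
$$e^{ix} = \sum_{n=0}^\infty \frac{(ix)^n}{n!} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right).$$
The series on the right are the Taylor series of $\cos x$ and $\sin x$, so we have arrived at Euler's formula
$$e^{ix} = \cos x + i\sin x.$$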
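Euler's formula $e^{ix} = \cos x + i\sin x$ can be checked directly from the defining series. The following Python sketch (an illustration with our own function name `exp_series`) sums the first few dozen terms of $\sum z^n/n!$ for complex $z$ and compares the result with `math.cos`, `math.sin`, and `cmath.exp`:

```python
import cmath
import math

def exp_series(z, terms=60):
    """Partial sum of sum_{n>=0} z^n/n!, building each term from the previous one."""
    term, total = 1 + 0j, 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

x = 2.0
print(exp_series(1j * x), complex(math.cos(x), math.sin(x)))
# the law of exponents e^z e^w = e^(z+w), checked at one pair of complex numbers
print(exp_series(1 + 2j) * exp_series(3 - 1j), exp_series(4 + 1j), cmath.exp(4 + 1j))
```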
This is the appropriate place to raise the issue of the definition of cos x and
sin x. These functions are so familiar that we take them entirely for granted, but the
$$\cos z = \sum_{n=0}^\infty \frac{(-1)^nz^{2n}}{(2n)!}, \qquad \sin z = \sum_{n=0}^\infty \frac{(-1)^nz^{2n+1}}{(2n+1)!}. \tag{7.31}$$
We now indicate how to derive all the familiar properties of the trig functions
from these definitions. First, it is clear from (7.31) that
Third, the addition formulas for sine and cosine follow easily from the law of exponents:
Next, we have to bring the number $\pi$ into play somehow. We can proceed as follows. The series $\sum_0^\infty (-1)^n2^{2n}/(2n)!$ for $\cos 2$ is an alternating series whose terms decrease in magnitude starting with $n = 1$, so by the alternating series test,
$$\cos 2 \approx 1 - \frac{2^2}{2!} + \frac{2^4}{4!} = -\frac13 \quad \text{with error less than} \quad \frac{2^6}{6!} = \frac{4}{45}.$$
In particular, $\cos 2 < 0$, and of course $\cos 0 = 1 > 0$, so by the intermediate value theorem there is at least one number $a \in (0, 2)$ such that $\cos a = 0$. Therefore, the set $Z = \{x > 0 : \cos x = 0\}$ is nonempty; it is closed since $\cos$ is continuous (and $\cos 0 \ne 0$, so $Z$ coincides with the closed set $\{x \ge 0 : \cos x = 0\}$); hence it contains its greatest lower bound, which is positive since $\cos 0 = 1$. We denote this smallest positive zero of $\cos$ by $\frac12\pi$. (Again, this may be taken as a definition of the number $\pi$, from which its other familiar properties can be derived.)
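The argument above is constructive enough to compute $\frac12\pi$ from the series (7.31) alone. The sketch below is a hedged illustration: it bisects on $[0, 2]$ using a truncated cosine series, relying on the facts established above that the series is positive at 0 and negative at 2. We also use the fact that cosine has exactly one zero in $(0, 2)$. The text's argument does not need this fact, but it guarantees that bisection lands on the smallest positive zero.

```python
import math

def cos_series(x, terms=40):
    # partial sum of the cosine series in (7.31); each term is the previous one
    # times -x^2 / ((2n+1)(2n+2))
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

def half_pi_by_bisection(tol=1e-14):
    # cos_series(0) = 1 > 0 and cos_series(2) < 0, so bisect on [0, 2]
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(cos_series(2.0), half_pi_by_bisection(), math.pi / 2)
```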
Now, by (7.33), $(d/dx)\sin x = \cos x > 0$ for $0 < x < \frac12\pi$, so $\sin$ is increasing on $[0, \frac12\pi]$, and $\sin 0 = 0$; hence $\sin\frac12\pi > 0$. But by (7.35), $\sin^2\frac12\pi = \sin^2\frac12\pi + \cos^2\frac12\pi = 1$; hence, $\sin\frac12\pi = 1$. In summary,
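These, in turn, yield the $2\pi$-periodicity of sine and cosine. Indeed, replacing $x$ by $-x$ in (7.37) and using (7.32), we see that $\cos(x + \frac12\pi) = -\sin x$ and $\sin(x + \frac12\pi) = \cos x$, whence
$$\cos(x + \pi) = -\sin(x + \tfrac12\pi) = -\cos x, \qquad \sin(x + \pi) = \cos(x + \tfrac12\pi) = -\sin x,$$
and therefore
$$\cos(x + 2\pi) = -\cos(x + \pi) = \cos x, \qquad \sin(x + 2\pi) = -\sin(x + \pi) = \sin x.$$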
EXERCISES
1. Recall that the hyperbolic sine and cosine functions are defined by $\sinh z = \frac12(e^z - e^{-z})$ and $\cosh z = \frac12(e^z + e^{-z})$. Here, $z$ may now be taken to be a complex number.
a. Show that $\sinh ix = i\sin x$ and $\cosh ix = \cos x$.
b. Show that $\sinh(z + w) = \sinh z\cosh w + \cosh z\sinh w$ and $\cosh(z + w) = \cosh z\cosh w + \sinh z\sinh w$.
c. Express $\sinh(x + iy)$ and $\cosh(x + iy)$ in terms of real functions of the real variables $x$ and $y$.
2. Verify that the formula $(d/dx)e^{cx} = ce^{cx}$ remains valid when $c$ is a complex number. (However, $x$ is still a real variable, since we have not discussed differentiation of functions of a complex variable.)
In the preceding sections we have considered infinite series of functions. The analogue for integrals is an improper integral $\int_c^d f(x, t)\,dt$, where the integrand contains a free variable $x$ as well as the variable of integration and the resulting integral defines a function of $x$. The integral may be improper because $c = -\infty$ or $d = \infty$ or because of singularities of the function $f$. To keep the notation simple, we shall restrict our discussion to the case where $d = \infty$ and $f$ has no singularities on $[c, \infty)$, but everything we say extends to the other cases with the obvious modifications.
In this situation, the notion of uniform convergence is as follows: We say that the integral $\int_c^\infty f(x, t)\,dt$ converges uniformly for $x \in I$ (where $I$ is an interval in $\mathbb{R}$) if the difference between the "partial integral" $\int_c^d$ and the full integral $\int_c^\infty$, that is, the "tail end" $\int_d^\infty$, tends to zero uniformly for $x \in I$ as $d \to \infty$. Precisely, this means that
$$\sup_{x\in I}\left|\int_d^\infty f(x, t)\,dt\right| \to 0 \quad \text{as } d \to \infty.$$
The most useful test for uniform convergence is the following analogue of the Weierstrass M-test. The proof is essentially identical to that of the M-test, and we leave the details to the reader (Exercise 1).
7.38 Theorem. Suppose there is a function $g(t) \ge 0$ on $[c, \infty)$ such that (i) $|f(x, t)| \le g(t)$ for all $x \in I$ and $t \ge c$, and (ii) $\int_c^\infty g(t)\,dt < \infty$. Then $\int_c^\infty f(x, t)\,dt$ converges absolutely and uniformly for $x \in I$.
7.39 Theorem. Suppose that $f(x, t)$ is a continuous function on the set $\{(x, t) : x \in I,\ t \ge c\}$ and that the integral $\int_c^\infty f(x, t)\,dt$ is uniformly convergent for $x \in I$. Then:
a. The function $F(x) = \int_c^\infty f(x, t)\,dt$ is continuous on $I$.
b. If $[a, b] \subset I$, then
$$\int_a^b\int_c^\infty f(x, t)\,dt\,dx = \int_c^\infty\int_a^b f(x, t)\,dx\,dt.$$
Proof. The conclusions are true if $\int_c^\infty$ is replaced by $\int_c^d$ where $d < \infty$, by Theorems 4.46 and 4.26. (a) then follows because the uniform limit of continuous functions is continuous, and (b) follows by the argument in the proof of Theorem 7.11. □
7.40 Theorem. Suppose that $f(x, t)$ and its partial derivative $\partial_xf(x, t)$ are continuous functions on the set $\{(x, t) : x \in I,\ t \ge c\}$. Suppose also that the integral $\int_c^\infty f(x, t)\,dt$ converges for $x \in I$ and the integral $\int_c^\infty \partial_xf(x, t)\,dt$ converges uniformly for $x \in I$. Then the former integral is differentiable on $I$ as a function of $x$, and
$$\frac{d}{dx}\int_c^\infty f(x, t)\,dt = \int_c^\infty \frac{\partial f}{\partial x}(x, t)\,dt.$$
Theorem 7.40 may be deduced from Theorem 7.39 in much the same way as
Theorem 7.12 was deduced from Theorem 7.11 (Exercise 2).
Let us state explicitly the result of combining Theorems 7.39 and 7.40 with
Theorem 7.38:
7.41 Theorem. The conclusions of Theorem 7.39 are valid whenever $|f(x, t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$. The conclusions of Theorem 7.40 are valid whenever $\int_c^\infty f(x, t)\,dt$ converges for $x \in I$ and $|\partial_xf(x, t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$.
Example 1. Evaluate $\displaystyle\int_0^\infty \frac{\arctan(bt) - \arctan(at)}{t}\,dt$ where $0 < a < b$.
Solution: We recognize that the integrand is $\int_a^b (x^2t^2 + 1)^{-1}\,dx$. For $x \ge a$ and $t \ge 0$ we have $(x^2t^2 + 1)^{-1} \le (a^2t^2 + 1)^{-1}$, and $\int_0^\infty (a^2t^2 + 1)^{-1}\,dt < \infty$. Thus, by Theorem 7.38, the integral $\int_0^\infty (x^2t^2 + 1)^{-1}\,dt$ is uniformly convergent for $x \ge a$, so we can apply Theorem 7.39 to obtain
$$\int_0^\infty\int_a^b \frac{dx\,dt}{x^2t^2 + 1} = \int_a^b\int_0^\infty \frac{dt\,dx}{x^2t^2 + 1} = \int_a^b x^{-1}\arctan xt\,\Big|_{t=0}^{t=\infty}\,dx = \int_a^b \frac{\pi}{2x}\,dx = \frac{\pi}{2}\log\frac{b}{a}.$$
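For a numerical sanity check of Example 1, one can approximate the integral directly. In this sketch (our own; the cutoff `T`, the step count, and the tail correction are assumptions), Simpson's rule is applied on $[0, T]$, and the remaining tail is approximated using $\arctan u = \frac{\pi}{2} - u^{-1} + O(u^{-3})$, which makes the integrand behave like $(a^{-1} - b^{-1})t^{-2}$ for large $t$:

```python
import math

def integrand(t, a, b):
    # (arctan(bt) - arctan(at))/t, extended by its limit b - a at t = 0
    return b - a if t == 0 else (math.atan(b * t) - math.atan(a * t)) / t

def arctan_difference_integral(a, b, T=2000.0, n=200_000):
    # Simpson's rule on [0, T] (n even) plus the approximate tail (1/a - 1/b)/T
    h = T / n
    s = integrand(0.0, a, b) + integrand(T, a, b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(i * h, a, b)
    return s * h / 3 + (1 / a - 1 / b) / T

print(arctan_difference_integral(1.0, 2.0), math.pi / 2 * math.log(2.0))
```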
Example 2. Let
$$F(x) = \int_0^\infty e^{-xt^2}\,dt \qquad (x > 0).$$
Since $(\partial^k/\partial x^k)e^{-xt^2} = (-t^2)^ke^{-xt^2}$, by Theorem 7.40 we can conclude that
$$F^{(k)}(x) = (-1)^k\int_0^\infty t^{2k}e^{-xt^2}\,dt,$$
provided that we establish the uniform convergence of the integral on the right. In fact, the convergence is not uniform on the whole interval $(0, \infty)$, but it is uniform on $[\delta, \infty)$ for any $\delta > 0$, which is sufficient. This follows easily from Theorem 7.38, since $t^{2k}e^{-xt^2} \le t^{2k}e^{-\delta t^2}$ for $x \ge \delta$.
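On the other hand, we can evaluate $F(x)$ explicitly by making the substitution $u = x^{1/2}t$ and invoking Proposition 4.66:
$$F(x) = x^{-1/2}\int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2}\,x^{-1/2},$$
and therefore
$$\int_0^\infty t^{2k}e^{-xt^2}\,dt = (-1)^kF^{(k)}(x) = \frac{1\cdot3\cdots(2k-1)}{2^{k+1}}\sqrt{\pi}\,x^{-(2k+1)/2}.$$
7.5. Functions Defined by Improper Integrals 339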
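The closed form $\int_0^\infty t^{2k}e^{-xt^2}\,dt = \frac{1\cdot3\cdots(2k-1)}{2^{k+1}}\sqrt{\pi}\,x^{-(2k+1)/2}$, which follows from $F(x) = \frac{\sqrt{\pi}}{2}x^{-1/2}$ by differentiating under the integral sign, is easy to test numerically. The sketch below is illustrative: the names and the truncation point `T` are our own choices, adequate only for moderate $x$ and $k$. It uses `math.prod` (Python 3.8 or later).

```python
import math

def moment_formula(k, x):
    # (1*3*...*(2k-1)) / 2^(k+1) * sqrt(pi) * x^(-(2k+1)/2)
    odd_product = math.prod(range(1, 2 * k, 2))  # empty product = 1 when k = 0
    return odd_product / 2 ** (k + 1) * math.sqrt(math.pi) * x ** (-(2 * k + 1) / 2)

def moment_numeric(k, x, T=20.0, n=20_000):
    # Simpson's rule for the integral of t^(2k) e^(-x t^2) over [0, T];
    # the neglected tail beyond T is tiny for the moderate x and k used here
    h = T / n
    f = lambda t: t ** (2 * k) * math.exp(-x * t * t)
    s = f(0.0) + f(T)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3

for k in range(4):
    print(k, moment_numeric(k, 1.0), moment_formula(k, 1.0))
```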
Example 3. We now derive one of the most important of all integral formulas:
$$\int_0^\infty \frac{\sin t}{t}\,dt = \frac{\pi}{2}. \tag{7.42}$$
This is a bit tricky, since the integral is not absolutely convergent. (Note that since $t^{-1}\sin t \to 1$ as $t \to 0$, the integral over $[0, 1]$ is an ordinary proper integral. The convergence of the integral over $[1, \infty)$ was proved in §4.6 [Example 3].) Our strategy will be to consider an improper integral with two parameters:
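$$f(x, y) = \int_0^\infty e^{-xt}\,\frac{\sin yt}{t}\,dt \qquad (x > 0,\ y \in \mathbb{R}).$$
Differentiating under the integral sign with respect to $y$, we obtain
$$\frac{\partial f}{\partial y} = \int_0^\infty e^{-xt}\cos yt\,dt.$$
By Theorem 7.41, this formula is indeed valid, since $|e^{-xt}\cos yt| \le e^{-xt}$ for all $y$ and $\int_0^\infty e^{-xt}\,dt < \infty$. The integral on the right can be evaluated by elementary calculus (integrate by parts twice, or use Exercise 3 in §7.4), and the result is
$$\frac{\partial f}{\partial y} = \frac{x}{x^2 + y^2}, \qquad\text{so}\qquad f(x, y) = \int_0^y \frac{x}{x^2 + v^2}\,dv = \arctan\frac{y}{x},$$
since $f(x, 0) = 0$.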
The variable $y$ has now served its purpose, and we henceforth set it equal to 1. We have shown that
$$\int_0^\infty \frac{e^{-xt}\sin t}{t}\,dt = \arctan(1/x) \qquad (x > 0). \tag{7.44}$$
We now wish to let $x \to 0$. In order to pass the limit under the integral sign in (7.44), it is enough to show that the integral in (7.44) is uniformly convergent for $x > 0$. Unfortunately, Theorem 7.38 does not apply here, since the integral is not absolutely convergent at $x = 0$. (Theorem 7.38 easily yields the uniform convergence for $x \ge \delta$ for any $\delta > 0$, but that isn't good enough!) Recall the meaning of uniform convergence: What we need to show is that
$$\sup_{x>0}\left|\int_b^\infty \frac{e^{-xt}\sin t}{t}\,dt\right| \to 0 \quad \text{as } b \to \infty.$$
Integrating by parts,² we obtain
$$\int_b^\infty \frac{e^{-xt}\sin t}{t}\,dt = \frac{e^{-bx}(x\sin b + \cos b)}{(x^2 + 1)b} - \int_b^\infty \frac{e^{-xt}(x\sin t + \cos t)}{(x^2 + 1)t^2}\,dt.$$
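Now,
$$\left|\frac{e^{-xt}(x\sin t + \cos t)}{x^2 + 1}\right| \le \frac{x + 1}{x^2 + 1} \qquad (x, t \ge 0).$$
The quantity on the right is continuous on $\mathbb{R}$ and tends to zero as $x \to \infty$, so it is bounded by a constant $C$ for $x \ge 0$. Therefore,
$$\sup_{x>0}\left|\int_b^\infty \frac{e^{-xt}\sin t}{t}\,dt\right| \le \frac{C}{b} + \int_b^\infty \frac{C}{t^2}\,dt = \frac{2C}{b},$$
which tends to zero as $b \to \infty$, as desired. Thus the convergence is uniform in (7.44), and it follows that
$$\int_0^\infty \frac{\sin t}{t}\,dt = \lim_{x\to0+}\int_0^\infty \frac{e^{-xt}\sin t}{t}\,dt = \lim_{x\to0+}\arctan(1/x) = \frac{\pi}{2}.$$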
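The following sketch illustrates both $\int_0^\infty e^{-xt}t^{-1}\sin t\,dt = \arctan(1/x)$ and the slow, conditional convergence of $\int_0^\infty t^{-1}\sin t\,dt = \frac{\pi}{2}$. It is our own illustration using Simpson's rule (the helper names are hypothetical). For $x = \frac12$ the damped integral matches $\arctan 2$ closely. For $x = 0$, truncating at $b = 200\pi$ leaves an error of order $1/b$, in line with the $O(1/b)$ bound on the tail derived above.

```python
import math

def simpson(f, a, b, n):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def sinc(t):
    # sin(t)/t, extended by its limit 1 at t = 0
    return 1.0 if t == 0 else math.sin(t) / t

def damped(x, T, n):
    # integral of e^(-x t) sin(t)/t over [0, T]
    return simpson(lambda t: math.exp(-x * t) * sinc(t), 0.0, T, n)

print(damped(0.5, 60.0, 20_000), math.atan(1 / 0.5))
print(damped(0.0, 200 * math.pi, 100_000), math.pi / 2)
```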
² The idea is much the same as the use of summation by parts in the proof of Abel's theorem.
EXERCISES
3. Suppose $x > 0$. Verify that $\int_0^\infty e^{-xt}\,dt = x^{-1}$, justify differentiating under the integral sign, and deduce that $\int_0^\infty t^ne^{-xt}\,dt = n!\,x^{-n-1}$.
4. Verify that $\int_0^\infty (t^2 + x)^{-1}\,dt = \frac12\pi x^{-1/2}$, justify differentiating under the integral sign, and thence evaluate $\int_0^\infty (t^2 + x)^{-n}\,dt$.
5. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\,dx = \log\frac{a}{b}$ for $a, b > 0$.
6. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\cos x\,dx = \frac12\log\frac{1 + a^2}{1 + b^2}$ for $a, b > 0$.
/ 1 e
13. Show that / = dt = V^nr for a; > 0.
/o t
a. Show that $L[f]$ is of class $C^\infty$ on $(b, \infty)$ and $(d/ds)^nL[f] = (-1)^nL[f_n]$ where $f_n(x) = x^nf(x)$.
b. Suppose that $f$ is of class $C^1$ on $[0, \infty)$ and that $f'$ satisfies the same sort of exponential growth condition as $f$. Show that $L[f'](s) = sL[f](s) - f(0)$.
Perhaps the most important of all functions defined by improper integrals is the gamma function $\Gamma(x)$ defined for $x > 0$ by
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt,$$
which has a way of turning up in many unexpected places. Let us analyze the integrals over $[0, 1]$ and $[1, \infty)$ separately. The integral over $[0, 1]$ is proper for $x \ge 1$ and improper but convergent for $0 < x < 1$. In fact, by Theorem 7.38 it is uniformly convergent for $x \ge \delta$, for any $\delta > 0$, since $0 < t^{x-1}e^{-t} \le t^{\delta-1}$ for $x \ge \delta$ and $0 < t \le 1$. The integral over $[1, \infty)$ is convergent for all $x$ and uniformly convergent for $x \le C$, for any constant $C$, since $0 < t^{x-1}e^{-t} \le t^{C-1}e^{-t}$ for $x \le C$ and $t \ge 1$. Therefore, the integral defining $\Gamma(x)$ is convergent for $x > 0$ and uniformly convergent on $\delta \le x \le C$ for any $\delta > 0$ and $C > 0$.
It follows that $\Gamma$ is a continuous function on $(0, \infty)$. In fact, $\Gamma$ is of class $C^\infty$ on $(0, \infty)$, and its derivatives can be calculated by differentiating under the integral:
$$\Gamma^{(k)}(x) = \int_0^\infty (\log t)^k\,t^{x-1}e^{-t}\,dt. \tag{7.46}$$
Since $|\log t|$ grows more slowly than any power of $t$ as $t \to 0$ or $t \to \infty$, the argument of the preceding paragraph shows that the integral on the right is absolutely and uniformly convergent for $\delta \le x \le C$ for any positive $\delta$ and $C$, so Theorem 7.40 guarantees the validity of (7.46).
7.6. The Gamma Function 343
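The most important property of $\Gamma$ is that it satisfies the functional equation
$$\Gamma(x + 1) = x\Gamma(x). \tag{7.47}$$
Indeed, integrating by parts,
$$\Gamma(x + 1) = \int_0^\infty t^xe^{-t}\,dt = -t^xe^{-t}\Big|_0^\infty + x\int_0^\infty t^{x-1}e^{-t}\,dt = x\Gamma(x).$$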
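There are two values of $\Gamma$ that can be calculated easily by hand:
$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1, \qquad \Gamma(\tfrac12) = \int_0^\infty t^{-1/2}e^{-t}\,dt = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi}.$$
(For the second one we set $u = \sqrt{t}$ and used Proposition 4.66.) The functional equation (7.47) now yields the values of $\Gamma$ at all positive integers and half-integers:
$$\Gamma(2) = 1\cdot\Gamma(1) = 1, \quad \Gamma(3) = 2\Gamma(2) = 2, \quad \Gamma(4) = 3\Gamma(3) = 6, \ \ldots, \qquad \Gamma(\tfrac32) = \tfrac12\Gamma(\tfrac12) = \tfrac12\sqrt{\pi}, \quad \Gamma(\tfrac52) = \tfrac32\Gamma(\tfrac32) = \tfrac34\sqrt{\pi}, \ \ldots,$$
and so by induction,
$$\Gamma(n) = (n - 1)!, \qquad \Gamma(n + \tfrac12) = \frac{1\cdot3\cdots(2n-1)}{2^n}\sqrt{\pi}. \tag{7.48}$$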
Thus the gamma function provides an extension of the factorial function to non-integers: $x! = \Gamma(x + 1)$, if you like. It is the natural extension of the factorial function, not just because it gives the right values at the integers, but because the functional equation $\Gamma(x + 1) = x\Gamma(x)$ is the natural generalization of the recursive formula $n! = n\cdot(n - 1)!$ that defines factorials.
Other factorial-like products (more precisely, products of numbers in an arithmetic progression) can also be expressed in terms of the gamma function. Indeed, since
$$\Gamma(c + n + 1) = (c + n)(c + n - 1)\cdots c\,\Gamma(c),$$
$$\Gamma(x) = \frac{\Gamma(x + 1)}{x}$$
shows that $\Gamma(x)$ blows up like $x^{-1}$ as $x \to 0$. It also provides a way of extending the gamma function to negative values of $x$. Indeed, the expression on the right is defined for all $x > -1$ except $x = 0$, and it can be taken as a definition of $\Gamma(x)$ for $-1 < x < 0$. Once this has been done, $\Gamma(x + 1)/x$ is defined for all $x > -2$ except $x = 0, -1$, and it can be taken as a definition of $\Gamma(x)$ for $-2 < x < -1$. Proceeding inductively, we eventually obtain a definition of $\Gamma(x)$ for all $x$ except the nonpositive integers, where $\Gamma(x)$ blows up. In more explicit form, it is
$$\Gamma(x) = \frac{\Gamma(x + n)}{x(x + 1)\cdots(x + n - 1)} \qquad (x > -n,\ x \ne 0, -1, \ldots, -n + 1).$$
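The substitution $u = bt$ in the definition of $\Gamma$ gives
$$\int_0^\infty t^{a-1}e^{-bt}\,dt = b^{-a}\Gamma(a) \qquad (a, b > 0). \tag{7.51}$$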
The substitution $v = t^2$ gives
$$\int_0^\infty t^{2a-1}e^{-t^2}\,dt = \int_0^\infty v^{(2a-1)/2}e^{-v}\,\frac{dv}{2v^{1/2}} = \frac12\Gamma(a). \tag{7.52}$$
There is another important integral related to the gamma function, the so-called beta function
$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt \qquad (x, y > 0). \tag{7.53}$$
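7.55 Theorem. For $x, y > 0$, $\displaystyle B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$.
Proof. By (7.52) and the change to polar coordinates $s = r\cos\theta$, $t = r\sin\theta$,
$$\begin{aligned}\Gamma(x)\Gamma(y) &= 4\int_0^\infty\int_0^\infty s^{2y-1}t^{2x-1}e^{-s^2-t^2}\,ds\,dt\\ &= 4\int_0^{\pi/2}\int_0^\infty (r\cos\theta)^{2y-1}(r\sin\theta)^{2x-1}e^{-r^2}\,r\,dr\,d\theta\\ &= 4\int_0^{\pi/2}\cos^{2y-1}\theta\,\sin^{2x-1}\theta\,d\theta\int_0^\infty r^{2x+2y-1}e^{-r^2}\,dr\\ &= B(x, y)\Gamma(x + y). \qquad \square\end{aligned}$$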
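Theorem 7.55, $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$, can be spot-checked numerically. In the sketch below (ours, not the text's), `beta_numeric` applies Simpson's rule to the defining integral $\int_0^1 t^{x-1}(1-t)^{y-1}\,dt$. We restrict to $x, y \ge 1$ so that the integrand is bounded. The right-hand side uses Python's `math.gamma`.

```python
import math

def beta_numeric(x, y, n=20_000):
    # Simpson's rule for the integral of t^(x-1) (1-t)^(y-1) over [0, 1];
    # this sketch assumes x, y >= 1 so the integrand stays bounded at both endpoints
    h = 1.0 / n
    f = lambda t: t ** (x - 1) * (1 - t) ** (y - 1)
    s = f(0.0) + f(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3

def beta_via_gamma(x, y):
    # the right-hand side of Theorem 7.55
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

for x, y in [(1, 1), (2.5, 3.5), (4, 2)]:
    print(x, y, beta_numeric(x, y), beta_via_gamma(x, y))
```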
We draw two useful consequences from Theorem 7.55. The first one is another functional equation for the gamma function; the second one compares the growth of $\Gamma(x)$ and $\Gamma(x + a)$ as $x \to \infty$.
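Proof. Assume that $x > 0$. By taking $y = x$ in Theorem 7.55 and observing that the function $t(1 - t)$ is symmetric about $t = \frac12$, we see that
$$\frac{\Gamma(x)^2}{\Gamma(2x)} = \int_0^1 [t(1 - t)]^{x-1}\,dt = 2\int_0^{1/2} [t(1 - t)]^{x-1}\,dt.$$
By the substitution $u = 4t(1 - t)$, for which $du = 4(1 - 2t)\,dt = 4(1 - u)^{1/2}\,dt$, this becomes
$$2\int_0^1 \left(\frac{u}{4}\right)^{x-1}\frac{du}{4(1 - u)^{1/2}} = 2^{1-2x}B(x, \tfrac12) = 2^{1-2x}\,\frac{\Gamma(x)\Gamma(\frac12)}{\Gamma(x + \frac12)}.$$
Since $\Gamma(\frac12) = \pi^{1/2}$, the result, $\Gamma(x)\Gamma(x + \frac12) = 2^{1-2x}\pi^{1/2}\Gamma(2x)$, follows. The extension to negative values of $x$ is left to the reader (Exercise 6). □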
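Likewise, the duplication formula $\Gamma(x)\Gamma(x + \frac12) = 2^{1-2x}\pi^{1/2}\Gamma(2x)$ can be checked with `math.gamma`, including at a negative non-integer $x$ where $\Gamma$ is defined by the extension to negative values described earlier. This is only a floating-point spot check, not a proof:

```python
import math

def duplication_gap(x):
    # relative difference between Γ(x)Γ(x + 1/2) and 2^(1-2x) sqrt(pi) Γ(2x)
    lhs = math.gamma(x) * math.gamma(x + 0.5)
    rhs = 2 ** (1 - 2 * x) * math.sqrt(math.pi) * math.gamma(2 * x)
    return abs(lhs - rhs) / abs(rhs)

for x in (0.1, 0.5, 1.0, 2.7, 10.0, -0.3):
    print(x, duplication_gap(x))
```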
7.57 Theorem. For $a > 0$, $\displaystyle\lim_{x\to\infty}\frac{\Gamma(x + a)}{x^a\Gamma(x)} = 1$.
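Proof. By Theorem 7.55 and the substitution $t = e^{-u}$,
$$\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} = \int_0^1 t^{x-1}(1 - t)^{a-1}\,dt = \int_0^\infty e^{-xu}(1 - e^{-u})^{a-1}\,du.$$
When $x$ is large, $e^{-xu}$ is very small unless $u$ is close to 0, and in that case $1 - e^{-u}$ is approximately $u$. Hence, the integral on the right should be approximately equal to $\int_0^\infty u^{a-1}e^{-xu}\,du = x^{-a}\Gamma(a)$, which is what we are trying to show. More precisely, we have
$$\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} = \int_0^\infty u^{a-1}e^{-xu}\,du + \int_0^\infty \left[(1 - e^{-u})^{a-1} - u^{a-1}\right]e^{-xu}\,du,$$
so that
$$\frac{x^a\Gamma(x)}{\Gamma(x + a)} - 1 = \frac{x^a}{\Gamma(a)}\int_0^\infty \left[\left(\frac{1 - e^{-u}}{u}\right)^{a-1} - 1\right]u^{a-1}e^{-xu}\,du. \tag{7.58}$$
It remains to show that the quantity on the right tends to zero as $x \to \infty$.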
The function defined by $f(u) = (1 - e^{-u})/u$ for $u \ne 0$ and $f(0) = 1$ is everywhere positive and of class $C^\infty$ (even at $u = 0$, for it is the sum of the power series $\sum_1^\infty (-1)^{n-1}u^{n-1}/n!$). Hence the same is true of $f(u)^{a-1}$, so the function $g(u) = f(u)^{a-1} - 1$ is smooth and vanishes at $u = 0$. By the mean value theorem, then, for $0 \le u \le 1$ we have $|g(u)| = |g(u) - g(0)| \le Cu$ where $C$ is the maximum value of $|g'(u)|$ on $[0, 1]$. On the other hand, for $u > 1$ we clearly have $0 < f(u) < 1$ and hence $-1 < g(u) < 0$. Therefore, the quantity on the right of (7.58) is bounded in absolute value by
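$$\frac{x^a}{\Gamma(a)}\left[\int_0^1 Cu^ae^{-xu}\,du + \int_1^\infty u^{a-1}e^{-xu}\,du\right] \le \frac{x^a}{\Gamma(a)}\left[C\int_0^\infty u^ae^{-xu}\,du + \int_0^\infty u^ae^{-xu}\,du\right] = \frac{(C + 1)x^a\,\Gamma(a + 1)}{\Gamma(a)\,x^{a+1}} = \frac{(C + 1)a}{x},$$
where we have used (7.51) again in the last step. In short, the right side of (7.58) is dominated by $x^{-1}$ as $x \to \infty$, so we are done. □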
^ n23nn!
Since
EXERCISES
1. Prove the duplication formula for the case where x is a positive integer simply
by using (7.48).
4. Prove the following identities directly from the definition (7.53) (without using
Theorem 7.55):
a. $B(x, y) = B(y, x)$.
b. $B(x, 1) = x^{-1}$.
7. Use (7.54), Theorem 7.55, and (7.48) to evaluate $\int_0^{\pi/2}\sin^kx\,dx$. (The form of the answer is different depending on whether $k$ is even or odd.)
8. Prove Wallis's formula:
$$\frac{\pi}{2} = \lim_{n\to\infty}\frac{2\cdot2\cdot4\cdot4\cdot6\cdot6\cdots(2n)(2n)}{1\cdot3\cdot3\cdot5\cdot5\cdot7\cdots(2n-1)(2n+1)}.$$
(Hint: Denote the fraction on the right by $c_n$. Use Exercise 7 and the fact that $\sin^{2n+1}x < \sin^{2n}x < \sin^{2n-1}x$ for $0 < x < \frac12\pi$ to show that $c_n < \frac12\pi < (2n + 1)c_n/2n$.)
9. Suppose $f$ is a continuous function on $[0, \infty)$. For $\alpha > 0$, define the function $I_\alpha[f]$ on $[0, \infty)$ by
12. Suppose $a, b, c > 0$. Show that $\displaystyle\sum_{n=0}^\infty \frac{\Gamma(a + n)\Gamma(b + n)}{\Gamma(c + n)\,n!}$ converges if and only if $a + b < c$.
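The sum on the right suggests a Riemann sum for $\int \log x\,dx$. Indeed, it is the midpoint Riemann sum for $\int_{1/2}^{n+(1/2)}\log x\,dx$ corresponding to a division into $n$ equal subintervals, so the latter integral provides an approximation to $\log(n!)$. In more detail, using this Riemann sum means taking $\log k$ as an approximation to
$$\int_{k-(1/2)}^{k+(1/2)}\log x\,dx = \int_{-1/2}^{1/2}\log(k + x)\,dx.$$
To see how good this approximation is, we approximate $\log(k + x)$ by its tangent line at $x = 0$ and use Taylor's theorem to estimate the error:
$$\log(k + x) = \log k + \frac{x}{k} + E_k(x), \qquad |E_k(x)| \le \sup_{|t|\le|x|}\frac{x^2}{2(k + t)^2}.$$
(Here $(k + t)^{-2}$ is the absolute value of the second derivative of $\log(k + t)$.) Clearly, for $|x| \le \frac12$ and $k \ge 1$ we have
$$|E_k(x)| \le \frac{x^2}{2(k - \frac12)^2} \le \frac{2x^2}{k^2} \le \frac{1}{2k^2}.$$
Hence,
$$\int_{-1/2}^{1/2}\log(k + x)\,dx = \int_{-1/2}^{1/2}\left[\log k + k^{-1}x + E_k(x)\right]dx = \log k + c_k,$$
where
$$c_k = \int_{-1/2}^{1/2}E_k(x)\,dx, \qquad |c_k| \le \frac{1}{2k^2}. \tag{7.59}$$
7.7. Stirling's Formula 351
Summing over $k$ from 1 to $n$, we obtain
$$\int_{1/2}^{n+(1/2)}\log x\,dx = \log(n!) + \sum_{k=1}^n c_k.$$
Therefore, since $\int_{1/2}^{n+(1/2)}\log x\,dx = (n + \frac12)\log(n + \frac12) - n - \frac12\log\frac12$,
$$\log(n!) - \left(n + \tfrac12\right)\log n + n = \left(n + \tfrac12\right)\log\left(1 + \frac{1}{2n}\right) - \tfrac12\log\tfrac12 - \sum_{k=1}^n c_k.$$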
Since $\log(1 + x) \approx x$ for $x$ near 0, as $n \to \infty$ the quantity on the right approaches the constant $\frac12 - \frac12\log\frac12 - \sum_1^\infty c_k$, where the series converges by the estimate (7.59). Exponentiating both sides, we obtain a preliminary version of Stirling's formula:
7.60 Lemma. As $n \to \infty$, $\dfrac{n!}{n^{n+(1/2)}e^{-n}}$ approaches a limit $L \in (0, \infty)$.
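Lemma 7.60 only asserts that the limit $L$ of $n!/(n^{n+1/2}e^{-n})$ exists. A quick computation (our own illustration, using `math.lgamma` so that $n!$ never overflows) suggests its value and shows the ratio decreasing toward it. The identification $L = \sqrt{2\pi}$ is what Stirling's formula (Theorem 7.63) establishes.

```python
import math

def stirling_ratio(n):
    # n! / (n^(n+1/2) e^(-n)), computed through logarithms to avoid overflow
    return math.exp(math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n)

for n in (1, 10, 100, 1000, 10 ** 6):
    print(n, stirling_ratio(n))
print("sqrt(2*pi) =", math.sqrt(2 * math.pi))
```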
Since $\Gamma(n) = n!/n$, Lemma 7.60 says that $\Gamma(n)/(n^{n-(1/2)}e^{-n}) \to L$ as $n \to \infty$. We now extend this result from integers $n$ to real numbers $x$. To do so we need a slight strengthening of Theorem 7.57, namely, the uniformity of the convergence with respect to the parameter $a$.
7.61 Lemma. For any $A > 0$, $\displaystyle\sup_{0\le a\le A}\left|\frac{x^a\Gamma(x)}{\Gamma(x + a)} - 1\right| \to 0$ as $x \to \infty$.
Proof. With $g(u) = f(u)^{a-1} - 1$ as in the proof of Theorem 7.57, the function $|g'(u)| = |(a - 1)f(u)^{a-2}f'(u)|$ is jointly continuous in $a$ and $u$ in the compact region $a \in [0, A]$, $u \in [0, 1]$, so its maximum on this region is finite. The constant $C$ in that proof can be taken to be this maximum when $a \in [0, A]$, and the conclusion of the proof shows that
$$\sup_{0\le a\le A}\left|\frac{x^a\Gamma(x)}{\Gamma(x + a)} - 1\right| \le \frac{(C + 1)A}{x}. \qquad \square$$
7.62 Lemma. $\displaystyle\lim_{x\to\infty}\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = L$, where $L$ is as in Lemma 7.60.
Proof. Given $x$, write $x = n + a$ where $n$ is a positive integer and $0 \le a < 1$. Then
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \frac{\Gamma(n + a)}{(n + a)^{n+a-(1/2)}e^{-n-a}} = \frac{\Gamma(n)}{n^{n-(1/2)}e^{-n}}\cdot\frac{\Gamma(n + a)}{n^a\Gamma(n)}\cdot\frac{e^a\,n^{n+a-(1/2)}}{(n + a)^{n+a-(1/2)}}.$$
By Lemma 7.60, the first factor in this last expression will be as close to $L$ as we please when $n$ is sufficiently large. By Lemma 7.61, the second factor will be as close to 1 as we please when $n$ is sufficiently large and $0 \le a < 1$. The same is also true of the third factor; indeed, by taking logarithms it is enough to verify that
$$a - \left(n + a - \tfrac12\right)\log\left(1 + \frac{a}{n}\right)$$
will be as close to 0 as we please when $n$ is sufficiently large and $0 \le a < 1$, and this is easily accomplished by using the Taylor expansion of $\log(1 + t)$ about $t = 0$. (Details are left to the reader as Exercise 1.) Combining these results, we see that $\Gamma(x)/x^{x-(1/2)}e^{-x}$ becomes as close to $L$ as we please when $x$ is sufficiently large, as claimed. □
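7.63 Theorem (Stirling's Formula). $\displaystyle\lim_{x\to\infty}\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \sqrt{2\pi}$.
Proof. It remains only to identify the constant $L$ in Lemma 7.62. According to that lemma, the quantities
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}}, \qquad \frac{\Gamma(x + \frac12)}{(x + \frac12)^xe^{-x-(1/2)}}, \qquad \frac{\Gamma(2x)}{(2x)^{2x-(1/2)}e^{-2x}}$$
all approach $L$ as $x \to \infty$. Dividing the product of the first two by the third and using the duplication formula
$$\Gamma(x)\Gamma(x + \tfrac12) = 2^{1-2x}\pi^{1/2}\,\Gamma(2x),$$
we see that
$$L = \lim_{x\to\infty}\frac{\Gamma(x)\Gamma(x + \frac12)}{\Gamma(2x)}\cdot\frac{(2x)^{2x-(1/2)}e^{-2x}}{x^{x-(1/2)}(x + \frac12)^xe^{-2x-(1/2)}} = \lim_{x\to\infty}\sqrt{2\pi e}\left(1 + \frac{1}{2x}\right)^{-x}.$$
The last factor on the right tends to $e^{-1/2}$ as $x \to \infty$, so we are done. □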
where $\sim$ means that the ratio of the quantities on the left and right approaches 1 as $x \to \infty$. (The difference of these two quantities, however, tends to $\infty$ along with $x$.)
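Stirling's formula for real $x$ can be illustrated the same way. In this sketch (illustrative; computed through logarithms with `math.lgamma`), the ratio $\Gamma(x)/(x^{x-1/2}e^{-x})$ divided by $\sqrt{2\pi}$ is printed for a few non-integer $x$. The relative error shrinks roughly like $1/(12x)$, a known refinement that is not proved in the text.

```python
import math

def stirling_gamma_ratio(x):
    # Γ(x) / (x^(x - 1/2) e^(-x)), evaluated through logarithms to avoid overflow
    return math.exp(math.lgamma(x) - (x - 0.5) * math.log(x) + x)

for x in (5.5, 50.25, 1000.75):
    # the ratio tends to 1, while the difference Γ(x) - sqrt(2π) x^(x-1/2) e^(-x) grows
    print(x, stirling_gamma_ratio(x) / math.sqrt(2 * math.pi))
```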
EXERCISES
1. Complete the proof of Lemma 7.62 by showing that for some constant $C > 0$ we have $\sup_{0\le a\le1}\left|a - (n + a - \frac12)\log[1 + (a/n)]\right| \le C/n$.
2. If a fair coin is tossed $2n$ times, the probability that it will come up heads exactly $n$ times is $(2n)!/[(n!)^2\,2^{2n}]$. (The total number of possible outcomes is $2^{2n}$, and the number of those with exactly $n$ heads is the binomial coefficient $\binom{2n}{n} = (2n)!/(n!)^2$.) Use Stirling's formula to estimate this probability when $n$ is large.
The special case of Stirling's formula for integers,
$$\lim_{n\to\infty}\frac{n!}{n^{n+(1/2)}e^{-n}} = \sqrt{2\pi},$$
can be proved more simply than the general case. One begins, as we did, by proving Lemma 7.60, but it is then enough to evaluate the constant $L$ there. To do this, show that the fraction on the right of Wallis's formula (Exercise 8 in §7.6) equals $[2^nn!]^4/\bigl([(2n)!]^2(2n + 1)\bigr)$, then use Lemma 7.60 to show that it approaches $\frac14L^2$ as $n \to \infty$; conclude that $L = \sqrt{2\pi}$.
Chapter 8
FOURIER SERIES
Fourier series are infinite series that use the trigonometric functions $\cos n\theta$ and $\sin n\theta$, or, equivalently, $e^{in\theta}$ and $e^{-in\theta}$, as the basic building blocks, in the same way that power series use the monomials $x^n$. They are a basic tool for analyzing periodic functions, and they therefore have applications in the study of physical phenomena that are periodic in time (such as circular or oscillatory motion) or in space (such as crystal lattices). They can also be used to analyze functions defined on finite intervals in ways that are useful in solving differential equations, and this leads to many other applications in physics and engineering. The theory of Fourier series and its ramifications is an extensive subject that lies at the heart of much of modern mathematical analysis. Here we present only the basics; for further information we refer the reader to Folland [6], Kammler [10], and Körner [11].
356 Chapter 8. Fourier Series
exist (and are finite). Moreover, we shall say that a $P$-periodic function $f$ on $\mathbb{R}$ is piecewise continuous if it is piecewise continuous on each interval of length $P$. (If it is piecewise continuous on one such interval, of course, it is piecewise continuous on all of them.)
Note. It is sometimes convenient to allow a piecewise continuous function to
be undefined at the points where it has jumps. This does not affect anything that
follows in a significant way.
A piecewise continuous function is integrable over every bounded interval in its domain. In this connection, the following elementary fact is worth pointing out explicitly: If $f$ is $P$-periodic and piecewise continuous, the integrals of $f$ over all intervals of length $P$ are equal:
$$\int_a^{a+P}f(x)\,dx = \int_0^Pf(x)\,dx \quad \text{for all } a \in \mathbb{R}. \tag{8.2}$$
$$f(\theta) = \sum_{-\infty}^{\infty}c_ne^{in\theta}. \tag{8.3}$$
Here $f$ may be either real-valued or complex-valued; the $c_n$'s are complex numbers, and the series on the right is always to be interpreted as the limit of the symmetric partial sums in which the $n$th and $(-n)$th terms are added in together:
$$\sum_{-\infty}^{\infty}c_ne^{in\theta} = \lim_{k\to\infty}\sum_{-k}^{k}c_ne^{in\theta}.$$
8.1. Periodic Functions and Fourier Series 357
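Since $e^{\pm in\theta} = \cos n\theta \pm i\sin n\theta$, combining the $n$th and $(-n)$th terms gives
$$f(\theta) = \tfrac12a_0 + \sum_{1}^{\infty}\left(a_n\cos n\theta + b_n\sin n\theta\right), \tag{8.4}$$
where
$$a_n = c_n + c_{-n}, \qquad b_n = i(c_n - c_{-n}).$$
The grouping of the $n$th and $(-n)$th terms in (8.3) corresponds to the grouping of the $\cos n\theta$ and $\sin n\theta$ terms in (8.4). (The factor of $\frac12$ in front of $a_0$ is an artifact of the definition $a_0 = c_0 + c_{-0} = 2c_0$.)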
The series (8.3) and (8.4) can be used interchangeably. The more traditional form is (8.4), but each of them has its advantages. The advantages of (8.4) derive from the fact that $\cos n\theta$ and $\sin n\theta$ are real-valued and are respectively even and odd; the advantages of (8.3) derive from the fact that exponentials tend to be easier to manipulate than trig functions. For developing the basic theory, the latter consideration is compelling, so we shall work mostly with (8.3).
The questions that face us are as follows: Given a $2\pi$-periodic function $f$, can it be expanded in a series of the form (8.3)? If so, how do we find the coefficients $c_n$ in this series? It turns out to be easier to tackle the second question first. That is, we first assume that $f$ can be expressed in the form (8.3) and figure out what the coefficients $c_n$ must be; then we show that with this choice of $c_n$, the expansion (8.3) is actually valid under suitable hypotheses on $f$.
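Suppose, then, that the series $\sum_{-\infty}^{\infty}c_ne^{in\theta}$ converges pointwise to the function $f(\theta)$, and suppose also that the convergence is sufficiently well behaved that term-by-term integration is permissible. The coefficients $c_n$ can then be evaluated by the following device. To compute $c_k$, we multiply both sides of (8.3) by $e^{-ik\theta}$ and integrate over $[-\pi, \pi]$:
$$\int_{-\pi}^{\pi}f(\theta)e^{-ik\theta}\,d\theta = \sum_{-\infty}^{\infty}c_n\int_{-\pi}^{\pi}e^{i(n-k)\theta}\,d\theta.$$
Now,
$$\int_{-\pi}^{\pi}e^{i(n-k)\theta}\,d\theta = \begin{cases}2\pi & \text{if } n = k,\\ 0 & \text{if } n \ne k.\end{cases}$$
Thus all the terms on the right of the integrated series vanish except for the one with $n = k$, and we obtain
$$\int_{-\pi}^{\pi}f(\theta)e^{-ik\theta}\,d\theta = 2\pi c_k,$$
or, relabeling $k$ as $n$,
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-in\theta}\,d\theta.$$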
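$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi}f(\theta)\cos n\theta\,d\theta, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi}f(\theta)\sin n\theta\,d\theta$$
are well defined. We call them the Fourier coefficients of $f$, and we call the series
$$\sum_{-\infty}^{\infty}c_ne^{in\theta} \qquad\text{or}\qquad \tfrac12a_0 + \sum_{1}^{\infty}\left(a_n\cos n\theta + b_n\sin n\theta\right)$$
the Fourier series of $f$.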
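The coefficient formula $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-in\theta}\,d\theta$ lends itself to direct numerical approximation. The Python sketch below is a hedged illustration (function names are ours). It uses an equally spaced Riemann sum over one period, which for a smooth $2\pi$-periodic integrand is very accurate. It then recovers $a_n = c_n + c_{-n}$ and $b_n = i(c_n - c_{-n})$, the relations obtained by combining the $n$th and $(-n)$th terms with $e^{\pm in\theta} = \cos n\theta \pm i\sin n\theta$.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=4096):
    # c_n = (1/2π) ∫_{-π}^{π} f(θ) e^{-inθ} dθ, approximated by an equally spaced
    # Riemann sum over one period (the trapezoid rule for a 2π-periodic integrand)
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        theta = -math.pi + j * h
        total += f(theta) * cmath.exp(-1j * n * theta)
    return total * h / (2 * math.pi)

def real_coefficients(f, n, samples=4096):
    # a_n = c_n + c_{-n} and b_n = i (c_n - c_{-n})
    cp, cm = fourier_coefficient(f, n, samples), fourier_coefficient(f, -n, samples)
    return cp + cm, 1j * (cp - cm)

trig = lambda t: 3 + 2 * math.cos(t) - math.sin(2 * t)
print(fourier_coefficient(trig, 0), fourier_coefficient(trig, 2))  # about 3 and 0.5j
print(real_coefficients(trig, 2))                                  # about (0, -1)
```

For discontinuous functions such as the sawtooth wave of Example 1, the same sum still converges, but only at a rate proportional to the step size.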
The study of general Fourier series will be undertaken in the following sections.
We conclude this one by working out two simple examples.
f(0)=6, (-ir<9<ir).
That is, / is the sawtooth wave depicted in the top graph of Figure 8.1. The
calculation of the Fourier coefficients cn is an easy integration by parts for
cn = IT- \ Ve «# = — : I"
2tt /__
— 7T
2?r —in n2 7,71
cin6
in
Grouping together the nth and (—n)th terms yields the equivalent form
(We could also have obtained this series directly by using (8.7); we have an = 0
for all n since / is odd, and a calculation similar to the one above shows that
bn = 2{-l)n+l/n.)
The series (8.9) converges for all $\theta$ by Dirichlet's test. (See Corollary 6.27. The factor of $(-1)^{n+1}$ does not affect the result, since $(-1)^n\sin n\theta = \sin n(\theta + \pi)$.) The sketches of some of the partial sums in Figure 8.1 lend plausibility to the conjecture that (8.9) does indeed converge to the function $f(\theta)$, at least at the points where $f$ is continuous. (At the points $\theta = (2k+1)\pi$ where $f$ is discontinuous, every term in (8.9) vanishes.)
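The conjecture can be probed numerically. This sketch (the helper name `sawtooth_partial_sum` is hypothetical) evaluates partial sums of (8.9) at $\theta = 1$, where $f$ is continuous, and at $\theta = \pi$, where $f$ jumps; it illustrates the behavior but of course proves nothing.

```python
import math


def sawtooth_partial_sum(theta, N):
    """Partial sum of (8.9): 2 * sum_{n=1}^{N} (-1)^{n+1} sin(nθ)/n."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * theta) / n for n in range(1, N + 1))


for N in (4, 10, 16, 200, 2000):
    # θ = 1 is a point of continuity with f(1) = 1; θ = π is a jump point.
    print(N, sawtooth_partial_sum(1.0, N), sawtooth_partial_sum(math.pi, N))
```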
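EXAMPLE 2. Let $g$ be the $2\pi$-periodic function such that $g(\theta) = |\theta|$ for $-\pi \le \theta \le \pi$. That is, $g$ is the triangle wave depicted in the top graph of Figure 8.2. Here it is a bit easier to calculate the Fourier coefficients in terms of sines and cosines. Since $g$ is an even function, we have $b_n = 0$ for all $n$ and

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(\theta)\cos n\theta\,d\theta = \frac{2}{\pi}\int_0^{\pi}\theta\cos n\theta\,d\theta.$$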
Figure 8.1: Top to bottom: The sawtooth wave of Example 1 and the partial sums $S_4$, $S_{10}$, and $S_{16}$ of its Fourier series.
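For $n = 0$ this gives $a_0 = \pi$, and for $n \ne 0$ an integration by parts gives

$$a_n = \frac{2}{\pi}\left[\frac{\theta\sin n\theta}{n}\Big|_0^{\pi} - \frac{1}{n}\int_0^{\pi}\sin n\theta\,d\theta\right] = \frac{2}{\pi n^2}\cos n\theta\Big|_0^{\pi} = \frac{2\left[(-1)^n - 1\right]}{\pi n^2}.$$

Thus $a_n = 0$ when $n$ is even and positive, $a_n = -4/(\pi n^2)$ when $n$ is odd, and the Fourier series of $g$ is

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2}.$$

Since $\sum_1^{\infty} n^{-2} < \infty$, this series converges absolutely and uniformly by the Weierstrass M-test. Again, a glance at its first few partial sums in Figure 8.2 supports the conjecture that its full sum is $g(\theta)$.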
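The M-test argument also yields a concrete uniform error bound. Assuming, as conjectured, that the series sums to $g(\theta) = |\theta|$, the error after $M$ terms is at most $(4/\pi)\sum_{m>M}(2m-1)^{-2} \le 2/(\pi(2M-1))$ for every $\theta$. The sketch below (helper names are illustrative) measures the worst error on a grid and compares it with that bound.

```python
import math


def triangle_partial_sum(theta, M):
    """pi/2 - (4/pi) * sum_{m=1}^{M} cos((2m-1)θ) / (2m-1)^2."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2 for m in range(1, M + 1))
    return math.pi / 2 - 4 / math.pi * s


def m_test_bound(M):
    """(4/π) Σ_{m>M} (2m-1)^{-2} <= (4/π) ∫_M^∞ dx/(2x-1)^2 = 2/(π(2M-1))."""
    return 2 / (math.pi * (2 * M - 1))


grid = [-math.pi + 2 * math.pi * j / 1000 for j in range(1001)]
for M in (2, 5, 25):
    worst = max(abs(triangle_partial_sum(t, M) - abs(t)) for t in grid)
    print(M, worst, m_test_bound(M))
```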
EXERCISES
In Exercises 1-8, find the Fourier series of the $2\pi$-periodic function $f(\theta)$ that is given on the interval $(-\pi, \pi)$ by the indicated formula. (All but Exercise 5 are either even or odd, so their Fourier series are naturally expressed in terms of cosines or sines.) Sketches of these functions are given in Figure 8.3.
2. $f(\theta) = \sin^2\theta$. (You don't need calculus if you look at this the right way.)
3. $f(\theta) = |\sin\theta|$. (Hint: $\sin a\cos b = \frac{1}{2}[\sin(a+b) + \sin(a-b)]$.)
4. $f(\theta) = \theta^2$.
5. $f(\theta) = e^{b\theta}$ $(b > 0)$.
6. $f(\theta) = \theta(\pi - |\theta|)$.
7. $f(\theta) = \begin{cases} 1/a & (|\theta| < a),\\ 1/(a - \pi) & (a < |\theta| < \pi),\end{cases}$ where $0 < a < \pi$. (The values of $f$ are chosen to make the areas of the rectangles between the graph of $f$ and the $x$-axis on the intervals $[0, a]$ and $[a, \pi]$ both equal to 1.)
9. Prove that (8.2) is valid for every piecewise continuous $P$-periodic function $f$. (This can be done either directly by changes of variable or by differentiating $\int_a^{a+P} f(\theta)\,d\theta$ with respect to $a$ via Theorem 4.15a.)
Figure 8.3: Sketches of the functions in Exercises 1-8.
bounded by a constant:
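However, it is actually true that $c_n \to 0$; in fact, we can say something more precise (Bessel's inequality): if $f$ is $2\pi$-periodic and piecewise continuous, then

$$\sum_{-\infty}^{\infty}|c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta.$$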
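Proof. We examine the difference between $f$ and a partial sum of its Fourier series. Since the absolute value of a complex number $z$ is given by $|z|^2 = z\bar{z}$, we have

$$\Big|f(\theta) - \sum_{-N}^{N}c_ne^{in\theta}\Big|^2 = |f(\theta)|^2 - \sum_{-N}^{N}\Big[\overline{c_n}f(\theta)e^{-in\theta} + c_n\overline{f(\theta)}e^{in\theta}\Big] + \sum_{m,n=-N}^{N}c_m\overline{c_n}e^{i(m-n)\theta}.$$

Next, integration of both sides over $[-\pi, \pi]$, using the definition of $c_n$ and the relation (8.5), yields

$$0 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(\theta) - \sum_{-N}^{N}c_ne^{in\theta}\Big|^2 d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - 2\sum_{-N}^{N}|c_n|^2 + \sum_{-N}^{N}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}|c_n|^2.$$

Thus $\sum_{-N}^{N}|c_n|^2 \le (2\pi)^{-1}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta$ for every $N$, and letting $N \to \infty$ gives the result. $\square$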
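$$S_N^f(\theta) = \sum_{-N}^{N}c_ne^{in\theta}. \tag{8.13}$$

Substituting the formula for $c_n$, we obtain

$$S_N^f(\theta) = \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi}f(\psi)e^{in(\theta-\psi)}\,d\psi = \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi}f(\psi)e^{in(\psi-\theta)}\,d\psi = \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi}f(\varphi+\theta)e^{in\varphi}\,d\varphi.$$

(The second equality is obtained by replacing $n$ by $-n$, which leaves the sum from $-N$ to $N$ unchanged, and the third one comes from the change of variable $\varphi = \psi - \theta$ with the help of (8.2).) In other words,

$$S_N^f(\theta) = \int_{-\pi}^{\pi}f(\varphi+\theta)D_N(\varphi)\,d\varphi, \quad\text{where}\quad D_N(\varphi) = \frac{1}{2\pi}\sum_{-N}^{N}e^{in\varphi}. \tag{8.14}$$

$D_N$ is called the $N$th Dirichlet kernel. Its essential properties are summarized in the following lemma.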
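Incidentally, if we multiply and divide the formula in Lemma 8.15b for $D_N(\varphi)$ by $e^{-i\varphi/2}$, we obtain

$$D_N(\varphi) = \frac{1}{2\pi}\,\frac{e^{i(N+\frac{1}{2})\varphi} - e^{-i(N+\frac{1}{2})\varphi}}{e^{i\varphi/2} - e^{-i\varphi/2}} = \frac{\sin(N+\frac{1}{2})\varphi}{2\pi\sin\frac{1}{2}\varphi}.$$

This shows that $D_N$ is real-valued and gives an easy way to visualize it: Its graph is the rapidly oscillating sine wave $y = \sin(N+\frac{1}{2})\varphi$, amplitude-modulated to fit inside the envelope $y = \pm(2\pi\sin\frac{1}{2}\varphi)^{-1}$. (The reader may wish to generate graphs of $D_N$ for various values of $N$ on a computer.)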
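Following the suggestion to explore $D_N$ on a computer, this sketch evaluates the Dirichlet kernel both from its defining sum in (8.14) and from the closed form $\sin(N+\frac{1}{2})\varphi/(2\pi\sin\frac{1}{2}\varphi)$, and approximates $\int_0^{\pi}D_N(\varphi)\,d\varphi$ (which should come out close to $\frac{1}{2}$) by a midpoint rule. Function names and sample counts are illustrative assumptions; plotting is omitted to keep the example dependency-free.

```python
import cmath
import math


def dirichlet_from_sum(phi, N):
    """D_N(φ) = (1/2π) Σ_{n=-N}^{N} e^{inφ}, straight from the definition."""
    return sum(cmath.exp(1j * n * phi) for n in range(-N, N + 1)) / (2 * math.pi)


def dirichlet_closed_form(phi, N):
    """sin((N + 1/2)φ) / (2π sin(φ/2)); valid when φ is not a multiple of 2π."""
    return math.sin((N + 0.5) * phi) / (2 * math.pi * math.sin(phi / 2))


def integral_over_0_pi(N, samples=20000):
    """Midpoint-rule approximation of ∫_0^π D_N(φ) dφ (the nodes avoid φ = 0)."""
    h = math.pi / samples
    return h * sum(dirichlet_closed_form((j + 0.5) * h, N) for j in range(samples))


for N in (1, 5, 20):
    print(N, dirichlet_from_sum(1.0, N), dirichlet_closed_form(1.0, N), integral_over_0_pi(N))
```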
We are now ready to formulate and prove the basic convergence theorem for Fourier series. It turns out that piecewise continuity of a periodic function $f$ is not enough to yield a good result. Instead we shall assume, in effect, that not only $f$ but also its derivative $f'$ is piecewise continuous. More precisely, we shall say that a periodic function $f$ is piecewise smooth if, on any bounded interval, $f$ is of class $C^1$ except at finitely many points, at which the one-sided limits $f(\theta+)$, $f(\theta-)$, $f'(\theta+)$, and $f'(\theta-)$ (as defined in (8.1)) exist and are finite. (Note that this definition of piecewise smoothness is more general than that given in §5.1, which required the function to be continuous.) Pictorially, $f$ is piecewise smooth if its graph over any bounded interval is a smooth curve except at finitely many points where it has jumps (if $f$ is discontinuous) or corners (if $f$ is continuous but $f'$ is discontinuous). In addition, the one-sided tangent lines at the jumps and corners are not allowed to be vertical.
8.16 Theorem. Suppose $f$ is $2\pi$-periodic and piecewise smooth. Then the partial sums $S_N^f(\theta)$ of the Fourier series of $f$, defined by (8.13), converge pointwise to $\frac{1}{2}[f(\theta-) + f(\theta+)]$. In particular, they converge to $f(\theta)$ at each point $\theta$ where $f$ is continuous.
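Proof. By Lemma 8.15, $\int_{-\pi}^{0}D_N(\varphi)\,d\varphi = \int_0^{\pi}D_N(\varphi)\,d\varphi = \frac{1}{2}$, so by (8.14),

$$S_N^f(\theta) - \tfrac{1}{2}\left[f(\theta-) + f(\theta+)\right] = \int_{-\pi}^{0}\left[f(\theta+\varphi) - f(\theta-)\right]D_N(\varphi)\,d\varphi + \int_0^{\pi}\left[f(\theta+\varphi) - f(\theta+)\right]D_N(\varphi)\,d\varphi.$$

Our object is to show that this quantity vanishes as $N \to \infty$. By Lemma 8.15b, we can rewrite it as

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}g(\varphi)\left[e^{i(N+1)\varphi} - e^{-iN\varphi}\right]d\varphi, \tag{8.17}$$

where

$$g(\varphi) = \begin{cases}\dfrac{f(\theta+\varphi) - f(\theta-)}{e^{i\varphi} - 1} & (-\pi \le \varphi < 0),\\[8pt] \dfrac{f(\theta+\varphi) - f(\theta+)}{e^{i\varphi} - 1} & (0 < \varphi \le \pi).\end{cases}$$

(We could define $g(0)$ to be anything we please; altering the value at this one point does not affect (8.17), by Proposition 4.14.) On the interval $[-\pi, \pi]$, $g(\varphi)$ is continuous wherever $f(\varphi+\theta)$ is and has jump discontinuities wherever $f(\varphi+\theta)$ does, except for an additional singularity at $\varphi = 0$ caused by the vanishing of $e^{i\varphi} - 1$ there. But this singularity is also at worst a jump discontinuity; that is, the limits $g(0+)$ and $g(0-)$ both exist. Indeed, by l'Hôpital's rule,

$$\lim_{\varphi\to 0\pm}g(\varphi) = \lim_{\varphi\to 0\pm}\frac{f'(\theta+\varphi)}{ie^{i\varphi}} = \frac{f'(\theta\pm)}{i}.$$

Hence $g$ is piecewise continuous, so by Bessel's inequality the Fourier coefficients $C_n = (2\pi)^{-1}\int_{-\pi}^{\pi}g(\varphi)e^{-in\varphi}\,d\varphi$ of $g$ tend to zero as $n \to \pm\infty$. But the quantity (8.17) is simply $C_{-N-1} - C_N$, so it vanishes as $N \to \infty$, as desired. $\square$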
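EXAMPLE 1. The sawtooth wave $f(\theta)$ defined by $f(\theta) = \theta$ for $|\theta| < \pi$ is smooth except at the odd multiples of $\pi$, where its left- and right-hand limits are $\pi$ and $-\pi$, respectively. Thus the Fourier series of $f$ converges to $f$ everywhere except at the odd multiples of $\pi$, where it converges to 0. On the interval $(-\pi, \pi)$, the result is

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin n\theta = \frac{\theta}{2} \qquad \text{for } -\pi < \theta < \pi.$$

In particular, taking $\theta = \pi/2$ gives

$$\sum_{m=1}^{\infty}\frac{(-1)^{m+1}}{2m-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4},$$

which we derived by other methods in Example 5 of §7.3.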
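Example 2. The triangle wave $g(\theta)$ defined by $g(\theta) = |\theta|$ for $|\theta| \le \pi$ is piecewise smooth and everywhere continuous, so it is the sum of its Fourier series at every point. Thus,

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2} = |\theta| \qquad \text{for } -\pi \le \theta \le \pi.$$

Setting $\theta = 0$, we obtain

$$\sum_{m=1}^{\infty}\frac{1}{(2m-1)^2} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots = \frac{\pi^2}{8}.$$

From this it is also easy to obtain the sum

$$\sum_{n=1}^{\infty}\frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$$

by separating out the odd and even terms:

$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \left(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\right) + \left(\frac{1}{2^2} + \frac{1}{4^2} + \cdots\right) = \frac{\pi^2}{8} + \frac{1}{4}\sum_{n=1}^{\infty}\frac{1}{n^2},$$

so that $\sum_1^{\infty} n^{-2} = \frac{4}{3}\cdot\frac{\pi^2}{8} = \frac{\pi^2}{6}$.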
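The three sums obtained in Examples 1 and 2 are easy to check numerically. This sketch simply adds up $N = 10^5$ terms of each series; the truncation errors are roughly $1/(2N)$, $1/(4N)$, and $1/N$, respectively, so agreement to about five decimal places is what one should expect.

```python
import math


def partial_sum(term, N):
    """Sum term(k) for k = 1, ..., N."""
    return sum(term(k) for k in range(1, N + 1))


N = 100_000
leibniz = partial_sum(lambda m: (-1) ** (m + 1) / (2 * m - 1), N)   # limit π/4 (Example 1)
odd_squares = partial_sum(lambda m: 1 / (2 * m - 1) ** 2, N)        # limit π²/8 (Example 2)
all_squares = partial_sum(lambda n: 1 / n ** 2, N)                   # limit π²/6

print(leibniz, math.pi / 4)
print(odd_squares, math.pi ** 2 / 8)
print(all_squares, math.pi ** 2 / 6)
```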
We conclude by remarking that one can often use simple changes of variable to generate new Fourier expansions from old ones without recalculating the coefficients from scratch.
$$\frac{(-1)^{m-1}\sin(2m-1)\theta}{(2m-1)^2}$$
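$$A_rf(\theta) = \sum_{-\infty}^{\infty}r^{|n|}c_ne^{in\theta} = \frac{1}{2\pi}\sum_{-\infty}^{\infty}r^{|n|}\int_{-\pi}^{\pi}f(\psi)e^{in(\theta-\psi)}\,d\psi. \tag{8.19}$$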
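Since $f$ is bounded, the Weierstrass M-test (comparison to $\sum r^{|n|}$ again) gives the uniform convergence to justify interchange of the summation and integration, and a couple of manipulations like those that lead to (8.14) then show that

$$A_rf(\theta) = \int_{-\pi}^{\pi}f(\theta+\varphi)P_r(\varphi)\,d\varphi, \quad\text{where}\quad P_r(\varphi) = \frac{1}{2\pi}\sum_{-\infty}^{\infty}r^{|n|}e^{in\varphi}. \tag{8.20}$$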
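The function $P_r$ is called the Poisson kernel. Like the Dirichlet kernel, it satisfies

$$\int_{-\pi}^{0}P_r(\varphi)\,d\varphi = \int_0^{\pi}P_r(\varphi)\,d\varphi = \frac{1}{2} \tag{8.21}$$

(write $P_r(\varphi) = (2\pi)^{-1} + \pi^{-1}\sum_1^{\infty}r^n\cos n\varphi$ and integrate term by term), and it is easily expressed in closed form since it is the sum of two geometric series:

$$\begin{aligned} P_r(\varphi) &= \frac{1}{2\pi}\sum_{0}^{\infty}r^ne^{in\varphi} + \frac{1}{2\pi}\sum_{1}^{\infty}r^ne^{-in\varphi} = \frac{1}{2\pi}\left[\frac{1}{1 - re^{i\varphi}} + \frac{re^{-i\varphi}}{1 - re^{-i\varphi}}\right]\\ &= \frac{1 - r^2}{2\pi(1 - re^{i\varphi})(1 - re^{-i\varphi})} = \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\varphi)}. \end{aligned} \tag{8.22}$$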
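A quick numerical comparison of the truncated series for $P_r$ with the closed form (8.22), together with a midpoint-rule check of (8.21) and a glimpse of how $P_r$ concentrates near $\varphi = 0$ as $r \to 1-$, is sketched below. The helper names, the truncation at 2000 terms, and the sample counts are illustrative choices.

```python
import math


def poisson_series(phi, r, terms=2000):
    """(2π)^{-1} + π^{-1} Σ_{n=1}^{terms} r^n cos(nφ): the series for P_r(φ), truncated."""
    return 1 / (2 * math.pi) + sum(r ** n * math.cos(n * phi) for n in range(1, terms + 1)) / math.pi


def poisson_closed(phi, r):
    """Closed form (8.22): (1 - r^2) / (2π (1 + r^2 - 2r cos φ))."""
    return (1 - r * r) / (2 * math.pi * (1 + r * r - 2 * r * math.cos(phi)))


def half_integral(r, samples=20000):
    """Midpoint-rule approximation of ∫_0^π P_r(φ) dφ, which (8.21) says equals 1/2."""
    h = math.pi / samples
    return h * sum(poisson_closed((j + 0.5) * h, r) for j in range(samples))


for r in (0.5, 0.9, 0.99):
    # P_r(0) grows like 1/(1 - r) while P_r(0.5) shrinks toward 0.
    print(r, poisson_closed(0.0, r), poisson_closed(0.5, r), half_integral(r))
```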
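However, the Poisson kernel has one additional crucial property that is not shared by the Dirichlet kernel:

$$\text{For any } \delta > 0,\ P_r(\varphi) \to 0 \text{ uniformly on } [-\pi, -\delta] \text{ and on } [\delta, \pi] \text{ as } r \to 1-. \tag{8.23}$$

Indeed, for $\delta \le |\varphi| \le \pi$ we have $\cos\varphi \le \cos\delta$, so

$$0 < P_r(\varphi) \le \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\delta)},$$

and the expression on the right tends to zero as $r \to 1-$. With these results in hand, we come to the main theorem.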
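Proof. We sketch the ideas and leave the details to the reader as Exercises 5 and 6. Given $\theta \in \mathbb{R}$ and $\epsilon > 0$, we choose $\delta > 0$ small enough so that $|f(\theta+\varphi) - f(\theta+)| < \epsilon$ when $0 < \varphi < \delta$ and $|f(\theta+\varphi) - f(\theta-)| < \epsilon$ when $-\delta < \varphi < 0$. We then write the formula (8.20) for $A_rf(\theta)$ as

$$A_rf(\theta) = \left(\int_{-\pi}^{-\delta} + \int_{-\delta}^{0} + \int_0^{\delta} + \int_{\delta}^{\pi}\right)f(\theta+\varphi)P_r(\varphi)\,d\varphi.$$

The first and last integrals tend to zero as $r \to 1-$ by (8.23). In the second and third integrals, $f(\theta+\varphi)$ is within $\epsilon$ of $f(\theta-)$ and $f(\theta+)$, respectively, and (8.21) and (8.23) together show that the integrals of $P_r(\varphi)$ over $[-\delta, 0]$ and $[0, \delta]$ tend to $\frac{1}{2}$ as $r \to 1-$. The upshot is that $A_rf(\theta)$ is within $2\epsilon$ of $\frac{1}{2}[f(\theta-) + f(\theta+)]$ when $r$ is sufficiently close to 1, and since $\epsilon$ is arbitrary, the first assertion is proved.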
If $f$ is continuous, it is uniformly continuous on $[-\pi, \pi]$ by Theorem 1.33 and hence uniformly continuous on $\mathbb{R}$ by periodicity. This means that the $\delta$ in the preceding paragraph can be chosen independent of $\theta$, and the argument given there then yields uniform convergence. $\square$
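To see the theorem at work at a jump, this sketch computes Abel means of the sawtooth wave of Example 1. It assumes that $A_rf(\theta) = \sum r^{|n|}c_ne^{in\theta}$, as in (8.19), and uses $c_n = (-1)^{n+1}/in$ from §8.1; the function name and the truncation length are hypothetical choices. At $\theta = \pi$ the means should tend to $\frac{1}{2}[f(\pi-) + f(\pi+)] = 0$, and at $\theta = 1$ to $f(1) = 1$.

```python
import math


def abel_mean_sawtooth(theta, r, terms=20000):
    """Abel mean A_r f(θ) = Σ_n r^{|n|} c_n e^{inθ} of the sawtooth wave.

    With c_n = (-1)^{n+1}/(in), pairing n with -n gives
    2 Σ_{n≥1} (-1)^{n+1} r^n sin(nθ)/n, truncated here after `terms` terms.
    """
    return 2 * sum((-1) ** (n + 1) * r ** n * math.sin(n * theta) / n
                   for n in range(1, terms + 1))


for r in (0.9, 0.99, 0.999):
    # θ = 1: f is continuous there with f(1) = 1.
    # θ = π: f jumps from π to -π, so the average of the one-sided limits is 0.
    print(r, abel_mean_sawtooth(1.0, r), abel_mean_sawtooth(math.pi, r))
```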
EXERCISES
1. Find the Fourier series of the sawtooth waves depicted below by modifying the series in Example 1.

[Figure: sawtooth waves (a) and (b).]
2. Find the Fourier series of the 27r-periodic function f(9) defined by f(9) =
- ivr on the interval [— jtt, f vr]. Use the result of Exercise 4 in §8.1.
3. Find the Fourier series of the $2\pi$-periodic functions defined on the interval $(-\pi, \pi)$ by the indicated formulas by modifying the series in the exercises of §8.1.
a. $f(\theta) = \begin{cases} \cdots & (-\pi < \theta < 0),\\ \cdots & (0 < \theta < \pi).\end{cases}$
c. $f(\theta) = \begin{cases} \cdots & (|\theta| < a),\\ 0 & (a < |\theta| < \pi),\end{cases}$ where $0 < a < \pi$.
d. $f(\theta) = \sinh\theta$.
4. Find the sums of the following series by applying Theorem 8.16 to the series obtained in the indicated exercises from §8.1 and choosing appropriate values of $\theta$.
a. $\sum_1^{\infty}\frac{1}{4n^2 - 1}$ and $\sum_1^{\infty}\frac{(-1)^n}{4n^2 - 1}$ (Exercise 3). Can you sum the first series in a more elementary way by rewriting it as a telescoping series?
b. $\sum_1^{\infty}\frac{(-1)^n}{n^2}$ and $\sum_1^{\infty}\frac{1}{n^2}$ (Exercise 4).
c. $\sum_1^{\infty}\frac{1}{n^2 + b^2}$ and $\sum_1^{\infty}\frac{(-1)^n}{n^2 + b^2}$, where $b > 0$ (Exercise 5).
d. $\sum_1^{\infty}\frac{(-1)^{n-1}}{(2n-1)^3}$ (Exercise 6).
5. Fill in the details of the proof of the first assertion of Theorem 8.24.
6. Fill in the details of the proof of the second assertion of Theorem 8.24.
We next study the differentiation and integration of Fourier series. As a first step, we point out that by the fundamental theorem of calculus as stated in §4.1, the formula

$$\int_a^b f'(\theta)\,d\theta = f(b) - f(a) \tag{8.25}$$

is valid when $f$ is continuous and piecewise smooth, even though $f'$ may be undefined at finitely many points. (However, it is generally false if $f$ itself has jump discontinuities.) In particular, if $f$ and $g$ are both continuous and piecewise smooth, then so is $fg$, and an application of (8.25) to the latter function yields the integration-by-parts formula

$$\int_a^b f'(\theta)g(\theta)\,d\theta = f(b)g(b) - f(a)g(a) - \int_a^b f(\theta)g'(\theta)\,d\theta.$$
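The first main result is that there is a very simple relation between the Fourier coefficients of $f$ and those of $f'$.

8.26 Theorem. Suppose $f$ is $2\pi$-periodic, continuous, and piecewise smooth. If $c_n$ and $c_n'$ are the Fourier coefficients of $f$ and $f'$, respectively, then

$$c_n' = inc_n.$$

Equivalently, if $a_n, b_n$ and $a_n', b_n'$ are the Fourier coefficients of $f$ and $f'$ given by (8.7), then $a_n' = nb_n$ and $b_n' = -na_n$.

Proof. By the integration-by-parts formula,

$$c_n' = \frac{1}{2\pi}\int_{-\pi}^{\pi}f'(\theta)e^{-in\theta}\,d\theta = \frac{1}{2\pi}f(\theta)e^{-in\theta}\Big|_{-\pi}^{\pi} + \frac{in}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-in\theta}\,d\theta.$$

The first term on the right vanishes because $f(\theta)e^{-in\theta}$ is $2\pi$-periodic, and the second one is $inc_n$. The argument for $a_n$ and $b_n$ is similar (Exercise 1). $\square$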
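Theorem 8.26 can be spot-checked on the triangle wave $g(\theta) = |\theta|$ of §8.1, which is continuous and piecewise smooth with $g'(\theta) = \operatorname{sign}\theta$ away from the corners. The sketch below (helper names are illustrative; a midpoint rule is used because its nodes avoid the corners) compares the coefficients of $g'$ with $in$ times those of $g$.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=20000):
    """Midpoint-rule approximation of c_n = (1/2π) ∫_{-π}^{π} f(θ) e^{-inθ} dθ."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        theta = -math.pi + (j + 0.5) * h
        total += f(theta) * cmath.exp(-1j * n * theta)
    return total * h / (2 * math.pi)


def triangle(theta):
    return abs(theta)                  # g(θ) = |θ| on [-π, π]


def triangle_derivative(theta):
    return math.copysign(1.0, theta)   # g'(θ) = sign θ away from θ = 0, ±π


for n in range(-3, 4):
    lhs = fourier_coefficient(triangle_derivative, n)   # c'_n
    rhs = 1j * n * fourier_coefficient(triangle, n)     # i n c_n
    print(n, lhs, rhs)
```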
Note that Theorem 8.26 makes no claim about the Fourier series of $f'$; it is valid whether or not that series actually converges. If we add more conditions on $f$ to ensure that it does, we obtain the following result:
8.27 Corollary. Suppose that $f$ is $2\pi$-periodic, continuous, and piecewise smooth, and that $f'$ is also piecewise smooth. If

$$\sum_{-\infty}^{\infty}c_ne^{in\theta} = \frac{1}{2}a_0 + \sum_1^{\infty}(a_n\cos n\theta + b_n\sin n\theta)$$

is the Fourier series of $f$, then $f'(\theta)$ is the sum of the derived series

$$\sum_{-\infty}^{\infty}inc_ne^{in\theta} = \sum_1^{\infty}(nb_n\cos n\theta - na_n\sin n\theta)$$

at every $\theta$ at which $f'(\theta)$ exists. At the exceptional points where $f'$ has jumps, the series converges to $\frac{1}{2}[f'(\theta-) + f'(\theta+)]$.

Proof. By Theorem 8.16, $f'$ is the sum of its Fourier series everywhere except where it has jumps, and the coefficients in that series are given by Theorem 8.26. $\square$
that is, when the mean value of $f$ over an interval of length $2\pi$ is zero, or, equivalently, when the constant term in the Fourier series of $f$ vanishes. We make this assumption in the following theorem; if it is not valid, we may wish to subtract off the constant term and deal with it separately.
$$F(\theta + 2\pi) - F(\theta) = \int_{\theta}^{\theta+2\pi}f(\varphi)\,d\varphi = \int_{-\pi}^{\pi}f(\varphi)\,d\varphi = 0.$$
By Theorem 8.16, $F$ is the sum of its Fourier series at every point, and by Theorem 8.26, its Fourier coefficients $C_n$ are given for $n \ne 0$ by $inC_n = c_n$ (and likewise for the cosine and sine coefficients). The constant term $C_0$ is, as always, the mean value of $F$. $\square$
Observe that the series in Theorem 8.28 is obtained by formally integrating the
Fourier series of / term-by-term, whether the latter series converges or not.
Example 2. Subtraction of the mean value from the triangle wave (Example 2 in §8.2) and multiplication by $-2$ gives
Theorem 8.28 and Corollary 8.27 exhibit situations where we can integrate or differentiate a series termwise without worrying about uniform convergence. However, uniform and absolute convergence are still highly desirable things, so we present a simple criterion for the Fourier series of a function to have these properties.
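8.29 Theorem. If $f$ is $2\pi$-periodic, continuous, and piecewise smooth, then the Fourier series of $f$ converges absolutely and uniformly.

Proof. Let $c_n$ and $c_n'$ be the Fourier coefficients of $f$ and $f'$. Since $|c_ne^{in\theta}| = |c_n|$, the absolute convergence of $\sum c_ne^{in\theta}$ is equivalent to the convergence of $\sum|c_n|$, and by the Weierstrass M-test, this also implies the uniform convergence of $\sum c_ne^{in\theta}$. But by Theorem 8.26, $c_n = c_n'/in$ for $n \ne 0$, so

$$\sum_{n\ne 0}|c_n| = \sum_{n\ne 0}\frac{|c_n'|}{|n|} \le \frac{1}{2}\sum_{n\ne 0}\left(|c_n'|^2 + \frac{1}{n^2}\right).$$

(The inequality $\alpha\beta \le \frac{1}{2}(\alpha^2 + \beta^2)$ is valid for all $\alpha, \beta \in \mathbb{R}$ since $\alpha^2 + \beta^2 - 2\alpha\beta = (\alpha - \beta)^2 \ge 0$.) But the series $\sum|c_n'|^2$ and $\sum n^{-2}$ are both convergent (by Bessel's inequality in the former case, since $f'$ is piecewise continuous), and hence so is $\sum|c_n|$. $\square$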
EXERCISES
2. Given $a \in (0, \pi)$, let $f$ be the $2\pi$-periodic function defined by $f(\theta) = a^{-1}$ for $|\theta| < a$ and $f(\theta) = (a - \pi)^{-1}$ for $a < |\theta| < \pi$.
a. Find the formula for $g(\theta) = \int_0^{\theta}f(\varphi)\,d\varphi$ on $[-\pi, \pi]$ and sketch its graph.
b. Use the Fourier series of $f$ found in Exercise 7 of §8.1 to compute the Fourier series of $g$.
$$\sum_{n=1}^{\infty}\frac{1}{n^4} = \frac{\pi^4}{90}.$$
4. From Exercise 3 of §8.1, we know that

$$\sin\theta = \frac{2}{\pi} - \frac{4}{\pi}\sum_{n=1}^{\infty}\frac{\cos 2n\theta}{4n^2 - 1} \qquad \text{for } 0 \le \theta \le \pi.$$

Show that this series can be differentiated or integrated termwise to yield two apparently different series expansions of $\cos\theta$ for $0 < \theta < \pi$, and reconcile these two expansions. (Hint: Example 1 of §8.2 is useful.)
5. Let $f(\theta)$ be the $2\pi$-periodic function such that $f(\theta) = e^{\theta}$ for $|\theta| < \pi$, and let $\sum_{-\infty}^{\infty}c_ne^{in\theta}$ be its Fourier series. If we formally differentiate this equation, we obtain $e^{\theta} = \sum_{-\infty}^{\infty}inc_ne^{in\theta}$ for $|\theta| < \pi$. But then $c_n$ and $inc_n$ are both equal to $(2\pi)^{-1}\int_{-\pi}^{\pi}e^{\theta}e^{-in\theta}\,d\theta$, so $c_n = inc_n$ and hence $c_n = 0$ for all $n$. Clearly this is wrong; where is the mistake?
6. How smooth are the following functions? That is, for which $k$ can you show that the function is of class $C^k$?
For this extension the Fourier sine coefficients $b_n$ all vanish because $f_{\text{even}}(\theta)\sin n\theta$ is an odd function.
Figure 8.5: A function on $[0, \pi]$ (above) and its even and odd extensions to $[-\pi, \pi]$ (below, left and right).
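Here the Fourier cosine coefficients $a_n$ all vanish, and the sine coefficients $b_n$ are given by

$$b_n = \frac{2}{\pi}\int_0^{\pi}f(\theta)\sin n\theta\,d\theta,$$

just as the cosine coefficients of the even extension are given by

$$a_n = \frac{2}{\pi}\int_0^{\pi}f(\theta)\cos n\theta\,d\theta.$$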
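Example 1. Let $f(\theta) = \theta$ on $[0, \pi]$. The even and odd periodic extensions of $f$ are the triangle and sawtooth waves, respectively, and the Fourier cosine and sine series of $f$ are

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2} \qquad\text{and}\qquad 2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin n\theta,$$

respectively.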
If $f$ is piecewise smooth on $[0, \pi]$, its even and odd periodic extensions will be piecewise smooth on $\mathbb{R}$. If $f(0) = f(0+)$ and $f(\pi) = f(\pi-)$, its even periodic extension will be continuous at both 0 and $\pi$, but its odd periodic extension will have jumps at 0 or $\pi$ unless $f(0) = 0$ or $f(\pi) = 0$, respectively. In any case, an application of Theorem 8.16 to these extensions easily yields the following:

8.30 Theorem. Suppose $f$ is piecewise smooth on $[0, \pi]$. The Fourier cosine series and the Fourier sine series of $f$ converge to $\frac{1}{2}[f(\theta-) + f(\theta+)]$ at every $\theta \in (0, \pi)$. The cosine series converges to $f(0+)$ at $\theta = 0$ and to $f(\pi-)$ at $\theta = \pi$; the sine series converges to 0 at both these points.
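We may wish to consider periodic functions with period other than $2\pi$, or functions defined on intervals other than $[0, \pi]$. The general situation can be reduced to the one we have studied by a linear change of variable; we record the results for future reference.

Suppose $f(x)$ is a piecewise smooth $2l$-periodic function. We make the change of variables $\theta = \pi x/l$, which converts $f$ into a $2\pi$-periodic function of $\theta$. The expansions of §8.1 then become

$$f(x) = \sum_{-\infty}^{\infty}c_ne^{in\pi x/l}, \qquad c_n = \frac{1}{2l}\int_{-l}^{l}f(x)e^{-in\pi x/l}\,dx, \tag{8.31}$$

or equivalently

$$f(x) = \frac{1}{2}a_0 + \sum_1^{\infty}\left(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\right),$$

where

$$a_n = \frac{1}{l}\int_{-l}^{l}f(x)\cos\frac{n\pi x}{l}\,dx, \qquad b_n = \frac{1}{l}\int_{-l}^{l}f(x)\sin\frac{n\pi x}{l}\,dx.$$

It follows that the Fourier cosine and sine series of a piecewise smooth function $f$ on the interval $[0, l]$ are

$$\frac{1}{2}a_0 + \sum_1^{\infty}a_n\cos\frac{n\pi x}{l}, \qquad a_n = \frac{2}{l}\int_0^{l}f(x)\cos\frac{n\pi x}{l}\,dx, \tag{8.32}$$

and

$$\sum_1^{\infty}b_n\sin\frac{n\pi x}{l}, \qquad b_n = \frac{2}{l}\int_0^{l}f(x)\sin\frac{n\pi x}{l}\,dx. \tag{8.33}$$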
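The coefficient formulas (8.32) and (8.33) translate directly into code. This minimal sketch uses a midpoint rule (an illustrative choice; any accurate quadrature would do) and tests it on $f(x) = x$ on $[0, l]$, for which rescaling Example 1 gives $a_0 = l$, $a_n = 2l[(-1)^n - 1]/(\pi^2n^2)$, and $b_n = 2l(-1)^{n+1}/(n\pi)$. The name `ell` stands for the text's $l$.

```python
import math


def cosine_coefficients(f, ell, N, samples=4000):
    """a_0, ..., a_N from (8.32): a_n = (2/l) ∫_0^l f(x) cos(nπx/l) dx (midpoint rule)."""
    h = ell / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    return [2 / ell * h * sum(f(x) * math.cos(n * math.pi * x / ell) for x in xs)
            for n in range(N + 1)]


def sine_coefficients(f, ell, N, samples=4000):
    """b_1, ..., b_N from (8.33): b_n = (2/l) ∫_0^l f(x) sin(nπx/l) dx (midpoint rule)."""
    h = ell / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    return [2 / ell * h * sum(f(x) * math.sin(n * math.pi * x / ell) for x in xs)
            for n in range(1, N + 1)]


ell = 3.0
a = cosine_coefficients(lambda x: x, ell, 5)
b = sine_coefficients(lambda x: x, ell, 5)
print(a)   # should approximate a_0 = l and a_n = 2l((-1)^n - 1)/(π² n²)
print(b)   # should approximate b_n = 2l(-1)^{n+1}/(nπ)
```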
We conclude with a few remarks comparing Taylor series and Fourier series,
EXERCISES
1. Find the Fourier cosine series and the Fourier sine series of the following functions on the interval $[0, \pi]$. All of these series can be derived from the results of the examples and exercises in §8.1 without computing the coefficients from scratch.
a. $f(\theta) = 1$.
b. $f(\theta) = \sin\theta$.
c. $f(\theta) = \theta^2$. (For the sine series, use Example 1 and Exercise 6 in §8.1.)
d. $f(\theta) = \theta$ for $0 \le \theta \le \frac{1}{2}\pi$, $f(\theta) = \pi - \theta$ for $\frac{1}{2}\pi \le \theta \le \pi$.
2. Expand the given function in a series of the given type. As in Exercise 1, use previously derived results as much as possible.
a. $f(x) = 1$; sine series on $[0, l]$.
b. $f(x) = 1$ for $0 < x < 2$, $f(x) = -1$ for $2 < x < 4$; cosine series on $[0, 4]$.
c. $f(x) = lx - x^2$; sine series on $[0, l]$.
d. $f(x) = e^x$; series of the form $\sum_{-\infty}^{\infty}c_ne^{2\pi inx}$ on $[0, 1]$.
3. Suppose $f$ is a piecewise continuous function on $[0, 2l]$ that satisfies $f(x) = f(2l - x)$ (that is, the graph of $f$ is symmetric about the line $x = l$). Let $a_n$ and $b_n$ be the Fourier cosine and sine coefficients of $f$ (given by (8.32) and (8.33) with $l$ replaced by $2l$). Show that $a_n = 0$ for $n$ odd and $b_n = 0$ for $n$ even.
Fourier series were originally invented in order to solve some boundary value problems of mathematical physics. In this section we study a few basic examples.

Heat Flow in an Insulated Rod. Consider a rod occupying the interval $[0, l]$, insulated so that no heat can enter or leave it, and let $f(x)$ be the temperature at position $x$ and time $t = 0$. How does the temperature distribution evolve with time? (Note: Instead of thinking of a thin rod, one can think of a thick cylindrical slab occupying the region where $0 \le x \le l$ and $(y, z) \in R$, where $R$ is a bounded region in the $yz$-plane, as in Figure 8.6. The model of heat flow described here is valid under the hypothesis that the temperature depends only on $x$.)

Figure 8.6: The cylindrical slab $\{(x, y, z) : 0 \le x \le l,\ (y, z) \in R\}$.
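Let $u(x, t)$ denote the temperature at position $x$ and time $t$; thus $u$ satisfies the initial condition $u(x, 0) = f(x)$. As we showed in §5.6, $u$ obeys the heat equation $\partial_tu = k\partial_x^2u$, where $k$ is a positive constant (equal to $K/a$ in (5.42)). Since the rate of heat flow across the point $x$ is proportional to $-\partial_xu(x, t)$ (Newton's law of cooling), the fact that no heat enters or leaves the ends of the rod means that $u$ satisfies the boundary conditions $\partial_xu(0, t) = \partial_xu(l, t) = 0$. In summary,

$$\partial_tu = k\partial_x^2u, \qquad u(x, 0) = f(x), \qquad \partial_xu(0, t) = \partial_xu(l, t) = 0. \tag{8.34}$$

To solve this problem, we look for solutions of the heat equation of the form $u(x, t) = \varphi(x)\psi(t)$. Substituting this into the heat equation gives

$$\varphi(x)\psi'(t) = k\varphi''(x)\psi(t), \qquad\text{or}\qquad \frac{\psi'(t)}{k\psi(t)} = \frac{\varphi''(x)}{\varphi(x)}.$$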
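In this last equation, the quantities on the left and right depend only on $t$ and $x$, respectively, so they must both be equal to a constant that we call $-\alpha$. Thus,

$$\psi'(t) = -\alpha k\psi(t), \qquad \varphi''(x) = -\alpha\varphi(x).$$

These are simple ordinary differential equations, and the general solutions are readily found:

$$\psi(t) = C_0e^{-\alpha kt}, \qquad \varphi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x.$$

We have thus found a large family of solutions of the heat equation of the form $\varphi(x)\psi(t)$. For these solutions, the boundary conditions $\partial_xu(0, t) = \partial_xu(l, t) = 0$ become $\varphi'(0) = \varphi'(l) = 0$. Now

$$\varphi'(x) = \sqrt{\alpha}\left(-C_1\sin\sqrt{\alpha}\,x + C_2\cos\sqrt{\alpha}\,x\right),$$

so the condition $\varphi'(0) = 0$ forces $C_2 = 0$, and the condition $\varphi'(l) = 0$ then forces $\sqrt{\alpha}$ to be a multiple of $\pi/l$, or $\alpha = n^2\pi^2/l^2$ where $n$ is an integer (which might as well be nonnegative). In short, we have obtained the following family of solutions of the heat equation together with the boundary conditions:

$$u_n(x, t) = \exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\cos\frac{n\pi x}{l} \qquad (n = 0, 1, 2, \ldots).$$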
Since the heat equation and the boundary conditions are linear, we obtain more general solutions by taking linear combinations of these. In fact, we can pass to infinite linear combinations, that is, infinite series of the form

$$u(x, t) = \sum_{n=0}^{\infty}a_n\exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\cos\frac{n\pi x}{l}. \tag{8.35}$$
Finally, we are ready to tackle the initial condition $u(x, 0) = f(x)$. If we set $t = 0$ in (8.35), we obtain

$$u(x, 0) = \sum_{n=0}^{\infty}a_n\cos\frac{n\pi x}{l},$$

so we can make $u(x, 0)$ equal to $f(x)$ by taking the series on the right to be the Fourier cosine series of $f$, defined by (8.32)! (Note that the constant term, which we called $\frac{1}{2}a_0$ before, is called $a_0$ here.) In other words, to solve the problem (8.34), we take $u(x, t)$ to be defined by (8.35), where the coefficients $a_n$ are given in terms of the initial data $f$ by

$$a_0 = \frac{1}{l}\int_0^{l}f(x)\,dx, \qquad a_n = \frac{2}{l}\int_0^{l}f(x)\cos\frac{n\pi x}{l}\,dx \quad (n \ge 1).$$
At this point we should stop to verify that the proposed solution (8.35) of the problem (8.34) really works, as the passage from finite linear combinations to infinite series has the potential to cause difficulties. In fact, everything turns out quite nicely for this problem. In the first place, if the initial temperature distribution $f(x)$ is continuous and piecewise smooth (a reasonable physical assumption), the same will be true of its even $2l$-periodic extension, so by Theorem 8.29, its Fourier series is absolutely and uniformly convergent. In particular, $\sum_0^{\infty}|a_n| < \infty$. The absolute value of the $n$th term of the series in (8.35) is at most $|a_n|$, so the Weierstrass M-test shows that this series converges absolutely and uniformly for $0 \le x \le l$ and $t \ge 0$ to define a continuous function $u(x, t)$ there. Moreover, for $t > 0$, the exponential factors in (8.35) decay rapidly as $n \to \infty$, which makes the convergence even better. In particular, repeated differentiation with respect to $t$ or $x$ introduces factors of $n^k$ into the series, which are still overpowered by the decay of the exponential factors, so the differentiated series still converges absolutely and uniformly. It follows that $u(x, t)$ is of class $C^{\infty}$ for $t > 0$ and that termwise differentiation is permissible; $u$ therefore satisfies the heat equation and the boundary conditions because each term of the series does.
Two further remarks: First, as $t \to \infty$, the exponential factors in (8.35) all tend rapidly to zero except for the one with $n = 0$, and so $u(x, t)$ approaches the constant $a_0$, the mean value of $f$ on the interval $[0, l]$. In physical terms this means that the rod approaches thermal equilibrium as time progresses. Second, the series (8.35) will usually diverge when $t < 0$, for then the exponential factors grow rather than decay! This corresponds to the physical fact that time is irreversible for diffusion processes governed by the heat equation.
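The recipe (8.34)-(8.35) can be sketched in a few lines: compute the cosine coefficients of the initial temperature, then damp the $n$th one by $\exp(-n^2\pi^2kt/l^2)$. In this illustration the initial data $f(x) = x$, the values $l = k = 1$, and the truncation at $N = 200$ are arbitrary assumptions; the printed profiles flatten toward the mean value $a_0 = \frac{1}{2}$, as described above.

```python
import math


def heat_coefficients(f, ell, N, samples=4000):
    """a_0 = (1/l)∫_0^l f and a_n = (2/l)∫_0^l f(x) cos(nπx/l) dx, by the midpoint rule."""
    h = ell / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    coeffs = []
    for n in range(N + 1):
        integral = h * sum(f(x) * math.cos(n * math.pi * x / ell) for x in xs)
        coeffs.append(integral / ell if n == 0 else 2 * integral / ell)
    return coeffs


def heat_solution(x, t, coeffs, ell, k):
    """Truncated (8.35): sum of a_n exp(-n²π²kt/l²) cos(nπx/l) over the given a_n."""
    return sum(a * math.exp(-n * n * math.pi ** 2 * k * t / ell ** 2) * math.cos(n * math.pi * x / ell)
               for n, a in enumerate(coeffs))


ell, k = 1.0, 1.0
coeffs = heat_coefficients(lambda x: x, ell, 200)   # initial temperature f(x) = x
for t in (0.0, 0.01, 0.1, 1.0):
    print(t, [round(heat_solution(x, t, coeffs, ell, k), 4) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])
```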
The Vibrating String. We now study the vibrations of a string stretched across the interval $0 \le x \le l$ and fixed at the endpoints. (Think of a guitar string, and see Figure 8.7.) Here $u(x, t)$ will denote the displacement of the string (in a direction perpendicular to the $x$-axis) at position $x$ and time $t$. The relevant differential equation is the wave equation $\partial_t^2u = c^2\partial_x^2u$, where $c$ is a positive constant that can be interpreted as the speed with which disturbances propagate down the string. (See Folland [6, pp. 388-90] or Kammler [10, pp. 526-7] for a derivation of the wave equation from physical principles.) Since the string is fixed at both ends, the boundary conditions for this problem are $u(0, t) = u(l, t) = 0$. As for initial conditions, since the wave equation is second-order in $t$ we need to specify both the initial displacement $u(x, 0)$ and the initial velocity $\partial_tu(x, 0)$. Thus the problem we have to solve is

$$\partial_t^2u = c^2\partial_x^2u, \qquad u(x, 0) = f(x), \qquad \partial_tu(x, 0) = g(x), \qquad u(0, t) = u(l, t) = 0. \tag{8.36}$$

Again we look for solutions of the form $u(x, t) = \varphi(x)\psi(t)$. The wave equation becomes

$$\varphi(x)\psi''(t) = c^2\varphi''(x)\psi(t), \qquad\text{or}\qquad \frac{\psi''(t)}{c^2\psi(t)} = \frac{\varphi''(x)}{\varphi(x)}.$$
In the last equation, the quantities on the left and right depend only on $t$ and $x$, respectively, so they are both equal to a constant $-\alpha$, and we obtain the ordinary differential equations

$$\psi''(t) = -\alpha c^2\psi(t), \qquad \varphi''(x) = -\alpha\varphi(x).$$

The general solution of the second equation is $\varphi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x$. The boundary condition $u(0, t) = 0$ forces $C_1$ to vanish, and then the boundary condition $u(l, t) = 0$ forces $\sqrt{\alpha}$ to be a multiple of $\pi/l$, so $\alpha = n^2\pi^2/l^2$ for some (positive) integer $n$. With this value of $\alpha$, the general solution of the differential equation for $\psi$ is

$$\psi(t) = b\cos\frac{n\pi ct}{l} + B\sin\frac{n\pi ct}{l}.$$

(The arbitrary constants are labeled $b$ and $B$ for reasons that will become clearer in a moment.)
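For each positive integer $n$, we therefore have the solution

$$u_n(x, t) = \left(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\right)\sin\frac{n\pi x}{l}.$$

Taking linear combinations and passing to limits, we are led to the series solution

$$u(x, t) = \sum_{n=1}^{\infty}\left(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\right)\sin\frac{n\pi x}{l}. \tag{8.37}$$

Setting $t = 0$, we get

$$u(x, 0) = \sum_{n=1}^{\infty}b_n\sin\frac{n\pi x}{l},$$

so we satisfy the condition $u(x, 0) = f(x)$ by taking the $b_n$'s to be the Fourier sine coefficients of $f$:

$$b_n = \frac{2}{l}\int_0^{l}f(x)\sin\frac{n\pi x}{l}\,dx.$$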
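Similarly, differentiating (8.37) termwise with respect to $t$ and setting $t = 0$ gives

$$\partial_tu(x, 0) = \sum_{n=1}^{\infty}\frac{n\pi c}{l}B_n\sin\frac{n\pi x}{l},$$

so we satisfy the condition $\partial_tu(x, 0) = g(x)$ by taking

$$B_n = \frac{2}{n\pi c}\int_0^{l}g(x)\sin\frac{n\pi x}{l}\,dx.$$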
to converge. The extra factor of $n^2$ makes the terms larger, and there is no exponential decay anywhere to compensate. If we recall that the decay of Fourier coefficients is related to the degree of smoothness of the function in question, the contrast with the heat equation may be expressed as follows: The diffusion of heat tends to smooth out irregularities in the initial temperature distribution, but in wave motion, any initial roughness simply propagates without dying out.
We can obtain a positive result by imposing more differentiability hypotheses on $f$ and $g$. If we assume that not only $f$ and $g$ but also the first two derivatives of $f$ and the first derivative of $g$ are continuous and piecewise smooth, and that not only $f$ and $g$ but also $f''$ vanishes at the endpoints (so that its odd periodic extension is continuous there), then Theorems 8.26 and 8.29 imply that $\sum n^2|b_n| < \infty$ and $\sum n^2|B_n| < \infty$, which guarantees the absolute and uniform convergence of (8.38). This is also enough to guarantee that the formal differentiation of (8.37) that led to the formula for the $B_n$'s is valid.
However, these additional assumptions are rather unnatural from a physical point of view. The obvious model for a plucked string, for example, is to take $f$ to be a piecewise linear function as in Figure 8.8. It is easy to calculate the coefficients $b_n$ explicitly for such an $f$ (Exercise 4), and they turn out to decay exactly like $n^{-2}$. The series (8.37) therefore converges nicely, and we may expect it to provide a good description of the physical vibration of the string. On the other hand, the twice-differentiated series (8.38) does not converge at all, so it is hard to say in what sense (8.37) satisfies the wave equation. The resolution of this paradox is to expand our vision of what a solution of a differential equation ought to be and to develop a notion of "weak solution" that will encompass examples such as this one. But this is a more advanced topic; see, for example, Folland [6, §9.5].
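The $n^{-2}$ decay of the plucked-string coefficients is easy to observe numerically. This sketch computes $b_n$ by a midpoint rule for a string of length $l = 1$ plucked at its midpoint to height $m = 0.01$ (illustrative values, not taken from the text) and prints $n^2b_n$, which stays bounded.

```python
import math


def plucked(x, ell, b, m):
    """Initial displacement: linear up to height m at x = b, then linear back to 0 at x = l."""
    return m * x / b if x <= b else m * (ell - x) / (ell - b)


def sine_coefficient(f, ell, n, samples=4000):
    """b_n = (2/l) ∫_0^l f(x) sin(nπx/l) dx, approximated by the midpoint rule."""
    h = ell / samples
    return 2 / ell * h * sum(f((j + 0.5) * h) * math.sin(n * math.pi * (j + 0.5) * h / ell)
                             for j in range(samples))


ell, m = 1.0, 0.01
coeffs = [sine_coefficient(lambda x: plucked(x, ell, ell / 2, m), ell, n) for n in range(1, 16)]
for n, bn in enumerate(coeffs, start=1):
    # For a midpoint pluck, n² b_n alternates between 0 and ±8m/π², so b_n decays like n^{-2}.
    print(n, bn, n * n * bn)
```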
Taking for granted that the series (8.37) really is the solution of the boundary value problem (8.36), we say a few words about its physical interpretation. Think of the string as being a producer of musical notes, such as a guitar string. The $n$th term in the series (8.37), as a function of $t$, is a pure sine wave with frequency $n\pi c/l$, which represents a musical tone at a pure, definite pitch. The series (8.37) therefore shows how the sound produced by the string can be resolved into a superposition of these pure pitches. Typically, the coefficients $b_n$ and $B_n$ decrease as $n$ increases, so that the largest contribution comes from the first term, $n = 1$. This is the "fundamental" pitch, and the higher $n$'s are the "overtones" that give the note its particular tone quality.
Related Problems. The heat flow and vibration problems (8.34) and (8.36) can be modified by changing the boundary conditions; this leads to models of other interesting physical processes. Here are a few examples:

1. The problem

$$\partial_tu = k\partial_x^2u, \qquad u(x, 0) = f(x), \qquad u(0, t) = u(l, t) = 0$$

models the flow of heat in a rod that occupies the interval $0 \le x \le l$ when both ends are held at temperature zero (by immersing them in ice water, for instance). (Note that the heat equation doesn't care where the zero point of the temperature scale is located; if $u$ is a solution, so is $u + c$ for any constant $c$. Of course, this means that the validity of the heat equation as a model for actual thermodynamic processes has its limitations, as absolute zero exists physically.) The method of solution is exactly the same as for the insulated problem (8.34), except that the
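boundary conditions for $\varphi(x)$ are $\varphi(0) = \varphi(l) = 0$. Thus, as in the vibrating string problem, we obtain $\varphi(x) = \sin(n\pi x/l)$, and the solution is given by

$$u(x, t) = \sum_{n=1}^{\infty}b_n\exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\sin\frac{n\pi x}{l},$$

where the $b_n$ are the Fourier sine coefficients of $f$.

2. The problem

$$\partial_t^2u = c^2\partial_x^2u, \qquad u(x, 0) = f(x), \qquad \partial_tu(x, 0) = g(x), \qquad \partial_xu(0, t) = \partial_xu(l, t) = 0$$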
models the vibration of air in a cylindrical pipe occupying the interval $0 \le x \le l$ that is open at both ends. (Examples: flutes and some organ pipes.) Here $u(x, t)$ represents the longitudinal displacement of the air at position $x$ and time $t$. The boundary conditions $\partial_xu(0, t) = \partial_xu(l, t) = 0$ come from the fact that the change in air pressure due to the displacement $u$ is proportional to $\partial_xu$, and the air pressure at both ends must remain equal to the ambient air pressure. Again, the solution is very similar to (8.37) except that it involves cosines instead of sines in $x$:

$$u(x, t) = \frac{1}{2}(a_0 + A_0t) + \sum_{n=1}^{\infty}\left(a_n\cos\frac{n\pi ct}{l} + A_n\sin\frac{n\pi ct}{l}\right)\cos\frac{n\pi x}{l},$$

where $\frac{1}{2}a_0 + \sum_1^{\infty}a_n\cos(n\pi x/l)$ and $\frac{1}{2}A_0 + \sum_1^{\infty}(n\pi c/l)A_n\cos(n\pi x/l)$ are the Fourier cosine series of $f$ and $g$, respectively. (The term $\frac{1}{2}(a_0 + A_0t)$ represents a flow of air down the tube with constant velocity, of no importance for the vibrations.) As with the vibrating string, the vibrations of the pipe are a superposition of vibrations at the definite frequencies $n\pi c/l$ $(n = 1, 2, 3, \ldots)$.
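3. We can also mix the two types of boundary conditions we have been considering: for the heat equation,

$$\partial_tu = k\partial_x^2u, \qquad u(x, 0) = f(x), \qquad u(0, t) = \partial_xu(l, t) = 0,$$

and for the wave equation,

$$\partial_t^2u = c^2\partial_x^2u, \qquad u(x, 0) = f(x), \qquad \partial_tu(x, 0) = g(x), \qquad u(0, t) = \partial_xu(l, t) = 0.$$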
The first of these models heat flow in a rod where one end is held at temperature zero and the other is insulated; the second models vibrations of air in cylindrical pipes where one end is closed and the other is open, such as clarinets and some organ pipes. In both of them, separation of variables leads to the ordinary differential equation $\varphi''(x) = -\alpha\varphi(x)$ with boundary conditions $\varphi(0) = \varphi'(l) = 0$. The general solution of the differential equation is $\varphi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x$; the condition $\varphi(0) = 0$ forces $C_1$ to vanish, and then the condition $\varphi'(l) = 0$ forces $\sqrt{\alpha}$ to be of the form $(n - \frac{1}{2})\pi/l$ with $n$ a positive integer. We are therefore led to try to expand the initial functions in a series of the form

$$f(x) = \sum_{n=1}^{\infty}a_n\sin\frac{(n - \frac{1}{2})\pi x}{l}.$$

This can indeed be done; the technique for reducing this problem to one of ordinary Fourier sine series is outlined in Exercise 4 of §8.4.
It is interesting to note that the resulting frequencies for the vibrating pipe are $(n - \frac{1}{2})\pi c/l$ $(n = 1, 2, 3, \ldots)$. In particular, the fundamental frequency for a pipe closed at one end and open at the other, namely $\frac{1}{2}\pi c/l$, is half as great as for a pipe of equal length that is open at both ends. Moreover, only the odd-numbered multiples of this fundamental frequency occur as "harmonics" for half-open pipes, whereas all integer multiples occur for open pipes; as a result, the two kinds of pipes produce notes of different tone qualities.
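The frequency bookkeeping for open and half-open pipes is simple arithmetic; this sketch lists the quantities $n\pi c/l$ and $(n - \frac{1}{2})\pi c/l$ and their ratios to the respective fundamentals. The numerical values of $c$ and $l$ are illustrative assumptions only and do not affect the ratios.

```python
import math


def open_pipe_frequencies(c, ell, count):
    """nπc/l for n = 1, ..., count (pipe open at both ends)."""
    return [n * math.pi * c / ell for n in range(1, count + 1)]


def half_open_pipe_frequencies(c, ell, count):
    """(n - 1/2)πc/l for n = 1, ..., count (pipe closed at one end, open at the other)."""
    return [(n - 0.5) * math.pi * c / ell for n in range(1, count + 1)]


c, ell = 343.0, 0.6   # illustrative values only: speed of sound (m/s) and pipe length (m)
open_f = open_pipe_frequencies(c, ell, 5)
half_f = half_open_pipe_frequencies(c, ell, 5)
print([round(w / half_f[0], 6) for w in half_f])   # odd multiples of the fundamental
print([round(w / open_f[0], 6) for w in open_f])   # all integer multiples
print(half_f[0] / open_f[0])                        # ratio of the two fundamentals
```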
4. Clearly there are many other variations to be played on this theme: different boundary conditions, other differential equations, and so on. A few further examples are outlined in the exercises, and we shall indicate a more general framework in which such problems can be studied in the next section.
EXERCISES
1. A rod 100 cm long is insulated along its length and at both ends. Suppose that its initial temperature is $u(x, 0) = x$ ($x$ in cm, $u$ in °C, $t$ in sec, $0 \le x \le 100$), and that its diffusivity coefficient $k$ is 1.1 cm²/sec (about right if the rod is made of copper).
a. Find the temperature $u(x, t)$ for $t > 0$. (For the relevant Fourier series, see Example 1 of §8.4.)
b. Show that the first three nonvanishing terms of the series (including the constant term) give the temperature accurately to within 1° when $t = 60$ (one minute after starting). What are $u(0, 60)$, $u(10, 60)$, and $u(40, 60)$ to the nearest 1°? (Hint: $\sum_1^{\infty}(2n-1)^{-2} = \pi^2/8$, so $\sum_3^{\infty}(2n-1)^{-2} = (\pi^2/8) - 1 - \frac{1}{9} \approx 0.123$.)
c. Show that u(x, t) is within 1° of its equilibrium value of 50° for all x when
t > 3600 (i.e., after one hour). (Don't work too hard; crude estimates are
enough.)
2. Find the temperature function $u(\theta, t)$ $(t > 0)$ for a rod bent into the shape of a circular hoop, given the initial temperature $u(\theta, 0) = f(\theta)$. (Here $\theta$ denotes the angular coordinate on the circle, and the boundary conditions for a straight rod are replaced by the requirement that $u$ should be a $2\pi$-periodic function of $\theta$.)
4. Consider a vibrating string occupying the interval $[0, l]$. Suppose the string is
plucked at $x = b$ $(0 < b < l)$ so that its initial displacement $u(x, 0)$ is $mx/b$
for $0 < x < b$ and $m(l - x)/(l - b)$ for $b < x < l$ (that is, $u(x, 0)$ is linear on
$[0, b]$ and on $[b, l]$, and equal to $m$ at $x = b$), and its initial velocity $\partial_t u(x, 0)$
is zero. (Note: For this to be a realistic model of a plucked string, we should
have $l \gg m$.)
a. Find the Fourier series for $u(x, t)$ for $t > 0$. (The result of Exercise 2 of
§8.3 can be used.)
b. Compute the coefficients $b_1, \ldots, b_5$ of the first five terms (notation as in
(8.37)) numerically when $b = (0.4)l$ and when $b = (0.1)l$. Observe that
the higher frequencies contribute a lot more to $u(x, t)$ when $b = (0.1)l$
than when $b = (0.4)l$. (Musically: Plucking a string nearer the end gives a
note with more "harmonics.")
8.5. Applications to Differential Equations 391
5. The model for a vibrating string given by the wave equation is unrealistic because
it predicts that the vibration will continue forever without dying out. Real
strings, however, are not perfectly elastic, so the vibrational energy is gradually
dissipated. A better model is obtained by the following modification of the
wave equation:
$$\partial_t^2 u = c^2\,\partial_x^2 u - 2\delta\,\partial_t u,$$
where $\delta$ is a small positive constant. (The left side is the acceleration, and the
terms on the right are the effects of the elastic restoring force and the damping
force that tends to slow the motion down. The factor of 2 is just for convenience.)
Find the general solution of this differential equation subject to the
boundary conditions $u(0, t) = u(l, t) = 0$ by modifying the method used in
the text for the ordinary wave equation. Assume that $\delta < \pi c/l$. You should find
that the solutions decay exponentially in time and that the frequencies decrease
as the damping constant $\delta$ increases.
Exercises 6 and 7 concern the Dirichlet problem for a bounded open set $S \subset
\mathbb{R}^2$: Given a function f on the boundary $\partial S$, find a solution of Laplace's equation
$\partial_x^2 u + \partial_y^2 u = 0$ on S such that $u = f$ on $\partial S$. (A physical interpretation: Find
the steady-state distribution of heat in S when the temperature on the boundary is
given.)
In this section we shall re-examine the notion of Fourier series in the light of a
profound analogy with certain ideas from vector algebra. We begin with a quick
review of the latter.
When expressed in algebraic terms, the concepts of Euclidean geometry in n
dimensions are based on the vector-space structure of $\mathbb{R}^n$ (that is, the operations
of vector addition and scalar multiplication), together with the dot product or inner
product $\mathbf{a}\cdot\mathbf{b}$, in terms of which we can define lengths ($|\mathbf{a}| = (\mathbf{a}\cdot\mathbf{a})^{1/2}$) and angles
(the angle from $\mathbf{a}$ to $\mathbf{b}$ is $\arccos(\mathbf{a}\cdot\mathbf{b}/|\mathbf{a}|\,|\mathbf{b}|)$). The "natural" coordinate systems
for this geometry are the ones arising from an orthonormal basis for $\mathbb{R}^n$, that is, a
basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$ such that $\mathbf{u}_j\cdot\mathbf{u}_k$ equals 0 for $j \ne k$ and 1 for $j = k$. The formula
for expressing an arbitrary vector $\mathbf{x}$ in terms of such a basis is given very simply in
terms of inner products:
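$$\mathbf{x} = \sum_{j=1}^{n} c_j\mathbf{u}_j, \qquad c_j = \mathbf{x}\cdot\mathbf{u}_j.$$
(The formula for $c_j$ results from taking the inner product of both sides of the equation
$\mathbf{x} = \sum_1^n c_k\mathbf{u}_k$ with $\mathbf{u}_j$ to yield $\mathbf{x}\cdot\mathbf{u}_j = \sum_1^n c_k\,\mathbf{u}_k\cdot\mathbf{u}_j = c_j$.)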
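A minimal numerical sketch of this expansion, assuming a hypothetical orthonormal basis of $\mathbb{R}^2$ (the standard basis rotated by 30°): the coordinates are just dot products $c_j = \mathbf{x}\cdot\mathbf{u}_j$, and recombining $\sum c_j\mathbf{u}_j$ gives back $\mathbf{x}$.

```python
import math


def dot(a, b):
    """Euclidean dot product a . b."""
    return sum(x * y for x, y in zip(a, b))


def coefficients(x, basis):
    """c_j = x . u_j, valid when u_1, ..., u_n is an orthonormal basis."""
    return [dot(x, u) for u in basis]


def combine(coeffs, basis):
    """Rebuild sum_j c_j u_j componentwise."""
    n = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(n)]


# Hypothetical orthonormal basis of R^2: the standard basis rotated by 30 degrees.
t = math.pi / 6
basis = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]
x = (2.0, -1.0)

c = coefficients(x, basis)
print(c)                   # coordinates of x in the rotated basis
print(combine(c, basis))   # recovers x up to rounding error
```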
8.6. The Infinite-Dimensional Geometry of Fourier Series 393
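Similar ideas underlie the study of complex n-dimensional vectors. The main
difference is that, since the absolute value $|z|$ of a complex number z is given by
$(z\bar{z})^{1/2}$ rather than $(z^2)^{1/2}$, the appropriate definition of inner product is
$$\langle\mathbf{a},\mathbf{b}\rangle = \sum_{j=1}^{n} a_j\overline{b_j}, \qquad \|\mathbf{a}\| = \langle\mathbf{a},\mathbf{a}\rangle^{1/2}. \tag{8.39}$$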
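If the basis $\{\mathbf{u}_j\}$ is orthogonal ($\langle\mathbf{u}_j,\mathbf{u}_k\rangle = 0$ for $j \ne k$) but not normalized ($\|\mathbf{u}_j\|$
not necessarily equal to 1), the formula becomes
$$\mathbf{x} = \sum_{j=1}^{n} c_j\mathbf{u}_j, \qquad c_j = \frac{\langle\mathbf{x},\mathbf{u}_j\rangle}{\|\mathbf{u}_j\|^2}. \tag{8.40}$$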
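To see the complex, orthogonal-but-not-normalized case in action, the sketch below uses a hypothetical example, the discrete Fourier vectors $\mathbf{u}_j = (e^{2\pi i jk/N})_{k=0}^{N-1}$ in $\mathbb{C}^N$, which are orthogonal with $\|\mathbf{u}_j\|^2 = N$. It computes $c_j = \langle\mathbf{x},\mathbf{u}_j\rangle/\|\mathbf{u}_j\|^2$ with the conjugate-linear inner product and checks that $\sum c_j\mathbf{u}_j$ reproduces $\mathbf{x}$.

```python
import cmath


def inner(a, b):
    """Complex inner product <a, b> = sum_j a_j * conj(b_j)."""
    return sum(p * complex(q).conjugate() for p, q in zip(a, b))


def dft_basis(N):
    """Orthogonal, unnormalized basis u_j = (e^{2 pi i j k / N})_{k=0..N-1} of C^N."""
    return [[cmath.exp(2j * cmath.pi * j * k / N) for k in range(N)] for j in range(N)]


def expand(x, basis):
    """Coefficients c_j = <x, u_j> / ||u_j||^2 for an orthogonal basis."""
    return [inner(x, u) / inner(u, u) for u in basis]


N = 4
basis = dft_basis(N)
x = [1 + 2j, -1, 0.5j, 3]   # arbitrary sample vector in C^4
c = expand(x, basis)
rebuilt = [sum(cj * u[k] for cj, u in zip(c, basis)) for k in range(N)]

print([round(inner(u, u).real, 12) for u in basis])   # each ||u_j||^2 equals N
print(max(abs(p - q) for p, q in zip(rebuilt, x)))    # tiny: x is recovered
```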
Now we are ready to make the conceptual leap from the discrete and finite-dimensional
to the continuous and infinite-dimensional. Suppose we are studying
functions on an interval $[a, b]$, let us say piecewise continuous, complex-valued
ones. We regard such a function f as a "vector" whose "components" are the
values $f(x)$ as x ranges over $[a, b]$. We define the inner product of two functions
f and g just as in (8.39) except that the sum is replaced by an integral:
$$\langle f, g\rangle = \int_a^b f(x)\overline{g(x)}\,dx, \qquad \|f\| = \langle f, f\rangle^{1/2} = \left[\int_a^b |f(x)|^2\,dx\right]^{1/2}.$$
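For example, on $[-\pi, \pi]$ let $e_n(x) = e^{inx}$ $(n = 0, \pm1, \pm2, \ldots)$. Then
$\langle e_m, e_n\rangle = \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx$, which equals $2\pi$ if $m = n$ and 0 otherwise.
Thus $\{e_n\}_{-\infty}^{\infty}$ is an orthogonal set; the corresponding orthonormal set is $\{\varphi_n\}_{-\infty}^{\infty}$,
where $\varphi_n = (2\pi)^{-1/2}e_n$. The formula for the Fourier series of a function f,
$$f = \sum_{-\infty}^{\infty} c_n e_n, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \frac{\langle f, e_n\rangle}{\|e_n\|^2},$$
is an exact analogue of the formula (8.40) for the expansion of a vector in terms of
an orthogonal basis!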
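A rough numerical sketch of this analogy, assuming a simple midpoint-rule quadrature in place of exact integrals: the Fourier coefficient is computed literally as $\langle f, e_n\rangle/\|e_n\|^2$. The sample function $f(x) = x$ is our choice; integrating by parts gives its exact coefficients $c_n = (-1)^n i/n$ for $n \ne 0$, which the output can be compared against.

```python
import cmath
import math


def inner_l2(f, g, a, b, m=20000):
    """Midpoint-rule approximation of <f, g> = integral_a^b f(x) conj(g(x)) dx."""
    h = (b - a) / m
    total = 0j
    for k in range(m):
        x = a + (k + 0.5) * h
        total += f(x) * complex(g(x)).conjugate()
    return total * h


def e(n):
    """The function e_n(x) = e^{inx}."""
    return lambda x: cmath.exp(1j * n * x)


def fourier_coefficient(f, n):
    """c_n = <f, e_n> / ||e_n||^2 on [-pi, pi], where ||e_n||^2 = 2 pi."""
    return inner_l2(f, e(n), -math.pi, math.pi) / (2 * math.pi)


f = lambda x: x   # sample function; exact coefficients are c_n = (-1)^n i / n for n != 0
for n in (1, 2, 3):
    print(n, fourier_coefficient(f, n), (-1) ** n * 1j / n)
```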
A similar interpretation holds for Fourier cosine and sine series. To wit, it is
easy to verify (Exercise 1) that $\{\cos(n\pi x/l)\}_{n=0}^{\infty}$ and $\{\sin(n\pi x/l)\}_{n=1}^{\infty}$ are orthogonal
sets on the interval $[0, l]$, and that the formulas for the Fourier cosine and sine
coefficients of a function f on $[0, l]$ are analogous to (8.40).
There are some unanswered questions here, however. The inner product $\langle f, g\rangle$
makes sense when f and g are piecewise continuous on $[a, b]$, but we have proved
the validity of Fourier expansions only for piecewise smooth functions. So, what is
the "right" class of functions to consider here? Can we make sense out of Fourier
series for functions that may not be piecewise smooth?
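The key insight is that pointwise convergence is the wrong notion of convergence
in this situation. Instead, we should use a notion of convergence that arises
from the geometry of the inner product. That is, we regard the norm of $f - g$ as the
distance between f and g:
$$\text{Distance from } f \text{ to } g = \|f - g\| = \left[\int_a^b |f(x) - g(x)|^2\,dx\right]^{1/2},$$
and the corresponding notion of convergence is that $f_k \to f$ in norm (or in the mean-square
sense) if $\|f_k - f\| \to 0$, that is, if $\int_a^b |f_k(x) - f(x)|^2\,dx \to 0$.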
Note. If the distance $\|f - g\|$ between two piecewise continuous functions is
zero, it does not follow that f is identically equal to g, but only that $f(x) = g(x)$ for
all except perhaps finitely many values of x. In this setting, it is appropriate not to
worry about this technicality and to think of two functions as being the same when
they differ only at finitely many points. This issue already arose in connection
with the behavior of the Fourier series of f at points where f is discontinuous
(cf. Corollary 8.18).
Mean-square convergence is rather different from pointwise convergence, and
neither one implies the other. For example, let us take $[a, b] = [-1, 1]$. If
$$f_k(x) = \begin{cases} k & \text{if } 0 < x < 1/k, \\ 0 & \text{otherwise,} \end{cases}$$
then $f_k \to 0$ pointwise but $\|f_k\| = \left(\int_0^{1/k} k^2\,dx\right)^{1/2} = \sqrt{k} \to \infty$. On the other
hand, if
$$g_k(x) = \begin{cases} 1 & \text{if } -1/k < x < 1/k, \\ 0 & \text{otherwise,} \end{cases}$$
then $\|g_k\| = \left(\int_{-1/k}^{1/k} dx\right)^{1/2} = \sqrt{2/k} \to 0$, but $g_k(0) = 1 \not\to 0$. (By replacing
the interval $(-1/k, 1/k)$ here by an interval $I_k$ whose length tends to 0 but whose
midpoint oscillates back and forth within the interval $[-1, 1]$ as $k \to \infty$, one can
construct examples of sequences $\{g_k\}$ that converge in norm but do not converge
at any point.) However, for uniform rather than pointwise convergence there is
something positive to say.
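For step functions the norm can be computed exactly, which makes the two examples easy to check. The sketch below assumes $f_k = k$ on $(0, 1/k)$ and $g_k = 1$ on $(-1/k, 1/k)$, both zero elsewhere on $[-1, 1]$, and represents a step function as a hypothetical list of `(left, right, value)` pieces. It shows $\|f_k\| = \sqrt{k}$ growing while $f_k \to 0$ at each point, and $\|g_k\| = \sqrt{2/k}$ shrinking while $g_k(0)$ stays 1.

```python
import math


def step_norm(pieces):
    """L^2 norm of a step function given as disjoint (left, right, value) pieces,
    taken to be 0 off those pieces: sqrt of the sum of value^2 * length."""
    return math.sqrt(sum(abs(v) ** 2 * (hi - lo) for lo, hi, v in pieces))


def value_at(pieces, x):
    """Pointwise value of the step function at x."""
    return sum(v for lo, hi, v in pieces if lo < x < hi)


def f_k(k):
    """f_k = k on (0, 1/k), 0 elsewhere on [-1, 1]."""
    return [(0.0, 1.0 / k, float(k))]


def g_k(k):
    """g_k = 1 on (-1/k, 1/k), 0 elsewhere on [-1, 1]."""
    return [(-1.0 / k, 1.0 / k, 1.0)]


for k in (1, 10, 100, 10000):
    # ||f_k|| = sqrt(k) grows, ||g_k|| = sqrt(2/k) shrinks, yet g_k(0) stays 1
    print(k, step_norm(f_k(k)), step_norm(g_k(k)), value_at(g_k(k), 0.0))
```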
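8.42 Proposition. If $f_k \to f$ uniformly on $[a, b]$, then $f_k \to f$ in norm on $[a, b]$.
Proof. Let $C_k = \sup_{a \le x \le b}|f_k(x) - f(x)|$; uniform convergence means that $C_k \to 0$. Hence
$$\int_a^b |f_k(x) - f(x)|^2\,dx \le (b - a)C_k^2 \to 0. \qquad \square$$
More generally, $f_k \to f$ in norm provided that $f_k \to f$ pointwise and there is a
constant C such that $|f_k(x)| \le C$ for all k and all $x \in [a, b]$; this follows from the
bounded convergence theorem (4.52).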
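A small check of the bound in Proposition 8.42, under assumptions of our own: the hypothetical sequence $f_k(x) = x + \sin(kx)/k$ converges uniformly to $f(x) = x$ on $[0, 1]$ with $C_k \le 1/k$, and the norm is approximated by a midpoint sum. Each term of that sum is at most $C_k^2 h$, so even the discrete approximation respects $\|f_k - f\| \le \sqrt{b-a}\,C_k$.

```python
import math


def l2_distance(f, g, a, b, m=10000):
    """Midpoint-rule approximation of ||f - g|| on [a, b]."""
    h = (b - a) / m
    s = 0.0
    for j in range(m):
        x = a + (j + 0.5) * h
        s += abs(f(x) - g(x)) ** 2
    return math.sqrt(s * h)


a, b = 0.0, 1.0
f = lambda x: x
for k in (1, 10, 100):
    fk = lambda x, k=k: x + math.sin(k * x) / k   # sup |fk - f| <= 1/k on [a, b]
    bound = math.sqrt(b - a) / k                  # sqrt(b - a) * C_k with C_k = 1/k
    print(k, l2_distance(fk, f, a, b), bound)
```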
The introduction of norm convergence is justified by the fact that the Fourier
series of any piecewise continuous function f on $[-\pi, \pi]$ converges in norm to f.
This is a substantial result, but there is more to be said before we state a formal
theorem.
The space $PC(a, b)$ of piecewise continuous functions on $[a, b]$ fails to be a
good infinite-dimensional analogue of Euclidean space in one crucial respect: it is
not complete. That is, if $\{f_k\}$ is a sequence in $PC(a, b)$ such that $\|f_j - f_k\| \to 0$
as $j, k \to \infty$, there may not be a function $f \in PC(a, b)$ such that $\|f_k - f\| \to 0$.
For example, with $[a, b] = [0, 1]$, let
$$f_k(x) = \begin{cases} x^{-1/4} & \text{if } x > 1/k, \\ 0 & \text{otherwise.} \end{cases}$$
It is easily verified that $\|f_j - f_k\|^2 = 2|j^{-1/2} - k^{-1/2}| \to 0$ as $j, k \to \infty$. However,
the function to which the $f_k$'s are converging is clearly $f(x) = x^{-1/4}$ $(x > 0)$,
which does not belong to $PC(0, 1)$ because it blows up at 0. Thus, to fill in the
"holes" in $PC(a, b)$ one will have to deal with unbounded functions and improper
integrals. But even this is not enough; with more cleverness one can construct examples
where the limiting function f is not (Riemann) integrable on any subinterval
of $[a, b]$.
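The claimed formula for $\|f_j - f_k\|^2$ can be checked numerically. The sketch assumes $f_k(x) = x^{-1/4}$ for $x > 1/k$ and $0$ otherwise on $[0, 1]$. Then $f_j - f_k$ is $\pm x^{-1/4}$ between $1/\max(j,k)$ and $1/\min(j,k)$ and zero elsewhere, so its squared norm is $\int x^{-1/2}\,dx$ over that interval. The midpoint rule is our choice of quadrature.

```python
import math


def dist_sq_exact(j, k):
    """||f_j - f_k||^2 = 2 |j^(-1/2) - k^(-1/2)| for f_k = x^(-1/4) on (1/k, 1], 0 elsewhere."""
    return 2 * abs(j ** -0.5 - k ** -0.5)


def dist_sq_numeric(j, k, m=100000):
    """Midpoint rule for the integral of x^(-1/2) between 1/max(j, k) and 1/min(j, k)."""
    lo, hi = 1.0 / max(j, k), 1.0 / min(j, k)
    h = (hi - lo) / m
    return sum((lo + (i + 0.5) * h) ** -0.5 for i in range(m)) * h


for j, k in [(10, 20), (100, 200), (1000, 2000), (10**6, 2 * 10**6)]:
    print(j, k, dist_sq_exact(j, k))   # tends to 0: the sequence is Cauchy in norm

# ...but the pointwise limit x^(-1/4) is unbounded near 0, so it is not in PC(0, 1).
print([x ** -0.25 for x in (1e-4, 1e-8, 1e-12)])
```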
What is needed here is the Lebesgue integral, which handles integrals of unbounded
and discontinuous functions more capably (see §4.8). The appropriate
"completion" of the space $PC(a, b)$ is the space $L^2(a, b)$ of square-integrable functions,
$$L^2(a, b) = \left\{ f : \int_a^b |f(x)|^2\,dx < \infty \right\},$$
where the integral is a Lebesgue integral. (The name "$L^2$" is pronounced "L-two";
the L is in honor of Lebesgue and the 2 refers to the exponent in $|f(x)|^2$.)
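We can now state the general convergence theorem for Fourier series.
8.43 Theorem. Let $f \in L^2(-\pi, \pi)$, and let
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta.$$
a. The Fourier series $\sum c_n e^{in\theta}$ converges to f in norm; that is,
$$\lim_{N\to\infty}\int_{-\pi}^{\pi}\left|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\right|^2 d\theta = 0.$$
b. $\sum_{-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta$.
c. If $\{c_n\}_{-\infty}^{\infty}$ is any sequence of complex numbers such that $\sum_{-\infty}^{\infty}|c_n|^2$ converges,
then the series $\sum_{-\infty}^{\infty} c_n e_n$ converges in norm to a function in $L^2(-\pi, \pi)$.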
Proof. A full proof of Theorem 8.43 is beyond the scope of this book. (One may
be found in Jones [9, p. 325] or Rudin [18, pp. 328ff.].) However, the idea is as
follows. If f is continuous and piecewise smooth, we know that its Fourier series
converges uniformly (Theorem 8.29) and hence in norm, so (a) is valid for such f.
We then obtain the result for arbitrary $f \in L^2(-\pi, \pi)$ by a limiting argument that
involves proving that any function in $L^2(-\pi, \pi)$ is the limit in norm of a sequence
of continuous, piecewise smooth functions. (A partial result in this direction is
indicated in Exercise 7.) (b) follows easily because, as we showed in the proof of
Bessel's inequality,
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\right|^2 d\theta,$$
and the integral on the right tends to zero as $N \to \infty$ since the series converges
in norm to f. (c) follows from (b) and the completeness of $L^2(-\pi, \pi)$. Indeed, by
(b),
$$\int_{-\pi}^{\pi}\left|\sum_{M<|n|\le N} c_n e^{in\theta}\right|^2 d\theta = 2\pi\sum_{M<|n|\le N}|c_n|^2,$$
so the partial sums of the series $\sum c_n e_n$ are Cauchy in norm; by completeness, the
series converges in norm. □
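The identity from Bessel's inequality used in this proof can be watched numerically. In the sketch below the sample function is $f(\theta) = \theta$ (our choice), with coefficients $c_0 = 0$ and $c_n = (-1)^n i/n$, and $\frac{1}{2\pi}\int_{-\pi}^{\pi}\theta^2\,d\theta = \pi^2/3$. The mean-square error of the partial sums is computed two ways: directly by a midpoint rule, and as $\pi^2/3 - \sum_{|n|\le N}|c_n|^2$. The two agree and decrease to 0, in line with parts (a) and (b).

```python
import math


def c(n):
    """Fourier coefficients of f(x) = x on [-pi, pi]: c_0 = 0, c_n = (-1)^n i / n."""
    return 0j if n == 0 else (-1) ** n * 1j / n


def partial_sum(x, N):
    """S_N(x) = sum_{|n| <= N} c_n e^{inx}."""
    return sum(c(n) * complex(math.cos(n * x), math.sin(n * x)) for n in range(-N, N + 1))


def mse_numeric(N, m=4000):
    """(1/2pi) * integral_{-pi}^{pi} |x - S_N(x)|^2 dx by the midpoint rule."""
    h = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        x = -math.pi + (k + 0.5) * h
        total += abs(x - partial_sum(x, N)) ** 2
    return total * h / (2 * math.pi)


def mse_from_coefficients(N):
    """(1/2pi) * integral |f|^2 - sum_{|n|<=N} |c_n|^2, where (1/2pi) * integral x^2 dx = pi^2/3."""
    return math.pi ** 2 / 3 - sum(abs(c(n)) ** 2 for n in range(-N, N + 1))


for N in (1, 5, 10, 50):
    print(N, mse_from_coefficients(N))   # decreases toward 0
```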
Theorem 8.43 says that $\{e^{inx}\}_{-\infty}^{\infty}$ is an orthogonal basis for $L^2(-\pi, \pi)$, that
is, an orthogonal set with the property that every function in $L^2(-\pi, \pi)$ can be
expanded uniquely as a norm-convergent series of scalar multiples of functions in
the set. Likewise, $\{\cos nx\}_0^{\infty}$ and $\{\sin nx\}_1^{\infty}$ are orthogonal bases for $L^2(0, \pi)$;
see Exercises 1 and 2.
The equality in Theorem 8.43b,
$$\sum_{-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta, \tag{8.44}$$
is known as Parseval's identity.
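8.45 Corollary. If $f, g \in L^2(-\pi, \pi)$ have the Fourier series $\sum c_n e_n$ and $\sum d_n e_n$,
then
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\overline{g(\theta)}\,d\theta = \sum_{-\infty}^{\infty} c_n\overline{d_n}. \tag{8.46}$$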
The Fourier bases $\{e^{inx}\}_{-\infty}^{\infty}$, $\{\cos nx\}_0^{\infty}$, and $\{\sin nx\}_1^{\infty}$ play a special role
among all the orthogonal bases for $L^2(-\pi, \pi)$ and $L^2(0, \pi)$ because these functions
are eigenfunctions for the differential operators $d/dx$ and $d^2/dx^2$. To explain this
in more detail, we recall that an eigenvector for a linear transformation T on $\mathbb{R}^n$
or $\mathbb{C}^n$ is a nonzero vector $\mathbf{x}$ such that $T\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$. (See Appendix
A, (A.56)-(A.58).) In our situation, the "vectors" are functions in $L^2(-\pi, \pi)$ or
$L^2(0, \pi)$, and the linear transformation in question is $d/dx$ or $d^2/dx^2$, defined not
on the whole $L^2$ space but on a suitable subspace of functions that possess the
requisite derivatives and satisfy certain boundary conditions. Indeed, we have
$$\frac{d}{dx}e^{inx} = in\,e^{inx}, \qquad \frac{d^2}{dx^2}\cos nx = -n^2\cos nx, \qquad \frac{d^2}{dx^2}\sin nx = -n^2\sin nx.$$
The functions $e^{inx}$ are precisely the eigenfunctions of $d/dx$ on $[-\pi, \pi]$ that satisfy
the periodicity condition $f(-\pi) = f(\pi)$, and the functions $\cos nx$ and $\sin nx$ are
precisely the eigenfunctions of $d^2/dx^2$ on $[0, \pi]$ that satisfy the boundary conditions
$f'(0) = f'(\pi) = 0$ and $f(0) = f(\pi) = 0$, respectively. The Fourier expansion
of a function therefore provides the analogue of the spectral theorem (A.58)
for these fundamental differential operators, with all the resulting simplifications
that one expects when one finds an orthonormal eigenbasis for a matrix.
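The eigenvalue relations above can be spot-checked with central differences. The sketch below is only a numerical sanity check at a few sample points (the step sizes and points are arbitrary choices), not a proof. It compares the difference quotients of $e^{inx}$, $\cos nx$, and $\sin nx$ with $in\,e^{inx}$ and $-n^2\cos nx$, $-n^2\sin nx$.

```python
import cmath
import math


def first_difference(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_difference(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


n = 3
for x in (0.3, 1.0, 2.5):
    # d/dx e^{inx} = in e^{inx}
    print(first_difference(lambda t: cmath.exp(1j * n * t), x), 1j * n * cmath.exp(1j * n * x))
    # d^2/dx^2 cos nx = -n^2 cos nx (sin nx behaves the same way)
    print(second_difference(lambda t: math.cos(n * t), x), -n * n * math.cos(n * x))

# Boundary values on [0, pi]: (cos nx)' = -n sin nx and sin nx both vanish at 0 and pi.
print(-n * math.sin(0.0), -n * math.sin(n * math.pi), math.sin(0.0), math.sin(n * math.pi))
```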
For example, we can rederive the solution (8.35) of the insulated heat flow
problem (8.34) as follows. To solve the heat equation $\partial_t u = k\,\partial_x^2 u$ subject to the
boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$, we take u to be the sum of a series
of eigenfunctions of $\partial_x^2$ satisfying these boundary conditions:
$$u(x, t) = \sum_{n=0}^{\infty} a_n(t)\cos\frac{n\pi x}{l}.$$
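A sketch of how the computation can be carried through, assuming u is expanded as $\sum_{n\ge0} a_n(t)\cos(n\pi x/l)$ and that this series may be differentiated term by term (a point not justified here). Each $\cos(n\pi x/l)$ already satisfies the insulated boundary conditions and is an eigenfunction of $\partial_x^2$ with eigenvalue $-(n\pi/l)^2$, so the heat equation splits into one ordinary differential equation per coefficient:

```latex
\sum_{n=0}^{\infty} a_n'(t)\cos\frac{n\pi x}{l}
  = k\,\partial_x^2 \sum_{n=0}^{\infty} a_n(t)\cos\frac{n\pi x}{l}
  = -\sum_{n=0}^{\infty} k\Big(\frac{n\pi}{l}\Big)^{2} a_n(t)\cos\frac{n\pi x}{l}
\quad\Longrightarrow\quad
a_n'(t) = -\frac{k n^2 \pi^2}{l^2}\,a_n(t),
\qquad
a_n(t) = a_n(0)\,e^{-k n^2 \pi^2 t / l^2}.
```

Here the $a_n(0)$ come from expanding the initial temperature $u(x, 0)$ in the same cosine series. The $n = 0$ term is constant in time and gives the equilibrium temperature.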
EXERCISES
1. Show that $\{\cos nx\}_0^{\infty}$ and $\{\sin nx\}_1^{\infty}$ are orthogonal sets of functions on $[0, \pi]$.
What are the norms of these functions?
2. Deduce from Theorem 8.43 that if $f \in L^2(0, \pi)$, the Fourier cosine and sine
series of f both converge to f in norm.
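8. Show that in terms of the cosine and sine coefficients $a_n$ and $b_n$ defined by
(8.7), Parseval's identity takes the form
$$\frac{1}{\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta = \frac{|a_0|^2}{2} + \sum_{1}^{\infty}\left(|a_n|^2 + |b_n|^2\right).$$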
9. Evaluate the following series by applying Parseval's identity, in the form given
in Exercise 8, to certain of the Fourier series found in the exercises of §8.1 and
§8.3. (Remember that the constant term is $\frac{1}{2}a_0$, not $a_0$.)
a.
oo
\—■>.
b.
^ (2n - l)f
~ 1
C.
1
8.7. The Isoperimetric Inequality 401
d. $\displaystyle\sum_{1}^{\infty}\frac{\sin^2 na}{n^2}$ (First assume that $0 < a < \pi$, then deduce the general result.)
10. Suppose that f is $2\pi$-periodic, real-valued, and of class $C^1$. Show that $f'$ is
orthogonal to f on $[-\pi, \pi]$ in two ways: (i) directly from the fact that $2ff' =
(f^2)'$, and (ii) by expanding f in a Fourier series and using (8.46). (Hint: When
f is real we have $c_{-n} = \overline{c_n}$; why?)
We conclude this chapter by using Fourier analysis together with Green's theorem
(thereby joining two of the main threads of this book) to show that among all simple
closed curves in the plane with a given length, the circle is the one that encloses the
greatest area.
$$|\mathbf{h}'(s)| = 1. \tag{8.47}$$
Now, suppose in addition that our curve is a simple closed curve; this means
that, for $0 \le s_1 < s_2 \le L$, $\mathbf{h}(s_1) = \mathbf{h}(s_2)$ only when $s_1 = 0$ and $s_2 = L$. We can
then extend the function $\mathbf{h}$ from $[0, L]$ to $\mathbb{R}$ by requiring it to be L-periodic; this
extension is still continuous and piecewise smooth. (Indeed, this is the natural way
to think of a simple closed curve. We think of $\theta = 2\pi s/L$ as the angular coordinate
on a circle; then $\mathbf{h}(s)$ traces out the curve as $\theta$ goes once around the circle.)
Finally, we observe that we can identify $\mathbb{R}^2$ with the complex plane $\mathbb{C}$ and
the vector-valued function $\mathbf{h} = (h_1, h_2)$ with the complex-valued function $\zeta =
h_1 + ih_2$. The "velocity" $\mathbf{h}'(s)$ then turns into $\zeta'(s)$, and the condition (8.47)
becomes $|\zeta'(s)| = 1$.
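Now we are ready to state our theorem:
8.48 Theorem. Let C be a continuous, piecewise smooth simple closed curve in the plane,
let L be its length, and let A be the area of the region it encloses. Then $A \le L^2/4\pi$,
with equality if and only if C is a circle.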
Proof. We identify the plane with $\mathbb{C}$. Dilating the plane by a factor of r, $z \mapsto rz$,
has the effect of multiplying the length of a curve by r and the area of a region by $r^2$,
so it is enough to consider the case $L = 2\pi$, for which the conclusion is that $A \le \pi$.
By the preceding remarks, then, we can assume that C is given by $z = \zeta(s)$, where
$\zeta$ is a continuous, piecewise smooth, $2\pi$-periodic, complex-valued function on $\mathbb{R}$,
and $|\zeta'(s)| = 1$ (except at isolated points where $\zeta'(s)$ is undefined). We expand $\zeta$
in a Fourier series:
$$\zeta(s) = \sum_{-\infty}^{\infty} c_n e^{ins}.$$
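Since $\zeta$ is continuous and piecewise smooth, the nth Fourier coefficient of $\zeta'$ is
$inc_n$, by Theorem 8.26. Since $|\zeta'(s)| = 1$, Parseval's identity implies that
$$\sum_{-\infty}^{\infty} n^2|c_n|^2 = \frac{1}{2\pi}\int_0^{2\pi}|\zeta'(s)|^2\,ds = 1. \tag{8.49}$$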
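On the other hand, by Green's theorem (see Example 3 in §5.2), the area of the
region enclosed by C is
$$A = \frac{1}{2}\left|\int_C x\,dy - y\,dx\right|.$$
Since $x\,dy - y\,dx = \operatorname{Im}(\bar{z}\,dz)$ for $z = x + iy$, we have
$A = \frac{1}{2}\left|\operatorname{Im}\int_0^{2\pi}\overline{\zeta(s)}\,\zeta'(s)\,ds\right|$, so, by Corollary 8.45 applied to $\zeta'$ and $\zeta$,
$$A = \pi\left|\operatorname{Im}\sum_{-\infty}^{\infty}(inc_n)\overline{c_n}\right| = \pi\left|\sum_{-\infty}^{\infty} n|c_n|^2\right|.$$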
Comparing this with (8.49) yields the desired upper bound for A:
$$A = \pi\left|\sum_{-\infty}^{\infty} n|c_n|^2\right| \le \pi\sum_{-\infty}^{\infty}|n|\,|c_n|^2 \le \pi\sum_{-\infty}^{\infty} n^2|c_n|^2 = \pi.$$
Moreover, the second inequality is strict unless $c_n = 0$ for $|n| > 1$. In that case,
the first inequality becomes
$$\left||c_1|^2 - |c_{-1}|^2\right| \le |c_1|^2 + |c_{-1}|^2,$$
which is strict unless either $c_1$ or $c_{-1}$ vanishes. Thus $A < \pi$ unless $\zeta(s) =
c_0 + c_1e^{is}$ or $\zeta(s) = c_0 + c_{-1}e^{-is}$, both of which describe a circle centered at
$c_0$, traversed counterclockwise or clockwise, respectively. (In either case the radius
is 1 since $|c_{\pm1}| = |\zeta'(s)| = 1$.) □
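Two ingredients of this argument can be checked numerically, under assumptions of our own.

First, the area formula $A = \pi\sum n|c_n|^2$ (for a counterclockwise curve $\zeta(s) = \sum c_n e^{ins}$) does not use the arc-length normalization, so it can be tested on any trigonometric-polynomial curve. The hypothetical coefficients $c_1 = 2$, $c_{-1} = 0.5$ give an ellipse with semi-axes 2.5 and 1.5. Its area is compared with the shoelace formula for a finely sampled polygon.

Second, by the scaling argument at the start of the proof, the bound $A \le \pi$ for $L = 2\pi$ amounts to the ratio $4\pi A/L^2$ being at most 1. The sketch evaluates this ratio for the ellipse and for a few regular polygons.

```python
import cmath
import math


def sample_curve(coeffs, m=2000):
    """Points zeta(s_k) = sum_n c_n e^{i n s_k} at m equally spaced s_k in [0, 2 pi)."""
    return [sum(cn * cmath.exp(1j * n * 2 * math.pi * k / m) for n, cn in coeffs.items())
            for k in range(m)]


def polygon_area_and_length(points):
    """Signed shoelace area and perimeter of the closed polygon through the points."""
    twice_area = 0.0
    length = 0.0
    for p, q in zip(points, points[1:] + points[:1]):
        twice_area += p.real * q.imag - q.real * p.imag
        length += abs(q - p)
    return twice_area / 2, length


def fourier_area(coeffs):
    """pi * sum_n n |c_n|^2, the area formula from the proof (counterclockwise curves)."""
    return math.pi * sum(n * abs(cn) ** 2 for n, cn in coeffs.items())


coeffs = {1: 2.0, -1: 0.5}   # hypothetical example: ellipse with semi-axes 2.5 and 1.5
area, length = polygon_area_and_length(sample_curve(coeffs))
print(area, fourier_area(coeffs), 3.75 * math.pi)   # all three agree closely
print(4 * math.pi * area / length ** 2)             # isoperimetric ratio, below 1

for m in (3, 4, 6, 100):     # regular m-gons inscribed in the unit circle
    pts = [cmath.exp(2j * math.pi * k / m) for k in range(m)]
    a, perimeter = polygon_area_and_length(pts)
    print(m, 4 * math.pi * a / perimeter ** 2)      # increases toward 1 as m grows
```

For a regular m-gon the ratio works out to $(\pi/m)/\tan(\pi/m)$, which stays below 1 and tends to 1, consistent with the circle being the extremal case.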