
Gerald Folland - Advanced Calculus


Chapter 1

SETTING THE STAGE

The first half of this chapter (§§1.1-4) presents basic facts and concepts concerning geometry, vectors, limits, continuity, and sequences; the material in it is used throughout the later chapters. The second half (§§1.5-8) deals with some of the more technical topological results that underlie calculus. It is quite concise and includes nothing but what is needed in this book. The reader who wishes to proceed quickly to the study of differentiation and integration may skim it and refer back to it as necessary; on the other hand, the reader who wishes to see a more extensive development of this material is referred to books on the foundations of analysis such as DePree and Swartz [5], Krantz [12], or Rudin [18].¹
At the outset, let us review some standard notation and terminology for future reference:

• Sums: If $a_1, a_2, \ldots, a_k$ are numbers, their sum $a_1 + a_2 + \cdots + a_k$ is denoted by $\sum_1^k a_n$, or by $\sum_{n=1}^k a_n$ if necessary for clarity. The sum need not be started at n = 1; more generally, if $j \le k$, we have

$$\sum_{n=j}^{k} a_n = a_j + a_{j+1} + \cdots + a_k.$$

The letters j and k denote the limits of summation; the letter n is analogous to a dummy variable in an integral and may be replaced by any other letter that is not already in use without changing the meaning of the sum. We shall occasionally write simply $\sum a_n$ when the limits of summation are understood.

¹ Numbers in brackets refer to the bibliography at the end of the book.

• Factorials: If n is a positive integer, $n!$ ("n factorial") is the product of all the integers from 1 to n. By convention, $0! = 1$, so that the formula $n! = n \cdot (n-1)!$ remains true even for n = 1.

• Sets: If S and T are two sets, $S \cup T$ and $S \cap T$ denote their union and intersection, respectively, and $S \setminus T$ denotes the set of all elements of S that are not in T. The expressions "$S \subset T$" and "$T \supset S$" both mean that S is a subset of T, including the possibility that S = T, and "$x \in S$" and "$x \notin S$" mean, respectively, that x is or is not an element of S. The set of all objects x satisfying a property P(x) is denoted by $\{x : P(x)\}$, and the empty set is denoted by $\varnothing$.

The union and intersection of a family $S_1, S_2, \ldots, S_k$ of sets are denoted by $\bigcup_1^k S_n$ and $\bigcap_1^k S_n$. The conventions for using the symbols $\bigcup$ and $\bigcap$ are the same as those for the summation sign $\sum$ described above.

• Real numbers: The set of real numbers is denoted by $\mathbb{R}$. The following notations are used for intervals in $\mathbb{R}$:

$$(a,b) = \{x : a < x < b\}, \qquad [a,b] = \{x : a \le x \le b\},$$
$$(a,b] = \{x : a < x \le b\}, \qquad [a,b) = \{x : a \le x < b\}.$$

Intervals of the form (a, b) are called open; intervals of the form [a, b] are called closed; and intervals of the forms (a, b] and [a, b) are called half-open. (Of course, the symbol (a, b) is also used to denote the ordered pair whose first and second members are a and b, respectively; remarkably enough, this rarely causes any confusion.)

If $\{x_1, \ldots, x_k\}$ is a finite set of real numbers, its largest and smallest elements are denoted by $\max(x_1, \ldots, x_k)$ and $\min(x_1, \ldots, x_k)$, respectively.

• Infinity: In discussing limits it is often convenient to add two "points at infinity" $\infty$ (also called $+\infty$) and $-\infty$ to the real number system. These are not real numbers, and one can perform arithmetical operations on them only with great caution, but there is no harm in thinking of them as actual mathematical objects. The points $\pm\infty$ may be used as endpoints of intervals; for example, $(a, \infty) = \{x : x > a\}$. Intervals of the form $[a, \infty)$ and $(-\infty, b]$ are classified as closed intervals; $(a, \infty)$ and $(-\infty, a)$ are open.

• Complex numbers: The imaginary unit $\sqrt{-1}$ is denoted by i, although the letter i may be used for other purposes when complex numbers are not under discussion. The set of complex numbers, that is, numbers of the form x + iy where $x, y \in \mathbb{R}$, is denoted by $\mathbb{C}$. As a set, $\mathbb{C}$ may be identified with the Cartesian plane by the correspondence $x + iy \leftrightarrow (x, y)$, and we speak of "the complex plane $\mathbb{C}$." If z = x + iy is a complex number, x and y are called its real and imaginary parts, respectively, and are denoted by Re z and Im z. The number x - iy is called the complex conjugate of z and is denoted by $\bar{z}$, and the number $\sqrt{z\bar{z}} = \sqrt{x^2 + y^2}$ (the distance from (x, y) to the origin in the plane) is called the absolute value of z and is denoted by $|z|$.

• Mappings and functions: A mapping, or map, is a rule f that assigns to each element of some set A an element of some other set B (possibly equal to A). We write $f : A \to B$ to display all these ingredients together. If $x \in A$, the element of B assigned to x by f is called the value of f at x and is denoted by f(x). If S is a subset of A, the set of values $\{f(x) : x \in S\}$ is denoted by f(S). The set A is called the domain of f, and the set f(A) (a subset of B) is called the range of f. The mapping $f : A \to B$ is called one-to-one if f(x) = f(y) only when x = y, and f is said to map A onto B if f(A) = B.

If $f : A \to B$ and $g : B \to C$ are mappings, their composition is the mapping $g \circ f : A \to C$ defined by $(g \circ f)(x) = g(f(x))$.

A mapping $f : A \to B$ is said to be invertible if there is another mapping $g : B \to A$ such that g(f(x)) = x for all $x \in A$ and f(g(y)) = y for all $y \in B$. The equation g(f(x)) = x can be valid for all $x \in A$ only if f is one-to-one, and the equation f(g(y)) = y can be valid for all $y \in B$ only if f maps A onto B. Conversely, if these two conditions are satisfied, it is easy to verify that f is invertible. In this case, the mapping g is called the inverse of f and is commonly denoted by $f^{-1}$.
Mappings are sometimes called "functions," but we shall reserve the term function for mappings whose values are real numbers, complex numbers, or vectors. Mappings of a set A into itself (B = A) are sometimes called transformations.

• Special functions: In this book, we denote the natural logarithm by log rather than ln, this being the common usage in advanced mathematics. Also, we denote the principal branches of the inverse trig functions by arcsin, arccos, and arctan; arcsin and arccos map $[-1, 1]$ onto $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ and $[0, \pi]$, respectively, and arctan maps $\mathbb{R}$ onto $(-\frac{1}{2}\pi, \frac{1}{2}\pi)$.

• Logical symbols: We shall sometimes use the symbols $\Rightarrow$ and $\Leftrightarrow$ to denote logical implication and equivalence, respectively. That is, if A and B are mathematical statements, "$A \Rightarrow B$" is read "A implies B" or "If A, then B," and "$A \Leftrightarrow B$" is read "A is equivalent to B" or "A if and only if B." We point out that "$A \Rightarrow B$" and "not $B \Rightarrow$ not A" are logically equivalent; that is, in order to prove that hypothesis A implies conclusion B, one may assume that B is false and show that A is false.
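The invertibility criterion for mappings (one-to-one plus onto) can be made concrete on finite sets. The following Python sketch is an informal illustration, not part of the text: it assumes a mapping $f : A \to B$ is modeled as a dictionary whose keys form the domain A, with B passed separately, and the helper names are hypothetical.

```python
# Hypothetical helpers: a mapping f : A -> B between finite sets is modeled as a
# dict whose keys are the domain A; the target set B is passed separately.

def is_one_to_one(f):
    """f(x) == f(y) only when x == y, i.e., no value is repeated."""
    values = list(f.values())
    return len(values) == len(set(values))

def maps_onto(f, B):
    """f maps A onto B exactly when the range f(A) equals B."""
    return set(f.values()) == set(B)

def compose(g, f):
    """The composition g o f, defined by (g o f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f, B):
    """Return f^{-1} if f is one-to-one and onto B, otherwise None."""
    if not (is_one_to_one(f) and maps_onto(f, B)):
        return None
    return {y: x for x, y in f.items()}

A, B = {1, 2, 3}, {"a", "b", "c"}
f = {1: "b", 2: "c", 3: "a"}
g = inverse(f, B)
assert compose(g, f) == {x: x for x in A}  # g(f(x)) = x for all x in A
assert compose(f, g) == {y: y for y in B}  # f(g(y)) = y for all y in B

h = {1: "a", 2: "a", 3: "b"}  # not one-to-one, and misses "c"
assert inverse(h, B) is None
```

Building the inverse by swapping keys and values only makes sense after both conditions are checked, which mirrors the argument in the text.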

1.1 Euclidean Spaces and Vectors

We shall be studying functions of several real variables, say $f(x_1, x_2, \ldots, x_n)$. In elementary treatments of the subject one usually focuses on the cases n = 2 and n = 3, because these are the ones where ordered n-tuples of numbers can represent points in physical space. However, most of the ideas work equally well for any number of variables, and it is helpful to continue using geometric language in this more general setting even though "n-dimensional space" doesn't correspond directly to a physical object that can be visualized.
The set of all ordered n-tuples of real numbers is called n-dimensional Euclidean space and is denoted by $\mathbb{R}^n$. We will denote such n-tuples either by writing out the components or by single boldface letters:

$$\mathbf{x} = (x_1, x_2, \ldots, x_n).$$

The n-tuple whose components are all zero is denoted by $\mathbf{0}$:

$$\mathbf{0} = (0, 0, \ldots, 0).$$

When n = 2 or 3, we shall often write (x, y) or (x, y, z) instead of $(x_1, x_2)$ or $(x_1, x_2, x_3)$, but we shall still use $\mathbf{x}$ as a single symbol to denote the ordered pair or triple.
Ordered n-tuples of numbers lead a double life. We usually think of the n-tuple $(x_1, \ldots, x_n)$ as representing the Cartesian coordinates of a point in the n-dimensional space $\mathbb{R}^n$. However, sometimes we think of it as representing a "quantity with magnitude and direction" such as a force or velocity and visualize it as an arrow. There is some virtue in maintaining a notational distinction between these two concepts, but we shall not attempt to do so.
To express the basic ideas of n-dimensional geometry it is convenient to use the language of vector algebra. Most of the vector operations work equally well in any number of dimensions:

$$\begin{aligned}
\text{Addition:}\quad & \mathbf{x} + \mathbf{y} = (x_1 + y_1, \ldots, x_n + y_n),\\
\text{Scalar multiplication:}\quad & c\mathbf{x} = (cx_1, \ldots, cx_n),\\
\text{Dot product:}\quad & \mathbf{x} \cdot \mathbf{y} = x_1y_1 + \cdots + x_ny_n.
\end{aligned}$$

The exception is the cross product, which is peculiar to 3 dimensions; we shall discuss it at the end of this section. If $\mathbf{x} \in \mathbb{R}^n$, the norm of $\mathbf{x}$ is defined to be

$$|\mathbf{x}| = \sqrt{x_1^2 + \cdots + x_n^2} = \sqrt{\mathbf{x} \cdot \mathbf{x}}.$$

Some people denote norms by double vertical bars, thus: $\|\mathbf{x}\|$.
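As an informal illustration (not from the text), the componentwise definitions translate directly into code that works for any n. This Python sketch represents vectors in $\mathbb{R}^n$ as tuples of numbers; the function names are our own choices.

```python
import math

def _same_length(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must lie in the same R^n")

def add(x, y):
    """x + y = (x_1 + y_1, ..., x_n + y_n)."""
    _same_length(x, y)
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    """c x = (c x_1, ..., c x_n)."""
    return tuple(c * a for a in x)

def dot(x, y):
    """x . y = x_1 y_1 + ... + x_n y_n."""
    _same_length(x, y)
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """|x| = sqrt(x_1^2 + ... + x_n^2) = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

x = (1, 2, 2, 0)
y = (2, 0, 1, 4)
print(add(x, y))    # (3, 2, 3, 4)
print(scale(2, x))  # (2, 4, 4, 0)
print(dot(x, y))    # 4
print(norm(x))      # 3.0
```

Nothing in these functions depends on n, which is the point made above: only the cross product is tied to a particular dimension.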

There are two fundamental inequalities involving the dot product and norm,
Cauchy's inequality and the triangle inequality. The reader is probably familiar
with them in dimensions 2 and 3, and the ideas are exactly the same in higher
dimensions.

1.1 Proposition (Cauchy's Inequality). For any $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$,

$$|\mathbf{a} \cdot \mathbf{b}| \le |\mathbf{a}|\,|\mathbf{b}|.$$

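Proof. If $\mathbf{b} = \mathbf{0}$ then both sides of the inequality are 0. Otherwise, we introduce a real variable t and consider the function

$$f(t) = |\mathbf{a} - t\mathbf{b}|^2 = (\mathbf{a} - t\mathbf{b}) \cdot (\mathbf{a} - t\mathbf{b}) = |\mathbf{a}|^2 - 2t\,\mathbf{a} \cdot \mathbf{b} + t^2|\mathbf{b}|^2.$$

This is a quadratic function of t. Its minimum value occurs at $t = (\mathbf{a} \cdot \mathbf{b})/|\mathbf{b}|^2$, and that minimum value is

$$f\left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right) = |\mathbf{a}|^2 - \frac{(\mathbf{a} \cdot \mathbf{b})^2}{|\mathbf{b}|^2}.$$

On the other hand, clearly $f(t) \ge 0$ for all t, so

$$|\mathbf{a}|^2 - \frac{(\mathbf{a} \cdot \mathbf{b})^2}{|\mathbf{b}|^2} \ge 0.$$

Multiplying through by $|\mathbf{b}|^2$, we obtain $|\mathbf{a}|^2|\mathbf{b}|^2 \ge (\mathbf{a} \cdot \mathbf{b})^2$, and taking square roots gives the desired result. □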

Note. Cauchy's inequality is also called Schwarz's inequality, the Cauchy-Schwarz inequality, or Buniakovsky's inequality. (Schwarz and Buniakovsky independently discovered the corresponding result for integrals of functions, namely,

$$\left|\int_a^b f(x)g(x)\,dx\right| \le \left[\int_a^b |f(x)|^2\,dx\right]^{1/2} \left[\int_a^b |g(x)|^2\,dx\right]^{1/2},$$

which can be proved in much the same way.)
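The proof of Proposition 1.1 can be checked numerically: the quadratic $f(t) = |\mathbf{a} - t\mathbf{b}|^2$ attains the value $|\mathbf{a}|^2 - (\mathbf{a}\cdot\mathbf{b})^2/|\mathbf{b}|^2$ at $t_0 = (\mathbf{a}\cdot\mathbf{b})/|\mathbf{b}|^2$, and Cauchy's inequality holds. This Python sketch is only a sanity check on random vectors (a few checks are not a proof); the names `f` and `cauchy_terms` are our own, and the tolerances absorb floating-point rounding.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def f(t, a, b):
    """The quadratic f(t) = |a - t b|^2 from the proof."""
    d = [ai - t * bi for ai, bi in zip(a, b)]
    return dot(d, d)

def cauchy_terms(a, b):
    """For b != 0 return (|a.b|, |a||b|, f(t0), |a|^2 - (a.b)^2/|b|^2), t0 = a.b/|b|^2."""
    ab, aa, bb = dot(a, b), dot(a, a), dot(b, b)
    t0 = ab / bb
    return abs(ab), math.sqrt(aa * bb), f(t0, a, b), aa - ab * ab / bb

rng = random.Random(0)
for _ in range(1000):
    n = rng.randint(1, 6)
    a = [rng.uniform(-5, 5) for _ in range(n)]
    b = [rng.uniform(-5, 5) for _ in range(n)]
    lhs, rhs, f_min, predicted = cauchy_terms(a, b)
    assert lhs <= rhs * (1 + 1e-12)                      # Cauchy's inequality
    assert math.isclose(f_min, predicted, abs_tol=1e-9)  # minimum-value formula
```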

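1.2 Proposition (The Triangle Inequality). For any $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$,

$$|\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}|.$$

Proof. We have $|\mathbf{a} + \mathbf{b}|^2 = (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b}) = |\mathbf{a}|^2 + 2\mathbf{a} \cdot \mathbf{b} + |\mathbf{b}|^2$. By Cauchy's inequality, this last sum is at most $|\mathbf{a}|^2 + 2|\mathbf{a}|\,|\mathbf{b}| + |\mathbf{b}|^2 = (|\mathbf{a}| + |\mathbf{b}|)^2$, so the result follows by taking square roots. □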

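The distance between two points $\mathbf{x}$ and $\mathbf{y}$ in 3-space is given by

$$\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} = |\mathbf{x} - \mathbf{y}|,$$

and similarly for points in the plane. We shall take this as a definition of distance in n-space for any n:

$$\text{Distance from } \mathbf{x} \text{ to } \mathbf{y} = |\mathbf{x} - \mathbf{y}|.$$

By taking $\mathbf{a} = \mathbf{x} - \mathbf{y}$ and $\mathbf{b} = \mathbf{y} - \mathbf{z}$ in the triangle inequality, we see that

$$|\mathbf{x} - \mathbf{z}| \le |\mathbf{x} - \mathbf{y}| + |\mathbf{y} - \mathbf{z}|$$

for any $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$. That is, the distance from $\mathbf{x}$ to $\mathbf{z}$ is at most the sum of the distances from $\mathbf{x}$ to $\mathbf{y}$ and from $\mathbf{y}$ to $\mathbf{z}$, for any intermediate point $\mathbf{y}$. Hence the name "triangle inequality": One side of a triangle is at most the sum of the other two sides.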
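A quick numerical spot check of the distance form of the triangle inequality, written as an informal Python sketch (the name `dist` is our own; the small tolerance only absorbs floating-point rounding, and sampling is not a proof):

```python
import math
import random

def dist(x, y):
    """Euclidean distance |x - y| in R^n."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

rng = random.Random(42)
for _ in range(1000):
    n = rng.randint(1, 5)
    x, y, z = ([rng.uniform(-10, 10) for _ in range(n)] for _ in range(3))
    # |x - z| <= |x - y| + |y - z| for any intermediate point y
    assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-9
```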
If we think of two nonzero vectors $\mathbf{x}$ and $\mathbf{y}$ as arrows emanating from the same point, we can speak of the angle $\theta$ between them. The familiar formula for $\theta$ in dimensions 2 and 3 remains valid in higher dimensions:

$$\theta = \arccos\left(\frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|}\right).$$

Cauchy's inequality says that the quotient in parentheses always lies in the interval $[-1, 1]$, so it is indeed the cosine of some number $\theta \in [0, \pi]$.
In particular, the directions of two vectors $\mathbf{x}$ and $\mathbf{y}$ are perpendicular to each other if and only if $\mathbf{x} \cdot \mathbf{y} = 0$. In this case the vectors are said to be orthogonal to each other.
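The angle formula can be sketched in Python as follows (an illustration, not part of the text; `angle` and `orthogonal` are our own names, and the tolerance in `orthogonal` is an arbitrary choice). Cauchy's inequality guarantees the quotient lies in $[-1, 1]$ in exact arithmetic, but rounding can push it slightly outside, so the sketch clamps it before calling `math.acos`.

```python
import math

def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    q = max(-1.0, min(1.0, dot / (nx * ny)))
    return math.acos(q)

def orthogonal(x, y, tol=1e-12):
    """x and y are orthogonal when x . y = 0 (up to a rounding tolerance)."""
    return abs(sum(a * b for a, b in zip(x, y))) <= tol

print(angle((1, 0, 0, 0), (1, 1, 0, 0)))  # approximately pi/4
print(orthogonal((1, 1, 0), (1, -1, 5)))  # True
```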
In many situations we need to control the magnitude, i.e., the norm, of a vector $\mathbf{x} = (x_1, \ldots, x_n)$, but it is often more convenient to work with the magnitudes of the components $x_j$ of $\mathbf{x}$. In such cases the following inequalities are useful. Let M be the largest of the numbers $|x_1|, \ldots, |x_n|$. Then $M^2 \le x_1^2 + \cdots + x_n^2$ (because $M^2$ is one of the numbers on the right), and $x_1^2 + \cdots + x_n^2 \le nM^2$ (because each number on the left is at most $M^2$). In other words,

$$\max(|x_1|, \ldots, |x_n|) \le |\mathbf{x}| \le \sqrt{n}\,\max(|x_1|, \ldots, |x_n|). \tag{1.3}$$
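The two bounds in (1.3) are easy to test, and the equality cases are instructive: the lower bound is sharp when only one component is nonzero, and the upper bound is sharp when all components have the same magnitude. A minimal Python sketch (function names are our own; the relative tolerance covers rounding):

```python
import math

def max_abs(x):
    """max(|x_1|, ..., |x_n|)."""
    return max(abs(t) for t in x)

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def check_13(x):
    """Check max|x_j| <= |x| <= sqrt(n) * max|x_j| up to rounding."""
    m, r, n = max_abs(x), norm(x), len(x)
    return m <= r * (1 + 1e-12) and r <= math.sqrt(n) * m * (1 + 1e-12)

print(norm((0, 0, 5, 0)), max_abs((0, 0, 5, 0)))  # 5.0 5: lower bound attained
print(norm((1, 1, 1, 1)), math.sqrt(4) * 1)       # 2.0 2.0: upper bound attained
```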

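Cross Products. Let $\mathbf{i} = (1,0,0)$, $\mathbf{j} = (0,1,0)$, and $\mathbf{k} = (0,0,1)$ be the standard basis vectors for $\mathbb{R}^3$; then an arbitrary vector $\mathbf{a} \in \mathbb{R}^3$ can be written as

$$\mathbf{a} = (a_1, a_2, a_3) = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}.$$

The cross product of two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ is defined by

$$\mathbf{a} \times \mathbf{b} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} = (a_2b_3 - a_3b_2)\mathbf{i} + (a_3b_1 - a_1b_3)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k}.$$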

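(For a review of determinants, see Appendix A, (A.24)-(A.33).) It is easily verified that cross products distribute over addition and scalar multiplication in the usual way:

$$(c_1\mathbf{a}_1 + c_2\mathbf{a}_2) \times \mathbf{b} = c_1(\mathbf{a}_1 \times \mathbf{b}) + c_2(\mathbf{a}_2 \times \mathbf{b}),$$
$$\mathbf{a} \times (c_1\mathbf{b}_1 + c_2\mathbf{b}_2) = c_1(\mathbf{a} \times \mathbf{b}_1) + c_2(\mathbf{a} \times \mathbf{b}_2).$$

The cross product is anticommutative:

$$\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}.$$

It is not associative; that is, $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) \ne (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$ in general. Instead, it satisfies a quasi-associative law called the Jacobi identity:

$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) + \mathbf{b} \times (\mathbf{c} \times \mathbf{a}) + \mathbf{c} \times (\mathbf{a} \times \mathbf{b}) = \mathbf{0}.$$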

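A messy but straightforward calculation shows that

$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2.$$

($|\mathbf{a} \times \mathbf{b}|^2$ is the sum of the squares of the components of $\mathbf{a} \times \mathbf{b}$. Multiply it out and rearrange the terms to get $|\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2$.) If $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$ ($0 \le \theta \le \pi$), we know that $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$, so

$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2(1 - \cos^2\theta), \quad\text{or}\quad |\mathbf{a} \times \mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}|\sin\theta.$$

If $\mathbf{a}$ and $\mathbf{b}$ represent two sides of a parallelogram and we take $\mathbf{a}$ to be the "base," then $|\mathbf{b}|\sin\theta$ is the "height"; hence, $|\mathbf{a} \times \mathbf{b}|$ is the area of the parallelogram generated by $\mathbf{a}$ and $\mathbf{b}$. Another easy calculation shows that

$$\mathbf{a} \cdot (\mathbf{a} \times \mathbf{b}) = \mathbf{b} \cdot (\mathbf{a} \times \mathbf{b}) = 0;$$

in other words, $\mathbf{a} \times \mathbf{b}$ is orthogonal to both $\mathbf{a}$ and $\mathbf{b}$. See Figure 1.1.

FIGURE 1.1: The geometry of the cross product.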

The two facts just established, that $|\mathbf{a} \times \mathbf{b}|$ is the area of the parallelogram generated by $\mathbf{a}$ and $\mathbf{b}$ and that $\mathbf{a} \times \mathbf{b}$ is orthogonal to both $\mathbf{a}$ and $\mathbf{b}$, specify the magnitude and direction of $\mathbf{a} \times \mathbf{b}$ in purely geometric terms and show that $\mathbf{a} \times \mathbf{b}$ has an intrinsic geometric meaning, independent of the choice of coordinate axes. Well, almost: The fact that $\mathbf{a} \times \mathbf{b}$ is orthogonal to both $\mathbf{a}$ and $\mathbf{b}$ specifies its direction only up to a factor of $\pm 1$, and this last bit of information is provided by the "right hand rule": If you point the thumb and first finger of your right hand in the directions of $\mathbf{a}$ and $\mathbf{b}$, respectively, and bend the middle finger so that it is perpendicular to both of them, the middle finger points in the direction of $\mathbf{a} \times \mathbf{b}$. Thus the definition of cross product is tied to the convention of using "right-handed" coordinate systems. If we were to switch to "left-handed" ones, all cross products would be multiplied by $-1$.
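The algebraic and geometric properties of the cross product listed above can be spot-checked in a few lines. In this Python sketch (our own illustration; the sample vectors are arbitrary) vectors are integer 3-tuples, so every identity is checked exactly rather than up to rounding.

```python
def cross(a, b):
    """Cross product a x b in R^3, expanded from the 3x3 determinant."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def add(*vs):
    """Componentwise sum of any number of vectors."""
    return tuple(sum(c) for c in zip(*vs))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
a, b, c = (1, 2, 3), (-2, 0, 5), (4, -1, 2)

assert cross(i, j) == k                                 # right-handed convention
ab = cross(a, b)
assert ab == tuple(-t for t in cross(b, a))             # anticommutative
assert dot(a, ab) == 0 and dot(b, ab) == 0              # orthogonal to a and b
assert dot(ab, ab) == dot(a, a) * dot(b, b) - dot(a, b) ** 2
jacobi = add(cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))
assert jacobi == (0, 0, 0)                              # Jacobi identity
assert cross(a, cross(b, c)) != cross(cross(a, b), c)   # not associative
```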

EXERCISES

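1. Let $\mathbf{x} = (3, -1, -1, 1)$ and $\mathbf{y} = (-2, 2, 1, 0)$. Compute the norms of $\mathbf{x}$ and $\mathbf{y}$ and the angle between them.

2. Given $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, show that
a. $|\mathbf{x} + \mathbf{y}|^2 = |\mathbf{x}|^2 + 2\mathbf{x} \cdot \mathbf{y} + |\mathbf{y}|^2$.
b. $|\mathbf{x} + \mathbf{y}|^2 + |\mathbf{x} - \mathbf{y}|^2 = 2(|\mathbf{x}|^2 + |\mathbf{y}|^2)$.
3. Suppose $\mathbf{x}_1, \ldots, \mathbf{x}_k \in \mathbb{R}^n$.
a. Generalize Exercise 2a to obtain a formula for $|\mathbf{x}_1 + \cdots + \mathbf{x}_k|^2$.
b. (The Pythagorean Theorem) Suppose the vectors $\mathbf{x}_j$ are mutually orthogonal, i.e., that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ for $i \ne j$. Show that $|\mathbf{x}_1 + \cdots + \mathbf{x}_k|^2 = |\mathbf{x}_1|^2 + \cdots + |\mathbf{x}_k|^2$.
4. Under what conditions on $\mathbf{a}$ and $\mathbf{b}$ is Cauchy's inequality an equality? (Examine the proof.)

5. Under what conditions on $\mathbf{a}$ and $\mathbf{b}$ is the triangle inequality an equality?

6. Show that $\big||\mathbf{a}| - |\mathbf{b}|\big| \le |\mathbf{a} - \mathbf{b}|$ for every $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$.

7. Suppose $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$.
a. Show that if $\mathbf{a} \cdot \mathbf{b} = 0$ and $\mathbf{a} \times \mathbf{b} = \mathbf{0}$, then either $\mathbf{a} = \mathbf{0}$ or $\mathbf{b} = \mathbf{0}$.
b. Show that if $\mathbf{a} \cdot \mathbf{c} = \mathbf{b} \cdot \mathbf{c}$ and $\mathbf{a} \times \mathbf{c} = \mathbf{b} \times \mathbf{c}$ for some nonzero $\mathbf{c} \in \mathbb{R}^3$, then $\mathbf{a} = \mathbf{b}$.
c. Show that $(\mathbf{a} \times \mathbf{a}) \times \mathbf{b} = \mathbf{a} \times (\mathbf{a} \times \mathbf{b})$ if and only if $\mathbf{a}$ and $\mathbf{b}$ are proportional (i.e., one is a scalar multiple of the other).
8. Show that $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ is the determinant of the matrix whose rows are $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ (if these vectors are considered as row vectors) or the matrix whose columns are $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ (if they are considered as column vectors).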

1.2 Subsets of Euclidean Space

In this section we introduce some standard terminology for sets in $\mathbb{R}^n$.

First, the set of all points whose distance from a fixed point $\mathbf{a}$ is equal to some number r is called the sphere of radius r about $\mathbf{a}$, and the set of points whose distance from $\mathbf{a}$ is less than r is called the (open) ball of radius r about $\mathbf{a}$. (In ordinary English the word "sphere" is often used for both these purposes, but mathematicians have found it helpful to reserve the word "sphere" for the spherical surface and to use "ball" to denote the solid body.) We shall use the notation $B(r, \mathbf{a})$ for the ball of radius r about $\mathbf{a}$:

$$B(r, \mathbf{a}) = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{a}| < r\}.$$

Of course, in dimension 1 a ball is just an open interval, and in dimension 2 the words "disc" and "circle" may be used in place of "ball" and "sphere."
A set $S \subset \mathbb{R}^n$ is called bounded if it is contained in some ball about the origin, that is, if there is a constant C such that $|\mathbf{x}| < C$ for every $\mathbf{x} \in S$.
When one studies functions of a single variable, one frequently considers intervals in the real line, and it is often necessary to distinguish between open intervals (with the endpoints excluded) and closed intervals (with the endpoints included). When n > 1, there is a much greater variety of interesting subsets of $\mathbb{R}^n$ to be considered, but the notions of "open" and "closed" are still fundamental. Here are the definitions.

Let S be a subset of $\mathbb{R}^n$.

• The complement of S is the set of all points in $\mathbb{R}^n$ that are not in S; we denote it by $\mathbb{R}^n \setminus S$ or by $S^c$:

$$S^c = \mathbb{R}^n \setminus S = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \notin S\}.$$

• A point $\mathbf{x} \in \mathbb{R}^n$ is called an interior point of S if all points sufficiently close to $\mathbf{x}$ (including $\mathbf{x}$ itself) are also in S, that is, if S contains some ball centered at $\mathbf{x}$. The set of all interior points of S is called the interior of S and is denoted by $S^{\rm int}$:

$$S^{\rm int} = \{\mathbf{x} \in S : B(r, \mathbf{x}) \subset S \text{ for some } r > 0\}.$$

• A point $\mathbf{x} \in \mathbb{R}^n$ is called a boundary point of S if every ball centered at $\mathbf{x}$ contains both points in S and points in $S^c$. (Note that if $\mathbf{x}$ is a boundary point of S, $\mathbf{x}$ may belong to either S or $S^c$.) The set of all boundary points of S is called the boundary of S and is denoted by $\partial S$:

$$\partial S = \{\mathbf{x} \in \mathbb{R}^n : B(r, \mathbf{x}) \cap S \ne \varnothing \text{ and } B(r, \mathbf{x}) \cap S^c \ne \varnothing \text{ for every } r > 0\}.$$

(Remark. We shall use the term "boundary" slightly differently in §5.7 in connection with Stokes's theorem, in the context of surfaces in $\mathbb{R}^3$ being "bounded" by curves. But the present definition is the general-purpose one.)

• S is called open if it contains none of its boundary points.

• S is called closed if it contains all of its boundary points.

• The closure of S is the union of S and all its boundary points. It is denoted by $\overline{S}$:

$$\overline{S} = S \cup \partial S.$$

• Finally, a neighborhood of a point $\mathbf{x} \in \mathbb{R}^n$ is a set of which $\mathbf{x}$ is an interior point. That is, S is a neighborhood of $\mathbf{x}$ if and only if $\mathbf{x}$ is an interior point of S.

Let us examine these ideas a little more closely. First, notice that the boundary points of S are the same as the boundary points of $S^c$; the definition of boundary point remains unchanged if S and $S^c$ are switched. Moreover, if $\mathbf{x}$ is neither an interior point of S nor an interior point of $S^c$, then $\mathbf{x}$ must be a boundary point of S. In other words, given $S \subset \mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$, there are exactly three possibilities: $\mathbf{x}$ is an interior point of S, or $\mathbf{x}$ is an interior point of $S^c$, or $\mathbf{x}$ is a boundary point of S.

1.4 Proposition. Suppose $S \subset \mathbb{R}^n$.

a. S is open $\Leftrightarrow$ every point of S is an interior point.
b. S is closed $\Leftrightarrow$ $S^c$ is open.

Proof. Every point of S is either an interior point or a boundary point; thus S is open $\Leftrightarrow$ every point of S is an interior point. On the other hand, S is closed $\Leftrightarrow$ it contains all of $\partial S$, which is the same as $\partial(S^c)$; this happens precisely when $S^c$ contains none of its boundary points, i.e., when $S^c$ is open. □

Example 1. Let S be $B(\rho, \mathbf{0})$, the ball of radius $\rho$ about the origin. First, given $\mathbf{x} \in S$, let $r = \rho - |\mathbf{x}|$. If $|\mathbf{y} - \mathbf{x}| < r$, then by the triangle inequality we have $|\mathbf{y}| \le |\mathbf{y} - \mathbf{x}| + |\mathbf{x}| < \rho$, so that $B(r, \mathbf{x}) \subset S$. Therefore, every $\mathbf{x} \in S$ is an interior point of S, so S is open. Second, a similar calculation shows that if $|\mathbf{x}| > \rho$ then $B(r, \mathbf{x}) \subset S^c$ where $r = |\mathbf{x}| - \rho$, so every point with $|\mathbf{x}| > \rho$ is an interior point of $S^c$. On the other hand, if $|\mathbf{x}| = \rho$, then $c\mathbf{x} \in S$ for $0 < c < 1$ and $c\mathbf{x} \in S^c$ for $c > 1$, and $|c\mathbf{x} - \mathbf{x}| = |c - 1|\rho$ can be as small as we please, so $\mathbf{x}$ is a boundary point. In other words, the boundary of S is the sphere of radius $\rho$ about the origin, and the closure of S is the closed ball $\{\mathbf{x} : |\mathbf{x}| \le \rho\}$.
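The two steps of Example 1 can be mirrored numerically: for $|\mathbf{x}| < \rho$, points sampled from $B(r, \mathbf{x})$ with $r = \rho - |\mathbf{x}|$ stay inside $B(\rho, \mathbf{0})$, and for $|\mathbf{x}| = \rho$ the points $(1 \mp h)\mathbf{x}$ fall on opposite sides of the sphere. This Python sketch is only a sanity check by sampling, not a proof; the function names, the rejection-sampling method, and the parameters `trials`, `seed`, and `h` are our own choices.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def sample_ball(center, r, rng):
    """Rejection-sample a point y with |y - center| < r."""
    while True:
        offset = [rng.uniform(-r, r) for _ in center]
        if norm(offset) < r:
            return [c + o for c, o in zip(center, offset)]

def interior_check(x, rho, trials=2000, seed=1):
    """With r = rho - |x| as in Example 1, test sampled points of B(r, x) against B(rho, 0)."""
    r = rho - norm(x)
    if r <= 0:
        raise ValueError("x must satisfy |x| < rho")
    rng = random.Random(seed)
    return all(norm(sample_ball(x, r, rng)) < rho for _ in range(trials))

def boundary_check(x, rho, h=1e-6):
    """For |x| = rho: (1 - h)x lies in the ball and (1 + h)x lies outside it."""
    inside = norm([(1 - h) * t for t in x]) < rho
    outside = norm([(1 + h) * t for t in x]) >= rho
    return inside and outside
```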

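Example 2. Now let S be the ball of radius $\rho$ about the origin together with the "upper hemisphere" of its boundary:

$$S = B(\rho, \mathbf{0}) \cup \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x}| = \rho \text{ and } x_n > 0\}.$$

The calculations in Example 1 show that $S^{\rm int}$ is the open ball $B(\rho, \mathbf{0})$, $\partial S$ is the sphere $\{\mathbf{x} : |\mathbf{x}| = \rho\}$, and $\overline{S}$ is the closed ball $\{\mathbf{x} : |\mathbf{x}| \le \rho\}$. The set S is neither open nor closed.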

Example 3. In the real line (i.e., n = 1), let S be the set of all rational numbers. Since every ball in $\mathbb{R}$ (that is, every interval) contains both rational and irrational numbers, every point of $\mathbb{R}$ is a boundary point of S. The set S is neither open nor closed; its interior is empty; and its closure is $\mathbb{R}$.

Subsets of $\mathbb{R}^n$ are often specified in terms of equations or inequalities, for example, by an expression of the form

$$S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \mathbin{\square} 0\}, \tag{1.5}$$

where $\square$ denotes one of the relations $=, <, >, \le, \ge$. (Taking the quantity on the right of $\square$ to be 0 is no restriction; just move all the terms over to the left side.) We anticipate some results from §1.3 in giving the following rule of thumb: Sets defined by strict inequalities are open; sets defined by equalities or weak inequalities are closed. More precisely, if S is given by (1.5) where the function f is continuous, then S is open if $\square$ denotes $<$ or $>$, and S is closed if $\square$ denotes $=$, $\le$, or $\ge$. The reader may feel free to use this rule in doing the exercises.
EXERCISES

1. For each of the following sets S in the plane $\mathbb{R}^2$, do the following: (i) Draw a sketch of S. (ii) Tell whether S is open, closed, or neither. (iii) Describe $S^{\rm int}$, $\overline{S}$, and $\partial S$. (These descriptions should be in the same set-theoretic language as the description of S itself given here.)
a. $S = \{(x, y) : 0 < x^2 + y^2 < 4\}$.
b. $S = \{(x, y) : x^2 - x < y < 0\}$.
c. $S = \{(x, y) : x > 0,\ y > 0, \text{ and } x + y > 1\}$.
d. $S = \{(x, y) : y = x^3\}$.
e. $S = \{(x, y) : x > 0 \text{ and } y = \sin(1/x)\}$.
f. $S = \{(x, y) : x^2 + y^2 < 1\} \setminus \{(x, 0) : x < 0\}$.
g. $S = \{(x, y) : x \text{ and } y \text{ are rational numbers in } [0, 1]\}$.
2. Show that for any $S \subset \mathbb{R}^n$, $S^{\rm int}$ is open and $\partial S$ and $\overline{S}$ are both closed. (Hint: Use the fact that balls are open, proved in Example 1.)

3. Show that if $S_1$ and $S_2$ are open, so are $S_1 \cup S_2$ and $S_1 \cap S_2$.

4. Show that if $S_1$ and $S_2$ are closed, so are $S_1 \cup S_2$ and $S_1 \cap S_2$. (One way is to use Exercise 3 and Proposition 1.4b.)

5. Show that the boundary of S is the intersection of the closures of S and $S^c$.
6. Give an example of an infinite collection $S_1, S_2, \ldots$ of closed sets whose union $\bigcup_1^\infty S_j$ is not closed.
7. There are precisely two subsets of $\mathbb{R}^n$ that are both open and closed. What are they?

8. Give an example of a set S such that the interior of S is unequal to the interior of the closure of S.

9. Show that the ball of radius r about $\mathbf{a}$ is contained in the ball of radius $r + |\mathbf{a}|$ about the origin. Conclude that a set $S \subset \mathbb{R}^n$ is bounded if it is contained in some ball (whose center can be anywhere in $\mathbb{R}^n$).

1.3 Limits and Continuity

We now commence our study of functions defined on $\mathbb{R}^n$ or subsets of $\mathbb{R}^n$. For the most part we shall be dealing with real-valued functions, but in many situations we shall deal with vector-valued or complex-valued functions, that is, functions whose values lie in $\mathbb{R}^k$ or $\mathbb{C}$. For our present purposes we can regard $\mathbb{C}$ as $\mathbb{R}^2$ by identifying the complex number u + iv with the ordered pair (u, v), so it is enough to consider vector-valued functions. But we begin with the real-valued case.

Suppose f is a real-valued function defined on $\mathbb{R}^n$. We say that

$$\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L,$$

and call L the limit of $f(\mathbf{x})$ as $\mathbf{x}$ approaches $\mathbf{a}$, if $f(\mathbf{x})$ becomes as close as we wish to L provided $\mathbf{x}$ is sufficiently close to, but not equal to, $\mathbf{a}$. More formally, the statement $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L$ means that for any positive number $\epsilon$ there is a positive number $\delta$ so that

$$|f(\mathbf{x}) - L| < \epsilon \quad\text{whenever}\quad 0 < |\mathbf{x} - \mathbf{a}| < \delta. \tag{1.6}$$

This condition can be rephrased in terms of the individual components $x_j - a_j$ of $\mathbf{x} - \mathbf{a}$, as follows: $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L$ if and only if for every positive number $\epsilon$ there is a positive number $\delta'$ so that

$$|f(\mathbf{x}) - L| < \epsilon \quad\text{whenever}\quad 0 < \max(|x_1 - a_1|, \ldots, |x_n - a_n|) < \delta'. \tag{1.7}$$

The equivalence of (1.6) and (1.7) follows from (1.3): If (1.6) is satisfied, then (1.7) is satisfied with $\delta' = \delta/\sqrt{n}$; and if (1.7) is satisfied, then (1.6) is satisfied with $\delta = \delta'$.
More generally, we can consider functions f that are only defined on a subset S of $\mathbb{R}^n$ and points $\mathbf{a}$ that lie in the closure of S. The definition of $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ is the same as before except that $\mathbf{x}$ is restricted to lie in the set S. It may be necessary, for the sake of clarity, to specify this restriction explicitly; for this purpose we use the notation

$$\lim_{\mathbf{x} \to \mathbf{a},\ \mathbf{x} \in S} f(\mathbf{x}).$$

In particular, for a function f on the real line we often need to consider the one-sided limits

$$\lim_{x \to a+} f(x) = \lim_{x \to a,\ x > a} f(x) \quad\text{and}\quad \lim_{x \to a-} f(x) = \lim_{x \to a,\ x < a} f(x).$$

For example, let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by f(x) = x + 1 for $|x| \le 1$ and f(x) = 0 for $|x| > 1$. Then $\lim_{x \to 1} f(x)$ does not exist, but $\lim_{x \to 1-} f(x) = 2$ and $\lim_{x \to 1+} f(x) = 0$.
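Evaluating the example function just to the left and just to the right of 1 makes the two different one-sided limits visible. This Python sketch only samples a few points (which suggests, but does not prove, the limits), and it assumes the value f(1) = 2, which plays no role in either limit.

```python
def f(x):
    """The example: x + 1 for |x| <= 1 and 0 for |x| > 1."""
    return x + 1 if abs(x) <= 1 else 0.0

for h in (1e-1, 1e-3, 1e-6):
    left, right = f(1 - h), f(1 + h)
    print(f"h={h:g}: f(1-h)={left:.6f}, f(1+h)={right:.6f}")
```

The left-hand values approach 2 while the right-hand values are identically 0, so no single number L can serve as a two-sided limit at 1.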
Notice that the definition of $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ does not involve the value $f(\mathbf{a})$ at all; only the values of f at points near $\mathbf{a}$ but unequal to $\mathbf{a}$ are relevant. Indeed, f need not even be defined at $\mathbf{a}$, a situation that arises, for example, in the limits that define derivatives. On the other hand, if $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ and $f(\mathbf{a})$ both exist and are equal, that is, if

$$\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = f(\mathbf{a}),$$

then f is said to be continuous at $\mathbf{a}$.

If f is continuous at every point of a set $U \subset \mathbb{R}^n$, f is said to be continuous on U. Going back to the condition (1.6) that defines limits, we see that the continuity of f on U is equivalent to the following condition: For every positive number $\epsilon$ and every $\mathbf{a} \in U$ there is a positive number $\delta$ so that

$$|f(\mathbf{x}) - f(\mathbf{a})| < \epsilon \quad\text{whenever}\quad |\mathbf{x} - \mathbf{a}| < \delta. \tag{1.8}$$

Informally speaking, f is continuous if changing the input values by a small amount changes the output values by only a small amount.
The same definitions apply equally well to vector-valued functions, that is, functions $\mathbf{f}$ with values in $\mathbb{R}^k$ for some k > 1. In this case the limit $\mathbf{L}$ is an element of $\mathbb{R}^k$, and $|\mathbf{f}(\mathbf{x}) - \mathbf{L}|$ is the norm of the vector $\mathbf{f}(\mathbf{x}) - \mathbf{L}$. In view of (1.3), it is clear that

$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{L} \iff \lim_{\mathbf{x} \to \mathbf{a}} f_j(\mathbf{x}) = L_j \text{ for } j = 1, \ldots, k.$$

Thus the study of limits and continuity of vector-valued functions is easily reduced to the scalar case, to which we now return our attention.
We often express the relation $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L$ informally by saying that $f(\mathbf{x})$ approaches L as $\mathbf{x}$ approaches $\mathbf{a}$. In one dimension this works quite well; we can envision x as the location of a particle that moves toward a from the right or the left. But in higher dimensions there are infinitely many different paths along which a particle might move toward $\mathbf{a}$, and for the limit to exist one must get the same result no matter which path is chosen. It is safer to abandon the "dynamic" picture of a particle moving toward $\mathbf{a}$; we should simply think in terms of $f(\mathbf{x})$ being close to L provided that $\mathbf{x}$ is close to $\mathbf{a}$, without reference to any motion.

Example 1. Let $f(x, y) = \dfrac{xy}{x^2 + y^2}$ if $(x, y) \ne (0, 0)$, and let f(0, 0) = 0. Show that $\lim_{(x,y) \to (0,0)} f(x, y)$ does not exist and, in particular, that f is discontinuous at (0, 0).
Solution. First, note that f(x, 0) = f(0, y) = 0 for all x and y, so $f(x, y) \to 0$ as (x, y) approaches (0, 0) along the x-axis or the y-axis. But if we consider other straight lines passing through the origin, say y = cx, we have $f(x, cx) = cx^2/(x^2 + c^2x^2) = c/(1 + c^2)$, so the limit as (x, y) approaches (0, 0) along the line y = cx is $c/(1 + c^2)$. Depending on the value of c, this can be anything between $-\frac{1}{2}$ and $\frac{1}{2}$ (these two extreme values being achieved when c = -1 or c = 1). So there is no limit as (x, y) approaches (0, 0) unrestrictedly.
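Evaluating $f$ at a point very close to the origin on several lines y = cx reproduces the line-dependent values $c/(1 + c^2)$ found above. This Python sketch is an informal check (the sample slopes and the point $x = 10^{-8}$ are arbitrary choices); each row prints c, the computed $f(x, cx)$, and the predicted $c/(1 + c^2)$.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)

x = 1e-8  # a point very close to the origin on each line y = c x
for c in (0.0, 0.5, 1.0, -1.0, 3.0):
    print(c, f(x, c * x), c / (1 + c * c))
```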

The argument just given suggests the following line of thought. We wish to know if $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ exists. We look at all the straight lines passing through $\mathbf{a}$ and evaluate the limit of $f(\mathbf{x})$ as $\mathbf{x}$ approaches $\mathbf{a}$ along each of those lines by one-variable techniques; if we always get the same answer L, then we should have $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L$, right? Unfortunately, this doesn't work:

Example 2. Let $g(x, y) = \dfrac{x^2y}{x^4 + y^2}$ if $(x, y) \ne (0, 0)$ and g(0, 0) = 0. Again we have g(x, 0) = g(0, y) = 0, so the limit as $(x, y) \to (0, 0)$ along the coordinate axes is 0. Moreover, if $c \ne 0$,

$$g(x, cx) = \frac{cx^3}{x^4 + c^2x^2} = \frac{cx}{x^2 + c^2} \to 0 \quad\text{as } x \to 0,$$

so the limit as $(x, y) \to (0, 0)$ along any other straight line is also 0. But if we approach along a parabola $y = cx^2$, we get

$$g(x, cx^2) = \frac{cx^4}{x^4 + c^2x^4} = \frac{c}{1 + c^2},$$

which can be anything between $-\frac{1}{2}$ and $\frac{1}{2}$ as before, so the limit does not exist. (The similarity with Example 1 is not accidental: If f is the function in Example 1, we have $g(x, y) = f(x^2, y)$.)
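The following Python sketch (an informal illustration, with c = 1 and the sample points chosen arbitrarily) shows why checking straight lines is not enough: along y = x the values shrink toward 0, while along the parabola $y = x^2$ they stay at $c/(1 + c^2) = \frac{1}{2}$.

```python
def g(x, y):
    """g(x, y) = x^2 y / (x^4 + y^2) away from the origin, g(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x**4 + y * y)

c = 1.0
for x in (1e-1, 1e-2, 1e-3):
    along_line = g(x, c * x)          # equals c x / (x^2 + c^2), which tends to 0
    along_parabola = g(x, c * x * x)  # equals c / (1 + c^2) = 0.5 for every x != 0
    print(f"x={x:g}: line {along_line:.6f}, parabola {along_parabola:.6f}")
```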

After looking at examples like this one, one might become discouraged about the possibility of ever proving that limits do exist! But things are not so bad. If f is a continuous function, $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ is simply $f(\mathbf{a})$. Moreover, most of the functions of several variables that one can easily write down are built up from continuous functions of one variable by using the arithmetic operations plus composition, and these operations all preserve continuity (except for division when the denominator vanishes).
Here are the precise statements and proofs of the fundamental results. (The
reader may wish to skip the proofs; they are of some value as illustrations of the sort
of formal arguments involving limits that are important in more advanced analysis,
but they contribute little to an intuitive understanding of the results.)

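1.9 Theorem. Suppose $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is continuous on $U \subset \mathbb{R}^n$ and $\mathbf{g} : \mathbb{R}^m \to \mathbb{R}^k$ is continuous on $\mathbf{f}(U) \subset \mathbb{R}^m$. Then the composite function $\mathbf{g} \circ \mathbf{f} : \mathbb{R}^n \to \mathbb{R}^k$ is continuous on U.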

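Proof. Let $\epsilon > 0$ and $\mathbf{a} \in U$ be given, and let $\mathbf{b} = \mathbf{f}(\mathbf{a})$. Since $\mathbf{g}$ is continuous on $\mathbf{f}(U)$, we can choose $\eta > 0$ so that $|\mathbf{g}(\mathbf{y}) - \mathbf{g}(\mathbf{b})| < \epsilon$ whenever $|\mathbf{y} - \mathbf{b}| < \eta$. Having chosen this $\eta$, since $\mathbf{f}$ is continuous on U we can find $\delta > 0$ so that $|\mathbf{f}(\mathbf{x}) - \mathbf{b}| < \eta$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$. Thus,

$$|\mathbf{x} - \mathbf{a}| < \delta \implies |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a})| < \eta \implies |\mathbf{g}(\mathbf{f}(\mathbf{x})) - \mathbf{g}(\mathbf{f}(\mathbf{a}))| < \epsilon,$$

which says that $\mathbf{g} \circ \mathbf{f}$ is continuous on U. □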

1.10 Theorem. Let $f_1(x, y) = x + y$, $f_2(x, y) = xy$, and $g(x) = 1/x$. Then $f_1$ and $f_2$ are continuous on $\mathbb{R}^2$ and g is continuous on $\mathbb{R} \setminus \{0\}$.

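Proof. To prove continuity of $f_1$ and $f_2$, we need to show that $\lim_{(x,y)\to(a,b)} (x + y) = a + b$ and $\lim_{(x,y)\to(a,b)} xy = ab$ for every $a, b \in \mathbb{R}$. That is, given $\epsilon > 0$ and $a, b \in \mathbb{R}$, we need to find $\delta > 0$ so that if $|x - a| < \delta$ and $|y - b| < \delta$, then (i) $|(x + y) - (a + b)| < \epsilon$ or (ii) $|xy - ab| < \epsilon$. For (i) we can simply take $\delta = \frac12\epsilon$, for if $|x - a| < \frac12\epsilon$ and $|y - b| < \frac12\epsilon$, then

$$|(x + y) - (a + b)| = |(x - a) + (y - b)| \le |x - a| + |y - b| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$

For (ii) we observe that $xy - ab = (x - a)y + a(y - b)$, so we can make $xy - ab$ small by making the two terms on the right small. Indeed, let

$$\delta = \min\left(1, \frac{\epsilon}{2(|a| + 1)}, \frac{\epsilon}{2(|b| + 1)}\right).$$

If $|x - a| < \delta$ and $|y - b| < \delta$, then $|y| < |b| + \delta \le |b| + 1$, so

$$|xy - ab| \le |x - a|\,|y| + |a|\,|y - b| < \frac{\epsilon}{2(|b| + 1)}(|b| + 1) + |a|\frac{\epsilon}{2(|a| + 1)} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

This proves the continuity of $f_1$ and $f_2$. As for $g$, to show that $\lim_{x\to a} 1/x = 1/a$ for $a \neq 0$, we observe that

$$\frac1x - \frac1a = \frac{a - x}{ax}.$$

Given $\epsilon > 0$, let $\delta$ be the smaller of the numbers $\frac12|a|$ and $\frac12\epsilon a^2$. If $|x - a| < \delta$, then $|a| \le |a - x| + |x| < \frac12|a| + |x|$ and hence $|x| > \frac12|a|$, so

$$\left|\frac1x - \frac1a\right| = \frac{|x - a|}{|a|\,|x|} < \frac{\frac12\epsilon a^2}{|a| \cdot \frac12|a|} = \epsilon,$$

as desired. □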
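The $\delta$'s chosen in this proof can be spot-checked numerically. The following Python sketch is only an illustration, not part of the proof: the helper names `product_delta`, `reciprocal_delta`, `product_ok`, and `reciprocal_ok` are hypothetical, and random sampling can confirm the inequalities at finitely many points but can never prove them.

```python
import random

# Hypothetical helpers (not from the text): they compute the delta chosen in the
# proof of Theorem 1.10 and spot-check it at randomly sampled points.

def product_delta(a: float, b: float, eps: float) -> float:
    # delta = min(1, eps / (2(|a| + 1)), eps / (2(|b| + 1)))
    return min(1.0, eps / (2 * (abs(a) + 1)), eps / (2 * (abs(b) + 1)))

def reciprocal_delta(a: float, eps: float) -> float:
    # delta = min(|a| / 2, eps * a^2 / 2), for a != 0
    return min(abs(a) / 2, eps * a * a / 2)

def product_ok(a: float, b: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    # Sample x, y with |x - a| < delta and |y - b| < delta; check |xy - ab| < eps.
    rng = random.Random(seed)
    d = 0.999 * product_delta(a, b, eps)  # stay strictly inside the delta-box
    return all(
        abs((a + rng.uniform(-d, d)) * (b + rng.uniform(-d, d)) - a * b) < eps
        for _ in range(trials)
    )

def reciprocal_ok(a: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    # Sample x with |x - a| < delta; check |1/x - 1/a| < eps.
    rng = random.Random(seed)
    d = 0.999 * reciprocal_delta(a, eps)
    return all(abs(1 / (a + rng.uniform(-d, d)) - 1 / a) < eps for _ in range(trials))

print(product_delta(3.0, -2.0, 0.01))                        # 0.00125
print(product_ok(3.0, -2.0, 0.01), reciprocal_ok(0.5, 0.1))  # True True
```

The factor 0.999 keeps the samples strictly inside the open intervals required by the hypotheses $|x - a| < \delta$, $|y - b| < \delta$.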

1.11 Corollary. The function $f_3(x,y) = x - y$ is continuous on $\mathbb{R}^2$, and the function $f_4(x,y) = x/y$ is continuous on $\{(x,y) : y \neq 0\}$.

Proof. With notation as in Theorem 1.10, we have $f_4(x,y) = f_2(x, g(y))$, so $f_4$ is the composition of continuous mappings and hence is continuous on the set where $y \neq 0$. Likewise, $f_3(x,y) = f_1(x, f_2(-1, y))$, so $f_3$ is continuous. (Alternatively, continuity of $f_3$ may be proved in exactly the same way as for $f_1$.) □

1.12 Corollary. The sum, product, or difference of two continuous functions is continuous; the quotient of two continuous functions is continuous on the set where the denominator is nonzero.

Proof. Combine Theorem 1.10 and Corollary 1.11 with Theorem 1.9. For example, if $f$ and $g$ are continuous functions on $U \subset \mathbb{R}^n$, then $f + g$ is continuous because it is the composition of the continuous map $(f, g)$ from $U$ to $\mathbb{R}^2$ and the continuous map $(x,y) \mapsto x + y$ from $\mathbb{R}^2$ to $\mathbb{R}$. Likewise for the other arithmetic operations. □

The elementary functions of a single variable (polynomials, trig functions, exponential functions, etc.) are all continuous on their domains of definition, and elementary functions of several variables are generally built up out of functions of one variable by the arithmetic operations and composition. The preceding results therefore allow the continuity of such functions to be established almost immediately in most cases. For example, the function

$$\varphi(x,y) = \frac{\sin(3x + 2y)}{x^2 - y}$$

is continuous everywhere except along the parabola $y = x^2$, because it is built up from the continuous functions of one variable $3x$, $2y$, $x^2$, and $-y$ by taking sums ($3x + 2y$ and $x^2 - y$), composing with the sine function ($\sin(3x + 2y)$), and then taking a quotient. For another example, the function $\psi(x,y) = x^y$, defined on the region where $x > 0$, is continuous there, because it can be rewritten as $\psi(x,y) = e^{y \log x}$, which is assembled from the (continuous) exponential and logarithmic functions and the operation of multiplication ($y \cdot \log x$). Similarly, the functions in Examples 1 and 2 are continuous everywhere except at the origin.
Let us look at one more example:

Example 3. Let $h(x,y) = \dfrac{xy(x^2 - y^2)}{x^2 + y^2}$ for $(x,y) \neq (0,0)$ and $h(0,0) = 0$. Evaluate $\lim_{(x,y)\to(2,3)} h(x,y)$ and $\lim_{(x,y)\to(0,0)} h(x,y)$. Is $h$ continuous at $(0,0)$?

Solution. The first limit is easy: clearly $h$ is continuous everywhere except possibly at the origin, so $\lim_{(x,y)\to(2,3)} h(x,y) = h(2,3) = 6(4 - 9)/(4 + 9) = -\frac{30}{13}$. The behavior of $h$ at the origin requires a closer examination. Since $h(x,0) = 0$ for all $x$, if the limit exists it must equal 0. Experimentation with lines and parabolas as in Examples 1 and 2 fails to yield any evidence to the contrary.

In fact, the limit is 0, and this can be established with a little ad hoc estimating. Clearly $|x^2 - y^2| \le x^2 + y^2$, so $|h(x,y)| \le |xy|$. But $xy \to 0$ as $(x,y) \to (0,0)$, so $h(x,y)$, being no larger in absolute value than $xy$, must also approach 0. Thus $\lim_{(x,y)\to(0,0)} h(x,y) = 0$ and $h$ is continuous at $(0,0)$.
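A quick numerical sanity check of Example 3 can be written as a minimal Python sketch (the function name `h` simply mirrors the text; the sample radii and angles are arbitrary choices). It reproduces $h(2,3) = -30/13$ and spot-checks the estimate $|h(x,y)| \le |xy|$ near the origin; such a check can only support, not replace, the estimate in the text.

```python
import math

def h(x: float, y: float) -> float:
    """h(x, y) = xy(x^2 - y^2)/(x^2 + y^2) for (x, y) != (0, 0), and h(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

print(h(2, 3), -30 / 13)  # both equal -30/13, about -2.3077

# Approach the origin along a few rays; |h| stays below |xy|, which tends to 0.
for r in (1e-1, 1e-3, 1e-6):
    for t in (0.3, 1.0, 2.5):
        x, y = r * math.cos(t), r * math.sin(t)
        assert abs(h(x, y)) <= abs(x * y)
        print(f"r={r:g}, angle={t}: |h| = {abs(h(x, y)):.3e}, |xy| = {abs(x * y):.3e}")
```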

We now establish the relation between inequalities on continuous functions and open and closed sets that was mentioned at the end of the preceding section.

1.13 Theorem. Suppose $f: \mathbb{R}^n \to \mathbb{R}^k$ is continuous and $U$ is a subset of $\mathbb{R}^k$, and let $S = \{x \in \mathbb{R}^n : f(x) \in U\}$. Then $S$ is open if $U$ is open, and $S$ is closed if $U$ is closed.

Proof. Suppose $U$ is open. We shall show that $S$ is open by showing that every point $a$ in $S$ is an interior point of $S$. If $a \in S$, then $f(a) \in U$. Since $U$ is open, some ball centered at $f(a)$ is contained in $U$; that is, there is a positive number $\epsilon$ such that every $y \in \mathbb{R}^k$ such that $|y - f(a)| < \epsilon$ is in $U$. Since $f$ is continuous, there is a positive number $\delta$ such that $|f(x) - f(a)| < \epsilon$ whenever $|x - a| < \delta$. But this means that $f(x) \in U$ whenever $|x - a| < \delta$, that is, $x \in S$ whenever $|x - a| < \delta$. Thus $a$ is an interior point of $S$.
On the other hand, suppose $U$ is closed. Then the complement $U^c$ of $U$ in $\mathbb{R}^k$ is open by Proposition 1.4b, so the set $S' = \{x : f(x) \in U^c\}$ is open by the argument just given. But $S'$ is just the complement of $S$ in $\mathbb{R}^n$, so $S$ is closed by Proposition 1.4b again. □

The result about the openness or closedness of sets defined by inequalities or equations at the end of §1.2 is a corollary of Theorem 1.13. For example, if $f: \mathbb{R}^n \to \mathbb{R}$ is a continuous function, the set $\{x : f(x) > 0\}$ (resp.² $\{x : f(x) = 0\}$) is of the form $\{x : f(x) \in U\}$ where $U = (0, \infty)$ (resp. $U = \{0\}$), and this $U$ is open (resp. closed).
Theorem 1.13 can be generalized to functions that are only defined on subsets of $\mathbb{R}^n$; with notation as above, the correct statement is that if $U$ is open (resp. closed) then $S$ is the intersection of the domain of $f$ with an open (resp. closed) set. (For example, the set $\{x \in \mathbb{R} : \log x \le 0\}$, namely $(0,1]$, is the intersection of the domain of $\log$, namely $(0,\infty)$, with the closed set $[0,1]$. On the other hand, the set $\{x \in \mathbb{R} : \sqrt{x} < 1\}$, namely $[0,1)$, is the intersection of the domain of the square root function, namely $[0,\infty)$, with the open set $(-1,1)$.) In particular, if $U$ and the domain of $f$ are both open (resp. closed), then so is $S$.
The converse of Theorem 1.13 is also true; see Exercise 8.

² "resp." is an abbreviation for "respectively."


EXERCISES

1. For the following functions $f$, show that $\lim_{(x,y)\to(0,0)} f(x,y)$ does not exist.

a- f{x,y) = —.

4 4

2. For the following functions $f$, show that $\lim_{(x,y)\to(0,0)} f(x,y) = 0$.

3. Let $f(x,y) = x^{-1}\sin(xy)$ for $x \neq 0$. How should you define $f(0,y)$ for $y \in \mathbb{R}$ so as to make $f$ a continuous function on all of $\mathbb{R}^2$?

4. Let $f(x,y) = xy/(x^2 + y^2)$ as in Example 1. Show that, although $f$ is discontinuous at $(0,0)$, $f(x,a)$ and $f(a,y)$ are continuous functions of $x$ and $y$, respectively, for any $a \in \mathbb{R}$ (including $a = 0$). We say that $f$ is separately continuous in $x$ and $y$.

5. Let $f(x,y) = y(y - x^2)/x^4$ if $0 < y < x^2$, $f(x,y) = 0$ otherwise. At which point(s) is $f$ discontinuous?

6. Let $f(x) = x$ if $x$ is rational, $f(x) = 0$ if $x$ is irrational. Show that $f$ is continuous at $x = 0$ and nowhere else.

7. Let $f(x) = 1/q$ if $x = p/q$ where $p$ and $q$ are integers with no common factors and $q > 0$, and $f(x) = 0$ if $x$ is irrational. At which points, if any, is $f$ continuous?

8. Suppose $f: \mathbb{R}^n \to \mathbb{R}^k$ has the following property: For any open set $U \subset \mathbb{R}^k$, $\{x : f(x) \in U\}$ is an open set in $\mathbb{R}^n$. Show that $f$ is continuous on $\mathbb{R}^n$. Show also that the same result holds if "open" is replaced by "closed."

9. Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and let $f$ be a one-to-one mapping from $U$ onto $V$ (so that there is an inverse mapping $f^{-1}: V \to U$). Suppose that $f$ and $f^{-1}$ are both continuous. Show that for any set $S$ whose closure is contained in $U$ we have $f(\partial S) = \partial(f(S))$.

1.4 Sequences

Generally speaking, a sequence is a collection of mathematical objects that is indexed by the positive integers. The objects in question can be of any sort, such as numbers, $n$-dimensional vectors, sets, etc. If the $k$th object in the sequence is $X_k$, the sequence as a whole is usually denoted by $\{X_k\}_{k=1}^\infty$, or just by $\{X_k\}_1^\infty$ or even $\{X_k\}$ if there is no possibility of confusion. (We shall comment further on this notation below.) Alternatively, we can write out the sequence as $X_1, X_2, X_3, \dots$. We speak of a sequence in a set $S$ if the objects of the sequence all belong to $S$.
We speak of a sequence in a set S if the objects of the sequence all belong to S.

Example 1.
a. A sequence of numbers: $1, 4, 9, 16, \dots$. The $k$th term in the sequence is $k^2$, and the sequence as a whole may be written as $\{k^2\}_1^\infty$.
b. A sequence of intervals: $(-1, 1), (-\frac12, \frac14), (-\frac13, \frac19), (-\frac14, \frac1{16}), \dots$. The $k$th term in the sequence is the interval $(-\frac1k, \frac1{k^2})$, and the sequence as a whole may be written as $\{(-\frac1k, \frac1{k^2})\}_1^\infty$.

Sequences can be defined by formulas, as in the examples above: $x_k = k^2$, or $I_k = (-\frac1k, \frac1{k^2})$. They can also be defined by recursion (or induction), that is, by specifying the first term or the first few terms and then giving a rule that tells how to obtain the $k$th term from the preceding ones.

Example 2. The Fibonacci sequence is the sequence

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...,

in which the first two terms are equal to 1 and each of the remaining terms is the sum of the two preceding ones (that is, $x_k = x_{k-2} + x_{k-1}$).
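The recursive definition translates directly into code. Here is a minimal Python sketch (the generator name `fibonacci` is our own) that produces the terms one at a time, remembering only the two preceding terms as the recursion requires:

```python
from itertools import islice

def fibonacci():
    """Yield 1, 1, 2, 3, 5, ...: x_1 = x_2 = 1 and x_k = x_{k-2} + x_{k-1}."""
    prev, curr = 1, 1
    while True:
        yield prev
        prev, curr = curr, prev + curr

print(list(islice(fibonacci(), 11)))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

A generator is used because the sequence is infinite; `islice` takes only as many terms as are needed.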

Example 3. Define a sequence $\{x_k\}$ as follows: $x_1$ is a given positive integer $a$. If $x_k$ is odd, then $x_{k+1} = 3x_k + 1$; if $x_k$ is even, then $x_{k+1} = x_k/2$. For example, if $a = 13$, the sequence is

13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, ...,

ending in the infinite repetition of (4, 2, 1). It is a famous unsolved problem (as of this writing) to prove or disprove that this sequence eventually ends in the repeating figure (4, 2, 1) no matter what initial number $a$ is chosen. (Try a few values of $a$ to see how it works! For more information, see Lagarias [13].)
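Following the suggestion to try a few values of $a$, here is a minimal Python sketch of the recursion in Example 3; the function names `collatz` and `steps_to_one` are hypothetical. Note that `steps_to_one` terminates only if the sequence actually reaches 1, which is exactly what the unsolved problem asks about; it has done so for every value that has been checked by computer, but that is not a proof.

```python
def collatz(a: int, n_terms: int) -> list[int]:
    """First n_terms of x_1 = a, x_{k+1} = 3x_k + 1 (x_k odd) or x_k / 2 (x_k even)."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    terms = [a]
    while len(terms) < n_terms:
        x = terms[-1]
        terms.append(3 * x + 1 if x % 2 else x // 2)
    return terms

def steps_to_one(a: int) -> int:
    """Count steps until the sequence first reaches 1 (would never return if it did not)."""
    steps = 0
    while a != 1:
        a = 3 * a + 1 if a % 2 else a // 2
        steps += 1
    return steps

print(collatz(13, 16))
# [13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1]
print(steps_to_one(27))  # 111
```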

It is convenient to make the definition of sequence a little more flexible by allowing the index $k$ to begin with something other than 1. Thus, we may speak of a sequence $\{X_k\}_0^\infty$ whose objects are $X_0, X_1, X_2, \dots$, or a sequence $\{X_k\}_7^\infty$ whose objects are $X_7, X_8, X_9, \dots$. We may also speak of a finite sequence whose terms are indexed by a finite collection of integers, such as $\{X_k\}_1^8$ (a finite sequence of eight terms), or a doubly infinite sequence whose terms are indexed by the whole set of integers: $\{X_k\}_{-\infty}^\infty$.

Strictly speaking, a sequence in a set $S$ is a rule that assigns to each positive integer (or each integer in some other suitable set, as indicated above) an element of $S$, in other words, a function or mapping from the positive integers to $S$. The common functional notation would be to write $X(k)$ instead of $X_k$ for the value of this mapping at the integer $k$, but for sequences it is customary to write the input variable $k$ as a subscript.
It is sometimes necessary to distinguish between the sequence $\{X_k\}_1^\infty$ and the set of values (i.e., the range) of the sequence, because a sequence may assume the same value many times. For example, consider the sequence of numbers $a_k = (-1)^k$. Then the sequence $\{a_k\}_1^\infty$ is the function on the positive integers whose values are alternately $-1$ and $+1$, which may be written out as

$-1, 1, -1, 1, -1, 1, \dots,$

but its set of values is just the two-element set $\{-1, 1\}$. Since curly brackets are commonly used to specify sets (as we just did with $\{-1, 1\}$), the notation $\{X_k\}_1^\infty$ for a sequence invites confusion with the set whose elements are the $X_k$'s, and for this reason some authors use other notations such as $(X_k)_1^\infty$. However, the notation $\{X_k\}_1^\infty$ is by far the most common one, and in practice it rarely causes problems, so we shall stick with it.
For the remainder of this section we shall be concerned with sequences of numbers or $n$-dimensional vectors. We reserve the letter $n$ for the dimension and use letters such as $k$ and $j$ for the index on a sequence. Thus, for example, if $\{x_k\}$ is a sequence in $\mathbb{R}^n$, the components of the vector $x_k$ are $(x_{k1}, \dots, x_{kn})$.
A sequence $\{x_k\}$ in $\mathbb{R}^n$ is said to converge to the limit $L$ if for every $\epsilon > 0$ there is an integer $K$ such that $|x_k - L| < \epsilon$ whenever $k > K$; if there is no such $L$, $\{x_k\}$ is said to diverge. If $\{x_k\}$ converges to $L$, we write $x_k \to L$ or $L = \lim_{k\to\infty} x_k$.
We say that $\lim_{k\to\infty} x_k = \infty$ (or $+\infty$) if for every $C > 0$ there is an integer $K$ such that $x_k > C$ whenever $k > K$, and $\lim_{k\to\infty} x_k = -\infty$ if for every $C > 0$ there is an integer $K$ such that $x_k < -C$ whenever $k > K$. (However, a sequence whose limit is $\pm\infty$ is still called divergent.)
It follows easily from the estimates (1.3) that $x_k \to L$ if and only if each component of $x_k$ converges to the corresponding component of $L$, that is, $x_{km} \to L_m$ for $1 \le m \le n$. The study of convergence of sequences of vectors is thus reducible to the study of convergence of numerical sequences.

Example 4.
a. The sequence $\{1/k\}$ converges to 0, since $|(1/k) - 0| < \epsilon$ whenever $k > 1/\epsilon$.
b. The sequence $\{k^2\}$ diverges; more precisely, $\lim_{k\to\infty} k^2 = \infty$.
c. The sequence $\{x_k\} = \{(-1)^k\}$ diverges, but the subsequence $\{y_j\} = \{x_{2j-1}\}$ of odd-numbered terms converges to $-1$, and the subsequence $\{z_j\} = \{x_{2j}\}$ of even-numbered terms converges to 1.

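Example 5. If $C$ is any positive number, $C^k/k! \to 0$ as $k \to \infty$ (that is, $k!$ grows faster than exponentially as $k \to \infty$). Indeed, pick an integer $K > 2C$. For $k > K$, we then have

$$\frac{C^k}{k!} = \frac{C^K}{K!}\cdot\frac{C}{K+1}\cdot\frac{C}{K+2}\cdots\frac{C}{k} < \frac{C^K}{K!}\cdot\frac12\cdot\frac12\cdots\frac12 = \frac{C^K}{K!}\left(\frac12\right)^{k-K}.$$

But $C^K/K!$ is a fixed number, and $(\frac12)^{k-K} \to 0$ as $k \to \infty$.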
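The estimate in Example 5 can be compared with actual values. In this Python sketch the helper `ratio_and_bound` and the choices $C = 10$, $K = 21$ are ours, for illustration only; each computed $C^k/k!$ should lie below the bound $(C^K/K!)(\frac12)^{k-K}$.

```python
import math

def ratio_and_bound(C: float, k: int, K: int) -> tuple[float, float]:
    """Return C^k/k! and the bound (C^K/K!)(1/2)^(k-K) from Example 5; assumes K > 2C and k > K."""
    term = C**k / math.factorial(k)
    bound = C**K / math.factorial(K) * 0.5 ** (k - K)
    return term, bound

C, K = 10.0, 21  # K is an integer with K > 2C
for k in (25, 40, 80):
    term, bound = ratio_and_bound(C, k, K)
    assert term < bound
    print(f"k={k}: C^k/k! = {term:.3e}, bound = {bound:.3e}")
```

The bound is crude (each factor $C/(K+j)$ is replaced by $\frac12$), but it already forces the terms to 0.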

Sequential convergence is often a useful tool in studying questions relating to open and closed sets, continuity, and related matters. The fundamental results are the following two theorems.

1.14 Theorem. Suppose $S \subset \mathbb{R}^n$ and $x \in \mathbb{R}^n$. Then $x$ belongs to the closure of $S$ if and only if there is a sequence of points in $S$ that converges to $x$.

Proof. If $\{x_k\}$ is a sequence in $S$ that converges to $x$, then every neighborhood of $x$ contains elements of $S$, namely $x_k$ where $k$ is sufficiently large, so $x$ is in the closure of $S$. Conversely, suppose $x$ is in the closure of $S$. If $x$ is in $S$ itself, let $x_k = x$ for all $k$. If not, for each $k$ the ball of radius $1/k$ about $x$ contains points of $S$; pick one and call it $x_k$. In either case, $\{x_k\}$ is a sequence of points in $S$ that converges to $x$. □

1.15 Theorem. Given $S \subset \mathbb{R}^n$, $a \in S$, and $f: S \to \mathbb{R}^m$, the following are equivalent:
a. $f$ is continuous at $a$.
b. For any sequence $\{x_k\}$ in $S$ that converges to $a$, the sequence $\{f(x_k)\}$ converges to $f(a)$.

Proof. Suppose $f$ is continuous at $a$ and $x_k \to a$. Given $\epsilon > 0$, we wish to show

sequence $\{x_k\}$ then converges to $a$, but the sequence $\{f(x_k)\}$ does not converge to $f(a)$.
We have shown that if (a) is true then (b) is true, and that if (a) is false then (b) is false, so the proof is complete. □

EXERCISES

1. For each of the following sequences $\{x_k\}$, find the limit or show that the sequence diverges.

y/2k + 1 sin A; kit

%k ~ 2y/k + 1' k 3 '
2. Let $x_k = \dfrac{3k + 4}{k - 5}$; then $\lim_{k\to\infty} x_k = 3$. Given $\epsilon > 0$, find an integer $K$ so that $|x_k - 3| < \epsilon$ whenever $k > K$.

3. Define a sequence $\{x_k\}$ recursively by $x_1 = 1$ and $x_{k+1} = kx_k/(k + 1)$ for $k \ge 1$. Find an explicit formula for $x_k$. What is $\lim_{k\to\infty} x_k$?
4. Let $\{x_k\}$ and $\{y_k\}$ be sequences in $\mathbb{R}$ such that $x_k \to a$ and $y_k \to b$. Show that $x_k + y_k \to a + b$ and $x_ky_k \to ab$. (Use Theorems 1.10 and 1.15.)
5. Given $f: \mathbb{R}^n \to \mathbb{R}^m$, show that $\lim_{x\to a} f(x) = L$ if and only if $f(x_k) \to L$ for every sequence $\{x_k\}$ that converges to $a$. (Adapt the proof of Theorem 1.15.)

A point $a \in \mathbb{R}^n$ is called an accumulation point of a set $S \subset \mathbb{R}^n$ if every neighborhood of $a$ contains infinitely many points of $S$. (The point $a$ itself may or may not belong to $S$. Some people use the terms "limit point" or "cluster point" instead of "accumulation point.") For example, the accumulation points of the interval $(-1, 1)$ in $\mathbb{R}$ are the points in the closed interval $[-1, 1]$, and the only accumulation point of the set $\{1, \frac12, \frac13, \frac14, \dots\}$ is 0.

6. Show that $a$ is an accumulation point of $S$ if and only if there is a sequence $\{x_k\}$ of points in $S$, none of which are equal to $a$, such that $x_k \to a$. (Adapt the proof of Theorem 1.14.)

7. Show that the closure of S is the union of S and the set of all its accumulation
points.

1.5 Completeness

The essential properties of the real number system that underlie all the theorems of
calculus are summarized by saying that R is a complete ordered field. We explain
the meaning of these terms one by one:
A field is a set on which the operations of addition, subtraction, multiplication,
and division (by any nonzero number) are defined, subject to all the usual laws of
arithmetic: commutativity, associativity, etc. Besides the real numbers, examples of
fields include the rational numbers and the complex numbers, and there are many
others. (For more precise definitions and more examples, consult a textbook on
abstract algebra such as Birkhoff and Mac Lane [4] or Hungerford [8].)
An ordered field is a field equipped with a binary relation $<$ that is transitive (if $a < b$ and $b < c$, then $a < c$) and antisymmetric (if $a \neq b$, then either $a < b$ or $b < a$, but not both), and interacts with the arithmetic operations in the usual way (if $a < b$, then $a + c < b + c$ for every $c$, and $ac < bc$ whenever $c > 0$). The real number and rational number systems are ordered fields (with the usual meaning of "$<$"), but the complex number system is not.
Finally, completeness is what distinguishes the real numbers from the smaller
ordered fields such as the rational numbers and makes possible the transition from
algebra to calculus; it means that there are "no holes" in the real number line. There
are several equivalent ways of stating the completeness property precisely. The one
we shall use as a starting point is the existence of least upper bounds.
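If $S$ is a subset of $\mathbb{R}$, an upper bound for $S$ is a number $u$ such that $x \le u$ for all $x \in S$, and a lower bound for $S$ is a number $l$ such that $l \le x$ for all $x \in S$.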

The Completeness Axiom. Let $S$ be a nonempty set of real numbers. If $S$ has an upper bound, then $S$ has a least upper bound, called the supremum of $S$ and denoted by $\sup S$. If $S$ has a lower bound, then $S$ has a greatest lower bound, called the infimum of $S$ and denoted by $\inf S$.

If $S$ has no upper bound, we shall define $\sup S$ to be $+\infty$, and if $S$ has no lower bound, we shall define $\inf S$ to be $-\infty$.

Example 1.
a. If $S$ is the interval $(0, 1]$, then $\sup S = 1$ and $\inf S = 0$.
b. If $S = \{1, \frac12, \frac13, \frac14, \dots\}$, then $\sup S = 1$ and $\inf S = 0$.
c. If $S = \{1, 2, 3, 4, \dots\}$, then $\sup S = \infty$ and $\inf S = 1$.
d. If $S$ is the single point $a$, then $\sup S = \inf S = a$.
e. If $S = \{x : x \text{ is rational and } x^2 < 2\}$, then $\sup S = \sqrt2$ and $\inf S = -\sqrt2$. This is an example of a set of rational numbers that has no supremum or infimum within the set of rational numbers.

If $S$ has an upper bound, the number $a = \sup S$ is the unique number such that
i. $x \le a$ for every $x \in S$ and
ii. for every $\epsilon > 0$ there exists $x \in S$ with $x > a - \epsilon$.
(i) expresses the fact that $a$ is an upper bound, whereas (ii) expresses the fact that there is no smaller upper bound. In particular, while $\sup S$ may or may not belong to $S$ itself, it always belongs to the closure of $S$. Similarly for $\inf S$ if $S$ is bounded below.

The completeness of the real number system plays a crucial role in establishing the convergence of numerical sequences. The most basic result along these lines is the following. First, some terminology: A sequence $\{x_n\}$ is called bounded if all the numbers $x_n$ are contained in some bounded interval. A sequence $\{x_n\}$ is called increasing if $x_n \le x_m$ whenever $n < m$, and decreasing if $x_n \ge x_m$ whenever $n < m$. A sequence that is either increasing or decreasing is called monotone (or monotonic).

1.16 Theorem (The Monotone Sequence Theorem). Every bounded monotone sequence in $\mathbb{R}$ is convergent. More precisely, the limit of an increasing (resp. decreasing) sequence is the supremum (resp. infimum) of its set of values.

Proof. Suppose $\{x_k\}$ is a bounded increasing sequence. Let $l$ be the supremum of the set of values $\{x_1, x_2, \dots\}$; I claim that $x_k \to l$. Since $l$ is an upper bound, we have $x_k \le l$ for all $k$. Since $l$ is the least upper bound, for any $\epsilon > 0$ there is some $K$ for which $x_K > l - \epsilon$. Since the $x_k$'s increase with $k$, we also have $x_k > l - \epsilon$ for all $k > K$. Therefore, $l - \epsilon < x_k \le l$ for all $k > K$, and this shows that $x_k \to l$.
Similarly, if $\{x_k\}$ is decreasing, it converges to $\inf\{x_1, x_2, \dots\}$. □

Example 2. Given a positive real number $a$, define a sequence $\{x_k\}$ recursively as follows. $x_1$ is some fixed positive real number, and for $k \ge 2$,

$$x_k = \frac12\left(x_{k-1} + \frac{a}{x_{k-1}}\right).$$

Observe that if $x_{k-1} > 0$ then $x_k > 0$ too; since we assume that $x_1 > 0$, every term of this sequence is positive. (In particular, division by zero is never a problem.) We claim that $x_k \to \sqrt{a}$, no matter what initial $x_1$ is chosen. Indeed, if we assume that the sequence converges to a nonzero limit $L$, by letting $k \to \infty$ in the recursion formula we see that

$$L = \frac12\left(L + \frac{a}{L}\right), \quad\text{or}\quad 2L^2 = L^2 + a,$$

so that $L^2 = a$. Since $x_k > 0$ for every $k$, we must have $L \ge 0$, and hence $L = \sqrt{a}$. But this argument is without force until we know that $\{x_k\}$ converges to a nonzero limit.
To verify this, observe that for $k \ge 2$,

$$x_k^2 = \tfrac14\left(x_{k-1}^2 + 2a + a^2x_{k-1}^{-2}\right) = a + \tfrac14\left(x_{k-1}^2 - 2a + a^2x_{k-1}^{-2}\right) = a + \tfrac14\left(x_{k-1} - ax_{k-1}^{-1}\right)^2 \ge a.$$

Thus, starting with the second term, the sequence $\{x_k\}$ is bounded below by $\sqrt{a} > 0$, and it is decreasing:

$$x_{k+1} - x_k = \tfrac12\left(ax_k^{-1} - x_k\right) \le \tfrac12(x_k - x_k) = 0.$$

The convergence to a limit $L \ge \sqrt{a}$ now follows from the monotone sequence theorem. (The verification that $\{x_k\}$ converges is not just a formality; see Exercise 4.)
The sequence $\{x_k\}$ gives a computationally efficient recursive algorithm for computing square roots.
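As a minimal sketch of that algorithm (the classical Babylonian method, which coincides with Newton's method applied to $x^2 - a = 0$), the recursion can be run directly in Python. The function name `babylonian_sqrt` and the sample values $a = 2$, $x_1 = 5$ are our own choices. Only six terms are computed before checking the bounds proved above, because once the terms agree with $\sqrt{a}$ to machine precision, floating-point rounding can blur strict monotonicity.

```python
import math

def babylonian_sqrt(a: float, x1: float, n_terms: int = 6) -> list[float]:
    """Return x_1, ..., x_n for x_k = (x_{k-1} + a/x_{k-1}) / 2, assuming a > 0 and x_1 > 0."""
    if a <= 0 or x1 <= 0:
        raise ValueError("need a > 0 and x_1 > 0")
    xs = [x1]
    for _ in range(n_terms - 1):
        prev = xs[-1]
        xs.append(0.5 * (prev + a / prev))
    return xs

xs = babylonian_sqrt(2.0, 5.0)
for k, x in enumerate(xs, start=1):
    print(k, x)

root = math.sqrt(2.0)
# From the second term on: bounded below by sqrt(a) and decreasing, as shown above.
assert all(x > root for x in xs[1:])
assert all(later < earlier for earlier, later in zip(xs[1:], xs[2:]))
```

Starting instead from $x_1 = 1 < \sqrt{a}$ with $a = 9$ gives $x_2 = 5 > 3$, illustrating why the monotonicity is only claimed from the second term on.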

The following consequence of the monotone sequence theorem is also a useful technical tool.

1.17 Theorem (The Nested Interval Theorem). Let $I_1 = [a_1, b_1]$, $I_2 = [a_2, b_2]$, ... be a sequence of closed, bounded intervals in $\mathbb{R}$. Suppose that (a) $I_1 \supset I_2 \supset I_3 \supset \cdots$, and (b) the length $b_k - a_k$ of $I_k$ tends to 0 as $k \to \infty$. Then there is exactly one point contained in all of the intervals $I_k$.

Proof. The condition $I_1 \supset I_2 \supset I_3 \supset \cdots$ means that $a_1 \le a_2 \le a_3 \le \cdots$ and $b_1 \ge b_2 \ge b_3 \ge \cdots$, so the sequences $\{a_k\}$ and $\{b_k\}$ are monotone. They are also bounded, since all $a_k$ and $b_k$ are contained in $I_1$; hence, by the monotone sequence theorem, they are both convergent. Moreover, since $b_k - a_k \to 0$, their limits are equal. Call their common limit $l$. Then $a_k \le l \le b_k$ for all $k$, so $l \in I_k$ for all $k$. No other point $l'$ can be common to all $I_k$, for the length of $I_k$ is less than the distance $|l - l'|$ when $k$ is sufficiently large. □

It should be emphasized that the real point of the nested interval theorem is that the intersection $\bigcap_{k=1}^\infty I_k$ is nonempty; the fact that it can contain no more than one point is pretty obvious from the assumption that the length of $I_k$ tends to zero.
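To see the nested interval theorem at work, the following Python sketch (the function name `nested_intervals_sqrt2` and the starting interval $I_1 = [1, 2]$ are our own choices) bisects repeatedly, keeping the half that still brackets $\sqrt2$, in the same spirit as the bisection argument in the proof of Theorem 1.18. All endpoints are exact rational numbers (`fractions.Fraction`), yet the one point common to all the intervals is $\sqrt2$, which is irrational (compare Example 1e). Within the rational numbers alone these intervals would have empty intersection, which is why completeness is essential here.

```python
from fractions import Fraction

def nested_intervals_sqrt2(n: int) -> list[tuple[Fraction, Fraction]]:
    """Return I_1, ..., I_n with I_1 = [1, 2]; each I_{k+1} is the half of I_k
    whose endpoints a, b still satisfy a^2 < 2 < b^2."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(n - 1):
        mid = (a + b) / 2
        if mid * mid < 2:  # sqrt(2) lies in the right half
            a = mid
        else:              # mid^2 > 2 (it is never exactly 2), so keep the left half
            b = mid
        intervals.append((a, b))
    return intervals

ivs = nested_intervals_sqrt2(30)
# (a) each interval lies inside the previous one; (b) the lengths 2^{-(k-1)} tend to 0
for (a0, b0), (a1, b1) in zip(ivs, ivs[1:]):
    assert a0 <= a1 <= b1 <= b0
a30, b30 = ivs[-1]
print(float(a30), float(b30), float(b30 - a30))
```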
If $\{x_k\}$ is a sequence (in any set, not necessarily $\mathbb{R}$), we may form a subsequence of $\{x_k\}$ by deleting some of the terms and keeping the rest in their original order. More precisely, a subsequence of $\{x_k\}$ is a sequence $\{x_{k_j}\}_{j=1}^\infty$ specified by a one-to-one, increasing map $j \mapsto k_j$ from the set of positive integers into itself. For example, by taking $k_j = 2j$ we obtain the subsequence of even-numbered terms; by taking $k_j = j^2$ we obtain the subsequence of those terms whose index is a perfect square, and so on.
The following theorem is one of the most useful results in the foundations of
analysis; it is one version of the Bolzano-Weierstrass theorem, whose general form
will be found in Theorem 1.21.

1.18 Theorem. Every bounded sequence in R has a convergent subsequence.

Proof. Let $\{x_k\}$ be a bounded sequence, say $x_k \in [a,b]$ for all $k$. Bisect the interval $[a,b]$, that is, consider the two intervals $[a, \frac12(a+b)]$ and $[\frac12(a+b), b]$. At least one of these subintervals must contain $x_k$ for infinitely many $k$; call that subinterval $I_1$. (If both of them contain $x_k$ for infinitely many $k$, pick the one on the left.) Now bisect $I_1$. Again, one of the two halves must contain $x_k$ for infinitely many $k$; call that half $I_2$. Proceeding inductively, we obtain a sequence of intervals $I_j$, each one contained in the preceding one, each one half as long as the preceding one, and each one containing $x_k$ for infinitely many $k$. By the nested interval theorem, there is exactly one point $l$ contained in every $I_j$.
It is now easy to construct a subsequence of $\{x_k\}$ that converges to $l$, as follows. Pick an integer $k_1$ such that $x_{k_1} \in I_1$, then pick $k_2 > k_1$ such that $x_{k_2} \in I_2$, then pick $k_3 > k_2$ such that $x_{k_3} \in I_3$, and so forth. By construction of the $I_j$'s, this process can be continued indefinitely. Since $x_{k_j}$ and $l$ are both in $I_j$, and the length of $I_j$ is $2^{-j}(b-a)$, we have $|x_{k_j} - l| \le 2^{-j}(b-a)$, which tends to 0 as $j \to \infty$; that is, $x_{k_j} \to l$. □

Theorem 1.18 generalizes easily to higher dimensions:

1.19 Theorem. Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.

Proof. If $|x_k| \le C$ for all $k$, then the components $x_{k1}, \dots, x_{kn}$ all lie in the interval $[-C, C]$. Hence, for each $m = 1, \dots, n$ we can extract a convergent subsequence from the sequence of $m$th components, $\{x_{km}\}_{k=1}^\infty$. The trouble is that the indices on these subsequences might all be different, so we can't put them together. (We might have chosen the odd-numbered terms for $m = 1$ and the even-numbered terms for $m = 2$, for example.) Instead, we have to proceed inductively. First we choose a subsequence $\{x_{k_j}\}$ such that the first components converge; then we choose a sub-subsequence $\{x_{k_{j_i}}\}$ whose second components also converge, and so on until we find a (sub)$^n$sequence whose components all converge. □

Another way to express the completeness of the real number system is to say that every sequence whose terms get closer and closer to each other actually converges. To be more precise, a sequence $\{x_k\}$ in $\mathbb{R}^n$ is called a Cauchy sequence if $|x_k - x_j| \to 0$ as $k, j \to \infty$, that is, if for every $\epsilon > 0$ there exists an integer $K$ such that $|x_k - x_j| < \epsilon$ whenever $k > K$ and $j > K$.

1.20 Theorem. A sequence $\{x_k\}$ in $\mathbb{R}^n$ is convergent if and only if it is Cauchy.

Proof. Suppose $x_k \to L$. Since $x_k - x_j = (x_k - L) - (x_j - L)$, we have $0 \le |x_k - x_j| \le |x_k - L| + |x_j - L|$. Both terms on the right tend to zero as $k, j \to \infty$; hence so does $|x_k - x_j|$. Thus $\{x_k\}$ is Cauchy.
Now suppose $\{x_k\}$ is Cauchy. Taking $\epsilon = 1$ in the definition of "Cauchy," we see that there is an integer $K$ such that $|x_k - x_j| < 1$ if $k, j > K$. Then $|x_k| < |x_{K+1}| + 1$ for all $k > K$, and it follows that the sequence $\{x_k\}$ is bounded. By Theorem 1.19, there is a subsequence $\{x_{k_j}\}$ that converges to a limit $L$. But then since $\{x_k\}$ is Cauchy, the whole sequence must also converge to $L$. Indeed, given $\epsilon > 0$, there is an integer $J$ such that $|x_{k_j} - L| < \frac12\epsilon$ if $j > J$, and there is an integer $K$ such that $|x_k - x_m| < \frac12\epsilon$ if $k, m > K$. Pick an integer $j > J$ such that $k_j > K$; then for $k > K$ we have

$$|x_k - L| \le |x_k - x_{k_j}| + |x_{k_j} - L| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$

Therefore, $x_k \to L$. □

EXERCISES

1. Find $\sup S$ and $\inf S$ for the following sets $S$. Do these numbers belong to $S$ or not?
a. $S = \{x : (2x^2 - 1)(x^2 - 1) < 0\}$.
b. $S = \{(-1)^k + 2^{-k} : k > 0\}$.
c. $S = \{x : \arctan x > 1\}$.
2. Construct a sequence {xk} that has subsequences converging to three different
limits.

3. Consider the sequence $\frac12, \frac13, \frac23, \frac14, \frac24, \frac34, \frac15, \frac25, \frac35, \frac45, \dots$ obtained by listing the rational numbers in $(0,1)$ with denominator $n$ in increasing order, for $n$ successively equal to $2, 3, 4, \dots$. Show that for any $a \in [0,1]$, there is a subsequence that converges to $a$. (Hint: Consider the decimal expansion of $a$.)

4. Given a real number $a$, define a sequence $\{x_k\}$ recursively by $x_1 = a$, $x_{k+1} = x_k^2$.
a. Show, as in Example 2, that if $\{x_k\}$ converges, its limit must be 0 or 1.
b. For which $a$ is the limit equal to 0? equal to 1? nonexistent?

5. Define a sequence $\{x_k\}$ recursively by $x_1 = \sqrt2$, $x_{k+1} = \sqrt{2 + x_k}$. Show by induction that (a) $x_k < 2$ and (b) $x_k < x_{k+1}$ for all $k$. Then show that $\lim x_k$ exists and evaluate it.

6. Let $r_k$ be the ratio of the $(k+1)$th term to the $k$th term of the Fibonacci sequence (Example 2, §1.4). (Thus the first few $r_k$'s are $1, 2, \frac32, \frac53, \dots$.) Our object is to show that $\lim_{k\to\infty} r_k$ is the "golden ratio" $\varphi = \frac12(1 + \sqrt5)$, the positive root of the equation $x^2 = x + 1$.
a. Show that
$$r_{k+2} = \frac{2r_k + 1}{r_k + 1}.$$
b. Show that $r_k < \varphi$ if $k$ is odd and $r_k > \varphi$ if $k$ is even. Then show that $r_{k+2} - r_k$ is positive if $k$ is odd and negative if $k$ is even. (Hint: For $x > 0$ we have $x^2 < x + 1$ if $x < \varphi$ and $x^2 > x + 1$ if $x > \varphi$.)
c. Show that the subsequences $\{r_{2j-1}\}$ and $\{r_{2j}\}$ of odd- and even-numbered terms both converge to $\varphi$.

7. Let $\{x_k\}$ be a sequence in $\mathbb{R}^n$ and $a$ a point in $\mathbb{R}^n$. Show that some subsequence of $\{x_k\}$ converges to $a$ if and only if every ball centered at $a$ contains $x_k$ for infinitely many values of $k$.

8. Show that every infinite bounded set in $\mathbb{R}^n$ has an accumulation point. (See Exercises 6-7 in §1.4.)

Let $\{x_k\}_1^\infty$ be a bounded sequence in $\mathbb{R}$. For $m = 1, 2, 3, \dots$, let

$$Y_m = \sup\{x_m, x_{m+1}, x_{m+2}, \dots\}, \qquad y_m = \inf\{x_m, x_{m+1}, x_{m+2}, \dots\}.$$

Then the sequence $\{Y_m\}$ is bounded and decreasing, and $\{y_m\}$ is bounded and increasing (because the sup and inf are being taken over fewer and fewer numbers as $m$ increases), so they both converge. The limits $\lim Y_m$ and $\lim y_m$ are called the limit superior and limit inferior of the sequence $\{x_k\}$, respectively; they are denoted by $\limsup_{k\to\infty} x_k$ and $\liminf_{k\to\infty} x_k$:

$$\limsup_{k\to\infty} x_k = \lim_{m\to\infty}\left(\sup\{x_k : k \ge m\}\right), \qquad \liminf_{k\to\infty} x_k = \lim_{m\to\infty}\left(\inf\{x_k : k \ge m\}\right).$$
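For a concrete sequence the tail suprema $Y_m$ and infima $y_m$ can be tabulated. The Python sketch below (the helper `tail_sups_infs` is hypothetical) works with a finite list, so its values only approximate the tails of an infinite sequence. It uses $x_k = (-1)^k(1 + 1/k)$, for which $\limsup x_k = 1$ and $\liminf x_k = -1$ although no term equals $\pm1$.

```python
def tail_sups_infs(xs: list[float]) -> tuple[list[float], list[float]]:
    """For a finite list xs, return Y[m] = max(xs[m:]) and y[m] = min(xs[m:]) (0-based m).

    For a finite list the sup and inf of each tail are just max and min; for an
    infinite sequence these values only approximate Y_m and y_m.
    """
    Y, y = [], []
    hi, lo = float("-inf"), float("inf")
    for x in reversed(xs):  # sweep from the end so each tail costs O(1) extra work
        hi, lo = max(hi, x), min(lo, x)
        Y.append(hi)
        y.append(lo)
    return Y[::-1], y[::-1]

# x_k = (-1)^k (1 + 1/k) for k = 1, ..., 2000; list index m corresponds to k = m + 1.
xs = [(-1) ** k * (1 + 1 / k) for k in range(1, 2001)]
Y, y = tail_sups_infs(xs)
print(Y[999], y[999])  # tail k >= 1000: max is x_1000 = 1.001, min is x_1001 (about -1.000999)
```

As in the text, the computed $Y$ values decrease and the $y$ values increase as $m$ grows.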

The following exercises pertain to these ideas.

9. Show that $\limsup x_k$ is the number $a$ uniquely specified by the following property: For any $\epsilon > 0$, there are infinitely many $k$ for which $x_k > a - \epsilon$ but only finitely many for which $x_k > a + \epsilon$. What is the corresponding condition for $\liminf x_k$?

10. Show that there is a subsequence of $\{x_k\}$ that converges to $\limsup x_k$, and one that converges to $\liminf x_k$.

11. Show that if $a \in \mathbb{R}$ is the limit of some subsequence of $\{x_k\}$, then $\liminf x_k \le a \le \limsup x_k$.

12. Show that $\{x_k\}$ converges if and only if $\limsup x_k = \liminf x_k$, in which case this common value is equal to $\lim x_k$.

1.6 Compactness

A subset of $\mathbb{R}^n$ is called compact if it is both closed and bounded. (Note: The notion of compactness can be extended to settings other than $\mathbb{R}^n$, but a different definition must be adopted; see the concluding paragraph of this section.) Compactness is an important property, principally because it yields existence theorems for limits in many situations. The fundamental result is the following theorem.

1.21 Theorem (The Bolzano-Weierstrass Theorem). If $S$ is a subset of $\mathbb{R}^n$, the following are equivalent:
a. $S$ is compact.
b. Every sequence of points in $S$ has a convergent subsequence whose limit lies in $S$.

Proof. Suppose $S$ is compact. If $\{x_k\}$ is a sequence in $S$, it has a convergent subsequence by Theorem 1.19 since $S$ is bounded, and the limit lies in $S$ by Theorem 1.14 since $S$ is closed; thus (b) holds.
On the other hand, suppose $S$ is not compact, i.e., $S$ is either not closed or not bounded. If $S$ is not bounded, there is a sequence of points $\{x_k\}$ in $S$ such that $|x_k| \to \infty$. But then $\{x_k\}$ has no convergent subsequence, as any subsequence must also satisfy $|x_{k_j}| \to \infty$. If $S$ is not closed, there is a point $x$ that lies in the closure $\overline{S}$ but not in $S$. By Theorem 1.14 there is a sequence $\{x_k\}$ in $S$ that converges to $x$. Every subsequence also converges to $x$, which is not in $S$. Thus (b) is false if $S$ is either not closed or not bounded. □

Remark. Every finite subset of $\mathbb{R}^n$ is obviously compact. If $S$ is finite, (b) is true because if $\{x_k\}$ is a sequence in $S$, then there must be a single point $x \in S$ such that $x_k = x$ for infinitely many $k$; the subsequence consisting of those $x_k$'s trivially converges to $x$.

The Bolzano-Weierstrass theorem paves the way to the fundamental connection between continuity and compactness:

1.22 Theorem. Continuous functions map compact sets to compact sets. That is, suppose that $S$ is a compact subset of $\mathbb{R}^n$ and $f: S \to \mathbb{R}^m$ is continuous at every point of $S$. Then the set

$$f(S) = \{f(x) : x \in S\}$$

is also compact.

Proof. Suppose $\{y_k\}$ is a sequence in the image $f(S)$. For each $k$ there is a point $x_k \in S$ such that $y_k = f(x_k)$. Since $S$ is compact, by the Bolzano-Weierstrass theorem the sequence $\{x_k\}$ has a convergent subsequence $\{x_{k_j}\}$ whose limit $a$ lies in $S$. Since $f$ is continuous at $a$, by Theorem 1.15 the sequence $\{y_{k_j}\} = \{f(x_{k_j})\}$ converges to the point $f(a) \in f(S)$. Thus, every sequence in $f(S)$ has a subsequence whose limit lies in $f(S)$. By the Bolzano-Weierstrass theorem again, $f(S)$ is compact. □

It is not true, in general, that continuous functions map closed sets to closed
sets, or bounded sets to bounded sets. (See Exercises 1-2.) Only the combination
of closedness and boundedness is preserved.
An immediate consequence of Theorem 1.22 is the fundamental existence theorem for maxima and minima of real-valued functions.

1.23 Corollary (The Extreme Value Theorem). Suppose $S \subset \mathbb{R}^n$ is compact and $f : S \to \mathbb{R}$ is continuous. Then $f$ has an absolute minimum value and an absolute maximum value on $S$; that is, there exist points $\mathbf{a}, \mathbf{b} \in S$ such that $f(\mathbf{a}) \le f(\mathbf{x}) \le f(\mathbf{b})$ for all $\mathbf{x} \in S$.

Proof. By Theorem 1.22, the set $f(S)$ is a compact subset of $\mathbb{R}$. Thus, it is bounded, so $\inf f(S)$ and $\sup f(S)$ exist, and closed, so $\inf f(S)$ and $\sup f(S)$ actually belong to $f(S)$. But this says precisely that the set of values of $f$ on $S$ has a smallest and a largest element, as desired. □

The assumption that $S$ is compact is necessary. If $S$ is not closed or not bounded, the function $f$ might be unbounded, or its extreme values might occur at points on the boundary of $S$ that are not in $S$ or "at infinity." Here are a few simple counterexamples with $n = 1$:

• $f(x) = x$, $S = (0,1)$. (The extreme values occur on the boundary.)

• $f(x) = \cot \pi x$, $S = (0,1)$. (The values of $f$ range from $-\infty$ to $\infty$.)

• $f(x) = \arctan x$, $S = \mathbb{R}$. ($f$ approaches but does not achieve the extreme values $\pm\frac{1}{2}\pi$.)

• $f(x) = 3x - x^3$, $S = \mathbb{R}$. ($f$ has a local maximum at $x = 1$ and a local minimum at $x = -1$, but no absolute maximum or minimum.)
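A quick numerical look at the last counterexample may help. The Python sketch below samples $f(x) = 3x - x^3$ on a grid (the helper `grid_max` and the grid size are illustrative choices, not from the text). On the compact interval $[-2, 2]$ the sampled maximum is the value 2, attained at $x = -2$ and $x = 1$, while on $\mathbb{R}$ the values $f(-t) = t^3 - 3t$ grow without bound. Sampling can only suggest such facts; it proves nothing.

```python
def f(x):
    """The counterexample f(x) = 3x - x^3."""
    return 3 * x - x ** 3


def grid_max(func, lo, hi, n=4000):
    """Largest value of func over n + 1 equally spaced points of [lo, hi]."""
    points = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return max(func(x) for x in points)


# On the compact interval [-2, 2] the maximum value 2 is attained (at x = -2 and x = 1).
print(grid_max(f, -2.0, 2.0))  # 2.0

# On all of R there is no absolute maximum: f(-t) = t^3 - 3t is unbounded.
print([f(-t) for t in (3, 10, 100)])  # [18, 970, 999700]
```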

Compactness also has another consequence that turns out to be extremely useful
in more advanced mathematical analysis, although its significance may not be very
clear at first sight. (It will not be used elsewhere in this book except in some of the
technical arguments in Appendix B, so it may be regarded as an optional topic.)
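Suppose $S$ is a subset of $\mathbb{R}^n$. A collection $\mathcal{U}$ of subsets of $\mathbb{R}^n$ is called a covering of $S$ if $S$ is contained in the union of the sets in $\mathcal{U}$. For example, for each $\mathbf{x} \in S$ we could pick an open ball $B_{\mathbf{x}}$ centered at $\mathbf{x}$; then $\mathcal{U} = \{B_{\mathbf{x}} : \mathbf{x} \in S\}$ is a covering of $S$.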

1.24 Theorem (The Heine-Borel Theorem). If $S$ is a subset of $\mathbb{R}^n$, the following are equivalent:
a. $S$ is compact.
b. If $\mathcal{U}$ is any covering of $S$ by open sets, there is a finite subcollection of $\mathcal{U}$ that still forms a covering of $S$. (In brief: Every open covering of $S$ has a finite subcovering.)

Proof. The proof is given in Appendix B.1 (Theorem B.1). □
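To see condition (b) fail for a set that is not closed, consider a standard example (not taken from the text): $S = (0,1)$ with the open covering $\mathcal{U} = \{U_k : k \ge 2\}$, $U_k = (1/k, 1)$. It is a covering because each $x \in (0,1)$ lies in $U_k$ once $k > 1/x$. But any finite subcollection has a largest index $K$ and union $(1/K, 1)$, so it misses the point $1/(2K) \in S$. The Python sketch below checks this in exact rational arithmetic; the helper names `in_U` and `missed_point` are hypothetical.

```python
from fractions import Fraction


def in_U(k, x):
    """Membership in the open interval U_k = (1/k, 1)."""
    return Fraction(1, k) < x < 1


def missed_point(indices):
    """Given finitely many indices k >= 2, return a point of S = (0, 1)
    lying in none of the chosen sets U_k."""
    K = max(indices)
    x = Fraction(1, 2 * K)
    # Sanity check: x is in S, and 1/k >= 1/K > x for every chosen k.
    assert 0 < x < 1 and not any(in_U(k, x) for k in indices)
    return x


print(missed_point({2, 5, 17}))      # 1/34
print(missed_point(range(2, 1000)))  # 1/1998
```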

Much of what we have done in this section and the preceding ones can be generalized from subsets of $\mathbb{R}^n$ to subsets of more general spaces equipped with a "distance function" that behaves more or less like the Euclidean distance $d(\mathbf{x}, \mathbf{y}) = |\mathbf{x} - \mathbf{y}|$. (Such spaces are known as metric spaces; see DePree and Swartz [5], Krantz [12], or Rudin [18].) For example, in studying the geometry of a surface $S$ in $\mathbb{R}^3$, one might want to take the "distance" between two points $\mathbf{x}, \mathbf{y} \in S$ to be not the straight-line distance $|\mathbf{x} - \mathbf{y}|$ but the length of the shortest curve on $S$ that joins $\mathbf{x}$ to $\mathbf{y}$. Another class of examples is provided by spaces of functions, where the "distance" between two functions $f$ and $g$ can be measured in a number of different ways; we shall say more about this in Chapter 8. In this general setting, the Bolzano-Weierstrass and Heine-Borel theorems are no longer completely valid. The conditions on a set $S$ in Theorem 1.21b and Theorem 1.24b still imply that $S$ is closed and bounded, but not conversely. These conditions are still very important, however, so a shift in terminology is called for. The condition in Theorem 1.24b (that every open cover of $S$ has a finite subcover) is usually taken as the definition of compactness in the general setting, and the condition in Theorem 1.21b (that every sequence in $S$ has a subsequence that converges in $S$) is called sequential compactness.

EXERCISES

1. Give an example of
a. a closed set $S \subset \mathbb{R}$ and a continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f(S)$ is not closed;
b. an open set $U \subset \mathbb{R}$ and a continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f(U)$ is not open.

2. a. Give an example of a bounded set $S \subset \mathbb{R} \setminus \{0\}$ and a real-valued function $f$ that is defined and continuous on $\mathbb{R} \setminus \{0\}$ such that $f(S)$ is not bounded.
b. However, show that if $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is continuous everywhere and $S \subset \mathbb{R}^n$ is bounded, then $\mathbf{f}(S)$ is bounded.
3. Show that an infinite set $S \subset \mathbb{R}^n$ is compact if and only if every infinite subset of $S$ has an accumulation point that lies in $S$. (See Exercises 6-7 in §1.4 and Exercise 8 in §1.5.)

4. Suppose $S \subset \mathbb{R}^n$ is compact, $f : S \to \mathbb{R}$ is continuous, and $f(\mathbf{x}) > 0$ for every $\mathbf{x} \in S$. Show that there is a number $c > 0$ such that $f(\mathbf{x}) \ge c$ for every $\mathbf{x} \in S$.

5. (A generalization of the nested interval theorem) Suppose $\{S_k\}$ is a sequence of nonempty compact subsets of $\mathbb{R}^n$ such that $S_1 \supset S_2 \supset S_3 \supset \cdots$. Show that there is at least one point contained in all of the $S_k$'s (that is, $\bigcap_k S_k \ne \emptyset$). (This can be done using either the Bolzano-Weierstrass theorem or the Heine-Borel theorem. Can you find both proofs?)

6. The distance between two sets $U, V \subset \mathbb{R}^n$ is defined to be
$$d(U, V) = \inf\{|\mathbf{x} - \mathbf{y}| : \mathbf{x} \in U,\ \mathbf{y} \in V\}.$$
a. Show that $d(U, V) = 0$ if either of the sets $U, V$ contains a point in the closure of the other one.
b. Show that if $U$ is compact, $V$ is closed, and $U \cap V = \emptyset$, then $d(U, V) > 0$.
c. Give an example of two closed sets $U$ and $V$ in $\mathbb{R}^2$ that have no point in common but satisfy $d(U, V) = 0$.

1.7 Connectedness

A set in $\mathbb{R}^n$ is said to be connected if it is "all in one piece," that is, if it is not the union of two nonempty subsets that do not touch each other. The formal definition is as follows: A set $S \subset \mathbb{R}^n$ is disconnected if it is the union of two nonempty subsets $S_1$ and $S_2$, neither of which intersects the closure of the other one; in this case we shall call the pair $(S_1, S_2)$ a disconnection of $S$. The set $S$ is connected if it is not disconnected.

Figure 1.2: The sets $S$ and $T$ in Example 1.

Example 1. Let
$$S_1 = \{(x,y) : (x+1)^2 + y^2 < 1\}, \quad S_2 = \{(x,y) : (x-1)^2 + y^2 < 1\}, \quad S_3 = \{(x,y) : (x-1)^2 + y^2 \le 1\}.$$
Then the set $S = S_1 \cup S_2$ is disconnected, for the only point common to the closures of $S_1$ and $S_2$ is $(0,0)$, which belongs to neither $S_1$ nor $S_2$. However, the set $T = S_1 \cup S_3$ is connected, for $(0,0)$ belongs both to $S_3$ and to the closure of $S_1$; this point "connects" the two pieces of $T$. See Figure 1.2.

The connected subsets of the real line are easy to describe.

1.25 Theorem. The connected subsets of $\mathbb{R}$ are precisely the intervals (open, half-open, or closed; bounded or unbounded).

Proof. If $S \subset \mathbb{R}$ is not an interval, there exist $a, b \in S$ and $c \notin S$ such that $a < c < b$. Let $S_1 = S \cap (-\infty, c)$ and $S_2 = S \cap (c, \infty)$. Then $S = S_1 \cup S_2$ (since $c \notin S$), and $S_1$ and $S_2$ are nonempty since $a \in S_1$ and $b \in S_2$. The closures of $S_1$ and $S_2$ are contained in $(-\infty, c]$ and $[c, \infty)$, so the only point where they can intersect is $c$, which is not in either $S_1$ or $S_2$. Thus $S$ is disconnected.
Conversely, suppose $S$ is an interval. We shall suppose that $S$ is disconnected and derive a contradiction.
We first consider the case where $S$ is compact, say $S = [a, b]$. Suppose $(S_1, S_2)$ is a disconnection of $S$. By relabeling if necessary, we take $S_2$ to be the set that contains $b$. Let $c = \sup S_1$. Then $c$ belongs to the closure of $S_1$, so it cannot be in $S_2$; hence $c \in S_1$. In particular, $c \ne b$. But then the interval $(c, b]$ is included in $S_2$ (every point of $S$ greater than $c$ lies outside $S_1$), and $c$ is in the closure of this interval; so $c$ is in the closure of $S_2$ and so cannot belong to $S_1$. This contradiction shows that $S$ must be connected.

Finally, suppose $S$ is a noncompact interval and $(S_1, S_2)$ is a disconnection of $S$. Pick $a \in S_1$ and $b \in S_2$, interchanging the roles of $S_1$ and $S_2$ if necessary so that $a < b$; then $[a, b] \subset S$ since $S$ is an interval. But then $[a, b] = T_1 \cup T_2$ where $T_1 = [a, b] \cap S_1$ and $T_2 = [a, b] \cap S_2$. The sets $T_1$ and $T_2$ are nonempty ($a \in T_1$ and $b \in T_2$), and they are contained in $S_1$ and $S_2$, so neither one can intersect the closure of the other. But this means that $[a, b]$ is disconnected, which we have just proved to be false. Therefore, $S$ is connected. □

The following result, a cousin of Theorem 1.22, gives the basic relation between
continuity and connectedness:

1.26 Theorem. Continuous functions map connected sets to connected sets. That is, suppose $\mathbf{f} : S \to \mathbb{R}^m$ is continuous at every point of $S$ and $S$ is connected. Then the set
$$\mathbf{f}(S) = \{\mathbf{f}(\mathbf{x}) : \mathbf{x} \in S\}$$
is also connected.

Proof. We proceed by contraposition; that is, we assume that $\mathbf{f}(S)$ is disconnected and deduce that $S$ is disconnected. Thus, suppose that $(U_1, U_2)$ is a disconnection of $\mathbf{f}(S)$. Let
$$S_1 = \{\mathbf{x} \in S : \mathbf{f}(\mathbf{x}) \in U_1\}, \qquad S_2 = \{\mathbf{x} \in S : \mathbf{f}(\mathbf{x}) \in U_2\}.$$
Then $S_1$ and $S_2$ are nonempty, and their union is $S$. If there were a point $\mathbf{x} \in S_1$ belonging to the closure of $S_2$, $\mathbf{x}$ would be the limit of a sequence $\{\mathbf{x}_k\}$ in $S_2$ by Theorem 1.14. But then $\mathbf{f}(\mathbf{x}) \in U_1$ and $\mathbf{f}(\mathbf{x}_k) \in U_2$, so $\mathbf{f}(\mathbf{x}) = \lim \mathbf{f}(\mathbf{x}_k)$ (by the continuity of $\mathbf{f}$ at $\mathbf{x}$) would be in the closure of $U_2$ by Theorem 1.14 again. This is impossible; hence $S_1$ does not intersect the closure of $S_2$, and likewise, $S_2$ does not intersect the closure of $S_1$. Thus $S = S_1 \cup S_2$ is disconnected. □

1.27 Corollary (The Intermediate Value Theorem). Suppose $f : S \to \mathbb{R}$ is continuous at every point of $S$ and $V \subset S$ is connected. If $\mathbf{a}, \mathbf{b} \in V$ and $f(\mathbf{a}) < t < f(\mathbf{b})$ or $f(\mathbf{b}) < t < f(\mathbf{a})$, there is a point $\mathbf{c} \in V$ such that $f(\mathbf{c}) = t$.

Proof. By Theorems 1.25 and 1.26, $f(V)$ is an interval. It contains $f(\mathbf{a})$ and $f(\mathbf{b})$ and hence contains the entire interval between them. □
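The intermediate value theorem only asserts that a point $c$ exists. When $V$ is an interval $[a, b]$ of real numbers, a standard way to locate such a point numerically is bisection: repeatedly halve an interval at whose endpoints $f - t$ has opposite signs. The Python sketch below assumes $f$ is continuous on $[a, b]$ and $f(a) < t < f(b)$; the function name `bisect_for_value` and the tolerance are illustrative choices, not part of the text.

```python
def bisect_for_value(f, a, b, t, tol=1e-12):
    """Approximate a point c in [a, b] with f(c) = t.

    Assumes f is continuous on [a, b] and f(a) < t < f(b).
    Invariant: f(lo) < t <= f(hi), so by the intermediate value
    theorem [lo, hi] always contains a solution.
    """
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < t:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example: f(x) = x^3 on [1, 2] takes the value t = 2 at c = 2 ** (1/3).
c = bisect_for_value(lambda x: x ** 3, 1.0, 2.0, 2.0)
print(c)  # approximately 1.259921
```

If instead $f(a) > t > f(b)$, the same sketch can be applied to $-f$ and $-t$.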

There is another notion of connectedness that is important in many situations. A set $S \subset \mathbb{R}^n$ is called arcwise connected (or pathwise connected) if any two points in $S$ can be joined by a continuous curve in $S$, that is, if for any $\mathbf{a}, \mathbf{b}$ in $S$ there is a continuous map $\mathbf{f} : [0,1] \to \mathbb{R}^n$ such that $\mathbf{f}(0) = \mathbf{a}$, $\mathbf{f}(1) = \mathbf{b}$, and $\mathbf{f}(t) \in S$ for all $t \in [0,1]$.

Figure 1.3: The set defined in (1.29).

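It is useful to observe that the relation of being joined by a continuous curve is transitive; that is, if there is a continuous curve in $S$ from $\mathbf{a}$ to $\mathbf{b}$, and one from $\mathbf{b}$ to $\mathbf{c}$, then there is one from $\mathbf{a}$ to $\mathbf{c}$. Namely, if $\mathbf{f} : [0,1] \to S$ and $\mathbf{g} : [0,1] \to S$ are continuous maps with $\mathbf{f}(0) = \mathbf{a}$, $\mathbf{f}(1) = \mathbf{g}(0) = \mathbf{b}$, and $\mathbf{g}(1) = \mathbf{c}$, we obtain a continuous $\mathbf{h} : [0,1] \to S$ by gluing $\mathbf{f}$ and $\mathbf{g}$ together:
$$\mathbf{h}(t) = \begin{cases} \mathbf{f}(2t) & \text{if } 0 \le t \le \tfrac12, \\ \mathbf{g}(2t - 1) & \text{if } \tfrac12 \le t \le 1. \end{cases}$$
(The two formulas agree at $t = \tfrac12$, where both give $\mathbf{b}$, so $\mathbf{h}$ is continuous.)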
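The gluing construction translates directly into code. In the Python sketch below, a curve is modeled (an assumption for illustration) as a function from $[0,1]$ to points of the plane given as tuples. The hypothetical helper `glue` returns $\mathbf{h}(t) = \mathbf{f}(2t)$ for $t \le \tfrac12$ and $\mathbf{h}(t) = \mathbf{g}(2t - 1)$ for $t \ge \tfrac12$. The straight segments used as test curves are chosen only for convenience.

```python
def glue(f, g):
    """Concatenate curves f, g: [0, 1] -> S with f(1) == g(0).

    h runs through f at double speed on [0, 1/2] and through g on [1/2, 1].
    At t = 1/2 both formulas give f(1) = g(0), so h is continuous.
    """
    def h(t):
        if t <= 0.5:
            return f(2 * t)
        return g(2 * t - 1)
    return h


def segment(p, q):
    """Straight-line curve from p to q in the plane."""
    return lambda t: (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 2.0)
h = glue(segment(a, b), segment(b, c))
print(h(0.0), h(0.5), h(1.0))  # (0.0, 0.0) (1.0, 0.0) (1.0, 2.0)
```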

The following results explain the relation between connectedness and arcwise
connectedness.

1.28 Theorem. If $S \subset \mathbb{R}^n$ is arcwise connected, then $S$ is connected.

Proof. We shall assume that $S$ is disconnected and show that it is not arcwise connected. Accordingly, suppose $(S_1, S_2)$ is a disconnection of $S$. Pick $\mathbf{a} \in S_1$ and $\mathbf{b} \in S_2$; we claim that there is no continuous $\mathbf{g} : [0,1] \to S$ such that $\mathbf{g}(0) = \mathbf{a}$ and $\mathbf{g}(1) = \mathbf{b}$. If there were, the set $V = \mathbf{g}([0,1])$ would be connected by Theorems 1.25 and 1.26. But this cannot be so: $V$ is the union of $V \cap S_1$ and $V \cap S_2$; these sets are nonempty since $\mathbf{a} \in V \cap S_1$ and $\mathbf{b} \in V \cap S_2$, and neither of them intersects the closure of the other. Hence $S$ is not arcwise connected. □

The converse of Theorem 1.28 is false: A set can be connected without being arcwise connected. A typical example is
$$S = \{(x, y) : 0 < x \le 2 \text{ and } y = \sin(\pi/x)\} \cup \{(0, y) : y \in [-1, 1]\}, \tag{1.29}$$
pictured in Figure 1.3. $S$ consists of two pieces, the graph of $\sin(\pi/x)$ and the vertical line segment. These two sets do not form a disconnection of $S$, as the line segment is included in the closure of the graph, but a point on the line segment cannot be connected to a point on the graph by a continuous curve. The details are sketched in Exercise 11.
However, open connected sets are arcwise connected:

1.30 Theorem. If $S \subset \mathbb{R}^n$ is open and connected, then $S$ is arcwise connected.

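Proof. Fix a point $\mathbf{a} \in S$. Let $S_1$ be the set of points in $S$ that can be joined to $\mathbf{a}$ by a continuous curve in $S$, and let $S_2$ be the set of points in $S$ that cannot; thus $S_1$ and $S_2$ are disjoint and $S = S_1 \cup S_2$. We shall show that
a. if $\mathbf{x} \in S_1$, then all points sufficiently close to $\mathbf{x}$ are in $S_1$;
b. if $\mathbf{x} \in S$ is in the closure of $S_1$, then $\mathbf{x} \in S_1$.
(a) shows that no point of $S_1$ can be in the closure of $S_2$, and (b) shows that no point in the closure of $S_1$ can be in $S_2$. Since $S_1$ is nonempty (it contains $\mathbf{a}$), $(S_1, S_2)$ will form a disconnection of $S$, contrary to the assumption that $S$ is connected, unless $S_2$ is empty. But if $S_2$ is empty, every point of $S$ can be joined to $\mathbf{a}$, so by the transitivity observed above any two points of $S$ can be joined to each other; that is, $S$ is arcwise connected.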
To prove (a) and (b), we use the fact that $S$ is open, so that if $\mathbf{x} \in S$, there is a ball $B$ centered at $\mathbf{x}$ that is included in $S$. If $\mathbf{x} \in S_1$, then every $\mathbf{y} \in B$ is also in $S_1$, for $\mathbf{y}$ can be joined to $\mathbf{a}$ by first joining $\mathbf{x}$ to $\mathbf{a}$ and then joining $\mathbf{y}$ to $\mathbf{x}$ by the straight line segment from $\mathbf{x}$ to $\mathbf{y}$, which lies in $B$ and hence in $S$. Similarly, if $\mathbf{x}$ is in the closure of $S_1$, by Theorem 1.14 there is a sequence $\{\mathbf{x}_k\}$ of points in $S_1$ that converges to $\mathbf{x}$. We have $\mathbf{x}_k \in B$ for $k$ sufficiently large, so again, $\mathbf{x}$ can be joined to $\mathbf{a}$ by joining $\mathbf{x}_k$ to $\mathbf{a}$ and then joining $\mathbf{x}$ to $\mathbf{x}_k$ by a line segment in $B$; hence $\mathbf{x} \in S_1$. This completes the proof. □

EXERCISES

1. Show directly from the definition that the following sets are disconnected. (That is, produce a disconnection for each of them.)
a. The hyperbola $\{(x, y) \in \mathbb{R}^2 : x^2 - y^2 = 1\}$.
b. Any finite set in $\mathbb{R}^n$ with at least two elements.
c. $\{(x, y, z) \in \mathbb{R}^3 : xyz > 0\}$.

2. Show that the unit sphere $\{(x, y, z) : x^2 + y^2 + z^2 = 1\}$ in $\mathbb{R}^3$ is arcwise connected. Can you generalize your argument to show that the unit sphere in $\mathbb{R}^n$ is arcwise connected for all $n > 1$?

3. Suppose $I$ is an interval in $\mathbb{R}$ and $f : I \to \mathbb{R}$ is continuous and one-to-one (i.e., $f(x_1) \ne f(x_2)$ unless $x_1 = x_2$). Show that $f$ must be strictly increasing or strictly decreasing on $I$.

4. Suppose $S_1$ and $S_2$ are connected sets in $\mathbb{R}^n$ that contain at least one point in common. Show that $S_1 \cup S_2$ is connected. Is it true that $S_1 \cap S_2$ must be connected?

5. Show that an open set in $\mathbb{R}^n$ is disconnected if and only if it is the union of two disjoint nonempty open subsets.

6. Show that a closed set in $\mathbb{R}^n$ is disconnected if and only if it is the union of two disjoint nonempty closed subsets.

7. Show that $S \subset \mathbb{R}^n$ is disconnected if and only if there is a continuous function $f : S \to \mathbb{R}$ such that $f(S)$ consists of the two points 0 and 1.

8. Show that the closure of a connected set is connected.

9. Let $S = \{\mathbf{x} : |\mathbf{x}| = 1\}$ be the unit sphere in $\mathbb{R}^n$, and let $f : S \to \mathbb{R}$ be a continuous function. Assuming the fact that $S$ is connected (see Exercise 2), show that there must be a pair of diametrically opposite points on $S$ at which $f$ assumes the same value. (Hint: Consider $g(\mathbf{x}) = f(\mathbf{x}) - f(-\mathbf{x})$.)

10. Suppose $S$ is a connected set in $\mathbb{R}^2$ that contains $(1, 3)$ and $(4, -1)$. Show that $S$ contains at least one point on the line $x = y$. (Hint: Consider $f(x, y) = x - y$.)

11. Let $S \subset \mathbb{R}^2$ be given by (1.29).
a. Show that $S$ is connected. (Hint: The curve $y = \sin(\pi/x)$, $x > 0$, is arcwise connected. Use Exercise 8.)
b. Show that $S$ is not arcwise connected. (Suppose $\mathbf{f} : [0,1] \to S$ is continuous and satisfies $\mathbf{f}(0) = (2, 1)$ and $\mathbf{f}(1) = (0, 1)$. Show that the $x$-coordinate of $\mathbf{f}(t)$ must assume all values between 2 and 0 as $t$ ranges from 0 to 1, and conclude that for each positive integer $k$ there exists $t_k \in [0,1]$ such that $\mathbf{f}(t_k) = (1/2k, 0)$. By passing to a convergent subsequence, you can suppose that $t_0 = \lim_{k \to \infty} t_k$ exists. Show that the $y$-coordinate of $\mathbf{f}(t)$ must assume all values between $-1$ and $1$ as $t$ ranges from $t_k$ to $t_{k+1}$, and derive a contradiction.)

1.8 Uniform Continuity

Suppose $S$ is a subset of $\mathbb{R}^n$. We recall that a function $\mathbf{f} : S \to \mathbb{R}^m$ is said to be continuous on $S$ if, for each $\mathbf{x} \in S$, $\mathbf{f}(\mathbf{y})$ can be made as close as we wish to $\mathbf{f}(\mathbf{x})$ by taking $\mathbf{y}$ sufficiently close to $\mathbf{x}$. In general, the meaning of "sufficiently close" will depend on $\mathbf{x}$: If $\mathbf{f}$ is nearly constant near $\mathbf{x}$, we may be able to move quite a distance away from $\mathbf{x}$ without changing the value of $\mathbf{f}$ much, but if $\mathbf{f}$ is rapidly varying near $\mathbf{x}$, we will need to stay close to $\mathbf{x}$ to ensure that the value of $\mathbf{f}$ remains close to $\mathbf{f}(\mathbf{x})$. For some purposes, however, it is important to have some control over the rate at which $\mathbf{f}(\mathbf{y})$ approaches $\mathbf{f}(\mathbf{x})$ as $\mathbf{y}$ approaches $\mathbf{x}$ that is independent of $\mathbf{x}$. Functions for which this is possible are called uniformly continuous.
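More precisely, a function $\mathbf{f} : S \to \mathbb{R}^m$ is said to be uniformly continuous on $S$ if for every $\varepsilon > 0$ there is a $\delta > 0$ so that
$$|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| < \varepsilon \quad \text{whenever } \mathbf{x}, \mathbf{y} \in S \text{ and } |\mathbf{x} - \mathbf{y}| < \delta.$$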

The crucial point is that for simple continuity the number $\delta$ may depend on $\mathbf{x}$, but for uniform continuity it does not. This is a rather subtle point, and the reader should not be discouraged if its significance is not immediately clear; some very eminent mathematicians of the past also had trouble with it!
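Some readers may find it enlightening to see these conditions rewritten in a symbolic way that makes them as concise as possible. We employ the logical symbols $\forall$ and $\exists$, which mean "for all" and "there exists," respectively. With this understanding, the condition for $\mathbf{f}$ to be continuous on $S$ is that
$$\forall \varepsilon > 0\ \forall \mathbf{x} \in S\ \exists \delta > 0 : \forall \mathbf{y} \in S \quad |\mathbf{x} - \mathbf{y}| < \delta \implies |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| < \varepsilon, \tag{1.31}$$
whereas the condition for $\mathbf{f}$ to be uniformly continuous on $S$ is that
$$\forall \varepsilon > 0\ \exists \delta > 0 : \forall \mathbf{x}, \mathbf{y} \in S \quad |\mathbf{x} - \mathbf{y}| < \delta \implies |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| < \varepsilon. \tag{1.32}$$
The difference between (1.31) and (1.32) is that the "$\forall \mathbf{x}$" has been interchanged with the "$\exists \delta$," so that in (1.31) $\delta$ is allowed to depend on $\mathbf{x}$, whereas in (1.32) the same $\delta$ must work for every $\mathbf{x}$.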

EXAMPLE 1. The function $f(x) = \sin x$ is uniformly continuous on $\mathbb{R}$. Indeed, since $|f'(x)| = |\cos x| \le 1$ for all $x$, the mean value theorem (reviewed in §2.1) shows that $|f(x) - f(y)| \le |x - y|$ for all $x, y$. Thus, we can take $\delta = \varepsilon$, independent of $x$: If $|x - y| < \varepsilon$, then $|f(x) - f(y)| < \varepsilon$.

EXAMPLE 2. The function $g(x) = x^2$ is not uniformly continuous on $\mathbb{R}$, essentially because the slope of the graph at $x = a$ increases without bound as $a \to \infty$. To be more precise, let us suppose that $a > 0$ and $h > 0$. Since $g(a+h) - g(a) = 2ah + h^2 > 2ah$, there is no hope to get $|g(a+h) - g(a)| < \varepsilon$ unless $h < \varepsilon/2a$. Thus, the allowable $\delta$ in (1.31) at $x = a$ can be at most $\varepsilon/2a$, which gets smaller as $a$ gets larger. On the other hand, $g$ is uniformly continuous on every bounded interval, because on such an interval there is a finite upper bound for $|g'|$, and the mean value theorem can be applied as in Example 1.
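The dependence of $\delta$ on $a$ in Example 2 can be made explicit. For $a \ge 0$, every $x$ with $|x - a| < \delta$ satisfies $|x^2 - a^2| < \delta(2a + \delta)$, and the right-hand side shows this bound is sharp, so the largest admissible $\delta$ solves $\delta^2 + 2a\delta = \varepsilon$, that is, $\delta(a) = \sqrt{a^2 + \varepsilon} - a = \varepsilon/(\sqrt{a^2 + \varepsilon} + a)$. This is below $\varepsilon/2a$ and tends to 0 as $a \to \infty$, whereas for $\sin x$ the single choice $\delta = \varepsilon$ works everywhere. The Python sketch below tabulates $\delta(a)$; the function name is illustrative.

```python
import math


def largest_delta_square(a, eps):
    """Largest delta with |x**2 - a**2| < eps whenever |x - a| < delta (a >= 0).

    Solves delta**2 + 2*a*delta = eps; the rationalized form avoids the
    cancellation in sqrt(a*a + eps) - a when a is large.
    """
    return eps / (math.sqrt(a * a + eps) + a)


eps = 0.01
for a in (0.0, 1.0, 10.0, 1000.0):
    bound = eps / (2 * a) if a else None  # the bound eps/2a from Example 2
    print(a, largest_delta_square(a, eps), bound)
```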

Example 2 exemplifies the typical situation, in the following sense. On a set that is not bounded or not closed, things can get worse and worse as one goes off to infinity or to the boundary of the set; but on a compact set such pathologies cannot occur.

1.33 Theorem. Suppose $S \subset \mathbb{R}^n$ and $\mathbf{f} : S \to \mathbb{R}^m$ is continuous at every point of $S$. If $S$ is compact, then $\mathbf{f}$ is uniformly continuous on $S$.

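Proof. Suppose $\mathbf{f}$ is not uniformly continuous on $S$; we shall derive a contradiction. The negation of the uniform continuity condition (1.32) is that
$$\exists \varepsilon > 0\ \forall \delta > 0\ \exists \mathbf{x}, \mathbf{y} \in S : |\mathbf{x} - \mathbf{y}| < \delta \text{ and } |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| \ge \varepsilon.$$
Taking $\delta = 1, \frac12, \frac13, \ldots$, we see that for each positive integer $k$ there exist $\mathbf{x}_k, \mathbf{y}_k \in S$ such that $|\mathbf{x}_k - \mathbf{y}_k| < k^{-1}$ and $|\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k)| \ge \varepsilon$. By the Bolzano-Weierstrass theorem, by passing to a subsequence we may assume that $\{\mathbf{x}_k\}$ converges, say to $\mathbf{a} \in S$. Since $|\mathbf{x}_k - \mathbf{y}_k| \to 0$, we also have $\mathbf{y}_k \to \mathbf{a}$. But then $\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k) \to \mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, contradicting the assertion that $|\mathbf{f}(\mathbf{x}_k) - \mathbf{f}(\mathbf{y}_k)| \ge \varepsilon$. □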
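Compactness enters the proof only through the Bolzano-Weierstrass step. On the noncompact set $\mathbb{R}$, the function $g(x) = x^2$ of Example 2 satisfies the negated condition with explicit witnesses: with $\varepsilon = 1$, the points $x_k = k$ and $y_k = k + 1/(2k)$ satisfy $|x_k - y_k| < 1/k$ but $g(y_k) - g(x_k) = 1 + 1/(4k^2) \ge 1$, and $\{x_k\}$ has no convergent subsequence. The exact-arithmetic Python check below is only an illustration; the helper name `witnesses` is hypothetical.

```python
from fractions import Fraction


def witnesses(k):
    """Points x_k, y_k with |x_k - y_k| < 1/k but |g(x_k) - g(y_k)| >= 1,
    where g(x) = x**2 (Example 2)."""
    x = Fraction(k)
    y = x + Fraction(1, 2 * k)
    return x, y


for k in range(1, 6):
    x, y = witnesses(k)
    gap = y ** 2 - x ** 2  # equals 1 + 1/(4k^2)
    assert abs(x - y) < Fraction(1, k) and gap >= 1
    print(k, gap)
```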

It is remarkable that continuity is the only condition that must be imposed on $\mathbf{f}$ in this theorem. In particular, in contrast to what Examples 1 and 2 might suggest, no conditions on the derivatives of $\mathbf{f}$ enter the picture, not even their existence! See Exercise 2.

EXERCISES

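1. A function $\mathbf{f} : S \to \mathbb{R}^m$ that satisfies
$$|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})| \le C|\mathbf{x} - \mathbf{y}|^{\lambda} \quad \text{for all } \mathbf{x}, \mathbf{y} \in S,$$
where $C$ and $\lambda$ are positive constants, is said to be Hölder continuous on $S$ (with exponent $\lambda$). Show that if $\mathbf{f}$ is Hölder continuous on $S$, then $\mathbf{f}$ is uniformly continuous on $S$.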

2. Suppose $0 < \lambda < 1$.
a. Show that $(a + b)^{\lambda} \le a^{\lambda} + b^{\lambda}$ for all $a, b > 0$. (Hint: Since $\lambda - 1 < 0$, for $t > 0$ we have $(a + t)^{\lambda - 1} < t^{\lambda - 1}$. Integrate both sides from 0 to $b$.)
b. Let $f_{\lambda}(x) = |x|^{\lambda}$. Show that $f_{\lambda}$ satisfies the condition in Exercise 1, with $S = \mathbb{R}$ and $C = 1$, and hence conclude that $f_{\lambda}$ is uniformly continuous on $\mathbb{R}$. (Note that $f_{\lambda}$ is unbounded on $\mathbb{R}$ and that the slope of its graph becomes infinite at the origin.)

3. Suppose that $\mathbf{f} : S \to \mathbb{R}^m$ and $\mathbf{g} : S \to \mathbb{R}^m$ are both uniformly continuous on $S$. Show that $\mathbf{f} + \mathbf{g}$ is uniformly continuous on $S$.
4. Show that if $\mathbf{f} : S \to \mathbb{R}^m$ is uniformly continuous on $S$ and $\{\mathbf{x}_k\}$ is a Cauchy sequence in $S$, then $\{\mathbf{f}(\mathbf{x}_k)\}$ is also a Cauchy sequence. On the other hand, give an example of a Cauchy sequence $\{x_k\}$ in $(0, \infty)$ and a continuous function $f : (0, \infty) \to \mathbb{R}$ (of necessity, not uniformly continuous) such that $\{f(x_k)\}$ is not Cauchy.

5. Show that if $\mathbf{f} : S \to \mathbb{R}^m$ is uniformly continuous and $S$ is bounded, then $\mathbf{f}(S)$ is bounded.
Chapter 2

DIFFERENTIAL CALCULUS

The main theme of this chapter is the theory and applications of differential calculus for functions of several variables. The reader is expected to be familiar with differential calculus for functions of one variable. However, we offer a review of the one-variable theory that contains a few features that the reader may not have seen before, and the one-variable theory makes another appearance in the section on Taylor's theorem.

2.1 Differentiability in One Variable

We begin with an approach to the notion of derivative that is a bit different from
the one usually found in elementary calculus books. This point of view is very
useful in more advanced work, and it is the one that leads to the proper notion of
differentiability for functions of several variables.
The basic idea is that a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x = a$ if it is approximately linear near $x = a$. Geometrically, this means that the graph of $f$ has a tangent line at $x = a$. Analytically, it means that there is a linear function $l(x) = mx + b$ satisfying the following two conditions:

• $l(a) = f(a)$, so that $b = f(a) - ma$ and hence $l(x) = f(a) + m(x - a)$;

• the difference $f(x) - l(x)$ tends to zero at a faster rate than $x - a$ as $x \to a$, that is,
$$\frac{f(x) - l(x)}{x - a} \to 0 \quad \text{as } x \to a.$$

It will be convenient to denote the increment $x - a$ by $h$, so that
$$f(x) - l(x) = f(a + h) - f(a) - mh.$$


We think of this difference as a function of $h$ and denote it by $E(h)$; thus $E(h)$ is the error when we approximate $f(a + h)$ by the linear function $f(a) + mh$.
We proceed to the formal definition. Suppose $f$ is a real-valued function defined on some open interval in $\mathbb{R}$ containing the point $a$. We say that $f$ is differentiable at $a$ if there is a number $m$ such that
$$f(a + h) = f(a) + mh + E(h), \quad \text{where } \lim_{h \to 0} \frac{E(h)}{h} = 0; \tag{2.1}$$

in other words, if $f(a + h)$ is the sum of the linear function $f(a) + mh$ and an error term that tends to zero more rapidly than $h$ as $h \to 0$. In this case we have
$$m = \frac{f(a + h) - f(a) - E(h)}{h} = \frac{f(a + h) - f(a)}{h} - \frac{E(h)}{h}.$$

As $h \to 0$ the last term on the right vanishes, so we see that
$$m = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \tag{2.2}$$

Thus the number $m$ is uniquely determined, and it is the derivative of $f$ at $a$ as usually defined in elementary calculus books, denoted by $f'(a)$. Conversely, if the limit $m$ in (2.2) exists, then (2.1) holds with $E(h) = f(a + h) - f(a) - mh$. Thus, our definition of differentiability is equivalent to the usual one; it simply puts more emphasis on the idea of linear approximation.
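The role of the error term in (2.1) can be watched numerically. For the illustrative choice $f(x) = e^x$ at $a = 0$ (not an example from the text), $E(h)/h = (e^h - 1 - mh)/h$ tends to $1 - m$, so it tends to 0 only for $m = 1 = f'(0)$. The Python sketch below uses the standard-library function `math.expm1`, which computes $e^h - 1$ accurately for small $h$; the helper name `error_ratio` is hypothetical.

```python
import math


def error_ratio(h, m):
    """E(h)/h for f(x) = exp(x) at a = 0 with trial slope m, where
    E(h) = f(a + h) - f(a) - m*h as in (2.1)."""
    return (math.expm1(h) - m * h) / h


for h in (1e-1, 1e-3, 1e-5):
    # With m = f'(0) = 1 the ratio tends to 0; with m = 0.5 it tends to 0.5.
    print(h, error_ratio(h, 1.0), error_ratio(h, 0.5))
```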
Observe that if $E(h)/h$ vanishes as $h \to 0$, then so does $E(h)$ itself and hence so does $f(a + h) - f(a)$. That is, differentiability at $a$ implies continuity at $a$. It is often convenient to express the relation $\lim_{h \to 0} E(h)/h = 0$ by saying that "$E(h)$ is $o(h)$" (pronounced "little oh of $h$"), meaning that $E(h)$ is of smaller order of magnitude than $h$. Thus the differentiability of $f$ at $x = a$ can be expressed by saying that $f(a + h)$ is the sum of a linear function of $h$ and an error term that is $o(h)$.
The standard rules for differentiation are easily derived from (2.1). We illustrate
the ideas by working out the product rule.

The Product Rule: Suppose $f$ and $g$ are differentiable at $x = a$. Then
$$f(a + h) = f(a) + f'(a)h + E_1(h), \qquad g(a + h) = g(a) + g'(a)h + E_2(h),$$
where $E_1(h)$ and $E_2(h)$ are $o(h)$. Multiplying these equations together yields
$$f(a + h)g(a + h) = f(a)g(a) + [f'(a)g(a) + f(a)g'(a)]h + E_3(h), \tag{2.3}$$


where
$$E_3(h) = [f(a) + f'(a)h + E_1(h)]E_2(h) + E_1(h)[g(a) + g'(a)h] + f'(a)g'(a)h^2.$$
Clearly $E_3(h)$ is $o(h)$ since $E_1(h)$, $E_2(h)$, and $h^2$ are, so (2.3) is of the form (2.1) with $f$ replaced by $fg$ and $m = f'(a)g(a) + f(a)g'(a)$. In other words, $fg$ is differentiable at $a$ and $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$.
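The step "clearly $E_3(h)$ is $o(h)$" can be spelled out. By definition, $E_3(h) = f(a+h)g(a+h) - f(a)g(a) - [f'(a)g(a) + f(a)g'(a)]h$; substituting the expansions of $f(a+h)$ and $g(a+h)$ gives the three terms below. Dividing by $h$ and using $E_1(h)/h \to 0$ and $E_2(h)/h \to 0$ (hence also $E_1(h) \to 0$), each term tends to 0:

```latex
\begin{aligned}
\frac{E_3(h)}{h}
  &= \bigl[f(a) + f'(a)h + E_1(h)\bigr]\,\frac{E_2(h)}{h}
     + \frac{E_1(h)}{h}\,\bigl[g(a) + g'(a)h\bigr]
     + f'(a)g'(a)\,h \\
  &\longrightarrow f(a)\cdot 0 + 0\cdot g(a) + 0 = 0
     \qquad \text{as } h \to 0 .
\end{aligned}
```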

The chain rule can also be derived in this way; we shall do so, in a more general
setting, in §2.3.

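We can also define "one-sided derivatives" of a function $f$ at a point $a$. To wit, the left-hand derivative $f'_-(a)$ and the right-hand derivative $f'_+(a)$ are the one-sided limits
$$f'_-(a) = \lim_{h \to 0-} \frac{f(a + h) - f(a)}{h}, \qquad f'_+(a) = \lim_{h \to 0+} \frac{f(a + h) - f(a)}{h}.$$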

Clearly $f$ is differentiable at $a$ if and only if its left-hand and right-hand derivatives at $a$ exist and are equal. These notions are particularly useful in two situations: (i) in discussing functions whose graphs have "corners," such as $f(x) = |x|$, which has one-sided derivatives at the origin although it is not differentiable there, and (ii) in discussing functions whose domain is a closed interval $[a, b]$, where the one-sided derivatives $f'_+(a)$ and $f'_-(b)$ may be significant.
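For $f(x) = |x|$ at $a = 0$, the right-hand difference quotient is identically 1 and the left-hand one identically $-1$, so $f'_+(0) = 1 \ne -1 = f'_-(0)$. A small Python sketch (the helper name `one_sided_quotients` is illustrative):

```python
def one_sided_quotients(f, a, h):
    """Difference quotients [f(a + s) - f(a)] / s for s = +h and s = -h (h > 0).

    As h -> 0+ these approximate the right- and left-hand derivatives.
    """
    right = (f(a + h) - f(a)) / h
    left = (f(a - h) - f(a)) / (-h)
    return right, left


for h in (1e-1, 1e-4, 1e-8):
    print(h, one_sided_quotients(abs, 0.0, h))  # (1.0, -1.0) for every h
```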

The Mean Value Theorem. The definition of the derivative involves passing from the "local" information given by the values of $f(x)$ for $x$ near $a$ to the "infinitesimal" information $f'(a)$, which (intuitively speaking) gives the infinitesimal change in $f$ corresponding to an infinitesimal change in $x$. To reverse the process and pass from "infinitesimal" information to "local" information, that is, to extract information about $f$ from a knowledge of $f'$, the principal tool is the mean value theorem, one of the most important theoretical results of elementary calculus. The derivation begins with the following result, which is important in its own right.

2.5 Proposition. Suppose $f$ is defined on an open interval $I$ and $a \in I$. If $f$ has a local maximum or minimum at the point $a \in I$ and $f$ is differentiable at $a$, then $f'(a) = 0$.

Proof. Suppose $f$ has a local minimum at $a$; the argument at a maximum is similar. In the difference quotient $[f(a + h) - f(a)]/h$, the numerator is $\ge 0$ for all $h$ near 0 since $f(a + h) \ge f(a)$, so the quotient is $\ge 0$ for small $h > 0$ and $\le 0$ for small $h < 0$. It follows that the one-sided limits as $h \to 0$ from the left and right must be $\le 0$ and $\ge 0$, respectively; since they are both equal to $f'(a)$, the only possibility is that $f'(a) = 0$. □

2.6 Lemma (Rolle's Theorem). Suppose $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b)$, there is at least one point $c \in (a, b)$ such that $f'(c) = 0$.
Proof. By the extreme value theorem (1.23), $f$ assumes a maximum value and a minimum value on $[a, b]$. If the maximum and minimum each occur at an endpoint, then $f$ is constant on $[a, b]$ since the values at the endpoints are equal, so $f'(x) = 0$ for all $x \in (a, b)$. Otherwise, at least one of them occurs at some interior point $c \in (a, b)$, and then $f'(c) = 0$ by Proposition 2.5. □

2.7 Theorem (Mean Value Theorem I). Suppose $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$. There is at least one point $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof. The straight line joining $(a, f(a))$ to $(b, f(b))$ is the graph of the function
$$l(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a),$$
and the assertion is that there is a point $c \in (a, b)$ where the slope of the graph $y = f(x)$ is the same as the slope of this line, in other words, where the derivative of the difference $g(x) = f(x) - l(x)$ is zero. But $f$ and $l$ have the same values at $a$ and $b$, so $g(a) = g(b) = 0$, and the conclusion then follows by applying Rolle's theorem to $g$. □
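The theorem does not say how to find $c$, but in simple cases one can locate it. For the illustrative choice $f(x) = x^3$ on $[0, 2]$ (not from the text), the secant slope is $(8 - 0)/2 = 4$, and $f'(c) = 3c^2 = 4$ gives $c = 2/\sqrt{3} \approx 1.1547$. The Python sketch below recovers this point by bisection on $f'(x) - 4$. It relies on extra assumptions that are not hypotheses of the mean value theorem ($f'$ continuous and $f'(x)$ minus the slope changing sign on the interval), so it is a check of this one example rather than a general method. The helper name `mvt_point` is hypothetical.

```python
def mvt_point(fprime, slope, lo, hi, tol=1e-12):
    """Locate c in [lo, hi] with fprime(c) == slope by bisection.

    Assumes fprime is continuous on [lo, hi] and fprime(lo) < slope < fprime(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fprime(mid) < slope:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def f(x):
    return x ** 3


def fprime(x):
    return 3 * x ** 2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # secant slope (8 - 0) / 2 = 4.0
c = mvt_point(fprime, slope, a, b)
print(slope, c)  # c is close to 2 / sqrt(3) = 1.1547...
```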

The mean value theorem is nonconstructive; that is, although it asserts the existence of a certain point $c \in (a, b)$, it gives no clue as to how to find that point. Students often find this perplexing at first, but in fact the whole power of the mean value theorem comes from situations where there is no need to know precisely where $c$ is. In many applications, one has information about the behavior of $f'$ on some interval, and one deduces information about $f$ on that same interval. The following theorem comprises the most important of these applications.
We say that a function $f$ is increasing (resp. strictly increasing) on an interval $I$ if $f(a) \le f(b)$ (resp. $f(a) < f(b)$) whenever $a, b \in I$ and $a < b$; similarly for decreasing and strictly decreasing.

2.8 Theorem. Suppose $f$ is differentiable on the open interval $I$.
a. If $|f'(x)| \le C$ for all $x \in I$, then $|f(b) - f(a)| \le C|b - a|$ for all $a, b \in I$.
b. If $f'(x) = 0$ for all $x \in I$, then $f$ is constant on $I$.
c. If $f'(x) \ge 0$ (resp. $f'(x) > 0$, $f'(x) \le 0$, or $f'(x) < 0$) for all $x \in I$, then $f$ is increasing (resp. strictly increasing, decreasing, or strictly decreasing) on $I$.

Proof. Given $a, b \in I$, the mean value theorem gives $f(b) - f(a) = f'(c)(b - a)$ for some $c$ between $a$ and $b$, hence in $I$. In (a) or (b) we know that $|f'(c)| \le C$ or $f'(c) = 0$, respectively, and we conclude that $|f(b) - f(a)| \le C|b - a|$ or $f(b) = f(a)$. In (c), if we know that $f'(c) \ge 0$, we conclude that $f(b) - f(a) \ge 0$ for $b > a$, and similarly for the other cases. □

In case the reader feels that we are belaboring the obvious here, we should point out that the mere differentiability of $f$ at a single point $a$ gives less information about the behavior of $f$ near $x = a$ than we would like. For example, if $f'(a) > 0$, it does not follow that $f$ is increasing in some neighborhood of $a$; see Exercises 3 and 4.

The mean value theorem admits the following important generalization, of which we shall present some applications below.

2.9 Theorem (Mean Value Theorem II). Suppose that $f$ and $g$ are continuous on $[a, b]$ and differentiable on $(a, b)$, and $g'(x) \ne 0$ for all $x \in (a, b)$. Then there exists $c \in (a, b)$ such that
$$\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$

Proof. Let
$$h(x) = [f(b) - f(a)][g(x) - g(a)] - [g(b) - g(a)][f(x) - f(a)].$$
Then $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$, and $h(a) = h(b) = 0$. By Rolle's theorem, there is a point $c \in (a, b)$ such that
$$0 = h'(c) = [f(b) - f(a)]g'(c) - [g(b) - g(a)]f'(c).$$
Since $g'$ is never 0 on $(a, b)$, we have $g'(c) \ne 0$ and also $g(b) - g(a) \ne 0$ (by the mean value theorem, since $g(b) - g(a) = g'(c')(b - a)$ for some $c' \in (a, b)$). Hence we can divide by both these quantities to obtain the desired result. □

L'Hôpital's Rule. Often one is faced with the evaluation of limits of quotients $f(x)/g(x)$ where $f$ and $g$ both tend to zero or infinity. The collection of related results that go under the name of "l'Hôpital's rule" enables one to evaluate such limits in many cases by examining the quotient of the derivatives, $f'(x)/g'(x)$. The cases involving the indeterminate form $0/0$ can be summarized as follows.

2.10 Theorem (L'Hôpital's Rule I). Suppose $f$ and $g$ are differentiable functions on $(a, b)$ and
$$\lim_{x \to a+} f(x) = \lim_{x \to a+} g(x) = 0.$$

If $g'$ never vanishes on $(a, b)$ and the limit
$$\lim_{x \to a+} \frac{f'(x)}{g'(x)} = L$$
exists, then $g$ never vanishes on $(a, b)$ and
$$\lim_{x \to a+} \frac{f(x)}{g(x)} = L.$$
The same result holds for

• the left-hand limit $\lim_{x \to a-}$, if $f$ and $g$ are differentiable on an interval $(d, a)$,
• the two-sided limit $\lim_{x \to a}$, if $f$ and $g$ are differentiable on intervals $(d, a)$ and $(a, b)$, and
• the limit $\lim_{x \to \infty}$ or $\lim_{x \to -\infty}$, if $f$ and $g$ are differentiable on an interval $(b, \infty)$ or $(-\infty, b)$.

Proof. If we (re)define $f(a)$ and $g(a)$ to be 0, then $f$ and $g$ are continuous on the interval $[a, x]$ for $x < b$. By Theorem 2.9, for each $x \in (a, b)$ there exists $c \in (a, x)$ (depending on $x$) such that
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}.$$
Since $c \in (a, x)$, $c$ approaches $a+$ as $x$ does, so
$$\lim_{x \to a+} \frac{f(x)}{g(x)} = \lim_{c \to a+} \frac{f'(c)}{g'(c)} = L.$$
The proof for left-hand limits is similar, and the case of two-sided limits is obtained by combining right-hand and left-hand limits. Finally, for the case $a = \pm\infty$, we set $y = 1/x$ and consider the functions $F(y) = f(1/y)$ and $G(y) = g(1/y)$. Since $F'(y) = -f'(1/y)/y^2$ and $G'(y) = -g'(1/y)/y^2$, we have $F'(y)/G'(y) = f'(1/y)/g'(1/y)$, so by the results just proved,
$$\lim_{x \to \pm\infty} \frac{f(x)}{g(x)} = \lim_{y \to 0\pm} \frac{F(y)}{G(y)} = \lim_{y \to 0\pm} \frac{F'(y)}{G'(y)} = \lim_{x \to \pm\infty} \frac{f'(x)}{g'(x)}. \qquad □$$

Under the conditions of Theorem 2.10, it may well happen that $f'(x)$ and $g'(x)$ tend to zero also, so that the limit of $f'(x)/g'(x)$ cannot be evaluated immediately. In this case we can apply Theorem 2.10 again to evaluate the limit by examining $f''(x)/g''(x)$. More generally, if the functions $f, f', \ldots, f^{(k-1)}, g, g', \ldots, g^{(k-1)}$ all tend to zero as $x$ tends to $a+$ or $a-$ or $\pm\infty$, but $f^{(k)}(x)/g^{(k)}(x) \to L$, then $f(x)/g(x) \to L$ (provided that $g', \ldots, g^{(k)}$ never vanish near the limit point, so that Theorem 2.10 applies at each stage).
2.1. Differentiability in One Variable 49

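EXAMPLE 1. Let $f(x) = 2x - \sin 2x$, $g(x) = x^2\sin x$, $a = 0$. Then $f$, $g$, and their first two derivatives vanish at $x = a$, but the third derivatives do not, so
$$\begin{aligned}\lim_{x\to 0}\frac{2x - \sin 2x}{x^2\sin x} &= \lim_{x\to 0}\frac{2 - 2\cos 2x}{2x\sin x + x^2\cos x} = \lim_{x\to 0}\frac{4\sin 2x}{(2 - x^2)\sin x + 4x\cos x}\\ &= \lim_{x\to 0}\frac{8\cos 2x}{(6 - x^2)\cos x - 6x\sin x} = \frac{4}{3}.\end{aligned}$$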
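As a quick floating-point sanity check of Example 1 (an illustration, not part of the argument), the sketch below evaluates the original quotient at a few small $x$ and the third-derivative quotient at $x = 0$; both should be close to $4/3$. The function names are illustrative only.

```python
import math

def quotient(x):
    # f(x)/g(x) from Example 1: (2x - sin 2x) / (x^2 sin x)
    return (2 * x - math.sin(2 * x)) / (x**2 * math.sin(x))

def third_derivative_quotient(x):
    # f'''(x)/g'''(x) = 8 cos 2x / ((6 - x^2) cos x - 6x sin x), continuous at x = 0
    return 8 * math.cos(2 * x) / ((6 - x**2) * math.cos(x) - 6 * x * math.sin(x))

for x in (1e-1, 1e-2, 1e-3):
    print(f"x = {x:g}:  f(x)/g(x) = {quotient(x):.8f}")
print("f'''(0)/g'''(0) =", third_derivative_quotient(0.0))
```

For much smaller $x$ the direct quotient loses accuracy, because $2x - \sin 2x$ is a difference of nearly equal floating-point numbers; the form reached by l'Hôpital's rule has no such cancellation at $x = 0$.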

The corresponding result for limits of the form $\infty/\infty$ is also true.

2.11 Theorem (L'Hôpital's Rule II). Theorem 2.10 remains valid when the hypothesis that $\lim f(x) = \lim g(x) = 0$ (as $x \to a+$, $x \to a-$, etc.) is replaced by the hypothesis that $\lim |f(x)| = \lim |g(x)| = \infty$.

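Proof. We consider the case of left-hand limits as $x \to a-$; the other cases follow as in Theorem 2.10.
Given $\epsilon > 0$, we wish to show that $|[f(x)/g(x)] - L| < \epsilon$ provided that $x$ is sufficiently close to $a$ on the left. Since $f'(x)/g'(x) \to L$ and $|g(x)| \to \infty$, we can choose $x_0 < a$ so that
$$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\epsilon}{2} \quad\text{and}\quad g(x) \neq 0 \quad\text{for } x_0 < x < a.$$
Moreover, by Theorem 2.9, if $x_0 < x < a$ we have
$$\frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f'(c)}{g'(c)} \quad\text{for some } c \in (x_0, x),$$
and hence, since $x_0 < c < a$,
$$\left|\frac{f(x) - f(x_0)}{g(x) - g(x_0)} - L\right| < \frac{\epsilon}{2} \quad\text{for } x_0 < x < a.$$
Next, division of top and bottom by $g(x)$ yields
$$\frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{[f(x)/g(x)] - [f(x_0)/g(x)]}{1 - [g(x_0)/g(x)]}.$$
Since $|g(x)| \to \infty$ as $x \to a$, the quotients $f(x_0)/g(x)$ and $g(x_0)/g(x)$ can be made as close to zero as we please by taking $x$ sufficiently close to $a$. Since the quotient on the left stays bounded (by the preceding estimate), it follows that for $x$ sufficiently close to $a$ we have
$$\left|\frac{f(x)}{g(x)} - \frac{f(x) - f(x_0)}{g(x) - g(x_0)}\right| < \frac{\epsilon}{2},$$
and hence, by the preceding estimate,
$$\left|\frac{f(x)}{g(x)} - L\right| < \epsilon,$$
which is what we needed to show. □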

The following special cases of Theorem 2.11 are of fundamental importance.

2.12 Corollary. For any $a > 0$ we have
$$\lim_{x\to+\infty}\frac{x^a}{e^x} = \lim_{x\to+\infty}\frac{\log x}{x^a} = \lim_{x\to 0+}\frac{\log x}{x^{-a}} = 0.$$
That is, the exponential function $e^x$ grows more rapidly than any power of $x$ as $x \to +\infty$, whereas $|\log x|$ grows more slowly than any positive power of $x$ as $x \to +\infty$ and more slowly than any negative power of $x$ as $x \to 0+$.

Proof. For the first limit, let $k$ be the smallest integer that is $\geq a$. A $k$-fold application of Theorem 2.11 yields
$$\lim_{x\to+\infty}\frac{x^a}{e^x} = \lim_{x\to+\infty}\frac{a(a-1)\cdots(a-k+1)\,x^{a-k}}{e^x},$$
and the latter limit is zero because $a - k \leq 0$. For the other two limits, a single application of Theorem 2.11 suffices:
$$\lim_{x\to+\infty}\frac{\log x}{x^a} = \lim_{x\to+\infty}\frac{1}{ax^a} = 0, \qquad \lim_{x\to 0+}\frac{\log x}{x^{-a}} = \lim_{x\to 0+}\frac{x^a}{-a} = 0. \qquad □$$

By raising the quantities in Corollary 2.12 to a positive power $b$ and replacing $a$ by $a/b$, we obtain the more general formulas
$$\lim_{x\to+\infty}\frac{x^a}{e^{bx}} = \lim_{x\to+\infty}\frac{|\log x|^b}{x^a} = \lim_{x\to 0+}\frac{|\log x|^b}{x^{-a}} = 0 \qquad (a, b > 0). \tag{2.13}$$
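The limits in (2.13) describe eventual behavior, and the point where the smaller quantity takes over can lie at fairly large $x$. The sketch below illustrates this for a few arbitrarily chosen values of $a$ and $b$ (these choices are assumptions for illustration only); it works with the logarithm of $x^a/e^{bx}$ so that $e^{bx}$ cannot overflow.

```python
import math

def log_power_over_exp(x, a, b):
    # log of x**a / e**(b*x); working with logarithms avoids overflowing e**(b*x)
    return a * math.log(x) - b * x

def log_ratio(x, a, b):
    # |log x|**b / x**a; passing a negative a gives the x -> 0+ quotient in (2.13)
    return abs(math.log(x)) ** b / x ** a

# x**10 / e**x: large at x = 10 (its log is positive), negligible at x = 100
print(log_power_over_exp(10, 10, 1), log_power_over_exp(100, 10, 1))

# (log x)**3 / x**0.5 still exceeds 1 at x = 1e6 before it decays
print(log_ratio(1e6, 0.5, 3), log_ratio(1e20, 0.5, 3))

# x -> 0+: |log x|**3 / x**(-0.5)
print(log_ratio(1e-20, -0.5, 3))
```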

Vector-Valued Functions. The differential calculus generalizes easily to functions of a real variable with values in $\mathbb{R}^n$ rather than $\mathbb{R}$. If $\mathbf{f} = (f_1, \ldots, f_n)$ is such a function, its derivative at the point $a$ is defined to be
$$\mathbf{f}'(a) = \lim_{h\to 0}\frac{\mathbf{f}(a + h) - \mathbf{f}(a)}{h}.$$
The $j$th component of the difference quotient on the right is $h^{-1}[f_j(a + h) - f_j(a)]$. It follows that $\mathbf{f}$ is differentiable if and only if each of its component functions $f_j$ is differentiable, and that differentiation is simply performed componentwise:
$$\mathbf{f}'(a) = (f_1'(a), \ldots, f_n'(a)).$$
The usual rules of differentiation generalize easily to this situation. There are two forms of the product rule: one for the product of a scalar function $\varphi$ and a vector function $\mathbf{f}$, and one for the dot product of two vector functions $\mathbf{f}$ and $\mathbf{g}$:
$$(\varphi\mathbf{f})' = \varphi'\mathbf{f} + \varphi\mathbf{f}', \qquad (\mathbf{f}\cdot\mathbf{g})' = \mathbf{f}'\cdot\mathbf{g} + \mathbf{f}\cdot\mathbf{g}'.$$
The first of these is just the ordinary product rule applied to each component $\varphi f_j$ of $\varphi\mathbf{f}$, and the second one is almost as easy (Exercise 8). Similarly, when $n = 3$, we have the product rule for cross products:
$$(\mathbf{f}\times\mathbf{g})' = \mathbf{f}'\times\mathbf{g} + \mathbf{f}\times\mathbf{g}'.$$
(The only point that needs attention here is that the factors $\mathbf{f}$ and $\mathbf{g}$ must be in the same order in all three products.)
The most common geometric interpretation of a function $\mathbf{f} : \mathbb{R} \to \mathbb{R}^n$ ($n > 1$) is as the parametric representation of a curve in $\mathbb{R}^n$. That is, the independent variable $t$ is interpreted as time, and $\mathbf{f}(t)$ is the position of a particle moving in $\mathbb{R}^n$ at time $t$ that traces out a curve as $t$ varies. In this setting, the derivative $\mathbf{f}'(t)$ represents the velocity of the particle at time $t$.
Of particular importance are the straight lines in $\mathbb{R}^n$. If $\mathbf{a}, \mathbf{c} \in \mathbb{R}^n$ and $\mathbf{c} \neq \mathbf{0}$, the line through $\mathbf{a}$ in the direction parallel to the vector $\mathbf{c}$ is represented parametrically by $\mathbf{l}(t) = \mathbf{a} + t\mathbf{c}$. In particular, for the line passing through two points $\mathbf{a}$ and $\mathbf{b}$ we have $\mathbf{c} = \mathbf{b} - \mathbf{a}$, and the line is given by $\mathbf{l}(t) = \mathbf{a} + t(\mathbf{b} - \mathbf{a})$; the line segment from $\mathbf{a}$ to $\mathbf{b}$ is obtained by restricting $t$ to the interval $[0, 1]$.
If $\mathbf{f} : \mathbb{R} \to \mathbb{R}^n$ gives a parametric representation of a curve in $\mathbb{R}^n$ and $\mathbf{f}'(c) \neq \mathbf{0}$, the function $\mathbf{l}(t) = \mathbf{f}(c) + t\mathbf{f}'(c)$ gives a parametric representation of the tangent line to the curve at the point $\mathbf{f}(c)$. (If $\mathbf{f}'(c) = \mathbf{0}$, the curve may not have a tangent line at $\mathbf{f}(c)$; for example, if $\mathbf{f}(t) = (t^3, |t|^3)$, then $\mathbf{f}'(0) = (0, 0)$, but the curve in question is the graph $y = |x|$.) We shall discuss these matters more thoroughly in Chapter 3.
It should be pointed out that the mean value theorem is not valid for vector-valued functions. For example, the function $\mathbf{f}(t) = (\cos t, \sin t)$ satisfies $\mathbf{f}(0) = \mathbf{f}(2\pi)$, but $\mathbf{f}'(t) = (-\sin t, \cos t)$, so there is no point $t$ where $\mathbf{f}'(t) = \mathbf{0}$. However, some of the corollaries of the mean value theorem remain valid; in particular, if $|\mathbf{f}'(t)| \leq M$ for $a < t < b$, then $|\mathbf{f}(b) - \mathbf{f}(a)| \leq M(b - a)$.

We shall prove this for the more general case of functions of several variables in §2.10.
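The failure of the mean value theorem for vector-valued functions is easy to see numerically with the standard example $\mathbf{f}(t) = (\cos t, \sin t)$ on $[0, 2\pi]$. The sketch below samples $t$ on a finite grid, so it is an illustration rather than a proof: the endpoint values agree, yet the speed $|\mathbf{f}'(t)|$ equals 1 at every sample, so no $t$ can satisfy $\mathbf{f}(2\pi) - \mathbf{f}(0) = 2\pi\,\mathbf{f}'(t)$.

```python
import math

def f(t):
    # The circle f(t) = (cos t, sin t)
    return (math.cos(t), math.sin(t))

def f_prime(t):
    # Componentwise derivative f'(t) = (-sin t, cos t)
    return (-math.sin(t), math.cos(t))

a, b = 0.0, 2 * math.pi
gap = math.dist(f(a), f(b))            # |f(b) - f(a)|, zero up to rounding
grid = [a + k * (b - a) / 1000 for k in range(1001)]
speeds = [math.hypot(*f_prime(t)) for t in grid]   # |f'(t)| on the grid

# A "mean value point" c would need f(b) - f(a) = (b - a) f'(c), i.e. f'(c) = 0,
# but every sampled speed is 1.
print(gap, min(speeds), max(speeds))
```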

EXERCISES

1. Suppose that $f$ is differentiable on the interval $I$ and that $f'(x) > 0$ for all $x \in I$ except for finitely many points at which $f'(x) = 0$. Show that $f$ is strictly increasing on $I$.

2. Define the function $f$ by $f(x) = x^2\sin(1/x)$ if $x \neq 0$ and $f(0) = 0$. Show that $f$ is differentiable at every $x \in \mathbb{R}$, including $x = 0$, but that $f'$ is discontinuous at $x = 0$. (Calculating $f'(x)$ for $x \neq 0$ is easy; to calculate $f'(0)$ you need to go back to the definition of derivative.)

3. Let $f$ be the function in Exercise 2, and let $g(x) = f(x) + \frac{1}{2}x$. Show that $g'(0) > 0$ but that there is no neighborhood of 0 on which $g$ is increasing. (More precisely, every interval containing 0 has subintervals on which $g$ is decreasing.)

4. Define the function $h$ by $h(x) = x^2$ if $x$ is rational, $h(x) = 0$ if $x$ is irrational. Show that $h$ is differentiable at $x = 0$, even though it is discontinuous at every other point.

5. Suppose that $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, and that the right-hand limit $L = \lim_{x\to a+} f'(x)$ exists. Show that the right-hand derivative $f'_+(a)$ exists and equals $L$. (Hint: Consider the difference quotients defining $f'_+(a)$ and use the mean value theorem.) Of course, the analogous result for left-hand limits at $b$ also holds.

6. Suppose that $f$ is three times differentiable on an interval containing $a$. Show that
$$\lim_{h\to 0}\frac{f(a + 2h) - 2f(a + h) + f(a)}{h^2} = f''(a), \qquad \lim_{h\to 0}\frac{f(a + 3h) - 3f(a + 2h) + 3f(a + h) - f(a)}{h^3} = f^{(3)}(a).$$
Can you find the generalization to higher derivatives?

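7. Show that for any $a, b \in \mathbb{R}$, $\lim_{x\to 0}(1 + ax)^{b/x} = e^{ab}$. (Hint: Take logarithms.)
8. Suppose $\mathbf{f}$ and $\mathbf{g}$ are differentiable functions on $\mathbb{R}$ with values in $\mathbb{R}^n$.
a. Show that $(\mathbf{f}\cdot\mathbf{g})' = \mathbf{f}'\cdot\mathbf{g} + \mathbf{f}\cdot\mathbf{g}'$.
b. Suppose also that $n = 3$, and show that $(\mathbf{f}\times\mathbf{g})' = \mathbf{f}'\times\mathbf{g} + \mathbf{f}\times\mathbf{g}'$.
9. Define the function $f$ by $f(x) = e^{-1/x^2}$ if $x \neq 0$, $f(0) = 0$.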
2.2. Differentiability in Several Variables 53

a. Show that $\lim_{x\to 0} f(x)/x^n = 0$ for all $n > 0$. (You'll find that a simple-minded application of Theorem 2.10 doesn't work. Try setting $y = 1/x^2$ instead.)
b. Show that $f$ is differentiable at $x = 0$ and that $f'(0) = 0$.
c. Show by induction on $k$ that for $x \neq 0$, $f^{(k)}(x) = P(1/x)e^{-1/x^2}$, where $P$ is a polynomial of degree $3k$.
d. Show by induction on $k$ that $f^{(k)}(0)$ exists and equals 0 for all $k$. (Use the results of (a) and (c) to compute the derivative of $f^{(k-1)}$ at $x = 0$ directly from the definition, as in (b).)
The upshot is that $f$ possesses derivatives of all orders at every point and that $f^{(k)}(0) = 0$ for all $k$.
10. Exercise 2 shows that it is possible for $f'$ to exist at every point of an interval $I$ but to have discontinuities. It is an intriguing fact that when $f'$ exists at every point of $I$, it has the intermediate value property whether or not it is continuous. More precisely:
Darboux's Theorem. Suppose $f$ is differentiable on $[a, b]$. If $v$ is any number between $f'(a)$ and $f'(b)$, there is a point $c \in (a, b)$ such that $f'(c) = v$.
Prove Darboux's theorem, as follows: To simplify the notation, consider the case $a = 0$, $b = 1$. Define $h : [0, 2] \to \mathbb{R}$ by setting $h(0) = f'(0)$,
$$h(x) = \frac{f(x) - f(0)}{x} \quad (0 < x \leq 1), \qquad h(x) = \frac{f(1) - f(x - 1)}{2 - x} \quad (1 < x < 2),$$
and $h(2) = f'(1)$. Show that $h$ is continuous on $[0, 2]$ and apply the intermediate value theorem to it. (This argument has a simple geometric interpretation, which you can find if you think of $h(x)$ as the slope of the chord joining a certain pair of points on the graph of $f$.)
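The limits in Exercise 6 can be sanity-checked numerically before they are proved (this is a check, not a solution). The sketch below uses $f = \exp$ at $a = 0$, where every derivative equals 1, and $f = \sin$ at $a = 1$, where $f''(1) = -\sin 1$; these test functions are arbitrary choices.

```python
import math

def second_difference(f, a, h):
    # [f(a + 2h) - 2 f(a + h) + f(a)] / h^2
    return (f(a + 2 * h) - 2 * f(a + h) + f(a)) / h**2

def third_difference(f, a, h):
    # [f(a + 3h) - 3 f(a + 2h) + 3 f(a + h) - f(a)] / h^3
    return (f(a + 3 * h) - 3 * f(a + 2 * h) + 3 * f(a + h) - f(a)) / h**3

# f = exp at a = 0: both quotients should tend to 1 as h shrinks.
for h in (1e-1, 1e-2, 1e-3):
    print(h, second_difference(math.exp, 0.0, h), third_difference(math.exp, 0.0, h))

# f = sin at a = 1: compare with f''(1) = -sin 1
print(second_difference(math.sin, 1.0, 1e-3), -math.sin(1.0))
```

Taking $h$ much smaller eventually makes the numbers worse, because each numerator is a combination of nearly equal values; that is a floating-point effect, not a failure of the limits.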

2.2 Differentiability in Several Variables

The simplest notion of derivative for a function of several variables is that of partial derivatives, which are just the derivatives of the function with respect to each of its variables when the others are held fixed. That is, the partial derivative of a function $f(x_1, \ldots, x_n)$ with respect to the variable $x_j$ is
$$\lim_{h\to 0}\frac{f(x_1, \ldots, x_j + h, \ldots, x_n) - f(x_1, \ldots, x_j, \ldots, x_n)}{h},$$
provided that the limit exists.


The most common notations for the partial derivative just defined are
$$\frac{\partial f}{\partial x_j}, \quad f_{x_j}, \quad f_j, \quad \partial_{x_j} f, \quad \partial_j f.$$
The first one is a modification of the Leibniz notation $df/dx$ for ordinary derivatives with the $d$ replaced by the "curly d" $\partial$. The second one, with the variable of differentiation indicated merely as a subscript on the function, is often used when the first one seems too cumbersome. The third one is a variation on the second one that is used when one does not want to commit oneself to naming the independent variables but wants to speak of "the partial derivative of $f$ with respect to its $j$th variable." The notations $f_{x_j}$ and $f_j$ have the disadvantage that they may conflict with other uses of subscripts, for example, denoting an ordered list of functions by $f_1, f_2, f_3, \ldots$. It has therefore become increasingly common in advanced mathematics to use the notations $\partial_{x_j} f$ and $\partial_j f$ instead, which are reasonably compact and at the same time quite unambiguous.

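Example 1. Let $f(x, y, z) = \dfrac{e^{3x}\sin xy}{1 + 5y - 7z}$. Then
$$\partial_x f = \frac{\partial f}{\partial x} = \frac{3e^{3x}\sin xy + e^{3x}y\cos xy}{1 + 5y - 7z},$$
$$\partial_y f = \frac{\partial f}{\partial y} = \frac{(1 + 5y - 7z)e^{3x}x\cos xy - 5e^{3x}\sin xy}{(1 + 5y - 7z)^2},$$
$$\partial_z f = \frac{\partial f}{\partial z} = \frac{7e^{3x}\sin xy}{(1 + 5y - 7z)^2}.$$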
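The formulas in Example 1 can be checked against central difference quotients, each of which varies one coordinate while holding the others fixed, exactly as in the definition of a partial derivative. This is a numerical sketch; the step $h = 10^{-6}$ and the sample point $(0.3, 0.5, 0.1)$ are arbitrary choices.

```python
import math

def f(x, y, z):
    # Example 1: f(x, y, z) = e^{3x} sin(xy) / (1 + 5y - 7z)
    return math.exp(3 * x) * math.sin(x * y) / (1 + 5 * y - 7 * z)

def partial(func, point, j, h=1e-6):
    # Central difference in the j-th variable only; the other variables stay fixed
    plus, minus = list(point), list(point)
    plus[j] += h
    minus[j] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

def exact_partials(x, y, z):
    # Closed forms from Example 1
    d = 1 + 5 * y - 7 * z
    e = math.exp(3 * x)
    fx = (3 * e * math.sin(x * y) + e * y * math.cos(x * y)) / d
    fy = (d * e * x * math.cos(x * y) - 5 * e * math.sin(x * y)) / d**2
    fz = 7 * e * math.sin(x * y) / d**2
    return fx, fy, fz

p = (0.3, 0.5, 0.1)
print([partial(f, p, j) for j in range(3)])
print(exact_partials(*p))
```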

The partial derivatives of a function give information about how the value of the function changes when just one of the independent variables changes; that is, they tell how the function varies along the lines parallel to the coordinate axes. Sometimes this is just what is needed, but often we want something more. We may want to know how the function behaves when several of the variables are changed at once; or we may want to consider a new coordinate system, rotated with respect to the old one, and ask how the function varies along the lines parallel to the new axes. Do the partial derivatives provide such information? Without additional conditions on the function, the answer is no.

Example 2. Let us take another look at the function in Example 1 of §1.3:
$$f(x, y) = \frac{xy}{x^2 + y^2} \quad\text{for } (x, y) \neq (0, 0), \qquad f(0, 0) = 0. \tag{2.14}$$
We have already observed that $f$ is discontinuous at the origin; it approaches different limits as $(x, y)$ approaches the origin along different straight lines. However, we have $f(x, 0) = 0$ for all $x$ and $f(0, y) = 0$ for all $y$, so the partial derivatives $f_x(0, 0)$ and $f_y(0, 0)$ both exist and equal zero:
$$f_x(0, 0) = \lim_{h\to 0}\frac{f(h, 0) - f(0, 0)}{h} = 0 = \lim_{h\to 0}\frac{f(0, h) - f(0, 0)}{h} = f_y(0, 0).$$
Clearly $f_x(0, 0)$ and $f_y(0, 0)$ aren't describing the behavior of $f$ near the origin very well: when either $x$ or $y$ is varied while the other is held fixed at 0, $f$ doesn't change at all, but when both are varied at once, $f$ can change quite drastically!
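The behavior of (2.14) near the origin can be seen directly: along each line $y = mx$ the function is constant, equal to $m/(1 + m^2)$, while the difference quotients along the axes are identically zero. A small illustrative sketch (the slopes and step sizes are arbitrary):

```python
def f(x, y):
    # The function (2.14): xy / (x^2 + y^2) away from the origin, 0 at the origin
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

# Along the line y = m x, f(t, m t) = m / (1 + m^2) no matter how small t is.
for m in (0.0, 0.5, 1.0, -2.0):
    print(m, [f(t, m * t) for t in (1e-1, 1e-4, 1e-8)], m / (1 + m**2))

# The difference quotients defining f_x(0, 0) and f_y(0, 0) are identically 0.
h = 1e-8
print((f(h, 0) - f(0, 0)) / h, (f(0, h) - f(0, 0)) / h)
```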

We need to give more thought to what it should mean for a function of several variables to be differentiable. The right idea is provided by the characterization of differentiability in one variable that we developed in the preceding section. Namely, a function $f(x)$ is differentiable at a point $x = a$ if there is a linear function $l(x)$ such that $l(a) = f(a)$ and the difference $f(x) - l(x)$ tends to zero faster than $x - a$ as $x$ approaches $a$. Now, the general linear¹ function of $n$ variables has the form
$$l(\mathbf{x}) = b + c_1x_1 + \cdots + c_nx_n = b + \mathbf{c}\cdot\mathbf{x},$$
and the condition $l(\mathbf{a}) = f(\mathbf{a})$ forces $b$ to be $f(\mathbf{a}) - \mathbf{c}\cdot\mathbf{a}$, so that $l(\mathbf{x}) = f(\mathbf{a}) + \mathbf{c}\cdot(\mathbf{x} - \mathbf{a})$. With this in mind, here is the formal definition.
A function $f$ defined on an open set $S \subset \mathbb{R}^n$ is called differentiable at a point $\mathbf{a} \in S$ if there is a vector $\mathbf{c} \in \mathbb{R}^n$ such that
$$\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \mathbf{c}\cdot\mathbf{h}}{|\mathbf{h}|} = 0. \tag{2.15}$$
In this case $\mathbf{c}$ (which is uniquely determined by (2.15), as we shall see shortly) is called the gradient of $f$ at $\mathbf{a}$ and is denoted by $\nabla f(\mathbf{a})$. Denoting the numerator of the quotient on the left side of (2.15) by $E(\mathbf{h})$, we observe that (2.15) can be rewritten as
$$f(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot\mathbf{h} + E(\mathbf{h}), \quad\text{where } \frac{E(\mathbf{h})}{|\mathbf{h}|} \to 0 \text{ as } \mathbf{h} \to \mathbf{0}, \tag{2.16}$$
which clearly expresses the fact that $f(\mathbf{a} + \mathbf{h})$, as a function of $\mathbf{h}$, is well approximated by the linear function $f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot\mathbf{h}$ near $\mathbf{h} = \mathbf{0}$.

¹Unfortunately the term "linear" has two common meanings as applied to functions: "first-degree polynomial" and "satisfying $l(a\mathbf{x} + b\mathbf{y}) = al(\mathbf{x}) + bl(\mathbf{y})$." The first meaning (the one used here) allows a constant term; the second does not. See Appendix A, (A.5).

FIGURE 2.1: A tangent plane to a smooth surface.

What does this mean? First, let us establish the geometric intuition. If $n = 2$, the graph of the equation $z = f(\mathbf{x})$ (with $\mathbf{x} = (x, y)$) represents a surface in 3-space, and the graph of the equation $z = f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a})$ ($\mathbf{x}$ is the variable; $\mathbf{a}$ is fixed) represents a plane. These two objects both pass through the point $(\mathbf{a}, f(\mathbf{a}))$, and at nearby points $\mathbf{x} = \mathbf{a} + \mathbf{h}$ we have
$$z_{\text{surface}} - z_{\text{plane}} = f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a})\cdot\mathbf{h}.$$
Condition (2.16) says precisely that this difference tends to zero faster than $\mathbf{h}$ as $\mathbf{h} \to \mathbf{0}$. Geometrically, this means that the plane $z = f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a})$ is the tangent plane to the surface $z = f(\mathbf{x})$ at $\mathbf{x} = \mathbf{a}$, as indicated in Figure 2.1. The same interpretation is valid in any number of variables, with a little stretch of the imagination: The equation $z = f(\mathbf{x})$ represents a "hypersurface" in $\mathbb{R}^{n+1}$ with coordinates $(x_1, \ldots, x_n, z)$, and the equation $z = f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a})$ represents its "tangent hyperplane" at $\mathbf{a}$.
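The phrase "tends to zero faster than $\mathbf{h}$" in (2.16) can be watched numerically by computing $E(\mathbf{h})/|\mathbf{h}|$ as $\mathbf{h}$ shrinks along a fixed direction. In the sketch below, $x^2y$ at $(1, 2)$ (gradient $(4, 1)$) is an illustrative choice of a differentiable function; for contrast, the function (2.14) at the origin with the candidate vector $\mathbf{c} = (0, 0)$ of its partial derivatives gives a ratio that blows up instead.

```python
import math

def error_ratio(f, a, grad, h):
    # E(h)/|h| = [f(a + h) - f(a) - grad . h] / |h|, as in (2.15)-(2.16)
    ah = (a[0] + h[0], a[1] + h[1])
    linear = grad[0] * h[0] + grad[1] * h[1]
    return (f(*ah) - f(*a) - linear) / math.hypot(*h)

def smooth(x, y):
    # Illustrative choice: x^2 y, whose gradient at (1, 2) is (2xy, x^2) = (4, 1)
    return x**2 * y

def bad(x, y):
    # The function (2.14); both partial derivatives at the origin are 0
    return 0.0 if x == 0 and y == 0 else x * y / (x**2 + y**2)

for s in (1e-1, 1e-2, 1e-3):
    h = (0.6 * s, 0.8 * s)   # shrink h along the fixed unit direction (0.6, 0.8)
    print(s, error_ratio(smooth, (1.0, 2.0), (4.0, 1.0), h),
          error_ratio(bad, (0.0, 0.0), (0.0, 0.0), h))
```

For `smooth` the ratio works out to $1.68s + 0.288s^2$, so it shrinks in proportion to $|\mathbf{h}| = s$; for `bad` it equals $0.48/s$.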
Next, let us establish the connection with partial derivatives and the uniqueness of the vector $\mathbf{c}$ in (2.15). Suppose $f$ is differentiable at $\mathbf{a}$. If we take the increment $\mathbf{h}$ in (2.16) to be of the form $\mathbf{h} = (h, 0, \ldots, 0)$ with $h \in \mathbb{R}$, we have $\mathbf{c}\cdot\mathbf{h} = c_1h$ and $|\mathbf{h}| = \pm h$ (depending on the sign of $h$). Thus (2.16) says (after multiplying through by $-1$ if $h$ is negative) that
$$\lim_{h\to 0}\left[\frac{f(a_1 + h, a_2, \ldots, a_n) - f(a_1, \ldots, a_n)}{h} - c_1\right] = 0,$$
or in other words, that $c_1 = \partial_1 f(\mathbf{a})$. Likewise, $c_j = \partial_j f(\mathbf{a})$ for $j = 2, \ldots, n$. To summarize:

2.17 Theorem. If $f$ is differentiable at $\mathbf{a}$, then the partial derivatives $\partial_j f(\mathbf{a})$ all exist, and they are the components of the vector $\nabla f(\mathbf{a})$.

We also have the following:

2.18 Theorem. If $f$ is differentiable at $\mathbf{a}$, then $f$ is continuous at $\mathbf{a}$.


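Proof. Multiplying (2.15) through by $|\mathbf{h}|$, we see that $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a})\cdot\mathbf{h} \to 0$ as $\mathbf{h} \to \mathbf{0}$. Since $\nabla f(\mathbf{a})\cdot\mathbf{h}$ clearly vanishes as $\mathbf{h}$ does, we have $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) \to 0$ as $\mathbf{h} \to \mathbf{0}$, which says precisely that $f$ is continuous at $\mathbf{a}$. □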

The converses of Theorems 2.17 and 2.18 are false. The continuity of $f$ does not imply the differentiability of $f$ even in dimension $n = 1$ (think of functions like $f(x) = |x|$ whose graphs have corners). When $n > 1$, the mere existence of the partial derivatives of $f$ does not imply the differentiability of $f$ either. The example (2.14) demonstrates this: Its partial derivatives exist, but it is not continuous at the origin, so it cannot be differentiable there.
To restate what we have just shown: For a function $f$ to be differentiable at $\mathbf{a}$ it is necessary for the partial derivatives $\partial_j f(\mathbf{a})$ to exist, but not sufficient. How, then, do we know when a function is differentiable? Fortunately, there is a simple condition, not too much stronger than the existence of the partial derivatives, that guarantees differentiability.

2.19 Theorem. Let $f$ be a function defined on an open set in $\mathbb{R}^n$ that contains the point $\mathbf{a}$. Suppose that the partial derivatives $\partial_j f$ all exist on some neighborhood of $\mathbf{a}$ and that they are continuous at $\mathbf{a}$. Then $f$ is differentiable at $\mathbf{a}$.

Proof. Let's consider the case $n = 2$, to keep the notation simple. We wish to show that
$$\frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \mathbf{c}\cdot\mathbf{h}}{|\mathbf{h}|} \to 0 \text{ as } \mathbf{h} \to \mathbf{0}, \quad\text{where } \mathbf{c} = (\partial_1 f(\mathbf{a}), \partial_2 f(\mathbf{a})). \tag{2.20}$$
To do this, we shall analyze the increment $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ by making the change one variable at a time:
$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = [f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2)] + [f(a_1, a_2 + h_2) - f(a_1, a_2)]. \tag{2.21}$$
We assume that $\mathbf{h}$ is small enough so that the partial derivatives $\partial_j f(\mathbf{x})$ exist whenever $|\mathbf{x} - \mathbf{a}| \leq |\mathbf{h}|$. In this case, we can use the one-variable mean value theorem to express the differences on the right side of (2.21) in terms of the partial derivatives of $f$ at suitable points. If we set $g(t) = f(t, a_2 + h_2)$, we have
$$f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2) = g(a_1 + h_1) - g(a_1) = g'(a_1 + c_1)h_1 = \partial_1 f(a_1 + c_1, a_2 + h_2)h_1$$
for some number $c_1$ lying between 0 and $h_1$. Similarly,
$$f(a_1, a_2 + h_2) - f(a_1, a_2) = \partial_2 f(a_1, a_2 + c_2)h_2$$
for some $c_2$ between 0 and $h_2$. Substituting these results back into (2.21) and then into the left side of (2.20), we obtain
$$\frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \mathbf{c}\cdot\mathbf{h}}{|\mathbf{h}|} = [\partial_1 f(a_1 + c_1, a_2 + h_2) - \partial_1 f(a_1, a_2)]\frac{h_1}{|\mathbf{h}|} + [\partial_2 f(a_1, a_2 + c_2) - \partial_2 f(a_1, a_2)]\frac{h_2}{|\mathbf{h}|}.$$
Now let $\mathbf{h} \to \mathbf{0}$. The expressions in brackets tend to 0 because the partial derivatives $\partial_j f$ are continuous at $\mathbf{a}$, and the ratios $h_1/|\mathbf{h}|$ and $h_2/|\mathbf{h}|$ are bounded by 1 in absolute value. Thus (2.20) is valid and $f$ is differentiable at $\mathbf{a}$.
The idea for general $n$ is exactly the same. We write $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ as the sum of $n$ increments, each of which involves a change in only one variable; for example, the first of them is
$$f(a_1 + h_1, a_2 + h_2, \ldots, a_n + h_n) - f(a_1, a_2 + h_2, \ldots, a_n + h_n).$$
We then use the mean value theorem to express each difference in terms of a partial derivative of $f$ and proceed as before. □

A function $f$ whose partial derivatives $\partial_j f$ all exist and are continuous on an open set $S$ is said to be of class $C^1$ on $S$. For short, we shall also say that "$f$ is $C^1$ on $S$" or "$f \in C^1(S)$" and refer to "a $C^1$ function $f$." Theorems 2.17 and 2.19 then say that
$$C^1 \implies \text{differentiable} \implies \text{partial derivatives exist}.$$
The reverse implications are false. We already know that existence of partial derivatives does not imply differentiability, and there are differentiable functions whose derivatives are discontinuous. The standard example in one variable is the function in Exercise 2, §2.1, and it is easy to generate higher-dimensional examples from this one.
For most of the elementary functions that we shall work with, the continuity of the partial derivatives is obvious by inspection, so verifying the differentiability of a function is usually no problem. For example, for $(x, y) \neq (0, 0)$ the partial derivatives of our old friend (2.14) are
$$\partial_x f(x, y) = \frac{y^3 - x^2y}{(x^2 + y^2)^2}, \qquad \partial_y f(x, y) = \frac{x^3 - xy^2}{(x^2 + y^2)^2},$$
which are continuous everywhere except at the origin. Thus $f$ is differentiable at every point except the origin.
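A short numerical sketch of these formulas for (2.14): away from the origin the closed form for $\partial_x f$ agrees with a central difference quotient (the point $(1, 2)$ is an arbitrary choice), while on the $y$-axis it equals $1/y$, which is unbounded near the origin even though $f_x(0, 0) = 0$.

```python
def f(x, y):
    # The function (2.14), evaluated away from the origin
    return x * y / (x**2 + y**2)

def dx_f(x, y):
    # Partial derivative in x for (x, y) != (0, 0): (y^3 - x^2 y) / (x^2 + y^2)^2
    return (y**3 - x**2 * y) / (x**2 + y**2) ** 2

# Compare the closed form with a central difference at an ordinary point.
h = 1e-6
print(dx_f(1.0, 2.0), (f(1.0 + h, 2.0) - f(1.0 - h, 2.0)) / (2 * h))

# On the y-axis dx_f(0, y) = 1/y, which blows up as y -> 0.
print([dx_f(0.0, y) for y in (1e-1, 1e-3, 1e-5)])
```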
We conclude this section by examining a few ramifications of the notion of
differentiability.

Differentials. Suppose $f$ is differentiable at $\mathbf{a}$, so that
$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \nabla f(\mathbf{a})\cdot\mathbf{h} + \text{error},$$
where the error term is negligibly small in comparison with $\mathbf{h}$. If we neglect the error term, the resulting approximation to the increment $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ is called the differential of $f$ at $\mathbf{a}$ and is denoted by $df(\mathbf{a}; \mathbf{h})$ or $df_{\mathbf{a}}(\mathbf{h})$:
$$df(\mathbf{a}; \mathbf{h}) = df_{\mathbf{a}}(\mathbf{h}) = \nabla f(\mathbf{a})\cdot\mathbf{h} = \partial_1 f(\mathbf{a})h_1 + \cdots + \partial_n f(\mathbf{a})h_n. \tag{2.22}$$
If we set $f(\mathbf{x}) = u$ and $\mathbf{h} = d\mathbf{x} = (dx_1, \ldots, dx_n)$, this formula can be written informally as
$$du = \frac{\partial u}{\partial x_1}dx_1 + \frac{\partial u}{\partial x_2}dx_2 + \cdots + \frac{\partial u}{\partial x_n}dx_n.$$
We can think of this in two ways. Intuitively, if we think of $dx_1, \ldots, dx_n$ as infinitesimal increments in the independent variables $x_1, \ldots, x_n$, then $du$ is the corresponding infinitesimal increment in the dependent variable $u$. Or, if we think of $dx_1, \ldots, dx_n$ as honest, finite increments, $du$ is the corresponding increment in the $u$ value, not on the (hyper)surface $u = f(\mathbf{x})$, but on its tangent (hyper)plane: It is the linear approximation to the increment in the function $f$.
Differentials obey the usual elementary rules of differentiation, such as the sum, product, and quotient rules:
$$d(f + g) = df + dg, \qquad d(fg) = f\,dg + g\,df, \qquad d\left(\frac{f}{g}\right) = \frac{g\,df - f\,dg}{g^2}.$$
This follows from (2.22) and the fact that the partial derivatives obey these rules. We'll see later how differentials interact with the chain rule.
Differentials are handy for approximating small changes in a function. Here's an example:

Example 3. A right circular cone has height 5 and base radius 3. (a) About how much does the volume increase if the height is increased to 5.02 and the radius is increased to 3.01? (b) If the height is increased to 5.02, by about how much should the radius be decreased to keep the volume constant?
Solution. The volume of a cone is given by $V = \frac{1}{3}\pi r^2h$, so $dV = \frac{2}{3}\pi rh\,dr + \frac{1}{3}\pi r^2\,dh$. (a) If $r = 3$, $h = 5$, $dr = .01$, and $dh = .02$, we have $dV = \frac{2}{3}\pi(3)(5)(.01) + \frac{1}{3}\pi(3^2)(.02) = .16\pi \approx .50$. (b) If $r = 3$, $h = 5$, $dh = .02$, as in (a) we have $dV = 10\pi\,dr + .06\pi$, so $dV = 0$ if $dr = -.006$.
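The arithmetic of Example 3 can be compared with the exact change in volume. This sketch (function names are illustrative) evaluates the differential, the exact increment for part (a), and the value of $dr$ that makes $dV = 0$ in part (b).

```python
import math

def volume(r, h):
    # V = (1/3) pi r^2 h for a right circular cone
    return math.pi * r**2 * h / 3

def dV(r, h, dr, dh):
    # Differential: dV = (2/3) pi r h dr + (1/3) pi r^2 dh
    return 2 * math.pi * r * h * dr / 3 + math.pi * r**2 * dh / 3

r, h = 3.0, 5.0
approx = dV(r, h, 0.01, 0.02)                 # part (a): about .16 pi
exact = volume(3.01, 5.02) - volume(r, h)     # the true change in part (a)

# Part (b): solve dV = (2/3) pi r h dr + (1/3) pi r^2 dh = 0 for dr with dh = .02
dr_b = -(math.pi * r**2 * 0.02 / 3) / (2 * math.pi * r * h / 3)
exact_b = volume(r + dr_b, 5.02) - volume(r, h)   # small but not exactly zero

print(approx, exact, dr_b, exact_b)
```

The differential gives about 0.503 in part (a) against an exact change of about 0.504; the small remaining discrepancy is the error term that the differential neglects.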

Directional Derivatives. The partial derivatives $\partial_j f$ give information about how $f(\mathbf{x})$ varies as $\mathbf{x}$ moves along lines parallel to the coordinate axes. Sometimes we wish to study the variation of $f$ along oblique lines instead. Thus, given a unit vector $\mathbf{u}$ and a base point $\mathbf{a}$, we consider the line passing through $\mathbf{a}$ in the direction $\mathbf{u}$, which can be represented parametrically by $\mathbf{g}(t) = \mathbf{a} + t\mathbf{u}$. The directional derivative of $f$ at $\mathbf{a}$ in the direction $\mathbf{u}$ is defined to be
$$\partial_{\mathbf{u}} f(\mathbf{a}) = \frac{d}{dt}f(\mathbf{a} + t\mathbf{u})\Big|_{t=0} = \lim_{t\to 0}\frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t},$$
provided that the limit exists. For example, if $\mathbf{u}$ is the unit vector in the positive $j$th coordinate direction (that is, $\mathbf{u} = (0, \ldots, 1, \ldots, 0)$ with the 1 in the $j$th place), then $\partial_{\mathbf{u}} f$ is just the partial derivative $\partial_j f$.

2.23 Theorem. If $f$ is differentiable at $\mathbf{a}$, then the directional derivatives of $f$ at $\mathbf{a}$ all exist, and they are given by
$$\partial_{\mathbf{u}} f(\mathbf{a}) = \nabla f(\mathbf{a})\cdot\mathbf{u}. \tag{2.24}$$

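Proof. Differentiability of $f$ means that
$$\frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a})\cdot\mathbf{h}}{|\mathbf{h}|} \to 0 \quad\text{as } \mathbf{h} \to \mathbf{0}. \tag{2.25}$$
We take $\mathbf{h} = t\mathbf{u}$. If $t > 0$, then $|\mathbf{h}| = t$ and the expression on the left of (2.25) is
$$\frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t} - \nabla f(\mathbf{a})\cdot\mathbf{u}.$$
If $t < 0$, then $|\mathbf{h}| = -t$ and the expression on the left of (2.25) is
$$-\frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t} + \nabla f(\mathbf{a})\cdot\mathbf{u}.$$
In either case, this quantity tends to 0 as $t \to 0$, which means that $\partial_{\mathbf{u}} f(\mathbf{a})$ exists and equals $\nabla f(\mathbf{a})\cdot\mathbf{u}$. □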

It is possible for all the directional derivatives of $f$ to exist even if $f$ is not differentiable, but in that case they cannot be computed from the partial derivatives by the simple formula (2.24); see Exercise 7.
Consideration of directional derivatives leads to a geometric interpretation of the gradient vector $\nabla f(\mathbf{a})$ when this vector is nonzero. Indeed, by (2.24) and Cauchy's inequality, we have $|\partial_{\mathbf{u}} f(\mathbf{a})| \leq |\nabla f(\mathbf{a})|$ for every unit vector $\mathbf{u}$, and the extreme case $\partial_{\mathbf{u}} f(\mathbf{a}) = |\nabla f(\mathbf{a})|$ occurs when $\mathbf{u}$ is the unit vector in the same direction as $\nabla f(\mathbf{a})$. Thus, $\nabla f(\mathbf{a})$ is the vector whose magnitude is the largest directional derivative of $f$ at $\mathbf{a}$, and whose direction is the direction of that derivative. In other words, $\nabla f(\mathbf{a})$ points in the direction of steepest increase of $f$ at $\mathbf{a}$, and its magnitude is the rate of increase of $f$ in that direction.

EXAMPLE 4. Let $f(x, y) = x^2 + 5xy^2$, $\mathbf{a} = (-2, 1)$. (a) Find the directional derivative of $f$ at $\mathbf{a}$ in the direction of the vector $\mathbf{v} = (12, 5)$. (b) What is the largest of the directional derivatives of $f$ at $\mathbf{a}$, and in what direction does it occur?
Solution. We have $\nabla f(x, y) = (2x + 5y^2, 10xy)$, so that $\nabla f(-2, 1) = (1, -20)$. The unit vector in the direction of $\mathbf{v}$ is $\mathbf{u} = (\frac{12}{13}, \frac{5}{13})$, so the directional derivative in this direction is $\nabla f(\mathbf{a})\cdot\mathbf{u} = (1, -20)\cdot(\frac{12}{13}, \frac{5}{13}) = -\frac{88}{13}$. The largest directional derivative at $\mathbf{a}$ is $|\nabla f(\mathbf{a})| = \sqrt{401}$, and it occurs in the direction $\frac{1}{\sqrt{401}}(1, -20)$.
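Example 4 can be reproduced numerically: formula (2.24) gives the directional derivative, a one-sided difference quotient from the definition should agree with it, and a brute-force search over unit vectors $(\cos\theta, \sin\theta)$ should find a largest value close to $\sqrt{401}$. The step $t$ and the angular grid are arbitrary choices in this sketch.

```python
import math

def f(x, y):
    # Example 4: f(x, y) = x^2 + 5 x y^2
    return x**2 + 5 * x * y**2

def grad_f(x, y):
    # Gradient (2x + 5y^2, 10xy)
    return (2 * x + 5 * y**2, 10 * x * y)

a = (-2.0, 1.0)
g = grad_f(*a)                                          # (1, -20)
v = (12.0, 5.0)
u = (v[0] / math.hypot(*v), v[1] / math.hypot(*v))      # (12/13, 5/13)
d_u = g[0] * u[0] + g[1] * u[1]                         # (2.24): grad f(a) . u

# The defining difference quotient with a small t
t = 1e-6
d_u_numeric = (f(a[0] + t * u[0], a[1] + t * u[1]) - f(*a)) / t

# Largest directional derivative over a fine grid of unit vectors (cos th, sin th)
best = max(
    (g[0] * math.cos(th) + g[1] * math.sin(th), th)
    for th in (2 * math.pi * k / 100000 for k in range(100000))
)
print(d_u, -88 / 13, d_u_numeric)
print(best[0], math.sqrt(401))
```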

EXERCISES

1. For each of the following functions $f$, (i) compute $\nabla f$, (ii) find the directional derivative of $f$ at the point $(1, -2)$ in the direction (§,§).
a. $f(x, y) = x^2y + \sin\pi xy$.

c. $f(x, y) = (x + 2y + 4)/(7x + 3y)$.

2. For each of the following functions $f$, (i) compute the differential $df$, (ii) use the differential to estimate the difference $f(1.1, 1.2, -0.1) - f(1, 1, 0)$.
/> / \ O f o/_|_Q y
at I t* ii y 1 — rp** c,itj }j\xJ*'
■ J \x> Uiz) — x e
b. $f(x, y, z) = y^3 + \log(x + z^2)$.
x2y3/2z
3. Let w = f(x,y,z) = ■ —. Suppose that, at the outset, (x.y.z) =

$(5, 4, 1)$, so that $w = 100$. Use differentials to answer the following questions.
a. Suppose we change $x$ to 5.03 and $y$ to 3.92. By (about) how much should we change $z$ in order to keep $w = 100$?
b. Suppose we want to increase the value of $w$ a little bit by changing the value of only one of the independent variables. Which variable should we choose to get the biggest increase in $w$ for the smallest change of the independent variable?

4. Show that $u = f(x, y, z) = xe^{2z} + y^{-1}e^{5z}$ satisfies the differential equation
$$x\,\partial_x u + 2y\,\partial_y u + \partial_z u = 3u.$$

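5. Show that $u = f(x, y) = xy/(xy - y + 2x)$ satisfies the differential equation
$$x^2\,\partial_x u + y^2\,\partial_y u = u^2.$$
6. For $j = 1, \ldots, n$, define the function $f_j$ on $\mathbb{R}^n \setminus \{\mathbf{0}\}$ by $f_j(\mathbf{x}) = x_j/|\mathbf{x}|$. Show that $\sum_{j=1}^n x_j\,df_j = 0$.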
2

7. Let f(x, y) = J^ if (z, y) / (0,0) and /(0,0) = 0.

a. Show that $f$ is continuous at $(0, 0)$. (Hint: Since $0 \leq (x \pm y)^2 = x^2 + y^2 \pm 2xy$, we have $|xy| \leq \frac{1}{2}(x^2 + y^2)$ for all $x, y$.)
b. Show that the directional derivatives $\partial_{\mathbf{u}} f(0, 0)$ all exist, and compute them. (Work directly with the definition of directional derivative. The best way to write a unit vector in $\mathbb{R}^2$ is as $\mathbf{u} = (\cos\theta, \sin\theta)$.)
c. Show that $f$ is not differentiable at $(0, 0)$. (Hint: If it were, the directional derivatives $\partial_{\mathbf{u}} f(0, 0)$ would be related to the partial derivatives $\partial_x f(0, 0)$ and $\partial_y f(0, 0)$ by (2.24).)
8. Suppose $f$ is a function defined on an open set $S \subset \mathbb{R}^n$. Show that if the partial derivatives $\partial_j f$ exist and are bounded on $S$, then $f$ is continuous on $S$. (Exercise 7 provides an example of a function that satisfies these conditions on $S = \mathbb{R}^2$ but is not everywhere differentiable.)

2.3 The Chain Rule

There are several different but closely related versions of the chain rule for functions of several variables. The most basic one concerns the situation where we have a function $f(x_1, \ldots, x_n)$ and the variables $x_1, \ldots, x_n$ are themselves functions of a single real variable $t$. To be precise, suppose $x_j = g_j(t)$, or $\mathbf{x} = \mathbf{g}(t)$; we then have the composite function $\varphi(t) = f(\mathbf{g}(t))$.
We recall that the derivative $\mathbf{g}'(t)$ is defined componentwise:
$$\mathbf{g}'(t) = (g_1'(t), \ldots, g_n'(t)).$$
Geometrically speaking, the equation $\mathbf{x} = \mathbf{g}(t)$ represents a parametrized curve in $\mathbb{R}^n$; we may think of a particle moving in $\mathbb{R}^n$ whose position at time $t$ is $\mathbf{g}(t)$. In this case the vector $\mathbf{g}'(t)$ is the velocity of the particle at time $t$; it is tangent to the curve at $\mathbf{g}(t)$, and its magnitude is the speed at which the particle is traveling along the curve.

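2.26 Theorem (Chain Rule I). Suppose that $\mathbf{g}(t)$ is differentiable at $t = a$, $f(\mathbf{x})$ is differentiable at $\mathbf{x} = \mathbf{b}$, and $\mathbf{b} = \mathbf{g}(a)$. Then the composite function $\varphi(t) = f(\mathbf{g}(t))$ is differentiable at $t = a$, and its derivative is given by
$$\varphi'(a) = \nabla f(\mathbf{b})\cdot\mathbf{g}'(a),$$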
2.3. The Chain Rule 63

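or, in Leibniz notation, with $w = f(\mathbf{x})$,
$$\frac{dw}{dt} = \frac{\partial w}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial w}{\partial x_n}\frac{dx_n}{dt}. \tag{2.27}$$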

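Proof. Differentiability of $f$ and $\mathbf{g}$ at the appropriate points means that
$$f(\mathbf{b} + \mathbf{h}) = f(\mathbf{b}) + \nabla f(\mathbf{b})\cdot\mathbf{h} + E_1(\mathbf{h}), \qquad E_1(\mathbf{h})/|\mathbf{h}| \to 0 \text{ as } \mathbf{h} \to \mathbf{0};$$
$$\mathbf{g}(a + u) = \mathbf{g}(a) + u\,\mathbf{g}'(a) + \mathbf{E}_2(u), \qquad |\mathbf{E}_2(u)|/|u| \to 0 \text{ as } u \to 0.$$
In the first equation we take $\mathbf{h} = \mathbf{g}(a + u) - \mathbf{g}(a)$. By the second equation, we also have $\mathbf{h} = u\,\mathbf{g}'(a) + \mathbf{E}_2(u)$, and we are given that $\mathbf{g}(a) = \mathbf{b}$, so
$$\begin{aligned}\varphi(a + u) &= f(\mathbf{g}(a + u)) = f(\mathbf{b} + \mathbf{h}) = f(\mathbf{b}) + \nabla f(\mathbf{b})\cdot\mathbf{h} + E_1(\mathbf{h})\\ &= f(\mathbf{g}(a)) + \nabla f(\mathbf{b})\cdot[u\,\mathbf{g}'(a) + \mathbf{E}_2(u)] + E_1(\mathbf{h})\\ &= \varphi(a) + u\,\nabla f(\mathbf{b})\cdot\mathbf{g}'(a) + E_3(u),\end{aligned}$$
where
$$E_3(u) = \nabla f(\mathbf{b})\cdot\mathbf{E}_2(u) + E_1(\mathbf{h}).$$
We claim that the error term $E_3(u)$ satisfies $E_3(u)/u \to 0$ as $u \to 0$. Granted this, we have
$$\frac{\varphi(a + u) - \varphi(a)}{u} = \nabla f(\mathbf{b})\cdot\mathbf{g}'(a) + \frac{E_3(u)}{u} \to \nabla f(\mathbf{b})\cdot\mathbf{g}'(a) \quad\text{as } u \to 0,$$
so that $\varphi'(a) = \nabla f(\mathbf{b})\cdot\mathbf{g}'(a)$ as claimed.
Showing that $E_3(u)/u \to 0$ is just a matter of sorting out the mess a little. The fact that $|\mathbf{E}_2(u)|/|u| \to 0$ takes care of the first term in $E_3(u)$, by Cauchy's inequality:
$$\left|\frac{\nabla f(\mathbf{b})\cdot\mathbf{E}_2(u)}{u}\right| \leq |\nabla f(\mathbf{b})|\,\frac{|\mathbf{E}_2(u)|}{|u|} \to 0.$$
It also implies that when $u$ is small we have $|\mathbf{E}_2(u)| \leq |u|$ and hence
$$|\mathbf{h}| \leq |u|\,|\mathbf{g}'(a)| + |\mathbf{E}_2(u)| \leq (|\mathbf{g}'(a)| + 1)|u|.$$
Now the second term in $E_3(u)$, namely $E_1(\mathbf{h})$, becomes negligibly small in comparison to $|\mathbf{h}|$ as $|\mathbf{h}| \to 0$, and the estimate above shows that $|\mathbf{h}|$ in turn is bounded by a constant times $|u|$, so $E_1(\mathbf{h})$ becomes negligibly small in comparison to $|u|$ as $u \to 0$, which means that $E_1(\mathbf{h})/u \to 0$ as desired. □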

Example 1. Suppose $w = f(x, y, z)$ is a differentiable function of $(x, y, z)$, and that $x = t^4 - t$, $y = \sin 3t$, and $z = e^{-2t}$. Then $w$ can be regarded as a composite function of $t$, and we have
$$\frac{dw}{dt} = \frac{d}{dt}f(t^4 - t, \sin 3t, e^{-2t}) = (\partial_1 f)\cdot(4t^3 - 1) + (\partial_2 f)\cdot(3\cos 3t) + (\partial_3 f)\cdot(-2e^{-2t}),$$
where the partial derivatives $\partial_j f$ are all evaluated at $(t^4 - t, \sin 3t, e^{-2t})$.
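Example 1 leaves $f$ unspecified, so to test the formula numerically we must pick one. The sketch below uses the hypothetical choice $f(x, y, z) = xy + z^2$ (an assumption made only for illustration) and compares the chain-rule expression (2.27) with a central difference quotient of the composite function $w(t)$.

```python
import math

def f(x, y, z):
    # Hypothetical differentiable f, chosen only for illustration
    return x * y + z**2

def grad_f(x, y, z):
    return (y, x, 2 * z)

def curve(t):
    # x = t^4 - t, y = sin 3t, z = e^{-2t}, as in Example 1
    return (t**4 - t, math.sin(3 * t), math.exp(-2 * t))

def curve_prime(t):
    return (4 * t**3 - 1, 3 * math.cos(3 * t), -2 * math.exp(-2 * t))

def dw_dt_chain(t):
    # Chain rule (2.27): sum over j of (partial_j f)(g(t)) * g_j'(t)
    return sum(p * q for p, q in zip(grad_f(*curve(t)), curve_prime(t)))

def dw_dt_numeric(t, h=1e-6):
    # Central difference of the composite w(t) = f(g(t))
    return (f(*curve(t + h)) - f(*curve(t - h))) / (2 * h)

for t in (0.0, 0.5, 1.3):
    print(t, dw_dt_chain(t), dw_dt_numeric(t))
```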

Suppose now that the variables $x_1, \ldots, x_n$ are differentiable functions, not of a single real variable $t$, but of a family of variables $\mathbf{t} = (t_1, \ldots, t_m)$; say, $x_j = g_j(t_1, \ldots, t_m)$, or $\mathbf{x} = \mathbf{g}(\mathbf{t})$. If $f$ is a differentiable function of $\mathbf{x}$, we then have the composite function $\varphi(\mathbf{t}) = f(\mathbf{g}(\mathbf{t}))$. The chain rule, as stated above, can be used to compute the partial derivatives of $\varphi$ with respect to the variables $t_k$. Indeed, we simply fix all but one of those variables and apply the chain rule to the resulting function of the remaining single variable to obtain
$$\frac{\partial\varphi}{\partial t_k} = \frac{\partial f}{\partial x_1}\frac{\partial g_1}{\partial t_k} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial g_n}{\partial t_k}, \tag{2.28}$$
or, setting $w = f(\mathbf{x})$,
$$\frac{\partial w}{\partial t_k} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_k} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_k}.$$
To be precise, this calculation shows that if the partial derivatives $\partial g_j/\partial t_k$ exist at $\mathbf{t} = \mathbf{a}$ and if $f$ is differentiable at $\mathbf{x} = \mathbf{b} = \mathbf{g}(\mathbf{a})$, then the partial derivatives $\partial\varphi/\partial t_k$ exist at $\mathbf{t} = \mathbf{a}$ and are given by (2.28). It also shows that if $\mathbf{g}$ is of class $C^1$ near $\mathbf{a}$ and $f$ is of class $C^1$ near $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then $\varphi$ is of class $C^1$, and in particular is differentiable, near $\mathbf{a}$. Indeed, under these hypotheses, (2.28) shows that the partial derivatives $\partial\varphi/\partial t_k$ are continuous.
It is also natural to ask whether the composite function $f \circ \mathbf{g}$ is differentiable when $f$ and $\mathbf{g}$ are only assumed to be differentiable rather than $C^1$. The answer is affirmative. When $t$ is only a single real variable, this result is contained in the chain rule as stated and proved above. The proof for the general case, $\mathbf{t} = (t_1, \ldots, t_m)$, is almost identical except that the notation is a little messier, and we shall not take the trouble to write it out. But we shall give a formal statement of the result:

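2.29 Theorem (Chain Rule II). Suppose that $g_1, \ldots, g_n$ are functions of $\mathbf{t} = (t_1, \ldots, t_m)$ and $f$ is a function of $\mathbf{x} = (x_1, \ldots, x_n)$. Let $\mathbf{b} = \mathbf{g}(\mathbf{a})$ and $\varphi = f \circ \mathbf{g}$. If $g_1, \ldots, g_n$ are differentiable at $\mathbf{a}$ (resp. of class $C^1$ near $\mathbf{a}$) and $f$ is differentiable at $\mathbf{b}$ (resp. of class $C^1$ near $\mathbf{b}$), then $\varphi$ is differentiable at $\mathbf{a}$ (resp. of class $C^1$ near $\mathbf{a}$), and its partial derivatives are given by
$$\frac{\partial\varphi}{\partial t_k} = \sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial t_k}, \tag{2.30}$$
where the derivatives $\partial f/\partial x_j$ are evaluated at $\mathbf{b}$ and the derivatives $\partial\varphi/\partial t_k$ and $\partial x_j/\partial t_k$ are evaluated at $\mathbf{a}$.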

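Example 2. Suppose that $f$ is a differentiable function of $x$ and $y$ and that $x = s\log(1 + t^2)$ and $y = \cos(s^3 + 5t)$. Then the partial derivatives of the composite function $z = f(s\log(1 + t^2), \cos(s^3 + 5t))$ are given by
$$\frac{\partial z}{\partial s} = \log(1 + t^2)\,\partial_1 f - 3s^2\sin(s^3 + 5t)\,\partial_2 f, \qquad \frac{\partial z}{\partial t} = \frac{2st}{1 + t^2}\,\partial_1 f - 5\sin(s^3 + 5t)\,\partial_2 f.$$
Here, the partial derivatives of $f$ are to be evaluated at $(s\log(1 + t^2), \cos(s^3 + 5t))$.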

The chain rule (2.30) has a neat interpretation in terms of differentials. Let $w = f(\mathbf{x})$. If we regard $x_1, \ldots, x_n$ as independent variables, we have

$$dw = \frac{\partial w}{\partial x_1}\,dx_1 + \cdots + \frac{\partial w}{\partial x_n}\,dx_n. \qquad (2.31)$$

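On the other hand, if we regard $x_1, \ldots, x_n$ as functions of the variables $t_1, \ldots, t_m$ and $w$ as the composite function $f(\mathbf{x}(\mathbf{t}))$, we have

$$dx_j = \frac{\partial x_j}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_j}{\partial t_m}\,dt_m \qquad (2.32)$$

and

$$dw = \frac{\partial w}{\partial t_1}\,dt_1 + \cdots + \frac{\partial w}{\partial t_m}\,dt_m. \qquad (2.33)$$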

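If we substitute the expressions (2.32) for $dx_j$ into (2.31) and regroup the terms, we obtain

$$\begin{aligned} dw &= \frac{\partial w}{\partial x_1}\left[\frac{\partial x_1}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_1}{\partial t_m}\,dt_m\right] + \cdots + \frac{\partial w}{\partial x_n}\left[\frac{\partial x_n}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_n}{\partial t_m}\,dt_m\right] \\ &= \left[\frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_1} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_1}\right]dt_1 + \cdots + \left[\frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_m} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_m}\right]dt_m. \end{aligned}$$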

The content of the chain rule (2.30) is precisely that this last expression for $dw$ coincides with (2.33). In other words, the differential formalism has the chain rule "built in," just as it does in one variable (where the chain rule $dw/dt = (dw/dx)(dx/dt)$ is just a matter of "canceling the $dx$'s").
The preceding discussion concerns the situation where the variable $w$ depends on a set of variables $x_j$, and the $x_j$'s depend on a different set of variables $t_k$. However, in many situations the variables on different "levels" can get mixed up with each other. The typical example is as follows. Consider a physical quantity $w = f(x, y, z, t)$ whose value depends on the position $(x, y, z)$ and the time $t$ (temperature, for example, or air pressure in a region of the atmosphere). Consider also a vehicle moving through space, so that its coordinates $(x, y, z)$ are functions of $t$. We wish to know how the quantity $w$ varies in time, as measured by an observer on the vehicle; that is, we are interested in the behavior of the composite function

$$w = f(x(t), y(t), z(t), t).$$

Here $t$ enters not only as a "first-level" variable, as the last argument of $f$, but also as a "second-level" variable through the $t$-dependence of $x$, $y$, $z$.
How should this be handled? There is no real problem; the only final independent variable is $t$, so the chain rule in the form (2.27) can be applied:

$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt} + \frac{\partial w}{\partial t}. \qquad (2.34)$$

In the last term we have omitted the derivative $dt/dt$, which of course equals 1. (If this makes you nervous, denote the fourth variable in $f$ by $u$ instead of $t$; then we are considering $w = f(x(t), y(t), z(t), u(t))$ where $u(t) = t$.)
Notice the subtle use of notation: The $dw/dt$ on the left of (2.34) denotes the "total derivative" of $w$, taking into account all the ways in which $w$ depends on $t$, whereas the $\partial w/\partial t$ on the right denotes the partial derivative that involves only the explicit dependence of the function $f$ on its fourth variable $t$. This notation works well enough in this situation, but it becomes inadequate if there is more than one final independent variable.
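The following sketch illustrates (2.34) with a hypothetical field $f(x, y, z, t) = xy + zt^2$ and a hypothetical trajectory $(x, y, z) = (\cos t, \sin t, t)$; neither comes from the text. It checks that the chain-rule terms through $x$, $y$, $z$ plus the explicit partial $\partial w/\partial t$ reproduce the derivative of the composite function. Dropping the last term would give a wrong answer, which is exactly the distinction between the total and the partial derivative described above.

```python
import math

# Hypothetical field w = f(x, y, z, t) and trajectory, for illustration only.
def f(x, y, z, t):
    return x * y + z * t**2

def partials_f(x, y, z, t):
    """(f_x, f_y, f_z, f_t) for the illustrative f above."""
    return y, x, t**2, 2 * z * t

def position(t):
    return math.cos(t), math.sin(t), t

def velocity(t):
    return -math.sin(t), math.cos(t), 1.0

def total_derivative(t):
    """dw/dt from (2.34): chain-rule terms through x, y, z plus the explicit f_t."""
    x, y, z = position(t)
    fx, fy, fz, ft = partials_f(x, y, z, t)
    dx, dy, dz = velocity(t)
    return fx * dx + fy * dy + fz * dz + ft

def w(t):
    """The composite function w(t) = f(x(t), y(t), z(t), t)."""
    return f(*position(t), t)

t0, h = 0.9, 1e-6
numeric = (w(t0 + h) - w(t0 - h)) / (2 * h)
print(total_derivative(t0), numeric)
```

For these choices $w(t) = \cos t\sin t + t^3$, so both numbers should be close to $\cos 2t + 3t^2$ at $t = 0.9$.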
Suppose, for example, that we are studying a function $w = f(x, y, t, s)$, and that $x$ and $y$ are themselves functions of the independent variables $t$ and $s$. Then the analogue of (2.34) would be

$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial t},$$

but this is nonsense! The $\partial w/\partial t$'s on the left and on the right denote different things. In such a situation we must use one of the alternative notations for partial derivatives that offer more precision, or perhaps add some subscripts to the $\partial w/\partial t$'s to specify their meaning. In this case, if $x = \varphi(t, s)$ and $y = \psi(t, s)$, we could write

$$\frac{\partial w}{\partial t} = (\partial_1 f)(\partial_1\varphi) + (\partial_2 f)(\partial_1\psi) + \partial_3 f. \qquad (2.35)$$

The mixture of dependent-and-independent-variable notation on the left and functional notation on the right in (2.35) is perhaps inelegant, but it does the job!

Figure 2.2: Diagram of dependence for the basic chain rule.
In general, it is best not to be too doctrinaire about deciding to use one notation for partial derivatives rather than another one; clarity is more important than consistency. We shall be quite free about adopting whichever notation works best in a particular situation, and the exercises aim at encouraging the reader to do likewise.
When the relations among the variables become too complicated for comfort, we can often sort things out by drawing a schematic diagram of the functional relationships. The idea is as follows:
i. Write down the dependent variable on the left of the page, a list of the independent variables on which it ultimately depends on the right, and lists of the intermediate variables in the middle.
ii. Whenever one variable $p$ depends directly on another one $q$, draw a line joining them; this line represents the partial derivative $\partial p/\partial q$.
iii. To find the derivative of the variable $w$ on the left with respect to one of the variables $t$ on the right, consider all the ways you can go from $w$ to $t$ by following the lines. For each such path, write down the product of partial derivatives corresponding to the lines along the path, then add the results.
The diagram for the basic chain rule (2.27) is shown in Figure 2.2: The path from $w$ to $x_j$ to $t$ gives the term $(\partial w/\partial x_j)(dx_j/dt)$ in (2.27). On the other hand, Figure 2.3 gives the diagram for $w = f(x, y, t, s)$ where $x$ and $y$ depend on $t$ and $s$: There are three paths from $w$ to $t$ ($w$ to $x$ to $t$, $w$ to $y$ to $t$, and $w$ to $t$ directly) that give the three terms on the right of (2.35).

Figure 2.3: Diagram of dependence for $w = f(x, y, t, s)$, $x = \varphi(t, s)$, $y = \psi(t, s)$.
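Rules i-iii amount to enumerating paths in a directed graph. The sketch below encodes the diagram of Figure 2.3 as a hypothetical dictionary `DEPENDS_ON` (each key lists the variables it depends on directly) and prints one product of partial derivatives per path. The labels use a plain "d" as an ASCII stand-in for the partial-derivative symbol, and the direct edge from $w$ to $t$ stands for $\partial_3 f$ in (2.35).

```python
# Dependence diagram of Figure 2.3: w = f(x, y, t, s), x = phi(t, s), y = psi(t, s).
# An edge p -> q represents the partial derivative dp/dq.
DEPENDS_ON = {
    "w": ["x", "y", "t", "s"],
    "x": ["t", "s"],
    "y": ["t", "s"],
    "t": [],
    "s": [],
}

def paths(graph, start, target):
    """Return every path from start to target that follows the edges."""
    if start == target:
        return [[target]]
    result = []
    for nxt in graph[start]:
        for tail in paths(graph, nxt, target):
            result.append([start] + tail)
    return result

def chain_rule_terms(graph, start, target):
    """One product of partial derivatives per path, summed (rule iii)."""
    terms = []
    for path in paths(graph, start, target):
        factors = [f"d{p}/d{q}" for p, q in zip(path, path[1:])]
        terms.append("(" + ")(".join(factors) + ")")
    return " + ".join(terms)

print(chain_rule_terms(DEPENDS_ON, "w", "t"))
# (dw/dx)(dx/dt) + (dw/dy)(dy/dt) + (dw/dt)
```

The three printed terms correspond, in order, to $(\partial_1 f)(\partial_1\varphi)$, $(\partial_2 f)(\partial_1\psi)$, and $\partial_3 f$ in (2.35).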

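Here is another useful corollary of the chain rule. A function $f$ on $\mathbb{R}^n$ is called (positively) homogeneous of degree $\alpha$ ($\alpha \in \mathbb{R}$) if $f(t\mathbf{x}) = t^\alpha f(\mathbf{x})$ for all $t > 0$ and all $\mathbf{x} \neq \mathbf{0}$.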

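2.36 Theorem (Euler's Theorem). If $f$ is homogeneous of degree $\alpha$, then at any point $\mathbf{x}$ where $f$ is differentiable we have

$$x_1\partial_1 f(\mathbf{x}) + x_2\partial_2 f(\mathbf{x}) + \cdots + x_n\partial_n f(\mathbf{x}) = \alpha f(\mathbf{x}).$$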

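Proof. Consider the function $\varphi(t) = f(t\mathbf{x})$. On the one hand, since $f(t\mathbf{x}) = t^\alpha f(\mathbf{x})$, we have $\varphi'(t) = \alpha t^{\alpha-1}f(\mathbf{x}) = \alpha t^{-1}f(t\mathbf{x})$. On the other, by the chain rule we have

$$\varphi'(t) = \nabla f(t\mathbf{x})\cdot\frac{d}{dt}(t\mathbf{x}) = \mathbf{x}\cdot\nabla f(t\mathbf{x}).$$

Setting $t = 1$ and equating the two expressions for $\varphi'(1)$, we obtain the asserted result. □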
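To see Euler's theorem numerically, the sketch below uses a hypothetical function $f(x, y, z) = (x^2y + z^3)/|\mathbf{x}|$, which is homogeneous of degree $\alpha = 2$ away from the origin (a cubic divided by the norm), and compares $\sum_j x_j\,\partial_j f(\mathbf{x})$, with the partials approximated by central differences, against $\alpha f(\mathbf{x})$ at an arbitrary sample point.

```python
import math

ALPHA = 2  # degree of homogeneity of the illustrative f below

def f(x, y, z):
    """Hypothetical example: a cubic divided by the norm, homogeneous of degree 3 - 1 = 2."""
    return (x * x * y + z**3) / math.sqrt(x * x + y * y + z * z)

def euler_sum(p, h=1e-6):
    """x_1 d_1 f(p) + ... + x_n d_n f(p), each partial by a central difference."""
    total = 0.0
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        total += p[j] * (f(*plus) - f(*minus)) / (2 * h)
    return total

p = (0.7, -1.2, 1.5)
print(euler_sum(p), ALPHA * f(*p))
```

Scaling the point by $t = 3$ should likewise multiply $f$ by $3^\alpha = 9$, which is the homogeneity hypothesis itself.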

We conclude this section with an additional geometric insight into the meaning of the gradient of a function. If $F$ is a differentiable function of $(x, y, z) \in \mathbb{R}^3$, the locus of the equation $F(x, y, z) = 0$ is typically a smooth two-dimensional surface $S$ in $\mathbb{R}^3$. (We shall consider this matter more systematically in Chapter 3.) Suppose that $(x, y, z) = \mathbf{g}(t)$ is a parametric representation of a smooth curve on $S$. On the one hand, by the chain rule we have $(d/dt)F(\mathbf{g}(t)) = \nabla F(\mathbf{g}(t))\cdot\mathbf{g}'(t)$. On the other hand, since the curve lies on $S$, we have $F(\mathbf{g}(t)) = 0$ for all $t$ and hence $(d/dt)F(\mathbf{g}(t)) = 0$. Thus, for any curve on $S$, the gradient of $F$ is orthogonal to the tangent vector to the curve at each point on the curve. Since such curves can go in any direction on the surface, we conclude that at any point $\mathbf{a} \in S$, $\nabla F(\mathbf{a})$ is orthogonal to every vector that is tangent to $S$ at $\mathbf{a}$. (Of course, this is interesting only if $\nabla F(\mathbf{a}) \neq \mathbf{0}$.) We summarize:

2.37 Theorem. Suppose that $F$ is a differentiable function on some open set $U \subset \mathbb{R}^3$, and suppose that the set

$$S = \{(x, y, z) \in U : F(x, y, z) = 0\}$$

is a smooth surface. If $\mathbf{a} \in S$ and $\nabla F(\mathbf{a}) \neq \mathbf{0}$, then the vector $\nabla F(\mathbf{a})$ is perpendicular, or normal, to the surface $S$ at $\mathbf{a}$.

2.38 Corollary. Under the conditions of the theorem, the equation of the tangent plane to $S$ at $\mathbf{a}$ is $\nabla F(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a}) = 0$.

This formula for the tangent plane to a surface agrees with the one we gave in
§2.2 when the surface is the graph of a function f(x,y). The easy verification is
left to the reader (Exercise 5).
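For a concrete instance of Corollary 2.38 (a hypothetical example, not one of the exercises), take the sphere $F(x, y, z) = x^2 + y^2 + z^2 - 9 = 0$ and $\mathbf{a} = (1, 2, 2)$. The sketch writes the tangent plane $\nabla F(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a}) = 0$ in the form $\mathbf{n}\cdot\mathbf{x} = d$, then checks that $\nabla F(\mathbf{a})$ is orthogonal to the velocity of one curve on the sphere through $\mathbf{a}$, as in the argument preceding Theorem 2.37.

```python
import math

# Hypothetical surface for illustration: F(x, y, z) = x^2 + y^2 + z^2 - 9 = 0.
def grad_F(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def tangent_plane(a):
    """Return (n, d) so that the tangent plane at a is n . x = d (Corollary 2.38)."""
    n = grad_F(*a)
    d = sum(ni * ai for ni, ai in zip(n, a))
    return n, d

a = (1.0, 2.0, 2.0)
n, d = tangent_plane(a)
print(n, d)  # (2.0, 4.0, 4.0) 18.0, i.e. x + 2y + 2z = 9

# A curve on the sphere through a at t = pi/4:
# g(t) = (1, sqrt(8) cos t, sqrt(8) sin t), so g'(t) = (0, -sqrt(8) sin t, sqrt(8) cos t).
t0 = math.pi / 4
velocity = (0.0, -math.sqrt(8) * math.sin(t0), math.sqrt(8) * math.cos(t0))
dot = sum(ni * vi for ni, vi in zip(n, velocity))
print(dot)  # approximately 0
```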
A similar result holds if we have two equations $F(x, y, z) = 0$ and $G(x, y, z) = 0$. Each of them (usually) represents a surface, and the intersection of the two surfaces is (usually) a curve. At any point $\mathbf{a}$ on this curve, the vectors $\nabla F(\mathbf{a})$ and $\nabla G(\mathbf{a})$ are both perpendicular to the curve, and if they are linearly independent, they span the normal plane to the curve at $\mathbf{a}$.
These ideas carry over into dimensions other than 3. For $n = 2$, an equation $F(x, y) = 0$ typically represents a curve $C$, and $\nabla F(a, b)$ is normal to $C$ at each $(a, b) \in C$. For $n > 3$, we simply stretch our imagination to say that $\nabla F(\mathbf{a})$ is normal to the hypersurface defined by $F(\mathbf{x}) = 0$ at $\mathbf{x} = \mathbf{a}$.

EXERCISES

In these exercises, all functions in question are assumed to be differentiable.

1. Find the indicated derivatives of $w$ in terms of the derivatives of $f$, $g$, $h$.
a. $w = f(x, y, t)$, $x = g(y, t)$, $y = h(t)$. What is $dw/dt$?
b. $w = f(x, u, v)$, $u = g(x, y)$, $v = h(x, z)$. What are $\partial_x w$, $\partial_y w$, $\partial_z w$? ($\partial_x w$ refers to the complete dependence of $w$ on $x$, as opposed to $\partial_1 f$.)
c. $w = f(u)$, $u = g(x, y)$, $y = h(x)$. What is $dw/dx$?
2. Find $\partial_x w$ and $\partial_y w$ in terms of the partial derivatives $\partial_1 f$, $\partial_2 f$, and $\partial_3 f$.
a. $w = f(2x - y^2, x\sin 3y, x^4)$.
b. $w = f(e^{x-3y}, \log(x^2 + 1), \sqrt{y^4 + 4})$.
c. $w = \arctan[f(y^2, 2x - y, -4)]$.
3. Show that the given function $u$ satisfies the given differential equation.
a. $u = f(3x + 2y)$; $2\partial_x u - 3\partial_y u = 0$.
b. $u = xy + xf(y/x)$; $x\partial_x u + y\partial_y u - u = xy$.

c. $u = f(xz, yz)$; $x\partial_x u + y\partial_y u = z\partial_z u$.
4. Let $u = f(r)$ and $r = |\mathbf{x}| = (x_1^2 + \cdots + x_n^2)^{1/2}$. Show that $\sum_{j=1}^{n}(\partial u/\partial x_j)^2 = [f'(r)]^2$.
5. Show that the formula for the tangent plane to the surface $z = f(x, y)$ given in §2.2 coincides with the formula for the tangent plane to the surface $F(x, y, z) = 0$ given in this section, when $F(x, y, z) = f(x, y) - z$.
6. Find the tangent plane to the surface in $\mathbb{R}^3$ described by the given equation at the given point $\mathbf{a} \in \mathbb{R}^3$.
a. $z = x^2 - y^3$, $\mathbf{a} = (2, -1, 5)$.
b. $x^2 + 2y^2 + 3z^2 = 6$, $\mathbf{a} = (1, 1, -1)$.
c. $z = \sqrt{x} + \arctan y$, $\mathbf{a} = (9, 0, 3)$.
d. $xyz^2 - \log(z - 1) = 8$, $\mathbf{a} = (-2, -1, 2)$.
7. Suppose $\varphi(x)$ is defined by a formula in which $x$ occurs in several places. (For example, there are three $x$'s in $\varphi(x) = x^2e^x/(x + 3)$.) Show that the derivative $\varphi'(x)$ is obtained by differentiating with respect to each of the $x$'s in turn, treating the others as constants, and adding the results. (Hint: If $x$ occurs in $n$ places in the formula for $\varphi$, let $F(x_1, \ldots, x_n)$ be the function of $n$ variables obtained by replacing each of the $x$'s in the formula by a different variable. How do you express $\varphi$ in terms of $F$?) Notice that the rules for differentiating sums and products are special cases of this result, obtained by taking $\varphi(x) = f(x) + g(x)$ or $\varphi(x) = f(x)g(x)$. What is the derivative of

2.4 The Mean Value Theorem

The mean value theorem for functions of $n$ variables can be stated as follows. We recall that if $\mathbf{a}$ and $\mathbf{b}$ are two points in $\mathbb{R}^n$, the line passing through them can be described parametrically by $\mathbf{g}(t) = \mathbf{a} + t(\mathbf{b} - \mathbf{a})$. In particular, the line segment whose endpoints are $\mathbf{a}$ and $\mathbf{b}$ is the set of points $\mathbf{a} + t(\mathbf{b} - \mathbf{a})$ with $0 \le t \le 1$.

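2.39 Theorem (Mean Value Theorem III). Let $S$ be a region in $\mathbb{R}^n$ that contains the points $\mathbf{a}$ and $\mathbf{b}$ as well as the line segment $L$ that joins them. Suppose that $f$ is a function defined on $S$ that is continuous at each point of $L$ and differentiable at each point of $L$ except perhaps the endpoints $\mathbf{a}$ and $\mathbf{b}$. Then there is a point $\mathbf{c}$ on $L$ such that

$$f(\mathbf{b}) - f(\mathbf{a}) = \nabla f(\mathbf{c})\cdot(\mathbf{b} - \mathbf{a}).$$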

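Proof. Let $\mathbf{h} = \mathbf{b} - \mathbf{a}$; then $L = \{\mathbf{a} + t\mathbf{h} : 0 \le t \le 1\}$. Define $\varphi(t) = f(\mathbf{a} + t\mathbf{h})$ for $0 \le t \le 1$. Since $f$ is continuous on $L$, $\varphi$ is continuous on $[0, 1]$. Moreover, by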

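the chain rule, $\varphi$ is differentiable on $(0, 1)$ and

$$\varphi'(t) = \nabla f(\mathbf{a} + t\mathbf{h})\cdot\frac{d}{dt}(\mathbf{a} + t\mathbf{h}) = \nabla f(\mathbf{a} + t\mathbf{h})\cdot\mathbf{h} = \nabla f(\mathbf{a} + t\mathbf{h})\cdot(\mathbf{b} - \mathbf{a}).$$

By the one-variable mean value theorem, there is a point $u \in (0, 1)$ such that $\varphi(1) - \varphi(0) = \varphi'(u)\cdot(1 - 0) = \varphi'(u)$. Let $\mathbf{c} = \mathbf{a} + u\mathbf{h}$; then

$$f(\mathbf{b}) - f(\mathbf{a}) = \varphi(1) - \varphi(0) = \varphi'(u) = \nabla f(\mathbf{c})\cdot(\mathbf{b} - \mathbf{a}). \qquad \square$$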
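The proof reduces everything to the one-variable function $\varphi(t) = f(\mathbf{a} + t\mathbf{h})$, which suggests a small experiment: search numerically for $u$ with $\varphi'(u) = \varphi(1) - \varphi(0)$. The sketch below does this by bisection for the hypothetical choice $f(x, y) = x^2 + y^2$, $\mathbf{a} = (0, 0)$, $\mathbf{b} = (1, 2)$. Bisection is an assumption of this sketch, not part of the theorem: it only works when $\varphi'(u) - (\varphi(1) - \varphi(0))$ changes sign on $[0, 1]$.

```python
# Hypothetical f and endpoints, chosen only to illustrate Theorem 2.39.
def f(x, y):
    return x * x + y * y

def grad_f(x, y):
    return (2 * x, 2 * y)

def mean_value_point(a, b, tol=1e-12):
    """Find u in (0, 1) with phi'(u) = phi(1) - phi(0), where phi(t) = f(a + t(b - a)).

    Bisection needs g(u) = phi'(u) - (phi(1) - phi(0)) to change sign on [0, 1];
    that holds here because phi is convex, but it is not guaranteed in general.
    """
    h = [bj - aj for aj, bj in zip(a, b)]

    def point(t):
        return [aj + t * hj for aj, hj in zip(a, h)]

    def g(t):
        # phi'(t) = grad f(a + t h) . h by the chain rule, minus the target increment.
        slope = sum(gj * hj for gj, hj in zip(grad_f(*point(t)), h))
        return slope - (f(*b) - f(*a))

    lo, hi = 0.0, 1.0
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    u = (lo + hi) / 2
    return u, point(u)

u, c = mean_value_point((0.0, 0.0), (1.0, 2.0))
print(u, c)
```

Here $\varphi(t) = 5t^2$, so $\varphi(1) - \varphi(0) = 5 = \varphi'(u) = 10u$ gives $u = 1/2$ and $\mathbf{c} = (1/2, 1)$, which the search should reproduce.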

To state the principal corollaries of the mean value theorem, we need a definition. A set $S \subset \mathbb{R}^n$ is called convex if whenever $\mathbf{a}, \mathbf{b} \in S$, the line segment from $\mathbf{a}$ to $\mathbf{b}$ also lies in $S$. Clearly every convex set is arcwise connected (line segments are arcs!), but most connected sets are not convex. See Figure 2.4.

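EXAMPLE 1. Every ball is convex. Indeed, let $B = \{\mathbf{x} : |\mathbf{x} - \mathbf{c}| < r\}$ be the ball of radius $r$ about $\mathbf{c}$. If $\mathbf{a}, \mathbf{b} \in B$, for $0 \le t \le 1$ we have

$$\begin{aligned} |[\mathbf{a} + t(\mathbf{b} - \mathbf{a})] - \mathbf{c}| &= |(1 - t)(\mathbf{a} - \mathbf{c}) + t(\mathbf{b} - \mathbf{c})| \\ &\le (1 - t)|\mathbf{a} - \mathbf{c}| + t|\mathbf{b} - \mathbf{c}| < (1 - t)r + tr = r, \end{aligned}$$

so $\mathbf{a} + t(\mathbf{b} - \mathbf{a}) \in B$. (We have used the fact that $t$ and $1 - t$ are both nonnegative when $0 \le t \le 1$.)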

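2.40 Corollary. Suppose that $f$ is differentiable on an open convex set $S$ and $|\nabla f(\mathbf{x})| \le M$ for every $\mathbf{x} \in S$. Then $|f(\mathbf{b}) - f(\mathbf{a})| \le M|\mathbf{b} - \mathbf{a}|$ for all $\mathbf{a}, \mathbf{b} \in S$.

Proof. The line segment from $\mathbf{a}$ to $\mathbf{b}$ lies in $S$, and for some $\mathbf{c}$ on this segment we have $f(\mathbf{b}) - f(\mathbf{a}) = \nabla f(\mathbf{c})\cdot(\mathbf{b} - \mathbf{a})$. Hence, by Cauchy's inequality, $|f(\mathbf{b}) - f(\mathbf{a})| \le |\nabla f(\mathbf{c})|\,|\mathbf{b} - \mathbf{a}| \le M|\mathbf{b} - \mathbf{a}|$. □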

2.41 Corollary. Suppose $f$ is differentiable on an open convex set $S$ and $\nabla f(\mathbf{x}) = \mathbf{0}$ for all $\mathbf{x} \in S$. Then $f$ is constant on $S$.

Proof. Pick $\mathbf{a} \in S$ and take $M = 0$ in Corollary 2.40. We conclude that for every $\mathbf{b} \in S$, $|f(\mathbf{b}) - f(\mathbf{a})| = 0$, that is, $f(\mathbf{b}) = f(\mathbf{a})$. □

The hypothesis of convexity is essential in Corollary 2.40. In a situation like that of the set $S_2$ in Figure 2.4, $|\mathbf{b} - \mathbf{a}|$ is small, but $f(\mathbf{b}) - f(\mathbf{a})$ could be quite large even when $|\nabla f|$ is small in $S_2$. (Think of a gently sloping spiral ramp.) However, Corollary 2.41 can be generalized substantially.

Figure 2.4: A convex set ($S_1$), a set that is connected but not convex ($S_2$), and a disconnected set ($S_3$).

2.42 Theorem. Suppose that $f$ is differentiable on an open connected set $S$ and $\nabla f(\mathbf{x}) = \mathbf{0}$ for all $\mathbf{x} \in S$. Then $f$ is constant on $S$.

Proof. Pick $\mathbf{a} \in S$, and define $S_1 = \{\mathbf{x} \in S : f(\mathbf{x}) = f(\mathbf{a})\}$ and $S_2 = \{\mathbf{x} \in S : f(\mathbf{x}) \neq f(\mathbf{a})\}$. We shall show that $S_2$ must be empty, and hence that $f$ is constant on $S = S_1$, by showing that otherwise $(S_1, S_2)$ would be a disconnection of $S$.
Clearly $S_1$ and $S_2$ are disjoint and their union is $S$; moreover, $\mathbf{a} \in S_1$. The set $S_2$ is open (by Theorem 1.13) because the complement of the point $f(\mathbf{a})$ is an open subset of $\mathbb{R}$. The set $S_1$ is also open, for the following reason. Suppose $\mathbf{x} \in S_1$. Since $S$ is open, there is a ball $B$ centered at $\mathbf{x}$ that is contained in $S$. Since $B$ is convex, $f$ is constant on $B$ by Corollary 2.41, and hence $B \subset S_1$. That is, every $\mathbf{x} \in S_1$ is an interior point of $S_1$, so $S_1$ is open. Since both $S_1$ and $S_2$ are open, neither one can intersect the closure of the other one without intersecting the other one itself. But clearly $S_1$ and $S_2$ are disjoint, their union is $S$, and $S_1$ is nonempty since it contains $\mathbf{a}$. Therefore, $(S_1, S_2)$ is a disconnection of $S$ unless $S_2$ is empty. □

The hypothesis of connectedness is necessary here. If $S = S' \cup S''$ where $S'$ and $S''$ are open and disjoint, we obtain a counterexample by taking $f(\mathbf{x}) = 0$ for $\mathbf{x} \in S'$ and $f(\mathbf{x}) = 1$ for $\mathbf{x} \in S''$. (See Figure 2.4. Differentiability of a function $f$ on the set $S_3$ there affords no control over the relation between the values of $f$ at $\mathbf{a}$ and $\mathbf{b}$.)

EXERCISES

1. State and prove two analogues of Rolle's theorem for functions of several variables, whose hypotheses are, respectively, the following:

a. $f$ is differentiable on a set containing the line segment from $\mathbf{a}$ to $\mathbf{b}$, and $f(\mathbf{a}) = f(\mathbf{b})$.
b. $f$ is differentiable on a bounded open set $S$, continuous on the closure of $S$, and constant on the boundary of $S$.

2. Question: If $f$ is differentiable on a connected open set $S$ and $\partial_1 f(\mathbf{x}) = 0$ for all $\mathbf{x} \in S$, must $f$ be independent of $x_1$ on $S$ (that is, $f(\mathbf{a}) = f(\mathbf{b})$ whenever $\mathbf{a}, \mathbf{b} \in S$ and $a_j = b_j$ for all $j \neq 1$)?
a. Show that the answer is yes when $S$ is convex.
b. Give a counterexample to show that the answer is no in general. (Hint: Think of a staircase where you go halfway up on one flight, make a 180° turn on a flat landing, then go the rest of the way up on a second flight parallel to the first one.)

2.5 Functional Relations and Implicit Functions: A First Look

Often we are presented with an equation $F(x_1, \ldots, x_n) = 0$ relating a collection of variables $x_1, \ldots, x_n$. (There is no harm in taking the right side to be 0; just move everything over to the left side of the equation.) It may be possible to solve this equation for one of the variables in terms of the remaining ones, say $x_n = g(x_1, \ldots, x_{n-1})$, and we wish to study the resulting function $g$ in terms of the original function $F$.
To make things clearer, let us change the notation a little, replacing $n$ by $n + 1$ and denoting the last variable $x_{n+1}$ by $y$; thus, the given equation has the form $F(x_1, \ldots, x_n, y) = 0$, and it is supposed to determine $y$ as a function of $\mathbf{x} = (x_1, \ldots, x_n)$.

Let us be clear about what we mean by saying that "it is possible to solve for $y$." First, we mean that it is possible to solve in principle, not necessarily that there is an explicit formula for $y$. Second, there might be more than one solution, and obtaining $y$ as a function of the $x_j$'s then involves making a definite choice among the solutions; moreover, the domain of this function may be smaller than one would suspect from the original equation.

Example 1.
a. Consider the equation $x - y - y^5 = 0$. It's easy to solve this for $x$ in terms of $y$, $x = y + y^5$, but there is no nice algebraic formula for $y$ in terms of $x$. However, $y + y^5$ is a strictly increasing function of $y$ (its derivative is $1 + 5y^4$, which is positive everywhere), and its values clearly range from $-\infty$ to $\infty$, so for each $x$ there is exactly one $y$ satisfying $x = y + y^5$, and we can call it $g(x)$. The object in such a situation is to use the equation $x = y + y^5$ to study the function $g$.
b. The equation $x^2 + y^2 + z^2 = 1$ can be solved for $z$ as a continuous function of $x$ and $y$ in two ways, $z = \sqrt{1 - x^2 - y^2}$ and $z = -\sqrt{1 - x^2 - y^2}$, both of which are defined only for $x^2 + y^2 \le 1$.

At this stage we are not going to worry about these matters, or about the question of when it is possible to solve the equation at all; such questions will be addressed in Chapter 3. Rather, we shall assume that there is a differentiable function $g(x_1, \ldots, x_n)$, defined for $x_1, \ldots, x_n$ in some region $S \subset \mathbb{R}^n$, so that the equation $F(x_1, \ldots, x_n, y) = 0$ is satisfied identically when $g(x_1, \ldots, x_n)$ is substituted for $y$:

$$F(x_1, \ldots, x_n, g(x_1, \ldots, x_n)) = 0, \qquad (x_1, \ldots, x_n) \in S. \qquad (2.43)$$

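In this situation we can use the chain rule to compute the partial derivatives of $g$ in terms of the partial derivatives of $F$, simply by differentiating the equation (2.43) with respect to the variables $x_j$:

$$\partial_j F + \partial_{n+1}F\,\frac{\partial g}{\partial x_j} = 0, \qquad\text{so}\qquad \frac{\partial g}{\partial x_j} = -\frac{\partial_j F}{\partial_{n+1}F}. \qquad (2.44)$$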

Example 1 (continued).
a. Differentiation of the equation $x - y - y^5 = 0$ with respect to $x$ yields $1 - (dy/dx) - 5y^4(dy/dx) = 0$, or $dy/dx = 1/(1 + 5y^4)$. Of course, this gives $dy/dx$ in terms of $y$ instead of $x$, and we don't have a formula for $y$ in terms of $x$, but this is better than nothing!
b. Differentiation of $x^2 + y^2 + z^2 = 1$ with respect to $x$, with $z$ as the dependent variable, gives $2x + 2z(\partial z/\partial x) = 0$, or $\partial z/\partial x = -x/z$. It is easily verified that this formula is correct whether we take $z = \sqrt{1 - x^2 - y^2}$ or $z = -\sqrt{1 - x^2 - y^2}$.
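The formula in part a can be checked without any formula for $g$. The sketch below computes $g(x)$ by bisection (a deliberately simple root finder chosen for this illustration; it works because $y + y^5$ is strictly increasing and the root satisfies $|y| \le |x|$), then compares $1/(1 + 5y^4)$ with a difference quotient of $g$. At $x = 2$ the root is $y = 1$, so both values should be close to $1/6$.

```python
def g(x, iterations=200):
    """Solve y + y**5 = x for y by bisection.

    The left side is strictly increasing in y and |y| <= |x| at the root,
    so the interval [-|x| - 1, |x| + 1] always brackets it.
    """
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid + mid**5 < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dg_dx_implicit(x):
    """Implicit-differentiation result of Example 1a: dy/dx = 1 / (1 + 5y^4)."""
    y = g(x)
    return 1 / (1 + 5 * y**4)

x0, h = 2.0, 1e-5
numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
print(g(x0), dg_dx_implicit(x0), numeric)
```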

In a related situation, we may wish to differentiate a function $\varphi(x_1, \ldots, x_n, y)$ where the variables $x_1, \ldots, x_n, y$ satisfy a relation $F(x_1, \ldots, x_n, y) = 0$. Assuming, as before, that the equation $F(x_1, \ldots, x_n, y) = 0$ can be solved for $y$, say $y = g(x_1, \ldots, x_n)$, it then becomes a matter of applying the chain rule to the composite function

$$w = \varphi(x_1, \ldots, x_n, g(x_1, \ldots, x_n))$$

to obtain

$$\frac{\partial w}{\partial x_j} = \partial_j\varphi + (\partial_{n+1}\varphi)(\partial_j g).$$

The derivatives $\partial_j g$ can then be evaluated by using (2.44).

In such a situation, however, there is a tricky point that must be confronted. Let us explain it in the case of three variables for simplicity. That is, suppose we are given $w = \varphi(x, y, z)$ where $x, y, z$ are constrained to satisfy $F(x, y, z) = 0$, and suppose we can solve the latter equation for any one of the three variables in terms of the other two. If we take $x$ as an independent variable, the meaning of $\partial w/\partial x$ depends critically on whether we take $y$ or $z$ as the other independent variable.

Example 2. Let $w = x^2 + y^2 + z$, and suppose $x, y, z$ are constrained to satisfy $x + y + z = 0$. If we take $x$ and $y$ as independent variables, then $z = -(x + y)$, so

$$w = x^2 + y^2 - x - y, \qquad \frac{\partial w}{\partial x} = 2x - 1.$$

But if we take $x$ and $z$ as independent variables, then $y = -(x + z)$, and

$$w = x^2 + (x + z)^2 + z = 2x^2 + 2xz + z^2 + z, \qquad \frac{\partial w}{\partial x} = 4x + 2z.$$

Clearly, these two formulas for $\partial w/\partial x$ almost never agree (they coincide only where $2x + 2z + 1 = 0$).

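The usual way to clarify this situation is to put subscripts on the partial derivatives to indicate which variables are being held fixed:

$$\left(\frac{\partial w}{\partial x}\right)_y = \text{derivative of } w \text{ with respect to } x \text{ when } y \text{ is fixed.}$$

Thus, in Example 2,

$$\left(\frac{\partial w}{\partial x}\right)_y = 2x - 1, \qquad \left(\frac{\partial w}{\partial x}\right)_z = 4x + 2z.$$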
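The two meanings of $\partial w/\partial x$ in Example 2 can be made concrete by differentiating numerically along two different directions within the constraint plane. The sketch below (function names are illustrative) holds $y$ fixed in one case and $z$ fixed in the other, at the constrained point $(0.3, 0.4, -0.7)$, where the two answers differ.

```python
# Example 2: w = x^2 + y^2 + z on the plane x + y + z = 0.
def w(x, y, z):
    return x * x + y * y + z

def dw_dx_holding_y(x, y, h=1e-6):
    """(dw/dx)_y: vary x with y fixed, letting z = -(x + y) keep the constraint."""
    return (w(x + h, y, -(x + h + y)) - w(x - h, y, -(x - h + y))) / (2 * h)

def dw_dx_holding_z(x, z, h=1e-6):
    """(dw/dx)_z: vary x with z fixed, letting y = -(x + z) keep the constraint."""
    return (w(x + h, -(x + h + z), z) - w(x - h, -(x - h + z), z)) / (2 * h)

x, y, z = 0.3, 0.4, -0.7  # a point with x + y + z = 0
print(dw_dx_holding_y(x, y), 2 * x - 1)      # both approximately -0.4
print(dw_dx_holding_z(x, z), 4 * x + 2 * z)  # both approximately -0.2
```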

The preceding ideas work in much the same way when we are given more than one constraint equation. For example, if we are given two equations $F(x, y, u, v) = 0$ and $G(x, y, u, v) = 0$, we may be able to solve them for the two variables $u$ and $v$ in terms of the other two variables $x$ and $y$. In this case the partial derivatives of $u$ and $v$ with respect to $x$, say, can be calculated by differentiating the equations $F = 0$ and $G = 0$, obtaining

$$\partial_x F + \partial_u F\,\frac{\partial u}{\partial x} + \partial_v F\,\frac{\partial v}{\partial x} = 0, \qquad \partial_x G + \partial_u G\,\frac{\partial u}{\partial x} + \partial_v G\,\frac{\partial v}{\partial x} = 0,$$

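and then solving these (linear!) equations simultaneously for $\partial u/\partial x$ and $\partial v/\partial x$. By Cramer's rule (Appendix A, (A.54)), the result is

$$\frac{\partial u}{\partial x} = -\frac{\det\begin{pmatrix}\partial_x F & \partial_v F \\ \partial_x G & \partial_v G\end{pmatrix}}{\det\begin{pmatrix}\partial_u F & \partial_v F \\ \partial_u G & \partial_v G\end{pmatrix}}, \qquad \frac{\partial v}{\partial x} = -\frac{\det\begin{pmatrix}\partial_u F & \partial_x F \\ \partial_u G & \partial_x G\end{pmatrix}}{\det\begin{pmatrix}\partial_u F & \partial_v F \\ \partial_u G & \partial_v G\end{pmatrix}}.$$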

Example 3. Suppose the quantities $x$, $y$, and $z$ are initially equal to 1, 0, and 2, respectively, and are constrained to satisfy the equations $x^5 + x(y^3 + 1)z - 2yz^5 = 3$ and $yz = \sin(2x + y - z)$. By about how much do $y$ and $z$ change if $x$ is changed to 1.02?
Solution. We need to find $dy/dx$ and $dz/dx$, which we abbreviate as $y'$ and $z'$. Differentiating the two equations with respect to $x$, treating $y$ and $z$ as implicit functions of $x$, we obtain

$$5x^4 + (y^3 + 1)z + 3xy^2z\,y' + x(y^3 + 1)\,z' - 2z^5\,y' - 10yz^4\,z' = 0,$$

$$z\,y' + y\,z' = \cos(2x + y - z)\cdot(2 + y' - z').$$

We could solve these equations for $y'$ and $z'$ as they stand, but since we are interested in the answer at $(x, y, z) = (1, 0, 2)$, we can simplify matters by substituting in these values right now. The first equation reduces to $7 + z' - 64y' = 0$ and the second one to $2y' = 2 + y' - z'$, or

$$64y' - z' = 7, \qquad y' + z' = 2 \qquad (\text{when } (x, y, z) = (1, 0, 2)).$$

Solving these equations yields $y' = \frac{9}{65}$ and $z' = \frac{121}{65}$, so (returning to the original question) $dy = y'\,dx = \frac{9}{65}(.02) = \frac{9}{3250}$ and $dz = z'\,dx = \frac{121}{65}(.02) = \frac{121}{3250}$.
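The linear estimates of Example 3 can be compared with the actual solution of the nonlinear system. The sketch below (helper names are illustrative) solves for $(y, z)$ at a given $x$ by Newton's method, starting from $(0, 2)$, and performs each $2\times 2$ Newton step with Cramer's rule, the same rule used above. Central differences of the solution at $x = 1$ then recover $y' = 9/65$ and $z' = 121/65$.

```python
import math

def residuals(x, y, z):
    """Left minus right sides of the two constraints in Example 3."""
    r1 = x**5 + x * (y**3 + 1) * z - 2 * y * z**5 - 3
    r2 = y * z - math.sin(2 * x + y - z)
    return r1, r2

def jacobian_yz(x, y, z):
    """Partials of (r1, r2) with respect to the unknowns y and z."""
    c = math.cos(2 * x + y - z)
    a = 3 * x * y**2 * z - 2 * z**5     # d r1 / dy
    b = x * (y**3 + 1) - 10 * y * z**4  # d r1 / dz
    p = z - c                           # d r2 / dy
    q = y + c                           # d r2 / dz
    return a, b, p, q

def solve_yz(x, y=0.0, z=2.0, steps=20):
    """Newton's method for (y, z) at a fixed x; each 2x2 step uses Cramer's rule."""
    for _ in range(steps):
        r1, r2 = residuals(x, y, z)
        a, b, p, q = jacobian_yz(x, y, z)
        det = a * q - b * p
        y += (-r1 * q + b * r2) / det
        z += (-a * r2 + p * r1) / det
    return y, z

# Actual changes at x = 1.02 next to the linear estimates (9/65)(.02) and (121/65)(.02).
y_new, z_new = solve_yz(1.02)
print(9 / 65 * 0.02, y_new - 0.0)
print(121 / 65 * 0.02, z_new - 2.0)

# Central differences recover y' = 9/65 and z' = 121/65 at x = 1.
h = 1e-5
y_plus, z_plus = solve_yz(1 + h)
y_minus, z_minus = solve_yz(1 - h)
dydx = (y_plus - y_minus) / (2 * h)
dzdx = (z_plus - z_minus) / (2 * h)
print(dydx, 9 / 65, dzdx, 121 / 65)
```

The actual and linear changes should be close but not identical, since the differential ignores second-order effects in $dx$.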

EXERCISES

1. Compute $\partial z/\partial x$ and $\partial z/\partial y$ when $z$ is determined as a function of $x$ and $y$ by the following equations:
a. $x + y^2 + z^3 = 3xyz$.
b. $2x^2 + 3y^2 + z^2 = e^{-z}$.
2. Suppose $y$ and $z$ are determined as functions of $x$ by the equations $z = x^2 - y^2$ and $z = 2x + 4y$. Find $dy/dx$ and $dz/dx$ (a) by solving the equations explicitly for $y$ and $z$; (b) by implicit differentiation.

3. Compute $dy/dt$ and $dz/dt$ when $y$ and $z$ are determined as functions of $t$ by the equations $y^5 + e^{yz} + zt^2 = 1$ and $y^2 + z^4 = t^2$.
4. If $u = x^2 + 3y^2$ and $y = xz$, there are two possible meanings for $\partial u/\partial x$ depending on whether the independent variables are taken as $(x, y)$ or $(x, z)$. Compute both of them.

5. Let $V = \pi r^2h$ and $S = 2\pi r(r + h)$ (the volume and surface area of a circular cylinder). Compute

8V_ dV_ dV_ 8S

dh dh as dv

where the subscript indicates the variable that is being held fixed.

6. Suppose that $F(x, y, z) = 0$ is an equation that can be solved to yield any of the three variables as a function of the other two. Show that

$$\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial x} = -1,$$

provided that the symbols are interpreted properly. (Part of the problem is to say what the proper interpretation is.)

7. Suppose that the variables $E$, $T$, $V$, and $P$ are related by a pair of equations, $f(E, T, V, P) = 0$ and $g(E, T, V, P) = 0$, that can be solved for any two of the variables in terms of the other two, and suppose that the differential equation $\partial_V E - T\partial_T P + P = 0$ is satisfied when $V$ and $T$ are taken as the independent variables. Show that $\partial_P E + T\partial_T V + P\partial_P V = 0$ when $P$ and $T$ are taken as the independent variables. (This example comes from thermodynamics, where $E$, $T$, $V$, and $P$ represent energy, temperature, volume, and pressure.)

2.6 Higher-Order Partial Derivatives

If $f$ is a differentiable function on an open set $S \subset \mathbb{R}^n$, its partial derivatives $\partial_j f$ are also functions on $S$, and they themselves may have partial derivatives. The standard notations for the second-order derivative

$$\frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)$$

are

$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}, \qquad f_{x_j x_i}, \qquad f_{ji}, \qquad \partial_i\partial_j f$$

if $i \neq j$, and

$$\frac{\partial^2 f}{\partial x_j^2}, \qquad f_{x_j x_j}, \qquad f_{jj}, \qquad \partial_j^2 f$$

if $i = j$. The analogues of these notations for higher-order partial derivatives should be pretty clear. However, all of them become quite cumbersome when the order of the derivative is even moderately large. There is a more compact notation for partial derivatives of arbitrary order that we shall introduce below.
A function $f$ is said to be of class $C^k$ on an open set $U$ if all of its partial derivatives of order $\le k$ (that is, all the derivatives $\partial_{i_1}\partial_{i_2}\cdots\partial_{i_l} f$, for all choices of the indices $i_j$ and all $l \le k$) exist and are continuous on $U$. We also say that $f$ is of class $C^k$ on a nonopen set $S$ if it is of class $C^k$ on some open set that includes $S$. If the partial derivatives of $f$ of all orders exist and are continuous on $U$, $f$ is said to be of class $C^\infty$ on $U$.
It is common to refer to the derivatives $\partial_j^2 f$ and $\partial_i\partial_j f$ ($i \neq j$) as pure and mixed second-order partial derivatives of $f$, respectively. In this connection, a question that immediately arises is whether the order of differentiation matters. In other words, is $\partial_i\partial_j f$ the same as $\partial_j\partial_i f$? Experimentation with elementary examples suggests that the answer is yes.

EXAMPLE 1. If $g(x, y) = x\sin(x^3 + e^{2y})$, we have

$$\partial_x g = \sin(x^3 + e^{2y}) + 3x^3\cos(x^3 + e^{2y}), \qquad \partial_y g = 2xe^{2y}\cos(x^3 + e^{2y}).$$

Differentiating $\partial_x g$ with respect to $y$ and $\partial_y g$ with respect to $x$ yields

$$\partial_y\partial_x g(x, y) = 2e^{2y}\cos(x^3 + e^{2y}) - 6x^3e^{2y}\sin(x^3 + e^{2y}) = \partial_x\partial_y g(x, y).$$

However, the following example shows that $\partial_i\partial_j f$ may fail to coincide with $\partial_j\partial_i f$.

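Example 2. Let

$$f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2} \quad\text{if } (x, y) \neq (0, 0), \qquad f(0, 0) = 0.$$

Since $f(x, 0) = f(0, y) = 0$ for all $x, y$, we have $\partial_x f(0, 0) = \partial_y f(0, 0) = 0$, and a little calculation shows that for $(x, y) \neq (0, 0)$,

$$\partial_x f(x, y) = \frac{x^4y + 4x^2y^3 - y^5}{(x^2 + y^2)^2}, \qquad \partial_y f(x, y) = \frac{x^5 - 4x^3y^2 - xy^4}{(x^2 + y^2)^2}.$$

In particular, $\partial_x f(0, y) = -y$ and $\partial_y f(x, 0) = x$ for all $x, y$, so

$$\partial_y\partial_x f(0, 0) = -1 \quad\text{but}\quad \partial_x\partial_y f(0, 0) = 1.$$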


On the other hand, another little calculation shows that

$$\partial_y\partial_x f(x, y) = \partial_x\partial_y f(x, y) = \frac{x^6 + 9x^4y^2 - 9x^2y^4 - y^6}{(x^2 + y^2)^3} \quad\text{for } (x, y) \neq (0, 0).$$

This last expression has no limit as $(x, y) \to (0, 0)$ (approaching $(0, 0)$ along different straight lines gives different limits). Thus, we see that $\partial_y\partial_x f$ and $\partial_x\partial_y f$ exist everywhere, are continuous except at the origin, and are equal except at the origin.
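Because the first-order partials of Example 2 are available in closed form, the mixed partials at the origin can be approximated by one-sided difference quotients along the axes. The sketch below (names are illustrative) reproduces the values $-1$ and $1$, and also checks the formula for $\partial_x f$ against a difference quotient of $f$ itself.

```python
def f(x, y):
    """Example 2: xy(x^2 - y^2)/(x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y):
    """d_x f from the formula in the text; d_x f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return (x**4 * y + 4 * x**2 * y**3 - y**5) / (x * x + y * y) ** 2

def f_y(x, y):
    """d_y f from the formula in the text; d_y f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return (x**5 - 4 * x**3 * y**2 - x * y**4) / (x * x + y * y) ** 2

k = 1e-3
d_y_d_x = (f_x(0.0, k) - f_x(0.0, 0.0)) / k  # approximates d_y d_x f(0, 0)
d_x_d_y = (f_y(k, 0.0) - f_y(0.0, 0.0)) / k  # approximates d_x d_y f(0, 0)
print(d_y_d_x, d_x_d_y)  # approximately -1 and 1

# Sanity check of the f_x formula against a difference quotient of f itself.
h = 1e-7
print((f(h, k) - f(-h, k)) / (2 * h), f_x(0.0, k))
```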

Fortunately, the pathological behavior in Example 2 is quite atypical. The following theorem guarantees that the order of differentiation is immaterial in most situations that arise in practice.

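2.45 Theorem. Let $f$ be a function defined in an open set $S \subset \mathbb{R}^n$. Suppose $\mathbf{a} \in S$ and $i, j \in \{1, \ldots, n\}$. If the derivatives $\partial_i f$, $\partial_j f$, $\partial_i\partial_j f$, and $\partial_j\partial_i f$ exist in $S$, and if $\partial_i\partial_j f$ and $\partial_j\partial_i f$ are continuous at $\mathbf{a}$, then $\partial_i\partial_j f(\mathbf{a}) = \partial_j\partial_i f(\mathbf{a})$.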

Proof. Since only the variables $x_i$ and $x_j$ are actually involved here, we may as well assume that $n = 2$ and write $\mathbf{x} = (x, y)$ and $\mathbf{a} = (a, b)$, so that we are studying the derivatives $\partial_x\partial_y f$ and $\partial_y\partial_x f$. These derivatives can be regarded as limits of second-order difference quotients, so we begin by examining the "difference of differences" obtained when $x$ and $y$ are both changed by an amount $h$:

$$\begin{aligned} D &= [f(a + h, b + h) - f(a + h, b)] - [f(a, b + h) - f(a, b)] \\ &= [f(a + h, b + h) - f(a, b + h)] - [f(a + h, b) - f(a, b)]. \end{aligned}$$

That is, if we set

$$\varphi(t) = f(a + h, b + t) - f(a, b + t), \qquad \psi(t) = f(a + t, b + h) - f(a + t, b),$$

we have

$$D = \varphi(h) - \varphi(0) = \psi(h) - \psi(0).$$

We apply the (one-variable) mean value theorem twice to the first expression for $D$, obtaining

$$D = \varphi'(v)h = [\partial_y f(a + h, b + v) - \partial_y f(a, b + v)]h = \partial_x\partial_y f(a + u, b + v)h^2,$$

where $u$ and $v$ are some numbers between 0 and $h$. Likewise, using the second expression for $D$, we obtain

$$D = \psi'(\tilde u)h = [\partial_x f(a + \tilde u, b + h) - \partial_x f(a + \tilde u, b)]h = \partial_y\partial_x f(a + \tilde u, b + \tilde v)h^2,$$

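where $\tilde u$ and $\tilde v$ are some other numbers between 0 and $h$. Equating these two expressions and cancelling the $h^2$, we have

$$\partial_x\partial_y f(a + u, b + v) = \partial_y\partial_x f(a + \tilde u, b + \tilde v).$$

Now let $h \to 0$. Then $u, v, \tilde u, \tilde v \to 0$ also, so since $\partial_x\partial_y f$ and $\partial_y\partial_x f$ are assumed continuous at $(a, b)$, we obtain $\partial_x\partial_y f(a, b) = \partial_y\partial_x f(a, b)$. □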

2.46 Corollary. If $f$ is of class $C^2$ on an open set $S$, then $\partial_i\partial_j f = \partial_j\partial_i f$ on $S$ for all $i$ and $j$.

Once this is known, an elementary but slightly messy inductive argument shows that the analogous result for higher-order derivatives is also true:

2.47 Theorem. If $f$ is of class $C^k$ on an open set $S$, then

$$\partial_{i_1}\partial_{i_2}\cdots\partial_{i_k} f = \partial_{j_1}\partial_{j_2}\cdots\partial_{j_k} f \quad\text{on } S$$

whenever the sequence $\{j_1, \ldots, j_k\}$ is a reordering of the sequence $\{i_1, \ldots, i_k\}$.

The fact that the order of differentiation in a mixed partial derivative can occasionally matter is a technicality that is of essentially no importance in applications. In fact, by adopting a more sophisticated viewpoint one can prove a theorem to the effect that, under very general conditions, $\partial_i\partial_j f$ and $\partial_j\partial_i f$ are always equal "almost everywhere," which is enough to allow regarding them as equal for all practical purposes.
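The chain rule can be used to compute higher-order partial derivatives of composite functions, but there are some pitfalls to be avoided. To be concrete, suppose that $w = f(x, y)$ and that $x$ and $y$ are functions of $s$ and $t$. Assume that all the functions in question are at least of class $C^2$. To begin with, the chain rule for first-order derivatives gives

$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}. \qquad (2.48)$$

If we want to compute $\partial^2 w/\partial s^2$, we differentiate (2.48) with respect to $s$, obtaining

$$\frac{\partial^2 w}{\partial s^2} = \frac{\partial}{\partial s}\left[\frac{\partial w}{\partial x}\right]\frac{\partial x}{\partial s} + \frac{\partial w}{\partial x}\frac{\partial^2 x}{\partial s^2} + \frac{\partial}{\partial s}\left[\frac{\partial w}{\partial y}\right]\frac{\partial y}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial^2 y}{\partial s^2}. \qquad (2.49)$$

The first pitfall is to write $\dfrac{\partial}{\partial s}\left[\dfrac{\partial w}{\partial x}\right]$ as a mixed partial derivative $\dfrac{\partial^2 w}{\partial s\,\partial x}$. This makes no sense because when we write $\partial w/\partial x$ we are thinking of $w$ as a function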

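of $x$ and $y$, not $x$ and $s$. Rather, $\partial w/\partial x$ is a function of $x$ and $y$ just like $w$, and to differentiate it with respect to $s$ we use the chain rule again; and likewise for $\partial w/\partial y$:

$$\frac{\partial}{\partial s}\left[\frac{\partial w}{\partial x}\right] = \frac{\partial^2 w}{\partial x^2}\frac{\partial x}{\partial s} + \frac{\partial^2 w}{\partial x\,\partial y}\frac{\partial y}{\partial s}, \qquad \frac{\partial}{\partial s}\left[\frac{\partial w}{\partial y}\right] = \frac{\partial^2 w}{\partial x\,\partial y}\frac{\partial x}{\partial s} + \frac{\partial^2 w}{\partial y^2}\frac{\partial y}{\partial s}. \qquad (2.50)$$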

Now we plug these results into (2.49) to get the final answer, which thus contains
quite a few terms. Pitfall number 2: It's easy to forget some of these terms.
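In this situation it's usually advantageous to use the notation $f_x$ and $f_y$ instead of $\partial w/\partial x$ and $\partial w/\partial y$, and likewise for second-order derivatives. This makes (2.48)-(2.50) look a little more manageable:

$$\frac{\partial w}{\partial s} = f_x\frac{\partial x}{\partial s} + f_y\frac{\partial y}{\partial s},$$

$$\frac{\partial^2 w}{\partial s^2} = \frac{\partial f_x}{\partial s}\frac{\partial x}{\partial s} + f_x\frac{\partial^2 x}{\partial s^2} + \frac{\partial f_y}{\partial s}\frac{\partial y}{\partial s} + f_y\frac{\partial^2 y}{\partial s^2},$$

$$\frac{\partial f_x}{\partial s} = f_{xx}\frac{\partial x}{\partial s} + f_{xy}\frac{\partial y}{\partial s}, \qquad \frac{\partial f_y}{\partial s} = f_{xy}\frac{\partial x}{\partial s} + f_{yy}\frac{\partial y}{\partial s}.$$

The final result is then

$$\frac{\partial^2 w}{\partial s^2} = f_{xx}\left(\frac{\partial x}{\partial s}\right)^2 + 2f_{xy}\frac{\partial x}{\partial s}\frac{\partial y}{\partial s} + f_{yy}\left(\frac{\partial y}{\partial s}\right)^2 + f_x\frac{\partial^2 x}{\partial s^2} + f_y\frac{\partial^2 y}{\partial s^2}.$$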
Of course, similar results also hold for the other second-order derivatives of w.

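Example 3. Suppose $u = f(x, y)$, $x = s^2 - t^2$, $y = 2st$. Assuming $f$ is of class $C^2$, find $\partial^2 u/\partial s\,\partial t$ in terms of the derivatives of $f$.
Solution. $\dfrac{\partial u}{\partial t} = f_x\dfrac{\partial x}{\partial t} + f_y\dfrac{\partial y}{\partial t} = -2tf_x + 2sf_y$, so

$$\begin{aligned} \frac{\partial^2 u}{\partial s\,\partial t} &= -2t[2sf_{xx} + 2tf_{xy}] + 2s[2sf_{xy} + 2tf_{yy}] + 2f_y \\ &= -4stf_{xx} + 4(s^2 - t^2)f_{xy} + 4stf_{yy} + 2f_y. \end{aligned}$$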
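The final formula of Example 3 can be spot-checked numerically. In the sketch below, the outer function $f(x, y) = x^3y + \cos y$ and the sample point $(s, t) = (0.6, 0.4)$ are hypothetical choices; the code evaluates the formula with hand-computed derivatives of this $f$ and compares it with a centered four-point difference approximation of $\partial^2 u/\partial s\,\partial t$.

```python
import math

# Hypothetical outer function (any C^2 function would do): f(x, y) = x^3 y + cos y.
def f(x, y):
    return x**3 * y + math.cos(y)

def f_y(x, y):
    return x**3 - math.sin(y)

def f_xx(x, y):
    return 6 * x * y

def f_xy(x, y):
    return 3 * x**2

def f_yy(x, y):
    return -math.cos(y)

def u(s, t):
    return f(s * s - t * t, 2 * s * t)

def mixed_by_formula(s, t):
    """-4st f_xx + 4(s^2 - t^2) f_xy + 4st f_yy + 2 f_y, from Example 3."""
    x, y = s * s - t * t, 2 * s * t
    return (-4 * s * t * f_xx(x, y) + 4 * (s * s - t * t) * f_xy(x, y)
            + 4 * s * t * f_yy(x, y) + 2 * f_y(x, y))

def mixed_by_differences(s, t, h=1e-4):
    """Centered four-point approximation to the mixed partial of u in s and t."""
    return (u(s + h, t + h) - u(s + h, t - h)
            - u(s - h, t + h) + u(s - h, t - h)) / (4 * h * h)

print(mixed_by_formula(0.6, 0.4), mixed_by_differences(0.6, 0.4))
```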

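EXAMPLE 4. Let us see what happens to some derivatives when we change from Cartesian to polar coordinates. Let $u = f(x, y)$, where $f$ is of class $C^2$, and let $x = r\cos\theta$ and $y = r\sin\theta$. Then

$$\frac{\partial u}{\partial r} = (\cos\theta)f_x + (\sin\theta)f_y, \qquad \frac{\partial u}{\partial\theta} = -(r\sin\theta)f_x + (r\cos\theta)f_y.$$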

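Proceeding to the second derivatives,

$$\frac{\partial^2 u}{\partial r^2} = (\cos\theta)\frac{\partial f_x}{\partial r} + (\sin\theta)\frac{\partial f_y}{\partial r} = (\cos^2\theta)f_{xx} + (2\cos\theta\sin\theta)f_{xy} + (\sin^2\theta)f_{yy},$$

$$\begin{aligned} \frac{\partial^2 u}{\partial\theta^2} &= -(r\cos\theta)f_x - (r\sin\theta)\frac{\partial f_x}{\partial\theta} - (r\sin\theta)f_y + (r\cos\theta)\frac{\partial f_y}{\partial\theta} \\ &= (r^2\sin^2\theta)f_{xx} - (2r^2\sin\theta\cos\theta)f_{xy} + (r^2\cos^2\theta)f_{yy} - r\frac{\partial u}{\partial r}. \end{aligned}$$

The calculation of the mixed derivative $\partial^2 u/\partial r\,\partial\theta$ is left to the reader (Exercise 2).
Notice, in particular, that by combining the last two equations and using the identity $\sin^2\theta + \cos^2\theta = 1$, we obtain

$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\theta^2} = f_{xx} + f_{yy}.$$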

The expression on the right, the sum of the pure second partial derivatives of $f$ with respect to a Cartesian coordinate system, turns up in many practical and theoretical applications; it is called the Laplacian of $f$. (We shall encounter it again in Chapter 5.) What we have just accomplished is the calculation of the Laplacian in polar coordinates. We state this result formally, with slightly different notation.

2.51. Proposition. Suppose u is a C junction of(x, y) in some open set in IK2. If

(x, y) is related to (r, 6) by x = r cos 9, y — r s'mO, we have

d2u i
d2u _
d'2u _|_
1 du t
1 d2u
dx2 dy2 'Or'1 r dr r2 d92
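Proposition 2.51 also lends itself to a numerical check. The sketch below assumes the hypothetical test function $u(x,y) = x^3y + e^x\cos y$, whose Cartesian Laplacian is $6xy$ (the term $e^x\cos y$ is harmonic), and evaluates the polar-coordinate expression with centered finite differences at an arbitrarily chosen point.

```python
import math

# Hypothetical test function: Laplacian of x^3 y is 6xy, and e^x cos y is harmonic.
def u(x, y):
    return x**3 * y + math.exp(x) * math.cos(y)

def laplacian_exact(x, y):
    return 6 * x * y

def U(r, theta):
    """The same function expressed in polar coordinates."""
    return u(r * math.cos(theta), r * math.sin(theta))

def polar_laplacian(r, theta, h=1e-4):
    """u_rr + u_r / r + u_thetatheta / r^2, via centered finite differences."""
    u_r = (U(r + h, theta) - U(r - h, theta)) / (2 * h)
    u_rr = (U(r + h, theta) - 2 * U(r, theta) + U(r - h, theta)) / h**2
    u_tt = (U(r, theta + h) - 2 * U(r, theta) + U(r, theta - h)) / h**2
    return u_rr + u_r / r + u_tt / r**2

r0, t0 = 1.3, 0.6
print(polar_laplacian(r0, t0), laplacian_exact(r0 * math.cos(t0), r0 * math.sin(t0)))
```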

Multi-index Notation. Traditional notations for partial derivatives become rather cumbersome for derivatives of order higher than two, and they make it rather difficult to write Taylor's theorem in an intelligible fashion. However, a better notation, which is now in common usage in the literature of partial differential equations, is available.
A multi-index is an $n$-tuple of nonnegative integers. Multi-indices are generally denoted by the Greek letters $\alpha$ or $\beta$:
$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n), \qquad \beta = (\beta_1, \beta_2, \ldots, \beta_n) \qquad (\alpha_j, \beta_j \in \{0, 1, 2, \ldots\}).$$

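If $\alpha$ is a multi-index, we define
$$|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n, \qquad \alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_n!,$$
$$\mathbf{x}^\alpha = x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n} \quad (\text{where } \mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n),$$
$$\partial^\alpha = \partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots\partial_n^{\alpha_n} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\,\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}.$$
The number $|\alpha| = \alpha_1 + \cdots + \alpha_n$ is called the order or degree of $\alpha$. Thus, the order of $\alpha$ is the same as the order of $\mathbf{x}^\alpha$ as a monomial or the order of $\partial^\alpha$ as a partial derivative. (The notation $|\alpha| = \alpha_1 + \cdots + \alpha_n$ conflicts with the notation $|\mathbf{x}| = (x_1^2 + \cdots + x_n^2)^{1/2}$ for the norm of an $n$-tuple of real numbers, but the meaning will be clear from the context.)
If $f$ is a function of class $C^k$, by Theorem 2.47 the order of differentiation in a $k$th-order partial derivative of $f$ is immaterial. Thus, the generic $k$th-order partial derivative of $f$ can be written simply as $\partial^\alpha f$ with $|\alpha| = k$.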

Example 5. With $n = 3$ and $\mathbf{x} = (x, y, z)$, we have
$$\partial^{(0,3,0)}f = \frac{\partial^3 f}{\partial y^3}, \qquad \mathbf{x}^{(2,1,5)} = x^2yz^5.$$

As the notation $\mathbf{x}^\alpha$ indicates, multi-indices are handy for writing not only derivatives but also polynomials in several variables. To illustrate their use, we present a generalization of the binomial theorem.

2.52 Theorem (The Multinomial Theorem). For any $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and any positive integer $k$,
$$(x_1 + x_2 + \cdots + x_n)^k = \sum_{|\alpha| = k}\frac{k!}{\alpha!}\,\mathbf{x}^\alpha.$$

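Proof. The case $n = 2$ is just the binomial theorem:
$$(x_1 + x_2)^k = \sum_{j=0}^{k}\frac{k!}{j!\,(k-j)!}\,x_1^jx_2^{k-j} = \sum_{|\alpha| = k}\frac{k!}{\alpha_1!\,\alpha_2!}\,\mathbf{x}^\alpha,$$
where we have set $\alpha_1 = j$, $\alpha_2 = k - j$, and $\alpha = (\alpha_1, \alpha_2)$. The general case follows by induction on $n$. Suppose the result is true for $n < N$ and $\mathbf{x} = (x_1, \ldots, x_N)$. By using the result for $n = 2$ and then the result for $n = N - 1$, we obtain
$$(x_1 + \cdots + x_N)^k = \big[(x_1 + \cdots + x_{N-1}) + x_N\big]^k = \sum_{j=0}^{k}\frac{k!}{j!\,(k-j)!}\,(x_1 + \cdots + x_{N-1})^{k-j}x_N^j = \sum_{j=0}^{k}\sum_{|\beta| = k-j}\frac{k!}{\beta!\,j!}\,\mathbf{x}'^{\beta}x_N^j,$$
where $\beta = (\beta_1, \ldots, \beta_{N-1})$ and $\mathbf{x}' = (x_1, \ldots, x_{N-1})$. To conclude, we set $\alpha = (\beta_1, \ldots, \beta_{N-1}, j)$, so that $\beta!\,j! = \alpha!$ and $\mathbf{x}'^{\beta}x_N^j = \mathbf{x}^\alpha$. Observing that $\alpha$ runs over all multi-indices of order $k$ when $\beta$ runs over all multi-indices of order $k - j$ and $j$ runs from 0 to $k$, we obtain $\sum_{|\alpha| = k}k!\,\mathbf{x}^\alpha/\alpha!$. □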
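Multi-index bookkeeping is easy to mechanize. The following sketch enumerates all multi-indices of a given order (recursively, in the same spirit as the induction above) and evaluates the right side of the multinomial theorem with exact integer arithmetic; the function names are illustrative, not standard.

```python
from math import factorial, prod

def multi_indices(n, k):
    """Yield every multi-index alpha = (alpha_1, ..., alpha_n) with |alpha| = k."""
    if n == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in multi_indices(n - 1, k - first):
            yield (first,) + rest

def alpha_factorial(alpha):
    """alpha! = alpha_1! alpha_2! ... alpha_n!"""
    return prod(factorial(a) for a in alpha)

def monomial(x, alpha):
    """x^alpha = x_1^alpha_1 ... x_n^alpha_n"""
    return prod(xi**ai for xi, ai in zip(x, alpha))

def multinomial_sum(x, k):
    """Right side of Theorem 2.52: sum over |alpha| = k of (k!/alpha!) x^alpha."""
    return sum(factorial(k) // alpha_factorial(alpha) * monomial(x, alpha)
               for alpha in multi_indices(len(x), k))

x = (2, -3, 5)
print(multinomial_sum(x, 4), sum(x) ** 4)   # 256 256
```

Integer division is safe here because $k!/\alpha!$ is always an integer (a multinomial coefficient).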

In these exercises, all functions in question are assumed to be of class $C^2$.

1. Verify by explicit calculation that $\partial_x\partial_yf = \partial_y\partial_xf$:
a. $f(x,y) = x^2y + \sin\pi xy$.
c. $f(x,y) = (x + 2y + 4)/(7x + 3y)$.
2. Calculate $\partial^2u/\partial r\,\partial\theta$ if $u = f(x,y)$, $x = r\cos\theta$, $y = r\sin\theta$. (See Example 4.)
3. Compute the indicated derivatives of $w$ in terms of the derivatives of $f$:
a. $\partial_x^2w$ and $\partial_x\partial_yw$, if $w = f(2x - y^2, x\sin 3y, x^4)$.
b. $\partial_x\partial_yw$ and $\partial_y^2w$, if $w = f(e^{x-3y}, \log(x^2 + 1), \sqrt{y^4 + 4})$.
4. Show that if $u = F(x + g(y))$, then $u_xu_{xy} = u_yu_{xx}$.
5. Suppose that $f$ is a homogeneous function of degree $a$ on $\mathbb{R}^n$. Show that $\sum_{j,k=1}^{n}x_jx_k\partial_j\partial_kf = a(a-1)f$ (cf. Euler's theorem (2.36) and its proof).
6. Suppose $u = f(x,y)$, $x = s^2 - t^2$, $y = 2st$. Show that $\partial_s^2u + \partial_t^2u = 4(s^2 + t^2)(\partial_x^2f + \partial_y^2f)$ (cf. Example 3).
7. Suppose $u = f(x - ct) + g(x + ct)$, where $c$ is a constant. Show that $\partial_t^2u = c^2\partial_x^2u$.
8. For $\mathbf{x} = (x, y, z) \in \mathbb{R}^3 \setminus \{0\}$ and $t \in \mathbb{R}$ let $F(\mathbf{x}, t) = r^{-1}g(ct - r)$, where $c$ is a constant, $g$ is a $C^2$ function of one variable, and $r = |\mathbf{x}|$. Show that $\partial_x^2F + \partial_y^2F + \partial_z^2F = c^{-2}\partial_t^2F$.
9. For $\mathbf{x} \in \mathbb{R}^n \setminus \{0\}$, let $F(\mathbf{x}) = f(r)$ where $f$ is a $C^2$ function on $(0, \infty)$ and $r = |\mathbf{x}|$. Show that $\partial_1^2F + \cdots + \partial_n^2F = f''(r) + (n-1)r^{-1}f'(r)$.
10. Derive the following version of the product rule for partial derivatives: $\partial^\alpha(fg) = \sum_{\beta+\gamma=\alpha}\frac{\alpha!}{\beta!\,\gamma!}(\partial^\beta f)(\partial^\gamma g)$.
11. Prove the following $n$-dimensional binomial theorem: For all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ we have $(\mathbf{x} + \mathbf{y})^\alpha = \sum_{\beta+\gamma=\alpha}\frac{\alpha!}{\beta!\,\gamma!}\,\mathbf{x}^\beta\mathbf{y}^\gamma$.

2.7 Taylor's Theorem

In this section we discuss Taylor expansions in their finite form, as polynomial approximations to a function rather than expansions in infinite series. We begin with a review of Taylor's theorem for functions of one real variable.
Taylor's theorem is a higher-order version of the tangent line approximation; it says that a function $f$ of class $C^k$ on an interval $I$ containing the point $x = a$ is the sum of a certain polynomial of degree $k$ and a remainder term that vanishes more rapidly than $|x - a|^k$ as $x \to a$. Specifically, the polynomial $P = P_{a,k}$ of order $k$ such that $P^{(j)}(0) = f^{(j)}(a)$ for $0 \le j \le k$, namely
$$P_{a,k}(h) = \sum_{j=0}^{k}\frac{f^{(j)}(a)}{j!}\,h^j, \tag{2.53}$$
is called the $k$th-order Taylor polynomial for $f$ based at $a$, and the difference
$$R_{a,k}(h) = f(a+h) - P_{a,k}(h) = f(a+h) - \sum_{j=0}^{k}\frac{f^{(j)}(a)}{j!}\,h^j \tag{2.54}$$
is called the $k$th-order Taylor remainder. The various versions of Taylor's theorem provide formulas or estimates for $R_{a,k}$ that ensure that the Taylor polynomial $P_{a,k}$ is a good approximation to $f$ near $a$. The ones most commonly known involve the stronger assumption that $f$ is of class $C^{k+1}$ and yield the stronger conclusion that the remainder vanishes as rapidly as $|x - a|^{k+1}$. We present two of these, as well as one that yields the more general form of the theorem stated above.
The easiest version of Taylor's theorem to derive is the following.

2.55 Theorem (Taylor's Theorem with Integral Remainder, I). Suppose that $f$ is of class $C^{k+1}$ ($k \ge 0$) on an interval $I \subset \mathbb{R}$, and $a \in I$. Then the remainder $R_{a,k}$ defined by (2.53)-(2.54) is given by
$$R_{a,k}(h) = \frac{h^{k+1}}{k!}\int_0^1 (1-t)^kf^{(k+1)}(a + th)\,dt. \tag{2.56}$$
Proof. For $k = 0$ the assertion is just that
$$f(a+h) = f(a) + h\int_0^1 f'(a + th)\,dt, \tag{2.57}$$
which is easily verified by the substitution $u = a + th$:
$$h\int_0^1 f'(a + th)\,dt = \int_a^{a+h}f'(u)\,du = f(a+h) - f(a).$$
The trick now is to integrate (2.57) by parts, choosing for the antiderivative of the constant function 1 not $t$ but $t - 1$, alias $-(1-t)$:
$$h\int_0^1 f'(a + th)\,dt = -(1-t)hf'(a + th)\Big|_0^1 + h\int_0^1 (1-t)f''(a + th)h\,dt = f'(a)h + h^2\int_0^1 (1-t)f''(a + th)\,dt.$$
Plugging this into (2.57), we obtain (2.56) in the case $k = 1$. If we integrate by parts again,
$$h^2\int_0^1 (1-t)f''(a + th)\,dt = -\frac{(1-t)^2}{2}h^2f''(a + th)\Big|_0^1 + \frac{h^2}{2}\int_0^1 (1-t)^2f'''(a + th)h\,dt = \frac{f''(a)}{2}h^2 + \frac{h^3}{2}\int_0^1 (1-t)^2f'''(a + th)\,dt,$$
we obtain the theorem for $k = 2$. The pattern is now clear: Integrating (2.57) by parts $k$ times yields (2.56). □
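Formula (2.56) can be checked numerically by evaluating its integral with a quadrature rule and comparing with the direct definition (2.54). The sketch below uses $f = \sin$ (whose derivatives cycle with period 4) and composite Simpson's rule; both choices, and the helper names, are assumptions made only for illustration.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3

def sin_derivative(j, x):
    """j-th derivative of sin at x (the derivatives repeat with period 4)."""
    return (math.sin(x), math.cos(x), -math.sin(x), -math.cos(x))[j % 4]

def remainder_direct(a, k, h):
    """R_{a,k}(h) = f(a + h) - P_{a,k}(h) for f = sin, as in (2.54)."""
    poly = sum(sin_derivative(j, a) * h**j / math.factorial(j) for j in range(k + 1))
    return math.sin(a + h) - poly

def remainder_integral(a, k, h):
    """Right side of (2.56): h^(k+1)/k! times the integral of (1-t)^k f^(k+1)(a + th)."""
    def integrand(t):
        return (1 - t)**k * sin_derivative(k + 1, a + t * h)
    return h**(k + 1) / math.factorial(k) * simpson(integrand, 0.0, 1.0)

print(remainder_direct(0.5, 3, 0.8), remainder_integral(0.5, 3, 0.8))
```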

Next we present a modification of Theorem 2.55 that works without assuming that $f$ has any additional derivatives beyond the ones occurring in the Taylor polynomial.

2.58 Theorem (Taylor's Theorem with Integral Remainder, II). Suppose that $f$ is of class $C^k$ ($k \ge 1$) on an interval $I \subset \mathbb{R}$, and $a \in I$. Then the remainder $R_{a,k}$ defined by (2.53)-(2.54) is given by
$$R_{a,k}(h) = \frac{h^k}{(k-1)!}\int_0^1 (1-t)^{k-1}\big[f^{(k)}(a + th) - f^{(k)}(a)\big]\,dt. \tag{2.59}$$

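Proof. We begin by using Theorem 2.55, with $k$ replaced by $k - 1$:
$$f(a+h) = \sum_{j=0}^{k-1}\frac{f^{(j)}(a)}{j!}h^j + \frac{h^k}{(k-1)!}\int_0^1 (1-t)^{k-1}f^{(k)}(a + th)\,dt.$$
Subtracting $f^{(k)}(a)h^k/k!$ from both sides gives
$$f(a+h) - \sum_{j=0}^{k}\frac{f^{(j)}(a)}{j!}h^j = \frac{h^k}{(k-1)!}\int_0^1 (1-t)^{k-1}f^{(k)}(a + th)\,dt - \frac{f^{(k)}(a)}{k!}h^k.$$
In view of the fact that
$$\frac{f^{(k)}(a)}{k!}h^k = \frac{h^k}{(k-1)!}\int_0^1 (1-t)^{k-1}f^{(k)}(a)\,dt,$$
this gives (2.59). □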

The formulas (2.56) and (2.59) are generally used not to obtain the exact value of the remainder but to obtain an estimate for it. The main results are in the following corollaries.

2.60 Corollary. If $f$ is of class $C^k$ on $I$, then $R_{a,k}(h)/h^k \to 0$ as $h \to 0$.

Proof. $f^{(k)}$ is continuous at $a$, so for any $\epsilon > 0$ there exists $\delta > 0$ such that $|f^{(k)}(y) - f^{(k)}(a)| < \epsilon$ when $|y - a| < \delta$. In particular,
$$|f^{(k)}(a + th) - f^{(k)}(a)| < \epsilon \quad\text{for } 0 \le t \le 1 \text{ when } |h| < \delta.$$
Hence, (2.59) gives
$$|R_{a,k}(h)| \le \frac{|h|^k}{(k-1)!}\int_0^1 (1-t)^{k-1}\epsilon\,dt = \frac{\epsilon}{k!}|h|^k \quad\text{for } |h| < \delta.$$
In other words, $|R_{a,k}(h)/h^k| \le \epsilon/k!$ whenever $|h| < \delta$, and hence $R_{a,k}(h)/h^k \to 0$ as $h \to 0$. □
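The content of Corollary 2.60 can be observed directly. Assuming $f = \exp$ and $a = 0$ (hypothetical choices; every derivative of $e^x$ at $a$ equals $e^a$), the sketch below computes $R_{a,k}(h)/h^k$ for shrinking $h$.

```python
import math

def taylor_poly_exp(a, k, h):
    """P_{a,k}(h) for f = exp; every derivative of exp at a equals e^a."""
    return sum(math.exp(a) * h**j / math.factorial(j) for j in range(k + 1))

def remainder_exp(a, k, h):
    """R_{a,k}(h) = f(a + h) - P_{a,k}(h), as in (2.54)."""
    return math.exp(a + h) - taylor_poly_exp(a, k, h)

a, k = 0.0, 3
for h in (0.1, 0.01, 0.001):
    # Corollary 2.60: the ratio R/h^k tends to 0 as h -> 0.
    print(h, remainder_exp(a, k, h) / h**k)
```

Here the ratios shrink roughly tenfold each time $h$ does, which reflects the stronger $O(|h|^{k+1})$ behavior available because $e^x$ is actually of class $C^{k+1}$; for very small $h$, floating-point cancellation in $f(a+h) - P_{a,k}(h)$ eventually dominates.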

Thus, if $f$ is of class $C^k$ near $x = a$, we can write $f(x)$ as the sum of a $k$th-order polynomial (the Taylor polynomial) in $h = x - a$ and a remainder that vanishes at $x = a$ faster than any nonzero term in the polynomial. Notice that for $k = 1$, this is just a restatement of the differentiability of $f$. If $f$ is actually of class $C^{k+1}$, we obtain a better estimate from (2.56):

2.61 Corollary. If $f$ is of class $C^{k+1}$ on $I$ and $|f^{(k+1)}(x)| \le M$ for $x \in I$, then
$$|R_{a,k}(h)| \le \frac{M}{(k+1)!}|h|^{k+1} \qquad (a + h \in I).$$
Proof. By (2.56),
$$|R_{a,k}(h)| \le \frac{|h|^{k+1}}{k!}\int_0^1 (1-t)^kM\,dt = \frac{M}{(k+1)!}|h|^{k+1}. \qquad \square$$
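As an illustration of Corollary 2.61, take $f = \exp$ on $I = [-1, 1]$ with $a = 0$; then $|f^{(k+1)}| \le e$ on $I$, so $M = e$ is admissible. The sketch below (these choices and the names are illustrative assumptions) samples $h$ on a grid and reports the largest value of $|R_{0,k}(h)|$ divided by the bound, which should not exceed 1.

```python
import math

def remainder_exp(k, h):
    """R_{0,k}(h) for f = exp based at a = 0."""
    return math.exp(h) - sum(h**j / math.factorial(j) for j in range(k + 1))

def corollary_bound(k, h, M):
    """Bound of Corollary 2.61: M |h|^(k+1) / (k+1)!."""
    return M * abs(h)**(k + 1) / math.factorial(k + 1)

# On I = [-1, 1] every derivative of exp is bounded by M = e.
M = math.e
k = 4
grid = [i / 100 for i in range(-100, 101)]   # sample values of h with 0 + h in I
worst = max(abs(remainder_exp(k, h)) / corollary_bound(k, h, M)
            for h in grid if h != 0)
print(worst)   # at most 1 if the estimate holds
```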

Finally, we present Lagrange's form of the remainder, which turns Taylor's theorem into a higher-order version of the mean value theorem. Just as we deduced the mean value theorem from Rolle's theorem, we shall obtain Lagrange's formula from the following variant of Rolle's theorem.

2.62 Lemma. Suppose $g$ is $k + 1$ times differentiable on $[a, b]$. If $g(a) = g(b)$ and $g^{(j)}(a) = 0$ for $1 \le j \le k$, then there is a point $c \in (a, b)$ such that $g^{(k+1)}(c) = 0$.
Proof. By Rolle's theorem, there is a point $c_1 \in (a, b)$ such that $g'(c_1) = 0$. Since $g'$ is continuous on $[a, c_1]$ and differentiable on $(a, c_1)$, and $g'(a) = g'(c_1) = 0$, there is a point $c_2 \in (a, c_1)$ such that $g''(c_2) = 0$. Proceeding inductively, we find that for $1 \le j \le k + 1$ there is a point $c_j \in (a, c_{j-1})$ such that $g^{(j)}(c_j) = 0$, and the final case $j = k + 1$ is the desired result. □

2.63 Theorem (Taylor's Theorem with Lagrange's Remainder). Suppose $f$ is $k + 1$ times differentiable on an interval $I \subset \mathbb{R}$, and $a \in I$. For each $h \in \mathbb{R}$ such that $a + h \in I$ there is a point $c$ between 0 and $h$ such that
$$R_{a,k}(h) = \frac{f^{(k+1)}(a + c)}{(k+1)!}\,h^{k+1}. \tag{2.64}$$
Proof. Let us fix a particular $h$, and suppose for now that $h > 0$. Let
$$g(t) = R_{a,k}(t) - R_{a,k}(h)\frac{t^{k+1}}{h^{k+1}} = f(a + t) - f(a) - f'(a)t - \cdots - \frac{f^{(k)}(a)}{k!}t^k - R_{a,k}(h)\frac{t^{k+1}}{h^{k+1}}.$$
The coefficient of $t^{k+1}$ is chosen to make $g(h) = 0$, and clearly $g(0) = 0$. Similarly, for $j \le k$ we have
$$g^{(j)}(t) = f^{(j)}(a + t) - f^{(j)}(a) - f^{(j+1)}(a)t - \cdots - \frac{f^{(k)}(a)}{(k-j)!}t^{k-j} - R_{a,k}(h)\frac{(k+1)!}{(k+1-j)!}\frac{t^{k+1-j}}{h^{k+1}},$$
so $g^{(j)}(0) = 0$. Therefore, by Lemma 2.62, there is a point $c \in (0, h)$ such that
$$0 = g^{(k+1)}(c) = f^{(k+1)}(a + c) - (k+1)!\,\frac{R_{a,k}(h)}{h^{k+1}}.$$
But this is precisely (2.64). The case $h < 0$ is handled similarly by considering the function $\tilde{g}(t) = g(-t)$ on the interval $[0, |h|]$. □

Corollary 2.61 is obviously an immediate consequence of (2.64). In Theorem 2.55 we assumed that $f$ is of class $C^{k+1}$, but in Theorem 2.63 we needed only the existence, not the continuity, of $f^{(k+1)}$. Actually, in Theorem 2.55 it is enough to assume that $f^{(k+1)}$ is Riemann integrable.
For the convenience of the reader, we recall a few of the most familiar and useful Taylor expansions, which are easily derived from the definition (2.53) and will be used without comment in the rest of the book.

2.65 Proposition. The Taylor polynomials of degree $k$ about $a = 0$ of the functions $e^x$, $\cos x$, $\sin x$, and $(1-x)^{-1}$ are as follows:
$$e^x:\ \sum_{j=0}^{k}\frac{x^j}{j!}, \qquad \cos x:\ \sum_{2j \le k}\frac{(-1)^jx^{2j}}{(2j)!}, \qquad \sin x:\ \sum_{2j+1 \le k}\frac{(-1)^jx^{2j+1}}{(2j+1)!}, \qquad (1-x)^{-1}:\ \sum_{j=0}^{k}x^j.$$

Taylor polynomials have many uses. From a practical point of view, they allow one to approximate complicated functions by polynomials that are relatively easy to compute with. On the more theoretical side, it is an important general principle that the behavior of a function $f(x)$ near $x = a$ is largely determined by the first nonvanishing term, apart from the constant term $f(a)$, in its Taylor expansion. That is, if $f'(a) \ne 0$, then the tangent line approximation $f(x) \approx f(a) + f'(a)(x - a)$ is a good one. If $f'(a) = 0$ but $f''(a) \ne 0$, the second-order term is decisive, and so forth. This is the basis for the second-derivative test for local extrema: if $f'(a) = 0$ and $f''(a) \ne 0$, then $f(x) \approx f(a) + \frac{1}{2}f''(a)(x - a)^2$, and the expression on the right is a quadratic function with a maximum or minimum at $a$, depending on the sign of $f''(a)$. (See Exercise 9 and §2.8.) The following example illustrates another application of this principle.

EXAMPLE 1. Use Taylor expansions to evaluate
$$\lim_{x \to 0}\frac{x^2 - \sin x^2}{x^4(1 - \cos x)}.$$

Solution. We have
$$x^2 - \sin x^2 = x^2 - \big(x^2 - \tfrac{1}{6}x^6 + \cdots\big) = \tfrac{1}{6}x^6 + \cdots,$$
$$x^4(1 - \cos x) = x^4\big(1 - (1 - \tfrac{1}{2}x^2 + \cdots)\big) = \tfrac{1}{2}x^6 + \cdots,$$
where the dots denote error terms that vanish faster than $x^6$ as $x \to 0$. Therefore,
$$\frac{x^2 - \sin x^2}{x^4(1 - \cos x)} = \frac{\frac{1}{6}x^6 + \cdots}{\frac{1}{2}x^6 + \cdots} = \frac{\frac{1}{6} + \cdots}{\frac{1}{2} + \cdots},$$
where the dots in the last fraction denote error terms that vanish as $x \to 0$. The limit is therefore $\frac{1}{3}$. (To appreciate the efficiency of this calculation, try doing it by l'Hôpital's rule!)
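The limit in Example 1 can be corroborated numerically. This sketch simply evaluates the quotient at a few decreasing values of $x$. It is only a plausibility check: for much smaller $x$ the subtraction $x^2 - \sin x^2$ loses accuracy to floating-point cancellation, which is one more reason to prefer the Taylor-expansion argument.

```python
import math

def ratio(x):
    """(x^2 - sin x^2) / (x^4 (1 - cos x)) from Example 1."""
    return (x**2 - math.sin(x**2)) / (x**4 * (1 - math.cos(x)))

for x in (0.5, 0.1, 0.02):
    print(x, ratio(x))   # values approach 1/3
```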

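We now generalize these results to functions on $\mathbb{R}^n$. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^k$ on a convex open set $S$. We can derive a Taylor expansion for $f(\mathbf{x})$ about a point $\mathbf{a} \in S$ by looking at the restriction of $f$ to the line joining $\mathbf{a}$ and $\mathbf{x}$. That is, we set $\mathbf{h} = \mathbf{x} - \mathbf{a}$ and
$$g(t) = f(\mathbf{a} + t\mathbf{h}).$$
By the chain rule,
$$g'(t) = \mathbf{h}\cdot\nabla f(\mathbf{a} + t\mathbf{h}),$$
and hence
$$g^{(j)}(t) = (\mathbf{h}\cdot\nabla)^jf(\mathbf{a} + t\mathbf{h}),$$
where the expression on the right denotes the result of applying the operation
$$\mathbf{h}\cdot\nabla = h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n} \tag{2.66}$$
$j$ times to $f$. The Taylor formula for $g$ with $a = 0$ and $h = 1$,
$$g(1) = \sum_{j=0}^{k}\frac{g^{(j)}(0)}{j!} + (\text{remainder}),$$
therefore yields
$$f(\mathbf{a} + \mathbf{h}) = \sum_{j=0}^{k}\frac{(\mathbf{h}\cdot\nabla)^jf(\mathbf{a})}{j!} + R_{\mathbf{a},k}(\mathbf{h}), \tag{2.67}$$
where formulas for $R_{\mathbf{a},k}(\mathbf{h})$ can be obtained from the formulas (2.56), (2.59), or (2.64) applied to $g$.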
It is usually preferable, however, to rewrite (2.67) and the accompanying formulas for the remainder so that the partial derivatives of $f$ appear more explicitly. To do this, we apply the multinomial theorem to the expression (2.66) to get
$$(\mathbf{h}\cdot\nabla)^j = \sum_{|\alpha| = j}\frac{j!}{\alpha!}\,\mathbf{h}^\alpha\partial^\alpha.$$
Substituting this into (2.67) and the remainder formulas, we obtain the following:

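2.68 Theorem (Taylor's Theorem in Several Variables). Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^k$ on an open convex set $S$. If $\mathbf{a} \in S$ and $\mathbf{a} + \mathbf{h} \in S$, then
$$f(\mathbf{a} + \mathbf{h}) = \sum_{|\alpha| \le k}\frac{\partial^\alpha f(\mathbf{a})}{\alpha!}\,\mathbf{h}^\alpha + R_{\mathbf{a},k}(\mathbf{h}), \tag{2.69}$$
where
$$R_{\mathbf{a},k}(\mathbf{h}) = k\sum_{|\alpha| = k}\frac{\mathbf{h}^\alpha}{\alpha!}\int_0^1 (1-t)^{k-1}\big[\partial^\alpha f(\mathbf{a} + t\mathbf{h}) - \partial^\alpha f(\mathbf{a})\big]\,dt. \tag{2.70}$$
If $f$ is of class $C^{k+1}$ on $S$, we also have
$$R_{\mathbf{a},k}(\mathbf{h}) = (k+1)\sum_{|\alpha| = k+1}\frac{\mathbf{h}^\alpha}{\alpha!}\int_0^1 (1-t)^k\partial^\alpha f(\mathbf{a} + t\mathbf{h})\,dt, \tag{2.71}$$
$$R_{\mathbf{a},k}(\mathbf{h}) = \sum_{|\alpha| = k+1}\frac{\partial^\alpha f(\mathbf{a} + c\mathbf{h})}{\alpha!}\,\mathbf{h}^\alpha \quad\text{for some } c \in (0,1). \tag{2.72}$$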

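This result bears a pleasing similarity to the single-variable formulas (2.54), (2.56), (2.59), and (2.64): a triumph for multi-index notation! It may be reassuring, however, to see the formula for the second-order Taylor polynomial written out in the more familiar notation:
$$P_{\mathbf{a},2}(\mathbf{h}) = f(\mathbf{a}) + \sum_{j=1}^{n}\partial_jf(\mathbf{a})h_j + \frac{1}{2}\sum_{j,k=1}^{n}\partial_j\partial_kf(\mathbf{a})h_jh_k \tag{2.73}$$
$$= f(\mathbf{a}) + \sum_{j=1}^{n}\partial_jf(\mathbf{a})h_j + \frac{1}{2}\sum_{j=1}^{n}\partial_j^2f(\mathbf{a})h_j^2 + \sum_{j<k}\partial_j\partial_kf(\mathbf{a})h_jh_k. \tag{2.74}$$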

The first of these formulas is (2.67) with $k = 2$; the second one is (2.69). (Every multi-index $\alpha$ of order 2 is either of the form $(\ldots, 2, \ldots)$ or $(\ldots, 1, \ldots, 1, \ldots)$, where the dots denote zero entries, so the sum over $|\alpha| = 2$ in (2.69) breaks up into the last two sums in (2.74).) Notice that the mixed derivatives $\partial_j\partial_k$ ($j \ne k$) occur twice in (2.73) (since $\partial_j\partial_k = \partial_k\partial_j$) but only once in (2.74) (since $j < k$ there); this accounts for the disappearance of the factor of $\frac{1}{2}$ in the last sum in (2.74).
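To see (2.73) at work, the sketch below builds $P_{\mathbf{a},2}$ from a hand-computed gradient and Hessian and checks that the error divided by $|\mathbf{h}|^2$ tends to 0 along a fixed direction, as Corollary 2.75 below asserts for $k = 2$. The function $f(x,y) = e^x\sin y$, the base point, and the direction are hypothetical choices.

```python
import math

# Hypothetical example f(x, y) = e^x sin y, with its gradient and Hessian by hand.
def f(x, y):
    return math.exp(x) * math.sin(y)

def grad(x, y):
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

def hessian(x, y):
    e = math.exp(x)
    return ((e * math.sin(y), e * math.cos(y)),
            (e * math.cos(y), -e * math.sin(y)))

def taylor2(a, h):
    """P_{a,2}(h) as in (2.73): f(a) + sum_j d_j f h_j + (1/2) sum_{j,k} d_j d_k f h_j h_k."""
    g, H = grad(*a), hessian(*a)
    linear = sum(g[j] * h[j] for j in range(2))
    quadratic = sum(H[j][k] * h[j] * h[k] for j in range(2) for k in range(2))
    return f(*a) + linear + 0.5 * quadratic

a = (0.3, 0.5)
direction = (0.6, -0.8)                         # a unit vector, so |h| = t below
for t in (0.1, 0.01, 0.001):
    h = (t * direction[0], t * direction[1])
    error = f(a[0] + h[0], a[1] + h[1]) - taylor2(a, h)
    print(t, error / t**2)                      # R_{a,2}(h)/|h|^2 -> 0
```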
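We also have the following analogue of Corollaries 2.60 and 2.61:

2.75 Corollary. If $f$ is of class $C^k$ on $S$, then $R_{\mathbf{a},k}(\mathbf{h})/|\mathbf{h}|^k \to 0$ as $\mathbf{h} \to 0$. If $f$ is of class $C^{k+1}$ on $S$ and $|\partial^\alpha f(\mathbf{x})| \le M$ for $\mathbf{x} \in S$ and $|\alpha| = k + 1$, then
$$|R_{\mathbf{a},k}(\mathbf{h})| \le \frac{M}{(k+1)!}\|\mathbf{h}\|^{k+1},$$
where $\|\mathbf{h}\| = |h_1| + |h_2| + \cdots + |h_n|$.

Proof. The proof of the first assertion is the same as the proof of Corollary 2.60, with (2.70) in place of (2.59). As for the second, it follows easily from either (2.71) or (2.72) that
$$|R_{\mathbf{a},k}(\mathbf{h})| \le M\sum_{|\alpha| = k+1}\frac{|\mathbf{h}^\alpha|}{\alpha!},$$
and this last expression equals $M\|\mathbf{h}\|^{k+1}/(k+1)!$ by the multinomial theorem. □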

An essential fact about the Taylor expansion of a function $f$ about a point $\mathbf{a}$ is that it is the only way to write $f$ as the sum of a polynomial of degree $k$ and a remainder that vanishes to higher order than $|\mathbf{x} - \mathbf{a}|^k$ as $\mathbf{x} \to \mathbf{a}$. To see this, we need the following lemma.

2.76 Lemma. If $P(\mathbf{h})$ is a polynomial of degree $\le k$ that vanishes to order $> k$ as $\mathbf{h} \to 0$ [i.e., $P(\mathbf{h})/|\mathbf{h}|^k \to 0$], then $P = 0$.

Proof. The hypothesis implies that, for each fixed $\mathbf{h}$, $P(t\mathbf{h})/t^k \to 0$ as $t \to 0$. Write $P = P_0 + P_1 + \cdots + P_k$ where $P_j$ is the sum of the terms of order $j$ in $P$; thus
$$P(t\mathbf{h}) = P_0 + tP_1(\mathbf{h}) + t^2P_2(\mathbf{h}) + \cdots + t^kP_k(\mathbf{h}).$$
$P_0$ is the constant term; since $P(\mathbf{0}) = 0$ we must have $P_0 = 0$. Hence, dividing by $t$,
$$\frac{P(t\mathbf{h})}{t} = P_1(\mathbf{h}) + tP_2(\mathbf{h}) + \cdots + t^{k-1}P_k(\mathbf{h}).$$
Since $P(t\mathbf{h})/t \to 0$, we must have $P_1(\mathbf{h}) = 0$. But then, dividing by $t$ again,
$$\frac{P(t\mathbf{h})}{t^2} = P_2(\mathbf{h}) + \cdots + t^{k-2}P_k(\mathbf{h}),$$
so $P_2(\mathbf{h}) = 0$ since $P(t\mathbf{h})/t^2 \to 0$. Continuing inductively, we conclude that $P_j(\mathbf{h}) = 0$ for all $j$, so $P = 0$. □

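2.77 Theorem. Suppose $f$ is of class $C^k$ near $\mathbf{a}$. If $f(\mathbf{a} + \mathbf{h}) = Q(\mathbf{h}) + E(\mathbf{h})$ where $Q$ is a polynomial of degree $\le k$ and $E(\mathbf{h})/|\mathbf{h}|^k \to 0$ as $\mathbf{h} \to 0$, then $Q$ is the Taylor polynomial $P_{\mathbf{a},k}$.

Proof. Corollary 2.75 says that $f(\mathbf{a} + \mathbf{h}) = P_{\mathbf{a},k}(\mathbf{h}) + R_{\mathbf{a},k}(\mathbf{h})$, where $R_{\mathbf{a},k}(\mathbf{h})/|\mathbf{h}|^k$ tends to zero as $\mathbf{h}$ does. If also $f(\mathbf{a} + \mathbf{h}) = Q(\mathbf{h}) + E(\mathbf{h})$, then $Q - P_{\mathbf{a},k} = R_{\mathbf{a},k} - E$, so
$$\frac{Q(\mathbf{h}) - P_{\mathbf{a},k}(\mathbf{h})}{|\mathbf{h}|^k} = \frac{R_{\mathbf{a},k}(\mathbf{h}) - E(\mathbf{h})}{|\mathbf{h}|^k} \to 0.$$
By Lemma 2.76, $Q = P_{\mathbf{a},k}$. □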

Theorem 2.77 has the following important practical consequence. If one wants to compute the Taylor expansion of $f$, it may be very tedious to calculate all the derivatives needed in formula (2.69) directly. But if one can find, by any means whatever, a polynomial $Q$ of degree at most $k$ such that $[f(\mathbf{a} + \mathbf{h}) - Q(\mathbf{h})]/|\mathbf{h}|^k \to 0$, then $Q$ must be the Taylor polynomial. This enables one to generate new Taylor expansions from old ones by operations such as substitution, multiplication, etc.

EXAMPLE 2. Find the 3rd-order Taylor polynomial of $f(x,y) = e^{x^2+y}$ about $(x,y) = (0,0)$.
Solution. The direct method is to calculate the derivatives $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, $f_{yy}$, $f_{xxx}$, $f_{xxy}$, $f_{xyy}$, and $f_{yyy}$, and then plug the results into (2.69), but only a masochist would do this. Instead, use the familiar expansion for the exponential function (Proposition 2.65), neglecting all terms of order higher than 3:
$$e^{x^2+y} = 1 + (x^2 + y) + \tfrac{1}{2}(x^2 + y)^2 + \tfrac{1}{6}(x^2 + y)^3 + (\text{order} > 3)$$
$$= 1 + x^2 + y + \tfrac{1}{2}(x^4 + 2x^2y + y^2) + \tfrac{1}{6}(x^6 + 3x^4y + 3x^2y^2 + y^3) + (\text{order} > 3)$$
$$= 1 + y + x^2 + \tfrac{1}{2}y^2 + x^2y + \tfrac{1}{6}y^3 + (\text{order} > 3).$$
In the last line we have thrown the terms $x^4$, $x^6$, $x^4y$, and $x^2y^2$ into the garbage pail, since they are themselves of order $> 3$. Thus the answer is $1 + y + x^2 + \frac{1}{2}y^2 + x^2y + \frac{1}{6}y^3$. Alternatively,
$$e^{x^2+y} = e^{x^2}e^y = (1 + x^2 + \cdots)(1 + y + \tfrac{1}{2}y^2 + \tfrac{1}{6}y^3 + \cdots) = 1 + y + x^2 + \tfrac{1}{2}y^2 + x^2y + \tfrac{1}{6}y^3 + \cdots,$$
where the dots indicate terms of order $> 3$.
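The method of Example 2 (substituting into known expansions while discarding every term of order greater than 3) is easy to mechanize; Theorem 2.77 is what guarantees that the outcome is the Taylor polynomial. In the sketch below, a polynomial in $x, y$ is a dictionary from exponent pairs $(i, j)$, standing for $x^iy^j$, to exact rational coefficients; this representation and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

ORDER = 3   # keep only monomials x^i y^j with i + j <= ORDER

def truncate(p, order=ORDER):
    """Drop monomials of order > order and zero coefficients."""
    return {a: c for a, c in p.items() if sum(a) <= order and c != 0}

def multiply(p, q, order=ORDER):
    """Product of two polynomials, discarding terms of order > order."""
    out = {}
    for a, c in p.items():
        for b, d in q.items():
            m = (a[0] + b[0], a[1] + b[1])
            if sum(m) <= order:
                out[m] = out.get(m, 0) + c * d
    return truncate(out, order)

def exp_of(p, order=ORDER):
    """Truncated e^p via sum p^m / m!; assumes p has no constant term,
    so p^m has order >= m and the series may stop at m = order."""
    result = {(0, 0): Fraction(1)}
    power = {(0, 0): Fraction(1)}
    for m in range(1, order + 1):
        power = multiply(power, p, order)
        for a, c in power.items():
            result[a] = result.get(a, 0) + c / factorial(m)
    return truncate(result, order)

p = {(2, 0): Fraction(1), (0, 1): Fraction(1)}   # x^2 + y
taylor = exp_of(p)
print(taylor)   # coefficients of 1, x^2, y, x^2 y, y^2, y^3
```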

EXERCISES

1. Let $f(x) = x^2(x - \sin x)$ and $g(x) = (e^x - 1)(\cos 2x - 1)^2$.
a. Compute the Taylor polynomials of order 5 based at $a = 0$ of $f$ and $g$. (Don't compute any derivatives; use Proposition 2.65 as a starting point.)
b. Use the result of (a) to find $\lim_{x \to 0}f(x)/g(x)$ without using l'Hôpital's rule.

2. Find the Taylor polynomial $P_{1,3}(h)$ and give a constant $C$ such that $|R_{1,3}(h)| \le Ch^4$ on the interval $|h| \le \frac{1}{2}$ for each of the following functions.
a. $f(x) = \log x$.
b. $f(x) = \sqrt{x}$.
c. $f(x) = (x + 3)^{-1}$.
3. Show that $|\sin x - x + \frac{1}{6}x^3| < .08$ for $|x| \le \frac{1}{2}\pi$. (Hint: $x - \frac{1}{6}x^3$ is actually the 4th-order Taylor polynomial of $\sin x$.) How large do you have to take $k$ so that the $k$th-order Taylor polynomial of $\sin x$ about $a = 0$ approximates $\sin x$ to within .01 for $|x| \le \frac{1}{2}\pi$?

4. Use a Taylor approximation to $e^{-x^2}$ to compute $\int_0^1 e^{-x^2}\,dx$ to three decimal places, and prove the accuracy of your answer. (Hint: It's easier to apply Corollary 2.61 to $f(t) = e^{-t}$ and set $t = x^2$ than to apply Corollary 2.61 to $e^{-x^2}$ directly.)

5. Find the Taylor polynomial of order 4 based at $\mathbf{a} = (0, 0)$ for each of the following functions. Don't compute any derivatives; use Proposition 2.65.
a. $f(x,y) = x\sin(x + y)$.
b. $e^{x}\cos(x^2 + y^2)$.
c. $e^{x-2y}$.

6. Find the 3rd-order Taylor polynomial of $f(x,y) = x + \cos\pi y + x\log y$ based at $\mathbf{a} = (3, 1)$.
7. Find the 3rd-order Taylor polynomial of $f(x,y,z) = x^2y + z$ based at $\mathbf{a} = (1, 2, 1)$. The remainder vanishes identically; why? (You can see this either from the Taylor remainder formula or by algebra.)
8. Suppose $f$ is defined on the open interval $I$ and $a \in I$. The Taylor polynomial $P_{a,k}$ is well defined provided merely that $f$ is of class $C^{k-1}$ on $I$ and $f^{(k)}(a)$ exists. Show that under these hypotheses, the remainder $R_{a,k}(h) = f(a+h) - P_{a,k}(h)$ still satisfies $\lim_{h \to 0}R_{a,k}(h)/h^k = 0$. (Hint: Apply l'Hôpital's rule $k - 1$ times, then recall precisely what it means for $f^{(k)}(a)$ to exist.)
9. Suppose that $f$ is of class $C^k$ on an open interval containing the point $a$, and that $f'(a) = \cdots = f^{(k-1)}(a) = 0$ but $f^{(k)}(a) \ne 0$. Use Corollary 2.60 to show that (i) if $k$ is even, then $f$ has a local maximum or local minimum at $a$ according as $f^{(k)}(a)$ is negative or positive, and (ii) if $k$ is odd, $f$ has neither a maximum nor a minimum at $a$.

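10. Suppose $f$ is of class $C^k$ on an open convex set $S \subset \mathbb{R}^n$ and its $k$th-order derivatives, $\partial^\alpha f$ with $|\alpha| = k$, satisfy
$$|\partial^\alpha f(\mathbf{y}) - \partial^\alpha f(\mathbf{x})| \le C|\mathbf{y} - \mathbf{x}|^\lambda \qquad (\mathbf{x}, \mathbf{y} \in S),$$
where $C$ and $\lambda$ are positive constants (cf. Exercise 1 in §1.8). Use (2.70) to show that there is another positive constant $C'$ such that
$$|R_{\mathbf{a},k}(\mathbf{h})| \le C'|\mathbf{h}|^{k+\lambda} \qquad (\mathbf{a} \in S \text{ and } \mathbf{a} + \mathbf{h} \in S).$$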

2.8 Critical Points

We know from elementary calculus that in studying a differentiable function $f$ of a real variable, it is particularly important to look at the points where the derivative $f'$ vanishes. The same is true for functions of several variables.
Suppose $f$ is a differentiable function on some open set $S \subset \mathbb{R}^n$. The point $\mathbf{a} \in S$ is called a critical point for $f$ if $\nabla f(\mathbf{a}) = 0$. Finding the critical points of $f$ is a matter of solving the $n$ equations $\partial_1f(\mathbf{x}) = 0, \ldots, \partial_nf(\mathbf{x}) = 0$ simultaneously for the $n$ quantities $x_1, \ldots, x_n$.
We say that $f$ has a local maximum (or local minimum) at $\mathbf{a}$ if $f(\mathbf{x}) \le f(\mathbf{a})$ (or $f(\mathbf{x}) \ge f(\mathbf{a})$) for all $\mathbf{x}$ in some neighborhood of $\mathbf{a}$. Just as in the one-variable case, we have:

2.78 Proposition. If $f$ has a local maximum or minimum at $\mathbf{a}$ and $f$ is differentiable at $\mathbf{a}$, then $\nabla f(\mathbf{a}) = 0$.

Proof. If $f$ has a local maximum or minimum at $\mathbf{a}$, then for any unit vector $\mathbf{u}$, the function $g(t) = f(\mathbf{a} + t\mathbf{u})$ has a local maximum or minimum at $t = 0$, so $g'(0) = \partial_{\mathbf{u}}f(\mathbf{a}) = 0$. In particular, $\partial_jf(\mathbf{a}) = 0$ for all $j$, so $\nabla f(\mathbf{a}) = 0$. □

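How can we tell whether a function has a local maximum or minimum (or neither) at a critical point? For functions of one variable we have the second derivative test: If $f$ is of class $C^2$ and $f'(a) = 0$, then $f$ has a local minimum at $a$ if $f''(a) > 0$ and a local maximum if $f''(a) < 0$. (If $f''(a) = 0$, no conclusion can be drawn.) Something similar happens for functions of $n$ variables, but the situation is a good deal more complicated. The full story involves a certain amount of linear algebra; the reader who is content to consider the case of two variables and wishes to skip the linear algebra may proceed directly to Theorem 2.82.
Suppose $f$ is a real-valued function of class $C^2$ on some open set $S \subset \mathbb{R}^n$ and that $f$ has a critical point at $\mathbf{a}$, i.e., $\nabla f(\mathbf{a}) = 0$. Instead of one second derivative to examine at $\mathbf{a}$, we have a whole $n \times n$ matrix of them, called the Hessian of $f$ at $\mathbf{a}$:
$$H = \begin{pmatrix} \partial_1^2f(\mathbf{a}) & \partial_1\partial_2f(\mathbf{a}) & \cdots & \partial_1\partial_nf(\mathbf{a}) \\ \partial_2\partial_1f(\mathbf{a}) & \partial_2^2f(\mathbf{a}) & \cdots & \partial_2\partial_nf(\mathbf{a}) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_n\partial_1f(\mathbf{a}) & \partial_n\partial_2f(\mathbf{a}) & \cdots & \partial_n^2f(\mathbf{a}) \end{pmatrix}, \qquad H_{ij} = \partial_i\partial_jf(\mathbf{a}). \tag{2.79}$$
The equality of mixed partials (Theorem 2.45) guarantees that this is a symmetric matrix, that is, $H_{ij} = H_{ji}$.
By (2.73), the second-order Taylor expansion of $f$ about $\mathbf{a}$ is
$$f(\mathbf{a} + \mathbf{k}) = f(\mathbf{a}) + \sum_{j=1}^{n}\partial_jf(\mathbf{a})k_j + \frac{1}{2}\sum_{i,j=1}^{n}H_{ij}k_ik_j + R_{\mathbf{a},2}(\mathbf{k}).$$
(We use $\mathbf{k}$ rather than $\mathbf{h}$ for the increment in this section to avoid a notational clash with the Hessian $H$.) If $\nabla f(\mathbf{a}) = 0$, the first-order sum vanishes, and the second-order sum is $\frac{1}{2}\sum H_{ij}k_ik_j = \frac{1}{2}H\mathbf{k}\cdot\mathbf{k}$. In short,
$$f(\mathbf{a} + \mathbf{k}) = f(\mathbf{a}) + \tfrac{1}{2}H\mathbf{k}\cdot\mathbf{k} + R_{\mathbf{a},2}(\mathbf{k}). \tag{2.80}$$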

Now we can begin to see how to analyze the behavior of $f$ about $\mathbf{a}$ in terms of the matrix $H$. To start with the simplest situation, suppose it happens that all the mixed partials $\partial_i\partial_jf$ ($i \ne j$) vanish at $\mathbf{a}$. Denoting $\partial_j^2f(\mathbf{a})$ by $\lambda_j$, we then have
$$f(\mathbf{a} + \mathbf{k}) = f(\mathbf{a}) + \frac{1}{2}\sum_{j=1}^{n}\lambda_jk_j^2 + R_{\mathbf{a},2}(\mathbf{k}).$$
Let us neglect the remainder term for the moment. If all $\lambda_j$ are positive, then $\sum\lambda_jk_j^2 > 0$ for all $\mathbf{k} \ne 0$, so $f$ has a local minimum; likewise, if all $\lambda_j$ are negative, then $f$ has a local maximum. If some $\lambda_j$ are positive and some are negative, then $\sum\lambda_jk_j^2$ will be positive for some values of $\mathbf{k}$ and negative for others, so $f$ will have neither a maximum nor a minimum. It's not hard to see that these conclusions remain valid when the remainder term is included; we shall present the details below. Only when some of the $\lambda_j$ are zero is the outcome unclear; it is precisely in this situation that the remainder term plays a significant role.
This is all very well, but the condition that $\partial_i\partial_jf(\mathbf{a}) = 0$ for $i \ne j$ is obviously very special. However, it can always be achieved by a suitable rotation of coordinates, that is, by replacing the standard basis for $\mathbb{R}^n$ with another suitably chosen orthonormal basis. This is the content of the spectral theorem, which says that every symmetric matrix has an orthonormal eigenbasis (see Appendix A, (A.56)-(A.58)). With this result in hand, we arrive at the second-derivative test for functions of several variables.

2.81 Theorem. Suppose $f$ is of class $C^2$ near $\mathbf{a}$ and that $\nabla f(\mathbf{a}) = 0$, and let $H$ be the Hessian matrix (2.79). For $f$ to have a local minimum at $\mathbf{a}$, it is necessary for the eigenvalues of $H$ all to be nonnegative and sufficient for them all to be strictly positive. For $f$ to have a local maximum at $\mathbf{a}$, it is necessary for the eigenvalues of $H$ all to be nonpositive and sufficient for them all to be strictly negative.

Proof. We prove only the first assertion; the argument for the second one is similar. Let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ be an orthonormal eigenbasis for $H$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Our assertion is then that $f$ has a local minimum if all the eigenvalues are (strictly) positive but not if some eigenvalue is negative.
If all eigenvalues are positive, let $l$ be the smallest of them. Writing $\mathbf{k} = c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n$, we have
$$H\mathbf{k}\cdot\mathbf{k} = \sum_{j=1}^{n}\lambda_jc_j^2 \ge l\sum_{j=1}^{n}c_j^2 = l|\mathbf{k}|^2.$$
But when $\mathbf{k}$ is near 0, the error term in (2.80) is less than $\frac{1}{4}l|\mathbf{k}|^2$ in absolute value by Corollary 2.75, so
$$f(\mathbf{a} + \mathbf{k}) - f(\mathbf{a}) \ge \tfrac{1}{2}l|\mathbf{k}|^2 - \tfrac{1}{4}l|\mathbf{k}|^2 > 0 \qquad (\mathbf{k} \ne 0).$$
Thus $f$ has a local minimum. On the other hand, if some eigenvalue, say $\lambda_1$, is negative, the same argument shows that $f(\mathbf{a} + t\mathbf{u}_1) - f(\mathbf{a}) < 0$ for small $t \ne 0$, so $f$ does not have a local minimum. □

In short, if all eigenvalues are positive, then $f$ has a local minimum; if all eigenvalues are negative, then $f$ has a local maximum. If there are two eigenvalues of opposite signs, then $f$ is said to have a saddle point. At a saddle point, $f$ has neither a maximum nor a minimum; its graph goes up in one direction and down in some other direction. The only cases where we can't be sure what's going on are those where all the eigenvalues of $H$ are nonnegative or nonpositive but at least one of them is zero. When that happens, if $\mathbf{k}$ is an eigenvector with eigenvalue 0 (i.e., $\mathbf{k}$ is in the nullspace of $H$), the quadratic term in (2.80) vanishes and the remainder term becomes significant; to determine the behavior of $f$ near $\mathbf{a}$ we need to look at the higher-order terms in the Taylor expansion.

FIGURE 2.5: Left: A local maximum ($z = -x^2 - y^2$). Middle: A saddle point ($z = x^2 - y^2$). Right: A degenerate critical point ($z = x^3 - y^2$).

Some types of critical points are illustrated in Figure 2.5. A critical point for which zero is an eigenvalue of the Hessian matrix $H$ (or equivalently, for which $\det H = 0$, i.e., $H$ is singular) is called degenerate.
In two dimensions it is easy to sort out the various cases:
In two dimensions it is easy to sort out the various cases:

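2.82 Theorem. Suppose $f$ is of class $C^2$ on an open set in $\mathbb{R}^2$ containing the point $\mathbf{a}$, and suppose $\nabla f(\mathbf{a}) = 0$. Let $\alpha = \partial_1^2f(\mathbf{a})$, $\beta = \partial_1\partial_2f(\mathbf{a})$, $\gamma = \partial_2^2f(\mathbf{a})$. Then:
a. If $\alpha\gamma - \beta^2 < 0$, $f$ has a saddle point at $\mathbf{a}$.
b. If $\alpha\gamma - \beta^2 > 0$ and $\alpha > 0$, $f$ has a local minimum at $\mathbf{a}$.
c. If $\alpha\gamma - \beta^2 > 0$ and $\alpha < 0$, $f$ has a local maximum at $\mathbf{a}$.
d. If $\alpha\gamma - \beta^2 = 0$, no conclusion can be drawn.

Proof. The determinant of the Hessian matrix $H = \begin{pmatrix}\alpha & \beta \\ \beta & \gamma\end{pmatrix}$ is $\alpha\gamma - \beta^2$. Since the determinant is the product of the eigenvalues, the two eigenvalues have opposite signs if $\alpha\gamma - \beta^2 < 0$, and they have the same sign if $\alpha\gamma - \beta^2 > 0$. In the latter case, $H$ is positive (or negative) definite when the eigenvalues are positive (or negative), and since $\alpha = H\mathbf{u}\cdot\mathbf{u}$ where $\mathbf{u} = (1, 0)$, these cases occur precisely when $\alpha > 0$ or $\alpha < 0$. The result now follows from Theorem 2.81. □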
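Theorem 2.82 translates directly into a small decision procedure. The sketch below (function names are illustrative) implements it and, for cross-checking against Theorem 2.81, also computes the two eigenvalues of the symmetric matrix $\begin{pmatrix}\alpha & \beta \\ \beta & \gamma\end{pmatrix}$ in closed form.

```python
import math

def classify_critical_point(alpha, beta, gamma):
    """Theorem 2.82, with alpha = d1^2 f(a), beta = d1 d2 f(a), gamma = d2^2 f(a)."""
    det = alpha * gamma - beta**2
    if det < 0:
        return "saddle point"
    if det > 0:
        # det > 0 forces alpha != 0, so the sign of alpha decides (parts b and c).
        return "local minimum" if alpha > 0 else "local maximum"
    return "no conclusion"

def hessian_eigenvalues(alpha, beta, gamma):
    """Eigenvalues of [[alpha, beta], [beta, gamma]], smallest first."""
    mean = (alpha + gamma) / 2
    radius = math.hypot((alpha - gamma) / 2, beta)
    return mean - radius, mean + radius

print(classify_critical_point(-6, -4, -32 / 3), hessian_eigenvalues(-6, -4, -32 / 3))
```

For these entries the classification is a local maximum, and both eigenvalues are negative, as Theorem 2.81 requires.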

Example 1. Find and classify the critical points of the function $f(x,y) = xy(12 - 3x - 4y)$.
Solution. We have
$$\partial_xf = 12y - 6xy - 4y^2 = y(12 - 6x - 4y),$$
$$\partial_yf = 12x - 3x^2 - 8xy = x(12 - 3x - 8y).$$
Thus, if $\partial_xf = 0$ then $y = 0$ or $12 - 6x - 4y = 0$, and if $\partial_yf = 0$ then $x = 0$ or $12 - 3x - 8y = 0$. So there are four possibilities:
$$y = x = 0, \qquad y = 12 - 3x - 8y = 0,$$
$$12 - 6x - 4y = x = 0, \qquad 12 - 6x - 4y = 12 - 3x - 8y = 0.$$
Solving these gives the critical points $(0,0)$, $(4,0)$, $(0,3)$, and $(\frac{4}{3}, 1)$. Since $\partial_x^2f = -6y$, $\partial_y^2f = -8x$, and $\partial_x\partial_yf = 12 - 6x - 8y$, Theorem 2.82 shows that the first three of these are saddle points and the last is a local maximum.
The geometry of this example is quite simple. The set where $f = 0$ is the union of the three lines $x = 0$, $y = 0$, and $3x + 4y = 12$. These lines separate the plane into regions on which $f$ is alternately positive and negative. The three saddle points are the points where these lines intersect, and the local maximum is the "peak" in the middle of the triangle defined by these lines.
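The computations in Example 1 can be replayed with exact rational arithmetic. The sketch below hard-codes the partial derivatives found in the text, confirms that each of the four points is critical, and applies the test of Theorem 2.82; the helper names are hypothetical.

```python
from fractions import Fraction

# Example 1: f(x, y) = x y (12 - 3x - 4y), with the derivatives computed in the text.
def f_x(x, y):
    return y * (12 - 6 * x - 4 * y)

def f_y(x, y):
    return x * (12 - 3 * x - 8 * y)

def second_derivatives(x, y):
    """(f_xx, f_xy, f_yy) = (-6y, 12 - 6x - 8y, -8x)."""
    return -6 * y, 12 - 6 * x - 8 * y, -8 * x

def classify(alpha, beta, gamma):
    """Test of Theorem 2.82."""
    det = alpha * gamma - beta**2
    if det < 0:
        return "saddle point"
    if det > 0:
        return "local minimum" if alpha > 0 else "local maximum"
    return "no conclusion"

critical_points = [(Fraction(0), Fraction(0)), (Fraction(4), Fraction(0)),
                   (Fraction(0), Fraction(3)), (Fraction(4, 3), Fraction(1))]
for x, y in critical_points:
    assert f_x(x, y) == 0 and f_y(x, y) == 0   # each point really is critical
    print(f"({x}, {y})", classify(*second_derivatives(x, y)))
```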

Example 2. Find and classify the critical points of the function $f(x,y) = y^3 - 3x^2y$.
Solution. We have $\partial_xf = -6xy$ and $\partial_yf = 3y^2 - 3x^2$. Thus, if $\partial_xf = 0$, then either $x = 0$ or $y = 0$, and the equation $\partial_yf = 0$ then forces $x = y = 0$. So $(0,0)$ is the only critical point. The reader may readily verify that all the second derivatives of $f$ vanish at $(0,0)$, so Theorem 2.82 is of no use. But since $f(x,y) = y(y - \sqrt{3}\,x)(y + \sqrt{3}\,x)$, the lines $y = 0$ and $y = \pm\sqrt{3}\,x$ separate the plane into six regions on which $f$ is alternately positive and negative, and these regions all meet at the origin. Thus $f$ has neither a maximum nor a minimum at the origin. This configuration is called a "monkey saddle." (The three regions where $f < 0$ provide places for the two legs and tail of a monkey sitting on the graph of $f$ at the origin.)
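The six-region picture of the monkey saddle can also be seen numerically: going once around a small circle centered at the origin, $f$ should change sign six times. The sketch below counts these sign changes; the radius and number of sample points are arbitrary illustrative choices.

```python
import math

def f(x, y):
    """Monkey saddle from Example 2."""
    return y**3 - 3 * x**2 * y

def sign_changes_on_circle(radius=1e-2, samples=360):
    """Count sign changes of f around a circle of the given radius about the origin."""
    # Offset by half a step so that no sample lands exactly on a zero line of f.
    angles = [2 * math.pi * (i + 0.5) / samples for i in range(samples)]
    signs = [f(radius * math.cos(t), radius * math.sin(t)) > 0 for t in angles]
    # signs[i - 1] with i = 0 wraps around to the last sample, closing the loop.
    return sum(signs[i] != signs[i - 1] for i in range(samples))

print(sign_changes_on_circle())   # 6
```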

EXERCISES

1. Find all the critical points of the following functions. Tell whether each nondegenerate critical point is a local maximum, local minimum, or saddle point. If possible, tell whether the degenerate critical points are local extrema too.
a. $f(x,y) = x^2 + 3y^4 + 4y^3 - 12y^2$.
b. $f(x,y) = x^4 - 2x^2 + y^3 - 6y$.
c. $f(x,y) = (x - 1)(x^2 - y^2)$.
d. $f(x,y) = x^2y^2(2 - x - y)$.
e. $f(x,y) = (2x^2 + y^2)e^{-x^2-y^2}$.
f. $f(x,y) = ax^{-1} + by^{-1} + xy$, $a, b \ne 0$. (The nature of the critical point depends on the signs of $a$ and $b$.)
g. $f(x,y,z) = x^3 - 3x - y^3 + 9y + z^2$.
h. $f(x,y,z) = (3x^2 + 2y^2 + z^2)e^{-x^2-y^2-z^2}$.
i. $f(x,y,z) = xyz(4 - x - y - z)$.
2. What are the conditions on $a$, $b$, $c$ for $f(x,y) = ax^2 + bxy + cy^2$ to have a minimum, maximum, or saddle point at the origin?
3. The origin is a degenerate critical point of the functions $f_1(x,y) = x^2 + y^4$, $f_2(x,y) = x^2 - y^4$, and $f_3(x,y) = x^2 + y^3$. Describe the graphs of these three functions near the origin. Is the origin a local extremum for any of them?
4. Let $f(x,y) = (y - x^2)(y - 2x^2)$.
a. Show that the origin is a degenerate critical point of $f$.
b. Show that the restriction of $f$ to any line through the origin (i.e., the function $g(t) = f(at, bt)$ for any $(a,b) \ne (0,0)$) has a local minimum at the origin, but $f$ does not have a local minimum at the origin. (Hint: Consider the regions where $f > 0$ or $f < 0$.)
5. Let $H$ be the Hessian of $f$. Show that for any unit vector $\mathbf{u}$, $H\mathbf{u}\cdot\mathbf{u}$ is the second directional derivative of $f$ in the direction $\mathbf{u}$.

2.9 Extreme Value Problems

In the previous section we studied the critical points of a differentiable function, which include its local maxima and minima. In this section we consider the problem of finding the absolute maximum or minimum of a differentiable function on a set $S \subset \mathbb{R}^n$, which has a somewhat different flavor.
The fundamental theoretical fact that underlies this study is the extreme value theorem (1.23), whose statement we now recall: If $S$ is a compact subset of $\mathbb{R}^n$ and $f$ is a continuous function on $S$, then $f$ assumes a minimum and a maximum value on $S$; that is, there are points $\mathbf{a}, \mathbf{b} \in S$ such that $f(\mathbf{a}) \le f(\mathbf{x}) \le f(\mathbf{b})$ for all $\mathbf{x} \in S$. As the examples that we presented in §1.6 show, the conclusion is generally invalid if $S$ fails to be both closed and bounded. Accordingly, we shall assume throughout this section that $S$ is closed, but we shall include some discussion of the situation when $S$ is unbounded. Moreover, to keep the problem within the realm of calculus, we shall assume that $S$ is either (i) the closure of an open set with a smooth or piecewise smooth boundary, or (ii) a smooth submanifold, such as a curve or surface, defined by one or more constraint equations. (These geometric notions will be studied in more detail in Chapter 3.)
Suppose, to begin with, that $S$ is the closure of an open set in $\mathbb{R}^n$, and that we wish to find the absolute maximum or minimum of a differentiable function $f$ on $S$. We assume that the boundary of $S$ is a smooth submanifold (a curve if $n = 2$, a surface if $n = 3$) that can be described as the level set of a differentiable function $G$, or that it is the union of a finite number of pieces of this form. (For example, if $S$ is a cube, its boundary is the union of six faces, each of which is a region in a smooth surface, viz., a plane.) If $S$ is bounded, the extreme values are guaranteed to exist, and we can proceed as follows.

i. If an extreme value occurs at an interior point of $S$, that point must be a critical point of $f$. So, we find all the critical points of $f$ inside $S$ and compute the values of $f$ at these points.

ii. To find candidates for extreme values on the boundary, we can apply the techniques for solving extremal problems with constraints presented below.

iii. Finally, we pick the smallest and largest of the values of $f$ at the points found in steps (i) and (ii); these will be the minimum and maximum of $f$ on $S$. There is usually no need to worry about the second derivative test in this situation.

If $S$ is unbounded, the procedure is the same, but we must add an extra argument to show that the desired extremum actually exists. This must be done on a case-by-case basis, as there is no general procedure available; however, here are a couple of simple results that cover many situations in practice and illustrate the sort of reasoning that must be employed.

2.83 Theorem. Let $f$ be a continuous function on an unbounded closed set $S \subset \mathbb{R}^n$.

a. If $f(\mathbf{x}) \to +\infty$ as $|\mathbf{x}| \to \infty$ ($\mathbf{x} \in S$), then $f$ has an absolute minimum but no absolute maximum on $S$.
b. If $f(\mathbf{x}) \to 0$ as $|\mathbf{x}| \to \infty$ ($\mathbf{x} \in S$) and there is a point $\mathbf{x}_0 \in S$ where $f(\mathbf{x}_0) > 0$ (resp. $f(\mathbf{x}_0) < 0$), then $f$ has an absolute maximum (resp. minimum) on $S$.

Proof. (a) If $f(\mathbf{x}) \to \infty$ as $|\mathbf{x}| \to \infty$, then clearly $f$ has no maximum. On the other hand, pick a point $\mathbf{x}_0 \in S$ and let $V = \{\mathbf{x} \in S : f(\mathbf{x}) \le f(\mathbf{x}_0)\}$. Then $V$ is closed (by Theorem 1.13) and bounded (since $f(\mathbf{x}) > f(\mathbf{x}_0)$ when $|\mathbf{x}|$ is large). By the extreme value theorem, $f$ has a minimum on $V$, say at $\mathbf{a} \in V$. But then $f(\mathbf{a})$ is the absolute minimum of $f$ on $S$ because $f(\mathbf{x}) > f(\mathbf{x}_0) \ge f(\mathbf{a})$ for $\mathbf{x} \in S \setminus V$.
The proof of (b) is similar. If $f(\mathbf{x}_0) > 0$, let $V = \{\mathbf{x} \in S : f(\mathbf{x}) \ge f(\mathbf{x}_0)\}$. Then $V$ is closed (by Theorem 1.13) and bounded (since $f(\mathbf{x}) \to 0$ as $|\mathbf{x}| \to \infty$). By the extreme value theorem, $f$ has a maximum on $V$, say at $\mathbf{a} \in V$. But then $f(\mathbf{a})$ is the absolute maximum of $f$ on $S$ because $f(\mathbf{x}) < f(\mathbf{x}_0) \le f(\mathbf{a})$ for $\mathbf{x} \in S \setminus V$. □

EXAMPLE 1. Find the absolute maximum and minimum values of the function
$$f(x, y) = \frac{x}{x^2 + (y-1)^2 + 4}$$
on the first quadrant $S = \{(x,y) : x, y \ge 0\}$.

Solution. Clearly $f(x,y) \ge 0$ for $x, y \ge 0$, and $f(0,y) = 0$, so the minimum is zero, achieved at all points on the $y$-axis. Moreover, for $x > 0$, $f(x, y)$ is less than the smaller of $x^{-1}$ and $\frac{1}{2}|y-1|^{-1}$ (the latter because $x^2 + (y-1)^2 \ge 2x|y-1|$), so it vanishes as $|(x, y)| \to \infty$. Hence, by Theorem 2.83, $f$ has a maximum on $S$, which must occur either in the interior of $S$ or on the positive $x$-axis. A short calculation that we leave to the reader shows that the only critical point of $f$ in $S$ is at $(2,1)$, and $f(2,1) = \frac{1}{4}$. Also, $f(x,0) = x/(x^2 + 5)$, and the critical points of this function of one variable are at $x = \pm\sqrt{5}$. Only $x = \sqrt{5}$ is relevant for our purposes, and $f(\sqrt{5}, 0) = \sqrt{5}/10$, which is a bit less than $\frac{1}{4}$. Thus the maximum value of $f$ on $S$ is $\frac{1}{4}$.
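
As a numerical sanity check (not part of the original argument), the following Python sketch evaluates $f$ on a grid over the truncated window $[0,10]\times[0,10]$ with step $0.02$. The window and step are assumptions chosen for illustration, justified only informally by the decay of $f$ noted above; a grid search proves nothing, but it should land near $(2,1)$ with a value near $\frac{1}{4}$.

```python
import math


def f(x, y):
    """f(x, y) = x / (x^2 + (y - 1)^2 + 4) from Example 1."""
    return x / (x * x + (y - 1) ** 2 + 4)


def grid_max(func, x_max, y_max, step):
    """Brute-force search for the largest value of func on [0, x_max] x [0, y_max]."""
    best = (-math.inf, None, None)
    nx = int(round(x_max / step))
    ny = int(round(y_max / step))
    for i in range(nx + 1):
        x = i * step
        for j in range(ny + 1):
            y = j * step
            v = func(x, y)
            if v > best[0]:
                best = (v, x, y)
    return best


value, x_star, y_star = grid_max(f, 10.0, 10.0, 0.02)
print(value, x_star, y_star)                  # should be close to 0.25 at (2, 1)
print(f(math.sqrt(5), 0), math.sqrt(5) / 10)  # the boundary candidate on the x-axis
```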

Let us turn to the study of extremum problems with constraints. To be precise, we consider the following situation: We wish to minimize or maximize a differentiable function $f$ on the set

$$S = \{\mathbf{x} : G(\mathbf{x}) = 0\},$$

where $G$ is of class $C^1$ and $\nabla G(\mathbf{x}) \ne \mathbf{0}$ on $S$. (The latter assumption guarantees that the set $S$ is smooth in the sense that it possesses a tangent (hyper)plane at every point $\mathbf{a} \in S$, namely, the (hyper)plane through $\mathbf{a}$ that is perpendicular to the vector $\nabla G(\mathbf{a})$; see Theorem 2.37 and §§3.3-4.) Most applied max-min problems are of this sort, including the ones one first meets in freshman calculus, for example, "Find the maximum area of a rectangle with a given perimeter $P$," i.e., maximize $xy$ subject to the constraint $2x + 2y = P$.
There are several methods for attacking such a problem. The most obvious one is to solve the constraint equation $G(\mathbf{x}) = 0$ for one of the variables, either explicitly or implicitly, and thus reduce the problem to finding the critical points of a function of the remaining $n - 1$ variables. (Of course, this is what one always does in freshman calculus.) Another possibility is to describe the set $S$ parametrically and thus obtain an $(n-1)$-variable problem with the parameters as independent variables. This is particularly effective when $S$ is a closed curve or surface such as a circle or sphere that cannot be described in its entirety as the graph of a function.
There is yet another method, however, which may be derived from the following considerations. Suppose that $f$, as a function on the set $S = \{\mathbf{x} : G(\mathbf{x}) = 0\}$, has a local extremum at $\mathbf{x} = \mathbf{a}$. If $\mathbf{x} = \mathbf{h}(t)$ is a curve on $S$ passing through $\mathbf{a}$ at $t = 0$, the composite function $\varphi(t) = f(\mathbf{h}(t))$ has a local extremum at $t = 0$, so $\nabla f(\mathbf{a}) \cdot \mathbf{h}'(0) = \varphi'(0) = 0$. Thus, $\nabla f(\mathbf{a})$ is orthogonal to the tangent vector to every curve on $S$ passing through $\mathbf{a}$; in other words, $\nabla f(\mathbf{a})$ is normal to $S$ at $\mathbf{a}$. But we already know that $\nabla G(\mathbf{a})$ is normal to $S$ at $\mathbf{a}$ since $S$ is a level set of $G$. It follows that $\nabla f$ is proportional to $\nabla G$ at $\mathbf{a}$:

$$\nabla f(\mathbf{a}) = \lambda \nabla G(\mathbf{a}) \quad \text{for some } \lambda \in \mathbb{R}.$$

This is the key to the method. The $n$ equations $\partial_j f = \lambda\,\partial_j G$ together with the constraint equation $G = 0$ give $n+1$ equations in the $n+1$ variables $x_1, \ldots, x_n$ and $\lambda$, and solving them simultaneously will locate the candidates for local extrema of $f$ on $S$. (It will also produce the appropriate values of $\lambda$, which are usually not of much interest, although one may have to find them in the process of solving for the $x_j$'s.) This method is called Lagrange's method, and the parameter $\lambda$ is called the Lagrange multiplier for the problem.
The other methods described above involve reducing the original $n$-variable problem to an $(n-1)$-variable problem, whereas Lagrange's method deals directly with the original $n$ variables. This may be advantageous when the reduction is awkward or when it would involve breaking some symmetry of the original problem. The disadvantage is that, whereas the other methods lead to solving $n - 1$ equations in $n - 1$ variables, Lagrange's method requires solving $n + 1$ equations in $n + 1$ variables.

Example 2. Let's try out Lagrange's method on the simple problem of maximizing the area of a rectangle with perimeter $P$. Here $f(x, y) = xy$ and $G(x,y) = 2x + 2y - P$, so the equations $\partial_x f = \lambda\,\partial_x G$, $\partial_y f = \lambda\,\partial_y G$, and $G = 0$ become

$$y = 2\lambda, \qquad x = 2\lambda, \qquad 2x + 2y = P.$$

The first two equations give $y = x$; substituting into the third equation shows that $x = y = \frac{1}{4}P$, so the maximum of $f$ is $\frac{1}{16}P^2$. (Note that the only relevant values of $x$ and $y$ are $0 \le x, y \le \frac{1}{2}P$, so we're working on a compact set and the existence of the maximum is not in question. The minimum on this set, namely 0, is achieved when $x = 0$, $y = \frac{1}{2}P$, or vice versa.)
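
Lagrange's method reduces the problem to a system of $n + 1$ equations, which can also be solved numerically. The sketch below is a hedged illustration, not a method from the text: it applies Newton's method with a forward-difference Jacobian to the three equations of Example 2, assuming a sample perimeter $P = 10$ and an arbitrary starting guess. All helper names are hypothetical.

```python
def lagrange_system(v, P):
    """Residuals for f = x*y, G = 2x + 2y - P; unknowns v = (x, y, lam)."""
    x, y, lam = v
    return [y - 2.0 * lam,          # d_x f - lam * d_x G
            x - 2.0 * lam,          # d_y f - lam * d_y G
            2.0 * x + 2.0 * y - P]  # the constraint G = 0


def jacobian_fd(F, v, h=1e-7):
    """Forward-difference Jacobian: entry [i][k] approximates dF_i/dv_k."""
    base = F(v)
    J = [[0.0] * len(v) for _ in base]
    for k in range(len(v)):
        w = list(v)
        w[k] += h
        shifted = F(w)
        for i in range(len(base)):
            J[i][k] = (shifted[i] - base[i]) / h
    return J


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for col in range(c, n + 1):
                M[r][col] -= factor * M[c][col]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][col] * sol[col] for col in range(r + 1, n))
        sol[r] = (M[r][n] - tail) / M[r][r]
    return sol


def newton(F, v0, tol=1e-12, max_iter=50):
    """Newton's method for F(v) = 0 starting from v0."""
    v = list(v0)
    for _ in range(max_iter):
        r = F(v)
        if max(abs(t) for t in r) < tol:
            break
        step = solve_linear(jacobian_fd(F, v), [-t for t in r])
        v = [a + s for a, s in zip(v, step)]
    return v


P = 10.0  # sample perimeter, an assumption for illustration
x, y, lam = newton(lambda v: lagrange_system(v, P), [1.0, 2.0, 0.5])
print(x, y, lam, x * y)  # approximately P/4, P/4, P/8 and P**2/16
```

Because this particular system is linear, Newton's method converges almost at once. For nonlinear constraints the iteration may diverge or reach a different critical point depending on the starting guess, and each solution is only a candidate to be compared as in step (iii).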

Example 3. Find the absolute maximum and minimum of $f(x, y) = x^2 + y^2 + y$ on the disc $x^2 + y^2 \le 1$.
Solution. We have $f_x = 2x$, $f_y = 2y + 1$. Thus the only critical point is at $(0, -\frac{1}{2})$ (which lies in the disc), at which $f = -\frac{1}{4}$. To see what happens on the boundary, we can use Lagrange's method with $G(x, y) = x^2 + y^2 - 1$. We have to solve

$$2x = 2\lambda x, \qquad 2y + 1 = 2\lambda y, \qquad x^2 + y^2 = 1.$$

The first equation implies that either $x = 0$ or $\lambda = 1$. The latter alternative is impossible since the equation $2y + 1 = 2y$ has no solutions, so $x = 0$ and then $y = \pm 1$ (since $x^2 + y^2 = 1$). We have $f(0,1) = 2$, $f(0,-1) = 0$. So the absolute maximum is 2 (at $(0,1)$) and the absolute minimum is $-\frac{1}{4}$ (at $(0,-\frac{1}{2})$).
We could also analyze $f$ on the boundary by parametrizing the latter as $x = \cos\theta$, $y = \sin\theta$. Then $f(\cos\theta, \sin\theta) = 1 + \sin\theta$, which has a maximum value of 2 at $\theta = \pi/2$ and a minimum value of 0 at $\theta = 3\pi/2$.
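
Steps (i)-(iii) for Example 3 can be mirrored in a few lines of Python. This is a hedged sketch: the candidate list is copied from the solution above, and the polar sampling grid (201 radii, 720 angles) is an arbitrary choice used only as an independent check.

```python
import math


def f(x, y):
    return x * x + y * y + y


# Steps (i) and (ii): the candidates found analytically in Example 3.
candidates = [(0.0, -0.5),  # interior critical point
              (0.0, 1.0),   # boundary point from Lagrange's method
              (0.0, -1.0)]  # boundary point from Lagrange's method
values = {p: f(*p) for p in candidates}

# Step (iii): pick the smallest and largest candidate values.
p_min = min(values, key=values.get)
p_max = max(values, key=values.get)
print(p_min, values[p_min], p_max, values[p_max])

# Independent check: sample the closed disc on a polar grid.
samples = [f(r * math.cos(t), r * math.sin(t))
           for r in [k / 200 for k in range(201)]
           for t in [2 * math.pi * k / 720 for k in range(720)]]
print(min(samples), max(samples))
```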

Similar ideas work when there is more than one constraint equation. Let's consider the case of two equations:

$$S = \{\mathbf{x} : G_1(\mathbf{x}) = G_2(\mathbf{x}) = 0\}.$$

Here $G_1$ and $G_2$ are differentiable functions (the subscripts are labels for the functions, not partial derivatives), and we assume that the vectors $\nabla G_1(\mathbf{x})$ and $\nabla G_2(\mathbf{x})$ are linearly independent for $\mathbf{x} \in S$. (Again, this guarantees that $S$ is a "smooth" set, as we shall see in Chapter 3.) To find the extreme values of a differentiable function $f$ on $S$, we have three methods:

• Solve the equations $G_1(\mathbf{x}) = G_2(\mathbf{x}) = 0$ for two of the variables and find the critical points of the resulting function of the remaining $n - 2$ variables.

• Find a parametrization of the set $S$ in terms of parameters $t_1, \ldots, t_{n-2}$, and find the critical points of $f$ as a function of these variables.

• (Lagrange's method) At a local extremum, $\nabla f$ must be normal to $S$ and hence must be a linear combination of $\nabla G_1$ and $\nabla G_2$:

$$\nabla f = \lambda \nabla G_1 + \mu \nabla G_2 \quad \text{for some } \lambda, \mu \in \mathbb{R}.$$

The $n$ equations $\partial_j f = \lambda\,\partial_j G_1 + \mu\,\partial_j G_2$ together with the two constraint equations $G_1 = G_2 = 0$ can be solved for the $n + 2$ variables $x_1, \ldots, x_n$, $\lambda$, and $\mu$, yielding the points where local extrema can occur.

The generalization to k constraint equations should now be pretty clear.
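
Written out by analogy with the two-constraint case (a sketch of the pattern rather than a statement taken from the text), the Lagrange system for constraints $G_1, \ldots, G_k$ whose gradients are linearly independent on $S$ reads as follows; it consists of $n + k$ equations in the $n + k$ unknowns $x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k$.

```latex
\partial_j f(\mathbf{x}) = \sum_{i=1}^{k} \lambda_i \, \partial_j G_i(\mathbf{x}) \quad (j = 1, \ldots, n),
\qquad G_1(\mathbf{x}) = \cdots = G_k(\mathbf{x}) = 0.
```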

EXERCISES

1. Find the extreme values of $f(x,y) = 2x^2 + y^2 + 2x$ on the set $\{(x,y) : x^2 + y^2 \le 1\}$.
2. Find the extreme values of $f(x,y) = 3x^2 - 2y^2 + 2y$ on the set $\{(x,y) : x^2 + y^2 \le 1\}$.
3. Find the extreme values of $f(x, y) = x^3 - x + y^2 - 2y$ on the closed triangular region with vertices at $(-1, 0)$, $(1, 0)$, and $(0, 2)$.
4. Find the extreme values of $f(x, y) = 3x^2 - 8xy - 4y^2 + 2x + 16y$ on the set $\{(x,y) : 0 \le x \le 4,\ 0 \le y \le 3\}$.
5. Let $f(x, y) = (a - bx - cy)^2 + x^2 + y^2$, where $a, b, c$ are positive constants. Show that $f$ has an absolute minimum on $\mathbb{R}^2$ and find it.
6. Show that $f(x, y) = (x^2 + 2y^2)e^{-x^2-y^2}$ has an absolute minimum and maximum on $\mathbb{R}^2$, and find them.
7. Show that $f(x, y) = (x^2 - 2y^2)e^{-x^2-y^2}$ has an absolute minimum and maximum on $\mathbb{R}^2$, and find them.
8. Let $f(x,y) = xy + 3x^{-1} + 4y^{-1}$. Show that $f$ has a minimum but no maximum on the set $\{(x, y) : x, y > 0\}$, and find the minimum.
9. Find the extreme values of $f(x, y, z) = x^2 + 2y^2 + 3z^2$ on the unit sphere $\{(x,y,z) : x^2 + y^2 + z^2 = 1\}$.
10. Let $(x_1, y_1), \ldots, (x_k, y_k)$ be points in the plane whose $x$-coordinates are not all equal. The linear function $f(x) = ax + b$ such that the sum of the squares of the vertical distances from the given points to the line $y = ax + b$ (namely, $\sum_{j=1}^{k} (y_j - ax_j - b)^2$) is minimized is called the linear least-squares fit to the points $(x_j, y_j)$. Show that it is given by

$$a = \frac{k^{-1}\sum_{j=1}^{k} x_j y_j - \bar{x}\bar{y}}{k^{-1}\sum_{j=1}^{k} x_j^2 - \bar{x}^2}, \qquad b = \bar{y} - a\bar{x},$$

where $\bar{x} = k^{-1}\sum_j x_j$ and $\bar{y} = k^{-1}\sum_j y_j$ are the averages of the $x_j$'s and $y_j$'s.
11. Let $x, y, z$ be positive variables and $a, b, c$ positive constants. Find the minimum of $x + y + z$ subject to the constraint $(a/x) + (b/y) + (c/z) = 1$.
12. Find the minimum possible value of the sum of the three linear dimensions
(length, breadth, and width) of a rectangular box whose volume is a given
constant V. Is there a maximum possible value?

13. Find the point on the line through (1,0,0) and (0,1,0) that is closest to the
line through (0,0,0) and (1,1,1). (Hint: Minimize the square of the distance.)

14. Find the maximum possible volume of a rectangular solid if the sum of the
areas of the bottom and the four vertical sides is a constant A, and find the
dimensions of the box that has the maximum volume.

15. The two planes $x + z = 4$ and $3x - y = 6$ intersect in a line $L$. Use Lagrange's method to find the point on $L$ that is closest to the origin. (Hint: Minimize the square of the distance.)

16. Find the maximum value of $(xv - yu)^2$ subject to the constraints $x^2 + y^2 = a^2$ and $u^2 + v^2 = b^2$. Do this (a) by Lagrange's method, (b) by the parametrization $x = a\cos\theta$, $y = a\sin\theta$, $u = b\cos\varphi$, $v = b\sin\varphi$.

17. Let $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ be two points in the plane such that $x_1 \ne x_2$ and $y_1 > 0 > y_2$. A particle travels in a straight line from $P_1$ to a point $Q$ on the $x$-axis with speed $v_1$, then in a straight line from $Q$ to $P_2$ with speed $v_2$. The point $Q$ is allowed to vary. Use Lagrange's method to show that the total travel time from $P_1$ to $P_2$ is minimized when $(\sin\theta_1)/(\sin\theta_2) = v_1/v_2$, where $\theta_1$ (resp. $\theta_2$) is the angle between the line $P_1Q$ (resp. $QP_2$) and the vertical line through $Q$. (Hint: Take $\theta_1, \theta_2$ as the independent variables.)

18. Let $x_1, x_2, \ldots, x_n$ denote nonnegative numbers. For $c > 0$, maximize the product $x_1x_2\cdots x_n$ subject to the constraint $x_1 + x_2 + \cdots + x_n = c$, and hence derive the inequality of geometric and arithmetic means,

$$(x_1x_2\cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (x_1, \ldots, x_n \ge 0),$$

where equality holds if and only if the $x_j$'s are all equal.

19. Let $A$ be a symmetric $n \times n$ matrix, and let $f(\mathbf{x}) = (A\mathbf{x}) \cdot \mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$. Show that the maximum and minimum of $f$ on the unit sphere $\{\mathbf{x} : |\mathbf{x}| = 1\}$ are the largest and smallest eigenvalues of $A$.

2.10 Vector-Valued Functions and Their Derivatives

So far our focus has been on real-valued functions on $\mathbb{R}^n$, that is, mappings from $\mathbb{R}^n$ to $\mathbb{R}$. In a number of situations, however, it is useful to consider vector-valued functions, that is, mappings (or maps, for short) from $\mathbb{R}^n$ to $\mathbb{R}^m$ where $n$ and $m$ are any positive integers. We shall denote such functions or mappings by boldface letters such as $\mathbf{f}$:

$$\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})).$$

Examples of the uses of such mappings include the following:

• Functions from $\mathbb{R}$ to $\mathbb{R}^m$ can be interpreted as parametrized curves in $\mathbb{R}^m$. Similarly, maps from $\mathbb{R}^2$ to $\mathbb{R}^m$ give parametrizations of 2-dimensional surfaces in $\mathbb{R}^m$, and so forth.

• In the situation of the chain rule, where $w$ is a function of $x_1, \ldots, x_n$ and the $x_j$'s are functions of other variables $t_1, \ldots, t_k$, we are dealing with a map $\mathbf{x} = \mathbf{g}(\mathbf{t})$ from $\mathbb{R}^k$ to $\mathbb{R}^n$.

• A map $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ can represent a vector field, that is, a map that assigns to each point $\mathbf{x}$ a vector quantity $\mathbf{f}(\mathbf{x})$ such as a force or a magnetic field.

• A map $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ can represent a transformation of a region of space obtained by applying geometric operations such as dilations and rotations. For example, under the transformation $\mathbf{f}(\mathbf{x}) = 2\mathbf{x} + \mathbf{a}$, a region in $\mathbb{R}^n$ is expanded by a factor of 2 and then moved over by the amount $\mathbf{a}$.

• A map $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ can represent the transformation from one coordinate system to another, for example, the polar coordinate map $\mathbf{f}(r, \theta) = (r\cos\theta, r\sin\theta)$.

We shall have more to say about all of these interpretations in Chapter 3.

The simplest mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$ are the linear² ones, that is, maps $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ that satisfy

$$\mathbf{f}(a\mathbf{x} + b\mathbf{y}) = a\mathbf{f}(\mathbf{x}) + b\mathbf{f}(\mathbf{y}) \qquad (a, b \in \mathbb{R},\ \mathbf{x}, \mathbf{y} \in \mathbb{R}^n).$$

Such a map is represented by an $m \times n$ matrix $A = (A_{jk})$, in such a way that if elements of $\mathbb{R}^n$ and $\mathbb{R}^m$ are represented as column vectors, $\mathbf{f}(\mathbf{x})$ is just the matrix product $A\mathbf{x}$. In other words,

$$f_j(\mathbf{x}) = \sum_{k=1}^{n} A_{jk} x_k \qquad (j = 1, \ldots, m).$$

You can see that the study of mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$ is complicated, as the study of the linear ones already constitutes the subject of linear algebra! However, the basic ideas of differential calculus generalize easily from the scalar case. The only bits of linear algebra we need for present purposes are the correspondence between linear maps and matrices, the notion of addition and multiplication of matrices, and the notion of determinant; see Appendix A, (A.3)-(A.15) and (A.24)-(A.33).

² Here we use the word "linear" in the more restrictive sense; see Appendix A, (A.5).

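A mapping $\mathbf{f}$ from an open set $S \subset \mathbb{R}^n$ into $\mathbb{R}^m$ is said to be differentiable at $\mathbf{a} \in S$ if there is an $m \times n$ matrix $L$ such that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{|\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{f}(\mathbf{a}) - L\mathbf{h}|}{|\mathbf{h}|} = 0. \tag{2.84}$$

There can only be one such matrix $L$ (the reason is given in the next paragraph), and it is called the (Fréchet) derivative of $\mathbf{f}$ at $\mathbf{a}$. Commonly used notations for it include $D\mathbf{f}(\mathbf{a})$, $D_{\mathbf{a}}\mathbf{f}$, $\mathbf{f}'(\mathbf{a})$, and $d\mathbf{f}_{\mathbf{a}}$. We shall denote it by $D\mathbf{f}(\mathbf{a})$. Thus, if $\mathbf{f}$ is differentiable on $S$, the map $D\mathbf{f}$ that assigns to each $\mathbf{a} \in S$ the derivative $D\mathbf{f}(\mathbf{a})$ is a matrix-valued function on $S$.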
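
Definition (2.84) can be explored numerically. The sketch below assumes the polar coordinate map as a test case (an illustrative choice, not from the text). It estimates $D\mathbf{f}(\mathbf{a})$ column by column with central differences, compares the result with the hand-computed matrix, and prints the quotient in (2.84) for shrinking $\mathbf{h}$; for a smooth map that quotient should decrease roughly in proportion to $|\mathbf{h}|$.

```python
import math


def f(v):
    """Polar coordinate map (r, theta) -> (r cos theta, r sin theta)."""
    r, th = v
    return [r * math.cos(th), r * math.sin(th)]


def Df_exact(v):
    """Hand-computed derivative matrix of the polar map (rows = components)."""
    r, th = v
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th), r * math.cos(th)]]


def norm(w):
    return math.sqrt(sum(t * t for t in w))


def jacobian_central(F, a, h=1e-6):
    """Central-difference estimate of DF(a); entry [j][k] approximates d_k F_j(a)."""
    m = len(F(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        Fp, Fm = F(plus), F(minus)
        for j in range(m):
            J[j][k] = (Fp[j] - Fm[j]) / (2 * h)
    return J


def quotient_284(F, L, a, h):
    """The quantity |F(a + h) - F(a) - L h| / |h| appearing in (2.84)."""
    a_plus_h = [ai + hi for ai, hi in zip(a, h)]
    Lh = [sum(Ljk * hk for Ljk, hk in zip(row, h)) for row in L]
    err = [p - q - s for p, q, s in zip(F(a_plus_h), F(a), Lh)]
    return norm(err) / norm(h)


a = [2.0, 0.7]
L = Df_exact(a)
J = jacobian_central(f, a)
print(max(abs(J[j][k] - L[j][k]) for j in range(2) for k in range(2)))  # tiny
for t in (1e-1, 1e-2, 1e-3):
    print(t, quotient_284(f, L, a, [t, t]))  # shrinks as t shrinks
```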
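We need to verify that there is at most one matrix $L$ satisfying (2.84). If $L'$ is another such matrix, we have

$$|L\mathbf{h} - L'\mathbf{h}| = \big|[\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L'\mathbf{h}] - [\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L\mathbf{h}]\big| \le |\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L'\mathbf{h}| + |\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - L\mathbf{h}|,$$

so that $|L\mathbf{h} - L'\mathbf{h}|/|\mathbf{h}| \to 0$. But if $L' \ne L$, we can pick a unit vector $\mathbf{u}$ with $L\mathbf{u} \ne L'\mathbf{u}$. Setting $\mathbf{h} = s\mathbf{u}$, we have $\mathbf{h} \to \mathbf{0}$ as $s \to 0$, but

$$\frac{|L\mathbf{h} - L'\mathbf{h}|}{|\mathbf{h}|} = \frac{|s(L\mathbf{u} - L'\mathbf{u})|}{|s\mathbf{u}|} = |L\mathbf{u} - L'\mathbf{u}| \ne 0.$$

This is a contradiction, so $L' = L$.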
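In the scalar case $m = 1$ (where $\mathbf{f} = f$), the definition of differentiability above coincides with the old one, and $Df(\mathbf{a})$ is just $\nabla f(\mathbf{a})$, considered as a row vector, i.e., a $1 \times n$ matrix. (If we think of $\nabla f(\mathbf{a})$ as a column vector, then $Df(\mathbf{a}) = [\nabla f(\mathbf{a})]^t$.) Something similar happens when $m > 1$. Indeed, a vector $\mathbf{v}$ approaches the vector $\mathbf{0}$ precisely when each of its components approaches the number 0, so (2.84) is equivalent to the equations

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{f_j(\mathbf{a} + \mathbf{h}) - f_j(\mathbf{a}) - \mathbf{l}_j \cdot \mathbf{h}}{|\mathbf{h}|} = 0 \qquad (j = 1, \ldots, m),$$

where $\mathbf{l}_j$ is the $j$th row of the matrix $L$. But these equations say that the components $f_j$ are differentiable at $\mathbf{x} = \mathbf{a}$ and that $\nabla f_j(\mathbf{a}) = \mathbf{l}_j$. In short, we have: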

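2.85 Proposition. An $\mathbb{R}^m$-valued function $\mathbf{f}$ is differentiable at $\mathbf{a}$ precisely when each of its components $f_1, \ldots, f_m$ is differentiable at $\mathbf{a}$. In this case, $D\mathbf{f}(\mathbf{a})$ is the matrix whose $j$th row is the row vector $\nabla f_j(\mathbf{a})$. In other words,

$$D\mathbf{f} = \begin{pmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & & \vdots \\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n \end{pmatrix}.$$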

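The general form of the chain rule can now be stated very simply:

2.86 Theorem (Chain Rule III). Suppose $\mathbf{g} : \mathbb{R}^k \to \mathbb{R}^n$ is differentiable at $\mathbf{a} \in \mathbb{R}^k$ and $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{g}(\mathbf{a}) \in \mathbb{R}^n$. Then $\mathbf{H} = \mathbf{f} \circ \mathbf{g} : \mathbb{R}^k \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$, and

$$D\mathbf{H}(\mathbf{a}) = D\mathbf{f}(\mathbf{g}(\mathbf{a}))\,D\mathbf{g}(\mathbf{a}),$$

where the expression on the right is the product of the matrices $D\mathbf{f}(\mathbf{g}(\mathbf{a}))$ and $D\mathbf{g}(\mathbf{a})$.

Proof. Differentiability of $\mathbf{H}$ is equivalent to the differentiability of each of its components $H_i = f_i \circ \mathbf{g}$, and for these we have, by Theorem 2.29,

$$\partial_k H_i = (\partial_1 f_i)(\partial_k g_1) + \cdots + (\partial_n f_i)(\partial_k g_n) = \sum_{j=1}^{n} (\partial_j f_i)(\partial_k g_j).$$

(Here $\partial_k H_i$ and $\partial_k g_j$ are to be evaluated at $\mathbf{a}$, $\partial_j f_i$ at $\mathbf{g}(\mathbf{a})$.) But $\partial_k H_i$ is the $ik$th entry of the matrix $D\mathbf{H}$, and the sum on the right is the $ik$th entry of the product matrix $(D\mathbf{f})(D\mathbf{g})$, so we are done. □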

Since the product of two matrices gives the composition of the linear transformations defined by those matrices, the chain rule just says that the linear approximation of a composition is the composition of the linear approximations.
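
The chain rule (2.86) can be checked numerically in the same spirit. In this hedged sketch, $\mathbf{g}$ is the polar coordinate map and $\mathbf{f}(x,y) = (xy,\ x + y^2,\ e^x)$ is an arbitrary example chosen for illustration. The matrix product $D\mathbf{f}(\mathbf{g}(\mathbf{a}))\,D\mathbf{g}(\mathbf{a})$ is compared with a central-difference estimate of $D(\mathbf{f}\circ\mathbf{g})(\mathbf{a})$.

```python
import math


def g(v):
    """g : R^2 -> R^2, the polar coordinate map (r, theta) -> (x, y)."""
    r, th = v
    return [r * math.cos(th), r * math.sin(th)]


def Dg(v):
    r, th = v
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th), r * math.cos(th)]]


def f(w):
    """f : R^2 -> R^3, an illustrative smooth map (x, y) -> (xy, x + y^2, e^x)."""
    x, y = w
    return [x * y, x + y * y, math.exp(x)]


def Df(w):
    x, y = w
    return [[y, x],
            [1.0, 2.0 * y],
            [math.exp(x), 0.0]]


def matmul(A, B):
    """Product of an (m x n) and an (n x k) matrix stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


def jacobian_central(F, a, h=1e-6):
    m = len(F(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        Fp, Fm = F(plus), F(minus)
        for j in range(m):
            J[j][k] = (Fp[j] - Fm[j]) / (2 * h)
    return J


a = [1.5, 0.4]
chain = matmul(Df(g(a)), Dg(a))                  # Df(g(a)) Dg(a), a 3 x 2 matrix
direct = jacobian_central(lambda v: f(g(v)), a)  # numerical D(f o g)(a)
print(max(abs(chain[i][k] - direct[i][k]) for i in range(3) for k in range(2)))
```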

As we pointed out at the end of §2.1, the mean value theorem is false for vector-valued functions. That is, for a differentiable $\mathbb{R}^m$-valued function $\mathbf{f}$ with $m > 1$, given two points $\mathbf{a}$ and $\mathbf{b}$ there is usually no $\mathbf{c}$ on the line segment between $\mathbf{a}$ and $\mathbf{b}$ such that $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) = [D\mathbf{f}(\mathbf{c})](\mathbf{b} - \mathbf{a})$. However, the main corollary of the mean value theorem, an estimate on $|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})|$ in terms of a bound on the derivative of $\mathbf{f}$, is still valid. To state it, we employ the following terminology: The norm of a linear mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ is the smallest constant $C$ such that $|A\mathbf{x}| \le C|\mathbf{x}|$ for all $\mathbf{x} \in \mathbb{R}^n$. The norm of $A$ is denoted by $\|A\|$; thus,

$$|A\mathbf{x}| \le \|A\|\,|\mathbf{x}| \qquad (\mathbf{x} \in \mathbb{R}^n). \tag{2.87}$$

Equivalently, $\|A\| = \max\{|A\mathbf{x}| : |\mathbf{x}| = 1\}$; see Exercise 9. An estimate for $\|A\|$ in terms of the entries $A_{jk}$ is given in Exercise 10.
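
Since $\|A\| = \max\{|A\mathbf{x}| : |\mathbf{x}| = 1\}$ and $|A\mathbf{x}|^2 = (A^tA\mathbf{x})\cdot\mathbf{x}$, the norm equals the square root of the largest eigenvalue of the symmetric matrix $A^tA$ (compare Exercise 19 of §2.9). A minimal Python sketch, assuming plain power iteration is adequate (it converges when the starting vector is not orthogonal to the relevant eigenvector), is:

```python
import math


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def operator_norm(A, iters=500):
    """Estimate ||A|| = max{|Ax| : |x| = 1} by power iteration on A^t A."""
    At = transpose(A)
    n = len(A[0])
    x = [1.0 / math.sqrt(n)] * n  # starting unit vector (assumed not orthogonal to the top eigenvector)
    for _ in range(iters):
        y = mat_vec(At, mat_vec(A, x))
        norm_y = math.sqrt(sum(t * t for t in y))
        if norm_y == 0.0:
            return 0.0  # A x = 0 along the iteration; for this start, A is treated as zero
        x = [t / norm_y for t in y]
    Ax = mat_vec(A, x)
    return math.sqrt(sum(t * t for t in Ax))


# The matrix described in Exercise 10b, with m = 3 and n = 2: its norm is sqrt(3).
print(operator_norm([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]), math.sqrt(3))
print(operator_norm([[3.0, 0.0], [0.0, -5.0]]))  # approximately 5
```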

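2.88 Theorem. Suppose $\mathbf{f}$ is a differentiable $\mathbb{R}^m$-valued function on an open convex set $S \subset \mathbb{R}^n$, and suppose that $\|D\mathbf{f}(\mathbf{x})\| \le M$ for all $\mathbf{x} \in S$. Then

$$|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})| \le M|\mathbf{b} - \mathbf{a}| \qquad \text{for all } \mathbf{a}, \mathbf{b} \in S.$$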

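Proof. Given a unit vector $\mathbf{u} \in \mathbb{R}^m$, let us consider the scalar-valued function $f_{\mathbf{u}}(\mathbf{x}) = \mathbf{u} \cdot \mathbf{f}(\mathbf{x})$. Clearly $f_{\mathbf{u}}$ is differentiable on $S$ and $\partial_k f_{\mathbf{u}} = \mathbf{u} \cdot \partial_k \mathbf{f} = \sum_{j=1}^{m} u_j\,\partial_k f_j$. By the mean value theorem (2.39) applied to $f_{\mathbf{u}}$, then, there is a point $\mathbf{c}$ on the line segment between $\mathbf{a}$ and $\mathbf{b}$ (depending on $\mathbf{u}$) such that

$$\mathbf{u} \cdot [\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})] = f_{\mathbf{u}}(\mathbf{b}) - f_{\mathbf{u}}(\mathbf{a}) = [\nabla f_{\mathbf{u}}(\mathbf{c})] \cdot [\mathbf{b} - \mathbf{a}] = \sum_{j,k} u_j\,\partial_k f_j(\mathbf{c})(b_k - a_k) = \mathbf{u} \cdot \big([D\mathbf{f}(\mathbf{c})](\mathbf{b} - \mathbf{a})\big).$$

Hence, by Cauchy's inequality, the fact that $|\mathbf{u}| = 1$, and (2.87),

$$\big|\mathbf{u} \cdot [\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})]\big| \le |\mathbf{u}|\,\|D\mathbf{f}(\mathbf{c})\|\,|\mathbf{b} - \mathbf{a}| \le M|\mathbf{b} - \mathbf{a}|.$$

The desired result now follows by taking $\mathbf{u}$ to be the unit vector in the direction of $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})$, so that $\mathbf{u} \cdot [\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})] = |\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})|$. (Of course, if $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, the result is trivial.) □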

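In the case $m = n$, the Fréchet derivative $D\mathbf{f}$ of a function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ is an $n \times n$ matrix of functions, defined on the set $S$ where $\mathbf{f}$ is differentiable, so we can form its determinant. This determinant, a scalar-valued function on $S$, is called the Jacobian of the mapping $\mathbf{f}$. It is sometimes denoted by $J_{\mathbf{f}}$, or, if $\mathbf{y} = \mathbf{f}(\mathbf{x})$, by $\partial(y_1, \ldots, y_n)/\partial(x_1, \ldots, x_n)$:

$$J_{\mathbf{f}} = \det D\mathbf{f} = \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}. \tag{2.89}$$

(The last notation may look peculiar at first, but it is actually quite handy.) Since the determinant of a product of two matrices is the product of the determinants, the chain rule implies that if $\mathbf{y} = \mathbf{f}(\mathbf{x})$ and $\mathbf{x} = \mathbf{g}(\mathbf{t})$ ($\mathbf{t}, \mathbf{x}, \mathbf{y} \in \mathbb{R}^n$), then

$$J_{\mathbf{f}\circ\mathbf{g}}(\mathbf{t}) = J_{\mathbf{f}}(\mathbf{g}(\mathbf{t}))\,J_{\mathbf{g}}(\mathbf{t}), \quad \text{or} \quad \frac{\partial(y_1, \ldots, y_n)}{\partial(t_1, \ldots, t_n)} = \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\,\frac{\partial(x_1, \ldots, x_n)}{\partial(t_1, \ldots, t_n)}. \tag{2.90}$$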
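
Formula (2.90) can be illustrated with Jacobians computed numerically. In this hedged sketch the maps are illustrative choices: $\mathbf{x} = \mathbf{g}(\mathbf{t})$ is the polar coordinate map, whose Jacobian is $r$, and $\mathbf{y} = \mathbf{f}(\mathbf{x}) = (x_1^2 - x_2^2,\ 2x_1x_2)$, whose Jacobian is $4(x_1^2 + x_2^2)$, so the composite should have Jacobian $4r^3$.

```python
import math


def jacobian_central(F, a, h=1e-6):
    """Central-difference estimate of DF(a); entry [j][k] approximates d_k F_j(a)."""
    m = len(F(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        Fp, Fm = F(plus), F(minus)
        for j in range(m):
            J[j][k] = (Fp[j] - Fm[j]) / (2 * h)
    return J


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def g(t):
    """x = g(t): polar coordinates, t = (r, theta)."""
    r, th = t
    return [r * math.cos(th), r * math.sin(th)]


def f(x):
    """y = f(x) = (x1^2 - x2^2, 2 x1 x2), whose Jacobian is 4(x1^2 + x2^2)."""
    x1, x2 = x
    return [x1 * x1 - x2 * x2, 2 * x1 * x2]


t0 = [1.3, 0.9]
J_g = det2(jacobian_central(g, t0))
J_f = det2(jacobian_central(f, g(t0)))
J_fg = det2(jacobian_central(lambda t: f(g(t)), t0))
print(J_g, t0[0])                        # Jacobian of polar coordinates is r
print(J_fg, J_f * J_g, 4 * t0[0] ** 3)   # (2.90): these three should agree
```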

If $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ with $n > m$, we can form a number of different Jacobians by singling out $m$ of the independent variables for attention and treating the others as constants, thereby considering $\mathbf{f}$ as a function from $\mathbb{R}^m$ to $\mathbb{R}^m$. In other words, we can look at the determinants of all the $m \times m$ submatrices of the $m \times n$ matrix $D\mathbf{f}$. The last notation in (2.89) is handy in this situation because it allows us to name the $m$ independent variables that have been singled out. Similarly, if $n < m$, we can consider the determinants of the $n \times n$ submatrices of $D\mathbf{f}$ obtained by singling out $n$ of the components of $\mathbf{f}$.

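Example 1. Let $(u,v) = \mathbf{f}(x,y,z) = (2x + y^3,\ xe^{5y-7z})$. Then

$$D\mathbf{f}(x,y,z) = \begin{pmatrix} 2 & 3y^2 & 0 \\ e^{5y-7z} & 5xe^{5y-7z} & -7xe^{5y-7z} \end{pmatrix}, \qquad \frac{\partial(u,v)}{\partial(x,y)} = \det\begin{pmatrix} 2 & 3y^2 \\ e^{5y-7z} & 5xe^{5y-7z} \end{pmatrix} = (10x - 3y^2)e^{5y-7z}.$$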
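
The determinant in Example 1 can be double-checked numerically. This sketch, at an arbitrarily chosen sample point, compares a central-difference estimate of the $2 \times 2$ minor formed by the $x$ and $y$ columns of $D\mathbf{f}$ with the closed form $(10x - 3y^2)e^{5y-7z}$.

```python
import math


def f(v):
    """(u, v) = f(x, y, z) = (2x + y^3, x e^(5y - 7z)) from Example 1."""
    x, y, z = v
    return [2 * x + y ** 3, x * math.exp(5 * y - 7 * z)]


def jacobian_central(F, a, h=1e-6):
    """Central-difference estimate of DF(a); entry [j][k] approximates d_k F_j(a)."""
    m = len(F(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        Fp, Fm = F(plus), F(minus)
        for j in range(m):
            J[j][k] = (Fp[j] - Fm[j]) / (2 * h)
    return J


def minor(J, cols):
    """Determinant of the 2 x 2 submatrix of the 2 x 3 matrix J using the given columns."""
    i, k = cols
    return J[0][i] * J[1][k] - J[0][k] * J[1][i]


p = [0.3, 0.2, 0.1]  # sample point (x, y, z), chosen arbitrarily
J = jacobian_central(f, p)
x, y, z = p
numeric = minor(J, (0, 1))  # d(u, v)/d(x, y)
closed_form = (10 * x - 3 * y ** 2) * math.exp(5 * y - 7 * z)
print(numeric, closed_form)
```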

EXERCISES

1. Let $(u,v) = \mathbf{f}(x,y,z) = (xyz^2 - 4y^2,\ 3xy^2 - yz)$. Compute $D\mathbf{f}(x,y,z)$, $\partial(u, v)/\partial(x, y)$, $\partial(u, v)/\partial(y, z)$, and $\partial(u, v)/\partial(x, z)$.

2. Let (u, v,w) = i(x,y) = (x + Qy, 3xy, x2 — 3y2). Compute Df(x,y),

d(u, v)/d(x, y), d(v, w)/d(x, y), and d(u, w)/d(x, y).
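3. Define $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$ by $\mathbf{f}(u, v) = (u^2 - 5v,\ ve^{2u},\ 2u - \log(1 + v^2))$.
a. Compute $D\mathbf{f}(u, v)$. What is $D\mathbf{f}(0,0)$?
b. Suppose $\mathbf{g} : \mathbb{R}^2 \to \mathbb{R}^2$ is of class $C^1$, $\mathbf{g}(1,2) = (0,0)$, and $D\mathbf{g}(1,2) =$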
/1 2
\3
I). Compute D(fog)( 1,2).
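4. Define $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$ by $\mathbf{f}(x,y,z) = (2x + (y-1)^2 - \sin z,\ 3x + 2e^{2y-5z})$.
a. Compute $D\mathbf{f}(x, y, z)$. What are $\mathbf{f}(0,0,0)$ and $D\mathbf{f}(0,0,0)$?
b. Let $\mathbf{g}$ be as in Exercise 3b. Compute $D(\mathbf{g} \circ \mathbf{f})(0,0,0)$.

5. Show that if $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is defined by $\mathbf{f}(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$, where $A$ is an $m \times n$ matrix and $\mathbf{b} \in \mathbb{R}^m$, then $D\mathbf{f}(\mathbf{x}) = A$ for all $\mathbf{x}$.

6. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^2$; then $\nabla f$ is a $C^1$ mapping from $\mathbb{R}^n$ to itself. Show that $D(\nabla f)$ is the Hessian matrix of $f$.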

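7. Suppose $\mathbf{f}$ and $\mathbf{g}$ are differentiable mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$. Show that their dot product, $h(\mathbf{x}) = \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x})$, is a differentiable real-valued function on $\mathbb{R}^n$, and that

$$\nabla h(\mathbf{x}) = [D\mathbf{f}(\mathbf{x})]^t\,\mathbf{g}(\mathbf{x}) + [D\mathbf{g}(\mathbf{x})]^t\,\mathbf{f}(\mathbf{x}),$$

if we think of $\nabla h(\mathbf{x})$, $\mathbf{f}(\mathbf{x})$, and $\mathbf{g}(\mathbf{x})$ as column vectors. (Here $A^t$ denotes the transpose of the matrix $A$; see Appendix A, (A.15).)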

8. Suppose that $w = f(x, y, t, s)$ and $x$ and $y$ are also functions of $t$ and $s$ (the situation depicted in Figure 2.3). The total dependence of $w$ on $t$ and $s$ can be expressed by writing $w = f(\mathbf{g}(t,s))$ where $\mathbf{g}(t, s) = (x(t,s),\ y(t,s),\ t,\ s)$. Show that the chain rule (2.86), applied to the composite function $f \circ \mathbf{g}$, yields the same result as the one obtained in §2.3.

9. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map.
a. Show that the function $\varphi(\mathbf{x}) = |A\mathbf{x}|$ has a maximum value on the set $\{\mathbf{x} : |\mathbf{x}| = 1\}$.
b. Let $M$ be the maximum in part (a). Show that $|A\mathbf{x}| \le M|\mathbf{x}|$ for all $\mathbf{x} \in \mathbb{R}^n$, with equality for at least one unit vector $\mathbf{x}$. Deduce that $M = \|A\|$.
10. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map.
a. Show that $\|A\| \le \sqrt{m}\,\max_j\big(\sum_k |A_{jk}|\big)$. (Hint: Use (1.3).)
b. Show that this inequality is an equality when the matrix of $A$ is given by $A_{j1} = 1$ and $A_{jk} = 0$ for $k > 1$ ($1 \le j \le m$).

Chapter 3

THE IMPLICIT FUNCTION THEOREM AND ITS APPLICATIONS

In this chapter we take up the general question of the local solvability of systems of equations involving nonlinear differentiable functions. The main result is the implicit function theorem, one of the major theoretical results of advanced calculus. Among other things, it provides the key to answering many questions about relations between analytic properties of functions and geometric properties of the sets they define. We shall present some of its applications to the study of geometric transformations, coordinate systems, and various ways of representing curves, surfaces, and smooth sets of higher dimension.

3.1 The Implicit Function Theorem

In this section we consider the problem of solving an equation $F(x_1, \ldots, x_n) = 0$ for one of the variables $x_j$ as a function of the remaining $n - 1$ variables, or more generally of solving a system of $k$ such equations for $k$ of the variables as functions of the remaining $n - k$ variables.
We begin with the case of a single equation, and to develop some feeling for the geometry of the problem we consider the cases $n = 2$ and $n = 3$. For $n = 2$ we are given an equation $F(x, y) = 0$ relating the variables $x$ and $y$, and we ask when we can solve for $y$ as a function of $x$ or vice versa. Geometrically, the set $S = \{(x, y) : F(x, y) = 0\}$ will usually be some sort of curve, and our question is: When can $S$ be represented as the graph of a function $y = f(x)$ or $x = g(y)$? Likewise, for $n = 3$, the set where $F(x, y, z) = 0$ will usually be a surface, and we ask when this surface can be represented as the graph of a function $z = f(x,y)$, $y = g(x,z)$, or $x = h(y,z)$.

Simple examples show that it is usually impossible to represent the whole set $S = \{\mathbf{x} : F(\mathbf{x}) = 0\}$ as the graph of a function. For example, if $n = 2$ and $F(x, y) = x^2 + y^2 - 1$, the set $S$ is the unit circle. We can represent the upper or lower semicircle as the graph of $f(x) = \pm\sqrt{1 - x^2}$, and the right or left semicircle as the graph of $g(y) = \pm\sqrt{1 - y^2}$, but the whole circle is not a graph. Thus, in order to get reasonable results, we must be content only to represent pieces of $S$ as graphs. More specifically, our object will be to represent a piece of $S$ in the neighborhood of a given point $\mathbf{a} \in S$ as a graph.
Since we want to single out one of the variables as the one to be solved for, we make a little change of notation: We denote the number of variables by $n + 1$ and denote the last variable by $y$ rather than $x_{n+1}$. We then have the following precise analytical statement of the problem:

Given a function $F(\mathbf{x}, y)$ of class $C^1$ and a point $(\mathbf{a}, b)$ satisfying $F(\mathbf{a}, b) = 0$, when is there
i. a function $f(\mathbf{x})$, defined in some open set in $\mathbb{R}^n$ containing $\mathbf{a}$, and
ii. an open set $U \subset \mathbb{R}^{n+1}$ containing $(\mathbf{a}, b)$,
such that for $(\mathbf{x}, y) \in U$,

$$F(\mathbf{x}, y) = 0 \iff y = f(\mathbf{x})?$$

We do not try to specify in advance how big the open sets in question will be; that will depend strongly on the nature of the function $F$.
The key to the answer is to look at the linear case. If

$$L(x_1, \ldots, x_n, y) = \alpha_1 x_1 + \cdots + \alpha_n x_n + \beta y + c,$$

the solution is obvious: The equation $L(\mathbf{x}, y) = 0$ can be solved for $y$ if and only if the coefficient $\beta$ is nonzero. But near a given point $(\mathbf{a}, b)$, every differentiable function $F(\mathbf{x}, y)$ is approximately linear; in fact, if $F(\mathbf{a}, b) = 0$,

$$F(\mathbf{x}, y) = [\partial_1 F(\mathbf{a}, b)](x_1 - a_1) + \cdots + [\partial_n F(\mathbf{a}, b)](x_n - a_n) + [\partial_y F(\mathbf{a}, b)](y - b) + \text{small error}.$$

If the "small error" were not there, the equation $F(\mathbf{x}, y) = 0$ could be solved for $y$ precisely when $\partial_y F(\mathbf{a}, b) \ne 0$. We now show that the condition $\partial_y F(\mathbf{a}, b) \ne 0$ is still the appropriate one when the error term is taken into account.

3.1 Theorem (The Implicit Function Theorem for a Single Equation). Let $F(\mathbf{x}, y)$ be a function of class $C^1$ on some neighborhood of a point $(\mathbf{a}, b) \in \mathbb{R}^{n+1}$. Suppose that $F(\mathbf{a}, b) = 0$ and $\partial_y F(\mathbf{a}, b) \ne 0$. Then there exist positive numbers $r_0, r_1$ such that the following conclusions are valid.

FIGURE 3.1: The geometry of the implicit function theorem. $\partial_y F > 0$ in the box, $F > 0$ on the top side, $F < 0$ on the bottom side, and $F = 0$ on the curve.

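a. For each $\mathbf{x}$ in the ball $|\mathbf{x} - \mathbf{a}| < r_0$ there is a unique $y$ such that $|y - b| < r_1$ and $F(\mathbf{x}, y) = 0$. We denote this $y$ by $f(\mathbf{x})$; in particular, $f(\mathbf{a}) = b$.
b. The function $f$ thus defined for $|\mathbf{x} - \mathbf{a}| < r_0$ is of class $C^1$, and its partial derivatives are given by

$$\partial_j f(\mathbf{x}) = -\frac{\partial_j F(\mathbf{x}, f(\mathbf{x}))}{\partial_y F(\mathbf{x}, f(\mathbf{x}))}. \tag{3.2}$$

Notes.
i. The number $r_0$ may be very small, and there is no way to estimate its size without further hypotheses on $F$.
ii. The formula (3.2) for $\partial_j f$ is, of course, the one obtained via the chain rule by differentiating the equation $F(\mathbf{x}, f(\mathbf{x})) = 0$.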

Proof. We first prove (a). We may assume that $\partial_y F(\mathbf{a}, b) > 0$ (by replacing $F$ by $-F$ if necessary). Since $\partial_y F$ is continuous, it remains positive in some neighborhood of $(\mathbf{a}, b)$, say for $|\mathbf{x} - \mathbf{a}| \le r_1$ and $|y - b| \le r_1$. On this set, $F(\mathbf{x}, y)$ is a strictly increasing function of $y$ for each fixed $\mathbf{x}$. In particular, since $F(\mathbf{a}, b) = 0$ we have $F(\mathbf{a}, b + r_1) > 0$ and $F(\mathbf{a}, b - r_1) < 0$. The continuity of $F$ then implies that for some $r_0 < r_1$ we have $F(\mathbf{x}, b + r_1) > 0$ and $F(\mathbf{x}, b - r_1) < 0$ for $|\mathbf{x} - \mathbf{a}| < r_0$.
In short, for each $\mathbf{x}$ in the ball $B = \{\mathbf{x} : |\mathbf{x} - \mathbf{a}| < r_0\}$ we have $F(\mathbf{x}, b - r_1) < 0$, $F(\mathbf{x}, b + r_1) > 0$, and $F(\mathbf{x}, y)$ is strictly increasing as a function of $y$ for $|y - b| \le r_1$. It follows from the intermediate value theorem that there is a unique $y$ for each $\mathbf{x} \in B$ that satisfies $|y - b| < r_1$ and $F(\mathbf{x}, y) = 0$, which establishes (a). See Figure 3.1.
Next we observe that the function $y = f(\mathbf{x})$ thus defined is continuous at $\mathbf{x} = \mathbf{a}$; in other words, for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f(\mathbf{x}) - f(\mathbf{a})| < \epsilon$ whenever $|\mathbf{x} - \mathbf{a}| < \delta$. Indeed, the argument just given shows that $|f(\mathbf{x}) - f(\mathbf{a})| = |y - b| < r_1$ whenever $|\mathbf{x} - \mathbf{a}| < r_0$, and we could repeat that argument with $r_1$ replaced by any smaller number $\epsilon$ to obtain an appropriate $\delta$ in place of $r_0$.
In fact, this argument can also be applied with $\mathbf{a}$ replaced by any other point $\mathbf{x}_0$ in the ball $B$ to show that $f$ is continuous at $\mathbf{x}_0$. To recapitulate it briefly: Given $\epsilon > 0$, there exists $\delta > 0$ such that if $|\mathbf{x} - \mathbf{x}_0| < \delta$ we have $F(\mathbf{x}, y_0 - \epsilon) < 0$ and $F(\mathbf{x}, y_0 + \epsilon) > 0$, where $y_0 = f(\mathbf{x}_0)$. For each such $\mathbf{x}$ there is a unique $y$ such that $|y - y_0| < \epsilon$ and $F(\mathbf{x}, y) = 0$, and that $y$ is $f(\mathbf{x})$; hence $|f(\mathbf{x}) - f(\mathbf{x}_0)| = |y - y_0| < \epsilon$.
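Now that we know that $f$ is continuous on $B$, we can show that its partial derivatives $\partial_j f$ exist on $B$ and are given by (3.2), which also shows that they are continuous. Given $\mathbf{x} \in B$ and a (small) real number $h$, let $y = f(\mathbf{x})$ and

$$k = f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}), \quad \text{where} \quad \mathbf{h} = (0, \ldots, 0, h, 0, \ldots, 0) \text{ with the } h \text{ in the } j\text{th place}.$$

Then $y + k = f(\mathbf{x} + \mathbf{h})$, so $F(\mathbf{x} + \mathbf{h}, y + k) = F(\mathbf{x}, y) = 0$. Hence, by the mean value theorem,

$$0 = F(\mathbf{x} + \mathbf{h}, y + k) - F(\mathbf{x}, y) = h\,\partial_j F(\mathbf{x} + t\mathbf{h}, y + tk) + k\,\partial_y F(\mathbf{x} + t\mathbf{h}, y + tk)$$

for some $t \in (0, 1)$. Rearranging this equation gives

$$\frac{f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x})}{h} = \frac{k}{h} = -\frac{\partial_j F(\mathbf{x} + t\mathbf{h}, y + tk)}{\partial_y F(\mathbf{x} + t\mathbf{h}, y + tk)}.$$

Now let $h \to 0$. Since $f$ is continuous we also have $k \to 0$, and then since $\partial_j F$ and $\partial_y F$ are continuous and $\partial_y F \ne 0$, passage to the limit yields (3.2). □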
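
The proof of part (a) is constructive enough to suggest an algorithm: on the interval $|y - b| \le r_1$, where $F$ is increasing in $y$ and changes sign, the zero can be located by bisection. The following Python sketch is a hedged illustration of that idea, not an implementation from the text. It assumes the unit-circle example $F(x,y) = x^2 + y^2 - 1$ near $(a,b) = (0.6, 0.8)$ with $r_1 = 0.3$ chosen by hand (so that $\partial_y F = 2y > 0$ there), and compares a difference quotient of the computed $f$ with formula (3.2).

```python
import math


def F(x, y):
    """Illustrative example: F(x, y) = x^2 + y^2 - 1 (unit circle), dF/dy = 2y."""
    return x * x + y * y - 1.0


def implicit_y(F, x, b, r1, tol=1e-13):
    """Return the unique y with |y - b| <= r1 and F(x, y) = 0 by bisection.

    Assumes, as in the proof, that F(x, .) is increasing on [b - r1, b + r1]
    and that F(x, b - r1) < 0 < F(x, b + r1)."""
    lo, hi = b - r1, b + r1
    if not (F(x, lo) < 0.0 < F(x, hi)):
        raise ValueError("sign condition fails: x is outside the ball |x - a| < r0")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b, r1 = 0.6, 0.8, 0.3  # base point on the circle; r1 chosen by hand
x = 0.65
y = implicit_y(F, x, b, r1)
h = 1e-6
slope = (implicit_y(F, x + h, b, r1) - implicit_y(F, x - h, b, r1)) / (2 * h)
print(y, math.sqrt(1 - x * x))   # the implicit function agrees with sqrt(1 - x^2)
print(slope, -(2 * x) / (2 * y)) # formula (3.2): f'(x) = -F_x / F_y
```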

3.3 Corollary. Let $F$ be a function of class $C^1$ on $\mathbb{R}^n$, and let $S = \{\mathbf{x} : F(\mathbf{x}) = 0\}$. For every $\mathbf{a} \in S$ such that $\nabla F(\mathbf{a}) \ne \mathbf{0}$ there is a neighborhood $N$ of $\mathbf{a}$ such that $S \cap N$ is the graph of a $C^1$ function.

Proof. Since $\nabla F(\mathbf{a}) \ne \mathbf{0}$, we have $\partial_j F(\mathbf{a}) \ne 0$ for some $j$. The equation $F = 0$ can then be solved to yield $x_j$ as a $C^1$ function of the remaining variables near the point $\mathbf{a}$. □

EXAMPLE 1. Let F(x,y) = x — y2 - 1, for which dxF(x,y) = 1 and

dyF{x,y) = —2y. First, dxF is never 0, so the implicit function theorem
guarantees that the equation F(x, y) = 0 can be solved for x locally near any
3.1. The Implicit Function Theorem 117

point $(a,b)$ for which $F(a,b) = 0$. Of course, for this particular $F$ it is easy
to solve for $x$ explicitly, namely $x = y^2 + 1$, and this solution is valid
not just locally but globally. Next, $\partial_y F(a,b) = 0$ precisely when $b = 0$, so
the implicit function theorem guarantees that the equation $F(x,y) = 0$ can be
solved uniquely for $y$ near any point $(a,b)$ such that $F(a,b) = 0$ and $b \neq 0$.
In fact, the possible solutions are $y = \sqrt{x-1}$ and $y = -\sqrt{x-1}$. For $x$ very
close to $a$ only one of these solutions will be very close to $b$, namely $\sqrt{x-1}$
if $b > 0$ and $-\sqrt{x-1}$ if $b < 0$, and this solution is the one that figures in
the implicit function theorem. Also, these solutions are defined only for $x \geq 1$,
so the number $r_0$ in the statement of the implicit function theorem is $a - 1$.
Finally, we have $F(1,0) = 0$, but the equation $F(x,y) = 0$ cannot be solved
uniquely for $y$ as a function of $x$ in any neighborhood of $(1,0)$: if $x > 1$ there
are two solutions, both equally close to 0, and if $x < 1$ there are none.
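The choice of branch in Example 1 can be seen numerically: Newton's method for $y$, started at the base value $b$, converges to the solution of $F(x,y) = 0$ that lies near $b$. A minimal Python sketch; the base points $(5, \pm 2)$, the evaluation point $x = 5.2$, and the helper name `local_solution` are arbitrary choices for illustration.

```python
import math

def F(x, y):
    return x - y**2 - 1.0

def dF_dy(x, y):
    return -2.0 * y

def local_solution(x, b, tol=1e-14, max_iter=60):
    """Newton's method for F(x, y) = 0 in the unknown y, started at y = b."""
    y = b
    for _ in range(max_iter):
        step = F(x, y) / dF_dy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

# Both (5, 2) and (5, -2) lie on F = 0 and have b != 0.
y_plus = local_solution(5.2, 2.0)    # follows the branch +sqrt(x - 1)
y_minus = local_solution(5.2, -2.0)  # follows the branch -sqrt(x - 1)
print(y_plus, math.sqrt(5.2 - 1.0))
print(y_minus, -math.sqrt(5.2 - 1.0))
```

Started at $b = 0$, the first step would divide by $\partial_y F(x, 0) = 0$ (Python raises `ZeroDivisionError`), which is the numerical counterpart of the failure at $(1,0)$.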

EXAMPLE 2. For a contrast with Example 1, let $G(x,y) = x - e^{1-x} - y^3$.
First, $\partial_x G(a,b) = 1 + e^{1-a} > 1$ for all $(a,b)$, so the implicit function theorem
guarantees that the equation $G(x,y) = 0$ can be solved for $x$ locally near
any point $(a,b)$ such that $G(a,b) = 0$. It is not hard to see (Exercise 4) that
there is a single solution that works globally, but there is no nice formula for
this solution in terms of elementary functions. Next, $\partial_y G(a,b) = -3b^2$, so
the implicit function theorem guarantees that the equation $G(x,y) = 0$ can
be solved for $y$ as a $C^1$ function of $x$ locally near any point $(a,b)$ such that
$G(a,b) = 0$ and $b \neq 0$. In fact, the solution is $y = (x - e^{1-x})^{1/3}$, which is
globally uniquely defined but fails to be differentiable at the point where $y = 0$
(i.e., $x = 1$).
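Although Example 2 has no elementary formula for $x$ in terms of $y$, the solution is easy to compute numerically. Because $\partial_x G = 1 + e^{1-x} > 0$, the function $x \mapsto G(x,y)$ is strictly increasing, so bisection on a bracketing interval finds its unique root. A Python sketch; the helper name `solve_x`, the starting bracket, and the tolerance are our own choices.

```python
import math

def G(x, y):
    return x - math.exp(1.0 - x) - y**3

def solve_x(y, tol=1e-13):
    """Bisection for the unique x with G(x, y) = 0.

    G(., y) is strictly increasing (d_x G = 1 + exp(1 - x) > 0), so once
    G(lo, y) < 0 <= G(hi, y) the root is trapped in [lo, hi].
    """
    lo, hi = -1.0, 1.0
    while G(lo, y) > 0:      # widen the bracket to the left
        lo *= 2.0
    while G(hi, y) < 0:      # widen the bracket to the right
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid, y) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = solve_x(0.0)   # should be 1, since 1 - e^0 - 0 = 0
x2 = solve_x(2.0)
print(x0, x2, G(x2, 2.0))
```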

We now turn to the more general problem of solving several equations simultaneously for some of the variables occurring in them. This will require some
facts about invertible matrices and determinants, for which we refer to Appendix
A, (A.24)-(A.33) and (A.50)-(A.55). To fix the notation, we shall consider $k$ functions $F_1, \ldots, F_k$ of $n + k$ variables $x_1, \ldots, x_n, y_1, \ldots, y_k$, and ask when we can
solve the equations

$$\begin{aligned} F_1(x_1,\ldots,x_n,y_1,\ldots,y_k) &= 0,\\ &\;\;\vdots\\ F_k(x_1,\ldots,x_n,y_1,\ldots,y_k) &= 0 \end{aligned} \tag{3.4}$$

for the $y$'s in terms of the $x$'s. We shall use vector notation to abbreviate (3.4) as

$$\mathbf{F}(\mathbf{x},\mathbf{y}) = \mathbf{0}. \tag{3.5}$$
118 Chapter 3. The Implicit Function Theorem and Its Applications

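We assume that $\mathbf{F}$ is of class $C^1$ near a point $(\mathbf{a},\mathbf{b})$ such that $\mathbf{F}(\mathbf{a},\mathbf{b}) = \mathbf{0}$, and we
ask when (3.5) determines $\mathbf{y}$ as a $C^1$ function of $\mathbf{x}$ in some neighborhood of $(\mathbf{a},\mathbf{b})$.
Again the key to the problem is to consider the linear case,

$$A\mathbf{x} + B\mathbf{y} + \mathbf{c} = \mathbf{0}, \tag{3.6}$$

where $A$ is a $k \times n$ matrix, $B$ is a $k \times k$ matrix, and $\mathbf{c} \in \mathbb{R}^k$. Here the criterion for
solvability is obvious: the matrix $B$ must be invertible, in which case the solution
is $\mathbf{y} = -B^{-1}(A\mathbf{x} + \mathbf{c})$. Now, the linear approximation to the equation (3.5) near the
point $(\mathbf{a},\mathbf{b})$ is an equation of the form (3.6) in which the matrix $B$ is the (partial)
Fréchet derivative of $\mathbf{F}$ with respect to the variables $\mathbf{y}$, evaluated at $(\mathbf{a},\mathbf{b})$:

$$B = \begin{pmatrix} \partial_{y_1} F_1(\mathbf{a},\mathbf{b}) & \cdots & \partial_{y_k} F_1(\mathbf{a},\mathbf{b}) \\ \vdots & & \vdots \\ \partial_{y_1} F_k(\mathbf{a},\mathbf{b}) & \cdots & \partial_{y_k} F_k(\mathbf{a},\mathbf{b}) \end{pmatrix}, \quad\text{i.e.,}\quad B_{ij} = \frac{\partial F_i}{\partial y_j}(\mathbf{a},\mathbf{b}). \tag{3.7}$$

Hence, the crucial requirement is that

(3.8) the matrix $B$ defined by (3.7) is invertible.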

Invertibility of a matrix can be characterized in a number of different ways, as
discussed in Appendix A, (A.52). For example, (3.8) can be expressed more geometrically as the condition that the gradient vectors $\nabla_{\mathbf{y}} F_j = (\partial_{y_1} F_j, \ldots, \partial_{y_k} F_j)$,
$1 \leq j \leq k$, are linearly independent at $(\mathbf{a},\mathbf{b})$. However, the version of (3.8) that
is directly used in the proof of the following theorem, as well as in many of its
applications, is that $\det B \neq 0$. We therefore state the theorem in these terms.
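For the linear case (3.6), the criterion $\det B \neq 0$ is exactly what Gaussian elimination needs: in exact arithmetic, elimination with partial pivoting finds a nonzero pivot in every column precisely when $B$ is invertible. The following Python sketch (the function name `solve_linear_case` and the singularity tolerance `eps` are assumptions for illustration) computes $\mathbf{y} = -B^{-1}(A\mathbf{x} + \mathbf{c})$ without forming $B^{-1}$ explicitly.

```python
def solve_linear_case(A, B, c, x, eps=1e-12):
    """Solve A x + B y + c = 0 for y, i.e. y = -B^{-1}(A x + c).

    A is k x n, B is k x k, c has length k, x has length n (nested lists).
    Uses Gaussian elimination with partial pivoting; a pivot below eps is
    treated as 0, meaning B is (numerically) singular and y is not determined.
    """
    k = len(B)
    # Right-hand side r = -(A x + c)
    r = [-(sum(A[i][j] * x[j] for j in range(len(x))) + c[i]) for i in range(k)]
    M = [list(map(float, B[i])) + [float(r[i])] for i in range(k)]  # augmented [B | r]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < eps:
            raise ValueError("det B = 0: no unique solution y")
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, k):
            factor = M[i][col] / M[col][col]
            for j in range(col, k + 1):
                M[i][j] -= factor * M[col][j]
    # Back substitution on the upper-triangular system
    y = [0.0] * k
    for i in reversed(range(k)):
        s = M[i][k] - sum(M[i][j] * y[j] for j in range(i + 1, k))
        y[i] = s / M[i][i]
    return y

# k = 2, n = 1:  x + 2*y1 + y2 = 0  and  y1 + y2 - 3 = 0, at x = 4
y = solve_linear_case([[1], [0]], [[2, 1], [1, 1]], [0, -3], [4])
print(y)   # [-7.0, 10.0]
```

With a singular $B$ such as $\begin{pmatrix}1&2\\2&4\end{pmatrix}$ the function raises `ValueError` instead of returning a solution.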

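3.9 Theorem (The Implicit Function Theorem for a System of Equations).
Let $\mathbf{F}(\mathbf{x},\mathbf{y})$ be an $\mathbb{R}^k$-valued function of class $C^1$ on some neighborhood of a
point $(\mathbf{a},\mathbf{b}) \in \mathbb{R}^{n+k}$ and let $B_{ij} = (\partial F_i/\partial y_j)(\mathbf{a},\mathbf{b})$. Suppose that $\mathbf{F}(\mathbf{a},\mathbf{b}) = \mathbf{0}$
and $\det B \neq 0$. Then there exist positive numbers $r_0, r_1$ such that the following
conclusions are valid.
a. For each $\mathbf{x}$ in the ball $|\mathbf{x} - \mathbf{a}| < r_0$ there is a unique $\mathbf{y}$ such that $|\mathbf{y} - \mathbf{b}| < r_1$
and $\mathbf{F}(\mathbf{x},\mathbf{y}) = \mathbf{0}$. We denote this $\mathbf{y}$ by $\mathbf{f}(\mathbf{x})$; in particular, $\mathbf{f}(\mathbf{a}) = \mathbf{b}$.
b. The function $\mathbf{f}$ thus defined for $|\mathbf{x} - \mathbf{a}| < r_0$ is of class $C^1$, and its partial derivatives $\partial_{x_j}\mathbf{f}$ can be computed by differentiating the equations $\mathbf{F}(\mathbf{x},\mathbf{f}(\mathbf{x})) = \mathbf{0}$
with respect to $x_j$ and solving the resulting linear system of equations for
$\partial_{x_j} f_1, \ldots, \partial_{x_j} f_k$.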

Proof. The proof is presented in Appendix B.2 (Theorem B.2). In a nutshell, it
proceeds by induction on $k$. The hypothesis that $\det B \neq 0$ implies that at least
one of the $(k-1) \times (k-1)$ submatrices of $B$ is invertible. By inductive hypothesis,
one can solve the corresponding system of $k - 1$ equations for $k - 1$ of the variables
$y_j$; then, after substituting the results into the remaining equation, one solves that
equation for the remaining variable. The main difficulty is in showing that the
implicit function theorem can be applied to the last equation. □

EXAMPLE 3. Consider the problem of solving the equations

$$x - yu^2 = 0, \qquad xy + uv = 0 \tag{3.10}$$

for $u$ and $v$ as functions of $x$ and $y$. Setting $F = x - yu^2$ and $G = xy + uv$,
we see that

$$\frac{\partial(F,G)}{\partial(u,v)} = \det\begin{pmatrix} -2yu & 0 \\ v & u \end{pmatrix} = -2yu^2,$$

so the implicit function theorem guarantees a local solution near any point
$(x_0, y_0, u_0, v_0)$ at which (3.10) holds provided that $-2y_0u_0^2 \neq 0$, that is, $y_0 \neq 0$
and $u_0 \neq 0$. Notice that under this condition, the first equation in (3.10) implies that $x_0 \neq 0$ and that $x_0$ and $y_0$ have the same sign; the second equation
then implies that $v_0 \neq 0$ and that $u_0$ and $v_0$ have opposite signs.
It is not hard to find the solution explicitly:

$$u = \pm\sqrt{x/y}, \qquad v = \pm\sqrt{xy^3},$$

the signs of $u$ and $v$ being the same as the signs of $u_0$ and $v_0$, respectively. This
solution is valid for all $(x,y)$ in the same quadrant as $(x_0,y_0)$. The problems
that arise if $y_0 = 0$ or $u_0 = 0$ are evident: if $y_0 = 0$, then the formula for $u$
does not even make sense for $y = y_0$; if $u_0 = 0$, then $x_0$ must also be 0, and
the square roots present the same sort of problem as in Example 1.
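The explicit solution and Theorem 3.9(b) can be checked against each other numerically. The Python sketch below takes the branch $u > 0 > v$ in the first quadrant at the sample point $(x,y) = (2,8)$, where $(u,v) = (0.5,-32)$; it solves the linear system obtained by differentiating (3.10) in $x$ (by Cramer's rule) and compares the result with central difference quotients of the explicit formulas. The sample point and the helper names are our own.

```python
import math

def explicit_solution(x, y):
    # Branch with u > 0 > v, valid for x, y > 0 (first quadrant)
    return math.sqrt(x / y), -math.sqrt(x * y**3)

def residuals(x, y, u, v):
    return (x - y * u**2, x * y + u * v)

def partials_in_x(x, y, u, v):
    """Theorem 3.9(b) for (3.10): solve

        [F_u  F_v] [u_x]     [F_x]
        [G_u  G_v] [v_x] = - [G_x]

    with F_u = -2yu, F_v = 0, G_u = v, G_v = u, F_x = 1, G_x = y (Cramer's rule).
    """
    a11, a12, a21, a22 = -2 * y * u, 0.0, v, u
    r1, r2 = -1.0, -y
    det = a11 * a22 - a12 * a21          # = -2 y u^2, the Jacobian d(F,G)/d(u,v)
    u_x = (r1 * a22 - a12 * r2) / det
    v_x = (a11 * r2 - r1 * a21) / det
    return u_x, v_x

x0, y0 = 2.0, 8.0
u0, v0 = explicit_solution(x0, y0)       # (0.5, -32.0)
print(residuals(x0, y0, u0, v0))         # (0.0, 0.0)
h = 1e-6
up, vp = explicit_solution(x0 + h, y0)
um, vm = explicit_solution(x0 - h, y0)
fd = ((up - um) / (2 * h), (vp - vm) / (2 * h))
print(partials_in_x(x0, y0, u0, v0), fd)  # (0.125, -8.0) and nearly equal quotients
```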

EXERCISES

1. Investigate the possibility of solving the equation $x^2 - 4x + 2y^2 - yz = 1$
for each of its variables in terms of the other two near the point $(2,-1,3)$. Do
this both by checking the hypotheses of the implicit function theorem and by
explicitly computing the solutions.

2. Show that the equation $x^2 + 2xy + 3y^2 = c$ can be solved either for $y$ as a
$C^1$ function of $x$ or for $x$ as a $C^1$ function of $y$ (but perhaps not both) near any
point $(a,b)$ such that $a^2 + 2ab + 3b^2 = c$, provided that $c > 0$. What happens
if $c = 0$ or if $c < 0$?

3. Can the equation $(x^2 + y^2 + 2z^2)^{1/2} = \cos z$ be solved uniquely for $y$ in terms
of $x$ and $z$ near $(0,1,0)$? For $z$ in terms of $x$ and $y$?

4. Sketch the graph of the equation $x - e^{1-x} - y^3 = 0$ in Example 2. Show
graphically that for each $x$ there is a unique $y$ satisfying this equation, and vice
versa.

5. Suppose $F(x,y)$ is a $C^1$ function such that $F(0,0) = 0$. What conditions on
$F$ will guarantee that the equation $F(F(x,y),y) = 0$ can be solved for $y$ as a
$C^1$ function of $x$ near $(0,0)$?

6. Investigate the possibility of solving the equations $xy + 2yz - 3xz = 0$, $xyz + x - y = 1$ for two of the variables as functions of the third near the point
$(x,y,z) = (1,1,1)$.

7. Investigate the possibility of solving the equations $u^3 + xv - y = 0$, $v^3 + yu - x = 0$ for any two of the variables as functions of the other two near the point
$(x,y,u,v) = (0,1,1,-1)$.

8. Investigate the possibility of solving the equations $xy^2 + xzu + yv^2 = 3$ and
$u^3yz + 2xv - u^2v^2 = 2$ for $u$ and $v$ as functions of $x$, $y$, and $z$ near $x = y = z = u = v = 1$.

9. Can the equations $x^2 + y^2 + z^2 = 6$, $xy + tz = 2$, $xz + ty + e^t = 0$ be solved
for $x$, $y$, and $z$ as $C^1$ functions of $t$ near $(x,y,z,t) = (-1,-2,1,0)$?

3.2 Curves in the Plane

In this section we examine the relations between various ways of representing
smooth curves in the plane. Here we shall take "smooth" to mean that the curve
possesses a tangent line at each point and that the tangent line varies continuously
with the point of tangency. (Don't worry if this last continuity condition seems a
little unclear; we will reformulate it more precisely below.) Thus "smooth" is the
geometric equivalent of "$C^1$."
There are three common ways of representing smooth curves in the plane $\mathbb{R}^2$:

i. as the graph of a function, $y = f(x)$ or $x = f(y)$, where $f$ is of class $C^1$;

ii. as the locus¹ of an equation $F(x,y) = 0$, where $F$ is of class $C^1$;

iii. parametrically, as the range of a $C^1$ function $\mathbf{f} : (a,b) \to \mathbb{R}^2$.

¹The locus of an equation $F(\mathbf{x}) = c$ is the set of all $\mathbf{x}$ that satisfy the equation.
3.2. Curves in the Plane 121

FIGURE 3.2: Left: The sets $x^2 - y^2 = c$ for $c = \pm 1$ (the hyperbolas)
and $c = 0$ (the cross). Right: The sets $y^3 = x^2 + c$ for $c = 1$ (top),
$c = 0$ (middle), and $c = -1$ (bottom).

Of these, (i) is the simplest, and it is a special case of the other two. Indeed, the curve
given by $y = f(x)$ is the locus of the equation $F(x,y) = 0$ where $F(x,y) = y - f(x)$, and it is also the range of the map $\mathbf{f}(t) = (t, f(t))$. The representations
(ii) and (iii) are more flexible, but they are also too general as they stand because
the sets represented by them may not be smooth curves. Consider the following
examples, in which $c$ denotes an arbitrary real constant:

EXAMPLE 1. Let $F(x,y) = x^2 + y^2 - c$. The set where $F(x,y) = 0$ is a
smooth curve (a circle) if $c > 0$, but it is a single point if $c = 0$ and it is the
empty set if $c < 0$.

EXAMPLE 2. Let $G(x,y) = x^2 - y^2 - c$. The set where $G(x,y) = 0$ is a
hyperbola (the union of two disjoint smooth curves) if $c \neq 0$, but if $c = 0$ it
is the union of the two lines $y = x$ and $y = -x$. The latter set looks like a
smooth curve in a neighborhood of any of its points except the origin, where
the two lines cross. See Figure 3.2.

EXAMPLE 3. Let $H(x,y) = y^3 - x^2 - c$. The set where $H(x,y) = 0$ is a
smooth curve if $c \neq 0$, but when $c = 0$ it is a curve with a sharp cusp at the
origin. The latter set can also be described parametrically by $\mathbf{f}(t) = (t^3, t^2)$.
See Figure 3.2.

EXAMPLE 4. The function $\mathbf{g}(t) = (\sin^2 t, \cos^2 t)$ is $C^1$, but its range is the
line segment from $(0,1)$ to $(1,0)$. The point $\mathbf{g}(t)$ traverses this line segment
from $(0,1)$ to $(1,0)$ as $t$ goes from 0 to $\frac{1}{2}\pi$, then traverses it in the reverse
direction as $t$ goes from $\frac{1}{2}\pi$ to $\pi$, and this pattern is repeated on every interval
$[n\pi, (n+1)\pi]$.

In these examples, the functions in question are all of class $C^1$, but the sets they
describe fail to be smooth curves at certain points. However, they share a common
feature: the points where smoothness fails (namely, the origin in Examples 1-3
and the points $(0,1)$ and $(1,0)$ in Example 4) are the points where the derivatives
of the relevant functions vanish. That is, the origin is the one and only point where
the gradients $\nabla F$, $\nabla G$, and $\nabla H$ vanish, and it is the image under $\mathbf{f}$ of the one and
only point ($t = 0$) where $\mathbf{f}'$ vanishes. Moreover, $(0,1)$ and $(1,0)$ are the images
under $\mathbf{g}$ of the points $t = n\pi$ and $t = (n + \frac{1}{2})\pi$ where $\mathbf{g}'(t) = \mathbf{0}$.
This suggests that it might be a good idea to impose the extra conditions that
$\nabla F \neq \mathbf{0}$ on the set where $F = 0$ in (ii) and that $\mathbf{f}'(t) \neq \mathbf{0}$ in (iii). And indeed, with
the help of the implicit function theorem, it is easy to see that under these extra
conditions the representations (i)-(iii) are all locally equivalent. That is, if a curve
is represented in one of the forms (i)-(iii) and $\mathbf{a}$ is a point on the curve, at least a
small piece of the curve including the point $\mathbf{a}$ can also be represented in the other
two forms.
We now make this precise. Since (i) is more special than either (ii) or (iii), as
we have observed above, it is enough to see that a curve given by (ii) or (iii) can
also be represented in the form (i).

3.11 Theorem.
a. Let $F$ be a real-valued function of class $C^1$ on an open set in $\mathbb{R}^2$, and let $S = \{(x,y) : F(x,y) = 0\}$. If $\mathbf{a} \in S$ and $\nabla F(\mathbf{a}) \neq \mathbf{0}$, there is a neighborhood $N$
of $\mathbf{a}$ in $\mathbb{R}^2$ such that $S \cap N$ is the graph of a $C^1$ function $f$ (either $y = f(x)$ or
$x = f(y)$).

b. Let $\mathbf{f} : (a,b) \to \mathbb{R}^2$ be a function of class $C^1$. If $\mathbf{f}'(t_0) \neq \mathbf{0}$, there is an open
interval $I$ containing $t_0$ such that the set $\{\mathbf{f}(t) : t \in I\}$ is the graph of a $C^1$
function $f$ (either $y = f(x)$ or $x = f(y)$).

Proof. Part (a) is a special case of Corollary 3.3. As for (b), let $\mathbf{f} = (\varphi, \psi)$. If
$\mathbf{f}'(t_0) \neq \mathbf{0}$, then either $\varphi'(t_0) \neq 0$ or $\psi'(t_0) \neq 0$; let's assume that the former
condition holds. Let $F(x,t) = x - \varphi(t)$ and $x_0 = \varphi(t_0)$. Since $\partial_t F(x_0,t_0) = -\varphi'(t_0) \neq 0$, the implicit function theorem guarantees that the equation $x = \varphi(t)$
can be solved for $t$ as a $C^1$ function of $x$, say $t = \omega(x)$, in some neighborhood of the
point $(x_0,t_0)$. But then $(\varphi(t), \psi(t)) = (x, \psi(\omega(x)))$ for $t$ in some neighborhood $I$
of $t_0$; that is, the set $\{\mathbf{f}(t) : t \in I\}$ is the graph of the $C^1$ function $f = \psi \circ \omega$. (If
$\psi'(t_0) \neq 0$ instead, one can make the same argument with $x$ and $y$ switched.) □
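The proof of Theorem 3.11(b) is constructive enough to imitate numerically. For the sample curve $\mathbf{f}(t) = (\cos t, \sin t)$ near $t_0 = 1$, where $\varphi'(t_0) = -\sin 1 \neq 0$, the Python sketch below inverts $x = \varphi(t)$ by Newton's method to get $\omega(x)$ and then evaluates $\psi(\omega(x))$; for $t$ in $(0,\pi)$ this should agree with $\sqrt{1 - x^2}$. Newton's method, the sample curve, and the helper names are our choices, not part of the proof.

```python
import math

def phi(t):        # x-component of the sample curve f(t) = (cos t, sin t)
    return math.cos(t)

def dphi(t):
    return -math.sin(t)

def psi(t):        # y-component
    return math.sin(t)

def omega(x, t0, tol=1e-14, max_iter=50):
    """Solve x = phi(t) for t near t0 by Newton's method (needs phi'(t0) != 0)."""
    t = t0
    for _ in range(max_iter):
        step = (phi(t) - x) / dphi(t)
        t -= step
        if abs(step) < tol:
            break
    return t

t0 = 1.0                         # phi'(1) = -sin(1) != 0

def graph_fn(x):
    return psi(omega(x, t0))     # the C^1 function psi o omega from the proof

x = math.cos(1.1)
print(omega(x, t0), graph_fn(x), math.sqrt(1 - x**2))
```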

It should be noted that the conditions of nonvanishing derivatives in Theorem
3.11 are automatically satisfied in the special case where the curve is given in the
form (i). That is, if $F(x,y) = y - f(x)$, then $\partial F/\partial y = 1$, so $\nabla F$ never vanishes;
similarly, if $\mathbf{f}(t) = (t, f(t))$, then $\mathbf{f}'(t) = (1, f'(t)) \neq (0,0)$.

With this in mind, we may make the following more formal definition of a
smooth curve: A set $S \subset \mathbb{R}^2$ is a smooth curve if (a) $S$ is connected, and (b)
every $\mathbf{a} \in S$ has a neighborhood $N$ such that $S \cap N$ is the graph of a $C^1$ function
$f$ (either $y = f(x)$ or $x = f(y)$). This agrees with the notion of smooth curve
introduced at the beginning of this section: the curve described by $y = f(x)$
has a tangent line at each point $(x_0, f(x_0))$, and that line is given by an equation
$y - f(x_0) = f'(x_0)(x - x_0)$ whose coefficients depend continuously on $x_0$.
It should be emphasized that the conditions $\nabla F \neq \mathbf{0}$ and $\mathbf{f}' \neq \mathbf{0}$ in Theorem
3.11 are sufficient for the smoothness of the associated curves but not necessary.
In other words, the condition $\nabla F(\mathbf{a}) = \mathbf{0}$ or $\mathbf{f}'(t_0) = \mathbf{0}$ allows the possibility
of non-smoothness at $\mathbf{a}$ or $\mathbf{f}(t_0)$ but does not guarantee it. For example, suppose
$G(x,y)$ is a $C^1$ function whose gradient does not vanish on the set $S = \{(x,y) : G(x,y) = 0\}$, so that $S$ is a smooth curve, and let $F = G^2$. Then the set where
$F = 0$ coincides with $S$, but $\nabla F = 2G\nabla G = \mathbf{0}$ on $S$! Similarly, as $t$ ranges over
the interval $(-1,1)$, the functions $\mathbf{f}(t)$ and $\mathbf{g}(t) = \mathbf{f}(t^3)$ describe the same curve,
but $\mathbf{g}'(0) = \mathbf{0}$ no matter what $\mathbf{f}$ is.
The following question remains: Suppose $S$ is a subset of $\mathbb{R}^2$ that is described
in one of the forms (i)-(iii), and suppose that the regularity condition $\nabla F \neq \mathbf{0}$ on
$S$ (in case (ii)) or $\mathbf{f}'(t) \neq \mathbf{0}$ for all $t \in (a,b)$ (in case (iii)) is satisfied. Theorem
3.11 shows that every sufficiently small piece of $S$ is a smooth curve, but is the
entire set $S$ a smooth curve? In case (i) the answer is clearly yes. However, in cases
(ii) and (iii) the answer may be no.
The trouble in case (ii) is that $S$ may be disconnected. For example, if $F = GH$, then $S$ is the union of the sets $\{(x,y) : G(x,y) = 0\}$ and $\{(x,y) : H(x,y) = 0\}$, and these sets may well be disjoint and form a disconnection of $S$. (Also see
Exercise 6.)

EXAMPLE 5. Let $F(x,y) = (x^2 + y^2 - 1)(x^2 + y^2 - 2)$. Then the set where
$F = 0$ is the union of two disjoint circles centered at the origin. See Figure
3.3.

EXAMPLE 6. Let $F(x,y) = (x^2 + y^2 - 1)(x^2 + y^2 - 2x)$. Then the set $S$
where $F = 0$ is the union of the circles of radius 1 about $(0,0)$ and $(1,0)$.
These circles intersect at the points $(\frac{1}{2}, \pm\frac{1}{2}\sqrt{3})$, and $S$ is not a smooth curve
at these points. The reader may verify that $\nabla F = (0,0)$ at these points, in
accordance with Theorem 3.11. See Figure 3.3 and also Exercise 6.
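The claim in Example 6 is easy to confirm numerically. With $P = x^2 + y^2 - 1$ and $Q = x^2 + y^2 - 2x$, the product rule gives $\nabla F = Q\nabla P + P\nabla Q$, which vanishes wherever $P = Q = 0$. A short Python check; the comparison point $(-1,0)$ is an arbitrary point of $S$ away from the crossings.

```python
import math

def F(x, y):
    return (x**2 + y**2 - 1) * (x**2 + y**2 - 2 * x)

def grad_F(x, y):
    # Product rule: with P = x^2 + y^2 - 1 and Q = x^2 + y^2 - 2x,
    # grad F = Q * grad P + P * grad Q
    P = x**2 + y**2 - 1
    Q = x**2 + y**2 - 2 * x
    gP = (2 * x, 2 * y)
    gQ = (2 * x - 2, 2 * y)
    return (Q * gP[0] + P * gQ[0], Q * gP[1] + P * gQ[1])

for y in (math.sqrt(3) / 2, -math.sqrt(3) / 2):
    print(F(0.5, y), grad_F(0.5, y))   # all zero up to rounding
print(grad_F(-1.0, 0.0))               # (-6.0, 0.0): nonzero at a smooth point of S
```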

As for the representation (iii), a set of the form $\{\mathbf{f}(t) : a < t < b\}$ is necessarily
connected if $\mathbf{f}$ is continuous (Theorem 1.26). However, the function $\mathbf{f}(t)$ may not
be one-to-one, in which case the curve it describes may be traced more than once
(as we observed in Example 4) or may cross itself. These phenomena can happen
even if $\mathbf{f}'(t)$ never vanishes. Consequently, the condition $\mathbf{f}'(t) \neq \mathbf{0}$ is not sufficient
to guarantee that the set $S = \{\mathbf{f}(t) : t \in (a,b)\}$ is a smooth curve, only that
the pieces of it obtained by restricting $t$ to small intervals are smooth curves. In
practice, one sometimes simply imposes the extra assumption that $\mathbf{f}$ is one-to-one
in order to avoid various pitfalls.

FIGURE 3.3: The sets in Examples 5 (left), 6 (middle), and 8 (right).

EXAMPLE 7. Let $\mathbf{f}(t) = (\cos t, \sin t)$. Then $\mathbf{f}'(t) = (-\sin t, \cos t)$ is never
zero since the sine and cosine functions have no common zeros, but $\mathbf{f}$ is one-to-one on the interval $(a,b)$ only when $b - a \leq 2\pi$. The range $\{\mathbf{f}(t) : t \in \mathbb{R}\}$ of $\mathbf{f}$
is a smooth curve (namely, the unit circle), but in order to obtain a one-to-one
correspondence between points on the circle and values of the parameter $t$, one
must restrict $t$ to an interval of the form $[a, a + 2\pi)$ or $(a, a + 2\pi]$.

EXAMPLE 8. Let $\mathbf{f}(t) = (t^3 - t, t^2)$. Then $\mathbf{f}'(t) = (3t^2 - 1, 2t)$ never vanishes,
but $\mathbf{f}(-1) = \mathbf{f}(1) = (0,1)$. The curve $\{\mathbf{f}(t) : t \in \mathbb{R}\}$ loops around and
crosses itself at $(0,1)$, so it fails to be a smooth curve at that point. However,
$\{\mathbf{f}(t) : t \in I\}$ is a smooth curve as long as $I$ is an interval that does not contain
both $-1$ and $1$. See Figure 3.3.
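A direct check of Example 8 in Python: $\mathbf{f}'(t)$ has no zero, yet $\mathbf{f}(-1) = \mathbf{f}(1)$. (Solving $\mathbf{f}(s) = \mathbf{f}(t)$ with $s \neq t$ forces $s = -t$ from the second coordinate and then $s^3 - s = 0$ from the first, so $\{-1, 1\}$ is the only such pair.)

```python
def f(t):
    return (t**3 - t, t**2)

def df(t):
    return (3 * t**2 - 1, 2 * t)

print(f(-1), f(1))     # (0, 1) (0, 1): the curve passes through (0, 1) twice
# df(t) = (0, 0) would need 2t = 0, i.e. t = 0, but df(0) = (-1, 0).
print(df(0))           # (-1, 0)
# The two branches through (0, 1) have non-parallel tangent vectors.
d1, d2 = df(-1), df(1)
print(d1, d2, d1[0] * d2[1] - d1[1] * d2[0])   # (2, -2) (2, 2) 8
```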

The reader with access to a computer graphics program may find it entertaining
to experiment with examples similar to the ones in this section to obtain a better
understanding of the relations between analytic and geometric properties of functions and to see the various types of singularities that can arise when the regularity
condition $\nabla F \neq \mathbf{0}$ or $\mathbf{f}'(t) \neq \mathbf{0}$ is violated.

EXERCISES

1. For each of the following functions $F(x,y)$, determine whether the set $S = \{(x,y) : F(x,y) = 0\}$ is a smooth curve. Draw a sketch of $S$. Examine the
nature of $S$ near any points where $\nabla F = \mathbf{0}$. Near which points of $S$ is $S$ the
graph of a function $y = f(x)$? $x = f(y)$?
a. $F(x,y) = x^2 + 3y^2 - 3$.
b. $F(x,y) = x^2 - 3y^2 - 3$.
c. F(x,y)=x
d. $F(x,y) = xy(x + y - 1)$.
e. $F(x,y) = (x^2 + y^2)(y - x^2 - 1)$.
f. $F(x,y) = (x^2 + y^2)(y - x^2)$.
g. $F(x,y) = (e^x - 1)^2 + (\sin y - 1)^2$.

2. Let $S_p = \{(x,y) : x^p + y^p = 1\}$, where $p$ is a positive integer.
a. Show that $S_p$ is a smooth curve for all $p$.
b. Draw a sketch of $S_p$. (The geometry of $S_p$ depends strongly on whether $p$
is even or odd.)
c. Which portions of $S_p$ can be represented as the graph of a continuous function $y = f(x)$? $x = f(y)$? What if $f$ is required to be $C^1$? (Again, the
cases $p$ even, $p$ odd and $> 1$, and $p = 1$ are different.)

3. For each of the following functions $\mathbf{f}(t)$, determine whether the set $S = \{\mathbf{f}(t) : t \in \mathbb{R}\}$ is a smooth curve. Draw a sketch of $S$. Examine the nature of $S$ near
any points $\mathbf{f}(t)$ where $\mathbf{f}'(t) = \mathbf{0}$.
a. $\mathbf{f}(t) = (t^2 - 1, t + 1)$.
b. $\mathbf{f}(t) = (t^2 - 1, t^2 + 1)$.
c. $\mathbf{f}(t) = (t^3 - 1, t^3 + 1)$.
d. $\mathbf{f}(t) = (\cos^3 t, \sin^3 t)$.
e. $\mathbf{f}(t) = (\cos t + \cos 2t, \sin t + \sin 2t)$.

4. Let $\varphi(s) = s^2$ if $s \geq 0$, $\varphi(s) = -s^2$ if $s < 0$.
a. Show that $\varphi$ is of class $C^1$, even at $s = 0$.
b. Let $\mathbf{f}(t) = (\varphi(\cos t), \varphi(\sin t))$. Show that $\{\mathbf{f}(t) : t \in \mathbb{R}\}$ is the square
with vertices at $(\pm 1, 0)$ and $(0, \pm 1)$. For which values of $t$ is $\mathbf{f}'(t) = \mathbf{0}$?
What are the corresponding points $\mathbf{f}(t)$?

5. Let $\mathbf{f}(t) = \big((t^2 - 1)/(t^2 + 1),\ t(t^2 - 1)/(t^2 + 1)\big)$ and $S = \{\mathbf{f}(t) : t \in \mathbb{R}\}$.
a. Show that $S$ is the locus of the equation $y^2(1 - x) = x^2(1 + x)$.
b. Draw a sketch of $S$. ($S$ is a curve containing a loop; it is called a strophoid.)
Show that $S$ is asymptotic to the line $x = 1$.
c. Discuss the nature of the point $(0,0)$ where $S$ crosses itself in terms of the
parametric and nonparametric representations of $S$ in (a).

6. Let $F_1$ and $F_2$ be $C^1$ functions on some open set $U$ in the plane, and let $F_3 = F_1F_2$. For $j = 1, 2, 3$, let $S_j = \{\mathbf{x} \in U : F_j(\mathbf{x}) = 0\}$.
a. Show that $S_3 = S_1 \cup S_2$.
b. Show that if $\mathbf{a} \in S_1 \cap S_2$, then $\nabla F_3(\mathbf{a}) = \mathbf{0}$.

3.3 Surfaces and Curves in Space

In this section we discuss ways of representing smooth surfaces and curves in $\mathbb{R}^3$,
with a brief sketch of the situation in higher dimensions.

Surfaces in $\mathbb{R}^3$. The standard ways of representing surfaces in 3-space are
analogous to the standard ways of representing curves in the plane:

i. as the graph of a function, $z = f(x,y)$ (or $y = f(x,z)$ or $x = f(y,z)$),
where $f$ is of class $C^1$;

ii. as the locus of an equation $F(x,y,z) = 0$, where $F$ is of class $C^1$;

iii. parametrically, as the range of a $C^1$ function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$.

As before, (i) is a special case of (ii) and (iii), with $F(x,y,z) = z - f(x,y)$ and
$\mathbf{f}(u,v) = (u, v, f(u,v))$, and as before, some additional conditions need to be
imposed in cases (ii) and (iii) in order to guarantee the smoothness of the surface.
The condition in case (ii) is exactly the same as for curves, namely, that

$$\nabla F(x,y,z) \neq (0,0,0) \quad\text{whenever}\quad F(x,y,z) = 0. \tag{3.12}$$

The situation in case (iii) needs to be examined a little more closely.

To be precise, we assume that $\mathbf{f}$ is a $C^1$ map from some open set $U \subset \mathbb{R}^2$ into
$\mathbb{R}^3$, and we consider the set

$$S = \{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \mathbf{f}(\mathbf{u}),\ \mathbf{u} \in U\}.$$

Here $\mathbf{x} = (x,y,z)$ and $\mathbf{u} = (u,v)$; the variables $u$ and $v$ are the parameters used to
represent the surface $S$. We can think of them as giving a coordinate system on $S$,
with the coordinate grid being formed by the images of the lines $v = $ constant and
$u = $ constant, that is, the curves given parametrically by $\mathbf{x} = \mathbf{f}(u,c)$ and $\mathbf{x} = \mathbf{f}(c,v)$.
The picture is as in Figure 3.4.
What is the appropriate nondegeneracy condition on the derivatives of $\mathbf{f}$? A first
guess might be that the Fréchet derivative $D\mathbf{f}$ (a $3 \times 2$ matrix) should be nonzero,
but this is not enough. We can obtain more insight by looking at the case where
$\mathbf{f}$ is linear, that is, $\mathbf{f}(u,v) = u\mathbf{a} + v\mathbf{b} + \mathbf{c}$ for some $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^3$. Typically the
range of such an $\mathbf{f}$ is a plane, but if the vectors $\mathbf{a}$ and $\mathbf{b}$ are linearly dependent
(that is, if one is a scalar multiple of the other), it will only be a line (unless
$\mathbf{a} = \mathbf{b} = \mathbf{0}$, in which case it is a single point). Now, for a general smooth $\mathbf{f}$, the
linear approximation to $\mathbf{f}$ near a point $(u_0,v_0)$ is $\mathbf{f}(u,v) \approx (u - u_0)\mathbf{a} + (v - v_0)\mathbf{b} + \mathbf{c}$, where the
vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are $\partial_u\mathbf{f}$, $\partial_v\mathbf{f}$, and $\mathbf{f}$ evaluated at $(u_0,v_0)$. Hence we are led to
the regularity hypothesis:

$$\text{the vectors } \frac{\partial \mathbf{f}}{\partial u}(u,v) \text{ and } \frac{\partial \mathbf{f}}{\partial v}(u,v) \text{ are linearly independent at each } (u,v) \in U. \tag{3.13}$$

3.3. Surfaces and Curves in Space 111

FIGURE 3.4: Parametric representation of a surface.

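Since two vectors in $\mathbb{R}^3$ are linearly independent if and only if their cross product
is nonzero, (3.13) can be restated as

$$\frac{\partial \mathbf{f}}{\partial u}(u,v) \times \frac{\partial \mathbf{f}}{\partial v}(u,v) \neq \mathbf{0} \quad\text{at each } (u,v) \in U. \tag{3.14}$$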

If $S$ is the graph of a function $f$ and we take the standard parametrization $\mathbf{f}(u,v) = (u, v, f(u,v))$, the condition (3.13) or (3.14) is automatically satisfied, because
$\partial_u\mathbf{f} = (1, 0, \partial_u f)$ and $\partial_v\mathbf{f} = (0, 1, \partial_v f)$.
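For the standard graph parametrization, the cross product in (3.14) can be written down once and for all: $(1,0,\partial_u f) \times (0,1,\partial_v f) = (-\partial_u f, -\partial_v f, 1)$, whose last component is never 0. A small Python check, using the hypothetical graph $z = u^2 + v^2$ as sample data:

```python
def cross(a, b):
    """Cross product of two vectors in R^3 given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def graph_normal(fu, fv):
    """Cross product of d_u f = (1, 0, f_u) and d_v f = (0, 1, f_v)."""
    return cross((1.0, 0.0, fu), (0.0, 1.0, fv))

# Hypothetical sample graph z = u^2 + v^2, so f_u = 2u and f_v = 2v.
for u, v in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    print((u, v), graph_normal(2 * u, 2 * v))   # (-f_u, -f_v, 1.0): never the zero vector
```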
Notice that $\partial_u\mathbf{f}$ and $\partial_v\mathbf{f}$ are the tangent vectors to the "coordinate curves"
$\mathbf{x} = \mathbf{f}(u,c)$ and $\mathbf{x} = \mathbf{f}(c,v)$ described above. Thus, the condition (3.13) means
that these tangent vectors, at each point of the surface, are nonzero and point in
different directions; this implies that the coordinate curves are smooth and intersect
nontangentially.
With these things in mind, we arrive at the analogue of Theorem 3.11 for surfaces.

3.15 Theorem.
a. Let $F$ be a real-valued function of class $C^1$ on an open set in $\mathbb{R}^3$, and let
$S = \{(x,y,z) : F(x,y,z) = 0\}$. If $\mathbf{a} \in S$ and $\nabla F(\mathbf{a}) \neq \mathbf{0}$, there is a
neighborhood $N$ of $\mathbf{a}$ in $\mathbb{R}^3$ such that $S \cap N$ is the graph of a $C^1$ function $f$
(either $z = f(x,y)$, $y = f(x,z)$, or $x = f(y,z)$).
b. Let $\mathbf{f}$ be a $C^1$ mapping from an open set in $\mathbb{R}^2$ into $\mathbb{R}^3$. If $[\partial_u\mathbf{f} \times \partial_v\mathbf{f}](u_0,v_0) \neq \mathbf{0}$, there is a neighborhood $N$ of $(u_0,v_0)$ in $\mathbb{R}^2$ such that the set $\{\mathbf{f}(u,v) : (u,v) \in N\}$ is the graph of a $C^1$ function.

Proof. Part (a) is a special case of Corollary 3.3. As for (b), let $\mathbf{f} = (\varphi, \psi, \chi)$. The
components of the cross product $\partial_u\mathbf{f} \times \partial_v\mathbf{f}$ are just the Jacobians $\partial(\psi,\chi)/\partial(u,v)$,
$\partial(\chi,\varphi)/\partial(u,v)$, and $\partial(\varphi,\psi)/\partial(u,v)$. Under the hypothesis of (b), at least one of
them, let us say $\partial(\varphi,\psi)/\partial(u,v)$, is nonzero at $(u_0,v_0)$. The implicit function
theorem then guarantees that the pair of equations $x = \varphi(u,v)$, $y = \psi(u,v)$ can
be solved to yield $u$ and $v$ as $C^1$ functions of $x$ and $y$ near $u = u_0$, $v = v_0$,
$x = \varphi(u_0,v_0)$, $y = \psi(u_0,v_0)$. Substituting these functions for $u$ and $v$ in the
equation $z = \chi(u,v)$ then yields $z$ as a $C^1$ function of $x$ and $y$ whose graph is the
range of $\mathbf{f}$ restricted to a neighborhood of $(u_0,v_0)$. □

Thus the representations (i)-(iii) for surfaces are locally equivalent in the presence of the regularity conditions (3.12) and (3.13); a smooth surface is a connected
subset of $\mathbb{R}^3$ that can be locally described in any of these three forms. The potential global problems with the representations (ii) and (iii) are the same as for plane
curves; namely, the set where a $C^1$ function $F$ vanishes may be disconnected, and
a map $\mathbf{f}$ that is locally one-to-one need not be globally one-to-one.

EXAMPLE 1. Let $\mathbf{f}(u,v) = \big((u+v)\cos(u-v),\ (u+v)\sin(u-v),\ u+v\big)$. The set $S = \mathbf{f}(\mathbb{R}^2)$ is a right circular cone with vertex at the origin; it
is described nonparametrically by the equation $x^2 + y^2 = z^2$. The set $S$ is
a smooth surface except at the origin, which accords with the fact that the
gradient of $F(x,y,z) = x^2 + y^2 - z^2$ vanishes at the origin and nowhere else.
Correspondingly, the vectors

$$\partial_u\mathbf{f} = \big(\cos(u-v) - (u+v)\sin(u-v),\ \sin(u-v) + (u+v)\cos(u-v),\ 1\big)$$

and

$$\partial_v\mathbf{f} = \big(\cos(u-v) + (u+v)\sin(u-v),\ \sin(u-v) - (u+v)\cos(u-v),\ 1\big)$$

are linearly independent except when $u + v = 0$, in which case they coincide.
The map $\mathbf{f}$ is locally one-to-one except along the line $u + v = 0$, and this entire
line is mapped to the origin. (The reader will recognize that $u + v$ and $u - v$ are
really the $r$ and $\theta$ of cylindrical coordinates in $\mathbb{R}^3$. We have chosen to disguise
them a little in order to display a situation where $\partial_u\mathbf{f}$ and $\partial_v\mathbf{f}$ are both nonzero
but are linearly dependent where the singularities occur.)
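Working out the cross product for Example 1 by hand, with $r = u + v$ and $\theta = u - v$, gives $\partial_u\mathbf{f} \times \partial_v\mathbf{f} = (2r\cos\theta,\ 2r\sin\theta,\ -2r)$, of length $2\sqrt{2}\,|u + v|$; so (3.14) fails exactly on the line $u + v = 0$. The Python sketch below compares this closed form with a direct numerical cross product at a few sample points of our own choosing.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def partials(u, v):
    """The vectors d_u f and d_v f of Example 1, with r = u + v and th = u - v."""
    r, th = u + v, u - v
    fu = (math.cos(th) - r * math.sin(th), math.sin(th) + r * math.cos(th), 1.0)
    fv = (math.cos(th) + r * math.sin(th), math.sin(th) - r * math.cos(th), 1.0)
    return fu, fv

def cross_norm(u, v):
    fu, fv = partials(u, v)
    return math.sqrt(sum(c * c for c in cross(fu, fv)))

samples = [(0.3, -0.3), (1.0, 0.5), (-2.0, 0.7)]   # the first lies on u + v = 0
for u, v in samples:
    print(u + v, cross_norm(u, v), 2 * math.sqrt(2) * abs(u + v))
```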

EXAMPLE 2. The unit sphere $S = \{(x,y,z) : x^2 + y^2 + z^2 = 1\}$ can be
parametrized by spherical coordinates,

$$\mathbf{f}(\theta, \varphi) = (\cos\theta\sin\varphi,\ \sin\theta\sin\varphi,\ \cos\varphi).$$

Here $\theta$ is the longitude and $\varphi$ is the co-latitude, i.e., the latitude as measured
from the north pole rather than the equator. The longitude $\theta$ is only well defined
up to multiples of $2\pi$, but the co-latitude is usually restricted to the interval
$[0,\pi]$. The sphere is a smooth surface, but the map $\mathbf{f}$ does not provide a "good"
parametrization of the whole sphere because it is not locally one-to-one when
$\sin\varphi = 0$. (That is, the longitude is completely undetermined at the north and
south poles.) This degeneracy is also reflected in the tangent vectors

$$\partial_\theta\mathbf{f} = (-\sin\theta\sin\varphi,\ \cos\theta\sin\varphi,\ 0), \qquad \partial_\varphi\mathbf{f} = (\cos\theta\cos\varphi,\ \sin\theta\cos\varphi,\ -\sin\varphi);$$

they are linearly independent when $\sin\varphi \neq 0$, but $\partial_\theta\mathbf{f} = \mathbf{0}$ when $\sin\varphi = 0$.
However, if we restrict $\theta$ and $\varphi$ to the rectangle $-\pi < \theta < \pi$, $0 < \varphi < \pi$, we
obtain a good parametrization of the sphere with the "international date line"
removed.
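For the spherical-coordinate parametrization of Example 2, a short computation gives $\partial_\theta\mathbf{f} \times \partial_\varphi\mathbf{f} = -\sin\varphi\,\mathbf{f}(\theta,\varphi)$, so the cross product has length $|\sin\varphi|$ and vanishes exactly at the poles. A Python sketch checking this at a few sample points (our own choices):

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def f(th, ph):
    return (math.cos(th) * math.sin(ph), math.sin(th) * math.sin(ph), math.cos(ph))

def d_theta(th, ph):
    return (-math.sin(th) * math.sin(ph), math.cos(th) * math.sin(ph), 0.0)

def d_phi(th, ph):
    return (math.cos(th) * math.cos(ph), math.sin(th) * math.cos(ph), -math.sin(ph))

samples = [(0.4, 1.2), (-2.5, 0.3), (1.0, 0.0)]   # the last sample is the north pole
for th, ph in samples:
    n = cross(d_theta(th, ph), d_phi(th, ph))
    length = math.sqrt(sum(c * c for c in n))
    print(ph, length, abs(math.sin(ph)))   # length agrees with |sin(ph)|
```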

Finally, a few words about finding the tangent plane to a smooth surface $S$ at a
point $\mathbf{a} \in S$. In general, the tangent plane is given by the equation $\mathbf{n} \cdot (\mathbf{x} - \mathbf{a}) = 0$,
where $\mathbf{n}$ is a (nonzero) normal vector to $S$ at $\mathbf{a}$. We have already observed in
Theorem 2.37 that when $S$ is given by an equation $F = 0$, the vector $\nabla F(\mathbf{a})$
is normal to $S$ at $\mathbf{a}$. On the other hand, when $S$ is given parametrically as the range
of a map $\mathbf{f}(u,v)$, the vectors $\partial_u\mathbf{f}$ and $\partial_v\mathbf{f}$, evaluated at the parameter point that is mapped to $\mathbf{a}$, are tangent to certain curves in $S$
and hence to $S$ itself at $\mathbf{a}$; we therefore obtain a normal at $\mathbf{a}$ by taking their cross
product. In both cases, the conditions on $F$ or $\mathbf{f}$ that guarantee the smoothness of
$S$ also guarantee that these normal vectors are nonzero.

Curves in $\mathbb{R}^3$. Curves in $\mathbb{R}^3$ are generally described either parametrically or
as the intersection of two surfaces. The situation where two of the coordinates are
given as functions of the third one can be considered as a special case of either of
these. Thus, once again we have three kinds of representation for curves:

i. as a graph, $y = f(x)$ and $z = g(x)$ (or similar expressions with the coordinates permuted), where $f$ and $g$ are $C^1$ functions;

ii. as the locus of two equations $F(x,y,z) = G(x,y,z) = 0$, where $F$ and $G$
are $C^1$ functions;

iii. parametrically, as the range of a $C^1$ function $\mathbf{f} : \mathbb{R} \to \mathbb{R}^3$.

The form (ii) describes the curve as the intersection of the two surfaces $F = 0$
and $G = 0$, and (i) is a special case of (ii) (with $F(x,y,z) = y - f(x)$ and
$G(x,y,z) = z - g(x)$) and of (iii) (with $\mathbf{f}(t) = (t, f(t), g(t))$).
By now the reader should be able to guess what the appropriate regularity condition for cases (ii) and (iii) is. In (iii) it is simply that $\mathbf{f}'(t) \neq \mathbf{0}$, and in (ii) it is
that

$$\nabla F(\mathbf{x}) \text{ and } \nabla G(\mathbf{x}) \text{ are linearly independent at every } \mathbf{x} \text{ at which } F(\mathbf{x}) = G(\mathbf{x}) = 0.$$

(Geometrically, this means that the surfaces $F = 0$ and $G = 0$ are nowhere tangent
to each other.) With these conditions we have an analogue of Theorems 3.11 and
3.15. Rather than give another precise statement and proof, we sketch the ideas and
leave the details to the reader (Exercise 7).
First, if $\nabla F$ and $\nabla G$ are linearly independent, then at least one of the Jacobians
$\partial(F,G)/\partial(x,y)$, $\partial(F,G)/\partial(x,z)$, and $\partial(F,G)/\partial(y,z)$ must be nonzero; let us
say the last one. Then the implicit function theorem guarantees that the equations
$F = G = 0$ can be solved for $y$ and $z$ as functions of $x$. Second, if $\mathbf{f}'(t) \neq \mathbf{0}$,
then one of the components of $\mathbf{f}'(t)$ must be nonzero; let us say the first one. Then
the equation $x = f_1(t)$ can be solved for $t$ in terms of $x$, and then the equations
$y = f_2(t)$ and $z = f_3(t)$ yield $y$ and $z$ as functions of $x$. In either case we end up
with the representation (i).
Let us say a little more about what can go wrong in case (ii) when $\nabla F$ and
$\nabla G$ are linearly dependent. The potential problems are clearly displayed in the
following situation: Let $F(x,y,z) = z - \varphi(x,y)$, where $\varphi$ is a $C^1$ function, and
let $G(x,y,z) = z$. Then the sets where $F = 0$ and $G = 0$ are smooth surfaces; the
former is the graph of $\varphi$, whereas the latter is the $xy$-plane. The intersection of these
two surfaces is the curve in the $xy$-plane described by the equation $\varphi(x,y) = 0$.
Now, this curve can have all sorts of singularities if there are points on it where
$\nabla\varphi = (0,0)$, as we have discussed in §3.2. But since $\nabla F = (-\partial_x\varphi, -\partial_y\varphi, 1)$ and
$\nabla G = (0,0,1)$, the points where $\nabla\varphi = (0,0)$ are precisely the points where $\nabla F$
and $\nabla G$ are linearly dependent.
If a curve $S$ is represented parametrically by a function $\mathbf{f}(t)$, the derivative $\mathbf{f}'(t)$
furnishes a tangent vector to $S$ at the point $\mathbf{f}(t)$. On the other hand, if $S$ is given
by a pair of equations $F = G = 0$ and $\mathbf{a} \in S$, the vectors $\nabla F(\mathbf{a})$ and $\nabla G(\mathbf{a})$ are
both normal to $S$ at $\mathbf{a}$ and hence span the normal plane to $S$ at $\mathbf{a}$. One can therefore
obtain a tangent vector to $S$ at $\mathbf{a}$ by taking their cross product.
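A small Python check that the two recipes agree, on a hypothetical example of our own: the sphere $F = x^2 + y^2 + z^2 - 1 = 0$ cut by the plane $G = z - \frac{1}{2} = 0$, a circle of radius $r = \frac{1}{2}\sqrt{3}$ parametrized by $\mathbf{f}(t) = (r\cos t, r\sin t, \frac{1}{2})$. Here $\nabla F \times \nabla G = (2y, -2x, 0) = -2\mathbf{f}'(t)$, so the two tangent vectors are parallel.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

r = math.sqrt(3) / 2
t = 0.8                                               # an arbitrary parameter value
a = (r * math.cos(t), r * math.sin(t), 0.5)           # a point of S
grad_F = (2 * a[0], 2 * a[1], 2 * a[2])               # gradient of x^2 + y^2 + z^2 - 1
grad_G = (0.0, 0.0, 1.0)                              # gradient of z - 1/2
tangent_from_gradients = cross(grad_F, grad_G)        # (2y, -2x, 0)
tangent_from_param = (-r * math.sin(t), r * math.cos(t), 0.0)   # f'(t)

# Parallel vectors have zero cross product.
print(cross(tangent_from_gradients, tangent_from_param))
```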

Higher Dimensions. The pattern for representations of curves and surfaces that
we have established in this section and the preceding one should be pretty clear by
now, and it generalizes readily to higher dimensions. We sketch the main points
briefly and leave it to the ambitious reader to work out the details.
The general name for a "smooth $k$-dimensional object" is manifold; thus, a
curve is a 1-dimensional manifold and a surface is a 2-dimensional manifold. Here
we consider the question of representing $k$-dimensional manifolds in $\mathbb{R}^n$, for any
positive integers $k$ and $n$ with $n > k$. The two general forms, corresponding to (ii)
and (iii) above for curves and surfaces, are as follows.
The Nonparametric Form: A $k$-dimensional manifold $S$ in $\mathbb{R}^n$ can be described
as the set of simultaneous solutions of $n - k$ equations. That is, given $C^1$ functions
$F_1, \ldots, F_{n-k}$ defined on some open set $U \subset \mathbb{R}^n$, or equivalently a $C^1$ mapping
$\mathbf{F} = (F_1, \ldots, F_{n-k})$ from $U$ into $\mathbb{R}^{n-k}$, we can consider the set

$$S = \{\mathbf{x} : \mathbf{F}(\mathbf{x}) = \mathbf{0}\}. \tag{3.16}$$

The regularity condition that guarantees that $S$ is a smooth $k$-dimensional manifold
is that

$\nabla F_1(\mathbf{x}), \ldots, \nabla F_{n-k}(\mathbf{x})$ are linearly independent at each $\mathbf{x} \in S$,

or, equivalently,

the matrix $D\mathbf{F}(\mathbf{x})$ has rank $n - k$ at every $\mathbf{x} \in S$.

This condition implies that, for each $\mathbf{x}_0 \in S$, some $(n-k) \times (n-k)$ submatrix
of $D\mathbf{F}(\mathbf{x}_0)$ is nonsingular, and the implicit function theorem then implies that the
equations $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ can be solved near $\mathbf{x}_0$ for $n - k$ of the variables as $C^1$ functions
of the remaining $k$ variables. This leads to the more special representation of (small
pieces of) $S$ by the equations analogous to (i) for curves and surfaces, namely,
$\mathbf{x}'' = \mathbf{g}(\mathbf{x}')$, where $\mathbf{x}''$ represents an ordered $(n-k)$-tuple of coordinates and $\mathbf{x}'$ is
the ordered $k$-tuple of remaining coordinates.
The Parametric Form: Given a $C^1$ map $f$ from some open set $V \subset \mathbb{R}^k$ into $\mathbb{R}^n$,
we can consider the set

(3.17) $S = \{f(u) : u \in V\}$.

The regularity condition that guarantees that $S$ is a smooth $k$-dimensional manifold
is that

$\partial_{u_1} f(u), \dots, \partial_{u_k} f(u)$ are linearly independent at each $u \in V$,


or equivalently,

the matrix $Df(u)$ has rank $k$ at each $u \in V$.

This condition implies that, for each $u_0 \in V$, some $k \times k$ submatrix of $Df(u_0)$
is invertible, say the one formed from the rows $i_1, \dots, i_k$. The implicit function
theorem then implies that the equations $x_{i_j} = f_{i_j}(u_1, \dots, u_k)$ $(1 \le j \le k)$ can
be solved near $u_0$ to yield the $u_j$'s as $C^1$ functions of $x' = (x_{i_1}, \dots, x_{i_k})$. Substituting
these functions for the $u_j$'s in the remaining equations $x_i = f_i(u_1, \dots, u_k)$
again yields a representation of (small pieces of) $S$ analogous to (i) for curves and
surfaces.
It is perhaps worth pointing out what these two representations boil down to
in the linear case. That is, suppose $S$ is a $k$-dimensional vector subspace of $\mathbb{R}^n$;
then $S$ can be represented in the forms (3.16) or (3.17) where the functions $F$ and
$f$ are linear and hence are given by matrices. (3.16) is the representation of $S$ as
the nullspace of an $(n-k) \times n$ matrix, and (3.17) is the representation of $S$ as the
column space of an $n \times k$ matrix; in both cases the regularity condition is that the
rank of the matrix in question is as large as possible.
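In the linear case the two descriptions can be compared directly. The sketch below uses an illustrative plane of our own choosing, $S = \{x + y + z = 0\} \subset \mathbb{R}^3$ ($n = 3$, $k = 2$), written both as the nullspace of a $1 \times 3$ matrix `A` and as the column space of a $3 \times 2$ matrix `B`. Because $AB = 0$ and the two ranks add up to $n$, the rank-nullity theorem forces the column space of `B` to coincide with the nullspace of `A`. The helpers `rank` and `matmul` are our own.

```python
from fractions import Fraction

def rank(rows):
    """Rank via exact Gaussian elimination (rows = list of lists)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

n, k = 3, 2
A = [[1, 1, 1]]                  # (n-k) x n: S = nullspace of A, form (3.16)
B = [[1, 0], [0, 1], [-1, -1]]   # n x k:     S = column space of B, form (3.17)

assert matmul(A, B) == [[0, 0]]           # columns of B satisfy A x = 0
assert rank(A) == n - k and rank(B) == k  # both matrices have maximal rank
```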

EXERCISES

1. For each of the following maps $f : \mathbb{R}^2 \to \mathbb{R}^3$, describe the surface $S = f(\mathbb{R}^2)$
and find a description of $S$ as the locus of an equation $F(x, y, z) = 0$. Find the
points where $\partial_u f$ and $\partial_v f$ are linearly dependent, and describe the singularities
of $S$ (if any) at these points.
a. $f(u, v) = (2u + v, u - v, 3u)$.
b. $f(u, v) = (au\cos v, bu\sin v, u)$ $(a, b > 0)$.
c. $f(u, v) = (\cos u\cosh v, \sin u\cosh v, \sinh v)$.
d. $f(u, v) = (u\cos v, u\sin v, u^2)$.
2. Find an equation for the tangent plane to the following parametrized surfaces
at the point $(1, -2, 1)$. (The first step is to find the values of the parameters $u, v$
that yield this point.)
a. $x = e^{u-v}$, $y = u - 3v$, $z = \frac{1}{2}(u^2 + v^2)$.
b. $x = 1/(u + v)$, $y = -(u + e^v)$, $z = u^3$.
3. Find a parametrization for each of the following surfaces (perhaps involving an
angular variable that is defined only up to multiples of $2\pi$).
a. The surface obtained by revolving the curve $z = f(x)$ $(a < x < b)$ in the
$xz$-plane around the $z$-axis, where $a > 0$.

b. The surface obtained by revolving the curve $z = f(x)$ $(a < x < b)$ in the
$xz$-plane around the $x$-axis, where $f(x) > 0$.
c. The lower sheet of the hyperboloid $z^2 - 2x^2 - y^2 = 1$.
d. The cylinder $x^2 + z^2 = 9$.
4. Find a parametric description of the following lines:
a. The intersection of the planes $x - 2y + z = 3$ and $2x - y - z = -1$.
b. The intersection of the planes $x + 2y = 3$ and $y - 3z = 2$.

5. Let $S$ be the circle formed by intersecting the plane $x + z = 1$ with the sphere
$x^2 + y^2 + z^2 = 1$.
a. Find a parametrization of $S$.
b. Find parametric equations for the tangent line to $S$ at the point $(\frac{1}{2}, -\frac{1}{\sqrt{2}}, \frac{1}{2})$.
6. Let $S$ be the intersection of the cone $z^2 = x^2 + y^2$ and the plane $z = ax + 1$,
where $a \in \mathbb{R}$.
a. Show that $S$ is a circle if $a = 0$, an ellipse if $|a| < 1$, a parabola if $|a| = 1$,
and a hyperbola if $|a| > 1$.
b. Find a parametrization for $S$ in the first two cases and for the part of $S$
lying above the $xy$-plane in the third case.

7. Give a precise statement and proof of the analogue of Theorem 3.11 for curves
in $\mathbb{R}^3$.

3.4 Transformations and Coordinate Systems

In this section we study smooth mappings from $\mathbb{R}^n$ to itself in more detail, with
emphasis on geometric intuition for the cases $n = 2$ and $n = 3$.
Suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is a map of class $C^1$. We can regard $f$ as a transformation
of $\mathbb{R}^n$, that is, an operation that moves the points in $\mathbb{R}^n$ around in some
definite fashion. When $n > 1$, such transformations are usually best pictured with
"before and after" sketches. That is, if $x = f(u)$, we think of $u$ and $x$ as living
in two separate copies of $\mathbb{R}^n$. We draw a sketch of $u$-space with some geometric
figures in it, such as a grid of coordinate lines, then draw a sketch of $x$-space with
the images of those figures under the transformation $f$.

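Example 1. Define $f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f(u, v) = \frac{1}{2}(\sqrt{3}u - v, u + \sqrt{3}v)$. The
map $f$ represents a counterclockwise rotation through the angle $\frac{1}{6}\pi$ about the
origin (since $\frac{1}{2}\sqrt{3} = \cos\frac{1}{6}\pi$ and $\frac{1}{2} = \sin\frac{1}{6}\pi$). See Figure 3.5.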
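A quick numerical check of Example 1 (our own sketch, not from the text): the formula for $f$ agrees with the standard counterclockwise rotation through the angle $\pi/6$ at a few sample points, and it preserves distance to the origin. The function names are illustrative.

```python
import math

def f(u, v):
    """Example 1: f(u, v) = (1/2)(sqrt(3) u - v, u + sqrt(3) v)."""
    s3 = math.sqrt(3)
    return (0.5 * (s3 * u - v), 0.5 * (u + s3 * v))

def rotate(u, v, angle):
    """Counterclockwise rotation of (u, v) about the origin through `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * u - s * v, s * u + c * v)

for p in [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0)]:
    fp, rp = f(*p), rotate(*p, math.pi / 6)
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(fp, rp))
    # a rotation preserves distance to the origin
    assert math.isclose(math.hypot(*fp), math.hypot(*p))
```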

EXAMPLE 2. Define $f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f(u, v) = (2u, v)$; $f$ simply stretches out
the $u$ coordinate by a factor of 2. See Figure 3.6.
FIGURE 3.5: The transformation $f(u, v) = \frac{1}{2}(\sqrt{3}u - v, u + \sqrt{3}v)$.

FIGURE 3.6: The transformation $f(u, v) = (2u, v)$.

EXAMPLE 3. Define $f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f(u, v) = (u^2 - v^2, 2uv)$. Unlike the
previous two examples, this $f$ is not one-to-one; it maps $(u, v)$ and $(-u, -v)$ to
the same point. (It's not hard to check that this is the only duplication of values:
If $f(u, v) = f(z, w)$ then $(z, w) = \pm(u, v)$.) In order to draw an intelligible
picture, we restrict attention to the region $u \ge 0$. We also denote $f(u, v)$ by
$(x, y)$, so the "before" and "after" pictures are the $uv$-plane and the $xy$-plane.
The image of the vertical line $u = c$ under $f$ is given by $x = c^2 - v^2$, $y = 2cv$.
Elimination of $v$ yields $x = c^2 - y^2/(4c^2)$, the equation of a parabola that opens
out to the left. On the other hand, the image of the horizontal line $v = c$ is
given by $x = u^2 - c^2$, $y = 2cu$, which yields $x = y^2/(4c^2) - c^2$. Since we
are assuming $u \ge 0$, we have $y \ge 0$ or $y \le 0$ depending on whether $c > 0$
or $c < 0$; in either case this curve is half of a parabola opening to the right.
See Figure 3.7: The $v$-axis is mapped to the negative $x$-axis (both $(0, v)$ and
$(0, -v)$ being mapped to $(-v^2, 0)$), as indicated by the dotted lines, and the
right half of the $uv$-plane is bent to the left to fill up the rest of the $xy$-plane.
We can also draw the reverse picture. The horizontal line $y = c$ in the $xy$-plane
corresponds to the curve $2uv = c$ in the $uv$-plane, which is a hyperbola
whose asymptotes are the coordinate axes. The vertical line $x = c$ corresponds
to the curve $u^2 - v^2 = c$, which is a hyperbola whose asymptotes are the
lines $v = \pm u$ when $c \neq 0$ and the union of these two lines when $c = 0$. See
Figure 3.8.

FIGURE 3.7: The transformation $(x, y) = (u^2 - v^2, 2uv)$, showing the
image in the $xy$-plane of the coordinate grid in the half-plane $u \ge 0$.

FIGURE 3.8: The transformation $(x, y) = (u^2 - v^2, 2uv)$, showing
the curves in the half-plane $u \ge 0$ that map to the coordinate grid in
the $xy$-plane.
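The parabola equations in Example 3 can be spot-checked numerically. In this sketch (our own; the constant `c = 1.5` and the sample points are arbitrary choices) we verify that the images of the lines $u = c$ and $v = c$ satisfy the stated equations, and that $f$ identifies $(u, v)$ with $(-u, -v)$ and sends both $(0, v)$ and $(0, -v)$ to $(-v^2, 0)$.

```python
import math

def f(u, v):
    """Example 3: f(u, v) = (u^2 - v^2, 2uv)."""
    return (u * u - v * v, 2 * u * v)

c = 1.5  # illustrative constant, c != 0
for v in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # image of the vertical line u = c lies on x = c^2 - y^2 / (4c^2)
    x, y = f(c, v)
    assert math.isclose(x, c**2 - y**2 / (4 * c**2))
for u in [0.0, 0.5, 2.0]:
    # image of the horizontal line v = c lies on x = y^2 / (4c^2) - c^2
    x, y = f(u, c)
    assert math.isclose(x, y**2 / (4 * c**2) - c**2)

# (u, v) and (-u, -v) have the same image; (0, v) and (0, -v) both go to (-v^2, 0).
assert f(1.2, -0.7) == f(-1.2, 0.7)
assert f(0.0, 2.0) == f(0.0, -2.0) == (-4.0, 0.0)
```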

We can think of mappings from $\mathbb{R}^3$ to itself pictorially in the same way, though
the pictures are harder to draw. Figure 3.9 shows what happens to a cube under the
transformation $f(u, v, w) = (-2u, v, \frac{1}{2}w)$.
Another common interpretation of a map $f : \mathbb{R}^n \to \mathbb{R}^n$ is as a coordinate
system on $\mathbb{R}^n$. For example, we usually think of $f(r, \theta) = (r\cos\theta, r\sin\theta)$ as
representing polar coordinates in the plane. In the preceding discussion we thought
in terms of moving the points in $\mathbb{R}^n$ around without changing the labeling system
(namely, Cartesian coordinates); here we are thinking of leaving the points alone
but giving them different labels (polar rather than Cartesian coordinates). It's just
a matter of point of view; the same transformation $f$ can be interpreted either way.
For example, the systems of parabolas and hyperbolas in Figures 3.7 and 3.8 can
be interpreted as the grids for curvilinear coordinate systems in the appropriate
parts of the plane, and the map $f(r, \theta) = (r\cos\theta, r\sin\theta)$ can be interpreted as a
transformation. Figure 3.10 shows a representative piece of it.

FIGURE 3.9: The transformation $f(u, v, w) = (-2u, v, \frac{1}{2}w)$. (The $u$
and $w$ axes are horizontal and vertical, respectively.)

FIGURE 3.10: The polar coordinate transformation $(x, y) = (r\cos\theta, r\sin\theta)$.
Not all mappings $f : \mathbb{R}^n \to \mathbb{R}^n$ can be used as coordinate systems, however. A
"good" coordinate system should have the property that there is a one-to-one correspondence
between points and their coordinates; that is, each set of coordinates
should specify a unique point in $\mathbb{R}^n$, and two different sets of coordinates should
specify different points. Polar coordinates, for example, do not satisfy this condition:
$(r, \theta)$ and $(r, \phi)$ are polar coordinates of the same point whenever $\theta - \phi$ is an
integer multiple of $2\pi$. If we restrict to $r > 0$ and $-\pi < \theta < \pi$, however, we do get a
"good" coordinate system, not on the whole plane, but on the plane with the negative real
axis removed. Likewise, the map $(u, v) = (x^2 - y^2, 2xy)$ in Example 3, restricted to the
half-plane $x > 0$, gives a "good" coordinate system on the $uv$-plane with the negative $u$
axis removed.
In short, our attention is directed to transformations $f$ of class $C^1$ that map an
open set $U \subset \mathbb{R}^n$ in a one-to-one fashion onto another open set $V \subset \mathbb{R}^n$. There is
one further requirement that is natural to impose, namely, that the inverse mapping
$f^{-1} : V \to U$ should also be of class $C^1$, so that the correspondence is smooth in
both directions. Hence, the question arises: Given a $C^1$ transformation $f : U \to V$,
when does $f$ possess a $C^1$ inverse $f^{-1} : V \to U$? That is, when can the equation
$f(x) = y$ be solved uniquely for $x$ as a $C^1$ function of $y$?
This question is clearly closely related to the ones that led to the implicit function
theorem, and indeed, if we restrict attention to the solvability of the equation
$f(x) = y$ in a small neighborhood of a point, its answer becomes a special case of
that theorem. As we did before, we can guess what the answer should be by looking
at the linear approximation. If $f(a) = b$, the linear approximation to the equation
$f(x) = y$ at the point $(a, b)$ is $T(x - a) = y - b$ where the matrix $T$ is the Fréchet
derivative $Df(a)$, and the latter equation can be solved for $x$ precisely when $T$ is
invertible, that is, when the Jacobian $\det Df(a)$ is nonzero. We are therefore led to
the following theorem.

3.18 Theorem (The Inverse Mapping Theorem). Let $U$ and $V$ be open sets in $\mathbb{R}^n$,
$a \in U$, and $b = f(a)$. Suppose that $f : U \to V$ is a mapping of class $C^1$ and the
Fréchet derivative $Df(a)$ is invertible (that is, the Jacobian $\det Df(a)$ is nonzero).
Then there exist neighborhoods $M \subset U$ and $N \subset V$ of $a$ and $b$, respectively, so
that $f$ is a one-to-one map from $M$ onto $N$, and the inverse map $f^{-1}$ from $N$ to $M$
is also of class $C^1$. Moreover, if $y = f(x) \in N$, then $D(f^{-1})(y) = [Df(x)]^{-1}$.
Proof. The existence of the inverse map is equivalent to the unique solvability of
the equation $F(x, y) = 0$ for $x$, where $F(x, y) = f(x) - y$. Since the derivative of
$F$ as a function of $x$ is just $Df(x)$, the implicit function theorem (3.9) guarantees
that this unique solvability will hold for $(x, y)$ near $(a, b)$ provided that $Df(a)$ is
invertible. (In referring to the statement of the implicit function theorem, however,
note that the roles of the variables $x$ and $y$ have been reversed here.) Moreover,
since $f^{-1}(f(x)) = x$ for $x \in M$, the chain rule gives $D(f^{-1})(f(x)) \cdot Df(x) = I$
where $I$ is the $n \times n$ identity matrix; in other words, $D(f^{-1})(y) = [Df(x)]^{-1}$
where $y = f(x)$. □
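The formula $D(f^{-1})(y) = [Df(x)]^{-1}$ can be tested numerically. This sketch (our own) uses the map $f(u, v) = (u^2 - v^2, 2uv)$ of Example 3, which is $z \mapsto z^2$ in complex notation; as our choice of local inverse it takes the principal square root `cmath.sqrt`, valid on the plane minus the negative $x$-axis with values in the half-plane $u > 0$. A central-difference estimate of $D(f^{-1})$ is compared with the inverse of the exact Jacobian. The helper names and the step size `h` are illustrative assumptions. (Note that here $(u, v)$ plays the role of $x$ in Theorem 3.18 and $(x, y)$ the role of $y$.)

```python
import cmath

def f(u, v):
    """Example 3 map: (u, v) -> (u^2 - v^2, 2uv), i.e. z -> z^2 with z = u + iv."""
    return (u * u - v * v, 2 * u * v)

def Df(u, v):
    """Exact Jacobian matrix of f."""
    return [[2 * u, -2 * v],
            [2 * v, 2 * u]]

def f_inv(x, y):
    """Local inverse of f off the negative x-axis: the principal square root,
    which takes values in the half-plane u > 0."""
    w = cmath.sqrt(complex(x, y))
    return (w.real, w.imag)

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

def jacobian_fd(g, x, y, h=1e-6):
    """Central-difference approximation to the Jacobian matrix of g at (x, y)."""
    gxp, gxm = g(x + h, y), g(x - h, y)
    gyp, gym = g(x, y + h), g(x, y - h)
    return [[(gxp[i] - gxm[i]) / (2 * h), (gyp[i] - gym[i]) / (2 * h)] for i in range(2)]

u, v = 1.3, -0.4                      # a point with u > 0
x, y = f(u, v)
approx = jacobian_fd(f_inv, x, y)     # D(f^{-1}) at f(u, v), estimated numerically
exact = inverse_2x2(Df(u, v))         # [Df(u, v)]^{-1}
max_err = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

On this example `max_err` is far below `1e-6`, as the theorem predicts.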

It is to be emphasized that the inverse mapping theorem is local in nature; the
global invertibility of $f$ is a more delicate matter. To be more precise, consider the
following question: Suppose $f : U \to V$ is of class $C^1$ and that $Df(x)$ is invertible
for every $x \in U$. Is $f$ one-to-one on $U$?
When $n = 1$, the answer is yes, provided that $U$ is an interval. Here we are
considering a $C^1$ function $f(x)$ such that $f'(x) \neq 0$ on the interval $U = (a, b)$.
Since $f'$ is continuous and never vanishes, the intermediate value theorem shows that
we must have either $f'(x) > 0$ for all $x \in (a, b)$, so that $f$ is strictly increasing,
or $f'(x) < 0$ for all $x \in (a, b)$, so that $f$ is strictly decreasing.
In either case, $f$ is one-to-one.

When $n > 1$, however, the answer is no. The simplest counterexample is
our old friend the polar coordinate map, $f(r, \theta) = (r\cos\theta, r\sin\theta)$, on the set
$U = \{(r, \theta) : r > 0\}$. We have

$$Df(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det Df(r, \theta) = r,$$

so $\det Df \neq 0$ on $U$, but $f$ is not one-to-one since $f(r, \theta + 2k\pi) = f(r, \theta)$. It is,
however, locally one-to-one, in that it is one-to-one if one restricts $\theta$ to any interval
of length less than $2\pi$. (Notice also that the Jacobian of the polar coordinate map
vanishes when $r = 0$. This accords with the fact that the polar coordinate map is
not even locally invertible there; the angular coordinate is completely undetermined
at the origin.)
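The two facts used in this counterexample, $\det Df(r, \theta) = r$ and $f(r, \theta + 2\pi) = f(r, \theta)$, can be checked numerically. The sketch below (our own, with arbitrary sample points) evaluates the determinant of $Df(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}$ entry by entry.

```python
import math

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta):
    """det [[cos t, -r sin t], [sin t, r cos t]], computed entry by entry."""
    a, b = math.cos(theta), -r * math.sin(theta)
    c, d = math.sin(theta), r * math.cos(theta)
    return a * d - b * c

for r, th in [(0.5, 0.3), (2.0, 2.5), (3.0, -1.0)]:
    assert math.isclose(jacobian_det(r, th), r)      # nonzero on U = {r > 0}
    p, q = polar(r, th), polar(r, th + 2 * math.pi)
    # ... yet f is not one-to-one: theta and theta + 2*pi give the same point
    assert all(math.isclose(s, t, abs_tol=1e-12) for s, t in zip(p, q))
```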
The question of global invertibility is a delicate one. Consider the following
situation: Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a map whose component functions are all polynomials,
and suppose that the Jacobian $\det Df$ is identically equal to 1. Is $f$ globally
invertible? The answer is so far unknown; this is a famous unsolved problem.
We should also point out that the invertibility of $Df(a)$ is not necessary for the
existence of an inverse map, although it is necessary for the differentiability of that
inverse. (Example: Let $f(x) = x^3$. Then $f$ has the global inverse $f^{-1}(y) = y^{1/3}$,
but $f(0) = f'(0) = 0$ and $f^{-1}$ is not differentiable at 0.)
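The example $f(x) = x^3$ can be explored numerically. In this sketch (our own), `cube_root` is a real cube root written by hand, because Python's `**` applied to a negative float with a fractional exponent returns a complex number. The difference quotients of $f^{-1}$ at 0 grow like $h^{-2/3}$, reflecting the failure of differentiability there.

```python
def f(x):
    return x ** 3

def cube_root(y):
    """Real cube root, valid for negative y too."""
    return -((-y) ** (1 / 3)) if y < 0 else y ** (1 / 3)

# cube_root inverts f on the whole line ...
for x in [-2.0, -0.5, 0.0, 1.5]:
    assert abs(cube_root(f(x)) - x) < 1e-12

# ... but its difference quotients at 0 blow up like h**(-2/3)
quotients = [cube_root(h) / h for h in (1e-3, 1e-6, 1e-9)]
```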

EXERCISES

1. For each of the following transformations $(u, v) = f(x, y)$, (i) compute the
Jacobian $\det Df$, (ii) draw a sketch of the images of some of the lines $x =$
constant and $y =$ constant in the $uv$-plane, (iii) find formulas for the local
inverses of $f$ when they exist.
a. $u = e^x\cos y$, $v = e^x\sin y$.
b. $u = x^2$, $v = y/x$.
c. $u = x^2 + 2xy + y^2$, $v = 2x + 2y$.
2. Let $(u, v) = f(x, y) = (x - 2y, 2x - y)$.
a. Compute the inverse transformation $(x, y) = f^{-1}(u, v)$.
b. Find the image in the $uv$-plane of the triangle bounded by the lines $y = x$,
$y = -x$, and $y = 1 - 2x$.
c. Find the region in the $xy$-plane that is mapped to the triangle with vertices
$(0, 0)$, $(-1, 2)$, and $(2, 1)$ in the $uv$-plane.

3. Let $u = \sin x\cosh y$, $v = \cos x\sinh y$.

a. Show that the images of the lines $x =$ constant (resp. $y =$ constant) in the
$uv$-plane are hyperbolas (resp. ellipses).
b. Show that $\partial(u, v)/\partial(x, y) = \cos^2 x + \sinh^2 y$. At what points $(x, y)$ does
this Jacobian vanish? Show that the corresponding points in the $uv$-plane
are $(\pm 1, 0)$.
c. (optional) Show that the ellipses and hyperbolas in (a) all have foci at
$(\pm 1, 0)$.
4. Let $(u, v) = f(x, y) = (x - y, xy)$.
a. Sketch some of the curves $x - y =$ constant and $xy =$ constant in the
$xy$-plane. Which regions in the $xy$-plane map onto the rectangle in the
$uv$-plane given by $0 < u < 1$, $1 < v < 4$? There are two of them; draw a
picture of them.
b. Compute the derivative $Df$ and the Jacobian $J = \det Df$.
c. The Jacobian $J$ vanishes at $(a, b)$ precisely when the gradients $\nabla u(a, b)$
and $\nabla v(a, b)$ are linearly dependent, i.e., when the level sets of $u$ and $v$
passing through $(a, b)$ are tangent to each other. (If this doesn't seem
obvious at first, think about it!) Use your sketch of the level sets in (a) to
show pictorially that this assertion is correct.
d. Notice that $f(2, -3) = (5, -6)$. Compute explicitly the local inverse $g$ of
$f$ such that $g(5, -6) = (2, -3)$ and also compute its derivative $Dg$.
e. Show by explicit calculation that the matrices $Df(2, -3)$ and $Dg(5, -6)$
are inverses of each other.

5. Find a one-to-one $C^1$ mapping $f$ from the first quadrant of the $xy$-plane to the
first quadrant of the $uv$-plane such that the region where $x^2 < y < 2x^2$ and
$1 < xy < 3$ is mapped to a rectangle. Compute the Jacobian $\det Df$ and the
inverse mapping $f^{-1}$. (Hint: Map all the regions where $ax^2 < y < bx^2$ and
$c < xy < d$ to rectangles.)
6. Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be the spherical coordinate map,

$(x, y, z) = f(r, \varphi, \theta) = (r\sin\varphi\cos\theta, r\sin\varphi\sin\theta, r\cos\varphi)$.

Thus $r$ is the distance to the origin, $\varphi$ is the co-latitude (the angle from the
positive $z$-axis), and $\theta$ is the longitude.
a. Describe the surfaces in $xyz$-space that are the images of the planes $r =$
constant, $\varphi =$ constant, and $\theta =$ constant.
b. Compute the derivative $Df$ and show that $\det Df(r, \varphi, \theta) = r^2\sin\varphi$.
c. What is the condition on the point $(r_0, \varphi_0, \theta_0)$ for $f$ to be locally invertible
about this point? What is the corresponding condition on $(x_0, y_0, z_0) = f(r_0, \varphi_0, \theta_0)$?

7. We have obtained the inverse mapping theorem as a corollary of the implicit
function theorem. It is also possible to prove the inverse mapping theorem directly
and then obtain the implicit function theorem as a corollary of it. Do this
last step; that is, assume the inverse mapping theorem and deduce the implicit
function theorem from it. (Hint: Let $F(x, y)$ be as in Theorem 3.9. Apply the
inverse mapping theorem to the transformation $G : \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ defined by
$G(x, y) = (x, F(x, y))$.)

3.5 Functional Dependence

In the implicit function theorem and its applications discussed in the preceding
sections, we have drawn consequences from the nonvanishing of various Jacobians.
In this section we consider the opposite situation, in which a Jacobian vanishes
identically.
For motivation, let us first consider the linear case. Let $A$ be an $n \times n$ matrix,
and define $F : \mathbb{R}^n \to \mathbb{R}^n$ by $F(x) = Ax$ (where $x$ is considered as a column
vector). If $A$ is nonsingular, $F$ is a one-to-one map from $\mathbb{R}^n$ onto itself, whose
inverse is $F^{-1}(y) = A^{-1}y$. However, if $\det A = 0$, the range of $F$ (namely,
the column space of $A$) is a proper linear subspace of $\mathbb{R}^n$, and the components
$(f_1, \dots, f_n)$ of $F$ satisfy at least one nontrivial linear relation. More precisely, if
the rank of $A$ is $k$, where $k < n$, then the range of $F$ is a $k$-dimensional subspace
of $\mathbb{R}^n$, and the components of $F$ satisfy $n - k$ independent linear relations (namely,
the relations satisfied by the rows of $A$).

EXAMPLE 1. Let $F = (f_1, f_2, f_3)$ be given by the matrix

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & -3 & 4 \\ 2 & -1 & 3 \end{pmatrix},$$

that is,

$f_1(x, y, z) = x + 2y - z$,
$f_2(x, y, z) = x - 3y + 4z$,
$f_3(x, y, z) = 2x - y + 3z$.

It is easily verified that $\det A = 0$, that the first two rows of $A$ are independent,
and that the third row is the sum of the first two. This last relation means that
the functions $f_1, f_2, f_3$ satisfy the linear relation $f_3 = f_1 + f_2$. Equivalently,
the range of $F$ is the plane defined by the equation $y_3 = y_1 + y_2$.
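The claims in Example 1 are easy to confirm by machine. The sketch below (our own; `det3` and `F` are illustrative helpers) checks that $\det A = 0$, that the first two rows are independent (a nonzero $2 \times 2$ minor), and that every value of $F$ lies on the plane $y_3 = y_1 + y_2$.

```python
A = [[1, 2, -1],
     [1, -3, 4],
     [2, -1, 3]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def F(x):
    """F(x) = A x for a column vector x given as a sequence of 3 numbers."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

assert det3(A) == 0
assert A[0][0] * A[1][1] - A[0][1] * A[1][0] != 0   # rows 1 and 2 are independent
for x in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (3, -2, 5)]:
    y1, y2, y3 = F(x)
    assert y3 == y1 + y2                              # the relation f3 = f1 + f2
```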

Example 2. Let $F = (f_1, f_2, f_3)$ be given by the matrix

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & -2 \\ -3 & -6 & 3 \end{pmatrix},$$

that is,

$f_1(x, y, z) = x + 2y - z$,
$f_2(x, y, z) = 2x + 4y - 2z$,
$f_3(x, y, z) = -3x - 6y + 3z$.

Here the rank of $A$ is 1, and the functions $f_j$ satisfy the relations $f_2 = 2f_1$,
$f_3 = -3f_1$. The range of $F$ is the line passing through the origin and the point
$(1, 2, -3)$.

More generally, one can consider linear maps $F : \mathbb{R}^m \to \mathbb{R}^n$ defined by $n \times m$
matrices $A$. The range of such a map is a linear subspace of $\mathbb{R}^n$ whose dimension
is the rank of $A$. It must happen when $n > m$, and may happen when $n \le m$,
that this subspace is a proper subspace of $\mathbb{R}^n$, in which case the components of $F$
satisfy nontrivial linear relations.
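The linear relations satisfied by the components of $F(x) = Ax$ are exactly the vectors $c$ with $c^{T}A = 0$, that is, the nullspace of $A^{T}$. The sketch below (our own; the helper `nullspace` is hypothetical, not a library routine) computes a basis of that space by exact row reduction. It is applied to the matrix with rows $(1, 2, -1)$, $(2, 4, -2)$, $(-3, -6, 3)$, which is the one consistent with the relations $f_2 = 2f_1$ and $f_3 = -3f_1$ of Example 2.

```python
from fractions import Fraction

def nullspace(M):
    """Basis of {c : M c = 0}, computed from the reduced row echelon form of M."""
    m = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if piv is None:
            continue                      # column c is a free column
        m[r], m[piv] = m[piv], m[r]
        pv = m[r][c]
        m[r] = [a / pv for a in m[r]]     # scale the pivot to 1
        for i in range(rows):             # clear column c in every other row
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        vec = [Fraction(0)] * cols
        vec[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -m[row_idx][free]
        basis.append(vec)
    return basis

A = [[1, 2, -1],
     [2, 4, -2],
     [-3, -6, 3]]
A_T = [list(col) for col in zip(*A)]
relations = nullspace(A_T)   # each c gives a relation c1 f1 + c2 f2 + c3 f3 = 0
```

Here `relations` comes out as `[[-2, 1, 0], [3, 0, 1]]` (as `Fraction`s), i.e. $f_2 = 2f_1$ and $f_3 = -3f_1$: there are $n - k = 3 - 1 = 2$ independent relations.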
Now let us return to the study of more general functions. The appropriate analogue
of "linear dependence" for nonlinear functions is "functional dependence,"
which means that the functions in question satisfy a nontrivial functional relation,
in other words, that one of them must be expressible as a function of the others.
We shall formulate this idea precisely in a way that is appropriate for $C^1$ functions,
although the notion of functional dependence does not really depend on any
differentiability conditions.
Suppose $f_1, \dots, f_n$ are $C^1$ real-valued functions on an open set $U \subset \mathbb{R}^m$.
We say that $f_1, \dots, f_n$ are functionally dependent on $U$ if there is a $C^1$ function
$\Phi : \mathbb{R}^n \to \mathbb{R}$ such that

(3.19) $\Phi(f_1(x), \dots, f_n(x)) = 0$ and $\nabla\Phi(f_1(x), \dots, f_n(x)) \neq 0$ for $x \in U$.

The nonvanishing of $\nabla\Phi$ guarantees, via the implicit function theorem, that the
equation $\Phi = 0$ can be solved locally for one of the variables in terms of the others;
in other words, one of the functions $f_j$ can be expressed in terms of the remaining
ones.

Geometrically, (3.19) means that the range of the map $f = (f_1, \dots, f_n)$ is
contained in the hypersurface $\{y : \Phi(y) = 0\}$ in $\mathbb{R}^n$, so that it is at most
$(n-1)$-dimensional. (It might be even smaller, of course; the functions $f_j$ might satisfy
other relations in addition to the equation $\Phi(f(x)) = 0$.)

Example 3. The functions

$f_1(x, y, z) = x + y + z$,
$f_2(x, y, z) = xy + xz + yz$,
$f_3(x, y, z) = x^2 + y^2 + z^2$

are functionally dependent on $\mathbb{R}^3$, for $f_3 = f_1^2 - 2f_2$.
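An exact randomized check of Example 3 (our own sketch): at rational sample points the relation $f_3 = f_1^2 - 2f_2$ holds exactly, so $\Phi(u, v, w) = u^2 - 2v - w$ works in (3.19), with $\nabla\Phi = (2u, -2, -1) \neq 0$; and the Jacobian $\det Df$ vanishes, consistent with Theorem 3.20. The seed and sample ranges are arbitrary.

```python
import random
from fractions import Fraction

def f(x, y, z):
    return (x + y + z, x * y + x * z + y * z, x * x + y * y + z * z)

def Df(x, y, z):
    # rows: gradients of f1, f2, f3
    return [[1, 1, 1],
            [y + z, x + z, x + y],
            [2 * x, 2 * y, 2 * z]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, g), (h, i, j) = M
    return a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)

def Phi(u, v, w):
    return u * u - 2 * v - w

rng = random.Random(0)
for _ in range(20):
    p = [Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(3)]
    assert Phi(*f(*p)) == 0      # the functional relation f3 = f1^2 - 2 f2
    assert det3(Df(*p)) == 0     # the Jacobian vanishes identically
```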
Example 4. The functions $f_1(x, y) = 3x + 1$, $f_2(x, y) = x^2 - y$ are not
functionally dependent on any open set in $\mathbb{R}^2$. Indeed, the transformation
$f = (f_1, f_2)$ is a one-to-one map from $\mathbb{R}^2$ onto itself whose inverse $g = (g_1, g_2)$ is
given by $g_1(u, v) = \frac{1}{3}(u - 1)$, $g_2(u, v) = \frac{1}{9}(u - 1)^2 - v$; hence the values of
$f$ are not subject to any restrictions.
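The inverse in Example 4 can be checked with exact rational arithmetic. This sketch (our own, with arbitrary sample points) confirms that $g \circ f$ and $f \circ g$ are both the identity, so every point of $\mathbb{R}^2$ is a value of $f$.

```python
from fractions import Fraction

def f(x, y):
    return (3 * x + 1, x * x - y)

def g(u, v):
    """Inverse from Example 4: g1 = (u - 1)/3, g2 = (u - 1)^2/9 - v."""
    return (Fraction(1, 3) * (u - 1), Fraction(1, 9) * (u - 1) ** 2 - v)

samples = [(Fraction(0), Fraction(0)),
           (Fraction(2, 3), Fraction(-5)),
           (Fraction(-7, 2), Fraction(1, 4))]
for p in samples:
    assert g(*f(*p)) == p   # g(f(x, y)) = (x, y)
    assert f(*g(*p)) == p   # f(g(u, v)) = (u, v)
```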

It should be noted that the question of functional dependence is interesting
only when the number of functions does not exceed the number of independent
variables; when it does, functional dependence is almost automatic. For example,
if $f$ and $g$ are any two $C^1$ functions of one variable, then $f$ and $g$ are functionally
dependent on any interval $I$ on which either $f' \neq 0$ or $g' \neq 0$. Indeed, if $f' \neq 0$
on $I$, then $f$ is one-to-one on $I$ and so has an inverse; then $\Phi(f(x), g(x)) = 0$ on $I$
where $\Phi(u, v) = g(f^{-1}(u)) - v$.
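For a concrete instance of this construction (our choice, not the text's), take $f(x) = e^x$ and $g(x) = x^2$ on the whole line; then $f' \neq 0$, $f^{-1} = \log$, and $\Phi(u, v) = (\log u)^2 - v$ for $u > 0$. Since $\partial_v\Phi = -1$, the gradient of $\Phi$ never vanishes, as (3.19) requires. The sketch checks $\Phi(f(x), g(x)) = 0$ at a few points.

```python
import math

def f(x):
    return math.exp(x)

def g(x):
    return x * x

def Phi(u, v):
    """Phi(u, v) = g(f^{-1}(u)) - v, defined for u > 0 (the range of f)."""
    return g(math.log(u)) - v

for x in [-2.0, -0.3, 0.0, 1.7]:
    assert math.isclose(Phi(f(x), g(x)), 0.0, abs_tol=1e-12)
```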
The main results of this section concern the close relation between the func
tional dependence of a family of functions and the linear dependence of their linear
approximations. To begin with, we consider the case where the number of functions
equals the number of independent variables.

3.20 Theorem. Suppose $f = (f_1, \dots, f_n)$ is a $C^1$ map on some open set $U \subset \mathbb{R}^n$.
If $f_1, \dots, f_n$ are functionally dependent on $U$, then the Jacobian $\det Df$ vanishes
identically on $U$.

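Proof. Functional dependence of the $f_j$'s means that there is a $C^1$ function $\Phi$ such
that $\Phi(f(x)) = 0$ and $\nabla\Phi(f(x)) \neq 0$ for $x \in U$. Differentiation of the equation
$\Phi(f(x)) = 0$ with respect to the variables $x_1, \dots, x_n$ via the chain rule yields

$$\begin{aligned}
(\partial_1\Phi)(\partial_1 f_1) + (\partial_2\Phi)(\partial_1 f_2) + \cdots + (\partial_n\Phi)(\partial_1 f_n) &= 0, \\
&\ \vdots \\
(\partial_1\Phi)(\partial_n f_1) + (\partial_2\Phi)(\partial_n f_2) + \cdots + (\partial_n\Phi)(\partial_n f_n) &= 0,
\end{aligned}$$

where the derivatives of $\Phi$ are evaluated at $f(x)$ and the derivatives of the $f_j$'s are
evaluated at $x$. Thus, at each $x \in U$, the system of equations

$$\begin{aligned}
(\partial_1 f_1)y_1 + (\partial_1 f_2)y_2 + \cdots + (\partial_1 f_n)y_n &= 0, \\
&\ \vdots \\
(\partial_n f_1)y_1 + (\partial_n f_2)y_2 + \cdots + (\partial_n f_n)y_n &= 0,
\end{aligned}$$

has a nonzero solution, namely $y = \nabla\Phi(f(x))$. Therefore, its coefficient matrix
$(\partial_j f_k(x))$, which is nothing but the transpose of $Df(x)$, must be singular, and
hence $\det Df(x) = 0$. □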

More interesting is the fact that the converse of this theorem is also true: The
vanishing of the Jacobian $\det Df$ implies the functional dependence of the $f_j$'s. We
now present a version of this result with an additional hypothesis (the constancy of
the rank of $Df$) that yields a sharper conclusion. We formulate it so that it also covers
the case when the number of functions differs from the number of independent
variables.

3.21 Theorem. Let $f = (f_1, \dots, f_n)$ be a $C^1$ map from a connected open set
$U \subset \mathbb{R}^m$ into $\mathbb{R}^n$. Suppose that the matrix $Df(x)$ has rank $k$ at every $x \in U$,
where $k < n$. Then every $x_0 \in U$ has a neighborhood $N$ such that $f_1, \dots, f_n$ are
functionally dependent on $N$ and $f(N)$ is a smooth $k$-dimensional submanifold of
$\mathbb{R}^n$.

(The restriction to a small neighborhood $N$ is necessary because the set $f(U)$
can cross itself, as in Example 8 in §3.2.)
Since $Df(x)$ is an $n \times m$ matrix, its rank $k$ always satisfies $k \le m$ and $k \le n$.
When $k = m$, the situation described here is simply the representation in parametric
form of an $m$-dimensional submanifold of $\mathbb{R}^n$, as discussed in §§3.2–3, and
the conclusion of the theorem is that such a submanifold can also be described as
the locus of a system of equations. In other words, the case $k = m$ boils down
to Theorems 3.11b and 3.15b and their generalizations to higher dimensions. The
case where more needs to be said is the one where $k < m$.
Rather than proving this theorem in complete generality, we shall restrict atten
tion to the case where m = n = 3 and k is 1 or 2. The ideas in the general case
are the same; only the details are more cumbersome. (See also Exercise 2.) Let us
restate the theorem for the special case:

3.22 Theorem. Let $\mathbf{f} = (f, g, h)$ be a $C^1$ map from a connected open set $U \subset \mathbb{R}^3$
into $\mathbb{R}^3$. Suppose that the matrix $D\mathbf{f}(\mathbf{x})$ has rank $k$ at every $\mathbf{x} \in U$, where $k = 1$
or 2. Then every $\mathbf{x}_0 \in U$ has a neighborhood $N$ such that the functions $f, g, h$ are
functionally dependent on $N$ and $\mathbf{f}(N)$ is a smooth curve (if $k = 1$) or a smooth
surface (if $k = 2$).

Proof. Let $\mathbf{x} = (x, y, z)$, $u = f(\mathbf{x})$, $v = g(\mathbf{x})$, and $w = h(\mathbf{x})$, and fix
$\mathbf{x}_0 = (x_0, y_0, z_0) \in U$.
First suppose $k = 1$. Since the matrix $D\mathbf{f}(\mathbf{x}_0)$ has rank 1, it has at least one
nonzero entry; by relabeling the functions and variables, we may assume that the
(1,1) entry is nonzero, that is, $\partial_x f(\mathbf{x}_0) \neq 0$. By the implicit function theorem,
then, the equation $u = f(x, y, z)$ can be solved near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, to
yield $x$ as a function of $y$, $z$, and $u$. Then $v$ and $w$ turn into functions of $y$, $z$, and
$u$ also. Implicit differentiation of the equations $u = f(x, y, z)$ and $v = g(x, y, z)$
with respect to $y$ (taking $y$, $z$, and $u$ as the independent variables) yields

$$0 = (\partial_x f)(\partial_y x) + (\partial_y f), \qquad \partial_y v = (\partial_x g)(\partial_y x) + (\partial_y g).$$

Solving the first equation for $\partial_y x$ and substituting the result into the second equation
then yields

$$\partial_y v = -\frac{(\partial_x g)(\partial_y f)}{\partial_x f} + \partial_y g = \frac{(\partial_x f)(\partial_y g) - (\partial_y f)(\partial_x g)}{\partial_x f} = \frac{\partial(f, g)/\partial(x, y)}{\partial_x f}.$$

But since $D\mathbf{f}$ has rank 1, all of its $2 \times 2$ submatrices are singular; therefore,
$\partial(f, g)/\partial(x, y) = 0$ and hence $\partial_y v = 0$. Restricting to a convex neighborhood
of $(y_0, z_0, u_0)$, we conclude that $v$ is independent of $y$. For exactly the same reason,
$v$ is independent of $z$, and $w$ is independent of $y$ and $z$. That is, $v$ and $w$ are
functions of $u$ alone, say $v = \varphi(u)$ and $w = \psi(u)$. This shows that $f, g, h$ are
functionally dependent (namely, $g(\mathbf{x}) = \varphi(f(\mathbf{x}))$ and $h(\mathbf{x}) = \psi(f(\mathbf{x}))$) and that the
image of a neighborhood of $\mathbf{x}_0$ under $\mathbf{f}$ is the locus of the equations $v = \varphi(u)$,
$w = \psi(u)$, which is a smooth curve.
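Now let us turn to the case $k = 2$. Here some $2 \times 2$ submatrix of $D\mathbf{f}(\mathbf{x}_0)$ is
nonsingular; by relabeling the functions and variables, we can assume that it is the
one in the upper left corner, so that $\partial(f, g)/\partial(x, y)$ is nonzero at $\mathbf{x}_0$. By the implicit
function theorem, the equations $u = f(x, y, z)$ and $v = g(x, y, z)$ can be solved
near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, $v = v_0 = g(\mathbf{x}_0)$, to yield $x$ and $y$ as functions of
$u$, $v$, and $z$. Taking $u$, $v$, and $z$ as the independent variables, then, we differentiate
the equations $u = f(x, y, z)$, $v = g(x, y, z)$, and $w = h(x, y, z)$ implicitly with
respect to $z$ to obtain

$$\begin{aligned}
0 &= (\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) + (\partial_z f), \\
0 &= (\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) + (\partial_z g), \\
\partial_z w &= (\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) + (\partial_z h),
\end{aligned}$$

or

$$\begin{aligned}
(\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) &= -\partial_z f, \\
(\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) &= -\partial_z g, \\
(\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) - (\partial_z w) &= -\partial_z h.
\end{aligned}$$

These equations may be solved simultaneously for $\partial_z x$, $\partial_z y$, and $\partial_z w$. By Cramer's
rule (Appendix A, (A.54)),

$$\partial_z w = \det\begin{pmatrix} \partial_x f & \partial_y f & -\partial_z f \\ \partial_x g & \partial_y g & -\partial_z g \\ \partial_x h & \partial_y h & -\partial_z h \end{pmatrix} \Big/ \det\begin{pmatrix} \partial_x f & \partial_y f & 0 \\ \partial_x g & \partial_y g & 0 \\ \partial_x h & \partial_y h & -1 \end{pmatrix} = \frac{\partial(f, g, h)}{\partial(x, y, z)} \Big/ \frac{\partial(f, g)}{\partial(x, y)}.$$

The denominator is nonzero by assumption, but the numerator is zero because $D\mathbf{f}$
has rank 2. Hence $w$ is independent of $z$; that is, $w$ depends only on $u$ and $v$,
say $w = \varphi(u, v)$. This shows that $f, g, h$ are functionally dependent (namely,
$h(\mathbf{x}) = \varphi(f(\mathbf{x}), g(\mathbf{x}))$) and that the image of a neighborhood of $\mathbf{x}_0$ under $\mathbf{f}$ is the locus
of the equation $w = \varphi(u, v)$, which is a smooth surface. □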

We conclude with a few words about the assumption that the rank of $Df$ is constant.
Suppose that $A(x)$ is a matrix whose entries depend continuously on $x \in U$
($U$ an open subset of $\mathbb{R}^m$), and the rank of $A(x_0)$ is $k$. Since a set of linearly independent
vectors remains linearly independent if the vectors are perturbed slightly,
the rank of $A(x)$ is at least $k$ when $x$ is sufficiently close to $x_0$. In other words,
for each $k$ the set $\{x \in U : \operatorname{rank}(A(x)) \ge k\}$ is open. In particular, if $k_0$ is the
maximum rank of $A(x)$ as $x$ ranges over $U$, then $\{x \in U : \operatorname{rank}(A(x)) = k_0\}$ is
open.

Now, in this chapter we have been concerned with $C^1$ maps $f : U \to \mathbb{R}^n$ ($U$
an open subset of $\mathbb{R}^m$) and the matrix in question is the derivative $Df(x)$. If $k_0$
is the maximum rank of this matrix as $x$ ranges over $U$, the set
$V = \{x \in U : \operatorname{rank}(Df(x)) = k_0\}$ is open, and the theorems of this chapter can be applied on $V$.
(The implicit function and inverse mapping theorems deal with the case when $k_0$ is
as large as possible, namely, $k_0 = \min(m, n)$; the theorems of this section provide
information for smaller values of $k$.) The typical situation is that $V$ is dense in $U$,
that is, the set $U \setminus V$ has no interior points. Thus, the structure of the mapping
$f$ near "most" points of $U$ (the ones in $V$) is fairly simple to understand, but at
the remaining points, various kinds of singularities can occur. The study of such
singularities is a substantial and rather intricate branch of mathematical analysis.
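A small illustration of this typical picture, using a map of our own choosing, $f(x, y) = (x^2, y)$ on $U = \mathbb{R}^2$: here $Df = \begin{pmatrix} 2x & 0 \\ 0 & 1 \end{pmatrix}$, so $k_0 = 2$, the open set $V$ is the complement of the $y$-axis, and $U \setminus V$ (the $y$-axis) has no interior points. The sketch samples a grid of rational points and checks this; the grid and the perturbation size are arbitrary choices.

```python
from fractions import Fraction

def Df(x, y):
    # Derivative of the illustrative map f(x, y) = (x^2, y).
    return [[2 * x, 0],
            [0, 1]]

def rank2(M):
    """Rank of a 2x2 matrix."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        return 2
    return 1 if any(t != 0 for t in (a, b, c, d)) else 0

grid = [(Fraction(i, 4), Fraction(j, 4)) for i in range(-8, 9) for j in range(-8, 9)]
k0 = max(rank2(Df(*p)) for p in grid)
singular = [p for p in grid if rank2(Df(*p)) < k0]

assert k0 == 2
assert all(x == 0 for x, _ in singular)        # the rank drops only on the y-axis
# every singular point has nearby points where the rank is maximal
assert all(rank2(Df(x + Fraction(1, 10**6), y)) == 2 for x, y in singular)
```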

EXERCISES

1. For each of the following maps $\mathbf{f} = (f, g, h)$, determine whether $f, g, h$ are
functionally dependent on some open set $U \subset \mathbb{R}^3$ by examining the Jacobian
$\partial(f, g, h)/\partial(x, y, z)$. If they are, determine the rank of $D\mathbf{f}$ on $U$ and find
functional relations (one relation if $\operatorname{rank}(D\mathbf{f}) = 2$, two relations if $\operatorname{rank}(D\mathbf{f}) = 1$)
satisfied by $f, g, h$.
a. $f(x, y, z) = x + y - z$, $g(x, y, z) = x - y + z$, $h(x, y, z) = x^2 + y^2 + z^2 - 2yz$.
b. $f(x, y, z) = x^2 + y^2 + z^2$, $g(x, y, z) = x + y + z$, $h(x, y, z) = y - z$.
c. $f(x, y, z) = y^{1/2}\sin x$, $g(x, y, z) = y\cos^2 x - y$, $h(x, y, z) = z - 3$.
d. $f(x, y, z) = xy + z$, $g(x, y, z) = x^2y^2 + 2xyz + z^2$, $h(x, y, z) = 2 - xy - z$.
e. $f(x, y, z) = \log x - \log y + z$, $g(x, y, z) = \log x - \log y - z$, $h(x, y, z) = (x^2 + 2y^2)/xy$.
f. $f(x, y, z) = x - y + z$, $g(x, y, z) = x^2 - y^2$, $h(x, y, z) = x + z$.
2. Write out the statement and give a precise proof for the following special cases
of Theorem 3.21, along the lines of Theorem 3.22.
a. $m = n = 2$, $k = 1$.
b. $m = 2$, $n = 3$, $k = 1$.
Chapter 4

INTEGRAL CALCULUS

In this chapter we study the integration of functions of one and several real vari
ables. As we assume that the reader is already familiar with the standard techniques
of integration for functions of one variable, our discussion of integration on the line
is limited to theoretical issues. On the other hand, some of these issues arise also in
higher dimensions, and we shall sometimes invoke the careful treatment of the one-
variable case as an excuse for being somewhat sketchy in developing the theory for
several variables.
In elementary calculus, the term "integral" can refer either to the antiderivative
of a function $f$ or to a limit of sums of the form $\sum f(x_j)\,\Delta x_j$; one speaks of indefinite
or definite integrals. At the more advanced level, and in particular in this
book, "integral" almost always carries the latter meaning. The notion of integration
as a sophisticated form of summation is one of the truly fundamental ideas of
mathematical analysis, and it arises in many contexts where the connection with
differentiation is tenuous or nonexistent.

4.1 Integration on the Line

Recall that for a nonnegative function $f$, the basic geometric interpretation of the
integral $\int_a^b f(x)\,dx$ is as the area of the region between the graph of $f$ and the $x$-axis
over the interval $[a, b]$. The idea for computing this area is to subdivide the
interval $[a, b]$ into small subintervals $[x_0, x_1], [x_1, x_2], \dots, [x_{J-1}, x_J]$, with $x_0 = a$
and $x_J = b$, and to approximate the region under the graph of $f$ by a union of
rectangles based on the intervals $[x_{j-1}, x_j]$. If we choose the height $h_j$ of the
$j$th rectangle to be smaller (resp. larger) than all the values of $f$ on the interval
$[x_{j-1}, x_j]$, the corresponding sum $\sum h_j(x_j - x_{j-1})$ will be a lower (resp. upper)
bound for the area under the graph of $f$. If all goes well, these lower and upper
approximations will approach each other as we subdivide the interval $[a, b]$ into
smaller and smaller pieces, and their common limit will be the integral of $f$.
Let us make this more precise, introducing some useful definitions along the way. A partition $P$ of the interval $[a,b]$ is a subdivision of $[a,b]$ into nonoverlapping subintervals, specified by giving the subdivision points $x_1,\dots,x_{J-1}$ along with the endpoints $x_0 = a$ and $x_J = b$. In symbols, we shall write

$P = \{x_0, x_1, \dots, x_J\}, \quad \text{with } a = x_0 < x_1 < \cdots < x_J = b.$

If $P$ and $P'$ are partitions of $[a,b]$, we say that $P'$ is a refinement of $P$ if $P'$ is obtained from $P$ by adding in more subdivision points, that is, if $P \subset P'$.
Observe that if $P$ and $Q$ are any two partitions of $[a,b]$, they can be combined into a single partition $P \cup Q$ whose subdivision points are those of $P$ together with those of $Q$; $P \cup Q$ is a refinement of both $P$ and $Q$.
Now let $f$ be a bounded real-valued function on $[a,b]$. (We make no continuity assumptions on $f$ at this point.) Given a partition $P = \{x_0,\dots,x_J\}$ of $[a,b]$, for $1 \le j \le J$ we set

(4.1) $m_j = \inf\{f(x) : x_{j-1} \le x \le x_j\}, \qquad M_j = \sup\{f(x) : x_{j-1} \le x \le x_j\}.$

(If $f$ is continuous, $m_j$ and $M_j$ are just the minimum and maximum values of $f$ on $[x_{j-1},x_j]$, which exist by the extreme value theorem.) We then define the lower Riemann sum $s_P f$ and the upper Riemann sum $S_P f$ corresponding to the partition $P$ by

(4.2) $s_P f = \sum_{j=1}^J m_j(x_j - x_{j-1}), \qquad S_P f = \sum_{j=1}^J M_j(x_j - x_{j-1}).$

See Figure 4.1, where the lower and upper Riemann sums are the sums of the areas
of the rectangles, an area being counted as negative if the rectangle is below the
x-axis.
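As a concrete check of (4.1) and (4.2), the following Python sketch computes $s_P f$ and $S_P f$ exactly, with rational arithmetic, for $f(x) = x^2$ on $[0,1]$. It assumes that $f$ is increasing, so that $m_j = f(x_{j-1})$ and $M_j = f(x_j)$; for a general bounded $f$ the infimum and supremum cannot simply be read off at the endpoints. The function name `lower_upper_sums_increasing` is illustrative only.

```python
from fractions import Fraction

def lower_upper_sums_increasing(f, points):
    """Return (s_P f, S_P f) for the partition x_0 < x_1 < ... < x_J.

    Assumption: f is increasing on [x_0, x_J], so on each [x_{j-1}, x_j]
    the infimum m_j is f(x_{j-1}) and the supremum M_j is f(x_j).
    """
    lower = Fraction(0)
    upper = Fraction(0)
    for left, right in zip(points, points[1:]):
        width = right - left
        lower += f(left) * width    # m_j (x_j - x_{j-1})
        upper += f(right) * width   # M_j (x_j - x_{j-1})
    return lower, upper

def f(x):
    return x * x  # increasing on [0, 1]

P = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
s, S = lower_upper_sums_increasing(f, P)
print(s, S)  # 31/216 125/216
```

Both values bracket the area $1/3 = 72/216$ under the parabola, as they must.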
If $m$ and $M$ are the infimum and supremum of the values of $f$ over the whole interval $[a,b]$, we clearly have $m_j \ge m$ and $M_j \le M$ for all $j$, and hence

$s_P f \ge m\sum_{j=1}^J (x_j - x_{j-1}) = m(b-a), \qquad S_P f \le M\sum_{j=1}^J (x_j - x_{j-1}) = M(b-a).$

FIGURE 4.1: Lower and upper Riemann sums.

The same argument shows that if one of the subintervals $[x_{j-1},x_j]$ is subdivided further, the lower sum $s_P f$ can only increase while the upper sum $S_P f$ can only decrease. In short:

4.3 Lemma. If $P'$ is a refinement of $P$, then $s_{P'} f \ge s_P f$ and $S_{P'} f \le S_P f$.

An immediate consequence of this is that any lower Riemann sum for $f$ is no larger than any upper Riemann sum for $f$:

4.4 Lemma. If $P$ and $Q$ are any partitions of $[a,b]$, then $s_P f \le S_Q f$.

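Proof. Consider the common refinement $P \cup Q$. By Lemma 4.3 (and the fact that $m_j \le M_j$ for every $j$, which gives the middle inequality),

$s_P f \le s_{P\cup Q} f \le S_{P\cup Q} f \le S_Q f.$ □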
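Lemmas 4.3 and 4.4 can be illustrated numerically. The sketch below uses the increasing function $f(x) = x^3$ on $[0,1]$ (an illustrative choice, again assuming monotonicity so that $m_j$ and $M_j$ are endpoint values). Partitions are represented as Python sets of `Fraction`s, so the common refinement $P \cup Q$ is literally a set union.

```python
from fractions import Fraction

def sums_increasing(f, points):
    """(s_P f, S_P f) for increasing f; points may be any collection
    (for example a set), since it is sorted first."""
    pts = sorted(points)
    pairs = list(zip(pts, pts[1:]))
    s = sum(f(left) * (right - left) for left, right in pairs)   # m_j = f(x_{j-1})
    S = sum(f(right) * (right - left) for left, right in pairs)  # M_j = f(x_j)
    return s, S

def f(x):
    return x ** 3  # increasing on [0, 1]

P = {Fraction(0), Fraction(1, 2), Fraction(1)}
Q = {Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)}
PQ = P | Q  # the common refinement of P and Q

s_P, S_P = sums_increasing(f, P)
s_Q, S_Q = sums_increasing(f, Q)
s_PQ, S_PQ = sums_increasing(f, PQ)

# Lemma 4.3: refining can only raise lower sums and lower upper sums.
assert s_P <= s_PQ and S_PQ <= S_P
assert s_Q <= s_PQ and S_PQ <= S_Q
# Lemma 4.4, via the chain in its proof: s_P f <= S_Q f.
assert s_P <= s_PQ <= S_PQ <= S_Q
```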

Next, we define the lower and upper integrals of $f$ on $[a,b]$ by

$\underline{I}_a^b(f) = \sup_P s_P f, \qquad \overline{I}_a^b(f) = \inf_P S_P f,$

the supremum and infimum being taken over all partitions $P$ of $[a,b]$. By Lemma 4.4, we have $\underline{I}_a^b(f) \le \overline{I}_a^b(f)$. If the upper and lower integrals coincide, $f$ is called Riemann integrable on $[a,b]$, and the common value of the upper and lower integrals is the Riemann integral $\int_a^b f(x)\,dx$. We shall generally omit the eponym "Riemann," as the Riemann integral is the only one we shall use in this book, but the name is significant, not only for historical reasons but also to distinguish the Riemann integral from the more sophisticated Lebesgue integral.
At first sight it would seem difficult to determine whether a function $f$ is integrable and to evaluate its integral, as the definitions involve all possible partitions of $[a,b]$. The following lemma is the key to making these calculations more manageable.

4.5 Lemma. If $f$ is a bounded function on $[a,b]$, the following conditions are equivalent:
a. $f$ is integrable on $[a,b]$.
b. For every $\epsilon > 0$ there is a partition $P$ of $[a,b]$ such that $S_P f - s_P f < \epsilon$.

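Proof. If $S_P f - s_P f < \epsilon$ for some partition $P$, then $\overline{I}_a^b(f) - \underline{I}_a^b(f) \le S_P f - s_P f < \epsilon$, and since $\epsilon$ is arbitrary, it follows that $\overline{I}_a^b(f) = \underline{I}_a^b(f)$, i.e., $f$ is integrable. Conversely, if $f$ is a bounded function and $\epsilon$ is positive, we can find partitions $Q$ and $Q'$ of $[a,b]$ such that $S_Q f < \overline{I}_a^b(f) + \frac12\epsilon$ and $s_{Q'} f > \underline{I}_a^b(f) - \frac12\epsilon$. Thus, if $f$ is integrable, we have $S_Q f - s_{Q'} f < \epsilon$. Let $P = Q \cup Q'$; then by Lemma 4.3, $s_{Q'} f \le s_P f \le S_P f \le S_Q f$, so $S_P f - s_P f \le S_Q f - s_{Q'} f < \epsilon$. □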

The condition (b) in Lemma 4.5 not only gives a workable criterion for integrability but also gives us some leverage for computing the integral of an integrable function $f$. Indeed, for any partition $P$ we have

$s_P f \le \int_a^b f(x)\,dx \le S_P f,$

so if $S_P f - s_P f < \epsilon$, then $S_P f$ and $s_P f$ are both within $\epsilon$ of $\int_a^b f(x)\,dx$. The latter quantity is therefore the limit of the sums $S_P f$ or $s_P f$ as $P$ runs through any sequence of partitions such that $S_P f - s_P f \to 0$.
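Lemma 4.5 suggests a simple procedure: refine until $S_P f - s_P f < \epsilon$, at which point both sums are within $\epsilon$ of the integral. The sketch below does this by repeatedly doubling the number $k$ of equal subintervals, for the illustrative increasing function $f(x) = x^2$ on $[0,1]$ (so that, as before, $m_j$ and $M_j$ are endpoint values). The doubling strategy and the helper names are our own choices, not part of the lemma.

```python
from fractions import Fraction

def sums_increasing(f, pts):
    # For increasing f: m_j = f(x_{j-1}) and M_j = f(x_j).
    s = sum(f(left) * (right - left) for left, right in zip(pts, pts[1:]))
    S = sum(f(right) * (right - left) for left, right in zip(pts, pts[1:]))
    return s, S

def refine_until(f, a, b, eps):
    """Double k until the equal-width partition P_k satisfies S - s < eps."""
    k = 1
    while True:
        pts = [a + (b - a) * Fraction(j, k) for j in range(k + 1)]
        s, S = sums_increasing(f, pts)
        if S - s < eps:
            return k, s, S
        k *= 2

def square(x):
    return x * x

k, s, S = refine_until(square, Fraction(0), Fraction(1), Fraction(1, 100))
print(k)  # 128, since S - s = 1/k for this f
```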
We next present the fundamental additivity properties of the integral, which are easy but not quite trivial consequences of the definitions:

4.6 Theorem.
a. Suppose $a < b < c$. If $f$ is integrable on $[a,b]$ and on $[b,c]$, then $f$ is integrable on $[a,c]$, and

(4.7) $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$

b. If $f$ and $g$ are integrable on $[a,b]$, then so is $f+g$, and

(4.8) $\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$

Proof. (a) Given $\epsilon > 0$, let $P$ and $Q$ be partitions of $[a,b]$ and $[b,c]$, respectively, such that $S_P f - s_P f < \frac12\epsilon$ and $S_Q f - s_Q f < \frac12\epsilon$. Then $P \cup Q$ is a partition of $[a,c]$ and

$s_{P\cup Q} f = s_P f + s_Q f, \qquad S_{P\cup Q} f = S_P f + S_Q f.$
It follows that $S_{P\cup Q} f - s_{P\cup Q} f < \epsilon$, so that $f$ is integrable on $[a,c]$ by Lemma 4.5. Moreover, $\int_a^c f(x)\,dx$ is within $\epsilon$ of $S_{P\cup Q} f$, and $\int_a^b f(x)\,dx$ and $\int_b^c f(x)\,dx$ are within $\frac12\epsilon$ of $S_P f$ and $S_Q f$, respectively, so $\int_a^c f(x)\,dx$ is within $2\epsilon$ of $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx$. Since $\epsilon$ is arbitrary, (4.7) follows.
(b) Given $\epsilon > 0$, choose partitions $P$ and $Q$ of $[a,b]$ such that $S_P f - s_P f < \frac12\epsilon$ and $S_Q g - s_Q g < \frac12\epsilon$, and let $R = P \cup Q$ be the common refinement of $P$ and $Q$. Then by Lemma 4.3 we have $S_R f - s_R f \le S_P f - s_P f$ and $S_R g - s_R g \le S_Q g - s_Q g$. Moreover, the supremum of the sum of two functions is at most the sum of the suprema, and the infimum of the sum is at least the sum of the infima, so

$S_R(f+g) \le S_R f + S_R g, \qquad s_R(f+g) \ge s_R f + s_R g.$

Hence,

$S_R(f+g) \le S_R f + S_R g < s_R f + \tfrac12\epsilon + s_R g + \tfrac12\epsilon \le s_R(f+g) + \epsilon.$

In other words, $S_R(f+g) - s_R(f+g) < \epsilon$, so that $f+g$ is integrable by Lemma 4.5. The formula (4.8) then follows in much the same way as (4.7). □

Remark. We make the usual convention that

$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx;$

then (4.7) holds no matter how the points $a$, $b$, $c$ are ordered.
The following theorem lists some more standard properties of integrals. They
are all quite easy to derive from the definitions with the help of Lemma 4.5, and we
leave their proofs as Exercises 2-5.

4.9 Theorem. Suppose $f$ is integrable on $[a,b]$.

a. If $c \in \mathbb{R}$, then $cf$ is integrable on $[a,b]$, and $\int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx$.
b. If $[c,d] \subset [a,b]$, then $f$ is integrable on $[c,d]$.
c. If $g$ is integrable on $[a,b]$ and $f(x) \le g(x)$ for $x \in [a,b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
d. $|f|$ is integrable on $[a,b]$, and $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.

We now derive some useful criteria for integrability. The first one has a very simple proof, and in conjunction with Theorem 4.6a it establishes the integrability of most of the functions that arise in elementary calculus. (Such functions have only a finite number of local maxima and minima on any bounded interval $[a,b]$, so one can break $[a,b]$ up into finitely many subintervals on which the function in question is monotone, apply Theorem 4.10 on each subinterval, and then add the results by Theorem 4.6a.)

Figure 4.2: An increasing function and a partition with equal subintervals. The difference between the upper and lower Riemann sums is the sum of the areas of the solid rectangles, which is easily found by stacking them.

4.10 Theorem. If $f$ is bounded and monotone on $[a,b]$, then $f$ is integrable on $[a,b]$.

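Proof. Suppose $f$ is increasing on $[a,b]$; the proof is similar if $f$ is decreasing. Consider the partition $P_k$ of $[a,b]$ into $k$ equal subintervals of length $(b-a)/k$, with subdivision points $x_j = a + j(b-a)/k$. Since $f$ is increasing, the quantities $m_j$ and $M_j$ in (4.1) are given by

$m_j = f(x_{j-1}), \qquad M_j = f(x_j),$

and hence the lower and upper Riemann sums are

$s_{P_k} f = \frac{b-a}{k}\sum_{j=1}^k f(x_{j-1}), \qquad S_{P_k} f = \frac{b-a}{k}\sum_{j=1}^k f(x_j),$

and their difference is

$S_{P_k} f - s_{P_k} f = \frac{b-a}{k}\sum_{j=1}^k [f(x_j) - f(x_{j-1})] = \frac{(b-a)[f(b) - f(a)]}{k}.$

This can be made as small as we please by taking $k$ sufficiently large, so $f$ is integrable by Lemma 4.5. (The geometry of this calculation is shown in Figure 4.2.) □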
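The telescoping identity at the end of this proof can be checked with exact rational arithmetic. The sketch below uses the illustrative increasing function $f(x) = x/(1+x)$ on $[0,2]$; any increasing function with exact rational values would serve equally well. The helper name `uniform_sums_increasing` is hypothetical.

```python
from fractions import Fraction

def uniform_sums_increasing(f, a, b, k):
    """Lower and upper sums over P_k, the partition of [a, b] into k equal
    pieces; f is assumed increasing, so m_j = f(x_{j-1}) and M_j = f(x_j)."""
    h = (b - a) / k
    xs = [a + j * h for j in range(k + 1)]
    s = h * sum(f(x) for x in xs[:-1])
    S = h * sum(f(x) for x in xs[1:])
    return s, S

def f(x):
    return x / (1 + x)  # increasing on [0, 2]

a, b = Fraction(0), Fraction(2)
for k in (1, 5, 40):
    s, S = uniform_sums_increasing(f, a, b, k)
    # The sums telescope: S - s = (b - a)(f(b) - f(a)) / k exactly.
    assert S - s == (b - a) * (f(b) - f(a)) / k
```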

The next criterion for integrability is the one that is most commonly stated in
calculus books. Its proof, however, is frequently omitted because it relies on the
notion of uniform continuity that we studied in §1.8.

4.11 Theorem. If $f$ is continuous on $[a,b]$, then $f$ is integrable on $[a,b]$.

Proof. First, $f$ is bounded on $[a,b]$ by Theorem 1.23, so the upper and lower Riemann sums for any partition exist. By Theorem 1.33, $f$ is uniformly continuous on $[a,b]$; thus, given $\epsilon > 0$, we can find $\delta > 0$ so that $|f(x) - f(y)| < \epsilon/(b-a)$ whenever $x, y \in [a,b]$ and $|x - y| < \delta$. Let $P$ be any partition of $[a,b]$ whose subintervals $[x_{j-1},x_j]$ all have length less than $\delta$. Then $|f(x) - f(y)| < \epsilon/(b-a)$ whenever $x$ and $y$ both lie in the same subinterval, and in particular the maximum and minimum values of $f$ on that subinterval differ by less than $\epsilon/(b-a)$. But this means that

$S_P f - s_P f = \sum_{j=1}^J (M_j - m_j)(x_j - x_{j-1}) < \frac{\epsilon}{b-a}\sum_{j=1}^J (x_j - x_{j-1}) = \epsilon.$

By Lemma 4.5, then, $f$ is integrable. □

Theorem 4.11 can be extended to functions that have some discontinuities, as long as the set of discontinuities is "small." The following result suffices for most practical purposes.

4.12 Theorem. If $f$ is bounded on $[a,b]$ and continuous at all except finitely many points in $[a,b]$, then $f$ is integrable on $[a,b]$.

Proof. Let $y_1,\dots,y_L$ be the points in $[a,b]$ where $f$ is discontinuous, and let $m$ and $M$ be the infimum and supremum of $\{f(x) : a \le x \le b\}$, the set of values of $f$ on $[a,b]$. Given $\delta > 0$, let

$I_l = [a,b] \cap [y_l - \delta, y_l + \delta] \quad (1 \le l \le L),$

and let

$U = \bigcup_{l=1}^L I_l, \qquad V = [a,b] \setminus U^{\rm int}.$

Thus $U$ is a union of small intervals that contain the discontinuities of $f$, and $V$ is the remainder of $[a,b]$. Each interval $I_l$ has length at most $2\delta$, and there are $L$ of these intervals, so the total length of the set $U$ is at most $2L\delta$. On the other hand, $V$ is a finite union of closed intervals, on each of which $f$ is continuous.
Let $P$ be any partition of $[a,b]$ that includes the endpoints of the intervals $I_l$ among its subdivision points. Then we can write

$S_P f = S_P^U f + S_P^V f, \qquad s_P f = s_P^U f + s_P^V f,$

where $S_P^U f$ (resp. $S_P^V f$) is the sum of the terms $M_j(x_j - x_{j-1})$ in $S_P f$ for which the interval $[x_{j-1},x_j]$ is contained in $U$ (resp. $V$), and likewise for $s_P^U f$ and $s_P^V f$.
Now, let $\epsilon > 0$ be given. Since $f$ is continuous on each of the closed intervals that constitute $V$, the proof of Theorem 4.11 shows that we can make

$S_P^V f - s_P^V f < \tfrac12\epsilon$

by choosing the partition $P$ sufficiently fine. On the other hand,

$S_P^U f - s_P^U f = \sum_{[x_{j-1},x_j] \subset U} (M_j - m_j)(x_j - x_{j-1}) \le (M - m)(\text{length of } U) \le (M - m)2L\delta,$

and we can make this less than $\frac12\epsilon$ by taking $\delta < \epsilon/[4L(M - m)]$. In short, for a suitably chosen $P$ we have $S_P f - s_P f < \epsilon$, so $f$ is integrable by Lemma 4.5. □
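To see the mechanism of Theorem 4.12 in the simplest case, take the illustrative step function $f(x) = 0$ for $x < \frac12$ and $f(x) = 1$ for $x \ge \frac12$ on $[0,1]$, whose only discontinuity is at $\frac12$. On the partition $P_k$ into $k$ equal pieces, every subinterval away from the jump has $M_j = m_j$, so only the single subinterval with $x_{j-1} < \frac12 \le x_j$ contributes to $S_{P_k} f - s_{P_k} f$, which therefore equals $1/k$. The sketch below computes the exact infimum and supremum on each subinterval; the helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def inf_sup_step(left, right):
    """Exact inf and sup on [left, right] of the illustrative step function
    f(x) = 0 for x < 1/2 and f(x) = 1 for x >= 1/2."""
    lo = 0 if left < HALF else 1
    hi = 1 if right >= HALF else 0
    return lo, hi

def step_sums(k):
    """Lower and upper sums of the step function over P_k on [0, 1]."""
    xs = [Fraction(j, k) for j in range(k + 1)]
    s = S = Fraction(0)
    for left, right in zip(xs, xs[1:]):
        m, M = inf_sup_step(left, right)
        s += m * (right - left)
        S += M * (right - left)
    return s, S

for k in (1, 2, 7, 100):
    s, S = step_sums(k)
    # Only the subinterval with left < 1/2 <= right contributes to S - s.
    assert S - s == Fraction(1, k)
    assert s <= HALF <= S  # the integral is 1/2
```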

The preceding argument actually proves more than is stated in Theorem 4.12. It is not necessary that the set of discontinuities of $f$ be finite, only that it can be covered by finitely many intervals $I_1,\dots,I_L$ whose total length is as small as we please. Certain infinite sets, such as convergent sequences, also have this property (Exercise 6). We make it into a formal definition: A set $Z \subset \mathbb{R}$ is said to have zero content if for any $\epsilon > 0$ there is a finite collection of intervals $I_1,\dots,I_L$ such that (i) $Z \subset \bigcup_1^L I_l$ and (ii) the sum of the lengths of the $I_l$'s is less than $\epsilon$. The proof of Theorem 4.12 now yields the following result:
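For a concrete instance of zero content, consider the illustrative convergent sequence $x_k = 1/k$. Given $\epsilon$, the single interval $[0, \epsilon/4]$ contains all but finitely many of the points, and each of the finitely many remaining points can be enclosed in a very short interval. The sketch below builds such a cover, of total length $\epsilon/2 < \epsilon$, and spot-checks it; it is one construction among many, specific to this sequence, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

def cover_reciprocals(eps):
    """Intervals covering Z = {1/n : n >= 1} with total length eps/2 < eps.

    Every 1/n <= eps/4 lies in [0, eps/4]; the finitely many n with
    1/n > eps/4 (that is, n < 4/eps) each get one short interval.
    """
    eps = Fraction(eps)
    intervals = [(Fraction(0), eps / 4)]
    N = math.floor(4 / eps)       # every n with 1/n > eps/4 satisfies n <= N
    if N > 0:
        half = eps / (8 * N)      # each short interval has length eps/(4N)
        for n in range(1, N + 1):
            c = Fraction(1, n)
            intervals.append((c - half, c + half))
    return intervals

def total_length(intervals):
    return sum(right - left for left, right in intervals)

cover = cover_reciprocals(Fraction(1, 10))
assert total_length(cover) < Fraction(1, 10)
# Spot-check (not a proof): the first points 1/n are all covered.
assert all(any(l <= Fraction(1, n) <= r for l, r in cover) for n in range(1, 2000))
```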

4.13 Theorem. If $f$ is bounded on $[a,b]$ and the set of points in $[a,b]$ at which $f$ is discontinuous has zero content, then $f$ is integrable on $[a,b]$.

Theorem 4.13 is only a technical refinement of Theorem 4.12, and the reader should not attach undue importance to it.¹ We mention it because its analogue in
higher dimensions does play a significant role in the theory, as we shall see. We
also remark that neither of Theorems 4.10 and 4.13 includes the other; the set of
discontinuities of a monotone function need not have zero content, and there are
continuous functions that are not monotone on any interval.
If $f$ is an integrable function on $[a,b]$, the value of $\int_a^b f(x)\,dx$ is somewhat insensitive to the values of $f$ at individual points, in the following sense:

4.14 Proposition. Suppose $f$ and $g$ are integrable on $[a,b]$ and $f(x) = g(x)$ for all except finitely many points $x \in [a,b]$. Then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.

¹ It does, however, point the way toward a necessary and sufficient condition for a function to be integrable, which we shall describe at the end of §4.8.

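Proof. First suppose $g$ is identically zero. That is, we are assuming that $f(x) = 0$ for all $x \in [a,b]$ except for finitely many points $y_1,\dots,y_L$. Let $P_k$ be the partition of $[a,b]$ into $k$ equal subintervals, and take $k$ large enough so that no subinterval contains more than one of the points $y_l$. Then $m_j = M_j = 0$ on every subinterval that contains none of the $y_l$, while each $y_l$ lies in at most two subintervals, on which $m_j = \min(f(y_l), 0)$ and $M_j = \max(f(y_l), 0)$. Hence

$\frac{2(b-a)}{k}\sum_{l=1}^L \min(f(y_l), 0) \le s_{P_k} f \le S_{P_k} f \le \frac{2(b-a)}{k}\sum_{l=1}^L \max(f(y_l), 0).$

Both bounds tend to zero as $k \to \infty$, and hence $\int_a^b f(x)\,dx = 0$.
The general case follows by applying this argument to the difference $f - g$. □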

The main use of Proposition 4.14 is in the context of functions with finitely many discontinuities, as in Theorem 4.12. For such a function $f$ there is often no "right" way to define $f$ at the points where it is discontinuous. Proposition 4.14 assures us that this problem is of no consequence as far as integration is concerned; we may define $f$ at these points however we like, or indeed leave $f$ undefined there, without any effect on $\int_a^b f(x)\,dx$.
Next, we present a general version of the fundamental theorem of calculus. Its
two parts say in effect that differentiating an integral or integrating a derivative
leads back to the original function.

4.15 Theorem (The Fundamental Theorem of Calculus).

a. Let $f$ be an integrable function on $[a,b]$. For $x \in [a,b]$, let $F(x) = \int_a^x f(t)\,dt$ (which is well defined by Theorem 4.9b). Then $F$ is continuous on $[a,b]$; moreover, $F'(x)$ exists and equals $f(x)$ at every $x$ at which $f$ is continuous.
b. Let $F$ be a continuous function on $[a,b]$ that is differentiable except perhaps at finitely many points in $[a,b]$, and let $f$ be a function on $[a,b]$ that agrees with $F'$ at all points where the latter is defined. If $f$ is integrable on $[a,b]$, then $\int_a^b f(t)\,dt = F(b) - F(a)$.
Proof. (a) If $x, y \in [a,b]$, by (4.7) we have

$F(y) - F(x) = \int_x^y f(t)\,dt.$

Let $C = \sup\{|f(t)| : t \in [a,b]\}$; then by Theorem 4.9d,

$|F(y) - F(x)| \le \left|\int_x^y |f(t)|\,dt\right| \le C\left|\int_x^y dt\right| = C|y - x|,$

which implies that $F$ is continuous. Next, suppose that $f$ is continuous at $x$; thus, given $\epsilon > 0$, there is a $\delta > 0$ so that $|f(t) - f(x)| < \epsilon$ whenever $|t - x| < \delta$. Since

$f(x) = \frac{1}{y - x}\int_x^y f(x)\,dt \qquad (y \ne x),$

we have

$\frac{F(y) - F(x)}{y - x} - f(x) = \frac{1}{y - x}\int_x^y [f(t) - f(x)]\,dt.$

Hence, if $0 < |y - x| < \delta$, we have $|f(t) - f(x)| < \epsilon$ for all $t$ between $y$ and $x$, so

$\left|\frac{F(y) - F(x)}{y - x} - f(x)\right| \le \frac{1}{|y - x|}\left|\int_x^y \epsilon\,dt\right| = \epsilon.$

It follows that $\lim_{y \to x}[F(y) - F(x)]/(y - x) = f(x)$, as claimed.

(b) Let $P = \{x_0,\dots,x_J\}$ be a partition of $[a,b]$; by adding in extra points, we may assume that all the points where $F$ is not differentiable are among the subdivision points $x_j$. Then, for each $j$, $F$ is continuous on the interval $[x_{j-1},x_j]$ and differentiable on its interior, so by the mean value theorem,

$F(x_j) - F(x_{j-1}) = F'(t_j)(x_j - x_{j-1}) = f(t_j)(x_j - x_{j-1})$

for some point $t_j \in (x_{j-1},x_j)$. Adding up these equalities yields

$F(b) - F(a) = F(x_J) - F(x_0) = \sum_{j=1}^J f(t_j)(x_j - x_{j-1}),$

which implies that

$s_P f \le F(b) - F(a) \le S_P f.$

Since $f$ is integrable, we can make $s_P f$ and $S_P f$ as close to $\int_a^b f(x)\,dx$ as we like by choosing $P$ suitably, and the desired result follows immediately. □

We have developed the notion of the integral of a function $f$ in terms of the upper and lower Riemann sums $S_P f$ and $s_P f$. More generally, if $P = \{x_0,\dots,x_J\}$ is a partition of $[a,b]$ and $t_j$ is any point in the interval $[x_{j-1},x_j]$ ($1 \le j \le J$), the quantity

$\sum_{j=1}^J f(t_j)(x_j - x_{j-1})$

is called a Riemann sum for $f$ associated to the partition $P$. Clearly, if $m_j$ and $M_j$ are as in (4.1) we have $m_j \le f(t_j) \le M_j$, so that

$s_P f \le \sum_{j=1}^J f(t_j)(x_j - x_{j-1}) \le S_P f.$

Thus, if $f$ is integrable and we choose the partition $P$ so that $s_P f$ and $S_P f$ are good approximations to $\int_a^b f(x)\,dx$, all the Riemann sums corresponding to $P$ will also be good approximations to $\int_a^b f(x)\,dx$.
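The sandwich $s_P f \le \sum_j f(t_j)(x_j - x_{j-1}) \le S_P f$ can be seen concretely. In the sketch below (illustrative choices: $f(x) = x^2$ on $[0,1]$ and $k = 8$ equal subintervals), $f$ is increasing, so the left-endpoint and right-endpoint Riemann sums coincide with $s_P f$ and $S_P f$, and the midpoint sum must lie between them.

```python
from fractions import Fraction

def riemann_sum(f, xs, tags):
    """Sum of f(t_j)(x_j - x_{j-1}), where tags[j-1] = t_j lies in [x_{j-1}, x_j]."""
    return sum(f(t) * (right - left) for t, left, right in zip(tags, xs, xs[1:]))

def f(x):
    return x * x  # increasing on [0, 1]

k = 8
xs = [Fraction(j, k) for j in range(k + 1)]
left_tags = xs[:-1]
right_tags = xs[1:]
mid_tags = [(p + q) / 2 for p, q in zip(left_tags, right_tags)]

s = riemann_sum(f, xs, left_tags)    # equals s_P f because f is increasing
S = riemann_sum(f, xs, right_tags)   # equals S_P f because f is increasing
M = riemann_sum(f, xs, mid_tags)     # an ordinary Riemann sum
assert s <= M <= S
print(M)  # 85/256
```

Here the midpoint sum $85/256$ is within $1/768$ of the exact value $1/3$, while $s_P f = 35/128$ and $S_P f = 51/128$.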

One last question should be addressed: Given an integrable function $f$ on $[a,b]$, for which partitions $P$ do the sums $s_P f$ and $S_P f$ furnish a good approximation to $\int_a^b f(x)\,dx$? It might seem that the answer would depend strongly on the nature of the function $f$, but in fact, any partition whose subintervals are sufficiently small will do the job. More precisely:

4.16 Proposition. Suppose $f$ is integrable on $[a,b]$. Given $\epsilon > 0$, there exists $\delta > 0$ such that if $P = \{x_0,\dots,x_J\}$ is any partition of $[a,b]$ satisfying

$\max_{1 \le j \le J}(x_j - x_{j-1}) < \delta,$

the sums $s_P f$ and $S_P f$ differ from $\int_a^b f(x)\,dx$ by at most $\epsilon$.

Proof. The proof is presented in Appendix B.3 (Theorem B.7). □

Proposition 4.16 shows, in particular, that one can always compute $\int_a^b f(x)\,dx$ as the limit as $k \to \infty$ of $s_{P_k} f$ or $S_{P_k} f$, where $P_k$ is the partition of $[a,b]$ into $k$ equal subintervals.
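Proposition 4.16 requires neither monotonicity nor equal subintervals. The sketch below illustrates it for $f(x) = x^2$ on $[-1,1]$, which is not monotone, using randomly generated partitions of mesh less than $\delta$. For this particular $f$ one can take $\delta = \epsilon/4$: since $|x^2 - y^2| \le 2|x - y|$ on $[-1,1]$, we have $M_j - m_j \le 2(x_j - x_{j-1})$ and hence $S_P f - s_P f \le 2\delta(b-a) = 4\delta$. The partition generator `random_partition` is a hypothetical helper written only for this illustration.

```python
import random
from fractions import Fraction

def inf_sup_square(a, b):
    """Exact infimum and supremum of x**2 on [a, b], where a <= b."""
    hi = max(a * a, b * b)
    lo = 0 if a <= 0 <= b else min(a * a, b * b)
    return lo, hi

def lower_upper(xs):
    """Lower and upper sums of x**2 for the partition xs."""
    s = S = Fraction(0)
    for a, b in zip(xs, xs[1:]):
        lo, hi = inf_sup_square(a, b)
        s += lo * (b - a)
        S += hi * (b - a)
    return s, S

def random_partition(a, b, delta, rng):
    """Hypothetical helper: a random partition of [a, b] with mesh < delta."""
    xs = [a]
    while b - xs[-1] >= delta:
        # each step is a random multiple of delta/1000, strictly less than delta
        xs.append(xs[-1] + Fraction(rng.randint(1, 999), 1000) * delta)
    xs.append(b)  # the final gap is positive and less than delta
    return xs

rng = random.Random(0)
a, b = Fraction(-1), Fraction(1)
eps = Fraction(1, 10)
delta = eps / 4            # sufficient for this f, as explained above
exact = Fraction(2, 3)     # integral of x**2 over [-1, 1]
for _ in range(5):
    s, S = lower_upper(random_partition(a, b, delta, rng))
    assert s <= exact <= S and S - s <= eps
```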
One final remark: The definite integral, which is defined as a limit of Riemann sums, may be considered on the intuitive level as a sum of infinitely many infinitesimal terms. This notion, which is probably quite obvious to the alert reader, is often
not stated explicitly in mathematics texts because of its lack of rigorous meaning.
But the fact is that in many situations — and we shall encounter several of them
later on — the interpretation of the integral as a sum of infinitesimals is the clearest
way to understand what is going on.

EXERCISES

1. Let $f(x) = 1$ if $x$ is rational, $f(x) = 0$ if $x$ is irrational. Show that $f$ is not integrable on any interval.

2. Prove Theorem 4.9a. (Hint: Show that $s_P(cf) = c\,s_P f$ and $S_P(cf) = c\,S_P f$ if $c \ge 0$, and $s_P(cf) = c\,S_P f$ and $S_P(cf) = c\,s_P f$ if $c < 0$.)
3. Prove Theorem 4.9b. (Hint: Consider partitions of $[a,b]$ for which $c$ and $d$ are among the subdivision points.)

4. Prove Theorem 4.9c.

5. Prove Theorem 4.9d. (Hint: To prove that $|f|$ is integrable, show that $S_P|f| - s_P|f| \le S_P f - s_P f$. For the inequality $\left|\int f\right| \le \int |f|$, observe that $\pm f \le |f|$ and use Theorem 4.9c.)

6. Let $\{x_k\}$ be a convergent sequence in $\mathbb{R}$. Show that the set $\{x_1, x_2, \dots\}$ has zero content.

7. Let $f$ be an integrable function on $[a,b]$. Suppose that $f(x) \ge 0$ for all $x$ and there is at least one point $x_0 \in [a,b]$ at which $f$ is continuous and strictly positive. Show that $\int_a^b f(x)\,dx > 0$.
8. Let $f$ be an integrable function on $[a,b]$. Prove the following formulas directly from the definitions:
a. For any $c > 0$, $\int_a^b f(x)\,dx = c\int_{a/c}^{b/c} f(cx)\,dx$.
b. $\int_a^b f(x)\,dx = \int_{-b}^{-a} f(-x)\,dx$.
c. For any $c \in \mathbb{R}$, $\int_a^b f(x)\,dx = \int_{a-c}^{b-c} f(x + c)\,dx$.
9. Suppose $g$ and $h$ are continuous functions on $[a,b]$, and $f$ is a continuous function on $\mathbb{R}^2$. Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that if $P = \{x_0,\dots,x_J\}$ is any partition of $[a,b]$ satisfying $\max_{1 \le j \le J}(x_j - x_{j-1}) < \delta$, then

$\left|\int_a^b f(g(x), h(x))\,dx - \sum_{j=1}^J f(g(x_j'), h(x_j''))(x_j - x_{j-1})\right| < \epsilon$

for any choice of $x_j'$, $x_j''$ in the interval $[x_{j-1},x_j]$. (The point is that $x_j'$ and $x_j''$ need not be equal, so the sum in this inequality may not be a genuine Riemann sum for the integral.)

4.2 Integration in Higher Dimensions

In this section we develop the theory of multiple integrals. The basic ideas are much
the same as for single integrals; the most serious complication comes from the
greater variety of regions over which integration is to be performed. To minimize
the complexity of the notation, we first develop the two-dimensional case and then
sketch the extension to higher dimensions.
Here and in what follows we shall employ the following notation. If $S$ and $T$ are sets, their Cartesian product $S \times T$ is the set of all ordered pairs $(s,t)$ with $s \in S$ and $t \in T$. For example, the plane is the Cartesian product of the line with itself: $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$. This idea extends in the obvious way to products of $n$ sets, with ordered $n$-tuples replacing ordered pairs; for example, $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$. We can also think of $\mathbb{R}^3$ as $\mathbb{R}^2 \times \mathbb{R}$ or as $\mathbb{R} \times \mathbb{R}^2$.

Double Integrals. We begin by defining the double integral of a function over a rectangular region in the plane. In this chapter, by a rectangle we shall mean a set of the form

$R = [a,b] \times [c,d] = \{(x,y) \in \mathbb{R}^2 : x \in [a,b],\ y \in [c,d]\}.$

(Thus, a "rectangle" in this sense is always closed, and its sides are always parallel to the coordinate axes.) A partition of $R$ is a subdivision of $R$ into subrectangles obtained by partitioning both sides of $R$. Thus, a partition $P$ is specified by its subdivision points,

$P = \{x_0,\dots,x_J;\ y_0,\dots,y_K\}, \qquad a = x_0 < \cdots < x_J = b, \quad c = y_0 < \cdots < y_K = d,$

and it yields a decomposition of $R$ into the subrectangles

$R_{jk} = [x_{j-1},x_j] \times [y_{k-1},y_k]$

with area

$\Delta A_{jk} = (x_j - x_{j-1})(y_k - y_{k-1}).$

Now let $f$ be a bounded function on the rectangle $R$. Given a partition $P$ as above, we set

$m_{jk} = \inf\{f(x,y) : (x,y) \in R_{jk}\}, \qquad M_{jk} = \sup\{f(x,y) : (x,y) \in R_{jk}\},$

and define the lower and upper Riemann sums of $f$ corresponding to $P$ by

$s_P f = \sum_{j=1}^J \sum_{k=1}^K m_{jk}\,\Delta A_{jk}, \qquad S_P f = \sum_{j=1}^J \sum_{k=1}^K M_{jk}\,\Delta A_{jk}.$

The lower and upper integrals of $f$ on $R$ are

$\underline{I}_R(f) = \sup_P s_P f, \qquad \overline{I}_R(f) = \inf_P S_P f,$

the supremum and infimum being taken over all partitions $P$ of $R$. If the lower and upper integrals coincide, $f$ is called (Riemann) integrable on $R$, and the common value of the upper and lower integrals is called the (Riemann) integral of $f$ over $R$ and is denoted by

$\iint_R f\,dA \quad \text{or} \quad \iint_R f(x,y)\,dx\,dy.$

These notions are entirely analogous to their one-dimensional counterparts.
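The double sums can be computed just like their one-dimensional counterparts. The sketch below assumes, as is true for the illustrative choice $f(x,y) = xy$ on $R = [0,1] \times [0,2]$, that $f$ increases in each variable, so that $m_{jk}$ and $M_{jk}$ are the values at the lower-left and upper-right corners of $R_{jk}$. With an $n \times n$ grid of equal subrectangles this gives $s_P f = ((n-1)/n)^2$ and $S_P f = ((n+1)/n)^2$, which bracket the value $1$ of the integral and close in on it as $n$ grows.

```python
from fractions import Fraction

def double_sums_increasing(f, xs, ys):
    """Lower and upper double Riemann sums for f on [x_0, x_J] x [y_0, y_K].

    Assumption (true for f(x, y) = x*y on the first quadrant): f increases
    in each variable, so on R_jk the inf is at the lower-left corner and
    the sup at the upper-right corner.
    """
    s = S = Fraction(0)
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            dA = (x1 - x0) * (y1 - y0)   # area of R_jk
            s += f(x0, y0) * dA          # m_jk * dA_jk
            S += f(x1, y1) * dA          # M_jk * dA_jk
    return s, S

def f(x, y):
    return x * y

n = 10
xs = [Fraction(j, n) for j in range(n + 1)]       # partition of [0, 1]
ys = [Fraction(2 * k, n) for k in range(n + 1)]   # partition of [0, 2]
s, S = double_sums_increasing(f, xs, ys)
print(s, S)  # 81/100 121/100
```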

The reader should refer back to §4.1 for a more detailed discussion, which can easily be adapted to the present situation. However, we have not yet built a satisfactory definition of two-dimensional integrals, because we often wish to integrate functions over regions other than rectangles. The solution to this problem is simple, in principle: To integrate a function $f$ over a bounded region $S \subset \mathbb{R}^2$, we draw a large rectangle $R$ that contains $S$, (re)define $f$ to be zero outside of $S$, and integrate the resulting function over $R$.
To express this neatly, it is convenient to introduce another definition. If $S$ is a subset of $\mathbb{R}^2$ (or $\mathbb{R}^n$, or indeed any set), the characteristic function or indicator function of $S$ is the function $\chi_S$ defined by

$\chi_S(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in S, \\ 0 & \text{otherwise.} \end{cases}$

Now, suppose $S$ is a bounded subset of $\mathbb{R}^2$ and $f$ is a bounded function on $\mathbb{R}^2$. Let $R$ be a rectangle that contains $S$. We say that $f$ is integrable on $S$ if $f\chi_S$ is integrable on $R$, in which case we define the integral of $f$ over $S$ by

$\iint_S f\,dA = \iint_R f\chi_S\,dA.$

It is easily verified that this definition does not depend on the choice of the enveloping rectangle $R$, since the integrand $f\chi_S$ vanishes outside of $S$. (It also does not depend on the values of $f$ outside of $S$. We could just as well assume that $f$ is only defined on $S$ or on some set containing $S$, with the understanding that $f\chi_S$ is taken to be zero outside $S$.)
The properties of integrals in two dimensions are very similar to those in one; the following theorem provides a list of the most basic ones. The proofs are essentially identical to those of Theorems 4.6 and 4.9; we leave the details to the interested reader.

4.17 Theorem.
a. If $f_1$ and $f_2$ are integrable on the bounded set $S$ and $c_1, c_2 \in \mathbb{R}$, then $c_1f_1 + c_2f_2$ is integrable on $S$, and

$\iint_S (c_1f_1 + c_2f_2)\,dA = c_1\iint_S f_1\,dA + c_2\iint_S f_2\,dA.$

b. Let $S_1$ and $S_2$ be bounded sets with no points in common, and let $f$ be a bounded function. If $f$ is integrable on $S_1$ and on $S_2$, then $f$ is integrable on $S_1 \cup S_2$, and

$\iint_{S_1 \cup S_2} f\,dA = \iint_{S_1} f\,dA + \iint_{S_2} f\,dA.$

c. If $f$ and $g$ are integrable on $S$ and $f(\mathbf{x}) \le g(\mathbf{x})$ for $\mathbf{x} \in S$, then $\iint_S f\,dA \le \iint_S g\,dA$.

d. If $f$ is integrable on $S$, then so is $|f|$, and $\left|\iint_S f\,dA\right| \le \iint_S |f|\,dA$.

At this point we need to say more about the conditions under which a function is integrable. In the one-variable situation, we can get along quite well by restricting attention to continuous functions, but that is not the case here: Even if the function $f$ is continuous, the function $\chi_S$ that enters into the definition of $\iint_S f\,dA$ is generally not.
The starting point is the analogue of Theorem 4.13. The notion of "zero content" transfers readily to sets in the plane; namely, a set $Z \subset \mathbb{R}^2$ is said to have zero content if for any $\epsilon > 0$ there is a finite collection of rectangles $R_1,\dots,R_M$ such that (i) $Z \subset \bigcup_1^M R_m$, and (ii) the sum of the areas of the $R_m$'s is less than $\epsilon$. We then have:

4.18 Theorem. Suppose $f$ is a bounded function on the rectangle $R$. If the set of points in $R$ at which $f$ is discontinuous has zero content, then $f$ is integrable on $R$.

Proof. The proof is essentially identical to that of Theorem 4.13. That is, one first shows that $f$ is integrable if $f$ is continuous on all of $R$ by the argument that proves Theorem 4.11, then encompasses the general case by the argument that proves Theorem 4.12. Details are left to the reader. □

The notion of "zero content" is considerably more interesting in the plane than on the line, as the sets having this property include not only finite sets but things such as smooth curves (that is, curves parametrized by $C^1$ functions $\mathbf{f} : [a,b] \to \mathbb{R}^2$). The following proposition summarizes the results we will need; see also Exercise 2.

4.19 Proposition.
a. If $Z \subset \mathbb{R}^2$ has zero content and $U \subset Z$, then $U$ has zero content.
b. If $Z_1,\dots,Z_k$ have zero content, then so does $\bigcup_1^k Z_j$.
c. If $\mathbf{f} : (a_0, b_0) \to \mathbb{R}^2$ is of class $C^1$, then $\mathbf{f}([a,b])$ has zero content whenever $a_0 < a < b < b_0$.

Proof. Parts (a) and (b) are easy, and their proofs are left as an exercise. As for (c), let $P_k = \{t_0,\dots,t_k\}$ be the partition of $[a,b]$ into $k$ equal subintervals of length $\delta = (b-a)/k$, and let $C$ be an upper bound for $\{|\mathbf{f}'(t)| : t \in [a,b]\}$. By the mean value theorem applied to the two components $x(t)$, $y(t)$ of $\mathbf{f}(t)$, we have $|x(t) - x(t_j)| \le C\delta$ and $|y(t) - y(t_j)| \le C\delta$ for $t \in [t_{j-1},t_j]$. In other words, $\mathbf{f}([t_{j-1},t_j])$ is contained in the square of side length $2C\delta$ centered at $\mathbf{f}(t_j)$. Hence, $\mathbf{f}([a,b])$ is contained in the union of these squares, and the sum of their areas is $k(2C\delta)^2 = 4C^2(b-a)^2/k$. This can be made as small as we please by taking $k$ sufficiently large, so $\mathbf{f}([a,b])$ has zero content. □
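The covering argument in part (c) can be checked numerically for the unit circle $\mathbf{f}(t) = (\cos t, \sin t)$, $t \in [0, 2\pi]$, where $|\mathbf{f}'(t)| = 1$, so we may take $C = 1$. The sketch below samples a few points of each arc $\mathbf{f}([t_{j-1},t_j])$, confirms that they lie in the square of side $2C\delta$ centered at $\mathbf{f}(t_j)$, and reports the total area $4C^2(b-a)^2/k$, which shrinks like $1/k$. Sampling only spot-checks the containment; the mean value theorem argument above is what establishes it. The function names are illustrative.

```python
import math

def covering_squares(f, a, b, C, k):
    """Squares from the proof of Proposition 4.19c: side 2*C*delta,
    centered at f(t_j) with t_j = a + j*delta, j = 1, ..., k."""
    delta = (b - a) / k
    centers = [f(a + j * delta) for j in range(1, k + 1)]
    total_area = k * (2 * C * delta) ** 2   # equals 4 C^2 (b - a)^2 / k
    return delta, centers, total_area

def circle(t):
    return (math.cos(t), math.sin(t))

a, b, C = 0.0, 2 * math.pi, 1.0   # |f'(t)| = 1 for the unit circle
for k in (10, 100, 1000):
    delta, centers, area = covering_squares(circle, a, b, C, k)
    tol = C * delta + 1e-12        # half the side length, plus rounding slack
    for j, (cx, cy) in enumerate(centers, start=1):
        # sample points t in [t_{j-1}, t_j]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            x, y = circle(a + (j - 1 + frac) * delta)
            assert abs(x - cx) <= tol and abs(y - cy) <= tol
    print(k, area)
```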

To apply Theorem 4.18 to the integrand $f\chi_S$ in the definition of $\iint_S f\,dA$, we need to know about the discontinuities of $\chi_S$. The following lemma provides the answer.

4.20 Lemma. The function $\chi_S$ is discontinuous at $\mathbf{x}$ if and only if $\mathbf{x}$ is in the boundary of $S$.

Proof. If $\mathbf{x}$ is in the interior of $S$, then $\chi_S$ is identically 1 on some ball containing $\mathbf{x}$, so it is continuous at $\mathbf{x}$. Likewise, if $\mathbf{x}$ is in the interior of the complement $S^c$, then $\chi_S$ is identically 0 near $\mathbf{x}$ and hence continuous at $\mathbf{x}$. But if $\mathbf{x}$ is in the boundary of $S$, then there are points arbitrarily close to $\mathbf{x}$ where $\chi_S = 1$ and other such points where $\chi_S = 0$, so $\chi_S$ is discontinuous at $\mathbf{x}$. □

In view of Theorem 4.18 and Lemma 4.20, to have a good notion of integration over a set $S$, we should require the boundary of $S$ to have zero content. We make this condition into a formal definition: A set $S \subset \mathbb{R}^2$ is Jordan measurable if it is bounded and its boundary has zero content. (We shall comment further on this nomenclature below.) We shall generally say "measurable" rather than "Jordan measurable," but we advise the reader that in more advanced works the term "measurable" refers to the more general concept of Lebesgue measurability (see §4.8).
By Proposition 4.19, any bounded set whose boundary is a finite union of pieces
of smooth curves is measurable; these are the sets that we almost always encounter
in practice. The following theorem gives a convenient criterion for integrability.

4.21 Theorem. Let $S$ be a measurable subset of $\mathbb{R}^2$. Suppose $f : \mathbb{R}^2 \to \mathbb{R}$ is bounded and the set of points in $S$ at which $f$ is discontinuous has zero content. Then $f$ is integrable on $S$.

Proof. The only points where $f\chi_S$ can be discontinuous are those points in the closure of $S$ where either $f$ or $\chi_S$ is discontinuous. By Lemma 4.20 and Proposition 4.19b, the set of such points has zero content. By Theorem 4.18, $f\chi_S$ is integrable on any rectangle $R$ containing $S$, and hence $f$ is integrable on $S$. □

To complete the picture, we need the following generalization of Proposition 4.14, which shows that sets of zero content are negligible for the purposes of integration.

4.22 Proposition. Suppose $Z \subset \mathbb{R}^2$ has zero content. If $f : \mathbb{R}^2 \to \mathbb{R}$ is bounded, then $f$ is integrable on $Z$ and $\iint_Z f\,dA = 0$.

Proof. Given $\epsilon > 0$, there is a finite collection of rectangles $R_1,\dots,R_M$ such that $Z \subset \bigcup_1^M R_m$ and the sum of the areas of the $R_m$'s is less than $\epsilon$. By subdividing these rectangles if necessary, we can assume that they have disjoint² interiors and form part of a grid obtained by partitioning some large rectangle $R$. Denoting this partition by $P$, the area of $R_m$ by $|R_m|$, and $\sup_{\mathbf{x}} |f(\mathbf{x})|$ by $C$, we have

$-C\epsilon < -C\sum_{m=1}^M |R_m| \le s_P(f\chi_Z) \le S_P(f\chi_Z) \le C\sum_{m=1}^M |R_m| < C\epsilon.$

Since $\epsilon$ is arbitrary, the desired conclusion follows directly from the definition of the integral. □

4.23 Corollary.
a. Suppose that $f$ is integrable on the set $S \subset \mathbb{R}^2$. If $g(\mathbf{x}) = f(\mathbf{x})$ except for $\mathbf{x}$ in a set of zero content, then $g$ is integrable on $S$ and $\iint_S g\,dA = \iint_S f\,dA$.
b. Suppose that $f$ is integrable on $S$ and on $T$, and $S \cap T$ has zero content. Then $f$ is integrable on $S \cup T$, and $\iint_{S \cup T} f\,dA = \iint_S f\,dA + \iint_T f\,dA$.

Proof. For (a), apply Proposition 4.22 to the function $f - g$. For (b), we are assuming that $f\chi_S$ and $f\chi_T$ are integrable; moreover, by Proposition 4.22, $f\chi_{S \cap T}$ is integrable and its integral is zero. But $f\chi_{S \cup T} = f\chi_S + f\chi_T - f\chi_{S \cap T}$, so the result follows. □

Area. The problem of determining the area of regions in the plane goes back to antiquity. The first effective general method of attacking this problem was provided by the integral calculus in one variable, which yields the area of a region under a graph, or of a region between two graphs. It therefore produces a theory of area for regions that can be broken up into finitely many subregions bounded by graphs of (nice) functions. However, the two-variable theory of integration contains, as a special case, a theory of area (due to the French mathematician Jordan) that encompasses more complicated sorts of regions too. Namely, if $S$ is any Jordan measurable set in the plane, its area is the integral over $S$ of the constant function $f(\mathbf{x}) = 1$:

$(\text{area})(S) = \iint_S 1\,dA = \iint \chi_S\,dA,$

the latter integral being taken over any rectangle that contains $S$.
² A collection $\{S_j\}$ of sets is disjoint if $S_j \cap S_k = \varnothing$ for $j \ne k$.

Figure 4.3: Approximations to the inner and outer areas of a region.

Let us pause to see just what this means. Given any bounded set $S \subset \mathbb{R}^2$, to compute $\iint \chi_S\,dA$ we enclose $S$ in a large rectangle $R$ and consider a partition $P$ of $R$, which produces a grid of small rectangles that cover $S$. The lower sum for this
partition is simply the sum of the areas of the small rectangles that are contained in $S$, whereas the upper sum is the sum of the areas of the small rectangles that intersect $S$. Taking the supremum of the lower sums and the infimum of the upper sums yields quantities that may be called the inner area and outer area of $S$:

$\underline{A}(S) = \underline{I}_R(\chi_S), \qquad \overline{A}(S) = \overline{I}_R(\chi_S).$

When these two quantities coincide, that is, when the characteristic function $\chi_S$ is integrable, their common value is the area of $S$. See Figure 4.3.
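For a region as simple as the closed unit disk, the lower and upper sums of $\chi_S$ for a grid partition can be computed by counting. The sketch below is an illustration that assumes the enclosing rectangle $R = [-1,1] \times [-1,1]$ is cut into $(2m)^2$ equal subsquares; it counts the subsquares contained in the disk and the subsquares meeting it. These counts give $s_P\chi_S$ and $S_P\chi_S$ for one partition only; the inner and outer areas are the supremum and infimum over all partitions, and here both equal $\pi$.

```python
import math

def disk_inner_outer(m):
    """Lower and upper sums of the indicator of the closed unit disk for the
    grid of (2m)^2 squares of side 1/m on [-1, 1] x [-1, 1].
    Work in units of 1/m so that every grid coordinate is an integer."""
    inside = touching = 0
    r2 = m * m
    for i in range(-m, m):
        for j in range(-m, m):
            corners_x, corners_y = (i, i + 1), (j, j + 1)
            # The disk is convex, so the square lies in it iff its corners do.
            if all(x * x + y * y <= r2 for x in corners_x for y in corners_y):
                inside += 1
            # Closest point of the square to the origin (clamp 0 to each side).
            cx = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            cy = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            if cx * cx + cy * cy <= r2:
                touching += 1
    cell = 1 / (m * m)  # area of one grid square in the original units
    return inside * cell, touching * cell

for m in (10, 50):
    inner, outer = disk_inner_outer(m)
    print(m, inner, math.pi, outer)
```

Refining the grid narrows the gap between the two sums, consistent with the disk being Jordan measurable.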
When do we have $\underline{A}(S) = \overline{A}(S)$? It is not hard to see (Exercises 3-5) that for any bounded set $S$,

• $S$ and its interior $S^{\rm int}$ have the same inner area;

• $S$ and its closure $\overline{S}$ have the same outer area;

• the inner area of $S^{\rm int}$ plus the outer area of the boundary $\partial S$ equals the outer area of the closure $\overline{S}$.

It follows that the inner and outer areas of $S$ coincide precisely when the outer area of the boundary $\partial S$ is zero. But a moment's thought shows that this is nothing but the condition that $\partial S$ should have zero content. In short, the inner and outer areas of $S$ coincide precisely when $S$ is measurable. This is the explanation for the name "measurable": The measurable sets are the ones that have a well-defined area.
Although the class of Jordan measurable sets is much more extensive than the class of sets whose area can be computed by one-variable calculus, it is not as big as we would ideally wish. It does not include all bounded open sets or all compact sets, for example. Moreover, it does not behave well with respect to passage to limits: The union of a sequence of measurable sets, all contained in a common rectangle, need not be measurable.
A simple example of the latter phenomenon can be obtained by considering the sets $S_k$ of all points in the unit square whose $x$-coordinate is an integer multiple of $2^{-k}$. Each $S_k$ is the union of a finite collection of line segments, so it is measurable and its area is zero. However, the union $\bigcup_1^\infty S_k$ is the set of all points in the unit square whose $x$-coordinate has a terminating base-2 expansion. This set is dense in the unit square but has no interior, from which it is easy to see that its inner area is 0 but its outer area is 1 (Exercises 3 and 4). By "fattening up" the sets $S_k$ (replacing the line segments in them by thin rectangles), we can also obtain examples of open sets and closed sets that are not measurable (Exercise 6).
The defects of the Jordan theory of area carry over more generally to the theory
of integration we are discussing, and for more advanced work one needs the more
sophisticated Lebesgue theory of measure and integration, of which we present a
brief sketch in §4.8. It is largely for this reason that we are being somewhat cavalier
about presenting all the theoretical details in this chapter; there seems to be little
virtue in expending an enormous amount of effort on a theory that must be upgraded
when one proceeds to a more advanced level.

Higher Dimensions. The theory of n-dimensional integrals is almost identical
to the theory of double integrals; the only reason we have not considered an
arbitrary n from the beginning is that the notation is simpler, and the geometric intuition
is clearer, when n = 2. We have merely to replace rectangles by n-dimensional
rectangular boxes, that is, regions in $\mathbb{R}^n$ of the form

$$R = [a_1,b_1]\times\cdots\times[a_n,b_n] = \{\mathbf{x} : a_1 \le x_1 \le b_1,\ \ldots,\ a_n \le x_n \le b_n\}.$$

The n-dimensional volume of such a box is the product of the lengths of its sides,
$\prod_{j=1}^n (b_j - a_j)$. (Here $\prod$ is the product sign, analogous to $\sum$ for sums.) A partition
of such a box is specified by partitioning each of its "sides" $[a_1,b_1],\ldots,[a_n,b_n]$.
The notion of "zero content" generalizes to n dimensions in the obvious way:
A bounded set $Z \subset \mathbb{R}^n$ has zero content if for any $\epsilon > 0$ there are rectangular
boxes $R_1,\ldots,R_k$ whose total volume is less than $\epsilon$, such that $Z \subset \bigcup_1^k R_j$. The
analogue of Proposition 4.19c is that smooth submanifolds of dimension $k < n$ in
$\mathbb{R}^n$ (given parametrically by $C^1$ maps $\mathbf{f}: \mathbb{R}^k \to \mathbb{R}^n$) have zero content.
With these modifications, the definition of integrability and Theorems 4.17,
4.18, and 4.21 work just as in the 2-dimensional case. The element of area dA
becomes an element of n-dimensional volume, which may be denoted by $dV_n$,
$d^n\mathbf{x}$, or $dx_1\cdots dx_n$; thus, the notation for n-dimensional integrals over a region
$S \subset \mathbb{R}^n$ is

$$\int\cdots\int_S f\,dV_n = \int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_S f(x_1,\ldots,x_n)\,dx_1\cdots dx_n,$$

where $\int\cdots\int$ is shorthand for a row of n integral signs. When n = 3, we usually
write dV instead of $dV_3$, the V denoting ordinary 3-dimensional volume.
We conclude with a useful fact about integrals in any number of dimensions.

4.24 Theorem (The Mean Value Theorem for Integrals). Let S be a compact,
connected, measurable subset of $\mathbb{R}^n$, and let f and g be continuous functions on
S with $g \ge 0$. Then there is a point $\mathbf{a} \in S$ such that

$$\int\cdots\int_S f(\mathbf{x})g(\mathbf{x})\,d^n\mathbf{x} = f(\mathbf{a})\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x}.$$

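Proof. Let m and M be the minimum and maximum values of f on S, which exist
since S is compact. Since $g \ge 0$, we have $mg \le fg \le Mg$ on S, and hence

$$m\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x} \le \int\cdots\int_S f(\mathbf{x})g(\mathbf{x})\,d^n\mathbf{x} \le M\int\cdots\int_S g(\mathbf{x})\,d^n\mathbf{x}.$$

If $\int\cdots\int_S g = 0$, these inequalities show that $\int\cdots\int_S fg = 0$ as well, and any $\mathbf{a} \in S$
will do. Otherwise, the quotient $(\int\cdots\int_S fg)/(\int\cdots\int_S g)$ lies between m and M, so by the
intermediate value theorem (which applies because S is connected and f is continuous),
it is equal to $f(\mathbf{a})$ for some $\mathbf{a} \in S$. □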

The special case g = 1 is of particular interest:

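4.25 Corollary. Let S be a compact, connected, measurable subset of $\mathbb{R}^n$, and let
f be a continuous function on S. Then there is a point $\mathbf{a} \in S$ such that

$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = f(\mathbf{a})\,|S|,$$

where |S| denotes the n-dimensional volume of S.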

The ratio of $\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x}$ to the n-dimensional volume of S is, by definition,
the average or mean value of f on S. Corollary 4.25 says that when f is continuous
and S is compact and connected, there is some point in S at which the actual
value of f is the average value.
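Corollary 4.25 can be illustrated numerically. The sketch below (Python, standard library only; all names are our own) takes S = [0,1] × [0,1] and the sample function $f(x,y) = x^2 + y$, whose average value over S is $\frac13 + \frac12 = \frac56$. A midpoint Riemann sum stands in for the integral, and a point a with f(a) equal to the average is then located by bisection along the segment from (0,0), where f takes its minimum 0, to (1,1), where it takes its maximum 2. Searching along a path inside S mirrors the way connectedness enters the proof of Theorem 4.24.

```python
def f(x, y):
    return x * x + y


def average_on_unit_square(g, n=400):
    """Midpoint-rule approximation to the average of g over S = [0,1] x [0,1] (|S| = 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += g((i + 0.5) * h, (j + 0.5) * h)
    return total * h * h


def point_with_value(g, target, p, q, tol=1e-12):
    """Bisection on the segment from p to q, assuming g(p) <= target <= g(q)."""
    def point(t):
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(*point(mid)) < target:
            lo = mid
        else:
            hi = mid
    return point(lo)


avg = average_on_unit_square(f)                       # exact average is 1/3 + 1/2 = 5/6
a = point_with_value(f, avg, (0.0, 0.0), (1.0, 1.0))  # f(0,0) = 0 <= avg <= 2 = f(1,1)
print(avg, a, f(*a))
```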

EXERCISES

1. Prove Proposition 4.19(a,b).

2. Let $f : [a,b] \to \mathbb{R}$ be an integrable function.

a. Show that the graph of f in $\mathbb{R}^2$ has zero content. (Hint: Given a partition
P of [a,b], interpret $S_Pf - s_Pf$ as a sum of areas of rectangles that cover
the graph of f.)
b. Suppose $f \ge 0$ and let $S = \{(x,y) : x \in [a,b],\ 0 \le y \le f(x)\}$. Show
that S is measurable and that its area (as defined in this section) equals
$\int_a^b f(x)\,dx$.
3. Let S be a bounded set in $\mathbb{R}^2$. Show that S and $S^{\mathrm{int}}$ have the same inner area.
(Hint: For any rectangle contained in S, there are slightly smaller rectangles
contained in $S^{\mathrm{int}}$.)
4. Let S be a bounded set in $\mathbb{R}^2$. Show that S and $\overline{S}$ have the same outer area.
(Hint: For any rectangle that does not intersect S, there are slightly smaller
rectangles that do not intersect $\overline{S}$.)

5. Let S be a bounded set in $\mathbb{R}^2$. Show that the inner area of S plus the outer area
of $\partial S$ equals the outer area of S. (Use Exercises 3 and 4.)

6. Let S be the subset of the x-axis consisting of the union of the open interval
of length $\frac14$ centered at $\frac12$, the open intervals of length $\frac1{16}$ centered at $\frac14$ and $\frac34$,
the open intervals of length $\frac1{64}$ centered at $\frac18$, $\frac38$, $\frac58$, and $\frac78$, and so forth. Let
$U = S \times (0,1)$ be the union of the open rectangles of height 1 based on these
intervals. Thus U is the union of one rectangle of area $\frac14$, two rectangles of area
$\frac1{16}$, four rectangles of area $\frac1{64}$, ..., some of which overlap.
a. Show that U is an open subset of the unit square R = [0,1] × [0,1].
b. Show that the inner area of U is less than $\frac12$.
c. Show that U is dense in R and hence that its outer area is 1. (Use Exercise
4.)
d. Let $V = R \setminus U$. Show that V is a closed set whose inner area is 0 and
whose outer area is bigger than $\frac12$.
7. (The Second Mean Value Theorem for Integrals) Suppose f is continuous on
[a,b] and $\varphi$ is of class $C^1$ and increasing on [a,b]. Show that there is a point
$c \in [a,b]$ such that

$$\int_a^b f(x)\varphi(x)\,dx = \varphi(a)\int_a^c f(x)\,dx + \varphi(b)\int_c^b f(x)\,dx.$$

(Hint: First suppose $\varphi(b) = 0$. Set $F(x) = \int_a^x f(t)\,dt$, integrate by parts
to show that $\int_a^b f(x)\varphi(x)\,dx = -\int_a^b F(x)\varphi'(x)\,dx$, and apply Theorem 4.24
to the latter integral. To remove the condition $\varphi(b) = 0$, show that if the
conclusion is true for f and $\varphi$, it is true for f and $\varphi + C$ for any constant C.)

4.3 Multiple Integrals and Iterated Integrals

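The next issue to be addressed is the evaluation of n-dimensional integrals. The
usual procedure is to reduce them to one-dimensional integrals.
Again we focus on the case n = 2, and we begin by considering the integral of
a function f over a rectangle $R = [a,b]\times[c,d]$. Given a partition $P = \{x_0,\ldots,x_J;\ y_0,\ldots,y_K\}$
of R, we pick points $\tilde x_j \in [x_{j-1},x_j]$ and $\tilde y_k \in [y_{k-1},y_k]$ ($1 \le j \le J$, $1 \le k \le K$)
and form the Riemann sum

$$\sum_{j=1}^J\sum_{k=1}^K f(\tilde x_j,\tilde y_k)\,\Delta x_j\,\Delta y_k \qquad (\Delta x_j = x_j - x_{j-1},\ \Delta y_k = y_k - y_{k-1}).$$

If f is integrable on R, this double sum approximates the integral $\iint_R f(x,y)\,dx\,dy$.
On the other hand, for each fixed y, the sum $\sum_{j=1}^J f(\tilde x_j,y)\,\Delta x_j$ is a Riemann sum
for the single integral $g(y) = \int_a^b f(x,y)\,dx$, and then the sum $\sum_{k=1}^K g(\tilde y_k)\,\Delta y_k$ is
a Riemann sum for the integral $\int_c^d g(y)\,dy$. Thus, in an approximate sense,

$$\iint_R f\,dA \approx \sum_{k=1}^K\left[\sum_{j=1}^J f(\tilde x_j,\tilde y_k)\,\Delta x_j\right]\Delta y_k \approx \sum_{k=1}^K\left[\int_a^b f(x,\tilde y_k)\,dx\right]\Delta y_k \approx \int_c^d\left[\int_a^b f(x,y)\,dx\right]dy.$$

In short, if there are no unexpected pitfalls we should have

$$\iint_R f\,dA = \int_c^d\left[\int_a^b f(x,y)\,dx\right]dy.$$

We could also play the same game with x and y switched, obtaining

$$\iint_R f\,dA = \int_a^b\left[\int_c^d f(x,y)\,dy\right]dx.$$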
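The regrouping of the double Riemann sum into a sum of one-variable Riemann sums can be seen directly in code. In this minimal Python sketch (helper names are our own), both iterated midpoint sums use the same grid of sample points, so they are the same finite double sum grouped in two ways and agree up to rounding. For the sample function $f(x,y) = x^2 + xy$ on $[0,1]\times[0,2]$, both approximate the exact value $\int_0^2 (\frac13 + \frac{y}{2})\,dy = \frac53$.

```python
def midpoint_1d(func, lo, hi, n=200):
    """Midpoint-rule approximation to the integral of func over [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def iterated_dx_dy(f, a, b, c, d, n=200):
    # For each fixed y, integrate in x over [a, b]; then integrate the result in y over [c, d].
    return midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), a, b, n), c, d, n)


def iterated_dy_dx(f, a, b, c, d, n=200):
    # The same grid of sample points, grouped the other way: first in y, then in x.
    return midpoint_1d(lambda x: midpoint_1d(lambda y: f(x, y), c, d, n), a, b, n)


def f(x, y):
    return x * x + x * y


I_xy = iterated_dx_dy(f, 0, 1, 0, 2)
I_yx = iterated_dy_dx(f, 0, 1, 0, 2)
print(I_xy, I_yx, 5 / 3)
```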

If f is continuous on the rectangle R, it is not hard to make this argument
rigorous by using the uniform continuity of f. However, we need to allow
discontinuous functions in order to encompass integrals over more general regions, and
here there is one potential pitfall: The integrability of f on R need not imply the
integrability of $f(x,y_0)$, as a function of x for fixed $y_0$, on [a,b]. The line
segment $\{(x,y) : a \le x \le b,\ y = y_0\}$ is a set of zero content, after all, so f could
be discontinuous at every point on it, and its behavior as a function of x could be
quite wild. This problem is actually not too serious, and we shall sweep it under
the rug by making the assumption (quite harmless in practice) that it does not
occur. The resulting theorem is as follows. It is sometimes referred to as Fubini's
theorem, although that name belongs more properly to the generalization of the
theorem to Lebesgue integrals.

4.26 Theorem. Let $R = \{(x,y) : a \le x \le b,\ c \le y \le d\}$, and let f be an
integrable function on R. Suppose that, for each $y \in [c,d]$, the function $f_y$ defined
by $f_y(x) = f(x,y)$ is integrable on [a,b], and the function $g(y) = \int_a^b f(x,y)\,dx$ is
integrable on [c,d]. Then

$$\iint_R f\,dA = \int_c^d\left[\int_a^b f(x,y)\,dx\right]dy. \tag{4.27}$$

Likewise, if $f_x(y) = f(x,y)$ is integrable on [c,d] for each $x \in [a,b]$, and $h(x) =
\int_c^d f(x,y)\,dy$ is integrable on [a,b], then

$$\iint_R f\,dA = \int_a^b\left[\int_c^d f(x,y)\,dy\right]dx. \tag{4.28}$$

Proof. The proof is presented in Appendix B.4 (Theorem B.9). The issue that must
be addressed is the permissibility of first letting the x-subdivisions get finer and
finer, and then doing the same for the y-subdivisions, or vice versa, as opposed to
requiring both subdivisions to become finer at the same time. □

The integrals on the right side of (4.27) and (4.28) are called iterated integrals.
It is customary to omit the brackets in these integrals and to write, for example,

$$\int_c^d\int_a^b f(x,y)\,dx\,dy,$$

with the understanding that the integration is to be done "from the inside out." That
is, the innermost integral $\int_a^b$ corresponds to the innermost differential dx, and the
integral with respect to the corresponding variable x is to be performed first. Some
people find it clearer to write the differentials dx and dy next to the integral signs
to which they pertain, thus:

$$\int_c^d dy\int_a^b dx\, f(x,y).$$

Figure 4.4: $\iint\cdots dx\,dy$ versus $\iint\cdots dy\,dx$.

If our region of integration is not the whole rectangle R but a subset S, the
integration effectively stops at the boundary of S, and the limits of integration should
be adjusted accordingly. For example, if S is bounded above and below by the
graphs of two functions,

$$S = \{(x,y) : a \le x \le b,\ \varphi(x) \le y \le \psi(x)\}, \tag{4.29}$$

we have

$$\iint_S f\,dA = \int_a^b\int_{\varphi(x)}^{\psi(x)} f(x,y)\,dy\,dx. \tag{4.30}$$

Here it is essential to integrate first in y, then in x, since the limits $\varphi(x)$ and $\psi(x)$
furnish part of the x-dependence of the integrand for the outer integral $\int_a^b\cdots dx$.
It is important to observe that if S is a region of the form (4.29) where $\varphi$ and
$\psi$ are of class $C^1$, and f is continuous on S, the hypotheses in Theorem 4.26
that allow integration first in y and then in x are automatically satisfied, so that
(4.30) is valid. Indeed, the integrability of $f\chi_S$ on any rectangle $R \supset S$ follows
from Proposition 4.19c and Theorem 4.21, and the integrability of the function
$(f\chi_S)(x,y)$ as a function of y for fixed x is obvious since it is continuous except
at the two points $y = \varphi(x)$ and $y = \psi(x)$.
On the other hand, if S is bounded on the left and right by graphs of functions
of y, we obtain a formula similar to (4.30) with the roles of x and y reversed.
In general, most of the regions S that arise in practice can be decomposed into a
finite number of pieces $S_1,\ldots,S_k$, each of which is of the form (4.29) or of the
analogous form with x and y switched. By using the additivity property (Theorem
4.17b), we can reduce the computation of $\iint_S f\,dA$ to the calculation of iterated
integrals on these subregions.
Figure 4.4 may be helpful in interpreting iterated integrals. The sketch on the
left symbolizes $\iint\cdots dx\,dy$, in which we integrate first over the horizontal lines
that run from the left side to the right side of the region, then integrate over the
y-interval that comprises the y-coordinates of all these lines. Similarly, the sketch
on the right symbolizes $\iint\cdots dy\,dx$.
Figure 4.5: The regions of integration in Example 1 (left) and Example 2 (right).

Example 1. Find the volume of the region in $\mathbb{R}^3$ above the triangle T in
the xy-plane with vertices (0,0), (1,0), and (1,2) and below the surface $z =
xy + y^2$. (See Figure 4.5.)
Solution. The volume in question is $\iint_T (xy + y^2)\,dA$, which can be
expressed as an iterated integral in two ways:

$$\int_0^2\int_{y/2}^1 (xy + y^2)\,dx\,dy \quad\text{or}\quad \int_0^1\int_0^{2x} (xy + y^2)\,dy\,dx.$$

For the sake of illustration, we perform both calculations:

$$\int_0^2\int_{y/2}^1 (xy + y^2)\,dx\,dy = \int_0^2\Big[\tfrac12 x^2y + xy^2\Big]_{x=y/2}^{1}dy = \int_0^2\big(\tfrac12 y + y^2 - \tfrac58 y^3\big)\,dy,$$

$$\int_0^1\int_0^{2x} (xy + y^2)\,dy\,dx = \int_0^1\Big[\tfrac12 xy^2 + \tfrac13 y^3\Big]_{y=0}^{2x}dx = \int_0^1 \tfrac{14}{3}x^3\,dx.$$

Both single integrals on the right evaluate to $\frac76$.
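A quick numerical cross-check of Example 1 can be written in a few lines of Python, with the midpoint rule standing in for exact integration (the helper `midpoint_1d` is our own, not a library function). Note how each inner limit, y/2 or 2x, depends only on the outer variable.

```python
def midpoint_1d(func, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return x * y + y * y


# First in x (from y/2 to 1), then in y (from 0 to 2).
dx_dy = midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), y / 2, 1), 0, 2)
# First in y (from 0 to 2x), then in x (from 0 to 1).
dy_dx = midpoint_1d(lambda x: midpoint_1d(lambda y: f(x, y), 0, 2 * x), 0, 1)
print(dx_dy, dy_dx, 7 / 6)
```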

Example 2. Let S be the region between the parabolas $x = 4 - y^2$ and
$x = y^2 - 4$. (See Figure 4.5.) A double integral $\iint_S f(x,y)\,dA$ can be reduced
to iterated integrals in two ways. Integrating first in x is more straightforward:

$$\int_{-2}^2\int_{y^2-4}^{4-y^2} f(x,y)\,dx\,dy.$$

To integrate first in y, we must break up S into its left and right halves:

$$\int_{-4}^0\int_{-\sqrt{x+4}}^{\sqrt{x+4}} f(x,y)\,dy\,dx + \int_0^4\int_{-\sqrt{4-x}}^{\sqrt{4-x}} f(x,y)\,dy\,dx.$$
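The two descriptions of S in Example 2 can be checked the same way. Taking f = 1 (our own choice, so that both iterated integrals give the area of S, namely $\int_{-2}^2 (8 - 2y^2)\,dy = \frac{64}{3}$), the Python sketch below evaluates the x-first integral and the sum of the two y-first integrals with a hand-written midpoint rule. The square-root limits make the y-first version converge somewhat more slowly, which is why the tolerance in the check is loose.

```python
import math


def midpoint_1d(func, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return 1.0  # with f = 1, every iterated integral below gives the area of S


# First in x: for each y in [-2, 2], x runs from y^2 - 4 to 4 - y^2.
dx_dy = midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), y * y - 4, 4 - y * y), -2, 2)

# First in y: split S at x = 0 into its left and right halves.
left = midpoint_1d(
    lambda x: midpoint_1d(lambda y: f(x, y), -math.sqrt(x + 4), math.sqrt(x + 4)), -4, 0)
right = midpoint_1d(
    lambda x: midpoint_1d(lambda y: f(x, y), -math.sqrt(4 - x), math.sqrt(4 - x)), 0, 4)
print(dx_dy, left + right, 64 / 3)
```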
The ideas in higher dimensions are entirely similar. The analogue of
Theorem 4.26 is that an integral over an n-dimensional rectangular solid with sides
$[a_1,b_1],\ldots,[a_n,b_n]$ can be evaluated as an n-fold iterated integral,

$$\int\cdots\int_R f\,dV = \int_{a_n}^{b_n}\cdots\int_{a_1}^{b_1} f(x_1,\ldots,x_n)\,dx_1\cdots dx_n,$$

provided that the indicated integrals exist. The meaning of the iterated integral
on the right is that the integration is to be performed first with respect to $x_1$ and
last with respect to $x_n$. However, the same formula remains valid with the n
integrations performed in any order. The only thing that needs some care is that the
integral signs $\int_{a_j}^{b_j}$ must be matched up with the differentials $dx_j$ in the right order
so as to get the right limits of integration, and the convention is the same as in
the case n = 2: The integrations are to be performed in order from innermost to
outermost.

When the region of integration is something other than a rectangular solid,
setting up the right limits of integration can be rather complicated. A typical situation
in 3 dimensions is as follows: The region of integration S is the region in between
two graphs,

$$S = \{(x,y,z) : (x,y) \in U,\ \varphi(x,y) \le z \le \psi(x,y)\},$$

based on some region U in the xy-plane. The region U in turn is the region between
two graphs,

$$U = \{(x,y) : a \le x \le b,\ \sigma(x) \le y \le \tau(x)\},$$

based on an interval $[a,b] \subset \mathbb{R}$. We then have

$$\iiint_S f\,dV = \int_a^b\int_{\sigma(x)}^{\tau(x)}\int_{\varphi(x,y)}^{\psi(x,y)} f(x,y,z)\,dz\,dy\,dx.$$

The rule to remember is that limits of integration in an iterated integral can
depend on the remaining "outer" variables whose integration is yet to be performed,
but not on the "inner" variables that have already been integrated out. The final
answer should be a number, not a function of some of the variables!

Example 3. Find the mass of the tetrahedron T formed by the three coordinate
planes and the plane x + y + 2z = 2 (see Figure 4.6) if the mass density
is given by $\rho(x,y,z) = e^{-z}$.

Figure 4.6: The tetrahedron in Example 3.

Solution. There are six ways to write the triple integral $\iiint_T e^{-z}\,dV$ as an
iterated integral, although only three of them are essentially different, namely,

$$\int_0^2\int_0^{2-x}\int_0^{1-(x+y)/2} e^{-z}\,dz\,dy\,dx, \qquad \int_0^1\int_0^{2-2z}\int_0^{2-y-2z} e^{-z}\,dx\,dy\,dz,$$

$$\int_0^2\int_0^{1-(y/2)}\int_0^{2-y-2z} e^{-z}\,dx\,dz\,dy.$$

(The remaining three can be obtained from these simply by interchanging x
and y, since T and the density function are invariant under this interchange.)
Using the first of these, we obtain

$$\int_0^2\int_0^{2-x}\big(1 - e^{(x+y)/2-1}\big)\,dy\,dx = \int_0^2\Big[y - 2e^{(x+y)/2-1}\Big]_{y=0}^{2-x}dx = \int_0^2\big(2e^{(x/2)-1} - x\big)\,dx = \Big[4e^{(x/2)-1} - \tfrac12x^2\Big]_0^2 = 2 - 4e^{-1}.$$

The reader may verify that the other two iterated integrals give the same answer.
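A numerical sketch can carry out part of that verification. Below (Python, standard library only; function names are our own), the first two iterated integrals listed above are approximated with nested midpoint rules. In each, every limit depends only on variables that are integrated later, as the rule stated before Example 3 requires. Both results should be close to $2 - 4e^{-1} \approx 0.5285$.

```python
import math


def midpoint_1d(func, lo, hi, n=60):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def rho(x, y, z):
    return math.exp(-z)  # mass density of Example 3


def mass_dz_dy_dx():
    # z from 0 to 1 - (x + y)/2, then y from 0 to 2 - x, then x from 0 to 2.
    def inner(x):
        return midpoint_1d(
            lambda y: midpoint_1d(lambda z: rho(x, y, z), 0, 1 - (x + y) / 2), 0, 2 - x)
    return midpoint_1d(inner, 0, 2)


def mass_dx_dy_dz():
    # x from 0 to 2 - y - 2z, then y from 0 to 2 - 2z, then z from 0 to 1.
    def inner(z):
        return midpoint_1d(
            lambda y: midpoint_1d(lambda x: rho(x, y, z), 0, 2 - y - 2 * z), 0, 2 - 2 * z)
    return midpoint_1d(inner, 0, 1)


m1, m2 = mass_dz_dy_dx(), mass_dx_dy_dz()
print(m1, m2, 2 - 4 / math.e)
```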

In the preceding discussion, iterated integrals appeared as a tool for evaluating
n-dimensional integrals. However, they also arise in a number of other contexts
in advanced analysis where a quantity is defined by performing two or more
integrations in succession. In this context, the significance of Theorem 4.26 is that
under suitable hypotheses on the integrand f, the order of integration in an iterated
integral can be reversed:

$$\int_a^b\int_c^d f(x,y)\,dy\,dx = \int_c^d\int_a^b f(x,y)\,dx\,dy. \tag{4.31}$$

More precisely, (4.31) is valid if f satisfies the conditions in Theorem 4.26 for
both (4.27) and (4.28) to hold. (See Exercise 13 for an example to demonstrate
the significance of these conditions.) The importance of this result can hardly be
overestimated; it is an extremely powerful tool for evaluating quantities defined by
integrals. We shall see a number of examples in later chapters.

Example 4. Evaluate $\int_0^2\int_{y/2}^1 ye^{-x^3}\,dx\,dy$.

Solution. The integral cannot be evaluated by elementary methods as it
stands, since $e^{-x^3}$ has no elementary antiderivative. However, it can be
interpreted as $\iint_T ye^{-x^3}\,dA$ where T is the triangle with vertices (0,0), (1,0), and
(1,2) as in Example 1. Writing this double integral as an iterated integral in
the other order leads to an easy calculation:

$$\int_0^1\int_0^{2x} ye^{-x^3}\,dy\,dx = \int_0^1\Big[\tfrac12 y^2\Big]_0^{2x}e^{-x^3}\,dx = \int_0^1 2x^2e^{-x^3}\,dx = \Big[-\tfrac23 e^{-x^3}\Big]_0^1 = \tfrac23\big(1 - e^{-1}\big).$$
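Numerically, unlike symbolically, both orders in Example 4 are equally easy, so a sketch can confirm that reversing the order leaves the value unchanged. The following Python code (midpoint rule; helper names are our own) evaluates the integral as given and in the reversed order and compares both with $\frac23(1 - e^{-1})$.

```python
import math


def midpoint_1d(func, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return y * math.exp(-x ** 3)


# Order as given: x from y/2 to 1, then y from 0 to 2.
given = midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), y / 2, 1), 0, 2)
# Reversed order: y from 0 to 2x, then x from 0 to 1.
reversed_order = midpoint_1d(lambda x: midpoint_1d(lambda y: f(x, y), 0, 2 * x), 0, 1)
exact = (2 / 3) * (1 - math.exp(-1))
print(given, reversed_order, exact)
```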

Applications. Double and triple integrals can be used to calculate physical and
geometric quantities in much the same way as single integrals. Here are a few
standard examples:

• If $f(x,y) \ge 0$, the integral $\iint_S f\,dA$ can be interpreted as the volume of the
region in $\mathbb{R}^3$ between the graph of f and the xy-plane that lies over the base
region S.

• Suppose that a quantity of some substance (which might be mass, electric
charge, a particular chemical compound, etc.) is distributed throughout
a region $U \subset \mathbb{R}^3$. It is frequently useful to think of the distribution
of the substance as being described by a density function $\rho$; the meaning
of this, in practical terms, is that the amount of substance in a set $S \subset U$
is $\iiint_S \rho(\mathbf{x})\,d^3\mathbf{x}$. This idea also works in other dimensions, for example, to
describe distributions of a substance on a planar surface or a line.

(The reader may wish for a more careful discussion of the meaning of $\rho$.
Informally, $\rho(\mathbf{x})$ represents the ratio of the amount of substance in an
infinitesimal cube centered at $\mathbf{x}$ to the volume of that cube. To make this rigorous,
one should interpret $\rho(\mathbf{x})$ as the limit of the ratio of the amount of substance
in a finite cube centered at $\mathbf{x}$ to the volume of that cube as the side length of
the cube tends to zero. One can then prove, under suitable hypotheses, that
the amount of substance in any region S is $\iiint_S \rho(\mathbf{x})\,d^3\mathbf{x}$. But a complete
analysis of these matters is beyond the scope of this book.)

• Suppose that a massive object with mass density $\rho(\mathbf{x})$ occupies the region
$S \subset \mathbb{R}^3$, so that its total mass is $m = \iiint_S \rho(\mathbf{x})\,d^3\mathbf{x}$. The center of mass (or center
of gravity) of the object is the point $\bar{\mathbf{x}}$ whose coordinates are $\bar{x}_j = m^{-1}\iiint_S x_j\rho(\mathbf{x})\,d^3\mathbf{x}$. In
the special case where $\rho = 1$, $\bar{\mathbf{x}}$ is the centroid of the region S, which is the
point whose coordinates are the average values of the coordinate functions
on S. The center of mass, in general, can be interpreted similarly as the point
whose coordinates are the weighted averages of the coordinate functions on
S where the weighting is given by the density $\rho$.

• Again suppose that a massive object with mass density $\rho(\mathbf{x})$ occupies the
region $S \subset \mathbb{R}^3$, and let L be a line in $\mathbb{R}^3$. The moment of inertia of the
body about the line L, a quantity that is useful in analyzing rotational motion
about L, is $\iiint_S d(\mathbf{x})^2\rho(\mathbf{x})\,d^3\mathbf{x}$, where $d(\mathbf{x})$ is the distance from $\mathbf{x}$ to L. (For
example, if L is the z-axis, then $d(x,y,z)^2 = x^2 + y^2$.)
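As a small planar instance of these formulas (density $\rho = 1$, so the mass is just the area), the Python sketch below computes the centroid of the triangle T of Example 1 by iterated midpoint sums; the helper names are our own. The exact answer is (2/3, 2/3), which is also the average of the three vertices, as it must be for a triangle.

```python
def midpoint_1d(func, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def integrate_over_T(g):
    """Double integral of g over T = {0 <= x <= 1, 0 <= y <= 2x}, integrating first in y."""
    return midpoint_1d(lambda x: midpoint_1d(lambda y: g(x, y), 0, 2 * x), 0, 1)


area = integrate_over_T(lambda x, y: 1.0)          # mass when rho = 1
x_bar = integrate_over_T(lambda x, y: x) / area    # average value of x on T
y_bar = integrate_over_T(lambda x, y: y) / area    # average value of y on T
print(area, x_bar, y_bar)
```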

EXERCISES

1. Evaluate the following double integrals.

a. $\iint_S (x + 3y^3)\,dA$, S = the upper half ($y \ge 0$) of the unit disc $x^2 + y^2 \le 1$.
b. $\iint_S (x^2 - \sqrt{y})\,dA$, S = the region between the parabola $x = y^2$ and the
line x = 2y.

2. Find the volume of the region above the triangle in the xy-plane with vertices
(0,0), (1,0), and (0,1), and below the surface z = Qxy(l - x — y).
3. For the following regions $S \subset \mathbb{R}^2$, express the double integral $\iint_S f\,dA$ in
terms of iterated integrals in two different ways.
a. S = the region in the left half plane between the curve $y = x^3$ and the line
y = 4x.
b. S = the triangle with vertices (0,0), (2,2), and (3,1).
c. S = the region between the parabolas $y = x^2$ and $y = 6 - 4x - x^2$.
4. Express each of the following iterated integrals as a double integral and as an
iterated integral in the opposite order. (That is, find the region of integration
for the double integral and the limits of integration for the other iterated integral.)
i 1/3

a. /0 /J> f(x,y)dydx.

c- J?fi°gXf(x,y)dydx.

5. Evaluate the following iterated integrals. (You may need to reverse the order of
integration.)
a. Jx fi ye2x dx dy.

c. J1 jx,x ye y dy dx.

6. Fill in the blanks: Jo J^x2 f(y) dy dx = jo[ ] dy + J^[ ]dy. The expres
sions you obtain for the [ ]'s should not contain integral signs.

7. Given a continuous function $g : \mathbb{R} \to \mathbb{R}$, let $h(x) = \int_0^x\int_0^y g(t)\,dt\,dy$. That is,
h is obtained by integrating g twice, starting the integration at 0. Show that h
can be expressed as a single integral, namely, $h(x) = \int_0^x (x - t)\,g(t)\,dt$. (Note
that x can be treated as a constant here; y and t are the variables of integration.)

8. Let $S \subset \mathbb{R}^3$ be the region between the paraboloid $z = x^2 + y^2$ and the plane
z = 1. Express the triple integral $\iiint_S f\,dV$ as an iterated integral with the order of
integration (a) z, y, x; (b) y, z, x; (c) x, y, z. (That is, find the appropriate limits
of integration in each case.)

9. Express the iterated integral fQ f0 v Jq f(x,y,z)dzdxdy

a. as a triple integral (i.e., describe the region of integration);
b. as an iterated integral in the order z, y, x;
c. as an iterated integral in the order y,z,x.

10. Find the centroid of the tetrahedron bounded by the coordinate planes and the
plane (x/a) + (y/b) + (z/c) = 1.

11. An object with mass density $\rho(x,y,z) = yz$ occupies the cube $\{(x,y,z) : 0 \le
x, y, z \le 2\}$. Find its mass and center of mass.

12. A body with charge density $\rho(x,y,z) = 2z$ occupies the region bounded below
by the parabolic cylinder $z = x^2 - 3$, above by the plane $z = x - 1$, and on the
sides by the planes y = 0 and y = 2. Find its net charge (total positive charge
minus total negative charge).

13. Let $f(x,y) = y^{-2}$ if $0 < x < y < 1$, $f(x,y) = -x^{-2}$ if $0 < y < x < 1$, and
$f(x,y) = 0$ otherwise, and let S be the unit square [0,1] × [0,1].
a. Show that f is not integrable on S, but that f(x,y) is integrable on [0,1]
as a function of x for each fixed y and as a function of y for each fixed x.
b. Show by explicit calculation that the iterated integrals $\int_0^1\int_0^1 f(x,y)\,dx\,dy$
and $\int_0^1\int_0^1 f(x,y)\,dy\,dx$ both exist and are unequal.

4.4 Change of Variables for Multiple Integrals

To motivate the ideas in this section, we recall the change-of-variable formula for
single definite integrals: If g is a one-to-one function of class $C^1$ on the interval
[a,b], then for any continuous function f,

$$\int_a^b f(g(u))\,g'(u)\,du = \int_{g(a)}^{g(b)} f(x)\,dx. \tag{4.32}$$

The proof is a simple matter of combining the chain rule and the fundamental
theorem of calculus. Indeed, if F is an antiderivative of f, the right side of (4.32) is
$F(g(b)) - F(g(a))$, which in turn equals $\int_a^b (F\circ g)'(u)\,du$, and the latter integrand
is $f(g(u))\,g'(u)$. (Formula (4.32) is actually valid when f is merely integrable, but
we shall not worry about this refinement here.)
There is one slightly tricky point here, which we point out now because it will
be significant later. If g is an increasing function, (4.32) is fine as it stands, but
if g is decreasing, the endpoints of the integral on the right are in the "wrong"
order, and we might prefer to put them back in the "right" order by introducing a
minus sign: $\int_{g(a)}^{g(b)} = -\int_{g(b)}^{g(a)}$. Since g is increasing or decreasing according as g' is
positive or negative, we could rewrite (4.32) as

$$\int_{[a,b]} f(g(u))\,|g'(u)|\,du = \int_{g([a,b])} f(x)\,dx. \tag{4.33}$$

Here g([a,b]) is the interval to which [a,b] is mapped under g, and for any interval
I the symbol $\int_I$ means the integral from the left endpoint of I to the right endpoint.
The replacement of g' by |g'| compensates for the extra minus sign that comes from
adjusting the order of the endpoints when g is decreasing.
In practice it is often more convenient to have all the g's on one side of the
equation. If we set I = g([a,b]), we have $[a,b] = g^{-1}(I)$, and (4.33) becomes

$$\int_I f(x)\,dx = \int_{g^{-1}(I)} f(g(u))\,|g'(u)|\,du. \tag{4.34}$$
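Formula (4.34), and the sign issue it resolves, can be checked numerically. In this Python sketch (our own choice of example and helper names), $g(u) = 1 - u^2$ is decreasing and one-to-one on [0,1] with g([0,1]) = [0,1], and f = exp. The version with |g'| reproduces $\int_0^1 e^x\,dx = e - 1$, while the signed version (4.32) gives $\int_{g(0)}^{g(1)} e^x\,dx = \int_1^0 e^x\,dx = -(e - 1)$.

```python
import math


def midpoint_1d(func, lo, hi, n=1000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def g(u):
    return 1 - u * u  # one-to-one and decreasing on [0, 1]; g([0, 1]) = [0, 1]


def g_prime(u):
    return -2 * u


f = math.exp

lhs = midpoint_1d(f, 0, 1)                                    # integral of f over I = [0, 1]
rhs = midpoint_1d(lambda u: f(g(u)) * abs(g_prime(u)), 0, 1)  # right side of (4.34)
signed = midpoint_1d(lambda u: f(g(u)) * g_prime(u), 0, 1)    # left side of (4.32)
print(lhs, rhs, signed, math.e - 1)
```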

Our object is to find the analogous formula for multiple integrals. It is natural
to use (4.34) rather than (4.32) as a starting point, since for multiple integrals the
issue of left-to-right or right-to-left disappears and we just speak of integrals over
a region, like the integrals over intervals that appear in (4.34). More precisely,
suppose G is a one-to-one transformation from a region R in $\mathbb{R}^n$ to another region
S = G(R) in $\mathbb{R}^n$; then $R = G^{-1}(S)$, and the formula we are seeking should look
something like this:

$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_{G^{-1}(S)} f(G(\mathbf{u}))\,(\,?\,)\,d^n\mathbf{u}. \tag{4.35}$$

The missing ingredient "?" is the quantity that will play the role of |g'(u)| in the formula
(4.34).

Figure 4.7: The element of area in polar coordinates.
Now, the g'(u) in (4.32) or (4.34) is the factor that relates the differentials du
and dx under the transformation x = g(u). In n variables, the n-fold differential
$d^n\mathbf{x} = dx_1\cdots dx_n$ represents the "element of volume," that is, the volume of an
infinitesimal piece of n-space. So the question is: How does the volume of a tiny
piece of n-space change when one applies the transformation G?
To get a feeling for what is going on, let us look at the polar coordinate map

$$(x,y) = G(r,\theta) = (r\cos\theta,\ r\sin\theta).$$

A small rectangle in the $r\theta$-plane with lower left corner at $(r,\theta)$ and sides of length
dr and $d\theta$ is mapped to a small region in the xy-plane bounded by two line
segments of length dr and two circular arcs of length $r\,d\theta$ and $(r + dr)\,d\theta$. When dr
and $d\theta$ are very small, this is essentially a rectangle with sides dr and $r\,d\theta$, so its
area is $r\,dr\,d\theta$. In short, a small bit of the $r\theta$-plane with area $dr\,d\theta$ is mapped to a
small bit of the xy-plane with area $r\,dr\,d\theta$; see Figure 4.7. Hence, in this case the
missing factor in (4.35) is simply r, and (4.35) becomes

$$\iint_S f(x,y)\,dx\,dy = \iint_{G^{-1}(S)} f(r\cos\theta,\ r\sin\theta)\,r\,dr\,d\theta. \tag{4.36}$$

Here S is a region in the xy-plane and $R = G^{-1}(S)$ is the corresponding region in
the $r\theta$-plane. Our argument here has been very informal, but this result is correct,
and it gives the formula for computing double integrals in polar coordinates.
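To see the factor r at work, the following Python sketch (helper names are our own) integrates the sample function $f(x,y) = x^2 + y^2$ over the unit disc, where the exact value is $\int_0^{2\pi}\int_0^1 r^3\,dr\,d\theta = \pi/2$. It compares (4.36), the same iterated integral with the factor r omitted (which gives $2\pi/3$ instead), and a crude Cartesian Riemann sum of $f\chi_S$ on $[-1,1]^2$, which converges more slowly because of the curved boundary.

```python
import math


def midpoint_1d(func, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return x * x + y * y


# (4.36) with S = unit disc, so G^{-1}(S) is 0 <= r <= 1, 0 <= theta <= 2*pi.
polar = midpoint_1d(lambda t: midpoint_1d(
    lambda r: f(r * math.cos(t), r * math.sin(t)) * r, 0, 1), 0, 2 * math.pi)

# The same iterated integral with the factor r left out (wrong on purpose).
no_factor = midpoint_1d(lambda t: midpoint_1d(
    lambda r: f(r * math.cos(t), r * math.sin(t)), 0, 1), 0, 2 * math.pi)

# Cartesian cross-check: midpoint sum of f times the characteristic function of the disc.
n = 400
h = 2.0 / n
cartesian = 0.0
for i in range(n):
    x = -1 + (i + 0.5) * h
    for j in range(n):
        y = -1 + (j + 0.5) * h
        if x * x + y * y < 1:
            cartesian += f(x, y) * h * h

print(polar, no_factor, cartesian, math.pi / 2)
```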
The case of a linear mapping of the plane is also easy to analyze. Given a
matrix $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ with $\det A = ad - bc \neq 0$, let $\mathbf{x} = G(\mathbf{u}) = A\mathbf{u}$, that is,
$(x,y) = G(u,v) = (au + bv,\ cu + dv)$. The transformation G takes the unit
vectors (1,0) and (0,1) to the vectors (a,c) and (b,d), so it maps the standard
coordinate grid to a grid of parallelograms with sides parallel to these vectors. In
particular, it maps the square [0,1] × [0,1] to the parallelogram with vertices at
(0,0), (a,c), (b,d), and (a + b, c + d), as indicated in Figure 4.8. The area of that
parallelogram is |ad − bc|, that is, |det A|. (To see this, think of the plane as sitting
in $\mathbb{R}^3$ and recall the geometric interpretation of the cross product: The area of the
parallelogram is

$$|(a\mathbf{i} + c\mathbf{j}) \times (b\mathbf{i} + d\mathbf{j})| = |(ad - bc)\mathbf{k}| = |ad - bc|.)$$

Figure 4.8: The linear map (x, y) = (au + bv, cu + dv).

Since the map G is linear, it commutes with translations and dilations (that is,
$G(\lambda\mathbf{u} + \mathbf{w}) = \lambda G(\mathbf{u}) + G(\mathbf{w})$), so if R is any square in the uv-plane, its image under G
is a parallelogram in the xy-plane whose area is |det A| times that of R. It follows that
the missing factor in (4.35) should be simply |det A|, so that for linear maps of the plane,
(4.35) becomes

$$\iint_S f(x,y)\,dx\,dy = |ad - bc|\iint_{G^{-1}(S)} f(au + bv,\ cu + dv)\,du\,dv.$$
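The claim that G maps the unit square onto a parallelogram of area |ad − bc| is easy to spot-check. This Python sketch (function names and sample matrices are our own) computes the area of the image polygon with the shoelace formula and compares it with |det A|.

```python
def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in boundary order."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


def image_of_unit_square(a, b, c, d):
    """Vertices of G([0,1] x [0,1]) for G(u, v) = (au + bv, cu + dv), in boundary order."""
    def G(u, v):
        return (a * u + b * v, c * u + d * v)
    return [G(0, 0), G(1, 0), G(1, 1), G(0, 1)]


a, b, c, d = 2.0, 1.0, -1.0, 3.0
area = shoelace_area(image_of_unit_square(a, b, c, d))
print(area, abs(a * d - b * c))
```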

The situation is similar for linear mappings of 3-space. Namely, let $\mathbf{x} =
G(\mathbf{u}) = A\mathbf{u}$ where A is an invertible 3 × 3 matrix. If i, j, and k are the
standard basis vectors for $\mathbb{R}^3$, we have $A\mathbf{i} = \mathbf{a}$, $A\mathbf{j} = \mathbf{b}$, and $A\mathbf{k} = \mathbf{c}$ where a, b, c are
the columns of A, so A maps the unit cube to the parallelepiped generated by these
vectors. To find the volume of that parallelepiped, think of the bc-plane as its base.
Then the area of the base is |b × c|, and the height is the length of the projection of
a onto a line perpendicular to the bc-plane, namely, the line generated by b × c.
But this length is $|\mathbf{a}|\,|\cos\theta|$ where $\theta$ is the angle between a and b × c (we need the
absolute value because $\theta$ might be obtuse). Hence,

$$\text{Volume} = |\mathbf{b}\times\mathbf{c}|\,|\mathbf{a}|\,|\cos\theta| = |\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})|,$$

which is nothing but |det A| (Exercise 8 in §1.1). As before, we conclude that for
the linear map $G(\mathbf{u}) = A\mathbf{u}$ of $\mathbb{R}^3$, the missing factor in (4.35) should be |det A|.
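Similarly, the identity $|\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})| = |\det A|$ for the columns a, b, c of a 3 × 3 matrix can be spot-checked. The sketch below (plain Python; helper names and the sample matrix are our own) computes both sides, with the determinant expanded along the first row.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def det3(A):
    """Determinant of a 3x3 matrix (list of rows) by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


A = [[1, 2, 0],
     [0, 1, 3],
     [2, 0, 1]]
# a, b, c are the columns of A, i.e. the images of the basis vectors i, j, k.
a, b, c = ([row[col] for row in A] for col in range(3))
volume = abs(dot(a, cross(b, c)))
print(volume, abs(det3(A)))
```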

It is now reasonable to conjecture that the same result should hold for linear
mappings of $\mathbb{R}^n$ for any n. We proceed to show that this is correct.

4.37 Theorem. Let A be an invertible n × n matrix, and let $G(\mathbf{u}) = A\mathbf{u}$ be the
corresponding linear transformation of $\mathbb{R}^n$. Suppose S is a measurable region in
$\mathbb{R}^n$ and f is an integrable function on S. Then $G^{-1}(S) = \{A^{-1}\mathbf{x} : \mathbf{x} \in S\}$ is
measurable and $f\circ G$ is integrable on $G^{-1}(S)$, and

$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = |\det A|\int\cdots\int_{G^{-1}(S)} f(A\mathbf{u})\,d^n\mathbf{u}. \tag{4.38}$$

Proof. The proof of the measurability of $G^{-1}(S)$ and the integrability of $f\circ G$,
which is not profound but rather tedious, is given in Appendix B.5 (Corollaries
B.16 and B.17). (Actually, what is proved in Appendix B.5 is that if f is continuous
except on a set of zero content, a slightly stronger condition than integrability, then
the same is true of $f\circ G$.) Here we concentrate on proving (4.38). The proof
naturally requires some linear algebra, in particular, the facts about elementary row
operations and determinants in (A.17)-(A.18), (A.28), and (A.30) of Appendix A.
Step 1: Let us agree to (re)define $f(\mathbf{x})$ to be 0 for $\mathbf{x} \notin S$. Then $f(A\mathbf{u}) = 0$
for $\mathbf{u} \notin G^{-1}(S)$, and we can replace the regions S and $G^{-1}(S)$ in (4.38) by $\mathbb{R}^n$.
This makes the integrals in (4.38) look improper, but they really are not, since the
integrands vanish outside bounded sets. The point is that now we don't have to
worry about what the limits of integration in each variable are; we can take them
to be $\pm\infty$.
Step 2: We prove the theorem when G is an "elementary transformation," that
is, the transformation given by performing a single elementary row operation on
the column vector u. There are three kinds of elementary transformations,
corresponding to the three types of row operations (see (A.17)-(A.18)):

1. Multiply the kth component by a nonzero number c, leaving all the other
components alone:

$$G_1(u_1,\ldots,u_k,\ldots,u_n) = (u_1,\ldots,cu_k,\ldots,u_n).$$

2. Add a multiple of the jth component to the kth component, leaving all the
other components alone:

$$G_2(u_1,\ldots,u_k,\ldots,u_n) = (u_1,\ldots,u_k + cu_j,\ldots,u_n).$$

3. Interchange the jth and kth components:

$$G_3(u_1,\ldots,u_j,\ldots,u_k,\ldots,u_n) = (u_1,\ldots,u_k,\ldots,u_j,\ldots,u_n).$$

The corresponding matrices $A_1, A_2, A_3$ are obtained by performing the same row
operations on the identity matrix. Since det I = 1, the rules that tell how row
operations affect determinants (see (A.30)) give

$$\det A_1 = c, \qquad \det A_2 = 1, \qquad \det A_3 = -1. \tag{4.39}$$

It is easy to verify that (4.38) holds for these three types of transformations.
The first two involve a change in only the kth variable, so we can integrate first
with respect to that variable and use (4.34) (or, rather, the simple special cases of
(4.34) discussed in Exercise 8 of §4.1). Thus, for Gx we set xk = cuk and obtain

/(.. ,xk,...)dxk =- I f(...,cuk,...)cduk

roo

= \c\ / f(...,cuk,...)duk.
—00

(The endpoints have to be switched if c < 0, which accounts for replacing c by \c\,
as in the discussion preceding (4.34).) Likewise, for G2 we set xk = uk + cuj and
obtain
/■OO TOO

/ f(...,xk,...)dxk = f(...,uk + cuj,...)duk.

J—00 J—00

(uj is a constant as far as this calculation is concerned.) Now an integration with

respect to the remaining variables (for which xi = ui) yields

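$$\int\cdots\int f(\mathbf{x})\,d^n\mathbf{x} = |c|\int\cdots\int f(G_1(\mathbf{u}))\,d^n\mathbf{u} \quad\text{or}\quad \int\cdots\int f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int f(G_2(\mathbf{u}))\,d^n\mathbf{u}.$$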

In view of (4.39), this establishes (4.38) for $G_1$ and $G_2$. As for $G_3$, we have

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\ldots,u_j,\ldots,u_k,\ldots)\,du_j\,du_k = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\ldots,u_k,\ldots,u_j,\ldots)\,du_j\,du_k,$$

simply because the variables $u_j$ and $u_k$ are dummy variables here. That is, we are integrating $f$ with respect to its jth and kth variables, and it doesn't matter what we call them. Now an integration with respect to the remaining variables, together with (4.39), gives (4.38) for $G_3$.
Step 3: We next verify that if (4.38) is valid for the linear maps $G(\mathbf{u}) = A\mathbf{u}$ and $H(\mathbf{u}) = B\mathbf{u}$, then it is also valid for the composition $(G\circ H)(\mathbf{u}) = AB\mathbf{u}$.
182 Chapter 4. Integral Calculus

Indeed, if we set v = Bu and x = Av, we have

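$$\begin{aligned}\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} &= |\det A|\int\cdots\int_{G^{-1}(S)} f(A\mathbf{v})\,d^n\mathbf{v}\\ &= |\det A|\,|\det B|\int\cdots\int_{H^{-1}(G^{-1}(S))} f(AB\mathbf{u})\,d^n\mathbf{u}.\end{aligned}$$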

But $(\det A)(\det B) = \det(AB)$ and $H^{-1}(G^{-1}(S)) = (G\circ H)^{-1}(S)$, so the integral on the right equals

$$|\det(AB)|\int\cdots\int_{(G\circ H)^{-1}(S)} f(AB\mathbf{u})\,d^n\mathbf{u},$$

as claimed.
The Final Step: From Step 3, it follows easily by induction that if (4.38) is valid for $G_1,\ldots,G_k$, then it is also valid for the composition $G_1\circ\cdots\circ G_k$. Thus, in view of Step 2, to complete the proof we have merely to observe that every invertible linear transformation of $\mathbb{R}^n$ is a composition of elementary transformations. This is equivalent to the fact that every invertible matrix $A$ can be row-reduced to the identity matrix; see (A.52) (in particular, the equivalence of (a) and (i)) and (A.53) in Appendix A. □
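The last step rests on the fact that row reduction writes an invertible matrix as a product of elementary matrices. Here is a hedged Python sketch of that fact; `factor_into_elementary` is a hypothetical helper written for this illustration. It runs Gauss-Jordan elimination with exact fractions, records the inverse of each row operation (which is again elementary), and then checks that the recorded factors multiply back to the original matrix and that the determinants multiply as Step 3 requires. It assumes the input is square and invertible; a singular input makes the pivot search raise `StopIteration`.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def det(a):
    """Cofactor expansion along the first row (fine for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** c * a[0][c] * det([row[:c] + row[c + 1:] for row in a[1:]])
               for c in range(len(a)))

def factor_into_elementary(a):
    """Return elementary matrices F_1, ..., F_m with a == F_1 F_2 ... F_m.

    Row operations E_1, ..., E_m satisfy E_m ... E_1 a = I, hence
    a = E_1^{-1} ... E_m^{-1}; we record F_i = E_i^{-1}.
    """
    n = len(a)
    m = [[Fraction(x) for x in row] for row in a]
    factors = []
    for col in range(n):
        # Type 3: bring a nonzero pivot into place (a swap is its own inverse).
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            e = identity(n)
            e[col], e[pivot] = e[pivot], e[col]
            factors.append(e)
        # Type 1: scale the pivot row by 1/c; the inverse scales by c.
        c = m[col][col]
        if c != 1:
            m[col] = [x / c for x in m[col]]
            e = identity(n)
            e[col][col] = c
            factors.append(e)
        # Type 2: subtract c * (pivot row) from row r; the inverse adds it back.
        for r in range(n):
            if r != col and m[r][col] != 0:
                c = m[r][col]
                m[r] = [x - c * y for x, y in zip(m[r], m[col])]
                e = identity(n)
                e[r][col] = c
                factors.append(e)
    return factors

A = [[0, 2, 1], [1, 1, 0], [3, 0, 4]]
factors = factor_into_elementary(A)
product = identity(3)
for f in factors:
    product = matmul(product, f)

det_product = Fraction(1)
for f in factors:
    det_product *= det(f)

print(product == A)          # True: the factors multiply back to A
print(det(A), det_product)   # -11 -11
```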

There is one more simple class of transformations for which the change-of-variable formula is easily established, namely the translations. These are the mappings of the form $G(\mathbf{u}) = \mathbf{u} + \mathbf{b}$ where $\mathbf{b}$ is a fixed vector. Indeed, we just make the substitution $x_j = u_j + b_j$, $dx_j = du_j$ in each variable separately to conclude that

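$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_{S-\mathbf{b}} f(\mathbf{u}+\mathbf{b})\,d^n\mathbf{u},$$

where $S - \mathbf{b} = \{\mathbf{x} - \mathbf{b} : \mathbf{x}\in S\}$.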

Combining this with Theorem 4.37, we see that if G(u) = Au + b, then

$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = |\det A|\int\cdots\int_{G^{-1}(S)} f(A\mathbf{u}+\mathbf{b})\,d^n\mathbf{u}. \tag{4.40}$$

In particular, by taking $f = 1$, we see that the n-dimensional volume of $S$ is $|\det A|$ times the n-dimensional volume of $G^{-1}(S)$.
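In the plane, with $S = G([0,1]^2)$ for $G(\mathbf{u}) = A\mathbf{u} + \mathbf{b}$, this says that the area of the parallelogram $S$ is $|\det A|$. A small Python check of this special case follows; it uses the shoelace formula for polygon area, which is our own choice of tool and not something used in the text, and the function names are hypothetical.

```python
from fractions import Fraction

def image_of_unit_square(a, b):
    """Vertices, in order, of G([0,1]^2) where G(u) = A u + b (A is 2x2, given by rows)."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [(a[0][0] * s + a[0][1] * t + b[0], a[1][0] * s + a[1][1] * t + b[1])
            for s, t in corners]

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order (either orientation)."""
    n = len(vertices)
    twice_signed = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                       - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice_signed) / 2

A = [[Fraction(2, 3), 2], [Fraction(-1, 3), 2]]   # det A = 2
b = (Fraction(1), Fraction(-5))                   # the translation does not change area
print(shoelace_area(image_of_unit_square(A, b)))  # 2
```

Because the absolute value is taken, a map with negative determinant (which reverses the vertex order) still gives area $|\det A|$.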
It is now easy to guess what the change-of-variable formula for a general invertible $C^1$ transformation must be. Indeed, suppose that $U$ and $V$ are open sets in $\mathbb{R}^n$, $G : U \to V$ is a one-to-one transformation of class $C^1$ whose derivative $DG(\mathbf{u})$ is invertible for all $\mathbf{u}\in U$, and $f$ is a continuous function on $V$. To relate the integral of $f$ over a measurable set $S\subset V$ to an integral of $f\circ G$ over $T = G^{-1}(S)$,

we think of the former as a sum of infinitesimal terms $f(\mathbf{x})\,d^n\mathbf{x}$, each of which is the value of $f$ at a point $\mathbf{x}$ multiplied by the volume $d^n\mathbf{x}$ of an infinitesimal region $dS$ located at $\mathbf{x}$. Under the transformation $\mathbf{x} = G(\mathbf{u})$, $f(\mathbf{x})$ becomes $f(G(\mathbf{u}))$, and the region $dS$ is the image under $G$ of another infinitesimal region $dT$ whose volume is $d^n\mathbf{u}$. But on the infinitesimal level, the differentiable map $G$ is the same as its linearization:

$$G(\mathbf{u} + d\mathbf{u}) = \mathbf{x} + DG(\mathbf{u})\,d\mathbf{u}.$$

Therefore, by (4.40), the elements of volume $d^n\mathbf{x}$ and $d^n\mathbf{u}$ are related by the formula $d^n\mathbf{x} = |\det DG(\mathbf{u})|\,d^n\mathbf{u}$. Putting this all together, we arrive at the main theorem.

4.41 Theorem. Given open sets $U$ and $V$ in $\mathbb{R}^n$, let $G : U \to V$ be a one-to-one transformation of class $C^1$ whose derivative $DG(\mathbf{u})$ is invertible for all $\mathbf{u}\in U$. Suppose that $T\subset U$ and $S\subset V$ are measurable sets such that $G(T) = S$. If $f$ is an integrable function on $S$, then $f\circ G$ is integrable on $T$, and

$$\int\cdots\int_S f(\mathbf{x})\,d^n\mathbf{x} = \int\cdots\int_T f(G(\mathbf{u}))\,|\det DG(\mathbf{u})|\,d^n\mathbf{u}. \tag{4.42}$$

Proof. We present a proof in Appendix B.5 (Theorem B.24), under the slightly stronger hypothesis that $f$ is continuous except on a set with zero content. The
key idea is explained in the preceding paragraph, but turning it into a solid proof
is a surprisingly laborious task. An interesting and quite different approach to the
problem can be found in Lax [14], [15]. It shifts the hard work to a different part
of the argument; in particular, it uses the notion of partition of unity developed in
Appendix B.7. □

Notice that the results derived earlier in this section are indeed special cases of
Theorem 4.41. If $G$ is a linear map, $G(\mathbf{u}) = A\mathbf{u}$, then $DG(\mathbf{u}) = A$ for all $\mathbf{u}$, so $|\det DG(\mathbf{u})| = |\det A|$ is a constant that can be brought outside the integral sign. And if $G$ is the polar coordinate map, $G(r,\theta) = (r\cos\theta, r\sin\theta)$, then $\det DG(r,\theta) = r$, so we recover (4.36).
Let us record the corresponding results for the standard "polar" coordinate systems in $\mathbb{R}^3$, shown in Figure 4.9. Cylindrical coordinates are just polar coordinates in the $xy$-plane with the $z$-coordinate added in,

$$G_{\mathrm{cyl}}(r,\theta,z) = (r\cos\theta,\ r\sin\theta,\ z).$$

It is easily verified that $\det DG_{\mathrm{cyl}}(r,\theta,z) = r$ again, so the formula for integration in cylindrical coordinates is

$$\iiint_S f(x,y,z)\,dx\,dy\,dz = \iiint_{G_{\mathrm{cyl}}^{-1}(S)} f(r\cos\theta, r\sin\theta, z)\,r\,dr\,d\theta\,dz. \tag{4.43}$$


Figure 4.9: Cylindrical coordinates (left) and spherical coordinates (right).

Spherical coordinates are given by

$$G_{\mathrm{sph}}(r,\varphi,\theta) = (r\sin\varphi\cos\theta,\ r\sin\varphi\sin\theta,\ r\cos\varphi).$$

Here $r$ is the distance from the origin, $\theta$ is the longitude, and $\varphi$ is the co-latitude (the angle from the positive $z$-axis). The reader may check that $\det DG_{\mathrm{sph}}(r,\varphi,\theta) = r^2\sin\varphi$ (Exercise 6c, §3.4), so the formula for integration in spherical coordinates is

$$\iiint_S f(x,y,z)\,dx\,dy\,dz = \iiint_{G_{\mathrm{sph}}^{-1}(S)} f(r\sin\varphi\cos\theta, r\sin\varphi\sin\theta, r\cos\varphi)\,r^2\sin\varphi\,dr\,d\varphi\,d\theta. \tag{4.44}$$
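The Jacobian determinants $r$ and $r^2\sin\varphi$ can be spot-checked numerically. This is a hedged sketch, not the book's method: it approximates each partial derivative by a central difference with an arbitrarily chosen step `h`, so it only agrees with the exact formulas to within a small tolerance. The function names are hypothetical.

```python
import math

def g_cyl(r, theta, z):
    """Cylindrical coordinates G_cyl(r, theta, z)."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def g_sph(r, phi, theta):
    """Spherical coordinates with phi = co-latitude and theta = longitude."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def jacobian_det(g, p, h=1e-6):
    """Approximate det DG(p) by central differences.

    Row i below holds the partial derivatives of G with respect to p[i], i.e. column i
    of DG; the resulting matrix is the transpose of DG, which has the same determinant.
    """
    rows = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        gp, gm = g(*plus), g(*minus)
        rows.append([(gp[k] - gm[k]) / (2 * h) for k in range(3)])
    return det3(rows)

r, phi, theta = 2.0, 0.7, 1.3
print(jacobian_det(g_sph, (r, phi, theta)), r ** 2 * math.sin(phi))
print(jacobian_det(g_cyl, (r, theta, 0.5)), r)
```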

We conclude with some examples.

Example 1. Find the volume and the centroid of the region S above the
surface $z = x^2 + y^2$ and below the plane $z = 4$. (See Figure 4.10.)
Solution. Because of the circular symmetry, it is most convenient to use
polar coordinates. The projection of $S$ onto the $xy$-plane is the disc of radius 2 about the origin, so the volume of $S$ is

$$V = \int_0^2\int_0^{2\pi}(4 - r^2)\,r\,d\theta\,dr = 2\pi\Big[2r^2 - \tfrac14 r^4\Big]_0^2 = 8\pi.$$
By symmetry, the centroid lies on the $z$-axis, and its $z$-coordinate is

$$\bar z = \frac{1}{V}\iiint_S z\,dV = \frac{1}{8\pi}\int_0^2\int_{r^2}^4\int_0^{2\pi} z\,r\,d\theta\,dz\,dr = \frac{1}{8\pi}\cdot\frac{64\pi}{3} = \frac83.$$
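As a rough numerical cross-check of Example 1 (volume $8\pi$, centroid height $8/3$), the Python sketch below applies a nested composite midpoint rule in cylindrical coordinates. The rule and the resolution `n = 400` are our own illustrative choices, and the helper names are hypothetical; the results agree with the exact values only approximately.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrate_over_S(weight, n=400):
    """Integral of weight(z) over S = {(x, y, z): x^2 + y^2 <= z <= 4}.

    Cylindrical coordinates: the Jacobian is r, and since nothing depends on
    theta, the theta-integral just contributes a factor 2*pi.
    """
    def inner(r):
        return midpoint(lambda z: weight(z) * r, r * r, 4.0, n)
    return 2 * math.pi * midpoint(inner, 0.0, 2.0, n)

volume = integrate_over_S(lambda z: 1.0)
z_bar = integrate_over_S(lambda z: z) / volume
print(volume / math.pi, z_bar)   # approximately 8 and 8/3
```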

FIGURE 4.10: The regions in Example 1 (left) and Example 2 (right).

Example 2. Find the volume of the "ice cream cone" T bounded below by
the cone $z = 2\sqrt{x^2 + y^2}$ and above by the sphere $x^2 + y^2 + z^2 = 1$. (See
Figure 4.10.)
Solution. In spherical coordinates $(r,\varphi,\theta)$, the equation of the cone is $\tan\varphi = \frac12$ and the equation of the sphere is $r = 1$. Hence the volume is

$$\int_0^1\int_0^{\tan^{-1}(1/2)}\int_0^{2\pi} r^2\sin\varphi\,d\theta\,d\varphi\,dr = (2\pi)\Big[-\cos\varphi\Big]_0^{\tan^{-1}(1/2)}\Big[\tfrac13 r^3\Big]_0^1 = \frac{2\pi}{3}\Big(1 - \frac{2}{\sqrt5}\Big).$$

This can also be done in cylindrical coordinates $(r,\theta,z)$ (note that the meaning of $r$ has changed here), in which the equation of the cone is $z = 2r$ and the equation of the sphere is $r^2 + z^2 = 1$. The projection of $T$ onto the $xy$-plane is the disc $r \le 1/\sqrt5$, so the volume is

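$$\int_0^{1/\sqrt5}\int_{2r}^{\sqrt{1-r^2}}\int_0^{2\pi} r\,d\theta\,dz\,dr = 2\pi\int_0^{1/\sqrt5}\Big(r\sqrt{1-r^2} - 2r^2\Big)\,dr,$$

which yields the same answer as before.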
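Both computations in Example 2 should give $\frac{2\pi}{3}(1 - 2/\sqrt5)$. The Python sketch below checks this numerically with a composite midpoint rule (our choice; any standard quadrature would do). In spherical coordinates the integrand $r^2\sin\varphi$ factors, so the triple integral becomes a product of one-dimensional integrals; in cylindrical coordinates the $z$- and $\theta$-integrals have been done by hand, leaving one integral in $r$.

```python
import math

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

alpha = math.atan(0.5)  # the cone is phi = arctan(1/2) in spherical coordinates

# Spherical: 2*pi * (integral of r^2 over [0, 1]) * (integral of sin(phi) over [0, alpha]).
v_spherical = 2 * math.pi * midpoint(lambda r: r * r, 0.0, 1.0) * midpoint(math.sin, 0.0, alpha)

# Cylindrical: z runs from the cone z = 2r up to the sphere z = sqrt(1 - r^2).
v_cylindrical = 2 * math.pi * midpoint(
    lambda r: r * (math.sqrt(1.0 - r * r) - 2.0 * r), 0.0, 1.0 / math.sqrt(5.0))

closed_form = 2 * math.pi / 3 * (1 - 2 / math.sqrt(5))
print(v_spherical, v_cylindrical, closed_form)
```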

Example 3. Let $P$ be the parallelogram bounded by the lines $x - y = 0$, $x + 2y = 0$, $x - y = 1$, and $x + 2y = 6$. (See Figure 4.11.) Compute $\iint_P xy\,dA$.
Solution. The equations of the bounding lines suggest the linear transformation $u = x - y$, $v = x + 2y$, which maps $P$ to the rectangle $0 \le u \le 1$, $0 \le v \le 6$. In the notation of Theorem 4.37, $P$ plays the role of $S$ and this

FIGURE 4.11: The regions in Example 3 (left) and Example 4 (right).

transformation is $G^{-1}$; its inverse $G$ is easily computed to be $x = \frac13(2u + v)$, $y = \frac13(v - u)$, whose determinant is $\frac13$. Thus, by Theorem 4.37,

$$\iint_P xy\,dA = \frac13\int_0^1\int_0^6 \Big(\frac{2u + v}{3}\Big)\Big(\frac{v - u}{3}\Big)\,dv\,du,$$

which is easily computed to be $\frac{77}{27}$.

Alternatively, one can readily calculate that the vertices of $P$ are $(0,0)$, $(\frac83,\frac53)$, $(\frac23,-\frac13)$, and $(2,2)$. It follows that $P$ is the image of the unit square $0 \le s, t \le 1$ under the transformation

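$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac23 & 2 \\ -\frac13 & 2 \end{pmatrix}\begin{pmatrix} s \\ t \end{pmatrix},$$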

where the columns of the 2x2 matrix are the vectors from the origin to the
two adjacent vertices. Taking this transformation as G in Theorem 4.37 yields

$$\iint_P xy\,dA = 2\int_0^1\int_0^1 \big(\tfrac23 s + 2t\big)\big(-\tfrac13 s + 2t\big)\,dt\,ds.$$

This integral is essentially the same as the preceding one; the variables (s,t)
and (u, v) are related by u = s, v = 6t.
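Both forms of the integral in Example 3 evaluate to $77/27$ (a direct antiderivative computation). Since the integrands are polynomials of degree at most 2 in each variable, a tensor-product 2-point Gauss-Legendre rule integrates them exactly, so the following Python sketch (our choice of quadrature, with hypothetical names) reproduces $77/27$ up to floating-point rounding in both coordinate systems.

```python
import math

def gauss2_2d(f, a, b, c, d):
    """Tensor-product 2-point Gauss-Legendre rule on [a, b] x [c, d].

    The 2-point rule is exact for polynomials of degree <= 3 in each variable.
    """
    nodes = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))  # both weights equal 1
    hx, hy = (b - a) / 2, (d - c) / 2
    return hx * hy * sum(f(a + hx * (p + 1), c + hy * (q + 1)) for p in nodes for q in nodes)

# (u, v): x = (2u + v)/3, y = (v - u)/3, Jacobian 1/3, rectangle [0, 1] x [0, 6].
in_uv = gauss2_2d(lambda u, v: ((2 * u + v) / 3) * ((v - u) / 3) / 3, 0, 1, 0, 6)

# (s, t): columns (2/3, -1/3) and (2, 2), Jacobian 2, unit square.
in_st = gauss2_2d(lambda s, t: (2 * s / 3 + 2 * t) * (-s / 3 + 2 * t) * 2, 0, 1, 0, 1)

print(in_uv, in_st, 77 / 27)
```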

Example 4. Let $R$ be the region in the first quadrant of the $xy$-plane bounded by the $x$-axis and the parabolas $x = 1 - \frac14 y^2$, $x = \frac14 y^2 - 1$, and $x = 4 - \frac1{16}y^2$. (See Figure 4.11.) What is $\iint_R xy\,dx\,dy$?
Solution. Refer back to Example 3 in §3.4: The region $R$ is the image of the rectangle $\{(u,v) : 1 \le u \le 2,\ 0 \le v \le 1\}$ under the map $G(u,v) = (u^2 - v^2, 2uv)$. We have $DG(u,v) = \begin{pmatrix} 2u & -2v \\ 2v & 2u\end{pmatrix}$ and hence $\det DG(u,v) = 4(u^2 + v^2)$. Thus, the substitutions $x = u^2 - v^2$, $y = 2uv$ give

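$$\begin{aligned}\iint_R xy\,dx\,dy &= \int_0^1\int_1^2 (u^2 - v^2)(2uv)\,4(u^2 + v^2)\,du\,dv = \int_0^1\int_1^2 8(u^5v - uv^5)\,du\,dv\\ &= \int_0^1 \Big(\tfrac43 u^6 v - 4u^2v^5\Big)\Big|_{u=1}^{u=2}\,dv = \int_0^1 (84v - 12v^5)\,dv = (42v^2 - 2v^6)\Big|_0^1 = 40.\end{aligned}$$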
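The value 40 in Example 4 can be reproduced numerically in the $(u,v)$ variables. In this hedged Python sketch (quadrature choice and names are ours), the pulled-back integrand $xy\,|\det DG| = 8uv(u^4 - v^4)$ has degree 5 in each variable, and a tensor-product 3-point Gauss-Legendre rule is exact for that degree, so the output matches 40 up to rounding.

```python
import math

def gauss3_2d(f, a, b, c, d):
    """Tensor-product 3-point Gauss-Legendre rule on [a, b] x [c, d].

    Exact (up to rounding) for polynomials of degree <= 5 in each variable.
    """
    nodes = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
    weights = (5 / 9, 8 / 9, 5 / 9)
    hx, hy = (b - a) / 2, (d - c) / 2
    total = 0.0
    for p, wp in zip(nodes, weights):
        for q, wq in zip(nodes, weights):
            total += wp * wq * f(a + hx * (p + 1), c + hy * (q + 1))
    return hx * hy * total

def integrand(u, v):
    x, y = u * u - v * v, 2 * u * v   # G(u, v) = (u^2 - v^2, 2uv)
    jac = 4 * (u * u + v * v)          # |det DG(u, v)|
    return x * y * jac

result = gauss3_2d(integrand, 1.0, 2.0, 0.0, 1.0)   # u in [1, 2], v in [0, 1]
print(result)
```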

EXERCISES

1. Find the area of the region inside the cardioid $r = 1 + \cos\theta$ (polar coordinates).

2. Find the centroid of the half-cone $\sqrt{x^2 + y^2} \le z \le 1$, $x \ge 0$.
3. Find the volume of the region inside both the sphere $x^2 + y^2 + z^2 = 4$ and the cylinder $x^2 + y^2 = 1$.
4. Find the volume of the region above the $xy$-plane, below the cone $z = 2 - \sqrt{x^2 + y^2}$, and inside the cylinder $(x - 1)^2 + y^2 = 1$.
5. Find the mass of a right circular cylinder of base radius R and height h if the
mass density is c times the distance from the bottom of the cylinder.

6. Find the volume of the portion of the sphere $x^2 + y^2 + z^2 = 4$ lying above the plane $z = 1$.

7. Find the mass of a ball of radius R if the mass density is c times the distance
from the boundary of the ball.

8. Find the centroid of the portion of the ball $x^2 + y^2 + z^2 \le 1$ lying in the first octant ($x, y, z \ge 0$).

9. Find the centroid of the parallelogram bounded by the lines $x - 3y = 0$, $2x + y = 0$, $x - 3y = 10$, and $2x + y = 15$.

10. Calculate $\iint_S (x + y)^4(x - y)^{-5}\,dA$ where $S$ is the square $-1 \le x + y \le 1$, $1 \le x - y \le 3$.
11. Find the volume of the ellipsoid $(x + 2y)^2 + (x - 2y + z)^2 + 3z^2 = 1$.
12. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 4$, and the lines $y = x$, $y = 4x$. Find the area and the centroid of $S$ by using the transformation $u = xy$, $v = y/x$.

13. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 3$, $x^2 - y^2 = 1$, and $x^2 - y^2 = 4$. Compute $\iint_S (x^2 + y^2)\,dA$. (Hint: Let $G(x,y) = (xy, x^2 - y^2)$. What is $|\det DG|$?)

14. Use the transformation $x = u - uv$, $y = uv$ to evaluate $\iint_S (x + y)^{-1}\,dA$ where $S$ is the region in the first quadrant between the lines $x + y = 1$ and $x + y = 4$.

15. Use "double polar coordinates" $x = r\cos\theta$, $y = r\sin\theta$, $z = s\cos\varphi$, $w = s\sin\varphi$ in $\mathbb{R}^4$ to compute the 4-dimensional volume of the ball $x^2 + y^2 + z^2 + w^2 \le R^2$.

4.5 Functions Defined by Integrals

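Suppose $f(\mathbf{x},\mathbf{y})$ is a function of $\mathbf{x}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$. If $f(\mathbf{x},\mathbf{y})$ is integrable over the set $S\subset\mathbb{R}^n$ as a function of $\mathbf{y}$ for each fixed $\mathbf{x}$, we can form a new function of $\mathbf{x}$ by integrating out $\mathbf{y}$: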

$$F(\mathbf{x}) = \int\cdots\int_S f(\mathbf{x},\mathbf{y})\,d^n\mathbf{y}. \tag{4.45}$$

The question then arises as to how properties of $f$ such as continuity and differentiability relate to the corresponding properties of $F$.
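Perhaps the most basic question of this sort is the following. Suppose that

$$\lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x},\mathbf{y}) = g(\mathbf{y}) \qquad (\mathbf{y}\in S);$$

is it true that

$$\lim_{\mathbf{x}\to\mathbf{a}} F(\mathbf{x}) = \int\cdots\int_S g(\mathbf{y})\,d^n\mathbf{y}?$$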
In other words, can one interchange the operations of integrating with respect to y
and taking a limit with respect to x? Is the limit of the integral equal to the integral
of the limit? In general, the answer is no.

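Example 1. Let

$$f(x,y) = \frac{x^2 y}{(x^2 + y^2)^2}.$$

Evidently $\lim_{x\to0} f(x,y) = 0$ for each $y$ (although for different reasons when $y = 0$ or when $y \ne 0$). However, $\lim_{x\to0}\int_0^1 f(x,y)\,dy \ne 0$; in fact,

$$\int_0^1 \frac{x^2 y}{(x^2 + y^2)^2}\,dy = -\frac{x^2}{2(x^2 + y^2)}\bigg|_{y=0}^{y=1} = \frac{1}{2(1 + x^2)},$$

which tends to $\frac12$ as $x\to0$.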
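Example 1 can also be watched numerically. The Python sketch below (Simpson's rule is our choice, and the names are hypothetical) shows that for small $x$ the integral $\int_0^1 f(x,y)\,dy$ stays close to $\frac12$, while at a fixed $y$, here $y = 0.5$, the value $f(x,y)$ tends to 0. The step count is chosen large enough to resolve the peak of $f(x,\cdot)$, which sits near $y \approx x/\sqrt3$ and narrows as $x\to0$.

```python
def f(x, y):
    return x * x * y / (x * x + y * y) ** 2

def simpson(g, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(g(a + i * h) for i in range(1, n, 2))
    even = sum(g(a + i * h) for i in range(2, n, 2))
    return (g(a) + g(b) + 4 * odd + 2 * even) * h / 3

for x in (0.1, 0.01, 0.001):
    integral = simpson(lambda y: f(x, y), 0.0, 1.0, 100_000)
    # integral ~ 1/(2(1 + x^2)) -> 1/2, while f(x, 0.5) -> 0
    print(x, integral, 1 / (2 * (1 + x * x)), f(x, 0.5))
```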

4.5. Functions Defined by Integrals 189

Notice, however, that the $f$ in Example 1 is discontinuous, and indeed unbounded, at the origin; for instance, $f(x,x) = 1/(4x) \to \infty$ as $x \to 0^+$. ($f(x_0,y)$ is bounded as a function of $y$ for each fixed $x_0$, but its maximum value tends to infinity as $x_0 \to 0$.) If we impose some stronger conditions on $f$, we can obtain
an affirmative result. The following theorem is not the last word in the subject (see
Corollary 4.53), but it suffices for many purposes.

4.46 Theorem. Suppose $S$ and $T$ are compact subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, and $S$ is measurable. If $f(\mathbf{x},\mathbf{y})$ is continuous on the set $T\times S = \{(\mathbf{x},\mathbf{y}) : \mathbf{x}\in T,\ \mathbf{y}\in S\}$, then the function $F$ defined by (4.45) is continuous on $T$.

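Proof. Given $\epsilon > 0$, we wish to find $\delta > 0$ so that $|F(\mathbf{x}) - F(\mathbf{x}')| < \epsilon$ whenever $|\mathbf{x} - \mathbf{x}'| < \delta$. Let $|S|$ denote the n-dimensional volume of $S$. Since $T\times S$ is compact, $f$ is uniformly continuous on it by Theorem 1.33, so there is a $\delta > 0$ so that $|f(\mathbf{x},\mathbf{y}) - f(\mathbf{x}',\mathbf{y})| < \epsilon/|S|$ whenever $\mathbf{y}\in S$, $\mathbf{x},\mathbf{x}'\in T$, and $|\mathbf{x} - \mathbf{x}'| < \delta$. But then

$$|F(\mathbf{x}) - F(\mathbf{x}')| \le \int\cdots\int_S |f(\mathbf{x},\mathbf{y}) - f(\mathbf{x}',\mathbf{y})|\,d^n\mathbf{y} \le \int\cdots\int_S \frac{\epsilon}{|S|}\,d^n\mathbf{y} = \epsilon,$$

and we are done. □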

Remark. In the statement of Theorem 4.46 we could assume that $T$ is open rather than compact. Indeed, every point $\mathbf{x}$ in an open set $T$ is the center of a closed ball $B$ that is contained in $T$. Since $B$ is compact, the preceding argument shows that $F$ is continuous on $B$, and hence $F$ is continuous at every $\mathbf{x}\in T$.

A related question concerns differentiability. Suppose that $f$ is differentiable as a function of $\mathbf{x}$ for each $\mathbf{y}\in S$; is it true that $F$ is differentiable in $\mathbf{x}$ and that its partial derivatives $\partial_{x_j}F$ are the integrals of the derivatives $\partial_{x_j}f$? In other words, is the integral of the derivative equal to the derivative of the integral? This is another question about the interchange of limits and integrals. Indeed, it is always true that the finite difference $F(\mathbf{x} + \mathbf{h}) - F(\mathbf{x})$ is the integral of $f(\mathbf{x} + \mathbf{h},\mathbf{y}) - f(\mathbf{x},\mathbf{y})$, simply because integration is a linear operation, and the question is what happens in the limit as $\mathbf{h}\to\mathbf{0}$. As in Example 1, things can go wrong; see Exercise 1. Our main positive result is as follows.

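4.47 Theorem. Suppose $S\subset\mathbb{R}^n$ is compact and measurable, and $T\subset\mathbb{R}^m$ is open. If $f$ is a continuous function on $T\times S$ that is of class $C^1$ as a function of $\mathbf{x}\in T$ for each $\mathbf{y}\in S$, then the function $F$ defined by (4.45) is of class $C^1$ on $T$, and

$$\frac{\partial F}{\partial x_j}(\mathbf{x}) = \int\cdots\int_S \frac{\partial f}{\partial x_j}(\mathbf{x},\mathbf{y})\,d^n\mathbf{y} \qquad (\mathbf{x}\in T). \tag{4.48}$$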


Proof. Given a point $\mathbf{x}_0\in T$, choose $r > 0$ small enough so that $\mathbf{x}\in T$ whenever $|\mathbf{x} - \mathbf{x}_0| < 2r$. We shall show that $F$ is of class $C^1$ on $B(r,\mathbf{x}_0)$ and prove (4.48) for $\mathbf{x}\in B(r,\mathbf{x}_0)$; since $\mathbf{x}_0$ is an arbitrary point in $T$, this will establish the theorem. For the purpose of computing $\partial_{x_j}F$, the other variables $x_k$ ($k\ne j$) play no role, so we may assume that $m = 1$. In fact, in order to simplify the notation a bit, we shall also assume that $n = 1$; the proof for general $n$ is exactly the same. Accordingly, we write $x$ and $y$ instead of $\mathbf{x}$ and $\mathbf{y}$ henceforth.
For $0 < |h| < r$ and $|x - x_0| < r$, we consider the difference quotient

$$\frac{F(x + h) - F(x)}{h} = \int_S \frac{f(x + h,y) - f(x,y)}{h}\,dy.$$
By the mean value theorem, we have $f(x + h,y) - f(x,y) = h\,\partial_x f(x + th,y)$, where $t$ is some number between 0 and 1 depending on $x$, $h$, and $y$. Hence,

$$\frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x,y)\,dy = \int_S \big[\partial_x f(x + th,y) - \partial_x f(x,y)\big]\,dy. \tag{4.49}$$

The argument now proceeds as in the proof of Theorem 4.46. Since $\partial_x f$ is continuous on the compact set $B(r,x_0)\times S$, it is uniformly continuous there by Theorem 1.33. Thus, given $\epsilon > 0$, we can find $\delta > 0$ so that the integrand on the right of (4.49) is less than $\epsilon/|S|$ for all $y\in S$, $x\in B(r,x_0)$, and $t\in(0,1)$, whenever $|h| < \delta$. It follows that

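$$\left|\frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x,y)\,dy\right| \le \int_S \frac{\epsilon}{|S|}\,dy = \epsilon \qquad\text{for } |h| < \delta,$$

and hence that

$$\lim_{h\to0}\frac{F(x + h) - F(x)}{h} = \int_S \partial_x f(x,y)\,dy,$$

as claimed. □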

Example 2. Let $F(x) = \int_0^\pi y^{-1}e^{xy}\sin y\,dy$. This integral cannot be evaluated in elementary terms; however, we have $F'(x) = \int_0^\pi e^{xy}\sin y\,dy$, which can be evaluated by two integrations by parts. The result is that $F'(x) = (e^{\pi x} + 1)/(x^2 + 1)$.
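Example 2 is a convenient test of Theorem 4.47. The Python sketch below (an illustration under our own choices of Simpson's rule, grid size, and difference step `h`; names are hypothetical) compares three quantities: a central difference quotient of the numerically computed $F$, the integral of $\partial_x$ of the integrand, and the closed form $(e^{\pi x} + 1)/(x^2 + 1)$. The integrand $y^{-1}e^{xy}\sin y$ is given its limiting value 1 at $y = 0$.

```python
import math

def integrand(x, y):
    """y^{-1} e^{xy} sin y, extended by its limit 1 at y = 0."""
    if y == 0.0:
        return 1.0
    return math.exp(x * y) * math.sin(y) / y

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(g(a + i * h) for i in range(1, n, 2))
    even = sum(g(a + i * h) for i in range(2, n, 2))
    return (g(a) + g(b) + 4 * odd + 2 * even) * h / 3

def F(x):
    return simpson(lambda y: integrand(x, y), 0.0, math.pi)

def F_prime_under_integral(x):
    """Differentiate under the integral sign, as Theorem 4.47 permits."""
    return simpson(lambda y: math.exp(x * y) * math.sin(y), 0.0, math.pi)

def F_prime_closed_form(x):
    return (math.exp(math.pi * x) + 1) / (x * x + 1)

h = 1e-4
for x in (-1.0, 0.0, 0.5):
    finite_difference = (F(x + h) - F(x - h)) / (2 * h)
    print(x, finite_difference, F_prime_under_integral(x), F_prime_closed_form(x))
```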

Situations often occur in which the variable $x$ occurs in the limits of integration as well as the integrand. For simplicity we consider the case where $x$ and $y$ are scalar variables:

$$F(x) = \int_a^{\varphi(x)} f(x,y)\,dy. \tag{4.50}$$


We suppose that $f$ is continuous in $x$ and $y$ and of class $C^1$ in $x$ for each $y$, and that $\varphi$ is of class $C^1$. If $f$ does not depend on $x$, the derivative of $F$ can be computed by the fundamental theorem of calculus together with the chain rule:

$$\frac{d}{dx}\int_a^{\varphi(x)} f(y)\,dy = f(\varphi(x))\,\varphi'(x).$$

For the more general case (4.50), we can differentiate F by combining this result
with Theorem 4.47 according to the recipe in Exercise 7 of §2.3: Differentiate with
respect to each x in (4.50) in turn while treating the others as constants, and add
the results. The upshot is that

$$F'(x) = f(x,\varphi(x))\,\varphi'(x) + \int_a^{\varphi(x)} \frac{\partial f}{\partial x}(x,y)\,dy. \tag{4.51}$$

EXAMPLE 3. Given a continuous function $g$ on $\mathbb{R}$, let

$$h(x) = \int_0^x (x - y)\,g(y)\,dy.$$

Then

$$h'(x) = (x - x)\,g(x) + \int_0^x g(y)\,dy = \int_0^x g(y)\,dy,$$

and hence $h''(x) = g(x)$. (Cf. Exercise 7 in §4.3, where this result is approached from a different angle.)
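A quick numerical illustration of Example 3, as a sketch under our own assumptions: we take the test case $g = \cos$, for which $h(x) = 1 - \cos x$, compute $h$ by Simpson's rule, and approximate $h''$ by a second difference with an arbitrarily chosen step. The helper name `h_of` is hypothetical.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(g(a + i * h) for i in range(1, n, 2))
    even = sum(g(a + i * h) for i in range(2, n, 2))
    return (g(a) + g(b) + 4 * odd + 2 * even) * h / 3

def h_of(g, x):
    """h(x) = integral from 0 to x of (x - y) g(y) dy, computed numerically."""
    return simpson(lambda y: (x - y) * g(y), 0.0, x)

g = math.cos        # test case: then h(x) = 1 - cos x exactly
x, step = 1.2, 1e-3
second_difference = (h_of(g, x + step) - 2 * h_of(g, x) + h_of(g, x - step)) / step ** 2
print(h_of(g, x), 1 - math.cos(x))
print(second_difference, g(x))   # h''(x) should be close to g(x)
```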

The hypotheses of Theorems 4.46 and 4.47 can be weakened considerably, but
only at the cost of a more intricate proof. More sophisticated theories of integration (see §4.8) furnish a powerful theorem, the so-called dominated convergence
theorem, that generally provides the sharpest results in these situations. The full
statement of this theorem requires more background than we have available here,
but its restriction to the context of Riemann integrable functions is the following
result, in which the crucial condition is the existence of the uniform bound C.

4.52 Theorem (The Bounded Convergence Theorem). Let $S$ be a measurable subset of $\mathbb{R}^n$ and $\{f_j\}$ a sequence of integrable functions on $S$. Suppose that $f_j(\mathbf{y}) \to f(\mathbf{y})$ for each $\mathbf{y}\in S$, where $f$ is an integrable function on $S$, and that there is a constant $C$ such that $|f_j(\mathbf{y})| \le C$ for all $j$ and all $\mathbf{y}\in S$. Then

$$\lim_{j\to\infty}\int\cdots\int_S f_j(\mathbf{y})\,d^n\mathbf{y} = \int\cdots\int_S f(\mathbf{y})\,d^n\mathbf{y}.$$

An elementary (but not simple) proof for the case where $S$ is an interval in $\mathbb{R}$
can be found in Lewin [17]. The full dominated convergence theorem can be found
in Bear [3, p. 68], DePree and Swartz [5, p. 194], Jones [9, p. 133], and Rudin [18,
p. 321].
Theorem 4.52 implies the following improvements on Theorems 4.46 and 4.47.

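4.53 Corollary. Let $S$ be a measurable subset of $\mathbb{R}^n$ and $T$ a subset of $\mathbb{R}^m$. Suppose $f(\mathbf{x},\mathbf{y})$ is a function on $T\times S$ that is integrable as a function of $\mathbf{y}\in S$ for each $\mathbf{x}\in T$, and let $F$ be defined by (4.45).
a. If $f(\mathbf{x},\mathbf{y})$ is continuous as a function of $\mathbf{x}\in T$ for each $\mathbf{y}\in S$, and there is a constant $C$ such that $|f(\mathbf{x},\mathbf{y})| \le C$ for all $\mathbf{x}\in T$ and $\mathbf{y}\in S$, then $F$ is continuous on $T$.
b. Suppose $T$ is open. If $f(\mathbf{x},\mathbf{y})$ is of class $C^1$ as a function of $\mathbf{x}\in T$ for each $\mathbf{y}\in S$, and there is a constant $C$ such that $|\nabla_{\mathbf{x}} f(\mathbf{x},\mathbf{y})| \le C$ for all $\mathbf{x}\in T$ and $\mathbf{y}\in S$, then $F$ is of class $C^1$ on $T$ and (4.48) holds.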

Proof. To prove part (a), by Theorem 1.15 it is enough to show that $F(\mathbf{x}_j) \to F(\mathbf{x})$ whenever $\{\mathbf{x}_j\}$ is a sequence in $T$ converging to $\mathbf{x}\in T$. This follows by applying the bounded convergence theorem to the sequence of functions $f_j(\mathbf{y}) = f(\mathbf{x}_j,\mathbf{y})$. Similarly, part (b) is proved by applying the bounded convergence theorem to the sequence of difference quotients with increments $\mathbf{h}_j$, where $\{\mathbf{h}_j\}$ is a sequence tending to zero along one of the coordinate axes. The uniform bound on these quotients is obtained by applying the mean value theorem as in the proof of Theorem 4.47; details are left as Exercise 8. □

EXERCISES

1. Let $f(x,y) = x^3y^{-2}e^{-x^2/y}$ if $y > 0$, $f(x,y) = 0$ if $y \le 0$.

a. Show that $f(x,y)$ is of class $C^1$ as a function of $x$ for each fixed $y$ and as a function of $y$ for each fixed $x$, but that $f$ is unbounded in any neighborhood of the origin. (For the smoothness in $y$, cf. Exercise 9 in §2.1.)
b. Let $F(x) = \int_0^1 f(x,y)\,dy$. Show that $F(x) = xe^{-x^2}$ and hence that $F'(0) = 1$, but that $\int_0^1 \partial_x f(0,y)\,dy = 0$.
2. Compute F'(x) for the functions F(x) defined for x > 0 by the following
formulas. Your answers should not contain integral signs.
a. F(ar) = /01log(l+^)dy.
b. F(x)~ffy-lco8(xy2)dy.
c. F(x) = ffxy~1exydy.
4.6. Improper Integrals 193

3. Given a continuous function $g$ on $\mathbb{R}$, let $h(x) = \int_0^x (x - y)e^{x-y}g(y)\,dy$. Show that $h'' - 2h' + h = g$.
4. Given a continuous function $g$ on $\mathbb{R}$, let $h(x) = \frac12\int_0^x \sin 2(x - y)\,g(y)\,dy$. Show that $h'' + 4h = g$.
5. Given $F(x) = \int_{\psi(x)}^{\varphi(x)} f(x,y)\,dy$, find $F'(x)$, assuming suitable smoothness conditions on $\psi$, $\varphi$, and $f$.
6. (How to compress $n$ antidifferentiations into one) Let $f$ be a continuous function on $\mathbb{R}$. For $n \ge 1$, let

$$f^{[n]}(x) = \frac{1}{(n-1)!}\int_0^x (x - y)^{n-1} f(y)\,dy.$$

Show that $(f^{[n]})' = f^{[n-1]}$ for $n > 1$ and conclude that $f^{[n]}$ is an $n$th-order antiderivative of $f$.

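7. Let $f$ be any continuous function on $[0,1]$. For $x\in\mathbb{R}$ and $t > 0$, let

$$u(x,t) = t^{-1/2}\int_0^1 e^{-(x-y)^2/4t} f(y)\,dy, \qquad v(x,t) = \int_0^1 \frac{t\,f(y)}{(x - y)^2 + t^2}\,dy.$$

a. Show that $\partial_t u = \partial_x^2 u$.
b. Show that $\partial_x^2 v + \partial_t^2 v = 0$.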
8. Complete the deduction of Corollary 4.53b from the bounded convergence theorem.

4.6 Improper Integrals

In this section we return to integration in one variable. The Riemann theory of integration pertains to bounded functions on finite intervals, but there are many situations where one needs to integrate functions over infinite intervals (i.e., half-lines
or the whole line) or functions that are unbounded near some point in the interval
of integration. Such integrals are called improper, and they are defined in terms of
limits of ordinary integrals. To do a really good job with improper integrals, one
should adopt the more powerful Lebesgue theory of integration, sketched in §4.8.
(Even then, additional limiting procedures are needed to handle integrals such as
the one in Example 3 below.) Here we content ourselves with a short discussion of
useful results about simple types of improper integrals.
The two most basic types of improper integrals are as follows:

I. $\int_a^\infty f(x)\,dx$, where $f$ is integrable over every finite subinterval $[a,b]$.

II. $\int_a^b f(x)\,dx$, where $f$ is integrable over $[c,b]$ for every $c > a$ but is unbounded near $x = a$.

We study these two types in turn and then consider integrals of more complicated
sorts that can be obtained by combining them.

Improper Integrals of Type I. In this subsection, all functions in question are assumed to be defined on $[a,\infty)$ and integrable on $[a,b]$ for every $b > a$.
The definition of the improper integral is

$$\int_a^\infty f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx.$$

More precisely, the integral $\int_a^\infty f(x)\,dx$ is said to converge if the limit on the right exists, in which case its value is defined to be that limit; otherwise the integral is said to diverge, and it is not assigned a numerical value. (However, we may say that $\int_a^\infty f(x)\,dx = \infty$ if $\int_a^b f(x)\,dx$ grows without bound as $b\to\infty$.)

Example 1.
a. $\int_0^\infty e^{-x}\,dx = \lim_{b\to\infty}\big[-e^{-x}\big]_0^b = 1$, since $\lim_{b\to\infty} e^{-b} = 0$.
b. $\int_0^\infty \cos x\,dx$ diverges, since $\lim_{b\to\infty}\sin b$ does not exist.

Our main concern here is not with the evaluation of $\int_a^\infty f(x)\,dx$ but with the more basic question of whether or not it converges. At the outset, we make one simple but useful remark: If $c > a$, the convergence of $\int_a^\infty f(x)\,dx$ is equivalent to the convergence of $\int_c^\infty f(x)\,dx$, the difference between the two being the ordinary integral $\int_a^c f(x)\,dx$. Thus, the convergence of $\int_a^\infty f(x)\,dx$ depends only on the behavior of $f(x)$ as $x\to\infty$, not on its behavior on a finite interval $[a,c]$.
We first consider the situation when $f \ge 0$. In this case, the integral $\int_a^b f(x)\,dx$ increases along with the upper endpoint $b$, so we can exploit the following variant of the monotone sequence theorem.

4.54 Lemma. If $\varphi$ is a bounded increasing function on $[a,\infty)$, then $\lim_{x\to\infty}\varphi(x)$ exists and equals $\sup\{\varphi(x) : x \ge a\}$.

Proof. The proof is left to the reader (Exercise 7); it is essentially identical to the
proof of the monotone sequence theorem (1.16). □

By applying Lemma 4.54 to the function $\varphi(x) = \int_a^x f(t)\,dt$, we see that the integral $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^b f(x)\,dx$ remains bounded as $b\to\infty$. This immediately leads to the basic comparison test for convergence.

4.55 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all sufficiently large $x$. If $\int_a^\infty g(x)\,dx$ converges, so does $\int_a^\infty f(x)\,dx$. If $\int_a^\infty f(x)\,dx$ diverges, so does $\int_a^\infty g(x)\,dx$.

Proof. By the remarks following the definition of convergence, we may assume that $0 \le f(x) \le g(x)$ for all $x \ge a$. If $\int_a^\infty g(x)\,dx$ converges, it provides an upper bound for $\varphi(b) = \int_a^b f(x)\,dx$ as $b\to\infty$:

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx \le \int_a^\infty g(x)\,dx.$$

The convergence of $\int_a^\infty f(x)\,dx$ then follows from Lemma 4.54. The second assertion is equivalent to the first one. □

The following variant of Theorem 4.55 is sometimes easier to apply:

4.56 Corollary. Suppose $f \ge 0$, $g > 0$, and $f(x)/g(x)\to l$ as $x\to\infty$. If $0 < l < \infty$, then $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$ are both convergent or both divergent. If $l = 0$, the convergence of $\int_a^\infty g(x)\,dx$ implies the convergence of $\int_a^\infty f(x)\,dx$. If $l = \infty$, the divergence of $\int_a^\infty g(x)\,dx$ implies the divergence of $\int_a^\infty f(x)\,dx$.

Proof. If $0 < l < \infty$, the fact that $f(x)/g(x)\to l$ yields the estimates $f(x) \le 2l\,g(x)$ and $f(x) \ge \frac12 l\,g(x)$ for sufficiently large $x$, so the first assertion follows by comparing $f$ to a multiple of $g$. If $l = 0$ (resp. $l = \infty$), we have $f(x) \le g(x)$ (resp. $f(x) \ge g(x)$) for sufficiently large $x$, whence the other assertions follow. □

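The functions most often used for comparison in Theorem 4.55 and Corollary 4.56 are the power functions $x^{-p}$. Taking $a = 1$ for convenience, for $p \ne 1$ we have

$$\int_1^b \frac{dx}{x^p} = \frac{b^{1-p} - 1}{1 - p} \to \begin{cases} \infty & \text{if } p < 1,\\ \dfrac{1}{p - 1} & \text{if } p > 1,\end{cases}$$

and $\int_1^b x^{-1}\,dx = \log b \to \infty$. In short, $\int_1^\infty x^{-p}\,dx$ converges if and only if $p > 1$.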
Combining this fact with Theorem 4.55, we obtain the following handy rule:

4.57 Corollary. If $0 \le f(x) \le Cx^{-p}$ for all sufficiently large $x$, where $p > 1$, then $\int_a^\infty f(x)\,dx$ converges. If $f(x) \ge cx^{-1}$ ($c > 0$) for all sufficiently large $x$, then $\int_a^\infty f(x)\,dx$ diverges.

Example 2. The integral $\int_0^\infty [(2x + 14)/(x^3 + 1)]\,dx$ converges, because

$$\frac{2x + 14}{x^3 + 1} < \frac{4x}{x^3} = \frac{4}{x^2} \qquad\text{for } x > 7.$$

Alternatively, we could observe that

$$\frac{2x + 14}{x^3 + 1}\bigg/\frac{1}{x^2} \to 2 \qquad\text{as } x\to\infty$$

and use Corollary 4.56 with $g(x) = x^{-2}$ to establish the convergence of the integral over, say, $[1,\infty)$. (The integral over $[0,1]$ is proper.) Note that we are not comparing $\int_0^\infty [(2x + 14)/(x^3 + 1)]\,dx$ to $\int_0^\infty x^{-2}\,dx$, which presents an additional difficulty because $x^{-2}$ is unbounded at $x = 0$; the comparison of $(2x + 14)/(x^3 + 1)$ with $x^{-2}$ is significant only for large $x$.

It should be noted that the power functions $x^{-p}$ do not quite tell the whole story. There are functions whose rate of decay at infinity is faster than $x^{-1}$ but slower than $x^{-p}$ for $p > 1$, and their integrals may be either convergent or divergent; see Exercises 4 and 5.
Next we remove the assumption that $f$ is nonnegative, and with a view toward future applications, we shall allow $f$ to be complex-valued. The question of convergence can often be reduced to the case where $f \ge 0$ via the following result.

4.58 Theorem. If $\int_a^\infty |f(x)|\,dx$ converges, then $\int_a^\infty f(x)\,dx$ converges.

Proof. First suppose $f$ is real-valued. Let $f^+(x) = \max[f(x),0]$ and $f^-(x) = \max[-f(x),0]$. Then we have $0 \le f^+(x) \le |f(x)|$ and $0 \le f^-(x) \le |f(x)|$, so $\int_a^\infty f^+(x)\,dx$ and $\int_a^\infty f^-(x)\,dx$ converge by Theorem 4.55. But $f = f^+ - f^-$, so $\int_a^\infty f(x)\,dx$ converges also.
If $f$ is complex-valued, we have $|\mathrm{Re}\,f(x)| \le |f(x)|$ and $|\mathrm{Im}\,f(x)| \le |f(x)|$, so the convergence of $\int_a^\infty |f(x)|\,dx$ implies the convergence of $\int_a^\infty |\mathrm{Re}\,f(x)|\,dx$ and $\int_a^\infty |\mathrm{Im}\,f(x)|\,dx$ and hence (by the preceding argument) the convergence of the real and imaginary parts of $\int_a^\infty f(x)\,dx$. □

The integral $\int_a^\infty f(x)\,dx$ is called absolutely convergent if $\int_a^\infty |f(x)|\,dx$ converges. Theorem 4.55 and its corollaries can be used to test for absolute convergence, by applying them to $|f|$. It is possible, however, for $\int_a^\infty f(x)\,dx$ to converge even when $\int_a^\infty |f(x)|\,dx$ diverges because of cancellation effects between positive and negative values. Here is an important example.

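Example 3. The integral $\int_1^\infty \frac{\sin x}{x}\,dx$ is not absolutely convergent (Exercise 8), but it is convergent. To see this, integrate by parts:

$$\int_1^b \frac{\sin x}{x}\,dx = \Big[\frac{-\cos x}{x}\Big]_1^b - \int_1^b \frac{\cos x}{x^2}\,dx.$$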

Now, $\int_1^\infty |x^{-2}\cos x|\,dx$ converges by Corollary 4.57 since $|x^{-2}\cos x| \le x^{-2}$, so by Theorem 4.58 the integral on the right approaches a finite limit as $b\to\infty$; moreover, since $|b^{-1}\cos b| \le b^{-1}\to 0$, so does the other term. Hence $\lim_{b\to\infty}\int_1^b x^{-1}\sin x\,dx$ exists, as claimed.
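The integration-by-parts argument can be watched numerically. The Python sketch below is a hedged illustration (Simpson's rule and the step counts are our own choices; the helper names are hypothetical): it computes $\int_1^b (\sin x)/x\,dx$ both directly and through the integrated-by-parts form, for increasing $b$. The reference limit $\pi/2 - \mathrm{Si}(1)$ relies on the standard values $\int_0^\infty (\sin x)/x\,dx = \pi/2$ and $\mathrm{Si}(1) \approx 0.9460830704$, which are quoted here rather than derived in the text. The remaining gap at $b = 1000$ is about $(\cos b)/b$, in line with the estimate above.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(g(a + i * h) for i in range(1, n, 2))
    even = sum(g(a + i * h) for i in range(2, n, 2))
    return (g(a) + g(b) + 4 * odd + 2 * even) * h / 3

def direct(b, n):
    """Integral from 1 to b of sin(x)/x, by Simpson's rule."""
    return simpson(lambda x: math.sin(x) / x, 1.0, b, n)

def by_parts(b, n):
    """Same integral via cos(1) - cos(b)/b - (integral from 1 to b of cos(x)/x^2)."""
    return math.cos(1.0) - math.cos(b) / b - simpson(lambda x: math.cos(x) / x ** 2, 1.0, b, n)

# Assumed reference value (quoted, not derived here): pi/2 - Si(1).
LIMIT = math.pi / 2 - 0.9460830703671830

for b in (10.0, 100.0, 1000.0):
    print(b, direct(b, 100_000), by_parts(b, 100_000), LIMIT)
```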

Improper Integrals of Type II. In this subsection, all functions in question are assumed to be defined on $(a,b]$ and integrable on $[c,b]$ for every $c > a$.
The definition of the improper integral in this situation is

$$\int_a^b f(x)\,dx = \lim_{c\to a^+}\int_c^b f(x)\,dx.$$
That is, $\int_a^b f(x)\,dx$ converges if the limit on the right exists, and diverges otherwise. The obvious analogues of the results in the preceding subsection are valid in this situation with essentially the same proofs; one has merely to replace conditions like "$x\to\infty$" or "for sufficiently large $x$" by "$x\to a$" or "for $x$ sufficiently close to $a$." For instance, here is the basic comparison test:

4.59 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all $x$ sufficiently close to $a$. If $\int_a^b g(x)\,dx$ converges, so does $\int_a^b f(x)\,dx$. If $\int_a^b f(x)\,dx$ diverges, so does $\int_a^b g(x)\,dx$.
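The functions most often used for comparison in this situation are the power functions $(x - a)^{-p}$, but now the condition for convergence is $p < 1$ rather than $p > 1$. Indeed, for $p \ne 1$,

$$\int_c^b (x - a)^{-p}\,dx = \frac{(b - a)^{1-p} - (c - a)^{1-p}}{1 - p} \to \begin{cases} \dfrac{(b - a)^{1-p}}{1 - p} & \text{if } p < 1,\\ \infty & \text{if } p > 1,\end{cases} \qquad\text{as } c\to a,$$

and $\int_c^b (x - a)^{-1}\,dx = \log(x - a)\big|_c^b \to \infty$. Hence the analogue of Corollary 4.57 is as follows: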

4.60 Corollary. If $0 \le f(x) \le C(x - a)^{-p}$ for $x$ near $a$, where $p < 1$, then $\int_a^b f(x)\,dx$ converges. If $f(x) \ge c(x - a)^{-1}$ ($c > 0$) for $x$ near $a$, then $\int_a^b f(x)\,dx$ diverges.

EXAMPLE 4. $\int_0^1 x^{-2}\sin 3x\,dx$ diverges. Indeed, $x^{-1}\sin 3x\to 3$ as $x\to 0$, so $x^{-2}\sin 3x > 2x^{-1}$ for $x$ near 0.

Theorem 4.58 also remains valid in this situation; that is, absolute convergence
implies convergence.

EXAMPLE 5. $\int_0^1 x^{-1/2}\sin(x^{-1})\,dx$ is absolutely convergent, because $|x^{-1/2}\sin(x^{-1})| \le x^{-1/2}$.

Other Types of Improper Integrals. Various other kinds of improper integrals can be built up out of those of types I and II.
First, obviously one can consider the "mirror images" of types I and II; that is, integrals of the form $\int_{-\infty}^b f(x)\,dx$ where $f$ is integrable on $[a,b]$ for all $a < b$, or integrals of the form $\int_a^b f(x)\,dx$ where $f$ is integrable on $[a,c]$ for all $c < b$ but is unbounded near $x = b$. The ideas are exactly the same; only minor notational changes are needed. (In the latter situation, the comparison functions for the analogue of Corollary 4.60 are the power functions $|x - b|^{-p} = (b - x)^{-p}$.)
Second, one can consider improper integrals $\int_a^b f(x)\,dx$ where a difficulty occurs at both endpoints of the interval of integration, either because the endpoint is at infinity or because the integrand is unbounded there. The trick here is to pick an intermediate point $c\in(a,b)$ and write $\int_a^b = \int_a^c + \int_c^b$, thus reducing the integral to a sum of two integrals that are each of type I or II; the original integral is said to be convergent if and only if each of the two subintegrals is convergent. For example, if $f$ is integrable over every finite interval $[a,b]$, we define

$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^0 f(x)\,dx + \int_0^\infty f(x)\,dx = \lim_{a\to-\infty}\int_a^0 f(x)\,dx + \lim_{b\to\infty}\int_0^b f(x)\,dx.$$

The integral on the left converges only when both of the limits on the right exist independently of one another; there is no relation between the variables $a$ and $b$. The same ideas apply to $\int_a^\infty f(x)\,dx$ when $f$ is unbounded at $a$ or to $\int_a^b f(x)\,dx$ when $f$ is unbounded at both $a$ and $b$.

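Example 6. $\int_{-\infty}^\infty dx/(1 + x^2)$ converges; the integrals over $(-\infty,0]$ and $[0,\infty)$ are both convergent by comparison to $x^{-2}$. In fact,

$$\int_{-\infty}^\infty \frac{dx}{1 + x^2} = \lim_{a\to-\infty,\ b\to\infty}\arctan x\,\Big|_a^b = \frac{\pi}{2} - \Big(-\frac{\pi}{2}\Big) = \pi.$$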

Example 7. $\int_0^\infty x^{-p}\,dx$ is divergent for every $p$. Indeed, if $p < 1$, $\int_0^1 x^{-p}\,dx$ converges but $\int_1^\infty x^{-p}\,dx$ diverges, whereas the reverse is true if $p > 1$. If $p = 1$, these integrals both diverge.

Example 8. Consider $\int_0^\infty f(x)\,dx$ where $f(x) = 1/(x^{1/2} + x^{3/2})$. Since $0 < f(x) < x^{-1/2}$, $\int_0^1 f(x)\,dx$ converges by Corollary 4.60. Since $0 < f(x) < x^{-3/2}$, $\int_1^\infty f(x)\,dx$ converges by Corollary 4.57. Hence $\int_0^\infty f(x)\,dx$ converges.

Finally, one can consider improper integrals $\int_a^b f(x)\,dx$ where $f$ is unbounded near one or more interior points of $[a, b]$. Again the trick is to break up $[a, b]$ into subintervals such that the singularities of $f$ occur only at endpoints of the subintervals and consider the integrals of $f$ over the subintervals separately.

Example 9. Let $f(x) = (x^3 - 8x^2)^{-1/3}$, and let us consider $\int_0^9 f(x)\,dx$ and $\int_0^\infty f(x)\,dx$. The singularities of $f$ occur at $x = 0$ and $x = 8$, so for the first integral we write

$$\int_0^9 = \int_0^c + \int_c^8 + \int_8^9 \qquad (0 < c < 8).$$

We have $|f(x)| = x^{-2/3}|x - 8|^{-1/3}$, which is approximately $\frac{1}{2}x^{-2/3}$ for $x$ near 0 and approximately $\frac{1}{4}|x - 8|^{-1/3}$ for $x$ near 8. Hence all three subintegrals are absolutely convergent by Corollary 4.60, and the original integral $\int_0^9$ converges. On the other hand, $f(x)$ is positive for $x > 8$ and $f(x)/x^{-1} = (1 - 8x^{-1})^{-1/3} \to 1$ as $x \to \infty$, so $\int_9^\infty f(x)\,dx$ diverges by Corollary 4.56. It follows that $\int_0^\infty f(x)\,dx$ diverges too.
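The asymptotic claims in Example 9 can be spot-checked numerically. One practical wrinkle, not discussed in the text: on $(0, 8)$ the base $x^3 - 8x^2$ is negative, and in Python `v ** (1/3)` returns a complex number for negative `v`, so this sketch takes a real cube root explicitly. The helper names and sample points are our own choices.

```python
import math


def real_cbrt(v):
    """Real cube root, valid for negative v (v ** (1/3) would be complex in Python)."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def f(x):
    """Example 9: f(x) = (x^3 - 8x^2)^(-1/3), with the base written as x*x*(x - 8)
    to avoid cancellation near x = 8."""
    return 1.0 / real_cbrt(x * x * (x - 8))


# Near the singularities: |f| ~ (1/2) x^(-2/3) at 0 and |f| ~ (1/4) |x - 8|^(-1/3) at 8.
d = 1e-6
ratio_at_0 = abs(f(d)) / (0.5 * d ** (-2 / 3))
ratio_at_8 = abs(f(8 + d)) / (0.25 * d ** (-1 / 3))
# For large x, x * f(x) = (1 - 8/x)^(-1/3) -> 1, so f is comparable to 1/x at infinity.
ratio_at_inf = 1e6 * f(1e6)
print(ratio_at_0, ratio_at_8, ratio_at_inf)  # all close to 1
```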

The definition of the improper integral $\int_a^b f(x)\,dx$ given above when $f$ has a singularity in the interior of $[a, b]$ is a little too restrictive for some purposes. Consider, for example, $\int_{-1}^{1} x^{-1}\,dx$. According to our definition, this integral is to be considered as the limit of

$$\int_{-1}^{-\delta} \frac{dx}{x} + \int_{\epsilon}^{1} \frac{dx}{x} = \log \delta - \log \epsilon = \log \frac{\delta}{\epsilon} \tag{4.61}$$

as $\delta$ and $\epsilon$ decrease to 0, and this limit does not exist: When $\delta$ and $\epsilon$ are extremely small, their ratio can be arbitrarily large or arbitrarily small. However, since $x^{-1}$ is an odd function, it seems natural to interpret the value of the integral as 0; the negative infinity of $\int_{-1}^{0} x^{-1}\,dx$ should exactly cancel the positive infinity of $\int_0^1 x^{-1}\,dx$. We can achieve this result by modifying (4.61) so as to preserve the symmetry of the situation, namely, by taking $\delta = \epsilon$, so that $\log(\delta/\epsilon) = 0$.
These considerations lead to the following definition. Suppose $a < c < b$, and suppose $f$ is integrable on $[a, c - \epsilon]$ and on $[c + \epsilon, b]$ for all $\epsilon > 0$. The (Cauchy) principal value of the integral $\int_a^b f(x)\,dx$ is

$$\text{P.V.} \int_a^b f(x)\,dx = \lim_{\epsilon \to 0} \left[ \int_a^{c - \epsilon} f(x)\,dx + \int_{c + \epsilon}^{b} f(x)\,dx \right],$$

provided that the limit exists. Of course, if $\int_a^b f(x)\,dx$ converges, its Cauchy principal value is its ordinary value.
200 Chapter 4. Integral Calculus

The following proposition describes a typical situation in which principal values occur.

4.62 Proposition. Suppose $a < 0 < b$. If $\varphi$ is continuous on $[a, b]$ and differentiable at 0, then $\text{P.V.} \int_a^b x^{-1}\varphi(x)\,dx$ exists.

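Proof. First we check the case $\varphi \equiv 1$ by explicit calculation:

$$\text{P.V.} \int_a^b \frac{dx}{x} = \lim_{\epsilon \to 0} \left[ \int_a^{-\epsilon} \frac{dx}{x} + \int_{\epsilon}^{b} \frac{dx}{x} \right] = \lim_{\epsilon \to 0} \left[ \log|x| \,\Big|_a^{-\epsilon} + \log x \,\Big|_{\epsilon}^{b} \right] = \log \frac{b}{|a|}.$$

For the general case, we write $\varphi(x) = \varphi(0) + [\varphi(x) - \varphi(0)]$, obtaining

$$\text{P.V.} \int_a^b \frac{\varphi(x)}{x}\,dx = \varphi(0)\, \text{P.V.} \int_a^b \frac{dx}{x} + \int_a^b \frac{\varphi(x) - \varphi(0)}{x}\,dx.$$

We have just seen that the first quantity on the right exists, and the second one is a proper integral: The integrand is actually continuous on $[a, b]$ if we define its value at $x = 0$ to be $\varphi'(0)$. □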
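The proof is constructive, so it translates directly into a numerical recipe. The sketch below (function names are hypothetical; Simpson's rule is our choice of quadrature) computes $\varphi(0)\log(b/|a|)$ plus the proper integral of $(\varphi(x) - \varphi(0))/x$, with the caller supplying $\varphi'(0)$ as the value of the quotient at $x = 0$.

```python
import math


def simpson(g, a, b, n=3000):
    """Composite Simpson rule for g on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def principal_value(phi, dphi0, a, b, n=3000):
    """P.V. of the integral of phi(x)/x over [a, b], a < 0 < b, via the split in the
    proof of Proposition 4.62. dphi0 = phi'(0) is supplied by the caller."""
    phi0 = phi(0.0)

    def quotient(x):
        # Removable singularity: (phi(x) - phi(0))/x -> phi'(0) as x -> 0.
        return dphi0 if abs(x) < 1e-12 else (phi(x) - phi0) / x

    return phi0 * math.log(b / abs(a)) + simpson(quotient, a, b, n)


print(principal_value(lambda x: 1 + x, 1.0, -1.0, 2.0), 3 + math.log(2))
print(principal_value(math.exp, 1.0, -1.0, 2.0))  # about 5.17362
```

For $\varphi(x) = 1 + x$ the exact answer is $\log 2 + 3$; for an even $\varphi$ such as $\cos$ on a symmetric interval the result is 0, as the odd-integrand argument predicts.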

The notion of principal value is also occasionally applied to integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$ in which $f$ is integrable over any finite interval:

$$\text{P.V.} \int_{-\infty}^{\infty} f(x)\,dx = \lim_{R \to \infty} \int_{-R}^{R} f(x)\,dx.$$

For example, the integral $\int_{-\infty}^{\infty} x(1 + x^2)^{-1}\,dx$ is divergent because the integrand is asymptotically equal to $x^{-1}$ as $x \to \pm\infty$, but its principal value is zero because the integrand is odd.
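The difference between independent and symmetric limits is visible in closed form: $\int_a^b x(1 + x^2)^{-1}\,dx = \frac{1}{2}\log\frac{1 + b^2}{1 + a^2}$. In this sketch (the helper name and the lopsided choice $b = 2R$ are ours) the symmetric truncation is identically 0, while letting the upper limit run twice as fast gives a different limit, $\log 2$.

```python
import math


def odd_example(a, b):
    """Exact integral of x/(1 + x^2) over [a, b]: (1/2) log((1 + b^2)/(1 + a^2))."""
    return 0.5 * math.log((1 + b * b) / (1 + a * a))


for R in (1e1, 1e3, 1e6):
    symmetric = odd_example(-R, R)      # the P.V. recipe: exactly 0 for every R
    lopsided = odd_example(-R, 2 * R)   # upper limit 2R: tends to log 2 instead
    print(f"R = {R:g}: symmetric = {symmetric}, lopsided = {lopsided:.9f}")
print("log 2 =", math.log(2))
```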

EXERCISES

1. Determine whether the following improper integrals of type I converge.

a. $\int_1^\infty \frac{dx}{x\sqrt{x + 3}}$.
f°° x2 - 3x - 1 ,
b. / ——2 -r- dx.

c. $\int_0^\infty x^2 e^{-x}\,dx$.

d. / -z— —dx.
J'S x2 — x — 2

/OO 1

tan — dx.
x

2. Determine whether the following improper integrals of type II converge.

a. / -L-^da.
'1 - x2

3. Determine whether the following improper integrals converge. In each case

it will be necessary to break up the integral into a sum of integrals of types I
and/or II.
a. $\int_0^\infty x^{-3/4} e^{-x}\,dx$.
1
b.
o
r
ex _

do;
d.
0

e. / a;"1'5 sin — dx.

x
o

4. For $p > 0$, let $f_p(x) = x^{-1}(\log x)^{-p}$.

a. Given $p > 0$ and $\epsilon > 0$, show that $x^{-1-\epsilon} < f_p(x) < x^{-1}$ for sufficiently large $x$.
b. For which $p$ does $\int_2^\infty f_p(x)\,dx$ converge?

5. Let $f_p$ be as in Exercise 4 and $g_p(x) = (x \log x)^{-1}(\log\log x)^{-p}$.

a. Given $p > 0$ and $\epsilon > 0$, show that $f_{1+\epsilon}(x) < g_p(x) < f_1(x)$ for sufficiently large $x$.
b. For which $p$ does $\int_3^\infty g_p(x)\,dx$ converge?

6. Let $f(x) = 1$ on the intervals $[1, 1\frac{1}{2}]$, $[2, 2\frac{1}{4}]$, $[3, 3\frac{1}{8}]$, ..., and $f(x) = 0$ elsewhere.

a. Show that $\int_0^\infty f(x)\,dx$ converges (and is equal to 1) although $f(x) \not\to 0$ as $x \to \infty$.

b. Modify $f$ to make an example of a function $g$ such that $\int_0^\infty g(x)\,dx$ converges although $g(x)$ does not remain bounded as $x \to \infty$.

7. Prove Lemma 4.54.

8. Prove that $\int_0^\infty x^{-1}|\sin x|\,dx$ diverges. (Hint: Show that there is a constant $c > 0$ such that $\int_{n\pi}^{(n+1)\pi} x^{-1}|\sin x|\,dx \ge c \int_{n\pi}^{(n+1)\pi} x^{-1}\,dx$ for all $n \ge 1$.)

9. (Dirichlet's Test for Convergence) Let $f$ be continuous and let $g$ be $C^1$ on $[a, \infty)$. Suppose that (i) the function $F(x) = \int_a^x f(t)\,dt$ remains bounded as $x \to \infty$; (ii) $g'(x) \le 0$ on $[a, \infty)$ and $\lim_{x \to \infty} g(x) = 0$. Show that $\int_a^\infty f(x)g(x)\,dx$ converges. (Hint: Example 3 is the case $f(x) = \sin x$, $g(x) = x^{-1}$. Generalize the argument given there.)
10. Evaluate P.V. f\ dx/x(x + 2).
11. Suppose $\varphi$ is of class $C^3$ on $[-1, 1]$. Show that $\text{P.V.} \int_{-1}^{1} x^{-3}\varphi(x)\,dx$ exists if and only if $\varphi'(0) = 0$. (Hint: Consider the second-order Taylor expansion of $\varphi$.)

4.7 Improper Multiple Integrals

The problem of defining improper integrals in dimensions $n > 1$ is trickier than in dimension 1. Suppose, for example, that $f$ is a continuous function on $\mathbb{R}^2$ and we wish to define $\iint_{\mathbb{R}^2} f\,dA$. The obvious idea is to set

$$\iint_{\mathbb{R}^2} f\,dA = \lim_{r \to \infty} \iint_{S_r} f\,dA,$$

where the $S_r$'s are a family of measurable sets that fill out $\mathbb{R}^2$ as $r \to \infty$. For instance, we could take $S_r$ to be the disc of radius $r$ about the origin, or the square of side length $r$ centered at the origin, or the rectangle of side lengths $r$ and $r^2$ centered at the origin, or the disc of radius $r$ centered at $(15, -37)$, and so on. The difficulty is evident: There is a bewildering array of possibilities, with no rationale for choosing one over another and no guarantee that different families $S_r$ will yield the same limit.
Evidently there is some work to be done, and we shall not give all the details
here. The outcome, in a nutshell, is that everything goes well when the integrand is
nonnegative or when the integral is absolutely convergent, but not otherwise.
4.7. Improper Multiple Integrals 203

We begin by considering the situation where a nonnegative function $f$ is to be integrated over a set $S \subset \mathbb{R}^n$. We suppose that $f$ is not integrable on $S$ according to the definitions in §4.2, either because $S$ is unbounded or because $f$ is unbounded on $S$. Instead, we assume the following:

$S$ is the union of an increasing sequence of sets $U_1, U_2, \ldots$,

$$S = \bigcup_{j=1}^{\infty} U_j \qquad (U_1 \subset U_2 \subset U_3 \subset \cdots), \tag{4.63}$$

where each $U_j$ is measurable and $f$ is integrable on each $U_j$.

Example 1. If $S = \mathbb{R}^n$ and $f$ is continuous on $\mathbb{R}^n$, we can take $U_j$ to be the ball of radius $j$ about the origin. As noted above, there are many other possibilities.

Example 2. Suppose $f$ is continuous on $\mathbb{R}^n \setminus \{0\}$ but $f(\mathbf{x}) \to \infty$ as $\mathbf{x} \to 0$, and $S$ is the ball $\{\mathbf{x} : |\mathbf{x}| < 1\}$. Then we can take $U_j$ to be the spherical shell $\{\mathbf{x} : 1/j < |\mathbf{x}| < 1\}$. (Strictly speaking, the union of the $U_j$'s is $S \setminus \{0\}$, but this is immaterial: Omission of a single point, or any set of zero content, from a domain has no effect on integration over that domain.)

With $S$, $f$, and $U_j$ as in (4.63), the integrals $\int \cdots \int_{U_j} f\,dV_n$ exist for all $j$, and they increase along with $j$ since $f \ge 0$ and the sets $U_j$ increase. It therefore follows from the monotone sequence theorem that the limit

$$\lim_{j \to \infty} \int \cdots \int_{U_j} f\,dV_n$$

always exists, provided that we allow $+\infty$ as a value, and this limit is an obvious candidate for the value of the improper integral $\int \cdots \int_S f\,dV_n$.
Here is the crucial point: Suppose that $\{U_j'\}$ is another sequence of sets satisfying the conditions of (4.63). Then the two limits

$$\lim_{j \to \infty} \int \cdots \int_{U_j} f\,dV_n \quad\text{and}\quad \lim_{j \to \infty} \int \cdots \int_{U_j'} f\,dV_n$$

are equal. Therefore, it makes sense to define the integral of $f$ over $S$ by

$$\int \cdots \int_S f\,dV_n = \lim_{j \to \infty} \int \cdots \int_{U_j} f\,dV_n, \tag{4.64}$$

where $\{U_j\}$ is any sequence of sets satisfying the conditions of (4.63). It is understood that the value of the integral may be $+\infty$, in which case we say that the integral diverges.

The proof that the limit in (4.64) is independent of the choice of $\{U_j\}$, in full generality, requires the Lebesgue theory of integration. We shall give a proof under some additional restrictions on $S$ and the $U_j$'s, usually easy to satisfy in practice, in Appendix B.6 (Theorem B.25).
It is also true that improper multiple integrals of nonnegative functions can be evaluated as iterated improper integrals under suitable conditions on $S$ and $f$ so that the latter integrals exist. For example,

$$\iint_{\mathbb{R}^2} f\,dA = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx,$$

and if $S = \{(x, y) : 0 < x < y\}$,

$$\iint_S f\,dA = \int_0^\infty \int_0^y f(x, y)\,dx\,dy = \int_0^\infty \int_x^\infty f(x, y)\,dy\,dx.$$

We shall not attempt to state a general theorem to cover all the various cases (much
less give a precise proof), but we assure the reader that as long as the integrand is
nonnegative, there is almost never any difficulty.
The analogue of the comparison test, Theorem 4.55, is valid for multiple improper integrals, with the same proof. Again the basic comparison functions are powers of $|\mathbf{x}|$, but the critical exponent depends on the dimension.

4.65 Proposition. For $p > 0$, define $f_p$ on $\mathbb{R}^n \setminus \{0\}$ by $f_p(\mathbf{x}) = |\mathbf{x}|^{-p}$. The integral of $f_p$ over a ball $\{\mathbf{x} : |\mathbf{x}| < a\}$ is finite if and only if $p < n$; the integral of $f_p$ over the complement of a ball, $\{\mathbf{x} : |\mathbf{x}| > a\}$, is finite if and only if $p > n$.

Proof. We present the proof when $n = 2$. The only singularity of $f_p$ is at the origin, so we may use the annuli $\{\mathbf{x} : \epsilon < |\mathbf{x}| < a\}$ and $\{\mathbf{x} : a < |\mathbf{x}| < b\}$ as approximating regions. In polar coordinates, the integrals then become

$$\int_\epsilon^a \int_0^{2\pi} r^{-p}\, r\,d\theta\,dr \quad\text{and}\quad \int_a^b \int_0^{2\pi} r^{-p}\, r\,d\theta\,dr.$$

As $\epsilon \to 0$ and $b \to \infty$ we obtain $2\pi \int_0^a r^{1-p}\,dr$ and $2\pi \int_a^\infty r^{1-p}\,dr$, which are convergent when $p < 2$ and $p > 2$, respectively.
The proof for general $n$ is similar, using spherical coordinates and their analogues in higher dimensions. The reader is invited to work out the case $n = 3$ in Exercise 1. □
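The same reduction can be carried out in any dimension if one accepts a standard fact not proved here: the $(n-1)$-dimensional measure of the unit sphere in $\mathbb{R}^n$ is $\sigma_{n-1} = 2\pi^{n/2}/\Gamma(n/2)$ (which gives $2\pi$ for $n = 2$ and $4\pi$ for $n = 3$). Then the integral of $|\mathbf{x}|^{-p}$ over a shell $r_1 < |\mathbf{x}| < r_2$ is $\sigma_{n-1}\int_{r_1}^{r_2} r^{n-1-p}\,dr$. The sketch below (hypothetical helper names) tabulates this for $p = n - 1, n, n + 1$.

```python
import math


def unit_sphere_area(n):
    """(n-1)-dimensional measure of the unit sphere in R^n: 2 pi^(n/2) / Gamma(n/2)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)


def shell_integral(p, n, r1, r2):
    """Integral of |x|^(-p) over the shell r1 < |x| < r2 in R^n, reduced by spherical
    coordinates to unit_sphere_area(n) times the integral of r^(n-1-p) from r1 to r2."""
    area = unit_sphere_area(n)
    if p == n:
        return area * math.log(r2 / r1)
    k = n - p
    return area * (r2 ** k - r1 ** k) / k


for n in (2, 3):
    for p in (n - 1, n, n + 1):
        ball = shell_integral(p, n, 1e-9, 1.0)     # tiny inner radius: tests the ball
        outside = shell_integral(p, n, 1.0, 1e9)   # huge outer radius: tests the complement
        print(f"n={n}, p={p}: ball ~ {ball:.4g}, complement ~ {outside:.4g}")
```

Only $p < n$ keeps the first column bounded and only $p > n$ keeps the second bounded, in line with Proposition 4.65.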

As another example of improper double integrals, we now perform a classic

calculation that leads to one of the most important formulas in mathematics.

Let us consider the integral

$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA.$$

On the one hand, we can take the approximating regions $U_j$ to be discs centered at the origin and switch to polar coordinates:

$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA = \lim_{R \to \infty} \int_0^R \int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = \int_0^\infty \int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = 2\pi \left[-\tfrac{1}{2} e^{-r^2}\right]_0^\infty = \pi.$$

On the other hand, we can take the approximating regions to be squares centered at the origin and stick to Cartesian coordinates:

$$\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA = \lim_{R \to \infty} \int_{-R}^{R} \int_{-R}^{R} e^{-x^2} e^{-y^2}\,dx\,dy = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right) \left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right).$$

The two integrals in parentheses are equal, of course; the name of the variable of integration is irrelevant. We have shown that

$$\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^2 = \pi.$$

Since $e^{-x^2} > 0$, we can take the positive square root of both sides to obtain the magic formula:

4.66 Proposition. $$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

The function $e^{-x^2}$ turns up in many contexts. In particular, it is essentially the "bell curve" or "normal distribution" of probability and statistics, but in that setting one must rescale it so that the total area under its graph is 1; Proposition 4.66 provides the appropriate scaling factor. Proposition 4.66 is remarkable not only because it is inaccessible by elementary calculus (the antiderivative of $e^{-x^2}$ is not an elementary function) but because it presents the number $\pi$ in a starring role that has nothing to do with circles.
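A numerical sketch of the two approximating families used above. The disc integrals have the closed form $\pi(1 - e^{-R^2})$; for the squares we use $\int_{-R}^{R} e^{-x^2}\,dx = \sqrt{\pi}\,\operatorname{erf}(R)$, where `math.erf` is Python's error function (a tool outside the text). The last lines check Proposition 4.66 and the rescaled normal density $e^{-x^2/2}/\sqrt{2\pi}$ by Simpson's rule; the cutoffs are arbitrary.

```python
import math


def disc_integral(R):
    """Integral of e^(-x^2-y^2) over the disc of radius R (polar coordinates, exact)."""
    return math.pi * (1 - math.exp(-R * R))


def square_integral(R):
    """Integral over the square [-R, R]^2 (Cartesian coordinates): (sqrt(pi) * erf(R))^2."""
    return (math.sqrt(math.pi) * math.erf(R)) ** 2


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


for R in (1.0, 2.0, 4.0):
    print(R, disc_integral(R), square_integral(R))  # both approach pi = 3.14159...

# e^(-x^2) is below 1e-43 for |x| > 10, so truncating there is harmless.
gauss = simpson(lambda x: math.exp(-x * x), -10.0, 10.0)
normal = simpson(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -12.0, 12.0)
print(gauss, math.sqrt(math.pi), normal)
```

Since the disc of radius $R$ lies inside the square of side $2R$, which lies inside the disc of radius $\sqrt{2}R$, the square values are squeezed between disc values, which is one way to see why the two families must share the limit $\pi$.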

Now, what about functions that are not nonnegative? Let us suppose that $S$, $f$, and $\{U_j\}$ are as in (4.63), but $f$ is merely assumed to be real-valued. The essential point is that the preceding theory can be applied to $|f|$, so that it makes sense to say that $\int \cdots \int_S |f|\,dV_n$ converges. If this condition holds, the argument used to prove Theorem 4.58 shows that $\lim_{j \to \infty} \int \cdots \int_{U_j} f\,dV_n$ exists and that

$$\lim_{j \to \infty} \int \cdots \int_{U_j} f\,dV_n = \int \cdots \int_S f^+\,dV_n - \int \cdots \int_S f^-\,dV_n,$$

where $f^+(\mathbf{x}) = \max[f(\mathbf{x}), 0]$ and $f^-(\mathbf{x}) = \max[-f(\mathbf{x}), 0]$. The integrals on the right converge by comparison to the integral of $|f|$, and they are independent of the choice of $\{U_j\}$; hence, so is the limit on the left. In short, if $\int \cdots \int_S |f|\,dV_n$ converges, we may define the improper integral of $f$ over $S$ by formula (4.64); the limit in question exists and is independent of the choice of approximating sequence $\{U_j\}$.
The same result holds if $f$ is complex-valued; we simply consider its real and imaginary parts separately.
In dimensions n > 1, however, there is no general theory of improper integrals
that are convergent but not absolutely convergent. Such integrals, when they arise,
must be defined by specific limiting procedures that are adapted to the situation at
hand.

EXERCISES

1. Prove Proposition 4.65 for the case n = 3.

2. Determine whether the following improper integrals converge, and evaluate the
ones that do.
dV

/R3 -1- + x
ft
b.

(x2 + y2 + z2fl2

d. // xe~x v dA.
>x>0
2

/x2+?,2<i [x2 + ylY

4.8. Lebesgue Measure and the Lebesgue Integral 207

3. The electrostatic potential generated by a distribution of electric charge in $\mathbb{R}^3$ with density $\rho$ is defined to be

Show that this integral is absolutely convergent if $\rho$ is continuous and vanishes outside a bounded set.

4. Let $f(x, y) = (x^2 - y^2)(x^2 + y^2)^{-2}$, and let $S$ be the unit square $[0, 1] \times [0, 1]$.
a. Show that $\iint_S |f|\,dA = \infty$.
b. Show by explicit calculation that the iterated integrals $\int_0^1 \int_0^1 f(x, y)\,dx\,dy$ and $\int_0^1 \int_0^1 f(x, y)\,dy\,dx$ both exist and are unequal.

4.8 Lebesgue Measure and the Lebesgue Integral

In several places in this book we allude to the fact that in advanced analysis, the Riemann theory of integration that we have developed here is replaced by the more sophisticated theory due to Lebesgue. Detailed accounts of the Lebesgue integral can be found in Bear [3], Jones [9], and Rudin [18]. Here we shall content ourselves with a brief informal description of how it works. (Note: There are several ways to develop the Lebesgue theory of integration; in some treatments, the characterizations of Lebesgue measure and the Lebesgue integral that we give here are theorems rather than definitions.) In a few places we need the notion of the sum of an infinite series, for which the reader is referred to §6.1.
The starting point is a refined concept of $n$-dimensional measure, independent of any theory of integration. To keep things on a concrete level, let us explain this concept for the case $n = 2$.
In the Jordan theory of area, described in §4.2, we find the area of a set $S \subset \mathbb{R}^2$ by approximating $S$ from the inside and the outside by unions of rectangles. For the Lebesgue notion of area, we use a two-step approximation process: We first approximate $S$ from the inside by compact sets and from the outside by open sets, then approximate the compact sets from the outside and the open sets from the inside by unions of rectangles. More precisely, let us agree to call a set that is the union of a finite collection of rectangles with disjoint interiors a tiled set. The Lebesgue measure $m(S)$ of a set $S \subset \mathbb{R}^2$ is then defined as follows:

• If $T = \bigcup_{k=1}^{N} R_k$ is a tiled set, where the $R_k$'s are rectangles with disjoint interiors, the Lebesgue measure $m(T)$ is the sum of the areas of the $R_k$'s.

• The Lebesgue measure of a compact set $K$ is

$$m(K) = \inf\{m(T) : T \text{ is a tiled set and } T \supset K\}.$$

• The Lebesgue measure of an open set $U$ is

$$m(U) = \sup\{m(T) : T \text{ is a tiled set and } T \subset U\}.$$

• A set $S \subset \mathbb{R}^2$ is said to be Lebesgue measurable if the quantities

$$\sup\{m(K) : K \text{ is compact and } K \subset S\}$$

and

$$\inf\{m(U) : U \text{ is open and } U \supset S\}$$

are equal, in which case their common value is the Lebesgue measure $m(S)$.

Note that there is no assumption that the sets in question are bounded (although
compact sets are bounded by definition); the Lebesgue theory applies equally well
to bounded and unbounded sets.
The notion of $n$-dimensional Lebesgue measure for sets in $\mathbb{R}^n$ is entirely similar; only the terminology needs to be modified a little. Every set that one will ever meet in "real life" (in particular, every open set, every closed set, every intersection of countably many open sets, every union of countably many closed sets, and so on) is Lebesgue measurable.³ Lebesgue measure has the following fundamental additivity property: If $\{S_j\}$ is a finite or infinite sequence of disjoint Lebesgue measurable sets, then $\bigcup S_j$ is Lebesgue measurable and $m(\bigcup S_j) = \sum m(S_j)$. In the Jordan theory, this additivity is guaranteed to hold only for finitely many sets; the extension to infinitely many sets is the crucial property that allows the Lebesgue theory to handle various limiting processes more smoothly.
It is not hard to show that every open set $U \subset \mathbb{R}^n$ is the union of a finite or countably infinite family of rectangular boxes $R_j$ (intervals when $n = 1$) with disjoint interiors, and the Lebesgue measure of $U$ is just the sum of the $n$-dimensional volumes of the boxes. (In general these boxes are not part of a fixed grid of boxes; if there are infinitely many of them, the diameter of $R_j$ generally tends to zero as $j \to \infty$.) It follows that a set $S \subset \mathbb{R}^n$ has Lebesgue measure zero if and only if for every $\epsilon > 0$, $S$ is contained in the union of a finite or countable family of boxes, the sum of whose volumes is less than $\epsilon$. The only difference between this and the condition that $S$ have zero content is the fact that here we allow an infinite family of boxes, but as with additivity, this difference is significant. In particular, every countable set has Lebesgue measure zero (if $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots\}$, let $R_j$ be a box centered at $\mathbf{x}_j$ with volume $2^{-j}\epsilon$), whereas many countable sets (the set of points with rational coordinates, for example) are not Jordan measurable.

³For those who know some set theory: More precisely, one cannot construct Lebesgue nonmeasurable sets without invoking the axiom of choice.
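The covering argument for countable sets can be illustrated in dimension 1, where boxes are intervals. This sketch uses exact rational arithmetic from Python's `fractions` module; the enumeration helper `first_rationals` and the choice of 200 points are our own, and of course only finitely many terms of the countable family can be shown. The $j$-th interval has length $2^{-j}\epsilon$, so the total length of the first $N$ intervals is $\epsilon(1 - 2^{-N})$.

```python
from fractions import Fraction


def first_rationals(count):
    """The first `count` distinct rationals in [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Interval centered at the j-th point (j = 1, 2, ...) with length 2^(-j) * eps."""
    return [(x - eps / 2 ** (j + 1), x + eps / 2 ** (j + 1))
            for j, x in enumerate(points, start=1)]


eps = Fraction(1, 1000)
pts = first_rationals(200)
intervals = cover(pts, eps)
total = sum(hi - lo for lo, hi in intervals)
print(float(total), total < eps)  # total = eps * (1 - 2^-200) < eps
```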
With the notion of Lebesgue measure in hand, we turn to the Lebesgue integral. First we specify the class of functions to which the theory applies. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called Lebesgue measurable if, for every interval $I \subset \mathbb{R}$, the set $\{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \in I\}$ is Lebesgue measurable. Again, every function that one will ever meet in "real life" is Lebesgue measurable. In particular, every continuous function is Lebesgue measurable, and if $f$ is Riemann integrable on the set $S$, then $f\chi_S$ is Lebesgue measurable. Moreover, if $\{f_j\}$ is a sequence of Lebesgue measurable functions such that $f_j(\mathbf{x}) \to f(\mathbf{x})$ for every $\mathbf{x}$, then the limit $f$ is Lebesgue measurable. (This last statement is quite false if "Lebesgue measurable" is replaced by "Riemann integrable"!)
Suppose that $f$ is Lebesgue measurable and nonnegative. Rather than partitioning the domain of $f$, we partition the set $[0, \infty)$ in which $f$ takes its values into small intervals $[0, 2^{-n})$, $[2^{-n}, 2 \cdot 2^{-n})$, $[2 \cdot 2^{-n}, 3 \cdot 2^{-n})$, and so on, and form the sum

$$S_n f = \sum_{k=0}^{\infty} \frac{k}{2^n}\, m\bigl(\{\mathbf{x} : k 2^{-n} \le f(\mathbf{x}) < (k + 1) 2^{-n}\}\bigr).$$

(The Lebesgue measurability of $f$ is needed so that the terms in this sum are well defined. One or more of them may be infinite, in which case the value of the sum is $+\infty$.) The sums $S_n f$ increase with $n$ because the associated partitions of $[0, \infty)$ become finer and finer, so they have a limit (possibly $+\infty$), which is defined to be the Lebesgue integral of $f$ (over $\mathbb{R}^n$), denoted by $\int f\,dm$. More generally, we define the Lebesgue integral of $f$ over any Lebesgue measurable set $S \subset \mathbb{R}^n$, denoted by $\int_S f\,dm$, to be $\int (f\chi_S)\,dm$. Note that neither the function $f$ nor the set $S$ needs to be bounded; for nonnegative integrands there are no "improper" integrals in the Lebesgue theory.
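The range-partitioning idea can be made concrete in dimension 1 for a function whose level sets are intervals. In this sketch we take (our choice) $f(x) = x^2$ on $[0, 1]$ and $f = 0$ elsewhere, and use the lower sum $S_n f = \sum_k k2^{-n}\, m(\{x : k2^{-n} \le f(x) < (k+1)2^{-n}\})$. For $k \ge 1$ the level set is the interval $[\sqrt{k2^{-n}}, \sqrt{(k+1)2^{-n}})$; the $k = 0$ level set has infinite measure but weight 0 (with the usual convention $0 \cdot \infty = 0$). Since $0 \le f - (\text{step function}) < 2^{-n}$ on $[0, 1]$, $S_n f$ should lie within $2^{-n}$ of $\int_0^1 x^2\,dx = 1/3$.

```python
import math


def lebesgue_sum(n):
    """S_n f for f(x) = x^2 on [0, 1] and f = 0 elsewhere: the sum over k of
    (k / 2^n) * m({x : k/2^n <= f(x) < (k+1)/2^n})."""
    step = 2.0 ** -n
    total = 0.0
    # k = 0 contributes 0 * (infinite measure) = 0 by convention, so start at k = 1.
    # Levels k >= 2^n meet the range of f only at the value 1, a set of measure zero.
    for k in range(1, 2 ** n):
        lo, hi = k * step, (k + 1) * step
        # The level set is the interval [sqrt(lo), sqrt(hi)), of length sqrt(hi) - sqrt(lo).
        total += lo * (math.sqrt(hi) - math.sqrt(lo))
    return total


sums = [lebesgue_sum(n) for n in range(1, 13)]
print(sums[-1], 1 / 3)  # S_n f increases toward the integral of x^2 over [0, 1]
```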
Now we drop the assumption that $f \ge 0$. If $f$ is any Lebesgue measurable function, we write it as the difference of the two nonnegative functions

$$f^+(\mathbf{x}) = \max[f(\mathbf{x}), 0] \quad\text{and}\quad f^-(\mathbf{x}) = \max[-f(\mathbf{x}), 0]$$

and define the Lebesgue integral $\int f\,dm$ to be $\int f^+\,dm - \int f^-\,dm$. The integral $\int f\,dm$ is not defined in the case where $\int f^+\,dm$ and $\int f^-\,dm$ are both infinite, although in some instances one can define it as an "improper" integral by limiting procedures such as those in §4.6. (Example 3 in §4.6 illustrates this phenomenon.)
The Lebesgue integral is an extension of the Riemann integral. That is, if the (proper) Riemann integral $\int_S f\,dV_n$ exists, then so does the Lebesgue integral $\int_S f\,dm$, and the two are equal; but the class of Lebesgue integrable functions is much bigger than the class of Riemann integrable functions. We conclude with two additional remarks about the relation between the Lebesgue and Riemann integrals.

• The notion of Lebesgue measure provides a definitive answer to the question of which functions are Riemann integrable. Namely, a function $f : \mathbb{R}^n \to \mathbb{R}$ is Riemann integrable on the bounded set $S$ if and only if $f$ is bounded on $S$ and the set of points at which $f\chi_S$ is discontinuous has Lebesgue measure zero. (Cf. Theorems 4.13 and 4.18 and the discussion of zero content versus zero measure above.)

• There is a way of giving the Riemann theory of integration an extra twist to obtain an integral, called the Henstock-Kurzweil integral, generalized Riemann integral, or gauge integral, that is equivalent to the Lebesgue integral for nonnegative functions but also gives a well-defined result for some functions $f$ for which $\int f^+\,dm$ and $\int f^-\,dm$ are both infinite. See Bartle [2] for a brief introduction and DePree and Swartz [5] for a complete treatment. The virtue of this theory is that it yields a powerful theory of integration within the same conceptual framework as the familiar Riemann integral without the necessity of developing a theory of measure first. The compensating virtue of the Lebesgue theory is that it generalizes readily to yield useful notions of measure and integration in many important situations other than the classical integral on Euclidean space.
Chapter 5

LINE AND SURFACE INTEGRALS;

VECTOR ANALYSIS

The themes of this chapter are (1) integrals over curves and surfaces and (2) differential operations on vector fields, which combine to yield (3) a group of theorems relating integrals over curves, surfaces, and regions in space that are among the most powerful and useful results of advanced calculus.

At the outset, let us explain the term "vector field" in more detail. Let $\mathbf{F}$ be an $\mathbb{R}^n$-valued function defined on some subset of $\mathbb{R}^n$. We have encountered such things in previous chapters, where we generally thought of them as representing transformations from one region of $\mathbb{R}^n$ to another or coordinate systems on regions of $\mathbb{R}^n$. In this chapter, however, we think of such an $\mathbf{F}$ as a function that assigns to each point $\mathbf{x}$ in its domain a vector $\mathbf{F}(\mathbf{x})$, represented pictorially as an arrow based at $\mathbf{x}$, and we therefore call it a vector field. Two simple vector fields are sketched in Figure 5.1. The primary physical motivation is the idea of a force field. For example, $\mathbf{F}$ could represent a gravitational field, $\mathbf{F}(\mathbf{x})$ being the gravitational force felt by a unit mass located at $\mathbf{x}$, or an electric field, $\mathbf{F}(\mathbf{x})$ being the electrostatic force felt by a unit charge located at $\mathbf{x}$. There are many other physical interpretations; for example, in a moving fluid like a stream of water, $\mathbf{F}(\mathbf{x})$ could represent the velocity of the fluid at position $\mathbf{x}$. (In all these examples, $\mathbf{F}(\mathbf{x})$ may also depend on other parameters such as the time $t$.)

One other general comment: The notion of differentiability, or being of class $C^k$, is defined for functions on open sets, because to compute the derivative of a function at a point it is necessary to know the values of the function at neighboring points. However, we shall frequently be dealing with functions and vector fields on closed sets. When we say that a function or vector field is of class $C^k$ on a closed
212 Chapter 5. Line and Surface Integrals; Vector Analysis

set $S \subset \mathbb{R}^n$, we always mean that it is of class $C^k$ on some open set containing $S$.

Figure 5.1: The vector fields $\mathbf{F}(x, y) = (x, y)$ (left) and $\mathbf{F}(x, y) = (-y, x)$ (right).

5.1 Arc Length and Line Integrals

In this section we discuss integrals over curves, traditionally called "line integrals," which are generalizations of ordinary (one-dimensional) integrals over intervals on the real line. As one would expect, they are based on the idea of cutting up the curve into many tiny pieces, forming appropriate Riemann sums, and passing to the limit. However, there are two species of line integrals, appropriate for integrating real-valued or vector-valued functions, depending on how one adapts the differential $dx$ appearing in $\int_a^b f(x)\,dx$ to the more general situation. Our discussion here will be on the informal, intuitive level where we think of $dx$ as being an infinitesimal increment in the variable $x$.

Differentials on Curves; Arc Length. Suppose $C$ is a smooth curve in $\mathbb{R}^n$. We consider two nearby points $\mathbf{x}$ and $\mathbf{x} + d\mathbf{x}$ on the curve; here

$$d\mathbf{x} = (dx_1, \ldots, dx_n) \tag{5.1}$$

is the vector difference between the two points, and we imagine it as being infinitely small. We may, however, be more interested in the distance between the two points, traditionally denoted by $ds$, which is

$$ds = |d\mathbf{x}| = \sqrt{dx_1^2 + \cdots + dx_n^2}. \tag{5.2}$$

To give these differentials a precise meaning that can be used for calculations, the
best procedure is to parametrize the curve. Thus, we assume that C is given by
5.1. Arc Length and Line Integrals 213

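parametric equations $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, where $\mathbf{g}$ is of class $C^1$ and $\mathbf{g}'(t) \ne \mathbf{0}$. Then the neighboring points $\mathbf{x}$ and $\mathbf{x} + d\mathbf{x}$ are given by $\mathbf{g}(t)$ and $\mathbf{g}(t + dt)$, so

$$d\mathbf{x} = \mathbf{g}(t + dt) - \mathbf{g}(t) = \mathbf{g}'(t)\,dt = \left(\frac{dx_1}{dt}, \ldots, \frac{dx_n}{dt}\right) dt. \tag{5.3}$$

(The difference between the increment of $\mathbf{g}$ and its linear approximation disappears in the infinitesimal limit.) Moreover,

$$ds = |\mathbf{g}'(t)|\,dt = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\,dt, \tag{5.4}$$

which is just what one gets by formally multiplying and dividing the expression on the right of (5.2) by $dt$.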
What happens if we sum up all the infinitesimal increments $d\mathbf{x}$ or $ds$, that is, if we integrate the differentials $d\mathbf{x}$ or $ds = |d\mathbf{x}|$ over the curve? Integration of the vector increments $d\mathbf{x}$ just gives the total vector increment, that is, the vector difference between the initial and final points on the curve:

$$\int_C d\mathbf{x} = \int_a^b \mathbf{g}'(t)\,dt = \mathbf{g}(b) - \mathbf{g}(a). \tag{5.5}$$

This is nothing but the fundamental theorem of calculus applied to the components of $\mathbf{g}$; it is simple but not very exciting. On the other hand, $ds$ is the straight-line distance between two infinitesimally close points $\mathbf{x}$ and $\mathbf{x} + d\mathbf{x}$ on the curve, and since smooth curves are indistinguishable from their linear approximations on the infinitesimal level, $ds$ is the arc length of the bit of curve between $\mathbf{x}$ and $\mathbf{x} + d\mathbf{x}$. Adding these up gives the total arc length of the curve:

$$\text{Arc length} = \int_C ds = \int_a^b |\mathbf{g}'(t)|\,dt. \tag{5.6}$$

Our derivation of (5.6) in terms of infinitesimals was meant as motivation rather than as a rigorous proof of anything. Henceforth, we shall take (5.6) as a definition of arc length for a smooth curve. (There is another, perhaps better, definition that does not require the curve to be $C^1$; we shall discuss it at the end of this section.) There is, however, one crucial issue that must be addressed: The arc length of a curve $C$ is an intrinsic property of the geometric object $C$ and should not depend on the particular parametrization we use. To see that this is the case, suppose we choose a new parameter $u$ related to $t$ by $t = \varphi(u)$, where $\varphi$ is a one-to-one smooth mapping from the interval $[c, d]$ to the interval $[a, b]$. Then the curve $C$ described by $\mathbf{x} = \mathbf{g}(t)$ is also described by $\mathbf{x} = (\mathbf{g} \circ \varphi)(u)$, $c \le u \le d$, so we should have

$$\text{Arc length} = \int_c^d |(\mathbf{g} \circ \varphi)'(u)|\,du = \int_c^d |\mathbf{g}'(\varphi(u))|\,|\varphi'(u)|\,du,$$

where for the second equality we have used the chain rule. This does indeed agree with (5.6), by formula (4.34).

Figure 5.2: Two oriented curves.
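Parametrization independence can also be observed numerically. This sketch (hypothetical helper names; Simpson's rule is our quadrature choice; `math.hypot` with several arguments needs Python 3.8 or later) computes the length of the ellipse $\mathbf{g}(t) = (2\cos t, \sin t)$, $0 \le t \le 2\pi$, three ways: directly from (5.6), after the substitution $t = \varphi(u) = 2\pi u^2$ on $[0, 1]$ using the chain rule, and after the orientation-reversing substitution $t = 2\pi - u$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def arc_length(dg, a, b, n=2000):
    """Formula (5.6): the integral over [a, b] of |g'(t)|, with g' supplied as dg."""
    return simpson(lambda t: math.hypot(*dg(t)), a, b, n)


# Ellipse g(t) = (2 cos t, sin t), 0 <= t <= 2*pi, so g'(t) = (-2 sin t, cos t).
dg = lambda t: (-2 * math.sin(t), math.cos(t))
L1 = arc_length(dg, 0.0, 2 * math.pi)

# t = phi(u) = 2*pi*u^2 on [0, 1]; chain rule: (g o phi)'(u) = g'(phi(u)) * phi'(u).
phi = lambda u: 2 * math.pi * u * u
dphi = lambda u: 4 * math.pi * u
L2 = arc_length(lambda u: tuple(c * dphi(u) for c in dg(phi(u))), 0.0, 1.0)

# t = 2*pi - u reverses the orientation; only |g'| enters, so the length is unchanged.
L3 = arc_length(lambda u: tuple(-c for c in dg(2 * math.pi - u)), 0.0, 2 * math.pi)
print(L1, L2, L3)  # all about 9.688
```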
The same independence of parametrization holds for the related integral (5.5), with one subtle but important difference. The integral $\int_a^b \mathbf{g}'(t)\,dt$ gives the vector difference between the two endpoints of the curve, which is clearly independent of the parametrization except insofar as the parametrization determines which is the initial point and which is the final point. If we choose a new parameter $u$ as above so that $t$ is a decreasing function of $u$ (thus $a = \varphi(d)$ and $b = \varphi(c)$), then the initial and final points get switched, and so their difference is multiplied by $-1$.
The issue here is that a parametrization $\mathbf{x} = \mathbf{g}(t)$ determines an orientation for the curve $C$, that is, a determination of which direction along the curve is "forward" and which direction is "backward," the "forward" direction being the direction in which the point $\mathbf{g}(t)$ moves as $t$ increases. The orientation of a curve can be conveniently indicated in a picture by drawing one or more arrowheads along the curve that point in the "forward" direction, as indicated in Figure 5.2. The substance of the preceding paragraph is then that the integral (5.5) depends on the parametrization only insofar as the parametrization determines a choice of orientation. In contrast, the arc length of a curve is independent even of the orientation.
The notion of arc length extends in an obvious way to piecewise smooth curves, obtained by joining finitely many smooth curves together end-to-end but allowing corners or cusps at the joining points; we simply compute the lengths of the smooth pieces and add them up. We can express this more precisely in terms of parametrizations, as follows: The function $\mathbf{g} : [a, b] \to \mathbb{R}^n$ is called piecewise smooth if (i) it is continuous, and (ii) its derivative exists and is continuous except perhaps at finitely many points $t_j$, at which the one-sided limits $\lim_{t \to t_j \pm} \mathbf{g}'(t)$ exist. (Note: In Chapter 8 we shall use the term "piecewise smooth" in a slightly different sense.) In this case $|\mathbf{g}'(t)|$ is an integrable function on $[a, b]$ by Theorem 4.12 (the fact that it may be undefined at a few points is immaterial), and its integral gives the arc length. The same generalization also applies to the line integrals discussed below.

Remarks.

i. The parametrization $\mathbf{x} = \mathbf{g}(t)$ may be considered as representing the curve $C$ as the path traced out by a moving particle whose position at time $t$ is $\mathbf{g}(t)$. The derivative $\mathbf{g}'(t)$ is then the velocity of the particle, and its norm $|\mathbf{g}'(t)|$ is the speed of the particle. Integrating the velocity, $\int_a^b \mathbf{g}'(t)\,dt$, gives the net difference in the initial and final positions of the particle, whereas integrating the speed, $\int_a^b |\mathbf{g}'(t)|\,dt$, gives the total distance traveled by the particle, i.e., the arc length of the curve.
ii. In the preceding discussion, we have implicitly assumed that the parametrization $\mathbf{x} = \mathbf{g}(t)$ is one-to-one. This is not always the case if we think of $\mathbf{g}(t)$ as the position of a particle at time $t$, for the particle can traverse a path more than once. For example, $\mathbf{g}(t) = (\cos t, \sin t)$ represents a particle moving around the unit circle with constant speed. If we restrict $t$ to an interval of length $\le 2\pi$, we get a one-to-one parametrization of part or all of the circle, but from the physical point of view there is no reason to make such a restriction. However, the interpretations in the preceding paragraph hold whether $\mathbf{g}$ is one-to-one or not: $\int_a^b \mathbf{g}'(t)\,dt$ is still $\mathbf{g}(b) - \mathbf{g}(a)$, and $\int_a^b |\mathbf{g}'(t)|\,dt$ is still the total distance traveled by the particle from time $a$ to time $b$; it can be interpreted as arc length if the portions of the curve that are traversed more than once are counted with the appropriate multiplicity.
iii. While theoretically simple, calculation of arc length tends to be difficult in practice because the square root implicit in the definition of the norm $|\mathbf{g}'(t)|$ often leads to unpleasant integrands. This is just a fact of life.

Line Integrals of Scalar Functions. If $f$ is a continuous function whose domain includes a smooth (or piecewise smooth) curve $C$ in $\mathbb{R}^n$, we can integrate $f$ over the curve, taking the differential in the integral to be the element of arc length $ds$. Thus, if $C$ is parametrized by $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, we define

$$\int_C f\,ds = \int_a^b f(\mathbf{g}(t))\,|\mathbf{g}'(t)|\,dt. \tag{5.7}$$

This is independent of the parametrization and the orientation, by the same chain-rule calculation that we performed above for the case $f = 1$.
216 Chapter 5. Line and Surface Integrals; Vector Analysis

As an example of an application of such integrals, we can define the average value of $f$ over the curve $C$, just like the average value over a region:

$$\text{Average of } f \text{ over } C = \frac{\int_C f\,ds}{\text{Arc length of } C} = \frac{\int_C f\,ds}{\int_C ds}.$$

Example 1. What is the centroid of the upper half of the unit circle, $C = \{(x, y) : x^2 + y^2 = 1,\ y \ge 0\}$?
Solution. The centroid of $C$ is the point whose coordinates $(\bar{x}, \bar{y})$ are the averages of $x$ and $y$ over $C$. Clearly $\bar{x} = 0$ by symmetry. Just to get some practice, let's do the calculation of the arc length of $C$ (which of course is $\pi$) and $\int_C y\,ds$ with two different parametrizations: (i) taking $x$ as the parameter and $y = \sqrt{1 - x^2}$, and (ii) taking the polar angle $\theta$ as the parameter, $x = \cos\theta$, $y = \sin\theta$. (Note that these two parametrizations give opposite orientations on $C$; the first goes from left to right, the second from right to left.)
In the first parametrization, we have

$$dy = \frac{-x\,dx}{\sqrt{1 - x^2}}, \qquad ds = \sqrt{dx^2 + dy^2} = \frac{dx}{\sqrt{1 - x^2}},$$

so

$$\int_C y\,ds = \int_{-1}^{1} \sqrt{1 - x^2}\,\frac{dx}{\sqrt{1 - x^2}} = x\,\Big|_{-1}^{1} = 2, \qquad \int_C ds = \int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}} = \arcsin x\,\Big|_{-1}^{1} = \pi.$$

In the second one, we have

$$dx = -\sin\theta\,d\theta, \quad dy = \cos\theta\,d\theta; \qquad ds = \sqrt{dx^2 + dy^2} = d\theta,$$

$$\int_C y\,ds = \int_0^\pi \sin\theta\,d\theta = 2, \qquad \int_C ds = \int_0^\pi d\theta = \pi.$$

Either way, $\bar{y} = 2/\pi$.
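The same averages can be checked numerically. This is a sketch of our own, not the book's method: it applies the average formula with the polar parametrization (ii), for which $|\mathbf{g}'(\theta)| = 1$, using the midpoint rule and a hypothetical helper `curve_average`.

```python
import math

# Hypothetical helper (illustration only): average of f over x = g(t), a <= t <= b,
# i.e. (int_C f ds) / (int_C ds) with ds = |g'(t)| dt, by the midpoint rule.
def curve_average(f, g, g_prime, a, b, n=20_000):
    h = (b - a) / n
    weighted = length = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.hypot(*g_prime(t))
        weighted += f(*g(t)) * speed * h
        length += speed * h
    return weighted / length

# Upper unit semicircle, parametrized by the polar angle theta in [0, pi].
g = lambda th: (math.cos(th), math.sin(th))
g_prime = lambda th: (-math.sin(th), math.cos(th))

x_bar = curve_average(lambda x, y: x, g, g_prime, 0.0, math.pi)
y_bar = curve_average(lambda x, y: y, g, g_prime, 0.0, math.pi)
# x_bar is about 0 and y_bar is about 2/pi = 0.6366...
```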

Line Integrals of Vector Fields. We can define the integral of an $\mathbb{R}^m$-valued function over a curve in $\mathbb{R}^n$ simply by integrating each component separately; that is, if $\mathbf{F} = (F_1, \ldots, F_m)$, then $\int_C \mathbf{F}\,ds = \left(\int_C F_1\,ds, \ldots, \int_C F_m\,ds\right)$. There is not much to be said about such integrals that does not follow immediately from the facts about scalar-valued integrals. One significant fact, however, does require a little extra proof, namely the analogue of Theorem 4.9d. We state it for ordinary integrals over $[a, b]$; the generalization to integrals over curves is easy (Exercise 7a).
5.1. Arc Length and Line Integrals 217

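5.8 Proposition. If $\mathbf{F}$ is a continuous $\mathbb{R}^m$-valued function on $[a, b]$, then

$$\left|\int_a^b \mathbf{F}(t)\,dt\right| \le \int_a^b |\mathbf{F}(t)|\,dt.$$

Proof. For any unit vector $\mathbf{u}$, we have

$$\left(\int_a^b \mathbf{F}(t)\,dt\right)\cdot\mathbf{u} = \int_a^b \mathbf{F}(t)\cdot\mathbf{u}\,dt \le \int_a^b |\mathbf{F}(t)\cdot\mathbf{u}|\,dt \le \int_a^b |\mathbf{F}(t)|\,dt.$$

Here we have applied Theorem 4.9d to the scalar-valued function $\mathbf{F}(t)\cdot\mathbf{u}$ and then invoked Cauchy's inequality. The desired result is obtained by taking $\mathbf{u}$ to be the unit vector in the direction of $\int_a^b \mathbf{F}(t)\,dt$. □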
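A concrete instance of Proposition 5.8 (our own example, not from the text): for $\mathbf{F}(t) = (\cos t, \sin t)$ on $[0, \pi]$ we have $\int_0^\pi \mathbf{F}(t)\,dt = (0, 2)$, of norm $2$, while $\int_0^\pi |\mathbf{F}(t)|\,dt = \pi$. The sketch below checks this with the midpoint rule; the helper name is hypothetical.

```python
import math

# Hypothetical helper (illustration only): returns (|int_a^b F dt|, int_a^b |F| dt)
# for an R^m-valued function F, using the midpoint rule.
def norm_of_integral_and_integral_of_norm(F, a, b, n=10_000):
    h = (b - a) / n
    m = len(F(a))
    integral = [0.0] * m
    integral_of_norm = 0.0
    for k in range(n):
        v = F(a + (k + 0.5) * h)
        for i in range(m):
            integral[i] += v[i] * h
        integral_of_norm += math.sqrt(sum(c * c for c in v)) * h
    return math.sqrt(sum(c * c for c in integral)), integral_of_norm

lhs, rhs = norm_of_integral_and_integral_of_norm(
    lambda t: (math.cos(t), math.sin(t)), 0.0, math.pi)
# lhs is about 2 and rhs is about pi, consistent with lhs <= rhs.
```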

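Of greater interest is a scalar-valued line integral for vector fields, that is, for $\mathbb{R}^n$-valued functions on $\mathbb{R}^n$. If $C$ is a smooth (or piecewise smooth) curve in $\mathbb{R}^n$ and $\mathbf{F}$ is a continuous vector field defined on some neighborhood of $C$ in $\mathbb{R}^n$, the line integral of $\mathbf{F}$ over $C$ is

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \int_C F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n.$$

That is, if $C$ is described parametrically by $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, then

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \int_a^b \mathbf{F}(\mathbf{g}(t))\cdot\mathbf{g}'(t)\,dt. \tag{5.9}$$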

If we make a change of parameters, say $t = \varphi(u)$, the chain rule $\mathbf{g}'(t)\,dt = \mathbf{g}'(\varphi(u))\,\varphi'(u)\,du$ together with the change-of-variable formula for ordinary (single) integrals guarantees that the quantity on the right of (5.9) is unchanged, except that the new endpoints of integration may end up in the wrong order. Therefore:
The line integral $\int_C \mathbf{F}\cdot d\mathbf{x}$ is independent of the parametrization as long as the orientation is unchanged, but it acquires a factor of $-1$ if the orientation is reversed. That is, $\int_C \mathbf{F}\cdot d\mathbf{x}$ is a well-defined quantity once the vector field $\mathbf{F}$ and the oriented curve $C$ are specified.
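The line integral $\int_C \mathbf{F}\cdot d\mathbf{x}$ can be expressed as an integral of a scalar function over $C$. Indeed, let us choose a parametrization $\mathbf{x} = \mathbf{g}(t)$ and set

$$\mathbf{t}(\mathbf{g}(t)) = \frac{\mathbf{g}'(t)}{|\mathbf{g}'(t)|}, \qquad F_{\mathrm{tang}}(\mathbf{x}) = \mathbf{F}(\mathbf{x})\cdot\mathbf{t}(\mathbf{x}).$$

That is, $\mathbf{t}(\mathbf{x})$ is the unit tangent vector to the curve $C$ in the forward direction at the point $\mathbf{x}$, and $F_{\mathrm{tang}}(\mathbf{x})$ is the component of $\mathbf{F}(\mathbf{x})$ in the direction of $\mathbf{t}(\mathbf{x})$. Then

$$\mathbf{F}(\mathbf{g}(t))\cdot\mathbf{g}'(t)\,dt = \mathbf{F}(\mathbf{g}(t))\cdot\mathbf{t}(\mathbf{g}(t))\,|\mathbf{g}'(t)|\,dt = F_{\mathrm{tang}}(\mathbf{g}(t))\,ds,$$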

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \int_C F_{\mathrm{tang}}\,ds. \tag{5.10}$$

That is, $\int_C \mathbf{F}\cdot d\mathbf{x}$ is the integral of the tangential component of $\mathbf{F}$ with respect to arc length. The dependence on the orientation here comes through $F_{\mathrm{tang}}$, which changes sign if the orientation is reversed. (Any temptation to compute specific line integrals by using (5.10), however, should probably be resisted, because the element of arc length $ds$ is often hard to compute with. It is almost always better to use the basic definition (5.9) instead.)

Remarks.
i. If $\mathbf{F}$ is a force field, then $\int_C \mathbf{F}\cdot d\mathbf{x}$ represents a quantity of energy; it is the work done by the force on a particle that traverses the curve $C$.
ii. The integrand $\mathbf{F}\cdot d\mathbf{x} = F_1\,dx_1 + \cdots + F_n\,dx_n$ in a line integral, with the $dx$'s included, is often called a differential form, and we speak of integrating a differential form over a curve. We shall return to this notion in §5.9.

What does all this boil down to when $n = 1$? In this case, vector fields and scalar functions are the same thing, and both the scalar and vector versions of line integrals are just ordinary one-variable integrals. The former, however, is independent of orientation, whereas the latter depends on orientation. The distinction is the same as the one between formulas (4.32) and (4.33) in §4.4; it is a question of

$$\int_{[a,b]} f(x)\,dx \quad\text{versus}\quad \int_a^b f(x)\,dx.$$

In the integral on the left we must have $a < b$; but in the integral on the right $a$ and $b$ can occur in either order, and the sign of the integral depends on the order.

Example 2. Let $C$ be the ellipse formed by the intersection of the circular cylinder $x^2 + y^2 = 1$ and the plane $z = 2y + 1$, oriented counterclockwise as viewed from above, and let $\mathbf{F}(x, y, z) = (y, z, x)$. Calculate $\int_C \mathbf{F}\cdot d\mathbf{x} = \int_C (y\,dx + z\,dy + x\,dz)$.
Solution. We can parametrize $C$ by $x = \cos t$, $y = \sin t$, $z = 2\sin t + 1$, with $0 \le t \le 2\pi$. Then $d\mathbf{x} = (-\sin t, \cos t, 2\cos t)\,dt$, so

$$\mathbf{F}\cdot d\mathbf{x} = \left(-\sin^2 t + (2\sin t + 1)\cos t + 2\cos^2 t\right) dt = (\cos 2t + \sin 2t + \cos t + \cos^2 t)\,dt.$$

The integral of the first three terms over $[0, 2\pi]$ vanishes, and the integral of the last one is $\pi$. So $\int_C \mathbf{F}\cdot d\mathbf{x} = \pi$.
FIGURE 5.3: Approximation of a curve by a piecewise linear curve.

Note that it doesn't matter which point on $C$ we choose to start and end at. Instead of taking $t \in [0, 2\pi]$, we could take $t \in [a, a + 2\pi]$ for any $a \in \mathbb{R}$; the answer is the same since the integral of a trig function over a complete period is independent of the particular period chosen.
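Formula (5.9) also lends itself to a direct numerical check. The sketch below is our own illustration (Python, midpoint rule, hypothetical helper `line_integral`): it reproduces $\int_C \mathbf{F}\cdot d\mathbf{x} = \pi$ for Example 2, and shows the factor $-1$ when $C$ is traversed clockwise via $t \mapsto \mathbf{g}(-t)$, $-2\pi \le t \le 0$.

```python
import math

# Hypothetical helper (illustration only): int_a^b F(g(t)) . g'(t) dt, as in (5.9),
# approximated by the midpoint rule.
def line_integral(F, g, g_prime, a, b, n=10_000):
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fi * di for Fi, di in zip(F(*g(t)), g_prime(t))) * h
    return total

F = lambda x, y, z: (y, z, x)
g = lambda t: (math.cos(t), math.sin(t), 2 * math.sin(t) + 1)
g_prime = lambda t: (-math.sin(t), math.cos(t), 2 * math.cos(t))

ccw = line_integral(F, g, g_prime, 0.0, 2 * math.pi)        # about pi
# Reversed orientation: t -> g(-t) on [-2*pi, 0], whose derivative is -g'(-t).
cw = line_integral(F, lambda t: g(-t),
                   lambda t: tuple(-c for c in g_prime(-t)),
                   -2 * math.pi, 0.0)                          # about -pi
```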

Rectifiable Curves. There is an alternative definition of arc length that requires no a priori hypotheses about the smoothness of the curve. One cuts the curve $C$ up into a finite number of pieces by inserting subdivision points and approximates $C$ by the piecewise linear curve obtained by connecting the dots, as indicated in Figure 5.3. The length of the piecewise linear approximation is obtained by adding up the lengths of its constituent line segments, and the arc length of $C$ is defined to be the limit of this sum as the subdivision is made finer and finer.
To make this more precise, it is convenient to describe $C$ parametrically. Thus, we assume that $C$ is the range of a one-to-one continuous mapping $\mathbf{g} : [a, b] \to \mathbb{R}^n$. Given a partition $P = \{t_0, \ldots, t_J\}$ of $[a, b]$, the sum of the lengths of the line segments joining the points $\mathbf{g}(t_j)$ is

$$L_P(C) = \sum_{j=1}^{J} |\mathbf{g}(t_j) - \mathbf{g}(t_{j-1})|.$$

If the set of numbers

$$\mathcal{L} = \{L_P(C) : P \text{ is a partition of } [a, b]\}$$

is bounded, then $C$ is called rectifiable, and the arc length $L(C)$ is defined to be the supremum of $\mathcal{L}$:

$$L(C) = \sup\{L_P(C) : P \text{ is a partition of } [a, b]\}.$$

Note that if $P'$ is a refinement of $P$ then $L_{P'}(C) \ge L_P(C)$, by the triangle inequality; hence the supremum is indeed the appropriate sort of limit. This estimate also implies that the supremum is unchanged if we consider only partitions containing a given $c \in (a, b)$ among their subdivision points, and from this it follows that arc
length is additive: If $C_1$ and $C_2$ are the curves parametrized by $\mathbf{g}(t)$ for $t \in [a, c]$ and $t \in [c, b]$, then $L(C) = L(C_1) + L(C_2)$. See Exercise 8.
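The monotonicity of $L_P(C)$ under refinement is easy to observe numerically. In this sketch the curve $\mathbf{g}(t) = (t, t^2)$, $0 \le t \le 1$, and the helper name `polygonal_length` are our own choices (it also assumes Python 3.8+ for `math.dist`). Each partition halves the previous one, so each is a refinement of the last, and the lengths increase toward the arc length $\int_0^1 \sqrt{1 + 4t^2}\,dt = \tfrac{\sqrt{5}}{2} + \tfrac{1}{4}\sinh^{-1} 2 \approx 1.4789$.

```python
import math

# Hypothetical helper (illustration only): L_P(C) = sum of |g(t_j) - g(t_{j-1})|
# over a partition t_0 < t_1 < ... < t_J.
def polygonal_length(g, partition):
    points = [g(t) for t in partition]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

g = lambda t: (t, t * t)   # a piece of the parabola y = x^2

# Uniform partitions with 1, 2, 4, ..., 1024 subintervals; each refines the previous.
lengths = [polygonal_length(g, [k / 2**m for k in range(2**m + 1)]) for m in range(11)]
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
```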
We now show that this definition coincides with our previous one for $C^1$ curves.

5.11 Theorem. With notation as above, if $\mathbf{g}$ is of class $C^1$, then $C$ is rectifiable, and

$$L(C) = \int_a^b |\mathbf{g}'(t)|\,dt.$$

Proof. For any partition $P$ of $[a, b]$, by (5.5) and Proposition 5.8 we have

$$L_P(C) = \sum_{j=1}^{J} \left|\int_{t_{j-1}}^{t_j} \mathbf{g}'(t)\,dt\right| \le \sum_{j=1}^{J} \int_{t_{j-1}}^{t_j} |\mathbf{g}'(t)|\,dt = \int_a^b |\mathbf{g}'(t)|\,dt.$$

It follows that $L(C) \le \int_a^b |\mathbf{g}'(t)|\,dt$, and in particular that $C$ is rectifiable.
Next, for $r, s \in [a, b]$, let $C_r^s$ be the curve parametrized by $\mathbf{g}(t)$ with $t \in [r, s]$, and let $\varphi(s) = L(C_a^s)$. (That is, we consider the length of the curve $C$, starting at $t = a$, as a function of the right endpoint of the parameter interval.) Suppose $h > 0$. Since arc length is additive, we have $L(C_s^{s+h}) = \varphi(s + h) - \varphi(s)$, so by the inequality we have just proved (applied to the curve $C_s^{s+h}$) and the mean value theorem for integrals,

$$\varphi(s + h) - \varphi(s) \le \int_s^{s+h} |\mathbf{g}'(t)|\,dt = h\,|\mathbf{g}'(\sigma)|,$$

where $\sigma$ is some number between $s$ and $s + h$. On the other hand, $|\mathbf{g}(s + h) - \mathbf{g}(s)|$ is $L_P(C_s^{s+h})$ where $P$ is the trivial partition $\{s, s + h\}$, and hence it is no bigger than $L(C_s^{s+h})$. Combining these estimates and dividing by $h$, we see that

$$\left|\frac{\mathbf{g}(s + h) - \mathbf{g}(s)}{h}\right| \le \frac{\varphi(s + h) - \varphi(s)}{h} \le |\mathbf{g}'(\sigma)|.$$

As $h \to 0$, the quantities on the left and right approach $|\mathbf{g}'(s)|$, and hence so does the one in the middle. A slight modification of this argument works also for $h < 0$, so we conclude that $\varphi$ is differentiable and that $\varphi'(s) = |\mathbf{g}'(s)|$. The desired result is now immediate:

$$L(C) = \varphi(b) = \int_a^b \varphi'(s)\,ds = \int_a^b |\mathbf{g}'(s)|\,ds.$$

□
EXERCISES

1. Find the arc length of the following parametrized curves:
a. $\mathbf{g}(t) = (a\cos t, a\sin t, bt)$, $t \in [0, 2\pi]$.
b. $\mathbf{g}(t) = (\tfrac{1}{3}t^3 - t, t^2)$, $t \in [0, 2]$.
c. $\mathbf{g}(t) = (\log t, 2t, t^2)$, $t \in [1, e]$.
d. $\mathbf{g}(t) = (6t, 4t^{3/2}, -4t^{3/2}, 3t^2)$, $t \in [0, 2]$.
2. Express the arc length of the following curves in terms of the integral

$$E(k) = \int_0^{\pi/2} \sqrt{1 - k^2\sin^2 t}\,dt \qquad (0 < k < 1),$$

for suitable values of $k$. ($E(k)$ is one of the standard elliptic integrals, so called because of their connection with the arc length of an ellipse.)
a. An ellipse with semimajor axis $a$ and semiminor axis $b$.
b. The portion of the intersection of the sphere $x^2 + y^2 + z^2 = 4$ and the cylinder $x^2 + y^2 - 2y = 0$ lying in the first octant.
3. Find the centroid of the curve $y = \cosh x$, $-1 \le x \le 1$.
4. Compute $\int_C \sqrt{z}\,ds$ where $C$ is parametrized by $\mathbf{g}(t) = (2\cos t, 2\sin t, t^2)$, $0 \le t \le 2\pi$.

5. Compute $\int_C \mathbf{F}\cdot d\mathbf{x}$ for the following $\mathbf{F}$ and $C$:
a. $\mathbf{F}(x, y, z) = (yz, x^2, xz)$; $C$ is the line segment from $(0, 0, 0)$ to $(1, 1, 1)$.
b. $\mathbf{F}$ is as in (a); $C$ is the portion of the curve $y = x^2$, $z = x^3$ from $(0, 0, 0)$ to $(1, 1, 1)$.
c. $\mathbf{F}(x, y) = (x - y, x + y)$; $C$ is the circle $x^2 + y^2 = 1$, oriented clockwise.
d. $\mathbf{F}(x, y) = (x^2 y, x^3 y^2)$; $C$ is the closed curve formed by portions of the line $y = 4$ and the parabola $y = x^2$, oriented counterclockwise.
6. Compute the following line integrals:
a. $\int_C (x e^{-y}\,dx + \sin \pi x\,dy)$, where $C$ is the portion of the parabola $y = x^2$ from $(0, 0)$ to $(1, 1)$.
b. $\int_C (y\,dx + z\,dy + xy\,dz)$, where $C$ is given by $x = \cos t$, $y = \sin t$, $z = t$ with $0 \le t \le 2\pi$.
c. $\int_C (y^2\,dx - 2x\,dy)$, where $C$ is the triangle with vertices $(0, 0)$, $(1, 0)$, and $(1, 1)$, oriented counterclockwise.
7. Let $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous map, and let $C$ be a $C^1$ curve in $\mathbb{R}^n$.
a. Deduce from Proposition 5.8 that $\left|\int_C \mathbf{F}\,ds\right| \le \int_C |\mathbf{F}|\,ds$.
b. In the case $m = n$, show that $\left|\int_C \mathbf{F}\cdot d\mathbf{x}\right| \le \int_C |\mathbf{F}|\,ds$.
8. Prove in detail that arc length, as defined for rectifiable curves, is additive; that is, if $C$, $C_1$, and $C_2$ are the curves parametrized by $\mathbf{g}(t)$ for $t \in [a, b]$, $t \in [a, c]$, and $t \in [c, b]$, then $L(C) = L(C_1) + L(C_2)$.
9. Let $\mathbf{g}(t) = (g(t), h(t))$ be a $C^1$ parametrization of a plane curve. Given a partition $P = \{t_0, \ldots, t_J\}$ of $[a, b]$, the distance between two neighboring points $\mathbf{g}(t_{j-1})$ and $\mathbf{g}(t_j)$ is

$$\sqrt{[g(t_j) - g(t_{j-1})]^2 + [h(t_j) - h(t_{j-1})]^2}.$$

Use the mean value theorem to express the differences inside the square root in terms of $g'$ and $h'$, and then use Exercise 9 in §4.1 to give an alternate proof of Theorem 5.11. (Exactly the same idea works for curves in $\mathbb{R}^n$.)

5.2 Green's Theorem

Green's theorem is the simplest of a group of theorems (actually, they're all special cases of one big theorem, as we shall indicate in §5.9) that say that "the integral of something over the boundary of a region equals the integral of something else over the region itself." To state it, we need some terminology.
A simple closed curve in $\mathbb{R}^n$ is a curve whose starting and ending points coincide, but that does not intersect itself otherwise. More precisely, a simple closed curve is one that can be parametrized by a continuous map $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, such that $\mathbf{g}(a) = \mathbf{g}(b)$ but $\mathbf{g}(s) \ne \mathbf{g}(t)$ for $s \ne t$ unless $\{s, t\} = \{a, b\}$.
We shall use the term regular region to mean a compact set in $\mathbb{R}^n$ that is the closure of its interior. Equivalently, a compact set $S \subset \mathbb{R}^n$ is a regular region if every neighborhood of every point on the boundary $\partial S$ contains points in $S^{\mathrm{int}}$. For example, a closed ball is a regular region, but a closed line segment in $\mathbb{R}^n$ ($n > 1$) is not, because its interior is empty.
Now let $n = 2$. We say that a regular region $S \subset \mathbb{R}^2$ has a piecewise smooth boundary if the boundary $\partial S$ consists of a finite union of disjoint, piecewise smooth simple closed curves, where "piecewise smooth" has the meaning assigned in the previous section. (We thus allow the possibility that $S$ contains "holes," so that its boundary may be disconnected.) In this case, the positive orientation on $\partial S$ is the orientation on each of the closed curves that make up the boundary such that the region $S$ is on the left with respect to the positive direction on the curve. More precisely, if $\mathbf{x}$ is a point on $\partial S$ at which $\partial S$ is smooth, and $\mathbf{t} = (t_1, t_2)$ is the unit tangent vector in the positive direction at that point, then the vector $\mathbf{n} = (t_2, -t_1)$, obtained by rotating $\mathbf{t}$ by 90° clockwise, points out of $S$. (That is, $\mathbf{x} + \epsilon\mathbf{n} \notin S$ for small $\epsilon > 0$.) See Figure 5.4.
If $\mathbf{F} = (F_1, F_2)$ is a continuous vector field on $\mathbb{R}^2$, we denote by

$$\int_{\partial S} \mathbf{F}\cdot d\mathbf{x} \quad\text{or}\quad \int_{\partial S} F_1\,dx_1 + F_2\,dx_2$$
5.2. Green's Theorem 223

FIGURE 5.4: A region with piecewise smooth, positively oriented boundary.

the sum of the line integrals of $\mathbf{F}$ over the positively oriented closed curves that make up $\partial S$.

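5.12 Theorem (Green's Theorem). Suppose $S$ is a regular region in $\mathbb{R}^2$ with piecewise smooth boundary $\partial S$. Suppose also that $\mathbf{F}$ is a vector field of class $C^1$ on $S$. Then

$$\int_{\partial S} \mathbf{F}\cdot d\mathbf{x} = \iint_S \left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dA. \tag{5.13}$$

In the more common notation, if we set $\mathbf{F} = (P, Q)$ and $\mathbf{x} = (x, y)$,

$$\int_{\partial S} P\,dx + Q\,dy = \iint_S \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA. \tag{5.14}$$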

Proof. First we consider a very restricted class of regions, for which the proof is quite simple. We shall say that the region $S$ is $x$-simple if it is the region between the graphs of two functions of $x$, that is, if it has the form

$$S = \{(x, y) : a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x)\}, \tag{5.15}$$

where $\varphi_1$ and $\varphi_2$ are continuous, piecewise smooth functions on $[a, b]$. Likewise, we say that $S$ is $y$-simple if it has the form

$$S = \{(x, y) : c \le y \le d,\ \psi_1(y) \le x \le \psi_2(y)\}, \tag{5.16}$$

where $\psi_1$ and $\psi_2$ are continuous, piecewise smooth functions on $[c, d]$.

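Example 1. The region bounded by the curve $y = \tfrac{1}{8}x^3 - 1$, the line $x + 2y = 2$, and the $y$-axis is both $x$-simple and $y$-simple. (See Figure 5.5.) It has the forms (5.15) and (5.16) with

$$a = 0, \quad b = 2, \quad \varphi_1(x) = \tfrac{1}{8}x^3 - 1, \quad \varphi_2(x) = 1 - \tfrac{1}{2}x,$$

$$c = -1, \quad d = 1, \quad \psi_1(y) = 0, \quad \psi_2(y) = \begin{cases} 2(y + 1)^{1/3} & \text{if } -1 \le y \le 0, \\ 2 - 2y & \text{if } 0 \le y \le 1. \end{cases}$$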

Figure 5.5: The region in Example 1.

Now let us suppose that $S$ is both $x$-simple and $y$-simple. If we write $S$ in the form (5.15), then $\partial S$ consists of (i) the curve $y = \varphi_1(x)$, oriented from left to right, (ii) the curve $y = \varphi_2(x)$, oriented from right to left, and (iii) portions of the vertical lines $x = a$ and $x = b$, which may reduce to single points. The line integral $\int_{\partial S} P\,dx$ is the sum of the integrals over these pieces. On the vertical lines, $x$ is constant and so $dx = 0$ (that is, $dx/dt = 0$ in any parametrization), so these pieces contribute nothing. On the curves $y = \varphi_1(x)$ and $y = \varphi_2(x)$ we can take $x$ as the parameter, except that the orientation is wrong for $y = \varphi_2(x)$; hence

$$\int_{\partial S} P\,dx = \int_a^b P(x, \varphi_1(x))\,dx - \int_a^b P(x, \varphi_2(x))\,dx.$$

On the other hand, by the fundamental theorem of calculus,

$$\iint_S \frac{\partial P}{\partial y}\,dA = \int_a^b \int_{\varphi_1(x)}^{\varphi_2(x)} \frac{\partial P}{\partial y}\,dy\,dx = \int_a^b \left[P(x, \varphi_2(x)) - P(x, \varphi_1(x))\right] dx.$$

Comparing these equalities, we obtain

$$\int_{\partial S} P\,dx = -\iint_S \frac{\partial P}{\partial y}\,dA.$$

In exactly the same way, using the representation (5.16) for $S$, we see that

$$\int_{\partial S} Q\,dy = \iint_S \frac{\partial Q}{\partial x}\,dA.$$

(There is no minus sign here, because if we take $y$ as the parameter for the curves $x = \psi_1(y)$ and $x = \psi_2(y)$, the orientation is wrong for $\psi_1$ and right for $\psi_2$.) Adding these last two equalities, we obtain the desired result (5.14).
Figure 5.6: A decomposition of the region in Figure 5.4 into simple subregions.

Thus Green's theorem is established for regions that are both $x$-simple and $y$-simple. There is now an immediate generalization to a much larger class of regular regions. Namely, suppose the region $S$ can be cut up into finitely many subregions, say $S = S_1 \cup \cdots \cup S_k$, where
a. the $S_j$'s may intersect along common edges but have disjoint interiors;
b. each $S_j$ has a piecewise smooth boundary and is both $x$-simple and $y$-simple.
(See Figure 5.6.) Since the $S_j$'s overlap only in a set of zero content, by Corollary 4.23b we have

$$\iint_S \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = \sum_{j=1}^{k} \iint_{S_j} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA.$$

On the other hand, we also have

$$\int_{\partial S} (P\,dx + Q\,dy) = \sum_{j=1}^{k} \int_{\partial S_j} (P\,dx + Q\,dy),$$

because the integrals over the parts of the boundaries of the $S_j$'s that are not parts of the boundary of $S$ all cancel out. In more detail, if $S_i$ and $S_j$ have a common edge $C$, then $C$ will have one orientation as part of $\partial S_i$ and the opposite orientation as part of $\partial S_j$, so the two integrals over $C$ that make up parts of $\int_{\partial S_i}$ and $\int_{\partial S_j}$ will cancel each other. Therefore, we obtain Green's theorem for the region $S$ by applying Green's theorem to the simple regions $S_j$ and adding up the results.
The result we have just obtained is sufficient for most practical purposes, but it is not definitive. The class of regular regions that can be cut up into simple subregions does not include all regions with $C^1$ boundary, much less all regions with piecewise smooth boundary, and it may be difficult to tell whether a given region has this property. For example, the region

$$\{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1 + x^3\sin x^{-1}\}$$

is $x$-simple but cannot be cut up into finitely many $y$-simple subregions because the graph of $x^3\sin x^{-1}$ has infinitely many "wiggles." The deduction of the general case from the special cases considered here requires some additional machinery that is of interest in its own right; we present it in Appendix B.7 (Theorem B.28). □

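Example 2. Let $C$ be the unit circle $x^2 + y^2 = 1$, oriented counterclockwise. The line integral

$$\int_C \left[\sqrt{1 + x^2} - y e^{xy} + 3y\right] dx + \left[x^2 - x e^{xy} + \log(1 + y^4)\right] dy$$

is difficult to evaluate directly, but it yields easily to Green's theorem. Indeed, $C$ is the oriented boundary of the unit disc $D$, so the integral equals

$$\iint_D \left(\frac{\partial}{\partial x}\left[x^2 - x e^{xy} + \log(1 + y^4)\right] - \frac{\partial}{\partial y}\left[\sqrt{1 + x^2} - y e^{xy} + 3y\right]\right) dA = \iint_D (2x - 3)\,dA = -3\pi.$$

(The integral of $2x$ over $D$ vanishes by symmetry, and $\iint_D 3\,dA$ is $3$ times the area $\pi$ of $D$.)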
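A direct numerical evaluation of this line integral (tedious by hand, easy by machine) gives approximately $-3\pi \approx -9.4248$, the value of $\iint_D (2x - 3)\,dA$ over the disc of area $\pi$. The sketch below is our own check, assuming the parametrization $(\cos t, \sin t)$ and the midpoint rule; the helper name `circle_line_integral` is hypothetical.

```python
import math

def P(x, y):
    return math.sqrt(1 + x * x) - y * math.exp(x * y) + 3 * y

def Q(x, y):
    return x * x - x * math.exp(x * y) + math.log(1 + y**4)

# Hypothetical helper (illustration only): int_C P dx + Q dy over the unit circle,
# oriented counterclockwise, by the midpoint rule in t.
def circle_line_integral(P, Q, n=20_000):
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)   # derivatives of (cos t, sin t)
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

value = circle_line_integral(P, Q)   # close to -3*pi
```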

Example 3. It is an amusing and sometimes useful fact that the area of a regular region $S$ in the plane can be expressed as a line integral over the boundary $\partial S$. This can be done in many different ways; for instance,

$$\text{Area of } S = \int_{\partial S} x\,dy = -\int_{\partial S} y\,dx = \frac{1}{2}\int_{\partial S} (x\,dy - y\,dx).$$

Indeed, Green's theorem shows that all of these integrals are equal to $\iint_S 1\,dA$.
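For a polygon the last formula can be evaluated exactly: on the edge from $(x_i, y_i)$ to $(x_{i+1}, y_{i+1})$, parametrized linearly, one finds $\int (x\,dy - y\,dx) = x_i y_{i+1} - x_{i+1} y_i$, which yields the familiar "shoelace" formula. A short sketch of our own (the function name is hypothetical):

```python
# Hypothetical helper (illustration only): signed polygon area
# (1/2) * sum of (x_i * y_{i+1} - x_{i+1} * y_i), i.e. (1/2) * int (x dy - y dx)
# around the polygon; positive when the vertices are listed counterclockwise.
def polygon_area(vertices):
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return twice_area / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_area(unit_square))         # 1.0
print(polygon_area(unit_square[::-1]))   # -1.0 (clockwise: orientation reversed)
```

The sign change for the reversed vertex list is the orientation dependence of line integrals seen in §5.1.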

The line integral $\int_{\partial S} \mathbf{F}\cdot d\mathbf{x}$ is the integral of the tangential component of $\mathbf{F}$ over $\partial S$. However, Green's theorem can also be interpreted as a statement about the integral of the normal component of a vector field.
To see this, recall that counterclockwise and clockwise rotations by 90° in the plane are given by the transformations $R_+(x, y) = (-y, x)$ and $R_-(x, y) = (y, -x)$, respectively. Thus, if $\mathbf{t} = (t_1, t_2)$ is the unit tangent vector to $\partial S$ at a point on $\partial S$, pointing in the forward direction, then $\mathbf{n} = R_-(\mathbf{t}) = (t_2, -t_1)$ is the unit normal vector to $\partial S$ pointing out of $S$. Given a vector field $\mathbf{F} = (F_1, F_2)$, let $\tilde{\mathbf{F}} = R_+(\mathbf{F}) = (-F_2, F_1)$ be the vector field obtained by rotating the values of $\mathbf{F}$ by 90° counterclockwise. Then the normal component of $\mathbf{F}$ is the tangential component of $\tilde{\mathbf{F}}$:

$$\mathbf{F}\cdot\mathbf{n} = F_1 t_2 - F_2 t_1 = \tilde{\mathbf{F}}\cdot\mathbf{t}.$$
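Hence, by applying Green's theorem to the rotated field $\tilde{\mathbf{F}}$, we obtain the following result:

5.17 Corollary. Suppose $S$ is a regular region in $\mathbb{R}^2$ with piecewise smooth boundary $\partial S$, and let $\mathbf{n}(\mathbf{x})$ be the unit outward normal vector to $\partial S$ at $\mathbf{x} \in \partial S$. Suppose also that $\mathbf{F}$ is a vector field of class $C^1$ on $S$. Then

$$\int_{\partial S} \mathbf{F}\cdot\mathbf{n}\,ds = \iint_S \left(\frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2}\right) dA. \tag{5.18}$$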
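As a sanity check of (5.18) on an example of our own choosing, let $S$ be the closed unit disc and $\mathbf{F}(x, y) = (x^3, y^3)$; the right side is $\iint_S 3(x^2 + y^2)\,dA = 3\pi/2$. The sketch below computes the left side numerically, building the outward normal as $\mathbf{n} = (t_2, -t_1)$ from the unit tangent of a counterclockwise parametrization. The midpoint rule and the helper name `outward_flux` are assumptions for illustration.

```python
import math

# Hypothetical helper (illustration only): int_{dS} F . n ds for a positively
# oriented boundary x = g(t), with n = (t2, -t1) and ds = |g'(t)| dt.
def outward_flux(F, g, g_prime, a, b, n=20_000):
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        dx, dy = g_prime(t)
        speed = math.hypot(dx, dy)
        t1, t2 = dx / speed, dy / speed   # unit tangent
        n1, n2 = t2, -t1                  # rotate 90 degrees clockwise: outward normal
        F1, F2 = F(*g(t))
        total += (F1 * n1 + F2 * n2) * speed * h
    return total

flux = outward_flux(lambda x, y: (x**3, y**3),
                    lambda t: (math.cos(t), math.sin(t)),
                    lambda t: (-math.sin(t), math.cos(t)),
                    0.0, 2 * math.pi)
# flux is close to 3*pi/2, the double integral of div F over the disc.
```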

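Let us see what Green's theorem says when $\mathbf{F}$ is the gradient of a $C^2$ function $f$, so that $F_1 = \partial_1 f$ and $F_2 = \partial_2 f$. Formula (5.13) gives

$$\int_{\partial S} \nabla f\cdot d\mathbf{x} = \iint_S (\partial_1\partial_2 f - \partial_2\partial_1 f)\,dA = \iint_S 0\,dA = 0.$$

This is no surprise; it is easy to see directly that the line integral of a gradient over any closed curve vanishes. Indeed, if the curve $C$ is parametrized by $\mathbf{x} = \mathbf{g}(t)$ with $\mathbf{g}(a) = \mathbf{g}(b)$, then by the chain rule,

$$\int_C \nabla f\cdot d\mathbf{x} = \int_a^b \nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)\,dt = \int_a^b \frac{d}{dt} f(\mathbf{g}(t))\,dt = f(\mathbf{g}(b)) - f(\mathbf{g}(a)) = 0.$$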
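The chain-rule computation can be illustrated numerically. In this sketch the function $f(x, y) = x^2 y + \sin(xy)$, the ellipse $\mathbf{g}(t) = (2\cos t, \sin t)$, the midpoint rule, and the helper name are all our own assumptions. Over the whole ellipse the integral of $\nabla f$ vanishes, and over the arc $0 \le t \le \pi/3$ it equals $f(\mathbf{g}(\pi/3)) - f(\mathbf{g}(0))$.

```python
import math

def f(x, y):
    return x * x * y + math.sin(x * y)

def grad_f(x, y):
    return (2 * x * y + y * math.cos(x * y), x * x + x * math.cos(x * y))

g = lambda t: (2 * math.cos(t), math.sin(t))
g_prime = lambda t: (-2 * math.sin(t), math.cos(t))

# Hypothetical helper (illustration only): int_a^b grad f(g(t)) . g'(t) dt.
def gradient_line_integral(a, b, n=100_000):
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        fx, fy = grad_f(*g(t))
        dx, dy = g_prime(t)
        total += (fx * dx + fy * dy) * h
    return total

closed = gradient_line_integral(0.0, 2 * math.pi)   # about 0
arc = gradient_line_integral(0.0, math.pi / 3)      # about f(g(pi/3)) - f(g(0))
```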

The formula (5.18) gives a more interesting result. $\nabla f\cdot\mathbf{n}$ is the directional derivative of $f$ in the outward normal direction to $\partial S$, or normal derivative of $f$ on $\partial S$, often denoted by $\partial f/\partial n$; and (5.18) says that

$$\int_{\partial S} \frac{\partial f}{\partial n}\,ds = \iint_S \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right) dA.$$

The integrand on the right is the Laplacian of $f$, which we encountered in §2.6 and which will play an important role in §5.6.

EXERCISES

1. Evaluate the following line integrals by using Green's theorem.
a. The integral in Exercise 5c in §5.1.
b. The integral in Exercise 6c in §5.1.
c. $\int_C [(x^2 + 10xy + y^2)\,dx + (5x^2 + 5xy)\,dy]$, where $C$ is the square with vertices $(0, 0)$, $(2, 0)$, $(0, 2)$, and $(2, 2)$, oriented counterclockwise.
d. $\int_{\partial S} (3x^2\sin y^2\,dx + 2x^3 y\cos y^2\,dy)$, where $S$ is any regular region with piecewise smooth boundary.

2. Let $S$ be the annulus $1 \le x^2 + y^2 \le 4$. Compute $\int_{\partial S} (xy^2\,dy - x^2 y\,dx)$, both directly and by using Green's theorem.
3. Find the positively oriented simple closed curve $C$ that maximizes the line integral $\int_C [y^3\,dx + (3x - x^3)\,dy]$.
4. Use Green's theorem as in Example 3 to calculate the area under one arch of the cycloid described parametrically by $x = R(t - \sin t)$, $y = R(1 - \cos t)$.
5. Let $S = \{(x, y) : a \le x \le b,\ 0 \le y \le f(x)\}$, where $f$ is a nonnegative $C^1$ function on $[a, b]$. Explain how the formula $A = -\int_{\partial S} y\,dx$ for the area of $S$ in Example 3 leads to the familiar formula $A = \int_a^b f(x)\,dx$.
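6. Let $S$ be a regular region in $\mathbb{R}^2$ with piecewise smooth boundary, and let $f$ and $g$ be functions of class $C^2$ on $S$. Show that

$$\int_{\partial S} f\,\frac{\partial g}{\partial n}\,ds = \iint_S \left[f(\partial_x^2 g + \partial_y^2 g) + \nabla f\cdot\nabla g\right] dA.$$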

7. The point of this exercise is to show how Green's theorem can be used to deduce a special case of Theorem 4.41. Let $U$, $V$ be connected open sets in $\mathbb{R}^2$, and let $\mathbf{G} : U \to V$ be a one-to-one transformation of class $C^2$ whose derivative $D\mathbf{G}(\mathbf{u})$ is invertible for all $\mathbf{u} \in U$. Moreover, let $S$ be a regular region in $V$ with piecewise smooth boundary, let $A$ be its area, and let $T = \mathbf{G}^{-1}(S)$.
a. The Jacobian $\det D\mathbf{G}$ is either everywhere positive or everywhere negative on $U$; why?
b. Suppose $\det D\mathbf{G}(\mathbf{u}) > 0$ for all $\mathbf{u} \in U$. Write $A = \int_{\partial S} x\,dy$ as in Example 3, make a change of variable to transform this line integral into a line integral over $\partial T$, and apply Green's theorem to deduce that $A = \iint_T \det D\mathbf{G}\,dA$.
c. By a similar argument, show that if $\det D\mathbf{G}(\mathbf{u}) < 0$ for all $\mathbf{u} \in U$, then $A = -\iint_T \det D\mathbf{G}\,dA = \iint_T |\det D\mathbf{G}|\,dA$. Where does the minus sign come from?

5.3 Surface Area and Surface Integrals

In this section we discuss integrals of functions and vector fields over smooth surfaces in $\mathbb{R}^3$. Like line integrals, surface integrals come in two varieties, unoriented and oriented. On a curve the orientation is a matter of deciding which direction along a curve is "positive"; on a surface it is a matter of deciding which side of the surface is the "positive" side. The convenient way of specifying the orientation of
5.3. Surface Area and Surface Integrals 229

Figure 5.7: A Mobius band.

a smooth surface in R3 is to make a choice of one of the two unit normal vectors
at each point of the surface, in such a way that the choice varies continuously with
the point. The "positive" side of the surface is the one into which the normal arrow
points.
It is important to note that not every surface can be oriented. The standard
example of a nonorientable surface is the Mobius band, which can be constructed
by taking a long strip of paper, giving it a half twist, and gluing the ends together.
(That is, call the two sides of the original strip A and B; the ends are to be glued
together so that side A of one end matches with side B of the other.) A sketch of a
Mobius band is given in Figure 5.7, but the best way to appreciate the features of
the Mobius band is to make one for yourself.
However, if a surface forms part of the boundary of a regular region in R3, it
is always orientable, and the standard specification for the orientation is that the
positive normal vector is the one pointing out of the region.

Surface Area. We begin by deriving a formula for the area of a region on a smooth surface $S$. We shall assume that $S$ is represented parametrically as the image of a connected open set $W$ in the $uv$-plane under a one-to-one $C^1$ map $\mathbf{G} : W \to \mathbb{R}^3$:

$$\mathbf{x} = (x, y, z) = \mathbf{G}(u, v), \qquad (u, v) \in W.$$

For a given surface $S$, it may not be the case that all of $S$ can be represented by a single parametrization. We shall assume, however, that $S$ can be cut up into finitely many pieces which each admit a parametrization; it is then enough to consider the pieces separately. Also, it is usually sufficient to have a good parametrization for a subset of $S$ whose complement is of lower dimension, such as the one provided by spherical coordinates on the unit sphere with the "international date line" removed.
To see how to compute surface area on $S$, consider a small rectangle in the $uv$-plane with vertices $(u, v)$, $(u + \Delta u, v)$, $(u, v + \Delta v)$, and $(u + \Delta u, v + \Delta v)$.* Its image under the map $\mathbf{G}$ is a small quadrilateral (with curved sides) on the surface
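$S$ whose vertices are $\mathbf{G}(u, v)$, $\mathbf{G}(u + \Delta u, v)$, etc. (See Figure 3.4 in §3.3.) In the limit in which the increments $\Delta u$ and $\Delta v$ become infinitesimals $du$ and $dv$, this quadrilateral becomes a parallelogram whose sides from the vertex $\mathbf{x} = \mathbf{G}(u, v)$ to the two adjacent vertices are described by the vectors

$$\mathbf{G}(u + du, v) - \mathbf{G}(u, v) = \frac{\partial\mathbf{G}}{\partial u}\,du \quad\text{and}\quad \mathbf{G}(u, v + dv) - \mathbf{G}(u, v) = \frac{\partial\mathbf{G}}{\partial v}\,dv.$$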

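These two vectors are tangent to the surface $S$ at $\mathbf{x}$, so their cross product is a vector normal to $S$ at $\mathbf{x}$, whose magnitude is the area of the parallelogram they span. Therefore, the element of area on $S$ is given in terms of the parametrization $\mathbf{x} = \mathbf{G}(u, v)$ by

$$dA = \left|\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v}\right| du\,dv. \tag{5.19}$$

In other words, if $R$ is a measurable subset of $W$ in the $uv$-plane and $\mathbf{G}(R)$ is the corresponding region in the surface $S$,

$$\text{Area of } \mathbf{G}(R) = \iint_R \left|\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v}\right| du\,dv. \tag{5.20}$$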

Henceforth we shall take (5.20) as the definition of area for a parametrized surface. One might wonder if surface area can also be defined by considering polyhedral approximations to the surface, as polygonal approximations to a curve were used to define arc length in the appendix of §5.1. The answer is affirmative, but this matter is a good deal trickier than the theory of arc length, and we shall not pursue it further.
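Let us be a little more explicit about the formula (5.19). With the notation $\mathbf{G}(u, v) = (x, y, z)$, we have

$$\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_u x & \partial_u y & \partial_u z \\ \partial_v x & \partial_v y & \partial_v z \end{pmatrix} = \left(\frac{\partial(y, z)}{\partial(u, v)}, \frac{\partial(z, x)}{\partial(u, v)}, \frac{\partial(x, y)}{\partial(u, v)}\right).$$

Thus,

$$dA = \sqrt{\left(\frac{\partial(y, z)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(z, x)}{\partial(u, v)}\right)^2 + \left(\frac{\partial(x, y)}{\partial(u, v)}\right)^2}\,du\,dv. \tag{5.21}$$

Computationally, this is usually a horrible mess. (But what did you expect? Arc length is already problematic; surface area must be worse!)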
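As with arc length, we must verify that our informally derived formula for surface area really makes sense by checking that it is independent of the parametrization. Thus, suppose we make a change of variables $(u, v) = \Phi(s, t)$, where $\Phi$ is a one-to-one $C^1$ map from a region $V$ in the $st$-plane to the region $W$ in the $uv$-plane. The elements of area are then related by

$$du\,dv = \left|\frac{\partial(u, v)}{\partial(s, t)}\right| ds\,dt,$$

by Theorem 4.41. If we plug this into (5.21), we get

$$dA = \sqrt{\alpha^2 + \beta^2 + \gamma^2}\,ds\,dt, \quad\text{where}\quad \alpha = \frac{\partial(y, z)}{\partial(u, v)}\,\frac{\partial(u, v)}{\partial(s, t)}, \text{ etc.}$$

(the absolute value of the Jacobian can be dropped because these quantities are squared). But by the chain rule and the fact that the determinant of a product is the product of the determinants, we have

$$\frac{\partial(y, z)}{\partial(u, v)}\,\frac{\partial(u, v)}{\partial(s, t)} = \frac{\partial(y, z)}{\partial(s, t)},$$

and likewise for the other two terms. Hence, in the $st$-parametrization,

$$dA = \sqrt{\left(\frac{\partial(y, z)}{\partial(s, t)}\right)^2 + \left(\frac{\partial(z, x)}{\partial(s, t)}\right)^2 + \left(\frac{\partial(x, y)}{\partial(s, t)}\right)^2}\,ds\,dt.$$

This is of exactly the same form as (5.21), as we wished to show.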

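The formula for surface area becomes a little less hideous in the special case where the surface is the graph of a function, $z = \varphi(x, y)$. In this case we can take $x$ and $y$ as the parameters, that is,

$$\mathbf{G}(x, y) = (x, y, \varphi(x, y)).$$

Here $\partial_x\mathbf{G} = (1, 0, \partial_x\varphi)$ and $\partial_y\mathbf{G} = (0, 1, \partial_y\varphi)$, so

$$\frac{\partial\mathbf{G}}{\partial x}\times\frac{\partial\mathbf{G}}{\partial y} = -(\partial_x\varphi)\mathbf{i} - (\partial_y\varphi)\mathbf{j} + \mathbf{k},$$

$$dA = \sqrt{1 + (\partial_x\varphi)^2 + (\partial_y\varphi)^2}\,dx\,dy. \tag{5.22}$$

(Note that our surface is a level set of the function $\Phi(x, y, z) = z - \varphi(x, y)$ and that $-(\partial_x\varphi)\mathbf{i} - (\partial_y\varphi)\mathbf{j} + \mathbf{k} = \nabla\Phi$; we deduced that $\nabla\Phi$ is normal to the surface by other means in Theorem 2.37.)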
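Formula (5.22) is straightforward to evaluate numerically. As an illustration of our own (not from the text), the paraboloid $z = x^2 + y^2$ over the unit disc has $dA = \sqrt{1 + 4x^2 + 4y^2}\,dx\,dy$, and switching to polar coordinates its area is $\int_0^{2\pi}\int_0^1 \sqrt{1 + 4r^2}\,r\,dr\,d\theta = \frac{\pi}{6}(5\sqrt{5} - 1) \approx 5.3304$. A midpoint-rule sketch with hypothetical helper names:

```python
import math

# Hypothetical helper (illustration only): the factor sqrt(1 + phi_x^2 + phi_y^2) in (5.22).
def graph_area_element(phi_x, phi_y):
    return math.sqrt(1 + phi_x**2 + phi_y**2)

# Hypothetical helper: area of z = x^2 + y^2 over the unit disc, integrating (5.22)
# in polar coordinates (dx dy = r dr dtheta) with the midpoint rule.
def paraboloid_area(n_r=1000, n_theta=64):
    h_r, h_t = 1.0 / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * h_r
        for j in range(n_theta):
            th = (j + 0.5) * h_t
            x, y = r * math.cos(th), r * math.sin(th)
            # partial derivatives of phi(x, y) = x^2 + y^2 are 2x and 2y
            total += graph_area_element(2 * x, 2 * y) * r * h_r * h_t
    return total

area = paraboloid_area()
exact = math.pi / 6 * (5 * math.sqrt(5) - 1)
```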
Example 1. Let us compute the surface area of the unit sphere $x^2 + y^2 + z^2 = 1$. We can proceed in two ways:
Solution I. The upper hemisphere is the graph of the function $\varphi(x, y) = \sqrt{1 - x^2 - y^2}$. A little calculation yields

$$\sqrt{1 + (\partial_x\varphi)^2 + (\partial_y\varphi)^2} = \frac{1}{\sqrt{1 - x^2 - y^2}},$$

and by (5.22), the area of the upper hemisphere is obtained by integrating this function over the unit disc. (Note that this integral is improper, as the integrand blows up along the boundary of the disc.) Switching to polar coordinates yields

$$\int_0^{2\pi}\int_0^1 \frac{r}{\sqrt{1 - r^2}}\,dr\,d\theta = -2\pi\sqrt{1 - r^2}\,\Big|_0^1 = 2\pi.$$

Hence the area of the whole sphere is $4\pi$.
Solution II. We can parametrize the sphere by the spherical coordinates $x = \sin\varphi\cos\theta$, $y = \sin\varphi\sin\theta$, $z = \cos\varphi$. An easy calculation yields

$$\frac{\partial(y, z)}{\partial(\varphi, \theta)} = \sin^2\varphi\cos\theta, \qquad \frac{\partial(z, x)}{\partial(\varphi, \theta)} = \sin^2\varphi\sin\theta, \qquad \frac{\partial(x, y)}{\partial(\varphi, \theta)} = \cos\varphi\sin\varphi,$$

and the sum of the squares of these quantities is

$$\sin^4\varphi(\cos^2\theta + \sin^2\theta) + \cos^2\varphi\sin^2\varphi = \sin^2\varphi(\sin^2\varphi + \cos^2\varphi) = \sin^2\varphi.$$

Hence, by (5.21), the area of the sphere is

$$\int_0^\pi\int_0^{2\pi} \sin\varphi\,d\theta\,d\varphi = -2\pi\cos\varphi\,\Big|_0^\pi = 4\pi.$$
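Solution II can be checked by machine by forming the cross product in (5.19) directly rather than the Jacobians. The sketch below is our own (Python, midpoint rule in both variables, hypothetical helper names); the partial derivatives of the spherical-coordinate map are written out by hand.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Partial derivatives of G(phi, th) = (sin phi cos th, sin phi sin th, cos phi).
def G_phi(phi, th):
    return (math.cos(phi) * math.cos(th), math.cos(phi) * math.sin(th), -math.sin(phi))

def G_theta(phi, th):
    return (-math.sin(phi) * math.sin(th), math.sin(phi) * math.cos(th), 0.0)

# Hypothetical helper (illustration only): midpoint rule for (5.20) over
# 0 <= phi <= pi, 0 <= th <= 2*pi.
def sphere_area(n_phi=400, n_theta=100):
    h_p, h_t = math.pi / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * h_p
        for j in range(n_theta):
            th = (j + 0.5) * h_t
            N = cross(G_phi(phi, th), G_theta(phi, th))   # normal vector, |N| = sin phi
            total += math.sqrt(sum(c * c for c in N)) * h_p * h_t
    return total

area = sphere_area()   # close to 4*pi
```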
Surface Integrals of Scalar Functions. Now that we know how to compute surface area, it is easy to define the integral of a real-valued continuous function over a surface: It is just $\iint_S f\,dA$, where $dA$ is the element of surface area defined above. (To keep the notation simple, we shall take the region over which the integration is performed to be the whole surface $S$; the idea is exactly the same for integration over subsets of $S$.) More precisely, if $S$ admits a parametrization $\mathbf{x} = \mathbf{G}(u, v)$ with $(u, v) \in W$, where $W$ is tacitly assumed to be measurable,

$$\iint_S f\,dA = \iint_W f(\mathbf{G}(u, v))\left|\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v}\right| du\,dv.$$

If $S$ is the graph of a function $z = \varphi(x, y)$, $(x, y) \in W$, the result is

$$\iint_S f\,dA = \iint_W f(x, y, \varphi(x, y))\sqrt{1 + (\partial_x\varphi)^2 + (\partial_y\varphi)^2}\,dx\,dy.$$
Surface Integrals of Vector Fields. The element of area $dA$ on a surface $S$ parametrized by $\mathbf{x} = \mathbf{G}(u, v)$ is the norm of the vector $(\partial_u\mathbf{G}\times\partial_v\mathbf{G})\,du\,dv$. It is natural to regard the vector $(\partial_u\mathbf{G}\times\partial_v\mathbf{G})\,du\,dv$ itself as a "vector element of area" for $S$: its magnitude gives the area of a small bit of $S$, and its direction, namely the normal direction to $S$, specifies how that bit is oriented in space. That is, we have

$$\left(\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v}\right) du\,dv = \mathbf{n}\,dA,$$

where $\mathbf{n}$ is a unit normal vector to the surface $S$. We have already observed that $dA$ is independent of the parametrization, and clearly so is $\mathbf{n}$ up to a factor of $\pm 1$. However, using a different parametrization (for example, interchanging $u$ and $v$) might result in replacing $\mathbf{n}$ by $-\mathbf{n}$. In other words, a parametrization for a surface $S$ gives a definite orientation for $S$, that is, a specification of which side of $S$ is the "positive" side.
Now suppose $S$ is a surface with a specified orientation, and $\mathbf F$ is a continuous
vector field defined on a neighborhood of $S$. The surface integral of $\mathbf F$ over $S$ is
defined to be

$$\iint_S \mathbf F\cdot\mathbf n\,dA.$$

Thus, if $S$ is parametrized by $\mathbf x = \mathbf G(u, v)$, $(u, v) \in W$, we have

$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_W \mathbf F(\mathbf G(u,v))\cdot\left(\frac{\partial\mathbf G}{\partial u} \times \frac{\partial\mathbf G}{\partial v}\right) du\,dv.$$

This integral is independent of the choice of parametrization as long as the parametrization induces the specified orientation of $S$; switching to the opposite orientation
results in multiplying the integral by $-1$. (If $S$ is a nonorientable surface such as a
Möbius band, $\iint_S \mathbf F\cdot\mathbf n\,dA$ is not defined.)
A geometric-physical interpretation of this is easy to obtain. $\mathbf F\cdot\mathbf n$ is the normal
component of $\mathbf F$ along $S$; it is positive or negative according as $\mathbf F$ points into the
positive or negative side of $S$. We can think of $\mathbf F$ as representing the flow of some
substance (air, for example, although there is no need to be specific at this point):
the magnitude of $\mathbf F(\mathbf x)$ is the rate of flow of the substance past $\mathbf x$ and its direction
is the direction of flow. The integral $\iint_S \mathbf F\cdot\mathbf n\,dA$ then represents the net flow, or
flux, of $\mathbf F$ across the surface $S$ from the negative side to the positive side. We shall
discuss this in more detail in §5.6.
As with line integrals, surface integrals of vector fields are often easier to compute than surface integrals of scalar functions because the inconvenient square root
in the formula for $dA$ does not appear in the vector $\mathbf n\,dA$. Let us see, for example,
what $\iint_S \mathbf F\cdot\mathbf n\,dA$ becomes when $S$ is the graph of a function with domain $W \subset \mathbb R^2$,
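say $z = \varphi(x, y)$. As in the preceding discussion of surface area, we take $x$ and $y$
as the parameters and find that

$$\mathbf n\,dA = \left[-(\partial_x\varphi)\mathbf i - (\partial_y\varphi)\mathbf j + \mathbf k\right] dx\,dy.$$

The orientation here is the one with the normal pointing upward, since its $z$ component is positive. Thus, if $\mathbf F = F_1\mathbf i + F_2\mathbf j + F_3\mathbf k$ and $\mathbf G(x, y) = (x, y, \varphi(x, y))$,

$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_W \Big[-F_1(\mathbf G(x,y))\,\partial_x\varphi(x,y) - F_2(\mathbf G(x,y))\,\partial_y\varphi(x,y) + F_3(\mathbf G(x,y))\Big]\,dx\,dy. \tag{5.23}$$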

Here and in what follows, we adopt the common practice of denoting by $\mathbf i$, $\mathbf j$,
and $\mathbf k$ the unit vectors in the positive coordinate directions and writing vector fields
in $\mathbb R^3$ as $\mathbf F = F_1\mathbf i + F_2\mathbf j + F_3\mathbf k$ in preference to $\mathbf F = (F_1, F_2, F_3)$; this serves to
emphasize the interpretation of $\mathbf F$ as a vector field rather than a transformation.
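Formula (5.23) translates directly into a numerical procedure. The following sketch is a hedged illustration, not part of the text: it assumes $W$ is a rectangle, approximates $\partial_x\varphi$ and $\partial_y\varphi$ by central differences, and uses the midpoint rule. `graph_flux` and its parameters are hypothetical names; `F(x, y, z)` is assumed to return the triple $(F_1, F_2, F_3)$.

```python
def graph_flux(F, phi, x_range, y_range, n=200, h=1e-6):
    """Upward flux of F through the graph z = phi(x, y) over a rectangle W, via (5.23).

    The partial derivatives of phi are approximated by central differences with
    step h, and the double integral over W = x_range x y_range by the midpoint
    rule on an n-by-n grid.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
            phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
            F1, F2, F3 = F(x, y, phi(x, y))
            total += (-F1 * phi_x - F2 * phi_y + F3) * dx * dy
    return total

# Flux of F = x i through z = x^2 over [0,1] x [0,1]; (5.23) gives -2/3.
print(graph_flux(lambda x, y, z: (x, 0.0, 0.0), lambda x, y: x * x, (0, 1), (0, 1)))
```

For $\mathbf F = x\,\mathbf i$ and $\varphi(x,y) = x^2$, (5.23) reduces to $-\iint_W 2x^2\,dx\,dy = -2/3$, which the sketch reproduces to several digits; a constant field $\mathbf k$ returns the area of $W$, as (5.23) predicts.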

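Example 2. Let $S$ be the portion of the cone $x^2 + y^2 = z^2$ with $0 < z < 1$,
oriented so that the normal points upward, and let $\mathbf F(x, y, z) = x^2\mathbf i + yz\,\mathbf j + y\,\mathbf k$.
Compute $\iint_S \mathbf F\cdot\mathbf n\,dA$.
Solution. One way is to use polar coordinates as parameters: $\mathbf G(r, \theta) =
(r\cos\theta, r\sin\theta, r)$. Then we have $\partial_r\mathbf G = (\cos\theta)\mathbf i + (\sin\theta)\mathbf j + \mathbf k$ and $\partial_\theta\mathbf G =
-(r\sin\theta)\mathbf i + (r\cos\theta)\mathbf j$, so

$$\partial_r\mathbf G \times \partial_\theta\mathbf G = -(r\cos\theta)\mathbf i - (r\sin\theta)\mathbf j + r\,\mathbf k.$$

This gives the right orientation since the $z$ component, namely $r$, is positive.
Thus,

$$\iint_S \mathbf F\cdot\mathbf n\,dA = \int_0^{2\pi}\!\!\int_0^1 \Big[(r\cos\theta)^2(-r\cos\theta) + (r\sin\theta)r(-r\sin\theta) + (r\sin\theta)r\Big]\,dr\,d\theta,$$

whose value is easily found to be $-\tfrac14\pi$. Alternatively, we could use the representation $z = \sqrt{x^2 + y^2}$ and use (5.23). The reader may verify that this leads
to

$$\iint_S \mathbf F\cdot\mathbf n\,dA = \iint_{x^2+y^2<1}\left[-\frac{x^3}{\sqrt{x^2+y^2}} - y^2 + y\right] dx\,dy,$$

and conversion of this integral to polar coordinates leads to the same $r\theta$-integral
as before.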
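A quick numerical check of the value $-\tfrac14\pi$ in Example 2 (an added sketch, not from the text): the midpoint rule applied to the $r\theta$-integral above, with the integrand built from $\mathbf F(\mathbf G(r,\theta))$ and $\partial_r\mathbf G \times \partial_\theta\mathbf G$. The function name and grid sizes are illustrative.

```python
import math

def example2_flux(n_r=200, n_theta=200):
    """Midpoint-rule value of the r-theta integral in Example 2.

    Uses G(r, theta) = (r cos theta, r sin theta, r) and the normal vector
    d_r G x d_theta G = -(r cos theta) i - (r sin theta) j + r k from the text.
    """
    dr, dtheta = 1.0 / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = (j + 0.5) * dtheta
            x, y, z = r * math.cos(t), r * math.sin(t), r
            F = (x * x, y * z, y)                          # F = x^2 i + yz j + y k
            N = (-r * math.cos(t), -r * math.sin(t), r)    # upward normal
            total += (F[0] * N[0] + F[1] * N[1] + F[2] * N[2]) * dr * dtheta
    return total

print(example2_flux(), -math.pi / 4)
```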

Finally, as a practical matter we need to extend the ideas in this section from
smooth surfaces to piecewise smooth surfaces. Giving a satisfactory general definition of a "piecewise smooth surface" is a rather messy business, and we shall
not attempt it. For our present purposes, it will suffice to assume that the surface $S$
under consideration is the union of finitely many pieces $S_1, \dots, S_k$ that satisfy the
following conditions:

i. Each $S_j$ admits a smooth parametrization as discussed above.

ii. The intersections $S_i \cap S_j$ are either empty or finite unions of smooth curves.

We then define integration over $S$ in the obvious way:

$$\iint_S f\,dA = \sum_{j=1}^k \iint_{S_j} f\,dA.$$

Condition (ii) guarantees that the parts of $S$ that are counted more than once on
the right, namely the intersections $S_i \cap S_j$, contribute nothing to the integral, by
Propositions 4.19 and 4.22.

Example 3.
a. Let $S$ be the surface of a cube; then we can take $S_1, \dots, S_6$ to be the faces
of the cube.
b. Let $S$ be the surface of the cylindrical solid $\{(x,y,z) : x^2 + y^2 \le 1,\ |z| \le 1\}$. We can write $S = S_1 \cup S_2 \cup S_3$ where $S_1$ and $S_2$ are the discs forming
the top and bottom and $S_3$ is the circular vertical side. $S_1$ and $S_2$ can be
parametrized by $(x,y) \mapsto (x,y,1)$ and $(x,y) \mapsto (x,y,-1)$ with $x^2 + y^2 \le 1$, and $S_3$ can be parametrized by $(\theta,z) \mapsto (\cos\theta, \sin\theta, z)$ with $0 \le \theta \le 2\pi$ and $|z| \le 1$. If one wishes to use only one-to-one parametrizations with
compact parameter domains, one can cut $S_3$ further into two pieces, say
the left and right halves defined by $0 \le \theta \le \pi$ and $\pi \le \theta \le 2\pi$.
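Example 3b can be made concrete by computing a flux piece by piece. In the sketch below (an illustration, not from the text), the field $\mathbf F = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$ is an arbitrary choice and all normals point out of the solid. As an assumption that differs from the text, the top and bottom discs are parametrized by polar coordinates $(r,\theta)$ so that every parameter domain is a rectangle; that map induces the upward normal, so the bottom piece needs `sign=-1`. `piece_flux` is a hypothetical helper.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def piece_flux(F, G, u_range, v_range, sign=1, n=100, h=1e-6):
    """Flux of F through one piece x = G(u, v), (u, v) in a rectangle.

    sign=-1 reverses the orientation induced by the parametrization. The
    partials of G are central differences; the integral is a midpoint sum.
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            Gu = [(p - q) / (2 * h) for p, q in zip(G(u + h, v), G(u - h, v))]
            Gv = [(p - q) / (2 * h) for p, q in zip(G(u, v + h), G(u, v - h))]
            N = cross(Gu, Gv)                      # dG/du x dG/dv
            Fx = F(*G(u, v))
            total += sum(a * b for a, b in zip(Fx, N)) * du * dv
    return sign * total

F = lambda x, y, z: (x, y, z)
two_pi = 2 * math.pi
top = piece_flux(F, lambda r, t: (r * math.cos(t), r * math.sin(t), 1.0), (0, 1), (0, two_pi))
bottom = piece_flux(F, lambda r, t: (r * math.cos(t), r * math.sin(t), -1.0), (0, 1), (0, two_pi), sign=-1)
side = piece_flux(F, lambda t, z: (math.cos(t), math.sin(t), z), (0, two_pi), (-1, 1))
print(top + bottom + side, 6 * math.pi)
```

The three pieces contribute $\pi$, $\pi$, and $4\pi$, for a total of $6\pi$; this is 3 times the volume $2\pi$ of the solid, as the divergence theorem of §5.5 predicts for a field whose divergence is 3.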

Remark. In condition (ii) above, we have in mind that the sets $S_j$ will intersect
each other only along their edges, although there is nothing to forbid them from
crossing one another. For example, $S$ could be the union of the two spheres $S_1 =
\{\mathbf x : |\mathbf x| = 1\}$ and $S_2 = \{\mathbf x : |\mathbf x - \mathbf i| = 1\}$. This added generality is largely useless
but also harmless.

EXERCISES

1. Find the area of the part of the surface $z = xy$ inside the cylinder $x^2 + y^2 = a^2$.

2. Find the area of the part of the surface $z = x^2 + y^2$ inside the cylinder $x^2 + y^2 = a^2$.
3. Suppose $0 < a < b$. Find the area of the torus obtained by revolving the circle
$(x - b)^2 + z^2 = a^2$ in the $xz$-plane about the $z$ axis. (Hint: The torus may be
parametrized by $x = (b + a\cos\varphi)\cos\theta$, $y = (b + a\cos\varphi)\sin\theta$, $z = a\sin\varphi$,
with $0 < \varphi, \theta < 2\pi$.)

4. Find the area of the ellipsoid $(x/a)^2 + (y/a)^2 + (z/b)^2 = 1$.

5. Find the centroid of the upper hemisphere of the unit sphere $x^2 + y^2 + z^2 = 1$.
6. Compute $\iint_S (x^2 + y^2)\,dA$ where $S$ is the portion of the sphere $x^2 + y^2 + z^2 = 4$
with $z > 1$.

7. Compute $\iint_S (x^2 + y^2 - 2z^2)\,dA$ where $S$ is the unit sphere. Can you find the
answer by symmetry considerations without doing any calculations?

8. Calculate $\iint_S \mathbf F\cdot\mathbf n\,dA$ for the following $\mathbf F$ and $S$.

a. $\mathbf F(x,y,z) = xz\,\mathbf i - xy\,\mathbf k$; $S$ is the portion of the surface $z = xy$ with
$0 < x < 1$, $0 < y < 2$, oriented so that the normal points upward.
b. $\mathbf F(x, y, z) = x^2\mathbf i + z\,\mathbf j - y\,\mathbf k$; $S$ is the unit sphere $x^2 + y^2 + z^2 = 1$, oriented
so that the normal points outward (away from the center).
c. $\mathbf F(x,y,z) = xy\,\mathbf i + z\,\mathbf j$; $S$ is the triangle with vertices $(2,0,0)$, $(0,2,0)$,
$(0,0,2)$, oriented so that the normal points upward.
d. F(x, y, z) = z2k; S is the boundary of the region x2 + y2 < 1, o < z < b,
oriented so that the normal points out of the region. (You should be able to
do this in your head.)
e. $\mathbf F(x, y, z) = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$; $S$ is the boundary of the region $x^2 + y^2 < z <
\sqrt{2 - x^2 - y^2}$, oriented so that the normal points out of the region.

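5.4 Vector Derivatives

Let $\nabla$ denote the $n$-tuple of partial differential operators $\partial_j = \partial/\partial x_j$:

$$\nabla = (\partial_1, \dots, \partial_n).$$

We are already familiar with this notation in connection with the gradient of a $C^1$
function on $\mathbb R^n$, which is the vector field defined by

$$\nabla f = (\partial_1 f, \dots, \partial_n f).$$
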
We can also use $\nabla$ to form interesting combinations of the derivatives of a vector
field, via the dot and cross product. If $\mathbf F$ is a $C^1$ vector field on an open subset of
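$\mathbb R^n$, the divergence of $\mathbf F$ is the function defined by

$$\operatorname{div}\mathbf F = \nabla\cdot\mathbf F = \partial_1 F_1 + \cdots + \partial_n F_n.$$

The geometric (coordinate-invariant) meaning of $\nabla\cdot\mathbf F$ will be explained in §5.5.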

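Next, suppose $n = 3$. If $\mathbf F$ is a $C^1$ vector field on an open subset of $\mathbb R^3$, the
curl of $\mathbf F$ is the vector field defined by

$$\operatorname{curl}\mathbf F = \nabla\times\mathbf F = (\partial_2 F_3 - \partial_3 F_2)\mathbf i + (\partial_3 F_1 - \partial_1 F_3)\mathbf j + (\partial_1 F_2 - \partial_2 F_1)\mathbf k.$$

(Some authors write $\operatorname{rot}\mathbf F$ instead of $\operatorname{curl}\mathbf F$; "rot" stands for "rotation.") Again,
the curl has a geometric significance that will be explained later, in §5.7.
We shall employ the notations $\operatorname{div}\mathbf F$ and $\operatorname{curl}\mathbf F$ in preference to $\nabla\cdot\mathbf F$ and $\nabla\times\mathbf F$
because they seem to be more readable. In this section we shall also write $\operatorname{grad} f$
instead of $\nabla f$ for the sake of consistency; later we shall use these two notations
interchangeably.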
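The operators grad, curl, and div satisfy product rules with respect to scalar
multiplication and dot and cross products. As these rules are useful and some of
them are not obvious, it is well to make a list for handy reference. In the following
formulas, $f$ and $g$ are real-valued functions and $\mathbf F$ and $\mathbf G$ are vector fields, all of
class $C^1$.

$$\operatorname{grad}(fg) = f\operatorname{grad} g + g\operatorname{grad} f \tag{5.24}$$

$$\operatorname{grad}(\mathbf F\cdot\mathbf G) = (\mathbf F\cdot\nabla)\mathbf G + \mathbf F\times(\operatorname{curl}\mathbf G) + (\mathbf G\cdot\nabla)\mathbf F + \mathbf G\times(\operatorname{curl}\mathbf F) \tag{5.25}$$

$$\operatorname{curl}(f\mathbf G) = f\operatorname{curl}\mathbf G + (\operatorname{grad} f)\times\mathbf G \tag{5.26}$$

$$\operatorname{curl}(\mathbf F\times\mathbf G) = (\mathbf G\cdot\nabla)\mathbf F + (\operatorname{div}\mathbf G)\mathbf F - (\mathbf F\cdot\nabla)\mathbf G - (\operatorname{div}\mathbf F)\mathbf G \tag{5.27}$$

$$\operatorname{div}(f\mathbf G) = f\operatorname{div}\mathbf G + (\operatorname{grad} f)\cdot\mathbf G \tag{5.28}$$

$$\operatorname{div}(\mathbf F\times\mathbf G) = \mathbf G\cdot(\operatorname{curl}\mathbf F) - \mathbf F\cdot(\operatorname{curl}\mathbf G) \tag{5.29}$$

In (5.25) and (5.27), $\mathbf F\cdot\nabla$ denotes the directional derivative $\sum F_j\partial_j$, that is,

$$(\mathbf F\cdot\nabla)\mathbf G = \sum_{j=1}^{3} F_j\,\frac{\partial\mathbf G}{\partial x_j}.$$

Equations (5.24) and (5.28) are valid in $\mathbb R^n$ for any $n$; the others, which involve
cross products and curls, are restricted to $n = 3$. The proofs of all these formulas
are just a matter of computation; we leave them to the reader as exercises.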
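Product rules such as (5.29) can be spot-checked numerically before one proves them. The sketch below is added for illustration; the test fields, the point $\mathbf x_0$, and the step size are arbitrary assumptions. It approximates every partial derivative by a central difference and compares both sides of (5.29) at one point.

```python
import math

H = 1e-5  # finite-difference step (an illustrative choice)

def partial(f, x, j):
    """Central-difference approximation to the j-th partial of a scalar f at x."""
    xp, xm = list(x), list(x)
    xp[j] += H
    xm[j] -= H
    return (f(xp) - f(xm)) / (2 * H)

def curl(F, x):
    d = lambda i, j: partial(lambda p: F(p)[i], x, j)   # d_j F_i at x
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def div(F, x):
    return sum(partial(lambda p, i=i: F(p)[i], x, i) for i in range(3))

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

# Two arbitrary smooth test fields, chosen only for illustration.
F = lambda p: (p[1] * p[2], math.sin(p[0]), p[0] ** 2)
G = lambda p: (math.exp(p[1]), p[0] * p[2], p[1] ** 3)
x0 = (0.3, -0.7, 1.1)

lhs = div(lambda p: cross(F(p), G(p)), x0)                  # left side of (5.29)
rhs = dot(G(x0), curl(F, x0)) - dot(F(x0), curl(G, x0))     # right side of (5.29)
print(lhs, rhs)
```

A check at a single point is no proof, but it quickly catches sign errors: replacing the minus sign in (5.29) by a plus sign changes the right side by $2\,\mathbf F\cdot(\operatorname{curl}\mathbf G)$, which is far from zero for these fields.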
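We can combine the operations grad, curl, and div pairwise in several ways.
That is, if $f$ and $\mathbf F$ are of class $C^2$, we can form

$$\operatorname{curl}(\operatorname{grad} f),\quad \operatorname{div}(\operatorname{curl}\mathbf F),\quad \operatorname{div}(\operatorname{grad} f),\quad \operatorname{curl}(\operatorname{curl}\mathbf F),\quad \operatorname{grad}(\operatorname{div}\mathbf F).$$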

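It is an important fact that the first two of these always vanish, by the equality
of mixed partials:

$$\operatorname{curl}(\operatorname{grad} f) = (\partial_2\partial_3 f - \partial_3\partial_2 f)\mathbf i + (\partial_3\partial_1 f - \partial_1\partial_3 f)\mathbf j + (\partial_1\partial_2 f - \partial_2\partial_1 f)\mathbf k = \mathbf 0 \tag{5.30}$$

and

$$\operatorname{div}(\operatorname{curl}\mathbf F) = \partial_1(\partial_2 F_3 - \partial_3 F_2) + \partial_2(\partial_3 F_1 - \partial_1 F_3) + \partial_3(\partial_1 F_2 - \partial_2 F_1) = 0. \tag{5.31}$$

Schematically, we have

$$\{\text{scalar functions}\} \xrightarrow{\ \operatorname{grad}\ } \{\text{vector fields}\} \xrightarrow{\ \operatorname{curl}\ } \{\text{vector fields}\} \xrightarrow{\ \operatorname{div}\ } \{\text{scalar functions}\},$$

and (5.30) and (5.31) say that the composition of two successive mappings is zero.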
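The identities (5.30) and (5.31) can also be observed numerically. In this sketch (not from the text; the functions, the point, and the step size are illustrative), second derivatives are obtained by nesting central differences. Nested central differences in two different variables are symmetric in those variables, which is the discrete counterpart of the equality of mixed partials, so both quantities vanish up to rounding error. As a control, the same machinery recovers the nonzero Laplacian $e^x\sin(yz)(1 - y^2 - z^2)$ of the test function $f$.

```python
import math

H = 1e-4  # finite-difference step (an illustrative choice)

def partial(f, x, j):
    """Central-difference approximation to the j-th partial of a scalar f at x."""
    xp, xm = list(x), list(x)
    xp[j] += H
    xm[j] -= H
    return (f(xp) - f(xm)) / (2 * H)

def grad(f):
    return lambda x: tuple(partial(f, x, j) for j in range(3))

def curl(F):
    def c(x):
        d = lambda i, j: partial(lambda p: F(p)[i], x, j)   # d_j F_i at x
        return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))
    return c

def div(F):
    return lambda x: sum(partial(lambda p, i=i: F(p)[i], x, i) for i in range(3))

f = lambda p: math.exp(p[0]) * math.sin(p[1] * p[2])
F = lambda p: (p[0] * p[1] ** 2, p[0] * math.cos(p[2]), p[1] * p[2] ** 3)
x0 = (0.4, -0.2, 0.9)

print(curl(grad(f))(x0))   # (5.30): all three components ~ 0
print(div(curl(F))(x0))    # (5.31): ~ 0
print(div(grad(f))(x0))    # the Laplacian of f, which is not 0
```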
The third combination, $\operatorname{div}(\operatorname{grad} f)$, which makes sense in any number of dimensions, is of fundamental importance for both physical and purely mathematical
reasons. It is called the Laplacian of $f$ and is usually denoted by $\nabla^2 f$ or $\Delta f$:

$$\nabla^2 f = \Delta f = \operatorname{div}(\operatorname{grad} f) = \partial_1^2 f + \cdots + \partial_n^2 f. \tag{5.32}$$

The last two combinations are of less interest by themselves, but together they yield
the Laplacian for vector fields in $\mathbb R^3$:

$$\operatorname{grad}(\operatorname{div}\mathbf F) - \operatorname{curl}(\operatorname{curl}\mathbf F) = \nabla^2\mathbf F = (\nabla^2 F_1)\mathbf i + (\nabla^2 F_2)\mathbf j + (\nabla^2 F_3)\mathbf k. \tag{5.33}$$

The verification of (5.33) is a straightforward but somewhat tedious calculation that
we leave to the reader.

EXERCISES

1. Compute the curl and divergence of the following vector fields.

a. F(x, y, z) = xy2\ + xyj + xijk.
b. $\mathbf F(x, y, z) = (\sin yz)\mathbf i + (xz\cos yz)\mathbf j + (xy\cos yz)\mathbf k$.
c. $\mathbf F(x, y, z) = x^2z\,\mathbf i + 4xyz\,\mathbf j + (y - 3xz^2)\mathbf k$.
2. Compute the Laplacians of the following functions.
a. $f(x,y) = x^5 - 10x^3y^2 + 5xy^4$.
b. $f(x,y,z) = xy^2 - 4yz^3$.
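c. $f(\mathbf x) = |\mathbf x|^a$ ($\mathbf x \in \mathbb R^n \setminus \{\mathbf 0\}$, $a \in \mathbb R$). (Hint: Use Exercise 9 in §2.6.)

d. $f(x, y) = \log(x^2 + y^2)$ ($(x, y) \ne (0, 0)$).
3. Let $\mathbf F(x,y,z) = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$. Show that for any $\mathbf a \in \mathbb R^3$, we have $\operatorname{curl}(\mathbf a\times\mathbf F) =
2\mathbf a$, $\operatorname{div}[(\mathbf a\cdot\mathbf F)\mathbf a] = |\mathbf a|^2$, and $\operatorname{div}[(\mathbf a\times\mathbf F)\times\mathbf a] = 2|\mathbf a|^2$.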
4. Prove (5.24) and (5.25).

5. Prove (5.26) and (5.27).

6. Prove (5.28) and (5.29).

7. Prove (5.33).

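8. Why is the minus sign in (5.29) there? That is, on grounds of symmetry, without going through any calculations, why must the formula $\operatorname{div}(\mathbf F\times\mathbf G) =
\mathbf G\cdot(\operatorname{curl}\mathbf F) + \mathbf F\cdot(\operatorname{curl}\mathbf G)$ be wrong?
9. Show that for any $C^2$ functions $f$ and $g$, $\operatorname{div}(\operatorname{grad} f\times\operatorname{grad} g) = 0$.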

5.5 The Divergence Theorem

The divergence theorem, also known as Gauss's theorem or Ostrogradski's theorem, is the 3-dimensional analogue of the version (5.18) of Green's theorem; it
relates surface integrals over the boundary of a regular region in $\mathbb R^3$ to volume integrals over the region itself. The divergence theorem is valid for regions with piecewise smooth boundaries, but we shall allow the meaning of "piecewise smooth"
to remain a little vague; see the remarks at the end of §5.3. To formulate precise
conditions that encompass all the cases of interest would involve a rather arduous
excursion into technicalities, and the more restricted class of regions covered by the
following argument suffices for most purposes.

5.34 Theorem (The Divergence Theorem). Suppose $R$ is a regular region in $\mathbb R^3$
with piecewise smooth boundary $\partial R$, oriented so that the positive normal points
out of $R$. Suppose also that $\mathbf F$ is a vector field of class $C^1$ on $R$. Then

$$\iint_{\partial R} \mathbf F\cdot\mathbf n\,dA = \iiint_R \operatorname{div}\mathbf F\,dV. \tag{5.35}$$
Proof. As with Green's theorem, we begin by considering a class of simple regions.
We say that $R$ is $xy$-simple if it has the form

$$R = \{(x,y,z) : (x,y) \in W,\ \varphi_1(x,y) \le z \le \varphi_2(x,y)\},$$

where $W$ is a regular region in the $xy$-plane and $\varphi_1$ and $\varphi_2$ are piecewise smooth
functions on $W$. We define the notions of $yz$-simple and $xz$-simple similarly, and
we say that $R$ is simple if it is $xy$-simple, $yz$-simple, and $xz$-simple.

Suppose now that $R$ is simple. We shall prove the divergence theorem for
the region $R$ by considering the components of $\mathbf F$ separately. That is, let $\mathbf F =
F_1\mathbf i + F_2\mathbf j + F_3\mathbf k$; we shall show that

$$\iint_{\partial R} F_3\mathbf k\cdot\mathbf n\,dA = \iiint_R \partial_3 F_3\,dV,$$

and similarly for the other two components. Since $R$ is $xy$-simple, the boundary
$\partial R$ consists of three pieces: the "top" and "bottom" surfaces $z = \varphi_2(x,y)$ and
$z = \varphi_1(x, y)$ and the "sides" consisting of the union of the vertical line segments
from $(x,y,\varphi_1(x,y))$ to $(x,y,\varphi_2(x,y))$ as $(x,y)$ ranges over the boundary of $W$.
The outward normal to $R$ is horizontal on the sides, i.e., $\mathbf k\cdot\mathbf n = 0$ there, so the
sides contribute nothing to the surface integral. For the top and bottom surfaces we
use (5.23). The outward normal points upward on the top surface and downward
on the bottom surface, so

$$\begin{aligned}
\iint_{\partial R} F_3\mathbf k\cdot\mathbf n\,dA &= \iint_W F_3(x,y,\varphi_2(x,y))\,dx\,dy - \iint_W F_3(x,y,\varphi_1(x,y))\,dx\,dy \\
&= \iint_W \int_{\varphi_1(x,y)}^{\varphi_2(x,y)} \partial_3 F_3(x,y,z)\,dz\,dx\,dy \\
&= \iiint_R \partial_3 F_3(x,y,z)\,dV,
\end{aligned}$$

as claimed. The proof for $F_1\mathbf i$ and $F_2\mathbf j$ is the same, using the assumptions that $R$ is
$yz$-simple and $xz$-simple.
It now follows that the divergence theorem is valid for regions that can be cut
up into finitely many simple regions $R_1, \dots, R_k$. The integrals of $\operatorname{div}\mathbf F$ over the
regions $R_1, \dots, R_k$ add up to the integral over $R$, and the integrals of $\mathbf F\cdot\mathbf n$ over
the boundaries $\partial R_1, \dots, \partial R_k$ add up to the integral over $\partial R$ because the integrals
over the portions of the $\partial R_j$'s that are not part of $\partial R$ cancel out. (The reasoning is
the same as in the proof of Green's theorem.)
The completion of the proof for general regular regions with smooth boundary,
with indications of how to generalize it to the piecewise smooth case, is given in
Appendix B.7 (Theorem B.30). □
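The divergence theorem can be checked on the unit cube, whose six faces serve as the pieces $S_1, \dots, S_6$ of Example 3a in §5.3. In this sketch (an illustration, not from the text), the field $\mathbf F = x^2y\,\mathbf i + yz\,\mathbf j + xz^2\,\mathbf k$ is an arbitrary choice, and its divergence $2xy + z + 2xz$ was computed by hand. On the face $x = 1$ the outward normal is $\mathbf i$ and on $x = 0$ it is $-\mathbf i$, and similarly for the other pairs of faces.

```python
def flux_through_unit_cube(F, n=50):
    """Outward flux of F through the six faces of [0,1]^3 (midpoint rule per face)."""
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for s in mids:
        for t in mids:
            total += (F(1.0, s, t)[0] - F(0.0, s, t)[0]) * h * h   # faces x = 1 and x = 0
            total += (F(s, 1.0, t)[1] - F(s, 0.0, t)[1]) * h * h   # faces y = 1 and y = 0
            total += (F(s, t, 1.0)[2] - F(s, t, 0.0)[2]) * h * h   # faces z = 1 and z = 0
    return total

def integral_over_unit_cube(g, n=50):
    """Midpoint-rule approximation of the triple integral of g over [0,1]^3."""
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    return sum(g(x, y, z) for x in mids for y in mids for z in mids) * h ** 3

F = lambda x, y, z: (x * x * y, y * z, x * z * z)
div_F = lambda x, y, z: 2 * x * y + z + 2 * x * z     # computed by hand
print(flux_through_unit_cube(F), integral_over_unit_cube(div_F))   # both sides of (5.35)
```

Both sides equal $3/2$; for this particular field the midpoint rule happens to be exact, because every integrand involved is linear in each variable separately.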

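Armed with the divergence theorem, we can obtain a better understanding of
the meaning of $\operatorname{div}\mathbf F$. Suppose $\mathbf F$ is a vector field of class $C^1$ in some open set
containing the point $\mathbf a$. For $r > 0$, let $B_r$ be the ball of radius $r$ about $\mathbf a$. If $r$ is
very small, the average value of $\operatorname{div}\mathbf F(\mathbf x)$ on the ball $B_r$ is very nearly equal to
$\operatorname{div}\mathbf F(\mathbf a)$. Therefore, by the divergence theorem,

$$\operatorname{div}\mathbf F(\mathbf a) \approx \frac{3}{4\pi r^3}\iiint_{B_r} \operatorname{div}\mathbf F\,dV = \frac{3}{4\pi r^3}\iint_{\partial B_r} \mathbf F\cdot\mathbf n\,dA.$$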

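This approximation becomes better and better as $r \to 0$, and hence

$$\operatorname{div}\mathbf F(\mathbf a) = \lim_{r\to0} \frac{3}{4\pi r^3}\iint_{|\mathbf x-\mathbf a|=r} \mathbf F\cdot\mathbf n\,dA. \tag{5.36}$$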

The integral on the right is the flux of $\mathbf F$ across $\partial B_r$ from the inside ($B_r$) to the
outside (the complement of $B_r$). If we think of the vector field as representing
the flow of some substance through space, the integral represents the amount of
substance flowing out of $B_r$ minus the amount of substance flowing in; thus, the
condition $\operatorname{div}\mathbf F(\mathbf a) > 0$ means that there is a net outflow near $\mathbf a$, in other words,
that $\mathbf F$ tends to "diverge" from $\mathbf a$. (The effect is subtle, though: One has to divide
the flux by $r^3$ in (5.36) to get something that does not vanish in the limit.) In any
case, the integral in (5.36) is a geometrically defined quantity that is independent
of the choice of coordinates; this gives the promised coordinate-free interpretation
of $\operatorname{div}\mathbf F$.
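Formula (5.36) suggests a numerical experiment: compute the outward flux through a small sphere about $\mathbf a$ and divide by the volume of the ball. The sketch below is added here; the field, the point $\mathbf a$, and the grid size are illustrative assumptions. It parametrizes the sphere by spherical coordinates centered at $\mathbf a$, so that $\mathbf n$ is the radial unit vector and $dA = r^2\sin\varphi\,d\varphi\,d\theta$.

```python
import math

def sphere_flux_ratio(F, a, r, n=100):
    """3/(4 pi r^3) times the outward flux of F through |x - a| = r (midpoint rule).

    This is the quantity inside the limit in (5.36).
    """
    dphi, dtheta = math.pi / n, 2 * math.pi / n
    flux = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            nvec = (math.sin(phi) * math.cos(theta),
                    math.sin(phi) * math.sin(theta),
                    math.cos(phi))                      # outward unit normal
            x = [a[k] + r * nvec[k] for k in range(3)]
            Fx = F(*x)
            dA = r * r * math.sin(phi) * dphi * dtheta
            flux += sum(Fx[k] * nvec[k] for k in range(3)) * dA
    return 3 * flux / (4 * math.pi * r ** 3)

F = lambda x, y, z: (x * x, y * z, math.sin(z))
a = (0.5, 0.2, -0.3)
exact = 2 * a[0] + a[2] + math.cos(a[2])    # div F(a) = 2x + z + cos z at a
for r in (0.5, 0.1, 0.01):
    print(r, sphere_flux_ratio(F, a, r), exact)
```

As $r$ decreases, the printed ratio approaches $\operatorname{div}\mathbf F(\mathbf a)$, in line with (5.36).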
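Among the important consequences of the divergence theorem are the following identities.

5.37 Corollary (Green's Formulas). Suppose $R$ is a regular region in $\mathbb R^3$ with
piecewise smooth boundary, and $f$ and $g$ are functions of class $C^2$ on $R$. Then

$$\iint_{\partial R} f\nabla g\cdot\mathbf n\,dA = \iiint_R (\nabla f\cdot\nabla g + f\nabla^2 g)\,dV, \tag{5.38}$$

$$\iint_{\partial R} (f\nabla g - g\nabla f)\cdot\mathbf n\,dA = \iiint_R (f\nabla^2 g - g\nabla^2 f)\,dV. \tag{5.39}$$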

Proof. An application of the product rule (5.28) shows that $\operatorname{div}(f\nabla g) = \nabla f\cdot
\nabla g + f\nabla^2 g$, so the divergence theorem applied to $\mathbf F = f\nabla g$ yields (5.38). The
corresponding equation with $f$ and $g$ switched also holds; by subtracting the latter
equation from the former we obtain (5.39). □

The directional derivative $\nabla f\cdot\mathbf n$ that occurs in these formulas is called the
outward normal derivative of $f$ on $\partial R$ and is often denoted by $\partial f/\partial n$.

EXERCISES

In several of these exercises it will be useful to note that if $S_r$ is the sphere of
radius $r$ about the origin, the unit outward normal to $S_r$ at a point $\mathbf x \in S_r$ is just
$r^{-1}\mathbf x$. This is geometrically obvious if you think about it a little. Alternatively,
since $S_r$ is a level set of the function $|\mathbf x|^2 = x^2 + y^2 + z^2$, we know that $\nabla(|\mathbf x|^2) =
2x\,\mathbf i + 2y\,\mathbf j + 2z\,\mathbf k = 2\mathbf x$ is normal to $S_r$, so the unit normal is $|\mathbf x|^{-1}\mathbf x = r^{-1}\mathbf x$ for
$\mathbf x \in S_r$.

1. Use the divergence theorem to evaluate the surface integral $\iint_S \mathbf F\cdot\mathbf n\,dA$ for the
following $\mathbf F$ and $S$, where $S$ is oriented so that the positive normal points out
of the region bounded by $S$.
a. $\mathbf F$, $S$ as in Exercise 8b in §5.3.
b. $\mathbf F$, $S$ as in Exercise 8e in §5.3.
c. $\mathbf F(x, y, z) = x^2\mathbf i + y^2\mathbf j + z^2\mathbf k$; $S$ is the surface of the cube $0 < x, y, z < a$.
d. $\mathbf F(x,y,z) = (x/a^2)\mathbf i + (y/b^2)\mathbf j + (z/c^2)\mathbf k$; $S$ is the ellipsoid $(x/a)^2 +
(y/b)^2 + (z/c)^2 = 1$.
e. $\mathbf F(x, y, z) = x^2\mathbf i - 2xy\,\mathbf j + z^2\mathbf k$; $S$ is the surface of the cylindrical solid
$\{(x,y,z) : (x,y) \in W,\ 1 < z < 2\}$ where $W$ is a smoothly bounded
regular region in the plane with area $A$.

2. Let $\mathbf F(x, y, z) = (x^2 + y^2 + z^2)(x\,\mathbf i + y\,\mathbf j + z\,\mathbf k)$ and let $S$ be the sphere of radius
$a$ about the origin. Compute $\iint_S \mathbf F\cdot\mathbf n\,dA$ both directly and by the divergence
theorem.

3. Let $R$ be a regular region in $\mathbb R^3$ with piecewise smooth boundary. Show that
the volume of $R$ is $\frac13\iint_{\partial R} \mathbf F\cdot\mathbf n\,dA$ where $\mathbf F(x, y, z) = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$.
4. Prove the following integration-by-parts formula for triple integrals:

where nx is the x-component of the unit outward normal to dR. (Of course,
similar formulas also hold with x replaced by y and z.)

5. Suppose $R$ is a regular region in $\mathbb R^3$ with piecewise smooth boundary, and $f$ is
a function of class $C^2$ on $R$.
a. Show that $\iint_{\partial R} \dfrac{\partial f}{\partial n}\,dA = \iiint_R \nabla^2 f\,dV$.
b. Show that if $\nabla^2 f = 0$, then $\iint_{\partial R} f\,\dfrac{\partial f}{\partial n}\,dA = \iiint_R |\nabla f|^2\,dV$.
6. Let $\mathbf x = (x, y, z)$ and $g(\mathbf x) = |\mathbf x|^{-1} = (x^2 + y^2 + z^2)^{-1/2}$.
a. Compute $\nabla g(\mathbf x)$ for $\mathbf x \ne \mathbf 0$.
b. Show that $\nabla^2 g(\mathbf x) = 0$ for $\mathbf x \ne \mathbf 0$. (Cf. Exercise 9 in §2.6.)
c. Show by direct calculation that $\iint_S (\partial g/\partial n)\,dA = -4\pi$ if $S$ is any sphere
centered at the origin.
d. Since $\partial g/\partial n = \nabla g\cdot\mathbf n$ and $\nabla^2 g = \operatorname{div}(\nabla g)$, why do (b) and (c) not
contradict the divergence theorem?
e. Show that $\iint_{\partial R} (\partial g/\partial n)\,dA = -4\pi$ if $R$ is any regular region with piecewise smooth boundary whose interior contains the origin. (Hint: Consider
the region obtained by excising a small ball about the origin from $R$.)

7. Suppose that $f$ is a $C^2$ function on $\mathbb R^3$ that satisfies Laplace's equation
$\nabla^2 f = 0$.
a. By applying (5.39) to $f$ and $g$, with $g$ as in Exercise 6 and $R = \{\mathbf x : \epsilon <
|\mathbf x| < r\}$, show that the mean values of $f$ on the spheres $|\mathbf x| = r$ and
$|\mathbf x| = \epsilon$ are equal. (Use Exercises 5a and 6.)
b. Conclude that the mean value of $f$ on any sphere centered at the origin is
equal to the value of $f$ at the origin. (Remark: There is nothing special
about the origin here. By applying this result to the function $\mathbf x \mapsto f(\mathbf x + \mathbf a)$, which
also satisfies Laplace's equation, we see that the mean value of $f$ on any
sphere is the value of $f$ at the center. The converse is also true; a function
that has this mean value property must satisfy Laplace's equation.)

5.6 Some Applications to Physics

In this section we illustrate the uses of the divergence theorem by deriving some
important differential equations of mathematical physics. We make a standing assumption that all unspecified mathematical functions that denote physical quantities
are smooth enough to ensure the validity of the calculations.

Flow of Material. We have previously alluded to an interpretation of a vector
field in terms of material flowing through space. We now develop this idea in more
detail.
Suppose there is some substance moving through a region of space; it might
be air, water, electric charge, or whatever. The distribution of the substance is given
by a density function $\rho(\mathbf x, t)$; thus $\rho(\mathbf x, t)\,dV$ is the amount of substance at time $t$
in a small box of volume $dV$ located at the point $\mathbf x = (x, y, z)$. The substance is
moving around, so we also have the velocity field $\mathbf v(\mathbf x, t)$ that gives the velocity of
the substance at position $\mathbf x$ and time $t$.
Now consider a small bit of oriented surface $dS$ (imagined, not physical) with
area $dA$ and normal vector $\mathbf n$ located near the point $\mathbf x$. (We shall picture $dS$ as a
parallelogram, but its exact shape is unimportant.) At what rate does the substance
flow through this bit of surface?
First suppose that $\mathbf n$ is parallel to the velocity $\mathbf v = \mathbf v(\mathbf x, t)$. We picture a small
box with vertical face $dS$ and length $|\mathbf v|\,dt$, where $dt$ is a small increment in time,
as in Figure 5.8a. We assume that the box is sufficiently small so that the
velocity and density are essentially constant throughout the box during the time
interval $(t, t + dt)$. Then the substance that flows through the surface $dS$ in the
time interval $dt$ is just the contents of the box at time $t$. The volume of the box is
$|\mathbf v|\,dt\,dA$, so the amount of substance in the box is $\rho|\mathbf v|\,dt\,dA$. In short, the rate of
flow of substance through $dS$ is $\rho|\mathbf v|\,dA$.

[Figure 5.8: Flow of material through a surface element dS, panels (a) and (b).]

Now suppose, more generally, that the angle from the velocity $\mathbf v$ to the normal
$\mathbf n$ to $dS$ is $\theta$. We apply the same reasoning to the box in Figure 5.8b. The vertical
height of the box is now $|\cos\theta|$ times the slant height of $dS$, so the volume of the
box is $|\mathbf v|\,|\cos\theta|\,dt\,dA = |\mathbf v\cdot\mathbf n|\,dt\,dA$. Therefore, the rate of flow of substance
through $dS$ is $\rho\,\mathbf v\cdot\mathbf n\,dA$ if we take orientation into account, that is, if we count the
flow as negative when it goes across $dS$ in the direction opposite to $\mathbf n$.
Passing from the infinitesimal level to the macroscopic level, we conclude that
the rate of flow of substance through a surface $S$ is

$$\iint_S \mathbf J\cdot\mathbf n\,dA, \quad\text{where } \mathbf J(\mathbf x, t) = \rho(\mathbf x, t)\,\mathbf v(\mathbf x, t).$$

The time-dependent vector field $\mathbf J = \rho\mathbf v$ that occurs here represents the momentum
density if $\rho$ is the mass density of the substance, and it represents the current density
if the substance is electric charge and $\rho$ is the charge density. Our earlier remarks
about interpreting vector fields in terms of flows really mean thinking of the vector
field as a momentum or current density.

A Conservation Law. Now we come to the application of the divergence theorem. In the context of the preceding discussion, suppose that the substance is
conserved, i.e., that it is neither created nor destroyed. Consider a regular region $R$
in space with smooth boundary $\partial R$. The total amount of substance in $R$ at time $t$
is $\iiint_R \rho(\mathbf x, t)\,d^3\mathbf x$. Since the substance is conserved, the only way for this amount
to change is for the substance to flow in or out through $\partial R$. Therefore,

$$\frac{d}{dt}\iiint_R \rho(\mathbf x, t)\,d^3\mathbf x = -\iint_{\partial R} \mathbf J\cdot\mathbf n\,dA.$$

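(The integral on the right is positive when the substance flows out of $R$, i.e., when
the amount of substance in $R$ is decreasing; hence the minus sign.) The quantity
on the left is the integral over $R$ of $\partial\rho/\partial t$, by Theorem 4.47. We can use the
divergence theorem to convert the integral on the right to another integral over $R$,
obtaining

$$\iiint_R \frac{\partial\rho}{\partial t}(\mathbf x, t)\,d^3\mathbf x = -\iiint_R \operatorname{div}\mathbf J(\mathbf x, t)\,d^3\mathbf x. \tag{5.40}$$
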
Now, this relation holds for any region $R$. In particular, let us take $R = B_r$ to
be the ball of radius $r$ centered at the point $\mathbf x$. After division of both sides by the
volume of $B_r$, (5.40) says that the mean values of $\partial\rho/\partial t$ and $-\operatorname{div}\mathbf J$ on $B_r$ are
equal. Letting $r \to 0$ and assuming that these functions are continuous, we see that
their values at the center $\mathbf x$ are equal. In short, we have

$$\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf J = 0, \tag{5.41}$$

the classic differential equation relating the charge and current densities (or mass
and momentum densities, etc.).
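Equation (5.41) can be checked on a simple explicit flow. In this hypothetical example (not from the text), a Gaussian blob of substance is carried rigidly with a constant velocity $\mathbf v$, so $\rho(\mathbf x,t) = g(\mathbf x - t\mathbf v)$ and $\mathbf J = \rho\mathbf v$; the sketch evaluates $\partial\rho/\partial t + \operatorname{div}\mathbf J$ by central differences. The velocity, the sample points, and the step size are arbitrary choices.

```python
import math

H = 1e-5                 # finite-difference step (illustrative)
V = (0.3, -1.2, 0.5)     # hypothetical constant velocity field

def rho(x, t):
    """Hypothetical density: a Gaussian blob carried rigidly with velocity V."""
    return math.exp(-sum((x[k] - V[k] * t) ** 2 for k in range(3)))

def J(x, t):
    """Current density J = rho * v, as in the text."""
    return [rho(x, t) * V[k] for k in range(3)]

def continuity_residual(x, t):
    """Central-difference approximation to the left side of (5.41)."""
    drho_dt = (rho(x, t + H) - rho(x, t - H)) / (2 * H)
    div_J = 0.0
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += H
        xm[k] -= H
        div_J += (J(xp, t)[k] - J(xm, t)[k]) / (2 * H)
    return drho_dt + div_J

print(continuity_residual((0.2, 0.1, -0.4), 0.7))   # ~ 0
```

Neither term is small on its own here (at the sample point $\partial\rho/\partial t \approx -0.7$); only their sum vanishes.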
This argument is reversible; that is, (5.41) implies that the substance is conserved. Indeed, suppose $R$ is a regular region such that no substance flows in or out
of $R$, that is, $\mathbf J\cdot\mathbf n = 0$ on $\partial R$. Integrating (5.41) and using Theorem 4.47 and the divergence theorem, we
obtain

$$\frac{d}{dt}\iiint_R \rho(\mathbf x, t)\,d^3\mathbf x = \iiint_R \frac{\partial\rho}{\partial t}\,d^3\mathbf x = -\iiint_R \operatorname{div}\mathbf J\,d^3\mathbf x = -\iint_{\partial R} \mathbf J\cdot\mathbf n\,dA = 0,$$

so the amount of substance in $R$ remains constant. Although (5.41) is equivalent
to the conservation of the substance, it is more informative than the mere statement
that the substance is neither created nor destroyed; it provides information about how the
substance can move around.
The conservation law (5.41) has an important consequence for an incompressible fluid such as water. Incompressibility means that the density $\rho$ is a constant,
so that on the one hand, $\partial\rho/\partial t = 0$, and on the other, $\operatorname{div}\mathbf J = \operatorname{div}(\rho\mathbf v) = \rho\operatorname{div}\mathbf v$.
Thus, (5.41) implies that the velocity field $\mathbf v$ for an incompressible fluid satisfies
$\operatorname{div}\mathbf v = 0$.

The Heat Equation. We now derive a mathematical model for the transfer
of heat through a substance by diffusion. (If the substance in question is a fluid
like water or air, our model does not take convection effects into account; we must
assume that the fluid is immobile on the macroscopic scale. But our model is valid
for the diffusion of heat in solids as well as in fluids that cannot flow readily, such
as air in a down jacket.) Our model will take the form of a differential equation for
the temperature $u(\mathbf x, t)$ at position $\mathbf x$ and time $t$.
The first basic physical assumption (which may be a simplification of the real-life situation) is that the thermal energy density is proportional to the temperature.
The constant of proportionality $\sigma$ is the specific heat density; it is the product of the
usual specific heat or heat capacity and the mass density of the substance. The total
thermal energy (or "heat," for short) within a region $R$ at time $t$ is then

$$\iiint_R \sigma u(\mathbf x, t)\,d^3\mathbf x.$$

The next assumption is Newton's law of cooling, which says that heat flows
from hotter to colder regions at a rate proportional to the difference in temperature.
In our situation, the precise interpretation of this statement is that the flux of heat
per unit area in the direction of the unit vector $\mathbf n$ at the point $\mathbf x$ is proportional to the
directional derivative $\nabla u(\mathbf x)\cdot\mathbf n$ of the temperature in the direction $\mathbf n$, the constant
of proportionality being negative since heat flows in the direction of decreasing
temperature. Denoting the constant of proportionality by $-K$, then, we see that the
flux of heat across an oriented surface $S$ with normal vector $\mathbf n$ is

$$-\iint_S K\nabla u\cdot\mathbf n\,dA.$$

$K$ is called the thermal conductivity.

Next, the amount of heat in a regular region $R$ can change only by the flow of
heat across the boundary $\partial R$ or by the creation or destruction of heat within $R$ (by
a chemical or nuclear reaction, for example). Thus, if we denote by $F(\mathbf x, t)$ the rate
per unit volume at which heat is being produced at position $\mathbf x$ at time $t$, we have

$$\frac{d}{dt}\iiint_R \sigma u(\mathbf x, t)\,d^3\mathbf x = \iint_{\partial R} K\nabla u(\mathbf x, t)\cdot\mathbf n\,dA + \iiint_R F(\mathbf x, t)\,d^3\mathbf x.$$

Here $\mathbf n$ denotes the unit outward normal to $\partial R$, as usual, and the minus sign on the
surface integral has disappeared because a positive flow of heat out of $R$ represents
a decrease of heat in $R$.
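As in the preceding subsection, we bring the $d/dt$ inside the integral and apply
the divergence theorem to obtain

$$\iiint_R \sigma\frac{\partial u}{\partial t}(\mathbf x, t)\,d^3\mathbf x = \iiint_R \Big[K\nabla^2 u(\mathbf x, t) + F(\mathbf x, t)\Big]\,d^3\mathbf x.$$
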
Since this holds for an arbitrary regular region $R$, we conclude as before that

$$\sigma\frac{\partial u}{\partial t}(\mathbf x, t) = K\nabla^2 u(\mathbf x, t) + F(\mathbf x, t). \tag{5.42}$$

This partial differential equation is known as the (inhomogeneous) heat equation;
it is of fundamental importance in the study of all sorts of diffusion processes. The
important special case $F = 0$ (the homogeneous equation) is what is usually called
the heat equation.
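As an illustration of (5.42), the Gaussian $u(\mathbf x,t) = (4\pi\kappa t)^{-3/2}e^{-|\mathbf x|^2/4\kappa t}$ with $\kappa = K/\sigma$ is a well-known solution of the homogeneous heat equation on $\mathbb R^3$ for $t > 0$. The sketch below, which is not from the text, checks numerically that $\sigma\,\partial u/\partial t - K\nabla^2 u$ vanishes; the values of $\sigma$ and $K$ and the sample points are arbitrary assumptions.

```python
import math

SIGMA, K = 2.0, 0.5      # hypothetical constant specific heat density and conductivity
KAPPA = K / SIGMA        # diffusivity

def u(x, t):
    """Gaussian temperature profile; a standard solution of (5.42) with F = 0."""
    r2 = x[0] ** 2 + x[1] ** 2 + x[2] ** 2
    return (4 * math.pi * KAPPA * t) ** -1.5 * math.exp(-r2 / (4 * KAPPA * t))

def heat_residual(x, t, h=1e-3):
    """sigma * du/dt - K * Laplacian(u) at (x, t), by central differences."""
    du_dt = (u(x, t + h) - u(x, t - h)) / (2 * h)
    lap = 0.0
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        lap += (u(xp, t) - 2 * u(x, t) + u(xm, t)) / h ** 2
    return SIGMA * du_dt - K * lap

print(heat_residual((0.3, -0.2, 0.5), 1.0))   # ~ 0
```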
We have implicitly assumed that the specific heat density $\sigma$ and the thermal
conductivity $K$ are constants. However, the same arguments apply to the more
general situation where they are allowed to depend on position, as will be the case
where the material through which the heat is diffusing varies in some way from
point to point. The reader may verify that the result is the following generalized
heat equation:

$$\sigma(\mathbf x)\frac{\partial u}{\partial t}(\mathbf x, t) = \operatorname{div}\big(K(\mathbf x)\nabla u(\mathbf x, t)\big) + F(\mathbf x, t).$$

Potentials and Laplace's Equation. The electric field generated by a system
of electric charges is the vector field $\mathbf E$ whose value at a point $\mathbf x$ is the force felt
by a unit positive charge located at $\mathbf x$ as the result of the electrostatic attraction or
repulsion to the system of charges. If the system is just a single unit positive charge
at the point $\mathbf p$, the field is given by the usual inverse square law force, $\mathbf E(\mathbf x) =
(\mathbf x - \mathbf p)/|\mathbf x - \mathbf p|^3$. (There should be a constant of proportionality, but we shall
assume that units of measurement have been chosen so that the constant is 1.) For
many purposes, it is more convenient to work with the electric potential $u(\mathbf x) =
|\mathbf p - \mathbf x|^{-1}$, which is related to the electric field $\mathbf E$ by

$$\mathbf E = -\nabla u.$$

(For any points $\mathbf x_1$ and $\mathbf x_2$, $u(\mathbf x_2) - u(\mathbf x_1)$ is the work done in moving a unit positive
charge from $\mathbf x_1$ to $\mathbf x_2$ through the field $\mathbf E$.)
If, instead of a single charge at one point, our system of charges consists of
a number of charges located at different points, the electric field (resp. electric
potential) generated by the system is just the sum of the fields (resp. potentials)
generated by the individual charges. We wish to consider the case where there
is a continuous distribution of charge (an idealization, but a useful one) in some
bounded region of space. That is, we are given a charge density function $\rho(\mathbf p)$, a
continuous function that vanishes outside some bounded set $R$. The field generated
by such a charge distribution is found in the usual way: Chop up the set $R$ into tiny
pieces, treat the charge coming from each piece as a point charge, and add up the
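resulting fields or potentials. We shall work primarily with the potentials, for which
the result is

$$u(\mathbf x) = \iiint \frac{\rho(\mathbf p)}{|\mathbf p - \mathbf x|}\,d^3\mathbf p. \tag{5.43}$$

It will be convenient to make the substitution $\mathbf y = \mathbf p - \mathbf x$. This is just a translation
of coordinates, so its Jacobian is 1, and we obtain

$$u(\mathbf x) = \iiint_{\mathbb R^3} \frac{\rho(\mathbf x + \mathbf y)}{|\mathbf y|}\,d^3\mathbf y. \tag{5.44}$$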

A couple of comments are in order about this integral. We have written it as
an integral over $\mathbb R^3$, but it really extends only over the bounded region $R - \mathbf x =
\{\mathbf y : \mathbf x + \mathbf y \in R\}$, outside of which $\rho(\mathbf x + \mathbf y) = 0$. The integral is improper because of
the singularity of $|\mathbf y|^{-1}$ at the origin, but one can easily see that it is absolutely
convergent by Proposition 4.65.
The main object of this subsection is to derive an important differential equation
relating $u$ and $\rho$. The key point is the fact that the Laplacian of $|\mathbf y|^{-1}$ vanishes
except at the origin (where it is undefined):

$$\nabla^2(|\mathbf y|^{-1}) = 0 \quad\text{for } \mathbf y \ne \mathbf 0. \tag{5.45}$$

The proof is a straightforward calculation (Exercise 2c in §5.4 or Exercise 6b in §5.5).
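Equation (5.45) can be spot-checked with a finite-difference Laplacian. The sketch below is added for illustration, with an arbitrary step size and sample points. It shows that the discrete Laplacian of $|\mathbf y|^{-1}$ is essentially zero at points away from the origin, while for the non-harmonic function $|\mathbf y|^2$ it returns 6. Near the origin the truncation error of the difference quotient grows like $|\mathbf y|^{-5}$, so the approximation is only meaningful well away from the singularity.

```python
import math

def inv_norm(y):
    """The function 1/|y| on R^3 with the origin removed."""
    return 1.0 / math.sqrt(y[0] ** 2 + y[1] ** 2 + y[2] ** 2)

def laplacian(f, y, h=1e-3):
    """Second-order central-difference approximation to the Laplacian (5.32) in R^3."""
    total = 0.0
    for k in range(3):
        yp, ym = list(y), list(y)
        yp[k] += h
        ym[k] -= h
        total += (f(yp) - 2 * f(y) + f(ym)) / h ** 2
    return total

for y in [(1.0, 0.0, 0.0), (0.5, -1.0, 0.8), (2.0, 1.0, -2.0)]:
    print(y, laplacian(inv_norm, y))   # ~ 0, as (5.45) states
print(laplacian(lambda y: y[0] ** 2 + y[1] ** 2 + y[2] ** 2, (0.5, -1.0, 0.8)))   # ~ 6
```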

5.46 Theorem. Suppose $\rho$ is a function of class $C^2$ on $\mathbb R^3$ that vanishes outside
a bounded set, and let $u$ be defined by (5.44). Then $u$ is of class $C^2$ and $\nabla^2 u =
-4\pi\rho$.

Proof. We can differentiate $u$ by passing the derivatives under the integral sign.
They fall on $\rho$, which is assumed to be of class $C^2$, so $u$ is of class $C^2$ and

$$\nabla^2 u(\mathbf x) = \iiint_{\mathbb R^3} \frac{\nabla^2\rho(\mathbf x + \mathbf y)}{|\mathbf y|}\,d^3\mathbf y.$$

(Strictly speaking, Theorem 4.47 does not apply because of the singularity of the
integrand at the origin, but this is a minor technicality. One can finesse the problem,
for example, by switching to spherical coordinates, in which the $r^2\sin\varphi$ coming
from the volume element cancels the $r^{-1}$ of the integrand with room to spare.)
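Here $\nabla^2\rho(\mathbf x + \mathbf y)$ is obtained by differentiating $\rho$ with respect to $\mathbf x$, but the same
result is obtained by taking the derivatives with respect to $\mathbf y$, for $\partial_{x_j}[\rho(\mathbf x + \mathbf y)] =
(\partial_j\rho)(\mathbf x + \mathbf y) = \partial_{y_j}[\rho(\mathbf x + \mathbf y)]$. We can therefore use Green's formula to transfer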
the derivatives to |y| 1. We need to take some care, however, since the singularity
of |y|-1 does not remain harmless after being differentiated twice.
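Let us fix the point $\mathbf{x}$ and choose positive numbers $\epsilon$ and $K$, with $\epsilon < 1$ and $K$ large enough so that $\rho(\mathbf{x}+\mathbf{y}) = 0$ if $|\mathbf{y}| > K - 1$. Let $R_{\epsilon,K} = \{\mathbf{y} : \epsilon < |\mathbf{y}| < K\}$. We then have

$$\nabla^2 u(\mathbf{x}) = \lim_{\epsilon \to 0} \iiint_{R_{\epsilon,K}} \frac{\nabla^2\rho(\mathbf{x}+\mathbf{y})}{|\mathbf{y}|}\, d^3\mathbf{y}.$$

The integrand has no singularities in the region $R_{\epsilon,K}$, so we can apply Green's formula (5.39) to obtain

$$\nabla^2 u(\mathbf{x}) = \lim_{\epsilon \to 0} \left\{ \iiint_{R_{\epsilon,K}} \rho(\mathbf{x}+\mathbf{y})\, \nabla^2\big(|\mathbf{y}|^{-1}\big)\, d^3\mathbf{y} + \iint_{\partial R_{\epsilon,K}} \Big[ \nabla\rho(\mathbf{x}+\mathbf{y})\,|\mathbf{y}|^{-1} - \rho(\mathbf{x}+\mathbf{y})\,\nabla\big(|\mathbf{y}|^{-1}\big) \Big] \cdot \mathbf{n}\, dA \right\}.$$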

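The integral over $R_{\epsilon,K}$ on the right vanishes by (5.45). Also, the boundary of $R_{\epsilon,K}$ consists of two pieces, the sphere $|\mathbf{y}| = K$ and the sphere $|\mathbf{y}| = \epsilon$, and the integral over $|\mathbf{y}| = K$ is zero because $\rho(\mathbf{x}+\mathbf{y})$ and its derivatives vanish for $|\mathbf{y}| > K - 1$. Therefore,

$$\nabla^2 u(\mathbf{x}) = \lim_{\epsilon \to 0} \iint_{|\mathbf{y}| = \epsilon} \Big[ \nabla\rho(\mathbf{x}+\mathbf{y})\,|\mathbf{y}|^{-1} - \rho(\mathbf{x}+\mathbf{y})\,\nabla\big(|\mathbf{y}|^{-1}\big) \Big] \cdot \mathbf{n}\, dA. \tag{5.47}$$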

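To estimate the first term in (5.47), let $C$ be an upper bound for $|\nabla\rho|$ (one exists, since $\nabla\rho$ is continuous and vanishes outside a bounded set). Then

$$\left| \iint_{|\mathbf{y}| = \epsilon} \frac{\nabla\rho(\mathbf{x}+\mathbf{y}) \cdot \mathbf{n}}{|\mathbf{y}|}\, dA \right| \le \iint_{|\mathbf{y}| = \epsilon} \frac{C}{\epsilon}\, dA = \frac{C}{\epsilon} \cdot 4\pi\epsilon^2 = 4\pi C \epsilon, \tag{5.48}$$

which vanishes as $\epsilon \to 0$. To evaluate the second term in (5.47), we observe that $\mathbf{n} = -\epsilon^{-1}\mathbf{y}$. (See the remark preceding the exercises in §5.5.) An easy calculation gives $\nabla(|\mathbf{y}|^{-1}) = -\mathbf{y}/|\mathbf{y}|^3$, so $\nabla(|\mathbf{y}|^{-1}) \cdot \mathbf{n} = \epsilon^{-1}|\mathbf{y}|^2/|\mathbf{y}|^3 = \epsilon^{-2}$. Therefore, (5.47) and (5.48) show that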

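$$\nabla^2 u(\mathbf{x}) = -\lim_{\epsilon \to 0} \iint_{|\mathbf{y}| = \epsilon} \frac{\rho(\mathbf{x}+\mathbf{y})}{\epsilon^2}\, dA = (-4\pi) \lim_{\epsilon \to 0} \left[ \frac{1}{4\pi\epsilon^2} \iint_{|\mathbf{y}| = \epsilon} \rho(\mathbf{x}+\mathbf{y})\, dA \right].$$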
250 Chapter 5. Line and Surface Integrals; Vector Analysis

But the expression inside the brackets is just the mean value of $\rho(\mathbf{x}+\mathbf{y})$ on the sphere $|\mathbf{y}| = \epsilon$, which tends to $\rho(\mathbf{x})$ as $\epsilon \to 0$, so the proof is complete. □
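The final step uses the fact that the mean value of a continuous function over the sphere $|\mathbf{y}| = \epsilon$ tends to its value at the center. A minimal numerical illustration, assuming a hypothetical smooth test density $\rho(\mathbf{p}) = e^{-|\mathbf{p}|^2}$ (not from the text) and a midpoint rule in spherical coordinates: it prints $-4\pi$ times the bracketed mean alongside $-4\pi\rho(\mathbf{x})$ for shrinking $\epsilon$.

```python
import math


def sphere_mean(rho, x, eps, n_phi=100, n_theta=200):
    """Approximate mean value of rho over the sphere of radius eps centered at x.

    Midpoint rule in spherical coordinates (phi = polar angle, theta = azimuth).
    The area element is eps^2 sin(phi) dphi dtheta; dividing by the same
    quadrature applied to the constant 1 makes constants come out exactly.
    """
    dphi = math.pi / n_phi
    dtheta = 2.0 * math.pi / n_theta
    total = weight = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        w = math.sin(phi)
        for k in range(n_theta):
            theta = (k + 0.5) * dtheta
            p = (x[0] + eps * math.sin(phi) * math.cos(theta),
                 x[1] + eps * math.sin(phi) * math.sin(theta),
                 x[2] + eps * math.cos(phi))
            total += w * rho(p)
            weight += w
    return total / weight


def rho(p):
    """Hypothetical smooth test density, chosen only for this illustration."""
    return math.exp(-(p[0] ** 2 + p[1] ** 2 + p[2] ** 2))


x = (0.3, -0.2, 0.5)
for eps in (0.5, 0.1, 0.01):
    print(eps, -4 * math.pi * sphere_mean(rho, x, eps), -4 * math.pi * rho(x))
```

For a smooth $\rho$ the gap shrinks roughly like $\epsilon^2$, since the first-order terms of the Taylor expansion average to zero over the sphere.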

Remark. The hypothesis that $\rho$ is of class $C^2$ can be weakened ($C^1$ is more than enough); we impose it simply to avoid technicalities in the proof. In fact, if $\rho$ vanishes outside a bounded set and is integrable there, then the equation $\nabla^2 u = -4\pi\rho$ holds on any open set on which $\rho$ is $C^1$. The key ideas of the proof are all present in the preceding argument.

5.49 Corollary. The electric field $\mathbf{E}$ is related to the charge density $\rho$ by $\operatorname{div}\mathbf{E} = 4\pi\rho$.

Proof. $4\pi\rho = \operatorname{div}(-\nabla u) = \operatorname{div}\mathbf{E}$. □

The differential equation $\nabla^2 u = -4\pi\rho$ is called the inhomogeneous Laplace equation or Poisson equation. The special case $\nabla^2 u = 0$, valid in regions where there are no charges, is the (homogeneous) Laplace equation. These equations have been extensively studied; solutions of $\nabla^2 u = 0$, in particular, have many interesting properties and applications in many areas.
Everything we have said applies also to gravitational potentials and fields generated by mass distributions with mass density $\rho$, except for some minus signs coming from the fact that masses attract whereas like charges repel. Specifically, the gravitational potential is given by $u(\mathbf{x}) = -\iiint \rho(\mathbf{x}+\mathbf{y})|\mathbf{y}|^{-1}\, d^3\mathbf{y}$, and it satisfies $\nabla^2 u = 4\pi\rho$.
It should be noted that the preceding discussion applies only to situations where
the charge or mass density p is static, that is, unchanging in time. If the charges or
masses move around, things become more complicated. The basic reason is that if
a charge or mass at p is moved to a nearby point p', the potential it induces cannot
change instantly from $|\mathbf{x}-\mathbf{p}|^{-1}$ to $|\mathbf{x}-\mathbf{p}'|^{-1}$ throughout all of space, because the
news of the move can only travel with the speed of light. For electricity, the physics
of time-varying fields is contained in Maxwell's equations, which we shall present
below; for gravity, it is described by general relativity. (If the time dependence
is not too rapid, however, the relativistic effects will be small and the preceding
calculations can be used as a good approximation. This is more often the case with
gravity than with electricity, because gravity is a much weaker interaction.)

Maxwell's Equations. Maxwell's equations are the fundamental differential equations of the classical (unquantized) theory of electricity and magnetism. They relate the electric field $\mathbf{E}$, the magnetic field $\mathbf{B}$, the charge density $\rho$, and the current density $\mathbf{J}$. In suitably normalized units, they are

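$$\begin{aligned} \operatorname{div}\mathbf{E} &= 4\pi\rho, & \operatorname{curl}\mathbf{E} &= -\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t}, \\ \operatorname{div}\mathbf{B} &= 0, & \operatorname{curl}\mathbf{B} &= \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} + \frac{4\pi}{c}\mathbf{J}, \end{aligned} \tag{5.50}$$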

where c is the speed of light. This is not the place for a thorough study of Maxwell's
equations and their consequences for physics, but we wish to point out a couple of
features of them in connection with the ideas we have been developing. In what
follows we shall assume that all functions in question are of class C2, so that the
second derivatives make sense and the mixed partials are equal.
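First, Maxwell's equations contain the law of conservation of charge. Indeed, by formula (5.30) we have

$$\frac{\partial\rho}{\partial t} = \frac{1}{4\pi}\operatorname{div}\frac{\partial\mathbf{E}}{\partial t} = \frac{c}{4\pi}\operatorname{div}(\operatorname{curl}\mathbf{B}) - \operatorname{div}\mathbf{J} = -\operatorname{div}\mathbf{J},$$

and this is the conservation law in the form (5.41). Second, in a region of space with no charges or currents ($\rho = 0$ and $\mathbf{J} = \mathbf{0}$), by formula (5.33) we have

$$\nabla^2\mathbf{E} = \nabla(\operatorname{div}\mathbf{E}) - \operatorname{curl}(\operatorname{curl}\mathbf{E}) = 0 + \frac{1}{c}\operatorname{curl}\frac{\partial\mathbf{B}}{\partial t} = \frac{1}{c}\frac{\partial}{\partial t}\operatorname{curl}\mathbf{B} = \frac{1}{c^2}\frac{\partial^2\mathbf{E}}{\partial t^2}$$

and

$$\nabla^2\mathbf{B} = \nabla(\operatorname{div}\mathbf{B}) - \operatorname{curl}(\operatorname{curl}\mathbf{B}) = 0 - \frac{1}{c}\operatorname{curl}\frac{\partial\mathbf{E}}{\partial t} = -\frac{1}{c}\frac{\partial}{\partial t}\operatorname{curl}\mathbf{E} = \frac{1}{c^2}\frac{\partial^2\mathbf{B}}{\partial t^2}.$$

That is, the components of $\mathbf{E}$ and $\mathbf{B}$ all satisfy the differential equation

$$\nabla^2 f = \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2}.$$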
This is the wave equation, another of the fundamental equations of mathematical physics. It describes the propagation of waves in many different situations; here it concerns electromagnetic radiation: light, radio waves, X-rays, and so on.

EXERCISES

Besides distributions of charge or mass in 3-space, one can consider distributions on surfaces or curves (physically: thin plates or wires). The formula for the associated potential or field is similar to (5.43) except that the triple integral is replaced by a surface or line integral, and the density $\rho$ represents charge or mass per unit area or unit length rather than unit volume. In the following exercises, "uniform" means "of constant density."

1. Consider a uniform distribution of mass on the sphere of radius $r$ about the origin. Show that
a. inside the sphere, the potential is constant and the gravitational field vanishes;
b. outside the sphere, the potential and field are the same as if the entire mass were located at the origin.
2. Consider a uniform distribution of mass on the solid ball of radius R about the
origin. Show that the gravitational field at a point x is the same as if the mass
closer to the origin than x were all located at the origin and the mass farther
from the origin than x (if any) were absent. (Use Exercise 1.)
3. Consider a uniform distribution of charge on the $z$-axis, with density $\rho$ (charge per unit length).
a. Compute the electric field generated by this distribution. (The relevant formula is similar to (5.43), but $1/|\mathbf{p} - \mathbf{x}|$ is replaced by the negative of its gradient with respect to $\mathbf{x}$, namely, $(\mathbf{x} - \mathbf{p})/|\mathbf{x} - \mathbf{p}|^3$.)
b. Show that the modification of (5.43) that presumably gives the potential for this charge distribution is a divergent integral.
c. To resolve the difficulty presented by (b), we make use of the fact that the defining property of the potential $u$, namely $\nabla u = -\mathbf{E}$, only determines $u$ up to an additive constant, so we may subtract constants from $u$ without affecting the physics. Consider instead a uniform distribution of charge on the interval $[-R, R]$ on the $z$-axis with density $\rho$. Compute the potential $u_R$ generated by this distribution, and show that $u_R - 2\rho\log R$ converges as $R \to \infty$ to a function whose gradient is the negative of the field found in (a). (This sort of removal of divergences by "subtracting off infinite constants" is common in quantum field theory, where it is known as renormalization.)
4. Prove the following two-dimensional analogue of Theorem 5.46: Suppose $\rho$ is a function of class $C^2$ on $\mathbb{R}^2$ that vanishes outside a bounded set, and let

$$u(\mathbf{x}) = \iint_{\mathbb{R}^2} \rho(\mathbf{x}+\mathbf{y}) \log|\mathbf{y}|\, d^2\mathbf{y}.$$

Then $u$ is of class $C^2$ and $\nabla^2 u = 2\pi\rho$. (The proof is very similar to that of Theorem 5.46; see Exercise 2d in §5.4.)

5.7 Stokes's Theorem

Stokes's theorem is the generalization of Green's theorem in which the plane is replaced by a curved surface. The precise setting is as follows.
5.7. Stokes's Theorem 253

Figure 5.9: An oriented surface and its positively oriented boundary.

Let $S_0$ be a smooth surface in $\mathbb{R}^3$, and let $S$ be a region in $S_0$ that is bounded by a piecewise smooth curve $\partial S$. By this we mean that $\partial S$ is the boundary of $S$ within the surface $S_0$.¹ (Of course, if we think of $S$ as a subset of $\mathbb{R}^3$, it has no interior and so is its own boundary.) We assume that $S$ is oriented by a choice of normal vector field $\mathbf{n}$, so we can speak of the positive and negative sides of $S$, and we give $\partial S$ the orientation compatible with the orientation of $S$ in the sense we used in Green's theorem. This means, informally speaking, that if you walk around $\partial S$ in the positive direction, standing on the positive side of $S$, then $S$ is on your left. In more mathematical terms, if $\mathbf{t}$ is the unit tangent to $\partial S$ in the forward direction at a point $\mathbf{x} \in \partial S$, then $\mathbf{n} \times \mathbf{t}$, considered as an arrow emanating from $\mathbf{x}$, points into $S$. See Figure 5.9.

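5.52 Theorem (Stokes's Theorem). Let $S$ and $\partial S$ be as described above, and let $\mathbf{F}$ be a $C^1$ vector field defined on some neighborhood of $S$ in $\mathbb{R}^3$. Then

$$\int_{\partial S} \mathbf{F}\cdot d\mathbf{x} = \iint_S (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA. \tag{5.53}$$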

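Proof. If $S$ is a region in the $xy$-plane, then $\mathbf{n} = \mathbf{k} = (0,0,1)$; moreover, $\mathbf{F}\cdot d\mathbf{x}$ involves only the $x$- and $y$-components of $\mathbf{F}$, i.e., $F_1$ and $F_2$, and $(\operatorname{curl}\mathbf{F})\cdot\mathbf{n}$ is the $z$-component of $\operatorname{curl}\mathbf{F}$, namely $\partial_1 F_2 - \partial_2 F_1$. Hence Stokes's theorem reduces to Green's theorem in this case.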

¹Here are the precise definitions: A point $\mathbf{x} \in S$ is in the interior of $S$ relative to $S_0$ if it has a neighborhood $U$ (in $\mathbb{R}^3$) such that $U \cap S_0 \subset S$; it is in the boundary of $S$ relative to $S_0$ if all of its neighborhoods contain points in $S$ and points in $S_0 \setminus S$. $S$ is regular if it is compact and every neighborhood of every (relative) boundary point contains points in the (relative) interior.

Next, suppose that $S$ admits a parametrization $\mathbf{x} = \mathbf{G}(u,v)$, so that $S$ is the image under $\mathbf{G}$ of a regular region $W$ in the $uv$-plane and $\partial S$ is the image of $\partial W$. We assume that this parametrization yields the given orientation on $S$ (otherwise, just switch $u$ and $v$). We use the parametrization to pull back the integrals over $S$ and $\partial S$ to integrals over $W$ and $\partial W$, and we apply Green's theorem to the latter. It is just a matter of seeing that this change of variables works out as it should.
As in the proofs of Green's theorem and the divergence theorem, we consider the components of $\mathbf{F}$ separately. Thus, if we write $\mathbf{F} = F\mathbf{i} + G\mathbf{j} + H\mathbf{k}$, it is enough to prove (5.53) for $F\mathbf{i}$, $G\mathbf{j}$, and $H\mathbf{k}$ separately. All three of them work the same way, so we shall just do $F\mathbf{i}$, for which (5.53) reduces to

$$\int_{\partial S} F(x,y,z)\, dx = \iint_S \big[(\partial_z F)\mathbf{j} - (\partial_y F)\mathbf{k}\big]\cdot\mathbf{n}\, dA. \tag{5.54}$$

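Now, using the parametrization $\mathbf{x} = \mathbf{G}(u,v)$, we have

$$\begin{aligned} \iint_S \big[(\partial_z F)\mathbf{j} - (\partial_y F)\mathbf{k}\big]\cdot\mathbf{n}\, dA &= \iint_W \big[(\partial_z F)\mathbf{j} - (\partial_y F)\mathbf{k}\big]\cdot\left(\frac{\partial\mathbf{G}}{\partial u}\times\frac{\partial\mathbf{G}}{\partial v}\right) du\, dv \\ &= \iint_W \left(\frac{\partial F}{\partial z}\frac{\partial(z,x)}{\partial(u,v)} - \frac{\partial F}{\partial y}\frac{\partial(x,y)}{\partial(u,v)}\right) du\, dv. \end{aligned} \tag{5.55}$$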

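On the other hand, since the formalism of differentials automatically encodes the chain rule,

$$\int_{\partial S} F\, dx = \int_{\partial W} F\left(\frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv\right).$$

(In both of these equations, $F$ and its derivatives are evaluated at $\mathbf{G}(u,v)$.) We apply Green's theorem to this last line integral:

$$\int_{\partial W} \left(F\frac{\partial x}{\partial u}\, du + F\frac{\partial x}{\partial v}\, dv\right) = \iint_W \left[\frac{\partial}{\partial u}\left(F\frac{\partial x}{\partial v}\right) - \frac{\partial}{\partial v}\left(F\frac{\partial x}{\partial u}\right)\right] du\, dv.$$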

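By the product rule and the chain rule, the integrand on the right equals

$$\begin{aligned} &\left[\frac{\partial F}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial u}\right]\frac{\partial x}{\partial v} + F\frac{\partial^2 x}{\partial u\,\partial v} - \left[\frac{\partial F}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial v}\right]\frac{\partial x}{\partial u} - F\frac{\partial^2 x}{\partial v\,\partial u} \\ &\qquad = \frac{\partial F}{\partial z}\frac{\partial(z,x)}{\partial(u,v)} - \frac{\partial F}{\partial y}\frac{\partial(x,y)}{\partial(u,v)}. \end{aligned}$$

(The terms involving $\partial F/\partial x$ cancel, and so do the two second-derivative terms, since the mixed partials of $x(u,v)$ are equal.)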
But this is the integrand on the right side of (5.55), so (5.54) is proved.

Finally, as in the proofs of Green's theorem and the divergence theorem, we obtain Stokes's theorem more generally for surfaces $S$ that can be cut up into a finite number of pieces that each admit a parametrization by applying the preceding argument to the pieces and adding up the results (the line integrals along the cuts cancel, since each cut is traversed once in each direction). Alternatively, we can obtain Stokes's theorem for general surfaces by an adaptation of the proof of Green's theorem in Appendix B.7. □

Example 1. Use Stokes's theorem to compute $\int_C \mathbf{F}\cdot d\mathbf{x}$ where $\mathbf{F}(x,y,z) = \sqrt{x^2+1}\,\mathbf{i} + x\,\mathbf{j} + 2y\,\mathbf{k}$ and $C$ is the intersection of the surfaces $z = xy$ and $x^2 + y^2 = 1$, oriented counterclockwise as viewed from above.
Solution. $C$ is the boundary of the portion $S$ of the surface $z = xy$ inside the cylinder $x^2 + y^2 = 1$, and its orientation is compatible with the orientation of $S$ with the normal pointing upward. We have $\operatorname{curl}\mathbf{F}(x,y,z) = 2\mathbf{i} + \mathbf{k}$ and $\mathbf{n}\, dA = (-y\,\mathbf{i} - x\,\mathbf{j} + \mathbf{k})\, dx\, dy$, so

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \iint_{x^2+y^2<1} (1 - 2y)\, dx\, dy = \pi.$$

(No computation is necessary here; the integral of 1 is the area of the disc and the integral of $-2y$ vanishes by symmetry.)
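Both sides of Stokes's theorem in Example 1 can be checked numerically. A minimal plain-Python sketch, assuming the parametrization $x = \cos t$, $y = \sin t$, $z = \cos t \sin t$ of $C$ (which runs counterclockwise as seen from above) and a polar-coordinate midpoint rule on the disc; the grid sizes are arbitrary choices.

```python
import math


def F(x, y, z):
    return (math.sqrt(x * x + 1.0), x, 2.0 * y)


def line_side(n=2000):
    """Integral of F . dx over C: x = cos t, y = sin t, z = cos t sin t.

    The integrand is smooth and 2*pi-periodic, so the equally spaced
    (periodic trapezoid) rule converges very quickly.
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        fx, fy, fz = F(math.cos(t), math.sin(t), math.cos(t) * math.sin(t))
        dx, dy, dz = -math.sin(t), math.cos(t), math.cos(2.0 * t)
        total += fx * dx + fy * dy + fz * dz
    return total * h


def surface_side(n_r=200, n_theta=400):
    """Integral of (curl F) . n dA = (1 - 2y) dx dy over the unit disc (polar midpoint rule)."""
    dr, dth = 1.0 / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_theta):
            th = (k + 0.5) * dth
            total += (1.0 - 2.0 * r * math.sin(th)) * r * dr * dth
    return total


print(line_side(), surface_side(), math.pi)
```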

There is an interesting feature of Stokes's theorem that does not appear in its siblings. A closed curve in $\mathbb{R}^2$ is the boundary of just one regular region in $\mathbb{R}^2$, and a closed surface in $\mathbb{R}^3$ is the boundary of just one regular region in $\mathbb{R}^3$; but a closed curve in $\mathbb{R}^3$ is the boundary of infinitely many surfaces in $\mathbb{R}^3$! For example, the unit circle in the $xy$-plane is the boundary of the unit disc in the $xy$-plane, the upper and lower hemispheres of the unit sphere in $\mathbb{R}^3$, the portion of the paraboloid $z = 1 - x^2 - y^2$ lying above the unit disc, and so forth. Stokes's theorem says that if $C$ is a closed curve in $\mathbb{R}^3$ and $S$ is any oriented surface bounded by $C$, then

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \iint_S (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA$$

for any $C^1$ vector field $\mathbf{F}$, provided that the orientations on $C$ and $S$ are compatible.

Example 2. Let $\mathbf{F}(x,y,z) = [e^{xz} + e^{x+2y}]\,\mathbf{i} + [\log(2+y+z) + 2e^{x+2y}]\,\mathbf{j} + 3xyz\,\mathbf{k}$. Compute $\iint_S \operatorname{curl}\mathbf{F}\cdot\mathbf{n}\, dA$, where $S$ is the portion of the surface $z = 1 - x^2 - y^2$ above the $xy$-plane, oriented with the normal pointing upward.
Solution. We have $\operatorname{curl}\mathbf{F}(x,y,z) = [3xz - (2+y+z)^{-1}]\,\mathbf{i} + [xe^{xz} - 3yz]\,\mathbf{j}$ and $\mathbf{n}\, dA = (2x\,\mathbf{i} + 2y\,\mathbf{j} + \mathbf{k})\, dx\, dy$, so direct evaluation of the integral is quite unpleasant. By Stokes's theorem, the integral equals $\int_C \mathbf{F}\cdot d\mathbf{x}$ where $C$ is the unit circle in the $xy$-plane; this is not much better. However, by Stokes's theorem again, the latter line integral is equal to $\iint_D \operatorname{curl}\mathbf{F}\cdot\mathbf{n}\, dA$ where $D$ is the unit disc in the $xy$-plane. Here $\mathbf{n} = \mathbf{k}$, so $\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = 0$ and the integral vanishes!
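The "unpleasant" surface integral in Example 2 can also be approximated directly, as a cross-check on the Stokes's-theorem argument. A minimal sketch, assuming the formulas for $\operatorname{curl}\mathbf{F}$ and $\mathbf{n}\,dA$ given in the solution and a polar midpoint rule on the unit disc (grid sizes arbitrary); the printed value should be zero up to rounding.

```python
import math


def curl_F(x, y, z):
    """curl F for the field of Example 2 (its k-component is 0)."""
    return (3.0 * x * z - 1.0 / (2.0 + y + z),
            x * math.exp(x * z) - 3.0 * y * z,
            0.0)


def paraboloid_flux(n_r=300, n_theta=400):
    """Flux of curl F through z = 1 - x^2 - y^2 over the unit disc, normal upward.

    On this surface n dA = (2x, 2y, 1) dx dy; the disc is covered in polar
    coordinates with the midpoint rule.
    """
    dr, dth = 1.0 / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_theta):
            th = (k + 0.5) * dth
            x, y = r * math.cos(th), r * math.sin(th)
            z = 1.0 - x * x - y * y
            c = curl_F(x, y, z)
            total += (2.0 * x * c[0] + 2.0 * y * c[1] + c[2]) * r * dr * dth
    return total


print(paraboloid_flux())
```

The individual terms of the integrand are far from zero; they cancel in pairs, which is exactly what the Stokes's-theorem argument predicts without any computation.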

Here is an analogue of the fact that the integral of the gradient of a function over any closed curve vanishes:

5.56 Corollary. If $S$ is a closed surface (i.e., a surface with no boundary) in $\mathbb{R}^3$ with unit outward normal $\mathbf{n}$, and $\mathbf{F}$ is a $C^1$ vector field on $S$, then $\iint_S (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA = 0$.

Proof. If $\mathbf{F}$ extends differentiably to the region $R$ inside $S$, this follows from the divergence theorem, since $\operatorname{div}(\operatorname{curl}\mathbf{F}) = 0$ for any $\mathbf{F}$. However, it is true even if $\mathbf{F}$ has singularities inside $S$. To see this, draw a small simple closed curve $C$ in $S$ (say, the image of a small circle in the $uv$-plane under a parametrization $\mathbf{x} = \mathbf{G}(u,v)$). $C$ divides $S$ into two regular regions $S_1$ and $S_2$, and we have

$$\iint_S (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA = \iint_{S_1} (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA + \iint_{S_2} (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA. \tag{5.57}$$

On the other hand, if we give $C$ the orientation compatible with $S_1$, Stokes's theorem gives

$$\iint_{S_1} (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA = \int_C \mathbf{F}\cdot d\mathbf{x} = -\iint_{S_2} (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA,$$

because the orientation compatible with $S_2$ is the opposite one. Hence the terms on the right of (5.57) cancel.
(Note: We had to say that $C$ is a "small" closed curve, because otherwise $C$ might not divide $S$ into two pieces. For example, take $S$ to be a torus [the surface of a doughnut] and $C$ to be a circle that goes completely around $S$ in one direction.) □

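Stokes's theorem gives a geometric, coordinate-free interpretation of the curl of a vector field. Namely, suppose $\mathbf{F}$ is a $C^1$ vector field on some open set containing the point $\mathbf{a}$; here's how to find the component of $\operatorname{curl}\mathbf{F}(\mathbf{a})$ in the direction of any unit vector $\mathbf{u}$, that is, $(\operatorname{curl}\mathbf{F}(\mathbf{a}))\cdot\mathbf{u}$. Let $D_\epsilon$ be the disc of radius $\epsilon$ centered at $\mathbf{a}$ in the plane perpendicular to $\mathbf{u}$, oriented so that $\mathbf{u}$ is the positive normal for $D_\epsilon$. As $\epsilon \to 0$, the average value of $(\operatorname{curl}\mathbf{F})\cdot\mathbf{u}$ over $D_\epsilon$ approaches its value at $\mathbf{a}$:

$$(\operatorname{curl}\mathbf{F}(\mathbf{a}))\cdot\mathbf{u} = \lim_{\epsilon\to 0} \frac{1}{\pi\epsilon^2}\iint_{D_\epsilon} (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\, dA.$$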


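Since $\mathbf{u}$ is the normal to $D_\epsilon$, Stokes's theorem gives

$$(\operatorname{curl}\mathbf{F}(\mathbf{a}))\cdot\mathbf{u} = \lim_{\epsilon\to 0} \frac{1}{\pi\epsilon^2}\int_{C_\epsilon} \mathbf{F}\cdot d\mathbf{x}, \tag{5.58}$$

where $C_\epsilon$ is the circle of radius $\epsilon$ about $\mathbf{a}$ in the plane perpendicular to $\mathbf{u}$, traversed counterclockwise as viewed from the side on which $\mathbf{u}$ lies. This is the promised coordinate-free description of $\operatorname{curl}\mathbf{F}$.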
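If we think of $\mathbf{F}$ as a force field, $\int_{C_\epsilon} \mathbf{F}\cdot d\mathbf{x}$ is the work done by $\mathbf{F}$ on a particle that moves around $C_\epsilon$. Thus (5.58) says that $(\operatorname{curl}\mathbf{F}(\mathbf{a}))\cdot\mathbf{u}$ represents the tendency of the force $\mathbf{F}$ to push the particle around $C_\epsilon$, counterclockwise if $(\operatorname{curl}\mathbf{F}(\mathbf{a}))\cdot\mathbf{u}$ is positive and clockwise if it is negative (as viewed from the $\mathbf{u}$-side).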
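Formula (5.58) suggests a direct numerical recipe for a component of the curl: compute the circulation around a small circle and divide by its area. The sketch below is one way to do this, under stated assumptions: the frame `e1`, `e2` spanning the plane perpendicular to $\mathbf{u}$ is built with cross products so that $\mathbf{e}_1 \times \mathbf{e}_2 = \mathbf{u}$ (which makes the increasing parameter run counterclockwise as seen from the $\mathbf{u}$-side), and the circle integral uses the periodic trapezoid rule. It is tried on the field of Example 1, whose curl is the constant $2\mathbf{i} + \mathbf{k}$.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    s = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / s, v[1] / s, v[2] / s)


def circulation_per_area(F, a, u, eps, n=400):
    """(1 / (pi eps^2)) times the integral of F . dx over C_eps, as in (5.58).

    C_eps is the circle of radius eps about a in the plane perpendicular to u,
    traversed counterclockwise as viewed from the side on which u lies.
    """
    u = normalize(u)
    helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(helper, u))  # unit vector perpendicular to u
    e2 = cross(u, e1)                 # unit vector with e1 x e2 = u
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        c, s = math.cos(k * h), math.sin(k * h)
        point = [a[i] + eps * (c * e1[i] + s * e2[i]) for i in range(3)]
        velocity = [eps * (-s * e1[i] + c * e2[i]) for i in range(3)]
        Fp = F(*point)
        total += sum(Fp[i] * velocity[i] for i in range(3))
    return total * h / (math.pi * eps ** 2)


def field_ex1(x, y, z):
    """The field of Example 1; its curl is 2i + k everywhere."""
    return (math.sqrt(x * x + 1.0), x, 2.0 * y)


# u = (1, 2, 2) is normalized inside the function; (2i + k) . u / |u| = 4/3.
print(circulation_per_area(field_ex1, (0.3, -0.5, 0.2), (1.0, 2.0, 2.0), 1e-2))
```

Because the curl of that field is constant, the quotient equals $(\operatorname{curl}\mathbf{F})\cdot\mathbf{u}$ for every $\epsilon$, not just in the limit; for a general $C^1$ field only the limit statement holds.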

EXERCISES

1. Use Stokes's theorem to calculate $\int_C [(x - z)\, dx + (x + y)\, dy + (y + z)\, dz]$ where $C$ is the ellipse where the plane $z = y$ intersects the cylinder $x^2 + y^2 = 1$, oriented counterclockwise as viewed from above.
2. Use Stokes's theorem to evaluate $\int_C [y\, dx + y^2\, dy + (x + 2z)\, dz]$ where $C$ is the curve of intersection of the sphere $x^2 + y^2 + z^2 = a^2$ and the plane $y + z = a$, oriented counterclockwise as viewed from above.
3. Given any nonvertical plane $P$ parallel to the $x$-axis, let $C$ be the curve of intersection of $P$ with the cylinder $x^2 + y^2 = a^2$. Show that $\int_C [(yz - y)\, dx + (xz + x)\, dy] = 2\pi a^2$.
4. Evaluate $\iint_S \operatorname{curl}\mathbf{F}\cdot\mathbf{n}\, dA$ where $\mathbf{F}(x,y,z) = y\,\mathbf{i} + (x - 2x^3z)\,\mathbf{j} + xy^3\,\mathbf{k}$ and $S$ is the upper half of the sphere $x^2 + y^2 + z^2 = a^2$.
5. Let $\mathbf{F}(x,y,z) = 2x\,\mathbf{i} + 2y\,\mathbf{j} + (x^2 + y^2 + z^2)\,\mathbf{k}$ and let $S$ be the lower half of the ellipsoid $(x^2/4) + (y^2/9) + (z^2/27) = 1$. Use Stokes's theorem to calculate the flux of $\operatorname{curl}\mathbf{F}$ across $S$ from the lower side to the upper side.

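6. Define the vector field $\mathbf{F}$ on the complement of the $z$-axis by $\mathbf{F}(x,y,z) = (-y\,\mathbf{i} + x\,\mathbf{j})/(x^2 + y^2)$.
a. Show that $\operatorname{curl}\mathbf{F} = \mathbf{0}$.
b. Show by direct calculation that $\int_C \mathbf{F}\cdot d\mathbf{x} = 2\pi$ for any horizontal circle $C$ centered at a point on the $z$-axis.
c. Why do (a) and (b) not contradict Stokes's theorem?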

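7. Let $C_r$ denote the circle of radius $r$ about the origin in the $xz$-plane, oriented counterclockwise as viewed from the positive $y$-axis. Suppose $\mathbf{F}$ is a $C^1$ vector field on the complement of the $y$-axis in $\mathbb{R}^3$ such that $\int_{C_1} \mathbf{F}\cdot d\mathbf{x} = 5$ and $\operatorname{curl}\mathbf{F}(x,y,z) = 3\mathbf{j} + (z\,\mathbf{i} - x\,\mathbf{k})/(x^2 + z^2)^2$. Compute $\int_{C_r} \mathbf{F}\cdot d\mathbf{x}$ for every $r$.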

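8. Let $S$ be a smooth oriented surface in $\mathbb{R}^3$ with piecewise smooth, compatibly oriented boundary $\partial S$. Suppose $f$ is $C^1$ and $g$ is $C^2$ on some open set containing $S$. Show that

$$\int_{\partial S} (f\,\nabla g)\cdot d\mathbf{x} = \iint_S (\nabla f\times\nabla g)\cdot\mathbf{n}\, dA.$$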

5.8 Integrating Vector Derivatives

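In this section we study the question of solving the equations

$$\operatorname{grad} f = \mathbf{G}, \qquad \operatorname{curl}\mathbf{F} = \mathbf{G}, \qquad \operatorname{div}\mathbf{F} = g$$

for $f$ or $\mathbf{F}$, given $g$ or $\mathbf{G}$. We first consider the equation $\nabla f = \mathbf{G}$, and we begin with a simple and useful result: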

5.59 Proposition. Suppose $\mathbf{G}$ is a continuous vector field on an open set $R$ in $\mathbb{R}^n$. The following two conditions are equivalent:
a. If $C_1$ and $C_2$ are any two oriented curves in $R$ with the same initial point and the same final point, then $\int_{C_1} \mathbf{G}\cdot d\mathbf{x} = \int_{C_2} \mathbf{G}\cdot d\mathbf{x}$.
b. If $C$ is any closed curve in $R$, $\int_C \mathbf{G}\cdot d\mathbf{x} = 0$.

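Proof. (a) implies (b): Suppose $C$ starts and ends at $\mathbf{a}$. Then $C$ has the same initial and final point as the "constant curve" $C_2$ described by $\mathbf{x}(t) = \mathbf{a}$, and obviously $\int_{C_2} \mathbf{G}\cdot d\mathbf{x} = 0$ since $d\mathbf{x} = \mathbf{0}$ on $C_2$.
(b) implies (a): Suppose $C_1$ and $C_2$ start at $\mathbf{a}$ and end at $\mathbf{b}$. Let $C$ be the closed curve obtained by following $C_1$ from $\mathbf{a}$ to $\mathbf{b}$ and then $C_2$ backwards from $\mathbf{b}$ to $\mathbf{a}$. Then $0 = \int_C \mathbf{G}\cdot d\mathbf{x} = \int_{C_1} \mathbf{G}\cdot d\mathbf{x} - \int_{C_2} \mathbf{G}\cdot d\mathbf{x}$. □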

A vector field G that satisfies (a) and (b) is called conservative in the region
R. (The word "conservative" has to do with conservation of energy. If we interpret
G as a force field, condition (b) says that the force does no net work on a particle
that returns to its starting point.) A good deal of mathematical physics is based on
the following characterization of conservative vector fields:

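5.60 Proposition. A continuous vector field $\mathbf{G}$ in an open set $R \subset \mathbb{R}^n$ is conservative if and only if $\mathbf{G}$ is the gradient of a $C^1$ function $f$ on $R$.
Proof. If $\mathbf{G} = \nabla f$ and $C$ is a closed curve parametrized by $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, by the chain rule we have

$$\int_C \nabla f\cdot d\mathbf{x} = \int_a^b \nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)\, dt = \int_a^b \frac{d}{dt} f(\mathbf{g}(t))\, dt = f(\mathbf{g}(b)) - f(\mathbf{g}(a)) = 0$$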
5.8. Integrating Vector Derivatives 259

because $\mathbf{g}(b) = \mathbf{g}(a)$, so condition (b) in Proposition 5.59 is satisfied.

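Conversely, suppose $\mathbf{G}$ is conservative in $R$. To construct a function of which $\mathbf{G}$ is the gradient, we shall assume $R$ is connected. (Otherwise we can consider each connected piece of $R$ separately.) Pick a base point $\mathbf{a} \in R$. For any $\mathbf{x} \in R$, let $C$ be a curve in $R$ from $\mathbf{a}$ to $\mathbf{x}$ (such a curve always exists, by Theorem 1.30), and define $f(\mathbf{x}) = \int_C \mathbf{G}\cdot d\mathbf{x}$. This definition makes sense by condition (a) in Proposition 5.59: It doesn't matter which curve we pick. We shall show that $\mathbf{G} = \nabla f$ by showing that $G_j = \partial_j f$ for each $j$; it is enough to do the case $j = 1$. Let $\mathbf{h} = (h, 0, \ldots, 0)$. Given $\mathbf{x} \in R$, suppose $h$ is small enough so that the line segment $L$ from $\mathbf{x}$ to $\mathbf{x}+\mathbf{h}$ lies entirely in $R$. We have $f(\mathbf{x}) = \int_C \mathbf{G}\cdot d\mathbf{x}$ where $C$ is a curve from $\mathbf{a}$ to $\mathbf{x}$. We can make a curve from $\mathbf{a}$ to $\mathbf{x}+\mathbf{h}$ by joining $L$ onto the end of $C$, so that $f(\mathbf{x}+\mathbf{h}) = \int_C \mathbf{G}\cdot d\mathbf{x} + \int_L \mathbf{G}\cdot d\mathbf{x}$. But then

$$\frac{f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x})}{h} = \frac{1}{h}\int_L \mathbf{G}\cdot d\mathbf{x} = \frac{1}{h}\int_0^h G_1(x_1 + t, x_2, \ldots, x_n)\, dt,$$

and by letting $h \to 0$ we obtain $\partial_1 f(\mathbf{x}) = G_1(\mathbf{x})$. □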

The function $f$ in Proposition 5.60 is determined up to an additive constant, assuming that $R$ is connected. It is called the potential associated to the conservative vector field $\mathbf{G}$.
It remains to find a good method for determining whether a vector field is conservative, i.e., whether it is the gradient of a function. Another way of phrasing this question: When is a differential form $G_1\,dx_1 + \cdots + G_n\,dx_n$ the differential of a function? We shall assume that the vector field $\mathbf{G}$ is of class $C^1$ on an open set $R$. In this case, there is an obvious necessary condition for $\mathbf{G}$ to be a gradient of a function on $R$. Indeed, if $G_j = \partial_j f$, then $\partial_j G_k$ and $\partial_k G_j$ are both equal to the mixed partial $\partial_j\partial_k f$, so

$$\partial_j G_k = \partial_k G_j \quad \text{for all } j \neq k. \tag{5.61}$$

We observe that when $n = 3$, the quantities $\partial_j G_k - \partial_k G_j$ in (5.61) are the components of $\operatorname{curl}\mathbf{G}$, so that (5.61) is equivalent to the condition $\operatorname{curl}\mathbf{G} = \mathbf{0}$.
The condition (5.61) is almost sufficient to guarantee that G is a gradient; the
only possible problem arises from the geometry of R, as we shall explain in more
detail below. When R is convex, the problem disappears, and we have the following
result. Our proof will only be complete in dimensions 2 and 3 because it invokes
Green's or Stokes's theorem, but the same idea works in higher dimensions.

5.62 Theorem. Suppose $R$ is a convex open set in $\mathbb{R}^n$ and $\mathbf{G}$ is a $C^1$ vector field on $R$. If $\mathbf{G}$ satisfies (5.61) in $R$ (which means that $\operatorname{curl}\mathbf{G} = \mathbf{0}$ in $R$ in the case $n = 3$), then $\mathbf{G}$ is the gradient of a $C^2$ function on $R$.

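Proof. The idea is similar to the proof of Proposition 5.60, but we do not know yet that condition (a) of Proposition 5.59 is satisfied, so we must be more careful. Pick a base point $\mathbf{a}$ in $R$, and define $f(\mathbf{x})$ for $\mathbf{x} \in R$ by $f(\mathbf{x}) = \int_{L(\mathbf{a},\mathbf{x})} \mathbf{G}\cdot d\mathbf{x}$, where $L(\mathbf{a},\mathbf{x})$ is the line segment from $\mathbf{a}$ to $\mathbf{x}$. (We need the hypothesis of convexity so that this line segment lies in $R$.) To show that $\mathbf{G}(\mathbf{x}) = \nabla f(\mathbf{x})$, let $\mathbf{h} = (h, 0, \ldots, 0)$ be small enough so that $\mathbf{x}+\mathbf{h} \in R$. Let $C$ be the triangular closed curve obtained by following $L(\mathbf{a},\mathbf{x})$ from $\mathbf{a}$ to $\mathbf{x}$, $L(\mathbf{x},\mathbf{x}+\mathbf{h})$ from $\mathbf{x}$ to $\mathbf{x}+\mathbf{h}$, and then $L(\mathbf{a},\mathbf{x}+\mathbf{h})$ backwards from $\mathbf{x}+\mathbf{h}$ to $\mathbf{a}$. Green's theorem (if $n = 2$), Stokes's theorem (if $n = 3$), or the higher-dimensional version of Stokes's theorem (if $n > 3$; see §5.9) converts $\int_C \mathbf{G}\cdot d\mathbf{x}$ into a double integral over the solid triangle whose boundary is $C$, and the integrand of this double integral vanishes by (5.61). Hence $\int_C \mathbf{G}\cdot d\mathbf{x} = 0$, or in other words,

$$f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x}) = \int_{L(\mathbf{a},\mathbf{x}+\mathbf{h})} \mathbf{G}\cdot d\mathbf{x} - \int_{L(\mathbf{a},\mathbf{x})} \mathbf{G}\cdot d\mathbf{x} = \int_{L(\mathbf{x},\mathbf{x}+\mathbf{h})} \mathbf{G}\cdot d\mathbf{x}.$$

Now the same argument as in Proposition 5.60 shows that $\partial_1 f = G_1$, and likewise $\partial_j f = G_j$ for the other $j$. □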
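The function constructed in the proof, $f(\mathbf{x}) = \int_{L(\mathbf{a},\mathbf{x})} \mathbf{G}\cdot d\mathbf{x}$, can be evaluated numerically by parametrizing the segment as $\mathbf{a} + s(\mathbf{x} - \mathbf{a})$, $0 \le s \le 1$. A minimal sketch, assuming Simpson's rule as the quadrature (an arbitrary choice) and using the curl-free field $yz\,\mathbf{i} + (xz + y)\,\mathbf{j} + (xy - z)\,\mathbf{k}$ of Example 3 below with base point $\mathbf{a} = \mathbf{0}$:

```python
def simpson(f, lo, hi, n=100):
    """Composite Simpson's rule for the integral of f over [lo, hi]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def potential_along_segments(G, a):
    """Return the function x -> integral of G . dx over the segment L(a, x).

    The segment is parametrized as a + s (x - a), 0 <= s <= 1, so dx = (x - a) ds.
    """
    def f(x):
        d = [xi - ai for xi, ai in zip(x, a)]

        def integrand(s):
            g = G(*[ai + s * di for ai, di in zip(a, d)])
            return sum(gi * di for gi, di in zip(g, d))

        return simpson(integrand, 0.0, 1.0)
    return f


def G(x, y, z):
    """A curl-free field on all of R^3 (the field of Example 3)."""
    return (y * z, x * z + y, x * y - z)


f = potential_along_segments(G, (0.0, 0.0, 0.0))
print(f((1.2, -0.7, 0.4)), 1.2 * -0.7 * 0.4 + 0.5 * 0.49 - 0.5 * 0.16)
```

Here the integrand along each segment is a quadratic polynomial in $s$, so Simpson's rule is exact and the result agrees with $xyz + \tfrac12 y^2 - \tfrac12 z^2$ up to rounding.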

The hypothesis of convexity in Theorem 5.62 is stronger than necessary; one can generalize the argument by using curves other than straight lines. What is
crucial is that when one joins the points a, x, and x + h by line segments or curves,
the resulting "triangle" is the boundary of a piece of surface that lies entirely in R,
so that the condition (5.61) and Stokes's theorem can be applied. This may not be
the case if the region R has "holes." The following example shows what can go
wrong in such a case.

Example 1. Let $R$ be the complement of the $z$-axis in $\mathbb{R}^3$, and let

$$\mathbf{G}(x,y,z) = \frac{-y\,\mathbf{i} + x\,\mathbf{j}}{x^2 + y^2}.$$

It is easily verified that the condition $\operatorname{curl}\mathbf{G} = \mathbf{0}$ is satisfied on $R$, but that $\mathbf{G}$ is not conservative on $R$; in fact, $\int_C \mathbf{G}\cdot d\mathbf{x} = 2\pi$ when $C$ is the unit circle. (See Exercise 6 in §5.7.) The key to the mystery is as follows: $\mathbf{G}$ is really the gradient of the angular variable $\theta$ in cylindrical coordinates, but $\theta$ is not a well-defined function on $R$. It is defined only up to multiples of $2\pi$. However, if we choose a convex subregion $S \subset R$ (for example, the half-space $y > 0$), we can choose a well-defined "branch" of the angle $\theta$ on $S$ (for example, $0 < \theta < \pi$), and then $\mathbf{G}$ is the gradient of this function on $S$. The same example can be used in $\mathbb{R}^2$, taking $R$ to be the complement of the origin.
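The behavior described in Example 1 is easy to observe numerically: the circulation of $\mathbf{G}$ around a horizontal circle is $2\pi$ when the circle winds once around the $z$-axis and $0$ when it does not, even though $\operatorname{curl}\mathbf{G} = \mathbf{0}$ throughout $R$. A plain-Python sketch using the periodic trapezoid rule (the number of nodes is an arbitrary choice):

```python
import math


def G(x, y, z):
    """The field of Example 1, defined off the z-axis."""
    s = x * x + y * y
    return (-y / s, x / s, 0.0)


def circulation(center, radius, n=4000):
    """Integral of G . dx over the horizontal circle of the given radius about
    center = (cx, cy, cz), counterclockwise as viewed from above."""
    cx, cy, cz = center
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        gx, gy, _ = G(x, y, cz)
        total += gx * (-radius * math.sin(t)) + gy * (radius * math.cos(t))
    return total * h


print(circulation((0.0, 0.0, 0.0), 1.0))   # unit circle: winds around the z-axis
print(circulation((0.2, -0.1, 5.0), 1.0))  # off-center, but still winds around the axis
print(circulation((3.0, 0.0, 0.0), 1.0))   # does not wind around the axis
```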

The hypothesis on R that should replace convexity in Theorem 5.62 to give the
best result is that every simple closed curve in R is the boundary of a surface lying
entirely in R. (The proof requires more advanced techniques.) The region R in
Example 1 does not have this property; no closed curve that encircles the z-axis
can be the boundary of a surface in R.
In practice, if $R$ is a rectangular box, to find a function whose gradient is $\mathbf{G}$ one can proceed in a more simple-minded way than is indicated in the proof of Theorem 5.62. Consider the 2-dimensional case, where $R = [a,b]\times[\alpha,\beta]$ and $\mathbf{G}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$. Assuming that $\partial_x Q = \partial_y P$, we begin by integrating $P$ with respect to $x$, including a "constant" of integration that can depend on the other variable $y$:

$$f(x,y) = \int_c^x P(t,y)\, dt + \varphi(y).$$

Here $c$ can be any point in the interval $[a,b]$. Any such $f$ will satisfy $\partial_x f = P$. To obtain $\partial_y f = Q$, differentiate the formula for $f$ with respect to $y$ and use Theorem 4.47:

$$\partial_y f(x,y) = \int_c^x \partial_y P(t,y)\, dt + \varphi'(y) = \int_c^x \partial_x Q(t,y)\, dt + \varphi'(y) = Q(x,y) - Q(c,y) + \varphi'(y).$$

Thus we obtain the desired $f$ by taking $\varphi$ to be an antiderivative of $Q(c,y)$.

The same idea works in $n$ variables. If $\mathbf{G}$ is a vector field on $\mathbb{R}^n$ that satisfies (5.61), we integrate $G_1$ with respect to $x_1$ to get

$$f(\mathbf{x}) = \int_c^{x_1} G_1(t, x_2, \ldots, x_n)\, dt + \varphi(x_2, \ldots, x_n).$$

Then $\partial_1 f = G_1$. Differentiating this formula with respect to $x_2, \ldots, x_n$ and using the facts that $\partial_j G_1 = \partial_1 G_j$, we obtain formulas for $\partial_2\varphi, \ldots, \partial_n\varphi$. The problem is thereby reduced to a similar problem (finding a function with a given gradient) in one less variable, so we can proceed inductively.

Example 2. Let $\mathbf{G}(x,y) = [y^2 e^{xy}]\,\mathbf{i} + [(xy+1)e^{xy} + \cos y]\,\mathbf{j}$. We have

$$\partial_y\big(y^2 e^{xy}\big) = (2y + xy^2)e^{xy} = \partial_x\big[(xy+1)e^{xy} + \cos y\big],$$

so (5.61) is satisfied. To find a function $f$ such that $\nabla f = \mathbf{G}$, we set

$$f(x,y) = \int y^2 e^{xy}\, dx = y e^{xy} + \varphi(y).$$

Then $\partial_y f = (xy+1)e^{xy} + \varphi'(y)$; matching this up with the second component yields $\varphi'(y) = \cos y$, so we can take $\varphi(y) = \sin y$. The general solution is $f(x,y) = y e^{xy} + \sin y + C$.
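The procedure described before Example 2 can be automated once the one-variable integrals are done numerically. A minimal sketch, assuming Simpson's rule and the base point $(c, d) = (0, 0)$, where `d` is a hypothetical name for the lower limit used to fix the antiderivative $\varphi$: it computes $f(x,y) = \int_c^x P(t,y)\,dt + \int_d^y Q(c,s)\,ds$ for the field of Example 2 and compares the result with $y e^{xy} + \sin y$.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule for the integral of f over [lo, hi]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def P(x, y):
    return y * y * math.exp(x * y)


def Q(x, y):
    return (x * y + 1.0) * math.exp(x * y) + math.cos(y)


def potential(x, y, c=0.0, d=0.0):
    """f(x, y) = int_c^x P(t, y) dt + phi(y), where phi(y) = int_d^y Q(c, s) ds
    is the antiderivative of Q(c, y) that vanishes at y = d."""
    return simpson(lambda t: P(t, y), c, x) + simpson(lambda s: Q(c, s), d, y)


for pt in [(1.0, 0.8), (-0.5, 2.0), (0.3, -1.4)]:
    print(pt, potential(*pt), pt[1] * math.exp(pt[0] * pt[1]) + math.sin(pt[1]))
```

With $c = d = 0$ the two integrals work out to $y(e^{xy} - 1)$ and $y + \sin y$, whose sum is exactly the potential found above.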

Example 3. Let $\mathbf{G}(x,y,z) = yz\,\mathbf{i} + (xz + y)\,\mathbf{j} + (xy - z)\,\mathbf{k}$. An easy calculation shows that $\operatorname{curl}\mathbf{G} = \mathbf{0}$. To find a function $f$ such that $\nabla f = \mathbf{G}$, we integrate the first component with respect to $x$, obtaining $f(x,y,z) = xyz + \varphi(y,z)$. Differentiating this in $y$ and $z$ yields $\partial_y f = xz + \partial_y\varphi$ and $\partial_z f = xy + \partial_z\varphi$. Therefore, we must have $\partial_y\varphi = y$ and $\partial_z\varphi = -z$. Integrating the first of these equations with respect to $y$ gives $\varphi(y,z) = \tfrac12 y^2 + \psi(z)$, so $\partial_z\varphi = \psi'(z) = -z$ and $\psi(z) = -\tfrac12 z^2 + C$. Putting this all together,

$$f(x,y,z) = xyz + \tfrac12 y^2 - \tfrac12 z^2 + C.$$

Next, we turn to the question of solving the equation $\operatorname{curl}\mathbf{F} = \mathbf{G}$, where $\mathbf{G}$ is a $C^1$ vector field on some open set $R \subset \mathbb{R}^3$. There is an obvious necessary condition for solvability: Since $\operatorname{div}(\operatorname{curl}\mathbf{F}) = 0$ for any $\mathbf{F}$ (formula (5.31)), we must have $\operatorname{div}\mathbf{G} = 0$ on $R$. Again, this condition turns out to be sufficient provided that $R$ has "no holes," but here the meaning of "no holes" is somewhat different. Instead of requiring that every closed curve in $R$ be the boundary of a surface that lies entirely in $R$, we require that every closed surface in $R$ be the boundary of a 3-dimensional region that lies entirely in $R$. For example, the complement of the $z$-axis in $\mathbb{R}^3$ satisfies the second condition but not the first; the complement of the origin satisfies the first condition but not the second. An example of a vector field $\mathbf{G}$ that satisfies $\operatorname{div}\mathbf{G} = 0$ on the complement of the origin but is not the curl of any vector field there is provided by $\mathbf{G}(\mathbf{x}) = \mathbf{x}/|\mathbf{x}|^3$, the "inverse square law force." This $\mathbf{G}$ cannot be a curl because its integral over a sphere about the origin is nonzero, and this contradicts Corollary 5.56. (See Exercise 6 in §5.5; our $\mathbf{G}$ is the negative of the gradient of the $g$ there.)
Convex regions have no holes, no matter what one means by "holes," and the
following analogue of Theorem 5.62 is valid.

5.63 Theorem. Suppose $R$ is a convex open set in $\mathbb{R}^3$ and $\mathbf{G}$ is a $C^1$ vector field on $R$. If $\mathbf{G}$ satisfies $\operatorname{div}\mathbf{G} = 0$ on $R$, then $\mathbf{G}$ is the curl of a $C^2$ vector field on $R$.

Proof. We shall not give the general proof but shall content ourselves with presenting an algorithm for solving $\operatorname{curl}\mathbf{F} = \mathbf{G}$ when $R$ is a rectangular box, similar to the one given above for solving $\nabla f = \mathbf{G}$. Suppose that $R = [a_1,b_1]\times[a_2,b_2]\times[a_3,b_3]$ and $\mathbf{G}$ is a $C^1$ vector field satisfying $\operatorname{div}\mathbf{G} = 0$ on $R$. Unlike the problem of finding a function with a given gradient, whose solution is unique up to an additive constant, there is lots of freedom in choosing an $\mathbf{F}$ such that $\operatorname{curl}\mathbf{F} = \mathbf{G}$, for if $\operatorname{curl}\mathbf{F} = \mathbf{G}$ then also $\operatorname{curl}(\mathbf{F} + \nabla f) = \mathbf{G}$ for any smooth function $f$. This gives enough leeway to allow us to assume that the $z$-component of $\mathbf{F}$ is zero. Thus, let us write $\mathbf{G} = G_1\mathbf{i} + G_2\mathbf{j} + G_3\mathbf{k}$ and $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j}$; we then want

$$\operatorname{curl}\mathbf{F} = -\partial_z F_2\,\mathbf{i} + \partial_z F_1\,\mathbf{j} + (\partial_x F_2 - \partial_y F_1)\,\mathbf{k} = G_1\mathbf{i} + G_2\mathbf{j} + G_3\mathbf{k}.$$

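We solve the first two equations by taking

$$F_2 = -\int_c^z G_1(x,y,t)\, dt + \varphi(x,y), \qquad F_1 = \int_c^z G_2(x,y,t)\, dt + \psi(x,y),$$

where $c$ is some chosen point in $[a_3,b_3]$. We then have

$$\partial_x F_2 - \partial_y F_1 = -\int_c^z \big[\partial_x G_1(x,y,t) + \partial_y G_2(x,y,t)\big]\, dt + \partial_x\varphi(x,y) - \partial_y\psi(x,y).$$

Since $\operatorname{div}\mathbf{G} = 0$, this equals

$$\int_c^z \partial_t G_3(x,y,t)\, dt + \partial_x\varphi(x,y) - \partial_y\psi(x,y) = G_3(x,y,z) - G_3(x,y,c) + \partial_x\varphi(x,y) - \partial_y\psi(x,y).$$

We therefore achieve our goal by choosing $\varphi$ and $\psi$ to satisfy

$$\partial_x\varphi(x,y) - \partial_y\psi(x,y) = G_3(x,y,c).$$

There is still lots of freedom here; for example, we could take

$$\varphi(x,y) = \int_a^x G_3(t,y,c)\, dt, \qquad \psi(x,y) = 0 \qquad (a \in [a_1,b_1]).$$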

If $\operatorname{div}\mathbf{G} = 0$, a vector field $\mathbf{F}$ such that $\operatorname{curl}\mathbf{F} = \mathbf{G}$ is called a vector potential for $\mathbf{G}$.

Example 4. Find a vector potential for the vector field

$$\mathbf{G}(x,y,z) = (6xz + x^3)\,\mathbf{i} - (3x^2y + y^2)\,\mathbf{j} + (4x + 2yz - 3z^2)\,\mathbf{k}.$$

Solution. First one should verify that $\operatorname{div}\mathbf{G} = 0$ so as not to go on a fool's errand. Having done so, one can take $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j}$ where

$$\partial_z F_1 = -(3x^2y + y^2), \qquad \partial_z F_2 = -(6xz + x^3), \qquad \partial_x F_2 - \partial_y F_1 = 4x + 2yz - 3z^2.$$

Solving the first two equations yields

$$F_1 = -3x^2yz - y^2z + \psi(x,y), \qquad F_2 = -3xz^2 - x^3z + \varphi(x,y),$$

and plugging these results into the third equation yields $\partial_x\varphi - \partial_y\psi = 4x$. Therefore, one solution (with $\varphi = 2x^2$ and $\psi = 0$) is

$$\mathbf{F}_0 = -(3x^2yz + y^2z)\,\mathbf{i} + (2x^2 - 3xz^2 - x^3z)\,\mathbf{j};$$

the general solution is $\mathbf{F} = \mathbf{F}_0 + \nabla f$ where $f$ is an arbitrary $C^2$ function.
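The algorithm in the proof of Theorem 5.63 can be run numerically on the field of Example 4. A minimal sketch, assuming Simpson's rule for the one-variable integrals, base points $a = c = 0$, and the choices $\varphi(x,y) = \int_a^x G_3(t,y,c)\,dt$, $\psi = 0$ from the proof. These choices happen to reproduce the $\mathbf{F}_0$ found above, and a central-difference curl recovers $\mathbf{G}$.

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule for the integral of f over [lo, hi]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def vector_potential(G, a=0.0, c=0.0):
    """Return F = F1 i + F2 j with curl F = G, following the proof of Theorem 5.63:

        F1 = int_c^z G2(x, y, t) dt                      (psi = 0)
        F2 = -int_c^z G1(x, y, t) dt + phi(x, y),  phi = int_a^x G3(t, y, c) dt

    Assumes G is C^1 with div G = 0 on a box containing every point used.
    """
    def F(x, y, z):
        F1 = simpson(lambda t: G(x, y, t)[1], c, z)
        F2 = (-simpson(lambda t: G(x, y, t)[0], c, z)
              + simpson(lambda t: G(t, y, c)[2], a, x))
        return (F1, F2, 0.0)
    return F


def curl(F, p, h=1e-4):
    """Central-difference approximation to curl F at the point p."""
    def d(comp, var):
        plus, minus = list(p), list(p)
        plus[var] += h
        minus[var] -= h
        return (F(*plus)[comp] - F(*minus)[comp]) / (2.0 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


def G(x, y, z):
    """The divergence-free field of Example 4."""
    return (6 * x * z + x ** 3, -(3 * x * x * y + y * y), 4 * x + 2 * y * z - 3 * z * z)


def F0(x, y, z):
    """The explicit vector potential found in Example 4."""
    return (-(3 * x * x * y * z + y * y * z), 2 * x * x - 3 * x * z * z - x ** 3 * z, 0.0)


F = vector_potential(G)
p = (0.7, -0.4, 1.3)
print(F(*p), F0(*p))
print(curl(F, p), G(*p))
```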

Now, what about the equation $\operatorname{div}\mathbf{F} = g$? Here there are no obstructions to solvability, and there is an enormous amount of freedom in finding a solution. For example, if we wish to solve $\operatorname{div}\mathbf{F} = g$ in a rectangular box in $\mathbb{R}^n$, we could take

$$\mathbf{F} = (F, 0, \ldots, 0), \qquad F(\mathbf{x}) = \int_c^{x_1} g(t, x_2, \ldots, x_n)\, dt \quad (c \text{ fixed}),$$

or similar expressions with the variables permuted; there are many other possibilities. In fact, this problem is so easy that it seems reasonable to make it more interesting by imposing additional conditions on $\mathbf{F}$. We restrict attention to the three-dimensional situation, but there are similar results in higher dimensions.
The key result here is Theorem 5.46, which shows that we can solve the equation div F = g subject to the restriction that curl F = 0. More precisely, suppose R is a bounded open set in $\mathbb{R}^3$ and g is of class $C^1$ on $\overline{R}$. (In Theorem 5.46 g was assumed to be $C^2$, but see the remarks following the proof.) Smoothness on $\overline{R}$ means that g can be extended as a $C^1$ function to an open set containing $\overline{R}$, and it can be modified outside R so as to vanish outside some bounded set while remaining of class $C^1$. (One multiplies g by a $C^1$ function that is identically 1 on $\overline{R}$ and vanishes outside some slightly larger region; we omit the details, which are of little importance for this argument.) Hence we may assume that g is $C^1$ on $\mathbb{R}^3$ and vanishes outside a bounded set. Then, by Theorem 5.46, the function

$$u(\mathbf{x}) = -\frac{1}{4\pi}\iiint_{\mathbb{R}^3}\frac{g(\mathbf{y})}{|\mathbf{x}-\mathbf{y}|}\,d^3\mathbf{y}$$
5.8. Integrating Vector Derivatives 265

satisfies $\nabla^2 u = g$, and so the vector field $\mathbf{F} = \nabla u$ satisfies both div F = g and curl F = 0 on R.
With this result in hand, we show that the equations curl F = G and div F = g can be solved simultaneously (for the same F).

5.64 Theorem. Let R be a bounded convex open set in $\mathbb{R}^3$. For any $C^1$ function g on R and any $C^2$ vector field G on R such that div G = 0, there is a $C^2$ vector field F on R such that curl F = G and div F = g on R.

Proof. Let H be a solution of curl H = G, as in Theorem 5.63, and let u be a solution of $\nabla^2 u = g - \operatorname{div}\mathbf{H}$, as explained above. Let $\mathbf{F} = \nabla u + \mathbf{H}$; then $\operatorname{curl}\mathbf{F} = \operatorname{curl}(\nabla u) + \mathbf{G} = \mathbf{G}$ and $\operatorname{div}\mathbf{F} = \nabla^2 u + \operatorname{div}\mathbf{H} = g$. □

There is a companion result to Theorem 5.64: Not every vector field is a gradient, and not every vector field is a curl, but every vector field is the sum of a
gradient and a curl. The proof is left to the reader as Exercise 3, where a more
precise statement is given.
One might also ask about uniqueness in Theorem 5.64; that is, to what extent is
a vector field determined by its curl and divergence? Clearly, if F satisfies curl F =
G and div F = g, then so does F + H whenever curl H = 0 and div H = 0.
Solutions of the latter pair of equations can be obtained simply by taking $\mathbf{H} = \nabla\varphi$ where φ is any solution of Laplace's equation $\nabla^2\varphi = 0$. Such solutions exist in
great abundance, so the F in Theorem 5.64 is far from unique. However, one can
pin down a unique solution by imposing suitable boundary conditions.

5.65 Proposition. Let R be a bounded convex open set in $\mathbb{R}^3$ with piecewise smooth boundary. Suppose H is a $C^1$ vector field on R such that curl H = 0 and div H = 0 on R and $\mathbf{H}\cdot\mathbf{n} = 0$ on ∂R. Then H vanishes identically on R.

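Proof. By Theorem 5.62, H is the gradient of a function u on R, and $\nabla^2 u = \operatorname{div}\mathbf{H} = 0$. Since $\mathbf{H}\cdot\mathbf{n} = \partial u/\partial n$, by Green's formula (5.38) we have

$$0 = \iint_{\partial R} u\,\frac{\partial u}{\partial n}\,dA = \iiint_R \left(|\nabla u|^2 + u\nabla^2 u\right)dV = \iiint_R \left(|\mathbf{H}|^2 + 0\right)dV.$$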
But |H|2 is a nonnegative continuous function, so its integral over R can be zero
only if |H|2 (and hence H) vanishes identically on R. □

By applying Proposition 5.65 to the difference of two solutions of the problem in Theorem 5.64, we see that if F and F' are vector fields with the same curl and divergence on R and the same normal component on ∂R, then F = F' on R.
We conclude with a few remarks about the application of the results of this
section to Maxwell's equations (5.50). First, we observe that the curl of the electric
field E vanishes only when there are no time-varying magnetic fields present. Only in this case is E the gradient of a potential function. However, div B = 0 always (this expresses the fact that there are no "magnetic charges"), so B is the curl of a vector potential A. We then have

$$\operatorname{curl}\left(\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t}\right) = \operatorname{curl}\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} = 0,$$

so $\mathbf{E} + c^{-1}\partial_t\mathbf{A}$ is the gradient of a function $-\varphi$. The four-component quantity $(\varphi, \mathbf{A}) = (\varphi, A_1, A_2, A_3)$ is called the electromagnetic 4-potential. In the context of special relativity, it is best regarded as a vector in 4-dimensional space-time, with φ being the time component.

EXERCISES

1. Determine whether each of the following vector fields is the gradient of a function f, and if so, find f. The vector fields in (a)-(c) are on $\mathbb{R}^2$; those in (d)-(f) are on $\mathbb{R}^3$, and the one in (g) is on $\mathbb{R}^4$. In all cases i, j, k, and l denote unit vectors along the positive x-, y-, z-, and w-axes.
a. $\mathbf{G}(x,y) = (2xy + x^2)\,\mathbf{i} + (x^2 - y^2)\,\mathbf{j}$.
b. $\mathbf{G}(x,y) = (3y^2 + 5x^4y)\,\mathbf{i} + (x^5 - 6xy)\,\mathbf{j}$.
c. $\mathbf{G}(x,y) = (2e^{2x}\sin y - 3y + 5)\,\mathbf{i} + (e^{2x}\cos y - 3x)\,\mathbf{j}$.
d. $\mathbf{G}(x,y,z) = (yz - y\sin xy)\,\mathbf{i} + (xz - x\sin xy + z\cos yz)\,\mathbf{j} + (xy + y\cos yz)\,\mathbf{k}$.
e. $\mathbf{G}(x,y,z) = (y - z)\,\mathbf{i} + (x - z)\,\mathbf{j} + (x - y)\,\mathbf{k}$.
f. $\mathbf{G}(x,y,z) = 2xy\,\mathbf{i} + (x^2 + \log z)\,\mathbf{j} + ((y + 2)/z)\,\mathbf{k}$ (z > 0).
g. $\mathbf{G}(x,y,z,w) = (xw^2 + yzw)\,\mathbf{i} + (xzw + yz^2 - 2e^{2y+z})\,\mathbf{j} + (xyw + y^2z - e^{2y+z} - w\sin zw)\,\mathbf{k} + (xyz + x^2w - z\sin zw)\,\mathbf{l}$.
2. Determine whether each of the following vector fields is the curl of a vector
field F, and if so, find such an F.
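a. $\mathbf{G}(x,y,z) = (x^3 + yz)\,\mathbf{i} + (y - 3x^2y)\,\mathbf{j} + 4y^2\,\mathbf{k}$.
b. $\mathbf{G}(x,y,z) = (xy + z)\,\mathbf{i} + xz\,\mathbf{j} - (yz + x)\,\mathbf{k}$.
c. $\mathbf{G}(x,y,z) = (xe^{-x^2z^2} - 6x)\,\mathbf{i} + (5y + 2z)\,\mathbf{j} + (z - ze^{-x^2z^2})\,\mathbf{k}$.
3. Let R be a bounded convex open set in $\mathbb{R}^3$. Show that for any $C^2$ vector field H on R there exist a $C^2$ function f and a $C^2$ vector field G such that $\mathbf{H} = \operatorname{grad} f + \operatorname{curl}\mathbf{G}$. (Hint: Solve $\nabla^2 f = \operatorname{div}\mathbf{H}$.)
4. Let $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j}$ be a $C^1$ vector field on $S = \mathbb{R}^2\setminus\{(0,0)\}$ such that $\partial_1F_2 = \partial_2F_1$ on S (but F may be singular at the origin).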
5.9. Higher Dimensions and Differential Forms 267

a. Let $C_r$ be the circle of radius r about the origin, oriented counterclockwise. Show that $\int_{C_r}\mathbf{F}\cdot d\mathbf{x}$ is a constant a that does not depend on r. (Hint: Consider the region between two circles.)
b. Show that $\int_C\mathbf{F}\cdot d\mathbf{x} = a$ for any simple closed curve C, oriented counterclockwise, that encircles the origin.
c. Let $\mathbf{F}_0 = (x\mathbf{j} - y\mathbf{i})/(x^2 + y^2)$ as in Example 1. Show that $\mathbf{F} - (a/2\pi)\mathbf{F}_0$ is the gradient of a function on S. (Thus, all curl-free vector fields on S that are not gradients can be obtained from $\mathbf{F}_0$ by adding gradients.)

5.9 Higher Dimensions and Differential Forms

Green's theorem has to do with integrals of vector fields in the plane, and the divergence theorem and Stokes's theorem have to do with integrals of vector fields in 3-space. What happens in dimension n? There are a couple of things we can say without too much additional explanation.
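First, the obvious analogue of the divergence theorem holds in $\mathbb{R}^n$ for any n > 1. To wit, if R is a regular region in $\mathbb{R}^n$ bounded by a piecewise smooth hypersurface ∂R, and F is a $C^1$ vector field on R, then

$$\int\!\cdots\!\int_{\partial R}\mathbf{F}\cdot\mathbf{n}\,dV^{n-1} = \int\!\cdots\!\int_R \operatorname{div}\mathbf{F}\,dV^n.$$

Here $dV^n$ is the n-dimensional volume element in $\mathbb{R}^n$ and $dV^{n-1}$ is the (n - 1)-dimensional "area" element on ∂R. The "vector area element" $\mathbf{n}\,dV^{n-1}$ is given by a formula analogous to the one in $\mathbb{R}^3$. Namely, if (part of) ∂R is parametrized by $\mathbf{x} = \mathbf{G}(u_1,\ldots,u_{n-1})$, then

$$\mathbf{n}\,dV^{n-1} = \pm\det\begin{pmatrix}\mathbf{e}_1 & \cdots & \mathbf{e}_n\\ \partial_1 G_1 & \cdots & \partial_1 G_n\\ \vdots & & \vdots\\ \partial_{n-1}G_1 & \cdots & \partial_{n-1}G_n\end{pmatrix} du_1\cdots du_{n-1},$$

where $\mathbf{e}_1,\ldots,\mathbf{e}_n$ are the standard basis vectors for $\mathbb{R}^n$. (The reader may verify that in the case n = 2, these formulas yield Green's theorem in the form (5.18).)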
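The formal determinant can be evaluated by expanding along its first row. The coefficient of $\mathbf{e}_i$ is $(-1)^{i+1}$ times the minor obtained by deleting column i (counting from 1). The Python sketch below does this with exact rational arithmetic for n - 1 tangent vectors; the helper names are our own. It reproduces the cross product when n = 3. In any dimension it returns a vector orthogonal to all the tangent vectors. It does not choose the sign ±, which must still be fixed by the orientation.

```python
from fractions import Fraction

def det(rows):
    """Determinant of a square matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result              # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result

def normal_from_tangents(tangents):
    """Expand the formal determinant whose first row is e_1, ..., e_n and whose other
    rows are the n-1 tangent vectors: component i (0-based) is (-1)**i times the
    minor obtained by deleting column i."""
    n = len(tangents) + 1
    if any(len(t) != n for t in tangents):
        raise ValueError("need n-1 vectors in R^n")
    return [(-1) ** i * det([[row[j] for j in range(n) if j != i] for row in tangents])
            for i in range(n)]

# A normal vector in R^4 to three (arbitrarily chosen) tangent vectors.
tangents_4d = [(1, 2, 0, 1), (0, 1, 3, 2), (2, 0, 1, 1)]
N = normal_from_tangents(tangents_4d)
```

For n = 2 and a single tangent vector (x', y'), the function returns (y', -x'). That is the outward normal direction for a counterclockwise boundary curve, as used in Green's theorem.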
Second, the analogue of the divergence theorem in dimension 1 is just the fundamental theorem of calculus:

$$f(b) - f(a) = \int_{[a,b]} f'(t)\,dt.$$

On the real line, vector fields are the same thing as functions, and the divergence of
a vector field is just the derivative of a function. A regular region in R is an interval
[a, b], whose boundary is the two-element set {a, b}. Since the boundary is finite,
"integration" over the boundary is just summation, and the minus sign on f(a)
comes from assigning the proper "orientation" to the two points in the boundary.
There are also analogues of Stokes's theorem in higher dimensions, which say that the integral of some gadget G over the boundary of a k-dimensional submanifold of $\mathbb{R}^n$ equals the integral of another gadget formed from the first derivatives of G over the submanifold itself. However, to formulate things properly in this general setting, it is necessary to develop some additional algebraic machinery, the theory of differential forms. To do so is beyond the scope of this book; what follows is intended to provide an informal introduction to the ideas involved. For a detailed treatment of differential forms, we refer the reader to Hubbard and Hubbard [7] and Weintraub [19].
Roughly speaking, a differential k-form is an object whose mission in life is to be integrated over k-dimensional sets; thus, 1-forms are designed to be integrated over curves, 2-forms are designed to be integrated over surfaces, and so on. Here is how the ideas of vector analysis that we have been studying can be reformulated in terms of differential forms.

1-Forms. A differential 1-form on $\mathbb{R}^n$ is an expression of the form

$$\omega = F_1(x_1,\ldots,x_n)\,dx_1 + \cdots + F_n(x_1,\ldots,x_n)\,dx_n,$$

where the $F_j$'s are continuous functions. There is an obvious correspondence between the 1-form ω and the vector field $\mathbf{F} = (F_1,\ldots,F_n)$. In particular, in 3 dimensions the correspondence between 1-forms and vector fields takes the form

$$\omega = F\,dx + G\,dy + H\,dz \quad\longleftrightarrow\quad \mathbf{F} = F\mathbf{i} + G\mathbf{j} + H\mathbf{k}.\tag{5.66}$$

One type of 1-form that we have already encountered is the differential of a $C^1$ function,

$$df = (\partial_1 f)\,dx_1 + \cdots + (\partial_n f)\,dx_n.$$

However, not every 1-form is the differential of a function; the necessary condition for ω to be of the form df is (5.61).
We note that the set of 1-forms on $\mathbb{R}^n$ is a vector space. That is, it makes sense to add 1-forms to each other and to multiply them by scalars. In fact, the "scalars" here can be taken to be not just constants but arbitrary continuous functions on $\mathbb{R}^n$. Thus, if $\alpha = A_1\,dx_1 + \cdots + A_n\,dx_n$ and $\beta = B_1\,dx_1 + \cdots + B_n\,dx_n$ are 1-forms and f is a continuous function,

$$\alpha + \beta = (A_1 + B_1)\,dx_1 + \cdots + (A_n + B_n)\,dx_n, \qquad f\alpha = (fA_1)\,dx_1 + \cdots + (fA_n)\,dx_n.$$

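Any smooth mapping $T:\mathbb{R}^k\to\mathbb{R}^n$ induces a mapping of 1-forms in the opposite direction, that is, an operation $T^*$ which takes 1-forms on $\mathbb{R}^n$ to 1-forms on $\mathbb{R}^k$. Schematically:

$$\mathbb{R}^k \xrightarrow{\;T\;} \mathbb{R}^n, \qquad \{\text{1-forms on }\mathbb{R}^k\} \xleftarrow{\;T^*\;} \{\text{1-forms on }\mathbb{R}^n\}.$$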

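This operation is just the "built-in chain rule" for differentials of functions, extended to arbitrary 1-forms. To wit, let $x_1,\ldots,x_n$ and $u_1,\ldots,u_k$ be the coordinates on $\mathbb{R}^n$ and $\mathbb{R}^k$, respectively. If $\omega = A_1\,dx_1 + \cdots + A_n\,dx_n$ is a 1-form on $\mathbb{R}^n$, its pullback via T is the 1-form $T^*\omega$ on $\mathbb{R}^k$ defined by substituting into ω the expressions for the x's in terms of the u's and the dx's in terms of the du's:

$$\begin{aligned} T^*\omega &= \tilde A_1\left(\frac{\partial x_1}{\partial u_1}du_1 + \cdots + \frac{\partial x_1}{\partial u_k}du_k\right) + \cdots + \tilde A_n\left(\frac{\partial x_n}{\partial u_1}du_1 + \cdots + \frac{\partial x_n}{\partial u_k}du_k\right)\\ &= \left(\sum_{m=1}^n \tilde A_m\frac{\partial x_m}{\partial u_1}\right)du_1 + \cdots + \left(\sum_{m=1}^n \tilde A_m\frac{\partial x_m}{\partial u_k}\right)du_k, \end{aligned}\tag{5.67}$$

where

$$\tilde A_m(u_1,\ldots,u_k) = A_m(T(u_1,\ldots,u_k)).$$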
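Two special cases are of particular interest. First, the chain rule says that when $\omega = df$, $T^*\omega = d(f\circ T)$. Second, when k = 1 so that $T:\mathbb{R}\to\mathbb{R}^n$ defines a curve in $\mathbb{R}^n$, (5.67) becomes

$$T^*\omega = \left(\tilde A_1\frac{dx_1}{du} + \cdots + \tilde A_n\frac{dx_n}{du}\right)du.$$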
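Formula (5.67) says that the coefficient of $du_j$ in $T^*\omega$ is $\sum_m \tilde A_m\,\partial x_m/\partial u_j$. In other words, the row of coefficients of ω is multiplied by the Jacobian matrix of T. The sketch below computes this numerically, assuming a finite-difference Jacobian is accurate enough for illustration. The functions f and T are hypothetical examples, chosen so that the chain-rule identity $T^*(df) = d(f\circ T)$ can be checked by hand.

```python
def jacobian(T, u, h=1e-6):
    """J[m][j] approximates dx_m/du_j for T: R^k -> R^n at u (central differences)."""
    columns = []
    for j in range(len(u)):
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(T(up), T(um))])
    return [[col[m] for col in columns] for m in range(len(columns[0]))]

def pullback_1form(A, T, u):
    """Coefficients of T*w at u for w = sum_m A_m dx_m, following (5.67):
    the coefficient of du_j is sum_m A_m(T(u)) dx_m/du_j.  A(x) returns (A_1, ..., A_n)."""
    a = A(T(u))
    J = jacobian(T, u)
    return [sum(a[m] * J[m][j] for m in range(len(a))) for j in range(len(u))]

# Hypothetical example: f(x, y, z) = x*y + z**2, so df = y dx + x dy + 2z dz,
# and T(u, v) = (u*v, u + v, u - v) maps R^2 into R^3.
df = lambda x: (x[1], x[0], 2 * x[2])
T = lambda u: (u[0] * u[1], u[0] + u[1], u[0] - u[1])
coeffs = pullback_1form(df, T, (1.5, -0.5))

# Chain rule check: f∘T = u²v + uv² + (u - v)², whose differential at (1.5, -0.5)
# is (2uv + v² + 2(u - v)) du + (u² + 2uv - 2(u - v)) dv = 2.75 du - 3.25 dv.
```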

1-forms can be integrated over curves. To begin with, a 1-form on $\mathbb{R}$ is merely something of the form $\omega = g(t)\,dt$, and its integral over an interval [a, b] is just what you think it is:

$$\int_{[a,b]}\omega = \int_a^b g(t)\,dt.$$

Now, if $\omega = A_1\,dx_1 + \cdots + A_n\,dx_n$ is a 1-form on $\mathbb{R}^n$ and C is a smooth curve parametrized by $\mathbf{x} = \mathbf{g}(t)$, $a\le t\le b$, then $\int_C\omega$ is defined by pulling ω back to $\mathbb{R}$ via g and integrating the result as before:

$$\int_C\omega = \int_{[a,b]}\mathbf{g}^*\omega = \int_a^b\left(\sum_{m=1}^n A_m(\mathbf{g}(t))\,g_m'(t)\right)dt.$$

In other words, if we identify ω with the vector field F as before,

$$\int_C\omega = \int_C\mathbf{F}\cdot d\mathbf{x}.$$
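Numerically, the definition amounts to pulling ω back to [a, b] and applying any one-variable quadrature rule. The sketch below uses our own helper with the composite midpoint rule. It integrates two 1-forms over the counterclockwise unit circle. The first is $(x\,dy - y\,dx)/(x^2 + y^2)$, which corresponds to the field $\mathbf{F}_0$ of Exercise 4(c). The second is $x\,dy$, whose integral over the circle is the enclosed area π.

```python
import math

def integrate_1form(A, g, dg, a, b, n=2000):
    """Integral over C of w = sum_m A_m dx_m along the curve x = g(t), a <= t <= b.
    The pullback is g*w = (sum_m A_m(g(t)) g_m'(t)) dt, integrated here with
    the composite midpoint rule on n subintervals."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += sum(Am * vm for Am, vm in zip(A(g(t)), dg(t)))
    return total * h

# Counterclockwise unit circle and its velocity vector.
circle = lambda t: (math.cos(t), math.sin(t))
velocity = lambda t: (-math.sin(t), math.cos(t))

# w = (x dy - y dx)/(x^2 + y^2): coefficients (A_1, A_2) = (-y, x)/(x^2 + y^2).
angle_form = lambda p: (-p[1] / (p[0]**2 + p[1]**2), p[0] / (p[0]**2 + p[1]**2))
winding = integrate_1form(angle_form, circle, velocity, 0.0, 2 * math.pi)   # about 2*pi

# w = x dy: its integral over the circle is the enclosed area.
area = integrate_1form(lambda p: (0.0, p[0]), circle, velocity, 0.0, 2 * math.pi)  # about pi
```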

2-Forms and the Exterior Product. We now define a notion of a "product of two 1-forms" that is related to the cross product of vector fields in $\mathbb{R}^3$ but works in any number of dimensions. This product is called the exterior product; the exterior product of two 1-forms α and β is denoted by $\alpha\wedge\beta$. The novel feature of this is that $\alpha\wedge\beta$ is no longer a 1-form but a new type of object called a 2-form.
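Without specifying what a 2-form is just yet, we list the basic properties that the exterior product is to have. First, it distributes over addition and scalar multiplication in the usual way. That is, if $\alpha_1$, $\alpha_2$, and β are 1-forms on $\mathbb{R}^n$ and $f_1$ and $f_2$ are continuous functions on $\mathbb{R}^n$,

$$\begin{aligned}(f_1\alpha_1 + f_2\alpha_2)\wedge\beta &= f_1(\alpha_1\wedge\beta) + f_2(\alpha_2\wedge\beta),\\ \beta\wedge(f_1\alpha_1 + f_2\alpha_2) &= f_1(\beta\wedge\alpha_1) + f_2(\beta\wedge\alpha_2).\end{aligned}\tag{5.68}$$

Second, the exterior product is anticommutative:

$$\beta\wedge\alpha = -\alpha\wedge\beta.\tag{5.69}$$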

Thus, if $\alpha = A_1\,dx_1 + \cdots + A_n\,dx_n$ and $\beta = B_1\,dx_1 + \cdots + B_n\,dx_n$, we can expand $\alpha\wedge\beta$ according to (5.68) to obtain

$$\alpha\wedge\beta = \sum_{i=1}^n\sum_{j=1}^n A_iB_j\,dx_i\wedge dx_j.\tag{5.70}$$

But according to (5.69), $dx_j\wedge dx_i = -dx_i\wedge dx_j$ and $dx_i\wedge dx_i = 0$. Thus the terms with i = j in (5.70) drop out, and for i ≠ j we can combine the ijth and jith terms into one:

$$A_iB_j\,dx_i\wedge dx_j + A_jB_i\,dx_j\wedge dx_i = (A_iB_j - A_jB_i)\,dx_i\wedge dx_j = (A_jB_i - A_iB_j)\,dx_j\wedge dx_i.$$

We have the option of using either of the two expressions on the right, and the usual choice is to use the one where the first index is smaller than the second one. (In $\mathbb{R}^3$ a different choice is sometimes convenient, as we shall soon see.) Thus, we finally obtain

$$\alpha\wedge\beta = \sum_{1\le i<j\le n}(A_iB_j - A_jB_i)\,dx_i\wedge dx_j.$$
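Pointwise, the coefficients of $\alpha\wedge\beta$ are just the numbers $A_iB_j - A_jB_i$ for i < j. The small sketch below computes them from the coefficient lists at a single point; the helper name is hypothetical. It makes the count $\frac12 n(n-1)$ and the anticommutativity (5.69) easy to check.

```python
from itertools import combinations

def wedge_1forms(A, B):
    """Coefficients of a ∧ b for a = sum A_i dx_i and b = sum B_i dx_i (values at one point).
    Returns {(i, j): A_i B_j - A_j B_i} for i < j, with 0-based indices."""
    if len(A) != len(B):
        raise ValueError("1-forms must live on the same R^n")
    return {(i, j): A[i] * B[j] - A[j] * B[i]
            for i, j in combinations(range(len(A)), 2)}

alpha, beta = (1, 2, 3, 4), (5, 6, 7, 8)          # arbitrary coefficients on R^4
ab, ba = wedge_1forms(alpha, beta), wedge_1forms(beta, alpha)
```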

In general, a differential 2-form on $\mathbb{R}^n$ is an expression of the type

$$\omega = \sum_{1\le i<j\le n} C_{ij}(x_1,\ldots,x_n)\,dx_i\wedge dx_j,\tag{5.71}$$
where the $C_{ij}$ are continuous functions on $\mathbb{R}^n$. We note that the number of terms in this sum, that is, the number of pairs (i, j) with $1\le i<j\le n$, is $\frac12 n(n-1)$. In (5.71) we also have the option of rewriting $dx_i\wedge dx_j$ as $-dx_j\wedge dx_i$ if we so choose.
What does this really mean? We have been proceeding purely formally, without saying what meaning is to be attached to the expressions $dx_i\wedge dx_j$. In the full-dress treatment of this subject, 2-forms are defined to be alternating rank-2 tensor fields over $\mathbb{R}^n$, but this is somewhat beside the point. For now it is probably best to think of a 2-form on $\mathbb{R}^n$ simply as a $\frac12 n(n-1)$-tuple of functions, namely the functions $C_{ij}$ in (5.71), and the expressions $dx_i\wedge dx_j$ simply as a convenient set of signposts to mark the various components, just as i, j, and k are used to mark the components of vector fields in $\mathbb{R}^3$. The important features of 2-forms are not their precise algebraic definition but the way they transform under changes of variables and the way they integrate over surfaces.
Before proceeding to these matters, however, let us see how things look in the 3-dimensional case. When n = 3 we also have $\frac12 n(n-1) = 3$, so 2-forms have 3 components just as vector fields and 1-forms do: This is the "accident" that makes n = 3 special! The general 2-form on $\mathbb{R}^3$ can be written as

$$\omega = F(x,y,z)\,dy\wedge dz + G(x,y,z)\,dz\wedge dx + H(x,y,z)\,dx\wedge dy,$$

so there is a one-to-one correspondence between 2-forms and vector fields:

$$\omega = F\,dy\wedge dz + G\,dz\wedge dx + H\,dx\wedge dy \quad\longleftrightarrow\quad \mathbf{F} = F\mathbf{i} + G\mathbf{j} + H\mathbf{k}.\tag{5.72}$$

Observe carefully how we have set this correspondence up: we have written the basis elements $dx_i\wedge dx_j$ with the variables in cyclic order,

dx before dy before dz before dx,

rather than the "i < j" order we used above, so that the middle term is $dz\wedge dx$ rather than $dx\wedge dz$. Also, we identify the unit vector i in the x direction with the 2-form $dy\wedge dz$ from which dx is missing, and likewise for j and k.
The exterior product in 3 dimensions looks like this: If

$$\alpha = A_1\,dx + A_2\,dy + A_3\,dz, \qquad \beta = B_1\,dx + B_2\,dy + B_3\,dz,$$

then

$$\alpha\wedge\beta = (A_2B_3 - A_3B_2)\,dy\wedge dz + (A_3B_1 - A_1B_3)\,dz\wedge dx + (A_1B_2 - A_2B_1)\,dx\wedge dy.$$

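Thus, if we identify α and β with vector fields according to (5.66) and $\alpha\wedge\beta$ with a vector field according to (5.72), the exterior product turns into the cross product:

$$\alpha\longleftrightarrow\mathbf{F}, \qquad \beta\longleftrightarrow\mathbf{G}, \qquad \alpha\wedge\beta\longleftrightarrow\mathbf{F}\times\mathbf{G}.$$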
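The only delicate point in this correspondence is the cyclic ordering. The middle basis element is $dz\wedge dx = -dx\wedge dz$, so converting from the "i < j" coefficients to the cyclic ones flips one sign. The short sketch below, with our own helper names, confirms that the result is the cross product.

```python
from itertools import combinations

def wedge_1forms(A, B):
    """Coefficients {(i, j): A_i B_j - A_j B_i}, i < j (0-based), of a ∧ b."""
    return {(i, j): A[i] * B[j] - A[j] * B[i] for i, j in combinations(range(len(A)), 2)}

def to_cyclic(c):
    """Re-express a 2-form on R^3 given in the i < j basis (dx∧dy, dx∧dz, dy∧dz)
    in the cyclic basis (dy∧dz, dz∧dx, dx∧dy) of (5.72).  Since dz∧dx = -dx∧dz,
    the middle coefficient changes sign."""
    return (c[(1, 2)], -c[(0, 2)], c[(0, 1)])

def cross(F, G):
    return (F[1] * G[2] - F[2] * G[1],
            F[2] * G[0] - F[0] * G[2],
            F[0] * G[1] - F[1] * G[0])

F, G = (1, 2, 3), (4, 5, 6)
wedge_as_vector = to_cyclic(wedge_1forms(F, G))
```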

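Pullbacks and Integrals of 2-Forms. We have seen that a smooth mapping $T:\mathbb{R}^k\to\mathbb{R}^n$ induces a "pullback" mapping $T^*$ that takes 1-forms on $\mathbb{R}^n$ to 1-forms on $\mathbb{R}^k$. It also induces a pullback mapping, still denoted by $T^*$, from 2-forms on $\mathbb{R}^n$ to 2-forms on $\mathbb{R}^k$, in exactly the same way: We simply substitute $T(\mathbf{u})$ for $\mathbf{x}$ and $\sum_j(\partial x_m/\partial u_j)\,du_j$ for $dx_m$. Thus,

$$T^*(dx_l\wedge dx_m) = \left(\frac{\partial x_l}{\partial u_1}du_1 + \cdots + \frac{\partial x_l}{\partial u_k}du_k\right)\wedge\left(\frac{\partial x_m}{\partial u_1}du_1 + \cdots + \frac{\partial x_m}{\partial u_k}du_k\right) = \sum_{i<j}\frac{\partial(x_l,x_m)}{\partial(u_i,u_j)}\,du_i\wedge du_j,$$

so in general, if

$$\omega = \sum_{l<m} C_{lm}(\mathbf{x})\,dx_l\wedge dx_m,$$

then

$$T^*\omega = \sum_{i<j}\left(\sum_{l<m} C_{lm}(T(\mathbf{u}))\,\frac{\partial(x_l,x_m)}{\partial(u_i,u_j)}\right)du_i\wedge du_j.$$

It is a consequence of the chain rule that the pullback operation behaves properly under composition of mappings, namely, $(T_1\circ T_2)^*\omega = T_2^*(T_1^*\omega)$.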
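The inner sum above pairs each coefficient with a 2×2 minor of the Jacobian matrix. The sketch below evaluates it numerically; the finite-difference Jacobian is an illustrative assumption. For polar coordinates $T(r,\theta) = (r\cos\theta, r\sin\theta)$ it should give $T^*(dx\wedge dy) = r\,dr\wedge d\theta$. For the coordinate swap $T(u,v) = (v,u)$ it gives $-du\wedge dv$, showing that an orientation-reversing map introduces a minus sign.

```python
import math
from itertools import combinations

def jacobian(T, u, h=1e-6):
    """J[m][j] approximates dx_m/du_j for T: R^k -> R^n at u (central differences)."""
    columns = []
    for j in range(len(u)):
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(T(up), T(um))])
    return [[col[m] for col in columns] for m in range(len(columns[0]))]

def pullback_2form(C, T, u):
    """T*w at u for w = sum_{l<m} C_lm dx_l ∧ dx_m, where C(x) returns {(l, m): C_lm(x)}.
    The coefficient of du_i ∧ du_j (i < j) is sum C_lm(T(u)) d(x_l, x_m)/d(u_i, u_j)."""
    J = jacobian(T, u)
    c = C(T(u))
    return {(i, j): sum(coeff * (J[l][i] * J[m][j] - J[l][j] * J[m][i])
                        for (l, m), coeff in c.items())
            for i, j in combinations(range(len(u)), 2)}

area_form = lambda x: {(0, 1): 1.0}                    # dx ∧ dy on R^2
polar = lambda u: (u[0] * math.cos(u[1]), u[0] * math.sin(u[1]))
swap = lambda u: (u[1], u[0])

pb = pullback_2form(area_form, polar, (2.0, 0.7))      # coefficient should be r = 2.0
pb_swap = pullback_2form(area_form, swap, (0.3, 0.8))  # coefficient should be -1
```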
We can now show how to integrate 2-forms over surfaces. First consider the simplest case, where the surface is simply a region D in $\mathbb{R}^2$. If we name the coordinates on $\mathbb{R}^2$ x and y, the general 2-form on $\mathbb{R}^2$ has the form $\omega = f(x,y)\,dx\wedge dy$, and its integral over D is the obvious thing:

$$\iint_D f(x,y)\,dx\wedge dy = \iint_D f(x,y)\,dx\,dy,\tag{5.73}$$

the integral on the right being the ordinary double integral of f over D. The only subtle point is that the integral on the left is an oriented integral, the orientation being carried in the fact that dx comes before dy in $dx\wedge dy$. If we wrote $dy\wedge dx$ instead, we would introduce a minus sign.
The nice thing about (5.73) is that the change-of-variable formula for double integrals is more or less built into it. Namely, suppose $T:\mathbb{R}^2\to\mathbb{R}^2$ is an invertible transformation, say $T(u,v) = (x,y)$. If $\omega = f(x,y)\,dx\wedge dy$, then

$$T^*\omega = f(T(u,v))\,\frac{\partial(x,y)}{\partial(u,v)}\,du\wedge dv = f(T(u,v))(\det DT)\,du\wedge dv,$$
so the change-of-variable formula simply says that

$$\iint_{T(D)}\omega = \iint_D T^*\omega.\tag{5.74}$$

In other words, the formalism of differential forms produces the necessary Jacobian factor automatically. The change-of-variable formula as we have seen it before involved $|\det DT|$ rather than $\det DT$, but this discrepancy is accounted for by the difference between ordinary integrals and oriented integrals.
Now we turn to the case of integrals over a surface S in $\mathbb{R}^n$. The idea is the same as for line integrals: If ω is a 2-form on $\mathbb{R}^n$ and S is a surface parametrized by $\mathbf{x} = G(u,v)$, $(u,v)\in D\subset\mathbb{R}^2$, we define $\iint_S\omega$ by pulling ω back to D via G and using (5.73) to define the resulting integral:

$$\iint_S\omega = \iint_D G^*\omega.$$

This is independent of the parametrization, in the following sense: If $\tilde G = G\circ T$ where $T:\mathbb{R}^2\to\mathbb{R}^2$ is a $C^1$ transformation, then by (5.74),

$$\iint_D\tilde G^*\omega = \iint_D T^*G^*\omega = \iint_{T(D)}G^*\omega.$$
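Let us see how this looks in the case n = 3. If

$$\omega = A\,dy\wedge dz + B\,dz\wedge dx + C\,dx\wedge dy \quad\text{and}\quad (x,y,z) = G(u,v),$$

then $G^*\omega$ equals

$$\left[A\,\frac{\partial(y,z)}{\partial(u,v)} + B\,\frac{\partial(z,x)}{\partial(u,v)} + C\,\frac{\partial(x,y)}{\partial(u,v)}\right]du\wedge dv,$$

with A, B, C evaluated at G(u, v), and hence $\iint_S\omega$ equals

$$\iint_D\left[A\,\frac{\partial(y,z)}{\partial(u,v)} + B\,\frac{\partial(z,x)}{\partial(u,v)} + C\,\frac{\partial(x,y)}{\partial(u,v)}\right]du\,dv.$$

But this is something we have seen before. Indeed, we have

$$\left(\frac{\partial(y,z)}{\partial(u,v)}, \frac{\partial(z,x)}{\partial(u,v)}, \frac{\partial(x,y)}{\partial(u,v)}\right) = \frac{\partial G}{\partial u}\times\frac{\partial G}{\partial v},$$

so if we identify ω with the vector field $\mathbf{F} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ as in (5.72), we have

$$\iint_S\omega = \iint_S\mathbf{F}\cdot\mathbf{n}\,dA.$$

Hence the notion of surface integrals of vector fields in $\mathbb{R}^3$ also fits into the theory of differential forms.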
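As a check on this correspondence, the sketch below takes $\omega = x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy$, which corresponds to $\mathbf{F} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. It pulls ω back to the parameter rectangle of the unit sphere and integrates with the midpoint rule. The parametrization by spherical angles (φ, θ) is our choice. With it, $\partial G/\partial\varphi\times\partial G/\partial\theta$ points outward, so the result should be the outward flux 4π, up to quadrature error.

```python
import math

def surface_integral_2form(A, B, C, G, G_u, G_v, u_range, v_range, n=200):
    """Integral over S of w = A dy∧dz + B dz∧dx + C dx∧dy on R^3, with S
    parametrized by (x, y, z) = G(u, v).  The pullback G*w has du∧dv coefficient
    A d(y,z)/d(u,v) + B d(z,x)/d(u,v) + C d(x,y)/d(u,v); that coefficient is
    integrated with the midpoint rule on an n-by-n grid."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            x, y, z = G(u, v)
            xu, yu, zu = G_u(u, v)
            xv, yv, zv = G_v(u, v)
            total += (A(x, y, z) * (yu * zv - zu * yv)
                      + B(x, y, z) * (zu * xv - xu * zv)
                      + C(x, y, z) * (xu * yv - yu * xv))
    return total * hu * hv

# Unit sphere parametrized by (phi, theta) in [0, pi] x [0, 2*pi].
sphere = lambda p, t: (math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), math.cos(p))
sphere_p = lambda p, t: (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), -math.sin(p))
sphere_t = lambda p, t: (-math.sin(p) * math.sin(t), math.sin(p) * math.cos(t), 0.0)

flux = surface_integral_2form(lambda x, y, z: x, lambda x, y, z: y, lambda x, y, z: z,
                              sphere, sphere_p, sphere_t, (0.0, math.pi), (0.0, 2 * math.pi))
```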

3-Forms. A differential 3-form on $\mathbb{R}^n$ is an expression of the form

$$\omega = \sum_{1\le i<j<k\le n} C_{ijk}(x_1,\ldots,x_n)\,dx_i\wedge dx_j\wedge dx_k.\tag{5.75}$$

Here, as in the case of 2-forms, one can think of the expressions $dx_i\wedge dx_j\wedge dx_k$ simply as formal basis elements, and one can put the indices i, j, k in an order other than i < j < k with the understanding that whenever one interchanges two of the dx's one introduces a minus sign. The number of terms in the sum in (5.75) is the binomial coefficient $n!/(3!\,(n-3)!)$. When n = 3, this number is 1: All 3-forms on $\mathbb{R}^3$ have the form

$$\omega = f(x,y,z)\,dx\wedge dy\wedge dz$$

and hence can be identified with functions:

$$f(x,y,z)\,dx\wedge dy\wedge dz \quad\longleftrightarrow\quad f(x,y,z).$$

The notion of exterior product extends so as to yield a 3-form as the product of three 1-forms or as the product of a 1-form and a 2-form. The idea is pretty obvious: $dx_i\wedge dx_j\wedge dx_k$ is the exterior product of the three 1-forms $dx_i$, $dx_j$, and $dx_k$, or the 1-form $dx_i$ and the 2-form $dx_j\wedge dx_k$, or the 2-form $dx_i\wedge dx_j$ and the 1-form $dx_k$. The exterior product distributes over sums and scalar multiples in the usual way, and the anticommutative law becomes

$$\alpha\wedge\beta = (-1)^{lm}\,\beta\wedge\alpha \quad\text{if } \alpha \text{ is an } l\text{-form and } \beta \text{ is an } m\text{-form}.$$

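Here is how it works when n = 3: If

$$\begin{aligned}\alpha &= A_1\,dx + A_2\,dy + A_3\,dz, & \beta &= B_1\,dx + B_2\,dy + B_3\,dz,\\ \gamma &= C_1\,dx + C_2\,dy + C_3\,dz, & \omega &= W_1\,dy\wedge dz + W_2\,dz\wedge dx + W_3\,dx\wedge dy,\end{aligned}$$

then

$$\alpha\wedge(\beta\wedge\gamma) = (\alpha\wedge\beta)\wedge\gamma = \det\begin{pmatrix}A_1 & A_2 & A_3\\ B_1 & B_2 & B_3\\ C_1 & C_2 & C_3\end{pmatrix}dx\wedge dy\wedge dz,$$

$$\alpha\wedge\omega = \omega\wedge\alpha = (A_1W_1 + A_2W_2 + A_3W_3)\,dx\wedge dy\wedge dz.$$

Thus, if we identify α, β, γ with the vector fields F, G, H and ω with the vector field V, the exterior product turns into the scalar triple product and dot product:

$$\alpha\wedge\beta\wedge\gamma\longleftrightarrow\mathbf{F}\cdot(\mathbf{G}\times\mathbf{H}), \qquad \alpha\wedge\omega\longleftrightarrow\mathbf{F}\cdot\mathbf{V}.$$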
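The determinant formula for $\alpha\wedge\beta\wedge\gamma$ comes from expanding the product and discarding terms with a repeated $dx_i$. Each surviving term is then reordered to $dx\wedge dy\wedge dz$ with the sign of the permutation. The sketch below, with our own helper names, carries out exactly that expansion and compares it with F · (G × H).

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation of (0, 1, ..., n-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def triple_wedge(A, B, C):
    """Coefficient of dx∧dy∧dz in a∧b∧c on R^3.  Expanding sum A_i B_j C_k dx_i∧dx_j∧dx_k,
    terms with a repeated index vanish, and each surviving term is reordered to
    dx∧dy∧dz at the cost of the sign of the permutation."""
    return sum(perm_sign(p) * A[p[0]] * B[p[1]] * C[p[2]] for p in permutations(range(3)))

def cross(F, G):
    return (F[1] * G[2] - F[2] * G[1], F[2] * G[0] - F[0] * G[2], F[0] * G[1] - F[1] * G[0])

def dot(F, G):
    return sum(a * b for a, b in zip(F, G))

F, G, H = (1, 2, 3), (4, 5, 6), (7, 8, 10)   # arbitrary sample coefficients
```

Swapping two of the 1-forms flips the sign of the coefficient, in line with the anticommutative law.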

Pullbacks and integrals of 3-forms work just as before; we restrict ourselves to the 3-dimensional case. Let $\omega = f(x,y,z)\,dx\wedge dy\wedge dz$. If $T:\mathbb{R}^3\to\mathbb{R}^3$ is a $C^1$ transformation, say $T(u,v,w) = (x,y,z)$, we obtain $T^*\omega$ by substituting in the formulas for x, y, z, dx, dy, and dz in terms of u, v, w; the result is

$$T^*\omega = f(T(u,v,w))\,\frac{\partial(x,y,z)}{\partial(u,v,w)}\,du\wedge dv\wedge dw.$$

The integral of ω over a region $D\subset\mathbb{R}^3$ is defined in the obvious way:

$$\iiint_D f(x,y,z)\,dx\wedge dy\wedge dz = \iiint_D f(x,y,z)\,dx\,dy\,dz,$$

and the change-of-variable formula (for oriented integrals) reads

$$\iiint_{T(D)}\omega = \iiint_D T^*\omega.$$

We have now sketched the whole idea of differential forms in dimension 3. In dimension n one needs to develop the theory of k-forms for all k ≤ n, which requires the machinery of multilinear algebra.

The Exterior Derivative. When the operations of gradient, curl, and divergence are expressed in terms of differential forms, they are all instances of a single operation, denoted by d and called the exterior derivative, which maps k-forms on $\mathbb{R}^n$ into (k + 1)-forms on $\mathbb{R}^n$:

$$\text{0-forms}\xrightarrow{\;d\;}\text{1-forms}\xrightarrow{\;d\;}\text{2-forms}\xrightarrow{\;d\;}\text{3-forms}\xrightarrow{\;d\;}\cdots.$$

Here's how it works.

First, a 0-form is, by definition, a function; if f is a 0-form, then df is just the differential of f. If we identify 1-forms with vector fields, df becomes ∇f. That is, the gradient is the exterior derivative on 0-forms.
Now, any k-form ω with k ≥ 1 is a sum of terms of the form fβ where f is a function and β is one of the basis elements ($dx_i$ for 1-forms, $dx_i\wedge dx_j$ for 2-forms, etc.). dω is defined to be the (k + 1)-form obtained by replacing each such term fβ by $df\wedge\beta$.
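This is what it looks like when $\omega = A_1\,dx_1 + A_2\,dx_2 + \cdots + A_n\,dx_n$ is a 1-form:

$$\begin{aligned} d\omega &= dA_1\wedge dx_1 + \cdots + dA_n\wedge dx_n\\ &= \left[\frac{\partial A_1}{\partial x_1}dx_1 + \cdots + \frac{\partial A_1}{\partial x_n}dx_n\right]\wedge dx_1 + \cdots + \left[\frac{\partial A_n}{\partial x_1}dx_1 + \cdots + \frac{\partial A_n}{\partial x_n}dx_n\right]\wedge dx_n\\ &= \sum_{1\le i<j\le n}\left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right)dx_i\wedge dx_j.\end{aligned}$$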

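When n = 3 and we write x, y, z instead of $x_1, x_2, x_3$, we obtain

$$d\omega = \left(\frac{\partial A_3}{\partial y} - \frac{\partial A_2}{\partial z}\right)dy\wedge dz + \left(\frac{\partial A_1}{\partial z} - \frac{\partial A_3}{\partial x}\right)dz\wedge dx + \left(\frac{\partial A_2}{\partial x} - \frac{\partial A_1}{\partial y}\right)dx\wedge dy.$$

But this is just the curl! That is, if we identify the 1-form ω and the 2-form dω with vector fields F and G in the standard way, then G = curl F. The curl is the exterior derivative on 1-forms in $\mathbb{R}^3$.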
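The general formula for dω on 1-forms can be evaluated numerically with central differences. The sketch below does this in $\mathbb{R}^3$ and reads the result off in the cyclic basis of (5.72); the helper names and step size are our own choices. Applied to the 1-form of $\mathbf{F}_0$ from Example 4 in §5.8, it should reproduce that example's field G, i.e. curl $\mathbf{F}_0$ = G.

```python
from itertools import combinations

def partial(f, i, p, h=1e-5):
    """Central-difference approximation to df/dx_i at the point p."""
    q_plus, q_minus = list(p), list(p)
    q_plus[i] += h
    q_minus[i] -= h
    return (f(q_plus) - f(q_minus)) / (2 * h)

def d_1form(A, p):
    """dw at p for w = sum_j A_j dx_j, with A(x) returning (A_1, ..., A_n).
    Returns {(i, j): dA_j/dx_i - dA_i/dx_j} for i < j (0-based indices)."""
    comp = lambda k: (lambda x: A(x)[k])
    return {(i, j): partial(comp(j), i, p) - partial(comp(i), j, p)
            for i, j in combinations(range(len(p)), 2)}

def as_curl(c):
    """Read a 2-form on R^3 in the cyclic basis (dy∧dz, dz∧dx, dx∧dy) of (5.72)."""
    return (c[(1, 2)], -c[(0, 2)], c[(0, 1)])

# The 1-form of F0 = -(3x²yz + y²z) i + (2x² - 3xz² - x³z) j from Example 4 of §5.8.
F0 = lambda x: (-(3 * x[0]**2 * x[1] * x[2] + x[1]**2 * x[2]),
                2 * x[0]**2 - 3 * x[0] * x[2]**2 - x[0]**3 * x[2],
                0.0)
curl_F0 = as_curl(d_1form(F0, (0.3, -1.2, 0.8)))
# At (0.3, -1.2, 0.8), G = (6xz + x³, -(3x²y + y²), 4x + 2yz - 3z²) = (1.467, -1.116, -2.64).
```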
Now suppose that $\omega = A\,dy\wedge dz + B\,dz\wedge dx + C\,dx\wedge dy$ is a 2-form. As the notation in higher dimensions gets messy, we shall write out only the 3-dimensional case:

$$\begin{aligned} d\omega &= dA\wedge dy\wedge dz + dB\wedge dz\wedge dx + dC\wedge dx\wedge dy\\ &= (\partial_xA\,dx + \partial_yA\,dy + \partial_zA\,dz)\wedge dy\wedge dz\\ &\quad + (\partial_xB\,dx + \partial_yB\,dy + \partial_zB\,dz)\wedge dz\wedge dx\\ &\quad + (\partial_xC\,dx + \partial_yC\,dy + \partial_zC\,dz)\wedge dx\wedge dy\\ &= (\partial_xA + \partial_yB + \partial_zC)\,dx\wedge dy\wedge dz.\end{aligned}$$

(For the last equality we have used the fact that an exterior product containing two identical factors vanishes and the fact that the product $dx\wedge dy\wedge dz$ is unchanged by cyclic permutation of its three factors.) If we identify ω with a vector field F and dω with a function g as before, we see that g = div F. The divergence is the exterior derivative on 2-forms in $\mathbb{R}^3$.
We observed earlier that curl(∇f) = 0 for any function f and div(curl F) = 0 for any vector field F. The interpretation of these identities in terms of differential forms is that d(df) = 0 for any 0-form (function) f and d(dω) = 0 for any 1-form ω. It is true in general that

$$d(d\omega) = 0\tag{5.76}$$

for any k-form ω on $\mathbb{R}^n$. In all cases the proof of this fact boils down to the equality of mixed partials.
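The identity d(dω) = 0 appears in three dimensions as curl(∇f) = 0 and div(curl F) = 0. Both can be observed numerically: with nested central differences, the two mixed partials are computed from the same four function values, so they cancel up to rounding error. The functions f and F and the evaluation point below are arbitrary illustrative choices.

```python
import math

def partial(f, i, p, h=1e-4):
    """Central-difference approximation to df/dx_i at the point p."""
    q_plus, q_minus = list(p), list(p)
    q_plus[i] += h
    q_minus[i] -= h
    return (f(q_plus) - f(q_minus)) / (2 * h)

def grad(f):
    """Numerical gradient: the exterior derivative of a 0-form, read as a vector field."""
    return lambda p: tuple(partial(f, i, p) for i in range(3))

def curl(F):
    """Numerical curl: d on 1-forms in R^3, read in the cyclic basis of (5.72)."""
    comp = lambda k: (lambda p: F(p)[k])
    return lambda p: (partial(comp(2), 1, p) - partial(comp(1), 2, p),
                      partial(comp(0), 2, p) - partial(comp(2), 0, p),
                      partial(comp(1), 0, p) - partial(comp(0), 1, p))

def div(F):
    """Numerical divergence: d on 2-forms in R^3, read as a function."""
    return lambda p: sum(partial(lambda q, k=k: F(q)[k], k, p) for k in range(3))

f = lambda p: math.sin(p[0] * p[1]) + p[2] * math.exp(p[0])
F = lambda p: (p[1] * p[2] ** 2, math.cos(p[0] * p[2]), p[0] * p[1] * p[2])
point = (0.4, -0.7, 1.1)

dd_f = curl(grad(f))(point)   # d(df) for a 0-form: each component should be ~0
dd_F = div(curl(F))(point)    # d(dw) for the 1-form of F: should be ~0
```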
As an illustration of the exterior derivative, we give the relativistically covariant reformulation of Maxwell's equations (5.50). The key idea is to think of electromagnetism as a phenomenon in 4-dimensional space-time rather than a time-dependent phenomenon in 3-dimensional space. The electric and magnetic fields $\mathbf{E} = (E_x,E_y,E_z)$ and $\mathbf{B} = (B_x,B_y,B_z)$ are combined into a single entity, the
electromagnetic field tensor, which we identify in two ways with a 2-form on $\mathbb{R}^4$:

$$\begin{aligned}\omega &= c(E_x\,dx\wedge dt + E_y\,dy\wedge dt + E_z\,dz\wedge dt) + B_x\,dy\wedge dz + B_y\,dz\wedge dx + B_z\,dx\wedge dy,\\ \omega^* &= c(B_x\,dx\wedge dt + B_y\,dy\wedge dt + B_z\,dz\wedge dt) - E_x\,dy\wedge dz - E_y\,dz\wedge dx - E_z\,dx\wedge dy,\end{aligned}$$

where c is the speed of light. Also, the current and charge densities ρ and $\mathbf{J} = (J_x,J_y,J_z)$ are combined into a single entity, the 4-current density, which we identify with a 3-form on $\mathbb{R}^4$:

$$\gamma = c(J_x\,dy\wedge dz\wedge dt + J_y\,dz\wedge dx\wedge dt + J_z\,dx\wedge dy\wedge dt) - \rho\,dx\wedge dy\wedge dz.$$

The four Maxwell equations (5.50) then turn into the two equations

$$d\omega = 0, \qquad d\omega^* = 4\pi\gamma.$$

The verification of this is a good way for readers to see whether they have learned how to compute exterior derivatives!

Stokes's Theorem. We can now state the general theorem that encompasses the
integral theorems of the preceding sections and their higher dimensional analogues:

5.77 Theorem (The General Stokes Theorem). Let M be a smooth, oriented k-dimensional submanifold of $\mathbb{R}^n$ with a piecewise smooth boundary ∂M, and let ∂M carry the orientation that is (in a suitable sense) compatible with the one on M. If ω is a (k - 1)-form of class $C^1$ on an open set containing M, then

$$\int\!\cdots\!\int_{\partial M}\omega = \int\!\cdots\!\int_M d\omega.$$
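For k = n = 2 the general Stokes theorem is Green's theorem. The sketch below checks it for the hypothetical 1-form $\omega = -y^3\,dx + x^3\,dy$ on the unit disk. There $d\omega = 3(x^2 + y^2)\,dx\wedge dy$, and both sides equal 3π/2. The boundary integral uses the pullback to [0, 2π], and the area integral uses the polar pullback $dx\wedge dy = r\,dr\wedge d\theta$.

```python
import math

# w = P dx + Q dy with P = -y**3, Q = x**3, so dw = (dQ/dx - dP/dy) dx∧dy = 3(x² + y²) dx∧dy.
P = lambda x, y: -y ** 3
Q = lambda x, y: x ** 3
dw = lambda x, y: 3 * (x ** 2 + y ** 2)

def boundary_integral(n=1000):
    """Integral of P dx + Q dy over the unit circle (counterclockwise), via the pullback to [0, 2*pi]."""
    h = 2 * math.pi / n
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        s += P(x, y) * (-math.sin(t)) + Q(x, y) * math.cos(t)
    return s * h

def interior_integral(n=400):
    """Integral of dw over the unit disk, pulled back by polar coordinates: dx∧dy = r dr∧dθ."""
    hr, ht = 1.0 / n, 2 * math.pi / n
    s = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * ht
            s += dw(r * math.cos(th), r * math.sin(th)) * r
    return s * hr * ht

lhs, rhs = boundary_integral(), interior_integral()   # both approximately 3*pi/2
```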

We conclude with a final suggestive remark. The formal differential-algebraic identity d(dω) = 0 stated above has a geometric counterpart. The boundary of a region in the plane is a closed curve with no endpoints, and the boundary of a region in 3-space is a closed surface with no edge. In general, the boundary of a (smoothly bounded) region M in a k-dimensional manifold is a (k - 1)-dimensional manifold with no boundary, that is,

$$\partial(\partial M) = 0.\tag{5.78}$$

The general Stokes theorem shows that (5.76) and (5.78) are in some sense equivalent. Indeed, if M is k-dimensional and ω is a (k - 2)-form, the Stokes theorem gives

$$\int\!\cdots\!\int_{\partial(\partial M)}\omega = \int\!\cdots\!\int_{\partial M}d\omega = \int\!\cdots\!\int_M d(d\omega).$$

If we accept the geometric fact that ∂(∂M) = 0, then the integral on the left vanishes, and hence so does the integral on the right. But since this happens for every M, it follows that d(dω) = 0. Similarly, if we know that d(dω) = 0 for every ω, we can conclude that ∂(∂M) = 0. This sort of interplay of algebra, analysis, and geometry is a significant feature of much of modern mathematics.
Chapter 6

INFINITE SERIES

Infinite series are sums with infinitely many terms, of which the most familiar examples are the nonterminating decimal expansions. For instance, the equality π = 3.14159... is an abbreviation of the statement that π is the sum of the infinite series

$$3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3} + \frac{5}{10^4} + \frac{9}{10^5} + \cdots.$$

The procedure by which one makes sense out of such sums stands alongside differentiation and integration as one of the fundamental limiting processes of mathematical analysis. Just as decimal expansions provide a useful way of obtaining all real numbers from the finite decimal fractions, infinite series provide a flexible and powerful way of building complicated functions out of simple ones.
This chapter is devoted to the foundations of the theory of infinite series. In
it we develop the basic facts about series of numbers; then in the next chapter we
proceed to the study of series of functions.

6.1 Definitions and Examples

Informally speaking, an infinite series (or just a series, for short) is an expression of the form

$$\sum_{n=0}^\infty a_n = a_0 + a_1 + a_2 + \cdots.$$

Here the $a_n$'s can be real numbers, complex numbers, vectors, and so on; for the present, we shall mainly consider the case where they are real numbers.

280 Chapter 6. Infinite Series

It is not immediately clear what precise meaning is to be attached to an expression of the form $\sum_0^\infty a_n$ that involves a sum of infinitely many terms. The formal definition must be phrased in terms of limits of finite sums, as follows.
Given a sequence $\{a_n\}_0^\infty$ of real numbers (or complex numbers, vectors, etc.), we can form a new sequence $\{s_k\}_0^\infty$ by adding up the terms of the original sequence successively:

$$s_0 = a_0, \quad s_1 = a_0 + a_1, \quad s_2 = a_0 + a_1 + a_2, \quad\ldots,\quad s_k = a_0 + a_1 + \cdots + a_k.$$

An infinite series is formally defined to be a pair of sequences $\{a_n\}$ and $\{s_k\}$ related by these equations, and the notation $\sum_0^\infty a_n$ is to be regarded as a convenient way of encoding this information. The $a_n$'s are called the terms of the series, and the $s_k$'s are called the partial sums of the series. If the sequence $\{s_k\}$ of partial sums converges to a limit S, then the series is said to be convergent, S is called its sum, and we write $\sum_0^\infty a_n = S$; otherwise, the series is said to be divergent, and no numerical meaning is attached to the expression $\sum_0^\infty a_n$. (However, if $s_k\to\infty$ as $k\to\infty$, we may say that $\sum_0^\infty a_n = \infty$.)
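In code, the partial sums are just running totals of the terms. The minimal sketch below uses exact rational arithmetic. The example series $\sum 1/2^n$ is chosen for illustration; its partial sums are $s_k = 2 - 2^{-k}$.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(terms):
    """Return the list [s_0, s_1, ..., s_k] with s_k = a_0 + a_1 + ... + a_k."""
    return list(accumulate(terms))

# Terms a_n = 1/2**n for n = 0, ..., 10; the partial sums are s_k = 2 - 2**(-k).
s = partial_sums(Fraction(1, 2 ** n) for n in range(11))
```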
Remark. We have elected to start the numbering of the sequences $\{a_n\}$ and $\{s_k\}$ at n = 0 and k = 0, since this is perhaps the most common situation in practice. However, we could equally well start at some other point, for instance,

$$\sum_5^\infty a_n = a_5 + a_6 + a_7 + \cdots,$$

for which we would write

$$s_5 = a_5, \quad s_6 = a_5 + a_6, \quad s_7 = a_5 + a_6 + a_7, \quad\ldots.$$

Before proceeding further, let us record a couple of very simple but important
facts about series.

6.1 Theorem.
a. If the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are convergent, with sums S and T, then $\sum_0^\infty(a_n + b_n)$ is convergent, with sum S + T.
b. If the series $\sum_0^\infty a_n$ is convergent, with sum S, then for any $c\in\mathbb{R}$ the series $\sum_0^\infty ca_n$ is convergent, with sum cS.
c. If the series $\sum_0^\infty a_n$ is convergent, then $\lim_{n\to\infty}a_n = 0$. Equivalently, if $a_n\not\to 0$ as $n\to\infty$, then the series $\sum_0^\infty a_n$ is divergent.
6.1. Definitions and Examples 281

Proof. Let $\{s_k\}$ and $\{t_k\}$ be the sequences of partial sums of the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$, respectively. (a) and (b) follow from the fact that if $s_k\to S$ and $t_k\to T$, then $s_k + t_k\to S + T$ and $cs_k\to cS$. As for (c), we observe that $a_n = s_n - s_{n-1}$. If the series converges to the sum S, it follows that $\lim a_n = \lim s_n - \lim s_{n-1} = S - S = 0$. □

At present we are thinking primarily of series whose terms are numbers, but most of the really significant applications of series come from situations where the terms $a_n$ depend on a variable x. In this case the series $\sum_0^\infty a_n(x)$ may converge for some values of x and diverge for others, and it defines a function whose domain is the set of all x for which it converges. We shall explore this idea in more detail in the next chapter; at this point we recall some familiar examples.
One of the simplest and most useful infinite series is the geometric series, in which the ratio of two succeeding terms is a constant x. That is, the geometric series with initial term a and ratio x is

$$a + ax + ax^2 + ax^3 + \cdots = \sum_0^\infty ax^n.$$

The constant a can be factored out, according to Theorem 6.1b, so it suffices to consider the case a = 1.

The partial sums $s_k = \sum_0^k x^n$ of the series $\sum_0^\infty x^n$ are easily evaluated. If x = 1, then of course $s_k = 1 + 1 + \cdots + 1 = k + 1$. If x ≠ 1, we observe that

$$s_k = 1 + x + \cdots + x^k, \qquad xs_k = x + \cdots + x^k + x^{k+1},$$

and subtracting the second equation from the first yields $(1 - x)s_k = 1 - x^{k+1}$. Therefore,

$$s_k = \frac{1 - x^{k+1}}{1 - x} \quad\text{if } x\ne 1, \qquad s_k = k + 1 \quad\text{if } x = 1.\tag{6.2}$$

If |x| < 1, then $x^{k+1}\to 0$ as $k\to\infty$, so $s_k\to(1 - x)^{-1}$. It also follows easily from (6.2), or from Theorem 6.1c, that $\{s_k\}$ diverges when |x| ≥ 1. In short, we have:

6.3 Theorem. The geometric series $\sum_0^\infty x^n$ converges if and only if |x| < 1, in which case its sum is $(1 - x)^{-1}$.
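Formula (6.2) is easy to confirm with exact rational arithmetic. The sketch below compares direct summation with the closed form for a few ratios x; the helper names are our own. The ratios include x = 1 and a value with |x| > 1, where the partial sums grow without bound.

```python
from fractions import Fraction

def geometric_partial_sum(x, k):
    """s_k = 1 + x + ... + x**k by direct summation."""
    return sum(x ** n for n in range(k + 1))

def geometric_closed_form(x, k):
    """Formula (6.2): (1 - x**(k+1)) / (1 - x) if x != 1, and k + 1 if x == 1."""
    return k + 1 if x == 1 else (1 - x ** (k + 1)) / (1 - x)

ratios = (Fraction(-2, 3), Fraction(1, 2), Fraction(1), Fraction(3))
```

For x = 1/2 the closed form gives $s_k = 2 - 2^{-k}$, which tends to $(1 - x)^{-1} = 2$, as Theorem 6.3 predicts.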

Another familiar result that leads to infinite series is Taylor's theorem. We recall that if f is a function of class $C^\infty$ (that is, possessing derivatives of all orders) on some interval (-c, c) centered at the origin, for any positive integer k we have

$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(k)}(0)}{k!}x^k + R_k(x) \qquad (|x| < c).\tag{6.4}$$

If it happens that $R_k(x)\to 0$ as $k\to\infty$, we can let $k\to\infty$ in (6.4) to obtain an infinite series expansion of f(x), the Taylor series of f (centered at x = 0):

$$f(x) = \sum_{n=0}^\infty\frac{f^{(n)}(0)}{n!}x^n.\tag{6.5}$$

One simple sufficient condition to guarantee that $R_k(x)\to 0$ follows from the estimate for the Taylor remainder in Corollary 2.61:

6.6 Theorem. Let f be a function of class $C^\infty$ on the interval (-c, c), where $0 < c \le \infty$.
a. If there exist constants a, b > 0 such that $|f^{(k)}(x)| \le ab^kk!$ for all |x| < c and k ≥ 0, then (6.5) holds for $|x| < \min(c, b^{-1})$.
b. If there exist constants A, B > 0 such that $|f^{(k)}(x)| \le AB^k$ for all |x| < c and k ≥ 0, then (6.5) holds for |x| < c.

Proof. By Corollary 2.61, the estimate $|f^{(k)}(x)| \le ab^kk!$ implies the estimate $|R_{k-1}(x)| \le a|bx|^k$ for |x| < c. If also $|x| < b^{-1}$, then |bx| < 1, so $R_{k-1}(x)\to 0$ as $k\to\infty$, and (6.4) yields the result (a). To deduce (b), we observe that the factorial function grows faster than exponentially (see Example 5 in §1.4), so that for any positive A, B, and b, the sequence $A(B/b)^k/k!$ tends to zero as $k\to\infty$. Letting a be the largest term in this sequence, we have

$$AB^k = \frac{A(B/b)^k}{k!}\,b^kk! \le ab^kk!,$$

so the estimate $|f^{(k)}(x)| \le AB^k$, for a given A and B, implies the estimate $|f^{(k)}(x)| \le ab^kk!$ for every b > 0 (with a depending on b). Hence (b) follows from (a). □

Remark. The interval $(-c, c)$ might not be the whole set where the function $f$ and its derivatives are defined. It may be necessary to restrict $x$ to a proper subinterval of the domain of $f$ to obtain the estimates on $f^{(k)}(x)$ in Theorem 6.6, as Example 2 will show.

Example 1. Let $f(x) = \cos x$. The derivatives $f^{(k)}(x)$ are equal to $\pm\cos x$ or $\pm\sin x$, depending on $k$, so they all satisfy $|f^{(k)}(x)| \leq 1$ for all $x$. By Theorem 6.6b, it follows that $\cos x$ is the sum of its Taylor series, $\sum_0^\infty (-1)^n x^{2n}/(2n)!$, for all $x$. For exactly the same reason, $\sin x$ is the sum of its Taylor series, $\sum_0^\infty (-1)^n x^{2n+1}/(2n+1)!$, for all $x$.
Example 2. Let $f(x) = e^x$. Here $f^{(k)}(x) = e^x$ for all $k$. We cannot obtain a good estimate on $f^{(k)}(x)$ that is valid for all $x$ at once, but for $|x| < c$ we have $|f^{(k)}(x)| \leq e^c$. By Theorem 6.6b, it follows that $e^x$ is the sum of its Taylor series, $\sum_0^\infty x^n/n!$, for $|x| < c$. But $c$ is arbitrary, so in fact $e^x = \sum_0^\infty x^n/n!$ for all $x$.
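The conclusions of Examples 1 and 2 can be illustrated numerically. The sketch below is an illustration, not from the text. It evaluates partial sums of the Taylor series of $e^x$ and $\cos x$ in floating point and compares them with Python's `math.exp` and `math.cos`. The cutoffs of 40 and 20 terms are arbitrary choices, and agreement holds only up to rounding error.

```python
import math

def exp_taylor(x, k):
    """Partial sum sum_{n=0}^k x**n / n! of the Taylor series of e^x."""
    term, total = 1.0, 1.0
    for n in range(1, k + 1):
        term *= x / n          # term is now x**n / n!
        total += term
    return total

def cos_taylor(x, k):
    """Partial sum sum_{n=0}^k (-1)**n x**(2n) / (2n)! of the series for cos x."""
    return sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(k + 1))

for x in [-3.0, 0.5, 5.0]:
    # Differences from the library values should be at the level of rounding error.
    print(x, exp_taylor(x, 40) - math.exp(x), cos_taylor(x, 20) - math.cos(x))
```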

Finally, we mention one other simple type of series that arises from time to time. Just as $\int f(x)\,dx$ is easy to compute when $f$ is the derivative of a known function, the series $\sum a_n$ is easy to sum when the terms $a_n$ are the differences of a known sequence $\{b_n\}$. That is, suppose $a_0 = b_0$ and $a_n = b_n - b_{n-1}$ for $n \geq 1$; then

$$s_k = a_0 + a_1 + \cdots + a_k = b_0 + (b_1 - b_0) + \cdots + (b_k - b_{k-1}) = b_k,$$

so the series $\sum_0^\infty a_n$ converges if and only if the sequence $\{b_n\}$ converges, in which case $\sum_0^\infty a_n = \lim b_n$. Such series are called telescoping series.
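For instance, take the hypothetical sequence $b_n = 1 - 1/(n+1)$, chosen only for illustration. Then $a_0 = 0$ and $a_n = 1/(n(n+1))$ for $n \geq 1$. The Python sketch below checks with exact fractions that the partial sums equal $b_k$, so the series sums to $\lim b_n = 1$.

```python
from fractions import Fraction

def b(n):
    # Hypothetical example sequence b_n = 1 - 1/(n+1), which converges to 1.
    return 1 - Fraction(1, n + 1)

def a(n):
    # a_0 = b_0 and a_n = b_n - b_{n-1} for n >= 1; here a_n = 1/(n(n+1)).
    return b(0) if n == 0 else b(n) - b(n - 1)

for k in range(20):
    s_k = sum(a(n) for n in range(k + 1))
    assert s_k == b(k)   # the partial sums telescope to b_k

print(float(sum(a(n) for n in range(1000))))  # equals b_999 = 0.999, approaching 1
```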

EXERCISES

1. Find the values of $x$ for which each of the following series converges and compute its sum.
a. $2(x+1) + 4(x+1)^4 + 8(x+1)^7 + \cdots + 2^{n+1}(x+1)^{3n+1} + \cdots$
b. $10x^{-2} + 20x^{-4} + 40x^{-6} + \cdots + 10 \cdot 2^n x^{-2(n+1)} + \cdots$
c. $1 + (1-x)/(1+x) + (1-x)^2/(1+x)^2 + \cdots + (1-x)^n/(1+x)^n + \cdots$
d. $\log x + (\log x)^2 + (\log x)^3 + \cdots + (\log x)^n + \cdots$
2. Tell whether each of the following series converges; if it does, find its sum.
ii3,5,_9_i17i
a. x -r 4 T g t jg t 32 t ■

c. $(\sqrt{2} - \sqrt{1}) + (\sqrt{3} - \sqrt{2}) + (\sqrt{4} - \sqrt{3}) + \cdots$
H1
a. 1-51 +
I 11 1-4-1
3 + 1-4J- +i_ 1-5
1 1_L
+

3. Let $f(x) = \log(1 + x)$. Show that the Taylor remainder $R_{0,k}(x)$ (defined by (2.54)) tends to zero as $k \to \infty$ for $-1 < x < 1$, and conclude that

$$\log(1 + x) = \sum_1^\infty (-1)^{n+1}\frac{x^n}{n} \quad \text{for } -1 < x < 1.$$

(Hint: Lagrange's formula for $R_{0,k}$ easily yields the desired result when $-\frac12 \leq x < 1$ but not when $-1 < x < -\frac12$. For $x < 0$, use the integral formula (2.56) for $R_{0,k}$ and the mean value theorem for integrals to show that $|R_{0,k}(x)| = |x|(x' - x)^k(x' + 1)^{-k-1}$ for some $x' \in (x, 0)$, and thence show

4. Given a sequence $\{a_n\}$ of numbers, let $\prod_1^k a_n$ denote the product of the numbers $a_1, \ldots, a_k$. The infinite product $\prod_1^\infty a_n$ is said to converge to the number $P$ if the sequence of partial products converges to $P$:

$$\prod_1^\infty a_n = \lim_{k\to\infty} \prod_1^k a_n = \lim_{k\to\infty} a_1a_2\cdots a_k.$$

(Note: In many books one finds a more complicated definition that takes account of the peculiar role of the number 0 with regard to multiplication.)
a. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\lim_{n\to\infty} a_n = 1$. (This is the analogue of Theorem 6.1c for products.)
b. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\sum_1^\infty \log a_n$ converges after omission of those terms for which $a_n < 0$. (By (a), there can only be finitely many such terms, and no $a_n$ can be 0.) Conversely, show that if $a_n > 0$ for all $n$ and $\sum_1^\infty \log a_n$ converges to $S$, then $\prod_1^\infty a_n$ converges to $e^S$. (See also Exercise 5 in §6.3.)

6.2 Series with Nonnegative Terms

In this section we begin the systematic study of the convergence of infinite series by considering series with nonnegative terms. If $a_n \geq 0$ for all $n$, the partial sums $s_k = a_0 + \cdots + a_k$ form an increasing sequence. By the monotone sequence theorem, therefore, the series $\sum_0^\infty a_n$ converges if and only if the partial sums $s_k$ have a finite upper bound. This observation leads to a variety of comparison tests, in which the partial sums $s_k$ are compared to more easily computable quantities that can be shown to be bounded or unbounded.

The Integral Test. If $a_n = f(n)$ where $f$ is a function of a real variable, a sum $\sum_{n=j}^k a_n$ can be compared to an integral $\int_j^k f(x)\,dx$. The virtue of this idea is that although integration is a more sophisticated concept than summation, integrals are often easier to compute than sums! The fundamental theorem, whose pictorial meaning is indicated in Figure 6.1, is as follows:

Figure 6.1: Comparison of $\int_j^k f(x)\,dx$ (the area under the curve) with $\sum_{n=j}^{k-1} f(n)$ and $\sum_{n=j+1}^{k} f(n)$ (its upper and lower Riemann sums).

6.7 Theorem. Suppose $f$ is a positive, decreasing function on the half-line $[a, \infty)$. Then for any integers $j, k$ with $a \leq j < k$,

$$\sum_{n=j}^{k-1} f(n) \geq \int_j^k f(x)\,dx \geq \sum_{n=j+1}^{k} f(n).$$

Proof. Since $f$ is decreasing, for $n \leq x \leq n + 1$ we have $f(n) \geq f(x) \geq f(n+1)$, and hence

$$f(n) = \int_n^{n+1} f(n)\,dx \geq \int_n^{n+1} f(x)\,dx \geq \int_n^{n+1} f(n+1)\,dx = f(n+1).$$

Adding up these inequalities from $n = j$ to $n = k - 1$, we obtain the asserted result. □
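The sketch below is a numerical spot check of Theorem 6.7 and is illustrative only. It uses the hypothetical choice $f(x) = x^{-2}$, whose integral $\int_j^k x^{-2}\,dx = 1/j - 1/k$ is known in closed form.

```python
def f(x):
    return 1.0 / x**2          # positive and decreasing on [1, infinity)

def integral_f(j, k):
    return 1.0 / j - 1.0 / k   # exact value of the integral of x^{-2} from j to k

def check_theorem_6_7(j, k):
    upper = sum(f(n) for n in range(j, k))          # sum_{n=j}^{k-1} f(n)
    lower = sum(f(n) for n in range(j + 1, k + 1))  # sum_{n=j+1}^{k} f(n)
    return upper >= integral_f(j, k) >= lower

print(all(check_theorem_6_7(j, k) for j in range(1, 20) for k in range(j + 1, 60)))
```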

An immediate corollary is the following test for convergence.

6.8 Corollary (The Integral Test). Suppose $f$ is a positive, decreasing function on the half-line $[1, \infty)$. Then the series $\sum_1^\infty f(n)$ converges if and only if the improper integral $\int_1^\infty f(x)\,dx$ converges.

Proof. Let $s_k = \sum_{n=1}^k f(n)$. If $\int_1^\infty f(x)\,dx < \infty$, we have

$$s_k = f(1) + \sum_{n=2}^k f(n) \leq f(1) + \int_1^k f(x)\,dx \leq f(1) + \int_1^\infty f(x)\,dx,$$

so the partial sums are bounded above and hence the series converges. On the other hand, if $\int_1^\infty f(x)\,dx = \infty$, we have

$$s_k = \sum_{n=1}^{k-1} f(n) + f(k) \geq \int_1^k f(x)\,dx + f(k) \to \infty \text{ as } k \to \infty,$$

so the series diverges. □

Of course, a similar result relates $\sum_J^\infty f(n)$ to $\int_J^\infty f(x)\,dx$, for any integer $J$. We chose $J = 1$ because it is appropriate for the following important application.

6.9 Theorem. The series $\sum_1^\infty n^{-p}$ converges if $p > 1$ and diverges if $p \leq 1$.

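Proof. The same is true of the integrals $\int_1^\infty x^{-p}\,dx$, for

$$\int_1^\infty x^{-p}\,dx = \lim_{K\to\infty} \frac{K^{1-p} - 1}{1 - p} = \begin{cases} (p-1)^{-1} & \text{if } p > 1, \\ \infty & \text{if } p < 1, \end{cases}$$

and $\int_1^\infty x^{-1}\,dx = \lim_{K\to\infty} \log K = \infty$. □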

Theorem 6.7 does more than provide a test for convergence; it also provides an approximation to the partial sums and the full sum of the series. In the convergent case, this can be used to provide a numerical approximation to the sum $\sum_1^\infty f(n)$ or an estimate of how many terms must be used for a partial sum to provide a good approximation; in the divergent case, it can be used to estimate how rapidly the partial sums grow.
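For example, in the divergent case, applying Theorem 6.7 to $f(x) = 1/x$ gives $\log(k+1) \leq \sum_1^k 1/n \leq 1 + \log k$. So the partial sums of the harmonic series grow like $\log k$. Here is a short Python check of these bounds, added as an illustration and not taken from the text:

```python
import math

def harmonic(k):
    return sum(1.0 / n for n in range(1, k + 1))

for k in [10, 1000, 100000]:
    h = harmonic(k)
    # Theorem 6.7 with f(x) = 1/x gives log(k+1) <= H_k <= 1 + log(k).
    assert math.log(k + 1) <= h <= 1 + math.log(k)
    print(k, h, math.log(k))
```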
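Suppose, for example, that $f$ is positive and decreasing, and that $\int_1^\infty f(x)\,dx < \infty$. By letting $k \to \infty$ in Theorem 6.7, we obtain

$$\sum_1^\infty f(n) \geq \int_1^\infty f(x)\,dx \geq \sum_2^\infty f(n),$$

and hence

$$0 \leq \sum_1^\infty f(n) - \int_1^\infty f(x)\,dx \leq f(1).$$

This gives an approximation to the sum $\sum_1^\infty f(n)$ with an error of at most $f(1)$.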
A better approximation can be obtained by using this estimate not for the whole series but for its tail end:

$$\int_k^\infty f(x)\,dx \leq \sum_k^\infty f(n) \leq f(k) + \int_k^\infty f(x)\,dx.$$

Adding on the first $k - 1$ terms of the series, we see that

$$\sum_1^\infty f(n) \approx \sum_1^{k-1} f(n) + \int_k^\infty f(x)\,dx, \quad \text{with an error of at most } f(k). \tag{6.10}$$

The error $f(k)$ will be as small as we please provided $k$ is sufficiently large.

Example 1. To evaluate $\sum_1^\infty n^{-4}$ with an error of at most 0.0001, we take $k = 10$ in (6.10) to get

$$\sum_1^\infty n^{-4} \approx \sum_1^9 n^{-4} + \int_{10}^\infty x^{-4}\,dx = \sum_1^9 n^{-4} + \frac{1}{3000}.$$

A bit of work with a pocket calculator yields the value of this last sum as 1.08226..., so we can conclude that $1.08226 < \sum_1^\infty n^{-4} < 1.08237$. (The exact value of $\sum_1^\infty n^{-4}$ is $\pi^4/90 = 1.0823232\ldots$; see Exercise 3 in §8.3 or Exercise 9a in §8.6.)
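The computation in Example 1 can be reproduced with a few lines of Python. The helper below is a hypothetical sketch of (6.10). The caller must supply the tail integral $\int_k^\infty f(x)\,dx$ in closed form (here $1/(3k^3)$ for $f(x) = x^{-4}$). The function returns the lower and upper bounds implied by the tail estimate.

```python
import math

def estimate_6_10(f, tail_integral, k):
    """Bounds from (6.10): sum_{n=1}^{k-1} f(n) + int_k^inf f, with error at most f(k).

    `tail_integral(k)` must return the integral of f from k to infinity; supplying it
    in closed form is the caller's job. Returns (lower, upper) for sum_{n>=1} f(n).
    """
    approx = sum(f(n) for n in range(1, k)) + tail_integral(k)
    return approx, approx + f(k)

lo, hi = estimate_6_10(lambda n: n ** -4.0, lambda k: 1.0 / (3 * k**3), 10)
print(lo, hi, math.pi**4 / 90)   # the exact value pi^4/90 lies between the bounds
```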

General Comparison Tests. One can often decide whether a series of nonnegative terms converges by comparing it to a series whose convergence or divergence is known. The general method is as follows.

6.11 Theorem. Suppose $0 \leq a_n \leq b_n$ for $n \geq 0$. If $\sum_0^\infty b_n$ converges, then so does $\sum_0^\infty a_n$. If $\sum_0^\infty a_n$ diverges, then so does $\sum_0^\infty b_n$.

Proof. Let $s_k = \sum_0^k a_n$ and $t_k = \sum_0^k b_n$; thus $0 \leq s_k \leq t_k$ for all $k$. If $\sum b_n$ converges, the numbers $t_k$ form a bounded set; hence so do the numbers $s_k$, so the sequence $\{s_k\}$ converges by the monotone sequence theorem. This proves the first assertion, to which the second one is logically equivalent. □

A couple of remarks are in order concerning this result. First, the convergence or divergence of a series is unaffected if finitely many terms are deleted from or added to the series. Hence, the comparison $a_n \leq b_n$ only has to be valid for all $n > N$, where $N$ is some (possibly large) positive integer. Second, the convergence or divergence of a series is unaffected if all the terms of the series are multiplied by a nonzero constant. Hence, the comparison $a_n \leq b_n$ can be replaced by $a_n \leq cb_n$, where $c$ is any positive number.
When $a_n$ is an algebraic function of $n$ (obtained from $n$ by applying various combinations of the arithmetic operations together with the operation of raising to a power, $x \to x^a$), one can usually decide the convergence of $\sum a_n$ by comparing it to one of the series $\sum_1^\infty n^{-p}$ discussed in Theorem 6.9. The rule of thumb, obtained by combining Theorems 6.9 and 6.11, is that if $a_n \geq cn^{-1}$ then $\sum a_n$ diverges, whereas if $a_n \leq cn^{-p}$ for some $p > 1$ then $\sum a_n$ converges.

Example 2. The series $\sum_1^\infty (2n - 1)^{-1} = 1 + \frac13 + \frac15 + \cdots$ diverges by comparison to $\sum_1^\infty n^{-1}$, for

$$\frac{1}{2n - 1} > \frac{1}{2n} = \frac12 \cdot \frac1n.$$

Example 3. The series $\sum_1^\infty (n^2 - 6n + 10)^{-1}$ converges by comparison to $\sum_1^\infty n^{-2}$, but here the comparison takes more work to establish. Since $6n > 10$ except for $n = 1$, it is not true that $(n^2 - 6n + 10)^{-1} \leq n^{-2}$. However, we can observe that when $n \geq 12$ we have $6n \leq \frac12 n^2$, and hence

$$\frac{1}{n^2 - 6n + 10} \leq \frac{1}{(n^2/2) + 10} \leq \frac{1}{n^2/2} = \frac{2}{n^2} \qquad (n \geq 12),$$

which gives the desired comparison. However, there is also a simpler way to proceed. The key observation is that when $n$ is large, $-6n + 10$ is negligibly small in comparison with $n^2$, so $(n^2 - 6n + 10)^{-1}$ is practically equal to $n^{-2}$. More precisely,

$$\frac{(n^2 - 6n + 10)^{-1}}{n^{-2}} = \frac{n^2}{n^2 - 6n + 10} = \frac{1}{1 - 6n^{-1} + 10n^{-2}} \to 1 \text{ as } n \to \infty,$$

which immediately gives the comparison $(n^2 - 6n + 10)^{-1} \leq 2n^{-2}$ when $n$ is large.
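The ratio in the second method can also be tabulated directly. This Python sketch is illustrative only. It prints $n^2/(n^2 - 6n + 10)$ for a few values of $n$. It then checks the explicit bound $(n^2 - 6n + 10)^{-1} \leq 2n^{-2}$ from the first method over a finite range, which is a spot check rather than a proof.

```python
def ratio(n):
    # (n^2 - 6n + 10)^{-1} divided by n^{-2}, i.e. n^2 / (n^2 - 6n + 10)
    return n**2 / (n**2 - 6 * n + 10)

for n in [1, 3, 12, 100, 10**6]:
    print(n, ratio(n))        # large for small n, tends to 1 as n grows

# The first (explicit) comparison: (n^2 - 6n + 10)^{-1} <= 2 n^{-2} for n >= 12.
assert all(1 / (n**2 - 6 * n + 10) <= 2 / n**2 for n in range(12, 10000))
```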

The second method for solving Example 3 can be formulated quite generally;
the result is often called the limit comparison test:

6.12 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are sequences of positive numbers and that $a_n/b_n$ approaches a positive, finite limit as $n \to \infty$. Then the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are either both convergent or both divergent.

Proof. If $a_n/b_n \to l$ as $n \to \infty$, where $0 < l < \infty$, we have $\frac12 l < a_n/b_n < 2l$ when $n$ is large; that is, $a_n < 2lb_n$ and $b_n < (2/l)a_n$. The result therefore follows from Theorem 6.11 and the remarks following it. □

Theorem 6.12 can be extended a little. If $a_n/b_n \to 0$ as $n \to \infty$, then $a_n < b_n$ for large $n$, so the convergence of $\sum b_n$ will imply the convergence of $\sum a_n$. Likewise, if $a_n/b_n \to \infty$, then $a_n > b_n$ for large $n$, so the convergence of $\sum a_n$ will imply the convergence of $\sum b_n$. However, the reverse implications are not valid in these cases.

Comparisons to the Geometric Series. There are a couple of very useful convergence tests that are based on a comparison to the geometric series $\sum_0^\infty r^n$, where $r > 0$. We recall that this series converges for $r < 1$ and diverges for $r \geq 1$.

6.13 Theorem (The Ratio Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_{n+1}/a_n \leq r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum a_n$ converges. On the other hand, if $a_{n+1}/a_n \geq 1$ for all sufficiently large $n$, then the series $\sum a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_{n+1}/a_n$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.

Proof. Suppose $a_{n+1}/a_n \leq r < 1$ for all $n \geq N$. Then

$$a_{N+1} \leq ra_N, \quad a_{N+2} \leq ra_{N+1} \leq r^2 a_N, \quad a_{N+3} \leq ra_{N+2} \leq r^3 a_N, \quad \ldots,$$

so $a_{N+m} \leq r^m a_N$ for all $m \geq 0$. The series $\sum a_n$ therefore converges by comparison to the geometric series $\sum r^m$:

$$\sum_0^\infty a_n \leq a_0 + \cdots + a_{N-1} + a_N(1 + r + r^2 + \cdots) < \infty.$$

On the other hand, if $a_{n+1}/a_n \geq 1$ then $a_{n+1} \geq a_n$; if this is so for all $n \geq N$, then $a_n \not\to 0$, so $\sum a_n$ cannot converge. This proves (a).
Assertion (b) is a corollary of (a). If $l < 1$, choose $r$ with $l < r < 1$. If $\lim a_{n+1}/a_n = l$, then $a_{n+1}/a_n < r$ for large $n$, so $\sum a_n$ converges. If $l > 1$, then $a_{n+1}/a_n > 1$ for large $n$, so $\sum a_n$ diverges. Finally, if we take $a_n = n^{-p}$, we know that $\sum_1^\infty a_n$ converges if $p > 1$ and diverges if $p \leq 1$; but $a_{n+1}/a_n = [n/(n+1)]^p \to 1$ no matter what $p$ is. Hence the test is inconclusive if $l = 1$. □

6.14 Theorem (The Root Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_n^{1/n} \leq r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$ converges. On the other hand, if $a_n^{1/n} \geq 1$ for all sufficiently large $n$, then the series $\sum a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_n^{1/n}$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.

Proof. If $a_n^{1/n} \leq r$, we have $a_n \leq r^n$, so we have an immediate comparison to the geometric series $\sum r^n$ that gives the convergence of $\sum a_n$ when $r < 1$. If $a_n^{1/n} \geq 1$ then $a_n \geq 1$, so $a_n \not\to 0$ and $\sum a_n$ diverges. This proves (a).

Part (b) follows as in the proof of the ratio test. If $a_n^{1/n} \to l < 1$, we pick $r \in (l, 1)$ and obtain $a_n^{1/n} < r$ for large $n$, so $\sum a_n$ converges. If $a_n^{1/n} \to l > 1$, then $a_n^{1/n} > 1$ for large $n$, and $\sum a_n$ diverges. Finally, for $a_n = n^{-p}$ we have $a_n^{1/n} = n^{-p/n} \to 1$ for any $p$, so the test is inconclusive when $l = 1$. □

In the last line of this proof, and in Example 4 below, we use the fact that $\lim_{x\to\infty} x^{1/x} = 1$. To see this, observe that $\log(x^{1/x}) = (\log x)/x$, and $\lim_{x\to\infty} (\log x)/x = 0$ by l'Hôpital's rule.

It can be shown that if $a_{n+1}/a_n$ converges to a limit $l$, then $a_n^{1/n}$ also converges to the same limit; but the convergence of $a_n^{1/n}$ does not imply the convergence of $a_{n+1}/a_n$. (See Example 6.) Thus the root test is, in theory, more powerful than the ratio test. However, the ratio test is often more convenient to use in practice, especially for series whose terms involve factorials or similar sorts of products.

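Example 4. Let $a_n = n^2/2^n$. The ratio test and the root test can both be used to establish the convergence of $\sum_0^\infty a_n$:

$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^2/2^{n+1}}{n^2/2^n} = \frac12\left[\frac{n+1}{n}\right]^2 \to \frac12, \qquad a_n^{1/n} = \frac{(n^{1/n})^2}{2} \to \frac12.$$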

Example 5. Let $a_n = \dfrac{1 \cdot 4 \cdots (3n+1)}{2^n n!}$. Here the root test is cumbersome, but the ratio test works easily:

$$\frac{a_{n+1}}{a_n} = \frac{1 \cdot 4 \cdots (3n+1)(3n+4)/2^{n+1}(n+1)!}{1 \cdot 4 \cdots (3n+1)/2^n n!} = \frac{3n+4}{2(n+1)} \to \frac32,$$

so $\sum_0^\infty a_n$ diverges.
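As a check on the algebra in Example 5, here is an added illustration. The Python sketch below computes $a_n$ exactly from its definition with `fractions.Fraction` and `math.prod` (available in Python 3.8 and later). It then confirms that $a_{n+1}/a_n = (3n+4)/2(n+1)$.

```python
from fractions import Fraction
from math import factorial, prod

def a(n):
    # Example 5: a_n = 1*4*...*(3n+1) / (2^n n!)
    return Fraction(prod(3 * j + 1 for j in range(n + 1)), 2**n * factorial(n))

for n in [1, 10, 100]:
    r = a(n + 1) / a(n)
    assert r == Fraction(3 * n + 4, 2 * (n + 1))   # the simplified ratio from the text
    print(n, float(r))                              # tends to 3/2 > 1
```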

Example 6. Let $a_n = 2^{-n/2}$ if $n$ is even and $a_n = 2^{-(n-1)/2}$ if $n$ is odd; thus

$$\sum_0^\infty a_n = 1 + 1 + \frac12 + \frac12 + \frac14 + \frac14 + \cdots.$$

Here $a_{n+1}/a_n$ equals 1 if $n$ is even and $\frac12$ if $n$ is odd, so the ratio test (even the more general form in part (a) of Theorem 6.13) fails; its hypotheses are not satisfied. But the root test works: $a_n^{1/n}$ equals $2^{-1/2}$ if $n$ is even and $2^{-(n-1)/2n}$ if $n$ is odd; both of these expressions converge to $2^{-1/2}$ as $n \to \infty$, so the series converges. (Of course, this can also be proved more simply. By grouping the terms in pairs, one sees that $\sum_0^\infty a_n = 2\sum_0^\infty 2^{-m} = 4$.)
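The behavior in Example 6 is easy to see numerically. In the sketch below, which is illustrative only, the exponent $-n/2$ for even $n$ and $-(n-1)/2$ for odd $n$ is written compactly as `-(n // 2)` using integer floor division.

```python
def a(n):
    # Example 6: 2^{-n/2} for even n, 2^{-(n-1)/2} for odd n; both equal 2^{-(n // 2)}.
    return 2.0 ** (-(n // 2))

ratios = [a(n + 1) / a(n) for n in range(10)]
print(ratios)                          # [1.0, 0.5, 1.0, 0.5, ...]: no limit, ratio test fails
roots = [a(n) ** (1 / n) for n in (10, 11, 1000, 1001)]
print(roots, 2 ** -0.5)                # the n-th roots approach 2^{-1/2}
print(sum(a(n) for n in range(200)))   # partial sums approach 4
```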

Raabe's Test. The ratio test and the root test are, in a sense, rather crude, for the indecisive cases where $\lim a_{n+1}/a_n = 1$ or $\lim a_n^{1/n} = 1$ include many commonly encountered series such as $\sum_1^\infty n^{-p}$. The reason for this insensitivity is that the terms of the geometric series $\sum r^n$ either converge to zero exponentially fast (if $r < 1$) or not at all (if $r \geq 1$), so they do not furnish a useful comparison for quantities such as $n^{-p}$ that tend to zero only polynomially fast. However, there is another test, Raabe's test, that is sometimes useful in the case where $\lim a_{n+1}/a_n = 1$. The class of problems for which Raabe's test is effective is rather limited, and there is another way of attacking the most important of them that we shall present in §7.6. Hence we view Raabe's test as an optional topic; however, the insight behind it is of interest in its own right.
The idea is to use the ratios $a_{n+1}/a_n$ to compare the series $\sum a_n$ to one of the series $\sum n^{-p}$ rather than to the geometric series. For the series $\sum n^{-p}$, the ratio of two successive terms is $(n+1)^{-p}/n^{-p} = [1 + (1/n)]^{-p}$. To put this quantity in a form more amenable to comparison, we use the tangent line approximation to the function $f(x) = (1 + x)^{-p}$ at $x = 0$. Since $f'(x) = -p(1 + x)^{-p-1}$ and $f''(x) = p(p+1)(1 + x)^{-p-2}$, Lagrange's formula for the error term gives

$$(1 + x)^{-p} = 1 - px + E(x), \qquad 0 < E(x) < \frac{p(p+1)}{2}x^2 \ \text{ for } x > 0.$$

Hence,

$$1 - \frac{p}{n} < \frac{(n+1)^{-p}}{n^{-p}} < 1 - \frac{p}{n} + \frac{p(p+1)}{2n^2}. \tag{6.15}$$

Thus, $n[1 - (n+1)^{-p}/n^{-p}]$ is approximately $p$ when $n$ is large. With this in mind, we are ready for the main result.
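Inequality (6.15) can be spot-checked numerically. This Python sketch is an illustration, not a proof. It tests the two-sided bound for a few arbitrarily chosen values $p > 0$ over a finite range of $n$. Note that the positivity of the error term $E(x)$ used above relies on $p(p+1) > 0$, which holds for $p > 0$.

```python
def check_6_15(p, n):
    """Test 1 - p/n < (n+1)^{-p} / n^{-p} < 1 - p/n + p(p+1)/(2n^2)."""
    ratio = (1 + 1 / n) ** (-p)        # equals (n+1)^{-p} / n^{-p}
    lower = 1 - p / n
    upper = 1 - p / n + p * (p + 1) / (2 * n**2)
    return lower < ratio < upper

print(all(check_6_15(p, n) for p in (0.5, 1.0, 2.0, 3.5) for n in range(1, 2000)))
```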

6.16 Theorem (Raabe's Test). Let $\{a_n\}$ be a sequence of positive numbers. Suppose that

$$\frac{a_{n+1}}{a_n} \to 1 \quad \text{and} \quad n\left(1 - \frac{a_{n+1}}{a_n}\right) \to L \quad \text{as } n \to \infty.$$

If $L > 1$, the series $\sum a_n$ converges, and if $L < 1$, the series $\sum a_n$ diverges. (If $L = 1$, no conclusion can be drawn.)

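Proof. If $L > 1$, choose a number $p$ with $1 < p < L$. For sufficiently large $n$ we have $n(1 - a_{n+1}/a_n) > p$, that is, $a_{n+1}/a_n < 1 - (p/n)$. Thus, by (6.15),

$$\frac{a_{n+1}}{a_n} < 1 - \frac{p}{n} < \frac{(n+1)^{-p}}{n^{-p}}, \quad \text{or} \quad \frac{a_{n+1}}{(n+1)^{-p}} < \frac{a_n}{n^{-p}}.$$

Thus the sequence $\{a_n/n^{-p}\}$ is decreasing (for large $n$), so it is bounded above by a constant $C$. In other words, $a_n \leq Cn^{-p}$, so since $p > 1$, $\sum a_n$ converges by comparison to $\sum n^{-p}$.

On the other hand, if $L < 1$, choose numbers $p$ and $q$ with $L < q < p < 1$ and $q > 0$. For sufficiently large $n$ we have $n(1 - a_{n+1}/a_n) < q$, that is, $a_{n+1}/a_n > 1 - (q/n)$. If also $n > p(p+1)/2(p - q)$, we have $p(p+1)/2n^2 < (p - q)/n$, so by (6.15),

$$\frac{a_{n+1}}{a_n} > 1 - \frac{q}{n} = 1 - \frac{p}{n} + \frac{p - q}{n} > 1 - \frac{p}{n} + \frac{p(p+1)}{2n^2} > \frac{(n+1)^{-p}}{n^{-p}}.$$

Thus $(n+1)^{-p}/a_{n+1} < n^{-p}/a_n$, so the sequence $\{n^{-p}/a_n\}$ is decreasing. As before, this gives $n^{-p} \leq Ca_n$, and $p < 1$ in this case, so $\sum a_n$ diverges by comparison to $\sum n^{-p}$. □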

The main applications of Raabe's test are to series whose terms involve quotients of factorial-like products. The following example is typical.

Example 7. Let $a_n = \dfrac{1 \cdot 4 \cdot 7 \cdots (3n+1)}{n^2 3^n n!}$. We have

$$\frac{a_{n+1}}{a_n} = \frac{1 \cdot 4 \cdots (3n+1)(3n+4)/(n+1)^2 3^{n+1}(n+1)!}{1 \cdot 4 \cdots (3n+1)/n^2 3^n n!} = \frac{(3n+4)n^2}{3(n+1)^3}.$$

This tends to 1 as $n \to \infty$ (the dominant term on both top and bottom is $3n^3$), so the ratio test fails. But

$$n\left[1 - \frac{a_{n+1}}{a_n}\right] = n\left[1 - \frac{(3n+4)n^2}{3(n+1)^3}\right] = \frac{5n^3 + 9n^2 + 3n}{3(n+1)^3} \to \frac53,$$

and $\frac53 > 1$, so the series $\sum a_n$ converges.
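Here is a short exact-arithmetic check of Example 7, added for illustration. It recomputes $a_n$ from its definition and confirms the simplified ratio $(3n+4)n^2/3(n+1)^3$. It also tabulates Raabe's quantity $n(1 - a_{n+1}/a_n)$, which approaches $5/3$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def a(n):
    # Example 7: a_n = 1*4*7*...*(3n+1) / (n^2 3^n n!), defined for n >= 1
    return Fraction(prod(3 * j + 1 for j in range(n + 1)), n**2 * 3**n * factorial(n))

def ratio(n):
    # simplified form of a_{n+1}/a_n derived in the text
    return Fraction((3 * n + 4) * n**2, 3 * (n + 1) ** 3)

def raabe(n):
    # Raabe's quantity n(1 - a_{n+1}/a_n)
    return n * (1 - ratio(n))

for n in range(1, 30):
    assert a(n + 1) / a(n) == ratio(n)

for n in [10, 1000, 10**6]:
    print(n, float(ratio(n)), float(raabe(n)))   # ratio -> 1, Raabe's quantity -> 5/3
```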

Concluding Remarks. Faced with an infinite series $\sum a_n$, how does one decide how to test it for convergence? Some series require more cleverness than others, but the following rules of thumb may be helpful.

• Does $a_n \to 0$ as $n \to \infty$? If not, $\sum a_n$ diverges.

• If $a_n$ is an algebraic function of $n$ (say, a rational function of $n$, or a similar expression involving fractional powers of $n$), try comparison with $\sum n^{-p}$ for a suitable value of $p$.

• If $a_n$ involves expressions with $n$ in the exponent, try the ratio test or the root test.

• If $a_n$ involves factorial-like products, the ratio test is the best bet. If the ratio test fails because $\lim a_{n+1}/a_n = 1$, try Raabe's test.

• The integral test may be useful when numerical estimates are desired or when the series is near the borderline between convergence and divergence.

In any case, one should beware of confusing the various sequences that arise in the study of infinite series. For any infinite series $\sum a_n$, one has the sequence $\{a_n\}$ of terms and the sequence $\{s_k\}$ of partial sums. In the ratio test, one considers the sequence $\{a_{n+1}/a_n\}$ of ratios of successive terms of a series, whereas in the limit comparison test, one considers the sequence $\{a_n/b_n\}$ of ratios of corresponding terms of two different series. Don't mix these sequences up!

EXERCISES

In Exercises 1-18, test the series for convergence.

^ \Jn + 1
n2 _ 4n _|_ 5 ■

x 2n2 - n
, 2n8/3

n + 1

n! '

3n(2n)!
0
OC'
n!
7.
10n'
oc

\-100
8.
2
oo
1 ■ 3 ■ ■ • (2n + 1)
u
7■
/ ^ ■•(3n + 2)
0

00 / 1 \9

{2n)V
0
00
on |
O Tl.
11. y ■

n
0

( " y2
00

53
) ■
12.

E (l/n)].
r-i
13.
1
00
y/n+ 1 - V"
14.
E \fn-+^
1
00
n
15.
E 1
sin _
n2 H--3"
00
n2 [vr + (_1)n]n

16.
E 5
1
00
(2n - 1)
17
it*

1
4.6... (2n + 2)
00
2-4- • • (2n)

19. Suppose an > 0. Show that if ^T an converges, then so does J2 a« f°r any

20. Show that $\sum_2^\infty \dfrac{1}{n(\log n)^p}$ converges if $p > 1$ and diverges if $p \leq 1$.

21. For which $p$ does $\sum \dfrac{1}{n(\log n)(\log\log n)^p}$ converge?

22. By Exercise 20, $\sum_2^\infty 1/[n\log n]$ diverges while $\sum_2^\infty 1/[n(\log n)^2]$ converges.
Use Theorem 6.7 to show that

10*' oo

4.88 < V — < 5.61, 0.011.

nlogn

The point is that for series such as these that are near the borderline between convergence and divergence, attempts at numerical approximation by adding up the first few terms aren't much use. If you add up the first $10^{40}$ terms of the first series, you get no clue that the series diverges; and if you add up the first $10^{40}$ terms of the second one, the answer you get still differs from the full sum in the second decimal place. (By way of comparison, the universe is around $10^{18}$ seconds old, and the earth contains around $10^{50}$ atoms.)
23. Verify that $x/(x^2 + 1)^2$ is decreasing for $x > 3^{-1/2}$, and thence show that $0.38 < \sum_1^\infty n/(n^2 + 1)^2 < 0.41$.
24. Let $c_k = 1 + \frac12 + \cdots + \frac1k - \log k$. Show that the sequence $\{c_k\}$ is positive and decreasing, and hence convergent. ($\lim_{k\to\infty} c_k$ is conventionally denoted by $\gamma$ and is called Euler's constant or the Euler-Mascheroni constant. It is approximately equal to 0.57721; it is conjectured to be transcendental, but at present no one knows whether it is even irrational.)

25. Suppose $a_n \geq 0$ for all $n \geq 0$, and let $L = \limsup a_n^{1/n}$ (see Exercises 9-12 in §1.5). Show that $\sum_0^\infty a_n$ converges if $L < 1$ and diverges if $L > 1$.

6.3 Absolute and Conditional Convergence

We now consider the question of convergence of series whose terms may be either
positive or negative. To a certain extent, this question may be reduced to the study
of series with nonnegative terms, via the notion of absolute convergence.
A series $\sum_0^\infty a_n$ is called absolutely convergent if the series $\sum_0^\infty |a_n|$ converges. For series with nonnegative terms, absolute convergence is the same thing as convergence. For more general series, the basic result is as follows.

6.17 Theorem. Every absolutely convergent series is convergent.

Proof. Suppose $\sum_0^\infty |a_n|$ converges. Let $s_k = \sum_0^k a_n$ and $S_k = \sum_0^k |a_n|$. The sequence $\{S_k\}$ is convergent and hence Cauchy, so given $\epsilon > 0$, there exists an integer $K$ such that

$$|a_{j+1}| + \cdots + |a_k| = S_k - S_j < \epsilon \quad \text{whenever } k > j > K.$$

But then

$$|s_k - s_j| = |a_{j+1} + \cdots + a_k| \leq |a_{j+1}| + \cdots + |a_k| < \epsilon \quad \text{whenever } k > j > K,$$

so the sequence $\{s_k\}$ is also Cauchy. By Theorem 1.20, the sequence $\{s_k\}$, and hence the series $\sum a_n$, is convergent. □

Important Remark. We can consider series whose terms are complex numbers or $n$-dimensional vectors instead of real numbers. The definition of absolute convergence is the same, with $|a_n|$ denoting the norm of the vector $a_n$. Theorem 6.17 remains valid in this more general setting, with exactly the same proof.

The converse of Theorem 6.17 is false; a series that is not absolutely convergent
may still converge because of cancellation between the positive and negative terms.
A series that converges but does not converge absolutely is said to be conditionally
convergent.

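Example 1. Let $a_n = 1/(n+1)$ if $n$ is even, $a_n = -1/n$ if $n$ is odd; thus

$$\sum_0^\infty a_n = 1 - 1 + \frac13 - \frac13 + \frac15 - \frac15 + \cdots.$$

Clearly $s_k = 0$ if $k$ is odd and $s_k = 1/(k+1)$ if $k$ is even, so the series converges to the sum 0. However,

$$\sum_0^\infty |a_n| = 2\sum_0^\infty \frac{1}{2n+1} = 2\left(1 + \frac13 + \frac15 + \cdots\right),$$

which diverges by comparison to $\sum n^{-1}$.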

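Example 2. Here is a more interesting example. The series

$$\sum_1^\infty \frac{(-1)^{n-1}}{n} = 1 - \frac12 + \frac13 - \frac14 + \cdots$$

is not absolutely convergent since $\sum_1^\infty n^{-1}$ diverges. However, it is the Taylor series for $f(x) = \log(1 + x)$ at $x = 1$. Indeed, for $n > 0$ we have $f^{(n)}(x) = (-1)^{n-1}(n-1)!(1 + x)^{-n}$, so Taylor's formula gives

$$\log(1 + x) = \sum_1^k \frac{(-1)^{n-1}(n-1)!}{n!}x^n + R_k(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + \frac{(-1)^{k-1}x^k}{k} + R_k(x),$$

and by Corollary 2.61,

$$|R_k(1)| \leq \frac{1}{(k+1)!} \sup_{0 < t < 1} \left| \frac{(-1)^k k!}{(1 + t)^{k+1}} \right| = \frac{1}{k+1},$$

which tends to zero as $k \to \infty$. It follows that $\sum_1^\infty (-1)^{n-1}/n$ converges to $\log 2$.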
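Numerically, as an illustration not taken from the text, the partial sums of $\sum (-1)^{n-1}/n$ approach $\log 2$ slowly. The remainder estimate $|R_k(1)| \leq 1/(k+1)$ can be checked directly:

```python
import math

def s(k):
    # k-th partial sum of 1 - 1/2 + 1/3 - ... + (-1)^(k-1)/k
    return sum((-1) ** (n - 1) / n for n in range(1, k + 1))

for k in [10, 100, 1000, 10000]:
    err = abs(s(k) - math.log(2))
    assert err <= 1 / (k + 1)        # the remainder bound |R_k(1)| <= 1/(k+1)
    print(k, s(k), err)
```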

It is to be emphasized that conditionally convergent series converge only because of cancellation between positive and negative terms. More precisely, let

$$a_n^+ = \max(a_n, 0), \qquad a_n^- = \max(-a_n, 0).$$

That is, $a_n^+ = a_n$ if $a_n$ is positive and $a_n^+ = 0$ otherwise, and $a_n^- = |a_n|$ if $a_n$ is negative and $a_n^- = 0$ otherwise; the nonzero $a_n^+$'s are the positive terms of the series $\sum a_n$, and the nonzero $a_n^-$'s are the absolute values of the negative terms. Observe that

$$a_n^+ - a_n^- = a_n, \qquad a_n^+ + a_n^- = |a_n|.$$

6.18 Theorem. If $\sum a_n$ is absolutely convergent, the series $\sum a_n^+$ and $\sum a_n^-$ are both convergent. If $\sum a_n$ is conditionally convergent, the series $\sum a_n^+$ and $\sum a_n^-$ are both divergent.

Proof. This theorem follows from the following three facts:

i. The convergence of $\sum |a_n|$ implies the convergence of $\sum a_n^+$ and $\sum a_n^-$.
ii. The divergence of $\sum |a_n|$ implies the divergence of at least one of $\sum a_n^+$ and $\sum a_n^-$.
iii. If $\sum a_n$ converges, it cannot happen that one of $\sum a_n^+$ and $\sum a_n^-$ converges while the other one diverges.

The first of these is clear since $0 \leq a_n^+ \leq |a_n|$ and $0 \leq a_n^- \leq |a_n|$, and the second is clear since $|a_n| = a_n^+ + a_n^-$. As for the third, let $s_k$, $s_k^+$, and $s_k^-$ denote the $k$th partial sums of the series $\sum a_n$, $\sum a_n^+$, and $\sum a_n^-$; thus $s_k = s_k^+ - s_k^-$. Suppose, to be definite, that $\sum a_n^+ = \infty$ while $\sum a_n^- = S < \infty$; then for any $C > 0$ (no matter how large), for sufficiently large $k$ we will have $s_k^+ > C + S$, while $s_k^- \leq S$, so that $s_k > C + S - S = C$. It follows that $s_k \to +\infty$, so $\sum a_n$ diverges. □

Absolutely convergent series are much more pleasant to deal with than conditionally convergent ones. For one thing, they converge more rapidly; the partial sums $s_k$ of conditionally convergent series tend to provide poor approximations to the full sum unless one takes $k$ very large, because the divergence of $\sum |a_n|$ implies that $a_n$ cannot tend to zero very rapidly as $n \to \infty$. For another thing, the sum of an absolutely convergent series cannot be affected by rearranging the terms, but this is not the case for conditionally convergent series!
Let us explain this mysterious statement in more detail. The terms of a series $\sum_0^\infty a_n$ are presented in a definite order: $a_0, a_1, a_2, \ldots$. We might think of forming a new series by writing down these terms in a different order, such as

$$a_0, a_2, a_1, a_4, a_6, a_3, a_8, a_{10}, a_5, \ldots,$$

where we take the first two even-numbered terms, the first odd-numbered term, the next two even-numbered terms, the next odd-numbered term, and so forth. In general, if $\sigma$ is any one-to-one mapping from the set of nonnegative integers onto itself, we can form the series $\sum_0^\infty a_{\sigma(n)}$, which we call a rearrangement of $\sum_0^\infty a_n$. (The reasons why we would want to do this are perhaps not so clear right now, but we will encounter situations in §6.5 where this issue must be addressed.) The sharp contrast between absolutely and conditionally convergent series with respect to rearrangements is explained in the following two theorems.

6.19 Theorem. If $\sum_0^\infty a_n$ is absolutely convergent with sum $S$, then every rearrangement $\sum_0^\infty a_{\sigma(n)}$ is also absolutely convergent with sum $S$.

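Proof. First suppose $a_n \geq 0$ for all $n$. Every term of the rearranged series $\sum a_{\sigma(n)}$ is among the terms of the original series $\sum a_n$, and hence the partial sums of the rearranged series cannot exceed $S$. It follows that the full sum $S'$ of the rearranged series satisfies $S' \leq S$. The same reasoning shows that $S \leq S'$, so $S' = S$.
Now we do the general case. If $\sum |a_n| < \infty$, we have $\sum |a_{\sigma(n)}| < \infty$ by what we have just proved. Hence, given $\epsilon > 0$, for $k$ sufficiently large we have

$$\sum_{k+1}^\infty |a_n| < \epsilon \quad \text{and} \quad \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon.$$

Given such a $k$, let $K$ be the largest of the numbers $\sigma(0), \ldots, \sigma(k)$, so that

$$\{\sigma(0), \sigma(1), \ldots, \sigma(k)\} \subset \{0, 1, \ldots, K\}.$$

The elements of $\{0, 1, \ldots, K\} \setminus \{\sigma(0), \sigma(1), \ldots, \sigma(k)\}$ are among the $\sigma(n)$'s with $n \geq k + 1$, so

$$\left| \sum_0^K a_n - \sum_0^k a_{\sigma(n)} \right| \leq \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon.$$

But then

$$\left| \sum_0^k a_{\sigma(n)} - S \right| \leq \left| \sum_0^k a_{\sigma(n)} - \sum_0^K a_n \right| + \left| \sum_0^K a_n - S \right| < \epsilon + \sum_{K+1}^\infty |a_n| < 2\epsilon.$$

As $\epsilon$ is arbitrary, we conclude that $\sum_0^\infty a_{\sigma(n)} = S$. □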

6.20 Theorem. Suppose $\sum_0^\infty a_n$ is conditionally convergent. Given any real number $S$, there is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ that converges to $S$.

Proof. By Theorem 6.18, the series $\sum a_n^+$ and $\sum a_n^-$ of positive and negative terms from $\sum a_n$ both diverge; but since $\sum a_n$ converges, we have $a_n \to 0$ as $n \to \infty$. These pieces of information are all we need.
Suppose $S \geq 0$. (A similar argument works for $S < 0$.) We construct the desired rearrangement as follows:
1. Add up the positive terms from the series $\sum a_n$ (in their original order) until the sum exceeds $S$. This is possible since $\sum a_n^+ = \infty$. Stop as soon as the sum exceeds $S$.
2. Now start adding in the negative terms (in their original order) until the sum becomes less than $S$. Again, this is possible since $\sum a_n^- = \infty$. Stop as soon as the sum is less than $S$.
3. Repeat steps 1 and 2 ad infinitum. That is, add in positive terms until the sum is greater than $S$, then add in negative terms until the sum is less than $S$, and so forth. This process never terminates since the series $\sum a_n^+$ and $\sum a_n^-$ both diverge, and sooner or later every term from the original series will be added into the new series. The result is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ of the original series.
We claim that this rearrangement converges to $S$. Indeed, given $\epsilon > 0$, there exists an integer $N$ so that $|a_n| < \epsilon$ if $n > N$. If we choose $K$ large enough so that all the terms $a_0, a_1, \ldots, a_N$ are included among the terms $a_{\sigma(0)}, a_{\sigma(1)}, \ldots, a_{\sigma(K)}$, then $|a_{\sigma(n)}| < \epsilon$ if $n > K$. It follows that the partial sums $\sum_0^k a_{\sigma(n)}$ differ from $S$ by less than $\epsilon$ if $k > K$, because the procedure specifies switching from positive to negative terms or vice versa as soon as the sum is greater than or less than $S$; if the sum became greater than $S + \epsilon$ or less than $S - \epsilon$, we would have added in too many terms of the same sign. Hence the sums $\sum_0^k a_{\sigma(n)}$ converge to $S$. □
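The construction in this proof is effectively a greedy algorithm. The Python sketch below is illustrative only. The function name is hypothetical, and the choice of the series $\sum (-1)^{n-1}/n$ from Example 2 is ours. It carries out steps 1-3 for a finite number of terms and returns the partial sums. The loop starts with whichever family moves the sum toward the target, so it also accepts negative targets.

```python
def rearrange_to(target, num_terms):
    """Greedy rearrangement of sum (-1)^(n-1)/n (n >= 1) following steps 1-3
    of the proof of Theorem 6.20. Returns the list of partial sums.

    Positive terms are 1, 1/3, 1/5, ...; negative terms are -1/2, -1/4, ...;
    each family is used in its original order.
    """
    next_pos, next_neg = 1, 2          # next unused odd / even denominator
    total, partial_sums = 0.0, []
    for _ in range(num_terms):
        if total <= target:            # add positive terms until the sum exceeds target
            total += 1.0 / next_pos
            next_pos += 2
        else:                          # then negative terms until it drops below target
            total -= 1.0 / next_neg
            next_neg += 2
        partial_sums.append(total)
    return partial_sums

sums = rearrange_to(2.0, 100000)
print(sums[-1])   # close to the target 2.0, although the original series sums to log 2
```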

EXERCISES

1. Show that the following series are absolutely convergent.
a. $\sum_0^\infty x^n \cos n\theta$ ($|x| < 1$, $\theta \in \mathbb{R}$).
b. $\sum_1^\infty n^{-2} \sin n\theta$ ($\theta \in \mathbb{R}$).
c. $\sum_0^\infty (-1)^n n^2 3^{1-n} x^n$ ($|x| < 3$).
2. Suppose $\sum a_n$ is conditionally convergent. Show that there are rearrangements of $\sum a_n$ whose partial sums diverge to $+\infty$ or $-\infty$.
3. Consider the rearrangement of the series $\sum_1^\infty (-1)^{n-1}/n$ obtained by taking two positive terms, one negative term, two positive terms, one negative term, and so forth:

$$1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \frac19 + \frac1{11} - \frac16 + \cdots.$$

Show that the sum of this series is $\frac32 \log 2$. (Hint: Deduce from Example 2 that $0 + \frac12 + 0 - \frac14 + 0 + \frac16 + 0 - \frac18 + \cdots = \frac12 \log 2$ and add this to the result of Example 2.)
4. Let $\sum_0^\infty a_n$ be a convergent series, and let $\sum_0^\infty b_n$ be its rearrangement obtained by interchanging each even-numbered term with the odd-numbered term immediately following it: $a_1 + a_0 + a_3 + a_2 + a_5 + a_4 + \cdots$. Show that $\sum_0^\infty b_n = \sum_0^\infty a_n$.
5. Suppose $a_n > -1$ for all $n$. By suitable applications of Taylor's theorem to the functions $\log(1 + x)$ or $e^x$, show the following:
a. $\sum a_n$ is absolutely convergent if and only if $\sum \log(1 + a_n)$ is absolutely convergent. (This is of interest in connection with Exercise 4 of §6.1: If $\sum |a_n| < \infty$, then $\prod (1 + a_n)$ converges.)
b. Let $a_n = (-1)^{n+1}/\sqrt{n}$. Then $\sum_1^\infty a_n$ is conditionally convergent (see Theorem 6.22 below), but $\sum_1^\infty \log(1 + a_n)$ diverges.

6.4 More Convergence Tests

The tests we developed in §6.2 for the convergence of series of nonnegative terms
immediately yield tests for the absolute convergence of more general series. We
sum up the most important results:

6.21 Theorem.
a. If $|a_n| \leq Cn^{-1-\epsilon}$ for some $C, \epsilon > 0$, then $\sum a_n$ converges absolutely. If $|a_n| \geq Cn^{-1}$ for some $C > 0$, then $\sum a_n$ either converges conditionally or diverges.
b. (The Ratio Test) If $|a_{n+1}/a_n| \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.
c. (The Root Test) If $|a_n|^{1/n} \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.

In the ratio and root tests, the divergence (rather than conditional convergence) when $l > 1$ is guaranteed because $a_n \not\to 0$ in this case; see the proofs of Theorems 6.13 and 6.14. The statements of the ratio and root tests can be sharpened a bit as in Theorems 6.13a and 6.14a.
Warning. It is a common mistake to obtain incorrect results by forgetting the
absolute values in Theorem 6.21. For example, the series X^o°(~^)n satisfies
an+i/an = —2, and —2 < 1, but the series diverges!
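The pitfall in this warning can be seen numerically. The short Python sketch below uses exact rational arithmetic to compare the signed ratio $a_{n+1}/a_n$ with the absolute ratio that Theorem 6.21b actually refers to, for $a_n = (-2)^n$. The helper names are illustrative only. Evaluating the ratio at a single index only hints at the limit $l$; it is not a proof.

```python
from fractions import Fraction


def ratio_test_estimate(a, n):
    """Approximate l = lim |a_{n+1}/a_n| by the absolute ratio at index n.

    `a` is any function n -> a_n.  The absolute value is essential:
    Theorem 6.21b is stated for |a_{n+1}/a_n|.
    """
    return abs(a(n + 1) / a(n))


def signed_ratio(a, n):
    """The ratio without absolute values (the mistake described above)."""
    return a(n + 1) / a(n)


def a(n):
    return Fraction(-2) ** n  # a_n = (-2)^n, exact rational arithmetic


print(signed_ratio(a, 10))         # -2: "less than 1", but misleading
print(ratio_test_estimate(a, 10))  # 2: l = 2 > 1, so the series diverges

# The partial sums confirm the divergence: they oscillate with growing size.
partial_sums = []
total = Fraction(0)
for n in range(8):
    total += a(n)
    partial_sums.append(total)
print(partial_sums)
```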
It remains to investigate criteria that will yield information about conditional convergence as well as absolute convergence. By far the most commonly used result of this kind pertains to alternating series, that is, series whose terms alternate in sign. Such a series can be written in the form $\sum (-1)^n a_n$ or $\sum (-1)^{n-1} a_n$ (depending on whether the even- or odd-numbered terms are positive), where $a_n \ge 0$; we shall consider the first form for the sake of definiteness.

6.22 Theorem (The Alternating Series Test). Suppose the sequence $\{a_n\}$ is decreasing and $\lim_{n\to\infty} a_n = 0$. Then the series $\sum_0^\infty (-1)^n a_n$ is convergent. Moreover, if $s_k$ and $S$ denote the $k$th partial sum and the full sum of this series, we have

$$s_k \ge S \text{ for } k \text{ even}, \qquad s_k \le S \text{ for } k \text{ odd}, \qquad\text{and}\qquad |s_k - S| \le a_{k+1} \text{ for all } k.$$

Proof. Since $a_k \ge a_{k+1}$ for all $k$, we have

$$s_{2m+1} = s_{2m-1} + a_{2m} - a_{2m+1} \ge s_{2m-1}, \qquad s_{2m+2} = s_{2m} - a_{2m+1} + a_{2m+2} \le s_{2m}.$$

Thus the sequence $\{s_{2m-1}\}$ of odd-numbered partial sums is increasing and the sequence $\{s_{2m}\}$ of even-numbered partial sums is decreasing. This monotonicity further yields

$$s_1 \le s_{2m-1} = s_{2m} - a_{2m} \le s_{2m} \le s_0,$$

so $\{s_{2m-1}\}$ and $\{s_{2m}\}$ are bounded above and below, respectively. By the monotone sequence theorem, these sequences both converge, and since $s_{2m} - s_{2m-1} = a_{2m} \to 0$, their limits are equal. Thus the whole sequence $\{s_k\}$ converges, that is, the series $\sum (-1)^n a_n$ converges. The even-numbered partial sums decrease to the full sum $S$ while the odd-numbered ones increase, so $S \le s_{2m}$ and $S \ge s_{2m-1}$ for all $m$. In particular,

$$0 \le S - s_{2m-1} \le s_{2m} - s_{2m-1} = a_{2m}, \qquad 0 \le s_{2m} - S \le s_{2m} - s_{2m+1} = a_{2m+1},$$

so $|s_k - S| \le a_{k+1}$ whether $k$ is even or odd. □
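As a numerical sanity check of the three conclusions of Theorem 6.22, the Python sketch below uses the series $\sum_0^\infty (-1)^n/(n+1)$, whose sum is $\log 2$ (Example 2 of §6.3). The function names are hypothetical, and floating-point arithmetic is assumed accurate enough for the comparisons made here.

```python
import math


def alternating_partial_sums(a, K):
    """Return [s_0, s_1, ..., s_K] for the series sum_{n>=0} (-1)^n a(n)."""
    sums, s = [], 0.0
    for n in range(K + 1):
        s += (-1) ** n * a(n)
        sums.append(s)
    return sums


def check_bounds(a, S, K):
    """Check the conclusions of Theorem 6.22 for k = 0, ..., K:
    s_k >= S for k even, s_k <= S for k odd, and |s_k - S| <= a_{k+1}."""
    for k, s_k in enumerate(alternating_partial_sums(a, K)):
        on_correct_side = s_k >= S if k % 2 == 0 else s_k <= S
        if not on_correct_side or abs(s_k - S) > a(k + 1):
            return False
    return True


def a(n):
    return 1.0 / (n + 1)  # decreasing to 0; sum of (-1)^n/(n+1) is log 2


print(check_bounds(a, math.log(2), 1000))  # True
print(alternating_partial_sums(a, 9)[-1] - math.log(2), a(10))
```

The second line of output shows that the actual error of $s_9$ is roughly half the guaranteed bound $a_{10}$, which is in line with the remark below that the estimate is usually accurate to within an order of magnitude.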

EXAMPLE 1. The series $\sum_1^\infty (-1)^n (e^{1/n} - 1)$ converges by the alternating series test, because $e^{1/n}$ decreases to 1 as $n \to \infty$. The convergence is only conditional, however, since $e^{1/n} - 1 \approx 1/n$ when $n$ is large. (More precisely, by Taylor's theorem we have $e^x = 1 + x + R(x)$ where $|R(x)| \le Cx^2$ for $0 \le x \le 1$. Thus $\sum (e^{1/n} - 1) = \sum n^{-1} + \sum R(1/n)$; the first series on the right diverges, while the second converges by comparison to $\sum n^{-2}$.)
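The estimates in Example 1 can be checked numerically. In the sketch below, the constant $C = e/2$ is our own choice, obtained from Lagrange's form of the remainder; the text only asserts that some $C$ exists. The code confirms that $a_n = e^{1/n} - 1$ decreases, that $n a_n \to 1$, and that $0 \le R(1/n) \le C/n^2$.

```python
import math

# Lagrange's form of the remainder gives R(x) = e^c x^2 / 2 for some c in
# (0, x), so |R(x)| <= (e/2) x^2 on [0, 1]; C = e/2 is one admissible choice.
C = math.e / 2


def term(n):
    """a_n = e^{1/n} - 1, computed with expm1 to avoid cancellation."""
    return math.expm1(1.0 / n)


def remainder(n):
    """R(1/n) = e^{1/n} - 1 - 1/n."""
    return term(n) - 1.0 / n


for n in (1, 10, 100, 1000):
    print(n, term(n), n * term(n), remainder(n), C / n**2)
```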

The alternating series test is a useful test for conditional convergence, but the
fact that the difference between a partial sum and the full sum is less in absolute
value than the first neglected term is also of interest in the absolutely convergent
case. (This estimate for the error in replacing the full sum by a partial sum is, in
most cases, accurate to within an order of magnitude.)
The alternating series test can be applied to a series $\sum (-1)^n a_n$ for which $\lim a_n = 0$ provided that the $a_n$'s decrease from some point onward. (Of course, the inequalities for the partial sums are only valid from that point onward too.) However, the monotonicity condition cannot be dropped entirely, as the following example shows:

$$1 - \frac12 + \frac12 - \frac14 + \frac13 - \frac16 + \cdots + \frac1m - \frac1{2m} + \cdots.$$

Here $a_n \to 0$ as $n \to \infty$, but not monotonically, and the series diverges. (The sum of the first $2m$ terms is $\frac12(1 + \frac12 + \frac13 + \cdots + \frac1m)$, a partial sum of the divergent series $\frac12\sum_1^\infty \frac1n$.)
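The divergence of this non-monotone example can be confirmed with exact rational arithmetic. The Python sketch below (all names are illustrative) lists the terms in the order displayed above and checks that the sum of the first $2m$ terms equals half the $m$th harmonic number.

```python
from fractions import Fraction


def term(j):
    """j-th term (j = 0, 1, 2, ...) of 1 - 1/2 + 1/2 - 1/4 + 1/3 - 1/6 + ...

    Terms 2m-2 and 2m-1 (m = 1, 2, ...) are 1/m and -1/(2m).
    """
    m = j // 2 + 1
    return Fraction(1, m) if j % 2 == 0 else Fraction(-1, 2 * m)


def partial_sum(N):
    """Sum of the first N terms, in exact rational arithmetic."""
    return sum((term(j) for j in range(N)), Fraction(0))


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum((Fraction(1, n) for n in range(1, m + 1)), Fraction(0))


for m in (1, 10, 100):
    print(m, float(partial_sum(2 * m)), float(harmonic(m) / 2))
```

The terms tend to 0, but $|a_3| = \frac14 < \frac13 = |a_4|$, so they are not monotone. The partial sums grow like $\frac12\log m$.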
The tests we have developed can be used to analyze a wide variety of power series, that is, series of the form $\sum_0^\infty c_n(x - a)^n$ where $x$ is a real variable. In typical cases, the ratio test or the root test will establish that there is some number $r$ such that the series converges absolutely for $|x - a| < r$ and diverges for $|x - a| > r$. The convergence at the two remaining points $x = a \pm r$ can then be studied by one of the other tests.

EXAMPLE 2. Consider the series $\sum_0^\infty \dfrac{(-1)^n (x - 3)^n}{(n + 1)2^{2n+1}}$. We start with the ratio test:

$$\left|\frac{(-1)^{n+1}(x - 3)^{n+1}/(n + 2)2^{2n+3}}{(-1)^n(x - 3)^n/(n + 1)2^{2n+1}}\right| = \frac{n + 1}{n + 2}\cdot\frac{|x - 3|}{4} \to \frac{|x - 3|}{4}.$$

Thus the series converges absolutely for $|x - 3| < 4$ and diverges for $|x - 3| > 4$. (The root test would also yield this result.) The two remaining points are where $x - 3 = \pm 4$, that is, $x = -1$ and $x = 7$. At these two points the series becomes

$$\sum_0^\infty \frac{(-1)^n(-4)^n}{(n + 1)2^{2n+1}} = \sum_0^\infty \frac{1}{2n + 2} \qquad\text{and}\qquad \sum_0^\infty \frac{(-1)^n 4^n}{(n + 1)2^{2n+1}} = \sum_0^\infty \frac{(-1)^n}{2n + 2}.$$

The first of these diverges, while the second one converges by the alternating series test. The convergence is only conditional, by the divergence of the first series. Thus the original series converges absolutely for $-1 < x < 7$, converges conditionally at $x = 7$, and diverges elsewhere.
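The ratio computation and the behavior at the endpoints can be illustrated numerically. The sketch below rewrites each term as $(-(x-3)/4)^n / (2(n+1))$ to avoid large intermediate powers. The reference value $\frac12\log 2$ at $x = 7$ comes from $\sum_0^\infty (-1)^n/(n+1) = \log 2$. The function names are hypothetical.

```python
import math


def term(n, x):
    """a_n(x) = (-1)^n (x-3)^n / ((n+1) 2^(2n+1)) from Example 2.

    Since 2^(2n+1) = 2 * 4^n, this equals (-(x-3)/4)^n / (2(n+1)),
    which avoids forming large powers separately.
    """
    return (-(x - 3) / 4) ** n / (2 * (n + 1))


def abs_ratio(n, x):
    """|a_{n+1}(x)/a_n(x)| = (n+1)/(n+2) * |x-3|/4."""
    return abs(term(n + 1, x) / term(n, x))


def partial_sum(x, N):
    return sum(term(n, x) for n in range(N + 1))


print(abs_ratio(100, 5.0))                       # about |5 - 3|/4 = 0.5
print(partial_sum(7.0, 10000), math.log(2) / 2)  # conditional convergence at x = 7
print(partial_sum(-1.0, 10000))                  # grows without bound at x = -1
```

By Theorem 6.22, the error of the partial sum at $x = 7$ is at most the first omitted term, $1/(2 \cdot 10002)$.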

We conclude with another test for convergence (absolute or conditional) that generalizes the alternating series test and is sometimes useful for trigonometric series. Its proof is based on the following discrete analogue of the integration-by-parts formula, in which a sum $\sum a_n b_n$ is rewritten by "differentiating" the sequence $\{a_n\}$ and "integrating" the sequence $\{b_n\}$.

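6.23 Lemma (Summation by Parts). Given two numerical sequences $\{a_n\}$ and $\{b_n\}$, let

$$a'_n = a_n - a_{n-1}, \qquad B_n = b_0 + \cdots + b_n.$$

Then

$$\sum_0^k a_n b_n = a_k B_k - \sum_1^k a'_n B_{n-1}.$$

Proof. We have $b_0 = B_0$, and $b_n = -B_{n-1} + B_n$ for $n \ge 1$, so

$$a_0b_0 + a_1b_1 + \cdots + a_kb_k = a_0B_0 - a_1B_0 + a_1B_1 - \cdots - a_kB_{k-1} + a_kB_k = -a'_1B_0 - a'_2B_1 - \cdots - a'_kB_{k-1} + a_kB_k. \tag{6.24}$$

□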
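Because (6.24) is an exact algebraic identity, it can be checked with integer arithmetic on arbitrary finite sequences. The following Python sketch does this for random integer data. The function name is our own, chosen for illustration.

```python
import random


def summation_by_parts_rhs(a, b):
    """Right side of Lemma 6.23: a_k B_k - sum_{n=1}^k a'_n B_{n-1},
    where a'_n = a_n - a_{n-1} and B_n = b_0 + ... + b_n."""
    k = len(a) - 1
    B, running = [], 0
    for bn in b:
        running += bn
        B.append(running)
    return a[k] * B[k] - sum((a[n] - a[n - 1]) * B[n - 1] for n in range(1, k + 1))


random.seed(0)
a = [random.randint(-9, 9) for _ in range(20)]
b = [random.randint(-9, 9) for _ in range(20)]
lhs = sum(an * bn for an, bn in zip(a, b))
print(lhs, summation_by_parts_rhs(a, b))  # the two values agree
```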

6.25 Theorem (Dirichlet's Test). Let $\{a_n\}$ and $\{b_n\}$ be numerical sequences. Suppose that the sequence $\{a_n\}$ is decreasing and tends to 0 as $n \to \infty$, and that the sums $B_n = b_0 + \cdots + b_n$ are bounded in absolute value by a constant $C$ independent of $n$. Then the series $\sum_0^\infty a_n b_n$ converges.

Proof. With notation as in Lemma 6.23, $\sum_0^k a_n b_n = a_k B_k - \sum_1^k a'_n B_{n-1}$, so it is enough to show that $\lim_{k\to\infty} a_k B_k$ exists and that the series $\sum_1^\infty a'_n B_{n-1}$ converges. The first assertion is easy: Since $|B_k| \le C$ and $a_k \to 0$, we have $|a_k B_k| \le C a_k \to 0$. On the other hand, since $\{a_n\}$ is decreasing, we have $a'_n \le 0$ for all $n$, so

$$\sum_1^k |a'_n B_{n-1}| \le C\sum_1^k (-a'_n) = C[(a_0 - a_1) + (a_1 - a_2) + \cdots + (a_{k-1} - a_k)] = C(a_0 - a_k) \le Ca_0$$

for all $k$. It follows that the series $\sum_1^\infty a'_n B_{n-1}$ is absolutely convergent and hence convergent. □

Dirichlet's test includes the alternating series test as a special case, by taking $b_n = (-1)^n$, for which $B_n = 1$ or $0$ according as $n$ is even or odd. The other situations in which it is most commonly applied are those with $b_n = \sin n\theta$ or $b_n = \cos n\theta$, where $\theta$ is not an integer multiple of $2\pi$. That the hypotheses on $\{b_n\}$ in Dirichlet's test are satisfied in these cases is shown by the following calculation.

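6.26 Lemma. If $\theta$ is not an integer multiple of $2\pi$, then

$$\sum_{n=1}^k \cos n\theta = \frac{\cos\frac12(k + 1)\theta\,\sin\frac12 k\theta}{\sin\frac12\theta}, \qquad \sum_{n=1}^k \sin n\theta = \frac{\sin\frac12(k + 1)\theta\,\sin\frac12 k\theta}{\sin\frac12\theta}.$$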

Proof. These formulas can be established by using various trigonometric identities. The easiest method is to use Euler's formula $\cos x + i\sin x = e^{ix}$ (which we shall discuss in detail in §7.5). By the formula (6.2) for the sum of a finite geometric series,

$$\sum_{n=1}^k e^{in\theta} = e^{i\theta}\,\frac{e^{ik\theta} - 1}{e^{i\theta} - 1} = e^{i\theta}\,\frac{e^{ik\theta/2}\,[e^{ik\theta/2} - e^{-ik\theta/2}]}{e^{i\theta/2}\,[e^{i\theta/2} - e^{-i\theta/2}]} = e^{i(k+1)\theta/2}\,\frac{\sin\frac12 k\theta}{\sin\frac12\theta}.$$

The asserted formulas follow by taking the real and imaginary parts of both sides. □
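The closed forms in Lemma 6.26 are easy to spot-check in floating point. The sketch below compares them with direct summation for a few values of $\theta$ that are not multiples of $2\pi$. It also checks the resulting bound $|\sin\frac12\theta|^{-1}$ on the partial sums. The helper names are illustrative.

```python
import math


def cos_sum(theta, k):
    return sum(math.cos(n * theta) for n in range(1, k + 1))


def sin_sum(theta, k):
    return sum(math.sin(n * theta) for n in range(1, k + 1))


def cos_closed(theta, k):
    """cos((k+1)theta/2) sin(k theta/2) / sin(theta/2)."""
    return math.cos((k + 1) * theta / 2) * math.sin(k * theta / 2) / math.sin(theta / 2)


def sin_closed(theta, k):
    """sin((k+1)theta/2) sin(k theta/2) / sin(theta/2)."""
    return math.sin((k + 1) * theta / 2) * math.sin(k * theta / 2) / math.sin(theta / 2)


for theta in (0.3, 1.0, 2.5):
    print(theta, cos_sum(theta, 25), cos_closed(theta, 25),
          sin_sum(theta, 25), sin_closed(theta, 25))
```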

6.27 Corollary. Suppose that the sequence $\{a_n\}$ decreases to 0 as $n \to \infty$. Then the series $\sum_1^\infty a_n\cos n\theta$ converges for all $\theta$ except perhaps for integer multiples of $2\pi$, and the series $\sum_1^\infty a_n\sin n\theta$ converges for all $\theta$.

Proof. The hypotheses of Dirichlet's test are satisfied for $\theta \ne 2\pi j$, for if $b_n$ is either $\cos n\theta$ or $\sin n\theta$, the lemma implies that $|B_n| \le |\csc\frac12\theta|$ for all $n$. (If $\theta = 2\pi j$, the series $\sum a_n\sin n\theta$ converges trivially since $\sin n\theta = 0$ for all $n$.) □
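As an illustration of Corollary 6.27, take $a_n = 1/n$. The sketch below checks the partial-sum bound $|B_n| \le |\csc\frac12\theta|$ and shows the partial sums of $\sum \sin(n\theta)/n$ settling down. The comparison value $(\pi - \theta)/2$ for $0 < \theta < 2\pi$ is a standard Fourier-series fact that is not proved in this section; we bring it in from outside as an assumption.

```python
import math


def B(theta, n):
    """B_n = sin(theta) + ... + sin(n theta), the sums controlled by Lemma 6.26."""
    return sum(math.sin(j * theta) for j in range(1, n + 1))


def partial_sum(theta, N):
    """sum_{n=1}^N sin(n theta)/n, i.e. a_n = 1/n in Corollary 6.27."""
    return sum(math.sin(n * theta) / n for n in range(1, N + 1))


theta = 1.0
csc_bound = 1 / abs(math.sin(theta / 2))
print(max(abs(B(theta, n)) for n in range(1, 500)), csc_bound)
for N in (10, 1000, 100000):
    print(N, partial_sum(theta, N))
print((math.pi - theta) / 2)  # classical value of the sum for 0 < theta < 2*pi
```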

EXERCISES

In Exercises 1-9, determine the values of x at which the series converges absolutely
or conditionally.

(x + 2)n

2.
l

3 VX

5> ^ (-l)n(x-4)n
,. (2«-3)log(n
oo 1

6. > —;—

?< y 2-4-.-(2n) (lj 3)7^

In Exercises 10-14, determine whether the series converges absolutely, converges conditionally, or diverges.

10.
2

12.
nl/n

13. $\sum_1^\infty (-1)^{n-1}\log(n\sin n^{-1})$.

oo
n + 1
14. ^j-ir e —

15. Use the alternating series test to show that $x^{-1}\sin x = 1 - \frac16 x^2 + \frac1{120}x^4 - \frac1{5040}x^6 + E(x)$ where $0 \le E(x) \le 0.027$ for $|x| \le \pi$.
16. (Abel's Test) Suppose $\sum a_n$ is a convergent series and $\{b_n\}$ is a decreasing sequence of positive numbers. ($\lim b_n$ need not be zero.) Show that $\sum a_nb_n$ converges. (This can be done by using Dirichlet's test or by modifying the proof of Dirichlet's test.)

17. Show that if $\sum_1^\infty a_n$ converges, then so does $\sum_1^\infty n^{-p}a_n$ for any $p > 0$. For which $p$ can you guarantee absolute convergence without knowing anything more about the $a_n$'s?

18. For which $x$ and $\theta$ does $\sum_1^\infty n^{-1}x^n\cos n\theta$ converge?

6.5 Double Series; Products of Series

A double infinite series, informally speaking, is an expression of the form

$$\sum_{m,n=0}^\infty a_{mn}, \tag{6.28}$$

that is, a series whose terms are indexed by ordered pairs of nonnegative integers. The difficulty in making precise sense out of such an expression is that it is not clear what one should mean by a "partial sum." Two obvious candidates are the "square" partial sums and the "triangular" partial sums

$$s_k^{\square} = \sum_{m,n=0}^k a_{mn}, \qquad s_k^{\triangle} = \sum_{m+n\le k} a_{mn},$$

which are defined by adding up all the terms $a_{mn}$ for which $(m, n)$ lies in the outlined regions in Figure 6.2. (Note that passing from $s_k^\square$ or $s_k^\triangle$ to $s_{k+1}^\square$ or $s_{k+1}^\triangle$ involves adding not just a single term but a finite set of terms to the sum. It is not necessary to specify the order in which these terms are added, as finite addition is commutative.) Clearly there are many other possibilities. Indeed, there are infinitely many ways to enumerate the set of ordered pairs of nonnegative integers, each of which leads to a different notion of "partial sums."

Figure 6.2: Schematic representation of square and triangular partial sums of a double series.

There is yet another possibility: One can consider the double series (6.28) as an iterated series, just as one can regard double integrals as iterated integrals. That is, one could interpret (6.28) as

$$\sum_{m=0}^\infty\Big(\sum_{n=0}^\infty a_{mn}\Big) \qquad\text{or}\qquad \sum_{n=0}^\infty\Big(\sum_{m=0}^\infty a_{mn}\Big),$$

in which one forms the ordinary series $\sigma_m = \sum_{n=0}^\infty a_{mn}$ for each $m$ and then adds up the sums to obtain $\sum_{m=0}^\infty \sigma_m$, or similarly with $m$ and $n$ switched. This is different from the partial-sum procedures discussed above because the intermediate steps involve infinite sums rather than finite ones.
How is one to make sense out of all these ways of interpreting (6.28)? The answer, in a nutshell, is that the situation is similar to that for improper double integrals discussed in §4.7: For series of positive terms, or for absolutely convergent series, there is no problem, as all interpretations lead to the same answer. Otherwise, one must proceed with great caution.
Let us explain this in more detail. Given any one-to-one correspondence $j \leftrightarrow (m, n)$ between the set of nonnegative integers and the set of ordered pairs of nonnegative integers, we can set $b_j = a_{mn}$ and form the ordinary infinite series $\sum_0^\infty b_j$; we call such a series an ordering of the double series $\sum_{m,n=0}^\infty a_{mn}$. The essential point is that the orderings of $\sum a_{mn}$ are all rearrangements of one another, and we can apply Theorem 6.19.
First, if $a_{mn} \ge 0$, then either all orderings of $\sum a_{mn}$ diverge or all orderings converge, and in the latter case their sums are all equal. Thus, the sum of the series $\sum a_{mn}$ is well defined as a positive number or $+\infty$, independent of the choice of ordering.
Second, without the assumption of positivity, if $\sum |b_j|$ is convergent for one ordering of $\sum a_{mn}$, then the same is true for every ordering. In this case the series $\sum a_{mn}$ is called absolutely convergent, and by Theorem 6.19 again, all orderings of $\sum a_{mn}$ have the same sum, which we call the sum of the double series $\sum a_{mn}$. Moreover, an argument similar to the proof of Theorem 6.19 shows that the double series $\sum a_{mn}$ is absolutely convergent if and only if the iterated series $\sum_m \sum_n |a_{mn}|$ is convergent, in which case $\sum_{m,n} a_{mn} = \sum_m(\sum_n a_{mn})$. (See Exercises 5 and 6.)
Given a double series $\sum a_{mn}$, we can therefore proceed as follows. First we evaluate the series $\sum |a_{mn}|$ by ordering it in some fashion or treating it as an iterated series; if it turns out to be finite, we can then evaluate $\sum a_{mn}$ by ordering it in any fashion or treating it as an iterated series.
What if $\sum a_{mn}$ is not absolutely convergent? Let us separate out the positive and negative terms as we did in Theorem 6.18. The argument in the proof of Theorem 6.18 shows that if $\sum a_{mn}^+ = \infty$ but $\sum a_{mn}^- < \infty$, then all orderings of $\sum a_{mn}$ diverge to $+\infty$; likewise, if $\sum a_{mn}^+ < \infty$ but $\sum a_{mn}^- = \infty$, then all orderings of $\sum a_{mn}$ diverge to $-\infty$. On the other hand, if $\sum a_{mn}^+ = \sum a_{mn}^- = \infty$ but $a_{mn} \to 0$ as $m, n \to \infty$, the proof of Theorem 6.20 shows that various orderings of $\sum a_{mn}$ can converge to any real number. In this case, therefore, we simply cannot make numerical sense out of the expression $\sum a_{mn}$ without specifying more precisely how the summation is to be performed.
An important situation in which double series occur is in multiplying two series
together. The basic result is as follows.

6.29 Theorem. Suppose that $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$ are both absolutely convergent, with sums $A$ and $B$. Then the double series $\sum_{m,n=0}^\infty a_mb_n$ is absolutely convergent, and its sum is $AB$.

Proof. We consider the square partial sums of $\sum a_mb_n$, which are just the products of the partial sums of $\sum a_m$ and $\sum b_n$:

$$\sum_{m,n=0}^k a_mb_n = \Big(\sum_{m=0}^k a_m\Big)\Big(\sum_{n=0}^k b_n\Big). \tag{6.30}$$

If we replace $a_m$ and $b_n$ by $|a_m|$ and $|b_n|$ in (6.30), the right side is bounded by the finite quantity $(\sum_0^\infty |a_m|)(\sum_0^\infty |b_n|)$, which shows that the double series $\sum a_mb_n$ is absolutely convergent. Then, letting $k \to \infty$ in (6.30), we obtain $\sum a_mb_n = AB$. □
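Identity (6.30) and the conclusion of Theorem 6.29 can be checked with exact fractions. The sketch below uses two absolutely convergent geometric series chosen only for illustration: $\sum (\frac12)^m = 2$ and $\sum (-\frac13)^n = \frac34$. Their product is $AB = \frac32$.

```python
from fractions import Fraction


def square_partial_sum(a, b, k):
    """Sum of a_m b_n over 0 <= m, n <= k (the square partial sum)."""
    return sum(a(m) * b(n) for m in range(k + 1) for n in range(k + 1))


def partial_sum(c, k):
    return sum(c(n) for n in range(k + 1))


def a(m):
    return Fraction(1, 2) ** m  # sum A = 2


def b(n):
    return Fraction(-1, 3) ** n  # sum B = 3/4


for k in (1, 5, 20):
    print(k, float(square_partial_sum(a, b, k)),
          float(partial_sum(a, k) * partial_sum(b, k)))
print(float(Fraction(2) * Fraction(3, 4)))  # AB = 1.5
```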

Under the conditions of Theorem 6.29, we are free to use any ordering of $\sum a_mb_n$ that we choose, and in particular, we can use the triangular partial sums rather than the square ones. This is the natural thing to do when considering power series. Indeed, if $\sum a_nx^n$ and $\sum b_nx^n$ are absolutely convergent for a particular value of $x$, their product is $\sum a_mb_nx^{m+n}$, which can also be expressed as a power series if we group together all the terms involving a given power of $x$. The terms involving $x^j$ are those with $m + n = j$, i.e., those with $m = 0, 1, \ldots, j$ and $n = j - m$. Collecting these terms together yields

$$\Big(\sum_{m=0}^\infty a_mx^m\Big)\Big(\sum_{n=0}^\infty b_nx^n\Big) = \sum_{j=0}^\infty\Big(\sum_{m=0}^j a_mb_{j-m}\Big)x^j.$$

The expression on the right is a power series whose $j$th coefficient is a finite sum of products of the original coefficients; its partial sums are precisely the triangular partial sums of the double series $\sum a_mb_nx^{m+n}$.
The same procedure can also be used for series without an $x$ (by taking $x = 1$, if you like). That is, given two convergent series $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$, we can form the series

$$\sum_{j=0}^\infty\Big(\sum_{m=0}^j a_mb_{j-m}\Big) = \sum_{j=0}^\infty (a_0b_j + a_1b_{j-1} + \cdots + a_{j-1}b_1 + a_jb_0),$$

whose partial sums are the triangular partial sums of the double series $\sum a_mb_n$; it is called the Cauchy product of $\sum a_m$ and $\sum b_n$. As we have seen, if $\sum a_m$ and $\sum b_n$ are absolutely convergent, their Cauchy product is too, and its sum is $(\sum a_m)(\sum b_n)$. In fact, the Cauchy product converges to $(\sum a_m)(\sum b_n)$ provided that at least one of $\sum a_m$ and $\sum b_n$ is absolutely convergent (see Krantz [12, pp. 109-10], or Rudin [18, p. 74]). However, if $\sum a_m$ and $\sum b_n$ are both conditionally convergent, their Cauchy product may diverge. (See Exercise 4.)
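The sketch below computes Cauchy-product coefficients $c_j = \sum_{m=0}^j a_mb_{j-m}$ directly from the definition. It shows the two situations just described. The first is the exponential series, which converges absolutely, so the product converges to $e^xe^y = e^{x+y}$. The second is the conditionally convergent series $\sum (-1)^n(n+1)^{-1/2}$ multiplied by itself: there every $|c_j|$ is at least 1, so the terms do not tend to 0. The specific values of $x$, $y$, and the truncation lengths are arbitrary illustrative choices.

```python
import math


def cauchy_product(a, b, J):
    """Return the coefficients c_0..c_J, where c_j = sum_{m=0}^j a_m b_{j-m}."""
    return [sum(a[m] * b[j - m] for m in range(j + 1)) for j in range(J + 1)]


# Absolutely convergent case: the exponential series for e^x and e^y.
x, y, J = 0.7, -1.9, 40
ex = [x**n / math.factorial(n) for n in range(J + 1)]
ey = [y**n / math.factorial(n) for n in range(J + 1)]
c = cauchy_product(ex, ey, J)
print(sum(c), math.exp(x + y))

# Two conditionally convergent series: a_n = b_n = (-1)^n / sqrt(n+1).
alt = [(-1) ** n / math.sqrt(n + 1) for n in range(201)]
d = cauchy_product(alt, alt, 200)
print(min(abs(t) for t in d))  # the terms of the product do not tend to 0
```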

EXERCISES

1. By multiplying the geometric series by itself, show that for $|x| < 1$,
a. $(1 - x)^{-2} = \sum_0^\infty (n + 1)x^n$;
b. $(1 - x)^{-3} = \frac12\sum_0^\infty (n + 1)(n + 2)x^n$.
2. Let $f(x) = \sum_0^\infty x^n/n!$. Show directly from this formula that $f(x)f(y) = f(x + y)$.
3. Verify that the Taylor series of $(1 - 4x)^{-1/2}$ about $x = 0$ is $\sum_0^\infty (2n)!\,x^n/(n!)^2$ and that this series converges absolutely for $|x| < \frac14$. Then, taking for granted that the sum of this series actually is $(1 - 4x)^{-1/2}$ (which we shall prove in §7.3), multiply the series by itself and conclude that for any positive integer $j$,

$$\sum_{n=0}^j \frac{(2n)!\,(2j - 2n)!}{(n!)^2\,((j - n)!)^2} = 4^j.$$

4. Show that the series $\sum_0^\infty (-1)^n(n + 1)^{-1/2}$ is conditionally convergent and that the Cauchy product of this series with itself diverges. (Hint: The maximum of the function $f(x) = (x + 1)(j - x + 1)$ occurs at $x = \frac12 j$, and hence $(m + 1)(j - m + 1) \le (\frac12 j + 1)^2$ for $0 \le m \le j$.)
5. Show that $\sum_{m,n=0}^\infty a_{mn} = \sum_{m=0}^\infty(\sum_{n=0}^\infty a_{mn})$ whenever $a_{mn} \ge 0$ for all $m, n \ge 0$.
6. Suppose $\sum_{m,n=0}^\infty a_{mn}$ is absolutely convergent. Show that the iterated series $\sum_{m=0}^\infty(\sum_{n=0}^\infty a_{mn})$ converges to the sum $\sum_{m,n=0}^\infty a_{mn}$. (Use Exercise 5.)
7. Show that $\sum_{m,n=1}^\infty (m + n)^{-p}$ converges if and only if $p > 2$. (Hint: Use triangular partial sums.)
8. Let $a_{mn} = 1$ if $m = n$, $a_{mn} = -1$ if $m - n = 1$, and $a_{mn} = 0$ otherwise. Show that the iterated series $\sum_{n=0}^\infty\sum_{m=0}^\infty a_{mn}$ and $\sum_{m=0}^\infty\sum_{n=0}^\infty a_{mn}$ both converge, but their sums are unequal.
Chapter 7

FUNCTIONS DEFINED BY SERIES AND INTEGRALS

In this chapter we study the convergence of sequences and series whose terms are
functions of a variable x and improper integrals whose integrand contains x as a
free variable. In all these situations, the study of the resulting function of x may
reveal unpleasant surprises unless we have some control over the way the rate of
convergence varies along with x; the most commonly encountered form of such
control, uniform convergence, is a major theme of this chapter.

7.1 Sequences and Series of Functions

We recall that a sequence $\{f_k\}_0^\infty$ of functions is a map that assigns to each nonnegative integer $k$ a function $f_k$. It is implicitly assumed that the functions $f_k$ are all defined on some common domain $S$ (usually a subset of $\mathbb{R}$ or $\mathbb{R}^n$) and all take values in the same space ($\mathbb{R}$, $\mathbb{C}$, or $\mathbb{R}^m$).
What does it mean for a sequence of functions $\{f_k\}$ defined on a set $S \subset \mathbb{R}^n$ to converge to a function $f$ on $S$? The most obvious interpretation is that

$$f_k(\mathbf{x}) \to f(\mathbf{x}) \quad\text{for every } \mathbf{x} \in S. \tag{7.1}$$

This is, indeed, what is usually meant by the statement "$f_k \to f$ on $S$" when no further qualification is added; when we wish to be very clear about it, we shall say that $f_k \to f$ pointwise on $S$ when (7.1) holds.
Unfortunately, pointwise convergence is a rather badly behaved operation in the sense that it does not interact well with other limiting operations, such as differentiation and integration. Consider the following group of examples:

Figure 7.1: Some of the functions defined in (7.2). Top: $f_1$ (dashed) and $f_3$ (solid). Middle: $g_1$ (dashed) and $g_3$ (solid). Bottom: $h_1$ (dashed) and $h_3$ (solid).

Example 1. Let

$$f_k(x) = \frac1k\arctan kx, \qquad g_k(x) = f'_k(x) = \frac{1}{k^2x^2 + 1}, \qquad h_k(x) = g'_k(x) = \frac{-2k^2x}{(k^2x^2 + 1)^2}. \tag{7.2}$$

Observe that $f_k(x) = k^{-1}f_1(kx)$, $g_k(x) = g_1(kx)$, and $h_k(x) = kh_1(kx)$.
In graphical terms, as shown in Figure 7.1, this means that the graph of $f_k$ is obtained from the graph of $f_1$ by shrinking the $x$ and $y$ scales by a factor of $k$; the graph of $g_k$ is obtained from the graph of $g_1$ by shrinking the $x$ scale by a factor of $k$ and leaving the $y$ scale unchanged; and the graph of $h_k$ is obtained from the graph of $h_1$ by shrinking the $x$ scale and expanding the $y$ scale by a factor of $k$. We have:
i. $f_k(x) \to 0$ for all $x$, since $|f_k(x)| < \pi/2k$.
ii. $g_k(x) \to 0$ for all $x \ne 0$, but $g_k(0) = 1$ for all $k$. That is,

$$\lim_{k\to\infty} g_k(x) = g(x) = \begin{cases} 1 & \text{if } x = 0,\\ 0 & \text{otherwise.}\end{cases}$$

iii. $h_k(x) \to 0$ for all $x$. ($h_k(0) = 0$ for all $k$, and if $x \ne 0$, $h_k(x) \approx -2/k^2x^3$ for large $k$.)
Therefore, $g$ is discontinuous even though the $g_k$'s are all continuous; moreover, since $g_k$ is the derivative of $f_k$ and an antiderivative of $h_k$,

$$\lim_{k\to\infty} f'_k(0) = 1 \ne 0 = \Big(\lim_{k\to\infty} f_k\Big)'(0);$$

$$\lim_{k\to\infty}\lim_{x\to0} g_k(x) = 1 \ne 0 = \lim_{x\to0}\lim_{k\to\infty} g_k(x);$$

$$\lim_{k\to\infty}\int_0^1 h_k(x)\,dx = -1 \ne 0 = \int_0^1\Big[\lim_{k\to\infty} h_k(x)\Big]dx.$$
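The pathologies of Example 1 can also be seen numerically. The Python sketch below evaluates $g_k$ and $h_k$ from (7.2). It computes $\int_0^1 h_k\,dx$ exactly as $g_k(1) - g_k(0)$, using the fact that $g_k$ is an antiderivative of $h_k$. It also estimates $\max|h_k|$ on a grid in $[-1, 1]$; the grid maximum only approximates the true supremum. The grid size is an arbitrary choice.

```python
def g(k, x):
    return 1.0 / (k * k * x * x + 1)


def h(k, x):
    return -2.0 * k * k * x / (k * k * x * x + 1) ** 2


def integral_h(k):
    """int_0^1 h_k(x) dx = g_k(1) - g_k(0), because g_k is an antiderivative of h_k."""
    return g(k, 1.0) - g(k, 0.0)


def max_abs_h_on_grid(k, n=20000):
    """Largest |h_k(x)| over the grid x = i/n, -n <= i <= n, in [-1, 1]."""
    return max(abs(h(k, i / n)) for i in range(-n, n + 1))


for k in (1, 10, 100, 1000):
    # g_k(0) stays 1 while g_k(x) -> 0 for x != 0; the integrals of h_k tend
    # to -1, not to 0; and max |h_k| grows in proportion to k.
    print(k, g(k, 0.0), g(k, 0.1), integral_h(k), max_abs_h_on_grid(k) / k)
```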

Clearly, if we want some theorems to the effect that "the integral of the limit is the limit of the integrals," or "the derivative of a limit is the limit of the derivatives," pointwise convergence is the wrong condition to impose. We now develop a more stringent notion of convergence that removes some of the pathologies.
The real trouble with pointwise convergence is as follows. The statement "$f_k(\mathbf{x}) \to f(\mathbf{x})$ for all $\mathbf{x} \in S$" means that, for each $\mathbf{x}$, $f_k(\mathbf{x})$ will be close to $f(\mathbf{x})$ provided $k$ is sufficiently large, but the rate of convergence of $f_k(\mathbf{x})$ to $f(\mathbf{x})$ can be very different for different values of $\mathbf{x}$. For example, if $g_k$ is as in (7.2), for all $x \ne 0$ we have $g_k(x) \to 0$, so $|g_k(x)| < 10^{-4}$ (say) provided $k$ is sufficiently large; for $x = 10$, "sufficiently large" means $k \ge 10$, but for $x = 0.1$, it means $k \ge 1000$. If, however, we have some control over the rate of convergence that is independent of the particular point $\mathbf{x}$, then many of the pathologies disappear.
The precise definition is as follows. A sequence $\{f_k\}$ of functions defined on a set $S \subset \mathbb{R}^n$ is said to converge uniformly on $S$ to the function $f$ if for every $\epsilon > 0$ there is an integer $K$ such that

$$|f_k(\mathbf{x}) - f(\mathbf{x})| < \epsilon \quad\text{whenever } k > K \text{ and } \mathbf{x} \in S. \tag{7.3}$$

The point here is that the same $K$ will work for every $\mathbf{x} \in S$. Another way of writing (7.3) is

$$\sup_{\mathbf{x}\in S} |f_k(\mathbf{x}) - f(\mathbf{x})| \le \epsilon \quad\text{whenever } k > K. \tag{7.4}$$
The geometry of this inequality is indicated in Figure 7.2. Yet another way of
expressing uniform convergence is the following, which is sufficiently useful to be
displayed as a theorem.

7.5 Theorem. The sequence $\{f_k\}$ converges to $f$ uniformly on $S$ if and only if there is a sequence $\{C_k\}$ of positive constants such that $|f_k(\mathbf{x}) - f(\mathbf{x})| \le C_k$ for all $\mathbf{x} \in S$ and $\lim_{k\to\infty} C_k = 0$.

Proof. If $f_k \to f$ uniformly, by (7.4) we can take $C_k = \sup_{\mathbf{x}\in S} |f_k(\mathbf{x}) - f(\mathbf{x})|$. Conversely, if $C_k \to 0$, for any $\epsilon > 0$ there exists $K$ such that $C_k < \epsilon$ whenever $k > K$, and hence $|f_k(\mathbf{x}) - f(\mathbf{x})| \le C_k < \epsilon$ for all $\mathbf{x} \in S$ whenever $k > K$; that is, (7.3) holds. □

Let us take another look at the examples in (7.2) with regard to uniform convergence. First, the sequence $\{f_k\}$ defined by $f_k(x) = k^{-1}\arctan kx$ converges uniformly to 0 on $\mathbb{R}$, since we can take $C_k = \pi/2k$ in Theorem 7.5. Second, the sequence $\{g_k\}$ defined by $g_k(x) = (k^2x^2 + 1)^{-1}$ does not converge uniformly to its limit $g$ on $\mathbb{R}$; indeed,

$$\sup_{x\in\mathbb{R}} |g_k(x) - g(x)| = \sup_{x\ne0} \frac{1}{k^2x^2 + 1} = 1 \quad\text{for all } k.$$

(Notice that the supremum is not actually achieved; the maximum of $(k^2x^2 + 1)^{-1}$ occurs at $x = 0$, but $g(0) = 1$, so $g_k(0) - g(0) = 0$. See Figure 7.2.) Finally, the sequence $\{h_k\}$ defined by $h_k(x) = -2k^2x(k^2x^2 + 1)^{-2}$ does not converge uniformly to its limit 0 on $\mathbb{R}$. Indeed, a bit of calculus shows that the minimum and maximum values of $h_k(x)$, achieved at $x = \pm1/(\sqrt3\,k)$, are $\mp3\sqrt3\,k/8$, so $\sup_x |h_k(x) - 0|$ actually tends to $\infty$ rather than 0.

Figure 7.2: Left: Uniform convergence. For $k$ large, the graph of $f_k - f$ is contained in the shaded strip $|y| < \epsilon$. Right: Nonuniform convergence of the sequence $\{g_k\}$ in (7.2). The spike of $g_k - g$ around the origin becomes narrower as $k \to \infty$ but is never wholly within the shaded strip.

On the other hand, the bad behavior in these examples is all at $x = 0$. The sequences $\{g_k\}$ and $\{h_k\}$ do converge uniformly to 0 on the intervals $[\delta, \infty)$ and $(-\infty, -\delta]$ for any $\delta > 0$. For $g_k$ this is clear:

$$|g_k(x) - 0| = \frac{1}{k^2x^2 + 1} \le \frac{1}{\delta^2k^2 + 1} \qquad (x \le -\delta \text{ or } x \ge \delta),$$

and $(\delta^2k^2 + 1)^{-1} \to 0$ as $k \to \infty$. For $h_k$ we do not get a good estimate for the first few values of $k$, but (by the same bit of calculus as in the preceding paragraph) when $k \ge 1/(\sqrt3\,\delta)$ the function $h_k$ is positive and increasing on $(-\infty, -\delta]$ and negative and increasing on $[\delta, \infty)$, so the maximum of $|h_k|$ on these intervals occurs at the endpoints $\pm\delta$:

$$|h_k(x) - 0| \le \frac{2\delta k^2}{(\delta^2k^2 + 1)^2} \qquad \Big(x \le -\delta \text{ or } x \ge \delta,\ k \ge \frac{1}{\sqrt3\,\delta}\Big),$$

and the right side tends to 0 as $k \to \infty$.

The phenomenon exhibited here is quite common. That is, one has a sequence $\{f_k\}$ of functions that converge pointwise to $f$ on a set $S$; the convergence is not uniform on all of $S$ but is uniform on many "slightly smaller" subsets of $S$. The situation we shall encounter most often is where $S$ is an open interval $(a, b)$, and the "bad behavior" occurs near the endpoints, so that the convergence is uniform on $[a + \delta, b - \delta]$ for any $\delta > 0$. In this case, the sequence of constants $C_k$ in Theorem 7.5 will generally depend on $\delta$, as they do in the preceding examples.
The notion of Cauchy sequence has an obvious adaptation to the context of uniform convergence. Namely, a sequence $\{f_k\}$ of functions on a set $S$ is uniformly Cauchy if for every $\epsilon > 0$ there is an integer $K$ so that

$$|f_j(\mathbf{x}) - f_k(\mathbf{x})| < \epsilon \quad\text{whenever } j, k > K \text{ and } \mathbf{x} \in S, \tag{7.6}$$

or in other words,

$$\sup_{\mathbf{x}\in S} |f_j(\mathbf{x}) - f_k(\mathbf{x})| \le \epsilon \quad\text{whenever } j, k > K.$$

We have the following analogue of Theorem 1.20:

7.7 Theorem. The sequence $\{f_k\}$ is uniformly Cauchy on $S$ if and only if there is a function $f$ on $S$ such that $f_k \to f$ uniformly on $S$.

Proof. If $\{f_k\}$ is uniformly Cauchy, then for each $\mathbf{x} \in S$ the numerical sequence $\{f_k(\mathbf{x})\}$ is Cauchy. By Theorem 1.20, it has a limit, which we call $f(\mathbf{x})$. Letting $j \to \infty$ in (7.6), we see that $|f_k(\mathbf{x}) - f(\mathbf{x})| \le \epsilon$ whenever $k > K$ and $\mathbf{x} \in S$, so that $f_k \to f$ uniformly on $S$. Conversely, if $f_k \to f$ uniformly on $S$, we have $|f_k(\mathbf{x}) - f(\mathbf{x})| \le C_k$ for all $\mathbf{x} \in S$, where $C_k \to 0$ as $k \to \infty$, and

$$|f_j(\mathbf{x}) - f_k(\mathbf{x})| \le |f_j(\mathbf{x}) - f(\mathbf{x})| + |f(\mathbf{x}) - f_k(\mathbf{x})| \le C_j + C_k,$$

and $C_j + C_k < \epsilon$ when $j$ and $k$ are sufficiently large, so (7.6) holds. □

One of the most important properties of uniform convergence is that it preserves continuity, as mere pointwise convergence does not (see the example $\{g_k\}$ in (7.2)).

7.8 Theorem. Suppose $f_k \to f$ uniformly on $S$. If each $f_k$ is continuous on $S$, then so is $f$.

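Proof. Given a point $\mathbf{a} \in S$, we show that $f$ is continuous at $\mathbf{a}$. Given $\epsilon > 0$, we first use the uniform convergence to choose $k$ so large that $|f_k(\mathbf{x}) - f(\mathbf{x})| < \epsilon/3$ for all $\mathbf{x} \in S$. Since $f_k$ is continuous at $\mathbf{a}$, there exists $\delta > 0$ such that $|f_k(\mathbf{x}) - f_k(\mathbf{a})| < \epsilon/3$ whenever $\mathbf{x} \in S$ and $|\mathbf{x} - \mathbf{a}| < \delta$. For such $\mathbf{x}$,

$$|f(\mathbf{x}) - f(\mathbf{a})| \le |f(\mathbf{x}) - f_k(\mathbf{x})| + |f_k(\mathbf{x}) - f_k(\mathbf{a})| + |f_k(\mathbf{a}) - f(\mathbf{a})| < \frac\epsilon3 + \frac\epsilon3 + \frac\epsilon3 = \epsilon,$$

which shows that $f$ is continuous at $\mathbf{a}$. □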

Theorem 7.8 can be strengthened somewhat, because the continuity of a function $f$ at a point $\mathbf{a}$ depends only on the behavior of $f$ at points close to $\mathbf{a}$. Hence, if $f_k$ is continuous on $S$ and $f_k \to f$ pointwise on $S$, it is not necessary to have uniform convergence on all of $S$ to guarantee continuity of the limit function $f$; it is enough to have uniform convergence on some neighborhood of each point in $S$. For example, if $S$ is the interval $(a, b)$ and $f_k \to f$ uniformly on $[a + \delta, b - \delta]$ for each $\delta > 0$, we conclude that $f$ is continuous on $[a + \delta, b - \delta]$ for each $\delta$ and hence that $f$ is continuous on all of $(a, b)$.

The preceding discussion of sequences of functions leads immediately to results about series of functions. Namely, given a sequence of functions $\{f_n\}_0^\infty$ defined on a set $S$, we can form the infinite series $\sum_0^\infty f_n(\mathbf{x})$ for each $\mathbf{x} \in S$. If this series converges for each $\mathbf{x} \in S$, we say that the series $\sum_0^\infty f_n$ is (pointwise) convergent on $S$; in this case, its sum defines a function on $S$, which we also denote by $\sum_0^\infty f_n$. The series $\sum_0^\infty f_n$ is said to be uniformly convergent on $S$ if the sequence of partial sums, $s_k = \sum_0^k f_n$, is uniformly convergent on $S$.

Example 2. The geometric series $\sum_0^\infty x^n$ converges pointwise on $(-1, 1)$ to $(1 - x)^{-1}$. Denoting the $k$th partial sum by $s_k(x)$, we have

$$s_k(x) = \frac{1 - x^{k+1}}{1 - x}, \qquad\text{so}\qquad \left|s_k(x) - \frac{1}{1 - x}\right| = \frac{|x|^{k+1}}{1 - x}.$$

The latter quantity tends to $\infty$ as $x \to 1$ and to $\frac12$ as $x \to -1$ no matter what $k$ is, so the convergence is not uniform on $(-1, 1)$. (This is hardly surprising, since the series diverges at both endpoints.) But it is uniform on $[-r, r]$ for any $r < 1$, for

$$\frac{|x|^{k+1}}{1 - x} \le \frac{r^{k+1}}{1 - r} \quad\text{for } |x| \le r,$$

and this quantity tends to 0 as $k \to \infty$.
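The uniform bound in Example 2 can be compared with a grid estimate of the supremum. The Python sketch below uses $r = 0.9$ and a 2001-point grid, both arbitrary choices. A grid maximum can only underestimate the true supremum, so the check is a sanity test rather than a proof.

```python
def s(k, x):
    """k-th partial sum 1 + x + ... + x^k of the geometric series."""
    return sum(x**n for n in range(k + 1))


def sup_error_on_grid(k, r, points=2001):
    """Approximate sup over [-r, r] of |s_k(x) - 1/(1-x)| on a uniform grid."""
    xs = [-r + 2 * r * i / (points - 1) for i in range(points)]
    return max(abs(s(k, x) - 1 / (1 - x)) for x in xs)


r = 0.9
for k in (10, 50, 100):
    print(k, sup_error_on_grid(k, r), r ** (k + 1) / (1 - r))

# Near x = 1 the error stays large no matter how big k is.
print(abs(s(100, 0.999999) - 1 / (1 - 0.999999)))
```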

The following is the most commonly used test for uniform convergence of series:

7.9 Theorem (The Weierstrass M-Test). Let $\{f_n\}_0^\infty$ be a sequence of functions on the set $S$. Suppose there is a sequence $\{M_n\}_0^\infty$ of positive constants such that (i) $|f_n(\mathbf{x})| \le M_n$ for all $\mathbf{x} \in S$ and all $n$, and (ii) $\sum_0^\infty M_n < \infty$. Then the series $\sum_0^\infty f_n$ is absolutely and uniformly convergent on $S$.

Proof. The series $\sum_0^\infty f_n(\mathbf{x})$ is absolutely convergent for each $\mathbf{x} \in S$ by comparison to the series $\sum_0^\infty M_n$. Let us denote its sum by $s(\mathbf{x})$, the $k$th partial sum $\sum_0^k f_n(\mathbf{x})$ by $s_k(\mathbf{x})$, and $\sum_{k+1}^\infty M_n$ by $C_k$; then

$$|s(\mathbf{x}) - s_k(\mathbf{x})| \le \sum_{k+1}^\infty |f_n(\mathbf{x})| \le \sum_{k+1}^\infty M_n = C_k \qquad (\mathbf{x} \in S).$$

But $C_k \to 0$ as $k \to \infty$ since the series $\sum M_n$ is convergent, so it follows from Theorem 7.5 that the sequence $\{s_k\}$, i.e., the series $\sum f_n$, is uniformly convergent on $S$. □
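To see the M-test bound in action, the sketch below takes $f_n(x) = \cos(nx)/n^2$ with $M_n = 1/n^2$, so that $C_k = \sum_{n>k} n^{-2}$. For the exact sum it uses the standard Fourier-series identity $\sum_1^\infty \cos(nx)/n^2 = \pi^2/6 - \pi x/2 + x^2/4$ on $[0, 2\pi]$. That identity is an outside assumption and is not derived in this text. The code checks that the uniform error on a grid never exceeds $C_k$.

```python
import math


def s_k(x, k):
    """k-th partial sum of sum_{n>=1} cos(nx)/n^2."""
    return sum(math.cos(n * x) / n**2 for n in range(1, k + 1))


def s(x):
    """Closed form of sum_{n>=1} cos(nx)/n^2, valid for 0 <= x <= 2*pi."""
    return math.pi**2 / 6 - math.pi * x / 2 + x**2 / 4


def C(k):
    """Tail sum_{n>k} M_n with M_n = 1/n^2, i.e. pi^2/6 - sum_{n<=k} 1/n^2."""
    return math.pi**2 / 6 - sum(1 / n**2 for n in range(1, k + 1))


xs = [2 * math.pi * i / 400 for i in range(401)]
for k in (10, 100, 1000):
    worst = max(abs(s(x) - s_k(x, k)) for x in xs)
    print(k, worst, C(k))  # worst-case error on the grid vs. the bound C_k
```

At $x = 0$ every $\cos(nx)$ equals 1, so the bound $C_k$ is attained there. The M-test estimate is therefore sharp for this series.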
The tribute to Weierstrass in the name of this theorem is appropriate, since Weierstrass was one of the pioneers in the rigorous theory of infinite series; but the term "M-test" signifies nothing more than the fact that the sequence of constants in the theorem is traditionally denoted by $\{M_n\}$.
It is quite possible for a series of functions to be uniformly convergent on
S without being absolutely convergent. (See Exercises 5 and 6.) Therefore, the
Weierstrass M-test, unlike its cousin Theorem 7.5, gives a sufficient condition for
uniform convergence but not a necessary one.

Example 3. The M-test gives an easy verification that the geometric series $\sum_0^\infty x^n$ converges uniformly on $[-r, r]$ for any $r < 1$, by taking $M_n = r^n$. ($|x^n| \le r^n$ for $|x| \le r$, and $\sum r^n < \infty$.)

Example 4. The Taylor series for $\log(1 + x)$, $\sum_1^\infty (-1)^{n-1}x^n/n$, converges absolutely for $x \in (-1, 1)$ (by the ratio test) and conditionally at $x = 1$ (by the alternating series test). Since $|(-1)^{n-1}x^n/n| \le r^n/n$ when $|x| \le r$, the M-test (with $M_n = r^n/n$) shows that this series converges uniformly on $[-r, r]$ for any $r < 1$. It actually converges uniformly on $[-r, 1]$ for any $r < 1$, but the M-test will not yield this result because the convergence at 1 is only conditional. (The result needed here is a theorem of Abel that we shall present in §7.3.)

Theorem 7.8, concerning the continuity of limits of sequences, translates immediately into a theorem about continuity of sums of series, as follows:

7.10 Theorem. Suppose $\{f_n\}$ is a sequence of continuous functions on a set $S$. If the series $\sum f_n$ converges uniformly on $S$, its sum is a continuous function on $S$.

Proof. Apply Theorem 7.8 to the sequence of partial sums. □

The remarks following Theorem 7.8, to the effect that local uniform convergence is enough to yield continuity, apply to this situation also.

EXERCISES

1. For each of the following sequences $\{f_k\}$ of functions, compute $\lim_{k\to\infty} f_k$ on the given interval and tell whether the convergence is uniform on that interval. If not, is the convergence uniform on some slightly smaller sets?
a. $f_k(x) = x^k$, $x \in [0, 1]$.
b. $f_k(x) = x^{1/k}$, $x \in [0, 1]$.
c. $f_k(x) = \sin^k x$, $x \in [0, \pi]$.
d. $f_k(x) = k^{-1}e^{-x^2/k}$, $x \in \mathbb{R}$.
e. $f_k(x) = kxe^{-kx}$, $x \in [0, \infty)$.

f. A(x) = (i/fc)e-^,xG[0,oo).
g. $f_k(x) = x^k/(1 + x^{2k})$, $x \in [0, \infty)$.
2. Test the following series for absolute and uniform convergence; state the interval(s) on which you obtain such convergence. What can you conclude about the continuity of the sum of the series?

0
oo

n2 + n
oo -,

1
oo

d. $\sum \dfrac{\cos nx}{n^3}$.

oo ,

f. £„-
l

3. Let $f_k(x) = g(x)x^k$, where $g$ is continuous on $[0, 1]$ and $g(1) = 0$. Show that $f_k \to 0$ uniformly on $[0, 1]$. (Cf. Exercise 1a.)
oo ^

4. Show that the series VJ —^ ^ converges uniformly on any compact interval

x' — n

that does not contain a nonzero integer, and conclude that the sum of the series
is a continuous function on R \ {±1, ±2,. ..}.

5. Show that the series $\sum_1^\infty \dfrac{(-1)^{n-1}}{x^2 + n}$ converges uniformly on $\mathbb{R}$, although the convergence is conditional at every point.

6. Given a sequence $\{c_n\}$ of real numbers such that $\sum_1^\infty c_n$ converges, consider $\sum_1^\infty \dfrac{c_nx^n}{1 - x^n}$ $(x \ne \pm1)$. (Such a series is called a Lambert series.)
a. Show that the series converges absolutely and uniformly on $[-a, a]$ for any $a < 1$.
b. Show that the series converges uniformly on $(-\infty, -b]$ and on $[b, \infty)$ for any $b > 1$, and that the convergence is absolute if and only if $\sum_1^\infty |c_n| < \infty$. (Hint: $x^n(1 - x^n)^{-1} = (1 - x^n)^{-1} - 1$.)

7. Let $\{f_k\}$ be a sequence of functions defined on a set $S$, and let $S_1, \ldots, S_m$ be a finite collection of subsets of $S$. Show that if $\{f_k\}$ converges uniformly on each $S_j$, then it converges uniformly on $\bigcup_1^m S_j$.

8. Let $\{f_k\}$ be a sequence of continuous functions on $[a, b]$. Show that if $\{f_k\}$ converges uniformly on $(a, b)$, then it converges uniformly on $[a, b]$.

9. Let $\{f_k\}$ be a sequence of continuous functions on a compact set $S \subset \mathbb{R}^n$. Suppose that (a) the sequence $\{f_k(\mathbf{x})\}$ is bounded and increasing (and hence has a limit) for each $\mathbf{x} \in S$, and (b) the function $f = \lim_{k\to\infty} f_k$ is continuous on $S$. Show that $f_k \to f$ uniformly on $S$. (Hint: Given $\epsilon > 0$, apply Exercise 5 in §1.6 to the sets $S_k = \{\mathbf{x} \in S : f(\mathbf{x}) - f_k(\mathbf{x}) \ge \epsilon\}$.)

7.2 Integrals and Derivatives of Sequences and Series

If $\{f_k\}$ is a sequence of functions on the interval $[a, b]$ and $f_k \to f$ on $[a, b]$, is it true that $\int_a^b f_k(x)\,dx \to \int_a^b f(x)\,dx$? The sequence $\{h_k\}$ in (7.2) shows that the answer is sometimes no. The best general affirmative result in the context of Riemann integration is the bounded convergence theorem that we stated in §4.5. As we indicated there, the proof of that theorem is beyond the scope of this book; however, uniform convergence yields an affirmative result with an easy proof. It works equally well for $n$-dimensional integrals, so we present it in that generality.

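7.11 Theorem. Suppose $S$ is a measurable set in $\mathbb{R}^n$ and $\{f_k\}$ is a sequence of integrable functions on $S$ that converges uniformly to an integrable function $f$ on $S$. Then
$$\int\!\cdots\!\int_S f(\mathbf{x})\,d^n\mathbf{x} = \lim_{k\to\infty}\int\!\cdots\!\int_S f_k(\mathbf{x})\,d^n\mathbf{x}.$$

Proof. By Theorem 7.5, there is a sequence $\{C_k\}$ of constants such that $C_k \to 0$ and $|f_k(\mathbf{x}) - f(\mathbf{x})| \le C_k$ for $\mathbf{x} \in S$. But then
$$\left|\int\!\cdots\!\int_S f_k(\mathbf{x})\,d^n\mathbf{x} - \int\!\cdots\!\int_S f(\mathbf{x})\,d^n\mathbf{x}\right| \le \int\!\cdots\!\int_S |f_k(\mathbf{x}) - f(\mathbf{x})|\,d^n\mathbf{x} \le \int\!\cdots\!\int_S C_k\,d^n\mathbf{x}.$$
This last quantity is the $n$-dimensional volume of $S$ times $C_k$, which tends to zero as $k \to \infty$. □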

Returning to the one-dimensional situation, we now ask the corresponding question for derivatives: If $f_k \to f$, is it true that $f_k' \to f'$? Equivalently, setting $g_k = f_k - f$, if $g_k \to 0$, is it true that $g_k' \to 0$? Here the answer is clearly no in general; the function $g_k$ can be very small but also very wiggly, so that $g_k'$ is not small.

Example 1. Let $g_k(x) = k^{-1}\sin kx$. Then $|g_k(x)| \le k^{-1}$ for all $x$, so $g_k \to 0$ uniformly on $\mathbb{R}$. On the other hand, $g_k'(x) = \cos kx$; the sequence $\{\cos kx\}$ does not converge at all for most values of $x$, and when it does (namely, when $x$ is an even multiple of $\pi$), its limit is 1, not 0.
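A quick numerical look at Example 1 may help. The sketch below is an illustration only; the helper names `g`, `g_prime`, and `sup_on_grid` are hypothetical. It samples $|g_k|$ on a grid over one period (which only approximates the true supremum $1/k$) and evaluates $g_k'(1) = \cos k$, which keeps oscillating instead of tending to 0.

```python
import math

def g(k, x):
    """g_k(x) = sin(kx) / k from Example 1."""
    return math.sin(k * x) / k

def g_prime(k, x):
    """g_k'(x) = cos(kx)."""
    return math.cos(k * x)

def sup_on_grid(func, a, b, n=10_000):
    """Estimate sup |func| on [a, b] by sampling n + 1 equally spaced points."""
    return max(abs(func(a + (b - a) * i / n)) for i in range(n + 1))

for k in (1, 10, 100, 1000):
    sup_gk = sup_on_grid(lambda x: g(k, x), 0.0, 2 * math.pi)
    # sup |g_k| stays below 1/k, but g_k'(1) = cos(k) does not settle down.
    print(f"k={k:5d}  sup|g_k| ~ {sup_gk:.6f}  g_k'(1) = {g_prime(k, 1.0):+.6f}")
```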

In this situation, the crucial uniformity hypothesis is not on the original sequence $\{f_k\}$ but on the differentiated sequence $\{f_k'\}$. Here is the result:

7.12 Theorem. Let $\{f_k\}$ be a sequence of functions of class $C^1$ on the interval $[a, b]$. Suppose that $\{f_k\}$ converges pointwise to $f$ and that $\{f_k'\}$ converges uniformly to $g$ on $[a, b]$. Then $f$ is of class $C^1$ on $[a, b]$, and $g = f'$.

Proof. The function $g$ is continuous on $[a, b]$ by Theorem 7.8, so it is integrable over any subinterval of $[a, b]$. By Theorem 7.11, for $x \in [a, b]$,
$$\int_a^x g(t)\,dt = \lim_{k\to\infty}\int_a^x f_k'(t)\,dt = \lim_{k\to\infty}[f_k(x) - f_k(a)] = f(x) - f(a).$$
Thus $f(x) = f(a) + \int_a^x g(t)\,dt$. But by the fundamental theorem of calculus, the function on the right is differentiable and its derivative is $g$. □

The example $\{f_k\}$ in (7.2) shows that pointwise convergence of $\{f_k'\}$ is not sufficient to obtain $\lim(f_k') = (\lim f_k)'$. On the other hand, Theorem 7.12 can be extended somewhat. Since differentiability (like continuity) is a local property, it is enough for the convergence of $\{f_k'\}$ to be uniform on a neighborhood of each point, rather than on the whole interval in question. In many situations, the sequence $\{f_k\}$ is defined on an open interval $(a, b)$ and one has uniform convergence of $\{f_k'\}$ on each compact subinterval $[a + \delta, b - \delta]$; this suffices to guarantee that $(\lim f_k)' = \lim f_k'$ on $(a, b)$.

The results on term-by-term integration and differentiation of series are immediate consequences of those for sequences. We have merely to apply Theorems 7.11 and 7.12 to the partial sums of the series to obtain the following theorem.

7.13 Theorem. Suppose that $\{f_n\}$ is a sequence of continuous functions on the interval $[a, b]$ and that the series $\sum f_n$ converges pointwise on $[a, b]$.
a. If $\sum f_n$ converges uniformly on $[a, b]$, then
$$\int_a^b \sum_1^\infty f_n(x)\,dx = \sum_1^\infty \int_a^b f_n(x)\,dx.$$
b. If the $f_n$'s are of class $C^1$ and the series $\sum f_n'$ converges uniformly on $[a, b]$, then the sum $\sum f_n$ is of class $C^1$ on $[a, b]$ and
$$\Big(\sum_1^\infty f_n\Big)' = \sum_1^\infty f_n'.$$

EXERCISES

1. Let $f(x) = \sum_1^\infty n^{-2}\sin nx$. Show that $f$ is a continuous function on $\mathbb{R}$ and that $\int_0^{\pi/2} f(x)\,dx = \sum_{n=1,3,5,\ldots} n^{-3} + 2\sum_{n=2,6,10,\ldots} n^{-3}$.
2. Let $f(x) = \sum_1^\infty (x + n)^{-2}$. Show that $f$ is a continuous function on $[0, \infty)$ and that $\int_0^1 f(x)\,dx = 1$.
3. Let $f_k(x) = x\arctan kx$.
a. Show that $\lim_{k\to\infty} f_k(x) = \frac{1}{2}\pi|x|$.
b. Show that $\lim_{k\to\infty} f_k'(x)$ exists for every $x$, including $x = 0$, but that the convergence is not uniform in any interval containing 0.

4. For each of the series (a-f) in Exercise 2, §7.1, show that the series can be differentiated term-by-term on its interval of convergence (except at the endpoints in (b)).

5. For $x \ne \pm1, \pm2, \ldots$, let $f(x) = 2x\sum_1^\infty (x^2 - n^2)^{-1}$ (see Exercise 4, §7.1). Show that $f$ is of class $C^1$ on its domain and that $f'(x) = -\sum_1^\infty [(x - n)^{-2} + (x + n)^{-2}]$.

6. Let $f$ be a continuous function on $[0, \infty)$ such that $0 \le f(x) \le Cx^{-1-\epsilon}$ for some $C, \epsilon > 0$, and let $a = \int_0^\infty f(x)\,dx$. (The estimate on $f$ implies the convergence of this integral.) Let $f_k(x) = kf(kx)$.
a. Show that $\lim_{k\to\infty} f_k(x) = 0$ for all $x > 0$ and that the convergence is uniform on $[\delta, \infty)$ for any $\delta > 0$.
b. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)\,dx = a$.
c. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)g(x)\,dx = ag(0)$ for any integrable function $g$ on $[0, 1]$ that is continuous at 0. (Hint: Write $\int_0^1 = \int_0^\delta + \int_\delta^1$.)

7.3 Power Series

A power series is an infinite series of the form
(7.14) $$\sum_0^\infty a_n(x - b)^n = a_0 + a_1(x - b) + a_2(x - b)^2 + \cdots,$$
where $x$ is a real or complex variable. The lower limit of summation is always $n = 0$ in principle, although the first few terms might vanish ($a_0 = \cdots = a_k = 0$); the crucial point is that only nonnegative integer powers of $x - b$ are allowed. (Thus, one might think of a power series as a "polynomial of infinite degree in $x - b$.") The study of series of the general form (7.14) can be reduced to the special case $b = 0$ by the change of variable $x \to x + b$, and we do so henceforth.
The first order of business in studying power series is to determine the range of
values of the variable x for which they converge. The key observation is as follows.

7.15 Lemma. If the power series $\sum_0^\infty a_nx^n$ converges for $x = x_0$, then it converges absolutely for all $x$ such that $|x| < |x_0|$.

Proof. The convergence of $\sum a_nx_0^n$ implies that $a_nx_0^n \to 0$, and in particular that $|a_nx_0^n| \le C$ for some constant $C$ independent of $n$. Since
$$|a_nx^n| = |a_nx_0^n|\left|\frac{x}{x_0}\right|^n \le C\left|\frac{x}{x_0}\right|^n,$$
for $|x| < |x_0|$ the series $\sum a_nx^n$ converges absolutely by comparison with the geometric series $\sum |x/x_0|^n$. □

7.16 Theorem. For any power series $\sum_0^\infty a_nx^n$, there is a number $R \in [0, \infty]$, called the radius of convergence of the series, such that the series converges absolutely for $|x| < R$ and diverges for $|x| > R$. (When $R = 0$, this means that the series converges only for $x = 0$; when $R = \infty$, it means that the series converges absolutely for all $x$.)

Proof. Let $R = \sup\{|x_0| : \sum a_nx_0^n \text{ converges}\}$. ($R \ge 0$ since the series always converges at $x_0 = 0$.) Thus $\sum a_nx^n$ diverges if $|x| > R$. On the other hand, if $|x| < R$, there exists $x_0$ such that $|x_0| > |x|$ and $\sum a_nx_0^n$ converges, and then $\sum a_nx^n$ converges absolutely by Lemma 7.15. □

Important Remark. The reader has probably been thinking of an and x as real
numbers, but Theorem 7.16 is valid, with exactly the same proof, when an and x
are complex numbers.

Theorem 7.16 says that the set of all real $x$ such that $\sum a_nx^n$ converges is an open interval centered at 0, possibly together with one or both endpoints, and the set of all complex $x$ such that $\sum a_nx^n$ converges is an open disc centered at 0 in the complex plane, possibly together with some or all of its boundary points. The behavior of the series on the boundary of the region of convergence must be decided on a case-by-case basis.

Example 1. Consider the series
$$\sum_1^\infty \frac{x^n}{n^2}, \qquad \sum_1^\infty x^n, \qquad \sum_1^\infty \frac{x^n}{n}.$$
An easy application of the ratio test shows that each of these series converges absolutely for $|x| < 1$ and diverges for $|x| > 1$, so their radius of convergence is 1. The first one is absolutely convergent when $|x| = 1$ by comparison with $\sum n^{-2}$, whereas the second is divergent when $|x| = 1$ because $x^n \not\to 0$ as $n \to \infty$ in that case. The third one is divergent when $x = 1$ but is conditionally convergent at $x = -1$ by the alternating series test. It is also conditionally convergent at all other complex numbers $x$ such that $|x| = 1$, by Dirichlet's test. (Indeed, take $a_n = n^{-1}$ and $b_n = x^n$. Then $b_1 + \cdots + b_n$ is a finite geometric series whose sum equals $x(1 - x^n)/(1 - x)$, and this is bounded by $2/|1 - x|$ for all $n$.)
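The bound in the parenthetical remark is easy to watch numerically. The sketch below picks the sample point $x = e^{2i}$ on the unit circle (an arbitrary choice) and checks that the partial sums $|b_1 + \cdots + b_n|$ never exceed $2/|1 - x|$. As an outside reference value, it also compares a long partial sum of $\sum x^n/n$ with $-\log(1 - x)$ (principal branch). That closed form is standard but is not derived in the text.

```python
import cmath

x = cmath.exp(2j)            # sample point with |x| = 1, x != 1
bound = 2 / abs(1 - x)       # Dirichlet bound: |b_1 + ... + b_n| <= 2/|1 - x|

partial_geo = 0j             # b_1 + ... + b_n with b_n = x**n
partial_log = 0j             # sum_{n <= N} x**n / n
worst = 0.0
N = 20_000
power = 1 + 0j
for n in range(1, N + 1):
    power *= x               # power equals x**n up to rounding
    partial_geo += power
    partial_log += power / n
    worst = max(worst, abs(partial_geo))

print("max |b_1 + ... + b_n| =", worst, " bound =", bound)
print("partial sum of x^n/n:", partial_log, " -log(1 - x):", -cmath.log(1 - x))
```

The convergence of the last series is slow (the error is roughly of order $1/N$), which is typical of conditional convergence on the boundary circle.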

The standard tools for determining the radius of convergence of a power series
are the ratio test and the root test. We have already seen how this works in §6.4
(especially Example 2 and Exercises 1-9), so we shall not belabor the point here.
However, see Exercise 1. In fact, a slight extension of the root test yields a formula
for the radius of convergence of an arbitrary power series; see Exercise 4.
Theorem 7.16 shows that any power series converges absolutely within the region $|x| < R$. Equally important is that it converges uniformly on compact subsets of this region.

7.17 Theorem. Let $R$ be the radius of convergence of $\sum_0^\infty a_nx^n$. For any $r < R$, the series $\sum_0^\infty a_nx^n$ converges uniformly on the set $\{x : |x| \le r\}$, and its sum is a continuous function on the set $\{x : |x| < R\}$.

Proof. For $|x| \le r$ we have $|a_nx^n| \le |a_n|r^n$, and the series $\sum |a_n|r^n$ is convergent since $\sum a_nx^n$ is absolutely convergent at $x = r$. The first assertion therefore follows from the Weierstrass M-test, and the second follows from the first by Theorem 7.8. □

We now turn to the question of integrating power series. In this discussion we take $x$ to be a real variable.

7.18 Theorem. Suppose the series $f(x) = \sum_0^\infty a_nx^n$ has radius of convergence $R > 0$.
a. If $-R < a < b < R$, then
$$\int_a^b f(x)\,dx = \sum_0^\infty \frac{a_n}{n+1}\left(b^{n+1} - a^{n+1}\right).$$
b. If $F$ is any antiderivative of $f$, then $F(x) = F(0) + \sum_0^\infty \frac{a_n}{n+1}x^{n+1}$ for $|x| < R$.

Proof. Assertion (a) follows immediately from Theorems 7.13a and 7.17. The fundamental theorem of calculus then shows that $\sum_0^\infty a_nx^{n+1}/(n+1)$ is an antiderivative of $f$ on $(-R, R)$ (specifically, the one whose value at $x = 0$ is zero), and any other antiderivative differs from this one by a constant. □

Theorem 7.18 gives a way of generating new series expansions from old ones.

Example 2. If we integrate the geometric series $\sum_0^\infty (-x)^n = (1 + x)^{-1}$ ($|x| < 1$), we obtain
$$\log(1 + x) = \int_0^x \frac{dt}{1 + t} = \sum_0^\infty \frac{(-1)^nx^{n+1}}{n+1} = \sum_1^\infty \frac{(-1)^{n-1}x^n}{n} \qquad (|x| < 1).$$
(The last equality is obtained by the change of variable $n \to n - 1$.) Similarly, integration of the geometric series $\sum_0^\infty (-x^2)^n = (1 + x^2)^{-1}$ leads to
$$\arctan x = \int_0^x \frac{dt}{1 + t^2} = \sum_0^\infty \frac{(-1)^nx^{2n+1}}{2n+1} \qquad (|x| < 1).$$
The series for $\log(1 + x)$ is easily obtained from Taylor's theorem (see Exercise 3 in §6.1), but not the series for $\arctan x$; the computation of the high-order derivatives of the latter function is very cumbersome. (Remark: The expansion of $\log(1 + x)$ is also valid at $x = 1$, and that of $\arctan x$ is also valid at $x = \pm1$. However, these facts do not follow from Theorem 7.18. The extra result needed here is Abel's theorem, which we shall present below.)
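The two expansions of Example 2 can be compared against the library functions `math.log1p` and `math.atan`. The sketch below is a minimal illustration; the function names are hypothetical. Inside $|x| < 1$ a couple of hundred terms already agree to roughly machine precision. At the endpoint $x = 1$ the logarithm series still converges, though only slowly, in line with the remark about Abel's theorem.

```python
import math

def log1p_series(x, n_terms):
    """Partial sum of sum_{n>=1} (-1)**(n-1) * x**n / n, the series for log(1 + x)."""
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, n_terms + 1))

def arctan_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)**n * x**(2n+1) / (2n+1), the series for arctan x."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(n_terms))

for x in (0.1, 0.5, 0.9):
    print(x, log1p_series(x, 200) - math.log1p(x), arctan_series(x, 200) - math.atan(x))

# At the endpoint x = 1 the error after n terms is only of order 1/n.
print(log1p_series(1.0, 1000) - math.log(2))
```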

Theorem 7.18 also offers a technique for expressing definite or indefinite integrals of functions that have no elementary antiderivatives in a computable form.

Example 3. The function $f(x) = x^{-1}\sin x$ has no elementary antiderivative, but
$$\int_0^x \frac{\sin t}{t}\,dt = \int_0^x \sum_0^\infty \frac{(-1)^mt^{2m}}{(2m+1)!}\,dt = \sum_0^\infty \frac{(-1)^mx^{2m+1}}{(2m+1)\cdot(2m+1)!}.$$
This gives a precise analytic expression for $\int_0^x t^{-1}\sin t\,dt$ that is valid for all $x$, and the first few terms, $x - \frac{1}{18}x^3 + \frac{1}{600}x^5 - \cdots$, furnish a good numerical approximation to the integral when $x$ is not too large.
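As an illustration of Example 3, the sketch below sums the series and compares it with a composite Simpson's rule applied directly to $\sin t/t$. Simpson's rule is an independent check chosen here and is not part of the text; the helper names are hypothetical. The last line shows that the three-term approximation $x - x^3/18 + x^5/600$ drifts once $x$ is no longer small.

```python
import math

def si_series(x, n_terms=30):
    """Partial sum of sum_{m>=0} (-1)**m x**(2m+1) / ((2m+1) * (2m+1)!)."""
    return sum((-1) ** m * x ** (2 * m + 1) / ((2 * m + 1) * math.factorial(2 * m + 1))
               for m in range(n_terms))

def sinc(t):
    """The integrand sin(t)/t, extended by its limit 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(t) / t

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3

x = 2.0
print(si_series(x), simpson(sinc, 0.0, x))    # series vs. quadrature
print(x - x ** 3 / 18 + x ** 5 / 600)         # first three terms only
```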

Next, what about term-by-term differentiation of a power series $\sum_0^\infty a_nx^n$? According to Theorem 7.13b, we must examine the convergence of the series $\sum_1^\infty na_nx^{n-1}$ obtained by termwise differentiation, which we shall call the derived series. At first glance, the latter series seems less likely to converge than the original series, since the $n$th term of the derived series is much larger than the corresponding term of the original series when $n$ is large (by a factor of $n/|x|$). But in fact, the only values of $x$ for which this really matters are those on the boundary of the interval (or disc) of convergence; elsewhere, the exponential behavior of $x^n$ as $n \to \infty$ swamps the extra factor of $n$, as will be seen in the following proof.

7.19 Theorem. The radius of convergence of any power series $\sum_0^\infty a_nx^n$ is equal to the radius of convergence of the derived series $\sum_1^\infty na_nx^{n-1}$.

Proof. Let $R$ and $R'$ be the radii of convergence of $\sum_0^\infty a_nx^n$ and $\sum_1^\infty na_nx^{n-1}$, respectively. Suppose $|x| < R'$. Then $\sum na_nx^{n-1}$ is absolutely convergent, and
$$|a_nx^n| = \frac{|x|}{n}\left|na_nx^{n-1}\right| \le \left|na_nx^{n-1}\right| \quad \text{for large } n,$$
so $\sum a_nx^n$ is absolutely convergent by comparison. Thus, if $|x| < R'$ then $|x| \le R$, and it follows that $R' \le R$.
On the other hand, if $|x| < R$, we can pick a number $r$ such that $|x| < r < R$. Then the series $\sum a_nr^n$ is absolutely convergent, and
$$\left|na_nx^{n-1}\right| = \frac{n}{r}\left|\frac{x}{r}\right|^{n-1}|a_nr^n|.$$
Since $|x/r| < 1$, the sequence $|x/r|^{n-1}$ tends to 0 exponentially fast as $n \to \infty$, and hence $n|x/r|^{n-1} \to 0$ also. In particular, we have $|na_nx^{n-1}| \le |a_nr^n|$ for $n$ large, so $\sum na_nx^{n-1}$ converges (absolutely) by comparison to $\sum a_nr^n$. In short, if $|x| < R$ then $|x| \le R'$, and it follows that $R \le R'$. Combining this inequality with the one in the preceding paragraph, we conclude that $R = R'$. □

Combining this result with Theorem 7.13b, we obtain the fundamental theorem
on term-by-term differentiation of a power series.

7.20 Theorem. Suppose the radius of convergence of the series $f(x) = \sum_0^\infty a_nx^n$ is $R > 0$. Then the function $f$ is of class $C^\infty$ on the interval $(-R, R)$, and its $k$th derivative may be computed on $(-R, R)$ by differentiating the series $\sum_0^\infty a_nx^n$ termwise $k$ times.

Proof. In view of Theorem 7.19, Theorem 7.13b shows that $f'(x) = \sum_1^\infty na_nx^{n-1}$ for $|x| < R$. It now follows by induction on $k$ that, for any positive integer $k$, $f$ is of class $C^k$ on $(-R, R)$ and that $f^{(k)}$ is the sum of the $k$-times derived series. □

7.21 Corollary. Every power series $\sum_0^\infty a_nx^n$ with a positive radius of convergence is the Taylor series of its sum; that is, if $f(x) = \sum_0^\infty a_nx^n$ for $|x| < R$ ($R > 0$), then
$$a_n = \frac{f^{(n)}(0)}{n!}.$$

Proof. Since $(d/dx)^nx^k = 0$ when $k < n$ and $(d/dx)^nx^n = n!$, we have
$$f^{(n)}(x) = \frac{d^n}{dx^n}\left(a_0 + a_1x + \cdots + a_nx^n + \cdots\right) = n!\,a_n + \cdots,$$
where the last set of dots denotes terms containing positive powers of $x$. Setting $x = 0$, we obtain $f^{(n)}(0) = n!\,a_n$. □

7.22 Corollary. If $\sum_0^\infty a_nx^n = \sum_0^\infty b_nx^n$ for $|x| < R$ ($R > 0$), then $a_n = b_n$ for all $n$.

Proof. We have $a_n = f^{(n)}(0)/n! = b_n$, where $f(x)$ is the common sum of the two series. □

The following examples will illustrate the use of Theorem 7.20. The second one
contains a result of importance in its own right, the binomial formula for fractional
and negative exponents.

Example 4. Suppose we wish to express the sum of the series $\sum_1^\infty x^n/n^2$ in terms of familiar elementary functions. The key is to recognize that this series is related to the geometric series $\sum_0^\infty x^n$, and that the factors of $1/n$ should arise from integrating the latter series. With this in mind, we proceed as follows. Setting $f(x) = \sum_1^\infty x^n/n^2$, we obtain successively (for $|x| < 1$)
$$f'(x) = \sum_1^\infty \frac{x^{n-1}}{n}, \qquad xf'(x) = \sum_1^\infty \frac{x^n}{n}, \qquad [xf'(x)]' = \sum_1^\infty x^{n-1} = \frac{1}{1 - x}.$$
Undoing these transformations in turn yields
$$xf'(x) = -\log(1 - x), \qquad f'(x) = -\frac{\log(1 - x)}{x},$$
and, finally,
$$f(x) = -\int_0^x \frac{\log(1 - t)}{t}\,dt.$$
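The end result of Example 4 can be checked numerically. The sketch below compares the series with a midpoint-rule value of $-\int_0^x t^{-1}\log(1 - t)\,dt$. As an extra reference it also uses the classical value $\pi^2/12 - \frac{1}{2}(\log 2)^2$ of this sum at $x = \frac{1}{2}$, a known fact that is not derived in the text. The function names are illustrative.

```python
import math

def f_series(x, n_terms=200):
    """Partial sum of f(x) = sum_{n>=1} x**n / n**2 (intended for |x| <= 1)."""
    return sum(x ** n / n ** 2 for n in range(1, n_terms + 1))

def f_integral(x, n=2000):
    """Midpoint-rule value of -int_0^x log(1 - t)/t dt for 0 < x < 1.

    The integrand tends to 1 as t -> 0; the midpoint rule never evaluates at t = 0.
    """
    h = x / n
    return -h * sum(math.log1p(-t) / t for t in ((i + 0.5) * h for i in range(n)))

x = 0.5
print(f_series(x), f_integral(x), math.pi ** 2 / 12 - math.log(2) ** 2 / 2)
```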

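Example 5. Let $\alpha$ be a real number. Since
$$\frac{d^n}{dx^n}(1 + x)^\alpha = \alpha(\alpha - 1)\cdots(\alpha - n + 1)(1 + x)^{\alpha - n},$$
the Taylor series of $(1 + x)^\alpha$ is
(7.23) $$f_\alpha(x) = \sum_0^\infty \binom{\alpha}{n}x^n, \quad \text{where} \quad \binom{\alpha}{n} = \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}$$
(with the understanding that $\binom{\alpha}{0} = 1$). This series is called the binomial series of order $\alpha$. When $\alpha$ is a nonnegative integer $k$, the terms with $n > k$ all vanish since they contain a factor of $(\alpha - k)$, and we obtain the familiar binomial expansion formula for $(1 + x)^k$. For other values of $\alpha$, the Taylor series is a genuine infinite series, and one can easily check by the ratio test that its radius of convergence is 1. Our aim is to verify that the sum of this series is actually $(1 + x)^\alpha$ for $|x| < 1$.
We need the following formulas concerning the generalized binomial coefficients $\binom{\alpha}{n}$:
(7.24) $$n\binom{\alpha}{n} = \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{(n - 1)!} = \alpha\binom{\alpha - 1}{n - 1},$$
(7.25) $$\binom{\alpha}{n} = \frac{[(\alpha - n) + n](\alpha - 1)\cdots(\alpha - n + 1)}{n!} = \binom{\alpha - 1}{n} + \binom{\alpha - 1}{n - 1}.$$
Now, if $f_\alpha(x)$ is defined by (7.23) for $|x| < 1$, by (7.24) we have
$$f_\alpha'(x) = \sum_1^\infty n\binom{\alpha}{n}x^{n-1} = \alpha\sum_1^\infty \binom{\alpha - 1}{n - 1}x^{n-1} = \alpha\sum_0^\infty \binom{\alpha - 1}{n}x^n = \alpha f_{\alpha-1}(x).$$
(For the third equality we have made the change of variable $n \to n + 1$.) On the other hand,
$$(1 + x)f_{\alpha-1}(x) = \sum_0^\infty \binom{\alpha - 1}{n}x^n + \sum_0^\infty \binom{\alpha - 1}{n}x^{n+1} = \sum_0^\infty \binom{\alpha - 1}{n}x^n + \sum_1^\infty \binom{\alpha - 1}{n - 1}x^n = \sum_0^\infty \binom{\alpha}{n}x^n = f_\alpha(x).$$
In the second equality, we substituted $n - 1$ for $n$ in the second sum, and the third equality comes from (7.25). Combining these results, we see that $(1 + x)f_\alpha'(x) = \alpha f_\alpha(x)$. Multiplying through by $(1 + x)^{-\alpha-1}$ yields
$$0 = (1 + x)^{-\alpha}f_\alpha'(x) - \alpha(1 + x)^{-\alpha-1}f_\alpha(x) = \frac{d}{dx}\left[(1 + x)^{-\alpha}f_\alpha(x)\right].$$
Thus $(1 + x)^{-\alpha}f_\alpha(x)$ is a constant $C$, and setting $x = 0$, we see that $C = f_\alpha(0) = 1$. In short, $f_\alpha(x) = (1 + x)^\alpha$, as claimed.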
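The conclusion of Example 5 and the coefficient identities (7.24) and (7.25) can be spot-checked in floating point. In the sketch below, `binom` and `binomial_series` are hypothetical helper names. The series routine updates the coefficient with $\binom{\alpha}{n+1} = \binom{\alpha}{n}(\alpha - n)/(n + 1)$, which follows directly from the definition in (7.23). For a nonnegative integer $\alpha$ the coefficients become exactly zero, reproducing the finite binomial expansion.

```python
import math

def binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!; binom(alpha, 0) = 1."""
    c = 1.0
    for j in range(n):
        c *= (alpha - j) / (j + 1)
    return c

def binomial_series(alpha, x, n_terms=400):
    """Partial sum of f_alpha(x) = sum_n binom(alpha, n) x**n, intended for |x| < 1."""
    total, coeff, power = 0.0, 1.0, 1.0
    for n in range(n_terms):
        total += coeff * power
        coeff *= (alpha - n) / (n + 1)   # binom(alpha, n+1) from binom(alpha, n)
        power *= x
    return total

print(binomial_series(0.5, 0.5), math.sqrt(1.5))
print(binomial_series(-2, -0.3), (1 - 0.3) ** -2)
print(binomial_series(3, 0.7), 1.7 ** 3)          # terms with n > 3 vanish
```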

Example 6. The series $\sum_0^\infty (-1)^nx^{2n}$ is a geometric series with ratio $-x^2$, so it converges to $(1 + x^2)^{-1}$ for $|x| < 1$ and diverges elsewhere. By Corollary 7.21, this series is the Taylor series of the function $f(x) = (1 + x^2)^{-1}$ about $x = 0$. Now, the function $f$ is $C^\infty$ on the whole real line, so it seems rather mysterious that its Taylor series converges only on a finite interval. Why should the series behave badly as $x \to \pm1$ when the function itself does not? The mystery is dispelled by considering complex values of $x$ and recalling that the region of convergence of a power series in the complex plane is always a disc. The function $f(x)$ does blow up at $x = \pm i$, so the largest disc about the origin in the complex plane on which $f$ is smooth is the disc $|x| < 1$.

Abel's Theorem. Suppose $f(x) = \sum_0^\infty a_nx^n$ is a power series whose radius of convergence $R$ is positive and finite. We have seen that the convergence is uniform on any compact subinterval of $(-R, R)$ and hence that $f$ is continuous on $(-R, R)$. But now suppose that the series converges at one of the endpoints, say $x = R$. Do the uniformity of convergence and the continuity of the sum persist up to this point?
If the series converges absolutely at $x = R$, then the M-test (with $M_n = |a_n|R^n$) shows that the series converges absolutely and uniformly on $[-R, R]$, so its sum is continuous there. But when the convergence is only conditional, a more subtle argument is needed. The necessary tool is the summation-by-parts formula that we used to obtain Dirichlet's test; since we need a slightly different version of that formula than the one given in Lemma 6.23 (namely, formula (7.27)), we shall simply derive it as we proceed.

7.26 Theorem (Abel's Theorem). If the series $\sum_0^\infty a_nx^n$ converges at $x = R$ (resp. $x = -R$), then it converges uniformly on the interval $[0, R]$ (resp. $[-R, 0]$) and hence defines a continuous function on that interval.

Proof. Convergence at $x = -R$ (and uniform convergence on $[-R, 0]$) of $f(x) = \sum a_nx^n$ is the same as convergence at $x = R$ (and uniform convergence on $[0, R]$) of $f(-x) = \sum (-1)^na_nx^n$, so it is enough to consider convergence at $x = R$. Moreover, convergence at $x = R$ (and uniform convergence on $[0, R]$) of $f(x) = \sum a_nx^n$ is the same as convergence at $x = 1$ (and uniform convergence on $[0, 1]$) of $f(Rx) = \sum a_nR^nx^n$. In short, it is enough to assume that $\sum_0^\infty a_n$ converges and to prove that $\sum_0^\infty a_nx^n$ converges uniformly on $[0, 1]$. To do this we must show that the tail end $\sum_k^\infty a_nx^n$ of the series converges uniformly to zero on $[0, 1]$ as $k \to \infty$.
For $k \ge 1$, let $A_k = \sum_k^\infty a_n$ be the $k$th tail end of the series $\sum a_n$, so that $a_k = A_k - A_{k+1}$. For $l > k$ and $x \in [0, 1]$ we have
$$\begin{aligned} a_kx^k + \cdots + a_lx^l &= (A_k - A_{k+1})x^k + \cdots + (A_l - A_{l+1})x^l \\ &= A_kx^k + A_{k+1}(x^{k+1} - x^k) + \cdots + A_l(x^l - x^{l-1}) - A_{l+1}x^l. \end{aligned}$$
Let $l \to \infty$: then $A_{l+1} \to 0$ and $x^l$ remains bounded, so the last term on the right disappears and we obtain
(7.27) $$\sum_k^\infty a_nx^n = A_kx^k + \sum_k^\infty A_{n+1}(x^{n+1} - x^n).$$
Now, given $\epsilon > 0$, we can choose $k$ so large that $|A_n| < \frac{1}{2}\epsilon$ whenever $n \ge k$. Since $x \in [0, 1]$, we have $x^{n+1} - x^n \le 0$, so (7.27) yields
$$\left|\sum_k^\infty a_nx^n\right| \le \tfrac{1}{2}\epsilon x^k + \tfrac{1}{2}\epsilon\sum_k^\infty (x^n - x^{n+1}).$$
If $x = 1$, the series on the right vanishes; if $0 \le x < 1$, it is a telescoping series whose sum is $x^k$. In either case, we obtain
$$\left|\sum_k^\infty a_nx^n\right| \le \epsilon x^k \le \epsilon$$
for all $x \in [0, 1]$ when $k$ is sufficiently large, which establishes the desired uniform convergence. □
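For a finite coefficient list $a_0, \ldots, a_N$ (all later $a_n$ taken to be 0), the tails $A_j$ vanish for $j > N$. The summation-by-parts step behind (7.27) then becomes an exact algebraic identity. The sketch below checks it in exact rational arithmetic with Python's `fractions` module. The truncated alternating coefficients and the helper names `tails` and `check_identity` are only a sample choice for illustration.

```python
from fractions import Fraction

def tails(a):
    """Return [A_0, ..., A_{N+1}] with A_j = a_j + ... + a_N (so A_{N+1} = 0)."""
    A = [Fraction(0)] * (len(a) + 1)
    for j in range(len(a) - 1, -1, -1):
        A[j] = A[j + 1] + a[j]
    return A

def check_identity(a, x, k):
    """Compare both sides of the finite form of (7.27) for a list a = [a_0, ..., a_N]."""
    N = len(a) - 1
    A = tails(a)
    lhs = sum(a[n] * x ** n for n in range(k, N + 1))
    rhs = A[k] * x ** k + sum(A[n + 1] * (x ** (n + 1) - x ** n) for n in range(k, N))
    return lhs == rhs

coeffs = [Fraction((-1) ** n, n + 1) for n in range(25)]   # sample: a_n = (-1)^n/(n+1), truncated
print(all(check_identity(coeffs, Fraction(p, 7), k) for p in range(8) for k in range(25)))
```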

Remark. If $\sum a_nR^n$ converges, we already know (Theorem 7.17) that $\sum a_nx^n$ converges uniformly on $[-r, r]$ for any $r < R$. Combining this with Abel's theorem, we see that $\sum a_nx^n$ converges uniformly on $[-r, R]$. (See Exercise 7 in §7.1.)
The continuity of the series at the endpoint can be restated in the following way. Recall that $\lim_{x\to a-} f(x)$ denotes the limit of $f(x)$ as $x$ approaches $a$ from the left.

7.28 Corollary. If $\sum_0^\infty a_n$ converges, then $\lim_{x\to1-}\sum_0^\infty a_nx^n = \sum_0^\infty a_n$.

Example 7. The expansion $\arctan x = \sum_0^\infty (-1)^nx^{2n+1}/(2n + 1)$ was established in Example 2 for $|x| < 1$. Since the series also converges at $x = 1$ (by the alternating series test), we obtain a neat series formula for $\pi$:
$$\frac{\pi}{4} = \lim_{x\to1-}\arctan x = \sum_0^\infty \frac{(-1)^n}{2n + 1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
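The formula of Example 7 is elegant but converges slowly, as a short experiment shows. The sketch below (the helper name `leibniz_partial` is hypothetical) compares the error of $4\sum_{n<N}(-1)^n/(2n+1)$ with the alternating-series bound $4/(2N + 1)$, the size of the first omitted term. Even $10{,}000$ terms give only about four correct decimal places.

```python
import math

def leibniz_partial(n_terms):
    """4 * (1 - 1/3 + 1/5 - ...) summed over the first n_terms terms."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(n_terms))

for n_terms in (10, 100, 1000, 10_000):
    err = abs(leibniz_partial(n_terms) - math.pi)
    # Alternating series test: the error is below the first omitted term, 4/(2n+1).
    print(n_terms, err, 4 / (2 * n_terms + 1))
```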

The converse of Corollary 7.28 is false: The limit $S = \lim_{x\to1-}\sum_0^\infty a_nx^n$ may exist even when $\sum_0^\infty a_n$ diverges. (Example: Take $a_n = (-1)^n$; then $\sum_0^\infty a_nx^n = (1 + x)^{-1}$ for $|x| < 1$, so $S = \frac{1}{2}$.) In this case the series $\sum a_n$ is said to be Abel summable to the sum $S$. Abel summation provides a way of making sense out of certain divergent series that is useful in some situations, one of which we shall discuss in §8.2.
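The Abel sum in the example above can be approached numerically. The sketch below truncates $\sum a_nx^n$ after many terms for $x$ close to 1. The truncation length is an arbitrary choice that makes the neglected tail negligible for these $x$, and the helper names are illustrative. For $a_n = (-1)^n$ the values track $(1 + x)^{-1}$ and approach $\frac{1}{2}$, even though the partial sums of $\sum a_n$ oscillate between 1 and 0.

```python
def abel_sum(coeff, x, n_terms=100_000):
    """Truncation of sum_{n>=0} coeff(n) * x**n for 0 <= x < 1."""
    total, power = 0.0, 1.0
    for n in range(n_terms):
        total += coeff(n) * power
        power *= x
    return total

def grandi(n):
    """a_n = (-1)^n: the partial sums 1, 0, 1, 0, ... do not converge."""
    return (-1) ** n

for x in (0.9, 0.99, 0.999):
    print(x, abel_sum(grandi, x), 1 / (1 + x))
```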

EXERCISES

1. Let $\{a_n\}_0^\infty$ be a sequence of real or complex numbers.
a. Suppose that $|a_{n+1}/a_n|$ converges to a limit $L$ as $n \to \infty$. Show that the radius of convergence of $\sum_0^\infty a_nx^n$ is $L^{-1}$.
b. Suppose that $|a_n|^{1/n}$ converges to a limit $L$ as $n \to \infty$. Show that the radius of convergence of $\sum_0^\infty a_nx^n$ is $L^{-1}$.
2. Show that if the sequence $\{a_n\}_0^\infty$ is bounded, the radius of convergence of $\sum_0^\infty a_nx^n$ is at least 1.
3. Suppose the radius of convergence of $\sum_0^\infty a_nx^n$ is $R$. What is the radius of convergence of $\sum_0^\infty a_nx^{kn}$ ($k = 2, 3, 4, \ldots$)?
4. Show that for any sequence $\{a_n\}_0^\infty$, the radius of convergence of $\sum_0^\infty a_nx^n$ is the reciprocal of $\limsup_{n\to\infty} |a_n|^{1/n}$. (See Exercises 9-12 in §1.5 and Exercise 25 in §6.2.)

5. Show that each of the following functions of $x$ admits a power series expansion on some interval centered at the origin. Find the expansion and give its interval of validity.
a. $\int_0^x e^{-t^2}\,dt$.
b. $\int_0^x \cos t^2\,dt$.
c. $\int_0^x t^{-1}\log(1 + 2t)\,dt$.
6. Use the series expansions in Exercise 5 to calculate the following integrals to three decimal places, and prove the accuracy of your answer.
a. $\int_0^1 e^{-t^2}\,dt$.
b. $\int_0^1 \cos t^2\,dt$.
c. $\int_0^{1/4} t^{-1}\log(1 + 2t)\,dt$.
7. Let $f(x) = \sum_0^\infty a_nx^n$ be a power series with positive radius of convergence. Show that $f(-x) = f(x)$ (resp. $f(-x) = -f(x)$) for all $x$ in the interval of convergence if and only if $a_n = 0$ for all odd $n$ (resp. all even $n$).

8. Let $k$ be a nonnegative integer. The Bessel function of order $k$ is the function $J_k$ defined by
$$J_k(x) = \sum_0^\infty \frac{(-1)^n}{n!\,(n + k)!}\left(\frac{x}{2}\right)^{2n+k}.$$
a. Verify that the series defining $J_k(x)$ converges for all $x$.
b. Show that $(d/dx)[x^kJ_k(x)] = x^kJ_{k-1}(x)$.
c. Show that $(d/dx)[x^{-k}J_k(x)] = -x^{-k}J_{k+1}(x)$.
d. Show that $u = J_k(x)$ satisfies the differential equation $x^2u'' + xu' + (x^2 - k^2)u = 0$.
9. Show that the series
$$1 + \frac{x^3}{2\cdot3} + \frac{x^6}{2\cdot3\cdot5\cdot6} + \cdots + \frac{x^{3n}}{2\cdot3\cdot5\cdot6\cdots(3n - 1)\cdot3n} + \cdots$$
converges for all $x$ and that its sum $f(x)$ satisfies $f''(x) = xf(x)$.
10. Express the sums of the following series in terms of elementary functions and (perhaps) their antiderivatives in the manner of Example 4.
(The individual series are illegible in the source.)

11. Consider the function $f(x) = \int_0^x \arctan t\,dt$.
a. Perform the integration to evaluate $f$ in terms of elementary functions.
b. Using the result of Example 2, compute the Taylor series of $f(x)$ (centered at the origin) and show that it converges to $f(x)$ for $x \in [-1, 1]$. (The endpoints require special attention.)
c. Deduce that
$$\frac{1}{1\cdot2} - \frac{1}{3\cdot4} + \frac{1}{5\cdot6} - \frac{1}{7\cdot8} + \cdots = \frac{\pi}{4} - \frac{1}{2}\log 2.$$

7.4 The Complex Exponential and Trig Functions

The series $\sum_0^\infty z^n/n!$ converges absolutely for every complex number $z$, by the ratio test, so we can use it to define the exponential function for a complex variable:
$$e^z = \sum_0^\infty \frac{z^n}{n!}.$$
This extended exponential function still obeys the basic law of exponents. Indeed, by Theorem 6.29,
(7.29) $$e^ze^w = \sum_{m,n=0}^\infty \frac{z^mw^n}{m!\,n!} = \sum_{k=0}^\infty \sum_{m+n=k} \frac{z^mw^n}{m!\,n!} = \sum_{k=0}^\infty \frac{(z + w)^k}{k!} = e^{z+w}.$$
(In the third equality we have used the binomial theorem.)

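Let $i = \sqrt{-1}$ be the imaginary unit. Since $i^2 = -1$, we have $i^3 = -i$ and $i^4 = 1$, so
$$i^{4n} = 1, \quad i^{4n+1} = i, \quad i^{4n+2} = -1, \quad i^{4n+3} = -i \qquad (n = 0, 1, 2, \ldots).$$
Therefore, when $z = ix$ is purely imaginary,
$$e^{ix} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right).$$
The series on the right are the Taylor series of $\cos x$ and $\sin x$, so we have arrived at Euler's formula
(7.30) $$e^{ix} = \cos x + i\sin x.$$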
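Euler's formula (7.30) and the law of exponents (7.29) can both be spot-checked by summing the exponential series directly with complex arithmetic. The sketch below uses a fixed number of terms, which is adequate only for moderate $|z|$. The sample points $z$, $w$ and the helper name `exp_series` are arbitrary, and the library functions `cmath.exp`, `math.cos`, and `math.sin` serve only as reference values.

```python
import cmath
import math

def exp_series(z, n_terms=60):
    """Partial sum of sum_{n>=0} z**n / n!, built term by term (fine for moderate |z|)."""
    total, term = 0j, 1 + 0j
    for n in range(n_terms):
        total += term
        term *= z / (n + 1)          # turns z^n/n! into z^(n+1)/(n+1)!
    return total

for x in (0.5, 1.0, 2.5):
    print(x, exp_series(1j * x), complex(math.cos(x), math.sin(x)))

z, w = 0.4 + 1.1j, -0.7 + 0.2j
print(abs(exp_series(z) * exp_series(w) - exp_series(z + w)))   # law of exponents (7.29)
print(abs(exp_series(z) - cmath.exp(z)))
```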

This is the appropriate place to raise the issue of the definition of $\cos x$ and $\sin x$. These functions are so familiar that we take them entirely for granted, but the definitions presented in elementary trigonometry (as ratios of sides of right triangles, or as the coordinates of the point where the unit circle intersects the ray that makes an angle $x$ with the positive horizontal axis) are quite unsatisfactory, for they provide neither a precise formula nor a computationally effective algorithm. (Think for a minute: How could you possibly use these definitions to calculate $\cos(1)$ to four decimal places?)¹ In fact, the best procedure is to use Taylor series as a definition! That is, we define $\cos x$ and $\sin x$ for all real (or, for that matter, complex) numbers $x$ by
(7.31) $$\cos x = \sum_0^\infty \frac{(-1)^nx^{2n}}{(2n)!}, \qquad \sin x = \sum_0^\infty \frac{(-1)^nx^{2n+1}}{(2n+1)!}.$$
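To make the challenge about $\cos(1)$ concrete, the sketch below sums the series (7.31) until the next term drops below $5 \times 10^{-5}$. The stopping rule is a choice made for this illustration, and the helper name is hypothetical. For $|x| \le 1$ the terms decrease in magnitude from the start, so by the alternating series test the error is smaller than the first omitted term. For $x = 1$ this needs only $1 - \frac{1}{2} + \frac{1}{24} - \frac{1}{720}$.

```python
import math

def cos_series_to_tol(x, tol=5e-5):
    """Sum the cosine series (7.31) until the next term has magnitude < tol.

    For |x| <= 1 the terms decrease in magnitude from the start, so by the
    alternating series test the error is smaller than the first omitted term.
    Returns (partial sum, number of terms used, size of first omitted term).
    """
    total, term, n = 0.0, 1.0, 0          # term = (-1)^n x^(2n) / (2n)!
    while abs(term) >= tol:
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
        n += 1
    return total, n, abs(term)

value, used, err_bound = cos_series_to_tol(1.0)
print(f"{value:.4f}", used, err_bound)    # 0.5403 4 2.48015873015873e-05
```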
We now indicate how to derive all the familiar properties of the trig functions from these definitions. First, it is clear from (7.31) that
(7.32) $$\cos(-x) = \cos x, \qquad \sin(-x) = -\sin x,$$
so that $e^{-ix} = \cos x - i\sin x$. Second, termwise differentiation of (7.31) immediately yields
(7.33) $$\cos' = -\sin, \qquad \sin' = \cos.$$

Third, the addition formulas for sine and cosine follow easily from the law of exponents:
$$\begin{aligned} \cos(x \pm y) + i\sin(x \pm y) &= e^{i(x \pm y)} = e^{ix}e^{\pm iy} \\ &= (\cos x + i\sin x)(\cos y \pm i\sin y) \\ &= (\cos x\cos y \mp \sin x\sin y) + i(\sin x\cos y \pm \cos x\sin y). \end{aligned}$$
Taking the real and imaginary parts of both sides, we obtain
(7.34) $$\cos(x \pm y) = \cos x\cos y \mp \sin x\sin y, \qquad \sin(x \pm y) = \sin x\cos y \pm \cos x\sin y.$$
In particular, we have the Pythagorean identity
(7.35) $$\cos^2 x + \sin^2 x = \cos(x - x) = \cos 0 = 1.$$

¹A similar problem arises if one tries to define $e^x$ directly. However, here there is an alternative: Define $\log x$ to be $\int_1^x t^{-1}\,dt$ and then define $\exp$ to be the inverse function of $\log$. The analogous procedure for developing trig functions, taking the equation $\arcsin x = \int_0^x (1 - t^2)^{-1/2}\,dt$ as a starting point, is less satisfactory, because the inverse function of $\arcsin$ is not the whole sine function but only its restriction to the interval $[-\pi/2, \pi/2]$.

Next, we have to bring the number $\pi$ into play somehow. We can proceed as follows. The series $\sum_0^\infty (-1)^n2^{2n}/(2n)!$ for $\cos 2$ is an alternating series whose terms decrease in magnitude starting with $n = 1$, so by the alternating series test,
$$\cos 2 = 1 - \frac{2^2}{2!} + \frac{2^4}{4!} = -\frac{1}{3} \quad \text{with error less than} \quad \frac{2^6}{6!} = \frac{4}{45}.$$
In particular, $\cos 2 < 0$, and of course $\cos 0 = 1 > 0$, so by the intermediate value theorem there is at least one number $a \in (0, 2)$ such that $\cos a = 0$. Therefore, the set $Z = \{x > 0 : \cos x = 0\}$ is nonempty; it is closed since $\cos$ is continuous; hence it contains its greatest lower bound, which is positive since $\cos 0 = 1$. We denote this smallest positive zero of $\cos$ by $\frac{1}{2}\pi$. (Again, this may be taken as a definition of the number $\pi$, from which its other familiar properties can be derived.)
Now, by (7.33), $(d/dx)\sin x = \cos x > 0$ for $0 \le x < \frac{1}{2}\pi$, so $\sin$ is increasing on $[0, \frac{1}{2}\pi]$, and $\sin 0 = 0$; hence $\sin\frac{1}{2}\pi > 0$. But by (7.35), $\sin^2\frac{1}{2}\pi = \sin^2\frac{1}{2}\pi + \cos^2\frac{1}{2}\pi = 1$; hence, $\sin\frac{1}{2}\pi = 1$. In summary,
(7.36) $$\cos 0 = \sin\tfrac{1}{2}\pi = 1, \qquad \sin 0 = \cos\tfrac{1}{2}\pi = 0.$$
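The argument above also suggests an algorithm for $\pi$ that uses nothing but the series (7.31). The sketch below confirms that $\cos 2$ lies in the interval predicted by the alternating series estimate. It then bisects on $[0, 2]$ for a sign change of the series. The helper names and the use of bisection are choices made for this illustration. Bisection returns some zero in $(0, 2)$; that it is the smallest positive zero relies on $\cos$ having only one zero in that interval, which the text has not yet established at this point.

```python
import math

def cos_series(x, n_terms=30):
    """Partial sum of the cosine series (7.31); 30 terms is ample for 0 <= x <= 2."""
    total, term = 0.0, 1.0                # term = (-1)^n x^(2n) / (2n)!
    for n in range(n_terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

def half_pi(lo=0.0, hi=2.0, iterations=60):
    """Bisection for a zero of cos between lo (cos > 0) and hi (cos < 0)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(cos_series(2.0))          # lies between -1/3 - 4/45 and -1/3
print(2 * half_pi(), math.pi)
```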

All of the familiar formulas of (precalculus) trigonometry can be derived from the even-odd relations (7.32), the addition formulas (7.34), and the special values (7.36), and these together with (7.33) yield all the formulas for integration and differentiation of trigonometric functions. For example, (7.34) and (7.36) yield the complementarity relations
(7.37) $$\cos(\tfrac{1}{2}\pi - x) = \cos\tfrac{1}{2}\pi\cos x + \sin\tfrac{1}{2}\pi\sin x = \sin x, \qquad \sin(\tfrac{1}{2}\pi - x) = \sin\tfrac{1}{2}\pi\cos x - \cos\tfrac{1}{2}\pi\sin x = \cos x.$$
These, in turn, yield the $2\pi$-periodicity of sine and cosine. Indeed, replacing $x$ by $-x$ in (7.37) and using (7.32), we see that $\cos(x + \frac{1}{2}\pi) = -\sin x$ and $\sin(x + \frac{1}{2}\pi) = \cos x$, whence
$$\cos(x + \pi) = \cos(x + \tfrac{1}{2}\pi + \tfrac{1}{2}\pi) = -\sin(x + \tfrac{1}{2}\pi) = -\cos x,$$
$$\sin(x + \pi) = \sin(x + \tfrac{1}{2}\pi + \tfrac{1}{2}\pi) = \cos(x + \tfrac{1}{2}\pi) = -\sin x,$$
and therefore
$$\cos(x + 2\pi) = -\cos(x + \pi) = \cos x, \qquad \sin(x + 2\pi) = -\sin(x + \pi) = \sin x.$$

EXERCISES

1. Recall that the hyperbolic sine and cosine functions are defined by $\sinh z = \frac{1}{2}(e^z - e^{-z})$ and $\cosh z = \frac{1}{2}(e^z + e^{-z})$. Here, $z$ may now be taken to be a complex number.
a. Show that $\sinh ix = i\sin x$ and $\cosh ix = \cos x$.
b. Show that $\sinh(z + w) = \sinh z\cosh w + \cosh z\sinh w$ and $\cosh(z + w) = \cosh z\cosh w + \sinh z\sinh w$.
c. Express $\sinh(x + iy)$ and $\cosh(x + iy)$ in terms of real functions of the real variables $x$ and $y$.

2. Verify that the formula $(d/dx)e^{cx} = ce^{cx}$ remains valid when $c$ is a complex number. (However, $x$ is still a real variable, since we have not discussed differentiation of functions of a complex variable.)

3. Let $a$ and $b$ be real numbers. Compute $\int e^{(a+ib)x}\,dx$ by using the result of Exercise 2; then, by taking real and imaginary parts, deduce the formulas
$$\int e^{ax}\cos bx\,dx = \frac{e^{ax}(a\cos bx + b\sin bx)}{a^2 + b^2}, \qquad \int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2}.$$

7.5 Functions Defined by Improper Integrals

In the preceding sections we have considered infinite series of functions. The analogue for integrals is an improper integral $\int_c^d f(x, t)\,dt$, where the integrand contains a free variable $x$ as well as the variable of integration and the resulting integral defines a function of $x$. The integral may be improper because $c = -\infty$ or $d = \infty$ or because of singularities of the function $f$. To keep the notation simple, we shall restrict our discussion to the case where $d = \infty$ and $f$ has no singularities on $[c, \infty)$, but everything we say extends to the other cases with the obvious modifications.
In this situation, the notion of uniform convergence is as follows: We say that the integral $\int_c^\infty f(x, t)\,dt$ converges uniformly for $x \in I$ (where $I$ is an interval in $\mathbb{R}$) if the difference between the "partial integral" $\int_c^d$ and the full integral $\int_c^\infty$, that is, the "tail end" $\int_d^\infty$, tends to zero uniformly for $x \in I$ as $d \to \infty$. Precisely, this means that
$$\sup_{x \in I}\left|\int_d^\infty f(x, t)\,dt\right| \to 0 \quad \text{as } d \to \infty.$$

The most useful test for uniform convergence is the following analogue of the Weierstrass M-test. The proof is essentially identical to that of the M-test, and we leave the details to the reader (Exercise 1).

7.38 Theorem. Suppose there is a function $g(t) \ge 0$ on $[c, \infty)$ such that (i) $|f(x, t)| \le g(t)$ for all $x \in I$ and $t \ge c$, and (ii) $\int_c^\infty g(t)\,dt < \infty$. Then $\int_c^\infty f(x, t)\,dt$ converges absolutely and uniformly for $x \in I$.

The consequences of uniform convergence for continuity, integration, and differentiation of the function $F(x) = \int_c^\infty f(x, t)\,dt$ are much the same as for series. The following two theorems provide analogues of Theorems 7.10 and 7.13 in the present setting.

7.39 Theorem. Suppose that $f(x, t)$ is a continuous function on the set $\{(x, t) : x \in I,\ t \ge c\}$ and that the integral $\int_c^\infty f(x, t)\,dt$ is uniformly convergent for $x \in I$. Then:
a. The function $F(x) = \int_c^\infty f(x, t)\,dt$ is continuous on $I$.
b. If $[a, b] \subset I$, then
$$\int_a^b\int_c^\infty f(x, t)\,dt\,dx = \int_c^\infty\int_a^b f(x, t)\,dx\,dt.$$

Proof. The conclusions are true if $\int_c^\infty$ is replaced by $\int_c^d$ where $d < \infty$, by Theorems 4.46 and 4.26. (a) then follows because the uniform limit of continuous functions is continuous, and (b) follows by the argument in the proof of Theorem 7.11. □

7.40 Theorem. Suppose that $f(x, t)$ and its partial derivative $\partial_xf(x, t)$ are continuous functions on the set $\{(x, t) : x \in I,\ t \ge c\}$. Suppose also that the integral $\int_c^\infty f(x, t)\,dt$ converges for $x \in I$ and the integral $\int_c^\infty \partial_xf(x, t)\,dt$ converges uniformly for $x \in I$. Then the former integral is differentiable on $I$ as a function of $x$, and
$$\frac{d}{dx}\int_c^\infty f(x, t)\,dt = \int_c^\infty \frac{\partial f}{\partial x}(x, t)\,dt.$$

Theorem 7.40 may be deduced from Theorem 7.39 in much the same way as
Theorem 7.12 was deduced from Theorem 7.11 (Exercise 2).
Let us state explicitly the result of combining Theorems 7.39 and 7.40 with
Theorem 7.38:

7.41 Theorem. The conclusions of Theorem 7.39 are valid whenever $|f(x,t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$. The conclusions of Theorem 7.40 are valid whenever $\int_c^\infty f(x,t)\,dt$ converges for $x \in I$ and $|\partial_x f(x,t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$.
338 Chapter 7. Functions Defined by Series and Integrals

The manipulation of improper integrals by the foregoing theorems can be quite an entertaining exercise, and it leads to a number of interesting and useful results. Let us look at some examples.

Example 1. Evaluate $\displaystyle\int_0^\infty \frac{\arctan(bt) - \arctan(at)}{t}\,dt$ where $0 < a < b$.

Solution: We recognize that the integrand is $\int_a^b (x^2t^2 + 1)^{-1}\,dx$. For $x \ge a$ and $t \ge 0$ we have $(x^2t^2 + 1)^{-1} \le (a^2t^2 + 1)^{-1}$, and $\int_0^\infty (a^2t^2 + 1)^{-1}\,dt < \infty$. Thus, by Theorem 7.38, the integral $\int_0^\infty (x^2t^2 + 1)^{-1}\,dt$ is uniformly convergent for $x \ge a$, so we can apply Theorem 7.39 to obtain
$$\begin{aligned}
\int_0^\infty \frac{\arctan(bt) - \arctan(at)}{t}\,dt &= \int_0^\infty \int_a^b \frac{dx}{x^2t^2 + 1}\,dt = \int_a^b \int_0^\infty \frac{dt}{x^2t^2 + 1}\,dx \\
&= \int_a^b x^{-1} \arctan(xt) \Big|_{t=0}^{t=\infty}\,dx = \int_a^b \frac{\pi}{2x}\,dx = \frac{\pi}{2} \log \frac{b}{a}.
\end{aligned}$$
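As a quick numerical sanity check of Example 1, here is a sketch in plain Python (standard library only). The helper `simpson` is our own composite Simpson's rule, and the cutoff `T` and the tail correction are illustrative choices, not part of the text: since $\arctan z = \pi/2 - 1/z + O(z^{-3})$, the integrand behaves like $(a^{-1} - b^{-1})t^{-2}$ for large $t$, so the part of the integral beyond $T$ is about $(a^{-1} - b^{-1})/T$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(t, a, b):
    """(arctan(bt) - arctan(at)) / t, extended continuously by its limit b - a at t = 0."""
    if t == 0.0:
        return b - a
    return (math.atan(b * t) - math.atan(a * t)) / t


def example1_numeric(a, b, T=1000.0, n=200_000):
    """Integral over [0, T] plus the asymptotic tail estimate (1/a - 1/b) / T."""
    body = simpson(lambda t: integrand(t, a, b), 0.0, T, n)
    return body + (1 / a - 1 / b) / T


def example1_closed_form(a, b):
    """(pi / 2) log(b / a), the value obtained by interchanging the integrals."""
    return math.pi / 2 * math.log(b / a)


print(example1_numeric(1.0, 3.0), example1_closed_form(1.0, 3.0))
```

The two printed numbers should agree to many decimal places; the agreement reflects the interchange of integrals justified by Theorems 7.38 and 7.39.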

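Example 2. Let
$$F(x) = \int_0^\infty e^{-xt^2}\,dt \qquad (x > 0).$$
Since $(d^k/dx^k)e^{-xt^2} = (-t^2)^k e^{-xt^2}$, by Theorem 7.40 we can conclude that
$$F^{(k)}(x) = (-1)^k \int_0^\infty t^{2k} e^{-xt^2}\,dt \qquad (x > 0),$$
provided that we establish the uniform convergence of the integral on the right. In fact, the convergence is not uniform on the whole interval $(0, \infty)$, but it is uniform on $[\delta, \infty)$ for any $\delta > 0$, which is sufficient. This follows easily from Theorem 7.38, since $t^{2k}e^{-xt^2} \le t^{2k}e^{-\delta t^2}$ for $x \ge \delta$.
On the other hand, we can evaluate $F(x)$ explicitly by making the substitution $u = x^{1/2}t$ and invoking Proposition 4.66:
$$F(x) = x^{-1/2} \int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2}\, x^{-1/2},$$
and therefore
$$F^{(k)}(x) = (-1)^k \cdot \frac{1}{2} \cdot \frac{3}{2} \cdots \frac{2k-1}{2} \cdot \frac{\sqrt{\pi}}{2}\, x^{-(2k+1)/2}.$$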

Comparing the two formulas for $F^{(k)}(x)$, we conclude that
$$\int_0^\infty t^{2k} e^{-xt^2}\,dt = \frac{1 \cdot 3 \cdot 5 \cdots (2k-1)}{2^{k+1}}\, \sqrt{\pi}\, x^{-(2k+1)/2}.$$
This result can also be obtained by a laborious $k$-fold integration by parts ($u = t^{2k-1}$, $dv = te^{-xt^2}\,dt$, etc.), but differentiation under the integral gives a rather painless derivation.
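The moment formula just derived can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it uses our own Simpson helper, and it truncates the integral at $T = 12/\sqrt{x}$ (an arbitrary cutoff beyond which $e^{-xt^2} < e^{-144}$, negligible for the small $k$ tried here).

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def odd_product(k):
    """1 * 3 * 5 * ... * (2k - 1); the empty product (k = 0) is 1."""
    prod = 1
    for j in range(1, k + 1):
        prod *= 2 * j - 1
    return prod


def moment_closed_form(k, x):
    """1*3*...*(2k-1) / 2^(k+1) * sqrt(pi) * x^(-(2k+1)/2), the formula derived above."""
    return odd_product(k) / 2 ** (k + 1) * math.sqrt(math.pi) * x ** (-(2 * k + 1) / 2)


def moment_numeric(k, x, n=20_000):
    """Integral of t^(2k) e^(-x t^2) over [0, 12 / sqrt(x)] by Simpson's rule."""
    T = 12.0 / math.sqrt(x)
    return simpson(lambda t: t ** (2 * k) * math.exp(-x * t * t), 0.0, T, n)


for k in range(4):
    print(k, moment_numeric(k, 2.0), moment_closed_form(k, 2.0))
```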

Example 3. We now derive one of the most important of all integral formulas:
$$\int_0^\infty \frac{\sin t}{t}\,dt = \frac{\pi}{2}. \tag{7.42}$$
This is a bit tricky, since the integral is not absolutely convergent. (Note that since $t^{-1}\sin t \to 1$ as $t \to 0$, the integral over $[0,1]$ is an ordinary proper integral. The convergence of the integral over $[1, \infty)$ was proved in §4.6 [Example 3].) Our strategy will be to consider an improper integral with two parameters:
$$F(x,y) = \int_0^\infty e^{-xt}\, \frac{\sin yt}{t}\,dt \qquad (x > 0,\ y \in \mathbb{R}). \tag{7.43}$$
Again, this integral is proper at $t = 0$, and for $x > 0$ it is absolutely convergent.

First, we fix $x > 0$ and consider the integral as a function of $y$. Formal differentiation of (7.43) with respect to $y$ leads to
$$\frac{\partial F}{\partial y} = \int_0^\infty e^{-xt} \cos yt\,dt.$$
By Theorem 7.41, this formula is indeed valid, since $|e^{-xt}\cos yt| \le e^{-xt}$ for all $y$ and $\int_0^\infty e^{-xt}\,dt < \infty$. The integral on the right can be evaluated by elementary calculus (integrate by parts twice, or use Exercise 3 in §7.4), and the result is
$$\frac{\partial F}{\partial y} = \frac{e^{-xt}(y \sin yt - x \cos yt)}{x^2 + y^2} \bigg|_{t=0}^{t=\infty} = \frac{x}{x^2 + y^2}.$$

Now we can recover $F$ by integrating in $y$. Obviously $F(x,0) = 0$, so we get the right constant of integration by starting the integration at 0:
$$F(x,y) = \int_0^y \frac{x}{x^2 + s^2}\,ds = \arctan(y/x).$$

The variable $y$ has now served its purpose, and we henceforth set it equal to 1. We have shown that
$$\int_0^\infty e^{-xt}\, \frac{\sin t}{t}\,dt = \arctan(1/x) \qquad (x > 0). \tag{7.44}$$

We now wish to let $x \to 0$. In order to pass the limit under the integral sign in (7.44), it is enough to show that the integral in (7.44) is uniformly convergent for $x > 0$. Unfortunately, Theorem 7.38 does not apply here, since the integral is not absolutely convergent at $x = 0$. (Theorem 7.38 easily yields the uniform convergence for $x \ge \delta$ for any $\delta > 0$, but that isn't good enough!) Recall the meaning of uniform convergence: What we need to show is that
$$\sup_{x > 0} \left| \int_b^\infty e^{-xt}\, \frac{\sin t}{t}\,dt \right| \to 0 \quad \text{as } b \to \infty.$$

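To this end, we use integration by parts,² taking $u = t^{-1}$ and $dv = e^{-xt}\sin t\,dt$; the result is
$$\int_b^\infty e^{-xt}\, \frac{\sin t}{t}\,dt = \frac{e^{-bx}(x \sin b + \cos b)}{b(x^2 + 1)} - \int_b^\infty \frac{e^{-xt}(x \sin t + \cos t)}{(x^2 + 1)\,t^2}\,dt.$$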

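Now,
$$\left| \frac{e^{-xt}(x \sin t + \cos t)}{x^2 + 1} \right| \le \frac{x + 1}{x^2 + 1} \qquad (x \ge 0,\ t \ge 0).$$
The quantity on the right is continuous on $\mathbb{R}$ and tends to zero as $x \to \infty$, so it is bounded by a constant $C$ for $x \ge 0$. Therefore,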

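$$\sup_{x > 0} \left| \int_b^\infty e^{-xt}\, \frac{\sin t}{t}\,dt \right| \le \frac{C}{b} + \int_b^\infty \frac{C}{t^2}\,dt = \frac{2C}{b},$$
which tends to zero as $b \to \infty$, as desired. Thus the convergence is uniform in (7.44), and it follows that
$$\int_0^\infty \frac{\sin t}{t}\,dt = \lim_{x \to 0+} \int_0^\infty e^{-xt}\, \frac{\sin t}{t}\,dt = \lim_{x \to 0+} \arctan(1/x) = \frac{\pi}{2}.$$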
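Both (7.44) and the limiting formula (7.42) can be observed numerically. The following is a minimal sketch, assuming our own Simpson helper and an arbitrary cutoff `T = 60` for the damped integral (for $x \ge 1/2$ the neglected tail is at most $e^{-xT}/(xT) < e^{-30}$). For the undamped integral it checks the estimate from the integration by parts above in the case $x = 0$, where the constant is $(0 + 1)/(0 + 1) = 1$, so the error of the integral over $[0, b]$ is at most $2/b$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def damped_sinc(t, x):
    """e^(-xt) sin(t) / t, extended by its limit 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return math.exp(-x * t) * math.sin(t) / t


def F(x, T=60.0, n=60_000):
    """Left side of (7.44), truncated at T."""
    return simpson(lambda t: damped_sinc(t, x), 0.0, T, n)


def dirichlet_partial(b, per_unit=200):
    """Integral of sin(t)/t over [0, b]: the x = 0 case of (7.44), cut off at b."""
    return simpson(lambda t: damped_sinc(t, 0.0), 0.0, b, int(per_unit * b))


for x in (0.5, 1.0, 2.0):
    print(x, F(x), math.atan(1 / x))
for b in (50.0, 100.0, 200.0):
    # the integration-by-parts estimate with x = 0 bounds the error by 2 / b
    print(b, dirichlet_partial(b), math.pi / 2)
```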

²The idea is much the same as the use of summation by parts in the proof of Abel's theorem.

EXERCISES

1. Prove Theorem 7.38.

2. Prove Theorem 7.40.

3. Suppose $x > 0$. Verify that $\int_0^\infty e^{-xt}\,dt = x^{-1}$, justify differentiating under the integral sign, and deduce that $\int_0^\infty t^n e^{-xt}\,dt = n!\,x^{-n-1}$.
4. Verify that $\int_0^\infty (t^2 + x)^{-1}\,dt = \frac{1}{2}\pi x^{-1/2}$, justify differentiating under the integral sign, and thence evaluate $\int_0^\infty (t^2 + x)^{-n}\,dt$.
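5. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\,dx = \log\frac{a}{b}$ for $a, b > 0$.
6. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\, \cos x\,dx = \frac{1}{2}\log\frac{1 + a^2}{1 + b^2}$ for $a, b > 0$.
7. Show that $\displaystyle\int_0^\infty e^{-x}\, \frac{1 - \cos ax}{x}\,dx = \frac{1}{2}\log(1 + a^2)$ for all $a \in \mathbb{R}$.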
8. Deduce from (7.42) that
$$\int_0^\infty \frac{\sin xt}{t}\,dt = \begin{cases} \pi/2 & (x > 0), \\ 0 & (x = 0), \\ -\pi/2 & (x < 0). \end{cases}$$
Show that the convergence is uniform for $x \in I$ if $I$ is any compact interval with $0 \notin I$, but not if $0 \in I$.
9. Use Exercise 8 to show that $\displaystyle\int_0^\infty \frac{\sin^2 xt}{t^2}\,dt = \frac{1}{2}\pi x$ for $x > 0$.
10. Let $\displaystyle I(a,b) = \int_0^\infty \frac{\cos bx - \cos ax}{x^2}\,dx$.
a. Show that $I(a,b)$ is convergent for all $a, b \in \mathbb{R}$ and that the convergence is uniform for $a$ in any finite interval when $b$ is fixed (or vice versa).
b. Use Exercise 8 to show that $I(a,b) = \frac{1}{2}\pi(a - b)$ if $a, b > 0$.
c. Show that $I(a,b) = \frac{1}{2}\pi(|a| - |b|)$ for all $a, b \in \mathbb{R}$.
11. Let $F(x) = \int_0^\infty e^{-t^2}\cos xt\,dt$ for $x \in \mathbb{R}$.
a. Justify differentiating under the integral sign and thence show that $F'(x) = -\frac{1}{2}xF(x)$.
b. Show that $F(x) = \frac{1}{2}\sqrt{\pi}\,e^{-x^2/4}$.
12. Let $G(x) = \int_0^\infty e^{-t^2}\sin xt\,dt$ for $x \in \mathbb{R}$. Proceeding as in Exercise 11, show
13. Show that $\displaystyle\int_0^\infty \frac{1 - e^{-xt^2}}{t^2}\,dt = \sqrt{\pi x}$ for $x > 0$.

14. Let $F(x) = \int_0^\infty e^{-t^2 - (x/t)^2}\,dt$.
a. Show that $F$ is a continuous function on $\mathbb{R}$ that satisfies $F'(x) = -2F(x)$ for $x > 0$ and $F'(x) = 2F(x)$ for $x < 0$.
b. Show that $F(x) = \frac{1}{2}\sqrt{\pi}\,e^{-2|x|}$.
c. Evaluate $\int_0^\infty e^{-pt^2 - qt^{-2}}\,dt$ for $p, q > 0$.
15. Let $f$ be a continuous function on $[0, \infty)$ that satisfies $|f(x)| \le a(1 + x)^N e^{bx}$ for some $a, b, N > 0$. The Laplace transform of $f$ is the function $L[f]$ defined on $(b, \infty)$ by
$$L[f](s) = \int_0^\infty e^{-sx} f(x)\,dx.$$
a. Show that $L[f]$ is of class $C^\infty$ on $(b, \infty)$ and $(d/ds)^n L[f] = (-1)^n L[f_n]$ where $f_n(x) = x^n f(x)$.
b. Suppose that $f$ is of class $C^1$ on $[0, \infty)$ and that $f'$ satisfies the same sort of exponential growth condition as $f$. Show that $L[f'](s) = sL[f](s) - f(0)$.

7.6 The Gamma Function

Perhaps the most important of all functions defined by improper integrals is the gamma function $\Gamma(x)$ defined for $x > 0$ by
$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \tag{7.45}$$
which has a way of turning up in many unexpected places. Let us analyze the integrals over $[0,1]$ and $[1, \infty)$ separately. The integral over $[0,1]$ is proper for $x \ge 1$ and improper but convergent for $0 < x < 1$. In fact, by Theorem 7.38 it is uniformly convergent for $x \ge \delta$, for any $\delta > 0$, since $0 < t^{x-1}e^{-t} \le t^{\delta-1}$ for $x \ge \delta$ and $0 < t \le 1$. The integral over $[1, \infty)$ is convergent for all $x$ and uniformly convergent for $x \le C$, for any constant $C$, since $0 < t^{x-1}e^{-t} \le t^{C-1}e^{-t}$ for $x \le C$ and $t \ge 1$. Therefore, the integral defining $\Gamma(x)$ is convergent for $x > 0$ and uniformly convergent on $\delta \le x \le C$ for any $\delta > 0$ and $C > 0$.
It follows that $\Gamma$ is a continuous function on $(0, \infty)$. In fact, $\Gamma$ is of class $C^\infty$ on $(0, \infty)$, and its derivatives can be calculated by differentiating under the integral:
$$\Gamma^{(k)}(x) = \int_0^\infty (\log t)^k\, t^{x-1} e^{-t}\,dt. \tag{7.46}$$
Since $|\log t|$ grows more slowly than any power of $t$ as $t \to 0$ or $t \to \infty$, the argument of the preceding paragraph shows that the integral on the right is absolutely and uniformly convergent for $\delta \le x \le C$ for any positive $\delta$ and $C$, so Theorem 7.40 guarantees the validity of (7.46).
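The most important property of $\Gamma$ is that it satisfies the functional equation
$$\Gamma(x + 1) = x\,\Gamma(x). \tag{7.47}$$
The proof is a simple integration by parts ($u = t^x$, $dv = e^{-t}\,dt$):
$$\Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt = -t^x e^{-t} \Big|_0^\infty + x\int_0^\infty t^{x-1} e^{-t}\,dt = x\,\Gamma(x).$$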
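There are two values of $\Gamma$ that can be calculated easily by hand:
$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1, \qquad \Gamma\big(\tfrac{1}{2}\big) = \int_0^\infty t^{-1/2} e^{-t}\,dt = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi}.$$
(For the second one we set $u = \sqrt{t}$ and used Proposition 4.66.) The functional equation (7.47) now yields the values of $\Gamma$ at all positive integers and half-integers:
$$\Gamma(2) = 1 \cdot \Gamma(1) = 1, \quad \Gamma(3) = 2\Gamma(2) = 2!, \quad \Gamma(4) = 3\Gamma(3) = 3!, \quad \ldots$$
and so by induction,
$$\Gamma(n) = (n - 1)!, \qquad \Gamma\big(n + \tfrac{1}{2}\big) = \big(n - \tfrac{1}{2}\big)\big(n - \tfrac{3}{2}\big) \cdots \tfrac{1}{2}\,\sqrt{\pi}. \tag{7.48}$$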

Thus the gamma function provides an extension of the factorial function to non-integers: $x! = \Gamma(x + 1)$, if you like. It is the natural extension of the factorial function, not just because it gives the right values at the integers, but because the functional equation $\Gamma(x + 1) = x\Gamma(x)$ is the natural generalization of the recursive formula $n! = n \cdot (n - 1)!$ that defines factorials.
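A minimal check of (7.47) and (7.48) against Python's built-in `math.gamma`; the function `gamma_half_integer` is our own illustration that applies the functional equation $n$ times starting from $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, and `factorial_via_gamma` simply spells out the convention $x! = \Gamma(x + 1)$.

```python
import math


def gamma_half_integer(n):
    """Gamma(n + 1/2) = (n - 1/2)(n - 3/2) ... (1/2) sqrt(pi), as in (7.48)."""
    value = math.sqrt(math.pi)  # Gamma(1/2)
    for j in range(n):
        value *= j + 0.5  # Gamma(x + 1) = x Gamma(x) with x = j + 1/2
    return value


def factorial_via_gamma(x):
    """x! := Gamma(x + 1), which makes sense for non-integers as well."""
    return math.gamma(x + 1)


for n in range(1, 6):
    print(n, math.gamma(n), math.factorial(n - 1), math.gamma(n + 0.5), gamma_half_integer(n))
print(factorial_via_gamma(0.5))  # (1/2)! = Gamma(3/2) = sqrt(pi) / 2
```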
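Other factorial-like products (more precisely, products of numbers in an arithmetic progression) can also be expressed in terms of the gamma function. Indeed, since
$$\Gamma(c + n + 1) = (c + n)(c + n - 1) \cdots c\,\Gamma(c),$$
for $a, b > 0$ we have
$$a(a + b)(a + 2b) \cdots (a + nb) = b^{n+1}\, \frac{a}{b}\Big(\frac{a}{b} + 1\Big) \cdots \Big(\frac{a}{b} + n\Big) = \frac{b^{n+1}\,\Gamma\big(\frac{a}{b} + n + 1\big)}{\Gamma\big(\frac{a}{b}\big)}. \tag{7.49}$$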
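Formula (7.49) is easy to test on concrete progressions. This sketch (our own helper names, standard library only) multiplies the $n + 1$ factors directly and compares with the gamma-function expression; for $a = 1$, $b = 3$, $n = 5$ the product is $1 \cdot 4 \cdot 7 \cdot 10 \cdot 13 \cdot 16 = 58240$.

```python
import math


def progression_product(a, b, n):
    """Direct product a (a + b) (a + 2b) ... (a + nb): n + 1 factors."""
    prod = 1.0
    for j in range(n + 1):
        prod *= a + j * b
    return prod


def progression_product_gamma(a, b, n):
    """Right side of (7.49): b^(n+1) Gamma(a/b + n + 1) / Gamma(a/b)."""
    c = a / b
    return b ** (n + 1) * math.gamma(c + n + 1) / math.gamma(c)


print(progression_product(1, 3, 5), progression_product_gamma(1, 3, 5))
```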

The functional equation, written in the form
$$\Gamma(x) = \frac{\Gamma(x + 1)}{x},$$
shows that $\Gamma(x)$ blows up like $x^{-1}$ as $x \to 0$. It also provides a way of extending the gamma function to negative values of $x$. Indeed, the expression on the right is defined for all $x > -1$ except $x = 0$, and it can be taken as a definition of $\Gamma(x)$ for $-1 < x < 0$. Once this has been done, $\Gamma(x + 1)/x$ is defined for all $x > -2$ except $x = 0, -1$, and it can be taken as a definition of $\Gamma(x)$ for $-2 < x < -1$. Proceeding inductively, we eventually obtain a definition of $\Gamma(x)$ for all $x$ except the nonpositive integers, where $\Gamma(x)$ blows up. In more explicit form, it is
$$\Gamma(x) = \frac{\Gamma(x + n)}{x(x + 1) \cdots (x + n - 1)} \qquad (x > -n). \tag{7.50}$$
This extended gamma function still satisfies the functional equation (7.47), more or
less by definition, and (7.49) remains valid provided that a/b is not a nonpositive
integer.
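The inductive extension (7.50) translates directly into code. In this sketch, `gamma_extended` is our own illustrative function: it shifts $x$ up by the smallest integer $n$ that makes $x + n > 0$, evaluates $\Gamma(x + n)$ there, and divides by $x(x+1)\cdots(x+n-1)$. Python's `math.gamma` already accepts negative non-integers, so it serves as an independent comparison; for instance $\Gamma(-\frac{1}{2}) = \Gamma(\frac{1}{2})/(-\frac{1}{2}) = -2\sqrt{\pi}$.

```python
import math


def gamma_extended(x):
    """Gamma(x) for real x other than 0, -1, -2, ..., built from (7.50).

    Take the smallest integer n >= 0 with x + n > 0, evaluate Gamma(x + n) where
    the defining integral (7.45) converges (here via math.gamma), and divide by
    x (x + 1) ... (x + n - 1)."""
    if x <= 0 and x == math.floor(x):
        raise ValueError("Gamma is not defined at the nonpositive integers")
    n = 0
    while x + n <= 0:
        n += 1
    denominator = 1.0
    for j in range(n):
        denominator *= x + j
    return math.gamma(x + n) / denominator


for x in (-0.5, -1.5, -2.5, -3.7):
    print(x, gamma_extended(x), math.gamma(x))
```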
The qualitative behavior of the gamma function for $x > 0$ can be analyzed as follows: Since $\Gamma(1) = \Gamma(2) = 1$, there is a critical point $x_0$ in the interval $(1, 2)$ by Rolle's theorem. On the other hand, from (7.46) it is clear that $\Gamma''(x) > 0$ for $x > 0$, so that $\Gamma'(x)$ is strictly increasing. It follows that $\Gamma$ is decreasing for $0 < x < x_0$ and increasing for $x > x_0$; in particular, it has a minimum at $x_0$. Also, it tends to $\infty$ as $x \to 0$ or $x \to \infty$, so its graph is roughly U-shaped. The behavior for $x < 0$ can then be deduced from (7.50). The graph of $\Gamma$ is sketched in Figure 7.3.
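The minimum point $x_0$ can be located numerically. The sketch below uses golden-section search, a technique we choose here for illustration (it is not from the text); it is legitimate because the convexity argument above makes $\Gamma$ unimodal on $[1, 2]$. Ties in floating point limit the achievable accuracy to roughly $10^{-8}$; the minimum is known to lie near $x_0 \approx 1.4616$, where $\Gamma(x_0) \approx 0.8856$.

```python
import math


def gamma_minimizer(lo=1.0, hi=2.0, tol=1e-10):
    """Golden-section search for the minimum point x0 of Gamma on [lo, hi].

    Relies on Gamma being unimodal on [lo, hi], which holds because
    Gamma'' > 0 on (0, inf) by (7.46)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if math.gamma(c) < math.gamma(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


x0 = gamma_minimizer()
print(x0, math.gamma(x0))
```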
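A number of useful integrals can be transformed into the integral defining $\Gamma(x)$ by a change of variables. We single out two particularly useful ones, obtained by setting $u = bt$ and $v = t^2$, respectively:
$$\int_0^\infty t^{x-1} e^{-bt}\,dt = b^{-x}\int_0^\infty u^{x-1} e^{-u}\,du = b^{-x}\,\Gamma(x) \qquad (b > 0), \tag{7.51}$$
$$\int_0^\infty t^{2x-1} e^{-t^2}\,dt = \int_0^\infty v^{(2x-1)/2} e^{-v}\,\frac{dv}{2v^{1/2}} = \frac{1}{2}\,\Gamma(x). \tag{7.52}$$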
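The two changes of variable can be spot-checked by quadrature. In this sketch (our own Simpson helper; cutoffs at $bt = 80$ and $t = 10$ are arbitrary choices that make the neglected tails far below rounding level), we restrict to $x \ge 1$ in (7.51) and $x \ge \frac{1}{2}$ in (7.52) so that the integrands stay bounded at $t = 0$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def lhs_751(x, b, n=80_000):
    """Integral of t^(x-1) e^(-bt) over [0, 80/b]; assumes x >= 1."""
    return simpson(lambda t: t ** (x - 1) * math.exp(-b * t), 0.0, 80.0 / b, n)


def lhs_752(x, n=20_000):
    """Integral of t^(2x-1) e^(-t^2) over [0, 10]; assumes x >= 1/2."""
    return simpson(lambda t: t ** (2 * x - 1) * math.exp(-t * t), 0.0, 10.0, n)


print(lhs_751(3.5, 2.0), 2.0 ** -3.5 * math.gamma(3.5))
print(lhs_752(2.5), 0.5 * math.gamma(2.5))
```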

There is another important integral related to the gamma function, the so-called beta function
$$B(x,y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt \qquad (x, y > 0). \tag{7.53}$$

FIGURE 7.3: Graph of the equation $y = \Gamma(x)$, $-4 < x < 4$. (The lines $x = -k$, $k = 0, 1, 2, \ldots$, are vertical asymptotes.)

Since the integrand is approximately equal to $t^{x-1}$ for $t$ near 0 and to $(1 - t)^{y-1}$ for $t$ near 1, the integral is proper when $x, y \ge 1$ and convergent for $x, y > 0$. Like the gamma function, the beta function can be expressed in a number of different forms by changes of variable in the integral. Other than (7.53), the most important of these is obtained by the substitution $t = \sin^2\theta$, which makes $1 - t = \cos^2\theta$ and $dt = 2\sin\theta\cos\theta\,d\theta$, so that
$$B(x,y) = 2\int_0^{\pi/2} \sin^{2x-1}\theta\, \cos^{2y-1}\theta\,d\theta. \tag{7.54}$$

The relation between the gamma and beta functions is as follows:

7.55 Theorem. For $x, y > 0$, $\displaystyle B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$.

Proof. We employ the same device that we used to calculate $\int_{-\infty}^{\infty} e^{-x^2}\,dx$ in §4.7: We express $\Gamma(x)$ and $\Gamma(y)$ by (7.52), write $\Gamma(x)\Gamma(y)$ as an iterated integral, convert the latter to a double integral, and switch to polar coordinates:
$$\begin{aligned}
\Gamma(x)\Gamma(y) &= 4\int_0^\infty t^{2x-1} e^{-t^2}\,dt \int_0^\infty s^{2y-1} e^{-s^2}\,ds \\
&= 4\int_0^\infty \int_0^\infty s^{2y-1} t^{2x-1} e^{-s^2 - t^2}\,ds\,dt \\
&= 4\int_0^{\pi/2} \int_0^\infty (r\cos\theta)^{2y-1} (r\sin\theta)^{2x-1} e^{-r^2}\,r\,dr\,d\theta \\
&= 4\int_0^{\pi/2} \cos^{2y-1}\theta\, \sin^{2x-1}\theta\,d\theta \int_0^\infty r^{2x+2y-1} e^{-r^2}\,dr \\
&= B(x,y)\,\Gamma(x + y).
\end{aligned}$$
In the last step we have used (7.52) and (7.54). □
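Theorem 7.55 and the trigonometric form (7.54) can be compared numerically. This is a sketch under assumptions: the quadrature is our own Simpson helper, and we take $x, y \ge 1$ for (7.53) and $x, y \ge \frac{1}{2}$ for (7.54) so that the integrands are bounded. Two easy exact cases are $B(\frac{1}{2}, \frac{1}{2}) = \Gamma(\frac{1}{2})^2 = \pi$ and $B(3, 4) = 2! \cdot 3!/6! = \frac{1}{60}$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def beta_direct(x, y, n=20_000):
    """(7.53) by Simpson's rule; assumes x, y >= 1."""
    return simpson(lambda t: t ** (x - 1) * (1 - t) ** (y - 1), 0.0, 1.0, n)


def beta_trig(x, y, n=20_000):
    """(7.54): 2 * integral of sin^(2x-1) cos^(2y-1) over [0, pi/2]; assumes x, y >= 1/2."""
    return 2 * simpson(
        lambda th: math.sin(th) ** (2 * x - 1) * math.cos(th) ** (2 * y - 1),
        0.0, math.pi / 2, n)


def beta_gamma(x, y):
    """Theorem 7.55: Gamma(x) Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


print(beta_direct(2.5, 3.0), beta_trig(2.5, 3.0), beta_gamma(2.5, 3.0))
```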

We draw two useful consequences from Theorem 7.55. The first one is another functional equation for the gamma function; the second one compares the growth of $\Gamma(x)$ and $\Gamma(x + a)$ as $x \to \infty$.

7.56 Theorem (The Duplication Formula). $\Gamma(2x) = \pi^{-1/2}\, 2^{2x-1}\, \Gamma(x)\, \Gamma\big(x + \tfrac{1}{2}\big)$.

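Proof. Assume that $x > 0$. By taking $y = x$ in Theorem 7.55 and observing that the function $t(1 - t)$ is symmetric about $t = \frac{1}{2}$, we see that
$$\frac{\Gamma(x)^2}{\Gamma(2x)} = \int_0^1 [t(1 - t)]^{x-1}\,dt = 2\int_0^{1/2} [t(1 - t)]^{x-1}\,dt.$$
By the substitution $s = 4t(1 - t)$, for which $1 - s = (1 - 2t)^2$ and $ds = 4\sqrt{1 - s}\,dt$ when $0 \le t \le \frac{1}{2}$, and another application of Theorem 7.55, we obtain
$$\frac{\Gamma(x)^2}{\Gamma(2x)} = 2^{1-2x}\int_0^1 s^{x-1}(1 - s)^{-1/2}\,ds = 2^{1-2x}\, \frac{\Gamma(x)\,\Gamma(\frac{1}{2})}{\Gamma(x + \frac{1}{2})}.$$
Since $\Gamma(\frac{1}{2}) = \pi^{1/2}$, the result follows. The extension to negative values of $x$ is left to the reader (Exercise 6). □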
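The duplication formula can be spot-checked with `math.gamma`, including a couple of negative non-integer arguments covered by the extension mentioned at the end of the proof. The sample points below are arbitrary; `duplication_rhs` is our own helper name.

```python
import math


def duplication_rhs(x):
    """pi^(-1/2) 2^(2x-1) Gamma(x) Gamma(x + 1/2), the right side of Theorem 7.56."""
    return math.pi ** -0.5 * 2.0 ** (2 * x - 1) * math.gamma(x) * math.gamma(x + 0.5)


for x in (0.3, 1.0, 2.7, 7.25, -0.3, -1.6):
    print(x, math.gamma(2 * x), duplication_rhs(x))
```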

7.57 Theorem. For $a > 0$, $\displaystyle\lim_{x \to \infty} \frac{x^a\,\Gamma(x)}{\Gamma(x + a)} = 1$.

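Proof. By Theorem 7.55, the substitution $t = e^{-u}$, and formula (7.51),
$$\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} = \int_0^1 t^{x-1}(1 - t)^{a-1}\,dt = \int_0^\infty e^{-xu}(1 - e^{-u})^{a-1}\,du.$$
When $x$ is large, $e^{-xu}$ is very small unless $u$ is close to 0, and in that case $1 - e^{-u}$ is approximately $u$. Hence, the integral on the right should be approximately equal to $\int_0^\infty u^{a-1} e^{-xu}\,du = x^{-a}\Gamma(a)$, which is what we are trying to show. More precisely, we have
$$\begin{aligned}
\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} &= \int_0^\infty u^{a-1} e^{-xu}\,du + \int_0^\infty \big[(1 - e^{-u})^{a-1} - u^{a-1}\big] e^{-xu}\,du \\
&= x^{-a}\Gamma(a) + \int_0^\infty \big[(1 - e^{-u})^{a-1} - u^{a-1}\big] e^{-xu}\,du.
\end{aligned}$$
Multiplying both sides by $x^a/\Gamma(a)$, we obtain
$$\frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1 = \frac{x^a}{\Gamma(a)} \int_0^\infty \left[\left(\frac{1 - e^{-u}}{u}\right)^{a-1} - 1\right] u^{a-1} e^{-xu}\,du. \tag{7.58}$$
It remains to show that the quantity on the right tends to zero as $x \to \infty$. The function defined by $f(u) = (1 - e^{-u})/u$ for $u \ne 0$ and $f(0) = 1$ is everywhere positive and of class $C^\infty$ (even at $u = 0$, for it is the sum of the power series $\sum_1^\infty (-1)^{n-1} u^{n-1}/n!$). Hence the same is true of $f(u)^{a-1}$, so the function $g(u) = f(u)^{a-1} - 1$ is smooth and vanishes at $u = 0$. By the mean value theorem, then, for $0 \le u \le 1$ we have $|g(u)| \le Cu$, where $C$ is the maximum of $|g'|$ on $[0,1]$. For $u > 1$ we clearly have $0 < f(u) < 1$ and hence $-1 < g(u) < 0$. Therefore, the quantity on the right of (7.58) is bounded in absolute value by
$$\frac{x^a}{\Gamma(a)}\left[\int_0^1 Cu^a e^{-xu}\,du + \int_1^\infty u^{a-1} e^{-xu}\,du\right] \le \frac{(C + 1)\,x^a}{\Gamma(a)} \int_0^\infty u^a e^{-xu}\,du = \frac{(C + 1)\,x^a}{\Gamma(a)} \cdot \frac{\Gamma(a + 1)}{x^{a+1}} = \frac{(C + 1)\,a}{x},$$
where we have used $u^{a-1} \le u^a$ for $u \ge 1$, and (7.51) again in the last step. In short, the right side of (7.58) is dominated by $x^{-1}$ as $x \to \infty$, so we are done. □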
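The convergence in Theorem 7.57 can be watched numerically. This sketch computes $x^a\Gamma(x)/\Gamma(x+a)$ in log space with `math.lgamma` (the log of $|\Gamma|$, which is exact enough for positive arguments) so that nothing overflows; the sample values of $a$ and $x$ are arbitrary. For $a = 2$ the ratio is exactly $x^2/(x(x+1)) = x/(x+1)$, which gives an exact check on the $x^{-1}$ rate.

```python
import math


def ratio(x, a):
    """x^a Gamma(x) / Gamma(x + a), computed with math.lgamma to avoid overflow."""
    return math.exp(a * math.log(x) + math.lgamma(x) - math.lgamma(x + a))


for a in (1 / 3, 0.5, 2.0, 5.0):
    print(a, [ratio(x, a) for x in (10.0, 100.0, 1e4, 1e6)])
```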

Theorem 7.57 can be used as an effective alternative to Raabe's test to decide the convergence of series involving quotients of factorial-like products, for such quotients can be expressed as quotients of gamma functions by (7.49).

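Example 1. Let us reconsider Example 7 from §6.2, namely,
$$\sum_{n=1}^\infty \frac{1 \cdot 4 \cdot 7 \cdots (3n + 1)}{n^2\, 3^n\, n!}.$$
Since
$$1 \cdot 4 \cdot 7 \cdots (3n + 1) = 3^n \Big[\frac{4}{3} \cdot \frac{7}{3} \cdots \Big(n + \frac{1}{3}\Big)\Big] = \frac{3^n\, \Gamma(n + \frac{4}{3})}{\Gamma(\frac{4}{3})}$$
and $n! = \Gamma(n + 1)$, the $n$th term of the series is
$$\frac{\Gamma(n + \frac{4}{3})}{\Gamma(\frac{4}{3})\, n^2\, \Gamma(n + 1)}.$$
By Theorem 7.57, $\Gamma(n + \frac{4}{3})/\Gamma(n + 1)$ is approximately $n^{1/3}$ when $n$ is large, so the series converges by comparison to $\sum n^{-5/3}$.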
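The rewriting of the $n$th term can be confirmed numerically, together with the prediction of Theorem 7.57 that the term behaves like $n^{-5/3}/\Gamma(\frac{4}{3})$. In this sketch `term_direct` uses exact integer arithmetic (feasible for moderate $n$) and `term_gamma` uses `math.lgamma`; both names are ours.

```python
import math


def term_direct(n):
    """1*4*7*...*(3n+1) / (n^2 3^n n!), with exact integer numerator and denominator."""
    numerator = 1
    for j in range(n + 1):
        numerator *= 3 * j + 1
    return numerator / (n ** 2 * 3 ** n * math.factorial(n))


def term_gamma(n):
    """The same term written as Gamma(n + 4/3) / (Gamma(4/3) n^2 Gamma(n + 1))."""
    log_ratio = math.lgamma(n + 4 / 3) - math.lgamma(4 / 3) - math.lgamma(n + 1)
    return math.exp(log_ratio) / n ** 2


for n in (10, 100, 1000):
    # Theorem 7.57 predicts term ~ n^(-5/3) / Gamma(4/3), so this ratio should approach 1
    print(n, term_gamma(n) * n ** (5 / 3) * math.gamma(4 / 3))
```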

EXERCISES

1. Prove the duplication formula for the case where x is a positive integer simply
by using (7.48).

2. Show that for a, b > 0,

3. Evaluate the following integrals:

a. J0°°2;4e~x <ix.
b. f^e~
c. Jo x e ax.

4. Prove the following identities directly from the definition (7.53) (without using Theorem 7.55):
a. $B(x,y) = B(y,x)$.
b. $B(x,1) = x^{-1}$.
c. $B(x + 1, y) + B(x, y + 1) = B(x,y)$.
d. $B(x,y) = \int_0^\infty (1 + t)^{-x-y}\, t^{y-1}\,dt$.
5. Given $a, b, c > 0$, evaluate $\int_0^1 x^a(1 - x^b)^c\,dx$ in terms of the gamma function.
6. Use the functional equation (7.47) to show that if the duplication formula (7.56) is valid for a particular value of $x$, then it is also true for $x - 1$. Thence show how to deduce its validity for all $x$ from its validity for $x > 0$. (In case $x$ is a nonpositive integer or half-integer, the formula is valid in the sense that both sides are infinite.)

7. Use (7.54), Theorem 7.55, and (7.48) to evaluate $\int_0^{\pi/2} \sin^k x\,dx$. (The form of the answer is different depending on whether $k$ is even or odd.)

8. Prove Wallis's formula:
$$\frac{\pi}{2} = \lim_{n \to \infty} \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots (2n)(2n)}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots (2n - 1)(2n + 1)}.$$
(Hint: Denote the fraction on the right by $c_n$. Use Exercise 7 and the fact that $\sin^{2n+1} x < \sin^{2n} x < \sin^{2n-1} x$ for $0 < x < \frac{1}{2}\pi$ to show that $c_n < \frac{1}{2}\pi < (2n + 1)c_n/2n$.)

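9. Suppose $f$ is a continuous function on $[0, \infty)$. For $\alpha > 0$, define the function $I_\alpha[f]$ on $[0, \infty)$ by
$$I_\alpha[f](x) = \frac{1}{\Gamma(\alpha)} \int_0^x (x - t)^{\alpha-1} f(t)\,dt.$$
$I_\alpha[f]$ is called the $\alpha$th-order fractional integral of $f$.
a. Show that the derivative of $I_{\alpha+1}[f]$ is $I_\alpha[f]$ for $\alpha > 0$, and the derivative of $I_1[f]$ is $f$. (This generalizes Exercise 6 in §4.5.)
b. Show that $I_\alpha[I_\beta[f]] = I_{\alpha+\beta}[f]$ for all $\alpha, \beta > 0$.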
10. Test the following series for convergence in the manner of Example 1.
$$\sum \frac{1 \cdot 4 \cdots (3n + 1)}{5 \cdot 9 \cdots (4n + 5)}$$
11. Show that $\displaystyle\sum \left[\frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)}\right]^p$ converges if and only if $p > 2$. (Try both Theorem 7.57 and Raabe's test; you'll find that the latter doesn't work in the borderline case $p = 2$.)

12. Suppose $a, b, c > 0$. Show that $\displaystyle\sum \frac{\Gamma(a + n)\,\Gamma(b + n)}{\Gamma(c + n)\, n!}$ converges if and only if $a + b < c$.

7.7 Stirling's Formula

Stirling's formula is a simple and useful approximation to $\Gamma(x)$ for large $x$; in particular, it tells precisely how rapidly $\Gamma(x)$ grows as $x \to \infty$.
We begin by analyzing the case where $x$ is an integer $n + 1$, for which $\Gamma(x) = n!$. First, observe that
$$\log(n!) = \log 1 + \log 2 + \cdots + \log n.$$

The sum on the right suggests a Riemann sum for $\int \log x\,dx$. Indeed, it is the midpoint Riemann sum for $\int_{1/2}^{n+(1/2)} \log x\,dx$ corresponding to a division into $n$ equal subintervals, so the latter integral provides an approximation to $\log(n!)$. In more detail, using this Riemann sum means taking $\log k$ as an approximation to
$$\int_{k-(1/2)}^{k+(1/2)} \log x\,dx = \int_{-1/2}^{1/2} \log(k + x)\,dx.$$

To see how good this approximation is, we approximate $\log(k + x)$ by its tangent line at $x = 0$ and use Taylor's theorem to estimate the error:
$$\log(k + x) = \log k + \frac{x}{k} + E_k(x), \qquad |E_k(x)| \le \sup_{|t| \le |x|} \frac{1}{(k + t)^2} \cdot \frac{x^2}{2!}.$$
(Here $(k + t)^{-2}$ is the absolute value of the second derivative of $\log(k + t)$.) Clearly, for $|x| \le \frac{1}{2}$ and $k \ge 1$ we have
$$|E_k(x)| \le \frac{(1/2)^2}{2(k - \frac{1}{2})^2} \le \frac{1}{8(\frac{1}{2}k)^2} = \frac{1}{2k^2}.$$

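Hence,
$$\int_{-1/2}^{1/2} \log(k + x)\,dx = \int_{-1/2}^{1/2} \big[\log k + k^{-1}x + E_k(x)\big]\,dx = \log k + c_k,$$
where
$$|c_k| = \left| \int_{-1/2}^{1/2} E_k(x)\,dx \right| \le \frac{1}{2k^2}. \tag{7.59}$$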
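Because $\log x$ has the elementary antiderivative $x\log x - x$, the midpoint-rule errors $c_k$ can be computed exactly (up to rounding) and compared with the bound (7.59). The sketch below also checks the identity $\log(n!) - (n + \frac12)\log n + n = (n + \frac12)\log(1 + (2n)^{-1}) - \frac12\log\frac12 - \sum_{k=1}^n c_k$, which follows by adding these equalities and integrating $\log x$ over $[\frac12, n + \frac12]$. The helper names are ours; `math.lgamma(n + 1)` supplies $\log(n!)$.

```python
import math


def log_antiderivative(x):
    """An antiderivative of log x."""
    return x * math.log(x) - x


def c(k):
    """c_k = integral of log x over [k - 1/2, k + 1/2] minus log k (the midpoint-rule error)."""
    return log_antiderivative(k + 0.5) - log_antiderivative(k - 0.5) - math.log(k)


def stirling_lhs(n):
    """log(n!) - (n + 1/2) log n + n."""
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n


def stirling_rhs(n):
    """(n + 1/2) log(1 + 1/(2n)) - (1/2) log(1/2) - (c_1 + ... + c_n)."""
    return (n + 0.5) * math.log1p(1 / (2 * n)) - 0.5 * math.log(0.5) - sum(c(k) for k in range(1, n + 1))


for k in (1, 2, 10, 100):
    print(k, c(k), 1 / (2 * k * k))
```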

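Adding these equalities up from $k = 1$ to $k = n$, we obtain
$$\int_{1/2}^{n+(1/2)} \log x\,dx = \log(n!) + \sum_{k=1}^n c_k.$$
On the other hand,
$$\begin{aligned}
\int_{1/2}^{n+(1/2)} \log x\,dx &= \big[x\log x - x\big]_{1/2}^{n+(1/2)} = \big(n + \tfrac{1}{2}\big)\log\big(n + \tfrac{1}{2}\big) - n - \tfrac{1}{2}\log\tfrac{1}{2} \\
&= \big(n + \tfrac{1}{2}\big)\log n - n + \big(n + \tfrac{1}{2}\big)\log\Big(1 + \frac{1}{2n}\Big) - \tfrac{1}{2}\log\tfrac{1}{2}.
\end{aligned}$$
Therefore,
$$\log(n!) - \big(n + \tfrac{1}{2}\big)\log n + n = \big(n + \tfrac{1}{2}\big)\log\big(1 + (2n)^{-1}\big) - \tfrac{1}{2}\log\tfrac{1}{2} - \sum_{k=1}^n c_k.$$
Since $\log(1 + x) \approx x$ for $x$ near 0, as $n \to \infty$ the quantity on the right approaches the constant $\frac{1}{2} - \frac{1}{2}\log\frac{1}{2} - \sum_1^\infty c_k$, where the series converges by the estimate (7.59). Exponentiating both sides, we obtain a preliminary version of Stirling's formula:

7.60 Lemma. As $n \to \infty$, $\dfrac{n!}{n^{n+(1/2)}e^{-n}}$ approaches a limit $L \in (0, \infty)$.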

Since $\Gamma(n) = n!/n$, Lemma 7.60 says that $\Gamma(n)/(n^{n-(1/2)}e^{-n}) \to L$ as $n \to \infty$. We now extend this result from integers $n$ to real numbers $x$. To do so we need a slight strengthening of Theorem 7.57, namely, the uniformity of the convergence with respect to the parameter $a$.

7.61 Lemma. For any $A > 0$, $\displaystyle\sup_{0 \le a \le A} \left| \frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1 \right| \to 0$ as $x \to \infty$.

Proof. With $g(u) = f(u)^{a-1} - 1$ as in the proof of Theorem 7.57, the function $|g'(u)| = |(a - 1)f(u)^{a-2}f'(u)|$ is jointly continuous in $a$ and $u$ in the compact region $a \in [0, A]$, $u \in [0, 1]$, so its maximum on this region is finite. The constant $C$ in that proof can be taken to be this maximum when $a \in [0, A]$, and the conclusion of the proof shows that
$$\sup_{0 \le a \le A} \left| \frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1 \right| \le \frac{(C + 1)A}{x},$$
which yields the desired result. □


7.62 Lemma. $\displaystyle\lim_{x \to \infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = L$, where $L$ is as in Lemma 7.60.

Proof. Any number $x > 1$ can be written as $x = n + a$ where $n$ is a positive integer and $0 \le a < 1$, so that
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \frac{\Gamma(n + a)}{(n + a)^{n+a-(1/2)}e^{-n-a}} = \frac{\Gamma(n)}{n^{n-(1/2)}e^{-n}} \cdot \frac{\Gamma(n + a)}{n^a\,\Gamma(n)} \cdot \frac{e^a\, n^{n+a-(1/2)}}{(n + a)^{n+a-(1/2)}}.$$
By Lemma 7.60, the first factor in this last expression will be as close to $L$ as we please when $n$ is sufficiently large. By Lemma 7.61, the second factor will be as close to 1 as we please when $n$ is sufficiently large and $0 \le a < 1$. The same is also true of the third factor; indeed, by taking logarithms it is enough to verify that
$$a - \big(n + a - \tfrac{1}{2}\big)\log\Big(1 + \frac{a}{n}\Big)$$
will be as close to 0 as we please when $n$ is sufficiently large and $0 \le a < 1$, and this is easily accomplished by using the Taylor expansion of $\log(1 + t)$ about $t = 0$. (Details are left to the reader as Exercise 1.) Combining these results, we see that $\Gamma(x)/x^{x-(1/2)}e^{-x}$ becomes as close to $L$ as we please when $x$ is sufficiently large, as claimed. □

7.63 Theorem (Stirling's Formula). $\displaystyle\lim_{x \to \infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \sqrt{2\pi}$.

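Proof. It remains only to identify the constant $L$ in Lemma 7.62. According to that lemma, the quantities
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}}, \qquad \frac{\Gamma(x + \frac{1}{2})}{(x + \frac{1}{2})^x e^{-x-(1/2)}}, \qquad \frac{\Gamma(2x)}{(2x)^{2x-(1/2)}e^{-2x}}$$
all approach $L$ as $x \to \infty$. Dividing the product of the first two by the third and using the duplication formula
$$\frac{\Gamma(x)\,\Gamma(x + \frac{1}{2})}{\Gamma(2x)} = \pi^{1/2}\, 2^{1-2x},$$
we see that
$$L = \lim_{x \to \infty} \frac{\Gamma(x)\,\Gamma(x + \frac{1}{2})}{\Gamma(2x)} \cdot \frac{(2x)^{2x-(1/2)}e^{-2x}}{x^{x-(1/2)}e^{-x}\,(x + \frac{1}{2})^x e^{-x-(1/2)}} = \lim_{x \to \infty} \sqrt{2\pi}\, e^{1/2}\Big(1 + \frac{1}{2x}\Big)^{-x}.$$
The last factor on the right tends to $e^{-1/2}$ as $x \to \infty$, so we are done. □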

Stirling's formula is often written as
$$\Gamma(x) \sim \sqrt{2\pi}\, x^{x-(1/2)} e^{-x},$$
where $\sim$ means that the ratio of the quantities on the left and right approaches 1 as $x \to \infty$. (The difference of these two quantities, however, tends to $\infty$ along with $x$.)
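The limit in Theorem 7.63 can be observed numerically. This is a sketch using `math.lgamma` so that nothing overflows; `stirling_ratio` and `factorial_ratio` are our own names. The ratio exceeds $\sqrt{2\pi} \approx 2.5066$ and approaches it from above, and the relative excess is roughly $1/(12x)$ (the first correction term of the classical Stirling series, a standard fact not derived in the text).

```python
import math


def stirling_ratio(x):
    """Gamma(x) / (x^(x - 1/2) e^(-x)), computed in log space."""
    return math.exp(math.lgamma(x) - (x - 0.5) * math.log(x) + x)


def factorial_ratio(n):
    """n! / (sqrt(2 pi n) n^n e^(-n)); by Stirling's formula this tends to 1."""
    log_value = math.lgamma(n + 1) - 0.5 * math.log(2 * math.pi * n) - n * math.log(n) + n
    return math.exp(log_value)


for x in (1.0, 10.0, 100.0, 1e6):
    print(x, stirling_ratio(x), math.sqrt(2 * math.pi))
```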

EXERCISES

1. Complete the proof of Lemma 7.62 by showing that for some constant $C > 0$ we have $\sup_{0 \le a \le 1} \big| a - (n + a - \frac{1}{2})\log[1 + (a/n)] \big| \le C/n$.
2. If a fair coin is tossed $2n$ times, the probability that it will come up heads exactly $n$ times is $(2n)!/(n!)^2 2^{2n}$. (The total number of possible outcomes is $2^{2n}$, and the number of those with exactly $n$ heads is the binomial coefficient $\binom{2n}{n} = (2n)!/(n!)^2$.) Use Stirling's formula to estimate this probability when $n$ is large.

3. Stirling's formula for factorials,
$$\lim_{n \to \infty} \frac{n!}{n^{n+(1/2)}e^{-n}} = \sqrt{2\pi},$$
can be proved more simply than the general case. One begins, as we did, by proving Lemma 7.60, but it is then enough to evaluate the constant $L$ there. To do this, show that the fraction on the right of Wallis's formula (Exercise 8 in §7.6) equals $[2^n n!]^4/[(2n)!]^2(2n + 1)$, then use Lemma 7.60 to show that it approaches $\frac{1}{4}L^2$ as $n \to \infty$; conclude that $L = \sqrt{2\pi}$.
Chapter 8

FOURIER SERIES

Fourier series are infinite series that use the trigonometric functions $\cos n\theta$ and $\sin n\theta$, or, equivalently, $e^{in\theta}$ and $e^{-in\theta}$, as the basic building blocks, in the same way that power series use the monomials $x^n$. They are a basic tool for analyzing periodic functions, and they therefore have applications in the study of physical phenomena that are periodic in time (such as circular or oscillatory motion) or in space (such as crystal lattices). They can also be used to analyze functions defined on finite intervals in ways that are useful in solving differential equations, and this leads to many other applications in physics and engineering. The theory of Fourier series and its ramifications is an extensive subject that lies at the heart of much of modern mathematical analysis. Here we present only the basics; for further information we refer the reader to Folland [6], Kammler [10], and Körner [11].

8.1 Periodic Functions and Fourier Series

A function $f$ on $\mathbb{R}$ is called periodic with period $P$, or $P$-periodic for short, if $f(x + P) = f(x)$ for all $x$. In this case, $f$ is completely determined by its values on any interval $[a, a + P)$ of length $P$, including one but not both of the endpoints; conversely, any function $f$ defined on an interval $[a, a + P)$ can be extended in a unique way to be a periodic function on $\mathbb{R}$, by declaring that $f(x + nP) = f(x)$ for all $x \in [a, a + P)$ and all integers $n$. This correspondence between functions on intervals and periodic functions on $\mathbb{R}$ will be useful later; for the time being, we focus our attention on periodic functions.
Unlike power series, Fourier series can be used to represent functions that are quite irregular. To keep the discussion reasonably simple, however, we shall make a standing assumption that all functions under consideration are piecewise continuous. By this we mean, precisely, the following: A function $f$ defined on an interval $[a, b]$ is piecewise continuous on $[a, b]$ if it is continuous except at finitely many points in $[a, b]$, and at each such point the one-sided limits
$$f(x+) = \lim_{\epsilon \to 0+} f(x + \epsilon), \qquad f(x-) = \lim_{\epsilon \to 0+} f(x - \epsilon) \tag{8.1}$$
exist (and are finite). Moreover, we shall say that a $P$-periodic function $f$ on $\mathbb{R}$ is piecewise continuous if it is piecewise continuous on each interval of length $P$. (If it is piecewise continuous on one such interval, of course, it is piecewise continuous on all of them.)
Note. It is sometimes convenient to allow a piecewise continuous function to
be undefined at the points where it has jumps. This does not affect anything that
follows in a significant way.
A piecewise continuous function is integrable over every bounded interval in its domain. In this connection, the following elementary fact is worth pointing out explicitly: If $f$ is $P$-periodic and piecewise continuous, the integrals of $f$ over all intervals of length $P$ are equal:
$$\int_a^{a+P} f(x)\,dx = \int_0^P f(x)\,dx \quad \text{for every } a \in \mathbb{R}. \tag{8.2}$$
The proof is left to the reader (Exercise 9).
By making the change of variable $\theta = 2\pi x/P$, we can convert any $P$-periodic function into a $2\pi$-periodic function. Namely, if $f(x + P) = f(x)$ and we set $g(\theta) = f(x) = f(P\theta/2\pi)$, then $g(\theta + 2\pi) = g(\theta)$. We may therefore restrict attention to the case where the period is $2\pi$, and we shall generally denote the independent variable by $\theta$. There is no presumption that $\theta$ denotes an angle, however; it is just a convenient name for a real variable.
The basic idea of Fourier analysis is that an arbitrary piecewise continuous $2\pi$-periodic function $f(\theta)$ can be expanded as an infinite linear combination of the functions $e^{in\theta}$ ($n = 0, \pm 1, \pm 2, \ldots$), or equivalently of the functions $\cos n\theta$ and $\sin n\theta$ ($n = 0, 1, 2, \ldots$). In terms of the functions $e^{in\theta}$, this expansion takes the form
$$f(\theta) = \sum_{-\infty}^{\infty} c_n e^{in\theta}. \tag{8.3}$$
Here $f$ may be either real-valued or complex-valued; the $c_n$'s are complex numbers, and the series on the right is always to be interpreted as the limit of the symmetric partial sums in which the $n$th and $(-n)$th terms are added in together:
$$\sum_{-\infty}^{\infty} c_n e^{in\theta} = \lim_{k \to \infty} \sum_{-k}^{k} c_n e^{in\theta}.$$

Since $e^{\pm in\theta} = \cos n\theta \pm i\sin n\theta$, combining the $n$th and $(-n)$th terms gives
$$c_n e^{in\theta} + c_{-n} e^{-in\theta} = a_n \cos n\theta + b_n \sin n\theta,$$
where
$$a_n = c_n + c_{-n}, \qquad b_n = i(c_n - c_{-n}).$$
Therefore, (8.3) can be rewritten as
$$f(\theta) = \frac{1}{2}a_0 + \sum_1^\infty (a_n \cos n\theta + b_n \sin n\theta). \tag{8.4}$$
The grouping of the $n$th and $(-n)$th terms in (8.3) corresponds to the grouping of the $\cos n\theta$ and $\sin n\theta$ terms in (8.4). (The factor of $\frac{1}{2}$ in front of $a_0$ is an artifact of the definition $a_0 = c_0 + c_{-0} = 2c_0$.)
The series (8.3) and (8.4) can be used interchangeably. The more traditional form is (8.4), but each of them has its advantages. The advantages of (8.4) derive from the fact that $\cos n\theta$ and $\sin n\theta$ are real-valued and are respectively even and odd; the advantages of (8.3) derive from the fact that exponentials tend to be easier to manipulate than trig functions. For developing the basic theory, the latter consideration is compelling, so we shall work mostly with (8.3).
The questions that face us are as follows: Given a $2\pi$-periodic function $f$, can it be expanded in a series of the form (8.3)? If so, how do we find the coefficients $c_n$ in this series? It turns out to be easier to tackle the second question first. That is, we first assume that $f$ can be expressed in the form (8.3) and figure out what the coefficients $c_n$ must be; then we show that with this choice of $c_n$, the expansion (8.3) is actually valid under suitable hypotheses on $f$.
Suppose, then, that the series $\sum_{-\infty}^{\infty} c_n e^{in\theta}$ converges pointwise to the function $f(\theta)$, and suppose also that the convergence is sufficiently well behaved that term-by-term integration is permissible. The coefficients $c_n$ can then be evaluated by the following device. To compute $c_k$, we multiply both sides of (8.3) by $e^{-ik\theta}$ and integrate over $[-\pi, \pi]$:
$$\int_{-\pi}^{\pi} f(\theta) e^{-ik\theta}\,d\theta = \sum_{-\infty}^{\infty} c_n \int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta.$$

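Now,
$$\int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta = \begin{cases} [i(n - k)]^{-1} e^{i(n-k)\theta} \big|_{-\pi}^{\pi} = 0 & \text{if } n \ne k, \\ \int_{-\pi}^{\pi} 1\,d\theta = 2\pi & \text{if } n = k. \end{cases} \tag{8.5}$$
Thus all the terms on the right of the integrated series vanish except for the one with $n = k$, and we obtain
$$\int_{-\pi}^{\pi} f(\theta) e^{-ik\theta}\,d\theta = 2\pi c_k,$$
or, relabeling $k$ as $n$,
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta) e^{-in\theta}\,d\theta. \tag{8.6}$$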

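This is the promised formula for the coefficients $c_n$. The corresponding formulas for $a_n$ and $b_n$ in (8.4) follow immediately:
$$\begin{aligned}
a_n &= c_n + c_{-n} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta)\big[e^{-in\theta} + e^{in\theta}\big]\,d\theta = \frac{1}{\pi} \int_{-\pi}^{\pi} f(\theta) \cos n\theta\,d\theta, \\
b_n &= i(c_n - c_{-n}) = \frac{i}{2\pi} \int_{-\pi}^{\pi} f(\theta)\big[e^{-in\theta} - e^{in\theta}\big]\,d\theta = \frac{1}{\pi} \int_{-\pi}^{\pi} f(\theta) \sin n\theta\,d\theta.
\end{aligned} \tag{8.7}$$
Of course, according to (8.2), the integrals over $[-\pi, \pi]$ in (8.6) and (8.7) can be replaced by integrals over any interval of length $2\pi$.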
It is useful to keep in mind that in either (8.3) or (8.4), the constant term in the series is
$$c_0 = \frac{1}{2}a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta)\,d\theta, \tag{8.8}$$
the mean value of $f$ on the interval $[-\pi, \pi]$ (or on any interval of length $2\pi$).
What have we accomplished? We have shown that if $f(\theta)$ is the sum of a series $\sum_{-\infty}^{\infty} c_n e^{in\theta}$, and if term-by-term integration is legitimate, then the coefficients $c_n$ must be given by (8.6), but as yet we know almost nothing about the class of functions that can be represented by such series. But now the formula (8.6) provides a starting point for studying this matter. Indeed, if $f$ is any integrable $2\pi$-periodic function, the quantities
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta) e^{-in\theta}\,d\theta, \qquad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(\theta) \cos n\theta\,d\theta, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(\theta) \sin n\theta\,d\theta$$
are well defined. We call them the Fourier coefficients of $f$, and we call the series
$$\sum_{-\infty}^{\infty} c_n e^{in\theta} = \frac{1}{2}a_0 + \sum_1^\infty (a_n \cos n\theta + b_n \sin n\theta)$$
the Fourier series of $f$.
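The coefficient formulas (8.6) and (8.7) can be approximated by a Riemann sum over one period. The sketch below is an illustration, not a method from the text: it samples $f$ at equally spaced points of $[-\pi, \pi)$, which by the discrete analogue of (8.5) recovers the coefficients of a trigonometric polynomial of low degree exactly up to rounding. The test function `trig_poly` $= 3 + \cos 3\theta + 2\sin\theta$ is a hypothetical example with $c_0 = 3$, $c_{\pm 3} = \frac{1}{2}$, $c_1 = -i$, $c_{-1} = i$.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=4096):
    """Approximate c_n = (1/2pi) * integral of f(theta) e^(-in theta) over [-pi, pi], as in (8.6).

    Equally spaced Riemann sum over one period; very accurate for smooth
    2pi-periodic f, exact up to rounding for low-degree trigonometric polynomials."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        theta = -math.pi + j * h
        total += f(theta) * cmath.exp(-1j * n * theta)
    return total / samples


def real_coefficients(f, n, samples=4096):
    """a_n = c_n + c_{-n} and b_n = i (c_n - c_{-n}), as in (8.7)."""
    cn = fourier_coefficient(f, n, samples)
    cm = fourier_coefficient(f, -n, samples)
    return cn + cm, 1j * (cn - cm)


def trig_poly(theta):
    # hypothetical test function: 3 + cos(3 theta) + 2 sin(theta)
    return 3 + math.cos(3 * theta) + 2 * math.sin(theta)


print(fourier_coefficient(trig_poly, 0))  # mean value, as in (8.8); about 3
print(real_coefficients(trig_poly, 1))    # a_1 about 0, b_1 about 2 (up to rounding)
print(real_coefficients(trig_poly, 3))    # a_3 about 1, b_3 about 0 (up to rounding)
```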


The study of general Fourier series will be undertaken in the following sections.
We conclude this one by working out two simple examples.

Example 1. Let $f(\theta)$ be the $2\pi$-periodic function determined by the formula
$$f(\theta) = \theta \qquad (-\pi < \theta < \pi).$$
That is, $f$ is the sawtooth wave depicted in the top graph of Figure 8.1. The calculation of the Fourier coefficients $c_n$ is an easy integration by parts for $n \ne 0$:
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \theta e^{-in\theta}\,d\theta = \frac{1}{2\pi} \left[ \frac{\theta e^{-in\theta}}{-in} + \frac{e^{-in\theta}}{n^2} \right]_{-\pi}^{\pi} = \frac{(-1)^{n+1}}{in},$$
since $e^{\pm in\pi} = (-1)^n$. Moreover, $c_0 = 0$ since the mean value of $f$ is clearly zero. Thus the Fourier series of $f$ is
$$\sum_{n \ne 0} \frac{(-1)^{n+1}}{in}\, e^{in\theta}.$$
Grouping together the $n$th and $(-n)$th terms yields the equivalent form
$$2\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} \sin n\theta. \tag{8.9}$$
(We could also have obtained this series directly by using (8.7); we have $a_n = 0$ for all $n$ since $f$ is odd, and a calculation similar to the one above shows that $b_n = 2(-1)^{n+1}/n$.)
The series (8.9) converges for all $\theta$ by Dirichlet's test. (See Corollary 6.27. The factor of $(-1)^{n+1}$ does not affect the result, since $(-1)^n \sin n\theta = \sin n(\theta + \pi)$.) The sketches of some of the partial sums in Figure 8.1 lend plausibility to the conjecture that (8.9) does indeed converge to the function $f(\theta)$, at least at the points where $f$ is continuous. (At the points $\theta = (2k + 1)\pi$ where $f$ is discontinuous, every term in (8.9) vanishes.)
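The conjectured convergence of (8.9) can be probed numerically. In this sketch (our own helpers; Simpson's rule is used for $b_n$ because $\theta\sin n\theta$ is smooth on $[-\pi, \pi]$), the coefficients are compared with $2(-1)^{n+1}/n$, and partial sums are evaluated at points of continuity and at the jump $\theta = \pi$. The tolerance $10^{-3}$ for $k = 10^4$ terms is a conservative choice based on a summation-by-parts bound of order $1/(k|\cos(\theta/2)|)$ for the tail.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n subintervals (rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sawtooth_partial_sum(theta, k):
    """S_k(theta) = 2 * sum_{n=1}^{k} (-1)^(n+1) sin(n theta) / n, the partial sums of (8.9)."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * theta) / n for n in range(1, k + 1))


def b_numeric(n, m=20_000):
    """b_n = (1/pi) * integral of theta sin(n theta) over [-pi, pi], by Simpson's rule."""
    return simpson(lambda th: th * math.sin(n * th), -math.pi, math.pi, m) / math.pi


for n in range(1, 5):
    print(n, b_numeric(n), 2 * (-1) ** (n + 1) / n)
for k in (10, 100, 1000, 10_000):
    print(k, sawtooth_partial_sum(1.0, k))  # should approach f(1) = 1
```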

EXAMPLE 2. Let $g(\theta)$ be the $2\pi$-periodic function determined by the formula

$$g(\theta) = |\theta| \qquad (-\pi < \theta < \pi).$$

That is, $g$ is the triangle wave depicted in the top graph of Figure 8.2. Here it is a bit easier to calculate the Fourier coefficients in terms of sines and cosines. Since $g$ is an even function, we have $b_n = 0$ for all $n$ and

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(\theta)\cos n\theta\,d\theta = \frac{2}{\pi}\int_{0}^{\pi}\theta\cos n\theta\,d\theta.$$
Figure 8.1: Top to bottom: The sawtooth wave of Example 1 and the partial sums $S_4$, $S_{10}$, and $S_{16}$ of its Fourier series ($S_N(\theta) = 2\sum_{1}^{N}(-1)^{n+1}n^{-1}\sin n\theta$).
Figure 8.2: Top to bottom: The triangle wave of Example 2 and the partial sums $S_1$, $S_3$, and $S_5$ of its Fourier series ($S_{2k-1}(\theta) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{k}(2m-1)^{-2}\cos(2m-1)\theta$).

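For $n = 0$ we have $a_0 = (2/\pi)\int_0^{\pi}\theta\,d\theta = \pi$, and for $n > 0$ an integration by parts gives

$$a_n = \frac{2}{\pi}\left[\frac{\theta\sin n\theta}{n} + \frac{\cos n\theta}{n^2}\right]_0^{\pi} = \frac{2}{\pi}\cdot\frac{(-1)^n - 1}{n^2}.$$

In other words, $a_n = 0$ when $n$ is even and $a_n = -4/\pi n^2$ when $n$ is odd, so we obtain the Fourier series

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1,3,5,\ldots}\frac{\cos n\theta}{n^2} = \frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2}. \tag{8.10}$$

Since $\sum_1^{\infty}n^{-2} < \infty$, this series converges absolutely and uniformly by the Weierstrass M-test. Again, a glance at its first few partial sums in Figure 8.2 supports the conjecture that its full sum is $g(\theta)$.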
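The uniform convergence claimed for (8.10) can be observed numerically. The sketch below measures the largest deviation of the partial sums from $g(\theta) = |\theta|$ on a grid in $[-\pi, \pi]$. The helper names and the grid size are our own choices. Every term of the tail $\frac{4}{\pi}\sum_{m > M}\cos(2m-1)\theta/(2m-1)^2$ is largest at $\theta = 0$, so the printed deviation is the tail sum itself.

```python
import math


def triangle_partial_sum(theta, M):
    """Partial sum of (8.10) keeping the first M odd harmonics 1, 3, ..., 2M - 1."""
    s = math.pi / 2
    for m in range(1, M + 1):
        k = 2 * m - 1
        s -= (4 / math.pi) * math.cos(k * theta) / k**2
    return s


def max_error(M, grid=2001):
    """Largest |S - |theta|| over an evenly spaced grid on [-pi, pi] (the grid contains theta = 0)."""
    points = [-math.pi + 2 * math.pi * j / (grid - 1) for j in range(grid)]
    return max(abs(triangle_partial_sum(t, M) - abs(t)) for t in points)


for M in (2, 8, 32, 128):
    print(M, max_error(M))
```

The error shrinks roughly like $1/M$, which matches the $n^{-2}$ decay of the coefficients.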

EXERCISES

In Exercises 1-8, find the Fourier series of the $2\pi$-periodic function $f(\theta)$ that is given on the interval $(-\pi, \pi)$ by the indicated formula. (All but Exercise 5 are either even or odd, so their Fourier series are naturally expressed in terms of cosines or sines.) Sketches of these functions are given in Figure 8.3.

1. $f(\theta) = -1$ for $-\pi < \theta < 0$, $f(\theta) = 1$ for $0 < \theta < \pi$ (the square wave).

2. $f(\theta) = \sin^2\theta$. (You don't need calculus if you look at this the right way.)
3. $f(\theta) = |\sin\theta|$. (Hint: $\sin a\cos b = \frac12[\sin(a+b) + \sin(a-b)]$.)
4. $f(\theta) = \theta^2$.
5. $f(\theta) = e^{b\theta}$ $(b > 0)$.
6. $f(\theta) = \theta(\pi - |\theta|)$.

7. $f(\theta) = 1/a$ for $|\theta| < a$, $f(\theta) = 1/(a - \pi)$ for $a < |\theta| < \pi$, where $0 < a < \pi$. (The values of $f$ are chosen to make the areas of the rectangles between the graph of $f$ and the $x$-axis on the intervals $[0, a]$ and $[a, \pi]$ both equal to 1.)

8. $f(\theta) = a^{-2}(a - |\theta|)$ for $|\theta| < a$, $f(\theta) = 0$ for $a < |\theta| < \pi$, where $0 < a < \pi$. (The constants are chosen to make the areas of the triangles under the graph of $f$ equal to 1.)

9. Prove that (8.2) is valid for every piecewise continuous $P$-periodic function $f$. (This can be done either directly by changes of variable or by differentiating $\int_a^{a+P} f$ with respect to $a$ via Theorem 4.15a.)

8.2 Convergence of Fourier Series

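Given a piecewise continuous $2\pi$-periodic function $f$, we form its Fourier series:

$$\sum_{-\infty}^{\infty}c_ne^{in\theta}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-in\theta}\,d\theta. \tag{8.11}$$

Does this series converge? If so, what is its sum?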

FIGURE 8.3: The functions in Exercises 1-8 of §8.1.

These questions are rather delicate. In the first place, since $|e^{in\theta}| = 1$, a necessary condition for the convergence of the Fourier series is that $c_n \to 0$ as $n \to \pm\infty$. However, the only estimate on the $c_n$'s that is obvious from the definition is that they are bounded by a constant:

$$|c_n| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|\,d\theta.$$

However, it is actually true that $c_n \to 0$; in fact, we can say something more precise.

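8.12 Theorem (Bessel's Inequality). If $f$ is $2\pi$-periodic and piecewise continuous and $c_n$ is defined by (8.11), then

$$\sum_{-\infty}^{\infty}|c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta.$$

In particular, $\sum|c_n|^2 < \infty$, and hence $\lim_{n\to\pm\infty}c_n = 0$.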

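Proof. We examine the difference between $f$ and a partial sum of its Fourier series. Since the absolute value of a complex number $z$ is given by $|z|^2 = z\bar{z}$, we have

$$\Big|f(\theta) - \sum_{-N}^{N}c_ne^{in\theta}\Big|^2 = |f(\theta)|^2 - \sum_{-N}^{N}\Big[\overline{c_n}f(\theta)e^{-in\theta} + c_n\overline{f(\theta)}e^{in\theta}\Big] + \sum_{m,n=-N}^{N}c_m\overline{c_n}e^{i(m-n)\theta}.$$

Next, integration of both sides over $[-\pi, \pi]$, using the definition of $c_n$ and the relation (8.5), yields

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(\theta) - \sum_{-N}^{N}c_ne^{in\theta}\Big|^2 d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - 2\sum_{-N}^{N}|c_n|^2 + \sum_{-N}^{N}|c_n|^2.$$

The integral on the left is clearly nonnegative, so

$$0 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}|c_n|^2.$$

Letting $N \to \infty$, we obtain the desired result. □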
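Bessel's inequality can also be checked numerically. The sketch below uses the square wave of Exercise 1 in §8.1 as an assumed test function. It computes $\sum_{|n|\le N}|c_n|^2$ with midpoint-rule coefficients and compares it with $\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2\,d\theta$, which equals 1 here. For this function the left side in fact tends to the right side, but the theorem asserts only the inequality. The helper names are our own.

```python
import cmath
import math


def square_wave(t):
    """-1 on (-pi, 0), +1 on (0, pi); the midpoints used below never hit 0 or +-pi."""
    return -1.0 if t < 0 else 1.0


def midpoints(m):
    """Step size and midpoints of m equal subintervals of [-pi, pi]."""
    h = 2 * math.pi / m
    return h, [-math.pi + (k + 0.5) * h for k in range(m)]


def fourier_coeff(f, n, m=4000):
    """Midpoint-rule approximation to c_n = (1/2pi) * integral f(t) e^{-int} dt over [-pi, pi]."""
    h, ts = midpoints(m)
    return sum(f(t) * cmath.exp(-1j * n * t) for t in ts) * h / (2 * math.pi)


def bessel_sides(f, N, m=4000):
    """Return (sum_{|n| <= N} |c_n|^2, (1/2pi) * integral |f|^2) for Bessel's inequality."""
    lhs = sum(abs(fourier_coeff(f, n, m)) ** 2 for n in range(-N, N + 1))
    h, ts = midpoints(m)
    rhs = sum(abs(f(t)) ** 2 for t in ts) * h / (2 * math.pi)
    return lhs, rhs


for N in (1, 5, 25):
    lhs, rhs = bessel_sides(square_wave, N)
    print(N, round(lhs, 6), "<=", round(rhs, 6))
```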


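To proceed further in our study of the convergence of the Fourier series of a function $f$, we must take a closer look at the partial sums

$$S_N^f(\theta) = \sum_{-N}^{N}c_ne^{in\theta}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\psi)e^{-in\psi}\,d\psi. \tag{8.13}$$

Substitution of the formula for $c_n$ into the sum yields

$$S_N^f(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{-N}^{N}f(\psi)e^{in(\theta-\psi)}\,d\psi = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{-N}^{N}f(\psi)e^{in(\psi-\theta)}\,d\psi = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{-N}^{N}f(\varphi+\theta)e^{in\varphi}\,d\varphi.$$

(The second equality is obtained by replacing $n$ by $-n$, which leaves the sum from $-N$ to $N$ unchanged. The third comes from the change of variable $\varphi = \psi - \theta$ with the help of (8.2).) In other words,

$$S_N^f(\theta) = \int_{-\pi}^{\pi}f(\varphi+\theta)D_N(\varphi)\,d\varphi, \qquad\text{where } D_N(\varphi) = \frac{1}{2\pi}\sum_{-N}^{N}e^{in\varphi}. \tag{8.14}$$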

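$D_N$ is called the $N$th Dirichlet kernel. Its essential properties are summarized in the following lemma.

8.15 Lemma. Let $D_N(\varphi)$ be the function defined in (8.14). Then:

a. $\displaystyle\int_{-\pi}^{0}D_N(\varphi)\,d\varphi = \int_{0}^{\pi}D_N(\varphi)\,d\varphi = \frac12$.
b. $\displaystyle D_N(\varphi) = \frac{e^{i(N+1)\varphi} - e^{-iN\varphi}}{2\pi(e^{i\varphi} - 1)}$.

Proof. The validity of (a) is most easily seen by rewriting (8.14) as $D_N(\varphi) = (2\pi)^{-1} + \pi^{-1}\sum_1^N\cos n\varphi$ and integrating this sum term by term. Since $\sin 0 = \sin(\pm n\pi) = 0$, only the constant term gives a nonzero contribution. To prove (b), we use the formula (6.2) for the sum of a finite geometric progression:

$$\sum_{-N}^{N}e^{in\varphi} = e^{-iN\varphi}\sum_{0}^{2N}e^{in\varphi} = e^{-iN\varphi}\,\frac{e^{i(2N+1)\varphi} - 1}{e^{i\varphi} - 1} = \frac{e^{i(N+1)\varphi} - e^{-iN\varphi}}{e^{i\varphi} - 1}.$$

□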

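Incidentally, if we multiply and divide the formula in Lemma 8.15b for $D_N(\varphi)$ by $e^{-i\varphi/2}$, we obtain

$$D_N(\varphi) = \frac{1}{2\pi}\,\frac{e^{i(N+\frac12)\varphi} - e^{-i(N+\frac12)\varphi}}{e^{i\varphi/2} - e^{-i\varphi/2}} = \frac{\sin(N+\frac12)\varphi}{2\pi\sin\frac12\varphi}.$$

This shows that $D_N$ is real-valued and gives an easy way to visualize it: Its graph is the rapidly oscillating sine wave $y = \sin(N+\frac12)\varphi$, amplitude-modulated to fit inside the envelope $y = \pm(2\pi\sin\frac12\varphi)^{-1}$. (The reader may wish to generate graphs of $D_N$ for various values of $N$ on a computer.)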
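Here is a minimal, dependency-free version of that experiment. The sketch evaluates $D_N$ both from its definition in (8.14) and from the closed form $\sin(N+\frac12)\varphi/(2\pi\sin\frac12\varphi)$, and checks Lemma 8.15a with a midpoint sum. The helper names are our own. Plotting is omitted.

```python
import math


def dirichlet_sum(N, phi):
    """D_N(phi) = (1/2pi) * sum_{n=-N}^{N} e^{in phi}, written with cosines since the sum is real."""
    return (1 + 2 * sum(math.cos(n * phi) for n in range(1, N + 1))) / (2 * math.pi)


def dirichlet_closed(N, phi):
    """Closed form sin((N + 1/2) phi) / (2 pi sin(phi / 2)); phi must not be a multiple of 2 pi."""
    return math.sin((N + 0.5) * phi) / (2 * math.pi * math.sin(phi / 2))


def half_integral(N, m=1000):
    """Midpoint approximation to the integral of D_N over [0, pi]; Lemma 8.15a says it is 1/2."""
    h = math.pi / m
    return sum(dirichlet_sum(N, (k + 0.5) * h) for k in range(m)) * h


N = 10
for phi in (0.3, 1.7, -2.5):
    print(phi, dirichlet_sum(N, phi), dirichlet_closed(N, phi))
print("peak D_N(0) =", dirichlet_sum(N, 0.0), "vs (2N+1)/(2 pi) =", (2 * N + 1) / (2 * math.pi))
print("integral over [0, pi]:", half_integral(N))
```

The peak value $(2N+1)/2\pi$ grows with $N$ while the total integral stays fixed. This is why $D_N$ behaves quite differently from a positive "approximate identity".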
We are now ready to formulate and prove the basic convergence theorem for Fourier series. It turns out that piecewise continuity of a periodic function $f$ is not enough to yield a good result. Instead we shall assume, in effect, that not only $f$ but also its derivative $f'$ is piecewise continuous.

More precisely, we shall say that a periodic function $f$ is piecewise smooth if, on any bounded interval, $f$ is of class $C^1$ except at finitely many points, at which the one-sided limits $f(\theta+)$, $f(\theta-)$, $f'(\theta+)$, and $f'(\theta-)$ (as defined in (8.1)) exist and are finite. (Note that this definition of piecewise smoothness is more general than that given in §5.1, which required the function to be continuous.)

Pictorially, $f$ is piecewise smooth if its graph over any bounded interval is a smooth curve except at finitely many points where it has jumps (if $f$ is discontinuous) or corners (if $f$ is continuous but $f'$ is discontinuous). In addition, the one-sided tangent lines at the jumps and corners are not allowed to be vertical.

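8.16 Theorem. Suppose $f$ is $2\pi$-periodic and piecewise smooth. Then the partial sums $S_N^f(\theta)$ of the Fourier series of $f$, defined by (8.13), converge pointwise to $\frac12[f(\theta-) + f(\theta+)]$. In particular, they converge to $f(\theta)$ at each point $\theta$ where $f$ is continuous.

Proof. By Lemma 8.15a, we have

$$\tfrac12f(\theta-) = f(\theta-)\int_{-\pi}^{0}D_N(\varphi)\,d\varphi, \qquad \tfrac12f(\theta+) = f(\theta+)\int_{0}^{\pi}D_N(\varphi)\,d\varphi,$$

so by (8.14), the difference between $S_N^f(\theta)$ and its asserted limit is

$$\int_{-\pi}^{0}[f(\varphi+\theta) - f(\theta-)]D_N(\varphi)\,d\varphi + \int_{0}^{\pi}[f(\varphi+\theta) - f(\theta+)]D_N(\varphi)\,d\varphi.$$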

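Our object is to show that this quantity vanishes as $N \to \infty$. By Lemma 8.15b, we can rewrite it as

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}g(\varphi)\big(e^{i(N+1)\varphi} - e^{-iN\varphi}\big)\,d\varphi, \tag{8.17}$$

where

$$g(\varphi) = \begin{cases}\dfrac{f(\varphi+\theta) - f(\theta-)}{e^{i\varphi} - 1} & \text{if } -\pi < \varphi < 0,\\[2ex] \dfrac{f(\varphi+\theta) - f(\theta+)}{e^{i\varphi} - 1} & \text{if } 0 < \varphi < \pi.\end{cases}$$

(We could define $g(0)$ to be anything we please; altering the value at this one point does not affect (8.17), by Proposition 4.14.) On the interval $[-\pi, \pi]$, $g(\varphi)$ is continuous wherever $f(\varphi+\theta)$ is and has jump discontinuities wherever $f(\varphi+\theta)$ does. The only exception is an additional singularity at $\varphi = 0$ caused by the vanishing of $e^{i\varphi} - 1$ there. But this singularity is also at worst a jump discontinuity; that is, the limits $g(0+)$ and $g(0-)$ both exist. Indeed, by l'Hôpital's rule,

$$g(0+) = \lim_{\varphi\to 0+}\frac{f(\varphi+\theta) - f(\theta+)}{e^{i\varphi} - 1} = \lim_{\varphi\to 0+}\frac{f'(\varphi+\theta)}{ie^{i\varphi}} = i^{-1}f'(\theta+),$$

and likewise $g(0-) = i^{-1}f'(\theta-)$. In short, $g$ is piecewise continuous.

Now we are done. By Bessel's inequality, the Fourier coefficients of $g$,

$$C_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}g(\varphi)e^{-in\varphi}\,d\varphi,$$

tend to zero as $n \to \pm\infty$. But the quantity (8.17) is simply $C_{-N-1} - C_N$, so it vanishes as $N \to \infty$, as desired. □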

If $f$ is piecewise continuous, there may be some question as to how to define $f$ at its points of discontinuity; as we mentioned earlier, we may wish to allow $f$ to remain undefined at these points. But Theorem 8.16 shows that for the purposes of Fourier analysis, the natural choice is the average of the left- and right-hand limits: $f(\theta) = \frac12[f(\theta-) + f(\theta+)]$. We shall say that $f$ is standardized if it satisfies this condition at all $\theta$. Thus, every standardized piecewise smooth $2\pi$-periodic function is the sum of its Fourier series at every point.
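To see the averaging at a jump concretely, take as an assumed example the $2\pi$-periodic extension of $e^{\theta}$ from $(-\pi, \pi)$. A direct integration gives $c_n = (-1)^n\sinh\pi/(\pi(1 - in))$. The extension jumps from $e^{\pi}$ to $e^{-\pi}$ at $\theta = \pi$. By Theorem 8.16, the partial sums there should therefore approach the standardized value $\frac12(e^{\pi} + e^{-\pi}) = \cosh\pi$. The sketch below, with helper names of our own, checks this.

```python
import cmath
import math


def coeff_exp(n):
    """c_n for the 2pi-periodic extension of e^theta from (-pi, pi):
    (1/2pi) * [e^{theta(1 - in)} / (1 - in)] evaluated from -pi to pi."""
    return (-1) ** n * math.sinh(math.pi) / (math.pi * (1 - 1j * n))


def partial_sum(theta, N):
    """S_N(theta) = sum_{n=-N}^{N} c_n e^{in theta}, as in (8.13)."""
    return sum(coeff_exp(n) * cmath.exp(1j * n * theta) for n in range(-N, N + 1))


N = 2000
print(partial_sum(0.0, N).real)      # continuity point: approaches e^0 = 1
print(partial_sum(math.pi, N).real)  # jump: approaches (e^pi + e^-pi) / 2
print(math.cosh(math.pi))
```

Convergence at the jump is slow (the error is of order $1/N$), which reflects the $1/n$ decay of these coefficients.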

8.18 Corollary. If $f$ and $g$ are standardized piecewise smooth $2\pi$-periodic functions with the same Fourier coefficients, then $f = g$.

Proof. $f$ and $g$ are the sum of the same Fourier series. □


To illustrate Theorem 8.16, let us consider the two examples in §8.1.

EXAMPLE 1. The sawtooth wave $f(\theta)$ defined by $f(\theta) = \theta$ for $|\theta| < \pi$ is smooth except at the odd multiples of $\pi$, where its left- and right-hand limits are $\pi$ and $-\pi$, respectively. Thus the Fourier series of $f$ converges to $f$ everywhere except at the odd multiples of $\pi$, where it converges to 0. On the interval $(-\pi, \pi)$, the result is

$$\sum_{1}^{\infty}\frac{(-1)^{n+1}}{n}\sin n\theta = \frac{\theta}{2} \qquad\text{for } -\pi < \theta < \pi.$$

In particular, by taking $\theta = \frac12\pi$, we obtain the interesting formula

$$\sum_{1}^{\infty}\frac{(-1)^{m+1}}{2m-1} = 1 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4},$$

which we derived by other methods in Example 5 of §7.3.

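Example 2. The triangle wave $g(\theta)$ defined by $g(\theta) = |\theta|$ for $|\theta| \le \pi$ is piecewise smooth and everywhere continuous, so it is the sum of its Fourier series at every point. Thus,

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2} = |\theta| \qquad\text{for } -\pi \le \theta \le \pi.$$

By taking $\theta = 0$ (or $\theta = \pm\pi$), we obtain another interesting formula:

$$\sum_{1}^{\infty}\frac{1}{(2m-1)^2} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots = \frac{\pi^2}{8}.$$

From this it is also easy to obtain the sum

$$S = \sum_{1}^{\infty}\frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$$

by separating out the odd and even terms:

$$S = \Big(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\Big) + \Big(\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots\Big) = \frac{\pi^2}{8} + \frac14\Big(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\Big) = \frac{\pi^2}{8} + \frac{S}{4},$$

so that $3S/4 = \pi^2/8$, or $S = \pi^2/6$.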
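The three sums obtained in these examples are easy to confirm numerically. The following is an illustrative check, not a proof, and the function names are our own. Its last line verifies the odd/even splitting identity itself at the level of partial sums: $\sum_{n\le 2M}n^{-2} = \sum_{m\le M}(2m-1)^{-2} + \frac14\sum_{n\le M}n^{-2}$.

```python
import math


def odd_square_sum(M):
    """sum_{m=1}^{M} 1 / (2m - 1)^2, which tends to pi^2 / 8 (Example 2, theta = 0)."""
    return sum(1.0 / (2 * m - 1) ** 2 for m in range(1, M + 1))


def basel_sum(M):
    """sum_{n=1}^{M} 1 / n^2, which tends to S = pi^2 / 6."""
    return sum(1.0 / n**2 for n in range(1, M + 1))


def leibniz_sum(M):
    """1 - 1/3 + 1/5 - ... (M terms), which tends to pi / 4 (Example 1, theta = pi / 2)."""
    return sum((-1) ** (m + 1) / (2 * m - 1) for m in range(1, M + 1))


M = 100000
print(odd_square_sum(M), math.pi**2 / 8)
print(basel_sum(M), math.pi**2 / 6)
print(leibniz_sum(M), math.pi / 4)

# Splitting the first 2M terms of S into odd and even parts, as in the text:
print(basel_sum(2 * M) - (odd_square_sum(M) + basel_sum(M) / 4))  # zero up to rounding
```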

FIGURE 8.4: The function h of Example 3.

We conclude by remarking that one can often use simple changes of variable to generate new Fourier expansions from old ones without recalculating the coefficients from scratch.

Example 3. Consider the modified triangle wave $h$ whose graph is given in Figure 8.4. It is related to the triangle wave $g$ in Example 2 by $h(\theta) = g(\theta + \frac12\pi)$, and $\cos(2m-1)(\theta + \frac12\pi) = (-1)^m\sin(2m-1)\theta$, so

$$h(\theta) = \frac{\pi}{2} + \frac{4}{\pi}\sum_{1}^{\infty}\frac{(-1)^{m-1}\sin(2m-1)\theta}{(2m-1)^2}.$$

Abel Summability of Fourier Series. The Fourier coefficients of a periodic function $f$ are defined whenever $f$ is piecewise continuous, but we have proved the convergence of the Fourier series only when $f$ is piecewise smooth. In fact, it has been known since 1876 that there are continuous periodic functions whose Fourier series fail to converge at some points. (The examples are all quite complicated.) However, if $f$ is merely piecewise continuous, we can still recover $f$ from its Fourier series $\sum_{-\infty}^{\infty}c_ne^{in\theta}$ by the method of Abel summation that we discussed at the end of §7.3. Namely, for $0 < r < 1$ we consider the series

$$A_rf(\theta) = \sum_{-\infty}^{\infty}r^{|n|}c_ne^{in\theta} \tag{8.19}$$

and its limit as $r \to 1-$ (i.e., as $r$ approaches 1 from the left).

Since the coefficients $c_n$ are bounded, the series (8.19) converges absolutely by comparison to the geometric series $\sum r^{|n|}$. Moreover, substitution of the formula (8.6) for $c_n$ into (8.19) gives

$$A_rf(\theta) = \frac{1}{2\pi}\sum_{-\infty}^{\infty}\int_{-\pi}^{\pi}r^{|n|}e^{in(\theta-\psi)}f(\psi)\,d\psi.$$

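Since $f$ is bounded, the Weierstrass M-test (comparison to $\sum r^{|n|}$ again) gives the uniform convergence that justifies interchanging the summation and integration. A couple of manipulations like those that led to (8.14) then show that

$$A_rf(\theta) = \int_{-\pi}^{\pi}f(\theta+\varphi)P_r(\varphi)\,d\varphi, \qquad\text{where } P_r(\varphi) = \frac{1}{2\pi}\sum_{-\infty}^{\infty}r^{|n|}e^{in\varphi}. \tag{8.20}$$

The function $P_r$ is called the Poisson kernel. Like the Dirichlet kernel, it satisfies

$$\int_{-\pi}^{0}P_r(\varphi)\,d\varphi = \int_{0}^{\pi}P_r(\varphi)\,d\varphi = \frac12 \tag{8.21}$$

(write $P_r(\varphi) = (2\pi)^{-1} + \pi^{-1}\sum_1^{\infty}r^n\cos n\varphi$ and integrate term by term). It is also easily expressed in closed form, since it is the sum of two geometric series:

$$P_r(\varphi) = \frac{1}{2\pi}\sum_{0}^{\infty}(re^{i\varphi})^n + \frac{1}{2\pi}\sum_{1}^{\infty}(re^{-i\varphi})^n = \frac{1}{2\pi}\left[\frac{1}{1 - re^{i\varphi}} + \frac{re^{-i\varphi}}{1 - re^{-i\varphi}}\right] = \frac{1 - r^2}{2\pi(1 - re^{i\varphi})(1 - re^{-i\varphi})} = \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\varphi)}. \tag{8.22}$$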

However, the Poisson kernel has one additional crucial property that is not shared by the Dirichlet kernel:

(8.23) For any $\delta > 0$, $P_r(\varphi) \to 0$ uniformly on $[-\pi, -\delta]$ and on $[\delta, \pi]$ as $r \to 1-$.

Indeed, by (8.22), for $\delta \le |\varphi| \le \pi$ we have

$$0 < P_r(\varphi) \le \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\delta)},$$

and the expression on the right tends to zero as $r \to 1-$. With these results in hand, we come to the main theorem.
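Before stating the theorem, the kernel facts (8.21)-(8.23) can be spot-checked numerically. The sketch below compares the series definition of $P_r$ with the closed form (8.22), approximates the integral in (8.21) with a midpoint sum, and evaluates the bound in (8.23) at $|\varphi| = \delta$ for several $r$. The helper names and the choice $\delta = 0.5$ are our own.

```python
import math


def poisson_series(r, phi, terms=2000):
    """Truncated P_r(phi) = (1/2pi) * (1 + 2 * sum_{n>=1} r^n cos(n phi))."""
    return (1 + 2 * sum(r**n * math.cos(n * phi) for n in range(1, terms + 1))) / (2 * math.pi)


def poisson_closed(r, phi):
    """Closed form (8.22)."""
    return (1 - r * r) / (2 * math.pi * (1 + r * r - 2 * r * math.cos(phi)))


def half_mass(r, m=20000):
    """Midpoint approximation to the integral of P_r over [0, pi]; (8.21) says it is 1/2."""
    h = math.pi / m
    return sum(poisson_closed(r, (k + 0.5) * h) for k in range(m)) * h


delta = 0.5
for r in (0.9, 0.99, 0.999):
    # By (8.22), the maximum of P_r on delta <= |phi| <= pi is P_r(delta).
    print(r, half_mass(r), poisson_closed(r, delta))
```

The mass on $[0, \pi]$ stays at $\frac12$ while the values away from $\varphi = 0$ shrink roughly in proportion to $1 - r$. This concentration is exactly what the Dirichlet kernel lacks.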

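8.24 Theorem. Suppose that $f$ is $2\pi$-periodic. If $f$ is piecewise continuous, then

$$\lim_{r\to 1-}A_rf(\theta) = \tfrac12[f(\theta-) + f(\theta+)]$$

for every $\theta$. If $f$ is continuous, then $A_rf \to f$ uniformly on $\mathbb{R}$ as $r \to 1-$.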


Proof. We sketch the ideas and leave the details to the reader as Exercises 5 and 6. Given $\theta \in \mathbb{R}$ and $\epsilon > 0$, we choose $\delta > 0$ small enough so that $|f(\theta+\varphi) - f(\theta+)| < \epsilon$ when $0 < \varphi < \delta$ and $|f(\theta+\varphi) - f(\theta-)| < \epsilon$ when $-\delta < \varphi < 0$. We then write the formula (8.20) for $A_rf(\theta)$ as

$$A_rf(\theta) = \left(\int_{-\pi}^{-\delta} + \int_{-\delta}^{0} + \int_{0}^{\delta} + \int_{\delta}^{\pi}\right)f(\theta+\varphi)P_r(\varphi)\,d\varphi.$$

The first and last integrals tend to zero as $r \to 1-$ by (8.23). In the second and third integrals, $f(\theta+\varphi)$ is within $\epsilon$ of $f(\theta-)$ and $f(\theta+)$, respectively. Moreover, (8.21) and (8.23) together show that the integrals of $P_r(\varphi)$ over $[-\delta, 0]$ and $[0, \delta]$ tend to $\frac12$ as $r \to 1-$. The upshot is that $A_rf(\theta)$ is within $2\epsilon$ of $\frac12[f(\theta-) + f(\theta+)]$ when $r$ is sufficiently close to 1, and since $\epsilon$ is arbitrary, the first assertion is proved.
If $f$ is continuous, it is uniformly continuous on $[-\pi, \pi]$ by Theorem 1.33 and hence uniformly continuous on $\mathbb{R}$ by periodicity. This means that the $\delta$ in the preceding paragraph can be chosen independent of $\theta$, and the argument given there then yields uniform convergence. □
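For a piecewise continuous example, take the square wave of Exercise 1 in §8.1 (our choice of test function). Its Fourier coefficients are $c_n = 2/(i\pi n)$ for odd $n$ and 0 for even $n$. Pairing $n$ with $-n$ in (8.19) gives $A_rf(\theta) = \frac{4}{\pi}\sum_{n\text{ odd}}r^n\sin n\theta/n$. The sketch sums this series directly. It compares the result with the closed form $\frac{2}{\pi}\arctan\big(2r\sin\theta/(1-r^2)\big)$, which follows from $\sum_{n\text{ odd}}z^n/n = \frac12\log\frac{1+z}{1-z}$ with $z = re^{i\theta}$. At the jump $\theta = 0$ the Abel means equal 0, the average of the one-sided limits; at $\theta = 1$ they approach $f(1) = 1$.

```python
import math


def abel_mean_square_wave(theta, r):
    """A_r f(theta) from (8.19) for the square wave (-1 on (-pi, 0), +1 on (0, pi)).

    Uses c_n = 2 / (i pi n) for odd n (zero for even n); pairing n with -n gives
    (4/pi) * sum over odd n >= 1 of r^n sin(n theta) / n.
    """
    terms = int(50 / (1 - r))  # r**terms is about e^-50, so the truncation is negligible
    return (4 / math.pi) * sum(r**n * math.sin(n * theta) / n for n in range(1, terms + 1, 2))


def abel_mean_closed(theta, r):
    """Closed form (2/pi) * arctan(2 r sin(theta) / (1 - r^2)) of the same series."""
    return (2 / math.pi) * math.atan(2 * r * math.sin(theta) / (1 - r * r))


for r in (0.9, 0.99, 0.999):
    print(r, abel_mean_square_wave(0.0, r), abel_mean_square_wave(1.0, r))
```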

EXERCISES

1. Find the Fourier series of the sawtooth waves depicted below by modifying the
series in Example 1.

[Figure: sawtooth waves (a) and (b).]

2. Find the Fourier series of the $2\pi$-periodic function $f(\theta)$ defined by $f(\theta) = \ldots$ on the interval $[-\frac12\pi, \frac32\pi]$. Use the result of Exercise 4 in §8.1.

3. Find the Fourier series of the $2\pi$-periodic functions defined on the interval $(-\pi, \pi)$ by the indicated formulas by modifying the series in the exercises of §8.1.
a. $f(\theta) = \ldots$ for $-\pi < \theta < 0$, $f(\theta) = \ldots$ for $0 < \theta < \pi$.
$f(\theta) = \ldots$ for $|\theta| < a$, $f(\theta) = 0$ for $a < |\theta| < \pi$, where $0 < a < \pi$.
d. $f(\theta) = \sinh\theta$.

4. Find the sums of the following series by applying Theorem 8.16 to the series obtained in the indicated exercises from §8.1 and choosing appropriate values of $\theta$.

a. $\sum_1^{\infty}\frac{1}{4n^2-1}$ and $\sum_1^{\infty}\frac{(-1)^n}{4n^2-1}$ (Exercise 3). Can you sum the first series in a more elementary way by rewriting it as a telescoping series?
b. $\sum_1^{\infty}\frac{1}{n^2}$ and $\sum_1^{\infty}\frac{(-1)^{n+1}}{n^2}$ (Exercise 4).
c. $\sum_1^{\infty}\frac{1}{n^2+b^2}$ and $\sum_1^{\infty}\frac{(-1)^n}{n^2+b^2}$, where $b > 0$ (Exercise 5).
d. $\sum_1^{\infty}\frac{(-1)^{n+1}}{(2n-1)^3}$ (Exercise 6).

5. Fill in the details of the proof of the first assertion of Theorem 8.24.

6. Fill in the details of the proof of the second assertion of Theorem 8.24.

8.3 Derivatives, Integrals, and Uniform Convergence

We next study the differentiation and integration of Fourier series. As a first step, we point out that by the fundamental theorem of calculus as stated in §4.1, the formula

$$f(b) - f(a) = \int_a^b f'(\theta)\,d\theta \tag{8.25}$$

is valid when $f$ is continuous and piecewise smooth, even though $f'$ may be undefined at finitely many points. (However, it is generally false if $f$ itself has jump discontinuities.) In particular, if $f$ and $g$ are both continuous and piecewise smooth, then so is $fg$, and an application of (8.25) to the latter function yields the integration-by-parts formula

$$\int_a^b f'(x)g(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f(x)g'(x)\,dx.$$

The first main result is that there is a very simple relation between the Fourier coefficients of $f$ and those of $f'$.

8.26 Theorem. Suppose $f$ is $2\pi$-periodic, continuous, and piecewise smooth, and let $c_n$ and $c_n'$ be the Fourier coefficients of $f$ and $f'$, given by (8.6). Then

$$c_n' = inc_n.$$

Equivalently, if $a_n, b_n$ and $a_n', b_n'$ are the Fourier coefficients of $f$ and $f'$ given by (8.7), then $a_n' = nb_n$ and $b_n' = -na_n$.

Proof. Simply integrate by parts:

$$c_n' = \frac{1}{2\pi}\int_{-\pi}^{\pi}f'(\theta)e^{-in\theta}\,d\theta = \frac{1}{2\pi}f(\theta)e^{-in\theta}\Big|_{-\pi}^{\pi} + \frac{in}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-in\theta}\,d\theta.$$

The first term on the right vanishes because $f(\theta)e^{-in\theta}$ is $2\pi$-periodic, and the second one is $inc_n$. The argument for $a_n$ and $b_n$ is similar (Exercise 1). □
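Theorem 8.26 is easy to test numerically on the pair from §8.1 that reappears in Example 1 below: the triangle wave $g(\theta) = |\theta|$ and its derivative, the square wave. The midpoint-rule helper is our own illustrative choice. With an even number of sample points, $\theta = 0$ is a subdivision point, so the kink of $g$ and the jump of $g'$ fall between samples.

```python
import cmath
import math


def fourier_coeff(f, n, m=20000):
    """Midpoint-rule approximation to c_n = (1/2pi) * integral_{-pi}^{pi} f(t) e^{-int} dt (m even)."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)


def triangle(t):
    return abs(t)


def triangle_derivative(t):
    # g'(t) = -1 for t < 0 and +1 for t > 0 (undefined at 0, which is never sampled)
    return -1.0 if t < 0 else 1.0


for n in (1, 2, 3, 5):
    lhs = fourier_coeff(triangle_derivative, n)  # c'_n
    rhs = 1j * n * fourier_coeff(triangle, n)    # i n c_n
    print(n, lhs, rhs)
```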

Note that Theorem 8.26 makes no claim about the Fourier series of $f'$; it is valid whether or not that series actually converges. If we add more conditions on $f$ to ensure that it does, we obtain the following result:

8.27 Corollary. Suppose that $f$ is $2\pi$-periodic, continuous, and piecewise smooth, and that $f'$ is also piecewise smooth. If

$$\sum_{-\infty}^{\infty}c_ne^{in\theta} = \tfrac12a_0 + \sum_{1}^{\infty}(a_n\cos n\theta + b_n\sin n\theta)$$

is the Fourier series of $f$, then $f'(\theta)$ is the sum of the derived series

$$\sum_{-\infty}^{\infty}inc_ne^{in\theta} = \sum_{1}^{\infty}(nb_n\cos n\theta - na_n\sin n\theta)$$

at every $\theta$ at which $f'(\theta)$ exists. At the exceptional points where $f'$ has jumps, the series converges to $\frac12[f'(\theta-) + f'(\theta+)]$.

Proof. By Theorem 8.16, $f'$ is the sum of its Fourier series everywhere except where it has jumps, and the coefficients in that series are given by Theorem 8.26. □

Example 1. The triangle wave (Example 2 in §8.1) is continuous and piecewise smooth, and its derivative is the square wave (Exercise 1 in §8.1). We can therefore recover the result of the latter exercise by differentiating the series (8.10):

$$\frac{4}{\pi}\sum_{1}^{\infty}\frac{\sin(2m-1)\theta}{2m-1} = \begin{cases}-1 & (-\pi < \theta < 0),\\ 1 & (0 < \theta < \pi).\end{cases}$$

Next, we consider integration of Fourier series. There is one annoying point that must be kept in mind: If $f$ is a piecewise continuous $2\pi$-periodic function, its indefinite integral $F(\theta) = \int_0^{\theta}f(\varphi)\,d\varphi$ will be periodic only when

$$\int_{-\pi}^{\pi}f(\varphi)\,d\varphi = 0,$$

that is, when the mean value of $f$ over an interval of length $2\pi$ is zero, or, equivalently, when the constant term in the Fourier series of $f$ vanishes. We make this assumption in the following theorem; if it is not valid, we may wish to subtract off the constant term and deal with it separately.

8.28 Theorem. Suppose $f$ is $2\pi$-periodic and piecewise continuous, with Fourier coefficients $c_n$ given by (8.6) or $a_n, b_n$ given by (8.7). Assume that $c_0 = \frac12a_0 = 0$. If $F$ is a continuous, piecewise smooth function such that $F' = f$ (except at the points where $f$ has jumps), then

$$F(\theta) = C_0 + \sum_{n\neq 0}\frac{c_n}{in}e^{in\theta} = C_0 + \sum_{1}^{\infty}\Big(\frac{a_n}{n}\sin n\theta - \frac{b_n}{n}\cos n\theta\Big)$$

for all $\theta$, where $C_0$ is the mean value of $F$ on $[-\pi, \pi]$.

Proof. $F$ is $2\pi$-periodic by (8.2), for

$$F(\theta + 2\pi) - F(\theta) = \int_{\theta}^{\theta+2\pi}f(\varphi)\,d\varphi = \int_{-\pi}^{\pi}f(\varphi)\,d\varphi = 2\pi c_0 = 0.$$

By Theorem 8.16, $F$ is the sum of its Fourier series at every point. By Theorem 8.26, its Fourier coefficients $C_n$ are given for $n \neq 0$ by $inC_n = c_n$ (and likewise for the cosine and sine coefficients). The constant term $C_0$ is, as always, the mean value of $F$. □

Observe that the series in Theorem 8.28 is obtained by formally integrating the
Fourier series of / term-by-term, whether the latter series converges or not.

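Example 2. Subtraction of the mean value from the triangle wave (Example 2 in §8.2) and multiplication by $-2$ gives

$$\pi - 2|\theta| = \frac{8}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2} \qquad (|\theta| \le \pi),$$

and integration of both sides from 0 to $\theta$ then yields

$$\theta(\pi - |\theta|) = \frac{8}{\pi}\sum_{1}^{\infty}\frac{\sin(2m-1)\theta}{(2m-1)^3} \qquad (|\theta| \le \pi),$$

which is the result of Exercise 6 in §8.1.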

Theorem 8.28 and Corollary 8.27 exhibit situations where we can integrate or differentiate a series termwise without worrying about uniform convergence. However, uniform and absolute convergence are still highly desirable, so we present a simple criterion for the Fourier series of a function to have these properties.

8.29 Theorem. If $f$ is $2\pi$-periodic, continuous, and piecewise smooth, then the Fourier series of $f$ is absolutely and uniformly convergent.

Proof. Let $c_n$ and $c_n'$ be the Fourier coefficients of $f$ and $f'$. Since $|c_ne^{in\theta}| = |c_n|$, the absolute convergence of $\sum c_ne^{in\theta}$ is equivalent to the convergence of $\sum|c_n|$, and by the Weierstrass M-test, this also implies the uniform convergence of $\sum c_ne^{in\theta}$. But by Theorem 8.26, $c_n = c_n'/in$ for $n \neq 0$, so

$$|c_n| = |c_n'|\,|n|^{-1} \le \tfrac12\big(|c_n'|^2 + |n|^{-2}\big) \qquad (n \neq 0).$$

(The inequality $\alpha\beta \le \frac12(\alpha^2 + \beta^2)$ is valid for all $\alpha, \beta \in \mathbb{R}$ since $\alpha^2 + \beta^2 - 2\alpha\beta = (\alpha - \beta)^2 \ge 0$.) But the series $\sum|c_n'|^2$ and $\sum n^{-2}$ are both convergent (by Bessel's inequality in the former case, since $f'$ is piecewise continuous), and hence so is $\sum|c_n|$. □

We conclude this section with an important feature of Fourier series, which we state as a general principle rather than as a precise theorem:

The degree of smoothness of a periodic function is closely related to the rate of decay of its Fourier coefficients, that is, to the rate of convergence of its Fourier series.

Indeed, let $f$ be a $2\pi$-periodic function with Fourier coefficients $c_n$. If $f$ is of class $C^k$, then $f^{(k)}$ is a continuous $2\pi$-periodic function whose Fourier coefficients are $(in)^kc_n$, by Theorem 8.26. By Bessel's inequality, $\lim_{n\to\pm\infty}|n^kc_n| = 0$, so $|c_n|$ tends to zero faster than $|n|^{-k}$ as $n \to \pm\infty$.

Conversely, suppose $|c_n| \le C|n|^{-k-\epsilon}$ for some $C, \epsilon > 0$. Then $\sum|n^jc_n| < \infty$ for $j < k$. Hence the series $\sum c_ne^{in\theta}$ can be differentiated termwise $k - 1$ times with the differentiated series being absolutely and uniformly convergent, and so $f$ is of class $C^{k-1}$. (A number of other variations can be played on this theme.)

We can see this phenomenon in the examples of §8.1. The sawtooth wave has discontinuities, and its Fourier coefficients decay like $n^{-1}$; the triangle wave is continuous but its first derivative is not, and its Fourier coefficients decay like $n^{-2}$. Figures 8.1 and 8.2 show clearly that the Fourier series of the triangle wave converges more rapidly than that of the sawtooth wave.
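The decay rates just described can be measured directly. The sketch below computes $|c_n|$ for the sawtooth and triangle waves by the midpoint rule and prints $n|c_n|$ and $n^2|c_n|$. The discretization choices are our own. By Examples 1 and 2 of §8.1, for odd $n$ these products should be close to $1$ and to $2/\pi$, respectively.

```python
import cmath
import math


def fourier_coeff(f, n, m=20000):
    """Midpoint-rule approximation to c_n = (1/2pi) * integral_{-pi}^{pi} f(t) e^{-int} dt."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)


def sawtooth(t):
    return t       # discontinuous periodic extension: |c_n| ~ 1/n


def triangle(t):
    return abs(t)  # continuous, with a discontinuous derivative: |c_n| ~ 1/n^2


for n in (11, 21, 41):
    saw = abs(fourier_coeff(sawtooth, n))
    tri = abs(fourier_coeff(triangle, n))
    print(f"n={n:3d}  n*|c_n| sawtooth = {n * saw:.6f}   n^2*|c_n| triangle = {n * n * tri:.6f}")
```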

EXERCISES

1. Verify the assertion about $a_n$ and $b_n$ in Theorem 8.26.

2. Given $a \in (0, \pi)$, let $f$ be the $2\pi$-periodic function defined by $f(\theta) = a^{-1}$ for $|\theta| < a$ and $f(\theta) = (a - \pi)^{-1}$ for $a < |\theta| < \pi$.
a. Find the formula for $g(\theta) = \int_0^{\theta}f(\varphi)\,d\varphi$ on $[-\pi, \pi]$ and sketch its graph.
b. Use the Fourier series of $f$ found in Exercise 7 of §8.1 to compute the Fourier series of $g$.

3. By applying Theorem 8.28 to the result of Exercise 4 of §8.1, show that:

$$\ldots \qquad\text{and}\qquad \sum_{1}^{\infty}\frac{1}{n^4} = \frac{\pi^4}{90}.$$
4. From Exercise 3 of §8.1, we know that

$$\sin\theta = \frac{2}{\pi} - \frac{4}{\pi}\sum_{1}^{\infty}\frac{\cos 2n\theta}{4n^2 - 1} \qquad\text{for } 0 \le \theta \le \pi.$$

Show that this series can be differentiated or integrated termwise to yield two apparently different series expansions of $\cos\theta$ for $0 < \theta < \pi$, and reconcile these two expansions. (Hint: Example 1 of §8.2 is useful.)

5. Let $f(\theta)$ be the $2\pi$-periodic function such that $f(\theta) = e^{\theta}$ for $|\theta| < \pi$, and let $\sum_{-\infty}^{\infty}c_ne^{in\theta}$ be its Fourier series. If we formally differentiate this equation, we obtain $e^{\theta} = \sum_{-\infty}^{\infty}inc_ne^{in\theta}$ for $|\theta| < \pi$. But then $c_n$ and $inc_n$ are both equal to $(2\pi)^{-1}\int_{-\pi}^{\pi}e^{\theta}e^{-in\theta}\,d\theta$, so $c_n = inc_n$ and hence $c_n = 0$ for all $n$. Clearly this is wrong; where is the mistake?

6. How smooth are the following functions? That is, for which $k$ can you show that the function is of class $C^k$?

ein0 ^ cos n9 y, cos 2n9

0 0

8.4 Fourier Series on Intervals

A $2\pi$-periodic function is completely determined by its values on any interval of length $2\pi$. Conversely, if one is given a function $f$ defined on an interval of length $2\pi$, say $[-\pi, \pi]$, one can extend $f$ to be a $2\pi$-periodic function on $\mathbb{R}$ by declaring that $f(\theta + 2k\pi) = f(\theta)$ for all $\theta \in [-\pi, \pi]$ and $k \in \mathbb{Z}$. (Actually, this definition is not consistent at the points $\theta = (2k+1)\pi$ unless $f(-\pi) = f(\pi)$, but one can redefine $f$ to be any given number at these points, such as $\frac12[f(-\pi) + f(\pi)]$.)

If the original $f$ on $[-\pi, \pi]$ is piecewise continuous or piecewise smooth, the same will be true of its periodic extension. However, even if $f$ is perfectly smooth on $[-\pi, \pi]$, there will usually be discontinuities in the periodic extension or its derivatives at the points $(2k+1)\pi$ where the translates of $f$ are joined together. (For example, the periodic extension of $f(\theta) = \theta$ on $[-\pi, \pi]$ is the sawtooth wave.)

By considering the periodic extension, then, one can use Fourier series to expand a piecewise smooth function on $[-\pi, \pi]$ in terms of trig functions. All of the results in the preceding sections apply. However, in using the results of §8.3 one must remember to take into account the possible extra discontinuities in the periodic extension or its derivatives at the points $(2k+1)\pi$.
There is an extra twist we can add to this construction that is useful in many situations. Suppose that we are considering functions on $[0, \pi]$ rather than $[-\pi, \pi]$. Given a piecewise continuous function $f$ on $[0, \pi]$, we first extend it to $[-\pi, \pi]$ by declaring it to be even (see Figure 8.5), and then extend it to be $2\pi$-periodic on $\mathbb{R}$. That is, we define the even extension $f_{\text{even}}$ of $f$ on $[-\pi, \pi]$ by

$$f_{\text{even}}(\theta) = \begin{cases}f(\theta) & \text{if } 0 \le \theta \le \pi,\\ f(-\theta) & \text{if } -\pi \le \theta < 0.\end{cases}$$

For this extension the Fourier sine coefficients $b_n$ all vanish because $f_{\text{even}}(\theta)\sin n\theta$ is an odd function, and the cosine coefficients $a_n$ are given by

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi}f_{\text{even}}(\theta)\cos n\theta\,d\theta = \frac{2}{\pi}\int_{0}^{\pi}f(\theta)\cos n\theta\,d\theta.$$

The resulting Fourier series is $\frac12a_0 + \sum_1^{\infty}a_n\cos n\theta$.

Figure 8.5: A function on $[0, \pi]$ (above) and its even and odd extensions to $[-\pi, \pi]$ (below, left and right).

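On the other hand, we could also consider the odd extension of $f$ to $[-\pi, \pi]$ (see Figure 8.5):

$$f_{\text{odd}}(\theta) = \begin{cases}f(\theta) & \text{if } 0 < \theta < \pi,\\ -f(-\theta) & \text{if } -\pi < \theta < 0,\\ 0 & \text{if } \theta = 0, \pm\pi.\end{cases}$$

Here the Fourier cosine coefficients $a_n$ all vanish, and the sine coefficients $b_n$ are given by

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi}f_{\text{odd}}(\theta)\sin n\theta\,d\theta = \frac{2}{\pi}\int_{0}^{\pi}f(\theta)\sin n\theta\,d\theta.$$

The resulting Fourier series is $\sum_1^{\infty}b_n\sin n\theta$.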
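We are thus led to the following definitions: If $f$ is a piecewise continuous function on $[0, \pi]$, its Fourier cosine series is the series

$$\tfrac12a_0 + \sum_{1}^{\infty}a_n\cos n\theta, \qquad a_n = \frac{2}{\pi}\int_{0}^{\pi}f(\theta)\cos n\theta\,d\theta,$$

and its Fourier sine series is the series

$$\sum_{1}^{\infty}b_n\sin n\theta, \qquad b_n = \frac{2}{\pi}\int_{0}^{\pi}f(\theta)\sin n\theta\,d\theta.$$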

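Example 1. Let $f(\theta) = \theta$ on $[0, \pi]$. The even and odd periodic extensions of $f$ are the triangle and sawtooth waves, respectively, and the Fourier cosine and sine series of $f$ are

$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2} \qquad\text{and}\qquad 2\sum_{1}^{\infty}\frac{(-1)^{n+1}}{n}\sin n\theta,$$

respectively.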

If $f$ is piecewise smooth on $[0, \pi]$, its even and odd periodic extensions will be piecewise smooth on $\mathbb{R}$. If $f(0) = f(0+)$ and $f(\pi) = f(\pi-)$, its even periodic extension will be continuous at both 0 and $\pi$, but its odd periodic extension will have jumps at 0 or $\pi$ unless $f(0) = 0$ or $f(\pi) = 0$, respectively. In any case, an application of Theorem 8.16 to these extensions easily yields the following:

8.30 Theorem. Suppose $f$ is piecewise smooth on $[0, \pi]$. The Fourier cosine series and the Fourier sine series of $f$ converge to $\frac12[f(\theta-) + f(\theta+)]$ at every $\theta \in (0, \pi)$. The cosine series converges to $f(0+)$ at $\theta = 0$ and to $f(\pi-)$ at $\theta = \pi$; the sine series converges to 0 at both these points.

We may wish to consider periodic functions with period other than $2\pi$, or functions defined on intervals other than $[0, \pi]$. The general situation can be reduced to the one we have studied by a linear change of variable; we record the results for future reference.
Suppose $f(x)$ is a piecewise smooth $2l$-periodic function. We make the change of variables

$$\theta = \frac{\pi x}{l}, \qquad g(\theta) = f\Big(\frac{l\theta}{\pi}\Big).$$

Then $g$ is $2\pi$-periodic, and we have

$$g(\theta) = \sum_{-\infty}^{\infty}c_ne^{in\theta}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}g(\theta)e^{-in\theta}\,d\theta.$$

The substitution $\theta = \pi x/l$ then yields the Fourier series for $f$:

$$f(x) = \sum_{-\infty}^{\infty}c_ne^{in\pi x/l}, \qquad c_n = \frac{1}{2l}\int_{-l}^{l}f(x)e^{-in\pi x/l}\,dx. \tag{8.31}$$

The corresponding formula in terms of sines and cosines is

$$f(x) = \tfrac12a_0 + \sum_{1}^{\infty}\Big(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Big),$$

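where

$$a_n = \frac{1}{l}\int_{-l}^{l}f(x)\cos\frac{n\pi x}{l}\,dx, \qquad b_n = \frac{1}{l}\int_{-l}^{l}f(x)\sin\frac{n\pi x}{l}\,dx.$$

It follows that the Fourier cosine and sine series of a piecewise smooth function $f$ on the interval $[0, l]$ are

$$f(x) = \tfrac12a_0 + \sum_{1}^{\infty}a_n\cos\frac{n\pi x}{l}, \qquad a_n = \frac{2}{l}\int_{0}^{l}f(x)\cos\frac{n\pi x}{l}\,dx, \tag{8.32}$$

and

$$f(x) = \sum_{1}^{\infty}b_n\sin\frac{n\pi x}{l}, \qquad b_n = \frac{2}{l}\int_{0}^{l}f(x)\sin\frac{n\pi x}{l}\,dx. \tag{8.33}$$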
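As a check on (8.32) and (8.33), the sketch below computes the cosine and sine coefficients of the assumed test function $f(x) = x$ on $[0, l]$ by the midpoint rule. Rescaling Example 1 of this section (or integrating by parts directly) gives $a_0 = l$, $a_n = 2l((-1)^n - 1)/(n^2\pi^2)$ for $n > 0$, and $b_n = 2l(-1)^{n+1}/(n\pi)$. The code compares its results against these values. The helper names and the choice $l = 2$ are ours.

```python
import math


def cosine_coeff(f, n, l, m=20000):
    """a_n = (2/l) * integral_0^l f(x) cos(n pi x / l) dx, as in (8.32), by the midpoint rule."""
    h = l / m
    xs = ((k + 0.5) * h for k in range(m))
    return (2 / l) * h * sum(f(x) * math.cos(n * math.pi * x / l) for x in xs)


def sine_coeff(f, n, l, m=20000):
    """b_n = (2/l) * integral_0^l f(x) sin(n pi x / l) dx, as in (8.33), by the midpoint rule."""
    h = l / m
    xs = ((k + 0.5) * h for k in range(m))
    return (2 / l) * h * sum(f(x) * math.sin(n * math.pi * x / l) for x in xs)


def identity(x):
    return x


l = 2.0
for n in range(1, 5):
    exact_a = 2 * l * ((-1) ** n - 1) / (n * math.pi) ** 2
    exact_b = 2 * l * (-1) ** (n + 1) / (n * math.pi)
    print(n, cosine_coeff(identity, n, l), exact_a, sine_coeff(identity, n, l), exact_b)
print("a_0:", cosine_coeff(identity, 0, l), "expected", l)
```

With $l = \pi$ these formulas reduce to the triangle-wave and sawtooth coefficients of §8.1, as Example 1 predicts.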

We conclude with a few remarks comparing Taylor series and Fourier series,

$$f(x) = \sum_{0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n \qquad\text{and}\qquad f(x) = \sum_{-\infty}^{\infty}c_ne^{in\pi x/l},$$

as ways of expanding a function $f$ on an interval centered at the origin. First, Taylor series are only defined for functions of class $C^{\infty}$, whereas the smoothness requirements for Fourier series are quite minimal. Second, the Taylor coefficients $f^{(n)}(0)/n!$ depend only on the values of $f$ in an arbitrarily small neighborhood of the origin, whereas the Fourier coefficients $c_n$ depend on the values of $f$ over the whole interval $[-l, l]$. Third, the partial sums of the Taylor series provide an excellent approximation to $f(x)$ when $|x|$ is small but are often quite useless when $|x|$ is large. The partial sums of the Fourier series, by contrast, tend to approximate $f$ about equally well over the whole interval $[-l, l]$. (This last statement is a bit of an oversimplification!)
Despite their differences, there is a connection between Taylor and Fourier series that is of considerable importance in more advanced mathematics. Namely, let us consider a power series $f(z) = \sum_0^{\infty}a_nz^n$ as a function of the complex variable $z$. If we write $z$ in polar coordinates as $z = re^{i\theta}$ and fix $r$, we obtain a function $g(\theta) = f(re^{i\theta})$ of the variable $\theta$. The power series for $f$ then becomes a Fourier series for $g$: $g(\theta) = \sum_0^{\infty}(a_nr^n)e^{in\theta}$. (It is a special kind of Fourier series, however, since the coefficient of $e^{in\theta}$ vanishes for all $n < 0$.)

EXERCISES

1. Find the Fourier cosine series and the Fourier sine series of the following functions on the interval $[0, \pi]$. All of these series can be derived from the results of the examples and exercises in §8.1 without computing the coefficients from scratch.
a. $f(\theta) = 1$.
b. $f(\theta) = \sin\theta$.
c. $f(\theta) = \theta^2$. (For the sine series, use Example 1 and Exercise 6 in §8.1.)
d. $f(\theta) = \theta$ for $0 \le \theta \le \frac12\pi$, $f(\theta) = \pi - \theta$ for $\frac12\pi \le \theta \le \pi$.
2. Expand the given function in a series of the given type. As in Exercise 1, use previously derived results as much as possible.
a. $f(x) = 1$; sine series on $[0, l]$.
b. $f(x) = 1$ for $0 < x < 2$, $f(x) = -1$ for $2 < x < 4$; cosine series on $[0, 4]$.
c. $f(x) = lx - x^2$; sine series on $[0, l]$.
d. $f(x) = e^x$; series of the form $\sum_{-\infty}^{\infty}c_ne^{2\pi inx}$ on $[0, 1]$.
3. Suppose $f$ is a piecewise continuous function on $[0, 2l]$ that satisfies $f(x) = f(2l - x)$ (that is, the graph of $f$ is symmetric about the line $x = l$). Let $a_n$ and $b_n$ be the Fourier cosine and sine coefficients of $f$ (given by (8.32) and (8.33) with $l$ replaced by $2l$). Show that $a_n = 0$ for $n$ odd and $b_n = 0$ for $n$ even.

4. Show that a piecewise smooth function $f$ on $[0, l]$ can be expanded in a series as follows:

$$f(x) = \sum_{1}^{\infty}\beta_n\sin\frac{(2n-1)\pi x}{2l}, \qquad \beta_n = \frac{2}{l}\int_{0}^{l}f(x)\sin\frac{(2n-1)\pi x}{2l}\,dx.$$

(Hint: Extend $f$ to $[0, 2l]$ by making it even about $x = l$, i.e., $f(x) = f(2l - x)$ for $x \in [l, 2l]$, and use Exercise 3.)

8.5 Applications to Differential Equations

Fourier series were originally invented in order to solve some boundary value problems of mathematical physics. In this section we study a few basic examples.

Heat Flow in an Insulated Rod. Consider a rod occupying the interval $[0, l]$, insulated so that no heat can enter or leave it, and let $f(x)$ be the temperature at position $x$ and time $t = 0$. How does the temperature distribution evolve with time? (Note: Instead of thinking of a thin rod, one can think of a thick cylindrical slab occupying the region where $0 \le x \le l$ and $(y, z) \in R$, where $R$ is a bounded region in the $yz$-plane, as in Figure 8.6. The model of heat flow described here is valid under the hypothesis that the temperature depends only on $x$.)

Figure 8.6: The cylindrical slab $\{(x, y, z) : 0 \le x \le l,\ (y, z) \in R\}$.

Let $u(x, t)$ denote the temperature at position $x$ and time $t$; thus $u$ satisfies the initial condition $u(x, 0) = f(x)$. As we showed in §5.6, $u$ obeys the heat equation $\partial_tu = k\partial_x^2u$, where $k$ is a positive constant (equal to $K/a$ in (5.42)). Since the rate of heat flow across the point $x$ is proportional to $-\partial_xu(x, t)$ (Newton's law of cooling), the fact that no heat enters or leaves the ends of the rod means that $u$ satisfies the boundary conditions $\partial_xu(0, t) = \partial_xu(l, t) = 0$. In summary,

$$\frac{\partial u}{\partial t} = k\frac{\partial^2u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad \frac{\partial u}{\partial x}(0, t) = \frac{\partial u}{\partial x}(l, t) = 0. \tag{8.34}$$

This is the problem we propose to solve.

The first step is to find a family of solutions of the heat equation satisfying the
right boundary conditions by a device called separation of variables. The idea is
to look for solutions of the form u(x, t) = (p(x)ijj(t). For such a function, the heat
equation becomes

In this last equation, the quantities on the left and right depend only on t and x,
respectively, so they must both be equal to a constant that we call —a. Thus,

if/(t) = -kai'(t), ip"(x) = -a<p(x).

These are simple ordinary differential equations, and the general solutions are read
ily found:

i/;(t) = Coe~kat, ifi(x) = Cicos^x + C2 sin yfax.

We have thus found a large family of solutions of the heat equation of the form
<p(x)ijj(t). For these solutions, the boundary conditions dxu(0,t) = dxu{l,t) = 0
8.5. Applications to Differential Equations 383

translate into the conditions (p'(0) = </?'(/) = 0. But

{ — C\ sin \fax + C? cos \/ax),

so the condition $\varphi'(0) = 0$ forces $C_2 = 0$, and the condition $\varphi'(l) = 0$ then forces
$\sqrt{\alpha}$ to be a multiple of $\pi/l$, or $\alpha = n^2\pi^2/l^2$ where $n$ is an integer (which might as
well be nonnegative). In short, we have obtained the following family of solutions
of the heat equation together with the boundary conditions:

$$u_n(x,t) = \exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\cos\frac{n\pi x}{l} \qquad (n = 0, 1, 2, 3, \ldots).$$
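The separated solutions can be checked numerically. The sketch below is our own illustration, not part of the text. It evaluates $u_n$ and uses central finite differences, with arbitrarily chosen steps `h`, to estimate $\partial_t u - k\partial_x^2 u$ at an interior point and $\partial_x u$ at the two ends. The defaults $k = l = 1$ are assumptions, and the names `u_n`, `heat_residual`, and `end_slopes` are hypothetical.

```python
import math


def u_n(x, t, n, k=1.0, l=1.0):
    """Separated solution exp(-n^2 pi^2 k t / l^2) * cos(n pi x / l)."""
    return math.exp(-((n * math.pi / l) ** 2) * k * t) * math.cos(n * math.pi * x / l)


def heat_residual(x, t, n, k=1.0, l=1.0, h=1e-4):
    """Central-difference estimate of u_t - k u_xx at (x, t); it should be close to 0."""
    u_t = (u_n(x, t + h, n, k, l) - u_n(x, t - h, n, k, l)) / (2 * h)
    u_xx = (u_n(x + h, t, n, k, l) - 2 * u_n(x, t, n, k, l) + u_n(x - h, t, n, k, l)) / h**2
    return u_t - k * u_xx


def end_slopes(t, n, k=1.0, l=1.0, h=1e-6):
    """Central-difference estimates of u_x at x = 0 and at x = l; both should be close to 0."""
    left = (u_n(h, t, n, k, l) - u_n(-h, t, n, k, l)) / (2 * h)
    right = (u_n(l + h, t, n, k, l) - u_n(l - h, t, n, k, l)) / (2 * h)
    return left, right


for n in range(4):
    print(n, heat_residual(0.3, 0.01, n), end_slopes(0.01, n))
```

The residuals are limited only by finite-difference truncation error. The end slopes sit at rounding level, as the insulated boundary conditions require.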

Since the heat equation and the boundary conditions are linear, we obtain more
general solutions by taking linear combinations of these. In fact, we can pass to
infinite linear combinations — that is, infinite series of the form

$$u(x,t) = \sum_{n=0}^{\infty} a_n \exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\cos\frac{n\pi x}{l}. \tag{8.35}$$

Finally, we are ready to tackle the initial condition u(x, 0) = f(x). If we set
t = 0 in (8.35), we obtain

$$u(x,0) = \sum_{n=0}^{\infty} a_n\cos\frac{n\pi x}{l},$$

so we can make $u(x, 0)$ equal to $f(x)$ by taking the series on the right to be the
Fourier cosine series of $f$, defined by (8.32)! (Note that the constant term, which
we called $\frac12 a_0$ before, is called $a_0$ here.) In other words, to solve the problem (8.34),
we take $u(x, t)$ to be defined by (8.35), where the coefficients $a_n$ are given in terms
of the initial data $f$ by

$$a_0 = \frac{1}{l}\int_0^l f(x)\,dx, \qquad a_n = \frac{2}{l}\int_0^l f(x)\cos\frac{n\pi x}{l}\,dx \quad (n > 0).$$
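Here is a minimal sketch of the recipe in action. It computes the coefficients $a_n$ by Simpson's rule and sums the series (8.35) up to a finite $N$. The quadrature rule, the truncation at $N$ terms, the helper names, and the sample data $f(x) = x$ with $l = k = 1$ are all assumptions made for illustration.

```python
import math


def simpson(g, a, b, m=4000):
    """Composite Simpson's rule for g on [a, b] with m subintervals (m is made even)."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def cosine_coefficients(f, l, N):
    """Return [a_0, ..., a_N]: a_0 = (1/l) int_0^l f, a_n = (2/l) int_0^l f(x) cos(n pi x/l) dx."""
    coeffs = [simpson(f, 0.0, l) / l]
    for n in range(1, N + 1):
        coeffs.append(2.0 / l * simpson(lambda x: f(x) * math.cos(n * math.pi * x / l), 0.0, l))
    return coeffs


def insulated_rod(coeffs, k, l, x, t):
    """Partial sum of (8.35) built from the coefficients a_0, ..., a_N."""
    return sum(a * math.exp(-((n * math.pi / l) ** 2) * k * t) * math.cos(n * math.pi * x / l)
               for n, a in enumerate(coeffs))


# Illustrative data (an assumption, not from the text): f(x) = x on a rod with l = 1, k = 1.
a = cosine_coefficients(lambda x: x, 1.0, 100)
print(a[0], a[1])                              # a_0 is the mean of f; a_1 should be near -4/pi^2
print(insulated_rod(a, 1.0, 1.0, 0.3, 0.0))    # close to f(0.3) = 0.3
print(insulated_rod(a, 1.0, 1.0, 0.3, 10.0))   # essentially the equilibrium value a_0
```

Truncating the series is harmless for $t > 0$, because the exponential factors damp the neglected terms. At $t = 0$ the accuracy depends on how fast the $a_n$ decay.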

At this point we should stop to verify that the proposed solution (8.35) of the
problem (8.34) really works, as the passage from finite linear combinations to infinite
series has the potential to cause difficulties. In fact, everything turns out quite
nicely for this problem. In the first place, if the initial temperature distribution $f(x)$
is continuous and piecewise smooth (a reasonable physical assumption), the same
will be true of its even $2l$-periodic extension, so by Theorem 8.29, its Fourier series
is absolutely and uniformly convergent. In particular, $\sum_0^\infty |a_n| < \infty$. For $t \ge 0$ the absolute
value of the $n$th term of the series in (8.35) is at most $|a_n|$, so the Weierstrass
M-test shows that this series converges absolutely and uniformly for $0 \le x \le l$ and $t \ge 0$ to define a continuous function $u(x, t)$ there. Moreover, for $t > 0$, the exponential factors in (8.35) decay rapidly as $n \to \infty$, which makes the convergence
even better. In particular, repeated differentiation with respect to $t$ or $x$ introduces
powers of $n$ into the series, which are still overpowered by the decay of the exponential factors, so the differentiated series still converges absolutely and uniformly.
It follows that $u(x, t)$ is of class $C^\infty$ for $t > 0$ and that termwise differentiation
is permissible; $u$ therefore satisfies the heat equation and the boundary conditions
because each term of the series does.
Two further remarks: First, as $t \to \infty$, the exponential factors in (8.35) all
tend rapidly to zero except for the one with $n = 0$, and so $u(x,t)$ approaches
the constant $a_0$, the mean value of $f$ on the interval $[0, l]$. In physical terms this
means that the rod approaches thermal equilibrium as time progresses. Second, the
series (8.35) will usually diverge when t < 0, for then the exponential factors grow
rather than decay! This corresponds to the physical fact that time is irreversible for
diffusion processes governed by the heat equation.
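The first remark can be made quantitative. Since $|\cos(n\pi x/l)| \le 1$, the series (8.35) gives $|u(x,t) - a_0| \le \sum_{n\ge1}|a_n|e^{-n^2\pi^2kt/l^2}$ for $t \ge 0$. The sketch below is our own illustration, with hypothetical names and made-up coefficients. It evaluates this bound and also shows how large a single exponential factor becomes when $t < 0$.

```python
import math


def equilibrium_gap_bound(coeffs, k, l, t):
    """Return sum_{n>=1} |a_n| exp(-n^2 pi^2 k t / l^2).

    For t >= 0 this bounds |u(x, t) - a_0| uniformly in x, because
    |cos(n pi x / l)| <= 1.  Only the supplied coefficients are used, so for a
    truncated list it bounds the corresponding partial sum of (8.35).
    """
    return sum(abs(a) * math.exp(-((n * math.pi / l) ** 2) * k * t)
               for n, a in enumerate(coeffs) if n >= 1)


def mode_factor(n, k, l, t):
    """The factor exp(-n^2 pi^2 k t / l^2) multiplying a_n in (8.35)."""
    return math.exp(-((n * math.pi / l) ** 2) * k * t)


sample = [1.0, 0.5, 0.25]   # hypothetical coefficients a_0, a_1, a_2
for t in (0.0, 0.01, 0.1):
    print(t, equilibrium_gap_bound(sample, 1.0, 1.0, t))
print(mode_factor(10, 1.0, 1.0, -0.1))   # t < 0: the factor is astronomically large
```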

The Vibrating String. We now study the vibrations of a string stretched across
the interval $0 < x < l$ and fixed at the endpoints. (Think of a guitar string, and see
Figure 8.7.) Here $u(x, t)$ will denote the displacement of the string (in a direction
perpendicular to the $x$-axis) at position $x$ and time $t$. The relevant differential
equation is the wave equation $\partial_t^2 u = c^2\partial_x^2 u$, where $c$ is a positive constant that
can be interpreted as the speed with which disturbances propagate down the string.
(See Folland [6, pp. 388-90] or Kammler [10, pp. 526-7] for a derivation of the
wave equation from physical principles.) Since the string is fixed at both ends,
the boundary conditions for this problem are $u(0, t) = u(l, t) = 0$. As for initial
conditions, since the wave equation is second-order in $t$ we need to specify both
the initial displacement $u(x, 0)$ and the initial velocity $\partial_t u(x, 0)$. Thus the problem
we have to solve is
$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}, \qquad u(x,0) = f(x), \qquad \frac{\partial u}{\partial t}(x,0) = g(x), \qquad u(0,t) = u(l,t) = 0, \tag{8.36}$$

where $f$ and $g$ are specified functions on $[0, l]$.

Again we employ the technique of separation of variables and look for solutions
of the wave equation of the form $u(x, t) = \varphi(x)\psi(t)$. For such functions the wave
equation becomes

$$\varphi(x)\psi''(t) = c^2\varphi''(x)\psi(t), \quad\text{or}\quad \frac{\psi''(t)}{c^2\psi(t)} = \frac{\varphi''(x)}{\varphi(x)}.$$

Figure 8.7: A vibrating string fixed at its ends.

In the last equation, the quantities on the left and right depend only on $t$ and $x$,
respectively, so they are both equal to a constant $-\alpha$, and we obtain the ordinary
differential equations

$$\psi''(t) + \alpha c^2\psi(t) = 0, \qquad \varphi''(x) + \alpha\varphi(x) = 0.$$

The general solution of the second equation is

$$\varphi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x.$$

The boundary condition $u(0, t) = 0$ forces $C_1$ to vanish, and then the boundary
condition $u(l, t) = 0$ forces $\sqrt{\alpha}$ to be a multiple of $\pi/l$, so $\alpha = n^2\pi^2/l^2$ for some
(positive) integer $n$. With this value of $\alpha$, the general solution of the differential
equation for $\psi$ is

$$\psi(t) = b\cos\frac{n\pi ct}{l} + B\sin\frac{n\pi ct}{l}.$$

(The arbitrary constants are labeled b and B for reasons that will become clearer in
a moment.)
For each positive integer n, we therefore have the solution

$$u_n(x,t) = \left(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\right)\sin\frac{n\pi x}{l}.$$

Taking linear combinations and passing to limits, we are led to the series solution

$$u(x,t) = \sum_{n=1}^{\infty}\left(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\right)\sin\frac{n\pi x}{l}. \tag{8.37}$$

It remains to satisfy the initial conditions. Setting $t = 0$ in (8.37) yields

$$u(x,0) = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{l},$$

so we satisfy the condition $u(x, 0) = f(x)$ by taking the $b_n$'s to be the Fourier sine
coefficients of $f$:

$$b_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx.$$

Moreover, formally differentiating (8.37) with respect to $t$ and then setting $t = 0$ yields

$$\frac{\partial u}{\partial t}(x,0) = \sum_{n=1}^{\infty}\frac{n\pi c}{l}B_n\sin\frac{n\pi x}{l},$$

so we should be able to satisfy the condition $\partial_t u(x, 0) = g(x)$ by taking $n\pi cB_n/l$
to be the $n$th Fourier sine coefficient of $g$:

$$B_n = \frac{2}{n\pi c}\int_0^l g(x)\sin\frac{n\pi x}{l}\,dx.$$
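The two coefficient formulas and the series (8.37) combine into a short numerical sketch. It assumes Simpson's rule for the integrals and truncates (8.37) after $N$ terms; the function names and test data are hypothetical. With zero initial displacement and initial velocity $g(x) = \sin(\pi x/l)$, only $B_1 = l/(\pi c)$ survives. The computed displacement can therefore be compared with $(l/\pi c)\sin(\pi ct/l)\sin(\pi x/l)$.

```python
import math


def simpson(g, a, b, m=2000):
    """Composite Simpson's rule for g on [a, b] with m subintervals (m is made even)."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def string_coefficients(f, g, l, c, N):
    """Return lists [b_1..b_N] and [B_1..B_N] from the two coefficient formulas."""
    b, B = [], []
    for n in range(1, N + 1):
        def mode(x, n=n):
            return math.sin(n * math.pi * x / l)
        b.append(2.0 / l * simpson(lambda x: f(x) * mode(x), 0.0, l))
        B.append(2.0 / (n * math.pi * c) * simpson(lambda x: g(x) * mode(x), 0.0, l))
    return b, B


def string_displacement(b, B, l, c, x, t):
    """Partial sum of (8.37) over n = 1, ..., len(b)."""
    total = 0.0
    for n, (bn, Bn) in enumerate(zip(b, B), start=1):
        omega = n * math.pi * c / l
        total += (bn * math.cos(omega * t) + Bn * math.sin(omega * t)) * math.sin(n * math.pi * x / l)
    return total


# Hypothetical data with l = 1, c = 2: zero initial displacement, initial velocity sin(pi x).
b, B = string_coefficients(lambda x: 0.0, lambda x: math.sin(math.pi * x), 1.0, 2.0, 3)
print(B[0], 1 / (2 * math.pi))   # B_1 should equal l / (pi c)
print(string_displacement(b, B, 1.0, 2.0, 0.3, 0.7))
```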

Again, we ask: Does this really work? It is physically reasonable to assume
that the initial functions $f$ and $g$ are continuous and piecewise smooth and satisfy
the boundary conditions $f(0) = f(l) = g(0) = g(l) = 0$. Their odd $2l$-periodic
extensions will then have the same properties, so their Fourier series will be absolutely
and uniformly convergent by Theorem 8.29. In particular, $\sum|b_n| < \infty$ and
$\sum|nB_n| < \infty$, so by the Weierstrass M-test, the series (8.37) is absolutely and
uniformly convergent for $0 \le x \le l$, $-\infty < t < \infty$. However, there is no reason
for the twice-differentiated series that should represent $\partial_t^2 u$ or $c^2\partial_x^2 u$, namely,

$$-\frac{\pi^2c^2}{l^2}\sum_{n=1}^{\infty} n^2\left(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\right)\sin\frac{n\pi x}{l}, \tag{8.38}$$

to converge. The extra factor of $n^2$ makes the terms larger, and there is no exponential decay anywhere to compensate. If we recall that the decay of Fourier
coefficients is related to the degree of smoothness of the function in question, the
contrast with the heat equation may be expressed as follows: The diffusion of heat
tends to smooth out irregularities in the initial temperature distribution, but in wave
motion, any initial roughness simply propagates without dying out.
We can obtain a positive result by imposing more differentiability hypotheses
on $f$ and $g$. If we assume that not only $f$ and $g$ but also the first two derivatives of $f$
and the first derivative of $g$ are continuous and piecewise smooth, and that not only
$f$ and $g$ but also $f''$ vanishes at the endpoints (so that its odd periodic extension
is continuous there), then Theorems 8.26 and 8.29 imply that $\sum n^2|b_n| < \infty$ and
$\sum n^2|B_n| < \infty$, which guarantees the absolute and uniform convergence of (8.38).
This is also enough to guarantee that the formal differentiation of (8.37) that led to
the formula for the $B_n$'s is valid.
However, these additional assumptions are rather unnatural from a physical
point of view. The obvious model for a plucked string, for example, is to take
$f$ to be a piecewise linear function as in Figure 8.8. It is easy to calculate the

Figure 8.8: A model for a plucked string.

coefficients $b_n$ explicitly for such an $f$ (Exercise 4), and they turn out to decay
exactly like $n^{-2}$. The series (8.37) therefore converges nicely, and we may expect
it to provide a good description of the physical vibration of the string. On the other
hand, the twice-differentiated series (8.38) does not converge at all, so it is hard to
say in what sense (8.37) satisfies the wave equation. The resolution of this paradox
is to expand our vision of what a solution of a differential equation ought to be and
to develop a notion of "weak solution" that will encompass examples such as this
one. But this is a more advanced topic; see, for example, Folland [6, §9.5].
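The $n^{-2}$ decay is easy to observe numerically. This sketch is our own. The names `plucked` and `plucked_sine_coefficients`, the pluck point `p` (called $b$ in Exercise 4), and the height are assumptions. It computes the sine coefficients of a piecewise linear shape and prints $n^2|b_n|$, which stays bounded but does not tend to zero.

```python
import math


def simpson(g, a, b, m=4000):
    """Composite Simpson's rule for g on [a, b] with m subintervals (m is made even)."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def plucked(x, l=1.0, p=0.4, height=1.0):
    """Piecewise linear shape: 0 at x = 0 and x = l, equal to `height` at x = p."""
    return height * x / p if x <= p else height * (l - x) / (l - p)


def plucked_sine_coefficients(N, l=1.0, p=0.4, height=1.0):
    """Fourier sine coefficients b_1, ..., b_N of the plucked shape.

    The integral is split at the kink x = p so Simpson's rule only sees smooth pieces.
    """
    coeffs = []
    for n in range(1, N + 1):
        def integrand(x, n=n):
            return plucked(x, l, p, height) * math.sin(n * math.pi * x / l)
        coeffs.append(2.0 / l * (simpson(integrand, 0.0, p) + simpson(integrand, p, l)))
    return coeffs


b = plucked_sine_coefficients(60)
# n^2 |b_n| stays bounded but does not tend to 0, so the terms of (8.38),
# which carry an extra factor n^2, do not shrink.
print([round(n * n * abs(bn), 3) for n, bn in enumerate(b, start=1)][:10])
```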
Taking for granted that the series (8.37) really is the solution of the boundary
value problem (8.36), we say a few words about its physical interpretation. Think of
the string as being a producer of musical notes such as a guitar string. The nth term
in the series (8.37), as a function of $t$, is a pure sine wave with (angular) frequency $n\pi c/l$,
which represents a musical tone at a pure, definite pitch. The series (8.37) therefore
shows how the sound produced by the string can be resolved into a superposition
of these pure pitches. Typically, the coefficients $b_n$ and $B_n$ decrease as $n$ increases,
so that the largest contribution comes from the first term, n = 1. This is the
"fundamental" pitch, and the higher n's are the "overtones" that give the note its
particular tone quality.

Related Problems. The heat flow and vibration problems (8.34) and (8.36)
can be modified by changing the boundary conditions; this leads to models of other
interesting physical processes. Here are a few examples:

1. The boundary value problem

$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2}, \qquad u(x,0) = f(x), \qquad u(0,t) = u(l,t) = 0$$

models the flow of heat in a rod that occupies the interval $0 < x < l$ when both
ends are held at temperature zero, for instance by immersing them in ice water.
(Note that the heat equation doesn't care where the zero point of the temperature
scale is located; if u is a solution, so is u + c for any constant c. Of course, this
means that the validity of the heat equation as a model for actual thermodynamic
processes has its limitations, as absolute zero exists physically.) The method of
solution is exactly the same as for the insulated problem (8.34), except that the
boundary conditions for $\varphi(x)$ are $\varphi(0) = \varphi(l) = 0$. Thus, as in the vibrating string
problem, we obtain $\varphi(x) = \sin(n\pi x/l)$, and the solution is given by

$$u(x,t) = \sum_{n=1}^{\infty} b_n\exp\left(\frac{-n^2\pi^2kt}{l^2}\right)\sin\frac{n\pi x}{l},$$

where $\sum b_n\sin(n\pi x/l)$ is the Fourier sine series of $f(x)$.

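2. The boundary value problem

$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}, \qquad u(x,0) = f(x), \qquad \frac{\partial u}{\partial t}(x,0) = g(x), \qquad \frac{\partial u}{\partial x}(0,t) = \frac{\partial u}{\partial x}(l,t) = 0$$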

models the vibration of air in a cylindrical pipe occupying the interval $0 < x < l$
that is open at both ends. (Examples: flutes and some organ pipes.) Here $u(x,t)$
represents the longitudinal displacement of the air at position $x$ and time $t$. The
boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$ come from the fact that the change
in air pressure due to the displacement $u$ is proportional to $\partial_x u$, and the air pressure
at both ends must remain equal to the ambient air pressure. Again, the solution is
very similar to (8.37) except that it involves cosines instead of sines in x:

$$u(x,t) = \tfrac12(a_0 + A_0t) + \sum_{n=1}^{\infty}\left(a_n\cos\frac{n\pi ct}{l} + A_n\sin\frac{n\pi ct}{l}\right)\cos\frac{n\pi x}{l},$$

where $\frac12 a_0 + \sum_1^\infty a_n\cos(n\pi x/l)$ and $\frac12 A_0 + \sum_1^\infty (n\pi c/l)A_n\cos(n\pi x/l)$ are the
Fourier cosine series of $f$ and $g$, respectively. (The term $\frac12(a_0 + A_0t)$ represents
a flow of air down the tube with constant velocity, of no importance for the vibrations.) As with the vibrating string, the vibrations of the pipe are a superposition of
vibrations at the definite frequencies $n\pi c/l$ ($n = 1, 2, 3, \ldots$).

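3. We can also mix the two types of boundary conditions we have been considering: for the heat equation,

$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2}, \qquad u(x,0) = f(x), \qquad u(0,t) = \frac{\partial u}{\partial x}(l,t) = 0,$$

or the wave equation,

$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}, \qquad u(x,0) = f(x), \qquad \frac{\partial u}{\partial t}(x,0) = g(x), \qquad u(0,t) = \frac{\partial u}{\partial x}(l,t) = 0.$$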

The first of these models heat flow in a rod where one end is held at temperature
zero and the other is insulated; the second models vibrations of air in cylindrical
pipes where one end is closed and the other is open, such as clarinets and some
organ pipes. In both of them, separation of variables leads to the ordinary differential equation $\varphi''(x) = -\alpha\varphi(x)$ with boundary conditions $\varphi(0) = \varphi'(l) = 0$. The
general solution of the differential equation is $\varphi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x$;
the condition $\varphi(0) = 0$ forces $C_1$ to vanish, and then the condition $\varphi'(l) = 0$ forces
$\sqrt{\alpha}$ to be of the form $(n - \frac12)\pi/l$ with $n$ a positive integer. We are therefore led to
try to expand the initial functions in a series of the form

$$f(x) = \sum_{n=1}^{\infty} a_n\sin\left(n - \tfrac12\right)\frac{\pi x}{l}.$$
This can indeed be done; the technique for reducing this problem to one of ordinary
Fourier sine series is outlined in Exercise 4 of §8.4.
It is interesting to note that the resulting frequencies for the vibrating pipe are
$(n - \frac12)\pi c/l$ ($n = 1, 2, 3, \ldots$). In particular, the fundamental frequency for a pipe
closed at one end and open at the other, namely $\frac12\pi c/l$, is half as great as for a
pipe of equal length that is open at both ends. Moreover, only the odd-numbered
multiples of this fundamental frequency occur as "harmonics" for half-open pipes,
whereas all integer multiples occur for open pipes; as a result, the two kinds of
pipes produce notes of different tone qualities.
4. Clearly there are many other variations to be played on this theme: different boundary conditions, other differential equations, and so on. A few further
examples are outlined in the exercises, and we shall indicate a more general framework in which such problems can be studied in the next section.
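The frequency comparison in item 3 is simple arithmetic, sketched below. The speed of sound and the pipe length are assumed sample values, and the function names are hypothetical. The formulas give angular frequencies, so divide by $2\pi$ to get cycles per second.

```python
import math


def open_pipe_frequencies(c, l, count):
    """Angular frequencies n*pi*c/l, n = 1..count, for a pipe open at both ends."""
    return [n * math.pi * c / l for n in range(1, count + 1)]


def half_open_pipe_frequencies(c, l, count):
    """Angular frequencies (n - 1/2)*pi*c/l, n = 1..count, for a pipe closed at one end."""
    return [(n - 0.5) * math.pi * c / l for n in range(1, count + 1)]


def harmonic_ratios(freqs):
    """Each frequency divided by the lowest one, rounded to 9 decimals."""
    return [round(w / freqs[0], 9) for w in freqs]


c, l = 343.0, 0.6   # assumed: speed of sound in m/s and pipe length in m
print(harmonic_ratios(open_pipe_frequencies(c, l, 5)))       # -> [1.0, 2.0, 3.0, 4.0, 5.0]
print(harmonic_ratios(half_open_pipe_frequencies(c, l, 5)))  # -> [1.0, 3.0, 5.0, 7.0, 9.0]
```

The half-open pipe skips the even multiples of its fundamental, and that fundamental is half the open pipe's.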

EXERCISES

1. A rod 100 cm long is insulated along its length and at both ends. Suppose that
its initial temperature is u(x, 0) = x (x in cm, u in °C, t in sec, 0 < x < 100),
and that its diffusivity coefficient $k$ is 1.1 cm²/sec (about right if the rod is made
of copper).
a. Find the temperature u(x, t) for t > 0. (For the relevant Fourier series, see
Example 1 of §8.4.)
b. Show that the first three nonvanishing terms of the series (including the
constant term) give the temperature accurately to within 1° when $t = 60$
(one minute after starting). What are $u(0, 60)$, $u(10, 60)$, and $u(40, 60)$ to
the nearest 1°? (Hint: $\sum_1^\infty (2n-1)^{-2} = \pi^2/8$, so $\sum_3^\infty (2n-1)^{-2} =
(\pi^2/8) - 1 - \frac19 \approx 0.123$.)

c. Show that u(x, t) is within 1° of its equilibrium value of 50° for all x when
t > 3600 (i.e., after one hour). (Don't work too hard; crude estimates are
enough.)

2. Find the temperature function $u(\theta, t)$ ($t > 0$) for a rod bent into the shape of
a circular hoop, given the initial temperature $u(\theta, 0) = f(\theta)$. (Here $\theta$ denotes
the angular coordinate on the circle, and the boundary conditions for a straight
rod are replaced by the requirement that $u$ should be a $2\pi$-periodic function of
$\theta$.)

3. As we found in §5.6, the inhomogeneous heat equation $\partial_t u = k\partial_x^2 u + G$ can
be used to model heat flow in a rod when the total amount of heat energy is not
constant; here $G$ is a function of $x$ and $t$, with units of degrees per unit time,
that accounts for the addition or subtraction of heat from the rod. Let us solve
the initial value problem with constant-temperature boundary conditions,

$$\partial_t u = k\partial_x^2 u + G, \qquad u(x,0) = f(x), \qquad u(0,t) = u(l,t) = 0,$$

making appropriate assumptions on $f$ and $G$ so that Fourier expansions are
valid. Motivated by the solution (8.35) for the special case $G = 0$, we expand everything in a Fourier sine series. That is, for each $t$ we write $G(x, t) =
\sum_1^\infty \beta_n(t)\sin(n\pi x/l)$, and we try to find a solution in the form $u(x,t) =
\sum_1^\infty b_n(t)\sin(n\pi x/l)$, where the coefficients $b_n(t)$ are to be determined. Plug
this into the equation $\partial_t u = k\partial_x^2 u + G$ to obtain an ordinary differential equation for each $b_n(t)$, with initial condition determined by the requirement that
$\sum_1^\infty b_n(0)\sin(n\pi x/l)$ should be the Fourier sine series of $f(x)$. Then solve
these ordinary differential equations to obtain $u$. What conditions on $f$ and $G$
will guarantee the validity of these calculations?

4. Consider a vibrating string occupying the interval $[0, l]$. Suppose the string is
plucked at $x = b$ ($0 < b < l$) so that its initial displacement $u(x, 0)$ is $mx/b$
for $0 < x < b$ and $m(l - x)/(l - b)$ for $b < x < l$ (that is, $u(x, 0)$ is linear on
$[0, b]$ and on $[b, l]$, and equal to $m$ at $x = b$), and its initial velocity $\partial_t u(x, 0)$
is zero. (Note: For this to be a realistic model of a plucked string, we should
have $l \gg m$.)
a. Find the Fourier series for $u(x, t)$ for $t > 0$. (The result of Exercise 2 of
§8.3 can be used.)
b. Compute the coefficients $b_1, \ldots, b_5$ of the first five terms (notation as in
(8.37)) numerically when $b = (0.4)l$ and when $b = (0.1)l$. Observe that
the higher frequencies contribute a lot more to $u(x,t)$ when $b = (0.1)l$
than when $b = (0.4)l$. (Musically: Plucking a string nearer the end gives a
note with more "harmonics.")

5. The model for a vibrating string given by the wave equation is unrealistic because it predicts that the vibration will continue forever without dying out. Real
strings, however, are not perfectly elastic, so the vibrational energy is gradually dissipated. A better model is obtained by the following modification of the
wave equation:

$$\partial_t^2 u = c^2\partial_x^2 u - 2\delta\,\partial_t u,$$

where $\delta$ is a small positive constant. (The left side is the acceleration, and the
terms on the right are the effects of the elastic restoring force and the damping
force that tends to slow the motion down. The factor of 2 is just for convenience.) Find the general solution of this differential equation subject to the
boundary conditions $u(0,t) = u(l,t) = 0$ by modifying the method used in
the text for the ordinary wave equation. Assume that $\delta < \pi c/l$. You should find
that the solutions decay exponentially in time and that the frequencies decrease
as the damping constant $\delta$ increases.

Exercises 6 and 7 concern the Dirichlet problem for a bounded open set $S \subset
\mathbb{R}^2$: Given a function $f$ on the boundary $\partial S$, find a solution of Laplace's equation
$\partial_x^2 u + \partial_y^2 u = 0$ on $S$ such that $u = f$ on $\partial S$. (A physical interpretation: Find
the steady-state distribution of heat in $S$ when the temperature on the boundary is
given.)

6. Consider the Dirichlet problem for a rectangle:

$$\partial_x^2 u + \partial_y^2 u = 0 \quad\text{for } 0 < x < l,\ 0 < y < L;$$

$$u(x,0) = f_1(x), \quad u(x,L) = f_2(x), \quad u(0,y) = g_1(y), \quad u(l,y) = g_2(y).$$

a. Suppose we can solve this problem in the two special cases $g_1 = g_2 = 0$
and $f_1 = f_2 = 0$. How can the solutions $u_1$ and $u_2$ for these cases be
combined to yield the solution for the general case?
b. Henceforth we assume that $g_1 = g_2 = 0$ (the case $f_1 = f_2 = 0$ is similar). Use separation of variables to find solutions of Laplace's equation
satisfying $u(0,y) = u(l,y) = 0$ in the form $u(x,y) = \varphi(x)\psi(y)$; then
use Fourier techniques to find the (infinite) linear combination of these solutions that satisfies $u(x,0) = f_1(x)$ and $u(x,L) = f_2(x)$. (Hint: The
general solution of $\psi'' - c^2\psi = 0$ can be written in the form $\psi(y) =
a\sinh cy + b\sinh c(L - y)$. [Why?] This form of the solution is more
convenient than the more obvious $a\sinh cy + b\cosh cy$.)

7. Consider the Dirichlet problem for the unit disc:

$$\partial_x^2 u + \partial_y^2 u = 0 \quad\text{for } x^2 + y^2 < 1, \qquad u(\cos\theta, \sin\theta) = f(\theta).$$

If we think of $u$ as a function of the polar coordinates $(r, \theta)$ rather than the
Cartesian coordinates $(x,y)$, by Proposition 2.51 this becomes

$$r^2\partial_r^2 u + r\partial_r u + \partial_\theta^2 u = 0 \quad\text{for } r < 1, \qquad u(1, \theta) = f(\theta).$$

a. Use separation of variables to find solutions of this differential equation in
the form $u(r, \theta) = \varphi(r)\psi(\theta)$. Keep in mind that the solutions must be $2\pi$-periodic functions of $\theta$ and that they must be smooth at the origin, where
$r = 0$ and $\theta$ is undefined. (Hint: The general solution of the Euler equation
$r^2\varphi'' + r\varphi' - c^2\varphi = 0$ is $\varphi(r) = ar^c + br^{-c}$ if $c \ne 0$, $a + b\log r$ if $c = 0$.)
Then use Fourier techniques to find the (infinite) linear combination of
these solutions that satisfies $u(1,\theta) = f(\theta)$.
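b. You should find that $u(r,\theta)$ equals $A_r f(\theta)$, the Abel approximant to $f$
defined by (8.19). Use (8.20) and (8.22) to derive the Poisson integral
formula for the solution:

$$u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1 - r^2}{1 - 2r\cos(\theta - \phi) + r^2}\,f(\phi)\,d\phi.$$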

8.6 The Infinite-Dimensional Geometry of Fourier Series

In this section we shall re-examine the notion of Fourier series in the light of a
profound analogy with certain ideas from vector algebra. We begin with a quick
review of the latter.
When expressed in algebraic terms, the concepts of Euclidean geometry in $n$
dimensions are based on the vector-space structure of $\mathbb{R}^n$ (that is, the operations
of vector addition and scalar multiplication), together with the dot product or inner
product $\mathbf{a}\cdot\mathbf{b}$, in terms of which we can define lengths ($|\mathbf{a}| = (\mathbf{a}\cdot\mathbf{a})^{1/2}$) and angles
(the angle from $\mathbf{a}$ to $\mathbf{b}$ is $\arccos(\mathbf{a}\cdot\mathbf{b}/|\mathbf{a}|\,|\mathbf{b}|)$). The "natural" coordinate systems
for this geometry are the ones arising from an orthonormal basis for $\mathbb{R}^n$, that is, a
basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$ such that $\mathbf{u}_j\cdot\mathbf{u}_k$ equals 0 for $j \ne k$ and 1 for $j = k$. The formula
for expressing an arbitrary vector $\mathbf{x}$ in terms of such a basis is given very simply in
terms of inner products:

$$\mathbf{x} = \sum_{j=1}^n c_j\mathbf{u}_j, \qquad c_j = \mathbf{x}\cdot\mathbf{u}_j.$$

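(The formula for $c_j$ results from taking the inner product of both sides of the equation $\mathbf{x} = \sum_1^n c_k\mathbf{u}_k$ with $\mathbf{u}_j$ to yield $\mathbf{x}\cdot\mathbf{u}_j = \sum_1^n c_k\,\mathbf{u}_k\cdot\mathbf{u}_j = c_j$.)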
8.6. The Infinite-Dimensional Geometry of Fourier Series 393

Similar ideas underlie the study of complex $n$-dimensional vectors. The main
difference is that, since the absolute value $|z|$ of a complex number $z$ is given by
$(z\bar z)^{1/2}$ rather than $(z^2)^{1/2}$, the appropriate definition of inner product is

$$\langle\mathbf{a},\mathbf{b}\rangle = \sum_{j=1}^n a_j\bar b_j \qquad (\mathbf{a},\mathbf{b} \in \mathbb{C}^n). \tag{8.39}$$

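(Recall that the conjugate $\bar z$ of a complex number $z = x + iy$ ($x, y \in \mathbb{R}$) is defined
to be $x - iy$. The notation $\mathbf{a}\cdot\mathbf{b}$ is also used for the complex inner product, but
we introduce the new notation $\langle\mathbf{a},\mathbf{b}\rangle$ to avoid confusion with the real case and
to prepare for further developments.) Thus $\langle\mathbf{a},\mathbf{b}\rangle$ is a linear function of $\mathbf{a}$ but
a conjugate-linear function of $\mathbf{b}$ (meaning that $\langle\mathbf{a},c\mathbf{b}\rangle$ equals $\bar c\langle\mathbf{a},\mathbf{b}\rangle$ rather than
$c\langle\mathbf{a},\mathbf{b}\rangle$), and $\langle\mathbf{b},\mathbf{a}\rangle = \overline{\langle\mathbf{a},\mathbf{b}\rangle}$. The magnitude or norm of the vector $\mathbf{a}$ is still given
by $|\mathbf{a}| = \langle\mathbf{a},\mathbf{a}\rangle^{1/2}$, and we still call two vectors $\mathbf{a}$ and $\mathbf{b}$ orthogonal if $\langle\mathbf{a},\mathbf{b}\rangle = 0$.
As in the real case, a basis $\mathbf{u}_1, \ldots, \mathbf{u}_n$ for $\mathbb{C}^n$ is orthonormal if $\langle\mathbf{u}_j,\mathbf{u}_k\rangle$ is 0 if
$j \ne k$ and 1 if $j = k$. The expansion formula for a vector $\mathbf{x} \in \mathbb{C}^n$ with respect to
an orthonormal basis is exactly the same:

$$\mathbf{x} = \sum_{j=1}^n c_j\mathbf{u}_j, \qquad c_j = \langle\mathbf{x},\mathbf{u}_j\rangle.$$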

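If the basis $\{\mathbf{u}_j\}$ is orthogonal ($\langle\mathbf{u}_j,\mathbf{u}_k\rangle = 0$ for $j \ne k$) but not normalized ($|\mathbf{u}_j|$
not necessarily equal to 1), the formula becomes

$$\mathbf{x} = \sum_{j=1}^n c_j\mathbf{u}_j, \qquad c_j = \frac{\langle\mathbf{x},\mathbf{u}_j\rangle}{|\mathbf{u}_j|^2}. \tag{8.40}$$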
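Formulas (8.39) and (8.40) translate directly into code. This minimal sketch uses Python's built-in complex numbers, and its names are hypothetical. The basis is the orthogonal but unnormalized pair $(1, i)$, $(1, -i)$ of $\mathbb{C}^2$, for which $|\mathbf{u}_j|^2 = 2$. The inner product is linear in its first argument, as in (8.39).

```python
def inner(a, b):
    """Inner product (8.39): linear in a, conjugate-linear in b."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def expand(x, basis):
    """Coefficients c_j = <x, u_j> / |u_j|^2 for an orthogonal basis, as in (8.40)."""
    return [inner(x, u) / inner(u, u).real for u in basis]


def combine(coeffs, basis):
    """Form the vector sum_j c_j u_j."""
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(len(basis[0]))]


u1, u2 = [1, 1j], [1, -1j]   # orthogonal in C^2, each with |u_j|^2 = 2
x = [3, 2 + 1j]
c = expand(x, [u1, u2])
print(c)                      # approximately [(2-1j), (1+1j)]
print(combine(c, [u1, u2]))   # recovers x
```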

Now we are ready to make the conceptual leap from the discrete and finite-dimensional to the continuous and infinite-dimensional. Suppose we are studying
functions on an interval $[a,b]$ (let us say, piecewise continuous, complex-valued
ones). We regard such a function $f$ as a "vector" whose "components" are the
values $f(x)$ as $x$ ranges over $[a, b]$. We define the inner product of two functions
$f$ and $g$ just as in (8.39) except that the sum is replaced by an integral:

$$\langle f,g\rangle = \int_a^b f(x)\overline{g(x)}\,dx. \tag{8.41}$$

Further, we define the norm of a function $f$ to be

$$\|f\| = \left[\int_a^b |f(x)|^2\,dx\right]^{1/2},$$

and we define two functions $f$ and $g$ to be orthogonal on $[a, b]$ if $\langle f, g\rangle = 0$. A
sequence of functions $\{\varphi_n\}$ is called orthogonal if $\langle\varphi_m,\varphi_n\rangle = 0$ for $m \ne n$, and
orthonormal if, in addition, $\|\varphi_n\| = 1$ for all $n$.
For example, take the interval $[a, b]$ to be $[-\pi,\pi]$, and define $e_n(x) = e^{inx}$.
Then, since $\overline{e^{inx}} = e^{-inx}$, by (8.5) we have

$$\langle e_m, e_n\rangle = \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx = \begin{cases} 2\pi & \text{if } m = n,\\ 0 & \text{otherwise.}\end{cases}$$

Thus $\{e_n\}_{-\infty}^{\infty}$ is an orthogonal set; the corresponding orthonormal set is $\{\varphi_n\}_{-\infty}^{\infty}$,
where $\varphi_n = (2\pi)^{-1/2}e_n$. The formula for the Fourier series of a function $f$,

$$f = \sum_{-\infty}^{\infty} c_n e_n, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \frac{\langle f, e_n\rangle}{\|e_n\|^2},$$

is an exact analogue of the formula (8.40) for the expansion of a vector in terms of
an orthogonal basis!
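The orthogonality relations and the formula $c_n = \langle f, e_n\rangle/\|e_n\|^2$ can be checked numerically by replacing the integral in (8.41) with a quadrature rule. The sketch below assumes Simpson's rule on $[-\pi, \pi]$, and the helper names are hypothetical. The check uses the sawtooth $f(x) = x$, whose coefficients from §8.1 are $c_n = (-1)^{n+1}/in$.

```python
import cmath
import math


def simpson(g, a, b, m=2000):
    """Composite Simpson's rule; works for complex-valued g too."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def inner(f, g, a=-math.pi, b=math.pi):
    """Inner product (8.41): the integral of f(x) * conj(g(x)) over [a, b]."""
    return simpson(lambda x: f(x) * complex(g(x)).conjugate(), a, b)


def e(n):
    """The function e_n(x) = exp(i n x)."""
    return lambda x: cmath.exp(1j * n * x)


def fourier_coefficient(f, n):
    """c_n = <f, e_n> / ||e_n||^2, the analogue of (8.40)."""
    return inner(f, e(n)) / inner(e(n), e(n)).real


print(inner(e(2), e(2)), inner(e(2), e(3)))   # about 2*pi and about 0
print(fourier_coefficient(lambda x: x, 1))    # sawtooth: c_1 = (-1)^2 / i = -i
```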
A similar interpretation holds for Fourier cosine and sine series. To wit, it is
easy to verify (Exercise 1) that $\{\cos(n\pi x/l)\}_0^\infty$ and $\{\sin(n\pi x/l)\}_1^\infty$ are orthogonal
sets on the interval $[0,l]$, and that the formulas for the Fourier cosine and sine
coefficients of a function $f$ on $[0,l]$ are analogous to (8.40).
There are some unanswered questions here, however. The inner product $\langle f, g\rangle$
makes sense when $f$ and $g$ are piecewise continuous on $[a, b]$, but we have proved
the validity of Fourier expansions only for piecewise smooth functions. So, what is
the "right" class of functions to consider here? Can we make sense out of Fourier
series for functions that may not be piecewise smooth?
The key insight is that pointwise convergence is the wrong notion of convergence in this situation. Instead, we should use a notion of convergence that arises
from the geometry of the inner product. That is, we think of the set

PC(a, b) = set of all piecewise continuous complex-valued functions on [a, b]

as an "infinite-dimensional Euclidean space" with the notions of length and angle
given by the inner product (8.41). The "distance" between two functions is to be
interpreted as the norm of their difference,

$$\text{Distance from } f \text{ to } g = \|f - g\| = \left[\int_a^b |f(x) - g(x)|^2\,dx\right]^{1/2},$$

and the corresponding notion of convergence is that

$$f_k \to f \iff \|f_k - f\| \to 0, \quad\text{i.e.,}\quad \int_a^b |f_k(x) - f(x)|^2\,dx \to 0.$$

This notion of convergence is called convergence in norm or mean-square convergence.

Note. If the distance $\|f - g\|$ between two piecewise continuous functions is
zero, it does not follow that $f$ is identically equal to $g$, but only that $f(x) = g(x)$ for
all except perhaps finitely many values of $x$. In this setting, it is appropriate not to
worry about this technicality and to think of two functions as being the same when
they differ only at finitely many points. This issue already arose in connection
with the behavior of the Fourier series of $f$ at points where $f$ is discontinuous
(cf. Corollary 8.18).
Mean-square convergence is rather different from pointwise convergence, and
neither one implies the other. For example, let us take $[a, b] = [-1,1]$. If

$$f_k(x) = \begin{cases} k & \text{if } 0 < x < 1/k,\\ 0 & \text{otherwise,}\end{cases}$$

then $f_k \to 0$ pointwise but $\|f_k\| = \left(\int_0^{1/k} k^2\,dx\right)^{1/2} = \sqrt{k} \to \infty$. On the other
hand, if

$$g_k(x) = \begin{cases} 1 & \text{if } -1/k < x < 1/k,\\ 0 & \text{otherwise,}\end{cases}$$

then $\|g_k\| = \left(\int_{-1/k}^{1/k} dx\right)^{1/2} = \sqrt{2/k} \to 0$, but $g_k(0) = 1 \not\to 0$. (By replacing
the interval $(-1/k, 1/k)$ here by an interval $I_k$ whose length tends to 0 but whose
midpoint oscillates back and forth within the interval $[-1,1]$ as $k \to \infty$, one can
construct examples of sequences $\{g_k\}$ that converge in norm but do not converge
at any point.) However, for uniform rather than pointwise convergence there is
something to say.
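The two examples can be reproduced numerically. The sketch below uses hypothetical names and approximates $\|f\|$ with a midpoint rule on $[-1, 1]$. The grid is chosen so that, for the listed $k$, every jump at $0$ and $\pm1/k$ falls on a grid line. The rule is then exact for these step functions up to rounding.

```python
import math


def l2_norm(f, a, b, M=20000):
    """Midpoint-rule approximation of (int_a^b |f(x)|^2 dx)^(1/2)."""
    h = (b - a) / M
    return math.sqrt(h * sum(abs(f(a + (i + 0.5) * h)) ** 2 for i in range(M)))


def f_k(k):
    """k on (0, 1/k), 0 elsewhere: tends to 0 pointwise, but its norm sqrt(k) blows up."""
    return lambda x: k if 0 < x < 1 / k else 0.0


def g_k(k):
    """1 on (-1/k, 1/k), 0 elsewhere: norm sqrt(2/k) tends to 0, yet g_k(0) = 1."""
    return lambda x: 1.0 if -1 / k < x < 1 / k else 0.0


for k in (1, 4, 16, 25, 100):
    print(k, f_k(k)(0.5), l2_norm(f_k(k), -1, 1), g_k(k)(0.0), l2_norm(g_k(k), -1, 1))
```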

8.42 Proposition. If $f_k \to f$ uniformly on $[a, b]$, then $f_k \to f$ in norm on $[a, b]$.

Proof. If $f_k \to f$ uniformly, there is a sequence $\{C_k\}$ of constants such that
$|f_k(x) - f(x)| \le C_k$ for all $x \in [a, b]$ and $C_k \to 0$, so

$$\int_a^b |f_k(x) - f(x)|^2\,dx \le (b - a)C_k^2 \to 0.$$

More generally, $f_k \to f$ in norm provided that $f_k \to f$ pointwise and there is a
constant $C$ such that $|f_k(x)| \le C$ for all $k$ and all $x \in [a, b]$; this follows from the
bounded convergence theorem (4.52).

The introduction of norm convergence is justified by the fact that the Fourier
series of any piecewise continuous function $f$ on $[-\pi, \pi]$ converges in norm to $f$.
This is a substantial result, but there is more to be said before we state a formal
theorem.
The space $PC(a, b)$ of piecewise continuous functions on $[a, b]$ fails to be a
good infinite-dimensional analogue of Euclidean space in one crucial respect: it is
not complete. That is, if $\{f_k\}$ is a sequence in $PC(a, b)$ such that $\|f_j - f_k\| \to 0$
as $j, k \to \infty$, there may not be a function $f \in PC(a, b)$ such that $\|f_k - f\| \to 0$.
For example, with $[a, b] = [0,1]$, let

$$f_k(x) = \begin{cases} x^{-1/4} & \text{if } x > 1/k,\\ 0 & \text{otherwise.}\end{cases}$$

It is easily verified that $\|f_j - f_k\|^2 = 2|j^{-1/2} - k^{-1/2}| \to 0$ as $j,k \to \infty$. However,
the function to which the $f_k$'s are converging is clearly $f(x) = x^{-1/4}$ ($x > 0$),
which does not belong to $PC(0,1)$ because it blows up at 0. Thus, to fill in the
"holes" in $PC(a, b)$ one will have to deal with unbounded functions and improper
integrals. But even this is not enough; with more cleverness one can construct examples where the limiting function $f$ is not (Riemann) integrable on any subinterval
of $[a, b]$.
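Here is a numerical check of the distance formula. For $j < k$ the difference $f_j - f_k$ equals $-x^{-1/4}$ on $(1/k, 1/j]$ and vanishes elsewhere, so $\|f_j - f_k\|^2 = \int_{1/k}^{1/j} x^{-1/2}\,dx$. This sketch is our own; it uses Simpson's rule and hypothetical names. It evaluates that integral and compares it with $2|j^{-1/2} - k^{-1/2}|$.

```python
import math


def simpson(g, a, b, m=4000):
    """Composite Simpson's rule for g on [a, b] with m subintervals (m is made even)."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f_k(k):
    """The k-th function of the example: x^(-1/4) for x > 1/k, 0 otherwise (on [0, 1])."""
    return lambda x: x ** -0.25 if x > 1 / k else 0.0


def dist_sq(j, k):
    """||f_j - f_k||^2 on [0, 1].

    For j < k the difference is -x^(-1/4) on (1/k, 1/j] and 0 elsewhere, so its
    squared norm is the integral of x^(-1/2) over [1/k, 1/j].
    """
    j, k = min(j, k), max(j, k)
    return simpson(lambda x: x ** -0.5, 1 / k, 1 / j)


for j, k in [(1, 4), (4, 25), (10, 100)]:
    print(j, k, dist_sq(j, k), 2 * abs(j ** -0.5 - k ** -0.5))
```

The functions $f_k$ grow without bound near 0 (about $k^{1/4}$ just to the right of $1/k$), while their pairwise distances shrink.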
What is needed here is the Lebesgue integral, which handles integrals of unbounded and discontinuous functions more capably (see §4.8). The appropriate
"completion" of the space $PC(a, b)$ is the space of square-integrable functions,

$$L^2(a, b) = \left\{f : f \text{ is Lebesgue measurable on } [a, b] \text{ and } \int_a^b |f(x)|^2\,dx < \infty\right\},$$

where the integral is a Lebesgue integral. (The name "$L^2$" is pronounced "L-two";
the L is in honor of Lebesgue and the 2 refers to the exponent in $|f(x)|^2$.)
We can now state the general convergence theorem for Fourier series.

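8.43 Theorem. Let $e_n(\theta) = e^{in\theta}$.

a. If $f \in L^2(-\pi, \pi)$, the Fourier series

$$\sum_{-\infty}^{\infty} c_n e_n, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta,$$

converges in norm to $f$, that is,

$$\lim_{N\to\infty}\int_{-\pi}^{\pi}\left|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\right|^2 d\theta = 0.$$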

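b. Bessel's inequality is an equality: For any $f \in L^2(-\pi, \pi)$,

$$\sum_{-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta.$$

c. If $\{c_n\}_{-\infty}^{\infty}$ is any sequence of complex numbers such that $\sum_{-\infty}^{\infty}|c_n|^2$ converges, then the series $\sum_{-\infty}^{\infty} c_n e_n$ converges in norm to a function in $L^2(-\pi, \pi)$.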

Proof. A full proof of Theorem 8.43 is beyond the scope of this book. (One may
be found in Jones [9, p. 325] or Rudin [18, pp. 328ff.].) However, the idea is as
follows. If $f$ is continuous and piecewise smooth, we know that its Fourier series
converges uniformly (Theorem 8.29) and hence in norm, so (a) is valid for such $f$.
We then obtain the result for arbitrary $f \in L^2(-\pi, \pi)$ by a limiting argument that
involves proving that any function in $L^2(-\pi, \pi)$ is the limit in norm of a sequence
of continuous, piecewise smooth functions. (A partial result in this direction is
indicated in Exercise 7.) (b) follows easily because, as we showed in the proof of
Bessel's inequality,

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\right|^2 d\theta,$$

and the integral on the right tends to zero as $N \to \infty$ since the series converges
in norm to $f$. (c) follows from (b) and the completeness of $L^2(-\pi, \pi)$. Indeed, by
(b),

$$\int_{-\pi}^{\pi}\left|\sum_{M<|n|\le N} c_n e^{in\theta}\right|^2 d\theta = 2\pi\sum_{M<|n|\le N}|c_n|^2,$$

so the partial sums of the series $\sum c_n e_n$ are Cauchy in norm; by completeness, the
series converges in norm. □
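The identity used for (b) can be tested on a function with jump discontinuities: the square wave $\operatorname{sgn}\theta$ on $(-\pi,\pi)$. A direct computation (ours, not the text's) gives $c_n = 2/(i\pi n)$ for odd $n$ and $c_n = 0$ for even $n$. The sketch below uses hypothetical names and computes the mean-square error of the partial sums two ways: from the identity and by a midpoint-rule integral. The two agree and tend to zero, even though the convergence cannot be uniform, since the limit is discontinuous.

```python
import math


def sgn_partial_sum(N, theta):
    """Symmetric partial sum S_N of the Fourier series of sgn(theta) on (-pi, pi).

    Uses c_n = 2/(i pi n) for odd n and 0 for even n (our own computation);
    the terms n and -n combine to (4/(pi n)) sin(n theta).
    """
    return sum(4 / (math.pi * n) * math.sin(n * theta) for n in range(1, N + 1, 2))


def ms_error_bessel(N):
    """(1/2pi)||f - S_N||^2 via the identity in the proof: 1 - sum_{|n|<=N} |c_n|^2."""
    return 1 - sum(8 / (math.pi * n) ** 2 for n in range(1, N + 1, 2))


def ms_error_direct(N, M=4000):
    """Midpoint-rule value of (1/2pi) * integral of |sgn(theta) - S_N(theta)|^2 over [-pi, pi]."""
    h = 2 * math.pi / M
    total = 0.0
    for i in range(M):
        theta = -math.pi + (i + 0.5) * h
        total += (math.copysign(1.0, theta) - sgn_partial_sum(N, theta)) ** 2
    return total * h / (2 * math.pi)


for N in (1, 5, 11):
    print(N, ms_error_bessel(N), ms_error_direct(N))
print(ms_error_bessel(1001))   # keeps shrinking as N grows
```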

Theorem 8.43 says that $\{e^{inx}\}_{-\infty}^{\infty}$ is an orthogonal basis for $L^2(-\pi,\pi)$, that
is, an orthogonal set with the property that every function in $L^2(-\pi,\pi)$ can be
expanded uniquely as a norm-convergent series of scalar multiples of functions in
the set. Likewise, $\{\cos nx\}_0^\infty$ and $\{\sin nx\}_1^\infty$ are orthogonal bases for $L^2(0,\pi)$;
see Exercises 1 and 2.
The equality in Theorem 8.43b,

$$\sum_{-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta, \tag{8.44}$$

is known as Parseval's identity; it is the infinite-dimensional analogue of the
Pythagorean theorem for finite-dimensional vectors, if we think of $f$ as an infinite-dimensional vector and the $c_n$'s as the components of this vector with respect to the
orthogonal basis $\{e_n\}$. The factor of $2\pi$ is there because $\|e_n\|^2 = 2\pi$.
As an illustration of the use of Parseval's identity, we give another derivation
of the formula $\sum_1^\infty n^{-2} = \pi^2/6$. (The first one was in Example 2 of §8.2.) Let
$f$ be the sawtooth wave function ($f(\theta) = \theta$ for $|\theta| < \pi$). We calculated in §8.1
that its Fourier coefficients are given by $c_0 = 0$ and $c_n = (-1)^{n+1}/in$ for $n \ne 0$.
Therefore,

$$\sum_{1}^{\infty}\frac{1}{n^2} = \frac12\sum_{n\ne0}\frac{1}{n^2} = \frac12\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}\theta^2\,d\theta = \frac{\pi^2}{6}.$$
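Parseval's identity for the sawtooth can also be checked numerically. This sketch (hypothetical names) sums $|c_n|^2$ over $0 < |n| \le N$, using the coefficients $c_n = (-1)^{n+1}/in$. It then compares the result with $(1/2\pi)\int_{-\pi}^{\pi}\theta^2\,d\theta = \pi^2/3$, computed by Simpson's rule.

```python
import math


def simpson(g, a, b, m=2000):
    """Composite Simpson's rule for g on [a, b] with m subintervals (m is made even)."""
    m += m % 2
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def coefficient_energy(N):
    """Sum over 0 < |n| <= N of |c_n|^2, with the sawtooth coefficients c_n = (-1)^(n+1)/(i n)."""
    return sum(abs((-1) ** (n + 1) / (1j * n)) ** 2 for n in range(-N, N + 1) if n != 0)


def function_energy():
    """(1/2pi) * integral of theta^2 over [-pi, pi], computed numerically (exact value pi^2/3)."""
    return simpson(lambda t: t * t, -math.pi, math.pi) / (2 * math.pi)


N = 10000
print(coefficient_energy(N), function_energy(), math.pi ** 2 / 3)
print(coefficient_energy(N) / 2, math.pi ** 2 / 6)   # partial sum of 1/n^2 versus pi^2/6
```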

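Parseval's identity easily yields the following generalization of itself, which is
often useful:

8.45 Corollary. If $f, g \in L^2(-\pi, \pi)$ have the Fourier series $\sum c_n e_n$ and $\sum\gamma_n e_n$,
then

$$\sum_{-\infty}^{\infty} c_n\bar\gamma_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\overline{g(\theta)}\,d\theta. \tag{8.46}$$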

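Proof. We apply (8.44) to the functions $f$, $g$, and $f+g$:

$$\begin{aligned}
\sum\big(|c_n|^2+2\operatorname{Re}c_n\overline{\gamma_n}+|\gamma_n|^2\big)&=\sum|c_n+\gamma_n|^2\\
&=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)+g(\theta)|^2\,d\theta=\frac{1}{2\pi}\int_{-\pi}^{\pi}\big(|f(\theta)|^2+2\operatorname{Re}f(\theta)\overline{g(\theta)}+|g(\theta)|^2\big)\,d\theta\\
&=\sum|c_n|^2+\frac{1}{\pi}\operatorname{Re}\int_{-\pi}^{\pi}f(\theta)\overline{g(\theta)}\,d\theta+\sum|\gamma_n|^2.
\end{aligned}$$

It follows that $\operatorname{Re}\sum c_n\overline{\gamma_n}=\operatorname{Re}\,(1/2\pi)\int_{-\pi}^{\pi}f(\theta)\overline{g(\theta)}\,d\theta$. The same calculation,
with $f$ replaced by $if$, shows that the imaginary parts are also equal. □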
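Formula (8.46) can be tested numerically on concrete functions. In the Python sketch below, $f(\theta)=e^{\cos\theta}$ and $g(\theta)=e^{i\theta}/(2+\sin\theta)$ are hypothetical test functions, chosen because they are smooth and $2\pi$-periodic, so their coefficients decay quickly and a truncation at $|n|\le 20$ is harmless. All integrals are approximated by the midpoint rule with an arbitrarily chosen number of nodes.

```python
import cmath
import math

SAMPLES = 256  # midpoint nodes on [-pi, pi]; very accurate for smooth periodic integrands


def nodes():
    h = 2 * math.pi / SAMPLES
    return [-math.pi + (j + 0.5) * h for j in range(SAMPLES)], h


def coefficient(func, n):
    """c_n = (1/2pi) * integral func(t) e^{-int} dt, approximated by the midpoint rule."""
    ts, h = nodes()
    return sum(func(t) * cmath.exp(-1j * n * t) for t in ts) * h / (2 * math.pi)


def f(t):
    return math.exp(math.cos(t))  # hypothetical real test function


def g(t):
    return cmath.exp(1j * t) / (2 + math.sin(t))  # hypothetical complex test function


N = 20  # truncation of the coefficient sum
lhs = sum(coefficient(f, n) * coefficient(g, n).conjugate() for n in range(-N, N + 1))

ts, h = nodes()
rhs = sum(f(t) * g(t).conjugate() for t in ts) * h / (2 * math.pi)

print(lhs, rhs, abs(lhs - rhs))
```

Taking $g=f$ in the same script reduces it to a check of (8.44).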

The Fourier bases $\{e^{inx}\}_{-\infty}^{\infty}$, $\{\cos nx\}_0^\infty$, and $\{\sin nx\}_1^\infty$ play a special role
among all the orthogonal bases for $L^2(-\pi,\pi)$ and $L^2(0,\pi)$ because these functions
are eigenfunctions for the differential operators $d/dx$ and $d^2/dx^2$. To explain this
in more detail, we recall that an eigenvector for a linear transformation $T$ on $\mathbb{R}^n$
or $\mathbb{C}^n$ is a nonzero vector $\mathbf{x}$ such that $T\mathbf{x}=\lambda\mathbf{x}$ for some scalar $\lambda$. (See Appendix
A, (A.56)-(A.58).) In our situation, the "vectors" are functions in $L^2(-\pi,\pi)$ or
$L^2(0,\pi)$, and the linear transformation in question is $d/dx$ or $d^2/dx^2$, defined not
on the whole L2 space but on a suitable subspace of functions that possess the
requisite derivatives and satisfy certain boundary conditions. Indeed, we have

$$\frac{d}{dx}e^{inx}=in\,e^{inx},\qquad\frac{d^2}{dx^2}\cos nx=-n^2\cos nx,\qquad\frac{d^2}{dx^2}\sin nx=-n^2\sin nx.$$

The functions $e^{inx}$ are precisely the eigenfunctions of $d/dx$ on $[-\pi,\pi]$ that satisfy
the periodicity condition $f(-\pi)=f(\pi)$, and the functions $\cos nx$ and $\sin nx$ are
precisely the eigenfunctions of $d^2/dx^2$ on $[0,\pi]$ that satisfy the boundary conditions
$f'(0)=f'(\pi)=0$ and $f(0)=f(\pi)=0$, respectively. The Fourier expansion
of a function therefore provides the analogue of the spectral theorem (A.58)
for these fundamental differential operators, with all the resulting simplifications
that one expects when one finds an orthonormal eigenbasis for a matrix.
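The matrix analogy can be made literal with a finite-difference approximation. The Python sketch below is our own illustration (the cell-centred grid on $(0,\pi)$, the mirror ghost values that impose $f'(0)=f'(\pi)=0$, and the grid size are all assumptions, not from the text). It checks that the sampled function $\cos nx$ is an eigenvector of the discrete second-difference operator, with eigenvalue $-(4/h^2)\sin^2(nh/2)$, which tends to $-n^2$ as the mesh size $h\to0$.

```python
import math


def neumann_second_difference(v, h):
    """Second-difference approximation of d^2/dx^2 on a cell-centred grid. The boundary
    conditions f'(0) = f'(pi) = 0 enter through mirror ghost values v[-1] = v[0], v[M] = v[M-1]."""
    M = len(v)
    out = []
    for j in range(M):
        left = v[j - 1] if j > 0 else v[0]
        right = v[j + 1] if j < M - 1 else v[M - 1]
        out.append((left - 2 * v[j] + right) / h ** 2)
    return out


M = 400
h = math.pi / M
xs = [(j + 0.5) * h for j in range(M)]  # cell centres in (0, pi)

for n in range(5):
    v = [math.cos(n * x) for x in xs]
    Av = neumann_second_difference(v, h)
    # Sampled cos(nx) is an eigenvector of the discrete operator with this eigenvalue.
    lam = -(4 / h ** 2) * math.sin(n * h / 2) ** 2
    residual = max(abs(a - lam * b) for a, b in zip(Av, v))
    print(n, lam, residual)
```

The residuals are at rounding level, and the eigenvalues printed are close to $0,-1,-4,-9,-16$.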
For example, we can rederive the solution (8.35) of the insulated heat flow
problem (8.34) as follows. To solve the heat equation $\partial_tu=k\,\partial_x^2u$ subject to the
boundary conditions $\partial_xu(0,t)=\partial_xu(l,t)=0$, we take $u$ to be the sum of a series
of eigenfunctions of $\partial_x^2$ satisfying these boundary conditions:

$$u(x,t)=\sum_{0}^{\infty}a_n(t)\cos\frac{n\pi x}{l}.$$

Plugging this into the heat equation turns the partial differential equation $\partial_tu=k\,\partial_x^2u$
into the ordinary differential equations $a_n'(t)=-k(n\pi/l)^2a_n(t)$ for the
coefficients. The latter are easily solved to yield $a_n(t)=a_ne^{-k(n\pi/l)^2t}$ and hence
the solution (8.35).
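The recipe above translates directly into code. The Python sketch below assumes an initial temperature $u(x,0)$, computes its cosine coefficients (called `b_n` here, a name of our own, to keep them apart from the $a_n(t)$ of the text), and lets each eigenmode decay independently. The rod length, the constant $k$, the initial data, and the midpoint-rule quadrature are all hypothetical choices.

```python
import math


def cosine_coefficients(f, l, N, samples=2000):
    """Coefficients b_0, ..., b_N with f(x) ~ sum_n b_n cos(n pi x / l) on [0, l]:
    b_0 = (1/l) int f dx and b_n = (2/l) int f(x) cos(n pi x / l) dx, by the midpoint rule."""
    h = l / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    coeffs = []
    for n in range(N + 1):
        integral = sum(f(x) * math.cos(n * math.pi * x / l) for x in xs) * h
        coeffs.append(integral / l if n == 0 else 2 * integral / l)
    return coeffs


def insulated_heat(coeffs, k, l, x, t):
    """u(x, t) = sum_n b_n exp(-k (n pi / l)^2 t) cos(n pi x / l): each mode decays on its own."""
    return sum(
        b_n * math.exp(-k * (n * math.pi / l) ** 2 * t) * math.cos(n * math.pi * x / l)
        for n, b_n in enumerate(coeffs)
    )


def initial_temperature(x):
    # Hypothetical initial data u(x, 0) on a rod of length l = 2.
    return 3 + math.cos(math.pi * x / 2.0)


l, k = 2.0, 0.5  # hypothetical rod length and diffusivity
b = cosine_coefficients(initial_temperature, l, N=8)
print(insulated_heat(b, k, l, x=0.3, t=1.0))
# For this initial data the exact solution is 3 + exp(-k (pi/l)^2 t) cos(pi x / l).
print(3 + math.exp(-k * (math.pi / l) ** 2 * 1.0) * math.cos(math.pi * 0.3 / l))
```

As $t$ grows, every mode with $n\ge1$ dies out and $u$ tends to the mean value $b_0$, as one expects for an insulated rod.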
There is an extensive theory of eigenfunction expansions associated to boundary
value problems. Many such expansions yield interesting orthogonal bases for
$L^2$ spaces. Others, in which there is a "continuous spectrum" instead of (or in addition
to) a "discrete spectrum," involve integrals instead of (or in addition to) infinite
series. A great deal of interesting mathematics has arisen from these ideas, and its
ramifications spread far beyond the problems for which it was originally devised.
An introduction to this subject can be found, for example, in Folland [6].

EXERCISES

1. Show that $\{\cos nx\}_0^\infty$ and $\{\sin nx\}_1^\infty$ are orthogonal sets of functions on $[0,\pi]$.
What are the norms of these functions?

2. Deduce from Theorem 8.43 that if $f\in L^2(0,\pi)$, the Fourier cosine and sine
series of $f$ both converge to $f$ in norm.

3. Determine the constants a, b, and c so that the functions

$$f_0(x)=1,\qquad f_1(x)=x+a,\qquad f_2(x)=x^2+bx+c$$

form an orthogonal set on [0,1].

4. Suppose $\{\varphi_n\}_0^\infty$ is an orthonormal set of functions on $[0,1]$, and let $\varphi_n^+$ and
$\varphi_n^-$ be the even and odd extensions of $\varphi_n$ to $[-1,1]$. Show that $\{\varphi_n^+/\sqrt2\}_0^\infty\cup
\{\varphi_n^-/\sqrt2\}_0^\infty$ is an orthonormal set on $[-1,1]$.
5. Suppose $\{\varphi_n\}_0^\infty$ is an orthonormal set of functions on $[a,b]$. Given $c>0$ and
$d\in\mathbb{R}$, let $\psi_n(x)=\sqrt{c}\,\varphi_n(cx+d)$. Show that $\{\psi_n\}_0^\infty$ is an orthonormal set
on $[(a-d)/c,\,(b-d)/c]$.
6. Suppose $\{\varphi_n\}_0^\infty$ is an orthonormal set of functions on $[0,1]$, and let $\psi_n(x)=
\sqrt{2x}\,\varphi_n(x^2)$. Show that $\{\psi_n\}_0^\infty$ is also an orthonormal set on $[0,1]$.
7. Show that any piecewise continuous function on $[a,b]$ is the limit in norm of
a sequence of continuous functions on $[a,b]$ by the argument suggested by the
following picture:

[Figure: a piecewise continuous function shown as "= lim" of continuous functions.]

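8. Show that in terms of the cosine and sine coefficients $a_n$ and $b_n$ defined by
(8.7), Parseval's identity takes the form

$$\frac14|a_0|^2+\frac12\sum_{1}^{\infty}\big(|a_n|^2+|b_n|^2\big)=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta.$$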

9. Evaluate the following series by applying Parseval's identity, in the form given
in Exercise 8, to certain of the Fourier series found in the exercises of §8.1 and
§8.3. (Remember that the constant term is $\frac12a_0$, not $a_0$.)

a. $\sum_1^\infty\cdots$
b. $\sum_1^\infty\dfrac{1}{(2n-1)^{\cdots}}$
c. $\sum_1^\infty\cdots$
d. $\sum_1^\infty\dfrac{\sin^2 na}{n^2}$ (First assume that $0<a<\pi$, then deduce the general
result.)
10. Suppose that $f$ is $2\pi$-periodic, real-valued, and of class $C^1$. Show that $f'$ is
orthogonal to $f$ on $[-\pi,\pi]$ in two ways: (i) directly from the fact that $2ff'=
(f^2)'$, and (ii) by expanding $f$ in a Fourier series and using (8.46). (Hint: When
$f$ is real we have $c_{-n}=\overline{c_n}$; why?)

8.7 The Isoperimetric Inequality

We conclude this chapter by using Fourier analysis together with Green's theorem
(thereby joining two of the main threads of this book) to show that among all simple
closed curves in the plane with a given length, the circle is the one that encloses the
greatest area.

First, a few preliminaries. Suppose $\mathbf{g}:[a,b]\to\mathbb{R}^2$ is a continuous, piecewise
smooth parametrized curve in the plane. (Thus, the components of $\mathbf{g}$ are
continuous, piecewise smooth functions on $[a,b]$; $\mathbf{g}'(t)$ is defined except perhaps
at finitely many points, and we make the usual nondegeneracy assumption that
$\mathbf{g}'(t)\neq\mathbf{0}$.) The arc-length function $s=\varphi(t)=\int_a^t|\mathbf{g}'(u)|\,du$ is a continuous,
piecewise smooth, strictly increasing function on $[a,b]$. It therefore has an inverse
function, $t=\varphi^{-1}(s)$, with the same properties, defined on the interval $[0,L]$ where
$L=\varphi(b)$ is the total length of the curve. We can then reparametrize the curve by
$\mathbf{h}(s)=\mathbf{g}(\varphi^{-1}(s))$, $s\in[0,L]$; we then say that the curve is parametrized by arc
length. In this parametrization, the speed $|\mathbf{h}'(s)|$ is identically equal to 1 (except at
isolated points where it is undefined):

$$|\mathbf{h}'(s)|=\frac{|\mathbf{g}'(\varphi^{-1}(s))|}{\varphi'(\varphi^{-1}(s))}=1. \tag{8.47}$$
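The reparametrization can also be carried out numerically. The Python sketch below uses a hypothetical example curve, the ellipse $\mathbf{g}(t)=(2\cos t,\sin t)$ on $[0,2\pi]$. It tabulates $\varphi$ with the trapezoid rule and inverts it by bisection plus linear interpolation (the table size and both numerical methods are our own choices). Difference quotients of $\mathbf{h}$ then have length close to 1, in line with (8.47).

```python
import bisect
import math


def g(t):
    return (2 * math.cos(t), math.sin(t))  # hypothetical curve: an ellipse


def speed(t):
    return math.hypot(-2 * math.sin(t), math.cos(t))  # |g'(t)|


# Tabulate the arc-length function phi(t) = int_0^t |g'(u)| du with the trapezoid rule.
M = 20000
ts = [2 * math.pi * j / M for j in range(M + 1)]
phi = [0.0]
for j in range(M):
    phi.append(phi[-1] + 0.5 * (speed(ts[j]) + speed(ts[j + 1])) * (ts[j + 1] - ts[j]))
L = phi[-1]  # total length of the curve


def phi_inverse(s):
    """t = phi^{-1}(s): locate s in the table by bisection, then interpolate linearly."""
    j = min(max(bisect.bisect_right(phi, s) - 1, 0), M - 1)
    frac = (s - phi[j]) / (phi[j + 1] - phi[j])
    return ts[j] + frac * (ts[j + 1] - ts[j])


def h(s):
    return g(phi_inverse(s))  # the arc-length reparametrization


delta = 1e-3
print("L =", L)
for s in (0.5, 2.0, 5.0):
    (x0, y0), (x1, y1) = h(s), h(s + delta)
    print(s, math.hypot(x1 - x0, y1 - y0) / delta)  # approximately 1
```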

Now, suppose in addition that our curve is a simple closed curve; this means
that, for $0\le s_1<s_2\le L$, $\mathbf{h}(s_1)=\mathbf{h}(s_2)$ only when $s_1=0$ and $s_2=L$. We can
then extend the function $\mathbf{h}$ from $[0,L]$ to $\mathbb{R}$ by requiring it to be $L$-periodic; this
extension is still continuous and piecewise smooth. (Indeed, this is the natural way
to think of a simple closed curve. We think of $\theta=2\pi s/L$ as the angular coordinate
on a circle; then $\mathbf{h}(s)$ traces out the curve as $\theta$ goes once around the circle.)
Finally, we observe that we can identify $\mathbb{R}^2$ with the complex plane $\mathbb{C}$ and
the vector-valued function $\mathbf{h}=(h_1,h_2)$ with the complex-valued function $\zeta=
h_1+ih_2$. The "velocity" $\mathbf{h}'(s)$ then turns into $\zeta'(s)$, and the condition (8.47)
becomes $|\zeta'(s)|=1$.
Now we are ready to state our theorem:

8.48 Theorem (The Isoperimetric Inequality). Suppose that $C$ is a piecewise
smooth, simple closed curve in the plane. Let $L$ be the length of $C$ and $A$ the area
of the region enclosed by $C$. Then $A\le L^2/4\pi$, with equality if and only if $C$ is a
circle of radius $L/2\pi$.
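A quick hand check of the constant for two simple curves (an illustration of our own): the circle attains the bound, while a square of side $a$, which is piecewise smooth, falls strictly below it.

$$\text{circle, radius } r:\ L=2\pi r,\ A=\pi r^2=\frac{(2\pi r)^2}{4\pi}=\frac{L^2}{4\pi};\qquad\text{square, side } a:\ L=4a,\ A=a^2<\frac{4a^2}{\pi}=\frac{(4a)^2}{4\pi}=\frac{L^2}{4\pi}.$$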

Proof. We identify the plane with $\mathbb{C}$. Dilating the plane by a factor of $r$, $z\mapsto rz$,
has the effect of multiplying the length of a curve by $r$ and the area of a region by $r^2$,
so it is enough to consider the case $L=2\pi$, for which the conclusion is that $A\le\pi$.
By the preceding remarks, then, we can assume that $C$ is given by $z=\zeta(s)$, where
$\zeta$ is a continuous, piecewise smooth, $2\pi$-periodic, complex-valued function on $\mathbb{R}$,
and $|\zeta'(s)|=1$ (except at isolated points where $\zeta'(s)$ is undefined). We expand $\zeta$
in a Fourier series:

$$\zeta(s)=\sum_{-\infty}^{\infty}c_ne^{ins}.$$

Since $\zeta$ is continuous and piecewise smooth, the $n$th Fourier coefficient of $\zeta'$ is
$inc_n$, by Theorem 8.26. Since $|\zeta'(s)|=1$, Parseval's identity implies that

$$\sum_{-\infty}^{\infty}n^2|c_n|^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|\zeta'(s)|^2\,ds=1. \tag{8.49}$$

On the other hand, by Green's theorem (see Example 3 in §5.2), the area of the
region enclosed by C is

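$$A=\frac12\left|\int_C x\,dy-y\,dx\right|.$$

(The absolute value is there because we do not specify whether $C$ is positively or
negatively oriented.) Moreover,

$$x\,dy-y\,dx=\operatorname{Im}\big[(x-iy)(dx+i\,dy)\big]=\operatorname{Im}\,\bar z\,dz,$$

so

$$A=\frac12\left|\operatorname{Im}\int_C\bar z\,dz\right|=\frac12\left|\operatorname{Im}\int_{-\pi}^{\pi}\overline{\zeta(s)}\,\zeta'(s)\,ds\right|.$$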

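Thus, by the general form (8.46) of Parseval's identity,

$$A=\pi\left|\operatorname{Im}\sum_{-\infty}^{\infty}inc_n\overline{c_n}\right|=\pi\left|\sum_{-\infty}^{\infty}n|c_n|^2\right|.$$

Comparing this with (8.49) yields the desired upper bound for $A$:

$$A=\pi\left|\sum_{-\infty}^{\infty}n|c_n|^2\right|\le\pi\sum_{-\infty}^{\infty}|n|\,|c_n|^2\le\pi\sum_{-\infty}^{\infty}n^2|c_n|^2=\pi.$$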

Moreover, the second inequality is strict unless $c_n=0$ for $|n|>1$. In that case,
the first inequality becomes

$$\big||c_1|^2-|c_{-1}|^2\big|\le|c_1|^2+|c_{-1}|^2,$$

which is strict unless either $c_1$ or $c_{-1}$ vanishes. Thus $A<\pi$ unless $\zeta(s)=
c_0+c_1e^{is}$ or $\zeta(s)=c_0+c_{-1}e^{-is}$, both of which describe a circle centered at
$c_0$, traversed counterclockwise or clockwise, respectively. (In either case the radius
is 1 since $|c_{\pm1}|=|\zeta'(s)|=1$.) □
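The area formula $A=\pi\big|\sum n|c_n|^2\big|$ obtained in the proof comes from Green's theorem and (8.46) alone, so it holds for any $2\pi$-periodic parametrization that traces the curve once; only (8.49) used arc length. The Python sketch below applies it to a hypothetical ellipse $x=a\cos s$, $y=b\sin s$, compares the result with $\pi ab$, and checks the inequality $A\le L^2/4\pi$. Coefficients and length are computed by the midpoint rule with an arbitrarily chosen number of nodes.

```python
import cmath
import math

SAMPLES = 512


def fourier_coefficients(zeta, N):
    """c_n = (1/2pi) int_{-pi}^{pi} zeta(s) e^{-ins} ds for |n| <= N (midpoint rule)."""
    h = 2 * math.pi / SAMPLES
    ss = [-math.pi + (j + 0.5) * h for j in range(SAMPLES)]
    return {n: sum(zeta(s) * cmath.exp(-1j * n * s) for s in ss) * h / (2 * math.pi)
            for n in range(-N, N + 1)}


def area_from_coefficients(c):
    # A = pi * |sum n |c_n|^2|, the Fourier form of the Green's-theorem area formula.
    return math.pi * abs(sum(n * abs(cn) ** 2 for n, cn in c.items()))


def curve_length(zeta_prime):
    # L = int_{-pi}^{pi} |zeta'(s)| ds, valid for any parametrization (midpoint rule).
    h = 2 * math.pi / SAMPLES
    return sum(abs(zeta_prime(-math.pi + (j + 0.5) * h)) for j in range(SAMPLES)) * h


a, b = 2.0, 1.0  # hypothetical semi-axes


def zeta(s):
    return complex(a * math.cos(s), b * math.sin(s))


def zeta_prime(s):
    return complex(-a * math.sin(s), b * math.cos(s))


c = fourier_coefficients(zeta, N=5)
A = area_from_coefficients(c)
L = curve_length(zeta_prime)
print(A, math.pi * a * b)        # area from the coefficients vs. pi a b
print(A, L ** 2 / (4 * math.pi))  # isoperimetric inequality: A <= L^2 / (4 pi)
```

Here $c_1=(a+b)/2$ and $c_{-1}=(a-b)/2$, so the formula gives $\pi\big(|c_1|^2-|c_{-1}|^2\big)=\pi ab$; for $a=b$ the ellipse is a circle and the inequality becomes an equality.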
