REAL ANALYSIS
Jiří Lebl
Oklahoma State University
Introduction to Real Analysis
This open text is disseminated via the Open Education Resource (OER) LibreTexts Project (https://LibreTexts.org) and like the
hundreds of other open texts available within this powerful platform, it is licensed to be freely used, adapted, and distributed.
This book is openly licensed which allows you to make changes, save, and print this book as long as the applicable license is
indicated at the bottom of each page.
Instructors can adopt existing LibreTexts texts or Remix them to quickly build course-specific resources to meet the needs of
their students. Unlike traditional textbooks, LibreTexts’ web based origins allow powerful integration of advanced features and
new technologies to support learning.
The LibreTexts mission is to unite students, faculty and scholars in a cooperative effort to develop an easy-to-use online
platform for the construction, customization, and dissemination of OER content to reduce the burdens of unreasonable
textbook costs to our students and society. The LibreTexts project is a multi-institutional collaborative venture to develop the
next generation of open-access texts to improve postsecondary education at all levels of higher learning by developing an
Open Access Resource environment. The project currently consists of 13 independently operating and interconnected libraries
that are constantly being optimized by students, faculty, and outside experts to supplant conventional paper-based books.
These free textbook alternatives are organized within a central environment that is both vertically (from advanced to basic level)
and horizontally (across different fields) integrated.
The LibreTexts libraries are Powered by MindTouch® and are supported by the Department of Education Open Textbook Pilot
Project, the UC Davis Office of the Provost, the UC Davis Library, the California State University Affordable Learning
Solutions Program, and Merlot. This material is based upon work supported by the National Science Foundation under Grant
No. 1246120, 1525057, and 1413739. Unless otherwise noted, LibreTexts content is licensed by CC BY-NC-SA 3.0.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Science Foundation nor the US Department of Education.
Have questions or comments? For information about adoptions or adaptions contact info@LibreTexts.org. More information
on our activities can be found via Facebook (https://facebook.com/Libretexts), Twitter (https://twitter.com/libretexts), or our
blog (http://Blog.Libretexts.org).
1: INTRODUCTION
1.1: ABOUT THIS BOOK
1.2: ABOUT ANALYSIS
1.3: BASIC SET THEORY
2: REAL NUMBERS
2.1: BASIC PROPERTIES
2.2: THE SET OF REAL NUMBERS
2.3: ABSOLUTE VALUE
2.4: INTERVALS AND THE SIZE OF R
2.5: DECIMAL REPRESENTATION OF THE REALS
4: CONTINUOUS FUNCTIONS
4.1: LIMITS OF FUNCTIONS
4.2: CONTINUOUS FUNCTIONS
4.3: MIN-MAX AND INTERMEDIATE VALUE THEOREMS
4.4: UNIFORM CONTINUITY
4.5: LIMITS AT INFINITY
4.6: MONOTONE FUNCTIONS AND CONTINUITY
5: THE DERIVATIVE
5.1: THE DERIVATIVE
5.2: MEAN VALUE THEOREM
5.3: TAYLOR’S THEOREM
5.4: INVERSE FUNCTION THEOREM
7: SEQUENCES OF FUNCTIONS
7.1: POINTWISE AND UNIFORM CONVERGENCE
7.2: INTERCHANGE OF LIMITS
7.3: PICARD’S THEOREM
8: METRIC SPACES
In mathematics, a metric space is a set for which distances between all members of the set are defined. Those distances, taken together,
are called a metric on the set. A metric on a space induces topological properties like open and closed sets, which lead to the study of
more abstract topological spaces.
BACK MATTER
INDEX
GLOSSARY
CHAPTER OVERVIEW
1: INTRODUCTION
1.1: About this book
Introduction
About this book
This book is a one semester course in basic analysis. It started its life as my lecture notes for teaching Math 444 at the University of Illinois at Urbana-Champaign (UIUC) in Fall semester
2009. Later I added the metric space chapter to teach Math 521 at University of Wisconsin–Madison (UW). A prerequisite for this course is a basic proof course, using for example , , or .
It should be possible to use the book for both a basic course for students who do not necessarily wish to go to graduate school (such as UIUC 444), but also as a more advanced one-semester
course that also covers topics such as metric spaces (such as UW 521). Here are my suggestions for what to cover in a semester course. For a slower course such as UIUC 444:
§0.3, §1.1–§1.4, §2.1–§2.5, §3.1–§3.4, §4.1–§4.2, §5.1–§5.3, §6.1–§6.3
For a more rigorous course covering metric spaces that runs quite a bit faster (such as UW 521):
§0.3, §1.1–§1.4, §2.1–§2.5, §3.1–§3.4, §4.1–§4.2, §5.1–§5.3, §6.1–§6.2, §7.1–§7.6
It should also be possible to run a faster course without metric spaces covering all sections of chapters 0 through 6. The approximate number of lectures given in the section notes through
chapter 6 are a very rough estimate and were designed for the slower course. The first few chapters of the book can be used in an introductory proofs course as is for example done at Iowa
State University Math 201, where this book is used in conjunction with Hammack’s Book of Proof .
The book normally used for the class at UIUC is Bartle and Sherbert, Introduction to Real Analysis third edition . The structure of the beginning of the book somewhat follows the standard
syllabus of UIUC Math 444 and therefore has some similarities with . A major difference is that we define the Riemann integral using Darboux sums and not tagged partitions. The Darboux
approach is far more appropriate for a course of this level.
Our approach allows us to fit a course such as UIUC 444 within a semester and still spend some extra time on the interchange of limits and end with Picard’s theorem on the existence and
uniqueness of solutions of ordinary differential equations. This theorem is a wonderful example that uses many results proved in the book. For more advanced students, material may be
covered faster so that we arrive at metric spaces and prove Picard’s theorem using the fixed point theorem as is usual.
Other excellent books exist. My favorite is Rudin’s excellent Principles of Mathematical Analysis or as it is commonly and lovingly called baby Rudin (to distinguish it from his other great
analysis textbook). I took a lot of inspiration and ideas from Rudin. However, Rudin is a bit more advanced and ambitious than this present course. For those that wish to continue mathematics,
Rudin is a fine investment. An inexpensive and somewhat simpler alternative to Rudin is Rosenlicht’s Introduction to Analysis . There is also the freely downloadable Introduction to Real
Analysis by William Trench .
A note about the style of some of the proofs: Many proofs traditionally done by contradiction, I prefer to do by a direct proof or by contrapositive. While the book does include proofs by
contradiction, I only do so when the contrapositive statement seemed too awkward, or when contradiction follows rather quickly. In my opinion, contradiction is more likely to get beginning
students into trouble, as we are talking about objects that do not exist.
I try to avoid unnecessary formalism where it is unhelpful. Furthermore, the proofs and the language get slightly less formal as we progress through the book, as more and more details are left
out to avoid clutter.
As a general rule, I use := instead of = to define an object rather than to simply show equality. I use this symbol rather more liberally than is usual for emphasis. I use it even when the context
is “local,” that is, I may simply define a function \(f(x) := x^2\) for a single exercise or example.
Finally, I would like to acknowledge Jana Maříková, Glen Pugh, Paul Vojta, Frank Beatrous, Sönmez Şahutoğlu, Jim Brandt, Kenji Kozai, and Arthur Busch, for teaching with the book and
giving me lots of useful feedback. Frank Beatrous wrote the University of Pittsburgh version extensions, which served as inspiration for many of the recent additions. I would also like to thank
Dan Stoneham, Jeremy Sutter, Eliya Gwetta, Daniel Pimentel-Alarcón, Steve Hoerning, Yi Zhang, Nicole Caviris, Kristopher Lee, Baoyue Bi, Hannah Lund, Trevor Mannella, Mitchel Meyer,
Gregory Beauregard, Chase Meadors, Andreas Giannopoulos, an anonymous reader, and in general all the students in my classes for suggestions and finding errors and typos.
About analysis
Analysis is the branch of mathematics that deals with inequalities and limits. The present course deals with the most basic concepts in analysis. The goal of the course is to acquaint the reader
with rigorous proofs in analysis and also to set a firm foundation for calculus of one variable.
Calculus has prepared you, the student, for using mathematics without telling you why what you learned is true. To use, or teach, mathematics effectively, you cannot simply know what is true,
you must know why it is true. This course shows you why calculus is true. It is here to give you a good understanding of the concept of a limit, the derivative, and the integral.
Let us use an analogy. An auto mechanic that has learned to change the oil, fix broken headlights, and charge the battery, will only be able to do those simple tasks. He will be unable to work
independently to diagnose and fix problems. A high school teacher that does not understand the definition of the Riemann integral or the derivative may not be able to properly answer all the
students’ questions. To this day I remember several nonsensical statements I heard from my calculus teacher in high school, who simply did not understand the concept of the limit, though he
could “do” all problems in calculus.
We start with a discussion of the real number system, most importantly its completeness property, which is the basis for all that comes after. We then discuss the simplest form of a limit, the
limit of a sequence. Afterwards, we study functions of one variable, continuity, and the derivative. Next, we define the Riemann integral and prove the fundamental theorem of calculus. We
discuss sequences of functions and the interchange of limits. Finally, we give an introduction to metric spaces.
Let us give the most important difference between analysis and algebra. In algebra, we prove equalities directly; we prove that an object, a number perhaps, is equal to another object. In
analysis, we usually prove inequalities. To illustrate the point, consider the following statement.
Let x be a real number. If 0 ≤ x < ϵ is true for all real numbers ϵ > 0, then x = 0.
This statement is the general idea of what we do in analysis. If we wish to show that x = 0, we show that 0 ≤ x < ϵ for all positive ϵ.
The term real analysis is a little bit of a misnomer. I prefer to use simply analysis. The other type of analysis, complex analysis, really builds up on the present material, rather than being
distinct. Furthermore, a more advanced course on real analysis would talk about complex numbers often. I suspect the nomenclature is historical baggage.
Let us get on with the show…
1∈S
to denote that the number 1 belongs to the set S. That is, 1 is a member of S. Similarly we write
7∉S
to denote that the number 7 is not in S. That is, 7 is not a member of S. The elements of all sets under consideration come from some set we call the universe. For simplicity, we often consider
the universe to be the set that contains only the elements we are interested in. The universe is generally understood from context and is not explicitly mentioned. In this course, our universe will
most often be the set of real numbers.
While the elements of a set are often numbers, other objects, such as other sets, can be elements of a set. A set may also contain some of the same elements as another set. For example,
T := {0, 2}
contains the numbers 0 and 2. In this case all elements of T also belong to S. We write T ⊂ S. More formally we make the following definition.
i. A set A is a subset of a set B if x ∈ A implies x ∈ B, and we write A ⊂ B. That is, all members of A are also members of B.
ii. Two sets A and B are equal if A ⊂ B and B ⊂ A. We write A = B. That is, A and B contain exactly the same elements. If it is not true that A and B are equal, then we write A ≠ B.
iii. A set A is a proper subset of B if A ⊂ B and A ≠ B. We write A ⊊ B.
For example, for S and T defined above T ⊂ S, but T ≠ S. So T is a proper subset of S. If A = B, then A and B are simply two names for the same exact set. Let us mention the set building
notation,
{x ∈ A : P(x)}.
This notation refers to a subset of the set A containing all elements of A that satisfy the property P(x). The notation is sometimes abbreviated: A is not mentioned when understood from context.
Furthermore, x ∈ A is sometimes replaced with a formula to make the notation easier to read.
The following are sets including the standard notations.
i. The set of natural numbers, N := {1, 2, 3, …}.
ii. The set of integers, Z := {0, − 1, 1, − 2, 2, …}.
iii. The set of rational numbers, Q := {m/n : m, n ∈ Z and n ≠ 0}.
iv. The set of even natural numbers, {2m : m ∈ N}.
v. The set of real numbers, R.
Note that N ⊂ Z ⊂ Q ⊂ R.
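Set-building notation maps closely onto set comprehensions in a programming language. The following is a minimal Python sketch, not part of the original text; the finite cutoff of 20 and the names `universe`, `is_even`, `evens`, and `evens_by_formula` are assumptions made for illustration, since a program cannot hold all of N.

```python
# A finite stand-in for the set A in {x ∈ A : P(x)}; the cutoff 20 is arbitrary.
universe = set(range(1, 21))

def is_even(x):
    """Plays the role of the property P(x)."""
    return x % 2 == 0

# {x ∈ A : P(x)}: all elements of the universe satisfying the property.
evens = {x for x in universe if is_even(x)}

# {2m : m ∈ N}, truncated to m ≤ 10: here "x ∈ A" is replaced by a formula.
evens_by_formula = {2 * m for m in range(1, 11)}

print(evens == evens_by_formula)  # both descriptions give the same set
print(evens <= universe)          # subset test
print(evens < universe)           # proper subset test
```

Python's `<=` and `<` on sets correspond to the subset and proper subset relations defined above.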
There are many operations we want to do with sets.
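i. A union of two sets A and B is defined as
A ∪ B := {x : x ∈ A or x ∈ B}.
ii. An intersection of two sets A and B is defined as
A ∩ B := {x : x ∈ A and x ∈ B}.
iii. A complement of B relative to A (or set-theoretic difference of A and B) is defined as
A ∖ B := {x : x ∈ A and x ∉ B}.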
iv. We say complement of B and write \(B^c\) instead of A ∖ B if the set A is either the entire universe or is the obvious set containing B, and is understood from context.
v. We say sets A and B are disjoint if A ∩ B = ∅.
The notation \(B^c\) may be a little vague at this point. If the set B is a subset of the real numbers R, then \(B^c\) means R ∖ B. If B is naturally a subset of the natural numbers, then \(B^c\) is N ∖ B. If
ambiguity would ever arise, we will use the set difference notation A ∖ B.
We illustrate the operations on the Venn diagrams in . Let us now establish one of the most basic theorems about sets and logic.
Let A, B, C be sets. Then
\((B \cup C)^c = B^c \cap C^c\),
\((B \cap C)^c = B^c \cup C^c\),
A ∖ (B ∪ C) = (A ∖ B) ∩ (A ∖ C),
A ∖ (B ∩ C) = (A ∖ B) ∪ (A ∖ C).
The first statement is proved by the second statement if we assume the set A is our “universe.”
Let us prove A ∖ (B ∪ C) = (A ∖ B) ∩ (A ∖ C). Remember the definition of equality of sets. First, we must show that if x ∈ A ∖ (B ∪ C), then x ∈ (A ∖ B) ∩ (A ∖ C). Second, we must also
show that if x ∈ (A ∖ B) ∩ (A ∖ C), then x ∈ A ∖ (B ∪ C).
So let us assume x ∈ A ∖ (B ∪ C). Then x is in A, but not in B nor C. Hence x is in A and not in B, that is, x ∈ A ∖ B. Similarly x ∈ A ∖ C. Thus x ∈ (A ∖ B) ∩ (A ∖ C).
On the other hand suppose x ∈ (A ∖ B) ∩ (A ∖ C). In particular x ∈ (A ∖ B), so x ∈ A and x ∉ B. Also as x ∈ (A ∖ C), then x ∉ C. Hence x ∈ A ∖ (B ∪ C).
The proof of the other equality is left as an exercise.
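The identities in the theorem can be spot-checked on small finite sets. The sketch below is only a sanity check on one hypothetical choice of A, B, and C (it proves nothing in general), and the helper `complement` is a name introduced here for illustration.

```python
# Hypothetical finite sets; A doubles as the "universe" for complements.
A = set(range(12))
B = {x for x in A if x % 2 == 0}
C = {x for x in A if x % 3 == 0}

def complement(X, universe=A):
    """X^c taken relative to the given universe."""
    return universe - X

# A ∖ (B ∪ C) = (A ∖ B) ∩ (A ∖ C)
first = A - (B | C) == (A - B) & (A - C)
# A ∖ (B ∩ C) = (A ∖ B) ∪ (A ∖ C)
second = A - (B & C) == (A - B) | (A - C)
# (B ∪ C)^c = B^c ∩ C^c, with A as the universe
third = complement(B | C) == complement(B) & complement(C)
# (B ∩ C)^c = B^c ∪ C^c, with A as the universe
fourth = complement(B & C) == complement(B) | complement(C)

print(first, second, third, fourth)
```

Here `|`, `&`, and `-` are Python's built-in union, intersection, and set difference.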
We will also need to intersect or union several sets at once. If there are only finitely many, then we simply apply the union or intersection operation several times. However, suppose we have an
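infinite collection of sets (a set of sets) \(\{A_1, A_2, A_3, \ldots\}\). We define
\[ \bigcup_{n=1}^\infty A_n := \{x : x \in A_n \text{ for some } n \in \mathbb{N}\}, \]
\[ \bigcap_{n=1}^\infty A_n := \{x : x \in A_n \text{ for all } n \in \mathbb{N}\}. \]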
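We can also have sets indexed by two integers. For example, we can have the set of sets \(\{A_{1,1}, A_{1,2}, A_{2,1}, A_{1,3}, A_{2,2}, A_{3,1}, \ldots\}\). Then we write
\[ \bigcup_{n=1}^\infty \bigcup_{m=1}^\infty A_{n,m} = \bigcup_{n=1}^\infty \left( \bigcup_{m=1}^\infty A_{n,m} \right). \]
The order in which unions and intersections are taken matters. For example,
\[ \bigcup_{n=1}^\infty \bigcap_{m=1}^\infty \{k \in \mathbb{N} : mk < n\} = \bigcup_{n=1}^\infty \emptyset = \emptyset. \]
However,
\[ \bigcap_{m=1}^\infty \bigcup_{n=1}^\infty \{k \in \mathbb{N} : mk < n\} = \bigcap_{m=1}^\infty \mathbb{N} = \mathbb{N}. \]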
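To see why the two computations differ, it may help to evaluate the inner operation first in each case. The derivation below is a sketch added for illustration; it uses only the sets \(\{k \in \mathbb{N} : mk < n\}\) from the example.

```latex
% Fix n. Taking m = n, the condition nk < n fails for every k in N, so
\bigcap_{m=1}^\infty \{k \in \mathbb{N} : mk < n\}
  \subset \{k \in \mathbb{N} : nk < n\} = \emptyset .

% Fix m. Every k in N satisfies mk < n for the choice n = mk + 1, so
\bigcup_{n=1}^\infty \{k \in \mathbb{N} : mk < n\} = \mathbb{N} .
```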
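Sometimes, the index set is not the natural numbers. In this case we need a more general notation. Suppose I is some set and for each ι ∈ I, we have a set \(A_\iota\). Then we define
\[ \bigcup_{\iota \in I} A_\iota := \{x : x \in A_\iota \text{ for some } \iota \in I\}, \qquad \bigcap_{\iota \in I} A_\iota := \{x : x \in A_\iota \text{ for all } \iota \in I\}. \]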
Induction
When a statement includes an arbitrary natural number, a common method of proof is the principle of induction. We start with the set of natural numbers N = {1, 2, 3, …}, and we give them
their natural ordering, that is, 1 < 2 < 3 < 4 < ⋯. By S ⊂ N having a least element, we mean that there exists an x ∈ S, such that for every y ∈ S, we have x ≤ y.
The natural numbers N ordered in the natural way possess the so-called well ordering property. We take this property as an axiom; we simply assume it is true.
Every nonempty subset of N has a least (smallest) element.
The principle of induction is the following theorem, which is equivalent to the well ordering property of the natural numbers.
Let P(n) be a statement depending on a natural number n. Suppose that
i. (basis statement) P(1) is true,
ii. (induction step) if P(n) is true, then P(n + 1) is true.
Then P(n) is true for all n ∈ N.
Suppose S is the set of natural numbers m for which P(m) is not true. Suppose S is nonempty. Then S has a least element by the well ordering property. Let us call m the least element of S. We
know 1 ∉ S by assumption. Therefore m > 1 and m − 1 is a natural number as well. Since m was the least element of S, we know that P(m − 1) is true. But by the induction step we see that
P(m − 1 + 1) = P(m) is true, contradicting the statement that m ∈ S. Therefore S is empty and P(n) is true for all n ∈ N.
Sometimes it is convenient to start at a different number than 1, but all that changes is the labeling. The assumption that P(n) is true in “if P(n) is true, then P(n + 1) is true” is usually called the
induction hypothesis.
Let us prove that for all n ∈ N,
\[ 2^{n-1} \leq n! . \]
We let P(n) be the statement that \(2^{n-1} \leq n!\) is true. By plugging in n = 1, we see that P(1) is true.
Suppose P(n) is true. That is, suppose \(2^{n-1} \leq n!\) holds. Multiply both sides by 2 to obtain
\[ 2^n \leq 2(n!). \]
As \(2 \leq n+1\) when n ∈ N, we have \(2(n!) \leq (n+1)(n!) = (n+1)!\). That is,
\[ 2^n \leq 2(n!) \leq (n+1)!, \]
and hence P(n + 1) is true. By the principle of induction, we see that P(n) is true for all n, and hence \(2^{n-1} \leq n!\) is true for all n ∈ N.
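A numerical check is no substitute for the induction argument, but it can catch a misstated claim early. The following Python sketch tests the statement P(n) and the inequality used in the induction step for the first few n; the range up to 20 and the names `P`, `base_case`, and `steps_ok` are arbitrary choices made here.

```python
from math import factorial

def P(n):
    """The statement 2^(n-1) <= n! for a given natural number n."""
    return 2 ** (n - 1) <= factorial(n)

# Basis statement.
base_case = P(1)

# The inequality behind the induction step: 2 * n! <= (n + 1)!.
steps_ok = all(2 * factorial(n) <= factorial(n + 1) for n in range(1, 21))

# Direct check of P(n) for n = 1, ..., 20.
all_hold = all(P(n) for n in range(1, 21))

print(base_case, steps_ok, all_hold)
```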
We claim that for all c ≠ 1,
\[ 1 + c + c^2 + \cdots + c^n = \frac{1 - c^{n+1}}{1 - c} . \]
Proof: It is easy to check that the equation holds with n = 1. Suppose it is true for n. Then
\[ \begin{aligned}
1 + c + c^2 + \cdots + c^n + c^{n+1} &= (1 + c + c^2 + \cdots + c^n) + c^{n+1} \\
&= \frac{1 - c^{n+1}}{1 - c} + c^{n+1} \\
&= \frac{1 - c^{n+1} + (1 - c)c^{n+1}}{1 - c} \\
&= \frac{1 - c^{n+2}}{1 - c} .
\end{aligned} \]
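The closed form can be compared against the direct sum using exact rational arithmetic. This is a small Python sketch under the assumption that c is rational (so that `fractions.Fraction` avoids rounding); the function names `geometric_sum` and `closed_form` are introduced here for illustration.

```python
from fractions import Fraction

def geometric_sum(c, n):
    """Compute 1 + c + c^2 + ... + c^n term by term."""
    return sum(c ** k for k in range(n + 1))

def closed_form(c, n):
    """Compute (1 - c^(n+1)) / (1 - c); requires c != 1."""
    return (1 - c ** (n + 1)) / (1 - c)

# A few sample values of c (all different from 1) and small n.
samples = [Fraction(1, 2), Fraction(-3), Fraction(5, 7)]
agree = all(
    geometric_sum(c, n) == closed_form(c, n)
    for c in samples
    for n in range(8)
)
print(agree)
print(geometric_sum(Fraction(1, 2), 3))  # 15/8
```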
There is an equivalent principle called strong induction. The proof that strong induction is equivalent to induction is left as an exercise.
The Cartesian product of two sets A and B is the set of tuples defined as
A × B := {(x, y) : x ∈ A, y ∈ B}.
For example, the set [0, 1] × [0, 1] is a set in the plane bounded by a square with vertices (0, 0), (0, 1), (1, 0), and (1, 1). When A and B are the same set we sometimes use a superscript 2 to
denote such a product. For example \([0, 1]^2 = [0, 1] \times [0, 1]\), or \(\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}\) (the Cartesian plane).
A function f : A → B is a subset f of A × B such that for each x ∈ A, there is a unique (x, y) ∈ f. We then write f(x) = y. Sometimes the set f is called the graph of the function rather than the
function itself.
The set A is called the domain of f (and sometimes confusingly denoted D(f)). The set
Define the function f : R → R by f(x) := sin(πx). Then \(f([0, 1/2]) = [0, 1]\), \(f^{-1}(\{0\}) = \mathbb{Z}\), etc.
Let f : A → B. Let C, D be subsets of B. Then
\[ f^{-1}(C^c) = \left( f^{-1}(C) \right)^c . \]
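For a function on a finite set, direct and inverse images can be computed by brute force, which makes the complement identity easy to test. The sketch below uses a hypothetical function f(x) = x² on small finite sets A and B chosen for illustration; `image` and `preimage` are helper names introduced here.

```python
# Hypothetical finite domain and codomain; f maps A into B.
A = {-2, -1, 0, 1, 2}
B = {0, 1, 2, 3, 4}

def f(x):
    return x * x

def image(f, C):
    """Direct image f(C) of a subset C of the domain."""
    return {f(x) for x in C}

def preimage(f, domain, C):
    """Inverse image f^{-1}(C): all x in the domain with f(x) in C."""
    return {x for x in domain if f(x) in C}

C = {0, 1}
lhs = preimage(f, A, B - C)   # f^{-1}(C^c), complement of C taken in B
rhs = A - preimage(f, A, C)   # (f^{-1}(C))^c, complement taken in A

print(lhs == rhs)
print(image(f, A))
```

Note that the complement on the left is taken in B while the one on the right is taken in A.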
Let f : A → B be a function. The function f is said to be injective or one-to-one if \(f(x_1) = f(x_2)\) implies \(x_1 = x_2\). In other words, for all y ∈ B the set \(f^{-1}(\{y\})\) is empty or consists of a single
element. We call such an f an injection.
The function f is said to be surjective or onto if f(A) = B. We call such an f a surjection.
A function f that is both an injection and a surjection is said to be bijective, and we say f is a bijection.
When f : A → B is a bijection, then \(f^{-1}(\{y\})\) always consists of exactly one element of A, and we can consider \(f^{-1}\) as a function \(f^{-1} : B \to A\). In this case, we call \(f^{-1}\) the inverse function of f. For example,
for the bijection f : R → R defined by \(f(x) := x^3\) we have \(f^{-1}(x) = \sqrt[3]{x}\).
A final piece of notation for functions that we need is the composition of functions.
Let f : A → B, g : B → C. The function g ∘ f : A → C is defined as
(g ∘ f)(x) := g (f(x) ).
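When A and B are finite, a function f : A → B can be stored as a dictionary, and the definitions of injection, surjection, inverse, and composition become short checks. The following Python sketch is one possible encoding, not a construction from the text; all names in it are hypothetical.

```python
def is_injective(f):
    """f is a dict x -> f(x); injective means no value is hit twice."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """Surjective means f(A) = B."""
    return set(f.values()) == set(codomain)

def inverse(f, codomain):
    """Return the inverse function as a dict; only defined for bijections."""
    if not (is_injective(f) and is_surjective(f, codomain)):
        raise ValueError("f is not a bijection")
    return {y: x for x, y in f.items()}

def compose(g, f):
    """(g ∘ f)(x) := g(f(x)) for dict-encoded functions."""
    return {x: g[f[x]] for x in f}

B = {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}      # a bijection from {1, 2, 3} to B
g = {1: "a", 2: "a", 3: "b"}      # neither injective nor surjective onto B

f_inv = inverse(f, B)
identity_on_A = compose(f_inv, f)
print(identity_on_A)              # {1: 1, 2: 2, 3: 3}
print(is_injective(g), is_surjective(g, B))
```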
Cardinality
A subtle issue in set theory and one generating a considerable amount of confusion among students is that of cardinality, or “size” of sets. The concept of cardinality is important in modern
mathematics in general and in analysis in particular. In this section, we will see the first really unexpected theorem.
We write
|A| ≤ |B|
if there exists an injection from A to B. We write |A| = |B| if A and B have the same cardinality. We write |A| < |B| if |A| ≤ |B|, but A and B do not have the same cardinality.
We state without proof that A and B have the same cardinality if and only if |A| ≤ |B| and |B| ≤ |A|. This is the so-called Cantor-Bernstein-Schroeder theorem. Furthermore, if A and B are any
two sets, we can always write |A| ≤ |B| or |B| ≤ |A|. The issues surrounding this last statement are very subtle. As we do not require either of these two statements, we omit proofs.
The truly interesting cases of cardinality are infinite sets. We start with the following definition.
If |A| = |N|, then A is said to be countably infinite. If A is finite or countably infinite, then we say A is countable. If A is not countable, then A is said to be uncountable.
The cardinality of N is usually denoted as \(\aleph_0\) (read as aleph-naught).
The set of even natural numbers has the same cardinality as N. Proof: Given an even natural number, write it as 2n for some n ∈ N. Then create a bijection taking 2n to n.
In fact, let us mention without proof the following characterization of infinite sets: A set is infinite if and only if it is in one-to-one correspondence with a proper subset of itself.
N × N is a countably infinite set. Proof: Arrange the elements of N × N as follows (1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), …. That is, always write down first all the elements whose two entries
sum to k, then write down all the elements whose entries sum to k + 1 and so on. Then define a bijection with N by letting 1 go to (1, 1), 2 go to (1, 2) and so on.
The set of rational numbers is countable. Proof: (informal) Follow the same procedure as in the previous example, writing 1/1, 1/2, 2/1, etc. However, leave out any
fraction (such as 2/2) that has already appeared.
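The two enumerations just described can be written as generators. The Python sketch below lists the pairs of N × N grouped by the sum of their entries, and then reuses that listing to enumerate fractions m/n while skipping repeats. It only produces the positive rationals, matching the informal procedure; the generator names `pairs` and `positive_rationals` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import count, islice

def pairs():
    """Yield (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ... grouped by entry sum."""
    for k in count(2):            # k is the sum of the two entries
        for a in range(1, k):
            yield (a, k - a)

def positive_rationals():
    """Yield m/n in the same order, leaving out values already seen."""
    seen = set()
    for m, n in pairs():
        q = Fraction(m, n)
        if q not in seen:
            seen.add(q)
            yield q

print(list(islice(pairs(), 6)))
print([str(q) for q in islice(positive_rationals(), 5)])  # ['1', '1/2', '2', '1/3', '3']
```

The position of an element in the listing, counted from 1, is the natural number it is paired with.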
For completeness we mention the following statement. If A ⊂ B and B is countable, then A is countable. Similarly if A is uncountable, then B is uncountable. As we will not need this statement
in the sequel, and as the proof requires the Cantor-Bernstein-Schroeder theorem mentioned above, we will not give it here.
We give the first truly striking result. First, we need a notation for the set of all subsets of a set.
If A is a set, we define the power set of A, denoted by P(A), to be the set of all subsets of A.
For example, if A := {1, 2}, then P(A) = {∅, {1}, {2}, {1, 2}}. For a finite set A of cardinality n, the cardinality of P(A) is \(2^n\). This fact is left as an exercise. Hence, for finite sets the cardinality
of P(A) is strictly larger than the cardinality of A. What is an unexpected and striking fact is that this statement is still true for infinite sets.
|A| < |P(A)|.
There exists an injection f : A → P(A). For any x ∈ A, define f(x) := {x}. Therefore |A| ≤ |P(A)|.
To finish the proof, we must show that no function f : A → P(A) is a surjection. Suppose f : A → P(A) is a function. So for x ∈ A, f(x) is a subset of A. Define the set
B := {x ∈ A : x ∉ f(x)}.
We claim that B is not in the range of f and hence f is not a surjection. Suppose there exists an \(x_0\) such that \(f(x_0) = B\). Either \(x_0 \in B\) or \(x_0 \notin B\). If \(x_0 \in B\), then \(x_0 \notin f(x_0) = B\), which is a
contradiction. If \(x_0 \notin B\), then \(x_0 \in f(x_0) = B\), which is again a contradiction. Thus such an \(x_0\) does not exist. Therefore, B is not in the range of f, and f is not a surjection. As f was an arbitrary
function, no surjection exists.
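For a small finite set the argument can be checked exhaustively: build the set B := {x ∈ A : x ∉ f(x)} for every possible function f : A → P(A) and confirm that B is never a value of f. The following Python sketch does this for a hypothetical three-element set; `power_set` and `diagonal_set` are names introduced here, and the check illustrates the proof rather than replacing it.

```python
from itertools import combinations, product

def power_set(A):
    """All subsets of A, each as a frozenset."""
    items = list(A)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]

def diagonal_set(A, f):
    """The set B := {x in A : x not in f(x)} from the proof."""
    return frozenset(x for x in A if x not in f[x])

A = [1, 2, 3]
subsets = power_set(A)            # 2^3 = 8 subsets

# Enumerate every function f : A -> P(A) as a tuple of images (8^3 = 512 functions).
never_hit = all(
    diagonal_set(A, dict(zip(A, images))) not in images
    for images in product(subsets, repeat=len(A))
)
print(len(subsets), never_hit)
```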
One particular consequence of this theorem is that there do exist uncountable sets, as P(N) must be uncountable. A related fact is that the set of real numbers (which we study in the next
chapter) is uncountable. The existence of uncountable sets may seem unintuitive, and the theorem caused quite a controversy at the time it was announced. The theorem not only says that
uncountable sets exist, but that there in fact exist progressively larger and larger infinite sets N, P(N), P(P(N)), P(P(P(N))), etc….
Exercises
Show A ∖ (B ∩ C) = (A ∖ B) ∪ (A ∖ C).
Prove that the principle of strong induction is equivalent to the standard induction.
Finish the proof of .
a. Prove .
b. Find an example for which equality of sets in f(C ∩ D) ⊂ f(C) ∩ f(D) fails. That is, find an f, A, B, C, and D such that f(C ∩ D) is a proper subset of f(C) ∩ f(D).
Prove that if A is finite, then there exists a unique number n such that there exists a bijection between A and {1, 2, 3, …, n}. In other words, the notation |A| := n is justified. Hint: Show that if
n > m, then there is no injection from {1, 2, 3, …, n} to {1, 2, 3, …, m}.
Prove
a. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
b. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Let AΔB denote the symmetric difference, that is, the set of all elements that belong to either A or B, but not to both A and B.
a. Draw a Venn diagram for AΔB.
b. Show AΔB = (A ∖ B) ∪ (B ∖ A).
a. Find \(A_1 \cap A_2\).
b. Find \(\bigcup_{n=1}^\infty A_n\).
c. Find \(\bigcap_{n=1}^\infty A_n\).
Prove \(1^3 + 2^3 + \cdots + n^3 = \left( \frac{n(n+1)}{2} \right)^2\) for all n ∈ N.
Real Numbers
Basic properties
Note: 1.5 lectures
The main object we work with in analysis is the set of real numbers. As this set is so fundamental, often much time is spent on formally constructing the set of real numbers. However, we take
an easier approach here and just assume that a set with the correct properties exists. We need to start with the definitions of those properties.
An ordered set is a set S, together with a relation < such that
i. For any x, y ∈ S, exactly one of x < y, x = y, or y < x holds.
ii. If x < y and y < z, then x < z.
We write x ≤ y if x < y or x = y. We define > and ≥ in the obvious way.
For example, the set of rational numbers Q is an ordered set by letting x < y if and only if y − x is a positive rational number, that is if y − x = p/q where p, q ∈ N. Similarly, N and Z
are also ordered sets.
There are other ordered sets than sets of numbers. For example, the set of countries can be ordered by landmass, so for example India > Liechtenstein. Any time you sort a set in some way, you
are making an ordered set. A typical ordered set that you have used since primary school is the dictionary. It is the ordered set of words where the order is the so-called lexicographic ordering.
Such ordered sets appear often, for example in computer science. In this class, however, we will mostly be interested in ordered sets of numbers.
Let E ⊂ S, where S is an ordered set.
i. If there exists a b ∈ S such that x ≤ b for all x ∈ E, then we say E is bounded above and b is an upper bound of E.
ii. If there exists a b ∈ S such that x ≥ b for all x ∈ E, then we say E is bounded below and b is a lower bound of E.
iii. If there exists an upper bound \(b_0\) of E such that whenever b is any upper bound for E we have \(b_0 \leq b\), then \(b_0\) is called the least upper bound or the supremum of E. We write
\(\sup E := b_0\).
iv. Similarly, if there exists a lower bound \(b_0\) of E such that whenever b is any lower bound for E we have \(b_0 \geq b\), then \(b_0\) is called the greatest lower bound or the infimum of E. We write
\(\inf E := b_0\).
When a set E is both bounded above and bounded below, we say simply that E is bounded.
A supremum or infimum for E (even if they exist) need not be in E. For example, the set E := {x ∈ Q : x < 1} has a least upper bound of 1, but 1 is not in the set E itself. On the other hand, if
we take G := {x ∈ Q : x ≤ 1}, then the least upper bound of G is clearly also 1, and in this case 1 ∈ G. On the other hand, the set P := {x ∈ Q : x ≥ 0} has no upper bound (why?) and therefore
it can have no least upper bound. On the other hand 0 is the greatest lower bound of P.
An ordered set S has the least-upper-bound property if every nonempty subset E ⊂ S that is bounded above has a least upper bound, that is sup E exists in S.
The least-upper-bound property is sometimes called the completeness property or the Dedekind completeness property.
The set Q of rational numbers does not have the least-upper-bound property. The subset \(\{x \in \mathbb{Q} : x^2 < 2\}\) does not have a supremum in Q. The obvious supremum \(\sqrt{2}\) is not rational. Suppose
x ∈ Q such that \(x^2 = 2\). Write x = m/n in lowest terms. So \((m/n)^2 = 2\) or \(m^2 = 2n^2\). Hence \(m^2\) is divisible by 2 and so m is divisible by 2. Write m = 2k and so \((2k)^2 = 2n^2\).
Divide by 2 and note that \(2k^2 = n^2\), and hence n is divisible by 2. But that is a contradiction as m/n was in lowest terms.
That Q does not have the least-upper-bound property is one of the most important reasons why we work with R in analysis. The set Q is just fine for algebraists. But analysts require the least-
upper-bound property to do any work. We also require our real numbers to have many algebraic properties. In particular, we require that they are a field.
Exercises
Prove part (iii) of .
Let S be an ordered set. Let A ⊂ S be a nonempty finite subset. Then A is bounded. Furthermore, inf A exists and is in A and sup A exists and is in A. Hint:
Use .
Let x, y ∈ F, where F is an ordered field. Suppose 0 < x < y. Show that \(x^2 < y^2\).
Let S be an ordered set. Let B ⊂ S be bounded (above and below). Let A ⊂ B be a nonempty subset. Suppose all the inf ’s and sup ’s exist. Show that
\[ \inf B \leq \inf A \leq \sup A \leq \sup B . \]
Let S be an ordered set. Let A ⊂ S and suppose b is an upper bound for A. Suppose b ∈ A. Show that b = sup A.
Let S be an ordered set. Let A ⊂ S be a nonempty subset that is bounded above. Suppose sup A exists and sup A ∉ A. Show that A contains a countably infinite subset. In particular, A is
infinite.
Find a (nonstandard) ordering of the set of natural numbers N such that there exists a nonempty proper subset A ⊊ N and such that sup A exists in N, but sup A ∉ A.
Let F = {0, 1, 2}. a) Prove that there is exactly one way to define addition and multiplication so that F is a field if 0 and 1 have their usual meaning of (A4) and (M4). b) Show that F cannot be
an ordered field.
Let S be an ordered set and let A be a nonempty subset such that sup A exists. Suppose there is a B ⊂ A such that whenever x ∈ A there is a y ∈ B such that x ≤ y. Show
that sup B exists and sup B = sup A.
Claim: There exists a unique positive real number r such that \(r^2 = 2\). We denote r by \(\sqrt{2}\).
Take the set \(A := \{x \in \mathbb{R} : x^2 < 2\}\). First if \(x^2 < 2\), then x < 2. To see this fact, note that x ≥ 2 implies \(x^2 \geq 4\) (use , we will not explicitly mention its use from now on), hence any number x such
that x ≥ 2 is not in A. Thus A is bounded above. On the other hand, 1 ∈ A, so A is nonempty.
Let us define r := sup A. We will show that \(r^2 = 2\) by showing that \(r^2 \geq 2\) and \(r^2 \leq 2\). This is the way analysts show equality, by showing two inequalities. We already know that r ≥ 1 > 0.
In the following, it may seem we are pulling certain expressions out of a hat. When writing a proof such as this we would, of course, come up with the expressions only after playing around
with what we wish to prove. The order in which we write the proof is not necessarily the order in which we come up with the proof.
Let us first show that r^2 ≥ 2. Take a positive number s such that s^2 < 2. We wish to find an h > 0 such that (s + h)^2 < 2. As 2 − s^2 > 0, we have (2 − s^2)/(2s + 1) > 0. We choose an h ∈ R such that 0 < h < (2 − s^2)/(2s + 1). Furthermore, we assume h < 1.
\[
\begin{aligned}
(s+h)^2 - s^2 & = h(2s + h) \\
& < h(2s + 1) && \text{(since } h < 1 \text{)} \\
& < 2 - s^2 && \text{(since } h < \tfrac{2-s^2}{2s+1} \text{)} .
\end{aligned}
\]
Therefore, (s + h)^2 < 2. Hence s + h ∈ A, but as h > 0 we have s + h > s. So s < r = sup A. As s was an arbitrary positive number such that s^2 < 2, it follows that r^2 ≥ 2.
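Now take a positive number s such that s^2 > 2. We wish to find an h > 0 such that (s − h)^2 > 2. As s^2 − 2 > 0 we have (s^2 − 2)/(2s) > 0. We choose an h ∈ R such that 0 < h < (s^2 − 2)/(2s) and h < s.
\[
\begin{aligned}
s^2 - (s-h)^2 & = 2sh - h^2 \\
& < 2sh \\
& < s^2 - 2 && \text{(since } h < \tfrac{s^2-2}{2s} \text{)} .
\end{aligned}
\]
Subtracting s^2 from both sides and multiplying by −1, we find (s − h)^2 > 2. Furthermore, if x ≥ s − h, then x^2 ≥ (s − h)^2 > 2 (as x > 0 and s − h > 0) and so x ∉ A. Thus s − h is an upper bound for A. However, s − h < s, or in other words s > r = sup A. As s was an arbitrary positive number such that s^2 > 2, it follows that r^2 ≤ 2.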
Together, r^2 ≥ 2 and r^2 ≤ 2 imply r^2 = 2. The existence part is finished. We still need to handle uniqueness. Suppose s ∈ R such that s^2 = 2 and s > 0. Thus s^2 = r^2. However, if 0 < s < r, then s^2 < r^2. Similarly 0 < r < s implies r^2 < s^2. Hence s = r.
The number √2 ∉ Q. The set R ∖ Q is called the set of irrational numbers. We just saw that R ∖ Q is nonempty. Not only is it nonempty, we will see later that it is very large indeed.
Using the same technique as above, we can show that a positive real number x^{1/n} exists for all n ∈ N and all x > 0. That is, for each x > 0, there exists a unique positive real number r such that r^n = x. The proof is left as an exercise.
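The choice of h in the first half of the proof of [example:sqrt2] can be tried out numerically. The following is a minimal sketch and not part of the original argument: the name `step_toward_sqrt2` and the decision to take half of the allowed bound are assumptions made for illustration, and exact rational arithmetic stands in for the real numbers.

```python
from fractions import Fraction

def step_toward_sqrt2(s: Fraction) -> Fraction:
    """Given s > 0 with s^2 < 2, return an h > 0 with (s + h)^2 < 2."""
    assert s > 0 and s * s < 2
    bound = (2 - s * s) / (2 * s + 1)    # the quantity (2 - s^2)/(2s + 1) from the proof
    # Any h with 0 < h < bound and h < 1 works; halving is one arbitrary choice.
    return min(bound, Fraction(1)) / 2

s = Fraction(1)
for _ in range(5):
    s = s + step_toward_sqrt2(s)
    print(s, float(s * s))               # s grows, and s^2 stays below 2
```

Each pass produces a strictly larger element of A, which is exactly why no s with s^2 < 2 can be an upper bound for A.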
Archimedean property
As we have seen, there are plenty of real numbers in any interval. But there are also infinitely many rational numbers in any interval. The following is one of the fundamental facts about the
real numbers. The two parts of the next theorem are actually equivalent, even though it may not seem like that at first sight.
[thm:arch]
i. [thm:arch:i] (Archimedean property) If x, y ∈ R and x > 0, then there exists an n ∈ N such that
nx > y.
ii. [thm:arch:ii] (Q is dense in R) If x, y ∈ R and x < y, then there exists an r ∈ Q such that x < r < y.
Let us prove [thm:arch:i]. We divide through by x and then [thm:arch:i] says that for any real number t := y/x, we can find a natural number n such that n > t. In other words, [thm:arch:i] says that N ⊂ R is not bounded above. Suppose for contradiction that N is bounded above. Let b := sup N. The number b − 1 cannot possibly be an upper bound for N as it is strictly less than b (the supremum). Thus there exists an m ∈ N such that m > b − 1. We add one to obtain m + 1 > b, which contradicts b being an upper bound.
Let us tackle [thm:arch:ii]. First assume x ≥ 0. Note that y − x > 0. By [thm:arch:i], there exists an n ∈ N such that
n(y − x) > 1.
Also by [thm:arch:i] the set A := {k ∈ N : k > nx} is nonempty. By the well ordering property of N, A has a least element m. As m ∈ A, we have m > nx. We divide through by n to get x < m/n. As m is the least element of A, m − 1 ∉ A. If m > 1, then m − 1 ∈ N, but m − 1 ∉ A and so m − 1 ≤ nx. If m = 1, then m − 1 = 0, and m − 1 ≤ nx still holds as x ≥ 0. In other words,
m − 1 ≤ nx or m ≤ nx + 1.
On the other hand from n(y − x) > 1 we obtain ny > 1 + nx. Hence ny > 1 + nx ≥ m, and therefore y > m/n. Putting everything together we obtain x < m/n < y. So let r = m/n.
Now assume x < 0. If y > 0, then we just take r = 0. If y ≤ 0, then 0 ≤ − y < − x, and we find a rational q such that − y < q < − x. Then take r = − q.
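The proof of [thm:arch:ii] is constructive, so its steps can be traced in code. The sketch below is only an illustration under stated assumptions: the function name `rational_between` is hypothetical, and the inputs are exact rationals standing in for real numbers (for rational inputs the midpoint would do, so the point is only to follow the proof).

```python
from fractions import Fraction
import math

def rational_between(x: Fraction, y: Fraction) -> Fraction:
    """Return a rational r with x < r < y, following the steps of the proof."""
    assert x < y
    if x < 0:
        if y > 0:
            return Fraction(0)
        # Here 0 <= -y < -x: find q with -y < q < -x and take r = -q.
        return -rational_between(-y, -x)
    n = math.floor(1 / (y - x)) + 1      # a natural number with n(y - x) > 1
    m = math.floor(n * x) + 1            # least natural number with m > nx
    return Fraction(m, n)

print(rational_between(Fraction(1, 3), Fraction(2, 5)))
```

The least element m of {k ∈ N : k > nx} is computed directly as floor(nx) + 1 rather than by searching, which is equivalent for x ≥ 0.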
Let us state and prove a simple but useful corollary of the Archimedean property.
inf {1/n : n ∈ N} = 0.
Let A := {1/n : n ∈ N}. Obviously A is not empty. Furthermore, 1/n > 0 and so 0 is a lower bound, and b := inf A exists. As 0 is a lower bound, then b ≥ 0. Now take an arbitrary a > 0. By the Archimedean property there exists an n such that na > 1, or in other words a > 1/n ∈ A. Therefore a cannot be a lower bound for A. Hence b = 0.
Using supremum and infimum
We want to make sure that suprema and infima are compatible with algebraic operations. For a set A ⊂ R and a number x ∈ R define
x + A := {x + y ∈ R : y ∈ A},
xA := {xy ∈ R : y ∈ A}.
sup (x + A) ≤ x + b = x + sup A.
The other direction is similar. If b is an upper bound for x + A, then x + y ≤ b for all y ∈ A and so y ≤ b − x for all y ∈ A. So b − x is an upper bound for A. If b = sup (x + A), then
sup A ≤ b − x = sup (x + A) − x.
The set R* is called the set of extended real numbers. It is possible to define some arithmetic on R*. Most operations are extended in an obvious way, but we must leave ∞ − ∞, 0 ⋅ (±∞), and ±∞/±∞ undefined. We refrain from using this arithmetic, as it leads to easy mistakes because R* is not a field. Now we can take suprema and infima without fear of emptiness or unboundedness. In this book we mostly avoid using R* outside of exercises, and leave such generalizations to the interested reader.
Maxima and minima
By [exercise:finitesethasminmax] we know that a nonempty finite set of numbers always has a supremum and an infimum that are contained in the set itself. In this case we usually do not use the words supremum or infimum.
When a set A of real numbers is bounded above, such that sup A ∈ A, then we can use the word maximum and the notation max A to denote the supremum. Similarly for infimum; when a set
A is bounded below and inf A ∈ A, then we can use the word minimum and the notation min A. For example,
While writing sup and inf may be technically correct in this situation, max and min are generally used to emphasize that the supremum or infimum is in the set itself.
Prove the arithmetic-geometric mean inequality. That is, for two positive real numbers x, y we have
\[ \sqrt{xy} \leq \frac{x+y}{2} . \]
Let A and B be two nonempty bounded sets of nonnegative real numbers. Define the set C := {ab : a ∈ A, b ∈ B}. Show that C is a bounded set and that
sup C = ( sup A)( sup B) and inf C = ( inf A)( inf B).
Given x > 0 and n ∈ N, show that there exists a unique positive real number r such that x = r^n. Usually r is denoted by x^{1/n}.
Prove .
[exercise:bernoulliineq] Prove the so-called Bernoulli’s inequality: If 1 + x > 0 then for all n ∈ N we have (1 + x)^n ≥ 1 + nx.
Absolute value
Note: 0.5–1 lecture
A concept we will encounter over and over is the concept of absolute value. You want to think of the absolute value as the “size” of a real number. Let us give a formal definition.
\[
|x| :=
\begin{cases}
x & \text{if } x \geq 0, \\
-x & \text{if } x < 0.
\end{cases}
\]
|a| = |a − b + b| ≤ |a − b| + |b|,
or |a| − |b| ≤ |a − b|. Switching the roles of a and b we obtain |b| − |a| ≤ |b − a| = |a − b|. Combining the two inequalities we obtain the reverse triangle inequality.
|x_1 + x_2 + ⋯ + x_n + x_{n+1}| ≤ |x_1 + x_2 + ⋯ + x_n| + |x_{n+1}|
≤ |x_1| + |x_2| + ⋯ + |x_n| + |x_{n+1}|.
|x^2 − 9x + 1| ≤ |x^2| + |9x| + |1| = |x|^2 + 9|x| + 1.
It is obvious that |x|^2 + 9|x| + 1 is largest when |x| is largest. In the interval provided, |x| is largest when x = 5 and so |x| = 5. One possibility for M is
M = 5^2 + 9(5) + 1 = 71.
There are, of course, other M that work. The bound of 71 is much higher than it need be, but we didn’t ask for the best possible M, just one that works.
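As a sanity check on how generous the bound is, one can sample the polynomial on the interval −1 ≤ x ≤ 5. This is a rough numerical sketch, not a proof: the grid of 1001 equally spaced points is an arbitrary assumption, and a finer grid would change the sampled maximum slightly.

```python
def f(x: float) -> float:
    return x * x - 9 * x + 1

M = 5**2 + 9 * 5 + 1                                 # the bound from the triangle inequality
grid = [-1 + 6 * k / 1000 for k in range(1001)]      # sample points in [-1, 5]
largest = max(abs(f(x)) for x in grid)
print(M, largest)
```

On this grid the sampled maximum of |f(x)| is roughly 19.25, far below 71, which is consistent with the remark that M = 71 is much higher than it need be.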
The last example leads us to the concept of bounded functions.
Suppose f : D → R is a function. We say f is bounded if there exists a number M such that |f(x)| ≤ M for all x ∈ D.
In the example we proved x^2 − 9x + 1 is bounded when considered as a function on D = {x : −1 ≤ x ≤ 5}. On the other hand, if we consider the same polynomial as a function on the whole real line R, then it is not bounded.
For a function f : D → R we write
We also sometimes replace the “x ∈ D” with an expression. For example if, as before, f(x) = x^2 − 9x + 1, for −1 ≤ x ≤ 5, a little bit of calculus shows
then
You should be careful with the variables. The x on the left side of the inequality in [prop:funcsupinf:eq] is different from the x on the right. You should really think of the first inequality as
Let us prove this inequality. If b is an upper bound for g(D), then f(x) ≤ g(x) ≤ b for all x ∈ D, and hence b is an upper bound for f(D). Taking the least upper bound we get that for all x ∈ D
Therefore sup_{y ∈ D} g(y) is an upper bound for f(D) and thus greater than or equal to the least upper bound of f(D).
The second inequality (the statement about the inf) is left as an exercise.
A common mistake is to conclude
The inequality [rn:av:ltnottrue] is not true given the hypothesis of the claim above. For this stronger inequality we need the stronger hypothesis
Exercises
Show that |x − y| < ϵ if and only if x − ϵ < y < x + ϵ.
Show that
a. max {x, y} = (x + y + |x − y|)/2
b. min {x, y} = (x + y − |x − y|)/2
b. Find a specific D, f, and g, such that f(x) ≤ g(x) for all x ∈ D, but
Prove without the assumption that the functions are bounded. Hint: You need to use the extended real numbers.
[exercise:sumofsup] Let D be a nonempty set. Suppose f : D → R and g : D → R are bounded functions. a) Show
\[
\sup_{x \in D} \bigl( f(x) + g(x) \bigr) \leq \sup_{x \in D} f(x) + \sup_{x \in D} g(x)
\quad \text{and} \quad
\inf_{x \in D} \bigl( f(x) + g(x) \bigr) \geq \inf_{x \in D} f(x) + \inf_{x \in D} g(x) .
\]
[a, b] := {x ∈ R: a ≤ x ≤ b},
(a, b) := {x ∈ R: a < x < b},
(a, b] := {x ∈ R: a < x ≤ b},
[a, b) := {x ∈ R: a ≤ x < b}.
The interval [a, b] is called a closed interval and (a, b) is called an open interval. The intervals of the form (a, b] and [a, b) are called half-open intervals.
The above intervals were all bounded intervals, since both a and b were real numbers. We define unbounded intervals,
[a, ∞) := {x ∈ R : a ≤ x},
(a, ∞) := {x ∈ R : a < x},
( − ∞, b] := {x ∈ R : x ≤ b},
( − ∞, b) := {x ∈ R : x < b}.
i. Define a_k := x_j, where j is the smallest j ∈ N such that x_j ∈ (a_{k−1}, b_{k−1}). Such an x_j exists by our assumption on X.
ii. Next, define b_k := x_j where j is the smallest j ∈ N such that x_j ∈ (a_k, b_{k−1}).
Notice that a_k < b_k and a_{k−1} < a_k < b_k < b_{k−1}. Also notice that (a_k, b_k) does not contain x_k and hence does not contain any x_j for j = 1, …, k.
Claim: a_j < b_k for all j and k in N. Let us first assume j < k. Then a_j < a_{j+1} < ⋯ < a_{k−1} < a_k < b_k. Similarly for j > k. The claim follows.
Let A = {a_j : j ∈ N} and B = {b_j : j ∈ N}. By the claim above, every element of A is less than every element of B, and so
sup A ≤ inf B.
Define y := sup A. The number y cannot be a member of A. If y = a_j for some j, then y < a_{j+1}, which is impossible. Similarly y cannot be a member of B. Therefore, a_j < y for all j ∈ N and y < b_j for all j ∈ N. In other words y ∈ (a_j, b_j) for all j ∈ N.
Finally we must show that y ∉ X. If we do so, then we will have constructed a real number not in X showing that X must have been a proper subset. Take any x_k ∈ X. By the above construction x_k ∉ (a_k, b_k), so x_k ≠ y as y ∈ (a_k, b_k).
Exercises
For a < b, construct an explicit bijection from (a, b] to (0, 1].
Suppose f : [0, 1] → (0, 1) is a bijection. Using f, construct a bijection from [ − 1, 1] to R.
[exercise:intervaldef] Suppose I ⊂ R is a subset with at least 2 elements such that if a < b < c and a, c ∈ I, then b ∈ I. Prove that I is one of the nine types of intervals explicitly given in this section. Furthermore, prove that the intervals given in this section all satisfy this property.
Construct an explicit bijection from (0, 1] to (0, 1). Hint: One approach is as follows: First map (1/2, 1] to (0, 1/2], then map (1/4, 1/2] to (1/2, 3/4], etc. Write down the map explicitly, that is, write down an algorithm that tells you exactly what number goes where. Then prove that the map is a bijection.
Construct an explicit bijection from [0, 1] to (0, 1).
a) Show that every closed interval [a, b] is the intersection of countably many open intervals. b) Show that every open interval (a, b) is a countable union of closed intervals. c) Show that an
intersection of a possibly infinite family of closed intervals is either empty, a single point, or a closed interval.
Suppose S is a set of disjoint open intervals in R. That is, if (a, b) ∈ S and (c, d) ∈ S, then either (a, b) = (c, d) or (a, b) ∩ (c, d) = ∅. Prove S is a countable set.
Prove that the cardinality of [0, 1] is the same as the cardinality of (0, 1) by showing that |[0, 1]| ≤ |(0, 1)| and |(0, 1)| ≤ |[0, 1]|. See . Note that this requires the Cantor-Bernstein-Schroeder
theorem we stated without proof. Also note that this proof does not give you an explicit bijection.
A number x is algebraic if x is a root of a polynomial with integer coefficients, in other words, a_n x^n + a_{n−1} x^{n−1} + … + a_1 x + a_0 = 0 where all a_j ∈ Z. a) Show that there are only countably
many algebraic numbers. b) Show that there exist non-algebraic numbers (follow in the footsteps of Cantor, use uncountability of R). Hint: Feel free to use the fact that a polynomial of degree
n has at most n real roots.
We often assume d_K ≠ 0. To represent n we write the sequence of digits: n = d_K d_{K−1} ⋯ d_2 d_1 d_0. By a (decimal) digit, we mean an integer between 0 and 9.
Similarly we represent some rational numbers. That is, for certain numbers x, we can find a negative integer −M, a positive integer K, and digits d_K, d_{K−1}, …, d_1, d_0, d_{−1}, …, d_{−M}, such that
x = d_K 10^K + d_{K−1} 10^{K−1} + ⋯ + d_1 10 + d_0 + d_{−1} 10^{−1} + ⋯ + d_{−M} 10^{−M}.
Not every real number has such a representation, even the simple rational number 1/3 does not. The irrational number √2 does not have such a representation either. To get a
representation for all real numbers we must allow infinitely many digits.
Let us from now on consider only real numbers in the interval (0, 1]. If we find a representation for these, we simply add integers to them to obtain a representation for all real numbers.
Suppose we take an infinite sequence of decimal digits:
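0.d_1 d_2 d_3 ….
That is, we have a digit d_j for every j ∈ N. We have renumbered the digits to avoid the negative signs. We say this sequence of digits represents a real number x if
\[ x = \sup_{n \in \mathbb{N}} \left( \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots + \frac{d_n}{10^n} \right) . \]
We call
\[ D_n := \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots + \frac{d_n}{10^n} \]
the truncation of x to n decimal digits.
\[ D_n < x \leq D_n + \frac{1}{10^n} \quad \text{for all } n \in \mathbb{N} . \]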
Let us start with the first item. Suppose there is an infinite sequence of digits 0.d_1 d_2 d_3 …. We use the geometric sum formula to write
\[
\begin{split}
D_n = \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}
& \leq \frac{9}{10} \left( 1 + \frac{1}{10} + \cdots + \frac{1}{10^{n-1}} \right) \\
& = \frac{9}{10} \left( \frac{1 - (1/10)^n}{1 - 1/10} \right) = 1 - (1/10)^n < 1 .
\end{split}
\]
In particular, D_n < 1 for all n. As D_n ≥ 0 is obvious, we obtain
\[ 0 \leq \sup_{n \in \mathbb{N}} D_n \leq 1 . \]
We move on to the second item. Take x ∈ (0, 1]. First let us tackle the existence. By convention define D_0 := 0, then automatically we obtain D_0 < x ≤ D_0 + 10^{−0}. Suppose for induction that we defined all the digits d_1, d_2, …, d_n, and that D_n < x ≤ D_n + 10^{−n}. We need to define d_{n+1}.
By the Archimedean property of the real numbers we find an integer j such that x − D_n ≤ j 10^{−(n+1)}. We take the least such j and obtain
[eq:theDnjineq] (j − 1) 10^{−(n+1)} < x − D_n ≤ j 10^{−(n+1)}.
Let d_{n+1} := j − 1. As D_n < x, then d_{n+1} = j − 1 ≥ 0. On the other hand since x − D_n ≤ 10^{−n} we have that j is at most 10, and therefore d_{n+1} ≤ 9. So d_{n+1} is a decimal digit. Since D_{n+1} = D_n + d_{n+1} 10^{−(n+1)} we add D_n to the inequality [eq:theDnjineq] above:
D_{n+1} = D_n + (j − 1) 10^{−(n+1)} < x ≤ D_n + j 10^{−(n+1)} = D_{n+1} + 10^{−(n+1)}.
And so D_{n+1} < x ≤ D_{n+1} + 10^{−(n+1)} holds. We have inductively defined an infinite sequence of digits 0.d_1 d_2 d_3 …. As D_n < x for all n, then sup {D_n : n ∈ N} ≤ x. As x − 10^{−n} ≤ D_n, then x − 10^{−n} ≤ sup {D_m : m ∈ N} for all n. The two inequalities together imply sup {D_n : n ∈ N} = x.
What is left to show is the uniqueness. Suppose 0.e_1 e_2 e_3 … is another representation of x. Let E_n be the n-digit truncation of 0.e_1 e_2 e_3 …, and suppose E_n < x ≤ E_n + 10^{−n} for all n ∈ N. Suppose for some K ∈ N, e_n = d_n for all n < K, so D_{K−1} = E_{K−1}. Then
e_K < (x − D_{K−1}) 10^K ≤ e_K + 1.
Similarly we obtain
d_K < (x − D_{K−1}) 10^K ≤ d_K + 1.
Hence, both e_K and d_K are the largest integer j such that j < (x − D_{K−1}) 10^K, and therefore e_K = d_K. That is, the representation is unique.
The representation is not unique if we do not require the extra condition in the proposition. For example, for the number 1/2 the method in the proof obtains the representation
0.49999….
However, we also have the representation 0.5000…. The key requirement that makes the representation unique is D_n < x for all n. The inequality x ≤ D_n + 10^{−n} is true for every representation by the computation in the beginning of the proof.
The only numbers that have nonunique representations are ones that end either in an infinite sequence of 0s or 9s, because the only representation for which D_n = x is one where all digits past the nth one are zero. In this case there are exactly two representations of x (see the exercises).
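The inductive construction of the digits in the existence proof is effectively an algorithm. The following is a minimal sketch of it for rational x, assuming exact `Fraction` arithmetic in place of real numbers; the function name `decimal_digits` is hypothetical. At each step it takes the least integer j with x − D_n ≤ j 10^{−(n+1)} and sets the next digit to j − 1.

```python
from fractions import Fraction
import math

def decimal_digits(x: Fraction, count: int) -> list[int]:
    """First `count` digits of the representation of x in (0, 1] with D_n < x <= D_n + 10^(-n)."""
    assert 0 < x <= 1
    D = Fraction(0)                          # D_0 := 0
    digits = []
    for n in range(count):
        scale = 10 ** (n + 1)
        j = math.ceil((x - D) * scale)       # least integer j with x - D_n <= j * 10^-(n+1)
        d = j - 1                            # the next digit d_{n+1}
        digits.append(d)
        D += Fraction(d, scale)              # D_{n+1} = D_n + d_{n+1} * 10^-(n+1)
    return digits

print(decimal_digits(Fraction(1, 2), 6))     # the digits of 0.49999...
print(decimal_digits(Fraction(1, 3), 6))
```

Because the strict inequality D_n < x is maintained, the sketch never produces a representation ending in zeros, matching the remark about 1/2 above.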
Let us give another proof of the uncountability of the reals using decimal representations. This is Cantor’s second proof, and is probably more well known. While this proof may seem shorter, it
is because we have already done the hard part above and we are left with a slick trick to prove that R is uncountable. This trick is called Cantor diagonalization and finds use in other proofs as
well.
The set (0, 1] is uncountable.
Let X := {x_1, x_2, x_3, …} be any countable subset of real numbers in (0, 1]. We will construct a real number not in X. Let
x_n = 0.d^n_1 d^n_2 d^n_3 …
be the unique representation from the proposition, that is d^n_j is the jth digit of the nth number (the superscript is an index, not a power). Let e_n := 1 if d^n_n ≠ 1, and let e_n := 2 if d^n_n = 1. Let E_n be the n-digit truncation of y = 0.e_1 e_2 e_3 …. Because all the digits are nonzero we get that E_n < E_{n+1} ≤ y. Therefore
E_n < y ≤ E_n + 10^{−n}
for all n, and the representation is the unique one for y from the proposition. But for every n, the nth digit of y is different from the nth digit of x_n, so y ≠ x_n. Therefore y ∉ X, and as X was an arbitrary countable subset, (0, 1] must be uncountable.
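The diagonal rule for choosing the digits e_n can be illustrated on a finite list. This is only a toy sketch: a real list X is infinite, the four rows of digits below are arbitrary sample data, and the name `diagonal` is an assumption made for illustration.

```python
def diagonal(rows: list[list[int]]) -> list[int]:
    """e_n := 1 if the nth digit of the nth number is not 1, and e_n := 2 otherwise."""
    return [1 if row[n] != 1 else 2 for n, row in enumerate(rows)]

# Leading digits of a few sample numbers in (0, 1].
rows = [
    [4, 9, 9, 9],   # 0.4999...
    [1, 1, 1, 1],   # 0.1111...
    [1, 4, 2, 8],   # 0.1428...
    [9, 9, 9, 9],   # 0.9999...
]
e = diagonal(rows)
print(e)
# The nth digit of y differs from the nth digit of the nth number.
print(all(e[n] != rows[n][n] for n in range(len(rows))))
```

Using only the digits 1 and 2 keeps every e_n nonzero, which is what guarantees that the constructed digits form the unique representation of y.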
Using decimal digits we can also find lots of numbers that are not rational. The following proposition is true for every rational number, but we give it only for x ∈ (0, 1] for simplicity.
[prop:rationaldecimal] If x ∈ (0, 1] is a rational number and x = 0.d_1 d_2 d_3 …, then the decimal digits eventually start repeating. That is, there are positive integers N and P, such that for all n ≥ N, d_n = d_{n+P}.
Let x = p/q for positive integers p and q. Let us suppose x is a number with a unique representation, as otherwise we have seen above that both its representations are repeating.
To compute the first digit we take 10p and divide by q. The quotient is the first digit d_1 and the remainder r is some integer between 0 and q − 1. That is, d_1 is the largest integer such that d_1 q ≤ 10p and then r = 10p − d_1 q.
The next digit is computed by dividing 10r by q, and so on. We notice that at each step there are at most q possible remainders and hence at some point the process must start repeating. In fact
we see that P is at most q.
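The long division in the proof can be carried out mechanically, tracking remainders until one repeats. The sketch below is an illustration under assumptions: the name `repeating_decimal` is hypothetical, and it assumes 0 < p < q. If the remainder 0 ever appears, the digits end in zeros, which is the nonunique case the proof sets aside.

```python
def repeating_decimal(p: int, q: int):
    """Long division of p/q with 0 < p < q.

    Returns (digits, N, P), where digits lists d_1, d_2, ... up to the end of the
    first full period, and d_n = d_{n+P} for all n >= N.
    """
    assert 0 < p < q
    seen = {}                      # remainder -> number of digits produced when it first appeared
    digits = []
    r = p
    while r not in seen:           # at most q distinct remainders, so this loop terminates
        seen[r] = len(digits)
        d, r = divmod(10 * r, q)   # next digit and next remainder
        digits.append(d)
    start = seen[r]
    return digits, start + 1, len(digits) - start

print(repeating_decimal(1, 7))
print(repeating_decimal(1, 6))
```

Since the loop stops as soon as a remainder repeats, the period P it reports is at most the number of possible remainders, in line with the bound P ≤ q.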
The converse of the proposition is also true and is left as an exercise.
The number
x = 0.101001000100001000001…,
is irrational. That is, the digits are n zeros, then a one, then n + 1 zeros, then a one, and so on and so forth. The fact that x is irrational follows from the proposition; the digits never start
repeating. For every P, if we go far enough, we find a 1 that is followed by at least P + 1 zeros.
Exercises
What is the decimal representation of 1 guaranteed by ? Make sure to show that it does satisfy the condition.
Prove the converse of , that is, if the digits in the decimal representation of x are eventually repeating, then x must be rational.
Show that real numbers x ∈ (0, 1) with nonunique decimal representation are exactly the rational numbers that can be written as m/10^n for some integers m and n. In this case show that there exist exactly two representations of x.
Let b ≥ 2 be an integer. Define a representation of a real number in [0, 1] in terms of base b rather than base 10 and prove for base b.
{x_n}_{n=1}^∞,
to denote a sequence.
A sequence {x_n} is bounded if there exists a B ∈ R such that
|x_n| ≤ B for all n ∈ N.
When we need to give a concrete sequence we often give each term as a formula in terms of n. For example, {1/n}_{n=1}^∞, or simply {1/n}, stands for the sequence 1, 1/2, 1/3, 1/4, 1/5, …. The sequence {1/n} is a bounded sequence (B = 1 will suffice). On the other hand the sequence {n} stands for 1, 2, 3, 4, …, and this sequence is not bounded (why?).
While the notation for a sequence is similar to that of a set, the notions are distinct. For example, the sequence {(−1)^n} is the sequence −1, 1, −1, 1, −1, 1, …, whereas the set of values, the range of the sequence, is just the set {−1, 1}. We can write this set as {(−1)^n : n ∈ N}. When ambiguity can arise, we use the words sequence or set to distinguish the two concepts.
Another example of a sequence is the so-called constant sequence. That is a sequence {c} = c, c, c, c, … consisting of a single constant c ∈ R repeating indefinitely.
We now get to the idea of a limit of a sequence. We will see below that the notation is well defined. That is, if a limit exists, then it is unique. So it makes sense to talk about the limit of a sequence.
A sequence {x_n} is said to converge to a number x ∈ R, if for every ϵ > 0, there exists an M ∈ N such that |x_n − x| < ϵ for all n ≥ M. The number x is said to be the limit of {x_n}. We write
lim_{n→∞} x_n := x.
A sequence that converges is said to be convergent. Otherwise, the sequence is said to be divergent.
It is good to know intuitively what a limit means. It means that eventually every number in the sequence is close to the number x. More precisely, we can get arbitrarily close to the limit,
provided we go far enough in the sequence. It does not mean we ever reach the limit. It is possible, and quite common, that there is no x_n in the sequence that equals the limit x. We illustrate the
concept in . In the figure we first think of the sequence as a graph, as it is a function of N. Secondly we also plot it as a sequence of labeled points on the real line.
When we write lim x_n = x for some real number x, we are saying two things. First, that {x_n} is convergent, and second that the limit is x.
The above definition is one of the most important definitions in analysis, and it is necessary to understand it perfectly. The key point in the definition is that given any ϵ > 0, we can find an M.
The M can depend on ϵ, so we only pick an M once we know ϵ. Let us illustrate this concept on a few examples.
The constant sequence 1, 1, 1, 1, … is convergent and the limit is 1. For every ϵ > 0, we pick M = 1.
Claim: The sequence {1/n} is convergent and
lim_{n→∞} 1/n = 0.
Proof: Given an ϵ > 0, we find an M ∈ N such that 0 < 1/M < ϵ (Archimedean property at work). Then for all n ≥ M we have that
|x_n − 0| = |1/n| = 1/n ≤ 1/M < ϵ.
The sequence {(−1)^n} is divergent. Proof: If there were a limit x, then for ϵ = 1/2 we expect an M that satisfies the definition. Suppose such an M exists, then for an even n ≥ M we compute
1/2 > |x_n − x| = |1 − x| and 1/2 > |x_{n+1} − x| = |−1 − x|.
But
2 = |1 − x − (−1 − x)| ≤ |1 − x| + |−1 − x| < 1/2 + 1/2 = 1,
and that is a contradiction.
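To make the dependence of M on ϵ concrete for the sequence {1/n}, here is a small sketch. It is not a proof, only a finite spot-check: the helper name `smallest_M` is hypothetical, and exact rationals are used so that the comparison 1/M < ϵ is not affected by rounding.

```python
import math
from fractions import Fraction

def smallest_M(eps: Fraction) -> int:
    """Smallest natural number M with 1/M < eps."""
    assert eps > 0
    return math.floor(1 / eps) + 1       # least integer strictly greater than 1/eps

eps = Fraction(1, 100)
M = smallest_M(eps)
# Check |x_n - 0| < eps for finitely many n >= M.
ok = all(abs(Fraction(1, n) - 0) < eps for n in range(M, M + 1000))
print(M, ok)
```

A smaller ϵ forces a larger M, which is the sense in which M may only be picked once ϵ is known.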
Suppose the sequence {x_n} has the limit x and the limit y. Take an arbitrary ϵ > 0. From the definition find an M_1 such that for all n ≥ M_1, |x_n − x| < ϵ/2. Similarly find an M_2 such that for all n ≥ M_2 we have |x_n − y| < ϵ/2. Take M := max {M_1, M_2}. For n ≥ M (so that both n ≥ M_1 and n ≥ M_2) we have
|y − x| = |x_n − x − (x_n − y)| ≤ |x_n − x| + |x_n − y| < ϵ/2 + ϵ/2 = ϵ.
As |y − x| < ϵ for all ϵ > 0, then |y − x| = 0 and y = x. Hence the limit (if it exists) is unique.
A convergent sequence {x_n} is bounded.
Suppose {x_n} converges to x. Thus there exists an M ∈ N such that for all n ≥ M we have |x_n − x| < 1. Let B_1 := |x| + 1 and note that for n ≥ M we have
|x_n| = |x_n − x + x|
≤ |x_n − x| + |x|
< 1 + |x| = B_1.
The set {|x_1|, |x_2|, …, |x_{M−1}|} is a finite set and hence let
B_2 := max {|x_1|, |x_2|, …, |x_{M−1}|}.
Let B := max {B_1, B_2}. Then for all n ∈ N we have
|x_n| ≤ B.
The sequence {(−1)^n} shows that the converse does not hold. A bounded sequence is not necessarily convergent.
\[ \lim_{n \to \infty} \frac{n^2+1}{n^2+n} = 1 . \]
Given ϵ > 0, find M ∈ N such that 1/(M + 1) < ϵ. Then for any n ≥ M we have
\[
\begin{split}
\left\lvert \frac{n^2+1}{n^2+n} - 1 \right\rvert
= \left\lvert \frac{n^2+1 - (n^2+n)}{n^2+n} \right\rvert
& = \left\lvert \frac{1-n}{n^2+n} \right\rvert \\
& = \frac{n-1}{n^2+n} \\
& \leq \frac{n}{n^2+n} = \frac{1}{n+1} \\
& \leq \frac{1}{M+1} < \epsilon .
\end{split}
\]
Therefore, lim (n^2 + 1)/(n^2 + n) = 1.
Monotone sequences
The simplest type of a sequence is a monotone sequence. Checking that a monotone sequence converges is as easy as checking that it is bounded. It is also easy to find the limit for a convergent
monotone sequence, provided we can find the supremum or infimum of a countable set of numbers.
A sequence {x_n} is monotone increasing if x_n ≤ x_{n+1} for all n ∈ N. A sequence {x_n} is monotone decreasing if x_n ≥ x_{n+1} for all n ∈ N. If a sequence is either monotone increasing or monotone decreasing, we can simply say the sequence is monotone. Some authors also use the word monotonic.
For example, {1/n} is monotone decreasing, the constant sequence {1} is both monotone increasing and monotone decreasing, and {(−1)^n} is not monotone. First few terms of a
sample monotone increasing sequence are shown in .
[thm:monotoneconv] A monotone sequence {x_n} is bounded if and only if it is convergent.
Let us suppose the sequence is monotone increasing. Suppose the sequence is bounded, so there exists a B such that x_n ≤ B for all n, that is the set {x_n : n ∈ N} is bounded from above. Let
x := sup {x_n : n ∈ N}.
Let ϵ > 0 be arbitrary. As x is the supremum, there must be at least one M ∈ N such that x_M > x − ϵ. As {x_n} is monotone increasing, it is easy to see (by induction) that x_n ≥ x_M for all n ≥ M. Hence
|x_n − x| = x − x_n ≤ x − x_M < ϵ.
Therefore the sequence converges to x. We already know that a convergent sequence is bounded, which completes the other direction of the implication.
The proof for monotone decreasing sequences is left as an exercise.
Take the sequence {1/√n}.
First 1/√n > 0 for all n ∈ N, and hence the sequence is bounded from below. Let us show that it is monotone decreasing. We start with √(n + 1) ≥ √n (why is that true?). From this inequality we obtain
1/√(n + 1) ≤ 1/√n.
So the sequence is monotone decreasing and bounded from below (hence bounded). By [thm:monotoneconv] the sequence is convergent and in fact
lim_{n→∞} 1/√n = inf {1/√n : n ∈ N}.
We already know that the infimum is greater than or equal to 0, as 0 is a lower bound. Take a number b ≥ 0 such that b ≤ 1/√n for all n. We square both sides to obtain
b^2 ≤ 1/n for all n ∈ N.
We have seen before that this implies that b^2 ≤ 0 (a consequence of the Archimedean property). As we also have b^2 ≥ 0, then b^2 = 0 and so b = 0. Hence b = 0 is the greatest lower bound, and lim 1/√n = 0.
A word of caution: We must show that a monotone sequence is bounded in order to use [thm:monotoneconv]. For example, the sequence {1 + 1/2 + ⋯ + 1/n} is a monotone increasing sequence that grows very slowly. We will see, once we get to series, that this sequence has no upper bound and so does not converge. It is not at all obvious that this sequence has no upper bound.
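A few partial sums show how slowly this sequence grows. The sketch below is numerical evidence only, computed in floating point, and proves nothing about unboundedness; the function name `harmonic` and the sample values of n are assumptions made for illustration.

```python
def harmonic(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n, computed in floating point."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1000, 10**5):
    print(n, harmonic(n))

# How much the sum grows when n is doubled, for a few sample n.
gaps = [harmonic(2 * n) - harmonic(n) for n in (1, 10, 100, 1000)]
print(gaps)
```

Going from n = 10 to n = 100000 moves the sum only from about 2.9 to about 12.1, yet each doubling of n in the sample still adds at least 1/2, which hints at why no upper bound exists.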
A common example of where monotone sequences arise is the following proposition. The proof is left as an exercise.
[prop:supinfseq] Let S ⊂ R be a nonempty bounded set. Then there exist monotone sequences {x_n} and {y_n} such that x_n, y_n ∈ S and
sup S = lim_{n→∞} x_n and inf S = lim_{n→∞} y_n.
Tail of a sequence
For a sequence {x_n}, the K-tail (where K ∈ N) or just the tail of the sequence is the sequence starting at K + 1, usually written as
{x_{n+K}}_{n=1}^∞ or {x_n}_{n=K+1}^∞.
The main result about the tail of a sequence is the following proposition.
Let {x_n}_{n=1}^∞ be a sequence. Then the following statements are equivalent:
i. [prop:ktail:i] The sequence {x_n}_{n=1}^∞ converges.
ii. [prop:ktail:ii] The K-tail {x_{n+K}}_{n=1}^∞ converges for all K ∈ N.
iii. [prop:ktail:iii] The K-tail {x_{n+K}}_{n=1}^∞ converges for some K ∈ N.
Furthermore, if any (and hence all) of the limits exist, then for any K ∈ N
lim_{n→∞} x_n = lim_{n→∞} x_{n+K}.
It is clear that [prop:ktail:ii] implies [prop:ktail:iii]. We will therefore show first that [prop:ktail:i] implies [prop:ktail:ii], and then we will show that [prop:ktail:iii] implies [prop:ktail:i]. In the
process we will also show that the limits are equal.
Let us start with [prop:ktail:i] implies [prop:ktail:ii]. Suppose {x_n} converges to some x ∈ R. Let K ∈ N be arbitrary. Define y_n := x_{n+K}; we wish to show that {y_n} converges to x. As {x_n} converges to x, given an ϵ > 0, there exists an M ∈ N such that |x − x_n| < ϵ for all n ≥ M. Note that n ≥ M implies n + K ≥ M. Therefore, it is true that for all n ≥ M we have that
|x − y_n| = |x − x_{n+K}| < ϵ.
Therefore {y_n} converges to x.
Let us move to [prop:ktail:iii] implies [prop:ktail:i]. Let K ∈ N be given, define y_n := x_{n+K}, and suppose that {y_n} converges to x ∈ R. That is, given an ϵ > 0, there exists an M′ ∈ N such that
1/17 < 1/10 < 3/25 < 1/8 > 5/41 > 3/26 > 7/65 > 1/10 > 9/97 > 5/58 > ….
That is, if we throw away the first 3 terms and look at the 3-tail, it is decreasing. The proof is left as an exercise. Since the 3-tail is monotone and bounded below by zero, it is convergent, and therefore the sequence is convergent.
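The listed terms match the sequence x_n := n/(n^2 + 16) that appears in the exercises below, so the pattern can be checked by direct computation. This is a finite check under that assumption, not a proof that the whole 3-tail is decreasing.

```python
from fractions import Fraction

def x(n: int) -> Fraction:
    return Fraction(n, n * n + 16)

first_ten = [x(n) for n in range(1, 11)]
print(" ".join(str(t) for t in first_ten))

# The terms rise up to n = 4 and fall afterwards (checked here only up to n = 2000).
rising = all(x(n) < x(n + 1) for n in range(1, 4))
tail_decreasing = all(x(n) > x(n + 1) for n in range(4, 2000))
print(rising, tail_decreasing)
```

Exact fractions are used so that near-equal neighbors such as 1/8 and 5/41 are compared without rounding error.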
Subsequences
A very useful concept related to sequences is that of a subsequence. A subsequence of {x n} is a sequence that contains only some of the numbers from {x n} in the same order.
Let {x_n} be a sequence. Let {n_i} be a strictly increasing sequence of natural numbers (that is n_1 < n_2 < n_3 < ⋯). The sequence
{x_{n_i}}_{i=1}^∞
is called a subsequence of {x_n}.
lim_{n→∞} x_n = lim_{i→∞} x_{n_i}.
Suppose lim_{n→∞} x_n = x. That means that for every ϵ > 0 we have an M ∈ N such that for all n ≥ M
|x_n − x| < ϵ.
It is not hard to prove (do it!) by induction that n_i ≥ i. Hence i ≥ M implies n_i ≥ M. Thus, for all i ≥ M we have
|x_{n_i} − x| < ϵ,
and we are done.
Existence of a convergent subsequence does not imply convergence of the sequence itself. Take the sequence 0, 1, 0, 1, 0, 1, …. That is, x_n = 0 if n is odd, and x_n = 1 if n is even. The sequence {x_n} is divergent, however, the subsequence {x_{2n}} converges to 1 and the subsequence {x_{2n+1}} converges to 0. Compare .
Exercises
In the following exercises, feel free to use what you know from calculus to find the limit, if it exists. But you must prove that you found the correct limit, or prove that the sequence is divergent.
Is the sequence {3n} bounded? Prove or disprove.
Is the sequence {n} convergent? If so, what is the limit?
Is the sequence {(−1)^n/(2n)} convergent? If so, what is the limit?
Is the sequence {n/(n + 1)} convergent? If so, what is the limit?
Is the sequence {n/(n^2 + 1)} convergent? If so, what is the limit?
a. Show that lim x_n = 0 (that is, the limit exists and is zero) if and only if lim |x_n| = 0.
b. Find an example such that {|x_n|} converges and {x_n} diverges.
Is the sequence {2^n/n!} convergent? If so, what is the limit?
lim_{n→∞} x_n = x_k.
\[
x_n :=
\begin{cases}
n & \text{if } n \text{ is odd}, \\
1/n & \text{if } n \text{ is even}.
\end{cases}
\]
Let {x_n} be a sequence and x ∈ R. Suppose for any ϵ > 0, there is an M such that for all n ≥ M, |x_n − x| ≤ ϵ. Show that lim x_n = x.
Let {x_n} be a sequence and x ∈ R such that there exists a k ∈ N such that for all n ≥ k, x_n = x. Prove that {x_n} converges to x.
Let {x_n} be a sequence and define a sequence {y_n} by y_{2k} := x_{k^2} and y_{2k−1} := x_k for all k ∈ N. Prove that {x_n} converges if and only if {y_n} converges. Furthermore, prove that if they converge, then lim x_n = lim y_n.
Show that the 3-tail of the sequence defined by x_n := n/(n^2 + 16) is monotone decreasing. Hint: Suppose n ≥ m ≥ 4 and consider the numerator of the expression x_n − x_m.
Suppose that {x_n} is a sequence such that the subsequences {x_{2n}}, {x_{2n−1}}, and {x_{3n}} all converge. Show that {x_n} is convergent.
a_n ≤ x_n ≤ b_n for all n ∈ N.
lim_{n→∞} a_n = lim_{n→∞} b_n.
The intuitive idea of the proof is illustrated in . If \(x\) is the limit of \(\{a_n\}\) and \(\{b_n\}\), and both \(a_n\) and \(b_n\) are within \(\epsilon/3\) of \(x\), then the distance between \(a_n\) and \(b_n\) is at most \(2\epsilon/3\). As \(x_n\) is between \(a_n\) and \(b_n\), it is at most \(2\epsilon/3\) from \(a_n\). Since \(a_n\) is at most \(\epsilon/3\) away from \(x\), \(x_n\) must be at most \(\epsilon\) away from \(x\). Let us follow through on this intuition rigorously.
Find an \(M_1\) such that for all \(n \geq M_1\) we have \(|a_n - x| < \epsilon/3\), and an \(M_2\) such that for all \(n \geq M_2\) we have \(|b_n - x| < \epsilon/3\). Set \(M := \max\{M_1, M_2\}\). Suppose \(n \geq M\). We compute
\[ \begin{split} |x_n - a_n| = x_n - a_n & \leq b_n - a_n \\ & = |b_n - x + x - a_n| \\ & \leq |b_n - x| + |x - a_n| \\ & < \frac{\epsilon}{3} + \frac{\epsilon}{3} = \frac{2\epsilon}{3} . \end{split} \]
\[ \begin{split} |x_n - x| & = |x_n - x + a_n - a_n| \\ & \leq |x_n - a_n| + |a_n - x| \\ & < \frac{2\epsilon}{3} + \frac{\epsilon}{3} = \epsilon . \end{split} \]
\[ 0 \leq \frac{1}{n\sqrt{n}} \leq \frac{1}{n} \]
for all \(n \in \mathbb{N}\). We already know \(\lim 1/n = 0\). Hence, using the constant sequence \(\{0\}\) and the sequence \(\{1/n\}\) in the squeeze lemma, we conclude
\[ \lim_{n\to\infty} \frac{1}{n\sqrt{n}} = 0 . \]
Limits also preserve inequalities.
[limandineq:lemma] Let {x n} and {y n} be convergent sequences and
x n ≤ y n,
\[ \lim_{n\to\infty} x_n \leq \lim_{n\to\infty} y_n . \]
y n − x n + x − y < ϵ, or y n − x n < y − x + ϵ.
x − y < ϵ.
Because ϵ > 0 was arbitrary we obtain x − y ≤ 0, as we have seen that a nonnegative number less than any positive ϵ is zero. Therefore x ≤ y.
An easy corollary is proved using constant sequences in . The proof is left as an exercise.
[limandineq:cor]
i. Let \(\{x_n\}\) be a convergent sequence such that \(x_n \geq 0\), then
\[ \lim_{n\to\infty} x_n \geq 0 . \]
\[ a \leq x_n \leq b , \]
\[ a \leq \lim_{n\to\infty} x_n \leq b . \]
In and we cannot simply replace all the non-strict inequalities with strict inequalities. For example, let \(x_n := -1/n\) and \(y_n := 1/n\). Then \(x_n < y_n\), \(x_n < 0\), and \(y_n > 0\) for all \(n\). However, these inequalities are not preserved by the limit operation as we have \(\lim x_n = \lim y_n = 0\). The moral of this example is that strict inequalities may become non-strict inequalities when limits are applied; if we know \(x_n < y_n\) for all \(n\), we may only conclude
\[ \lim_{n\to\infty} x_n \leq \lim_{n\to\infty} y_n . \]
\[ \lim_{n\to\infty} (x_n y_n) = \lim_{n\to\infty} z_n = \left( \lim_{n\to\infty} x_n \right) \left( \lim_{n\to\infty} y_n \right) . \]
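iv. [prop:contalg:iv] If \(\lim y_n \neq 0\) and \(y_n \neq 0\) for all \(n \in \mathbb{N}\), then the sequence \(\{z_n\}\), where \(z_n := \frac{x_n}{y_n}\), converges and
\[ \lim_{n\to\infty} \frac{x_n}{y_n} = \lim_{n\to\infty} z_n = \frac{\lim x_n}{\lim y_n} . \]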
Let us start with [prop:contalg:i]. Suppose {x n} and {y n} are convergent sequences and write z n := x n + y n. Let x := lim x n, y := lim y n, and z := x + y.
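Let \(\epsilon > 0\) be given. Find an \(M_1\) such that for all \(n \geq M_1\) we have \(|x_n - x| < \epsilon/2\). Find an \(M_2\) such that for all \(n \geq M_2\) we have \(|y_n - y| < \epsilon/2\). Take \(M := \max\{M_1, M_2\}\). For all \(n \geq M\) we have
\[ |z_n - z| = |(x_n + y_n) - (x + y)| \leq |x_n - x| + |y_n - y| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \]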
Therefore [prop:contalg:i] is proved. Proof of [prop:contalg:ii] is almost identical and is left as an exercise.
Let us tackle [prop:contalg:iii]. Suppose again that {x n} and {y n} are convergent sequences and write z n := x ny n. Let x := lim x n, y := lim y n, and z := xy.
Let \(\epsilon > 0\) be given. As \(\{x_n\}\) is convergent, it is bounded. Therefore, find a \(B > 0\) such that \(|x_n| \leq B\) for all \(n \in \mathbb{N}\). Find an \(M_1\) such that for all \(n \geq M_1\) we have \(|x_n - x| < \frac{\epsilon}{2(|y|+1)}\). Find an \(M_2\) such that for all \(n \geq M_2\) we have \(|y_n - y| < \frac{\epsilon}{2B}\). Take \(M := \max\{M_1, M_2\}\). For all \(n \geq M\) we have
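\[ \begin{split} |z_n - z| & = |x_n y_n - x y| \\ & = |x_n (y_n - y) + (x_n - x) y| \\ & \leq |x_n| |y_n - y| + |x_n - x| |y| \\ & \leq B |y_n - y| + |x_n - x| |y| \\ & < B \frac{\epsilon}{2B} + \frac{\epsilon}{2(|y|+1)} |y| \\ & < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \end{split} \]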
Finally let us tackle [prop:contalg:iv]. Instead of proving [prop:contalg:iv] directly, we prove the following simpler claim:
Claim: If {y n} is a convergent sequence such that lim y n ≠ 0 and y n ≠ 0 for all n ∈ N, then
\[ \lim_{n\to\infty} \frac{1}{y_n} = \frac{1}{\lim y_n} . \]
Once the claim is proved, we take the sequence \(\{1/y_n\}\), multiply it by the sequence \(\{x_n\}\) and apply item [prop:contalg:iii].
Proof of claim: Let \(\epsilon > 0\) be given. Let \(y := \lim y_n\). Find an \(M\) such that for all \(n \geq M\) we have
\[ |y_n - y| < \min \left\{ |y|^2 \frac{\epsilon}{2}, \, \frac{|y|}{2} \right\} . \]
Notice that
\[ |y| = |y - y_n + y_n| \leq |y - y_n| + |y_n| < \frac{|y|}{2} + |y_n| . \]
Subtracting \(|y|/2\) from both sides we obtain \(|y|/2 < |y_n|\), or in other words,
\[ \frac{1}{|y_n|} < \frac{2}{|y|} . \]
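\[ \begin{split} \left| \frac{1}{y_n} - \frac{1}{y} \right| & = \left| \frac{y - y_n}{y y_n} \right| \\ & = \frac{|y - y_n|}{|y| |y_n|} \\ & < \frac{|y - y_n|}{|y|} \, \frac{2}{|y|} \\ & < \frac{|y|^2 \frac{\epsilon}{2}}{|y|} \, \frac{2}{|y|} = \epsilon . \end{split} \]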
\[ \lim_{n\to\infty} c x_n = c \left( \lim_{n\to\infty} x_n \right) \qquad \text{and} \qquad \lim_{n\to\infty} (c + x_n) = c + \lim_{n\to\infty} x_n . \]
\[ \lim_{n\to\infty} \sqrt{x_n} = \sqrt{\lim_{n\to\infty} x_n} . \]
Of course to even make this statement, we need to apply to show that lim x n ≥ 0, so that we can take the square root without worry.
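First suppose \(x = 0\). Let \(\epsilon > 0\) be given. Then there is an \(M\) such that for all \(n \geq M\) we have \(x_n = |x_n| < \epsilon^2\), or in other words \(\sqrt{x_n} < \epsilon\). Hence
\[ |\sqrt{x_n} - \sqrt{x}| = \sqrt{x_n} < \epsilon . \]
Now suppose \(x > 0\) (and hence \(\sqrt{x} > 0\)). Then
\[ \begin{split} |\sqrt{x_n} - \sqrt{x}| & = \left| \frac{x_n - x}{\sqrt{x_n} + \sqrt{x}} \right| \\ & = \frac{1}{\sqrt{x_n} + \sqrt{x}} |x_n - x| \\ & \leq \frac{1}{\sqrt{x}} |x_n - x| . \end{split} \]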
We leave the rest of the proof to the reader.
A similar proof works for the kth root. That is, we also obtain \(\lim x_n^{1/k} = (\lim x_n)^{1/k}\). We leave this to the reader as a challenging exercise.
We may also want to take the limit past the absolute value sign. The converse of this proposition is not true, see part b).
If \(\{x_n\}\) is a convergent sequence, then \(\{|x_n|\}\) is convergent and
\[ \lim_{n\to\infty} |x_n| = \left| \lim_{n\to\infty} x_n \right| . \]
\[ \big| |x_n| - |x| \big| \leq |x_n - x| . \]
Hence if \(|x_n - x|\) can be made arbitrarily small, so can \(\big| |x_n| - |x| \big|\). Details are left to the reader.
Let us see an example putting the above propositions together. Since we know that \(\lim 1/n = 0\), then
\[ \lim_{n\to\infty} \left| \sqrt{1 + 1/n} - 100/n^2 \right| = \left| \sqrt{1 + \left( \lim 1/n \right)} - 100 \left( \lim 1/n \right) \left( \lim 1/n \right) \right| = 1 . \]
That is, the limit on the left hand side exists because the right hand side exists. You really should read the above equality from right to left.
Recursively defined sequences
Now that we know we can interchange limits and algebraic operations, we can compute the limits of many sequences. One such class is recursively defined sequences, that is, sequences where the next number in the sequence is computed using a formula from a fixed number of preceding elements in the sequence.
Let \(\{x_n\}\) be defined by \(x_1 := 2\) and
\[ x_{n+1} := x_n - \frac{x_n^2 - 2}{2x_n} . \]
We must first find out if this sequence is well defined; we must show we never divide by zero. Then we must find out if the sequence converges. Only then can we attempt to find the limit.
First let us prove \(x_n\) exists and \(x_n > 0\) for all \(n\) (so the sequence is well defined and bounded below). Let us show this by induction. We know that \(x_1 = 2 > 0\). For the induction step, suppose \(x_n > 0\). Then
\[ x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{2x_n^2 - x_n^2 + 2}{2x_n} = \frac{x_n^2 + 2}{2x_n} . \]
If \(x_n > 0\), then \(x_n^2 + 2 > 0\) and hence \(x_{n+1} > 0\).
Next let us show that the sequence is monotone decreasing. If we show that \(x_n^2 - 2 \geq 0\) for all \(n\), then \(x_{n+1} \leq x_n\) for all \(n\). Obviously \(x_1^2 - 2 = 4 - 2 = 2 > 0\). For an arbitrary \(n\) we have
\[ x_{n+1}^2 - 2 = \left( \frac{x_n^2 + 2}{2x_n} \right)^2 - 2 = \frac{x_n^4 + 4x_n^2 + 4 - 8x_n^2}{4x_n^2} = \frac{x_n^4 - 4x_n^2 + 4}{4x_n^2} = \frac{\left( x_n^2 - 2 \right)^2}{4x_n^2} . \]
Since any number squared is nonnegative, we have that \(x_{n+1}^2 - 2 \geq 0\) for all \(n\). Therefore, \(\{x_n\}\) is monotone decreasing and bounded (\(x_n > 0\) for all \(n\)), and the limit exists. It remains to find the limit.
Let us write
\[ 2 x_n x_{n+1} = x_n^2 + 2 . \]
Since \(\{x_{n+1}\}\) is the 1-tail of \(\{x_n\}\), it converges to the same limit. Let us define \(x := \lim x_n\). We take the limit of both sides to obtain
\[ 2x^2 = x^2 + 2 , \]
or \(x^2 = 2\). As \(x_n > 0\) for all \(n\) we get \(x \geq 0\), and therefore \(x = \sqrt{2}\).
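The properties proved above can also be observed numerically. The following is a minimal Python sketch, not part of the original text: the function name `newton_sqrt2` and the choice of exact rational arithmetic (`fractions.Fraction`) are illustrative assumptions. It generates the first terms of the recursion so one can check that they stay positive, decrease, and satisfy x n² − 2 ≥ 0.

```python
from fractions import Fraction


def newton_sqrt2(steps, x1=Fraction(2)):
    """Return [x_1, ..., x_steps] for x_{n+1} = x_n - (x_n^2 - 2) / (2 x_n).

    Hypothetical helper for illustration; exact rationals avoid rounding.
    """
    xs = [Fraction(x1)]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append(x - (x * x - 2) / (2 * x))
    return xs


xs = newton_sqrt2(6)
for n, x in enumerate(xs, start=1):
    # Print the term and how far its square is from 2.
    print(n, float(x), float(x * x - 2))
```

With x 1 = 2 the first four terms are 2, 3/2, 17/12, and 577/408, and the quantity x n² − 2 stays positive while shrinking very quickly, which matches the monotonicity argument.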
You may have seen the above sequence before. It is Newton's method for finding the square root of 2. This method comes up very often in practice and converges very rapidly. Notice that we have used the fact that \(x_1^2 - 2 > 0\), although it was not strictly needed to show convergence by considering a tail of the sequence. In fact the sequence converges as long as \(x_1 \neq 0\), although
You should, however, be careful. Before taking any limits, you must make sure the sequence converges. Let us see an example.
Suppose \(x_1 := 1\) and \(x_{n+1} := x_n^2 + x_n\). If we blindly assumed that the limit exists (call it \(x\)), then we would get the equation \(x = x^2 + x\), from which we might conclude \(x = 0\). However, it is not hard to show that \(\{x_n\}\) is unbounded and therefore does not converge.
The thing to notice in this example is that the method still works, but it depends on the initial value x 1. If we set x 1 := 0, then the sequence converges and the limit really is 0. An entire branch
of mathematics, called dynamics, deals precisely with these issues.
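The dependence on the initial value is easy to see by computing a few terms. The sketch below is an illustration only (the helper name `iterate` is an assumption, not from the text); it applies the rule x n+1 = x n² + x n with integer arithmetic for the two starting values discussed above.

```python
def iterate(x1, steps):
    """Return [x_1, ..., x_steps] for the recursion x_{n+1} = x_n^2 + x_n."""
    xs = [x1]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append(x * x + x)
    return xs


print(iterate(1, 6))  # grows without bound when started at 1
print(iterate(0, 6))  # stays at 0 when started at 0
```

Starting at 1, each term is at least one more than the previous term, so the terms cannot stay bounded; starting at 0, every term is 0, and the formal computation of the limit gives the right answer.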
Some convergence tests
It is not always necessary to go back to the definition of convergence to prove that a sequence is convergent. We first give a simple convergence test. The main idea is that \(\{x_n\}\) converges to \(x\) if and only if \(\{|x_n - x|\}\) converges to zero.
[convzero:prop] Let {x n} be a sequence. Suppose there is an x ∈ R and a convergent sequence {a n} such that
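\[ \lim_{n\to\infty} a_n = 0 \]
and
\[ |x_n - x| \leq a_n \]
for all \(n\). Then \(\{x_n\}\) converges and \(\lim x_n = x\).
Let \(\epsilon > 0\) be given. Note that \(a_n \geq 0\) for all \(n\). Find an \(M \in \mathbb{N}\) such that for all \(n \geq M\) we have \(a_n = |a_n - 0| < \epsilon\). Then, for all \(n \geq M\) we have
\[ |x_n - x| \leq a_n < \epsilon . \]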
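\[ \lim_{n\to\infty} c^n = 0 . \]
\[ c^n = (1+r)^n \geq 1 + nr . \]
By the Archimedean property of the real numbers, the sequence \(\{1 + nr\}\) is unbounded (for any number \(B\), we find an \(n \in \mathbb{N}\) such that \(nr \geq B - 1\)). Therefore \(\{c^n\}\) is unbounded.
Now let \(c < 1\). Write \(c = \frac{1}{1+r}\), where \(r > 0\). Then
\[ c^n = \frac{1}{(1+r)^n} \leq \frac{1}{1+nr} \leq \frac{1}{r} \frac{1}{n} . \]
As \(\{\frac{1}{n}\}\) converges to zero, so does \(\{\frac{1}{r} \frac{1}{n}\}\). Hence, \(\{c^n\}\) converges to zero.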
If we look at the above proposition, we note that the ratio of the (n + 1)th term and the nth term is c. We generalize this simple result to a larger class of sequences. The following lemma will
come up again once we get to series.
[seq:ratiotest] Let {x n} be a sequence such that x n ≠ 0 for all n and such that the limit
\[ L := \lim_{n\to\infty} \frac{|x_{n+1}|}{|x_n|} \]
exists.
i. If L < 1, then {x n} converges and lim x n = 0.
ii. If L > 1, then {x n} is unbounded (hence diverges).
If \(L\) exists, but \(L = 1\), the lemma says nothing. We cannot make any conclusion based on that information alone. For example, the sequence \(\{1/n\}\) converges to zero, but \(L = 1\). The constant sequence \(\{1\}\) converges to 1, not zero, and also \(L = 1\). The sequence \(\{(-1)^n\}\) does not converge at all, and \(L = 1\). Finally the sequence \(\{\ln n\}\) is unbounded, yet again \(L = 1\).
Suppose \(L < 1\). As \(\frac{|x_{n+1}|}{|x_n|} \geq 0\), we have that \(L \geq 0\). Pick \(r\) such that \(L < r < 1\). We wish to compare the sequence to the sequence \(r^n\). The idea is that while the ratio need not eventually be less than \(L\), it will eventually be less than \(r\), which is still less than 1. The intuitive idea of the proof is illustrated in .
As r − L > 0, there exists an M ∈ N such that for all n ≥ M we have
\[ \left| \frac{|x_{n+1}|}{|x_n|} - L \right| < r - L . \]
Therefore,
\[ \frac{|x_{n+1}|}{|x_n|} < r . \]
Now suppose L > 1. Pick r such that 1 < r < L. As L − r > 0, there exists an M ∈ N such that for all n ≥ M we have
\[ \left| \frac{|x_{n+1}|}{|x_n|} - L \right| < L - r . \]
Therefore,
\[ \frac{|x_{n+1}|}{|x_n|} > r . \]
Again for n > M we write
converge.
A simple application of the above lemma is to prove that
\[ \lim_{n\to\infty} \frac{2^n}{n!} = 0 . \]
\[ \frac{2^{n+1} / (n+1)!}{2^n / n!} = \frac{2^{n+1}}{2^n} \frac{n!}{(n+1)!} = \frac{2}{n+1} . \]
It is not hard to see that \(\{\frac{2}{n+1}\}\) converges to zero. The conclusion follows by the lemma.
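The ratio computed in this example can be checked with exact arithmetic. The following Python sketch is an illustration under assumptions (the helper names `x` and `ratio` are not from the text); it confirms for a range of n that the ratio of consecutive terms equals 2/(n + 1) and shows how small the terms become.

```python
from fractions import Fraction
from math import factorial


def x(n):
    """The n-th term 2^n / n! as an exact fraction."""
    return Fraction(2**n, factorial(n))


def ratio(n):
    """The ratio x_{n+1} / x_n used in the lemma (terms are positive)."""
    return x(n + 1) / x(n)


for n in (1, 5, 10, 20):
    print(n, float(x(n)), ratio(n))
```

Since the ratios tend to 0, which is less than 1, the lemma applies and the terms tend to zero.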
Exercises
Prove . Hint: Use constant sequences and .
Prove part [prop:contalg:ii] of .
Prove that if \(\{x_n\}\) is a convergent sequence, \(k \in \mathbb{N}\), then
\[ \lim_{n\to\infty} x_n^k = \left( \lim_{n\to\infty} x_n \right)^k . \]
Hint: Use .
Suppose \(x_1 := \frac{1}{2}\) and \(x_{n+1} := x_n^2\). Show that \(\{x_n\}\) converges and find \(\lim x_n\). Hint: You cannot divide by zero!
Let \(x_n := \frac{n - \cos(n)}{n}\). Use the squeeze lemma to show that \(\{x_n\}\) converges and find the limit.
Let \(x_n := \frac{1}{n^2}\) and \(y_n := \frac{1}{n}\). Define \(z_n := \frac{x_n}{y_n}\) and \(w_n := \frac{y_n}{x_n}\). Do \(\{z_n\}\) and \(\{w_n\}\) converge? What are the limits? Can you apply ? Why or why not?
True or false, prove or find a counterexample. If \(\{x_n\}\) is a sequence such that \(\{x_n^2\}\) converges, then \(\{x_n\}\) converges.
Show that
\[ \lim_{n\to\infty} \frac{n^2}{2^n} = 0 . \]
|x n + 1 − x |
L := lim
\[ \lim_{n\to\infty} x_n^{1/k} = \left( \lim_{n\to\infty} x_n \right)^{1/k} . \]
Hint: Find an expression \(q\) such that \(\frac{x_n^{1/k} - x^{1/k}}{x_n - x} = \frac{1}{q}\).
Let \(r > 0\). Show that starting with any \(x_1 \neq 0\), the sequence defined by
\[ x_{n+1} := x_n - \frac{x_n^2 - r}{2x_n} \]
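[liminflimsup:def] Let \(\{x_n\}\) be a bounded sequence. Let \(a_n := \sup \{x_k : k \geq n\}\) and \(b_n := \inf \{x_k : k \geq n\}\). Define
\[ \limsup_{n\to\infty} x_n := \lim_{n\to\infty} a_n , \qquad \liminf_{n\to\infty} x_n := \lim_{n\to\infty} b_n . \]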
For a bounded sequence, liminf and limsup always exist (see below). It is possible to define liminf and limsup for unbounded sequences if we allow ∞ and − ∞. It is not hard to generalize the
following results to include unbounded sequences, however, we first restrict our attention to bounded ones.
Let {x n} be a bounded sequence. Let a n and b n be as in the definition above.
i. The sequence {a n} is bounded monotone decreasing and {b n} is bounded monotone increasing. In particular, lim inf x n and lim sup x n exist.
ii. \(\limsup_{n\to\infty} x_n = \inf \{a_n : n \in \mathbb{N}\}\) and \(\liminf_{n\to\infty} x_n = \sup \{b_n : n \in \mathbb{N}\}\).
iii. \(\liminf_{n\to\infty} x_n \leq \limsup_{n\to\infty} x_n\).
Let us see why {a n} is a decreasing sequence. As a n is the least upper bound for {x k : k ≥ n}, it is also an upper bound for the subset {x k : k ≥ (n + 1)}. Therefore a n + 1, the least upper bound
for {x k : k ≥ (n + 1)}, has to be less than or equal to a n, that is, a n ≥ a n + 1. Similarly (an exercise), b n is an increasing sequence. It is left as an exercise to show that if x n is bounded, then a n and
b n must be bounded.
The second item in the proposition follows as the sequences {a n} and {b n} are monotone.
For the third item, we note that b n ≤ a n, as the inf of a set is less than or equal to its sup . We know that {a n} and {b n} converge to the limsup and the liminf (respectively). We apply to
obtain
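Let \(\{x_n\}\) be defined by
\[ x_n := \begin{cases} \frac{n+1}{n} & \text{if $n$ is odd}, \\ 0 & \text{if $n$ is even}. \end{cases} \]
Let us compute the lim inf and lim sup of this sequence. First the limit inferior:
\[ \liminf_{n\to\infty} x_n = \lim_{n\to\infty} \left( \inf \{x_k : k \geq n\} \right) = \lim_{n\to\infty} 0 = 0 . \]
For the limit superior we obtain
\[ \limsup_{n\to\infty} x_n = 1 . \]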
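The tail suprema and infima for this example can be computed directly. The Python sketch below is an illustration only: the names `tail_sup` and `tail_inf` and the finite `window` parameter are assumptions, since a program can only inspect finitely many terms. For this particular sequence the finite window gives the exact values, because the odd-indexed terms decrease and every window of length at least 2 contains an even index.

```python
from fractions import Fraction


def x(n):
    """x_n = (n+1)/n for odd n and 0 for even n."""
    return Fraction(n + 1, n) if n % 2 == 1 else Fraction(0)


def tail_sup(n, window=50):
    """sup{x_k : k >= n}, computed over k = n, ..., n + window - 1."""
    return max(x(k) for k in range(n, n + window))


def tail_inf(n, window=50):
    """inf{x_k : k >= n}, computed over k = n, ..., n + window - 1."""
    return min(x(k) for k in range(n, n + window))


for n in (1, 2, 11, 101):
    print(n, tail_sup(n), tail_inf(n))
```

The printed suprema decrease toward 1 while the infima are all 0, in agreement with the values of the limit superior and limit inferior.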
Define \(a_n := \sup \{x_k : k \geq n\}\). Write \(x := \limsup x_n = \lim a_n\). Define the subsequence as follows. Pick \(n_1 := 1\) and work inductively. Suppose we have defined the subsequence until \(n_k\) for some \(k\). Now pick some \(m > n_k\) such that
\[ a_{(n_k+1)} - x_m < \frac{1}{k+1} . \]
We can do this as \(a_{(n_k+1)}\) is a supremum of the set \(\{x_n : n \geq n_k + 1\}\) and hence there are elements of the sequence arbitrarily close (or even possibly equal) to the supremum. Set \(n_{k+1} := m\). The subsequence \(\{x_{n_k}\}\) is defined. Next we need to prove that it converges and has the right limit.
Note that \(a_{(n_{k-1}+1)} \geq a_{n_k}\) (why?) and that \(a_{n_k} \geq x_{n_k}\). Therefore, for every \(k > 1\) we have
\[ \begin{split} |a_{n_k} - x_{n_k}| & = a_{n_k} - x_{n_k} \\ & \leq a_{(n_{k-1}+1)} - x_{n_k} \\ & < \frac{1}{k} . \end{split} \]
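Let us show that \(\{x_{n_k}\}\) converges to \(x\). Note that the subsequence need not be monotone. Let \(\epsilon > 0\) be given. As \(\{a_n\}\) converges to \(x\), the subsequence \(\{a_{n_k}\}\) converges to \(x\). Thus there exists an \(M_1 \in \mathbb{N}\) such that for all \(k \geq M_1\) we have
\[ |a_{n_k} - x| < \frac{\epsilon}{2} . \]
Find an \(M_2 \in \mathbb{N}\) such that
\[ \frac{1}{M_2} \leq \frac{\epsilon}{2} . \]
Take \(M := \max\{M_1, M_2, 2\}\) (so that \(k > 1\) whenever \(k \geq M\)). For all \(k \geq M\) we have
\[ \begin{split} |x - x_{n_k}| & = |a_{n_k} - x_{n_k} + x - a_{n_k}| \\ & \leq |a_{n_k} - x_{n_k}| + |x - a_{n_k}| \\ & < \frac{1}{k} + \frac{\epsilon}{2} \\ & \leq \frac{1}{M_2} + \frac{\epsilon}{2} \leq \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \end{split} \]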
If \(\liminf x_n = \limsup x_n\), then we know that \(\{a_n\}\) and \(\{b_n\}\) have limits and that these two limits are the same. By the squeeze lemma, \(\{x_n\}\) converges and
\[ \lim_{n\to\infty} b_n = \lim_{n\to\infty} x_n = \lim_{n\to\infty} a_n . \]
Now suppose \(\{x_n\}\) converges to \(x\). We know by that there exists a subsequence \(\{x_{n_k}\}\) that converges to \(\limsup x_n\). As \(\{x_n\}\) converges to \(x\), every subsequence converges to \(x\) and therefore \(\limsup x_n = \lim x_{n_k} = x\). Similarly \(\liminf x_n = x\).
The middle inequality has been proved already. We will prove the third inequality, and leave the first inequality as an exercise.
We want to prove that \(\limsup x_{n_k} \leq \limsup x_n\). Define \(a_j := \sup \{x_k : k \geq j\}\) as usual. Also define \(c_j := \sup \{x_{n_k} : k \geq j\}\). It is not true that \(c_j\) is necessarily a subsequence of \(a_j\). However, as \(n_k \geq k\) for all \(k\), we have that \(\{x_{n_k} : k \geq j\} \subset \{x_k : k \geq j\}\). A supremum of a subset is less than or equal to the supremum of the set and therefore
\[ c_j \leq a_j . \]
We apply to conclude
\[ \lim_{j\to\infty} c_j \leq \lim_{j\to\infty} a_j , \]
which is the desired conclusion.
Similarly we get the following useful test for convergence of a bounded sequence. We leave the proof as an exercise.
[seqconvsubseqconv:thm] A bounded sequence \(\{x_n\}\) is convergent and converges to \(x\) if and only if every convergent subsequence \(\{x_{n_k}\}\) converges to \(x\).
Bolzano-Weierstrass theorem
While it is not true that a bounded sequence is convergent, the Bolzano-Weierstrass theorem tells us that we can at least find a convergent subsequence. The version of Bolzano-Weierstrass that
we present in this section is the Bolzano-Weierstrass for sequences.
[thm:bwseq] Suppose a sequence \(\{x_n\}\) of real numbers is bounded. Then there exists a convergent subsequence \(\{x_{n_i}\}\).
We use . It says that there exists a subsequence whose limit is lim sup x n.
The reader might complain right now that is strictly stronger than the Bolzano-Weierstrass theorem as presented above. That is true. However, only applies to the real line, but Bolzano-Weierstrass applies in more general contexts (that is, in \(\mathbb{R}^n\)) with pretty much the exact same statement.
As the theorem is so important to analysis, we present an explicit proof. The following proof generalizes more easily to different contexts.
As the sequence is bounded, then there exist two numbers a 1 < b 1 such that a 1 ≤ x n ≤ b 1 for all n ∈ N.
We will define a subsequence \(\{x_{n_i}\}\) and two sequences \(\{a_i\}\) and \(\{b_i\}\), such that \(\{a_i\}\) is monotone increasing, \(\{b_i\}\) is monotone decreasing, \(a_i \leq x_{n_i} \leq b_i\) and such that \(\lim a_i = \lim b_i\). That \(\{x_{n_i}\}\) converges follows by the squeeze lemma.
We define the sequences inductively. We will always have that \(a_i < b_i\), and that \(x_n \in [a_i, b_i]\) for infinitely many \(n \in \mathbb{N}\). We have already defined \(a_1\) and \(b_1\). We take \(n_1 := 1\), that is \(x_{n_1} = x_1\).
Now suppose that up to some \(k \in \mathbb{N}\) we have defined the subsequence \(x_{n_1}, x_{n_2}, \ldots, x_{n_k}\), and the sequences \(a_1, a_2, \ldots, a_k\) and \(b_1, b_2, \ldots, b_k\). Let \(y := \frac{a_k + b_k}{2}\). Clearly \(a_k < y < b_k\). If there exist infinitely many \(j \in \mathbb{N}\) such that \(x_j \in [a_k, y]\), then set \(a_{k+1} := a_k\), \(b_{k+1} := y\), and pick \(n_{k+1} > n_k\) such that \(x_{n_{k+1}} \in [a_k, y]\). If there are not infinitely many \(j\) such that \(x_j \in [a_k, y]\), then it must be true that there are infinitely many \(j \in \mathbb{N}\) such that \(x_j \in [y, b_k]\). In this case pick \(a_{k+1} := y\), \(b_{k+1} := b_k\), and pick \(n_{k+1} > n_k\) such that \(x_{n_{k+1}} \in [y, b_k]\).
Now we have the sequences defined. What is left to prove is that \(\lim a_i = \lim b_i\). Obviously the limits exist as the sequences are monotone. From the construction, it is obvious that \(b_i - a_i\) is cut in half in each step. Therefore \(b_{i+1} - a_{i+1} = \frac{b_i - a_i}{2}\). By induction, we obtain that
\[ b_i - a_i = \frac{b_1 - a_1}{2^{i-1}} . \]
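Let \(x := \lim a_i\). As \(\{a_i\}\) is monotone, we have that
\[ x = \sup \{a_i : i \in \mathbb{N}\} . \]
Now let \(y := \lim b_i = \inf \{b_i : i \in \mathbb{N}\}\). Obviously \(x \leq y\) as \(a_i < b_i\) for all \(i\). As the sequences are monotone, then for any \(i\) we have (why?)
\[ y - x \leq b_i - a_i = \frac{b_1 - a_1}{2^{i-1}} . \]
As \(\frac{b_1 - a_1}{2^{i-1}}\) is arbitrarily small and \(y - x \geq 0\), we have that \(y - x = 0\). We finish by the squeeze lemma.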
Yet another proof of the Bolzano-Weierstrass theorem is to show the following claim, which is left as a challenging exercise. Claim: Every sequence has a monotone subsequence.
Infinite limits
If we allow lim inf and lim sup to take on the values ∞ and − ∞, we can apply lim inf and lim sup to all sequences, not just bounded ones. For any sequence, we write
Exercises
Suppose {x n} is a bounded sequence. Define a n and b n as in . Show that {a n} and {b n} are bounded.
Prove .
a. Let \(x_n := \frac{(-1)^n}{n}\), find \(\limsup x_n\) and \(\liminf x_n\).
b. Let \(x_n := \frac{(n-1)(-1)^n}{n}\), find \(\limsup x_n\) and \(\liminf x_n\).
Let {x n} and {y n} be bounded sequences such that x n ≤ y n for all n. Then show that
and
Hint: Find a subsequence \(\{x_{n_i} + y_{n_i}\}\) of \(\{x_n + y_n\}\) that converges. Then find a subsequence \(\{x_{n_{m_i}}\}\) of \(\{x_{n_i}\}\) that converges. Then apply what you know about limits.
c. Find an explicit {x n} and {y n} such that
Cauchy sequences
Note: 0.5–1 lecture
Often we wish to describe a certain number by a sequence that converges to it. In this case, it is impossible to use the number itself in the proof that the sequence converges. It would be nice if
we could check for convergence without knowing the limit.
A sequence \(\{x_n\}\) is a Cauchy sequence if for every \(\epsilon > 0\) there exists an \(M \in \mathbb{N}\) such that for all \(n \geq M\) and all \(k \geq M\) we have
\[ |x_n - x_k| < \epsilon . \]
Intuitively what it means is that the terms of the sequence are eventually arbitrarily close to each other. We would expect such a sequence to be convergent. It turns out that is true because R has
the . First, let us look at some examples.
The sequence \(\{1/n\}\) is a Cauchy sequence.
Proof: Given \(\epsilon > 0\), find \(M\) such that \(M > 2/\epsilon\). Then for \(n, k \geq M\) we have that \(1/n < \epsilon/2\) and \(1/k < \epsilon/2\). Therefore for \(n, k \geq M\) we have
\[ \left| \frac{1}{n} - \frac{1}{k} \right| \leq \left| \frac{1}{n} \right| + \left| \frac{1}{k} \right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \]
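The sequence \(\{\frac{n+1}{n}\}\) is a Cauchy sequence.
Proof: Given \(\epsilon > 0\), find \(M\) such that \(M > 2/\epsilon\). Then for \(n, k \geq M\) we have that \(1/n < \epsilon/2\) and \(1/k < \epsilon/2\). Therefore for \(n, k \geq M\) we have
\[ \begin{split} \left| \frac{n+1}{n} - \frac{k+1}{k} \right| & = \left| \frac{k(n+1) - n(k+1)}{nk} \right| \\ & = \left| \frac{kn + k - nk - n}{nk} \right| \\ & = \left| \frac{k - n}{nk} \right| \\ & \leq \left| \frac{k}{nk} \right| + \left| \frac{-n}{nk} \right| \\ & = \frac{1}{n} + \frac{1}{k} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \end{split} \]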
Suppose \(\{x_n\}\) is Cauchy. Pick \(M\) such that for all \(n, k \geq M\) we have \(|x_n - x_k| < 1\). In particular, we have that for all \(n \geq M\)
\[ |x_n - x_M| < 1 . \]
Or by the reverse triangle inequality, \(|x_n| - |x_M| \leq |x_n - x_M| < 1\). Hence for \(n \geq M\) we have
\[ |x_n| < 1 + |x_M| . \]
Let
\[ B := \max \{ |x_1|, |x_2|, \ldots, |x_{M-1}|, 1 + |x_M| \} . \]
Then \(|x_n| \leq B\) for all \(n \in \mathbb{N}\).
\[ |x_n - x| < \frac{\epsilon}{2} . \]
Hence for \(n \geq M\) and \(k \geq M\) we have
\[ |x_n - x_k| = |x_n - x + x - x_k| \leq |x_n - x| + |x - x_k| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon . \]
Alright, that direction was easy. Now suppose {x n} is Cauchy. We have shown that {x n} is bounded. If we show that
then {x n} must be convergent by . Assuming that liminf and limsup exist is where we use the .
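Define \(a := \limsup x_n\) and \(b := \liminf x_n\). By , there exist subsequences \(\{x_{n_i}\}\) and \(\{x_{m_i}\}\), such that
\[ \lim_{i\to\infty} x_{n_i} = a \qquad \text{and} \qquad \lim_{i\to\infty} x_{m_i} = b . \]
Given an \(\epsilon > 0\), there exists an \(M_1\) such that for all \(i \geq M_1\) we have \(|x_{n_i} - a| < \epsilon/3\) and an \(M_2\) such that for all \(i \geq M_2\) we have \(|x_{m_i} - b| < \epsilon/3\). As \(\{x_n\}\) is Cauchy there also exists an \(M_3\) such that for all \(n, k \geq M_3\) we have \(|x_n - x_k| < \epsilon/3\). Let \(M := \max\{M_1, M_2, M_3\}\). Note that if \(i \geq M\), then \(n_i \geq M\) and \(m_i \geq M\). Hence
\[ \begin{split} |a - b| & = |a - x_{n_i} + x_{n_i} - x_{m_i} + x_{m_i} - b| \\ & \leq |a - x_{n_i}| + |x_{n_i} - x_{m_i}| + |x_{m_i} - b| \\ & < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon . \end{split} \]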
It should be noted that the Cauchy criterion is stronger than just \(|x_{n+1} - x_n|\) (or \(|x_{n+j} - x_n|\) for a fixed \(j\)) going to zero as \(n\) goes to infinity. In fact, when we get to the partial sums of the harmonic series (see in the next section), we will have a sequence such that \(x_{n+1} - x_n = \frac{1}{n+1}\), yet \(\{x_n\}\) is divergent. In fact, for that sequence it is true that \(\lim_{n\to\infty} |x_{n+j} - x_n| = 0\) for any \(j \in \mathbb{N}\) (confer ). The key point in the definition of Cauchy is that \(n\) and \(k\) vary independently and can be arbitrarily far apart.
Exercises
Prove that \(\{\frac{n^2-1}{n^2}\}\) is Cauchy using directly the definition of Cauchy sequences.
Let {x n} be a sequence such that there exists a 0 < C < 1 such that
|xn + 1 − xn | ≤ C |xn − xn − 1 |.
Prove that {x n} is Cauchy. Hint: You can freely use the formula (for C ≠ 1)
\[ 1 + C + C^2 + \cdots + C^n = \frac{1 - C^{n+1}}{1 - C} . \]
Suppose \(F\) is an ordered field that contains the rational numbers \(\mathbb{Q}\), such that \(\mathbb{Q}\) is dense, that is: whenever \(x, y \in F\) are such that \(x < y\), then there exists a \(q \in \mathbb{Q}\) such that \(x < q < y\). Say a sequence \(\{x_n\}_{n=1}^\infty\) of rational numbers is Cauchy if given any \(\epsilon \in \mathbb{Q}\) with \(\epsilon > 0\), there exists an \(M\) such that for all \(n, k \geq M\) we have \(|x_n - x_k| < \epsilon\). Suppose any Cauchy sequence of rational numbers has a limit in \(F\). Prove that \(F\) has the .
Let {x n} and {y n} be sequences such that lim y n = 0. Suppose that for all k ∈ N and for all m ≥ k we have
| x m − x k | ≤ y k.
Show that {x n} is Cauchy.
Suppose a Cauchy sequence {x n} is such that for every M ∈ N, there exists a k ≥ M and an n ≥ M such that x k < 0 and x n > 0. Using simply the definition of a Cauchy sequence and of a
convergent sequence, show that the sequence converges to 0.
Suppose \(|x_n - x_k| \leq \frac{n}{k^2}\) for all \(n\) and \(k\). Show that \(\{x_n\}\) is Cauchy.
Suppose {x n} is a Cauchy sequence such that for infinitely many n, x n = c. Using only the definition of Cauchy sequence prove that lim x n = c.
True/False prove or find a counterexample: If \(\{x_n\}\) is a Cauchy sequence then there exists an \(M\) such that for all \(n \geq M\) we have \(|x_{n+1} - x_n| \leq |x_n - x_{n-1}|\).
Series
Note: 2 lectures
A fundamental object in mathematics is that of a series. In fact, when foundations of analysis were being developed, the motivation was to understand series. Understanding series is very
important in applications of analysis. For example, solving differential equations often includes series, and differential equations are the basis for understanding almost all of modern science.
Definition
Given a sequence {x n}, we write the formal object
\[ \sum_{n=1}^\infty x_n \qquad \text{or sometimes just} \qquad \sum x_n \]
\[ s_k := \sum_{n=1}^k x_n = x_1 + x_2 + \cdots + x_k , \]
\[ \sum_{n=1}^\infty x_n = x . \]
In this case, we cheat a little and treat \(\sum_{n=1}^\infty x_n\) as a number.
\[ \sum_{n=1}^\infty x_n = \lim_{k\to\infty} \sum_{n=1}^k x_n . \]
We should be careful to only use this equality if the limit on the right actually exists. That is, the right-hand side does not make sense (the limit does not exist) if the series does not converge.
Before going further, let us remark that it is sometimes convenient to start the series at an index different from 1. That is, for example we can write
\[ \sum_{n=0}^\infty r^n = \sum_{n=1}^\infty r^{n-1} . \]
The left-hand side is more convenient to write. The idea is the same as the notation for the tail of a sequence.
It is common to write the series ∑ x n as
x1 + x2 + x3 + ⋯
with the understanding that the ellipsis indicates a series and not a simple sum. We do not use this notation as it often leads to mistakes in proofs.
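The series
\[ \sum_{n=1}^\infty \frac{1}{2^n} \]
converges and the limit is 1. That is,
\[ \sum_{n=1}^\infty \frac{1}{2^n} = \lim_{k\to\infty} \sum_{n=1}^k \frac{1}{2^n} = 1 . \]
Proof: First we prove the following equality
\[ \left( \sum_{n=1}^k \frac{1}{2^n} \right) + \frac{1}{2^k} = 1 . \]
The equality is easy to see when \(k = 1\). The proof for general \(k\) follows by induction, which we leave to the reader. Let \(s_k\) be the partial sum. We write
\[ |1 - s_k| = \left| 1 - \sum_{n=1}^k \frac{1}{2^n} \right| = \left| \frac{1}{2^k} \right| = \frac{1}{2^k} . \]
The sequence \(\{\frac{1}{2^k}\}\) and therefore \(\{|1 - s_k|\}\) converges to zero. So, \(\{s_k\}\) converges to 1.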
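The equality used in this proof is easy to confirm for many values of k with exact arithmetic. The following Python sketch is an illustration only; the helper name `partial_sum` is an assumption, not notation from the text.

```python
from fractions import Fraction


def partial_sum(k):
    """s_k = 1/2 + 1/4 + ... + 1/2^k as an exact fraction."""
    return sum(Fraction(1, 2**n) for n in range(1, k + 1))


for k in (1, 2, 5, 10):
    s_k = partial_sum(k)
    # The gap 1 - s_k should be exactly 1 / 2^k.
    print(k, s_k, 1 - s_k)
```

The printed gaps are 1/2, 1/4, 1/32, and 1/1024, so the partial sums approach 1 exactly as the proof describes.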
\[ \sum_{n=0}^\infty r^n \]
converges. In fact, \(\sum_{n=0}^\infty r^n = \frac{1}{1-r}\). The proof is left as an exercise to the reader. The proof consists of showing
\[ \sum_{n=0}^{k-1} r^n = \frac{1 - r^k}{1 - r} , \]
\[ \sum_{n=1}^k x_n = \left( \sum_{n=1}^{M-1} x_n \right) + \sum_{n=M}^k x_n . \]
Note that \(\sum_{n=1}^{M-1} x_n\) is a fixed number. Now use to finish the proof.
Cauchy series
A series ∑ x n is said to be Cauchy or a Cauchy series, if the sequence of partial sums {s n} is a Cauchy sequence.
A sequence of real numbers converges if and only if it is Cauchy. Therefore a series is convergent if and only if it is Cauchy.
The series ∑ x n is Cauchy if for every ϵ > 0, there exists an M ∈ N, such that for every n ≥ M and k ≥ M we have
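\[ \left| \left( \sum_{j=1}^k x_j \right) - \left( \sum_{j=1}^n x_j \right) \right| < \epsilon . \]
\[ \left| \left( \sum_{j=1}^k x_j \right) - \left( \sum_{j=1}^n x_j \right) \right| = \left| \sum_{j=n+1}^k x_j \right| < \epsilon . \]
\[ \left| \sum_{j=n+1}^k x_j \right| < \epsilon . \]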
Basic properties
Let ∑ x n be a convergent series. Then the sequence {x n} is convergent and
lim x n = 0.
n→∞
Let ϵ > 0 be given. As ∑ x n is convergent, it is Cauchy. Thus we find an M such that for every n ≥ M we have
\[ \epsilon > \left| \sum_{j=n+1}^{n+1} x_j \right| = |x_{n+1}| . \]
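Proof: We will show that the sequence of partial sums is unbounded, and hence cannot converge. Write the partial sums \(s_n\) for \(n = 2^k\) as:
\[ \begin{split} s_1 & = 1 , \\ s_2 & = (1) + \left( \frac{1}{2} \right) , \\ s_4 & = (1) + \left( \frac{1}{2} \right) + \left( \frac{1}{3} + \frac{1}{4} \right) , \\ s_8 & = (1) + \left( \frac{1}{2} \right) + \left( \frac{1}{3} + \frac{1}{4} \right) + \left( \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} \right) , \\ & \;\; \vdots \\ s_{2^k} & = 1 + \sum_{j=1}^k \left( \sum_{m=2^{j-1}+1}^{2^j} \frac{1}{m} \right) . \end{split} \]
\[ \sum_{m=2^{k-1}+1}^{2^k} \frac{1}{m} \geq \sum_{m=2^{k-1}+1}^{2^k} \frac{1}{2^k} = (2^{k-1}) \frac{1}{2^k} = \frac{1}{2} . \]
Therefore
\[ s_{2^k} = 1 + \sum_{j=1}^k \left( \sum_{m=2^{j-1}+1}^{2^j} \frac{1}{m} \right) \geq 1 + \sum_{j=1}^k \frac{1}{2} = 1 + \frac{k}{2} . \]
As \(\{\frac{k}{2}\}\) is unbounded by the Archimedean property, that means that \(\{s_{2^k}\}\) is unbounded, and therefore \(\{s_n\}\) is unbounded. Hence \(\{s_n\}\) diverges, and consequently \(\sum \frac{1}{n}\) diverges.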
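The lower bound obtained in this proof can be checked for small k. The sketch below is an illustration under assumptions (the helper name `harmonic` is not from the text); it computes the partial sums at the indices 2^k exactly and compares them with 1 + k/2.

```python
from fractions import Fraction


def harmonic(n):
    """The n-th partial sum 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, m) for m in range(1, n + 1))


for k in range(0, 9):
    s = harmonic(2**k)
    bound = 1 + Fraction(k, 2)
    # Each partial sum at index 2^k should be at least 1 + k/2.
    print(k, float(s), float(bound), s >= bound)
```

The partial sums grow slowly, but they stay above the unbounded sequence 1 + k/2, which is all the proof needs.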
Convergent series are linear. That is, we can multiply them by constants and add them and these operations are done term by term.
Let α ∈ R and ∑ x n and ∑ y n be convergent series. Then
i. ∑ αx n is a convergent series and
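\[ \sum_{n=1}^\infty \alpha x_n = \alpha \sum_{n=1}^\infty x_n . \]
\[ \sum_{n=1}^\infty (x_n + y_n) = \left( \sum_{n=1}^\infty x_n \right) + \left( \sum_{n=1}^\infty y_n \right) . \]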
For the first item, we simply write the kth partial sum
\[ \sum_{n=1}^k \alpha x_n = \alpha \left( \sum_{n=1}^k x_n \right) . \]
We look at the right-hand side and note that the constant multiple of a convergent sequence is convergent. Hence, we simply take the limit of both sides to obtain the result.
For the second item we also look at the kth partial sum
\[ \sum_{n=1}^k (x_n + y_n) = \left( \sum_{n=1}^k x_n \right) + \left( \sum_{n=1}^k y_n \right) . \]
We look at the right-hand side and note that the sum of convergent sequences is convergent. Hence, we simply take the limit of both sides to obtain the proposition.
Note that multiplying series is not as simple as adding, see the next section. It is not true, of course, that we can multiply term by term, since that strategy does not work even for finite sums.
For example, (a + b)(c + d) ≠ ac + bd.
Absolute convergence
Since monotone sequences are easier to work with than arbitrary sequences, it is generally easier to work with series ∑ x n where x n ≥ 0 for all n. Then the sequence of partial sums is monotone
increasing and converges if it is bounded from above. Let us formalize this statement as a proposition.
If x n ≥ 0 for all n, then ∑ x n converges if and only if the sequence of partial sums is bounded from above.
As the limit of a monotone increasing sequence is the supremum, we have the inequality
\[ \sum_{n=1}^k x_n \leq \sum_{n=1}^\infty x_n . \]
The following criterion often gives a convenient way to test for convergence of a series.
A series \(\sum x_n\) converges absolutely if the series \(\sum |x_n|\) converges. If a series converges, but does not converge absolutely, we say it is conditionally convergent.
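A series is convergent if and only if it is Cauchy. Hence suppose \(\sum |x_n|\) is Cauchy. That is, for every \(\epsilon > 0\), there exists an \(M\) such that for all \(k \geq M\) and \(n > k\) we have
\[ \sum_{j=k+1}^n |x_j| = \left| \sum_{j=k+1}^n |x_j| \right| < \epsilon . \]
We apply the triangle inequality for a finite sum to obtain
\[ \left| \sum_{j=k+1}^n x_j \right| \leq \sum_{j=k+1}^n |x_j| < \epsilon . \]
Hence \(\sum x_n\) is Cauchy and therefore it converges.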
Of course, if \(\sum x_n\) converges absolutely, the limits of \(\sum x_n\) and \(\sum |x_n|\) are in general different. Computing one does not help us compute the other.
Absolutely convergent series have many wonderful properties. For example, absolutely convergent series can be rearranged arbitrarily, or we can multiply such series together easily.
Conditionally convergent series on the other hand do not often behave as one would expect. See the next section.
We leave as an exercise to show that
\[ \sum_{n=1}^\infty \frac{(-1)^n}{n} \]
converges, although the reader should finish this section before trying. On the other hand we proved
\[ \sum_{n=1}^\infty \frac{1}{n} \]
diverges. Therefore \(\sum \frac{(-1)^n}{n}\) is a conditionally convergent series.
Since the terms of the series are all nonnegative, the sequences of partial sums are both monotone increasing. Since x n ≤ y n for all n, the partial sums satisfy for all k
\[ \sum_{n=1}^k x_n \leq \sum_{n=1}^k y_n . \]
If the series ∑ y n converges the partial sums for the series are bounded. Therefore the right-hand side of [comptest:eq] is bounded for all k. Hence the partial sums for ∑ x n are also bounded.
Since the partial sums are a monotone increasing sequence they are convergent. The first item is thus proved.
On the other hand if ∑ x n diverges, the sequence of partial sums must be unbounded since it is monotone increasing. That is, the partial sums for ∑ x n are eventually bigger than any real
number. Putting this together with [comptest:eq] we see that for any B ∈ R, there is a k such that
\[ B \leq \sum_{n=1}^k x_n \leq \sum_{n=1}^k y_n . \]
Hence the partial sums for ∑ y n are also unbounded, and ∑ y n also diverges.
A useful series to use with the comparison test is the p-series.
For \(p \in \mathbb{R}\), the series
\[ \sum_{n=1}^\infty \frac{1}{n^p} \]
converges if and only if \(p > 1\).
Now suppose p > 1. We proceed in a similar fashion as we did in the case of the harmonic series, but instead of showing that the sequence of partial sums is unbounded we show that it is
bounded. Since the terms of the series are positive, the sequence of partial sums is monotone increasing and will converge if we show that it is bounded above. Let s n denote the nth partial sum.
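\[ \begin{split} s_1 & = 1 , \\ s_3 & = (1) + \left( \frac{1}{2^p} + \frac{1}{3^p} \right) , \\ s_7 & = (1) + \left( \frac{1}{2^p} + \frac{1}{3^p} \right) + \left( \frac{1}{4^p} + \frac{1}{5^p} + \frac{1}{6^p} + \frac{1}{7^p} \right) , \\ s_{2^k-1} & = 1 + \sum_{j=1}^{k-1} \left( \sum_{m=2^j}^{2^{j+1}-1} \frac{1}{m^p} \right) . \end{split} \]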
Instead of estimating from below, we estimate from above. In particular, as p is positive, then 2^p < 3^p, and hence 1/2^p + 1/3^p < 1/2^p + 1/2^p. Similarly 1/4^p + 1/5^p + 1/6^p + 1/7^p < 1/4^p + 1/4^p + 1/4^p + 1/4^p.
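Therefore, for k ≥ 2,
\[\begin{split}
s_{2^k-1} & = 1+ \sum_{j=1}^{k-1} \left( \sum_{m=2^{j}}^{2^{j+1}-1} \frac{1}{m^p} \right) \\
& < 1+ \sum_{j=1}^{k-1} \left( \sum_{m=2^{j}}^{2^{j+1}-1} \frac{1}{{(2^j)}^p} \right) \\
& = 1+ \sum_{j=1}^{k-1} \frac{2^j}{{(2^j)}^p} \\
& = 1+ \sum_{j=1}^{k-1} {\left( \frac{1}{2^{p-1}} \right)}^j .
\end{split}\]
As p > 1, we have 1/2^{p−1} < 1, and so the geometric series
\[\sum_{j=1}^\infty {\left( \frac{1}{2^{p-1}} \right)}^j\]
converges. Therefore
\[s_{2^k-1} < 1 + \sum_{j=1}^{k-1} {\left( \frac{1}{2^{p-1}} \right)}^j \leq 1 + \sum_{j=1}^\infty {\left( \frac{1}{2^{p-1}} \right)}^j .\]
As {s_n} is monotone increasing, every s_n is at most s_{2^k−1} for some k ≥ 2, and thus for all n,
\[s_n < 1 + \sum_{j=1}^\infty {\left( \frac{1}{2^{p-1}} \right)}^j .\]
The sequence of partial sums is bounded and hence converges.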
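The bound above can be checked numerically. The following is only a sketch: the function names are hypothetical, and the closed form r/(1 − r) for the geometric series with ratio r = 1/2^{p−1} is the standard formula rather than something derived in this proof.

```python
def p_series_partial_sum(p, n):
    """Return s_n = sum_{m=1}^n 1/m^p."""
    return sum(1.0 / m**p for m in range(1, n + 1))


def geometric_bound(p):
    """Return 1 + sum_{j>=1} (1/2^(p-1))^j; only meaningful for p > 1."""
    r = 1.0 / 2 ** (p - 1)      # common ratio, r < 1 because p > 1
    return 1.0 + r / (1.0 - r)  # closed form of the geometric series


for p in (1.5, 2.0, 3.0):
    bound = geometric_bound(p)
    for n in (1, 3, 7, 15, 1000):
        # every partial sum should stay below the bound from the proof
        assert p_series_partial_sum(p, n) < bound
    print(p, p_series_partial_sum(p, 1000), bound)
```

The check covers only finitely many partial sums, so it illustrates the estimate but does not replace the proof.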
Ratio test
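Let ∑ x_n be a series such that
\[L := \lim_{n\to\infty} \frac{|x_{n+1}|}{|x_n|}\]
exists. Then
i. If L < 1, then ∑ x_n converges absolutely.
ii. If L > 1, then ∑ x_n diverges.
Recall that if L > 1, then the sequence {x_n} diverges. Since it is a necessary condition for the convergence of series that the terms go to zero, we know that ∑ x_n must diverge.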
Thus suppose L < 1. We will argue that ∑ |x_n| must converge. The proof is similar to the corresponding argument for sequences. Of course L ≥ 0. Pick r such that L < r < 1. As r − L > 0, there exists an M ∈ N such that for all n ≥ M
\[\left| \frac{|x_{n+1}|}{|x_n|} - L \right| < r - L .\]
Therefore,
\[\frac{|x_{n+1}|}{|x_n|} < r .\]
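For n > M (that is for n ≥ M + 1) write
\[|x_n| = |x_M| \frac{|x_{M+1}|}{|x_M|} \frac{|x_{M+2}|}{|x_{M+1}|} \cdots \frac{|x_n|}{|x_{n-1}|} < |x_M| r^{n-M} = (|x_M| r^{-M}) r^n .\]
For k > M we write the partial sum as
\[\begin{split}
\sum_{n=1}^k |x_n| & = \left( \sum_{n=1}^M |x_n| \right) + \left( \sum_{n=M+1}^k |x_n| \right) \\
& \leq \left( \sum_{n=1}^M |x_n| \right) + \left( \sum_{n=M+1}^k (|x_M| r^{-M}) r^n \right) \\
& \leq \left( \sum_{n=1}^M |x_n| \right) + (|x_M| r^{-M}) \left( \sum_{n=M+1}^k r^n \right) .
\end{split}\]
As 0 < r < 1, the geometric series ∑_{n=M+1}^∞ r^n converges. Since all the terms are nonnegative, we obtain
\[\begin{split}
\sum_{n=1}^k |x_n| & \leq \left( \sum_{n=1}^M |x_n| \right) + (|x_M| r^{-M}) \left( \sum_{n=M+1}^k r^n \right) \\
& \leq \left( \sum_{n=1}^M |x_n| \right) + (|x_M| r^{-M}) \left( \sum_{n=M+1}^\infty r^n \right) .
\end{split}\]
The right-hand side is a number that does not depend on k. Hence the sequence of partial sums of ∑ |x_n| is bounded and ∑ |x_n| is convergent. Thus ∑ x_n is absolutely convergent.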
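The series
\[\sum_{n=1}^\infty \frac{2^n}{n!}\]
converges absolutely.
Proof: We write
\[\lim_{n\to\infty} \frac{2^{(n+1)} / (n+1)!}{2^n / n!} = \lim_{n\to\infty} \frac{2}{n+1} = 0 .\]
Therefore, the series converges absolutely by the ratio test.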
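As a quick sanity check of this example, the ratios can be computed exactly. This is a sketch with a hypothetical helper `term`; the comparison value e² − 1 is an assumption taken from the calculus fact that ∑ xⁿ/n! (summed from n = 0) equals eˣ, and it is not proved here.

```python
from fractions import Fraction
from math import exp, factorial


def term(n):
    """Return x_n = 2^n / n! as an exact fraction."""
    return Fraction(2**n, factorial(n))


# The ratio x_{n+1} / x_n simplifies to 2 / (n + 1), which tends to 0.
for n in range(1, 20):
    assert term(n + 1) / term(n) == Fraction(2, n + 1)

# Partial sum of the first 59 terms, compared against e^2 - 1.
partial_sum = float(sum(term(n) for n in range(1, 60)))
print(partial_sum, exp(2) - 1)
```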
Exercises
For r ≠ 1, prove
\[\sum_{k=0}^{n-1} r^k = \frac{1-r^n}{1-r} .\]
Hint: Let s := ∑_{k=0}^{n−1} r^k, then compute s(1 − r) = s − rs, and solve for s.
\[\sum_{n=0}^\infty r^n = \frac{1}{1-r} .\]
\[\sum_{k=1}^\infty x_{j,k}\]
\[\sum_{j=1}^n \left( \sum_{k=1}^\infty x_{j,k} \right) = \sum_{k=1}^\infty \left( \sum_{j=1}^n x_{j,k} \right) .\]
Prove the following stronger version of the ratio test: Let ∑ x_n be a series.
a. If there is an N and a ρ < 1 such that for all n ≥ N we have |x_{n+1}|/|x_n| < ρ, then the series converges absolutely.
b. If there is an N such that for all n ≥ N we have |x_{n+1}|/|x_n| ≥ 1, then the series diverges.
Let {x_n} be a decreasing sequence such that ∑ x_n converges. Show that lim_{n→∞} n x_n = 0.
Show that ∑_{n=1}^∞ (−1)^n/n converges. Hint: consider the sum of two subsequent entries.
\[\left| \sum_{n=1}^\infty x_n \right| \leq \sum_{n=1}^\infty |x_n| .\]
Prove the limit comparison test. That is, prove that if a_n > 0 and b_n > 0 for all n, and
\[0 < \lim_{n\to\infty} \frac{a_n}{b_n} < \infty ,\]
then either ∑ a_n and ∑ b_n both converge or both diverge.
[exercise:badnocauchy] Let x_n = ∑_{j=1}^n 1/j. Show that for every k we have lim_{n→∞} |x_{n+k} − x_n| = 0, yet {x_n} is not Cauchy.
Let s_k be the kth partial sum of ∑ x_n.
a) Suppose that there exists an m ∈ N such that lim_{k→∞} s_{mk} exists and lim x_n = 0. Show that ∑ x_n converges.
b) Find an example where lim_{k→∞} s_{2k} exists and lim x_n ≠ 0 (and therefore ∑ x_n diverges).
c) (Challenging) Find an example where lim x_n = 0, and there exists a subsequence {s_{k_j}} such that lim_{j→∞} s_{k_j} exists, but ∑ x_n still diverges.
More on series
Note: up to 2–3 lectures (optional, can safely be skipped or covered partially)
Root test
We have seen the ratio test before. There is one more similar test called the root test. In fact, the proof of this test is similar and somewhat easier.
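Let ∑ x_n be a series and let
\[L := \limsup_{n\to\infty} |x_n|^{1/n} .\]
Then
i. If L < 1, then ∑ x_n converges absolutely.
ii. If L > 1, then ∑ x_n diverges.
If L > 1, then there exists a subsequence {x_{n_k}} such that L = lim_{k→∞} |x_{n_k}|^{1/n_k}. Let r be such that L > r > 1. There exists an M such that for all k ≥ M we have |x_{n_k}|^{1/n_k} > r > 1, or in other words |x_{n_k}| > r^{n_k} > 1. The subsequence {|x_{n_k}|}, and therefore also {|x_n|}, cannot possibly converge to zero, and so the series diverges.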
Now suppose L < 1. Pick r such that L < r < 1. By definition of limit supremum, pick M such that for all n ≥ M we have
\[\sup \{ |x_k|^{1/k} : k \geq n \} < r .\]
Therefore, for all n ≥ M we have |x_n|^{1/n} < r, or in other words |x_n| < r^n. Let k > M, and estimate the kth partial sum
\[\sum_{n=1}^k |x_n| = \left( \sum_{n=1}^M |x_n| \right) + \left( \sum_{n=M+1}^k |x_n| \right) \leq \left( \sum_{n=1}^M |x_n| \right) + \left( \sum_{n=M+1}^k r^n \right) .\]
As 0 < r < 1, the geometric series ∑_{n=M+1}^∞ r^n converges to r^{M+1}/(1 − r). As everything is positive we have
\[\sum_{n=1}^k |x_n| \leq \left( \sum_{n=1}^M |x_n| \right) + \frac{r^{M+1}}{1-r} .\]
Thus the sequence of partial sums of ∑ |x_n| is bounded, and so the series converges. Therefore ∑ x_n converges absolutely.
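Let {x_n} be a monotone decreasing sequence of positive real numbers such that lim x_n = 0. Then
\[\sum_{n=1}^\infty {(-1)}^n x_n\]
converges.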
Let s_m := ∑_{k=1}^m (−1)^k x_k be the mth partial sum. Then write
\[s_{2n} = \sum_{k=1}^{2n} {(-1)}^k x_k = (-x_1 + x_2) + \cdots + (-x_{2n-1} + x_{2n}) = \sum_{k=1}^n (-x_{2k-1} + x_{2k}) .\]
The sequence {x_k} is decreasing and so (−x_{2k−1} + x_{2k}) ≤ 0 for all k. Therefore the subsequence {s_{2n}} of partial sums is a decreasing sequence. Similarly, (x_{2k} − x_{2k+1}) ≥ 0, and so
\[s_{2n} = -x_1 + (x_2 - x_3) + \cdots + (x_{2n-2} - x_{2n-1}) + x_{2n} \geq -x_1 .\]
The sequence {s_{2n}} is decreasing and bounded below, so it converges. Let a := lim s_{2n}.
We wish to show that lim s_m = a (not just for the subsequence). Notice
\[s_{2n+1} = s_{2n} - x_{2n+1} .\]
Given ϵ > 0, pick M such that |s_{2n} − a| < ϵ/2 whenever 2n ≥ M. Since lim x_n = 0, we also make M possibly larger to obtain x_{2n+1} < ϵ/2 whenever 2n ≥ M. If 2n ≥ M, we have |s_{2n} − a| < ϵ/2 < ϵ, so we just need to check the situation for s_{2n+1}:
\[|s_{2n+1} - a| = |s_{2n} - a - x_{2n+1}| \leq |s_{2n} - a| + x_{2n+1} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon .\]
In particular, the series
\[\sum_{n=1}^\infty \frac{{(-1)}^n}{n^p}\]
converges for arbitrarily small p > 0, but it does not converge absolutely when p ≤ 1.
Rearrangements
Generally, absolutely convergent series behave as we imagine they should. For example, absolutely convergent series can be summed in any order whatsoever. Nothing of the sort holds for
conditionally convergent series (see the example and the exercises below).
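Take a series
\[\sum_{n=1}^\infty x_n .\]
Given a bijective function σ : N → N, the corresponding rearrangement is the series
\[\sum_{k=1}^\infty x_{\sigma(k)} .\]
Suppose ∑ x_n converges absolutely to a number x, and let ϵ > 0 be given. Take M such that
\[\left| \left( \sum_{n=1}^M x_n \right) - x \right| < \frac{\epsilon}{2} \qquad \text{and} \qquad \sum_{n=M+1}^\infty |x_n| < \frac{\epsilon}{2} .\]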
As σ is a bijection, there exists a number K such that for each n ≤ M, there exists k ≤ K such that σ(k) = n. In other words {1, 2, …, M} ⊂ σ ({1, 2, …, K} ).
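Then for any N ≥ K, let Q := max σ({1, 2, …, N}) and compute
\[\begin{split}
\left| \left( \sum_{n=1}^N x_{\sigma(n)} \right) - x \right| & = \left| \left( \sum_{n=1}^M x_n + \sum_{\substack{n=1 \\ \sigma(n) > M}}^N x_{\sigma(n)} \right) - x \right| \\
& \leq \left| \left( \sum_{n=1}^M x_n \right) - x \right| + \sum_{\substack{n=1 \\ \sigma(n) > M}}^N |x_{\sigma(n)}| \\
& \leq \left| \left( \sum_{n=1}^M x_n \right) - x \right| + \sum_{n=M+1}^Q |x_n| \\
& < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon .
\end{split}\]
So ∑ x_{σ(n)} converges to x. To see that the convergence is absolute, we apply the above argument to ∑ |x_n| to show that ∑ |x_{σ(n)}| converges.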
[example:harmonsumanything] Let us show that the alternating harmonic series \(\sum \frac{{(-1)}^{n+1}}{n}\), which does not converge absolutely, can be rearranged to converge to anything. The odd terms and the even terms both diverge to infinity (prove this!):
\[\sum_{n=1}^\infty \frac{1}{2n-1} = \infty , \qquad \text{and} \qquad \sum_{n=1}^\infty \frac{1}{2n} = \infty .\]
Let \(a_n := \frac{{(-1)}^{n+1}}{n}\) for simplicity, let an arbitrary number L ∈ R be given, and set σ(1) := 1. Suppose we have defined σ(n) for all n ≤ N. If
\[\sum_{n=1}^N a_{\sigma(n)} \leq L ,\]
then let σ(N + 1) := k be the smallest odd k ∈ N that we have not used yet, that is σ(n) ≠ k for all n ≤ N. Otherwise let σ(N + 1) := k be the smallest even k that we have not yet used.
By construction σ : N → N is one-to-one. It is also onto, because if we keep adding either odd (resp. even) terms, eventually we will pass L and switch to the evens (resp. odds). So we switch
infinitely many times.
Finally, let N be an index where we just pass L and switch. For example suppose we have just switched from odd to even (so we start subtracting), and let N′ > N be where we first switch back from even to odd. Then
\[L + \frac{1}{\sigma(N)} \geq \sum_{n=1}^{N-1} a_{\sigma(n)} > \sum_{n=1}^{N'-1} a_{\sigma(n)} > L - \frac{1}{\sigma(N')} .\]
And similarly for switching in the other direction. Therefore, the sum up to N′ − 1 is within 1/min{σ(N), σ(N′)} of L. As we switch infinitely many times we obtain that σ(N) → ∞ and σ(N′) → ∞, and hence
\[\sum_{n=1}^\infty a_{\sigma(n)} = \sum_{n=1}^\infty \frac{{(-1)}^{\sigma(n)+1}}{\sigma(n)} = L .\]
Here is an example to illustrate the proof. Suppose L = 1.2, then the order is
1 + 1/3 − 1/2 + 1/5 + 1/7 + 1/9 − 1/4 + 1/11 + 1/13 − 1/6 + 1/15 + 1/17 + 1/19 − 1/8 + ⋯
At this point we are no more than 1/8 from the limit.
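The construction of σ in this example is a greedy procedure, and it can be sketched in a few lines. The function name `rearrange` and its return format (signed denominators, so −2 stands for the term −1/2) are hypothetical choices made for illustration, and floating-point sums are used in place of exact arithmetic.

```python
def rearrange(target, count):
    """Return the first `count` terms of the rearranged alternating
    harmonic series aimed at `target`, as signed denominators,
    together with the partial sum reached."""
    next_odd, next_even = 1, 2
    total = 0.0
    order = []
    for _ in range(count):
        if not order or total <= target:
            # sigma(1) := 1, and afterwards: at or below the target,
            # so use the smallest unused odd index (a positive term)
            order.append(next_odd)
            total += 1.0 / next_odd
            next_odd += 2
        else:
            # above the target: use the smallest unused even index
            order.append(-next_even)
            total -= 1.0 / next_even
            next_even += 2
    return order, total


order, total = rearrange(1.2, 14)
print(order)   # signed denominators of the first 14 terms
print(total)   # partial sum after 14 terms
```

With target 1.2 the first fourteen entries reproduce the order listed above, ending with the term −1/8.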
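Suppose ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n are two convergent series, converging to A and B respectively. If at least one of the series converges absolutely, then the series ∑_{n=0}^∞ c_n, where
\[c_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0 = \sum_{j=0}^n a_j b_{n-j} ,\]
converges to AB.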
The series ∑ c n is called the Cauchy product of ∑ a n and ∑ b n.
Suppose ∑ a n converges absolutely, and let ϵ > 0 be given. In this proof instead of picking complicated estimates just to make the final estimate come out as less than ϵ, let us simply obtain an
estimate that depends on ϵ and can be made arbitrarily small.
Write
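\[A_m := \sum_{n=0}^m a_n , \qquad B_m := \sum_{n=0}^m b_n .\]
We rearrange the mth partial sum of ∑ c_n:
\[\begin{split}
\left| \left( \sum_{n=0}^m c_n \right) - AB \right| & = \left| \left( \sum_{n=0}^m \sum_{j=0}^n a_j b_{n-j} \right) - AB \right| \\
& = \left| \left( \sum_{n=0}^m B_n a_{m-n} \right) - AB \right| \\
& = \left| \left( \sum_{n=0}^m (B_n - B) a_{m-n} \right) + B A_m - AB \right| \\
& \leq \left( \sum_{n=0}^m |B_n - B| |a_{m-n}| \right) + |B| |A_m - A| .
\end{split}\]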
We can surely make the second term on the right hand side go to zero. The trick is to handle the first term. Pick K such that for all m ≥ K we have |A_m − A| < ϵ and also |B_m − B| < ϵ. Finally, as ∑ a_n converges absolutely, make sure that K is large enough such that for all m ≥ K,
\[\sum_{n=K}^m |a_n| < \epsilon .\]
As ∑ b_n converges, then we have that B_max := sup{|B_n − B| : n = 0, 1, 2, …} is finite. Take m ≥ 2K, then in particular m − K + 1 > K. So
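\[\begin{split}
\sum_{n=0}^m |B_n - B| |a_{m-n}| & = \left( \sum_{n=0}^{m-K} |B_n - B| |a_{m-n}| \right) + \left( \sum_{n=m-K+1}^m |B_n - B| |a_{m-n}| \right) \\
& \leq \left( \sum_{n=K}^m |a_n| \right) B_{\max} + \left( \sum_{n=0}^{K-1} \epsilon |a_n| \right) \\
& \leq \epsilon B_{\max} + \epsilon \left( \sum_{n=0}^\infty |a_n| \right) .
\end{split}\]
Therefore, for m ≥ 2K we have
\[\begin{split}
\left| \left( \sum_{n=0}^m c_n \right) - AB \right| & \leq \left( \sum_{n=0}^m |B_n - B| |a_{m-n}| \right) + |B| |A_m - A| \\
& \leq \epsilon B_{\max} + \epsilon \left( \sum_{n=0}^\infty |a_n| \right) + |B| \epsilon = \epsilon \left( B_{\max} + \left( \sum_{n=0}^\infty |a_n| \right) + |B| \right) .
\end{split}\]
The expression in the parenthesis on the right hand side is a fixed number. Hence, we can make the right hand side arbitrarily small by picking a small enough ϵ > 0. So ∑_{n=0}^∞ c_n converges to AB.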
If both series are only conditionally convergent, the Cauchy product series need not even converge. Suppose we take a_n = b_n = (−1)^n/√(n + 1). The series ∑_{n=0}^∞ a_n = ∑_{n=0}^∞ b_n converges by the alternating series test, however, it does not converge absolutely as can be seen from the p-test. Let us look at the Cauchy product.
\[c_n = {(-1)}^n \left( \frac{1}{\sqrt{n+1}} + \frac{1}{\sqrt{2n}} + \frac{1}{\sqrt{3(n-1)}} + \cdots + \frac{1}{\sqrt{n+1}} \right) = {(-1)}^n \sum_{j=0}^n \frac{1}{\sqrt{(j+1)(n-j+1)}} .\]
Therefore
\[|c_n| = \sum_{j=0}^n \frac{1}{\sqrt{(j+1)(n-j+1)}} \geq \sum_{j=0}^n \frac{1}{\sqrt{(n+1)(n+1)}} = 1 .\]
The terms do not go to zero and hence ∑ c n cannot converge.
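The lower bound |c_n| ≥ 1 can be observed numerically. The sketch below uses hypothetical helper names `a` and `cauchy_term`, and computes c_n directly from the definition of the Cauchy product in floating point.

```python
from math import sqrt


def a(n):
    """a_n = b_n = (-1)^n / sqrt(n + 1)."""
    return (-1) ** n / sqrt(n + 1)


def cauchy_term(n):
    """c_n = a_0 a_n + a_1 a_{n-1} + ... + a_n a_0 (here b_n = a_n)."""
    return sum(a(j) * a(n - j) for j in range(n + 1))


for n in range(0, 50, 7):
    # the sign alternates, while the size never drops below 1
    print(n, cauchy_term(n))
```

A small tolerance is advisable when comparing against 1, since the sums are computed in floating point.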
Power series
Fix x 0 ∈ R. A power series about x 0 is a series of the form
\[\sum_{n=0}^\infty a_n {(x-x_0)}^n .\]
A power series is really a function of x, and many important functions in analysis can be written as a power series.
We say that a power series is convergent if there is at least one x ≠ x 0 that makes the series converge. Note that it is trivial to see that if x = x 0 then the series always converges since all terms
except the first are zero. If the series does not converge for any point x ≠ x 0, we say that the series is divergent.
The series
\[\sum_{n=0}^\infty \frac{1}{n!} x^n\]
is absolutely convergent for all x ∈ R. This can be seen using the ratio test: For any x notice that
\[\lim_{n\to\infty} \frac{(1/(n+1)!) \, x^{n+1}}{(1/n!) \, x^n} = \lim_{n\to\infty} \frac{x}{n+1} = 0 .\]
In fact, you may recall from calculus that this series converges to e^x.
[ps:1kex] The series
\[\sum_{n=1}^\infty \frac{1}{n} x^n\]
converges absolutely for all x ∈ (−1, 1) via the ratio test:
\[\lim_{n\to\infty} \left| \frac{(1/(n+1)) \, x^{n+1}}{(1/n) \, x^n} \right| = \lim_{n\to\infty} |x| \frac{n}{n+1} = |x| < 1 .\]
It converges at x = −1, as ∑_{n=1}^∞ (−1)^n/n converges by the alternating series test. But the power series does not converge absolutely at x = −1, because ∑_{n=1}^∞ 1/n does not converge. The series diverges at x = 1. When |x| > 1, then the series diverges via the ratio test.
[ps:divergeex] The series
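\[\sum_{n=1}^\infty n^n x^n\]
diverges for all x ≠ 0. Let us apply the root test:
\[\limsup_{n\to\infty} |n^n x^n|^{1/n} = \limsup_{n\to\infty} n |x| = \infty .\]
Therefore the series diverges for all x ≠ 0.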
Let ∑ a n(x − x 0) n be a power series. If the series is convergent, then either it converges at all x ∈ R, or there exists a number ρ, such that the series converges absolutely on the interval
(x 0 − ρ, x 0 + ρ) and diverges when x < x 0 − ρ or x > x 0 + ρ.
The number ρ is called the radius of convergence of the power series. We write ρ = ∞ if the series converges for all x, and we write ρ = 0 if the series is divergent. For the series ∑ (1/n) x^n of [ps:1kex] the radius of convergence is ρ = 1. For the series ∑ (1/n!) x^n the radius of convergence is ρ = ∞, and for the series ∑ n^n x^n of [ps:divergeex] the radius of convergence is ρ = 0.
Write
\[R := \limsup_{n\to\infty} |a_n|^{1/n} .\]
We use the root test to prove the proposition:
\[L = \limsup_{n\to\infty} |a_n {(x-x_0)}^n|^{1/n} = |x-x_0| \limsup_{n\to\infty} |a_n|^{1/n} = |x-x_0| R .\]
In particular if R = ∞, then L = ∞ for any x ≠ x_0, and the series diverges by the root test. On the other hand if R = 0, then L = 0 for any x, and the series converges absolutely for all x.
Suppose 0 < R < ∞. The series converges absolutely if 1 > L = R|x − x_0|, or in other words when
\[|x-x_0| < 1/R .\]
The series diverges when 1 < L = R|x − x_0|, or
\[|x-x_0| > 1/R .\]
Letting ρ = 1/R completes the proof.
It may be useful to restate what we have learned in the proof as a separate proposition.
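Let ∑ a_n(x − x_0)^n be a power series, and let
\[R := \limsup_{n\to\infty} |a_n|^{1/n} .\]
If R = ∞, the power series is divergent. If R = 0, then the power series converges everywhere. Otherwise the radius of convergence is ρ = 1/R.
Let ∑ a_n(x − x_0)^n and ∑ b_n(x − x_0)^n be two convergent power series with radius of convergence at least ρ > 0, and let α ∈ R. Then for all x such that |x − x_0| < ρ, we have
\[\left( \sum_{n=0}^\infty a_n {(x-x_0)}^n \right) + \left( \sum_{n=0}^\infty b_n {(x-x_0)}^n \right) = \sum_{n=0}^\infty (a_n + b_n) {(x-x_0)}^n ,\]
\[\alpha \left( \sum_{n=0}^\infty a_n {(x-x_0)}^n \right) = \sum_{n=0}^\infty \alpha a_n {(x-x_0)}^n ,\]
and
\[\left( \sum_{n=0}^\infty a_n {(x-x_0)}^n \right) \left( \sum_{n=0}^\infty b_n {(x-x_0)}^n \right) = \sum_{n=0}^\infty c_n {(x-x_0)}^n ,\]
where c_n = a_0 b_n + a_1 b_{n−1} + ⋯ + a_n b_0.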
That is, after performing the algebraic operations, the radius of convergence of the resulting series is at least ρ. For all x with |x − x_0| < ρ, we have two convergent series so their term by term
addition and multiplication by constants follows by what we learned in the last section. For multiplication of two power series, the series are absolutely convergent inside the radius of
convergence and that is why for those x we can apply Mertens’ theorem. Note that after applying an algebraic operation the radius of convergence could increase. See the exercises.
Let us look at some examples of power series. Polynomials are simply finite power series. That is, a polynomial is a power series where the a n are zero for all n large enough. We expand a
polynomial as a power series about any point x_0 by writing the polynomial as a polynomial in (x − x_0). For example, 2x^2 − 3x + 4 as a power series around x_0 = 1 is
\[2x^2 - 3x + 4 = 3 + (x-1) + 2{(x-1)}^2 .\]
We can also expand rational functions, that is, ratios of polynomials as power series, although we will not completely prove this fact here. Notice that a series for a rational function only
defines the function on an interval even if the function is defined elsewhere. For example, for the geometric series we have that for x ∈ (−1, 1)
\[\frac{1}{1-x} = \sum_{n=0}^\infty x^n .\]
The series diverges when |x| > 1, even though 1/(1 − x) is defined for all x ≠ 1.
We can use the geometric series together with rules for addition and multiplication of power series to expand rational functions as power series around x 0, as long as the denominator is not zero
at x 0. We state without proof that this is always possible, and we give an example of such a computation using the geometric series.
Let us expand x/(1 + 2x + x^2) as a power series around the origin (x_0 = 0) and find the radius of convergence.
\[\begin{split}
\frac{x}{1+2x+x^2} & = x {\left( \frac{1}{1-(-x)} \right)}^2 \\
& = x {\left( \sum_{n=0}^\infty {(-1)}^n x^n \right)}^2 \\
& = x \left( \sum_{n=0}^\infty c_n x^n \right) \\
& = \sum_{n=0}^\infty c_n x^{n+1} ,
\end{split}\]
where using the formula for the product of series we obtain c_0 = 1, c_1 = −1 − 1 = −2, c_2 = 1 + 1 + 1 = 3, etc. Therefore we get that for |x| < 1,
\[\frac{x}{1+2x+x^2} = \sum_{n=1}^\infty {(-1)}^{n+1} n x^n .\]
The radius of convergence is at least 1. We leave it to the reader to verify that the radius of convergence is exactly equal to 1.
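The coefficients c_n in this computation come from the Cauchy product of the geometric series with itself, which is easy to tabulate. In this sketch the helper `cauchy_product`, the truncation length `N`, and the test point x = 0.5 are all choices made for illustration.

```python
def cauchy_product(a, b):
    """Coefficients c_n = a_0 b_n + ... + a_n b_0 for two
    coefficient lists of equal length."""
    return [sum(a[j] * b[n - j] for j in range(n + 1)) for n in range(len(a))]


N = 60
geom = [(-1) ** n for n in range(N)]   # coefficients of 1/(1 - (-x))
c = cauchy_product(geom, geom)         # coefficients of the square
print(c[:5])                           # first few c_n

x = 0.5                                # a point with |x| < 1
series_value = sum(c[n] * x ** (n + 1) for n in range(N))
closed_form = x / (1 + 2 * x + x ** 2)
print(series_value, closed_form)
```

The truncated series and the rational function agree closely at the test point, which is consistent with (but does not prove) the expansion for |x| < 1.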
You can use the method of partial fractions you know from calculus. For example, to find the power series for (x^3 + x)/(x^2 − 1) at 0, write
\[\frac{x^3+x}{x^2-1} = x + \frac{1}{1+x} - \frac{1}{1-x} = x + \sum_{n=0}^\infty {(-1)}^n x^n - \sum_{n=0}^\infty x^n .\]
Suppose both ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n converge absolutely. Show that the product series, ∑_{n=0}^∞ c_n where c_n = a_0 b_n + a_1 b_{n−1} + ⋯ + a_n b_0, also converges absolutely.
[exercise:seriesconvergestoanything] Let ∑ a n be conditionally convergent. Show that given any number x there exists a rearrangement of ∑ a n such that the rearranged series converges to x.
Hint: See [example:harmonsumanything].
a) Show that the alternating harmonic series \(\sum \frac{{(-1)}^{n+1}}{n}\) has a rearrangement such that for any x < y, there exists a partial sum s_n of the rearranged series such that x < s_n < y. b) Show that the rearrangement you found does not converge. See [example:harmonsumanything].
c) Show that for any x ∈ R, there exists a subsequence of partial sums {s_{n_k}} of your rearrangement such that lim s_{n_k} = x.
For the following power series, find if they are convergent or not, and if so find their radius of convergence.
a) ∑_{n=0}^∞ 2^n x^n  b) ∑_{n=0}^∞ n x^n  c) ∑_{n=0}^∞ n! x^n  d) ∑_{n=0}^∞ (1/(2k)!) (x − 10)^n  e) ∑_{n=0}^∞ x^{2n}  f) ∑_{n=0}^∞ n! x^{n!}
Suppose ∑ a n xn converges for x = 1. a) What can you say about the radius of convergence? b) If you further know that at x = 1 the convergence is not absolute, what can you say?
Expand x/(4 − x^2) as a power series around x_0 = 0 and compute its radius of convergence.
a) Find an example where the radius of convergence of ∑ a nx n and ∑ b nx n are 1, but the radius of convergence of the sum of the two series is infinite. b) (Trickier) Find an example where the
radius of convergence of ∑ a nx n and ∑ b nx n are 1, but the radius of convergence of the product of the two series is infinite.
Figure out how to compute the radius of convergence using the ratio test. That is, suppose ∑ a_n x^n is a power series and R := lim |a_{n+1}|/|a_n| exists or is ∞. Find the radius of convergence and prove your claim.
a) Prove that lim n^{1/n} = 1. Hint: Write n^{1/n} = 1 + b_n and note b_n > 0. Then show that (1 + b_n)^n ≥ (n(n − 1)/2) b_n^2 and use this to show that lim b_n = 0. b) Use the result of part a) to show that if ∑ a_n x^n is a convergent power series with radius of convergence R, then ∑ n a_n x^n is also convergent with the same radius of convergence.
There are different notions of summability (convergence) of a series than just the one we have seen. A common one is Cesàro summability. Let ∑ a_n be a series and let s_n be the nth partial sum. The series is said to be Cesàro summable to a if
\[a = \lim_{n\to\infty} \frac{s_1 + s_2 + \cdots + s_n}{n} .\]
a) If ∑ a_n is convergent to a (in the usual sense), show that ∑ a_n is Cesàro summable to a. b) Show that in the sense of Cesàro ∑ (−1)^n is summable to 1/2. c) Let a_n := k when n = k^3 for some k ∈ N, a_n := −k when n = k^3 + 1 for some k ∈ N, otherwise let a_n := 0. Show that ∑ a_n diverges in the usual sense (partial sums are unbounded), but it is Cesàro summable to 0 (seems a little paradoxical at first sight).
Show that the monotonicity in the alternating series test is necessary. That is, find a sequence of positive real numbers {x n} with lim x n = 0 but such that ∑ ( − 1) nx n diverges.
Continuous Functions
Limits of functions
Note: 2–3 lectures
Before we define continuity of functions, we need to visit a somewhat more general notion of a limit. That is, given a function f : S → R, we want to see how f(x) behaves as x tends to a certain
point.
Cluster points
First, let us return to a concept we have previously seen in an exercise.
Let S ⊂ R be a set. A number x ∈ R is called a cluster point of S if for every ϵ > 0, the set (x − ϵ, x + ϵ) ∩ S ∖ {x} is not empty.
That is, x is a cluster point of S if there are points of S arbitrarily close to x. Another way of phrasing the definition is to say that x is a cluster point of S if for every ϵ > 0, there exists a y ∈ S
such that y ≠ x and |x − y| < ϵ. Note that a cluster point of S need not lie in S.
Let us see some examples.
i. The set {1/n : n ∈ N} has a unique cluster point zero.
ii. The cluster points of the open interval (0, 1) are all points in the closed interval [0, 1].
iii. For the set Q, the set of cluster points is the whole real line R.
iv. For the set [0, 1) ∪ {2}, the set of cluster points is the interval [0, 1].
v. The set N has no cluster points in R.
Let S ⊂ R. Then x ∈ R is a cluster point of S if and only if there exists a convergent sequence of numbers {x n} such that x n ≠ x, x n ∈ S, and lim x n = x.
First suppose x is a cluster point of S. For any n ∈ N, we pick x_n to be an arbitrary point of (x − 1/n, x + 1/n) ∩ S ∖ {x}, which we know is nonempty because x is a cluster point of S. Then x_n is within 1/n of x, that is,
\[|x - x_n| < 1/n .\]
As {1/n} converges to zero, {x_n} converges to x.
On the other hand, if we start with a sequence of numbers {x_n} in S converging to x such that x_n ≠ x for all n, then for every ϵ > 0 there is an M such that in particular |x_M − x| < ϵ. That is, x_M ∈ (x − ϵ, x + ϵ) ∩ S ∖ {x}.
Limits of functions
If a function f is defined on a set S and c is a cluster point of S, then we can define the limit of f(x) as x gets close to c. Do note that it is irrelevant for the definition if f is defined at c or not.
Furthermore, even if the function is defined at c, the limit of the function as x goes to c could very well be different from f(c).
Let f : S → R be a function and c a cluster point of S. Suppose there exists an L ∈ R and for every ϵ > 0, there exists a δ > 0 such that whenever x ∈ S ∖ {c} and |x − c| < δ, then
|f(x) − L| < ϵ.
In this case we say f(x) converges to L as x goes to c. We say L is the limit of f(x) as x goes to c. We write
\[\lim_{x\to c} f(x) := L ,\]
or
f(x) → L as x → c.
If no such L exists, then we say that the limit does not exist or that f diverges at c.
Again the notation and language we are using above assumes the limit is unique even though we have not yet proved that. Let us do that now.
Let c be a cluster point of S ⊂ R and let f : S → R be a function such that f(x) converges as x goes to c. Then the limit of f(x) as x goes to c is unique.
Let L_1 and L_2 be two numbers that both satisfy the definition. Take an ϵ > 0 and find a δ_1 > 0 such that |f(x) − L_1| < ϵ/2 for all x ∈ S ∖ {c} with |x − c| < δ_1. Also find δ_2 > 0 such that |f(x) − L_2| < ϵ/2 for all x ∈ S ∖ {c} with |x − c| < δ_2. Put δ := min{δ_1, δ_2}. Suppose x ∈ S, |x − c| < δ, and x ≠ c. Such an x exists because c is a cluster point of S. Then
\[|L_1 - L_2| = |L_1 - f(x) + f(x) - L_2| \leq |L_1 - f(x)| + |f(x) - L_2| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon .\]
As |L_1 − L_2| < ϵ for arbitrary ϵ > 0, then L_1 = L_2.
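To see that lim_{x→c} x^2 = c^2, write f(x) := x^2, let ϵ > 0 be given, and take
\[\delta := \min \left\{ 1 , \, \frac{\epsilon}{2|c|+1} \right\} .\]
Take x ≠ c such that |x − c| < δ. In particular, |x − c| < 1. Then by reverse triangle inequality we get
\[|x| - |c| \leq |x-c| < 1 .\]
Adding 2|c| to both sides we obtain |x| + |c| < 2|c| + 1. We compute
\[\begin{split}
|f(x) - c^2| & = |x^2 - c^2| \\
& = |(x+c)(x-c)| \\
& = |x+c| |x-c| \\
& \leq (|x|+|c|) |x-c| \\
& < (2|c|+1) |x-c| \\
& < (2|c|+1) \frac{\epsilon}{2|c|+1} = \epsilon .
\end{split}\]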
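The choice of δ in this computation can be spot-checked numerically. This sketch samples points on both sides of c inside the δ-interval; the function names and the number of samples are hypothetical, and a finite sample is of course not a proof.

```python
def delta_for(c, eps):
    """The delta from the computation above: min{1, eps / (2|c| + 1)}."""
    return min(1.0, eps / (2 * abs(c) + 1))


def check(c, eps, samples=1000):
    """Return True if |x^2 - c^2| < eps at all sampled x with 0 < |x - c| < delta."""
    d = delta_for(c, eps)
    for k in range(1, samples + 1):
        offset = d * k / (samples + 1)   # strictly between 0 and delta
        for x in (c - offset, c + offset):
            if abs(x * x - c * c) >= eps:
                return False
    return True


print(check(3.0, 0.01), check(-10.0, 1e-3), check(0.0, 5.0))
```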
Define f : [0, 1) → R by
\[f(x) := \begin{cases} x & \text{if } x > 0 , \\ 1 & \text{if } x = 0 . \end{cases}\]
Then
\[\lim_{x\to 0} f(x) = 0 ,\]
even though f(0) = 1.
Sequential limits
Let us connect the limit as defined above with limits of sequences.
[seqflimit:lemma] Let S ⊂ R and c be a cluster point of S. Let f : S → R be a function.
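Then f(x) → L as x → c if and only if for every sequence {x_n} of numbers such that x_n ∈ S ∖ {c} for all n, and such that lim x_n = c, the sequence {f(x_n)} converges to L.
Suppose f(x) → L as x → c, and {x_n} is a sequence such that x_n ∈ S ∖ {c} and lim x_n = c. We wish to show that {f(x_n)} converges to L. Let ϵ > 0 be given. Find a δ > 0 such that if x ∈ S ∖ {c} and |x − c| < δ, then |f(x) − L| < ϵ. As {x_n} converges to c, find an M such that for n ≥ M we have that |x_n − c| < δ. Therefore, for n ≥ M,
\[|f(x_n) - L| < \epsilon .\]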
Thus {f(x n)} converges to L.
For the other direction, we use proof by contrapositive. Suppose it is not true that f(x) → L as x → c. The negation of the definition is that there exists an ϵ > 0 such that for every δ > 0 there
exists an x ∈ S ∖ {c}, where |x − c| < δ and |f(x) − L| ≥ ϵ.
Let us use 1/n for δ in the above statement to construct a sequence {x_n}. We have that there exists an ϵ > 0 such that for every n, there exists a point x_n ∈ S ∖ {c}, where |x_n − c| < 1/n and |f(x_n) − L| ≥ ϵ. The sequence {x_n} just constructed converges to c, but the sequence {f(x_n)} does not converge to L. And we are done.
It is possible to strengthen the reverse direction of the lemma by simply stating that {f(x_n)} converges without requiring a specific limit. See the exercises.
lim_{x→0} sin(1/x) does not exist, but lim_{x→0} x sin(1/x) = 0. See [figsin1x].
Figure: Graphs of sin(1/x) and x sin(1/x). Note that the computer cannot properly graph sin(1/x) near zero as it oscillates too fast. [figsin1x]
Proof: Let us work with sin(1/x) first. Let us define the sequence x_n := 1/(πn + π/2). It is not hard to see that lim x_n = 0. Furthermore,
\[\sin(1/x_n) = \sin(\pi n + \pi/2) = {(-1)}^n .\]
Therefore, {sin(1/x_n)} does not converge. Thus, by [seqflimit:lemma], lim_{x→0} sin(1/x) does not exist.
Now let us look at x sin(1/x). Let x_n be a sequence such that x_n ≠ 0 for all n and such that lim x_n = 0. Notice that |sin(t)| ≤ 1 for any t ∈ R. Therefore,
\[|x_n \sin(1/x_n) - 0| = |x_n| |\sin(1/x_n)| \leq |x_n| .\]
As x_n goes to 0, then |x_n| goes to zero, and hence {x_n sin(1/x_n)} converges to zero. By [seqflimit:lemma], lim_{x→0} x sin(1/x) = 0.
Keep in mind the phrase “for every sequence” in the lemma. For example, take sin(1/x) and the sequence x_n = 1/(πn). Then {sin(1/x_n)} is the constant zero sequence, and therefore converges to zero.
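The two sequences used in this example behave quite differently under sin(1/x), which a short computation makes visible. The function names below are hypothetical, and the values are floating-point approximations, so the "zero" sequence only comes out as numbers very close to 0.

```python
from math import pi, sin


def x_alternating(n):
    """x_n = 1 / (pi n + pi/2); sin(1/x_n) alternates between -1 and 1."""
    return 1.0 / (pi * n + pi / 2)


def x_zero(n):
    """x_n = 1 / (pi n); sin(1/x_n) is zero up to rounding."""
    return 1.0 / (pi * n)


for n in range(1, 6):
    xa = x_alternating(n)
    print(n, sin(1 / xa), sin(1 / x_zero(n)), xa * sin(1 / xa))
```

Both sequences tend to 0, yet the values of sin(1/x) along them do not share a limit, while the values of x sin(1/x) shrink with |x|.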
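Using [seqflimit:lemma], we can start applying everything we know about sequential limits to limits of functions. Let us give a few important examples.
Let S ⊂ R and c be a cluster point of S. Let f : S → R and g : S → R be functions. Suppose the limits of f(x) and g(x) as x goes to c both exist, and that
\[f(x) \leq g(x) \qquad \text{for all } x \in S .\]
Then
\[\lim_{x\to c} f(x) \leq \lim_{x\to c} g(x) .\]
Take {x_n} to be a sequence of numbers in S ∖ {c} that converges to c. Let L_1 := lim_{x→c} f(x) and L_2 := lim_{x→c} g(x). By [seqflimit:lemma] we know {f(x_n)} converges to L_1 and {g(x_n)} converges to L_2. We also have f(x_n) ≤ g(x_n). We obtain L_1 ≤ L_2 using the corresponding result for limits of sequences.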
By applying constant functions, we get the following corollary. The proof is left as an exercise.
[fconstineq:cor] Let S ⊂ R and c be a cluster point of S. Let f : S → R be a function. And suppose the limit of f(x) as x goes to c exists. Suppose there are two real numbers a and b such that
\[a \leq f(x) \leq b \qquad \text{for all } x \in S .\]
Then
\[a \leq \lim_{x\to c} f(x) \leq b .\]
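Using [seqflimit:lemma] in the same way as above we also get the following corollaries, whose proofs are again left as an exercise.
[fsqueeze:cor] Let S ⊂ R and c be a cluster point of S. Let f : S → R, g : S → R, and h : S → R be functions. Suppose
\[f(x) \leq g(x) \leq h(x) \qquad \text{for all } x \in S ,\]
and the limits of f(x) and h(x) as x goes to c both exist, and
\[\lim_{x\to c} f(x) = \lim_{x\to c} h(x) .\]
Then the limit of g(x) as x goes to c exists and
\[\lim_{x\to c} g(x) = \lim_{x\to c} f(x) = \lim_{x\to c} h(x) .\]
[falg:cor] Let S ⊂ R and c be a cluster point of S. Let f : S → R and g : S → R be functions. Suppose limits of f(x) and g(x) as x goes to c both exist. Then
i. \(\lim_{x\to c} \bigl( f(x) + g(x) \bigr) = \bigl( \lim_{x\to c} f(x) \bigr) + \bigl( \lim_{x\to c} g(x) \bigr)\).
ii. \(\lim_{x\to c} \bigl( f(x) - g(x) \bigr) = \bigl( \lim_{x\to c} f(x) \bigr) - \bigl( \lim_{x\to c} g(x) \bigr)\).
iii. \(\lim_{x\to c} \bigl( f(x) g(x) \bigr) = \bigl( \lim_{x\to c} f(x) \bigr) \bigl( \lim_{x\to c} g(x) \bigr)\).
iv. [falg:cor:iv] If lim_{x→c} g(x) ≠ 0, and g(x) ≠ 0 for all x ∈ S ∖ {c}, then
\[\lim_{x\to c} \frac{f(x)}{g(x)} = \frac{\lim_{x\to c} f(x)}{\lim_{x\to c} g(x)} .\]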
[prop:limrest] Let S ⊂ R, c ∈ R, and let f : S → R be a function. Suppose A ⊂ S is such that there is some α > 0 such that A ∩ (c − α, c + α) = S ∩ (c − α, c + α).
i. The point c is a cluster point of A if and only if c is a cluster point of S.
ii. Supposing c is a cluster point of S, then f(x) → L as x → c if and only if f | A(x) → L as x → c.
First, let c be a cluster point of A. Since A ⊂ S, then if (A ∖ {c}) ∩ (c − ϵ, c + ϵ) is nonempty for every ϵ > 0, then (S ∖ {c}) ∩ (c − ϵ, c + ϵ) is nonempty for every ϵ > 0. Thus c is a cluster
point of S. Second, suppose c is a cluster point of S. Then for ϵ > 0 such that ϵ < α we get that (A ∖ {c}) ∩ (c − ϵ, c + ϵ) = (S ∖ {c}) ∩ (c − ϵ, c + ϵ), which is nonempty. This is true for all
ϵ < α and hence (A ∖ {c}) ∩ (c − ϵ, c + ϵ) must be nonempty for all ϵ > 0. Thus c is a cluster point of A.
Now suppose f(x) → L as x → c. That is, for every ϵ > 0 there is a δ > 0 such that if x ∈ S ∖ {c} and |x − c| < δ, then |f(x) − L| < ϵ. Because A ⊂ S, if x is in A ∖ {c}, then x is in S ∖ {c},
and hence f | A(x) → L as x → c.
Finally suppose f|_A(x) → L as x → c. Hence for every ϵ > 0 there is a δ > 0 such that if x ∈ A ∖ {c} and |x − c| < δ, then | f|_A(x) − L | < ϵ. Without loss of generality assume δ ≤ α. If |x − c| < δ, then x ∈ S ∖ {c} if and only if x ∈ A ∖ {c}. Thus |f(x) − L| = | f|_A(x) − L | < ϵ.
The hypothesis of the proposition is necessary. For an arbitrary restriction we generally get an implication in only one direction, see the exercises.
A common use of restriction with respect to limits is one-sided limits.
[defn:onesidedlimits] Let f : S → R be a function and let c be a cluster point of S ∩ (c, ∞). Then if the limit of the restriction of f to S ∩ (c, ∞) as x → c exists, we define
\[\lim_{x \to c^+} f(x) := \lim_{x\to c} f|_{S \cap (c,\infty)}(x) .\]
Similarly if c is a cluster point of S ∩ (−∞, c) and the limit of the restriction as x → c exists, we define
\[\lim_{x \to c^-} f(x) := \lim_{x\to c} f|_{S \cap (-\infty,c)}(x) .\]
The proposition above does not apply to one-sided limits. It is possible to have one-sided limits, but no limit at a point. For example, define f : R → R by f(x) := 1 for x < 0 and f(x) := 0 for x ≥ 0
. We leave it to the reader to verify that \lim_{x \to 0^-} f(x) = 1, \qquad \lim_{x \to 0^+} f(x) = 0, \qquad \lim_{x \to 0} f(x) \quad \text{does not exist.} We have the following replacement.
[prop:onesidedlimits] Let S \subset {\mathbb{R}} be a set such that c is a cluster point of both S \cap (-\infty,c) and S \cap (c,\infty), and let f \colon S \to {\mathbb{R}} be a function. Then
\lim_{x \to c} f(x) = L \qquad \text{if and only if} \qquad \lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x) = L .
That is, a limit exists if both one-sided limits exist and are equal, and vice-versa. The proof is a straightforward application of the definition of limit and is left as an exercise. The key point is
that \bigl( S \cap (-\infty,c) \bigr) \cup \bigl( S \cap (c,\infty) \bigr) = S \setminus \{ c \}.
Exercises
Find the limit or prove that the limit does not exist
a) \displaystyle \lim_{x\to c} \sqrt{x}, for c \geq 0 b) \displaystyle \lim_{x\to c} x^2+x+1, for any c \in {\mathbb{R}} c) \displaystyle \lim_{x\to 0} x^2 \cos (\nicefrac{1}{x})
Prove [fconstineq:cor].
Prove [fsqueeze:cor].
Prove [falg:cor].
Let A \subset S. Show that if c is a cluster point of A, then c is a cluster point of S. Note the difference from [prop:limrest].
[exercise:restrictionlimitexercise] Let A \subset S. Suppose c is a cluster point of A and it is also a cluster point of S. Let f \colon S \to {\mathbb{R}} be a function. Show that if f(x) \to L as x
\to c, then f|_A(x) \to L as x \to c. Note the difference from [prop:limrest].
Find an example of a function f \colon [-1,1] \to {\mathbb{R}} such that for A:=[0,1], the restriction f|_A(x) \to 0 as x \to 0, but the limit of f(x) as x \to 0 does not exist. Note why you cannot
apply [prop:limrest].
Find example functions f and g such that the limit of neither f(x) nor g(x) exists as x \to 0, but such that the limit of f(x)+g(x) exists as x \to 0.
[exercise:contlimitcomposition] Let c_1 be a cluster point of A \subset {\mathbb{R}} and c_2 be a cluster point of B \subset {\mathbb{R}}. Suppose f \colon A \to B and g \colon B \to
{\mathbb{R}} are functions such that f(x) \to c_2 as x \to c_1 and g(y) \to L as y \to c_2. If c_2 \in B also suppose that g(c_2) = L. Let h(x) := g\bigl(f(x)\bigr) and show h(x) \to L as x \to c_1.
Hint: note that f(x) could equal c_2 for many x \in A.
Let c be a cluster point of A \subset {\mathbb{R}}, and f \colon A \to {\mathbb{R}} be a function. Suppose for every sequence \{x_n\} in A, such that \lim\, x_n = c, the sequence \{ f(x_n)
\}_{n=1}^\infty is Cauchy. Prove that \lim_{x\to c} f(x) exists.
Continuous functions
Note: 2–2.5 lectures
You undoubtedly heard of continuous functions in your schooling. A high-school criterion for this concept is that a function is continuous if we can draw its graph without lifting the pen from
the paper. While that intuitive concept may be useful in simple situations, we require rigor. The following definition took three great mathematicians (Bolzano, Cauchy, and finally Weierstrass)
to get correctly and its final form dates only to the late 1800s.
Definition and basic properties
Let S \subset {\mathbb{R}}, c \in S, and let f \colon S \to {\mathbb{R}} be a function. We say that f is continuous at c if for every \epsilon > 0 there is a \delta > 0 such that whenever x \in S
and \left\lvert {x-c} \right\rvert < \delta, then \left\lvert {f(x)-f(c)} \right\rvert < \epsilon.
When f \colon S \to {\mathbb{R}} is continuous at all c \in S, then we simply say f is a continuous function.
If f is continuous for all c \in A, we say f is continuous on A \subset S. It is left as an easy exercise to show that this implies that f|_A is continuous, although the converse does not hold.
Continuity may be the most important definition to understand in analysis, and it is not an easy one. See . Note that \delta not only depends on \epsilon, but also on c; we need not pick one
\delta for all c \in S. It is no accident that the definition of continuity is similar to the definition of a limit of a function. The main feature of continuous functions is that these are precisely the
functions that behave nicely with limits.
[contbasic:prop] Suppose f \colon S \to {\mathbb{R}} is a function and c \in S. Then
i. If c is not a cluster point of S, then f is continuous at c.
ii. If c is a cluster point of S, then f is continuous at c if and only if the limit of f(x) as x \to c exists and \lim_{x\to c} f(x) = f(c) .
iii. f is continuous at c if and only if for every sequence \{ x_n \} where x_n \in S and \lim\, x_n = c, the sequence \{ f(x_n) \} converges to f(c).
Let us start with the first item. Suppose c is not a cluster point of S. Then there exists a \delta > 0 such that S \cap (c-\delta,c+\delta) = \{ c \}. Therefore, for any \epsilon > 0, simply pick this
given \delta. The only x \in S such that \left\lvert {x-c} \right\rvert < \delta is x=c. Then \left\lvert {f(x)-f(c)} \right\rvert = \left\lvert {f(c)-f(c)} \right\rvert = 0 < \epsilon.
Let us move to the second item. Suppose c is a cluster point of S. Let us first suppose that \lim_{x\to c} f(x) = f(c). Then for every \epsilon > 0 there is a \delta > 0 such that if x \in S \setminus
\{ c \} and \left\lvert {x-c} \right\rvert < \delta, then \left\lvert {f(x)-f(c)} \right\rvert < \epsilon. As \left\lvert {f(c)-f(c)} \right\rvert = 0 < \epsilon, then the definition of continuity at c is
satisfied. On the other hand, suppose f is continuous at c. For every \epsilon > 0, there exists a \delta > 0 such that for x \in S where \left\lvert {x-c} \right\rvert < \delta we have \left\lvert {f(x)-
f(c)} \right\rvert < \epsilon. Then the statement is, of course, still true if x \in S \setminus \{ c \} \subset S. Therefore \lim_{x\to c} f(x) = f(c).
For the third item, suppose f is continuous at c. Let \{ x_n \} be a sequence such that x_n \in S and \lim\, x_n = c. Let \epsilon > 0 be given. Find a \delta > 0 such that \left\lvert {f(x)-f(c)}
\right\rvert < \epsilon for all x \in S where \left\lvert {x-c} \right\rvert < \delta. Find an M \in {\mathbb{N}} such that for n \geq M we have \left\lvert {x_n-c} \right\rvert < \delta. Then for n
\geq M we have that \left\lvert {f(x_n)-f(c)} \right\rvert < \epsilon, so \{ f(x_n) \} converges to f(c).
Let us prove the converse of the third item by contrapositive. Suppose f is not continuous at c. Then there exists an \epsilon > 0 such that for all \delta > 0, there exists an x \in S such that
\left\lvert {x-c} \right\rvert < \delta and \left\lvert {f(x)-f(c)} \right\rvert \geq \epsilon. Let us define a sequence \{ x_n \} as follows. Let x_n \in S be such that \left\lvert {x_n-c} \right\rvert <
\nicefrac{1}{n} and \left\lvert {f(x_n)-f(c)} \right\rvert \geq \epsilon. Now \{ x_n \} is a sequence of numbers in S such that \lim\, x_n = c and such that \left\lvert {f(x_n)-f(c)} \right\rvert \geq
\epsilon for all n \in {\mathbb{N}}. Thus \{ f(x_n) \} does not converge to f(c). It may or may not converge, but it definitely does not converge to f(c).
The last item in the proposition is particularly powerful. It allows us to quickly apply what we know about limits of sequences to continuous functions and even to prove that certain functions
are continuous. It can also be strengthened; see Exercise [exercise:contseqalt].
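The sequential criterion in the third item can be explored numerically. The following is a minimal sketch and not a proof: the helper images_along and the step function are hypothetical names chosen only for illustration. It evaluates a function along a sequence converging to c, once for the continuous f(x) = 1/x and once for a function that is discontinuous at 0.

```python
def images_along(f, seq):
    """Return the list of values f(x_n) for a finite list of points x_n."""
    return [f(x) for x in seq]


def reciprocal(x):
    # f(x) = 1/x on (0, infinity); continuous at every c > 0
    return 1.0 / x


def step(x):
    # Illustrative function (an assumption, not from the text):
    # 0 for x < 0 and 1 for x >= 0, so it is discontinuous at c = 0.
    return 0.0 if x < 0 else 1.0


# x_n = 2 + 1/n converges to c = 2, and f(x_n) approaches f(2) = 0.5.
near_two = [2 + 1 / n for n in range(1, 10001)]
print(images_along(reciprocal, near_two)[-1])

# x_n = (-1)^n / n converges to 0, but step(x_n) alternates between 0 and 1,
# so {step(x_n)} does not converge to step(0) = 1.
alternating = [(-1) ** n / n for n in range(1, 9)]
print(images_along(step, alternating))
```

A single badly behaved sequence is enough to rule out continuity at c, while no finite computation of this kind can establish continuity.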
f \colon (0,\infty) \to {\mathbb{R}} defined by f(x) := \nicefrac{1}{x} is continuous.
Proof: Fix c \in (0,\infty). Let \{ x_n \} be a sequence in (0,\infty) such that \lim\, x_n = c. Then we know that f(c) = \frac{1}{c} = \frac{1}{\lim\, x_n} = \lim_{n \to \infty} \frac{1}{x_n} =
\lim_{n \to \infty} f(x_n) . Thus f is continuous at c. As f is continuous at all c \in (0,\infty), f is continuous.
We have previously shown \lim_{x \to c} x^2 = c^2 directly. Therefore the function x^2 is continuous. We can use the continuity of algebraic operations with respect to limits of sequences,
which we proved in the previous chapter, to prove a much more general result.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be a polynomial. That is, f(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_1 x + a_0 , for some constants a_0, a_1, \ldots, a_d. Then f is continuous.
Fix c \in {\mathbb{R}}. Let \{ x_n \} be a sequence such that \lim\, x_n = c. Then \begin{split} f(c) &= a_d c^d + a_{d-1} c^{d-1} + \cdots + a_1 c + a_0 \\ &= a_d {(\lim\, x_n)}^d + a_{d-1}
{(\lim\, x_n)}^{d-1} + \cdots + a_1 (\lim\, x_n) + a_0 \\ & = \lim_{n \to \infty} \left( a_d x_n^d + a_{d-1} x_n^{d-1} + \cdots + a_1 x_n + a_0 \right) = \lim_{n \to \infty} f(x_n) . \end{split}
Thus f is continuous at c. As f is continuous at all c \in {\mathbb{R}}, f is continuous.
By similar reasoning, or by appealing to , we can prove the following. The details of the proof are left as an exercise.
[contalg:prop] Let f \colon S \to {\mathbb{R}} and g \colon S \to {\mathbb{R}} be functions continuous at c \in S.
i. The function h \colon S \to {\mathbb{R}} defined by h(x) := f(x)+g(x) is continuous at c.
ii. The function h \colon S \to {\mathbb{R}} defined by h(x) := f(x)-g(x) is continuous at c.
iii. The function h \colon S \to {\mathbb{R}} defined by h(x) := f(x)g(x) is continuous at c.
iv. If g(x)\not=0 for all x \in S, the function h \colon S \to {\mathbb{R}} defined by h(x) := \frac{f(x)}{g(x)} is continuous at c.
[sincos:example] The functions \sin(x) and \cos(x) are continuous. In the following computations we use the sum-to-product trigonometric identities. We also use the simple facts that \left\lvert
{\sin(x)} \right\rvert \leq \left\lvert {x} \right\rvert, \left\lvert {\cos(x)} \right\rvert \leq 1, and \left\lvert {\sin(x)} \right\rvert \leq 1. \begin{split} \left\lvert {\sin(x)-\sin(c)} \right\rvert & =
\left\lvert { 2 \sin \left( \frac{x-c}{2} \right) \cos \left( \frac{x+c}{2} \right) } \right\rvert \\ & = 2 \left\lvert { \sin \left( \frac{x-c}{2} \right) } \right\rvert \left\lvert { \cos \left( \frac{x+c}{2}
\right) } \right\rvert \\ & \leq 2 \left\lvert { \sin \left( \frac{x-c}{2} \right) } \right\rvert \\ & \leq 2 \left\lvert { \frac{x-c}{2} } \right\rvert = \left\lvert {x-c} \right\rvert \end{split} \begin{split}
\left\lvert {\cos(x)-\cos(c)} \right\rvert & = \left\lvert { -2 \sin \left( \frac{x-c}{2} \right) \sin \left( \frac{x+c}{2} \right) } \right\rvert \\ & = 2 \left\lvert { \sin \left( \frac{x-c}{2} \right) }
\right\rvert \left\lvert { \sin \left( \frac{x+c}{2} \right) } \right\rvert \\ & \leq 2 \left\lvert { \sin \left( \frac{x-c}{2} \right) } \right\rvert \\ & \leq 2 \left\lvert { \frac{x-c}{2} } \right\rvert =
\left\lvert {x-c} \right\rvert \end{split}
The claim that sin and cos are continuous follows by taking an arbitrary sequence \{ x_n \} converging to c, or by applying the definition of continuity directly. Details are left to the reader.
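The two estimates above can be spot-checked on a finite grid. This is only a sanity check under stated assumptions: the grid on [-10, 10] and the floating-point tolerance are arbitrary choices, and the function names are hypothetical.

```python
import math


def sine_bound_holds(x, c, tol=1e-12):
    """Check |sin(x) - sin(c)| <= |x - c| up to a small floating-point tolerance."""
    return abs(math.sin(x) - math.sin(c)) <= abs(x - c) + tol


def cosine_bound_holds(x, c, tol=1e-12):
    """Check |cos(x) - cos(c)| <= |x - c| up to a small floating-point tolerance."""
    return abs(math.cos(x) - math.cos(c)) <= abs(x - c) + tol


# Sample a grid of points in [-10, 10]; the grid is an arbitrary choice.
grid = [k / 10 for k in range(-100, 101)]
all_ok = all(
    sine_bound_holds(x, c) and cosine_bound_holds(x, c)
    for x in grid
    for c in grid
)
print(all_ok)
```

Agreement on a grid is numerical evidence only; the inequalities themselves come from the computation with the trigonometric identities.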
Composition of continuous functions
You have probably already realized that one of the basic tools in constructing complicated functions out of simple ones is composition. A useful property of continuous functions is that
compositions of continuous functions are again continuous. Recall that for two functions f and g, the composition f \circ g is defined by (f \circ g)(x) := f\bigl(g(x)\bigr).
Let A, B \subset {\mathbb{R}} and f \colon B \to {\mathbb{R}} and g \colon A \to B be functions. If g is continuous at c \in A and f is continuous at g(c), then f \circ g \colon A \to {\mathbb{R}} is continuous at c.
Exercises
Using the definition of continuity directly prove that f \colon {\mathbb{R}}\to {\mathbb{R}} defined by f(x) := x^2 is continuous.
Using the definition of continuity directly prove that f \colon (0,\infty) \to {\mathbb{R}} defined by f(x) := \nicefrac{1}{x} is continuous.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be defined by f(x) := \begin{cases} x & \text{ if $x$ is rational,} \\ x^2 & \text{ if $x$ is irrational.} \end{cases} Using the definition of continuity
directly prove that f is continuous at 1 and discontinuous at 2.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be defined by f(x) := \begin{cases} \sin(\nicefrac{1}{x}) & \text{ if $x \not= 0$,} \\ 0 & \text{ if $x=0$.} \end{cases} Is f continuous? Prove your
assertion.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be defined by f(x) := \begin{cases} x \sin(\nicefrac{1}{x}) & \text{ if $x \not= 0$,} \\ 0 & \text{ if $x=0$.} \end{cases} Is f continuous? Prove
your assertion.
Prove Proposition [contalg:prop].
Prove the following statement. Let S \subset {\mathbb{R}} and A \subset S. Let f \colon S \to {\mathbb{R}} be a continuous function. Then the restriction f|_A is continuous.
Suppose S \subset {\mathbb{R}}. Suppose for some c \in {\mathbb{R}} and \alpha > 0, we have A=(c-\alpha,c+\alpha) \subset S. Let f \colon S \to {\mathbb{R}} be a function. Prove that if
f|_A is continuous at c, then f is continuous at c.
Give an example of functions f \colon {\mathbb{R}}\to {\mathbb{R}} and g \colon {\mathbb{R}}\to {\mathbb{R}} such that the function h defined by h(x) := f(x) + g(x) is continuous, but f
and g are not continuous. Can you find f and g that are nowhere continuous, but h is a continuous function?
Let f \colon {\mathbb{R}}\to {\mathbb{R}} and g \colon {\mathbb{R}}\to {\mathbb{R}} be continuous functions. Suppose that for all rational numbers r, f(r) = g(r). Show that f(x) = g(x) for
all x.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be continuous. Suppose f(c) > 0. Show that there exists an \alpha > 0 such that for all x \in (c-\alpha,c+\alpha) we have f(x) > 0.
Let f \colon {\mathbb{Z}}\to {\mathbb{R}} be a function. Show that f is continuous.
[exercise:contseqalt] Let f \colon S \to {\mathbb{R}} be a function and c \in S, such that for every sequence \{ x_n \} in S with \lim\, x_n = c, the sequence \{ f(x_n) \} converges. Show that f
is continuous at c.
Suppose f \colon [-1,0] \to {\mathbb{R}} and g \colon [0,1] \to {\mathbb{R}} are continuous and f(0) = g(0). Define h \colon [-1,1] \to {\mathbb{R}} by h(x) := f(x) if x \leq 0 and h(x) := g(x)
if x > 0. Show that h is continuous.
Suppose g \colon {\mathbb{R}}\to {\mathbb{R}} is a continuous function such that g(0) = 0, and suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is such that \left\lvert {f(x)-f(y)}
\right\rvert \leq g(x-y) for all x and y. Show that f is continuous.
Suppose f(x+y) = f(x) + f(y) for some f \colon {\mathbb{R}}\to {\mathbb{R}} such that f is continuous at 0. Show that f(x) = ax for some a \in {\mathbb{R}}. Hint: Show that f(nx) = nf(x),
then show f is continuous on {\mathbb{R}}. Then show that \nicefrac{f(x)}{x} = f(1) for all rational x.
Exercises
Find an example of a discontinuous function f \colon [0,1] \to {\mathbb{R}} where the intermediate value theorem fails.
Find an example of a bounded discontinuous function f \colon [0,1] \to {\mathbb{R}} that has neither an absolute minimum nor an absolute maximum.
Let f \colon (0,1) \to {\mathbb{R}} be a continuous function such that \displaystyle \lim_{x\to 0} f(x) = \displaystyle \lim_{x\to 1} f(x) = 0. Show that f achieves either an absolute minimum
or an absolute maximum on (0,1) (but perhaps not both).
Let f(x) := \begin{cases} \sin(\nicefrac{1}{x}) & \text{ if $x \not= 0$,} \\ 0 & \text{ if $x=0$.} \end{cases} Show that f has the intermediate value property. That is, for any a < b, if there
exists a y such that f(a) < y < f(b) or f(a) > y > f(b), then there exists a c \in (a,b) such that f(c) = y.
Suppose g(x) is a polynomial of odd degree d such that g(x) = x^d + b_{d-1} x^{d-1} + \cdots + b_1 x + b_0 , for some real numbers b_{0}, b_1, \ldots, b_{d-1}. Show that there exists a K \in
{\mathbb{N}} such that g(-K) < 0. Hint: Make sure to use the fact that d is odd. You will have to use that {(-n)}^d = -(n^d).
Suppose g(x) is a polynomial of positive even degree d such that g(x) = x^d + b_{d-1} x^{d-1} + \cdots + b_1 x + b_0 , for some real numbers b_{0}, b_1, \ldots, b_{d-1}. Suppose g(0) < 0.
Show that g has at least two distinct real roots.
[exercise:imageofinterval] Suppose f \colon [a,b] \to {\mathbb{R}} is a continuous function. Prove that the direct image f([a,b]) is a closed and bounded interval or a single number.
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is continuous and periodic with period P > 0. That is, f(x+P) = f(x) for all x \in {\mathbb{R}}. Show that f achieves an absolute minimum
and an absolute maximum.
Suppose f(x) is a bounded polynomial, in other words, there is an M such that \left\lvert {f(x)} \right\rvert \leq M for all x \in {\mathbb{R}}. Prove that f must be a constant.
Suppose f \colon [0,1] \to [0,1] is continuous. Show that f has a fixed point, in other words, show that there exists an x \in [0,1] such that f(x) = x.
Find an example of a bounded function f \colon {\mathbb{R}}\to {\mathbb{R}} that achieves neither an absolute minimum nor an absolute maximum on {\mathbb{R}}.
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is a continuous function such that x \leq f(x) \leq x+1 for all x \in {\mathbb{R}}. Find f({\mathbb{R}}).
True/False, prove or find a counterexample. If f \colon {\mathbb{R}}\to {\mathbb{R}} is a continuous function such that \(f|_
Uniform continuity
Note: 1.5–2 lectures (Continuous extension and Lipschitz can be optional)
Uniform continuity
We made a fuss of saying that the \delta in the definition of continuity depended on the point c. There are situations when it is advantageous to have a \delta independent of any point. Let us
give a name to this concept.
Let S \subset {\mathbb{R}}, and let f \colon S \to {\mathbb{R}} be a function. Suppose for any \epsilon > 0 there exists a \delta > 0 such that whenever x, c \in S and \left\lvert {x-c}
\right\rvert < \delta, then \left\lvert {f(x)-f(c)} \right\rvert < \epsilon. Then we say f is uniformly continuous.
It is not hard to see that a uniformly continuous function must be continuous. The only difference in the definitions is that for a given \epsilon > 0 we pick a \delta > 0 that works for all c \in S.
That is, \delta can no longer depend on c; it only depends on \epsilon. The domain of definition of the function makes a difference now. A function that is not uniformly continuous on a larger
set may be uniformly continuous when restricted to a smaller set.
The function f \colon (0,1) \to {\mathbb{R}}, defined by f(x) := \nicefrac{1}{x} is not uniformly continuous, but it is continuous.
Proof: Given \epsilon > 0, then for \epsilon > \left\lvert {\nicefrac{1}{x}-\nicefrac{1}{y}} \right\rvert to hold we must have \epsilon > \left\lvert {\nicefrac{1}{x}-\nicefrac{1}{y}} \right\rvert
= \frac{\left\lvert {y-x} \right\rvert}{\left\lvert {xy} \right\rvert} = \frac{\left\lvert {y-x} \right\rvert}{xy} , or \left\lvert {x-y} \right\rvert < xy \epsilon . Therefore, to satisfy the definition of
uniform continuity we would have to have \delta \leq xy \epsilon for all x,y in (0,1), but that would mean that \delta \leq 0. Therefore there is no single \delta > 0.
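To see how the admissible \delta collapses near 0, here is a small numerical sketch. It assumes a closed form worked out for this illustration and not stated in the text: for f(x) = 1/x and a point c > 0, solving \nicefrac{1}{x} - \nicefrac{1}{c} < \epsilon for x < c gives x > c/(1+\epsilon c), so the largest \delta that works at c is \epsilon c^2/(1+\epsilon c). The name largest_delta is hypothetical.

```python
def largest_delta(eps, c):
    """Largest delta that works for f(x) = 1/x at the point c > 0.

    Solving 1/x - 1/c < eps for x < c gives x > c / (1 + eps * c),
    so the binding constraint is delta <= eps * c**2 / (1 + eps * c).
    (This closed form is worked out here; it is not stated in the text.)
    """
    return eps * c**2 / (1 + eps * c)


eps = 0.1
for c in [0.5, 0.1, 0.01, 0.001]:
    # The admissible delta shrinks toward 0 as c approaches 0.
    print(c, largest_delta(eps, c))
```

For each fixed c the value is positive, which matches continuity at c, but no single positive number lies below all of these values, which matches the failure of uniform continuity on (0,1).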
f \colon [0,1] \to {\mathbb{R}}, defined by f(x) := x^2 is uniformly continuous.
Proof: Note that 0 \leq x,c \leq 1. Then \left\lvert {x^2-c^2} \right\rvert = \left\lvert {x+c} \right\rvert\left\lvert {x-c} \right\rvert \leq (\left\lvert {x} \right\rvert+\left\lvert {c} \right\rvert)
\left\lvert {x-c} \right\rvert \leq (1+1)\left\lvert {x-c} \right\rvert . Therefore given \epsilon > 0, let \delta := \nicefrac{\epsilon}{2}. If \left\lvert {x-c} \right\rvert < \delta, then \left\lvert {x^2-
c^2} \right\rvert < \epsilon.
On the other hand, f \colon {\mathbb{R}}\to {\mathbb{R}}, defined by f(x) := x^2 is not uniformly continuous.
Proof: Suppose it is uniformly continuous, then for all \epsilon > 0, there would exist a \delta > 0 such that if \left\lvert {x-c} \right\rvert < \delta, then \left\lvert {x^2 -c^2} \right\rvert <
\epsilon. Take x > 0 and let c := x+\nicefrac{\delta}{2}. Write \epsilon > \left\lvert {x^2-c^2} \right\rvert = \left\lvert {x+c} \right\rvert\left\lvert {x-c} \right\rvert = (2x+\nicefrac{\delta}
{2})\nicefrac{\delta}{2} \geq \delta x . Therefore x < \nicefrac{\epsilon}{\delta} for all x > 0, which is a contradiction.
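The pair of points used in this proof can be tabulated directly. The sketch below is only an illustration: the function name gap and the sample value \delta = 0.01 are assumptions made here, and any fixed \delta > 0 behaves the same way.

```python
def gap(x, delta):
    """Return |x^2 - c^2| for c = x + delta/2, the pair used in the proof."""
    c = x + delta / 2
    return abs(x**2 - c**2)


delta = 0.01  # any fixed delta > 0 shows the same behaviour
for x in [1.0, 10.0, 100.0, 1000.0]:
    # The points x and c stay within delta of each other,
    # yet the gap in function values is at least delta * x.
    print(x, gap(x, delta))
```

The printed gaps grow without bound as x grows, so no fixed \delta can keep them below a given \epsilon on all of {\mathbb{R}}.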
We have seen that if f is defined on an interval that is either not closed or not bounded, then f can be continuous, but not uniformly continuous. For a closed and bounded interval [a,b], we can,
however, make the following statement.
[unifcont:thm] Let f \colon [a,b] \to {\mathbb{R}} be a continuous function. Then f is uniformly continuous.
We prove the statement by contrapositive. Suppose f is not uniformly continuous. We will prove that there is some c \in [a,b] where f is not continuous. Let us negate the definition of uniformly
continuous. There exists an \epsilon > 0 such that for every \delta > 0, there exist points x, y in [a,b] with \left\lvert {x-y} \right\rvert < \delta and \left\lvert {f(x)-f(y)} \right\rvert \geq \epsilon.
So for the \epsilon > 0 above, we find sequences \{ x_n \} and \{ y_n \} such that \left\lvert {x_n-y_n} \right\rvert < \nicefrac{1}{n} and such that \left\lvert {f(x_n)-f(y_n)} \right\rvert \geq
\epsilon. By the Bolzano-Weierstrass theorem, there exists a convergent subsequence \{ x_{n_k} \}. Let c := \lim\, x_{n_k}. As a \leq x_{n_k} \leq b, then a \leq c \leq b. Write \left\lvert {y_{n_k} - c} \right\rvert = \left\lvert
{y_{n_k} - x_{n_k} + x_{n_k} - c} \right\rvert \leq \left\lvert {y_{n_k} - x_{n_k}} \right\rvert + \left\lvert {x_{n_k}-c} \right\rvert < \nicefrac{1}{n_k} + \left\lvert {x_{n_k}-c} \right\rvert .
As \nicefrac{1}{n_k} and \left\lvert {x_{n_k}-c} \right\rvert both go to zero when k goes to infinity, \{ y_{n_k} \} converges and the limit is c. We now show that f is not continuous at c. We
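estimate \begin{split} \left\lvert {f(x_{n_k}) - f(c)} \right\rvert & = \left\lvert {f(x_{n_k}) - f(y_{n_k}) + f(y_{n_k}) - f(c)} \right\rvert \\ & \geq \left\lvert {f(x_{n_k}) - f(y_{n_k})} \right\rvert - \left\lvert {f(y_{n_k}) - f(c)} \right\rvert \\ & \geq \epsilon - \left\lvert {f(y_{n_k}) - f(c)} \right\rvert . \end{split} In other words, \left\lvert {f(x_{n_k}) - f(c)} \right\rvert + \left\lvert {f(y_{n_k}) - f(c)} \right\rvert \geq \epsilon . At least one of the sequences \{ f(x_{n_k}) \} or \{ f(y_{n_k}) \} therefore cannot converge to f(c); otherwise the left-hand side of the inequality would go to zero while the right-hand side is positive. Thus f cannot be continuous at c.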
Exercises
Let f \colon S \to {\mathbb{R}} be uniformly continuous. Let A \subset S. Then the restriction f|_A is uniformly continuous.
Let f \colon (a,b) \to {\mathbb{R}} be a uniformly continuous function. Finish the proof of by showing that the limit \lim\limits_{x \to b} f(x) exists.
Show that f \colon (c,\infty) \to {\mathbb{R}} for some c > 0 and defined by f(x) := \nicefrac{1}{x} is Lipschitz continuous.
Show that f \colon (0,\infty) \to {\mathbb{R}} defined by f(x) := \nicefrac{1}{x} is not Lipschitz continuous.
Let A, B be intervals. Let f \colon A \to {\mathbb{R}} and g \colon B \to {\mathbb{R}} be uniformly continuous functions such that f(x) = g(x) for x \in A \cap B. Define the function h \colon
A \cup B \to {\mathbb{R}} by h(x) := f(x) if x \in A and h(x) := g(x) if x \in B \setminus A. a) Prove that if A \cap B \not= \emptyset, then h is uniformly continuous. b) Find an example where
A \cap B = \emptyset and h is not even continuous.
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be a polynomial of degree d \geq 2. Show that f is not Lipschitz continuous.
Let f \colon (0,1) \to {\mathbb{R}} be a bounded continuous function. Show that the function g(x) := x(1-x)f(x) is uniformly continuous.
Show that f \colon (0,\infty) \to {\mathbb{R}} defined by f(x) := \sin (\nicefrac{1}{x}) is not uniformly continuous.
Let f \colon {\mathbb{Q}}\to {\mathbb{R}} be a uniformly continuous function. Show that there exists a uniformly continuous function \widetilde{f} \colon {\mathbb{R}}\to {\mathbb{R}}
such that f(x) = \widetilde{f}(x) for all x \in {\mathbb{Q}}.
a) Find a continuous f \colon (0,1) \to {\mathbb{R}} and a sequence \{ x_n \} in (0,1) that is Cauchy, but such that \{ f(x_n) \} is not Cauchy. b) Prove that if f \colon {\mathbb{R}}\to
{\mathbb{R}} is continuous, and \{ x_n \} is Cauchy, then \{ f(x_n) \} is Cauchy.
Limits at infinity
Note: less than 1 lecture (optional, can safely be omitted unless or is also covered)
Limits at infinity
As for sequences, a continuous variable can also approach infinity. Let us make this notion precise.
We say \infty is a cluster point of S \subset {\mathbb{R}}, if for every M \in {\mathbb{R}}, there exists an x \in S such that x \geq M. Similarly, -\infty is a cluster point of S \subset
{\mathbb{R}}, if for every M \in {\mathbb{R}}, there exists an x \in S such that x \leq M.
Let f \colon S \to {\mathbb{R}} be a function, where \infty is a cluster point of S. If there exists an L \in {\mathbb{R}} such that for every \epsilon > 0, there is an M \in {\mathbb{R}} such
that \left\lvert {f(x) - L} \right\rvert < \epsilon whenever x \geq M, then we say f(x) converges to L as x goes to \infty. We call L the limit and write \lim_{x \to \infty} f(x) := L . Alternatively
we write f(x) \to L as x \to \infty.
Similarly, if -\infty is a cluster point of S and there exists an L \in {\mathbb{R}} such that for every \epsilon > 0, there is an M \in {\mathbb{R}} such that \left\lvert {f(x) - L} \right\rvert <
\epsilon whenever x \leq M, then we say f(x) converges to L as x goes to -\infty. We call L the limit and write \lim_{x \to -\infty} f(x) := L . Alternatively we write f(x) \to L as x \to -\infty.
We cheated a little bit again and said the limit. We leave it as an exercise for the reader to prove the following proposition.
[liminfty:unique] The limit at \infty or -\infty as defined above is unique if it exists.
Let f(x) := \frac{1}{\left\lvert {x} \right\rvert+1}. Then \lim_{x\to \infty} f(x) = 0 \qquad \text{and} \qquad \lim_{x\to -\infty} f(x) = 0 .
Proof: Let \epsilon > 0 be given. Find M > 0 large enough so that \frac{1}{M+1} < \epsilon. If x \geq M, then \frac{1}{x+1} \leq \frac{1}{M+1} < \epsilon. Since \frac{1}{\left\lvert {x}
\right\rvert+1} > 0 for all x the first limit is proved. The proof for -\infty is left to the reader.
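The choice of M in this proof can be made concrete. The sketch below assumes the particular choice M = 1/\epsilon, which is one of many that work since then 1/(M+1) = \epsilon/(1+\epsilon) < \epsilon; the names f and threshold are hypothetical. It checks the defining inequality at a few sample points only.

```python
def f(x):
    return 1 / (abs(x) + 1)


def threshold(eps):
    """One possible M for a given eps > 0: M = 1/eps gives 1/(M + 1) < eps."""
    return 1 / eps


for eps in [0.5, 0.1, 0.001]:
    M = threshold(eps)
    # Sample a few points x >= M and confirm |f(x) - 0| < eps there.
    samples = [M, M + 1, 10 * M, 1000 * M]
    print(eps, M, all(abs(f(x) - 0) < eps for x in samples))
```

Smaller values of \epsilon force larger thresholds M, which is exactly the dependence of M on \epsilon in the definition.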
Let f(x) := \sin(\pi x). Then \lim_{x\to\infty} f(x) does not exist. To prove this fact note that if x = 2n+\nicefrac{1}{2} for some n \in {\mathbb{N}} then f(x)=1, while if x = 2n+\nicefrac{3}
{2} then f(x)=-1, so they cannot both be within a small \epsilon of a single real number.
We must be careful not to confuse continuous limits with limits of sequences. For f(x) = \sin(\pi x) we could say \lim_{n \to \infty} f(n) = 0, \qquad \text{but} \qquad \lim_{x \to \infty} f(x) ~
\text{does not exist}. Of course the notation is ambiguous. We are simply using the convention that n \in {\mathbb{N}}, while x \in {\mathbb{R}}. When the notation is not clear, it is good to
explicitly mention where the variable lives, or what kind of limit you are using.
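The difference between the two kinds of limit for f(x) = \sin(\pi x) shows up clearly in a short computation. This is a sketch in floating-point arithmetic, so the values along the integers are zero only up to rounding; the variable names are chosen here for illustration.

```python
import math


def f(x):
    return math.sin(math.pi * x)


# Along the integers, f(n) is 0 up to floating-point rounding.
integer_values = [f(n) for n in range(1, 51)]

# Along x = 2n + 1/2 the values are 1, and along x = 2n + 3/2 they are -1,
# so f(x) has no limit as the real variable x goes to infinity.
peaks = [f(2 * n + 0.5) for n in range(1, 51)]
troughs = [f(2 * n + 1.5) for n in range(1, 51)]

print(max(abs(v) for v in integer_values))
print(min(peaks), max(troughs))
```

The three sample sequences all go to infinity, yet the values of f along them settle near 0, 1, and -1 respectively.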
There is a connection of continuous limits to limits of sequences, but we must take all sequences going to infinity, just as before in .
[seqflimitinf:lemma] Suppose f \colon S \to {\mathbb{R}} is a function, \infty is a cluster point of S \subset {\mathbb{R}}, and L \in {\mathbb{R}}. Then \lim_{x\to\infty} f(x) = L if and only if \lim_{n\to\infty} f(x_n) = L for all sequences \{ x_n \} such that \lim\limits_{n\to\infty} x_n = \infty.
The lemma also holds for the limit as x \to -\infty. Its proof is almost identical and is left as an exercise.
First suppose f(x) \to L as x \to \infty. Given an \epsilon > 0, there exists an M such that for all x \geq M we have \left\lvert {f(x)-L} \right\rvert < \epsilon. Let \{ x_n \} be a sequence in S such
that \lim \, x_n = \infty. Then there exists an N such that for all n \geq N we have x_n \geq M. And thus \left\lvert {f(x_n)-L} \right\rvert < \epsilon.
We prove the converse by contrapositive. Suppose f(x) does not go to L as x \to \infty. This means that there exists an \epsilon > 0, such that for every M \in {\mathbb{N}}, there exists an x \in
S, x \geq M, let us call it x_M, such that \left\lvert {f(x_M)-L} \right\rvert \geq \epsilon. Consider the sequence \{ x_n \}. Clearly \{ f(x_n) \} does not converge to L. It remains to note that
\lim\, x_n = \infty, because x_n \geq n for all n.
Using the lemma, we again translate results about sequential limits into results about continuous limits as x goes to infinity. That is, we have almost immediate analogues of the corollaries in .
We simply allow the cluster point c to be either \infty or -\infty, in addition to a real number. We leave it to the student to verify these statements.
Infinite limit
Just as for sequences, it is often convenient to distinguish certain divergent sequences, and talk about limits being infinite almost as if the limits existed.
Let f \colon S \to {\mathbb{R}} be a function and suppose S has \infty as a cluster point. We say f(x) diverges to infinity as x goes to \infty, if for every N \in {\mathbb{R}} there exists an M
\in {\mathbb{R}} such that f(x) > N whenever x \in S and x \geq M. We write \lim_{x \to \infty} f(x) := \infty , or we say that f(x) \to \infty as x \to \infty.
A similar definition can be made for limits as x \to -\infty or as x \to c for a finite c. Also similar definitions can be made for limits being -\infty. Stating these definitions is left as an exercise.
Note that sometimes the phrase "converges to infinity" is used. We can again use sequential limits, and an analogue of Lemma [seqflimitinf:lemma] is left as an exercise.
Let us show that \lim_{x \to \infty} \frac{1+x^2}{1+x} = \infty.
Proof: For x \geq 1 we have \frac{1+x^2}{1+x} \geq \frac{x^2}{x+x} = \frac{x}{2} . Given N \in {\mathbb{R}}, take M = \max \{ 2N+1 , 1 \}. If x \geq M, then x \geq 1 and \nicefrac{x}{2}
> N. So \frac{1+x^2}{1+x} \geq \frac{x}{2} > N .
Compositions
Finally, just as for limits at finite numbers we can compose functions easily.
[prop:inflimcompositions] Suppose f \colon A \to B, g \colon B \to {\mathbb{R}}, A, B \subset {\mathbb{R}}, a \in {\mathbb{R}}\cup \{ -\infty, \infty\} is a cluster point of A, and b \in
{\mathbb{R}}\cup \{ -\infty, \infty\} is a cluster point of B. Suppose \lim_{x \to a} f(x) = b\qquad \text{and} \qquad \lim_{y \to b} g(y) = c for some c \in {\mathbb{R}}\cup \{ -\infty, \infty
\}. If b \in B, then suppose g(b) = c. Then \lim_{x \to a} g\bigl(f(x)\bigr) = c .
The proof is straightforward, and left as an exercise. We already know the proposition when a, b, c \in {\mathbb{R}}, see Exercises [exercise:contlimitcomposition] and
[exercise:contlimitbadcomposition]. Again the requirement that g is continuous at b, if b \in B, is necessary.
Let h(x) := e^{-x^2+x}. Then \lim_{x\to \infty} h(x) = 0 .
Proof: The claim follows once we know \lim_{x\to \infty} -x^2+x = -\infty and \lim_{y\to -\infty} e^y = 0 , which is usually proved when the exponential function is defined.
Exercises
Prove Proposition [liminfty:unique].
Exercises
Suppose f \colon [0,1] \to {\mathbb{R}} is monotone. Prove f is bounded.
Finish the proof of .
Finish the proof of .
Prove the claims in .
Finish the proof of .
Suppose S \subset {\mathbb{R}}, and f \colon S \to {\mathbb{R}} is an increasing function. a) If c is a cluster point of S \cap (c,\infty) show that \lim\limits_{x\to c^+} f(x) < \infty. b) If c is a
cluster point of S \cap (-\infty,c) and \lim\limits_{x\to c^-} f(x) = \infty, prove that S \subset (-\infty,c).
Suppose I \subset {\mathbb{R}} is an interval and f \colon I \to {\mathbb{R}} is a function such that for each c \in I, there exist a, b \in {\mathbb{R}} with a > 0 such that f(x) \geq a x + b for
all x \in I and f(c) = a c + b. Show that f is strictly increasing.
Suppose f \colon I \to J is a continuous, bijective (one-to-one and onto) function for two intervals I and J. Show that f is strictly monotone.
Consider a monotone function f \colon I \to {\mathbb{R}} on an interval I. Prove that there exists a function g \colon I \to {\mathbb{R}} such that \lim\limits_{x \to c^-} g(x) = g(c) for all c
\in I, except the smaller (left) endpoint of I, and such that g(x) = f(x) for all but countably many x.
a) Let S \subset {\mathbb{R}} be any subset. If f \colon S \to {\mathbb{R}} is increasing, then show that there exists an increasing F \colon {\mathbb{R}}\to {\mathbb{R}} such that f(x) =
F(x) for all x \in S. b) Find an example of a strictly increasing f \colon S \to {\mathbb{R}} such that an increasing F as above is never strictly increasing.
[exercise:increasingfuncdiscatQ] Find an example of an increasing function f \colon [0,1] \to {\mathbb{R}} that has a discontinuity at each rational number. Then show that the image f([0,1])
contains no interval. Hint: Enumerate the rational numbers and define the function with a series.
The Derivative
The derivative
Note: 1 lecture
The idea of a derivative is the following. Let us suppose a graph of a function looks locally like a straight line. We can then talk about the slope of this line. The slope tells us the rate at which
the value of the function is changing at that particular point. Of course, we are leaving out any function that has corners or discontinuities. Let us be precise.
Definition and basic properties
Let I be an interval, let f \colon I \to {\mathbb{R}} be a function, and let c \in I. If the limit L := \lim_{x \to c} \frac{f(x)-f(c)}{x-c} exists, then we say f is differentiable at c, that L is the
derivative of f at c, and write f'(c) := L.
If f is differentiable at all c \in I, then we simply say that f is differentiable, and then we obtain a function f' \colon I \to {\mathbb{R}}.
The expression \frac{f(x)-f(c)}{x-c} is called the difference quotient.
The graphical interpretation of the derivative is depicted in . The left-hand plot gives the line through \bigl(c,f(c)\bigr) and \bigl(x,f(x)\bigr) with slope \frac{f(x)-f(c)}{x-c}, that is, the so-
called secant line. When we take the limit as x goes to c, we get the right-hand plot, where we see that the derivative of the function at the point c is the slope of the line tangent to the graph of f
at the point \bigl(c,f(c)\bigr).
We allow I to be a closed interval and we allow c to be an endpoint of I. Some calculus books do not allow c to be an endpoint of an interval, but all the theory still works by allowing it, and it
makes our work easier.
Let f(x) := x^2 defined on the whole real line. We find that \lim_{x\to c} \frac{x^2-c^2}{x-c} = \lim_{x\to c} \frac{(x+c)(x-c)}{x-c} = \lim_{x\to c} (x+c) = 2c. Therefore f'(c) = 2c.
The function f(x) := \left\lvert {x} \right\rvert is not differentiable at the origin. When x > 0, then \frac{\left\lvert {x} \right\rvert-\left\lvert {0} \right\rvert}{x-0} = \frac{x-0}{x-0} = 1 , and
when x < 0 we have \frac{\left\lvert {x} \right\rvert-\left\lvert {0} \right\rvert}{x-0} = \frac{-x-0}{x-0} = -1 .
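Both examples can be illustrated by evaluating the difference quotient at points approaching c. The following is a minimal numerical sketch; the helper name difference_quotient and the sample point c = 3 are assumptions made for illustration.

```python
def difference_quotient(f, c, x):
    """Return (f(x) - f(c)) / (x - c) for x != c."""
    return (f(x) - f(c)) / (x - c)


def square(x):
    return x**2


c = 3.0
for h in [0.1, 0.01, 0.001]:
    # For f(x) = x^2 the quotient equals x + c, which tends to 2c = 6.
    print(h, difference_quotient(square, c, c + h))

for h in [0.1, 0.01, 0.001]:
    # For f(x) = |x| at c = 0 the quotient is 1 from the right and -1 from
    # the left, so the two-sided limit does not exist.
    print(difference_quotient(abs, 0.0, h), difference_quotient(abs, 0.0, -h))
```

For the absolute value the quotient does not depend on how small h is, only on its sign, which is why no limit exists at the origin.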
A famous example of Weierstrass shows that there exists a continuous function that is not differentiable at any point. The construction of this function is beyond the scope of this book. On the
other hand, a differentiable function is always continuous.
If f \colon I \to {\mathbb{R}} is differentiable at c \in I, then it is continuous at c.
We know the limits \lim_{x\to c}\frac{f(x)-f(c)}{x-c} = f'(c) \qquad \text{and} \qquad \lim_{x\to c}(x-c) = 0 exist. Furthermore, f(x)-f(c) = \left( \frac{f(x)-f(c)}{x-c} \right) (x-c) . Therefore
the limit of f(x)-f(c) exists and \lim_{x\to c} \bigl( f(x)-f(c) \bigr) = \left(\lim_{x\to c} \frac{f(x)-f(c)}{x-c} \right) \left(\lim_{x\to c} (x-c) \right) = f'(c) \cdot 0 = 0. Hence \lim\limits_{x\to c}
f(x) = f(c), and f is continuous at c.
An important property of the derivative is linearity. The derivative is the approximation of a function by a straight line. The slope of a line through two points changes linearly when the y-
coordinates are changed linearly. By taking the limit, it makes sense that the derivative is linear.
Let I be an interval, let f \colon I \to {\mathbb{R}} and g \colon I \to {\mathbb{R}} be differentiable at c \in I, and let \alpha \in {\mathbb{R}}.
i. Define h \colon I \to {\mathbb{R}} by h(x) := \alpha f(x). Then h is differentiable at c and h'(c) = \alpha f'(c).
ii. Define h \colon I \to {\mathbb{R}} by h(x) := f(x) + g(x). Then h is differentiable at c and h'(c) = f'(c) + g'(c).
First, let h(x) := \alpha f(x). For x \in I, x \not= c we have \frac{h(x)-h(c)}{x-c} = \frac{\alpha f(x) - \alpha f(c)}{x-c} = \alpha \frac{f(x) - f(c)}{x-c} . The limit as x goes to c exists on the right
by . We get \lim_{x\to c}\frac{h(x)-h(c)}{x-c} = \alpha \lim_{x\to c} \frac{f(x) - f(c)}{x-c} . Therefore h is differentiable at c, and the derivative is computed as given.
Next, define h(x) := f(x)+g(x). For x \in I, x \not= c we have \frac{h(x)-h(c)}{x-c} = \frac{\bigl(f(x) + g(x)\bigr) - \bigl(f(c) + g(c)\bigr)}{x-c} = \frac{f(x) - f(c)}{x-c} + \frac{g(x) - g(c)}{x-c}
. The limit as x goes to c exists on the right by . We get \lim_{x\to c}\frac{h(x)-h(c)}{x-c} = \lim_{x\to c} \frac{f(x) - f(c)}{x-c} + \lim_{x\to c}\frac{g(x) - g(c)}{x-c} . Therefore h is
differentiable at c and the derivative is computed as given.
It is not true that the derivative of a product of two functions is the product of the derivatives. Instead we get the so-called product rule or the Leibniz rule.
Let I be an interval, let f \colon I \to {\mathbb{R}} and g \colon I \to {\mathbb{R}} be functions differentiable at c. If h \colon I \to {\mathbb{R}} is defined by h(x) := f(x) g(x) , then h is
differentiable at c and h'(c) = f(c) g'(c) + f'(c) g(c) .
The proof of the product rule is left as an exercise. The key is to use the identity f(x) g(x) - f(c) g(c) = f(x)\bigl( g(x) - g(c) \bigr) + g(c) \bigl( f(x) - f(c) \bigr).
Let I be an interval, let f \colon I \to {\mathbb{R}} and g \colon I \to {\mathbb{R}} be differentiable at c and g(x) \not= 0 for all x \in I. If h \colon I \to {\mathbb{R}} is defined by h(x) :=
\frac{f(x)}{g(x)}, then h is differentiable at c and h'(c) = \frac{f'(c) g(c) - f(c) g'(c)}{{\bigl(g(c)\bigr)}^2} .
Again the proof is left as an exercise.
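The product and quotient rules can be sanity-checked numerically. The following Python sketch is only an illustration and proves nothing. It assumes the standard facts that the derivative of sin is cos and that exp is its own derivative. The helper diff_quotient, the step size, and the point c = 0.7 are choices made for this example and are not part of the text. The helper uses a symmetric difference quotient, which is not the quotient from the definition but approximates the same limit when the derivative exists.

```python
import math

def diff_quotient(h, c, step=1e-6):
    """Symmetric difference quotient approximating h'(c) (hypothetical helper)."""
    return (h(c + step) - h(c - step)) / (2 * step)

# Illustrative choices (assumptions, not from the text).
f, df = math.sin, math.cos
g, dg = math.exp, math.exp   # g(x) = e^x is never zero, as the quotient rule requires
c = 0.7

# Values predicted by the product rule and the quotient rule.
product_rule = f(c) * dg(c) + df(c) * g(c)
quotient_rule = (df(c) * g(c) - f(c) * dg(c)) / g(c) ** 2

# Numerical approximations of the same derivatives.
product_numeric = diff_quotient(lambda x: f(x) * g(x), c)
quotient_numeric = diff_quotient(lambda x: f(x) / g(x), c)

print(abs(product_rule - product_numeric))
print(abs(quotient_rule - quotient_numeric))
```

Both printed differences should be tiny compared with the derivative values themselves.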
Chain rule
A useful rule for computing derivatives is the chain rule.
Exercises
Prove the product rule. Hint: Use f(x) g(x) - f(c) g(c) = f(x)\bigl( g(x) - g(c) \bigr) + g(c) \bigl( f(x) - f(c) \bigr).
Prove the quotient rule. Hint: You can do this directly, but it may be easier to find the derivative of \nicefrac{1}{x} and then use the chain rule and the product rule.
[exercise:diffofxn] For n \in {\mathbb{Z}}, prove that x^n is differentiable and find the derivative, unless, of course, n < 0 and x=0. Hint: Use the product rule.
Prove that a polynomial is differentiable and find the derivative. Hint: Use the previous exercise.
Define f \colon {\mathbb{R}}\to {\mathbb{R}} by f(x) := \begin{cases} x^2 & \text{ if $x \in {\mathbb{Q}}$,}\\ 0 & \text{ otherwise.} \end{cases} Prove that f is differentiable at 0, but
discontinuous at all points except 0.
Assume the inequality \left\lvert {x-\sin(x)} \right\rvert \leq x^2. Prove that sin is differentiable at 0, and find the derivative at 0.
Using the previous exercise, prove that sin is differentiable at all x and that the derivative is \cos(x). Hint: Use the sum-to-product trigonometric identity as we did before.
Let f \colon I \to {\mathbb{R}} be differentiable. Given n \in {\mathbb{Z}}, let f^n be the function defined by f^n(x) := {\bigl( f(x) \bigr)}^n. If n < 0 assume f(x) \not= 0. Prove that
(f^n)'(x) = n {\bigl(f(x) \bigr)}^{n-1} f'(x).
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is a differentiable Lipschitz continuous function. Prove that f' is a bounded function.
Let I_1, I_2 be intervals. Let f \colon I_1 \to I_2 be a bijective function and g \colon I_2 \to I_1 be the inverse. Suppose that f is differentiable at c \in I_1 with f'(c) \not=0, and that g is
differentiable at f(c). Use the chain rule to find a formula for g'\bigl(f(c)\bigr) (in terms of f'(c)).
[exercise:bndmuldiff] Suppose f \colon I \to {\mathbb{R}} is a bounded function and g \colon I \to {\mathbb{R}} is a function differentiable at c \in I and g(c) = g'(c) = 0. Show that h(x) :=
f(x) g(x) is differentiable at c. Hint: Note that you cannot apply the product rule.
[exercise:diffsqueeze] Suppose f \colon I \to {\mathbb{R}}, g \colon I \to {\mathbb{R}}, and h \colon I \to {\mathbb{R}}, are functions. Suppose c \in I is such that f(c) = g(c) = h(c), g and h
are differentiable at c, and g'(c) = h'(c). Furthermore suppose h(x) \leq f(x) \leq g(x) for all x \in I. Prove f is differentiable at c and f'(c) = g'(c) = h'(c).
Proof: It is easy to see from the definition that f has an absolute minimum at 0: we know f(x) \geq 0 for all x and f(0) = 0.
The function f is differentiable for x\not=0 and the derivative is 2 \sin (\nicefrac{1}{x}) \bigl( x \sin (\nicefrac{1}{x}) - \cos(\nicefrac{1}{x}) \bigr). As an exercise show that for x_n =
\frac{4}{(8n+1)\pi} we have \lim\, f'(x_n) = -1, and for y_n = \frac{4}{(8n+3)\pi} we have \lim\, f'(y_n) = 1. Hence if f' exists at 0, then it cannot be continuous.
Let us show that f' exists at 0. We claim that the derivative is zero. In other words \left\lvert {\frac{f(x)-f(0)}{x-0} - 0} \right\rvert goes to zero as x goes to zero. For x \not= 0 we have
\left\lvert {\frac{f(x)-f(0)}{x-0} - 0} \right\rvert = \left\lvert {\frac{x^2 \sin^2(\nicefrac{1}{x})}{x}} \right\rvert = \left\lvert {x \sin^2(\nicefrac{1}{x})} \right\rvert \leq \left\lvert {x}
\right\rvert . And, of course, as x tends to zero, then \left\lvert {x} \right\rvert tends to zero and hence \left\lvert {\frac{f(x)-f(0)}{x-0} - 0} \right\rvert goes to zero. Therefore, f is differentiable
at 0 and the derivative at 0 is 0. A key point in the above calculation is that \left\lvert {f(x)} \right\rvert \leq x^2; see also Exercises [exercise:bndmuldiff] and [exercise:diffsqueeze].
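The behavior of f' near 0 can be observed numerically. The sketch below takes f(x) = x^2 \sin^2(\nicefrac{1}{x}) for x \not= 0, as in the computation above, and evaluates the stated formula for f' along the sequences x_n and y_n. The function names f_prime, x_n, and y_n are chosen here for illustration only.

```python
import math

def f_prime(x):
    """f'(x) = 2 sin(1/x) (x sin(1/x) - cos(1/x)) for x != 0, as stated in the text."""
    s, c = math.sin(1 / x), math.cos(1 / x)
    return 2 * s * (x * s - c)

def x_n(n):
    return 4 / ((8 * n + 1) * math.pi)

def y_n(n):
    return 4 / ((8 * n + 3) * math.pi)

# Both sequences tend to 0, yet f' tends to different values along them.
for n in (1, 10, 100, 1000):
    print(n, f_prime(x_n(n)), f_prime(y_n(n)))
```

The second column should approach -1 and the third should approach 1, which is consistent with f' having no limit at 0 even though f'(0) = 0 exists.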
It is sometimes useful to assume the derivative of a differentiable function is continuous. If f \colon I \to {\mathbb{R}} is differentiable and the derivative f' is continuous on I, then we say f is
continuously differentiable. It is common to write C^1(I) for the set of continuously differentiable functions on I.
Exercises
Finish the proof of .
Finish the proof of .
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is a differentiable function such that f' is a bounded function. Prove f is a Lipschitz continuous function.
Suppose f \colon [a,b] \to {\mathbb{R}} is differentiable and c \in [a,b]. Then show there exists a sequence \{ x_n \} converging to c, x_n \not= c for all n, such that f'(c) = \lim_{n\to \infty}
f'(x_n). Do note this does not imply that f' is continuous (why?).
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is a function such that \left\lvert {f(x)-f(y)} \right\rvert \leq \left\lvert {x-y} \right\rvert^2 for all x and y. Show that f(x) = C for some
constant C. Hint: Show that f is differentiable at all points and compute the derivative.
[exercise:posderincr] Suppose I is an interval and f \colon I \to {\mathbb{R}} is a differentiable function. If f'(x) > 0 for all x \in I, show that f is strictly increasing.
Suppose f \colon (a,b) \to {\mathbb{R}} is a differentiable function such that f'(x) \not= 0 for all x \in (a,b). Suppose there exists a point c \in (a,b) such that f'(c) > 0. Prove f'(x) > 0 for all x \in
(a,b).
[exercise:samediffconst] Suppose f \colon (a,b) \to {\mathbb{R}} and g \colon (a,b) \to {\mathbb{R}} are differentiable functions such that f'(x) = g'(x) for all x \in (a,b). Show that there
exists a constant C such that f(x) = g(x) + C.
Prove the following version of L’Hopital’s rule. Suppose f \colon (a,b) \to {\mathbb{R}} and g \colon (a,b) \to {\mathbb{R}} are differentiable functions. Suppose that at c \in (a,b), f(c) = 0,
g(c)=0, and that the limit of \nicefrac{f'(x)}{g'(x)} as x goes to c exists. Show that \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)} .
Let f \colon (a,b) \to {\mathbb{R}} be an unbounded differentiable function. Show f' \colon (a,b) \to {\mathbb{R}} is unbounded.
Prove the theorem Rolle actually proved in 1691: If f is a polynomial, f'(a) = f'(b) = 0 for some a < b, and there is no c \in (a,b) such that f'(c) = 0, then there is at most one root of f in (a,b),
that is at most one x \in (a,b) such that f(x) = 0. In other words, between any two consecutive roots of f' is at most one root of f. Hint: suppose there are two roots and see what happens.
Taylor’s theorem
Note: half a lecture (optional section)
Derivatives of higher orders
When f \colon I \to {\mathbb{R}} is differentiable, we obtain a function f' \colon I \to {\mathbb{R}}. The function f' is called the first derivative of f. If f' is differentiable, we denote by f''
\colon I \to {\mathbb{R}} the derivative of f'. The function f'' is called the second derivative of f. We similarly obtain f''', f'''', and so on. With a larger number of derivatives the notation would
get out of hand; we denote by f^{(n)} the nth derivative of f.
When f possesses n derivatives, we say f is n times differentiable.
Taylor’s theorem
Taylor’s theorem is a generalization of the mean value theorem. The mean value theorem says that up to a small error f(x) for x near x_0 can be approximated by f(x_0), that is f(x) = f(x_0) + f'(c)(x-x_0), where the
“error” is measured in terms of the first derivative at some point c between x and x_0. Taylor’s theorem generalizes this result to higher derivatives. It tells us that up to a small error, any n
times differentiable function can be approximated at a point x_0 by a polynomial. The error of this approximation behaves like {(x-x_0)}^{n} near the point x_0. To see why this is a good
approximation notice that for a big n, {(x-x_0)}^n is very small in a small interval around x_0.
For an n times differentiable function f defined near a point x_0 \in {\mathbb{R}}, define the nth Taylor polynomial for f at x_0 as \begin{split} P_n^{x_0}(x) & := \sum_{k=0}^n
\frac{f^{(k)}(x_0)}{k!}{(x-x_0)}^k \\ & = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2}{(x-x_0)}^2 + \frac{f^{(3)}(x_0)}{6}{(x-x_0)}^3 + \cdots + \frac{f^{(n)}(x_0)}{n!}{(x-x_0)}^n .
\end{split}
Taylor’s theorem says a function behaves like its nth Taylor polynomial. The mean value theorem is really Taylor’s theorem for the first derivative.
[thm:taylor] Suppose f \colon [a,b] \to {\mathbb{R}} is a function with n continuous derivatives on [a,b] and such that f^{(n+1)} exists on (a,b). Given distinct points x_0 and x in [a,b], we can
find a point c between x_0 and x such that f(x)=P_{n}^{x_0}(x)+\frac{f^{(n+1)}(c)}{(n+1)!}{(x-x_0)}^{n+1} .
The term R_n^{x_0}(x):=\frac{f^{(n+1)}(c)}{(n+1)!}{(x-x_0)}^{n+1} is called the remainder term. This form of the remainder term is called the Lagrange form of the remainder. There are
other ways to write the remainder term, but we skip those. Note that c depends on both x and x_0.
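As an illustration of the Lagrange form of the remainder, the following Python sketch evaluates Taylor polynomials of the exponential function at x_0 = 0 and compares the actual error with a bound obtained from the remainder term. It assumes the standard fact that every derivative of e^x is e^x, so that \left\lvert {f^{(n+1)}(c)} \right\rvert \leq e^{\max(x,0)} for c between 0 and x. The function names taylor_poly and lagrange_bound_exp are hypothetical.

```python
import math

def taylor_poly(derivs_at_x0, x0, x):
    """Evaluate sum_{k=0}^n f^(k)(x0)/k! * (x - x0)^k from the list of derivatives at x0."""
    return sum(d / math.factorial(k) * (x - x0) ** k
               for k, d in enumerate(derivs_at_x0))

def lagrange_bound_exp(n, x):
    """Bound on |R_n^0(x)| for f = exp, using |f^(n+1)(c)| <= e^max(x, 0)."""
    return math.exp(max(x, 0.0)) * abs(x) ** (n + 1) / math.factorial(n + 1)

x = 0.5
for n in range(1, 7):
    approx = taylor_poly([1.0] * (n + 1), 0.0, x)  # every derivative of exp at 0 is 1
    error = abs(math.exp(x) - approx)
    print(n, approx, error, lagrange_bound_exp(n, x))
```

In each row the actual error should not exceed the bound, and both shrink quickly as n grows.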
Find a number M_{x,x_0} (depending on x and x_0) solving the equation f(x)=P_{n}^{x_0}(x)+M_{x,x_0}{(x-x_0)}^{n+1} . Define a function g(s) by g(s) := f(s)-P_n^{x_0}(s)-M_{x,x_0}
{(s-x_0)}^{n+1} . We compute the kth derivative at x_0 of the Taylor polynomial {(P_n^{x_0})}^{(k)}(x_0) = f^{(k)}(x_0) for k=0,1,2,\ldots,n (the zeroth derivative corresponds to the
function itself). Therefore, g(x_0) = g'(x_0) = g''(x_0) = \cdots = g^{(n)}(x_0) = 0 . In particular g(x_0) = 0. On the other hand g(x) = 0. By the mean value theorem there exists an x_1 between x_0 and x such that
g'(x_1) = 0. As g'(x_0) = 0 = g'(x_1), applying the mean value theorem to g' we obtain that there exists x_2 between x_0 and x_1 (and therefore between x_0 and x) such that g''(x_2) = 0. We repeat the argument n+1 times to obtain a
number x_{n+1} between x_0 and x_n (and therefore between x_0 and x) such that g^{(n+1)}(x_{n+1}) = 0.
Let c:=x_{n+1}. We compute the (n+1)th derivative of g to find g^{(n+1)}(s) = f^{(n+1)}(s)-(n+1)!\,M_{x,x_0} . Plugging in c for s we obtain M_{x,x_0} = \frac{f^{(n+1)}(c)}{(n+1)!}, and
we are done.
In the proof we have computed {(P_n^{x_0})}^{(k)}(x_0) = f^{(k)}(x_0) for k=0,1,2,\ldots,n. Therefore the Taylor polynomial has the same derivatives as f at x_0 up to the nth derivative.
That is why the Taylor polynomial is a good approximation to f.
The definition of derivative says that a function is differentiable if it is locally approximated by a line. Similarly we mention in passing that there exists a converse to Taylor’s theorem, which
we will neither state nor prove, saying that if a function is locally approximated in a certain way by a polynomial of degree d, then it has d derivatives.
Exercises
Compute the nth Taylor Polynomial at 0 for the exponential function.
Suppose p is a polynomial of degree d. Given any x_0 \in {\mathbb{R}}, show that the (d+1)th Taylor polynomial for p at x_0 is equal to p.
Let f(x) := \left\lvert {x} \right\rvert^3. Compute f'(x) and f''(x) for all x, but show that f^{(3)}(0) does not exist.
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} has n continuous derivatives. Show that for any x_0 \in {\mathbb{R}}, there exist polynomials P and Q of degree n and an \epsilon > 0 such
that P(x) \leq f(x) \leq Q(x) for all x \in [x_0-\epsilon,x_0+\epsilon] and Q(x)-P(x) = \lambda {(x-x_0)}^n for some \lambda \geq 0.
If f \colon [a,b] \to {\mathbb{R}} has n+1 continuous derivatives and x_0 \in [a,b], prove \lim\limits_{x\to x_0} \frac{R_n^{x_0}(x)}{{(x-x_0)}^n} = 0.
Suppose f \colon [a,b] \to {\mathbb{R}} has n+1 continuous derivatives and x_0 \in (a,b). Show that f^{(k)}(x_0) = 0 for all k = 0, 1, 2, \ldots, n if and only if g(x) := \frac{f(x)}{{(x-x_0)}^{n+1}}
is continuous at x_0.
Suppose a,b,c \in {\mathbb{R}} and f \colon {\mathbb{R}}\to {\mathbb{R}} is differentiable, f''(x) = a for all x, f'(0) = b, and f(0) = c. Find f and prove that it is the unique differentiable
function with this property.
Show that a simple converse to Taylor’s theorem does not hold. Find a function f \colon {\mathbb{R}}\to {\mathbb{R}} with no second derivative at x=0 such that \left\lvert {f(x)} \right\rvert
\leq \left\lvert {x^3} \right\rvert, that is, f goes to zero at 0 faster than x^3, and while f'(0) exists, f''(0) does not.
Exercises
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is continuously differentiable such that f'(x) > 0 for all x. Show that f is invertible on the interval J = f({\mathbb{R}}), the inverse is
continuously differentiable, and {(f^{-1})}'(y) > 0 for all y \in f({\mathbb{R}}).
Suppose I,J are intervals and a monotone onto f \colon I \to J has an inverse g \colon J \to I. Suppose you already know that both f and g are differentiable everywhere and f' is never zero. Using
chain rule but not prove the formula g'(y) = \nicefrac{1}{f'\bigl(g(y)\bigr)}.
Let n\in {\mathbb{N}} be even. Prove that every x > 0 has a unique negative nth root. That is, there exists a negative number y such that y^n = x. Compute the derivative of the function g(x) :=
y.
[exercise:oddroot] Let n \in {\mathbb{N}} be odd and n \geq 3. Prove that every x has a unique nth root. That is, there exists a number y such that y^n = x. Prove that the function defined by
g(x) := y is differentiable except at x=0 and compute the derivative. Prove that g is not differentiable at x=0.
Show that if in the inverse function theorem f has k continuous derivatives, then the inverse function g also has k continuous derivatives.
Let f(x) := x + 2 x^2 \sin(\nicefrac{1}{x}) for x \not= 0 and f(0) = 0. Show that f is differentiable at all x, that f'(0) > 0, but that f is not invertible on any interval containing the origin.
a) Let f \colon {\mathbb{R}}\to {\mathbb{R}} be a continuously differentiable function and k > 0 be a number such that f'(x) \geq k for all x \in {\mathbb{R}}. Show f is one-to-one and onto,
and has a continuously differentiable inverse f^{-1} \colon {\mathbb{R}}\to {\mathbb{R}}. b) Find an example f \colon {\mathbb{R}}\to {\mathbb{R}} where f'(x) > 0 for all x, but f is not
onto.
Suppose I,J are intervals and a monotone onto f \colon I \to J has an inverse g \colon J \to I. Suppose x \in I and y := f(x) \in J, and that g is differentiable at y. Prove:
a) If g'(y) \not= 0, then f is differentiable at x.
b) If g'(y) = 0, then f is not differentiable at x.
Exercises
Let f \colon [0,1] \to {\mathbb{R}} be defined by f(x) := x^3 and let P := \{ 0, 0.1, 0.4, 1 \}. Compute L(P,f) and U(P,f).
Let f \colon [0,1] \to {\mathbb{R}} be defined by f(x) := x. Show that f \in {\mathcal{R}}[0,1] and compute \int_0^1 f using the definition of the integral (but feel free to use the propositions
of this section).
Let f \colon [a,b] \to {\mathbb{R}} be a bounded function. Suppose there exists a sequence of partitions \{ P_k \} of [a,b] such that \lim_{k \to \infty} \bigl( U(P_k,f) - L(P_k,f) \bigr) = 0 .
Show that f is Riemann integrable and that \int_a^b f = \lim_{k \to \infty} U(P_k,f) = \lim_{k \to \infty} L(P_k,f) .
Finish the proof of .
Exercises
Let f be in {\mathcal{R}}[a,b]. Prove that -f is in {\mathcal{R}}[a,b] and \int_a^b - f(x) ~dx = - \int_a^b f(x) ~dx .
Let f and g be in {\mathcal{R}}[a,b]. Prove that f+g is in {\mathcal{R}}[a,b] and \int_a^b \bigl( f(x)+g(x) \bigr) ~dx = \int_a^b f(x) ~dx + \int_a^b g(x) ~dx . Hint: Use to find a single
partition P such that U(P,f)-L(P,f) < \nicefrac{\epsilon}{2} and U(P,g)-L(P,g) < \nicefrac{\epsilon}{2}.
Let f \colon [a,b] \to {\mathbb{R}} be Riemann integrable. Let g \colon [a,b] \to {\mathbb{R}} be a function such that f(x) = g(x) for all x \in (a,b). Prove that g is Riemann integrable and that
\int_a^b g = \int_a^b f.
Prove the mean value theorem for integrals. That is, prove that if f \colon [a,b] \to {\mathbb{R}} is continuous, then there exists a c \in [a,b] such that \int_a^b f = f(c)(b-a).
Suppose f \colon [a,b] \to {\mathbb{R}} is a continuous function such that f(x) \geq 0 for all x \in [a,b] and \int_a^b f = 0. Prove that f(x) = 0 for all x.
Suppose f \colon [a,b] \to {\mathbb{R}} is a continuous function and \int_a^b f = 0. Prove that there exists a c \in [a,b] such that f(c) = 0 (compare with the previous exercise).
Suppose f \colon [a,b] \to {\mathbb{R}} and g \colon [a,b] \to {\mathbb{R}} are continuous functions such that \int_a^b f = \int_a^b g. Show that there exists a c \in [a,b] such that f(c) = g(c).
Let f \in {\mathcal{R}}[a,b]. Let \alpha, \beta, \gamma be arbitrary numbers in [a,b] (not necessarily ordered in any way). Prove \int_\alpha^\gamma f = \int_\alpha^\beta f + \int_\beta^\gamma
f . Recall what \int_a^b f means if b \leq a.
Prove .
[exercise:easyabsint] Suppose f \colon [a,b] \to {\mathbb{R}} is bounded and has finitely many discontinuities. Show that as a function of x the expression \left\lvert {f(x)} \right\rvert is
bounded with finitely many discontinuities and is thus Riemann integrable. Then show \left\lvert {\int_a^b f(x)~dx} \right\rvert \leq \int_a^b \left\lvert {f(x)} \right\rvert~dx .
Show that the Thomae or popcorn function (see ) is Riemann integrable. Therefore, there exists a function discontinuous at all rational numbers (a dense set) that is Riemann integrable.
In particular, define f \colon [0,1] \to {\mathbb{R}} by f(x) := \begin{cases} \nicefrac{1}{k} & \text{ if $x=\nicefrac{m}{k}$ where $m,k \in {\mathbb{N}}$ and $m$ and $k$ have no
common divisors,} \\ 0 & \text{ if $x$ is irrational}. \end{cases} Show \int_0^1 f = 0.
If I \subset {\mathbb{R}} is a bounded interval, then the function \varphi_I(x) := \begin{cases} 1 & \text{if $x \in I$,} \\ 0 & \text{otherwise,} \end{cases} is called an elementary step
function.
Let I be an arbitrary bounded interval (you should consider all types of intervals: closed, open, half-open) and let a < b. Using only the definition of the integral, show that the elementary step
function \varphi_I is integrable on [a,b], and find the integral in terms of a, b, and the endpoints of I.
Exercises
Compute \displaystyle \frac{d}{dx} \biggl( \int_{-x}^x e^{s^2}~ds \biggr).
Compute \displaystyle \frac{d}{dx} \biggl( \int_{0}^{x^2} \sin(s^2)~ds \biggr).
Suppose F \colon [a,b] \to {\mathbb{R}} is continuous and differentiable on [a,b] \setminus S, where S is a finite set. Suppose there exists an f \in {\mathcal{R}}[a,b] such that f(x) = F'(x) for
x \in [a,b] \setminus S. Show that \int_a^b f = F(b)-F(a).
[secondftc:exercise] Let f \colon [a,b] \to {\mathbb{R}} be a continuous function. Let c \in [a,b] be arbitrary. Define F(x) := \int_c^x f . Prove that F is differentiable and that F'(x) = f(x) for all
x \in [a,b].
Prove integration by parts. That is, suppose F and G are continuously differentiable functions on [a,b]. Then prove \int_a^b F(x)G'(x)~dx = F(b)G(b)-F(a)G(a) - \int_a^b F'(x)G(x)~dx .
Suppose F and G are continuously differentiable functions defined on [a,b] such that F'(x) = G'(x) for all x \in [a,b]. Using the fundamental theorem of calculus, show that F and G differ by a
constant. That is, show that there exists a C \in {\mathbb{R}} such that F(x)-G(x) = C.
The next exercise shows how we can use the integral to “smooth out” a non-differentiable function.
[exercise:smoothingout] Let f \colon [a,b] \to {\mathbb{R}} be a continuous function. Let \epsilon > 0 be a constant. For x \in [a+\epsilon,b-\epsilon], define g(x) := \frac{1}{2\epsilon}
\int_{x-\epsilon}^{x+\epsilon} f . a) Show that g is differentiable and find the derivative.
b) Let f be differentiable and fix x \in (a,b) (let \epsilon be small enough). What happens to g'(x) as \epsilon gets smaller?
c) Find g for f(x) := \left\lvert {x} \right\rvert, \epsilon = 1 (you can assume [a,b] is large enough).
Suppose f \colon [a,b] \to {\mathbb{R}} is continuous and \int_a^x f = \int_x^b f for all x \in [a,b]. Show that f(x) = 0 for all x \in [a,b].
Suppose f \colon [a,b] \to {\mathbb{R}} is continuous and \int_a^x f = 0 for all rational x in [a,b]. Show that f(x) = 0 for all x \in [a,b].
A function f is an odd function if f(x) = -f(-x), and f is an even function if f(x) = f(-x). Let a > 0. Assume f is continuous. Prove: a) If f is odd, then \int_{-a}^a f = 0. b) If f is even, then \int_{-
a}^a f = 2 \int_0^a f.
a) Show that f(x) := \sin(\nicefrac{1}{x}) is integrable on any interval (you can define f(0) to be anything). b) Compute \int_{-1}^1 \sin(\nicefrac{1}{x})\,dx. (Mind the discontinuity)
a) Suppose f \colon [a,b] \to {\mathbb{R}} is increasing, by , f is Riemann integrable. Suppose f has a discontinuity at c \in (a,b), show that F(x) := \int_a^x f is not differentiable at c.
b) In , you have constructed an increasing function f \colon [0,1] \to {\mathbb{R}} that is discontinuous at every x \in [0,1] \cap {\mathbb{Q}}. Use this f to construct a function F(x) that is
continuous on [0,1], but not differentiable at every x \in [0,1] \cap {\mathbb{Q}}.
Exercises
Let y be any real number and b > 0. Define f \colon (0,\infty) \to {\mathbb{R}} and g \colon {\mathbb{R}}\to {\mathbb{R}} as, f(x) := x^y and g(x) := b^x. Show that f and g are
differentiable and find their derivative.
Let b > 0 be given.
a) Show that for every y > 0, there exists a unique number x such that y = b^x. Define the logarithm base b, \log_b \colon (0,\infty) \to {\mathbb{R}}, by \log_b(y) := x.
b) Show that \log_b(x) = \frac{\ln(x)}{\ln(b)}.
c) Prove that if c > 0, then \log_b(x) = \frac{\log_c(x)}{\log_c(b)}.
d) Prove \log_b(xy) = \log_b(x)+\log_b(y), and \log_b(x^y) = y \log_b(x).
Use to study the remainder term and show that for all x \in {\mathbb{R}} e^x = \sum_{n=0}^\infty \frac{x^n}{n!} . Hint: Do not differentiate the series term by term (unless you would
prove that it works).
Use the geometric sum formula to show (for t\not= -1) \[1-t+t^2-\cdots+{(-1)}^n t^n = \frac{1}{1+t} - \frac{{(-1)}^{n+1} t^{n+1}}{1+t} .\]
Improper integrals
Note: 2–3 lectures (optional section, can safely be skipped, requires the optional )
Often it is necessary to integrate over the entire real line, or an infinite interval of the form [a,\infty) or (-\infty,b]. Also, we may wish to integrate functions defined on a finite interval (a,b) but
not bounded. Such functions are not Riemann integrable, but we may want to write down the integral anyway in the spirit of . These integrals are called improper integrals, and are limits of
integrals rather than integrals themselves.
Suppose f \colon [a,b) \to {\mathbb{R}} is a function (not necessarily bounded) that is Riemann integrable on [a,c] for all c < b. We define \int_a^b f := \lim_{c \to b^-} \int_a^{c} f , if the
limit exists.
Suppose f \colon [a,\infty) \to {\mathbb{R}} is a function such that f is Riemann integrable on [a,c] for all c < \infty. We define \int_a^\infty f := \lim_{c \to \infty} \int_a^c f , if the limit exists.
If the limit exists, we say the improper integral converges. If the limit does not exist, we say the improper integral diverges.
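To see the definition at work, the following Python sketch approximates \int_1^c f for growing c. The two integrands, \nicefrac{1}{x^2} and \nicefrac{1}{x}, and the midpoint-rule helper midpoint_integral are choices made for this illustration and do not come from the text. The midpoint rule only approximates each proper integral, and a numerical experiment cannot by itself establish convergence or divergence.

```python
def midpoint_integral(f, a, c, pieces=100_000):
    """Midpoint-rule approximation of the proper integral of f over [a, c]."""
    width = (c - a) / pieces
    return width * sum(f(a + (i + 0.5) * width) for i in range(pieces))

# The improper integral is the limit of these proper integrals as c grows.
for c in (10.0, 100.0, 1000.0):
    converging = midpoint_integral(lambda x: 1 / x**2, 1.0, c)
    diverging = midpoint_integral(lambda x: 1 / x, 1.0, c)
    print(c, converging, diverging)
```

The values for \nicefrac{1}{x^2} should settle near 1, matching the exact value 1 - \nicefrac{1}{c} of \int_1^c x^{-2}~dx, while the values for \nicefrac{1}{x} keep growing like \ln(c).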
We similarly define improper integrals for the left hand endpoint; we leave this to the reader.
For a finite endpoint b, using we see that if f is bounded, then we have defined nothing new. What is new is that we can apply this definition to unbounded functions. The following set of
examples is so useful that we state it as a proposition.
Exercises
Finish the proof of .
Find out for which a \in {\mathbb{R}} does \sum\limits_{n=1}^\infty e^{an} converge. When the series converges, find an upper bound for the sum.
a) Estimate \sum\limits_{n=1}^\infty \frac{1}{n(n+1)} correct to within 0.01 using the integral test. b) Compute the limit of the series exactly and compare. Hint: the sum telescopes.
Prove \int_{-\infty}^\infty \left\lvert {\operatorname{sinc}(x)} \right\rvert~dx = \infty . Hint: again, it is enough to show this on just one side.
Can you interpret \int_{-1}^1 \frac{1}{\sqrt{\left\lvert {x} \right\rvert}}~dx as an improper integral? If so, compute its value.
Take f \colon [0,\infty) \to {\mathbb{R}}, Riemann integrable on every interval [0,b], and such that there exist M, a, and T, such that \left\lvert {f(t)} \right\rvert \leq M e^{at} for all t \geq T.
Show that the Laplace transform of f exists. That is, for every s > a the following integral converges: F(s) := \int_{0}^\infty f(t) e^{-st} ~dt .
Let f \colon {\mathbb{R}}\to {\mathbb{R}} be a Riemann integrable function on every interval [a,b], and such that \int_{-\infty}^\infty \left\lvert {f(x)} \right\rvert~dx < \infty. Show that the
Fourier sine and cosine transforms exist. That is, for every \omega \geq 0 the following integrals converge F^s(\omega) := \frac{1}{\pi} \int_{-\infty}^\infty f(t) \sin(\omega t) ~dt , \qquad
F^c(\omega) := \frac{1}{\pi} \int_{-\infty}^\infty f(t) \cos(\omega t) ~dt . Furthermore, show that F^s and F^c are bounded functions.
Suppose f \colon [0,\infty) \to {\mathbb{R}} is Riemann integrable on every interval [0,b]. Show that \int_0^\infty f converges if and only if for every \epsilon > 0 there exists an M such that if
M \leq a < b then \left\lvert {\int_a^b f} \right\rvert < \epsilon.
Suppose f \colon [0,\infty) \to {\mathbb{R}} is nonnegative and decreasing. a) Show that if \int_0^\infty f < \infty, then \lim\limits_{x\to\infty} f(x) = 0. b) Show that the converse does not
hold.
Find an example of an unbounded continuous function f \colon [0,\infty) \to {\mathbb{R}} that is nonnegative and such that \int_0^\infty f < \infty. Note that this means that \lim_{x\to\infty}
f(x) does not exist; compare previous exercise. Hint: on each interval [k,k+1], k \in {\mathbb{N}}, define a function whose integral over this interval is less than say 2^{-k}.
Find an example of a function f \colon [0,\infty) \to {\mathbb{R}} integrable on all intervals such that \lim_{n\to\infty} \int_0^n f converges as a limit of a sequence, but such that \int_0^\infty
f does not exist. Hint: for all n\in {\mathbb{N}}, divide [n,n+1] into two halves. In one half make the function negative, on the other make the function positive.
Show that if f \colon [1,\infty) \to {\mathbb{R}} is such that g(x) := x^2 f(x) is a bounded function, then \int_1^\infty f converges.
Sequences of Functions
Pointwise and uniform convergence
Note: 1–1.5 lecture
Up till now when we talked about sequences we always talked about sequences of numbers. However, a very useful concept in analysis is to use a sequence of functions. For example, a
solution to some differential equation might be found by finding only approximate solutions. Then the real solution is some sort of limit of those approximate solutions.
When talking about sequences of functions, the tricky part is that there are multiple notions of a limit. Let us describe two common notions of a limit of a sequence of functions.
Pointwise convergence
For every n \in {\mathbb{N}} let f_n \colon S \to {\mathbb{R}} be a function. We say the sequence \{ f_n \}_{n=1}^\infty converges pointwise to f \colon S \to {\mathbb{R}}, if for every x
\in S we have f(x) = \lim_{n\to\infty} f_n(x) .
It is common to say that f_n \colon S \to {\mathbb{R}} converges to f on T \subset {\mathbb{R}} for some f \colon T \to {\mathbb{R}}. In that case we, of course, mean f(x) = \lim\, f_n(x) for
every x \in T. We simply mean that the restrictions of f_n to T converge pointwise to f.
The sequence of functions defined by f_n(x) := x^{2n} converges to f \colon [-1,1] \to {\mathbb{R}} on [-1,1], where f(x) = \begin{cases} 1 & \text{if $x=-1$ or $x=1$,} \\ 0 &
\text{otherwise.} \end{cases} See .
To see this is so, first take x \in (-1,1). Then 0 \leq x^2 < 1. We have seen before that \left\lvert {x^{2n} - 0} \right\rvert = {(x^2)}^n \to 0 \quad \text{as} \quad n \to \infty . Therefore
\lim\,f_n(x) = 0.
When x = 1 or x=-1, then x^{2n} = 1 for all n and hence \lim\,f_n(x) = 1. We also note that \{ f_n(x) \} does not converge for any other x.
Often, functions are given as a series. In this case, we use the notion of pointwise convergence to find the values of the function.
We write \sum_{k=0}^\infty x^k to denote the limit of the functions f_n(x) := \sum_{k=0}^n x^k . When studying series, we have seen that on x \in (-1,1) the f_n converge pointwise to
\frac{1}{1-x} .
The subtle point here is that while \frac{1}{1-x} is defined for all x \not=1, and f_n are defined for all x (even at x=1), convergence only happens on (-1,1).
Therefore, when we write f(x) := \sum_{k=0}^\infty x^k we mean that f is defined on (-1,1) and is the pointwise limit of the partial sums.
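The partial sums of the geometric series can be tabulated directly. The Python sketch below is an illustration only. The function name partial_sum and the sample points are chosen here. It compares f_n(x) with \frac{1}{1-x} at a few points of (-1,1) and then shows what happens at x=-1 and x=2, where \frac{1}{1-x} is defined but the partial sums do not converge.

```python
def partial_sum(x, n):
    """f_n(x) = sum_{k=0}^n x^k, computed directly from the definition."""
    return sum(x**k for k in range(n + 1))

# Inside (-1, 1): the distance to 1/(1-x) shrinks as n grows.
for x in (-0.9, 0.5, 0.9):
    limit = 1 / (1 - x)
    print(x, [abs(partial_sum(x, n) - limit) for n in (10, 100, 1000)])

# Outside (-1, 1) the partial sums do not settle down.
print([partial_sum(-1.0, n) for n in range(6)])   # alternates between 1 and 0
print([partial_sum(2.0, n) for n in range(6)])    # grows without bound
```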
Let f_n(x) := \sin(xn). Then f_n does not converge pointwise to any function on any interval. It may converge at certain points, such as when x=0 or x=\pi. It is left as an exercise that in any
interval [a,b], there exists an x such that \sin(xn) does not have a limit as n goes to infinity.
Before we move to uniform convergence, let us reformulate pointwise convergence in a different way. We leave the proof to the reader; it is a simple application of the definition of
convergence of a sequence of real numbers.
[ptwsconv:prop] Let f_n \colon S \to {\mathbb{R}} and f \colon S \to {\mathbb{R}} be functions. Then \{ f_n \} converges pointwise to f if and only if for every x \in S, and every \epsilon >
0, there exists an N \in {\mathbb{N}} such that \left\lvert {f_n(x)-f(x)} \right\rvert < \epsilon for all n \geq N.
The key point here is that N can depend on x, not just on \epsilon. That is, for each x we can pick a different N. If we can pick one N for all x, we have what is called uniform convergence.
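The dependence of N on x can be made concrete for the sequence f_n(x) := x^{2n}, whose pointwise limit on (-1,1) is 0. The Python sketch below searches for the smallest admissible N at several points x. The function name smallest_N and the choice \epsilon = 0.01 are assumptions made for this illustration. Since x^{2n} decreases in n when \left\lvert {x} \right\rvert < 1, the first n with x^{2n} < \epsilon works for all larger n as well.

```python
def smallest_N(x, eps):
    """Smallest N >= 1 with x^(2n) < eps for all n >= N, assuming |x| < 1."""
    n = 1
    while x ** (2 * n) >= eps:
        n += 1
    return n

eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, smallest_N(x, eps))
```

The required N grows as x approaches 1, so no single N serves every x in (-1,1) for this \epsilon.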
Uniform convergence
Let f_n \colon S \to {\mathbb{R}} be functions. We say the sequence \{ f_n \} converges uniformly to f \colon S \to {\mathbb{R}}, if for every \epsilon > 0 there exists an N \in {\mathbb{N}}
such that for all n \geq N we have \left\lvert {f_n(x) - f(x)} \right\rvert < \epsilon \qquad \text{for all $x \in S$.}
Note that N now cannot depend on x. Given \epsilon > 0 we must find an N that works for all x \in S. Because of Proposition [ptwsconv:prop], we see that uniform convergence implies pointwise convergence.
Let \{ f_n \} be a sequence of functions f_n \colon S \to {\mathbb{R}}. If \{ f_n \} converges uniformly to f \colon S \to {\mathbb{R}}, then \{ f_n \} converges pointwise to f.
The converse does not hold.
The functions f_n(x) := x^{2n} do not converge uniformly on [-1,1], even though they converge pointwise. To see this, suppose for contradiction that the convergence is uniform. For \epsilon
:= \nicefrac{1}{2}, there would have to exist an N such that x^{2N} = \left\lvert {x^{2N} - 0} \right\rvert < \nicefrac{1}{2} for all x \in (-1,1) (as f_n(x) converges to 0 on (-1,1)). But that
means that for any sequence \{ x_k \} in (-1,1) such that \lim\, x_k = 1 we have x_k^{2N} < \nicefrac{1}{2} for all k. On the other hand x^{2N} is a continuous function of x (it is a
polynomial), therefore we obtain a contradiction 1 = 1^{2N} = \lim_{k\to\infty} x_k^{2N} \leq \nicefrac{1}{2} .
However, if we restrict our domain to [-a,a] where 0 < a < 1, then \{ f_n \} converges uniformly to 0 on [-a,a]. First note that a^{2n} \to 0 as n \to \infty. Thus given \epsilon > 0, pick N \in
{\mathbb{N}} such that a^{2n} < \epsilon for all n \geq N. Then for any x \in [-a,a] we have \left\lvert {x} \right\rvert \leq a. Therefore, for n \geq N, \left\lvert {x^{2n}} \right\rvert =
\left\lvert {x} \right\rvert^{2n} \leq a^{2n} < \epsilon .
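The contrast between [-a,a] and (-1,1) in this example can be checked with a short computation. The Python sketch below is only an illustration. The names sup_on_closed_interval and witness, and the value a = 0.9, are chosen here. On [-a,a] the largest value of x^{2n} is a^{2n}, attained at the endpoints. On (-1,1), for each n the point x = {(\nicefrac{1}{2})}^{1/(2n)} satisfies x^{2n} = \nicefrac{1}{2}, so the error is never below \nicefrac{1}{2} everywhere.

```python
def sup_on_closed_interval(n, a):
    """sup of |x^(2n)| over [-a, a]; attained at the endpoints x = a and x = -a."""
    return a ** (2 * n)

def witness(n):
    """A point x in (0, 1) with x^(2n) = 1/2, up to floating point rounding."""
    return 0.5 ** (1 / (2 * n))

for n in (1, 10, 100, 1000):
    x = witness(n)
    print(n, sup_on_closed_interval(n, 0.9), x, x ** (2 * n))
```

The second column tends to 0, as uniform convergence on [-a,a] requires, while the last column stays at about \nicefrac{1}{2} for every n.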
Convergence in uniform norm
For bounded functions there is another more abstract way to think of uniform convergence. To every bounded function we assign a certain nonnegative number (called the uniform norm). This
number measures the “distance” of the function from 0. We can then “measure” how far two functions are from each other. We simply translate a statement about uniform convergence into a
statement about a certain sequence of real numbers converging to zero.
[def:unifnorm] Let f \colon S \to {\mathbb{R}} be a bounded function. Define \left\lVert {f} \right\rVert_u := \sup \bigl\{ \left\lvert {f(x)} \right\rvert : x \in S \bigr\} . \left\lVert {\cdot}
\right\rVert_u is called the uniform norm.
To use this notation and this concept, the domain S must be fixed. Some authors use the notation \left\lVert {f} \right\rVert_S to emphasize the dependence on S.
A sequence of bounded functions f_n \colon S \to {\mathbb{R}} converges uniformly to f \colon S \to {\mathbb{R}}, if and only if \lim_{n\to\infty} \left\lVert {f_n - f} \right\rVert_u = 0 .
First suppose \lim \left\lVert {f_n - f} \right\rVert_u = 0. Let \epsilon > 0 be given. Then there exists an N such that for n \geq N we have \left\lVert {f_n - f} \right\rVert_u < \epsilon. As
\left\lVert {f_n-f} \right\rVert_u is the supremum of \left\lvert {f_n(x)-f(x)} \right\rvert, we see that for all x we have \left\lvert {f_n(x)-f(x)} \right\rvert < \epsilon.
On the other hand, suppose \{ f_n \} converges uniformly to f. Let \epsilon > 0 be given. Then find N such that for all n \geq N we have \left\lvert {f_n(x)-f(x)} \right\rvert < \epsilon for all x \in S. Taking the
supremum we see that \left\lVert {f_n - f} \right\rVert_u \leq \epsilon for all n \geq N. As \epsilon > 0 was arbitrary, \lim \left\lVert {f_n-f} \right\rVert_u = 0.
Sometimes it is said that \{ f_n \} converges to f in uniform norm instead of converges uniformly. The proposition says that the two notions are the same thing.
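As a rough numerical companion to the proposition, one can estimate a uniform norm by sampling. The sketch below is an assumption-laden illustration, not part of the text: the helper sup_on_grid is hypothetical, it samples finitely many points, and so it only gives a lower bound for the true supremum. It is applied to x^{2n} on [-a,a] with a = 0.9, where the limit function is 0.

```python
def sup_on_grid(g, a, b, num_points=2001):
    """Approximate sup{|g(x)| : x in [a, b]} on an evenly spaced grid.

    Sampling can only underestimate the true uniform norm.
    """
    step = (b - a) / (num_points - 1)
    return max(abs(g(a + i * step)) for i in range(num_points))


def power_function(n):
    """Return the function x -> x^(2n)."""
    return lambda x: x ** (2 * n)


a = 0.9
# The limit function is 0, so ||f_n - 0||_u is just the uniform norm of f_n.
norms = [sup_on_grid(power_function(n), -a, a) for n in (1, 5, 10, 50)]
print(norms)
```

The printed values shrink toward zero, in line with the statement that uniform convergence is the same as convergence of the sequence of real numbers \left\lVert {f_n - f} \right\rVert_u to 0.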
Exercises
Let f and g be bounded functions on [a,b]. Prove \left\lVert {f+g} \right\rVert_u \leq \left\lVert {f} \right\rVert_u + \left\lVert {g} \right\rVert_u .
a) Find the pointwise limit of \dfrac{e^{x/n}}{n} for x \in {\mathbb{R}}.
b) Is the limit uniform on {\mathbb{R}}?
c) Is the limit uniform on [0,1]?
Suppose f_n \colon S \to {\mathbb{R}} are functions that converge uniformly to f \colon S \to {\mathbb{R}}. Suppose A \subset S. Show that the sequence of restrictions \{ f_n|_A \}
converges uniformly to f|_A.
Suppose \{ f_n \} and \{ g_n \} defined on some set A converge to f and g respectively pointwise. Show that \{ f_n+g_n \} converges pointwise to f+g.
Suppose \{ f_n \} and \{ g_n \} defined on some set A converge to f and g respectively uniformly on A. Show that \{ f_n+g_n \} converges uniformly to f+g on A.
Find an example of sequences of functions \{ f_n \} and \{ g_n \} that converge uniformly to some f and g on some set A, but such that \{ f_ng_n \} (the product) does not converge uniformly
to fg on A. Hint: Let A := {\mathbb{R}}, let f(x):=g(x) := x. You can even pick f_n = g_n.
Suppose there exists a sequence of functions \{ g_n \} uniformly converging to 0 on A. Now suppose we have a sequence of functions \{ f_n \} and a function f on A such that \left\lvert {f_n(x)
- f(x)} \right\rvert \leq g_n(x) for all x \in A. Show that \{ f_n \} converges uniformly to f on A.
Let \{ f_n \}, \{ g_n \} and \{ h_n \} be sequences of functions on [a,b]. Suppose \{ f_n \} and \{ h_n \} converge uniformly to some function f \colon [a,b] \to {\mathbb{R}} and suppose
f_n(x) \leq g_n(x) \leq h_n(x) for all x \in [a,b]. Show that \{ g_n \} converges uniformly to f.
Let f_n \colon [0,1] \to {\mathbb{R}} be a sequence of increasing functions (that is, f_n(x) \geq f_n(y) whenever x \geq y). Suppose f_n(0) = 0 and \lim\limits_{n \to \infty} f_n(1) = 0. Show
that \{ f_n \} converges uniformly to 0.
Let \{f_n\} be a sequence of functions defined on [0,1]. Suppose there exists a sequence of distinct numbers x_n \in [0,1] such that f_n(x_n) = 1 . Prove or disprove the following statements:
a) True or false: There exists \{ f_n \} as above that converges to 0 pointwise.
b) True or false: There exists \{ f_n \} as above that converges to 0 uniformly on [0,1].
Fix a continuous h \colon [a,b] \to {\mathbb{R}}. Let f(x) := h(x) for x \in [a,b], f(x) := h(a) for x < a and f(x) := h(b) for all x > b. First show that f \colon {\mathbb{R}}\to {\mathbb{R}} is
continuous. Now let f_n be the function g from with \epsilon = \nicefrac{1}{n}, defined on the interval [a,b]. Show that \{ f_n \} converges uniformly to h on [a,b].
Interchange of limits
Note: 1–1.5 lectures
Large parts of modern analysis deal mainly with the question of the interchange of two limiting operations. When we have a chain of two limits, we cannot always just swap the limits. For
example, 0 = \lim_{n\to\infty} \left( \lim_{k\to\infty} \frac{\nicefrac{n}{k}}{\nicefrac{n}{k} + 1} \right) \not= \lim_{k\to\infty} \left( \lim_{n\to\infty} \frac{\nicefrac{n}{k}}{\nicefrac{n}
{k} + 1} \right) = 1 .
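The two iterated limits above can be seen numerically. The following Python sketch is only an illustration: the function name term and the cutoff big are hypothetical choices, and a large index stands in for the inner limit.

```python
from fractions import Fraction


def term(n, k):
    """The double sequence (n/k) / (n/k + 1), computed exactly."""
    r = Fraction(n, k)
    return r / (r + 1)


big = 10**9  # stand-in for "the inner index goes to infinity"

# Inner limit in k first: for each fixed n the terms approach 0.
inner_k = [float(term(n, big)) for n in (1, 10, 100)]

# Inner limit in n first: for each fixed k the terms approach 1.
inner_n = [float(term(big, k)) for k in (1, 10, 100)]

print(inner_k)
print(inner_n)
```

The first list is close to 0 and the second is close to 1, which matches the displayed computation: the order in which the two limits are taken matters.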
When talking about sequences of functions, interchange of limits comes up quite often. We treat two cases. First we look at continuity of the limit, and second we look at the integral of the
limit.
Continuity of the limit
If we have a sequence \{ f_n \} of continuous functions, is the limit continuous? Suppose f is the (pointwise) limit of \{ f_n \}. If \lim\, x_k = x we are interested in the following interchange of
limits. The equality we have to prove (it is not always true) is marked with a question mark. In fact the limits to the left of the question mark might not even exist. \lim_{k \to \infty} f(x_k) =
\lim_{k \to \infty} \Bigl( \lim_{n \to \infty} f_n(x_k) \Bigr) \overset{\text{\textbf{?}}}{=} \lim_{n \to \infty} \Bigl( \lim_{k \to \infty} f_n(x_k) \Bigr) = \lim_{n \to \infty} f_n(x) = f(x) . In
particular, we wish to find conditions on the sequence \{ f_n \} so that the above equation holds. It turns out that if we only require pointwise convergence, then the limit of a sequence of
functions need not be continuous, and the above equation need not hold.
Let f_n \colon [0,1] \to {\mathbb{R}} be defined as f_n(x) := \begin{cases} 1-nx & \text{if $x < \nicefrac{1}{n}$,}\\ 0 & \text{if $x \geq \nicefrac{1}{n}$.} \end{cases} See .
Each function f_n is continuous. Fix an x \in (0,1]. If n \geq \nicefrac{1}{x}, then x \geq \nicefrac{1}{n}. Therefore for n \geq \nicefrac{1}{x} we have f_n(x) = 0, and so \lim_{n \to \infty}
f_n(x) = 0. On the other hand if x=0, then \lim_{n \to \infty} f_n(0) = \lim_{n \to \infty} 1 = 1. Thus the pointwise limit of f_n is the function f \colon [0,1] \to {\mathbb{R}} defined by f(x) :=
\begin{cases} 1 & \text{if $x = 0$,}\\ 0 & \text{if $x > 0$.} \end{cases} The function f is not continuous at 0.
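The behavior in this example can be checked on a few sample points. The Python sketch below is illustrative only; the names f_n and f_limit mirror the example, and the choice of the test point x = \nicefrac{1}{(2n)} is an assumption made here to exhibit a gap that does not shrink.

```python
def f_n(n, x):
    """The function from the example: 1 - n*x for x < 1/n, and 0 otherwise."""
    return 1 - n * x if x < 1 / n else 0.0


def f_limit(x):
    """The pointwise limit: 1 at x = 0 and 0 for x > 0."""
    return 1.0 if x == 0 else 0.0


# Pointwise convergence: at each fixed x the values eventually equal f_limit(x).
for x in (0.0, 0.01, 0.5):
    print(x, [f_n(n, x) for n in (1, 10, 100, 1000)], f_limit(x))

# The convergence is not uniform: at x = 1/(2n) the gap |f_n(x) - f(x)| stays at 1/2.
gaps = [abs(f_n(n, 1 / (2 * n)) - f_limit(1 / (2 * n))) for n in (1, 10, 100, 1000)]
print(gaps)
```

Since the gap at x = \nicefrac{1}{(2n)} does not tend to zero, no single N can work for all x \in [0,1] at once.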
If we, however, require the convergence to be uniform, the limits can be interchanged.
Let \{ f_n \} be a sequence of continuous functions f_n \colon S \to {\mathbb{R}} converging uniformly to f \colon S \to {\mathbb{R}}. Then f is continuous.
Let x \in S be fixed. Let \{ x_n \} be a sequence in S converging to x.
Let \epsilon > 0 be given. As \{ f_k \} converges uniformly to f, we find a k \in {\mathbb{N}} such that \left\lvert {f_k(y)-f(y)} \right\rvert < \nicefrac{\epsilon}{3} for all y \in S. As f_k is
continuous at x, we find an N \in {\mathbb{N}} such that for m \geq N we have \left\lvert {f_k(x_m)-f_k(x)} \right\rvert < \nicefrac{\epsilon}{3} . Thus for m \geq N we have \begin{split}
\left\lvert {f(x_m)-f(x)} \right\rvert & = \left\lvert {f(x_m)-f_k(x_m)+f_k(x_m)-f_k(x)+f_k(x)-f(x)} \right\rvert \\ & \leq \left\lvert {f(x_m)-f_k(x_m)} \right\rvert+ \left\lvert {f_k(x_m)-
f_k(x)} \right\rvert+ \left\lvert {f_k(x)-f(x)} \right\rvert \\ & < \nicefrac{\epsilon}{3} + \nicefrac{\epsilon}{3} + \nicefrac{\epsilon}{3} = \epsilon . \end{split} Therefore \{ f(x_m) \}
converges to f(x) and hence f is continuous at x. As x was arbitrary, f is continuous everywhere.
Integral of the limit
Again, if we simply require pointwise convergence, then the integral of a limit of a sequence of functions need not be equal to the limit of the integrals.
Exercises
While uniform convergence preserves continuity, it does not preserve differentiability. Find an explicit example of a sequence of differentiable functions on [-1,1] that converge uniformly to a
function f such that f is not differentiable. Hint: Consider \left\lvert {x} \right\rvert^{1+1/n}, show that these functions are differentiable, converge uniformly, and then show that the limit is not
differentiable.
Let f_n(x) = \frac{x^n}{n}. Show that \{ f_n \} converges uniformly to a differentiable function f on [0,1] (find f). However, show that f'(1) \not= \lim\limits_{n\to\infty} f_n'(1).
Note: The previous two exercises show that we cannot simply swap limits with derivatives, even if the convergence is uniform. See also below.
Let f \colon [0,1] \to {\mathbb{R}} be a Riemann integrable (hence bounded) function. Find \displaystyle \lim_{n\to\infty} \int_0^1 \frac{f(x)}{n} ~dx.
Show \displaystyle \lim_{n\to\infty} \int_1^2 e^{-nx^2} ~dx = 0. Feel free to use what you know about the exponential function from calculus.
Find an example of a sequence of continuous functions on (0,1) that converges pointwise to a continuous function on (0,1), but the convergence is not uniform.
Note: In the previous exercise, (0,1) was picked for simplicity. For a more challenging exercise, replace (0,1) with [0,1].
True/False; prove or find a counterexample to the following statement: If \{ f_n \} is a sequence of everywhere discontinuous functions on [0,1] that converge uniformly to a function f, then f is
everywhere discontinuous.
[c1uniflim:exercise] For a continuously differentiable function f \colon [a,b] \to {\mathbb{R}}, define \left\lVert {f} \right\rVert_{C^1} := \left\lVert {f} \right\rVert_u + \left\lVert {f'}
\right\rVert_u . Suppose \{ f_n \} is a sequence of continuously differentiable functions such that for every \epsilon >0, there exists an M such that for all n,k \geq M we have \left\lVert {f_n-
f_k} \right\rVert_{C^1} < \epsilon . Show that \{ f_n \} converges uniformly to some continuously differentiable function f \colon [a,b] \to {\mathbb{R}}.
For the following two exercises let us define for a Riemann integrable function f \colon [0,1] \to {\mathbb{R}} the following number \left\lVert {f} \right\rVert_{L^1} := \int_0^1 \left\lvert
{f(x)} \right\rvert~dx . It is true that \left\lvert {f} \right\rvert is integrable whenever f is. This norm defines another very common type of convergence called the L^1-convergence, which is,
however, a bit more subtle.
Suppose \{ f_n \} is a sequence of Riemann integrable functions on [0,1] that converges uniformly to 0. Show that \lim_{n\to\infty} \left\lVert {f_n} \right\rVert_{L^1} = 0 .
Find a sequence of Riemann integrable functions \{ f_n \} on [0,1] that converges pointwise to 0, but \lim_{n\to\infty} \left\lVert {f_n} \right\rVert_{L^1} \text{ does not exist (is $\infty$).}
Prove Dini’s theorem: Let f_n \colon [a,b] \to {\mathbb{R}} be a sequence of continuous functions such that 0 \leq f_{n+1}(x) \leq f_n(x) \leq \cdots \leq f_1(x) \qquad \text{for all $n \in
{\mathbb{N}}$.} Suppose \{ f_n \} converges pointwise to 0. Show that \{ f_n \} converges to zero uniformly.
Suppose f_n \colon [a,b] \to {\mathbb{R}} is a sequence of continuous functions that converges pointwise to a continuous f \colon [a,b] \to {\mathbb{R}}. Suppose that for any x \in [a,b] the
sequence \{ \left\lvert {f_n(x)-f(x)} \right\rvert \} is monotone. Show that the sequence \{f_n\} converges uniformly.
Find a sequence of Riemann integrable functions f_n \colon [0,1] \to {\mathbb{R}} such that \{ f_n \} converges to zero pointwise, and such that a) \bigl\{ \int_0^1 f_n \bigr\}_{n=1}^\infty
increases without bound, b) \bigl\{ \int_0^1 f_n \bigr\}_{n=1}^\infty is the sequence -1,1,-1,1,-1,1, \ldots.
It is possible to define a joint limit of a double sequence \{ x_{n,m} \} of real numbers (that is a function from {\mathbb{N}}\times {\mathbb{N}} to {\mathbb{R}}). We say L is the joint
limit of \{ x_{n,m} \} and write \lim_{\substack{n\to\infty\\m\to\infty}} x_{n,m} = L , \qquad \text{or} \qquad \lim_{(n,m) \to \infty} x_{n,m} = L , if for every \epsilon > 0, there exists an M
such that if n \geq M and m \geq M, then \left\lvert {x_{n,m} - L} \right\rvert < \epsilon.
Suppose the joint limit of \{ x_{n,m} \} is L, and suppose that for all n, \lim\limits_{m \to \infty} x_{n,m} exists, and for all m, \lim\limits_{n \to \infty} x_{n,m} exists. Then show
\lim\limits_{n\to\infty}\lim\limits_{m \to \infty} x_{n,m} = \lim\limits_{m\to\infty}\lim\limits_{n \to \infty} x_{n,m} = L.
A joint limit does not mean the iterated limits even exist. Consider \(x_{n,m} := \frac
Picard’s theorem
Note: 1–2 lectures (can be safely skipped)
A first semester course in analysis should have a pièce de résistance caliber theorem. We pick a theorem whose proof combines everything we have learned. It is more sophisticated than the
fundamental theorem of calculus, the first highlight theorem of this course. The theorem we are talking about is Picard’s theorem on existence and uniqueness of a solution to an ordinary
differential equation. Both the statement and the proof are beautiful examples of what one can do with all we have learned. It is also a good example of how analysis is applied as differential
equations are indispensable in science.
First order ordinary differential equation
Modern science is described in the language of differential equations. That is, equations involving not only the unknown, but also its derivatives. The simplest nontrivial form of a differential
equation is the so-called first order ordinary differential equation y' = F(x,y) . Generally we also specify y(x_0)=y_0. The solution of the equation is a function y(x) such that y(x_0)=y_0 and
y'(x) = F\bigl(x,y(x)\bigr).
When F involves only the x variable, the solution is given by the fundamental theorem of calculus. On the other hand, when F depends on both x and y we need far more firepower. It is not
always true that a solution exists, and if it does, that it is the unique solution. Picard’s theorem gives us certain sufficient conditions for existence and uniqueness.
The theorem
We need a definition of continuity in two variables. First, a point in the plane {\mathbb{R}}^2 = {\mathbb{R}}\times {\mathbb{R}} is denoted by an ordered pair (x,y). To make matters
simple, let us give the following sequential definition of continuity.
Let U \subset {\mathbb{R}}^2 be a set and F \colon U \to {\mathbb{R}} be a function. Let (x,y) \in U be a point. The function F is continuous at (x,y) if for every sequence \{ (x_n,y_n)
\}_{n=1}^\infty of points in U such that \lim\, x_n = x and \lim\, y_n = y, we have \lim_{n \to \infty} F(x_n,y_n) = F(x,y) . We say F is continuous if it is continuous at all points in U.
Let I, J \subset {\mathbb{R}} be closed bounded intervals, let I_0 and J_0 be their interiors, and let (x_0,y_0) \in I_0 \times J_0. Suppose F \colon I \times J \to {\mathbb{R}} is continuous
and Lipschitz in the second variable, that is, there exists a number L such that \left\lvert {F(x,y) - F(x,z)} \right\rvert \leq L \left\lvert {y-z} \right\rvert \ \ \ \text{ for all $y,z \in J$, $x \in I$} .
Then there exists an h > 0 and a unique differentiable function f \colon [x_0 - h, x_0 + h] \to J \subset {\mathbb{R}}, such that \label{picard:diffeq} f'(x) = F\bigl(x,f(x)\bigr) \qquad \text{and}
\qquad f(x_0) = y_0.
Suppose we could find a solution f. Using the fundamental theorem of calculus we integrate the equation f'(x) = F\bigl(x,f(x)\bigr), f(x_0) = y_0, and write [picard:diffeq] as the integral
equation \label{picard:inteq} f(x) = y_0 + \int_{x_0}^x F\bigl(t,f(t)\bigr)~dt . The idea of our proof is that we try to plug in approximations to a solution to the right-hand side of [picard:inteq]
to get better approximations on the left hand side of [picard:inteq]. We hope that in the end the sequence converges and solves [picard:inteq] and hence [picard:diffeq]. The technique below is
called Picard iteration, and the individual functions f_k are called the Picard iterates.
Without loss of generality, suppose x_0 = 0 (exercise below). Another exercise tells us that F is bounded as it is continuous. Therefore pick some M > 0 so that \left\lvert {F(x,y)} \right\rvert
\leq M for all (x,y) \in I\times J. Pick \alpha > 0 such that [-\alpha,\alpha] \subset I and [y_0-\alpha, y_0 + \alpha] \subset J. Define h := \min \left\{ \alpha, \frac{\alpha}{M+L\alpha} \right\} .
Observe [-h,h] \subset I.
Set f_0(x) := y_0. We define f_k inductively. Assuming f_{k-1}([-h,h]) \subset [y_0-\alpha,y_0+\alpha], we see F\bigl(t,f_{k-1}(t)\bigr) is a well defined function of t for t \in [-h,h]. Further if
f_{k-1} is continuous on [-h,h], then F\bigl(t,f_{k-1}(t)\bigr) is continuous as a function of t on [-h,h] (left as an exercise). Define f_k(x) := y_0+ \int_{0}^x F\bigl(t,f_{k-1}(t)\bigr)~dt , and
f_k is continuous on [-h,h] by the fundamental theorem of calculus. To see that f_k maps [-h,h] to [y_0-\alpha,y_0+\alpha], we compute for x \in [-h,h] \left\lvert {f_k(x) - y_0} \right\rvert =
\left\lvert {\int_{0}^x F\bigl(t,f_{k-1}(t)\bigr)~dt } \right\rvert \leq M\left\lvert {x} \right\rvert \leq Mh \leq M \frac{\alpha}{M+L\alpha} \leq \alpha . We now define f_{k+1} and so on, and
we have defined a sequence \{ f_k \} of functions. We need to show that it converges to a function f that solves the equation [picard:inteq] and therefore [picard:diffeq].
We wish to show that the sequence \{ f_k \} converges uniformly to some function on [-h,h]. First, for t \in [-h,h] we have the following useful bound \left\lvert {F\bigl(t,f_{n}(t)\bigr) -
F\bigl(t,f_{k}(t)\bigr)} \right\rvert \leq L \left\lvert {f_n(t)-f_k(t)} \right\rvert \leq L \left\lVert {f_n-f_k} \right\rVert_u , where \left\lVert {f_n-f_k} \right\rVert_u is the uniform norm, that is
the supremum of \left\lvert {f_n(t)-f_k(t)} \right\rvert for t \in [-h,h]. Now note that \left\lvert {x} \right\rvert \leq h \leq \frac{\alpha}{M+L\alpha}. Therefore \begin{split} \left\lvert {f_n(x) -
f_k(x)} \right\rvert & = \left\lvert {\int_{0}^x F\bigl(t,f_{n-1}(t)\bigr)~dt - \int_{0}^x F\bigl(t,f_{k-1}(t)\bigr)~dt} \right\rvert \\ & = \left\lvert {\int_{0}^x F\bigl(t,f_{n-1}(t)\bigr)-
F\bigl(t,f_{k-1}(t)\bigr)~dt} \right\rvert \\ & \leq L\left\lVert {f_{n-1}-f_{k-1}} \right\rVert_u \left\lvert {x} \right\rvert \\ & \leq \frac{L\alpha}{M+L\alpha} \left\lVert {f_{n-1}-f_{k-1}}
\right\rVert_u . \end{split} Let C := \frac{L\alpha}{M+L\alpha} and note that C < 1. Taking supremum on the left-hand side we get \left\lVert {f_n-f_k} \right\rVert_u \leq C \left\lVert {f_{n-
1}-f_{k-1}} \right\rVert_u . Without loss of generality, suppose n \geq k. Then by induction we can show \left\lVert {f_n-f_k} \right\rVert_u \leq C^{k} \left\lVert {f_{n-k}-f_{0}} \right\rVert_u . For x
\in [-h,h] we have \left\lvert {f_{n-k}(x)-f_{0}(x)} \right\rvert = \left\lvert {f_{n-k}(x)-y_0} \right\rvert \leq \alpha . Therefore, \left\lVert {f_n-f_k} \right\rVert_u \leq C^{k} \left\lVert {f_{n-k}-f_{0}} \right\rVert_u \leq C^{k} \alpha . As C < 1, \{f_n\} is uniformly Cauchy. A uniformly Cauchy sequence of functions converges uniformly, so \{ f_n \} converges uniformly on [-h,h] to some function f \colon [-h,h] \to
{\mathbb{R}}. The function f is the uniform limit of continuous functions and therefore continuous. Furthermore, since f_n([-h,h]) \subset [y_0-\alpha,y_0+\alpha] for all n, we have f([-h,h]) \subset
[y_0-\alpha,y_0+\alpha] (why?).
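The contraction estimate can be observed numerically. The Python sketch below is a hedged illustration and not the construction in the proof itself: the right-hand side F(x,y) = x + \sin y, the rectangles I = J = [-1,1], the initial condition (0,0), the constants L = 1, M = 2, \alpha = 1, and the helper picard_step_grid are all assumptions chosen for this example, and the integral is replaced by the trapezoid rule on a grid, so the computed functions only approximate the Picard iterates.

```python
import math


def picard_step_grid(F, xs, ys, y0, mid):
    """One approximate Picard step on a grid.

    Returns values of y0 + integral_0^x F(t, y(t)) dt at the grid points xs,
    using the trapezoid rule; xs[mid] must be 0.
    """
    vals = [F(x, y) for x, y in zip(xs, ys)]
    new = [y0] * len(xs)
    for i in range(mid + 1, len(xs)):      # integrate to the right of 0
        new[i] = new[i - 1] + 0.5 * (vals[i - 1] + vals[i]) * (xs[i] - xs[i - 1])
    for i in range(mid - 1, -1, -1):       # integrate to the left of 0
        new[i] = new[i + 1] - 0.5 * (vals[i + 1] + vals[i]) * (xs[i + 1] - xs[i])
    return new


# Hypothetical data: F is Lipschitz in y with L = 1, and |F| <= 2 on [-1,1] x [-1,1].
F = lambda x, y: x + math.sin(y)
L, M, alpha, y0 = 1.0, 2.0, 1.0, 0.0
h = min(alpha, alpha / (M + L * alpha))
C = L * alpha / (M + L * alpha)

m = 200
xs = [h * (i - m) / m for i in range(2 * m + 1)]   # grid on [-h, h], xs[m] == 0
ys = [y0] * len(xs)                                # f_0 is the constant y0
diffs = []
for k in range(6):
    new = picard_step_grid(F, xs, ys, y0, m)
    diffs.append(max(abs(u - v) for u, v in zip(new, ys)))
    ys = new

print(h, C)
print(diffs)   # sampled sup-distances between consecutive approximate iterates
```

In this run the distances between consecutive iterates should fall at least as fast as C^k \alpha, which is the geometric decay that makes the sequence uniformly Cauchy.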
We now need to show that f solves [picard:inteq]. First, as before we notice \left\lvert {F\bigl(t,f_{n}(t)\bigr) - F\bigl(t,f(t)\bigr)} \right\rvert \leq L \left\lvert {f_n(t)-f(t)} \right\rvert \leq L
\left\lVert {f_n-f} \right\rVert_u . As \left\lVert {f_n-f} \right\rVert_u converges to 0, then F\bigl(t,f_n(t)\bigr) converges uniformly to F\bigl(t,f(t)\bigr) for t \in [-h,h]. Hence, for x \in [-h,h] the
convergence is uniform for t \in [0,x] (or [x,0] if x < 0). Therefore, \begin{aligned} y_0 + \int_0^{x} F\bigl(t,f(t)\bigr)~dt & = y_0 + \int_0^{x} F\bigl(t,\lim_{n\to\infty} f_n(t)\bigr)~dt & & \\ & =
y_0 + \int_0^{x} \lim_{n\to\infty} F\bigl(t,f_n(t)\bigr)~dt & & \text{(by continuity of $F$)} \\ & = \lim_{n\to\infty} \left( y_0 + \int_0^{x} F\bigl(t,f_n(t)\bigr)~dt \right) & & \text{(by
uniform convergence)} \\ & = \lim_{n\to\infty} f_{n+1}(x) = f(x) . & &\end{aligned} We apply the fundamental theorem of calculus to show that f is differentiable and its derivative is
F\bigl(x,f(x)\bigr). It is obvious that f(0) = y_0.
Finally, what is left to do is to show uniqueness. Suppose g \colon [-h,h] \to J \subset {\mathbb{R}} is another solution. As before we use the fact that \left\lvert {F\bigl(t,f(t)\bigr) -
F\bigl(t,g(t)\bigr)} \right\rvert \leq L \left\lVert {f-g} \right\rVert_u. Then \begin{split} \left\lvert {f(x)-g(x)} \right\rvert & = \left\lvert { y_0 + \int_0^{x} F\bigl(t,f(t)\bigr)~dt - \left( y_0 +
\int_0^{x} F\bigl(t,g(t)\bigr)~dt \right) } \right\rvert \\ & = \left\lvert { \int_0^{x} F\bigl(t,f(t)\bigr) - F\bigl(t,g(t)\bigr)~dt } \right\rvert \\ & \leq L\left\lVert {f-g} \right\rVert_u\left\lvert {x}
\right\rvert \leq Lh\left\lVert {f-g} \right\rVert_u \leq \frac{L\alpha}{M+L\alpha}\left\lVert {f-g} \right\rVert_u . \end{split} As before, C = \frac{L\alpha}{M+L\alpha} < 1. By taking
supremum over x \in [-h,h] on the left hand side we obtain \left\lVert {f-g} \right\rVert_u \leq C \left\lVert {f-g} \right\rVert_u . This is only possible if \left\lVert {f-g} \right\rVert_u = 0.
Therefore, f=g, and the solution is unique.
Examples
Let us look at some examples. The proof of the theorem gives us an explicit way to find an h that works. It does not, however, give us the best h. It is often possible to find a much larger h for
which the conclusion of the theorem holds.
The proof also gives us the Picard iterates as approximations to the solution. So the proof actually tells us how to obtain the solution, not just that the solution exists.
Consider f'(x) = f(x), \qquad f(0) = 1 . That is, we let F(x,y) = y, and we are looking for a function f such that f'(x) = f(x). We pick any I that contains 0 in the interior. We pick an arbitrary J that
contains 1 in its interior. We can use L = 1. The theorem guarantees an h > 0 such that there exists a unique solution f \colon [-h,h] \to {\mathbb{R}}. This solution is usually denoted by e^x :=
f(x) . We leave it to the reader to verify that by picking I and J large enough the proof of the theorem guarantees that we are able to pick \alpha such that we get any h we want as long as h <
\nicefrac{1}{2}. We omit the calculation.
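For this equation the Picard iterates can be computed exactly, since each step integrates a polynomial. The following Python sketch is an illustration under the assumption that polynomials are stored as coefficient lists [c_0, c_1, \ldots]; the helper name picard_step_poly is hypothetical.

```python
from fractions import Fraction


def picard_step_poly(coeffs):
    """Given f_{k-1} as coefficients [c_0, c_1, ...], return the coefficients of f_k.

    For F(x, y) = y and f(0) = 1 we have f_k(x) = 1 + integral_0^x f_{k-1}(t) dt,
    so the coefficient of x^(j+1) in f_k is c_j / (j + 1).
    """
    return [Fraction(1)] + [c / (j + 1) for j, c in enumerate(coeffs)]


iterates = [[Fraction(1)]]          # f_0(x) = 1
for _ in range(4):
    iterates.append(picard_step_poly(iterates[-1]))

for k, coeffs in enumerate(iterates):
    print(k, [str(c) for c in coeffs])
```

Each iterate adds one more term of the form x^j/j!, so the iterates are the familiar partial sums of the power series of the exponential.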
Of course, we know this function exists as a function for all x, so an arbitrary h ought to work. By the same reasoning as above, no matter what x_0 and y_0 are, the proof guarantees an arbitrary h
as long as h < \nicefrac{1}{2}. Fix such an h. We get a unique function defined on [x_0-h,x_0+h]. After defining the function on [-h,h] we find a solution on the interval [0,2h] and notice that
Exercises
Let I, J \subset {\mathbb{R}} be intervals. Let F \colon I \times J \to {\mathbb{R}} be a continuous function of two variables and suppose f \colon I \to J is a continuous function. Show that
F\bigl(x,f(x)\bigr) is a continuous function on I.
Let I, J \subset {\mathbb{R}} be closed bounded intervals. Show that if F \colon I \times J \to {\mathbb{R}} is continuous, then F is bounded.
We proved Picard’s theorem under the assumption that x_0 = 0. Prove the full statement of Picard’s theorem for an arbitrary x_0.
Let f'(x)=x f(x) be our equation. Start with the initial condition f(0)=2 and find the Picard iterates f_0,f_1,f_2,f_3,f_4.
Suppose F \colon I \times J \to {\mathbb{R}} is a function that is continuous in the first variable, that is, for any fixed y the function that takes x to F(x,y) is continuous. Further, suppose F is
Lipschitz in the second variable, that is, there exists a number L such that \left\lvert {F(x,y) - F(x,z)} \right\rvert \leq L \left\lvert {y-z} \right\rvert \ \ \ \text{ for all $y,z \in J$, $x \in I$} . Show
that F is continuous as a function of two variables. Therefore, the hypotheses in the theorem could be made even weaker.
A common type of equation one encounters is the linear first order differential equation, that is, an equation of the form y' + p(x) y = q(x) , \qquad y(x_0) = y_0 . Prove Picard’s theorem for linear
equations. Suppose I is an interval, x_0 \in I, and p \colon I \to {\mathbb{R}} and q \colon I \to {\mathbb{R}} are continuous. Show that there exists a unique differentiable f \colon I \to
{\mathbb{R}}, such that y = f(x) satisfies the equation and the initial condition. Hint: Assume existence of the exponential function and use the integrating factor formula for existence of f
(prove that it works): f(x) := e^{-\int_{x_0}^x p(s)\, ds} \left( \int_{x_0}^x e^{\int_{x_0}^t p(s)\, ds} q(t) ~dt + y_0 \right).
Metric Spaces
Metric spaces
Note: 1.5 lectures
As mentioned in the introduction, the main idea in analysis is to take limits. We have learned to take limits of sequences of real numbers, and we have learned to take limits of functions as a real
number approached some other real number.
We want to take limits in more complicated contexts. For example, we want to have sequences of points in 3-dimensional space. We wish to define continuous functions of several variables.
We even want to define functions on spaces that are a little harder to describe, such as the surface of the earth. We still want to talk about limits there.
Finally, we have seen the limit of a sequence of functions. We wish to unify all these notions so that we do not have to reprove theorems over and over again in each context. The concept of
a metric space is an elementary yet powerful tool in analysis. And while it is not sufficient to describe every type of limit we find in modern analysis, it gets us very far indeed.
Let X be a set, and let d \colon X \times X \to {\mathbb{R}} be a function such that
i. [metric:pos] d(x,y) \geq 0 for all x, y in X,
ii. [metric:zero] d(x,y) = 0 if and only if x = y,
iii. [metric:com] d(x,y) = d(y,x),
iv. [metric:triang] d(x,z) \leq d(x,y)+ d(y,z) (triangle inequality).
Then the pair (X,d) is called a metric space. The function d is called the metric or sometimes the distance function. Sometimes we just say X is a metric space if the metric is clear from context.
The geometric idea is that d is the distance between two points. Items [metric:pos]–[metric:com] have obvious geometric interpretation: distance is always nonnegative, the only point that is
distance 0 away from x is x itself, and finally that the distance from x to y is the same as the distance from y to x. The triangle inequality [metric:triang] has the interpretation given in .
For the purposes of drawing, it is convenient to draw figures and diagrams in the plane and have the metric be the standard distance. However, that is only one particular metric space. Just
because a certain fact seems to be clear from drawing a picture does not mean it is true. You might be getting sidetracked by intuition from euclidean geometry, whereas the concept of a metric
space is a lot more general.
Let us give some examples of metric spaces.
The set of real numbers {\mathbb{R}} is a metric space with the metric d(x,y) := \left\lvert {x-y} \right\rvert . Items [metric:pos]–[metric:com] of the definition are easy to verify. The triangle
inequality [metric:triang] follows immediately from the standard triangle inequality for real numbers: d(x,z) = \left\lvert {x-z} \right\rvert = \left\lvert {x-y+y-z} \right\rvert \leq \left\lvert {x-y}
\right\rvert+\left\lvert {y-z} \right\rvert = d(x,y)+ d(y,z) . This metric is the standard metric on {\mathbb{R}}. If we talk about {\mathbb{R}} as a metric space without mentioning a specific
metric, we mean this particular metric.
We can also put a different metric on the set of real numbers. For example, take the set of real numbers {\mathbb{R}} together with the metric d(x,y) := \frac{\left\lvert {x-y} \right\rvert}
{\left\lvert {x-y} \right\rvert+1} . Items [metric:pos]–[metric:com] are again easy to verify. The triangle inequality [metric:triang] is a little bit more difficult. Note that d(x,y) =
\varphi(\left\lvert {x-y} \right\rvert) where \varphi(t) = \frac{t}{t+1} and \varphi is an increasing function (positive derivative). Hence \begin{split} d(x,z) & = \varphi(\left\lvert {x-z}
\right\rvert) = \varphi(\left\lvert {x-y+y-z} \right\rvert) \leq \varphi(\left\lvert {x-y} \right\rvert+\left\lvert {y-z} \right\rvert) \\ & = \frac{\left\lvert {x-y} \right\rvert+\left\lvert {y-z}
\right\rvert}{\left\lvert {x-y} \right\rvert+\left\lvert {y-z} \right\rvert+1} = \frac{\left\lvert {x-y} \right\rvert}{\left\lvert {x-y} \right\rvert+\left\lvert {y-z} \right\rvert+1} + \frac{\left\lvert {y-
z} \right\rvert}{\left\lvert {x-y} \right\rvert+\left\lvert {y-z} \right\rvert+1} \\ & \leq \frac{\left\lvert {x-y} \right\rvert}{\left\lvert {x-y} \right\rvert+1} + \frac{\left\lvert {y-z} \right\rvert}
{\left\lvert {y-z} \right\rvert+1} = d(x,y)+ d(y,z) . \end{split} Here we have an example of a nonstandard metric on {\mathbb{R}}. With this metric we see for example that d(x,y) < 1 for all
x,y \in {\mathbb{R}}. That is, any two points are less than 1 unit apart.
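The metric axioms for these two examples can be spot-checked on a finite sample of points. The Python sketch below is an illustration only: the helper violates_metric_axioms and the function names d_std and d_bounded are hypothetical, the sample is random, and passing such a check on finitely many points is of course not a proof.

```python
import itertools
import random


def d_std(x, y):
    """Standard metric on the real numbers."""
    return abs(x - y)


def d_bounded(x, y):
    """The bounded metric |x - y| / (|x - y| + 1) from the example."""
    return abs(x - y) / (abs(x - y) + 1)


def violates_metric_axioms(d, points, tol=1e-12):
    """Return True if some metric axiom fails on the given finite sample.

    The tolerance tol absorbs floating-point rounding in the triangle inequality.
    """
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y) or d(x, y) != d(y, x):
            return True
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return True
    return False


random.seed(0)
sample = [random.uniform(-100, 100) for _ in range(30)]
print(violates_metric_axioms(d_std, sample))
print(violates_metric_axioms(d_bounded, sample))
print(max(d_bounded(x, y) for x in sample for y in sample))
```

The last printed number stays below 1, in agreement with the observation that under the nonstandard metric any two points are less than 1 unit apart.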
An important metric space is the n-dimensional euclidean space {\mathbb{R}}^n = {\mathbb{R}} \times {\mathbb{R}}\times \cdots \times {\mathbb{R}}. We use the following notation for
points: x =(x_1,x_2,\ldots,x_n) \in {\mathbb{R}}^n. We also simply write 0 \in {\mathbb{R}}^n to mean the vector (0,0,\ldots,0). Before making {\mathbb{R}}^n a metric space, let us prove
an important inequality, the so-called Cauchy-Schwarz inequality.
Exercises
Show that for any set X, the discrete metric (d(x,y) = 1 if x\not=y and d(x,x) = 0) does give a metric space (X,d).
Let X := \{ 0 \} be a set. Can you make it into a metric space?
Let X := \{ a, b \} be a set. Can you make it into two distinct metric spaces? (define two distinct metrics on it)
Let the set X := \{ A, B, C \} represent 3 buildings on campus. Suppose we wish our distance to be the time it takes to walk from one building to the other. It takes 5 minutes either way
between buildings A and B. However, building C is on a hill and it takes 10 minutes from A and 15 minutes from B to get to C. On the other hand it takes 5 minutes to go from C to A and 7
minutes to go from C to B, as we are going downhill. Do these distances define a metric? If so, prove it, if not, say why not.
Suppose (X,d) is a metric space and \varphi \colon [0,\infty) \to {\mathbb{R}} is a function such that \varphi(t) \geq 0 for all t and \varphi(t) = 0 if and only if t=0. Also suppose \varphi is
subadditive, that is, \varphi(s+t) \leq \varphi(s)+\varphi(t). Show that with d'(x,y) := \varphi\bigl(d(x,y)\bigr), we obtain a new metric space (X,d').
[exercise:mscross] Let (X,d_X) and (Y,d_Y) be metric spaces.
a) Show that (X \times Y,d) with d\bigl( (x_1,y_1), (x_2,y_2) \bigr) := d_X(x_1,x_2) + d_Y(y_1,y_2) is a metric space.
b) Show that (X \times Y,d) with d\bigl( (x_1,y_1), (x_2,y_2) \bigr) := \max \{ d_X(x_1,x_2) , d_Y(y_1,y_2) \} is a metric space.
Let X be the set of continuous functions on [0,1]. Let \varphi \colon [0,1] \to (0,\infty) be continuous. Define d(f,g) := \int_0^1 \left\lvert {f(x)-g(x)} \right\rvert\varphi(x)~dx . Show that (X,d)
is a metric space.
[exercise:mshausdorffpseudo] Let (X,d) be a metric space. For nonempty bounded subsets A and B let d(x,B) := \inf \{ d(x,b) : b \in B \} \qquad \text{and} \qquad d(A,B) := \sup \{ d(a,B) : a
\in A \} . Now define the Hausdorff metric as d_H(A,B) := \max \{ d(A,B) , d(B,A) \} . Note: d_H can be defined for arbitrary nonempty subsets if we allow the extended reals.
a) Let Y \subset {\mathcal{P}}(X) be the set of bounded nonempty subsets. Prove that (Y,d_H) is a so-called pseudometric space: d_H satisfies the metric properties [metric:pos],
[metric:com], [metric:triang], and further d_H(A,A) = 0 for all A \in Y.
b) Show by example that d itself is not symmetric, that is d(A,B) \not= d(B,A).
c) Find a metric space X and two different nonempty bounded subsets A and B such that d_H(A,B) = 0.
(0,\nicefrac{1}{2}) = (-\nicefrac{1}{2},\nicefrac{1}{2})\). The important thing to keep in mind is which metric space we are working in.
Let (X,d) be a metric space. A set V \subset X is open if for every x \in V, there exists a \delta > 0 such that B(x,\delta) \subset V. See . A set E \subset X is closed if the complement E^c = X
\setminus E is open. When the ambient space X is not clear from context we say V is open in X and E is closed in X.
If x \in V and V is open, then we say V is an open neighborhood of x (or sometimes just neighborhood).
Intuitively, an open set is a set that does not include its “boundary”: wherever we are in the set, we are allowed to “wiggle” a little bit and stay in the set. Note that not every set is either open
or closed; in fact, generally most subsets are neither.
The set [0,1) \subset {\mathbb{R}} is neither open nor closed. First, every ball in {\mathbb{R}} around 0, (-\delta,\delta), contains negative numbers and hence is not contained in [0,1) and so
[0,1) is not open. Second, every ball in {\mathbb{R}} around 1, (1-\delta,1+\delta) contains numbers strictly less than 1 and greater than 0 (e.g. 1-\nicefrac{\delta}{2} as long as \delta < 2).
Thus {\mathbb{R}}\setminus [0,1) is not open, and so [0,1) is not closed.
[prop:topology:open] Let (X,d) be a metric space.
i. [topology:openi] \emptyset and X are open in X.
ii. [topology:openii] If V_1, V_2, \ldots, V_k are open then \bigcap_{j=1}^k V_j is also open. That is, a finite intersection of open sets is open.
iii. [topology:openiii] If \{ V_\lambda \}_{\lambda \in I} is an arbitrary collection of open sets, then \bigcup_{\lambda \in I} V_\lambda is also open. That is, an arbitrary union of open sets is open.
Note that the index set in [topology:openiii] is arbitrarily large. By \bigcup_{\lambda \in I} V_\lambda we simply mean the set of all x such that x \in V_\lambda for at least one \lambda \in I.
The sets X and \emptyset are obviously open in X.
Let us prove [topology:openii]. If x \in \bigcap_{j=1}^k V_j, then x \in V_j for all j. As V_j are all open, for every j there exists a \delta_j > 0 such that B(x,\delta_j) \subset V_j. Take \delta :=
\min \{ \delta_1,\delta_2,\ldots,\delta_k \} and notice \delta > 0. We have B(x,\delta) \subset B(x,\delta_j) \subset V_j for every j and so B(x,\delta) \subset \bigcap_{j=1}^k V_j. Consequently
the intersection is open.
Let us prove [topology:openiii]. If x \in \bigcup_{\lambda \in I} V_\lambda, then x \in V_\lambda for some \lambda \in I. As V_\lambda is open, there exists a \delta > 0 such that B(x,\delta)
\subset V_\lambda. But then B(x,\delta) \subset \bigcup_{\lambda \in I} V_\lambda and so the union is open.
The main thing to notice is the difference between items [topology:openii] and [topology:openiii]. Item [topology:openii] is not true for an arbitrary intersection, for example
\bigcap_{n=1}^\infty (-\nicefrac{1}{n},\nicefrac{1}{n}) = \{ 0 \}, which is not open.
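The reason the intersection is only \{ 0 \} is that every nonzero x is eventually excluded: there is an n with \nicefrac{1}{n} \leq \left\lvert {x} \right\rvert. A small sketch, restricted to rational inputs so that the arithmetic is exact (the function name is hypothetical):

```python
from fractions import Fraction

def excluding_index(x):
    # Return the smallest n such that x is NOT in (-1/n, 1/n).
    x = Fraction(x)
    if x == 0:
        raise ValueError("0 lies in every interval (-1/n, 1/n)")
    n = 1
    while abs(x) < Fraction(1, n):
        n += 1
    return n

print(excluding_index(Fraction(1, 1000)))  # 1000
print(excluding_index(Fraction(-3, 7)))    # 3
```

No ball around 0 fits inside \{ 0 \}, which is why the intersection fails to be open.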
The proof of the following analogous proposition for closed sets is left as an exercise.
[prop:topology:closed] Let (X,d) be a metric space.
i. [topology:closedi] \emptyset and X are closed in X.
ii. [topology:closedii] If \{ E_\lambda \}_{\lambda \in I} is an arbitrary collection of closed sets, then \bigcap_{\lambda \in I} E_\lambda is also closed. That is, an arbitrary intersection of closed sets is
closed.
iii. [topology:closediii] If E_1, E_2, \ldots, E_k are closed then \bigcup_{j=1}^k E_j is also closed. That is, a finite union of closed sets is closed.
We have not yet shown that the open ball is open and the closed ball is closed. Let us show this fact now to justify the terminology.
[prop:topology:ballsopenclosed] Let (X,d) be a metric space, x \in X, and \delta > 0. Then B(x,\delta) is open and C(x,\delta) is closed.
Let y \in B(x,\delta). Let \alpha := \delta-d(x,y). Of course \alpha > 0. Now let z \in B(y,\alpha). Then d(x,z) \leq d(x,y) + d(y,z) < d(x,y) + \alpha = d(x,y) + \delta-d(x,y) = \delta . Therefore z
\in B(x,\delta) for every z \in B(y,\alpha). So B(y,\alpha) \subset B(x,\delta) and B(x,\delta) is open.
The proof that C(x,\delta) is closed is left as an exercise.
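The choice \alpha := \delta - d(x,y) in the proof can be tested numerically. The sketch below assumes X = {\mathbb{R}}^2 with the euclidean metric; it samples random y \in B(x,\delta) and z \in B(y,\alpha) and checks that z \in B(x,\delta). The sampling scheme, the safety factor 0.999 (to stay clear of rounding at the edge of a ball), and the function name are our own assumptions.

```python
import math
import random

def check_ball_in_ball(x, delta, trials=1000, seed=0):
    # Randomly test that B(y, alpha) is contained in B(x, delta),
    # where alpha = delta - d(x, y), in the euclidean plane.
    rng = random.Random(seed)
    for _ in range(trials):
        # Pick y in B(x, delta) using polar coordinates around x.
        r = 0.999 * delta * rng.random()
        t = rng.uniform(0.0, 2.0 * math.pi)
        y = (x[0] + r * math.cos(t), x[1] + r * math.sin(t))
        alpha = delta - math.dist(x, y)
        # Pick z in B(y, alpha) the same way.
        s = 0.999 * alpha * rng.random()
        u = rng.uniform(0.0, 2.0 * math.pi)
        z = (y[0] + s * math.cos(u), y[1] + s * math.sin(u))
        if not math.dist(x, z) < delta:
            return False
    return True

print(check_ball_in_ball((0.0, 0.0), 1.0))
```

Random sampling only illustrates the statement; the triangle inequality argument above is what proves it.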
Again be careful about what is the ambient metric space. As [0,\nicefrac{1}{2}) is an open ball in [0,1], this means that [0,\nicefrac{1}{2}) is an open set in [0,1]. On the other hand
[0,\nicefrac{1}{2}) is neither open nor closed in {\mathbb{R}}.
A useful way to think about an open set is as a union of open balls. If U is open, then for each x \in U, there is a \delta_x > 0 (depending on x) such that B(x,\delta_x) \subset U. Then U =
\bigcup_{x\in U} B(x,\delta_x).
The proof of the following proposition is left as an exercise. Note that there are many other open and closed sets in {\mathbb{R}}.
[prop:topology:intervals:openclosed] Let a < b be two real numbers. Then (a,b), (a,\infty), and (-\infty,b) are open in {\mathbb{R}}. Also [a,b], [a,\infty), and (-\infty,b] are closed in
{\mathbb{R}}.
Connected sets
A nonempty metric space (X,d) is connected if the only subsets of X that are both open and closed are \emptyset and X itself. If (X,d) is not connected we say it is disconnected.
When we apply the term connected to a nonempty subset A \subset X, we simply mean that A with the subspace topology is connected.
In other words, a nonempty X is connected if whenever we write X = X_1 \cup X_2 where X_1 \cap X_2 = \emptyset and X_1 and X_2 are open, then either X_1 = \emptyset or X_2 =
\emptyset. So to show X is disconnected, we need to find nonempty disjoint open sets X_1 and X_2 whose union is X. For subsets, we state this idea as a proposition.
Let (X,d) be a metric space. A nonempty set S \subset X is not connected if and only if there exist open sets U_1 and U_2 in X, such that U_1 \cap U_2 \cap S = \emptyset, U_1 \cap S \not=
\emptyset, U_2 \cap S \not= \emptyset, and S = \bigl( U_1 \cap S \bigr) \cup \bigl( U_2 \cap S \bigr) .
If U_j is open in X, then U_j \cap S is open in S in the subspace topology (with subspace metric). To see this, note that if B_X(x,\delta) \subset U_j, then as B_S(x,\delta) = S \cap
B_X(x,\delta), we have B_S(x,\delta) \subset U_j \cap S. So if U_1 and U_2 as above exist, then S is disconnected based on the discussion above.
The proof of the other direction follows by using [exercise:mssubspace] to find U_1 and U_2 from two open disjoint subsets of S.
Let S \subset {\mathbb{R}} be such that x < z < y with x,y \in S and z \notin S. Claim: S is not connected. Proof: Notice \bigl( (-\infty,z) \cap S \bigr) \cup \bigl( (z,\infty) \cap S \bigr) = S .
A nonempty set S \subset {\mathbb{R}} is connected if and only if it is an interval or a single point.
Suppose S is connected. If S is a single point then we are done. So suppose x < y and x,y \in S. If z is such that x < z < y, then (-\infty,z) \cap S is nonempty and (z,\infty) \cap S is nonempty.
The two sets are disjoint. As S is connected, we must have that their union is not S, so z \in S.
Suppose S is bounded, connected, but not a single point. Let \alpha := \inf \, S and \beta := \sup \, S and note that \alpha < \beta. Suppose \alpha < z < \beta. As \alpha is the infimum, then there
is an x \in S such that \alpha \leq x < z. Similarly there is a y \in S such that \beta \geq y > z. We have shown above that z \in S, so (\alpha,\beta) \subset S. If w < \alpha, then w \notin S as
\alpha was the infimum, similarly if w > \beta then w \notin S. Therefore the only possibilities for S are (\alpha,\beta), [\alpha,\beta), (\alpha,\beta], [\alpha,\beta].
Exercises
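Prove [prop:topology:closed]. Hint: consider the complements of the sets and apply [prop:topology:open].
Finish the proof of [prop:topology:ballsopenclosed] by proving that C(x,\delta) is closed.
Prove [prop:topology:intervals:openclosed].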
Suppose (X,d) is a nonempty metric space with the discrete topology. Show that X is connected if and only if it contains exactly one element.
Show that if S \subset {\mathbb{R}} is a connected unbounded set, then it is an (unbounded) interval.
Show that every open set can be written as a union of closed sets.
a) Show that E is closed if and only if \partial E \subset E. b) Show that U is open if and only if \partial U \cap U = \emptyset.
a) Show that A is open if and only if A^\circ = A. b) Suppose that U is an open set and U \subset A. Show that U \subset A^\circ.
Let X be a set and d, d' be two metrics on X. Suppose there exists an \alpha > 0 and \beta > 0 such that \alpha d(x,y) \leq d'(x,y) \leq \beta d(x,y) for all x,y \in X. Show that U is open in (X,d) if
and only if U is open in (X,d'). That is, the topologies of (X,d) and (X,d') are the same.
Suppose \{ S_i \}, i \in {\mathbb{N}} is a collection of connected subsets of a metric space (X,d). Suppose there exists an x \in X such that x \in S_i for all i \in {\mathbb{N}}. Show that
\bigcup_{i=1}^\infty S_i is connected.
Let A be a connected set. a) Is \overline{A} connected? Prove or find a counterexample. b) Is A^\circ connected? Prove or find a counterexample. Hint: Think of sets in {\mathbb{R}}^2.
The definition of open sets in the following exercise is usually called the subspace topology. You are asked to show that we obtain the same topology by considering the subspace metric.
[exercise:mssubspace] Suppose (X,d) is a metric space and Y \subset X. Show that with the subspace metric on Y, a set U \subset Y is open (in Y) whenever there exists an open set V \subset
X such that U = V \cap Y.
{n}} = \epsilon .\] The sequence \{ x_j \} converges to y \in {\mathbb{R}}^n and we are done.
Convergence and topology
The topology, that is, the set of open sets of a space, encodes which sequences converge.
[prop:msconvtopo] Let (X,d) be a metric space and \{x_n\} a sequence in X. Then \{ x_n \} converges to x \in X if and only if for every open neighborhood U of x, there exists an M \in
{\mathbb{N}} such that for all n \geq M we have x_n \in U.
First suppose \{ x_n \} converges. Let U be an open neighborhood of x, then there exists an \epsilon > 0 such that B(x,\epsilon) \subset U. As the sequence converges, find an M \in
{\mathbb{N}} such that for all n \geq M we have d(x,x_n) < \epsilon or in other words x_n \in B(x,\epsilon) \subset U.
Let us prove the other direction. Given \epsilon > 0 let U := B(x,\epsilon) be the neighborhood of x. Then there is an M \in {\mathbb{N}} such that for n \geq M we have x_n \in U =
B(x,\epsilon) or in other words, d(x,x_n) < \epsilon.
A set is closed when it contains the limits of its convergent sequences.
[prop:msclosedlim] Let (X,d) be a metric space, E \subset X a closed set and \{ x_n \} a sequence in E that converges to some x \in X. Then x \in E.
Let us prove the contrapositive. Suppose \{ x_n \} is a sequence in X that converges to x \in E^c. As E^c is open, [prop:msconvtopo] says there is an M such that for all n \geq M, x_n \in E^c. So \{ x_n \} is not a
sequence in E.
When we take a closure of a set A, we really throw in precisely those points that are limits of sequences in A.
[prop:msclosureapprseq] Let (X,d) be a metric space and A \subset X. Then x \in \overline{A} if and only if there exists a sequence \{ x_n \} of elements in A such that \lim\, x_n = x.
Let x \in \overline{A}. We know by that given \nicefrac{1}{n}, there exists a point x_n \in B(x,\nicefrac{1}{n}) \cap A. As d(x,x_n) < \nicefrac{1}{n}, we have \lim\, x_n = x.
For the other direction, see .
Exercises
\) is complete the sequence converges; there exists an y_k \in {\mathbb{R}} such that y_k = \lim_{j\to\infty} x_{j,k}.
Write y = (y_1,y_2,\ldots,y_n) \in {\mathbb{R}}^n. By we have that \{ x_j \} converges to y \in {\mathbb{R}}^n and hence {\mathbb{R}}^n is complete.
Note that a subset of {\mathbb{R}}^n with the subspace metric need not be complete. For example, (0,1] with the subspace metric is not complete as \{ \nicefrac{1}{n} \} is a Cauchy
sequence in (0,1] with no limit in (0,1]. But see also .
Compactness
Let (X,d) be a metric space and K \subset X. The set K is said to be compact if for any collection of open sets \{ U_{\lambda} \}_{\lambda \in I} such that K \subset \bigcup_{\lambda \in I}
U_\lambda , there exists a finite subset \{ \lambda_1, \lambda_2,\ldots,\lambda_k \} \subset I such that K \subset \bigcup_{j=1}^k U_{\lambda_j} .
A collection of open sets \{ U_{\lambda} \}_{\lambda \in I} as above is said to be an open cover of K. So a way to say that K is compact is to say that every open cover of K has a finite
subcover.
Let (X,d) be a metric space. A compact set K \subset X is closed and bounded.
First, we prove that a compact set is bounded. Fix p \in X. We have the open cover K \subset \bigcup_{n=1}^\infty B(p,n) = X . If K is compact, then there exists some set of indices n_1 < n_2
< \ldots < n_k such that K \subset \bigcup_{j=1}^k B(p,n_j) = B(p,n_k) . As K is contained in a ball, K is bounded.
Next, we show a set that is not closed is not compact. Suppose \overline{K} \not= K, that is, there is a point x \in \overline{K} \setminus K. If y \not= x, then for n with \nicefrac{1}{n} <
d(x,y) we have y \notin C(x,\nicefrac{1}{n}). Furthermore x \notin K, so K \subset \bigcup_{n=1}^\infty {C(x,\nicefrac{1}{n})}^c . As a closed ball is closed, {C(x,\nicefrac{1}{n})}^c is
open, and so we have an open cover. If we take any finite collection of indices n_1 < n_2 < \ldots < n_k, then \bigcup_{j=1}^k {C(x,\nicefrac{1}{n_j})}^c = {C(x,\nicefrac{1}{n_k})}^c . As x
is in the closure, C(x,\nicefrac{1}{n_k}) \cap K \not= \emptyset. So there is no finite subcover and K is not compact.
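The second half of the proof can be made concrete. As an assumed example, take K = (0,1] \subset {\mathbb{R}} and x = 0, so that the cover consists of the sets U_n := {C(0,\nicefrac{1}{n})}^c = \{ y : \left\lvert {y} \right\rvert > \nicefrac{1}{n} \}. The sketch below, with hypothetical function names, produces for any finite list of indices a point of K that the corresponding sets miss.

```python
from fractions import Fraction

def in_U(n, y):
    # Membership in U_n, the complement of the closed ball C(0, 1/n) in R.
    return abs(y) > Fraction(1, n)

def uncovered_point(indices):
    # For finitely many indices, the union of the U_n equals U_{n_k}
    # with n_k the largest index, and the point 1/n_k of K = (0, 1]
    # is not in it.
    n_k = max(indices)
    return Fraction(1, n_k)

indices = [1, 5, 12]
y = uncovered_point(indices)
print(y)                                   # 1/12
print(any(in_U(n, y) for n in indices))    # False
print(in_U(13, y))                         # True, a later set does cover y
```

Each point of K lies in some U_n, yet every finite subfamily misses a point of K, so this cover has no finite subcover.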
We prove below that in finite dimensional euclidean space every closed bounded set is compact. So closed bounded sets of {\mathbb{R}}^n are examples of compact sets. It is not true that in
every metric space, closed and bounded is equivalent to compact. A simple example would be an incomplete metric space such as (0,1) with the subspace metric. But there are many complete
and very useful metric spaces where closed and bounded is not enough to give compactness, see : C([a,b],{\mathbb{R}}) is a complete metric space, but the closed unit ball C(0,1) is not
compact. However, see .
A useful property of compact sets in a metric space is that every sequence has a convergent subsequence. Such sets are sometimes called sequentially compact. Let us prove that in the context
of metric spaces, a set is compact if and only if it is sequentially compact. First we prove a lemma.
Exercises
Let (X,d) be a metric space and A a finite subset of X. Show that A is compact.
Let A = \{ \nicefrac{1}{n} : n \in {\mathbb{N}}\} \subset {\mathbb{R}}. a) Show that A is not compact directly using the definition. b) Show that A \cup \{ 0 \} is compact directly using the
definition.
Let (X,d) be a metric space with the discrete metric. a) Prove that X is complete. b) Prove that X is compact if and only if X is a finite set.
a) Show that the union of finitely many compact sets is a compact set. b) Find an example where the union of infinitely many compact sets is not compact.
Prove for arbitrary dimension. Hint: The trick is to use the correct notation.
Show that a compact set K is a complete metric space (using the subspace metric).
[exercise:CabRcomplete] Let C([a,b],{\mathbb{R}}) be the metric space as in . Show that C([a,b],{\mathbb{R}}) is a complete metric space.
[exercise:msclbounnotcompt] Let C([0,1],{\mathbb{R}}) be the metric space of . Let 0 denote the zero function. Then show that the closed ball C(0,1) is not compact (even though it is closed
and bounded). Hints: Construct a sequence of distinct continuous functions \{ f_n \} such that d(f_n,0) = 1 and d(f_n,f_k) = 1 for all n \not= k. Show that the set \{ f_n : n \in {\mathbb{N}}\}
\subset C(0,1) is closed but not compact. See for inspiration.
Show that there exists a metric on {\mathbb{R}} that makes {\mathbb{R}} into a compact set.
Suppose (X,d) is complete and suppose we have a countably infinite collection of nonempty compact sets E_1 \supset E_2 \supset E_3 \supset \cdots then prove \bigcap_{j=1}^\infty E_j \not=
\emptyset.
Let C([0,1],{\mathbb{R}}) be the metric space of . Let K be the set of f \in C([0,1],{\mathbb{R}}) such that f is equal to a quadratic polynomial, i.e. f(x) = a+bx+cx^2, and such that \left\lvert
{f(x)} \right\rvert \leq 1 for all x \in [0,1], that is f \in C(0,1). Show that K is compact.
[exercise:mstotbound] Let (X,d) be a complete metric space. Show that K \subset X is compact if and only if K is closed and such that for every \epsilon > 0 there exists a finite set of points
x_1,x_2,\ldots,x_n with K \subset \bigcup_{j=1}^n B(x_j,\epsilon). Note: Such a set K is said to be totally bounded, so in a complete metric space a set is compact if and only if it is closed and
totally bounded.
Continuous functions
Note: 1 lecture
Continuity
Let (X,d_X) and (Y,d_Y) be metric spaces and c \in X. Then f \colon X \to Y is continuous at c if for every \epsilon > 0 there is a \delta > 0 such that whenever x \in X and d_X(x,c) < \delta,
then d_Y\bigl(f(x),f(c)\bigr) < \epsilon.
When f \colon X \to Y is continuous at all c \in X, then we simply say that f is a continuous function.
The definition agrees with the definition from when f is a real-valued function on the real line, if we take the standard metric on {\mathbb{R}}.
[prop:contiscont] Let (X,d_X) and (Y,d_Y) be metric spaces. Then f \colon X \to Y is continuous at c \in X if and only if for every sequence \{ x_n \} in X converging to c, the sequence \{
f(x_n) \} converges to f(c).
Suppose f is continuous at c. Let \{ x_n \} be a sequence in X converging to c. Given \epsilon > 0, there is a \delta > 0 such that d_X(x,c) < \delta implies d_Y\bigl(f(x),f(c)\bigr) < \epsilon. So
take M such that for all n \geq M, we have d_X(x_n,c) < \delta, then d_Y\bigl(f(x_n),f(c)\bigr) < \epsilon. Hence \{ f(x_n) \} converges to f(c).
On the other hand suppose f is not continuous at c. Then there exists an \epsilon > 0, such that for every n \in {\mathbb{N}} there exists an x_n \in X, with d_X(x_n,c) < \nicefrac{1}{n} such
that d_Y\bigl(f(x_n),f(c)\bigr) \geq \epsilon. Then \{ x_n \} converges to c, but \{ f(x_n) \} does not converge to f(c).
Suppose f \colon {\mathbb{R}}^2 \to {\mathbb{R}} is a polynomial. That is, f(x,y) = \sum_{j=0}^d \sum_{k=0}^{d-j} a_{jk}\,x^jy^k = a_{0\,0} + a_{1\,0} \, x + a_{0\,1} \, y+ a_{2\,0} \,
x^2+ a_{1\,1} \, xy+ a_{0\,2} \, y^2+ \cdots + a_{0\,d} \, y^d , for some d \in {\mathbb{N}} (the degree) and a_{jk} \in {\mathbb{R}}. Then we claim f is continuous. Let \{ (x_n,y_n)
\}_{n=1}^\infty be a sequence in {\mathbb{R}}^2 that converges to (x,y) \in {\mathbb{R}}^2. We have proved that this means that \lim\, x_n = x and \lim\, y_n = y. So by we have
\lim_{n\to\infty} f(x_n,y_n) = \lim_{n\to\infty} \sum_{j=0}^d \sum_{k=0}^{d-j} a_{jk} \, x_n^jy_n^k = \sum_{j=0}^d \sum_{k=0}^{d-j} a_{jk} \, x^jy^k = f(x,y) . So f is continuous at
(x,y), and as (x,y) was arbitrary f is continuous everywhere. Similarly, a polynomial in n variables is continuous.
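The sequential argument can be watched numerically. The sketch below uses a hypothetical polynomial f(x,y) = 1 + 2x - xy + 3y^2 and a hypothetical sequence (x_n,y_n) = (1+\nicefrac{1}{n}, 2-\nicefrac{1}{n^2}) converging to (1,2); neither comes from the text.

```python
def poly(x, y):
    # Assumed example polynomial of degree 2 in two variables.
    return 1 + 2 * x - x * y + 3 * y ** 2

limit_point = (1.0, 2.0)

# A sequence in R^2 converging to (1, 2), sampled at a few indices n.
seq = [(1 + 1 / n, 2 - 1 / n ** 2) for n in (1, 10, 100, 1000, 10000)]

# Distance from f(x_n, y_n) to f(1, 2) = 13 along the sequence.
errors = [abs(poly(xn, yn) - poly(*limit_point)) for xn, yn in seq]
for err in errors:
    print(err)
```

The printed errors shrink toward 0, as the limit computation above predicts.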
Compactness and continuity
Continuous maps do not necessarily map closed sets to closed sets. For example, f \colon (0,1) \to {\mathbb{R}} defined by f(x) := x takes the set (0,1), which is closed in (0,1), to the set (0,1), which is
not closed in {\mathbb{R}}. On the other hand continuous maps do preserve compact sets.
[lemma:continuouscompact] Let (X,d_X) and (Y,d_Y) be metric spaces and f \colon X \to Y a continuous function. If K \subset X is a compact set, then f(K) is a compact set.
A sequence in f(K) can be written as \{ f(x_n) \}_{n=1}^\infty, where \{ x_n \}_{n=1}^\infty is a sequence in K. The set K is compact and therefore there is a subsequence \{ x_{n_i}
\}_{i=1}^\infty that converges to some x \in K. By continuity, \lim_{i\to\infty} f(x_{n_i}) = f(x) \in f(K) . So every sequence in f(K) has a subsequence convergent to a point in f(K), and f(K)
is compact by .
As before, f \colon X \to {\mathbb{R}} achieves an absolute minimum at c \in X if f(x) \geq f(c) \qquad \text{ for all $x \in X$.} On the other hand, f achieves an absolute maximum at c \in X
if f(x) \leq f(c) \qquad \text{ for all $x \in X$.}
Let (X,d) be a compact metric space and f \colon X \to {\mathbb{R}} a continuous function. Then f is bounded and in fact f achieves an absolute minimum and an absolute maximum on X.
As X is compact and f is continuous, we have that f(X) \subset {\mathbb{R}} is compact. Hence f(X) is closed and bounded. In particular, \sup f(X) \in f(X) and \inf f(X) \in f(X), because both
the sup and inf can be achieved by sequences in f(X) and f(X) is closed. Therefore there is some x \in X such that f(x) = \sup f(X) and some y \in X such that f(y) = \inf f(X).
Continuity and topology
Let us see how to define continuity in terms of the topology, that is, the open sets. We have already seen that topology determines which sequences converge, and so it is no wonder that the
topology also determines continuity of functions.
[lemma:mstopocontloc] Let (X,d_X) and (Y,d_Y) be metric spaces. A function f \colon X \to Y is continuous at c \in X if and only if for every open neighborhood U of f(c) in Y, the set f^{-1}
(U) contains an open neighborhood of c in X.
First suppose that f is continuous at c. Let U be an open neighborhood of f(c) in Y, then B_Y\bigl(f(c),\epsilon\bigr) \subset U for some \epsilon > 0. By continuity of f, there exists a \delta > 0
such that whenever x is such that d_X(x,c) < \delta, then d_Y\bigl(f(x),f(c)\bigr) < \epsilon. In other words, B_X(c,\delta) \subset f^{-1}\bigl(B_Y\bigl(f(c),\epsilon\bigr)\bigr) \subset f^{-1}
(U) , and B_X(c,\delta) is an open neighborhood of c.
For the other direction, let \epsilon > 0 be given. If f^{-1}\bigl(B_Y\bigl(f(c),\epsilon\bigr)\bigr) contains an open neighborhood W of c, it contains a ball. That is, there is some \delta > 0 such
that B_X(c,\delta) \subset W \subset f^{-1}\bigl(B_Y\bigl(f(c),\epsilon\bigr)\bigr) . That means precisely that if d_X(x,c) < \delta then d_Y\bigl(f(x),f(c)\bigr) < \epsilon, and so f is continuous
at c.
[thm:mstopocont] Let (X,d_X) and (Y,d_Y) be metric spaces. A function f \colon X \to Y is continuous if and only if for every open U \subset Y, f^{-1}(U) is open in X.
The proof follows from [lemma:mstopocontloc] and is left as an exercise.
Let f \colon X \to Y be a continuous function. [thm:mstopocont] tells us that if E \subset Y is closed, then f^{-1}(E) = X \setminus f^{-1}(E^c) is also closed. Therefore if we have a continuous function f \colon
X \to {\mathbb{R}}, then the zero set of f, that is, f^{-1}(0) = \{ x \in X : f(x) = 0 \}, is closed. We have just proved the most basic result in algebraic geometry, the study of zero sets of
polynomials.
Similarly the set where f is nonnegative, that is, f^{-1}\bigl( [0,\infty) \bigr) = \{ x \in X : f(x) \geq 0 \} is closed. On the other hand the set where f is positive, f^{-1}\bigl( (0,\infty) \bigr) = \{
x \in X : f(x) > 0 \} is open.
Uniform continuity
As for continuous functions on the real line, in the definition of continuity it is sometimes convenient to be able to pick one \delta for all points.
Let (X,d_X) and (Y,d_Y) be metric spaces. Then f \colon X \to Y is uniformly continuous if for every \epsilon > 0 there is a \delta > 0 such that whenever x,c \in X and d_X(x,c) < \delta, then
d_Y\bigl(f(x),f(c)\bigr) < \epsilon.
A uniformly continuous function is continuous, but not necessarily vice-versa as we have seen.
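One standard illustration of a continuous function that is not uniformly continuous (our choice of example, not necessarily the one the text refers to) is f(x) := \nicefrac{1}{x} on (0,1). The points \nicefrac{1}{n} and \nicefrac{1}{n+1} get arbitrarily close, yet their images always differ by exactly 1, so no single \delta works for \epsilon = 1. A sketch with exact rational arithmetic:

```python
from fractions import Fraction

def f(x):
    # Assumed example: f(x) = 1/x, continuous on (0, 1).
    return 1 / x

def witness(n):
    # Return (|x - y|, |f(x) - f(y)|) for x = 1/n, y = 1/(n+1), n >= 2.
    x, y = Fraction(1, n), Fraction(1, n + 1)
    return abs(x - y), abs(f(x) - f(y))

for n in (2, 10, 100):
    gap, jump = witness(n)
    print(n, gap, jump)
```

The gap equals \nicefrac{1}{n(n+1)} and tends to 0, while the jump stays equal to 1.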
[thm:Xcompactfunifcont] Let (X,d_X) and (Y,d_Y) be metric spaces. Suppose f \colon X \to Y is continuous and X compact. Then f is uniformly continuous.
Let \epsilon > 0 be given. For each c \in X, pick \delta_c > 0 such that d_Y\bigl(f(x),f(c)\bigr) < \nicefrac{\epsilon}{2} whenever d_X(x,c) < \delta_c. The balls B(c,\delta_c) cover X, and the
space X is compact. Apply the Lebesgue covering lemma to obtain a \delta > 0 such that for every x \in X, there is a c \in X for which B(x,\delta) \subset B(c,\delta_c).
If x_1, x_2 \in X where d_X(x_1,x_2) < \delta, find a c \in X such that B(x_1,\delta) \subset B(c,\delta_c). Then x_2 \in B(c,\delta_c). By the triangle inequality and the definition of \delta_c
we have d_Y\bigl(f(x_1),f(x_2)\bigr) \leq d_Y\bigl(f(x_1),f(c)\bigr) + d_Y\bigl(f(c),f(x_2)\bigr) < \nicefrac{\epsilon}{2}+ \nicefrac{\epsilon}{2} = \epsilon .
Exercises
Consider {\mathbb{N}}\subset {\mathbb{R}} with the standard metric. Let (X,d) be a metric space and f \colon X \to {\mathbb{N}} a continuous function. a) Prove that if X is connected,
then f is constant (the range of f is a single value). b) Find an example where X is disconnected and f is not constant.
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}} be defined by f(0,0) := 0, and f(x,y) := \frac{xy}{x^2+y^2} if (x,y) \not= (0,0). a) Show that for any fixed x, the function that takes y to f(x,y) is
continuous. Similarly for any fixed y, the function that takes x to f(x,y) is continuous. b) Show that f is not continuous.
Suppose that f \colon X \to Y is continuous for metric spaces (X,d_X) and (Y,d_Y). Let A \subset X. a) Show that f(\overline{A}) \subset \overline{f(A)}. b) Show that the subset can be
proper.
Prove [thm:mstopocont]. Hint: Use [lemma:mstopocontloc].
[exercise:msconnconn] Suppose f \colon X \to Y is continuous for metric spaces (X,d_X) and (Y,d_Y). Show that if X is connected, then f(X) is connected.
Prove the following version of the intermediate value theorem. Let (X,d) be a connected metric space and f \colon X \to {\mathbb{R}} a continuous function. Suppose that there exist x_0,x_1 \in X and y \in
{\mathbb{R}} such that f(x_0) < y < f(x_1). Then prove that there exists a z \in X such that f(z) = y. Hint: See .
A continuous function f \colon X \to Y for metric spaces (X,d_X) and (Y,d_Y) is said to be proper if for every compact set K \subset Y, the set f^{-1}(K) is compact. Suppose a continuous f
\colon (0,1) \to (0,1) is proper and \{ x_n \} is a sequence in (0,1) that converges to 0. Show that \{ f(x_n) \} has no subsequence that converges in (0,1).
Let (X,d_X) and (Y,d_Y) be metric spaces and f \colon X \to Y be a one-to-one and onto continuous function. Suppose X is compact. Prove that the inverse f^{-1} \colon Y \to X is continuous.
Take the metric space of continuous functions C([0,1],{\mathbb{R}}). Let k \colon [0,1] \times [0,1] \to {\mathbb{R}} be a continuous function. Given f \in C([0,1],{\mathbb{R}}) define
\varphi_f(x) := \int_0^1 k(x,y) f(y) ~dy . a) Show that T(f) := \varphi_f defines a function T \colon C([0,1],{\mathbb{R}}) \to C([0,1],{\mathbb{R}}).
b) Show that T is continuous.
Let (X,d) be a metric space.
a) If p \in X, show that f \colon X \to {\mathbb{R}} defined by f(x) := d(x,p) is continuous.
b) Define a metric on X \times X as in [exercise:mscross] part b, and show that g \colon X \times X \to {\mathbb{R}} defined by g(x,y) := d(x,y) is continuous.
c) Show that if K_1 and K_2 are compact subsets of X, then there exists a p \in K_1 and q \in K_2 such that d(p,q) is minimal, that is, d(p,q) = \inf \{ d(x,y) \colon x \in K_1, y \in K_2 \}.
Exercises
For more exercises related to Picard’s theorem see .
Let F \colon {\mathbb{R}}\to {\mathbb{R}} be defined by F(x) := kx + b where 0 < k < 1, b \in {\mathbb{R}}.
a) Show that F is a contraction.
b) Find the fixed point and show directly that it is unique.
Let f \colon [0,\nicefrac{1}{4}] \to [0,\nicefrac{1}{4}] be defined by f(x) := x^2.
a) Show that f is a contraction, and find the best (smallest) k from the definition that works.
b) Find the fixed point and show directly that it is unique.
[exercise:nofixedpoint] a) Find an example of a contraction f \colon X \to X of a non-complete metric space X with no fixed point. b) Find a 1-Lipschitz map f \colon X \to X of a complete
metric space X with no fixed point.
Consider y' =y^2, y(0)=1. Use the iteration scheme from the proof of the contraction mapping principle. Start with f_0(x) = 1. Find a few iterates (at least up to f_2). Prove that the pointwise
limit of f_n is \frac{1}{1-x}, that is for every x with \left\lvert {x} \right\rvert < h for some h > 0, prove that \lim\limits_{n\to\infty}f_n(x) = \frac{1}{1-x}.
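The iterates in the exercise above are polynomials, so they can be generated mechanically. The sketch below assumes the iteration takes the usual Picard form f_{n+1}(x) := 1 + \int_0^x {\bigl(f_n(s)\bigr)}^2 ~ds for y' = y^2, y(0) = 1, and stores a polynomial as its list of coefficients (constant term first). The helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists.
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate_from_zero(p):
    # Integral from 0 to x of sum a_k s^k ds = sum a_k x^(k+1) / (k+1).
    return [Fraction(0)] + [a / (k + 1) for k, a in enumerate(p)]

def picard_step(p):
    # Assumed iteration: f_{n+1}(x) = 1 + integral_0^x f_n(s)^2 ds.
    q = integrate_from_zero(poly_mul(p, p))
    q[0] += 1
    return q

f0 = [Fraction(1)]          # f_0(x) = 1
f1 = picard_step(f0)        # coefficients of f_1
f2 = picard_step(f1)        # coefficients of f_2
print(f1)
print(f2)
```

Comparing the coefficients with the geometric series 1 + x + x^2 + \cdots of \frac{1}{1-x} suggests what to prove, but the code does not establish the limit.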
Suppose f \colon X \to X is a contraction for k < 1. Suppose you use the iteration procedure with x_{n+1} := f(x_n) as in the proof of the fixed point theorem. Suppose x is the fixed point of f.
a) Show that d(x,x_n) \leq k^n d(x_1,x_0) \frac{1}{1-k} for all n \in {\mathbb{N}}.
b) Suppose d(y_1,y_2) \leq 16 for all y_1,y_2 \in X, and k= \nicefrac{1}{2}. Find an N such that starting at any point x_0 \in X, d(x,x_n) \leq 2^{-16} for all n \geq N.
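The estimate in part a) can be checked on a concrete contraction. The sketch assumes the hypothetical map f(x) := \nicefrac{x}{2} + 1 on {\mathbb{R}}, which is a contraction with k = \nicefrac{1}{2} and fixed point 2, and starts at x_0 = 10; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def f(x):
    # Assumed example: a contraction on R with k = 1/2 and fixed point 2.
    return x / 2 + 1

k = Fraction(1, 2)
fixed = Fraction(2)

# Iterate x_{n+1} := f(x_n) starting from x_0 = 10.
xs = [Fraction(10)]
for _ in range(20):
    xs.append(f(xs[-1]))

# Compare d(x, x_n) with the bound k^n d(x_1, x_0) / (1 - k).
d10 = abs(xs[1] - xs[0])
bounds_hold = all(
    abs(fixed - xs[n]) <= k ** n * d10 / (1 - k)
    for n in range(len(xs))
)
print(bounds_hold)
```

For this particular map the bound happens to hold with equality, so it cannot be improved in general.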
Let f(x) := x-\frac{x^2-2}{2x}. (You may recognize Newton’s method for \sqrt{2})
a) Prove f\bigl([1,\infty)\bigr) \subset [1,\infty).
b) Prove that f \colon [1,\infty) \to [1,\infty) is a contraction.
c) Apply the fixed point theorem to find an x \geq 1 such that f(x) = x, and show that x = \sqrt{2}.
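The iteration x_{n+1} := f(x_n) from this exercise is easy to run. A minimal floating point sketch, starting from the assumed initial point x_0 = 1:

```python
import math

def f(x):
    # Newton's method step for x^2 - 2 = 0.
    return x - (x * x - 2) / (2 * x)

x = 1.0
iterates = [x]
for _ in range(6):
    x = f(x)
    iterates.append(x)

for value in iterates:
    print(value)
print(abs(iterates[-1] - math.sqrt(2)))
```

The iterates stay in [1,\infty) and settle near \sqrt{2} after a handful of steps, far faster than the contraction estimate alone would guarantee.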
Suppose f \colon X \to X is a contraction, and (X,d) is a metric space with the discrete metric, that is d(x,y) = 1 whenever x \not= y. Show that f is constant, that is, there exists a c \in X such
that f(x) = c for all x \in X.
And we call $\underline{\int}$ the and $\overline{\int}$ the . Finally, if
\[ \int_a^b f(x)\,d\alpha(x). \tag{6.6.2} \]
When we set α(x) := x we recover the Riemann integral. The notation dα suggests a derivative; in this case α′(x) = 1 and, as
we said, the Riemann integral is when all points are weighted equally.
If α(x) := x, then a bounded function f : [a, b] → R is Riemann integrable if and only if it is Riemann-Stieltjes integrable
with respect to α. In this case
\[ \int_a^b f = \int_a^b f\,d\alpha. \tag{6.6.3} \]
Simply plug in α(x) = x into the definition and note that the definition is now precisely the same as for the Riemann integral.
Suppose that f : [a, b] → R is continuous. Given c ∈ (a, b), let
\[ \alpha(x) := \begin{cases} 1 & \text{if } x \geq c, \\ 0 & \text{if } x < c. \end{cases} \tag{6.6.4} \]
Proof: Given ϵ > 0 take δ > 0 such that |f(x) − f(c)| < ϵ for all x ∈ [a, b] with |x − c| < δ. Take the partition
P = {a, c − δ, c + δ, b}. Then
> f (c) − ϵ.
The notion of integrability really does depend on α. For a very trivial example, it is not difficult to see that if α(x) = 0,
then all bounded functions f on [a, b] are integrable with respect to this α and
\[ \int_a^b f\,d\alpha = 0. \tag{6.6.7} \]
If α is very nice, we can recover the Riemann-Stieltjes integral using the Riemann integral.
Suppose that f : [a, b] → R is Riemann integrable and α : [a, b] → R is a continuously differentiable increasing function.
Then f is Riemann-Stieltjes integrable with respect to α and
\[ \int_a^b f(x)\,d\alpha(x) = \int_a^b f(x)\,\alpha'(x)\,dx. \tag{6.6.8} \]
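As a numerical illustration of (6.6.8), the following Python sketch compares a tagged Riemann-Stieltjes sum with an ordinary Riemann sum of f α′. The choices f = cos and α(x) = x² on [0, 1], the midpoint tags, and all function names are assumptions made for this example only; it is a sanity check, not a proof.

```python
import math

def riemann_stieltjes_sum(f, alpha, a, b, n):
    """Sum of f(c_i) * (alpha(x_i) - alpha(x_{i-1})) on a uniform partition, tagged at midpoints."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * h
        right = left + h
        mid = 0.5 * (left + right)
        total += f(mid) * (alpha(right) - alpha(left))
    return total

def riemann_sum(g, a, b, n):
    """Ordinary midpoint Riemann sum of g on a uniform partition."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = math.cos
alpha = lambda x: x * x          # continuously differentiable and increasing on [0, 1]
alpha_prime = lambda x: 2 * x

lhs = riemann_stieltjes_sum(f, alpha, 0.0, 1.0, 2000)
rhs = riemann_sum(lambda x: f(x) * alpha_prime(x), 0.0, 1.0, 2000)
print(lhs, rhs)  # the two sums should nearly agree
```

Both sums approximate the same number because α(x_i) − α(x_{i−1}) is close to α′(c_i)(x_i − x_{i−1}) on a fine partition.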
FIXME
Exercises
Directly from the definition of the Riemann-Stieltjes integral prove that if α(x) = px for some p ≥ 0 and f is Riemann
integrable, then f is Riemann-Stieltjes integrable with respect to α and $p \int_a^b f = \int_a^b f\,d\alpha$.
Let α : [a, b] → R and β : [a, b] → R be increasing functions and suppose that α(x) = β(x) + C for some constant C. If
f : [a, b] → R is integrable with respect to α, show that it is integrable with respect to β and $\int_a^b f\,d\alpha = \int_a^b f\,d\beta$.
8.1: Metric Spaces
As mentioned in the introduction, the main idea in analysis is to take limits. We first learned to take limits of sequences of real
numbers, and then we learned to take limits of functions as a real number approached some other real number.
We want to take limits in more complicated contexts. For example, we might want to have sequences of points in 3-
dimensional space. Or perhaps we wish to define continuous functions of several variables. We might even want to define
functions on spaces that are a little harder to describe, such as the surface of the earth. We still want to talk about limits there.
Finally, we have also seen the limit of a sequence of functions. We wish to unify all these notions so that we do not have to
reprove theorems over and over again in each context. The concept of a metric space is an elementary yet powerful tool in
analysis. And while it is not sufficient to describe every type of limit we can find in modern analysis, it gets us very far indeed.
The geometric idea is that d is the distance between two points. All items of the definition other than the triangle inequality
have an obvious geometric interpretation: distance is always nonnegative, the only point that is distance 0 away from x is x
itself, and finally that the distance from x to y is the same as the distance from y to x. The triangle inequality says that the
distance from x to z is never more than the distance from x to y plus the distance from y to z.
For the purposes of drawing, it is convenient to draw figures and diagrams in the plane and have the metric be the standard
distance. However, that is only one particular metric space. Just because a certain fact seems to be clear from drawing a picture
does not mean it is true. You might be getting sidetracked by intuition from euclidean geometry, whereas the concept of a
metric space is a lot more general.
Let us give some examples of metric spaces.
The set of real numbers R is a metric space with the metric
d(x, y) := |x − y| . (8.1.1)
All items of the definition other than the triangle inequality are easy to verify. The triangle inequality follows
immediately from the standard triangle inequality for real numbers:
d(x, z) = |x − z| = |x − y + y − z| ≤ |x − y| + |y − z| = d(x, y) + d(y, z). (8.1.2)
This metric is the standard metric on R . If we talk about R as a metric space without mentioning a specific metric, we mean
this particular metric.
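The verification above can be spot-checked on a finite sample. The Python sketch below tests nonnegativity, the zero-distance condition, symmetry, and the triangle inequality for a candidate distance function; the helper name `check_metric_axioms` and the sample points are assumptions for illustration, and passing on a sample is of course not a proof.

```python
from itertools import product

def check_metric_axioms(d, points, tol=1e-12):
    """Return the list of axiom violations of d found on the finite sample `points`."""
    violations = []
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            violations.append(("nonnegativity", x, y))
        if (d(x, y) == 0) != (x == y):
            violations.append(("zero iff equal", x, y))
        if d(x, y) != d(y, x):
            violations.append(("symmetry", x, y))
    for x, y, z in product(points, repeat=3):
        # small tolerance guards against floating point rounding
        if d(x, z) > d(x, y) + d(y, z) + tol:
            violations.append(("triangle", x, y, z))
    return violations

standard = lambda x, y: abs(x - y)        # the standard metric on R
squared = lambda x, y: (x - y) ** 2       # not a metric: the triangle inequality fails

sample = [-2.5, -1.0, 0.0, 0.5, 3.0]
print(check_metric_axioms(standard, sample))      # no violations expected on this sample
print(check_metric_axioms(squared, [0, 1, 2])[:1])
```

The second call shows how a plausible-looking distance can fail: for the squared difference, the distance from 0 to 2 is 4 while going through 1 costs only 2.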
We can also put a different metric on the set of real numbers. For example take the set of real numbers R together with the
metric
\[ d(x, y) := \frac{|x - y|}{|x - y| + 1}. \tag{8.1.3} \]
All items other than the triangle inequality are again easy to verify. The triangle inequality is a little bit more difficult.
Note that $d(x, y) = \varphi(|x - y|)$ where $\varphi(t) = \frac{t}{t+1}$ and note that φ is an increasing function (positive derivative), hence
\[
\begin{aligned}
d(x, z) = \varphi(|x - z|) &\leq \varphi(|x - y| + |y - z|) = \frac{|x - y| + |y - z|}{|x - y| + |y - z| + 1} \\
&= \frac{|x - y|}{|x - y| + |y - z| + 1} + \frac{|y - z|}{|x - y| + |y - z| + 1} \\
&\leq \frac{|x - y|}{|x - y| + 1} + \frac{|y - z|}{|y - z| + 1} = d(x, y) + d(y, z).
\end{aligned}
\]
Here we have an example of a nonstandard metric on R. With this metric we can see for example that d(x, y) < 1 for all
x, y ∈ R. That is, any two points are less than 1 unit apart.
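A quick numerical look at this nonstandard metric is sketched below in Python. The name `d_bounded`, the random sample, and the tolerance are assumptions for the example; the check only samples finitely many points.

```python
import random

def d_bounded(x, y):
    """The metric |x - y| / (|x - y| + 1) on the real numbers."""
    t = abs(x - y)
    return t / (t + 1)

random.seed(0)
pts = [random.uniform(-1e6, 1e6) for _ in range(200)]

# Largest distance seen in the sample: always below 1, however far apart the points are.
max_dist = max(d_bounded(x, y) for x in pts for y in pts)

# Triangle inequality on a smaller subsample (tolerance for rounding error).
sub = pts[:40]
triangle_ok = all(
    d_bounded(x, z) <= d_bounded(x, y) + d_bounded(y, z) + 1e-12
    for x in sub for y in sub for z in sub
)
print(max_dist, triangle_ok)
```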
An important metric space is the n-dimensional euclidean space $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$. We use the following notation for
points: $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. We also simply write $0 \in \mathbb{R}^n$ to mean the vector (0, 0, …, 0). Before making $\mathbb{R}^n$ a
metric space, let us prove an important inequality, the so-called Cauchy-Schwarz inequality.
Take $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$. Then
\[ \Bigl( \sum_{j=1}^n x_j y_j \Bigr)^2 \leq \Bigl( \sum_{j=1}^n x_j^2 \Bigr) \Bigl( \sum_{j=1}^n y_j^2 \Bigr). \tag{8.1.4} \]
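Any square of a real number is nonnegative. Hence any sum of squares is nonnegative:
\[
\begin{aligned}
0 &\leq \sum_{j=1}^n \sum_{k=1}^n (x_j y_k - x_k y_j)^2 \\
&= \sum_{j=1}^n \sum_{k=1}^n \bigl( x_j^2 y_k^2 + x_k^2 y_j^2 - 2 x_j x_k y_j y_k \bigr) \\
&= \Bigl( \sum_{j=1}^n x_j^2 \Bigr) \Bigl( \sum_{k=1}^n y_k^2 \Bigr) + \Bigl( \sum_{j=1}^n y_j^2 \Bigr) \Bigl( \sum_{k=1}^n x_k^2 \Bigr) - 2 \Bigl( \sum_{j=1}^n x_j y_j \Bigr) \Bigl( \sum_{k=1}^n x_k y_k \Bigr).
\end{aligned}
\]
We relabel and divide by 2 to obtain
\[ 0 \leq \Bigl( \sum_{j=1}^n x_j^2 \Bigr) \Bigl( \sum_{j=1}^n y_j^2 \Bigr) - \Bigl( \sum_{j=1}^n x_j y_j \Bigr)^2, \tag{8.1.5} \]
which is the inequality we wanted.
The standard metric on $\mathbb{R}^n$ is defined by
\[ d(x, y) := \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} = \sqrt{\sum_{j=1}^n (x_j - y_j)^2}. \tag{8.1.6} \]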
For n = 1 , the real line, this metric agrees with what we did above. Again, the only tricky part of the definition to check is the
triangle inequality. It is less messy to work with the square of the metric. In the following, note the use of the Cauchy-Schwarz
inequality.
\[
\begin{aligned}
d(x, z)^2 &= \sum_{j=1}^n (x_j - z_j)^2 \\
&= \sum_{j=1}^n (x_j - y_j + y_j - z_j)^2 \\
&= \sum_{j=1}^n \bigl( (x_j - y_j)^2 + (y_j - z_j)^2 + 2 (x_j - y_j)(y_j - z_j) \bigr) \\
&= \sum_{j=1}^n (x_j - y_j)^2 + \sum_{j=1}^n (y_j - z_j)^2 + \sum_{j=1}^n 2 (x_j - y_j)(y_j - z_j) \\
&\leq \sum_{j=1}^n (x_j - y_j)^2 + \sum_{j=1}^n (y_j - z_j)^2 + 2 \sqrt{\sum_{j=1}^n (x_j - y_j)^2 \sum_{j=1}^n (y_j - z_j)^2} \\
&= \Biggl( \sqrt{\sum_{j=1}^n (x_j - y_j)^2} + \sqrt{\sum_{j=1}^n (y_j - z_j)^2} \Biggr)^2 = \bigl( d(x, y) + d(y, z) \bigr)^2 .
\end{aligned}
\]
Taking the square root of both sides we obtain the correct inequality.
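The two inequalities used for the euclidean metric can be spot-checked numerically. The Python sketch below evaluates the Cauchy-Schwarz gap from (8.1.4) and the triangle inequality for (8.1.6) on random vectors; the function names, the dimension 4, and the tolerances are assumptions chosen for the example.

```python
import math
import random

def euclid(x, y):
    """Standard euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cauchy_schwarz_gap(x, y):
    """(sum x_j^2)(sum y_j^2) - (sum x_j y_j)^2, which Cauchy-Schwarz says is nonnegative."""
    dot = sum(a * b for a, b in zip(x, y))
    return sum(a * a for a in x) * sum(b * b for b in y) - dot ** 2

def rand_vec(n):
    return tuple(random.uniform(-10, 10) for _ in range(n))

random.seed(1)
trials = [(rand_vec(4), rand_vec(4), rand_vec(4)) for _ in range(1000)]

cs_ok = all(cauchy_schwarz_gap(x, y) >= -1e-9 for x, y, _ in trials)
tri_ok = all(euclid(x, z) <= euclid(x, y) + euclid(y, z) + 1e-9 for x, y, z in trials)
print(cs_ok, tri_ok)
print(cauchy_schwarz_gap((1, 2), (2, 4)))  # parallel vectors give equality, so the gap is 0
```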
An example to keep in mind is the so-called discrete metric. Let X be any set and define
\[ d(x, y) := \begin{cases} 1 & \text{if } x \neq y, \\ 0 & \text{if } x = y. \end{cases} \tag{8.1.7} \]
That is, all points are equally distant from each other. When X is a finite set, we can draw a diagram. Things
become subtle when X is an infinite set such as the real numbers.
While this particular example seldom comes up in practice, it gives a useful “smell test.” If you make a statement about
metric spaces, try it with the discrete metric. To show that (X, d) is indeed a metric space is left as an exercise.
Let C([a, b]) be the set of continuous real-valued functions on the interval [a, b]. Define the metric on
C([a, b]) as
\[ d(f, g) := \sup_{x \in [a, b]} |f(x) - g(x)| . \]
Let us check the properties. First, d(f, g) is finite as |f(x) − g(x)| is a continuous function on a closed bounded interval
[a, b], and so is bounded. It is clear that d(f, g) ≥ 0, as it is the supremum of nonnegative numbers. If f = g then
|f(x) − g(x)| = 0 for all x and hence d(f, g) = 0. Conversely if d(f, g) = 0, then for any x we have
|f(x) − g(x)| ≤ d(f, g) = 0 and hence f(x) = g(x) for all x and f = g. That d(f, g) = d(g, f) is equally trivial. To show
the triangle inequality we use the standard triangle inequality: for every x ∈ [a, b] we have
|f(x) − h(x)| ≤ |f(x) − g(x)| + |g(x) − h(x)| ≤ d(f, g) + d(g, h), and taking the supremum over x gives d(f, h) ≤ d(f, g) + d(g, h).
When we treat C([a, b]) as a metric space without mentioning a metric, we mean this particular metric.
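To get a feel for distances between functions, the following Python sketch approximates the supremum of |f(x) − g(x)| over [a, b] by a maximum over a uniform grid. This is an assumption-laden shortcut: a grid maximum is only a lower bound for the true supremum, and the function name `sup_distance` and the sample functions are chosen purely for illustration.

```python
import math

def sup_distance(f, g, a, b, samples=10001):
    """Approximate sup |f - g| on [a, b] by the maximum over a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))

zero = lambda x: 0.0

# Distance from sin to the zero function on [0, pi]: the exact value is 1 (at pi/2).
d_sin_zero = sup_distance(math.sin, zero, 0.0, math.pi)

# Distance between x and x^2 on [0, 1]: the exact value is 1/4 (at x = 1/2).
d_x_x2 = sup_distance(lambda x: x, lambda x: x * x, 0.0, 1.0)

print(d_sin_zero, d_x_x2)
```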
This example may seem esoteric at first, but it turns out that working with spaces such as C ([a, b]) is really the meat of a large
part of modern analysis. Treating sets of functions as metric spaces allows us to abstract away a lot of the grubby detail and
prove powerful results such as Picard’s theorem with less work.
Oftentimes it is useful to consider a subset of a larger metric space as a metric space. We obtain the following proposition,
which has a trivial proof.
Let (X, d) be a metric space and Y ⊂ X, then the restriction $d|_{Y \times Y}$ is a metric on Y.
It is common to simply write d for the metric on Y , as it is the restriction of the metric on X. Sometimes we will say that d is ′
Exercises
Show that for any set X, the discrete metric (d(x, y) = 1 if x ≠ y and d(x, x) = 0 ) does give a metric space (X, d) .
Let X := {0} be a set. Can you make it into a metric space?
Let X := {a, b} be a set. Can you make it into two distinct metric spaces? (define two distinct metrics on it)
Let the set X := {A, B, C} represent 3 buildings on campus. Suppose we wish our distance to be the time it takes to walk
from one building to the other. It takes 5 minutes either way between buildings A and B . However, building C is on a hill and
it takes 10 minutes from A and 15 minutes from B to get to C . On the other hand it takes 5 minutes to go from C to A and 7
minutes to go from C to B , as we are going downhill. Do these distances define a metric? If so, prove it, if not say why not.
Suppose that (X, d) is a metric space and φ : [0, ∞) → R is an increasing function such that φ(t) ≥ 0 for all t and φ(t) = 0 if
and only if t = 0. Also suppose that φ is subadditive, that is φ(s + t) ≤ φ(s) + φ(t). Show that with
d′(x, y) := φ(d(x, y)), we obtain a new metric space (X, d′).
Note: $d_H$ can be defined for arbitrary nonempty subsets if we allow the extended reals.
a) Let Y ⊂ P(X) be the set of bounded nonempty subsets. Show that $(Y, d_H)$ is a metric space. b) Show by example that d
When we are dealing with different metric spaces, it is sometimes convenient to emphasize which metric space the ball is in.
We do this by writing $B_X(x, \delta) := B(x, \delta)$ or $C_X(x, \delta) := C(x, \delta)$.
Take the metric space R with the standard metric. For x ∈ R, and δ > 0 we get
B(x, δ) = (x − δ, x + δ) and C (x, δ) = [x − δ, x + δ]. (8.2.3)
Be careful when working on a subspace. Suppose we take the metric space [0, 1] as a subspace of R. Then in [0, 1] we get
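\[ \bigcap_{j=1}^k V_j \tag{8.2.5} \]
\[ \bigcup_{\lambda \in I} V_\lambda \tag{8.2.6} \]
that $B(x, \delta_j) \subset V_j$. Take $\delta := \min\{\delta_1, \ldots, \delta_k\}$ and note that δ > 0. We have $B(x, \delta) \subset B(x, \delta_j) \subset V_j$ for every j and thus
$B(x, \delta) \subset \bigcap_{j=1}^k V_j$. Thus the intersection is open.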
The proof of the following analogous proposition for closed sets is left as an exercise.
Let (X, d) be a metric space.
i. ∅ and X are closed in X.
ii. If $\{E_\lambda\}_{\lambda \in I}$ is an arbitrary collection of closed sets, then
\[ \bigcap_{\lambda \in I} E_\lambda \tag{8.2.7} \]
is also closed.
iii. If $E_1, E_2, \ldots, E_k$ are closed, then
\[ \bigcup_{j=1}^k E_j \tag{8.2.8} \]
is also closed.
Therefore z ∈ B(x, δ) for every z ∈ B(y, α) . So B(y, α) ⊂ B(x, δ) and B(x, δ) is open.
The proof that C (x, δ) is closed is left as an exercise.
Again be careful about what is the ambient metric space. As [0, 1/2) is an open ball in [0, 1], this means that
[0, 1/2) is an open set in [0, 1]. On the other hand [0, 1/2) is neither open nor closed in R.
A useful way to think about an open set is a union of open balls. If U is open, then for each x ∈ U , there is a δx > 0
The proof of the following proposition is left as an exercise. Note that there are other open and closed sets in R.
Let a < b be two real numbers. Then (a, b), (a, ∞), and (−∞, b) are open in R. Also
[a, b], [a, ∞), and (−∞, b] are closed in R.
Connected sets
A nonempty metric space (X, d) is connected if the only subsets that are both open and closed are ∅ and X itself.
When we apply the term connected to a nonempty subset A ⊂X , we simply mean that A with the subspace topology is
connected.
In other words, a nonempty X is connected if whenever we write $X = X_1 \cup X_2$ where $X_1 \cap X_2 = \emptyset$ and $X_1$ and $X_2$ are
open, then either $X_1 = \emptyset$ or $X_2 = \emptyset$. So to test for disconnectedness, we need to find nonempty disjoint open sets $X_1$ and $X_2$
whose union is X.
If $U_j$ is open in X, then $U_j \cap S$ is open in S in the subspace topology (with subspace metric). To see this, note that if
$B_X(x, \delta) \subset U_j$, then as $B_S(x, \delta) = S \cap B_X(x, \delta)$, we have $B_S(x, \delta) \subset U_j \cap S$. The proof follows by the above discussion.
The proof of the other direction follows by finding open sets $U_1$ and $U_2$ in X from two open disjoint subsets of S.
Let S ⊂ R be such that x < z < y with x, y ∈ S and z ∉ S . Claim: S is not connected. Proof: Notice
We have shown above that z ∈ S , so (α, β) ⊂ S . If w < α , then w ∉ S as α was the infimum, similarly if w > β then
w ∉ S . Therefore the only possibilities for S are (α, β), [α, β), (α, β], [α, β].
nonempty, and $S = (U_1 \cap S) \cup (U_2 \cap S)$. We will show that $U_1 \cap S$ and $U_2 \cap S$ contain a common point, so they are not
disjoint, and hence S must be connected. Suppose that there is $x \in U_1 \cap S$ and $y \in U_2 \cap S$. We can assume that x < y. As S
is an interval, [x, y] ⊂ S. Let $z := \inf(U_2 \cap [x, y])$. If z = x, then $z \in U_1$. If z > x, then for any δ > 0 the ball
B(z, δ) = (z − δ, z + δ) contains points that are not in $U_2$, and so $z \notin U_2$ as $U_2$ is open. Therefore, $z \in U_1$. As $U_1$ is open,
$B(z, \delta) \subset U_1$ for a small enough δ > 0. As z is the infimum of $U_2 \cap [x, y]$, there must exist some $w \in U_2 \cap [x, y]$ such that
$w \in [z, z + \delta) \subset B(z, \delta) \subset U_1$. Therefore $w \in U_1 \cap U_2 \cap [x, y]$. So $U_1 \cap S$ and $U_2 \cap S$ are not disjoint and hence S is
connected.
In many cases a ball B(x, δ) is connected. But this is not necessarily true in every metric space. For the simplest example, take a
two point space {a, b} with the discrete metric. Then B(a, 2) = {a, b}, which is not connected as B(a, 1) = {a} and
B(b, 1) = {b} are open and disjoint.
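The balls in this two point example can be computed directly. The Python sketch below is a small illustration for finite sets only; the names `discrete` and `open_ball` are hypothetical helpers, not notation from the text.

```python
def discrete(x, y):
    """The discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

def open_ball(X, d, center, radius):
    """Open ball B(center, radius) inside the finite set X."""
    return {y for y in X if d(center, y) < radius}

X = {"a", "b"}
ball_a_2 = open_ball(X, discrete, "a", 2)
ball_a_1 = open_ball(X, discrete, "a", 1)
ball_b_1 = open_ball(X, discrete, "b", 1)

print(ball_a_2, ball_a_1, ball_b_1)
# B(a, 2) splits into the two disjoint nonempty open sets B(a, 1) and B(b, 1).
print(ball_a_1 | ball_b_1 == ball_a_2, ball_a_1 & ball_b_1 == set())
```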
That is, $\overline{A}$ is the intersection of all closed sets that contain A.
Let (X, d) be a metric space and A ⊂ X. The closure $\overline{A}$ is closed. Furthermore if A is closed then $\overline{A} = A$.
First, the closure is the intersection of closed sets, so it is closed. Second, if A is closed, then take E = A, hence the
intersection of all closed sets E containing A must be equal to A.
The closure of (0, 1) in R is [0, 1]. Proof: Simply notice that if E is closed and contains (0, 1), then E must contain 0 and 1
(why?). Thus [0, 1] ⊂ E. But [0, 1] is also closed. Therefore the closure $\overline{(0, 1)} = [0, 1]$.
Be careful to notice what ambient metric space you are working with. If X = (0, ∞), then the closure of (0, 1) in (0, ∞) is
(0, 1]. Proof: Similarly as above (0, 1] is closed in (0, ∞) (why?). Any closed set E that contains (0, 1) must contain 1
(why?). Therefore (0, 1] ⊂ E, and hence $\overline{(0, 1)} = (0, 1]$ when working in (0, ∞).
Let us justify the statement that the closure is everything that we can “approach” from the set.
Let (X, d) be a metric space and A ⊂ X. Then $x \in \overline{A}$ if and only if for every δ > 0, B(x, δ) ∩ A ≠ ∅.
On the other hand suppose that there is a δ > 0 such that B(x, δ) ∩ A = ∅. Then $B(x, \delta)^c$ is a closed set and we have that
$A \subset B(x, \delta)^c$, but $x \notin B(x, \delta)^c$. Thus as $\overline{A}$ is the intersection of closed sets containing A, we have $x \notin \overline{A}$.
We can also talk about what is in the interior of a set and what is on the boundary.
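Let (X, d) be a metric space and A ⊂ X, then the interior of A is the set
\[ A^\circ := \{ x \in A : \text{there exists a } \delta > 0 \text{ such that } B(x, \delta) \subset A \}. \tag{8.2.13} \]
The boundary of A is the set $\partial A := \overline{A} \setminus A^\circ$.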
Suppose A = (0, 1] and X = R. Then it is not hard to see that $\overline{A} = [0, 1]$, $A^\circ = (0, 1)$, and ∂A = {0, 1}.
Suppose X = {a, b} with the discrete metric. Let A = {a}, then $\overline{A} = A^\circ$ and ∂A = ∅.
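For a finite metric space the closure, interior, and boundary can be computed from balls alone, since the balls around a point stop shrinking once the radius drops to the smallest positive distance from that point. The Python sketch below relies on that finiteness assumption, and all helper names are hypothetical; it does not apply to infinite spaces such as R.

```python
def discrete(x, y):
    return 0 if x == y else 1

def open_ball(X, d, center, radius):
    return {y for y in X if d(center, y) < radius}

def smallest_ball(X, d, x):
    """In a finite space: the ball around x whose radius is the least positive distance from x."""
    positive = [d(x, y) for y in X if d(x, y) > 0]
    radius = min(positive) if positive else 1
    return open_ball(X, d, x, radius)

def closure(X, d, A):
    # x is in the closure when every ball around x meets A; the smallest ball decides this.
    return {x for x in X if smallest_ball(X, d, x) & A}

def interior(X, d, A):
    # x is in the interior when some ball around x lies inside A.
    return {x for x in A if smallest_ball(X, d, x) <= A}

def boundary(X, d, A):
    return closure(X, d, A) - interior(X, d, A)

X = {"a", "b"}
A = {"a"}
print(closure(X, discrete, A), interior(X, discrete, A), boundary(X, discrete, A))
```

In this example the closure and the interior both equal A and the boundary is empty, matching the two point example in the text.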
Given $x \in A^\circ$ we have δ > 0 such that B(x, δ) ⊂ A. If z ∈ B(x, δ), then as open balls are open, there is an ϵ > 0 such that
As $A^\circ$ is open, then $\partial A = \overline{A} \setminus A^\circ = \overline{A} \cap (A^\circ)^c$ is closed.
The boundary is the set of points that are close to both the set and its complement.
Let (X, d) be a metric space and A ⊂ X. Then x ∈ ∂A if and only if for every δ > 0, B(x, δ) ∩ A and $B(x, \delta) \cap A^c$ are
both nonempty.
If $x \notin \overline{A}$, then there is some δ > 0 such that $B(x, \delta) \subset \overline{A}^c$ as $\overline{A}$ is closed. So B(x, δ) contains no points of A.
Now suppose that $x \in A^\circ$, then there exists a δ > 0 such that B(x, δ) ⊂ A, but that means that B(x, δ) contains no points of
$A^c$.
Finally suppose that $x \in \overline{A} \setminus A^\circ$. Let δ > 0 be arbitrary. By the characterization of the closure in terms of balls, B(x, δ) contains a point from
A. Also, if B(x, δ) contained no points of $A^c$, then x would be in $A^\circ$. Hence B(x, δ) contains a point of $A^c$ as well.
We obtain the following immediate corollary about closures of A and $A^c$. We simply apply the characterization of the closure in terms of balls.
Let (X, d) be a metric space and A ⊂ X. Then $\partial A = \overline{A} \cap \overline{A^c}$.
Exercises
Prove . Hint: consider the complements of the sets and apply .
Finish the proof of by proving that C (x, δ) is closed.
Prove .
Suppose that (X, d) is a nonempty metric space with the discrete topology. Show that X is connected if and only if it contains
exactly one element.
Show that if S ⊂ R is a connected unbounded set, then it is an (unbounded) interval.
Show that every open set can be written as a union of closed sets.
a) Show that E is closed if and only if ∂E ⊂ E . b) Show that U is open if and only if ∂U ∩ U =∅ .
a) Show that A is open if and only if $A^\circ = A$. b) Suppose that U is an open set and U ⊂ A. Show that $U \subset A^\circ$.
Let X be a set and d, d′ be two metrics on X. Suppose that there exists an α > 0 and β > 0 such that
αd(x, y) ≤ d′(x, y) ≤ βd(x, y) for all x, y ∈ X. Show that U is open in (X, d) if and only if U is open in (X, d′). That is,
Let A be a connected set. a) Is $\overline{A}$ connected? Prove or find a counterexample. b) Is $A^\circ$ connected? Prove or find a
counterexample. Hint: Think of sets in $\mathbb{R}^2$.
The definition of open sets in the following exercise is usually called the subspace topology. You are asked to show that we
obtain the same topology by considering the subspace metric.
Suppose (X, d) is a metric space and Y ⊂ X. Show that with the subspace metric on Y, a set U ⊂ Y
\[ \{x_n\}_{n=1}^\infty . \tag{8.3.1} \]
Similarly we also define convergence. Again, we will be cheating a little bit and we will use the definite article in front of the
word limit before we prove that the limit is unique.
A sequence $\{x_n\}$ in a metric space (X, d) is said to converge to a point p ∈ X, if for every ϵ > 0, there exists an M ∈ N
such that $d(x_n, p) < \epsilon$ for all n ≥ M. The point p is said to be the limit of $\{x_n\}$. We write
\[ \lim_{n \to \infty} x_n := p. \tag{8.3.3} \]
A sequence that converges is said to be convergent. Otherwise, the sequence is said to be divergent.
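The role of M in the definition can be explored numerically. The Python sketch below takes the hypothetical sequence x_n = (1/n, (−1)^n/n) in the plane with the euclidean metric and searches, within a finite horizon, for the smallest index M after which all checked terms stay within ϵ of the candidate limit (0, 0). The sequence, the horizon, and the function names are assumptions for illustration; a finite scan can suggest convergence but cannot prove it.

```python
import math

def x(n):
    """Hypothetical example sequence in the plane."""
    return (1.0 / n, (-1) ** n / n)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def first_index_within(eps, limit, horizon=10_000):
    """Smallest M with d(x_n, limit) < eps for every n in [M, horizon]; None if there is none."""
    M = None
    for n in range(horizon, 0, -1):
        if dist(x(n), limit) < eps:
            M = n
        else:
            break
    return M

for eps in (0.1, 0.01):
    print(eps, first_index_within(eps, (0.0, 0.0)))
```

Here d(x_n, 0) = √2/n, so a smaller ϵ forces a larger M, which is exactly the dependence of M on ϵ in the definition.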
Let us prove that the limit is unique. Note that the proof is almost identical to the proof of the same fact for sequences of real
numbers. In fact many results we know for sequences of real numbers can be proved in the more general settings of metric
spaces. We must replace |x − y| with d(x, y) in the proofs and apply the triangle inequality correctly.
A convergent sequence in a metric space has a unique limit.
Suppose that the sequence $\{x_n\}$ has the limit x and the limit y. Take an arbitrary ϵ > 0. From the definition we find an $M_1$
such that for all $n \geq M_1$, $d(x_n, x) < \epsilon/2$. Similarly we find an $M_2$ such that for all $n \geq M_2$ we have
$d(x_n, y) < \epsilon/2$. For $n \geq \max\{M_1, M_2\}$ the triangle inequality gives $d(y, x) \leq d(y, x_n) + d(x_n, x) < \epsilon/2 + \epsilon/2 = \epsilon$.
As d(y, x) < ϵ for all ϵ > 0, then d(x, y) = 0 and y = x. Hence the limit (if it exists) is unique.
The proofs of the following propositions are left as exercises.
A convergent sequence in a metric space is bounded.
A sequence $\{x_n\}$ in a metric space (X, d) converges to p ∈ X if and only if there exists a sequence $\{a_n\}$ of
real numbers such that
\[ d(x_n, p) \leq a_n \quad \text{for all } n \in \mathbb{N}, \tag{8.3.4} \]
and
\[ \lim_{n \to \infty} a_n = 0. \tag{8.3.5} \]
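\[ \lim_{j \to \infty} x^j = \Bigl( \lim_{j \to \infty} x_1^j, \lim_{j \to \infty} x_2^j, \ldots, \lim_{j \to \infty} x_n^j \Bigr). \tag{8.3.6} \]
Let $\{x^j\}_{j=1}^\infty$ be a convergent sequence in $\mathbb{R}^n$, where we write $x^j = (x_1^j, x_2^j, \ldots, x_n^j) \in \mathbb{R}^n$. Let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$
be the limit. Given ϵ > 0, there exists an M such that for all j ≥ M we have
\[ d(x, x^j) < \epsilon. \tag{8.3.7} \]
Hence, as $|x_k - x_k^j| \leq d(x, x^j)$ for every k, the sequence $\{x_k^j\}_{j=1}^\infty$ converges to $x_k$.
For the other direction suppose that $\{x_k^j\}_{j=1}^\infty$ converges to $x_k$ for every k = 1, 2, …, n. Hence, given ϵ > 0, pick an M, such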
every open neighborhood U of x, there exists an M ∈ N such that for all n ≥ M we have $x_n \in U$.
First suppose that $\{x_n\}$ converges. Let U be an open neighborhood of x, then there exists an ϵ > 0 such that B(x, ϵ) ⊂ U. As
the sequence converges, find an M ∈ N such that for all n ≥ M we have $d(x, x_n) < \epsilon$ or in other words $x_n \in B(x, \epsilon) \subset U$.
Let us prove the other direction. Given ϵ > 0 let U := B(x, ϵ) be the neighborhood of x. Then there is an M ∈ N such that
for n ≥ M we have $x_n \in U = B(x, \epsilon)$ or in other words, $d(x, x_n) < \epsilon$.
Let us prove the contrapositive. Suppose $\{x_n\}$ is a sequence in X that converges to $x \in E^c$. As $E^c$ is open, the preceding proposition says there is an
When we take a closure of a set A , we really throw in precisely those points that are limits of sequences in A .
Let (X, d) be a metric space and A ⊂ X. If $x \in \overline{A}$, then there exists a sequence $\{x_n\}$ of elements in
A such that $\lim x_n = x$.
Let $x \in \overline{A}$. We know by the characterization of the closure in terms of balls that given 1/n, there exists a point $x_n \in B(x, 1/n) \cap A$. As
$d(x, x_n) < 1/n$, we have that $\lim x_n = x$.
Exercises
Let (X, d) be a metric space and let A ⊂ X. Let E be the set of all x ∈ X such that there exists a sequence $\{x_n\}$ in A that
converges to x. Show that $E = \overline{A}$.
converges to x.
Let (X, d) be a metric space where d is the discrete metric. Suppose that $\{x_n\}$ is a convergent sequence in X. Show that there
exists a K ∈ N such that for all n ≥ K we have $x_n = x_K$.
A set S ⊂ X is said to be dense in X if for every x ∈ X, there exists a sequence $\{x_n\}$ in S that converges to x. Prove that $\mathbb{R}^n$
$\bigcap_{n=1}^\infty U_n = \{p\}$ for some p ∈ X. Suppose that $\{x_n\}$ is a sequence of points in X such that $x_n \in U_n$. Does $\{x_n\}$ necessarily
$\{x_n\}$ is a sequence of real numbers such that $x_n \geq n$ for all n. Show that $\lim x_n = \infty$ in $(\mathbb{R}^*, d)$.
The definition is again simply a translation of the concept from the real numbers to metric spaces. So a sequence of real
numbers is Cauchy in the sense defined earlier for real sequences if and only if it is Cauchy in the sense above, provided we equip the real numbers with the
standard metric d(x, y) = |x − y|.
Let (X, d) be a metric space. We say that X is complete or Cauchy-complete if every Cauchy sequence $\{x_n\}$ in X converges
to an x ∈ X.
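The space $\mathbb{R}^n$ with the standard metric is a complete metric space.
Take n > 1. Let $\{x^j\}_{j=1}^\infty$ be a Cauchy sequence in $\mathbb{R}^n$, where we write $x^j = (x_1^j, x_2^j, \ldots, x_n^j) \in \mathbb{R}^n$. As the sequence is
Cauchy, given ϵ > 0, there exists an M such that for all i, j ≥ M we have
\[ d(x^i, x^j) < \epsilon. \tag{8.4.2} \]
Fix some k = 1, 2, …, n. As $|x_k^i - x_k^j| \leq d(x^i, x^j) < \epsilon$ for i, j ≥ M, the sequence $\{x_k^j\}_{j=1}^\infty$ is a Cauchy sequence of real numbers, and hence it converges to some $x_k \in \mathbb{R}$.
Write $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. By the proposition on componentwise convergence we have that $\{x^j\}$ converges to $x \in \mathbb{R}^n$ and hence $\mathbb{R}^n$ is complete.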
Compactness
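Let (X, d) be a metric space and K ⊂ X. The set K is said to be compact if for any collection of open sets $\{U_\lambda\}_{\lambda \in I}$ such that
\[ K \subset \bigcup_{\lambda \in I} U_\lambda , \tag{8.4.3} \]
there exists a finite subset $\{\lambda_1, \ldots, \lambda_k\} \subset I$ such that
\[ K \subset \bigcup_{j=1}^k U_{\lambda_j} . \tag{8.4.4} \]
A collection of open sets $\{U_\lambda\}_{\lambda \in I}$ as above is said to be an open cover of K. So a way to say that K is compact is to say that
every open cover of K has a finite subcover.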
\[ K \subset \bigcup_{n=1}^\infty B(p, n) = X. \tag{8.4.5} \]
If K is compact, then there exists some set of indices $n_1 < n_2 < \ldots < n_k$ such that
\[ K \subset \bigcup_{j=1}^k B(p, n_j) = B(p, n_k). \]
\[ K \subset \bigcup_{n=1}^\infty C(x, 1/n)^c . \tag{8.4.7} \]
As a closed ball is closed, $C(x, 1/n)^c$ is open, and so we have an open cover. If we take any finite collection of
these open sets with indices $n_1 < n_2 < \ldots < n_k$, then
\[ \bigcup_{j=1}^k C(x, 1/n_j)^c = C(x, 1/n_k)^c \tag{8.4.8} \]
As x is in the closure, we have $C(x, 1/n_k) \cap K \neq \emptyset$, so there is no finite subcover and K is not compact.
We prove below that in finite dimensional euclidean space every closed bounded set is compact. So closed bounded sets of $\mathbb{R}^n$
are examples of compact sets. It is not true that in every metric space, closed and bounded is equivalent to compact. There are
many metric spaces where closed and bounded is not enough to give compactness, see for example the exercise on the closed ball C(0, 1) in C([0, 1]).
A useful property of compact sets in a metric space is that every sequence has a convergent subsequence. Such sets are
sometimes called sequentially compact. Let us prove that in the context of metric spaces, a set is compact if and only if it is
sequentially compact.
Let (X, d) be a metric space. Then K ⊂ X is a compact set if and only if every sequence in K has
a subsequence converging to a point in K.
Let K ⊂ X be a set and $\{x_n\}$ a sequence in K. Suppose that for each x ∈ K, there is a ball $B(x, \alpha_x)$ for some $\alpha_x > 0$ such
that $x_n \in B(x, \alpha_x)$ for only finitely many n ∈ N. Then
\[ K \subset \bigcup_{x \in K} B(x, \alpha_x). \tag{8.4.9} \]
Any finite collection of these balls is going to contain only finitely many $x_n$. Thus for any finite collection of such balls there
is an $x_n \in K$ that is not in the union. Therefore K is not compact.
So if K is compact, then there exists an x ∈ K such that for any δ > 0, B(x, δ) contains $x_k$ for infinitely many k ∈ N.
B(x, 1) contains some $x_k$ so let $n_1 := k$. If $n_{j-1}$ is defined, then there must exist a $k > n_{j-1}$ such that
For the other direction, suppose that every sequence in K has a subsequence converging in K. Take an open cover $\{U_\lambda\}_{\lambda \in I}$
As $\{U_\lambda\}$ is an open cover of K, δ(x) > 0 for each x ∈ K. By construction, for any positive ϵ < δ(x) there must exist a λ ∈ I
Pick a $\lambda_0 \in I$ and look at $U_{\lambda_0}$. If $K \subset U_{\lambda_0}$, we stop as we have found a finite subcover. Otherwise, there must be a point
$x_1 \in K \setminus U_{\lambda_0}$. There must exist some $\lambda_1 \in I$ such that $x_1 \in U_{\lambda_1}$ and in fact $B(x_1, \tfrac{1}{2} \delta(x_1)) \subset U_{\lambda_1}$. We work inductively.
point $x_n \in K \setminus (U_{\lambda_1} \cup U_{\lambda_2} \cup \cdots \cup U_{\lambda_{n-1}})$. In this case, there must be some $\lambda_n \in I$ such that $x_n \in U_{\lambda_n}$, and in fact
\[ B(x_n, \tfrac{1}{2} \delta(x_n)) \subset U_{\lambda_n} . \tag{8.4.11} \]
So either we obtained a finite subcover or we obtained an infinite sequence $\{x_n\}$ as above. For contradiction suppose that
there was no finite subcover and we have the sequence $\{x_n\}$. Then there is a subsequence $\{x_{n_k}\}$ that converges, that is,
$x = \lim x_{n_k} \in K$. We take λ ∈ I such that $B(x, \tfrac{1}{2} \delta(x)) \subset U_\lambda$. As the subsequence converges, there is a k such that
\[ B(x_{n_k}, \tfrac{3}{16} \delta(x)) \subset B(x_{n_k}, \tfrac{1}{2} \delta(x_{n_k})) \subset U_{\lambda_{n_k}} . \tag{8.4.12} \]
As $\lim x_{n_j} = x$, for all j large enough we have $x_{n_j} \in U_{\lambda_{n_k}}$. Let us fix one of those j such that j > k. But by construction $x_{n_j} \notin U_{\lambda_{n_k}}$
if j > k, which is a contradiction.
By the Bolzano-Weierstrass theorem for sequences we have that any bounded sequence has a convergent subsequence.
Therefore any sequence in a closed interval [a, b] ⊂ R has a convergent subsequence. The limit must also be in [a, b] as limits
preserve non-strict inequalities. Hence a closed bounded interval [a, b] ⊂ R is compact.
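To make the notion of a finite subcover concrete, the Python sketch below picks a subcover of [a, b] out of a finite family of open intervals by a greedy left-to-right sweep. This is only an illustration under strong assumptions: compactness concerns arbitrary (possibly infinite) open covers, while the code handles a finite list; the function name and the sample intervals are hypothetical.

```python
def finite_subcover(a, b, intervals):
    """Greedily choose open intervals (l, r) covering [a, b]; return None if [a, b] is not covered."""
    chosen = []
    point = a  # leftmost point not yet known to be covered
    while True:
        # open intervals that contain the current point
        candidates = [(l, r) for (l, r) in intervals if l < point < r]
        if not candidates:
            return None  # `point` lies in [a, b] but in none of the intervals
        best = max(candidates, key=lambda iv: iv[1])  # reaches furthest to the right
        chosen.append(best)
        if best[1] > b:
            return chosen
        point = best[1]  # the right endpoint itself is not covered by `best`

cover = [(-0.1, 0.3), (0.0, 0.1), (0.2, 0.6), (0.25, 0.5), (0.55, 1.1)]
print(finite_subcover(0.0, 1.0, cover))
print(finite_subcover(0.0, 1.0, [(-0.1, 0.5), (0.5, 1.1)]))  # misses the point 0.5
```

The second call returns None because the point 1/2 belongs to neither open interval, so the family is not a cover of [0, 1] at all.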
Let (X, d) be a metric space and let K ⊂ X be compact. Suppose that E ⊂ K is a closed set, then E is compact.
Let $\{x_n\}$ be a sequence in E. It is also a sequence in K. Therefore it has a convergent subsequence $\{x_{n_j}\}$ that converges to
x ∈ K. As E is closed the limit of a sequence in E is also in E and so x ∈ E. Thus E must be compact.
A closed bounded subset $K \subset \mathbb{R}^n$ is compact.
For $\mathbb{R} = \mathbb{R}^1$ if K ⊂ R is closed and bounded, then any sequence $\{x_n\}$ in K is bounded, so it has a convergent subsequence
by the Bolzano-Weierstrass theorem for sequences. As K is closed, the limit of the subsequence must be an element of K. So
K is compact.
Let us carry out the proof for n = 2 and leave arbitrary n as an exercise.
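As K is bounded, there exists a set $B = [a, b] \times [c, d] \subset \mathbb{R}^2$ such that K ⊂ B. If we can show that B is compact, then K,
being a closed subset of a compact B, is also compact.
Let $\{(x_k, y_k)\}_{k=1}^\infty$ be a sequence in B. That is, $a \leq x_k \leq b$ and $c \leq y_k \leq d$ for all k. A bounded sequence has a convergent
subsequence, so there is a subsequence $\{x_{k_j}\}_{j=1}^\infty$ that is convergent. The sequence $\{y_{k_j}\}_{j=1}^\infty$ is also bounded, so
there exists a subsequence $\{y_{k_{j_i}}\}_{i=1}^\infty$ that is convergent. A subsequence of a convergent sequence is still convergent, so
$\{x_{k_{j_i}}\}_{i=1}^\infty$ is convergent. Let
\[ x := \lim_{i \to \infty} x_{k_{j_i}} \quad \text{and} \quad y := \lim_{i \to \infty} y_{k_{j_i}} . \]
By the proposition on componentwise convergence, $\{(x_{k_{j_i}}, y_{k_{j_i}})\}_{i=1}^\infty$ converges to (x, y) as i goes to ∞. Furthermore, as $a \leq x_k \leq b$ and $c \leq y_k \leq d$ for all k, we know that
(x, y) ∈ B.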
Exercises
Let (X, d) be a metric space and A a finite subset of X. Show that A is compact.
Let A = {1/n : n ∈ N} ⊂ R. a) Show that A is not compact directly using the definition. b) Show that A ∪ {0} is
compact directly using the definition.
Let (X, d) be a metric space with the discrete metric. a) Prove that X is complete. b) Prove that X is compact if and only if X
is a finite set.
a) Show that the union of finitely many compact sets is a compact set. b) Find an example where the union of infinitely many
compact sets is not compact.
Prove that a closed bounded subset of $\mathbb{R}^n$ is compact for arbitrary dimension n. Hint: The trick is to use the correct notation.
Show that a compact set K is a complete metric space.
Let C([a, b]) be the metric space of continuous functions from Section 8.1. Show that C([a, b]) is a complete metric space.
Let C([0, 1]) be the metric space of continuous functions from Section 8.1. Let 0 denote the zero function. Then show that the closed
ball C(0, 1) is not compact (even though it is closed and bounded). Hints: Construct a sequence of distinct continuous
functions $\{f_n\}$ such that $d(f_n, 0) = 1$ and $d(f_n, f_k) = 1$ for all n ≠ k. Show that the set $\{f_n : n \in \mathbb{N}\} \subset C(0, 1)$ is closed
Let C([0, 1]) be the metric space of continuous functions from Section 8.1. Let K be the set of f ∈ C([0, 1]) such that f is equal to a quadratic polynomial, i.e.
$f(x) = a + bx + cx^2$, and such that |f(x)| ≤ 1 for all x ∈ [0, 1], that is f ∈ C(0, 1). Show that K is compact.
such that whenever x ∈ X and $d_X(x, c) < \delta$, then $d_Y(f(x), f(c)) < \epsilon$.
Suppose that f is continuous at c. Let $\{x_n\}$ be a sequence in X converging to c. Given ϵ > 0, there is a δ > 0 such that
d(x, c) < δ implies d(f(x), f(c)) < ϵ. So take M such that for all n ≥ M, we have $d(x_n, c) < \delta$, then $d(f(x_n), f(c)) < \epsilon$.
On the other hand suppose that f is not continuous at c. Then there exists an ϵ > 0, such that for every 1/n there
exists an $x_n \in X$, $d(x_n, c) < 1/n$ such that $d(f(x_n), f(c)) \geq \epsilon$. Therefore $\{f(x_n)\}$ does not converge to f(c).
Therefore every sequence in f (K) has a subsequence convergent to a point in f (K) , so f (K) is compact by .
As before, f : X → R achieves an absolute minimum at c ∈ X if
f (x) ≥ f (c) for all x ∈ X. (8.5.2)
Let (X, d) be a compact metric space, and let f : X → R be a continuous function. Then f(X) is compact, and in particular f(X) is closed and bounded. Therefore there is some x ∈ X such that f(x) = sup f(X) and some y ∈ X such that f(y) = inf f(X).
if for every open neighbourhood U of f(c) in Y, the set f^{-1}(U) contains an open neighbourhood of c in X.
Suppose that f is continuous at c. Let U be an open neighbourhood of f(c) in Y, then B_Y(f(c), ϵ) ⊂ U for some ϵ > 0. As f is continuous, then there exists a δ > 0 such that whenever x is such that d_X(x, c) < δ, then d_Y(f(x), f(c)) < ϵ. In other words, B_X(c, δ) ⊂ f^{-1}(B_Y(f(c), ϵ)) ⊂ f^{-1}(U), and B_X(c, δ) is an open neighbourhood of c.
That means precisely that if d_X(x, c) < δ then d_Y(f(x), f(c)) < ϵ and so f is continuous at c.
Let (X, d_X) and (Y, d_Y) be metric spaces. A function f: X → Y is continuous if and only if for every open U ⊂ Y, f^{-1}(U) is open in X.
Exercises
Consider N ⊂ R with the standard metric. Let (X, d) be a metric space and f : X → N a continuous function. a) Prove that if
X is connected, then f is constant (the range of f is a single value). b) Find an example where X is disconnected and f is not
constant.
Let f: R² → R be defined by f(0, 0) := 0, and f(x, y) := xy/(x² + y²) if (x, y) ≠ (0, 0). a) Show that for any fixed x, the function that takes y to f(x, y) is continuous. Similarly for any fixed y, the function that takes x to f(x, y) is continuous. b) Show that f is not continuous.
Suppose that f: X → Y is continuous for metric spaces (X, d_X) and (Y, d_Y). Let A ⊂ X. a) Show that $f(\overline{A}) \subset \overline{f(A)}$. b) Show that the subset can be proper.
Prove . Hint: Use .
Suppose that f: X → Y is continuous for metric spaces (X, d_X) and (Y, d_Y). Show that if X is connected, then f(X) is connected.
Prove the following version of the intermediate value theorem. Let (X, d) be a connected metric space and f: X → R a continuous function. Suppose that there exist x_0, x_1 ∈ X and y ∈ R such that f(x_0) < y < f(x_1). Then prove that there exists a z ∈ X such that f(z) = y.
the set f^{-1}(K) is compact. Suppose that a continuous f: (0, 1) → (0, 1) is proper and {x_n} is a sequence in (0, 1) that
Let (X, d_X) and (Y, d_Y) be metric spaces and f: X → Y be a one-to-one and onto continuous function. Suppose that X is compact. Prove that the inverse f^{-1}: Y → X is continuous.
Take the metric space of continuous functions C ([0, 1]) . Let k: [0, 1] × [0, 1] → R be a continuous function. Given
f ∈ C ([0, 1]) define
a) Show that T(f) := φ_f defines a function T: C([0, 1]) → C([0, 1]). b) Show that T is continuous.
map for some k < 1, i.e. if there exists a k < 1 such that
\[ d'(F(x), F(y)) \leq k\, d(x, y) \quad \text{for all } x, y \in X. \tag{8.6.1} \]
Note that the words complete and contraction are necessary. See .
Pick any x_0 ∈ X. Define a sequence {x_n} by x_{n+1} := T(x_n).
\[ d(x_{n+1}, x_n) = d(T(x_n), T(x_{n-1})) \leq k\, d(x_n, x_{n-1}) \leq \cdots \leq k^n d(x_1, x_0). \tag{8.6.2} \]
So let m ≥ n
\[
\begin{aligned}
d(x_m, x_n) &\leq \sum_{i=n}^{m-1} d(x_{i+1}, x_i) \\
&\leq \sum_{i=n}^{m-1} k^i d(x_1, x_0) \\
&= k^n d(x_1, x_0) \sum_{i=0}^{m-n-1} k^i \\
&\leq k^n d(x_1, x_0) \sum_{i=0}^{\infty} k^i = k^n d(x_1, x_0) \frac{1}{1-k}.
\end{aligned}
\]
In particular the sequence is Cauchy. Since X is complete we let x := lim_{n→∞} x_n and claim that x is our unique fixed point.
Fixed point? Note that T is continuous because it is a contraction. Hence
As k < 1 this means that d(x, y) = 0 and hence x = y . The theorem is proved.
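The iteration x_{n+1} := T(x_n) used in the proof can be tried out numerically. The following is a minimal sketch, not part of the original text: the function name `fixed_point`, the tolerance, and the choice T = cos on [0, 1] are all assumptions made for illustration. The cosine maps [0, 1] into itself and has derivative bounded by sin(1) < 1 there, so it is a contraction on a complete metric space.

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until successive iterates are within tol.

    Returns the last iterate and the number of steps taken.
    (Hypothetical helper, written only to illustrate the proof.)
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1 there,
# so it is a contraction on the complete metric space [0, 1].
x_star, steps = fixed_point(math.cos, 0.5)
print(x_star, steps)
```

The stopping rule compares successive iterates, which is justified by the estimate (8.6.2): once d(x_{n+1}, x_n) is small, the tail of the geometric series bounds the distance to the limit.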
Note that the proof is constructive. Not only do we know that a unique fixed point exists, we also know how to find it. Let us use the theorem to prove the classical Picard theorem on the existence and uniqueness of solutions to ordinary differential equations.
Consider the equation
\[ \frac{dx}{dt} = F(t, x). \tag{8.6.5} \]
There are some subtle issues. Look at the equation x′ = x², x(0) = 1. Then x(t) = 1/(1 − t) is a solution. While F is a reasonably “nice” function and in particular exists for all x and t, the solution “blows up” at t = 1.
Let (t_0, x_0) ∈ I_0 × J_0. Then there exists h > 0 and a unique differentiable f: [t_0 − h, t_0 + h] → R, such that f′(t) = F(t, f(t)) and f(t_0) = x_0.
Without loss of generality assume t_0 = 0. Let M := sup{|F(t, x)| : (t, x) ∈ I × J}. As I × J is compact, M < ∞. Pick α > 0 such that [−α, α] ⊂ I and [x_0 − α, x_0 + α] ⊂ J. Let
\[ h := \min \left\{ \alpha, \frac{\alpha}{M + L\alpha} \right\}. \tag{8.6.8} \]
Here C ([−h, h]) is equipped with the standard metric d(f , g) := sup{|f (x) − g(x)| : x ∈ [−h, h]} . With this metric we
have shown in an exercise that C ([−h, h]) is a complete metric space.
Show that Y ⊂ C([−h, h]) is closed.
Define a mapping T : Y → C ([−h, h]) by
t
≤ |t| M ≤ hM ≤ α.
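Therefore,
\[
\begin{aligned}
|T(f)(t) - T(g)(t)| &= \left| \int_0^t F(s, f(s)) - F(s, g(s)) \, ds \right| \\
&\leq |t|\, L\, d(f, g) \\
&\leq h L\, d(f, g) \\
&\leq \frac{L\alpha}{M + L\alpha}\, d(f, g).
\end{aligned}
\]
Provided M > 0, we have Lα/(M + Lα) < 1 and the claim is proved.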
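To see the contraction T at work, one can iterate it on polynomials. The sketch below is an illustration under stated assumptions: it takes T to be the usual Picard operator T(f)(t) = x_0 + ∫_0^t F(s, f(s)) ds (the displayed definition is not reproduced in this excerpt), and it uses the hypothetical test problem F(t, x) = x with x_0 = 1, chosen because the integrals can be done exactly on coefficient lists. The helper name `picard_step` is made up for this example.

```python
from fractions import Fraction

def picard_step(coeffs):
    """Apply T(f)(t) = 1 + integral_0^t f(s) ds to a polynomial.

    coeffs[i] is the coefficient of t**i.  This is the assumed Picard
    operator for the test problem x' = x, x(0) = 1 (so F(t, x) = x).
    """
    # Integrate term by term: c * t**i becomes c / (i + 1) * t**(i + 1).
    integral = [Fraction(0)] + [c / (i + 1) for i, c in enumerate(coeffs)]
    integral[0] += 1  # add the initial value x(0) = 1
    return integral

f = [Fraction(1)]  # f_0(t) = 1
for _ in range(4):
    f = picard_step(f)
print(f)  # coefficients of the fourth iterate
```

Each step adds one more term of the Taylor series of the exponential, which is the known solution of this test problem, so the iterates visibly approach the fixed point of T.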
Now apply the fixed point theorem () to find a unique f ∈ Y such that T (f ) = f , that is,
t
Exercises
best (largest) k that works. b) Find the fixed point and show directly that it is unique.
[exercise:nofixedpoint] a) Find an example of a contraction of non-complete metric space with no fixed point. b) Find a 1-
Lipschitz map of a complete metric space with no fixed point.
Consider x′ = x², x(0) = 1. Start with f_0(t) = 1. Find a few iterates (at least up to f_2). Prove that the limit of f_n is 1/(1 − t).
9.1: Vector Spaces, linear Mappings, and Convexity
The euclidean space R^n has already made an appearance in the metric space chapter. In this chapter, we will extend the differential calculus we created for one variable to several variables. The key idea in differential calculus is to approximate functions by lines and linear functions. In several variables we must introduce a little bit of linear algebra before we can move on. So let us start with vector spaces and linear functions on vector spaces.
While it is common to use x⃗ or the bold x for elements of R^n, especially in the applied sciences, we use just plain x, which is common in mathematics. That is, x ∈ R^n is a vector, which means that x = (x^1, x^2, …, x^n) is an n-tuple of real numbers. We use upper indices for identifying components, leaving us the lower index for sequences of vectors. For example, we can have vectors x_1 and x_2 in R^n, and then x_1 = (x_1^1, x_1^2, …, x_1^n) and x_2 = (x_2^1, x_2^2, …, x_2^n). A vector can also be written as a column vector:
\[ x = (x^1, x^2, \ldots, x^n) = \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{bmatrix}. \tag{9.1.1} \]
We will do so when convenient. We call real numbers scalars to distinguish them from vectors. Let X be a set together with
operations of addition, +: X × X → X , and multiplication, ⋅: R × X → X , (we write ax instead of a ⋅ x ). X is called a
vector space (or a real vector space) if the following conditions are satisfied: (Addition is associative) If u, v, w ∈ X , then
u + (v + w) = (u + v) + w . (Addition is commutative) If u, v ∈ X , then u + v = v + u . (Additive identity) There is a
0 ∈ X such that v + 0 = v for all v ∈ X . (Additive inverse) For every v ∈ X , there is a −v ∈ X , such that v + (−v) = 0 .
for all v ∈ X. Elements of a vector space are usually called vectors, even if they are not elements of R^n (vectors in the “traditional” sense). An example vector space is R^n, where addition and multiplication by a constant is done componentwise: if α ∈ R and x, y ∈ R^n, then
\[
\begin{aligned}
x + y &:= (x^1, x^2, \ldots, x^n) + (y^1, y^2, \ldots, y^n) = (x^1 + y^1, x^2 + y^2, \ldots, x^n + y^n), \\
\alpha x &:= \alpha (x^1, x^2, \ldots, x^n) = (\alpha x^1, \alpha x^2, \ldots, \alpha x^n).
\end{aligned}
\]
In this book we mostly deal with vector spaces that can be regarded as subsets of R^n, but there are other vector spaces that are useful in analysis. For example, the space C([0, 1], R) of continuous functions on the interval [0, 1] is a vector space. A trivial example of a vector space (the smallest one in fact) is just X = {0}. The operations are defined in the obvious way. You always need a zero vector to exist, so all vector spaces are nonempty sets. It is also possible to use other fields than R in the definition (for example it is common to use the complex numbers C), but let us stick with the real numbers. A function f: X → Y, when Y is not R, is often called a mapping or a map rather than a function.
Linear combinations and dimension
If x_1, …, x_k are vectors and a^1, …, a^k are scalars, then
\[ a^1 x_1 + a^2 x_2 + \cdots + a^k x_k \tag{9.1.2} \]
is called a linear combination of the vectors x_1, …, x_k. If Y ⊂ R^n is a set then the span of Y, or in notation span(Y), is the set of all linear combinations of some finite number of elements of Y. We also say Y spans span(Y).
Let Y := {(1, 1)} ⊂ R². Then
\[ \operatorname{span}(Y) = \{ (x, x) \in \mathbb{R}^2 : x \in \mathbb{R} \}. \tag{9.1.3} \]
That is, span(Y) is the line through the origin and the point (1, 1).
Let Y := {(1, 1), (0, 1)} ⊂ R². Then
\[ \operatorname{span}(Y) = \mathbb{R}^2, \tag{9.1.4} \]
as any point (x, y) ∈ R² can be written as the linear combination (x, y) = x(1, 1) + (y − x)(0, 1).
A set of vectors {x_1, x_2, …, x_k} is linearly independent if the only solution to
\[ a^1 x_1 + a^2 x_2 + \cdots + a^k x_k = 0 \tag{9.1.6} \]
is the trivial solution a^1 = a^2 = ⋯ = a^k = 0. A set that is not linearly independent is linearly dependent. A linearly independent set B of vectors such that span(B) = X is called a basis of X. For example the set Y of the two vectors in the example above is a basis of R².
If a vector space X contains a linearly independent set of d vectors, but no linearly independent set of d + 1 vectors, then we say the dimension of X is d and write dim X := d. If for all d ∈ N the vector space X contains a set of d linearly independent vectors, we say X is infinite dimensional and write dim X := ∞. Clearly for the trivial vector space, dim {0} = 0. We will see in a moment that any vector space that is a subset of R^n has a finite dimension, and that dimension is less than or equal to n.
If a set is linearly dependent, then one of the vectors is a linear combination of the others. In other words, if a^j ≠ 0 in (9.1.6), then we can solve for x_j:
\[ x_j = -\frac{a^1}{a^j} x_1 - \cdots - \frac{a^{j-1}}{a^j} x_{j-1} - \frac{a^{j+1}}{a^j} x_{j+1} - \cdots - \frac{a^k}{a^j} x_k. \tag{9.1.7} \]
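Clearly then the vector x_j has at least two different representations as linear combinations of {x_1, x_2, …, x_k}.
If B = {x_1, x_2, …, x_k} is a basis of a vector space X, then every point y ∈ X has a unique representation of the form
\[ y = \sum_{j=1}^k \alpha^j x_j \tag{9.1.8} \]
for some numbers α^1, α^2, …, α^k. Such a representation exists because B spans X. For uniqueness, suppose
\[ y = \sum_{j=1}^k \alpha^j x_j = \sum_{j=1}^k \beta^j x_j, \tag{9.1.9} \]
then
\[ \sum_{j=1}^k (\alpha^j - \beta^j) x_j = 0. \tag{9.1.10} \]
By linear independence of the basis, α^j = β^j for all j, and the representation is unique.
For R^n we define e_1 := (1, 0, 0, …, 0), e_2 := (0, 1, 0, …, 0), …, e_n := (0, 0, 0, …, 1), and call this the standard basis of R^n. We use the same letters e_j for any R^n, and which space R^n we are working in is understood from context. A direct computation shows that {e_1, e_2, …, e_n} is really a basis of R^n; it is easy to show that it spans R^n and is linearly independent. In fact,
\[ x = (x^1, x^2, \ldots, x^n) = \sum_{j=1}^n x^j e_j. \tag{9.1.12} \]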
Let X be a vector space.
(i) If X is spanned by d vectors, then dim X ≤ d.
(ii) dim X = d if and only if X has a basis of d vectors (and so every basis has d vectors).
(iii) In particular, dim R^n = n.
(iv) If Y ⊂ X is a vector space and dim X = d, then dim Y ≤ d.
(v) If dim X = d and a set T of d vectors spans X, then T is linearly independent.
(vi) If dim X = d and a set T of m vectors is linearly independent, then there is a set S of d − m vectors such that T ∪ S is a basis of X.
Let us start with (i). Suppose S = {x_1, x_2, …, x_d} spans X, and T = {y_1, y_2, …, y_m} is a set of linearly independent vectors of X. We wish to show that m ≤ d. Write
\[ y_1 = \sum_{k=1}^d \alpha_1^k x_k, \tag{9.1.13} \]
which we can do as S spans X. One of the α_1^k is nonzero (otherwise y_1 would be zero), so suppose without loss of generality that α_1^1 ≠ 0. Then we can solve
\[ x_1 = \frac{1}{\alpha_1^1} y_1 - \sum_{k=2}^d \frac{\alpha_1^k}{\alpha_1^1} x_k. \tag{9.1.14} \]
In particular {y_1, x_2, …, x_d} spans X. Next,
\[ y_2 = \alpha_2^1 y_1 + \sum_{k=2}^d \alpha_2^k x_k. \tag{9.1.15} \]
As T is linearly independent, one of the α_2^k for k ≥ 2 must be nonzero. Without loss of generality suppose α_2^2 ≠ 0. Then
\[ x_2 = \frac{1}{\alpha_2^2} y_2 - \frac{\alpha_2^1}{\alpha_2^2} y_1 - \sum_{k=3}^d \frac{\alpha_2^k}{\alpha_2^2} x_k. \tag{9.1.16} \]
In particular {y_1, y_2, x_3, …, x_d} spans X. The astute reader will think back to linear algebra and notice that we are row-reducing a matrix.
We continue this procedure. If m < d, then we are done. So suppose m ≥ d. After d steps we obtain that {y_1, y_2, …, y_d} spans X. Any other vector v in X is a linear combination of {y_1, y_2, …, y_d}, and hence cannot be in T as T is linearly independent. So m = d.
Let us look at (ii). First notice that if we have a set T of k linearly independent vectors that do not span X, then we can always choose a vector v ∈ X ∖ span(T). The set T ∪ {v} is linearly independent (exercise). If dim X = d, then there must exist some linearly independent set of d vectors T, and it must span X, otherwise we could choose a larger set of linearly independent vectors. So we have a basis of d vectors. On the other hand if we have a basis of d vectors, it is linearly independent and spans X. By (i) we know there is no set of d + 1 linearly independent vectors, so the dimension must be d.
For (iii) notice that {e_1, e_2, …, e_n} is a basis of R^n.
To see (iv), suppose Y is a vector space and Y ⊂ X, where dim X = d. As X cannot contain d + 1 linearly independent vectors, neither can Y.
For (v) suppose T is a set of m vectors that is linearly dependent and spans X. Then one of the vectors is a linear combination of the others. Therefore if we remove it from T we obtain a set of m − 1 vectors that still span X and hence dim X ≤ m − 1.
For (vi) suppose T = {x_1, …, x_m} is a linearly independent set. We follow the procedure above in the proof of (ii) to keep adding vectors while keeping the set linearly independent. As the dimension is d we can add a vector exactly d − m times.
Linear mappings
A mapping A: X → Y of vector spaces X and Y is linear (or a linear transformation) if for every a ∈ R and x, y ∈ X we have
\[ A(ax) = aA(x), \qquad A(x + y) = A(x) + A(y). \]
We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto then we say A is invertible and we denote the inverse by A^{-1}. If A: X → X is linear then we say A is a linear operator on X.
We write L(X, Y) for the set of all linear transformations from X to Y, and just L(X) for the set of linear operators on X. If a, b ∈ R and A, B ∈ L(X, Y), define the transformation aA + bB by
\[ (aA + bB)(x) = aAx + bBx. \tag{9.1.18} \]
If A ∈ L(Y, Z) and B ∈ L(X, Y), define the transformation AB by ABx := A(Bx). Finally denote by I ∈ L(X) the identity: the linear operator such that Ix = x for all x.
It is not hard to see that aA + bB ∈ L(X, Y), and that AB ∈ L(X, Z). In particular, L(X, Y) is a vector space. It is obvious that if A is linear then A0 = 0.
If a linear mapping A: X → Y is invertible, then A^{-1} is linear.
Let a ∈ R and y ∈ Y. As A is onto, there is an x such that y = Ax, and as A is also one-to-one, A^{-1}(Az) = z for all z ∈ X. So
\[ A^{-1}(ay) = A^{-1}(aAx) = A^{-1}(A(ax)) = ax = aA^{-1}(y). \tag{9.1.20} \]
\[ x = \sum_{j=1}^n b^j x_j \tag{9.1.22} \]
The “furthermore” follows by defining the extension Ax = ∑_{j=1}^n b^j y_j, and noting that this is well defined by uniqueness of the representation of x.
If X is a finite dimensional vector space and A: X → X is linear, then A is one-to-one if and only if it is onto.
Let {x_1, x_2, …, x_n} be a basis for X. Suppose A is one-to-one. Now suppose
\[ \sum_{j=1}^n c^j A x_j = A \sum_{j=1}^n c^j x_j = 0. \tag{9.1.24} \]
As A is one-to-one, the only vector that A takes to 0 is 0 itself. Hence
\[ 0 = \sum_{j=1}^n c^j x_j \tag{9.1.25} \]
and so c^j = 0 for all j. Therefore, {Ax_1, Ax_2, …, Ax_n} is linearly independent. By an above proposition and the fact that the dimension is n, we have that {Ax_1, Ax_2, …, Ax_n} span X. Any point x ∈ X can be written as
\[ x = \sum_{j=1}^n a^j A x_j = A \sum_{j=1}^n a^j x_j, \tag{9.1.26} \]
so A is onto.
Now suppose A is onto. As A is determined by the action on the basis we see that every element of X has to be in the span of {Ax_1, …, Ax_n}. Suppose
\[ A \sum_{j=1}^n c^j x_j = \sum_{j=1}^n c^j A x_j = 0. \tag{9.1.27} \]
By the same proposition, as {Ax_1, Ax_2, …, Ax_n} span X, the set is independent, and hence c^j = 0 for all j. This means that
whenever x, y ∈ U, the line segment from x to y lies in U. That is, if the convex combination (1 − t)x + ty is in U for all t ∈ [0, 1].
Note that in R, every connected interval is convex. In R² (or higher dimensions) there are lots of nonconvex connected sets. For example the set R² ∖ {0} is not convex but it is connected. To see this simply take any x ∈ R² ∖ {0} and let y := −x. Then (1/2)x + (1/2)y = 0, which is not in the set. On the other hand, the ball B(x, r) ⊂ R^n (using the standard metric on R^n) is always convex by the triangle inequality.
Show that in R^n any ball B(x, r) for x ∈ R^n and r > 0 is convex.
Any subspace V of a vector space X is convex.
A somewhat more complicated example is given by the following. Let C([0, 1], R) be the vector space of continuous real valued functions on [0, 1]. Let X ⊂ C([0, 1], R) be the set of those f such
Then X is convex. Take t ∈ [0, 1] and note that if f , g ∈ X then tf (x) + (1 − t)g(x) ≥ 0 for all x. Furthermore
1 1 1
\[ C := \bigcap_{\lambda \in I} C_\lambda \tag{9.1.30} \]
is convex.
The proof is easy. If x, y ∈ C, then x, y ∈ C_λ for all λ ∈ I, and hence if t ∈ [0, 1], then tx + (1 − t)y ∈ C_λ for all λ ∈ I. Therefore tx + (1 − t)y ∈ C and C is convex.
Let T: V → W be a linear mapping between two vector spaces and let C ⊂ V be a convex set. Then T(C) is convex.
Take any two points p, q ∈ T(C). Then pick x, y ∈ C such that T(x) = p and T(y) = q. As C is convex, tx + (1 − t)y ∈ C for all t ∈ [0, 1], so
\[ tp + (1 - t)q = T\bigl(tx + (1 - t)y\bigr) \]
is in T(C).
A very useful construction is the convex hull. Given any set S ⊂ V of a vector space, define the convex hull of S, by
\[ \operatorname{co}(S) := \bigcap \{ C \subset V : S \subset C, \text{ and } C \text{ is convex} \}. \]
That is, the convex hull is the smallest convex set containing S. Note that by a proposition above, the intersection of convex sets is convex and hence, the convex hull is convex.
The convex hull of 0 and 1 in R is [0, 1]. Proof: Any convex set containing 0 and 1 must contain [0, 1]. The set [0, 1] is convex, therefore it must be the convex hull.
Exercises
Verify that R^n is a vector space.
Let X be a vector space. Prove that a finite set of vectors {x_1, …, x_n} ⊂ X is linearly independent if and only if for every j = 1, 2, …, n
\[ \operatorname{span}(\{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n\}) \subsetneq \operatorname{span}(\{x_1, \ldots, x_n\}). \]
That is, the span of the set with one vector removed is strictly smaller.
Prove that C([0, 1], R) is an infinite dimensional vector space where the operations are defined in the obvious way: s = f + g and m = fg are defined as s(x) := f(x) + g(x) and m(x) := f(x)g(x). Hint: for the dimension, think of functions that are only nonzero on the interval (1/(n + 1), 1/n).
Let k: [0, 1]² → R be continuous. Show that L: C([0, 1], R) → C([0, 1], R) defined by
is a linear operator. That is, show that L is well defined (that Lf is continuous), and that L is linear.
\[ x \cdot y := \sum_{j=1}^n x^j y^j. \]
It is easy to see that the dot product is linear in each variable separately, that is, it is a linear mapping when you keep one of the
variables constant. The Euclidean norm is then defined as
|x ⋅ y| ≤ ‖x‖‖y‖ = √x ⋅ x √y ⋅ y,
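with equality if and only if the vectors are scalar multiples of each other.
If x = 0 or y = 0, then the theorem holds trivially. So assume x ≠ 0 and y ≠ 0. If x is a scalar multiple of y, that is x = λy for some λ ∈ R, then the theorem holds with equality:
\[ |x \cdot y| = |\lambda|\, |y \cdot y| = |\lambda|\, \|y\|^2 = \|\lambda y\| \|y\| = \|x\| \|y\|. \]
Next, for t ∈ R consider the polynomial
\[ \|x + ty\|^2 = \|x\|^2 + 2t (x \cdot y) + t^2 \|y\|^2. \]
If x is not a scalar multiple of y, then ‖x + ty‖² > 0 for all t. So the above polynomial in t is never zero. From elementary algebra it follows that the discriminant must be negative:
\[ 4 (x \cdot y)^2 - 4 \|x\|^2 \|y\|^2 < 0, \]
or in other words (x ⋅ y)² < ‖x‖²‖y‖².
Item (iii), the triangle inequality, follows via a simple computation:
\[ \|x + y\|^2 = x \cdot x + y \cdot y + 2 (x \cdot y) \leq \|x\|^2 + \|y\|^2 + 2 \|x\| \|y\| = (\|x\| + \|y\|)^2. \]
The distance d(x, y) := ‖x − y‖ is the standard distance function on R^n that we used when we talked about metric spaces. In fact, on any vector space X, once we have a norm (any norm), we define a distance d(x, y) := ‖x − y‖ that makes X into a metric space (an easy exercise).
Let A ∈ L(X, Y). Define
\[ \|A\| := \sup \{ \|Ax\| : x \in X \text{ with } \|x\| = 1 \}. \]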
The number ‖A‖ is called the operator norm. We will see below that indeed it is a norm (at least for finite dimensional spaces).
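By linearity we get
\[ \|A\| = \sup \{ \|Ax\| : x \in X \text{ with } \|x\| = 1 \} = \sup_{\substack{x \in X \\ x \neq 0}} \frac{\|Ax\|}{\|x\|}. \]
This implies that
\[ \|Ax\| \leq \|A\| \|x\|. \]
In particular L(R^n, R^m) is a metric space with distance ‖A − B‖. If A ∈ L(R^n, R^m) and B ∈ L(R^m, R^k), then
\[ \|BA\| \leq \|B\| \|A\|. \]
For (i), let x ∈ R^n. We know that A is defined by its action on a basis. Write
\[ x = \sum_{j=1}^n c^j e_j. \]
Then
\[ \|Ax\| = \Bigl\| \sum_{j=1}^n c^j A e_j \Bigr\| \leq \sum_{j=1}^n |c^j| \, \|A e_j\|. \]
If ‖x‖ = 1, then it is easy to see that |c^j| ≤ 1 for all j, so
\[ \|Ax\| \leq \sum_{j=1}^n |c^j| \, \|A e_j\| \leq \sum_{j=1}^n \|A e_j\|. \]
The right hand side does not depend on x and so we are done, we have found a finite upper bound. Next,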
as we mentioned above. So if ‖A‖ < ∞, then this says that A is Lipschitz with constant ‖A‖. For (ii), let us note that
‖(A + B)x‖ = ‖Ax + Bx‖ ≤ ‖Ax‖ + ‖Bx‖ ≤ ‖A‖‖x‖ + ‖B‖‖x‖ = (‖A‖ + ‖B‖)‖x‖.
Hence |c|‖A‖ ≤ ‖cA‖. That we have a metric space follows pretty easily, and is left to the student. For (iii) write
\[ \|BAx\| \leq \|B\| \|Ax\| \leq \|B\| \|A\| \|x\|. \]
As a norm defines a metric, we have defined a metric space topology on L(R^n, R^m) so we can talk about open/closed sets, continuity, and convergence. Note that we have defined a norm only on R^n and not on an arbitrary finite dimensional vector space. However, after picking bases, we can define a norm on any vector space in the same way. So we really have a topology on any L(X, Y), although the precise metric would depend on the basis picked.
Let U ⊂ L(R^n) be the set of invertible linear operators.
(i) If A ∈ U and B ∈ L(R^n), and
\[ \|A - B\| < \frac{1}{\|A^{-1}\|}, \]
then B is invertible.
(ii) U is open and A ↦ A^{-1} is a continuous function on U.
The proposition says that U is an open set and A ↦ A^{-1} is continuous on U. You should always think back to R^1, where linear operators are just numbers a. The operator a is invertible (a^{-1} = 1/a) whenever a ≠ 0. Of course a ↦ 1/a is continuous. When n > 1, then there are other noninvertible operators, and in general things are a bit more difficult.
Let us prove (i). First a straightforward computation
or in other words ‖Bx‖ ≠ 0 for all nonzero x, and hence Bx ≠ 0 for all nonzero x. This is enough to see that B is one-to-one (if Bx = By, then B(x − y) = 0, so x = y). As B is a one-to-one operator from R^n to R^n it is onto and hence invertible.
Let us look at (ii). Let B be invertible and near A, that is, suppose the inequality in (i) is satisfied. In fact, suppose ‖A − B‖‖A^{-1}‖ < 1/2. Then we have shown above (using B^{-1}y instead of x)
and
Matrices
Finally let us get to matrices, which are a convenient way to represent finite-dimensional operators. If we have bases {x_1, x_2, …, x_n} and {y_1, y_2, …, y_m} for vector spaces X and Y, then we know that a linear operator is determined by its values on the basis. Given A ∈ L(X, Y), define the numbers {a_j^i} as follows
\[ A x_j = \sum_{i=1}^m a_j^i y_i, \]
and write them as a matrix
\[ A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_n^1 \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^m & a_2^m & \cdots & a_n^m \end{bmatrix}. \]
Note that the columns of the matrix are precisely the coefficients that represent Ax_j. Let us derive the familiar rule for matrix multiplication. When
\[ x = \sum_{j=1}^n \gamma^j x_j, \]
then
\[ Ax = \sum_{j=1}^n \sum_{i=1}^m \gamma^j a_j^i y_i = \sum_{i=1}^m \Bigl( \sum_{j=1}^n \gamma^j a_j^i \Bigr) y_i, \]
which gives rise to the familiar rule for matrix multiplication.
There is a one-to-one correspondence between matrices and linear operators in L(X, Y), once we fix a basis in X and in Y. If we chose a different basis, we would get different matrices. This is important: the operator A acts on elements of X, while the matrix is something that works with n-tuples of numbers.
If B is an r-by-m matrix with entries b_k^j, then the matrix for BA has the i,kth entry c_k^i being
\[ c_k^i = \sum_{j=1}^m b_j^i a_k^j. \]
Note how upper and lower indices line up.
A linear mapping changing one basis to another is then just a square matrix in which the columns represent basis elements of the second basis in terms of the first basis. We call such a linear mapping a change of basis.
Now suppose all the bases are just the standard bases and X = R^n and Y = R^m. If we recall the Cauchy-Schwarz inequality we note that
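\[ \|Ax\|^2 = \sum_{i=1}^m \Bigl( \sum_{j=1}^n \gamma^j a_j^i \Bigr)^2 \leq \sum_{i=1}^m \Bigl( \sum_{j=1}^n (\gamma^j)^2 \Bigr) \Bigl( \sum_{j=1}^n (a_j^i)^2 \Bigr) = \Bigl( \sum_{i=1}^m \sum_{j=1}^n (a_j^i)^2 \Bigr) \|x\|^2. \]
In other words, we have a bound on the operator norm:
\[ \|A\| \leq \sqrt{ \sum_{i=1}^m \sum_{j=1}^n (a_j^i)^2 }. \]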
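The bound on the operator norm by the square root of the sum of squared entries can be checked numerically. The following is a rough sketch only: the matrix `A` is an arbitrary example, the helper names are made up, and random sampling of unit vectors gives just a lower estimate of ‖A‖, not its exact value.

```python
import math
import random

def apply(A, x):
    """Return the matrix-vector product Ax for A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(c * c for c in v))

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]          # an arbitrary 2-by-3 example matrix
bound = math.sqrt(sum(a * a for row in A for a in row))

random.seed(0)
largest = 0.0
for _ in range(10_000):
    x = [random.gauss(0, 1) for _ in range(3)]
    r = norm(x)
    x = [c / r for c in x]      # normalize so that ||x|| = 1
    largest = max(largest, norm(apply(A, x)))

print(largest, bound)           # sampled lower estimate of ||A||, and the bound
```

Every sampled value of ‖Ax‖ with ‖x‖ = 1 stays below the bound, as the inequality predicts; the two numbers generally differ because the bound is not sharp.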
If the entries go to zero, then ‖A‖ goes to zero. In particular, if A is fixed and B is changing such that the entries of A − B go to zero then B goes to A in operator norm. That is, B goes to A in the metric space topology induced by the operator norm. We have proved the first part of:
If f : S → R^{nm} is a continuous function for a metric space S, then taking the components of f as the entries of a matrix, f is a continuous mapping from S to L(R^n, R^m). Conversely if f : S → L(R^n, R^m) is a continuous function then the entries of the matrix are continuous functions.
The proof of the second part is rather easy. Take f(x)e_j and note that it is a continuous function to R^m with the standard Euclidean norm (note ‖(A − B)e_j‖ ≤ ‖A − B‖). Recall from last semester that such a function is continuous if and only if its components are continuous, and these are the components of the jth column of the matrix f(x).
Determinants
It would be nice to have an easy test for when a matrix is invertible. This is where determinants come in. First, the symbol sgn(x) for a number x is defined by
\[ \operatorname{sgn}(x) := \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases} \]
Suppose σ = (σ_1, …, σ_n) is a permutation of the integers (1, …, n). It is not hard to see that any permutation can be obtained by a sequence of transpositions (switchings of two elements). Call a permutation even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from σ to (1, …, n). It can be shown that this is well defined, in fact it is not hard to show that
\[ \operatorname{sgn}(\sigma) := \operatorname{sgn}\Bigl( \prod_{p < q} (\sigma_q - \sigma_p) \Bigr) \]
is 1 if σ is even and −1 if σ is odd. This fact can be proved by noting that applying a transposition changes the sign, which is not hard to prove by induction on n. Then note that the sign of (1, 2, …, n) is 1.
Let S_n be the set of all permutations on n elements. For an n × n matrix A = [a_j^i], define
\[ \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n a_{\sigma_i}^i. \]
(i) det(I) = 1.
(ii) det([x_1 x_2 … x_n]) as a function of column vectors x_j is linear in each variable x_j separately.
(iii) If two columns of a matrix are interchanged, then the determinant changes sign.
(iv) If two columns of A are equal, then det(A) = 0.
(v) If a column is zero, then det(A) = 0.
(vi) The function A ↦ det(A) is continuous.
(vii) For 2 × 2 and 1 × 1 matrices, det [a b; c d] = ad − bc and det [a] = a.
In fact, the determinant is the unique function that satisfies (i), (ii), and (iii). But we digress.
We go through the proof quickly, as you have likely seen this before. (i) is trivial. For (ii) notice that each term in the definition of the determinant contains exactly one factor from each column. Part (iii) follows by noting that switching two columns is like switching the two corresponding numbers in every element in S_n. Hence all the signs are changed. Part (iv) follows because if two columns are equal and we switch them we get the same matrix back and so part (iii) says the determinant must have been 0. Part (v) follows because the product in each term in the definition includes one element from the zero column. Part (vi) follows as det is a polynomial in the entries of the matrix and hence continuous. We have seen that a function defined on matrices is continuous in the operator norm if it is continuous in the entries. Finally, part (vii) is a direct computation.
If A and B are n × n matrices, then det(AB) = det(A) det(B). In particular, A is invertible if and only if det(A) ≠ 0 and in this case, det(A^{-1}) = 1/det(A).
Let b_1, b_2, …, b_n be the columns of B. Then
\[ AB = [Ab_1 \;\; Ab_2 \;\; \cdots \;\; Ab_n]. \]
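The product rule det(AB) = det(A) det(B) can be sanity-checked on small integer matrices. The sketch below assumes the determinant is the usual signed sum over permutations (the Leibniz formula) and computes the sign of a permutation by counting inversions, which has the same parity as the number of transpositions. The function names and the two example matrices are chosen here for illustration and do not come from the text.

```python
from itertools import permutations

def sgn_perm(sigma):
    """Sign of a permutation: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for p in range(len(sigma))
                     for q in range(p + 1, len(sigma))
                     if sigma[p] > sigma[q])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant as a signed sum over permutations (Leibniz formula)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn_perm(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]   # one factor from each row and each column
        total += term
    return total

def matmul(A, B):
    """Product of square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 2, 0], [0, 1, 1], [3, 0, 2]]
print(det(A), det(B), det(matmul(A, B)))
```

The sum has n! terms, so this is only practical for very small n; it is meant to mirror the definition, not to be an efficient algorithm.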
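That is, the columns of AB are Ab_1, Ab_2, …, Ab_n. Let b_i^j denote the elements of B and a_j the columns of A. Note that Ae_j = a_j. By linearity of the determinant as proved above we have
\[
\begin{aligned}
\det(AB) &= \det([Ab_1 \;\; Ab_2 \;\; \cdots \;\; Ab_n]) = \det\Bigl(\Bigl[\sum_{j=1}^n b_1^j a_j \;\; Ab_2 \;\; \cdots \;\; Ab_n\Bigr]\Bigr) \\
&= \sum_{1 \leq j_1, \ldots, j_n \leq n} b_1^{j_1} b_2^{j_2} \cdots b_n^{j_n} \det([a_{j_1} \;\; a_{j_2} \;\; \cdots \;\; a_{j_n}]) \\
&= \Bigl( \sum_{(j_1, \ldots, j_n) \in S_n} b_1^{j_1} b_2^{j_2} \cdots b_n^{j_n} \operatorname{sgn}(j_1, \ldots, j_n) \Bigr) \det([a_1 \;\; a_2 \;\; \cdots \;\; a_n]).
\end{aligned}
\]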
In the above, we go from all integers between 1 and n to just elements of S_n by noting that when two columns in the determinant are the same, the determinant is zero. We then reorder the columns to the original ordering and obtain the sgn. The conclusion follows by recognizing the determinant of B. The rows and columns are swapped, but a moment’s reflection reveals it does not matter. We could also just plug in A = I above.
For the second part of the theorem note that if A is invertible, then A^{-1}A = I and so det(A^{-1}) det(A) = 1. If A is not invertible, then the columns are linearly dependent. That is, suppose
\[ \sum_{j=1}^n c^j a_j = 0, \]
where not all c^j are zero. Without loss of generality suppose c^1 ≠ 0. Let B be the matrix whose first column is (c^1, c^2, …, c^n) and whose jth column is e_j for j ≥ 2. It is not hard to see from the definition that det(B) = c^1 ≠ 0. Then det(AB) = det(A) det(B) = c^1 det(A). Note that the first column of AB is zero, and hence det(AB) = 0. Thus det(A) = 0.
There are three types of so-called elementary matrices.
First, for some j = 1, 2, …, n and some λ ∈ R, λ ≠ 0, an n × n matrix E defined by
\[ E e_i = \begin{cases} e_i & \text{if } i \neq j, \\ \lambda e_i & \text{if } i = j. \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix as M except with the jth row multiplied by λ. It is an easy computation (exercise) that det(E) = λ.
Second, for some j and k with j ≠ k, and λ ∈ R, an n × n matrix E defined by
\[ E e_i = \begin{cases} e_i & \text{if } i \neq j, \\ e_i + \lambda e_k & \text{if } i = j. \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix as M except with λ times the jth row added to the kth row. It is an easy computation (exercise) that det(E) = 1.
Finally, for some j and k with j ≠ k, an n × n matrix E defined by
\[ E e_i = \begin{cases} e_i & \text{if } i \neq j \text{ and } i \neq k, \\ e_k & \text{if } i = j, \\ e_j & \text{if } i = k. \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix with jth and kth rows swapped. It is an easy computation (exercise) that det(E) = −1.
Elementary matrices are useful for computing the determinant. The proof of the following proposition is left as an exercise.
Let T be an n × n invertible matrix. Then there exists a finite sequence of elementary matrices E_1, E_2, …, E_k such that
\[ T = E_1 E_2 \cdots E_k, \]
and
\[ \det(T) = \det(E_1) \det(E_2) \cdots \det(E_k). \]
The proof is immediate. If in one basis A is the matrix representing a linear operator, then for another basis we can find a matrix B such that the matrix B^{-1}AB takes us to the first basis, applies A in the first basis, and takes us back to the basis we started with. Therefore, the determinant can be defined as a function on the space L(X) for some finite dimensional vector space X, not just on matrices. We choose a basis on X, and we can represent a linear mapping using a matrix with respect to this basis. We obtain the same determinant as if we had used any other basis. It follows from the two propositions that
det : L(X) → R
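\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]
In other words, the derivative is the number a such that
\[ \lim_{h \to 0} \left| \frac{f(x + h) - f(x)}{h} - a \right| = \lim_{h \to 0} \left| \frac{f(x + h) - f(x) - ah}{h} \right| = \lim_{h \to 0} \frac{|f(x + h) - f(x) - ah|}{|h|} = 0. \]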
Multiplying by a is a linear map in one dimension. That is, we think of a ∈ L(R^1, R^1). We use this definition to extend differentiation to more variables.
Let U ⊂ R^n be an open subset and f : U → R^m. We say f is differentiable at x ∈ U if there exists an A ∈ L(R^n, R^m) such that
\[ \lim_{\substack{h \to 0 \\ h \in \mathbb{R}^n}} \frac{\|f(x + h) - f(x) - Ah\|}{\|h\|} = 0. \]
We define Df(x) := A, or f′(x) := A, and we say A is the derivative of f at x. When f is differentiable at all x ∈ U, we say simply that f is differentiable.
For a differentiable function, the derivative of f is a function from U to L(R^n, R^m). Compare to the one dimensional case, where the derivative is a function from U to R, but we really want to think of R here as L(R^1, R^1).
The norms above must be in the right spaces of course. The norm in the numerator is in R^m, and the norm in the denominator is in R^n where h lives. Normally it is understood that h ∈ R^n from context. We will not explicitly say so from now on.
We have again cheated somewhat and said that A is the derivative. We have not shown yet that there is only one, so let us do that now.
Let U ⊂ R^n be an open subset and f : U → R^m. Suppose x ∈ U and there exist A, B ∈ L(R^n, R^m) such that
\[ \lim_{h \to 0} \frac{\|f(x + h) - f(x) - Ah\|}{\|h\|} = 0 \qquad \text{and} \qquad \lim_{h \to 0} \frac{\|f(x + h) - f(x) - Bh\|}{\|h\|} = 0. \]
Then A = B.
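By the triangle inequality,
\[ \frac{\|(A - B)h\|}{\|h\|} \leq \frac{\|f(x + h) - f(x) - Ah\|}{\|h\|} + \frac{\|f(x + h) - f(x) - Bh\|}{\|h\|}. \]
So ‖(A − B)h‖/‖h‖ → 0 as h → 0. That is, given ϵ > 0, then for all nonzero h in some δ-ball around the origin
\[ \epsilon > \frac{\|(A - B)h\|}{\|h\|} = \Bigl\| (A - B) \frac{h}{\|h\|} \Bigr\|. \]
For any x with ‖x‖ = 1 let h = (δ/2) x, then ‖h‖ < δ and h/‖h‖ = x and so ‖A − B‖ ≤ ϵ. So A = B.
If f(x) = Ax for a linear mapping A, then f′(x) = A. This is easily seen:
\[ \frac{\|f(x + h) - f(x) - Ah\|}{\|h\|} = \frac{\|A(x + h) - Ax - Ah\|}{\|h\|} = \frac{0}{\|h\|} = 0. \]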
Let U ⊂ R^n be open and f : U → R^m be differentiable at x_0. Then f is continuous at x_0.
Another way to write the differentiability is to write
\[ r(h) := f(x_0 + h) - f(x_0) - f'(x_0) h. \]
As ‖r(h)‖/‖h‖ must go to zero as h → 0, then r(h) itself must go to zero. The mapping h ↦ f′(x_0)h is a linear mapping between finite dimensional spaces. Therefore it is continuous and goes to zero. Therefore, f(x_0 + h) must go to f(x_0) as h → 0. That is, f is continuous at x_0.
Let U ⊂ R^n be open and let f : U → R^m be differentiable at x_0 ∈ U. Let V ⊂ R^m be open, f(U) ⊂ V and let g : V → R^ℓ be differentiable at f(x_0). Then
\[ F(x) = g\bigl(f(x)\bigr) \]
is differentiable at x_0 and
\[ F'(x_0) = g'\bigl(f(x_0)\bigr) f'(x_0). \]
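The chain rule says that the matrix of F′(x_0) is the product of the matrices of g′(f(x_0)) and f′(x_0). The sketch below checks this on one example; the maps `f` and `g`, the point `x0`, and the step `eps` are arbitrary choices made for illustration, and the central differences serve only as an independent numerical check.

```python
import math

def f(x):
    """Inner map f : R^2 -> R^2 (arbitrary smooth example)."""
    return [x[0] * x[1], x[0] + x[1] ** 2]

def df(x):
    """Matrix of f'(x), computed by hand."""
    return [[x[1], x[0]],
            [1.0, 2 * x[1]]]

def g(y):
    """Outer map g : R^2 -> R (arbitrary smooth example)."""
    return math.sin(y[0]) + y[0] * y[1]

def dg(y):
    """1-by-2 matrix of g'(y), computed by hand."""
    return [[math.cos(y[0]) + y[1], y[0]]]

def matmul(B, A):
    """Matrix product BA for matrices given as lists of rows."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))] for i in range(len(B))]

x0 = [0.5, -1.0]
chain = matmul(dg(f(x0)), df(x0))[0]     # the row g'(f(x0)) f'(x0)

# Central differences of F = g o f, used only as an independent check.
eps = 1e-6
numeric = []
for j in range(2):
    xp = list(x0)
    xp[j] += eps
    xm = list(x0)
    xm[j] -= eps
    numeric.append((g(f(xp)) - g(f(xm))) / (2 * eps))

print(chain, numeric)
```

The order of the factors matters: the matrix of the outer derivative is on the left, matching F′(x_0) = BA with A := f′(x_0) and B := g′(f(x_0)).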
Without the points this is sometimes written as F′ = (g ∘ f)′ = g′f′. The way to understand it is that the derivative of the composition g ∘ f is the composition of the derivatives of g and f. That is, if A := f′(x_0) and B := g′(f(x_0)), then F′(x_0) = BA.
Let A := f′(x_0) and B := g′(f(x_0)). Take h ∈ R^n and write y_0 = f(x_0), k = f(x_0 + h) − f(x_0). Let
\[ r(h) := f(x_0 + h) - f(x_0) - Ah. \]
Then
\[
\begin{split}
\frac{\|F(x_0 + h) - F(x_0) - BAh\|}{\|h\|}
& = \frac{\|g\bigl(f(x_0 + h)\bigr) - g\bigl(f(x_0)\bigr) - BAh\|}{\|h\|} \\
& = \frac{\|g(y_0 + k) - g(y_0) - B\bigl(k - r(h)\bigr)\|}{\|h\|} \\
& \leq \frac{\|g(y_0 + k) - g(y_0) - Bk\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|} \\
& = \frac{\|g(y_0 + k) - g(y_0) - Bk\|}{\|k\|} \, \frac{\|f(x_0 + h) - f(x_0)\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|}.
\end{split}
\]
First, ‖B‖ is constant and f is differentiable at x_0, so the term ‖B‖ ‖r(h)‖/‖h‖ goes to 0. Next as f is continuous at x_0, we have that as h goes to 0, then k goes to 0. Therefore ‖g(y_0 + k) − g(y_0) − Bk‖/‖k‖ goes to 0 because g is differentiable at y_0. Finally
\[ \frac{\|f(x_0 + h) - f(x_0)\|}{\|h\|} \leq \frac{\|f(x_0 + h) - f(x_0) - Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|} \leq \frac{\|f(x_0 + h) - f(x_0) - Ah\|}{\|h\|} + \|A\|. \]
As f is differentiable at x_0, the term ‖f(x_0 + h) − f(x_0)‖/‖h‖ stays bounded as h goes to 0. Therefore, ‖F(x_0 + h) − F(x_0) − BAh‖/‖h‖ goes to zero, and F′(x_0) = BA, which is what was claimed.
Partial derivatives
There is another way to generalize the derivative from one dimension. We can hold all but one variable constant and take the regular derivative. Let f : U → R be a function on an open set U ⊂ R^n. If the following limit exists we write
\[ \frac{\partial f}{\partial x^j}(x) := \lim_{h \to 0} \frac{f(x + h e_j) - f(x)}{h}. \]
∂f
We call (x) the partial derivative of f with respect to x j. Sometimes we write D jf instead. For a mapping f : U → R m we write f = (f 1, f 2, …, f m), where f k are real-valued functions. Then we
∂x j
∂f k
define (or write it as D jf k). Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the total derivative of a function. Let U ⊂ R n be
∂x j
open and let f : U → R m be differentiable at x 0 ∈ U. Then all the partial derivatives at x 0 exist and in terms of the standard basis of R n and R m, f ′ (x 0) is represented by the matrix
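\[
\begin{bmatrix}
\frac{\partial f^1}{\partial x^1}(x_0) & \frac{\partial f^1}{\partial x^2}(x_0) & \ldots & \frac{\partial f^1}{\partial x^n}(x_0) \\
\frac{\partial f^2}{\partial x^1}(x_0) & \frac{\partial f^2}{\partial x^2}(x_0) & \ldots & \frac{\partial f^2}{\partial x^n}(x_0) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f^m}{\partial x^1}(x_0) & \frac{\partial f^m}{\partial x^2}(x_0) & \ldots & \frac{\partial f^m}{\partial x^n}(x_0)
\end{bmatrix} .
\]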
In other words
\[
f'(x_0)\, e_j = \sum_{k=1}^m \frac{\partial f^k}{\partial x^j}(x_0)\, e_k .
\]
If \(h = \sum_{j=1}^n c^j e_j\), then
\[
f'(x_0)\, h = \sum_{j=1}^n \sum_{k=1}^m c^j \frac{\partial f^k}{\partial x^j}(x_0)\, e_k .
\]
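To see the matrix of partial derivatives acting as the derivative, here is a small Python sketch. It takes a map of our own choosing (the function `f` and the point \((1, 2)\) are assumptions for illustration), writes down its matrix of partial derivatives by hand, and evaluates the quotient \(\lVert f(x_0+h)-f(x_0)-Ah \rVert / \lVert h \rVert\) for shrinking \(h\).

```python
import math

def f(x, y):
    # illustrative map f : R^2 -> R^2 (an assumption, not from the text)
    return (x * x * y, math.sin(x) + y)

def jacobian_matrix(x, y):
    # rows: components f^1, f^2; columns: partials in x and y, computed by hand
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def remainder_ratio(x0, y0, h):
    """Return ||f(x0 + h) - f(x0) - A h|| / ||h|| with A the matrix of partials."""
    A = jacobian_matrix(x0, y0)
    base = f(x0, y0)
    moved = f(x0 + h[0], y0 + h[1])
    Ah = [A[0][0] * h[0] + A[0][1] * h[1],
          A[1][0] * h[0] + A[1][1] * h[1]]
    r = [moved[i] - base[i] - Ah[i] for i in range(2)]
    return math.hypot(r[0], r[1]) / math.hypot(h[0], h[1])

# The ratio should shrink as h shrinks if A really represents f'(x0).
ratios = [remainder_ratio(1.0, 2.0, (t, -t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios decrease roughly in proportion to the size of \(h\), as expected for a map with continuous second derivatives.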
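Again note the up-down pattern with the indices being summed over. That is on purpose.

Fix a \(j\) and note that
\[
\left\lVert \frac{f(x_0+he_j)-f(x_0)}{h} - f'(x_0) e_j \right\rVert
= \left\lVert \frac{f(x_0+he_j)-f(x_0) - f'(x_0) h e_j}{h} \right\rVert
= \frac{\lVert f(x_0+he_j)-f(x_0) - f'(x_0) h e_j \rVert}{\lVert h e_j \rVert} .
\]
As \(h\) goes to 0, the right hand side goes to zero by differentiability of \(f\), and hence
\[
\lim_{h \to 0} \frac{f(x_0+he_j)-f(x_0)}{h} = f'(x_0) e_j .
\]
Note that \(f\) is vector valued. So represent \(f\) by components \(f = (f^1, f^2, \ldots, f^m)\), and note that taking a limit in \(\mathbb{R}^m\) is the same as taking the limit in each component separately. Therefore for any \(k\) the partial derivative
\[
\frac{\partial f^k}{\partial x^j}(x_0) = \lim_{h \to 0} \frac{f^k(x_0+he_j)-f^k(x_0)}{h}
\]
exists and is equal to the \(k\)th component of \(f'(x_0) e_j\), and we are done.

One of the consequences of the theorem is that if \(f\) is differentiable on \(U\), then \(f' \colon U \to L(\mathbb{R}^n,\mathbb{R}^m)\) is a continuous function if and only if all the \(\frac{\partial f^k}{\partial x^j}\) are continuous functions.

Gradient and directional derivatives

Let \(U \subset \mathbb{R}^n\) be open and \(f \colon U \to \mathbb{R}\) a differentiable function. We define the gradient as
\[
\nabla f(x) := \sum_{j=1}^n \frac{\partial f}{\partial x^j}(x)\, e_j .
\]
Here the upper-lower indices do not really match up.

Suppose \(\gamma \colon (a,b) \subset \mathbb{R} \to \mathbb{R}^n\) is a differentiable function and the image \(\gamma\bigl((a,b)\bigr) \subset U\). Write \(\gamma = (\gamma^1, \gamma^2, \ldots, \gamma^n)\). Let
\[
g(t) := f\bigl(\gamma(t)\bigr) .
\]
The function \(g\) is differentiable, and by the chain rule its derivative is
\[
g'(t) = \sum_{j=1}^n \frac{\partial f}{\partial x^j}\bigl(\gamma(t)\bigr) \frac{d\gamma^j}{dt}(t)
= \sum_{j=1}^n \frac{\partial f}{\partial x^j} \frac{d\gamma^j}{dt} .
\]
For convenience, we sometimes leave out the points where we are evaluating as on the right hand side above. Notice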
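\[
g'(t) = (\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma'(t) = \nabla f \cdot \gamma' ,
\]
where the dot is the standard scalar dot product.

We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. So pick a vector \(u \in \mathbb{R}^n\) such that \(\lVert u \rVert = 1\). Fix \(x \in U\). Then define
\[
\gamma(t) := x + tu .
\]
It is easy to compute that \(\gamma'(t) = u\) for all \(t\). By the chain rule,
\[
\frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = (\nabla f)(x) \cdot u ,
\]
where the notation \(\frac{d}{dt}\big|_{t=0}\) represents the derivative evaluated at \(t = 0\). We also compute directly
\[
\frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = \lim_{h \to 0} \frac{f(x+hu)-f(x)}{h} .
\]
We obtain the directional derivative, denoted by
\[
D_u f(x) := \frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] ,
\]
which can be computed by one of the methods above.

Let us suppose \((\nabla f)(x) \neq 0\). By the Cauchy-Schwarz inequality we have
\[
\lvert D_u f(x) \rvert \leq \lVert (\nabla f)(x) \rVert .
\]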
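The relation between the directional derivative, the gradient, and the Cauchy-Schwarz bound can be checked numerically. The sketch below uses a function `f` and a point `(x0, y0)` chosen by us purely for illustration; the gradient is computed by hand and the directional derivative by a central difference quotient.

```python
import math

def f(x, y):
    # illustrative differentiable function f : R^2 -> R (an assumption)
    return x * x + 3.0 * x * y

def grad_f(x, y):
    # gradient computed by hand: (2x + 3y, 3x)
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional_derivative(x, y, u, h=1e-6):
    """Central-difference approximation of d/dt f((x, y) + t u) at t = 0."""
    forward = f(x + h * u[0], y + h * u[1])
    backward = f(x - h * u[0], y - h * u[1])
    return (forward - backward) / (2 * h)

x0, y0 = 1.0, 2.0
g = grad_f(x0, y0)
g_norm = math.hypot(g[0], g[1])

# Pairs (difference quotient, gradient dot u) for unit vectors u around the circle.
samples = []
for k in range(12):
    angle = 2 * math.pi * k / 12
    u = (math.cos(angle), math.sin(angle))
    samples.append((directional_derivative(x0, y0, u), g[0] * u[0] + g[1] * u[1]))

# Direction of the normalized gradient, where the bound should be attained.
u_max = (g[0] / g_norm, g[1] / g_norm)
d_max = directional_derivative(x0, y0, u_max)
```

In this sketch every sampled \(D_u f\) agrees with \(\nabla f \cdot u\) and stays below \(\lVert \nabla f \rVert\) in absolute value, while the value along the normalized gradient matches \(\lVert \nabla f \rVert\) up to rounding.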
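Equality is achieved when \(u\) is a scalar multiple of \((\nabla f)(x)\). That is, when
\[
u = \frac{(\nabla f)(x)}{\lVert (\nabla f)(x) \rVert} ,
\]
we get \(D_u f(x) = \lVert (\nabla f)(x) \rVert\). The gradient points in the direction in which the function grows fastest, in other words, in the direction in which \(D_u f(x)\) is maximal.

Bounding the derivative

Let us prove a “mean value theorem” for vector valued functions.

If \(\varphi \colon [a,b] \to \mathbb{R}^n\) is differentiable on \((a,b)\) and continuous on \([a,b]\), then there exists a \(t \in (a,b)\) such that
\[
\lVert \varphi(b)-\varphi(a) \rVert \leq (b-a) \lVert \varphi'(t) \rVert .
\]
By the mean value theorem applied to the function \(t \mapsto \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(t)\) (the dot is the scalar dot product again), there is a \(t \in (a,b)\) such that
\[
\bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(b) - \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(a)
= \lVert \varphi(b)-\varphi(a) \rVert^2
= (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t) ,
\]
where we treat \(\varphi'\) as simply a column vector of numbers by abuse of notation. Note that in this case, it is not hard to see that \(\lVert \varphi'(t) \rVert_{L(\mathbb{R},\mathbb{R}^n)} = \lVert \varphi'(t) \rVert_{\mathbb{R}^n}\). By the Cauchy-Schwarz inequality,
\[
\lVert \varphi(b)-\varphi(a) \rVert^2
= (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t)
\leq (b-a) \lVert \varphi(b)-\varphi(a) \rVert \, \lVert \varphi'(t) \rVert .
\]
Dividing by \(\lVert \varphi(b)-\varphi(a) \rVert\) (if it is zero there is nothing to prove) gives the claim.

Let \(U \subset \mathbb{R}^n\) be a convex open set, \(f \colon U \to \mathbb{R}^m\) a differentiable function, and \(M\) a number such that
\[
\lVert f'(x) \rVert \leq M
\]
for all \(x \in U\). Then
\[
\lVert f(x)-f(y) \rVert \leq M \lVert x-y \rVert
\]
for all \(x, y \in U\).

Fix \(x\) and \(y\) in \(U\) and note that \((1-t)x+ty \in U\) for all \(t \in [0,1]\) by convexity. Next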
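\[
\frac{d}{dt} \Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] = f'\bigl((1-t)x+ty\bigr) (y-x) .
\]
By the mean value theorem above, applied on \([0,1]\), we get for some \(t \in (0,1)\)
\[
\lVert f(x)-f(y) \rVert
\leq \left\lVert \frac{d}{dt} \Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] \right\rVert
\leq \bigl\lVert f'\bigl((1-t)x+ty\bigr) \bigr\rVert \, \lVert y-x \rVert
\leq M \lVert y-x \rVert .
\]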
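The bound \(\lVert f(x)-f(y) \rVert \leq M \lVert x-y \rVert\) can be tested on random pairs of points. The following Python sketch does so for a map of our own choosing on the convex set \(\mathbb{R}^2\) (clipped to a square for sampling); the map `f`, the constant `M`, and the sampling range are assumptions for illustration.

```python
import math
import random

def f(p):
    # illustrative map f : R^2 -> R^2; its derivative is diag(cos x, -sin y)
    x, y = p
    return (math.sin(x), math.cos(y))

# The operator norm of diag(cos x, -sin y) is max(|cos x|, |sin y|) <= 1.
M = 1.0

def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

random.seed(0)  # fixed seed so the experiment is reproducible
worst_ratio = 0.0
for _ in range(1000):
    p = (random.uniform(-5, 5), random.uniform(-5, 5))
    q = (random.uniform(-5, 5), random.uniform(-5, 5))
    if dist(p, q) > 0:
        worst_ratio = max(worst_ratio, dist(f(p), f(q)) / dist(p, q))

print("largest observed ratio:", worst_ratio)
```

A random search of this kind cannot prove the proposition, but an observed ratio above `M` would indicate a mistake in the bound on the derivative.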
If \(U\) is not convex the proposition is not true. To see this fact, take \(U\) to be an annulus centered at the origin in \(\mathbb{R}^2\) with the points on the negative \(x\)-axis removed. Let \(f(x,y)\) be the angle that the line from the origin to \((x,y)\) makes with the positive \(x\) axis. You can even write the formula for \(f\):
\[
f(x,y) = 2 \arctan \left( \frac{y}{x+\sqrt{x^2+y^2}} \right) .
\]
Think spiral staircase with room in the middle. The function is differentiable, and the derivative is bounded on \(U\), which is not hard to see. Thinking of what happens near where the negative \(x\)-axis cuts the annulus in half, we see that the conclusion cannot hold.

Let us solve the differential equation \(f' = 0\).

If \(U \subset \mathbb{R}^n\) is connected and \(f \colon U \to \mathbb{R}^m\) is differentiable and \(f'(x) = 0\) for all \(x \in U\), then \(f\) is constant.

For any \(x \in U\), there is a ball \(B(x,\delta) \subset U\). The ball \(B(x,\delta)\) is convex. Since \(\lVert f'(y) \rVert \leq 0\) for all \(y \in B(x,\delta)\), by the theorem, \(\lVert f(x)-f(y) \rVert \leq 0 \lVert x-y \rVert = 0\), so \(f(x) = f(y)\) for all \(y \in B(x,\delta)\).

This means that \(f^{-1}(c)\) is open for any \(c \in \mathbb{R}^m\). Suppose \(f^{-1}(c)\) is nonempty. The two sets
\[
U' = \{ x \in U : f(x) = c \}, \qquad U'' = \{ x \in U : f(x) \neq c \}
\]
are open and disjoint, and further \(U = U' \cup U''\). So as \(U'\) is nonempty, and \(U\) is connected, we have that \(U'' = \emptyset\). So \(f(x) = c\) for all \(x \in U\).

Continuously differentiable functions

We say
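\(f \colon U \subset \mathbb{R}^n \to \mathbb{R}^m\) is continuously differentiable, or \(C^1(U)\), if \(f\) is differentiable and \(f' \colon U \to L(\mathbb{R}^n,\mathbb{R}^m)\) is continuous.

Let \(U \subset \mathbb{R}^n\) be open and \(f \colon U \to \mathbb{R}^m\). The function \(f\) is continuously differentiable if and only if all the partial derivatives exist and are continuous.

Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that \(f\) is differentiable; in fact, \(f\) may not even be continuous. See the exercises.

We have seen that if \(f\) is differentiable, then the partial derivatives exist. Furthermore, the partial derivatives are the entries of the matrix of \(f'(x)\). So if \(f' \colon U \to L(\mathbb{R}^n,\mathbb{R}^m)\) is continuous, then the entries are continuous, hence the partial derivatives are continuous.

To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix \(x \in U\). If we can show that \(f'(x)\) exists we are done, because the entries of the matrix \(f'(x)\) are then the partial derivatives, and if the entries are continuous functions, the matrix valued function \(f'\) is continuous.

Let us do induction on dimension. First let us note that the conclusion is true when \(n = 1\). In this case the derivative is just the regular derivative. Suppose the conclusion is true for \(\mathbb{R}^{n-1}\), that is, if we restrict to the first \(n-1\) variables, the conclusion is true. In the following we think of \(\mathbb{R}^{n-1}\) as a subset of \(\mathbb{R}^n\), that is, the set in \(\mathbb{R}^n\) where \(x^n = 0\). Let
\[
A =
\begin{bmatrix}
\frac{\partial f^1}{\partial x^1}(x) & \ldots & \frac{\partial f^1}{\partial x^n}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial f^m}{\partial x^1}(x) & \ldots & \frac{\partial f^m}{\partial x^n}(x)
\end{bmatrix} ,
\qquad
A_1 =
\begin{bmatrix}
\frac{\partial f^1}{\partial x^1}(x) & \ldots & \frac{\partial f^1}{\partial x^{n-1}}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial f^m}{\partial x^1}(x) & \ldots & \frac{\partial f^m}{\partial x^{n-1}}(x)
\end{bmatrix} ,
\qquad
v =
\begin{bmatrix}
\frac{\partial f^1}{\partial x^n}(x) \\
\vdots \\
\frac{\partial f^m}{\partial x^n}(x)
\end{bmatrix} .
\]
Let \(\epsilon > 0\) be given. Let \(\delta > 0\) be such that for any \(k \in \mathbb{R}^{n-1}\) with \(0 < \lVert k \rVert < \delta\) we have
\[
\frac{\lVert f(x+k) - f(x) - A_1 k \rVert}{\lVert k \rVert} < \epsilon .
\]
By continuity of the partial derivatives, suppose \(\delta\) is small enough so that
\[
\left\lvert \frac{\partial f^j}{\partial x^n}(x+h) - \frac{\partial f^j}{\partial x^n}(x) \right\rvert < \epsilon ,
\]
for all \(j\) and all \(h\) with \(\lVert h \rVert < \delta\).

Let \(h = h_1 + t e_n\) be a vector in \(\mathbb{R}^n\) where \(h_1 \in \mathbb{R}^{n-1}\) such that \(\lVert h \rVert < \delta\). Then \(\lVert h_1 \rVert \leq \lVert h \rVert < \delta\). Note that \(Ah = A_1 h_1 + tv\). Hence
\[
\begin{split}
\lVert f(x+h) - f(x) - Ah \rVert
& = \lVert f(x+h_1 + t e_n) - f(x+h_1) - tv + f(x+h_1) - f(x) - A_1 h_1 \rVert \\
& \leq \lVert f(x+h_1 + t e_n) - f(x+h_1) - tv \rVert + \lVert f(x+h_1) - f(x) - A_1 h_1 \rVert \\
& \leq \lVert f(x+h_1 + t e_n) - f(x+h_1) - tv \rVert + \epsilon \lVert h_1 \rVert .
\end{split}
\]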
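As all the partial derivatives exist, by the mean value theorem for each \(j\) there is some \(\theta_j \in [0,t]\) (or \([t,0]\) if \(t < 0\)), such that
\[
f^j(x+h_1 + t e_n) - f^j(x+h_1) = t \frac{\partial f^j}{\partial x^n}(x+h_1+\theta_j e_n) .
\]
Note that if \(\lVert h \rVert < \delta\), then \(\lVert h_1+\theta_j e_n \rVert \leq \lVert h \rVert < \delta\). So to finish the estimate
\[
\begin{split}
\lVert f(x+h) - f(x) - Ah \rVert
& \leq \lVert f(x+h_1 + t e_n) - f(x+h_1) - tv \rVert + \epsilon \lVert h_1 \rVert \\
& \leq \sqrt{\sum_{j=1}^m \left( t \frac{\partial f^j}{\partial x^n}(x+h_1+\theta_j e_n) - t \frac{\partial f^j}{\partial x^n}(x) \right)^2} + \epsilon \lVert h_1 \rVert \\
& \leq \sqrt{m}\, \epsilon \lvert t \rvert + \epsilon \lVert h_1 \rVert \\
& \leq (\sqrt{m}+1) \epsilon \lVert h \rVert .
\end{split}
\]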
The Jacobian

Let \(U \subset \mathbb{R}^n\) and \(f \colon U \to \mathbb{R}^n\) be a differentiable mapping. Then define the Jacobian of \(f\) at \(x\) as
\[
J_f(x) := \det\bigl( f'(x) \bigr) .
\]
Sometimes this is written as
\[
\frac{\partial(f^1,\ldots,f^n)}{\partial(x^1,\ldots,x^n)} .
\]
This last piece of notation may seem somewhat confusing, but it is useful when you need to specify the exact variables and function components used.

The Jacobian \(J_f\) is a real valued function, and when \(n = 1\) it is simply the derivative. When \(f\) is \(C^1\), then \(J_f\) is a continuous function. From the chain rule it follows that:
\[
J_{f \circ g}(x) = J_f\bigl(g(x)\bigr) J_g(x) .
\]
It can be computed directly that the determinant tells us what happens to area/volume. Suppose we are in \(\mathbb{R}^2\). Then if \(A\) is a linear transformation, it follows by direct computation that the direct image of the unit square \(A([0,1]^2)\) has area \(\lvert \det(A) \rvert\). Note that the sign of the determinant determines “orientation”. If the determinant is negative, then the two sides of the unit square will be flipped in the image. We claim without proof that this follows for arbitrary figures, not just the square.

Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and if it flips orientation.

Exercises

Let \(f \colon \mathbb{R}^2 \to \mathbb{R}\) be given by \(f(x,y) = \sqrt{x^2+y^2}\). Show that \(f\) is not differentiable at the origin.

Define a function \(f \colon \mathbb{R}^2 \to \mathbb{R}\) by
\[
f(x,y) :=
\begin{cases}
\frac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]
a) Show that partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) exist at all points (including the origin). b) Show that \(f\) is not continuous at the origin (and hence not differentiable).

Define a function \(f \colon \mathbb{R}^2 \to \mathbb{R}\) by
\[
f(x,y) :=
\begin{cases}
\frac{x^2y}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]
Learning Objects
The contraction mapping principle says that if f : X → X is a contraction and X is a complete metric space, then there exists a
fixed point, that is, there exists an x ∈ X such that f(x) = x. Intuitively if a function is differentiable, then it locally “behaves
like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is differentiable
and the derivative is invertible, the function is (locally) invertible. Let U ⊂ R n be a set and let f : U → R n be a continuously
differentiable function. Also suppose x 0 ∈ U, f(x 0) = y 0, and f ′ (x 0) is invertible (that is, J f(x 0) ≠ 0). Then there exist open sets
V, W ⊂ R n such that x 0 ∈ V ⊂ U, f(V) = W and f | V is one-to-one and onto. Furthermore, the inverse g(y) = (f | V) − 1(y) is
continuously differentiable and
\[
g'(y) = \bigl(f'(x)\bigr)^{-1}, \qquad \text{for all } x \in V,\ y = f(x) .
\]
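The conclusion \(g'(y) = \bigl(f'(x)\bigr)^{-1}\) can be illustrated numerically. The Python sketch below computes a local inverse `g` by iterating \(x \mapsto x + A^{-1}\bigl(y - f(x)\bigr)\) with \(A = f'(x_0)\), the same map \(\varphi_y\) that the proof studies, and compares a finite-difference matrix for \(g'(y)\) with the inverse of \(f'(x)\). The map `f`, the base point \(x_0 = (0,0)\), the point `y`, and the fixed number of iterations are our own assumptions; we simply assume the iterates stay in the region where the iteration contracts.

```python
import math

def f(x):
    # illustrative C^1 map f : R^2 -> R^2 with f(0, 0) = (0, 0) (an assumption)
    return (x[0] + 0.5 * x[1] ** 2, x[1] + 0.5 * math.sin(x[0]))

def fprime(x):
    # matrix of f'(x), computed by hand
    return [[1.0, x[1]], [0.5 * math.cos(x[0]), 1.0]]

def inverse2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

A_INV = inverse2(fprime((0.0, 0.0)))  # inverse of A = f'(x0) at x0 = (0, 0)

def g(y, steps=100):
    """Local inverse near y0 = (0, 0): iterate x -> x + A^{-1}(y - f(x))."""
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        r = (y[0] - fx[0], y[1] - fx[1])
        x = (x[0] + A_INV[0][0] * r[0] + A_INV[0][1] * r[1],
             x[1] + A_INV[1][0] * r[0] + A_INV[1][1] * r[1])
    return x

y = (0.1, -0.05)
x = g(y)
residual = max(abs(f(x)[i] - y[i]) for i in range(2))  # how well f(g(y)) = y holds

# Central-difference matrix for g'(y), compared with the inverse of f'(x), x = g(y).
eps = 1e-6
g_fd = [[(g((y[0] + eps * (j == 0), y[1] + eps * (j == 1)))[i]
          - g((y[0] - eps * (j == 0), y[1] - eps * (j == 1)))[i]) / (2 * eps)
         for j in range(2)] for i in range(2)]
expected = inverse2(fprime(x))
max_err = max(abs(g_fd[i][j] - expected[i][j]) for i in range(2) for j in range(2))
```

Here `residual` measures how well \(f\bigl(g(y)\bigr) = y\) holds and `max_err` how closely the difference quotients for \(g\) match \(\bigl(f'(x)\bigr)^{-1}\).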
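Write \(A = f'(x_0)\). As \(f'\) is continuous, there exists an open ball \(V\) around \(x_0\) such that
\[
\lVert A-f'(x) \rVert < \frac{1}{2\lVert A^{-1} \rVert} \qquad \text{for all } x \in V .
\]
Note that \(f'(x)\) is invertible for all \(x \in V\). Given \(y \in \mathbb{R}^n\), define \(\varphi_y \colon V \to \mathbb{R}^n\) by
\[
\varphi_y(x) = x + A^{-1}\bigl(y-f(x)\bigr) .
\]
As \(A^{-1}\) is one-to-one, \(\varphi_y(x) = x\) (\(x\) is a fixed point) if and only if \(y-f(x) = 0\), or in other words \(f(x) = y\). Using the chain rule we obtain
\[
\varphi_y'(x) = I - A^{-1} f'(x) = A^{-1} \bigl( A-f'(x) \bigr) .
\]
So for \(x \in V\) we have
\[
\lVert \varphi_y'(x) \rVert \leq \lVert A^{-1} \rVert \, \lVert A-f'(x) \rVert < \frac{1}{2} .
\]
As \(V\) is a ball, it is convex, and hence
\[
\lVert \varphi_y(x_1)-\varphi_y(x_2) \rVert \leq \frac{1}{2} \lVert x_1-x_2 \rVert \qquad \text{for all } x_1, x_2 \in V .
\]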
In other words φ y is a contraction defined on V, though we so far do not know what is the range of φ y. We cannot apply the
fixed point theorem, but we can say that φ y has at most one fixed point (note proof of uniqueness in the contraction mapping
principle). That is, there exists at most one x ∈ V such that f(x) = y, and so f | V is one-to-one. Let W = f(V). We need to show
that W is open. Take a y 1 ∈ W, then there is a unique x 1 ∈ V such that f(x 1) = y 1. Let r > 0 be small enough such that the
closed ball C(x 1, r) ⊂ V (such r > 0 exists as V is open). Suppose y is such that
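\[
\lVert y-y_1 \rVert < \frac{r}{2\lVert A^{-1} \rVert} .
\]
If we can show that \(y \in W\), then we have shown that \(W\) is open. Define \(\varphi_y(x) = x + A^{-1}\bigl(y-f(x)\bigr)\) as before. If \(x \in C(x_1,r)\), then
\[
\begin{split}
\lVert \varphi_y(x)-x_1 \rVert
& \leq \lVert \varphi_y(x)-\varphi_y(x_1) \rVert + \lVert \varphi_y(x_1)-x_1 \rVert \\
& \leq \frac{1}{2} \lVert x-x_1 \rVert + \lVert A^{-1}(y-y_1) \rVert \\
& \leq \frac{1}{2} r + \lVert A^{-1} \rVert \, \lVert y-y_1 \rVert \\
& < \frac{1}{2} r + \lVert A^{-1} \rVert \frac{r}{2\lVert A^{-1} \rVert} = r .
\end{split}
\]
So \(\varphi_y\) takes \(C(x_1,r)\) into \(B(x_1,r) \subset C(x_1,r)\). It is a contraction on \(C(x_1,r)\) and \(C(x_1,r)\) is complete (closed subset of \(\mathbb{R}^n\) is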
complete). Apply the contraction mapping principle to obtain a fixed point x, i.e. φ y(x) = x. That is f(x) = y. So
y ∈ f (C(x 1, r) ) ⊂ f(V) = W. Therefore W is open. Next we need to show that g is continuously differentiable and compute its
derivative. First let us show that it is differentiable. Let y ∈ W and k ∈ R n, k ≠ 0, such that y + k ∈ W. Then there are unique
x ∈ V and h ∈ R n, h ≠ 0 and x + h ∈ V, such that f(x) = y and f(x + h) = y + k as f | V is a one-to-one and onto mapping of V
onto W. In other words, g(y) = x and g(y + k) = x + h. We can still squeeze some information from the fact that φ y is a
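contraction.
\[
\varphi_y(x+h)-\varphi_y(x) = h + A^{-1} \bigl( f(x)-f(x+h) \bigr) = h - A^{-1} k .
\]
So
\[
\lVert h-A^{-1}k \rVert = \lVert \varphi_y(x+h)-\varphi_y(x) \rVert \leq \frac{1}{2} \lVert x+h-x \rVert = \frac{\lVert h \rVert}{2} .
\]
By the inverse triangle inequality \(\lVert h \rVert - \lVert A^{-1}k \rVert \leq \frac{1}{2}\lVert h \rVert\), so
\[
\lVert h \rVert \leq 2 \lVert A^{-1}k \rVert \leq 2 \lVert A^{-1} \rVert \, \lVert k \rVert .
\]
In particular, as \(k\) goes to 0, so does \(h\).

As \(x \in V\), \(f'(x)\) is invertible. Let \(B = \bigl(f'(x)\bigr)^{-1}\), which is what we think the derivative of \(g\) at \(y\) is. Then
\[
\begin{split}
\frac{\lVert g(y+k)-g(y)-Bk \rVert}{\lVert k \rVert}
& = \frac{\lVert h-Bk \rVert}{\lVert k \rVert} \\
& = \frac{\lVert h-B\bigl(f(x+h)-f(x)\bigr) \rVert}{\lVert k \rVert} \\
& = \frac{\lVert B\bigl(f(x+h)-f(x)-f'(x)h\bigr) \rVert}{\lVert k \rVert} \\
& \leq \lVert B \rVert \frac{\lVert h \rVert}{\lVert k \rVert} \, \frac{\lVert f(x+h)-f(x)-f'(x)h \rVert}{\lVert h \rVert} \\
& \leq 2 \lVert B \rVert \, \lVert A^{-1} \rVert \frac{\lVert f(x+h)-f(x)-f'(x)h \rVert}{\lVert h \rVert} .
\end{split}
\]
As \(k\) goes to 0, so does \(h\). So the right hand side goes to 0 as \(f\) is differentiable, and hence the left hand side also goes to 0. And \(B\) is precisely what we wanted \(g'(y)\) to be.

We have shown that \(g\) is differentiable; let us show it is \(C^1(W)\). Now, \(g \colon W \to V\) is continuous (it is differentiable), \(f'\) is a continuous function from \(V\) to \(L(\mathbb{R}^n)\), and \(X \mapsto X^{-1}\) is a continuous function on the invertible operators. So \(g'(y) = \Bigl( f'\bigl(g(y)\bigr)\Bigr)^{-1}\) is the composition of these three continuous functions and hence is continuous.

Suppose \(U \subset \mathbb{R}^n\) is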
open and f : U → R n is a continuously differentiable mapping such that f ′ (x) is invertible for all x ∈ U. Then given any open
set V ⊂ U, f(V) is open. (f is an open mapping). Without loss of generality, suppose U = V. For each point y ∈ f(V), we pick
x ∈ f − 1(y) (there could be more than one such point), then by the inverse function theorem there is a neighbourhood of x in V
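that maps onto a neighbourhood of \(y\). Hence \(f(V)\) is open.

The theorem, and the corollary, are not true if \(f'(x)\) is not invertible for some \(x\).

Denote points of \(\mathbb{R}^{n+m}\) by \((x,y)\) with \(x \in \mathbb{R}^n\) and \(y \in \mathbb{R}^m\). A linear mapping \(A \in L(\mathbb{R}^{n+m},\mathbb{R}^m)\) can then be written as \(A = [A_x ~ A_y]\) so that \(A(x,y) = A_x x + A_y y\), where \(A_x \in L(\mathbb{R}^n,\mathbb{R}^m)\) and \(A_y \in L(\mathbb{R}^m)\).

Let \(A = [A_x ~ A_y] \in L(\mathbb{R}^{n+m},\mathbb{R}^m)\) and suppose \(A_y\) is invertible. Then let \(B = - (A_y)^{-1} A_x\) and note that
\[
0 = A(x,Bx) = A_x x + A_y Bx .
\]
The proof is obvious. We simply solve \(A_x x + A_y y = 0\) for \(y\) and obtain \(y = Bx\). Let us therefore show that the same can be done for \(C^1\) functions.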
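Let \(U \subset \mathbb{R}^{n+m}\) be an open set and let \(f \colon U \to \mathbb{R}^m\) be a \(C^1(U)\) mapping. Let \((x_0,y_0) \in U\) be a point such that \(f(x_0,y_0) = 0\) and such that
\[
\frac{\partial(f^1,\ldots,f^m)}{\partial(y^1,\ldots,y^m)} (x_0,y_0) \neq 0 .
\]
Then there exists an open set \(W \subset \mathbb{R}^n\) with \(x_0 \in W\), an open set \(W' \subset \mathbb{R}^m\) with \(y_0 \in W'\), with \(W \times W' \subset U\), and a \(C^1(W)\) mapping \(g \colon W \to W'\), with \(g(x_0) = y_0\), and for all \(x \in W\), the point \(g(x)\) is the unique point in \(W'\) such that
\[
f\bigl(x,g(x)\bigr) = 0 .
\]
Furthermore, if \([A_x ~ A_y] = f'(x_0,y_0)\), then
\[
g'(x_0) = -(A_y)^{-1} A_x .
\]
The condition \(\frac{\partial(f^1,\ldots,f^m)}{\partial(y^1,\ldots,y^m)} (x_0,y_0) = \det(A_y) \neq 0\) simply means that \(A_y\) is invertible.

Define \(F \colon U \to \mathbb{R}^{n+m}\) by \(F(x,y) := \bigl(x,f(x,y)\bigr)\). It is clear that \(F\) is \(C^1\), and we want to show that the derivative at \((x_0,y_0)\) is invertible.

Let us compute the derivative. We know that
\[
\frac{\lVert f(x_0+h,y_0+k) - f(x_0,y_0) - A_x h - A_y k \rVert}{\lVert (h,k) \rVert}
\]
goes to zero as \(\lVert (h,k) \rVert = \sqrt{\lVert h \rVert^2+\lVert k \rVert^2}\) goes to zero. But then so does
\[
\frac{\bigl\lVert \bigl(h,f(x_0+h,y_0+k)-f(x_0,y_0)\bigr) - (h,A_x h+A_y k) \bigr\rVert}{\lVert (h,k) \rVert}
= \frac{\lVert f(x_0+h,y_0+k) - f(x_0,y_0) - A_x h - A_y k \rVert}{\lVert (h,k) \rVert} .
\]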
So the derivative of F at (x 0, y 0) takes (h, k) to (h, A xh + A yk). If (h, A xh + A yk) = (0, 0), then h = 0, and so A yk = 0. As A y is
one-to-one, then k = 0. Therefore F ′ (x 0, y 0) is one-to-one or in other words invertible and we can apply the inverse function
theorem. That is, there exists some open set V ⊂ R n + m with (x 0, 0) ∈ V, and an inverse mapping G : V → R n + m, that is
F (G(x, s) ) = (x, s) for all (x, s) ∈ V (where x ∈ R n and s ∈ R m). Write G = (G 1, G 2) (the first n and the second m
components of G). Then
F (G 1(x, s), G 2(x, s) ) = (G 1(x, s), f(G 1(x, s), G 2(x, s)) ) = (x, s).
So x = G 1(x, s) and f (G 1(x, s), G 2(x, s)) = f (x, G 2(x, s) ) = s. Plugging in s = 0 we obtain
f (x, G 2(x, 0) ) = 0.
The set \(G(V)\) contains a whole neighbourhood of the point \((x_0,y_0)\), and therefore there exist some open sets \(\widetilde{W}\) and \(W'\) such that \(\widetilde{W} \times W' \subset G(V)\) with \(x_0 \in \widetilde{W}\) and \(y_0 \in W'\). Then take
W = {x ∈ W̃ : G 2(x, 0) ∈ W ′ }. The function that takes x to G 2(x, 0) is continuous and therefore W is open. We define
g : W → R m by g(x) := G 2(x, 0) which is the g in the theorem. The fact that g(x) is the unique point in W ′ follows because
W × W ′ ⊂ G(V) and G is one-to-one and onto G(V). Next differentiate
x ↦ f (x, g(x) ),
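at \(x_0\), which should be the zero map. The derivative is done in the same way as above. We get that for all \(h \in \mathbb{R}^n\)
\[
0 = A\bigl(h,g'(x_0)h\bigr) = A_x h + A_y g'(x_0) h ,
\]
and we obtain the desired derivative for \(g\) as well.

In other words, in the context of the theorem we have \(m\) equations in \(n+m\) unknowns:
\[
\begin{aligned}
f^1(x^1,\ldots,x^n,y^1,\ldots,y^m) & = 0 \\
& ~\vdots \\
f^m(x^1,\ldots,x^n,y^1,\ldots,y^m) & = 0
\end{aligned}
\]
The condition guaranteeing a solution is that this is a \(C^1\) mapping (that all the components are \(C^1\), or in other words all the partial derivatives exist and are continuous), and the matrix
\[
\begin{bmatrix}
\frac{\partial f^1}{\partial y^1} & \ldots & \frac{\partial f^1}{\partial y^m} \\
\vdots & \ddots & \vdots \\
\frac{\partial f^m}{\partial y^1} & \ldots & \frac{\partial f^m}{\partial y^m}
\end{bmatrix}
\]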
is invertible at \((x_0,y_0)\).

Consider the set \(x^2+y^2-(z+1)^3 = -1\), \(e^x+e^y+e^z = 3\) near the point \((0,0,0)\). The function we are looking at is
\[
f(x,y,z) = (x^2+y^2-(z+1)^3+1,\ e^x+e^y+e^z-3) .
\]
We find that
\[
Df =
\begin{bmatrix}
2x & 2y & -3(z+1)^2 \\
e^x & e^y & e^z
\end{bmatrix} .
\]
The matrix
\[
\begin{bmatrix}
2(0) & -3(0+1)^2 \\
e^0 & e^0
\end{bmatrix}
=
\begin{bmatrix}
0 & -3 \\
1 & 1
\end{bmatrix}
\]
is invertible. Hence near \((0,0,0)\) we can find \(y\) and \(z\) as \(C^1\) functions of \(x\) such that for \(x\) near 0 we have
\[
x^2+y(x)^2-(z(x)+1)^3 = -1, \qquad e^x+e^{y(x)}+e^{z(x)} = 3 .
\]
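Although there is no explicit formula, the functions \(y(x)\) and \(z(x)\) of this example can be approximated numerically. The Python sketch below fixes \(x\) and applies Newton's method in the variables \((y,z)\); Newton's method is our own choice of solver and is not the technique used in the proof. It then estimates \(y'(0)\) and \(z'(0)\) by a central difference, which the formula \(g'(x_0) = -(A_y)^{-1} A_x\) with \(A_x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\) predicts to be \(-1\) and \(0\). The function names, the step size, and the iteration count are assumptions for illustration.

```python
import math

def f(x, y, z):
    """The map f(x, y, z) from the example above."""
    return (x * x + y * y - (z + 1.0) ** 3 + 1.0,
            math.exp(x) + math.exp(y) + math.exp(z) - 3.0)

def solve_yz(x, steps=50):
    """Newton's method in (y, z) for fixed x, started at (0, 0)."""
    y, z = 0.0, 0.0
    for _ in range(steps):
        f1, f2 = f(x, y, z)
        # partial derivatives with respect to (y, z): the matrix [[a, b], [c, d]]
        a, b = 2.0 * y, -3.0 * (z + 1.0) ** 2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Newton step: subtract the inverse of [[a, b], [c, d]] applied to (f1, f2)
        y, z = y - (d * f1 - b * f2) / det, z - (-c * f1 + a * f2) / det
    return y, z

h = 1e-4
y_plus, z_plus = solve_yz(h)
y_minus, z_minus = solve_yz(-h)
dy = (y_plus - y_minus) / (2 * h)   # estimate of y'(0)
dz = (z_plus - z_minus) / (2 * h)   # estimate of z'(0)
print("y'(0) is approximately", dy, "and z'(0) is approximately", dz)
```

The sketch only samples the solution curve near the origin; it says nothing about how far from \(x = 0\) the functions \(y(x)\) and \(z(x)\) continue to exist.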
The theorem does not tell us how to find y(x) and z(x) explicitly, it just tells us they exist. In other words, near the origin the set
of solutions is a smooth curve in R 3 that goes through the origin. Note that there are versions of the theorem for arbitrarily
many derivatives. If f has k continuous derivatives, then the solution also has k derivatives. Exercises
10.1: Differentiation under the Integral
This work is dual licensed under the Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License
and the Creative Commons Attribution-Share Alike 4.0 International License. To view a copy of these licenses, visit
http://creativecommons.org/licenses/by-nc-sa/4.0/ or http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative
Commons PO Box 1866, Mountain View, CA 94042, USA.
You can use, print, duplicate, share this book as much as you want. You can base your own notes on it and reuse parts if you
keep the license the same. You can assume the license is either the CC-BY-NC-SA or CC-BY-SA, whichever is compatible
with what you wish to do, your derivative works must use at least one of the licenses.
During the writing of these notes, the author was in part supported by NSF grant DMS-1362337.
The date is the main identifier of version. The major version / edition number is raised only if there have been substantial
changes. For example version 1.0 is first edition, 0th update (no updates yet).
See http://www.jirka.org/ra/ for more information (including contact information).
Introduction
About this book
This book is the continuation of “Basic Analysis”. The book is meant to be a seamless continuation, so the chapters are
numbered to start where the first volume left off. The book started with my notes for a second semester undergraduate analysis
at University of Wisconsin—Madison in 2012, where I used my notes together with Rudin’s book. In 2016, I taught a second
semester undergraduate analysis at Oklahoma State University and heavily modified and cleaned up the notes, this time using
them as the main text.
I plan on eventually adding more topics especially at the end. I will try to preserve the current numbering in subsequent
editions as always. The new topics I have planned would add sections and chapters onto the end of the book rather than be
inserted in the middle.
For the most part, this second volume depends on the non-optional parts of volume I, however, the optional bits such as higher
order derivatives are sometimes used, for example in 6, 3, 6. This book is not necessarily the entire second semester course.
What I had in mind for a two semester course is that some bits of the first volume, such as metric spaces, are covered in the
second semester, while some of the optional topics of volume I are covered in the first semester. Leaving metric spaces for
second semester makes more sense as then the second semester is the “multivariable” part of the course.
Several possibilities for the material in this book are:
1) 1–5, (perhaps 1), 1 and 2.
2) 1–6, 1–3, 1 and 2.
3) Everything.
When I ran the course at OSU, I covered the first book minus metric spaces and a couple of optional sections in the first
semester. Then, in the second semester, I covered most of what I skipped from volume I, including metric spaces, and took
option 2) above.
While it is common to use \(\vec{x}\) or the bold \(\mathbf{x}\) for elements of \(\mathbb{R}^n\), especially in the applied sciences, we use just plain \(x\), which is common in mathematics. That is, \(v \in \mathbb{R}^n\) is a vector, which means \(v = (v_1,v_2,\ldots,v_n)\) is an \(n\)-tuple of real numbers.

It is common to write and treat vectors as column vectors, that is, \(n \times 1\) matrices:
\[
v = (v_1,v_2,\ldots,v_n) =
\begin{bmatrix}
v_1 \\ v_2 \\ \vdots \\ v_n
\end{bmatrix} .
\]
We will do so when convenient. We call real numbers scalars to distinguish them from vectors.
The set R n has a so-called vector space structure defined on it. However, even though we will be looking at functions defined
on R n, not all spaces we wish to deal with are equal to R n. Therefore, let us define the abstract notion of the vector space.
Let X be a set together with operations of addition, + : X × X → X, and multiplication, ⋅ : R × X → X, (we usually write ax
instead of a ⋅ x). X is called a vector space (or a real vector space) if the following conditions are satisfied:
1. (Addition is associative) If u, v, w ∈ X, then u + (v + w) = (u + v) + w.
2. (Addition is commutative) If u, v ∈ X, then u + v = v + u.
3. (Additive identity) There is a 0 ∈ X such that v + 0 = v for all v ∈ X.
4. (Additive inverse) For every v ∈ X, there is a − v ∈ X, such that v + ( − v) = 0.
5. (Distributive law) If a ∈ R, u, v ∈ X, then a(u + v) = au + av.
6. (Distributive law) If a, b ∈ R, v ∈ X, then (a + b)v = av + bv.
7. (Multiplication is associative) If a, b ∈ R, v ∈ X, then (ab)v = a(bv).
8. (Multiplicative identity) 1v = v for all v ∈ X.
Elements of a vector space are usually called vectors, even if they are not elements of R n (vectors in the “traditional” sense).
If Y ⊂ X is a subset that is a vector space itself with the same operations, then Y is called a subspace or vector subspace of X.
An example vector space is R n, where addition and multiplication by a scalar is done componentwise: if a ∈ R,
v = (v 1, v 2, …, v n) ∈ R n, and w = (w 1, w 2, …, w n) ∈ R n, then
v + w := (v 1, v 2, …, v n) + (w 1, w 2, …, w n) = (v 1 + w 1, v 2 + w 2, …, v n + w n),
av := a(v 1, v 2, …, v n) = (av 1, av 2, …, av n).
In this book we mostly deal with vector spaces that can be often regarded as subsets of R n, but there are other vector spaces
useful in analysis. Let us give a couple of examples.
A trivial example of a vector space (the smallest one in fact) is just X = {0}. The operations are defined in the obvious way.
You always need a zero vector to exist, so all vector spaces are nonempty sets.
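The space \(C([0,1],\mathbb{R})\) of continuous functions on the interval \([0,1]\) is a vector space. For two functions \(f\) and \(g\) in \(C([0,1],\mathbb{R})\) and \(a \in \mathbb{R}\), we make the obvious definitions of \(f+g\) and \(af\):
\[
(f+g)(x) := f(x) + g(x), \qquad (af)(x) := a\bigl(f(x)\bigr) .
\]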
The space of polynomials c 0 + c 1t + c 2t 2 + ⋯ + c mt m is a vector space, let us denote it by R[t] (coefficients are real and the
variable is t). The operations are defined in the same way as for functions above. Suppose there are two polynomials, one of
degree m and one of degree n. Assume n ≥ m for simplicity. Then
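\[
\begin{split}
& (c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) + (d_0 + d_1 t + d_2 t^2 + \cdots + d_n t^n) = \\
& \qquad (c_0+d_0) + (c_1+d_1) t + (c_2+d_2) t^2 + \cdots + (c_m+d_m) t^m + d_{m+1} t^{m+1} + \cdots + d_n t^n
\end{split}
\]
and
\[
a(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) = (ac_0) + (ac_1) t + (ac_2) t^2 + \cdots + (ac_m) t^m .
\]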
Despite what it looks like, R[t] is not equivalent to R n for any n. In particular, it is not “finite dimensional”, we will make this
notion precise in just a little bit. One can make a finite dimensional vector subspace by restricting the degree. For example, if
we say P n is the set of polynomials of degree n or less, then P n is a finite dimensional vector space.
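The vector space operations on polynomials translate directly into operations on lists of coefficients. The Python sketch below represents \(c_0 + c_1 t + \cdots + c_m t^m\) as the list `[c_0, c_1, ..., c_m]`; this representation and the function names are our own choices for illustration.

```python
from itertools import zip_longest

def poly_add(c, d):
    """Add two polynomials given as coefficient lists [c_0, c_1, ..., c_m]."""
    # The shorter list is padded with zeros, matching the formula for n >= m.
    return [a + b for a, b in zip_longest(c, d, fillvalue=0)]

def poly_scale(a, c):
    """Multiply the polynomial with coefficients c by the scalar a."""
    return [a * ck for ck in c]

def poly_eval(c, t):
    """Evaluate the polynomial at t using Horner's scheme."""
    result = 0
    for ck in reversed(c):
        result = result * t + ck
    return result

p = [1, 2]          # 1 + 2t
q = [0, -1, 0, 4]   # -t + 4t^3
s = poly_add(p, q)  # coefficients of p + q
```

Adding coefficient lists and then evaluating gives the same value as evaluating each polynomial and adding, which is the sense in which \(\mathbb{R}[t]\) sits inside a space of functions.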
The space R[t] can be thought of as a subspace of C(R, R). If we restrict the range of t to [0, 1], R[t] can be identified with a
subspace of C([0, 1], R).
It is often better to think of even simpler “finite dimensional” vector spaces using the abstract notion rather than always R n. It
is possible to use other fields than R in the definition (for example it is common to use the complex numbers C), but let us
stick with the real numbers2.
Linear combinations and dimension
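Suppose \(X\) is a vector space, \(x_1, x_2, \ldots, x_k \in X\) are vectors, and \(a_1, a_2, \ldots, a_k \in \mathbb{R}\) are scalars. Then
\[
a_1 x_1 + a_2 x_2 + \cdots + a_k x_k
\]
is called a linear combination of the vectors \(x_1, x_2, \ldots, x_k\).

If \(Y \subset X\) is a set, then the span of \(Y\), or in notation \(\operatorname{span}(Y)\), is the set of all linear combinations of finitely many elements of \(Y\).

Let \(Y := \{ (1,1) \} \subset \mathbb{R}^2\). Then
\[
\operatorname{span}(Y) = \{ (x,x) \in \mathbb{R}^2 : x \in \mathbb{R} \} .
\]
That is, \(\operatorname{span}(Y)\) is the line through the origin and the point \((1,1)\).

Let \(Y := \{ (1,1), (0,1) \} \subset \mathbb{R}^2\). Then
\[
\operatorname{span}(Y) = \mathbb{R}^2 ,
\]
as any point \((x,y) \in \mathbb{R}^2\) can be written as the linear combination
\[
(x,y) = x (1,1) + (y-x) (0,1) .
\]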
A sum of two linear combinations is again a linear combination, and a scalar multiple of a linear combination is a linear
combination, which proves the following proposition.
Let X be a vector space. For any Y ⊂ X, the set span(Y) is a vector space itself. That is, span(Y) is a subspace of X.
If Y is already a vector space, then span(Y) = Y.
A set of vectors \(\{ x_1, x_2, \ldots, x_k \} \subset X\) is linearly independent if the only solution to
\[
a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0
\]
is the trivial solution \(a_1 = a_2 = \cdots = a_k = 0\). A set that is not linearly independent is linearly dependent.

A linearly independent set \(B\) of vectors such that \(\operatorname{span}(B) = X\) is called a basis of \(X\). For example, the set \(Y = \{ (1,1), (0,1) \}\) of the two vectors in the example above is a basis of \(\mathbb{R}^2\).
If a vector space X contains a linearly independent set of d vectors, but no linearly independent set of d + 1 vectors, then we
say the dimension or dim X := d. If for all d ∈ N the vector space X contains a set of d linearly independent vectors, we say X
is infinite dimensional and write dim X := ∞.
Clearly for the trivial vector space, dim {0} = 0. We will see in a moment that any vector subspace of R n has a finite
dimension, and that dimension is less than or equal to n.
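For finitely many vectors in \(\mathbb{R}^n\), linear independence can be tested by Gaussian elimination: the number of pivots is the dimension of the span, and the vectors are linearly independent exactly when that number equals the number of vectors. The Python sketch below is a minimal version using exact rational arithmetic; the function names are our own.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors in R^n (Gaussian elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # find a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # eliminate the entries below the pivot
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when only the trivial linear combination of the vectors gives zero."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([(1, 1), (0, 1)]))          # the basis of R^2 used above
print(is_linearly_independent([(1, 0), (0, 1), (1, 1)]))  # three vectors in R^2
```

The second call illustrates that three vectors in \(\mathbb{R}^2\) cannot be linearly independent, in line with the dimension being 2.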
If a set is linearly dependent, then one of the vectors is a linear combination of the others. In other words, if \(a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0\) and \(a_j \neq 0\), then we solve for \(x_j\):
\[
x_j = \frac{-a_1}{a_j} x_1 + \cdots + \frac{-a_{j-1}}{a_j} x_{j-1} + \frac{-a_{j+1}}{a_j} x_{j+1} + \cdots + \frac{-a_k}{a_j} x_k .
\]
The vector \(x_j\) then has at least two different representations as linear combinations of \(\{ x_1, x_2, \ldots, x_k \}\): the one above and \(x_j\) itself.
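If \(B = \{ x_1, x_2, \ldots, x_k \}\) is a basis of a vector space \(X\), then every point \(y \in X\) has a unique representation of the form
\[
y = \sum_{j=1}^k a_j x_j
\]
for some numbers \(a_1, a_2, \ldots, a_k\).

Every \(y \in X\) is a linear combination of elements of \(B\) since \(X\) is the span of \(B\). For uniqueness suppose
\[
y = \sum_{j=1}^k a_j x_j = \sum_{j=1}^k b_j x_j ,
\]
then
\[
\sum_{j=1}^k (a_j-b_j) x_j = 0 .
\]
By linear independence of the basis, \(a_j = b_j\) for all \(j\).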
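For \(\mathbb{R}^n\) we define
\[
e_1 := (1,0,0,\ldots,0), \quad e_2 := (0,1,0,\ldots,0), \quad \ldots, \quad e_n := (0,0,0,\ldots,1),
\]
and call this the standard basis of \(\mathbb{R}^n\). We use the same letters \(e_j\) for any \(\mathbb{R}^n\), and which space \(\mathbb{R}^n\) we are working in is understood from context. A direct computation shows that \(\{ e_1, e_2, \ldots, e_n \}\) is really a basis of \(\mathbb{R}^n\); it spans \(\mathbb{R}^n\) and is linearly independent. In fact,
\[
x = (x_1,x_2,\ldots,x_n) = \sum_{j=1}^n x_j e_j .
\]
Let us show that if \(X\) is spanned by \(d\) vectors, then \(\dim X \leq d\). Suppose \(S = \{ x_1, x_2, \ldots, x_d \}\) spans \(X\), and \(T = \{ y_1, y_2, \ldots, y_m \}\) is a set of linearly independent vectors of \(X\). We wish to show that \(m \leq d\). Write
\[
y_1 = \sum_{k=1}^d a_{k,1} x_k ,
\]
for some numbers \(a_{1,1}, a_{2,1}, \ldots, a_{d,1}\), which we can do as \(S\) spans \(X\). One of the \(a_{k,1}\) is nonzero (otherwise \(y_1\) would be zero), so suppose without loss of generality that this is \(a_{1,1}\). Then we solve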
\[
x_1 = \frac{1}{a_{1,1}} y_1 - \sum_{k=2}^d \frac{a_{k,1}}{a_{1,1}} x_k .
\]
In particular, \(\{ y_1, x_2, \ldots, x_d \}\) spans \(X\), since \(x_1\) can be obtained from \(\{ y_1, x_2, \ldots, x_d \}\). Therefore, there are some numbers \(a_{1,2}, a_{2,2}, \ldots, a_{d,2}\) such that
\[
y_2 = a_{1,2} y_1 + \sum_{k=2}^d a_{k,2} x_k .
\]
As \(T\) is linearly independent, one of the \(a_{k,2}\) for \(k \geq 2\) must be nonzero. Without loss of generality suppose \(a_{2,2} \neq 0\). Proceed to solve for
\[
x_2 = \frac{1}{a_{2,2}} y_2 - \frac{a_{1,2}}{a_{2,2}} y_1 - \sum_{k=3}^d \frac{a_{k,2}}{a_{2,2}} x_k .
\]
In particular, \(\{ y_1, y_2, x_3, \ldots, x_d \}\) spans \(X\).
We continue this procedure. If m < d, then we are done. So suppose m ≥ d. After d steps we obtain that {y 1, y 2, …, y d} spans
X. Any other vector v in X is a linear combination of {y 1, y 2, …, y d}, and hence cannot be in T as T is linearly independent. So
m = d.
Let us look at the claim that dim X = d if and only if X has a basis of d vectors. First, if T is a set of k linearly independent vectors that do not span X, that is X ∖ span(T) ≠ ∅,
then choose a vector v ∈ X ∖ span(T). The set T ∪ {v} is linearly independent (exercise). If dim X = d, then there must exist
some linearly independent set of d vectors T, and it must span X, otherwise we could choose a larger set of linearly
independent vectors. So we have a basis of d vectors. On the other hand if we have a basis of d vectors, it is linearly
independent and spans X by definition. As X is spanned by d vectors, the argument above shows there is no set of d + 1 linearly independent vectors, so
dimension must be d.
We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto, then we say A is invertible, and we denote the
inverse by A − 1. If A : X → X is linear, then we say A is a linear operator on X.
We write \(L(X,Y)\) for the set of all linear transformations from \(X\) to \(Y\), and just \(L(X)\) for the set of linear operators on \(X\). If \(a \in \mathbb{R}\) and \(A, B \in L(X,Y)\), define the transformations \(aA\) and \(A+B\) by
\[
(aA)(x) := aAx , \qquad (A+B)(x) := Ax + Bx .
\]
If \(A \in L(Y,Z)\) and \(B \in L(X,Y)\), define the transformation \(AB\) as the composition \(A \circ B\), that is,
ABx := A(Bx).
Finally denote by I ∈ L(X) the identity: the linear operator such that Ix = x for all x.
It is not hard to see that aA ∈ L(X, Y) and A + B ∈ L(X, Y), and that AB ∈ L(X, Z). In particular, L(X, Y) is a vector space. As
the set L(X) is not only a vector space, but also admits a product, it is often called an algebra.
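When \(X\), \(Y\), and \(Z\) are spaces \(\mathbb{R}^n\) with their standard bases, these operations become the familiar matrix operations. The Python sketch below stores a linear map as a list of rows; the helper names `apply`, `compose`, `add`, and `scale`, and the sample matrices, are our own choices for illustration.

```python
def apply(A, x):
    """Apply the linear map with matrix A (a list of rows) to the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def compose(A, B):
    """Matrix of AB, the composition x -> A(Bx)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    """Matrix of A + B, the map x -> Ax + Bx."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(a, A):
    """Matrix of aA, the map x -> a(Ax)."""
    return [[a * entry for entry in row] for row in A]

A = [[1, 2, 0], [0, 1, -1]]     # a map in L(R^3, R^2)
B = [[1, 0], [2, 1], [0, 3]]    # a map in L(R^2, R^3)
x = [1, -2]

composed = apply(compose(A, B), x)   # (AB)x
stepwise = apply(A, apply(B, x))     # A(Bx)
```

Both `composed` and `stepwise` compute the same vector, which is the defining property \(ABx = A(Bx)\) written in coordinates.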
An immediate consequence of the definition of a linear mapping is: if A is linear, then A0 = 0.
If A ∈ L(X, Y) is invertible, then A − 1 is linear.
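Let \(a \in \mathbb{R}\) and \(y \in Y\). As \(A\) is onto, there is an \(x\) such that \(y = Ax\), and further as it is also one-to-one, \(A^{-1}(Az) = z\) for all \(z \in X\). So
\[
A^{-1}(ay) = A^{-1}(aAx) = A^{-1}\bigl(A(ax)\bigr) = ax = aA^{-1}(y) .
\]
Similarly, let \(y_1, y_2 \in Y\), and \(x_1, x_2 \in X\) be such that \(Ax_1 = y_1\) and \(Ax_2 = y_2\). Then
\[
A^{-1}(y_1+y_2) = A^{-1}(Ax_1+Ax_2) = A^{-1}\bigl(A(x_1+x_2)\bigr) = x_1+x_2 = A^{-1}(y_1) + A^{-1}(y_2) .
\]

If \(A \in L(X,Y)\) is linear, then it is completely determined by its values on a basis of \(X\). Furthermore, if \(B\) is a basis of \(X\), then any function \(\widetilde{A} \colon B \to Y\) extends to a linear function on \(X\).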
We will only prove this proposition for finite dimensional spaces, as we do not need infinite dimensional spaces. For infinite
dimensional spaces, the proof is essentially the same, but a little trickier to write, so let us stick with finitely many dimensions.
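Let \(\{ x_1, x_2, \ldots, x_n \}\) be a basis and suppose \(Ax_j = y_j\). Every \(x \in X\) has a unique representation
\[
x = \sum_{j=1}^n b_j x_j
\]
for some numbers \(b_1, b_2, \ldots, b_n\). By linearity
\[
Ax = A \sum_{j=1}^n b_j x_j = \sum_{j=1}^n b_j Ax_j = \sum_{j=1}^n b_j y_j .
\]
The “furthermore” follows by defining the extension \(Ax := \sum_{j=1}^n b_j y_j\), and noting that this is well defined by uniqueness of the representation of \(x\).

If \(X\) is a finite dimensional vector space and \(A \colon X \to X\) is linear, then \(A\) is one-to-one if and only if it is onto.

Let \(\{ x_1, x_2, \ldots, x_n \}\) be a basis for \(X\). Suppose \(A\) is one-to-one. Now suppose
\[
\sum_{j=1}^n c_j Ax_j = A \sum_{j=1}^n c_j x_j = 0 .
\]
As \(A\) is one-to-one, the only vector that is taken to 0 is 0 itself. Hence,
\[
0 = \sum_{j=1}^n c_j x_j
\]
and \(c_j = 0\) for all \(j\). So \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) is a linearly independent set. As the dimension of \(X\) is \(n\), we conclude that \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) spans \(X\). Any point \(x \in X\) can be written as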
\[ x = \sum_{j=1}^n a_j Ax_j = A \sum_{j=1}^n a_j x_j , \]
so A is onto.
Now suppose A is onto. As A is determined by the action on the basis we see that every element of X has to be in the span of
{Ax 1, Ax 2, …, Ax n}. Suppose
\[ A \sum_{j=1}^n c_j x_j = \sum_{j=1}^n c_j Ax_j = 0 . \]
By the earlier proposition on dimension, as {Ax_1, Ax_2, …, Ax_n} span X, the set is independent, and hence c_j = 0 for all j. In other words if Ax = 0, then x = 0. This
means that A is one-to-one: If Ax = Ay, then A(x − y) = 0 and so x = y.
We leave the proof of the next proposition as an exercise.
[prop:LXYfinitedim] If X and Y are finite dimensional vector spaces, then L(X, Y) is also finite dimensional.
Finally let us note that we often identify a finite dimensional vector space X of dimension n with R n, provided we fix a basis
{x_1, x_2, …, x_n} in X. That is, we define a bijective linear map A ∈ L(X, R^n) by Ax_j = e_j, where {e_1, e_2, …, e_n} is the standard basis of R^n. Then we have
the correspondence
\[ \sum_{j=1}^n c_j x_j \in X \quad \overset{A}{\mapsto} \quad (c_1, c_2, \ldots, c_n) \in \mathbb{R}^n . \]
Convexity
A subset U of a vector space is convex if whenever x, y ∈ U, the line segment from x to y lies in U. That is, if the convex
combination (1 − t)x + ty is in U for all t ∈ [0, 1]. See .
Note that in R, every connected interval is convex. In R 2 (or higher dimensions) there are lots of nonconvex connected sets.
For example the set R 2 ∖ {0} is not convex but it is connected. To see this simply take any x ∈ R 2 ∖ {0} and let y := − x.
Then (1/2)x + (1/2)y = 0, which is not in the set. On the other hand, the ball B(x, r) ⊂ R n (using the standard
metric on R n) is convex by the triangle inequality.
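The two examples above can be checked numerically. The following is a minimal Python sketch and not part of the original text; the helper names convex_combination and in_open_ball are hypothetical, and sampling a segment only illustrates convexity, it does not prove it.

```python
import math

def convex_combination(x, y, t):
    """Return (1 - t) x + t y for vectors x, y given as tuples."""
    return tuple((1 - t) * a + t * b for a, b in zip(x, y))

def in_open_ball(p, center, r):
    """True if p lies in the open ball B(center, r) in the euclidean metric."""
    return math.dist(p, center) < r

# R^2 minus the origin is not convex: the midpoint of x and -x is the origin.
x = (1.0, 2.0)
y = (-1.0, -2.0)
midpoint = convex_combination(x, y, 0.5)
print(midpoint)

# The ball B((0, 0), 1): every sampled point on the segment between two
# points of the ball stays inside the ball.
p = (0.6, 0.7)
q = (-0.9, 0.3)
segment_in_ball = all(
    in_open_ball(convex_combination(p, q, k / 100), (0.0, 0.0), 1.0)
    for k in range(101)
)
print(segment_in_ball)
```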
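Let C([0, 1], R) be the vector space of continuous real-valued functions on [0, 1]. Let X ⊂ C([0, 1], R) be the set of those f such that
\[ \int_0^1 f(x)\,dx \leq 1 \quad \text{and} \quad f(x) \geq 0 \text{ for all } x \in [0,1] . \]
Then X is convex. Take t ∈ [0, 1], and note that if f, g ∈ X, then tf(x) + (1 − t)g(x) ≥ 0 for all x. Furthermore
\[ \int_0^1 \bigl( tf(x) + (1-t)g(x) \bigr)\,dx = t \int_0^1 f(x)\,dx + (1-t) \int_0^1 g(x)\,dx \leq 1 . \]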
Note that X is not a subspace of C([0, 1], R).
The intersection of two convex sets is convex. In fact, if {C_λ}_{λ ∈ I} is an arbitrary collection of convex sets, then
\[ C := \bigcap_{\lambda \in I} C_\lambda \]
is convex.
If x, y ∈ C, then x, y ∈ C λ for all λ ∈ I, and hence if t ∈ [0, 1], then tx + (1 − t)y ∈ C λ for all λ ∈ I. Therefore
tx + (1 − t)y ∈ C and C is convex.
Let T : V → W be a linear mapping between two vector spaces and let C ⊂ V be a convex set. Then T(C) is convex.
Take any two points p, q ∈ T(C). Pick x, y ∈ C such that Tx = p and Ty = q. As C is convex, tx + (1 − t)y ∈ C for all
t ∈ [0, 1], so
\[ tp + (1-t)q = tTx + (1-t)Ty = T\bigl(tx + (1-t)y\bigr) \in T(C) . \]
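For completeness, a very useful construction is the convex hull. Given any subset S ⊂ V of a vector space, define the convex hull
of S by
\[ \operatorname{co}(S) := \bigcap \{ C \subset V : S \subset C, \text{ and } C \text{ is convex} \} . \]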
That is, the convex hull is the smallest convex set containing S. By a proposition above, the intersection of convex sets is
convex and hence, the convex hull is convex.
The convex hull of 0 and 1 in R is [0, 1]. Proof: Any convex set containing 0 and 1 must contain [0, 1]. The set [0, 1] is
convex, therefore it must be the convex hull.
Exercises
Verify that R n is a vector space.
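Let X be a vector space. Prove that a finite set of vectors {x_1, …, x_n} ⊂ X is linearly independent if and only if for every
j = 1, 2, …, n
\[ \operatorname{span}\bigl(\{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n\}\bigr) \subsetneq \operatorname{span}\bigl(\{x_1, \ldots, x_n\}\bigr) . \]
That is, the span of the set with one vector removed is strictly smaller.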
Show that the set X ⊂ C([0, 1], R) of those functions such that \( \int_0^1 f = 0 \) is a vector subspace.
Prove C([0, 1], R) is an infinite dimensional vector space where the operations are defined in the obvious way: s = f + g and
m = fg are defined as s(x) := f(x) + g(x) and m(x) := f(x)g(x). Hint: for the dimension, think of functions that are only nonzero on
the interval (1/(n+1), 1/n).
\[ Lf(y) := \int_0^1 k(x,y) f(x)\,dx \]
is a linear operator. That is, show that L is well defined (that Lf is continuous), and that L is linear.
Let P_n be the vector space of polynomials in one variable of degree n or less. Show that P_n is a vector space of dimension
n + 1.
Let R[t] be the vector space of polynomials in one variable t. Let D : R[t] → R[t] be the derivative operator (derivative in t).
Show that D is a linear operator.
Let us show that only works in finite dimensions. Take R[t] and define the operator A : R[t] → R[t] by A (P(t) ) = tP(t). Show
that A is linear and one-to-one, but show that it is not onto.
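Finish the proof of [mv:lindefonbasis] in the finite dimensional case. That is, suppose {x_1, x_2, …, x_n} is a basis of X, {y_1, y_2, …, y_n} ⊂ Y, and we
define a function
\[ Ax := \sum_{j=1}^n b_j y_j , \qquad \text{if} \quad x = \sum_{j=1}^n b_j x_j . \]
Then prove that A : X → Y is linear.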
Compute the convex hull of the set of 3 points {(0, 0), (0, 1), (1, 1)} in R 2.
Show that the set {(x, y) ∈ R 2 : y > x 2} is a convex set.
Show that the set X ⊂ C([0, 1], R) of those functions such that \( \int_0^1 f = 1 \) is a convex set, but not a vector subspace.
Show that every convex set in R n is connected using the standard topology on R n.
Suppose K ⊂ R 2 is a convex set such that the only point of the form (x, 0) in K is the point (0, 0). Further suppose that
(0, 1) ∈ K and (1, 1) ∈ K. Then show that if (x, y) ∈ K, then y > 0 unless x = 0.
Before defining the standard norm on R n, let us define the standard scalar dot product on R n. For two vectors
x = (x_1, x_2, …, x_n) ∈ R^n and y = (y_1, y_2, …, y_n) ∈ R^n, define
\[ x \cdot y := \sum_{j=1}^n x_j y_j . \]
It is easy to see that the dot product is linear in each variable separately, that is, it is a linear mapping when you keep one of the
variables constant. The Euclidean norm is defined as
\[ \|x\| := \|x\|_{\mathbb{R}^n} := \sqrt{x \cdot x} = \sqrt{(x_1)^2 + (x_2)^2 + \cdots + (x_n)^2} . \]
We normally just use ‖x‖, but sometimes it will be necessary to emphasize that we are talking about the euclidean norm and
use ‖x‖ R n. It is easy to see that the Euclidean norm satisfies [defn:norm:i] and [defn:norm:ii]. To prove that [defn:norm:iii]
holds, the key inequality is the so-called Cauchy-Schwarz inequality we saw before. As this inequality is so important let us
restate and reprove it using the notation of this chapter.
Let x, y ∈ R n, then
|x ⋅ y| ≤ ‖x‖‖y‖ = √x ⋅ x √y ⋅ y,
with equality if and only if the vectors are scalar multiples of each other.
If x = 0 or y = 0, then the theorem holds trivially. So assume x ≠ 0 and y ≠ 0.
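If x is a scalar multiple of y, that is x = λy for some λ ∈ R, then the theorem holds with equality:
\[ |x \cdot y| = |\lambda y \cdot y| = |\lambda| \, |y \cdot y| = |\lambda| \, \|y\|^2 = \|\lambda y\| \, \|y\| = \|x\| \, \|y\| . \]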
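If x is not a scalar multiple of y, then ‖x + ty‖² > 0 for all t. So the polynomial ‖x + ty‖² = ‖x‖² + 2t(x ⋅ y) + t²‖y‖² in t is never zero. Elementary algebra
says that the discriminant must be negative:
\[ 4 (x \cdot y)^2 - 4 \|x\|^2 \|y\|^2 < 0 , \]
or in other words (x ⋅ y)² < ‖x‖²‖y‖².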
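As a numerical illustration (a sketch that is not part of the original text), the following Python snippet evaluates the discriminant of the quadratic t ↦ ‖x + ty‖² = ‖y‖²t² + 2(x ⋅ y)t + ‖x‖² for one pair of non-parallel vectors and one pair of parallel vectors. The function names dot, norm, and discriminant are hypothetical helpers chosen for this example.

```python
import math

def dot(x, y):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def discriminant(x, y):
    """Discriminant of t -> ||y||^2 t^2 + 2 (x . y) t + ||x||^2."""
    return 4 * dot(x, y) ** 2 - 4 * dot(x, x) * dot(y, y)

x = (1, 2, 3)
y = (4, -5, 6)   # not a scalar multiple of x
z = (2, 4, 6)    # z = 2 x

print(abs(dot(x, y)) <= norm(x) * norm(y))  # the Cauchy-Schwarz inequality
print(discriminant(x, y))                   # negative: strict inequality
print(discriminant(x, z))                   # zero: equality for multiples
```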
The distance d(x, y) := ‖x − y‖ is the standard distance function on R n that we used when we talked about metric spaces.
In fact, on any vector space X, once we have a norm (any norm), we define a distance d(x, y) := ‖x − y‖ that makes X into a
metric space (an easy exercise).
Let A ∈ L(X, Y). Define
\[ \|A\| := \sup \{ \|Ax\| : x \in X \text{ with } \|x\| = 1 \} . \]
The number ‖A‖ is called the operator norm. We will see below that indeed it is a norm (at least for finite dimensional
spaces). Again, when necessary to emphasize which norm we are talking about, we may write it as ‖A‖_{L(X,Y)}.
By linearity, ‖A(x/‖x‖)‖ = ‖Ax‖/‖x‖ for any nonzero x ∈ X. The vector x/‖x‖ is of norm 1. Therefore,
‖Ax‖ ≤ ‖A‖‖x‖.
It is not hard to see from the definition that ‖A‖ = 0 if and only if A = 0, that is, if A takes every vector to the zero vector.
It is also not difficult to see the norm of the identity operator:
\[ \|I\| = \sup_{x \in X,\, x \neq 0} \frac{\|Ix\|}{\|x\|} = \sup_{x \in X,\, x \neq 0} \frac{\|x\|}{\|x\|} = 1 . \]
For finite dimensional spaces, ‖A‖ is always finite as we prove below. This also implies that A is continuous. For infinite
dimensional spaces neither statement needs to be true. For a simple example, take the vector space of continuously
differentiable functions on [0, 1] and as the norm use the uniform norm. The functions sin(nx) have norm 1, but the derivatives
have norm n. So differentiation (which is a linear operator) has unbounded norm on this space. But let us stick to finite
dimensional spaces now.
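The unboundedness of differentiation can be seen numerically. The following Python sketch (not part of the original text) approximates the uniform norm on [0, 1] by sampling a grid, which is an assumption of the sketch and only gives a lower estimate of the true supremum. The names sup_norm and derivative_ratio are hypothetical. Note that sin(nx) attains the value 1 on [0, 1] only once n ≥ 2.

```python
import math

def sup_norm(g, samples=10001):
    """Approximate the uniform norm of g on [0, 1] by sampling a grid."""
    return max(abs(g(k / (samples - 1))) for k in range(samples))

def derivative_ratio(n):
    """Approximate ||f'|| / ||f|| for f(x) = sin(n x) in the uniform norm."""
    f = lambda x: math.sin(n * x)
    df = lambda x: n * math.cos(n * x)   # derivative of sin(n x)
    return sup_norm(df) / sup_norm(f)

for n in (2, 10, 50):
    # The ratio grows roughly like n, so no single bound works for all n.
    print(n, derivative_ratio(n))
```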
When we talk about a finite dimensional vector space, we often think of R n, although if we have a norm, the norm might
perhaps not be the standard euclidean norm. In the exercises, you can prove that every norm is “equivalent” to the euclidean
norm in that the topology it generates is the same. For simplicity, we only prove the following proposition for the euclidean
space, and the proof for a general finite dimensional space is left as an exercise.
[prop:finitedimpropnormfin] Let X and Y be finite dimensional vector spaces with a norm. If A ∈ L(X, Y), then ‖A‖ < ∞, and
A is uniformly continuous (Lipschitz with constant ‖A‖).
As we said we only prove the proposition for euclidean space so suppose that X = R n and Y = R m and the norm is the standard
euclidean norm. The general case is left as an exercise.
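Let {e_1, e_2, …, e_n} be the standard basis of R^n. Write x ∈ R^n, with ‖x‖ = 1, as
\[ x = \sum_{j=1}^n c_j e_j . \]
Since e_j ⋅ e_ℓ = 0 whenever j ≠ ℓ and e_j ⋅ e_j = 1, we have c_j = x ⋅ e_j, and by the Cauchy-Schwarz inequality
\[ |c_j| = |x \cdot e_j| \leq \|x\| \, \|e_j\| = 1 . \]
Then
\[ \|Ax\| = \Bigl\| \sum_{j=1}^n c_j Ae_j \Bigr\| \leq \sum_{j=1}^n |c_j| \, \|Ae_j\| \leq \sum_{j=1}^n \|Ae_j\| . \]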
The right hand side does not depend on x. We found a finite upper bound independent of x, so ‖A‖ < ∞.
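Now for any vector spaces X and Y, and A ∈ L(X, Y), suppose that ‖A‖ < ∞. For v, w ∈ X,
\[ \|A(v) - A(w)\| = \|A(v-w)\| \leq \|A\| \, \|v-w\| . \]
As ‖A‖ < ∞, this says that A is Lipschitz with constant ‖A‖.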
In particular, the operator norm is a norm on the vector space L(X, Y).
2. [item:finitedimpropnorm:ii] If A ∈ L(X, Y) and B ∈ L(Y, Z), then
‖BA‖ ≤ ‖B‖‖A‖.
For [item:finitedimpropnorm:i],
‖(A + B)x‖ = ‖Ax + Bx‖ ≤ ‖Ax‖ + ‖Bx‖ ≤ ‖A‖‖x‖ + ‖B‖‖x‖ = (‖A‖ + ‖B‖)‖x‖.
So ‖A + B‖ ≤ ‖A‖ + ‖B‖.
Similarly,
As a norm defines a metric, there is a metric space topology on L(X, Y), so we can talk about open/closed sets, continuity, and
convergence.
[prop:finitedimpropinv] Let X be a finite dimensional vector space with a norm. Let U ⊂ L(X) be the set of invertible linear
operators.
1. [finitedimpropinv:i] If A ∈ U and B ∈ L(X), and
\[ \|A - B\| < \frac{1}{\|A^{-1}\|} , \]
then B is invertible.
indeed imply that b is not zero. And a ↦ 1/a is a continuous map. When n > 1, then there are other noninvertible
operators than just zero, and in general things are a bit more difficult.
Let us prove [finitedimpropinv:i]. We know something about A − 1 and something about A − B. These are linear operators so let
us apply them to a vector.
Therefore,
or in other words ‖Bx‖ ≠ 0 for all nonzero x, and hence Bx ≠ 0 for all nonzero x. This is enough to see that B is one-to-one (if
Bx = By, then B(x − y) = 0, so x = y). As B is a one-to-one operator from X to X, and X is finite dimensional, B is also onto and hence
invertible.
Let us look at [finitedimpropinv:ii]. Fix some A ∈ U. Let B be invertible and near A, that is ‖A − B‖‖A⁻¹‖ < 1/2.
Then [eqcontineq] is satisfied. We have shown above (using B − 1y instead of x)
and
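\[ Ax_j = \sum_{i=1}^m a_{i,j} \, y_i , \]
\[ A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix} . \]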
And we say A is an m-by-n matrix. The columns of the matrix are precisely the coefficients that represent Ax j. Let us derive the
familiar rule for matrix multiplication.
When
\[ z = \sum_{j=1}^n c_j x_j , \]
then
\[ Az = \sum_{j=1}^n c_j Ax_j = \sum_{j=1}^n c_j \Bigl( \sum_{i=1}^m a_{i,j} y_i \Bigr) = \sum_{i=1}^m \Bigl( \sum_{j=1}^n a_{i,j} c_j \Bigr) y_i , \]
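For the product C = AB of an m-by-n matrix A = [a_{i,j}] and an n-by-r matrix B = [b_{j,k}], the entries of C = [c_{i,k}] are
\[ c_{i,k} = \sum_{j=1}^n a_{i,j} b_{j,k} . \]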
A way to remember it is if you order the indices as we do, that is row,column, and put the elements in the same order as the
matrices, then it is the “middle index” that is “summed-out.”
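The rule can be written out directly in code. The following Python sketch is not part of the original text; it stores matrices as lists of rows, and the helper names matmul and apply are hypothetical. It also checks that applying the product AB to a vector agrees with applying B and then A, which is the composition the product is meant to represent.

```python
def matmul(A, B):
    """Product of an m-by-n matrix A and an n-by-r matrix B (lists of rows)."""
    n = len(B)
    return [[sum(A[i][j] * B[j][k] for j in range(n))  # middle index j is summed out
             for k in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]

A = [[1, 2, 0],
     [0, 1, 3]]      # 2-by-3
B = [[1, 0],
     [2, 1],
     [0, 4]]         # 3-by-2
x = [5, -1]

C = matmul(A, B)     # 2-by-2
print(C)
print(apply(C, x), apply(A, apply(B, x)))   # (AB)x and A(Bx) agree
```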
A linear mapping changing one basis to another is a square matrix in which the columns represent basis elements of the second
basis in terms of the first basis. We call such a linear mapping a change of basis.
Suppose all the bases are just the standard bases and X = R n and Y = R m. Recall the Cauchy-Schwarz inequality and compute
\[ \|Az\|^2 = \sum_{i=1}^m \Bigl( \sum_{j=1}^n a_{i,j} c_j \Bigr)^2 \leq \sum_{i=1}^m \Bigl( \sum_{j=1}^n (c_j)^2 \Bigr) \Bigl( \sum_{j=1}^n (a_{i,j})^2 \Bigr) = \sum_{i=1}^m \Bigl( \sum_{j=1}^n (a_{i,j})^2 \Bigr) \|z\|^2 . \]
In other words, we have a bound on the operator norm (note that equality rarely happens)
\[ \|A\| \leq \sqrt{ \sum_{i=1}^m \sum_{j=1}^n (a_{i,j})^2 } . \]
If the entries go to zero, then ‖A‖ goes to zero. In particular, if A is fixed and B is changing such that the entries of A − B go to
zero, then B goes to A in operator norm. That is, B goes to A in the metric space topology induced by the operator norm. We
proved the first part of:
If f : S → R nm is a continuous function for a metric space S, then taking the components of f as the entries of a matrix, f is a
continuous mapping from S to L(R n, R m). Conversely, if f : S → L(R n, R m) is a continuous function, then the entries of the
matrix are continuous functions.
The proof of the second part is rather easy. Take f(x)e j and note that is a continuous function to R m with standard Euclidean
norm: ‖f(x)e j − f(y)e j‖ = ‖ (f(x) − f(y) )e j‖ ≤ ‖f(x) − f(y)‖, so as x → y, then ‖f(x) − f(y)‖ → 0 and so ‖f(x)e j − f(y)e j‖ → 0.
Such a function is continuous if and only if its components are continuous and these are the components of the jth column of
the matrix f(x).
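The bound of the operator norm by the square root of the sum of the squares of the entries can be illustrated numerically. The Python sketch below is not part of the original text: it estimates ‖A‖ from below for a matrix with two columns by sampling unit vectors (cos s, sin s), which is an assumption of the sketch rather than an exact computation. The names frobenius_bound and operator_norm_estimate are hypothetical.

```python
import math

def frobenius_bound(A):
    """Square root of the sum of the squares of the entries of A."""
    return math.sqrt(sum(a * a for row in A for a in row))

def operator_norm_estimate(A, samples=3600):
    """Lower estimate of ||A|| for a matrix with two columns: maximize
    ||Ax|| over sampled unit vectors x = (cos s, sin s)."""
    best = 0.0
    for k in range(samples):
        s = 2 * math.pi * k / samples
        x = (math.cos(s), math.sin(s))
        Ax = [row[0] * x[0] + row[1] * x[1] for row in A]
        best = max(best, math.sqrt(sum(v * v for v in Ax)))
    return best

A = [[1.0, 1.0],
     [0.0, 1.0]]
# The estimate stays below the bound; here the inequality is strict.
print(operator_norm_estimate(A), frobenius_bound(A))
```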
Determinants
A certain number can be assigned to square matrices that measures how the corresponding linear mapping stretches space. In
particular, this number, called the determinant, can be used to test for invertibility of a matrix.
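First define the symbol sgn(x) for a number x by
\[ \operatorname{sgn}(x) := \begin{cases} -1 & \text{if } x < 0 , \\ 0 & \text{if } x = 0 , \\ 1 & \text{if } x > 0 . \end{cases} \]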
Suppose σ = (σ 1, σ 2, …, σ n) is a permutation of the integers (1, 2, …, n), that is, a reordering of (1, 2, …, n). Any permutation
can be obtained by a sequence of transpositions (switchings of two elements). Call a permutation even (resp. odd) if it takes an
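even (resp. odd) number of transpositions to get from σ to (1, 2, …, n). It can be shown that this is well defined (exercise). In
fact, define
\[ \operatorname{sgn}(\sigma) := \operatorname{sgn}(\sigma_1, \ldots, \sigma_n) = \prod_{p < q} \operatorname{sgn}(\sigma_q - \sigma_p) . \]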
Then it can be shown that sgn(σ) is 1 if σ is even and − 1 if σ is odd. This fact can be proved by noting that applying a
transposition changes the sign. Then note that the sign of (1, 2, …, n) is 1.
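Let S_n be the set of all permutations on n elements (the symmetric group). Let A = [a_{i,j}] be a square n × n matrix. Define the
determinant of A as
\[ \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma_i} . \]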
7. [prop:det:vii] \( \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \), and det [a] = a.
In fact, the determinant is the unique function that satisfies [prop:det:i], [prop:det:ii], and [prop:det:iii]. But we digress. By
[prop:det:ii], we mean that if we fix all the vectors x 1, …, x n except for x j and think of the determinant as function of x j, it is a
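linear function. That is, if v, w ∈ R n are two vectors, and a, b ∈ R are scalars, then
\[ \det([x_1 \cdots x_{j-1} \ (av+bw) \ x_{j+1} \cdots x_n]) = a \det([x_1 \cdots x_{j-1} \ v \ x_{j+1} \cdots x_n]) + b \det([x_1 \cdots x_{j-1} \ w \ x_{j+1} \cdots x_n]) . \]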
We go through the proof quickly, as you have likely seen this before.
[prop:det:i] is trivial. For [prop:det:ii], notice that each term in the definition of the determinant contains exactly one factor
from each column.
Part [prop:det:iii] follows by noting that switching two columns is like switching the two corresponding numbers in every
element in S n. Hence all the signs are changed. Part [prop:det:iv] follows because if two columns are equal and we switch
them we get the same matrix back and so part [prop:det:iii] says the determinant must have been 0.
Part [prop:det:v] follows because the product in each term in the definition includes one element from the zero column. Part
[prop:det:vi] follows as det is a polynomial in the entries of the matrix and hence continuous. We have seen that a function
defined on matrices is continuous in the operator norm if it is continuous in the entries. Finally, part [prop:det:vii] is a direct
computation.
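For small matrices the determinant can be computed directly as a sum over permutations. The Python sketch below is not part of the original text; it assumes the usual permutation-sum formula, det(A) = ∑ sgn(σ) a_{1,σ_1} ⋯ a_{n,σ_n}, with sgn(σ) computed as the product of sgn(σ_q − σ_p) over p < q. The names sgn, sgn_perm, and det are hypothetical, and indices start at 0 rather than 1. The sum has n! terms, so this is only practical for small n.

```python
from itertools import permutations

def sgn(x):
    """Sign of a number: -1, 0, or 1."""
    return (x > 0) - (x < 0)

def sgn_perm(sigma):
    """Sign of a permutation: product of sgn(sigma[q] - sigma[p]) over p < q."""
    sign = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            sign *= sgn(sigma[q] - sigma[p])
    return sign

def det(A):
    """Determinant of a square matrix as a sum over all permutations."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn_perm(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]   # one factor from each row and each column
        total += term
    return total

A = [[1, 2], [3, 4]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(A))                   # ad - bc for a 2-by-2 matrix
print(det(B))
print(det([[2, 1], [4, 3]]))    # A with its columns switched: sign flips
```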
AB = [Ab 1 Ab 2 ⋯ Ab n].
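Write a_1, a_2, …, a_n for the columns of A and b_{j,k} for the entries of B, so that Ab_k = \sum_{j=1}^n b_{j,k} a_j. By linearity of the determinant in each column,
\[
\begin{aligned}
\det(AB) & = \det\Bigl( \Bigl[ \sum_{j=1}^n b_{j,1} a_j \ \ \sum_{j=1}^n b_{j,2} a_j \ \ \cdots \ \ \sum_{j=1}^n b_{j,n} a_j \Bigr] \Bigr) \\
& = \sum_{1 \leq j_1, j_2, \ldots, j_n \leq n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \det([a_{j_1} \ a_{j_2} \ \cdots \ a_{j_n}]) \\
& = \Bigl( \sum_{(j_1, j_2, \ldots, j_n) \in S_n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \operatorname{sgn}(j_1, j_2, \ldots, j_n) \Bigr) \det([a_1 \ a_2 \ \cdots \ a_n]) .
\end{aligned}
\]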
In the above, go from all integers between 1 and n, to just elements of S n by noting that when two columns in the determinant
are the same, then the determinant is zero. We then reorder the columns to the original ordering and obtain the sgn.
The conclusion that det (AB) = det (A) det (B) follows by recognizing the determinant of B. We obtain this by plugging in
A = I. The expression we got for the determinant of B has rows and columns swapped, so as a side note, we have also just
proved that the determinant of a matrix and its transpose are equal.
To prove the second part of the theorem, suppose A is invertible. Then A⁻¹A = I and consequently
det(A⁻¹) det(A) = det(A⁻¹A) = det(I) = 1. If A is not invertible, then the columns are linearly dependent. That is,
suppose
\[ \sum_{j=1}^n \gamma_j a_j = 0 , \]
where not all γ_j are equal to 0. Without loss of generality suppose γ_1 ≠ 0. Take
\[ B := \begin{bmatrix}
\gamma_1 & 0 & 0 & \cdots & 0 \\
\gamma_2 & 1 & 0 & \cdots & 0 \\
\gamma_3 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_n & 0 & 0 & \cdots & 1
\end{bmatrix} . \]
Applying the definition of the determinant we see det(B) = γ_1 ≠ 0. Then det(AB) = det(A) det(B) = γ_1 det(A). The first
column of AB is zero, and hence det(AB) = 0. Thus det(A) = 0.
Determinant is independent of the basis. In other words, if B is invertible, then
\[ \det(A) = \det(B^{-1}AB) . \]
The proof follows by noting det(B⁻¹AB) = (1/det(B)) det(A) det(B) = det(A). If in one basis A is the matrix representing a
linear operator, then for another basis we can find a matrix B such that the matrix B⁻¹AB takes us to the first basis, applies A
in the first basis, and takes us back to the basis we started with. We choose a basis on X, and we represent a linear mapping
using a matrix with respect to this basis. We obtain the same determinant as if we had used any other basis. It follows that
det : L(X) → R
There are three types of so-called elementary matrices. Recall again that e j are the standard basis of R n. First for some
j = 1, 2, …, n and some λ ∈ R, λ ≠ 0, an n × n matrix E defined by
\[ Ee_i = \begin{cases} e_i & \text{if } i \neq j , \\ \lambda e_i & \text{if } i = j . \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix as M except with the jth row multiplied by λ. It is an easy
computation (exercise) that det(E) = λ.
Second, for some j and k with j ≠ k, and λ ∈ R an n × n matrix E defined by
\[ Ee_i = \begin{cases} e_i & \text{if } i \neq j , \\ e_i + \lambda e_k & \text{if } i = j . \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix as M except with λ times the jth row added to the kth row. It is an
easy computation (exercise) that det(E) = 1.
Finally, for some j and k with j ≠ k an n × n matrix E defined by
\[ Ee_i = \begin{cases} e_i & \text{if } i \neq j \text{ and } i \neq k , \\ e_k & \text{if } i = j , \\ e_j & \text{if } i = k . \end{cases} \]
Given any n × m matrix M the matrix EM is the same matrix with jth and kth rows swapped. It is an easy computation
(exercise) that det (E) = − 1.
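The effect of each elementary matrix on the rows of M can be checked on a small example. The Python sketch below is not part of the original text; it builds each E column by column from the values Ee_i, using 0-based indices, and the function names scale, shear, and swap are hypothetical labels for the three types. In particular, for the second type the column Ee_j = e_j + λe_k places λ in row k, column j, so multiplying on the left adds λ times row j of M to row k.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def scale(n, j, lam):
    """E e_j = lam e_j, and E e_i = e_i otherwise."""
    E = identity(n)
    E[j][j] = lam
    return E

def shear(n, j, k, lam):
    """E e_j = e_j + lam e_k, and E e_i = e_i otherwise."""
    E = identity(n)
    E[k][j] = lam        # column j gets the extra entry lam in row k
    return E

def swap(n, j, k):
    """E e_j = e_k, E e_k = e_j, and E e_i = e_i otherwise."""
    E = identity(n)
    E[j][j] = E[k][k] = 0
    E[k][j] = E[j][k] = 1
    return E

M = [[1, 2],
     [3, 4],
     [5, 6]]
print(matmul(scale(3, 1, 10), M))     # row 1 multiplied by 10
print(matmul(shear(3, 0, 2, 5), M))   # 5 times row 0 added to row 2
print(matmul(swap(3, 0, 2), M))       # rows 0 and 2 swapped
```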
T = E 1E 2⋯E k,
and
Exercises
If X is a vector space with a norm ‖ ⋅ ‖, then show that d(x, y) := ‖x − y‖ makes X a metric space.
Show that for square matrices A and B, det (AB) = det (BA).
For R n define
\[ \|x\|_\infty := \max \{ |x_1|, |x_2|, \ldots, |x_n| \} , \]
For R n define
\[ \|x\|_1 := \sum_{j=1}^n |x_j| , \]
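Using the euclidean norm on R², compute the operator norm of the operators in L(R²) given by the matrices:
a) \( \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \)
b) \( \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \)
c) \( \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \)
d) \( \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \)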
earlier exercise that D is a linear operator). Define the norm on P(t) = c_0 + c_1 t + ⋯ + c_n t^n as ‖P‖ := sup { |c_j| : j = 0, 1, …, n }.
a) Show that ‖P‖ is a norm on R[t].
b) Show that D does not have bounded operator norm, that is ‖D‖ = ∞. Hint: consider the polynomials t n as n tends to infinity.
In this exercise we finish the proof of . Let X be any finite dimensional vector space with a norm. Let {x 1, x 2, …, x n} be a basis
for X.
a) Show that the function f : R n → R
f(c 1, c 2, …, c n) = ‖c 1x 1 + c 2x 2 + ⋯ + c nx n‖
is continuous.
b) Show that there exist numbers m and M such that if c = (c 1, c 2, …, c n) ∈ R n with ‖c‖ = 1 (standard euclidean norm), then
m ≤ ‖c 1x 1 + c 2x 2 + ⋯ + c nx n‖ ≤ M (here the norm is on X).
m‖c‖ ≤ ‖c 1x 1 + c 2x 2 + ⋯ + c nx n‖ ≤ M‖c‖.
c) Now show that U ⊂ X is open in the metric defined by ‖x − y‖ 1 if and only if it is open in the metric defined by ‖x − y‖ 2.
In other words, convergence of sequences, continuity of functions is the same in either norm.
The derivative
Note: 2–3 lectures
The derivative
Recall that for a function f : R → R, we defined the derivative at x as
\[ \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} . \]
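In other words, there was a number a (the derivative of f at x) such that
\[ \lim_{h \to 0} \left| \frac{f(x+h) - f(x)}{h} - a \right| = \lim_{h \to 0} \left| \frac{f(x+h) - f(x) - ah}{h} \right| = \lim_{h \to 0} \frac{|f(x+h) - f(x) - ah|}{|h|} = 0 . \]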
Multiplying by a is a linear map in one dimension. That is, we think of a ∈ L(R 1, R 1) which is the best linear approximation
of f near x. We use this definition to extend differentiation to more variables.
Let U ⊂ R n be an open subset and f : U → R m. We say f is differentiable at x ∈ U if there exists an A ∈ L(R n, R m) such that
\[ \lim_{\substack{h \to 0 \\ h \in \mathbb{R}^n}} \frac{\|f(x+h) - f(x) - Ah\|}{\|h\|} = 0 . \]
We write Df(x) := A, or f ′ (x) := A, and we say A is the derivative of f at x. When f is differentiable at all x ∈ U, we say simply
that f is differentiable.
For a differentiable function, the derivative of f is a function from U to L(R n, R m). Compare to the one dimensional case,
where the derivative is a function from U to R, but we really want to think of R here as L(R 1, R 1).
The norms above must be on the right spaces of course. The norm in the numerator is on R m, and the norm in the denominator
is on R n where h lives. Normally it is understood that h ∈ R n from context. We will not explicitly say so from now on.
We have again cheated somewhat and said that A is the derivative. We have not shown yet that there is only one, let us do that
now.
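Let U ⊂ R n be an open subset and f : U → R m. Suppose x ∈ U and there exist A, B ∈ L(R n, R m) such that
\[ \lim_{h \to 0} \frac{\|f(x+h) - f(x) - Ah\|}{\|h\|} = 0 \quad \text{and} \quad \lim_{h \to 0} \frac{\|f(x+h) - f(x) - Bh\|}{\|h\|} = 0 . \]
Then A = B.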
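Suppose h ∈ R^n, h ≠ 0. Compute
\[ \frac{\|(A-B)h\|}{\|h\|} = \frac{\|f(x+h) - f(x) - Ah - (f(x+h) - f(x) - Bh)\|}{\|h\|} \leq \frac{\|f(x+h) - f(x) - Ah\|}{\|h\|} + \frac{\|f(x+h) - f(x) - Bh\|}{\|h\|} . \]
So ‖(A − B)h‖/‖h‖ goes to 0 as h → 0. That is, given ϵ > 0, there is a δ > 0 such that for all nonzero h with ‖h‖ < δ,
\[ \epsilon > \frac{\|(A-B)h\|}{\|h\|} = \Bigl\| (A-B) \frac{h}{\|h\|} \Bigr\| . \]
For any x with ‖x‖ = 1, let h = (δ/2) x, then ‖h‖ < δ and h/‖h‖ = x. So ‖(A − B)x‖ < ϵ. Taking the supremum over all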
x with ‖x‖ = 1 we get the operator norm ‖A − B‖ ≤ ϵ. As ϵ > 0 was arbitrary ‖A − B‖ = 0 or in other words A = B.
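If f(x) = Ax for a linear mapping A, then f′(x) = A. This is easily seen:
\[ \frac{\|f(x+h) - f(x) - Ah\|}{\|h\|} = \frac{\|A(x+h) - Ax - Ah\|}{\|h\|} = \frac{0}{\|h\|} = 0 . \]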
Let f : R 2 → R 2 be defined by f(x, y) = (f 1(x, y), f 2(x, y) ) := (1 + x + 2y + x 2, 2x + 3y + xy). Let us show that f is differentiable
at the origin and let us compute the derivative, directly using the definition. The derivative is in L(R 2, R 2) so it can be
represented by a 2 × 2 matrix \( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \). Suppose h = (h_1, h_2). We need the following expression to go to zero.
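\[ \frac{\|f(h) - f(0) - (ah_1 + bh_2, \, ch_1 + dh_2)\|}{\|h\|} = \frac{\sqrt{\bigl((1-a)h_1 + (2-b)h_2 + h_1^2\bigr)^2 + \bigl((2-c)h_1 + (3-d)h_2 + h_1h_2\bigr)^2}}{\sqrt{h_1^2 + h_2^2}} . \]
If we choose a = 1, b = 2, c = 2, d = 3, the expression becomes
\[ \frac{\sqrt{h_1^4 + h_1^2 h_2^2}}{\sqrt{h_1^2 + h_2^2}} = |h_1| \frac{\sqrt{h_1^2 + h_2^2}}{\sqrt{h_1^2 + h_2^2}} = |h_1| . \]
And this expression does indeed go to zero as h → 0. Therefore the function is differentiable at the origin and the derivative
can be represented by the matrix \( \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \).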
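The limit in this example can be checked numerically. The Python sketch below is not part of the original text; it takes the candidate matrix with rows (1, 2) and (2, 3) as the derivative at the origin and evaluates the quotient ‖f(h) − f(0) − Ah‖/‖h‖ along a sequence of small h, where it should be close to |h_1|. The name quotient is a hypothetical helper.

```python
import math

def f(x, y):
    return (1 + x + 2 * y + x ** 2, 2 * x + 3 * y + x * y)

A = [[1, 2],
     [2, 3]]   # candidate derivative of f at the origin

def quotient(h1, h2):
    """||f(h) - f(0) - A h|| / ||h|| for a nonzero h = (h1, h2)."""
    f0 = f(0.0, 0.0)
    fh = f(h1, h2)
    r = (fh[0] - f0[0] - (A[0][0] * h1 + A[0][1] * h2),
         fh[1] - f0[1] - (A[1][0] * h1 + A[1][1] * h2))
    return math.hypot(*r) / math.hypot(h1, h2)

for s in (1e-1, 1e-2, 1e-3):
    # The quotient is close to |h1| = s, so it tends to zero with h.
    print(s, quotient(s, -2 * s))
```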
and ‖r(h)‖/‖h‖ must go to zero as h → 0. So r(h) itself must go to zero. The mapping h ↦ f′(p)h is a linear mapping between
finite dimensional spaces, it is therefore continuous and goes to zero as h → 0. Therefore, f(p + h) must go to f(p) as h → 0.
That is, f is continuous at p.
Let U ⊂ R n be open and let f : U → R m be differentiable at p ∈ U. Let V ⊂ R m be open, f(U) ⊂ V and let g : V → R ℓ be
differentiable at f(p). Then
F(x) = g (f(x) )
is differentiable at p and
\[ F'(p) = g'\bigl(f(p)\bigr) f'(p) . \]
Without the points where things are evaluated, this is sometimes written as F′ = (g ∘ f)′ = g′f′. The way to understand it is
that the derivative of the composition g ∘ f is the composition of the derivatives of g and f. That is, if f ′ (p) = A and
g ′ (f(p) ) = B, then F ′ (p) = BA.
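Let A := f′(p) and B := g′(f(p)). Take h ∈ R n and write q = f(p), k = f(p + h) − f(p). Let
\[ r(k) := g(q+k) - g(q) - Bk . \]
Then F(p + h) − F(p) = Bk + r(k), and so
\[ \frac{\|F(p+h) - F(p) - BAh\|}{\|h\|} = \frac{\|r(k) + B(k - Ah)\|}{\|h\|} \leq \|B\| \frac{\|f(p+h) - f(p) - Ah\|}{\|h\|} + \frac{\|r(k)\|}{\|k\|} \frac{\|f(p+h) - f(p)\|}{\|h\|} , \]
where the last term is interpreted as zero when k = 0. First, B is constant and f is differentiable at p, so the term ‖B‖ ‖f(p + h) − f(p) − Ah‖/‖h‖ goes to 0. Next, as f is continuous at p, k goes to 0 as h goes to 0. Therefore ‖r(k)‖/‖k‖ goes to 0 because g is differentiable at q. Finally,
\[ \frac{\|f(p+h) - f(p)\|}{\|h\|} \leq \frac{\|f(p+h) - f(p) - Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|} \leq \frac{\|f(p+h) - f(p) - Ah\|}{\|h\|} + \|A\| . \]
As f is differentiable at p, for small enough h the quantity ‖f(p + h) − f(p) − Ah‖/‖h‖ is bounded. Therefore the term ‖f(p + h) − f(p)‖/‖h‖ stays
bounded as h goes to 0. Therefore, ‖F(p + h) − F(p) − BAh‖/‖h‖
goes to zero, and F′(p) = BA, which is what was claimed.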
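The chain rule can be checked numerically on a concrete pair of maps. The Python sketch below is not part of the original text; the maps f(x, y) = (xy, x + y²) and g(u, v) = u² + sin v are sample choices made for this illustration, their derivative matrices are computed by hand, and the product BA is compared with a central-difference approximation of the derivative of F = g ∘ f. The helper central_difference is hypothetical.

```python
import math

def f(x, y):
    return (x * y, x + y ** 2)

def g(u, v):
    return u ** 2 + math.sin(v)

def F(x, y):
    return g(*f(x, y))

p = (1.0, 2.0)
q = f(*p)                              # q = f(p)

A = [[p[1], p[0]],                     # f'(p): partial derivatives of xy
     [1.0, 2 * p[1]]]                  #        and of x + y^2
B = [2 * q[0], math.cos(q[1])]         # g'(q), a 1-by-2 matrix

# The composition of the derivatives, B A, as a 1-by-2 matrix.
BA = [B[0] * A[0][j] + B[1] * A[1][j] for j in range(2)]

def central_difference(func, point, j, h=1e-6):
    """Approximate the j-th partial derivative of func at point."""
    plus, minus = list(point), list(point)
    plus[j] += h
    minus[j] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

numeric = [central_difference(F, p, j) for j in range(2)]
print(BA)
print(numeric)
```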
Partial derivatives
There is another way to generalize the derivative from one dimension. We hold all but one variable constant and take the
regular derivative.
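Let f : U → R be a function on an open set U ⊂ R n. If the following limit exists we write
\[ \frac{\partial f}{\partial x_j}(x) := \lim_{h \to 0} \frac{f(x_1, \ldots, x_{j-1}, x_j + h, x_{j+1}, \ldots, x_n) - f(x)}{h} = \lim_{h \to 0} \frac{f(x + he_j) - f(x)}{h} . \]
We call ∂f/∂x_j (x) the partial derivative of f with respect to x_j. Sometimes we write D_j f instead.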
For a mapping f : U → R^m we write f = (f_1, f_2, …, f_m), where f_k are real-valued functions. Then we define ∂f_k/∂x_j (or write it as
D_j f_k).
Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the derivative
of a function.
[mv:prop:jacobianmatrix] Let U ⊂ R n be open and let f : U → R m be differentiable at p ∈ U. Then all the partial derivatives
at p exist and in terms of the standard basis of R n and R m, f ′ (p) is represented by the matrix
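\[ \begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \ldots & \frac{\partial f_1}{\partial x_n}(p) \\
\frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \ldots & \frac{\partial f_2}{\partial x_n}(p) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \ldots & \frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix} . \]
In other words
\[ f'(p) \, e_j = \sum_{k=1}^m \frac{\partial f_k}{\partial x_j}(p) \, e_k . \]
If v = \sum_{j=1}^n c_j e_j = (c_1, c_2, …, c_n), then
\[ f'(p) \, v = \sum_{j=1}^n \sum_{k=1}^m c_j \frac{\partial f_k}{\partial x_j}(p) \, e_k = \sum_{k=1}^m \Bigl( \sum_{j=1}^n c_j \frac{\partial f_k}{\partial x_j}(p) \Bigr) e_k . \]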
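Fix a j and note that
\[ \Bigl\| \frac{f(p + he_j) - f(p)}{h} - f'(p) e_j \Bigr\| = \Bigl\| \frac{f(p + he_j) - f(p) - f'(p) he_j}{h} \Bigr\| = \frac{\|f(p + he_j) - f(p) - f'(p) he_j\|}{\|he_j\|} . \]
As h goes to 0, the right hand side goes to zero by differentiability of f, and hence
\[ \lim_{h \to 0} \frac{f(p + he_j) - f(p)}{h} = f'(p) e_j . \]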
Note that f is vector valued. So represent f by components f = (f 1, f 2, …, f m), and note that taking a limit in R m is the same as
taking the limit in each component separately. Therefore for any k the partial derivative
\[ \frac{\partial f_k}{\partial x_j}(p) = \lim_{h \to 0} \frac{f_k(p + he_j) - f_k(p)}{h} \]
exists and is equal to the kth component of f ′ (p)e j, and we are done.
The converse of the proposition is not true. Just because the partial derivatives exist, does not mean that the function is
differentiable. See the exercises. However, when the partial derivatives are continuous, we will prove that the converse holds.
One of the consequences of the proposition is that if f is differentiable on U, then f′ : U → L(R^n, R^m) is a continuous function
if and only if all the ∂f_k/∂x_j are continuous functions.
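Let U ⊂ R^n be open and f : U → R a differentiable function. We define the gradient as
\[ \nabla f(x) := \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x) \, e_j . \]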
Notice that the gradient gives us a way to represent the action of the derivative as a dot product: f ′ (x)v = ∇f(x) ⋅ v.
Suppose γ : (a, b) ⊂ R → R n is a differentiable function and the image γ ((a, b) ) ⊂ U. Such a function and its image is
sometimes called a curve, or a differentiable curve. Write γ = (γ 1, γ 2, …, γ n). Let
g(t) := f (γ(t) ).
The function g is differentiable. For purposes of computation we identify L(R 1) with R, and hence g ′ (t) can be computed as a
number:
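\[ g'(t) = f'\bigl(\gamma(t)\bigr) \gamma'(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\bigl(\gamma(t)\bigr) \frac{d\gamma_j}{dt}(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j} \frac{d\gamma_j}{dt} . \]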
For convenience, we sometimes leave out the points where we are evaluating as on the right hand side above. Let us rewrite
this with the notation of the gradient and the dot product:
\[ g'(t) = (\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma'(t) = \nabla f \cdot \gamma' . \]
We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. So pick a
vector u ∈ R n such that ‖u‖ = 1. Fix x ∈ U. Then define a curve
γ(t) := x + tu.
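Notice that γ′(t) = u for all t. By the above,
\[ \frac{d}{dt}\Big|_{t=0} \bigl[ f(x + tu) \bigr] = (\nabla f)(x) \cdot u , \]
where the notation \( \frac{d}{dt}\big|_{t=0} \) represents the derivative evaluated at t = 0. We also compute directly
\[ \frac{d}{dt}\Big|_{t=0} \bigl[ f(x + tu) \bigr] = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h} . \]
We obtain the directional derivative, denoted by
\[ D_u f(x) := \frac{d}{dt}\Big|_{t=0} \bigl[ f(x + tu) \bigr] , \]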
which can be computed by one of the methods above.
Let us suppose (∇f)(x) ≠ 0. By Cauchy-Schwarz inequality we have
|Duf(x) | ≤ ‖(∇f)(x)‖.
Equality is achieved when u is a scalar multiple of (∇f)(x). That is, when
\[ u = \frac{(\nabla f)(x)}{\|(\nabla f)(x)\|} , \]
we get D uf(x) = ‖(∇f)(x)‖. The gradient points in the direction in which the function grows fastest, in other words, in the
direction in which D uf(x) is maximal.
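The claim that the directional derivative is largest in the direction of the gradient can be illustrated numerically. The Python sketch below is not part of the original text; the function f(x, y) = x² + 3y is a sample choice made for this illustration, its gradient is written out by hand, and D_u f is computed as the dot product of the gradient with u for a sample of unit vectors. The names grad_f and directional_derivative are hypothetical.

```python
import math

def grad_f(x, y):
    """Gradient of the sample function f(x, y) = x^2 + 3 y."""
    return (2 * x, 3.0)

def directional_derivative(point, u):
    """D_u f(point), computed as the dot product of the gradient with u."""
    g = grad_f(*point)
    return g[0] * u[0] + g[1] * u[1]

point = (1.0, 2.0)
g = grad_f(*point)
g_norm = math.hypot(*g)

# Sample unit vectors u = (cos s, sin s) and record D_u f at the point.
values = [directional_derivative(point, (math.cos(s), math.sin(s)))
          for s in (2 * math.pi * k / 360 for k in range(360))]

# The direction of the gradient itself attains the norm of the gradient.
best = directional_derivative(point, (g[0] / g_norm, g[1] / g_norm))
print(max(values), best, g_norm)
```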
The Jacobian
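Let U ⊂ R n and f : U → R n be a differentiable mapping. Then define the Jacobian, or Jacobian determinant, of f at x as
\[ J_f(x) := \det\bigl( f'(x) \bigr) . \]
Sometimes this is written as
\[ \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)} . \]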
This last piece of notation may seem somewhat confusing, but it is useful when you need to specify the exact variables and
function components used.
The Jacobian J_f is a real valued function, and when n = 1 it is simply the derivative. From the chain rule and the fact that
det(AB) = det(A) det(B), it follows that:
\[ J_{f \circ g}(x) = J_f\bigl(g(x)\bigr) \, J_g(x) . \]
As we mentioned, the determinant tells us what happens to area/volume. Similarly, the Jacobian measures how much a
differentiable mapping stretches things locally, and if it flips orientation. In particular, if the Jacobian is non-zero then we
would assume that locally the mapping is invertible (and we would be correct as we will later see).
Exercises
Let γ : (−1, 1) → R^n and α : (−1, 1) → R^n be two differentiable curves such that γ(0) = α(0) and γ′(0) = α′(0). Suppose
F : R^n → R is a differentiable function. Show that
\[ \frac{d}{dt}\Big|_{t=0} F\bigl(\gamma(t)\bigr) = \frac{d}{dt}\Big|_{t=0} F\bigl(\alpha(t)\bigr) . \]
Let f : R² → R be given by
\[ f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0) , \\ 0 & \text{if } (x,y) = (0,0) . \end{cases} \]
a) Show that the partial derivatives ∂f/∂x and ∂f/∂y exist at all points (including the origin).
b) Show that f is not continuous at the origin (and hence not differentiable).
Define a function f : R 2 → R by
\[ f(x,y) := \begin{cases} \frac{x^2 y}{x^2+y^2} & \text{if } (x,y) \neq (0,0) , \\ 0 & \text{if } (x,y) = (0,0) . \end{cases} \]
a) Show that the partial derivatives ∂f/∂x and ∂f/∂y exist at all points.
b) Show that for all u ∈ R² with ‖u‖ = 1, the directional derivative D_u f exists at all points.
c) Show that f is continuous at the origin.
d) Show that f is not differentiable at the origin.
Suppose f : R^n → R^n is one-to-one, onto, differentiable at all points, and such that f⁻¹ is also differentiable at all points.
a) Show that f′(p) is invertible at all points p and compute (f⁻¹)′(f(p)). Hint: consider p = f⁻¹(f(p)).
b) Let g : R^n → R^n be a function differentiable at q ∈ R^n and such that g(q) = q. Suppose f(p) = q for some p ∈ R^n. Show
J_g(q) = J_{f⁻¹ ∘ g ∘ f}(p), where J_g is the Jacobian determinant.
Suppose f : R 2 → R is differentiable and such that f(x, y) = 0 if and only if y = 0 and such that ∇f(0, 0) = (1, 1). Prove that
f(x, y) > 0 whenever y > 0, and f(x, y) < 0 whenever y < 0.
Suppose f : R → R n is differentiable and ‖f(t)‖ = 1 for all t (that is, we have a curve in the unit sphere). Then show that for all
t, treating f ′ as a vector we have, f ′ (t) ⋅ f(t) = 0.
Define f : R 2 → R 2 by f(x, y) := (x, y + φ(x) ) for some differentiable function φ of one variable. Show f is differentiable and
find f ′ .
If φ : [a, b] → R n is differentiable on (a, b) and continuous on [a, b], then there exists a t 0 ∈ (a, b) such that
\[ \|\varphi(b) - \varphi(a)\| \leq (b-a) \, \|\varphi'(t_0)\| . \]
By mean value theorem on the function (φ(b) − φ(a) ) ⋅ φ(t) (the dot is the scalar dot product again) we obtain there is a
t 0 ∈ (a, b) such that
(φ(b) − φ(a) ) ⋅ φ(b) − (φ(b) − φ(a) ) ⋅ φ(a) = ‖φ(b) − φ(a)‖ 2 = (b − a) (φ(b) − φ(a) ) ⋅ φ ′ (t 0)
where we treat φ′ as simply a column vector of numbers by abuse of notation. Note that in this case, if we think of φ′(t) as
simply a vector, then ‖φ′(t)‖_{L(R, R^n)} = ‖φ′(t)‖_{R^n}. That is, the euclidean norm of the vector is the same as the operator
norm of φ′(t).
By the Cauchy-Schwarz inequality
\[ \|\varphi(b) - \varphi(a)\|^2 = (b-a) \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi'(t_0) \leq (b-a) \, \|\varphi(b) - \varphi(a)\| \, \|\varphi'(t_0)\| . \]
Recall that a set U is convex if whenever x, y ∈ U, the line segment from x to y lies in U.
[mv:prop:convexlip] Let U ⊂ R n be a convex open set, f : U → R m a differentiable function, and an M such that
‖f′(x)‖ ≤ M
for all x ∈ U. Then f is Lipschitz with constant M, that is,
\[ \|f(x) - f(y)\| \leq M \|x - y\| \]
for all x, y ∈ U.
Fix x and y in U and note that (1 − t)x + ty ∈ U for all t ∈ [0, 1] by convexity. Next
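\[ \frac{d}{dt} \Bigl[ f\bigl((1-t)x + ty\bigr) \Bigr] = f'\bigl((1-t)x + ty\bigr)(y - x) . \]
By the mean value theorem above, there is some t_0 ∈ (0, 1) such that
\[ \|f(x) - f(y)\| \leq \Bigl\| \frac{d}{dt}\Big|_{t=t_0} \Bigl[ f\bigl((1-t)x + ty\bigr) \Bigr] \Bigr\| \leq \bigl\| f'\bigl((1-t_0)x + t_0 y\bigr) \bigr\| \, \|y - x\| \leq M \|y - x\| . \]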
If U is not convex the proposition is not true. To see this fact, take the set
Let f(x, y) be the angle that the line from the origin to (x, y) makes with the positive x axis. You can even write the formula for f
:
\[ f(x,y) = 2 \arctan \left( \frac{y}{x + \sqrt{x^2 + y^2}} \right) . \]
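The formula can be compared against the two-argument arctangent from the Python standard library, and evaluated at two nearby points on either side of the negative x-axis. The sketch below is not part of the original text; the function name angle and the sample points are hypothetical choices. It shows that the two points are close together while the values of f differ by almost 2π, so no Lipschitz bound in terms of the straight-line distance between them can hold.

```python
import math

def angle(x, y):
    """Angle of (x, y) via 2 arctan(y / (x + sqrt(x^2 + y^2))).
    Undefined on the nonpositive x-axis, where the denominator vanishes."""
    return 2 * math.atan(y / (x + math.hypot(x, y)))

upper = (-1.0, 1e-3)    # just above the negative x-axis
lower = (-1.0, -1e-3)   # just below it

jump = angle(*upper) - angle(*lower)
distance = math.dist(upper, lower)

print(angle(0.0, 1.0), math.atan2(1.0, 0.0))   # both give a quarter turn
print(jump, distance)                           # large jump, small distance
```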
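This means that f⁻¹(c) is open for any c ∈ R^m. Suppose f⁻¹(c) is nonempty. The two sets
\[ U' := f^{-1}(c) , \qquad U'' := f^{-1}\bigl(\mathbb{R}^m \setminus \{c\}\bigr) = \bigcup_{a \neq c} f^{-1}(a) \]
are open and disjoint, and further U = U′ ∪ U″. So as U′ is nonempty, and U is connected, we have that U″ = ∅. So f(x) = c for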
all x ∈ U.
Continuously differentiable functions
We say f : U ⊂ R n → R m is continuously differentiable, or C 1(U) if f is differentiable and f ′ : U → L(R n, R m) is continuous.
[mv:prop:contdiffpartials] Let U ⊂ R n be open and f : U → R m. The function f is continuously differentiable if and only if all
the partial derivatives exist and are continuous.
Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that f is differentiable, in
fact, f may not even be continuous. See the exercises for the last section and also for this section.
We have seen that if f is differentiable, then the partial derivatives exist. Furthermore, the partial derivatives are the entries of
the matrix of f ′ (x). So if f ′ : U → L(R n, R m) is continuous, then the entries are continuous, hence the partial derivatives are
continuous.
To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix x ∈ U. If we show that f ′ (x)
exists we are done, because the entries of the matrix f ′ (x) are then the partial derivatives and if the entries are continuous
functions, the matrix valued function f ′ is continuous.
Let us do induction on dimension. First let us note that the conclusion is true when n = 1. In this case the derivative is just the
regular derivative (exercise: you should check that the fact that the function is vector valued is not a problem).
Suppose the conclusion is true for R n − 1, that is, if we restrict to the first n − 1 variables, the conclusion is true. It is easy to see
that the first n − 1 partial derivatives of f restricted to the set where the last coordinate is fixed are the same as those for f. In
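the following we think of R^{n−1} as a subset of R^n, that is the set in R^n where x_n = 0. Let
\[
A = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_n}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_n}(x)
\end{bmatrix},
\qquad
A_1 = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_{n-1}}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_{n-1}}(x)
\end{bmatrix},
\qquad
v = \begin{bmatrix}
\frac{\partial f_1}{\partial x_n}(x) \\
\vdots \\
\frac{\partial f_m}{\partial x_n}(x)
\end{bmatrix}.
\]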
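Let ϵ > 0 be given. Let δ > 0 be such that for any k ∈ R^{n−1} with ‖k‖ < δ we have
\[ \frac{\|f(x+k) - f(x) - A_1 k\|}{\|k\|} < \epsilon. \]
By continuity of the partial derivatives, suppose δ is small enough so that
\[ \left| \frac{\partial f_j}{\partial x_n}(x+h) - \frac{\partial f_j}{\partial x_n}(x) \right| < \epsilon \]
for all j and all h ∈ R^n with ‖h‖ < δ.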
Let h = h 1 + te n be a vector in R n where h 1 ∈ R n − 1 such that ‖h‖ < δ. Then ‖h 1‖ ≤ ‖h‖ < δ. Note that Ah = A 1h 1 + tv.
As all the partial derivatives exist, by the mean value theorem, for each j there is some θ_j ∈ [0, t] (or [t, 0] if t < 0), such that
\[ f_j(x+h_1+te_n) - f_j(x+h_1) = t \frac{\partial f_j}{\partial x_n}(x+h_1+\theta_j e_n). \]
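Note that if ‖h‖ < δ, then ‖h_1 + θ_j e_n‖ ≤ ‖h‖ < δ. So to finish the estimate
\[
\begin{aligned}
\|f(x+h)-f(x)-Ah\| & \leq \|f(x+h_1+te_n) - f(x+h_1) - tv\| + \|f(x+h_1)-f(x)-A_1h_1\| \\
& \leq \sqrt{\sum_{j=1}^m \Bigl( t \frac{\partial f_j}{\partial x_n}(x+h_1+\theta_j e_n) - t \frac{\partial f_j}{\partial x_n}(x) \Bigr)^2} + \epsilon \|h_1\| \\
& \leq \sqrt{m}\, \epsilon |t| + \epsilon \|h_1\| \\
& \leq (\sqrt{m}+1) \epsilon \|h\|.
\end{aligned}
\]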
Exercises
Define f : R^2 → R as
\[
f(x,y) := \begin{cases}
(x^2+y^2)\sin\bigl((x^2+y^2)^{-1}\bigr) & \text{if } (x,y) \neq (0,0), \\
0 & \text{else.}
\end{cases}
\]
Show that f is differentiable at the origin, but that it is not continuously differentiable.
Let f : R^2 → R be defined by
\[
f(x,y) := \begin{cases}
\frac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]
Compute the partial derivatives ∂f/∂x and ∂f/∂y at all points and show that these are not continuous functions.
Let B(0, 1) ⊂ R 2 be the unit ball (disc), that is, the set given by x 2 + y 2 < 1. Suppose f : B(0, 1) → R is a differentiable
a) Find an M ∈ R such that ‖f ′ (x, y)‖ ≤ M for all (x, y) ∈ B(0, 1).
b) Find a B ∈ R such that |f(x, y)| ≤ B for all (x, y) ∈ B(0, 1).
Define φ : [0, 2π] → R 2 by φ(t) = (sin(t), cos(t) ). Compute φ ′ (t) for all t. Compute ‖φ ′ (t)‖ for all t. Notice that φ ′ (t) is never
zero, yet φ(0) = φ(2π), therefore, Rolle’s theorem is not true in more than one dimension.
|∂f/∂y| ≤ M at all points. Show that f is continuous.
Let f : R 2 → R be a function and M ∈ R, such that for every (x, y) ∈ R 2, the function g(t) := f(xt, yt) is differentiable and
|g′(t)| ≤ M.
The contraction mapping principle says that if f : X → X is a contraction and X is a complete metric space, then there exists a
unique fixed point, that is, there exists a unique x ∈ X such that f(x) = x.
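The proof of the principle is constructive: iterating the contraction from any starting point converges to the fixed point. A minimal sketch, using the hypothetical example of cos restricted to [0, 1] (the function name `fixed_point` and the tolerances are illustrative choices, not part of the text):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = f(x_k) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and |cos'(x)| = |sin(x)| <= sin(1) < 1 there,
# so it is a contraction on the complete metric space [0, 1].
x_star = fixed_point(math.cos, 0.5)
```

The returned `x_star` satisfies cos(x) = x up to the tolerance, and by uniqueness any other starting point in [0, 1] leads to the same value.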
Intuitively if a function is differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of
the inverse function theorem is that if a function is differentiable and the derivative is invertible, the function is (locally)
invertible.
[thm:inverse] Let U ⊂ R^n be an open set and let f : U → R^n be a continuously differentiable function. Also suppose p ∈ U, f(p) = q, and f′(p) is invertible (that is, J_f(p) ≠ 0). Then there exist open sets V, W ⊂ R^n such that p ∈ V ⊂ U, f(V) = W and f|_V is one-to-one and onto. Furthermore, the inverse g(y) = (f|_V)^{-1}(y) is continuously differentiable and
\[ g'(y) = \bigl(f'(x)\bigr)^{-1}, \qquad \text{for all } x \in V, \ y = f(x). \]
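Write A = f′(p). As f′ is continuous, there exists an open ball V around p such that
\[ \|A - f'(x)\| < \frac{1}{2\|A^{-1}\|} \qquad \text{for all } x \in V. \]
Given y ∈ R^n, define φ_y : V → R^n by
\[ \varphi_y(x) = x + A^{-1}\bigl(y - f(x)\bigr). \]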
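As A^{-1} is one-to-one, φ_y(x) = x (x is a fixed point) if and only if y − f(x) = 0, or in other words f(x) = y. Using the chain rule we obtain
\[ \varphi_y'(x) = I - A^{-1}f'(x) = A^{-1}\bigl(A - f'(x)\bigr). \]
So for x ∈ V we have
\[ \|\varphi_y'(x)\| \leq \|A^{-1}\| \, \|A - f'(x)\| < \frac{1}{2}. \]
As V is a ball, it is convex, and hence
\[ \|\varphi_y(x_1) - \varphi_y(x_2)\| \leq \frac{1}{2} \|x_1 - x_2\| \qquad \text{for all } x_1, x_2 \in V. \]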
In other words φ y is a contraction defined on V, though we so far do not know what is the range of φ y. We cannot apply the
fixed point theorem, but we can say that φ y has at most one fixed point (note proof of uniqueness in the contraction mapping
principle). That is, there exists at most one x ∈ V such that f(x) = y, and so f | V is one-to-one.
Let W = f(V). We need to show that W is open. Take a y 1 ∈ W, then there is a unique x 1 ∈ V such that f(x 1) = y 1. Let r > 0
be small enough such that the closed ball C(x 1, r) ⊂ V (such r > 0 exists as V is open).
Suppose y is such that
\[ \|y - y_1\| < \frac{r}{2\|A^{-1}\|}. \]
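If we show that y ∈ W, then we have shown that W is open. Define φ_y(x) = x + A^{-1}(y − f(x)) as before. If x ∈ C(x_1, r), then
\[
\begin{aligned}
\|\varphi_y(x) - x_1\| & \leq \|\varphi_y(x) - \varphi_y(x_1)\| + \|\varphi_y(x_1) - x_1\| \\
& \leq \frac{1}{2} \|x - x_1\| + \|A^{-1}(y - y_1)\| \\
& \leq \frac{1}{2} r + \|A^{-1}\| \, \|y - y_1\| \\
& < \frac{1}{2} r + \|A^{-1}\| \frac{r}{2\|A^{-1}\|} = r.
\end{aligned}
\]
So φ_y takes C(x_1, r) into B(x_1, r) ⊂ C(x_1, r). It is a contraction on C(x_1, r) and C(x_1, r) is complete (a closed subset of R^n is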
complete). Apply the contraction mapping principle to obtain a fixed point x, i.e. φ y(x) = x. That is f(x) = y. So
y ∈ f (C(x 1, r) ) ⊂ f(V) = W. Therefore W is open.
Next we need to show that g is continuously differentiable and compute its derivative. First let us show that it is differentiable.
Let y ∈ W and k ∈ R n, k ≠ 0, such that y + k ∈ W. Then there are unique x ∈ V and h ∈ R n, h ≠ 0 and x + h ∈ V, such that
f(x) = y and f(x + h) = y + k as f | V is a one-to-one and onto mapping of V onto W. In other words, g(y) = x and
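g(y + k) = x + h. We can still squeeze some information from the fact that φ_y is a contraction:
\[ \varphi_y(x+h) - \varphi_y(x) = h + A^{-1}\bigl(f(x) - f(x+h)\bigr) = h - A^{-1}k. \]
So
\[ \|h - A^{-1}k\| = \|\varphi_y(x+h) - \varphi_y(x)\| \leq \frac{1}{2} \|x+h-x\| = \frac{\|h\|}{2}. \]
By the inverse triangle inequality ‖h‖ − ‖A^{-1}k‖ ≤ ½‖h‖, so
\[ \|h\| \leq 2 \|A^{-1}k\| \leq 2 \|A^{-1}\| \, \|k\|. \]
As x ∈ V, the derivative f′(x) is invertible (since ‖I − A^{-1}f′(x)‖ = ‖φ_y′(x)‖ < 1). Let B = (f′(x))^{-1}, which is what we think the derivative of g at y is. Then
\[
\begin{aligned}
\frac{\|g(y+k)-g(y)-Bk\|}{\|k\|} & = \frac{\|h-Bk\|}{\|k\|} = \frac{\|B\bigl(f'(x)h-k\bigr)\|}{\|k\|} \\
& \leq \|B\| \frac{\|h\|}{\|k\|} \, \frac{\|f(x+h)-f(x)-f'(x)h\|}{\|h\|} \\
& \leq 2 \|B\| \, \|A^{-1}\| \frac{\|f(x+h)-f(x)-f'(x)h\|}{\|h\|}.
\end{aligned}
\]
As k goes to 0, so does h. So the right hand side goes to 0 as f is differentiable, and hence the left hand side also goes to 0. And B is precisely what we wanted g′(y) to be.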
We have shown that g is differentiable; let us show it is C^1(W). Now, g : W → V is continuous (it is differentiable), f′ is a continuous function from V to L(R^n), and X → X^{-1} is a continuous function. So g′(y) = (f′(g(y)))^{-1} is the composition of these three continuous functions and hence is continuous.
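The map φ_y(x) = x + A^{-1}(y − f(x)) from the proof also gives a practical way to compute the local inverse: iterate it until it settles at a fixed point. The sketch below does this for the sample map f(x, y) = (x^2 − y^2, 2xy) near p = (1, 0). The choice of map, base point, target and iteration count are assumptions made purely for illustration.

```python
def f(p):
    """Hypothetical sample map f(x, y) = (x^2 - y^2, 2xy)."""
    x, y = p
    return (x * x - y * y, 2.0 * x * y)

def local_inverse(target, p0, a_inv, iterations=60):
    """Approximate the point near p0 that f sends to target, by iterating
    phi(x) = x + A^{-1} (target - f(x)), where a_inv is the 2x2 matrix A^{-1}."""
    x = p0
    for _ in range(iterations):
        fx = f(x)
        r = (target[0] - fx[0], target[1] - fx[1])  # target - f(x)
        x = (x[0] + a_inv[0][0] * r[0] + a_inv[0][1] * r[1],
             x[1] + a_inv[1][0] * r[0] + a_inv[1][1] * r[1])
    return x

# At p = (1, 0) the derivative is A = [[2, 0], [0, 2]], so A^{-1} = I / 2.
p = (1.0, 0.0)
a_inv = ((0.5, 0.0), (0.0, 0.5))
target = (1.1, 0.1)          # a point near f(p) = (1, 0)
preimage = local_inverse(target, p, a_inv)
```

Only the derivative at the single point p is used, exactly as in the proof; Newton's method would instead recompute the derivative at every step.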
Suppose U ⊂ R n is open and f : U → R n is a continuously differentiable mapping such that f ′ (x) is invertible for all x ∈ U.
Then given any open set V ⊂ U, f(V) is open. (f is an open mapping).
Without loss of generality, suppose U = V. For each point y ∈ f(V), we pick x ∈ f^{-1}(y) (there could be more than one such point), then by the inverse function theorem there is a neighborhood of x in V that maps onto a neighborhood of y. Hence f(V) is open.
The theorem, and the corollary, is not true if f ′ (x) is not invertible for some x. For example, the map f(x, y) = (x, xy), maps R 2
onto the set R 2 ∖ {(0, y) : y ≠ 0}, which is neither open nor closed. In fact f − 1(0, 0) = {(0, y) : y ∈ R}. This bad behavior only
occurs on the y-axis, everywhere else the function is locally invertible. If we avoid the y-axis, f is even one-to-one.
Also note that just because f ′ (x) is invertible everywhere does not mean that f is one-to-one globally. It is “locally” one-to-one
but perhaps not “globally.” For an example, take the map f : R^2 ∖ {0} → R^2 defined by f(x, y) = (x^2 − y^2, 2xy). It is left to the student to show that f is differentiable and the derivative is invertible.
On the other hand, the mapping is 2-to-1 globally. For every (a, b) that is not the origin, there are exactly two solutions to
x 2 − y 2 = a and 2xy = b. We leave it to the student to show that there is at least one solution, and then notice that replacing x
and y with − x and − y we obtain another solution.
The invertibility of the derivative is not a necessary condition, just sufficient, for having a continuous inverse and being an open mapping. For example the function f(x) = x^3 is an open mapping from R to R and is globally one-to-one with a continuous inverse, although f′(0) = 0.
To make things simple we fix some notation. We let (x, y) ∈ R n + m denote the coordinates (x 1, …, x n, y 1, …, y m). A linear
transformation A ∈ L(R n + m, R m) can then be written as A = [A x A y] so that A(x, y) = A xx + A yy, where A x ∈ L(R n, R m) and
A y ∈ L(R m).
The proof is obvious. We simply solve and obtain y = Bx. Let us show that the same can be done for C 1 functions.
[thm:implicit] Let U ⊂ R^{n+m} be an open set and let f : U → R^m be a C^1(U) mapping. Let (p, q) ∈ U be a point such that f(p, q) = 0 and such that
\[ \frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)}(p,q) \neq 0. \]
Then there exists an open set W ⊂ R^n with p ∈ W, an open set W′ ⊂ R^m with q ∈ W′, with W × W′ ⊂ U, and a C^1(W) mapping g : W → W′, with g(p) = q, and for all x ∈ W, the point g(x) is the unique point in W′ such that
\[ f\bigl(x, g(x)\bigr) = 0. \]
Furthermore, if [A_x A_y] = f′(p, q), then
\[ g'(p) = -(A_y)^{-1}A_x. \]
The condition ∂(f_1, …, f_m)/∂(y_1, …, y_m) (p, q) = det(A_y) ≠ 0 simply means that A_y is invertible.
Define F : U → R n + m by F(x, y) := (x, f(x, y) ). It is clear that F is C 1, and we want to show that the derivative at (p, q) is
invertible.
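Let us compute the derivative. We know that
\[ \frac{\|f(p+h,q+k) - f(p,q) - A_x h - A_y k\|}{\|(h,k)\|} \]
goes to zero as ‖(h, k)‖ = √(‖h‖^2 + ‖k‖^2) goes to zero. But then so does
\[ \frac{\bigl\| \bigl(h, f(p+h,q+k)-f(p,q)\bigr) - (h, A_x h + A_y k) \bigr\|}{\|(h,k)\|} = \frac{\|f(p+h,q+k) - f(p,q) - A_x h - A_y k\|}{\|(h,k)\|}. \]
So the derivative of F at (p, q) takes (h, k) to (h, A_x h + A_y k). If (h, A_x h + A_y k) = (0, 0), then h = 0, and so A_y k = 0. As A_y is one-to-one, k = 0. Therefore F′(p, q) is one-to-one, or in other words invertible, and we apply the inverse function theorem.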
That is, there exists some open set V ⊂ R^{n+m} with (p, 0) ∈ V, and an inverse mapping G : V → R^{n+m}, that is F(G(x, s)) = (x, s) for all (x, s) ∈ V (where x ∈ R^n and s ∈ R^m). Write G = (G_1, G_2) (the first n and the second m components of G). Then
\[ F\bigl(G_1(x,s), G_2(x,s)\bigr) = \bigl(G_1(x,s), f(G_1(x,s),G_2(x,s))\bigr) = (x,s). \]
So x = G_1(x, s) and f(G_1(x, s), G_2(x, s)) = f(x, G_2(x, s)) = s. Plugging in s = 0 we obtain
\[ f\bigl(x, G_2(x,0)\bigr) = 0. \]
The set G(V) contains a whole neighborhood of the point (p, q), and therefore there exist some open sets W̃ and W′ such that W̃ × W′ ⊂ G(V) with p ∈ W̃ and q ∈ W′. Then take W = {x ∈ W̃ : G_2(x, 0) ∈ W′}. The function that takes x to G_2(x, 0) is continuous and therefore W is open. We define g : W → R^m by g(x) := G_2(x, 0), which is the g in the theorem. The fact that g(x) is the unique point in W′ follows because W × W′ ⊂ G(V) and G is one-to-one and onto G(V).
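Next differentiate
\[ x \mapsto f\bigl(x, g(x)\bigr) \]
at p, which should be the zero map. The derivative is done in the same way as above. Writing A = [A_x A_y] = f′(p, q), we get that for all h ∈ R^n
\[ 0 = A\bigl(h, g'(p)h\bigr) = A_x h + A_y g'(p) h, \]
and we obtain the desired derivative for g as well.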
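In the context of the theorem, we have m equations in n + m unknowns:
\[
\begin{aligned}
f_1(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0 \\
& \ \, \vdots \\
f_m(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0
\end{aligned}
\]
And the condition guaranteeing a solution is that this is a C^1 mapping (that all the components are C^1, or in other words all the partial derivatives exist and are continuous), and the matrix
\[
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \ldots & \frac{\partial f_1}{\partial y_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial y_1} & \ldots & \frac{\partial f_m}{\partial y_m}
\end{bmatrix}
\]
is invertible at (p, q).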
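For example, consider the function
\[ f(x,y,z) = \bigl( x^2+y^2-(z+1)^3+1, \ e^x+e^y+e^z-3 \bigr). \]
We find that
\[
f' = \begin{bmatrix}
2x & 2y & -3(z+1)^2 \\
e^x & e^y & e^z
\end{bmatrix}.
\]
The matrix
\[
\begin{bmatrix}
2(0) & -3(0+1)^2 \\
e^0 & e^0
\end{bmatrix}
=
\begin{bmatrix}
0 & -3 \\
1 & 1
\end{bmatrix}
\]
is invertible. Hence near (0, 0, 0) we can find y and z as C^1 functions of x such that for x near 0 we have
\[ x^2 + y(x)^2 - \bigl(z(x)+1\bigr)^3 = -1, \qquad e^x + e^{y(x)} + e^{z(x)} = 3. \]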
The theorem does not tell us how to find y(x) and z(x) explicitly, it just tells us they exist. In other words, near the origin the set
of solutions is a smooth curve in R 3 that goes through the origin.
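Although the theorem gives no formula, the functions y(x) and z(x) can be approximated numerically. The following sketch uses Newton's method in the variables (y, z) for a fixed x, which is a swapped-in numerical technique and not the method of the proof; the function name `solve_yz`, the starting point and the iteration count are illustrative assumptions. It also checks the derivative predicted by g′(p) = −(A_y)^{-1}A_x, which at the origin works out to y′(0) = −1 and z′(0) = 0.

```python
import math

def solve_yz(x, iterations=20):
    """Newton's method in (y, z) for fixed x, started at (0, 0), for the system
    x^2 + y^2 - (z+1)^3 + 1 = 0,  e^x + e^y + e^z - 3 = 0."""
    y, z = 0.0, 0.0
    for _ in range(iterations):
        f1 = x * x + y * y - (z + 1.0) ** 3 + 1.0
        f2 = math.exp(x) + math.exp(y) + math.exp(z) - 3.0
        # Partial derivatives with respect to (y, z): the block called A_y.
        a, b = 2.0 * y, -3.0 * (z + 1.0) ** 2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Solve the 2x2 linear system by Cramer's rule and take the Newton step.
        dy = (f1 * d - b * f2) / det
        dz = (a * f2 - f1 * c) / det
        y, z = y - dy, z - dz
    return y, z

y1, z1 = solve_yz(0.1)        # a point on the solution curve above x = 0.1
step = 1e-4
y_h, z_h = solve_yz(step)     # used for difference quotients at x = 0
```

The difference quotients `y_h / step` and `z_h / step` should be close to −1 and 0 respectively.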
We remark that there are versions of the theorem for arbitrarily many derivatives. If f has k continuous derivatives, then the
solution also has k continuous derivatives.
Exercises
Let C = {(x, y) ∈ R 2 : x 2 + y 2 = 1}.
a) Solve for y in terms of x near (0, 1).
b) Solve for y in terms of x near (0, − 1).
c) Solve for x in terms of y near ( − 1, 0).
Define f : R 2 → R 2 by f(x, y) := (x, y + h(x) ) for some continuously differentiable function h of one variable.
a) Show that f is one-to-one and onto.
b) Compute f ′ .
c) Show that f ′ is invertible at all points, and compute its inverse.
Define f : R 2 → R 2 ∖ {(0, 0)} by f(x, y) := (e xcos(y), e xsin(y) ).
a) Show that f is onto.
b) Show that f ′ is invertible at all points.
c) Show that f is not one-to-one, in fact for every (a, b) ∈ R 2 ∖ {(0, 0)}, there exist infinitely many different points
(x, y) ∈ R 2 such that f(x, y) = (a, b).
Therefore, invertible derivative at every point does not mean that f is invertible globally.
Find a map f : R n → R n that is one-to-one, onto, continuously differentiable, but f ′ (0) = 0. Hint: Generalize f(x) = x 3 from one
to n dimensions.
Consider z 2 + xz + y = 0 in R 3. Find an equation D(x, y) = 0, such that if D(x 0, y 0) ≠ 0 and z 2 + x 0z + y 0 = 0 for some z ∈ R,
then for points near (x 0, y 0) there exist exactly two distinct continuously differentiable functions r 1(x, y) and r 2(x, y) such that
z = r 1(x, y) and z = r 2(x, y) solve z 2 + xz + y = 0. Do you recognize the expression D from algebra?
Suppose f : (a, b) → R^2 is continuously differentiable and ∂f/∂x (t) ≠ 0 for all t ∈ (a, b). Prove that there exists an interval (c, d)
and a continuously differentiable function g : (c, d) → R such that (x, y) ∈ f ((a, b) ) if and only if x ∈ (c, d) and y = g(x). In
other words, the set f ((a, b) ) is a graph of g.
Define f : R 2 → R 2
\[ \frac{\partial^2 f}{\partial x_k \partial x_j} := \frac{\partial \bigl( \frac{\partial f}{\partial x_j} \bigr)}{\partial x_k}. \]
If k = j, then we write ∂^2 f/∂x_j^2 for simplicity.
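We define higher order derivatives inductively. Suppose j_1, j_2, …, j_ℓ are integers between 1 and n, and suppose
\[ \frac{\partial^{\ell-1} f}{\partial x_{j_{\ell-1}} \partial x_{j_{\ell-2}} \cdots \partial x_{j_1}} \]
exists and is differentiable in the variable x_{j_ℓ}, then the partial derivative with respect to that variable is denoted by
\[ \frac{\partial^{\ell} f}{\partial x_{j_\ell} \partial x_{j_{\ell-1}} \cdots \partial x_{j_1}} := \frac{\partial \Bigl( \frac{\partial^{\ell-1} f}{\partial x_{j_{\ell-1}} \partial x_{j_{\ell-2}} \cdots \partial x_{j_1}} \Bigr)}{\partial x_{j_\ell}}. \]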
\[ \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k}. \]
Fix a point p ∈ U, and let e_j and e_k be the standard basis vectors and let s and t be two small nonzero real numbers. We pick s and t small enough so that p + s_0 e_j + t_0 e_k ∈ U for all s_0 and t_0 with |s_0| ≤ |s| and |t_0| ≤ |t|. This is possible since U is open and so contains a small ball (or a box if you wish).
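Using the mean value theorem on the function t ↦ f(p + s e_j + t e_k) − f(p + t e_k), we find a t_0 between 0 and t such that
\[ \frac{f(p+se_j+te_k) - f(p+te_k) - f(p+se_j) + f(p)}{t} = \frac{\partial f}{\partial x_k}(p+se_j+t_0e_k) - \frac{\partial f}{\partial x_k}(p+t_0e_k). \]
Applying the mean value theorem again, now in the x_j direction, we find an s_0 between 0 and s such that
\[ \frac{\frac{\partial f}{\partial x_k}(p+se_j+t_0e_k) - \frac{\partial f}{\partial x_k}(p+t_0e_k)}{s} = \frac{\partial^2 f}{\partial x_j \partial x_k}(p+s_0e_j+t_0e_k). \]
In other words
\[ g(s,t) := \frac{f(p+se_j+te_k) - f(p+te_k) - f(p+se_j) + f(p)}{st} = \frac{\partial^2 f}{\partial x_j \partial x_k}(p+s_0e_j+t_0e_k). \]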
Taking a limit as (s, t) ∈ R^2 goes to zero we find that (s_0, t_0) also goes to zero and by continuity of the second partial derivatives we find that
\[ \lim_{(s,t) \to 0} g(s,t) = \frac{\partial^2 f}{\partial x_j \partial x_k}(p). \]
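We now reverse the ordering. Starting with the function s ↦ f(p + t e_k + s e_j) − f(p + s e_j), we find an s_1 between 0 and s such that
\[ \frac{f(p+te_k+se_j) - f(p+se_j) - f(p+te_k) + f(p)}{s} = \frac{\partial f}{\partial x_j}(p+te_k+s_1e_j) - \frac{\partial f}{\partial x_j}(p+s_1e_j), \]
and then a t_1 between 0 and t such that
\[ \frac{\frac{\partial f}{\partial x_j}(p+te_k+s_1e_j) - \frac{\partial f}{\partial x_j}(p+s_1e_j)}{t} = \frac{\partial^2 f}{\partial x_k \partial x_j}(p+t_1e_k+s_1e_j). \]
Again we find that g(s, t) = ∂^2 f/∂x_k∂x_j (p + t_1 e_k + s_1 e_j) and therefore
\[ \lim_{(s,t) \to 0} g(s,t) = \frac{\partial^2 f}{\partial x_k \partial x_j}(p). \]
And therefore the two partial derivatives are equal.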
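The quantity g(s, t) in this argument is a second-order difference quotient built only from values of f. The sketch below writes it out explicitly for a function of two variables, as (f(p + s e_1 + t e_2) − f(p + t e_2) − f(p + s e_1) + f(p))/(st), which is an assumption about the form used in the proof, and evaluates it for the hypothetical test function x^2 y^3.

```python
def mixed_quotient(f, p, s, t):
    """Second-order difference quotient
    (f(p + s e1 + t e2) - f(p + t e2) - f(p + s e1) + f(p)) / (s t)
    for a function f of two variables."""
    x, y = p
    return (f(x + s, y + t) - f(x, y + t) - f(x + s, y) + f(x, y)) / (s * t)

def sample(x, y):
    """Hypothetical C^2 test function x^2 y^3, whose mixed partial is 6 x y^2."""
    return x * x * y ** 3

approx = mixed_quotient(sample, (1.0, 2.0), 1e-4, 1e-4)
exact = 6.0 * 1.0 * 2.0 ** 2   # value of the mixed partial at (1, 2)
```

The quotient is symmetric in the roles of the two variables, which is the heart of the argument: both orders of differentiation are limits of the same expression.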
Exercises
Suppose f : U → R is a C^2 function for some open U ⊂ R^n and p ∈ U. Use the proof of the symmetry of second partial derivatives above to find an expression in terms of just the values of f (analogue of the difference quotient for the first derivative), whose limit is ∂^2 f/∂x_j∂x_k (p).
Define
\[
f(x,y) := \begin{cases}
\frac{xy(x^2-y^2)}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]
Show that
a) The first order partial derivatives exist and are continuous.
b) The partial derivatives ∂^2 f/∂x∂y and ∂^2 f/∂y∂x exist, but are not continuous at the origin, and ∂^2 f/∂x∂y (0, 0) ≠ ∂^2 f/∂y∂x (0, 0).
Suppose f : U → R is a C k function for some open U ⊂ R n and p ∈ U. Suppose j 1, j 2, …, j k are integers between 1 and n, and
suppose σ = (σ 1, σ 2, …, σ k) is a permutation of (1, 2, …, k). Prove
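\[ \frac{\partial^k f}{\partial x_{j_k} \partial x_{j_{k-1}} \cdots \partial x_{j_1}}(p) = \frac{\partial^k f}{\partial x_{j_{\sigma_k}} \partial x_{j_{\sigma_{k-1}}} \cdots \partial x_{j_{\sigma_1}}}(p). \]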
Suppose φ : R^2 → R is a C^k function such that φ(0, θ) = φ(0, ψ) for all θ, ψ ∈ R and φ(r, θ) = φ(r, θ + 2π) for all r, θ ∈ R. Let F(r, θ) = (r cos(θ), r sin(θ)). Show that the function g : R^2 → R, given by g(x, y) := φ(F^{-1}(x, y)), is well defined (notice that F^{-1}(x, y) can only be defined locally), and when restricted to R^2 ∖ {0} it is a C^k function.
\[ g(y) := \int_a^b f(x,y) \, dx. \]
Suppose f is differentiable in y. The question we ask is when can we “differentiate under the integral”, that is, when is it true
that g is differentiable and its derivative
\[ g'(y) \overset{?}{=} \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx. \]
Differentiation is a limit and therefore we are really asking when do the two limiting operations of integration and
differentiation commute. As we have seen, this is not always possible, some sort of uniformity is necessary. In particular, the
first question we would face is the integrability of ∂f/∂y, but the formula can fail even if ∂f/∂y is integrable for all y.
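Suppose f : [a, b] × [c, d] → R is a continuous function, such that ∂f/∂y exists for all (x, y) ∈ [a, b] × [c, d] and is continuous. Define
\[ g(y) := \int_a^b f(x,y) \, dx. \]
Then g : [c, d] → R is differentiable and
\[ g'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx. \]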
The continuity requirements for f and ∂f/∂y can be weakened, but not dropped outright. The main point is for ∂f/∂y to exist and be continuous for a small interval in the y direction. In applications, the [c, d] can be a small interval around the point where you need to differentiate.
Fix y ∈ [c, d] and let ϵ > 0 be given. As ∂f/∂y is continuous on [a, b] × [c, d] it is uniformly continuous. In particular, there exists δ > 0 such that whenever y_1 ∈ [c, d] with |y_1 − y| < δ and all x ∈ [a, b] we have
\[ \left| \frac{\partial f}{\partial y}(x,y_1) - \frac{\partial f}{\partial y}(x,y) \right| < \epsilon. \]
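Suppose h is such that y + h ∈ [c, d] and |h| < δ. Fix x for a moment and apply the mean value theorem to find a y_1 between y and y + h such that
\[ \frac{f(x,y+h)-f(x,y)}{h} = \frac{\partial f}{\partial y}(x,y_1). \]
As |y_1 − y| ≤ |h| < δ, we get
\[ \left| \frac{f(x,y+h)-f(x,y)}{h} - \frac{\partial f}{\partial y}(x,y) \right| = \left| \frac{\partial f}{\partial y}(x,y_1) - \frac{\partial f}{\partial y}(x,y) \right| < \epsilon. \]
This argument worked for every x ∈ [a, b]. Therefore, as a function of x,
\[ x \mapsto \frac{f(x,y+h)-f(x,y)}{h} \quad \text{converges uniformly to} \quad x \mapsto \frac{\partial f}{\partial y}(x,y) \quad \text{as } h \to 0. \]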
We only defined uniform convergence for sequences although the idea is the same. If you wish you can replace h with 1/n above and let n → ∞.
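Now consider the difference quotient
\[ \frac{g(y+h)-g(y)}{h} = \frac{\int_a^b f(x,y+h) \, dx - \int_a^b f(x,y) \, dx}{h} = \int_a^b \frac{f(x,y+h)-f(x,y)}{h} \, dx. \]
Uniform convergence can be taken underneath the integral and therefore
\[ \lim_{h \to 0} \frac{g(y+h)-g(y)}{h} = \int_a^b \lim_{h \to 0} \frac{f(x,y+h)-f(x,y)}{h} \, dx = \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx. \]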
\[ f'(y) = \int_0^1 -2y\cos(x^2-y^2) \, dx. \]
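A numerical sanity check of this formula, assuming the function in question is the parameter integral ∫_0^1 sin(x^2 − y^2) dx (an assumption: only the derivative formula is shown above, and −2y cos(x^2 − y^2) is the y derivative of that integrand). The sketch compares a central difference quotient of the integral with the integral of the differentiated integrand; the midpoint rule, the point y = 0.7 and the step size are illustrative choices.

```python
import math

def integrate(func, a, b, n=2000):
    """Midpoint rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def F(y):
    """Assumed parameter integral F(y) = int_0^1 sin(x^2 - y^2) dx."""
    return integrate(lambda x: math.sin(x * x - y * y), 0.0, 1.0)

def dF_under_integral(y):
    """Integral over [0, 1] of the y-derivative of the integrand."""
    return integrate(lambda x: -2.0 * y * math.cos(x * x - y * y), 0.0, 1.0)

y0, step = 0.7, 1e-5
difference_quotient = (F(y0 + step) - F(y0 - step)) / (2.0 * step)
under_integral = dF_under_integral(y0)
```

The two numbers agree to many digits, as the theorem predicts for an integrand whose y derivative is continuous.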
Suppose we start with
\[ \int_0^1 \frac{x-1}{\ln(x)} \, dx. \]
The function under the integral extends to be continuous on [0, 1], and hence the integral exists, see exercise below. The trouble is finding it. Introduce a parameter y and define a function:
\[ g(y) := \int_0^1 \frac{x^y-1}{\ln(x)} \, dx. \]
The function (x^y − 1)/ln(x) also extends to a continuous function of x and y for (x, y) ∈ [0, 1] × [0, 1]. Therefore g is a continuous function of y on [0, 1]. In particular, g(0) = 0. For any ϵ > 0, the y derivative of the integrand, x^y, is continuous on [0, 1] × [ϵ, 1]. Therefore, for y > 0 we may differentiate under the integral sign
\[ g'(y) = \int_0^1 \frac{\ln(x) \, x^y}{\ln(x)} \, dx = \int_0^1 x^y \, dx = \frac{1}{y+1}. \]
We need to figure out g(1), knowing g′(y) = 1/(y + 1) and g(0) = 0. By elementary calculus we find g(1) = ∫_0^1 g′(y) dy = ln(2). Therefore
\[ \int_0^1 \frac{x-1}{\ln(x)} \, dx = \ln(2). \]
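The value ln(2) is easy to confirm numerically. A minimal sketch using the midpoint rule, which never evaluates the integrand at the endpoints 0 and 1 where the formula is undefined (the number of subintervals is an arbitrary choice):

```python
import math

def integrand(x):
    """(x - 1) / ln(x) for 0 < x < 1."""
    return (x - 1.0) / math.log(x)

n = 100000
h = 1.0 / n
approx = h * sum(integrand((i + 0.5) * h) for i in range(n))
```

The computed `approx` agrees with `math.log(2.0)` to several decimal places.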
Exercises
Suppose h : R → R is a continuous function. Suppose g : R → R is continuously differentiable and compactly supported. That is, there exists some M > 0 such that g(x) = 0 whenever |x| ≥ M. Define
\[ f(x) := \int_{-\infty}^\infty h(y) g(x-y) \, dy. \]
Compute ∫_0^1 e^{tx} dx. Derive the formula for ∫_0^1 x^n e^x dx not using integration by parts, but by differentiation underneath the integral.
\[ F(y_1,y_2,\ldots,y_n) := \int_0^1 f(x,y_1,y_2,\ldots,y_n) \, dx \]
is continuously differentiable.
Work out the following counterexample: Let
\[
f(x,y) := \begin{cases}
\frac{xy^3}{(x^2+y^2)^2} & \text{if } x \neq 0 \text{ or } y \neq 0, \\
0 & \text{if } x = 0 \text{ and } y = 0.
\end{cases}
\]
a) Prove that for any fixed y the function x ↦ f(x, y) is Riemann integrable on [0, 1] and
\[ g(y) = \int_0^1 f(x,y) \, dx = \frac{y}{2y^2+2}. \]
Therefore g′(y) exists and
\[ g'(y) = \frac{1-y^2}{2(y^2+1)^2}. \]
b) Prove ∂f/∂y exists at all x and y and compute it.
c) Show that for all y
\[ \int_0^1 \frac{\partial f}{\partial y}(x,y) \, dx \]
exists but
\[ g'(0) \neq \int_0^1 \frac{\partial f}{\partial y}(x,0) \, dx. \]
Work out the following counterexample: Let
\[
f(x,y) := \begin{cases}
xy^2 \sin\bigl( \frac{1}{x^3y} \bigr) & \text{if } x \neq 0 \text{ and } y \neq 0, \\
0 & \text{if } x = 0 \text{ or } y = 0.
\end{cases}
\]
a) Prove f is continuous on [0, 1] × [a, b] for any interval [a, b]. Therefore the following function is well defined on [a, b]
\[ g(y) = \int_0^1 f(x,y) \, dx. \]
b) Prove ∂f/∂y exists for all (x, y) in [0, 1] × [a, b], but is not continuous.
c) Show that ∫_0^1 ∂f/∂y (x, y) dx does not exist if y ≠ 0 even if we take improper integrals.
Path integrals
We say γ is a simple path if γ|_{(a,b)} is a one-to-one function. A path γ is a closed path if γ(a) = γ(b), that is, if the path starts and ends at the same point.
Since γ is a function of one variable, we have seen before that treating γ ′(t) as a matrix is equivalent to treating it as a vector
since it is an n × 1 matrix, that is, a column vector. In fact, by an earlier exercise, even the operator norm of γ ′(t) is equal to the
euclidean norm. Therefore, we will write γ′(t) as a vector as is usual, and then γ′(t) is just the vector of the derivatives of its components, so if γ(t) = (γ_1(t), γ_2(t), …, γ_n(t)), then γ′(t) = (γ_1′(t), γ_2′(t), …, γ_n′(t)).
One can often get by with only smooth paths, but for computations, the simplest paths to write down are often piecewise
smooth. Note that a piecewise smooth function (or path) is automatically continuous (exercise).
Generally, it is the direct image γ ([a, b] ) that is what we are interested in, although how we parametrize it with γ is also
important to some degree. We informally talk about a curve, and often we really mean the set γ ([a, b] ), just as before
depending on context.
[mv:example:unitsquarepath] Let γ : [0, 4] → R^2 be defined by
\[
\gamma(t) := \begin{cases}
(t,0) & \text{if } t \in [0,1], \\
(1,t-1) & \text{if } t \in (1,2], \\
(3-t,1) & \text{if } t \in (2,3], \\
(0,4-t) & \text{if } t \in (3,4].
\end{cases}
\]
Then the reader can check that the path is the unit square traversed counterclockwise. We can check that for example γ|_{[1,2]}(t) = (1, t − 1) and therefore (γ|_{[1,2]})′(t) = (0, 1) ≠ 0. It is good to notice at this point that (γ|_{[1,2]})′(1) = (0, 1), (γ|_{[0,1]})′(1) = (1, 0), and γ′(1) does not exist. That is, at the corners γ is of course not differentiable, even though the restrictions are differentiable and the derivative depends on which restriction you take.
The condition that γ ′(t) ≠ 0 means that the image of γ has no “corners” where γ is continuously differentiable. For example,
take the function
\[
\gamma(t) := \begin{cases}
(t^2,0) & \text{if } t < 0, \\
(0,t^2) & \text{if } t \geq 0.
\end{cases}
\]
It is left for the reader to check that γ is continuously differentiable, yet the image
\[ \gamma(\mathbb{R}) = \{ (x,y) \in \mathbb{R}^2 : (x,y) = (s,0) \text{ or } (x,y) = (0,s) \text{ for some } s \geq 0 \} \]
has a “corner” at the origin. And that is because γ′(0) = (0, 0).
More complicated examples with even infinitely many corners exist, see the exercises.
The condition that γ ′(t) ≠ 0 even at the endpoints guarantees not only no corners, but also that the path ends nicely, that is, can
extend a little bit past the endpoints. Again, see the exercises.
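A graph of a continuously differentiable function f : [a, b] → R is a smooth path. That is, define γ : [a, b] → R^2 by
\[ \gamma(t) := \bigl(t, f(t)\bigr). \]
Then γ′(t) = (1, f′(t)), which is never zero. There are other ways of parametrizing the same set. For example, let α : [0, 1] → R^2 be defined by
\[ \alpha(t) := \bigl( (1-t)a+tb, \ f((1-t)a+tb) \bigr). \]
Then α′(t) = (b − a, (b − a)f′((1 − t)a + tb)), which is never zero. Furthermore as sets α([0, 1]) = γ([a, b]) = {(x, y) ∈ R^2 : x ∈ [a, b] and f(x) = y}, which is just the graph of f.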
The last example leads us to a definition.
Let γ : [a, b] → R n be a smooth path and h : [c, d] → [a, b] a continuously differentiable bijective function such that h ′ (t) ≠ 0
for all t ∈ [c, d]. Then the composition γ ∘ h is called a smooth reparametrization of γ.
Let γ be a piecewise smooth path, and h be a piecewise smooth bijective function. Then the composition γ ∘ h is called a
piecewise smooth reparametrization of γ.
If h is strictly increasing, then h is said to preserve orientation. If h does not preserve orientation, then h is said to reverse
orientation.
A reparametrization is another path for the same set. That is, (γ ∘ h) ([c, d] ) = γ ([a, b] ).
Let us remark that for h, piecewise smooth means that there is some partition t_0 = c < t_1 < t_2 < ⋯ < t_k = d, such that h|_{[t_{j−1}, t_j]} is continuously differentiable and (h|_{[t_{j−1}, t_j]})′(t) ≠ 0 for all t ∈ [t_{j−1}, t_j]. Since h is bijective, it is either strictly increasing or strictly decreasing. Therefore either (h|_{[t_{j−1}, t_j]})′(t) > 0 for all t or (h|_{[t_{j−1}, t_j]})′(t) < 0 for all t.
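\[ (\gamma \circ h)|_{[s_{j-1},s_j]}(t) = \gamma|_{[t_{j-1},t_j]}\bigl( h|_{[s_{j-1},s_j]}(t) \bigr). \]
The function (γ ∘ h)|_{[s_{j−1}, s_j]} is therefore continuously differentiable and by the chain rule
\[ \bigl( (\gamma \circ h)|_{[s_{j-1},s_j]} \bigr)'(t) = \bigl( \gamma|_{[t_{j-1},t_j]} \bigr)'\bigl( h(t) \bigr) \, \bigl( h|_{[s_{j-1},s_j]} \bigr)'(t) \neq 0. \]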
We could represent ω as a continuous function from S to R n, although it is better to think of it as a different object.
For example,
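a one-form defined on the direct image γ([a, b]). Let γ = (γ_1, γ_2, …, γ_n) be the components of γ. Define:
\[
\begin{aligned}
\int_\gamma \omega & := \int_a^b \Bigl( \omega_1\bigl(\gamma(t)\bigr) \gamma_1'(t) + \omega_2\bigl(\gamma(t)\bigr) \gamma_2'(t) + \cdots + \omega_n\bigl(\gamma(t)\bigr) \gamma_n'(t) \Bigr) \, dt \\
& = \int_a^b \Bigl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Bigr) \, dt.
\end{aligned}
\]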
If γ is piecewise smooth, take the corresponding partition t_0 = a < t_1 < t_2 < … < t_k = b, where we assume the partition is the minimal one, that is γ is not differentiable at t_1, t_2, …, t_{k−1}. Each γ|_{[t_{j−1}, t_j]} is a smooth path and we define
\[ \int_\gamma \omega := \int_{\gamma|_{[t_0,t_1]}} \omega + \int_{\gamma|_{[t_1,t_2]}} \omega + \cdots + \int_{\gamma|_{[t_{k-1},t_k]}} \omega. \]
The notation makes sense from the formula you remember from calculus. Let us state it somewhat informally: if x_j(t) = γ_j(t), then dx_j = γ_j′(t) dt.
Paths can be cut up or concatenated as follows. The proof is a direct application of the additivity of the Riemann integral, and
is left as an exercise. The proposition also justifies why we defined the integral over a piecewise smooth path in the way we
did, and it further justifies that we may as well have taken any partition not just the minimal one in the definition.
[mv:prop:pathconcat] Let γ : [a, c] → R^n be a piecewise smooth path. For some b ∈ (a, c), define the piecewise smooth paths α = γ|_{[a,b]} and β = γ|_{[b,c]}. For a one-form ω defined on the image of γ we have
\[ \int_\gamma \omega = \int_\alpha \omega + \int_\beta \omega. \]
[example:mv:irrotoneformint] Let the one-form ω and the path γ : [0, 2π] → R^2 be defined by
\[ \omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy, \qquad \gamma(t) := \bigl(\cos(t), \sin(t)\bigr). \]
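\[
\begin{aligned}
\int_\beta \omega & = \int_0^{2\pi} \Bigl( \frac{-\sin(-t)}{(\cos(-t))^2+(\sin(-t))^2} \bigl(\sin(-t)\bigr) + \frac{\cos(-t)}{(\cos(-t))^2+(\sin(-t))^2} \bigl(-\cos(-t)\bigr) \Bigr) \, dt \\
& = \int_0^{2\pi} (-1) \, dt = -2\pi.
\end{aligned}
\]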
Now, α was an orientation preserving reparametrization of γ, and the integral was the same. On the other hand β is an
orientation reversing reparametrization and the integral was minus the original.
The previous example is not a fluke. The path integral does not depend on the parametrization of the curve, the only thing that
matters is the direction in which the curve is traversed.
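The dependence on direction can be seen numerically. The sketch below approximates the path integral of the one-form ω = −y/(x^2 + y^2) dx + x/(x^2 + y^2) dy from the example over the unit circle traversed counterclockwise and clockwise. The helper `path_integral`, which takes the path and its derivative as separate functions and uses the midpoint rule, is a hypothetical construction for illustration.

```python
import math

def path_integral(omega, gamma, dgamma, a, b, n=10000):
    """Midpoint-rule approximation of the integral over [a, b] of
    omega_1(gamma(t)) gamma_1'(t) + omega_2(gamma(t)) gamma_2'(t)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        w1, w2 = omega(*gamma(t))
        d1, d2 = dgamma(t)
        total += (w1 * d1 + w2 * d2) * h
    return total

def omega(x, y):
    """Coefficients of -y/(x^2+y^2) dx + x/(x^2+y^2) dy."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

# Unit circle traversed counterclockwise, and the same circle traversed clockwise.
counterclockwise = path_integral(
    omega,
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
    0.0, 2.0 * math.pi)
clockwise = path_integral(
    omega,
    lambda t: (math.cos(-t), math.sin(-t)),
    lambda t: (math.sin(-t), -math.cos(-t)),
    0.0, 2.0 * math.pi)
```

The two results are approximately 2π and −2π: reversing the direction flips the sign and changes nothing else.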
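\[
\int_{\gamma \circ h} \omega = \begin{cases}
\int_\gamma \omega & \text{if } h \text{ preserves orientation,} \\
-\int_\gamma \omega & \text{if } h \text{ reverses orientation.}
\end{cases}
\]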
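Assume first that γ and h are both smooth. Write the one-form as ω = ω_1 dx_1 + ω_2 dx_2 + ⋯ + ω_n dx_n. Suppose first that h is orientation preserving. Using the definition of the path integral and the change of variables formula for the Riemann integral,
\[
\begin{aligned}
\int_\gamma \omega & = \int_a^b \Bigl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Bigr) \, dt \\
& = \int_c^d \Bigl( \sum_{j=1}^n \omega_j\bigl(\gamma(h(\tau))\bigr) \gamma_j'\bigl(h(\tau)\bigr) \Bigr) h'(\tau) \, d\tau \\
& = \int_c^d \Bigl( \sum_{j=1}^n \omega_j\bigl(\gamma(h(\tau))\bigr) (\gamma_j \circ h)'(\tau) \Bigr) \, d\tau = \int_{\gamma \circ h} \omega.
\end{aligned}
\]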
If h is orientation reversing, it will swap the order of the limits on the integral, introducing a minus sign. The details, along with finishing the proof for piecewise smooth paths, are left to the reader as an exercise.
Due to this proposition (and the exercises), if we have a set Γ ⊂ R^n that is the image of a simple piecewise smooth path γ([a, b]), then once we somehow indicate the orientation, that is, which direction we traverse the curve, in other words where we start and where we finish, we just write
\[ \int_\Gamma \omega, \]
without mentioning the specific γ. Furthermore, for a simple closed path, it does not even matter where we start the
parametrization. See the exercises.
Recall that simple means that γ restricted to (a, b) is one-to-one, that is, it is one-to-one except perhaps at the endpoints. We
also often relax the simple path condition a little bit. For example, it is enough that γ : [a, b] → R^n is one-to-one except at finitely many points. That is, there are only finitely many points p ∈ R^n such that γ^{-1}(p) is more than one point. See the exercises.
The issue about the injectivity problem is illustrated by the following example.
Suppose γ : [0, 2π] → R 2 is given by γ(t) := (cos(t), sin(t) ) and β : [0, 2π] → R 2 is given by β(t) := (cos(2t), sin(2t) ). Notice
that γ ([0, 2π] ) = β ([0, 2π] ), and we travel around the same curve, the unit circle. But γ goes around the unit circle once in the
counter clockwise direction, and β goes around the unit circle twice (in the same direction). Then
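It is sometimes convenient to define a path integral over γ : [a, b] → R^n that is not a path. We define
\[ \int_\gamma \omega := \int_a^b \Bigl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Bigr) \, dt \]
for any γ which is continuously differentiable.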
Suppose γ : [a, b] → R n is a smooth path, and f is a continuous function defined on the image γ ([a, b] ). Then define
\[ \int_\gamma f \, ds := \int_a^b f\bigl(\gamma(t)\bigr) \|\gamma'(t)\| \, dt. \]
The definition for a piecewise smooth path is similar as before and is left to the reader.
The geometric idea of this integral is to find the “area under the graph of a function” as we move around the path γ. The line
integral of a function is also independent of the parametrization, and in this case, the orientation does not matter.
[mv:prop:lineintrepararam] Let γ : [a, b] → R n be a piecewise smooth path and γ ∘ h : [c, d] → R n a piecewise smooth
reparametrization. Suppose f is a continuous function defined on the set γ ([a, b] ). Then
\[ \int_{\gamma \circ h} f \, ds = \int_\gamma f \, ds. \]
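Suppose first that h is orientation preserving and γ and h are both smooth. Then as before
\[
\begin{aligned}
\int_\gamma f \, ds & = \int_a^b f\bigl(\gamma(t)\bigr) \|\gamma'(t)\| \, dt \\
& = \int_c^d f\bigl(\gamma(h(\tau))\bigr) \|\gamma'\bigl(h(\tau)\bigr)\| h'(\tau) \, d\tau \\
& = \int_c^d f\bigl(\gamma(h(\tau))\bigr) \|\gamma'\bigl(h(\tau)\bigr) h'(\tau)\| \, d\tau \\
& = \int_c^d f\bigl((\gamma \circ h)(\tau)\bigr) \|(\gamma \circ h)'(\tau)\| \, d\tau \\
& = \int_{\gamma \circ h} f \, ds.
\end{aligned}
\]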
If h is orientation reversing, it will swap the order of the limits on the integral, but you also have to introduce a minus sign in order to take h′ inside the norm. The details, along with finishing the proof for piecewise smooth paths, are left to the reader as an exercise.
Similarly as before, because of this proposition (and the exercises), if γ is simple, it does not matter which parametrization we
use. Therefore, if Γ = γ ([a, b] ) we can simply write
∫ Γf ds.
In this case we also do not need to worry about orientation, either way we get the same thing.
Let f(x, y) = x. Let C ⊂ R^2 be half of the unit circle for x ≥ 0. We wish to compute
\[ \int_C f \, ds. \]
Parametrize the curve C via γ : [−π/2, π/2] → R^2 defined as γ(t) := (cos(t), sin(t)). Then γ′(t) = (−sin(t), cos(t)), and
\[ \int_C f \, ds = \int_\gamma f \, ds = \int_{-\pi/2}^{\pi/2} \cos(t) \sqrt{\bigl(-\sin(t)\bigr)^2 + \bigl(\cos(t)\bigr)^2} \, dt = \int_{-\pi/2}^{\pi/2} \cos(t) \, dt = 2. \]
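The same computation can be reproduced numerically, which also illustrates that the orientation does not matter for this kind of integral. The helper `line_integral` below is a hypothetical midpoint-rule sketch that takes the path and its derivative as separate functions.

```python
import math

def line_integral(f, gamma, dgamma, a, b, n=10000):
    """Midpoint-rule approximation of int_a^b f(gamma(t)) ||gamma'(t)|| dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += f(*gamma(t)) * math.hypot(*dgamma(t)) * h
    return total

def first_coordinate(x, y):
    """The function f(x, y) = x from the example."""
    return x

# Right half of the unit circle, traversed upward and then downward.
upward = line_integral(
    first_coordinate,
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
    -math.pi / 2, math.pi / 2)
downward = line_integral(
    first_coordinate,
    lambda t: (math.cos(-t), math.sin(-t)),
    lambda t: (math.sin(-t), -math.cos(-t)),
    -math.pi / 2, math.pi / 2)
```

Both values are approximately 2, in agreement with the computation above.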
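Suppose Γ ⊂ R^n is parametrized by a simple piecewise smooth path γ : [a, b] → R^n, that is γ([a, b]) = Γ. Then we define the length by
\[ \ell(\Gamma) := \int_\Gamma ds = \int_\gamma ds. \]
For example, let x, y ∈ R^n be two points and write [x, y] for the straight line segment between them, parametrized by γ(t) := (1 − t)x + ty for t ∈ [0, 1]. Then γ′(t) = y − x and
\[ \ell\bigl([x,y]\bigr) = \int_{[x,y]} ds = \int_0^1 \|y-x\| \, dt = \|y-x\|. \]
So the length of [x, y] is the distance between x and y in the euclidean metric.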
A simple piecewise smooth path γ : [0, r] → R^n is said to be an arc-length parametrization if
\[ \ell\bigl(\gamma([0,t])\bigr) = \int_0^t \|\gamma'(\tau)\| \, d\tau = t. \]
You can think of such a parametrization as moving around your curve at speed 1.
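To see the definition at work, the sketch below approximates the length integral for two hypothetical parametrizations of an arc of the unit circle: γ(t) = (cos t, sin t), which moves at speed 1, and (cos 2t, sin 2t), which moves at speed 2 and so is not an arc-length parametrization. The helper `arc_length` takes only the derivative of the path and uses the midpoint rule.

```python
import math

def arc_length(dgamma, a, b, n=10000):
    """Midpoint-rule approximation of int_a^b ||gamma'(tau)|| d tau."""
    h = (b - a) / n
    return h * sum(math.hypot(*dgamma(a + (i + 0.5) * h)) for i in range(n))

def unit_speed(t):
    """Derivative of gamma(t) = (cos t, sin t)."""
    return (-math.sin(t), math.cos(t))

def double_speed(t):
    """Derivative of gamma(t) = (cos 2t, sin 2t)."""
    return (-2.0 * math.sin(2.0 * t), 2.0 * math.cos(2.0 * t))

length_unit = arc_length(unit_speed, 0.0, 1.5)      # length of gamma([0, 1.5])
length_double = arc_length(double_speed, 0.0, 1.5)  # twice as long an arc
```

For the unit-speed path the length of γ([0, t]) equals t, while for the second path it equals 2t.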
Exercises
Show that if φ : [a, b] → R n is piecewise smooth as we defined it, then φ is a continuous function.
Finish the proof of for orientation reversing reparametrizations.
Prove .
[mv:exercise:pathpiece] Finish the proof of the reparametrization proposition for path integrals of one-forms for a) orientation reversing reparametrizations, and b) piecewise smooth paths and reparametrizations.
[mv:exercise:linepiece] Finish the proof of [mv:prop:lineintrepararam] for a) orientation reversing reparametrizations, and b) piecewise smooth paths and reparametrizations.
Suppose γ : [a, b] → R n is a piecewise smooth path, and f is a continuous function defined on the image γ ([a, b] ). Provide a
definition of ∫ γf ds.
Suppose γ : [0, 1] → R^n is a smooth path, and ω is a one-form defined on the image γ([0, 1]). For r ∈ [0, 1], let γ_r : [0, r] → R^n be defined as simply the restriction of γ to [0, r]. Show that the function h(r) := ∫_{γ_r} ω is a continuously differentiable function on [0, 1].
Suppose γ : [a, b] → R^n is a smooth path. Show that there exists an ϵ > 0 and a smooth function γ̃ : (a − ϵ, b + ϵ) → R^n with γ̃(t) = γ(t) for all t ∈ [a, b] and γ̃′(t) ≠ 0 for all t ∈ (a − ϵ, b + ϵ). That is, prove that a smooth path extends some small distance past the end points.
Suppose α : [a, b] → R^n and β : [c, d] → R^n are piecewise smooth paths such that Γ := α([a, b]) = β([c, d]). Show that there exist finitely many points {p_1, p_2, …, p_k} ⊂ Γ, such that the sets α^{-1}({p_1, p_2, …, p_k}) and β^{-1}({p_1, p_2, …, p_k}) are partitions of [a, b] and [c, d], such that on any subinterval the paths are smooth (that is, they are partitions as in the definition of piecewise smooth path).
a) Suppose γ : [a, b] → R n and α : [c, d] → R n are two smooth paths which are one-to-one and γ ([a, b] ) = α ([c, d] ). Then
there exists a smooth reparametrization h : [a, b] → [c, d] such that γ = α ∘ h. Hint: It should not be hard to find some h. The
trick is to show it is continuously differentiable with a nonvanishing derivative. You will want to apply the implicit function
theorem, and it may at first seem that the dimensions do not work out.
b) Prove the same thing as part a, but now for simple closed paths with the further assumption that γ(a) = γ(b) = α(c) = α(d).
c) Prove parts a) and b) but for piecewise smooth paths, obtaining piecewise smooth reparametrizations. Hint: The trick is to
Suppose α : [a, b] → R n and β : [b, c] → R n are piecewise smooth paths with α(b) = β(b). Let γ : [a, c] → R n be defined by
\[ \gamma(t) := \begin{cases} \alpha(t) & \text{if } t \in [a,b], \\ \beta(t) & \text{if } t \in (b,c]. \end{cases} \]
Show that γ is a piecewise smooth path, and that if ω is a one-form defined on the curve given by γ, then
∫ γω = ∫ αω + ∫ βω.
[mv:exercise:closedcurveintegral] Suppose γ : [a, b] → R n and β : [c, d] → R n are two simple piecewise smooth closed paths.
That is γ(a) = γ(b) and β(c) = β(d) and the restrictions γ | ( a , b ) and β | ( c , d ) are one-to-one. Suppose
Γ = γ ([a, b] ) = β ([c, d] ) and ω is a one-form defined on Γ ⊂ R n. Show that either
∫ γω = ∫ βω, or ∫ γω = − ∫ βω.
In particular, the notation ∫ Γω makes sense if we indicate the direction in which the integral is evaluated. Hint: see previous
three exercises.
[mv:exercise:curveintegral] Suppose γ : [a, b] → R n and β : [c, d] → R n are two piecewise smooth paths which are one-to-one
except at finitely many points. That is, there are at most finitely many points p ∈ R n such that $\gamma^{-1}(p)$ or $\beta^{-1}(p)$ contains more
than one point. Suppose Γ = γ ([a, b] ) = β ([c, d] ) and ω is a one-form defined on Γ ⊂ R n. Show that either
∫ γω = ∫ βω, or ∫ γω = − ∫ βω.
In particular, the notation ∫ Γω makes sense if we indicate the direction in which the integral is evaluated.
Hint: same hint as the last exercise.
Define γ : [0, 1] → R 2 by
\[ \gamma(t) := \Bigl( t^3 \sin(1/t),\; t \bigl( 3t^2 \sin(1/t) - t \cos(1/t) \bigr)^2 \Bigr) \]
for t ≠ 0 and γ(0) = (0, 0).
Show that:
a) γ is continuously differentiable on [0, 1].
b) Show that there exists an infinite sequence {t n} in [0, 1] converging to 0, such that γ ′(t n) = (0, 0).
c) Show that the points γ(t n) lie on the line y = 0 and such that the x-coordinate of γ(t n) alternates between positive and
negative (if they do not alternate you only found a subsequence and you need to find them all).
d) Show that there is no piecewise smooth α whose image equals γ ([0, 1] ). Hint: look at part c) and show that α ′ must be zero
where it reaches the origin.
e) (Computer) if you know a plotting software that allows you to plot parametric curves, make a plot of the curve, but only for
t in the range [0, 0.1] otherwise you will not see the behavior. In particular, you should notice that γ ([0, 1] ) has infinitely many
“corners” near the origin.
Path independence
Note: 2 lectures
Path independent integrals
Let U ⊂ R n be a set and ω a one-form defined on U. The integral of ω is said to be path independent if for any two points
x, y ∈ U and any two piecewise smooth paths γ : [a, b] → U and β : [c, d] → U such that γ(a) = β(c) = x and γ(b) = β(d) = y
we have
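\[ \int_\gamma \omega = \int_\beta \omega . \]
In this case we simply write
\[ \int_x^y \omega := \int_\gamma \omega = \int_\beta \omega . \]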
Not every one-form gives a path independent integral. In fact, most do not.
Let γ : [0, 1] → R 2 be the path γ(t) = (t, 0) going from (0, 0) to (1, 0). Let β : [0, 1] → R 2 be the path β(t) = (t, (1 − t)t ) also
going between the same points. Then
\[ \int_\gamma y \, dx = \int_0^1 \gamma_2(t)\gamma_1'(t) \, dt = \int_0^1 0(1) \, dt = 0 , \]
\[ \int_\beta y \, dx = \int_0^1 \beta_2(t)\beta_1'(t) \, dt = \int_0^1 (1-t)t(1) \, dt = \frac{1}{6} . \]
So the integral of y dx is not path independent. In particular, $\int_{(0,0)}^{(1,0)} y \, dx$ does not make sense.
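The two values can be checked numerically. The sketch below is only an illustration: the function name `integrate_y_dx` and the midpoint-rule discretization are assumptions of this example.

```python
def integrate_y_dx(gamma, gamma_prime, a=0.0, b=1.0, steps=100_000):
    """Midpoint-rule approximation of the integral of y dx along gamma : [a, b] -> R^2."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        _, y = gamma(t)            # second component gamma_2(t)
        dx, _ = gamma_prime(t)     # derivative of the first component gamma_1'(t)
        total += y * dx * h
    return total

# gamma(t) = (t, 0) and beta(t) = (t, (1 - t) t), both from (0, 0) to (1, 0).
along_gamma = integrate_y_dx(lambda t: (t, 0.0), lambda t: (1.0, 0.0))
along_beta = integrate_y_dx(lambda t: (t, (1 - t) * t), lambda t: (1.0, 1 - 2 * t))

print(along_gamma, along_beta)  # roughly 0 and 1/6
```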
Let U ⊂ R n be an open set and f : U → R a continuously differentiable function. Then the one-form
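\[ df := \frac{\partial f}{\partial x_1} \, dx_1 + \frac{\partial f}{\partial x_2} \, dx_2 + \cdots + \frac{\partial f}{\partial x_n} \, dx_n \]
\[ \int_x^y \omega \]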
is path independent (for all x, y ∈ U) if and only if there exists a continuously differentiable f : U → R such that ω = df.
In fact, if such an f exists, then for any two points x, y ∈ U
\[ \int_x^y \omega = f(y) - f(x) . \]
In other words if we fix p ∈ U, then $f(x) = C + \int_p^x \omega$.
First suppose that the integral is path independent. Pick p ∈ U and define
\[ f(x) := \int_p^x \omega . \]
Write $\omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n$. We wish to show that for every j = 1, 2, …, n, the partial derivative $\frac{\partial f}{\partial x_j}$
exists and is equal to $\omega_j$.
Let e j be an arbitrary standard basis vector. Compute
\[ \frac{f(x+he_j) - f(x)}{h} = \frac{1}{h} \left( \int_p^{x+he_j} \omega - \int_p^x \omega \right) = \frac{1}{h} \int_x^{x+he_j} \omega , \]
which follows by path independence, as $\int_p^{x+he_j} \omega = \int_p^x \omega + \int_x^{x+he_j} \omega$, because we could have picked a path from p to $x + he_j$
that also happens to pass through x, and then cut this path in two.
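Take the straight path $\gamma(t) := x + t h e_j$ for $t \in [0,1]$, which lies in U when h is small, so that $\gamma'(t) = h e_j$. Then
\[ \frac{1}{h} \int_x^{x+he_j} \omega = \frac{1}{h} \int_\gamma \omega = \frac{1}{h} \int_0^1 \omega_j(x + t h e_j) h \, dt = \int_0^1 \omega_j(x + t h e_j) \, dt . \]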
We wish to take the limit as h → 0. The function $\omega_j$ is continuous. So given ϵ > 0, h can be small enough so that
$|\omega_j(x) - \omega_j(y)| < \epsilon$, whenever $\|x - y\| \leq |h|$. Therefore, $|\omega_j(x + t h e_j) - \omega_j(x)| < \epsilon$ for all t ∈ [0, 1], and we estimate
\[ \left| \int_0^1 \omega_j(x + t h e_j) \, dt - \omega_j(x) \right| = \left| \int_0^1 \bigl( \omega_j(x + t h e_j) - \omega_j(x) \bigr) \, dt \right| \leq \epsilon . \]
That is,
\[ \lim_{h \to 0} \frac{f(x+he_j) - f(x)}{h} = \omega_j(x) , \]
which is what we wanted; that is, df = ω. As $\omega_j$ are continuous for all j, we find that f has continuous partial derivatives and
therefore is continuously differentiable.
For the other direction suppose f exists such that df = ω. Suppose we take a smooth path γ : [a, b] → U such that γ(a) = x and
γ(b) = y, then
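\[ \begin{aligned}
\int_\gamma df & = \int_a^b \left( \frac{\partial f}{\partial x_1}\bigl(\gamma(t)\bigr) \gamma_1'(t) + \frac{\partial f}{\partial x_2}\bigl(\gamma(t)\bigr) \gamma_2'(t) + \cdots + \frac{\partial f}{\partial x_n}\bigl(\gamma(t)\bigr) \gamma_n'(t) \right) dt \\
& = \int_a^b \frac{d}{dt} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr] \, dt \\
& = f(y) - f(x) .
\end{aligned} \]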
The value of the integral only depends on x and y, not the path taken. Therefore the integral is path independent. We leave
checking this for a piecewise smooth path as an exercise to the reader.
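A small numerical experiment makes the second direction concrete. In the sketch below, the function $f(x,y) = x^2 y + y$, the two paths from (0, 0) to (1, 1), and the helper `line_integral` are all choices made for this illustration, not taken from the text.

```python
def line_integral(omega, gamma, gamma_prime, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of the one-form omega along gamma."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        coefficients = omega(*gamma(t))   # (omega_1, ..., omega_n) at gamma(t)
        velocity = gamma_prime(t)         # gamma'(t)
        total += sum(w * v for w, v in zip(coefficients, velocity)) * h
    return total

def f(x, y):
    return x * x * y + y

def df(x, y):
    # df = (df/dx) dx + (df/dy) dy for the f above
    return (2 * x * y, x * x + 1)

# Two different smooth paths from (0, 0) to (1, 1).
straight = line_integral(df, lambda t: (t, t), lambda t: (1.0, 1.0), 0.0, 1.0)
curved = line_integral(df, lambda t: (t, t ** 3), lambda t: (1.0, 3 * t * t), 0.0, 1.0)

print(straight, curved, f(1, 1) - f(0, 0))  # all three are close to 2
```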
Let U ⊂ R n be a path connected open set and ω a 1-form defined on U. Then ω = df for some continuously differentiable
f : U → R if and only if
∫ γω = 0
for every piecewise smooth closed path γ : [a, b] → U.
Suppose first that ω = df and let γ be a piecewise smooth closed path. Then from above we have that
∫ γω = f (γ(b) ) − f (γ(a) ) = 0,
because γ(a) = γ(b) for a closed path.
Now suppose that for every piecewise smooth closed path γ, ∫ γω = 0. Let x, y be two points in U and let α : [0, 1] → U and
β : [0, 1] → U be two piecewise smooth paths with α(0) = β(0) = x and α(1) = β(1) = y. Then let γ : [0, 2] → U be defined by
\[ \gamma(t) := \begin{cases} \alpha(t) & \text{if } t \in [0,1], \\ \beta(2-t) & \text{if } t \in (1,2]. \end{cases} \]
Let U ⊂ R n be an open set and p ∈ U. We say U is a star shaped domain with respect to p if for any other point x ∈ U, the
line segment between p and x is in U, that is, if (1 − t)p + tx ∈ U for all t ∈ [0, 1]. If we say simply star shaped, then U is star
shaped with respect to some p ∈ U.
Notice the difference between star shaped and convex. A convex domain is star shaped, but a star shaped domain need not be
convex.
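Let U ⊂ R n be a star shaped domain and ω a continuously differentiable one-form defined on U. That is, if
\[ \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n , \]
then $\omega_1, \omega_2, \ldots, \omega_n$ are continuously differentiable functions. Suppose that for every j and k
\[ \frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial x_j} , \]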
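\[ \frac{\partial \omega_j}{\partial x_k} = \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k} = \frac{\partial \omega_k}{\partial x_j} . \]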
The condition is therefore clearly necessary. The lemma says that it is sufficient for a star shaped U.
Suppose U is star shaped with respect to y = (y 1, y 2, …, y n) ∈ U.
Given x = (x 1, x 2, …, x n) ∈ U, define the path γ : [0, 1] → U as γ(t) := (1 − t)y + tx, so γ ′(t) = x − y. Then let
\[ f(x) := \int_\gamma \omega = \int_0^1 \left( \sum_{k=1}^n \omega_k\bigl((1-t)y + tx\bigr) (x_k - y_k) \right) dt . \]
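We differentiate in $x_j$ under the integral. We can do that since everything, including the partials themselves, is continuous.
\[ \begin{aligned}
\frac{\partial f}{\partial x_j}(x) & = \int_0^1 \left( \left( \sum_{k=1}^n \frac{\partial \omega_k}{\partial x_j}\bigl((1-t)y + tx\bigr) t (x_k - y_k) \right) + \omega_j\bigl((1-t)y + tx\bigr) \right) dt \\
& = \int_0^1 \left( \left( \sum_{k=1}^n \frac{\partial \omega_j}{\partial x_k}\bigl((1-t)y + tx\bigr) t (x_k - y_k) \right) + \omega_j\bigl((1-t)y + tx\bigr) \right) dt \\
& = \int_0^1 \frac{d}{dt} \Bigl[ t \omega_j\bigl((1-t)y + tx\bigr) \Bigr] \, dt \\
& = \omega_j(x) .
\end{aligned} \]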
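The construction in the proof can be tried out numerically. The sketch below assumes the sample one-form $\omega = 2 x_1 x_2 \, dx_1 + (x_1^2 + 1) \, dx_2$ on $\mathbb{R}^2$ (star shaped with respect to the origin), builds f by integrating along the segment from y to x, and compares finite-difference partial derivatives of f with the coefficients of ω. The names `omega`, `potential`, and `partial` are hypothetical helpers for this example.

```python
def omega(x1, x2):
    # Sample closed one-form: d(omega_1)/dx2 = 2 x1 = d(omega_2)/dx1.
    return (2 * x1 * x2, x1 * x1 + 1.0)

def potential(x, y=(0.0, 0.0), steps=2_000):
    """f(x) := integral over [0,1] of sum_k omega_k((1-t) y + t x) (x_k - y_k) dt (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        point = tuple((1 - t) * yk + t * xk for xk, yk in zip(x, y))
        w = omega(*point)
        total += sum(wk * (xk - yk) for wk, xk, yk in zip(w, x, y)) * h
    return total

def partial(j, x, step=1e-4):
    """Central finite difference approximation of the j-th partial derivative of potential at x."""
    plus, minus = list(x), list(x)
    plus[j] += step
    minus[j] -= step
    return (potential(tuple(plus)) - potential(tuple(minus))) / (2 * step)

x = (0.7, -0.3)
gradient = (partial(0, x), partial(1, x))
print(gradient, omega(*x))  # the two pairs should nearly agree
```

For this particular ω the integral can also be done by hand and gives $f(x) = x_1^2 x_2 + x_2$.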
\[ \omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]
\[ \frac{\partial}{\partial y} \left[ \frac{-y}{x^2+y^2} \right] = \frac{\partial}{\partial x} \left[ \frac{x}{x^2+y^2} \right] . \]
However, there is no f : R 2 ∖ {0} → R such that df = ω. We saw earlier that if we integrate from (1, 0) to (1, 0) along the unit circle,
that is γ(t) = (cos(t), sin(t) ) for t ∈ [0, 2π], we get 2π and not 0, as it would have to be if the integral were path independent, or in other
words if there existed an f such that df = ω.
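The value 2π is easy to confirm numerically. The following sketch, with the hypothetical helper `circle_integral`, applies a midpoint rule to the parametrization γ(t) = (cos(t), sin(t)).

```python
import math

def omega(x, y):
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def circle_integral(steps=10_000):
    """Integral of omega along gamma(t) = (cos t, sin t), t in [0, 2 pi], by the midpoint rule."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w1, w2 = omega(math.cos(t), math.sin(t))
        # gamma'(t) = (-sin t, cos t)
        total += (w1 * (-math.sin(t)) + w2 * math.cos(t)) * h
    return total

around_origin = circle_integral()
print(around_origin, 2 * math.pi)
```

Along the unit circle the integrand reduces to $\sin^2 t + \cos^2 t = 1$, which is why the result is the length of the parameter interval.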
Vector fields
A common object to integrate is a so-called vector field. That is an assignment of a vector at each point of a domain.
Let U ⊂ R n be a set. A continuous function v : U → R n is called a vector field. Write v = (v 1, v 2, …, v n).
Given a smooth path γ : [a, b] → R n with γ ([a, b] ) ⊂ U we define the path integral of the vector field v as
\[ \int_\gamma v \cdot d\gamma := \int_a^b v\bigl(\gamma(t)\bigr) \cdot \gamma'(t) \, dt , \]
where the dot in the definition is the standard dot product. Again the definition for a piecewise smooth path is done by
integrating over each smooth interval and adding the results.
If we unravel the definition we find that
\[ \int_x^y v \cdot d\gamma \]
is path independent (so for any γ) if and only if v = ∇f, that is the gradient of a function. The function f is then called the
potential for v.
A vector field v whose path integrals are path independent is called a conservative vector field. The naming comes from the
fact that such vector fields arise in physical systems where a certain quantity, the energy, is conserved.
Finish the proof of , that is, we only proved the second direction for a smooth path, not a piecewise smooth path.
Show that a star shaped domain U ⊂ R n is path connected.
Show that U := R 2 ∖ {(x, y) ∈ R 2 : x ≤ 0, y = 0} is star shaped and find all points (x 0, y 0) ∈ U such that U is star shaped with
respect to (x 0, y 0).
Suppose U 1 and U 2 are two open sets in R n with U 1 ∩ U 2 nonempty and connected. Suppose there exists an f 1 : U 1 → R and
f 2 : U 2 → R, both twice continuously differentiable such that df 1 = df 2 on U 1 ∩ U 2. Then there exists a twice differentiable
function F : U 1 ∪ U 2 → R such that dF = df 1 on U 1 and dF = df 2 on U 2.
Let γ : [a, b] → R n be a simple nonclosed piecewise smooth path (so γ is one-to-one). Suppose ω is a continuously
differentiable one-form defined on some open set V with γ ([a, b] ) ⊂ V and $\frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial x_j}$ for all j and k. Prove that there exists
an open set U with γ ([a, b] ) ⊂ U ⊂ V and a twice continuously differentiable function f : U → R such that df = ω.
Hint 1: γ ([a, b] ) is compact.
Hint 2: Show that you can cover the curve by finitely many balls in sequence so that the kth ball only intersects the (k − 1)th
ball.
Hint 3: See previous exercise.
a) Show that a connected open set is path connected. Hint: Start with two points x and y in a connected set U, and let $U_x \subset U$
be the set of points that are reachable by a path from x, and similarly for $U_y$. Show that both sets are open. Since they are
nonempty ($x \in U_x$ and $y \in U_y$), it must be that $U_x = U_y = U$.
b) Prove the converse, that is, a path connected set U ⊂ R n is connected. Hint: for contradiction assume there exist two
disjoint nonempty open sets and then assume there is a piecewise smooth (and therefore continuous) path between a point
in one to a point in the other.
Usually path connectedness is defined using just continuous paths rather than piecewise smooth paths. Prove that the
definitions are equivalent, in other words prove the following statement:
Suppose U ⊂ R n is such that for any x, y ∈ U, there exists a continuous function γ : [a, b] → U such that γ(a) = x and
γ(b) = y. Then U is path connected (in other words, then there exists a piecewise smooth path).
Take
\[ \omega(x,y) = \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]
defined on R 2 ∖ {(0, 0)}. Let γ : [a, b] → R 2 ∖ {(0, 0)} be a closed piecewise smooth path. Let
R := {(x, y) ∈ R 2 : x ≤ 0 and y = 0}. Suppose R ∩ γ ([a, b] ) is a finite set of k points. Then
∫ γω = 2πℓ
for some integer ℓ with |ℓ| ≤ k.
Hint 1: First prove that for a path β that starts and ends on R but does not intersect it otherwise, you find that $\int_\beta \omega$ is −2π, 0, or
2π. Hint 2: You proved above that R 2 ∖ R is star shaped.
Note: The number ℓ is called the winding number; it measures how many times γ winds around the origin in the counterclockwise
direction.
Multivariable integral
\[ [x_{1,j_1-1}, x_{1,j_1}] \times [x_{2,j_2-1}, x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}, x_{n,j_n}] . \]
For simplicity, we order the subrectangles somehow and we say {R 1, R 2, …, R N} are the subrectangles corresponding to the
partition P of R. More simply, we say they are the subrectangles of P. In other words, we subdivided the original rectangle into
many smaller subrectangles. See . It is not difficult to see that these subrectangles cover our original R, and their volume sums
to that of R. That is,
\[ R = \bigcup_{j=1}^N R_j , \qquad \text{and} \qquad V(R) = \sum_{j=1}^N V(R_j) . \]
When
\[ R_k = [x_{1,j_1-1}, x_{1,j_1}] \times [x_{2,j_2-1}, x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}, x_{n,j_n}] , \]
then
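Let R ⊂ R n be a closed rectangle and let f : R → R be a bounded function. Let P be a partition of R and suppose that there
are N subrectangles $R_1, R_2, \ldots, R_N$. Define
\[ \begin{aligned}
m_i & := \inf \{ f(x) : x \in R_i \} , & M_i & := \sup \{ f(x) : x \in R_i \} , \\
L(P,f) & := \sum_{i=1}^N m_i V(R_i) , & U(P,f) & := \sum_{i=1}^N M_i V(R_i) .
\end{aligned} \]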
We call L(P, f) the lower Darboux sum and U(P, f) the upper Darboux sum.
The indexing in the definition may be complicated, but fortunately we generally do not need to go back directly to the
definition often. We start proving facts about the Darboux sums analogous to the one-variable results.
[mv:sumulbound:prop] Suppose R ⊂ R n is a closed rectangle and f : R → R is a bounded function. Let m, M ∈ R be such that
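for all x ∈ R we have m ≤ f(x) ≤ M. For any partition P of R we have
\[ m V(R) \leq L(P,f) \leq U(P,f) \leq M V(R) . \]
Let P be a partition. Then for all i we have $m \leq m_i$ and $M_i \leq M$. Also $m_i \leq M_i$ for all i. Finally $\sum_{i=1}^N V(R_i) = V(R)$. Therefore,
\[ \begin{aligned}
m V(R) = m \left( \sum_{i=1}^N V(R_i) \right) = \sum_{i=1}^N m V(R_i) & \leq \sum_{i=1}^N m_i V(R_i) \\
& \leq \sum_{i=1}^N M_i V(R_i) \leq \sum_{i=1}^N M V(R_i) = M \left( \sum_{i=1}^N V(R_i) \right) = M V(R) .
\end{aligned} \]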
It is not difficult to see that if P̃ is a refinement of P, then subrectangles of P are unions of subrectangles of P̃. Simply put, in a
refinement we take the subrectangles of P, and we cut them into smaller subrectangles. See .
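Let $m_j := \inf \{ f(x) : x \in R_j \}$, and $\tilde{m}_j := \inf \{ f(x) : x \in \tilde{R}_j \}$ as usual. Notice also that if $j \in I_k$, then $m_k \leq \tilde{m}_j$. Then
\[ L(P,f) = \sum_{k=1}^N m_k V(R_k) = \sum_{k=1}^N \sum_{j \in I_k} m_k V(\tilde{R}_j) \leq \sum_{k=1}^N \sum_{j \in I_k} \tilde{m}_j V(\tilde{R}_j) = \sum_{j=1}^{\tilde{N}} \tilde{m}_j V(\tilde{R}_j) = L(\tilde{P},f) . \]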
The key point of this next proposition is that the lower Darboux integral is less than or equal to the upper Darboux integral.
[mv:intulbound:prop] Let R ⊂ R n be a closed rectangle and f : R → R a bounded function. Let m, M ∈ R be such that for all
x ∈ R we have m ≤ f(x) ≤ M. Then
\[ m V(R) \leq \underline{\int_R} f \leq \overline{\int_R} f \leq M V(R) . \]
Taking supremum of L(P, f) and infimum of U(P, f) over all P, we obtain the first and the last inequality.
The key inequality in [mv:intulbound:eq] is the middle one. Let P = (P 1, P 2, …, P n) and Q = (Q 1, Q 2, …, Q n) be partitions of
R. Define P̃ = (P̃ 1, P̃ 2, …, P̃ n) by letting P̃ k = P k ∪ Q k. Then P̃ is a partition of R as can easily be checked, and P̃ is a
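refinement of P and a refinement of Q. By , L(P, f) ≤ L(P̃, f) and U(P̃, f) ≤ U(Q, f). Therefore,
\[ L(P,f) \leq L(\tilde{P},f) \leq U(\tilde{P},f) \leq U(Q,f) . \]
In other words, for two arbitrary partitions P and Q we have L(P, f) ≤ U(Q, f). Via Proposition 1.2.7 from volume I, we obtain
\[ \sup \{ L(P,f) : P \text{ a partition of } R \} \leq \inf \{ U(P,f) : P \text{ a partition of } R \} . \]
In other words $\underline{\int_R} f \leq \overline{\int_R} f$.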
The Riemann integral
We have all we need to define the Riemann integral in n-dimensions over rectangles. Again, the Riemann integral is only
defined on a certain class of functions, called the Riemann integrable functions.
Then f is said to be Riemann integrable, and we sometimes say simply integrable. The set of Riemann integrable functions on
R is denoted by R(R). When f ∈ R(R) we define the Riemann integral
\[ \int_R f := \underline{\int_R} f = \overline{\int_R} f . \]
∫ Rf(x) dA.
implies immediately the following proposition.
[mv:intbound:prop] Let f : R → R be a Riemann integrable function on a closed rectangle R ⊂ R n. Let m, M ∈ R be such that
m ≤ f(x) ≤ M for all x ∈ R. Then
mV(R) ≤ ∫ Rf ≤ M V(R).
A constant function is Riemann integrable. Suppose f(x) = c for all x on R. Then
\[ c V(R) \leq \underline{\int_R} f \leq \overline{\int_R} f \leq c V(R) . \]
∫ Rαf = α∫ Rf.
2. f + g is in R(R) and
∫ R(f + g) = ∫ Rf + ∫ Rg.
Let R ⊂ R n be a closed rectangle, let f and g be in R(R), and suppose f(x) ≤ g(x) for all x ∈ R. Then
∫ Rf ≤ ∫ Rg.
Checking for integrability using the definition often involves the following technique, as in the single variable case.
[mv:prop:upperlowerepsilon] Let R ⊂ R n be a closed rectangle and f : R → R a bounded function. Then f ∈ R (R) if and only
if for every ϵ > 0, there exists a partition P of R such that
\[ U(P,f) - L(P,f) < \epsilon . \]
First, if f is integrable, then clearly the supremum of L(P, f) and infimum of U(P, f) must be equal and hence the infimum of
U(P, f) − L(P, f) is zero. Therefore for every ϵ > 0 there must be some partition P such that U(P, f) − L(P, f) < ϵ.
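For the other direction, given an ϵ > 0 find P such that U(P, f) − L(P, f) < ϵ. Then
\[ \overline{\int_R} f - \underline{\int_R} f \leq U(P,f) - L(P,f) < \epsilon . \]
As $\overline{\int_R} f \geq \underline{\int_R} f$ and the above holds for every ϵ > 0, we conclude $\overline{\int_R} f = \underline{\int_R} f$ and f ∈ R (R).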
For simplicity if f : S → R is a function and R ⊂ S is a closed rectangle, then if the restriction f | R is integrable we say f is
integrable on R, or f ∈ R(R) and we write
∫ Rf := ∫ Rf | R.
[mv:prop:integralsmallerset] For a closed rectangle S ⊂ R n, if f : S → R is integrable and R ⊂ S is a closed rectangle, then f is
integrable over R.
Given ϵ > 0, we find a partition P of S such that U(P, f) − L(P, f) < ϵ. By making a refinement of P if necessary, we assume
that the endpoints of R are in P. In other words, R is a union of subrectangles of P. The subrectangles of P divide into two
collections, ones that are subsets of R and ones whose intersection with the interior of R is empty. Suppose R 1, R 2…, R K are
the subrectangles that are subsets of R and let R K + 1, …, R N be the rest. Let P̃ be the partition of R composed of those
subrectangles of P contained in R. Using the same notation as before,
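\[ \begin{aligned}
\epsilon > U(P,f) - L(P,f) & = \sum_{k=1}^K (M_k - m_k) V(R_k) + \sum_{k=K+1}^N (M_k - m_k) V(R_k) \\
& \geq \sum_{k=1}^K (M_k - m_k) V(R_k) = U(\tilde{P}, f|_R) - L(\tilde{P}, f|_R) .
\end{aligned} \]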
Therefore, f | R is integrable.
‖x − y‖ ≤ √n α.
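\[ \begin{aligned}
\|x - y\| & = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} \\
& \leq \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + \cdots + (b_n - a_n)^2} \\
& \leq \sqrt{\alpha^2 + \alpha^2 + \cdots + \alpha^2} = \sqrt{n} \, \alpha .
\end{aligned} \]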
[mv:thm:contintrect] Let R ⊂ R n be a closed rectangle and f : R → R a continuous function, then f ∈ R (R).
The proof is analogous to the one variable proof with some complications. The set R is a closed and bounded subset of R n, and
hence compact. So f is not just continuous, but in fact uniformly continuous by Theorem 7.5 from volume I. Let ϵ > 0 be
given. Find a δ > 0 such that ‖x − y‖ < δ implies $|f(x) - f(y)| < \frac{\epsilon}{V(R)}$.
\[ f(x) - f(y) \leq |f(x) - f(y)| < \frac{\epsilon}{V(R)} . \]
As f is continuous on R k, it attains a maximum and a minimum on this subrectangle. Let x be a point where f attains the
maximum and y be a point where f attains the minimum. Then f(x) = M k and f(y) = m k in the notation from the definition of
the integral. Therefore,
\[ M_k - m_k = f(x) - f(y) < \frac{\epsilon}{V(R)} . \]
And so
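\[ \begin{aligned}
U(P,f) - L(P,f) & = \left( \sum_{k=1}^N M_k V(R_k) \right) - \left( \sum_{k=1}^N m_k V(R_k) \right) \\
& = \sum_{k=1}^N (M_k - m_k) V(R_k) \\
& < \frac{\epsilon}{V(R)} \sum_{k=1}^N V(R_k) = \epsilon .
\end{aligned} \]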
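To see the upper and lower sums squeeze together in a concrete case, the sketch below computes them for the sample function $f(x,y) = xy$ on $[0,1]^2$ with a uniform partition. The function and the helper `darboux_sums` are assumptions of this illustration. Because this f is increasing in each variable on the unit square, its infimum and supremum on a subrectangle are attained at the lower-left and upper-right corners.

```python
def darboux_sums(n):
    """Lower and upper Darboux sums of f(x, y) = x*y on [0,1]^2 for the uniform n-by-n partition."""
    side = 1.0 / n
    volume = side * side            # V(R_k) for every subrectangle
    lower = 0.0
    upper = 0.0
    for i in range(n):
        for j in range(n):
            m = (i * side) * (j * side)                # inf of f on the subrectangle
            M = ((i + 1) * side) * ((j + 1) * side)    # sup of f on the subrectangle
            lower += m * volume
            upper += M * volume
    return lower, upper

for n in (10, 50, 200):
    lower, upper = darboux_sums(n)
    print(n, lower, upper, upper - lower)

lower_50, upper_50 = darboux_sums(50)
```

For this example one can check by hand that U(P, f) − L(P, f) = 1/n, so the difference can be made smaller than any given ϵ.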
\[ \operatorname{supp}(f) := \overline{\{ x \in U : f(x) \neq 0 \}} , \]
where the closure is with respect to the subspace topology on U. Recall that taking the closure with respect to the subspace
topology is the same as $\overline{\{ x \in U : f(x) \neq 0 \}} \cap U$, now taking the closure with respect to the ambient euclidean space R n. In
particular, supp(f) ⊂ U. That is, the support is the closure (in U) of the set of points where the function is nonzero. Its
complement in U is open. If x ∈ U and x is not in the support of f, then f is constantly zero in a whole neighborhood of x.
A function f is said to have compact support if supp(f) is a compact set.
is continuous on B(0, 1) and its support is the smaller closed ball C(0, 1/2). As that is a compact set, f has compact
support.
Similarly g : B(0, 1) → R defined by
\[ g(x,y) := \begin{cases} 0 & \text{if } x \leq 0, \\ x & \text{if } x > 0, \end{cases} \]
We will mostly consider the case when U = R n. In light of the following exercise, this is not an oversimplification.
Suppose U ⊂ R n is open and f : U → R is continuous and of compact support. Show that the function $\tilde{f} : \mathbb{R}^n \to \mathbb{R}$
\[ \tilde{f}(x) := \begin{cases} f(x) & \text{if } x \in U, \\ 0 & \text{otherwise,} \end{cases} \]
is continuous.
On the other hand for the unit disc B(0, 1) ⊂ R 2, the continuous function f : B(0, 1) → R defined by $f(x,y) := \sin\left( \frac{1}{1 - x^2 - y^2} \right)$
does not have compact support; as f is not constantly zero on a neighborhood of any point in B(0, 1), we know that the support is
the entire disc B(0, 1). The function clearly does not extend as above to a continuous function. In fact it is not difficult to show
that it cannot be extended in any way whatsoever to be continuous on all of R 2 (the boundary of the disc is the problem).
[mv:prop:rectanglessupp] Suppose f : R n → R is a continuous function with compact support. If R and S are closed rectangles
such that supp(f) ⊂ R and supp(f) ⊂ S, then
∫ Sf = ∫ Rf.
As f is continuous, it is automatically integrable on the rectangles R, S, and R ∩ S. Then says ∫ Sf = ∫ S ∩ Rf = ∫ Rf.
Because of this proposition, when f : R n → R has compact support and is integrable over a rectangle R containing the support
we write
∫ f := ∫ Rf or ∫ Rnf := ∫ Rf.
For example, if f is continuous and of compact support, then ∫ R nf exists.
Exercises
Prove .
Suppose R is a rectangle with the length of one of the sides equal to 0. For any bounded function f, show that f ∈ R (R) and
∫ Rf = 0.
[mv:zerosiderectangle] Suppose R is a rectangle with the length of one of the sides equal to 0, and suppose S is a rectangle
with R ⊂ S. If f is a bounded function such that f(x) = 0 for x ∈ R ∖ S, show that f ∈ R(R) and ∫ Rf = 0.
∫ Sh = 0.
Hint: Write h as a sum of functions as in .
[mv:zerooutside] Suppose R and R ′ are two closed rectangles with R ′ ⊂ R. Suppose f : R → R is in R (R ′ ) and f(x) = 0 for
x ∈ R ∖ R ′ . Show that f ∈ R(R) and
∫ R ′ f = ∫ Rf.
Suppose R ′ ⊂ R n and R ″ ⊂ R n are two rectangles such that R = R ′ ∪ R ″ is a rectangle, and R ′ ∩ R ″ is rectangle with one
of the sides having length 0 (that is V(R ′ ∩ R ″ ) = 0). Let f : R → R be a function such that f ∈ R(R ′ ) and f ∈ R(R ″ ). Show
that f ∈ R(R) and
∫ Rf = ∫ R ′ f + ∫ R ″ f.
Hint: see previous exercise.
Prove a stronger version of . Suppose f : R n → R is a function with compact support but not necessarily continuous. Prove that
if R is a closed rectangle such that supp(f) ⊂ R and f is integrable over R, then for any other closed rectangle S with
supp(f) ⊂ S, the function f is integrable over S and ∫ Sf = ∫ Rf. Hint: See .
Suppose R and S are closed rectangles of R n. Define f : R n → R as f(x) := 1 if x ∈ R, and f(x) := 0 otherwise. Prove f is
integrable over S and compute ∫ Sf. Hint: Consider S ∩ R.
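\[ f(x,y) := \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{else.} \end{cases} \]
\[ f(x,y) := \begin{cases} 1 & \text{if } x \in \mathbb{Q} \text{ or } y \in \mathbb{Q}, \\ 0 & \text{else.} \end{cases} \]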
\[ \lim_{j \to \infty} \int_{S_j} f = \int_R f . \]
Suppose f : [−1, 1] × [−1, 1] → R is a Riemann integrable function such that f(x) = −f(−x). Using the definition prove
∫ [ − 1 , 1 ] × [ − 1 , 1 ] f = 0.
Iterated integrals and Fubini theorem
Note: 1–2 lectures
The Riemann integral in several variables is hard to compute from the definition. For one-dimensional Riemann integral we
have the fundamental theorem of calculus and we can compute many integrals without having to appeal to the definition of the
integral. We will rewrite a Riemann integral in several variables into several one-dimensional Riemann integrals by iterating.
However, if f : [0, 1] 2 → R is a Riemann integrable function, it is not immediately clear if the three expressions
\[ f(x,y) := \begin{cases} 1 & \text{if } x = 1/2 \text{ and } y \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases} \]
\[ \int_0^1 f(1/2, y) \, dy \]
does not exist, so we cannot even write $\int_0^1 \int_0^1 f(x,y) \, dy \, dx$.
Proof: Let us start with integrability of f. We simply take the partition of [0, 1] 2 where the partition in the x direction is
{0, 1/2 − ϵ, 1/2 + ϵ, 1} and in the y direction {0, 1}. The subrectangles of the partition are
\[ R_1 := [0, 1/2 - \epsilon] \times [0,1], \qquad R_2 := [1/2 - \epsilon, 1/2 + \epsilon] \times [0,1], \qquad R_3 := [1/2 + \epsilon, 1] \times [0,1] . \]
and
The upper and lower sum are arbitrarily close and the lower sum is always zero, so the function is integrable and ∫ Rf = 0.
For any y, the function that takes x to f(x, y) is zero except perhaps at a single point x = 1/2. We know that such a
function is integrable and $\int_0^1 f(x,y) \, dx = 0$. Therefore, $\int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0$.
However if x = 1/2, the function that takes y to f(1/2, y) is the nonintegrable function that is 1 on the rationals
and 0 on the irrationals. See Example 5.1.4 from volume I.
We will solve this problem of undefined inside integrals by using the upper and lower integrals, which are always defined.
We split the coordinates of R n + m into two parts. That is, we write the coordinates on R n + m = R n × R m as (x, y) where x ∈ R n
and y ∈ R m. For a function f(x, y) we write
\[ f_x(y) := f(x,y) , \qquad f^y(x) := f(x,y) . \]
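\[ g(x) := \underline{\int_S} f_x \qquad \text{and} \qquad h(x) := \overline{\int_S} f_x \]
are integrable on R and
\[ \int_R g = \int_R h = \int_{R \times S} f . \]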
In other words
\[ \int_{R \times S} f = \int_R \left( \underline{\int_S} f(x,y) \, dy \right) dx = \int_R \left( \overline{\int_S} f(x,y) \, dy \right) dx . \]
If it turns out that f x is integrable for all x, for example when f is continuous, then we obtain the more familiar
∫ R × Sf = ∫ R∫ Sf(x, y) dy dx.
Any partition of R × S is a concatenation of a partition of R and a partition of S. That is, write a partition of R × S as
(P, P ′ ) = (P 1, P 2, …, P n, P 1′ , P 2′ , …, P m′ ), where P = (P 1, P 2, …, P n) and P ′ = (P 1′ , P 2′ , …, P m′ ) are partitions of R and S
respectively. Let $R_1, R_2, \ldots, R_N$ be the subrectangles of P and $R_1', R_2', \ldots, R_K'$ be the subrectangles of P ′ . Then the
subrectangles of (P, P ′ ) are $R_j \times R_k'$ where 1 ≤ j ≤ N and 1 ≤ k ≤ K.
Let
We notice that $V(R_j \times R_k') = V(R_j) V(R_k')$ and hence
( )
N K N K
If we let
K K
We thus obtain
( )
N
and we can make the right hand side arbitrarily small. As for any partition we have L ((P, P ′ ), f ) ≤ L(P, g) ≤ U ((P, P ′ ), f ) we
must have that ∫ Rg = ∫ R × Sf.
Similarly we have
and hence
We can also do the iterated integration in opposite order. The proof of this version is almost identical to version A, and we
leave it as an exercise to the reader.
[mv:fubinivB] Let R × S ⊂ R n × R m be a closed rectangle and f : R × S → R be integrable. The functions g : S → R and
h : S → R defined by
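\[ g(y) := \underline{\int_R} f^y \qquad \text{and} \qquad h(y) := \overline{\int_R} f^y \]
are integrable on S and
\[ \int_S g = \int_S h = \int_{R \times S} f . \]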
That is we also have
\[ \int_{R \times S} f = \int_S \left( \underline{\int_R} f(x,y) \, dx \right) dy = \int_S \left( \overline{\int_R} f(x,y) \, dx \right) dy . \]
Next suppose that f x and f y are integrable for simplicity. For example, suppose that f is continuous. Then by putting the two
versions together we obtain the familiar
\[ \int_R f = \int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy . \]
And the Fubini theorem is commonly thought of as the theorem that allows us to swap the order of iterated integrals.
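As a sanity check of swapping the order, the sketch below evaluates both iterated integrals numerically for the sample continuous function $f(x,y) = x y^2$ on $[0,1] \times [0,2]$, whose integral is $\frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$. The function, the rectangle, and the helper `midpoint` are choices made for this illustration.

```python
def midpoint(g, a, b, steps=400):
    """One-dimensional midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

def f(x, y):
    return x * y * y

# Integrate in y first (inner integral), then in x.
dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)

# Integrate in x first (inner integral), then in y.
dx_then_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)

print(dy_then_dx, dx_then_dy, 4.0 / 3.0)
```

Both orders agree up to the discretization error of the midpoint rule, as the theorem predicts for a continuous f.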
\[ \int_R f = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1, x_2, \ldots, x_n) \, dx_n \, dx_{n-1} \cdots dx_1 . \]
Clearly we can also switch the order of integration to any order we please. We can also relax the continuity requirement by
making sure that all the intermediate functions are integrable, or by using upper or lower integrals.
Exercises
Compute $\int_0^1 \int_{-1}^1 x e^{xy} \, dx \, dy$ in a simple way.
∫ Rf = 0.
Let R = [a, b] × [c, d] and f(x, y) := g(x)h(y) for two continuous functions g : [a, b] → R and h : [c, d] → R. Prove
\[ \int_R f = \left( \int_a^b g \right) \left( \int_c^d h \right) . \]
Compute
\[ \int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dx \, dy \qquad \text{and} \qquad \int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dy \, dx . \]
You will need to interpret the integrals as improper, that is, the limit of $\int_\epsilon^1$ as ϵ → 0.
Suppose f(x, y) := g(x) where g : [a, b] → R is Riemann integrable. Show that f is Riemann integrable for any R = [a, b] × [c, d]
and
\[ \int_R f = (d - c) \int_a^b g . \]
Define f : [ − 1, 1] × [0, 1] → R by
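\[ f(x,y) := \begin{cases} x & \text{if } y \in \mathbb{Q}, \\ 0 & \text{else.} \end{cases} \]
Show
a) $\int_0^1 \int_{-1}^1 f(x,y) \, dx \, dy$ exists, but $\int_{-1}^1 \int_0^1 f(x,y) \, dy \, dx$ does not.
b) Compute $\int_{-1}^1 \overline{\int_0^1} f(x,y) \, dy \, dx$ and $\int_{-1}^1 \underline{\int_0^1} f(x,y) \, dy \, dx$.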
c) Show f is not Riemann integrable on [ − 1, 1] × [0, 1] (use Fubini).
Define f : [0, 1] × [0, 1] → R by
\[ f(x,y) := \begin{cases} 1/q & \text{if } x \in \mathbb{Q},\ y \in \mathbb{Q}, \text{ and } y = p/q \text{ in lowest terms,} \\ 0 & \text{else.} \end{cases} \]
where the infimum is taken over all sequences $\{R_j\}$ of open rectangles such that $S \subset \bigcup_{j=1}^\infty R_j$. In particular, S is of measure
zero or a null set if $m^*(S) = 0$.
The theory of measures on R n is a very complicated subject. We will only require measure-zero sets and so we focus on these.
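The set S is of measure zero if for every ϵ > 0 there exists a sequence of open rectangles $\{R_j\}$ such that
\[ S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \epsilon . \]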
Furthermore, if S is measure zero and S ′ ⊂ S, then S ′ is of measure zero. We can in fact use the same exact rectangles.
It is sometimes more convenient to use balls instead of rectangles. In fact we can choose balls no bigger than a fixed radius.
[mv:prop:ballsnull] Let δ > 0 be given. A set S ⊂ R n is measure zero if and only if for every ϵ > 0, there exists a sequence of
open balls $\{B_j\}$, where the radius of $B_j$ is $r_j < \delta$ such that
\[ S \subset \bigcup_{j=1}^\infty B_j \qquad \text{and} \qquad \sum_{j=1}^\infty r_j^n < \epsilon . \]
If R is a (closed or open) cube (rectangle with all sides equal) of side s, then R is contained in a closed ball of radius √n s by ,
and therefore in an open ball of size 2√n s.
Let s be a number that is less than the smallest side of R and also so that 2√n s < δ. We claim R is contained in a union of
closed cubes C 1, C 2, …, C k of sides s such that
\[ \sum_{j=1}^k V(C_j) \leq 2^n V(R) . \]
\[ \sum_{k=1}^\infty s_k^n = \sum_{k=1}^\infty V(C_k) \leq 2^n \sum_{j=1}^\infty V(R_j) < 2^n \epsilon . \]
\[ \sum_{k=1}^\infty r_k^n < 2^{2n} n^{n/2} \epsilon . \]
∞ ∞ ∞
The definition of outer measure could have been done with open balls as well, not just null sets. We leave this generalization to
the reader.
Examples and basic properties
The set Q n ⊂ R n of points with rational coordinates is a set of measure zero.
Proof: The set Q n is countable and therefore let us write it as a sequence q 1, q 2, …. For each q j find an open rectangle R j with
q j ∈ R j and V(R j) < ϵ2 − j. Then
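\[ \mathbb{Q}^n \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \sum_{j=1}^\infty \epsilon 2^{-j} = \epsilon . \]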
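The covering argument can be imitated on a computer for finitely many points. The sketch below works in dimension n = 1, restricts to rationals in [0, 1], and uses exact arithmetic with `Fraction`; the enumeration order and the helper names `first_rationals` and `cover` are assumptions of this example. The j-th point gets an open interval of length $\epsilon 2^{-(j+1)}$, which is less than $\epsilon 2^{-j}$ as the proof requires.

```python
from fractions import Fraction

def first_rationals(count):
    """First `count` distinct rationals p/q in [0, 1], listed by increasing denominator q."""
    seen = set()
    listed = []
    q = 1
    while len(listed) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen and len(listed) < count:
                seen.add(r)
                listed.append(r)
        q += 1
    return listed

def cover(points, eps):
    """Open interval around the j-th point with length eps * 2^{-(j+1)} < eps * 2^{-j}."""
    intervals = []
    for j, point in enumerate(points, start=1):
        length = eps / 2 ** (j + 1)
        intervals.append((point - length / 2, point + length / 2))
    return intervals

eps = Fraction(1, 100)
points = first_rationals(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)

print(float(total_length), float(eps))  # the total length stays below eps
```

No matter how many points are listed, the total length is bounded by the geometric series, so it never reaches ϵ.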
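\[ S = \bigcup_{j=1}^\infty S_j , \]
where $S_j$ are all measure zero sets. Let ϵ > 0 be given. For each j there exists a sequence of open rectangles $\{R_{j,k}\}_{k=1}^\infty$ such
that
\[ S_j \subset \bigcup_{k=1}^\infty R_{j,k} \]
and
\[ \sum_{k=1}^\infty V(R_{j,k}) < 2^{-j} \epsilon . \]
Then
\[ S \subset \bigcup_{j=1}^\infty \bigcup_{k=1}^\infty R_{j,k} . \]
As $V(R_{j,k})$ is always positive, the sum over all j and k can be done in any order. In particular, it can be done as
\[ \sum_{j=1}^\infty \sum_{k=1}^\infty V(R_{j,k}) < \sum_{j=1}^\infty 2^{-j} \epsilon = \epsilon . \]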
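\[ P_s := \{ x \in \mathbb{R}^n : x_k = c, \ |x_j| \leq s \text{ for all } j \neq k \} \]
\[ R := \{ x \in \mathbb{R}^n : c - \epsilon < x_k < c + \epsilon, \ |x_j| < s + 1 \text{ for all } j \neq k \} . \]
\[ V(R) = 2 \epsilon \bigl( 2(s+1) \bigr)^{n-1} . \]
\[ P = \bigcup_{j=1}^\infty P_j \]
\[ [a,b] \subset \bigcup_{j=1}^\infty (a_j, b_j) . \]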
We wish to bound ∑ (b j − a j) from below. Since [a, b] is compact, then there are only finitely many open intervals that still
cover [a, b]. As throwing out some of the intervals only makes the sum smaller, we only need to take the finite number of
intervals still covering [a, b]. If (a i, b i) ⊂ (a j, b j), then we can throw out (a i, b i) as well. Therefore we have
k k−1
[mv:prop:compactnull] Suppose E ⊂ R n is a compact set of measure zero. Then for every ϵ > 0, there exist finitely many
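open rectangles $R_1, R_2, \ldots, R_k$ such that
\[ E \subset R_1 \cup R_2 \cup \cdots \cup R_k \qquad \text{and} \qquad \sum_{j=1}^k V(R_j) < \epsilon . \]
Also for any δ > 0, there exist finitely many open balls $B_1, B_2, \ldots, B_k$ of radii $r_1, r_2, \ldots, r_k < \delta$ such that
\[ E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \epsilon . \]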
∞ ∞
By compactness, there are finitely many of these rectangles that still contain E. That is, there is some k such that
E ⊂ R 1 ∪ R 2 ∪ ⋯ ∪ R k. Hence
\[ \sum_{j=1}^k V(R_j) \leq \sum_{j=1}^\infty V(R_j) < \epsilon . \]
The proof that we can choose balls instead of rectangles is left as an exercise.
[example:cantor] So that the reader is not under the impression that there are only very few measure zero sets and that these
are simple, let us give an uncountable, compact, measure zero subset in [0, 1]. For any x ∈ [0, 1] write the representation in
ternary notation
\[ x = \sum_{j=1}^\infty d_j 3^{-j} . \]
See §1.5 in volume I, in particular Exercise 1.5.4. Define the Cantor set C as
\[ C := \Bigl\{ x \in [0,1] : x = \sum_{j=1}^\infty d_j 3^{-j}, \text{ where } d_j = 0 \text{ or } d_j = 2 \text{ for all } j \Bigr\} . \]
That is, x is in C if it has a ternary expansion in only 0’s and 2’s. If x has two expansions, as long as one of them does not have
any 1’s, then x is in C. Define C 0 := [0, 1] and
\[ C_k := \Bigl\{ x \in [0,1] : x = \sum_{j=1}^\infty d_j 3^{-j}, \text{ where } d_j = 0 \text{ or } d_j = 2 \text{ for all } j = 1, 2, \ldots, k \Bigr\} . \]
\[ C = \bigcap_{k=1}^\infty C_k . \]
3. Furthermore, $m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^n} = \left( \frac{2}{3} \right)^k$.
4. Hence, m ∗ (C) = 0.
5. The set C is in one to one correspondence with [0, 1], in other words, uncountable.
See .
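The sets $C_k$ can be generated explicitly as unions of closed intervals by repeatedly removing open middle thirds. The sketch below uses exact rational arithmetic; the helper names `cantor_stage` and `total_length` are assumptions of this illustration, and the total length of the intervals is used as a stand-in for the outer measure of $C_k$.

```python
from fractions import Fraction

def cantor_stage(k):
    """Closed intervals making up C_k: start from [0, 1] and remove open middle thirds k times."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # left third: next ternary digit 0
            next_intervals.append((b - third, b))   # right third: next ternary digit 2
        intervals = next_intervals
    return intervals

def total_length(intervals):
    return sum(b - a for a, b in intervals)

for k in range(6):
    stage = cantor_stage(k)
    print(k, len(stage), total_length(stage))  # 2^k intervals of total length (2/3)^k

stage_3 = cantor_stage(3)
```

The total length $(2/3)^k$ tends to zero, which is the computation behind $m^*(C) = 0$.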
Images of null sets
Before we look at images of measure zero sets, let us see what a continuously differentiable function does to a ball.
[lemma:ballmapder] Suppose U ⊂ R n is an open set, B ⊂ U is an open or closed ball of radius at most r, f : B → R n is
continuously differentiable and suppose ‖f ′ (x)‖ ≤ M for all x ∈ B. Then f(B) ⊂ B ′ , where B ′ is a ball of radius at most Mr.
Without loss of generality assume B is a closed ball. The ball B is convex, and hence via , that ‖f(x) − f(y)‖ ≤ M‖x − y‖ for all
x, y in B. In particular, suppose B = C(y, r), then f(B) ⊂ C (f(y), Mr ).
The image of a measure zero set using a continuous map is not necessarily a measure zero set. However if we assume the
mapping is continuously differentiable, then the mapping cannot “stretch” the set too much.
First let us replace U by a smaller open set to make ‖f ′ (x)‖ bounded. At each point x ∈ E pick an open ball B(x, r x) such that
the closed ball C(x, r x) ⊂ U. By compactness we only need to take finitely many points x 1, x 2, …, x q to still cover E. Define
\[ U' := \bigcup_{j=1}^q B(x_j, r_{x_j}), \qquad K := \bigcup_{j=1}^q C(x_j, r_{x_j}). \]
We have E ⊂ U ′ ⊂ K ⊂ U. The set K is compact. The function that takes x to ‖f ′ (x)‖ is continuous, and therefore there
exists an M > 0 such that ‖f ′ (x)‖ ≤ M for all x ∈ K. So without loss of generality we may replace U by U ′ and from now on
suppose that ‖f ′ (x)‖ ≤ M for all x ∈ U.
At each point $x \in E$ pick a ball $B(x, \delta_x)$ of maximum radius so that $B(x, \delta_x) \subset U$. Let $\delta = \inf_{x \in E} \delta_x$. Take a sequence $\{x_j\} \subset E$ so that $\delta_{x_j} \to \delta$. As $E$ is compact, we can pick the sequence to be convergent to some $y \in E$. Once $\|x_j - y\| < \delta_y/2$, then $\delta_{x_j} > \delta_y/2$ by the triangle inequality. Therefore $\delta > 0$.
\[ E \subset B_1 \cup B_2 \cup \cdots \cup B_k \quad \text{and} \quad \sum_{j=1}^k r_j^n < \epsilon. \]
Suppose B 1′ , B 2′ , …, B k′ are the balls of radius Mr 1, Mr 2, …, Mr k from , such that f(B j) ⊂ B j′ for all j.
Exercises
Finish the proof of , that is, show that you can use balls instead of rectangles.
If A ⊂ B, then m ∗ (A) ≤ m ∗ (B).
Suppose X ⊂ R n is a set such that for every ϵ > 0 there exists a set Y such that X ⊂ Y and m ∗ (Y) ≤ ϵ. Prove that X is a
measure zero set.
Show that if R ⊂ R n is a closed rectangle, then m ∗ (R) = V(R).
The closure of a measure zero set can be quite large. Find an example set S ⊂ R n that is of measure zero, but whose closure
¯
S = R n.
Prove the general case of without using compactness:
a) Mimic the proof to first prove that the proposition holds if E is relatively compact; a set E ⊂ U is relatively compact if the
closure of E in the subspace topology on U is compact, or in other words if there exists a compact set K with K ⊂ U and
E ⊂ K.
Hint: The bound on the size of the derivative still holds, but you need to use countably many balls in the second part of the
proof. Be careful as the closure of E need no longer be measure zero.
b) Now prove it for any null set E.
Hint: First show that $\{ x \in U : d(x,y) \geq 1/m \text{ for all } y \notin U \text{ and } d(0,x) \leq m \}$ is a compact set for any $m > 0$.
Let U ⊂ R n be an open set and let f : U → R be a continuously differentiable function. Let G := {(x, y) ∈ U × R : y = f(x)} be
the graph of f. Show that G is of measure zero.
Given a closed rectangle $R \subset \mathbb{R}^n$, show that for any $\epsilon > 0$ there exists a number $s > 0$ and finitely many open cubes $C_1, C_2, \ldots, C_k$ of side $s$ such that $R \subset C_1 \cup C_2 \cup \cdots \cup C_k$ and
\[ \sum_{j=1}^k V(C_j) \leq V(R) + \epsilon. \]
Show that there exists a number k = k(n, r, δ) depending only on n, r and δ such that the following holds. Given B(x, r) ⊂ R n and
δ > 0, there exist k open balls B 1, B 2, …, B k of radius at most δ such that B(x, r) ⊂ B 1 ∪ B 2 ∪ ⋯ ∪ B k. Note that you can
find k that really only depends on n and the ratio δ/r.
Prove the statements of . That is, prove:
a) Each C k is a finite union of closed intervals, and so C is closed.
b) $m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^n} = (2/3)^k$.
c) m ∗ (C) = 0.
d) The set C is in one to one correspondence with [0, 1].
That is, o(f, x, δ) is the length of the smallest interval that contains the image f (B S(x, δ) ). Clearly o(f, x, δ) ≥ 0 and notice
o(f, x, δ) ≤ o(f, x, δ ′ ) whenever δ < δ ′ . Therefore, the limit as δ → 0 from the right exists and we define the oscillation of a
function f at x as
Hence, o(x, f) = 0.
On the other hand suppose that o(x, f) = 0. Given any ϵ > 0, find a δ > 0 such that o(f, x, δ) < ϵ. If y ∈ B S(x, δ), then
o(f, x, δ) < ϵ
Take any $\xi \in B_S(x, \delta/2)$. Notice that $B_S(\xi, \delta/2) \subset B_S(x, \delta)$. Therefore,
\[ o(f, \xi, \delta/2) = \sup_{y_1, y_2 \in B_S(\xi, \delta/2)} \bigl( f(y_1) - f(y_2) \bigr) \leq \sup_{y_1, y_2 \in B_S(x, \delta)} \bigl( f(y_1) - f(y_2) \bigr) = o(f, x, \delta) < \epsilon. \]
So $o(f, \xi) < \epsilon$ as well. As this is true for all $\xi \in B_S(x, \delta/2)$ we get that $G$ is open in the subset topology and $S \setminus G$ is closed as is claimed.
The set of Riemann integrable functions
We have seen that continuous functions are Riemann integrable, but we also know that certain kinds of discontinuities are
allowed. It turns out that as long as the discontinuities happen on a set of measure zero, the function is integrable and vice
versa.
Let R ⊂ R n be a closed rectangle and f : R → R a bounded function. Then f is Riemann integrable if and only if the set of
discontinuities of f is of measure zero (a null set).
S ϵ := {x ∈ R : o(f, x) ≥ ϵ}.
By S ϵ is closed and as it is a subset of R, which is bounded, S ϵ is compact. Furthermore, S ϵ ⊂ S and S is of measure zero. Via
there are finitely many open rectangles O 1, O 2, …, O k that cover S ϵ and ∑ V(O j) < ϵ.
The set T = R ∖ (O 1 ∪ ⋯ ∪ O k) is closed, bounded, and therefore compact. Furthermore for x ∈ T, we have o(f, x) < ϵ.
Hence for each $x \in T$, there exists a small closed rectangle $T_x$ with $x$ in the interior of $T_x$, such that
\[ \sup_{y \in T_x} f(y) - \inf_{y \in T_x} f(y) < 2\epsilon. \]
The interiors of the rectangles $T_x$ cover $T$. As $T$ is compact there exist finitely many such rectangles $T_1, T_2, \ldots, T_m$ that cover $T$.
Take the rectangles T 1, T 2, …, T m and O 1, O 2, …, O k and construct a partition out of their endpoints. That is construct a
partition P of R with subrectangles R 1, R 2, …, R p such that every R j is contained in T ℓ for some ℓ or the closure of O ℓ for some
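ℓ. Order the rectangles so that $R_1, R_2, \ldots, R_q$ are those that are contained in some $T_\ell$, and $R_{q+1}, R_{q+2}, \ldots, R_p$ are the rest. In particular,
\[ \sum_{j=1}^q V(R_j) \leq V(R) \quad \text{and} \quad \sum_{j=q+1}^p V(R_j) \leq \epsilon. \]
Let $m_j$ and $M_j$ be the inf and sup of $f$ over $R_j$ as before. If $R_j \subset T_\ell$ for some $\ell$, then $M_j - m_j < 2\epsilon$. Let $B \in \mathbb{R}$ be such that $|f(x)| \leq B$ for all $x \in R$, so $M_j - m_j \leq 2B$ over all rectangles. Then
\[ \begin{aligned}
U(P,f) - L(P,f) & = \sum_{j=1}^q (M_j - m_j) V(R_j) + \sum_{j=q+1}^p (M_j - m_j) V(R_j) \\
& \leq \sum_{j=1}^q 2\epsilon V(R_j) + \sum_{j=q+1}^p 2B V(R_j) \\
& \leq 2\epsilon V(R) + 2B\epsilon .
\end{aligned} \]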
Clearly, we can make the right hand side as small as we want and hence f is integrable.
For the other direction, suppose f is Riemann integrable over R. Let S be the set of discontinuities again and now let
\[ S_k := \{ x \in R : o(f,x) \geq 1/k \}. \]
Suppose R 1, R 2, …, R p are ordered so that the interiors of R 1, R 2, …, R q intersect S k, while the interiors of R q + 1, R q + 2, …, R p
are disjoint from $S_k$. If $x \in R_j \cap S_k$ and $x$ is in the interior of $R_j$, so that sufficiently small balls around $x$ are completely inside $R_j$, then by definition of $S_k$ we have $M_j - m_j \geq 1/k$. Then
\[ \epsilon > \sum_{j=1}^p (M_j - m_j) V(R_j) \geq \sum_{j=1}^q (M_j - m_j) V(R_j) \geq \frac{1}{k} \sum_{j=1}^q V(R_j) . \]
In other words $\sum_{j=1}^q V(R_j) < k\epsilon$. Let $G$ be the set of all boundaries of all the subrectangles of $P$. The set $G$ is of measure zero (see ). Let $R_j^\circ$ denote the interior of $R_j$, then
\[ S_k \subset R_1^\circ \cup R_2^\circ \cup \cdots \cup R_q^\circ \cup G. \]
As $G$ can be covered by open rectangles of arbitrarily small total volume, and $\epsilon > 0$ was arbitrary, $S_k$ must be of measure zero. As
\[ S = \bigcup_{k=1}^\infty S_k \]
and a countable union of measure zero sets is of measure zero, S is of measure zero.
Exercises
Suppose f : (a, b) × (c, d) → R is a bounded continuous function. Show that the integral of f over R = [a, b] × [c, d] makes
sense and is uniquely defined. That is, set f to be anything on the boundary of R and compute the integral.
Suppose R ⊂ R n is a closed rectangle. Show that R(R), the set of Riemann integrable functions, is an algebra. That is, show
that if f, g ∈ R(R) and a ∈ R, then af ∈ R(R), f + g ∈ R(R) and fg ∈ R(R).
Suppose R ⊂ R n is a closed rectangle and f : R → R is a bounded function which is zero except on a closed set E ⊂ R of
measure zero. Show that ∫ Rf exists and compute it.
Suppose R ⊂ R n is a closed rectangle and f : R → R and g : R → R are two Riemann integrable functions. Suppose f = g
except for a closed set E ⊂ R of measure zero. Show that ∫ Rf = ∫ Rg.
\[ \chi_S(x) := \begin{cases} 1 & \text{if } x \in S, \\ 0 & \text{if } x \notin S. \end{cases} \]
A bounded set S is Jordan measurable if for some closed rectangle R such that S ⊂ R, the function χ S is in R (R). Take two
closed rectangles $R$ and $R'$ with $S \subset R$ and $S \subset R'$, then $R \cap R'$ is a closed rectangle also containing $S$. By and , $\chi_S \in \mathcal{R}(R \cap R')$ and so $\chi_S \in \mathcal{R}(R')$. Thus
V(S) := ∫ Rχ S,
A bounded set S ⊂ R n is Jordan measurable if and only if the boundary ∂S is a measure zero set.
Suppose R is a closed rectangle such that S is contained in the interior of R. If x ∈ ∂S, then for every δ > 0, the sets
S ∩ B(x, δ) (where χ S is 1) and the sets (R ∖ S) ∩ B(x, δ) (where χ S is 0) are both nonempty. So χ S is not continuous at x. If x is
either in the interior of S or in the complement of the closure $\overline{S}$, then χ S is either identically 1 or identically 0 in a whole
neighborhood of x and hence χ S is continuous at x. Therefore, the set of discontinuities of χ S is precisely the boundary ∂S. The
proposition then follows.
[prop:jordanmeas] Suppose S and T are bounded Jordan measurable sets. Then
1. The closure $\overline{S}$ is Jordan measurable.
( )
k k
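\[ m^*\Bigl( \bigcup_{j=1}^\ell {R_j'}^{\circ} \Bigr) = \sum_{j=1}^\ell V({R_j'}^{\circ}). \]
Hence
\[ m^*(S) \geq m^*\Bigl( \bigcup_{j=1}^\ell R_j' \Bigr) \geq m^*\Bigl( \bigcup_{j=1}^\ell {R_j'}^{\circ} \Bigr) = \sum_{j=1}^\ell V({R_j'}^{\circ}) = \sum_{j=1}^\ell V(R_j') = L(P,f) \geq V(S) - \epsilon. \]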
Let S ⊂ R n be a bounded Jordan measurable set. A bounded function f : S → R is said to be Riemann integrable on S, or
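$f \in \mathcal{R}(S)$, if for a closed rectangle $R$ such that $S \subset R$, the function $\tilde{f} : R \to \mathbb{R}$ defined by
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in S, \\ 0 & \text{otherwise,} \end{cases} \]
is in $\mathcal{R}(R)$. In this case we write
\[ \int_S f := \int_R \tilde{f}. \]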
When f is defined on a larger set and we wish to integrate over S, then we apply the definition to the restriction f | S. In
particular, if f : R → R for a closed rectangle R, and S ⊂ R is a Jordan measurable subset, then
∫ Sf = ∫ Rfχ S.
If S ⊂ R n is a Jordan measurable set and f : S → R is a bounded continuous function, then f is integrable on S.
Define the function f̃ as above for some closed rectangle R with S ⊂ R. If $x \in R \setminus \overline{S}$, then f̃ is identically zero in a
neighborhood of x. Similarly if x is in the interior of S, then f̃ = f on a neighborhood of x and f is continuous at x. Therefore, f̃
is only ever possibly discontinuous at ∂S, which is a set of measure zero, and we are finished.
Images of Jordan measurable subsets
Finally, images of Jordan measurable sets are Jordan measurable under nice enough mappings. For simplicity, let us assume
that the Jacobian never vanishes.
Suppose S ⊂ R n is a closed bounded Jordan measurable set, and S ⊂ U for an open set U ⊂ R n. Suppose g : U → R n is a
one-to-one continuously differentiable mapping such that J g is never zero on S. Then g(S) is Jordan measurable.
Let T = g(S). We claim that the boundary ∂T is contained in the set g(∂S). Suppose the claim is proved. As S is Jordan
measurable, then ∂S is measure zero. Then g(∂S) is measure zero by . As ∂T ⊂ g(∂S), then T is Jordan measurable.
It is therefore left to prove the claim. First, S is closed and bounded and hence compact. By Lemma 7.5.4 from volume I,
T = g(S) is also compact and therefore closed. In particular, ∂T ⊂ T. Suppose y ∈ ∂T, then there must exist an x ∈ S such
that g(x) = y, and by hypothesis J g(x) ≠ 0.
We now use the inverse function theorem. We find a neighborhood V ⊂ U of x and an open set W such that the restriction g | V
is a one-to-one and onto function from V to W with a continuously differentiable inverse. In particular, g(x) = y ∈ W. As
y ∈ ∂T, there exists a sequence {y k} in W with lim y k = y and y k ∉ T. As g | V is invertible and in particular has a continuous
inverse, there exists a sequence {x k} in V such that g(x k) = y k and lim x k = x. Since y k ∉ T = g(S), clearly x k ∉ S. Since
x ∈ S, we conclude that x ∈ ∂S. The claim is proved, ∂T ⊂ g(∂S).
Exercises
Prove .
Prove that a bounded convex set is Jordan measurable. Hint: induction on dimension.
\[ \int_U f = \int_a^b \int_{g(x)}^{f(x)} f(x,y) \, dy \, dx . \]
Let us construct an example of a non-Jordan measurable open set. For simplicity we work first in one dimension. Let {r j} be
an enumeration of all rational numbers in (0, 1). Let (a j, b j) be open intervals such that (a j, b j) ⊂ (0, 1) for all j, r j ∈ (a j, b j),
and $\sum_{j=1}^\infty (b_j - a_j) < 1/2$. Now let $U := \bigcup_{j=1}^\infty (a_j, b_j)$. Show that
a) The open intervals (a j, b j) as above actually exist.
b) ∂U = [0, 1] ∖ U.
c) ∂U is not of measure zero, and therefore U is not Jordan measurable.
d) Show that W := ((0, 1) × (0, 2) ) ∖ (U × [0, 1] ) ⊂ R 2 is a connected bounded open set in R 2 that is not Jordan measurable.
Green’s theorem
Note: 1 lecture
One of the most important theorems of analysis in several variables is the so-called generalized Stokes’ theorem, a
generalization of the fundamental theorem of calculus. Perhaps the most often used version is the version in two dimensions,
called Green’s theorem, which we prove here.
Let U ⊂ R 2 be a bounded connected open set. Suppose the boundary ∂U is a finite union of (the images of) simple piecewise
smooth paths such that near each point p ∈ ∂U every neighborhood V of p contains points of $\mathbb{R}^2 \setminus \overline{U}$. Then U is called a
bounded domain with piecewise smooth boundary in R 2.
The condition about points outside the closure means that locally ∂U separates R 2 into “inside” and “outside”. The condition
prevents ∂U from being just a “cut” inside U. Therefore as we travel along the path in a certain orientation, there is a well
defined left and a right, and either it is U on the left and the complement of U on the right, or vice-versa. Thus by orientation
on U we mean the direction along which we travel along the paths. It is easy to switch orientation if needed by reparametrizing
the path.
If U ⊂ R 2 is a bounded domain with piecewise smooth boundary, let ∂U be oriented and let γ : [a, b] → R 2 be a parametrization
of ∂U giving the orientation. Write γ(t) = (x(t), y(t) ). If the vector n(t) := ( − y ′ (t), x ′ (t) ) points into the domain, that is,
ϵn(t) + γ(t) is in U for all small enough ϵ > 0, then ∂U is positively oriented. Otherwise it is negatively oriented.
The vector n(t) turns γ ′(t) counterclockwise by 90 ∘ , that is to the left. A boundary is positively oriented, if when we travel
along the boundary in the direction of its orientation, the domain is “on our left”. For example, if U is a bounded domain with
“no holes”, that is ∂U is connected, then the positive orientation means we are travelling counterclockwise around ∂U. If we
do have “holes”, then we travel around them clockwise.
Let U ⊂ R 2 be a bounded domain with piecewise smooth boundary, then U is Jordan measurable.
We need that ∂U is of measure zero. As ∂U is a finite union of simple piecewise smooth paths, which themselves are finite
unions of smooth paths we need only show that a smooth path is of measure zero in R 2.
Let γ : [a, b] → R 2 be a smooth path. It is enough to show that γ ((a, b) ) is of measure zero, as adding two points, that is the
points γ(a) and γ(b), to a measure zero set still results in a measure zero set. Define
Suppose U ⊂ R 2 is a bounded domain with piecewise smooth boundary with the boundary positively oriented. Suppose P and
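$Q$ are continuously differentiable functions defined on some open set that contains the closure $\overline{U}$. Then
\[ \int_{\partial U} P \, dx + Q \, dy = \int_U \Bigl( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Bigr) . \]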
We stated Green’s theorem in general, although we will only prove a special version of it. That is, we will only prove it for a
special kind of domain. The general version follows from the special case by application of further geometry, and cutting up
the general domain into smaller domains on which to apply the special case. We will not prove the general case.
Let U ⊂ R 2 be a domain with piecewise smooth boundary. We say U is of type I if there exist numbers a < b, and continuous
functions f : [a, b] → R and g : [a, b] → R, such that
Similarly, U is of type II if there exist numbers c < d, and continuous functions h : [c, d] → R and k : [c, d] → R, such that
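\[ \begin{aligned}
\int_U \Bigl( - \frac{\partial P}{\partial y} \Bigr)
& = \int_a^b \int_{g(x)}^{f(x)} \Bigl( - \frac{\partial P}{\partial y} \Bigr)(x,y) \, dy \, dx \\
& = \int_a^b \bigl( - P(x, f(x)) + P(x, g(x)) \bigr) \, dx \\
& = \int_a^b P(x, g(x)) \, dx - \int_a^b P(x, f(x)) \, dx .
\end{aligned} \]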
Now we wish to integrate P dx along the boundary. The one-form P dx integrates to zero when integrating along the straight
vertical lines in the boundary. Therefore it only is integrated along the top and along the bottom. As a parameter, x runs from
left to right. If we use the parametrizations that take x to (x, f(x) ) and to (x, g(x) ) we recognize path integrals above. However
the second path integral is in the wrong direction, the top should be going right to left, and so we must switch orientation.
\[ \int_{\partial U} P \, dx = \int_a^b P(x, g(x)) \, dx + \int_b^a P(x, f(x)) \, dx = \int_U \Bigl( - \frac{\partial P}{\partial y} \Bigr) . \]
Similarly, U is also of type II. The form Q dy integrates to zero along horizontal lines. So
\[ \int_U \frac{\partial Q}{\partial x} = \int_c^d \int_{k(y)}^{h(y)} \frac{\partial Q}{\partial x}(x,y) \, dx \, dy = \int_c^d \bigl( Q(h(y), y) - Q(k(y), y) \bigr) \, dy = \int_{\partial U} Q \, dy . \]
Putting the two together we obtain
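\[ \int_{\partial U} P \, dx + Q \, dy = \int_{\partial U} P \, dx + \int_{\partial U} Q \, dy = \int_U \Bigl( - \frac{\partial P}{\partial y} \Bigr) + \int_U \frac{\partial Q}{\partial x} = \int_U \Bigl( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Bigr) . \]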
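As a sanity check of the formula just proved, the two sides can be compared numerically on the unit disc. The sketch below is only an illustration under assumptions not in the text: the choice P = −x²y, Q = xy² is ours, the boundary is parametrized counterclockwise by (cos t, sin t), and the area integral is approximated by a midpoint rule in polar coordinates. Both sides should be close to π/2.

```python
import math


def P(x, y):
    return -x * x * y


def Q(x, y):
    return x * y * y


def curl(x, y):
    # dQ/dx - dP/dy = y^2 + x^2 for the P and Q chosen above
    return y * y + x * x


def boundary_integral(samples=4000):
    """Midpoint-rule approximation of the integral of P dx + Q dy over the unit circle."""
    total = 0.0
    dt = 2 * math.pi / samples
    for i in range(samples):
        t = (i + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)  # derivative of the parametrization
        total += (P(x, y) * dx + Q(x, y) * dy) * dt
    return total


def area_integral(nr=400, nt=400):
    """Midpoint-rule approximation of the integral of curl over the unit disc (polar coordinates)."""
    total = 0.0
    dr = 1.0 / nr
    dt = 2 * math.pi / nt
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            # r * dr * dt is the area element in polar coordinates
            total += curl(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total


print(boundary_integral(), area_integral(), math.pi / 2)
```

Reversing the direction of the parametrization flips the sign of the boundary integral, which is why the orientation hypothesis matters.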
Let us illustrate the usefulness of Green’s theorem on a fundamental result about harmonic functions.
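\[ \begin{aligned}
0 & = \frac{1}{2\pi r} \int_{D_r} \Bigl( \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \Bigr) \\
& = \frac{1}{2\pi r} \int_{\partial D_r} - \frac{\partial f}{\partial y} \, dx + \frac{\partial f}{\partial x} \, dy \\
& = \frac{1}{2\pi r} \int_0^{2\pi} \Bigl( - \frac{\partial f}{\partial y} \bigl( x_0 + r\cos(t), y_0 + r\sin(t) \bigr) \bigl( - r \sin(t) \bigr) + \frac{\partial f}{\partial x} \bigl( x_0 + r\cos(t), y_0 + r\sin(t) \bigr) r \cos(t) \Bigr) \, dt \\
& = \frac{d}{dr} \Bigl[ \frac{1}{2\pi} \int_0^{2\pi} f \bigl( x_0 + r\cos(t), y_0 + r\sin(t) \bigr) \, dt \Bigr] .
\end{aligned} \]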
Let $g(r) := \frac{1}{2\pi} \int_0^{2\pi} f \bigl( x_0 + r\cos(t), y_0 + r\sin(t) \bigr) \, dt$. Then $g'(r) = 0$ for all $r > 0$. The function is constant for $r > 0$ and continuous at $r = 0$ (exercise). Therefore $g(0) = g(r)$ for all $r > 0$. Therefore,
\[ g(r) = g(0) = \frac{1}{2\pi} \int_0^{2\pi} f \bigl( x_0 + 0\cos(t), y_0 + 0\sin(t) \bigr) \, dt = f(x_0, y_0). \]
\[ f(x_0, y_0) = \frac{1}{2\pi} \int_0^{2\pi} f \bigl( x_0 + r\cos(t), y_0 + r\sin(t) \bigr) \, dt = \frac{1}{2\pi r} \int_{\partial D_r} f \, ds . \]
That is, the value at p = (x 0, y 0) is the average over a circle of any radius r centered at (x 0, y 0).
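The averaging statement is easy to test numerically. The following sketch is an illustration only: the harmonic function e^x cos(y), the centre (0.3, −0.2), and the helper name `circle_average` are our own choices, not taken from the text. It approximates the average over circles of several radii by a uniform Riemann sum and compares it with the value at the centre.

```python
import math


def circle_average(f, x0, y0, r, samples=2000):
    """Approximate (1/2pi) * integral over [0, 2pi] of f(x0 + r cos t, y0 + r sin t) dt."""
    total = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        total += f(x0 + r * math.cos(t), y0 + r * math.sin(t))
    return total / samples


def harmonic(x, y):
    # e^x cos(y) is harmonic: f_xx + f_yy = e^x cos(y) - e^x cos(y) = 0
    return math.exp(x) * math.cos(y)


x0, y0 = 0.3, -0.2
for r in (0.1, 0.5, 1.0, 2.0):
    # the two printed values should agree up to rounding
    print(r, circle_average(harmonic, x0, y0, r), harmonic(x0, y0))
```

Replacing `harmonic` with a non-harmonic function such as x² + y² makes the average depend on r, which shows the property really uses the hypothesis on f.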
Exercises
[green:balltype3orient] Prove that a disc B(p, r) ⊂ R 2 is a type III domain, and prove that the orientation given by the
parametrization γ(t) = (x 0 + rcos(t), y 0 + rsin(t) ) where p = (x 0, y 0) is the positive orientation of the boundary ∂B(p, r).
Prove that any bounded domain with piecewise smooth boundary that is convex is a type III domain.
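Suppose $V \subset \mathbb{R}^2$ is a domain with piecewise smooth boundary that is a type III domain and suppose that $U \subset \mathbb{R}^2$ is a domain such that $\overline{V} \subset U$. Suppose $f : U \to \mathbb{R}$ is a twice continuously differentiable function. Prove that $\int_{\partial V} \frac{\partial f}{\partial x} \, dx + \frac{\partial f}{\partial y} \, dy = 0$.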
1. Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by subscript,
such as a finite or infinite sequence of vectors y 1, y 2, ….↩
2. If you want a very funky vector space over a different field, R itself is a vector space over the rational numbers.↩
3. The matrix from representing f ′ (x) is sometimes called the Jacobian matrix.↩
4. The word “smooth” is used sometimes for continuously differentiable and sometimes for infinitely differentiable functions
in the literature.↩
5. Normally only a continuous path is used in this definition, but for open sets the two definitions are equivalent. See the
exercises.↩
11.1: Riemann integral over Rectangles
Riemann integral over rectangles
Note: FIXME1 lectures
As in chapter FIXME, we define the Riemann integral using the Darboux upper and lower integrals. The ideas in this section are very
similar to integration in one dimension. The complication is mostly notational.
intervals $[a^1, b^1], [a^2, b^2], \ldots, [a^n, b^n]$. That is, for every $k$ there is an integer $\ell_k$ and the finite set of numbers $P^k = \{ x_0^k, x_1^k, x_2^k, \ldots, x_{\ell_k}^k \}$ such that
\[ a^k = x_0^k < x_1^k < x_2^k < \cdots < x_{\ell_k - 1}^k < x_{\ell_k}^k = b^k . \tag{11.1.2} \]
For simplicity, we order the subrectangles somehow and we say $\{R_1, R_2, \ldots, R_N\}$ are the subrectangles corresponding to the partition $P$ of $R$. In other words we subdivide the original rectangle into many smaller subrectangles. It is not difficult to see that these subrectangles cover our original $R$, and their volume sums to that of $R$. That is
\[ R = \bigcup_{j=1}^N R_j , \qquad V(R) = \sum_{j=1}^N V(R_j) . \]
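When
\[ R_k = [x_{j_1 - 1}^1, x_{j_1}^1] \times [x_{j_2 - 1}^2, x_{j_2}^2] \times \cdots \times [x_{j_n - 1}^n, x_{j_n}^n] \tag{11.1.5} \]
then
\[ V(R_k) = \Delta x_{j_1}^1 \Delta x_{j_2}^2 \cdots \Delta x_{j_n}^n = (x_{j_1}^1 - x_{j_1 - 1}^1)(x_{j_2}^2 - x_{j_2 - 1}^2) \cdots (x_{j_n}^n - x_{j_n - 1}^n) . \tag{11.1.6} \]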
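Let $R \subset \mathbb{R}^n$ be a closed rectangle and let $f : R \to \mathbb{R}$ be a bounded function. Let $P$ be a partition of $R$. Let $R_i$ be a subrectangle corresponding to $P$, where $R_1, R_2, \ldots, R_N$ are all such subrectangles. Define
\[ \begin{aligned}
m_i & := \inf \{ f(x) : x \in R_i \} , \\
M_i & := \sup \{ f(x) : x \in R_i \} , \\
L(P,f) & := \sum_{i=1}^N m_i V(R_i) , \\
U(P,f) & := \sum_{i=1}^N M_i V(R_i) .
\end{aligned} \]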
We call L(P , f ) the lower Darboux sum and U (P , f ) the upper Darboux sum.
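To make the two sums concrete, here is a small sketch for a rectangle in the plane. It is only an illustration: the function f(x, y) = x + y, the rectangle [0, 1] × [0, 2], and the helper names are our own choices. Because this f is nondecreasing in each variable, the infimum over a subrectangle is attained at its lower-left corner and the supremum at its upper-right corner, which is the assumption the code relies on.

```python
def darboux_sums(f, xs, ys):
    """Lower and upper Darboux sums over the partition with endpoints xs and ys.

    Assumes f is nondecreasing in each variable, so inf and sup over each
    subrectangle sit at the lower-left and upper-right corners.
    """
    lower = 0.0
    upper = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            volume = (x1 - x0) * (y1 - y0)
            lower += f(x0, y0) * volume
            upper += f(x1, y1) * volume
    return lower, upper


def uniform_partition(a, b, n):
    """Endpoints of the partition of [a, b] into n equal pieces."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def f(x, y):
    return x + y


for n in (1, 2, 4, 8, 16):
    lower, upper = darboux_sums(f, uniform_partition(0, 1, n), uniform_partition(0, 2, n))
    # the lower sums increase and the upper sums decrease as n doubles
    print(n, lower, upper)
```

With a single subrectangle the sums are m V(R) = 0 and M V(R) = 6; refining squeezes both toward the value 3 of the integral.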
We start proving facts about the Darboux sums analogous to the one-variable results.
[mv:sumulbound:prop] Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f : R \to \mathbb{R}$ is a bounded function. Let $m, M \in \mathbb{R}$ be such that for all $x \in R$ we have $m \leq f(x) \leq M$. For any partition $P$ of $R$ we have
mV (R) ≤ L(P , f ) ≤ U (P , f ) ≤ M V (R). (11.1.7)
N N N
We call $\underline{\int}$ the lower Darboux integral and $\overline{\int}$ the upper Darboux integral.
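Let $R_1, R_2, \ldots, R_N$ be the subrectangles of $P$ and $\tilde{R}_1, \tilde{R}_2, \ldots, \tilde{R}_M$ be the subrectangles of $\tilde{P}$. Let $I_k$ be the set of indices $j$ such that $\tilde{R}_j \subset R_k$. We notice that
\[ R_k = \bigcup_{j \in I_k} \tilde{R}_j , \qquad V(R_k) = \sum_{j \in I_k} V(\tilde{R}_j) . \tag{11.1.10} \]
Let $m_j := \inf \{ f(x) : x \in R_j \}$, and $\tilde{m}_j := \inf \{ f(x) : x \in \tilde{R}_j \}$ as usual. Notice also that if $j \in I_k$, then $m_k \leq \tilde{m}_j$. Then
\[ L(P,f) = \sum_{k=1}^N m_k V(R_k) = \sum_{k=1}^N \sum_{j \in I_k} m_k V(\tilde{R}_j) \leq \sum_{k=1}^N \sum_{j \in I_k} \tilde{m}_j V(\tilde{R}_j) = \sum_{j=1}^M \tilde{m}_j V(\tilde{R}_j) = L(\tilde{P}, f) . \tag{11.1.11} \]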
The key point of this next proposition is that the lower Darboux integral is less than or equal to the upper Darboux integral.
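[mv:intulbound:prop] Let $R \subset \mathbb{R}^n$ be a closed rectangle and $f : R \to \mathbb{R}$ a bounded function. Let $m, M \in \mathbb{R}$ be such that for all $x \in R$ we have $m \leq f(x) \leq M$. Then
\[ m V(R) \leq \underline{\int_R} f \leq \overline{\int_R} f \leq M V(R) . \]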
By taking suprema of L(P , f ) and infima of U (P , f ) over all P we obtain the first and the last inequality.
The key of course is the middle inequality in [mv:intulbound:eq]. Let $P_1 = \{ P_1^1, P_1^2, \ldots, P_1^n \}$ and $P_2 = \{ P_2^1, P_2^2, \ldots, P_2^n \}$ be partitions of $R$. Define $\tilde{P} = \{ \tilde{P}^1, \tilde{P}^2, \ldots, \tilde{P}^n \}$ by letting $\tilde{P}^k = P_1^k \cup P_2^k$. Then $\tilde{P}$ is a partition of $R$ as can easily be checked, and $\tilde{P}$ is a refinement of $P_1$ and a refinement of $P_2$. By , $L(P_1, f) \leq L(\tilde{P}, f)$ and $U(\tilde{P}, f) \leq U(P_2, f)$. Therefore,
\[ L(P_1, f) \leq L(\tilde{P}, f) \leq U(\tilde{P}, f) \leq U(P_2, f) . \tag{11.1.14} \]
In other words $\underline{\int_R} f \leq \overline{\int_R} f$.
¯
¯¯¯¯¯¯
¯
b b
Then f is said to be Riemann integrable. The set of Riemann integrable functions on R is denoted by R(R) . When f ∈ R(R) we
define the Riemann integral
\[ \int_R f := \underline{\int_R} f = \overline{\int_R} f . \tag{11.1.17} \]
∫ f (x) dx, (11.1.18)
R
i. αf is in R(R) and
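\[ \int_R \alpha f = \alpha \int_R f \tag{11.1.21} \]
ii. $f + g$ is in $\mathcal{R}(R)$ and
\[ \int_R (f + g) = \int_R f + \int_R g . \tag{11.1.22} \]
Let $R \subset \mathbb{R}^n$ be a closed rectangle and let $f$ and $g$ be in $\mathcal{R}(R)$ and let $f(x) \leq g(x)$ for all $x \in R$. Then
\[ \int_R f \leq \int_R g . \tag{11.1.23} \]
Again for simplicity if $f : S \to \mathbb{R}$ is a function and $R \subset S$ is a closed rectangle, then if the restriction $f|_R$ is integrable we say $f$ is integrable on $R$, or $f \in \mathcal{R}(R)$ and we write
\[ \int_R f := \int_R f|_R . \tag{11.1.24} \]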
Given ϵ > 0 , we find a partition P such that U (P , f ) − L(P , f ) < ϵ . By making a refinement of P we can assume that the
endpoints of R are in P , or in other words, R is a union of subrectangles of P . Then the subrectangles of P divide into two
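collections, ones that are subsets of $R$ and ones whose intersection with the interior of $R$ is empty. Suppose that $R_1, R_2, \ldots, R_K$ are the subrectangles that are subsets of $R$ and $R_{K+1}, \ldots, R_N$ are the rest. Let $\tilde{P}$ be the partition of $R$ composed of those subrectangles of $P$ contained in $R$. Then
\[ \begin{aligned}
\epsilon > U(P,f) - L(P,f) & = \sum_{k=1}^K (M_k - m_k) V(R_k) + \sum_{k=K+1}^N (M_k - m_k) V(R_k) \\
& \geq \sum_{k=1}^K (M_k - m_k) V(R_k) = U(\tilde{P}, f|_R) - L(\tilde{P}, f|_R) .
\end{aligned} \]
Therefore $f|_R$ is integrable.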
then f ∈ R(R) .
Given an ϵ > 0 find P as in the hypothesis. Then
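\[ \overline{\int_R} f - \underline{\int_R} f \leq U(P,f) - L(P,f) < \epsilon . \]
As $\overline{\int_R} f \geq \underline{\int_R} f$ and the above holds for every $\epsilon > 0$, we conclude $\overline{\int_R} f = \underline{\int_R} f$ and $f \in \mathcal{R}(R)$.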
We say a rectangle $R = [a^1, b^1] \times [a^2, b^2] \times \cdots \times [a^n, b^n]$ has longest side at most $\alpha$ if $b^k - a^k \leq \alpha$ for all $k$.
If a rectangle $R \subset \mathbb{R}^n$ has longest side at most $\alpha$, then for any $x, y \in R$,
\[ \| x - y \| \leq \sqrt{n} \, \alpha . \tag{11.1.27} \]
\[ \begin{aligned}
\| x - y \| & = \sqrt{ (x^1 - y^1)^2 + (x^2 - y^2)^2 + \cdots + (x^n - y^n)^2 } \\
& \leq \sqrt{ (b^1 - a^1)^2 + (b^2 - a^2)^2 + \cdots + (b^n - a^n)^2 } \\
& \leq \sqrt{ \alpha^2 + \alpha^2 + \cdots + \alpha^2 } = \sqrt{n} \, \alpha .
\end{aligned} \]
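The bound on the distance between two points of a rectangle can be probed with random samples. The sketch below is purely illustrative: the rectangle in R^3 with side lengths 0.5, 0.2, 0.4 and the helper names are hypothetical choices of ours. It samples pairs of points and checks that no distance exceeds √n times the longest side.

```python
import math
import random


def diameter_bound(sides):
    """The bound sqrt(n) * alpha, where alpha is the longest side."""
    return math.sqrt(len(sides)) * max(sides)


def random_point(lows, sides, rng):
    """A random point of the rectangle with lower corner lows and the given side lengths."""
    return [lo + s * rng.random() for lo, s in zip(lows, sides)]


def distance(x, y):
    """Euclidean distance between two points given as lists of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)  # fixed seed so the run is reproducible
lows = [0.0, -1.0, 2.0]
sides = [0.5, 0.2, 0.4]
bound = diameter_bound(sides)
worst = max(
    distance(random_point(lows, sides, rng), random_point(lows, sides, rng))
    for _ in range(10000)
)
print(worst, bound)
```

The bound is attained only for a cube; for a rectangle with unequal sides, as here, the sampled distances stay strictly below it.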
V (R)
Let $P$ be a partition of $R$ such that the longest side of any subrectangle is strictly less than $\delta / \sqrt{n}$. Then for all $x, y \in R_k$ for a subrectangle $R_k$ of $P$ we have, by the proposition above, $\| x - y \| < \sqrt{n} \frac{\delta}{\sqrt{n}} = \delta$. Therefore
\[ f(x) - f(y) \leq | f(x) - f(y) | < \frac{\epsilon}{V(R)} . \tag{11.1.28} \]
As $f$ is continuous on $R_k$, it attains a maximum and a minimum on this subrectangle. Let $x$ be a point where $f$ attains the maximum and $y$ be a point where $f$ attains the minimum. Then $f(x) = M_k$ and $f(y) = m_k$ in the notation from the definition of the integral. Therefore,
\[ M_k - m_k = f(x) - f(y) < \frac{\epsilon}{V(R)} . \tag{11.1.29} \]
And so
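\[ \begin{aligned}
U(P,f) - L(P,f) & = \sum_{k=1}^N M_k V(R_k) - \sum_{k=1}^N m_k V(R_k) \\
& = \sum_{k=1}^N (M_k - m_k) V(R_k) \\
& < \frac{\epsilon}{V(R)} \sum_{k=1}^N V(R_k) = \epsilon .
\end{aligned} \]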
∫ f = ∫ f, (11.1.30)
a a
––––
\[ \operatorname{supp}(f) := \overline{ \{ x \in U : f(x) \neq 0 \} } . \tag{11.1.31} \]
That is, the support is the closure of the set of points where the function is nonzero. So for a point not in the support we have that f is
constantly zero in a whole neighbourhood.
A function $f$ is said to have compact support if $\operatorname{supp}(f)$ is a compact set. We will mostly consider the case when $U = \mathbb{R}^n$. In light
of the following exercise, this is not an oversimplification.
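Suppose $U \subset \mathbb{R}^n$ is open and $f : U \to \mathbb{R}$ is continuous and of compact support. Show that the function $\tilde{f} : \mathbb{R}^n \to \mathbb{R}$
\[ \tilde{f}(x) := \begin{cases} f(x) & \text{if } x \in U \\ 0 & \text{otherwise} \end{cases} \tag{11.1.32} \]
is continuous.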
[mv:prop:rectanglessupp] Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function with compact support. If $R$ is a closed rectangle such that $\operatorname{supp}(f) \subset R^\circ$ where $R^\circ$ is the interior of $R$, and $f$ is integrable over $R$, then for any other closed rectangle $S$ with $\operatorname{supp}(f) \subset S^\circ$,
\[ \int_S f = \int_R f . \tag{11.1.33} \]
~
The intersection of closed rectangles is again a closed rectangle (or empty). Therefore we can take R = R ∩ S be the intersection of
~
all rectangles containing supp(f ) . If R is the empty set, then supp(f ) is the empty set and f is identically zero and the proposition is
~ ~ ~ ~ ~
trivial. So suppose that R is nonempty. As R ⊂ R , we know that f is integrable over R . Furthermore R ⊂ S . Given ϵ > 0 , take P
~
to be a partition of R such that
~ ~
U (P , f | ~) − L(P , f | ~) < ϵ. (11.1.34)
R R
~ ~
Now add the endpoints of S to P to create a new partition P . Note that the subrectangles of P are subrectangles of P as well. Let
~ ~
R ,R ,…,R
1 2 be the subrectangles of P and R
K ,…,R the new subrectangles. Note that since supp(f ) ⊂ R , then for
K+1 N
k = K + 1, … , N we have supp(f ) ∩ R = ∅ . In other words f is identically zero on R . Therefore in the notation used
k k
previously we have
K N
k=1 k=K+1
K N
k=1 k=K+1
~ ~
= U (P , f | ~) − L(P , f | ~) < ϵ.
R R
~
Similarly we have that L(P , f | S
) = L(P , f ~)
R
and therefore
~
Since R ⊂ R we also get ∫ R
f =∫ ~
R
f , or in other words ∫ R
f =∫
S
f .
Because of this proposition, when f: R
n
→ R has compact support and is integrable over a rectangle R containing the support we
write
∫ f := ∫ f or ∫ f := ∫ f. (11.1.36)
n
R R R
Exercises
FIXME
FIXME: Show that integration over a rectangle with one side of size zero results in zero integral.
[mv:exersmallerset] Suppose $R$ and $R'$ are two closed rectangles with $R' \subset R$. Suppose that $f : R \to \mathbb{R}$ is in $\mathcal{R}(R)$. Show that $f \in \mathcal{R}(R')$.
∫ f =∫ f. (11.1.37)
′
R R
supp(f ) ⊂ R and f is integrable over R , then for any other closed rectangle S with supp(f ) ⊂ S , the function f is integrable over
S and ∫ f = ∫ f . Hint: notice that now the new rectangles that you add as in the proof can intersect supp(f ) on their boundary.
S R
Suppose that R and S are closed rectangles. Let f (x) := 1 if x ∈ R and f (x) = 0 otherwise. Show that f is integrable over S and
compute ∫ f . S
expressions
1 1 1 1
\[ f(x,y) := \begin{cases} 1 & \text{if } x = 1/2 \text{ and } y \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases} \tag{11.2.2} \]
Then $f$ is Riemann integrable on $R := [0,1]^2$ and $\int_R f = 0$. Furthermore, $\int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0$. However
\[ \int_0^1 f(1/2, y) \, dy \tag{11.2.3} \]
does not exist, so we cannot even write $\int_0^1 \int_0^1 f(x,y) \, dy \, dx$.
Proof: Let us start with integrability of $f$. We simply take the partition of $[0,1]^2$ where the partition in the $x$ direction is $\{0, 1/2 - \epsilon, 1/2 + \epsilon, 1\}$ and in the $y$ direction $\{0, 1\}$. The subrectangles of the partition are
and
The upper and lower sum are arbitrarily close and the lower sum is always zero, so the function is integrable and $\int_R f = 0$.
For any $y$, the function that takes $x$ to $f(x,y)$ is zero except perhaps at a single point $x = 1/2$. We know that such a function is integrable and $\int_0^1 f(x,y) \, dx = 0$. Therefore, $\int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0$.
However if $x = 1/2$, the function that takes $y$ to $f(1/2, y)$ is the nonintegrable function that is 1 on the rationals and 0 on the irrationals. See .
We will solve this problem of undefined inside integrals by using the upper and lower integrals, which are always defined.
We split $\mathbb{R}^{n+m}$ into two parts. That is, we write the coordinates on $\mathbb{R}^{n+m} = \mathbb{R}^n \times \mathbb{R}^m$ as $(x,y)$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$.
For a function f (x, y) we write
fx (y) := f (x, y) (11.2.7)
¯
¯¯¯¯
¯
\[ \int_R g = \int_R h = \int_{R \times S} f . \tag{11.2.10} \]
In other words
¯
¯¯¯¯
¯
If it turns out that $f_x$ is integrable for all $x$, for example when $f$ is continuous, then we obtain the more familiar
the subrectangles of $P'$. Then $P \times P'$ is the partition whose subrectangles are $R_j \times R_k'$ for all $1 \leq j \leq N$ and all $1 \leq k \leq K$.
Let
\[ m_{j,k} := \inf_{(x,y) \in R_j \times R_k'} f(x,y) . \tag{11.2.13} \]
We notice that $V(R_j \times R_k') = V(R_j) V(R_k')$ and hence
\[ L(P \times P', f) = \sum_{j=1}^N \sum_{k=1}^K m_{j,k} V(R_j \times R_k') = \sum_{j=1}^N \Bigl( \sum_{k=1}^K m_{j,k} V(R_k') \Bigr) V(R_j) . \tag{11.2.14} \]
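If we let
\[ m_k(x) := \inf_{y \in R_k'} f(x,y) = \inf_{y \in R_k'} f_x(y) , \tag{11.2.15} \]
then for $x \in R_j$ we have $m_{j,k} \leq m_k(x)$, and therefore
\[ \sum_{k=1}^K m_{j,k} V(R_k') \leq \sum_{k=1}^K m_k(x) V(R_k') = L(P', f_x) \leq \underline{\int_S} f_x = g(x) . \tag{11.2.16} \]
As this inequality holds for all $x \in R_j$, we have
\[ \sum_{k=1}^K m_{j,k} V(R_k') \leq \inf_{x \in R_j} g(x) . \tag{11.2.17} \]
We thus obtain
\[ L(P \times P', f) \leq \sum_{j=1}^N \Bigl( \inf_{x \in R_j} g(x) \Bigr) V(R_j) = L(P, g) . \tag{11.2.18} \]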
Similarly $U(P \times P', f) \geq U(P, h)$, and the proof of this inequality is left as an exercise.
Putting this together we have
\[ L(P \times P', f) \leq L(P, g) \leq U(P, g) \leq U(P, h) \leq U(P \times P', f) . \tag{11.2.19} \]
Hence
\[ U(P, g) - L(P, g) \leq U(P \times P', f) - L(P \times P', f) , \tag{11.2.20} \]
and we can make the right hand side arbitrarily small. Furthermore as $L(P \times P', f) \leq L(P, g) \leq U(P \times P', f)$ we must have that $\int_R g = \int_{R \times S} f$.
Similarly we have
\[ L(P \times P', f) \leq L(P, g) \leq L(P, h) \leq U(P, h) \leq U(P \times P', f) , \tag{11.2.21} \]
and hence
\[ U(P, h) - L(P, h) \leq U(P \times P', f) - L(P \times P', f) . \tag{11.2.22} \]
¯
¯¯¯¯
¯
y x
g(x) := ∫ f and h(x) := ∫ f (11.2.23)
S S
–––
∫ g =∫ h =∫ f. (11.2.24)
S S R×S
Next suppose that $f_x$ and $f^y$ are integrable for simplicity. For example, suppose that $f$ is continuous. Then by putting the two
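Often the Fubini theorem is stated in two dimensions for a continuous function $f : R \to \mathbb{R}$ on a rectangle $R = [a,b] \times [c,d]$. Then the Fubini theorem states that
\[ \int_R f = \int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy . \]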
And the Fubini theorem is commonly thought of as the theorem that allows us to swap the order of iterated integrals.
Repeatedly applying the Fubini theorem gets us the following corollary: Let $R := [a^1, b^1] \times [a^2, b^2] \times \cdots \times [a^n, b^n] \subset \mathbb{R}^n$ be a closed rectangle and let $f : R \to \mathbb{R}$ be continuous. Then
\[ \int_R f = \int_{a^1}^{b^1} \int_{a^2}^{b^2} \cdots \int_{a^n}^{b^n} f(x^1, x^2, \ldots, x^n) \, dx^n \, dx^{n-1} \cdots dx^1 . \tag{11.2.28} \]
Clearly we can also switch the order of integration to any order we please. We can also relax the continuity requirement by
making sure that all the intermediate functions are integrable, or by using upper or lower integrals.
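The freedom to switch the order of integration can be illustrated numerically. The following sketch is an assumption-laden example rather than anything from the text: the function f(x, y) = x y² on [0, 1] × [0, 2] and the one-dimensional midpoint rule `midpoint_integral` are our own choices. Both iterated approximations should agree with each other and be close to the exact value 4/3.

```python
def midpoint_integral(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def f(x, y):
    return x * y * y


a, b, c, d = 0.0, 1.0, 0.0, 2.0

# integrate in y first, then in x
dy_then_dx = midpoint_integral(
    lambda x: midpoint_integral(lambda y: f(x, y), c, d), a, b
)

# integrate in x first, then in y
dx_then_dy = midpoint_integral(
    lambda y: midpoint_integral(lambda x: f(x, y), a, b), c, d
)

print(dy_then_dx, dx_then_dy, 4 / 3)
```

For the discontinuous example discussed earlier in the section this symmetry breaks down, since one of the inner integrals does not exist.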
Exercises
Prove the assertion $U(P \times P', f) \geq U(P, h)$ from the proof of .
Prove .
FIXME
\[
m^*(S) := \inf \sum_{j=1}^\infty V(R_j), \tag{11.3.1}
\]
where the infimum is taken over all sequences $\{R_j\}$ of open rectangles such that $S \subset \bigcup_{j=1}^\infty R_j$. In particular $S$ is of measure zero or a null set if $m^*(S) = 0$.
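We will only need measure zero sets and so we focus on these. Note that $S$ is of measure zero if for every $\epsilon > 0$ there exists a sequence of open rectangles $\{R_j\}$ such that
\[
S \subset \bigcup_{j=1}^\infty R_j \quad \text{and} \quad \sum_{j=1}^\infty V(R_j) < \epsilon. \tag{11.3.2}
\]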
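The set $\mathbb{Q}^n \subset \mathbb{R}^n$ of points with rational coordinates is a set of measure zero.
Proof: The set $\mathbb{Q}^n$ is countable and therefore let us write it as a sequence $q_1, q_2, \ldots$. Let $\epsilon > 0$ be given. For each $q_j$ find an open rectangle $R_j$ with $q_j \in R_j$ and $V(R_j) < \epsilon 2^{-j}$. Then
\[
\mathbb{Q}^n \subset \bigcup_{j=1}^\infty R_j \quad \text{and} \quad \sum_{j=1}^\infty V(R_j) < \sum_{j=1}^\infty \epsilon 2^{-j} = \epsilon. \tag{11.3.3}
\]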
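The covering trick in this proof can be tried out concretely. The sketch below works in one dimension only (an assumption made for simplicity): it lists finitely many rationals in $[0,1]$, puts an open interval of length $\epsilon 2^{-(j+1)} < \epsilon 2^{-j}$ around the $j$-th one, and adds up the lengths with exact rational arithmetic. The enumeration order and all function names are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` distinct rationals in [0, 1], listed by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Open interval of length eps / 2**(j+1) centred at the j-th point, j = 1, 2, ..."""
    intervals = []
    for j, q in enumerate(points, start=1):
        half = eps / 2 ** (j + 2)  # half of the interval length
        intervals.append((q - half, q + half))
    return intervals


eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
total_length = sum(hi - lo for lo, hi in intervals)
print(total_length < eps)  # the total length stays below eps
```

No matter how many points are enumerated, the total length is bounded by the geometric series, which is the whole point of choosing the sizes $\epsilon 2^{-j}$.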
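\[
S = \bigcup_{j=1}^\infty S_j \tag{11.3.4}
\]
where $S_j$ are all measure zero sets. Let $\epsilon > 0$ be given. For each $j$ there exists a sequence of open rectangles $\{R_{j,k}\}_{k=1}^\infty$ such that
\[
S_j \subset \bigcup_{k=1}^\infty R_{j,k} \tag{11.3.5}
\]
and
\[
\sum_{k=1}^\infty V(R_{j,k}) < 2^{-j} \epsilon. \tag{11.3.6}
\]
Then
\[
S \subset \bigcup_{j=1}^\infty \bigcup_{k=1}^\infty R_{j,k}. \tag{11.3.7}
\]
As $V(R_{j,k})$ is always positive, the sum over all $j$ and $k$ can be done in any order. In particular, it can be done as
\[
\sum_{j=1}^\infty \sum_{k=1}^\infty V(R_{j,k}) < \sum_{j=1}^\infty 2^{-j} \epsilon = \epsilon. \tag{11.3.8}
\]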
P = ⋃ Pj (11.3.12)
j=1
Let us prove the other inequality. Suppose that $\{(a_j, b_j)\}$ are open intervals such that
\[
[a,b] \subset \bigcup_{j=1}^\infty (a_j, b_j).
\]
We wish to bound $\sum (b_j - a_j)$ from below. Since $[a,b]$ is compact, finitely many of these open intervals still cover $[a,b]$. As throwing out some of the intervals only makes the sum smaller, we only need to consider the finitely many intervals still covering $[a,b]$. If $(a_i, b_i) \subset (a_j, b_j)$, then we can throw out $(a_i, b_i)$ as well, and we can likewise throw out any interval that does not meet $[a,b]$. Therefore we have $[a,b] \subset \bigcup_{j=1}^k (a_j, b_j)$ for some $k$, and we assume that the intervals are sorted such that $a_1 < a_2 < \cdots < a_k$. Note that since no interval is contained in another and together they cover $[a,b]$, consecutive intervals overlap, that is, $a_{j+1} < b_j$. Furthermore, $a_1 < a$ and $b_k > b$. Thus,
\[
m^*([a,b]) \geq \sum_{j=1}^k (b_j - a_j) \geq \sum_{j=1}^{k-1} (a_{j+1} - a_j) + (b_k - a_k) = b_k - a_1 > b - a. \tag{11.3.14}
\]
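The pruning and telescoping steps of this argument can be traced on a small example. The following sketch uses a hypothetical finite cover of $[0,1]$ (the four intervals are made up for illustration), discards intervals contained in another one, sorts the rest by left endpoint, and compares the total length with the telescoped sum $b_k - a_1$. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction as F


def prune_and_sort(intervals):
    """Drop intervals contained in another interval, then sort by left endpoint."""
    kept = []
    for i, (ai, bi) in enumerate(intervals):
        contained = any(
            j != i and aj <= ai and bi <= bj and ((aj, bj) != (ai, bi) or j < i)
            for j, (aj, bj) in enumerate(intervals)
        )
        if not contained:
            kept.append((ai, bi))
    return sorted(kept)


a, b = F(0), F(1)
# Hypothetical open cover of [a, b]; the third interval lies inside the second.
cover = [(F(-1, 10), F(2, 5)), (F(3, 10), F(4, 5)), (F(7, 20), F(1, 2)), (F(7, 10), F(6, 5))]

chain = prune_and_sort(cover)
total = sum(bj - aj for aj, bj in chain)
telescoped = sum(chain[j + 1][0] - chain[j][0] for j in range(len(chain) - 1))
telescoped += chain[-1][1] - chain[-1][0]
print(total, telescoped, b - a)
```

Here `telescoped` equals the right endpoint of the last interval minus the left endpoint of the first, which exceeds $b - a$ because the cover sticks out on both sides.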
The image of a measure zero set using a continuous map is not necessarily a measure zero set. However if we assume the
mapping is continuously differentiable, then the mapping cannot “stretch” the set too much. The proposition does not require
compactness, and this is left as an exercise.
[prop:imagenull] Suppose $U \subset \mathbb{R}^n$ is an open set and $f \colon U \to \mathbb{R}^n$ is a continuously differentiable mapping. If $E \subset U$ is a compact measure zero set, then $f(E)$ is measure zero.
As FIXME: distance to boundary, did we do that? We should!
FIXME: maybe this closed/open rectangle business should be addressed above
Let ϵ > 0 be given.
FIXME: Let δ > 0 be the distance to boundary
Let us “fatten” $E$ a little bit. Using compactness, there exist finitely many open rectangles $T_1, T_2, \ldots, T_k$ such that
Since a closed rectangle has the same volume as an open rectangle with the same sides, we could take $R_j$ to be the closure of $T_j$. Furthermore, a closed rectangle can be written as a union of finitely many small rectangles. Consequently, for some $\ell$ there exist finitely many closed rectangles $R_1, R_2, \ldots, R_\ell$ of side at most $\frac{\sqrt{n}\,\delta}{2}$ such that
Let
\[
E' := R_1 \cup R_2 \cup \cdots \cup R_\ell \tag{11.3.20}
\]
Exercises
FIXME:
If $A \subset B$ then $m^*(A) \leq m^*(B)$.
Show that if $R \subset \mathbb{R}^n$ is a closed rectangle then $m^*(R) = V(R)$.
Prove a version of [prop:imagenull] without using compactness:
a) Mimic the proof to first prove that the proposition holds if $E$ is only assumed to be relatively compact; a set $E \subset U$ is relatively compact if the closure of $E$ in the subspace topology on $U$ is compact, or in other words if there exists a compact set $K$ with $K \subset U$ and $E \subset K$.
Let $U \subset \mathbb{R}^n$ be an open set and let $f \colon U \to \mathbb{R}$ be a continuously differentiable function. Let $G := \{(x,y) \in U \times \mathbb{R} : y = f(x)\}$ be the graph of $f$. Show that $G$ is of measure zero.
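Let $f \colon S \to \mathbb{R}$ for a set $S \subset \mathbb{R}^n$. We need to be able to quantify how discontinuous the function $f$ is at a point $x$. For any $\delta > 0$ define the oscillation of $f$ on the $\delta$-ball in the subset topology, that is $B_S(x,\delta) = B_{\mathbb{R}^n}(x,\delta) \cap S$, as
\[
o(f,x,\delta) := \sup_{y \in B_S(x,\delta)} f(y) - \inf_{y \in B_S(x,\delta)} f(y) = \sup_{y_1, y_2 \in B_S(x,\delta)} \bigl( f(y_1) - f(y_2) \bigr). \tag{11.4.1}
\]
That is, $o(f,x,\delta)$ is the length of the smallest interval that contains the image $f\bigl(B_S(x,\delta)\bigr)$. Clearly $o(f,x,\delta) \geq 0$ and notice $o(f,x,\delta) \leq o(f,x,\delta')$ whenever $\delta < \delta'$. Therefore, the limit as $\delta \to 0$ from the right exists and we define the oscillation of a function $f$ at $x$ as
\[
o(f,x) := \lim_{\delta \to 0^+} o(f,x,\delta) = \inf_{\delta > 0} o(f,x,\delta).
\]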
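To get a feel for the oscillation, one can estimate $o(f,x,\delta)$ by sampling. The sketch below does this for functions on $S = [-1,1]$; since it only looks at finitely many points, it can underestimate the true supremum minus infimum. The step function, the sample count, and the function names are assumptions chosen for illustration.

```python
def sampled_oscillation(f, x, delta, lo, hi, samples=2000):
    """Estimate o(f, x, delta) for f on S = [lo, hi] by sampling B_S(x, delta)."""
    left, right = max(lo, x - delta), min(hi, x + delta)
    width = right - left
    # Interior sample points of (left, right), plus the centre x itself.
    points = [left + width * (i + 0.5) / samples for i in range(samples)]
    points.append(x)
    values = [f(p) for p in points]
    return max(values) - min(values)


def step(t):
    # Illustrative function with a jump at 0 (an assumption, not from the text).
    return 1.0 if t >= 0 else 0.0


def identity(t):
    return t


deltas = [0.5, 0.1, 0.01]
jump = [sampled_oscillation(step, 0.0, d, -1.0, 1.0) for d in deltas]
smooth = [sampled_oscillation(identity, 0.0, d, -1.0, 1.0) for d in deltas]
away = sampled_oscillation(step, 0.5, 0.1, -1.0, 1.0)
print(jump, smooth, away)
```

At the jump the estimate does not shrink with $\delta$, while for the continuous function it decreases towards zero, matching the idea that the oscillation at a point measures the size of the discontinuity there.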
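Hence, $o(f,x) = 0$.
On the other hand suppose that $o(f,x) = 0$. Given any $\epsilon > 0$, find a $\delta > 0$ such that $o(f,x,\delta) < \epsilon$. If $y \in B_S(x,\delta)$ then $|f(x) - f(y)| \leq o(f,x,\delta) < \epsilon$, and so $f$ is continuous at $x$.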
Equivalently we want to show that $G = \{x \in S : o(f,x) < \epsilon\}$ is open in the subset topology. Let $x \in G$. As $\inf_{\delta > 0} o(f,x,\delta) < \epsilon$, find a $\delta > 0$ such that
\[
o(f,x,\delta) < \epsilon.
\]
Take any $\xi \in B_S(x,\delta/2)$. Notice that $B_S(\xi,\delta/2) \subset B_S(x,\delta)$. Therefore,
\[
o(f,\xi,\delta/2) = \sup_{y_1, y_2 \in B_S(\xi,\delta/2)} \bigl( f(y_1) - f(y_2) \bigr) \leq \sup_{y_1, y_2 \in B_S(x,\delta)} \bigl( f(y_1) - f(y_2) \bigr) = o(f,x,\delta) < \epsilon. \tag{11.4.7}
\]
So $o(f,\xi) < \epsilon$ as well. As this is true for all $\xi \in B_S(x,\delta/2)$ we get that $G$ is open in the subset topology and $S \setminus G$ is closed as is claimed.
The set $T = R \setminus (S_1 \cup \cdots \cup S_k)$ is closed, bounded, and therefore compact. Furthermore for $x \in T$, we have $o(f,x) < \epsilon$. Hence for each $x \in T$, there exists a small closed rectangle $T_x$ with $x$ in the interior of $T_x$, such that
The interiors of the rectangles $T_x$ cover $T$. As $T$ is compact there exist finitely many such rectangles $T_1, T_2, \ldots, T_m$ that cover $T$.
Now take all the rectangles $T_1, T_2, \ldots, T_m$ and $S_1, S_2, \ldots, S_k$ and construct a partition out of their endpoints. That is, construct a partition $P$ with subrectangles $R_1, R_2, \ldots, R_p$ such that every $R_j$ is contained in $T_\ell$ for some $\ell$ or the closure of $S_\ell$ for some $\ell$. Suppose we order the rectangles so that $R_1, R_2, \ldots, R_q$ are those that are contained in some $T_\ell$, and $R_{q+1}, R_{q+2}, \ldots, R_p$ are the rest.
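With $m_j$ and $M_j$ denoting the infimum and supremum of $f$ over $R_j$ as usual, the difference of the upper and lower sums splits accordingly:
\[
U(P,f) - L(P,f) = \sum_{j=1}^q (M_j - m_j) V(R_j) + \sum_{j=q+1}^p (M_j - m_j) V(R_j).
\]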
Clearly, we can make the right hand side as small as we want and hence f is integrable.
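For the other direction, suppose that $f$ is Riemann integrable over $R$. Let $S$ be the set of discontinuities again and now let
\[
S_k := \{ x \in R : o(f,x) \geq 1/k \}.
\]
Fix a $k \in \mathbb{N}$. Given an $\epsilon > 0$, find a partition $P$ with subrectangles $R_1, R_2, \ldots, R_p$ such that
\[
U(P,f) - L(P,f) = \sum_{j=1}^p (M_j - m_j) V(R_j) < \epsilon.
\]
Suppose that $R_1, R_2, \ldots, R_p$ are ordered so that the interiors of $R_1, R_2, \ldots, R_q$ intersect $S_k$, while the interiors of $R_{q+1}, R_{q+2}, \ldots, R_p$ are disjoint from $S_k$. If $x \in R_j \cap S_k$ and $x$ is in the interior of $R_j$, so that sufficiently small balls around $x$ are completely inside $R_j$, then by definition of $S_k$ we have $M_j - m_j \geq 1/k$. Then
\[
\epsilon > \sum_{j=1}^p (M_j - m_j) V(R_j) \geq \sum_{j=1}^q (M_j - m_j) V(R_j) \geq \frac{1}{k} \sum_{j=1}^q V(R_j). \tag{11.4.13}
\]
In other words $\sum_{j=1}^q V(R_j) < k\epsilon$. Let $G$ be the set of all boundaries of all the subrectangles of $P$. The set $G$ is of measure zero. Let $R_j^\circ$ denote the interior of $R_j$, then
\[
S_k \subset R_1^\circ \cup R_2^\circ \cup \cdots \cup R_q^\circ \cup G. \tag{11.4.14}
\]
As $G$ can be covered by open rectangles of arbitrarily small volume, $S_k$ must be of measure zero. As
\[
S = \bigcup_{k=1}^\infty S_k \tag{11.4.15}
\]
and a countable union of measure zero sets is of measure zero, $S$ is of measure zero.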
Exercises
FIXME:
\[
\chi_S(x) := \begin{cases} 1 & \text{if } x \in S, \\ 0 & \text{if } x \notin S. \end{cases} \tag{11.5.1}
\]
A bounded set $S$ is said to be Jordan measurable if for some closed rectangle $R$ such that $S \subset R$, the function $\chi_S$ is in $\mathcal{R}(R)$.
Take two closed rectangles $R$ and $R'$ with $S \subset R$ and $S \subset R'$, then $R \cap R'$ is a closed rectangle also containing $S$. By and ,
\[
\int_R \chi_S = \int_{R'} \chi_S = \int_{R \cap R'} \chi_S. \tag{11.5.2}
\]
\[
V(S) := \int_R \chi_S, \tag{11.5.3}
\]
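The definition of $V(S)$ as the integral of $\chi_S$ can be made concrete with Darboux sums. The sketch below takes $S$ to be the closed unit disk inside $R = [-1,1]^2$ (an illustrative choice, not from the text) and computes the lower and upper sums of $\chi_S$ on a uniform grid: a cell contributes to the upper sum when it meets $S$ and to the lower sum when it lies entirely in $S$. Function names and the grid size are assumptions.

```python
import math


def nearest_abs(lo, hi):
    """Smallest |t| for t in [lo, hi]."""
    if lo <= 0.0 <= hi:
        return 0.0
    return min(abs(lo), abs(hi))


def farthest_abs(lo, hi):
    """Largest |t| for t in [lo, hi]."""
    return max(abs(lo), abs(hi))


def darboux_sums_of_disk_indicator(n):
    """Lower and upper sums of chi_S, S the closed unit disk, on an n-by-n grid of [-1, 1]^2."""
    h = 2.0 / n
    lower = upper = 0.0
    for i in range(n):
        x0, x1 = -1.0 + i * h, -1.0 + (i + 1) * h
        for j in range(n):
            y0, y1 = -1.0 + j * h, -1.0 + (j + 1) * h
            # sup of chi_S over the cell is 1 exactly when the cell meets S.
            if nearest_abs(x0, x1) ** 2 + nearest_abs(y0, y1) ** 2 <= 1.0:
                upper += h * h
            # inf of chi_S over the cell is 1 exactly when the farthest corner lies in S.
            if farthest_abs(x0, x1) ** 2 + farthest_abs(y0, y1) ** 2 <= 1.0:
                lower += h * h
    return lower, upper


lower, upper = darboux_sums_of_disk_indicator(400)
print(lower, upper, math.pi)
```

The gap between the two sums is the total area of the cells that meet the boundary circle, so it shrinks as the grid is refined, and both sums squeeze the area $\pi$ of the disk.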
Suppose $R$ is a closed rectangle such that $S$ is contained in the interior of $R$. If $x \in \partial S$, then for every $\delta > 0$, the sets $S \cap B(x,\delta)$ (where $\chi_S$ is 1) and the sets $(R \setminus S) \cap B(x,\delta)$ (where $\chi_S$ is 0) are both nonempty. So $\chi_S$ is not continuous at $x$. If $x$ is either in the interior of $S$ or in the complement of the closure $\overline{S}$, then $\chi_S$ is either identically 1 or identically 0 in a whole neighbourhood of $x$ and hence $\chi_S$ is continuous at $x$. Therefore, the set of discontinuities of $\chi_S$ is precisely the boundary $\partial S$.
Let $R_1, \ldots, R_k$ be all the subrectangles of $P$ such that $\chi_S$ is not identically zero on each $R_j$. That is, there is some point $x \in R_j$ such that $x \in S$. Let $O_j$ be an open rectangle such that $R_j \subset O_j$ and $V(O_j) < V(R_j) + \epsilon/k$. Notice that $S \subset \bigcup_j O_j$. Then
\[
U(P, \chi_S) = \sum_{j=1}^k V(R_j) > \Bigl( \sum_{j=1}^k V(O_j) \Bigr) - \epsilon \geq m^*(S) - \epsilon. \tag{11.5.5}
\]
As $U(P, \chi_S) \leq V(S) + \epsilon$, then $m^*(S) - \epsilon \leq V(S) + \epsilon$, and as $\epsilon > 0$ was arbitrary, $m^*(S) \leq V(S)$.
subrectangles contained in $S$. The interiors ${R'_j}^{\circ}$ of the subrectangles are disjoint and $V({R'_j}^{\circ}) = V(R'_j)$. It is easy to see from the definition that
\[
m^*\Bigl( \bigcup_{j=1}^\ell {R'_j}^{\circ} \Bigr) = \sum_{j=1}^\ell V({R'_j}^{\circ}). \tag{11.5.6}
\]
Hence
\[
m^*(S) \geq m^*\Bigl( \bigcup_{j=1}^\ell R'_j \Bigr) \geq m^*\Bigl( \bigcup_{j=1}^\ell {R'_j}^{\circ} \Bigr) \tag{11.5.7}
\]
Therefore $m^*(S) \geq V(S)$ as well.
\[
\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in S, \\ 0 & \text{otherwise}, \end{cases} \tag{11.5.8}
\]
\[
\int_S f := \int_R \tilde{f}. \tag{11.5.9}
\]
When $f$ is defined on a larger set and we wish to integrate over $S$, then we apply the definition to the restriction $f|_S$. In particular note that if $f \colon R \to \mathbb{R}$ for a closed rectangle $R$, and $S \subset R$ is a Jordan measurable subset then
\[
\int_S f = \int_R f \chi_S. \tag{11.5.10}
\]
FIXME
Let T = g(S) . We claim that the boundary ∂T is contained in the set g(∂S) . Suppose the claim is proved. As S is Jordan
measurable, then ∂S is measure zero. Then g(∂S) is measure zero by [prop:imagenull]. As ∂T ⊂ g(∂S) , then T is Jordan measurable.
It is therefore left to prove the claim. First, S is closed and bounded and hence compact. By , T = g(S) is also compact and
therefore closed. In particular ∂T ⊂ T . Suppose y ∈ ∂T , then there must exist an x ∈ S such that g(x) = y . The Jacobian of
g is nonzero at x.
We now use the inverse function theorem. We find a neighbourhood $V \subset U$ of $x$ and an open set $W$ such that the restriction $g|_V$ is a one-to-one and onto function from $V$ to $W$ with a continuously differentiable inverse. In particular $g(x) = y \in W$.
As $y \in \partial T$, there exists a sequence $\{y_k\}$ in $W$ with $\lim y_k = y$ and $y_k \notin T$. As $g|_V$ is invertible and in particular has a continuous inverse, there exists a sequence $\{x_k\}$ in $V$ such that $g(x_k) = y_k$ and $\lim x_k = x$. Since $y_k \notin T = g(S)$, clearly $x_k \notin S$. Since $x \in S$, we conclude that $x \in \partial S$, and so $y = g(x) \in g(\partial S)$.
Exercises
Prove .
It may be surprising that the analogue in higher dimensions is quite a bit more complicated. The first complication is orientation. If we use the definition of integral from this chapter, then we do not have the notion of $\int_a^b$ versus $\int_b^a$. We are simply integrating over an interval $[a,b]$. With this notation the change of variables becomes
\[
\int_{[a,b]} f\bigl(g(x)\bigr) \, |g'(x)| \, dx = \int_{g([a,b])} f(x) \, dx. \tag{11.6.2}
\]
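The role of the absolute value can be seen numerically with a decreasing change of variables. In the sketch below, $g(x) = 3 - x^2$ on $[0,1]$ (so $g([0,1]) = [2,3]$) and $f(u) = u^2$ are illustrative assumptions; both sides of the formula are approximated by midpoint sums, and for this choice the exact common value is $19/3$. Dropping the absolute value flips the sign.

```python
def midpoint_integral(func, lo, hi, n=1000):
    """Midpoint Riemann sum of func over [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * h) for i in range(n)) * h


def f(u):
    # Illustrative integrand (an assumption, not from the text).
    return u ** 2


def g(x):
    # Decreasing on [0, 1], so g([0, 1]) = [2, 3].
    return 3.0 - x ** 2


def g_prime(x):
    return -2.0 * x


lhs = midpoint_integral(lambda x: f(g(x)) * abs(g_prime(x)), 0.0, 1.0)
rhs = midpoint_integral(f, 2.0, 3.0)
signed = midpoint_integral(lambda x: f(g(x)) * g_prime(x), 0.0, 1.0)
print(lhs, rhs, signed)  # lhs and rhs near 19/3, signed near -19/3
```

Without orientation, the integral over the set $g([a,b])$ cannot see that $g$ reverses direction, which is why $|g'(x)|$ rather than $g'(x)$ appears.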
volumes, so in one dimension it measures length. If our $g$ was linear, that is, $g(x) = Lx$, then $g'(x) = L$. Then the length of the interval $g([a,b])$ is simply $|L|(b-a)$. That is because $g([a,b])$ is either $[La, Lb]$ or $[Lb, La]$. This property holds in higher dimension with $|L|$ replaced by the absolute value of the determinant.
[prop:volrectdet] Suppose that $R \subset \mathbb{R}^n$ is a rectangle and $T \colon \mathbb{R}^n \to \mathbb{R}^n$ is linear. Then $T(R)$ is Jordan measurable and $V\bigl(T(R)\bigr) = |\det T| \, V(R)$.
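In two dimensions the statement is easy to check by hand or by machine: the image of a rectangle under a linear map is a parallelogram, whose area the shoelace formula computes from its corners. The sketch below does this for a hypothetical rectangle $[1,3] \times [0,2]$ and two made-up matrices, one of which reverses orientation; all names are illustrative.

```python
def apply(T, p):
    """Apply the 2-by-2 matrix T (given as two rows) to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)


def det2(T):
    (a, b), (c, d) = T
    return a * d - b * c


def shoelace_area(polygon):
    """Area of a simple polygon from its vertices listed in order around the boundary."""
    s = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


# Corners of the rectangle R = [1, 3] x [0, 2], listed counterclockwise; V(R) = 4.
corners = [(1, 0), (3, 0), (3, 2), (1, 2)]
volume_R = 4

T1 = ((2, 1), (1, 3))  # determinant 5
T2 = ((0, 1), (1, 0))  # determinant -1 (swaps the coordinates)

area1 = shoelace_area([apply(T1, p) for p in corners])
area2 = shoelace_area([apply(T2, p) for p in corners])
print(area1, abs(det2(T1)) * volume_R)
print(area2, abs(det2(T2)) * volume_R)
```

The second matrix has negative determinant, yet the area of the image is unchanged, which is why the absolute value appears in the formula.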
FIXME
The left hand side is $\int_{R'} \chi_{g(S)}$, where the integral is taken over a large enough rectangle $R'$ that contains $g(S)$. The right hand side is $\int_R |J_g|$ for a large enough rectangle $R$ that contains $S$. Let $\epsilon > 0$ be given. Divide $R$ into subrectangles, denote by $R_1, R_2, \ldots, R_K$ those subrectangles which intersect $S$. Suppose that the partition is fine enough such that
...
N N N
Let
FIXME
So $|J_g(x)|$ is the replacement of $|g'(x)|$ for multiple dimensions. Note that the following theorem holds in more generality, but
FIXME
FIXME: change of variables for functions with compact support
FIXME4
Exercises
Prove .
FIXME
1. If you want a funky vector space over a different field, $\mathbb{R}$ is an infinite dimensional vector space over the rational numbers.