
Mathematical Analysis I, 2021/22 First semester

Yoh Tanimoto
Dipartimento di Matematica, Università di Roma Tor Vergata
Via della Ricerca Scientifica 1, I-00133 Roma, Italy
email: hoyt@mat.uniroma2.it

We use the textbook “Calculus” Vol. I by Tom M. Apostol, Wiley, but follow the chapters in
a different order.
From Monday to Thursday we have lectures, and on Friday we do mostly exercises.

• Lecture notes:
http://www.mat.uniroma2.it/~tanimoto/teaching/2021MA1/2021MA1.pdf

• Exercises:
http://www.mat.uniroma2.it/~tanimoto/teaching/2021MA1/2021MA1ex.pdf

• Office hours: Tuesday 10:00–11:00 online, or send me a message on Teams

• Supplementary course: Basic Math

Some tips
• Writing math.

– LaTeX. You can try it here, and you can install the full distribution afterwards. You need to
learn some commands, but once you know them it's very powerful. All my lecture notes
and slides are written in LaTeX.
– Word processor (MS Word, Apple Pages, Open Office, Libre Office (Insert → Objects
→ Formula)...).

• Doing quick computations.

– Wolfram Alpha. You can just type in some formulas and it shows the result.
– Programming languages. Python (I used it to make the graph of the SIR model),
Java, C,· · ·

Sep 20. Overview of the course, integers and rational numbers


Mathematical Analysis I
Summary of the course
• properties of real numbers, concept of sets.

• mathematical induction. the summation notation.

• functions. limit of functions, continuity.

• trigonometric functions (cos x, sin x), exponential function e^x, logarithmic function log x.

Figure 1: Left: a graph can be used to study changing quantities. Right: the SIR model.

• differential calculus and applications.

• Taylor’s formula, approximation of functions

• integral of functions. the relation between integration and differentiation.

• basic differential equations.

• numerical sequences and series, complex numbers

What is analysis and why study it


In real-world science, it is crucial to study quantitative aspects of the subject. When a
quantity changes with time, one can study its change over a short time (⇒ differentiation) and then
sum it up (⇒ integration). Another important problem is optimization: maximizing benefit or
minimizing cost.
• Mechanics, the equation of motion F(x, t) = m d²x/dt² (= ma)

• Electrodynamics, thermodynamics, fluid mechanics (Mathematical Analysis II)

• Epidemiology (the SIR model dS/dt = −βS(t)I(t)/N, dI/dt = βS(t)I(t)/N − γI(t), dR/dt = γI(t); see the sketch below)
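The graph of the SIR model in Figure 1 was produced with Python (see the tips above). As an illustration only, and not the original script, here is a minimal forward-Euler sketch of the SIR equations; the parameter values beta, gamma, N and the initial data are assumptions chosen just for the demonstration.

# Minimal forward-Euler integration of the SIR model (illustrative sketch).
# dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
def sir(beta=0.3, gamma=0.1, N=1000.0, I0=1.0, days=160, dt=0.1):
    S, I, R = N - I0, I0, 0.0
    trajectory = [(0.0, S, I, R)]
    for k in range(1, int(days / dt) + 1):
        dS = -beta * S * I / N
        dI = beta * S * I / N - gamma * I
        dR = gamma * I
        S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
        trajectory.append((k * dt, S, I, R))
    return trajectory

for t, S, I, R in sir()[::200]:   # print the state every 20 days
    print(f"t={t:6.1f}  S={S:8.1f}  I={I:8.1f}  R={R:8.1f}")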

Mathematical symbols
We use symbols for general mathematical objects. Before using a symbol, we declare what
kind of object it is. For example,

• x, y, z, a, b, c often for numbers (but be careful of the declaration)

• f, g, h, F, G, H often for “functions” (which we will study later)

A symbol might be "recycled", that is, it can be declared to be something different (unfortunately,
we have only 26 × 2 letters).
Symbols are very useful because we can express general properties of certain mathematical
objects at the same time, without specifying them every time.

Integers and rational numbers
We assume that we know

• integers: 0, 1, 2, 3, · · · , 100, 101, · · · , 492837498 · · · , −1, −2, −3, · · ·

• rational numbers: 1/2, 2/3, · · · , 62518/26543, −3028746/23, · · · (integers are also rational numbers)

• calculations between them (sum, difference, product, division, order)

On rational numbers, we have the set of operations + (summation), · (product): For x, y, z


rational numbers (declaration), x + y and x · y are again rational numbers and they satisfy

• (commutativity) x + y = y + x, x · y = y · x

• (associativity) (x + y) + z = x + (y + z), (x · y) · z = x · (y · z)

• (distributive law) (x + y) · z = xz + yz

• (zero and unity) There are special distinct rational numbers, called 0 and 1, such that
x + 0 = x and x · 1 = x.

• (negative) There is only one rational number, which we call −x, such that x + (−x) = 0.

• (inverse) If x ≠ 0, there is only one rational number, which we call x⁻¹, such that x · x⁻¹ = 1.

We often simply write xy for x · y and x − y for x + (−y). xy⁻¹ is also written as x/y.

Exercises. Take concrete rational numbers and check these properties!
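One convenient way to do this exercise on a computer is exact rational arithmetic. The following is an illustrative sketch (not part of the original notes) that uses Python's fractions module to spot-check the properties on a few concrete rational numbers.

from fractions import Fraction
from itertools import product

# Spot-check the listed properties on a few concrete rational numbers.
samples = [Fraction(1, 2), Fraction(-3, 4), Fraction(5), Fraction(0)]

for x, y, z in product(samples, repeat=3):
    assert x + y == y + x and x * y == y * x            # commutativity
    assert (x + y) + z == x + (y + z)                   # associativity of +
    assert (x * y) * z == x * (y * z)                   # associativity of ·
    assert (x + y) * z == x * z + y * z                 # distributive law
    assert x + 0 == x and x * 1 == x                    # zero and unity
    assert x + (-x) == 0                                # negative
    if x != 0:
        assert x * (1 / x) == 1                         # inverse

print("all checks passed")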


Other properties of rational numbers can be derived from these. Indeed, we can prove the
following¹

Theorem 1. Let a, b, c, d be rational numbers.

• if a + b = a + c, then b = c.

• −(−a) = a.

• a(b − c) = ab − ac.

• a · 0 = 0 · a = 0.

• if ab = ac and a ≠ 0, then b = c.

• if a ≠ 0, then a⁻¹ ≠ 0 and (a⁻¹)⁻¹ = a.

• if ab = 0, then a = 0 or b = 0.

• (−a)b = −(ab) and (−a)(−b) = ab.

• if b ≠ 0, d ≠ 0, then a/b + c/d = (ad + bc)/(bd).

• if b ≠ 0, d ≠ 0, then a/b · c/d = (ac)/(bd).

• if a ≠ 0, b ≠ 0, then (a/b)⁻¹ = b/a.
¹ Statements that can be proven are called theorems, and the properties that we assume are called axioms.

Figure 2: A geometric representation of integers and rational numbers (points such as −17/4, −3, 0, 2/3, 2.5 marked on the number line).

Proof. We only prove a few of them and leave the rest as exercises.
Let us assume that a + b = a + c. Then, we take −a and

(a + b) + (−a) = a + (b + (−a)) (associativity)


= a + ((−a) + b) (commutativity)
= (a + (−a)) + b (associativity)
= 0 + b (definition of 0)
= b (property of 0)

Similarly, (a+c)+(−a) = c. But as a+b = a+c, we have b = (a+b)+(−a) = (a+c)+(−a) = c.


Assume that a ≠ 0. We show that a⁻¹ ≠ 0 by contradiction. Indeed, if we had a⁻¹ = 0, then
we would have 0 · a = a⁻¹ · a = 1, a contradiction (since 0 · a = 0 ≠ 1). Therefore, a⁻¹ ≠ 0 and 1 = a · a⁻¹ = a⁻¹ · a,
and by the uniqueness of the inverse of a⁻¹ this gives a = (a⁻¹)⁻¹.
If ab = 0 and a ≠ 0, then we can take a⁻¹ and 0 = a⁻¹ · 0 = a⁻¹ · ab = 1 · b = b.

Integers and rational numbers can be represented on a line.

Sep 22. Concept of sets, set of numbers


Order in rational numbers
There is also an order relation < (“x is larger than y”: y < x) which satisfies, for x, y, z rational,
• if 0 < x, 0 < y, then 0 < xy and 0 < x + y.

• if x < y, then x + z < y + z.

• if x ≠ 0, either 0 < x or x < 0 but not both.

• It is not true that 0 < 0.


x < y and y > x have the same meaning.
We say that x is positive if 0 < x and x is negative if x < 0. If x is not positive, then
either x = 0 or x < 0 and in this case we say x is nonpositive and write x ≤ 0 (again, x ≤ 0
and 0 ≥ x mean the same thing). Similarly, if x > 0 or x = 0, we say x is nonnegative and
write x ≥ 0.
In addition to the “axioms”, we also use the logic that, if an equality or inequality holds for
some x and if x = y, then it also holds for y. Example: if x < z and x = y, then y < z.
Notation: for any number x, we write x2 = x · x. Similarly, x3 = x · x · x, and so on.
With the properties above, we can prove the following.
Theorem 2. Let a, b, c, d rational numbers. Then
• a < b if and only if a − b < 0.

• one and only one of the following holds: a < b, b < a, a = b.

• if a < b, b < c then a < c.

• if a < b and c > 0, then ac < bc.

• if a ≠ 0, then a² (= a · a) > 0.

• 1 > 0.

• if a < b and c < 0, then ac > bc.

• if a < b, then −a > −b.

• if ab > 0, then either a > 0, b > 0 or a < 0, b < 0.

• if a < c and b < d, then a + b < c + d.

Proof. We only prove a few of them and leave the rest as exercises. In general, to show "A if and
only if B", it is enough to show "if A, then B" and "if B, then A": indeed, "A only if B" means
"if not B, then not A", which is the contrapositive of "if A, then B" and hence equivalent to it.
If a < b, then by adding −b to both sides, we get a − b < 0. Conversely, if a − b < 0, by
adding b to both sides we get a < b.
If a = b, then b − a = 0 and we know that both b − a > 0 and b − a < 0 are false and hence
b > a and b < a are false. If a < b, then a − b < 0 and a − b = 0 is false, and hence a = b is false.
If a < b, then 0 < b − a and 0 < c · (b − a) = bc − ac, hence ac < bc.
If a ≠ 0, then either a > 0 or a < 0. For the case a > 0, we have a² = a · a > 0. For the case
a < 0, we have −a > 0 and a² = (−a)² > 0 (using (−a)(−b) = ab).

All these “theorems” about rational numbers should be well-known to you. But it is important
that we could prove them from a few axioms, which we assume to be true.

Exercises Check the remaining statements.

Naive set theory


It is very often convenient to consider sets of numbers. For example, we may consider the set
Q+ of positive rational numbers, or the set of multiples of 2, and so on. In mathematics, a set is
a collection of mathematical objects. The most precise treatment of sets requires a theory called
axiomatic set theory, but in this lecture we think of a set simply as a collection of known objects.
We often use capital letters A, B, C, · · · for sets, but in any case we declare that a symbol is
a set. For a set S, we denote by x ∈ S the statement “x is an element of S”. We have already
seen examples of sets: let us give them special symbols.

• Q: the set of rational numbers

• Z: the set of integers

In general, we can consider two ways of constructing sets.

• By nomination. We can nominate all elements of a set. For example, A = {0, 1, 2, 3} and
B = {1, 10, 100, 1000} are sets.

• By specification. We include all elements of an existing set with specific properties. For
example, A = {x ∈ Z : there is y ∈ Z such that x = 2y} (read that “A is the set of
integers such that there is an integer y such that x = 2y”) is the set of multiples of 2 (we
recycled the symbol A. When we do this, we shall always declare it).

For a set constructed by nomination, the order and repetition do not matter: {0, 1, 2, 3} =
{0, 3, 2, 1} = {0, 0, 1, 1, 1, 2, 3}. In other words, a set is defined by its elements.
A construction by specification appears very often. Let us introduce some more symbols.

• N = {x ∈ Z : x > 0} is called the set of natural numbers.

• ∅ is the set that contains nothing and is called the empty set. ∅ is a subset of any set: if A
is a set, the statement "if x ∈ ∅ then x ∈ A" is satisfied just because there is no such x!

Subsets
Let B be a set. We say that A is a subset of B if all elements of A belong to B, and denote
this by A ⊂ B. It holds that A ⊂ A for any set A.

Example 3. • Let A = {1, 2, 3} and B = {0, 1, 2, 3, 4}. Then A ⊂ B.

• N ⊂ Z.

• Let A = {1, 2, 3, 4, 5, 6}. Then A ⊂ N.

It may happen that A ⊂ B and B ⊂ A, that is, all elements of A belong to B and vice versa.
This means that A and B are the same as sets, and in this case we write A = B.
The definition by specification A = {x ∈ B : x satisfies the property XXX...} always gives a
subset, in this case of B. Note also that the symbol x in this definition has no meaning of its own (it is a "dummy" variable). One
can write it equivalently as A = {y ∈ B : y satisfies the property XXX...}.
For x ∈ A, the set {x} that contains only x should be distinguished from x. It is a subset of
A: {x} ⊂ A.

Unions, intersections, complements


If A and B are sets, then we can consider the set which contains the elements of A and B, and
nothing else. It is called the union of A and B and denoted by A ∪ B.

Example 4. • Let A = {1, 2, 3} and B = {0, 1, 3, 4}. Then A ∪ B = {0, 1, 2, 3, 4}.

Similarly, we can consider the set of all the elements which belong both to A and B, and
nothing else. It is called the intersection of A and B and denoted by A ∩ B.

Example 5. • Let A = {1, 2, 3} and B = {0, 1, 3, 4}. Then A ∩ B = {1, 3}.

Furthermore, the difference of B with respect to A is the set of all elements of A that do not belong
to B and is denoted by A \ B (note that this is different from B \ A).

Example 6. • Let A = {1, 2, 3} and B = {0, 1, 3, 4}. Then A \ B = {2}.

We can consider the union of more than two sets: A ∪ (B ∪ C). By considering the meaning,
this set contains all the elements which belong either A or B ∪ C, which is to say all elements
which belong either A or B or C. Therefore, the order does not matter and we can write A∪B∪C.
Similarly, A ∩ B ∩ C is the intersection of A, B and C.
We may consider a family of sets {Ai}_{i∈I} indexed by another set I. For example, we
can take N as the index set and An = {m ∈ Z : m is a multiple of n}. For a family of sets, we
can define the union and the intersection analogously and we denote them by

∪_{i∈I} Ai ,   ∩_{i∈I} Ai ,

respectively.

Sep 23. Sets and logic
Sets by specification
Let us recall that, if we fix a set A, we can define a subset of A by specification: it is the subset
of elements x of A that satisfy a certain condition ϕ(x):

{x ∈ A : ϕ(x)},

where ϕ(x) is a condition on x. For example, {x ∈ Z : x > 10} is the set of integers larger than
10.
On one hand, we can consider the combined conditions: for example, the condition that
x > 10 and the condition that x < 15 can be considered at the same time. The set of integers
that satisfy both of the condition is

{x ∈ Z : x > 10 and x < 15} = {11, 12, 13, 14}.

On the other hand, we observe that this set is the intersection of two sets:

{x ∈ Z : x > 10} = {11, 12, 13, 14, 15, 16, 17, · · · },
{x ∈ Z : x < 15} = {· · · , −2, −1, 0, 1, · · · , 12, 13, 14}.

This can be generalized as follows. If B = {x ∈ A : ϕ(x)}, C = {x ∈ A : ψ(x)}, then B ∩ C =


{x ∈ A : ϕ(x) and ψ(x)}.
Similarly, the union of two sets is related with “or” as follows. For example,

B = {x ∈ Z : x > 15} = {16, 17, 18, · · · },


C = {x ∈ Z : x < 10} = {· · · − 2, −1, 0, 1, · · · , 8, 9}.

We observe that B ∪ C = {· · · − 2, −1, 0, 1, · · · , 8, 9, 16, 17, 18, · · · }, which is B ∪ C = {x ∈


Z : x > 15 or x < 10}. In general, if B = {x ∈ A : ϕ(x)}, C = {x ∈ A : ψ(x)}, then
B ∪ C = {x ∈ A : ϕ(x) or ψ(x)}.
Let us now consider the negation. For example,

B = {x ∈ Z : x > 15} = {16, 17, 18, · · · }.

Because the negation of x > 15 is x ≤ 15, we have

Z \ B = {· · · − 2, −1, 0, 1, · · · , 14, 15} = {x ∈ Z : x ≤ 15}.

In general, if ¬ϕ(x) is the negation of ϕ(x), then it holds that, for B = {x ∈ A : ϕ(x)},
A \ B = {x ∈ A : ¬ϕ(x)}.
Let us consider the set {x ∈ Q : (x − 1)(x − 3) > 0}. To understand better this set, we need
to understand the condition (x − 1)(x − 3) > 0. The left-hand side is a product of two rational
numbers. The product of two rational numbers is positive if and only if one of the following
cases is true.

• x − 1 > 0 and x − 3 > 0

• x − 1 < 0 and x − 3 < 0.

They are further equivalent to

• x > 1 and x > 3

• x < 1 and x < 3.

Note that x > 1 is true if x > 3. Similarly, x < 3 is true if x < 1. Therefore, these conditions
are equivalent to

• x>3

• x<1

Altogether, we have

{x ∈ Q : (x − 1)(x − 3) > 0} = {x ∈ Q : x > 3 or x < 1} = {x ∈ Q : x > 3} ∪ {x ∈ Q : x < 1}.
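For a finite range of integers, such a specification can be checked directly on a computer. A small illustrative sketch (not part of the notes; the restriction to −10, . . . , 10 is only so that the sets are finite):

# Set-builder ("specification") over a finite range of integers.
candidates = range(-10, 11)

A = {x for x in candidates if (x - 1) * (x - 3) > 0}
B = {x for x in candidates if x > 3}
C = {x for x in candidates if x < 1}

print(A == B | C)   # True: the condition splits into the union of the two cases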

The set of subsets, the set of pairs, graphs


We can consider also certain sets of sets.

Example 7. • {1, 2, 3}, {2}, {1, 4, 6, 7} are sets. We can collect them together

{{1, 2, 3}, {2}, {1, 4, 6, 7}}.

This is a set of sets. It is different from the set of their elements {1, 2, 3, 4, 6, 7}.

• Let A = {1, 2, 3}. We can collect all subsets of A:

{∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

• One can also consider the set of all subsets of N, Z, Q, but we cannot name all the elements:
they are infinite. For example, for N = {1, 2, 3, 4, · · · }, the set of subsets of N is infinite.

For sets A, B, we can consider ordered pairs of elements in A and B.

Example 8. • Let A = {1, 2, 3}, B = {3, 4}. Then the set A × B of the ordered pairs of
A, B is
A × B := {(1, 3), (2, 3), (3, 3), (1, 4), (2, 4), (3, 4)}.

• If we take N, then N × N is the set of all ordered pairs of natural numbers. N × N =


{(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), · · · }.

Ordered pairs can be described using graphs. If A, B ⊂ Z have finitely many points, say
m, n respectively, then there are m · n ordered pairs. We take the horizontal axis for A and the
vertical axis for B.
To obtain the graph of A × B, we should mark the point (x, y) if and only if x ∈ A and
y ∈ B. For any subset X of A × B, we should mark the point (x, y) if and only if (x, y) ∈ X.
See Figure 3.
The graph can be understood in terms of ordered pairs. Let A = {1, 2, 3, 4, 5, 6}, and B =
{(x, y) ∈ A × A : y = 2x}. Let us give all elements of B and draw its graph. We check all
6 × 6 = 36 pairs. See Figure 4; B = {(1, 2), (2, 4), (3, 6)}. Notice that it lies on a straight line!
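As a small computational illustration (a sketch, not part of the notes), the same check of all 36 pairs can be done with itertools.product:

from itertools import product

A = {1, 2, 3, 4, 5, 6}

# All 36 ordered pairs of A x A, then the subset satisfying y = 2x.
pairs = set(product(A, A))
B = {(x, y) for (x, y) in pairs if y == 2 * x}

print(len(pairs))  # 36
print(sorted(B))   # [(1, 2), (2, 4), (3, 6)]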

Sep 27. Real numbers


Are rational numbers all we need?
It is true that, in the real world, we can measure quantities to a certain accuracy, so we get
numbers in a decimal representation:

• c = 299792458[m · s−1 ] (the speed of light)

Figure 3: Left: the set of all ordered pairs {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5}. Right: a subset
{(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)} ⊂ {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5}.

Figure 4: The set of all ordered pairs (x, y) ∈ {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6} with y = 2x.

Figure 5: Left: the right triangle with equal sides 1. By Pythagoras' theorem, the longest side
is √(1² + 1²) = √2. Right: the unit circle with radius 1 (diameter 2). The length of the circle
(the circumference) is 2π.

• G = 0.0000000000667430(15) [m³·kg⁻¹·s⁻²] (the gravitational constant), where (15) means that the last digits are uncertain.
• Any other measured quantity in the real world.
And any experiment has a certain accuracy, so it makes sense only to write a certain number of
digits, so rational numbers seem to suffice.
But for certain cases, we know that we should consider irrational numbers. For example,

• √2 = 1.41421356 · · · , the number x such that x² = 2.
• π = 3.1415926535 · · · , the circumference of the circle with diameter 1.
• e = 2.718281828 · · · , Napier’s number (we will define it in the lecture).
• Any decimal number which is not repeating.
For the next theorem, we need a proof by contradiction: by assuming the negation of the
conclusion, we derive a contradiction; then we can conclude that the negation of the conclusion
is false, that is, the conclusion is correct.
Recall that an integer p is even if it is a multiple of 2 (there is another integer r such that
p = 2r), and p is odd if it is not even.

Theorem 9. √2 is not a rational number.

Proof. We prove this by contradiction, that is, we assume that √2 is a rational number. So there
are integers p, q such that √2 = p/q. We may assume that this fraction is already reduced (that is, not a
fraction like 4/8 but like 1/2: the form which you cannot simplify further).
As √2 · √2 = 2, we have (p/q) · (p/q) = p²/q² = 2, hence p² = 2q². As p/q is reduced, there are two cases.

• if p is odd, then the equality p² = 2q² reads even = odd, a contradiction.

• if p is even, then q is odd and we can write p = 2r with another integer r, and
p² = 4r² = 2q², hence 2r² = q². This is even = odd, a contradiction.

So, in all cases we arrived at a contradiction from the assumption that √2 is rational. This means
that √2 is irrational.


Exercise. Prove that 2√2 is irrational.
It has been proven that π and e are irrational, but the proofs are more difficult. Instead, it can be
easily proven that any nonrepeating decimal number cannot be rational. This means there are
many irrational numbers.
In other words, the set of rational numbers has "many spaces between them". We should fill
them in with irrational numbers, so that the set of real numbers is a "continuum".

The axioms of the real numbers


Here we start the study of Mathematical Analysis, based on the set of real numbers. Our
approach is synthetic, in the sense that we take the axioms for real numbers for granted, and
develop the theory on them. It is also possible to “costruct” real numbers from rational numbers,
and rational numbers from integers, integers from natural numbers, and so on, but at some point
we have to assume certain axioms for simpler objects. If you are interested, look at “Dedekint’s
cut” (for real numbers), or “Peano’s axioms” (for natural numbers).
We assume that the set R of real numbers is equipped with the operations + (sum) and ·
(product): for x, y, z real numbers, x + y and x · y are again real numbers and they satisfy
(just the same properties as for the rational numbers Q)
• (commutativity) x + y = y + x, x · y = y · x

• (associativity) (x + y) + z = x + (y + z), (x · y) · z = x · (y · z)

• (distributive law) (x + y) · z = xz + yz

• (zero and unity) There are special distinct real numbers, called 0 and 1, such that
x + 0 = x and x · 1 = x.

• (negative) There is only one real number, which we call −x, such that x + (−x) = 0.

• (inverse) If x ≠ 0, there is only one real number, which we call x⁻¹, such that x · x⁻¹ = 1.
There is also an order relation < which satisfies, for x, y, z real,
• if 0 < x, 0 < y, then 0 < xy and 0 < x + y.

• if x < y, then x + z < y + z.

• if x ≠ 0, either 0 < x or x < 0 but not both.

• It is not true that 0 < 0.
We can prove Theorems for real numbers corresponding to Theorems 1, 2. Therefore, the real
numbers have the same properties as the rational numbers, concerning the sum, product and
order.
We say that S ⊂ R is bounded above if there is x ∈ R such that for any y ∈ S it holds
that y ≤ x, and we write S ≤ x. S is said to be bounded below if there is x ∈ R such that for
any y ∈ S it holds that y ≥ x, and we write S ≥ x.
If S is both bounded above and below, we say that S is bounded.
If S is bounded above, then any x ∈ R such that S ≤ x is called an upper bound of S.
Similarly, if x ≤ S, then x is said to be a lower bound of S.
If S has a least upper bound, that is there is x such that S ≤ x and x ≤ y for any upper
bound y of S, then x is called the supremum of S and we denote it by x = sup S. Similarly, if
S has a largest lower bound x, then it is called the infimum of S and we denote it by x = inf S.
R includes Z and Q: 1 ∈ R, hence 2 = 1 + 1, 3 = 1 + 1 + 1, · · · and −1, −2, · · · ∈ R. Also, if
p, q ∈ Z and q ≠ 0, then p/q ∈ R.
What distinguishes R from Q is the following.

Figure 6: The set S approximating √2, which is bounded by 1.5.

Figure 7: Any x ∈ R falls between n − 1 and n (including equality) for some n ∈ N. For any
x > 0, there is n ∈ N such that 1/n < x.

• (the least upper bound axiom, or the completeness axiom) every nonempty subset S of R
which is bounded above has a supremum: there is B ∈ R such that B = sup S.

This should imply that √2 = 1.41421356 · · · belongs to R! Indeed, let us take, by chopping
the digits of √2, S = {1, 1.4, 1.41, 1.414, 1.4142, · · · }. S is bounded above, indeed, 1.5 >
1, 1.4, 1.41, 1.414, · · · . On the other hand, if an upper bound x has a decimal representation, e.g. 1.415, then
there is a smaller upper bound x0 = 1.4149. So, sup S should be exactly √2. We will see this more
precisely later.
(A lemma is a theorem (a consequence of axioms) used to prove a more important theorem)

Lemma 10. If S ⊂ R is bounded above and B = sup S, then for any ε > 0, there is x ∈ S such
that B − ε < x.

Proof. By contradiction, assume that there is ε > 0 such that B − ε ≥ x for all x ∈ S. Then B
is not the least upper bound, because B − ε is an upper bound of S and B − ε < B.

Theorem 11 (Archimedean property). The set N = {1, 2, 3, · · · } is not bounded above.

Proof. By contradiction, assume that N were bounded above. Then by the completeness axiom,
there is x = sup N. By the lemma above, for ε = 1/2, there is n ∈ N such that x − 1/2 < n. But
then x < n + 1/2 < n + 1 ∈ N, and this contradicts the assumption that x is an upper bound
of N. This implies that N is not bounded above.

(A corollary is a theorem which follows easily from a more complicated theorem)

Corollary 12. For any x ∈ R, there is n ∈ N such that x < n. For any y, z ∈ R with z > y,
there is n ∈ N such that 1/n < z − y.

Proof. By the theorem above, x is not an upper bound of N, so there is n such that x < n. By
applying this to 1/(z − y), there is n such that 1/(z − y) < n, which implies that 1/n < z − y.

Therefore, we can represent the set of real numbers by a straight line: any point x ∈ R
is on the line and falls between two consecutive integers n − 1 and n (possibly x = n). Conversely,
any point on the line gives an element of R.
Any real number x ∈ R has a decimal representation (next lecture).
Note that Q does not have the completeness property!

Proposition 13. Let A = {x ∈ Q : x² < 2} ⊂ R. Then A is bounded above, and s² = 2, where
s = sup A.

Proof. A is bounded above: indeed, if x² < 2, then x² < 4 = 2², and hence x < 2.
Let s = sup A ∈ R. Then s² = 2. We prove this by contradiction.

• if s² < 2, then we take ε > 0 such that 0 < ε < (2 − s²)/s (or εs < 2 − s²) and ε < s. Then
(s + ε/4)² = s² + sε/2 + ε²/16 < s² + sε/2 + sε/2 = s² + sε < 2, therefore, s is not an upper bound of
A (because s + ε/4 ∈ A), contradiction.

• if s² > 2, then we take ε > 0 such that 0 < ε < (s² − 2)/s (or εs < s² − 2) and ε < s. Then
(s − ε/4)² = s² − sε/2 + ε²/16 > s² − sε > 2, therefore, s is not the least upper bound of A
(because s − ε/4 is another upper bound of A, smaller than s), contradiction.

But we know that there is no rational number s ∈ Q such that s² = 2. Hence s = sup A ∉ Q.

This also says that s = √2 belongs to R.

Sep 29. Some sets in real numbers.


Intervals
In the set of real numbers, we can consider intervals: let a, b ∈ R and a < b. We introduce
• (a, b) = {x ∈ R : a < x, x < b} (an open interval)

• (a, b] = {x ∈ R : a < x, x ≤ b}

• [a, b) = {x ∈ R : a ≤ x, x < b}

• [a, b] = {x ∈ R : a ≤ x, x ≤ b} (a closed interval)

• (a, ∞) = {x ∈ R : a < x}

• [a, ∞) = {x ∈ R : a ≤ x}

• (−∞, b) = {x ∈ R : x < b}

• (−∞, b] = {x ∈ R : x ≤ b}
Remember that, a, b are given numbers, and x is a “dummy” number. You can write them
in a different way, without using x:
• (a, b) is the set of all numbers larger than a and smaller than b

• [a, b] is the set of all numbers larger than or equal to a and smaller than or equal to b
Example 14. Consider (0, 1).
• 0.1, 0.2, 0.5, 0.999 ∈ (0, 1).

• 0, 1, 2, 3, 10, −1, −2 ∉ (0, 1).

• sup(0, 1) = 1.

• inf(0, 1) = 0.
Consider [0, 1].
• 0, 0.1, 0.2, 0.5, 0.999, 1 ∈ [0, 1].

• 2, 3, 10, −1, −2 ∉ [0, 1].

• sup[0, 1] = 1.

• inf[0, 1] = 0.

Figure 8: Open and closed intervals (0, 1) and [0, 1]. The open interval does not include the
endpoints 0, 1, while the closed interval [0, 1] does.

Figure 9: Intervals and their operations. Top: (1, 2) + 3 = (4, 5). Middle: (1, 2) + [2.4, 2.6] =
(3.4, 4.6). Bottom: 2[−1, 1) = [−2, 2).

Operations on sets
Let A, B be subsets of R and a ∈ R. We denote various subsets R as follows.

• A + a = {x ∈ R : x = y + a for some y ∈ A} = {y + a : y ∈ A}

• A − a = {x ∈ R : x = y − a for some y ∈ A} = {y − a : y ∈ A}

• aA = {x ∈ R : x = ay for some y ∈ A} = {ay : y ∈ A}

• A + B = {x ∈ R : x = y + z for some y ∈ A, z ∈ B} = {y + z : y ∈ A, z ∈ B}

• A − B = {x ∈ R : x = y − z for some y ∈ A, z ∈ B} = {y − z : y ∈ A, z ∈ B}

• AB = {x ∈ R : x = yz for some y ∈ A, z ∈ B} = {yz : y ∈ A, z ∈ B}

We write a < x < b as a shorthand notation for a < x and x < b.

Example 15. • Consider A = (0, 1), x = 2. Then A + x = (2, 3), because if 0 < y < 1, 2 <
y + 2 < 3. Note that the boundary 2, 3 is not included.

• Consider A = [1, 2], B = (2.4, 2.6). Then A + B = (3.4, 4.6). Note that the boundary points 3.4, 4.6
are not included, because there is no x ∈ A, y ∈ B such that x + y = 3.4 or 4.6.

• Consider A = [−1, 1), a = 2. Then 2A = [−2, 2).

Some properties of upper and lower bounds
Note that sup A, inf A are only defined for nonempty sets (otherwise the definition is meaningless).
Lemma 16. If x, y ∈ R and x − ε < y for any ε > 0, then x ≤ y.

Proof. By contradiction. If x > y, then by the Archimedean property, we have n such that 1/n < x − y,
in other words, x − 1/n > y, which contradicts the assumption that x − ε < y for arbitrary ε > 0.

Theorem 17. Let A, B ⊂ R and define C = A + B.


• if A, B are bounded above, then A + B is bounded above and sup A + sup B = sup C.

• if A, B are bounded below, then A + B is bounded below and inf A + inf B = inf C.
Proof. We prove only the first one, because the second one is analogous.
By the completeness axiom, A and B have the supremum sup A, sup B. As sup A and sup B
are upper bounds of A and B respectively, for any element z ∈ C we have x ∈ A, y ∈ B such
that z = x + y and x ≤ sup A, y ≤ sup B hence z = x + y ≤ sup A + sup B. In particular,
sup A + sup B is an upper bound of C, hence sup C ≤ sup A + sup B.
Conversely, we know from Lemma 10 that, for any ε > 0, there is x ∈ A (and y ∈ B) such
that sup A − ε/2 < x (and sup B − ε/2 < y). Therefore, sup A + sup B − ε/2 − ε/2 = sup A + sup B − ε <
x + y ≤ sup C for arbitrary ε > 0, hence by Lemma 16, sup A + sup B ≤ sup C. Altogether,
sup C = sup A + sup B.

Remember that sup A is the least (smallest) upper bound and inf B is the greatest (largest)
lower bound.
Theorem 18. Let A, B ⊂ R. If for any x ∈ A and y ∈ B it holds that x < y, then sup A ≤ inf B.
Proof. Any y ∈ B is an upper bound of A, hence sup A ≤ y. This means that sup A is a lower
bound of B, hence sup A ≤ inf B.

The square roots of real numbers


Theorem 19. For any a ∈ R, a > 0, there is s ∈ R, s > 0 such that s² = a.

Proof. Let A = {x ∈ R : x² < a} ⊂ R. Then A is bounded above: indeed, as x² < a, there are
two cases:

• if a > 1, then x² < a² and hence x < a.

• if a ≤ 1, then x² < 1 and hence x < 1.

In either case, A is bounded above.
Note that A is not empty, because 0 ∈ A. Let s = sup A ∈ R. We have s ≠ 0, because we can take
n large enough by the Archimedean property that 1/n < a, therefore (1/n)² < a (because 1/n ≤ 1),
therefore 1/n ≤ s.
We prove s² = a by contradiction.

• if s² < a, then we take ε > 0 such that 0 < ε < (a − s²)/s (or εs < a − s²) and ε < s. Then
(s + ε/4)² = s² + sε/2 + ε²/16 < s² + sε/2 + sε/2 = s² + sε < a, therefore, s is not an upper bound of
A (because s + ε/4 ∈ A), contradiction.

• if s² > a, then we take ε > 0 such that 0 < ε < (s² − a)/s (or εs < s² − a) and ε < s. Then
(s − ε/4)² = s² − sε/2 + ε²/16 > s² − sε > a, therefore, s is not the least upper bound of A
(because s − ε/4 is another upper bound of A, smaller than s), contradiction.

We denote it by s = √a.
For any n ∈ N, we can define the n-th root of any positive number a and we denote it by a^{1/n}.
The existence can be proved similarly.
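The supremum in the proof can also be approximated numerically. A minimal sketch (an illustration under the assumption a > 0, not part of the notes): bisection keeps an interval [lo, hi] with lo² < a ≤ hi², so lo always stays in the set A and hi stays an upper bound, and the interval shrinks towards sup A = √a.

def sqrt_by_bisection(a, steps=60):
    """Approximate sup{x >= 0 : x*x < a} by bisection (assumes a > 0)."""
    lo, hi = 0.0, max(a, 1.0)          # lo*lo < a <= hi*hi holds initially
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < a:
            lo = mid                   # mid is still in the set A
        else:
            hi = mid                   # mid is still an upper bound of A
    return (lo + hi) / 2

print(sqrt_by_bisection(2))            # ~1.41421356...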

Decimal representation of real numbers


We denote N0 = N ∪ {0}.
Any (positive) real number x ∈ R can be written in the form x = a0 + a1/10 + a2/100 + · · · , where
a0 is an integer and a1, a2, · · · are integers between 0 and 9 (negative numbers are most
commonly written with a minus sign, as in −√2 = −1.41421 · · · , although an analogous representation applies to
negative numbers as well).
Examples:

• 1/3 = 0.33333 · · ·

• √2 = 1.41421 · · ·

• π = 3.14159 · · ·

Indeed, let x ∈ R be a real number and x > 0. By the Archimedean property, there is a
natural number n ∈ N0 such that n − 1 ≤ x < n (this is possible because any nonempty subset of N has
a minimal element, which we prove below). We take a0 = n − 1.
Note that 0 ≤ x − a0 < 1. Therefore, 0 ≤ 10(x − a0) < 10. Take a1 ∈ N0 the largest
such that a1 ≤ 10(x − a0). As it is the largest, we have again 0 ≤ 10(x − a0) − a1 < 1.
We can repeat this procedure and obtain an, and it always holds that x − a0.a1 · · · an < 0.0···01 (n digits) = 10^{−n}.
Let A = {a0, a0.a1, a0.a1a2, a0.a1a2a3, · · · }. This A is bounded (by a0 + 1), hence it has a
supremum s. Note that x is an upper bound of A, hence sup A ≤ x. On the other hand, for any
ε = 10^{−n}, we have x − ε < a0.a1 · · · an ∈ A, therefore, x ≤ sup A. Altogether, x = sup A = s.

Proposition 20. A real number that has a nonrepeating decimal representation is irrational.

Proof. We prove that any (positive) rational number has a repeating decimal representation.
Then the claim follows by contradiction.
Let x = a0.a1a2 · · · = p/q, p, q ∈ N. We can write p = nq + r, where n, r ∈ N0 and 0 ≤ r < q
(division with remainder). We set a0 = n. Then we write 10r = n1·q + r1 again, and set a1 = n1.
In this way, we obtain the decimal representation of p/q, but there are only finitely many possible
values 0, 1, · · · , q − 1 of the remainders r1, r2, · · · because we are doing the division with remainder by q. This means
that the digits repeat after at most q steps.

The converse of this (any irrational number has a nonrepeating decimal representation) will
be proven later.

Sep 30. Natural numbers and induction.


Mathematical induction
The set N of natural numbers can be characterized by the Peano axioms:

• 1∈N

• For every n ∈ N, n + 1 ∈ N

• For every n ∈ N, n + 1 ≠ 1

• Let S ⊂ N. If 1 ∈ S and n + 1 ∈ S for any n ∈ S, then S = N.

In other words, N consists of 1 and all other numbers obtained by adding 1 repeatedly to 1, and
that is all. This is the precise definition of N.
With this characterization, we obtain the mathematical induction. Let ϕ(n) be a set of
propositions depending on n ∈ N. If ϕ(1) is true, and if we can prove ϕ(n + 1) from ϕ(n), then
ϕ(n) is true for all natural numbers. Indeed, let S = {n ∈ N : ϕ(n) is true }. S is a subset of N,
1 ∈ S and if n ∈ S, then n + 1 ∈ S. From the Peano axioms, we have S = N. In other words,
ϕ(n) holds for all n ∈ N.

Example 21. n² ≥ 2n − 1 for all n.

Indeed, we apply mathematical induction to ϕ(n) = "n² ≥ 2n − 1". With n = 1, we have
1 ≥ 2 · 1 − 1 = 1.
If we assume that this holds for n, then (n + 1)² = n² + 2n + 1 ≥ 2n − 1 + 2n + 1 = 4n =
2n + 2n ≥ 2n + 1 = 2(n + 1) − 1, therefore, we proved ϕ(n + 1) from ϕ(n). We can now conclude
that ϕ(n) is true for all n ∈ N.
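A quick sanity check of the statement for small n (an illustrative sketch only; it does not, of course, replace the induction proof):

# Check n^2 >= 2n - 1 for the first thousand natural numbers.
assert all(n * n >= 2 * n - 1 for n in range(1, 1001))
print("n^2 >= 2n - 1 holds for n = 1, ..., 1000")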

Exercise: prove that n > 0 for all n ∈ N.

The well-ordering principle


First we need the following.

Lemma 22. Let m, n ∈ N, m > n. Then m − n ∈ N.

Proof. This is proved by a double induction. Let ϕ(m, n) = “if m > n, then m − n ∈ N”. Let us
first set n = 1, m = 1. In this case, m > n is not true, so we do not have to prove anything.
Assume that ϕ(m, 1) is true, that is, if m > 1, then m − 1 ∈ N. To prove ϕ(m + 1, 1), assume
that m + 1 > 1, but m + 1 − 1 = m ∈ N. By induction, ϕ(m, 1) is true for all m ∈ N.
Assume that ϕ(m, n) is true for all m ∈ N. Assume that m > n + 1. Then, m − 1 > n > 0,
and m − 1 ∈ N by ϕ(m, 1). Then by ϕ(m − 1, n), m − (n + 1) = m − 1 − n ∈ N. that is, we
proved ϕ(m, n + 1).

We combine a proof by contradiction and mathematical induction.

Theorem 23. For any nonempty subset S ⊂ N, there is a smallest element in S. That is,
there is n ∈ S such that n ≤ m for all m ∈ S.

Proof. Let us call A the assumption that S is not empty.


Let us assume the contrary, that S does not admit the smallest element (call this assumption
B). It means that, for any n ∈ S, there is m ∈ S such that m < n.
Let T = {n ∈ N : m > n, for all m ∈ S}. We show that T = N by induction.

• First, 1 ∈ T. To prove this, assume that 1 ∉ T (call this C1). Then, there must be m ∈ S
such that m ≤ 1. This means 1 ∈ S. But 1 is the smallest natural number, so it would be the smallest
element of S, contradicting B. Therefore, C1 is false and we obtain 1 ∈ T.

• Next, let n ∈ T and we prove that n + 1 ∈ T. Assume that n + 1 ∉ T (call this Cn). Then,
there is m ∈ S such that m ≤ n + 1, but since n ∈ T, it must hold that n < m. This means
that m = n + 1 by the previous lemma, and any ℓ ≤ n does not belong to S. Therefore,
m = n + 1 would be the smallest element of S, contradicting B. Therefore, Cn is false and
we obtain n + 1 ∈ T.

Then by induction (the Peano axioms) we have T = N. This implies that any m ∈ S satisfies
m > n for all n ∈ T = N. But there is no such m (larger than any natural number),
hence S = ∅. This contradicts the assumption A of the theorem. Therefore, the assumption B
made in the proof is wrong. That is, S admits a smallest element.

Alternatively, this can be proved as follows, using the axiom of the least upper bound
(the proof above uses only the Peano axioms). As N is bounded below, S is bounded below as
well. Let a = inf S. We show that a ∈ S. If not, for ε = 1/2 there is n ∈ S such that n < a + 1/2,
and since a ∉ S also n > a. Because a = inf S and a ∉ S, there is also m ∈ S with a < m < n.
But then 0 < n − m < 1/2, which is impossible by the previous lemma (n − m ∈ N implies n − m ≥ 1).

Corollary 24. Let x ∈ R, x > 0. There is n ∈ N such that n − 1 ≤ x < n.

Proof. By the Archimedean principle, there is n such that x < n. Therefore, the set {m ∈ N :
x < m} is nonempty, and by the well-ordering principle, it has the smallest element n. As this
is the smallest element, n − 1 ≤ x.

We have used this property before to find the decimal representation of x.

The summation and product notations


Assume that we have a sequence of numbers, that is a family {an }n∈S of real numbers indexed
by S ⊂ N. This means that we have numbers a1 , a2 , a3 , · · · . Sometimes we start the index from
0, and have a0 , a1 , a2 , · · · .

Example 25. • a1 = 1, a2 = 2, a3 = 3, · · ·

• a1 = 1, a2 = 4, a3 = 9, · · ·

• a1 = 4, a2 = 2534, a3 = 3/361 (a finite sequence stops at some n ∈ N)

When we have a (finite) sequence, we can sum all these numbers up: a1 + · · · + an. We denote
this by the following symbol:

Σ_{k=1}^{n} ak = a1 + · · · + an.

In this symbol, k is a dummy index and plays no specific role. We have

Σ_{k=1}^{n} ak = Σ_{k=0}^{n−1} a_{k+1} = a1 + · · · + an.

On the other hand, the number on top of the Σ (n in this example) is where the sequence stops.
Similarly, we can define Σ_{k=m}^{n} ak = am + a_{m+1} + · · · + an for n ≥ m.
More precisely, this is a recursive definition: we define Σ_{k=1}^{1} ak = a1 and Σ_{k=1}^{n+1} ak = Σ_{k=1}^{n} ak + a_{n+1}.
Similarly to mathematical induction, in this way we define Σ_{k=1}^{n} ak for all natural numbers
n ∈ N.

Example 26. • a1 = 1, a2 = 2, a3 = 3. Σ_{k=1}^{3} ak = 1 + 2 + 3 = 6.

• a1 = 1, a2 = 4, a3 = 9, a4 = 16. Σ_{k=1}^{4} ak = 1 + 4 + 9 + 16 = 30.

Let us also introduce a symbol for products:

Π_{k=1}^{n} ak = a1 · a2 · · · · · an.
Example 27. • a1 = 1, a2 = 2, a3 = 3. Π_{k=1}^{3} ak = 1 · 2 · 3 = 6.

• a1 = 1, a2 = 4, a3 = 9, a4 = 16. Π_{k=1}^{4} ak = 1 · 4 · 9 · 16 = 576.
In particular, we denote:

• For a ∈ R, a^n = Π_{k=1}^{n} a. For example, a^1 = a, a^2 = a · a, a^3 = a · a · a. By convention, for
a ≠ 0, we set a^0 = 1.

• n! = Π_{k=1}^{n} k = 1 · 2 · · · · · n. By convention, we set 0! = 1.
For example, 2! = 2, 3! = 6, 4! = 24, · · · .

• For n, k ∈ N, n ≥ k, we define \binom{n}{k} = n!/(k!(n−k)!). For example, \binom{4}{2} = 4!/(2! 2!) = 6.

Some useful formulas


The summation formulas
Proposition 28. We have the following.
• Σ_{k=1}^{n} k = n(n+1)/2.

• Σ_{k=1}^{n} k^2 = n(n+1)(2n+1)/6.

• For a ≠ 1, Σ_{k=1}^{n} a^k = a(1 − a^n)/(1 − a).

Proof. We prove them by induction.

• Σ_{k=1}^{1} k = 1 = (1 · 2)/2 is correct. Assume the formula Σ_{k=1}^{n} k = n(n+1)/2 for n; then

Σ_{k=1}^{n+1} k = Σ_{k=1}^{n} k + (n + 1)
= n(n + 1)/2 + (n + 1)
= (n/2 + 1)(n + 1)
= (n + 2)(n + 1)/2.

Then by induction the formula holds for all n ∈ N.

• Σ_{k=1}^{1} k^2 = 1 = (1 · 2 · 3)/6 is correct. Assume the formula Σ_{k=1}^{n} k^2 = n(n+1)(2n+1)/6 for n; then

Σ_{k=1}^{n+1} k^2 = Σ_{k=1}^{n} k^2 + (n + 1)^2
= n(n + 1)(2n + 1)/6 + (n + 1)^2
= (n(2n + 1)/6 + (n + 1))(n + 1)
= (2n^2 + n + 6n + 6)(n + 1)/6
= (2n + 3)(n + 2)(n + 1)/6
= (2(n + 1) + 1)((n + 1) + 1)(n + 1)/6.

Then by induction the formula holds for all n ∈ N.

• Σ_{k=1}^{1} a^k = a = a(1 − a)/(1 − a) is correct. Assume the formula Σ_{k=1}^{n} a^k = a(1 − a^n)/(1 − a) for n; then

Σ_{k=1}^{n+1} a^k = Σ_{k=1}^{n} a^k + a^{n+1}
= a(1 − a^n)/(1 − a) + a^{n+1}
= (a − a^{n+1} + a^{n+1} − a^{n+2})/(1 − a)
= a(1 − a^{n+1})/(1 − a).

Then by induction the formula holds for all n ∈ N.
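These closed forms are easy to spot-check numerically; a small illustrative sketch (not part of the notes):

from fractions import Fraction

# Compare the sums with the closed-form expressions for a few values of n.
for n in range(1, 30):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(k * k for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
    a = Fraction(2, 3)  # any a != 1 works; exact arithmetic avoids rounding
    assert sum(a**k for k in range(1, n + 1)) == a * (1 - a**n) / (1 - a)

print("all three formulas check out")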

The binomial theorem

Lemma 29. \binom{n+1}{k} = \binom{n}{k−1} + \binom{n}{k} for n ≥ k.

Proof. We prove this by induction, but in a slightly different form: we prove that the formula is
correct for n = k, and prove that it holds for n + 1 assuming the formula for n. In this way, we
prove the formula for n ≥ k.
If n = k, we have \binom{k+1}{k} = (k+1)!/(k!(k+1−k)!) = k + 1 = k!/(k−1)! + 1 = \binom{k}{k−1} + \binom{k}{k}.
Assuming the formula for n, we have

\binom{n+2}{k} = (n+2)!/(k!(n+2−k)!)
= [(n+2)/(n+2−k)] · \binom{n+1}{k}
= [(n+2)/(n+2−k)] · (\binom{n}{k−1} + \binom{n}{k})
= [(n+2)/(n+2−k)] · (n!/((k−1)!(n−k+1)!) + n!/(k!(n−k)!))
= (n+1)!/((k−1)!(n+2−k)!) + n!/((k−1)!(n+2−k)!) + [(n+2)/(n+2−k)] · n!/(k!(n−k)!)
= (n+1)!/((k−1)!(n+2−k)!) + [n!/(k!(n+1−k)!)] · [k + (n+2)(n+1−k)]/(n+2−k)
= (n+1)!/((k−1)!(n+2−k)!) + [n!/(k!(n+1−k)!)] · [k + (n+1)(n+1−k) + n+1−k]/(n+2−k)
= (n+1)!/((k−1)!(n+2−k)!) + [n!/(k!(n+1−k)!)] · [(n+1)(n+1−k) + n+1]/(n+2−k)
= (n+1)!/((k−1)!(n+2−k)!) + [n!/(k!(n+1−k)!)] · (n+1)(n+2−k)/(n+2−k)
= (n+1)!/((k−1)!(n+2−k)!) + (n+1)!/(k!(n+1−k)!) = \binom{n+1}{k−1} + \binom{n+1}{k}.

Theorem 30. For any a, b ∈ R, n ∈ N, (a + b)^n = Σ_{k=0}^{n} \binom{n}{k} a^k b^{n−k}, where in this theorem we
mean 0^0 = 1.

Proof. By induction. For n = 0, this holds in the sense of 1 = 1.
Assume that this holds for n. Then,

(a + b)^{n+1} = (a + b)^n · (a + b)
= (Σ_{k=0}^{n} \binom{n}{k} a^k b^{n−k}) · (a + b)
= Σ_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n−k} + Σ_{k=0}^{n} \binom{n}{k} a^k b^{n−k+1}
= Σ_{k=1}^{n+1} \binom{n}{k−1} a^k b^{n+1−k} + Σ_{k=0}^{n} \binom{n}{k} a^k b^{n−k+1}
= Σ_{k=1}^{n} (\binom{n}{k−1} + \binom{n}{k}) a^k b^{n+1−k} + a^{n+1} b^0 + a^0 b^{n+1}
= Σ_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1−k}.

For example, we have

• (x + y)^2 = x^2 + 2xy + y^2

• (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3

• (x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4

and so on.
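A quick numerical illustration of the theorem (a sketch, not part of the notes; math.comb gives the binomial coefficient in Python 3.8 and later):

from math import comb

def binomial_expansion(a, b, n):
    """Right-hand side of the binomial theorem: sum of C(n, k) * a^k * b^(n-k)."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))

for n in range(6):
    assert (2 + 5) ** n == binomial_expansion(2, 5, n)
print("(a + b)^n matches the expansion for a=2, b=5, n=0..5")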

Oct. 4. Functions, domains and ranges.


Functions
By a function we usually mean "a map" which assigns to a number x another number f(x), or
an assignment x ↦ f(x).
There are many “real-world” examples of functions: When a quantity changes with time, you
can use x as time (or often you denote it by t) and the quantity by f (x). Or we can plot a set
of data that depend on a parameter (more concretely: you take a path on a mountain and set x
as the horizontal distance from the house and f (x) as the height at the point x).
More precisely, we can consider it as follows: for each number x there is another number
f (x), and nothing else. We can express this situation using ordered pairs.
Let us assume that we know the correspondence x ↦ f(x), defined on a subset ("domain")
S. Then we can draw the graph, namely, the subset {(x, y) ∈ S × R : y = f(x)}, or in other
words, we collect all points (x, y) where y = f(x).
More generally, we can define a function to be a subset f of R × R such that for each x there is
at most one y with (x, y) ∈ f. Also in this case we denote the relation by y = f(x). In this sense,
the graph and the function are the same thing.
Let us introduce some terminology.

• {x ∈ R : there is some (x, y) ∈ f } is called the domain of f .

Figure 10: Left: the graph of y = x. Right: the graph of y = x².

• {y ∈ R : there is some (x, y) ∈ f } is called the range of f .


Example 31. • f(x) = x. Namely, f = {(x, y) ∈ R × R : y = x}. The domain is R, the
range is R.

• f(x) = x². Namely, f = {(x, y) ∈ R × R : y = x²}. The domain is R, the range is [0, ∞).

• f(x) = x⁵ − 2x³ + 1. f = {(x, y) ∈ R × R : y = x⁵ − 2x³ + 1}. The domain is R, the range
is R.

• f(x) = √x for x ≥ 0. Namely, f = {(x, y) ∈ R × R : x ≥ 0, y = √x}. The domain is
[0, ∞), the range is [0, ∞).

• f(x) = √(1 − x) for 1 − x ≥ 0, or x ≤ 1. Namely, f = {(x, y) ∈ R × R : x ≤ 1, y = √(1 − x)}.
The domain is (−∞, 1], the range is [0, ∞).

The set {(x, y) ∈ R × R : x² + y² = 1} is not a function. Indeed, for each x ∈ (−1, 1), there are
two numbers y = √(1 − x²), −√(1 − x²) that satisfy the equation x² + y² = 1.
Let us introduce the absolute value of x ∈ R:

|x| := x if x ≥ 0, −x if x < 0,

where := means we define the left-hand side by the right-hand side. This is also a function with
the domain R and the range [0, ∞).
We define the sign of x ∈ R:

sign x := 1 if x > 0, 0 if x = 0, −1 if x < 0.

We define the factorial of n ∈ N0: f(n) = n!. The domain is N0.

Figure 11: Left: the graph of y = x⁵ − 2x³ + 1. Right: the graph of y = √x.

Figure 12: Left: the graph of x² + y² = 1, not a function of x. Right: the graph of y = √(1 − x²).

Figure 13: Left: the graph of y = |x|. Right: the graph of y = sign x, with a "jump" at x = 0.

Figure 14: The graph of y = 1/x on (−∞, 0) ∪ (0, ∞).

Operations on functions
When we have two or more functions, we can produce more functions. Let f (x) be a function
with domain S and g(x) a function with domain T .

• Sum. We can define the sum h(x) = f (x) + g(x), defined on S ∩ T .


Example: with f (x) = x, g(x) = x2 , h(x) = x + x2 .

• Product. We can define the product h(x) = f (x) · g(x), defined on S ∩ T .


Example: with f (x) = x, g(x) = x2 , h(x) = x3 .
• Division. We can define the division h(x) = f(x)/g(x), defined on S ∩ {x ∈ T : g(x) ≠ 0}.
Example: with f(x) = x + 1, g(x) = (x + 2)(x − 1), h(x) = (x + 1)/((x + 2)(x − 1)), defined on R \
{1, −2} = (−∞, −2) ∪ (−2, 1) ∪ (1, ∞).

• Composition. We can define the composed function h(x) = f(g(x)), defined on {x ∈ T :
g(x) ∈ S}.
Example: with f(x) = √x, g(x) = x + 1, h1(x) = √(x + 1), defined on {x ∈ R : x + 1 ≥ 0}.
Note that this is different from the composition in the reversed order: h2(x) = g(f(x)) =
√x + 1, defined on [0, ∞).

We say that a function f(x) is injective if for any pair x1 ≠ x2 in the domain, it holds that
f(x1) ≠ f(x2). Similarly, we say that a function f(x) is surjective if the range is R. A function
which is both injective and surjective is said to be bijective.
For example, f(x) = x is injective and surjective (hence bijective), but f(x) = x² is neither
injective nor surjective. But if we consider f(x) = x² with the restricted domain [0, ∞), it is
injective: for positive numbers x1 ≠ x2, x1² ≠ x2².
For an injective function f(x), we can define the inverse function f⁻¹: the domain of f⁻¹
is the range R of f, and it assigns to f(x) the number x: it is characterized by f⁻¹(f(x)) = x.
Its graph (its formal definition) is given by {(x, y) ∈ R × R : x is in the range of f, x = f(y)}. The range of f⁻¹
is the domain of f.
For example, consider f(x) = x² on the domain [0, ∞). The range of f is [0, ∞), hence the
domain of f⁻¹ is [0, ∞). For any x ∈ [0, ∞), we should have f⁻¹(f(x)) = f⁻¹(x²) = x, therefore,
f⁻¹(x) = √x.
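A small computational illustration of this inverse and of the order of composition (a sketch, not part of the notes; the helper names f, f_inv, h1, h2 are made up for the example):

import math

def f(x):
    # f(x) = x^2 on the restricted domain [0, inf), where it is injective.
    return x * x

def f_inv(y):
    # The inverse function: f_inv(f(x)) = x for x >= 0.
    return math.sqrt(y)

for x in [0.0, 0.5, 2.0, 10.0]:
    assert abs(f_inv(f(x)) - x) < 1e-12

# Composition order matters (cf. the example above with sqrt and x + 1):
h1 = lambda x: math.sqrt(x + 1)   # sqrt(x + 1), defined for x >= -1
h2 = lambda x: math.sqrt(x) + 1   # sqrt(x) + 1, defined for x >= 0
print(h1(3), h2(3))               # 2.0 and about 2.732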

Figure 15: The graphs of y = √x and y = x² on [0, ∞).

The triangle inequality


Lemma 32. Let x, a ∈ R, a ≥ 0. Then |x| ≤ a if and only if −a ≤ x ≤ a.
Proof. Assume that x ≥ 0.
• If |x| = x ≤ a, then −a ≤ 0 ≤ x ≤ a.
• If −a ≤ x ≤ a, then |x| = x ≤ a.
Instead, if we assume that x < 0, then
• If |x| = −x ≤ a, then −a ≤ x < 0 ≤ a.
• If −a ≤ x ≤ a, then |x| = −x ≤ a.

Theorem 33. For any x, y ∈ R, it holds that |x + y| ≤ |x| + |y|.


Proof. We have −|x| ≤ x ≤ |x|, −|y| ≤ y ≤ |y| by Lemma, therefore, −|x|−|y| ≤ x+y ≤ |x|+|y|,
and again by Lemma this implies that |x + y| ≤ |x| + |y|.

Corollary 34. For any a1, a2, · · · , an ∈ R, it holds that |Σ_{k=1}^{n} ak| ≤ Σ_{k=1}^{n} |ak|.

Proof. By induction. For n = 1, |Σ_{k=1}^{1} ak| = |a1| = Σ_{k=1}^{1} |ak| is obvious. Assuming the
inequality for n, we have

|Σ_{k=1}^{n+1} ak| = |Σ_{k=1}^{n} ak + a_{n+1}|
≤ |Σ_{k=1}^{n} ak| + |a_{n+1}|        (by Theorem 33)
≤ Σ_{k=1}^{n} |ak| + |a_{n+1}|        (by the induction hypothesis)
= Σ_{k=1}^{n+1} |ak|,

which concludes the induction.

Oct 06. Sequence and convergence of sequence.
Convergence of sequences
We saw sequences of real numbers a1 , a2 , · · · . A sequence can be infinite, that is, it continues
infinitely. For example,
• a1 = 1, a2 = 2 and in general, an = n.

• a1 = 1, a2 = 4 and in general, an = n2 .
A sequence can be considered as a function with the domain N.
Among sequences, we have seen the following:
• a1 = 1, a2 = 1/2 and an = 1/n.

• a1 = 1/2, a2 = 3/4 and an = 1 − 1/2^n.

Intuitively, the first of them gets closer and closer to 0, while the second one gets closer and
closer to 1. But what does it mean that it gets closer to a number?
We make precise the notion that a sequence gets "arbitrarily" close to a number as follows.
Definition 35. Let {an} be a sequence of real numbers. If there is L ∈ R such that for each
ε > 0 there is N such that for n ≥ N it holds that |an − L| < ε, we say that {an} converges
to L.
We write this situation as lim_{n→∞} an = L, or simply an → L.
Example 36. Let us see some convergent sequences.
• a1 = 1, a2 = 1/2 and an = 1/n. We expect that this sequence converges to 0. Indeed, for any
ε > 0, there is N such that 1/N < ε (the Archimedean property). Furthermore, if n > N,
then |1/n − 0| = 1/n < 1/N < ε, therefore, with L = 0, we have that {an} converges to 0.

• a1 = 1/2, a2 = 3/4 and an = 1 − 1/2^n. We expect that this sequence converges to 1. Indeed, for
any ε > 0, there is N such that 1/N < ε and note that 1/2^N < 1/N. Furthermore, if n > N,
then 1/2^n < 1/N and hence |1 − 1/2^n − 1| = 1/2^n < 1/N < ε, therefore, with L = 1, we have that
{an} converges to 1.

• The sequence an = 1/√n converges to 0. Indeed, for each ε, there is N such that 1/N < ε,
and hence if n > N², then 1/√n < 1/√(N²) = 1/N < ε.
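The ε–N definition can also be explored numerically. An illustrative sketch (the choice ε = 10⁻³ is arbitrary, the helper first_index is made up for the demo, and it relies on |an − L| being decreasing for these two sequences):

def first_index(a, L, eps):
    """Smallest n with |a(n) - L| < eps (for monotonically converging sequences)."""
    n = 1
    while abs(a(n) - L) >= eps:
        n += 1
    return n

eps = 1e-3
print(first_index(lambda n: 1 / n, 0, eps))          # 1001
print(first_index(lambda n: 1 - 2**(-n), 1, eps))    # 10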

Note that
• If {an} converges to L, then it does not converge to any other number. Indeed, if x ≠ L,
then take N such that |an − L| < |L − x|/2 for n > N. Then by the triangle inequality
|L − x| ≤ |an − x| + |an − L|, and hence |an − x| ≥ |x − L| − |an − L| > |L − x|/2 > 0.
Therefore, {an} does not converge to x.

• The sequence a1 = 1, a2 = 0, a3 = 1, · · · , an = (1 − (−1)^n)/2 does not converge to any
number.

• The sequence a1 = 1, a2 = 2, · · · , an = n does not converge to any number.

• In general, if for any x there is an Nx ∈ N such that for n > Nx it holds that |an| > x,
then we say that {an} diverges.

• The sequence an = 2n diverges.

Figure 16: Top: the sequence an = 1/n plotted on the line. Bottom: the sequence an = 1/n as a
function on N.

Figure 17: The sequence an = (1 − (−1)^n)/2 as a function on N.

Figure 18: The subsequence a_{2^n} of the sequence an = 1/n.

Some properties of convergent sequences


Given a sequence {an}, one can take a subsequence of it. That is, we take an increasing
sequence of natural numbers m1 < m2 < m3 < · · · and define a new sequence bn = a_{mn}.

Example 37. Given an = 1/n and mn = 2^n, the subsequence is a_{2^n} = 1/2^n.

If {an} is convergent to L, then any subsequence {a_{mn}} is convergent to L. Indeed, as
m1 < m2 < m3 < · · · , we have n ≤ mn and hence, for any ε > 0, we take N such that |an − L| < ε
for n > N; hence for all n > N, |a_{mn} − L| < ε.
We say that {an } is nondecreasing (respectively nonincreasing) if an ≤ an+1 (respectively
an ≥ an+1 ) holds for all n ∈ N. A sequence {an } is said to be bounded above (respectively
bounded below) if there is M ∈ R such that an ≤ M (respectively an ≥ M ) for all n ∈ N).

Lemma 38. Let {an } be a nondecreasing sequence and bounded above. Then an converges to a
certain real number L ∈ R.

Proof. Let A = {an : n ∈ N}. As {an} is bounded above, A is bounded above. We put L = sup A.
By Lemma 10, for each ε > 0, there is N ∈ N such that L − ε < aN. As an is nondecreasing,
we have L − ε < an for all n > N. On the other hand, we have an ≤ L because L = sup A.
Altogether, |an − L| < ε for such n. As ε was arbitrary, an converges to L.

Note that |ab| = |a||b|.

Theorem 39. The following hold.

• If an → L, then there is L̃ such that |an| < L̃ for all n.

• If an → L, bn → M, then an + bn → L + M, an · bn → LM. If M ≠ 0, then bn ≠ 0 for
sufficiently large n and an/bn → L/M.

• If an > 0 diverges, then 1/an converges to 0.

Proof. • Assume that an → L. Given, say 1, there is N such that |an − L| < 1 for n > N,
hence |an| < |L| + 1 for n > N. Then, we can take a number L̃ such that |a1|, · · · , |aN| < L̃
and |L| + 1 < L̃.

• Let ε > 0 be arbitrary. There are N1, N2 ∈ N such that for n > N1 (respectively n > N2)
it holds that |an − L| < ε/2 (respectively |bn − M| < ε/2). Let N be the largest of N1, N2.
Then we have

|an + bn − L − M| ≤ |an − L| + |bn − M| < ε/2 + ε/2 = ε,

hence an + bn converges to L + M.
As for the product, given ε > 0, we take N such that |an − L| < ε/(2(|M| + 1)), |bn − M| < ε/(2(|L| + 1))
and |bn| < |M| + 1 for n > N (this can be done as in the case of the sum). Then

|an bn − LM| = |an bn − bn L + bn L − LM| ≤ |(an − L)bn| + |(bn − M)L|
≤ |an − L||bn| + |bn − M||L| < ε/2 + ε/2 = ε,

which shows the desired convergence.
We prove 1/bn → 1/M. If bn → M and M ≠ 0, then |bn − M| < |M|/2 for sufficiently large n,
and hence |bn| > |M|/2, in particular bn ≠ 0. We can now show that 1/bn → 1/M. Indeed, by
taking N such that |bn − M| < εM²/2 for n > N,

|1/bn − 1/M| = |M − bn| / (|M||bn|) < (εM²/2) / (M²/2) = ε,

which shows 1/bn → 1/M.
Now an/bn → L/M follows from this and the product with an.

• For any ε > 0, there is N such that for n > N it holds that |an| > 1/ε, that is, 1/an < ε; hence
1/an converges to 0.

We denote a^{−n} = 1/a^n.

Proposition 40. The following hold.

• Let a > 1. Then a^n diverges.

• Let 0 < a < 1. Then a^n converges to 0.

• Let 0 < a < 1. Then bn = Σ_{k=1}^{n} a^k converges to a/(1 − a).

Proof. • If a > 1, we can write a = 1 + y where y > 0. By the binomial theorem, we have

a^n = (1 + y)^n = Σ_{k=0}^{n} \binom{n}{k} 1^k y^{n−k} ≥ 1 + ny,

by only taking the terms k = n, n − 1. Now it is clear that for any x there is large enough
n such that 1 + ny > x, therefore, x < 1 + ny ≤ a^n, that is, a^n diverges.

• If 0 < a < 1, then 1/a > 1 and (1/a)^n diverges. Therefore, a^n = 1/(1/a)^n converges to 0.

• We know that bn = Σ_{k=1}^{n} a^k = a(1 − a^n)/(1 − a), and a^n → 0, hence bn → a/(1 − a).

Oct. 07. Continuity of functions.


Decimal representation of real numbers
Now that we have defined convergence of sequences, we can make sense of all decimal representations as real numbers.

Theorem 41. Let ak ∈ N0 with 0 ≤ ak ≤ 9. Then bn = Σ_{k=0}^{n} ak 10^{−k} converges to a real
number.

Proof. Let bn = Σ_{k=0}^{n} ak 10^{−k}. This is nondecreasing and bounded above by a0 + 1. By Lemma
38, this converges to a real number.

When the sequence converges, it converges to only one number. In this way, we can say that
a decimal representation a0.a1a2a3 · · · defines a real number.
Now we can prove that any repeating decimal representation gives a rational number. For
example, consider 0.123123123 · · · . This can be written as

0.1 + 0.02 + 0.003 + 0.0001 + 0.00002 + 0.000003 + · · · = Σ_{k=1}^{n} ak 10^{−k} (in the limit n → ∞),

where a1 = 1, a2 = 2, a3 = 3, a4 = 1, a5 = 2, a6 = 3, · · · . It is easy to see that this is equal to
0.123 + 0.000123 + · · · = Σ_{k=1}^{n} (100 a_{3k+1} + 10 a_{3k+2} + a_{3k+3}) 1000^{−k}. We know that this sum
converges and compute

Σ_{k=1}^{n} (100 a_{3k+1} + 10 a_{3k+2} + a_{3k+3}) 1000^{−k} = 123 Σ_{k=1}^{n} 1000^{−k}
→ 123 · 1000^{−1}/(1 − 1000^{−1}) = 123/999.
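This computation is easy to confirm with exact rational arithmetic; a small illustrative sketch:

from fractions import Fraction

# Partial sums of 123 * 1000^(-k) approach 123/999 = 0.123123...
partial = sum(Fraction(123, 1000**k) for k in range(1, 20))
print(Fraction(123, 999))            # 41/333 in lowest terms
print(float(partial))                # 0.123123123...
print(abs(partial - Fraction(123, 999)) < Fraction(1, 10**50))  # True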
Proposition 42. Any real number given by a repeating decimal representation is rational.

Proof. Indeed, let us take a repeating sequence 0 ≤ an ≤ 9 as above. That is, there is m ∈ N
such that a_{n+m} = a_n for all n ≥ 1.
Then, for ℓ ∈ N,

Σ_{k=0}^{ℓm} ak 10^{−k} = a0 + Σ_{j=1}^{ℓ} 10^{−jm} Σ_{k=1}^{m} ak 10^{m−k}
= a0 + (Σ_{k=1}^{m} ak 10^{m−k}) · 10^{−m}(1 − 10^{−ℓm})/(1 − 10^{−m})
→ a0 + (Σ_{k=1}^{m} ak 10^{m−k}) · 10^{−m}/(1 − 10^{−m}) = a0 + (Σ_{k=1}^{m} ak 10^{m−k}) · 1/(10^m − 1)

as ℓ → ∞. The last expression is evidently rational.

Theorem 43. For any real number a there is a sequence an of rational numbers such that
an → a.

Proof. Take the decimal representation of a, truncate it to the n-th digit, and call it an . Then
{an } are rational and an → a.

Continuity of functions
Let us go back to studying functions. Among functions, we saw the sign function

if x > 0

1

sign x := 0 if x = 0
−1 if x < 0

and its graph (Figure 13) has a “jump” at x = 0.


Intuitively, the "jump" means that the value at x = 0 is 0, but if one approaches 0 from
the right, the value of the function remains 1, while it is −1 from the left.
Let us make this precise.

Definition 44. Let f be a function defined on S (the domain), and let a ∈ R such that there is
a sequence xn ∈ S, xn ≠ a, such that xn → a. We write

lim_{x→a} f(x) = L

if for any ε > 0 there is δ > 0 such that |f(x) − L| < ε for any x ∈ S with x ≠ a, |x − a| < δ.
Example 45. Let f(x) = 1 if x ≠ 0, and f(0) = 0.

Figure 19: The limit lim_{x→a} f(x).

• Consider a = 2. Then, for any ε, we can take δ = 1/2 and |f(x) − 1| = |1 − 1| = 0 for any x ∈ (2 − δ, 2 + δ) = (3/2, 5/2). Therefore, lim_{x→2} f(x) = 1. A similar situation holds for any a ≠ 0.

• Consider a = 0. Then, for any x 6= 0, f (x) = 1, hence again we have limx→0 f (x) = 1,
although f (0) = 0 by definition.

• For the function sign x (Figure 13), there is no limit limx→0 f (x) at x = 0.

The limit makes precise the concept of “approaching a point”. The absence of “jump” can
also be formalized using limit.

Definition 46. Let f be a function defined on S (the domain), and let a ∈ S (this time a is in
the domain) such that there is a sequence xn ∈ S, xn 6= a such that xn → a. We say that f is
continuous at a if limx→a f (x) = f (a). We say that f is continuous on S if it is continuous at
each point in S.

Now we can understand the “jumps” in terms of limit and continuity.

Example 47. • The function sign x is not continuous at x = 0, because it does not have
limx→0 sign x.
• The function f(x) = 1 for x ≠ 0, f(0) = 0, is not continuous at x = 0, because lim_{x→0} f(x) = 1 ≠ 0 = f(0).

• The constant function f(x) = c is continuous. Indeed, let us fix a ∈ R. For any ε > 0, |f(x) − c| = |c − c| = 0 < ε, hence lim_{x→a} f(x) = c = f(a).

Figure 20: The graph of y = f(x), where f(x) = 1 for x ≠ 0 and f(0) = 0.

• The function f(x) = x is continuous. Indeed, let us fix a ∈ R. Then, for each ε > 0, we take δ = ε, and for |h| < δ = ε it holds that |f(a + h) − a| = |a + h − a| = |h| < δ = ε; therefore lim_{x→a} f(x) = a = f(a).

Theorem 48. Let f, g be functions defined on S, and let a be such that there is {xn} ⊂ S with xn → a. Assume that lim_{x→a} f(x) = L and lim_{x→a} g(x) = M. Then

• There is δ > 0 such that if |x − a| < δ, x ≠ a, then |g(x)| ≤ |M| + 1.

• lim_{x→a}(f(x) + g(x)) = L + M and lim_{x→a}(f(x)g(x)) = LM.

• Assume that M ≠ 0. Then there is δ > 0 such that |g(x)| > |M|/2 for x such that |x − a| < δ, x ≠ a.

• If M ≠ 0, then lim_{x→a} f(x)/g(x) = L/M.

Furthermore, if a ∈ S and if f, g are continuous at a, then f + g and f·g are continuous at a. If g(a) ≠ 0, then f/g is continuous at a.

Proof. The proof is similar to that of Theorem 39.

• Let δ > 0 be such that |g(x) − M| < 1 for x with |x − a| < δ, x ≠ a. Then |g(x)| < |M| + 1.

• For a given ε > 0, let δ > 0 be such that |f(x) − L| < ε/2 and |g(x) − M| < ε/2 for |x − a| < δ, x ≠ a. Then |f(x) + g(x) − L − M| < ε/2 + ε/2 = ε, which shows the desired limit.
For the product, for a given ε > 0, let δ > 0 be such that |f(x) − L| < ε/(2(|M| + 1)), |g(x) − M| < ε/(2(|L| + 1)) and |g(x)| < |M| + 1 for |x − a| < δ, x ≠ a. Then |f(x)g(x) − LM| ≤ |f(x) − L||g(x)| + |g(x) − M||L| < ε(|M| + 1)/(2(|M| + 1)) + ε|L|/(2(|L| + 1)) < ε, which shows the desired limit.

• Let δ > 0 be such that |g(x) − M| < |M|/2 for x with |x − a| < δ, x ≠ a. Then, by the triangle inequality, |g(x)| ≥ |M| − |g(x) − M| ≥ |M|/2.

• We show that lim_{x→a} 1/g(x) = 1/M; the general case then follows from this and the limit of the product. Assume M ≠ 0 and let ε > 0. There is δ > 0 such that |g(x) − M| < |M|/2 for x ≠ a, |x − a| < δ, and hence |g(x)| > |M|/2, in particular g(x) ≠ 0. Moreover, there is δ̃ > 0, δ̃ < δ, such that for x ≠ a, |x − a| < δ̃ it holds that |g(x) − M| < εM²/2. Then

|1/g(x) − 1/M| = |M − g(x)| / (|M||g(x)|) < (εM²/2) / (M²/2) = ε,

which shows the desired limit.

If f, g are continuous at a, then lim_{x→a} f(x) = f(a) and lim_{x→a} g(x) = g(a), hence lim_{x→a}(f(x) + g(x)) = f(a) + g(a), lim_{x→a} f(x)g(x) = f(a)g(a), and lim_{x→a} f(x)/g(x) = f(a)/g(a).

From this, we know that

• If f(x) = a_0 + a_1 x + · · · + a_n x^n (a polynomial), then f is continuous. For example, f(x) = x², f(x) = x⁵ + 34x³ − 454, ...

• If f(x) = P(x)/Q(x) where P(x), Q(x) are polynomials, then f is continuous at each x with Q(x) ≠ 0. For example, f(x) = x³/(x − 2) is continuous on x ≠ 2 (it is actually defined on {x ∈ R : x ≠ 2}), and f(x) = x³/(x² − 1) = x³/((x − 1)(x + 1)) is continuous on x ≠ −1, 1.

Oct. 11. Properties of continuous functions.


Sequences and continuity of functions
We can use sequences to study functions, especially regarding continuity. Let f be a function
defined on a certain domain S and {xn } a sequence in S. Then we can construct a new sequence
by {f (xn )}.

Theorem 49. Let f be a function defined on S. f is continuous at a ∈ S, that is, limx→a f (x) =
f (a) if and only if it holds that f (xn ) → f (a) for all sequences {xn } in S such that xn → a, xn 6=
a.

Proof. Assume that lim_{x→a} f(x) = f(a). Then, for each ε > 0, there is δ > 0 such that if |x − a| < δ, then |f(x) − f(a)| < ε. Let us take any sequence {xn} such that xn → a. This means that, for the δ above, there is N such that |xn − a| < δ for n > N. Then, by the observation above, we have |f(xn) − f(a)| < ε for n > N. Therefore, for the given ε we found N such that |f(xn) − f(a)| < ε for n > N, that is, f(xn) → f(a).
Conversely, assume that f(xn) → f(a) for all sequences {xn} such that xn → a, xn ≠ a. To argue by contradiction, suppose that there is ε > 0 such that for every δ > 0 there is x ∈ S, x ≠ a, with |x − a| < δ but |f(x) − f(a)| > ε. Taking δn = 1/n, for each δn there is xn ∈ S such that |xn − a| < 1/n, xn ≠ a, but |f(xn) − f(a)| > ε. Then it is clear that xn → a, but f(xn) does not converge to f(a), which contradicts the assumption. Therefore, it must hold that lim_{x→a} f(x) = f(a).

Lemma 50. Let xn → x and xn ≤ a. Then x ≤ a.

Proof. Assume the contrary, that is, x > a. Then there is N such that |xn − x| < (x − a)/2 for n > N, and hence xn − a = (xn − x) + (x − a) > |x − a| − |x − a|/2 = |x − a|/2 > 0, which contradicts xn ≤ a. Therefore, x ≤ a.

It also holds that, if xn → x and xn ≥ a, then x ≥ a.


We can show that, if A ⊂ [a, b], then sup A ∈ [a, b]: indeed, we can take a sequence {xn } ⊂ A
such that xn → sup A by Lemma 10.

Figure 21: Left: continuity of f and a sequence xn → a. Right: the intermediate value c is taken at x.

Theorem 51 (the intermediate value theorem). Let f be a continuous function on a closed


interval [a, b]. Assume that f (a) < f (b). Then, for any value c ∈ (f (a), f (b)), there is x ∈ (a, b)
such that c = f (x).

Proof. Let c ∈ (f(a), f(b)), and define A = {x ∈ [a, b] : f(x) < c}. A is bounded above, because it is contained in [a, b]; therefore, we can take x = sup A. By Lemma 10, for each n there is xn ∈ A such that x − 1/n < xn ≤ x, hence xn → x. Since f is continuous, we have f(x) = lim_{n→∞} f(xn). On the other hand, xn ∈ A, hence f(xn) < c and thus f(x) ≤ c by Lemma 50.
Moreover, x ≠ b because f(b) > c ≥ f(x). Therefore, we can take a sequence xn > x, xn → x, in the interval [x, b], and then f(xn) ≥ c because xn ∉ A. By continuity of f, we have f(x) = lim_n f(xn) ≥ c. Altogether, we have f(x) = c.
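The argument above is close in spirit to the bisection method: repeatedly halving an interval on which f − c changes sign produces a point where f(x) = c. The sketch below is only an illustration of this idea (the helper name, the example function and the tolerance are my own choices, not part of the notes).

def bisect(f, a, b, c, tol=1e-12):
    # Assume f is continuous on [a, b] and f(a) < c < f(b).
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid      # keep the endpoint where f < c, as in the proof
        else:
            hi = mid
    return (lo + hi) / 2

# Example: solve x**3 + x = 1 on [0, 1]; f(0) = 0 < 1 < 2 = f(1).
x = bisect(lambda t: t**3 + t, 0.0, 1.0, 1.0)
print(x, x**3 + x)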

Composition and inverse functions


Let f, g be two functions, f defined on S and g defined on the image (range) of f : f (S) = {y ∈
R : there is x ∈ S, y = f (x)}. Recall that we can compose two functions: for x ∈ S, g(f (x))
gives a number, hence the correspondence x → g(f (x)) is a function on S. We denote this
composed function by g ◦ f .

Theorem 52. In the situation above, if f and g are continuous, then g ◦ f is continuous as well.

Proof. We take a ∈ S. Given ε > 0, there is δ1 > 0 such that |g(y) − g(f(a))| < ε if |y − f(a)| < δ1, by continuity of g. For this δ1, there is δ2 > 0 such that |f(x) − f(a)| < δ1 if |x − a| < δ2. Altogether, we have |g(f(x)) − g(f(a))| < ε if |x − a| < δ2, hence we have proved the continuity of g ◦ f.

Definition 53. Let f be a function on S. We say that f is monotonically increasing (non-


decreasing, decreasing, nonincreasing, respectively) if for each x1 , x2 ∈ S, x1 < x2 it
holds that f (x1 ) < f (x2 ) (f (x1 ) ≤ f (x2 ), f (x1 ) > f (x2 ), f (x1 ) ≥ f (x2 ), respectively).

Example 54. (Non)examples of monotonic functions.

• f (x) = x is monotonically increasing.

Figure 22: The continuity of the inverse function. For a given ε, we can take δ.

• f (x) = x2 is not monotonically increasing on R, but it is so on R+ .

• f (x) = sign x is monotonically nondecreasing.

If a function f is monotonically increasing (or decreasing), it is injective: for x1 ≠ x2, it holds that f(x1) ≠ f(x2). Therefore, we can consider its inverse function.

Theorem 55. Let f be a monotonically increasing continuous function on an interval [a, b]. Then
the inverse function f −1 defined on [f (a), f (b)] is monotonically increasing and continuous.

Proof. Note that the domain of f −1 is [f (a), f (b)] by the intermediate value theorem (continuity
of f is needed here).
Let us first show that f −1 is monotonically increasing. For each y1 < y2 , y1 , y2 ∈ [f (a), f (b)],
there are x1 , x2 ∈ S such that y1 = f (x1 ) and y2 = f (x2 ) by the intermediate value theorem and
we have x1 < x2 by monotonicity of f . This means that f −1 (y1 ) = x1 < x2 = f −1 (y2 ), that is,
f −1 is monotonically increasing.
Let x0 ∈ (a, b). For a given ε > 0, we take δ as the smaller of f(x0 + ε) − f(x0) and f(x0) − f(x0 − ε) (if x0 ± ε are not in [a, b], replace them by a or b). Then for any y ∈ (f(x0) − δ, f(x0) + δ), we have f⁻¹(y) ∈ [a, b] ∩ (x0 − ε, x0 + ε) by monotonicity of f. This is the continuity of f⁻¹.
If x0 = a or b, then we only have to consider one side.

Roots and power functions


Let us consider f(x) = x^n defined on R+ ∪ {0}. This is monotonically increasing (because, if x1 < x2, then x2^n = (x1 + (x2 − x1))^n > x1^n by the binomial theorem). Therefore, we can define the inverse function f⁻¹(x) and denote it by x^{1/n}. This shows that, for any x ∈ R+ ∪ {0}, there is one and only one y ∈ R+ ∪ {0} such that y^n = x. The function g(x) = x^{1/n} is monotonically increasing and continuous by Theorem 55.
Let p, q ∈ N, x ≥ 0. Note that we have (x^p)^q = x^{pq} = (x^q)^p. Then it is easy to see that (x^p)^{1/q} = (x^{1/q})^p: if y = (x^p)^{1/q}, then y^q = x^p = ((x^{1/q})^q)^p = (x^{1/q})^{pq}, and hence y = (x^{1/q})^p.
Furthermore, let m ∈ N. Then for y = (x^{mp})^{1/(mq)} we have (y^q)^m = y^{mq} = x^{mp} = (x^p)^m, and hence y^q = x^p and y = (x^p)^{1/q}.
Therefore, we can write y = x^{p/q} and no confusion arises.

Oct. 13. Exponential functions.


For a > 0 and p, q ∈ N, we have defined a^{p/q}. Then the natural question arises whether a^x can be defined for real numbers x.
For a fixed a > 0, we can consider f(x) = a^x as a function defined on the set of rational numbers Q.
Lemma 56. We have the following.

• For p, q, r, s ∈ N, we have a^{p/q} a^{r/s} = a^{p/q + r/s}.

• For p, q, r, s ∈ N, we have (a^{p/q})^{r/s} = a^{pr/(qs)}.

• If a > 1, then f(x) = a^x is monotonically increasing (as a function on Q).

• If 0 < a < 1, then f(x) = a^x is monotonically decreasing.


Proof. • Recall that we have a^{p/q} = a^{ps/(qs)} and a^{r/s} = a^{qr/(qs)}, and hence

a^{p/q} a^{r/s} = a^{ps/(qs)} a^{qr/(qs)} = (a^{1/(qs)})^{ps} (a^{1/(qs)})^{qr} = (a^{1/(qs)})^{ps+qr} = a^{(ps+qr)/(qs)} = a^{p/q + r/s}.

• We will prove this as an exercise.

• Let us take a > 1. First, for any q ∈ N, a^{1/q} > 1; indeed, if a^{1/q} ≤ 1, we would have a = (a^{1/q})^q ≤ 1, a contradiction.
If x1, x2 ∈ Q and x1 < x2, we may assume that x1 = p/q, x2 = r/q with p < r. Then a^{x1} = a^{p/q} = (a^{1/q})^p < (a^{1/q})^r = a^{r/q} = a^{x2}.

• The case 0 < a < 1 is similar.

We would like to define ax by limn→∞ axn , where xn ∈ Q and xn → x ∈ R. For this purpose,
we need some properties of sequences.
Lemma 57. If an ≤ bn and an → L, bn → M , then L ≤ M .
Proof. Consider bn − an ≥ 0. By Lemma 50, bn − an → M − L ≥ 0, hence M ≥ L.

We write an → ∞ if for any x ∈ R there is N such that an > x for n > N .


Theorem 58. Let an ≤ bn ≤ cn be three sequences. If an → L and cn → L, then bn → L.
Similarly, if an → ∞, then also bn → ∞.
Proof. For a given ε > 0, we take N such that for n > N it holds that |an − L| < ε and |cn − L| < ε. For a fixed n > N, this means that L − ε < an ≤ bn ≤ cn < L + ε, and hence |bn − L| < ε. This means that bn → L.
If an → ∞, then for a given x there is N such that x < an ≤ bn for n > N, hence bn → ∞.

For a statement like “there is N such that for n > N ...” we say simply that “for sufficiently
large n...”.

Figure 23: A Cauchy sequence.

Proposition 59. We have the following.

• For a > 1, p ∈ N, the sequence a^n/n^p diverges.

• It holds that n^{1/n} → 1.

• For a > 1, we have a^{1/n} → 1.

Proof. • Let us consider first p = 1. Then, writing a = 1 + y with y > 0, we have, for n ≥ 2,

a^n = (1 + y)^n = Σ_{k=0}^n (n choose k) y^k > 1 + (n(n − 1)/2) y²,

and hence a^n/n > (n − 1)y²/2. As (n − 1)y²/2 → ∞, so does a^n/n.
For a general p ∈ N, we take a^{1/p}; then 1 < a^{1/p} and (a^{1/p})^n/n → ∞, hence a^n/n^p = ((a^{1/p})^n/n)^p → ∞.

• Let ε > 0. We need to prove that n^{1/n} < 1 + ε for sufficiently large n, equivalently n < (1 + ε)^n. This follows from the previous point: (1 + ε)^n/n → ∞, in particular (1 + ε)^n/n > 1 for sufficiently large n.

• 1 < a^{1/n} < n^{1/n} for a < n, therefore the claim follows from Theorem 58.

Definition 60. A sequence an is said to be a Cauchy sequence if for each ε > 0 there is N ∈ N such that |am − an| < ε for m, n > N.
Differently from the convergence to a number L, this says that two elements in the sequence
are close to each other for large enough m, n.
Lemma 61. A sequence an is convergent if and only if it is a Cauchy sequence.
Proof. If an → L, then for any ε > 0 we can take N such that |an − L| < ε/2 for n > N; therefore, if n, m > N, then |am − L| < ε/2 as well and hence |am − an| ≤ |am − L| + |L − an| < ε.
Conversely, if an is Cauchy, then it is bounded. Indeed, we take N such that |am − aN+1| < 1 for m > N, which means that |am| < |aN+1| + 1. Then we can take the largest number among |a1|, · · · , |aN|, |aN+1| + 1 as a bound. Next, we consider the sequence

bn = inf{ak : k ≥ n}.

This is well-defined because {ak : k ≥ n} is bounded. Furthermore, this sequence is nondecreasing because {ak : k ≥ n + 1} ⊂ {ak : k ≥ n}. Therefore, bn converges to some number L. Similarly, cn = sup{ak : k ≥ n} is bounded and nonincreasing, hence converges to some M.
Note that bn ≤ an ≤ cn, therefore L ≤ M. Actually, we have L = M. Indeed, for a given ε > 0, we can find sufficiently large ℓ, m, n such that |cn − M| < ε/5, |aℓ − cn| < ε/5, |bn − L| < ε/5, |am − bn| < ε/5 and |aℓ − am| < ε/5. This implies that |M − L| < ε for arbitrary ε > 0, hence it must hold that M = L.
Now, as bn, cn → L = M and bn ≤ an ≤ cn, we have an → L by Theorem 58.

Finally, we can define ax for all real number x.


Proposition 62. Let a > 0, xn ∈ Q, xn → x ∈ R. Then a^{xn} converges. If yn ∈ Q, yn → x, then lim_{n→∞} a^{xn} = lim_{n→∞} a^{yn}.

Proof. Note that {xn} is bounded, hence {a^{xn}} is bounded as well, say by M, because the exponential function on Q is monotonic. We show that a^{xn} is convergent. To see this, it is enough to see that a^{xn} is Cauchy, by Lemma 61.
For a given ε > 0, we take δ > 0 such that |a^z − 1| < ε/M for z ∈ Q with |z| < δ. For sufficiently large m, n, we have |xm − xn| < δ and in that case

|a^{xm} − a^{xn}| = |a^{xm}| |1 − a^{xn − xm}| ≤ M |1 − a^{xn − xm}| < M · ε/M = ε.

This means that {a^{xn}} is a Cauchy sequence, and hence it converges to a certain real number, which we call a^x.
If {yn} is another sequence converging to x, then we can consider the new sequence x1, y1, x2, y2, · · ·, which also converges to x, hence the corresponding sequence a^{x1}, a^{y1}, a^{x2}, a^{y2}, · · · converges to some number. But the subsequence {a^{xn}} converges to a^x, hence the whole sequence, and in particular {a^{yn}}, must converge to a^x as well.

As we said in the proof, for an arbitrary real number x ∈ R, we define the exponential function by

a^x := lim_{n→∞} a^{xn}, where xn ∈ Q, xn → x.
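This definition can be imitated numerically: approximate x by truncated decimals xn = p/q and compute a^{p/q} = (a^{1/q})^p, where the q-th root is found by bisection, so that only the monotonicity of t ↦ t^q is used. The helper names and the precision below are my own choices; this is a sketch, not part of the notes. For x = √2 and a = 2 the values approach 2^√2 ≈ 2.6651.

def qth_root(y, q, tol=1e-15):
    # y**(1/q) for y >= 1, by bisection on the increasing function t -> t**q.
    lo, hi = 1.0, float(y)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**q < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rational_power(a, p, q):
    # a**(p/q) for a >= 1 and positive integers p, q.
    return qth_root(a, q) ** p

# Truncations of sqrt(2): 1.4, 1.41, 1.414, ... written as p / 10**n.
for n, p in enumerate((14, 141, 1414, 14142, 141421), start=1):
    q = 10**n
    print(p, "/", q, rational_power(2.0, p, q))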
The exponential functions appear in various natural phenomena. It happens typically when
we consider a collection of objects that increase or decrease independently (such as colonies of
bacteria, radioactive nuclei, and so on). See Figure 24.

Oct. 14. Logarithm.


Some properties of exponential functions
Proposition 63. We have the following.
• For a > 1, f(x) = a^x is monotonically increasing and continuous.

• a^x a^y = a^{x+y}.

• (a^x)^y = a^{xy}.
Proof. • Let x < y. Take sequences xn → x, yn → y, with xn, yn ∈ Q. Then for sufficiently large n we have xn < z1 < z2 < yn where z1, z2 ∈ Q, and therefore a^x ≤ a^{z1} < a^{z2} ≤ a^y.
As for continuity, let us take x, xn ∈ R with xn → x. Then there is yn ∈ Q such that |a^{xn} − a^{yn}| < 1/n and |xn − yn| < 1/n. Then yn → x as well, hence a^{yn} → a^x, while a^{yn} − 1/n < a^{xn} < a^{yn} + 1/n, therefore a^{xn} → a^x.

• Take sequences xn → x, yn → y, xn, yn ∈ Q. We have a^{xn} a^{yn} = a^{xn + yn}, and xn + yn → x + y, therefore a^x a^y = a^{x+y}.

Figure 24: The exponential functions.

• Take sequences xn → x, yn → y, xn, yn ∈ Q. For fixed m, we have (a^{xn})^{ym} → (a^x)^{ym} as n → ∞, and this is equal to a^{xn ym} → a^{x ym}. Now we take the limit m → ∞ and obtain (a^x)^y = a^{xy} by continuity.

Napier’s number
Let us introduce Napier’s number. We take

e_n = (1 + 1/n)^n,    E_n = (1 + 1/n)^{n+1} = (1 + 1/n) e_n.

Lemma 64. For x ≥ −1, we have (1 + x)n ≥ 1 + nx for all n.

Proof. By induction. With n = 0, 1, we have (1 + x)0 = 1 = 1 and 1 + x = 1 + x. Assuming


that this holds for n, we expand

(1 + x)n+2 = (1 + x)n (1 + x)2 ≥ (1 + nx)(1 + x)2


= 1 + nx + 2x + x2 + 2nx2 + nx3 = 1 + (n + 2)x + x2 (1 + 2n + nx) ≥ 1 + (n + 2)x

because 1 + 2n + nx ≥ 0. This completes the induction for even and odd numbers.

Theorem 65. e_n and E_n converge to the same number², e.

Proof. The proof of this theorem requires several steps.

• We have 1 < e_n < E_n. Indeed, 1 < 1 + 1/n, and this follows easily.

• e_n is monotonically increasing. Indeed,

e_n / e_{n−1} = (1 + 1/n)^n / (1 + 1/(n−1))^{n−1} = (1 + 1/n)^n / (n/(n−1))^{n−1}
= (1 + 1/n)^n · ((n−1)/n)^n · n/(n−1) = (1 − 1/n²)^n / (1 − 1/n) ≥ (1 − 1/n) / (1 − 1/n) = 1,

where we used Lemma 64 in the inequality. Similarly, E_n is monotonically decreasing. Indeed,

E_n / E_{n−1} = (1 + 1/n)^{n+1} / (1 + 1/(n−1))^n = (1 + 1/n)^{n+1} / (n/(n−1))^n
= (1 + 1/n) / (n²/(n²−1))^n = (1 + 1/n) / (1 + 1/(n²−1))^n ≤ (1 + 1/n) / (1 + n/(n²−1)) < (1 + 1/n) / (1 + 1/n) = 1,

again by Lemma 64.

• Now {e_n} and {E_n}, being monotone and bounded, are convergent. Note also that E_n − e_n = e_n(1 + 1/n − 1) = e_n · 1/n → 0, because e_n is bounded, say by M, and 1/n → 0; therefore 0 ≤ E_n − e_n ≤ M/n → 0, and the two limits coincide.

² This proof is taken from L. Chierchia, “Corso di analisi. Prima parte.”, McGraw-Hill.

We call this limit e, Napier's number (sometimes Euler's number). The function e^x plays a special role in analysis, as we will see below.
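Numerically one can watch e_n increase and E_n decrease towards the same limit (an illustration only; the values of n are arbitrary):

import math

for n in (1, 10, 100, 1000, 10**6):
    e_n = (1 + 1/n)**n
    E_n = (1 + 1/n)**(n + 1)
    print(n, e_n, E_n, E_n - e_n)
print(math.e)   # 2.718281828...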

Logarithm
Let a > 0, a ≠ 1. We have defined the exponential function f(x) = a^x, and we have seen that it is continuous and monotonically increasing if a > 1. If 0 < a < 1, it is monotonically decreasing.
Let a > 1. We know that a^n diverges, and hence a^{−n} → 0. By the intermediate value theorem, we see that the range of a^x is R+. Now we can define the inverse function (everything is analogous for 0 < a < 1).

Definition 66. The logarithm base a of x, log_a x, is the inverse function of f(y) = a^y: log_a : R+ → R, and it holds that

log_a(a^x) = x = a^{log_a x}.

We denote log x = log_e x = ln x.

Example 67. log_2 8 = 3, log_9 3 = 1/2.

We say that limx→∞ f (x) = ∞ if for each Y > 0 there is X > 0 such that if x > X then
f (x) > Y . Similarly, we define limx→±∞ f (x) = ±∞.

Proposition 68. Let a, b > 0, a 6= 1 6= b, x, y > 0, t ∈ R. Then

(i) loga a = 1, loga 1 = 0.

(ii) loga (xy) = loga x + loga y.

(iii) loga (xt ) = t loga x.

(iv) loga−1 x = − loga x.

(v) loga x = loga b · logb x.

Figure 25: The logarithm and the exponential.
(vi) Let a > 1. Then f (x) = loga x is monotonically increasing and continuous. loga x > 0 if
and only if x > 1.

(vii) Let a > 1, α > 0. Then lim_{x→+∞} x^α / log_a x = +∞.

Proof. (i) a1 = a, a0 = 1.

(ii) aloga x+loga y = aloga x aloga y = xy.

(iii) at loga x = (aloga x )t = xt .

(iv) (1/a)− loga x = 1/a− loga x = 1/(aloga x )−1 = 1/x−1 = x.

(v) aloga b·logb x = (aloga b )logb x = blogb x = x.

(vi) This follows from Theorem 55.


(vii) First we show that lim_{n→∞} (a^{n−1})^α / log_a(a^n) = ∞. This is straightforward because (a^{n−1})^α / log_a(a^n) = a^{(n−1)α}/n → ∞. To show the given limit, for y > 0 we take n ∈ N such that n − 1 ≤ y < n. Then (a^y)^α / log_a(a^y) ≥ (a^{n−1})^α / log_a(a^n), and hence the left-hand side grows as y grows. That is, lim_{y→∞} (a^y)^α / log_a(a^y) = ∞. Finally, recall that x = a^y is monotonic, and x grows infinitely as y grows. That is, given Z > 0, there is Y > 0 such that (a^y)^α / log_a(a^y) > Z for y > Y, which implies that x^α / log_a x > Z for x > a^Y. This means that lim_{x→∞} x^α / log_a x = ∞.

Logarithms are extremely useful in natural science. When we have data which grow exponentially, we can take the log of the values and plot them in a plane; then they lie on a straight line. The exponent can be read from the slope of the line (this is called the logarithmic scale). In that case, the logarithm base 10 is often used.
When y = x^p, we can consider z = log y, w = log x, hence e^z = y, e^w = x. We have e^z = y = x^p = (e^w)^p = e^{wp}. By taking the log of both sides, we obtain z = pw. That is, in the log-log plot, a power relation y = x^p is translated into a linear relation z = pw.
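As an illustration of this remark (with made-up data and an arbitrarily chosen exponent), the slope of log y against log x recovers p; here it is computed with the usual least-squares formula.

import math

p = 2.5
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [x**p for x in xs]

ws = [math.log(x) for x in xs]   # w = log x
zs = [math.log(y) for y in ys]   # z = log y, should satisfy z = p*w

n = len(ws)
w_mean, z_mean = sum(ws) / n, sum(zs) / n
slope = (sum((w - w_mean) * (z - z_mean) for w, z in zip(ws, zs))
         / sum((w - w_mean)**2 for w in ws))
print(slope)   # approximately 2.5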

Oct. 18. Notable limits, hyperbolic functions.


Some notable limits
Proposition 69. Let a ∈ R. The function f (x) = xa defined on R+ satisfies xa y a = (xy)a and
is continuous.

Proof. Note that these properties hold if f(x) = x^q, where q is rational.
Let x, y > 0. For a rational q we have (xy)^q = x^q y^q, and hence, by taking rationals q_n → a, we have (xy)^a = x^a y^a. As for continuity, assume that y ≠ x and take q ∈ Q with a < q. We have |f(y) − f(x)| = x^a |(y/x)^a − 1| ≤ x^a |(y/x)^q − 1|, and lim_{y→x} (y/x)^q = 1 by the continuity in the rational case. Therefore, by squeezing we have lim_{y→x} f(y) = f(x).

Let L ∈ R, and let f be a function defined on (a, ∞). If for each ε > 0 there is X such that |f(x) − L| < ε for x > X, then we write lim_{x→∞} f(x) = L.

Example 70. lim_{x→∞} 1/x = 0. lim_{x→∞} x/(x − 1) = 1.

Let f(x) be defined on (a, b) and L ∈ R. If for each ε > 0 there is δ such that |f(x) − L| < ε for x ∈ (a, a + δ), we denote it by lim_{x→a+} f(x) = L, and we call it the right limit of f at a. Similarly, we write lim_{x→b−} f(x) for the left limit.

Figure 26: The exponential functions composed with the logarithm.

Figure 27: The log-log plot of the relation y = x^p.
Example 71. Let f (x) = sign x. limx→0+ f (x) = 1, limx→0− f (x) = −1.

If f (x) is defined on (b, a) ∪ (a, c), limx→a f (x) = L exists if and only if both the left and
right limits exist and limx→a+ f (x) = limx→a− f (x) and it is L. We leave the proof to the reader.

Lemma 72. Let f be a function on S with lim_{y→y0} f(y) = L. Assume that g is a function on T, continuous at x0, with g(x0) = y0 and g(x) ≠ y0 for x ≠ x0, |x − x0| < ε, for some ε > 0. Then lim_{x→x0} f(g(x)) = lim_{y→y0} f(y) = L. Similarly, if lim_{y→∞} f(y) = L and lim_{x→∞} g(x) = ∞, then lim_{x→∞} f(g(x)) = L.

Proof. The first statement can be proven similarly to the continuity of the composed function
f (g(x)).
As for the second point, for a given ε we take Y such that |f(y) − L| < ε for y > Y. Then there is X such that g(x) > Y for x > X. Altogether, |f(g(x)) − L| < ε if x > X.

We call this the change of variables, in the sense that we can calculate limy→y0 f (y) by
calculating limx→x0 f (g(x)) and vice versa.
For x ∈ R, we denote √ by [x] the largest integer n such that n ≤ x, and call it the integer
part of x. For example, [ 2] = 1, [π] = 3, and so on.
In this Proposition, n ∈ N and x ∈ R.

Proposition 73. We have the following.

(i) lim_{n→∞} (1 − 1/n)^n = e^{−1}.

(ii) lim_{n→∞} (1 + 1/n²)^n = 1.

(iii) lim_{x→∞} (1 + 1/x)^x = e.

(iv) lim_{x→0} (1 + x)^{1/x} = e.

(v) lim_{x→0} log(1 + x)/x = 1.

(vi) lim_{x→∞} (1 + t/x)^x = e^t.

(vii) lim_{x→0} (e^x − 1)/x = 1.

Proof. (i) Note that

(1 − 1/n)^n = ((n − 1)/n)^n = (1 + 1/(n − 1))^{−n} = [ (1 + 1/(n − 1))^{n−1} · (1 + 1/(n − 1)) ]^{−1}.

Note that 1/x is continuous at x = e, and 1 + 1/(n − 1) → 1, (1 + 1/(n − 1))^{n−1} → e. Altogether, (1 − 1/n)^n → 1/e = e^{−1}.

(ii) Consider lim_{n→∞} (1 + 1/n²)^n. As lim_{n→∞} (1 + 1/n²)^{n²} = e, that sequence is bounded by, say, M. Then 1 < (1 + 1/n²)^n = ((1 + 1/n²)^{n²})^{1/n} ≤ M^{1/n}, but M^{1/n} → 1, so by squeezing we have (1 + 1/n²)^n → 1.

(iii) Note that, if a_n → a, then b_n = a_{n+1} → a as well. Furthermore, if a < b < c and if |a − x| < ε and |c − x| < ε, then c − ε < x < a + ε, and as a < b < c this gives b − ε < x < b + ε, hence |b − x| < ε.
We know that lim_{n→∞} (1 + 1/n)^n = lim_{n→∞} (1 + 1/(n + 1))^{n+1} = e. Let n = [x]; then n ≤ x < n + 1 and

(1 + 1/(n + 1))^{n+1} / (1 + 1/(n + 1)) < (1 + 1/x)^x < (1 + 1/n)^{n+1} = (1 + 1/n)^n · (1 + 1/n).

The left-hand side and the right-hand side tend to e, because 1 + 1/(n + 1) → 1 and 1 + 1/n → 1. This means that, for a given ε, both sides are within ε of e for sufficiently large n, and by the remark above |(1 + 1/x)^x − e| < ε. Altogether, if x is sufficiently large, we apply this argument with n = [x] and obtain |(1 + 1/x)^x − e| < ε. This is lim_{x→∞} (1 + 1/x)^x = e.

(iv) By the previous point and the change of variables x ↦ 1/x (note that 1/x > 0), lim_{x→0+} (1 + x)^{1/x} = e. We have lim_{x→0−} (1 + x)^{1/x} = e as well. So we have checked both the right and left limits.

(v) As log y is continuous at y = e,

lim_{x→0} log(1 + x)/x = lim_{x→0} log((1 + x)^{1/x}) = log( lim_{x→0} (1 + x)^{1/x} ) = log e = 1,

where we used lim_{x→0} (1 + x)^{1/x} = e.

(vi) Note that lim_{x→∞} (1 + t/x)^x = lim_{x→∞} ((1 + t/x)^{x/t})^t = e^t, where we used the continuity of g(a) = a^t.

(vii) With y = e^x − 1, we have log(y + 1) = x and lim_{x→0} (e^x − 1)/x = lim_{y→0} y/log(1 + y) = 1.
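Several of these limits can be observed numerically (an illustration only; the sample points are arbitrary, and for extremely small x floating-point rounding would eventually spoil the picture):

import math

for x in (1.0, 0.1, 0.01, 0.001, 1e-6):
    print(x,
          (1 + x)**(1 / x),           # -> e
          math.log(1 + x) / x,        # -> 1
          (math.exp(x) - 1) / x)      # -> 1

for x in (10.0, 100.0, 1e4, 1e6):
    print(x, (1 + 3 / x)**x)          # -> e**3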

Definition 74. • sinh x = (e^x − e^{−x})/2

• cosh x = (e^x + e^{−x})/2

• tanh x = sinh x / cosh x

Proposition 75. (i) cosh(x + y) = cosh x cosh y + sinh x sinh y.

(ii) sinh(x + y) = cosh x sinh y + sinh x cosh y.

(iii) (cosh x)2 − (sinh x)2 = 1.

Proof. (i) cosh x cosh y + sinh x sinh y = (1/4)(e^x + e^{−x})(e^y + e^{−y}) + (1/4)(e^x − e^{−x})(e^y − e^{−y}) = (1/4)(2e^{x+y} + 2e^{−x−y}) = cosh(x + y).

(ii) analogous.

(iii) analogous.


Proposition 76. (i) sinh^{−1}(x) = log(x + √(x² + 1)).

(ii) cosh^{−1}(x) = log(x + √(x² − 1)) for x ≥ 1.

Figure 28: The hyperbolic functions.

Proof. (i)

sinh(log(x + √(x² + 1))) = (1/2)( (x + √(x² + 1)) − 1/(x + √(x² + 1)) ) = (1/2) · ((x + √(x² + 1))² − 1) / (x + √(x² + 1))
= (1/2) · (x² + 2x√(x² + 1) + x² + 1 − 1) / (x + √(x² + 1)) = (1/2) · 2x(x + √(x² + 1)) / (x + √(x² + 1)) = x.

(ii) analogous.


Definition 77. (i) Arcsinh x = sinh^{−1}(x) = log(x + √(x² + 1)).

(ii) Arccosh x = cosh^{−1}(x) = log(x + √(x² − 1)) for x ≥ 1.
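A short numerical check of the identity cosh²x − sinh²x = 1 and of the inverse formulas (an illustration only; the sample points are arbitrary):

import math

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    c, s = math.cosh(x), math.sinh(x)
    print(x, c**2 - s**2,                                   # = 1
          math.log(s + math.sqrt(s**2 + 1)) - x)            # Arcsinh(sinh x) - x = 0

for x in (1.0, 2.0, 10.0):
    print(x, math.cosh(math.log(x + math.sqrt(x**2 - 1))))  # = x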

Oct. 20. Review of trigonometric functions and some limits.


Trigonometric functions
The functions sin θ and cos θ are usually defined as the length of the horizontal and vertical sides
of the right triangle obtained from a point p on the unit circle (the circle centered at (0, 0) with
radius 1) such that the x-axis and the segment from the point of origin to p makes an angle of
degree θ. However, to make this definition precise, we would first need to define the angle, that
is the length of the arc on the unit circle, then consider the right triangle...
That is possible, but we would have to wait until we define integral before define trigonometric
functions (or define the trigonometric function by something called power series). In this lecture,
we prefer practicality, therefore,

• We assume that there are functions called sin θ, cos θ.

• We use the figures and the elementary geometry to derive their elementary properties.

• Then we study their analytic aspects: limit, derivative, integral, Taylor expansion, and so
on.

Now, to obtain cos θ and sin θ, we draw the unit circle, and take the point p on the unit circle
such that the x-axis and the segment from the point of origin to p makes an angle of degree θ
going anticlockwise, 0 ≤ θ ≤ 90 (degrees). Then cos θ is defined to be the x-coordinate of the
point p, and sin θ is defined to be the y-coordinate of p.
We can make a right triangle by drawing the vertical line from this point. If 0 ≤ θ ≤ 90
(degrees), then cos θ is the length of the horizontal side of the triangle, while sin θ defined to be
the length of the vertical side. When θ ≥ 90 (degrees), then cos θ becomes negative.
There are various ways to represent the angle. Often we use degrees, which divide the circle into 360 degrees. Another is called the radian, which defines the angle by the length of the arc on the unit circle. In radians, we have 360 (degrees) = 2π (radian), 180 (degrees) = π (radian), 90 (degrees) = π/2 (radian), 45 (degrees) = π/4 (radian), and so on. In this lecture, from this point on we use radians, unless otherwise specified.
Some important values:

• sin 0 = 0, cos 0 = 1.

• sin(π/6) = 1/2, cos(π/6) = √3/2.

• sin(π/4) = 1/√2, cos(π/4) = 1/√2.

• sin(π/3) = √3/2, cos(π/3) = 1/2.

Figure 29: The trigonometric functions and their values for a general angle θ.

• sin(π/2) = 1, cos(π/2) = 0.

We can extend cos θ and sin θ to all real numbers, considering that for θ > 2π we go around the
circle more than once, and for θ < 0 we go around the circle clockwise. With this understanding,
we have

• cos(θ + 2π) = cos θ

• sin(θ + 2π) = sin θ.

• cos(−θ) = cos θ

• sin(−θ) = − sin θ.

In this way, we can consider cos and sin as functions on R. They are continuous, because if we change the angle slightly, the point p moves only slightly (we do not prove this, as we introduce these functions only by geometry, without defining the arc length).
They are related by the formulas cos(θ + π/2) = − sin θ and sin(θ + π/2) = cos θ (see Figure 30). We also introduce tan θ = sin θ / cos θ.

Some formulas
We often write cos2 θ = (cos θ)2 , sin2 θ = (sin θ)2 , cos3 θ = (cos θ)3 , sin3 θ = (sin θ)3 , etc.

• cos²θ + sin²θ = 1. This is because of the Pythagorean theorem: cos θ and sin θ are the lengths of the horizontal and vertical sides of the right triangle, while the length of the longest side is 1.

• sin(α + β) = sin α cos β + cos α sin β. See Figure 32

• cos(α + β) = cos α cos β − sin α sin β.

From these formulas, we can derive various useful formulas.

• cos 2θ = 2 cos2 θ − 1 = 1 − 2 sin2 θ. Indeed, cos 2θ = cos θ cos θ − sin θ sin θ = cos2 θ − sin2 θ
and use cos2 θ + sin2 θ = 1.

Figure 30: A relation: cos(θ + π/2) = − sin θ and sin(θ + π/2) = cos θ.

Figure 31: The graphs of cos θ, sin θ and tan θ.

Figure 32: The formula sin(α + β) = sin α cos β + cos α sin β. A = cos β sin α, B = sin β cos α and A + B = sin(α + β).

• sin 2θ = 2 sin θ cos θ. Indeed, sin 2θ = sin θ cos θ + cos θ sin θ.

• sin α cos β = (1/2)(sin(α + β) + sin(α − β)). Indeed,

(1/2)(sin(α + β) + sin(α − β))
= (1/2)((sin α cos β + cos α sin β) + (sin α cos(−β) + cos α sin(−β)))
= (1/2)(sin α cos β + cos α sin β + sin α cos β − cos α sin β)
= sin α cos β.

• cos α sin β = (1/2)(sin(α + β) − sin(α − β)).

• cos α cos β = (1/2)(cos(α + β) + cos(α − β)).

• sin α sin β = (1/2)(− cos(α + β) + cos(α − β)).

For example, we can compute cos(π/8). Indeed, 2 cos²(π/8) − 1 = cos(2 · π/8) = cos(π/4) = 1/√2, and hence cos(π/8) = √((1/√2 + 1)/2).

Some limit
By comparing the areas of the triangles and of the sector, we see (1/2) cos θ sin θ < θ/2 < (1/2) sin θ / cos θ (see Figure 33), and hence cos θ < sin θ / θ < 1/cos θ. As we assumed that sin and cos are continuous, and cos 0 = 1, we obtain lim_{θ→0} sin θ/θ = 1 by squeezing, since lim_{θ→0} cos θ = lim_{θ→0} 1/cos θ = 1.
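The squeeze can be observed numerically, together with the two bounds (an illustration only):

import math

for t in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(t, math.cos(t), math.sin(t) / t, 1 / math.cos(t))
    # cos t < sin(t)/t < 1/cos t, and both bounds tend to 1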

Oct. 21. Open and closed sets, Bolzano-Weierstrass theorem.


Definition 78. Let O ⊂ R. We say that O is open if for any p ∈ O there is ε > 0 such that (p − ε, p + ε) ⊂ O (this ε depends on p). Let F ⊂ R. We say that F is closed if for any convergent sequence {an} ⊂ F, an → a, it holds that a ∈ F.

Figure 33: By comparing the areas of the triangles and of the sector, we see (1/2) cos θ sin θ < θ/2 < (1/2) sin θ / cos θ.

Figure 34: Open and closed intervals. An open set includes a small “neighborhood” of any point in it, but a sequence in it might converge to a point outside. A closed subset contains the limit of any sequence in it, but a point might “touch” other points outside.

Example 79. • Consider the open interval A = (0, 1). This is open, because for any point p ≤ 1/2 we can take ε = p/2 and (p/2, 3p/2) ⊂ (0, 1); if p > 1/2, we can take ε = (1 − p)/2. On the other hand, (0, 1) is not closed. Indeed, the sequence an = 1/n belongs to A = (0, 1), but the limit 0 does not belong to A.

• Consider the closed interval B = [0, 1]. This is closed. Indeed, for any convergent sequence {an} ⊂ B, an → a, it holds that 0 ≤ an ≤ 1 and hence 0 ≤ a ≤ 1. On the other hand, for p = 0 and any ε, (−ε, ε) ⊄ B; therefore, B is not open.

Therefore, the terminology “open” and “closed” for intervals is consistent with that for the general sets we have just introduced.

For any set A ⊂ R, we denote its complement by Ac = R \ A.

Lemma 80. O ⊂ R is open if and only if Oc is closed.

Proof. Let O be open and assume that Oc is not closed. That is, there is a sequence {an} ⊂ Oc that converges to a, but a ∉ Oc. Therefore, it must hold that a ∈ O. But then we can take ε > 0 such that (a − ε, a + ε) ⊂ O, and since an → a, it would have to hold that an ∈ (a − ε, a + ε) ⊂ O for large n, which contradicts the assumption that {an} ⊂ Oc. Therefore, Oc is closed.
Conversely, let Oc be closed, and assume that O is not open. As O is not open, there is a ∈ O such that for any 1/n > 0 there is an with |an − a| < 1/n but an ∉ O, hence an ∈ Oc. But then an → a, and closedness of Oc would force a ∈ Oc, which contradicts a ∈ O. Therefore, O must be open.
O must be open.

Figure 35: Any point in an open set is “protected” from outside. On the other hand, if a set is not open, there is a point which is not “protected”.

Figure 36: Nested intervals. As the sequence {an} contains infinitely many points, one of the two intervals must contain infinitely many of them.

It is not difficult to prove that any union (even an infinite one!) of open sets is again open. Similarly, any intersection of closed sets is again closed.
Let us recall that a sequence {an} is called Cauchy if for any given ε > 0 there is N such that for m, n > N it holds that |am − an| < ε.
Furthermore, we said that bn is a subsequence of an if there is a growing sequence Nn ∈ N such that bn = a_{Nn}, that is, bn is obtained by skipping some elements of an. Recall that we consider infinite sequences, that is, the sequence does not stop at any an but continues infinitely.

Theorem 81 (Bolzano-Weierstrass). Let {an } be a bounded sequence. Then there is a convergent


subsequence of {an }.

Proof. As {an} is bounded, we can find M sufficiently large such that an ∈ [−M, M]. As the sequence {an} has infinitely many elements, one of the intervals [−M, 0], (0, M] must contain infinitely many of them. Therefore, we can take a subsequence bn = a_{mn} such that the bn are contained in one of them. To fix the idea, assume that bn ∈ (0, M] (the other case is analogous).
As (0, M] = (0, M/2] ∪ (M/2, M], one of these two must contain infinitely many elements of bn. Therefore, we can take a subsequence cn = b_{kn} contained in one of them.
By continuing this procedure, for each n we obtain a subsequence that is contained in an interval of length M/2^{n−1}, and each is a subsequence of the previous one. Let us take the diagonal subsequence a1, b2, c3, · · · of the original sequence. Then, for n, m > N, any two of its elements are contained in an interval of length M/2^{N−1}. Therefore, this subsequence is Cauchy, hence it is a convergent sequence by Lemma 61.

It is important that an is bounded. Indeed, if not, it is in general impossible to extract a convergent subsequence: consider an = n, which is not bounded and has no convergent subsequence. In addition, the possibility of extracting a convergent subsequence does not mean that the original sequence is convergent, or that there is only one convergent subsequence.

Theorem 82. Let f be a continuous function defined on a bounded closed interval F . Then f
is bounded, that is, there is M > 0 such that |f (x)| < M for x ∈ F .

Proof. Let us suppose the contrary, that for any n > 0 there is xn ∈ F such that |f (xn )| ≥ n.
As {xn } is a sequence in a bounded set F , we can take a convergent subsequence {yn } of {xn }.
As F is closed, yn → y and y ∈ F . By assumption f is continuous, therefore, it must hold that
limn→∞ f (yn ) = f (y). But this is impossible because |f (yn )| ≥ n by our choice. Therefore, f is
bounded.

Example 83. • Consider the function f(x) = 1/x defined on R \ {0}. This is not bounded, but when we restrict it to an interval [1/n, n], it is bounded by n.

• Consider the function f(x) = 1/x for x ∈ [−1, 1], x ≠ 0, and f(0) = 0. This is defined on a closed interval [−1, 1], but it is not continuous. Therefore, the previous theorem does not apply; indeed, it is not bounded.

Figure 37: A continuous function on a bounded closed interval is bounded. If either of these conditions is violated, the function can be unbounded.

Figure 38: Left: y = x on x > 0. There is no minimum or maximum. Right: y = x² on R. The minimum is 0 at x = 0, but there is no maximum. When restricted to [a, b], either a² or b² is the maximum.

Oct. 25. Maximum and minimum of functions, the Weierstrass


theorem, uniform continuity.
The maximum and minimum of functions
Definition 84. Let f be a function defined on S.

• We say that f takes its maximum at x0 if f (x0 ) ≥ f (x) for all x ∈ S.

• We say that f takes its minimum at x0 if f (x0 ) ≤ f (x) for all x ∈ S

Example 85. Note that a function does not necessarily admit maximum or minimum. If it has,
they may depend on the domain.

• f(x) = x, defined on x > 0, has no maximum or minimum. Indeed, for any x > 0, f(x/2) = x/2 < x and f(2x) = 2x > x.

• f (x) = x2 , defined on x ∈ R, has no maximum but the minimum is at x = 0 with f (0) = 0.


If it is restricted to the interval [a, b], then the maximum is the larger one of a2 , b2 .

Theorem 86 (Weierstrass). Let F ⊂ R be a bounded closed set (or interval), and f be a


continuous function on F . Then f admits both a maximum and a minimum in F .

Figure 39: The graph of the function y = x − [x], the decimal part of x. This is bounded, but
has no maximum. The minimum is 0 at x ∈ Z.

Proof. By Theorem 82, f is bounded, say −M < f(x) < M. Then the image A = {f(x) : x ∈ F} is a bounded set in R, therefore it admits sup A and inf A. Let us prove that f admits a maximum (the case of the minimum is analogous). For each n there is xn ∈ F such that sup A − 1/n < f(xn). As F is bounded, {xn} admits a convergent subsequence yn → y, and y ∈ F because F is closed. Now, as f is continuous, we have f(y) = lim_{n→∞} f(yn). As {yn} is a subsequence, it holds that sup A − 1/n < f(yn) ≤ sup A. This implies that f(y) = sup A. That is, f attains a maximum at y.

Example 87. (and non example)

• f (x) = x2 is continuous, hence on any closed and bounded F f admits a maximum and a
minimum. But not on the whole real line R, which is not bounded.

• f(x) = x − [x] is not continuous, and indeed it does not admit a maximum on [0, 1], although [0, 1] is closed and bounded.

Often it is said that a closed and bounded set F ⊂ R is compact. We have seen that in
any sequence {an } in a compact set admits a convergent subsequence (the Bolzano-Weierstrass
theorem), and the limit is in F . Conversely, if a set A has a property that any sequence in it has
a convergent subsequence with the limit in A, then it is compact (bounded and closed): indeed,
A must be bounded because otherwise we could take an unbounded sequence. Furthermore, A
must be closed, because if an ∈ A is a convergent sequence, we can take a convergent subsequence
with the limit a in A, but there is only one limit for an , hence an → a ∈ A, that is, A is closed.
Let us see another strong property of continuous functions defined on bounded and closed
sets.

Definition 88. Let S ⊂ R, f : S → R. f is said to be uniformly continuous on S if, for any ε > 0, there is δ > 0 such that |f(x) − f(y)| < ε for all x, y ∈ S with |x − y| < δ.

Note the difference with continuity: a function f is continuous if for each x ∈ S and for each ε there is δ such that |f(y) − f(x)| < ε if |y − x| < δ. In other words, the number δ may change from point to point.
On the other hand, uniform continuity asserts that for each ε > 0 there is a δ that applies to all x, y ∈ S, hence uniformly in S.

Example 89. (functions that are not uniformly continuous)

• f(x) = 1/x is continuous on {x ∈ R : x > 0}. However, it is not uniformly continuous. Indeed, take ε = 1: for any δ > 0 we can take N such that 1/N < δ and N > 2. Then, with x = 1/N and y = 2/N, we have |f(x) − f(y)| = N/2 > 1 = ε but |x − y| = 1/N < δ.

• f(x) = sin(1/x) is continuous on {x ∈ R : x > 0} but not uniformly continuous. Indeed, take ε = 1/2: for any δ > 0 we can take N odd such that 2/(πN) < δ. Then, with x = 2/(πN) and y = 1/(πN), we have |f(2/(πN)) − f(1/(πN))| = |sin(πN/2) − sin(πN)| = 1 > ε, but |x − y| = 1/(πN) < δ.

Figure 40: Functions continuous but not uniformly continuous.

Figure 41: Functions defined on R but uniformly continuous (y = sin x and y = tanh x).

Note that the function f(x) = |x| is continuous. Indeed, if x > 0, then f(x) = x and this is continuous at x. Similarly, f is continuous at x < 0. Finally, if x = 0, for any ε > 0 we take δ = ε. Then if |y − x| = |y − 0| < δ, we have ||y| − |0|| = |y − 0| < δ = ε.

Theorem 90 (Heine-Cantor). Let F bounded and closed, f : F → R a continuous function.


Then f is uniformly continuous.

Proof. To prove this by contradiction, assume that there is ε > 0 such that for any δ > 0 there are x, y ∈ F with |x − y| < δ but |f(x) − f(y)| > ε. In particular, for δ = 1/n > 0 there are xn, yn ∈ F such that |xn − yn| < 1/n but |f(xn) − f(yn)| > ε. Let x_{Nn} be a subsequence of xn converging to some x̃ ∈ F (which exists by Theorem 81). Let us extract the corresponding subsequence {y_{Nn}} of {yn}. As |x̃ − y_{Nn}| ≤ |x̃ − x_{Nn}| + |x_{Nn} − y_{Nn}| → 0, also {y_{Nn}} must converge to x̃ ∈ F.
Then lim_{n→∞} |f(x_{Nn}) − f(y_{Nn})| = |f(x̃) − f(x̃)| = 0, as f is continuous (note that the absolute value is continuous). But this contradicts the assumption that |f(x_{Nn}) − f(y_{Nn})| > ε. Therefore, for all ε > 0 there exists δ > 0 such that for all x, y ∈ F with |x − y| < δ it holds that |f(x) − f(y)| < ε.

Until now, we have studied continuity of functions. A function f is continuous at a point x if for each ε > 0 there is δ such that |f(y) − f(x)| < ε for y such that |y − x| < δ. This tells us that “the graph is connected”, but does not tell us how fast the function f changes.
We would like to know such information. For example, if f represents the motion of a car (in
one direction), then how can we determine the speed of the car? Or if f represents the height of
the mountain in a path and x represents the distance from the starting point, what is the slope
of the mountain?
In the case of the speed, if the car has travelled 100km in two hours, then the average speed
is 50km/h per hour. But it might be that the car travelled with the constant speed 50km/h, or
it travelled with 40km/h in the first one hour and then 60km/h in the second one hour. Is it
possible to determine the speed at a moment? In the case of a mountain, what is the slope at a
point?
They should be approximated by secant lines.

Figure 42: The slope at a point as the limit of the slopes of secant lines.

Oct. 27. Derivative. First examples.


Derivative
As we discussed, we can define the average speed of a car, or the average slope of a curve in an
interval. By taking the limit of the interval that tends to 0, we should obtain the speed or the
slope at one point.

Definition 91. Let I ⊂ R an open interval, f a function defined on I.

• Let x0 ∈ I and h small such that x0 + h ∈ I. The quantity

(f(x0 + h) − f(x0))/h

is called the average rate of change of f between x0 and x0 + h.

• The function f is said to be differentiable at x0 if the following limit exists:

lim_{h→0} (f(x0 + h) − f(x0))/h.

If this limit exists, it is called the derivative of f at x0 and it is denoted by f′(x0) = lim_{h→0} (f(x0 + h) − f(x0))/h, Df(x0) or (df/dx)(x0).

The derivative at the point x0 is defined as the limit of average rates of change. In this sense, the derivative represents the rate of change at the point x0. If f(t) represents the position of a car at time t, then f′(t) is the speed of the car at time t.

Derivatives of elementary functions.


• Let f(x) = c for x ∈ R (constant). For any x ∈ R, (f(x + h) − f(x))/h = (c − c)/h = 0, therefore f′(x) = 0.

Figure 43: The slope of a straight line at a point as the limit of the slopes of secant lines.

• Let A ∈ R and f(x) = Ax for x ∈ R (a straight line). For any x ∈ R, (f(x + h) − f(x))/h = (A(x + h) − Ax)/h = Ah/h = A, therefore f′(x) = A.

• Let A ∈ R and f(x) = Ax² for x ∈ R (a parabola). For any x ∈ R, (f(x + h) − f(x))/h = (A(x + h)² − Ax²)/h = A(2xh + h²)/h = A(2x + h), therefore f′(x) = lim_{h→0} A(2x + h) = 2Ax.

• Let n ∈ N and f(x) = Ax^n for x ∈ R. It holds that (x + h)^n = Σ_{k=0}^n (n choose k) x^{n−k} h^k = x^n + n x^{n−1} h + (n(n−1)/2) x^{n−2} h² + · · ·. For any x ∈ R,

(f(x + h) − f(x))/h = (A(x + h)^n − Ax^n)/h = A(x^n + n x^{n−1} h + (n(n−1)/2) x^{n−2} h² + · · · + h^n − x^n)/h
= A n x^{n−1} + A (n(n−1)/2) x^{n−2} h + · · · + A h^{n−1},

therefore f′(x) = lim_{h→0} A(n x^{n−1} + (n(n−1)/2) x^{n−2} h + · · · + h^{n−1}) = A n x^{n−1}.

• Let f(x) = 1/x for x ∈ R, x ≠ 0. For any x ≠ 0,

(f(x + h) − f(x))/h = (1/(x + h) − 1/x)/h = (x − (x + h))/(h x(x + h)) = −1/(x(x + h)),

therefore f′(x) = lim_{h→0} −1/(x(x + h)) = −1/x².

• Let f(x) = log x, x > 0. Then

(log(x + h) − log x)/h = (1/h) log(1 + h/x) = (1/x) log((1 + h/x)^{x/h}),

therefore f′(x) = lim_{h→0} (1/x) log((1 + h/x)^{x/h}) = lim_{y→0} (1/x) log((1 + y)^{1/y}) = 1/x (this is one of the notable limits we have learned).

• Let f(x) = e^x, x ∈ R. Then

(e^{x+h} − e^x)/h = e^x (e^h − 1)/h,

therefore f′(x) = lim_{h→0} e^x (e^h − 1)/h = e^x (this is one of the notable limits).

• f(x) = sin x, x ∈ R. Recall the formula cos α sin β = (1/2)(sin(α + β) − sin(α − β)). Then, with α = x + h/2, β = h/2, we have sin(x + h) − sin x = 2 cos(x + h/2) sin(h/2), therefore

f′(x) = lim_{h→0} (sin(x + h) − sin x)/h = lim_{h→0} 2 cos(x + h/2) sin(h/2)/h
= lim_{h→0} cos(x + h/2) · lim_{h→0} sin(h/2)/(h/2) = cos x · 1 = cos x

(by the continuity of cos x, one of the notable limits lim_{h→0} sin h/h = 1, and the change of variable h/2 replacing h).

• f(x) = cos x, x ∈ R. Recall the formula − sin α sin β = (1/2)(cos(α + β) − cos(α − β)). Then, with α = x + h/2, β = h/2, we have cos(x + h) − cos x = −2 sin(x + h/2) sin(h/2), therefore

f′(x) = lim_{h→0} (cos(x + h) − cos x)/h = lim_{h→0} −2 sin(x + h/2) sin(h/2)/h
= − lim_{h→0} sin(x + h/2) · lim_{h→0} sin(h/2)/(h/2) = − sin x · 1 = − sin x

(by the continuity of sin x, lim_{h→0} sin h/h = 1, and the change of variables).
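The computations above can be tested numerically: the average rates of change approach the stated derivatives as h → 0. The sketch below compares a few of them at an arbitrarily chosen point (for very small h, rounding error takes over; the helper name is mine).

import math

def diff_quotient(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

x0 = 0.7
tests = [
    (lambda x: x**3, lambda x: 3 * x**2, "x^3"),
    (math.log,       lambda x: 1 / x,    "log x"),
    (math.exp,       math.exp,           "exp x"),
    (math.sin,       math.cos,           "sin x"),
]
for f, fprime, name in tests:
    for h in (0.1, 0.01, 1e-4, 1e-8):
        print(name, h, diff_quotient(f, x0, h), fprime(x0))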

Lemma 92. If f (x) is differentiable at x0 , then f is continuous at x0 .

Proof. We compute the limit:

lim_{x→x0} (f(x) − f(x0)) = lim_{h→0} (f(x0 + h) − f(x0)) = lim_{h→0} ((f(x0 + h) − f(x0))/h) · h = f′(x0) · 0 = 0.

That is, lim_{x→x0} f(x) = f(x0).

Definition 93. Let f : [x0 − δ, x0] → R where δ > 0. If the limit lim_{h→0−} (f(x0 + h) − f(x0))/h exists (from the left), f is said to be left-differentiable at x0, and this limit is denoted by D−f(x0), the left derivative. Similarly, we define the right derivative D+f(x0).

Example 94. Let f(x) = |x|, x0 = 0. D−f(0) = lim_{h→0−} (|0 + h| − 0)/h = lim_{h→0−} −h/h = −1, while D+f(0) = lim_{h→0+} h/h = 1.
D+ f (0) = limh→0+ hh = 1.

Definition 95. Let f be defined on an open interval I. If f is differentiable at each point x of I, then x ↦ f′(x) defines a new function on I. This is called the derivative of f(x).

Example 96. • The derivative of f(x) = C (constant) is f′(x) = 0.

• The derivative of f(x) = x is f′(x) = 1.

• The derivative of f(x) = x² is f′(x) = 2x.

• The derivative of f(x) = sin x is f′(x) = cos x.

Figure 44: The graph of y = |x|, which has left and right derivatives at 0, but they do not coincide.

Oct. 28. More examples of derivatives.


For a function f defined on an open interval I and x ∈ I, we have defined the derivative f′(x) = lim_{h→0} (f(x + h) − f(x))/h, and we say that f is differentiable at x if this limit exists. Sometimes we denote this as f′(x) = (Df)(x).
This is equivalent to writing Df(x) = f′(x) = lim_{w→x} (f(w) − f(x))/(w − x).
Let f, g be functions; we write x ↦ f(x). We denote by f + g the function that maps x ↦ f(x) + g(x). Similarly, f · g is the function x ↦ f(x)g(x), f/g is the function x ↦ f(x)/g(x), and the composition f ◦ g is given by x ↦ f(g(x)).

Theorem 97. Let f, g be functions on open intervals. The following hold if f, g are differentiable at x (or f at g(x) for the chain rule):

• For a, b ∈ R, D(af + bg)(x) = aDf(x) + bDg(x) (linearity).

• D(fg)(x) = Df(x)g(x) + f(x)Dg(x) (Leibniz rule).

• If g(x) ≠ 0, then D(f/g)(x) = (Df(x)g(x) − f(x)Dg(x)) / g(x)².

• D(f ◦ g)(x) = Dg(x)Df(g(x)) (the chain rule).

• If Df(x) ≠ 0 and f is monotonically increasing or decreasing and continuous in (x − ε, x + ε) for some ε > 0, then f⁻¹ is differentiable at y = f(x) and D(f⁻¹)(y) = 1/Df(x).

Proof. • This is straightforward from the algebra of limits:

lim_{h→0} (af(x + h) + bg(x + h) − af(x) − bg(x))/h
= a lim_{h→0} (f(x + h) − f(x))/h + b lim_{h→0} (g(x + h) − g(x))/h = aDf(x) + bDg(x).

• Note that f(x + h)g(x + h) − f(x)g(x) = f(x + h)g(x + h) − f(x)g(x + h) + f(x)g(x + h) − f(x)g(x), and that g is continuous at x because it is differentiable there:

lim_{h→0} (f(x + h)g(x + h) − f(x)g(x))/h
= lim_{h→0} (f(x + h) − f(x))/h · lim_{h→0} g(x + h) + f(x) · lim_{h→0} (g(x + h) − g(x))/h
= Df(x)g(x) + f(x)Dg(x).

• As g(x) ≠ 0, we have lim_{h→0} 1/g(x + h) = 1/g(x), and

lim_{h→0} (f(x + h)/g(x + h) − f(x)/g(x))/h
= lim_{h→0} (f(x + h)g(x) − f(x)g(x + h)) / (g(x + h)g(x)h)
= lim_{h→0} (f(x + h)g(x) − f(x)g(x) + f(x)g(x) − f(x)g(x + h)) / (g(x + h)g(x)h)
= lim_{h→0} ((f(x + h) − f(x))g(x) − f(x)(g(x + h) − g(x))) / (g(x + h)g(x)h)
= (Df(x)g(x) − f(x)Dg(x)) / g(x)².

• Note first that the difference u(k) = (f(g(x) + k) − f(g(x)))/k − Df(g(x)) tends to 0 as k → 0. Let us also set u(0) = 0; then u is continuous around 0. We can write f(g(x) + k) − f(g(x)) = k(Df(g(x)) + u(k)), and this holds also for k = 0.
We compute

lim_{h→0} (f(g(x + h)) − f(g(x)))/h
= lim_{h→0} (f(g(x) + (g(x + h) − g(x))) − f(g(x)))/h
= lim_{h→0} (g(x + h) − g(x))(Df(g(x)) + u(g(x + h) − g(x)))/h
= lim_{h→0} (g(x + h) − g(x))/h · Df(g(x)) + lim_{h→0} (g(x + h) − g(x))/h · u(g(x + h) − g(x))
= Dg(x)Df(g(x)),

because g(x + h) tends to g(x), u is continuous and u(0) = 0.

• Let us assume that f is monotonically increasing and continuous in (x − ε, x + ε). Then, with y = f(x),

lim_{h→0} (f⁻¹(y + h) − f⁻¹(y))/h = lim_{z→y} (f⁻¹(z) − f⁻¹(y))/(z − y)
= lim_{w→x} (f⁻¹(f(w)) − f⁻¹(f(x)))/(f(w) − f(x)) = lim_{w→x} (w − x)/(f(w) − f(x)) = 1/Df(x),

where in the second equality we used the change of variables z = f(w). The case where f is monotonically decreasing is analogous.

Example 98. • Let f(x) = x⁴ + 3x² − 34. Then Df(x) = 4x³ + 6x.

• Let f(x) = (x² + 1)/(x − 2). Then, for x ≠ 2, Df(x) = (2x(x − 2) − (x² + 1)·1)/(x − 2)² = (x² − 4x − 1)/(x − 2)².

• Let f(x) = sin x, g(x) = x². By linearity, D(sin x + x²) = cos x + 2x.
By the Leibniz rule, D(x² sin x) = 2x sin x + x² cos x.
Let us take the composition sin(x²) = f(g(x)). By the chain rule, D(sin(x²)) = D(x²)·(D sin)(x²) = 2x·cos(x²). For (sin x)² = g(f(x)), D((sin x)²) = D(sin x)·2 sin x = 2 sin x cos x.

• By the chain rule, D(exp(−x)) = D(−x)·(D exp)(−x) = −exp(−x). By linearity, D sinh x = D((1/2)(e^x − e^{−x})) = (1/2)(e^x + e^{−x}) = cosh x. Analogously, D cosh x = sinh x.

• For a > 0, it holds that a^x = (e^{log a})^x = e^{x log a}. Indeed, by the chain rule,

D(a^x) = D(exp(log a · x)) = D(log a · x)·(D exp)(log a · x) = log a · exp(log a · x) = log a · a^x.

• Let a > 0 and f(x) = x^a for x > 0. f(x) = exp(a log x), and by the chain rule,

Df(x) = D(a log x)·(D exp)(a log x) = (a/x)·exp(a log x) = (a/x)·x^a = a x^{a−1}.

For a < 0, we consider f(x) = x^a = 1/x^{−a} and we obtain the same formula f′(x) = a x^{a−1}. For a = 0, because x^a = 1, we have D(x^0) = D(1) = 0.

• D tan x = D(sin x / cos x) = (cos x·cos x − sin x·(−sin x))/cos² x = 1/cos² x.

• f(y) = arctan y. That is, f(y) = g⁻¹(y), where g(x) = tan x restricted to (−π/2, π/2). By the formula for the inverse function, we have Df(y) = 1/Dg(x) = cos² x, where y = g(x) = tan x. Therefore, y² = sin² x/cos² x = (1 − cos² x)/cos² x, and cos² x = 1/(1 + y²). By substituting this into the previous result, D arctan y = Df(y) = 1/(1 + y²).

• f(x) = tanh x. f′(x) = 1/cosh² x.

• f(x) = arcsin x (the inverse function of sin x restricted to [−π/2, π/2]). f′(x) = 1/√(1 − x²).
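These rules and examples can also be double-checked symbolically, for instance with the sympy library (a verification aid only, assuming sympy is installed; it is not part of the notes).

import sympy as sp

x = sp.symbols('x')
print(sp.diff(x**4 + 3*x**2 - 34, x))                  # 4*x**3 + 6*x
print(sp.simplify(sp.diff((x**2 + 1)/(x - 2), x)))     # (x**2 - 4*x - 1)/(x - 2)**2
print(sp.diff(sp.sin(x**2), x))                        # 2*x*cos(x**2)
print(sp.diff(sp.sin(x)**2, x))                        # 2*sin(x)*cos(x)
print(sp.diff(sp.atan(x), x))                          # 1/(x**2 + 1)
print(sp.diff(sp.asin(x), x))                          # 1/sqrt(1 - x**2)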

Nov. 3. Meaning of derivative, some applications.


Tangent line
We defined derivative as the limit of average slope of a graph, and expected that it should
represent the slope at one point. If we have the slope at one point, then we should be able to
draw the tangent line to the graph at that point.
Recall that the slope of a segment (x0, y0)–(x1, y1) is defined by (y1 − y0)/(x1 − x0). The graph of y = Ax + B has slope A. Therefore, if the graph of the function y = f(x) passes through the point (x0, y0) and the derivative is f′(x0), the tangent line should be

y = f′(x0)(x − x0) + y0 = f′(x0)x + y0 − f′(x0)x0.

Indeed, this is of the form y = Ax + B with A = f′(x0) and B = y0 − f′(x0)x0, and it passes through the point (x0, y0).

Figure 45: The slope of the straight line y = ((y1 − y0)/(x1 − x0))(x − x0) + y0 is (y1 − y0)/(x1 − x0).

Figure 46: The tangent lines to the graphs of x² and cos x. Their equations are y = 2(x − 1) + 1 and y = −(x − π/2), respectively.

When the slope is positive, the line goes upwards (as one goes to the right), while the line goes downwards when the slope is negative. When the slope is 0, it is a horizontal line. A vertical line is represented by the equation x = a, and this is not of the form y = Ax + B.
If we draw these lines, they are almost always indeed tangent, but in some cases they cross the graph.
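A small sketch computing the coefficients A and B of the tangent line y = Ax + B at a chosen point; here f(x) = x² at x0 = 1, matching Figure 46. The symmetric difference quotient and the step h are my own choices for the numerical slope.

def tangent_line(f, x0, h=1e-6):
    # slope from a (symmetric) difference quotient, intercept from y0 - slope*x0
    slope = (f(x0 + h) - f(x0 - h)) / (2 * h)
    intercept = f(x0) - slope * x0
    return slope, intercept

A, B = tangent_line(lambda t: t**2, 1.0)
print(A, B)   # approximately 2 and -1, i.e. y = 2(x - 1) + 1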

Extrema and stationary points


Definition 99. Let x ∈ R. For  > 0, we call the interval (x − , x + ) the -neighbourhood
of x.
Let f be defined on an interval I. We say that f takes a local minimum or relative
minimum (local maximum or relative maximum , respectively) at x ∈ I if there is an  > 0
of x such that x is the minimum (maximum, respectively) of f in (x − , x + ) ∩ I.
If x is the minimum (maximum) of f on I we may say that x is the global or absolute
maximum (minimum), to distinguish them from local (relative) minimum (maximum).

Example 100. Let f(x) = x³ − x. When we consider this as a function on R, there is no global maximum or minimum, but there are a local maximum and a local minimum at x = −1/√3 and x = 1/√3, respectively (we will see later why these are the points). If we restrict the function to [−2, 2], then −2 and 2 are the global minimum and the global maximum, respectively.

Theorem 101. Let f be defined on an open interval I and assume that f takes a local minimum
(or a local maximum) at the point c ∈ I. If f is differentiable at c, then f 0 (c) = 0.

Figure 47: Left: The graph of y = x³ − x. The local maximum and the local minimum are at x = −1/√3 and x = 1/√3, respectively. When restricted to a closed bounded interval, it has a global maximum and minimum. Right: The graph of y = f(x) = x³. f′(x) = 3x², hence x = 0 is a stationary point.

Proof. Let c be a local maximum (the case of a minimum is analogous). Then f(x) ≤ f(c) for all x ∈ (c − ε, c + ε). As f(x) is differentiable at x = c, its left and right derivatives must coincide.
On the other hand, lim_{h→0+} (f(c + h) − f(c))/h ≤ 0 and lim_{h→0−} (f(c + h) − f(c))/h ≥ 0, therefore f′(c) = 0.

Definition 102. A local minimum or a local maximum of a function f is called an extremum.


A point x where f 0 (x) = 0 holds is called a stationary point.

Any extremum of a differentiable function is a stationary point by Theorem 101, but a


stationary point is not necessarily an extremum.

Example 103. • y = f(x) = x³. Then f′(x) = 3x², hence x = 0 is a stationary point, but as f(x) is monotonically increasing, it is not an extremum.

• y = f(x) = x³ − x. Then f′(x) = 3x² − 1, hence x = ±1/√3 are stationary points. They are a local maximum (x = −1/√3) and a local minimum (x = 1/√3).

• y = |x|. This function has its minimum at x = 0, but the function does not have a derivative there. In particular, f′(0) = 0 does not hold there (f′(0) has no meaning there).

Concrete situation of composed function


Imagine that we have a balloon and a gas is pumped into it at a rate of 50cm3 /s. If the pressure
remains constant, how fast is the radius of the balloon increasing when the radius is 5cm?

• The volume V(t) of the balloon at time t (seconds): V(t) = 50t cm³. This implies dV/dt = 50 cm³/s.

• The radius r(t) of the sphere with volume V(t): 4πr(t)³/3 = V(t). By differentiating both sides with respect to t, 4π r(t)² (dr/dt)(t) = dV/dt.

• By solving this with r(t₀) = 5, (dr/dt)(t₀) = 50/(4π·5²) = 1/(2π).
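A quick numerical sanity check of this computation (only a sketch, using the explicit formula for r(t) that follows from the volume relation):

```python
import numpy as np

r = lambda t: (3 * 50 * t / (4 * np.pi)) ** (1 / 3)   # from 4*pi*r^3/3 = 50 t

t0 = 4 * np.pi * 5**3 / (3 * 50)                      # time at which r(t0) = 5
h = 1e-6
dr_dt = (r(t0 + h) - r(t0 - h)) / (2 * h)             # central difference approximation
print(dr_dt, 1 / (2 * np.pi))                         # both are approximately 0.159 cm/s
```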

Some shape can be represented by an equation, and the equation may define a function
implicitly. For example, we know that the circle centered at (0, 0) with radius r is given by

x2 + y 2 = r 2 .

As we saw before, if we consider only the part y ≥ 0, it defines the function y = √(r² − x²).
It is not always possible to find an explicit expression for y from a given equation. Yet, an equation may define a function in an abstract way. Let us write it y(x).
With the explicit expression, y(x) = √(r² − x²) = (r² − x²)^{1/2}, therefore

y′(x) = (1/2)·(−2x)/√(r² − x²) = −x/√(r² − x²).

It holds that y(x)y′(x) = −x.


This last relation can be also derived as follows: by taking the derivative of x2 + y(x)2 = r2 ,
we obtain 2x + 2y(x)y 0 (x) = 0, hence y(x)y 0 (x) = −x.
If we know some concrete values of y, x (even if we do not know the general formula), then
we can compute y 0 (x) at that point.

The inverse trigonometric functions


sin x, cos x, tan x are injective on certain domains, and hence have the inverse functions. The
standard choices are the following.

• sin x: consider the interval [−π/2, π/2]. The range is [−1, 1]. The inverse function is denoted by arcsin x, defined on [−1, 1].

• cos x: consider the interval [0, π]. The range is [−1, 1]. The inverse function is denoted by arccos x, defined on [−1, 1].

• tan x: consider the open interval (−π/2, π/2). The range is R. The inverse function is denoted by arctan x, defined on R.

Let us compute the derivative of arcsin y by putting y = sin x. Then D(sin x) = cos x = √(1 − sin²x) = √(1 − y²). By the general formula for the inverse function, D(arcsin y) = 1/D(sin x) = 1/cos x = 1/√(1 − y²).
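A small numerical check of this formula at a sample point (a sketch only; the point y = 0.3 is arbitrary):

```python
import numpy as np

y = 0.3
h = 1e-6
numeric = (np.arcsin(y + h) - np.arcsin(y - h)) / (2 * h)   # difference quotient
exact = 1 / np.sqrt(1 - y**2)                               # the formula derived above
print(numeric, exact)                                       # both approximately 1.0483
```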

Nov. 3 (14:00). More applications of derivative.


Theorem 104 (Rolle). Let f be continuous in [a, b] and differentiable in (a, b). If f (a) = f (b),
then there is x0 ∈ (a, b) such that f 0 (x0 ) = 0.

Proof. If f is constant, then f 0 (x) = 0 for all x ∈ (a, b).


If f is not constant, then by Theorem 86 of Weierstrass, f has a minimum and a maximum. As f is not constant, one of them must be different from f(a) = f(b). Therefore, we can take x0, either a minimum or a maximum point, with a ≠ x0 ≠ b. This x0 lies in the open interval (a, b), and by Theorem 101, f′(x0) = 0.

Proposition 105 (Lagrange's mean value theorem). Let f be continuous in [a, b] and differentiable in (a, b). Then there is x0 ∈ (a, b) such that (f(b) − f(a))/(b − a) = f′(x0).

Proof. Let g(x) = f(x) − (f(b) − f(a))x/(b − a), which is continuous in [a, b] and differentiable in (a, b). Then g(a) = (f(a)b − f(b)a)/(b − a) = g(b), and by Theorem 104 there is x0 such that g′(x0) = 0. This implies f′(x0) − (f(b) − f(a))/(b − a) = 0.

Figure 48: A non-constant function, continuous in [a, b] and differentiable in (a, b) with f(a) = f(b), must have a stationary point.

Figure 49: A function continuous in [a, b] and differentiable in (a, b) must have a point where the derivative is equal to the mean slope.

Corollary 106. Let f be continuous in [a, b] and differentiable in (a, b). If f′(x) = 0 for all x ∈ (a, b), then f is constant.

Proof. Let x < y ∈ [a, b]. By Theorem 105, there is x0 ∈ (x, y) such that (f(y) − f(x))/(y − x) = f′(x0) = 0, therefore f(x) = f(y).

Corollary 107. Let f be continuous in [a, b] and differentiable in (a, b).


• If f 0 (x) ≥ 0 (> 0, respectively) for all x ∈ (a, b), then f is monotonically non decreasing
(increasing, respectively).

• If f 0 (x) ≤ 0 (< 0, respectively) for all x ∈ (a, b), then f is monotonically non increasing
(decreasing, respectively).

Proof. Let x < y ∈ (a, b). By Theorem 105, there is x0 ∈ (x, y) such that (f(y) − f(x))/(y − x) = f′(x0). If f′(x0) ≥ 0 (> 0), then f(y) − f(x) ≥ 0 (> 0), that is, f is monotonically non decreasing (increasing, respectively).
The case f′(x) ≤ 0 (< 0) is analogous.

Example 108. • f (x) = x2 . f 0 (x) = 2x, hence f is decreasing if x < 0, x = 0 is the only
one stationary point, and is increasing if x > 0.

• f(x) = sin x. f′(x) = cos x, hence f is increasing if x ∈ (−π/2 + 2πn, π/2 + 2πn) for n ∈ Z, the points x = π/2 + 2πn, −π/2 + 2πn are stationary points, and f is decreasing if x ∈ (π/2 + 2πn, 3π/2 + 2πn).

Theorem 109. Let f be continuous in [a, b] and differentiable in (a, b). Let c ∈ (a, b).
• If f 0 (x) > 0 for x ∈ (a, c) and f 0 (x) < 0 for x ∈ (c, b), then f has a maximum at c.

• If f 0 (x) < 0 for x ∈ (a, c) and f 0 (x) > 0 for x ∈ (c, b), then f has a minimum at c.

Figure 50: A function and its derivative. When the derivative is positive (negative) in an interval,
the function is increasing (decreasing).

Proof. If f′(x) > 0 for x ∈ (a, c), then f is increasing there and continuous at c, therefore f(c) ≥ f(x) for any x ∈ (a, c). On the other hand, as f′(x) < 0 for x ∈ (c, b), f is decreasing there, hence f(c) ≥ f(x) for x ∈ (c, b) as well. Altogether f has a maximum at c.
The second case is analogous.
Example 110. • f(x) = x³ − x. f′(x) = 3x² − 1, and f′(x) > 0 if and only if x < −1/√3 or 1/√3 < x, and f′(x) < 0 if and only if −1/√3 < x < 1/√3. Therefore, f takes a local maximum at x = −1/√3 and a local minimum at x = 1/√3. As f is differentiable in R, there is no other local maximum or minimum.

• f(x) = cosh x. f′(x) = sinh x, and f′(x) > 0 if and only if x > 0, and f′(x) < 0 if and only if x < 0. Therefore, f takes a minimum at x = 0 and has no other minimum or maximum.
• f (x) = sinh x. f 0 (x) = cosh x and cosh x > 0, and hence f (x) is monotonically increasing.
Note that, even if f′(x) > 0 at one point, it does not mean that f is monotonically increasing in a neighbourhood of x. Indeed, a counterexample is given by

f(x) = x² sin(1/x) + x/2 for x ≠ 0,  f(0) = 0.

As we have seen, this function without the part x/2 is differentiable and has derivative 0 at x = 0. Therefore, with x/2, it is still differentiable and f′(0) = 1/2 > 0.
Yet, f is not monotonically increasing in any interval (−ε, ε). To see this, note that

f′(x) = 2x sin(1/x) − (x²/x²)cos(1/x) + 1/2 = 2x sin(1/x) − cos(1/x) + 1/2 for x ≠ 0,  f′(0) = 1/2,

and for any ε > 0 there is 0 < x < ε such that f′(x) < 0: for example, one can take x = 1/(2πn) for sufficiently large n. Then the term 2x sin(1/x) = 0, while −cos(1/x) = −1, and then f′(x) = −1/2.
Note that the derivative f′(x) is discontinuous in this case.


Proposition 111 (Cauchy’s mean value theorem). Let a < b, f, g be continuous in [a, b] and
differentiable in (a, b). Then there is x0 ∈ (a, b) such that f 0 (x0 )(g(b) − g(a)) = g 0 (x0 )(f (b) −
f (a)).
Proof. Let h(x) = f(x)(g(b) − g(a)) − g(x)(f(b) − f(a)); then h(x) is continuous in [a, b] and differentiable in (a, b). h(a) = f(a)g(b) − f(b)g(a) = h(b). By Rolle's theorem 104, there is x0 ∈ (a, b) such that 0 = h′(x0) = f′(x0)(g(b) − g(a)) − g′(x0)(f(b) − f(a)).

Figure 51: A function f such that f 0 (0) > 0 but is not monotonically increasing in any interval
containing x = 0.

Nov. 4. Higher derivatives, convexity and concavity, asymptotes.


Higher derivatives
As we saw before, if f is defined on an open interval and is differentiable on each point of I, then
f 0 defines a new function on I, the (first) derivative. It may happen that f 0 is again differentiable
on each point of I, and it defines a further new function f 00 , the second derivative. If f 00 is
again differentiable, one can also define the third derivative, and so on. We denote the n-th
n
derivative by f (n) , or Dn f, ddxnf .

Example 112. • If f (x) = x4 , then f 0 (x) = 4x3 , f 00 (x) = 12x2 , f (3) (x) = 24x, and so on.
This f is infinitely many times differentiable.

• If f (x) = sin x, then f 0 (x) = cos x, f 00 (x) = − sin x, f (3) (x) = − cos x, f (4) (x) = sin x and
so on. Again this is infinitely many times differentiable.
• Let f(x) = x² sin(1/x) + x/2 for x ≠ 0 and f(0) = 0. Then we have

f′(x) = 2x sin(1/x) − cos(1/x) + 1/2 for x ≠ 0,  f′(0) = 1/2,

and this derivative is not continuous. In particular, f is only once differentiable.

The second derivative is useful to study whether the stationary point (or a critical point)
is a maximum or a minimum, and also to study the shape of the graph.

Lemma 113. Suppose that f is differentiable in an open interval I and at x0 it is twice differ-
entiable.

• If x0 is a stationary point and f 00 (x0 ) > 0 (f 00 (x0 ) < 0, respectively), then f takes a local
minimum (a local maximum, respectively) at x0 .

• If x0 is a local minimum (a local maximum, respectively), then f 00 (x0 ) ≥ 0 (f 00 (x0 ) ≤ 0,


respectively).
0 0 0
Proof. • Let f″(x0) > 0; since x0 is a stationary point, f′(x0) = 0. Then there is ε > 0 such that (f′(x0 + h) − f′(x0))/h = f′(x0 + h)/h > 0 for 0 < |h| < ε. This means that f′(x0 + h) > 0 for h > 0 and f′(x0 + h) < 0 for h < 0, and hence f is monotonically decreasing in (x0 − ε, x0) and increasing in (x0, x0 + ε), that is, f takes a minimum at x0.


Figure 52: A convex function. The graph is below the segment between any pair of points
(a, f (a)), (b, f (b)).

• If x0 is a local minimum and we suppose that f″(x0) < 0, then by the previous point x0 would be a (strict) local maximum, which contradicts the assumption.

• Other cases are analogous.

Example 114. • Let f(x) = x². We have f′(x) = 2x and x = 0 is a stationary point. As f″(x) = 2 > 0, x = 0 is a minimum.

• Let f(x) = x³ − 3x. We have f′(x) = 3x² − 3 and x = 1, −1 are stationary points. As f″(x) = 6x, f takes a local maximum at x = −1 and a local minimum at x = 1.

Convexity and concavity


Note that, for a, b ∈ R and t ∈ [0, 1]. Then ta + (1 − t)b is a point between a, b. Indeed, if a < b,
then a = ta + (1 − t)a < ta + (1 − t)b < tb + (1 − t)b = b (the case b < a is analogous).

Definition 115. Let f be defined on an interval I. We say that f is convex (concave, respec-
tively) if for any a, b ∈ I and t ∈ [0, 1] it holds that

f (ta + (1 − t)b) ≤ tf (a) + (1 − t)f (b) ( respectively f (ta + (1 − t)b) ≥ tf (a) + (1 − t)f (b)).

Note that (ta + (1 − t)b, tf (a) + (1 − t)f (b)) defines a segment between (a, f (a)) and (b, f (b)).
Indeed, the slope from the point (a, f (a)) to such a point is (1−t)(f (b)−f (a))
(1−t)(b−a) = f (b)−f
b−a
(a)
, which
does not depend on t.

Theorem 116. Assume that f is continuous on [a, b], differentiable on (a, b). If f 0 is mono-
tonically nondecreasing (nonincreasing, respectively), then f is convex (concave, respectively).
In particular, if f 00 (x) > 0 (f 00 (x) < 0, respectively) for x ∈ (a, b), then f is convex (concave,
respectively).

Proof. Let x < y in [a, b] and t ∈ (0, 1). Let z = tx + (1 − t)y. We have to prove that
f (z) ≤ tf (x) + (1 − t)f (y), or equivalently, t(f (z) − f (x)) ≤ (1 − t)(f (y) − f (z)).
By Theorem 105, there are points c, d such that x < c < z and z < d < y such that
f (z) − f (x) = f 0 (c)(z − x) and f (y) − f (z) = f 0 (d)(y − z). As f 0 is nondecreasing, f 0 (c) ≤ f 0 (d)
and hence, using t(z − x) = (1 − t)(y − z),

t(f (z) − f (x)) = tf 0 (c)(z − x) ≤ f 0 (d)t(z − x) = f 0 (d)(1 − t)(y − z) = (1 − t)(f (y) − f (z)).

Figure 53: The asymptotes for 1/x and tanh x.

Under certain conditions a converse also holds: if f is twice differentiable and convex, then f″ ≥ 0. We omit the proof.

Example 117. • Let f(x) = x². As f″(x) = 2, f is convex.

• Let f(x) = x³. As f″(x) = 6x, f is concave on (−∞, 0) and convex on (0, ∞).

Asymptotes
The graph of some function may approach a straight line. A more precise concept of this is
asymptotes.

Definition 118. • Let f be defined on (a, ∞). If lim_{x→∞} f(x) = L, then we say that y = L is a horizontal asymptote (analogously for −∞).

• Let f be defined on (a, b). If lim_{x→a+} |f(x)| = ∞, then x = a is called a vertical asymptote (analogously for b).

• Let f be defined on (a, ∞). If there are A, B ∈ R such that lim_{x→∞} f(x)/x = A and lim_{x→∞} (f(x) − Ax) = B, then we say that y = Ax + B is an oblique asymptote (analogously for −∞).

Example 119. • Let f(x) = tanh x. We know that lim_{x→∞} tanh x = 1 and lim_{x→−∞} tanh x = −1, hence y = 1 and y = −1 are the horizontal asymptotes of tanh x.

• Let f(x) = 1/x on (−∞, 0) ∪ (0, ∞). We know that lim_{x→0+} 1/x = ∞ and lim_{x→0−} 1/x = −∞, hence x = 0 is a vertical asymptote of 1/x. Moreover, y = 0 is a horizontal asymptote of 1/x because lim_{x→±∞} 1/x = 0.

• Let f(x) = x tanh x. Then we see that lim_{x→∞} (x tanh x)/x = lim_{x→∞} tanh x = 1 and

lim_{x→∞} (x tanh x − x) = lim_{x→∞} x((e^x − e^{−x})/(e^x + e^{−x}) − 1) = lim_{x→∞} (−2x e^{−x})/(e^x + e^{−x}) = 0,

hence y = x is an oblique asymptote. Similarly, y = −x is an oblique asymptote for x → −∞.
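A numerical illustration of the oblique asymptote (a sketch only): the quantities f(x)/x and f(x) − x should approach 1 and 0, respectively, as x grows.

```python
import numpy as np

f = lambda x: x * np.tanh(x)

for x in [1.0, 5.0, 10.0, 20.0]:
    print(x, f(x) / x, f(x) - x)   # second column -> 1, third column -> 0
```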

Figure 54: The oblique asymptotes for x tanh x.

Figure 55: Left: the graphs of x² and (x − 1/2)² − 1. Right: the graphs of x³ − x² and −x³ − x².

Nov. 10. Curve sketching


Symmetry of functions
Recall that a function is a subset in R × R in the sense that it collects all the points {(x, y) ∈
R × R : y = f (x)}. This is the graph itself.
We can consider certain operations on a function.

• Translation. If g(x) = f (x−a)+b for some function f, g, then the graph of g is obtained by
translating the graph of f by (a, b). Indeed, if (x, y) is on the graph of f , then (x + a, y + b)
is on the graph of g.

• Reflection. If g(x) = f (−x) for some function f, g, then the graph of g is obtained by
reflecting the graph of f with respect to x = 0. Indeed, if (x, y) is on the graph of f , then
(−x, y) is on the graph of g.

• If g(x) = f (−(x − 2a)) for some function f, g, then the graph of g is obtained by reflecting
the graph of f with respect to x = a.

• Scaling. If g(x) = bf (x/a) for some function f, g and a, b > 0, then the graph of g is
obtained by scaling the graph of f by a in the x-direction and b in the y-direction. Indeed,
if (x, y) is on the graph of f , then (ax, by) is on the graph of g.

Figure 56: The graphs of sin x and 2 sin(x/2).

A graph or a function may have a symmetry. A function f is said to have a symmetry if it


is invariant under certain operations.

• Translation symmetry. If f (x) = f (x − a), then the graph of f remains invariant under
the translation (a, 0).

• Reflection. If f (x) = f (−x), then the graph of f is invariant under the reflection respect
to x = 0 and f is said to be even.

• f (x) = −f (−x), f is said to be odd.

• f (x) = f (−(x − 2a)) has reflection symmetry with respect to x = a.

Example 120. • The graph of sin x is invariant under translation by 2π and under the reflection with respect to x = π/2, because sin(x + 2π) = sin(x) and sin(−(x − π)) = −sin(x − π) = sin(x). On the other hand, sin(−x) = −sin x, hence sin x is an odd function.

• The function f(x) = (x − 1/2)² − 1 is invariant under the reflection with respect to x = 1/2, because ((−(x − 1)) − 1/2)² − 1 = (−x + 1/2)² − 1 = (x − 1/2)² − 1.

Curve sketching
The graph of a function f can be qualitatively drawn as follows.

(0) Determine the (natural) domain A of definition of f .

(0.5) Check if f has a symmetry or a period.

(1) Study the sign of f : where f (x) > 0, = 0, < 0 hold.

(2) Determine the asymptotes.

(3) Study the sign of f 0 and find stationary points (where f 0 (x) = 0).

(4) Study the stationary points and find local minima and maxima (either by the second deriva-
tive or the first).
Example 121. • f(x) = e^{−(2x−1)²}.

(0) f(x) is defined for all x ∈ R = A in a natural way.

(0.5) f(x + 1/2) = f(−x + 1/2), that is, f(x) is even with respect to x = 1/2.

(1) e^{−(2x−1)²} > 0 for all x ∈ R.

(2) Consider x → ±∞. lim_{x→±∞} f(x) = 0. The asymptote is y = 0.

Figure 57: Above: the graph of sin x is invariant under translation by 2π and under the reflection with respect to x = π/2. Below: the graph of (x − 1/2)² + 1 is invariant under the reflection with respect to x = 1/2.

Figure 58: The graph of f(x) = e^{−(2x−1)²}.

Figure 59: The graph of f(x) = log(1/sin x).

(3) f′(x) = −4(2x − 1)e^{−(2x−1)²}. f′(x) = 0 ⇔ 2x − 1 = 0 ⇔ x = 1/2, and f(1/2) = 1.

(4) f″(x) = (16(2x − 1)² − 8)e^{−(2x−1)²} = (64x² − 64x + 8)e^{−(2x−1)²}.

x      |     | 1/2 |
f′(x)  |  +  |  0  |  −
f″(x)  |     |  −  |
f(x)   |  ↗  |  1  |  ↘

• f(x) = log(1/sin x).

(0) log y is defined for y > 0, hence we need 1/sin x > 0, that is, sin x > 0 ⇔ x ∈ (2nπ, (2n + 1)π) for n ∈ Z.

(0.5) sin(x + 2π) = sin x, hence f(x + 2π) = f(x). It suffices to draw the graph on (0, π). Since sin(x + π/2) = sin(−x + π/2), f(x) is even with respect to x = π/2.

(1) 0 < sin x ≤ 1 on the domain, hence 1/sin x ≥ 1 and log(1/sin x) ≥ 0.

(2) On one period the domain is (0, π), so we need to check {0, π}. lim_{x→0} f(x) = lim_{x→π} f(x) = ∞. The asymptotes are x = 0 and x = π.

(3) f′(x) = −cos x/sin x. f′(x) = 0 ⇔ cos x = 0 ⇔ x = π/2, and f(π/2) = 0. f′(x) < 0 if x ∈ (0, π/2), and f′(x) > 0 if x ∈ (π/2, π).

(4) f″(x) = 1/sin²x > 0.

x      |     | π/2 |
f′(x)  |  −  |  0  |  +
f″(x)  |     |  +  |
f(x)   |  ↘  |  0  |  ↗
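These qualitative sketches can be compared with a machine-drawn graph. Below is a minimal plotting sketch for the first function, f(x) = e^{−(2x−1)²}, assuming numpy and matplotlib are available (the output file name is arbitrary).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1.0, 2.0, 400)
f = np.exp(-(2 * x - 1) ** 2)

plt.plot(x, f, label=r"$e^{-(2x-1)^2}$")
plt.axvline(0.5, linestyle="--", color="gray")   # axis of symmetry x = 1/2
plt.axhline(0.0, linestyle=":", color="gray")    # horizontal asymptote y = 0
plt.legend()
plt.savefig("curve_sketch.png")                  # or plt.show()
```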

Solutions to equations
We can draw the graphs of f (x) = 1 − x2 and g(x) = ex − 1, and prove that there are two
solutions of the equation f (x) = g(x).
Indeed, consider the function h(x) = g(x) − f(x) = e^x + x² − 2; it suffices to find all x such that h(x) = 0. We have lim_{x→±∞} h(x) = ∞ and h(0) = (1 − 1) − 1 = −1 < 0. By the intermediate value theorem, there are solutions with x > 0 and with x < 0.
Moreover, h′(x) = e^x + 2x, hence there is only one stationary point (because h′(x) is positive for x > 0 and negative for sufficiently small x, while h″(x) = e^x + 2 is positive, therefore h′ is monotonically increasing). Therefore, h(x) is decreasing on a negative half line and increasing on the rest, hence there can be only two points x where h(x) = 0.
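The two solutions can be located numerically, for instance by bisection on the two half lines. This is only an illustrative sketch; the brackets [0, 2] and [−2, 0] are chosen by inspecting the signs of h.

```python
import math

h = lambda x: math.exp(x) + x**2 - 2

def bisect(a, b, tol=1e-10):
    """Find a zero of h in [a, b], assuming h(a) and h(b) have opposite signs."""
    while b - a > tol:
        m = (a + b) / 2
        if h(a) * h(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

print(bisect(0, 2), bisect(-2, 0))   # approximately 0.537 and -1.316
```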

Some applications of the minimum/maximum finding


If one can express a problem as a problem of finding the maximum or the minimum of a function,
we can solve it using derivatives and graphs.

Figure 60: The graph of h(x) = ex + x2 − 2. It crosses the x-axis twice and only twice.

• Among all rectangles of given perimeter 2r, which one has the largest area? Let the vertical
side x, then 0 ≤ x ≤ r and the other side is r − x, hence the area is x(r − x). We need
to find the maximum of f (x) = x(r − x) on the domain {x : 0 < x < r}. We have
limx→0 f (x) = limx→r f (x) = 0, while f 0 (x) = r − 2x, and hence there is a stationary point
at x = 2r , and f 00 (x) = −2, hence this is a local maximum. There is no other stationary
points, and f (0) = f (r) = 0, hence this is the maximum.

• The geometric mean √(ab) is smaller than or equal to the arithmetic mean (a + b)/2 (for a, b > 0). Let us fix P = √(ab) and put a = x; then b = P²/x and x > 0. Let us find the minimum of f(x) = (x + P²/x)/2. This tends to ∞ as x → 0 or x → ∞. On the other hand, f′(x) = (1 − P²/x²)/2, and hence there is only one stationary point at x = P, and f″(x) = P²/x³ > 0, hence this is a local minimum, and it is the minimum. At x = P, we have f(P) = P. Hence P ≤ (x + P²/x)/2, that is, √(ab) ≤ (a + b)/2.

Nov. 10 (14:00). Theorem of Bernoulli-de l’Hôpital


Let us recall the mean value theorem of Cauchy: let f, g be continuous in [a, b] and differentiable
in (a, b). Then there is x0 ∈ (a, b) such that

f′(x0)(g(b) − g(a)) = g′(x0)(f(b) − f(a)).

The (Bernoulli-)de l'Hôpital rule is a useful tool to compute limits of the type 0/0 or ∞/∞.

Theorem 122 (Bernoulli-de l'Hôpital, case 1). Let a < x0 and let f, g be differentiable in (a, x0) such that g′(x) ≠ 0 for x sufficiently close to x0, x ≠ x0, lim_{x→x0−} f(x) = lim_{x→x0−} g(x) = 0 and lim_{x→x0−} f′(x)/g′(x) = L ∈ R. Then g(x) ≠ 0 for x close to x0, x ≠ x0, and lim_{x→x0−} f(x)/g(x) = L.

Figure 61: Theorem of de l'Hôpital. The limit lim_{x→x0−} f(x)/g(x) is determined by lim_{x→x0−} f′(x)/g′(x).

Proof. We can extend f, g to (a, x0 ] by putting f (x0 ) = g(x0 ) = 0, such that they are continuous.
By the hypothesis we may assume that g 0 (x) 6= 0 in (b, x0 ). Let x ∈ (b, x0 ), by Lagrange’s mean
value theorem, there is y ∈ (x, x0 ) such that g(x) = g(x) − g(x0 ) = g 0 (y)(x − x0 ) 6= 0, in
particular, g(x) 6= 0.
By Cauchy’s mean value theorem, for x above, there is y ∈ (x, x0 ) such that f 0 (y)(g(x) −
g(x0 )) = g 0 (y)(f (x) − f (x0 )), that is,

f(x)/g(x) = (f(x) − f(x0))/(g(x) − g(x0)) = f′(y)/g′(y).

If x → x0, such y tends to x0. Because lim_{y→x0−} f′(y)/g′(y) = L by the hypothesis, it holds that lim_{x→x0−} f(x)/g(x) = L.

A similar result holds for right limits.


Example 123. • Consider (e^x − 1)/sin(2x). The limit as x → 0 is of the form 0/0. It holds that (sin(2x))′ = 2 cos(2x) ≠ 0 as x → 0. In addition, (e^x − 1)′ = e^x. Therefore, lim_{x→0} (e^x − 1)/sin(2x) = lim_{x→0} e^x/(2 cos(2x)) = 1/2.

• lim_{x→0} x/(e^x − 1) = 1/e^0 = 1.

• lim_{x→0} x²/(cos x − 1) = lim_{x→0} 2x/(−sin x) = 2/(−cos 0) = −2.
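These limits can also be sanity-checked numerically by evaluating the quotient at points approaching 0 (a rough sketch; floating-point cancellation limits how small x can usefully be):

```python
import math

f = lambda x: (math.exp(x) - 1) / math.sin(2 * x)

for x in [0.1, 0.01, 0.001, 1e-5]:
    print(x, f(x))   # values approach 1/2
```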

Theorem 124 (Bernoulli-de l'Hôpital, case 2). Let f, g be differentiable in (a, ∞) such that g′(x) ≠ 0 for x sufficiently large, lim_{x→∞} f(x) = lim_{x→∞} g(x) = 0 and lim_{x→∞} f′(x)/g′(x) = L. Then g(x) ≠ 0 for x sufficiently large and lim_{x→∞} f(x)/g(x) = L.

Proof. Let F (x) = f ( x1 ), G(x) = g( x1 ). Note that, as x → ∞, we have x1 → 0+ , and F 0 (x) =


− x12 f 0 ( x1 ), G0 (x) = − x12 g 0 ( x1 ). Then for sufficiently small x, G0 (x) 6= 0 because g 0 ( x1 ) 6= 0 for such
x. By applying case 1, we obtain

f (x) F (x) F 0 (x) −x2 f 0 ( x1 ) f 0 (x)


lim = lim = lim 0 = lim = lim
x→∞ g(x) x→0+ G(x) x→0+ G (x) x→0+ −x2 g 0 ( 1 ) x→∞ g 0 (x)
x

as desired.
Example 125. • lim_{x→∞} sin(1/x²)/(1/x²) = lim_{x→∞} ((−2/x³)cos(1/x²))/(−2/x³) = 1.

Theorem 126 (Bernoulli-de l'Hôpital, case 3). Let a < x0 and let f, g be differentiable in (a, x0) such that g′(x) ≠ 0 for x sufficiently close to x0, lim_{x→x0} f(x) = lim_{x→x0} g(x) = +∞ and lim_{x→x0} f′(x)/g′(x) = L. Then g(x) ≠ 0 for x sufficiently close to x0 and lim_{x→x0} f(x)/g(x) = L.

Figure 62: Theorem of de l'Hôpital. The limit lim_{x→∞} f(x)/g(x) is determined by lim_{x→∞} f′(x)/g′(x).

f 0 (y)
Proof. Let ε > 0. By the hypothesis, there is b such that g 0 (y) −L < ε
3 for y ∈ (b, x0 ). In
addition, there is b̃ such that b < b̃ < x0 and in (b̃, x0 ) f (x) > 2f (b) > 0, g(x) > 2g(b) > 0.
g(b)
1− g(x)
Then the function h(x) = f (b) is continuous on (b̃, x0 ] and its value at x0 is 1 . Furthermore,
1− f (x)
it holds that
g(b)
f (x) − f (b) f (x) − f (b) 1 − g(x) f (x)
· h(x) = · f (b)
= .
g(x) − g(b) g(x) − g(b) 1 − g(x)
f (x)

˜ ˜
Let b̃ such that |h(x) − 1| < 3L+1
ε
for x ∈ (b̃, x0 ). By Cauchy’s mean value theorem, there is
y ∈ (b, x) such that
f (x) f (x) − f (b) f 0 (y)
= · h(x) = 0 h(x).
g(x) g(x) − g(b) g (y)
f (x) f 0 (y) f 0 (y) ε2
Now g(x) −L = g 0 (y) h(x) −L < g 0 (y) − L (1 + ε
3L+1 ) + L|h(x) − 1| < ε
3 + 3 + ε
3 < ε.

Theorem 127 (Bernoulli-de l'Hôpital, case 4). Let f, g be differentiable in (a, ∞) such that g′(x) ≠ 0 for x sufficiently large, lim_{x→∞} f(x) = lim_{x→∞} g(x) = +∞ and lim_{x→∞} f′(x)/g′(x) = L. Then g(x) ≠ 0 for x sufficiently large and lim_{x→∞} f(x)/g(x) = L.

Proof. Consider F (y) = f ( y1 ), G(y) = g( y1 ). Since 1


y → ∞ as y → 0+ , and D(F (y)) =
Df ( y1 ) Dg( y1 ) DF (y) F (y)
−y 2
, D(G(y)) = −y 2
we can apply case 3 and obtain L = limy→0+ DG(y) = limy→0+ G(y) =
limx→∞ fg(x)
(x)
.

Example 128. • Let us compute lim_{x→∞} x²/e^x. If the limit lim_{x→∞} 2x/e^x exists, then by the de l'Hôpital rule they coincide. The latter exists if lim_{x→∞} 2/e^x exists, and it does: it is 0. Therefore, the second limit exists and is 0, and hence the first limit exists and is 0.

• lim_{x→0} sin 2x/sin x = lim_{x→0} 2 cos 2x/cos x = 2.

• lim_{x→0} log x/(1/tan x) = lim_{x→0} (1/x)/(−1/sin²x) = lim_{x→0} (−sin²x/x) = 0.

• lim_{x→0} log(sin x)/log x = lim_{x→0} (cos x/sin x)/(1/x) = lim_{x→0} x cos x/sin x = 1.

• lim_{x→∞} xⁿ/e^x = 0.

• lim_{x→∞} (log cosh x)/x = lim_{x→∞} (sinh x/cosh x)/1 = 1.

Nov. 11. Landau’s symbols, Taylor’s formula.


Definition 129. Let I be an open interval, f, f1 , f2 , g : I → R, x0 ∈ I and suppose that g(x) 6= 0
in an neighbourhood of x0 , x 6= x0 . We write:

• f (x) = O(g(x)) (as x → x0 ) if there is M > 0 such that |f (x)| ≤ M |g(x)| in an neighbour-
hood of x0 .
f (x)
• f (x) = o(g(x)) (as x → x0 ) if limx→x0 g(x) = 0.

• f1 (x) = f2 (x) + O(g(x)) (f1 (x) = f2 (x) + o(g(x)), respectively) if f1 (x) − f2 (x) = O(g(x))
(= o(g(x)), respectively).

Similarly, let f, g : (a, ∞) → R, and suppose that g(x) 6= 0 for sufficiently large x (that is, there
is X > 0 such that g(x) 6= 0 if x > X). We write:

• f (x) = O(g(x)) (as x → ∞) if there is M > 0 such that |f (x)| ≤ M |g(x)| for sufficiently
large x.
f (x)
• f (x) = o(g(x)) (as x → ∞) if limx→∞ g(x) = 0.

The cases for (−∞, a), or f (x) → 0 and the cases in I but f (x) → 0 are analogous.
Example 130. • If n > 1, xⁿ = o(x) as x → 0 (because lim_{x→0} xⁿ/x = 0).

• xⁿ = o(x^m) as x → 0 if n > m (because lim_{x→0} xⁿ/x^m = 0).

• x^m = o(xⁿ) as x → ∞ if n > m (because lim_{x→∞} x^m/xⁿ = 0).

• log x = o(x) as x → ∞ (because lim_{x→∞} (log x)/x = 0).

• log x = o(1/x) as x → 0 (because lim_{x→0} x log x = 0).

• sin x = O(x) as x → 0 (because lim_{x→0} (sin x)/x = 1).

• sin x = o(x) as x → ∞ (because lim_{x→∞} (sin x)/x = 0).

• cos x = O(1) as x → 0 (because lim_{x→0} cos x = 1).

• e^x − 1 = O(x) as x → 0 (because lim_{x→0} (e^x − 1)/x = 1).

Lemma 131. Let us consider the behaviour x → x0 = 0 (other cases are analogous).

(a) Let a, b ∈ R. If f (x) = O(h(x)), g(x) = O(h(x)), then af (x) + bg(x) = O(h(x)).

(b) Let a, b ∈ R. If f (x) = o(h(x)), g(x) = o(h(x)), then af (x) + bg(x) = o(h(x)).

Figure 63: Landau’s symbol. x2 = o(x) as x → 0, but x = o(x2 ) as x → ∞.

(c) If g(x) = o(h(x)), then f (x)g(x) = o(f (x)h(x)).


(Similarly, if g(x) = O(h(x)), then f (x)g(x) = O(f (x)h(x)))

(d) If f (x) = o(h(x)), then f (x) = O(h(x)).

(e) Let f (x) = o(h(x)) and f (0) = 0, limx→0 g(x) = 0. Then f (g(x)) = o(h(g(x))).
(Similarly, if f (x) = O(h(x)), then f (g(x)) = O(h(g(x))))
Proof. (a) We have |f (x)| ≤ M1 |h(x)|, |g(x)| ≤ M2 |h(x)|, hence |af (x) + bg(x)| ≤ |a||f (x)| +
|b||g(x)| ≤ (|a|M1 + |b|M2 )|h(x)|.

(b) Analogous.
g(x) f (x)g(x) g(x)
(c) If limx→0 h(x) = 0, then limx→0 f (x)h(x) = limx→0 h(x) = 0.

f (x) f (x)
(d) If limx→0 h(x) → 0, then h(x) < M for x close enough to 0, hence |f (x)| < M |h(x)|.

(e) Let us define ( f (k)


h(k) if k 6= 0
u(k) =
0 if k = 0.
f (k)
Then u(k) is continuous at k = 0 because h(k) → 0 as k → 0. We have f (g(x)) =
f (g(x)) h(g(x))u(g(x))
h(g(x))u(g(x)), and Altogether, limx→0 h(g(x)) = limx→0 h(g(x)) = 1 · 0 = 0. The
other claim is analogous.

Example 132. As x → 0,
• sin(x²) = O(x²), because sin(y) = O(y) and we put y = x².
• e^{x³} − 1 = O(x³), because e^y − 1 = O(y) and we put y = x³.
• sin²(x) = O(x²), because sin x = O(x), and hence sin²x = O(x sin x) = O(x²).

Figure 64: The second order Taylor formula. We approximate a general function by a second
order polynomial.

Second order Taylor Formula


We have defined the derivative by lim_{x→x0} (f(x) − f(x0))/(x − x0). If f is differentiable at x0, then we have lim_{x→x0} (f(x) − f(x0))/(x − x0) = f′(x0), or equivalently,

lim_{x→x0} [(f(x) − f(x0))/(x − x0) − f′(x0)(x − x0)/(x − x0)] = lim_{x→x0} (f(x) − f(x0) − f′(x0)(x − x0))/(x − x0) = 0,

therefore f(x) − f(x0) − f′(x0)(x − x0) = o(x − x0). This means that we can approximate f to the first order by f(x0) + f′(x0)(x − x0). This is indeed called the first order Taylor formula.
The Taylor formula can be extended to higher order.
Proposition 133 (Second order Taylor formula). Let f be differentiable in (a, b) and twice differentiable at x0 ∈ (a, b). Then f(x) = f(x0) + (x − x0)f′(x0) + (1/2)(x − x0)²f″(x0) + o((x − x0)²) as x → x0.

Proof. Let us put P2(x) = f(x0) + (x − x0)f′(x0) + (1/2)(x − x0)²f″(x0). Then P2′(x) = f′(x0) + (x − x0)f″(x0). Furthermore, the first order Taylor formula holds for f′: f′(x) = f′(x0) + (x − x0)f″(x0) + o(x − x0) as x → x0. That is,

lim_{x→x0} D(f(x) − P2(x))/D((x − x0)²) = lim_{x→x0} (f′(x) − f′(x0) − (x − x0)f″(x0))/(2(x − x0)) = (1/2)(f″(x0) − f″(x0)) = 0.

By the Bernoulli-de l'Hôpital theorem,

lim_{x→x0} (f(x) − P2(x))/(x − x0)² = 0,

that is, f(x) = P2(x) + o((x − x0)²).

Example 134. As x → 0,
• e^x = 1 + x + x²/2 + o(x²),
• log(1 + x) = x − x²/2 + o(x²),
• sin(x) = x + o(x²),
• cos(x) = 1 − x²/2 + o(x²).
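A quick numerical comparison of these second order approximations (only a sketch; the sample points are arbitrary):

```python
import math

approx = {
    "exp":      (math.exp,                      lambda x: 1 + x + x**2 / 2),
    "log(1+x)": (lambda x: math.log(1 + x),     lambda x: x - x**2 / 2),
    "cos":      (math.cos,                      lambda x: 1 - x**2 / 2),
}

for name, (f, p2) in approx.items():
    for x in [0.1, 0.01]:
        print(name, x, f(x) - p2(x))   # error shrinks like o(x^2)
```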

Nov. 15. Higher order Taylor formula and more examples.
Higher order Taylor(-Peano) Formula
For f n-times differentiable, the following holds (as we prove later).
With the convention f^(0)(x) = f(x),

f(x) = f(x0) + (x − x0)f′(x0) + (1/2)(x − x0)²f″(x0) + · · · + (1/n!)(x − x0)ⁿ f^(n)(x0) + o((x − x0)ⁿ)
     = Σ_{k=0}^{n} (f^(k)(x0)/k!)(x − x0)^k + o((x − x0)ⁿ).

The part Σ_{k=0}^{n} (f^(k)(x0)/k!)(x − x0)^k is called the Taylor polynomial of f.

Lemma 135. Let f, g be differentiable n times in (a, b) and x0 ∈ (a, b). Suppose that g^(k)(x) ≠ 0 for x ≠ x0, 0 ≤ k ≤ n, but f^(k)(x0) = g^(k)(x0) = 0 for 0 ≤ k ≤ n − 1. Then for any x ≠ x0, x ∈ (a, b), there is ξ between x and x0 such that f(x)/g(x) = f^(n)(ξ)/g^(n)(ξ).

Proof. By Proposition 111,

f(x)/g(x) = (f(x) − f(x0))/(g(x) − g(x0)) = f′(ξ1)/g′(ξ1) = (f′(ξ1) − f′(x0))/(g′(ξ1) − g′(x0)) = f^(2)(ξ2)/g^(2)(ξ2) = · · · = f^(n)(ξn)/g^(n)(ξn),

and we put ξ = ξn.

Proposition 136. Let F be differentiable n times at x0 ∈ (a, b). Then, F (x) = o((x − x0 )n ) as
x → x0 if and only if F (k) (x0 ) = 0 for 0 ≤ k ≤ n.
Proof. We know this for n = 0 by definition. Let us prove the general case by induction, by
assuming that it is true for n.
Let F (x) = o((x − x0 )n+1 ). Then F (k) (x0 ) = 0 for 0 ≤ k ≤ n by the hypothesis of induc-
F (x) F (x) F (n) (ξ)
tion. The assumption is 0 = limx→x0 (x−x0 )n+1
. On the other hand, (x−x0 )n+1
=
(n+1)!(ξ−x0 )
F (x) F (n) (ξ)
for some ξ. If x → x0 , ξ → x0 , that is, 0 = limx→x0 (x−x0 )n+1
= limξ→x0 (n+1)!(ξ−x 0)
=
(n) (ξ)−F (n) (x ) F (n+1) (x0 )
limξ→x0 F (n+1)!(ξ−x 0 )
0
= (n+1)! , hence F
(n+1) (x ) = 0.
0
Let F (x0 ) = 0 for 0 ≤ k ≤ n + 1. Then by the Bernoulli-de
(k) l’Hôpital theorem,

F (n+1) (x0 ) F (n) (x) − F (n) (x0 )


0= = lim
(n + 1)! x→x0 (n + 1)!(x − x0 )

F (n) (x) F (n−1) (x) − F (n−1) (x0 )


= lim = lim (n+1)!
x→x0 (n + 1)!(x − x0 ) x→x0 (x − x0 )22
F (n−1) (x) F (n−2) (x) − F (n−2) (x0 )
= lim = lim
x→x0 (n+1)! (x
− x0 )2 x→x0 (n+1)!
− x0 )3
2 3! (x
F (x)
· · · = lim .
x→x0 (x − x0 )n+1

That is, F (x) = o((x − x0 )n+1 ).

Corollary 137. Let f(x) be differentiable n times at x0 ∈ (a, b). Then, with

Pn(x) = Σ_{k=0}^{n} (f^(k)(x0)/k!)(x − x0)^k,

we have f(x) = Pn(x) + o((x − x0)ⁿ).

Figure 65: The graph of a function whose Taylor series converges but not to itself.

Proof. D^k(f − Pn)(x0) = 0 for 0 ≤ k ≤ n. By Proposition 136, f(x) = Pn(x) + o((x − x0)ⁿ).

Example 138. • f(x) = e^x. As f^(n)(x) = e^x, we have f^(n)(0) = 1 and hence e^x = Σ_{k=0}^{n} x^k/k! + o(xⁿ). That is, e^x = 1 + x + x²/2 + x³/3! + · · · + xⁿ/n! + o(xⁿ) as x → x0 = 0.

• f(x) = sin x. As f^(4n)(x) = sin x, f^(4n+1)(x) = cos x, f^(4n+2)(x) = −sin x, f^(4n+3)(x) = −cos x, we have f^(4n)(0) = 0, f^(4n+1)(0) = 1, f^(4n+2)(0) = 0, f^(4n+3)(0) = −1, and hence sin x = Σ_{k=0}^{n} (−1)^k x^{2k+1}/(2k+1)! + o(x^{2n+1}). That is, sin x = x − x³/3! + x⁵/5! + · · · + (−1)ⁿ x^{2n+1}/(2n+1)! + o(x^{2n+1}) as x → x0 = 0.
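The quality of these Taylor polynomials can be inspected numerically. Below is a small sketch evaluating the degree-(2n+1) polynomial of sin at a sample point.

```python
import math

def sin_taylor(x, n):
    """Taylor polynomial of sin at 0: sum_{k=0}^{n} (-1)^k x^(2k+1)/(2k+1)!."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n + 1))

x = 1.0
for n in range(5):
    print(n, sin_taylor(x, n), math.sin(x))   # approximations approach sin(1)
```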
Very often, the Taylor series converges to the original function f(x), that is,

f(x) = Σ_{k=0}^{∞} (f^(k)(x0)/k!)(x − x0)^k

holds for some functions (e^x, sin x, cos x, log(1 + x)) and for some x. One example where we can easily show such convergence is f(x) = 1/(1 − x). Indeed, f′(x) = 1/(1 − x)² and f^(n)(x) = n!/(1 − x)^{n+1}. Hence the Taylor polynomial of degree n around x = 0 is

Σ_{k=0}^{n} (k!/k!) x^k = Σ_{k=0}^{n} x^k,

and we know that this partial sum is (1 − x^{n+1})/(1 − x), which converges to 1/(1 − x) for |x| < 1. But the series does not converge if |x| ≥ 1.
There are functions whose Taylor series converges but not to the original function. For example, if we take

f(x) = e^{−1/x} if x > 0,  f(x) = 0 if x ≤ 0,

then f^(n)(0) = 0 for all n, hence the Taylor polynomial is identically 0, but the original function f(x) is not identically 0.
The question of for which functions the Taylor series converges to the original function will be studied in Mathematical Analysis II.

Applications to certain limits


Taylor’s formula can be used to compute certain indefinite limits.
Figure 66: Approximating the area surrounded by f(x) by rectangles.

Example 139. •

lim_{x→0} (e^x − 1 − x)/sin(x²).

As x → x0 = 0, we have
– e^x = 1 + x + x²/2 + o(x²),
– sin y = y + o(y),
– sin(x²) = x² + o(x²).

Then it holds, as x → 0,

(e^x − 1 − x)/sin(x²) = (x²/2 + o(x²))/(x² + o(x²)),

hence lim_{x→0} (e^x − 1 − x)/sin(x²) = 1/2.

•

lim_{x→0} (x − ln(1 − x) − 2x√(1 + x))/(sin(x) − x e^x).

As x → x0 = 0, we have
– ln(1 − x) = −x − x²/2 + o(x²),
– x√(1 + x) = x(1 + x/2 + o(x)) = x + x²/2 + o(x²),
– sin x = x + 0·x² + o(x²),
– x e^x = x(1 + x + o(x)) = x + x² + o(x²).

Then it holds, as x → 0,

(x − ln(1 − x) − 2x√(1 + x))/(sin(x) − x e^x) = (−x²/2 + o(x²))/(−x² + o(x²)),

hence lim_{x→0} (x − ln(1 − x) − 2x√(1 + x))/(sin(x) − x e^x) = 1/2.
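Both limits can be checked numerically (a rough sketch only; very small x would suffer from rounding):

```python
import math

g1 = lambda x: (math.exp(x) - 1 - x) / math.sin(x**2)
g2 = lambda x: (x - math.log(1 - x) - 2 * x * math.sqrt(1 + x)) / (math.sin(x) - x * math.exp(x))

for x in [0.1, 0.01, 0.001]:
    print(x, g1(x), g2(x))   # both columns approach 0.5
```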

Nov. 17. Definite integral.


Given a function f , we consider (Riemann) integral. This is a concept that extends the area of
familiar figures such as triangles and disks. If f (t) represents the velocity of a car at time t, then
the integral of f gives the distance the car travels in a time interval. If f is the density of a piece
of iron, the integral gives the weight.
The area of a region defined by a function can be approximated by rectangles. We know that
the area of a rectangle with sides a, b is ab.
For an interval I = (a, b) or (a, b] etc., we define |I| = b − a.
Definition 140. (i) Let I be a bounded interval
S in R. A partition of I is a finite set of disjoint
intervals P = {Ij : 1 ≤ j ≤ n} such that nj=1 Ij = I.

(ii) diam(P ) = max{|Ij | : 1 ≤ j ≤ n}.

(iii) A partition P′ is called a refinement of P if every interval of P admits a partition formed by intervals in P′. That is, every Ij ∈ P can be written as Ij = ∪_{k=1}^{nj} Ijk with Ijk ∈ P′.
We denote this by P′ ⪰ P.

Figure 67: Left: a partition P of an interval I and a refinement P′ ⪰ P. Right: two partitions P, P′ of I and P ∧ P′.

Figure 68: The lower and upper sums for a given partition P of I.

(iv) If P, P′ are partitions of I, we define P ∧ P′ = {J ∩ J′ : J ∈ P, J′ ∈ P′, J ∩ J′ ≠ ∅}. We have P ∧ P′ ⪰ P, P′.

If P is a partition of I, then |I| = Σ_{j=1}^{n} |Ij|.
For a partition P of I and a bounded function f : I → R, we put

S̲_I(f, P) := Σ_{j=1}^{n} (inf_{Ij} f)|Ij|   (“lower sum”),
S̄_I(f, P) := Σ_{j=1}^{n} (sup_{Ij} f)|Ij|   (“upper sum”).

We have S̲_I(f, P) ≤ S̄_I(f, P).

Example 141. I = [0, 1]. Pn = {[0, 1/n), [1/n, 2/n), · · · , [(n−1)/n, 1]}. If n′ = mn for m ∈ N, then P_{n′} ⪰ P_n. diam(P_n) = 1/n.

• If f(x) = a, then S̲_I(f, P_n) = S̄_I(f, P_n) = a.

• Let f(x) = x. Then

S̄_I(f, P_n) = Σ_{j=1}^{n} (j/n)·(1/n) = (1/n²)·n(n+1)/2.

Analogously,

S̲_I(f, P_n) = Σ_{j=1}^{n} ((j−1)/n)·(1/n) = (1/n²)·(n(n+1)/2 − n) = n(n−1)/(2n²).

Figure 69: The upper and lower sums for f(x) = a (constant) and f(x) = x.

Therefore, by taking n → ∞, we obtain lim_{n→∞} S̲_I(f, P_n) = lim_{n→∞} S̄_I(f, P_n) = 1/2, which is the area of the triangle {(x, y) : x ∈ [0, 1], 0 ≤ y ≤ x = f(x)}.
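These sums are easy to reproduce numerically; here is a small sketch computing the lower and upper sums of f(x) = x on [0, 1] for the partition P_n:

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper Riemann sums of an increasing f on [a, b], n equal subintervals."""
    width = (b - a) / n
    lower = sum(f(a + j * width) * width for j in range(n))         # inf on each piece = f at left end
    upper = sum(f(a + (j + 1) * width) * width for j in range(n))   # sup = f at right end
    return lower, upper

for n in [10, 100, 1000]:
    print(n, lower_upper_sums(lambda x: x, 0.0, 1.0, n))   # both approach 1/2
```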

Lemma 142. Let I be a bounded interval, f : I → R a bounded function. If P, P 0 are two


partitions of I, then

(i) If P ≺ P 0 , then
S I (f, P ) ≤ S I (f, P 0 ) ≤ S I (f, P 0 ) ≤ S I (f, P ).

(ii) S I (f, P ) ≤ S I (f, P 0 )

Proof. (i) If P ≺ P 0 , then we can take P = {Ij : 1 ≤ j ≤ n} and P 0 = {Ijk : 1 ≤ j ≤ n, 1 ≤


nj
j ≤ nj } such that ∪k=1 Ijk = Ij . Then, for every j, k, inf Ij f ≤ inf Ijk f , supIj f ≥ supIjk f .
It follows that
n
X n
X nk
X nk
n X
X
S I (f, P ) = (inf f )|Ij | = (inf f ) |Ijk | = (inf f )|Ijk |
Ij Ij Ij
j=1 j=1 k=1 j=1 k=1
nk
n X
X
≤ (inf f )|Ijk | = S I (f, P 0 ).
Ijk
j=1 k=1
n
X n
X nk
X nk
n X
X
S I (f, P ) = (sup f )|Ij | = (sup f ) |Ijk | = (sup f )|Ijk |
j=1 Ij j=1 Ij k=1 j=1 k=1 Ij
n
X Xnk
≥ (sup f ) |Ijk | = S I (f, P 0 ).
j=1 Ijk k=1

Note that S I (f, Q) ≤ S I (f, Q) for any partition Q.

(ii) Since P ≺ P ∧P 0 , P 0 ≺ P ∧P 0 , it follows from the previous point S I (f, P ) ≤ S I (f, P ∧P 0 ) ≤


S I (f, P ∧ P 0 ) ≤ S I (f, P 0 ).

Definition 143. Let I = (a, b) or [a, b] etc. be a bounded interval and f : I → R bounded. f is said to be integrable on I if

sup_P S̲_I(f, P)  (“lower integral”)  =  inf_P S̄_I(f, P)  (“upper integral”),

where inf_P and sup_P are taken over all possible partitions P of I, and in this case we denote this number by

∫_I f(x)dx = ∫_a^b f(x)dx.

The symbol x does not have any particular meaning, and one can also write this as ∫_I f(t)dt.
Example 144. • ∫_0^1 a dx = a. Indeed, for all partitions S̲_I(f, P) = S̄_I(f, P) = a.

• ∫_0^1 x dx = 1/2. Indeed, with f(x) = x, we have found P_n such that S̲_I(f, P_n) = n(n−1)/(2n²) and S̄_I(f, P_n) = n(n+1)/(2n²), hence the sup and the inf coincide and equal 1/2.
S I (f, P ) = 2n2
, hence the sup and the inf coincide and it is 21 .
In general, it is difficult to show integrability by definition. Fortunately, we can prove that
continuous functions on a closed bounded interval are integrable, and we also have the funda-
mental theorems of calculus, that let us calculate integrals with the knowledge of derivatives.

Nov. 18. Integrability of continuous functions, fundamental theorems of calculus.
Recall that any continuous function f on a closed bounded interval I = [a, b] is uniformly continuous, that is, for a given ε > 0 there is δ > 0 such that |f(x) − f(y)| < ε for x, y ∈ I with |x − y| < δ.

Theorem 145. Let I = [a, b] be a closed bounded interval. Then any continuous function f on I is integrable.

Proof. By Theorem 90, for ε/(2(b − a)) there is δ such that for x, y ∈ I with |x − y| < δ it holds that |f(x) − f(y)| < ε/(2(b − a)). Now, for any partition P = {Ij}_{j=1}^{n} with diam P = max{|Ij| : 1 ≤ j ≤ n} < δ, we have

S̄_I(f, P) − S̲_I(f, P) = Σ_{j=1}^{n} (sup_{Ij} f − inf_{Ij} f)|Ij| ≤ Σ_{j=1}^{n} (ε/(2(b − a)))|Ij| = ε/2 < ε.

Therefore, f is integrable.
Example 146 (of a non-integrable function). If f(x) = 0 for x rational and f(x) = 1 for x irrational, then f is not integrable: S̄_I(f, P) = |I| and S̲_I(f, P) = 0 for every partition P, because any interval contains both a rational number and an irrational number.
Proposition 147. Let I be a bounded interval, f, g bounded and integrable on I.

(i) If c, d ∈ R, then cf + dg is integrable on I and ∫_I (cf(t) + dg(t))dt = c∫_I f(t)dt + d∫_I g(t)dt.

(ii) If f ≤ g, then ∫_I f(t)dt ≤ ∫_I g(t)dt.

(iii) If Ĩ ⊂ I is a subinterval, then f is integrable on Ĩ. If P = {Ij : 1 ≤ j ≤ n} is a partition of I, then ∫_I f(t)dt = Σ_{j=1}^{n} ∫_{Ij} f(t)dt.

Proof. (i) Integrability of cf is easy: S I (cf, P ) = cS I (f, P ) if c ≥ 0 and S I (cf, P ) = cS I (f, P )


if c < 0. If c = 0 everything becomes 0, and if c > 0 one obtains the limit directly. If c < 0
sup and inf are exchanged.
Let f, g be integrable. We have inf Ij f + inf Ij g ≤ inf Ij (f + g), hence for any partition P ,
S I (f, P ) + S I (g, P ) ≤ S I (f + g, P ). Analogously, S I (f + g, P ) ≤ S I (f, P ) + S I (g, P ). By
taking
R inf and sup with R respect to RP , we obtain integrability of f + g and the equality
I (cf (t) + df (t))dt = c I f (t)dt + d I g(t)dt.

Figure 70: The partition P2 of a subinterval Ĩ of I obtained from a partition P of I.

(ii) If f ≤ g, then S I (f, P ) ≤ S I (g, P ) for any P . Similarly S I (f, P ) ≤ S I (g, P ).

(iii) Let I ⊂ I. Then, for any partition P of I, we can take a refinement P 0 which consists of
the intervals of the form Ij ∩ I and Ij \ I (the latter may be a union of two intervals).
Let P2 = {Ij ∩ I : 1 ≤ j ≤ n}. Then
n
X
S I (f, P2 ) − S I (f, P2 ) ≤ S I (f, P2 ) − S I (f, P2 ) + (sup f − inf f )|Ij \ I|
Ij \I
j=1 Ij \I

= S I (f, P 0 ) − S I (f, P 0 )

As f is integrable, there is P 0 such that S I (f, P 0 ) − S I (f, P 0 ) < , then S I (f, P2 ) −


S I (f, P2 ) < .
Let us consider the case n = 2, that is I = I1 ∪I2 . For P1 , P2 partitions of I1 , I2 , P = P1 ∪P2
is a partition of I. Then

S I1 (f, P1 ) + S I2 (f, P2 ) = S I (f, P1 ∪ P2 ) ≤ S I (f, P1 ∪ P2 ) = S I1 (f, P1 ) + S I2 (f, P2 ).

If S I1 (f, P1 )−S I1 (f, P1 ) < 2ε and RS I2 (f, P2 )−SRI2 (f, P2 ) < 2εR, then S I (f, P1 ∪P2 )−S I (f, P1 ∪
P2 ) < ε and the limits coincide: I f (x)dx = I1 f (x)dx + I2 f (x)dx.

R
Corollary
R 148. If f is continuous, then |f | is continuous and by Theorem 147(ii), I f (x)dx ≤
I |f (x)|dx.
Ra Rb
Definition 149. If a < b, then we put b f (x)dx = − a f (x)dx.
Rb Rc Rb
Lemma 150. It holds that a f (x)dx = a f (x)dx + c f (x)dx for all a, b, c ∈ R.

Proof. If a < c < b, then this follows from Proposition 147(iii). If a < b < c, then ∫_a^c f(x)dx = ∫_a^b f(x)dx + ∫_b^c f(x)dx, hence ∫_a^b f(x)dx = ∫_a^c f(x)dx − ∫_b^c f(x)dx = ∫_a^c f(x)dx + ∫_c^b f(x)dx.
The other cases are analogous.

Theorem 151 (Fundamental theorem of calculus 1). Let I = [a, b] be a bounded closed interval and f : I → R continuous. Then the function of x on I defined by

F(x) = ∫_a^x f(t)dt

is differentiable and F′(x) = f(x).

Proof. We have

(F(x + h) − F(x))/h = (1/h)(∫_a^{x+h} f(t)dt − ∫_a^x f(t)dt) = (1/h)∫_x^{x+h} f(t)dt.

As f is continuous, |∫_x^{x+h} f(t)dt − f(x)h| = |∫_x^{x+h} f(t)dt − ∫_x^{x+h} f(x)dt| ≤ |∫_x^{x+h} |f(t) − f(x)|dt|, and for any ε > 0 there is δ > 0 such that if |t − x| < δ, then |f(t) − f(x)| < ε, therefore |∫_x^{x+h} |f(t) − f(x)|dt| ≤ ε|h| for |h| < δ. Then, for such h, |(F(x + h) − F(x))/h − f(x)| = |(1/h)∫_x^{x+h} f(t)dt − f(x)| ≤ ε.
Since ε > 0 is arbitrary, lim_{h→0} (F(x + h) − F(x))/h = f(x).

Theorem 152 (Fundamental theorem of calculus 2). Let I = [a, b] be a closed bounded interval and f : I → R differentiable in (a, b), such that f′ is continuous and extends to a continuous function on I. Then

f(x) − f(a) = ∫_a^x f′(t)dt.

Proof. D(f(x) − f(a)) = f′(x), while D(∫_a^x f′(t)dt) = f′(x) by Theorem 151. Therefore, D(f(x) − f(a) − ∫_a^x f′(t)dt) = 0, and by Corollary 106, f(x) − f(a) − ∫_a^x f′(t)dt is constant; but with x = a, f(a) − f(a) − ∫_a^a f′(t)dt = 0, hence f(x) − f(a) − ∫_a^x f′(t)dt = 0.

Theorem 152 allows us to compute integrals of certain functions.

• We know that D(x^{n+1}) = (n + 1)xⁿ, or D(x^{n+1}/(n + 1)) = xⁿ. Hence ∫_a^x tⁿ dt = (x^{n+1} − a^{n+1})/(n + 1).

• We know that D(e^x) = e^x, hence ∫_a^x e^t dt = e^x − e^a.

Nov. 22. Primitive and examples of integral calculus.


Definition 153. Let f : I → R. If there is a function F : I → R such that F 0 = f , then F is
called a primitive of f .

By Corollary 106, if there are two primitives F, G of f, then F(x) − G(x) is a constant.
By Theorem 145 and Theorem 151 there is a primitive if f is continuous: F(x) = ∫_a^x f(t)dt is a primitive of f.

Corollary 154. Let f be a continuous function on a closed bounded interval I = [a, b], and let F be a primitive of f. Then ∫_a^b f(t)dt = F(b) − F(a).

Proof. Let F̃(x) = ∫_a^x f(t)dt. Then F̃ is a primitive of f, and hence F̃(x) − F(x) = c (a constant). By Theorem 152, ∫_a^b f(t)dt = F̃(b) − F̃(a) = (F(b) + c) − (F(a) + c) = F(b) − F(a).
This tells us a way to compute the integral ∫_a^b f(x)dx: we only have to find a primitive F(x) of f(x) and take the difference F(b) − F(a). This difference is denoted by [F(x)]_a^b. Namely,

∫_a^b f(x)dx = [F(x)]_a^b.

A primitive of f is also written as ∫ f(x)dx (up to a constant) and it is called the indefinite integral of f. With a generic constant it is written, for example, ∫ x dx = x²/2 + C. In contrast, ∫_a^b f(x)dx (the integral on the interval [a, b]) is called a definite integral.

• Let f(x) = xⁿ, n ∈ N. Then F(x) = x^{n+1}/(n + 1) is a primitive of f(x). That is, ∫ xⁿ dx = x^{n+1}/(n + 1) + C.

• Let f(x) = sin x. Then F(x) = −cos x is a primitive of f(x). That is, ∫ sin x dx = −cos x + C.

• Let f(x) = cos x. Then F(x) = sin x is a primitive of f(x). That is, ∫ cos x dx = sin x + C.

• Let f(x) = e^x. Then F(x) = e^x is a primitive of f(x). That is, ∫ e^x dx = e^x + C.

• Let f(x) = 1/x. Then F(x) = log x is a primitive of f(x) (for x > 0). That is, ∫ (1/x) dx = log x + C.

• Let f(x) = 1/x^{n+1}, n ∈ N. Then F(x) = −1/(n xⁿ) is a primitive of f(x). That is, ∫ (1/x^{n+1}) dx = −1/(n xⁿ) + C.

With this knowledge of primitives, we can compute definite integrals.

• ∫_0^1 x² dx = [x³/3]_0^1 = 1/3 − 0/3 = 1/3.

• ∫_{−1}^{2} x⁴ dx = [x⁵/5]_{−1}^{2} = 32/5 − (−1/5) = 33/5.

• ∫_0^π sin x dx = [−cos x]_0^π = −(−1) − (−1) = 2.

• ∫_0^π cos x dx = [sin x]_0^π = 0 − 0 = 0.

• ∫_1^2 e^x dx = [e^x]_1^2 = e² − e.

• ∫_1^2 (1/x) dx = [log x]_1^2 = log 2 − log 1 = log 2.

• ∫_1^2 (1/x²) dx = [−1/x]_1^2 = −1/2 − (−1/1) = 1/2.
Note that 1/x is not integrable on [−1, 2] (it is unbounded near 0), so ∫_{−1}^{2} (1/x)dx does not make sense as an ordinary integral!
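As a sanity check (a sketch only), one can compare the primitive-based value of ∫_1^2 (1/x)dx with a crude Riemann sum:

```python
import math

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(f(a + (j + 0.5) * width) * width for j in range(n))

print(riemann_sum(lambda x: 1 / x, 1.0, 2.0, 10000), math.log(2))   # both ≈ 0.6931
```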

Lemma 155. We have the following.

• ∫ (f(x) + g(x))dx = ∫ f(x)dx + ∫ g(x)dx + C.

• ∫ af(x)dx = a∫ f(x)dx + C for a ∈ R.

• If ∫ f(x)dx = F(x) + C, then ∫ f(x − a)dx = F(x − a) + C for a ∈ R.

Proof. All these follow from the rules for derivatives.

• If DF(x) = f(x), DG(x) = g(x), then D(F(x) + G(x)) = f(x) + g(x).

• If DF(x) = f(x), then D(aF(x)) = af(x).

• If DF(x) = f(x), then D(F(x − a)) = f(x − a) by the chain rule.

With this, we can also compute definite integrals.

Lemma 156. We have the following.

• ∫_a^b (f(x) + g(x))dx = ∫_a^b f(x)dx + ∫_a^b g(x)dx.

• ∫_a^b cf(x)dx = c∫_a^b f(x)dx for c ∈ R.

Proof. This follows immediately from Lemma 155.


• ∫_0^1 (x − 1)² dx = [(x − 1)³/3]_0^1 = 0/3 − (−1)³/3 = 1/3.

• ∫_{−1}^{2} x²(x − 2)dx = ∫_{−1}^{2} (x³ − 2x²)dx = [x⁴/4 − 2x³/3]_{−1}^{2} = (4 − 16/3) − (1/4 + 2/3) = −9/4.

• ∫_0^π sin(x − π/3)dx = [−cos(x − π/3)]_0^π = −cos(2π/3) + cos(−π/3) = 1/2 + 1/2 = 1.

• ∫_1^2 2/(x + 1)dx = 2[log(x + 1)]_1^2 = 2(log 3 − log 2).

Theorem 157. Let f be a continuous function on I = [a, b]. Then there is c ∈ (a, b) such that ∫_a^b f(x)dx = f(c)(b − a).

Proof. Note that F(x) = ∫_a^x f(t)dt is differentiable and F′(x) = f(x). By Lagrange's mean value theorem, there is c ∈ (a, b) such that F′(c) = (F(b) − F(a))/(b − a), that is, f(c)(b − a) = ∫_a^b f(x)dx.

Some applications
Let us imagine a car travelling at speed v(t) at time t. Then the distance travelled between times a and b is ∫_a^b v(t)dt. Indeed, if x(t) is the position of the car at time t, then x′(t) = v(t) by definition. By Theorem 152, we have x(b) − x(a) = ∫_a^b v(t)dt.
As another example from physics, consider the situation where someone is pushing a mass m (kg) vertically up to a certain height h. The work done by this motion is, as long as the gravitational force is constant, mgh, where g is the gravitational acceleration. mg is called the weight, which is the downward force. If one is pushing a mass in a changing gravitational field g(x) (like a rocket carrying a payload), the work done between heights h1 and h2 is ∫_{h1}^{h2} m g(x)dx.

Nov. 24. Integral calculus.


Indefinite integral of elementary functions
Note that, for x < 0, D(log|x|) = D(log(−x)) = −1/(−x) = 1/x. Altogether,

f(x)          | f′(x)             | ∫ f(x)dx
c (constant)  | 0                 | cx + C
x^α           | αx^{α−1}          | x^{α+1}/(α+1) + C   (α ≠ 0, −1; x ≠ 0 for negative powers)
x^{−1}        | −1/x²             | log|x| + C          (x ≠ 0)
1/(x²+1)      | −2x/(x²+1)²       | arctan x + C
1/√(1−x²)     | x/(1−x²)^{3/2}    | arcsin x + C        (−1 < x < 1)
e^x           | e^x               | e^x + C
log|x|        | 1/x               | x log|x| − x + C    (see below, x ≠ 0)
sin x         | cos x             | −cos x + C
cos x         | −sin x            | sin x + C
sinh x        | cosh x            | sinh x + C
cosh x        | sinh x            | cosh x + C

Integration by parts
Recall that, if f, g are differentiable, then it holds that D(f(x)g(x)) = Df(x)·g(x) + f(x)·Dg(x). By writing this as Df(x)·g(x) = D(f(x)g(x)) − f(x)·Dg(x), we can find a primitive of Df(x)·g(x) if we know a primitive of f(x)·Dg(x). Schematically,

∫ f′(x)g(x)dx = f(x)g(x) − ∫ f(x)g′(x)dx + C.

This is called integration by parts.

Example 158. • Consider ∫ x cos x dx. With f(x) = sin x, g(x) = x, this is of the form f′(x)g(x), because f′(x) = cos x. By integration by parts, with g′(x) = 1, we obtain

∫ x cos x dx = x sin x − ∫ sin x · 1 dx + C = x sin x + cos x + C.
We can check this result by taking the derivative: D(x sin x + cos x) = sin x + x cos x − sin x = x cos x.

• Consider ∫ log x dx. We can see this as 1 · log x, and 1 = D(x). Therefore, with f(x) = x, g(x) = log x and g′(x) = 1/x, we have

∫ log x dx = x log x − ∫ x · (1/x) dx + C = x log x − ∫ 1 dx + C = x log x − x + C.

• Consider ∫ x² sin x dx. This cannot be integrated in one step, but by successive applications of integration by parts. By noting that sin x = D(−cos x) and cos x = D(sin x),

∫ x² sin x dx = x²(−cos x) − ∫ 2x(−cos x)dx + C
             = −x² cos x + 2x sin x − ∫ 2 sin x dx + C
             = −x² cos x + 2x sin x + 2 cos x + C.
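If the sympy library is available, such computations can be cross-checked symbolically; this is only a sketch, not part of the course material.

```python
import sympy as sp

x = sp.symbols('x')
candidates = {
    x * sp.cos(x):      x * sp.sin(x) + sp.cos(x),
    sp.log(x):          x * sp.log(x) - x,
    x**2 * sp.sin(x):  -x**2 * sp.cos(x) + 2 * x * sp.sin(x) + 2 * sp.cos(x),
}
for f, F in candidates.items():
    print(f, sp.simplify(sp.diff(F, x) - f) == 0)   # True: F' = f in each case
```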

As for indefinite integral, we do not have to find Rthe whole indefinite integral, but we can
b
give values to parts. Let us recall that f (b) − f (a) = a f 0 (x)dx.

Lemma 159. If f, g are differentiable and f′, g′ are continuous, then

∫_a^b f′(x)g(x)dx = [f(x)g(x)]_a^b − ∫_a^b f(x)g′(x)dx.

Proof. (fg)′ = f′g + fg′, hence ∫ f′(x)g(x)dx = f(x)g(x) − ∫ f(x)g′(x)dx (integration by parts), and the claim follows from Theorem 152: if H(x) is a primitive of h(x), then ∫_a^b h(x)dx = H(b) − H(a). Note that with h(x) = f(x)g′(x), we can take H(x) = ∫_a^x h(t)dt and H(b) − H(a) = ∫_a^b h(x)dx − ∫_a^a h(x)dx = ∫_a^b h(x)dx.

Example 160. ∫_0^1 x e^{2x} dx = (1/2)[x e^{2x}]_0^1 − ∫_0^1 (1/2)e^{2x} dx = (1/2)(e² − 0) − (1/4)[e^{2x}]_0^1 = e²/2 − (1/4)(e² − 1) = e²/4 + 1/4.

Substitution
Next, let us consider the case where the integral is of the form ∫ ϕ′(x)f′(ϕ(x))dx. We know that D(f(ϕ(x))) = ϕ′(x)f′(ϕ(x)) by the chain rule, hence in this case we have

∫ ϕ′(x)f′(ϕ(x))dx = f(ϕ(x)) + C.

This is called substitution.

Example 161. • Consider ∫ 2x sin(x²)dx. Note that 2x = D(x²) and sin(y) = D(−cos y), hence

∫ 2x sin(x²)dx = −cos(x²) + C.

Indeed, by the chain rule, D(−cos(x²)) = −(2x(−sin(x²))) = 2x sin(x²).

• Consider ∫ x/(x² + 1)dx. Note that 2x = D(x²), and hence

∫ x/(x² + 1)dx = (1/2)∫ 2x/(x² + 1)dx = (1/2)∫ D(x²)/(x² + 1)dx = (1/2)log(x² + 1) + C.

• Consider ∫ tan x dx. Recall that tan x = sin x/cos x and note that D(cos x) = −sin x. Hence

∫ tan x dx = −∫ D(cos x) · (1/cos x) dx + C = −log|cos x| + C.

Lemma 162. If f, ϕ are differentiable and f′, ϕ′ are continuous, then

∫_a^b ϕ′(x)f′(ϕ(x))dx = [f(ϕ(x))]_a^b = [f(y)]_{ϕ(a)}^{ϕ(b)} = f(ϕ(b)) − f(ϕ(a)).

Proof. This follows immediately because f(ϕ(x)) is a primitive of ϕ′(x)f′(ϕ(x)).

Example 163.

∫_0^π sin³x dx = −∫_0^π (cos²x − 1)sin x dx = ∫_0^π ((cos x)² − 1)D(cos x)dx = [cos³x/3 − cos x]_0^π
             = ((−1)³/3 − (−1)) − (1³/3 − 1) = 4/3.

Nov. 25. Integral calculus.


Rational functions
We know that

• ∫ 1/(x − a) dx = log|x − a|.

• for n ∈ N, n ≥ 2, ∫ 1/(x − a)ⁿ dx = −1/((n − 1)(x − a)^{n−1}).

• ∫ 1/(x² + 1) dx = arctan x,  ∫ 1/(x² + a²) dx = (1/a)arctan(x/a).

• ∫ 1/((x − b)² + a²) dx = (1/a)arctan((x − b)/a).
dx = a a ).

We also have
1 + x2 − x2 −2x · x
Z Z Z Z
1 1
dx = dx = + dx
(x2 + 1)2 (x2 + 1)2 (x2 + 1) 2(x2 + 1)2
Z
x 1 x 1
= arctan x + 2
− 2
= arctan x + 2
− arctan x
2(x + 1) 2(x + 1) 2(x + 1) 2
1 x
= arctan x + .
2 2(x2 + 1)

Indeed, by taking the derivative,


0
(x2 + 1) − 2x2

1 x 1 1
arctan x + 2
= 2
+ 2 2
= 2 .
2 2(x + 1) 2(x + 1) 2(x + 1) x +1

In general, the derivative of the primitive F of f must be the original function f . We can
check that the primitive in this way. This is often useful because the calculus of primitive is
often complicated, while derivative is mechanical.

1
x3 −1 (4x3 −x)+ x4 −1
• dx. Note that x−4 x−4 4
R R
Example 164. 4x3 −x
dx = 4
4x3 −x 4x3 −x
= x(2x−1)(2x+1) = x +
− 27 − 92
2x−1 + 2x+1 (see below) and hence
!
x3 − 1 − 27 − 92
Z Z
x 1 4
3
dx = + + + dx
4x − x 4 4 x 2x − 1 2x + 1
x 7 9
= + log |x| − log |2x − 1| − log |2x + 1|.
4 16 16
− x2 +1 x
• Using 1
(x−1)2 (x2 +1)
= Ax+B
(x−1)2
+ Cx+D
x2 +1
= (x−1)2
+ 2
2
x +1
we get

− x2 + 1 x
Z Z Z
1 2
dx = dx + dx
(x − 1) (x2 + 1)
2 (x − 1)2 x2 + 1
− (x−1) + 12 x
Z Z
2 2
= dx + dx
(x − 1)2 x2 + 1
1 1 1
= − log |x − 1| − + log(x2 + 1).
2 2(x − 1) 4
P (x) P1 (x)
In general, if P (x) and Q(x) are polynomials, Q(x) can be written as a sum of (x−a)n or
P2 (x)
((x−b)2 +a2 )n
(with different polynomials P1 , P2 ), and for each of them one can find a primitive.

Example 165. • (x−1)(x+1) .


1
We put 1
(x−1)(x+1) = A
x−1 + x+1 .
B
Then

1 A(x + 1) + B(x − 1)
=
(x − 1)(x + 1) (x − 1)(x + 1)
and 1 = (A + B)x + (A − B). This means that A + B = 0 and 1 = A − B, therfore,
A = 21 , B = − 12 .
Bx3 +Cx2 +Dx+E
• 1
(x+1)(x2 +1)2
. We put 1
(x+1)(x2 +1)2
= A
x+1 + (x2 +1)2
.
Then
1 A(x2 + 1)2 + (Bx3 + Cx2 + Dx + E)(x + 1)
=
(x + 1)(x2 + 1)2 (x + 1)(x2 + 1)2
hence 1 = A(x4 + 2x2 + 1) + (Bx4 + (B + C)x3 + (C + D)x2 + (D + E)x + E), and this
means A + B = 0, B + C = 0, 2A + C + D = 0, D + E = 0, A + E = 1. To solve this,
we observe A = −B, C = −B, D = −E and hence −3B − E = 0, −B + E = 1, hence
B = − 41 , E = 34 , A = 41 , C = 14 , D = − 43 . Altogether,

1
1 4 − 14 x3 + 14 x2 − 34 x + 3
4
1
4 − x4 + 14 − 21 x + 12
= + = + +
(x + 1)(x2 + 1)2 x+1 (x2 + 1)2 x+1 x2 + 1 (x2 + 1)2

Change of variables
Let F (x) be a primitive of f (x), that is f (x)dx = F (x). If it is difficult to find F directly, one
R

may consider a change of variables x = ϕ(t). By the chain rule, dt d


F (ϕ(t)) = f (ϕ(t))ϕ0 (t). If
G(t) is a primitive of f (ϕ(t))ϕ (t), then F (x) = G(ϕ (t)).
0 −1

In order to recall the rule, it is useful to write


Z Z
dx
f (x)dx = f (ϕ(t)) dt,
dt
even if this is only formal.

Example 166. • f (x) = ex +1 .
1
With t = ex + 1, x = ϕ(t) = log(t − 1), ϕ0 (t) = t−1 ,
1

Z  
t−1
Z Z
1 1 1 1
x
dx = dt = − dt = log
e +1 t(t − 1) t−1 t t
x
and with t = ex + 1, ex1+1 dx = log exe+1 .
R

2
• f (x) = √x
1−x2
. With x = ϕ(t) = sin t, ϕ0 (t) = cos t, t = arcsin x, if t ∈ (− π2 , π2 ),

x2 sin2 t 1 − cos 2t
Z Z Z Z
2 t sin 2t
√ dx = cos tdt = sin tdt = dt = −
1 − x2 cos t 2 2 4
√ 2
and with t = arcsin x, sin 2t = 2 sin t cos t = 2x 1 − x2 , we obtain √x 1
R
1−x2
dx = 2 arcsin x−

1
2 x 1 − x2 .

Nov. 26. Integral calculus, Taylor's formula with remainder, log derivative, improper integral
When the function contains sin x and cos x, it is often useful to do the change of variable x =
cos2 x2 +sin2 x2
ϕ(t) = 2 arctan t, or t = tan x2 . Indeed, we have ϕ0 (t) = 1+t
2
2 , while cos2 x =
1
cos2 x
= 1 + t2
2 2
2
and sin x = sin(2 · x2 ) = 2 sin x2 cos x2 = 1+t
2t
2 and cos x = cos 2 − sin 2 = 1+t2 .
2 x 2 x 1−t

For example,
Z Z 2
1 t +1 2 x
dx = · 2
dt = log |t| + C = log tan + C.
sin x 2t 1+t 2

Definite integral by change of variables


Corollary 167. Let f be contiunous on [a, b], ϕ differentiable and ϕ0 continuous on [α, β], and
ϕ([α, β]) ⊂ [a, b], ϕ(α) = a, ϕ(β) = b. Then
Z b Z β
f (x)dx = f (ϕ(t)) · ϕ0 (t)dt
a α
Rx
Proof. Let F (x) = a f (s)ds. Since d
dt (F (ϕ(t))) = f (ϕ(t)) · ϕ0 (t),
Z β Z b
0
f (ϕ(t)) · ϕ (t)dt = [F (ϕ(t))]βα = [F (x)]ba = f (x)dx.
α a

p
Example 168. • Note that 1 − sin2 t = | cos t| and this is equal to cos t if |t| < 2,
π
hence
with x = sin t,
π π
Z 1p Z
2 p Z
2
1− x2 dx = 2
1 − sin t cos t dt = cos2 t dt
0 0 0
Z π  π
2 cos(2t) + 1 sin(2t) t 2
= dt = +
0 2 4 2 0
π
= .
4

Figure 71: Integral of symmetric and antisymmetric functions.

Some remarks
R0 R0
• If f (x) = f (−x), then by the change of variables x = −t, −a f (x)dx = a f (−t)(−t)0 dt =
Ra Ra Ra
0 f (t)dt, hence −a f (x)dx = 2 0 f (x)dx.
R1 √ R1√
For example, −1 1 − x2 dx = 2 0 1 − x2 dx = π2 .
R0 R0
• If f (x) = −f (−x), then by the change of variables x = −t, −a f (x)dx = a f (−t)(−t)0 dt =
Ra Ra
− 0 f (t)dt, hence −a f (x)dx = 0.
R1 2
For example, −1 ex sin xdx = 0.

• Logarithmic differentiation: If f (x) is difficult to differentiate but log f (x) is easy, then we
0 (x)
have D(log f (x)) = ff (x) , hence we have f 0 (x) = f (x)D(log f (x)).
For example, f (x) = xx (for x > 0) is not a simple product or a composition. But
log f (x) = x log x, hence D(log f (x)) = log x + 1, hence f 0 (x) = xx (log x + 1).

Taylor’s formula with remainder


Proposition 169. If f is differentiable n + 1 times in an neighbourhood of x0 with continuous
derivative, then for x in that neighbourhood,
n
X f (k) (x0 )
f (x) = (x − x0 )k + Rn (x, x0 ),
k!
k=0

1
Rx
where Rn (x, x0 ) = n! x0 f (n+1) (y)(x − y)n dy.
Proof. This is true for n = 0, because
Z x
f (x0 ) + f 0 (y)dy = f (x0 ) + [f (y)]xx0 = f (x).
x0

To prove the formula by induction, assume the claim for n and let f be n + 2 times differentiable,
(k)
then f (x) = nk=0 f k!(x0 ) (x − x0 )k + Rn (x, x0 ) and
P

1 x (n+1)
Z
Rn (x, x0 ) = f (y)(x − y)n dy
n! x0
Z x
1 h ix 1
=− f (n+1) (y)(x − y)n+1 + f (n+2) (y)(x − y)n+1 dy
(n + 1)! x0 (n + 1)! x0
Z x
1 (n+1) n+1 1
= f (x0 )(x − x0 ) + f (n+2) (y)(x − y)n+1 dy
(n + 1)! (n + 1)! x0

This is interesting, because for some functions, we can prove that the Taylor series converges
to the original function. Let us take x0 = 0 and consider the interval (−R, R).
n Rx y
• ex = nk=0 xn! + (n+1)!
1
0 e (y − x) dy. As |x| < R, we have e < e and |(y − x) | < R .
n y R n n
P
Altogether, the remainder term is
Z x Z x
1 1 eR Rn+1
ey (y − x)n dy ≤ eR Rn dy ≤ .
(n + 1)! 0 (n + 1)! 0 (n + 1)!
Rn+1
Note that for any R, (n+1)!→ 0 because for sufficiently large n we have n > 2R, hence from
k
that point the sequence decreases more than by 21 . This means that ex − nk=0 xk! → 0,
P
that is, the Taylor series converges to ex for x ∈ (−R, R), and this is denoted by

x
X xn
e = .
n!
n=0

Furthermore, R was arbitrary, hence this holds for any x.

• The same argument holds for sin x, cos x, because |Dⁿ(sin x)| ≤ 1, |Dⁿ(cos x)| ≤ 1. That is,

sin x = Σ_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n+1)!,

cos x = Σ_{n=0}^∞ (−1)ⁿ x^{2n}/(2n)!,

in the sense that for each x the series converges to the original function.
More properties of Taylor series will be studied in Mathematical Analysis II.

The value of Napier’s number


We can also find the approximate value of e = e¹ using the Taylor formula with remainder. We know that e < (1 + 1/n)^{n+1} for any n; in particular, e < 4. Therefore, by

e = Σ_{k=0}^{12} 1/k! + (1/12!) ∫_0^1 e^y (1 − y)^{12} dy,

the error term satisfies 0 < (1/12!) ∫_0^1 e^y (1 − y)^{12} dy < 4/13! < 0.0000002. Therefore, the approximation of e,

Σ_{k=0}^{12} 1/k! ≅ 2.71828182,

is correct up to the 7-th digit.
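This is easy to reproduce numerically; a minimal sketch using only the standard math module:

import math

s = sum(1 / math.factorial(k) for k in range(13))   # k = 0, ..., 12
print(s)            # 2.7182818282861687
print(math.e - s)   # about 1.7e-10, well below 0.0000002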

Improper integral
We can define the integral for (some) functions that are not bounded, and on intervals that are not bounded.

Definition 170. Let (a, b) be an interval, a ∈ R or a = −∞ and b ∈ R or b = +∞. Let f be a function integrable on all [α, β], where a < α < β < b, α, β ∈ R. If there exists the limit lim_{α→a} ∫_α^β f(x)dx, then we denote it by

∫_a^β f(x)dx = lim_{α→a} ∫_α^β f(x)dx.

It also holds that ∫_a^γ f(x)dx = ∫_a^β f(x)dx + ∫_β^γ f(x)dx for γ ∈ (a, b). Analogously, if there exists the limit lim_{β→b} ∫_α^β f(x)dx, then we write ∫_α^b f(x)dx = lim_{β→b} ∫_α^β f(x)dx. If both limits exist, then we set ∫_a^b f(x)dx = ∫_a^{x₀} f(x)dx + ∫_{x₀}^b f(x)dx for some x₀ ∈ (a, b).

This definition does not depend on x0 ∈ (a, b). Indeed,


∫_a^{x₀} f(x)dx + ∫_{x₀}^b f(x)dx = ∫_a^{x₁} f(x)dx + ∫_{x₁}^{x₀} f(x)dx + ∫_{x₀}^{x₁} f(x)dx + ∫_{x₁}^b f(x)dx
= ∫_a^{x₁} f(x)dx + ∫_{x₁}^b f(x)dx.

Example 171. • Consider (0, 1) and the function f(x) = x^α, α ∈ R. For ε > 0, if α ≠ −1,

∫_ε^1 x^α dx = 1/(α+1) [x^{α+1}]_ε^1 = 1/(α+1) (1 − ε^{α+1}),

and as ε → 0+, this tends to 1/(α+1) if α > −1, and diverges if α < −1. If α = −1,

∫_ε^1 x^{−1} dx = [log x]_ε^1 = −log ε,

and this tends to ∞ as ε → 0+. Therefore, for α > −1, ∫_0^1 x^α dx = 1/(α+1).

• Consider (1, ∞) and the function f(x) = x^α, α ∈ R. For β > 1, if α ≠ −1,

∫_1^β x^α dx = 1/(α+1) [x^{α+1}]_1^β = 1/(α+1) (β^{α+1} − 1),

and as β → +∞, this tends to −1/(α+1) = 1/|α+1| if α < −1, and diverges if α > −1. For α = −1,

∫_1^β x^{−1} dx = [log x]_1^β = log β,

and this tends to ∞ as β → +∞. Therefore, ∫_1^∞ x^α dx = 1/|α+1| for α < −1.

• Consider (−∞, ∞).

∫_α^β x e^{−x²} dx = −(1/2)[e^{−x²}]_α^β = −(1/2)(e^{−β²} − e^{−α²}).

Then both limits lim_{α→−∞}, lim_{β→∞} exist. Furthermore, ∫_{−∞}^∞ x e^{−x²} dx = −(1/2)([e^{−x²}]_{−∞}^0 + [e^{−x²}]_0^∞) = 0.
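These improper integrals can also be checked numerically; the following sketch (assuming scipy is available; quad handles both the endpoint singularity and the infinite interval) reproduces two of the values above:

import numpy as np
from scipy.integrate import quad

# integral of x**(-1/2) on (0, 1): alpha = -1/2 > -1, value 1/(alpha + 1) = 2
print(quad(lambda x: x**(-0.5), 0, 1)[0])

# integral of x**(-2) on (1, infinity): alpha = -2 < -1, value 1/|alpha + 1| = 1
print(quad(lambda x: x**(-2.0), 1, np.inf)[0])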

Nov. 30. Some properties of improper integral


Let us recall that we introduced the improper integral for an unbounded function or on an unbounded interval by

∫_a^β f(x)dx = lim_{α→a} ∫_α^β f(x)dx,

where a < α, and the function is bounded and integrable on all bounded intervals [α, β]. Similarly, ∫_α^b f(x)dx = lim_{β→b} ∫_α^β f(x)dx and ∫_a^b f(x)dx = ∫_a^{x₀} f(x)dx + ∫_{x₀}^b f(x)dx, where x₀ ∈ (a, b).
When these limits exist, we say that the improper integral converges; otherwise it does not converge, or diverges if the limit tends to ∞ or −∞.

Example 172. • The improper integral ∫_0^∞ sin x dx does not converge. Indeed, it holds that ∫_0^β sin x dx = [−cos x]_0^β = −cos β + 1, and as β → ∞, −cos β oscillates and does not converge to any value.

• Consider ∫_0^∞ e^{−x} dx. For β > 0, we have ∫_0^β e^{−x} dx = [−e^{−x}]_0^β = −e^{−β} − (−1) = 1 − e^{−β}, hence as β → ∞, this tends to 1. That is, ∫_0^∞ e^{−x} dx = 1.

• Consider (−∞, ∞).

∫_α^β x e^{−x²} dx = −(1/2)[e^{−x²}]_α^β = −(1/2)(e^{−β²} − e^{−α²}),

and both limits lim_{α→−∞}, lim_{β→∞} exist. Furthermore, ∫_{−∞}^∞ x e^{−x²} dx = −(1/2)([e^{−x²}]_{−∞}^0 + [e^{−x²}]_0^∞) = 0.

Proposition 173. Let f, g be integrable on all [α, β] ⊂ (a, b).


(i) Let 0 ≤ f ≤ g. If ∫_a^b g(x)dx converges, then so does ∫_a^b f(x)dx. If ∫_a^b f(x)dx diverges, then so does ∫_a^b g(x)dx.

(ii) If f ≥ 0 and there is M > 0 such that ∫_α^β f(x)dx < M for all a < α < β < b, then ∫_a^b f(x)dx converges.

(iii) Let 0 < g, and lim_{x→b−} f(x)/g(x) = c ≠ 0. Then ∫_α^b f(x)dx exists if and only if ∫_α^b g(x)dx exists.

(iv) Let f > 0 be decreasing on [α, ∞). ∫_α^∞ f(x)dx converges if and only if Σ_{n=N}^∞ f(n) converges for some N.

(v) |∫_a^b f(x)dx| ≤ ∫_a^b |f(x)|dx.

Proof. (i) On any interval [α, β] it holds that ∫_α^β f(x)dx ≤ ∫_α^β g(x)dx, hence ∫_α^β f(x)dx is bounded and increases as α, β tend to a, b.

(ii) As f(x) ≥ 0, when α → a (and β → b), the integral ∫_α^β f(x)dx increases. But as it is bounded by M, it must converge to a certain number ∫_a^b f(x)dx ≤ M.

(iii) Let c > 0 (the other case is analogous). For x close enough to b, it holds that (c/2)g(x) ≤ f(x) ≤ 2c·g(x). Hence the claim follows from (i).

(iv) We have f(n + 1) ≤ ∫_n^{n+1} f(x)dx ≤ f(n), therefore Σ_{n=N+1}^{M+1} f(n) ≤ ∫_N^{M+1} f(x)dx ≤ Σ_{n=N}^M f(n).

(v) This follows from −|f| ≤ f ≤ |f| and (i) on an interval [α, β], then by taking the limits.

Now we can show that Σ_{n=1}^∞ 1/n is divergent using the integral. Indeed, ∫_1^N (1/x)dx ≤ Σ_{n=1}^{N−1} 1/n, but ∫_1^N (1/x)dx = [log x]_1^N = log N − 0 → ∞, therefore also Σ_{n=1}^{N−1} 1/n → ∞ as N → ∞.
When the improper integral ∫_a^b |f(x)|dx converges, we say that the improper integral ∫_a^b f(x)dx converges absolutely.
Example 174. • ∫_1^∞ (cos x)/x² dx converges. Indeed, |(cos x)/x²| ≤ 1/x² and ∫_1^α 1/x² dx = [−1/x]_1^α = 1 − 1/α, which tends to 1 as α → ∞.

Figure 72: Convergent and non-convergent improper integrals (f(x) = 1/x, f(x) = 1/x², f(x) = sin x, f(x) = e^{−x}).

Figure 73: A graphical proof that Σ_{n=1}^N 1/n diverges as N → ∞ (compared with f(x) = 1/x).

Figure 74: f(x) = (sin x)/x and f(x) = |sin x|/x. The improper integral of the former on [1, ∞) is convergent, while that of the latter is not.

• ∫_1^∞ (sin x)/x dx converges. Indeed, by integration by parts,

∫_1^α (sin x)/x dx = [−(cos x)/x]_1^α − ∫_1^α (cos x)/x² dx = cos 1 − (cos α)/α − ∫_1^α (cos x)/x² dx.

The first two terms tend to cos 1, while the last integral is convergent (by the previous example).
• ∫_1^∞ |sin x|/x dx diverges. Indeed, ∫_{nπ}^{(n+1)π} |sin x|/x dx ≥ 1/((n+1)π) ∫_0^π sin x dx = 2/((n+1)π), and hence ∫_1^{Nπ} |sin x|/x dx ≥ Σ_{n=1}^{N−1} 2/((n+1)π) → ∞.
• ∫_0^∞ 1/√(x³+1) dx is convergent. Indeed, it is enough to consider ∫_1^∞ 1/√(x³+1) dx, and since 1/√(x³+1) ≤ 1/√(x³) = 1/x^{3/2}, we have ∫_1^β 1/√(x³+1) dx ≤ ∫_1^β x^{−3/2} dx, where the right-hand side is convergent.

• ∫_{−∞}^∞ 1/√(x⁴+1) dx is convergent. Indeed, 1/√(x⁴+1) ≤ 1/√(x⁴) = 1/x², the integrand is bounded on [−1, 1], and ∫_1^β 1/x² dx = ∫_{−β}^{−1} 1/x² dx = [−x^{−1}]_1^β = 1 − 1/β → 1 as β → ∞.

Dec. 02. Area and length


Area
We know the area of rectangles, triangles and disks. Let us define the area of a more general region.

Definition 175. • Let f ≥ g be two integrable functions on an interval I. The area of the region between g and f,

D_{g,f} := {(x, y) ∈ R² : x ∈ I, g(x) ≤ y ≤ f(x)},

is defined by

area(D_{g,f}) := ∫_I (f(x) − g(x))dx.

• Even if I is not bounded, if the improper integral ∫_I (f(x) − g(x))dx exists, then we define the area of the region D_{g,f} = {(x, y) ∈ R² : x ∈ I, g(x) ≤ y ≤ f(x)} by the same formula.

• If D is the disjoint union of such regions, then area(D) is the sum of the areas of such
regions.

Example 176. • Rectangles. D = {(x, y) ∈ R² : x ∈ I, a ≤ y ≤ b}, with length |I| and width b − a; then area(D) = ∫_I (b − a)dx = (b − a)|I|.

Figure 75: The area of the region between two functions.


Figure 76: The area of a disk, as the region between f(x) = √(a² − x²) and f(x) = −√(a² − x²).

• Triangles. D = {(x, y) ∈ R² : x ∈ [0, a], 0 ≤ y ≤ (b/a)x}, with length a and width b; then area(D) = ∫_0^a (b/a)x dx = [(b/2a)x²]_0^a = ab/2.

• Disks. D = {(x, y) ∈ R² : x² + y² ≤ a²}, with radius a. D can also be written via

y² ≤ a² − x² ⟺ −√(a² − x²) ≤ y ≤ √(a² − x²).

Furthermore, −a ≤ x ≤ a, because if |x| > a then there is no y such that x² + y² ≤ a². Therefore,

D = {(x, y) ∈ R² : −a ≤ x ≤ a, −√(a² − x²) ≤ y ≤ √(a² − x²)},

for which we can compute the area. By our definition,

area(D) = ∫_{−a}^a (√(a² − x²) − (−√(a² − x²)))dx = 2∫_{−a}^a √(a² − x²)dx.

By the change of variables x = a sin θ with dx/dθ = a cos θ, this integral corresponds to one on the interval [−π/2, π/2], because a sin(π/2) = a, a sin(−π/2) = −a:

∫_{−a}^a √(a² − x²)dx = ∫_{−π/2}^{π/2} √(a² − a² sin²θ) · a cos θ dθ.

Using cos²θ = (cos 2θ + 1)/2,

area(D) = 2∫_{−π/2}^{π/2} √(a² − a² sin²θ) · a cos θ dθ
= 2∫_{−π/2}^{π/2} a² cos²θ dθ
= a² ∫_{−π/2}^{π/2} (cos 2θ + 1)dθ
= a² [sin 2θ/2 + θ]_{−π/2}^{π/2}
= a²π.
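The same value can be obtained numerically from the definition of the area; a minimal sketch (with a = 1, assuming scipy is available):

import numpy as np
from scipy.integrate import quad

# area of the unit disk: integral of 2*sqrt(1 - x**2) over [-1, 1]
area, err = quad(lambda x: 2 * np.sqrt(1 - x**2), -1, 1)
print(area, np.pi)   # both are approximately 3.141592653589793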

Length
A curve can be, at least partially, described using a function.

• segment. {(x, y) : x ∈ I, y = ax + b}.



• semicircle. {(x, y) : x ∈ (−a, a), y = √(a² − x²)}.

• parabola. {(x, y) : x ∈ R, y = x2 }.

• hyperbola. {(x, y) : x ∈ R, y = √(x² + 1)} (the upper branch of y² − x² = 1).

As we defined the area of a general region using integral, we can define the length of a curve
with integral.

Definition 177. For a curve represented by Gf := {(x, y) : x ∈ [a, b], y = f (x)}, where f is
differentiable and f 0 is continuous, we define the length by
ℓ(G_f) := ∫_a^b √(1 + f′(x)²) dx.

If Gf is a union of such graphs, then `(Gf ) is defined to be the sum of the lengths of the partial
graphs.

Let us see that this coincides with the usual length of a segment: a segment that goes by a horizontally and b vertically is represented by {(x, y) : x ∈ [0, a], y = (b/a)x}. Hence f(x) = (b/a)x, f′(x) = b/a. By definition, ℓ(G_f) = ∫_0^a √(1 + (b/a)²) dx = a√(1 + (b/a)²) = √(a² + b²), which coincides with the length of the segment by the theorem of Pythagoras.


If a curve is a union of different parts, each of which is represented by a function fj , then the
length of the curve is the sum of the lengths of the parts.
Another possible definition is to approximate a curve by segments: let f(x) be a function on I = [a, b] and take a partition P given by a = x₀ < x₁ < ··· < xₙ = b. Correspondingly, we consider the attached segments P_f({x_k}) joining (x₀, f(x₀)), (x₁, f(x₁)), ···, (xₙ, f(xₙ)). Let us recall |P| = max_{0≤k≤n−1} {x_{k+1} − x_k}.

Proposition 178. Let f be differentiable and f 0 be continuous. Then for any  there is δ such
that if a = x0 < x1 < · · · < xn = b is a partition P with |P | < δ, then |`(Gf ) − `(Pf ({xk }))| < .

Figure 77: The length and segments.

Proof. By the mean value theorem, there are x_k ≤ ξ_k ≤ x_{k+1} such that f(x_{k+1}) − f(x_k) = f′(ξ_k)(x_{k+1} − x_k). Since P_f({x_k}) is the union of segments,

ℓ(P_f({x_k})) = Σ_{k=0}^{n−1} √((x_{k+1} − x_k)² + (f(x_{k+1}) − f(x_k))²)
= Σ_{k=0}^{n−1} √((x_{k+1} − x_k)² + f′(ξ_k)²(x_{k+1} − x_k)²)
= Σ_{k=0}^{n−1} √(1 + f′(ξ_k)²) (x_{k+1} − x_k).

On the other hand, √(1 + f′(x)²) is continuous and hence integrable. By uniform continuity, there is δ such that |√(1 + f′(x)²) − √(1 + f′(y)²)| < ε/(b − a) if |x − y| < δ. With such a partition, we have

S_I(√(1 + f′(x)²), P) ≤ ℓ(P_f({x_k})) ≤ S̄_I(√(1 + f′(x)²), P)

(the lower and upper sums). If |P| is small, the difference between these two sides is smaller than ε, and S_I(√(1 + f′(x)²), P) ≤ ℓ(G_f) ≤ S̄_I(√(1 + f′(x)²), P). Therefore, |ℓ(G_f) − ℓ(P_f({x_k}))| < ε.



Example 179. Semicircle. I = [−1, 1], f(x) = √(1 − x²), f′(x) = −x/√(1 − x²).

ℓ(G_f) = ∫_{−1}^1 √(1 + f′(x)²) dx = ∫_{−1}^1 1/√(1 − x²) dx = [arcsin x]_{−1}^1 = π

(note that this is an improper integral). That is, the length of the full circle is 2π.
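The polygonal approximation of Proposition 178 can also be tried numerically; the following sketch (with a made-up number of points) approximates the length of the upper unit semicircle by inscribed segments:

import numpy as np

n = 10000
x = np.linspace(-1.0, 1.0, n + 1)
y = np.sqrt(1.0 - x**2)
# total length of the segments joining consecutive points of the graph
length = np.sum(np.sqrt(np.diff(x)**2 + np.diff(y)**2))
print(length, np.pi)   # the polygonal length approaches pi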

Dec. 06. Series


Zeno’s paradox
The Achilles and the Tortoise paradox goes as follows: Achilles (an ancient Greek hero, who runs very fast) is running behind a tortoise (which walks very slowly). At the beginning, Achilles is 10 meters behind the tortoise. In the next moment, Achilles arrives at the position where the tortoise was, but in the meantime it has moved by 1 meter. Then Achilles arrives at the position where the tortoise was in the previous moment, but in the meantime it has moved by 0.1 meter. Then Achilles arrives...
So how can we be sure that Achilles catches up with the tortoise?

Series and convergence
Let us recall that we have considered sequences of numbers a₁, a₂, ···, aₙ, ···, and the series Σ_{k=1}^n a_k, that is, a new sequence

a₁, a₁ + a₂, a₁ + a₂ + a₃, ···, Σ_{k=1}^n a_k, ···

As this is a new sequence, we can consider its convergence or divergence.
That is, we say that a series Σ_{k=1}^n a_k converges to S ∈ R if for each ε > 0 there is N ∈ N such that for n > N it holds that |S − Σ_{k=1}^n a_k| < ε. We say that the series diverges if for any R > 0 there is N ∈ N such that for n > N it holds that |Σ_{k=1}^n a_k| > R. In other cases we just say that the series does not converge.
If a series converges, we denote the limit by Σ_{k=1}^∞ a_k. Sometimes we just write Σ_n aₙ, where aₙ is the general term of the series.
P P
general term in a series.
Pn n(n+1)
Example 180. • k=1 k = 2 . This diverges.
Pn n(n+1)(2n+1)
• k=1 k
2 = 6 . This diverges.
Pn 1−an+1
• k=0 a
k = 1−a (if a 6= 1) This converges if and only if |a| < 1.

For these examples, we know the exact form of the n-th sum. For other series it is difficult to compute such a general term, but we may still be able to say whether the series converges or not.
For example, let us take aₙ = 1/n and consider Σ_{k=1}^n 1/k. This is called the harmonic series. As we have seen, this sum is larger than the integral of 1/x on [1, n + 1]:

∫_1^{n+1} (1/x)dx ≤ Σ_{k=1}^n 1/k.

On the other hand, we can calculate the left-hand side and we obtain ∫_1^{n+1} (1/x)dx = [log x]_1^{n+1} = log(n + 1), and this diverges as n → ∞. Therefore, Σ_{k=1}^n 1/k diverges as well.
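This slow, logarithmic growth of the partial sums is easy to observe numerically; a minimal sketch assuming numpy is available:

import numpy as np

for N in [10, 100, 1000, 10000]:
    partial = np.sum(1.0 / np.arange(1, N + 1))
    print(N, partial, np.log(N + 1))   # the partial sums grow like log(N + 1)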

Lemma 181. If Σ_{k=1}^∞ a_k converges, then a_k → 0 as k → ∞.

Proof. As Σ_{k=1}^∞ a_k converges to S, for any ε > 0 there is N such that if n > N then |Σ_{k=1}^n a_k − S| < ε/2. In particular, we have S − ε/2 < Σ_{k=1}^n a_k < S + ε/2 and S − ε/2 < Σ_{k=1}^{n+1} a_k < S + ε/2. From this it follows that −ε < Σ_{k=1}^{n+1} a_k − Σ_{k=1}^n a_k < ε, that is, |Σ_{k=1}^{n+1} a_k − Σ_{k=1}^n a_k| = |a_{n+1}| < ε. This means that aₙ → 0.
Example 182. • Σ_{k=1}^n k does not converge, because a_k = k diverges.

• Σ_{k=1}^n (1/2)^k converges to 1, and in this case indeed (1/2)^k converges to 0.

• Σ_{k=1}^n 1/k diverges, although in this case 1/k does converge to 0.

Theorem 183. (i) Let Σ_{k=1}^∞ a_k be convergent. Then for any c ∈ R, Σ_{k=1}^∞ c·a_k is also convergent and Σ_{k=1}^∞ c·a_k = c Σ_{k=1}^∞ a_k.

(ii) Let Σ_{k=1}^∞ a_k and Σ_{k=1}^∞ b_k be convergent. Then Σ_{k=1}^∞ (a_k + b_k) is also convergent and Σ_{k=1}^∞ a_k + Σ_{k=1}^∞ b_k = Σ_{k=1}^∞ (a_k + b_k).

(iii) Let Σ_{k=1}^∞ a_k be convergent and Σ_{k=1}^∞ b_k be divergent. Then Σ_{k=1}^∞ (a_k + b_k) is divergent.

Proof. (i)(ii) These follow from the properties of sequences.

(iii) Suppose on the contrary that Σ_{k=1}^∞ (a_k + b_k) converges. Then by (ii), Σ_{k=1}^∞ b_k = Σ_{k=1}^∞ (a_k + b_k) − Σ_{k=1}^∞ a_k would converge, which contradicts the assumption that Σ_{k=1}^∞ b_k diverges.

Example 184. • The series Σ_n (1/n + 1/2ⁿ) diverges, because Σ_n 1/n diverges while Σ_n 1/2ⁿ converges.

• The series Σ_n 1 and Σ_n (−1) both diverge, but the sum Σ_n (1 − 1) = Σ_n 0 converges to 0.

Let us consider some cases where the sum converges.


Example 185. • Telescopic series. Let bₙ be a sequence and aₙ = b_{n+1} − bₙ. We call such a series Σ_n aₙ a telescopic series. (Any series can be written in the form of a telescopic series, but we are interested in the case where bₙ is simpler than aₙ.)
Then we have

Σ_{k=1}^n a_k = (b₂ − b₁) + (b₃ − b₂) + ··· + (b_{n+1} − bₙ) = b_{n+1} − b₁.

From this we infer that Σ_n aₙ is convergent if and only if bₙ is convergent.
For example, consider aₙ = 1/(n(n+1)). This can be written as

aₙ = 1/(n(n+1)) = 1/n − 1/(n+1),

hence with bₙ = −1/n this is a telescopic series. By the argument above, we see that Σ_{k=1}^n 1/(k(k+1)) = b_{n+1} − b₁ = 1 − 1/(n+1), and Σ_{k=1}^∞ a_k = 1.
Next let us take aₙ = log(n/(n+1)). Then it holds that aₙ = log n − log(n+1), hence with bₙ = −log n this is a telescopic series. As bₙ → −∞, the series Σ_n aₙ diverges.
• Geometric series. Let us take a ∈ R, a ≠ 1. We know that Σ_{k=0}^n a^k = (1 − a^{n+1})/(1 − a). It is clear that the series converges to 1/(1 − a) if |a| < 1, and diverges if |a| > 1. If a = 1, then the series is simply Σ_{k=0}^n 1 = n + 1 and diverges as well.


Geometric series can be seen as a function of x: For a given number x ∈ R, we consider a
sequence an (x) = xn and it holds for |x| < 1 that

X 1
an (x) = .
1−x
n=0

The right-hand side is again a function of x. In this sense, a convergent series which depends on
x defines a new function.
We have seen other examples of this type:

e^x = Σ_{n=0}^∞ xⁿ/n! = 1 + x + x²/2 + x³/6 + x⁴/24 + x⁵/120 + ···

sin x = Σ_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n+1)! = x − x³/6 + x⁵/120 + ···

cos x = Σ_{n=0}^∞ (−1)ⁿ x^{2n}/(2n)! = 1 − x²/2 + x⁴/24 + ···

In all these cases, for a fixed x ∈ R, we have seen that the right-hand side converges by the Taylor formula with remainder.
formula with remainder.
In a similar way, we can define many other useful functions by series.

Dec. 9. Convergence criteria for positive series
Let {aₙ} ⊂ R be a sequence. We have considered the series Σ_{k=1}^n a_k, which is a new sequence, and its convergence or divergence.
When all the terms are non-negative, aₙ ≥ 0, there are some criteria that can often be used to determine convergence or divergence.

Theorem 186. We have the following.

• Let {aₙ}, {bₙ} be two sequences with aₙ ≥ 0, bₙ ≥ 0, such that there is c > 0 with aₙ ≤ c·bₙ for n sufficiently large. In this case, if Σ_n bₙ converges, then so does Σ_n aₙ. If Σ_n aₙ diverges, so does Σ_n bₙ.

• Let {aₙ}, {bₙ} be two sequences with aₙ ≥ 0, bₙ ≥ 0, such that aₙ/bₙ → c, c ≠ 0, ∞. Then Σ_n aₙ converges if and only if Σ_n bₙ converges.

Proof. • As aₙ ≥ 0, bₙ ≥ 0, the partial sums of Σ_n aₙ, Σ_n bₙ are increasing. If Σ_n bₙ is convergent and aₙ ≤ c·bₙ, then Σ_{k=1}^n a_k ≤ Σ_{k=1}^n c·b_k ≤ c Σ_{k=1}^∞ b_k, hence the former is bounded and increasing, therefore it must converge. The other case is analogous.

• If aₙ/bₙ → c ≠ 0, then this implies that (c/2)bₙ ≤ aₙ ≤ 2c·bₙ for n sufficiently large, hence the previous point applies.

We have seen that, for 0 < a < 1, Σ_{k=0}^n a^k converges to 1/(1 − a). We can use this fact to show the convergence of some other series.

Theorem 187 (root test). Let an > 0 be a sequence.


(i) If aₙ^{1/n} ≤ θ < 1 for n sufficiently large, then Σ_n aₙ converges.

(ii) If aₙ^{1/n} ≥ θ > 1 for n sufficiently large, then Σ_n aₙ diverges.

(iii) Let aₙ^{1/n} → a. If a < 1, Σ_n aₙ converges. If a > 1, Σ_n aₙ diverges.

Proof. The series Σ_n θⁿ converges if θ < 1 (the geometric series) and diverges if θ > 1 (θⁿ does not tend to 0). By Theorem 186, and since aₙ^{1/n} ≤ θ means aₙ ≤ θⁿ (respectively aₙ^{1/n} ≥ θ means aₙ ≥ θⁿ), the first two claims follow.
If aₙ^{1/n} → a < 1, then we can take θ such that aₙ^{1/n} < θ < 1 for n sufficiently large. The case aₙ^{1/n} → a > 1 is analogous.

Example 188. • Σ_n 1/(n·3ⁿ) is convergent.

• Σ_n n·2ⁿ/3ⁿ is convergent.

• Σ_n n/2ⁿ is convergent.

1
When limn ann = 1, this criterion does not give information. Indeed, n n1 is divergent, but
P
1 1
n n2 is convergent (compare it with n(n−1) ), while in both cases lim( n ) = lim( n2 ) = 1.
P 1 P 1 1 n 1 n

Proposition 189 (ratio test). Let an > 0 be a sequence.

(i) If a_{n+1}/aₙ ≤ θ < 1 for n sufficiently large, then Σ_n aₙ converges.

(ii) If a_{n+1}/aₙ ≥ θ > 1 for n sufficiently large, then Σ_n aₙ diverges.

(iii) Let a_{n+1}/aₙ → θ. If θ < 1, Σ_n aₙ converges. If θ > 1, Σ_n aₙ diverges.

Proof. Let a_{n+1}/aₙ ≤ θ < 1 for n ≥ N. Then

a_{N+m} ≤ a_{N+m−1}θ ≤ a_{N+m−2}θ² ≤ ··· ≤ a_N θ^m.

Now Σ_m a_N θ^m is convergent, hence by Theorem 186, Σ_n aₙ is convergent. If a_{n+1}/aₙ ≥ θ > 1, then aₙ is increasing and does not converge to 0.
If a_{n+1}/aₙ → θ < 1 or > 1, then a_{n+1}/aₙ ≤ θ′ < 1 or ≥ θ′ > 1 for n sufficiently large, hence the claims follow from (i), (ii).

Example 190. • Σ_n n/2ⁿ is convergent.

• Σ_n n²/n! is convergent.

• Σ_n (n!)²/2^{n²} is convergent.

When a_{n+1}/aₙ → 1 or aₙ^{1/n} → 1, we need to study the series more carefully.

Lemma 191 (integral test). Let {aₙ} be a decreasing sequence of positive numbers and assume that there is a positive decreasing function f(x) defined on [1, ∞). If aₙ ≤ f(n) and ∫_1^∞ f(x)dx converges, then Σ_n aₙ converges. If aₙ ≥ f(n) and ∫_1^∞ f(x)dx diverges, then Σ_n aₙ diverges.

Proof. For the first case, we have Σ_{k=2}^n a_k ≤ ∫_1^n f(x)dx, and the latter converges, hence so does the former.
For the second case, we have Σ_{k=1}^n a_k ≥ ∫_1^{n+1} f(x)dx, and the latter diverges, hence so does the former.

Example 192. Let us fix s ∈ R and consider Σ_{n=1}^∞ 1/n^s. We can compare this with f_s(x) = 1/x^s. We know that ∫_1^∞ f_s(x)dx converges if and only if s > 1, hence the series converges if and only if s > 1.
The function ζ(s) = Σ_{n=1}^∞ 1/n^s (for s > 1) is called the Riemann zeta function.

Lemma 193 (condensation principle). Let {aₙ} be a decreasing sequence of positive numbers. Then Σ_n aₙ converges if and only if Σ_n 2ⁿ a_{2ⁿ} does.

Proof. Since aₙ is decreasing and positive, a_{2ⁿ} ≥ a_{2ⁿ+1} ≥ ··· ≥ a_{2^{n+1}}, hence

2ⁿ a_{2ⁿ} ≥ Σ_{k=2ⁿ}^{2^{n+1}−1} a_k ≥ 2ⁿ a_{2^{n+1}}.

By summing this with respect to n,

Σ_{n=1}^N 2ⁿ a_{2ⁿ} ≥ Σ_{n=1}^N Σ_{k=2ⁿ}^{2^{n+1}−1} a_k = Σ_{k=2}^{2^{N+1}−1} a_k ≥ Σ_{n=1}^N 2ⁿ a_{2^{n+1}} = (1/2) Σ_{n=1}^N 2^{n+1} a_{2^{n+1}} = (1/2)(Σ_{n=1}^{N+1} 2ⁿ a_{2ⁿ} − 2a₂).

Therefore, Σ_n aₙ converges if and only if Σ_n 2ⁿ a_{2ⁿ} converges, by Theorem 186.

Example 194. Consider Σ_n 1/(n(log n)^α). By the condensation principle, it is enough to study Σ_n 2ⁿ · 1/(2ⁿ(log 2ⁿ)^α) = Σ_n 1/(n log 2)^α = (1/(log 2)^α) Σ_n 1/n^α. Hence this converges if and only if α > 1.

Dec. 10. Convergence criteria for general series
Let {aₙ} be a sequence, not necessarily positive. We say that the series Σ_n aₙ converges absolutely if Σ_n |aₙ| converges.

Lemma 195. If Σ_n |aₙ| converges, then so does Σ_n aₙ.

Proof. As Σ_n |aₙ| converges, for ε > 0 there is N ∈ N such that for m > n > N it holds that Σ_{k=n}^m |a_k| < ε. Then it holds that |Σ_{k=n}^m a_k| < ε by the triangle inequality. This means that Σ_{k=1}^n a_k is a Cauchy sequence, hence it converges.

This Lemma, combined with the criteria for positive series, enables us to show convergence
of many series.
Example 196. • Σ_{n=1}^∞ (−1)ⁿ/n^s converges absolutely if s > 1. Indeed, with aₙ = (−1)ⁿ/n^s, |aₙ| = 1/n^s, and we know that Σ_{n=1}^∞ 1/n^s converges if and only if s > 1.

• Σ_{n=1}^∞ (−1)ⁿ n/2ⁿ converges absolutely. Indeed, with aₙ = (−1)ⁿ n/2ⁿ, |aₙ| = n/2ⁿ, and by the root test,

(n/2ⁿ)^{1/n} = n^{1/n}/2 → 1/2 < 1.

Therefore, Σ_n |aₙ| converges.
If a series Σ_n aₙ converges absolutely, its limit does not depend on the order: indeed, as it is absolutely convergent, the series of its positive elements bₙ (where bₙ = aₙ if aₙ > 0 and bₙ = 0 if aₙ ≤ 0) and of the negative elements cₙ = aₙ − bₙ are both convergent. Therefore, even if we sum first the positive elements Σ_n bₙ and then the negative elements Σ_n cₙ, the result is the same: Σ_n bₙ + Σ_n cₙ = Σ_n aₙ.
On the other hand, if a series Σ_n aₙ converges but not absolutely, its positive and negative parts both diverge (because otherwise it would be absolutely convergent). By rearranging the sum, one can make it diverge to ∞ (by taking many elements of bₙ) or to −∞ (by taking many elements of cₙ).
A series whose terms change sign at each step is called an alternating series. That is, for aₙ > 0, it is given by

Σ_{n=1}^∞ (−1)^{n−1} aₙ = a₁ − a₂ + a₃ − a₄ + ···

Lemma 197 (Leibniz criterion). Let {aₙ} be a decreasing sequence with positive terms and assume that aₙ → 0. Then Σ_n (−1)^{n−1} aₙ converges.

Proof. Let sₙ = Σ_{k=1}^n (−1)^k a_k. Then s_{2n} = Σ_{k=1}^n (−a_{2k−1} + a_{2k}) is decreasing. Analogously, s_{2n+1} = −a₁ + Σ_{k=1}^n (a_{2k} − a_{2k+1}) is increasing. In addition, s_{2n} − s_{2n+1} = −(−a_{2n+1}) = a_{2n+1} ≥ 0. Hence s_{2n} and s_{2n+1} converge to some s̄ and s̲, respectively, and s̄ ≥ s̲. But s_{2n+1} ≤ s̲ ≤ s̄ ≤ s_{2n} and s_{2n} − s_{2n+1} = a_{2n+1} → 0, hence s̄ = s̲.

Example 198. • Σ_n (−1)^{n−1} (1/n) is convergent. Note that this series is not absolutely convergent; indeed, Σ_n 1/n is divergent.

• Σ_n (−1)^{n−1} (1/log(n+1)) is convergent. Note that this series is not absolutely convergent; indeed, Σ_n 1/log(n+1) is divergent.

• The sequence Σ_{k=1}^n 1/k − log n converges. Indeed, this can be seen as

1 + 1/2 + ··· + 1/n − ∫_1^2 (1/x)dx − ∫_2^3 (1/x)dx − ··· − ∫_{n−1}^n (1/x)dx
= (1 − ∫_1^2 (1/x)dx) + (1/2 − ∫_2^3 (1/x)dx) + ··· + (1/(n−1) − ∫_{n−1}^n (1/x)dx) + 1/n,

and the last expression is an alternating series. Note that 1/k > ∫_k^{k+1} (1/x)dx > 1/(k+1). Furthermore, 1/k → 0 as k → ∞. Therefore, the Leibniz criterion applies and the series converges (to a number known as the Euler-Mascheroni constant).
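The convergence can be observed numerically; the following sketch computes Σ_{k=1}^n 1/k − log n for a few values of n (assuming numpy is available):

import numpy as np

for n in [10, 1000, 100000]:
    s = np.sum(1.0 / np.arange(1, n + 1)) - np.log(n)
    print(n, s)   # tends to the Euler-Mascheroni constant, about 0.5772...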

Lemma 199 (Abel's partial summation formula). Let {aₙ}, {bₙ} be two sequences, and let Aₙ = Σ_{k=1}^n a_k. Then we have the identity Σ_{k=1}^n a_k b_k = Aₙ b_{n+1} + Σ_{k=1}^n A_k(b_k − b_{k+1}).

Proof. Let us define A₀ = 0. By computing the right-hand side,

Aₙ b_{n+1} + Σ_{k=1}^n A_k(b_k − b_{k+1}) = Aₙ b_{n+1} + Σ_{k=1}^n A_k b_k − Σ_{k=2}^{n+1} A_{k−1} b_k
= Aₙ b_{n+1} + Σ_{k=1}^n A_k b_k − Σ_{k=1}^{n+1} A_{k−1} b_k
= Aₙ b_{n+1} + Σ_{k=1}^n a_k b_k − Aₙ b_{n+1}
= Σ_{k=1}^n a_k b_k.

Theorem 200 (Dirichlet's test). Let Σ_n aₙ be a series and assume that Aₙ = Σ_{k=1}^n a_k is a bounded sequence. Let bₙ > 0 be a decreasing sequence with bₙ → 0. Then the series Σ_n aₙbₙ converges.

Proof. By Lemma 199, we have Σ_{k=1}^n a_k b_k = Aₙ b_{n+1} + Σ_{k=1}^n A_k(b_k − b_{k+1}). As b_k is decreasing, we have b_k − b_{k+1} > 0, and since A_k is bounded, say by C, we have Σ_{k=1}^n |A_k(b_k − b_{k+1})| ≤ Σ_{k=1}^n C(b_k − b_{k+1}). The latter is a telescopic series and is equal to C(b₁ − b_{n+1}), which converges (to Cb₁). Therefore, Σ_{k=1}^n A_k(b_k − b_{k+1}) is absolutely convergent.
On the other hand, Aₙ b_{n+1} tends to 0 because |Aₙ| < C and b_{n+1} → 0. Altogether, the series Σ_{k=1}^n a_k b_k converges.

Theorem 201 (Abel's test). Let Σ_n aₙ be a convergent series and assume bₙ is monotonic and convergent. Then the series Σ_n aₙbₙ converges.

Proof. We may assume that bₙ is decreasing (otherwise we can consider −bₙ). In this case, Aₙ = Σ_{k=1}^n a_k is bounded (because it is convergent), and with B = lim bₙ, the sequence bₙ − B is decreasing and converges to 0.
Hence we can apply Dirichlet's test to the first term of Σ_n aₙbₙ = Σ_n aₙ(bₙ − B) + Σ_n aₙB, where the last term is convergent because Σ_n aₙ is convergent.

Example 202. Let us consider the series Σ_n (sin nπθ)/n^s, where θ = p/q is a rational number and s > 0. This is not absolutely convergent, nor alternating (if q ≠ 2). On the other hand, the sequence sin nπθ is periodic, because sin x is 2π-periodic, that is, sin(nπp/q) = sin((n+2q)πp/q). Let us assume q is even and p is odd. Then sin((n+q)πp/q) = −sin(nπp/q), and hence also the partial sum Σ_{k=1}^n sin kπθ is periodic, and in particular bounded.

In this case, as 1/n^s is monotonically decreasing and tends to 0, we can apply Dirichlet's test and conclude that Σ_{n=1}^∞ (sin nπθ)/n^s is convergent. Similarly, Σ_{n=1}^∞ (cos nπθ)/n^s is convergent.
The same holds for q odd, and actually for any θ ∈ R.

Dec. 14. Ordinary differential equations


Many scientific questions are expressed in terms of differential equations (equations about functions and their derivatives).
• The equation of motion in a gravitational field: m d²x/dt² = −mMG/x².

• The heat equation ∂u/∂t = α ∂²u/∂x² (this is a partial differential equation, because it contains partial derivatives; it is studied in Mathematical Analysis II).

• The SIR model in epidemiology: dS/dt = −βS(t)I(t)/N, dI/dt = βS(t)I(t)/N − γI(t), dR/dt = γI(t).

This is because the rate of change (the derivative) is often determined by the current status (the function). For example, in the equation of motion, the gravitational force −mMG/x² depends on the position x(t) of the particle, while the force determines the rate of change of the speed (the acceleration); the speed is x′(t), hence the second derivative appears on the left-hand side.
Once the scientific problem is written in the form of a differential equation, it is a mathematical problem to solve it, that is, to find the functions that satisfy the given equation.
In the following, y(x) will be a function of x and the derivatives are denoted by y 0 (x), y 00 (x)
and so on. Some more examples of differential equation are

• y 0 (x) = y(x)

• y 0 (x) = x3 y(x) + sin(xy 00 (x))

• Sometimes we just write this as y 0 = x3 y + sin(xy 00 ), keeping in mind that y is a function


of x.

In a differential equation, certain higher derivative of y may appear. The highest order of
the derivative of y is called the order of the differential equation. For example,

• y 0 (x) = 2y(x) is a first-order differential equation.

• y 0 (x) = x3 y(x) + sin(xy 00 (x)) is a second-order differential equation.

We need to find functions y(x) that satisfy the given equation. This is why it is called a differential
equation. Compare it with an algebraic equation x2 + 3x − 4 = 0, where we need to find numbers
that satisfy this equation.
Let us consider first-order differential equations. In an abstract form, we can write it as

y 0 = f (x, y),

where f is explicitly given in examples, while y is the unknown function which we need to
find. In the example y 0 (x) = 2y(x), we take f (x, y) = 2y. A solution of a differential equation
is a (differentiable) function that satisfies this equation. For example, by taking y(x) = Ce2x ,
we can check that this is a solution:

y 0 (x) = 2Ce2x = 2y(x).

Some first-order differential equations
The simplest case is where f does not depend on y: that is,

y 0 (x) = f (x).

This means that f is the derivative of y, or y is a primitive of f. Therefore, y can be obtained by integrating f: y(x) = ∫f(x)dx + C. Indeed, this y satisfies the given equation for any C ∈ R, and there is no other solution.

Example 203. When a ball falls freely without drag, the speed −gx is proportional to the time
x. As the speed is the derivative of the position y, we have the equation

y 0 (x) = −gx.
This can be solved by integration, that is, y(x) = ∫(−gx)dx = −gx²/2 + C. The constant C

depends on the position where the ball starts to fall.

As we see in this example, a differential equation may have many solutions. In practice, we
are interested in one of them which satisfies additional conditions, the initial conditions or
boundary conditions, that give the value of y, y 0 at a given time x.
Next, let us consider again the simplest differential equation y 0 = f (x, y) where f depends
on y.

Theorem 204. Let a, C ∈ R. Then there is only one (differentiable) function y such that
y 0 (x) = ay(x) and y(0) = C.

Proof. We know that there is one such function: y(x) = Ceax . Indeed, we can check that
y 0 (x) = aCeax = ay(x) and y(0) = Ce0 = C.
Suppose that there is g(x) with the same condition. Let h(x) = e−ax g(x), then h0 (x) =
−ae−ax g(x) + e−ax g 0 (x) = −ae−ax g(x) + ae−ax g(x) = 0 for all x ∈ R, hence h(x) must be a
constant. As h(0) = e0 g(0) = C, h(x) = C hence g(x) = Ceax .

Let us look at some situations in which the equation y′ = ay appears.

• A very typical example is radioactive atoms. Let y(x) be the number of a single species
of radioactive atoms at time x. It is known that each atom decays, independently from
other atoms, in a certain time period by a certain probability. This means that, at each
moment, the rate of decrease in numbers y(x) is proportional to y(x). With a constant a,
we can write this as
y 0 (x) = −ay(x).
If there are C atoms at time x = 0, we know that the solution is y(x) = Ce−ax , hence the
number of radioactive atoms decays exponentially. This can be written more conveniently
as y(x) = Ce−ax = C2−ax/ log 2 . Then with T = loga 2 , we have y(x) = C2−x/T , and it is
clear that the number of atoms halves in time T . T is called the half life of this particular
species of atom.

• Another instance is the SIR model in epidemiology. We consider the total population N and the numbers S(t) (susceptible), I(t) (infected) and R(t) (removed/recovered). It is assumed that each infected person has contact with a certain number of people each day, hence the number of contacts with susceptible people is proportional to S(t)I(t)/N, and we assume that in each such contact transmission

Figure 78: The exponential decay y(x) = Ce^{−ax} = C·2^{−ax/log 2}. The half-life is T = log 2/a.

occurs with rate β. On the other hand, each infected person loses infectivity with rate γ:

dS/dt = −βS(t)I(t)/N,
dI/dt = βS(t)I(t)/N − γI(t),
dR/dt = γI(t).

It is difficult to solve this set of equations. Yet, we can understand the behaviour when there are few infected people I(t) compared to the total number N. When I(t) is small, then R(t) is also small and S(t) = N − I(t) − R(t) is close to N. By putting S(t) = N, we have

dI/dt = (β − γ)I(t).

As a function of t, we know that I(t) = Ce^{(β−γ)t}, where C is the number of infected at day t = 0. The epidemic grows when β − γ > 0, and decays when β − γ < 0. R₀ = β/γ is called the basic reproduction number in the SIR model. When R₀ > 1 the epidemic grows, and when R₀ < 1 it decays.
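Both situations above are exponential solutions of y′ = ay; a small numerical illustration (the parameter values below are made up for the example, assuming numpy is available):

import numpy as np

# radioactive decay: y' = -a*y, y(0) = C, half-life T = log(2)/a
a, C = 0.3, 1000.0
T = np.log(2) / a
print(C * np.exp(-a * T))                 # approximately C/2 = 500

# early epidemic: I' = (beta - gamma)*I, I(0) = 10
beta, gamma, I0 = 0.4, 0.2, 10.0
print(I0 * np.exp((beta - gamma) * 10.0)) # grows, since R0 = beta/gamma = 2 > 1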

Dec. 16. Ordinary differential equations


Linear equations with constant coefficients
Let us next consider a linear differential equation of the form

y 00 + P (x)y 0 + Q(x)y = R(x).

This contains the second derivative y′′, hence it is a second-order differential equation. Here, P(x), Q(x), R(x) are known functions and we have to find a function y(x) which satisfies this equation. When R(x) = 0, such an equation is called a linear homogeneous differential equation of second order. (Linear means that there is no term containing y², y³, (y′)² etc.; homogeneous means that the right-hand side (the term which does not depend on y) is 0.)
If y₁, y₂ are two solutions of a homogeneous equation, then so is y₃(x) = ay₁(x) + by₂(x), because

y₃′′(x) + P(x)y₃′(x) + Q(x)y₃(x)
= ay₁′′(x) + by₂′′(x) + P(x)(ay₁′(x) + by₂′(x)) + Q(x)(ay₁(x) + by₂(x))
= a(y₁′′(x) + P(x)y₁′(x) + Q(x)y₁(x)) + b(y₂′′(x) + P(x)y₂′(x) + Q(x)y₂(x)) = 0.

If y1 , y2 are two solutions of a linear equation, then the difference y1 − y2 is a solution of the
equation where R(x) is set to 0.

The simplest such equations are those where P(x), Q(x) are constant and R(x) = 0:

y 00 + ay 0 + by = 0.

As we will see, such equations appear naturally in physics.


An even simpler case is where a = 0:

y 00 + by = 0.

Let us start with solutions of this type.

• Case 1. b = 0. In this case, we have y 00 = 0. This means that y 0 = C1 (constant) and


further y = C1 x + C2 . It is easy to see that any solution is of this form.

• Case 2. b < 0. In this case, the equation can be written as y′′ = k²y, where b = −k², and we can easily check that y(x) = C₁e^{kx} + C₂e^{−kx} is a solution for any constants C₁, C₂. Indeed, y′(x) = kC₁e^{kx} − kC₂e^{−kx} and y′′(x) = k²C₁e^{kx} + (−k)²C₂e^{−kx} = k²(C₁e^{kx} + C₂e^{−kx}) = k²y(x).

• Case 3. b > 0. In this case, the equation can be written as y′′ = −k²y, where b = k². There are solutions of the form y(x) = C₁ sin(kx) + C₂ cos(kx) for any constants C₁, C₂.

These are all the solutions, and there are no others (we present the general uniqueness theorem later).
Note that, in all these cases, there are two constants C₁, C₂. If we require an initial condition

• y(a) = b1

• y 0 (a) = b2

these constants are fixed.


For example, in Case 2 with a = 0 (the initial conditions being given at the point a = 0), y(x) = C₁e^{kx} + C₂e^{−kx}, we should have y(0) = C₁ + C₂ = b₁ and y′(0) = kC₁ − kC₂ = b₂, hence C₁ = (1/2)(b₁ + b₂/k), C₂ = (1/2)(b₁ − b₂/k).
Let us consider the general case y′′ + ay′ + by = 0. This can be reduced to the special case above as follows. We write y(x) = u(x)e^{−ax/2}; then

y′(x) = u′(x)e^{−ax/2} − (a/2)u(x)e^{−ax/2} = u′(x)e^{−ax/2} − (a/2)y(x),
y′′(x) = u′′(x)e^{−ax/2} − (a/2)u′(x)e^{−ax/2} − (a/2)u′(x)e^{−ax/2} + (a²/4)u(x)e^{−ax/2}
= u′′(x)e^{−ax/2} − au′(x)e^{−ax/2} + (a²/4)u(x)e^{−ax/2}.

Therefore, if y is a solution of this equation, it must hold that

0 = (u′′(x)e^{−ax/2} − au′(x)e^{−ax/2} + (a²/4)u(x)e^{−ax/2}) + a(u′(x)e^{−ax/2} − (a/2)u(x)e^{−ax/2}) + bu(x)e^{−ax/2}
= u′′(x)e^{−ax/2} + (b − a²/4)u(x)e^{−ax/2},

hence if u satisfies u′′(x) + (b − a²/4)u(x) = 0, then y(x) = u(x)e^{−ax/2} satisfies y′′ + ay′ + by = 0. We know how to solve the former, hence also the latter.
Example 205. Consider the equation y′′ + y′ − y = 0. Then we can write y = ue^{−x/2}, and u should satisfy u′′ − (5/4)u = 0. We know that u(x) = C₁e^{(√5/2)x} + C₂e^{−(√5/2)x} is a solution of this, hence

y = C₁e^{((√5−1)/2)x} + C₂e^{−((√5+1)/2)x}.

Next, let us consider the inhomogeneous case, that is y 00 + ay 0 + by = R(x). In some cases
we can find solutions.
Example 206. Take R(x) = x². Then y(x) = (1/b)(x² − 2ax/b + (2a² − 2b)/b²) is a solution. Indeed, y′(x) = (1/b)(2x − 2a/b), y′′(x) = 2/b.

A general solution can be obtained by adding a solution of the homogeneous version y′′ + ay′ + by = 0 to this solution.

Physical examples
• Simple harmonic motion. Consider a mass m which is attached to a spring. Let us call
x(t) the position of the mass. When a spring is stretched by the distance r, then it pulls
back the mass by the force kr. Similarly, when a spring is pressed by the distance r (hence
the mass is displaced to −r), then it pushes back the mass by the force kr. Together with
the direction of the force, it can be written as −kx.
The equation of motion is about the variable x(t) and the acceleration is a(t) = x00 (t),
hence F (x) = ma = mx00 becomes

mx00 = F (x) = −kx.

That is, x′′ + (k/m)x = 0. The general solution of this is

x(t) = C₁ sin(√(k/m) t) + C₂ cos(√(k/m) t).

If we pull the mass to a and release it gently at time t = 0, then the solution should have x(0) = a, x′(0) = 0. That is, C₂ = a and C₁ = 0, and the special solution is

x(t) = a cos(√(k/m) t).

This means that the mass oscillates between −a and a.
In general, if we specify the values x(0) and x0 (0), then there is only one solution. These
values are called the initial conditions.

• In addition to the previous example, let us consider the case where the mass lies on a floor
hence receives the friction. The friction is proportional to the velocity and in the converse
direction. Therefore, the equation of motion is

mx00 (t) = −kx(t) − cx0 (t),

or x′′ + (c/m)x′ + (k/m)x = 0.
The solution is given by solving u′′ + (k/m − (1/4)(c/m)²)u = 0. If −s² = k/m − (1/4)(c/m)² < 0, then we have x(t) = C₁e^{−(c/2m − s)t} + C₂e^{−(c/2m + s)t}.
If x(0) = a, x′(0) = 0, then C₁ + C₂ = a and (c/2m − s)C₁ + (c/2m + s)C₂ = 0, hence C₁ = a(c/2m + s)/(2s), C₂ = −a(c/2m − s)/(2s), and

x(t) = (a/2s)((c/2m + s)e^{−(c/2m − s)t} − (c/2m − s)e^{−(c/2m + s)t}).

Note that c/2m > s > 0, hence this decays exponentially. This means that the mass approaches 0 without going back and forth.
We leave the remaining case k/m − (1/4)(c/m)² ≥ 0 as an exercise.
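The damped oscillator can also be solved numerically with odeint, in the same style as the Python examples at the end of these notes; the parameter values below are made up for illustration (overdamped case):

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

m, k, c = 1.0, 1.0, 3.0      # overdamped: (1/4)*(c/m)**2 > k/m
a0 = 1.0                     # initial position x(0) = a0, velocity x'(0) = 0

# doubling of variables: state = (x, x'), so x'' = (-k*x - c*x')/m
def deriv(state, t):
    x, v = state
    return [v, (-k * x - c * v) / m]

t = np.linspace(0, 10, 500)
sol = odeint(deriv, [a0, 0.0], t)
plt.plot(t, sol[:, 0], label='overdamped mass on a spring')
plt.legend()
plt.show()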
c 2

Dec. 17. Ordinary differential equations
General remarks
Many interesting differential equations are nonlinear: just for example, the motion in a gravita-
tional field is given by
mx′′ = −mMG/x²
(in one-dimension). And it is difficult to solve such nonlinear equations.
Let us consider a first-order differential equation
y 0 = f (x, y),
that is, f is a given function of two variables and the question is to find a function y(x) such
that y 0 (x) = f (x, y(x)) for x in a certain interval.
If from the differential equation y 0 = f (x, y) we can derive a relation between x, y of the form
F (x, y, C) = 0,
where F (x, y, C) is another two-variable function with a parameter C (hence 3-variables), then we
say that the differential equation is solved, or integrated. This is because the relation F(x, y, C) = 0 for a fixed number C defines a function y(x) implicitly: recall that, if F(x, y) = x² + y² − C², then it defines the function(s) y(x) = ±√(C² − x²).

Separable differential equations


We call a first-order differential equation y 0 = f (x, y) separable if it can be written in the form
y 0 = Q(x)R(y), where Q(x) is a function of x alone and R(y) is a function of y alone. For
example,
• y 0 = x3
• y 0 = yx2
• y 0 = sin y log x.
When R(y) ≠ 0, we can write this in the form A(y)y′ = Q(x), where A(y) = 1/R(y).
Theorem 207. Let G(y) be a primitive of A(y) and H(x) be a primitive of Q(x). Then any
differentiable function y(x) which satisfies
G(y(x)) = H(x)
satisfies the differential equation A(y(x))y 0 (x) = Q(x), and conversely, any solution y(x) satisfies
this equation for certain H(x).
Proof. Let y(x) satisfy the equation above; then by the chain rule we have d/dx G(y(x)) = y′(x)A(y(x)), while d/dx H(x) = Q(x), hence we obtain A(y(x))y′(x) = Q(x) by differentiating G(y(x)) = H(x).
G(y(x)) = H(x).
Conversely, if y(x) is the solution of the differential equation, then by integrating both sides
of A(y(x))y 0 (x) = Q(x) by substitution, we have G(y(x)) = H(x) + C for some constant C. Note
that H(x) + C is a primitive of Q(x), hence we proved the claim.
Example 208. • Consider y′ = y²x. This can be written as y′/y² = x. Each side can be integrated, and we obtain

−1/y = x²/2 + C,

or y = −1/(x²/2 + C).

• Consider y′ = x/y. This can be written as yy′ = x. Each side can be integrated, and we obtain

y²/2 = x²/2 + C,

or y = ±√(x² + 2C).

• Consider xy′ + y = y². This can be written as y′ = y(y − 1)/x, or y′/(y(y − 1)) = 1/x. Each side can be integrated, and we obtain

log|1 − 1/y| = log x + C,

or |1 − 1/y| = C′x for some constant C′ = e^C.
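Such separable equations can also be checked with a computer algebra system; for instance (assuming sympy is available), the first example gives:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
# y' = y**2 * x; dsolve returns a solution equivalent to y = -1/(x**2/2 + C)
print(sp.dsolve(sp.Eq(y(x).diff(x), y(x)**2 * x), y(x)))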


Not many equations are separable, but some can be reduced to a separable equation. For
example, consider the case where y 0 = f (x, y) and

f (tx, ty) = f (x, y)

for any t ≠ 0. In this case, we can introduce v = y/x, or y = vx, and hence y′ = v′x + v. Therefore, if y is a solution of the equation above, then it must hold that

v′x + v = y′ = f(x, y) = f(1, y/x) = f(1, v).

This can be written as v′ = (f(1, v) − v)(1/x), which is separable. Once v is obtained as a function


of x, we can recover y = vx.
Example 209. Consider y′ = (y − x)/(x + y). Since f(tx, ty) = (ty − tx)/(tx + ty) = (y − x)/(x + y) = f(x, y), this can be solved by introducing y = vx.
We have v′ = (f(1, v) − v)(1/x) = ((v − 1)/(1 + v) − v)(1/x) = −((1 + v²)/(1 + v))(1/x), and we have

∫ (1 + v)/(1 + v²) dv = arctan v + (1/2)log(1 + v²),
∫ −(1/x) dx = −log x + C.

By bringing back y = vx, we have arctan(y/x) + (1/2)log(x² + y²) = C.
By bringing back y = vx, we have arctan xy + 12 log(x2 + y 2 ) = C.

Concrete applications
• Consider a falling body in a resisting medium. For example, we drop a ball from a window.
The gravitational force is constant g when the body moves the distance much shorter than
the radius of the Earth. In addition, the ball is resisted by the air and the resistance
is proportional to the velocity. To express this in a differential equation, let v(t) be the
velocity of the ball at time t, we leave it at time t = 0 from the height 0. Then

mv 0 = mg − kv,
or mv′/(mg − kv) = 1. This is separable. We have −(m/k)log(mg − kv) = t + C, that is, v(t) = mg/k − C′e^{−kt/m} for some constant C′ > 0.
• Let us consider a small particle in a large medium. If the temperatures of the particle and of the medium are different, then the rate of change of the temperature is proportional to the temperature difference. As the medium is large, we may assume that only the
temperature y(t) changes with y(0) = T , while the medium remain in the same temperature
M . In a differential equation,
y 0 (t) = k(M − y(t)).
This is again separable.

Dec. 20. Complex numbers
Consider the equation x² + 1 = 0. There is no real solution x that satisfies this equation, because x² ≥ 0, hence x² + 1 ≥ 1 > 0. Yet, it is possible to extend the set of real numbers in such a way as to include solutions to this equation. One such solution is denoted by i, which means i² = −1.

Complex numbers as ordered pairs


We expect that i behaves in a way similar to the real numbers. We can consider a + ib, a, b ∈ R,
it should hold that (a + ib)(c + id) = ac + i2 bd + i(ad + bc) = ac − bd + i(ad + bc).
This can be formulated as follows: A complex number is a pair (a, b) of real numbers, and
we define

• (Equality) (a, b) = (c, d) as complex numbers if and only if a = c and b = d.

• (Sum) (a, b) + (c, d) = (a + c, b + d).

• (Product) (a, b)(c, d) = (ac − bd, ad + bc).

This is a formal definition, and it is customary to denote (a, b) by a + ib. a is called the real
part and b is called the imaginary part.
The set of complex numbers is denoted by C.

Theorem 210. For x, y, z ∈ C, we have the following.

• (Commutativity) x + y = y + x, xy = yx.

• (Associativity) x + (y + z) = (x + y) + z, x(yz) = (xy)z.

• (Distributive law) x(y + z) = xy + xz.

Proof. All these properties can be shown by calculating both sides and using the properties of
real numbers. For example, for x = a + ib = (a, b), y = c + id = (c, d), x + y = (a, b) + (c, d) =
(a + c, b + d) = (c + a, d + b) = y + x, xy = (a, b)(c, d) = (ac − bd, ad + bc) = (ca − db, da + cb) =
(c, d)(a, b) = yx.
Other properties are left as exercises.

Real numbers in complex numbers


Let us consider the complex number (0, 0). For any other complex number (a, b), it holds that
(0, 0)(a, b) = (0, 0) and (0, 0) + (a, b) = (a, b). That is, (0, 0) plays the same role of 0 ∈ R.
Next, we consider (1, 0). For any other complex number (a, b), it holds that (1, 0)(a, b) =
(a, b). That is, (1, 0) plays the same role of 1 ∈ R.
For any (a, b) ∈ C, it holds that (a, b) + (−a, −b) = (0, 0). We write −(a, b) for (−a, −b).
For any (a, b) ∈ C, (a, b) ≠ (0, 0), it holds that (a, b)·(a/(a² + b²), −b/(a² + b²)) = ((a² + b²)/(a² + b²), (−ab + ba)/(a² + b²)) = (1, 0). We write (a/(a² + b²), −b/(a² + b²)) for 1/(a, b), or (a, b)^{−1}.

All these properties tell that C is an object called a field.


For any number a ∈ R, the complex number (a, 0) behaves exactly as a ∈ R. Indeed,
(a, 0) + (b, 0) = (a + b, 0), (a, 0)(b, 0) = (ab, 0).

Imaginary unit
The complex number (0, 1) satisfies (0, 1)2 = (−1, 0). Indeed, (0, 1)2 = (0 − 1, 0 + 0) = (−1, 0).
With the understanding that (−1, 0) = −1 ∈ R, we can interpret this as (0, 1) = i ∈ C.
For a real number a and a complex number (b, c), we have a · (b, c) = (a, 0)(b, c) = (ab, ac).

Figure 79: Various complex numbers (−1 + 2i, 2 + i, −0.5 − i) on the plane.

Figure 80: Sum of two complex numbers z₁ + z₂.

With this understanding, any complex number (a, b) can be written as a + ib, where a =
(a, 0), b = (b, 0), i = (0, 1). We can perform all the usual computations using i2 = −1, for
example,

(a + ib)(c + id) = ac + iad + ibc + i2 bd = ac − bd + i(ad + bc),


1/(a + ib) = (a − ib)/((a + ib)(a − ib)) = (a − ib)/(a² + b²).

Geometric interpretation
As we represented a real number as a point on the line, we can represent a complex number on
the plane:
This is helpful especially when we consider various operations on complex numbers. For
example, any complex number (a, b) can be considered as a segment from (0, 0). Then the sum
can be found by forming a parallelogram.
As we identify a complex number with a point on the plane, for each complex number (a, b) the length of the segment from (0, 0) to (a, b) is r = √(a² + b²), and θ is the angle from the horizontal axis (the real axis); we can write (a, b) = (r cos θ, r sin θ).

Figure 81: Complex number z = (r cos θ, r sin θ) represented by radius and angle.

With two complex numbers (r₁ cos θ₁, r₁ sin θ₁), (r₂ cos θ₂, r₂ sin θ₂), we have

(r₁ cos θ₁, r₁ sin θ₁)(r₂ cos θ₂, r₂ sin θ₂)
= (r₁r₂(cos θ₁ cos θ₂ − sin θ₁ sin θ₂), r₁r₂(cos θ₁ sin θ₂ + sin θ₁ cos θ₂))
= (r₁r₂ cos(θ₁ + θ₂), r₁r₂ sin(θ₁ + θ₂)).

Therefore, the product has length r₁r₂ and angle θ₁ + θ₂. In particular, if we take z = (r cos θ, r sin θ), then we have zⁿ = (rⁿ cos nθ, rⁿ sin nθ).
From this, we can conclude that for any z ∈ C there is an n-th root in C. Indeed, if z = (r cos θ, r sin θ), then we can take w = (r^{1/n} cos(θ/n), r^{1/n} sin(θ/n)).
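This recipe for n-th roots can be tried numerically with the standard cmath module (the numbers below are arbitrary examples):

import cmath

z = complex(-1.0, 1.0)
r, theta = cmath.polar(z)                  # z = r*(cos(theta) + i*sin(theta))
n = 3
w = cmath.rect(r**(1.0 / n), theta / n)    # an n-th root of z
print(w**n, z)                             # w**n reproduces z up to rounding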

Fundamental theorem of algebra


In complex numbers, we can solve the second-degree equation ax² + bx + c = 0. Indeed, we can use the usual formula

x = (−b ± √(b² − 4ac))/(2a),

where the square root is interpreted as a complex number, which always exists.
In general, a polynomial equation aₙxⁿ + ··· + a₁x + a₀ = 0 has n solutions in C (counted with multiplicity). This is called the fundamental theorem of algebra.

Dec. 22. Complex valued functions



Let us consider x = a + ib ∈ C. We denote by |x| = √(a² + b²) the absolute value of x. For two complex numbers x, y, the distance between x and y is |x − y|. This distance coincides with the distance on the plane.
We have the triangle inequality

|x − y| ≤ |x − z| + |z − y|.

This is literally a triangle inequality, in the sense that |x − y|, |y − z|, |z − x| are the lengths of
the sides of the triangle formed by x, y, z in C.
For x = a + ib, let Re x = a, Im x = b. We have |x| ≥ |Re x|, |Im x|, while |x| ≤ |Re x| + |Im x|.

Complex sequences
Let us consider a sequence of complex numbers {xn } and x ∈ C. We say that xn converges to x
(and write xn → x) if |xn − x| → 0. Note that {|xn − x|} is a sequence of real numbers, hence
we can use the definition and results there.

Figure 82: The distance of two complex numbers.

Lemma 211. xn converges to x if and only if Re xn and Im xn converge to Re x and Im x,


respectively.

Proof. If |xn − x| → 0, then |Re xn − Re x| = |Re (xn − x)| ≤ |xn − x| hence Re xn → Re x.


Similarly for Im x.
If |Re xn − Re x|, |Im xn − Im x| → 0, then |xn − x| ≤ |Re xn − Re x| + |Im xn − Im x| → 0.

In particular, we say that {xn } is a Cauchy sequence if for any  > 0 there is N such that
|xm − xn | <  if m, n > N . In this case, {xn } is convergent.

Complex series
If {zn } are complex numbers, we can also consider the series
Σ_{k=0}^n z_k = z₀ + z₁ + ··· + zₙ.

We define the convergence of the series as the convergence of the sequence Σ_{k=0}^n z_k, just as with real numbers.
We say that the series Σ_{k=0}^n z_k converges absolutely if Σ_{k=0}^n |z_k| converges.

Lemma 212. If Σ_{k=0}^n z_k converges absolutely, then the series Σ_{k=0}^n z_k converges.

Proof. If Σ_{k=0}^n z_k converges absolutely, then Σ_{k=0}^n |z_k| is a Cauchy sequence, and hence Σ_{k=0}^n z_k is a Cauchy sequence (in the complex sense as above), because

|Σ_{k=0}^n z_k − Σ_{k=0}^m z_k| = |Σ_{k=m+1}^n z_k| ≤ Σ_{k=m+1}^n |z_k| < ε

for sufficiently large m, n, by the triangle inequality. This implies that Re Σ_{k=0}^n z_k and Im Σ_{k=0}^n z_k are Cauchy, hence converge. Therefore, Σ_{k=0}^n z_k converges by Lemma 211.

119
Recall that, for real number x, we have proved

X xn
ex = ,
n!
n=0

X (−1)n x2n+1
sin x = ,
(2n + 1)!
n=0

X (−1)n x2n
cos x = .
(2n)!
n=0

We can extend these functions by replacing x by a complex number z. Indeed, the series
N
X zn
n!
n=0

is absolutely convergent for any z ∈ C, because


N
X |z|n
n!
n=0

n
is convergent by the ratio test: with an = |z| an+1 |z|
n! , an = n+1 → 0.
zn
Therefore, we can define the complex exponential function ez by ez = ∞ n! . Let us see
P
n=0
what happens if z = iθ, θ ∈ R.
∞ ∞ ∞

X (iθ)n X (iθ)2n X (iθ)2n+1
e = = +
n! (2n)! (2n + 1)!
n=0 n=0 n=0
∞ n 2n ∞
X (−1) θ X (−1)n θ2n+1
= +i
(2n)! (2n + 1)!
n=0 n=0
= cos θ + i sin θ.

This last formula is called the Euler formula. In particular, with θ = π, we have e^{iπ} = −1, or
eiπ + 1 = 0.
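The formula can be checked numerically with the standard cmath module:

import cmath, math

theta = 0.7
print(cmath.exp(1j * theta))                      # cos(0.7) + i*sin(0.7)
print(complex(math.cos(theta), math.sin(theta)))  # the same value
print(cmath.exp(1j * math.pi))                    # approximately -1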
We can also extend cos z, sin z to complex variables (convergence is proven again by the ratio test). As we have

e^θ = Σ_{n=0}^∞ θⁿ/n!,
e^{−θ} = Σ_{n=0}^∞ (−1)ⁿθⁿ/n!,

cos(iθ) = Σ_{n=0}^∞ (−1)ⁿ(iθ)^{2n}/(2n)! = Σ_{n=0}^∞ (−1)ⁿ(−1)ⁿθ^{2n}/(2n)! = Σ_{n=0}^∞ θ^{2n}/(2n)! = (1/2)(e^θ + e^{−θ}) = cosh θ,

sin(iθ) = Σ_{n=0}^∞ (−1)ⁿ(iθ)^{2n+1}/(2n+1)! = Σ_{n=0}^∞ (−1)ⁿ i(−1)ⁿθ^{2n+1}/(2n+1)! = i Σ_{n=0}^∞ θ^{2n+1}/(2n+1)! = (i/2)(e^θ − e^{−θ}) = i sinh θ.

Furthermore, this explains why the differential equation y 00 + y = 0 has a general solu-
tion y(x) = C1 sin x + C2 cos x. By formally applying (this can be justified by the mate-
rial in Mathematical Analysis II) the chain rule, we have D(eix ) = ieix , D2 (eix ) = −eix and
D(e^{−ix}) = −ie^{−ix}, D²(e^{−ix}) = −e^{−ix}, hence they are formally two solutions of the equation y′′ + y = 0. Hence their linear combinations sin x = (1/2i)(e^{ix} − e^{−ix}) and cos x = (1/2)(e^{ix} + e^{−ix}) satisfy the same equation.

Jan. 11. Ordinary differential equations


Integral curves
As in previous example, it is typical that, by solving a differential equation, we obtain an implicit
equation F (x, y, C) = 0. This means that, for each value of C, we have a relation between x, y,
and in certain cases, it defines a function y of x. As this function y(x) satisfies the differential
equation y 0 (x) = f (x, y(x)), y 0 (x) should mean the slope of the curve y(x) at the point (x, y(x)).
For example, consider the equation y′ = x. This can be integrated, and y = x²/2 + C; depending on the value of C, we have different parabolas. On the other hand, at each point in the xy-plane, we can draw an arrow which goes from (x, y) to (x + ε, y + εf(x, y)). These arrows
are tangent to the curve which represents the solution.
This plot of arrows is called a vector field, and a solution is obtained by “connecting” these
arrows.
(One can visualize the arrows by the command VectorPlot[1,f(x,y)] on Wolfram Alpha, and the stream by StreamPlot[1,f(x,y)], where we took ε = 1.)

Existence and uniqueness of solution


We have considered ordinary differential equations y 0 = f (x, y) and found solutions to some of
them. Yet, some differential equations do not have a solution for a given initial condition, and

Figure 83: The integral curves of y 0 = x.

others have many solutions.

• Consider (y 0 )2 − xy 0 + y + 1 = 0: no solution with y(0) = 0, because then y 0 (0)2 + 1 = 0,


which is impossible because y 0 (0) should be a real number.
• Consider y′ = 3y^{2/3}: the initial condition y(0) = 0 has two solutions, y = 0 and y = x³.

Yet, as we have seen, a differential equation gives a vector field as in Figure 83, and it
should be enough to “chase the arrows”. For this to be possible, f(x, y) should have certain nice
properties. We only state the theorem, and leave the proof to a more advanced book.
For this purpose, we need the following concept: Let f (x, y) be a function of two variables,
that is, f gives a number for a given pair of numbers (x, y). For each fixed y, we can think of
f (x, y) as a function of x, and hence take the derivative with respect to x. This is called the
partial derivative with respect to x, and denoted by ∂f/∂x.
Example 213. • Let f(x, y) = x² + y². Then ∂f/∂x = 2x.

• Let f(x, y) = xy. Then ∂f/∂x = y.

• Let f(x, y) = sin(xy²). Then ∂f/∂x = y² cos(xy²).

It is also possible to consider ∂f/∂y. The details will be explained in Mathematical Analysis II.
Theorem 214. Suppose that f(x, y) and ∂f/∂y are continuous in a rectangle

R = {(x, y) : x₀ − δ < x < x₀ + δ, y₀ − ε < y < y₀ + ε}.

Then there is δ1 such that the equation y 0 = f (x, y) has a unique solution y(x) with initial
condition y(x0 ) = y0 for x0 − δ1 < x < x0 + δ1 .

Euler method
The proof of this theorem amounts to constructing approximate solutions. In the end, for applications in science and engineering, we are satisfied with having sufficiently good approximate solutions.
There are many methods to obtain a numerical solution of a differential equation. One of the simplest is called the Euler method, and it literally chases the vector field as follows.

Figure 84: The Euler method to solve y′ = x with (x₀, y₀) = (0, 1) and ε = 0.5.

Let us consider the differential equation y′ = f(x, y) with the initial condition y(x₀) = y₀, where x₀, y₀ ∈ R. This means that the solution y(x) passes through the point (x₀, y₀). Furthermore, by “chasing the arrows”, the slope of the curve y(x) at the point (x₀, y₀) is f(x₀, y₀). That is, if we take a small step ε, then the next point on the curve is close to (x₀ + ε, y₀ + εf(x₀, y₀)) = (x₁, y₁). Then, again at the point (x₁, y₁), the slope of the curve is f(x₁, y₁), hence the next point on the curve is close to (x₁ + ε, y₁ + εf(x₁, y₁)) = (x₂, y₂), and so on. In this way, we obtain a union of segments which approximates the solution.
If we take smaller ε, the approximation gets better, while we need to do more computations.
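The method is only a few lines of code; a minimal sketch (here with f(x, y) = x and (x₀, y₀) = (0, 1), as in Figure 84):

def euler(f, x0, y0, eps, n_steps):
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        y0 = y0 + eps * f(x0, y0)   # follow the arrow at the current point
        x0 = x0 + eps
        xs.append(x0)
        ys.append(y0)
    return xs, ys

xs, ys = euler(lambda x, y: x, 0.0, 1.0, 0.5, 10)
print(list(zip(xs, ys)))   # approximates the solution y = x**2/2 + 1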

A few codes in Python


In the language Python, it is very easy to write code to solve a differential equation. Let us see some examples.³
The following code solves the equation y′ = ky with k = 0.5 and y(0) = 1.

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 500)

# parameter k
k = 0.5

# equation y' = k y
def deriv(y, x):
    return k * y

# the initial condition
y0 = 1

# solve the equation numerically
ret = odeint(deriv, y0, x)
y = ret

# plot the graph
plt.plot(x, y, label='Solution of y\'(x)=' + str(k) + 'y with y(0)=' + str(y0))
legend = plt.legend()
plt.show()

³ The plot part is taken from the book by Christian Hill, https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/.
To solve a second-order differential equation y′′ = ky, use the trick of doubling the variables: y₀′ = y₁ and y₁′ = ky₀, which together mean y₀′′ = ky₀.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 500)

# parameter k
k = -10

# equation y'' = k y, that is, y_0' = y_1 and y_1' = k y_0
def deriv(y, x):
    return [y[1], k * y[0]]

# the initial condition
y0 = [1, 0.1]

# solve the equation numerically
ret = odeint(deriv, y0, x)
y_0, y_1 = ret.T

# plot the graph
plt.plot(x, y_0, label='Solution of y\'\'(x)=' + str(k) + 'y with y(0)=' + str(y0[0]))
legend = plt.legend()
plt.show()
It is also possible to solve a differential equation with more variables, for example the SIR
model
dS/dt = −βS(t)I(t)/N,
dI/dt = βS(t)I(t)/N − γI(t),
dR/dt = γI(t),
see for example this page.
