Advanced Evt Ivt Theoremproofs
Yoh Tanimoto
              Dipartimento di Matematica, Università di Roma Tor Vergata
                   Via della Ricerca Scientifica 1, I-00133 Roma, Italy
                             email: hoyt@mat.uniroma2.it
    We use the textbook “Calculus” Vol. I by Tom M. Apostol, Wiley, but follow the chapters in
a different order.
    From Monday to Thursday we have lectures, and on Friday we do mostly exercises.
   • Lecture notes:
     http://www.mat.uniroma2.it/~tanimoto/teaching/2021MA1/2021MA1.pdf
   • Exercises:
     http://www.mat.uniroma2.it/~tanimoto/teaching/2021MA1/2021MA1ex.pdf
Some tips
   • Writing math.
        – LaTeX. You can try it here, and you can install the full set afterwards. You need to
          learn some commands, but once you know them it is very powerful. All my lecture notes
          and slides are written in LaTeX.
        – Word processor (MS Word, Apple Pages, Open Office, Libre Office (Insert → Objects
          → Formula)...).
        – Wolfram Math Alpha. You can just type some formulas in and it shows the result.
        – Programming languages. Python (I used it to make the graph of the SIR model),
          Java, C,· · ·
• trigonometric functions (cos x, sin x), exponential function e^x, logarithmic function log x.
Figure 1: Left: A graph can be used to study changing quantities. Right: the SIR model.
Mathematical symbols
We use symbols for general mathematical objects. Before using a symbol, we declare what
kind of object it is. For example,
A symbol might be “recycled”, that is, can be declared to be something different (unfortunately,
we have only 26 × 2 letters).
   Symbols are very useful because we can express general properties of certain mathematical
objects at the same time, without specifying them every time.
Integers and rational numbers
We assume that we know
      • rational numbers: 1/2, 2/3, 23/62518, −3028746/26543, · · · (integers are also rational numbers)
• (commutativity) x + y = y + x, x · y = y · x
• (associativity) (x + y) + z = x + (y + z), (x · y) · z = x · (y · z)
• (distributive law) (x + y) · z = xz + yz
      • (zero and unity) There are special distinct rational numbers, called 0 and 1, such that
        x + 0 = x and x · 0 = 0. And x · 1 = x.
• (negative) There is only one rational number, which we call −x, such that x + (−x) = 0.
• (inverse) If x ≠ 0, there is only one rational number, which we call x^{-1}, such that x · x^{-1} = 1.
• if a + b = a + c, then b = c.
• −(−a) = a.
• a(b − c) = ab − ac.
• a · 0 = 0 · a = 0.
• if ab = ac and a ≠ 0, then b = c.
• if ab = 0, then a = 0 or b = 0.
• if b ≠ 0, d ≠ 0, then a/b + c/d = (ad + bc)/(bd).
• if b ≠ 0, d ≠ 0, then a/b · c/d = (ac)/(bd).
• if a ≠ 0, b ≠ 0, then (a/b)^{-1} = b/a.
  ¹ Statements that can be proven are called theorems, and the properties that we assume are called axioms.
[Figure 2: a number line from −4 to 5 with the rational numbers −17/4, −3, 0, 2/3 and 2.5 marked.]
Proof. We only prove a few of them and leave the rest as exercises.
   Let us assume that a + b = a + c. Then, we take −a and
   • if a < b, b < c then a < c.
• if a ≠ 0, then a^2 (= a · a) > 0.
• 1 > 0.
Proof. We only prove a few of them and leave the rest as exercises. In general, to show “A if and
only if B”, it is enough to show that “if A, then B” and “if B, then A”. This is because “A only if
B” means that “if not B, then not A”; taking the contrapositive gives “if A, then not not B”, and
“not not B” means B.
   If a < b, then by adding −b to both sides, we get a − b < 0. Conversely, if a − b < 0, by
adding b to both sides we get a < b.
   If a = b, then b − a = 0 and we know that both b − a > 0 and b − a < 0 are false and hence
b > a and b < a are false. If a < b, then a − b < 0 and a − b = 0 is false, and hence a = b is false.
   If a < b, then 0 < b − a and 0 < c · (b − a) = bc − ac, hence ac < bc.
   If a ≠ 0, then either a > 0 or a < 0. For the case a > 0, we have a^2 = a · a > 0. For the case
a < 0, we have −a > 0 and a^2 = (−a)^2 > 0.
   All these “theorems” about rational numbers should be well-known to you. But it is important
that we could prove them from a few axioms, which we assume to be true.
   • By nomination. We can nominate all elements of a set. For example, A = {0, 1, 2, 3} and
     B = {1, 10, 100, 1000} are sets.
   • By specification. We include all elements of an existing set with specific properties. For
     example, A = {x ∈ Z : there is y ∈ Z such that x = 2y} (read that “A is the set of
     integers such that there is an integer y such that x = 2y”) is the set of multiples of 2 (we
     recycled the symbol A. When we do this, we shall always declare it).
For a set constructed by nomination, the order and repetition do not matter: {0, 1, 2, 3} =
{0, 3, 2, 1} = {0, 0, 1, 1, 1, 2, 3}. In other words, a set is defined by its elements.
    A construction by specification appears very often. Let us introduce one more symbol.
   • ∅ is the set that contains nothing and is called the empty set. ∅ is a subset of any set: if A
     is a set, the statement “if x ∈ ∅ then x ∈ A” is satisfied just because there is no such x!
Subsets
Let B be a set. We say that A is a subset of B if all elements of A belong to B, and denote
this by A ⊂ B. It holds that A ⊂ A for any set A.
• N ⊂ Z.
   It may happen that A ⊂ B and B ⊂ A, that is, all elements of A belong to B and vice versa.
This means that A and B are the same as sets, and in this case we write A = B.
   The definition by specification A = {x ∈ B : x satisfies the property XXX...} gives always a
subset, in this case of B. Note also that x in this definition has no meaning (“dummy”). One
can write it equivalently A = {y ∈ B : y satisfies the property XXX...}.
   For x ∈ A, the set {x} that contains only x should be distinguished from x. It is a subset of
A: {x} ⊂ A.
   Similarly, we can consider the set of all the elements which belong both to A and B, and
nothing else. It is called the intersection of A and B and denoted by A ∩ B.
   Furthermore, the difference of B with respect to A is the set of all the elements of A that do
not belong to B, and is denoted by A \ B (note that this is different from B \ A).
    We can consider the union of more than two sets: A ∪ (B ∪ C). By considering the meaning,
this set contains all the elements which belong to either A or B ∪ C, which is to say all elements
which belong to either A or B or C. Therefore, the order does not matter and we can write A ∪ B ∪ C.
Similarly, A ∩ B ∩ C is the intersection of A, B and C.
    We may consider a family of sets {A_i}_{i∈I} indexed by another set I. For example, we
can take N as the index set and A_n = {m ∈ Z : m is a multiple of n}. For a family of sets, we
can define the union and the intersection analogously and we denote them by
                                        ⋃_{i∈I} A_i,    ⋂_{i∈I} A_i,
respectively.
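As a small illustration of union, intersection and difference, here is a sketch in Python, whose built-in `set` type follows the same rules. The sets `A`, `B` and the finite window 1, ..., 30 standing in for Z are choices made for this example only.

```python
# Finite sets chosen for illustration only.
A = {0, 1, 2, 3}
B = {2, 3, 4}

union = A | B          # elements that belong to A or to B
intersection = A & B   # elements that belong to both A and B
difference = A - B     # elements of A that do not belong to B

# A family A_n = {m : m is a multiple of n}, restricted to 1..30
# because a program cannot list all of Z.
window = range(1, 31)
family = {n: {m for m in window if m % n == 0} for n in (2, 3, 5)}
common = set.intersection(*family.values())  # multiples of 2, 3 and 5 at once
```

Within this window, `common` contains only the common multiples of 2, 3 and 5.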
Sep 23. Sets and logic
Sets by specification
Let us recall that, if we fix a set A, we can define a subset of A by specification: it is the subset
of elements x of A that satisfy a certain condition ϕ(x):
{x ∈ A : ϕ(x)},
where ϕ(x) is a condition on x. For example, {x ∈ Z : x > 10} is the set of integers larger than
10.
    On one hand, we can consider the combined conditions: for example, the condition that
x > 10 and the condition that x < 15 can be considered at the same time. The set of integers
that satisfy both of the conditions is
                   {x ∈ Z : x > 10 and x < 15} = {11, 12, 13, 14}.
On the other hand, we observe that this set is the intersection of two sets:
                   {x ∈ Z : x > 10} = {11, 12, 13, 14, 15, 16, 17, · · · },
                   {x ∈ Z : x < 15} = {· · · , −2, −1, 0, 1, · · · , 11, 12, 13, 14}.
In general, if ¬ϕ(x) is the negation of ϕ(x), then it holds that, for B = {x ∈ A : ϕ(x)},
A \ B = {x ∈ A : ¬ϕ(x)}.
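Sets by specification correspond closely to set comprehensions in Python. The following is a minimal sketch, assuming a finite window of integers (`Z_window`, a name introduced here) in place of Z, since a program cannot enumerate all integers.

```python
# A finite window of integers stands in for Z (assumption for the example).
Z_window = range(-5, 21)

larger_than_10 = {x for x in Z_window if x > 10}
smaller_than_15 = {x for x in Z_window if x < 15}

# Combining the two conditions with "and" gives the intersection.
both = {x for x in Z_window if x > 10 and x < 15}
assert both == larger_than_10 & smaller_than_15

# Negating the condition gives the difference (complement inside the window).
not_larger_than_10 = {x for x in Z_window if not x > 10}
assert not_larger_than_10 == set(Z_window) - larger_than_10
```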
    Let us consider the set {x ∈ Q : (x − 1)(x − 3) > 0}. To understand this set better, we need
to understand the condition (x − 1)(x − 3) > 0. The left-hand side is a product of two rational
numbers. The product of two rational numbers is positive if and only if one of the following
cases is true.
• x − 1 > 0 and x − 3 > 0, that is, x > 1 and x > 3
• x − 1 < 0 and x − 3 < 0, that is, x < 1 and x < 3
Note that x > 1 is true if x > 3. Similarly, x < 3 is true if x < 1. Therefore, these conditions
are equivalent to
• x>3
• x<1
Altogether, we have {x ∈ Q : (x − 1)(x − 3) > 0} = {x ∈ Q : x > 3} ∪ {x ∈ Q : x < 1}.
Example 7. • {1, 2, 3}, {2}, {1, 4, 6, 7} are sets. We can collect them together: {{1, 2, 3}, {2}, {1, 4, 6, 7}}.
This is a set of sets. It is different from the set of their elements {1, 2, 3, 4, 6, 7}.
   • The set of all subsets of {1, 2, 3} is {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
   • One can also consider the set of all subsets of N, Z, Q, but we cannot name all the elements:
     they are infinite. For example, for N = {1, 2, 3, 4, · · · }, the set of subsets of N is infinite.
Example 8.        • Let A = {1, 2, 3}, B = {3, 4}. Then the set A × B of the ordered pairs of
    A, B is
                            A × B := {(1, 3), (2, 3), (3, 3), (1, 4), (2, 4), (3, 4)}.
    Ordered pairs can be described using graphs. If A, B ⊂ Z have finitely many points, say
m, n respectively, then there are m · n ordered pairs. We take the horizontal axis for A and the
vertical axis for B.
    To obtain the graph of A × B, we should mark the point (x, y) if and only if x ∈ A and
y ∈ B. For any subset X of A × B, we should mark the point (x, y) if and only if (x, y) ∈ X.
See Figure 3.
    The graph can be understood in terms of ordered pairs. Let A = {1, 2, 3, 4, 5, 6}, and B =
{(x, y) ∈ A × A : y = 2x}. Let us give all elements of B and draw its graph. We check all
6 × 6 = 36 elements. See Figure 4: it is {(1, 2), (2, 4), (3, 6)}. Notice that these points lie on a straight line!
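The check of all 36 pairs can also be left to a computer. Here is a minimal Python sketch; `itertools.product` builds the ordered pairs, and the variable names are chosen to match the text.

```python
from itertools import product

A = {1, 2, 3, 4, 5, 6}
pairs = set(product(A, A))                      # all ordered pairs in A × A
B = {(x, y) for (x, y) in pairs if y == 2 * x}  # the subset with y = 2x
```

Here `pairs` has 6 · 6 = 36 elements, and `B` contains exactly the three pairs listed above.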
Figure 3: Left: the set of all ordered pairs {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5}. Right: a subset
{(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)} ⊂ {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5}.
Figure 4: The set of all ordered pairs (x, y) ∈ {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6} with y = 2x.
Figure 5: Left: the right triangle with equal sides 1. By Pythagoras’ theorem, the longest side
is √(1^2 + 1^2) = √2. Right: the unit circle with radius 1 (diameter 2). The length of the circle is 2π.
Exercise. Prove that 2√2 is irrational.
    It has been proven that π and e are irrational, but the proofs are more difficult. Instead, it can be
easily proven that a number with a nonrepeating decimal representation cannot be rational. This means
there are many irrational numbers.
    In other words, the set of rational numbers has “many spaces” in it. We should fill
them in with irrational numbers, so that the set of real numbers is a “continuum”.
• (associativity) (x + y) + z = x + (y + z), (x · y) · z = x · (y · z)
• (distributive law) (x + y) · z = xz + yz
   • (zero and unity) There are special distinct real numbers, called 0 and 1, such that
     x + 0 = x and x · 0 = 0. And x · 1 = x.
• (negative) There is only one real number, which we call −x, such that x + (−x) = 0.
   • (inverse) If x ≠ 0, there is only one real number, which we call x^{-1}, such that x · x^{-1} = 1.
There is also an order relation < which satisfies, for x, y, z real,
   • if 0 < x, 0 < y, then 0 < xy and 0 < x + y.
   • 00
We can prove Theorems for real numbers corresponding to Theorems 1, 2. Therefore, the real
numbers have the same properties as the rational numbers, concerning the sum, product and
order.
     We say that S ⊂ R is bounded above if there is x ∈ R such that for any y ∈ S it holds
that y ≤ x, and we write S ≤ x. S is said to be bounded below if there is x ∈ R such that for
any y ∈ S it holds that y ≥ x, and we write S ≥ x.
     If S is both bounded above and below, we say that S is bounded.
     If S is bounded above, then any x ∈ R such that S ≤ x is called an upper bound of S.
Similarly, if x ≤ S, then x is said to be a lower bound of S.
     If S has a least upper bound, that is there is x such that S ≤ x and x ≤ y for any upper
bound y of S, then x is called the supremum of S and we denote it by x = sup S. Similarly, if
S has a largest lower bound x, then it is called the infimum of S and we denote it by x = inf S.
     R includes Z and Q: 1 ∈ R, hence 2 = 1 + 1, 3 = 1 + 1 + 1, · · · and −1, −2, · · · ∈ R. Also, if
p, q ∈ Z and q ≠ 0, then p/q ∈ R.
     What distinguishes R from Q is the following.
Figure 6: The set S approximating √2, which is bounded by 1.5.
Figure 7: Any x ∈ R falls between n − 1 and n (including equality) for some n ∈ N. For any
x > 0, there is n ∈ N such that 1/n < x.
    • (the least upper bound axiom, or the completeness axiom) every nonempty subset S of R
       which is bounded above has a supremum: there is B ∈ R such that B = sup S.
    This should imply that √2 = 1.41421356 · · · belongs to R! Indeed, let us take, by chopping
the digits of √2, S = {1, 1.4, 1.41, 1.414, 1.4142, · · · }. S is bounded above: indeed, 1.5 >
1, 1.4, 1.41, 1.414, · · · . On the other hand, if an upper bound x has a finite decimal representation,
e.g. 1.415, then there is a smaller upper bound x′ = 1.4149. So, sup S should be exactly √2. We will
see this more precisely later.
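The set S can be generated exactly on a computer. The sketch below is one possible way, not the method of these notes: it uses the integer square root `math.isqrt` to chop the digits of √2 and `fractions.Fraction` to avoid rounding; the function name `truncation` is introduced here for illustration.

```python
from fractions import Fraction
from math import isqrt

def truncation(k):
    """Decimal expansion of sqrt(2) cut after k digits, as an exact fraction."""
    # isqrt(2 * 10**(2k)) is the largest integer m with m**2 <= 2 * 10**(2k),
    # so m / 10**k is sqrt(2) chopped after k decimal digits.
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

S = [truncation(k) for k in range(6)]   # 1, 1.4, 1.41, 1.414, 1.4142, 1.41421

# Every element has square below 2 and lies below the upper bound 1.5.
assert all(s * s < 2 and s < Fraction(3, 2) for s in S)
```

Adding one unit in the last digit always overshoots (the square becomes larger than 2), which is the numerical counterpart of the remark that no finite decimal can be the least upper bound.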
    (A lemma is a theorem (a consequence of axioms) used to prove a more important theorem)
Lemma 10. If S ⊂ R is bounded above and B = sup S, then for any ε > 0, there is x ∈ S such
that B − ε < x.
Proof. By contradiction, assume that there is ε > 0 such that B − ε ≥ x for all x ∈ S. Then B
is not the least upper bound, because B − ε is an upper bound of S and B − ε < B.
Proof. By contradiction, assume that N were bounded above. Then by the completeness axiom,
there is x = sup N. By the lemma above, for ε = 1/2, there is n ∈ N such that x − 1/2 < n. But
then x < n + 1/2 < n + 1 ∈ N, and this contradicts the assumption that x were an upper bound
of N. This implies that N is not bounded above.
Corollary 12. For any x ∈ R, there is n ∈ N such that x < n. For any y, z ∈ R and z > y,
there is n ∈ N such that 1/n < z − y.
Proof. By the theorem above, x is not an upper bound of N, so there is n such that x < n. By
applying this to 1/(z − y), there is n such that 1/(z − y) < n, which implies that 1/n < z − y.
    Therefore, we can represent the set of real numbers by a straight line, and any point x ∈ R
is on the line and it falls between an integer n and another n − 1 (possibly x = n). Conversely,
any point on the line gives an element in R.
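The second statement of Corollary 12 is constructive: any natural number larger than 1/(z − y) works. A minimal Python sketch, assuming rational (or integer) inputs so that the arithmetic is exact; the function name `archimedean_n` is hypothetical.

```python
from fractions import Fraction
from math import floor

def archimedean_n(y, z):
    """Return a natural number n with 1/n < z - y, assuming y < z."""
    gap = Fraction(z) - Fraction(y)
    if gap <= 0:
        raise ValueError("need y < z")
    # Any natural number larger than 1/gap works; this picks the smallest one.
    return floor(1 / gap) + 1
```

For example, with y = 0 and z = 1/3 the function returns 4, and indeed 1/4 < 1/3.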
    Any real number has a decimal representation (next lecture).
    Note that Q does not have the completeness property!
Proof. A is bounded above: indeed, if x^2 < 2, then x^2 < 4 = 2^2, and hence x < 2.
   Let s = sup A ∈ R. Then s^2 = 2. We prove this by contradiction.
   • if s^2 < 2, then we take ε > 0 such that 0 < ε < (2 − s^2)/s (or sε < 2 − s^2) and ε < s. Then
     (s + ε/4)^2 = s^2 + sε/2 + ε^2/16 < s^2 + sε/2 + sε/2 = s^2 + sε < 2, therefore, s is not an
     upper bound of A, contradiction.
   • if s^2 > 2, then we take ε > 0 such that 0 < ε < (s^2 − 2)/s (or sε < s^2 − 2) and ε < s. Then
     (s − ε/4)^2 = s^2 − sε/2 + ε^2/16 > s^2 − sε > 2, therefore, s is not the least upper bound of A
     (because s − ε/4 is another upper bound, smaller than s), contradiction.
But we know that there is no rational number s ∈ Q such that s^2 = 2. Hence s = sup A ∉ Q.
   This also says that s = √2 belongs to R.
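The first case of the proof can be tried out numerically: starting from a rational s with s^2 < 2, the number s + ε/4 is larger and still has square below 2. The sketch below assumes exact rational arithmetic via `fractions.Fraction`; the name `step_up` and the particular choice of ε (half of the smaller admissible bound) are illustrative assumptions, since the proof allows any ε in the stated range.

```python
from fractions import Fraction

def step_up(s):
    """Given rational s > 0 with s*s < 2, return s + eps/4 as in the proof."""
    s = Fraction(s)
    assert s > 0 and s * s < 2
    # eps must satisfy 0 < eps < (2 - s^2)/s and eps < s; halving the
    # smaller of the two bounds is one (arbitrary) way to choose it.
    eps = min((2 - s * s) / s, s) / 2
    return s + eps / 4

s = Fraction(1)
for _ in range(8):
    t = step_up(s)
    assert t > s and t * t < 2   # a larger number whose square is still below 2
    s = t
```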
• (a, b] = {x ∈ R : a < x, x ≤ b}
• [a, b) = {x ∈ R : a ≤ x, x < b}
• (a, ∞) = {x ∈ R : a < x}
• [a, ∞) = {x ∈ R : a ≤ x}
• (−∞, b) = {x ∈ R : x < b}
   • (−∞, b] = {x ∈ R : x ≤ b}
    Remember that, a, b are given numbers, and x is a “dummy” number. You can write them
in a different way, without using x:
   • (a, b) is the set of all numbers larger than a and smaller than b
   • [a, b] is the set of all numbers larger than or equal to a and smaller than or equal to b
Example 14. Consider (0, 1).
   • 0.1, 0.2, 0.5, 0.999 ∈ (0, 1).
   • 0, 1, 2, 3, 10, −1, −2 ∉ (0, 1).
• sup(0, 1) = 1.
   • inf(0, 1) = 0.
Consider [0, 1].
   • 0, 0.1, 0.2, 0.5, 0.999, 1 ∈ [0, 1].
   • 2, 3, 10, −1, −2 ∉ [0, 1].
• sup[0, 1] = 1.
• inf[0, 1] = 0.
Figure 8: Open and closed intervals (0, 1) and [0, 1]. The open interval does not include the
edges 0, 1, while the closed interval [0, 1] does.
Figure 9: Intervals and their operations. Top: (1, 2) + 3 = (4, 5). Middle: (1, 2) + [2.4, 2.6] =
(3.4, 4.6). Bottom: 2[−1, 1) = [−2, 2).
Operations on sets
Let A, B be subsets of R and a ∈ R. We denote various subsets of R as follows.
• A + a = {x ∈ R : x = y + a for some y ∈ A} = {y + a : y ∈ A}
• A − a = {x ∈ R : x = y − a for some y ∈ A} = {y − a : y ∈ A}
• A + B = {x ∈ R : x = y + z for some y ∈ A, z ∈ B} = {y + z : y ∈ A, z ∈ B}
• A − B = {x ∈ R : x = y − z for some y ∈ A, z ∈ B} = {y − z : y ∈ A, z ∈ B}
Example 15.     • Consider A = (0, 1), x = 2. Then A + x = (2, 3), because if 0 < y < 1, then 2 <
    y + 2 < 3. Note that the boundary points 2, 3 are not included.
   • Consider A = [1, 2], B = (2.4, 2.6). Then A + B = (3.4, 4.6). Note that the boundary points
     3.4, 4.6 are not included, because there is no x ∈ A, y ∈ B such that x + y = 3.4 or 4.6.
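Intervals cannot be listed element by element, but the same operations can be tried on finite subsets of R. A minimal Python sketch, with the helper names `translate` and `set_sum` and the sample sets introduced here for illustration:

```python
def translate(A, a):
    """A + a = {y + a : y in A}."""
    return {y + a for y in A}

def set_sum(A, B):
    """A + B = {y + z : y in A, z in B}."""
    return {y + z for y in A for z in B}

A = {1, 2, 5}
B = {10, 20}
C = set_sum(A, B)
```

For finite sets the supremum is simply the maximum, and one can observe that `max(C)` equals `max(A) + max(B)`, and likewise for the minimum.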
Some properties of upper and lower bounds
Note that sup A, inf A are only defined for nonempty sets (otherwise the definition is
meaningless).
Lemma 16. If x, y ∈ R and x − ε < y for any ε > 0, then x ≤ y.
Proof. By contradiction. If x > y, then by the Archimedean property, we have n such that 1/n < x − y,
in other words, x − 1/n > y, which contradicts the assumption that x − ε < y for arbitrary ε > 0.
   • if A, B are bounded below, then A + B is bounded below and inf A + inf B = inf C.
Proof. We prove only the first one, because the second one is analogous.
    By the completeness axiom, A and B have the supremum sup A, sup B. As sup A and sup B
are upper bounds of A and B respectively, for any element z ∈ C we have x ∈ A, y ∈ B such
that z = x + y and x ≤ sup A, y ≤ sup B hence z = x + y ≤ sup A + sup B. In particular,
sup A + sup B is an upper bound of C, hence sup C ≤ sup A + sup B.
    Conversely, we know from Lemma 10 that, for any ε > 0, there is x ∈ A (and y ∈ B) such
that sup A − ε/2 < x (and sup B − ε/2 < y). Therefore, sup A + sup B − ε/2 − ε/2 = sup A + sup B − ε <
x + y ≤ sup C for arbitrary ε > 0, hence by Lemma 16, sup A + sup B ≤ sup C. Altogether,
sup C = sup A + sup B.
   Remember that sup A is the least (smallest) upper bound and inf B is the greatest (largest)
lower bound.
Theorem 18. Let A, B ⊂ R. If for any x ∈ A and y ∈ B it holds that x < y, then sup A ≤ inf B.
Proof. Any y ∈ B is an upper bound of A, hence sup A ≤ y. This means that sup A is a lower
bound of B, hence sup A ≤ inf B.
   We denote it by s = √a.
   For any n ∈ N, we can define the n-th root of any positive number a and we denote it by a^{1/n}.
The existence can be proved similarly.
   • 1/3 = 0.33333 · · ·
   • √2 = 1.41421 · · ·
• π = 3.14159 · · ·
     Indeed, let x ∈ R be a real number and x > 0. By the Archimedean property, there is a
natural number n ∈ N such that n − 1 ≤ x < n (this is possible, because any nonempty subset of N has
a minimal element, which we prove below). We take a_0 = n − 1.
     Note that 0 ≤ x − a_0 < 1. Therefore, 0 ≤ 10(x − a_0) < 10. Take a_1 ∈ N_0 the largest natural
number such that a_1 ≤ 10(x − a_0). As it is the largest, we have again 0 ≤ 10(x − a_0) − a_1 < 1.
     We can repeat this procedure and obtain a_n, and it always holds that x − a_0.a_1 · · · a_n <
0.0 · · · 01 (n digits after the decimal point).
   Let A = {a_0, a_0.a_1, a_0.a_1a_2, a_0.a_1a_2a_3, · · · }. This A is bounded above (by a_0 + 1), hence it has the
supremum s. Note that x is an upper bound of A, hence sup A ≤ x. On the other hand, for any
ε = 0.0 · · · 01 (n digits after the decimal point), we have x − ε < a_0.a_1 · · · a_n ∈ A, therefore,
x ≤ sup A. Altogether, x = sup A = s.
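The digit-by-digit procedure above can be written as a short program. This is a sketch for rational inputs only, using `fractions.Fraction` for exact arithmetic; the name `decimal_digits` is introduced here for illustration.

```python
from fractions import Fraction
from math import floor

def decimal_digits(x, n):
    """Return [a0, a1, ..., an] for a rational x >= 0."""
    x = Fraction(x)
    a0 = floor(x)              # a0 <= x < a0 + 1
    digits = [a0]
    rest = x - a0              # 0 <= rest < 1
    for _ in range(n):
        rest *= 10
        a = floor(rest)        # largest integer with a <= rest
        digits.append(a)
        rest -= a              # again 0 <= rest < 1
    return digits
```

For instance, `decimal_digits(Fraction(1, 3), 4)` produces the integer part 0 followed by four digits 3.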
Proposition 20. A real number that has a nonrepeating decimal representation is irrational.
Proof. We prove that any (positive) rational number has a repeating decimal representation.
Then the claim follows by contradiction.
    Let x = a_0.a_1a_2 · · · = p/q, p, q ∈ N. We can write p = nq + r, where n, r ∈ N and 0 ≤ r < q
(division with remainder). We set a_0 = n. Then we write 10r = n_1 q + r_1 again, and set a_1 = n_1.
In this way, we obtain the decimal representation of p/q, but there are only finitely many possible
values 0, 1, · · · , q − 1 of the remainders r_1, r_2, · · · because we are doing the division with
remainder by q. This means that the digits repeat after at most q digits.
   The converse of this (any irrational number has a nonrepeating decimal representation) will
be proven later.
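The argument in the proof of Proposition 20 is essentially long division, and the repetition can be detected by remembering which remainders have already occurred. A minimal Python sketch for integers p ≥ 0 and q > 0; the function name `repeating_decimal` and its return format are choices made for this example.

```python
def repeating_decimal(p, q):
    """Long division of p/q (p >= 0, q > 0).

    Returns (a0, prefix, cycle): the integer part, the digits before the
    repetition starts, and the block of digits that repeats forever.
    """
    a0, r = divmod(p, q)
    digits = []
    seen = {}                      # remainder -> position where it first occurred
    while r not in seen:
        seen[r] = len(digits)
        n_k, r = divmod(10 * r, q)  # 10 r = n_k q + (new remainder)
        digits.append(n_k)
    start = seen[r]
    return a0, digits[:start], digits[start:]
```

Since a remainder can take only the values 0, 1, ..., q − 1, the loop stops after at most q steps, so the repeating block never has more than q digits. A terminating decimal such as 1/4 shows up with the repeating block 0.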
• 1 ∈ N
• For every n ∈ N, n + 1 ∈ N
• For every n ∈ N, n + 1 ≠ 1
In other words, N consists of 1 and all other numbers obtained by adding 1 repeatedly to 1, and
that is all. This is the precise definition of N.
    With this characterization, we obtain the mathematical induction. Let ϕ(n) be a set of
propositions depending on n ∈ N. If ϕ(1) is true, and if we can prove ϕ(n + 1) from ϕ(n), then
ϕ(n) is true for all natural numbers. Indeed, let S = {n ∈ N : ϕ(n) is true }. S is a subset of N,
1 ∈ S and if n ∈ S, then n + 1 ∈ S. From the Peano axioms, we have S = N. In other words,
ϕ(n) holds for all n ∈ N.
Proof. This is proved by a double induction. Let ϕ(m, n) = “if m > n, then m − n ∈ N”. Let us
first set n = 1, m = 1. In this case, m > n is not true, so we do not have to prove anything.
     Assume that ϕ(m, 1) is true, that is, if m > 1, then m − 1 ∈ N. To prove ϕ(m + 1, 1), assume
that m + 1 > 1, but m + 1 − 1 = m ∈ N. By induction, ϕ(m, 1) is true for all m ∈ N.
     Assume that ϕ(m, n) is true for all m ∈ N. Assume that m > n + 1. Then, m − 1 > n > 0,
and m − 1 ∈ N by ϕ(m, 1). Then by ϕ(m − 1, n), m − (n + 1) = m − 1 − n ∈ N, that is, we
proved ϕ(m, n + 1).
Theorem 23. For any nonempty subset S ⊂ N, there is the smallest element in S. That is,
there is n ∈ S such that n ≤ m for all m ∈ S.
   • Next, let n ∈ T and we prove that n + 1 ∈ T. Assume that n + 1 ∉ T (call this C_n). Then,
     there is m ∈ S such that m ≤ n + 1, but since n ∈ T, it must hold that n < m. This means
     that m = n + 1 by the previous lemma, and any ℓ ≤ n does not belong to S. Therefore,
     m = n + 1 would be the smallest element of S, contradicting B. Therefore, C_n is false and
     we obtain n + 1 ∈ T.
Then by induction (the Peano axioms) we have T = N. This implies that for any m ∈ S it
holds that n < m for all n ∈ T = N. But there is no such m (larger than any natural number),
hence S = ∅. This contradicts the assumption A of the theorem. Therefore, the assumption B
made in the proof is wrong. That is, S admits the smallest element.
    Alternatively, this can be proved as follows, but using the axiom of the least upper bound
(the proof above uses only the Peano axioms). As N is bounded below, S is bounded below as
well. Let a = inf S. We show that a ∈ S. If not, for ε = 1/2, there is n ∈ S such that n < a + 1/2.
But then 0 < n − a < 1/2, which is impossible by the previous lemma.
Proof. By the Archimedean principle, there is n such that x < n. Therefore, the set {m ∈ N :
x < m} is nonempty, and by the well-ordering principle, it has the smallest element n. As this
is the smallest element, n − 1 ≤ x.
Example 25. • a_1 = 1, a_2 = 2, a_3 = 3, · · ·
• a_1 = 1, a_2 = 4, a_3 = 9, · · ·
   • a_1 = 4, a_2 = 2534, a_3 = 3/361 (a finite sequence stops at some n ∈ N)
    When we have a (finite) sequence, we can sum all these numbers up: a1 + · · · + an . We denote
this by the following symbol.
                                     $\sum_{k=1}^{n} a_k = a_1 + \cdots + a_n$
On the other hand, the number on the top (n in this example) is where the sequence stops.
Similarly, we can define $\sum_{k=m}^{n} a_k = a_m + a_{m+1} + \cdots + a_n$ for n ≥ m.
   More precisely, this is a recursive definition: We define $\sum_{k=1}^{1} a_k = a_1$ and
$\sum_{k=1}^{n+1} a_k = \sum_{k=1}^{n} a_k + a_{n+1}$. Similarly to mathematical induction, we define
in this way $\sum_{k=1}^{n} a_k$ for all natural numbers n ∈ N.
Example 26.
   • a_1 = 1, a_2 = 2, a_3 = 3. $\sum_{k=1}^{3} a_k = 1 + 2 + 3 = 6$.
   • a_1 = 1, a_2 = 4, a_3 = 9, a_4 = 16. $\prod_{k=1}^{4} a_k = 1 \cdot 4 \cdot 9 \cdot 16 = 576$.
   In particular, we denote
   • For a ∈ R, $a^n = \prod_{k=1}^{n} a$. For example, a^1 = a, a^2 = a · a, a^3 = a · a · a. By
     convention, for a ≠ 0, we set a^0 = 1.
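The recursive definitions of the sum, the product and the power translate directly into recursive functions. The following Python sketch is only an illustration: the function names are hypothetical, and a sequence is represented as a dictionary so that the indices start at 1 as in the text.

```python
def sum_up_to(a, n):
    """a[1] + ... + a[n], defined recursively as in the text."""
    if n == 1:
        return a[1]
    return sum_up_to(a, n - 1) + a[n]

def product_up_to(a, n):
    """a[1] * ... * a[n], with the same recursive pattern."""
    if n == 1:
        return a[1]
    return product_up_to(a, n - 1) * a[n]

def power(a, n):
    """a^n as an n-fold product of a, with a^0 = 1.

    The text sets a^0 = 1 only for a != 0; this sketch does not treat
    a = 0 specially.
    """
    return 1 if n == 0 else power(a, n - 1) * a

squares = {k: k * k for k in range(1, 5)}   # a_1 = 1, a_2 = 4, a_3 = 9, a_4 = 16
```

With these definitions, `product_up_to(squares, 4)` reproduces the value 576 from Example 26.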
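   • $\sum_{k=1}^{1} a^k = a = \frac{a(1-a)}{1-a}$ is correct. Assume the formula
     $\sum_{k=1}^{n} a^k = \frac{a(1-a^n)}{1-a}$ for n, then
\[
\begin{aligned}
\sum_{k=1}^{n+1} a^k &= \left( \sum_{k=1}^{n} a^k \right) + a^{n+1} \\
&= \frac{a(1 - a^n)}{1-a} + a^{n+1} \\
&= \frac{a - a^{n+1} + a^{n+1} - a^{n+2}}{1-a} \\
&= \frac{a(1 - a^{n+1})}{1-a}
\end{aligned}
\]
       Then by induction the formula holds for all n ∈ N.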
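A proof by induction covers all n at once, but it can be reassuring to check the formula for a few values. A minimal Python sketch using exact fractions; the value a = 2/3 and the helper names are arbitrary choices for this example, and a = 1 is excluded because the formula divides by 1 − a.

```python
from fractions import Fraction

def geometric_sum(a, n):
    """a^1 + a^2 + ... + a^n, added term by term."""
    return sum(a ** k for k in range(1, n + 1))

def closed_form(a, n):
    """a(1 - a^n)/(1 - a), valid for a != 1."""
    return a * (1 - a ** n) / (1 - a)

a = Fraction(2, 3)   # exact arithmetic avoids rounding issues
checks = [geometric_sum(a, n) == closed_form(a, n) for n in range(1, 11)]
```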
Proof. We prove this by induction, but in a slightly different form: we prove that the formula is
correct for n = k, and prove that it holds for n + 1 assuming the formula for n. In this way, we
prove the formula for n ≥ k.
                                  (k+1)!
    If n = k, we have k+1                               k!            k
                                                                          + kk .
                                                                              
                         k   = k!(k+1−k)!   = k + 1 = (k−1)! + 1 = k−1
    Assuming the formula for n, we have
\begin{align*}
\binom{n+2}{k} &= \frac{(n+2)!}{k!\,(n+2-k)!} \\
 &= \frac{n+2}{n+2-k}\binom{n+1}{k} \\
 &= \frac{n+2}{n+2-k}\left(\binom{n}{k-1} + \binom{n}{k}\right) \\
 &= \frac{n+2}{n+2-k}\left(\frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!}\right) \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{n!}{(k-1)!\,(n+2-k)!} + \frac{n+2}{n+2-k}\cdot\frac{n!}{k!\,(n-k)!} \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{n!}{k!\,(n+1-k)!}\cdot\frac{k + (n+2)(n+1-k)}{n+2-k} \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{n!}{k!\,(n+1-k)!}\cdot\frac{k + (n+1)(n+1-k) + n+1-k}{n+2-k} \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{n!}{k!\,(n+1-k)!}\cdot\frac{(n+1)(n+1-k) + n+1}{n+2-k} \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{n!}{k!\,(n+1-k)!}\cdot\frac{(n+1)(n+2-k)}{n+2-k} \\
 &= \frac{(n+1)!}{(k-1)!\,(n+2-k)!} + \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k-1} + \binom{n+1}{k}.
\end{align*}
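The identity established by this induction, $\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$, can be spot-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `pascal_rule_holds` is hypothetical, and a finite check only illustrates the statement, it does not replace the proof.

```python
from math import comb


def pascal_rule_holds(n: int, k: int) -> bool:
    """Check C(n+1, k) == C(n, k-1) + C(n, k) for one pair with 1 <= k <= n."""
    return comb(n + 1, k) == comb(n, k - 1) + comb(n, k)


# Exhaustive check over a small range of n and k (illustration only).
all_ok = all(
    pascal_rule_holds(n, k) for n in range(1, 30) for k in range(1, n + 1)
)
print(all_ok)
```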
Theorem 30. For any a, b ∈ R, n ∈ N, $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$, where in this theorem we mean $0^0 = 1$.
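Proof. By induction. For n = 0, this holds in the sense of 1 = 1.
   Assume that this holds for n. Then,
\begin{align*}
(a + b)^{n+1} &= (a + b)^n \cdot (a + b) \\
 &= (a + b) \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \\
 &= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k+1} \\
 &= \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k+1} \\
 &= \sum_{k=1}^{n} \left(\binom{n}{k-1} + \binom{n}{k}\right) a^k b^{n+1-k} + a^{n+1} b^0 + a^0 b^{n+1} \\
 &= \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k}.
\end{align*}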
• $(x + y)^2 = x^2 + 2xy + y^2$
and so on.
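The binomial formula of Theorem 30 can be checked on concrete numbers. The sketch below uses exact rational arithmetic from the standard library, so no rounding is involved; the function name `binomial_expansion` and the sample values of a and b are assumptions made for illustration. Python evaluates `0 ** 0` as 1, which matches the convention $0^0 = 1$ adopted in the theorem.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(a, b, n: int):
    """Right-hand side of the binomial theorem: sum of C(n, k) a^k b^(n-k)."""
    return sum(comb(n, k) * a**k * b ** (n - k) for k in range(n + 1))


# Sample values chosen only for illustration.
a, b = Fraction(3, 2), Fraction(-2, 5)
checks = [binomial_expansion(a, b, n) == (a + b) ** n for n in range(8)]
print(all(checks))
```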
Figure 11: Left: the graph of $y = x^5 - 2x^3 + 1$. Right: the graph of $y = \sqrt{x}$.
Figure 12: Left: the graph of $x^2 + y^2 = 1$, not a function of x. Right: the graph of $y = \sqrt{1 - x^2}$.
Figure 13: Left: the graph of y = |x|. Right: the graph of y = sign x, with a “jump” at x = 0.
Operations on functions
When we have two or more functions, we can produce more functions. Let f (x) be a function
with domain S and g(x) a function with domain T .
    We say that a function f(x) is injective if for any pair $x_1 \neq x_2$ in the domain, it holds that $f(x_1) \neq f(x_2)$. Similarly, we say that a function f(x) is surjective if the range is R. A function which is both injective and surjective is said to be bijective.
    For example, f(x) = x is injective and surjective (hence bijective), but $f(x) = x^2$ is neither injective nor surjective. But if we consider $f(x) = x^2$ with the restricted domain [0, ∞), it is injective: for nonnegative numbers $x_1 \neq x_2$, we have $x_1^2 \neq x_2^2$.
    For an injective function f(x), we can define the inverse function $f^{-1}$: the domain of $f^{-1}$ is the range of f, and it assigns to f(x) the number x: it is characterized by $f^{-1}(f(x)) = x$. Its graph (its formal definition) is given by {(x, y) ∈ R × R : x is in the range of f, x = f(y)}. The range of $f^{-1}$ is the domain of f.
    For example, consider $f(x) = x^2$ on the domain [0, ∞). The range of f is [0, ∞), hence the domain of $f^{-1}$ is [0, ∞). For any x ∈ [0, ∞), we should have $f^{-1}(f(x)) = f^{-1}(x^2) = x$, therefore, $f^{-1}(x) = \sqrt{x}$.
Figure 15: The graphs of $y = \sqrt{x}$ and $y = x^2$ on [0, ∞).
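Proof. By induction. For n = 1, $\left|\sum_{k=1}^{1} a_k\right| = |a_1| = \sum_{k=1}^{1} |a_k|$ is obvious. Assuming the inequality for n, we have
\begin{align*}
\left|\sum_{k=1}^{n+1} a_k\right| &= \left|\sum_{k=1}^{n} a_k + a_{n+1}\right| \\
 &\le \left|\sum_{k=1}^{n} a_k\right| + |a_{n+1}| && \text{by the triangle inequality} \\
 &\le \sum_{k=1}^{n} |a_k| + |a_{n+1}| && \text{by induction hypothesis} \\
 &= \sum_{k=1}^{n+1} |a_k|.
\end{align*}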
Oct 06. Sequence and convergence of sequence.
Convergence of sequences
We saw sequences of real numbers a1 , a2 , · · · . A sequence can be infinite, that is, it continues
infinitely. For example,
   • a1 = 1, a2 = 2 and in general, an = n.
   • a1 = 1, a2 = 4 and in general, an = n2 .
A sequence can be considered as a function with the domain N.
   Among sequences, we have seen the following:
   • $a_1 = 1$, $a_2 = \frac{1}{2}$ and $a_n = \frac{1}{n}$.
   • $a_1 = \frac{1}{2}$, $a_2 = \frac{3}{4}$ and $a_n = 1 - \frac{1}{2^n}$.
Intuitively, the first of them gets closer and closer to 0, while the second one gets closer and
closer to 1. But what does it mean that it gets closer to a number?
    We make precise the notion that a sequence gets “arbitrarily” close to a number as follows.
Definition 35. Let $\{a_n\}$ be a sequence of real numbers. If there is L ∈ R such that for each ε > 0 there is N such that for n ≥ N it holds that $|a_n - L| < ε$, we say that $\{a_n\}$ converges to L.
   We write this situation as $\lim_{n\to\infty} a_n = L$, or simply $a_n \to L$.
Example 36. Let us see some convergent sequences.
   • $a_1 = 1$, $a_2 = \frac{1}{2}$ and $a_n = \frac{1}{n}$. We expect that this sequence converges to 0. Indeed, for any ε > 0, there is N such that $\frac{1}{N} < ε$ (the Archimedean property). Furthermore, if n > N, then $|\frac{1}{n} - 0| = \frac{1}{n} < \frac{1}{N} < ε$, therefore, with L = 0, we have that $\{a_n\}$ converges to 0.
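The choice of N in this example can be made explicit. The sketch below, with the hypothetical helper name `witness_N`, computes the smallest positive integer N with 1/N < ε using exact fractions, and then checks the inequality $|a_n - 0| < ε$ for a finite range of n > N (a finite check is only an illustration of the definition).

```python
from fractions import Fraction
from math import floor


def witness_N(eps: Fraction) -> int:
    """Smallest positive integer N with 1/N < eps (Archimedean property)."""
    return floor(1 / eps) + 1


eps = Fraction(1, 1000)  # sample tolerance, chosen for illustration
N = witness_N(eps)
# Check |1/n - 0| < eps for some indices n > N.
tail_ok = all(abs(Fraction(1, n) - 0) < eps for n in range(N + 1, N + 500))
print(N, tail_ok)
```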
   Note that
   • If $\{a_n\}$ converges to L, then it does not converge to any other number. Indeed, if x ≠ L, then take N such that $|a_n - L| < \frac{1}{2}|L - x|$ for n > N. Then by the triangle inequality $|L - x| \le |a_n - x| + |a_n - L|$, and hence $|a_n - x| \ge |x - L| - |a_n - L| > \frac{1}{2}|L - x| > 0$ for all n > N. Therefore, $\{a_n\}$ does not converge to x.
   • The sequence $a_1 = 1, a_2 = 0, a_3 = 1, \cdots, a_n = \frac{1}{2}(1 - (-1)^n)$ does not converge to any number.
   • In general, if for any x there is an Nx ∈ N such that for n > Nx it holds that |an | > x,
     then we say that {an } diverges.
Lemma 38. Let {an } be a nondecreasing sequence and bounded above. Then an converges to a
certain real number L ∈ R.
Proof. Let A = {an : n ∈ N}. As {an } is bounded above, A is bounded above. We put L = sup A.
By Lemma 10, for each ε > 0, there is N ∈ N such that $L - ε < a_N$. As $a_n$ is nondecreasing, we have $L - ε < a_n$ for all n > N. On the other hand, we have $a_n \le L$ because L = sup A. Altogether, $|a_n - L| < ε$ for such n. As ε was arbitrary, $a_n$ converges to L.
Proof.    • Assume that $a_n \to L$. Taking ε = 1, there is N such that $|a_n - L| < 1$ for n > N, hence $|a_n| < |L| + 1$ for n > N. Then, we can take a number $\tilde{L}$ such that $|a_1|, \cdots, |a_N| < \tilde{L}$ and $|L| + 1 < \tilde{L}$, so that $|a_n| < \tilde{L}$ for all n.
   • Let ε > 0 be arbitrary. There are $N_1, N_2$ ∈ N such that for $n > N_1$ (respectively $n > N_2$) it holds that $|a_n - L| < \frac{ε}{2}$ (respectively $|b_n - M| < \frac{ε}{2}$). Let N be the largest of $N_1, N_2$. Then for n > N we have
\[ |a_n + b_n - L - M| \le |a_n - L| + |b_n - M| < \frac{ε}{2} + \frac{ε}{2} = ε, \]
      hence $a_n + b_n$ converges to L + M.
      As for the product, given ε > 0, we take N such that $|a_n - L| < \frac{ε}{2(|M|+1)}$, $|b_n - M| < \frac{ε}{2(|L|+1)}$ and $|b_n| < |M| + 1$ for n > N (this can be done as in the case of sum). Then
\[ |a_n b_n - LM| \le |a_n - L|\,|b_n| + |L|\,|b_n - M| < \frac{ε}{2(|M|+1)}(|M|+1) + |L|\frac{ε}{2(|L|+1)} < ε, \]
     which shows the desired convergence.
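     We prove $\frac{1}{b_n} \to \frac{1}{M}$. If $b_n \to M$ and M ≠ 0, then $|b_n - M| < \frac{|M|}{2}$ for sufficiently large n, hence $|b_n| > \frac{|M|}{2}$ and $\left|\frac{1}{b_n} - \frac{1}{M}\right| = \frac{|M - b_n|}{|M||b_n|} < \frac{2|b_n - M|}{M^2}$, which shows $\frac{1}{b_n} \to \frac{1}{M}$. Now $\frac{a_n}{b_n} \to \frac{L}{M}$ follows from this and the product with $a_n$.
   • For any ε > 0, there is N such that for n > N it holds that $|a_n| > \frac{1}{ε}$, that is $\left|\frac{1}{a_n}\right| < ε$, hence $\frac{1}{a_n}$ converges to 0.
   We denote $a^{-n} = \frac{1}{a^n}$.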
Proof.    • If a > 1, we can write a = 1 + y where y > 0. By the binomial theorem, we have
\[ a^n = (1 + y)^n = \sum_{k=0}^{n} \binom{n}{k} 1^k y^{n-k} \ge 1 + ny, \]
     by only taking the terms k = n, n − 1. Now it is clear that for any x there is large enough n such that 1 + ny > x, therefore, $x < 1 + ny \le a^n$, that is, $a^n$ diverges.
   • If 0 < a < 1, then $\frac{1}{a} > 1$ and $(\frac{1}{a})^n$ diverges. Therefore, $a^n = (\frac{1}{a})^{-n}$ converges to 0.
   • We know that $b_n = \sum_{k=1}^{n} a^k = \frac{a(1 - a^n)}{1 - a}$, and $a^n \to 0$.
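The closed form for the partial sums of the geometric series can be compared with a direct summation. This is a small sketch in exact arithmetic; the names `partial_sum` and `closed_form` and the sample ratio a = 1/3 are assumptions for illustration. Since $a^n \to 0$ for 0 < a < 1, the closed form tends to a/(1 − a), and the last line prints how far the n-th partial sum still is from that value.

```python
from fractions import Fraction


def partial_sum(a: Fraction, n: int) -> Fraction:
    """b_n = a + a^2 + ... + a^n, summed term by term."""
    return sum((a**k for k in range(1, n + 1)), Fraction(0))


def closed_form(a: Fraction, n: int) -> Fraction:
    """a (1 - a^n) / (1 - a), valid for a != 1."""
    return a * (1 - a**n) / (1 - a)


a = Fraction(1, 3)  # sample ratio with 0 < a < 1
limit = a / (1 - a)
for n in (1, 5, 10, 20):
    assert partial_sum(a, n) == closed_form(a, n)
    print(n, float(limit - partial_sum(a, n)))
```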
   When the sequence converges, it converges to only one number. In this way, we can say that
a decimal representation a0 .a1 a2 a3 · · · defines a real number.
   Now we can prove that any repeating decimal representation gives a rational number. For example consider 0.123123123···. This can be written as
\[ 0.1 + 0.02 + 0.003 + 0.0001 + 0.00002 + 0.000003 + \cdots = \lim_{n\to\infty} \sum_{k=0}^{n} a_k 10^{-k}, \]
where $a_0 = 0$, $a_1 = 1, a_2 = 2, a_3 = 3, a_4 = 1, a_5 = 2, a_6 = 3, \cdots$. It is easy to see that this is equal to 0.123 + 0.000123 + ···, whose partial sums are $\sum_{k=1}^{n} (100a_{3k+1} + 10a_{3k+2} + a_{3k+3}) 1000^{-k}$. We know that this sum converges and compute
\[ \sum_{k=1}^{n} (100a_{3k+1} + 10a_{3k+2} + a_{3k+3}) 1000^{-k} = 123 \sum_{k=1}^{n} 1000^{-k} \to 123 \cdot \frac{1000^{-1}}{1 - 1000^{-1}} = \frac{123}{999}. \]
Proposition 42. Any real number given by a repeating decimal representation is rational.
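Proof. Indeed, let us take a repeating sequence $0 \le a_n \le 9$ as above. That is, there is m ∈ N such that $a_{n+m} = a_n$ for n ≥ 1.
   Then, for ℓ ∈ N,
\begin{align*}
\sum_{k=0}^{\ell m} a_k 10^{-k} &= a_0 + \sum_{j=1}^{\ell} 10^{-jm} \sum_{k=1}^{m} a_k 10^{m-k} \\
 &= a_0 + \Big(\sum_{k=1}^{m} a_k 10^{m-k}\Big) \frac{10^{-m}(1 - 10^{-\ell m})}{1 - 10^{-m}} \\
 &\to a_0 + \Big(\sum_{k=1}^{m} a_k 10^{m-k}\Big) \frac{10^{-m}}{1 - 10^{-m}} = a_0 + \Big(\sum_{k=1}^{m} a_k 10^{m-k}\Big) \frac{1}{10^m - 1},
\end{align*}
which is a rational number.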
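The limit in this proof gives a direct recipe for converting a repeating decimal into a fraction: the repeating block, read as an integer, is divided by $10^m - 1$. The following sketch assumes the repetition starts right after the decimal point, as in the proof; the function name `repeating_decimal` is hypothetical.

```python
from fractions import Fraction


def repeating_decimal(a0: int, block: list[int]) -> Fraction:
    """Value of a0.(block)(block)... where block = [a_1, ..., a_m]."""
    m = len(block)
    # sum_{k=1}^{m} a_k 10^(m-k): the repeating block read as an integer
    block_value = sum(d * 10 ** (m - k) for k, d in enumerate(block, start=1))
    return a0 + Fraction(block_value, 10**m - 1)


print(repeating_decimal(0, [1, 2, 3]))  # 123/999 in lowest terms
```

For 0.123123123··· this reproduces the value 123/999 computed above, which `Fraction` reduces to lowest terms.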
Theorem 43. For any real number a there is a sequence an of rational numbers such that
an → a.
Proof. Take the decimal representation of a, truncate it to the n-th digit, and call it an . Then
{an } are rational and an → a.
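As an illustration of this proof, the truncated decimal expansions of $\sqrt{2}$ can be computed exactly with integer arithmetic. This is a sketch under the assumption that `math.isqrt` (Python 3.8 or later) is available; the name `sqrt2_truncated` is hypothetical. Since `isqrt(2 * 10**(2n))` is the integer part of $\sqrt{2}\cdot 10^n$, dividing by $10^n$ gives the expansion truncated at the n-th digit, and the error is less than $10^{-n}$.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncated(n: int) -> Fraction:
    """Decimal expansion of sqrt(2) truncated after n digits, as a rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10**n)


for n in range(6):
    a_n = sqrt2_truncated(n)
    # a_n <= sqrt(2) < a_n + 10^(-n), expressed without square roots
    assert a_n**2 <= 2 < (a_n + Fraction(1, 10**n)) ** 2
    print(n, a_n, float(a_n))
```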
Continuity of functions
Let us go back to studying functions. Among functions, we saw the sign function
\[ \operatorname{sign} x := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \]
Definition 44. Let f be a function defined on S (the domain), and let a ∈ R such that there is a sequence $x_n$ ∈ S, $x_n \neq a$ such that $x_n \to a$. We write
\[ \lim_{x\to a} f(x) = L \]
if for any ε > 0 there is δ > 0 such that $|f(x) - L| < ε$ for any x ∈ S with x ≠ a, |x − a| < δ.
Example 45. Let $f(x) = \begin{cases} 1 & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$.
   • Consider a = 2. Then, for any ε > 0, we can take $δ = \frac{1}{2}$ and |f(x) − 1| = |1 − 1| = 0 for any x ∈ (2 − δ, 2 + δ) = $(\frac{3}{2}, \frac{5}{2})$. Therefore, $\lim_{x\to 2} f(x) = 1$. A similar situation holds for any a ≠ 0.
   • Consider a = 0. Then, for any x ≠ 0, f(x) = 1, hence again we have $\lim_{x\to 0} f(x) = 1$, although f(0) = 0 by definition.
   • For the function sign x (Figure 13), the limit $\lim_{x\to 0} \operatorname{sign} x$ does not exist.
    The limit makes precise the concept of “approaching a point”. The absence of “jump” can
also be formalized using limit.
Definition 46. Let f be a function defined on S (the domain), and let a ∈ S (this time a is in the domain) such that there is a sequence $x_n$ ∈ S, $x_n \neq a$ such that $x_n \to a$. We say that f is continuous at a if $\lim_{x\to a} f(x) = f(a)$. We say that f is continuous on S if it is continuous at each point in S.
Example 47.
   • The function sign x is not continuous at x = 0, because it does not have $\lim_{x\to 0} \operatorname{sign} x$.
   • The function $f(x) = \begin{cases} 1 & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$ is not continuous at x = 0, because $\lim_{x\to 0} f(x) = 1 \neq 0 = f(0)$.
   • The function f(x) = c is continuous. Indeed, let us fix a ∈ R. For any ε > 0, |f(x) − c| = |c − c| = 0 < ε, hence $\lim_{x\to a} f(x) = c = f(a)$.
Figure 20: The graph of $y = \begin{cases} 1 & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$.
   • The function f(x) = x is continuous. Indeed, let us fix a ∈ R. Then, for each ε > 0, we take δ = ε and for |h| < δ = ε it holds that |f(a + h) − a| = |a + h − a| = |h| < δ = ε, therefore, $\lim_{x\to a} f(x) = a = f(a)$.
Theorem 48. Let f, g be functions defined on S, and let a such that there is {xn } ⊂ S, xn → a.
Assume that limx→a f (x) = L and limx→a g(x) = M . Then
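Proof.
   • Let δ > 0 such that |g(x) − M| < 1 for x such that |x − a| < δ, x ≠ a. Then |g(x)| < |M| + 1.
   • For a given ε > 0, let δ > 0 such that $|f(x) - L| < \frac{ε}{2}$, $|g(x) - M| < \frac{ε}{2}$ for |x − a| < δ, x ≠ a. Then $|f(x) + g(x) - L - M| < \frac{ε}{2} + \frac{ε}{2} = ε$, which shows the desired limit.
     For the product, for a given ε > 0, let δ > 0 such that $|f(x) - L| < \frac{ε}{2(|M|+1)}$, $|g(x) - M| < \frac{ε}{2(|L|+1)}$ and |g(x)| < |M| + 1 for |x − a| < δ, x ≠ a. Then $|f(x)g(x) - LM| \le |f(x) - L|\,|g(x)| + |L|\,|g(x) - M| < \frac{ε}{2} + \frac{ε}{2} = ε$.
   • For the quotient, assume M ≠ 0. There is δ > 0 such that $|g(x) - M| < \frac{|M|}{2}$ for x ≠ a, |x − a| < δ and hence $|g(x)| > \frac{|M|}{2}$, in particular g(x) ≠ 0. Now, there is $\tilde{δ} > 0$, $\tilde{δ} < δ$ such that for x ≠ a, $|x - a| < \tilde{δ}$ it holds that $|g(x) - M| < \frac{M^2 ε}{2}$. Then
\[ \left|\frac{1}{g(x)} - \frac{1}{M}\right| = \frac{|M - g(x)|}{|M||g(x)|} < \frac{\frac{M^2 ε}{2}}{\frac{M^2}{2}} = ε, \]
     which shows that $\frac{1}{g(x)} \to \frac{1}{M}$ as x → a.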
If f, g are continuous, then $\lim_{x\to a} f(x) = f(a)$, $\lim_{x\to a} g(x) = g(a)$, hence $\lim_{x\to a} (f(x) + g(x)) = f(a) + g(a)$, $\lim_{x\to a} f(x)g(x) = f(a)g(a)$, and, if g(a) ≠ 0, $\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f(a)}{g(a)}$.
Theorem 49. Let f be a function defined on S. f is continuous at a ∈ S, that is, $\lim_{x\to a} f(x) = f(a)$, if and only if it holds that $f(x_n) \to f(a)$ for all sequences $\{x_n\}$ in S such that $x_n \to a$, $x_n \neq a$.
Proof. Assume that $\lim_{x\to a} f(x) = f(a)$. Then, for each ε > 0, there is δ > 0 such that if |x − a| < δ, then it holds that |f(x) − f(a)| < ε. Let us take any sequence $\{x_n\}$ such that $x_n \to a$. This means that, for δ above, there is N such that $|x_n - a| < δ$ for n > N. Then by the observation above, we have $|f(x_n) - f(a)| < ε$ for n > N. Therefore, for the given ε we found N such that $|f(x_n) - f(a)| < ε$ for n > N. This means that $f(x_n) \to f(a)$.
     Conversely, assume that $f(x_n) \to f(a)$ for all sequences $\{x_n\}$ such that $x_n \to a$, $x_n \neq a$. To do a proof by contradiction, let us assume that there is ε > 0 for which for all δ > 0 there is x ∈ S, x ≠ a such that |x − a| < δ but |f(x) − f(a)| ≥ ε. Let us take $δ_n = \frac{1}{n}$. For each $δ_n$ there is $x_n$ ∈ S such that $|x_n - a| < \frac{1}{n}$, $x_n \neq a$ but $|f(x_n) - f(a)| \ge ε$. Then, it is clear that $x_n \to a$, but $f(x_n)$ is not converging to f(a), which contradicts the assumption. Therefore, it must hold that $\lim_{x\to a} f(x) = f(a)$.
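The sequential criterion gives a practical way to detect a discontinuity: it suffices to exhibit one sequence $x_n \to a$, $x_n \neq a$ for which $f(x_n)$ does not approach f(a). The sketch below does this for the function of Example 45 at a = 0 with the sequence $x_n = 1/n$; the choice of sequence is an assumption made for illustration.

```python
def f(x: float) -> int:
    """The function of Example 45: 1 for x != 0 and 0 at x = 0."""
    return 1 if x != 0 else 0


xs = [1 / n for n in range(1, 11)]  # x_n = 1/n, so x_n -> 0 and x_n != 0
values = [f(x) for x in xs]         # f(x_n) = 1 for every n
print(values, f(0))                 # the values stay at 1 while f(0) = 0
```

Here $f(x_n) \to 1 \neq 0 = f(0)$, so by Theorem 49 the function is not continuous at 0, in agreement with Example 47.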
Proof. Assume the contrary, that is, x > a. Then there is N such that $|x_n - x| < \frac{x - a}{2}$ for n > N, and $x_n - a = x_n - x + x - a > |x - a| - \frac{|x - a|}{2} = \frac{|x - a|}{2} > 0$, which contradicts $x_n \le a$. Therefore, x ≤ a.
Figure 21: Left: continuity of f and a sequence $x_n \to a$. Right: the intermediate value c is taken at x.
Proof. Let c ∈ (f(a), f(b)), and we define A = {x ∈ [a, b] : f(x) < c}. A is nonempty because a ∈ A, and it is bounded above, because it is contained in [a, b], therefore, we can take x = sup A. By Lemma 10, for each n, there is $x_n$ ∈ A such that $x - \frac{1}{n} < x_n \le x$, hence $x_n \to x$. Since f is continuous, we have $f(x) = \lim_{n\to\infty} f(x_n)$. On the other hand, $x_n$ ∈ A, hence $f(x_n) < c$ and hence f(x) ≤ c by Lemma 50.
    We have x ≠ b because f(b) > c ≥ f(x). Therefore, we can take a sequence $x_n > x$, $x_n \to x$ in the interval [x, b], and then $f(x_n) \ge c$ because $x_n \notin A$. By continuity of f, we have $f(x) = \lim_n f(x_n) \ge c$. Altogether, we have f(x) = c.
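The proof above finds the point x as a supremum, which is not a computation. Numerically, a point where the value c is taken can be approximated by bisection, a different technique that relies on the same hypotheses (f continuous on [a, b] and f(a) < c < f(b)). The following is a minimal sketch; the function name `bisect_value`, the tolerance, and the sample polynomial (the one of Figure 11) with the target c = 5 are assumptions for illustration.

```python
def bisect_value(f, a: float, b: float, c: float, tol: float = 1e-12) -> float:
    """Approximate x in [a, b] with f(x) = c, assuming f(a) < c < f(b)."""
    lo, hi = a, b  # invariant: f(lo) < c <= f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poly(t: float) -> float:
    return t**5 - 2 * t**3 + 1


# poly(1) = 0 and poly(2) = 17, so the value 5 is taken somewhere in (1, 2).
x = bisect_value(poly, 1.0, 2.0, 5.0)
print(x, poly(x))
```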
Theorem 52. In the situation above, if f and g are continuous, then g ◦ f is continuous as well.
Proof. We take a ∈ S. Given ε > 0, there is $δ_1 > 0$ such that |g(y) − g(f(a))| < ε if $|y - f(a)| < δ_1$, by continuity of g. For this $δ_1$, there is $δ_2 > 0$ such that $|f(x) - f(a)| < δ_1$ if $|x - a| < δ_2$. Altogether, we have |g(f(x)) − g(f(a))| < ε if $|x - a| < δ_2$, hence we have proved the continuity of g ◦ f.
Figure 22: The continuity of the inverse function. For a given ε, we can take δ.
Theorem 55. Let f be a monotonically increasing continuous function on an interval [a, b]. Then
the inverse function f −1 defined on [f (a), f (b)] is monotonically increasing and continuous.
Proof. Note that the domain of f −1 is [f (a), f (b)] by the intermediate value theorem (continuity
of f is needed here).
    Let us first show that $f^{-1}$ is monotonically increasing. For each $y_1 < y_2$, $y_1, y_2$ ∈ [f(a), f(b)], there are $x_1, x_2$ ∈ [a, b] such that $y_1 = f(x_1)$ and $y_2 = f(x_2)$ by the intermediate value theorem and we have $x_1 < x_2$ by monotonicity of f. This means that $f^{-1}(y_1) = x_1 < x_2 = f^{-1}(y_2)$, that is, $f^{-1}$ is monotonically increasing.
    Let $x_0$ ∈ (a, b). For a given ε > 0, we take δ as the smaller of $f(x_0 + ε) - f(x_0)$ and $f(x_0) - f(x_0 - ε)$ (if $x_0 ± ε$ are not in [a, b], replace them by a or b). Then for any y ∈ $(f(x_0) - δ, f(x_0) + δ)$, we have $f^{-1}(y)$ ∈ [a, b] ∩ $(x_0 - ε, x_0 + ε)$ by monotonicity of f. This is the continuity of $f^{-1}$ at $f(x_0)$.
    If $x_0 = a$ or b, then we only have to consider one side.
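The choice of δ in this proof can be tried on a concrete function. The sketch below takes δ as the smaller of the two gaps $f(x_0 + ε) - f(x_0)$ and $f(x_0) - f(x_0 - ε)$ for $f(x) = x^2$ on [0, 2], whose inverse is the square root, and samples points y with $|y - f(x_0)| < δ$. The helper name `delta_for` and the values $x_0 = 1$, ε = 0.1 are assumptions for illustration.

```python
from math import sqrt


def delta_for(f, x0: float, eps: float) -> float:
    """Smaller of f(x0 + eps) - f(x0) and f(x0) - f(x0 - eps)."""
    return min(f(x0 + eps) - f(x0), f(x0) - f(x0 - eps))


def square(x: float) -> float:
    return x * x  # increasing and continuous on [0, 2]


x0, eps = 1.0, 0.1
delta = delta_for(square, x0, eps)
# Sample y strictly inside (f(x0) - delta, f(x0) + delta).
ys = [square(x0) - delta + 2 * delta * i / 1000 for i in range(1, 1000)]
ok = all(abs(sqrt(y) - x0) < eps for y in ys)
print(delta, ok)
```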
Furthermore, let m ∈ N. Then for $y = (x^{mp})^{\frac{1}{mq}}$ we have $(y^q)^m = y^{mq} = x^{mp} = (x^p)^m$, and hence $y^q = x^p$ and $y = (x^p)^{\frac{1}{q}}$.
    Therefore, we can write $y = x^{\frac{p}{q}}$ and no confusion arises.
   We would like to define $a^x$ by $\lim_{n\to\infty} a^{x_n}$, where $x_n$ ∈ Q and $x_n \to x$ ∈ R. For this purpose, we need some properties of sequences.
Lemma 57. If an ≤ bn and an → L, bn → M , then L ≤ M .
Proof. Consider bn − an ≥ 0. By Lemma 50, bn − an → M − L ≥ 0, hence M ≥ L.
    For a statement like “there is N such that for n > N ...” we say simply that “for sufficiently
large n...”.
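     and hence $\frac{a^n}{n} > \frac{(n-1)y^2}{2}$. As $\frac{(n-1)y^2}{2} \to \infty$, it also holds that $\frac{a^n}{n} \to \infty$.
     For a general p ∈ N, we take $a^{\frac{1}{p}}$, then $1 < a^{\frac{1}{p}}$ and $\frac{(a^{1/p})^n}{n} \to \infty$, hence $\frac{a^n}{n^p} = \left(\frac{(a^{1/p})^n}{n}\right)^p \to \infty$.
   • Let ε > 0. We need to prove that $n^{\frac{1}{n}} < 1 + ε$ for sufficiently large n. Equivalently, $n < (1 + ε)^n$. This follows from the previous point that $\frac{(1+ε)^n}{n} \to \infty$, in particular, $\frac{(1+ε)^n}{n} > 1$ for sufficiently large n.
   • $1 < a^{\frac{1}{n}} < n^{\frac{1}{n}}$ for a < n, therefore the claim follows from Theorem 58.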
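The two limits $n^{1/n} \to 1$ and $a^{1/n} \to 1$ can be observed numerically. This is a rough floating-point sketch (so the printed values carry rounding error); the helper name `nth_root_of_n` and the base a = 5 are assumptions for illustration.

```python
def nth_root_of_n(n: int) -> float:
    """n^(1/n), computed in floating point."""
    return n ** (1.0 / n)


a = 5.0  # an arbitrary base a > 1, chosen only for illustration
for n in (10, 100, 10_000, 1_000_000):
    # For a < n the squeeze 1 < a^(1/n) < n^(1/n) is visible in each row.
    print(n, a ** (1.0 / n), nth_root_of_n(n))
```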
Definition 60. A sequence $a_n$ is said to be a Cauchy sequence if for each ε > 0 there is N ∈ N such that $|a_m - a_n| < ε$ for m, n > N.
    Differently from the convergence to a number L, this says that two elements in the sequence
are close to each other for large enough m, n.
Lemma 61. A sequence an is convergent if and only if it is a Cauchy sequence.
Proof. If $a_n \to L$, then for any ε > 0 we can take N such that $|a_n - L| < \frac{ε}{2}$ for n > N, therefore, if n, m > N, then $|a_m - L| < \frac{ε}{2}$ as well and hence $|a_m - a_n| \le |a_m - L| + |L - a_n| < ε$.
     Conversely, if $a_n$ is Cauchy, then it is bounded. Indeed, we take N such that $|a_m - a_{N+1}| < 1$ for m > N, then this means that $|a_m| < |a_{N+1}| + 1$. Then we can take the largest number among $|a_1|, \cdots, |a_N|, |a_{N+1}| + 1$ as a bound. Next, we consider the sequence $b_n = \inf\{a_k : k \ge n\}$. This is well-defined because $\{a_k : k \ge n\}$ is bounded. Furthermore, this sequence is nondecreasing because $\{a_k : k \ge n + 1\} \subset \{a_k : k \ge n\}$. Therefore, $b_n$ converges to some number L. Similarly, with $c_n = \sup\{a_k : k \ge n\}$, this is bounded and nonincreasing, hence converges to M.
   Note that $b_n \le a_n \le c_n$, therefore, L ≤ M. Actually, we have L = M. Indeed, for given ε > 0, we can find sufficiently large ℓ, m, n such that $|c_n - M| < \frac{ε}{5}$, $|a_\ell - c_n| < \frac{ε}{5}$, $|b_n - L| < \frac{ε}{5}$, $|a_m - b_n| < \frac{ε}{5}$ and $|a_\ell - a_m| < \frac{ε}{5}$. This implies that |M − L| < ε for arbitrary ε > 0, hence it must hold M = L.
   Now, as $b_n, c_n \to L = M$ and $b_n \le a_n \le c_n$, we have $a_n \to L$ by Theorem 58.
   As we said in the proof, for an arbitrary real number x ∈ R, we define the exponential function by
\[ a^x := \lim_{n\to\infty} a^{x_n}, \quad \text{where } x_n \in Q,\ x_n \to x. \]
The exponential functions appear in various natural phenomena. It happens typically when
we consider a collection of objects that increase or decrease independently (such as colonies of
bacteria, radioactive nuclei, and so on). See Figure 24.
    • $a^x a^y = a^{x+y}$.
    • $(a^x)^y = a^{xy}$.
Proof.     • Let x < y. Then we take sequences $x_n \to x$, $y_n \to y$, where $x_n, y_n$ ∈ Q. Then for sufficiently large n we have $x_n < z_1 < z_2 < y_n$ where $z_1, z_2$ ∈ Q, and therefore, $a^x \le a^{z_1} < a^{z_2} \le a^y$.
      As for continuity, let us take $x, x_n$ ∈ R and $x_n \to x$. Then there is $y_n$ ∈ Q such that $|a^{x_n} - a^{y_n}| < \frac{1}{n}$ and $|x_n - y_n| < \frac{1}{n}$. Then $y_n \to x$ as well, hence $a^{y_n} \to a^x$, while $a^{y_n} - \frac{1}{n} < a^{x_n} < a^{y_n} + \frac{1}{n}$, therefore, $a^{x_n} \to a^x$.
    • Take sequences $x_n \to x$, $y_n \to y$, $x_n, y_n$ ∈ Q. We have $a^{x_n} a^{y_n} = a^{x_n + y_n}$, and $x_n + y_n \to x + y$, therefore, $a^x a^y = a^{x+y}$.
Figure: the graphs of $y = 3^x$, $y = 2^x$ and $y = 0.7^x$.
      • Take sequences $x_n \to x$, $y_n \to y$, $x_n, y_n$ ∈ Q. For fixed m, we have $(a^{x_n})^{y_m} \to (a^x)^{y_m}$ and this is equal to $a^{x_n y_m} \to a^{x y_m}$. Now we take the limit m → ∞ and obtain $(a^x)^y = a^{xy}$ by continuity.
Napier’s number
Let us introduce Napier’s number. We take
\[ e_n = \left(1 + \frac{1}{n}\right)^n, \qquad E_n = \left(1 + \frac{1}{n}\right)^{n+1} = \left(1 + \frac{1}{n}\right) e_n. \]
because 1 + 2n + nx ≥ 0. This completes the induction for even and odd numbers.
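   • $e_n$ is monotonically increasing, that is, $e_{n-1} \le e_n$. Indeed,
\begin{align*}
\frac{e_n}{e_{n-1}} &= \frac{(1 + \frac{1}{n})^n}{(1 + \frac{1}{n-1})^{n-1}} = \frac{(1 + \frac{1}{n})^n}{(\frac{n}{n-1})^{n-1}} = \left(1 + \frac{1}{n}\right)^n \cdot \left(\frac{n-1}{n}\right)^{n-1} \\
 &= \frac{(1 + \frac{1}{n})^n \cdot (\frac{n-1}{n})^n}{\frac{n-1}{n}} = \frac{(1 + \frac{1}{n})^n \cdot (1 - \frac{1}{n})^n}{\frac{n-1}{n}} \\
 &= \frac{(1 - \frac{1}{n^2})^n}{\frac{n-1}{n}} \ge \frac{1 - \frac{1}{n}}{1 - \frac{1}{n}} = 1,
\end{align*}
     where the inequality uses $(1 + x)^n \ge 1 + nx$ with $x = -\frac{1}{n^2}$.
   • Similarly, $E_n$ is monotonically decreasing, that is, $E_n < E_{n-1}$. Indeed,
\begin{align*}
\frac{E_n}{E_{n-1}} &= \frac{(1 + \frac{1}{n})^{n+1}}{(1 + \frac{1}{n-1})^n} = \frac{1 + \frac{1}{n}}{(\frac{n}{n-1})^n (\frac{n}{n+1})^n} = \frac{1 + \frac{1}{n}}{(\frac{n^2}{n^2-1})^n} \\
 &= \frac{1 + \frac{1}{n}}{(1 + \frac{1}{n^2-1})^n} \le \frac{1 + \frac{1}{n}}{1 + \frac{n}{n^2-1}} < \frac{1 + \frac{1}{n}}{1 + \frac{1}{n}} = 1.
\end{align*}
   • Now we have that $\{e_n\}$ and $\{E_n\}$ are convergent. Note also that $E_n - e_n = e_n(1 + \frac{1}{n} - 1) = e_n \cdot \frac{1}{n} \to 0$, because $e_n$ is bounded, say by M, and $\frac{1}{n} \to 0$, therefore, $E_n - e_n \le \frac{M}{n} \to 0$. Hence the two sequences have the same limit.
   We call this limit e, Napier’s number (sometimes Euler’s number). The function $e^x$ plays a special role in analysis, as we will see below.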
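The two sequences $e_n = (1 + \frac{1}{n})^n$ and $E_n = (1 + \frac{1}{n})^{n+1}$ can be tabulated to see how they enclose their common limit from below and from above. The sketch below uses exact fractions for the monotonicity checks and converts to floating point only for printing; the function names `e_lower` and `E_upper` are hypothetical.

```python
from fractions import Fraction


def e_lower(n: int) -> Fraction:
    """e_n = (1 + 1/n)^n."""
    return (1 + Fraction(1, n)) ** n


def E_upper(n: int) -> Fraction:
    """E_n = (1 + 1/n)^(n+1)."""
    return (1 + Fraction(1, n)) ** (n + 1)


# Exact checks of monotonicity on a finite range (illustration only).
increasing = all(e_lower(n) < e_lower(n + 1) for n in range(1, 60))
decreasing = all(E_upper(n + 1) < E_upper(n) for n in range(1, 60))
print(increasing, decreasing)

for n in (1, 10, 100, 1000):
    print(n, float(e_lower(n)), float(E_upper(n)))
```

The gap $E_n - e_n = e_n / n$ shrinks only like 1/n, so this is a slow way to compute e; it is shown here because it mirrors the definition.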
Logarithm
Let $a > 0$, $a \ne 1$. We have defined the exponential function $f(x) = a^x$, and we have seen that it
is continuous, and monotonically increasing if $a > 1$. If $0 < a < 1$, it is monotonically decreasing.
    Let $a > 1$. We know that $a^n$ diverges, and hence $a^{-n} \to 0$. By the intermediate value
theorem, we see that the range of $a^x$ is $\mathbb{R}_+$. Now we can define the inverse function (everything
is analogous for $0 < a < 1$).
Definition 66. The logarithm base a of x, denoted $\log_a x$, is the inverse function of $f(y) = a^y$: $\log_a : \mathbb{R}_+ \to \mathbb{R}$, and it holds that
\[
\log_a a^x = x = a^{\log_a x}.
\]
We denote $\log x = \log_e x = \ln x$.
    We say that $\lim_{x\to\infty} f(x) = \infty$ if for each $Y > 0$ there is $X > 0$ such that if $x > X$ then
$f(x) > Y$. Similarly, we define $\lim_{x\to\pm\infty} f(x) = \pm\infty$.
[Figure: the graphs of $y = e^x$ and $y = \log x$.]
(vi) Let $a > 1$. Then $f(x) = \log_a x$ is monotonically increasing and continuous. $\log_a x > 0$ if
     and only if $x > 1$.
(vii) Let $a > 1$, $\alpha > 0$. Then $\lim_{x\to+\infty} \frac{x^\alpha}{\log_a x} = +\infty$.
Proof. (i) $a^1 = a$, $a^0 = 1$.
    The logarithm is extremely useful in natural science. When we have data that grow exponentially,
we can take the log of the values and plot them in a plane; then they lie on a straight line.
The exponent can be read from the slope of the line (this is called the logarithmic scale). In that
case, the logarithm base 10 is often used.
    When $y = x^p$, we can consider $z = \log y$, $w = \log x$, hence $e^z = y$, $e^w = x$. We have
$e^z = y = x^p = (e^w)^p = e^{wp}$. By taking the log of both sides, we obtain $z = pw$. That is, in the
log-log plot, a power relation $y = x^p$ is translated into a linear relation $z = pw$.
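As an illustration of reading the exponent from a log-log plot, the sketch below fits a straight line to the points $(w, z) = (\log x, \log y)$ by ordinary least squares and returns its slope. The helper name `loglog_slope` and the sample data are assumptions made for this example only.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of z = log(y) against w = log(x)."""
    ws = [math.log(x) for x in xs]
    zs = [math.log(y) for y in ys]
    w_mean = sum(ws) / len(ws)
    z_mean = sum(zs) / len(zs)
    num = sum((w - w_mean) * (z - z_mean) for w, z in zip(ws, zs))
    den = sum((w - w_mean) ** 2 for w in ws)
    return num / den

# Hypothetical data following the power relation y = x^p with p = 3.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
p = 3
slope = loglog_slope(xs, [x ** p for x in xs])
print(slope)  # expected to be close to p, up to rounding
```

With real measurements the points only lie approximately on a line, and the fitted slope is then an estimate of the exponent.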
     Let $L \in \mathbb{R}$, and let f be a function defined on $(a, \infty)$. If for each $\varepsilon > 0$ there is X such that
$|f(x) - L| < \varepsilon$ for $x > X$, then we write $\lim_{x\to\infty} f(x) = L$.
Example 70. $\lim_{x\to\infty} \frac1x = 0$. $\lim_{x\to\infty} \frac{x}{x-1} = 1$.
   Let f(x) be defined on (a, b) and $L \in \mathbb{R}$. If for each $\varepsilon > 0$ there is $\delta > 0$ such that $|f(x) - L| < \varepsilon$ for
$x \in (a, a + \delta)$, we write $\lim_{x\to a^+} f(x) = L$, and we call L the right limit of f at a. Similarly,
we write $\lim_{x\to b^-} f(x)$ for the left limit.
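[Figure: the graphs of $y = 3^x$ and $y = 2^x$, and in logarithmic scale the straight lines $y = \log(3^x) = \log 3 \cdot x$ and $y = \log(2^x) = \log 2 \cdot x$.]
[Figure: the power relations $y = x^2$ and $y = x^{1/3}$, and in the log-log plot the lines $z = 2w$ and $z = \frac{w}{3}$.]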
Example 71. Let f (x) = sign x. limx→0+ f (x) = 1, limx→0− f (x) = −1.
    If f (x) is defined on (b, a) ∪ (a, c), limx→a f (x) = L exists if and only if both the left and
right limits exist and limx→a+ f (x) = limx→a− f (x) and it is L. We leave the proof to the reader.
Proof. The first statement can be proven similarly to the continuity of the composed function
f (g(x)).
    As for the second point, for a given $\varepsilon$ we take Y such that $|f(y) - L| < \varepsilon$ for $y > Y$. Then,
there is X such that $g(x) > Y$ for $x > X$. Altogether, $|f(g(x)) - L| < \varepsilon$ if $x > X$.
    We call this the change of variables, in the sense that we can calculate limy→y0 f (y) by
calculating limx→x0 f (g(x)) and vice versa.
    For $x \in \mathbb{R}$, we denote by $[x]$ the largest integer n such that $n \le x$, and call it the integer
part of x. For example, $[\sqrt{2}] = 1$, $[\pi] = 3$, and so on.
    In this Proposition, n ∈ N and x ∈ R.
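     Note that $\frac1x$ is continuous at $x = 1, e$, and hence $\left(1 + \frac{1}{n-1}\right)^{-1} \to 1$ and $\frac{1}{\left(1+\frac{1}{n-1}\right)^{n-1}} \to \frac1e$.
     Altogether, $\lim_{n\to\infty}\left(1 - \frac1n\right)^n = \frac1e = e^{-1}$.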
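 (ii) $\lim_{n\to\infty} \left(1 + \frac{1}{n^2}\right)^n = \lim_{n\to\infty} \left(\left(1 + \frac{1}{n^2}\right)^{n^2}\right)^{\frac1n}$. As $\lim_{n\to\infty} \left(1 + \frac{1}{n^2}\right)^{n^2} = e$, this sequence is
     bounded by, say, M. Then $1 < \left(1 + \frac{1}{n^2}\right)^n < M^{\frac1n}$, but $M^{\frac1n} \to 1$, so by squeezing we have
     $\left(1 + \frac{1}{n^2}\right)^n \to 1$.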
(iii) Note that, if $a_n \to a$, then $b_n = a_{n+1} \to a$ as well. Furthermore, if $a < b < c$ and if
      $|a - x| < \varepsilon$, $|c - x| < \varepsilon$, then we have $-\varepsilon < a - x < \varepsilon$, hence
      $a - \varepsilon < x < a + \varepsilon$. Similarly, $c - \varepsilon < x < c + \varepsilon$, and therefore, $b - \varepsilon < x < b + \varepsilon$ and hence
      $|b - x| < \varepsilon$.
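     We know that $\lim_{n\to\infty} \left(1 + \frac1n\right)^n = \lim_{n\to\infty} \left(1 + \frac{1}{n+1}\right)^{n+1} = e$. Let $n = [x]$, then $n \le x < n + 1$ and
\[
\frac{\left(1 + \frac{1}{n+1}\right)^{n+1}}{1 + \frac{1}{n+1}}
 < \left(1 + \frac1x\right)^x
 < \left(1 + \frac1n\right)^{n+1}
 = \left(1 + \frac1n\right)^n \cdot \left(1 + \frac1n\right).
\]
     Note that the left-hand side and the right-hand side tend to e, because $1 + \frac{1}{n+1} \to 1$ and $1 + \frac1n \to 1$. This means that, for a given $\varepsilon$,
\[
\left| \frac{\left(1 + \frac{1}{n+1}\right)^{n+1}}{1 + \frac{1}{n+1}} - e \right| < \varepsilon,
\qquad
\left| \left(1 + \frac1n\right)^n \cdot \left(1 + \frac1n\right) - e \right| < \varepsilon
\]
     for sufficiently large n. This implies that $\left| \left(1 + \frac1x\right)^x - e \right| < \varepsilon$.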
(iv) By the previous point and the change of variables $x \mapsto \frac1x$ (note that $\frac1x > 0$), we have $\lim_{x\to 0^+} (1 + x)^{\frac1x} = e$.
     We have $\lim_{x\to 0^-} (1 + x)^{\frac1x} = e$ as well. So we have checked both the right and left limits.
\[
\lim_{x\to 0} \frac{\log(1 + x)}{x} = \lim_{x\to 0} \log (1 + x)^{\frac1x} = \log \lim_{x\to 0} (1 + x)^{\frac1x} = \log e = 1,
\]
     where we used $\lim_{x\to 0} (1 + x)^{\frac1x} = e$.
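(vi) Note that $\lim_{x\to\infty} \left(1 + \frac{t}{x}\right)^x = \lim_{x\to\infty} \left(\left(1 + \frac{t}{x}\right)^{\frac{x}{t}}\right)^t = e^t$, where we used the continuity of
     $g(a) = a^t$.
(vii) With $y = e^x - 1$, we have $\log(y + 1) = x$ and $\lim_{x\to 0} \frac{e^x - 1}{x} = \lim_{y\to 0} \frac{y}{\log(1+y)} = 1$.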
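The limits in this proof can be sanity-checked numerically. The sketch below is only an illustration: the function names are chosen here, and `math.log1p` and `math.expm1` are used instead of `math.log(1 + x)` and `math.exp(x) - 1` to limit rounding error for small x.

```python
import math

def power_limit(x: float, t: float = 1.0) -> float:
    """(1 + t/x)^x, which should approach e^t as x grows."""
    return (1 + t / x) ** x

def log_ratio(x: float) -> float:
    """log(1 + x) / x, which should approach 1 as x -> 0."""
    return math.log1p(x) / x

def exp_ratio(x: float) -> float:
    """(e^x - 1) / x, which should approach 1 as x -> 0."""
    return math.expm1(x) / x

for x in (10.0, 1e3, 1e6):
    print(x, power_limit(x), power_limit(x, t=2.0))
for x in (0.1, -0.1, 1e-4, -1e-4):
    print(x, log_ratio(x), exp_ratio(x))
```

Both signs of x are sampled in the second loop because the limits at 0 are two-sided.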
Definition 74.   • $\sinh x = \frac{e^x - e^{-x}}{2}$
   • $\cosh x = \frac{e^x + e^{-x}}{2}$
   • $\tanh x = \frac{\sinh x}{\cosh x}$
(ii) analogous.
(iii) analogous.
Proposition 76.      (i) $\sinh^{-1}(x) = \log(x + \sqrt{x^2 + 1})$.
 (ii) $\cosh^{-1}(x) = \log(x + \sqrt{x^2 - 1})$ for $x > 1$.
[Figure: the graphs of $y = \cosh x$, $y = \sinh x$ and $y = \tanh x$.]
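Proof.    (i)
\begin{align*}
\sinh(\log(x + \sqrt{x^2 + 1}))
 &= \frac12 \left( (x + \sqrt{x^2 + 1}) - \frac{1}{x + \sqrt{x^2 + 1}} \right)
  = \frac12 \cdot \frac{(x + \sqrt{x^2 + 1})^2 - 1}{x + \sqrt{x^2 + 1}} \\
 &= \frac12 \cdot \frac{x^2 + 2x\sqrt{x^2 + 1} + x^2 + 1 - 1}{x + \sqrt{x^2 + 1}} = x.
\end{align*}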
(ii) analogous.
Definition 77.    (i) $\operatorname{Arcsinh} x = \sinh^{-1}(x) = \log(x + \sqrt{x^2 + 1})$.
 (ii) $\operatorname{Arccosh} x = \cosh^{-1}(x) = \log(x + \sqrt{x^2 - 1})$ for $x > 1$.
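The logarithmic formulas for the inverse hyperbolic functions can be checked against the definitions of sinh and cosh. This is a small sketch; the names `arcsinh` and `arccosh` are local to the example, while `math.sinh`, `math.cosh`, `math.asinh` and `math.acosh` are the standard library functions.

```python
import math

def arcsinh(x: float) -> float:
    """log(x + sqrt(x^2 + 1)), defined for every real x."""
    return math.log(x + math.sqrt(x * x + 1))

def arccosh(x: float) -> float:
    """log(x + sqrt(x^2 - 1)), used here only for x > 1."""
    return math.log(x + math.sqrt(x * x - 1))

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, math.sinh(arcsinh(x)), arcsinh(x) - math.asinh(x))
for x in (1.5, 2.0, 10.0):
    print(x, math.cosh(arccosh(x)), arccosh(x) - math.acosh(x))
```

The second column should reproduce x and the third should be close to zero, up to rounding.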
• We use the figures and the elementary geometry to derive their elementary properties.
   • Then we study their analytic aspects: limit, derivative, integral, Taylor expansion, and so
     on.
    Now, to obtain cos θ and sin θ, we draw the unit circle, and take the point p on the unit circle
such that the x-axis and the segment from the origin to p make an angle of θ degrees
going anticlockwise, 0 ≤ θ ≤ 90 (degrees). Then cos θ is defined to be the x-coordinate of the
point p, and sin θ is defined to be the y-coordinate of p.
    We can make a right triangle by drawing the vertical line from this point. If 0 ≤ θ ≤ 90
(degrees), then cos θ is the length of the horizontal side of the triangle, while sin θ is
the length of the vertical side. When θ exceeds 90 (degrees), cos θ becomes negative.
    There are various ways to represent the angle. Often we use degrees, which divide the
circle into 360 degrees. Another unit is the radian, which defines the angle by the length of
the arc on the unit circle. In radians, we have 360 (degrees) = 2π (radian), 180 (degrees) = π
(radian), 90 (degrees) = π/2 (radian), 45 (degrees) = π/4 (radian) and so on. In this lecture, from
this point we use radians, unless otherwise specified.
    Some important values:
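   • $\sin 0 = 0$, $\cos 0 = 1$.
   • $\sin \frac{\pi}{6} = \frac12$, $\cos \frac{\pi}{6} = \frac{\sqrt{3}}{2}$.
   • $\sin \frac{\pi}{4} = \frac{1}{\sqrt{2}}$, $\cos \frac{\pi}{4} = \frac{1}{\sqrt{2}}$.
   • $\sin \frac{\pi}{3} = \frac{\sqrt{3}}{2}$, $\cos \frac{\pi}{3} = \frac12$.
   • $\sin \frac{\pi}{2} = 1$, $\cos \frac{\pi}{2} = 0$.
Figure 29: The trigonometric functions and their values for general angle θ. The point on the unit circle at angle θ is (cos θ, sin θ).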
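The special values listed above, and the conversion between degrees and radians, can be compared with the standard library. This is a minimal sketch; the table `special_values` and the helper `max_error` are names introduced for this example.

```python
import math

# (angle in radians, expected sin, expected cos)
special_values = [
    (0.0, 0.0, 1.0),
    (math.pi / 6, 1 / 2, math.sqrt(3) / 2),
    (math.pi / 4, 1 / math.sqrt(2), 1 / math.sqrt(2)),
    (math.pi / 3, math.sqrt(3) / 2, 1 / 2),
    (math.pi / 2, 1.0, 0.0),
]

def max_error() -> float:
    """Largest deviation between the table and math.sin / math.cos."""
    return max(
        max(abs(math.sin(a) - s), abs(math.cos(a) - c))
        for a, s, c in special_values
    )

print(max_error())
print(math.radians(180), math.pi)  # 180 degrees corresponds to pi radians
```

The deviation is not exactly zero because π is only approximated in floating point.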
    We can extend cos θ and sin θ to all real numbers, considering that for θ > 2π we go around the
circle more than once, and for θ < 0 we go around the circle clockwise. With this understanding,
we have
• cos(−θ) = cos θ
• sin(−θ) = − sin θ.
    In this way, we can consider cos and sin as functions on R. They are continuous, because
if we change the angle slightly, the point p moves only slightly (we do not prove this, as we
introduce these functions only by geometry, without defining the arc length).
    They are related by the formulas $\cos(\theta + \frac{\pi}{2}) = -\sin\theta$ and $\sin(\theta + \frac{\pi}{2}) = \cos\theta$ (see Figure 30).
    We introduce also $\tan\theta = \frac{\sin\theta}{\cos\theta}$.
Some formulas
We often write $\cos^2\theta = (\cos\theta)^2$, $\sin^2\theta = (\sin\theta)^2$, $\cos^3\theta = (\cos\theta)^3$, $\sin^3\theta = (\sin\theta)^3$, etc.
   • $\cos^2\theta + \sin^2\theta = 1$. This is because of the Pythagorean theorem: cos θ and sin θ are the
     lengths of the horizontal and vertical sides of the right triangle, while the length of the
     longest side is 1.
   • $\cos 2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta$. Indeed, $\cos 2\theta = \cos\theta\cos\theta - \sin\theta\sin\theta = \cos^2\theta - \sin^2\theta$
     and use $\cos^2\theta + \sin^2\theta = 1$.
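[Figure: the angles θ and $\theta + \frac{\pi}{2}$ on the unit circle, with the corresponding points (cos θ, sin θ) and (− sin θ, cos θ).]
[Figure: the graphs of y = cos θ, y = sin θ and y = tan θ.]
Figure 32: The formula sin(α + β) = sin α cos β + cos α sin β. A = cos β sin α, B = sin β cos α
and A + B = sin(α + β).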
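   • $\sin\alpha\cos\beta = \frac12(\sin(\alpha+\beta) + \sin(\alpha-\beta))$. Indeed,
\begin{align*}
\frac12(\sin(\alpha+\beta) + \sin(\alpha-\beta))
 &= \frac12((\sin\alpha\cos\beta + \cos\alpha\sin\beta) + (\sin\alpha\cos(-\beta) + \cos\alpha\sin(-\beta))) \\
 &= \frac12(\sin\alpha\cos\beta + \cos\alpha\sin\beta + \sin\alpha\cos\beta - \cos\alpha\sin\beta) \\
 &= \sin\alpha\cos\beta.
\end{align*}
   • $\cos\alpha\sin\beta = \frac12(\sin(\alpha+\beta) - \sin(\alpha-\beta))$.
   • $\cos\alpha\cos\beta = \frac12(\cos(\alpha+\beta) + \cos(\alpha-\beta))$.
   • $\sin\alpha\sin\beta = \frac12(-\cos(\alpha+\beta) + \cos(\alpha-\beta))$.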
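The four product formulas can be checked numerically at arbitrary angles. The sketch below is only a sanity check under floating-point arithmetic, not a proof; the function name `product_to_sum_errors` is introduced here for illustration.

```python
import math

def product_to_sum_errors(alpha: float, beta: float) -> list:
    """Left side minus right side for each of the four product formulas."""
    s, c = math.sin, math.cos
    return [
        s(alpha) * c(beta) - 0.5 * (s(alpha + beta) + s(alpha - beta)),
        c(alpha) * s(beta) - 0.5 * (s(alpha + beta) - s(alpha - beta)),
        c(alpha) * c(beta) - 0.5 * (c(alpha + beta) + c(alpha - beta)),
        s(alpha) * s(beta) - 0.5 * (-c(alpha + beta) + c(alpha - beta)),
    ]

# A few sample angles in radians, chosen arbitrarily.
for alpha, beta in [(0.3, 1.1), (2.0, -0.7), (5.5, 4.2)]:
    print(alpha, beta, product_to_sum_errors(alpha, beta))
```

All four differences should be zero up to rounding.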
Some limit
By comparing the areas of the triangles and of the sector, we see $\frac12\cos\theta\sin\theta < \frac{\theta}{2} < \frac12\frac{\sin\theta}{\cos\theta}$ (see
Figure 33), and hence $\cos\theta < \frac{\theta}{\sin\theta} < \frac{1}{\cos\theta}$. As we assumed that sin and cos are continuous, and
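Figure 33: By comparing the areas of the triangles and the sector, we see $\frac12\cos\theta\sin\theta < \frac{\theta}{2} < \frac12\frac{\sin\theta}{\cos\theta}$. The outer triangle has its vertex at the point $(1, \frac{\sin\theta}{\cos\theta})$.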
Figure 34: Open and closed intervals. An open set includes a small “neighborhood” of any point
in it, but a sequence in it might converge to a point outside. A closed subset contains the limit
of any sequence in it, but a point might “touch” other points outside.
Example 79.       • Consider the open interval A = (0, 1). This is open, because for any point
    $p \le \frac12$ we can take $\varepsilon = \frac{p}{2}$ and $(\frac{p}{2}, \frac{3p}{2}) \subset (0, 1)$. If $p > \frac12$, we can take $\varepsilon = \frac{1-p}{2}$. On the other
    hand, (0, 1) is not closed. Indeed, the sequence $a_n = \frac1n$, $n \ge 2$, belongs to A = (0, 1), but the limit 0 does not belong to A.
   • Consider the closed interval B = [0, 1]. This is closed. Indeed, for any convergent sequence
     $\{a_n\} \subset B$, $a_n \to a$, it holds that $0 \le a_n \le 1$ and hence $0 \le a \le 1$. On the other hand, for
     p = 0, for any $\varepsilon > 0$, $(-\varepsilon, \varepsilon) \not\subset B$, therefore, B is not open.
   Therefore, the terminology “open” and “closed” for intervals is consistent with that for
general sets we have just introduced.
Proof. Let O be open and assume that $O^c$ is not closed. That is, there is a sequence $\{a_n\} \subset O^c$
that converges to a, but $a \notin O^c$. Therefore, it must hold that $a \in O$. But we can take $\varepsilon > 0$ such
that $(a - \varepsilon, a + \varepsilon) \subset O$, and if $a_n \to a$, it would have to hold that $a_n \in (a - \varepsilon, a + \varepsilon) \subset O$ for sufficiently large n, which
contradicts the assumption that $\{a_n\} \subset O^c$. Therefore, $O^c$ is closed.
   Conversely, let $O^c$ be closed, and assume that O is not open. As O is not open, there is
$a \in O$ such that for any $\frac1n > 0$ there is $a_n$ such that $|a_n - a| < \frac1n$, but $a_n \notin O$. Hence $a_n \in O^c$.
But with this condition $a_n \to a$, which contradicts the assumption that $O^c$ is closed. Therefore,
O must be open.
Figure 35: Any point in an open set is “protected” from outside. On the other hand, if a set is
not open, there is a point which is not “protected”.
Figure 36: Nested intervals. As the sequence $\{a_n\}$ contains infinitely many points, one of two
intervals must contain infinitely many of them.
    It is not difficult to prove that any union (even if infinite!) of open sets is again open.
Similarly, any intersection of closed sets is again closed.
    Let us recall that a sequence $\{a_n\}$ is called Cauchy if for any given $\varepsilon > 0$ there is N such
that for m, n > N it holds that $|a_m - a_n| < \varepsilon$.
    Furthermore, we said that $b_n$ is a subsequence of $a_n$ if there is a growing sequence $N_n \in \mathbb{N}$ such
that $b_n = a_{N_n}$, that is, $b_n$ is obtained by skipping some elements in $a_n$. Recall that we consider
infinite sequences, that is, the sequence does not stop at any $a_n$, but continues infinitely.
Proof. As $\{a_n\}$ is bounded, we can find M sufficiently large such that $a_n \in [-M, M]$. As
the sequence $\{a_n\}$ has infinitely many elements, one of the intervals $[-M, 0]$, $(0, M]$ must contain
infinitely many of them. Therefore, we can take a subsequence $b_n = a_{m_n}$ such that all $b_n$ are
contained in one of them. To fix the idea, assume that $b_n \in (0, M]$ (the other case is analogous).
     As $(0, M] = (0, \frac{M}{2}] \cup (\frac{M}{2}, M]$, one of them must contain infinitely many elements of $b_n$.
Therefore, we can take a subsequence $c_n = b_{k_n}$ such that all $c_n$ are contained in one of them.
     By continuing this procedure, for each n we obtain a subsequence that is contained in an
interval of length $\frac{M}{2^{n-1}}$, and each later one is a subsequence of the former. Let us take the subsequence
$a_1, b_2, c_3, \cdots$ of the original sequence. Then, for n, m > N, any two elements are contained in an
interval of length $\frac{M}{2^{N-1}}$. Therefore, this subsequence is Cauchy. Then it is a convergent sequence
by Lemma 61.
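The halving procedure of this proof can be imitated on a finite sample. The sketch below is only an illustration under a loud assumption: on finitely many points, "the half containing infinitely many elements" is replaced by "the half containing at least as many sample points". The function `bisect_subsequence` and the sample sequence $a_n = \sin n$ are choices made for this example.

```python
import math

def bisect_subsequence(a, M, steps):
    """Pick indices n_1 < n_2 < ... by repeatedly halving [-M, M].

    Finite-sample stand-in for the proof: at each step keep the half of
    the current interval that holds at least as many remaining points.
    """
    lo, hi = -M, M
    candidates = list(range(len(a)))  # indices whose values lie in [lo, hi]
    chosen = []
    last = -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i in candidates if lo <= a[i] <= mid]
        right = [i for i in candidates if mid < a[i] <= hi]
        if len(left) >= len(right):
            candidates, hi = left, mid
        else:
            candidates, lo = right, mid
        later = [i for i in candidates if i > last]
        if not later:  # the finite sample is exhausted
            break
        last = later[0]
        chosen.append(last)
    return chosen, (lo, hi)

a = [math.sin(n) for n in range(1, 20001)]  # bounded by M = 1
chosen, (lo, hi) = bisect_subsequence(a, 1.0, 8)
print(chosen, lo, hi)
print([a[i] for i in chosen])
```

The selected values lie in nested intervals whose length halves at each step, which is the mechanism behind the Cauchy property in the proof.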
Theorem 82. Let f be a continuous function defined on a bounded closed interval F . Then f
is bounded, that is, there is M > 0 such that |f (x)| < M for x ∈ F .
Proof. Let us suppose the contrary, that for any n > 0 there is xn ∈ F such that |f (xn )| ≥ n.
As {xn } is a sequence in a bounded set F , we can take a convergent subsequence {yn } of {xn }.
As F is closed, yn → y and y ∈ F . By assumption f is continuous, therefore, it must hold that
limn→∞ f (yn ) = f (y). But this is impossible because |f (yn )| ≥ n by our choice. Therefore, f is
bounded.
Example 83.       • Consider the function $f(x) = \frac1x$ defined on $\mathbb{R} \setminus \{0\}$. This is not bounded,
    but when we restrict it to an interval $[\frac1n, n]$, it is bounded by n.
  • Consider the function
\[
f(x) = \begin{cases} \frac1x & \text{if } x \in [-1, 1],\ x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}
\]
    This is defined on the closed interval
    [−1, 1], but it is not continuous. Therefore, the previous theorem does not apply. Indeed, it is
    not bounded.
Figure 37: A continuous function on a bounded closed interval is bounded. If either of these
conditions is violated, the function can be unbounded.
Figure 38: Left: y = x on x > 0. There is no minimum or maximum. Right: $y = x^2$ on R. The
minimum is 0 at x = 0, but there is no maximum. When restricted to [a, b], either $a^2$ or $b^2$ is
the maximum.
Example 85. Note that a function does not necessarily admit maximum or minimum. If it has,
they may depend on the domain.
   • f(x) = x, defined on x > 0, has no maximum or minimum. Indeed, for any x > 0,
     $f(\frac{x}{2}) = \frac{x}{2} < x$ and $f(2x) = 2x > x$.
Figure 39: The graph of the function y = x − [x], the decimal part of x. This is bounded, but
has no maximum. The minimum is 0 at x ∈ Z.
Proof. By Theorem 82, f is bounded, say $-M < f(x) < M$. Then the image $A = \{f(x) : x \in F\}$
is a bounded set in R, therefore, it admits sup A and inf A. Let us prove that f admits a maximum
(the case for minimum is analogous). For each n there is $x_n \in F$ such that $\sup A - \frac1n < f(x_n)$.
     As F is bounded, $x_n$ admits a convergent subsequence $y_n$, $y_n \to y$ and $y \in F$ because F is
closed. Now, as f is continuous, we have $f(y) = \lim_{n\to\infty} f(y_n)$. As $y_n$ is a subsequence, it holds
that $\sup A - \frac1n < f(y_n) \le \sup A$. This implies that $f(y) = \sup A$. That is, f attains a maximum
at y.
   • $f(x) = x^2$ is continuous, hence on any closed and bounded F it admits a maximum and a
     minimum. But not on the whole real line R, which is not bounded.
   • $f(x) = x - [x]$ is not continuous, and indeed it does not admit a maximum on [0, 1],
     although [0, 1] is closed and bounded.
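The contrast between these two examples can be observed by sampling on finer and finer grids. This is a rough sketch, with `frac` and `grid_max` as names chosen here; a grid maximum only approximates the supremum and proves nothing by itself.

```python
import math

def frac(x: float) -> float:
    """The function x - [x], with [x] the integer part of x."""
    return x - math.floor(x)

def grid_max(f, a: float, b: float, n: int) -> float:
    """Largest value of f on n + 1 equally spaced points of [a, b]."""
    return max(f(a + (b - a) * k / n) for k in range(n + 1))

# x - [x] on [0, 1]: the sampled maxima increase towards 1 but stay below it.
for n in (10, 100, 1000):
    print(n, grid_max(frac, 0.0, 1.0, n))

# x^2 on [-1, 2]: the maximum 4 is attained at the endpoint x = 2.
print(grid_max(lambda x: x * x, -1.0, 2.0, 300))
```

For x − [x] the supremum 1 is approached but never reached, while for the continuous function the maximum is attained, as the theorem asserts.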
    Often it is said that a closed and bounded set F ⊂ R is compact. We have seen that
any sequence $\{a_n\}$ in a compact set admits a convergent subsequence (the Bolzano-Weierstrass
theorem), and the limit is in F. Conversely, if a set A has the property that any sequence in it has
a convergent subsequence with the limit in A, then it is compact (bounded and closed): indeed,
A must be bounded because otherwise we could take an unbounded sequence. Furthermore, A
must be closed, because if an ∈ A is a convergent sequence, we can take a convergent subsequence
with the limit a in A, but there is only one limit for an , hence an → a ∈ A, that is, A is closed.
    Let us see another strong property of continuous functions defined on bounded and closed
sets.
    Note the difference with continuity: a function f is continuous if for each $x \in S$ and for
each $\varepsilon > 0$ there is $\delta > 0$ such that $|f(y) - f(x)| < \varepsilon$ if $|y - x| < \delta$. In other words, the number δ may
change from one point x to another.
    On the other hand, uniform continuity asserts that for each $\varepsilon > 0$ there is δ that applies to
all $x, y \in S$, hence uniformly in S.
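   • $f(x) = \sin\frac1x$ is continuous on $\{x \in \mathbb{R} : x > 0\}$ but not uniformly continuous. Indeed, for
     $\varepsilon = \frac12$ and for any $\delta > 0$, we can take N such that $\frac{2}{\pi N} < \delta$ and N odd. Then with $x = \frac{2}{\pi N}$, $y = \frac{1}{\pi N}$, we have
     $|f(\frac{1}{\pi N}) - f(\frac{2}{\pi N})| = |\sin(\pi N) - \sin(\frac{\pi N}{2})| = 1 > \varepsilon$ but $x - y = \frac{1}{\pi N} < \delta$.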
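The pairs of points used in this example can be tabulated to see the failure of uniform continuity. The following is a small sketch; `f` and `witness` are names chosen for this illustration, and N is assumed to be odd as in the text.

```python
import math

def f(x: float) -> float:
    return math.sin(1 / x)

def witness(N: int):
    """The points x = 2/(pi N) and y = 1/(pi N) for an odd N."""
    x = 2 / (math.pi * N)
    y = 1 / (math.pi * N)
    return x, y

for N in (1, 11, 101, 1001):
    x, y = witness(N)
    # The distance x - y shrinks, while |f(x) - f(y)| stays close to 1.
    print(N, x - y, abs(f(x) - f(y)))
```

No single δ can therefore work for ε = 1/2 on the whole half-line x > 0.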
                 Figure 40: Functions continuous but not uniformly continuous.
[Figure: the graphs of y = sin θ and y = tanh x.]
   Note that the function f(x) = |x| is continuous. Indeed, if x > 0, then f(x) = x and this is
continuous at x. Similarly, f is continuous at x < 0. Finally, if x = 0, for any $\varepsilon > 0$, we take
$\delta = \varepsilon$. Then if $|y - x| = |y - 0| < \delta$, then $\big||y| - |0|\big| = |y - 0| < \delta = \varepsilon$.
Proof. To prove this by contradiction, assume that there is $\varepsilon > 0$ such that for any $\delta > 0$ there
are $x, y \in F$ with $|x - y| < \delta$ but $|f(x) - f(y)| > \varepsilon$. In particular, for $\delta = \frac1n > 0$ there are $x_n, y_n \in F$
such that $|x_n - y_n| < \frac1n$ but $|f(x_n) - f(y_n)| > \varepsilon$. Let $x_{N_n}$ be a subsequence of
$x_n$ convergent to $\tilde{x} \in F$ (which exists by Theorem 81). Let us extract the corresponding subsequence $\{y_{N_n}\}$ of $\{y_n\}$. As
$|\tilde{x} - y_{N_n}| \le |\tilde{x} - x_{N_n}| + |x_{N_n} - y_{N_n}| \to 0$, also $\{y_{N_n}\}$ must be convergent to $\tilde{x} \in F$.
     Then $\lim_{n\to\infty} |f(x_{N_n}) - f(y_{N_n})| = |f(\tilde{x}) - f(\tilde{x})| = 0$, as f is continuous (note that the
absolute value is continuous). But this contradicts the assumption that $|f(x_{N_n}) - f(y_{N_n})| > \varepsilon$.
     Therefore, for all $\varepsilon > 0$ there exists δ such that for all $x, y \in F$ with $|x - y| < \delta$ it holds that $|f(x) - f(y)| < \varepsilon$.
           Figure 42: The slope at a point as the limit of the slopes of secant lines.
\[
\lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h}.
\]
      If this limit exists, it is called the derivative of f at $x_0$ and it is denoted by $f'(x_0) = \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h}$, $Df(x_0)$ or $\frac{df}{dx}(x_0)$.
    The derivative at the point $x_0$ is defined to be the limit of average rates of change. In this
sense, the derivative represents the rate of change at the point $x_0$. If f(t) represents the position
of a car at time t, then $f'(t)$ is the speed of the car at time t.
Figure 43: The slope of the straight line at a point as the limit of the slopes of secant lines.
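  • Let $A \in \mathbb{R}$ and $f(x) = Ax$ for $x \in \mathbb{R}$ (a straight line). For any $x \in \mathbb{R}$, $\frac{f(x+h) - f(x)}{h} = \frac{A(x+h) - Ax}{h} = \frac{Ah}{h} = A$, hence $f'(x) = A$.
  • Let $A \in \mathbb{R}$ and $f(x) = Ax^2$ for $x \in \mathbb{R}$ (parabola). For any $x \in \mathbb{R}$, $\frac{f(x+h) - f(x)}{h} = \frac{A(x+h)^2 - Ax^2}{h} = \frac{A(2xh + h^2)}{h} = A(2x + h)$, therefore, $f'(x) = \lim_{h\to 0} A(2x + h) = 2xA$.
  • Let $n \in \mathbb{N}$ and $f(x) = Ax^n$ for $x \in \mathbb{R}$. It holds that $(x + h)^n = \sum_{k=0}^n \binom{n}{k} x^k h^{n-k} = x^n + nx^{n-1}h + \frac{n(n-1)}{2} x^{n-2} h^2 + \dots$. For any $x \in \mathbb{R}$,
\begin{align*}
\frac{f(x+h) - f(x)}{h}
 &= \frac{A(x+h)^n - Ax^n}{h}
  = \frac{A(x^n + nx^{n-1}h + \frac{n(n-1)}{2} x^{n-2} h^2 + \dots + h^n - x^n)}{h} \\
 &= Anx^{n-1} + A \cdot \frac{n(n-1)}{2} x^{n-2} h + \dots + A h^{n-1},
\end{align*}
    therefore, $f'(x) = \lim_{h\to 0} A(nx^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \dots + h^{n-1}) = Anx^{n-1}$.
  • Let $f(x) = \frac1x$ for $x \in \mathbb{R}$, $x \ne 0$. For any $x \in \mathbb{R}$, $x \ne 0$,
\[
\frac{f(x+h) - f(x)}{h} = \frac{\frac{1}{x+h} - \frac1x}{h} = \frac{x - (x+h)}{hx(x+h)} = -\frac{1}{x(x+h)},
\]
    therefore, $f'(x) = \lim_{h\to 0} -\frac{1}{x(x+h)} = -\frac{1}{x^2}$.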
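The difference quotients in these examples can be evaluated for shrinking h and compared with the derivatives just computed. This is a minimal numerical sketch; `difference_quotient` is a helper named here for illustration, and the sample constants A = 3, n = 5, x = 2 are arbitrary.

```python
def difference_quotient(f, x: float, h: float) -> float:
    """The average rate of change (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

A, n, x = 3.0, 5, 2.0
examples = [
    ("A x", lambda t: A * t, A),
    ("A x^2", lambda t: A * t ** 2, 2 * x * A),
    ("A x^n", lambda t: A * t ** n, A * n * x ** (n - 1)),
    ("1/x", lambda t: 1 / t, -1 / x ** 2),
]

for name, f, exact in examples:
    quotients = [difference_quotient(f, x, h) for h in (1e-1, 1e-3, 1e-5)]
    print(name, quotients, exact)
```

For very small h the subtraction f(x + h) − f(x) loses digits in floating point, so the quotient stops improving below a certain h. This is a numerical effect and does not affect the limit itself.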
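   • $f(x) = \sin x$, $x \in \mathbb{R}$. Recall the formula $\cos\alpha\sin\beta = \frac12(\sin(\alpha+\beta) - \sin(\alpha-\beta))$. Then,
     with $\alpha = x + \frac{h}{2}$, $\beta = \frac{h}{2}$, we have $\sin(x+h) - \sin x = 2\cos(x + \frac{h}{2})\sin\frac{h}{2}$, therefore,
\begin{align*}
f'(x) &= \lim_{h\to 0} \frac{\sin(x+h) - \sin x}{h} \\
      &= \lim_{h\to 0} \frac{2\cos(x + \frac{h}{2})\sin\frac{h}{2}}{h} \\
      &= \lim_{h\to 0} \cos\left(x + \frac{h}{2}\right) \lim_{h\to 0} \frac{\sin\frac{h}{2}}{\frac{h}{2}} \\
      &= \cos x \cdot 1 = \cos x
\end{align*}
      (by the continuity of cos x, the notable limit $\lim_{h\to 0} \frac{\sin h}{h} = 1$ and the change of
      variable with $\frac{h}{2}$ replacing h).
   • $f(x) = \cos x$, $x \in \mathbb{R}$. Recall the formula $-\sin\alpha\sin\beta = \frac12(\cos(\alpha+\beta) - \cos(\alpha-\beta))$. Then,
     with $\alpha = x + \frac{h}{2}$, $\beta = \frac{h}{2}$, we have $\cos(x+h) - \cos x = -2\sin(x + \frac{h}{2})\sin\frac{h}{2}$, therefore,
\begin{align*}
f'(x) &= \lim_{h\to 0} \frac{\cos(x+h) - \cos x}{h} \\
      &= \lim_{h\to 0} \frac{-2\sin(x + \frac{h}{2})\sin\frac{h}{2}}{h} \\
      &= -\lim_{h\to 0} \sin\left(x + \frac{h}{2}\right) \lim_{h\to 0} \frac{\sin\frac{h}{2}}{\frac{h}{2}} \\
      &= -\sin x \cdot 1 = -\sin x
\end{align*}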
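The two identities used in these computations, and the resulting limits, can be checked numerically. The sketch below uses names (`sine_difference`, `cosine_difference`) introduced only for this example.

```python
import math

def sine_difference(x: float, h: float) -> float:
    """2 cos(x + h/2) sin(h/2), which should equal sin(x + h) - sin x."""
    return 2 * math.cos(x + h / 2) * math.sin(h / 2)

def cosine_difference(x: float, h: float) -> float:
    """-2 sin(x + h/2) sin(h/2), which should equal cos(x + h) - cos x."""
    return -2 * math.sin(x + h / 2) * math.sin(h / 2)

x = 1.0
for h in (0.5, 1e-2, 1e-4, 1e-6):
    # Quotients should approach cos(x) and -sin(x) respectively.
    print(h, sine_difference(x, h) / h, cosine_difference(x, h) / h)
print(math.cos(x), -math.sin(x))
```

Writing the difference as a product avoids subtracting two nearly equal numbers, so these quotients remain accurate even for small h.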
\[
\lim_{x\to x_0} (f(x) - f(x_0)) = \lim_{h\to 0} (f(x_0 + h) - f(x_0)) = \lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h} \cdot h = f'(x_0) \cdot 0 = 0.
\]
That is, $\lim_{x\to x_0} f(x) = f(x_0)$.
Definition 93. Let $f : [x_0 - \delta, x_0] \to \mathbb{R}$ where $\delta > 0$. If the limit $\lim_{h\to 0^-} \frac{f(x_0+h) - f(x_0)}{h}$
exists (from the left), $f$ is said to be left-differentiable at $x_0$, and this limit is denoted by $D_- f(x_0)$,
the left derivative. Similarly, we define the right derivative.
Example 94. Let $f(x) = |x|$, $x_0 = 0$. Then $D_- f(0) = \lim_{h\to 0^-} \frac{|0+h| - 0}{h} = \lim_{h\to 0^-} \frac{-h}{h} = -1$, while
$D_+ f(0) = \lim_{h\to 0^+} \frac{h}{h} = 1$.
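The two one-sided limits of Example 94 can be observed numerically. The following is a minimal sketch, not part of the lecture: the helper name `one_sided_quotients` and the step size `h` are our own choices, and a finite difference quotient only approximates the limit in general (for $|x|$ at 0 it happens to be exact).

```python
def one_sided_quotients(f, x0, h=1e-6):
    """Return the (left, right) difference quotients of f at x0 for a step h > 0.

    These only approximate D_- f(x0) and D_+ f(x0); h is an illustrative choice.
    """
    left = (f(x0 - h) - f(x0)) / (-h)   # increment -h approaches 0 from the left
    right = (f(x0 + h) - f(x0)) / h     # increment +h approaches 0 from the right
    return left, right


# f(x) = |x| at x0 = 0: the two quotients have different signs.
left, right = one_sided_quotients(abs, 0.0)
print("left quotient:", left)
print("right quotient:", right)
```

Since the two values do not agree, no single number can serve as the derivative of $|x|$ at 0.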
Figure 44: The graph of y = |x|, which has left and right derivatives, but they do not coincide.
Theorem 97. Let f, g be functions on open intervals. The following hold if f, g are differentiable
at x (or f at g(x) for the chain rule):
  f (x)g(x), and g is continuous at x because it is differentiable there:
\[
\begin{aligned}
&\lim_{h\to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} \\
&= \lim_{h\to 0} \frac{f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x)}{h} \\
&= \lim_{h\to 0} \frac{f(x+h)g(x+h) - f(x)g(x+h)}{h} + \lim_{h\to 0} \frac{f(x)g(x+h) - f(x)g(x)}{h} \\
&= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \lim_{h\to 0} g(x+h) + f(x) \lim_{h\to 0} \frac{g(x+h) - g(x)}{h} \\
&= Df(x)g(x) + f(x)Dg(x).
\end{aligned}
\]
\[
\begin{aligned}
&\lim_{h\to 0} \frac{\frac{f(x+h)}{g(x+h)} - \frac{f(x)}{g(x)}}{h} \\
&= \lim_{h\to 0} \frac{f(x+h)g(x) - f(x)g(x+h)}{g(x+h)g(x)h} \\
&= \lim_{h\to 0} \frac{f(x+h)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(x+h)}{g(x+h)g(x)h} \\
&= \lim_{h\to 0} \frac{(f(x+h) - f(x))g(x) - f(x)(g(x+h) - g(x))}{g(x+h)g(x)h} \\
&= \frac{Df(x)g(x) - f(x)Dg(x)}{g(x)^2}.
\end{aligned}
\]
      where in the second equality we used the change of variables z = f (w). The case where f
      is monotonically decreasing is analogous.
Example 98.   • Let $f(x) = x^4 + 3x^2 - 34$. Then $Df(x) = 4x^3 + 6x$.
   • Let $f(x) = \frac{x^2+1}{x-2}$. Then, for $x \neq 2$, $Df(x) = \frac{2x(x-2) - (x^2+1)\cdot 1}{(x-2)^2} = \frac{x^2 - 4x - 1}{(x-2)^2}$.
   • For $a > 0$, it holds that $a^x = (e^{\log a})^x = e^{\log a \cdot x}$. Indeed, by the chain rule,
   • Let $a > 0$ and $f(x) = x^a$ for $x > 0$. Then $f(x) = \exp(\log x \cdot a)$, and by the chain rule,
     \[
       Df(x) = D(\log x \cdot a)\, D(\exp)(\log x \cdot a) = \frac{a}{x} \cdot \exp(\log x \cdot a) = \frac{a}{x} \cdot x^a = a x^{a-1}.
     \]
      For $a < 0$, we consider $f(x) = x^a = \frac{1}{x^{-a}}$ and we obtain the same formula $f'(x) = a x^{a-1}$.
      For $a = 0$, because $x^a = 1$, we have $D(x^0) = D(1) = 0$.
   • $D\tan x = D\left(\frac{\sin x}{\cos x}\right) = \frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{\cos^2 x} = \frac{1}{\cos^2 x}$.
   • $f(y) = \arctan y$. That is, $f(y) = g^{-1}(y)$, where $g(x) = \tan x$ restricted to $(-\frac{\pi}{2}, \frac{\pi}{2})$. By the
     formula for the inverse function, we have $Df(y) = \frac{1}{Dg(x)} = \cos^2 x$, where $y = g(x) = \tan x$.
     Therefore, $y^2 = \frac{\sin^2 x}{\cos^2 x} = \frac{1 - \cos^2 x}{\cos^2 x}$, and $\cos^2 x = \frac{1}{1+y^2}$. By substituting this in the previous
     result, $D\arctan y = Df(y) = \frac{1}{1+y^2}$.
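The closed formulas obtained in these examples can be cross-checked against a symmetric difference quotient. This is only a sanity check under our own assumptions (the helper `central_difference`, the step `h` and the sample points are illustrative choices), not a proof of the formulas.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x+h) - f(x-h)) / (2h), an approximation of Df(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# (label, function, derivative formula from the examples, sample point)
checks = [
    ("tan x", math.tan, lambda x: 1.0 / math.cos(x) ** 2, 0.7),
    ("arctan y", math.atan, lambda y: 1.0 / (1.0 + y * y), 1.3),
    ("x^a with a = -1.5", lambda x: x ** -1.5, lambda x: -1.5 * x ** -2.5, 2.0),
]

for label, f, df, x in checks:
    print(label, "numerical:", central_difference(f, x), "formula:", df(x))
```

In each row the two numbers should agree to many digits; a visible disagreement would point to an error in the formula or to a badly chosen step.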
Indeed, this is of the form $y = Ax + B$ with $A = f'(x_0)$ and $B = y_0 - f'(x_0)x_0$, and passes through the
point $(x_0, y_0)$.
Figure 45: The straight line $y = \frac{y_1 - y_0}{x_1 - x_0}(x - x_0) + y_0$. The slope of the straight line is $\frac{y_1 - y_0}{x_1 - x_0}$.
Figure 46: The tangent lines to the graphs of $x^2$, $\cos x$. Their equations are $y = 2(x - 1) + 1$, $y = -(x - \frac{\pi}{2})$, respectively.
    When the slope is positive, the line goes upwards (when one goes to the right), while the line
goes downwards when the slope is negative. When the slope is 0, it is a horizontal line. The
vertical line is represented by the equation $x = a$, and this is not of the form $y = Ax + B$.
    If we draw these lines, they are almost always indeed tangent, but in some cases they cross
the graph.
Example 100. Let $f(x) = x^3 - x$. When we consider this as a function on $\mathbb{R}$, there is no global
maximum or minimum, but there are a local maximum and a local minimum at $x = -\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}$, respectively
(we will see why they are these points later). If we restrict the function to $[-2, 2]$, then $-2, 2$ are
the global minimum and the global maximum, respectively.
Theorem 101. Let $f$ be defined on an open interval $I$ and assume that $f$ takes a local minimum
(or a local maximum) at the point $c \in I$. If $f$ is differentiable at $c$, then $f'(c) = 0$.
Figure 47: Left: The graph of $y = x^3 - x$. The local maximum and minimum are at $x = -\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}$,
respectively. When restricted to a closed bounded interval, it has global maximum and minimum.
Right: The graph of $y = f(x) = x^3$. $f'(x) = 3x^2$, hence $x = 0$ is a stationary point.
Proof. Let $c$ be a local maximum (the case for minimum is analogous). Then $f(x) \le f(c)$ for all
$x \in (c - \epsilon, c + \epsilon)$ for some $\epsilon > 0$. As $f(x)$ is differentiable at $x = c$, its left and right derivatives must
coincide.
    On the other hand, $\lim_{h\to 0^+} \frac{f(c+h) - f(c)}{h} \le 0$ and $\lim_{h\to 0^-} \frac{f(c+h) - f(c)}{h} \ge 0$, therefore, $f'(c) = 0$.
Example 103.       • $y = f(x) = x^3$. Then $f'(x) = 3x^2$, hence $x = 0$ is a stationary point, but
    as $f(x)$ is monotonically increasing, it is not an extremum.
   • $y = f(x) = x^3 - x$. Then $f'(x) = 3x^2 - 1$, hence $x = \pm\frac{1}{\sqrt{3}}$ are stationary points. The point
     $x = -\frac{1}{\sqrt{3}}$ is a local maximum and $x = \frac{1}{\sqrt{3}}$ is a local minimum.
   • $y = |x|$. This function has the minimum at $x = 0$, but the function does not have a derivative
     there. In particular, $f'(0) = 0$ does not hold ($f'(0)$ has no meaning there).
   • The volume $V(t)$ of the balloon at time $t$ (seconds): $V(t) = 50t\ \mathrm{cm}^3$. This implies $\frac{dV}{dt} = 50\ \mathrm{cm}^3/\mathrm{s}$.
   • The radius $r(t)$ of the sphere with volume $V(t)$: $\frac{4\pi r(t)^3}{3} = V(t)$. By differentiating both
     sides with respect to $t$, $4\pi \frac{dr}{dt}(t)\, r(t)^2 = \frac{dV}{dt}$.
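The relation $4\pi \frac{dr}{dt} r^2 = \frac{dV}{dt}$ can be solved for $\frac{dr}{dt}$ and compared with a direct numerical differentiation of $r(t)$. The sketch below assumes the balloon data above ($V(t) = 50t$); the function names, the time $t = 2$ and the step $h$ are our own illustrative choices.

```python
import math

DV_DT = 50.0  # cm^3/s, from V(t) = 50 t


def volume(t):
    """V(t) = 50 t in cm^3."""
    return DV_DT * t


def radius(t):
    """r(t) obtained by solving 4*pi*r^3/3 = V(t) for r."""
    return (3.0 * volume(t) / (4.0 * math.pi)) ** (1.0 / 3.0)


def radius_rate(t):
    """dr/dt from the differentiated relation 4*pi*(dr/dt)*r^2 = dV/dt."""
    return DV_DT / (4.0 * math.pi * radius(t) ** 2)


t = 2.0
h = 1e-6
numeric = (radius(t + h) - radius(t - h)) / (2.0 * h)  # symmetric difference quotient
print("dr/dt from the formula:", radius_rate(t))
print("dr/dt numerically:     ", numeric)
```

The radius grows more slowly as the balloon gets larger, because the same volume rate is spread over a surface of area $4\pi r^2$.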
  Some shapes can be represented by an equation, and the equation may define a function
implicitly. For example, we know that the circle centered at $(0, 0)$ with radius $r$ is given by
\[
  x^2 + y^2 = r^2.
\]
As we saw before, if we consider only the part $y \ge 0$, it defines the function $y = \sqrt{r^2 - x^2}$.
   It is not always possible to find an explicit expression for y of a given equation. Yet, an
equation may define a function in an abstract way. Let us write it y(x).
   With the explicit expression, $y(x) = \sqrt{r^2 - x^2} = (r^2 - x^2)^{\frac{1}{2}}$, therefore,
\[
  y'(x) = \frac{1}{2} \cdot \frac{-2x}{\sqrt{r^2 - x^2}} = -\frac{x}{\sqrt{r^2 - x^2}}.
\]
    • $\sin x$: consider the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$. The range is $[-1, 1]$. The inverse function is denoted
      by $\arcsin x$, defined on $[-1, 1]$.
    • $\cos x$: consider the interval $[0, \pi]$. The range is $[-1, 1]$. The inverse function is denoted by
      $\arccos x$, defined on $[-1, 1]$.
    • $\tan x$: consider the open interval $(-\frac{\pi}{2}, \frac{\pi}{2})$. The range is $\mathbb{R}$. The inverse function is denoted by
      $\arctan x$, defined on $\mathbb{R}$.
  Let us compute the derivative of $\arcsin y$ by putting $y = \sin x$. Then $D(\sin x) = \cos x =
\sqrt{1 - \sin^2 x} = \sqrt{1 - y^2}$. By the general formula, $D(\arcsin y) = \frac{1}{D(\sin x)} = \frac{1}{\cos x} = \frac{1}{\sqrt{1 - y^2}}$.
Proposition 105 (Lagrange’s mean value theorem). Let $f$ be continuous in $[a, b]$ and differentiable
in $(a, b)$. Then there is $x_0 \in (a, b)$ such that $\frac{f(b) - f(a)}{b - a} = f'(x_0)$.
Proof. Let $g(x) = f(x) - \frac{(f(b) - f(a))x}{b - a}$, which is continuous in $[a, b]$ and differentiable in $(a, b)$.
Then $g(a) = \frac{f(a)b - f(b)a}{b - a} = g(b)$, and by Theorem 104 there is $x_0$ such that $g'(x_0) = 0$. This
implies $f'(x_0) - \frac{f(b) - f(a)}{b - a} = 0$.
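The theorem only asserts that a point $x_0$ exists. For a concrete function one can locate such a point numerically. The sketch below uses bisection on $f'(x) - \frac{f(b)-f(a)}{b-a}$ and therefore needs an extra assumption that is not part of the theorem: this difference must have opposite signs at $a$ and $b$. The helper name `mean_value_point` and the sample function $f(x) = x^3$ on $[0, 2]$ are our own choices.

```python
import math


def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Find x0 in (a, b) with fprime(x0) equal to the mean slope of f on [a, b].

    Assumption (stronger than the theorem): fprime - slope changes sign
    between a and b, so that bisection applies.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g(x):
        return fprime(x) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("fprime - slope does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(lo) * g(mid) <= 0:
            hi = mid  # a zero of g lies in [lo, mid]
        else:
            lo = mid  # a zero of g lies in [mid, hi]
    return (lo + hi) / 2.0


# f(x) = x^3 on [0, 2]: the mean slope is 4, and 3 x0^2 = 4 gives x0 = 2 / sqrt(3).
x0 = mean_value_point(lambda x: x ** 3, lambda x: 3.0 * x ** 2, 0.0, 2.0)
print("x0 found by bisection:", x0)
print("2 / sqrt(3):          ", 2.0 / math.sqrt(3.0))
```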
Figure 48: A non constant function, continuous in [a, b] and differentiable in (a, b), must have a
stationary point.
Figure 49: A function continuous in $[a, b]$ and differentiable in $(a, b)$ must have a point where
the derivative is equal to the mean slope.
Corollary 106. Let $f$ be continuous in $[a, b]$ and differentiable in $(a, b)$. If $f'(x) = 0$ for all
$x \in (a, b)$, then $f$ is constant.
Proof. Let $x < y \in [a, b]$. By Theorem 105, there is $x_0 \in (x, y)$ such that $\frac{f(y) - f(x)}{y - x} = f'(x_0) = 0$,
therefore, $f(x) = f(y)$.
   • If $f'(x) \le 0$ ($< 0$, respectively) for all $x \in (a, b)$, then $f$ is monotonically non increasing
     (decreasing, respectively).
Proof. Let $x < y \in (a, b)$. By Theorem 105, there is $x_0 \in (x, y)$ such that $\frac{f(y) - f(x)}{y - x} = f'(x_0)$. If
$f'(x_0) \ge 0$ ($> 0$), then $f(y) - f(x) \ge 0$ ($> 0$), that is, $f$ is monotonically non decreasing (increasing,
respectively).
    The case $f'(x) \le 0$ ($< 0$) is analogous.
Example 108.      • $f(x) = x^2$. $f'(x) = 2x$, hence $f$ is decreasing if $x < 0$, $x = 0$ is the only
    stationary point, and $f$ is increasing if $x > 0$.
Theorem 109. Let $f$ be continuous in $[a, b]$ and differentiable in $(a, b)$. Let $c \in (a, b)$.
   • If $f'(x) > 0$ for $x \in (a, c)$ and $f'(x) < 0$ for $x \in (c, b)$, then $f$ has a maximum at $c$.
   • If $f'(x) < 0$ for $x \in (a, c)$ and $f'(x) > 0$ for $x \in (c, b)$, then $f$ has a minimum at $c$.
Figure 50: A function and its derivative. When the derivative is positive (negative) in an interval,
the function is increasing (decreasing).
Proof. If $f'(x) > 0$ for $x \in (a, c)$, then $f$ is increasing there and continuous at $c$, therefore, for
any $x \in (a, c)$ it holds that $f(c) \ge f(x)$. On the other hand, as $f'(x) < 0$ for $x \in (c, b)$, $f$ is decreasing
there, and $f(c) \ge f(x)$ for $x \in (c, b)$.
    The second case is analogous.
Example 110.         • $f(x) = x^3 - x$. $f'(x) = 3x^2 - 1$, and $f'(x) > 0$ if and only if $x < -\frac{1}{\sqrt{3}}$ or $\frac{1}{\sqrt{3}} < x$,
      and $f'(x) < 0$ if and only if $-\frac{1}{\sqrt{3}} < x < \frac{1}{\sqrt{3}}$. Therefore, $f$ takes a local maximum at
      $x = -\frac{1}{\sqrt{3}}$ and a local minimum at $x = \frac{1}{\sqrt{3}}$. As $f$ is differentiable in $\mathbb{R}$, there is no other local
      maximum or minimum.
   • $f(x) = \cosh x$. $f'(x) = \sinh x$, and $f'(x) > 0$ if and only if $x > 0$, and $f'(x) < 0$ if and
     only if $x < 0$. Therefore, $f$ takes a minimum at $x = 0$ and no other minimum or maximum.
   • $f(x) = \sinh x$. $f'(x) = \cosh x$ and $\cosh x > 0$, and hence $f(x)$ is monotonically increasing.
    Note that, even if $f'(x) > 0$ at one point, it does not mean that $f$ is monotonically increasing
in a neighbourhood of $x$. Indeed, a counterexample is given by
\[
  f(x) = \begin{cases} x^2 \sin\frac{1}{x} + \frac{x}{2} & \text{for } x \neq 0 \\ 0 & \text{for } x = 0 \end{cases}.
\]
As we have seen, this function without the part $\frac{x}{2}$ is differentiable, and it has the derivative 0 at
$x = 0$. Therefore, with $\frac{x}{2}$, it is still differentiable and $f'(0) = \frac{1}{2} > 0$.
   Yet, $f$ is not monotonically increasing in any interval $(-\epsilon, \epsilon)$. To see this, note that
\[
  f'(x) = \begin{cases} 2x \sin\frac{1}{x} - \frac{x^2}{x^2} \cos\frac{1}{x} + \frac{1}{2} = 2x \sin\frac{1}{x} - \cos\frac{1}{x} + \frac{1}{2} & \text{for } x \neq 0 \\ \frac{1}{2} & \text{for } x = 0 \end{cases}
\]
and for any $\epsilon > 0$, there is $0 < x < \epsilon$ such that $f'(x) < 0$: for example, one can take $x = \frac{1}{2\pi n}$ for
sufficiently large $n$. Then the term $2x \sin\frac{1}{x} = 0$, while $-\cos\frac{1}{x} = -1$, and then $f'(x) = -\frac{1}{2}$.
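The behaviour of this counterexample can be seen in numbers. The sketch below evaluates the derivative of $f(x) = x^2 \sin\frac{1}{x} + \frac{x}{2}$ (with $f(0) = 0$) at the points $x = \frac{1}{2\pi n}$, and the difference quotient at 0. The function names and the sampled values of $n$ and $h$ are our own choices, and floating-point values are only approximately equal to the exact ones.

```python
import math


def f(x):
    """f(x) = x^2 sin(1/x) + x/2 for x != 0, and f(0) = 0."""
    return x * x * math.sin(1.0 / x) + x / 2.0 if x != 0 else 0.0


def fprime(x):
    """f'(x) = 2x sin(1/x) - cos(1/x) + 1/2 for x != 0, and f'(0) = 1/2."""
    if x == 0:
        return 0.5
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 0.5


# The difference quotient at 0 approaches f'(0) = 1/2 > 0 ...
for h in (1e-2, 1e-4, 1e-8):
    print("h =", h, "quotient at 0:", (f(h) - f(0.0)) / h)

# ... but arbitrarily close to 0 there are points with negative derivative.
for n in (1, 10, 100, 1000):
    x = 1.0 / (2.0 * math.pi * n)
    print("x =", x, "f'(x) =", fprime(x))
```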
Figure 51: A function $f$ such that $f'(0) > 0$ but which is not monotonically increasing in any interval
containing $x = 0$.
Example 112.       • If $f(x) = x^4$, then $f'(x) = 4x^3$, $f''(x) = 12x^2$, $f^{(3)}(x) = 24x$, and so on.
    This $f$ is infinitely many times differentiable.
   • If $f(x) = \sin x$, then $f'(x) = \cos x$, $f''(x) = -\sin x$, $f^{(3)}(x) = -\cos x$, $f^{(4)}(x) = \sin x$ and
     so on. Again this is infinitely many times differentiable.
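   • Let $f(x) = \begin{cases} x^2 \sin\frac{1}{x} + \frac{x}{2} & \text{for } x \neq 0 \\ 0 & \text{for } x = 0 \end{cases}$, then we have
     \[
       f'(x) = \begin{cases} 2x \sin\frac{1}{x} - \frac{x^2}{x^2} \cos\frac{1}{x} + \frac{1}{2} = 2x \sin\frac{1}{x} - \cos\frac{1}{x} + \frac{1}{2} & \text{for } x \neq 0 \\ \frac{1}{2} & \text{for } x = 0 \end{cases}
     \]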
    The second derivative is useful to study whether the stationary point (or a critical point)
is a maximum or a minimum, and also to study the shape of the graph.
Lemma 113. Suppose that $f$ is differentiable in an open interval $I$ and at $x_0$ it is twice differentiable.
   • If $x_0$ is a stationary point and $f''(x_0) > 0$ ($f''(x_0) < 0$, respectively), then $f$ takes a local
     minimum (a local maximum, respectively) at $x_0$.
      $f$ is monotonically decreasing in $(x_0 - \epsilon, x_0)$ and increasing in $(x_0, x_0 + \epsilon)$, that is, $f$ takes
      a minimum at $x_0$.
Figure 52: A convex function. The graph is below the segment between any pair of points
(a, f (a)), (b, f (b)).
   • If $x_0$ is a local minimum and we suppose that $f''(x_0) < 0$, then $x_0$ would be a local maximum
     and it would contradict the previous point.
   • Let $f(x) = x^3 - 3x$. We have $f'(x) = 3x^2 - 3$ and $x = 1, -1$ are stationary points. As
     $f''(x) = 6x$, $f$ takes a maximum at $x = -1$ and a minimum at $x = 1$.
Definition 115. Let f be defined on an interval I. We say that f is convex (concave, respec-
tively) if for any a, b ∈ I and t ∈ [0, 1] it holds that
f (ta + (1 − t)b) ≤ tf (a) + (1 − t)f (b) ( respectively f (ta + (1 − t)b) ≥ tf (a) + (1 − t)f (b)).
   Note that $(ta + (1-t)b, tf(a) + (1-t)f(b))$ defines a segment between $(a, f(a))$ and $(b, f(b))$.
Indeed, the slope from the point $(a, f(a))$ to such a point is $\frac{(1-t)(f(b) - f(a))}{(1-t)(b - a)} = \frac{f(b) - f(a)}{b - a}$, which
does not depend on $t$.
Theorem 116. Assume that $f$ is continuous on $[a, b]$, differentiable on $(a, b)$. If $f'$ is monotonically
nondecreasing (nonincreasing, respectively), then $f$ is convex (concave, respectively).
In particular, if $f''(x) > 0$ ($f''(x) < 0$, respectively) for $x \in (a, b)$, then $f$ is convex (concave,
respectively).
Proof. Let $x < y$ in $[a, b]$ and $t \in (0, 1)$. Let $z = tx + (1-t)y$. We have to prove that
$f(z) \le tf(x) + (1-t)f(y)$, or equivalently, $t(f(z) - f(x)) \le (1-t)(f(y) - f(z))$.
    By Theorem 105, there are points $c, d$ with $x < c < z$ and $z < d < y$ such that
$f(z) - f(x) = f'(c)(z - x)$ and $f(y) - f(z) = f'(d)(y - z)$. As $f'$ is nondecreasing, $f'(c) \le f'(d)$
and hence, using $t(z - x) = (1-t)(y - z)$,
\[
  t(f(z) - f(x)) = tf'(c)(z - x) \le f'(d)t(z - x) = f'(d)(1-t)(y - z) = (1-t)(f(y) - f(z)).
\]
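The defining inequality of convexity can be tested on finitely many sample points. Such a test can refute convexity but never prove it, so the following is only an illustration; the helper name `is_convex_on_samples`, the grid size `n` and the rounding tolerance `tol` are our own assumptions.

```python
import math


def is_convex_on_samples(f, a, b, n=20, tol=1e-12):
    """Check f(t x + (1 - t) y) <= t f(x) + (1 - t) f(y) on a finite grid.

    x and y run over n + 1 equally spaced points of [a, b] and t over
    n + 1 equally spaced points of [0, 1]; tol absorbs rounding errors.
    """
    points = [a + (b - a) * i / n for i in range(n + 1)]
    weights = [j / n for j in range(n + 1)]
    for x in points:
        for y in points:
            for t in weights:
                z = t * x + (1.0 - t) * y
                if f(z) > t * f(x) + (1.0 - t) * f(y) + tol:
                    return False  # the inequality fails at (x, y, t)
    return True


# exp has positive second derivative, so no violation is expected.
print(is_convex_on_samples(math.exp, -2.0, 2.0))
# x^3 has second derivative 6x, negative on (-1, 0), so a violation is expected.
print(is_convex_on_samples(lambda x: x ** 3, -1.0, 1.0))
```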
Figure 53: The asymptotes for $\frac{1}{x}$ and $\tanh x$.
   Under certain conditions, it can also be shown that if $f$ is convex, then $f'' \ge 0$. We omit the
proof.
   • Let $f(x) = x^3$. As $f''(x) = 6x$, $f$ is concave on $(-\infty, 0)$ and convex on $(0, \infty)$.
Asymptotes
The graph of a function may approach a straight line. The precise notion for this is that of an
asymptote.
Definition 118.     • Let $f$ be defined on $(a, \infty)$. If $\lim_{x\to\infty} f(x) = L$, then we say that $y = L$
     is a horizontal asymptote (analogously for $-\infty$).
Example 119.     • Let $f(x) = \tanh x$. We know that $\lim_{x\to\infty} \tanh x = 1$, $\lim_{x\to-\infty} \tanh x = -1$,
    hence $y = 1, -1$ are the horizontal asymptotes of $\tanh x$.
   • Let $f(x) = \frac{1}{x}$ on $(-\infty, 0) \cup (0, \infty)$. We know that $\lim_{x\to 0^+} \frac{1}{x} = \infty$, $\lim_{x\to 0^-} \frac{1}{x} = -\infty$, and
     hence $x = 0$ is a vertical asymptote of $\frac{1}{x}$. $y = 0$ is a horizontal asymptote of $\frac{1}{x}$ because
     $\lim_{x\to\pm\infty} \frac{1}{x} = 0$.
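\[
\begin{aligned}
  \lim_{x\to\infty} (x \tanh x - x) &= \lim_{x\to\infty} x \left( \frac{e^x - e^{-x}}{e^x + e^{-x}} - 1 \right) \\
  &= \lim_{x\to\infty} \frac{-2x e^{-x}}{e^x + e^{-x}} = 0,
\end{aligned}
\]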
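The limit just computed can be watched numerically. In the sketch below the gap $x \tanh x - x$ is evaluated both directly and through the equivalent form $\frac{-2x}{e^{2x} + 1}$ (obtained by multiplying numerator and denominator of $\frac{-2xe^{-x}}{e^x + e^{-x}}$ by $e^x$), which avoids subtracting nearly equal numbers. The function name `gap` and the sample points are our own choices.

```python
import math


def gap(x):
    """x tanh x - x, written as -2x / (e^{2x} + 1) to avoid cancellation."""
    return -2.0 * x / (math.exp(2.0 * x) + 1.0)


for x in (1.0, 5.0, 10.0, 20.0):
    print("x =", x, "direct:", x * math.tanh(x) - x, "stable form:", gap(x))
```

The gap shrinks very quickly as $x$ grows, in agreement with the limit 0.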
                         Figure 54: The oblique asymptotes for x tanh x.
   • Translation. If g(x) = f (x−a)+b for some function f, g, then the graph of g is obtained by
     translating the graph of f by (a, b). Indeed, if (x, y) is on the graph of f , then (x + a, y + b)
     is on the graph of g.
   • Reflection. If g(x) = f (−x) for some function f, g, then the graph of g is obtained by
     reflecting the graph of f with respect to x = 0. Indeed, if (x, y) is on the graph of f , then
     (−x, y) is on the graph of g.
   • If g(x) = f (−(x − 2a)) for some function f, g, then the graph of g is obtained by reflecting
     the graph of f with respect to x = a.
   • Scaling. If g(x) = bf (x/a) for some function f, g and a, b > 0, then the graph of g is
     obtained by scaling the graph of f by a in the x-direction and b in the y-direction. Indeed,
     if (x, y) is on the graph of f , then (ax, by) is on the graph of g.
                             Figure 56: The graphs of sin x and 2 sin(x/2).
     • Translation symmetry. If f (x) = f (x − a), then the graph of f remains invariant under
       the translation (a, 0).
     • Reflection. If $f(x) = f(-x)$, then the graph of $f$ is invariant under the reflection with respect
       to $x = 0$ and $f$ is said to be even.
 Example 120.      • The graph of $\sin x$ is invariant under translation by $2\pi$ and under the reflection
      with respect to $x = \frac{\pi}{2}$, because $\sin(x + 2\pi) = \sin(x)$ and $\sin(-(x - \pi)) = -\sin(x - \pi) = \sin(x)$.
      On the other hand, $\sin(-x) = -\sin x$, hence $\sin x$ is an odd function.
 Curve sketching
 The graph of a function f can be qualitatively drawn as follows.
 (3) Study the sign of $f'$ and find stationary points (where $f'(x) = 0$).
 (4) Study the stationary points and find local minima and maxima (either by the second derivative
     or the first).
 Example 121.            • $f(x) = e^{-(2x-1)^2}$.
Figure 57: Above: The graph of $\sin x$ is invariant under translation by $2\pi$ and under the reflection
with respect to $x = \frac{\pi}{2}$. Below: The graph of $(x - \frac{1}{2})^2 + 1$ is invariant under the reflection with respect
to $x = \frac{1}{2}$.
Figure 58: The graph of $f(x) = e^{-(2x-1)^2}$.
Figure 59: The graph of $f(x) = \log(\frac{1}{\sin x})$.
       (3) $f'(x) = -4(2x - 1)e^{-(2x-1)^2}$. $f'(x) = 0 \Leftrightarrow 2x - 1 = 0 \Leftrightarrow x = \frac{1}{2}$. $f(\frac{1}{2}) = 1$.
       (4) $f''(x) = (16(2x - 1)^2 - 8)e^{-(2x-1)^2} = (64x^2 - 64x + 8)e^{-(2x-1)^2}$.

         x       |  x < 1/2  |  1/2  |  x > 1/2
         f'(x)   |     +     |   0   |     −
         f''(x)  |           |   −   |
         f(x)    |     ↗     |   1   |     ↘

       (4) $f''(x) = \frac{1}{\sin^2 x} > 0$.

         x       |  x < π/2  |  π/2  |  x > π/2
         f'(x)   |     −     |   0   |     +
         f''(x)  |           |   +   |
         f(x)    |     ↘     |   0   |     ↗
Solutions to equations
We can draw the graphs of $f(x) = 1 - x^2$ and $g(x) = e^x - 1$, and prove that there are exactly two
solutions of the equation $f(x) = g(x)$.
    Indeed, let us consider the function $h(x) = g(x) - f(x) = e^x + x^2 - 2$; it suffices to find
all $x$ such that $h(x) = 0$. We have $\lim_{x\to\pm\infty} h(x) = \infty$ and $h(0) = g(0) - f(0) = (1 - 1) - 1 = -1$.
By the intermediate value theorem, there are solutions in $x > 0$ and in $x < 0$.
Moreover, $h'(x) = e^x + 2x$, hence there is only one stationary point (because $h'(x)$
is positive for $x > 0$ and negative for sufficiently negative $x$, while $h''(x) = e^x + 2$ is positive,
therefore, $h'(x)$ is monotonically increasing). Therefore, $h(x)$ is decreasing
on a negative half line and increasing on the rest, hence there are only two points $x$ where
$h(x) = 0$.
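The argument above shows that exactly two solutions exist but does not give their values. The intermediate value theorem also suggests how to approximate them: by repeatedly halving an interval at whose endpoints $h$ has opposite signs. The following is a minimal sketch of this bisection; the helper name `bisect`, the tolerance and the starting intervals $[-2, 0]$ and $[0, 1]$ (where $h(-2) > 0$, $h(0) < 0$, $h(1) > 0$) are our own choices.

```python
import math


def h(x):
    """h(x) = e^x + x^2 - 2, whose zeros are the solutions of f(x) = g(x)."""
    return math.exp(x) + x * x - 2.0


def bisect(func, lo, hi, tol=1e-12):
    """Approximate a zero of func in [lo, hi], assuming a sign change on the interval."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the given interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # a zero lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # a zero lies in [mid, hi]
    return (lo + hi) / 2.0


negative_root = bisect(h, -2.0, 0.0)
positive_root = bisect(h, 0.0, 1.0)
print("negative solution:", negative_root, "h =", h(negative_root))
print("positive solution:", positive_root, "h =", h(positive_root))
```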
    Figure 60: The graph of $h(x) = e^x + x^2 - 2$. It crosses the x-axis twice and only twice.
   • Among all rectangles of given perimeter $2r$, which one has the largest area? Let the vertical
     side be $x$; then $0 \le x \le r$ and the other side is $r - x$, hence the area is $x(r - x)$. We need
     to find the maximum of $f(x) = x(r - x)$ on the domain $\{x : 0 < x < r\}$. We have
     $\lim_{x\to 0} f(x) = \lim_{x\to r} f(x) = 0$, while $f'(x) = r - 2x$, and hence there is a stationary point
     at $x = \frac{r}{2}$, and $f''(x) = -2$, hence this is a local maximum. There are no other stationary
     points, and $f(0) = f(r) = 0$, hence this is the maximum.
   • The geometric mean $\sqrt{ab}$ is smaller than or equal to the arithmetic mean $\frac{a+b}{2}$. Let us fix
     $P = \sqrt{ab}$ and put $a = x$, then $b = \frac{P^2}{x}$ and $0 < x$. Let us find the minimum of $f(x) = \frac{x + \frac{P^2}{x}}{2}$.
     This tends to $\infty$ as $x \to 0$ or $x \to \infty$. On the other hand, $f'(x) = \frac{1}{2}(1 - \frac{P^2}{x^2})$, and hence there
     is only one stationary point at $x = P$, and $f''(x) = \frac{P^2}{x^3} > 0$, hence this is a local minimum,
     and is the minimum. At $x = P$, we have $f(P) = P$. Hence we have $P \le \frac{x + \frac{P^2}{x}}{2}$.
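The inequality between the two means can be sampled for a fixed geometric mean. The sketch below takes $P = 3$ and a grid of values of $x$, both our own illustrative choices, and checks that $f(x) = \frac{x + P^2/x}{2}$ never falls below $P$ on the grid; this illustrates the result and does not replace the proof.

```python
def half_sum(x, p):
    """f(x) = (x + p^2 / x) / 2, the arithmetic mean of a = x and b = p^2 / x."""
    return (x + p * p / x) / 2.0


p = 3.0                                        # fixed geometric mean (illustrative value)
samples = [0.1 * k for k in range(1, 200)]     # grid of x values in (0, 20)
values = [half_sum(x, p) for x in samples]
smallest = min(values)

print("smallest sampled value:", smallest)
print("attained near x =", samples[values.index(smallest)])
print("f(P) =", half_sum(p, p))
```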
Theorem 122 (Bernoulli-de l’Hôpital, case 1). Let $a < x_0$, $f, g$ differentiable in $(a, x_0)$ such
that $g'(x) \neq 0$ for $x$ sufficiently close to $x_0$, $x \neq x_0$, $\lim_{x\to x_0^-} f(x) = \lim_{x\to x_0^-} g(x) = 0$,
$\lim_{x\to x_0^-} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$. Then $g(x) \neq 0$ for $x$ close to $x_0$, $x \neq x_0$ and $\lim_{x\to x_0^-} \frac{f(x)}{g(x)} = L$.
  Figure 61: Theorem of de l’Hôpital. The limit $\lim_{x\to x_0^-} \frac{f(x)}{g(x)}$ is determined by $\lim_{x\to x_0} \frac{f'(x)}{g'(x)}$.
Proof. We can extend $f, g$ to $(a, x_0]$ by putting $f(x_0) = g(x_0) = 0$, such that they are continuous.
By the hypothesis we may assume that $g'(x) \neq 0$ in $(b, x_0)$. Let $x \in (b, x_0)$; by Lagrange’s mean
value theorem, there is $y \in (x, x_0)$ such that $g(x) = g(x) - g(x_0) = g'(y)(x - x_0) \neq 0$, in
particular, $g(x) \neq 0$.
    By Cauchy’s mean value theorem, for $x$ above, there is $y \in (x, x_0)$ such that
$f'(y)(g(x) - g(x_0)) = g'(y)(f(x) - f(x_0))$, that is,
    • $\lim_{x\to 0} \frac{x}{e^x - 1} = \frac{1}{e^0} = 1$.
    • $\lim_{x\to 0} \frac{x^2}{\cos x - 1} = \lim_{x\to 0} \frac{2x}{-\sin x} = \frac{2}{-\cos 0} = -2$.
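Both limits can be checked by evaluating the original quotients and the quotients of derivatives at small $x$. This is a numerical illustration under our own assumptions (the function names and the sample points are ours); for much smaller $x$ the subtractions $e^x - 1$ and $\cos x - 1$ lose accuracy in floating point, so the sample points are kept moderate.

```python
import math


def ratio_1(x):
    """x / (e^x - 1), the first quotient."""
    return x / (math.exp(x) - 1.0)


def derivative_ratio_1(x):
    """1 / e^x, the quotient of the derivatives."""
    return 1.0 / math.exp(x)


def ratio_2(x):
    """x^2 / (cos x - 1), the second quotient."""
    return x * x / (math.cos(x) - 1.0)


def derivative_ratio_2(x):
    """2x / (-sin x), the quotient of the derivatives."""
    return 2.0 * x / (-math.sin(x))


for x in (1e-1, 1e-2, 1e-3):
    print("x =", x)
    print("  first:  f/g =", ratio_1(x), " f'/g' =", derivative_ratio_1(x))
    print("  second: f/g =", ratio_2(x), " f'/g' =", derivative_ratio_2(x))
```

As $x$ decreases, both columns of the first example approach 1 and both columns of the second approach $-2$.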
Theorem 124 (Bernoulli-de l’Hôpital, case 2). Let $f, g$ be differentiable in $(a, \infty)$ such that $g'(x) \neq 0$
for $x$ sufficiently large, $\lim_{x\to\infty} f(x) = \lim_{x\to\infty} g(x) = 0$, $\lim_{x\to\infty} \frac{f'(x)}{g'(x)} = L$. Then $g(x) \neq 0$
for $x$ sufficiently large and $\lim_{x\to\infty} \frac{f(x)}{g(x)} = L$.
as desired.
Example 125.                  • $\lim_{x\to\infty} \frac{\sin(\frac{1}{x^2})}{\frac{1}{x^2}} = \lim_{x\to\infty} \frac{-\frac{2}{x^3} \cos(\frac{1}{x^2})}{-\frac{2}{x^3}} = 1$.
Theorem 126 (Bernoulli-de l’Hôpital, case 3). Let $a < x_0$, $f, g$ differentiable in $(a, x_0)$ such that
$g'(x) \neq 0$ for $x$ sufficiently close to $x_0$, $\lim_{x\to x_0} f(x) = \lim_{x\to x_0} g(x) = +\infty$, $\lim_{x\to x_0} \frac{f'(x)}{g'(x)} = L$.
Then $g(x) \neq 0$ for $x$ sufficiently close to $x_0$ and $\lim_{x\to x_0} \frac{f(x)}{g(x)} = L$.
   Figure 62: Theorem of de l’Hôpital. The limit $\lim_{x\to\infty} \frac{f(x)}{g(x)}$ is determined by $\lim_{x\to\infty} \frac{f'(x)}{g'(x)}$.
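Proof. Let $\epsilon > 0$. By the hypothesis, there is $b$ such that $\left| \frac{f'(y)}{g'(y)} - L \right| < \frac{\epsilon}{3}$ for $y \in (b, x_0)$. In
addition, there is $\tilde{b}$ such that $b < \tilde{b} < x_0$ and in $(\tilde{b}, x_0)$ $f(x) > 2f(b) > 0$, $g(x) > 2g(b) > 0$.
Then the function $h(x) = \frac{1 - \frac{g(b)}{g(x)}}{1 - \frac{f(b)}{f(x)}}$ is continuous on $(\tilde{b}, x_0]$ and its value at $x_0$ is 1. Furthermore,
it holds that
\[
  \frac{f(x) - f(b)}{g(x) - g(b)} \cdot h(x) = \frac{f(x) - f(b)}{g(x) - g(b)} \cdot \frac{1 - \frac{g(b)}{g(x)}}{1 - \frac{f(b)}{f(x)}} = \frac{f(x)}{g(x)}.
\]
Let $\tilde{\tilde{b}}$ be such that $|h(x) - 1| < \frac{\epsilon}{3L+1}$ for $x \in (\tilde{\tilde{b}}, x_0)$. By Cauchy’s mean value theorem, there is
$y \in (b, x)$ such that
\[
  \frac{f(x)}{g(x)} = \frac{f(x) - f(b)}{g(x) - g(b)} \cdot h(x) = \frac{f'(y)}{g'(y)} h(x).
\]
Now
\[
  \left| \frac{f(x)}{g(x)} - L \right| = \left| \frac{f'(y)}{g'(y)} h(x) - L \right| < \left| \frac{f'(y)}{g'(y)} - L \right| \left( 1 + \frac{\epsilon}{3L+1} \right) + L|h(x) - 1| < \frac{\epsilon}{3} + \frac{\epsilon^2}{3} + \frac{\epsilon}{3} < \epsilon.
\]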
Theorem 127 (Bernoulli-de l’Hôpital, case 4). Let $f, g$ be differentiable in $(a, \infty)$ such that $g'(x) \neq 0$ for sufficiently large $x$, $\lim_{x\to\infty} f(x) = \lim_{x\to\infty} g(x) = +\infty$ and $\lim_{x\to\infty} \frac{f'(x)}{g'(x)} = L$. Then $g(x) \neq 0$ for $x$ sufficiently large and $\lim_{x\to\infty} \frac{f(x)}{g(x)} = L$.
Example 128.   • Let us compute $\lim_{x\to\infty} \frac{x^2}{e^x}$. If the limit $\lim_{x\to\infty} \frac{2x}{e^x}$ exists, then by the de l’Hôpital rule, they should coincide. The latter exists if $\lim_{x\to\infty} \frac{2}{e^x}$ exists, and it does: it is 0. Therefore, the second limit exists and it is 0, and hence the first limit exists and it is 0.
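   • $\lim_{x\to 0} \frac{\sin 2x}{\sin x} = \lim_{x\to 0} \frac{2\cos 2x}{\cos x} = 2$.
   • $\lim_{x\to 0} \frac{\log x}{1/\tan x} = \lim_{x\to 0} \frac{1/x}{-1/\sin^2 x} = 0$.
   • $\lim_{x\to 0} \frac{\log(\sin x)}{\log x} = \lim_{x\to 0} \frac{\cos x/\sin x}{1/x} = 1$.
   • $\lim_{x\to\infty} \frac{x^n}{e^x} = 0$.
   • $\lim_{x\to\infty} \frac{\log\cosh x}{x} = \lim_{x\to\infty} \frac{\sinh x/\cosh x}{1} = 1$.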
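The limits in Example 128 can be sanity-checked numerically. The following Python sketch evaluates some of the quotients close to the limit point; the function names are our own and not part of the lecture, and such a check only makes the result plausible, it does not replace the proof.

```python
import math

def sin_ratio(x):
    """sin(2x) / sin(x), expected to approach 2 as x -> 0."""
    return math.sin(2 * x) / math.sin(x)

def log_cosh_ratio(x):
    """log(cosh x) / x, expected to approach 1 as x -> infinity."""
    return math.log(math.cosh(x)) / x

def power_over_exp(x, n):
    """x**n / e**x, expected to approach 0 as x -> infinity."""
    return x ** n / math.exp(x)

# Approach the limit points step by step and print the quotients.
for x in (0.1, 0.01, 0.001):
    print(x, sin_ratio(x))
for x in (10.0, 100.0, 500.0):
    # x is kept below about 700 so that cosh and exp do not overflow.
    print(x, log_cosh_ratio(x), power_over_exp(x, 3))
```

The printed values should move towards 2, 1 and 0 respectively as $x$ approaches the limit point.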
   • $f(x) = O(g(x))$ (as $x \to x_0$) if there is $M > 0$ such that $|f(x)| \le M|g(x)|$ in a neighbourhood of $x_0$.
   • $f(x) = o(g(x))$ (as $x \to x_0$) if $\lim_{x\to x_0} \frac{f(x)}{g(x)} = 0$.
   • $f_1(x) = f_2(x) + O(g(x))$ ($f_1(x) = f_2(x) + o(g(x))$, respectively) if $f_1(x) - f_2(x) = O(g(x))$ ($= o(g(x))$, respectively).
Similarly, let f, g : (a, ∞) → R, and suppose that g(x) 6= 0 for sufficiently large x (that is, there
is X > 0 such that g(x) 6= 0 if x > X). We write:
   • $f(x) = O(g(x))$ (as $x \to \infty$) if there is $M > 0$ such that $|f(x)| \le M|g(x)|$ for sufficiently large $x$.
   • $f(x) = o(g(x))$ (as $x \to \infty$) if $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 0$.
The cases for (−∞, a), or f (x) → 0 and the cases in I but f (x) → 0 are analogous.
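Example 130.   • If $n > 1$, $x^n = o(x)$ as $x \to 0$ (because $\lim_{x\to 0} \frac{x^n}{x} = 0$).
   • $x^n = o(x^m)$ as $x \to 0$ if $n > m$ (because $\lim_{x\to 0} \frac{x^n}{x^m} = 0$).
   • $x^m = o(x^n)$ as $x \to \infty$ if $n > m$ (because $\lim_{x\to\infty} \frac{x^m}{x^n} = 0$).
   • $\log x = o(x)$ as $x \to \infty$ (because $\lim_{x\to\infty} \frac{\log x}{x} = 0$).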
Lemma 131. Let us consider the behaviour x → x0 = 0 (other cases are analogous).
(a) Let a, b ∈ R. If f (x) = O(h(x)), g(x) = O(h(x)), then af (x) + bg(x) = O(h(x)).
(b) Let a, b ∈ R. If f (x) = o(h(x)), g(x) = o(h(x)), then af (x) + bg(x) = o(h(x)).
Figure 63: Landau’s symbol. $x^2 = o(x)$ as $x \to 0$, but $x = o(x^2)$ as $x \to \infty$.
(e) Let f (x) = o(h(x)) and f (0) = 0, limx→0 g(x) = 0. Then f (g(x)) = o(h(g(x))).
    (Similarly, if f (x) = O(h(x)), then f (g(x)) = O(h(g(x))))
Proof. (a) We have |f (x)| ≤ M1 |h(x)|, |g(x)| ≤ M2 |h(x)|, hence |af (x) + bg(x)| ≤ |a||f (x)| +
    |b||g(x)| ≤ (|a|M1 + |b|M2 )|h(x)|.
(b) Analogous.
(c) If $\lim_{x\to 0} \frac{g(x)}{h(x)} = 0$, then $\lim_{x\to 0} \frac{f(x)g(x)}{f(x)h(x)} = \lim_{x\to 0} \frac{g(x)}{h(x)} = 0$.
(d) If $\lim_{x\to 0} \frac{f(x)}{h(x)} = 0$, then $\left|\frac{f(x)}{h(x)}\right| < M$ for $x$ close enough to 0, hence $|f(x)| < M|h(x)|$.
Example 132. As x → 0,
   • $\sin(x^2) = O(x^2)$, because $\sin(y) = O(y)$ and we put $y = x^2$.
   • $e^{x^3} - 1 = O(x^3)$, because $e^y - 1 = O(y)$, and we put $y = x^3$.
   • $\sin^2(x) = O(x^2)$, because $\sin x = O(x)$, and hence $\sin^2 x = O(x \sin x) = O(x^2)$.
Figure 64: The second order Taylor formula. We approximate a general function by a second
order polynomial.
Example 134. As x → 0,
    • $e^x = 1 + x + \frac{x^2}{2} + o(x^2)$.
    • $\log(1 + x) = x - \frac{x^2}{2} + o(x^2)$.
    • $\sin(x) = x + o(x^2)$.
    • $\cos(x) = 1 - \frac{x^2}{2} + o(x^2)$.
Nov. 15. Higher order Taylor formula and more examples.
Higher order Taylor(-Peano) Formula
For f n-times differentiable, the following holds (as we prove later).
   With the convention $f^{(0)}(x) = f(x)$,
\[ f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{1}{2}(x - x_0)^2 f''(x_0) + \cdots + \frac{1}{n!}(x - x_0)^n f^{(n)}(x_0) + o((x - x_0)^n) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k + o((x - x_0)^n). \]
The part $\sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k$ is called the Taylor polynomial of $f$.
Lemma 135. Let $f, g$ be differentiable $n$ times in $(a, b)$ and $x_0 \in (a, b)$. Suppose that $g^{(k)}(x) \neq 0$ for $x \neq x_0$, $0 \le k \le n$, but $f^{(k)}(x_0) = g^{(k)}(x_0) = 0$ for $0 \le k \le n - 1$. Then for any $x \neq x_0$, $x \in (a, b)$, there is $\xi$ between $x$ and $x_0$ such that $\frac{f(x)}{g(x)} = \frac{f^{(n)}(\xi)}{g^{(n)}(\xi)}$.
Proof. By applying Cauchy’s mean value theorem repeatedly, we obtain
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f'(\xi_1)}{g'(\xi_1)} = \frac{f'(\xi_1) - f'(x_0)}{g'(\xi_1) - g'(x_0)} = \frac{f^{(2)}(\xi_2)}{g^{(2)}(\xi_2)} = \cdots = \frac{f^{(n)}(\xi_n)}{g^{(n)}(\xi_n)} \]
and we put $\xi = \xi_n$.
Proposition 136. Let $F$ be differentiable $n$ times at $x_0 \in (a, b)$. Then, $F(x) = o((x - x_0)^n)$ as $x \to x_0$ if and only if $F^{(k)}(x_0) = 0$ for $0 \le k \le n$.
Proof. We know this for $n = 0$ by definition. Let us prove the general case by induction, by assuming that it is true for $n$.
   Let $F(x) = o((x - x_0)^{n+1})$. Then $F^{(k)}(x_0) = 0$ for $0 \le k \le n$ by the hypothesis of induction. The assumption is $0 = \lim_{x\to x_0} \frac{F(x)}{(x - x_0)^{n+1}}$. On the other hand, $\frac{F(x)}{(x - x_0)^{n+1}} = \frac{F^{(n)}(\xi)}{(n+1)!(\xi - x_0)}$ for some $\xi$. If $x \to x_0$, then $\xi \to x_0$, that is,
\[ 0 = \lim_{x\to x_0} \frac{F(x)}{(x - x_0)^{n+1}} = \lim_{\xi\to x_0} \frac{F^{(n)}(\xi)}{(n+1)!(\xi - x_0)} = \lim_{\xi\to x_0} \frac{F^{(n)}(\xi) - F^{(n)}(x_0)}{(n+1)!(\xi - x_0)} = \frac{F^{(n+1)}(x_0)}{(n+1)!}, \]
hence $F^{(n+1)}(x_0) = 0$.
   Let $F^{(k)}(x_0) = 0$ for $0 \le k \le n + 1$. Then by the Bernoulli-de l’Hôpital theorem,
Corollary 137. Let $f(x)$ be differentiable $n$ times at $x_0 \in (a, b)$. Then with
\[ P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k, \]
      Figure 65: The graph of a function whose Taylor series converges but not to itself.
Example 138.     • $f(x) = e^x$. As $f^{(n)}(x) = e^x$, we have $f^{(n)}(0) = 1$ and hence $e^x = \sum_{k=0}^{n} \frac{x^k}{k!} + o(x^n)$. That is, $e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + o(x^n)$ as $x \to x_0 = 0$.
   • $f(x) = \sin x$. As $f^{(4n)}(x) = \sin x$, $f^{(4n+1)}(x) = \cos x$, $f^{(4n+2)}(x) = -\sin x$, $f^{(4n+3)}(x) = -\cos x$, we have $f^{(4n)}(0) = 0$, $f^{(4n+1)}(0) = 1$, $f^{(4n+2)}(0) = 0$, $f^{(4n+3)}(0) = -1$, and hence $\sin x = \sum_{k=0}^{n} \frac{(-1)^k x^{2k+1}}{(2k+1)!} + o(x^{2n+1})$. That is, $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + o(x^{2n+1})$ as $x \to x_0 = 0$.
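To see the meaning of the remainder $o(x^n)$ concretely, one can evaluate the Taylor polynomials of Example 138 numerically. The sketch below uses only the Python standard library; the function names `taylor_exp` and `taylor_sin` are our own choice for illustration.

```python
import math

def taylor_exp(x, n):
    """Taylor polynomial of e^x at x0 = 0 of order n: sum of x^k / k!."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def taylor_sin(x, n):
    """Taylor polynomial of sin x at x0 = 0 of order 2n + 1."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n + 1))

for x in (0.5, 0.1, 0.01):
    # Remainder divided by x^3: it should tend to 0 as x -> 0,
    # which is what "remainder = o(x^3)" means.
    r_exp = (math.exp(x) - taylor_exp(x, 3)) / x ** 3
    r_sin = (math.sin(x) - taylor_sin(x, 1)) / x ** 3
    print(x, r_exp, r_sin)
```

The printed quotients become smaller as $x$ decreases, in agreement with the formula. For very small $x$ the floating-point rounding error dominates, so the check is meaningful only for moderately small $x$.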
   Very often, the Taylor series converges to the original function $f(x)$, that is,
\[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]
holds for some functions ($e^x$, $\sin x$, $\cos x$, $\log(1 + x)$) and for some $x$. One example for which we can easily show such convergence is $f(x) = \frac{1}{1-x}$. Indeed, $f'(x) = \frac{1}{(1-x)^2}$, $f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}$. And hence the
      As $x \to x_0 = 0$, we have
        – $e^x = 1 + x + \frac{x^2}{2} + o(x^2)$
              Figure 66: Approximating the area surrounded by f (x) by rectangles.
         – $\sin y = y + o(y)$
         – $\sin(x^2) = x^2 + o(x^2)$
       Then it holds, as $x \to 0$,
\[ \frac{e^x - 1 - x}{\sin(x^2)} = \frac{\frac{x^2}{2} + o(x^2)}{x^2 + o(x^2)}, \]
       hence $\lim_{x\to 0} \frac{e^x - 1 - x}{\sin(x^2)} = \frac{1}{2}$.
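   •
\[ \lim_{x\to 0} \frac{x - \ln(1 - x) - 2x\sqrt{1 + x}}{\sin(x) - x e^x} \]
       As $x \to x_0 = 0$, we have
         – $\ln(1 - x) = -x - \frac{x^2}{2} + o(x^2)$
         – $\sqrt{1 + x} = 1 + \frac{x}{2} + o(x)$
         – $\sin(x) = x + o(x^2)$
         – $e^x = 1 + x + o(x)$
       Then it holds, as $x \to 0$,
\[ \frac{x - \ln(1 - x) - 2x\sqrt{1 + x}}{\sin(x) - x e^x} = \frac{-\frac{1}{2}x^2 + o(x^2)}{-x^2 + o(x^2)}, \]
       hence $\lim_{x\to 0} \frac{x - \ln(1 - x) - 2x\sqrt{1 + x}}{\sin(x) - x e^x} = \frac{1}{2}$.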
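Both limits computed above with Taylor formulas can be checked numerically by evaluating the quotients at small values of $x$. This is a minimal sketch with function names of our own choosing; note that $x$ should not be taken too small, because the numerators are differences of nearly equal numbers and lose precision in floating-point arithmetic.

```python
import math

def first_quotient(x):
    """(e^x - 1 - x) / sin(x^2), expected to approach 1/2 as x -> 0."""
    return (math.exp(x) - 1 - x) / math.sin(x ** 2)

def second_quotient(x):
    """(x - ln(1-x) - 2x sqrt(1+x)) / (sin x - x e^x), expected limit 1/2."""
    numerator = x - math.log(1 - x) - 2 * x * math.sqrt(1 + x)
    denominator = math.sin(x) - x * math.exp(x)
    return numerator / denominator

for x in (0.1, 0.01, 0.001):
    print(x, first_quotient(x), second_quotient(x))
```

Both columns should approach 0.5, with a deviation roughly proportional to $x$.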
We denote this by $P' \succ P$.
Figure 67: Left: a partition $P$ of an interval $I$ and a refinement $P' \succ P$. Right: two partitions $P, P'$ of $I$ and $P \wedge P'$.
$P_{n'} \succ P_n$. $\mathrm{diam}(P_n) = \frac{1}{n}$.
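   • Let $f(x) = x$.
\[ \overline{S}_I(f, P_n) = \sum_{j=1}^{n} \frac{j}{n} \cdot \frac{1}{n} = \frac{1}{n^2} \cdot \frac{n(n+1)}{2}. \]
      Analogously,
\[ \underline{S}_I(f, P_n) = \sum_{j=1}^{n} \frac{j-1}{n} \cdot \frac{1}{n} = \frac{1}{n^2} \cdot \left(\frac{n(n+1)}{2} - n\right) = \frac{1}{n^2} \cdot \frac{n(n-1)}{2}. \]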
Figure 69: The upper and lower sum for $f(x) = a$ (constant) and $f(x) = x$.
 (i) If $P \prec P'$, then
\[ \underline{S}_I(f, P) \le \underline{S}_I(f, P') \le \overline{S}_I(f, P') \le \overline{S}_I(f, P). \]
Definition 143. Let $I = (a, b)$ or $[a, b]$ etc. be a bounded interval and $f : I \to \mathbb{R}$ bounded. $f$ is said to be integrable on $I$ if
\[ \inf_P \overline{S}_I(f, P) = \sup_P \underline{S}_I(f, P), \]
where $\inf_P$ and $\sup_P$ are taken over all possible partitions $P$ of $I$, and in this case we denote this number by
\[ \int_I f(x)dx = \int_a^b f(x)dx. \]
The variable $x$ does not have any meaning by itself, and one can also write this as $\int_I f(t)dt$.
Example 144.       • $\int_0^1 a\,dx = a$. Indeed, for all partitions $\overline{S}_I(f, P) = \underline{S}_I(f, P) = a$.
   • $\int_0^1 x\,dx = \frac{1}{2}$. Indeed, with $f(x) = x$, we have found $P_n$ such that $\underline{S}_I(f, P_n) = \frac{n(n-1)}{2n^2}$ and $\overline{S}_I(f, P_n) = \frac{n(n+1)}{2n^2}$, hence the sup and the inf coincide and it is $\frac{1}{2}$.
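The upper and lower sums of $f(x) = x$ on $[0, 1]$ can be computed exactly for the partition into $n$ subintervals of equal length. The following sketch uses Python's `fractions` module so that the results can be compared with the closed formulas without rounding; the function name `upper_lower_sums` is ours.

```python
from fractions import Fraction

def upper_lower_sums(n):
    """Upper and lower sums of f(x) = x on [0, 1] for the partition
    into n subintervals of equal length 1/n."""
    width = Fraction(1, n)
    # f is increasing, so on the j-th subinterval the supremum is j/n
    # and the infimum is (j-1)/n.
    upper = sum(Fraction(j, n) * width for j in range(1, n + 1))
    lower = sum(Fraction(j - 1, n) * width for j in range(1, n + 1))
    return upper, lower

for n in (1, 10, 100, 1000):
    upper, lower = upper_lower_sums(n)
    print(n, float(lower), float(upper), upper - lower)
```

The difference between the upper and the lower sum is exactly $\frac{1}{n}$, so both are squeezed towards $\frac{1}{2}$ as $n$ grows.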
   In general, it is difficult to show integrability by definition. Fortunately, we can prove that
continuous functions on a closed bounded interval are integrable, and we also have the funda-
mental theorems of calculus, that let us calculate integrals with the knowledge of derivatives.
n} < δ, we have
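\[ \overline{S}_I(f, P) - \underline{S}_I(f, P) = \sum_{j=1}^{n} \left(\sup_{I_j} f - \inf_{I_j} f\right)|I_j| < \sum_{j=1}^{n} \frac{\varepsilon}{2(b - a)}|I_j| = \frac{\varepsilon}{2} < \varepsilon. \]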
Therefore, f is integrable.
Example 146. (of a nonintegrable function) If
\[ f(x) = \begin{cases} 0 & x \text{ rational} \\ 1 & x \text{ irrational} \end{cases}, \]
then $f(x)$ is not integrable: $\overline{S}_I(f, P) = 1$, $\underline{S}_I(f, P) = 0$, because any interval of positive length contains both a rational number and an irrational number.
Proposition 147. Let I be a bounded interval, f, g bounded and integrable on I.
  (i) If $c, d \in \mathbb{R}$, then $cf + dg$ is integrable on $I$ and $\int_I (cf(t) + dg(t))dt = c\int_I f(t)dt + d\int_I g(t)dt$.
 (ii) If $f \le g$, then $\int_I f(t)dt \le \int_I g(t)dt$.
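(iii) Let $\tilde I \subset I$. Then, for any partition $P$ of $I$, we can take a refinement $P'$ which consists of the intervals of the form $I_j \cap \tilde I$ and $I_j \setminus \tilde I$ (the latter may be a union of two intervals). Let $P_2 = \{I_j \cap \tilde I : 1 \le j \le n\}$. Then
\[ \overline{S}_{\tilde I}(f, P_2) - \underline{S}_{\tilde I}(f, P_2) \le \overline{S}_{\tilde I}(f, P_2) - \underline{S}_{\tilde I}(f, P_2) + \sum_{j=1}^{n} \left(\sup_{I_j \setminus \tilde I} f - \inf_{I_j \setminus \tilde I} f\right)|I_j \setminus \tilde I| = \overline{S}_I(f, P') - \underline{S}_I(f, P'). \]
      If $\overline{S}_{I_1}(f, P_1) - \underline{S}_{I_1}(f, P_1) < \frac{\varepsilon}{2}$ and $\overline{S}_{I_2}(f, P_2) - \underline{S}_{I_2}(f, P_2) < \frac{\varepsilon}{2}$, then $\overline{S}_I(f, P_1 \cup P_2) - \underline{S}_I(f, P_1 \cup P_2) < \varepsilon$ and the limits coincide: $\int_I f(x)dx = \int_{I_1} f(x)dx + \int_{I_2} f(x)dx$.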
Corollary 148. If $f$ is continuous, then $|f|$ is continuous and by Proposition 147(ii), $\left|\int_I f(x)dx\right| \le \int_I |f(x)|dx$.
Definition 149. If $a < b$, then we put $\int_b^a f(x)dx = -\int_a^b f(x)dx$.
Lemma 150. It holds that $\int_a^b f(x)dx = \int_a^c f(x)dx + \int_c^b f(x)dx$ for all $a, b, c \in \mathbb{R}$.
Proof. If $a < c < b$, then this follows from 147(iii). If $a < b < c$, then
\[ \int_a^c f(x)dx = \int_a^b f(x)dx + \int_b^c f(x)dx = \int_a^b f(x)dx - \int_c^b f(x)dx. \]
Theorem 151 (Fundamental theorem of calculus 1). Let $I = [a, b]$ be a bounded closed interval and $f : I \to \mathbb{R}$ continuous. Then the function of $x$ on $I$ defined by
\[ F(x) = \int_a^x f(t)dt \]
is differentiable in $(a, b)$ and $F'(x) = f(x)$.
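Proof. We have
\[ \frac{F(x + h) - F(x)}{h} = \frac{1}{h}\left(\int_a^{x+h} f(t)dt - \int_a^x f(t)dt\right) = \frac{1}{h}\int_x^{x+h} f(t)dt. \]
As $f$ is continuous, $\left|\int_x^{x+h} f(t)dt - f(x)h\right| = \left|\int_x^{x+h} f(t)dt - \int_x^{x+h} f(x)dt\right| \le \int_x^{x+h} |f(t) - f(x)|dt$ and for any $\varepsilon > 0$ there is $\delta > 0$ such that if $|t - x| < \delta$, then $|f(t) - f(x)| < \varepsilon$, therefore, for $0 < h < \delta$, $\int_x^{x+h} |f(t) - f(x)|dt < \varepsilon h$. Then, for such $h$, $\left|\frac{F(x+h) - F(x)}{h} - f(x)\right| = \left|\frac{1}{h}\int_x^{x+h} f(t)dt - f(x)\right| < \varepsilon$ (the case $h < 0$ is analogous). Since $\varepsilon > 0$ is arbitrary, $\lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = f(x)$.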
Theorem 152 (Fundamental theorem of calculus 2). Let $I = [a, b]$ be a closed bounded interval and $f : I \to \mathbb{R}$ differentiable in $(a, b)$, and suppose that $f'$ is continuous and extends to a continuous function on $[a, b]$. Then
\[ f(x) - f(a) = \int_a^x f'(t)dt. \]
Proof. $D(f(x) - f(a)) = f'(x)$, while $D\left(\int_a^x f'(t)dt\right) = f'(x)$ by Theorem 151. Therefore, $D\left(f(x) - f(a) - \int_a^x f'(t)dt\right) = 0$, and by Corollary 106, $f(x) - f(a) - \int_a^x f'(t)dt$ is constant, but with $x = a$, $f(a) - f(a) - \int_a^a f'(t)dt = 0$, hence $f(x) - f(a) - \int_a^x f'(t)dt = 0$.
    By Corollary 106, if there are two primitives $F, G$ of $f$, then $F(x) - G(x)$ is a constant.
    By Theorem 145 and Theorem 151 there is a primitive if $f$ is continuous: $F(x) = \int_a^x f(t)dt$ is a primitive of $f$.
Corollary 154. Let $f$ be a continuous function on a closed bounded interval $I = [a, b]$, and $F$ be a primitive of $f$. Then $\int_a^b f(t)dt = F(b) - F(a)$.
Proof. Let $\tilde F(x) = \int_a^x f(t)dt$. Then $\tilde F$ is a primitive of $f$, and hence $\tilde F(x) - F(x) = c$ (constant). By Theorem 152, $\int_a^b f(t)dt = \tilde F(b) - \tilde F(a) = (F(b) + c) - (F(a) + c) = F(b) - F(a)$.
    This tells us a way to compute the integral $\int_a^b f(x)dx$: we only have to find a primitive $F(x)$ of $f(x)$ and take the difference $F(b) - F(a)$. This is denoted by $[F(x)]_a^b$. Namely,
\[ \int_a^b f(x)dx = [F(x)]_a^b. \]
    A primitive of $f$ is also written as $\int f(x)dx$ (up to a constant) and it is called the indefinite integral of $f$. With a generic constant it is written, for example, $\int x\,dx = \frac{x^2}{2} + C$. In contrast, $\int_a^b f(x)dx$ (the integral in the interval $[a, b]$) is called a definite integral.
   • Let $f(x) = x^n$, $n \in \mathbb{N}$. Then $F(x) = \frac{x^{n+1}}{n+1}$ is a primitive of $f(x)$. That is, $\int x^n dx = \frac{x^{n+1}}{n+1} + C$.
   • Let $f(x) = \sin x$. Then $F(x) = -\cos x$ is a primitive of $f(x)$. That is, $\int \sin x\,dx = -\cos x + C$.
   • Let $f(x) = \cos x$. Then $F(x) = \sin x$ is a primitive of $f(x)$. That is, $\int \cos x\,dx = \sin x + C$.
   • Let $f(x) = \frac{1}{x}$ (for $x > 0$). Then $F(x) = \log x$ is a primitive of $f(x)$. That is, $\int \frac{1}{x}dx = \log x + C$.
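   • $\int_0^\pi \sin x\,dx = [-\cos x]_0^\pi = -(-1) - (-1) = 2$.
   • $\int_0^\pi \cos x\,dx = [\sin x]_0^\pi = 0 - 0 = 0$.
   • $\int_1^2 e^x dx = [e^x]_1^2 = e^2 - e$.
   • $\int_1^2 \frac{1}{x}dx = [\log x]_1^2 = \log 2 - \log 1 = \log 2$.
   • $\int_1^2 \frac{1}{x^2}dx = \left[-\frac{1}{x}\right]_1^2 = -\frac{1}{2} - \left(-\frac{1}{1}\right) = \frac{1}{2}$.
Note that $\int_{-1}^{2} \frac{1}{x}dx$ is not defined: $\frac{1}{x}$ is not bounded, hence not integrable, on $[-1, 2]$!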
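The values obtained from primitives can be compared with a direct numerical approximation of the integral. The sketch below uses a midpoint Riemann sum with equal subintervals; this is just one simple choice of approximation (not the upper or lower sum of the definition), and the function name `midpoint_sum` and the default number of subintervals are our assumptions.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] by a Riemann sum that
    evaluates f at the midpoint of each of n equal subintervals."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

# Each line prints the numerical value next to F(b) - F(a).
print(midpoint_sum(math.sin, 0.0, math.pi), 2.0)
print(midpoint_sum(math.cos, 0.0, math.pi), 0.0)
print(midpoint_sum(math.exp, 1.0, 2.0), math.e ** 2 - math.e)
print(midpoint_sum(lambda x: 1 / x, 1.0, 2.0), math.log(2))
print(midpoint_sum(lambda x: 1 / x ** 2, 1.0, 2.0), 0.5)
```

In each line the two numbers should agree to several decimal places.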
• If DF (x) = f (x), DG(x) = g(x), then D(F (x) + G(x)) = f (x) + g(x).
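   • $\int_0^\pi \sin\left(x - \frac{\pi}{3}\right)dx = \left[-\cos\left(x - \frac{\pi}{3}\right)\right]_0^\pi = -\cos\frac{2\pi}{3} - \left(-\cos\left(-\frac{\pi}{3}\right)\right) = \frac{1}{2} + \frac{1}{2} = 1$.
   • $\int_1^2 \frac{2}{x+1}dx = 2[\log(x + 1)]_1^2 = 2(\log 3 - \log 2)$.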
Theorem 157. Let $f$ be a continuous function on $I = [a, b]$. Then there is $c \in (a, b)$ such that $\int_a^b f(x)dx = f(c)(b - a)$.
Proof. Note that $F(x) = \int_a^x f(t)dt$ is differentiable and $F'(x) = f(x)$. By Lagrange’s mean value theorem, there is $c \in (a, b)$ such that $F'(c) = \frac{F(b) - F(a)}{b - a}$, that is, $f(c)(b - a) = \int_a^b f(x)dx$.
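As an illustration of Theorem 157, one can look for the point $c$ numerically. The sketch below approximates the integral by a midpoint Riemann sum and then finds $c$ by bisection; it assumes in addition that $f$ is increasing on $[a, b]$, which the theorem does not require but which makes bisection applicable. All function names are ours.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

def mean_value_point(f, a, b, tol=1e-10):
    """Find c in (a, b) with f(c)(b - a) equal to the (approximate) integral.
    Assumption: f is continuous and increasing on [a, b]."""
    mean = midpoint_sum(f, a, b) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < mean:
            lo = mid   # f is still below the mean value: c lies to the right
        else:
            hi = mid   # f has reached the mean value: c lies to the left
    return (lo + hi) / 2

c = mean_value_point(lambda x: x * x, 0.0, 1.0)
print(c, 1 / math.sqrt(3))
```

For $f(x) = x^2$ on $[0, 1]$ the integral is $\frac{1}{3}$, so $c$ should satisfy $c^2 = \frac{1}{3}$, that is, $c = \frac{1}{\sqrt{3}}$.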
Some applications
Let us imagine a car travelling at the speed $v(t)$ at time $t$. Then the distance travelled from time $a$ to time $b$ is $\int_a^b v(t)dt$. Indeed, if $x(t)$ is the position of the car at time $t$, then we have $x'(t) = v(t)$ by definition. By Theorem 152, we have $x(b) - x(a) = \int_a^b v(t)dt$.
    As another example from physics, consider the situation where someone is pushing up vertically a mass $m$ (kg) to a certain height $h$. The work done by this motion is, as far as the gravitational force is constant, $mgh$, where $g$ is the gravitational acceleration. $mg$ is called the weight, which is the downward force. If one is pushing a mass in a changing gravitational field $g(x)$ (like a rocket carrying a payload), the work done by this motion is $\int_{h_1}^{h_2} m g(x)dx$.
Integration by parts
Recall that, if f, g are differentiable, then it holds that D(f (x)g(x)) = Df (x)g(x) + f (x)Dg(x).
By writing this as Df (x)g(x) = D(f (x)g(x)) − f (x)Dg(x), we can find a primitive of Df (x)g(x)
if we know a primitive of f (x)Dg(x). Schematically,
\[ \int f'(x)g(x)dx = f(x)g(x) - \int f(x)g'(x)dx + C. \]
     We can check this result by taking the derivative: $D(x \sin x + \cos x) = \sin x + x \cos x - \sin x = x \cos x$.
   • Consider $\int \log x\,dx$. We can see this as $1 \cdot \log x$, and $1 = D(x)$. Therefore, with $f(x) =$
   • Consider $\int x^2 \sin x\,dx$. This cannot be integrated in one step, but by successive applications of integration by parts. By noting that $\sin x = D(-\cos x)$ and $\cos x = D(\sin x)$,
\[ \int x^2 \sin x\,dx = x^2(-\cos x) - \int 2x(-\cos x)dx + C = -x^2 \cos x + 2x \sin x - 2\int \sin x\,dx + C \]
    As for definite integrals, we do not have to find the whole indefinite integral, but we can give values to parts. Let us recall that $f(b) - f(a) = \int_a^b f'(x)dx$.
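A result of integration by parts can always be checked by differentiating it. For the example $\int x^2 \sin x\,dx$ above, carrying out the last step ($-2\int \sin x\,dx = 2\cos x$, our completion of the computation) gives the candidate primitive $-x^2 \cos x + 2x \sin x + 2\cos x$. The sketch below compares its numerical derivative with $x^2 \sin x$; the central difference quotient and the step size are our choices for illustration.

```python
import math

def candidate_primitive(x):
    """Candidate primitive of x^2 sin x obtained by integrating by parts twice."""
    return -x ** 2 * math.cos(x) + 2 * x * math.sin(x) + 2 * math.cos(x)

def numerical_derivative(F, x, h=1e-5):
    """Central difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-2.0, 0.5, 1.3, 3.0):
    # The two printed values should agree up to a small numerical error.
    print(x, numerical_derivative(candidate_primitive, x), x ** 2 * math.sin(x))
```

Agreement at a few sample points does not prove the formula, but a sign error in one of the integration by parts steps would show up immediately.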
Substitution
Next, let us consider the case where the integral is of the form $\int \varphi'(x) f'(\varphi(x))dx$. We know that $D(f(\varphi(x))) = \varphi'(x) f'(\varphi(x))$ by the chain rule, hence in this case we have
\[ \int \varphi'(x) f'(\varphi(x))dx = f(\varphi(x)) + C. \]
Example 161.   • Consider $\int 2x \sin(x^2)dx$. Note that $2x = D(x^2)$ and $\sin(y) = D(-\cos y)$, hence
\[ \int 2x \sin(x^2)dx = -\cos(x^2) + C. \]
     Indeed, by the chain rule, $D(-\cos(x^2)) = -(2x(-\sin(x^2))) = 2x \sin(x^2)$.
\[ \int \frac{x}{x^2 + 1}dx = \frac{1}{2}\int \frac{2x}{x^2 + 1}dx = \frac{1}{2}\int \frac{D(x^2)}{x^2 + 1}dx = \frac{1}{2}\log(x^2 + 1). \]
   • Consider $\int \tan x\,dx$. Recall that $\tan x = \frac{\sin x}{\cos x}$ and note that $D(\cos x) = -\sin x$. Hence
\[ \int \tan x\,dx = -\int D(\cos x) \cdot \frac{1}{\cos x}dx + C = -\log|\cos x| + C. \]
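Example 163.
\[ \int_0^\pi \sin^3 x\,dx = -\int_0^\pi (\cos^2 x - 1)\sin x\,dx = \int_0^\pi ((\cos x)^2 - 1) D(\cos x)dx = \left[\frac{\cos^3 x}{3} - \cos x\right]_0^\pi = \left(\frac{(-1)^3}{3} - (-1)\right) - \left(\frac{1^3}{3} - 1\right) = \frac{4}{3}. \]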
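   • $\int \frac{1}{x-a}\,dx = \log|x-a|$.
   • for $n \in \mathbb{N}$, $n \ge 2$, $\int \frac{1}{(x-a)^n}\,dx = \frac{-1}{(n-1)(x-a)^{n-1}}$.
   • $\int \frac{1}{x^2+1}\,dx = \arctan x$, $\int \frac{1}{x^2+a^2}\,dx = \frac{1}{a}\arctan\left(\frac{x}{a}\right)$.
   • $\int \frac{1}{(x-b)^2+a^2}\,dx = \frac{1}{a}\arctan\left(\frac{x-b}{a}\right)$.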
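We also have
    \begin{align*}
    \int \frac{1}{(x^2+1)^2}\,dx &= \int \frac{1 + x^2 - x^2}{(x^2+1)^2}\,dx = \int \frac{1}{x^2+1}\,dx + \int \frac{-2x\cdot x}{2(x^2+1)^2}\,dx \\
    &= \arctan x + \frac{x}{2(x^2+1)} - \int \frac{1}{2(x^2+1)}\,dx = \arctan x + \frac{x}{2(x^2+1)} - \frac{1}{2}\arctan x \\
    &= \frac{1}{2}\arctan x + \frac{x}{2(x^2+1)}.
    \end{align*}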
    In general, the derivative of a primitive F of f must be the original function f, so we can
check a primitive by differentiating it. This is often useful because computing primitives is
often complicated, while differentiation is mechanical.
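The check by differentiation can also be done numerically. The following is a minimal Python sketch, not part of the original notes: the names `f`, `F` and `central_difference` are introduced here only for illustration. It approximates the derivative of the primitive $F(x) = \frac{1}{2}\arctan x + \frac{x}{2(x^2+1)}$ obtained above by a central difference and compares it with $f(x) = \frac{1}{(x^2+1)^2}$ at a few arbitrarily chosen points.

```python
import math

def f(x):
    """Integrand 1 / (x^2 + 1)^2."""
    return 1.0 / (x * x + 1.0) ** 2

def F(x):
    """Candidate primitive (1/2) arctan x + x / (2 (x^2 + 1))."""
    return 0.5 * math.atan(x) + x / (2.0 * (x * x + 1.0))

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) by (g(x + h) - g(x - h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

# Largest discrepancy between the approximated F' and f at the sample points.
sample_points = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_error = max(abs(central_difference(F, x) - f(x)) for x in sample_points)
print(max_error)
```

A small value of `max_error` does not prove that F is a primitive of f, but a large one would reveal a mistake in the computation.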
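Example 164.  • $\int \frac{x^3-1}{4x^3-x}\,dx = \int \frac{\frac{1}{4}(4x^3-x) + \frac{x}{4} - 1}{4x^3-x}\,dx$. Note that
    $\frac{x-4}{4x^3-x} = \frac{x-4}{x(2x-1)(2x+1)} = \frac{4}{x} + \frac{-\frac{7}{2}}{2x-1} + \frac{-\frac{9}{2}}{2x+1}$ (see below) and hence
    \begin{align*}
    \int \frac{x^3-1}{4x^3-x}\,dx &= \frac{x}{4} + \frac{1}{4}\int \left(\frac{4}{x} + \frac{-\frac{7}{2}}{2x-1} + \frac{-\frac{9}{2}}{2x+1}\right)dx \\
    &= \frac{x}{4} + \log|x| - \frac{7}{16}\log|2x-1| - \frac{9}{16}\log|2x+1|.
    \end{align*}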
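    • Using $\frac{1}{(x-1)^2(x^2+1)} = \frac{Ax+B}{(x-1)^2} + \frac{Cx+D}{x^2+1} = \frac{-\frac{x}{2}+1}{(x-1)^2} + \frac{\frac{x}{2}}{x^2+1}$ we get
    \begin{align*}
    \int \frac{1}{(x-1)^2(x^2+1)}\,dx &= \int \frac{-\frac{x}{2}+1}{(x-1)^2}\,dx + \int \frac{\frac{x}{2}}{x^2+1}\,dx \\
    &= \int \frac{-\frac{(x-1)}{2}+\frac{1}{2}}{(x-1)^2}\,dx + \int \frac{\frac{x}{2}}{x^2+1}\,dx \\
    &= -\frac{1}{2}\log|x-1| - \frac{1}{2(x-1)} + \frac{1}{4}\log(x^2+1).
    \end{align*}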
    In general, if $P(x)$ and $Q(x)$ are polynomials, $\frac{P(x)}{Q(x)}$ can be written as a sum of terms of the form $\frac{P_1(x)}{(x-a)^n}$ or
$\frac{P_2(x)}{((x-b)^2+a^2)^n}$ (with different polynomials $P_1, P_2$), and for each of them one can find a primitive.
        \[ \frac{1}{(x-1)(x+1)} = \frac{A(x+1) + B(x-1)}{(x-1)(x+1)} \]
        and $1 = (A+B)x + (A-B)$. This means that $A + B = 0$ and $1 = A - B$, therefore
        $A = \frac{1}{2}$, $B = -\frac{1}{2}$.
    • $\frac{1}{(x+1)(x^2+1)^2}$. We put $\frac{1}{(x+1)(x^2+1)^2} = \frac{A}{x+1} + \frac{Bx^3+Cx^2+Dx+E}{(x^2+1)^2}$.
        Then
        \[ \frac{1}{(x+1)(x^2+1)^2} = \frac{A(x^2+1)^2 + (Bx^3+Cx^2+Dx+E)(x+1)}{(x+1)(x^2+1)^2}, \]
        hence $1 = A(x^4+2x^2+1) + (Bx^4 + (B+C)x^3 + (C+D)x^2 + (D+E)x + E)$, and this
        means $A+B=0$, $B+C=0$, $2A+C+D=0$, $D+E=0$, $A+E=1$. To solve this,
        we observe $A=-B$, $C=-B$, $D=-E$ and hence $-3B-E=0$, $-B+E=1$, hence
        $B=-\frac{1}{4}$, $E=\frac{3}{4}$, $A=\frac{1}{4}$, $C=\frac{1}{4}$, $D=-\frac{3}{4}$. Altogether,
        \[ \frac{1}{(x+1)(x^2+1)^2} = \frac{\frac{1}{4}}{x+1} + \frac{-\frac{1}{4}x^3 + \frac{1}{4}x^2 - \frac{3}{4}x + \frac{3}{4}}{(x^2+1)^2} = \frac{\frac{1}{4}}{x+1} + \frac{-\frac{x}{4}+\frac{1}{4}}{x^2+1} + \frac{-\frac{1}{2}x+\frac{1}{2}}{(x^2+1)^2}. \]
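A decomposition into partial fractions can be checked with exact rational arithmetic. The sketch below is an illustration only, with hypothetical helper names `lhs`, `rhs` and `points`: it evaluates both sides of the last identity at several rational points using Python's `fractions` module. After clearing denominators, both sides are polynomials of degree at most 4, so agreement at more than four points already forces the identity.

```python
from fractions import Fraction

def lhs(x):
    """Left-hand side 1 / ((x + 1)(x^2 + 1)^2)."""
    return 1 / ((x + 1) * (x * x + 1) ** 2)

def rhs(x):
    """Right-hand side: the three partial fractions found above."""
    quarter = Fraction(1, 4)
    half = Fraction(1, 2)
    return (quarter / (x + 1)
            + (-quarter * x + quarter) / (x * x + 1)
            + (-half * x + half) / (x * x + 1) ** 2)

# Twelve rational sample points 0, 1/3, 2/3, ..., 11/3 (x = -1 is avoided).
points = [Fraction(n, 3) for n in range(12)]
all_equal = all(lhs(x) == rhs(x) for x in points)
print(all_equal)
```

Exact fractions are used instead of floating-point numbers so that the comparison `==` is meaningful.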
Change of variables
Let $F(x)$ be a primitive of $f(x)$, that is $\int f(x)\,dx = F(x)$. If it is difficult to find F directly, one
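Example 166.  • $f(x) = \frac{1}{e^x+1}$. With $t = e^x+1$, $x = \varphi(t) = \log(t-1)$, $\varphi'(t) = \frac{1}{t-1}$,
      \[ \int \frac{1}{e^x+1}\,dx = \int \frac{1}{t(t-1)}\,dt = \int \left(\frac{1}{t-1} - \frac{1}{t}\right)dt = \log\frac{t-1}{t}, \]
      and with $t = e^x+1$, $\int \frac{1}{e^x+1}\,dx = \log\frac{e^x}{e^x+1}$.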
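   • $f(x) = \frac{x^2}{\sqrt{1-x^2}}$. With $x = \varphi(t) = \sin t$, $\varphi'(t) = \cos t$, $t = \arcsin x$, if $t \in (-\frac{\pi}{2}, \frac{\pi}{2})$,
      \[ \int \frac{x^2}{\sqrt{1-x^2}}\,dx = \int \frac{\sin^2 t}{\cos t}\cos t\,dt = \int \sin^2 t\,dt = \int \frac{1-\cos 2t}{2}\,dt = \frac{t}{2} - \frac{\sin 2t}{4}, \]
      and with $t = \arcsin x$, $\sin 2t = 2\sin t\cos t = 2x\sqrt{1-x^2}$, we obtain
      $\int \frac{x^2}{\sqrt{1-x^2}}\,dx = \frac{1}{2}\arcsin x - \frac{1}{2}x\sqrt{1-x^2}$.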
   For example,
      \[ \int \frac{1}{\sin x}\,dx = \int \frac{t^2+1}{2t}\cdot\frac{2}{1+t^2}\,dt = \log|t| + C = \log\left|\tan\frac{x}{2}\right| + C. \]
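Example 168.  • Note that $\sqrt{1-\sin^2 t} = |\cos t|$ and this is equal to $\cos t$ if $|t| < \frac{\pi}{2}$, hence
    with $x = \sin t$,
    \begin{align*}
    \int_0^1 \sqrt{1-x^2}\,dx &= \int_0^{\frac{\pi}{2}} \sqrt{1-\sin^2 t}\,\cos t\,dt = \int_0^{\frac{\pi}{2}} \cos^2 t\,dt \\
    &= \int_0^{\frac{\pi}{2}} \frac{\cos(2t)+1}{2}\,dt = \left[\frac{\sin(2t)}{4} + \frac{t}{2}\right]_0^{\frac{\pi}{2}} \\
    &= \frac{\pi}{4}.
    \end{align*}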
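As a numerical sanity check of this change of variables, one may approximate both integrals, before and after the substitution $x = \sin t$, and compare them with $\frac{\pi}{4}$. The following Python sketch uses a simple midpoint rule; the function `midpoint_rule` and the numbers of subintervals are choices made here for illustration, not part of the notes.

```python
import math

def midpoint_rule(g, a, b, n):
    """Approximate the integral of g over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Integral in the variable x: sqrt(1 - x^2) on [0, 1].
in_x = midpoint_rule(lambda x: math.sqrt(1.0 - x * x), 0.0, 1.0, 100_000)

# Integral after the substitution x = sin t: cos^2 t on [0, pi/2].
in_t = midpoint_rule(lambda t: math.cos(t) ** 2, 0.0, math.pi / 2, 1_000)

print(in_x, in_t, math.pi / 4)
```

The integrand in x has an unbounded derivative near x = 1, which is why many more subintervals are used there than after the substitution.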
                  Figure 71: Integral of symmetric and antisymmetric functions.
Some remarks
   • If $f(x) = f(-x)$, then by the change of variables $x = -t$, $\int_{-a}^0 f(x)\,dx = \int_a^0 f(-t)(-t)'\,dt = \int_0^a f(t)\,dt$,
     hence $\int_{-a}^a f(x)\,dx = 2\int_0^a f(x)\,dx$.
     For example, $\int_{-1}^1 \sqrt{1-x^2}\,dx = 2\int_0^1 \sqrt{1-x^2}\,dx = \frac{\pi}{2}$.
   • If $f(x) = -f(-x)$, then by the change of variables $x = -t$, $\int_{-a}^0 f(x)\,dx = \int_a^0 f(-t)(-t)'\,dt = -\int_0^a f(t)\,dt$,
     hence $\int_{-a}^a f(x)\,dx = 0$.
     For example, $\int_{-1}^1 e^{x^2}\sin x\,dx = 0$.
   • Logarithmic differentiation: If $f(x)$ is difficult to differentiate but $\log f(x)$ is easy, then we
     have $D(\log f(x)) = \frac{f'(x)}{f(x)}$, hence we have $f'(x) = f(x)D(\log f(x))$.
     For example, $f(x) = x^x$ (for $x > 0$) is not a simple product or a composition. But
     $\log f(x) = x\log x$, hence $D(\log f(x)) = \log x + 1$, hence $f'(x) = x^x(\log x + 1)$.
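The formula $f'(x) = x^x(\log x + 1)$ obtained by logarithmic differentiation can be compared with a finite-difference approximation of the derivative. This is only an illustrative Python sketch; the helper names and the sample points are assumptions made here.

```python
import math

def f(x):
    """The function x^x, defined for x > 0."""
    return x ** x

def derivative_by_log(x):
    """Derivative from logarithmic differentiation: x^x (log x + 1)."""
    return f(x) * (math.log(x) + 1.0)

def central_difference(g, x, h=1e-6):
    """Approximate g'(x) by (g(x + h) - g(x - h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

sample_points = (0.5, 1.0, 2.0, 3.0)
differences = [abs(derivative_by_log(x) - central_difference(f, x))
               for x in sample_points]
print(differences)
```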
where $R_n(x, x_0) = \frac{1}{n!}\int_{x_0}^x f^{(n+1)}(y)(x-y)^n\,dy$.
Proof. This is true for n = 0, because
    \[ f(x_0) + \int_{x_0}^x f'(y)\,dy = f(x_0) + [f(y)]_{x_0}^x = f(x). \]
To prove the formula by induction, assume the claim for n and let f be n + 2 times differentiable,
then $f(x) = \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x-x_0)^k + R_n(x, x_0)$ and, integrating by parts,
    \begin{align*}
    R_n(x, x_0) &= \frac{1}{n!}\int_{x_0}^x f^{(n+1)}(y)(x-y)^n\,dy \\
    &= -\frac{1}{(n+1)!}\left[f^{(n+1)}(y)(x-y)^{n+1}\right]_{x_0}^x + \frac{1}{(n+1)!}\int_{x_0}^x f^{(n+2)}(y)(x-y)^{n+1}\,dy \\
    &= \frac{1}{(n+1)!}f^{(n+1)}(x_0)(x-x_0)^{n+1} + \frac{1}{(n+1)!}\int_{x_0}^x f^{(n+2)}(y)(x-y)^{n+1}\,dy.
    \end{align*}
The last integral term is $R_{n+1}(x, x_0)$, which gives the formula for n + 1.
    This is interesting, because for some functions, we can prove that the Taylor series converges
to the original function. Let us take x0 = 0 and consider the interval (−R, R).
    • $e^x = \sum_{k=0}^n \frac{x^k}{k!} + \frac{1}{n!}\int_0^x e^y(x-y)^n\,dy$. As $|x| < R$, we have $e^y < e^R$ and $|(x-y)^n| < R^n$ for y between 0 and x.
      Altogether, the remainder term satisfies
      \[ \left|\frac{1}{n!}\int_0^x e^y(x-y)^n\,dy\right| \le \frac{1}{n!}\left|\int_0^x e^R R^n\,dy\right| \le \frac{e^R R^{n+1}}{n!}. \]
      Note that for any R, $\frac{R^{n+1}}{n!} \to 0$ because for sufficiently large n we have $n > 2R$, hence from
      that point on each term is less than half of the previous one. This means that $e^x - \sum_{k=0}^n \frac{x^k}{k!} \to 0$,
      that is, the Taylor series converges to $e^x$ for $x \in (-R, R)$, and this is denoted by
      \[ e^x = \sum_{n=0}^\infty \frac{x^n}{n!}. \]
   • The same argument holds for $\sin x$, $\cos x$, because $|D^n(\sin x)| \le 1$, $|D^n(\cos x)| \le 1$. That
     is,
     \[ \sin x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)!}, \qquad \cos x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n}}{(2n)!}, \]
     in the sense that for each x the series converges to the original function.
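The convergence of the Taylor polynomials of $e^x$ can be observed numerically. The sketch below is an illustration under stated assumptions: the values R = 3 and x = 2.5 are arbitrary choices, and `remainder_bound` implements the crude bound $\frac{e^R R^{n+1}}{n!}$ that follows from the integral form of the remainder with the estimates $e^y < e^R$ and $|x - y| < R$.

```python
import math

def exp_partial_sum(x, n):
    """Taylor polynomial of e^x at 0: sum of x^k / k! for k = 0..n."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # term is now x^k / k!
        total += term
    return total

def remainder_bound(R, n):
    """Upper bound e^R R^(n+1) / n! for the remainder when |x| < R."""
    return math.exp(R) * R ** (n + 1) / math.factorial(n)

R = 3.0   # radius of the interval (-R, R); arbitrary choice
x = 2.5   # a point with |x| < R; arbitrary choice
orders = (5, 10, 15, 20)
errors = [abs(math.exp(x) - exp_partial_sum(x, n)) for n in orders]
bounds = [remainder_bound(R, n) for n in orders]
for n, err, bound in zip(orders, errors, bounds):
    print(n, err, bound)
```

The orders are kept moderate on purpose: for much larger n the true error falls below floating-point rounding, and the comparison with the bound is no longer meaningful.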
More properties of Taylor series will be studied in Mathematical Analysis II.
Improper integral
We can define the integral for (some) functions that are not bounded, or on intervals that are not bounded.
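Definition 170. Let $(a, b)$ be an interval, $a \in \mathbb{R}$ or $a = -\infty$ and $b \in \mathbb{R}$ or $b = +\infty$. Let f
be a function integrable on all $[\alpha, \beta]$, where $a < \alpha < \beta < b$, $\alpha, \beta \in \mathbb{R}$. If there exists the limit
$\lim_{\alpha \to a}\int_\alpha^\beta f(x)\,dx$, then we denote it by
    \[ \int_a^\beta f(x)\,dx = \lim_{\alpha \to a}\int_\alpha^\beta f(x)\,dx. \]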
It also holds that $\int_a^\gamma f(x)\,dx = \int_a^\beta f(x)\,dx + \int_\beta^\gamma f(x)\,dx$ for $\gamma \in (a, b)$. Analogously, if there exists
the limit $\lim_{\beta \to b}\int_\alpha^\beta f(x)\,dx$, then we write $\int_\alpha^b f(x)\,dx = \lim_{\beta \to b}\int_\alpha^\beta f(x)\,dx$. If both limits exist,
then we define $\int_a^b f(x)\,dx = \int_a^{x_0} f(x)\,dx + \int_{x_0}^b f(x)\,dx$ for some $x_0 \in (a, b)$.
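Example 171.  • Consider $(0, 1)$ and the function $f(x) = x^\alpha$, $\alpha \in \mathbb{R}$. For $\varepsilon > 0$, if $\alpha \neq -1$,
    \[ \int_\varepsilon^1 x^\alpha\,dx = \frac{1}{\alpha+1}[x^{\alpha+1}]_\varepsilon^1 = \frac{1}{\alpha+1}(1 - \varepsilon^{\alpha+1}), \]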
where $a < \alpha$, and the function is bounded and integrable on all bounded intervals $[\alpha, \beta]$. Similarly,
$\int_\alpha^b f(x)\,dx = \lim_{\beta \to b}\int_\alpha^\beta f(x)\,dx$ and $\int_a^b f(x)\,dx = \int_a^{x_0} f(x)\,dx + \int_{x_0}^b f(x)\,dx$, where $x_0 \in (a, b)$.
When these limits exist, we say that the improper integral converges; otherwise we say that it does not
converge, or that it diverges if the integral tends to $\infty$ or $-\infty$.
Example 172.  • The improper integral $\int_0^\infty \sin x\,dx$ does not converge. Indeed, it holds that
    $\int_0^\beta \sin x\,dx = [-\cos x]_0^\beta = -\cos\beta + 1$, and as $\beta \to \infty$, $-\cos\beta$ oscillates and does not
    converge to any value.
  • Consider $\int_0^\infty e^{-x}\,dx$. For $\beta > 0$, we have $\int_0^\beta e^{-x}\,dx = [-e^{-x}]_0^\beta = -e^{-\beta} - (-1) = 1 - e^{-\beta}$,
    hence as $\beta \to \infty$, this tends to 1. That is, $\int_0^\infty e^{-x}\,dx = 1$.
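The different behaviour of these two integrals as $\beta \to \infty$ can be tabulated directly from the closed forms computed above. In this small Python sketch (the function names and the sample values of β are chosen here for illustration), β runs over multiples of π, for which $1 - \cos\beta$ alternates between 2 and 0, while $1 - e^{-\beta}$ approaches 1.

```python
import math

def integral_exp(beta):
    """Integral of e^(-x) over [0, beta], i.e. 1 - e^(-beta)."""
    return 1.0 - math.exp(-beta)

def integral_sin(beta):
    """Integral of sin x over [0, beta], i.e. 1 - cos(beta)."""
    return 1.0 - math.cos(beta)

betas = [k * math.pi for k in range(1, 13)]
exp_values = [integral_exp(b) for b in betas]
sin_values = [integral_sin(b) for b in betas]
for b, e, s in zip(betas, exp_values, sin_values):
    print(round(b, 3), e, s)
```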
(iii) Let $c > 0$ (the other case is analogous). For $x_0$ close enough to b, it holds that $\frac{c}{2}g(x) \le
      f(x) \le 2c\,g(x)$ for $x \in (x_0, b)$. Hence the claim follows from (i).
(iv) We have $f(n+1) \le \int_n^{n+1} f(x)\,dx \le f(n)$, therefore, $\sum_{n=N+1}^{M+1} f(n) \le \int_N^{M+1} f(x)\,dx \le
      \sum_{n=N}^{M} f(n)$.
(v) This follows from $-|f| \le f \le |f|$ and (i) for an interval $[\alpha, \beta]$, then by taking the limits.
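   Now we can show that $\sum_{n=1}^\infty \frac{1}{n}$ is divergent using an integral. Indeed, $\int_1^N \frac{1}{x}\,dx \le \sum_{n=1}^{N-1} \frac{1}{n}$, but
$\int_1^N \frac{1}{x}\,dx = [\log x]_1^N = \log N - 0 \to \infty$, therefore, also $\sum_{n=1}^{N-1} \frac{1}{n} \to \infty$ as $N \to \infty$.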
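The comparison between $\log N$ and the partial sums of the harmonic series can be tabulated as follows. This is a minimal Python sketch; the function name `harmonic` and the values of N are assumptions made for the illustration.

```python
import math

def harmonic(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))

values_of_N = (10, 100, 1_000, 10_000)
# Pairs (log N, sum of 1/n for n = 1..N-1) for each N.
table = [(math.log(N), harmonic(N - 1)) for N in values_of_N]
for N, (log_N, partial_sum) in zip(values_of_N, table):
    print(N, log_N, partial_sum)
```

Both columns grow without bound, but very slowly, which is why the divergence is hard to guess from a few terms alone.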
   When the improper integral $\int_a^b |f(x)|\,dx$ converges, we say that the improper integral $\int_a^b f(x)\,dx$
converges absolutely.
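Example 174.  • $\int_1^\infty \frac{\cos x}{x^2}\,dx$ converges. Indeed, $\left|\frac{\cos x}{x^2}\right| \le \frac{1}{x^2}$ and $\int_1^\alpha \frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_1^\alpha = 1 - \frac{1}{\alpha}$,
      which tends to 1 as $\alpha \to \infty$.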
[Figure: graphs of $f(x) = \frac{1}{x}$, $f(x) = \frac{1}{x^2}$, $f(x) = \sin x$ and $f(x) = e^{-x}$.]
               Figure 73: A graphical proof that $\sum_{n=1}^N \frac{1}{n}$ diverges as $N \to \infty$.
[Figure: graph of $f(x) = \frac{\sin x}{x}$.]
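   • $\int_1^\infty \frac{\sin x}{x}\,dx$ converges. Indeed, by integration by parts,
     \[ \int_1^\alpha \frac{\sin x}{x}\,dx = \left[\frac{-\cos x}{x}\right]_1^\alpha - \int_1^\alpha \frac{\cos x}{x^2}\,dx = \cos 1 - \frac{\cos\alpha}{\alpha} - \int_1^\alpha \frac{\cos x}{x^2}\,dx. \]
     The first two terms tend to $\cos 1$ as $\alpha \to \infty$, while the last integral converges.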
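   • $\int_1^\infty \left|\frac{\sin x}{x}\right| dx$ diverges. Indeed, $\int_{n\pi}^{(n+1)\pi} \left|\frac{\sin x}{x}\right| dx \ge \frac{1}{(n+1)\pi}\int_0^\pi \sin x\,dx = \frac{2}{(n+1)\pi}$ and hence we
     have $\int_1^\alpha \left|\frac{\sin x}{x}\right| dx \ge \sum_{n=2}^{[\alpha]} \frac{2}{\pi(n+1)} \to \infty$.
   • $\int_0^\infty \frac{1}{\sqrt{x^3+1}}\,dx$ is convergent. Indeed, it is enough to consider $\int_1^\infty \frac{1}{\sqrt{x^3+1}}\,dx$, and since $\frac{1}{\sqrt{x^3+1}} \le
     \frac{1}{\sqrt{x^3}} = \frac{1}{x^{3/2}}$, we have $\int_1^\beta \frac{1}{\sqrt{x^3+1}}\,dx \le \int_1^\beta x^{-\frac{3}{2}}\,dx$, where the right-hand side is convergent.
   • $\int_{-\infty}^\infty \frac{1}{\sqrt{x^4+1}}\,dx$ is convergent. Indeed, $\frac{1}{\sqrt{x^4+1}} \le \frac{1}{\sqrt{x^4}} = \frac{1}{x^2}$ and $\int_1^\beta \frac{1}{x^2}\,dx = \int_{-\beta}^{-1} \frac{1}{x^2}\,dx =
     [-x^{-1}]_1^\beta = 1 - \frac{1}{\beta} \to 1$ as $\beta \to \infty$.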
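The difference between convergence and absolute convergence for $\frac{\sin x}{x}$ can be seen numerically. The following Python sketch is only an illustration: the midpoint rule, the step size and the values of α are choices made here. It approximates $\int_1^\alpha \frac{\sin x}{x}\,dx$ and $\int_1^\alpha \left|\frac{\sin x}{x}\right| dx$ for increasing α.

```python
import math

def midpoint_rule(g, a, b, step=1e-2):
    """Approximate the integral of g over [a, b] with midpoint rectangles of width about `step`."""
    n = max(1, int(round((b - a) / step)))
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def signed(x):
    """The integrand sin(x) / x."""
    return math.sin(x) / x

def absolute(x):
    """The integrand |sin(x) / x|."""
    return abs(math.sin(x)) / x

alphas = (50.0, 100.0, 200.0, 400.0)
signed_values = [midpoint_rule(signed, 1.0, a) for a in alphas]
absolute_values = [midpoint_rule(absolute, 1.0, a) for a in alphas]
for a, s, v in zip(alphas, signed_values, absolute_values):
    print(a, s, v)
```

The first column of values settles down as α grows, while the second keeps increasing, roughly like a multiple of $\log\alpha$, in agreement with the lower bound by a harmonic-type sum.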
Definition 175.     • Let $f \ge g$ be two integrable functions on an interval I. The area of the
     region between g, f is defined by the following:
     \[ \mathrm{area}(D_{g,f}) := \int_I (f(x) - g(x))\,dx. \]
   • Even if I is not bounded, if the improper integral $\int_I (f(x) - g(x))\,dx$ exists, then we define
     the area of the region $D_{g,f} = \{(x, y) \in \mathbb{R}^2 : x \in I,\ g(x) \le y \le f(x)\}$ by the same formula.
   • If D is the disjoint union of such regions, then area(D) is the sum of the areas of such
     regions.
                Figure 75: The area of the region between two functions.
[Figure: graphs of $f(x) = \sqrt{a^2 - x^2}$ and $f(x) = -\sqrt{a^2 - x^2}$.]
• Triangles. $D = \{(x, y) \in \mathbb{R}^2 : x \in [0, a],\ 0 \le y \le \frac{b}{a}x\}$, with length a and width b, then
  $\mathrm{area}(D) = \int_0^a \frac{b}{a}x\,dx = \left[\frac{b}{2a}x^2\right]_0^a = \frac{ab}{2}$.
    \[ y^2 \le a^2 - x^2 \iff -\sqrt{a^2 - x^2} \le y \le \sqrt{a^2 - x^2}. \]
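      Using $\cos^2\theta = \frac{\cos 2\theta + 1}{2}$,
      \begin{align*}
      \mathrm{area}(D) &= 2\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \sqrt{a^2 - a^2\sin^2\theta}\; a\cos\theta\,d\theta \\
      &= 2\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} a^2\cos^2\theta\,d\theta \\
      &= a^2\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (\cos 2\theta + 1)\,d\theta \\
      &= a^2\left[\frac{\sin 2\theta}{2} + \theta\right]_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \\
      &= a^2\pi.
      \end{align*}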
Length
A curve can be, at least partially, described using a function.
    • parabola. {(x, y) : x ∈ R, y = x2 }.
    • hyperbola. $\{(x, y) : x \in \mathbb{R},\ y = \sqrt{x^2 + 1}\}$.
   As we defined the area of a general region using integral, we can define the length of a curve
with integral.
Definition 177. For a curve represented by $G_f := \{(x, y) : x \in [a, b],\ y = f(x)\}$, where f is
differentiable and $f'$ is continuous, we define the length by
    \[ \ell(G_f) := \int_a^b \sqrt{1 + f'(x)^2}\,dx. \]
If $G_f$ is a union of such graphs, then $\ell(G_f)$ is defined to be the sum of the lengths of the partial
graphs.
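To get a feeling for this definition, one can approximate the integral numerically and compare it with the total length of a polygonal line through points of the graph. The Python sketch below is an illustration under assumptions made here: the parabola $y = x^2$ on $[0, 1]$ is an arbitrary test case, and `arc_length` and `polygon_length` are hypothetical helper names.

```python
import math

def arc_length(df, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of sqrt(1 + f'(x)^2) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.sqrt(1.0 + df(a + (k + 0.5) * h) ** 2) for k in range(n))

def polygon_length(f, a, b, n):
    """Total length of the n segments joining the points (x_k, f(x_k)) of a uniform partition."""
    xs = [a + k * (b - a) / n for k in range(n + 1)]
    return sum(math.hypot(xs[k + 1] - xs[k], f(xs[k + 1]) - f(xs[k]))
               for k in range(n))

# Parabola y = x^2 on [0, 1], whose derivative is f'(x) = 2x.
length_integral = arc_length(lambda x: 2.0 * x, 0.0, 1.0)
polygon_lengths = [polygon_length(lambda x: x * x, 0.0, 1.0, n)
                   for n in (1, 10, 100, 1000)]
print(length_integral, polygon_lengths)
```

With a single segment the polygonal length is just the distance between the endpoints; as the partition is refined the polygonal lengths increase towards the value of the integral.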
     Let us see that this coincides with the case of a segment: a segment that goes by a horizontally
and b vertically is represented by $\{(x, y) : x \in [0, a],\ y = \frac{b}{a}x\}$. Hence $f(x) = \frac{b}{a}x$, $f'(x) = \frac{b}{a}$. By
definition, $\ell(G_f) = \int_0^a \sqrt{1 + (\frac{b}{a})^2}\,dx = a\sqrt{1 + (\frac{b}{a})^2} = \sqrt{a^2 + b^2}$, which coincides with the length
Proposition 178. Let f be differentiable and $f'$ be continuous. Then for any $\epsilon > 0$ there is $\delta > 0$ such
that if $a = x_0 < x_1 < \cdots < x_n = b$ is a partition P with $|P| < \delta$, then $|\ell(G_f) - \ell(P_f(\{x_k\}))| < \epsilon$.
                               Figure 77: The length and segments.
Proof. By the mean value theorem, there are $x_k \le \xi_k \le x_{k+1}$ such that $f(x_k) - f(x_{k+1}) =
f'(\xi_k)(x_k - x_{k+1})$. Since $P_f(\{x_k\})$ is the union of segments,
    \begin{align*}
    \ell(P_f(\{x_k\})) &= \sum_{k=0}^{n-1} \sqrt{(x_{k+1} - x_k)^2 + (f(x_{k+1}) - f(x_k))^2} \\
    &= \sum_{k=0}^{n-1} \sqrt{(x_{k+1} - x_k)^2 + f'(\xi_k)^2(x_{k+1} - x_k)^2} \\
    &= \sum_{k=0}^{n-1} \sqrt{1 + f'(\xi_k)^2}\,(x_{k+1} - x_k).
    \end{align*}
On the other hand, $\sqrt{1 + f'(x)^2}$ is continuous and hence integrable. By uniform continuity, there
is δ such that $|\sqrt{1 + f'(x)^2} - \sqrt{1 + f'(y)^2}| < \frac{\epsilon}{b-a}$ if $|x - y| < \delta$. With such a partition, we have
    \[ \underline{S}_I(\sqrt{1 + f'(x)^2}, P) \le \ell(P_f(\{x_k\})) \le \overline{S}_I(\sqrt{1 + f'(x)^2}, P). \]
     If $|P|$ is small, the difference between these sides is smaller than $\epsilon$, and $\underline{S}_I(\sqrt{1 + f'(x)^2}, P) \le$
Series and convergence
Let us recall that we have considered sequences of numbers $a_1, a_2, \cdots, a_n, \cdots$, and the series $\sum_{k=1}^n a_k$, that is a new sequence
$$a_1,\ a_1 + a_2,\ a_1 + a_2 + a_3,\ \cdots,\ \sum_{k=1}^n a_k,\ \cdots$$
    For these examples, we know the exact form of the n-th sum. For other series, it is difficult
to compute such general term, but still we may be able to say whether the series converges or
not.
    For example, let us take $a_n = \frac{1}{n}$ and consider $\sum_{k=1}^n \frac{1}{k}$. This is called the harmonic series. As we have seen, this sum is larger than the integral of $\frac{1}{x}$ on $[1, n + 1]$:
$$\int_1^{n+1} \frac{1}{x}\,dx \le \sum_{k=1}^n \frac{1}{k}.$$
On the other hand, we can calculate the left-hand side and we obtain $\int_1^{n+1} \frac{1}{x}\,dx = [\log x]_1^{n+1} = \log(n + 1)$, and this diverges as $n \to \infty$. Therefore, $\sum_{k=1}^n \frac{1}{k}$ diverges as well.
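The comparison between the harmonic sums and $\log(n+1)$ can be checked numerically. The following is a minimal Python sketch (not part of the original notes; the function names `harmonic` and `lower_bound` are our own choice) that tabulates both sides of the inequality for a few values of $n$.

```python
import math

def harmonic(n: int) -> float:
    """Return the n-th partial sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def lower_bound(n: int) -> float:
    """Return log(n + 1), the integral of 1/x over [1, n + 1]."""
    return math.log(n + 1)

# Both columns grow without bound, and the sum stays above the integral.
for n in (10, 100, 1000, 10000):
    print(n, harmonic(n), lower_bound(n))
```

Both columns grow slowly but without bound, which is consistent with the divergence argument above.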
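Lemma 181. If $\sum_{k=1}^\infty a_k$ converges, then $a_k \to 0$ as $k \to \infty$.
Proof. As $\sum_{k=1}^\infty a_k$ is convergent to $S$, for any $\epsilon > 0$ there is $N$ such that if $n > N$ then $|\sum_{k=1}^n a_k - S| < \frac{\epsilon}{2}$. In particular, we have $S - \frac{\epsilon}{2} < \sum_{k=1}^n a_k < S + \frac{\epsilon}{2}$ and $S - \frac{\epsilon}{2} < \sum_{k=1}^{n+1} a_k < S + \frac{\epsilon}{2}$.
     From this it follows that $-\epsilon < \sum_{k=1}^{n+1} a_k - \sum_{k=1}^n a_k < \epsilon$, that is, $|\sum_{k=1}^{n+1} a_k - \sum_{k=1}^n a_k| = |a_{n+1}| < \epsilon$. This means that $a_n \to 0$.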
Example 182.
     • $\sum_{k=1}^n k$ does not converge because $a_k = k$ diverges.
     • $\sum_{k=1}^n (\frac{1}{2})^k$ converges to 1, and in this case indeed $(\frac{1}{2})^k$ converges to 0.
     • $\sum_{k=1}^n \frac{1}{k}$ diverges, although in this case $\frac{1}{k}$ converges to 0.
(iii) Suppose the contrary that $\sum_{k=1}^\infty (a_k + b_k)$ converges. Then by (ii) $\sum_{k=1}^\infty b_k = \sum_{k=1}^\infty ((a_k + b_k) - a_k)$ would converge, which contradicts the assumption that $\sum_{k=1}^\infty b_k$ diverges.
Example 184.
   • The series $\sum_n (\frac{1}{n} + \frac{1}{2^n})$ diverges, because $\sum_n \frac{1}{n}$ diverges while $\sum_n \frac{1}{2^n}$ converges.
   • The series $\sum_n 1$ and $\sum_n -1$ both diverge, but the sum $\sum_n (1 - 1) = \sum_n 0$ converges to 0.
$$a_n = \frac{1}{n(n + 1)} = \frac{1}{n} - \frac{1}{n+1},$$
     hence with $b_n = -\frac{1}{n}$, this is a telescopic series. By the argument above, we see that $\sum_{k=1}^n \frac{1}{k(k+1)} = b_{n+1} - b_1 = 1 - \frac{1}{n+1}$, and $\sum_{n=1}^\infty a_n = 1$.
     Next let us take $a_n = \log(n/(n + 1))$. Then it holds that $a_n = \log n - \log(n + 1)$, hence with $b_n = -\log n$ this is a telescopic series. As $b_n \to -\infty$, the series $\sum_n a_n$ diverges.
   • Geometric series. Let us take $x \in \mathbb{R}$, $x \neq 1$. We know that $\sum_{k=0}^n a^k = \frac{1 - a^{n+1}}{1 - a}$. It is clear
The right-hand side is again a function of x. In this sense, a convergent series which depends on
x defines a new function.
   We have seen other examples of this type:
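$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \cdots$$
$$\sin x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n + 1)!} = x - \frac{x^3}{6} + \frac{x^5}{120} + \cdots$$
$$\cos x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2} + \frac{x^4}{24} + \cdots$$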
In all these cases, for a fixed $x \in \mathbb{R}$, we have seen that the right-hand side converges by the Taylor formula with remainder.
    In a similar way, we can define many other useful functions by series.
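To see concretely how a convergent series defines a function, one can sum finitely many terms of the three series above and compare with the library functions. The following Python sketch is only an illustration (the helper names and the default of 30 terms are our own assumptions, adequate for moderate $|x|$); each term is obtained from the previous one to avoid computing factorials.

```python
import math

def exp_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_n x^n / n!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)          # x^(n+1)/(n+1)! from x^n/n!
    return total

def sin_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_n (-1)^n x^(2n+1) / (2n+1)!."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_n (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

x = 2.0
print(exp_series(x) - math.exp(x))
print(sin_series(x) - math.sin(x))
print(cos_series(x) - math.cos(x))
```

The printed differences should be at the level of rounding errors, in agreement with the Taylor formula with remainder.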
Dec. 9. Convergence criteria for positive series
Let $\{a_n\} \subset \mathbb{R}$ be a sequence. We have considered the series $\sum_{k=1}^n a_k$, which is a new sequence, and its convergence or divergence.
    When all the terms are non-negative, $a_n \ge 0$, there are some criteria that can often be used to determine the convergence or divergence.
   • Let $\{a_n\}, \{b_n\}$ be two sequences, $a_n \ge 0$, $b_n \ge 0$ such that there is $c > 0$ and $a_n \le cb_n$ for $n$ sufficiently large. In this case, if $\sum b_n$ converges, then so does $\sum a_n$. If $\sum a_n$ diverges, so does $\sum b_n$.
   • If $a_n/b_n \to c \neq 0$, then this implies that $\frac{c}{2} b_n \le a_n \le 2cb_n$ for sufficiently large $n$, hence the previous point applies.
    We have seen that, for $0 < a < 1$, $\sum_{k=0}^n a^k$ converges to $\frac{1}{1-a}$. We can use this fact to show the convergence of some other series.
Proof. The series $\sum_n \theta^n$ converges if $\theta < 1$ (the geometric series) and diverges if $\theta > 1$ ($\theta^n$ does not tend to 0). By Theorem 186, and $a_n^{\frac{1}{n}} < \theta$ or $a_n^{\frac{1}{n}} > \theta$, the first two claims follow.
     If $a_n^{\frac{1}{n}} \to a < 1$, then we can take $\theta$ such that $a_n^{\frac{1}{n}} < \theta < 1$ for $n$ sufficiently large. The case $a_n^{\frac{1}{n}} \to a > 1$ is analogous.
Example 188.
    • $\sum_n \frac{1}{n3^n}$ is convergent.
    • $\sum_n \frac{n2^n}{3^n}$ is convergent.
    • $\sum_n \frac{n}{2^n}$ is convergent.
  When $\lim_n a_n^{\frac{1}{n}} = 1$, this criterion does not give information. Indeed, $\sum_n \frac{1}{n}$ is divergent, but $\sum_n \frac{1}{n^2}$ is convergent (compare it with $\sum \frac{1}{n(n-1)}$), while in both cases $\lim (\frac{1}{n})^{\frac{1}{n}} = \lim (\frac{1}{n^2})^{\frac{1}{n}} = 1$.
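The behaviour of $a_n^{1/n}$ in these examples can be observed numerically. The sketch below is an illustration only (the dictionary `log_terms` and the helper `nth_root_of_term` are names we introduce here); it works with $\log a_n$ so that very small terms such as $n/2^n$ do not underflow.

```python
import math

def nth_root_of_term(log_term, n: int) -> float:
    """Return a_n^(1/n), computed as exp(log(a_n) / n)."""
    return math.exp(log_term(n) / n)

# log(a_n) for three series discussed in the text
log_terms = {
    "n / 2^n": lambda n: math.log(n) - n * math.log(2),
    "1 / n": lambda n: -math.log(n),
    "1 / n^2": lambda n: -2 * math.log(n),
}

for name, log_term in log_terms.items():
    roots = [round(nth_root_of_term(log_term, n), 4) for n in (10, 100, 10000)]
    print(name, roots)
```

For $n/2^n$ the values approach $\frac{1}{2}$, so the root test applies; for $\frac{1}{n}$ and $\frac{1}{n^2}$ they both approach 1, although one series diverges and the other converges.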
  (i) If $\frac{a_{n+1}}{a_n} \le \theta < 1$ for $n$ sufficiently large, then $\sum_n a_n$ converges.
 (ii) If $\frac{a_{n+1}}{a_n} \ge \theta > 1$ for $n$ sufficiently large, then $\sum_n a_n$ diverges.
Proof. Let $\frac{a_{n+1}}{a_n} \le \theta < 1$ for $n \ge N$. Then,
$$a_{N+m} \le a_{N+m-1}\theta \le a_{N+m-2}\theta^2 \le \cdots \le a_N \theta^m.$$
Example 190.
   • $\sum_n \frac{n}{2^n}$ is convergent.
   • $\sum_n \frac{n^2}{n!}$ is convergent.
   • $\sum_n \frac{(n!)^2}{2^{n^2}}$ is convergent.
    When $a_{n+1}/a_n \to 1$ or $a_n^{\frac{1}{n}} \to 1$, we need to study the series more closely.
Lemma 191 (integral test). Let $\{a_n\}$ be a decreasing sequence of positive numbers and assume that there is a positive decreasing function $f(x)$ defined on $[1, \infty)$. If $a_n \le f(n)$ and $\int_1^\infty f(x)\,dx$ converges, then $\sum_n a_n$ converges. If $a_n \ge f(n)$ and $\int_1^\infty f(x)\,dx$ diverges, then $\sum_n a_n$ diverges.
Proof. For the first case, we have $\sum_{k=2}^n a_k \le \int_1^n f(x)\,dx$, and the latter converges, hence so does the former.
    For the second case, we have $\sum_{k=1}^n a_k \ge \int_1^{n+1} f(x)\,dx$, and the latter diverges, hence so does the former.
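Example 192. Let us fix $s \in \mathbb{R}$ and consider $\sum_{n=1}^\infty \frac{1}{n^s}$. We can compare this with $f_s(x) = \frac{1}{x^s}$. We know that $\int_1^\infty f_s(x)\,dx$ converges if and only if $s > 1$.
    For $s > 1$, the sum $\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$ is called the Riemann zeta function.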
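The two inequalities used in the integral test can be checked on this example. The following Python sketch is not from the notes: the names `partial_sum` and `integral` are ours, and the value $\zeta(2) = \pi^2/6$ used in the last line is a classical fact quoted here without proof.

```python
import math

def partial_sum(s: float, n: int) -> float:
    """Return sum_{k=1}^n 1/k^s."""
    return sum(k ** (-s) for k in range(1, n + 1))

def integral(s: float, upper: float) -> float:
    """Return the integral of x^(-s) over [1, upper] in closed form."""
    if s == 1:
        return math.log(upper)
    return (upper ** (1 - s) - 1) / (1 - s)

n = 1000
# s > 1: the sum from k = 2 is bounded by the (convergent) integral.
print(partial_sum(2, n) - 1, "<=", integral(2, n))
# s <= 1: the sum dominates the (divergent) integral up to n + 1.
print(partial_sum(0.5, n), ">=", integral(0.5, n + 1))
# Comparison with the classical value zeta(2) = pi^2 / 6.
print(partial_sum(2, 100000), math.pi ** 2 / 6)
```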
Lemma 193 (condensation principle). Let $\{a_n\}$ be a decreasing sequence of positive numbers. Then $\sum a_n$ converges if and only if so does $\sum 2^n a_{2^n}$.
$$2^n a_{2^n} \ge \sum_{k=2^n}^{2^{n+1}-1} a_k \ge 2^n a_{2^{n+1}}.$$
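The block inequality above is easy to test for a concrete decreasing sequence. Below is a small Python sketch under our own naming (`block_sum`, `condensation_bounds`); it evaluates the three quantities for $a_k = 1/k^2$ and a few values of $n$.

```python
def block_sum(a, n: int) -> float:
    """Return the sum of a(k) for k = 2^n, ..., 2^(n+1) - 1."""
    return sum(a(k) for k in range(2 ** n, 2 ** (n + 1)))

def condensation_bounds(a, n: int):
    """Return (2^n a(2^n), block sum, 2^n a(2^(n+1)))."""
    return 2 ** n * a(2 ** n), block_sum(a, n), 2 ** n * a(2 ** (n + 1))

a = lambda k: 1.0 / k ** 2   # a decreasing sequence of positive numbers
for n in range(6):
    upper, middle, lower = condensation_bounds(a, n)
    print(n, upper, middle, lower)
```

Each block of $2^n$ terms is squeezed between $2^n$ copies of its largest possible and smallest possible term, which is the whole content of the inequality.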
Dec. 10. Convergence criteria for general series
Let $\{a_n\}$ be a sequence, not necessarily positive. We say that the series $\sum_n a_n$ converges absolutely if $\sum_n |a_n|$ converges.
Lemma 195. If $\sum_n |a_n|$ converges, then so does $\sum_n a_n$.
Proof. As $\sum_n |a_n|$ converges, for $\epsilon > 0$ there is $N \in \mathbb{N}$ such that for $m > n > N$ it holds that $\sum_{k=n}^m |a_k| < \epsilon$. Then it holds that $|\sum_{k=n}^m a_k| < \epsilon$ by the triangle inequality. This means that $\sum_{k=1}^n a_k$ is a Cauchy sequence, hence it converges.
    This Lemma, combined with the criteria for positive series, enables us to show convergence
of many series.
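Example 196.
   • $\sum_{n=1}^\infty (-1)^n \frac{1}{n^s}$ converges absolutely if $s > 1$. Indeed, with $a_n = (-1)^n \frac{1}{n^s}$, $|a_n| = \frac{1}{n^s}$, and we know that $\sum_{n=1}^\infty \frac{1}{n^s}$ converges if and only if $s > 1$.
   • $\sum_{n=1}^\infty (-1)^n \frac{n}{2^n}$ converges absolutely. Indeed, with $a_n = (-1)^n \frac{n}{2^n}$, $|a_n| = \frac{n}{2^n}$ and by the root test,
$$\left(\frac{n}{2^n}\right)^{\frac{1}{n}} = \frac{n^{\frac{1}{n}}}{2} \to \frac{1}{2} < 1.$$
       Therefore, $\sum_n |a_n|$ converges.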
    If a series $\sum_n a_n$ converges absolutely, its limit does not depend on the order: indeed, as it is absolutely convergent, then its positive elements
$$b_n = \begin{cases} a_n & \text{if } a_n > 0 \\ 0 & \text{if } a_n \le 0 \end{cases}$$
and the negative elements $c_n = a_n - b_n$ are both convergent. Therefore, even if we sum first the positive elements $\sum_n b_n$ and then the negative elements $\sum_n c_n$, the result is the same: $\sum_n b_n + \sum_n c_n = \sum_n a_n$.
    On the other hand, if a series $\sum_n a_n$ converges but not absolutely, its positive and negative parts both diverge (because otherwise it would be absolutely convergent). By rearranging the sum, one can make it diverge to $\infty$ (by taking many elements of $b_n$) or to $-\infty$ (by taking many elements of $c_n$).
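The dependence on the order can be seen already in the alternating harmonic series $1 - \frac12 + \frac13 - \cdots$. The sketch below is our own illustration, not taken from the notes: it compares the usual order with the rearrangement "one positive term, then two negative terms". That the first sum is $\log 2$ and the second is $\frac12 \log 2$ are classical facts, quoted here as assumptions.

```python
import math

def alternating_harmonic(n_terms: int) -> float:
    """Sum 1 - 1/2 + 1/3 - ... in the usual order."""
    return sum((-1) ** (n - 1) / n for n in range(1, n_terms + 1))

def rearranged(n_blocks: int) -> float:
    """Sum the same terms as blocks: one positive, then two negative."""
    total = 0.0
    for k in range(1, n_blocks + 1):
        # positive term 1/(2k-1), negative terms 1/(4k-2) and 1/(4k)
        total += 1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
    return total

print(alternating_harmonic(200000), math.log(2))
print(rearranged(100000), math.log(2) / 2)
```

Every term of the original series appears exactly once in the rearranged sum, yet the limit is different.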
    A series whose terms change sign at each step is called an alternating series. That is, for $a_n > 0$, it is given by
$$\sum_{n=1}^\infty (-1)^{n-1} a_n = a_1 - a_2 + a_3 - a_4 + \cdots.$$
Lemma 197 (Leibniz criterion). Let $\{a_n\}$ be a decreasing sequence with positive terms and assume that $a_n \to 0$. Then $\sum (-1)^{n-1} a_n$ converges.
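Proof. Let $s_n = \sum_{k=1}^n (-1)^k a_k$; the partial sums of the series are $-s_n$, so it is enough to show that $s_n$ converges. Then $s_{2n} = \sum_{k=1}^n (-a_{2k-1} + a_{2k})$ is decreasing. Analogously $s_{2n+1} = -a_1 + \sum_{k=1}^n (a_{2k} - a_{2k+1})$ is increasing. In addition, $s_{2n} - s_{2n+1} = -(-a_{2n+1}) = a_{2n+1} \ge 0$. Hence $s_{2n}$ and $s_{2n+1}$ converge to $\overline{s}$ and $\underline{s}$, respectively, and $\overline{s} \ge \underline{s}$. But $s_{2n+1} \le \underline{s} \le \overline{s} \le s_{2n}$ and $s_{2n} - s_{2n+1} = a_{2n+1} \to 0$, hence $\underline{s} = \overline{s}$.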
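The mechanism of this proof, two monotone subsequences that squeeze the limit, is visible in a short computation. The following Python sketch is an illustration with our own names; it uses the sign convention $t_n = \sum_{k=1}^n (-1)^{k-1} a_k$ of the series itself (the negative of $s_n$ in the proof), so here the odd partial sums decrease and the even ones increase. The limit $\log 2$ for $a_k = 1/k$ is a classical value quoted as an assumption.

```python
import math

def partial_sums(a, n_max: int):
    """Return [t_1, ..., t_n_max] with t_n = sum_{k=1}^n (-1)^(k-1) a(k)."""
    sums, total = [], 0.0
    for k in range(1, n_max + 1):
        total += (-1) ** (k - 1) * a(k)
        sums.append(total)
    return sums

t = partial_sums(lambda k: 1.0 / k, 2000)
odd = t[0::2]    # t_1, t_3, ...: decreasing
even = t[1::2]   # t_2, t_4, ...: increasing

# The limit is trapped between the last even and the last odd partial sum.
print(even[-1], math.log(2), odd[-1])
print(odd[-1] - even[-1])   # equals a_2000 = 1/2000 up to rounding
```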
   • $\sum_n (-1)^{n-1} \frac{1}{\log(n+1)}$ is convergent. Note that this series is not absolutely convergent: indeed, $\sum_n \frac{1}{\log(n+1)}$ is divergent.
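   • The sequence $\sum_{k=1}^n \frac{1}{k} - \log n$ converges. Indeed, this can be seen as
$$\begin{aligned}
&1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \int_1^2 \frac{1}{x}\,dx - \int_2^3 \frac{1}{x}\,dx - \cdots - \int_{n-1}^n \frac{1}{x}\,dx \\
&= \frac{1}{1} - \int_1^2 \frac{1}{x}\,dx + \frac{1}{2} - \int_2^3 \frac{1}{x}\,dx + \cdots - \int_{n-1}^n \frac{1}{x}\,dx + \frac{1}{n},
\end{aligned}$$
      and the last expression is an alternating series. Note that $\frac{1}{k} > \int_k^{k+1} \frac{1}{x}\,dx > \frac{1}{k+1}$. Furthermore, $\frac{1}{k} \to 0$ as $k \to \infty$. Therefore, the Leibniz criterion applies and the series converges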
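The convergence of $\sum_{k=1}^n \frac{1}{k} - \log n$ can also be observed numerically. In the sketch below (our own illustration; the name `gamma_n` is hypothetical) the values decrease slowly towards a limit near 0.5772. This limit is known as the Euler–Mascheroni constant, a fact that we quote here and that is not stated in the text above.

```python
import math

def gamma_n(n: int) -> float:
    """Return 1 + 1/2 + ... + 1/n - log(n)."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

for n in (10, 100, 1000, 10000, 100000):
    print(n, gamma_n(n))
```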
Theorem 200 (Dirichlet's test). Let $\sum_n a_n$ be a series and assume that $A_n = \sum_{k=1}^n a_k$ is a bounded sequence. Let $b_n > 0$ be a decreasing sequence and $b_n \to 0$. Then the series $\sum_n a_n b_n$ converges.
Proof. By Lemma 199, we have $\sum_{k=1}^n a_k b_k = A_n b_{n+1} + \sum_{k=1}^n A_k (b_k - b_{k+1})$. As $b_k$ is decreasing, we have $b_k - b_{k+1} \ge 0$ and since $A_k$ is bounded, say by $C$, we have $\sum_{k=1}^n |A_k (b_k - b_{k+1})| \le \sum_{k=1}^n C(b_k - b_{k+1})$. The latter is a telescopic series and it is equal to $C(b_1 - b_{n+1})$, which converges (to $Cb_1$). Therefore, $\sum_{k=1}^n A_k (b_k - b_{k+1})$ is absolutely convergent.
    On the other hand, $A_n b_{n+1}$ tends to 0 because $|A_n| < C$ and $b_{n+1} \to 0$. Altogether, the series $\sum_{k=1}^n a_k b_k$ converges.
Theorem 201 (Abel's test). Let $\sum_n a_n$ be a convergent series and assume $b_n$ is monotonic and convergent. Then the series $\sum_n a_n b_n$ converges.
Proof. We may assume that $b_n$ is decreasing (otherwise we can consider $-b_n$). In this case, $A_n = \sum_{k=1}^n a_k$ is bounded (because it is convergent), and with $B = \lim b_n$, $b_n - B$ is decreasing and converges to 0.
    Hence we can apply Dirichlet's test to the first term of
$$\sum_n a_n b_n = \sum_n a_n (b_n - B) + \sum_n a_n B,$$
where the last term is convergent because $\sum_n a_n$ is convergent.
Example 202. Let us consider the series $\sum_n \frac{\sin n\pi\theta}{n^s}$, where $\theta = \frac{p}{q}$ is a rational number and $s > 0$. This is in general not absolutely convergent when $s \le 1$, nor alternating (if $q \neq 2$). On the other hand, the sequence $\sin n\pi\theta$ is periodic, because $\sin x$ is $2\pi$-periodic, that is, $\sin \frac{n\pi p}{q} = \sin \frac{(n+2q)\pi p}{q}$. Let us assume $q$ is even and $p$ is odd. Then $\sin \frac{(n+q)\pi p}{q} = -\sin \frac{n\pi p}{q}$, and hence also the sum $\sum_n \sin n\pi\theta$ is periodic, and in particular bounded.
   In this case, as $\frac{1}{n^s}$ is monotonically decreasing and tends to 0, we can apply Dirichlet's test and conclude that $\sum_{n=1}^\infty \frac{\sin n\pi\theta}{n^s}$ is convergent. Similarly for $\sum_{n=1}^\infty \frac{\cos n\pi\theta}{n^s}$.
   The same holds for q odd, and actually for any θ ∈ R.
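Both ingredients of Dirichlet's test can be observed numerically in this example. The sketch below is our own illustration with hypothetical helper names, for $\theta = \frac14$ and $s = 1$. The comparison value $\frac{3\pi}{8}$ comes from the classical Fourier expansion $\sum_{n \ge 1} \frac{\sin nx}{n} = \frac{\pi - x}{2}$ for $0 < x < 2\pi$, which is quoted as an assumption and not derived in these notes.

```python
import math

def sine_partial_sums(theta: float, n_max: int):
    """Return the partial sums A_n = sum_{k=1}^n sin(k*pi*theta)."""
    sums, total = [], 0.0
    for k in range(1, n_max + 1):
        total += math.sin(k * math.pi * theta)
        sums.append(total)
    return sums

def dirichlet_series(theta: float, s: float, n_max: int) -> float:
    """Return sum_{n=1}^{n_max} sin(n*pi*theta) / n^s."""
    return sum(math.sin(n * math.pi * theta) / n ** s
               for n in range(1, n_max + 1))

theta = 0.25
# The partial sums of sin(n*pi*theta) stay bounded ...
print(max(abs(a) for a in sine_partial_sums(theta, 1000)))
# ... and the weighted series settles down to a finite value.
print(dirichlet_series(theta, 1.0, 100000), 3 * math.pi / 8)
```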
   • The heat equation $\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$ (this is a partial differential equation, because it contains
This is because the rate of change (the derivative) is often determined by the current status (the function). For example, in the equation of motion, the gravitational force $-\frac{mMG}{x^2}$ depends on the place of a particle $x(t)$, while the force determines the rate of change of the speed (the acceleration), and the speed is $x'(t)$, hence the second derivative appears on the left-hand side.
   Once the scientific problem is written in the form of a differential equation, it is a mathematical problem to solve it, that is, to find functions that satisfy the given equation.
   In the following, $y(x)$ will be a function of $x$ and the derivatives are denoted by $y'(x), y''(x)$ and so on. Some more examples of differential equations are
• $y'(x) = y(x)$
    In a differential equation, higher derivatives of $y$ may appear. The highest order of the derivative of $y$ is called the order of the differential equation. For example,
We need to find functions $y(x)$ that satisfy the given equation. This is why it is called a differential equation. Compare it with an algebraic equation $x^2 + 3x - 4 = 0$, where we need to find numbers that satisfy this equation.
   Let us consider first-order differential equations. In an abstract form, we can write it as
$y' = f(x, y)$,
where $f$ is explicitly written in examples, while $y$ is the unknown function which we need to find. In the example $y'(x) = 2y(x)$, we take $f(x, y) = 2y$. A solution of a differential equation is a (differentiable) function that satisfies this equation. For example, by taking $y(x) = Ce^{2x}$, we can check that this is a solution:
Some first-order differential equations
The simplest case is where $f$ does not depend on $y$: that is,
$y'(x) = f(x)$.
Example 203. When a ball falls freely without drag, the speed $-gx$ is proportional to the time $x$. As the speed is the derivative of the position $y$, we have the equation
$$y'(x) = -gx.$$
This can be solved by integration, that is $y(x) = \int (-gx)\,dx = -\frac{gx^2}{2} + C$. The constant $C$
    As we see in this example, a differential equation may have many solutions. In practice, we are interested in one of them which satisfies additional conditions, the initial conditions or boundary conditions, that give the value of $y, y'$ at a given time $x$.
    Next, let us consider again the simplest differential equation $y' = f(x, y)$ where $f$ depends on $y$.
Theorem 204. Let $a, C \in \mathbb{R}$. Then there is only one (differentiable) function $y$ such that $y'(x) = ay(x)$ and $y(0) = C$.
Proof. We know that there is one such function: $y(x) = Ce^{ax}$. Indeed, we can check that $y'(x) = aCe^{ax} = ay(x)$ and $y(0) = Ce^0 = C$.
      Suppose that there is $g(x)$ with the same condition. Let $h(x) = e^{-ax} g(x)$, then $h'(x) = -ae^{-ax} g(x) + e^{-ax} g'(x) = -ae^{-ax} g(x) + ae^{-ax} g(x) = 0$ for all $x \in \mathbb{R}$, hence $h(x)$ must be a constant. As $h(0) = e^0 g(0) = C$, $h(x) = C$ hence $g(x) = Ce^{ax}$.
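Since the solution is unique, any reasonable numerical scheme for $y' = ay$, $y(0) = C$ should approach $Ce^{ax}$. The sketch below uses the explicit Euler method, which is our own choice for illustration and is not discussed in the notes; the function names are hypothetical.

```python
import math

def euler_solve(a: float, c: float, x_end: float, steps: int) -> float:
    """Approximate y(x_end) for y' = a*y, y(0) = c by explicit Euler steps."""
    h = x_end / steps
    y = c
    for _ in range(steps):
        y += h * a * y       # y(x + h) is approximated by y(x) + h * y'(x)
    return y

def exact(a: float, c: float, x: float) -> float:
    """The unique solution C * e^(a x) from Theorem 204."""
    return c * math.exp(a * x)

a, c, x_end = -0.5, 3.0, 2.0
for steps in (10, 100, 1000, 100000):
    print(steps, euler_solve(a, c, x_end, steps), exact(a, c, x_end))
```

As the number of steps grows, the approximation approaches the exact value, as one expects from uniqueness.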
   • A very typical example is radioactive atoms. Let $y(x)$ be the number of a single species of radioactive atoms at time $x$. It is known that each atom decays, independently from other atoms, in a certain time period with a certain probability. This means that, at each moment, the rate of decrease of the number $y(x)$ is proportional to $y(x)$. With a constant $a$, we can write this as
$$y'(x) = -ay(x).$$
      If there are $C$ atoms at time $x = 0$, we know that the solution is $y(x) = Ce^{-ax}$, hence the number of radioactive atoms decays exponentially. This can be written more conveniently as $y(x) = Ce^{-ax} = C2^{-ax/\log 2}$. Then with $T = \frac{\log 2}{a}$, we have $y(x) = C2^{-x/T}$, and it is clear that the number of atoms halves in time $T$. $T$ is called the half-life of this particular species of atom.
Figure 78: The exponential decay $y(x) = Ce^{-ax} = C2^{-ax/\log 2}$. The half-life is $T = \log 2/a$.
   • Another instance is the SIR model in epidemiology. We consider the total population $N$, and the numbers $S(t)$ (susceptible), $I(t)$ (infected) and $R(t)$ (removed/recovered). It is assumed that each infected person has contact with a certain number of people each day, hence the number of contacts between infected and susceptible people is proportional to $\frac{S(t)I(t)}{N}$, and we assume that in each such contact transmission occurs at the rate $\beta$. On the other hand, each infected person loses infectivity at the rate $\gamma$.
$$\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S(t)I(t)}{N} \\
\frac{dI}{dt} &= \frac{\beta S(t)I(t)}{N} - \gamma I(t) \\
\frac{dR}{dt} &= \gamma I(t)
\end{aligned}$$
      It is difficult to solve this set of equations. Yet, we can understand the behaviour when there are few infected people $I(t)$ compared to the total number $N$. When $I(t)$ is small, then $R(t)$ is also small and $S(t) = N - I(t) - R(t)$ is close to $N$. By putting $S(t) = N$, we have
$$\frac{dI}{dt} = (\beta - \gamma)I(t).$$
      As a function of $t$, we know that $I(t) = Ce^{(\beta-\gamma)t}$, where $C$ is the number of infected at day $t = 0$. This epidemic grows when $\beta - \gamma > 0$, and decays when $\beta - \gamma < 0$. $R_0 = \frac{\beta}{\gamma}$ is called the basic reproduction number in the SIR model. When $R_0 > 1$ the epidemic grows and when $R_0 < 1$ it decays.
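Although the SIR equations are hard to solve in closed form, they are easy to integrate numerically. The following Python sketch uses explicit Euler steps; the step size, the parameter values and the function name `simulate_sir` are our own assumptions for illustration, not the code used for the figure in these notes. It allows one to compare the early phase with the approximation $I(t) \approx Ce^{(\beta-\gamma)t}$.

```python
import math

def simulate_sir(beta, gamma, n_total, i0, days, steps_per_day=100):
    """Integrate the SIR equations with explicit Euler steps.

    Returns a list of tuples (t, S, I, R).
    """
    s, i, r = float(n_total - i0), float(i0), 0.0
    h = 1.0 / steps_per_day
    history = [(0.0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        new_infections = beta * s * i / n_total   # -dS/dt
        recoveries = gamma * i                    # dR/dt
        s -= h * new_infections
        i += h * (new_infections - recoveries)
        r += h * recoveries
        history.append((step * h, s, i, r))
    return history

beta, gamma = 0.3, 0.1          # hypothetical rates, so R0 = beta/gamma = 3
t, s, i, r = simulate_sir(beta, gamma, 1_000_000, 1, 10)[-1]
print(i, math.exp((beta - gamma) * t))   # early phase: close to e^((beta-gamma) t)
print(s + i + r)                          # the total population is conserved
```

With $\beta < \gamma$ the same simulation shows the number of infected decreasing from the start.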
This contains the second derivative $y''$, hence it is a second-order differential equation. Here, $P(x), Q(x), R(x)$ are known functions and we have to find a function $y(x)$ which satisfies this equation. Such an equation is called a linear homogeneous differential equation of second order. (Linear means that there is no term containing $y^2, y^3, (y')^2$ etc.; homogeneous means that the right-hand side (the term which does not depend on $y$) is 0.)
    If $y_1, y_2$ are two solutions of a homogeneous equation, then so is $y_3(x) = ay_1(x) + by_2(x)$, because
If $y_1, y_2$ are two solutions of a linear equation, then the difference $y_1 - y_2$ is a solution of the equation where $R(x)$ is set to 0.
   The simplest of such equations is one where $P(x), Q(x), R(x)$ are constant:
$y'' + ay' + by = 0$.
$y'' + by = 0$.
   • Case 2. $b < 0$. In this case, the equation can be written as $y'' = k^2 y$ where $b = -k^2$ and we can easily check that $y(x) = C_1 e^{kx} + C_2 e^{-kx}$ is a solution for any constants $C_1, C_2$. Indeed, $y'(x) = kC_1 e^{kx} - kC_2 e^{-kx}$ and $y''(x) = k^2 C_1 e^{kx} + (-k)^2 C_2 e^{-kx} = k^2 (C_1 e^{kx} + C_2 e^{-kx}) = k^2 y(x)$.
These are all the solutions, and there are no others (we present later the general uniqueness theorem).
   Note that, in all these cases, there are two constants $C_1, C_2$. If we require an initial condition
• $y(a) = b_1$
• $y'(a) = b_2$
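$$\begin{aligned}
0 &= \left(u''(x)e^{-\frac{ax}{2}} - au'(x)e^{-\frac{ax}{2}} + \frac{a^2}{4}u(x)e^{-\frac{ax}{2}}\right) + a\left(u'(x)e^{-\frac{ax}{2}} - \frac{a}{2}u(x)e^{-\frac{ax}{2}}\right) + bu(x)e^{-\frac{ax}{2}} \\
&= u''(x)e^{-\frac{ax}{2}} + \left(b - \frac{a^2}{4}\right)u(x)e^{-\frac{ax}{2}}
\end{aligned}$$
hence if $u$ satisfies $u''(x) + (b - \frac{a^2}{4})u(x) = 0$, then $y(x) = u(x)e^{-\frac{ax}{2}}$ satisfies $y'' + ay' + by = 0$. We know how to solve the former, hence also the latter.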
Example 205. Consider the equation $y'' + y' - y = 0$. Then we can write $y = ue^{-\frac{x}{2}}$ and then $u$ should satisfy $u'' - \frac{5}{4}u = 0$. We know that $u(x) = C_1 e^{\frac{\sqrt{5}}{2}x} + C_2 e^{-\frac{\sqrt{5}}{2}x}$ is a solution of this, hence $y = C_1 e^{\frac{\sqrt{5}-1}{2}x} + C_2 e^{-\frac{\sqrt{5}+1}{2}x}$.
   Next, let us consider the inhomogeneous case, that is $y'' + ay' + by = R(x)$. In some cases we can find solutions.
Example 206. Take $R(x) = x^2$. Then $y(x) = \frac{1}{b}\left(x^2 - \frac{2ax}{b} + \frac{2a^2 - 2b}{b^2}\right)$ is a solution. Indeed, $y'(x) = \frac{1}{b}\left(2x - \frac{2a}{b}\right)$, $y''(x) = \frac{2}{b}$.
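One can double-check this particular solution numerically by inserting it into $y'' + ay' + by - x^2$. The sketch below is our own illustration (hypothetical function names, derivatives approximated by central differences, which are exact for a quadratic polynomial up to rounding).

```python
def particular_solution(a: float, b: float):
    """Return y(x) = (x^2 - 2ax/b + (2a^2 - 2b)/b^2) / b as a function."""
    def y(x: float) -> float:
        return (x * x - 2 * a * x / b + (2 * a * a - 2 * b) / (b * b)) / b
    return y

def residual(y, a: float, b: float, x: float, h: float = 1e-3) -> float:
    """Return y'' + a y' + b y - x^2 at x, using central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 + a * d1 + b * y(x) - x * x

a, b = 1.0, 2.0   # hypothetical coefficients with b != 0
y = particular_solution(a, b)
for x in (-2.0, 0.0, 1.5, 3.0):
    print(x, residual(y, a, b, x))
```

The residuals should vanish up to rounding errors for any choice of $a$ and $b \neq 0$.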
Physical examples
   • Simple harmonic motion. Consider a mass $m$ which is attached to a spring. Let us call $x(t)$ the position of the mass. When a spring is stretched by the distance $r$, then it pulls back the mass by the force $kr$. Similarly, when a spring is pressed by the distance $r$ (hence the mass is displaced to $-r$), then it pushes back the mass by the force $kr$. Together with the direction of the force, it can be written as $-kx$.
     The equation of motion is about the variable $x(t)$ and the acceleration is $a(t) = x''(t)$, hence $F(x) = ma = mx''$ becomes
   • In addition to the previous example, let us consider the case where the mass lies on a floor and hence receives friction. The friction is proportional to the velocity and acts in the opposite direction. Therefore, the equation of motion is
     or $x'' + \frac{c}{m}x' + \frac{k}{m}x = 0$.
     The solution is given by solving $u'' + \left(\frac{k}{m} - \frac{1}{4}\left(\frac{c}{m}\right)^2\right)u = 0$. If $-s^2 = \frac{k}{m} - \frac{1}{4}\left(\frac{c}{m}\right)^2 < 0$, then we have $x(t) = C_1 e^{-(\frac{c}{2m} - s)t} + C_2 e^{-(\frac{c}{2m} + s)t}$.
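     If $x(0) = a$, $x'(0) = 0$, then $C_1 + C_2 = a$ and $-(\frac{c}{2m} - s)C_1 - (\frac{c}{2m} + s)C_2 = 0$, hence $C_1 = \frac{a}{2}\left(1 + \frac{c}{2ms}\right)$, $C_2 = \frac{a}{2}\left(1 - \frac{c}{2ms}\right)$, and
$$x(t) = \frac{a}{2}\left(\left(1 + \frac{c}{2ms}\right)e^{-(\frac{c}{2m} - s)t} + \left(1 - \frac{c}{2ms}\right)e^{-(\frac{c}{2m} + s)t}\right).$$
     Note that $\frac{c}{2m} > s$, hence this decays exponentially. This means that the mass arrives at 0 without going back and forth.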
     We leave the remaining case $\frac{k}{m} - \frac{1}{4}\left(\frac{c}{m}\right)^2 \ge 0$ as exercises.
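The overdamped motion with $x(0) = a$, $x'(0) = 0$ can be explored numerically. In the Python sketch below (our own illustration; `overdamped_solution` and the sample values of $m, c, k$ are hypothetical) the constants $C_1, C_2$ are obtained by solving the two initial-condition equations directly, and the resulting $x(t)$ is then evaluated on a grid.

```python
import math

def overdamped_solution(m: float, c: float, k: float, x0: float):
    """Return x(t) solving m x'' + c x' + k x = 0, x(0) = x0, x'(0) = 0.

    Only the overdamped case k/m - (c/m)^2 / 4 < 0 is handled.
    """
    disc = (c / m) ** 2 / 4 - k / m          # this is s^2
    if disc <= 0:
        raise ValueError("not overdamped: need (c/m)^2 / 4 > k/m")
    s = math.sqrt(disc)
    lam1 = c / (2 * m) - s
    lam2 = c / (2 * m) + s
    # Solve C1 + C2 = x0 and -lam1*C1 - lam2*C2 = 0.
    c1 = x0 * lam2 / (lam2 - lam1)
    c2 = -x0 * lam1 / (lam2 - lam1)
    return lambda t: c1 * math.exp(-lam1 * t) + c2 * math.exp(-lam2 * t)

x = overdamped_solution(m=1.0, c=3.0, k=2.0, x0=1.0)
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, x(t))
```

The printed positions stay positive and decrease monotonically to 0, so the mass does not oscillate.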
Dec. 17. Ordinary differential equations
General remarks
Many interesting differential equations are nonlinear: just for example, the motion in a gravitational
field is given by
\[ mx'' = -\frac{mMG}{x^2} \]
(in one dimension). It is difficult to solve such nonlinear equations.
    Let us consider a first-order differential equation
\[ y' = f(x, y), \]
that is, $f$ is a given function of two variables and the question is to find a function $y(x)$ such
that $y'(x) = f(x, y(x))$ for $x$ in a certain interval.
   If from the differential equation $y' = f(x, y)$ we can derive a relation between $x, y$ of the form
\[ F(x, y, C) = 0, \]
where $F(x, y, C)$ is another two-variable function with a parameter $C$ (hence a function of 3 variables), then we
say that the differential equation is solved, or integrated. This is because the relation $F(x, y, C) = 0$
for a fixed number $C$ defines a function $y(x)$ implicitly: recall that, if $F(x, y) = x^2 + y^2 - C^2$,
then $F(x, y) = 0$ defines $y(x) = \sqrt{C^2 - x^2}$ (for $y \ge 0$).
G(y(x)) = H(x).
    Conversely, if $y(x)$ is the solution of the differential equation, then by integrating both sides
of $A(y(x))y'(x) = Q(x)$ by substitution, we have $G(y(x)) = H(x) + C$ for some constant $C$. Note
that $H(x) + C$ is a primitive of $Q(x)$, hence we proved the claim.
Example 208.      • Consider $y' = y^2 x$. This can be written as $\frac{y'}{y^2} = x$. Both sides can be
    integrated, and we obtain
    \[ -\frac{1}{y} = \frac{x^2}{2} + C, \]
    or $y = -\frac{1}{\frac{x^2}{2} + C}$.
   • Consider $y' = \frac{x}{y}$. This can be written as $yy' = x$. Both sides can be integrated, and we
     obtain
     \[ \frac{y^2}{2} = \frac{x^2}{2} + C, \]
     or $y = \sqrt{x^2 + 2C}$.
   • Consider $xy' + y = y^2$. This can be written as $y' = \frac{y(y-1)}{x}$, or $\frac{y'}{y(y-1)} = \frac{1}{x}$. Both sides can
     be integrated, and we obtain
     \[ \log\left|1 - \frac{1}{y}\right| = \log x + C, \]
\[ \int \frac{1 + v}{1 + v^2}\, dv = \arctan v + \frac{1}{2}\log(1 + v^2), \]
\[ -\int \frac{1}{x}\, dx = -\log x + C. \]
By bringing back $y = vx$, we have $\arctan\frac{y}{x} + \frac{1}{2}\log(x^2 + y^2) = C$.
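The closed-form solutions of the separable equations in Example 208 can be checked numerically. The following is a minimal sketch under stated assumptions: the constants `C` and `K` are hypothetical values, `K` plays the role of $e^C$ in the third example (so that $1 - \frac{1}{y} = Kx$), and the derivative is approximated by a central difference.

```python
import math

def derivative(y, x, h=1e-6):
    """Central-difference approximation of y'(x)."""
    return (y(x + h) - y(x - h)) / (2 * h)

C = 1.0   # hypothetical integration constant
K = 0.1   # hypothetical value of e^C in the third example

# y' = y^2 x      with y = -1 / (x^2/2 + C)
y1 = lambda x: -1.0 / (x ** 2 / 2 + C)
# y' = x / y      with y = sqrt(x^2 + 2C)
y2 = lambda x: math.sqrt(x ** 2 + 2 * C)
# x y' + y = y^2  with 1 - 1/y = K x, that is y = 1 / (1 - K x)
y3 = lambda x: 1.0 / (1 - K * x)

residuals = []
for x in (0.5, 1.0, 2.0):
    r1 = derivative(y1, x) - y1(x) ** 2 * x
    r2 = derivative(y2, x) - x / y2(x)
    r3 = x * derivative(y3, x) + y3(x) - y3(x) ** 2
    residuals.append((r1, r2, r3))
    print(f"x = {x}: {r1:.2e} {r2:.2e} {r3:.2e}")
```

Each printed residual is the difference between the two sides of the corresponding differential equation, so all of them should be close to 0 up to the finite-difference error.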
Concrete applications
   • Consider a falling body in a resisting medium. For example, we drop a ball from a window.
     The gravitational acceleration is a constant $g$ as long as the body moves a distance much shorter than
     the radius of the Earth. In addition, the ball is resisted by the air, and the resistance
     is proportional to the velocity. To express this in a differential equation, let $v(t)$ be the
     velocity of the ball at time $t$; we release it at time $t = 0$ from the height 0. Then
     \[ mv' = mg - kv, \]
     or $\frac{mv'}{mg - kv} = 1$. This is separable. We have $-\frac{m}{k}\log(mg - kv) = t + C$, or
     \[ v(t) = \frac{mg - e^{-\frac{k}{m}(t + C)}}{k}. \]
   • Let us consider a small particle in a large medium. If the temperature of the particle and
     that of the medium are different, then the rate of change of the temperature is proportional
     to the difference of the temperatures. As the medium is large, we may assume that only the
     temperature $y(t)$ of the particle changes, with $y(0) = T$, while the medium remains at the same
     temperature $M$. As a differential equation,
     \[ y'(t) = k(M - y(t)). \]
      This is again separable.
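Both applications can be compared with a numerical integration. The following is a minimal sketch under stated assumptions: the ball is taken to start at rest, $v(0) = 0$, which fixes the constant and gives $v(t) = \frac{mg}{k}(1 - e^{-\frac{k}{m}t}$); for the cooling problem, separating variables gives $y(t) = M + (T - M)e^{-kt}$. All parameter values and the helper `rk4` (the classical Runge-Kutta scheme, used here only as an independent check) are hypothetical and not part of the notes.

```python
import math

def rk4(f, t0, y0, t1, n):
    """Integrate y' = f(t, y) from t0 to t1 with n classical Runge-Kutta steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# Falling body: m v' = m g - k v, assuming v(0) = 0 (hypothetical values).
m, g, k = 0.2, 9.8, 0.5
v_exact = lambda t: (m * g / k) * (1 - math.exp(-k * t / m))
v_num = rk4(lambda t, v: g - (k / m) * v, 0.0, 0.0, 2.0, 200)
print(v_num, v_exact(2.0), "terminal velocity:", m * g / k)

# Cooling: y' = kc (M - y), y(0) = T (hypothetical values).
kc, M, T = 0.3, 20.0, 90.0
y_exact = lambda t: M + (T - M) * math.exp(-kc * t)
y_num = rk4(lambda t, y: kc * (M - y), 0.0, T, 2.0, 200)
print(y_num, y_exact(2.0))
```

In the first problem the velocity tends to $\frac{mg}{k}$ as $t \to \infty$; in the second the temperature tends to $M$.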
Dec. 20. Complex numbers
Consider the equation $x^2 + 1 = 0$. There is no real solution $x$ that satisfies this equation because
$x^2 \ge 0$, hence $x^2 + 1 > 0$. Yet, it is possible to extend the set of real numbers in such a way as to
include solutions to this equation. One such solution is denoted by $i$, which means $i^2 = -1$.
This is a formal definition, and it is customary to denote $(a, b)$ by $a + ib$. $a$ is called the real
part and $b$ is called the imaginary part.
   The set of complex numbers is denoted by C.
• (Commutativity) x + y = y + x, xy = yx.
Proof. All these properties can be shown by calculating both sides and using the properties of
real numbers. For example, for x = a + ib = (a, b), y = c + id = (c, d), x + y = (a, b) + (c, d) =
(a + c, b + d) = (c + a, d + b) = y + x, xy = (a, b)(c, d) = (ac − bd, ad + bc) = (ca − db, da + cb) =
(c, d)(a, b) = yx.
     Other properties are left as exercises.
We write $\left(\frac{a}{a^2 + b^2}, \frac{-b}{a^2 + b^2}\right)$ for $1/(a, b)$, or $(a, b)^{-1}$.
Imaginary unit
The complex number $(0, 1)$ satisfies $(0, 1)^2 = (-1, 0)$. Indeed, $(0, 1)^2 = (0 - 1, 0 + 0) = (-1, 0)$.
With the understanding that $(-1, 0) = -1 \in \mathbb{R}$, we can interpret this as $(0, 1) = i \in \mathbb{C}$.
   For a real number $a$ and a complex number $(b, c)$, we have $a \cdot (b, c) = (a, 0)(b, c) = (ab, ac)$.
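The pair description of complex numbers can be tried out directly. The following is a minimal sketch, assuming the product rule $(a, b)(c, d) = (ac - bd, ad + bc)$ used in the proof of commutativity and the formula for $(a, b)^{-1}$ given there; the function names `add`, `mul`, `inv` are hypothetical.

```python
def add(z, w):
    """(a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a, b)(c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def inv(z):
    """(a, b)^(-1) = (a / (a^2 + b^2), -b / (a^2 + b^2))."""
    a, b = z
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("(0, 0) has no inverse")
    return (a / n, -b / n)

i = (0, 1)
print(mul(i, i))             # (-1, 0), that is i^2 = -1
print(mul((2, 0), (5, 7)))   # (10, 14), that is a(b, c) = (ab, ac)

z = (3.0, 4.0)
print(mul(z, inv(z)))        # close to (1, 0)

# Comparison with Python's built-in complex type.
p = mul((1, 2), (3, -1))
print(p, complex(1, 2) * complex(3, -1))
```

The last line compares the pair product with Python's built-in `complex` multiplication, which follows the same rule.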
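[Figure: the complex numbers $2 + i$, $-1 + 2i$ and $-0.5 - i$ as points on the plane, and the sum $z_1 + z_2$ of two complex numbers $z_1$, $z_2$.]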
    With this understanding, any complex number $(a, b)$ can be written as $a + ib$, where $a =
(a, 0)$, $b = (b, 0)$, $i = (0, 1)$. We can perform all the usual computations using $i^2 = -1$, for
example,
Geometric interpretation
As we represented a real number as a point on the line, we can represent a complex number on
the plane:
    This is helpful especially when we consider various operations on complex numbers. For
example, any complex number (a, b) can be considered as a segment from (0, 0). Then the sum
can be found by forming a parallelogram.
    As we identify a complex number with a point on the plane, for each complex number $(a, b)$
the segment $(0, 0)$–$(a, b)$ has the length $r = \sqrt{a^2 + b^2}$ and makes an angle $\theta$ with the horizontal axis (the
real numbers), and we can write this as $(a, b) = (r\cos\theta, r\sin\theta)$.
[Figure: a complex number $z = (r\cos\theta, r\sin\theta)$ on the plane.]
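With two complex numbers $(r_1\cos\theta_1, r_1\sin\theta_1)$, $(r_2\cos\theta_2, r_2\sin\theta_2)$, we have
\[ (r_1\cos\theta_1, r_1\sin\theta_1)(r_2\cos\theta_2, r_2\sin\theta_2) = (r_1 r_2(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2), r_1 r_2(\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)) = (r_1 r_2\cos(\theta_1 + \theta_2), r_1 r_2\sin(\theta_1 + \theta_2)). \]
Therefore, the product has the length $r_1 r_2$ and the angle $\theta_1 + \theta_2$. In particular, if we take
$z = (r\cos\theta, r\sin\theta)$, then we have $z^n = (r^n\cos n\theta, r^n\sin n\theta)$.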
   From this, we can conclude that any $z \in \mathbb{C}$ has an $n$-th root in $\mathbb{C}$. Indeed, if
$z = (r\cos\theta, r\sin\theta)$, then we can take $w = (r^{1/n}\cos\frac{\theta}{n}, r^{1/n}\sin\frac{\theta}{n})$.
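The construction of an $n$-th root from the polar form can be sketched as follows. This is only an illustration: the function name `nth_root` and the test values are hypothetical, and the standard-library functions `cmath.polar` and `cmath.rect` are used to pass between $(a, b)$ and $(r, \theta)$.

```python
import cmath

def nth_root(z, n):
    """One n-th root of z, taken from the polar form z = (r cos t, r sin t)."""
    r, theta = cmath.polar(z)                # r >= 0, theta between -pi and pi
    return cmath.rect(r ** (1.0 / n), theta / n)

z = complex(-8, 0)
w = nth_root(z, 3)
print(w, w ** 3)                             # w ** 3 is close to -8

# Lengths multiply and angles add under the product.
z1, z2 = cmath.rect(2.0, 0.4), cmath.rect(1.5, 0.7)
r, theta = cmath.polar(z1 * z2)
print(r, theta)                              # close to 3.0 and 1.1
```

Note that this gives only one of the $n$-th roots; the others differ from it by the angle $\frac{2\pi k}{n}$.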
|x − y| ≤ |x − z| + |z − y|.
This is literally a triangle inequality, in the sense that |x − y|, |y − z|, |z − x| are the lengths of
the sides of the triangle formed by x, y, z in C.
    For x = a + ib, let Re x = a, Im x = b. We have |x| ≥ |Re x|, |Im x|, while |x| ≤ |Re x| + |Im x|.
Complex sequences
Let us consider a sequence of complex numbers {xn } and x ∈ C. We say that xn converges to x
(and write xn → x) if |xn − x| → 0. Note that {|xn − x|} is a sequence of real numbers, hence
we can use the definition and results there.
[Figure: the points $z_1$, $z_2$ and $-z_2$ on the complex plane.]
   In particular, we say that $\{x_n\}$ is a Cauchy sequence if for any $\varepsilon > 0$ there is $N$ such that
$|x_m - x_n| < \varepsilon$ if $m, n > N$. In this case, $\{x_n\}$ is convergent.
Complex series
If $\{z_n\}$ are complex numbers, we can also consider the series
\[ \sum_{k=0}^n z_k = z_0 + z_1 + \cdots + z_n. \]
We can define the convergence of the series as the convergence of the sequence $\sum_{k=0}^n z_k$ just as
with real numbers.
    We say that the series $\sum_{k=0}^n z_k$ converges absolutely if $\sum_{k=0}^n |z_k|$ converges.
Lemma 212. If $\sum_{k=0}^n z_k$ converges absolutely, then the series $\sum_{k=0}^n z_k$ converges.
Proof. If $\sum_{k=0}^n z_k$ converges absolutely, then $\sum_{k=0}^n |z_k|$ is a Cauchy sequence, and hence $\sum_{k=0}^n z_k$
is a Cauchy sequence (in the complex sense as above) because
\[ \left| \sum_{k=0}^n z_k - \sum_{k=0}^m z_k \right| = \left| \sum_{k=m+1}^n z_k \right| \le \sum_{k=m+1}^n |z_k| < \varepsilon \]
for sufficiently large $m, n$ (with $n > m$), by the triangle inequality. This implies that $\mathrm{Re} \sum_{k=0}^n z_k$ and $\mathrm{Im} \sum_{k=0}^n z_k$
are Cauchy, hence converge. Therefore, $\sum_{k=0}^n z_k$ converges by Lemma 211.
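The estimate in the proof can be observed on a concrete series. The following is a minimal sketch with an assumed example that does not appear in the notes: $z_k = q^k$ with $q = \frac{1 + i}{2}$, so that $|q| < 1$ and the geometric series has the sum $\frac{1}{1 - q} = 1 + i$. The helper name `partial_sums` is hypothetical.

```python
def partial_sums(q, n_terms):
    """Partial sums s_n = q^0 + q^1 + ... + q^n for n = 0, ..., n_terms - 1."""
    sums, total, power = [], 0j, 1 + 0j
    for _ in range(n_terms):
        total += power
        sums.append(total)
        power *= q
    return sums

q = (1 + 1j) / 2                      # assumed example, |q| = 1/sqrt(2) < 1
s = partial_sums(q, 60)               # partial sums of sum z_k
abs_s = partial_sums(abs(q), 60)      # partial sums of sum |z_k|

m, n = 10, 40
lhs = abs(s[n] - s[m])                # |sum_{k=m+1}^n z_k|
rhs = (abs_s[n] - abs_s[m]).real      # sum_{k=m+1}^n |z_k|
print(lhs, "<=", rhs)
print(s[-1], "is close to", 1 + 1j)
```

The first printed line is an instance of the triangle inequality used in the proof, and the second shows the partial sums approaching the limit.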
   Recall that, for a real number $x$, we have proved
\[ e^x = \sum_{n=0}^\infty \frac{x^n}{n!}, \]
\[ \sin x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)!}, \]
\[ \cos x = \sum_{n=0}^\infty \frac{(-1)^n x^{2n}}{(2n)!}. \]
We can extend these functions by replacing $x$ by a complex number $z$. Indeed, the series
\[ \sum_{n=0}^N \frac{z^n}{n!} \]
converges absolutely (hence converges, by Lemma 212) by the ratio test: with $a_n = \frac{|z|^n}{n!}$, $\frac{a_{n+1}}{a_n} = \frac{|z|}{n+1} \to 0$.
    Therefore, we can define the complex exponential function $e^z$ by $e^z = \sum_{n=0}^\infty \frac{z^n}{n!}$. Let us see
what happens if $z = i\theta$, $\theta \in \mathbb{R}$.
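\[ e^{i\theta} = \sum_{n=0}^\infty \frac{(i\theta)^n}{n!} = \sum_{n=0}^\infty \frac{(i\theta)^{2n}}{(2n)!} + \sum_{n=0}^\infty \frac{(i\theta)^{2n+1}}{(2n+1)!} \]
\[ = \sum_{n=0}^\infty \frac{(-1)^n \theta^{2n}}{(2n)!} + i \sum_{n=0}^\infty \frac{(-1)^n \theta^{2n+1}}{(2n+1)!} \]
\[ = \cos\theta + i\sin\theta. \]
This last formula is called the Euler formula. In particular, with $\theta = \pi$, we have $e^{i\pi} = -1$, or
$e^{i\pi} + 1 = 0$.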
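The Euler formula can be observed numerically by summing the series for $e^z$ directly. The following is a minimal sketch: the function name `exp_series` and the number of terms are hypothetical choices, and the partial sum is only an approximation of the infinite series.

```python
import math

def exp_series(z, terms=40):
    """Partial sum of sum_n z^n / n! with the given number of terms."""
    total = 0j
    term = 1 + 0j                 # z^0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)       # z^(n+1) / (n+1)! from z^n / n!
    return total

for theta in (0.5, 1.0, math.pi):
    lhs = exp_series(1j * theta)
    rhs = complex(math.cos(theta), math.sin(theta))
    print(theta, lhs, rhs)
```

For $\theta = \pi$ the partial sum is close to $-1$, up to rounding errors.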
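    We can also extend $\cos z$, $\sin z$ to complex variables (convergence is proven again by the ratio
test). As we have
\[ e^{\theta} = \sum_{n=0}^\infty \frac{\theta^n}{n!}, \qquad e^{-\theta} = \sum_{n=0}^\infty \frac{(-1)^n \theta^n}{n!}, \]
we obtain
\[ \cos(i\theta) = \sum_{n=0}^\infty \frac{(-1)^n (i\theta)^{2n}}{(2n)!} = \sum_{n=0}^\infty \frac{(-1)^n (-1)^n \theta^{2n}}{(2n)!} = \sum_{n=0}^\infty \frac{\theta^{2n}}{(2n)!} = \frac{1}{2}(e^{\theta} + e^{-\theta}) = \cosh\theta, \]
\[ \sin(i\theta) = \sum_{n=0}^\infty \frac{(-1)^n (i\theta)^{2n+1}}{(2n+1)!} = \sum_{n=0}^\infty \frac{(-1)^n i (-1)^n \theta^{2n+1}}{(2n+1)!} = i \sum_{n=0}^\infty \frac{\theta^{2n+1}}{(2n+1)!} = \frac{i}{2}(e^{\theta} - e^{-\theta}) = i\sinh\theta. \]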
     Furthermore, this explains why the differential equation $y'' + y = 0$ has the general solution
$y(x) = C_1 \sin x + C_2 \cos x$. By formally applying the chain rule (this can be justified by the
material in Mathematical Analysis II), we have $D(e^{ix}) = ie^{ix}$, $D^2(e^{ix}) = -e^{ix}$ and
$D(e^{-ix}) = -ie^{-ix}$, $D^2(e^{-ix}) = -e^{-ix}$, hence they are formally two solutions of the equation
$y'' + y = 0$. Hence their linear combinations $\sin x = \frac{1}{2i}(e^{ix} - e^{-ix})$ and $\cos x = \frac{1}{2}(e^{ix} + e^{-ix})$ satisfy
the same equation.
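The identities $\cos(i\theta) = \cosh\theta$, $\sin(i\theta) = i\sinh\theta$ and the expressions of $\sin x$, $\cos x$ through $e^{\pm ix}$ can be checked numerically. The following is a minimal sketch: the names `cos_series` and `sin_series`, the number of terms and the test values are hypothetical, and the series are truncated.

```python
import cmath
import math

def cos_series(z, terms=30):
    """Partial sum of sum_n (-1)^n z^(2n) / (2n)!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 1) * (2 * n + 2))
    return total

def sin_series(z, terms=30):
    """Partial sum of sum_n (-1)^n z^(2n+1) / (2n+1)!."""
    total, term = 0j, complex(z)
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 2) * (2 * n + 3))
    return total

theta = 1.5                                   # arbitrary real test value
print(cos_series(1j * theta), math.cosh(theta))
print(sin_series(1j * theta), 1j * math.sinh(theta))

x = 0.7                                       # arbitrary real test value
sin_from_exp = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / 2j
cos_from_exp = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2
print(sin_from_exp, math.sin(x))
print(cos_from_exp, math.cos(x))
```

In each printed pair the two values should agree up to rounding errors.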
                            Figure 83: The integral curves of $y' = x$.
    Yet, as we have seen, a differential equation gives a vector field as in Figure 83, and it
should be enough to “chase the arrows”. For this to be possible, $f(x, y)$ should have certain nice
properties. We only state the theorem, and leave the proof to a more advanced book.
    For this purpose, we need the following concept: Let f (x, y) be a function of two variables,
that is, f gives a number for a given pair of numbers (x, y). For each fixed y, we can think of
f (x, y) as a function of x, and hence take the derivative with respect to x. This is called the
partial derivative with respect to $x$, and denoted by $\frac{\partial f}{\partial x}$.
Example 213.        • Let $f(x, y) = x^2 + y^2$. Then $\frac{\partial f}{\partial x} = 2x$.
   • Let $f(x, y) = xy$. Then $\frac{\partial f}{\partial x} = y$.
   • Let $f(x, y) = \sin(xy^2)$. Then $\frac{\partial f}{\partial x} = y^2\cos(xy^2)$.
   It is also possible to consider $\frac{\partial f}{\partial y}$. The details will be explained in Mathematical Analysis II.
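The partial derivatives in Example 213 can be checked by a difference quotient in $x$ with $y$ kept fixed. The following is a minimal sketch: the function name `partial_x`, the step `h` and the test point are hypothetical choices.

```python
import math

def partial_x(f, x, y, h=1e-6):
    """Central-difference approximation of the partial derivative of f in x, with y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

# Pairs (f, claimed partial derivative with respect to x) from Example 213.
examples = [
    (lambda x, y: x ** 2 + y ** 2, lambda x, y: 2 * x),
    (lambda x, y: x * y, lambda x, y: y),
    (lambda x, y: math.sin(x * y ** 2), lambda x, y: y ** 2 * math.cos(x * y ** 2)),
]

x0, y0 = 0.8, 1.3   # arbitrary test point
errors = [abs(partial_x(f, x0, y0) - df(x0, y0)) for f, df in examples]
print(errors)
```

All three printed errors should be small, of the order of the finite-difference error.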
Theorem 214. Suppose that $f(x, y)$ and $\frac{\partial f}{\partial y}$ are continuous in a rectangle containing $(x_0, y_0)$ in its interior.
Then there is $\delta_1 > 0$ such that the equation $y' = f(x, y)$ has a unique solution $y(x)$ with initial
condition $y(x_0) = y_0$ for $x_0 - \delta_1 < x < x_0 + \delta_1$.
Euler method
The proof of this theorem amounts to constructing approximate solutions. In the end, for applications
in science and engineering, we are satisfied with having sufficiently good approximate
solutions.
    There are many methods to obtain a numerical solution of a differential equation. One of the
simplest of them is called Euler’s method, and it literally chases the vector field as follows.
        Figure 84: The Euler method to solve $y' = x$ with $(x_0, y_0) = (0, 1)$ with $\varepsilon = 0.5$.
    Let us consider the differential equation $y' = f(x, y)$ with the initial condition $y(x_0) = y_0$,
where $x_0, y_0 \in \mathbb{R}$. This means that the solution $y(x)$ passes through the point $(x_0, y_0)$. Furthermore, by
“chasing the arrows”, the slope of the curve $y(x)$ at the point $(x_0, y_0)$ is $f(x_0, y_0)$. That is, if we
take a small step $\varepsilon$, then the next point on the curve is close to $(x_0 + \varepsilon, y_0 + \varepsilon f(x_0, y_0)) = (x_1, y_1)$.
Then, again at the point $(x_1, y_1)$, the slope of the curve is $f(x_1, y_1)$, hence the next point on the
curve is close to $(x_1 + \varepsilon, y_1 + \varepsilon f(x_1, y_1)) = (x_2, y_2)$, and so on. In this way, we obtain a union of
segments which approximates the solution.
    If we take a smaller $\varepsilon$, the approximation gets better, while we need to do more computations.
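The steps above can be sketched in a few lines. This is a minimal illustration, not the listing of the notes: the function name `euler` is hypothetical, and the data $y' = x$, $(x_0, y_0) = (0, 1)$, $\varepsilon = 0.5$ are those of Figure 84. The exact solution $y = 1 + \frac{x^2}{2}$ is used only for comparison.

```python
def euler(f, x0, y0, eps, steps):
    """Return the points (x_n, y_n) of the Euler method for y' = f(x, y)."""
    points = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        y = y + eps * f(x, y)     # follow the slope f(x_n, y_n) over a step eps
        x = x + eps
        points.append((x, y))
    return points

f = lambda x, y: x                # the equation y' = x

coarse = euler(f, 0.0, 1.0, 0.5, 4)       # step 0.5, up to x = 2
fine = euler(f, 0.0, 1.0, 0.005, 400)     # step 0.005, up to x = 2
exact = 1 + 2.0 ** 2 / 2                  # y(2) for the exact solution

print(coarse)
print(coarse[-1][1], fine[-1][1], exact)
```

With $\varepsilon = 0.5$ the points are $(0, 1)$, $(0.5, 1)$, $(1, 1.25)$, $(1.5, 1.75)$, $(2, 2.5)$, while the exact value is $y(2) = 3$; with the smaller step the last value is much closer to 3, at the cost of 100 times more steps.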
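import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# points x at which the solution is computed
x = np.linspace(0, 5, 500)
# parameter k
k = 0.5
# initial value y(0) = y0 (assumed here: the value is missing from this listing)
y0 = 1.0

# equation y' = k y
def deriv(y, x):
    return k * y

# odeint returns an array of shape (len(x), 1); take its only column
ret = odeint(deriv, y0, x)
y = ret[:, 0]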
x = np.linspace(0, 5, 500)
# parameter k
k = -10
# equation y'' = k y, written as a first-order system for (y, y')
def deriv(y, x):
    return [y[1], k * y[0]]