Riemann Stieltjes Integration
S Kumaresan
kumaresa@gmail.com
27 January 2021
Abstract
We treat the case of Riemann integrals first and motivate the results geometrically.
When we prove the results analytically, we observe that the proof can be adapted to prove
the corresponding results in Riemann-Stieltjes integrals where the integrator is an in-
creasing function. The aim of this article is to give an outline of the theory of Riemann-
Stieltjes integral closely following the development of Riemann integral in our book [1].
The approach may develop geometric ideas of Riemann integrals and hone the analytical
skills.
You may also like to watch the videos in [5] on which this article is based.
1 Darboux Approach to Riemann Integration
\[ L(f, P) := \sum_{i=1}^{n} m_i(f)\,(x_i - x_{i-1}), \quad \text{where } m_i(f) := \mathrm{GLB}\,\{ f(x) : x \in [x_{i-1}, x_i] \}, \]
\[ U(f, P) := \sum_{i=1}^{n} M_i(f)\,(x_i - x_{i-1}), \quad \text{where } M_i(f) := \mathrm{LUB}\,\{ f(x) : x \in [x_{i-1}, x_i] \}. \]
We now look at some examples of upper and lower sums. They also teach us the impor-
tance of choosing partitions smartly depending on the nature of the functions.
Example 4 (Constant functions). Let f : [a, b] → R be a constant, say, c. Let P be any partition.
Then we see that m_i = M_i = c. Hence L(f, P) = Σ_{i=1}^n c(x_i − x_{i−1}) = c(b − a). Similarly U(f, P) =
c(b − a). This validates our intuition that the “area” under the graph should be c(b − a).
Example 5 (Step Functions). Let f : [−1, 1] → R be defined by f(x) = 1 for −1 ≤ x < 0, f(0) = 10
and f(x) = 2 for 0 < x ≤ 1. Draw a picture of the graph of f. Can you convince yourself that the
area under the graph should be 3? Our intuition says that the point x = 0 needs special attention.
Let us consider the partition P_n := {−1, x_1 = −1/n, x_2 = 1/n, 1}. We see that m_1 = M_1 = 1, m_2 = 1
and M_2 = 10 and m_3 = M_3 = 2. Hence we find that
\[ L(f, P_n) = 1\left(1 - \frac{1}{n}\right) + 1 \times \frac{2}{n} + 2\left(1 - \frac{1}{n}\right) = 3\left(1 - \frac{1}{n}\right) + \frac{2}{n}, \]
\[ U(f, P_n) = 1\left(1 - \frac{1}{n}\right) + 10 \times \frac{2}{n} + 2\left(1 - \frac{1}{n}\right) = 3\left(1 - \frac{1}{n}\right) + 10 \times \frac{2}{n}. \]
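The two sums of Example 5 can be checked by machine. The following is a minimal Python sketch, not part of the original notes: the helper name `darboux_sums` and the sampling scheme are our own assumptions. Exact rational arithmetic is used, and the minimum and maximum over a few sample points happen to equal the GLB and LUB for this particular step function and partition; this would not be true for a general function.

```python
from fractions import Fraction

def f(x):
    """Step function of Example 5: 1 on [-1, 0), 10 at 0, 2 on (0, 1]."""
    if x < 0:
        return 1
    if x == 0:
        return 10
    return 2

def darboux_sums(func, partition, samples=4):
    """Return (L, U) using min/max of func over evenly spaced sample points.

    Assumption: `samples` is even, so the midpoint of every subinterval is
    sampled. For the step function above and the partition P_n this recovers
    the exact GLB and LUB on each subinterval.
    """
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        values = [func(left + width * Fraction(j, samples)) for j in range(samples + 1)]
        lower += min(values) * width
        upper += max(values) * width
    return lower, upper

n = 10
P_n = [Fraction(-1), Fraction(-1, n), Fraction(1, n), Fraction(1)]
L, U = darboux_sums(f, P_n)
```

For n = 10 the gap U − L equals 18/n, which is exactly the contribution of the single subinterval around the bad point x = 0.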
(If f is decreasing what are the lower and upper sums?) If we take finer partitions do the upper
and lower sums come close to each other? That is, we ask: can we make U ( f , P ) − L( f , P )
arbitrarily small? Note that
\[ U(f, P) - L(f, P) = \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})]\,(x_i - x_{i-1}). \]
Note that if all the lengths x_i − x_{i−1} of the subintervals are equal, say, (b − a)/n, then we have
\[ U(f, P) - L(f, P) = \frac{b-a}{n} \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})] = \frac{b-a}{n}\,(f(b) - f(a)). \tag{1} \]
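The telescoping in (1) is easy to confirm numerically. Here is a small Python sketch under our own assumptions (the function name `gap_for_increasing` and the choice f(x) = x³ on [0, 2] are illustrative only); for an increasing f we use m_i = f(x_{i−1}) and M_i = f(x_i).

```python
from fractions import Fraction

def gap_for_increasing(f, a, b, n):
    """U(f, P) - L(f, P) for an increasing f on the uniform partition of [a, b] into n parts."""
    nodes = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    # For increasing f: M_i - m_i = f(x_i) - f(x_{i-1}).
    return sum((f(x1) - f(x0)) * (x1 - x0) for x0, x1 in zip(nodes, nodes[1:]))

def cube(x):
    return x ** 3

a, b, n = Fraction(0), Fraction(2), 8
gap = gap_for_increasing(cube, a, b, n)
```

With these values the gap is (2/8)(8 − 0) = 2, and it shrinks like 1/n as n grows.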
These examples lead us to believe that as we take finer and finer partitions, we seem to
approximate the desired area well.
Example 7 (Dirichlet’s Function). Let f : [0, 1] → R be the function f (x) = 1 if x ∈ Q ∩ [0, 1] and
f (x) = 0 if x ∈ [0, 1] \ Q. Let P be any partition. Then m i = 0 and M i = 1 thanks to the density
of rational numbers as well as the irrational numbers. Hence we find that
\[ L(f, P) = \sum_i 0 \times (x_i - x_{i-1}) = 0 \quad \text{and} \quad U(f, P) = \sum_i 1 \times (x_i - x_{i-1}) = 1. \]
This is true for any partition P . What do we infer? If at all there is meaning to the area under
the graph of the function, our approach is not going to give approximations to the ‘area’. So
there are two possible policy decisions we have to take. We love our approach and so we
declare that the area of the region under the graph of f has no meaning. Or, we should devise
a better system. We adopt the first and leave the second to a later course!
Ex. 8. Prove that m(b − a) ≤ L( f , P ) ≤ U ( f , P ) ≤ M (b − a) for any partition P of [a, b].
Ex. 9. Let f : [a, b] → R be bounded. Let c ∈ (a, b). Let P 1 (respectively, P 2 ) be a partition
of [a, c] (respectively, of [c, b]). Show that P := P 1 ∪ P 2 is a partition of [a, b]. Show also that
L( f , P ) = L( f , P 1 ) + L( f , P 2 ) and U ( f , P ) = U ( f , P 1 ) +U ( f , P 2 ).
A very pedantic formulation is as follows. Let f 1 (respectively, f 2 ) be the restriction of f to
[a, c] (respectively, to [c, b]). Prove that L( f , P ) = L( f 1 , P 1 ) + L( f 2 , P 2 ) and so on.
Definition 10. Given a partition P = {x 0 , . . . , x n }, we insert a new node, say, t such that x i < t <
x i +1 for some i and get a new partition Q. Then drawing pictures of a non-negative function,
it is clear that L( f ,Q) ≥ L( f , P ). Similarly, U ( f ,Q) ≤ U ( f , P ). Draw pictures. See Figures 6.4–
6.5 on Page 178 in [1]. (We shall prove this later.) Thus, Q produces a better approximation to
the area bounded by the graph. This suggests that to get the “real” area we should look at
L( f ) := LUB {L( f , P ) : P is a partition of [a, b]}
U ( f ) := GLB {U ( f , P ) : P is a partition of [a, b]}.
These numbers exist (why?) and are called respectively the lower and upper integral of f
on [a, b]. The upper integral of f on [a, b] may be understood as the best possible approx-
imation to the area of the region under the graph as approximated from above. How do we
understand the lower integral?
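The effect of inserting one new node can be seen on a concrete case. The sketch below is only an illustration with names of our own choosing; it takes the increasing function f(x) = x² on [0, 1], so that the GLB and LUB on each subinterval sit at the endpoints, and compares the sums for P = {0, 1/2, 1} and Q = P ∪ {3/4}.

```python
from fractions import Fraction

def darboux_sums_increasing(f, partition):
    """(L, U) for an increasing f: m_i = f(x_{i-1}) and M_i = f(x_i)."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

def square(x):
    return x * x

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = sorted(P + [Fraction(3, 4)])   # insert the new node t = 3/4

L_P, U_P = darboux_sums_increasing(square, P)
L_Q, U_Q = darboux_sums_increasing(square, Q)
```

Here L rises from 1/8 to 13/64 and U falls from 5/8 to 33/64, so both sums move towards the area 1/3.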
Ex. 11. Show that L( f ) ≤ U ( f ) for any bounded function f : [a, b] → R.
Definition 12. We say that f is Darboux integrable (or simply integrable) on [a, b] if the upper
and lower integrals coincide. (This intuitively says that we require that the area should be
approximable both from below and from above.) If f is integrable, the common value of the
upper and lower integrals is denoted by the symbol ∫_a^b f(x) dx. This is just a notation; we may
as well use I_a^b(f) or ∫_a^b f etc.
Since m i ≤ M i and ∆αi ≥ 0, we see that
\[ L(f, P, \alpha) := \sum_{i=1}^{n} m_i\,\Delta\alpha_i \le \sum_{i=1}^{n} M_i\,\Delta\alpha_i = U(f, P, \alpha). \]
Let m and M be such that m ≤ f (x) ≤ M for x ∈ [a, b]. Then m ≤ m i and M i ≤ M for all i .
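Hence it follows that
\[ m(\alpha(b) - \alpha(a)) \le L(f, P, \alpha) \le U(f, P, \alpha) \le M(\alpha(b) - \alpha(a)). \]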
In particular, the set {U ( f , P, α)} is bounded below and hence its GLB exists. In a similar way,
{L( f , P, α)} is bounded above and hence its LUB exists. As in the case of Riemann integral, let
U ( f , α) := GLB {U ( f , P, α)} be the upper α-integral of f on [a, b]. How is L( f , α) defined? What
do you call it?
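To get a feel for the sums L(f, P, α) and U(f, P, α), here is a small Python sketch. It is an illustration under our own assumptions: the helper name is hypothetical, f is taken increasing so that m_i and M_i are attained at the endpoints, and the concrete choices f(x) = x, α(x) = x² on [0, 1] are ours.

```python
def stieltjes_darboux_increasing(f, alpha, partition):
    """(L(f,P,alpha), U(f,P,alpha)) for an increasing f and an increasing alpha."""
    lower = upper = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        d_alpha = alpha(x1) - alpha(x0)   # Delta alpha_i >= 0 since alpha is increasing
        lower += f(x0) * d_alpha          # m_i = f(x_{i-1}) for increasing f
        upper += f(x1) * d_alpha          # M_i = f(x_i) for increasing f
    return lower, upper

f = lambda x: x           # increasing, with m = 0 and M = 1 on [0, 1]
alpha = lambda x: x * x   # increasing integrator on [0, 1]
n = 100
P = [i / n for i in range(n + 1)]
L, U = stieltjes_darboux_increasing(f, alpha, P)
```

Both sums stay between m(α(b) − α(a)) = 0 and M(α(b) − α(a)) = 1, and they squeeze the value 2/3.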
The following theorem is a collection of results which are proved by mimicking the proofs
of the corresponding results in the theory of Riemann integration.
Q := P ∪ {t } = {x 0 , x 1 , . . . , x i −1 , t , x i , . . . , x n }.
Can you state and prove the analogous results for m i ∆αi ? We then have
\[ U(f, P, \alpha) = \sum_{j \ne i} M_j\,\Delta\alpha_j + M_i\,\Delta\alpha_i \]
U ( f ,Q r −1 , α) ≤ · · · ≤ U ( f ,Q 1 , α) ≤ U ( f , P, α). The analogous result for lower sum follows in
a similar fashion.
(ii) We know that L(f, P, α) ≤ U(f, P, α) for any partition P. Let P′ = P ∪ Q. Observe that
L(f, P, α) ≤ L(f, P′, α) ≤ U(f, P′, α) ≤ U(f, Q, α).
Definition 14. Keep the notation above. We say that f is α-integrable if L( f , α) = U ( f , α). We
also say that the Riemann-Stieltjes integral of f with respect to the integrator α exists. The
common value of the upper and lower integrals is denoted by ∫_a^b f dα.
Let us repeat: if α(x) = x then we simply say that f is integrable on [a, b].
Note that when α is the identity function, this means that we can approximate the area
under the graph of a positive function both from inside and outside and get the same result.
Thus, this definition insists on a kind of “symmetry".
Theorem 15 (Riemann’s Criterion). f is α-integrable iff for any ε > 0 there exists a partition P
such that U(f, P, α) − L(f, P, α) < ε.¹
Proof. Let the condition be satisfied. We are required to prove that f is α-integrable. Let I 1
and I 2 be the lower and upper integrals. We need to show I 1 = I 2 . It is enough to show that for
any ε > 0, |I 1 − I 2 | < ε.
Since I 1 ≤ I 2 , it is enough to show that I 2 < I 1 + ε for any ε > 0. Given ε > 0, let P be as in
the hypothesis. Observe that
I 1 ≥ L( f , P, α) and I 2 ≤ U ( f , P, α).
Hence
I 2 − I 1 ≤ U ( f , P, α) − L( f , P, α) < ε.
Thus we have proved that f is α-integrable.
¹ See the video in [5]: Riemann Integration - 3, especially 30:57 – 46:25.
To prove the converse, let ε > 0 be given. Let I 1 and I 2 be the lower and upper integrals.
Then we have I 1 = I 2 . Since I 1 is the LUB of L( f , P, α)’s, there exists a partition P 1 of [a, b] such
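that I_1 − ε/2 < L(f, P_1, α). That is,
\[ I_1 - L(f, P_1, \alpha) < \varepsilon/2. \tag{2} \]
Similarly, since I_2 is the GLB of the U(f, P, α)'s, there exists a partition P_2 of [a, b] such that
\[ U(f, P_2, \alpha) - I_2 < \varepsilon/2. \tag{3} \]
Let P := P_1 ∪ P_2. Since P refines both P_1 and P_2, we have
\[ L(f, P_1, \alpha) \le L(f, P, \alpha) \le U(f, P, \alpha) \le U(f, P_2, \alpha). \tag{4} \]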
Now,
U ( f , P, α) − L( f , P, α)
= U ( f , P, α) − I 2 + I 2 − L( f , P, α), adding and subtracting I 2
= U ( f , P, α) − I 2 + I 1 − L( f , P, α), since I 1 = I 2
≤ U ( f , P 2 , α) − I 2 + I 1 − L( f , P 1 , α), by (4)
< ε/2 + ε/2, by (2) and (3)
= ε.
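Riemann's criterion is constructive enough to be tried on a machine. The following Python sketch is only an illustration under our own assumptions: f is increasing (so M_i − m_i = f(x_i) − f(x_{i−1})), we restrict to uniform partitions, and the names `gap` and `find_partition_size` are hypothetical. It keeps doubling the number of subintervals until U(f, P, α) − L(f, P, α) < ε.

```python
def gap(f, alpha, n, a=0.0, b=1.0):
    """U(f,P,alpha) - L(f,P,alpha) on the uniform partition with n parts, f increasing."""
    nodes = [a + (b - a) * i / n for i in range(n + 1)]
    return sum((f(x1) - f(x0)) * (alpha(x1) - alpha(x0))
               for x0, x1 in zip(nodes, nodes[1:]))

def find_partition_size(f, alpha, eps):
    """Double n until the uniform partition satisfies Riemann's criterion."""
    n = 1
    while gap(f, alpha, n) >= eps:
        n *= 2
    return n

f = lambda x: x ** 2        # increasing on [0, 1]
alpha = lambda x: x ** 3    # increasing integrator on [0, 1]
n_eps = find_partition_size(f, alpha, eps=1e-3)
```

The search stops because the gap is at most max_i (f(x_i) − f(x_{i−1})) times (α(b) − α(a)), which tends to 0 for this continuous f.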
for x ∈ [a, b]. Thus to control the second, we need to find a partition P so that Σ_{i∈B} Δα_i is
“small”. Keep these vague ideas in mind while going through the proofs which use Riemann’s
criterion. You will not miss the wood for trees! Look at Examples 18–19, Theorem 22 (i),
Theorem 24, Example 27, and Theorem 32.
Observation 17. As explained in the Remark above, we need to look at M i ( f ) − m i ( f ). Al-
ternate descriptions of M i ( f ) − m i ( f ) are useful in the sequel. Let us start with a general
observation.
Let A − A := {x − y : x, y ∈ A}. We claim that LUB A −GLB A = LUB (A − A). Let C = LUB (A −
A). Let M := LUB A and m = GLB A. Then x − y ≤ M − m. Hence C ≤ M − m. Fix y ∈ A.
² See the video in [5]: Riemann Integration - 5, especially 8:3–21:23.
Since x − y ≤ C , we see that x ≤ C + y for all x ∈ A. Thus C + y is an upper bound for A and so
M ≤ C + y. It follows from this that M − C ≤ y. This is true for any y ∈ A. Therefore, M − C is a
lower bound for A. We conclude that M −C ≤ m or M − m ≤ C . Thus, M − m = LUB (A − A).
Next question is: Is there a better description of LUB(A − A)? Note that both x − y, y −
x ∈ A − A for x, y ∈ A. Since |x − y| = max{x − y, y − x}, we suspect that C := LUB(A − A) =
LUB{|x − y| : x, y ∈ A} =: D. Since x − y ≤ |x − y| ≤ D, it follows that D is an upper bound of
A − A and so C ≤ D. Conversely, since |x − y| is either x − y or y − x and since both lie in A − A,
we see that x − y ≤ C and y − x ≤ C and hence |x − y| ≤ C. That is, C is an upper bound for the
set {|x − y| : x, y ∈ A}. Hence D ≤ C. Thus we have established
\[ \mathrm{LUB}\,A - \mathrm{GLB}\,A = \mathrm{LUB}\,\{ x - y : x, y \in A \} = \mathrm{LUB}\,\{ |x - y| : x, y \in A \}. \tag{5} \]
We shall apply these observations to the set A := { f (x) : x ∈ [x i −1 , x i ]}. We then arrive at the
following:
\[ \begin{aligned} M_i(f) - m_i(f) &= \mathrm{LUB}\,\{ f(s) - f(t) : s, t \in [x_{i-1}, x_i] \} \\ &= \mathrm{LUB}\,\{ |f(s) - f(t)| : s, t \in [x_{i-1}, x_i] \}. \end{aligned} \tag{6} \]
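For a finite set A the identities in (5) can be checked directly, since LUB and GLB are then just max and min. The few lines of Python below are an illustration only; the sample set A is an arbitrary choice of ours.

```python
from itertools import product

A = [2.5, -1.0, 4.0, 0.5]                                 # any finite nonempty set of reals
spread = max(A) - min(A)                                  # LUB A - GLB A
differences = [x - y for x, y in product(A, repeat=2)]    # the set A - A
lub_diff = max(differences)                               # LUB (A - A)
lub_abs = max(abs(d) for d in differences)                # LUB {|x - y| : x, y in A}
```

All three numbers agree (here they equal 5.0), which is what (5) asserts.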
Example 18 (Step Functions). Let f : [a, b] → R be a bounded function. Assume that |f(x)| ≤
M for x ∈ [a, b]. Let c ∈ (a, b). Assume that f(x) = A on [a, c) and f(x) = B on (c, b]. Let
f (c) = C . We claim that f is integrable. Let P be a partition such that c ∈ [x j −1 , x j ]. Then we
have
If ε is given, we need only choose any partition P such that the length of the subinterval con-
taining c in its interior is arbitrarily small.
For example, choose N ∈ N such that 4M/N < ε. Consider the partition P = {a, c − 1/N, c + 1/N, b}.
We assume that N is so large that a < c − 1/N and c + 1/N < b. Then U(f, P) − L(f, P) ≤ 2M × 2/N < ε.
This argument is easily adapted to step-functions. A function f : [a, b] → R is a step func-
tion if there exists a finite set of points {t i : 1 ≤ i ≤ r } ⊂ [a, b] such that on each of the subinter-
vals in [a, b] \ {t i : 1 ≤ i ≤ r }, the function is a constant.
Let us be explicit. Assume without loss of generality a ≤ t 1 < t 2 < · · · < t r ≤ b. We have
subintervals of the form [a, t 1 ), (t i −1 , t i ) and (t r , b]. Then the assumption is that f is a constant
c i on the i -th interval.
Example 19. We now look at a very important but easy example. Fix c such that a < c ≤ b.
Define the Heaviside function at c as follows:
\[ H_c(x) := \begin{cases} 0 & \text{if } x < c \\ 1 & \text{if } x \ge c. \end{cases} \]
Note that H_c is an increasing function. We let α := H_c. Let f : [a, b] → R be a bounded function
which is continuous at c. We claim that f is α-integrable and we have ∫_a^b f dα = f(c). Let P be
any partition. Choose the unique k-th subinterval of the partition so that x k−1 < c ≤ x k . Then
we have the following:
f (c) − ε ≤ L( f , P, α) ≤ L( f , α) = U ( f , α) ≤ U ( f , P, α) ≤ f (c) + ε.
Thus, |∫_a^b f dα − f(c)| ≤ 2ε. Since ε > 0 is arbitrary, the result follows.
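The claim ∫_a^b f dH_c = f(c) can be watched in action. Below is a hedged Python sketch using tagged sums Σ f(t_i)Δα_i with midpoints as tags; the helper names and the choices f = cos, [a, b] = [0, 2], c = 1.3 are our own assumptions. Only the one subinterval across which H_c jumps from 0 to 1 contributes to the sum.

```python
import math

def heaviside(c):
    """Return H_c with H_c(x) = 0 for x < c and H_c(x) = 1 for x >= c."""
    return lambda x: 0.0 if x < c else 1.0

def riemann_stieltjes_sum(f, alpha, partition, tags):
    """Sum of f(t_i) * (alpha(x_i) - alpha(x_{i-1})) over the subintervals."""
    return sum(f(t) * (alpha(x1) - alpha(x0))
               for x0, x1, t in zip(partition, partition[1:], tags))

a, b, c, n = 0.0, 2.0, 1.3, 1000
P = [a + (b - a) * i / n for i in range(n + 1)]
midpoints = [(x0 + x1) / 2 for x0, x1 in zip(P, P[1:])]
S = riemann_stieltjes_sum(math.cos, heaviside(c), P, midpoints)
```

Since |cos s − cos t| ≤ |s − t|, the sum S differs from cos(c) by less than the mesh (b − a)/n.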
Ex. 20. Prove that the result of the last example holds true if we relax the assumption of f by
requiring that f is continuous from the left. Hint: We may assume that c is the right endpoint
of the subinterval (x k−1 , x k ].
Ex. 21. This is an exploratory question. (i) What happens if one assumes right continuity?
(ii) What happens if we replace ≥ by ≤ in the definition of the Heaviside function?
Proof. (i) Please review Remark 16. Given ε > 0, we need to find a partition P of [a, b] such
that Σ_{i=1}^n (M_i − m_i)Δα_i < ε. Since f is continuous on any subinterval [x_{i−1}, x_i], there exist
s_i, t_i ∈ [x_{i−1}, x_i] such that f(t_i) = M_i and f(s_i) = m_i. Hence as explained in the Remark quoted
above, if s_i, t_i are close to each other we may ensure that f(t_i) − f(s_i) = M_i − m_i < ε. A little
thought will convince us to invoke the uniform continuity of f on the closed and bounded
interval [a, b]. Let δ be such that if |s − t| < δ, then |f(s) − f(t)| < ε. Choose N ∈ N such that
(b − a)/N < δ. Let P be the partition whose i-th node is x_i := a + (i/N)(b − a). Then if s_i and t_i are as
above, it is clear that M_i − m_i < ε. Hence we have
\[ U(f, P, \alpha) - L(f, P, \alpha) = \sum_{i=1}^{N} (M_i - m_i)[\alpha(x_i) - \alpha(x_{i-1})] \le \varepsilon[\alpha(b) - \alpha(a)]. \]
length. This facilitated the estimate of U ( f , P ) − L( f , P ). If we wish to extend the result to an
arbitrary increasing integrator α, we need to find a partition of [α(a), α(b)] into subintervals of
equal length and this, in turn, gives rise to a partition of [a, b]. Let us attend to the details.
Let y_k := α(a) + (k/n)(α(b) − α(a)), 0 ≤ k ≤ n. Since α is increasing and continuous, we note
that α([a, b]) = [α(a), α(b)]. Now y_k ∈ [α(a), α(b)]. Hence by the intermediate value theorem,
there exists x_k ∈ [a, b] such that α(x_k) = y_k. Note also that x_k ≤ x_{k+1} since α is increasing. Let
P = {x_0, . . . , x_k, . . . , x_n} be the partition of [a, b]. We can adapt the argument in Example 6 to
arrive at U(f, P, α) − L(f, P, α) = ((α(b) − α(a))/n) [f(b) − f(a)].
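The construction of the nodes x_k from equal steps y_k in the range of α can be sketched in Python. This is an illustration under our own assumptions: the nodes are found by bisection (a numerical stand-in for the intermediate value theorem), the helper names are hypothetical, and we take f(x) = x², α(x) = x³ on [0, 2].

```python
def invert_increasing(alpha, y, lo, hi, iterations=80):
    """Bisection: approximate x in [lo, hi] with alpha(x) = y, alpha continuous and increasing."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if alpha(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def alpha_uniform_partition(alpha, a, b, n):
    """Nodes x_k with alpha(x_k) = alpha(a) + (k/n) * (alpha(b) - alpha(a))."""
    ya, yb = alpha(a), alpha(b)
    inner = [invert_increasing(alpha, ya + k * (yb - ya) / n, a, b) for k in range(1, n)]
    return [a] + inner + [b]

f = lambda x: x * x          # increasing on [0, 2]
alpha = lambda x: x ** 3     # continuous and increasing on [0, 2]
a, b, n = 0.0, 2.0, 50
P = alpha_uniform_partition(alpha, a, b, n)
# For increasing f, M_i - m_i = f(x_i) - f(x_{i-1}).
gap = sum((f(x1) - f(x0)) * (alpha(x1) - alpha(x0)) for x0, x1 in zip(P, P[1:]))
expected = (alpha(b) - alpha(a)) / n * (f(b) - f(a))
```

Up to rounding, the computed gap agrees with the closed form (α(b) − α(a))/n · [f(b) − f(a)], which is 0.64 for these choices.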
Remark 23. Go through the proof of (ii) of Theorem 22 again. You may be tempted to say that
it suffices to assume that α : [a, b] → [α(a), α(b)] is onto. If you remember your real analysis
well, especially the study of monotone functions, you may recall that a monotone function on
an interval is continuous iff its range is an interval. (This follows from the intermediate value
theorem. See pages 88-89 of [1], especially Proposition 3.5.5 and its corollary. It is always
worth reviewing as it makes you gain mastery over the subject.) So, nothing gained!
Theorem 24. Let f : [a, b] → R be a bounded function. Assume that f is continuous on [a, b]
except at a finite number of points. Let α : [a, b] → R be increasing. Assume that α is continuous
at the points of discontinuity of f . Then f is α-integrable.
Proof. We shall walk through a proof assuming that f is discontinuous at only one point c ∈
(a, b). The general case can be proved along similar lines.
As per the strategy outlined in Remark 16, c is a “bad” point. Let P be a partition such
that c ∈ [x_{j−1}, x_j]. Then U(f, P, α) − L(f, P, α) = Σ_{i≠j} (M_i − m_i)Δα_i + (M_j − m_j)(α(x_j) − α(x_{j−1})).
Thanks to the continuity of f on [a, x_{j−1}] and [x_j, b] we are confident of ensuring that the
first summand on the right is as small as we please. If C > 0 is such that |f(x)| ≤ C for x ∈
[a, b], then the term (M_j − m_j)[α(x_j) − α(x_{j−1})] ≤ 2C[α(x_j) − α(x_{j−1})]. Thus we should aim to
“control” α(x_j) − α(x_{j−1}). This is where we invoke the continuity of α at c. Choose s < c < t so
close to c that α(t) − α(s) is as small as we please. Let us look at [a, b] = [a, s] ∪ [s, t] ∪ [t, b]. The
textbook proof is ready.
Let ε > 0 be given. Let C > 0 be such that |f(x)| ≤ C for x ∈ [a, b]. Since α is continuous at c,
there exists δ > 0 such that |x − c| < δ ⟹ |α(x) − α(c)| < ε/(12C). Let s := c − δ/2 and t := c + δ/2. Then
|α(t) − α(s)| = α(t) − α(s) < ε/(6C). Consider the intervals [a, s] and [t, b]. Since f is continuous on
each of them there exist partitions P_1 := {a = s_0 < · · · < s_m = s} and P_2 := {t = t_0 < · · · < t_n = b}
such that U(f, P_i, α) − L(f, P_i, α) < ε/3 for i = 1, 2. (If you are very pedantic, then the “f” in
U(f, P_1, α) is the restriction of f to [a, s] and so on!) Consider P = P_1 ∪ P_2. Let M′ = LUB{f(x) :
x ∈ [s, t]} and m′ = GLB{f(x) : x ∈ [s, t]}. We then have
\[ \begin{aligned} U(f, P, \alpha) - L(f, P, \alpha) &= [U(f, P_1, \alpha) - L(f, P_1, \alpha)] + (M' - m')[\alpha(t) - \alpha(s)] + [U(f, P_2, \alpha) - L(f, P_2, \alpha)] \\ &< \frac{\varepsilon}{3} + 2C\,\frac{\varepsilon}{6C} + \frac{\varepsilon}{3} = \varepsilon. \end{aligned} \]
This proves that f is α-integrable.
Can you now work out the proof of the general case? If c_1, . . . , c_r are the points of discontinuity
of f, choose δ > 0 in such a way that {(c_i − δ, c_i + δ) : 1 ≤ i ≤ r} are pairwise disjoint.
Let s_i := c_i − δ/2 etc. What do we want about α(t_i) − α(s_i)? What kind of estimate do you want on
U(f, P_i, α) − L(f, P_i, α) where P_i is a partition of [t_{i−1}, s_i]? Draw pictures for the final partition
P. On how many subintervals is the ‘control’ of U(f, P_i, α) − L(f, P_i, α) easy? What kind of
estimate would you like to have on each of them?
Remark 25. Revisit Example 18. Let f : [a, b] → R be a step function. Then f is continuous
except at finitely many points. If we let α(x) = x, then f is integrable on [a, b] by the last
theorem. This was proved already in Example 18.
Remark 26. There is a result due to Lebesgue which characterizes the integrability of f in
terms of the ‘size’ (measure or length) of the set of discontinuities. (See a forthcoming book.)
A particular case: if the set of discontinuities of f is countable, then the function is integrable.
For example, Thomae’s function is integrable. See Example 27.
Given ε > 0, we must find a partition P such that U ( f , P ) − L( f , P ) < ε. By the density of
irrationals, in any subinterval [x i , x i +1 ] of a partition P of [0, 1], irrationals exist and hence m i ’s
are zero and hence L( f , P ) = 0. Hence we need only show U ( f , P ) < ε. Let n ∈ N be such that
1/n < ε. The set A_n := {r ∈ Q ∩ [0, 1] : f(r) > 1/n} is finite. If r ∈ A_n and r ∈ [x_{i−1}, x_i], M_i(f) ≥ 1/n.
We call such interval ‘bad’. We employ the divide and conquer method to show that the sum
of the lengths of such bad intervals is ‘small’.
Let ε > 0 be given. Choose k ∈ N such that 1/k < ε/2. There exists a finite number, say, N of
rational numbers p/q in [0, 1] with q ≤ k. Denote them by {r_j : 1 ≤ j ≤ N}. Let 0 < δ < ε/(4N). Choose a
partition P = {x_0, . . . , x_n} of [0, 1] such that
max{|x_{i+1} − x_i| : 0 ≤ i ≤ n − 1} < δ.
Let A := {i : r_j ∈ [x_i, x_{i+1}], for some j}, and B := {0, . . . , n − 1} \ A. Note that the number of elements
in A will be at most 2N. (Why 2N? Some r_j could be the left and the right endpoint of adjacent
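subintervals!) For i ∈ A, we have M_i ≤ 1. For j ∈ B, we have M_j < 1/k. Hence
\[ \begin{aligned} U(f, P) &= \sum_{i=0}^{n-1} M_i (x_{i+1} - x_i) \\ &= \sum_{i \in A} M_i (x_{i+1} - x_i) + \sum_{j \in B} M_j (x_{j+1} - x_j) \\ &\le (2N)\delta + \frac{1}{k} \sum_{j \in B} (x_{j+1} - x_j) \\ &\le (2N)\delta + \frac{1}{k} \\ &< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \end{aligned} \]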
In the above, we used the fact that Σ_{j∈B} (x_{j+1} − x_j) is the sum of the lengths of the disjoint
subintervals that lie in B, and hence it is at most 1.
Thus, for any ε > 0, we have found a partition P_ε such that U(f, P_ε) < ε. It follows that
GLB{U(f, P) : P is a partition of [0, 1]} = 0. Hence ∫_0^1 f = 0. Of course, we could use a simpler
argument. Since f is integrable, and since each lower sum is zero, it follows that the lower
integral is 0. Hence ∫_0^1 f = 0. (Always explore different ways of looking at the same thing!)
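The divide and conquer estimate can be replayed with exact arithmetic. The Python sketch below is an illustration under our own assumptions: we take the usual convention f(p/q) = 1/q for p/q in lowest terms (so f(0) = f(1) = 1) and f = 0 at irrationals, we use uniform partitions, and the helper names are hypothetical. On [i/n, (i+1)/n] the LUB of f is 1/q for the least q such that some p/q lies in the subinterval.

```python
from fractions import Fraction

def thomae_sup(i, n):
    """LUB of Thomae's function on [i/n, (i+1)/n]: 1/q for the least q with some p/q inside."""
    for q in range(1, n + 1):
        smallest_p = -((-i * q) // n)      # ceil(i*q/n)
        largest_p = ((i + 1) * q) // n     # floor((i+1)*q/n)
        if smallest_p <= largest_p:
            return Fraction(1, q)
    raise AssertionError("unreachable: q = n always works")

def upper_sum(n):
    """U(f, P) for the uniform partition of [0, 1] into n subintervals."""
    return sum(thomae_sup(i, n) * Fraction(1, n) for i in range(n))

eps = Fraction(1, 2)
k = 5                                   # 1/k < eps/2
N = len({Fraction(p, q) for q in range(1, k + 1) for p in range(q + 1)})
n = 100                                 # mesh 1/n is below eps/(4N)
U = upper_sum(n)
```

For ε = 1/2 we get k = 5 and N = 11 rationals with denominator at most 5, so any mesh below 1/88 works; the uniform partition with n = 100 gives U(f, P) < ε, as the proof predicts.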
We now show that the present definition of α-integral is equivalent to the earlier definition
if the integrator α is increasing.
Theorem 29. Let f : [a, b] → R be bounded. Let α : [a, b] → R be increasing. Then f is α-
integrable according to the first definition iff it is α-integrable according to Definition 28. In
such a case, both the integrals are the same.
Proof. Let f be integrable according to Definition 28 with the integral A. We need to show
that f is α-integrable according to the first definition. We plan to use the Riemann’s criterion
to achieve this. Let ε > 0 be given. Then by definition there exists a partition P such that for
any partition Q, which refines P and for any set of tags t, we have |Σ_{i=1}^n f(t_i)Δα_i − A| < ε.
We now choose a sequence t_k := (t_{k1}, . . . , t_{kn}) of tags as follows: f(t_{ki}) > M_i − 1/k for k ∈ N.
Note that we have M_i − 1/k < f(t_{ki}) ≤ M_i. It follows that lim_k Σ_i f(t_{ki})Δα_i = U(f, P, α).
You may like to watch the videos 7 and 8 in [5] either as you read this section or you may read
this section after watching them.
Theorem 31. Let α : [a, b] → R be a bounded function. Let R(α) denote the set of α-integrable
functions on [a, b]. Then the following are true.
i) R(α) is a vector space over R and the map f ↦ ∫_a^b f dα is linear.
ii) Let β : [a, b] → R be bounded. Let σ := λα + µβ where λ, µ ∈ R. If f is α- as well as β-integrable,
then f is σ-integrable and we have
\[ \int_a^b f\,d\sigma = \lambda \int_a^b f\,d\alpha + \mu \int_a^b f\,d\beta. \tag{8} \]
iii) Let a < c < b. Assume that any two of the integrals in (9) exist. Then the third integral exists
and we have
\[ \int_a^b f\,d\alpha = \int_a^c f\,d\alpha + \int_c^b f\,d\alpha. \tag{9} \]
iv) Assume that α is increasing. Let f ∈ R(α) and f ≥ 0 on [a, b]. Then ∫_a^b f dα ≥ 0. More
generally, let f, g ∈ R(α). Assume that f(x) ≤ g(x) for x ∈ [a, b]. Then we have
\[ \int_a^b f\,d\alpha \le \int_a^b g\,d\alpha. \tag{10} \]
v) Assume that α is increasing. Let f ∈ R(α). Assume that |f(x)| ≤ M for x ∈ [a, b]. Then
|f| ∈ R(α) and we have
\[ \left| \int_a^b f\,d\alpha \right| \le \int_a^b |f|\,d\alpha \le M(\alpha(b) - \alpha(a)). \tag{11} \]
Proof. Just to make sure that you are alert while reading the statements, (i) – (iii) are true for
any integrator α. We do not require α to be increasing for the validity of (i)–(iii).
The proofs will be instructive in the sense that we shall employ one of the equivalent defi-
nitions (for the case on hand) which makes our lives easy!
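(i) Let λ, µ ∈ R. Observe that
\[ S(\lambda f + \mu g, P, \mathbf{t}, \alpha) = \sum_{i=1}^{n} (\lambda f(t_i) + \mu g(t_i))\,\Delta\alpha_i = \lambda S(f, P, \mathbf{t}, \alpha) + \mu S(g, P, \mathbf{t}, \alpha). \]
Hence we obtain
\[ \begin{aligned} &\left| S(\lambda f + \mu g, P, \mathbf{t}, \alpha) - \left( \lambda \int_a^b f\,d\alpha + \mu \int_a^b g\,d\alpha \right) \right| \\ &\qquad \le |\lambda| \left| S(f, P, \mathbf{t}, \alpha) - \int_a^b f\,d\alpha \right| + |\mu| \left| S(g, P, \mathbf{t}, \alpha) - \int_a^b g\,d\alpha \right|. \end{aligned} \tag{12} \]
Let ε > 0 be given. Choose partitions P_1 and P_2 such that, for any finer partitions and any sets of tags,
\[ \left| S(f, P_1, \mathbf{t}, \alpha) - \int_a^b f\,d\alpha \right| < \frac{\varepsilon}{2(1 + |\lambda|)} \quad \text{and} \quad \left| S(g, P_2, \mathbf{t}, \alpha) - \int_a^b g\,d\alpha \right| < \frac{\varepsilon}{2(1 + |\mu|)}. \]
Let P := P_1 ∪ P_2. Then, for any partition Q finer than P, it follows from (12) that
\[ \left| S(\lambda f + \mu g, Q, \mathbf{t}, \alpha) - \left( \lambda \int_a^b f\,d\alpha + \mu \int_a^b g\,d\alpha \right) \right| < \varepsilon. \]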
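The algebraic identity S(λf + µg, P, t, α) = λS(f, P, t, α) + µS(g, P, t, α) behind (12) holds for every partition, every choice of tags and every integrator. Here is a quick Python check; it is a sketch with our own hypothetical helper `rs_sum`, and the integrator α(x) = x³ − x is deliberately chosen not to be monotone on [0, 1].

```python
import math

def rs_sum(f, alpha, partition, tags):
    """S(f, P, t, alpha) = sum of f(t_i) * (alpha(x_i) - alpha(x_{i-1}))."""
    return sum(f(t) * (alpha(x1) - alpha(x0))
               for x0, x1, t in zip(partition, partition[1:], tags))

n = 200
P = [i / n for i in range(n + 1)]
tags = P[1:]                       # right endpoints as tags
alpha = lambda x: x ** 3 - x       # not monotone on [0, 1]
lam, mu = 2.0, -3.0
f, g = math.sin, math.exp

combined = rs_sum(lambda x: lam * f(x) + mu * g(x), alpha, P, tags)
separate = lam * rs_sum(f, alpha, P, tags) + mu * rs_sum(g, alpha, P, tags)
```

The two numbers agree up to floating point rounding, whatever the tags are.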
Remark: This may be omitted on first reading.
Just for fun: If you wish to prove (ii) using Darboux sum approach, you need to
assume λ, µ are positive. Let us go through a proof when α is increasing and λ =
µ = 1.
To prove ii), we start with an observation: ∆σi = ∆αi + ∆βi . An immediate conse-
quence is: L( f , P, σ) = L( f , P, α) + L( f , P, β) for any partition P . Similar result holds
for the upper sums.
We first show that f is σ-integrable, using Riemann’s criterion. Let ε > 0 be given.
Then we can find partitions P 1 and P 2 so that
U ( f , P, σ) − L( f , P, σ) = U ( f , P, α) − L( f , P, α) +U ( f , P, β) − L( f , P, β) < ε.
\[ \begin{aligned} L(f, P, \sigma) &= L(f, P, \alpha) + L(f, P, \beta) \\ &\le \int_a^b f\,d\alpha + \int_a^b f\,d\beta \\ &\le U(f, P, \alpha) + U(f, P, \beta) \\ &= U(f, P, \sigma). \end{aligned} \]
We shall now prove one of the three cases of (iii). Assume that ∫_a^c f dα and ∫_c^b f dα exist.
Then we are required to show that f is α-integrable on [a, b] and that (9) holds.
Let P be a partition of [a, b] such that c is a node, that is, c ∈ P. Let P_1 = P ∩ [a, c] and
P_2 = P ∩ [c, b]. Then P_1 and P_2 provide partitions of [a, c] and [c, b] respectively.
What is the relation between S(f, P_1, t_1, α), S(f, P_2, t_2, α) and S(f, P, t, α)? Of course, it is easily
seen that
S(f, P, t, α) = S(f, P_1, t_1, α) + S(f, P_2, t_2, α).
Let ε > 0 be given. Choose partitions P_1 and P_2 of [a, c] and [c, b] respectively with the property
that for any set of tags we have
\[ \left| S(f, P_1, \mathbf{t}_1, \alpha) - \int_a^c f\,d\alpha \right| < \varepsilon/2 \quad \text{and} \quad \left| S(f, P_2, \mathbf{t}_2, \alpha) - \int_c^b f\,d\alpha \right| < \varepsilon/2. \]
Let P = P_1 ∪ P_2. Let Q be any partition finer than P. Note that Q_1 := Q ∩ [a, c] and Q_2 := Q ∩ [c, b]
will be finer than P_1 and P_2 respectively. Hence for any set of tags in Q we have
\[ \left| S(f, Q, \mathbf{t}, \alpha) - \int_a^c f\,d\alpha - \int_c^b f\,d\alpha \right| < \varepsilon. \]
(Can you justify the inequality quoting the observation made above?) This proves (iii).
Let f ∈ R(α) and f ≥ 0. To prove (iv), we need only ask the question: Which of the definitions
is likely to yield a simple proof? Yes, the Darboux sum approach. Again, the next question
is: since ∫_a^b f dα is the LUB of lower sums and the GLB of upper sums, which sum should
we use? An obvious choice is lower sums. Observe that L(f, P, α) ≥ 0. Since ∫_a^b f dα ≥ L(f, P, α),
the result follows.
An aside. You could work with upper sums as well as Riemann sums. In each of
these cases, assume that the integral is negative, say A < 0. Choose ε = −A/2 to arrive
at a contradiction. Recall how you prove: If (x_n) is a sequence of nonnegative
terms converging to x, then x ≥ 0.
The general case can be deduced easily from this. Consider g − f. By (i), g − f ∈ R(α) and
also g − f ≥ 0. Hence ∫_a^b (g − f) dα ≥ 0. Again by the linearity of the integral, ∫_a^b (g − f) dα =
∫_a^b g dα − ∫_a^b f dα ≥ 0. Thus we have proved (iv).
(v) I am sure all of us will think of the Darboux sum approach to prove this. The key idea of
the proof is to find the relation between M_i(f) − m_i(f) and M_i(|f|) − m_i(|f|). If you think for
a while, we can formulate a more general question. Let A ⊂ R be a nonempty set which is
both bounded above and bounded below. Let B := {|x| : x ∈ A}. Recall the relation between
LUB A − GLB A and LUB B − GLB B made in Observation 17.
We are now ready to prove (v). We let A := {f(x) : x ∈ [x_{i−1}, x_i]} and apply the observations
made in the last two paragraphs above. Recall that for any s, t ∈ R, we have ||s| − |t|| ≤ |s − t|.
Observe that for s, t ∈ [x_{i−1}, x_i] we have
\[ \bigl|\, |f(s)| - |f(t)| \,\bigr| \le |f(s) - f(t)|. \]
The LUB of the left side is M_i(|f|) − m_i(|f|). Similarly the LUB of the right side is M_i(f) − m_i(f).
It follows that
\[ M_i(|f|) - m_i(|f|) \le M_i(f) - m_i(f). \]
Since Δα_i ≥ 0, we see that [M_i(|f|) − m_i(|f|)]Δα_i ≤ [M_i(f) − m_i(f)]Δα_i. Hence
\[ U(|f|, P, \alpha) - L(|f|, P, \alpha) \le U(f, P, \alpha) - L(f, P, \alpha). \]
Do you see how to prove |f| ∈ R(α)? If yes, which results are needed to establish (11)?
Theorem 32. Let α : [a, b] → R be increasing. Let f ∈ R(α). Assume that m ≤ f (x) ≤ M for
x ∈ [a, b]. Let g : [m, M ] → R be continuous. Then g ◦ f ∈ R(α).
Proof. Let C > 0 be such that |g(y)| ≤ C for y ∈ [m, M]. Let ε > 0 be given. We need to find a
partition P such that U(g ∘ f, P, α) − L(g ∘ f, P, α) < ε.
We shall use the “Divide and Conquer” trick to estimate the sum. By the uniform continuity of
g on [m, M], for the given θ, there corresponds a δ > 0. Hence if we let G := {i : M_i(f) − m_i(f) <
δ}, then we have a control over M_i(g ∘ f) − m_i(g ∘ f). Let B := {i : M_i(f) − m_i(f) ≥ δ}. The sum
over B is
\[ \sum_{i \in B} (M_i(g \circ f) - m_i(g \circ f))\,\Delta\alpha_i \le 2C \sum_{i \in B} \Delta\alpha_i. \]
\[ \begin{aligned} U(g \circ f, P, \alpha) - L(g \circ f, P, \alpha) &= \sum_{i \in G} (M_i(g \circ f) - m_i(g \circ f))\,\Delta\alpha_i + \sum_{i \in B} (M_i(g \circ f) - m_i(g \circ f))\,\Delta\alpha_i \\ &\le \theta \sum_{i \in G} \Delta\alpha_i + 2C \sum_{i \in B} \Delta\alpha_i \\ &\le \theta(\alpha(b) - \alpha(a)) + 2C \sum_{i \in B} \Delta\alpha_i. \end{aligned} \]
We want the first term of the last expression to be less than ε/2. So, you choose 0 < θ <
ε/(2[1 + α(b) − α(a)]).
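We need to estimate Σ_{i∈B} Δα_i so that the second term is less than ε/2. Observe that if
U(f, P, α) − L(f, P, α) < η, then
\[ \delta \sum_{i \in B} \Delta\alpha_i \le \sum_{i \in B} (M_i(f) - m_i(f))\,\Delta\alpha_i \le U(f, P, \alpha) - L(f, P, \alpha) < \eta. \]
We find that Σ_{i∈B} Δα_i < η/δ. So this suggests that we choose a partition P so that U(f, P, α) −
L(f, P, α) < εδ/(4C).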
Do you think you can write a textbook proof now?
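Let ε > 0 be given. Let C > 0 be such that |g(t)| ≤ C for t ∈ [m, M]. Since g is uniformly
continuous on [m, M], there exists δ > 0 such that |s − t| < δ ⟹ |g(s) − g(t)| < ε/(2[1 + α(b) − α(a)])
for s, t ∈ [m, M]. Since f ∈ R(α), by Riemann's criterion there exists a partition P such that
\[ U(f, P, \alpha) - L(f, P, \alpha) < \frac{\varepsilon\delta}{4C}. \]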
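Let G := {i : M_i(f) − m_i(f) < δ} and B := {i : M_i(f) − m_i(f) ≥ δ}. Then we have
\[ \delta \sum_{i \in B} \Delta\alpha_i \le \sum_{i \in B} (M_i - m_i)\,\Delta\alpha_i < \frac{\varepsilon\delta}{4C} \implies \sum_{i \in B} \Delta\alpha_i < \frac{\varepsilon}{4C}. \tag{13} \]
For x, y ∈ [x_{i−1}, x_i], i ∈ G, we see that |f(x) − f(y)| ≤ M_i − m_i < δ so that |g(f(x)) − g(f(y))| <
ε/(2[1 + α(b) − α(a)]). Observation 17 implies that
\[ M_i(g \circ f) - m_i(g \circ f) \le \frac{\varepsilon}{2[1 + \alpha(b) - \alpha(a)]} \quad \text{for } i \in G. \tag{14} \]
\[ \begin{aligned} U(g \circ f, P, \alpha) - L(g \circ f, P, \alpha) &= \sum_{i \in G} [M_i(g \circ f) - m_i(g \circ f)]\,\Delta\alpha_i + \sum_{i \in B} [M_i(g \circ f) - m_i(g \circ f)]\,\Delta\alpha_i \\ &\le \frac{\varepsilon}{2[1 + \alpha(b) - \alpha(a)]} \sum_{i \in G} \Delta\alpha_i + 2C \sum_{i \in B} \Delta\alpha_i && \text{by (14)} \\ &< \frac{\varepsilon}{2[1 + \alpha(b) - \alpha(a)]} \sum_{i=1}^{n} \Delta\alpha_i + 2C\,\frac{\varepsilon}{4C} && \text{by (13)} \\ &< \frac{\varepsilon}{2} + \frac{\varepsilon}{2}. \end{aligned} \]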
We hope that you enjoyed the proof. This is an example of how one thinks through a proof
and then writes a textbook proof.
Remark 33. Do you realize that (v) and (vi) of Theorem 31 are easy consequences of the last
Theorem 32?
7 Two Important Examples of Riemann-Stieltjes Integrals
You may like to watch the video 11 in [5] either as you read this section or you may read this
section after watching it.
We now give two important examples of Riemann-Stieltjes integrals which are very useful.
Also, the first example shows that an infinite series is a special case of α-integrals.
Example 34. Let {t_k : 1 ≤ k ≤ N} be a finite set of points in (a, b]. (Did you notice the brackets?
Do you see why they are different?) Let {c_k : 1 ≤ k ≤ N} be a set of nonnegative numbers. Let
α(x) := Σ_{k=1}^N c_k H_{t_k}(x). It is clear that α is increasing. Let f : [a, b] → R be continuous. We then
have
\[ \int_a^b f\,d\alpha = \sum_k c_k f(t_k). \tag{15} \]
(What results did we use to assert (15)?)
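Formula (15) can be tested numerically with tagged sums Σ f(t_i)Δα_i. The Python sketch below is an illustration only: the helper names, the jump points, the weights and the choice f = cos on [0, 2] are all our own assumptions.

```python
import math

def step_integrator(jumps, weights):
    """alpha(x) = sum of c_k * H_{t_k}(x), where H_t(x) = 1 if x >= t and 0 otherwise."""
    def alpha(x):
        return sum(c for t, c in zip(jumps, weights) if x >= t)
    return alpha

def rs_sum(f, alpha, partition, tags):
    """Sum of f(t_i) * (alpha(x_i) - alpha(x_{i-1})) over the subintervals."""
    return sum(f(t) * (alpha(x1) - alpha(x0))
               for x0, x1, t in zip(partition, partition[1:], tags))

a, b, n = 0.0, 2.0, 4000
jumps = [0.5, 1.0, 1.75]      # points t_k in (a, b]
weights = [2.0, 0.5, 1.0]     # nonnegative c_k
alpha = step_integrator(jumps, weights)

P = [a + (b - a) * i / n for i in range(n + 1)]
tags = [(x0 + x1) / 2 for x0, x1 in zip(P, P[1:])]
S = rs_sum(math.cos, alpha, P, tags)
expected = sum(c * math.cos(t) for t, c in zip(jumps, weights))
```

Each jump point lies in exactly one subinterval, so the sum picks up c_k times a value of f taken within one mesh width of t_k; hence S differs from Σ c_k f(t_k) by less than (Σ c_k) times the mesh.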
Ex. 35. Let α : [0, N] → R be the greatest integer function α(x) = [x]. Write it as “a sum of
Heaviside functions”. Let f : [0, N] → R be continuous. Prove that ∫_0^N f dα = Σ_{k=1}^N f(k).
\[ \alpha(x) := \sum_k c_k H_{t_k}(x), \quad x \in [a, b]. \tag{16} \]
Then α(x) makes sense (why?) and is an increasing function. We now claim that for any
continuous function f on [a, b] we have ∫_a^b f dα = Σ_k c_k f(t_k). This is an easy exercise with the
following hint: Given ε > 0, choose N ∈ N such that Σ_{k≥N+1} c_k < ε/2. Write
α = Σ_{k=1}^N c_k H_{t_k} + Σ_{k≥N+1} c_k H_{t_k} = β_N + σ_N, say.
We need to show that Σ_{k=1}^n c_k f(t_k) → ∫_a^b f dα. Let ε > 0 be given. Then we need to
find N such that for n ≥ N, |∫_a^b f dα − Σ_{k=1}^n c_k f(t_k)| < ε. Keep the notation of the
Hence we have proved the following theorem.
\[ \alpha(x) := \sum_k c_k H_{t_k}(x), \quad x \in [a, b]. \tag{17} \]
Theorem 37. Let f, α : [a, b] → R be bounded. Assume that α is continuously differentiable and
that f is α-integrable. Then fα′ is integrable on [a, b] and we have ∫_a^b f dα = ∫_a^b f α′ dx.
Proof. It behooves us to consider the Riemann sums. Let P be any partition with tags t. Then
we look at
\[ \sum_{i=1}^{n} f(t_i)\,\Delta\alpha_i - \sum_{i=1}^{n} f(t_i)\,\alpha'(t_i)\,\Delta x_i. \tag{19} \]
Again it is clear that we wish to apply the mean value theorem to Δα_i: Δα_i = α′(s_i)Δx_i. Using
this in (19) we arrive at
\[ \sum_{i=1}^{n} f(t_i)\,\Delta\alpha_i - \sum_{i=1}^{n} f(t_i)\,\alpha'(t_i)\,\Delta x_i = \sum_{i=1}^{n} f(t_i)\,(\alpha'(s_i) - \alpha'(t_i))\,\Delta x_i. \tag{20} \]
Our aim is to show that these two Riemann sums are close to each other. So the next obvious
step is to use the uniform continuity of α′. Given ε > 0, by the uniform continuity of α′ there
exists a δ > 0 such that |s − t| < δ ⟹ |α′(s) − α′(t)| < ε. Then for any partition Q, which is
a refinement of P, with the property that the maximum length of the subintervals is less than
δ, it follows from (20) that
\[ \left| \sum_{i=1}^{n} f(t_i)\,\Delta\alpha_i - \sum_{i=1}^{n} f(t_i)\,\alpha'(t_i)\,\Delta x_i \right| \le M\varepsilon(b - a). \tag{21} \]
(M has the usual meaning!) We choose any partition Q′ which is a refinement of Q so that
|∫_a^b f dα − Σ_{i=1}^n f(t_i)Δα_i| < ε for any set of tags. We then see that for any such Q′ and any set
of tags
\[ \left| \int_a^b f\,d\alpha - \sum_{i=1}^{n} f(t_i)\,\alpha'(t_i)\,\Delta x_i \right| < (M(b - a) + 1)\varepsilon. \]
This completes the proof.
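As a numerical illustration of Theorem 37 and of the estimate (21), the following sketch compares the two Riemann sums for one concrete choice. Everything specific here is an assumption made for the example: $f = \cos$, $\alpha(x) = x^3$ on $[0, 1]$, a uniform partition, left-endpoint tags, and the helper names. Since $|f| \le 1$ and $|\alpha''| \le 6$ on $[0, 1]$, the gap between the two sums should be at most $6h$, where $h$ is the mesh of the partition.

```python
import math

def f(x):
    return math.cos(x)          # assumed integrand, |f| <= M = 1

def alpha(x):
    return x ** 3               # assumed integrator, continuously differentiable

def dalpha(x):
    return 3 * x ** 2           # alpha'

n = 1000
nodes = [i / n for i in range(n + 1)]
tags = nodes[:-1]               # left-endpoint tags

# S(f, P, t, alpha) = sum f(t_i) * Delta alpha_i
stieltjes_sum = sum(f(t) * (alpha(x1) - alpha(x0))
                    for x0, x1, t in zip(nodes, nodes[1:], tags))

# S(f alpha', P, t) = sum f(t_i) * alpha'(t_i) * Delta x_i
riemann_sum = sum(f(t) * dalpha(t) * (x1 - x0)
                  for x0, x1, t in zip(nodes, nodes[1:], tags))

# Closed form of the integral of 3 x^2 cos x over [0, 1], found by parts.
exact = 3 * (2 * math.cos(1) - math.sin(1))

gap = abs(stieltjes_sum - riemann_sum)
print(stieltjes_sum, riemann_sum, exact, gap)
```

The integrator $x^3$ was chosen rather than $x^2$ because, for $x^2$ with midpoint tags, the two sums coincide exactly and the comparison would show nothing.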
Remark 38. The result above can be proved without assuming that $\alpha'$ is continuous but under
the hypothesis that $\alpha'$ is integrable. See Theorem 39.
Theorem 39. Let $f \in R[a, b]$. Let $\alpha\colon [a, b] \to \mathbb{R}$ be differentiable. Assume further that $\alpha' \in
R[a, b]$. Then $f \in R(\alpha)$ and we have
\[ \int_a^b f\,d\alpha = \int_a^b f\alpha'\,dx. \tag{22} \]
Proof. Note that $f, \alpha' \in R[a, b]$ implies $f\alpha' \in R[a, b]$. Hence the right side of (22) makes sense.
Let $A := \int_a^b f\alpha'$. We wish to show that $\int_a^b f\,d\alpha = A$. Let $|f(x)| \le C$ for $x \in [a, b]$.
Let $\varepsilon > 0$ be given. Since $f\alpha'$ is integrable and $\alpha'$ is integrable, there exists a (common)
partition $P$ such that for any partition $Q \ge P$, and for any set of tags in $Q$, we have
\[ \bigl| S(f\alpha', Q, t) - A \bigr| = \Bigl| \sum_i f(t_i)\,\alpha'(t_i)(x_i - x_{i-1}) - A \Bigr| < \frac{\varepsilon}{2}, \quad\text{and}\quad U(\alpha', Q) - L(\alpha', Q) < \frac{\varepsilon}{2C}. \tag{23} \]
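We claim that $|S(f, Q, t, \alpha) - A| < \varepsilon$. Applying the mean value theorem to $\alpha(x_i) - \alpha(x_{i-1})$, we
find $s_i \in (x_{i-1}, x_i)$ such that $\alpha(x_i) - \alpha(x_{i-1}) = \alpha'(s_i)(x_i - x_{i-1})$, so that
$S(f, Q, t, \alpha) = \sum_i f(t_i)\,\alpha'(s_i)(x_i - x_{i-1})$.
We have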
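\begin{align*}
\bigl| S(f\alpha', Q, t) - S(f, Q, t, \alpha) \bigr| &= \Bigl| \sum_i f(t_i)\bigl(\alpha'(t_i) - \alpha'(s_i)\bigr)(x_i - x_{i-1}) \Bigr| \\
&\le \sum_i |f(t_i)|\,\bigl|\alpha'(t_i) - \alpha'(s_i)\bigr|\,(x_i - x_{i-1}) \\
&\le \sum_i |f(t_i)|\,\bigl(M_i(\alpha') - m_i(\alpha')\bigr)(x_i - x_{i-1}) && \text{(Why?)} \\
&\le C\bigl(U(\alpha', Q) - L(\alpha', Q)\bigr) < \varepsilon/2. && \text{(Why?)} \tag{24}
\end{align*}
Combining (23) and (24), we obtain
\[ \bigl| S(f, Q, t, \alpha) - A \bigr| \le \bigl| S(f, Q, t, \alpha) - S(f\alpha', Q, t) \bigr| + \bigl| S(f\alpha', Q, t) - A \bigr| < \varepsilon. \]
This shows that $\int_a^b f\,d\alpha = A$.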
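The key estimate of this proof, namely that the gap between $S(f\alpha', Q, t)$ and $S(f, Q, t, \alpha)$ is at most $C\,(U(\alpha', Q) - L(\alpha', Q))$, can be observed numerically. The sketch below makes several simplifying assumptions that are not part of the theorem: $f = \cos$ (so $C = 1$), $\alpha(x) = x^3$ on $[0, 1]$, whose derivative is increasing so that the supremum and infimum of $\alpha'$ on each subinterval sit at the endpoints, and randomly chosen tags with a fixed seed.

```python
import math
import random

def f(x):
    return math.cos(x)      # assumed integrand with |f| <= C = 1

def alpha(x):
    return x ** 3           # assumed integrator

def dalpha(x):
    return 3 * x ** 2       # alpha', increasing on [0, 1]

C = 1.0
random.seed(0)              # fixed seed so the run is reproducible
n = 50
nodes = [i / n for i in range(n + 1)]
pairs = list(zip(nodes, nodes[1:]))
tags = [random.uniform(x0, x1) for x0, x1 in pairs]

# The two Riemann sums compared in the proof.
s_f_dalpha = sum(f(t) * dalpha(t) * (x1 - x0) for (x0, x1), t in zip(pairs, tags))
s_f_alpha = sum(f(t) * (alpha(x1) - alpha(x0)) for (x0, x1), t in zip(pairs, tags))

# Darboux sums of alpha'; monotonicity puts sup at x1 and inf at x0.
upper = sum(dalpha(x1) * (x1 - x0) for x0, x1 in pairs)
lower = sum(dalpha(x0) * (x1 - x0) for x0, x1 in pairs)

gap = abs(s_f_dalpha - s_f_alpha)
bound = C * (upper - lower)
print(gap, bound)           # gap should not exceed bound
```

Note that the mean value points $s_i$ never need to be computed: the sum $S(f, Q, t, \alpha)$ uses the increments of $\alpha$ directly.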
Remark 40. Did you observe something interesting about the proof above? We did not hesitate
to use both the definitions of the integral! To show that $f$ is $\alpha$-integrable, we used the
Riemann sum approach and to estimate a sum, we used Riemann's criterion for the Darboux
sum approach. I thought of the proof after I sent you the file! I have given a textbook
proof above. Can you ‘discover’ the proof starting from a wish to estimate $|S(f, Q, t, \alpha) - A|$?
8 Fundamental Theorems of Calculus
You may like to watch the videos 9 and 10 in [5] either as you read this section or you may read
this section after watching them.
We now look at one of the most important results in the theory of integration, namely, the
fundamental theorems of calculus. These theorems establish the validity of the computation
of integrals via Newtonian calculus, as learned in high school. In some sense, they justify the
high-school way of defining integration as finding an anti-derivative.
Proof. We wish to use the Riemann sum approach. A Riemann sum for the integral $\int_a^b f'(t)\,dt$
looks like $\sum_{i=1}^{n} f'(t_i)(x_i - x_{i-1})$. The summands remind us of the mean value theorem of
differential calculus. Let $\varepsilon > 0$ be given. Then there exists a partition $P$ such that for any partition
$Q \ge P$ and set of tags $t$, we have
\[ \left| S(f', Q, t) - \int_a^b f'(t)\,dt \right| < \varepsilon. \tag{25} \]
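Let $Q = \{x_0, \dots, x_n\}$ be such a partition. We have
\[ f(b) - f(a) = \sum_{i=1}^{n} \bigl( f(x_i) - f(x_{i-1}) \bigr) = \sum_{i=1}^{n} f'(t_i)(x_i - x_{i-1}), \tag{26} \]
where we used the mean value theorem on $f(x_i) - f(x_{i-1})$, $1 \le i \le n$, to choose the tags $t_i \in (x_{i-1}, x_i)$. But the term on the
extreme right is $S(f', Q, t)$. Hence it follows from (25)-(26) that
\[ \left| f(b) - f(a) - \int_a^b f'(t)\,dt \right| < \varepsilon. \]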
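The idea behind this proof is that, with tags supplied by the mean value theorem, the Riemann sum of $f'$ telescopes to $f(b) - f(a)$ on every partition. The sketch below illustrates this for one assumed example, $f(x) = x^3$ on $[0, 2]$ with a hand-picked non-uniform partition; for this particular $f$ the mean value point can be solved for explicitly, which will not be possible in general.

```python
import math

def f(x):
    return x ** 3           # assumed differentiable function

def fprime(x):
    return 3 * x ** 2       # its derivative

nodes = [0.0, 0.3, 0.9, 1.4, 2.0]   # assumed non-uniform partition of [0, 2]

tags = []
for x0, x1 in zip(nodes, nodes[1:]):
    slope = (f(x1) - f(x0)) / (x1 - x0)
    # Mean value point: solve f'(t) = 3 t^2 = slope for t >= 0.
    tags.append(math.sqrt(slope / 3.0))

# Riemann sum S(f', Q, t) with the mean value tags.
riemann = sum(fprime(t) * (x1 - x0)
              for x0, x1, t in zip(nodes, nodes[1:], tags))

print(riemann, f(nodes[-1]) - f(nodes[0]))   # equal up to rounding
```

With any other choice of tags the Riemann sum is only close to $f(b) - f(a)$, which is exactly what (25) quantifies.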
Remark 42. Note that this justifies what you learned in school about the integral being an
anti-derivative. That is, to find $\int_a^b f(x)\,dx$, we find a function $g$ such that $g' = f$ and then in this
case we have $\int_a^b f(x)\,dx = g(b) - g(a)$.
Let $f\colon [a, b] \to \mathbb{R}$ be integrable. Then for any $x \in [a, b]$, we know that $f$ is integrable on
$[a, x]$. Hence we have a function $F\colon x \mapsto \int_a^x f(t)\,dt$, $x \in [a, b]$. The new function $F$ is called the
indefinite integral of $f$. For $f \ge 0$, $F(x)$ is the area under the curve $y = f(t)$ between the $t$-axis and the vertical lines $t = a$
and $t = x$. Draw a picture.
Theorem 43 (Second Fundamental Theorem of Calculus). Let $f\colon [a, b] \to \mathbb{R}$ be integrable. The
indefinite integral $F$ of $f$ is continuous (in fact, Lipschitz) on $[a, b]$ and is differentiable at $x$ if
$f$ is continuous at $x \in [a, b]$. In fact, $F'(x) = f(x)$, if $f$ is continuous at $x$.
Why is this result plausible? Draw a picture. It seems that $\int_x^{x+h} f(t)\,dt$ is approximately the
area of the rectangle whose base is $h$ and height is $f(x)$. Hence $\frac{1}{h}\int_x^{x+h} f(t)\,dt \approx f(x)$. Observe
that
\[ \frac{1}{h}\left( \int_a^{x+h} f(t)\,dt - \int_a^{x} f(t)\,dt \right) = \frac{1}{h}\int_x^{x+h} f(t)\,dt \approx f(x). \]
Proof. Since $f$ is bounded, there exists $M$ such that $|f(x)| \le M$ for $x \in [a, b]$. Then we have
\begin{align*}
|F(x) - F(y)| &= \left| \int_a^x f(t)\,dt - \int_a^y f(t)\,dt \right| \\
&= \left| \int_x^y f(t)\,dt \right| \\
&\le M \left| \int_x^y 1\,dt \right| \\
&= M\,|x - y|.
\end{align*}
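Thus $F$ is Lipschitz and hence continuous on $[a, b]$. Now let $f$ be continuous at $c \in [a, b]$. For $x > c$, we have
\[ \left| \frac{F(x) - F(c)}{x - c} - f(c) \right| = \left| \frac{1}{x - c}\int_c^x \bigl( f(t) - f(c) \bigr)\,dt \right| \le \frac{1}{x - c}\int_c^x |f(t) - f(c)|\,dt. \tag{27} \]
Given $\varepsilon > 0$, by the continuity of $f$ at $c$, we can find a $\delta > 0$ such that $|f(t) - f(c)| < \varepsilon$ for
$t \in [a, b]$ with $|t - c| < \delta$. Hence for $x \in [a, b]$ such that $0 < x - c < \delta$, we see that the RHS of (27) is estimated
above by $\varepsilon$. A similar argument applies when $x < c$.
This shows that $F$ is differentiable at $c$ and $F'(c) = f(c)$.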
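Both halves of the theorem can be seen on a simple discontinuous integrand. The sketch below is an assumed illustration: $f$ is the step function equal to $1$ on $[-1, 0)$ and $2$ on $[0, 1]$, and its indefinite integral $F$ from $-1$ is written in closed form (worked out by hand, not computed numerically). The code checks the Lipschitz bound with $M = 2$ on a grid and compares difference quotients of $F$ at a point of continuity of $f$ and at the jump.

```python
def f(t):
    """Assumed step function on [-1, 1]: 1 to the left of 0, 2 from 0 onwards."""
    return 1.0 if t < 0 else 2.0

def F(x):
    """Indefinite integral of f from -1 to x, in closed form."""
    return x + 1.0 if x <= 0 else 1.0 + 2.0 * x

def diff_quotient(c, h):
    return (F(c + h) - F(c)) / h

M = 2.0   # bound for |f| on [-1, 1]
grid = [-1 + i / 50 for i in range(101)]
lipschitz_ok = all(abs(F(x) - F(y)) <= M * abs(x - y) + 1e-12
                   for x in grid for y in grid)

q_inside = diff_quotient(0.5, 1e-3)    # f is continuous at 0.5
q_right = diff_quotient(0.0, 1e-3)     # right-hand quotient at the jump
q_left = diff_quotient(0.0, -1e-3)     # left-hand quotient at the jump

print(lipschitz_ok, q_inside, f(0.5), q_right, q_left)
```

At $c = 0.5$ the quotient matches $f(0.5) = 2$, while at the jump the one-sided quotients are about $2$ and $1$, so $F$ is continuous there but not differentiable. This is consistent with the theorem, which promises differentiability only at points of continuity of $f$.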
Remark 44. Look at the inequality (27). One of the terms in LHS is an integral while the other
is a number $f(c)$. We re-wrote this as a sum of two integrals by observing that $f(c)$ is the
average $\frac{1}{x - c}\int_c^x f(c)\,dt$ and then applied the linearity, continuity, and the standard estimate
for the integral. Learn this well as this trick is often used.
Remark 45. We can deduce a weaker version of the first fundamental theorem (Theorem 41) from the
second fundamental theorem of calculus.
Let $f\colon [a, b] \to \mathbb{R}$ be differentiable with $f'$ continuous on $[a, b]$. Then
\[ \int_a^b f'(x)\,dx = f(b) - f(a). \]
Proof. Since $f'$ is continuous, it is integrable and its indefinite integral, say, $G(x) = \int_a^x f'(t)\,dt$,
exists. By the second fundamental theorem (Theorem 43), $G$ is differentiable with derivative $G' = f'$. Hence the derivative of $f - G$
is zero on $[a, b]$ and hence the function $f - G$ is a constant on $[a, b]$. In particular, $f(a) - G(a) =
f(b) - G(b)$, that is, $f(a) = f(b) - \int_a^b f'(x)\,dx$, since $G(a) = 0$.
One of the most basic tools for computing integrals in high school is integration by
substitution, or the change of variables. The following result justifies this process.
Proof. Fix $c \in I$. Let $F(y) := \int_c^y f(t)\,dt$. Then by the second fundamental theorem of calculus
(Theorem 43), $F$ is differentiable and $F'(y) = f(y)$. Let $g(x) := (F \circ u)(x)$. Then $g$ is differentiable,
and by the chain rule we have $g'(x) = F'(u(x))\,u'(x) = f(u(x))\,u'(x)$.
We now prove a general version of the integration by parts formula for Riemann-Stieltjes
integrals.
Strategy: Let $A := \int_a^b f\,dg$. Let $\varepsilon > 0$ be given. Choose $P = \{x_0, \dots, x_n\}$ such that for
any partition $Q$ finer than $P$ and for any set of tags $t$ in $Q$, we have $|A - S(f, Q, t, g)| <
\varepsilon$.
We wish to show that $|-A + f(b)g(b) - f(a)g(a) - S(g, P, t, f)| < \varepsilon$. Note that we
This suggests that we consider a partition $Q$ in which the nodes are the nodes of
$P$ along with the tags in $P$. Also, the tags of $Q$ should include the tags $t$ and the
nodes of $P$. Let $Q = P \cup t$. Let
$s = \{s_1 = a, s_2 = x_1, s_3 = x_1, \dots, x_{n-1}, x_{n-1}, s_{2n} = b\}$ be
the set of tags in $Q$. Then $|A - S(f, Q, s, g)| < \varepsilon$. Keeping in mind what we want,
rearrange the terms in $S(f, Q, s, g)$ to arrive at (31).
Proof. Let $A := \int_a^b f\,dg$. Let $\varepsilon > 0$ be given. Choose $P = \{x_0, \dots, x_n\}$ such that for any partition
$Q$ finer than $P$ and for any set of tags $t$ in $Q$, we have $|A - S(f, Q, t, g)| < \varepsilon$.
We wish to show that $|-A + f(b)g(b) - f(a)g(a) - S(g, P, t, f)| < \varepsilon$.
Since $Q \ge P$, the left side is less than $\varepsilon$. So the right side is also less than $\varepsilon$. We deduce that
\[ -\int_a^b g\,df = \int_a^b f\,dg - f(b)g(b) + f(a)g(a). \]
Thus (30) follows.
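The rearrangement described in the strategy is a purely algebraic identity between two Riemann-Stieltjes sums, and it can be verified on a small example. In the sketch below the functions $f(x) = x^2$ and $g = \exp$, the partition $P$, its tags $t$ and the helper `rs_sum` are all assumptions made for illustration. The partition $Q = P \cup t$ and the tags $s$ are built as in the strategy: each subinterval $[x_{i-1}, x_i]$ of $P$ is split at $t_i$, with $x_{i-1}$ tagging the left piece and $x_i$ tagging the right piece.

```python
import math

def rs_sum(integrand, integrator, nodes, tags):
    """Riemann-Stieltjes sum: sum of integrand(t_i) * increment of integrator."""
    return sum(integrand(t) * (integrator(x1) - integrator(x0))
               for x0, x1, t in zip(nodes, nodes[1:], tags))

def f(x):
    return x ** 2           # assumed function f

g = math.exp                # assumed function g

P = [0.0, 0.2, 0.5, 0.7, 1.0]   # nodes of P
t = [0.1, 0.3, 0.6, 0.9]        # one tag in each subinterval of P

# Build Q = P union t together with its tags s.
Q = [P[0]]
s = []
for x0, x1, ti in zip(P, P[1:], t):
    Q += [ti, x1]           # split [x0, x1] at the tag ti
    s += [x0, x1]           # x0 tags [x0, ti], x1 tags [ti, x1]

a, b = P[0], P[-1]
lhs = rs_sum(g, f, P, t)                                  # S(g, P, t, f)
rhs = f(b) * g(b) - f(a) * g(a) - rs_sum(f, g, Q, s)      # boundary terms minus S(f, Q, s, g)

print(lhs, rhs)             # equal up to rounding
```

Because the identity holds exactly for every tagged partition, closeness of $S(f, Q, s, g)$ to $A$ transfers directly to closeness of $S(g, P, t, f)$ to $f(b)g(b) - f(a)g(a) - A$.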
References
(1) Ajit Kumar and S Kumaresan, A Basic Course in Real Analysis, CRC Press.
(2) Tom Apostol, Analysis, Indian Reprint.
(3) Walter Rudin, Principles of Mathematical Analysis, International Student Edition.
(4) S Kumaresan and G Santhanam, Analysis on $\mathbb{R}^n$, Forthcoming.
(5) This article is based on the series of my videos in the YouTube playlist on Riemann Integration:
https://www.youtube.com/playlist?list=PLDzvuf9Uf4FNveNqPpjBtZqtfWEIj3DKr
You may either watch them first and then read this article or the other way around.