
Lecture 7: Convergence and Limit Theorems

▶ Almost sure convergence
▶ Convergence in mean square
▶ Convergence in probability
▶ Convergence in distribution
Motivation: Estimating Mean

Want to estimate the mean E[X] of a distribution.


Generate X_1, X_2, ..., X_n i.i.d. drawn from the distribution and compute the sample mean
$$S_n = \frac{1}{n}\sum_{i=1}^n X_i.$$

Does Sn converge to E[X]? In what sense? Note that Sn is a random variable.

Motivation: Estimating Mean
Example: X_1, X_2, ..., X_n i.i.d. ∼ N(0, 1)
▶ Generate 6 sets of X_1, X_2, ..., X_n (sample paths).
▶ Note that every sample path appears to be converging to the mean 0 as n increases, as the simulation sketch below illustrates.
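The figure from the original slide is not reproduced here. The following is a minimal Python sketch (numpy assumed; the seed, path count, and horizon are illustrative) of the experiment it depicts: several sample paths of S_n for i.i.d. standard Gaussian draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n, paths = 1000, 6

# Each row is one sample path X_1, ..., X_n drawn i.i.d. from N(0, 1).
X = rng.standard_normal((paths, n))

# S_k = (1/k) * sum_{i=1}^k X_i, computed for every prefix length k.
S = np.cumsum(X, axis=1) / np.arange(1, n + 1)

# Every path's sample mean should be drifting toward the true mean 0.
print(S[:, [9, 99, 999]])  # S_10, S_100, S_1000 for each path
```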

Almost Sure Convergence

Recall that a sequence of numbers x_1, x_2, ... converges to x if given any ϵ > 0, there exists an n(ϵ) such that for all n ≥ n(ϵ), |x_n − x| < ϵ. Equivalently, max_{i≥n} |x_i − x| → 0 as n → ∞.

Suppose X_1, X_2, ... is a sequence of random variables defined on the same probability space (Ω, F, P). For every ω ∈ Ω, we get a sample path X_1(ω), X_2(ω), ..., which is a sequence of numbers.

We say X_n → X almost surely (a.s.) or with probability 1 (w.p. 1) if
$$P\left(\left\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\right\}\right) = 1.$$
This means that the set of ω such that the sample path or sequence X_1(ω), X_2(ω), ... converges to X(ω) has probability 1.

Almost Sure Convergence

 
Lemma. $P\left(\max_{i\ge n} |X_i - X| > \epsilon\right) \to 0$ for any ϵ > 0 iff $X_n \xrightarrow{a.s.} X$.

We prove the forward direction. We will see that the converse holds when discussing convergence in probability.
Let M_n = max_{i≥n} |X_i − X|, which is a decreasing sequence bounded below by 0. Therefore M_n ↓ M for some M.
Then, for all ϵ > 0, P(M > ϵ) ≤ P(M_n > ϵ) → 0 as n → ∞.
Therefore, from continuity of P, P(M = 0) = 1 and M_n → 0 a.s. This is equivalent to saying that X_n → X a.s.

Example †: X_1, X_2, ... i.i.d. ∼ Bern(1/2). Let $Y_n = 2^n \prod_{i=1}^n X_i$. Show that the sequence Y_n converges to 0 a.s.
For 0 < ϵ < 1, we have
$$P\left(\max_{i\ge n} |Y_i - 0| > \epsilon\right) \le P(X_i = 1 \ \forall\, i \le n) = \frac{1}{2^n} \to 0$$
as n → ∞.
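A quick numerical illustration of Example † (a Python sketch, numpy assumed; seed and horizon are illustrative): along almost every sample path some X_i = 0 occurs early, after which Y_k is 0 forever.

```python
import numpy as np

rng = np.random.default_rng(1)
n, paths = 20, 100000

X = rng.integers(0, 2, size=(paths, n))                   # i.i.d. Bern(1/2)
Y = (2.0 ** np.arange(1, n + 1)) * np.cumprod(X, axis=1)  # Y_k = 2^k * prod_{i<=k} X_i

for k in (1, 5, 10, 20):
    # Fraction of paths with Y_k != 0; should be about 2^-k.
    print(k, (Y[:, k - 1] != 0).mean(), 2.0 ** -k)
```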
An important example of a.s. convergence is the Strong Law of Large Numbers (SLLN): If X_1, X_2, ... are i.i.d. (in fact pairwise independence suffices) with finite mean µ, then
$$S_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{a.s.} \mu$$
as n → ∞. Proof of the SLLN is beyond the scope of this course. See [Github: https://github.com/wptay/aipt].

Convergence in Mean Square

We say X_n → X in mean square or L² if
$$\lim_{n\to\infty} E\left[(X_n - X)^2\right] = 0.$$

Let X_1, X_2, ... be i.i.d. with finite mean E[X] and variance var(X). Then S_n → E[X] in mean square.
$$\begin{aligned}
E\left[(S_n - E[X])^2\right] &= E\left[\left(\frac{1}{n}\sum_{i=1}^n (X_i - E[X])\right)^2\right] \\
&= \frac{1}{n^2}\, E\left[\sum_{i,j} (X_i - E[X])(X_j - E[X])\right] \\
&= \frac{1}{n^2} \sum_{i=1}^n E\left[(X_i - E[X])^2\right] \qquad \because X_i\text{'s are independent} \\
&= \frac{1}{n} \operatorname{var}(X) \to 0
\end{aligned}$$
as n → ∞.
Note that the proof works even if the X_i's are only pairwise independent, or even only uncorrelated.
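As a numerical sanity check of the rate var(X)/n, here is a Python sketch (numpy assumed; the Exponential(1) input, so that E[X] = var(X) = 1, and all parameters are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 5000

for n in (10, 100, 1000):
    # S_n for `trials` independent repetitions, X_i ~ Exp(1).
    Sn = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
    mse = np.mean((Sn - 1.0) ** 2)   # Monte Carlo estimate of E[(S_n - E[X])^2]
    print(n, mse, 1.0 / n)           # compare with var(X)/n = 1/n
```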
Convergence in Mean Square

Convergence in mean square does not imply convergence a.s.


Example ♣: Let X_1, X_2, ... be independent random variables such that
$$X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\ 1 & \text{w.p. } \frac{1}{n}. \end{cases}$$
$E\left[(X_n - 0)^2\right] = \frac{1}{n} \to 0$ as n → ∞. Therefore, this sequence converges to 0 in mean square.
For any 0 < ϵ < 1, using the inequalities log(1 − x) ≤ −x and $\sum_{i=n}^m \frac{1}{i} \ge \log\frac{m}{n}$,
$$\begin{aligned}
P\left(\max_{i\ge n} |X_i| < \epsilon\right) &= \lim_{m\to\infty} P\left(\max_{n\le i\le m} |X_i| < \epsilon\right) = \lim_{m\to\infty} \prod_{i=n}^m \left(1 - \frac{1}{i}\right) \\
&= \lim_{m\to\infty} \exp\left(\sum_{i=n}^m \log\left(1 - \frac{1}{i}\right)\right) \le \lim_{m\to\infty} \exp\left(-\sum_{i=n}^m \frac{1}{i}\right) \le \lim_{m\to\infty} \frac{n}{m} = 0.
\end{aligned}$$
Therefore, it does not converge to 0 a.s.
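To see the failure of a.s. convergence numerically (a Python sketch, numpy assumed; seed and horizon are illustrative): along a single sample path, X_n = 1 keeps recurring however far out you look, because Σ 1/n diverges.

```python
import numpy as np

rng = np.random.default_rng(3)
n_max = 10**6

# One sample path: X_n = 1 with probability 1/n, independently across n.
ns = np.arange(1, n_max + 1)
X = rng.random(n_max) < 1.0 / ns

hits = ns[X]                 # indices n where X_n = 1
print(len(hits), hits[-5:])  # about log(n_max) ~ 14 ones, typically some beyond 10^5
```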

Convergence in Mean Square

Convergence a.s. does not imply convergence in mean square.

From Example †, Y_n converges to 0 a.s., but
$$E\left[(Y_n - 0)^2\right] = 2^{2n} \cdot \frac{1}{2^n} = 2^n \to \infty,$$
so the sequence does not converge in mean square.

Convergence in Probability

A sequence X_1, X_2, ... converges to a random variable X in probability if for any ϵ > 0,
$$\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0.$$
If X_n → X a.s., then $X_n \xrightarrow{p} X$.
For proof, see [Github: https://github.com/wptay/aipt].
If X_n → X a.s., then $1_{\{|X_n - X| > \epsilon\}} \to 0$ a.s. The Dominated Convergence Theorem tells us that
$$E\left[1_{\{|X_n - X| > \epsilon\}}\right] = P(|X_n - X| > \epsilon) \to 0.$$

Example ♣: For 0 < ϵ < 1,
$$P(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0,$$
so the sequence converges to 0 in probability. But we saw before that it does not converge a.s.
Convergence in Probability
If X_n → X in mean square, then $X_n \xrightarrow{p} X$.
From the Markov inequality,
$$P(|X_n - X| > \epsilon) \le \frac{1}{\epsilon^2}\, E\left[(X_n - X)^2\right] \to 0.$$
The converse is not true. Example:
$$X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\ n & \text{w.p. } \frac{1}{n}. \end{cases}$$
Convergence in probability:
$$P(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0.$$
But
$$E\left[(X_n - 0)^2\right] = n^2 \cdot \frac{1}{n} = n \to \infty,$$
so the sequence does not converge to 0 in mean square.
Convergence in probability is weaker than both convergence a.s. and convergence in mean square.
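A numerical check of the example above (a Python sketch, numpy assumed; parameters are illustrative): the exceedance probability shrinks while the second moment grows.

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 10**6

for n in (10, 100, 1000):
    # X_n = n with probability 1/n, else 0.
    X = np.where(rng.random(trials) < 1.0 / n, float(n), 0.0)
    # P(|X_n| > eps) ~ 1/n -> 0, yet E[X_n^2] ~ n -> infinity.
    print(n, (np.abs(X) > 0.5).mean(), (X ** 2).mean())
```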
Weak Law of Large Numbers

Suppose X_1, X_2, ... are such that E[X_i] = 0, E[X_i²] = σ² < ∞ and E[X_i X_j] ≤ 0 for i ≠ j. Then,
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} 0 \quad \text{as } n \to \infty.$$

Proof:

We have
$$E\left[\left(\frac{1}{n}\sum_{i=1}^n X_i\right)^2\right] = \frac{1}{n^2}\, E\left[\sum_i X_i^2 + 2\sum_{i<j} X_i X_j\right] \le \frac{1}{n^2} \sum_i E\left[X_i^2\right] = \frac{\sigma^2}{n}.$$

Weak Law of Large Numbers

From Chebyshev’s inequality, we then have for any ϵ > 0,

$$P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \epsilon\right) \le \frac{1}{\epsilon^2}\,\frac{\sigma^2}{n} \to 0,$$

as n → ∞.
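The Chebyshev bound above can be checked by simulation (a Python sketch, numpy assumed; the Uniform(−1, 1) input, for which E[X_i] = 0 and σ² = 1/3, and all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
trials, eps, var = 5000, 0.05, 1.0 / 3.0

for n in (100, 400, 1600):
    # Sample means of n i.i.d. Uniform(-1, 1) variables.
    means = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)
    prob = (np.abs(means) > eps).mean()   # empirical P(|sample mean| > eps)
    print(n, prob, var / (eps**2 * n))    # Chebyshev bound sigma^2 / (eps^2 n)
```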

Convergence in Distribution

We say that $X_n \xrightarrow{d} X$ if for all continuous bounded functions f,
$$E[f(X_n)] \to E[f(X)].$$
This is equivalent to saying $X_n \xrightarrow{d} X \iff F_{X_n}(t) \to F_X(t)$ for all continuity points t of $F_X(\cdot)$. [Github: https://github.com/wptay/aipt]

Central Limit Theorem (CLT) for i.i.d. sequences.
Consider i.i.d. random variables X_1, X_2, ..., with E[X_i] = µ and var(X_i) = σ² < ∞. Let $S_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then
$$Z_n = \frac{S_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1).$$

CLT Example
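The figure from the original slide is not reproduced here. As a stand-in, here is a minimal Python sketch (numpy assumed; the Exponential(1) input, for which µ = σ = 1, and all parameters are illustrative) that forms the standardized Z_n and compares a few of its empirical quantiles with those of N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(6)
trials, n = 20000, 200

# Z_n = (S_n - mu) / (sigma / sqrt(n)) for X_i ~ Exp(1).
Sn = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
Zn = (Sn - 1.0) * np.sqrt(n)

# Compare empirical quantiles of Z_n with those of a standard normal sample.
G = rng.standard_normal(trials)
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(Zn, qs))
print(np.quantile(G, qs))  # the two rows should be close
```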

Characteristic Functions

Given a random variable X, the characteristic function of X is φ : R → C given by
$$\varphi(t) = E\left[e^{itX}\right] = E[\cos(tX)] + i\,E[\sin(tX)],$$
where i = √−1.
This is the Fourier transform of the pdf of X.
Example: The characteristic function of the Gaussian distribution N(µ, σ²) is
$$\varphi(t) = e^{i\mu t - \frac{\sigma^2 t^2}{2}}.$$
In particular, when µ = 0, σ² = 1,
$$\varphi(t) = e^{-\frac{t^2}{2}}.$$
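The Gaussian formula above is easy to verify numerically (a Python sketch, numpy assumed; sample size and t values are illustrative): estimate φ(t) = E[e^{itX}] by averaging over samples and compare with e^{−t²/2}.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal(10**6)  # samples of X ~ N(0, 1)

for t in (0.5, 1.0, 2.0):
    phi_hat = np.mean(np.exp(1j * t * X))  # Monte Carlo estimate of E[e^{itX}]
    print(t, phi_hat, np.exp(-t**2 / 2))   # real parts match, imaginary part ~ 0
```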

Fourier Inversion

Suppose that φ(t) is the characteristic function of a r.v. X. If $\int |\varphi(t)|\, dt < \infty$, then X has pdf
$$f_X(x) = \frac{1}{2\pi} \int \varphi(t)\, e^{-itx}\, dt.$$

If $X_n \xrightarrow{d} X$, then φ_n(t) → φ(t), ∀ t ∈ R.
This is immediate because $\varphi_n(t) = E\left[e^{itX_n}\right] = E[\cos(tX_n)] + i\,E[\sin(tX_n)]$, cos(tx) and sin(tx) are bounded continuous functions, and $X_n \xrightarrow{d} X$.

Fourier Inversion

The converse is not true. The sequence X_n ∼ N(0, n) has characteristic function
$$\varphi_n(t) = e^{-\frac{nt^2}{2}} \xrightarrow{n\to\infty} 0, \ \forall\, t \ne 0, \quad \text{and} \quad \varphi_n(0) = 1, \ \forall\, n.$$
Therefore, φ_n(t) converges pointwise (to a limit that is discontinuous at t = 0).
But for all x ∈ R,
$$P(X_n < x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2n}}\right)\right) \to \frac{1}{2}$$
as n → ∞; since the constant 1/2 is not a cdf, this sequence does not converge in distribution.
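A quick numerical look at this counterexample (a Python sketch; the point x = 1 is an illustrative choice): for any fixed x, P(X_n < x) drifts to 1/2 as the variance blows up, so the cdfs flatten out rather than converging to a cdf.

```python
import math

x = 1.0
for n in (1, 10, 100, 10000):
    # P(X_n < x) for X_n ~ N(0, n), using the erf expression above.
    p = 0.5 * (1.0 + math.erf(x / math.sqrt(2 * n)))
    print(n, p)  # drifts to 0.5 as n grows
```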

Levy’s Continuity Theorem
Suppose X_n has characteristic function φ_n(t) → φ(t), and φ(t) is continuous at t = 0. Then there exists X s.t. φ(t) is the characteristic function of X and $X_n \xrightarrow{d} X$.
For proof, see [Github: https://github.com/wptay/aipt].
Proof of CLT for i.i.d. sequence. Without loss of generality, we assume that µ = 0, σ = 1. We have
$$Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^n X_j.$$
Let
$$\varphi_n(t) = E\left[\exp\left(it\,\frac{\sum_{j=1}^n X_j}{\sqrt{n}}\right)\right] = \prod_{j=1}^n E\left[\exp\left(it\,\frac{X_j}{\sqrt{n}}\right)\right] = \left(E\left[e^{it\frac{X_1}{\sqrt{n}}}\right]\right)^n = \varphi_1\left(\frac{t}{\sqrt{n}}\right)^n.$$
CLT Proof
From the Taylor series expansion, we have
$$\varphi_1(t) = \varphi_1(0) + \varphi_1'(0)\, t + \frac{\varphi_1''(0)}{2!}\, t^2 + o(t^2).$$
Since $\varphi_1(0) = 1$, $\varphi_1'(0) = E\left[iX_1 e^{i0X_1}\right] = i\,E[X_1] = 0$, and $\varphi_1''(0) = E\left[(iX_1)^2\right] = -1$, we obtain
$$\varphi_1(t) = 1 - \frac{t^2}{2} + o(t^2).$$
Therefore,
$$\varphi_n(t) = \varphi_1\left(\frac{t}{\sqrt{n}}\right)^n = \left(1 - \frac{t^2}{2n} + o(t^2/n)\right)^n \xrightarrow{n\to\infty} e^{-\frac{t^2}{2}},$$
the characteristic function of N(0, 1) (certainly continuous at t = 0). From Levy's Continuity Theorem, we then have
$$Z_n \xrightarrow{d} N(0, 1).$$

CLT for Discrete R.V.
Example: X_1, X_2, ... are i.i.d. ∼ Bern(1/2). Note that
$$Z_n = \frac{\sum_{i=1}^n (X_i - 1/2)}{\sqrt{n}/2}$$
is discrete and has no pdf. But its cdf converges to the Gaussian cdf.
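A sketch of this comparison in Python (numpy assumed; math.erf supplies the Gaussian cdf; parameters are illustrative): evaluate the empirical cdf of Z_n at a few points against Φ.

```python
import math
import numpy as np

rng = np.random.default_rng(8)
trials, n = 20000, 400

X = rng.integers(0, 2, size=(trials, n))         # i.i.d. Bern(1/2)
Zn = (X.sum(axis=1) - n / 2) / (np.sqrt(n) / 2)  # the standardized sum above

for z in (-1.0, 0.0, 1.0):
    ecdf = (Zn <= z).mean()                       # empirical cdf of Z_n
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal cdf
    print(z, ecdf, Phi)
```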

CLT for Random Vectors

The CLT applies to i.i.d. sequences of random vectors.


Let X_1, X_2, ... be i.i.d. random vectors, each having finite mean µ and non-singular covariance matrix Σ. Then
$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \xrightarrow{d} N(0, \Sigma).$$
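A minimal multivariate check (a Python sketch, numpy assumed; the 2-d input with independent non-Gaussian coordinates is an illustrative choice): the sample covariance of Z_n should approach Σ.

```python
import numpy as np

rng = np.random.default_rng(9)
trials, n = 5000, 500

# i.i.d. 2-d vectors with non-Gaussian, independent coordinates:
# Exp(1) (mean 1, var 1) and Uniform(0, 1) (mean 1/2, var 1/12).
X = np.stack([rng.exponential(1.0, size=(trials, n)),
              rng.random((trials, n))], axis=-1)
mu = np.array([1.0, 0.5])

Zn = (X - mu).sum(axis=1) / np.sqrt(n)  # Z_n for each of the `trials` repetitions

print(np.cov(Zn.T))  # close to Sigma = diag(1, 1/12)
```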

Summary

▶ Convergence a.s. and convergence in mean square each imply convergence in probability, but neither implies the other.
▶ SLLN: the sample mean of i.i.d. random variables with finite mean converges a.s. to the mean.
▶ WLLN: with finite variance and non-positive correlations, the sample mean converges to the mean in probability.
▶ CLT: the standardized sample mean of an i.i.d. sequence with finite variance converges in distribution to N(0, 1).
