Random Variables Explained
APPENDIX ON COMPLETENESS:

A sequence {x_n} in a metric space S is Cauchy in metric d if lim_{m,n↑∞} d(x_n, x_m) = 0. The metric d is complete, equivalently, (S, d) is complete, if every Cauchy sequence converges. This property may hold for one metric and not for another even when the two are equivalent (i.e., lead to the same open sets).

Proof: The sequence x_n = n⁻¹ is Cauchy in d but does not converge. (It is not Cauchy in d₀.) So d is not complete. But d₀ is, because {y_n} is Cauchy/convergent in d₀ ⟺ {1/y_n} is Cauchy/convergent in d.

Now, at least one of them contains infinitely many of the x_n; call it I₁. Repeat this to generate I₁, I₂, ⋯, such that there is some x_{n(k)} in I_k = [a_k, b_k], where a_k ↑, b_k ↓, |a_k − b_k| = (b − a)/2^k → 0. So a_k, b_k → x* = max_k a_k = min_k b_k. Then x_{n(k)} → x*.

Proof that R^d is complete: It suffices to consider d = 1. Let {x_m} ⊂ R with lim_{n,m↑∞} |x_n − x_m| = 0. Then given ε > 0, for sufficiently large n, |x_n − x_m| < ε ∀ m ≥ n; hence {x_k} is bounded. The claim follows by (*), (†).

Claim: (i) X_n → X and, (ii) X_n′ → X uniformly, pointwise in both cases.

Proof: 0 ≤ X(ω) − X_n(ω) ≤ 2⁻ⁿ I{X(ω) ≤ n} + X(ω) I{X(ω) > n}. Both terms on the right → 0, the first because 2⁻ⁿ → 0 and the second because I{X(ω) > n} = 0 from some n on, for each ω. Likewise, 0 ≤ X(ω) − X_n′(ω) ≤ 2⁻ⁿ ∀ n. The claim follows. For a general X, write X⁺ := max(X, 0), X⁻ := −min(X, 0), and apply the above lemma to X⁺, X⁻.

The collection of sets A ⊂ R for which {X ∈ A} ∈ F forms a σ-field (follows from the fact that for a map f : S₁ → S₂, f⁻¹ preserves set operations). It therefore contains the σ-field generated by such sets, viz., the Borel σ-field B of R. In other words, X⁻¹(A) ∈ F for all Borel sets A ⊂ R. Hence X is a random variable.
σ(X) := {X⁻¹(A) : A ∈ B} is a σ-field, the 'σ-field generated by X', denoted σ(X).

Equivalently, it is the smallest sub-σ-field of F with respect to which X is measurable (i.e., which contains X⁻¹(A), A ∈ B), or, the intersection of all sub-σ-fields of F with respect to which X is measurable (i.e., which contain X⁻¹(A), A ∈ B).

What about a possibly uncountably infinite collection of real random variables, say X_α, α ∈ I, where I is an uncountable index set? Let σ(X_α, α ∈ I) denote the smallest sub-σ-field of F with respect to which all X_α are measurable. Let C denote the collection of countable subsets of I. Then

Lemma σ(X_α, α ∈ I) = ∪_{J∈C} σ(X_α, α ∈ J).

Proof: Write Λ := ∪_{J∈C} σ(X_α, α ∈ J).

1. Ω, ∅ ∈ σ(X_α, α ∈ J) ∀ J ∈ C, so Ω, ∅ ∈ Λ.

2. If A ∈ Λ, then A ∈ σ(X_α, α ∈ J) for some J ∈ C, therefore A^c ∈ σ(X_α, α ∈ J) for this J. So A^c ∈ Λ.

3. If A_i ∈ σ(X_α, α ∈ J_i) for some J_i ∈ C, then ∪_i A_i ∈ σ(X_α, α ∈ J) for J = ∪_i J_i ⊂ I, and J, being a countable union of countable sets, is countable, i.e., J ∈ C. Thus σ(X_α, α ∈ J) ⊂ Λ, so ∪_i A_i ∈ Λ.

This proves that Λ is a σ-field, completing the proof.

This result shows that σ(X_α, α ∈ I) contains only those sets that can be described in terms of countably many indices in I. For example, the set {max_α X_α < 1} is not in this σ-field. This is a serious limitation when, e.g., we consider I = [0, T] (say) with t ∈ I interpreted as time. That is, X_t, t ∈ [0, T], is a random process. One then has to go beyond this simple construct.

Since σ(Y) ⊂ σ(X), A_i = {ω : Y(ω) = a_i} must equal {ω : X(ω) ∈ B_i} for some B_i in the Borel σ-field of R^d. Also, {A_i} are disjoint. Define disjoint sets {C_n} by:

C₁ = B₁,  C_n = B_n \ ∪_{m=1}^{n−1} B_m,  n ≥ 2.

Then ∪_{m=1}^{n} C_m = ∪_{m=1}^{n} B_m ∀ n. Furthermore,

{ω : X(ω) ∈ C_n} = {ω : X(ω) ∈ B_n \ ∪_{m=1}^{n−1} B_m}
= {ω : X(ω) ∈ B_n} \ ∪_{m=1}^{n−1} {ω : X(ω) ∈ B_m}
= A_n \ ∪_{m=1}^{n−1} A_m
= A_n,

because {A_i} are disjoint. Define h(x) = a_i for x ∈ C_i ∀ i. Then

Y = Σ_i a_i I_{A_i} = Σ_i a_i I{ω : X(ω) ∈ C_i}   (because A_i = {ω : X(ω) ∈ C_i})
  = h(X).
… Σ_{i=1}^{n−1} ( min_{x_i ≤ x ≤ x_{i+1}} f(x) ) (x_{i+1} − x_i).

Let |Π| := max_i (x_{i+1} − x_i). If the two sums approach each other as |Π| → 0, we call the common limit the Riemann integral ∫₀^t f(x) dx. If not, the integral is undefined.

The following analogy will help fix the ideas.

100, 10, 10, 20, 50, 20, 100, 10, 10, 10, 100, 50, 20, 50, 20, 10, 50, 50, 20, 10, 20.

Thus the Lebesgue measure enlarges the class of functions that we can integrate. This generality carries over to expectations, because they are simply Lebesgue integrals with respect to a probability measure.

Such random variables are said to be integrable. For R^d-valued random variables, the expectation is defined componentwise. We often use the alternative notations ∫ X(ω)P(dω) or ∫ X(ω)dP(ω) or simply ∫ X dP.

This definition is not dependent on the choice of the approximations {X_n}: Suppose {Y_n} is another sequence of simple random variables with 0 ≤ Y_n ↑ X a.s.

Theorem E[Y_n] ↑ E[X].

Lemma If Z_n, n = 1, 2, ⋯, ∞, are simple random variables with 0 ≤ Z_n ↑ Z_∞ a.s., then E[Z_n] ↑ E[Z_∞].

Proof of the above theorem: Define W_{n,m} := X_n ∧ Y_m, m, n ≥ 1. Since X_n ↑ X ≥ Y_m a.s., W_{n,m} ↑ Y_m a.s. as n ↑ ∞. Similarly, since Y_m ↑ X ≥ X_n a.s. as m ↑ ∞, W_{n,m} ↑ X_n a.s. as m ↑ ∞. Hence

lim_{n↑∞} E[X_n] ≥ lim_{n↑∞} E[W_{n,m}]   (because X_n ≥ W_{n,m} ∀ m)
            = E[Y_m]                      (by Lemma)

Letting m ↑ ∞ and arguing symmetrically with the roles of {X_n} and {Y_m} exchanged,

lim_{m↑∞} E[Y_m] = lim_{n↑∞} E[X_n] = E[X].

1. Change of variables formula: Let X be an R^d-valued integrable random variable with law µ and f : R^d → R be a measurable function such that the real random variable f(X) is integrable. Then

E[f(X)] := ∫ f(X(ω)) dP(ω) = ∫ f(x) µ(dx).

For f = I_A for some A ∈ B(R^d), this reduces to P(X⁻¹(A)) = µ(A), which is how µ was defined. It extends to simple f by linearity and to general f by a limiting argument.
2. Chebyshev inequality: This says: for a random variable X ≥ 0 and 0 < x ∈ R,

P(X ≥ x) ≤ E[X]/x.

This follows from the fact

E[X] ≥ E[X I{X ≥ x}] ≥ E[x I{X ≥ x}] = x P(X ≥ x).

You are invited to verify each step using the properties of expectation.
3. Jensen's inequality: If f : R^d → R is convex, then

E[f(X)] ≥ f(E[X]),

assuming both sides are well defined. To see this, use the fact that f(x) = sup_{g∈G} g(x), where G := the collection of affine functions g ≤ f: for each such g, E[f(X)] ≥ E[g(X)] = g(E[X]), and taking the supremum over g ∈ G gives the claim.

For non-negative X, we have the following convenient formula:

Theorem If X ≥ 0 a.s., E[X^p] = p ∫₀^∞ t^{p−1} P(X ≥ t) dt, 1 ≤ p < ∞.

For non-negative integer valued X, this leads to

E[X] = Σ_{n=1}^∞ P(X ≥ n).
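As a quick numerical sanity check (not part of the notes), the following numpy sketch verifies the tail-sum formula and the Chebyshev bound for an assumed toy example, X ~ Poisson(3.7):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative check with X ~ Poisson(3.7): non-negative and integer valued.
lam = 3.7
X = rng.poisson(lam, size=1_000_000)

# Tail-sum formula: E[X] = sum_{n>=1} P(X >= n) for non-negative integer X.
tail_sum = sum((X >= n).mean() for n in range(1, X.max() + 1))
print("E[X] (empirical) :", X.mean())        # ~ 3.7
print("sum_n P(X >= n)  :", tail_sum)        # ~ 3.7

# Chebyshev bound as stated above: P(X >= x) <= E[X]/x for X >= 0, x > 0.
for x in [5.0, 8.0, 12.0]:
    print(f"x={x}: P(X>=x)={np.mean(X >= x):.4f} <= E[X]/x={X.mean()/x:.4f}")
```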
For a real valued random variable X and p ≥ 1, define the following whenever the expectation is well defined.

1. pth moment E[X^p]

2. pth centred moment E[(X − E[X])^p]

3. pth absolute moment E[|X|^p]

4. pth centred absolute moment E[|X − E[X]|^p]

For R^d valued random variables, the last two can be defined with ‖·‖ replacing |·|. Analogous joint moments can be defined for real random variables X, Y, Z and p, q, r ≥ 1. Some familiar cases are: the variance var(X) of a real random variable X, which is its second centred moment, the standard deviation := the square-root thereof, and the covariance matrix of R^d-valued random variables X, Y, given by

E[(X − E[X])(Y − E[Y])^T].

Let X be a real random variable and 1 < p, q < ∞ with 1/p + 1/q = 1. We also allow p = 1, q = ∞. Define

‖X‖_p = (E[|X|^p])^{1/p}, 1 ≤ p < ∞;  ‖X‖_∞ := ess sup |X|,

where 'ess sup', which stands for 'essential supremum', is the smallest C ≥ 0 such that |X| ≤ C a.s. (Recall that random variables are a.s. equivalence classes.)

Let 1 ≤ p ≤ ∞ and ‖X‖_p, ‖Y‖_p < ∞. We have the following two useful inequalities:

Theorem (Minkowski inequality) Let X, Y be random variables on some probability space with ‖X‖_p, ‖Y‖_p < ∞ for some p, 1 ≤ p ≤ ∞. Then

‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p.   (1)

Proof The claim is easy when p = 1 or when ‖X‖_p or ‖Y‖_p is zero. So let 1 < p < ∞ and ‖X‖_p, ‖Y‖_p > 0. Let a := E[|X|^p]^{1/p} = ‖X‖_p, b := E[|Y|^p]^{1/p} = ‖Y‖_p, and X₀ := X/‖X‖_p, Y₀ := Y/‖Y‖_p, so that ‖X₀‖_p, ‖Y₀‖_p = 1. Then for λ := ‖X‖_p/(‖X‖_p + ‖Y‖_p) = a/(a + b), convexity of t ↦ t^p on [0, ∞) gives

|X + Y|^p ≤ (|X| + |Y|)^p = (a + b)^p (λ|X₀| + (1 − λ)|Y₀|)^p ≤ (a + b)^p (λ|X₀|^p + (1 − λ)|Y₀|^p).

Taking expectations,

E[|X + Y|^p] ≤ (a + b)^p (λ E[|X₀|^p] + (1 − λ) E[|Y₀|^p]) = (a + b)^p,

i.e., ‖X + Y‖_p ≤ a + b = ‖X‖_p + ‖Y‖_p. This completes the proof. □
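A numerical check of (1) on an assumed toy pair (X, Y); the sketch below estimates ‖·‖_p by sample averages with numpy:

```python
import numpy as np

rng = np.random.default_rng(1)

def lp_norm(Z, p):
    # ||Z||_p = E[|Z|^p]^(1/p), estimated by a sample average.
    return np.mean(np.abs(Z) ** p) ** (1.0 / p)

# Two (dependent) random variables on a common sample of size 10^6.
X = rng.standard_normal(1_000_000)
Y = 0.5 * X + rng.exponential(2.0, size=1_000_000)

for p in [1.0, 1.5, 2.0, 3.0]:
    lhs = lp_norm(X + Y, p)
    rhs = lp_norm(X, p) + lp_norm(Y, p)
    print(f"p={p}: ||X+Y||_p = {lhs:.4f} <= ||X||_p + ||Y||_p = {rhs:.4f}")
```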
Corollary If X, Y ≠ 0 a.s. and 1 < p < ∞ in the above, then equality holds in (1) if and only if X = γY for some γ ∈ R\{0}.

The other useful inequality is the Hölder inequality E[|XY|] ≤ ‖X‖_p ‖Y‖_q (with 1/p + 1/q = 1). As in the case of (1), equality holds for 1 < p < ∞ if and only if X = γY for some γ ∈ R\{0}. For p = 2, q = 2 this reduces to the familiar Cauchy–Schwarz inequality.

Bounded convergence theorem:

Theorem Let X_n, n ≥ 1, be real random variables satisfying |X_n| ≤ K a.s. for some constant K < ∞ and X_n → X a.s. Then E[X_n] → E[X].

Proof Let ε > 0. Then

|E[X_n] − E[X]| ≤ E[|X_n − X|]
= E[|X_n − X| I{|X_n − X| < ε}] + E[|X_n − X| I{|X_n − X| ≥ ε}]
≤ ε + E[(|X_n| + |X|) I{|X_n − X| ≥ ε}]
≤ ε + 2K P(|X_n − X| ≥ ε)
→ ε as n ↑ ∞, by the above lemma.

Since ε can be made arbitrarily small, the claim follows.

The 'limit infimum' or 'liminf' of a real sequence {x_n} is defined as

lim inf_{n↑∞} x_n = lim_{n↑∞} inf_{m≥n} x_m = sup_{n≥1} inf_{m≥n} x_m.

It is always well defined in R ∪ {±∞}. Similarly, the 'limit supremum' or 'limsup' of a real sequence {x_n} is defined as

lim sup_{n↑∞} x_n = lim_{n↑∞} sup_{m≥n} x_m = inf_{n≥1} sup_{m≥n} x_m.

It is always well defined in R ∪ {±∞}.

Fatou's lemma:

Theorem Let X_n ≥ 0 be integrable real random variables with X_n → X a.s. Then lim inf_{n↑∞} E[X_n] ≥ E[X].

Proof We need the following lemma:

Lemma Let X ≥ 0 be integrable. Then E[X ∧ N] ↑ E[X] as N ↑ ∞.

Proof Let ε > 0. Define

X_N := Σ_{k=0}^{N2^N − 1} (k/2^N) I{k/2^N ≤ X < (k+1)/2^N}.

Note that X_N ≤ X ∧ N. Pick N sufficiently large so that

|E[X] − E[X_N]| = E[X] − E[X_N] < ε,

which is possible from our construction of the expectation. Thus

0 ≤ E[X] − E[X ∧ N] = E[X] − E[X_N] + E[X_N] − E[X ∧ N] ≤ E[X] − E[X_N] < ε.

Since ε > 0 is arbitrary, the claim follows. This result also holds if E[X] = ∞.

Returning to Fatou's lemma: for each fixed N, X_n ∧ N → X ∧ N a.s. and |X_n ∧ N| ≤ N, so by the bounded convergence theorem E[X_n ∧ N] → E[X ∧ N]. Since E[X_n] ≥ E[X_n ∧ N], it follows that lim inf_{n↑∞} E[X_n] ≥ E[X ∧ N] for every N. Letting N ↑ ∞ and using the lemma, lim inf_{n↑∞} E[X_n] ≥ E[X].

Monotone convergence theorem:

Theorem Let X_n, n ≥ 1, be integrable real random variables with X_n ↑ X a.s. Then E[X_n] ↑ E[X].

Proof By replacing X_n by X_n − X₁ if necessary, we may assume without loss of generality that X_n ≥ 0 a.s. Then by Fatou's lemma,

lim inf_{n↑∞} E[X_n] ≥ E[X].

On the other hand, X_n ≤ X implies E[X_n] ≤ E[X]. Thus

lim sup_{n↑∞} E[X_n] ≤ E[X].

But

lim sup_{n↑∞} E[X_n] ≥ lim inf_{n↑∞} E[X_n].

Combining these inequalities, the claim follows.
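The dyadic construction in the lemma's proof is easy to visualise numerically. A minimal sketch (not from the notes), assuming X ~ Exp(1) as a toy example, showing E[X_N] and E[X ∧ N] increasing towards E[X]:

```python
import numpy as np

rng = np.random.default_rng(2)

def dyadic_simple(x, N):
    # X_N = sum_{k=0}^{N*2^N - 1} (k/2^N) * I{k/2^N <= X < (k+1)/2^N},
    # i.e. floor(X * 2^N)/2^N on {X < N}, and 0 on {X >= N}; note X_N <= min(X, N).
    xN = np.floor(x * 2**N) / 2**N
    return np.where(x < N, xN, 0.0)

# Example: X ~ Exp(1), so E[X] = 1.
X = rng.exponential(1.0, size=1_000_000)

for N in [1, 2, 4, 8, 16]:
    XN = dyadic_simple(X, N)
    print(f"N={N:2d}: E[X_N]={XN.mean():.6f}  E[min(X,N)]={np.minimum(X, N).mean():.6f}")
print("E[X] =", X.mean())   # both columns increase towards this value
```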
Dominated convergence theorem:

Theorem Let {X_n}, X, Y be real random variables with X_n → X a.s. and |X_n| ≤ Y with E[Y] < ∞. Then E[X_n] → E[X].

Proof By Fatou's lemma,

E[Y] − lim sup_{n↑∞} E[X_n] = lim inf_{n↑∞} E[Y − X_n] ≥ E[Y − X] = E[Y] − E[X],

implying

lim sup_{n↑∞} E[X_n] ≤ E[X].

Similarly,

lim inf_{n↑∞} E[X_n] + E[Y] = lim inf_{n↑∞} E[X_n + Y] ≥ E[X + Y] = E[X] + E[Y],

implying

lim inf_{n↑∞} E[X_n] ≥ E[X].

Thus

E[X] ≤ lim inf_{n↑∞} E[X_n] ≤ lim sup_{n↑∞} E[X_n] ≤ E[X].

The following two facts are immediate (prove them):

1. If {X_α, α ∈ I} is uniformly integrable and Y₁, ⋯, Y_k are integrable random variables, then {X_α, α ∈ I; Y₁, ⋯, Y_k} is uniformly integrable.

2. If {X_α, α ∈ I} is uniformly integrable, then sup_α E[|X_α|] < ∞.

To see this, fix a > 0 such that

sup_α E[|X_α| I{|X_α| > a}] := M < ∞.

Then

sup_α E[|X_α|] ≤ sup_α E[|X_α| I{|X_α| > a}] + sup_α E[|X_α| I{|X_α| ≤ a}] ≤ M + a < ∞.

The following equivalence serves as an alternative definition of uniform integrability, sometimes more convenient to use.

Theorem {X_α, α ∈ I} are uniformly integrable if and only if

sup_α E[|X_α|] < ∞ and lim_{P(A)→0} sup_{α∈I} ∫_A |X_α| dP = 0.   (2)

In particular, if {X_n, 1 ≤ n ≤ ∞} is uniformly integrable and X_n → X_∞ in probability, then E[|X_n − X_∞|] → 0: for ε, a > 0,

E[|X_n − X_∞|] = E[|X_n − X_∞| I{|X_n − X_∞| > ε}] + E[|X_n − X_∞| I{|X_n − X_∞| ≤ ε}]
≤ ε + E[(|X_n| + |X_∞|) I{|X_n − X_∞| > ε}]
≤ ε + 2 sup_{1≤j≤∞} E[|X_j| I{|X_n − X_∞| > ε}]
≤ ε + 2 sup_{1≤j≤∞} ( E[|X_j| I{|X_n − X_∞| > ε} I{|X_j| ≤ a}] + E[|X_j| I{|X_n − X_∞| > ε} I{|X_j| > a}] )
≤ ε + 2a P(|X_n − X_∞| > ε) + 2 sup_{1≤j≤∞} E[|X_j| I{|X_j| > a}].

Thus

0 ≤ lim inf_{n↑∞} E[|X_n − X_∞|] ≤ lim sup_{n↑∞} E[|X_n − X_∞|] ≤ ε + 2 sup_{1≤j≤∞} E[|X_j| I{|X_j| > a}].

Let ε ↓ 0 and a ↑ ∞ to conclude.

Theorem Let {X_n, 1 ≤ n ≤ ∞} be non-negative integrable random variables with X_n → X_∞ a.s. Then E[|X_n − X_∞|] → 0 if and only if E[X_n] → E[X_∞].

Proof Clearly,

|E[X_n] − E[X_∞]| ≤ E[|X_n − X_∞|] → 0.

Conversely, 0 ≤ (X_∞ − X_n)⁺ ≤ X_∞. Hence by the dominated convergence theorem,

E[(X_∞ − X_n)⁺] → 0.

Then since |X_∞ − X_n| = X_n − X_∞ + 2(X_∞ − X_n)⁺,

E[|X_n − X_∞|] = 2E[(X_∞ − X_n)⁺] + E[X_n] − E[X_∞] → 0.

This completes the proof.

"Uniform continuity": f is uniformly continuous if given ε > 0, we can find δ > 0 such that ‖x − y‖ < δ ⟹ |f(x) − f(y)| < ε.

Contrast with 'continuity at x': f is continuous at x if given ε > 0, we can find δ > 0 such that ‖x − y‖ < δ ⟹ |f(x) − f(y)| < ε.

'Continuous' ⟺ 'continuous at x ∀ x'. Here δ can depend on x.
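To see why some control such as domination or uniform integrability is needed, here is a small illustrative sketch (not from the notes) on ([0,1], B([0,1]), Lebesgue) with X_n = n·I_(0, 1/n):

```python
import numpy as np

rng = np.random.default_rng(3)

# Work on the probability space ([0,1], B([0,1]), Lebesgue): omega ~ Uniform(0,1).
omega = rng.uniform(0.0, 1.0, size=1_000_000)

def X_n(omega, n):
    # X_n = n on (0, 1/n), 0 elsewhere: X_n -> 0 a.s., yet E[X_n] = 1 for all n.
    return np.where(omega < 1.0 / n, float(n), 0.0)

for n in [1, 10, 100, 1000]:
    print(f"n={n:5d}: E[X_n] ~ {X_n(omega, n).mean():.3f}")

# The a.s. limit is X = 0, so E[X] = 0 < 1 = lim E[X_n]:
# Fatou's lemma (liminf E[X_n] >= E[X]) holds with strict inequality, and the
# dominated convergence theorem does not apply, since no integrable Y dominates
# every X_n and {X_n} is not uniformly integrable.
print("E[lim X_n] =", 0.0)
```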
2. ⟹ 3. For an open G ⊂ R^d, the set

G_ε := {x ∈ G : inf_{y∈∂G} ‖x − y‖ ≥ ε} ⊂ G.

G_ε is nonempty for ε > 0 sufficiently small, and is closed. The function f^ε : R^d → [0, 1] given by

f^ε(x) := inf_{y∈G^c} ‖x − y‖ / ( inf_{y∈G^c} ‖x − y‖ + inf_{y∈G_ε} ‖x − y‖ )

is uniformly continuous with respect to ‖·‖ (prove it), is 1 on G_ε and 0 outside G, and f^ε ↑ I_G as ε ↓ 0.

5. ⟹ 1. Let A ∈ B(R^d) with µ(∂A) = 0. Let f ∈ C_b(R^d) and for a given ε > 0, choose a₀ < a₁ < ⋯ < a_N, N ≥ 1, such that

1. a₀ = −sup_x |f(x)| − ε, a_N = sup_x |f(x)| + ε,

2. µ({x : f(x) = a_i}) = 0 ∀ i, and,

3. |a_i − a_{i−1}| < ε ∀ 1 ≤ i ≤ N.

This is always possible because the set of a for which µ({x : f(x) = a}) > 0 is at most countable and can be avoided.

Corollary In 3. above, it suffices to consider bounded open G.

Proof Suppose for all bounded open G,

lim inf_{n↑∞} µ_n(G) ≥ µ(G).   (1)

Consider an unbounded open set G. Let G_n := G ∩ {x : ‖x‖ < n}, n ≥ 1. Then {G_n} are bounded open, so

lim inf_{n↑∞} µ_n(G) ≥ lim inf_{n↑∞} µ_n(G_m) ≥ µ(G_m)

for m ≥ 1. Since G = ∪_m G_m and G_m ⊂ G_{m+1}, letting m ↑ ∞ on the right hand side, µ(G_m) ↑ µ(G), hence (1) holds for any arbitrary open set G.

Corollary There exists a countable family {f_i} ⊂ C_b(R^d) such that

∫ f_i dµ_n → ∫ f_i dµ ∀ i ⟹ µ_n → µ.

Proof Let G be open. For any x ∈ G, we can find a rational y ∈ G (i.e., with rational coordinates) and a rational r > 0 such that the open ball {z ∈ R^d : ‖z − y‖ < r} contains x and is contained in G. G is the union of all open balls with rational centres and rational radii contained in it. Then finite unions of such balls form a countable collection 𝒢, and an argument analogous to the above shows that it suffices to verify 3. for sets in 𝒢.

Fact: if d(·,·) is a metric, then d(·,·) ∧ 1 and d(·,·)/(1 + d(·,·)) are equivalent bounded metrics. One such convenient metric is

ρ(µ, ν) := inf_{X≈µ, Y≈ν} E[‖X − Y‖ ∧ 1],

where 'X ≈ µ' stands for 'X has law µ' and likewise for Y, ν, and the infimum on the right is over all pairs of random variables (X, Y) such that X has law µ and Y has law ν.

Consider µ_n → µ_∞ in P(R) and F_n, 1 ≤ n ≤ ∞, the corresponding distribution functions, assumed to be continuous and strictly increasing. Then F_n → F_∞ pointwise.

Let U := a uniformly distributed random variable on the probability space ([0,1], B([0,1]), P), P := the Lebesgue measure, defined as U(ω) = ω. Then X_n := F_n⁻¹(U) has law µ_n for 1 ≤ n ≤ ∞ and X_n → X_∞ a.s. (in fact, pointwise).
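A minimal sketch of this coupling, assuming the toy family µ_n = Exp(rate_n) with rate_n → 1, where the quantile functions F_n⁻¹ are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)

# mu_n = Exp(rate_n) with rate_n -> 1: F_n(x) = 1 - exp(-rate_n * x),
# so the quantile function is F_n^{-1}(u) = -log(1 - u)/rate_n.
def quantile(u, rate):
    return -np.log1p(-u) / rate

U = rng.uniform(0.0, 1.0, size=1_000_000)   # a single uniform sample, reused for every n

X_inf = quantile(U, 1.0)                     # law mu_infinity
for rate in [2.0, 1.5, 1.1, 1.01]:
    X_n = quantile(U, rate)                  # law mu_n, built from the SAME U
    print(f"rate={rate}: E|X_n - X_inf| = {np.mean(np.abs(X_n - X_inf)):.4f}")

# Each X_n has the prescribed law, and X_n(omega) -> X_inf(omega) for every omega,
# realizing the weak convergence mu_n -> mu_infinity as a.s. (here pointwise) convergence.
```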
… ⟹ S_m/m = (S_m/[m]) × ([m]/m) → 0 a.s.

Empirical Risk Minimization: this justifies minimization of the empirical loss (1/n) Σ_{m=1}^{n} L(X_m, Y_m, θ) in place of E[L(X_n, Y_n, θ)], where {(X_n, Y_n)} are i.i.d. input-output pairs.
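An illustrative sketch (the toy linear model, squared loss, and all names are assumptions, not from the notes) of empirical risk minimization:

```python
import numpy as np

rng = np.random.default_rng(5)

# i.i.d. input-output pairs (X_m, Y_m) with Y = 2*X + noise (assumed toy model).
n = 10_000
X = rng.standard_normal(n)
Y = 2.0 * X + rng.standard_normal(n)

def empirical_risk(theta):
    # (1/n) * sum_m L(X_m, Y_m, theta) with squared loss L = (Y - theta*X)^2.
    return np.mean((Y - theta * X) ** 2)

# Minimize the empirical risk over theta on a grid (a closed form exists here,
# but the grid search mirrors the generic recipe).
grid = np.linspace(0.0, 4.0, 401)
theta_hat = grid[np.argmin([empirical_risk(t) for t in grid])]
print("empirical risk minimizer :", theta_hat)            # ~ 2.0
print("closed-form least squares:", (X @ Y) / (X @ X))    # ~ 2.0

# By the SLLN, the empirical risk converges to E[L(X, Y, theta)] for each theta,
# which is what justifies replacing the expected loss by its empirical counterpart.
```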
Let {X_n} be independent and identically distributed with mean a. Define I : R → R ∪ {∞} by

I(x) := sup_θ ( θx − log E[e^{θX}] ).

Then I(·) is known as the rate function.

Properties of I(·):

1. I(x) ≥ 0·x − log E[e^{0·X}] = 0.

2. I, being a pointwise supremum of linear functions, is convex.

A sequence of probability measures µ_ε ∈ P(S), ε > 0, satisfies the large deviations principle with rate function I if for all Borel sets B in S,

−inf_{B°} I ≤ lim inf_{ε↓0} ε log µ_ε(B) ≤ lim sup_{ε↓0} ε log µ_ε(B) ≤ −inf_{B̄} I,

where B° and B̄ denote respectively the interior and the closure of B.

Laplace–Varadhan principle:

lim_{ε→0} ε log E[ e^{−F(X)/ε} ] = −min_x ( F(x) + I(x) ).

Further examples:

5. Thermodynamic limits of lattice models in statistical mechanics: variational principles of thermodynamics.

6. Associated with 'fluid limits' of point processes.
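As an illustration, the rate function defined above can be computed numerically; the sketch below assumes X ~ Bernoulli(p), so that log E[e^{θX}] is available in closed form, and compares a grid supremum with the known relative-entropy expression:

```python
import numpy as np

p = 0.3            # X ~ Bernoulli(p), so log E[e^{theta X}] = log(1 - p + p*e^theta)
thetas = np.linspace(-20.0, 20.0, 20001)
log_mgf = np.log(1.0 - p + p * np.exp(thetas))

def rate(x):
    # I(x) = sup_theta [theta*x - log E[e^{theta X}]], approximated on a grid.
    return np.max(thetas * x - log_mgf)

def rate_closed_form(x):
    # Known answer for Bernoulli: relative entropy of Bernoulli(x) w.r.t. Bernoulli(p).
    return x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))

for x in [0.3, 0.5, 0.7, 0.9]:
    print(f"x={x}: grid sup = {rate(x):.5f}, closed form = {rate_closed_form(x):.5f}")

# I(p) = 0 (the supremum is attained at theta = 0 at the mean), and I is convex in x.
```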
More generally, consider a partition D := {B₁, ⋯, B_m} ⊂ F of Ω with P(B_i) > 0 ∀ i.

Suppose we are told which particular bin B_i the sample point ω falls in. Then we can say that 'when ω ∈ B_i, the conditional probability of A ∈ F is P(A|B_i) := P(A ∩ B_i)/P(B_i)'.

In other words, we can define a random variable P(A|G), where G := the σ-field generated by {B₁, ⋯, B_m}, as:

P(A|G) := Σ_{i=1}^{m} ( P(A ∩ B_i)/P(B_i) ) I_{B_i}

and call it the 'conditional probability of A given G'. G consists of ∅, the B_i's and ∪_{i∈S} B_i for all possible S ⊂ D. Thus for any C = ∪_{i∈S} B_i ∈ G, S ⊂ D, we have

∫_C P(A|G) dP = ∫_C ( Σ_{i=1}^{m} (P(A ∩ B_i)/P(B_i)) I_{B_i} ) dP
= ∫_C ( Σ_{i∈S} (P(A ∩ B_i)/P(B_i)) I_{B_i} ) dP
= Σ_{i∈S} ( P(A ∩ B_i)/P(B_i) ) P(B_i)
= Σ_{i∈S} P(A ∩ B_i)
= P(A ∩ C).
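A Monte Carlo sketch of this definition on an assumed toy example (uniform ω, a four-cell partition, and an arbitrary event A), also checking the identity ∫_C P(A|G) dP = P(A ∩ C) verified above:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy sample space: omega ~ Uniform(0,1); partition B_i = [i/4, (i+1)/4), i = 0..3,
# generating the sigma-field G; event A = {omega : sin(10*omega) > 0}.
omega = rng.uniform(0.0, 1.0, size=1_000_000)
bins = np.floor(omega * 4).astype(int)           # index i of the cell B_i containing omega
A = np.sin(10.0 * omega) > 0

# P(A|G)(omega) = P(A & B_i)/P(B_i) on B_i: a G-measurable random variable.
cond_prob = np.array([np.mean(A & (bins == i)) / np.mean(bins == i) for i in range(4)])
P_A_given_G = cond_prob[bins]

# Defining property: the integral of P(A|G) over any C in G equals P(A & C).
C = (bins == 1) | (bins == 3)                    # C = B_1 U B_3, an element of G
print("int_C P(A|G) dP =", np.mean(P_A_given_G * C))
print("P(A & C)        =", np.mean(A & C))       # the two agree up to Monte Carlo error
```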
Digression: Projection theorem:

Let H be a Hilbert space over the reals, i.e., a vector space with an inner product ⟨·,·⟩ such that the associated norm x ∈ H ↦ ‖x‖ := √⟨x, x⟩ is complete.

Theorem Given a closed subspace G of H and x ∈ H, there exists a unique x* ∈ G such that

‖x − x*‖ = min_{y∈G} ‖x − y‖.

Proof Let c := inf_{y∈G} ‖x − y‖ and x_n ∈ G, n ≥ 1, such that ‖x − x_n‖ ↓ c. Now,

‖(x_n − x) + (x_m − x)‖² + ‖(x_n − x) − (x_m − x)‖² = 2‖x_n − x‖² + 2‖x_m − x‖²

⟹ ‖x_n − x_m‖² = 2‖x_n − x‖² + 2‖x_m − x‖² − 4‖(x_n + x_m)/2 − x‖²
≤ 2‖x_n − x‖² + 2‖x_m − x‖² − 4c²
→_{n,m↑∞} 2c² + 2c² − 4c² = 0.

Then {x_n} is Cauchy and therefore x_n → some x̌ ∈ G. It follows that ‖x − x̌‖ = c. Set x* = x̌.

Corollary For x, x* as above, ⟨x − x*, y − x*⟩ = 0 ∀ y ∈ G.

Proof If not, either y − x* or 2x* − y makes a strictly acute angle with x − x*, say the former. Then for points sufficiently close to x* on the line segment joining y and x*, the distance from x is strictly less than ‖x − x*‖, a contradiction.
Let H = L²(Ω, F, P), i.e., the space of random variables X on (Ω, F, P) that are square-integrable, i.e., E[X²] < ∞, and let G ⊂ H denote the closed subspace of G-measurable elements of H. For X ∈ H, let X̂ denote the projection of X onto this subspace. By the above corollary,

E[(X − X̂)Y] = E[(X − X̂)(Y − X̂)] − E[(X − X̂)(θ − X̂)] = 0

for Y ∈ G, where θ := the zero vector of H. (Take x = X, x* = X̂ and y = Y, resp. y = θ, in the above.) That is, E[XY] = E[X̂Y].

Letting Y = I_C for some C ∈ G, we have

∫_C X dP = E[X I_C] = E[X̂ I_C] = ∫_C X̂ dP.

Thus we can define E[X|G] := X̂, proving existence.

For a general integrable X ≥ 0, set E[X|G] := lim_{N↑∞} E[X ∧ N | G] (the limit exists a.s. by monotonicity). Letting N ↑ ∞ in the above equality and using the monotone convergence theorem, we have

∫_C X dP = ∫_C E[X|G] dP ∀ C ∈ G.

If Y, Z were two G-measurable random variables satisfying the definition, we have

∫_C Y dP = ∫_C Z dP ∀ C ∈ G ⟹ ∫_C (Y − Z) dP = 0 ∀ C ∈ G.

Taking C := {Y − Z > 0} ∈ G, we have

∫_{{Y−Z>0}} (Y − Z) dP = E[(Y − Z) I{Y − Z > 0}] = 0,

which implies Y ≤ Z a.s. A symmetric argument leads to Y ≥ Z a.s., so Y = Z a.s. ⟹ a.s. uniqueness.
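A Monte Carlo sketch of the projection viewpoint, assuming a toy pair in which Y is discrete (so E[X|Y] is a within-group average); it compares mean squared errors against a few other σ(Y)-measurable predictors and checks the orthogonality relation E[(X − X̂) g(Y)] ≈ 0:

```python
import numpy as np

rng = np.random.default_rng(7)

# Y takes finitely many values; X depends on Y plus independent noise.
n = 1_000_000
Y = rng.integers(0, 5, size=n)                    # conditioning variable (discrete)
X = Y**2 + rng.standard_normal(n)                 # square-integrable X

# E[X|Y] is the sigma(Y)-measurable variable given by within-group means.
group_mean = np.array([X[Y == k].mean() for k in range(5)])
X_hat = group_mean[Y]                             # the projection of X onto L^2(sigma(Y))

def mse(predictor):
    return np.mean((X - predictor) ** 2)

print("E[(X - E[X|Y])^2]      =", mse(X_hat))
print("E[(X - Y^2)^2]         =", mse(Y**2.0))         # close, since E[X|Y] ~ Y^2
print("E[(X - (Y^2 + 0.5))^2] =", mse(Y**2 + 0.5))     # any other g(Y) does worse
print("E[(X - E[X])^2]        =", mse(np.full(n, X.mean())))

# Orthogonality of the residual to sigma(Y)-measurable variables, e.g. g(Y) = Y:
print("E[(X - X_hat) * Y]     =", np.mean((X - X_hat) * Y))
```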
A simpler but far from elementary proof for existence of E[X|G] goes as follows. It uses the Radon–Nikodym theorem: Suppose µ is a positive measure on (Ω, F) and ν a signed measure on (Ω, F) such that ν is absolutely continuous w.r.t. µ, i.e.,

µ(A) = 0, A ∈ F ⟹ ν(A) = 0.

Then there exists an F-measurable function Z : Ω → R such that

∫ f dν = ∫ f Z dµ ∀ F-measurable f.

For A ∈ F, we define P(A|G) := E[I_A|G] and call it the conditional probability of A given G. For the conditional law µ(·|G) of a random variable X given G, we thus have µ(A|G) = P(X ∈ A|G) a.s. and ∫ f(x) µ(dx|G) = E[f(X)|G] a.s. When G = σ(X_α, α ∈ I), we write µ(·|X_α, α ∈ I).

Properties of conditional expectations

Fix a probability space (Ω, F, P) and let G₀ ⊂ G ⊂ F be sub-σ-fields. X, Y, W, Z denote integrable random variables on (Ω, F, P). In what follows, all expectations and conditional expectations are assumed to be well defined.

1. (Monotonicity) X ≤ Y a.s. ⟹ E[X|G] ≤ E[Y|G] a.s.

5. (Conditional Minkowski inequality) Let ‖X‖_p, ‖Y‖_p < ∞ for some p, 1 ≤ p ≤ ∞. Then

E[|X + Y|^p | G]^{1/p} ≤ E[|X|^p | G]^{1/p} + E[|Y|^p | G]^{1/p}.

An arbitrary family of random variables, events, or sub-σ-fields is independent if every finite subfamily thereof is.
Observe that only the first definition is symmetric in F₁ and F₃.

Let A_i ∈ F_i, i = 1, 2. Then

E[I_{A₁} I_{A₂} E[Y | F₁ ∨ F₂]] = E[E[I_{A₁} I_{A₂} Y | F₁ ∨ F₂]]
= E[I_{A₁} I_{A₂} Y]
= E[E[I_{A₁} I_{A₂} Y | F₂]]
= E[I_{A₂} E[I_{A₁} Y | F₂]]
= E[I_{A₂} E[I_{A₁} | F₂] E[Y | F₂]]
= E[E[I_{A₁} I_{A₂} E[Y | F₂] | F₂]]
= E[I_{A₁} I_{A₂} E[Y | F₂]].

The sets C for which E[I_C E[Y | F₁ ∨ F₂]] = E[I_C E[Y | F₂]] holds thus include all sets of the form A₁ ∩ A₂, A_i ∈ F_i, and form a σ-field, hence include all of F₁ ∨ F₂. This implies 2.

Example: {X_n} satisfies the Markov property if ∀ n,

P(X_{n+1} ∈ A₁, ⋯, X_{n+k} ∈ A_k | X_m, m ≤ n) = P(X_{n+1} ∈ A₁, ⋯, X_{n+k} ∈ A_k | X_n)

⟺ the 'future' X_m, m > n, and the 'past' X_m, m < n, are conditionally independent given the present.