
Stochastic Processes and Time Series

Module 2
Markov Chains - II
Dr. Alok Goswami, Professor, Indian Statistical Institute, Kolkata

1 Conditional Probability Results


1.1 Some Basic Results involving Conditional Probabilities:
CP1: $P(A \cap B) = P(A \mid B)\,P(B)$
CP2: More generally, $P(A_1 \cap \cdots \cap A_n) = P(A_n \mid A_1 \cap \cdots \cap A_{n-1}) \cdots P(A_2 \mid A_1)\,P(A_1)$
CP3: $P(A \cap B \mid C) = P(A \mid B \cap C) \cdot P(B \mid C)$
CP4: More generally, $P(A_1 \cap \cdots \cap A_n \mid B) = P(A_1 \mid A_2 \cap \cdots \cap A_n \cap B) \cdots P(A_{n-1} \mid A_n \cap B) \cdot P(A_n \mid B)$
CP5: $P(B) = P(B \mid A_1)P(A_1) + \cdots + P(B \mid A_n)P(A_n)$ for mutually exclusive and collectively exhaustive events $A_1, \ldots, A_n$.
CP6: More generally, for events $A_1, \ldots, A_n$ as above, $P(B \mid C) = P(B \cap A_1 \mid C) + \cdots + P(B \cap A_n \mid C) = P(B \mid A_1 \cap C)\,P(A_1 \mid C) + \cdots + P(B \mid A_n \cap C)\,P(A_n \mid C)$
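To make (CP3) concrete, here is a quick numerical sanity check in Python on a toy sample space; the events A, B, C below are made up purely for illustration.

```python
from fractions import Fraction

# Toy sample space: two rolls of a fair die, with uniform probability.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def P_cond(event, given):
    """Conditional probability P(event | given)."""
    return P(lambda w: event(w) and given(w)) / P(given)

A = lambda w: w[0] + w[1] >= 9   # the two rolls sum to at least 9
B = lambda w: w[0] >= 4          # the first roll is at least 4
C = lambda w: w[1] % 2 == 0      # the second roll is even

# CP3: P(A ∩ B | C) = P(A | B ∩ C) · P(B | C)
lhs = P_cond(lambda w: A(w) and B(w), C)
rhs = P_cond(A, lambda w: B(w) and C(w)) * P_cond(B, C)
assert lhs == rhs
print(lhs, rhs)  # 5/18 5/18
```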

2 Back to Markov Chain:


Let $(X_n)_{n \geq 0}$ be a Markov chain (often abbreviated as MC) with state space $S$ and transition probabilities $\{p_{xy},\ x, y \in S\}$.
Recall: For any $x, y \in S$, $p_{xy}$ represents the conditional probability that the chain will move to state $y$ at the next time point, given that the chain is in state $x$ at the present time point.
What about the probability of moving to a state $y$ after 2 steps?
What is $P(X_{n+2} = y \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x)$?

2.1 2-step transition probabilities:


Using (CP6), (CP3) and the Markov property, the above equals:
$$\sum_{z \in S} P(X_{n+2} = y, X_{n+1} = z \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x)$$
$$= \sum_{z \in S} P(X_{n+2} = y \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x, X_{n+1} = z) \times P(X_{n+1} = z \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x) = \sum_{z \in S} p_{xz}\, p_{zy}$$

3 Markov chains: Some Notations
The quantity $\sum_{z \in S} p_{xz}\, p_{zy}$ obtained above is denoted by $p^{(2)}_{xy}$ and is called the "2-step transition probability" (from state $x$ to state $y$).
In fact, what we have just proved implies:
$$P(X_{n+2} = y \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x) = P(X_{n+2} = y \mid X_n = x) = P(X_2 = y \mid X_0 = x) = p^{(2)}_{xy}$$

3.1 k-step transition probabilities


One can similarly consider k-step transition probabilities, for any $k \geq 2$, and show by induction that
$$P(X_{n+k} = y \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x) = P(X_{n+k} = y \mid X_n = x) = P(X_k = y \mid X_0 = x) = p^{(k)}_{xy},$$
where
$$p^{(k)}_{xy} = \sum_{z \in S} p_{xz}\, p^{(k-1)}_{zy} = \sum_{z \in S} p^{(k-1)}_{xz}\, p_{zy} = \sum\nolimits^{\#} p_{x z_1}\, p_{z_1 z_2} \cdots p_{z_{k-1} y},$$
with the last sum $\sum^{\#}$ being over all $z_1, \ldots, z_{k-1} \in S$.
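As a sanity check on these identities, the following Python sketch compares the recursive formula for $p^{(k)}_{xy}$ with the explicit sum over intermediate paths, using a small hypothetical 3-state transition matrix (the numbers are made up):

```python
from itertools import product

# Hypothetical 3-state transition matrix, for illustration only (rows sum to 1).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
S = range(3)

def k_step_recursive(x, y, k):
    """p^(k)_xy via the recursion p^(k)_xy = sum_z p_xz p^(k-1)_zy."""
    if k == 1:
        return P[x][y]
    return sum(P[x][z] * k_step_recursive(z, y, k - 1) for z in S)

def k_step_paths(x, y, k):
    """p^(k)_xy as the sum over all intermediate paths z1, ..., z_{k-1}."""
    total = 0.0
    for zs in product(S, repeat=k - 1):
        states = (x, *zs, y)
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= P[a][b]   # multiply one-step probabilities along the path
        total += prob
    return total

k = 4
for x in S:
    for y in S:
        assert abs(k_step_recursive(x, y, k) - k_step_paths(x, y, k)) < 1e-12
print(k_step_paths(0, 2, k))
```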

4 Matrix Notations:
4.1 Transition Matrix:
Consider the $S \times S$ matrix $P = ((p_{xy}))_{x, y \in S}$.
$P$: the transition matrix (or, the transition probability matrix).
Properties: (i) all entries are non-negative, (ii) each row sum equals 1.
$$p^{(2)}_{xy} = \sum_z p_{xz}\, p_{zy} = (x, y)\text{th entry of } P^2 \implies P^2 = \big(\big(p^{(2)}_{xy}\big)\big)$$
In general, $P^k = \big(\big(p^{(k)}_{xy}\big)\big)$ for every $k \geq 1$ (writing $p^{(1)}_{xy} = p_{xy}$).
This holds for $k = 0$ as well: $p^{(0)}_{xy} = P(X_0 = y \mid X_0 = x) = \delta_{xy}$, i.e., $P^0$ is the identity matrix.
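In matrix form, these computations reduce to matrix powers. A minimal numpy sketch, using a hypothetical 2-state matrix (the values $\alpha = 0.3$, $\beta = 0.5$ anticipate the On-Off chain of Section 8):

```python
import numpy as np

# Hypothetical transition matrix of an On-Off chain with alpha=0.3, beta=0.5.
alpha, beta = 0.3, 0.5
P = np.array([[1 - alpha, alpha],
              [beta,      1 - beta]])

# Property check: non-negative entries, every row sums to 1.
assert (P >= 0).all() and np.allclose(P.sum(axis=1), 1.0)

P2 = P @ P                            # ((p^(2)_xy))
Pk = np.linalg.matrix_power(P, 5)     # ((p^(k)_xy)) for k = 5
P0 = np.linalg.matrix_power(P, 0)     # identity: p^(0)_xy = delta_xy
print(P2, Pk, P0, sep="\n")
```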

4.2 Chapman-Kolmogorov Equation


From the trivial identity $P^{m+n} = P^m \cdot P^n$, one easily gets:
C-K Equations: $p^{(m+n)}_{xy} = \sum_{z \in S} p^{(m)}_{xz}\, p^{(n)}_{zy}$, for $x, y \in S$ and $m, n \geq 1$.
($x \to y$ in $m+n$ steps $\equiv$ $x \to$ some $z$ in $m$ steps, then $z \to y$ in $n$ steps)
We introduce two very important notations:
For any event $A$ involving $(X_n)_{n \geq 0}$, $P_x(A) := P(A \mid X_0 = x)$. For any real random variable $Z$ determined by $(X_n)_{n \geq 0}$, $E_x(Z) := E(Z \mid X_0 = x)$. Thus, for each $x \in S$, $P_x$ denotes the (conditional) probability and $E_x$ denotes the (conditional) expectation, given that the chain starts at state $x$.

Clearly, in this notation,
$$p_{xy} = P_x(X_1 = y) = E_x\big(I_{\{X_1 = y\}}\big) \qquad \text{and} \qquad p^{(k)}_{xy} = P_x(X_k = y) = E_x\big(I_{\{X_k = y\}}\big)$$
(Here, $I_{(\cdot)}$ denotes the usual indicator random variable.)

5 Some Facts Left as Exercises:


Using the Markov property (and also time-homogeneity), one can show the following (treat these as exercises!):
$$P(X_{n+1} = y_1, \ldots, X_{n+k} = y_k \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x) = P_x(X_1 = y_1, \ldots, X_k = y_k) = p_{x y_1}\, p_{y_1 y_2} \cdots p_{y_{k-1} y_k}$$

For any $B \subseteq S^k$ and for any $f : S^k \to \mathbb{R}$,
$$P\big((X_{n+1}, \ldots, X_{n+k}) \in B \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x\big) = P_x\big((X_1, \ldots, X_k) \in B\big) = \sum_{(y_1, \ldots, y_k) \in B} p_{x y_1}\, p_{y_1 y_2} \cdots p_{y_{k-1} y_k}$$
and
$$E\big(f(X_{n+1}, \ldots, X_{n+k}) \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x\big) = E_x\big(f(X_1, \ldots, X_k)\big) = \sum_{(y_1, \ldots, y_k)} f(y_1, \ldots, y_k)\, p_{x y_1}\, p_{y_1 y_2} \cdots p_{y_{k-1} y_k}$$

The last two equalities dealt with the "finite" future $(X_{n+1}, \ldots, X_{n+k})$. But the same holds for the "infinite" future as well. For any $B \subseteq S^{\infty}$ and for any $f : S^{\infty} \to \mathbb{R}$,
$$P\big((X_{n+1}, X_{n+2}, \ldots) \in B \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x\big) = P_x\big((X_1, X_2, \ldots) \in B\big)$$
and
$$E\big(f(X_{n+1}, X_{n+2}, \ldots) \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = x\big) = E_x\big(f(X_1, X_2, \ldots)\big)$$
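As a plausibility check of the first identity above, one can estimate a path probability by simulation and compare it with the product $p_{x y_1} p_{y_1 y_2} \cdots p_{y_{k-1} y_k}$. A Python sketch with a hypothetical On-Off chain ($\alpha = 0.3$, $\beta = 0.5$), started at $x = 0$:

```python
import random

# Hypothetical 2-state On-Off chain with alpha = 0.3, beta = 0.5.
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}

def step(x):
    """Draw the next state from row x of the transition matrix."""
    return 0 if random.random() < P[x][0] else 1

x, path = 0, (1, 1, 0)   # start at 0; target path for (X1, X2, X3)
exact = P[x][path[0]] * P[path[0]][path[1]] * P[path[1]][path[2]]

random.seed(42)
trials, hits = 200_000, 0
for _ in range(trials):
    state, ok = x, True
    for target in path:
        state = step(state)
        if state != target:
            ok = False
            break
    hits += ok
print(f"exact = {exact:.4f}, simulated = {hits / trials:.4f}")  # both near 0.075
```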

Upshot: Given the past up to a time n and the present state x, the conditional future evolution of the chain from time n+1 behaves in exactly the same way as the evolution of the chain from time 1, conditioned on its starting at the state x; and the corresponding conditional probabilities and expectations are all completely determined by the transition probabilities of the MC.

6 Initial Distribution
Conditional probabilities and expectations related to future evolution, given the history up to any time point, reduce to $P_x$ and $E_x$, which are, in turn, determined by the transition probabilities.
To get unconditional probabilities and expectations, we need to specify the distribution of the initial state $X_0$ and use:
$$P(A) = \sum_{x \in S} P(A \mid X_0 = x)\, P(X_0 = x), \qquad E(Z) = \sum_{x \in S} E(Z \mid X_0 = x)\, P(X_0 = x)$$
Denote the distribution of $X_0$ by $\mu$: $\mu_x = P(X_0 = x)$, $x \in S$. Writing $P_\mu$ and $E_\mu$ respectively for the (unconditional) probabilities and expectations arising out of $\mu$, the above relations reduce to:
$$P_\mu(A) = \sum_{x \in S} \mu_x\, P_x(A), \qquad E_\mu(Z) = \sum_{x \in S} \mu_x\, E_x(Z)$$
The distribution of a MC is completely specified by the Initial Distribution $(\mu_x,\ x \in S)$ and the Transition Probabilities $(p_{xy},\ x, y \in S)$.

7 Distribution of the MC
Joint distribution of $(X_1, \ldots, X_n)$:
$$P_\mu(X_1 = x_1, \ldots, X_n = x_n) = \sum_{x \in S} \mu_x\, p_{x x_1} \cdots p_{x_{n-1} x_n}$$
Marginal distribution of $X_n$: $P_\mu(X_n = x) = \sum_{z \in S} \mu_z\, p^{(n)}_{zx}$
Denote the distribution of $X_n$ by $\mu^{(n)} = \{\mu^{(n)}_x, x \in S\}$. The above formula says (in matrix notation): $\mu^{(n)} = \mu \cdot P^n$, where $\mu$ and $\mu^{(n)}$ are both viewed as $1 \times S$ (row) vectors. Thus, probabilities of all finite-dimensional events are explicitly given in terms of the initial distribution $\mu$ and the transition matrix $P$. But the really important questions for a MC centre around understanding its long-term characteristics, and these involve Infinite-Dimensional Events!!
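As a minimal numerical illustration of $\mu^{(n)} = \mu P^n$ (hypothetical 2-state chain, made-up initial distribution):

```python
import numpy as np

# Hypothetical On-Off chain (alpha = 0.3, beta = 0.5), initial mu = (0.9, 0.1).
alpha, beta = 0.3, 0.5
P = np.array([[1 - alpha, alpha],
              [beta,      1 - beta]])
mu = np.array([0.9, 0.1])   # row vector: P(X0 = 0), P(X0 = 1)

for n in [1, 2, 5, 10, 50]:
    mu_n = mu @ np.linalg.matrix_power(P, n)   # mu^(n) = mu * P^n
    print(n, mu_n)
# The output drifts towards (0.625, 0.375) = (beta, alpha)/(alpha + beta),
# foreshadowing the limiting distribution computed in Section 8.
```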
What are these questions? How does one answer them?

8 On-Off Chain: Detailed Analysis


A complete analysis of the On-Off Chain will help us identify at least one important question for a general MC.

State space: S = {0, 1}


Transition probabilities: $p_{01} = \alpha = 1 - p_{00}$, $p_{10} = \beta = 1 - p_{11}$.
Let $P(X_0 = 0) = \mu_0$, $P(X_0 = 1) = \mu_1\ (= 1 - \mu_0)$.

Denote $\mu^{(n)}_0 = P(X_n = 0)$, $\mu^{(n)}_1 = P(X_n = 1) = 1 - \mu^{(n)}_0$.
Recurrence relation between the successive $\mu^{(n)}_0$:
$$\mu^{(n)}_0 = \mu^{(n-1)}_0 \cdot p_{00} + \mu^{(n-1)}_1 \cdot p_{10} = \beta + (1 - \alpha - \beta)\, \mu^{(n-1)}_0$$

Deduce:
$$\mu^{(n)}_0 = \beta \sum_{k=0}^{n-1} (1 - \alpha - \beta)^k + (1 - \alpha - \beta)^n\, \mu_0.$$

Assume: $\alpha + \beta > 0$. [This just rules out $\alpha = \beta = 0$, so OK (why?)]
Get a simple formula for the distribution of $X_n$:
$$\mu^{(n)}_0 = \frac{\beta}{\alpha+\beta} - (1 - \alpha - \beta)^n \left( \frac{\beta}{\alpha+\beta} - \mu_0 \right), \qquad \mu^{(n)}_1 = 1 - \mu^{(n)}_0$$

All three main parameters $\mu_0$, $\alpha$ and $\beta$ are involved, as expected!
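A quick numerical check (not part of the notes) that the closed form agrees with the recurrence, for made-up values of $\alpha$, $\beta$, $\mu_0$:

```python
# Hypothetical parameters for illustration.
alpha, beta, mu0 = 0.3, 0.5, 0.9

def closed_form(n):
    """Closed-form mu_0^(n) derived above."""
    r = 1 - alpha - beta
    return beta / (alpha + beta) - r**n * (beta / (alpha + beta) - mu0)

m = mu0
for n in range(1, 21):
    m = beta + (1 - alpha - beta) * m   # one recurrence step
    assert abs(m - closed_form(n)) < 1e-12
print(m, closed_form(20))  # both approach beta/(alpha+beta) = 0.625
```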

We ask: If n → ∞, is there a limit? If so, what?

Assume: $\alpha + \beta < 2$. [Rules out $\alpha = \beta = 1$, again OK (why?)]
$$0 < \alpha + \beta < 2 \implies |1 - \alpha - \beta| < 1 \implies (1 - \alpha - \beta)^n \to 0$$
We conclude (except in the cases $\alpha = \beta = 0$ and $\alpha = \beta = 1$):
$$\lim_{n \to \infty} \mu^{(n)}_0 = \frac{\beta}{\alpha+\beta} = \pi_0 \text{ (say)} \qquad \text{and} \qquad \lim_{n \to \infty} \mu^{(n)}_1 = 1 - \pi_0 = \frac{\alpha}{\alpha+\beta}$$

8.1 What are the main takeaways from the above analysis?
The chain has a unique limiting distribution. The limit depends only on $\alpha$ and $\beta$, but not on $\mu$. (The initial distribution often has little or no effect in the long run!) The limit distribution $\pi = \left\{\pi_0 = \frac{\beta}{\alpha+\beta},\ \pi_1 = \frac{\alpha}{\alpha+\beta}\right\}$ has the property:
if $\mu = \pi$, then $\mu^{(n)} = \pi$ for all $n$. ($X_0 \sim \pi \Rightarrow X_n \sim \pi\ \forall\, n$)
(First prove $\pi P = \pi$, then conclude $\pi P^n = \pi$ for all $n$.)
This property is expressed by saying: $\pi$ is a stationary distribution.
$\pi = \left( \frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta} \right)$ is the only stationary distribution.
(It is the only probability on $S = \{0, 1\}$ satisfying $\pi P = \pi$.)
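A short numpy sketch verifying both claims for hypothetical $\alpha = 0.3$, $\beta = 0.5$: that $\pi P = \pi$ (hence $\pi P^n = \pi$), and that $\pi$ is the unique probability on $S = \{0, 1\}$ solving $\pi P = \pi$:

```python
import numpy as np

# Hypothetical On-Off chain with alpha = 0.3, beta = 0.5.
alpha, beta = 0.3, 0.5
P = np.array([[1 - alpha, alpha],
              [beta,      1 - beta]])

pi = np.array([beta, alpha]) / (alpha + beta)
assert np.allclose(pi @ P, pi)                              # pi is stationary
assert np.allclose(pi @ np.linalg.matrix_power(P, 7), pi)   # hence pi P^n = pi

# Uniqueness: solve pi P = pi together with pi_0 + pi_1 = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
sol, *_ = np.linalg.lstsq(A, b, rcond=None)
print(sol)   # -> [0.625, 0.375], i.e. (beta, alpha) / (alpha + beta)
```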

To sum up: The On-Off Chain has, irrespective of the initial distribution, a unique limiting
distribution, which is also a stationary distribution for the chain and is the only one.

• Do the Same Results hold for general Markov Chains?

• Quite expectedly, the answer is: NO!!

• Next Natural Question: How to go about understanding the long-term behaviour of a general MC?

• Distinct Advantage for the On-Off chain: a simple explicit formula for $\mu^{(n)}$.

• Such explicit computations are NOT possible for a general MC.

• Will have to employ different ideas that do NOT depend on such explicit computations.
