Stochastic Processes and Time Series
Module 2
                                   Markov Chains - II
             Dr. Alok Goswami, Professor, Indian Statistical Institute, Kolkata
1     Conditional Probability Results
1.1   Some Basic Results involving Conditional Probabilities:
CP1: P(A ∩ B) = P(A | B) P(B)
CP2: More generally, P(A1 ∩ · · · ∩ An) = P(An | A1 ∩ · · · ∩ An−1) · · · P(A2 | A1) · P(A1)
CP3: P(A ∩ B | C) = P(A | B ∩ C) · P(B | C)
CP4: More generally, P(A1 ∩ · · · ∩ An | B) = P(A1 | A2 ∩ · · · ∩ An ∩ B) · · · P(An−1 | An ∩ B) · P(An | B)
CP5: P(B) = P(B | A1) P(A1) + · · · + P(B | An) P(An), for mutually exclusive and collectively exhaustive events A1, . . . , An.
CP6: More generally, for events A1, . . . , An as above, P(B | C) = P(B ∩ A1 | C) + · · · + P(B ∩ An | C) = P(B | A1 ∩ C) P(A1 | C) + · · · + P(B | An ∩ C) P(An | C)
2     Back to Markov Chain:
Let (Xn )n≥0 be a Markov chain (to be often abbreviated as MC) with state space S and transition
probabilities {pxy , x, y ∈ S}.
Recall: For any x, y ∈ S, pxy is the conditional probability that the chain moves to state y at the next time point, given that it is in state x at the present time point.
What about probability of moving to a state y after 2 steps?
What is P(Xn+2 = y | X0 = x0 , . . . , Xn−1 = xn−1 , Xn = x)?
2.1   2-step transition probabilities:
Using (CP6), (CP3) and the Markov property, the above equals:

    Σ_{z∈S} P(Xn+2 = y, Xn+1 = z | X0 = x0, . . . , Xn−1 = xn−1, Xn = x)

    = Σ_{z∈S} P(Xn+2 = y | X0 = x0, . . . , Xn−1 = xn−1, Xn = x, Xn+1 = z) × P(Xn+1 = z | X0 = x0, . . . , Xn−1 = xn−1, Xn = x)

    = Σ_{z∈S} pxz pzy
3     Markov chains: Some Notations
The quantity Σ_{z∈S} pxz pzy obtained above is denoted by pxy^(2) and is called the “2-step transition probability” (from state x to state y).
In fact, what we have just proved implies:

    P(Xn+2 = y | X0 = x0, . . . , Xn−1 = xn−1, Xn = x) = P(Xn+2 = y | Xn = x) = P(X2 = y | X0 = x) = pxy^(2)
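The 2-step formula is easy to check numerically. A minimal sketch in Python, on a 2-state On-Off chain; the values α = 0.3, β = 0.5 are illustrative assumptions, not fixed by the text:

```python
# 2-step transition probabilities for a 2-state On-Off chain.
# alpha and beta are illustrative values, not from the text.
alpha, beta = 0.3, 0.5
S = [0, 1]
p = {(0, 0): 1 - alpha, (0, 1): alpha,
     (1, 0): beta,      (1, 1): 1 - beta}

def p2(x, y):
    """p_xy^(2): sum over the intermediate state z at time n+1."""
    return sum(p[(x, z)] * p[(z, y)] for z in S)

# Rows of 2-step probabilities still sum to 1.
for x in S:
    assert abs(sum(p2(x, y) for y in S) - 1.0) < 1e-12
```

Here p2(0, 1) = p01 p11 + p00 p01 = 0.36 for the assumed values, illustrating how the two one-step routes through z = 0 and z = 1 add up.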
3.1    k-step transition probabilities
One can similarly consider k-step transition probabilities, for any k ≥ 2, and show by induction that,

    P(Xn+k = y | X0 = x0, . . . , Xn−1 = xn−1, Xn = x) = P(Xn+k = y | Xn = x) = P(Xk = y | X0 = x) = pxy^(k),

where

    pxy^(k) = Σ_{z∈S} pxz pzy^(k−1) = Σ_{z∈S} pxz^(k−1) pzy = Σ# pxz1 pz1z2 · · · pzk−1y,

with the last sum Σ# above being over all z1, . . . , zk−1 ∈ S.
4     Matrix Notations:
4.1    Transition Matrix:
Consider the S × S matrix: P = ((pxy ))x,y∈S
P : transition matrix (or, the transition probability matrix).
Properties: (i) all entries non-negative, (ii) each row-sum = 1
pxy^(2) = Σ_{z∈S} pxz pzy = (x, y)th entry of P^2 =⇒ P^2 = (( pxy^(2) ))
In general, P^k = (( pxy^(k) )), for every k ≥ 1 (write pxy^(1) = pxy).
    Holds for k = 0 also: pxy^(0) = P(X0 = y | X0 = x) = δxy.
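The identity P^k = (( pxy^(k) )) says that k-step probabilities come from repeated matrix multiplication, with P^0 the identity. A small pure-Python sketch, on an assumed 2×2 transition matrix:

```python
def mat_mult(A, B):
    """Multiply square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, k):
    """k-step transition matrix P^k; P^0 is the identity ((delta_xy))."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = mat_mult(R, P)
    return R

P = [[0.7, 0.3],    # assumed transition matrix: rows sum to 1
     [0.5, 0.5]]

assert mat_pow(P, 0) == [[1.0, 0.0], [0.0, 1.0]]   # P^0 = I, i.e. delta_xy
assert all(abs(sum(row) - 1.0) < 1e-12 for row in mat_pow(P, 5))
```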
4.2    Chapman-Kolmogorov Equation
From the trivial identity P m+n = P m · P n , one easily gets:
    C-K Equations: pxy^(m+n) = Σ_{z∈S} pxz^(m) pzy^(n), for x, y ∈ S, m, n ≥ 1
    (x → y in m+n steps ≡ x → some z in m steps, z → y in n steps)
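The C-K equations can be verified directly on a small example: computing 3-step probabilities in two ways (m = 1, n = 2) must agree. A sketch with an assumed 2-state transition matrix:

```python
# Chapman-Kolmogorov on a 2-state chain: p_xy^(m+n) = sum_z p_xz^(m) p_zy^(n).
# The transition matrix below is an illustrative assumption.
P = [[0.7, 0.3],
     [0.5, 0.5]]
S = range(2)

def times_P(Q):
    """One more step: (Q P)_xy = sum_z Q_xz P_zy."""
    return [[sum(Q[x][z] * P[z][y] for z in S) for y in S] for x in S]

P2 = times_P(P)    # 2-step probabilities
P3 = times_P(P2)   # 3-step probabilities

# C-K with m = 1, n = 2: go x -> some z in one step, then z -> y in two.
for x in S:
    for y in S:
        assert abs(P3[x][y] - sum(P[x][z] * P2[z][y] for z in S)) < 1e-9
```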
We introduce two very important notations:
For any event A involving (Xn )n≥0 , Px (A) := P(A | X0 = x). For any real random variable
Z determined by (Xn )n≥0 , Ex (Z) := E(Z | X0 = x). Thus, for each x ∈ S, Px denotes the
(conditional) probability and Ex denotes the (conditional) expectation, given that the chain starts at state x.
    Clearly, in this notation,
                                                                                   
    pxy = Px(X1 = y) = Ex( I{X1=y} )  and  pxy^(k) = Px(Xk = y) = Ex( I{Xk=y} )
(Here, I(·) denotes the usual indicator random variable.)
5    Some Facts Left as Exercises:
Using the Markov property (and also time-homogeneity), one can show the following (treat these as exercises!):
    P(Xn+1 = y1, . . . , Xn+k = yk | X0 = x0, . . . , Xn−1 = xn−1, Xn = x)
    = Px(X1 = y1, . . . , Xk = yk) = pxy1 py1y2 · · · pyk−1yk

For any B ⊆ S^k and for any f : S^k → R,

    P( (Xn+1, . . . , Xn+k) ∈ B | X0 = x0, . . . , Xn−1 = xn−1, Xn = x)
    = Px( (X1, . . . , Xk) ∈ B) = Σ_{(y1,...,yk)∈B} pxy1 py1y2 · · · pyk−1yk

and, E( f(Xn+1, . . . , Xn+k) | X0 = x0, . . . , Xn−1 = xn−1, Xn = x)
    = Ex( f(X1, . . . , Xk) ) = Σ_{(y1,...,yk)∈S^k} f(y1, . . . , yk) pxy1 py1y2 · · · pyk−1yk
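The product formula for path probabilities is straightforward to code. A sketch (the 2-state transition values are illustrative assumptions); summing over all length-k trajectories recovers total probability 1, as CP5 predicts:

```python
from itertools import product

# Probability of an exact trajectory (y1, ..., yk) starting from x:
# p_{x y1} p_{y1 y2} ... p_{y_{k-1} yk}.  Transition values are assumed.
P = {(0, 0): 0.7, (0, 1): 0.3,
     (1, 0): 0.5, (1, 1): 0.5}
S = [0, 1]

def path_prob(x, path):
    """Multiply one-step transition probabilities along the trajectory."""
    prob, prev = 1.0, x
    for y in path:
        prob *= P[(prev, y)]
        prev = y
    return prob

# Summing over all length-3 trajectories exhausts the probability.
total = sum(path_prob(0, path) for path in product(S, repeat=3))
assert abs(total - 1.0) < 1e-12
```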
The last two equalities dealt with the “finite” future (Xn+1, . . . , Xn+k). But the same holds for the “infinite” future as well. For any B ⊆ S^∞ and for any f : S^∞ → R,

    P( (Xn+1, Xn+2, . . .) ∈ B | X0 = x0, . . . , Xn−1 = xn−1, Xn = x) = Px( (X1, X2, . . .) ∈ B)

and, E( f(Xn+1, Xn+2, . . .) | X0 = x0, . . . , Xn−1 = xn−1, Xn = x) = Ex( f(X1, X2, . . .) )
Upshot: Given the past up to a time n and the present state x, the conditional future evolution of the chain from time (n+1) behaves in exactly the same way as the evolution of the chain from time 1, conditioned on its starting at the state x, and the corresponding conditional probabilities and expectations are all completely determined by the transition probabilities of the MC.
6    Initial Distribution
Conditional probabilities and expectations related to future evolution, given the history up to any time point, reduce to Px and Ex, which are, in turn, determined by the transition probabilities.
To get unconditional probabilities and expectations, we need to specify the distribution of the initial state X0 and use:

    P(A) = Σ_{x∈S} P(A | X0 = x) P(X0 = x),    E(Z) = Σ_{x∈S} E(Z | X0 = x) P(X0 = x)
Denote the distribution of X0 by µ: µx = P(X0 = x), x ∈ S. Writing Pµ and Eµ respectively for the (unconditional) probabilities and expectations arising out of µ, the above relations reduce to:

    Pµ(A) = Σ_{x∈S} µx Px(A),    Eµ(Z) = Σ_{x∈S} µx Ex(Z)
Distribution of a MC is completely specified by Initial Distribution (µx , x ∈ S) and the
Transition Probabilities (pxy , x, y ∈ S)
7      Distribution of the MC
Joint distribution of (X1 , . . . , Xn ):
    Pµ(X1 = x1, . . . , Xn = xn) = Σ_{x∈S} µx pxx1 · · · pxn−1xn

Marginal distribution of Xn: Pµ(Xn = x) = Σ_{z∈S} µz pzx^(n)
Denote the distribution of Xn as: µ^(n) = {µx^(n), x ∈ S}. The above formula says (in matrix notation): µ^(n) = µ · P^n, where µ and µ^(n) are both viewed as 1 × S (row) vectors. Thus, probabilities of all finite-dimensional events are explicitly given in terms of the initial distribution µ and the transition matrix P. But the really important questions for a MC centre around understanding its long-term characteristics, and these involve Infinite-Dimensional Events!!
What are these questions? How does one answer them?
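The relation µ^(n) = µ · P^n can be sketched as repeated application of the one-step map µ ↦ µP. The transition matrix and initial distribution below are illustrative assumptions:

```python
# Marginal distribution of X_n: mu^(n) = mu . P^n, with mu a row vector.
# The transition matrix and initial distribution are illustrative.
P = [[0.7, 0.3],
     [0.5, 0.5]]
mu = [0.2, 0.8]

def one_step(dist):
    """mu^(n) = mu^(n-1) . P, i.e. dist_x = sum_z dist_z p_zx."""
    return [sum(dist[z] * P[z][x] for z in range(len(P)))
            for x in range(len(P))]

dist = mu
for _ in range(10):     # compute mu^(10)
    dist = one_step(dist)

assert abs(sum(dist) - 1.0) < 1e-12    # still a probability distribution
```

For this assumed chain, ten steps already bring the distribution very close to (0.625, 0.375), a first glimpse of the long-term behaviour studied next.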
8      On-Off Chain: Detailed Analysis
A complete analysis of the On-Off Chain will help us identify at least one important question for
general MC.
State space: S = {0, 1}
Transition probabilities: p01 = α = 1−p00 , p10 = β = 1−p11 .
Let, P(X0 = 0) = µ0 , P(X0 = 1) = µ1 (= 1 − µ0 ).
Denote µ0^(n) = P(Xn = 0), µ1^(n) = P(Xn = 1) = 1 − µ0^(n).
Recurrence relation between the successive µ0^(n):

    µ0^(n) = µ0^(n−1) · p00 + µ1^(n−1) · p10 = β + (1 − α − β) · µ0^(n−1)
Deduce:

    µ0^(n) = β Σ_{k=0}^{n−1} (1 − α − β)^k + (1 − α − β)^n · µ0.
Assume: α + β > 0. [Just rules out α = β = 0, so OK (why?)]
                                                                                      
Get a simple formula for the distribution of Xn:

    µ0^(n) = β/(α+β) − (1 − α − β)^n ( β/(α+β) − µ0 ),    µ1^(n) = 1 − µ0^(n)
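One can check the closed form against direct iteration of the recurrence µ0^(n) = β + (1 − α − β) µ0^(n−1). A sketch with assumed values of α, β and µ0:

```python
# Closed form vs. iterating the recurrence mu0^(n) = beta + (1-alpha-beta) mu0^(n-1).
# alpha, beta, mu0 are illustrative values with alpha + beta > 0.
alpha, beta, mu0 = 0.3, 0.5, 0.9

def mu0_iter(n):
    """Apply the recurrence n times starting from mu0^(0) = mu0."""
    m = mu0
    for _ in range(n):
        m = beta + (1 - alpha - beta) * m
    return m

def mu0_closed(n):
    """Closed form: beta/(alpha+beta) - (1-alpha-beta)^n (beta/(alpha+beta) - mu0)."""
    pi0 = beta / (alpha + beta)
    return pi0 - (1 - alpha - beta) ** n * (pi0 - mu0)

# The two agree for every n.
for n in range(20):
    assert abs(mu0_iter(n) - mu0_closed(n)) < 1e-12
```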
All three main parameters µ0, α and β are involved, as expected!
We ask: If n → ∞, is there a limit? If so, what?
Assume: α + β < 2. [Rules out α = β = 1, again OK (why?)]
0 < α + β < 2 ⇒ |1 − α − β| < 1 ⇒ (1 − α − β)n → 0
We conclude (except in case α = β = 0 or α = β = 1):

    lim_{n→∞} µ0^(n) = β/(α+β) = π0 (say)  and  lim_{n→∞} µ1^(n) = 1 − π0 = α/(α+β)
8.1     What are the main takeaways from the above analysis?
The chain has a unique limiting distribution. The limit depends only on α and β, but not on µ. (The initial distribution often has little or no effect in the long run!) The limit distribution π = {π0 = β/(α+β), π1 = α/(α+β)} has the property:
if µ = π, then µ^(n) = π for all n. (X0 ∼ π ⇒ Xn ∼ π ∀ n)
(First prove: πP = π, then conclude πP^n = π for all n.)
This property is expressed as: π is a stationary distribution.
                 
π = ( β/(α+β), α/(α+β) ) is the only stationary distribution.
(It is the only probability on S = {0, 1} satisfying πP = π.)
To sum up: The On-Off Chain has, irrespective of the initial distribution, a unique limiting
distribution, which is also a stationary distribution for the chain and is the only one.
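Both takeaways are easy to confirm numerically for the On-Off chain: πP = π, and an arbitrary initial distribution converges to π. A sketch with illustrative values α = 0.3, β = 0.5:

```python
# Stationary distribution of the On-Off chain: pi = (beta/(alpha+beta), alpha/(alpha+beta)).
# alpha and beta are illustrative values with 0 < alpha + beta < 2.
alpha, beta = 0.3, 0.5
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]
pi = [beta / (alpha + beta), alpha / (alpha + beta)]

def one_step(dist):
    """dist . P, one step of the chain on distributions."""
    return [sum(dist[z] * P[z][x] for z in range(2)) for x in range(2)]

# pi P = pi: one step leaves pi unchanged.
assert all(abs(a - b) < 1e-12 for a, b in zip(one_step(pi), pi))

# Any initial distribution converges to pi, since |1 - alpha - beta| < 1.
dist = [1.0, 0.0]
for _ in range(60):
    dist = one_step(dist)
assert all(abs(a - b) < 1e-9 for a, b in zip(dist, pi))
```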
   • Do the Same Results hold for general Markov Chains?
   • Quite expectedly, the answer is : NO!!
   • Next Natural Question: How to go about understanding long-term behaviour of a general
     MC?
   • Distinct Advantage for On-Off chain: Simple explicit formula for µ(n) .
   • Such explicit computations NOT possible for general MC.
   • Will have to employ different ideas, that do NOT depend on such explicit computations.