
Stochastic Processes

CO905 - Complexity Science

Stefan Grosskinsky

Warwick, 2008

These notes and other information about the course are available on
http://www.warwick.ac.uk/~masgav/teaching/co905.html

Contents

Introduction

1 Markov chains
  1.1 General properties
  1.2 Discrete time
  1.3 Continuous time
  1.4 Countably infinite state spaces

2 Processes with continuous state space
  2.1 Brownian motion and the Central limit theorem
  2.2 General facts
  2.3 Brownian motion and the heat equation
  2.4 Diffusion processes and Fokker-Planck equations

3 Martingales
  3.1 Filtrations and adapted processes
  3.2 Properties of martingales

4 Stochastic calculus
  4.1 Diffusion processes and SDEs
  4.2 Semimartingales
  4.3 Stochastic integration and Itô calculus
  4.4 Solutions to SDEs

References
[GS] G. Grimmett and D. Stirzaker: Probability and Random Processes (3rd edition), Oxford University Press, 2001.
[Ga] C.W. Gardiner: Handbook of Stochastic Methods (3rd edition), Springer, 2004.

Introduction

In this module we will cover the basics to study complex systems with stochastic time
evolution. There are two different origins of stochasticity:

• Classical mechanics: stochasticity due to lack of information

In principle all components involved in the system follow a deterministic system of equations of motion. But in practice not all microscopic details are accessible, and the unknown influences on the dynamics are approximated as effective random noise with a certain postulated distribution. The actual origin of the noise may be related to chaotic motion, i.e. deterministic time evolution with random initial data such as a die or a pendulum, or to neglected interactions in a large system such as gases or fluids, leading to a stochastic time evolution.

• Quantum mechanics: inherent stochasticity

Even simple systems can only be described stochastically, and the full microscopic details are inherently inaccessible (uncertainty principle). Mathematically, the state of such a system is therefore given by a complex probability density function (wave function), rather than a single element in the set of all possible configurations.

Examples.

In this course we only cover classical stochastic systems. After a general introduction to stochastic processes we will study some examples of particle systems with thermal interactions. The first and most classical example of this phenomenon is Brownian motion (see Gardiner, Section 1.2). In 1827 Robert Brown observed the irregular motion of small pollen grains suspended in water. A first satisfactory theoretical description of this phenomenon was given by Einstein in 1905. A mathematically idealized version of this is called the Wiener process and can be described by the theory of stochastic calculus which was developed in the 1950s by Itô. Due to the continuous state space of the system this theory is rather involved, and will be discussed towards the end of the module. Simpler to analyse are models with a discrete state space such as birth-death processes, which appear for example in predator-prey models in biology (see Gardiner, Section 1.3). In the first part of the course we concentrate on Markov

chains (following Grimmett and Stirzaker, Chapter 6), which are certain stochastic processes
with discrete state space. We conclude the introductory section by two general definitions.

Definition 0.1 A stochastic process X = (Xt : t ∈ T) is a family of random variables Xt : Ω → S with state space S and time index set T ⊆ R.

A stochastic process X : T × Ω → S is a function of two variables, time t and ω ∈ Ω. For fixed ω, the function t ↦ Xt(ω) is called a sample path. The probability space Ω is arbitrary, but has to be big enough to encode all possible time evolutions. A canonical choice is the set of all sample paths Ω = {f : T → S}, or often one requires some regularity of the functions f, such as continuity.

Definition 0.2 A stochastic process is a Markov process if for all t1 < t2 < . . . < tn ∈ T, n ∈ N, for all s1, . . . , sn−1 ∈ S and all (measurable) A ⊆ S,

P(Xtn ∈ A | Xt1 = s1, . . . , Xtn−1 = sn−1) = P(Xtn ∈ A | Xtn−1 = sn−1) .   (0.1)

A Markov process is called homogeneous if for all (measurable) A, B ⊆ S and t > t′ ∈ T

P(Xt ∈ A | Xt′ ∈ B) = P(Xt−t′ ∈ A | X0 ∈ B) .   (0.2)

A homogeneous Markov process is called a Markov chain if S is discrete.

In this course we will only deal with homogeneous Markov processes. We will concentrate on the choices T = N, Z for discrete time and T = [0, ∞), R for continuous time processes. Typical choices for state spaces are S = Z (e.g. random walk, birth-death processes), N (e.g. counting processes), R^d (e.g. Brownian motion).

Examples.

1 Markov chains
1.1 General properties
Definition 1.1 For a Markov chain we define the transition probabilities

pij(t) := P(Xt = j | X0 = i) ∈ [0, 1] for all i, j ∈ S ,   (1.1)

and the transition 'matrices'

P(t) := (pij(t) : i, j ∈ S) ∈ [0, 1]^|S|×|S| .   (1.2)

A homogeneous Markov chain starting at time t = 0 is uniquely determined by an initial distribution π(0) with πi(0) = P(X0 = i), i ∈ S, and the transition probabilities, because every joint probability can be written as

P(Xt1 ∈ A1, . . . , Xtn ∈ An) = Σ_{i0∈S, i1∈A1, .., in∈An} πi0(0) pi0i1(t1) pi1i2(t2 − t1) · · · pin−1in(tn − tn−1)   (1.3)

for all 0 < t1 < . . . < tn ∈ T and A1, . . . , An ⊆ S. In particular, the distribution at time t is

πj(t) = P(Xt = j) = Σ_{i∈S} πi(0) pij(t) , so π(t) = π(0) P(t) .   (1.4)

Example.


Proposition 1.1 P(0) = Id and the family (P(t) : t ≥ 0) satisfies the Chapman-Kolmogorov equations,

P(t + t′) = P(t) P(t′) for all t, t′, t + t′ ∈ T .   (1.5)

Proof. pij(0) = δij by definition, and for all t, t′, t + t′ ∈ T

pij(t + t′) = P(Xt+t′ = j | X0 = i) = Σ_{k∈S} P(Xt+t′ = j | X0 = i, Xt = k) P(Xt = k | X0 = i) = Σ_{k∈S} pik(t) pkj(t′) ,   (1.6)

using the total probability sum rule, the Markov property and homogeneity. □
For discrete time with T = N this leads to

P(n + 1) = P(1) P(n) = P(n) P(1) ⇒ P(n) = P^n ,   (1.7)

where we denote P = P(1). Therefore a discrete time Markov chain is uniquely determined by the initial distribution π(0) and the transition matrix P.
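For illustration, a minimal numpy sketch (the two-state matrix is an arbitrary example, not from the notes) computing π(n) = π(0) P^n:

```python
import numpy as np

# Two-state chain: an arbitrary stochastic matrix (rows sum to 1)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])   # start in state 0

# Distribution at time n via pi(n) = pi(0) P^n
n = 50
pin = pi0 @ np.linalg.matrix_power(P, n)
print(pin)   # approaches the stationary distribution (0.8, 0.2)
```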

Example.

For continuous time with T = [0, ∞) we require some regularity of the function t ↦ P(t) at t = 0. We only study processes where it is continuous and differentiable, i.e.

lim_{t↘0} P(t) = P(0) = Id and G := lim_{t↘0} (P(t) − Id)/t exists .   (1.8)

Together with the Chapman-Kolmogorov equations this implies that

(P(t + ∆t) − P(t))/∆t = P(t) (P(∆t) − Id)/∆t = (P(∆t) − Id)/∆t P(t) ,   (1.9)

and thus, taking ∆t ↘ 0, P(t) is differentiable for all t ≥ 0 and fulfils

d/dt P(t) = G P(t) = P(t) G ⇒ P(t) = exp(t G) .   (1.10)

For finite state spaces |S| < ∞ the formal solution to this equation is given by a matrix exponential, which is discussed in Section 1.3 in more detail.
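For finite S this is easy to evaluate numerically; a minimal sketch (the generator below is an arbitrary example, and scipy is assumed to be available):

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary generator: off-diagonal entries are jump rates,
# diagonal entries make every row sum to zero
G = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])

t = 0.5
P_t = expm(t * G)            # P(t) = exp(tG)
print(P_t, P_t.sum(axis=1))  # rows of P(t) sum to 1, i.e. P(t) is stochastic
```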

Definition 1.2 A probability distribution π* is called stationary if π* P(t) = π* for all t ≥ 0.

This will play an important role in the long-time behaviour of Markov chains, since 'often' π(t) → π*. How and when this is true will be seen later.

Theorem 1.2 (Existence) A Markov chain with finite state space S has at least one stationary distribution.

Proof. Depends on discrete or continuous time, see later.

In Section 1.4 we will see a generalisation of this for infinite state spaces. The question of uniqueness of stationary distributions is connected to the following definition.

Definition 1.3 State i ∈ S communicates with state j ∈ S if pij(t) > 0 for some t ∈ T, and we write i → j. States i and j are connected if i → j and j → i, and we write i ↔ j.
The Markov chain is called irreducible if i ↔ j for all i, j ∈ S.
A state i is called absorbing if i ↛ j for all j ≠ i.
Examples.

Theorem 1.3 (Uniqueness) An irreducible Markov chain has at most one stationary distribution.

Proof. Depends on discrete or continuous time, see later.

1.2 Discrete time


Since P(n) = P^n, a discrete time Markov chain is uniquely determined by the transition matrix P = P(1) and its initial distribution. So with Definition 1.2, π* is a stationary distribution if and only if π* P = π*, i.e. π* is a left eigenvector with eigenvalue λ = 1.
By definition, P is a stochastic matrix, i.e. pij ∈ [0, 1] and Σ_{j∈S} pij = 1, since the chain starting in i has to jump somewhere in S. So λ = 1 is an eigenvalue of P with right eigenvector 1 = (. . . , 1, 1, . . .)^T. Therefore if S is finite, there exists at least one left eigenvector π*, whose entries can be shown to be non-negative. If the chain is irreducible this eigenvector is unique under the condition Σ_{i∈S} πi* = 1.
Theorem 1.4 A discrete time, finite state irreducible Markov chain has a unique stationary distribution π*, where

πi* = 1/µi with µi := E(Ti | X0 = i) and Ti := min{t ≥ 1 : Xt = i} ,   (1.11)

so that π* is determined by the inverse of the mean recurrence times µi.
Furthermore,

π* = 1 (Id − P + U)^{−1} , where uij = 1 for all i, j ∈ S .   (1.12)

Proof. See Grimmett and Stirzaker pp 229-230 for recurrence times; for uniqueness see the proof of Theorem 1.8.

π* = π* P ⇔ π*(Id − P) = 0 ⇔ π*(Id − P) + 1 = 1 ⇔ π*(Id − P + U) = 1 ,   (1.13)

using π* U = 1. By uniqueness of π* we know that this linear system of equations has a unique solution, which is equivalent to (Id − P + U) being invertible. □
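Formula (1.12) is easy to verify numerically; a minimal sketch, reusing the arbitrary two-state matrix from above:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
n = P.shape[0]
U = np.ones((n, n))     # u_ij = 1
one = np.ones(n)        # row vector 1 = (1, ..., 1)

# pi* = 1 (Id - P + U)^{-1}, cf. (1.12)
pi_star = one @ np.linalg.inv(np.eye(n) - P + U)
print(pi_star)          # (0.8, 0.2)
print(pi_star @ P)      # equals pi_star, i.e. stationary
```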

Example.


Proposition 1.5 Let X = (Xn : n ∈ {0, . . . , N}) be a finite state irreducible Markov chain with transition matrix P^X. Suppose further that X is stationary, i.e. Xn ∼ π* for all n. Then the 'reversed chain' Y with Yn = XN−n is a Markov chain with transition matrix

p^Y_ij = (πj*/πi*) p^X_ji for all i, j ∈ S .   (1.14)

Proof. Using stationarity and the Markov property of X we get

P(Yn+1 = in+1 | Yn = in, . . . , Y0 = i0) = P(Yk = ik, 0 ≤ k ≤ n+1) / P(Yk = ik, 0 ≤ k ≤ n) = P(XN−k = ik, 0 ≤ k ≤ n+1) / P(XN−k = ik, 0 ≤ k ≤ n) = (π*in+1 pin+1in · · · pi1i0) / (π*in pinin−1 · · · pi1i0) = π*in+1 pin+1in / π*in   (1.15)

as required. □

Note that in general a time-reversed Markov chain is not necessarily a Markov chain; this only holds for stationary chains. π* is then also stationary for the reversed chain Y.

Definition 1.4 Let π be a probability distribution on S. A discrete time Markov chain X with transition matrix P is called reversible (w.r.t. π) if it fulfils the detailed balance conditions

πi pij = πj pji for all i, j ∈ S .   (1.16)

Proposition 1.6 Suppose a discrete time Markov chain X is reversible w.r.t. π. Then π is a stationary distribution of X.

Proof. From (1.16) we deduce (πP)j = Σ_{i∈S} πi pij = Σ_{i∈S} πj pji = πj . □

Note that Proposition 1.5 together with (1.16) implies that a reversible Markov chain and its time-reversal are indistinguishable, i.e. they have the same transition probabilities, since

p^Y_ij = (πj*/πi*) p^X_ji = (πi*/πi*) p^X_ij = p^X_ij .   (1.17)

The detailed balance relations (1.16) can be a useful tool to find stationary distributions of certain Markov chains 'without loops'.
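For example, for a birth-death chain (an arbitrary illustrative choice, not from the notes), detailed balance determines π up to normalization by iterating πi+1 = πi pi,i+1/pi+1,i along the chain; a minimal sketch:

```python
import numpy as np

# Arbitrary birth-death chain on {0,...,4}: only jumps to neighbours
p_up = np.array([0.5, 0.4, 0.3, 0.2])    # p_{i,i+1}
p_down = np.array([0.1, 0.2, 0.3, 0.4])  # p_{i+1,i}

# Detailed balance: pi_{i+1} = pi_i * p_{i,i+1} / p_{i+1,i}, cf. (1.16)
pi = np.ones(5)
for i in range(4):
    pi[i + 1] = pi[i] * p_up[i] / p_down[i]
pi /= pi.sum()   # normalize
print(pi)        # stationary distribution of the chain
```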

Example.

Definition 1.5 The period d(i) of a state i ∈ S is defined as

d(i) := gcd{t ≥ 1 : pii(t) > 0} ,   (1.18)

the greatest common divisor of the epochs at which return is possible.

Theorem 1.7 For an irreducible Markov chain all states have the same period, and we call the chain aperiodic if d(i) = 1 for all i ∈ S.

Proof. see p.224 in Grimmett and Stirzaker

Example.

Theorem 1.8 An irreducible, aperiodic Markov chain with finite state space is ergodic, i.e.

pij(n) → πj* as n → ∞ , for all i, j ∈ S .   (1.19)

Proof. The statement follows from the Perron-Frobenius Theorem:
If P is the transition matrix of a finite state irreducible Markov chain with period d then
(i) the d complex roots of unity are eigenvalues of P,

λ1 = w^0 = 1, λ2 = w^1, . . . , λd = w^{d−1} where w = e^{2πi/d} ,   (1.20)

(ii) and the remaining eigenvalues λd+1, . . . , λ|S| satisfy |λj| < 1.
Note that this includes uniqueness of the stationary distribution claimed in Theorem 1.4.
Suppose further that the eigenvalues are all distinct; then B P B^{−1} = (λi δij)ij is a diagonal matrix with entries λ1, . . . , λ|S|, where the rows of B are the left eigenvectors of P. Thus

P^n = B^{−1} diag(λ1^n, . . . , λ|S|^n) B → B^{−1} diag(1, 0, . . . , 0) B   (1.21)

since λ1 = 1 and |λi| < 1 for all i > 1 if the chain is aperiodic.
The proof can be extended to countably infinite S. □

This implies that for every initial distribution

π(n) = π(0) P^n → π* = (1/µ1, . . . , 1/µ|S|) as n → ∞ .   (1.22)
Example.

1.3 Continuous time
Subject to the boundary conditions P(0) = Id, equations (1.10) often have a unique solution

d/dt P(t) = G P(t) = P(t) G ⇒ P(t) = exp(t G) = Σ_{k=0}^∞ (t^k/k!) G^k .   (1.23)

For example this is the case if |S| < ∞, and subject to certain technical conditions also for infinite state space S. Therefore a continuous-time Markov chain is uniquely determined by the initial distribution and the matrix G, which is called the generator of the process.

How does G look and what is the relation to the time evolution of X?

Assume that Xt = i. For small times ∆t we have from (1.23)

pij(∆t) = gij ∆t + o(∆t) for all i ≠ j ∈ S .   (1.24)

So during a time interval (t, t + ∆t) the chain jumps from state i to j with probability gij ∆t, and gij ≥ 0 can be interpreted as a jump rate. On the diagonal we have

pii(∆t) = 1 + gii ∆t + o(∆t) for all i ∈ S ,   (1.25)

which gives the probability that nothing happens in the time interval (t, t + ∆t). By normalization we have

1 = Σ_{j∈S} pij(∆t) = 1 + ∆t Σ_{j∈S} gij ⇒ Σ_{j∈S} gij = 0 for all i ∈ S .   (1.26)

Therefore the diagonal entries of G are

gii = −Σ_{j≠i} gij ≤ 0 for all i ∈ S ,   (1.27)

and |gii| can be interpreted as the total rate to leave state i.

What does really 'happen' in a continuous time Markov chain?

Assume that Xt = i and define the holding time

Wi := inf{t′ ≥ 0 : Xt+t′ ≠ i} ,   (1.28)

i.e. the (random) time until a jump occurs. This is actually independent of t by homogeneity, and if i is absorbing, gij = 0 for all j ∈ S and Wi = ∞.

Proposition 1.9 The random variable Wi is exponentially distributed with parameter |gii|, and if |gii| > 0, the probability that the chain jumps to j ≠ i after time Wi is gij/|gii|.

Proof. Wi has the 'loss of memory' property, i.e. for all s, u > 0

P(Wi > s + u | Wi > s) = P(Wi > s + u | Xt+s = i) = P(Wi > u | Xt = i) = P(Wi > u) ,   (1.29)

where we have used the Markov property and homogeneity. Therefore we have for the tail

F̄(s + u) = P(Wi > s + u) = P(Wi > s + u, Wi > s) = P(Wi > s + u | Wi > s) P(Wi > s) ,   (1.30)

and thus F̄(s + u) = F̄(s) F̄(u) by (1.29).
Analogous to the Chapman-Kolmogorov equations (1.5), this can be used to derive a differential equation for F̄ which has an exponential solution

F̄(s) = P(Wi > s) = e^{λs} where λ = F̄′(0) .   (1.31)

Together with (1.25) we get

F̄′(0) = lim_{∆t↘0} (P(Wi > ∆t) − 1)/∆t = lim_{∆t↘0} (pii(∆t) + o(∆t) − 1)/∆t = gii ≤ 0 ,   (1.32)

and therefore P(Wi > s) = e^{−|gii| s} and Wi ∼ Exp(|gii|).
Now the probability that the chain jumps to j, conditioned on the event that it actually jumps somewhere in the time interval (s, s + ∆t], is given by

P(Xt+s+∆t = j | Xt+s = i, Wi < ∆t) ≈ pij(∆t)/(1 − pii(∆t)) → gij/(−gii) as ∆t ↘ 0 .   (1.33)

Picture. □

The chain jumps at the jump times Jn = Σ_{i=0}^{n−1} W_{Yi} to the states Yn = XJn.
Y = (Yn : n ∈ N) is called the jump chain, and it is a discrete time Markov chain with transition matrix P^Y given by

p^Y_ij = 0 for i = j and p^Y_ij = gij/|gii| for i ≠ j, whenever gii > 0 ; and p^Y_ij = δij if gii = 0 .   (1.34)

So a continuous-time Markov chain can also be characterized by its jump chain Y and a sequence of independent exponentially distributed holding times (W_{Yn} : n ∈ N).
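This characterization translates directly into a simulation algorithm; a minimal sketch with an arbitrary three-state generator (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary generator on 3 states (rows sum to zero)
G = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.5,  0.5, -1.0]])

def simulate(G, i, t_max):
    """Simulate a continuous-time Markov chain via its jump chain
    and exponential holding times, cf. (1.34)."""
    t, path = 0.0, [(0.0, i)]
    while True:
        rate = -G[i, i]                    # total rate |g_ii| to leave i
        if rate == 0:                      # absorbing state
            break
        t += rng.exponential(1.0 / rate)   # W_i ~ Exp(|g_ii|)
        if t > t_max:
            break
        p = G[i].clip(min=0.0) / rate      # jump probabilities g_ij/|g_ii|
        i = rng.choice(len(p), p=p)
        path.append((t, i))
    return path

print(simulate(G, 0, 5.0))
```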

Examples.

For the Poisson process there exists also another characterization.

Proposition 1.10 X = (Xt : t ≥ 0) is a Poisson process with rate λ if and only if it has stationary, independent increments, i.e.

Xt+t′ − Xt′ is distributed like Xt − X0 and independent of (Xs : s ≤ t′) ,   (1.35)

and for each t, Xt has Poisson distribution with parameter λt, i.e. P(Xt = k) = ((λt)^k/k!) e^{−λt}.

Proof.

Using equation (1.23) we can also get an evolution equation for the distribution,

d/dt π(t) = π(0) d/dt P(t) = π(0) P(t) G = π(t) G .   (1.36)

This is called the Master equation, and using (1.27) the coordinate form is given by

d/dt πi(t) = Σ_{j≠i} (πj(t) gji − πi(t) gij) .   (1.37)

If π = π* is a stationary distribution, then both sides of the equation vanish.

Proposition 1.11 Let G be the generator of a continuous time Markov chain, P(t) = exp(tG), and P^Y the transition matrix of the jump chain. Then

π* P(t) = π* ⇔ π* G = 0 ⇔ π̄ P^Y = π̄ ,   (1.38)

where π̄i = πi* |gii| for all i ∈ S.

Proof. Assume finite state space S and that all |gii| > 0.

π* G = (0, . . . , 0) ⇔ π* G^k = (0, . . . , 0) for all k ≥ 1
⇔ Σ_{k=1}^∞ (t^k/k!) π* G^k = (0, . . . , 0) for all t ≥ 0
⇔ π* Σ_{k=0}^∞ (t^k/k!) G^k = π* for all t ≥ 0 , since G^0 = Id
⇔ π* P(t) = π* for all t ≥ 0 .   (1.39)

By (1.34) we can write gij in terms of the entries of P^Y, gij = |gii| (p^Y_ij − δij), and so

(π̄ P^Y)j − π̄j = Σ_{i∈S} π̄i (p^Y_ij − δij) = Σ_{i∈S} πi* gij = (π* G)j ,   (1.40)

and both sides vanish equivalently. □
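A numerical sanity check of (1.38); a sketch reusing the arbitrary three-state generator above (scipy assumed):

```python
import numpy as np
from scipy.linalg import null_space

G = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.5,  0.5, -1.0]])

# Stationary distribution: left null vector of G, i.e. pi* G = 0
pi_star = null_space(G.T)[:, 0]
pi_star /= pi_star.sum()

# Weighted measure for the jump chain: pi_bar_i = pi*_i |g_ii|
pi_bar = pi_star * (-np.diag(G))
PY = G / (-np.diag(G))[:, None]   # jump chain entries g_ij/|g_ii| ...
np.fill_diagonal(PY, 0.0)         # ... with zero diagonal, cf. (1.34)

print(np.allclose(pi_bar @ PY, pi_bar))  # True: pi_bar is stationary for P^Y
```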

Theorem 1.12 A continuous time irreducible Markov chain with finite state space has a unique stationary distribution π*, where

πi* = 1/(µi |gii|) with µi := E(Ti | X0 = i) and Ti := inf{t ≥ J1 : Xt = i} .   (1.41)

This follows immediately from Theorem 1.4 for discrete time by the tie-up with stationary measures of the jump chain (Proposition 1.11). Note that Ti is still the recurrence time for the jump chain Y. This forces a slightly different definition in terms of X, and Ti is often called a first passage time. Then πi* is determined by the average fraction of time the chain spends in state i,

πi* = 1/(µi |gii|) = E(Wi)/µi with the expected holding time E(Wi) = 1/|gii| .   (1.42)

The detailed balance conditions for a continuous-time Markov chain are

πi gij = πj gji for all i, j ∈ S .   (1.43)

If they are fulfilled for a distribution π, then π is stationary since every term in the right-hand
side of (1.37) vanishes individually.

Examples.

Theorem 1.13 An irreducible Markov chain with finite state space is ergodic, i.e.

pij(t) → πj* as t → ∞ , for all i, j ∈ S .   (1.44)

Again, this follows directly from ergodicity of the jump chain (Theorem 1.8), and it implies

π(t) = π(0) P(t) → π* = (E(W1)/µ1, . . . , E(W|S|)/µ|S|) as t → ∞ ,   (1.45)

for every initial distribution π(0).
Note that for continuous time there is no issue of periodicity, since

if i → j then pij(t) > 0 for all t > 0 .   (1.46)

This is because i → j is equivalent to

g_{ii1} g_{i1i2} · · · g_{in−1j} > 0 for some i1, . . . , in−1 ∈ S, n ∈ N ,   (1.47)

which implies that pij(t) ≥ p_{ii1}(t/n) · · · p_{in−1j}(t/n) > 0.

1.4 Countably infinite state spaces
For infinite state space S, the Markov chain can 'get lost at infinity' and therefore not have a stationary probability distribution.

Examples.

Definition 1.6 A state i ∈ S is called

recurrent, if P({t ≥ 0 : Xt = i} is unbounded | X0 = i) = 1 , and
transient, if P({t ≥ 0 : Xt = i} is unbounded | X0 = i) = 0 .   (1.48)

Proposition 1.14 Let i ∈ S be a non-absorbing state. Then

P(Ti < ∞) = 1 ⇔ Σ_{n=0}^∞ pii(n) = ∞ (discrete time) or ∫_0^∞ pii(t) dt = ∞ (continuous time) ⇒ i recurrent ,
P(Ti < ∞) < 1 ⇔ Σ_{n=0}^∞ pii(n) < ∞ or ∫_0^∞ pii(t) dt < ∞ ⇒ i transient .   (1.49)

Each state is either recurrent or transient. If the chain is irreducible then the states are either all recurrent or all transient.

Proof. see Section 6.2 in Grimmett and Stirzaker

Examples.

Evidently, recurrence and transience of a Markov chain are not enough to characterize the existence of a stationary distribution π* completely. If it exists, the entries πi* ∝ 1/µi are inversely proportional to the mean recurrence times µi = E(Ti | X0 = i). If the chain is transient, then certainly µi = ∞ for all i, π* does not exist and in fact πi(t) → 0.

Definition 1.7 A recurrent state i ∈ S is called positive recurrent if µi < ∞ and null recurrent
if µi = ∞.

Theorem 1.15 Let X be an irreducible (non-explosive¹) Markov chain. Then X has a unique stationary distribution if and only if it is positive recurrent.

So the positive recurrent (non-explosive) Markov chains behave exactly like chains with finite state space concerning their stationary distributions. This also holds for dynamic properties and the convergence to equilibrium.

¹ explained at the end of this section

Examples.

Therefore, X can be positive recurrent while the corresponding jump chain Y is null recurrent. However, we have the following connection.

Proposition 1.16 Let X be a (non-explosive) Markov chain with jump chain Y. Then

i ∈ S is transient for X ⇔ i is transient for Y ,
i ∈ S is recurrent for X ⇔ i is recurrent for Y .   (1.50)

Transient continuous time chains can get lost at infinity even in finite time. This phenomenon is called explosion. Define the explosion time

J∞ := lim_{n→∞} Jn = Σ_{i=0}^∞ W_{Yi} ∈ (0, ∞] .   (1.51)

This is a random variable that usually takes the value ∞, and we say that the chain is non-explosive if P(J∞ = ∞) = 1. For example this is the case if |S| < ∞ or sup_{i∈S} |gii| < ∞.

Example.

2 Processes with continuous state space
2.1 Brownian motion and the Central limit theorem
Let Y1, Y2, . . . ∈ R be iidrvs with mean E(Yi) = 0 and variance var(Yi) = σ² > 0. Then define the discrete-time process

Xn := Σ_{i=1}^n Yi with X0 = 0 .   (2.1)

For example if Yi ∼ U{−1, 1} then X is a simple symmetric random walk. Then by the Central Limit Theorem (CLT), as n → ∞,

Xn/√n → ξ ∼ N(0, σ²) (Gaussian rv with mean 0 and variance σ²) ,   (2.2)

or, equivalently, for all y ∈ R

P(Xn/√n ≤ y) → ∫_{−∞}^y fX(x) dx with Gaussian pdf fX(x) = (2πσ²)^{−1/2} e^{−x²/(2σ²)} .   (2.3)

We can use the CLT to look at the process Xn in rescaled time t = n∆t. According to (2.2), √∆t X_{[t/∆t]} should converge to a t-dependent random variable as ∆t → 0, and we define

Bt := lim_{∆t→0} √∆t X_{[t/∆t]} = lim_{∆t→0} (√t/√(t/∆t)) Σ_{i=1}^{[t/∆t]} Yi = √t ξt ∼ N(0, tσ²) .   (2.4)

Here the ξt ∼ N(0, σ²) are different for each t, but they are certainly not independent. Note that by the CLT the time rescaling induces a space rescaling

t = ∆t n , b = (∆t)^α x with α = 1/2 ,   (2.5)

and on all other spatial scales the limiting process either does not exist or is degenerate:

Bt = 0 for α > 1/2 , Bt is not well defined for α < 1/2 .   (2.6)
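The diffusive scaling in (2.4)-(2.5) can be visualized directly; a minimal sketch (matplotlib assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

T, dt = 1.0, 1e-4
n = int(T / dt)
steps = rng.choice([-1, 1], size=n)   # Y_i ~ U{-1,1}, so sigma^2 = 1
B = np.sqrt(dt) * np.cumsum(steps)    # B_t ~ sqrt(dt) X_{[t/dt]}, alpha = 1/2

plt.plot(np.arange(1, n + 1) * dt, B)
plt.xlabel("t"); plt.ylabel("B_t")
plt.show()
```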

Properties of the process B = (Bt : t ≥ 0):
B0 = 0, Bt ∼ N(0, tσ²) and, analogously to (2.4),

Bt − Bs = lim_{∆t→0} √∆t Σ_{i=[s/∆t]}^{[t/∆t]} Yi ∼ N(0, (t − s)σ²)   (2.7)

for all t ≥ s ≥ 0. So B has stationary increments, i.e. Bt − Bs ∼ Bt−s − B0, and by independence of the Yi, B has independent increments, i.e.

Bt − Bs is independent of {Bu : u ≤ s} for all t ≥ s ≥ 0 .   (2.8)

So far B is only the result of an informal derivation; an important question is whether it actually exists as a mathematical object.

Theorem 2.1 Existence of Brownian motion (Wiener)
There exists a process B = (Bt : t ≥ 0) with stationary independent increments, such that B0 = 0 and Bt ∼ N(0, t). B is called a standard Brownian motion (BM) or Wiener process.

Proof. see e.g. Rogers and Williams, Section I.6

It suffices to look at standard BMs B with σ² = 1 and B0 = 0, since σB + x0 is a BM with variance σ² starting in x0. All distributional properties of BM are characterized by the finite dimensional distributions.

Proposition 2.2 Let B be a standard BM. For all t1, . . . , tn, n ∈ N, the vector

(Bt1, . . . , Btn) ∼ N(0, Γ) with γij = min{ti, tj}   (2.9)

has multivariate Gaussian distribution with zero means and covariance matrix Γ = (γij)i,j.

Proof. Bt ∼ N(0, t) and it suffices to show that cov(Bs, Bt) = min{s, t}. Take s < t, then

E(Bs Bt) = E(Bs² + Bs(Bt − Bs)) = E(Bs²) + 0 ,   (2.10)

since B has independent increments and E(Bs) = 0. Thus cov(Bs, Bt) = var(Bs) = s . □

Reminder. The pdf of the multivariate Gaussian (Bt1, . . . , Btn) is given by

f_{t1,..,tn}(x) = (2π)^{−n/2} (det Γ)^{−1/2} exp(−½ x Γ^{−1} x^T) with x = (x1, . . . , xn) .   (2.11)

What are the regularity properties of a Brownian sample path?

From (2.4) we expect for Brownian motion

Bt+h − Bt = √h ξ ∼ N(0, hσ²) → 0 a.s. as h → 0 .   (2.12)

Therefore Brownian sample paths are continuous (and more precisely, Hölder continuous with exponent < 1/2). But they are nowhere differentiable, since

(Bt+h − Bt)/h = (σ/√h) ξ has no limit as h → 0 .   (2.13)

These properties do not follow from (2.7) and (2.8), which can be fulfilled also by discontinuous processes. But under the restriction that t ↦ Bt(ω) is a continuous function of t for all ω ∈ Ω, BM is unique. So we restrict ourselves to the path space C([0, ∞), R) of continuous functions, and the process can then be described by a probability measure on that space.

Theorem 2.3 Uniqueness of Brownian motion (Wiener)
There exists a unique probability measure W on the path space C([0, ∞), R) (called the Wiener measure), such that the process with sample paths distributed according to W is a Brownian motion as defined in Theorem 2.1.

Proof. see e.g. Rogers and Williams, Section I.6

Examples of sample paths.

Note that if the increments Yi in (2.2) are not identically distributed or independent, the
CLT still holds under more general conditions (see e.g. Gardiner, Section 2.8.2). So Brownian
motion is the natural limiting process for a very general class of models.

Definition 2.1 A d-dimensional standard Brownian motion B = (Bt : t ≥ 0) is a collection of d independent one-dimensional BMs B¹, . . . , B^d as defined in Theorem 2.1, i.e.

Bt = (Bt¹, . . . , Bt^d) for all t ≥ 0 .   (2.14)

So the pdf of the increments Bt − Bs is f_{t−s}(x) = (2π(t − s))^{−d/2} exp(−‖x‖₂²/(2(t − s))).

Analogous to the random walk, one can study recurrence and transience for BM depending
on the space dimension.

Theorem 2.4 (i) If d = 1, BM is point-recurrent, i.e.

P({t ≥ 0 : Bt = 0} is unbounded | B0 = 0) = 1 .   (2.15)

(ii) If d = 2, BM is neighbourhood-recurrent, i.e. for every ε > 0

P({t ≥ 0 : |Bt| < ε} is unbounded | B0 = 0) = 1 .   (2.16)

However, points are polar, i.e. for all x ∈ R²

P(Tx = ∞) = 1 , where Tx = inf{t > 0 : Bt = x} .   (2.17)

(iii) If d ≥ 3, BM is transient, i.e. |Bt| → ∞ as t → ∞ with probability one.

Proof. see e.g. Rogers and Williams, Section I.18

Proposition 2.5 For dimension d ≥ 2, the image {Bt : t ≥ 0} ⊆ R^d of the sample path of a BM B has Hausdorff (or fractal) dimension 2.

’Proof’. see class

2.2 General facts
In this section we discuss processes with continuous state space S = R or R^d and continuous time T = [0, ∞). This is mathematically more complicated than Markov chains, and we will discuss some of the technical issues below. On the other hand, the sample paths are now real valued functions, our state space has an analytic structure and we will be able to use concepts from usual calculus.
For example we will often integrate over sets A ⊆ R of possible values with respect to the distribution function F of a random variable X, e.g.

P(X ∈ A) = ∫_A dF(x) = ∫_A f(x) dx where f = F′ is the pdf (if it exists) .   (2.18)

This cannot be done for all sets A ⊆ R but only for A ∈ A, where A ⊊ P(R) is a so-called σ-algebra. This is a set of measurable sets on which the measure dF(x) can be consistently defined.
Remember also that a random variable is actually a measurable function X : Ω → R, i.e. for each measurable A ⊆ R the preimage X^{−1}(A) = {ω : X(ω) ∈ A} is measurable with respect to P on Ω. We always use the shorthand

P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}) ,   (2.19)

and the probability space Ω is hidden from the discussion.

Definition 2.2 Let X, Y : Ω → R be random variables and A ⊆ R measurable. Then we say

X ∈ A almost surely (a.s.) , if P(X ∈ A) = 1 ,
X = Y a.s. , if P(X = Y) = P({ω : X(ω) = Y(ω)}) = 1 , and
X ∼ Y , if FX(x) = P(X ≤ x) = FY(x) for all x ∈ R .   (2.20)

Example.

So in general we have X = Y a.s. ⇒ X ∼ Y.

Similar concepts exist for convergence of random variables:

Xn → X means convergence in distribution, i.e. FXn(x) → FX(x) ,
Xn → X a.s. means almost sure convergence, i.e. P(Xn → X) = 1 .   (2.21)

(Note that convergence in distribution is required only for all x ∈ R at which F is continuous, due to technical reasons, see e.g. Grimmett and Stirzaker, Section 7.2.)
In general, 'almost sure' is stronger than 'in distribution' and we have

X = Y a.s. ⇒ X ∼ Y , Xn → X a.s. ⇒ Xn → X .   (2.22)

As for Markov chains, the distributional properties of a general stochastic process are determined by fixing all finite-dimensional distributions (fdds)

Ft(x) = P(Xt1 ≤ x1, . . . , Xtn ≤ xn) ,   (2.23)

for all t = (t1, . . . , tn) ∈ [0, ∞)^n, ti ≠ tj, x = (x1, . . . , xn) ∈ R^n and n ∈ N. For state space S = R these are characterized by joint distribution functions Ft.

Theorem 2.6 If a collection {Ft} of fdds fulfils the Kolmogorov consistency relations

F_{t,tn+1}(x, xn+1) → Ft(x) as xn+1 → ∞ , and
F_{Πt}(Πx) = Ft(x) for all permutations Π of (1, . . . , n) ,   (2.24)

then there exists a probability space Ω and a process X = (Xt : t ≥ 0) on Ω that has fdds {Ft}.

Proof. Is related to the Skorohod representation theorem. Basically one takes Ω to be the path space of the process. Some hints are given in Grimmett and Stirzaker, Section 8.6.

Example.

So Brownian motion is an example of a Gaussian process.

Definition 2.3 A real-valued, continuous-time process X is called a Gaussian process if each finite-dimensional vector (Xt1, . . . , Xtn) ∼ N(µ(t), V(t)) is Gaussian with mean vector µ and covariance matrix V, which may depend on t.

Analogous to Proposition 2.2, all finite dimensional distributions of a Gaussian process are uniquely specified by its mean and covariance matrix (which has to be positive definite). But note that Gaussian processes are not necessarily Markov. Standard Brownian motion is a Gaussian process with zero mean and covariance min{s, t}.
The transition probabilities of a Markov chain can also be generalized.

Definition 2.4 Let X be a stochastic process. The conditional distribution function

F(t, x | s, y) = P(Xt ≤ x | Xs = y)   (2.25)

is called the transition kernel of X. If it has a density we call this the transition density,

f(t, x | s, y) = ∂F/∂x (t, x | s, y) .   (2.26)

Note that for a homogeneous process, the kernel is actually only a function of t − s.

Proposition 2.7 The fdds of a Markov process are uniquely determined by the transition kernels and the initial distribution.

Proof. Sample calculation for 0 ≤ t1 ≤ t2 with densities, using the Markov property:

P(Xt1 ≤ x1, Xt2 ≤ x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} ∫_{−∞}^∞ f(0, x) f(t1, y | 0, x) f(t2, z | t1, y) dx dy dz . □
Example.

In contrast to Markov chains, for continuous state space the fdds do not determine the process uniquely. Two processes with the same fdds are called versions of each other, and their sample paths can have very different properties. This fact cannot be ignored, since it is very important when studying properties such as first-passage times.
In the previous section we saw that the sample paths of BM are continuous. Many interesting phenomena cannot be modeled with continuous processes alone, but one usually concentrates on the following class of processes.

Definition 2.5 A real-valued, continuous-time process X is called càdlàg if its sample paths are right continuous (continue à droite) and have left limits (limite à gauche), i.e.

lim_{s↘t} Xs(ω) = Xt(ω) and lim_{s↗t} Xs(ω) exists , for all ω ∈ Ω, t ∈ [0, ∞) .   (2.27)

For example continuous-time Markov chains (e.g. the Poisson process) are defined as càdlàg.

2.3 Brownian motion and the heat equation
We are looking for an evolution equation for the transition densities, analogous to the forward equation (or master equation) for Markov chains. First we will derive it for Brownian motion as scaling limit from the simple random walk.
Let (Xn : n ∈ N) be a simple random walk. Then the distribution at time n is given by π(n + 1) = π(n) P, which can be written in the following incremental form

π(n + 1) − π(n) = π(n)(P − Id) ,   (2.28)

where P − Id is proportional to the discrete Laplacian ∆ : R^Z → R^Z: P − Id has rows (· · · ½ −1 ½ · · ·), i.e.

P − Id = ½ ∆ , where ∆ has rows (· · · 1 −2 1 · · ·) .   (2.29)

In the previous section we saw that under the scaling t = ∆t n, x = (∆t)^α k with α = 1/2, (∆t)^α X_{[t/∆t]} → Bt converges to Brownian motion as ∆t → 0. Therefore the mass function πk(n) should converge to the pdf f(t, x) of Bt, i.e.

lim_{∆t→0} π_{x/(∆t)^α}(t/∆t) = f(t, x) = (2πt)^{−1/2} exp(−x²/(2t)) .   (2.30)

Plugging the scaling into the discrete-time Master equation (2.28), we can derive a differential equation for f. We assume that for large n, k (i.e. small ∆t), πk(n) is approximately given by

πk(n) ≈ f(k (∆t)^α, n ∆t) = f(t, x) .   (2.31)

Then we get by Taylor expansion

π_{k±1}(n) ≈ f(t, x) ± (∆t)^α ∂f/∂x (t, x) + ((∆t)^{2α}/2) ∂²f/∂x² (t, x) + O((∆t)^{3α}) ,
πk(n + 1) ≈ f(t, x) + ∆t ∂f/∂t (t, x) + O((∆t)²) .   (2.32)

Thus if α = 1/2 (otherwise the limit is again degenerate),

∂f/∂t (t, x) = lim_{∆t→0} (πk(n + 1) − πk(n))/∆t = lim_{∆t→0} (1/(2∆t)) (π_{k−1}(n) − 2πk(n) + π_{k+1}(n))
= lim_{∆t→0} ((∆t)^{2α}/(2∆t)) (∂²f/∂x² (t, x) + O((∆t)^{3α−1})) = ½ ∂²f/∂x² (t, x) .   (2.33)

So since standard BM starts in the origin, its pdf should fulfil

∂f/∂t (t, x) = ½ ∂²f/∂x² (t, x) with initial condition f(0, x) = δ0(x) .   (2.34)

This PDE is the so-called heat equation, which has been well studied, and indeed (2.30) is its unique solution.
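This limit can be checked numerically; a minimal sketch (not from the notes) iterating the master equation (2.28) and comparing with the Gaussian (2.30) at the level of distribution functions:

```python
import numpy as np
from math import erf

dt = 1e-4
t = 0.5
n = int(t / dt)
K = 500                   # lattice {-K,...,K}, wide enough for t = 0.5
pi = np.zeros(2 * K + 1)
pi[K] = 1.0               # delta initial condition at the origin

# Iterate pi(n+1) = pi(n) + pi(n)(P - Id) with (P - Id) = Laplacian/2
for _ in range(n):
    pi = 0.5 * (np.roll(pi, 1) + np.roll(pi, -1))

# Under x = sqrt(dt) k the distribution should approach N(0, t), cf. (2.30)
x = np.arange(-K, K + 1) * np.sqrt(dt)
cdf = np.cumsum(pi)
exact_cdf = 0.5 * (1 + np.array([erf(xi / np.sqrt(2 * t)) for xi in x]))
print(np.max(np.abs(cdf - exact_cdf)))   # small: the walk converges to BM
```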

Note that with this implicit initial condition, f(t, x) = f(t, x | 0, 0) in terms of transition densities. An analogous derivation conditioned on Bs = y gives the same equation for f(t, x | s, y) with the more general initial condition f(s, x | s, y) = δy(x).
Indeed, as we have seen before, Bt − Bs ∼ N(0, t − s) for t ≥ s, and therefore the transition kernel F(t, x | s, y) has density function

f(t, x | s, y) = ∂F/∂x (t, x | s, y) = (2π(t − s))^{−1/2} exp(−(x − y)²/(2(t − s))) .   (2.35)

f(t, x | s, y) is also called the heat kernel, since it is the fundamental solution to the PDE (2.34), i.e. for every initial distribution f(0, y) we have

f(t, x) = ∫_R f(t, x | 0, y) f(0, y) dy .   (2.36)

We can also derive (2.34) from the forward equation d/dt P(t) = P(t) G or the master equation (1.37) of a continuous-time Markov chain, by rescaling only space as x = εk with ε → 0. In these derivations the exact structure of the generator G or P − Id is not important, and this equation holds for a whole class of processes.

2.4 Diffusion processes and Fokker-Planck equations

Definition 2.6 A Markov process X is called a diffusion process if

P(|Xt+h − Xt| > ε | Xt = x) = o(h) for all ε > 0, x ∈ R ,
E(Xt+h − Xt | Xt = x) = a(t, x) h + o(h) ,
E((Xt+h − Xt)² | Xt = x) = b(t, x) h + o(h) ,   (2.37)

for some functions a(t, x) (drift coefficient) and b(t, x) (diffusion coefficient).

By the first property diffusion processes have continuous sample paths. Their distributional
properties are uniquely characterized by the drift and the diffusion coefficient.

Theorem 2.8 Let X be a diffusion process with drift a(t, x) and diffusion coefficient b(t, x). Then the transition density f = f(t, x | s, y) exists and satisfies the (forward) Fokker-Planck equation (or forward equation)

∂f/∂t = −∂/∂x (a(t, x) f) + ½ ∂²/∂x² (b(t, x) f)   (2.38)

for all 0 ≤ s ≤ t, x, y ∈ R.

Proof. by Taylor expansion, similar to Section 2.3

Examples.

f = f(t, x | s, y) is also the solution to the so-called backward Fokker-Planck equation

∂f/∂s = −a(s, y) ∂f/∂y − ½ b(s, y) ∂²f/∂y² ,   (2.39)

which can be derived from the backward equation of a continuous time MC.
Stationary pdfs f*(x) of a time-homogeneous diffusion process with time-independent drift a(x) and diffusion b(x) are given by stationary solutions to (2.38), i.e.

0 = −∂/∂x (a(x) f*(x)) + ½ ∂²/∂x² (b(x) f*(x)) .   (2.40)

Examples.

In general, integrating (2.40) and denoting derivatives by ′ we get

0 = −∫_{−∞}^x (a(y) f*(y))′ dy + ½ ∫_{−∞}^x (b(y) f*(y))″ dy = −a(x) f*(x) + ½ (b(x) f*(x))′ (+ const.) .   (2.41)

This is a first order linear differential equation, and differentiating with the product rule we get

f*′(x) = ((2a(x) − b′(x))/b(x)) f*(x) .   (2.42)

So the solution is

f*(x) = f*(0) exp(∫_0^x (2a(y) − b′(y))/b(y) dy) ,   (2.43)

where f*(0) is fixed by the normalization ∫_R f*(x) dx = 1.
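As a concrete check of (2.43), a sketch assuming the Ornstein-Uhlenbeck choice a(x) = −x, b(x) = 1 (an arbitrary example, for which f*(x) = e^{−x²}/√π):

```python
import numpy as np

a = lambda x: -x                 # Ornstein-Uhlenbeck drift (assumed example)
b = lambda x: np.ones_like(x)    # constant diffusion coefficient, so b' = 0

x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]

# log f*(x) = int (2a(y) - b'(y))/b(y) dy up to an additive constant, cf. (2.43);
# the constant is fixed by normalization, so the lower limit does not matter
integrand = 2 * a(x) / b(x)
log_f = np.concatenate(([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1])) * dx))
f = np.exp(log_f)
f /= f.sum() * dx                # normalize: int f*(x) dx = 1

exact = np.exp(-x**2) / np.sqrt(np.pi)   # known answer for a(x) = -x, b = 1
print(np.max(np.abs(f - exact)))         # small discretization error
```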

Diffusion processes can be generalized to higher dimensions. X in R^d is called a diffusion process if, in addition to the continuity property analogous to Definition 2.6,

E(Xt+h − Xt | Xt = x) = a(t, x) h + o(h) ,
E((Xt+h − Xt) ⊗ (Xt+h − Xt) | Xt = x) = b(t, x) h + o(h) ,   (2.44)

with drift vector a(t, x) ∈ R^d and diffusion matrix b ∈ R^{d×d}, where

bij(t, x) h + o(h) = E((X^i_{t+h} − X^i_t)(X^j_{t+h} − X^j_t) | Xt = x) .   (2.45)
This is the covariance matrix of the increments of the process. The Fokker-Planck equation for f = f(t, x) is now given by

∂f/∂t = −Σ_{i=1}^d ∂/∂xi (ai(t, x) f) + ½ Σ_{i,j=1}^d ∂²/∂xi∂xj (bij(t, x) f) = L* f ,   (2.46)

where the right-hand side defines a linear operator L* on the set of functions f : R^d → R. L* is called the adjoint generator of the process X and is the analogous quantity to the generator of a continuous-time Markov chain.
Let g : R^d → R be an observable, such as g(Xt) = ‖Xt‖₂². Then the expected value

ḡ(t) := E(g(Xt)) = ∫_{R^d} g(x) f(t, x) d^d x   (2.47)

obeys the following evolution equation,

d/dt ḡ(t) = ∫_{R^d} g(x) ∂f(t, x)/∂t d^d x = ∫_{R^d} g(x) (L* f)(t, x) d^d x = ∫_{R^d} (Lg)(x) f(t, x) d^d x = E((Lg)(Xt)) .   (2.48)

This follows by partial integration, since for each i = 1, . . . , d

∫_{R^d} g(x) ∂/∂xi (ai(t, x) f(t, x)) d^d x = −∫_{R^d} ∂g/∂xi (x) ai(t, x) f(t, x) d^d x ,   (2.49)

because f(t, x) → 0 as |x| → ∞, so there are no boundary terms. For the diffusion part this can be done twice, and this leads to

L = Σ_{i=1}^d ai(t, x) ∂/∂xi + ½ Σ_{i,j=1}^d bij(t, x) ∂²/∂xi∂xj .   (2.50)

This operator is called the generator of the process X and describes the expected time evolution of observables. Note that this also determines the right-hand side of the backward Fokker-Planck equation (2.39). It is technically more convenient than L*, and therefore diffusion processes are often characterized by defining their generator.
For time-independent drift a(x) and diffusion b(x), existence and uniqueness of the initial value problem

∂f(t, x)/∂t = (L* f)(t, x) , f(0, x) = f0(x) ,   (2.51)

is well understood under the assumption of uniform ellipticity, i.e.

ξ^T b(x) ξ = Σ_{i,j=1}^d bij(x) ξi ξj ≥ α‖ξ‖₂² for some α > 0 and all ξ ∈ R^d .   (2.52)

Theorem 2.9 Under the assumption (2.52) and the growth conditions

f0(x) ≤ C e^{α‖x‖₂²} , |∂ai(x)/∂xi| ≤ C1 (1 + ‖x‖₂²) , |∂²bij(x)/∂xi∂xj| ≤ C2 (1 + ‖x‖₂²)   (2.53)

for some constants C, C1, C2 > 0, the initial value problem (2.51) has a unique classical solution in C^{1,2}((0, ∞), R^d).

Defining the probability current J(f) with i-th component

Ji(x, f) := ai(x) f − ½ Σ_{j=1}^d ∂/∂xj (bij(x) f) ,   (2.54)

the Fokker-Planck equation (2.46) can be written as a continuity equation

∂f(t, x)/∂t + ∇x · J(x, f(t, x)) = 0 .   (2.55)

Integrating this equation over a domain A ⊆ R^d and using integration by parts like above, we get

∂/∂t ∫_A f(t, x) d^d x = −∫_A ∇x · J(x, f(t, x)) d^d x = −∫_{∂A} J(x, f(t, x)) · dS .   (2.56)

The second identity follows from Stokes' theorem (also called Gauss' integration theorem).
If A = R^d or the system is closed in A, then J(x, f(t, x)) = 0 for all x ∈ ∂A. So the right-hand side of (2.56) vanishes and the total probability is conserved, i.e.

P(Xt ∈ A) = ∫_A f(t, x) d^d x = 1 .   (2.57)

An important class of diffusion processes with direct connections to statistical mechanics are noise-perturbed gradient flows.

Definition 2.7 Let X be a diffusion process with time-independent drift a(x) and diffusion b(x). V : R^d → R is called a potential for X if a(x) = −∇V(x). If bij(x) = b δij we call X a (noise-perturbed) gradient flow.

The Fokker-Planck equation of a gradient flow is given by

∂f(t, x)/∂t = ∇ · ((∇V(x)) f(t, x)) + (b/2) ∆f(t, x)   (2.58)

and the generator is

L = −∇V(x) · ∇ + (b/2) ∆ .   (2.59)
Examples.

Proposition 2.10 Assume that V : R^d → R is smooth and that

Z := ∫_{R^d} e^{−2V(x)/b} d^d x < ∞ .   (2.60)

Then the diffusion process X with generator (2.59) is ergodic. The unique stationary distribution is the Gibbs distribution with density

f*(x) = (1/Z) e^{−2V(x)/b} ,   (2.61)

and the normalization factor Z is called the partition function.

Proof. We have ∇f* = −(2/b)(∇V) f* and thus

(b/2) ∆f* = (b/2) ∇ · (∇f*) = −∇ · ((∇V) f*) .   (2.62)

Substituting this in (2.58), the right-hand side vanishes and f* is stationary.
Uniqueness and ergodicity follow from the fact that gradient flows fulfil the conditions under which the Fokker-Planck equation (2.58) has a unique (time-dependent) solution. □

3 Martingales
Martingales are an important class of processes in the study of stochastic differential equations.
They can be discrete or continuous both in time and space.

3.1 Filtrations and adapted processes


Definition 3.1 Let Y = (Yn : n ∈ N) and X = (Xn : n ∈ N) be sequences of real-valued random variables. Y is a martingale with respect to X if for all n ∈ N

E(|Yn|) < ∞ and E(Yn+1 | X0, . . . , Xn) = Yn .   (3.1)

Examples.

Alternatively, the conditional expectation can be defined with respect to the σ-algebra generated by X. This approach can be extended to continuous time.
A σ-algebra F on the probability space Ω is a set of measurable sets that fulfils the following consistency conditions: for all A, A1, A2, . . . ∈ F

∅ ∈ F , A^c = Ω \ A ∈ F , ∪_{n∈N} An ∈ F .   (3.2)

Examples.

A filtration is a collection (Ft : t ∈ T) of sub-σ-algebras of F which is increasing, i.e. s ≤ t ⇒ Fs ⊆ Ft. A process X = (Xt : t ∈ T) is adapted to the filtration (Ft : t ∈ T) if Xt is Ft-measurable for every t.

Example.

Interpretation: Ft is the total information available up to time t.

As in the above example, this can be generated by a process X = (Xt : t ∈ T). The natural filtration for the process X is given by

F^X_t = σ({Xs : s ≤ t, s ∈ T}) for all t ∈ T ,   (3.3)

i.e. the information generated by the process X. This is the smallest filtration X is adapted to. Then we can write for a random variable Z

E(Z | X0, . . . , Xt) = E(Z | F^X_t) .   (3.4)

In general we have the following monotone behaviour with respect to the available information.

Proposition 3.1 Tower property
Let F1 ⊆ F2 be σ-algebras. Then for every random variable Z with E(|Z|) < ∞,

E(E(Z | F2) | F1) = E(E(Z | F1) | F2) = E(Z | F1) .   (3.5)

With this notation, continuous-time and discrete-time martingales can be treated together: in the first case T = [0, ∞), in the second T = N.

Definition 3.2 A real-valued process X = (Xt : t ∈ T) is called a martingale with respect to the filtration (Ft : t ∈ T) if it is adapted to (Ft : t ∈ T) and for all t ∈ T

E(|Xt|) < ∞ and E(Xt | Fs) = Xs for all s ≤ t, s ∈ T .   (3.6)

X is just called a martingale if it is a martingale w.r.t. its natural filtration (F^X_t : t ∈ T).

In particular this implies that E(Xt) = E(X0) for all t ∈ T.

Example.

Proposition 3.2 Lévy's characterization of Brownian motion
A continuous-time process B is a standard Brownian motion if and only if B and (Bt² − t : t ≥ 0) are martingales with continuous sample paths and B0 = 0.

3.2 Properties of martingales

As mentioned before, for general continuous-time processes we concentrate on càdlàg processes, i.e. processes with right-continuous sample paths that have left limits.

Theorem 3.3 Martingale convergence theorem
Let X = (Xt : t ∈ T) be a (càdlàg) martingale. If E(|Xt|) ≤ M for some M > 0 and all t ∈ T, then X∞ = lim_{t→∞} Xt exists a.s. and E(|X∞|) < ∞.

This is basically the same as saying that X converges to a stationary distribution, and X∞ is a random variable with that distribution. In particular this implies Xt = E(X∞ | Ft).

Examples.

Definition 3.3 A random variable T ∈ T ∪ {∞} is called a stopping time (with respect to the
filtration (Ft : t ∈ T)), if {T ≤ t} ∈ Ft for all t ∈ T.

Examples.

Interpretation: If T is a stopping time then at each time t ∈ T there is always enough information (in Ft) to decide whether T = t or not.

Proposition 3.4 Let X = (Xt : t ∈ T) be a (càdlàg) martingale and T a stopping time, both w.r.t. (Ft : t ∈ T). Then the stopped process

X^T_t := Xt for t < T , and X^T_t := XT for t ≥ T ,   (3.7)

is a martingale w.r.t. (Ft : t ∈ T).

Proof. We focus on discrete time T = N. We can write

X^T_n = Σ_{i=0}^{n−1} Xi 1_{T=i} + Xn 1_{T≥n} ,   (3.8)

so X^T_n is adapted to Fn and E(|X^T_n|) ≤ Σ_{i=0}^n E(|Xi|) < ∞.
Also X^T_{n+1} − X^T_n = (Xn+1 − Xn) 1_{T>n}, so we have

E(X^T_{n+1} | Fn) − X^T_n = E(X^T_{n+1} − X^T_n | Fn) = E((Xn+1 − Xn) 1_{T>n} | Fn) = (E(Xn+1 | Fn) − Xn) 1_{T>n} = 0 ,   (3.9)

and X^T is a martingale w.r.t. (Fn : n ∈ N). □

Example.

Theorem 3.5 Optional stopping theorem
Let X = (Xt : t ∈ T) be a (càdlàg) martingale with E(|Xt|) ≤ M for some M > 0 and all t ∈ T. Then for all stopping times S ≤ T with P(S < ∞) = 1 we have

E(XT | FS) = XS a.s.   (3.10)

Proof. By the convergence theorem 3.3, Xt → X∞ a.s., and we have XT = X^T_∞ even if T = ∞. Then again for discrete time T = N we have

E(XT | FS) = Σ_{s∈N} E(X^T_∞ | Fs) 1_{S=s} = Σ_{s∈N} X^T_s 1_{S=s} = X^T_S = XS ,   (3.11)

using Proposition 3.4 and that S ≤ T. □

In particular, with S = 0 this implies that E(XT) = E(X0) for all stopping times T.
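A quick simulation illustrates this; a minimal sketch for a simple symmetric random walk stopped on leaving an interval (the interval is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 3, 7                       # stop on hitting -a or b

def stopped_value():
    x = 0
    while -a < x < b:
        x += rng.choice([-1, 1])  # martingale increments with mean 0
    return x

samples = [stopped_value() for _ in range(20_000)]
print(np.mean(samples))           # close to 0 = E(X_0), as predicted by (3.10)
```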

Example.

4 Stochastic calculus
4.1 Diffusion processes and SDEs
Diffusion processes can also be described by stochastic differential equations. Let X be a diffusion process in R with drift a(t, x) and diffusion coefficient b(t, x) = σ²(t, x) given by

E(Xt+h − Xt | Xt = x) = a(t, x) h + o(h) ,
E((Xt+h − Xt)² | Xt = x) = σ²(t, x) h + o(h) .   (4.1)

In general, for a random variable Y with mean µ and variance σ² we can write

Y = µ + σξ where ξ = (Y − µ)/σ .   (4.2)

Also the increments of the process X at time t are random variables with mean and variance depending on Xt, given by

E(Xt+h − Xt | Xt) = a(t, Xt) h + o(h) ,
var(Xt+h − Xt | Xt) = σ²(t, Xt) h − a(t, Xt)² h² + o(h) = σ²(t, Xt) h + o(h) .   (4.3)

Therefore with ξ_{t,t+h} = (Xt+h − Xt − a(t, Xt) h)/√(σ²(t, Xt) h) we get

Xt+h − Xt = a(t, Xt) h + σ(t, Xt) √h ξ_{t,t+h} + o(h) .   (4.4)

Then

E(√h ξ_{t,t+h}) = 0 and var(√h ξ_{t,t+h}) = h ,   (4.5)

which looks an awful lot like the increment of a Brownian motion. Indeed, if the process X has independent increments, the ξ_{t,t+h} are also independent, and

ξ_{t,t+h} = Σ_{k=1}^n ξ_{t+(k−1)h/n, t+kh/n}   (4.6)

can be written as a sum of arbitrarily many independent random variables with mean 0 and variance 1. Therefore √h ξ_{t,t+h} ∼ N(0, h) are Gaussian and can thus be interpreted as increments of a Brownian motion. Now we can write

Xt+h − Xt = a(t, Xt) h + σ(t, Xt)(Bt+h − Bt) + o(h) for a BM B .   (4.7)

Dividing by h we get in the limit h → 0

dXt/dt = a(t, Xt) + σ(t, Xt) dBt/dt .   (4.8)

This is a differential equation for each path of X, i.e. for fixed ω ∈ Ω. But paths of a BM are not differentiable, and therefore (4.8) is often written as

dXt = a(t, Xt) dt + σ(t, Xt) dBt .   (4.9)

This is called a stochastic differential equation (SDE).
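Relation (4.7) suggests the standard Euler-Maruyama discretization of (4.9); a minimal sketch, assuming Ornstein-Uhlenbeck coefficients as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(3)

a = lambda t, x: -x       # drift, here Ornstein-Uhlenbeck (assumed example)
sigma = lambda t, x: 1.0  # diffusion coefficient

T, h = 10.0, 1e-3
n = int(T / h)
X = np.empty(n + 1)
X[0] = 0.0

# X_{t+h} - X_t = a(t, X_t) h + sigma(t, X_t) (B_{t+h} - B_t), cf. (4.7)
for k in range(n):
    t = k * h
    dB = np.sqrt(h) * rng.standard_normal()   # Brownian increment ~ N(0, h)
    X[k + 1] = X[k] + a(t, X[k]) * h + sigma(t, X[k]) * dB

print(X.mean(), X.var())  # for large T roughly the stationary N(0, 1/2) values
```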

The (non-existing) derivative ηt = dBt/dt is called white noise, and can be understood as a normalized random force term on X, uncorrelated in time. Formally it is given by a Gaussian process with covariance E(ηt ηs) = δ(t − s). Physicists often write

dXt/dt = a(t, Xt) + σ(t, Xt) ηt   (4.10)

instead of (4.9) and call this a Langevin equation.
As for ordinary differential equations, it is often better to look at the integrated version of (4.9), since it requires fewer regularity assumptions,

Xt − X0 = ∫_0^t a(s, Xs) ds + ∫_0^t σ(s, Xs) dBs .   (4.11)

In general, a solution of the SDE with initial condition X0 = x0 consists of

• a probability space with filtration (Ω, F, (Ft : t ≥ 0), P) ,
• a BM B = (Bt : t ≥ 0) adapted to (Ft : t ≥ 0) ,
• a continuous process X = (Xt : t ≥ 0) adapted to (Ft : t ≥ 0) that fulfils (4.11) with X0 = x0 .   (4.12)

As usual, the probability space is often not mentioned explicitly; X is just given as some function of B, and (Ft : t ≥ 0) is the natural filtration for B. But in any case, we have to make sense of the two stochastic integrals in (4.11).
From now on let us fix some probability space (Ω, F, (Ft : t ≥ 0), P). Let X = (Xt : t ≥ 0) and Y = (Yt : t ≥ 0) be two càdlàg adapted processes.
We partition the time interval [0, t] such that

0 = t0 < t1 < . . . < tn = t with tk − tk−1 → 0 for all k = 1, . . . , n, as n → ∞ .   (4.13)

Then we would like to define the stochastic integral I = (It : t ≥ 0) by

It = ∫_0^t Ys dXs = lim_{n→∞} Σ_{k=1}^n Y_{τk} (X_{tk} − X_{tk−1}) .   (4.14)

The question is: for which X and Y is this limit well defined, does it depend on the choice of τk ∈ [tk−1, tk], and in what sense does the limit hold?

Integrands: The most general integrands Y we will consider are continuous adapted processes. This includes all diffusion processes, but could be further generalized to so-called previsible processes.

Integrators: The most general integrators X for which (4.14) can be defined are so-called semimartingales, which we introduce in the following.

4.2 Semimartingales
Definition 4.1 Let X be a càdlàg adapted process. For each ω ∈ Ω define

Vt(ω) := lim_{n→∞} Σ_{k=1}^n |X_{tk}(ω) − X_{tk−1}(ω)| ∈ [0, ∞] ,   (4.15)

which is non-decreasing in t. V = (Vt : t ≥ 0) is called the total variation process of X, and X is of finite variation if Vt < ∞ for all t ≥ 0.

Vt(ω) corresponds to the length of the path (Xs(ω) : 0 ≤ s ≤ t).

Examples.

Proposition 4.1 Let X be a continuous martingale. Then there exists a unique adapted increasing process [X] = ([X]t : t ≥ 0) with [X]0 = 0, such that

X² − [X] = (Xt² − [X]t : t ≥ 0) is a continuous martingale .   (4.16)

[X] is called the quadratic variation of X, and for

[X]^n_t := Σ_{k=1}^n (X_{tk} − X_{tk−1})² we have [X]^n_t → [X]t as n → ∞ ,   (4.17)

in the sense that for all ε > 0 and t ≥ 0, P(sup_{s≤t} |[X]^n_s − [X]s| > ε) → 0.

Note that if we define the quadratic variation pathwise like the total variation, it does not
have the nice properties stated above.

Examples.

Proposition 4.2 Let X be a continuous process of finite variation. Then [X] ≡ 0.
In particular, if X is a continuous martingale of finite variation, then X ≡ X0.

Proof. By the definitions (4.17) and 4.1 we have

[X]^n_t = Σ_{k=1}^n (X_{tk} − X_{tk−1})² ≤ sup_{k=1,..,n} |X_{tk} − X_{tk−1}| Σ_{k=1}^n |X_{tk} − X_{tk−1}| .   (4.18)

The supremum tends to 0 as n → ∞ by continuity, and the sum is bounded by Vt by finite variation. Therefore [X]t = lim_{n→∞} [X]^n_t = 0 for all t ≥ 0.
If X is a continuous martingale with, say, X0 = 0, we have

E(X_{tk} X_{tk−1}) = E(E(X_{tk} X_{tk−1} | F_{tk−1})) = E(X_{tk−1} E(X_{tk} | F_{tk−1})) = E(X²_{tk−1}) .   (4.19)

Therefore we have for a time partition of arbitrary size n,

E(Xt²) = E(Σ_{k=1}^n (X²_{tk} − X²_{tk−1})) = E(Σ_{k=1}^n (X_{tk} − X_{tk−1})²) → 0   (4.20)

as n → ∞ by the first statement. Thus Xt = 0 for all t ≥ 0. □

Definition 4.2 A semimartingale X is a càdlàg adapted process which may be written as

X = X0 + M + A with M0 = A0 = 0 ,   (4.21)

where M is a martingale and A is a process of finite variation.
Examples.

Corollary 4.3 For continuous semimartingales X the decomposition (4.21) is unique, and it is also called the Doob-Meyer decomposition of X.

Proof.

Definition 4.3 Let N = (Nt : t ≥ 0) ∼ PP(λ) be a Poisson process and Z1, Z2, . . . a sequence of iidrvs with E(|Zk|) < ∞ and distribution function F. Then

Y = (Yt : t ≥ 0) with Yt = Σ_{k=1}^{Nt} Zk   (4.22)

is called a compound Poisson process or jump process. For a, σ ∈ R and B a standard BM, a process of the form

Xt = X0 + a t + σBt + Yt with stationary, independent increments   (4.23)

is called a Lévy process. X is completely determined by the Lévy triple (a, σ², λF).

Proposition 4.4 A Lévy process X is a càdlàg semimartingale.

Proof.

It can also be shown that every càdlàg semimartingale with stationary independent increments has to be a Lévy process, so they are quite general.

4.3 Stochastic integration and Itô calculus
Theorem 4.5 Itô integral
Let X be a continuous semimartingale with Doob-Meyer decomposition X = X0 + M + A (4.21), and let Y be a continuous adapted process. If

sup_{0≤s≤t} E(Xs²) < ∞ and E(∫_0^t Ys² d[X]s) < ∞   (4.24)

for some t ≥ 0, then

It = ∫_0^t Ys dXs := lim_{n→∞} Σ_{k=1}^n Y_{tk−1} (X_{tk} − X_{tk−1})   (4.25)

exists in the sense of (4.17). If (4.24) holds for all t ≥ 0, then I = (It : t ≥ 0) is a continuous semimartingale with decomposition

It = 0 + ∫_0^t Ys dMs + ∫_0^t Ys dAs   (4.26)

and is called the (stochastic) Itô integral of Y w.r.t. X.

Examples.

With considerable technical effort, the Itô integral can be generalized to non-continuous processes.
(4.26) implies that

if X is a martingale, then ∫_0^t Ys dXs is a martingale .   (4.27)

So for example Itô integrals w.r.t. BM with X = B are martingales. Surprisingly, also a converse statement holds.
Proposition 4.6 Let X be a martingale. Then there exists an adapted process Y such that

Xt = X0 + ∫_0^t Ys dBs for all t ≥ 0 , where B is a standard BM .   (4.28)

Moreover, if X is a continuous martingale with X0 = 0 and [X]t → ∞ as t → ∞, we have

Xt = B_{[X]t} for all t ≥ 0 , where B is a standard BM .   (4.29)

Proof.

So every martingale X is an integral w.r.t. standard BM, and if it is continuous, it is actually a (time-changed) standard BM on the time scale [X]t rather than t. Note that of course, consistently, [B]t = t.

How do we calculate Itô integrals? Let's start with a simple example.
We see that for the Itô integral with α = 0 we get

∫_{t0}^t Bs dBs = ½ ((Bt² − B²_{t0}) − (t − t0)) .   (4.30)

Another common choice are centred intermediate points with α = 1/2. Here we get

S∫_{t0}^t Bs dBs = ½ (Bt² − B²_{t0}) ,   (4.31)

and this integral is called the Stratonovich integral. The advantage of this choice is that it obeys the usual rules of calculus. But now the dependence between Y_{τk} and the increment X_{tk} − X_{tk−1} is more complicated, leading to several disadvantages compared to Itô:

• S∫ Ys dXs can only be defined if Y is also a semimartingale.
• Even if X is a martingale, S∫ Ys dXs is in general NOT a martingale.

Therefore the preferred choice is usually the Itô integral, and from this one can recover the Stratonovich version by a simple transformation. The unexpected term (t − t0) in (4.30) has to be there, since the result should be a martingale. These additional terms can be easily understood by the rules of Itô calculus, introduced below.
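The difference between the two conventions is easy to see numerically; a minimal sketch evaluating the Riemann sums for ∫ B dB with left endpoints (α = 0, Itô) and centred points (α = 1/2, Stratonovich):

```python
import numpy as np

rng = np.random.default_rng(4)

t, n = 1.0, 100_000
dt = t / n
dB = np.sqrt(dt) * rng.standard_normal(n)
B = np.concatenate(([0.0], np.cumsum(dB)))     # B_{t_k} on the grid, t_0 = 0

ito = np.sum(B[:-1] * dB)                      # left endpoints Y_{t_{k-1}}
strato = np.sum(0.5 * (B[:-1] + B[1:]) * dB)   # centred intermediate points

print(ito, 0.5 * B[-1]**2 - 0.5 * t)   # agree, cf. (4.30)
print(strato, 0.5 * B[-1]**2)          # agree, cf. (4.31)
```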
It is often convenient to use the following intuitive differential notation,

It = I_{t0} + ∫_{t0}^t Ys dXs ⇔ dIt = Yt dXt .   (4.32)

For a continuous martingale M we get, analogous to our above computation,

∫_{t0}^t Ms dMs = ½ (Mt² − M²_{t0}) − ½ ([M]t − [M]_{t0}) .   (4.33)

This is equivalent to

Mt dMt = ½ (d(Mt²) − d[M]t) or d(Mt²) = 2Mt dMt + d[M]t .   (4.34)

This is basically an application of the chain rule for Itô calculus. The meaning of the quadratic variation term becomes clear if we compute an increment by hand,

M²_{t+h} − Mt² = (M_{t+h} − Mt)(M_{t+h} + Mt) = (M_{t+h} − Mt)(M_{t+h} − Mt + 2Mt) = (M_{t+h} − Mt)² + 2Mt (M_{t+h} − Mt) .   (4.35)

Taking h → 0 we get d(Mt²) = 2Mt dMt + (dMt)², and comparing with (4.34),

d[M]t = (dMt)² = O(dt) .   (4.36)

In usual calculus these terms are of negligible order o(dt), but for martingales they have to be taken into account; for example for BM, d[B]t = (dBt)² = dt.
If X = X0 + M + A is a semimartingale then, by Proposition 4.2, [A] ≡ 0. Thus [X] = [M] and the general version of (4.34) is

d(Xt²) = 2Xt dXt + d[X]t = 2(Mt + At) d(Mt + At) + d[M]t .   (4.37)

Using (4.36) we can easily see that the quadratic variation of the Itô integral

It = ∫_{t0}^t Ys dXs is given by [I]t = ∫_{t0}^t Ys² d[X]s for all t ≥ t0 .   (4.38)

This follows directly from

d[I]t = (dIt)² = (Yt dXt)² = Yt² (dXt)² = Yt² d[X]t .   (4.39)

This should clarify condition (4.24), which ensures that the integral has a finite quadratic variation. These findings are summarized in the following very useful result.

Theorem 4.7 Itô's formula
Let X be a continuous semimartingale and g ∈ C²(R, R). Then

dg(Xt) = g′(Xt) dXt + ½ g″(Xt) d[X]t ,   (4.40)

or in the integrated version

g(Xt) = g(X0) + ∫_0^t g′(Xs) dXs + ½ ∫_0^t g″(Xs) d[X]s .   (4.41)

Proof. Taylor expansion with terms up to order dt.

In particular, we see that g(Xt) is again a semimartingale with decomposition

g(Xt) = g(X0) + ∫_0^t g′(Xs) dMs + ∫_0^t g′(Xs) dAs + ½ ∫_0^t g″(Xs) d[X]s ,   (4.42)

where the first integral is a continuous martingale and the remaining terms are of finite variation.

Examples.

4.4 Solutions to SDEs
Let X be a solution of the SDE

dXt = a(t, Xt) dt + σ(t, Xt) dBt .   (4.43)

For the quadratic variation of X we get

d[X]t = (dXt)² = σ²(t, Xt) dt + o(dt) ,   (4.44)

and using Itô's formula, the SDE for an observable g(Xt) with g ∈ C²(R, R) is

dg(Xt) = g′(Xt) dXt + ½ g″(Xt)(dXt)² = g′(Xt)(a(t, Xt) dt + σ(t, Xt) dBt) + ½ g″(Xt) σ²(t, Xt) dt .   (4.45)

Taking the expectation on both sides, we get with pdf f(t, x), by partial integration,

d/dt E(g(Xt)) = ∫_R g(x) ∂f(t, x)/∂t dx = ∫_R (g′(x) a(t, x) + ½ g″(x) σ²(t, x)) f(t, x) dx
= ∫_R g(x) (−∂/∂x (a(t, x) f(t, x)) + ½ ∂²/∂x² (σ²(t, x) f(t, x))) dx ,   (4.46)

since the expected value of the martingale part vanishes. This holds for arbitrary functions g, and therefore we must have

∂f/∂t (t, x) = −∂/∂x (a(t, x) f(t, x)) + ½ ∂²/∂x² (σ²(t, x) f(t, x)) .   (4.47)

Thus f(t, x) fulfils the Fokker-Planck equation and X is a diffusion process with drift a(t, x) and diffusion σ²(t, x).

How many solutions to the SDE (4.43) are there?

Definition 4.4 We say that a SDE has a weak solution if there exists a solution for all initial values X0 = x0 ∈ R. The solution is unique in law if all solutions started from x0 have the same distribution. The solution is pathwise unique if, for a fixed probability space (Ω, F, (Ft : t ≥ 0), P) and a fixed BM B, any two solutions X and X′ fulfil

X0 = X′0 a.s. ⇒ P(Xt = X′t for all t ≥ 0) = 1 .   (4.48)

If a solution X is adapted to the natural filtration of B it is called a strong solution.

For time-independent drift and diffusion there is a general theorem about existence and uniqueness for SDEs.
Theorem 4.8 Suppose that a : R → R and σ : R → R are Lipschitz-continuous, i.e.

|a(x) − a(y)| ≤ K |x − y| for some K > 0 and all x, y ∈ R ,   (4.49)

and analogously for σ. Then for each (Ω, F, (Ft : t ≥ 0), P) and each BM B adapted to (Ft : t ≥ 0), solutions to

dXt = a(Xt) dt + σ(Xt) dBt   (4.50)

are pathwise unique, and there exists a strong solution for any starting point x0 ∈ R.

Proof. analogous to ordinary differential equations, using the contraction mapping theorem and Gronwall's Lemma.

Itô's formula and the existence and uniqueness theorem can be extended to higher space dimensions. There is also a Stratonovich interpretation of SDEs which is directly connected to the Itô version given here. Both can be found in Gardiner, Section 4.3.
