
Lecture 3: Stochastic processes and simulation 1 of 11

Course: M362K Intro to Stochastic Processes


Term: Fall 2014
Instructor: Gordan Zitkovic

Lecture 3
Stochastic processes and simulation

STOCHASTIC PROCESSES

Definition 3.1. Let T be a subset of [0, ∞). A family of random variables {Xt}t∈T, indexed by T, is called a stochastic (or random) process. When T = N (or T = N0), {Xt}t∈T is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = {1}), the process {Xt}t∈T ≡ X1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes
are generalizations of random vectors. The interpretation is, however,
somewhat different. While the components of a random vector usually
(not always) stand for different spatial coordinates, the index t ∈ T is
more often than not interpreted as time. Stochastic processes usually
model the evolution of a random system in time. When T = [0, ∞)
(continuous-time processes), the value of the process can change ev-
ery instant. When T = N (discrete-time processes), the changes occur
discretely.
In contrast to the case of random vectors or random variables, it is
not easy to define a notion of a density (or a probability mass function)
for a stochastic process. Without going into details about why exactly
this is a problem, let me just mention that the main culprit is infinity.
One usually considers a family of (discrete, continuous, etc.)
finite-dimensional distributions, i.e., the joint distributions of ran-
dom vectors
( Xt1 , Xt2 , . . . , Xt n ),
for all n ∈ N and all choices t1 , . . . , tn ∈ T .
The notion of a stochastic process is very important both in mathematical
theory and in its applications in science, engineering, economics,
etc. It is used to model a large number of phenomena where
the quantity of interest varies discretely or continuously through time
in a non-predictable fashion.

Last Updated: March 23, 2016



Every stochastic process can be viewed as a function of two variables, t and ω. For each fixed t, ω ↦ Xt(ω) is a random variable, as postulated in the definition. However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t ↦ Xt(ω). These functions are called the trajectories of the stochastic process X.

Figure 1 shows two different trajectories of a simple random walk, i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t goes from 0 to 20. Note: we will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 for t = 0. After that, a fair coin is tossed and we move up (to x = 1) if heads is observed and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . , and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of x = −k, x = −k + 2, . . . , x = k − 2, x = k).

[Figure 1: Two (truncated) trajectories of a random walk.]

Unlike Figure 1, Figure 2 shows two time-slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25), but the various values the random variables X15 and X25 can take are presented through their probability mass functions.

[Figure 2: Two time-slices, corresponding to t = 15 and t = 25, of the random walk.]

THE CANONICAL PROBABILITY SPACE

Note: you can skip this subsection if it sounds too abstract. When one deals with infinite-index (#T = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]∞,

and any generic element of Ω will be a sequence ω = (ω0, ω1, ω2, . . . ) of real numbers in [0, 1]. For n ∈ N0 we define the mapping γn : Ω → [0, 1] which simply chooses the n-th coordinate:

γn(ω) = ωn.

The proof of the following theorem can be found in advanced probability books:

Theorem 3.2. There exists a σ-algebra F and a probability P on Ω such that

1. each γn, n ∈ N0, is a random variable with the uniform distribution on [0, 1], and

2. the sequence {γn}n∈N0 is independent.


Remark 3.3. One should think of the sample space Ω as a source of all
the randomness in the system: the elementary event ω ∈ Ω is chosen
by a process beyond our control, and the exact value of ω is assumed
to be unknown. All the other parts of the system are possibly compli-
cated, but deterministic, functions of ω (random variables). When a
coin is tossed, only a single drop of randomness is needed - the out-
come of a coin-toss. When several coins are tossed, more randomness
is involved and the sample space must be bigger. When a system in-
volves an infinite number of random variables (like a stochastic process
with infinite T ), a large sample space Ω is needed.

CONSTRUCTING THE RANDOM WALK

Let us show how to construct the simple random walk on the canonical
probability space (Ω, F , P) from Theorem 3.2. First of all, we need a
definition of the simple random walk:

Definition 3.4. A stochastic process { Xn }n∈N0 is called a simple ran-


dom walk if

1. X0 = 0,

2. the increment Xn+1 − Xn is independent of ( X0 , X1 , . . . , Xn ) for


each n ∈ N0 , and

3. the increment Xn+1 − Xn has the coin-toss distribution, i.e.,

P[Xn+1 − Xn = 1] = P[Xn+1 − Xn = −1] = 1/2.

For the sequence {γn}n∈N given by Theorem 3.2, define the following, new, sequence {ξn}n∈N of random variables:

ξn = 1 if γn ≥ 1/2, and ξn = −1 otherwise.

Then, we set

X0 = 0,  Xn = ξ1 + ξ2 + · · · + ξn,  n ∈ N.

Intuitively, we use each ξ n to emulate a coin toss and then define the
value of the process X at time n as the cumulative sum of the first n
coin-tosses.
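The construction above translates directly into code. Here is a minimal sketch, using Python's standard random module as a stand-in for the canonical uniforms γn (the function name and interface are ours, for illustration):

```python
import random

def simple_random_walk(n_steps, rng=random.random):
    """One trajectory of the simple random walk.

    Each call to rng plays the role of one coordinate gamma_n of omega;
    xi_n = +1 if gamma_n >= 1/2 and -1 otherwise, and X_n is the
    cumulative sum of the first n coin-tosses.
    """
    x, path = 0, [0]          # X_0 = 0
    for _ in range(n_steps):
        xi = 1 if rng() >= 0.5 else -1   # the coin-toss xi_n
        x += xi                          # X_{n+1} = X_n + xi_{n+1}
        path.append(x)
    return path

walk = simple_random_walk(20)   # e.g. one of the trajectories in Figure 1
```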

Proposition 3.5. The sequence { Xn }n∈N0 defined above is a simple random


walk.


Proof. (1) is trivially true. To get (2), we first note that the {ξ n }n∈N
is an independent sequence (as it has been constructed by an appli-
cation of a deterministic function to each element of an independent
sequence {γn }n∈N ). Therefore, the increment Xn+1 − Xn = ξ n+1 is
independent of all the previous coin-tosses ξ 1 , . . . , ξ n . What we need
to prove, though, is that it is independent of all the previous values of
the process X. These, previous, values are nothing but linear combi-
nations of the coin-tosses ξ 1 , . . . , ξ n , so they must also be independent
of ξn+1. Finally, to get (3), we compute

P[Xn+1 − Xn = 1] = P[ξn+1 = 1] = P[γn+1 ≥ 1/2] = 1/2.

A similar computation shows that P[Xn+1 − Xn = −1] = 1/2.

SIMULATION

Another way of thinking about sample spaces, and randomness in


general, is through the notion of simulation. Simulation is what I
did to produce the two trajectories of the random walk above; a com-
puter tossed a fair coin for me 30 times and I followed the procedure
described above to construct a trajectory of the random walk. If I
asked the computer to repeat the process, I would get a different 30
coin-tosses (actually, I would get the exact same 30 coin-tosses with
probability 0.000000001). This procedure is the exact same one we imagine
nature (or casino equipment) follows whenever a non-deterministic situation
is involved. The difference is, of course, that if we use the random walk
to model our winnings in a fair gamble, it is much cheaper and faster
to use the computer than to go out and stake (and possibly lose) large
amounts of money. Another obvious advantage of the simulation approach
is that it can be repeated; a simulation can be run many times
and various statistics (mean, variance, etc.) can be computed.
More technically, every simulation involves two separate inputs.
The first one is the actual sequence of outcomes of coin-tosses. The
other one is the structure of the model - I have to teach the computer
to “go up” if heads shows and to “go down” if tails show, and to re-
peat the same procedure several times. In more complicated situations
this structure will be more complicated. What is remarkable is that
the first ingredient, the coin-tosses, will stay almost as simple as in
the random walk case, even in the most complicated models. In fact,
all we need is a sequence of so-called random numbers. You will see
through the many examples presented in this course that if I can get
my computer to produce an independent sequence of uniformly dis-
tributed numbers between 0 and 1 (these are the random numbers) I
can simulate trajectories of all important stochastic processes. Just to
start you thinking, here is how to produce a coin-toss from a random


number: declare heads if the random number drawn is between 0 and 0.5, and declare tails otherwise.

Random number generation  Before we get into the intricacies of simulation
of complicated stochastic processes, let us spend some time on
the (seemingly) simple procedure of the generation of a single random
number. In other words, how do you teach a computer to give you
a random number between 0 and 1? Theoretically, the answer is "you
can't!" In practice, you can get quite close. The question of what
actually constitutes a random number is surprisingly deep and we will
not even touch it in this course.
Suppose we have written a computer program, a random number
generator (RNG) - call it rand - which produces a random number
between 0 and 1 every time we call it. So far, there is nothing that
prevents rand from always returning the same number 0.4, or from al-
ternating between 0.3 and 0.83. Such an implementation of rand will,
however, hardly qualify for an RNG since the values it spits out come
in a predictable order. We should, therefore, require any candidate for
a random number generator to produce a sequence of numbers which
is as unpredictable as possible. This is, admittedly, a hard task for a
computer having only deterministic functions in its arsenal, and that is
why the random generator design is such a difficult field. The state of
the affairs is that we speak of good or less good random number gener-
ators, based on some statistical properties of the produced sequences
of numbers.
One of the most important requirements is that our RNG produce
uniformly distributed numbers in [0, 1] - namely - the sequence of
numbers produced by rand will have to cover the interval [0, 1] evenly,
and, in the long run, the number of random numbers in each subin-
terval [ a, b] of [0, 1] should be proportional to the length of the interval
b − a. This requirement is hardly enough, because the sequence
0, 0.1, 0.2, . . . , 0.8, 0.9, 1, 0.05, 0.15, 0.25, . . . , 0.85, 0.95, 0.025, 0.075, 0.125, 0.175, . . .

will do the trick while being perfectly predictable.


To remedy the inadequacy of the RNGs satisfying only the require-
ment of uniform distribution, we might require rand to have the prop-
erty that the pairs of produced numbers cover the square [0, 1] × [0, 1]
uniformly. That means that, in the long run, the proportion of pairs
falling in a patch A of the square [0, 1] × [0, 1] will be proportional to
its area. Of course, one could continue with such requirements and ask
for triples, quadruples, . . . of random numbers to be uniform in [0, 1]3 ,
[0, 1]^4, . . . . The highest dimension n such that the RNG produces
uniformly distributed numbers in [0, 1]^n is called the order of the RNG.
A widely-used RNG called the Mersenne Twister has an order of 623.


Another problem with RNGs is that the numbers produced will


start to repeat after a while (this is a fact of life and finiteness of
your computer’s memory). The number of calls it takes for a RNG
to start repeating its output is called the period of a RNG. You might
have wondered how is it that an RNG produces a different number
each time it is called, since, after all, it is only a function written in
some programming language. Most often, RNGs use a hidden vari-
able called the random seed which stores the last output of rand and
is used as an (invisible) input to the function rand the next time it is
called. If we use the same seed twice, the RNG will produce the same
number, and so the period of the RNG is limited by the number of
possible seeds. It is worth remarking that the actual random number
generators usually produce a “random” integer between 0 and some
large number RAND_MAX, and report the result normalized (divided) by
RAND_MAX to get a number in [0, 1).
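The seed mechanism can be illustrated with a toy linear congruential generator. This is only a sketch: the multiplier and increment below are the classic Numerical Recipes constants, chosen here for illustration, and a generator this simple is nowhere near a good RNG by the criteria above.

```python
RAND_MAX = 2**32

class ToyRNG:
    """A linear congruential generator illustrating the hidden seed."""

    def __init__(self, seed=12345):
        self.seed = seed  # the hidden "random seed"

    def rand(self):
        # the next seed is a deterministic function of the current one
        self.seed = (1664525 * self.seed + 1013904223) % RAND_MAX
        return self.seed / RAND_MAX  # normalize into [0, 1)

g = ToyRNG(seed=42)
u = g.rand()  # the same seed always reproduces the same stream
```

Because the state is a single integer below RAND_MAX, the output must eventually cycle; the period of this toy generator is at most 2^32.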

Simulation of Random Variables Having found a random number gen-


erator good enough for our purposes (the one used by Mathematica is
just fine), we might want to use it to simulate random variables with
distributions different from the uniform on [0, 1] (coin-tosses, normal,
exponential, . . . ). This is almost always achieved through transforma-
tions of the output of a RNG, and we will present several methods for
dealing with this problem. A typical procedure (see the Box-Muller
method below for an exception) works as follows: a real (determinis-
tic) function f : [0, 1] → R - called the transformation function - is
applied to rand. The result is a random variable whose distribution
depends on the choice of f. Note that the transformation function is
by no means unique. In fact, if γ ∼ U[0, 1], then f(γ) and fˆ(γ), where
fˆ(x) = f(1 − x), have the same distribution (why?).
What follows is a list of procedures commonly used to simulate
popular random variables:

1. Discrete Random Variables Let X have a discrete distribution given


by the table

x1 x2 . . . xn
p1 p2 . . . pn

i.e., X takes the value xk with probability pk, for k = 1, . . . , n.
For discrete distributions taking an infinite number of values we
can always truncate at a very large n and approximate it with a
distribution similar to the one of X.
We know that the probabilities p1 , p2 , . . . , pn add-up to 1, so we
define the numbers 0 = q0 < q1 < · · · < qn = 1 by

q0 = 0, q1 = p1 , q2 = p1 + p2 , . . . qn = p1 + p2 + · · · + pn = 1.


To simulate our discrete random variable X, we call rand and then


return x1 if 0 ≤ rand < q1 , return x2 if q1 ≤ rand < q2 , and so on.
It is quite obvious that this procedure indeed simulates a random
variable X. The transformation function f is in this case given by



f(x) = x1  if 0 ≤ x < q1,
       x2  if q1 ≤ x < q2,
       . . .
       xn  if qn−1 ≤ x ≤ 1.
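In code, the procedure amounts to locating rand among the cumulative sums qk. A minimal sketch (the function name and optional u argument are ours, for illustration and testing):

```python
import random

def simulate_discrete(xs, ps, u=None):
    """Return x_k when u lands in [q_{k-1}, q_k); u defaults to rand."""
    if u is None:
        u = random.random()
    q = 0.0
    for x, p in zip(xs, ps):
        q += p            # q_k = p_1 + ... + p_k
        if u < q:
            return x
    return xs[-1]         # guard against floating-point rounding in q_n

# a fair die: values 1..6, each with probability 1/6
roll = simulate_discrete(range(1, 7), [1 / 6] * 6)
```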

2. The Method of Inverse Functions The basic observation in this


method is that, for any continuous random variable X with the
distribution function FX , the random variable Y = FX ( X ) is uni-
formly distributed on [0, 1]. By inverting the distribution function
FX and applying it to Y, we recover X. Therefore, if we wish to sim-
ulate a random variable with an invertible distribution function F,
we first simulate a uniform random variable on [0, 1] (using rand)
and then apply the function F −1 to the result. In other words, use
f = F −1 as the transformation function. Of course, this method
fails if we cannot write F −1 in closed form.

Example 3.6. (Exponential Distribution) Let us apply the method


of inverse functions to the simulation of an exponentially distributed
random variable X with parameter λ. Remember that the density
f X of X is given by

fX(x) = λ exp(−λx), x > 0, and so FX(x) = 1 − exp(−λx), x > 0,

and so FX−1(y) = −(1/λ) log(1 − y). Since 1 − rand has the same U[0, 1]-distribution as rand, we conclude that f(x) = −(1/λ) log(x) works as a transformation function in this case, i.e., that

−log(rand)/λ

has the required Exp(λ)-distribution.
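A sketch of this recipe in code (we use 1 − rand, which has the same distribution, to stay inside (0, 1] and avoid log(0); the optional u argument is our addition for testing):

```python
import math
import random

def exp_rand(lam, u=None):
    """Draw from Exp(lam) by the method of inverse functions."""
    if u is None:
        u = 1.0 - random.random()   # in (0, 1], avoids log(0)
    return -math.log(u) / lam       # F^{-1} applied to a uniform
```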

Example 3.7. (Cauchy Distribution) The Cauchy distribution is de-


fined through its density function

fX(x) = 1 / (π (1 + x²)).

The distribution function FX can be determined explicitly in this example:

FX(x) = (1/π) ∫_{−∞}^{x} dt/(1 + t²) = 1/2 + (1/π) arctan(x),  and so  FX−1(y) = tan(π(y − 1/2)),


yielding that f(x) = tan(π(x − 1/2)) is a transformation function for
the Cauchy random variable, i.e., tan(π(rand − 0.5)) will simulate
a Cauchy random variable for you.
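The same one-liner as a sketch (again, the optional u argument is ours, so the transformation can be checked on fixed inputs):

```python
import math
import random

def cauchy_rand(u=None):
    """Draw from the standard Cauchy distribution: tan(pi*(rand - 1/2))."""
    if u is None:
        u = random.random()
    return math.tan(math.pi * (u - 0.5))
```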

3. The Box-Muller method This method is useful for simulating nor-


mal random variables, since for them the method of inverse func-
tion fails (there is no closed-form expression for the distribution
function of a standard normal). Note that this method does not
fall under that category of transformation function methods as de-
scribed above. You will see, though, that it is very similar in spirit.
It is based on a clever trick, but the complete proof is a bit technical,
so we omit it.

Proposition 3.8. Let γ1 and γ2 be independent U[0, 1]-distributed random variables. Then the random variables

X1 = √(−2 log γ1) cos(2πγ2),  X2 = √(−2 log γ1) sin(2πγ2)

are independent and standard normal (N(0,1)).

Therefore, in order to simulate a normal random variable with
mean µ = 0 and variance σ² = 1, we call the function
rand twice to produce two random numbers rand1 and rand2. The
numbers

X1 = √(−2 log(rand1)) cos(2π rand2), and
X2 = √(−2 log(rand1)) sin(2π rand2)

will be two independent normals. Note that it is necessary to call


the function rand twice, but we also get two normal random num-
bers out of it. It is not hard to write a procedure which will produce
2 normal random numbers in this way on every second call, return
one of them and store the other for the next call. In the spirit of
the discussion above, the function f = (f1, f2) : (0, 1] × [0, 1] → R²
given by

f1(x, y) = √(−2 log x) cos(2πy),  f2(x, y) = √(−2 log x) sin(2πy)

can be considered a transformation function in this case.
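A sketch of the resulting procedure; the guard against rand1 = 0 (where log would fail) is our own defensive tweak, and the injectable rng makes the formula checkable on fixed uniforms:

```python
import math
import random

def box_muller(rng=random.random):
    """Two independent N(0, 1) draws from two uniforms (Box-Muller)."""
    r1 = rng() or 1e-300   # guard: log(0) would fail (our tweak)
    r2 = rng()
    r = math.sqrt(-2.0 * math.log(r1))
    return r * math.cos(2 * math.pi * r2), r * math.sin(2 * math.pi * r2)

x1, x2 = box_muller()
```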

4. Method of the Central Limit Theorem The following algorithm is


often used to simulate a normal random variable:

(a) Simulate 12 independent uniform random variables (rands) -

γ1 , γ2 , . . . , γ12 .


(b) Set X = γ1 + γ2 + · · · + γ12 − 6.

The distribution of X is very close to the distribution of a unit
normal, although not exactly equal (e.g., P[X > 6] = 0, while
P[Z > 6] ≠ 0 for a true normal Z). The reason why X approximates
the normal distribution well comes from the following theorem:

Theorem 3.9. Let X1 , X2 , . . . be a sequence of independent random vari-


ables, all having the same (square-integrable) distribution. Set µ = E[ X1 ]
(= E[ X2 ] = . . . ) and σ2 = Var[ X1 ] (= Var[ X2 ] = . . . ). The sequence
of normalized random variables

((X1 + X2 + · · · + Xn) − nµ) / (σ√n)

converges to a standard normal random variable.²

² In a mathematically precise sense, beyond the scope of these notes.

The choice of exactly 12 rands (as opposed to 11 or 35) comes from


practice: it seems to achieve satisfactory performance with relatively
low computational cost. Also, the standard deviation of a
U[0, 1] random variable is 1/√12, so the denominator σ√n conveniently
becomes 1 for n = 12. It might seem a bit wasteful to use 12
calls of rand in order to produce one draw from the unit normal. If
you try it out, you will see, however, that it is of comparable speed
to the Box-Muller method described above; while Box-Muller uses

computationally expensive cos, sin, and log, this method uses
only addition and subtraction. The final verdict of the comparison
of the two methods will depend on the architecture you are running
the code on, and the quality of the implementation of the functions
cos, sin . . . .
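The two-step recipe above can be sketched in a few lines:

```python
import random

def clt_normal():
    """Approximate N(0, 1) draw: twelve uniforms, summed, minus 6."""
    return sum(random.random() for _ in range(12)) - 6.0

# repeated draws should have sample mean near 0 and variance near 1
draws = [clt_normal() for _ in range(100_000)]
```

Note that every output is confined to [−6, 6], consistent with the remark that P[X > 6] = 0 for this approximation.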

MONTE CARLO INTEGRATION

Having described some of the procedures and methods used for sim-
ulation of various random objects (variables, vectors, processes), we
turn to an application in probability and numerical mathematics. We
start off with the following version of the Law of Large Numbers, which
constitutes the theory behind most of the Monte Carlo applications:

Theorem 3.10. (Law of Large Numbers) Let X1, X2, . . . be a sequence of independent, identically distributed random variables, and let g : R → R be a function such that µ = E[g(X1)] (= E[g(X2)] = . . . ) exists. Then³

(g(X1) + g(X2) + · · · + g(Xn)) / n  →  µ = ∫_{−∞}^{∞} g(x) fX1(x) dx,  as n → ∞.

³ As above, there is a mathematically precise way of describing how, exactly, this convergence occurs.


The key idea of Monte Carlo integration is the following:

Suppose that the quantity y we are interested in can be written as

y = ∫_{−∞}^{∞} g(x) fX(x) dx,

for some random variable X with density fX and some function g, and that
x1, x2, . . . are random numbers distributed according to the distribution with
density fX. Then the average

(1/n)(g(x1) + g(x2) + · · · + g(xn))

will approximate y.

It can be shown that the accuracy of the approximation behaves like
1/√n, so you have to quadruple the number of simulations if you
want to double the precision of your approximation.

Example 3.11.

1. (numerical integration) Let g be a function on [0, 1]. To approximate
the integral ∫_0^1 g(x) dx we can take a sequence of n U[0,1]
random numbers x1, x2, . . . , xn and use

∫_0^1 g(x) dx ≈ (g(x1) + g(x2) + · · · + g(xn)) / n,

because the density of X ∼ U[0, 1] is given by fX(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.
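As a sketch, here is this estimator applied to g(x) = x², whose integral over [0, 1] is 1/3 (function name and the choice of g are ours):

```python
import random

def mc_integral(g, n=100_000):
    """Monte Carlo approximation of the integral of g over [0, 1]."""
    return sum(g(random.random()) for _ in range(n)) / n

approx = mc_integral(lambda x: x * x)   # should land close to 1/3
```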

2. (estimating probabilities) Let Y be a random variable with the den-


sity function f Y . If we are interested in the probability P[Y ∈ [ a, b]]
for some a < b, we simulate n draws y1 , y2 , . . . , yn from the distri-
bution FY and the required approximation is

P[Y ∈ [a, b]] ≈ (number of yi's falling in the interval [a, b]) / n.
One of the nicest things about the Monte-Carlo method is that even
if the density of the random variable is not available, but you can
simulate draws from it, you can still perform the calculation above
and get the desired approximation. Of course, everything works
in the same way for probabilities involving random vectors in any
number of dimensions.
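A sketch of this estimator, using the inverse-function exponential simulator of Example 3.6 as the source of draws (names are ours):

```python
import math
import random

def prob_in_interval(draw, a, b, n=100_000):
    """Estimate P[Y in [a, b]] as the fraction of n draws landing there."""
    return sum(1 for _ in range(n) if a <= draw() <= b) / n

# Y ~ Exp(1): the true value of P[Y in [0, 1]] is 1 - e^{-1} ~ 0.632
est = prob_in_interval(lambda: -math.log(1.0 - random.random()), 0.0, 1.0)
```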

3. (approximating π)
We can devise a simple procedure for approximating π ≈ 3.141592


by using the Monte-Carlo method. All we have to do is remember


that π is the area of the unit disk. Therefore, π/4 equals the
portion of the area of the unit disk lying in the positive quadrant,
and we can write

π/4 = ∫_0^1 ∫_0^1 g(x, y) dx dy,

where

g(x, y) = 1 if x² + y² ≤ 1, and 0 otherwise.

So, simulate n pairs (xi, yi), i = 1, . . . , n of uniformly distributed
random numbers and count how many of them fall in the quarter of the
unit disk, i.e., how many satisfy xi² + yi² ≤ 1, and divide
by n. Multiply your result by 4, and you should be close to π. How
close? Well, that is another story . . . Experiment!
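The whole experiment fits in a few lines; a sketch:

```python
import random

def approx_pi(n=200_000):
    """Monte Carlo estimate of pi from n points in the unit square."""
    hits = sum(1 for _ in range(n)
               if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * hits / n

estimate = approx_pi()   # should land near 3.1416
```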

