
Empirical Finance

7. Appendix (not examinable): Monte-Carlo (MC) methods

by K. M. Abadir

Three cases illustrating simulation experiments:

Case 1. Suppose you come up with a new procedure (e.g. estimation method).
Before you try it out on the data, you need to verify how well it will perform
in a controlled environment where you know the true state of affairs.
If it breaks down in this case (e.g. big bias of the estimator), you can either
modify it or think about an alternative approach. (The spurious regression
problem was discovered this way.)
If it works in a variety of possible situations, then you can take it to the data
and start applying it.
Case 2. Suppose you have a quantity of interest to calculate; e.g. an integral
(the price of a European option is an expectation, hence an integral).
You may be able to work it out analytically, but the real world is more com-
plicated than the simplified analytical models that are typically tractable.
If the required quantity involves a random variable (e.g. options again), then
you can simulate the possible outcomes by generating random numbers. For
example, we generated two random walks or approximate Brownian motions
in Lecture 3 when we analyzed spurious correlations.
Case 3. A random variable will have more than one possible outcome, so it
is important to be able to quantify the variability or sensitivity that arises.
For example, in some ideal cases, we know the distribution of the estimators.
However, in most models the distribution contains unknown components.
Example 1: {y_i}_{i=1}^n ∼ IN(μ, σ²), i.e. y_i = μ + ε_i where {ε_i} ∼ IN(0, σ²).
The sample mean ȳ is an unbiased estimator of the population mean μ, with
var(ȳ) = σ²/n where σ² is unknown (the same is true for the generalization of
this simple model to y = Xβ + ε, whose MLE has var(β̂) = σ²(X′X)⁻¹).

Using σ̂ in the t-ratio, asymptotic normality holds in large samples. However,
it is a poor approximation when n is small (see Student’s t tables), and it is
even more problematic if ε is not normal.
Example 2: from a random sample of n observations {y_i} with density f (not
necessarily normal), the sample median ỹ has variance 1/(4n f(m)²) where m is
the population median.
In this example, we have the additional complication of how to calculate f(m).
(NP density estimation can give the answer, if we are willing to make further
assumptions on f. Also, the results will depend on the bandwidth.)
The applications in this lecture broadly follow these three cases, but first we need
to introduce simulation methods and evaluate their efficiency in Section A.
A.1. Generating pseudorandom numbers
Monte-Carlo (MC) simulations attempt to answer questions like these, by gen-
erating sequences of numbers that seem random. It is not the same as calibra-
tion, where values are assigned instead of being generated randomly.
MC generates pseudo-random numbers, i.e. numbers that are generated de-
terministically (nonrandomly) but seem random and pass the usual tests for
randomness. This is what the slot machines in Monaco and Las Vegas use!
For example, variates {x_j}_{j=1}^R (where R is called the number of replications or
repetitions) that look uniformly distributed on 0, 1, ..., m − 1 can be generated by
x_{j+1} = mod(a x_j + c, m),
where x_0 is the seed of the sequence and mod(·,·) is the modulo function; e.g.
mod(13, 5) = 3, the remainder of the division of 13 by 5. Bad choices of a, c, m
lead to graphs like the ones overleaf, but today’s software is much better.
Figure 1: Plot of x_{j+1} vs. x_j, for two generators; Griffeath (1992, Stat Sci).
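To make the recursion concrete, here is a minimal sketch of such a linear congruential generator in MATLAB (the constants a, c, m below are classical illustrative choices, not necessarily the ones behind Figure 1):

% Minimal linear congruential generator: x_{j+1} = mod(a*x_j + c, m).
m = 2^31 - 1;  a = 16807;  c = 0;      % illustrative constants
R = 1000;                              % number of replications
x = zeros(R,1);  x(1) = 12345;         % seed x_0
for j = 1:R-1
    x(j+1) = mod(a*x(j) + c, m);
end
u = x/m;                               % rescaled to look U(0,1)
plot(u(1:end-1), u(2:end), '.')        % plot u_{j+1} vs. u_j, as in Figure 1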
For any continuous c.d.f. F(x), the function F with random argument x, i.e. F(x),
is uniformly distributed over the interval (0, 1); e.g. for x a logistic variate,
1/(1 + exp(−x))
has a standard uniform distribution as x varies randomly.

Proof. Define the random U ≡ F(X), which has realizations u ≡ F(x) ∈ (0, 1).
The change-of-variable formula of calculus can be simplified here: first,
F(x) ≡ Pr(X ≤ x) = Pr(F(X) ≤ F(x)) ≡ Pr(U ≤ u) ≡ G(u),
where the equality is because F is nondecreasing; then differentiating both
sides w.r.t. x gives
dF(x)/dx = (dG(u)/du)·(du/dx) ⟺ g(u) = f(x) / (du/dx),
which actually links g to f for any du/dx > 0, more generally than just in
our case. For our u ≡ F(x), we get the p.d.f. of u as
g(u) ≡ f(x) / (dF(x)/dx) = f(x)/f(x) = 1,
so a standard (since u ∈ (0, 1)) uniform where all outcomes are equally likely.
Implication for MC: if we have a uniform u (as we do from the previous slides),
we can generate any x-variate having c.d.f. F from x = F⁻¹(u).
For example, let x ∈ ℝ₋ have the c.d.f.
F(x) = eˣ for x < 0.
The variate −x ∈ ℝ₊ is known as a standard exponential. Now,
F⁻¹(u) = log(u).
If we generate u as a standard uniform, the previous paragraph says that log(u)
has the same distribution as x; i.e. −log(u) ∈ ℝ₊ has a standard exponential
distribution. To generate an exponential with variance 1/λ², use (−log(u))/λ.
These ideas can be illustrated in the MATLAB programme “expo.m”. In-
crease the number of replications to get a more accurate sample (i.e. thorough
representation) of exponentials.
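A minimal sketch in the spirit of “expo.m” (the actual file is not reproduced here; the code only uses the inverse-transform argument above, with illustrative R and λ):

% Inverse-transform method: -log(u) is standard exponential if u ~ U(0,1).
R      = 10000;                 % number of replications
u      = rand(R,1);             % standard uniforms
x      = -log(u);               % standard exponentials (mean 1, variance 1)
lambda = 2;                     % dividing by lambda gives variance 1/lambda^2
y      = x/lambda;
fprintf('sample mean %.3f, sample variance %.3f\n', mean(y), var(y));
histogram(y, 50)                % the exponential shape becomes clearer as R grows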
Let 1 and 2 be a random sample from the standard uniform distribution.
The Box-Müller transformation,
µ ¶ µ ¶
1 12 cos(22)
≡ (−2 log(1))
2 sin(22)
generates 1 and 2 as independent N(0,1) variates.
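A quick hedged sketch of the transformation (illustrative R; not lecture code):

% Box-Müller: two independent U(0,1) -> two independent N(0,1).
R  = 10000;
u1 = rand(R,1);  u2 = rand(R,1);
r  = sqrt(-2*log(u1));                  % radius
z1 = r.*cos(2*pi*u2);                   % first N(0,1) variate
z2 = r.*sin(2*pi*u2);                   % second N(0,1) variate, independent of z1
c  = corrcoef(z1, z2);
fprintf('means %.3f %.3f, variances %.3f %.3f, corr %.3f\n', ...
        mean(z1), mean(z2), var(z1), var(z2), c(1,2));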
The p.d.f. of a standard Cauchy is (with slow non-exponential decline of the tails)
f(x) = 1/(π(1 + x²)).
It can be generated as a ratio of two independent N(0,1), which explains why
we expect to see it often in finance where we study ratios. Alternatively, the
more basic representation in terms of uniform variates is from the Cauchy c.d.f.
F(x) = (1/π) cot⁻¹(−x) = 1 − (1/π) cot⁻¹(x), (1)
and solving u = F(x) here gives
u = (1/π) cot⁻¹(−x) ⇒ x = −cot(πu).
Notice that −x ∼ Cau(0,1) if x ∼ Cau(0,1). Similarly, 1 − u is also standard
uniform, and the second expression of (1) plugged into 1 − u = F(x) gives
1 − u = 1 − (1/π) cot⁻¹(x) ⇒ x = cot(πu).
The MATLAB file “cau.m” (see also “cau2.m”) illustrates the fat-tailed Cauchy
distribution, and how often extremes arise. It also illustrates that the mean
from a Cauchy refuses to settle down to a single value as n → ∞: the LLN
fails (and so does the CLT). Contrast this with the normal, where the sample
mean is N(μ, 1/n), hence tends to μ (it is 0 in this example) as n → ∞.
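A sketch in the spirit of “cau.m” (illustrative sample size; not the actual file), generating Cauchy draws from the inverse c.d.f. in (1) and comparing running means:

% Cauchy variates via the inverse c.d.f.: x = -cot(pi*u), u ~ U(0,1).
n  = 10000;
u  = rand(n,1);
xc = -cot(pi*u);                       % standard Cauchy draws
xn = randn(n,1);                       % standard normal draws for contrast
t  = (1:n)';
plot(t, cumsum(xc)./t, t, cumsum(xn)./t)
legend('running mean, Cauchy', 'running mean, normal')
% The Cauchy running mean keeps jumping (LLN fails); the normal one settles at 0.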
The MATLAB file “unif.m” shows the distribution of the sample mean ȳ when
the data {y_i} are uniformly distributed. The mean of two independent uniforms,
(u₁ + u₂)/2, has a triangular density. Very rapidly, as n increases, we get the
normal distribution for ȳ, and the CLT applies.
Recall that {y_i}_{i=1}^n ∼ IID(μ, σ²), with σ² < ∞, gives
ȳ ∼ D(μ, σ²/n) exactly and ȳ ∼ N(μ, σ²/n) asymptotically.
In “unif.m”, we can replace the uniform density with the asymmetric exponential,
and we would still have the convergence to normality as n → ∞.
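A sketch along the lines of “unif.m” (illustrative parameters; not the actual file):

% CLT illustration: distribution of the sample mean of n uniforms over R replications.
R    = 10000;  n = 2;                     % try n = 2, 5, 30, ...
u    = rand(n, R);                        % each column is one sample of size n
ybar = mean(u, 1);                        % R sample means
histogram(ybar, 50)                       % triangular for n = 2, close to normal as n grows
% The CLT approximation here is N(0.5, 1/(12n)), since var of a U(0,1) is 1/12.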
A.2. Variance reduction techniques (VRTs)
You may need to generate a lot of random numbers before you’ve got a thorough
representation of a variate like ȳ. And it becomes even harder when you
need to generate a long time-path of a process, and its replications that reflect
all possible routes; e.g. in pricing exotic options.
An alternative is to reduce the variability of the experiment by a judicious
choice of generated numbers, the so-called variance reduction techniques (VRTs).
In our financial application, we will require the evaluation of integrals. We will
therefore illustrate VRTs for the calculation of integrals. Define
I ≡ ∫_{x∈X} h(x) dx.
We can write this as an expectation with respect to a random variable x, having
realizations in X and density p(x), as
I ≡ ∫_{x∈X} h(x) dx = ∫_{x∈X} (h(x)/p(x)) p(x) dx = E_p(h(x)/p(x)),
where we stress that the expectation is taken w.r.t. p (hence the subscript of
E). Let us start with the sampling, then we will consider VRTs.
We can generate R independent and identically distributed x_j, j = 1, ..., R
(the R replications), then approximate I by the MC-sample average
(1/R) Σ_{j=1}^R h(x_j)/p(x_j),
which is an unbiased estimator. (Note h ≠ p in general.) By the i.i.d. property
of {x_j}, the variance of this estimator of I is
(1/R) var(h(x)/p(x)). (2)
There are two ways to reduce the variance in (2). First, importance sampling: a ‘good’ p
to choose would be proportional to h (because var(constant) = 0), representing
the relative frequencies of h(·) accurately; e.g. take p to be the exponential
density if h declines exponentially, normal if h is bell-shaped, etc.
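A minimal sketch of the idea (the integrand h and the two candidate densities below are illustrative choices of mine, not from the lecture): both estimators are unbiased for I, but the better-matched density should give a visibly smaller variance of the terms being averaged.

% Importance sampling: I = int_0^Inf h(x) dx with two choices of p.
rng(1);
R  = 1e5;                                 % number of replications
h  = @(x) exp(-x)./(1+x.^2);              % integrand on (0, Inf)
% Candidate 1: p1 = Expo(1), proportional to the declining part of h
x1 = -log(rand(R,1));                     % inverse-transform exponentials
w1 = h(x1)./exp(-x1);                     % h(x)/p(x)
% Candidate 2: p2 = Expo(0.2), a poor match to h
x2 = -log(rand(R,1))/0.2;
w2 = h(x2)./(0.2*exp(-0.2*x2));
fprintf('I-hat (well-matched p): %.4f,  var of terms: %.4f\n', mean(w1), var(w1));
fprintf('I-hat (poorly matched): %.4f,  var of terms: %.4f\n', mean(w2), var(w2));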
Second, how can we improve on our estimate of I, given that we have calculated
a number R of MC repetitions? Generating them takes time, so it is good if
we can find a simple way to save on how many x_j we need to generate.
From (2), if a VRT yields an efficiency gain (variance reduction) of e.g. 2, it
allows us to achieve the same accuracy by generating only half the replications.
The first VRT that we consider is based on the following idea. If p is a density
that is symmetric around 0, and we have generated
x₁ = 1.5, x₂ = 1.1, x₃ = −2.3, x₄ = 1.2, x₅ = −1.5, x₆ = 0.1,
then we know (by the symmetry of the probability density p) that
x₇ = −1.5, x₈ = −1.1, x₉ = 2.3, x₁₀ = −1.2, x₁₁ = 1.5, x₁₂ = −0.1
are equally likely (identically distributed to the first lot, but not independent of
it). The additional ‘replications’ are called antithetic variates. The new combined
estimator
(1/R) Σ_{j=1}^R [h(x_j)/p(x_j) + h(−x_j)/p(−x_j)]/2
has the same expectation as before, but instead of (2), its variance is
(1/R) var((1/2)[h(x)/p(x) + h(−x)/p(−x)])
= (1/(4R)) var(h(x)/p(x)) + (1/(4R)) var(h(−x)/p(−x)) + (1/(2R)) cov(h(x)/p(x), h(−x)/p(−x))
= (1/(2R)) var(h(x)/p(x)) + (1/(2R)) cov(h(x)/p(x), h(−x)/p(−x)), (3)
by symmetry. If cov = 0, then we have reduced the variance of our estimator
of I by a factor of 2. If cov < 0, that’s even better! In general, there is a
reduction of the variance if cov < var.
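A hedged sketch of the antithetic idea (the choice of integrand is illustrative, not from the lecture; here cov < 0, so the reduction beats a factor of 2):

% Antithetic variates: estimate I = E(exp(x)) with x ~ N(0,1); true value exp(0.5).
rng(1);
R  = 10000;
x  = randn(R,1);                          % draws from the symmetric density p = N(0,1)
g  = exp(x);                              % h(x)/p(x) in the notation above
ga = (exp(x) + exp(-x))/2;                % antithetic combination using -x as well
fprintf('plain:      %.4f (var of terms %.4f)\n', mean(g),  var(g));
fprintf('antithetic: %.4f (var of terms %.4f)\n', mean(ga), var(ga));
fprintf('true value: %.4f\n', exp(0.5));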
The second VRT is based on the idea that we may be able to control the
accuracy of our calculation if we can find an integral with a known solution
I₀ ≡ ∫_{x∈X} h₀(x) dx
such that I₀ ≈ I. More specifically,
I = ∫_{x∈X} [h₀(x) + (h(x) − h₀(x))] dx = I₀ + ∫_{x∈X} (h(x) − h₀(x)) dx,
and we call h₀(x) a control variate; e.g. think of I₀ as the Black-Scholes price of
a European call option (known explicitly: see Section C below) and I the price
of a corresponding American option. Estimate I from the MC replications by
I₀ + (1/R) Σ_{j=1}^R (h(x_j) − h₀(x_j))/p(x_j),
where I₀ is known. The estimator is unbiased, with variance
(1/R) var((h(x) − h₀(x))/p(x))
= (1/R) var(h(x)/p(x)) + (1/R) var(h₀(x)/p(x)) − (2/R) cov(h(x)/p(x), h₀(x)/p(x)),
which is less than before if var(h₀(x)/p(x)) < 2 cov(h(x)/p(x), h₀(x)/p(x)).
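A minimal sketch of a control variate at work (the integrand, the control h₀ and its known integral I₀ are illustrative choices, not from the lecture):

% Control variate: I = int_0^1 exp(x) dx (true value e - 1), with control h0(x) = 1 + x
% whose integral I0 = 3/2 over (0,1) is known.
rng(1);
R     = 10000;
x     = rand(R,1);                 % draws from p = U(0,1), so h(x)/p(x) = h(x)
h     = exp(x);
h0    = 1 + x;                     % control variate, close to h on (0,1)
I0    = 3/2;                       % known integral of h0
plain = mean(h);
ctrl  = I0 + mean(h - h0);
fprintf('plain %.4f (var %.5f), control %.4f (var %.5f), true %.4f\n', ...
        plain, var(h), ctrl, var(h - h0), exp(1) - 1);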
A.3. Response surfaces
We have already seen one way to summarize the outcome of the MC experiment,
by histograms. But what if we are estimating a quantity that depends
on, e.g., n (sample size) and θ (vector of parameters)?
We need to design our experiment for a representative range of n and θ, then
summarize it in a response surface, a function of n and θ that can be extrapolated
to other values of n and θ.
For example, let {y_t}_{t=1}^n be generated by the AR(1),
y_t = ρ y_{t−1} + ε_t, (4)
where {ε_t}_{t=1}^n ∼ IN(0, σ²) and y₀ = 0. The MLE
ρ̂ ≡ (Σ_{t=2}^n y_t y_{t−1}) / (Σ_{t=2}^n y²_{t−1})
is biased, but the bias vanishes as n → ∞. We will show in Section B that the
bias is negative: done by generating R replications of the whole path {y_t}_{t=1}^n,
each for the same n and ρ, then calculating the average of the R values of ρ̂. (Also,
plot the histogram of ρ̂ to see how far its distribution is from the normal.)
A response surface would summarize how this bias changes as n and ρ change,
in this case taking the form
bias(ρ̂) ≈ c₁ ρ/n + error, (|ρ| < 1),
where we can estimate c₁ by regression of the MC estimate of bias(ρ̂) on the
corresponding ρ/n. This helps:
1. project the bias for any n and |ρ| < 1, not just the few values that have
been calculated;
2. summarize many numerical results into a simple formula.
A more general form would involve higher powers of ρ and n⁻¹, with corresponding
coefficients c₁, c₂, ... to estimate.
Note: this response surface should not contain σ, since ρ̂ does not depend on σ:
ρ̂ = (Σ_{t=2}^n (y_t/σ)(y_{t−1}/σ)) / (Σ_{t=2}^n (y_{t−1}/σ)²).
Before we finish this section, we warn that response surfaces are inherently
heteroskedastic, so one should use HCSE to judge the significance of the fitted
parameters. Also, appropriate functional forms should be considered, such as
the logit transformation log(p/(1 − p)) for probabilities p.
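To see how such a response surface can be fitted, here is a minimal sketch (the grid of (n, ρ), the number of replications and the use of plain OLS are my illustrative choices; for proper inference one would use HCSE, as just noted):

% Response surface for the AR(1) bias: simulate bias(rho-hat) on a grid of (n, rho)
% and regress it on rho/n.
rng(1);
R     = 2000;                                % MC replications per design point
nGrid = [25 50 100 200];  rhoGrid = [0.3 0.6 0.9];
bias  = [];  reg = [];
for n = nGrid
  for rho = rhoGrid
    rhohat = zeros(R,1);
    for j = 1:R
      e = randn(n,1);  y = zeros(n,1);  y(1) = e(1);       % y_0 = 0
      for t = 2:n, y(t) = rho*y(t-1) + e(t); end
      rhohat(j) = sum(y(2:n).*y(1:n-1)) / sum(y(1:n-1).^2);
    end
    bias = [bias; mean(rhohat) - rho];       % MC estimate of the bias
    reg  = [reg;  rho/n];                    % regressor rho/n
  end
end
c1 = reg \ bias;                             % OLS through the origin
fprintf('estimated c1 in bias(rho-hat) ~ c1*rho/n: %.2f\n', c1);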
B. Simulating the performance of estimators and tests

The MATLAB file “ar1.m” calculates the bias and distribution of the MLE in
the AR(1) model. Try out a few values of n and ρ, and see what you get!
Note that bias(ρ̂) → 0 (see the previous slide for the rate at which it happens)
and ρ̂ ∼ N(ρ, const./n) approximately when n → ∞ under |ρ| < 1.
The MATLAB file “ar1quant.m” calculates the 5% quantile of the distribution
of the t-ratio for H₀: ρ = 1 in the AR(1) model. It also plots the histogram
of the t-ratio, showing that it is not symmetric and that it differs from the
usual Student t whose limit is the normal. Note that you can now obtain any
quantiles you need, not just the p = 2.5, 5.0, ... that are in Dickey-Fuller tables.
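A sketch in the spirit of “ar1quant.m” (not the actual file; n and R are illustrative):

% 5% quantile of the t-ratio for H0: rho = 1 in the AR(1) without intercept.
rng(1);
R = 20000;  n = 100;  tstat = zeros(R,1);
for j = 1:R
  e = randn(n,1);  y = cumsum(e);                      % random walk (rho = 1, y_0 = 0)
  ylag   = y(1:n-1);  ycur = y(2:n);
  rhohat = sum(ycur.*ylag)/sum(ylag.^2);
  s2     = sum((ycur - rhohat*ylag).^2)/(n-2);         % residual variance
  tstat(j) = (rhohat - 1)/sqrt(s2/sum(ylag.^2));
end
ts = sort(tstat);
fprintf('5%% quantile of the t-ratio: %.2f\n', ts(round(0.05*R)));
histogram(tstat, 60)     % clearly skewed relative to the Student t / normal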
The MATLAB file “ar1pow.m” calculates the power of the same test, for a 5%
size (Type I error) when H₁: ρ = 0.9 is true. As you can see, the power of
about 16% is very low! This inability to distinguish competing hypotheses in
the neighborhood of 1 is a typical problem of unit-root analysis.
See also earlier simulations of the sample mean.
C. Simulating option prices
The MATLAB file “RW.m” generates a random walk (RW), and plots its path.
The RW is the discrete-time equivalent of the Brownian motion (BM). Notice
a number of features:
• The reflection principle of BM means that, at any point in time, the path
could have gone in the opposite direction. An antithetic technique could use
this to generate extra paths from the first one.
• If x(t) is a driftless BM, dx(t) ≡ σ dW(t), then S(t) ≡ e^{x(t)} is a geometric
BM (GBM) with positive drift
dS(t) = (σ²/2) S(t) dt + σ S(t) dW(t),
by Itô’s formula (see also the previous lecture for the extra σ²/2 term).
As in the previous section, we can use MC simulations to generate a representative
set of trajectories for S(t). These can be used to price European options,
which are just a truncated expectation of S(T) at the terminal point.
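A hedged sketch of such trajectories (the parameters are illustrative, and the log-price here is given drift r − σ²/2 so that S has drift r, the risk-neutral case used for pricing below):

% Simulating GBM paths by exponentiating a discretised Brownian motion with drift.
rng(1);
S0 = 100;  r = 0.05;  sigma = 0.2;  T = 1;     % initial price, drift, volatility, horizon
n  = 250;  R = 20;                             % time steps and number of paths
dt = T/n;
dW = sqrt(dt)*randn(n, R);                              % Brownian increments
X  = cumsum((r - sigma^2/2)*dt + sigma*dW, 1);          % cumulative log-returns
S  = S0*exp([zeros(1,R); X]);                           % GBM paths, S(0) = S0
plot(0:dt:T, S)                                         % a representative set of trajectories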
Let us analyze this pricing in more detail.
Let S_t be the price of an asset at time t. Assume, for the moment, that the
asset does not pay dividends. Suppose that this asset is underlying a European
call option with expiration date T and strike price K. Then, the intrinsic value
of this option at expiration is max{S_T − K, 0}. In an arbitrage-free economy,
there exists a risk-neutral density (RND) f such that the price of the call
option at time t can be written as
C_t(K) = e^{−r(T−t)} E(max{S_T − K, 0}) = e^{−r(T−t)} ∫_K^∞ (s − K) f(s) ds,
where r is the continuously-compounded risk-free interest rate. (Note: C_t is a
declining function of K, at any given time t.) Differentiating the integral gives
dC_t(K)/dK = −e^{−r(T−t)} ∫_K^∞ f(s) ds ≡ −e^{−r(T−t)} (1 − F(K)),
where F is the c.d.f. corresponding to the p.d.f. f. The second derivative is
given by
d²C_t(K)/dK² |_{K=s} = e^{−r(T−t)} f(s),
which reveals the required density f.
The Black-Scholes and Merton models assume that S_T is log-normal (LN), i.e.
that log(S_T) is normal. Assume that x ≡ S_T/S_t ∼ LN((r − σ²/2)τ, σ²τ), where
τ ≡ T − t is the time to maturity, σ > 0, and we know S_t at time t; i.e. the
continuous-time S(τ) follows a geometric Brownian motion with initial value
S(0) ≡ S_t and drift r. Then,
C_t(K) = S_t Φ(d₁) − K e^{−rτ} Φ(d₂), (5)
where Φ is the standard normal c.d.f.,
d₁ = log(S_t/(K e^{−rτ}))/(σ√τ) + σ√τ/2 and d₂ = d₁ − σ√τ,
giving C_t(K) a decreasing function of K. If the asset pays a dividend yield of r†,
then it should be subtracted from r and the RHS of (5) multiplied by e^{−r†τ}.
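A minimal sketch evaluating (5) (the inputs are illustrative; Φ is written via erf so that no toolbox is needed). This is the benchmark against which the MC prices of Exercise 3 can be checked.

% Black-Scholes call price from formula (5).
St = 100;  K = 100;  r = 0.05;  sigma = 0.2;  tau = 0.5;      % example inputs
Phi = @(x) 0.5*(1 + erf(x/sqrt(2)));                          % standard normal c.d.f.
d1  = log(St/(K*exp(-r*tau)))/(sigma*sqrt(tau)) + sigma*sqrt(tau)/2;
d2  = d1 - sigma*sqrt(tau);
C   = St*Phi(d1) - K*exp(-r*tau)*Phi(d2);                     % formula (5)
fprintf('Black-Scholes call price: %.4f\n', C);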
If the RND is indeed as assumed here (i.e. LN), MC is not going to provide any
new insight, since we have the pricing formula (5). You may wish to use MC to
price barrier options, but again there are formulae for the hitting (or passage)
times of the BM in terms of inverse Gaussian distributions. MC comes in
handy for other exotic options.
It turns out that real life is more elaborate than LN, maybe a mixture of LNs
but possibly other forms. Some typical RNDs are given in the following graphs.
For MC pricing, you need to know which RND to simulate from.

Figure 2: Plot of 2 RNDs (general vs. LN) for 2 maturities.


D.1. Resampling by the Jackknife

In Lecture 1, we saw the idea of resampling without replacement, and we called
it cross-validation. However, the idea is older than that: it was used by
Quenouille back in 1949, then by Tukey in 1958, who called it the Jackknife.
It is a way to get a feel for the sensitivity of the estimates to any observation in
the data set, and it provides a way to estimate this variability and/or correct
for systematic deviations (e.g. biases); see the exercise at the end.
It is a precursor to the next resampling technique that has become much more
influential.
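A quick hedged sketch of the delete-one jackknife (the data and the choice of the sample mean as estimator are illustrative; for the mean the jackknife variance reproduces the familiar s²/n exactly):

% Delete-one jackknife for the sample mean: sensitivity and variance estimate.
rng(1);
y  = randn(30,1);                      % any i.i.d. sample
n  = numel(y);
th = zeros(n,1);
for i = 1:n
  yi    = y([1:i-1, i+1:n]);           % drop the i-th observation
  th(i) = mean(yi);                    % recompute the estimator
end
jackvar = (n-1)/n * sum((th - mean(th)).^2);   % jackknife variance estimate
fprintf('mean %.3f, jackknife variance %.5f, s^2/n %.5f\n', mean(y), jackvar, var(y)/n);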
D.2. Resampling by the bootstrap

We now consider a method of resampling with replacement. Allowing replacement
means that we can generate as many samples as we want (not limited by
C(n, m) as in the Jackknife that removes m observations at a time). It treats the
i.i.d. sample as if it were the population.
Suppose we go back to Example 2 of Case 3 (see slide 3).
The sample y₁, ..., y_n is a random drawing (n times) from a variate y having
density f, and leads to the sample median ỹ, but we don’t know its variance.
The bootstrap generates B samples, each of size n, by drawing n observations
with replacement from {y_i}_{i=1}^n. This generates B medians, {ỹ*_b}_{b=1}^B, and we
can now estimate the variance of ỹ by
var*(ỹ) ≡ (1/B) Σ_{b=1}^B (ỹ*_b − ȳ*)²,
where ȳ* is the average of the bootstrap samples’ medians ỹ*₁, ..., ỹ*_B.
Let us illustrate...
Figure 3: “Pulling yourself by the bootstrap”; illustration in the Dec.04 issue
of Significance.
The teacher generates a sample of size n = 20 from the logistic p.d.f.
f(x) = exp(−x)/(1 + exp(−x))² = 1/(exp(x/2) + exp(−x/2))²,
whose c.d.f. is
F(x) = 1/(1 + exp(−x)),
with the logit as its inverse, F⁻¹(u) = log(u/(1 − u)). This symmetric density
implies that the median is zero and
var(ỹ) = 1/(4n f(0)²) = 4/n = 0.2,
but the teacher didn’t tell you any of this! Now suppose he gave you this
sample (you don’t know where it’s coming from) and asked you to bootstrap
it to get an estimate of the variance.
In the MATLAB file “BootLogitMed.m”, B = 100 gives a bootstrap estimate
of the variance that is not bad but is a bit volatile. Increasing B to 1,000 gives
a fairly stable estimate. This stable estimate is usually also accurate.
This was an example of a nonparametric bootstrap.
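A sketch in the spirit of “BootLogitMed.m” (not the actual file; the logistic sample is generated inside the code only so that it runs):

% Nonparametric bootstrap of the sample median.
rng(1);
n = 20;
u = rand(n,1);  y = log(u./(1-u));            % a logistic sample via the inverse c.d.f. (logit)
B = 1000;                                     % bootstrap replications
med = zeros(B,1);
for b = 1:B
  idx    = randi(n, n, 1);                    % draw n indices with replacement
  med(b) = median(y(idx));
end
fprintf('sample median %.3f, bootstrap variance %.4f (theory above: 4/n = %.2f)\n', ...
        median(y), mean((med - mean(med)).^2), 4/n);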
Suppose we wrote y_i = μ + ε_i, then estimated μ̂ (hence the ε̂_i) from the sample.
We could generate as many new series {y*_i} as before, but now drawing from
the rescaled
ε̃_i ≡ ε̂_i / √(1 − 1/n),
then adding μ̂ to each bootstrapped {ε̃_i} value. The denominator is due
to the fact that var(ε̂_i) = (1 − 1/n) σ² < var(ε_i). In general, this factor will
depend on i, so you will also need to recenter these rescaled residuals. (If
a_i ≡ (0, ..., 0, 1, 0, ..., 0)′, with a one in the i-th position and zeros elsewhere,
then var(ε̂_i) = σ²(1 − a_i′X(X′X)⁻¹X′a_i) for the linear model y = Xβ + ε.)
We now have B samples of {y*_i}_{i=1}^n but, because the new y*_i values depend
on the initial parameter estimate μ̂, this is called a parametric bootstrap. Another
form of parametric bootstrap is to generate data from N(μ̂, σ̂²/(1 − 1/n)),
assuming that the data were indeed normal.
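A minimal sketch of the residual (parametric) bootstrap described above for the location model (the data are illustrative; in this model the rescaling factor does not depend on i, so no recentering is needed):

% Residual bootstrap for y_i = mu + eps_i.
rng(1);
n    = 20;  y = randn(n,1) + 1;               % any sample; mu-hat below is its mean
muh  = mean(y);
ehat = y - muh;
etil = ehat/sqrt(1 - 1/n);                    % rescale so the variance matches var(eps)
B    = 1000;  muhstar = zeros(B,1);
for b = 1:B
  ystar      = muh + etil(randi(n, n, 1));    % resample rescaled residuals, add mu-hat back
  muhstar(b) = mean(ystar);
end
fprintf('bootstrap var of mu-hat %.4f vs s^2/n %.4f\n', var(muhstar), var(y)/n);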
The bootstrap does not work in every single situation, but it works generally
well. In addition to the previous illustration, it gives accurate small-sample
results when the asymptotic theory (n → ∞) is not a good approximation.
Cases where the bootstrap is known to fail include unit roots (ρ = 1 in the
AR(1)) and/or fat-tailed distributions such as the Cauchy.
One of the cases where the bootstrap requires extra care: dependent data (e.g.
time series), especially when the dependence structure is unknown.
The dependence structure may be destroyed by the resampling!
One of the tricks is to use the block-bootstrap, where blocks of data are resampled,
rather than individual observations.
This works if there is not too much persistence in the data, and the blocks are
sufficiently large to preserve this persistence within blocks. The blocks also
need to be sufficiently numerous, so there is a balancing choice to be made
about their size.
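A hedged sketch of a moving-block bootstrap (the AR(1) data, the block length L, and the choice of the sample mean as the statistic are all illustrative):

% Moving-block bootstrap for a dependent series.
rng(1);
n = 200;  rho = 0.5;  e = randn(n,1);
y = filter(1, [1 -rho], e);                   % AR(1) data, y_t = rho*y_{t-1} + e_t
L = 10;  nblk = ceil(n/L);                    % block length and blocks per resample
B = 1000;  mstar = zeros(B,1);
for b = 1:B
  starts = randi(n - L + 1, nblk, 1);         % random block starting points
  idx    = bsxfun(@plus, starts, 0:L-1)';     % L x nblk matrix of block indices
  idx    = idx(:);                            % stack the blocks
  ystar  = y(idx(1:n));                       % concatenate blocks, trim to length n
  mstar(b) = mean(ystar);                     % bootstrap the sample mean, say
end
fprintf('block-bootstrap s.e. of the sample mean: %.4f\n', std(mstar));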
E. Exercises
Exercise 1 (Jackknifing θ̂) Assume that the bias of an estimator θ̂ based on a
random sample of size n has the asymptotic expansion
bias(θ̂) = b₁(θ)/n + b₂(θ)/n² + o(1/n²)
for some nonrandom functions b₁ and b₂ not depending on n. Suppose that we
eliminate one observation at a time from the sample, say the i-th observation,
and recalculate the estimator, now labelled as θ̂_{−i}.
(a) Prove that there are n values of θ̂_{−i}.
(b) Denoting the average of these values by θ̂_{−1} ≡ (1/n) Σ_{i=1}^n θ̂_{−i}, prove that
E(θ̂_{−1}) = θ + b₁(θ)/(n − 1) + b₂(θ)/(n − 1)² + o(1/n²).
(c) Let the Quenouille-Tukey jackknife estimator be defined by
θ̃₁ ≡ n θ̂ − (n − 1) θ̂_{−1}.
Show that the O(1/n) term that was in the bias of θ̂ has now disappeared from
the bias of θ̃₁, and that E(θ̃₁) = θ + o(1/n).
Exercise 2 (Prog) Go through the provided MATLAB files, altering the parameters
of the simulations.
Exercise 3 (Prog too) Write a programme to simulate a GBM R times. Use
it to price a European call option when t = 0, and check that the result matches
the theoretical formula (5). Now alter the generating distribution of increments
to a fat-tailed t distribution (or asymmetric exponential) instead of a normal,
and redo the comparison between MC and the Black-Scholes formula. Finally,
create your own option (not a European one), and use simulations to price it.
Overall, how sensitive are your results to the choice of R and n?
Exercise 4 (Boot) The teacher generates a series of n = 10 observations from
an AR(1) with ρ = 0.9 and gives it to you. (Alternatively, you have generated
the series but then forgotten what ρ was!) Use it to obtain a bootstrap distribution
of the MLE ρ̂, and show that it looks more like the unit-root approximation
than the limiting normal that arises for |ρ| < 1. Redo the exercise, increasing
n to see how slowly the limiting distribution converges to the asymptotic
normal.
Solution 1. (a) For selecting n − 1 observations out of a sample of size n,
there are C(n, n − 1) = C(n, 1) = n possible combinations of the observations (the
sample is i.i.d. and therefore the order of the observations does not matter).
(b) From the definition of θ̂_{−1} and the given expansion of bias(θ̂),
E(θ̂_{−1}) = (1/n) Σ_{i=1}^n E(θ̂_{−i}) = (1/n) Σ_{i=1}^n (θ + b₁(θ)/(n − 1) + b₂(θ)/(n − 1)² + o(1/n²)),
since each of the θ̂_{−i} is based on n − 1 observations, rather than n, and
o(1/(n − 1)²) = o(1/n²). We get the required result by noting that none of the
summands depend on i.
(c) From the definition of θ̃₁,
E(θ̃₁) = n E(θ̂) − (n − 1) E(θ̂_{−1})
= n (θ + b₁(θ)/n + b₂(θ)/n² + o(1/n²)) − (n − 1) (θ + b₁(θ)/(n − 1) + b₂(θ)/(n − 1)² + o(1/n²))
= θ + b₂(θ) (1/n − 1/(n − 1)) + o(1/n)
= θ − b₂(θ)/(n(n − 1)) + o(1/n)
= θ + o(1/n).

One should be careful in interpreting this result: it is the order of the bias that
is reduced (hence bias reduction in ‘large’ samples) but nothing is said about
what happens to the bias when n is small.
