Conditional Heteroscedasticity
May 30, 2010
Junhui Qian
1 Introduction
ARMA(p,q) models dictate that the conditional mean of a time series depends on past observations of the series and past innovations. Letting µ_t = E(X_t|F_{t−1}), we have, for an ARMA(p,q) process,
µt = a1 Xt−1 + · · · + ap Xt−p + b1 εt−1 + · · · + bq εt−q .
If we assume εt ∼ i.i.d. with zero mean and finite variance, then the conditional variance
of Xt is a constant, regardless of the order p or q,
var(Xt |Xt−1 , Xt−2 , ...) = var(εt ) < ∞.
In this chapter we relax this constraint and consider time-varying conditional variance.
2 ARCH and GARCH Models
To introduce time-varying conditional variance to the model, we write
Xt = µt + ωt ,
where µt is the conditional mean as above and ωt is a white noise with time-varying condi-
tional variance (conditional heteroscedasticity). Specifically, we write
ωt = σt εt , (1)
where ε_t is a strong white noise with zero mean and unit variance, i.e., ε_t ∼ iid(0, 1), and σ_t^2 is the conditional variance of X_t, i.e., σ_t^2 = var(X_t|X_{t−1}, X_{t−2}, ...).
For ARCH and GARCH models, σt2 evolves over time in a deterministic manner. For
example, in the simplest ARCH(1) model, σt2 is specified as
σ_t^2 = c + a ω_{t-1}^2, (2)
where c > 0 and a ≥ 0. The positivity of a implies that the probability of a large shock in ω_t is high when there is a big shock in ω_{t−1}. The ARCH(1) model thus describes volatility clustering to some extent.
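This clustering is easy to see by simulation. Below is a minimal Python sketch (the function name, parameter values, and the choice to initialize at the unconditional variance are illustrative, not from the text):

```python
import numpy as np

def simulate_arch1(c, a, T, seed=0):
    """Simulate w_t = sigma_t * eps_t with sigma_t^2 = c + a * w_{t-1}^2
    and eps_t ~ iid N(0, 1), as in the ARCH(1) model (2)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    w = np.empty(T)
    sigma2 = np.empty(T)
    sigma2[0] = c / (1.0 - a)            # start at the unconditional variance
    w[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, T):
        sigma2[t] = c + a * w[t - 1] ** 2
        w[t] = np.sqrt(sigma2[t]) * eps[t]
    return w, sigma2

w, sigma2 = simulate_arch1(c=0.5, a=0.5, T=50000)
# the sample variance of w is close to c/(1 - a) = 1, and w**2 is
# positively autocorrelated even though w itself is a white noise
```

Plotting w shows quiet spells interrupted by bursts of large shocks, which is exactly the clustering described above.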
More generally, we have ARCH(p) model which is specified as
Definition: ARCH(p) Model
σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2, (3)
where c > 0, ai ≥ 0 for all i.
The ARCH(p) component ω_t has the following properties.
(a) Let ηt = ωt2 − σt2 . (ηt ) is a martingale difference sequence, ie, E(ηt |Xt−1 , Xt−2 , ...) = 0.
(b) (ωt2 ) has an AR(p) form.
(c) (σ_t^2) has an AR(p) form with random coefficients,

σ_t^2 = c + ∑_{i=1}^{p} (a_i ε_{t-i}^2) σ_{t-i}^2. (4)
(d) (ω_t) is a white noise with an (unconditional) variance of var(ω_t) = c/(1 − a_1 − · · · − a_p).
(e) The (unconditional) distribution of ωt is leptokurtic.
For the ARCH(1) model in (2) in particular, var(ω_t) = c/(1 − a). Since the variance has to be positive, we must have 0 < a < 1. If we further assume ε_t ∼ iid N(0, 1), we can calculate the fourth moment of ω_t,
E(ω_t^4) = 3c^2 (1 + a) / [(1 − a)(1 − 3a^2)].
And the (unconditional) kurtosis of ωt is thus,
Kurtosis(ω_t) = E(ω_t^4) / [var(ω_t)]^2 = 3 (1 − a^2)/(1 − 3a^2) > 3. (5)
Since the kurtosis of ω_t is greater than 3, the kurtosis of the normal distribution, the tail of the distribution of ω_t is heavier than the normal's, which is to say that large shocks are
more probable for ω_t than for a normal series. Of course, for the kurtosis in (5) to be finite and positive, we must have 1 − 3a^2 > 0, hence a is restricted to [0, √3/3).
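As a quick sanity check, the kurtosis formula in (5) is easy to evaluate numerically; the helper below is our own sketch, not from the text:

```python
def arch1_kurtosis(a):
    """Unconditional kurtosis of an ARCH(1) component with Gaussian
    innovations, from (5): 3(1 - a^2)/(1 - 3a^2), requiring 3a^2 < 1."""
    if 3 * a ** 2 >= 1:
        raise ValueError("fourth moment does not exist for this a")
    return 3 * (1 - a ** 2) / (1 - 3 * a ** 2)

print(arch1_kurtosis(0.3))  # 3 * 0.91 / 0.73 ≈ 3.74, above the normal's 3
```

At a = 0 the formula returns exactly 3, the Gaussian benchmark, and it diverges as a approaches √3/3.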
One weakness of ARCH(p) models is that they may need many lags, i.e., a large p, to fully absorb the correlation in ω_t^2. In the same spirit as the extension from AR to ARMA models, Bollerslev (1986) proposed the GARCH model, which specifies the conditional variance σ_t^2 as follows.
Definition: GARCH(p, q) Model
σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2 + b_1 σ_{t-1}^2 + · · · + b_q σ_{t-q}^2, (6)

where c > 0, a_i ≥ 0, b_i ≥ 0 for all i, and ∑_{i=1}^{max(p,q)} (a_i + b_i) < 1.
It is obvious that the GARCH model is a generalization of the ARCH model: if b_i = 0 for all i, GARCH(p, q) reduces to ARCH(p). The GARCH component ω_t has the following properties.
(a) Let ηt = ωt2 − σt2 . (ηt ) is a martingale difference sequence, ie, E(ηt |Xt−1 , Xt−2 , ...) = 0.
(b) Let r = max(p, q); (ω_t^2) has an ARMA(r, q) form,

ω_t^2 = c + ∑_{i=1}^{r} (a_i + b_i) ω_{t-i}^2 + η_t − ∑_{i=1}^{q} b_i η_{t-i}, (7)

where (a_i) or (b_i) are padded with zeros to have length r if necessary.
(c) (σ_t^2) has an AR(r) form with random coefficients,

σ_t^2 = c + ∑_{i=1}^{r} (a_i ε_{t-i}^2 + b_i) σ_{t-i}^2. (8)
(d) (ω_t) is a white noise, with an (unconditional) variance of var(ω_t) = c/(1 − ∑_{i=1}^{p} a_i − ∑_{i=1}^{q} b_i).
(e) The (unconditional) distribution of ωt is leptokurtic.
GARCH(1,1) is perhaps the most popular model in practice. The conditional variance
is specified as follows,
σ_t^2 = c + a ω_{t-1}^2 + b σ_{t-1}^2, (9)
where c > 0, a > 0, b > 0, and a + b < 1.
If 1 − 2a^2 − (a + b)^2 > 0, the kurtosis of ω_t exists and

Kurtosis(ω_t) = E(ω_t^4) / [var(ω_t)]^2 = 3 [1 − (a + b)^2] / [1 − (a + b)^2 − 2a^2] > 3.
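A GARCH(1,1) component can be simulated in the same way as ARCH; the sketch below (names and parameter values are our illustrative choices) initializes σ_t^2 at the unconditional variance:

```python
import numpy as np

def simulate_garch11(c, a, b, T, seed=0):
    """Simulate w_t = sigma_t * eps_t with
    sigma_t^2 = c + a * w_{t-1}^2 + b * sigma_{t-1}^2, as in (9)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    w = np.empty(T)
    sigma2 = np.empty(T)
    sigma2[0] = c / (1.0 - a - b)        # unconditional variance of w
    w[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, T):
        sigma2[t] = c + a * w[t - 1] ** 2 + b * sigma2[t - 1]
        w[t] = np.sqrt(sigma2[t]) * eps[t]
    return w, sigma2

w, sigma2 = simulate_garch11(c=0.05, a=0.1, b=0.85, T=100000)
# c/(1 - a - b) = 1, and a + b = 0.95 gives slowly decaying
# autocorrelation in w**2, i.e. persistent volatility
```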
3 Identification, Estimation, and Forecasting
Since the ARCH model is a special case of GARCH, we will focus on GARCH hereafter.
3.1 Identification
For all GARCH models, the square of the GARCH component, ω_t^2, is serially correlated. This gives us a test of whether a given process is GARCH: we may simply apply the Ljung-Box test to ω_t^2.
We may also use Engle’s (1982) Lagrange multiplier test. This test is equivalent to the F-test of a_i = 0 for all i in the following regression,

ω_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_m ω_{t-m}^2 + η_t,

where m is a predetermined number.
To determine the order of ARCH(p), we may examine the PACF of ωt2 . If we believe
the model is GARCH(p = 0, q), then we may use the ACF of ωt2 to determine q.
Finally, we may use information criteria such as AIC to determine the order of GARCH(p, q).
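Engle's test above amounts to computing n·R^2 from the auxiliary regression of ω_t^2 on its own lags. A sketch of that computation (the function name is ours; scipy is used only for the chi-square tail probability):

```python
import numpy as np
from scipy.stats import chi2

def arch_lm_test(w, m):
    """Engle's LM test for ARCH effects: regress w_t^2 on m of its own
    lags; LM = n * R^2 is asymptotically chi-square(m) under the null
    of no ARCH effects. Returns (LM, p-value)."""
    w2 = np.asarray(w, dtype=float) ** 2
    y = w2[m:]
    X = np.column_stack(
        [np.ones(len(y))] + [w2[m - i: len(w2) - i] for i in range(1, m + 1)]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    yc = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    lm = len(y) * r2
    return lm, chi2.sf(lm, df=m)
```

A small p-value rejects the null of no ARCH effects, i.e. supports fitting a GARCH-type model.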
3.2 Estimation
Maximum likelihood estimation is commonly used in estimating GARCH models. Assuming ε_t ∼ N(0, 1), the log-likelihood function of GARCH(p, q) is
l(θ|ω_1, ..., ω_T) = log [ f(ω_T|F_{T−1}) f(ω_{T−1}|F_{T−2}) · · · f(ω_{p+1}|F_p) f(ω_1, ..., ω_p; θ) ]

= log f(ω_1, ..., ω_p; θ) + ∑_{t=p+1}^{T} log [ (2πσ_t^2)^{−1/2} exp(−ω_t^2/(2σ_t^2)) ]

= log f(ω_1, ..., ω_p; θ) − (1/2) ∑_{t=p+1}^{T} [ log(2π) + log(σ_t^2) + ω_t^2/σ_t^2 ],

where θ is the set of parameters to be estimated, f(ω_s|F_{s−1}) is the density of ω_s conditional on the information contained in (ω_t) up to time s − 1, and f(ω_1, ..., ω_p; θ) is the joint density of ω_1, ..., ω_p.
Since the form of f(ω_1, ..., ω_p; θ) is rather complicated, the usual practice is to ignore this term and to use the conditional log-likelihood instead,

l(θ|ω_1, ..., ω_T) = −(1/2) ∑_{t=p+1}^{T} [ log(2π) + log(σ_t^2) + ω_t^2/σ_t^2 ]. (10)
Note that the σ_t^2 in the above log-likelihood function is not observable and has to be computed recursively,

σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2 + b_1 σ_{t-1}^2 + · · · + b_q σ_{t-q}^2.

The initial values of σ_t^2 are usually set to the unconditional variance of ω_t, which is c/(1 − ∑_i a_i − ∑_i b_i).
To check whether a model is adequate, we may examine the following series,
ε̂_t = ω_t / σ̂_t.
If the model is adequate and appropriately estimated, (ε̂_t) should be iid normal. We may apply the Ljung-Box test to (ε̂_t) to see if the conditional mean µ_t is correctly specified, and to (ε̂_t^2) to see if the model of ω_t is adequate. Finally, we may use the Jarque-Bera test and a QQ-plot to check whether ε_t is normal.
We may, of course, use other distributions for the specification of ε_t. For example, one popular choice is the Student-t, which has heavier tails than the normal distribution. For the purpose of consistently estimating GARCH parameters such as (a_i) and (b_i), the choice of distribution does not matter much: it can be shown that maximizing the log-likelihood in (10) yields a consistent estimator even when the distribution of ε_t is not normal. This is called quasi-likelihood estimation.
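The conditional likelihood (10) can be maximized numerically. The sketch below, for GARCH(1,1) only, initializes σ_t^2 at the sample variance of the data rather than at the implied unconditional variance; the function names and starting values are our illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_negloglik(params, w):
    """Negative Gaussian conditional log-likelihood of GARCH(1,1), as in (10)."""
    c, a, b = params
    if c <= 0 or a < 0 or b < 0 or a + b >= 1:
        return np.inf                    # outside the stationarity region
    T = len(w)
    sigma2 = np.empty(T)
    sigma2[0] = w.var()                  # simple choice of initial variance
    for t in range(1, T):
        sigma2[t] = c + a * w[t - 1] ** 2 + b * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + w ** 2 / sigma2)

def fit_garch11(w):
    """Quasi-maximum likelihood estimates (c, a, b) for GARCH(1,1)."""
    w = np.asarray(w, dtype=float)
    x0 = np.array([0.1 * w.var(), 0.1, 0.8])
    res = minimize(garch11_negloglik, x0, args=(w,), method="Nelder-Mead")
    return res.x
```

Because (10) is a quasi-likelihood, the same routine remains consistent for the parameters even when ε_t is not normal.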
3.3 Forecasting
Forecasting volatility is perhaps the most interesting aspect of the GARCH model in practice. For the one-step-ahead forecast, we have

σ_{T+1}^2 = c + a_1 ω_T^2 + · · · + a_p ω_{T-p+1}^2 + b_1 σ_T^2 + · · · + b_q σ_{T-q+1}^2,

where (ω_T^2, ..., ω_{T-p+1}^2) and (σ_T^2, ..., σ_{T-q+1}^2) are known at time T. Note that the one-step-ahead forecast is deterministic.
For two-step-ahead forecasting, we have

σ̂_{T+2}^2 = c + a_1 E(ω_{T+1}^2|F_T) + a_2 ω_T^2 + · · · + a_p ω_{T-p+2}^2 + b_1 σ_{T+1}^2 + b_2 σ_T^2 + · · · + b_q σ_{T-q+2}^2
= c + a_2 ω_T^2 + · · · + a_p ω_{T-p+2}^2 + (a_1 + b_1) σ_{T+1}^2 + b_2 σ_T^2 + · · · + b_q σ_{T-q+2}^2.
The n-step-ahead forecast can be constructed similarly. For the GARCH(1, 1) model in (9), the n-step-ahead forecast can be written as

σ̂_{T+n}^2 = c (1 − (a + b)^{n−1}) / (1 − a − b) + (a + b)^{n−1} σ_{T+1}^2 → c / (1 − a − b),

as n goes to infinity. The limit c/(1 − a − b) is exactly the unconditional variance of ω_t.
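For GARCH(1,1) the forecast recursion is a one-liner, since E(ω_{T+k−1}^2|F_T) = σ̂_{T+k−1}^2 gives σ̂_{T+k}^2 = c + (a + b) σ̂_{T+k−1}^2 for k ≥ 2. A sketch (names and parameter values ours):

```python
import numpy as np

def garch11_forecast(c, a, b, sigma2_T1, n):
    """Variance forecasts sigma_hat^2_{T+1}, ..., sigma_hat^2_{T+n} for
    GARCH(1,1), starting from the known one-step forecast sigma2_T1."""
    f = np.empty(n)
    f[0] = sigma2_T1
    for k in range(1, n):
        f[k] = c + (a + b) * f[k - 1]
    return f

f = garch11_forecast(c=0.05, a=0.1, b=0.85, sigma2_T1=2.0, n=200)
# f decays geometrically at rate a + b = 0.95 from 2.0 toward the
# unconditional variance c/(1 - a - b) = 1.0
```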
4 Extensions
There are many extensions to the GARCH model. In this section we discuss four of them,
Integrated GARCH (IGARCH), GARCH in mean (GARCH-M), APGARCH, and Expo-
nential GARCH (EGARCH).
4.1 IGARCH
When the ARMA representation in (7) of a GARCH model has a unit root in its AR polynomial, the GARCH model is integrated in ω_t^2. The model is then called Integrated GARCH, or IGARCH.

The key feature of IGARCH lies in the implication that any shock to volatility is persistent. This is similar to the ARIMA model, in which any shock to the mean is persistent. Take
the example of the IGARCH(1, 1) model, which can be written as

ω_t = σ_t ε_t, σ_t^2 = c + b σ_{t-1}^2 + (1 − b) ω_{t-1}^2.
The shock in volatility is given by η_t = ω_t^2 − σ_t^2. Then

ω_t^2 = c + ω_{t-1}^2 + η_t − b η_{t-1}.
To forecast volatility in the IGARCH(1, 1) framework, we first have

σ_{T+1}^2 = c + b σ_T^2 + (1 − b) ω_T^2.
Then we have

σ̂_{T+2}^2 = c + σ_{T+1}^2,
σ̂_{T+3}^2 = c + σ̂_{T+2}^2 = 2c + σ_{T+1}^2,
· · ·
σ̂_{T+n}^2 = (n − 1)c + σ_{T+1}^2.
The case c = 0 is especially interesting: the volatility forecasts are then σ̂_{T+n}^2 = σ_{T+1}^2 for all n. This approach is indeed adopted by RiskMetrics for the calculation of VaR (Value at Risk).
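The c = 0 IGARCH(1,1) recursion is the exponentially weighted moving average that RiskMetrics uses, conventionally with smoothing parameter 0.94 for daily returns. A sketch (function and parameter names ours):

```python
import numpy as np

def ewma_variance(w, lam=0.94, sigma2_0=1.0):
    """RiskMetrics-style recursion, i.e. IGARCH(1,1) with c = 0:
    sigma^2_t = lam * sigma^2_{t-1} + (1 - lam) * w^2_{t-1}.
    Returns sigma^2_0, ..., sigma^2_T; the last entry is the
    one-step-ahead volatility forecast used for VaR."""
    w = np.asarray(w, dtype=float)
    sigma2 = np.empty(len(w) + 1)
    sigma2[0] = sigma2_0
    for t in range(len(w)):
        sigma2[t + 1] = lam * sigma2[t] + (1.0 - lam) * w[t] ** 2
    return sigma2

s2 = ewma_variance([1.0, -2.0, 0.5], lam=0.94, sigma2_0=1.0)
# s2[-1] ≈ 1.1242 is the forecast for the next period
```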
4.2 GARCH-M
To model the premium for holding risky assets, we may let the conditional mean depend on the conditional variance. This is the idea of GARCH in mean, or GARCH-M. A typical
GARCH-M may be written as
X_t = µ_t + ω_t, µ_t = α′ z_t + β σ_t^2, ω_t = σ_t ε_t, (11)
where zt is a vector of explanatory variables and the specification for σt2 is the same as in
GARCH models.
4.3 APGARCH
To model leverage effects, which make volatility more sensitive to negative shocks, we may
consider the Asymmetric Power GARCH of Ding, Granger, and Engle (1993). A typical
APGARCH(p, q) can be written as
σ_t^δ = c + ∑_{i=1}^{p} a_i (|ε_{t-i}| + γ_i ε_{t-i})^δ + ∑_{i=1}^{q} b_i σ_{t-i}^δ, (12)
where δ, c, (γi ), (ai ), and (bi ) are model parameters.
The impact of ε_{t-i} on σ_t^δ is obviously asymmetric. Consider the term g(ε_{t-i}, γ_i) = |ε_{t-i}| + γ_i ε_{t-i}. We have

g(ε_{t-i}, γ_i) = (1 + γ_i)|ε_{t-i}| if ε_{t-i} ≥ 0, and (1 − γ_i)|ε_{t-i}| if ε_{t-i} < 0.

We expect γ_i < 0.
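The asymmetry is easy to tabulate. With γ_i = −0.3, say (an illustrative value), a unit negative shock enters σ_t^δ with weight 1.3 while a unit positive shock enters with weight 0.7:

```python
def g(eps, gamma):
    """News-impact term of APGARCH in (12): |eps| + gamma * eps."""
    return abs(eps) + gamma * eps

print(g(1.0, -0.3), g(-1.0, -0.3))  # 0.7 1.3
```

With γ = 0 the weights coincide and the impact is symmetric, as in plain GARCH.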
The APGARCH model includes several interesting special cases:

(a) GARCH, when δ = 2 and γ_i = 0 for all i;
(b) NGARCH of Higgins and Bera (1992), when γ_i = 0 for all i;
(c) GJR-GARCH of Glosten, Jagannathan, and Runkle (1993), when δ = 2 and −1 ≤ γ_i ≤ 0;
(d) TGARCH of Zakoian (1994), when δ = 1 and −1 ≤ γ_i ≤ 0;
(e) Log-GARCH, when δ → 0 and γ_i = 0 for all i.
4.4 EGARCH
To model leverage effects, we may also consider Exponential GARCH, or EGARCH, proposed by Nelson (1991). An EGARCH(p, q) model can be written as

h_t = log σ_t^2, h_t = c + ∑_{i=1}^{p} a_i (|ε_{t-i}| + γ_i ε_{t-i}) + ∑_{i=1}^{q} b_i h_{t-i}. (13)
As in APGARCH, we expect γi < 0. When εt−i > 0 (there is good news), the impact of
εt−i on ht is (1 + γi )|εt−i |. If εt−i < 0 (bad news), the impact is (1 − γi )|εt−i |.
5 Stochastic Volatility Models
In all ARCH and GARCH models, the evolution of the conditional variance σt2 is determin-
istic, conditional on the information available up to time t − 1.
SV (Stochastic Volatility) models relax this constraint and posit that the volatility itself is random. A typical SV model may be defined as

ω_t = σ_t ε_t, β(L) log(σ_t^2) = c + v_t, (14)

where c is a constant, β(z) = 1 − b_1 z − · · · − b_q z^q, and (v_t) is iid N(0, σ_v^2).
The SV model can be estimated using quasi-likelihood methods via Kalman filtering or MCMC (Markov chain Monte Carlo). Some applications show that SV models provide better in-sample fit, but their performance in out-of-sample volatility forecasting is less convincing.
Appendix
5.1 Ljung-Box Test
The Ljung-Box test is a test of whether any of a group of autocorrelations of a time series
are different from zero. It is a joint test based on a number of lags and is therefore a
portmanteau test.
The Ljung-Box test statistic is defined as

Q = n(n + 2) ∑_{k=1}^{h} ρ̂_k^2 / (n − k),
where n is the sample size, ρ̂k is the sample autocorrelation at lag k, and h is the number
of lags being tested. Q is asymptotically chi-square distributed with h degrees of freedom. The Ljung-Box test is commonly used in model diagnostics after estimating time series models.
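The statistic is straightforward to compute directly; a sketch (the function name is ours, and scipy is used only for the chi-square tail probability):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h):
    """Ljung-Box Q statistic over lags 1..h and its chi-square(h) p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = x @ x
    q = 0.0
    for k in range(1, h + 1):
        rho_k = (x[k:] @ x[:-k]) / denom     # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, chi2.sf(q, df=h)
```

Applied to (ε̂_t) it checks the conditional mean; applied to (ε̂_t^2) it checks the volatility specification, as described in Section 3.2.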
5.2 Jarque-Bera Test
The Jarque-Bera test can be used to test the null hypothesis that the data are from a
normal distribution, based on the sample kurtosis and skewness. The test statistic JB is
defined as
JB = (n/6) [ S^2 + (K − 3)^2 / 4 ],
where n is the number of observations, S the sample skewness, and K the sample kurtosis.
JB is asymptotically distributed as χ^2 with 2 degrees of freedom. The null hypothesis is a joint hypothesis of both the skewness
and excess kurtosis being 0, since samples from a normal distribution have an expected
skewness of 0 and an expected excess kurtosis of 0.
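A direct implementation of the statistic (the function name is ours; the sample moments below use the biased variance in the denominator, one common convention):

```python
import numpy as np
from scipy.stats import chi2

def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) and its chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    s2 = (d @ d) / n                   # biased sample variance
    S = np.mean(d ** 3) / s2 ** 1.5    # sample skewness
    K = np.mean(d ** 4) / s2 ** 2      # sample kurtosis
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return jb, chi2.sf(jb, df=2)
```

A large JB (small p-value) rejects normality; applied to the standardized residuals (ε̂_t) of a fitted GARCH model, it checks the Gaussian assumption on ε_t.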