Time Series Analysis & ARMA Modeling
A time series is a stochastic process in discrete time with a continuous state space.
Notation: {X1, X2, ..., Xn } denotes a time series process, whereas {x1, x2, ..., xn } denotes a univariate
time series, i.e. a sequence of realisations of the time series process.
[Figure: an observed time series x1, x2, ..., xn-1, xn, with the next (unknown) value to be forecast]
- e.g. correlation coefficient between sales 1 month apart, 2 months apart, etc.
! Autocorrelation Function (ACF)
! Partial Autocorrelation Function (PACF)
! From the class of ARMA models, select a model which best fits the data based on ACF
and PACF of the observed time series
All ARMA models are stationary. If an observed time series is non-stationary (e.g. upward trend),
it must be converted to stationary time series (e.g. by differencing).
I.2 Other forms of analysis
Another important approach to the analysis of time series relies on the spectral density function, which is the Fourier transform of the autocovariance function of a time series model. This approach is not covered in this course.
II. Stationarity and ARMA modelling
II.1 Stationarity
a. Definition
A stochastic process is (strictly) stationary if its statistical properties remain unchanged over time.
Joint distribution of Xt1, Xt2, ..., Xtn = Joint distribution of Xk+t1, Xk+t2, ..., Xk+tn, for all k and for all n.
Example: Joint distribution of X5, X6, ..., X10 = Joint distribution of X120, X121, ..., X125
" for any ‘chunk’ of variables
" for any ‘shift’ of start
Take n = 1:
• The variables Xt in a stationary process must be identically distributed (but not necessarily independent).
Take n = 2:
• The joint distribution of (Xs, Xt) equals that of (Xs+k, Xt+k), so any dependence between Xs and Xt depends only on the lag t – s.
b. Strict Stationarity
Examples:
! NOT stationary
! trivially stationary
c. Weak Stationarity
• This requires only that E(Xt) is constant AND COV(Xs, Xt) depends only on (t – s)
• For weak stationarity, COV(Xt, Xt+k) is constant with respect to t for all lags k
Solution: If X ~ N(µ, Σ) then the distribution of X is completely determined by µ and Σ (a property of the multivariate Normal distribution). If these do not depend on t, neither does the distribution of X.
⇒ Xt is weakly stationary
Question: If we know X0, then we can work out u, since X0 = sin(u). We then know all the values of Xt = sin(ωt + u).
⇒ Xt is completely determined by X0
Definition: X is purely indeterministic if knowledge of the values X1, ..., Xn is progressively less useful at predicting XN as N → ∞.
Here, “stationary time series” means a weakly stationary, purely indeterministic process.
d. Autocovariance function
• We define γk = Cov(Xt, Xt+k) = E(Xt Xt+k) – E(Xt)E(Xt+k), the “autocovariance at lag k”.
• Note: γ0 = Var(Xt)
Question: Properties of covariance – needed when calculating autocovariances for specified models.
• For a stationary process, we define ρk = corr(Xt, Xt+k) = γk/γ0, the “autocorrelation at lag k”.
(This is the usual correlation coefficient, since Var(Xt) = Var(Xt+k) = γ0.)
• Note: ρ0 = 1
• For a purely indeterministic process, we expect ρk → 0 as k → ∞ (i.e. values far apart will not be correlated).
• Recall (ST3053): a sequence of i.i.d. random variables {Zt} is called a white noise process and is trivially stationary.
• A white noise process {et} has autocovariance function
γk = Cov(et, et+k) = σ², if k = 0; 0, otherwise.
• Note: the variables et have zero mean, variance σ² and are uncorrelated.
• A sequence of i.i.d. variables with zero mean will be a white noise process, according to this definition. In particular, Zt independent with Zt ~ N(0, σ²) is a white noise process.
Let r(x,y|z) = corr(x,y|z) denote the partial correlation coefficient between x and y, adjusted for z (or
with z held constant).
• Denote:
ψ2 = corr(Xt, Xt+2 | Xt+1), the partial autocorrelation at lag 2
ψ1 = corr(Xt, Xt+1) = ρ1
Recall that
r(x, y | z) = [ r(x, y) – r(x, z) r(y, z) ] / √[ (1 – r²(x, z))(1 – r²(y, z)) ]
Applying this here with x = Xt, y = Xt+2, z = Xt+1, so that ψ2 = corr(Xt, Xt+2 | Xt+1) = r(x, y | z), r(x, z) = r(y, z) = ρ1 and r(x, y) = ρ2, yields:
ψ2 = (ρ2 – ρ1²) / (1 – ρ1²)
We assume that the sequence of observations {x1, x2, ...xn} comes from a stationary time series process.
To find a model to fit the sequence {x1,x2, ... ,xn}, we must be able to estimate the ACF of the process
of which the data is a realisation. Since the model underlying the data is assumed to be stationary, its
mean can be estimated using the sample mean.
µ̂ = (1/n) Σ_{t=1}^{n} xt
The autocovariance function, γk, can be estimated using the sample autocovariance function:
γ̂k = (1/n) Σ_{t=k+1}^{n} (xt – µ̂)(xt-k – µ̂)
and the sample autocorrelation coefficients are
rk = γ̂k / γ̂0
The collection {rk : k ∈ ℤ} is called the sample autocorrelation function (SACF). The plot of rk against k is called a correlogram.
ψ1 = ρ1
ψ2 = det[1 ρ1; ρ1 ρ2] / det[1 ρ1; ρ1 1] = (ρ2 – ρ1²) / (1 – ρ1²)
In general, ψk is given as a ratio of determinants involving ρ1, ρ2, ..., ρk. The sample partial autocorrelation coefficients are given by these formulae, but with the ρk replaced by their estimates rk:
ψ̂1 = r1
ψ̂2 = (r2 – r1²) / (1 – r1²)
etc.
The collection {ψ̂k} is called the sample partial autocorrelation function (SPACF). The plot of {ψ̂k} against k is called the partial correlogram.
[Figure: correlogram of rk against k and partial correlogram of ψ̂k against k, each plotted on a scale from –1 to 1]
These are the main tools in identifying a model for a stationary time series.
II.3 ARMA modelling
Autoregressive moving average (ARMA) models constitute the main class of linear models for time
series. More specifically:
• Autoregressive (AR)
• Moving Average (MA)
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)
a. AR models
• Recall: Markov Chain = process such that the conditional distribution of Xn+1, given Xn,Xn-1,...X0
depends only on Xn, i.e. “the future depends on the present, but not on the past”
• The simplest type of autoregressive model (AR(1)) has this property: Xt = φXt-1 + et, where et is zero-mean white noise.
• Including a mean µ, the AR(1) model is written
Xt = µ + φ(Xt-1 – µ) + et
b. MA models
A realisation of a white noise process is very ‘jagged’, since successive observations are realisations of independent variables. Most time series observed in practice have a smoother plot than a realisation of a white noise process. In that respect, taking a “moving average” is a standard way of smoothing an observed time series.
• The simplest type of moving average (MA) process is Xt = µ + et + θet-1, where et is zero-mean white noise.
c. ARMA models
∇Xt = Xt – Xt-1 (the difference operator)
Example: stationarity of AR(1). Starting from X0 and iterating Xt = µ + φ(Xt-1 – µ) + et gives Xt = µ + φ^t (X0 – µ) + Σ_{j=0}^{t-1} φ^j e_{t-j}, hence
Var(Xt) = Var( µ + φ^t (X0 – µ) + Σ_{j=0}^{t-1} φ^j e_{t-j} )
= φ^(2t) Var(X0) + Σ_{j=0}^{t-1} φ^(2j) σ²
= φ^(2t) Var(X0) + σ² (1 – φ^(2t)) / (1 – φ²)
If µ0 = µ then µt = µ + φ^t (µ0 – µ) = µ.
If Var(X0) = σ²/(1 – φ²) then Var(Xt) = φ^(2t) σ²/(1 – φ²) + σ²(1 – φ^(2t))/(1 – φ²) = σ²/(1 – φ²).
Neither µt nor Var(Xt) then depends on t. We also require |φ| < 1 for the AR(1) process to be stationary; in general,
µt – µ = φ^t (µ0 – µ)  AND  Var(Xt) – σ²/(1 – φ²) = φ^(2t) [ Var(X0) – σ²/(1 – φ²) ]
• If |φ| < 1, both terms will decay away to zero for large t
⇒ X is almost stationary for large t
• Equivalently, if we assume that the process has already been running for a very long time, it will be stationary
• Any AR(1) process with infinite history and |φ| < 1 will be stationary:
(1 – φB)(Xt – µ) = et
• So
Xt = µ + Σ_{j=0}^{∞} φ^j e_{t-j}
• We must calculate the autocovariance γk = Cov(Xt, Xt+k) and show that this depends only on the lag k. We need the following properties of covariance:
Cov(et, Xt-1) = 0
Cov(et, Xt-k) = 0, k ≥ 1
Cov(et, Xt) = σ²
In general, taking the covariance of Xt = µ + φ(Xt-1 – µ) + et with Xt-k gives γk = φγk-1 for k ≥ 1.
Hence,
γk = φ^k γ0 = φ^k σ²/(1 – φ²) for k ≥ 0
For the PACF, recall that
ψ1 = ρ1 and ψ2 = (ρ2 – ρ1²)/(1 – ρ1²).
Here ψ1 = ρ1 = φ and
ψ2 = (φ² – φ²)/(1 – φ²) = 0
rt = µ + φ(rt-1 – µ) + et
Recall that the AR(p) model can be written either in its generic form
Xt = µ + φ1(Xt-1 – µ) + ... + φp(Xt-p – µ) + et
or, using the backshift operator B, as φ(B)(Xt – µ) = et, where φ(z) = 1 – φ1z – ... – φpz^p. An AR(p) process is stationary IFF all the roots z of the characteristic equation φ(z) = 0 satisfy |z| > 1.
Explanation for this result: write the AR(p) process in the form
(1 – B/z1)(1 – B/z2)...(1 – B/zp)(Xt – µ) = et
where z1, z2, ..., zp are the roots of φ(z) = 0. The factor (1 – B/z1) can be inverted as a convergent power series in B IFF |z1| > 1. In the AR(p) case, we need to be able to invert all of the factors
(1 – B/zi).
This will be the case IFF |zi| > 1 for i = 1, 2, ..., p.
Example: show that the AR(1) process is stationary IFF |φ| < 1.
Answer: we have
Xt = µ + φ(Xt-1 – µ) + et,
i.e. (1 – φB)(Xt – µ) = et, so 1 – φz = 0 is the characteristic equation, with solution z = 1/φ. So |φ| < 1 is equivalent to |z| > 1, as required.
In the AR(1) model, we had γ1 = φγ0 and γ0 = φγ1 + σ². These are a particular case of the Yule-Walker equations for AR(p):
c. Yule-Walker equations
γk = φ1γk-1 + φ2γk-2 + ... + φpγk-p + { σ², if k = 0; 0, otherwise }, for 0 ≤ k ≤ p
Considering the AR(1) (i.e. p = 1): for k = 1 we get γ1 = φγ0, and for k = 0 we get γ0 = φγ1 + σ².
Example (p = 3): consider the AR(3) model Xt = 0.6Xt-1 + 0.4Xt-2 – 0.1Xt-3 + et.
Yule-Walker Equations:
γ0 = 0.6γ1 + 0.4γ2 – 0.1γ3 + σ² (0)
γ1 = 0.6γ0 + 0.4γ1 – 0.1γ2 (1)
γ2 = 0.6γ1 + 0.4γ0 – 0.1γ1 (2)
γ3 = 0.6γ2 + 0.4γ1 – 0.1γ0 (3)
From (2), γ2 = 0.4γ0 + 0.5γ1. Substituting this into (1) gives γ1 = (56/65)γ0, and hence γ2 = (54/65)γ0.
From (3), γ3 = (483/650)γ0.
For the MA(1) process Xt = µ + et + θet-1:
ρ0 = 1
ρ1 = θ/(1 + θ²)
ρk = 0 for k > 1
Since the mean E(Xt) and the covariances γk = Cov(Xt, Xt-k) do not depend on t, the MA(1) process is (weakly) stationary – for all values of the parameter θ.
However, we require MA models to be invertible, and this imposes conditions on the parameters.
Recall that the AR(1) model can be written (1 – φB)(Xt – µ) = et. For the MA(1) model,
Xt – µ = (1 + θB)et
or
(1 + θB)^(-1) (Xt – µ) = et
i.e.
(Xt – µ) – θ(Xt-1 – µ) + θ²(Xt-2 – µ) – ... = et
So an MA(1) process can be represented as an AR(∞) one – but only if |θ| < 1, in which case the MA(1) process is invertible.
Example: the two MA(1) models with θ = 0.5 and θ = 2 have the same lag-1 autocorrelation,
ρ1 = θ/(1 + θ²) = 0.5/(1 + 0.5²) = 2/(1 + 2²) = 0.4,
so both models have the same ACF. However, only the model with θ = 0.5 is invertible.
Inverting the MA(1) relation recursively gives
en = ...
= (Xn – µ) – θ(Xn-1 – µ) + θ²(Xn-2 – µ) – ... + (–θ)^(n-1) (X1 – µ) + (–θ)^n e0
Note:
For an MA(1) process, we have ρk = 0 for k > 1, so the ACF “cuts off” after lag 1.
It may be shown that the PACF “tails off” to zero.
In summary: for AR(1) the ACF tails off and the PACF cuts off after lag 1; for MA(1) the ACF cuts off after lag 1 and the PACF tails off.
For an MA(q) process, γk = Cov(Xt, Xt-k) = 0 for k > q, since the only non-zero terms occur when the subscripts of et-i and et-k-j match, i.e. when i = j + k, which is possible only for k ≤ q.
Example: for the MA(2) process Xt = 1 + et – 5et-1 + 6et-2 with σ² = 1,
γ1 = Cov(1 + et – 5et-1 + 6et-2, 1 + et-1 – 5et-2 + 6et-3) = (–5)(1) + (6)(–5) = –35
Recall that an AR(p) process is stationary IFF the roots z of the characteristic equation satisfy |z| > 1. For an MA(q) process, we have an analogous condition for invertibility.
Consider the equation 1 + θ1z + θ2z² + ... + θqz^q = 0. The MA(q) process is invertible IFF all roots z of this equation satisfy |z| > 1.
In summary:
The characteristic equation is 1 – 5z + 6z² = 0, i.e. (1 – 2z)(1 – 3z) = 0, with roots z = 1/2 and z = 1/3, both inside the unit circle.
⇒ Not invertible
Recall that the ARMA(p,q) model can be written either in its generic form
Xt = µ + φ1(Xt-1 – µ) + ... + φp(Xt-p – µ) + et + θ1et-1 + ... + θqet-q
or as φ(B)(Xt – µ) = θ(B)et, where φ(z) = 1 – φ1z – ... – φpz^p and θ(z) = 1 + θ1z + ... + θqz^q.
If φ(z) and θ(z) have factors in common, we simplify the defining relation.
Example: consider
Xt = φXt-1 + et – φet-1,
or
(1 – φB)Xt = (1 – φB)et.
Dividing through by (1 – φB), we obtain Xt = et. Therefore the process is actually an ARMA(0,0), also called white noise.
We assume that φ(z) and θ(z) have no common factors. The properties of ARMA(p,q) are a mixture of those of AR(p) and those of MA(q):
• ARMA(p,q) is stationary IFF all the roots z of 1 – φ1z – ... – φpz^p = 0 satisfy |z| > 1
• ARMA(p,q) is invertible IFF all the roots z of 1 + θ1z + ... + θqz^q = 0 satisfy |z| > 1
Example: the ARMA(1,1) process Xt = φXt-1 + et + θet-1 is stationary if |φ| < 1 and invertible if |θ| < 1.
Example: ACF of ARMA(1,1). For the model given by Xt = φXt-1 + et + θet-1 we have
Cov(et, Xt-1) = 0
Cov(et, et-1) = 0
Cov(Xt, et) = σ²
Cov(Xt, et-1) = φCov(Xt-1, et-1) + Cov(et, et-1) + θVar(et-1) = φσ² + 0 + θσ² = (φ + θ)σ²
Then:
γ0 = Cov(Xt, Xt) = φCov(Xt, Xt-1) + Cov(Xt, et) + θCov(Xt, et-1)
= φγ1 + σ² + θ(φ + θ)σ²
= φγ1 + (1 + φθ + θ²)σ²
γ1 = Cov(Xt-1, Xt) = φγ0 + θσ²
For k > 1,
γk = Cov(Xt-k, Xt) = φγk-1
Solving the first two equations for γ0 and γ1:
γ0 = (1 + 2φθ + θ²) σ² / (1 – φ²)
γ1 = (φ + θ)(1 + φθ) σ² / (1 – φ²)
Hence
ρ1 = γ1/γ0 = (1 + φθ)(φ + θ) / (1 + 2φθ + θ²), and ρk = φ^(k-1) ρ1 for k > 1 (compare ρk = φ^k, k ≥ 0, for AR(1)).
For a (stationary, invertible) ARMA(p,q) process with p, q > 0, neither the ACF nor the PACF cuts off: both tail off to zero.
a. Non-ARMA processes
• Given time series data X1 ... Xn, find a model for this data.
• Compare with known ACF/PACF of class of ARMA models to select suitable model.
• All ARMA models considered are stationary – so they can only be used for stationary time series data.
• If the observed series is non-stationary, transform it (e.g. by differencing) into a stationary series and fit an ARMA model to the transformed data.
• Take the “inverse transform” of this model as a model for the original non-stationary time series.
Question: Given x0, x1, ..., xn, the first order differences are wi = xi – xi-1, i = 1, ..., n.
From the differences w1, w2, ..., wn and x0 we can calculate the original time series:
w1 = x1 – x0 , so x1 = x0 + w1
w2 = x2 – x1 , so x2 = x1 + w2
= x0 + w1 + w2, etc.
The inverse process of differencing is integration, since we must sum the differences to obtain the
original time series.
Example: If the first differences ∇xt = xt – xt-1 of x0, x1, ..., xn are modelled by the (stationary) AR(1) model ∇Xn = 0.5∇Xn-1 + en,
then Xn – Xn-1 = 0.5(Xn-1 – Xn-2) + en, so Xn = 1.5Xn-1 – 0.5Xn-2 + en is the model for the original time series.
This AR(2) model is non-stationary: written as (1 – 1.5B + 0.5B²)Xn = en, its characteristic equation is
1 – 1.5z + 0.5z² = 0
with roots z = 1 and z = 2. The model is non-stationary since |z| > 1 does not hold for BOTH roots.
For the random walk Xt = Xt-1 + et we have
Xt = X0 + Σ_{j=1}^{t} ej
So E(Xt) = E(X0) if E(et) = 0, but Var(Xt) = Var(X0) + tσ². Hence Xt is non-stationary, but ∇Xt = et, where et is a stationary white noise process.
Now consider the daily returns Yt – Yt-1 = ln(Zt/Zt-1). Since Yt – Yt-1 = µ + et and the et's are independent, Yt – Yt-1 is independent of Y1, ..., Yt-1, i.e. ln(Zt/Zt-1) is independent of past prices Z0, Z1, ..., Zt-1.
Example: Recall the example of Qt = consumer price index at time t, with rt = ln(Qt/Qt-1). We have
rt = µ + φ(rt-1 – µ) + et
ln(Qt/Qt-1) = µ + φ(ln(Qt-1/Qt-2) – µ) + et
∇ln(Qt) = µ + φ(∇ln(Qt-1) – µ) + et
If
• X needs to be differenced at least d times to reduce it to stationarity,
• and Y = ∇^d X is a stationary ARMA(p,q) process,
then X is an ARIMA(p,d,q) process.
⇒ Model is ARIMA(2,1,1)
Note: if ∇^d Xt is ARMA(1,q), then to check for stationarity we only need to check that |φ1| < 1.
II.9 The Markov Property
AR(1) Model:
Xt = µ + φ(Xt-1 – µ) + et
AR(2) Model:
Xt = µ + φ1(Xt-1 – µ) + φ2(Xt-2 – µ) + et
Consider now the vector Yn = (Xn, Xn-1)^T. Writing the AR(2) model in terms of Yn,
(Xn – µ, Xn-1 – µ)^T = [φ1 φ2; 1 0] (Xn-1 – µ, Xn-2 – µ)^T + (en, 0)^T,
so Yn depends on the past only through Yn-1.
• Notation: Y is a VAR(1) (vector autoregressive) process.
• In general, AR(p) does not have the Markov property for p > 1, but Y = (Xt, Xt-1, ..., Xt-p+1)^T does
• Recall: Random walk – ARIMA(0,1,0) defined by Xt – Xt-1 = et has independent increments and
hence does have the Markov property
It may be shown that for p+d > 1, ARIMA(p,d,0) does not have the Markov property, but Yt = (Xt, Xt-1,
..., Xt-p-d+1)T does.
Consider the MA(1) process Xt = µ + et + θet-1. It is clear that “knowing Xn will never be enough to deduce the value of en, on which the distribution of Xn+1 depends”. Hence an MA(1) process does not have the Markov property.
Now consider an MA(q) = AR(∞) process. It is known that AR(p) processes Y = (Xt, Xt-1, ..., Xt-p+1)^T
have the Markov property if considered as a p-dimensional vector process (p finite). It follows that an
MA(q) process has no finite dimensional Markov representation.
Question: Associate a vector-valued Markov process with 2Xt = 5Xt-1 – 4Xt-2 + Xt-3 + et.
We have
2∇²Xt = ∇²Xt-1 + et
Example: consider a process driven by
en = +1 with probability ½, –1 with probability ½.
The conditional probability P(Xn = 2 | Xn-1 = 0) takes different values depending on the earlier history of the process: it is positive for one configuration of the past and 0 for another.
⇒ Not Markov: since the two probabilities differ, the distribution of Xn given Xn-1 = 0 depends on more than the immediate past.
III. Non-stationarity: trends and techniques
III.1 Typical trends
Example:
Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6, –1 with probability 0.4.
Here Xn is I(1), since Zn = Xn – Xn-1 is stationary. Also, E(Xn) = E(Xn-1) + 0.2, so the process has a deterministic trend.
Many techniques can be used to detect non-stationary series; among the simplest methods are:
• Plot of time series against t
• Sample ACF
The sample ACF is an estimate of the theoretical ACF, based on the sample data and is defined later. A
plot of the time series will highlight a trend in the data and will show up any cyclic variation.
[Figure: time series plots over 2003–2004 showing a trend, and a trend plus seasonal variation]
Recall: For a stationary time series, ρk → 0 as k → ∞, i.e. the (theoretical) ACF converges toward zero.
Hence, the sample ACF should also converge toward zero. If the sample ACF decreases slowly, the
time series is non-stationary, and needs to be differenced before fitting a model.
[Figure: a sample ACF decaying slowly to zero (non-stationary series), and a sample ACF oscillating with period 12 (seasonal series)]
If sample ACF exhibits periodic oscillation, there is probably a seasonal pattern in the data. This
should be removed before fitting a model (see Figures 7.3a and 7.3b). The following graph (Fig 7.3(a))
shows the number of hotel rooms occupied over several years. Inspection shows the clear seasonal
dependence, manifested as a cyclic effect.
The next graph (Fig 7.3(b)) shows the sample autocorrelation function for this data. It is clear that the
seasonal effect shows up as a cycle in this function. In particular, the period of this cycle looks to be 12
months, reinforcing the idea that it is a seasonal effect.
!"#$%&'()*+'
Figure 7.3: seasonal variation – hotel room occupancy 1963–1976 (7.3a) and its sample ACF (7.3b)
• Seasonal differencing
• Method of Moving Averages
• Method of seasonal means
To remove a linear trend by least squares, fit a model
Xt = a + bt + Yt
estimate a and b by least squares, and subtract the fitted trend:
ŷt = xt – (â + b̂t)
Note: least squares may also be used to remove nonlinear trends from a time series. It is naturally possible to model any observed nonlinear trend by some term g(t) within
Xt = g(t) + Yt
which can be estimated using least squares. For example, a plot of hourly data of daily energy loads against temperature, over a one-day time frame, may indicate quadratic variation over the day; in this case one could use g(t) = a + bt².
III.3 Differencing
Use differencing if the sample ACF decreases slowly. If there is a linear trend, e.g. xt = a + bt + yt, then
∇xt = xt – xt-1 = b + ∇yt,
so differencing has removed the linear trend. If xt is I(d), then differencing xt d times will make it stationary.
However, if we remove the trend using linear regression we will still be left with an I(1) process that is
non-stationary.
Example:
Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6, –1 with probability 0.4.
Taking X0 = 0,
E(X2) = 0.2 × 2
E(Xn) = 0.2n.
Let Yn = Xn – 0.2n. Then E(Yn) = 0, so we have removed the linear trend, but
Yn – Yn-1 = Zn – 0.2.
Hence Yn is a random walk (which is non-stationary) and ∇Yn is stationary, so Yn is an I(1) process.
b. Selection of d
How many times (d) do we have to difference the time series Xt to convert it to stationarity? This will
determine the parameter d in the fitted ARIMA(p,d,q) model.
• Trend
• Cycle
• Time series is an integrated series
We are assuming that linear trends and cycles have been removed, so if the plot of the time series and its
SACF indicate non-stationarity, it could be that the time series is a realisation of an integrated process
and so must be differenced a number of times to achieve stationarity.
• Look at the SACF. If the SACF decays slowly to zero, this indicates a need for differencing (for a stationary ARMA model, the SACF decays rapidly to zero).
• Look at the sample variance of the original time series X and of its differences.
Let σ̂² be the sample variance of z^(d) = ∇^d x. It is normally the case that σ̂² first decreases with d until stationarity is reached, and then starts to increase, since differencing too much introduces correlation. Take d equal to the value that minimises σ̂².
[Figure: plot of σ̂² against d = 0, 1, 2, 3, with the minimum at d = 2]
In the above example, take d=2, which is the value for which the estimated variance is minimised.
Example: Let X be the monthly average temperature in London. Suppose that the model
xt = µ + τt + yt
applies, where τt is a periodic function with period 12 and yt is stationary. The seasonal difference of X is defined as:
(∇12 x)t = xt – xt-12
Since τt = τt-12, we have xt – xt-12 = yt – yt-12, which is stationary. We can model xt – xt-12 as a stationary process and thus get a model for xt.
Example: In the UK, monthly inflation figures are obtained by seasonal differencing of the retail prices index (RPI). If xt is the value of the RPI for month t, then the annual inflation figure for month t is
(xt – xt-12) / xt-12 × 100%
Remark 1: the number of seasonal differences taken is denoted by D. For example, for the seasonal differencing Xt – Xt-12 = ∇12 Xt we have D = 1.
Remark 2: in practice, for most time series we would need at most d=1 and D=1.
The method of moving averages makes use of a simple linear filter to eliminate the effects of periodic variation. If X is a time series with seasonal effects of even period d = 2h, we define a smoothed process Y by
yt = (1/2h) [ ½xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + ½xt+h ]
Example with quarterly data: a yearly period will have d = 4 = 2h, so h = 2, and
yt = ¼( ½xt-2 + xt-1 + xt + xt+1 + ½xt+2 )
This is a centred moving average, since the average is taken symmetrically around the time t. Such an
average can only be calculated retrospectively.
For odd periods d = 2h + 1, the end terms xt-h and xt+h need not be halved:
yt = (1/(2h+1)) ( xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + xt+h )
Example: with data every 4 months, a yearly period will have d = 3 = 2h + 1, so h = 1 and
yt = ⅓( xt-1 + xt + xt+1 ).
If the method of seasonal means is applied to a monthly time series x extending over 10 years from January 1990, the estimate of µ is x̄ (the average over all 120 observations) and the estimate of τJanuary is
τ̂January = (1/10)(x1 + x13 + ... + x109) – x̄,
the difference between the average value for January and the overall average over all the months.
Recall that τt is a periodic function with period 12 and yt is stationary. Thus, τt contains the deviation of the model (from the overall mean µ) at time t due to the seasonal effect.
Filtering and exponential smoothing techniques are commonly applied to time series in order to “clean”
the original series from undesired artifacts. The moving average is an example of a filtering technique.
Other filters may be applied depending on the nature of the input series.
Exponential smoothing is another common set of techniques. It is used typically to “simplify” the input
time series by dampening its variations so as to retain in priority the underlying dynamics.
III.8 Transformations
Recall that in the simple linear regression model
yi = β0 + β1xi + ei,
where ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the assumptions about the model (e.g. the normality of the error variables ei, or their constant variance). To test the latter assumption we plot the residuals against the fitted values.
[Figure: residuals êi plotted against fitted values ŷi, scattered randomly about 0 with no pattern]
If the plot does not appear as above, the data is transformed, and the most common transformation is the
logarithmic transformation.
Similarly, if after fitting an ARMA model to a time series xt a plot of the “residuals” versus the “fitted values” indicates a dependence, then we should consider modelling a transformation of the time series xt; the most common transformation is the logarithmic transformation
Yt = ln(Xt)
IV. Box-Jenkins methodology
IV.1 Overview
We consider how to fit an ARIMA(p,d,q) model to historical data {x1, x2, ...xn}. We assume that trends
and seasonal effects have been removed from the data.
If the tentatively identified model passes the diagnostic tests, it can be used for forecasting.
If it does not, the diagnostic tests should indicate how the model should be modified, and a new cycle of
• Identification
• Estimation
• Diagnostic checks
is performed.
Recall: in a simple linear regression model, yi = β0 + β1xi + ei, ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the goodness of fit of the model, i.e. whether the assumptions ei ~ IN(0, σ²) are justified.
The error variables ei form a zero-mean white noise process: they are uncorrelated, with common variance σ².
E(et) = 0 for all t
γk = Cov(et, et-k) = σ², if k = 0; 0, otherwise
Thus the ACF and PACF of a white noise process (when plotted against k) look like this:
[Figure: ACF ρk and PACF ψk of white noise, both equal to 0 at every lag k = 1, 2, 3, ...]
i.e. apart from ρ0 = 1, we have ρk = 0 for k = 1, 2, ... and ψk = 0 for k = 1, 2, ...
Question: how do we test if the residuals from a time series model look like a realisation of a white
noise process?
Answer: we look at the SACF and SPACF of the residuals. In studying the SACF and SPACF, we realise that even if the original process was white noise, we would not expect rk = 0 for k = 1, 2, ... and ψ̂k = 0 for k = 1, 2, ..., since rk is only an estimate of ρk and ψ̂k is only an estimate of ψk.
Question: how close to 0 should rk and ψ̂k be, if ρk = 0 for k = 1, 2, ... and ψk = 0 for k = 1, 2, ...?
Answer: If the original model is white noise, Xt = µ + et, then for each k, the SACF and SPACF satisfy
rk ≈ N(0, 1/n) and ψ̂k ≈ N(0, 1/n).
This is true for large samples, i.e. for large values of n.
Values of rk or ψ̂k outside the range (–2/√n, +2/√n) can be taken as suggesting that a white noise model is inappropriate.
However, these are only approximate 95% confidence intervals. If ρk = 0, we can be 95% certain that rk lies between these limits. This means that about 1 value in 20 will lie outside these limits even if the white noise model is correct.
Hence a single value of rk or ψ̂k outside these limits would not be regarded as significant on its own, but three such values might well be significant.
There is an overall goodness-of-fit test, based on all the rk's in the SACF rather than on individual rk's, called the portmanteau test of Ljung and Box. It consists of checking whether the m sample autocorrelation coefficients of the residuals are too large to resemble those of a white noise process (for which they should all be negligible).
Given residuals from an estimated ARMA(p,q) model, the test is carried out under the null hypothesis that all the ρk are 0. Under this null hypothesis, the Q-statistic given below is asymptotically χ²-distributed with s = m – p – q degrees of freedom, or, if a constant (say µ) is included, with s = m – p – q – 1 degrees of freedom. If the Q-statistic is found to be greater than the 95th percentile of that χ² distribution, the null hypothesis is rejected, which means that the alternative hypothesis that “at least one autocorrelation is non-zero” is accepted. Statistical packages print these statistics. For large n,
the Ljung-Box Q-statistic tends to closely approximate the Box-Pierce statistic:
n(n + 2) Σ_{k=1}^{m} rk²/(n – k) ≈ n Σ_{k=1}^{m} rk²
The overall diagnostic test is therefore performed as follows (for centred realisations):
• Fit ARMA(p,q) model
• Estimate (p+q) parameters
• Test if
Q = n(n + 2) Σ_{k=1}^{m} rk²/(n – k) ~ χ²(m – p – q)
Remark: the above Ljung-Box Q-statistic was first suggested to improve upon the simpler Box-Pierce
test statistic
Q = n Σ_{k=1}^{m} rk²
which was found to perform poorly even for moderately large sample sizes.
b. Identification of MA(q)
Recall: for an MA(q) process, ρk = 0 for all k > q, i.e. the “ACF cuts off after lag q”.
To test whether an MA(q) model is appropriate, we see if rk is close to 0 for all k > q. If the data do come from an MA(q) model, then for k > q (only the first q autocorrelations being non-zero),
" 1" q %%
rk ~ N $$ 0, $$1+ ! 2 !i2 ''''
# n # i=1 &&
" 1$ q
1$ q #
2% 2%
' &1.96 )1 + 2/ !i * , +1.96 )1 + 2/ !i * (
'- n+ i =1 , n+ i =1 , (.
(note that it is common to use 2 instead of 1.96 in the above formula). We would expect 1 in 20 values to lie outside the interval. In practice, the ρi's are replaced by the ri's. The “confidence limits” on SACF plots are based on this. If rk lies outside these limits it is “significantly different from zero” and we conclude that ρk ≠ 0. Otherwise, rk is not significantly different from zero and we conclude that ρk = 0.
[Figure: SACF plot of rk against k, with dashed horizontal confidence limits]
For q = 0, the limits for k = 1 are (–1.96/√n, +1.96/√n), as for testing for a white noise model; the coefficient r1 is compared with these limits. For q = 1, the limits for k = 2 are
( –1.96 √[(1/n)(1 + 2r1²)], +1.96 √[(1/n)(1 + 2r1²)] )
and r2 is compared with these limits. Again, 2 is often used in place of 1.96.
c. Identification of AR(p)
Recall: for an AR(p) process, we have ψk = 0 for all k > p, i.e. the “PACF cuts off after lag p”.
To test whether an AR(p) model is appropriate, we see if the sample estimate of ψk is close to 0 for all k > p. If the data do come from an AR(p) model, then for k > p,
ψ̂k ≈ N(0, 1/n)
and 95% of the sample estimates should lie in the interval
(–2/√n, +2/√n)
The “confidence limits” on SPACF plots are based on this: if the sample estimate of ψk lies outside these limits, it is “significant”.
[Figure: sample PACF plotted against lag k (up to k = 15), with confidence limits at about ±0.2]
IV.3 Model fitting
• An appropriate value of d has been found and {zd+1, zd+2, ... zn} is stationary.
• For simplicity, we assume that d = 0 (to simplify upper and lower limits of sums).
• If the SACF appears to cut off after lag q, an MA(q) model is indicated (we use the tests of
significance described previously).
• If the SPACF appears to cut off after lag p, an AR(p) model is indicated.
If neither the SACF nor the SPACF cut off, mixed models must be considered, starting with
ARMA(1,1).
Having identified the values for the parameters p and q, we must now estimate the values of the parameters φ1, φ2, ..., φp and θ1, θ2, ..., θq in the model.
Least squares (LS) estimation is equivalent to maximum likelihood (ML) estimation if et is assumed
normally distributed.
Example: in the AR(p) model, et = Zt – φ1Zt-1 – ... – φpZt-p. The estimators φ̂1, ..., φ̂p are chosen to minimise
Σ_{t=p+1}^{n} êt²
For general ARMA models, êt cannot be written down directly from the zt. In the MA(1) model, for instance, et = zt – θet-1, so êt = zt – θ̂êt-1.
We can solve this iteratively for êt as long as some starting value ê0 is assumed. For an ARMA(p,q)
model, the list of starting values is ( ê0 , ê1 , ..., êq!1 ). The starting values are estimated recursively by
backforecasting:
0. Assume ( ê0 , ê1 , ..., êq!1 ) are all zero
1. Estimate the φi and θj
2. Use forecasting on the time-reversed process {zn, ..., z1} to predict values for ( ê0 , ê1 , ..., êq!1 )
3. Repeat cycle (1)-(2) until the estimates converge.
• Calculate the theoretical ACF of ARMA(p,q): the ρk's will be functions of the φ's and θ's.
• Set ρk = rk and solve for the φ's and θ's. These are the method of moments estimators.
Example: consider the MA(1) model
xn = en + θen-1, en ~ N(0, 1).
We have r1 = γ̂1/γ̂0 = –0.25. Setting ρ1 = θ/(1 + θ²) = –0.25 gives θ² + 4θ + 1 = 0, i.e. θ = –0.268 or θ = –3.732.
Recall: the MA(1) process is invertible IFF |θ| < 1. So for θ = –0.268 the model is invertible, but for θ = –3.732 the model is not invertible.
Note: If r1 = –0.5 here, then setting ρ1 = θ/(1 + θ²) = –0.5 gives (θ + 1)² = 0, so θ = –1, and the estimate does not give an invertible model.
Recall that in the simple linear model Yi = β0 + β1Xi + ei, ei ~ IN(0, σ²), σ² is estimated by
σ̂² = (1/(n – 2)) Σ_{i=1}^{n} êi²
Similarly, for an ARMA(p,q) model fitted to z1, ..., zn, σ² is estimated by
σ̂² = (1/n) Σ_{t=p+1}^{n} êt²
= (1/n) Σ_{t=p+1}^{n} ( zt – φ̂1zt-1 – ... – φ̂pzt-p – θ̂1êt-1 – ... – θ̂qêt-q )²
No matter which estimation method is used, this parameter is estimated last, as estimates of the φ's and θ's are required first.
Note: In using either Least Squares or Maximum Likelihood estimation we also find the residuals êt, whereas using the Method of Moments to estimate the φ's and θ's these residuals have to be calculated afterwards.
Note: for large n, there will be little difference between LS, ML and Method of Moments estimators.
d. Diagnostic checking
Assume we have identified a tentative ARIMA(p,d,q) model and calculated the estimates
µ̂, σ̂², φ̂1, ..., φ̂p, θ̂1, ..., θ̂q.
We must perform diagnostic checks based on the residuals. If the ARMA(p,q) model is a good
approximation to the underlying time series process, then the residuals ê t will form a good
approximation to a white noise process.
If the SACF or SPACF of the residuals has too many values outside the interval (–1.96/√n, +1.96/√n), we conclude that the fitted model does not have enough parameters, and a new model with additional parameters should be fitted.
The portmanteau test may also be used for this purpose. Other checks are:
• plot êt against t
• plot êt against zt
Any patterns evident in these plots may indicate that the residuals are not a realisation of a set of independent (uncorrelated) variables and so the model is inadequate.
(III) Counting Turning Points:
This is a test of independence. Are the residuals a realisation of a set of independent variables?
Possible configurations for three consecutive values are the 3! = 6 orderings of yk-1, yk, yk+1; all but the two monotone orderings give a turning point. Since four out of the six possible configurations exhibit a turning point, the probability of observing one at any given time is 4/6 = 2/3.
If y1, y2, ..., yn is a sequence of numbers, the sequence has a turning point at time k if
either
yk-1 < yk AND yk > yk+1
or
yk-1 > yk AND yk < yk+1
Therefore, the number of turning points T in a realisation of Y1, Y2, ..., YN has E(T) = (2/3)(N – 2) and Var(T) = (16N – 29)/90, and should lie within the 95% confidence interval
( (2/3)(N – 2) – 1.96 √[(16N – 29)/90], (2/3)(N – 2) + 1.96 √[(16N – 29)/90] )
Recall: the spectral density function of a white noise process is f(λ) = σ²/2π, –π < λ < π. So the sample spectral density function of the residuals should be roughly constant for a white noise process.
V. Forecasting
V.1 The Box-Jenkins approach
Having fitted an ARMA model to {x1, x2, ..., xn}, we wish to forecast the value of the series k steps ahead; the forecast made at time n is denoted x̂n(k).
In the Box-Jenkins approach, x̂ n (k) is taken as E(Xn+k | X1 , ... , Xn), i.e. x̂ n (k) is the conditional
expectation of the future value of the process, given the information currently available.
From result 2 in ST3053 (section A), we know that E(Xn+k | X1, ..., Xn) minimises the mean square error E[(Xn+k – h(X1, ..., Xn))²] over all functions h(X1, ..., Xn).
• Replace random variables X1, ..., Xn by their observed values x1 , ... , xn.
• Replace random variables Xn+1 , ... , Xn+k-1 by their forecast values, x̂ n (1) , ... , x̂ n (k-1)
The forecast error is xn+k – x̂n(k); its variance is needed for confidence interval forecasts, which are more useful than a point estimate alone.
For stationary processes, it may be shown that x̂n(k) → µ as k → ∞. Hence, the variance of the forecast error tends to E[(xn+k – µ)²] = σ² as k → ∞, where σ² is the variance of the process.
x̂n(2) = 2x̂n(1) – xn + ẑn(2) = 2x̂n(1) – xn + µ̂ + φ̂(ẑn(1) – µ̂)
• The Box-Jenkins method requires a skilled operator in order to obtain reliable results.
• For cases where only a simple forecast is needed, exponential smoothing is much simpler (Holt,
1958).
A weighted combination of past values is used to predict future observations. For example, the first (one-step-ahead) forecast is obtained by
x̂n(1) = α( xn + (1 – α)xn-1 + (1 – α)²xn-2 + ... )
or
"
!
xˆn (1) = ! # (1- ! )i xn-i = xn
i =0 1- (1- ! ) B
!
!
• The sum of the weights is ! " (1-!)i = =1
i=0 1-(1-!)
• Generally we use a value of α such that 0 < α < 1, so that there is less emphasis on historic values further back in time (usually 0.2 ≤ α ≤ 0.3).
• There is only one parameter to control, usually estimated via least squares.
A linear filter is a transformation of a time series {xt} (the input series) to create an output series {yt}
which satisfies:
yt = Σ_{k=-∞}^{∞} ak xt-k
The objective of the filtering is to modify the input series to meet particular objectives, or to display
specific features of the data. For example, an important problem in analysis of economic time series is
detection, isolation and removal of deterministic trends.
In practice, a filter {ak : k ∈ ℤ} normally contains only a relatively small number of non-zero components.
Example: regular differencing. This is used to remove a linear trend. Here a0 = 1, a1 = -1, ak = 0
otherwise. Hence yt = xt – xt-1.
Example: if the input series is a white noise process {et} and the filter takes the form {θ0 = 1, θ1, ..., θq}, then the output series is MA(q), since
yt = Σ_{k=0}^{q} θk et-k
If the input series, x, is AR(p), and the filter takes the form {φ0 = 1, –φ1, ..., –φp}, then the output series is white noise:
yt = xt – Σ_{k=1}^{p} φk xt-k = et
VI. Multivariate time series analysis
X adjusted for Z:
X – E(X|Z) = (X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ),  where E(X|Z) = µX + β(Z – µZ) with β = ΣXZ ΣZZ⁻¹
Y adjusted for Z:
Y – E(Y|Z) = (Y – µY) – ΣYZ ΣZZ⁻¹ (Z – µZ)
Partial covariance:
Cov[ (X – µX) – ΣXZ ΣZZ⁻¹(Z – µZ), (Y – µY) – ΣYZ ΣZZ⁻¹(Z – µZ) ]
= E[ ((X – µX) – ΣXZ ΣZZ⁻¹(Z – µZ)) ((Y – µY) – ΣYZ ΣZZ⁻¹(Z – µZ)) ]
= ΣXY – ΣXZ ΣZZ⁻¹ ΣZY
Variance:
Var[ (X – µX) – ΣXZ ΣZZ⁻¹(Z – µZ) ] = ΣXX – ΣXZ ΣZZ⁻¹ ΣZX
and
Var[ (Y – µY) – ΣYZ ΣZZ⁻¹(Z – µZ) ] = ΣYY – ΣYZ ΣZZ⁻¹ ΣZY
Hence the partial correlation is
P(X, Y | Z) = ( ΣXY – ΣXZ ΣZZ⁻¹ ΣZY ) / √[ (ΣXX – ΣXZ ΣZZ⁻¹ ΣZX)(ΣYY – ΣYZ ΣZZ⁻¹ ΣZY) ]
Now substituting X = Xt, Y = Xt+2, Z = Xt+1, so that ΣXY = γ2, ΣXZ = ΣZY = γ1 and ΣXX = ΣYY = ΣZZ = γ0, we get
P(Xt, Xt+2 | Xt+1) = ( γ2 – γ1γ0⁻¹γ1 ) / ( γ0 – γ1γ0⁻¹γ1 )
= ( γ2/γ0 – (γ1/γ0)² ) / ( 1 – (γ1/γ0)² )
= ( ρ2 – ρ1² ) / ( 1 – ρ1² ) = ψ2
= det[1 ρ1; ρ1 ρ2] / det[1 ρ1; ρ1 1]
A univariate time series consists of a sequence of random variables Xt, where Xt is the value of the
single variable X of interest at time t.
An m-dimensional multivariate time series consists of a sequence of random vectors X1, X2, ... There are m variables of interest, denoted X(1), ..., X(m), and Xt(j) is the value of X(j) at time t.
[Figure: a univariate series (one variable X observed at times 1, ..., n) compared with a multivariate series (m variables observed at each time)]
The vector process {Xt} is weakly stationary if E(Xt) and Cov(Xt, Xt+k) are independent of t. Let µ denote the common mean vector E(Xt) and Σk denote the common lag-k covariance matrix, i.e.
Σk = Cov(Xt, Xt+k),
the m × m matrix whose (i, j) entry is Cov(Xt(i), Xt+k(j)).
Example: Multivariate White Noise. Recall that univariate white noise is a sequence e1, e2, ... of random variables with E(et) = 0 and Cov(et, et+k) = σ²·1(k=0) (where 1(·) is the indicator function). Multivariate white noise is the simplest example of a multivariate random process.
Let e1, e2, ... be a sequence of independent, zero-mean random vectors, each with the same covariance matrix Σ. Thus for k = 0, the lag k covariance matrix of the et's is Σ0 = Σ, and since the et's are independent vectors, Σk = 0 for k > 0. Note that Σ need not be a diagonal matrix, i.e. the components of et at a given time t need not be independent of each other; however, the et's are independent vectors – the components of et and et+k are independent for k > 0.
Example: Let it denote the interest rate at time t and It the tendency to invest at time t. We might believe
these two are related as follows:
where e(i) and e(I) are zero-mean, univariate white noise processes. They may have different variances and are not necessarily uncorrelated, i.e. we do not require that Cov(et(i), et(I)) = 0 for any t. However, we do require Cov(et(i), es(I)) = 0 for s ≠ t.
Xt = µ + A(Xt-1 – µ) + et
we have
Xt = µ + Σ_{j=0}^{t-1} A^j et-j + A^t (X0 – µ)
In order that X should represent a stationary time series, the powers of A should converge to zero in
some sense: this will happen if all eigenvalues of the matrix A are less than 1 in absolute magnitude.
Recall eigenvalues (see appendix): λ is an eigenvalue of the n × n matrix A if there is a non-zero vector x (called an eigenvector) such that
Ax = λx, or (A – λI)x = 0.
These equations have a non-zero solution x IFF |A – λI| = 0. This equation is solved for λ to find the eigenvalues.
Example: Find the eigenvalues of [2 1; 4 2]. Solution: solve det[2–λ 1; 4 2–λ] = 0, i.e. (2 – λ)² – 4 = 0, giving λ = 0 and λ = 4.
Example: consider the VAR(1) model
(xt, yt)^T = [0.3 0.5; 0.2 0.2] (xt-1, yt-1)^T + (etx, ety)^T
⇒ λ = 0.57, –0.07
Both eigenvalues are less than 1 in absolute value, so the process is stationary.
Question: Write the model in question 7.18 in terms of Xt only. Show that Xt is stationary in its own
right. Solution: The model can be written as:
"# X t = 0.3 X t !1 + 0.5Yt !1 + et X (1)
$ Y
#%Yt = 0.2 X t -1 + 0.2Yt -1 + et (2)
Rearranging (1): Yt-1 = 2(Xt – 0.3Xt-1 – etX), so Yt = 2(Xt+1 – 0.3Xt – et+1X). Substituting for Yt and Yt-1 in (2) and tidying up gives Xt+1 = 0.5Xt + 0.04Xt-1 + (white noise terms).
Since the white noise terms do not affect stationarity, the characteristic equation is
1 – 0.5z – 0.04z² = 0.
Since the model can be written as (1 – 0.5B – 0.04B²)Xt = ..., the roots of the characteristic equation are z1 = –14.25 and z2 = 1.75. Since |z| > 1 for both roots, the Xt process is stationary.
Example: A 2-dimensional VAR(2). Let Yt denote the national income over a period of time, Ct the total consumption over the same period, and It the total investment over the same period. We assume Ct = αYt-1 + et(1), where e(1) is a zero-mean white noise (consumption over a period depends on the income over the previous period).
We assume It = β(Ct-1 – Ct-2) + et(2), where e(2) is another zero-mean white noise.
Since income is either consumed or invested, Yt-1 = Ct-1 + It-1, and the model can be written as the 2-dimensional VAR(2)
(Ct, It)^T = [α α; β 0] (Ct-1, It-1)^T + [0 0; –β 0] (Ct-2, It-2)^T + (et(1), et(2))^T
VI.3 Cointegration
Cointegrated time series can be applied to analyse non-stationary multivariate time series.
For univariate models, we have seen that a stochastic trend can be removed by differencing, so that the
resulting time series can be estimated using the univariate Box-Jenkins approach. In the multivariate
case, the appropriate way to treat non-stationary variables is not so straightforward, since it is possible
for there to be a linear combination of integrated variables that is stationary. In this case, the variables
are said to be cointegrated. This property can be found in many econometric models.
Definition: Two time series X and Y are called cointegrated if:
• X and Y are both I(1) processes, and
• there exists a vector (α, β) ≠ (0, 0) such that αX + βY is stationary.
Thus X and Y are themselves non-stationary (being I(1)), but their movements are correlated in such a way that a certain weighted average of the two processes is stationary. The vector (α, β) is called a cointegrating vector.
Remarks:
R1 – Any equilibrium relationship among a set of non-stationary variables indicates that the variables
cannot move independently of each other, and implies that their stochastic trends must be linked. This
linkage implies that the variables are cointegrated.
R2 – If the linear relationship (as made obvious by cointegration) is already stationary, differencing the
relationship entails a misspecification error.
R3 – There are two main popular tests for cointegration, but they are not the only ones.
Reference: see e.g. Enders, “Applied Econometric Time Series”, Wiley 2004.
Example: Let Xt denote the U.S. Dollar / GB Pound exchange rate. Let Pt be the consumer price index for the U.S. and Qt the consumer price index for the U.K.
It is assumed that Xt fluctuates around the purchasing power Pt/Qt according to the following model:
ln Xt = ln(Pt/Qt) + Yt
Yt = µ + φ(Yt-1 – µ) + et + θet-1
where e(1) and e(2) are zero-mean white noise, possibly correlated. Since ln Pt and ln Qt are both ARIMA(1,1,0) processes, they are both I(1) non-stationary, and ln Xt is also non-stationary. However,
ln Xt – ln Pt + ln Qt = Yt
is stationary, so the three series are cointegrated, with cointegrating vector (1, –1, 1) for (ln Xt, ln Pt, ln Qt).
Solution: We have to show that Xt – Yt is a stationary process. If we subtract the second equation from
the first, we get
Hence the process is stationary, since |0.3| < 1; the white noise terms don’t affect the stationarity.
Strictly speaking, we should also show that the processes Xt and Yt are both I(1). We use the method of
question 7.19 to find the process Yt: from the first equation (1) we have
If this is to be an I(1) process, we need to show that the first difference is I(0). Look at the characteristic
equation or re-write the above equation in terms of differences:
since |0.3| < 1, this process is I(0) and so Xt is I(1). Similarly, Yt can be shown to be I(1).
a. Bilinear models
A simple example of a bilinear model is
Xn = µ + α(Xn-1 – µ) + en + βen-1 + b(Xn-1 – µ)en-1.
Considered as a function of X, this relation is linear; it is also linear when considered as a function of e only; hence the name “bilinear”.
• Many bilinear models exhibit “burst” behaviour: when the process is far from its mean, it tends to exhibit larger fluctuations.
• The difference between this model and ARMA(1,1) is in the final term: b(Xn-1 – µ)en-1. If Xn-1 is far from µ and en-1 is far from 0, this term assumes a much greater significance.
b. Threshold AR models
Xn = µ + φ1(Xn-1 – µ) + en, if Xn-1 ≤ d
Xn = µ + φ2(Xn-1 – µ) + en, if Xn-1 > d
Example: set φ2 = 0. Xn follows an AR(1) process until it passes the threshold value d. Then Xn returns to µ and the process effectively starts again. Thus we get cyclic behaviour as the process keeps resetting.
c. Random coefficient AR models
Consider a simple example: Xt = µ + φt(Xt-1 – µ) + et, where {φ1, φ2, ...} is a sequence of independent random variables.
Example: Xt = value of an investment fund at time t. We have Xt = (1 + it)Xt-1 + et. It follows that µ = 0 and φt = 1 + it, where it is the random rate of return. The behaviour of such models is generally more irregular than that of the corresponding AR(1) model.
a. ARCH
Thus the variance of the process is dependent upon the size of the previous value. This is what is meant
by conditional heteroscedasticity.
The class of autoregressive models with conditional heteroscedasticity of order p – the ARCH(p) models
– is defined by:
p
X t = µ + et !0 +! ! k (X t-k - µ)2
k=1
X t = µ + et !0 + !1 (X t-1 - µ)2
A significant deviation of Xt-1 from the mean µ gives rise to an increase in the conditional variance of Xt given Xt-1, since
(Xt – µ)² = et² ( α0 + α1(Xt-1 – µ)² ).
Example: Let Zt denote the price of asset at the end of the tth trading day, and let Xt = ln(Zt/Zt-1) be the
daily rate of return on day t.
It has been found that the ARCH model can be used to model Xt.
b. GARCH