Time Series Analysis & ARMA Modeling

- A time series is a stochastic process in discrete time with a continuous state space. Time series analysis aims to describe observed data, construct models that fit the data, and forecast future values.
- Stationarity means the statistical properties of a time series remain unchanged over time. Weak stationarity only requires that the mean and autocovariances be constant.
- The autocorrelation function (ACF) describes correlations between observations at different times and helps identify appropriate models. The partial autocorrelation function (PACF) describes direct correlations between non-successive observations, adjusting for intermediate lags.


I. Introduction to time series analysis

A time series is a stochastic process in discrete time with a continuous state space.

Notation: {X1, X2, ..., Xn } denotes a time series process, whereas {x1, x2, ..., xn } denotes a univariate
time series, i.e. a sequence of realisations of the time series process.

[Diagram: the process X1, X2, ..., Xn-1, Xn, Xn+1 evolves over times 0, 1, 2, ..., n, n+1 with state space S = (-∞, ∞); x1, x2, ..., xn are the observed realisations and Xn+1 is still to be observed.]

I.1 Purposes of Time Series Analysis

• Describe the observed time series data:

- mean, variance, correlation structure, ...

- e.g. correlation coefficient between sales 1 month apart, 2 months apart, etc.
→ Autocorrelation Function (ACF)
→ Partial Autocorrelation Function (PACF)

• Construct a model which fits the data

→ From the class of ARMA models, select a model which best fits the data, based on the ACF and PACF of the observed time series

→ Apply the Box-Jenkins methodology:

o Identify a tentative model
o Estimate the model parameters
o Diagnostic checks - does the model fit?

• Forecast future values of the time series process

→ easy, once a model has been fitted to past data

All ARMA models are stationary. If an observed time series is non-stationary (e.g. it shows an upward trend), it must be converted to a stationary time series (e.g. by differencing).
I.2 Other forms of analysis

Another important approach to the analysis of time series relies on the Spectral Density Function; the
analysis is then based on the autocorrelation function of a time series model. This approach is not
covered in this course.
II. Stationarity and ARMA modelling
II.1 Stationarity

a. Definition

A stochastic process is (strictly) stationary if its statistical properties remain unchanged over time.

Joint distribution of (Xt1, Xt2, ..., Xtn) = joint distribution of (Xk+t1, Xk+t2, ..., Xk+tn), for all k and for all n.

Example: Joint distribution of (X5, X6, ..., X10) = joint distribution of (X120, X121, ..., X125)

→ for any 'chunk' of variables
→ for any 'shift' of the start

Implications of (strict) stationarity

Take n = 1:

• Distribution of Xt = distribution of Xt+k for any integers k

Xt discrete: P(Xt = i) = P(Xt+k = i) for any k

Xt continuous: f(Xt) = f(Xt+k) for any k

In particular, E(Xt) = E(Xt+k) for any k


Var(Xt) = Var(Xt+k) for any k

• A stationary process has constant mean and variance

• The variables Xt in a stationary process must be identically distributed (but not necessarily
independent)
Take n = 2:

• Joint distribution of (Xs, Xt) = joint distribution of (Xs+k, Xt+k)

→ for all lags (t - s)
→ for all integers k
→ the joint distribution depends only on the lag (t - s)

• In particular, COV(Xs ,Xt) = COV(Xs+k ,Xt+k)

where COV(Xs ,Xt) = E[(Xs – E(Xs)) (Xt – E(Xt))]

• Thus COV(Xs ,Xt) depends only on lag (t – s) and not on time s

b. Strict Stationarity

• Very stringent requirement


• Hard to prove a process is stationary
• To show a process is not stationary show one condition doesn’t hold

Examples:

Simple random walk: {Xt} not identically distributed

→ NOT stationary

White noise process: {Zt} i.i.d.

→ trivially stationary

c. Weak Stationarity

• This requires only that E(Xt) is constant AND COV(Xs, Xt) depends only on (t – s)

• Since Var(Xt) = COV(Xt, Xt) this implies that Var(Xt) is constant

• Weak stationarity does not imply strict stationarity

• For weak stationarity, COV(Xt, Xt+k) is constant with respect to t for all lags k

• Here (and often), stationary is shorthand for weakly stationary


Question: Show that if the joint distribution of the Xt's is multivariate normal, then weak stationarity implies strict stationarity.

Solution: If X ~ N(µ, Σ) then the distribution of X is completely determined by µ and Σ (a property of the multivariate normal distribution). If these do not depend on t, neither does the distribution of X.

Example: Xt = sin(ωt + U), U ~ U[0, 2π]; then E(Xt) = 0.

Here Cov(Xt, Xt+k) = cos(ωk) E(sin²(U)), which does not depend on t

→ Xt is weakly stationary

Question: If we know X0, then we can work out U, since X0 = sin(U). We then know all the values of Xt = sin(ωt + U)

→ Xt is completely determined by X0

Definition: X is purely indeterministic if the values of X1, ..., Xn become progressively less useful at predicting XN as N → ∞.

Here, stationary time series means a weakly stationary, purely indeterministic process.

II.2 Autocovariance, autocorrelation and partial autocorrelation

a. Autocovariance function


• For a stationary process, E(Xt) = µt = µ, for any t

• We define γk = Cov(Xt, Xt+k) = E(Xt Xt+k) - E(Xt) E(Xt+k), the "autocovariance at lag k".

• This function does not depend on t.

• Autocovariance function of X: {γ0, γ1, γ2, ...} = {γk : k ≥ 0}

• Note: γ0 = Var(Xt)

Question: Properties of covariance – needed when calculating autocovariances for specified models.

b. Autocorrelation function (ACF)

• Recall that corr(X,Y) = Cov(X,Y) / (σX σY)

• For a stationary process, we define ρk = corr(Xt, Xt+k) = γk/γ0, the "autocorrelation at lag k".
(This is the usual correlation coefficient, since Var(Xt) = Var(Xt+k) = γ0.)

• Autocorrelation Function (ACF) of X: {ρ0, ρ1, ρ2, ...} = {ρk : k ≥ 0}

• Note: ρ0 = 1

• For a purely indeterministic process, we expect ρk → 0 as k → ∞ (i.e. values far apart will not be correlated)

• Recall (ST3053): a sequence of i.i.d. random variables {Zt} is called a white noise process and is
trivially stationary.

Example: {et} is a zero-mean white noise process if

E(et) = 0 for any t, and

γk = Cov(et, et+k) = σ² if k = 0, and 0 otherwise

• Note: the variables et have zero mean, variance σ², and are uncorrelated

• A sequence of i.i.d. variables with zero mean will be a white noise process, according to this definition. In particular, Zt independent, Zt ~ N(0, σ²), is a white noise process.

• Result: γk = γ-k and ρk = ρ-k


• Correlogram = plot of the ACF {ρk : k ≥ 0} as a function of the lag k. It is widely used as it tells us a lot about the time series.

c. Partial autocorrelation function (PACF)

Let r(x,y|z) = corr(x,y|z) denote the partial correlation coefficient between x and y, adjusted for z (or
with z held constant).


• Denote:

φ2 = corr(Xt, Xt+2 | Xt+1)

φ3 = corr(Xt, Xt+3 | Xt+1, Xt+2)

...

φk = corr(Xt, Xt+k | Xt+1, ..., Xt+k-1) = partial autocorrelation coefficient at lag k.

• Partial autocorrelation function (PACF):

{φ1, φ2, ...} = {φk : k ≥ 1}

• The φk's are related to the ρk's:

φ1 = corr(Xt, Xt+1) = ρ1

Recall that

r(x,y|z) = [ r(x,y) - r(x,z) r(y,z) ] / √[ (1 - r²(x,z)) (1 - r²(y,z)) ]

Applying this here, using x = Xt, y = Xt+2, z = Xt+1, φ2 = corr(Xt, Xt+2 | Xt+1) = r(x,y|z), along with ρ1 = r(x,z) = r(y,z) and ρ2 = r(x,y), yields:

φ2 = (ρ2 - ρ1²) / (1 - ρ1²)

d. Estimation of the ACF and PACF

We assume that the sequence of observations {x1, x2, ...xn} comes from a stationary time series process.

The following functions are central to the analysis of time series:

• {γk} - autocovariance function
• {ρk} - autocorrelation function (ACF)
• {φk} - partial autocorrelation function (PACF)
• f(λ) - spectral density function (not covered in this course)

To find a model to fit the sequence {x1,x2, ... ,xn}, we must be able to estimate the ACF of the process
of which the data is a realisation. Since the model underlying the data is assumed to be stationary, its
mean can be estimated using the sample mean.
µ̂ = (1/n) Σ_{t=1}^{n} xt

The autocovariance function γk can be estimated using the sample autocovariance function:

γ̂k = (1/n) Σ_{t=k+1}^{n} (xt - µ̂)(xt-k - µ̂)

from which are derived the estimates rk of the autocorrelations ρk:

rk = γ̂k / γ̂0

The collection {rk : k ∈ ℤ} is called the sample autocorrelation function (SACF). The plot of rk against k is called a correlogram.

Recall that the partial autocorrelation coefficients φk are calculated as follows:

φ1 = ρ1

φ2 = det[ 1  ρ1 ; ρ1  ρ2 ] / det[ 1  ρ1 ; ρ1  1 ] = (ρ2 - ρ1²) / (1 - ρ1²)

In general, φk is given as a ratio of determinants involving ρ1, ρ2, ..., ρk. The sample partial autocorrelation coefficients are given by these formulae, but with the ρk replaced by their estimates rk:

φ̂1 = r1

φ̂2 = (r2 - r1²) / (1 - r1²)

etc.

The collection {φ̂k} is called the sample partial autocorrelation function (SPACF). The plot of {φ̂k} against k is called the partial correlogram.

[Sketches: the sample ACF rk and the sample PACF φ̂k plotted against the lag k; both take values between -1 and 1.]
These are the main tools in identifying a model for a stationary time series.
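As an illustration (not part of the original notes), the sample ACF and PACF can be computed directly in software. A minimal Python sketch, assuming numpy and statsmodels are available and using a made-up series x:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
x = rng.normal(size=200)          # hypothetical stationary series (here: white noise)

r = acf(x, nlags=20)              # sample autocorrelations r_0, ..., r_20
phi_hat = pacf(x, nlags=20)       # sample partial autocorrelations

# approximate 95% cut-off often drawn on correlograms: +/- 2/sqrt(n)
cutoff = 2 / np.sqrt(len(x))
print("SACF :", np.round(r[1:6], 3))
print("SPACF:", np.round(phi_hat[1:6], 3))
print("values outside +/-", round(cutoff, 3), "suggest non-zero correlation")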
II.3 ARMA modelling

Autoregressive moving average (ARMA) models constitute the main class of linear models for time
series. More specifically:
• Autoregressive (AR)
• Moving Average (MA)
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)

→ The last type is non-stationary

→ The others are stationary

a. AR models
• Recall: Markov Chain = process such that the conditional distribution of Xn+1, given Xn,Xn-1,...X0
depends only on Xn, i.e. “the future depends on the present, but not on the past”
• The simplest type of autoregressive model, AR(1), has this property: Xt = α Xt-1 + et, where et is zero-mean white noise.

• For AR(1), we prove that φ2 = corr(Xt, Xt-2 | Xt-1) = 0

• Similarly, φk = 0 for k > 2.
• A more general form of an AR(1) model is

Xt = µ + α (Xt-1 - µ) + et

where µ = E(Xt) is the process mean

• Autoregressive process of order p (AR(p)):

Xt = µ + α1 (Xt-1 - µ) + α2 (Xt-2 - µ) + ... + αp (Xt-p - µ) + et

b. MA models

A realisation of a white noise process is very 'jagged', since successive observations are realisations of independent variables. Most time series observed in practice have a smoother time series plot than this. Taking a "moving average" is a standard way of smoothing an observed time series:

Observed data: x1, x2, x3, x4, ...

Moving average: ⅓(x1 + x2 + x3), ⅓(x2 + x3 + x4), ...

• A moving average process is "smoothed white noise"

• The simplest type of moving average (MA) process is Xt = µ + et + β et-1, where et is zero-mean white noise

• The et's are uncorrelated, but the Xt's are not: Xt-1 and Xt both involve et-1.

• For MA(1) we prove that ρ2 = corr(Xt, Xt-2) = 0

• Similarly, ρk = 0, for k > 2

• Moving average process of order q (MA(q)):

Xt = µ + et + β1 et-1 + ... + βq et-q

c. ARMA models

ARMA processes "combine" AR and MA parts:

Xt = µ + α1(Xt-1 - µ) + ... + αp(Xt-p - µ) + et + β1 et-1 + ... + βq et-q

Note: ARMA(p,0) = AR(p)

      ARMA(0,q) = MA(q)
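As a side illustration (not in the original notes), an ARMA model can be simulated and checked numerically. A minimal sketch using statsmodels' ArmaProcess, which expects the AR and MA lag polynomials with the AR coefficients negated; the parameter values are arbitrary:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ARMA(1,1): Xt = 0.7 Xt-1 + et + 0.4 et-1  (illustrative parameter values)
ar = np.array([1, -0.7])   # coefficients of 1 - 0.7 B
ma = np.array([1, 0.4])    # coefficients of 1 + 0.4 B
proc = ArmaProcess(ar, ma)

print(proc.isstationary, proc.isinvertible)   # True True for these values
x = proc.generate_sample(nsample=500)         # simulated realisation of length 500
print(x[:5])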

II.4 Backwards Shift Operator and Difference Operator

The following operators will be useful:


• Backwards shift operator: B Xt = Xt-1, Bµ = µ
• Difference operator: ∇ = 1 - B, hence

∇Xt = Xt - Xt-1

B²Xt = B(BXt) = BXt-1 = Xt-2

∇²Xt = ∇Xt - ∇Xt-1
     = Xt - Xt-1 - (Xt-1 - Xt-2)
     = (1 - B)² Xt
     = (1 - 2B + B²) Xt
     = Xt - 2Xt-1 + Xt-2

(a quick numerical check of these identities follows)
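A minimal numerical check of the second-difference identity (illustrative, numpy only; the series x is arbitrary):

import numpy as np

x = np.array([3.0, 5.0, 4.0, 8.0, 7.0])

d1 = np.diff(x)            # first difference:  x_t - x_{t-1}
d2 = np.diff(x, n=2)       # second difference: x_t - 2 x_{t-1} + x_{t-2}

# same result written out explicitly
d2_explicit = x[2:] - 2 * x[1:-1] + x[:-2]
print(np.allclose(d2, d2_explicit))   # True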

II.5 AR(p) models, stationarity and the Yule-Walker equations

a. The AR(1) Model


• Recall Xt = µ + α(Xt-1 - µ) + et
• Substituting in for Xt-1, then for Xt-2, ...:

Xt = µ + α[α(Xt-2 - µ) + et-1] + et = µ + α²(Xt-2 - µ) + et + α et-1

Continuing back to time 0,

Xt = µ + α^t (X0 - µ) + et + α et-1 + ... + α^(t-1) e1 = µ + α^t (X0 - µ) + Σ_{j=0}^{t-1} α^j et-j

• Note: X0 is a Random Variable


• Since E(et) = 0 for any t, µt = E(Xt) = µ + α^t (µ0 - µ), where µ0 = E(X0)
• Since the et's are uncorrelated with each other and with X0,

Var(Xt) = Var( µ + α^t (X0 - µ) + Σ_{j=0}^{t-1} α^j et-j )
        = α^(2t) Var(X0) + Σ_{j=0}^{t-1} α^(2j) σ²
        = α^(2t) Var(X0) + σ² (1 - α^(2t)) / (1 - α²)

Question: When will the AR(1) process be stationary?

Answer: This will require constant mean and variance.

If µ0 = µ then µt = µ + α^t (µ0 - µ) = µ.

If Var(X0) = σ²/(1 - α²) then Var(Xt) = α^(2t) σ²/(1 - α²) + σ²(1 - α^(2t))/(1 - α²) = σ²/(1 - α²).

Then neither µt nor Var(Xt) depends on t. We also require that |α| < 1 for the AR(1) process to be stationary, in which case

µt - µ = α^t (µ0 - µ)   AND   Var(Xt) - σ²/(1 - α²) = α^(2t) [ Var(X0) - σ²/(1 - α²) ]

• If |α| < 1, both terms decay away to zero for large t
→ X is almost stationary for large t

• Equivalently, if we assume that the process has already been running for a very long time, it
will be stationary

• Any AR(1) process with infinite history and |α| < 1 will be stationary:

... e-2, e-1, e0, e1, ..., et
... X-2, X-1, X0, X1, ..., Xt

(steady state reached by time 0; x1, ..., xt is the observed time series)

• An AR(1) process can be represented as:

Xt = µ + Σ_{j=0}^{∞} α^j et-j

and this converges only if |α| < 1.

• The AR(1) model Xt = µ + α(Xt-1 - µ) + et can be written as

(1 - αB)(Xt - µ) = et

If |α| < 1, then (1 - αB) is invertible and

Xt - µ = (1 - αB)⁻¹ et = (1 + αB + α²B² + ...) et = et + α et-1 + α² et-2 + ...

• So

Xt = µ + Σ_{j=0}^{∞} α^j et-j

From this representation,

µt = E(Xt) = µ   and   Var(Xt) = Σ_{j=0}^{∞} α^(2j) σ² = σ²/(1 - α²), if |α| < 1.

• So, if |α| < 1, the mean and variance are constant, as required for stationarity

• We must calculate the autocovariance γk = Cov(Xt, Xt+k) and show that this depends only on the lag k. We need properties of covariance:

Cov(X+Y, W) = Cov(X, W) + Cov(Y, W)

Cov(X, e) = 0 for any variable X uncorrelated with e

Since Xt-1 depends only on et-1, et-2, ..., and et is uncorrelated with all of these, et and Xt-1 are uncorrelated, hence

Cov(et, Xt-1) = 0
Cov(et, Xt-k) = 0, k ≥ 1
Cov(et, Xt) = σ²

γ1 = Cov(Xt, Xt-1) = Cov(µ + α(Xt-1 - µ) + et, Xt-1)
   = α Cov(Xt-1, Xt-1) + Cov(et, Xt-1)
   = αγ0 + 0

γ2 = Cov(Xt, Xt-2) = Cov(µ + α(Xt-1 - µ) + et, Xt-2)
   = α Cov(Xt-1, Xt-2) + Cov(et, Xt-2)
   = αγ1 + 0
   = α²γ0

Similarly, γk = α^k γ0, k ≥ 0.

In general,

γk = Cov(Xt, Xt-k) = Cov(µ + α(Xt-1 - µ) + et, Xt-k)
   = α Cov(Xt-1, Xt-k) + Cov(et, Xt-k)
   = α γk-1 + 0

Hence,

γk = α^k γ0 = α^k σ²/(1 - α²) for k ≥ 0

and ρk = γk/γ0 = α^k for k ≥ 0

→ the ACF decreases geometrically with k


Recall the partial autocorrelations φ1 and φ2 satisfy

φ1 = ρ1   and   φ2 = (ρ2 - ρ1²) / (1 - ρ1²)

Here φ1 = ρ1 = α and

φ2 = (α² - α²) / (1 - α²) = 0

In fact, φk = 0 for k > 1.

In summary, for the AR(1) model,

• ACF "tails off" to zero

• PACF "cuts off" after lag 1

(see the numerical sketch below)
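These two facts can be verified numerically. A minimal sketch (illustrative, using statsmodels; α = 0.6 is an arbitrary choice) prints the theoretical ACF and PACF of an AR(1), showing the geometric decay ρk = α^k and the PACF cut-off after lag 1:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

alpha = 0.6
proc = ArmaProcess(np.array([1, -alpha]), np.array([1.0]))   # AR(1): (1 - 0.6B)Xt = et

print(np.round(proc.acf(lags=5), 4))    # 1, 0.6, 0.36, 0.216, 0.1296  = alpha**k
print(np.round(proc.pacf(lags=5), 4))   # 1, 0.6, then approximately 0: cuts off after lag 1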

Example: Consumer price index Qt

rt = ln(Qt/Qt-1) models the force of inflation

Assume rt is an AR(1) process:

rt = µ + α(rt-1 - µ) + et

Note: Here µ is the long-run mean.

rt - µ = α(rt-1 - µ), ignoring et

If |α| < 1, then rt - µ → 0 and so rt → µ as t → ∞. In this case rt is said to be mean-reverting.

b. The AR(P) model and stationarity

Recall that the AR(p) model can be written either in its generic form

Xt = µ + α1(Xt-1 - µ) + α2(Xt-2 - µ) + ... + αp(Xt-p - µ) + et

or using the B operator as

(1 - α1B - α2B² - ... - αpB^p)(Xt - µ) = et

Result: AR(p) is stationary IFF the roots of the characteristic equation

1 - α1z - α2z² - ... - αpz^p = 0

are all greater than 1 in absolute value.

1 - α1z - α2z² - ... - αpz^p is called the characteristic polynomial.

Explanation for this result: write the AR(p) process in the form

(1 - B/z1)(1 - B/z2) ... (1 - B/zp)(Xt - µ) = et

where z1, ..., zp are the roots of the characteristic polynomial:

1 - α1z - ... - αpz^p = (1 - z/z1)(1 - z/z2) ... (1 - z/zp)

In the AR(1) case,

1 - αz = 1 - z/z1, where z1 = 1/α

In the AR(1) case, we can invert the term (1 - B/z1) in

(1 - B/z1)(Xt - µ) = et

IFF |z1| > 1. In the AR(p) case, we need to be able to invert all of the factors (1 - B/zi).

This will be the case IFF |zi| > 1 for i = 1, 2, ..., p.

Example: AR(2)

Xt = 5 - 2(Xt-1 - 5) + 3(Xt-2 - 5) + et, or (1 + 2B - 3B²)(Xt - 5) = et

→ 1 + 2z - 3z² = 0 is the characteristic equation here

Question: when is an AR(1) process stationary?

Answer: we have

Xt = µ + α(Xt-1 - µ) + et,

i.e. (1 - αB)(Xt - µ) = et, so 1 - αz = 0 is the characteristic equation, with solution z = 1/α. So |α| < 1 is equivalent to |z| > 1, as required.

Question: Consider the AR(2) process Xn = Xn-1 - ½Xn-2 + en. Is it stationary?

Answer: Use the B operator: (1 - B + ½B²)Xn = en. So the characteristic equation is

1 - z + ½z² = 0, with roots 1 ± i, and |1 ± i| = √2 > 1

Since both roots satisfy |zi| > 1, the process is stationary. (A numerical check of this root condition is sketched below.)
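A root check like this is easy to automate; the following sketch (illustrative, numpy only) tests the AR(2) example above by finding the roots of its characteristic polynomial and checking that they all lie outside the unit circle:

import numpy as np

# characteristic polynomial of Xn = Xn-1 - 0.5 Xn-2 + en:  1 - z + 0.5 z^2
# numpy.roots expects coefficients from the highest power down
coeffs = [0.5, -1.0, 1.0]
roots = np.roots(coeffs)

print(roots)                       # 1+1j and 1-1j
print(np.all(np.abs(roots) > 1))   # True -> stationary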

In the AR(1) model, we had γ1 = αγ0 and γ0 = αγ1 + σ². These are a particular case of the Yule-Walker equations for AR(p):

Cov(Xt, Xt-k) = Cov(µ + α1(Xt-1 - µ) + ... + αp(Xt-p - µ) + et, Xt-k)
             = α1 Cov(Xt-1, Xt-k) + ... + αp Cov(Xt-p, Xt-k) + { σ² if k = 0; 0 otherwise }

c. Yule-Walker equations

The Yule-Walker equations are defined by the following relationship:

γk = α1 γk-1 + α2 γk-2 + ... + αp γk-p + { σ² if k = 0; 0 otherwise },   for 0 ≤ k ≤ p

Considering the AR(1) (i.e. p = 1): for k = 1 we get γ1 = αγ0, and for k = 0 we get γ0 = αγ1 + σ².

Example (p = 3):

γ3 = α1γ2 + α2γ1 + α3γ0
γ2 = α1γ1 + α2γ0 + α3γ1
γ1 = α1γ0 + α2γ1 + α3γ2
γ0 = α1γ1 + α2γ2 + α3γ3 + σ²

Example: consider the AR(3) model Xt = 0.6Xt-1 + 0.4Xt-2 - 0.1Xt-3 + et

Yule-Walker equations:
γ0 = 0.6γ1 + 0.4γ2 - 0.1γ3 + σ²   (0)
γ1 = 0.6γ0 + 0.4γ1 - 0.1γ2        (1)
γ2 = 0.6γ1 + 0.4γ0 - 0.1γ1        (2)
γ3 = 0.6γ2 + 0.4γ1 - 0.1γ0        (3)

From (1), γ2 = 6γ0 - 6γ1

From (2), γ2 = 0.4γ0 + 0.5γ1, hence γ1 = (56/65)γ0, and hence γ2 = (54/65)γ0.

From (3), γ3 = (483/650)γ0

From (0), σ² = 0.22508γ0

Hence, γ0 = 4.4429σ², γ1 = 3.8278σ², γ2 = 3.6910σ², γ3 = 3.3014σ²

and so, since ρk = γk/γ0: ρ0 = 1, ρ1 = 0.862, ρ2 = 0.831, ρ3 = 0.743.

(These values are verified numerically in the sketch below.)
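The same numbers can be obtained by solving the Yule-Walker equations as a small linear system. A minimal sketch (numpy only, taking σ² = 1):

import numpy as np

# Yule-Walker equations for Xt = 0.6 Xt-1 + 0.4 Xt-2 - 0.1 Xt-3 + et, written as
# A @ (g0, g1, g2, g3) = (sigma^2, 0, 0, 0) with sigma^2 = 1
A = np.array([
    [ 1.0, -0.6, -0.4,  0.1],   # g0 = 0.6 g1 + 0.4 g2 - 0.1 g3 + sigma^2
    [-0.6,  0.6,  0.1,  0.0],   # g1 = 0.6 g0 + 0.4 g1 - 0.1 g2
    [-0.4, -0.5,  1.0,  0.0],   # g2 = 0.6 g1 + 0.4 g0 - 0.1 g1
    [ 0.1, -0.4, -0.6,  1.0],   # g3 = 0.6 g2 + 0.4 g1 - 0.1 g0
])
b = np.array([1.0, 0.0, 0.0, 0.0])

g = np.linalg.solve(A, b)
print(np.round(g, 3))         # approx [4.443, 3.828, 3.691, 3.301]
print(np.round(g / g[0], 3))  # rho_k: [1, 0.862, 0.831, 0.743]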

It may be shown that for AR(p) models,

• ACF “tails off” to zero,


• PACF "cuts off" after lag p, i.e. φk = 0 for k > p

II.6 MA(q) models and invertibility

a. The MA(1) model

The model is given by Xt = µ + et + β et-1, where µt = E(Xt) = µ, and

γ0 = Var(et + β et-1) = (1 + β²)σ²

γ1 = Cov(et + β et-1, et-1 + β et-2) = βσ²

γk = 0 for k > 1

Hence, the ACF for MA(1) is:

ρ0 = 1

ρ1 = β / (1 + β²)

ρk = 0 for k > 1

Since the mean E(Xt) and the covariances γk = Cov(Xt, Xt-k) do not depend on t, the MA(1) process is (weakly) stationary - for all values of the parameter β.

However, we require MA models to be invertible and this imposes conditions on the parameters.

Recall: If |α| < 1 then in the AR(1) model

(1 - αB)(Xt - µ) = et,

(1 - αB) is invertible and

Xt = µ + Σ_{j=0}^{∞} α^j et-j = µ + et + α et-1 + α² et-2 + ...

i.e. an AR(1) process is MA(∞). An MA(1) process can be written as

Xt - µ = (1 + βB)et

or

(1 + βB)⁻¹(Xt - µ) = et

i.e.

(Xt - µ) - β(Xt-1 - µ) + β²(Xt-2 - µ) - ... = et

So an MA(1) process is represented as an AR(∞) one - but only if |β| < 1, in which case the MA(1) process is invertible.

Example: MA(1) with β = 0.5 or β = 2

For both values of β we have:

ρ1 = β/(1 + β²) = 0.5/(1 + 0.5²) = 2/(1 + 2²) = 0.4,

so both models have the same ACF. However, only the model with β = 0.5 is invertible.

Question: Interpretation of invertibility

Consider the MA(1) model Xn = µ + en + β en-1. We have

en = Xn - µ - β en-1 = Xn - µ - β(Xn-1 - µ - β en-2)
   = ...
   = (Xn - µ) - β(Xn-1 - µ) + β²(Xn-2 - µ) - ... + (-β)^(n-1)(X1 - µ) + (-β)^n e0

As n gets large, the dependence of en on e0 will be small if |β| < 1.

Note:

AR(1) is stationary IFF |α| < 1.

MA(1) is invertible IFF |β| < 1.

For an MA(1) process, we have ρk = 0 for k > 1, so for an MA(1) process the ACF "cuts off" after lag 1. It may be shown that the PACF "tails off" to zero.

        AR(1)                   MA(1)

ACF     tails off to zero       cuts off after lag 1

PACF    cuts off after lag 1    tails off to zero

b. The MA(q) model and invertibility


An MA(q) process is given by Xt = µ + et + β1 et-1 + ... + βq et-q, where {et} is zero-mean white noise (a sequence of uncorrelated variables). For this model we have γk = Cov(Xt, Xt-k) = 0 for k > q:

γk = Cov(Xt, Xt-k)
   = E[(et + β1 et-1 + ... + βq et-q)(et-k + β1 et-k-1 + ... + βq et-k-q)]
   = Σ_{i=0}^{q} Σ_{j=0}^{q} βi βj E(et-i et-j-k)      [where β0 = 1]
   = σ² Σ_{j=0}^{q-k} βj+k βj                          [since j = i - k ≤ q - k]

since the only non-zero terms occur when the subscripts of et-i and et-j-k match, i.e. when i = j + k, for k ≤ q.

In summary, for k > q, γk = 0:


• For MA(q), ACF cuts off after lag q
• For AR(p), PACF cuts off after lag p

Question: ACF of the MA(2) process Xn = 1 + en - 5en-1 + 6en-2, where E(en) = 0 and Var(en) = 1.

γ0 = Cov(1 + en - 5en-1 + 6en-2, 1 + en - 5en-1 + 6en-2) = (1 + 25 + 36) × 1 = 62

γ1 = Cov(1 + en - 5en-1 + 6en-2, 1 + en-1 - 5en-2 + 6en-3) = (-5)(1) + (6)(-5) = -35

γ2 = Cov(1 + en - 5en-1 + 6en-2, 1 + en-2 - 5en-3 + 6en-4) = (6)(1) = 6

γk = 0, k > 2

Recall that an AR(p) process is stationary IFF the roots z of the characteristic equation satisfy |z| > 1. For an MA(q) process, we have

Xt - µ = (1 + β1B + β2B² + ... + βqB^q) et

Consider the equation 1 + β1z + β2z² + ... + βqz^q = 0. The MA(q) process is invertible IFF all roots z of this equation satisfy |z| > 1.

In summary:

• If AR(p) is stationary, then AR(p) = MA(∞)

• If MA(q) is invertible, then MA(q) = AR(∞)

Question: Assess the invertibility of the MA(2) process Xt = 2 + et - 5et-1 + 6et-2.

We have Xt = 2 + (1 - 5B + 6B²)et.

The characteristic equation is 1 - 5z + 6z² = (1 - 2z)(1 - 3z) = 0, i.e. roots z = ½ and z = ⅓

→ Not invertible

II.7 ARMA(p,q) models

Recall that the ARMA(p,q) model can be written either in its generic form

Xt = µ + α1(Xt-1 - µ) + ... + αp(Xt-p - µ) + et + β1 et-1 + ... + βq et-q

or using the B operator:

(1 - α1B - ... - αpB^p)(Xt - µ) = (1 + β1B + ... + βqB^q)et

i.e. α(B)(Xt - µ) = β(B)et

where

α(λ) = 1 - α1λ - ... - αpλ^p

β(λ) = 1 + β1λ + ... + βqλ^q

If α(λ) and β(λ) have factors in common, we simplify the defining relation.

Consider the simple ARMA(1,1) process with β = -α, written either

Xt = αXt-1 + et - α et-1

or

(1 - αB)Xt = (1 - αB)et, with |α| < 1

Dividing through by (1 - αB), we obtain Xt = et. Therefore the process is actually ARMA(0,0), also called white noise.

We assume that α(λ) and β(λ) have no common factors. The properties of ARMA(p,q) are a mixture of those of AR(p) and those of MA(q).

• Characteristic polynomial of ARMA(p,q) = 1 - α1z - ... - αpz^p (as for AR(p))

• ARMA(p,q) is stationary IFF all the roots z of 1 - α1z - ... - αpz^p = 0 satisfy |z| > 1

• ARMA(p,q) is invertible IFF all the roots z of 1 + β1z + ... + βqz^q = 0 satisfy |z| > 1

Example: the ARMA(1,1) process Xt = αXt-1 + et + β et-1 is stationary if |α| < 1 and invertible if |β| < 1.

Example: ACF of ARMA(1,1). For the model given by Xt = αXt-1 + et + β et-1 we have

Cov(et, Xt-1) = 0

Cov(et, et-1) = 0

Cov(et, Xt) = α Cov(et, Xt-1) + Cov(et, et) + β Cov(et, et-1) = σ²

Cov(et-1, Xt) = α Cov(et-1, Xt-1) + Cov(et-1, et) + β Cov(et-1, et-1)
             = ασ² + 0 + βσ² = (α + β)σ²

γ0 = Cov(Xt, Xt) = α Cov(Xt, Xt-1) + Cov(Xt, et) + β Cov(Xt, et-1)
   = αγ1 + σ² + β(α + β)σ²
   = αγ1 + (1 + αβ + β²)σ²

γ1 = Cov(Xt-1, Xt) = α Cov(Xt-1, Xt-1) + Cov(Xt-1, et) + β Cov(Xt-1, et-1)
   = αγ0 + βσ²

For k > 1,

γk = Cov(Xt-k, Xt) = α Cov(Xt-k, Xt-1) + Cov(Xt-k, et) + β Cov(Xt-k, et-1)
   = α γk-1

(analogues of the Yule-Walker equations)

→ Solve for γ0 and γ1:

γ0 = (1 + 2αβ + β²) σ² / (1 - α²)

γ1 = (α + β)(1 + αβ) σ² / (1 - α²)

γk = α^(k-1) γ1, for k > 1

Hence ρ1 = γ1/γ0 = (1 + αβ)(α + β) / (1 + 2αβ + β²), and ρk = α^(k-1) ρ1 for k > 1 (compare ρk = α^k, k ≥ 0, for AR(1)).
For (stationary) ARMA(p,q),

• ACF tails off to zero


• PACF tails off to zero

Question: ARMA(2,2) process

12Xt = 10Xt-1 - 2Xt-2 + 12et - 11et-1 + 2et-2

(12 - 10B + 2B²)Xt = (12 - 11B + 2B²)et

The roots of

12 - 10z + 2z² = 2(z - 2)(z - 3) = 0

are z = 2 and z = 3; |z| > 1 for both roots, so the process is stationary.

II.8 ARIMA(p,d,q) models

a. Non-ARMA processes

• Given time series data X1 ... Xn, find a model for this data.

• Calculate sample statistics: sample mean, sample ACF, sample PACF.

• Compare with known ACF/PACF of class of ARMA models to select suitable model.

• All ARMA models considered are stationary – so can only be used for stationary time series
data.

• If time-series data is non-stationary, transform it to a stationary time series (e.g. by differencing)

• Model this transformed series using an ARMA model

• Take the “inverse transform” of this model as model for the original non-stationary time series.

Example: Random walk. X0 = 0, Xn = Xn-1 + Zn, where Zn is a white noise process.

Xn is non-stationary, but ∇Xn = Xn - Xn-1 = Zn is stationary.

Question: Given x0, x1, ..., xN, the first order differences are wi = xi - xi-1, i = 1, ..., N.

From the differences w1, w2, ..., wN and x0 we can calculate the original time series:

w1 = x1 – x0 , so x1 = x0 + w1

w2 = x2 – x1 , so x2 = x1 + w2

= x0 + w1 + w2, etc.

The inverse process of differencing is integration, since we must sum the differences to obtain the
original time series.
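A minimal numerical illustration of this (numpy only): differencing with np.diff and "integrating" back with a cumulative sum recovers the original series once x0 is added back.

import numpy as np

x = np.array([2.0, 3.5, 3.0, 6.0, 5.5])
w = np.diff(x)                       # first differences w_i = x_i - x_{i-1}

x_rebuilt = x[0] + np.concatenate(([0.0], np.cumsum(w)))
print(np.allclose(x, x_rebuilt))     # True: integration undoes differencing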

b. The I(d) notation (“integrated of order d”)

• X is said to be I(0) if X is stationary


• X is said to be I(1) if X is not stationary but Yt = Xt - Xt-1 is stationary

• X is said to be I(2) if X is not stationary, but Y = ∇X is I(1).

Thus X is I(d) if X must be "differenced" d times to make it stationary.

Example: If the first differences ∇xn = xn - xn-1 of x1, x2, ..., xn are modelled by an AR(1) model (stationary)

∇Xn = 0.5 ∇Xn-1 + en,

Then, Xn – Xn-1 = 0.5(Xn-1 – Xn-2) + en, so Xn = 1.5Xn-1 – 0.5Xn-2 +en is the model for the original time
series.

This AR(2) model is non-stationary since written as (1 – 1.5B + 0.5B2)Xn = en, for which the
characteristic equation is:

1 – 1.5z + 0.5z2 = 0

with roots z = 1 and z = 2. The model is non-stationary since |z| > 1 does not hold for BOTH roots.

X is ARIMA(p,1,q) if X is non-stationary, but ∇X (the first difference of X) is a stationary ARMA(p,q) process.

• Recall that a process X is I(1) if X is non-stationary, but ∇Xt = Xt - Xt-1 is stationary

Note: If Xt is ARIMA(p,1,q) then Xt is I(1).

Example: Random walk. Xt - Xt-1 = et, where et is a white noise process.

We have

Xt = X0 + Σ_{j=1}^{t} ej

So E(Xt) = E(X0), if E(et) = 0, but Var(Xt) = Var(X0) + tσ². Hence Xt is non-stationary, but ∇Xt = et, where et is a stationary white noise process.

Example: Zt = closing share price on day t. Here the model is given by

Zt = Zt-1 exp(µ + et)

Let Yt = ln Zt , then Yt = µ + Yt-1 + et . This is a random walk with drift.

Now consider the daily returns Yt – Yt-1 = ln(Zt/Zt-1). Since Yt – Yt-1 = µ + et and the et’s are independent,
then Yt – Yt-1 is independent of Y1 ...Yt-1 or ln(Zt/Zt-1) is independent of past prices Z0, Z1, ... Zt-1.
Example: Recall the example of Qt = consumer price index at time t. We have

rt = ln(Qt/Qt-1) follows an AR(1) model:

rt = µ + α(rt-1 - µ) + et

ln(Qt/Qt-1) = µ + α(ln(Qt-1/Qt-2) - µ) + et

∇ln(Qt) = µ + α(∇ln(Qt-1) - µ) + et

thus ∇ln(Qt) is AR(1) and so ln(Qt) is ARIMA(1,1,0)

If
• X needs to be differenced at least d times to reduce it to stationarity,
• and Y = ∇^d X is a stationary ARMA(p,q),

then X is an ARIMA(p,d,q) process.

An ARIMA(p,d,q) process is I(d)

Example: Identify the following model as ARIMA(p,d,q):

Xt = 0.6Xt-1 + 0.3Xt-2 + 0.1Xt-3 + et - 0.25et-1

(1 - 0.6B - 0.3B² - 0.1B³)Xt = (1 - 0.25B)et

Check for a factor (1 - B) on the LHS: (1 - B)(1 + 0.4B + 0.1B²)Xt = (1 - 0.25B)et

→ Model is ARIMA(2,1,1)

Characteristic equation (for ∇Xt): 1 + 0.4z + 0.1z² = 0, with roots -2 ± i√6.

Since |z| = √10 > 1 for both roots, ∇Xt is stationary, as required.

Alternative method: Write the model in terms of ∇Xt = Xt - Xt-1, ∇Xt-1, etc.:

Xt - Xt-1 = -0.4Xt-1 + 0.4Xt-2 - 0.1Xt-2 + 0.1Xt-3 + et - 0.25et-1

∇Xt = -0.4 ∇Xt-1 - 0.1 ∇Xt-2 + et - 0.25et-1

Hence ∇Xt is ARMA(2,1) (check for stationarity as above), and so Xt is ARIMA(2,1,1).

Note: if ∇^d Xt is ARMA(1,q), to check for stationarity we only need to check that |α1| < 1.
II.9 The Markov Property

AR(1) Model:

Xt = µ + α(Xt-1 - µ) + et

The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends only on Xn

→ AR(1) has the Markov property

AR(2) Model:

Xt = µ + α1(Xt-1 - µ) + α2(Xt-2 - µ) + et

The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends on Xn-1 as well as Xn.

→ AR(2) does not have the Markov property

Consider now

Xn+1 = µ + α1Xn + α2Xn-1 + en+1

or, in vector form,

( Xn+1 )   ( µ )   ( α1  α2 ) ( Xn   )   ( en+1 )
(      ) = (   ) + (        ) (      ) + (      )
( Xn   )   ( 0 )   ( 1    0 ) ( Xn-1 )   ( 0    )

Define Yn = (Xn, Xn-1)ᵀ; then

Yn+1 = (µ, 0)ᵀ + ( α1  α2 ; 1  0 ) Yn + (en+1, 0)ᵀ

• Y is said to be a vector autoregressive process of order 1.

• Notation: VAR(1)

• Y has the Markov property

• In general, AR(P) does not have the Markov property for p > 1, but Y = (Xt, Xt-1, ... Xt-p+1)T does

• Recall: Random walk – ARIMA(0,1,0) defined by Xt – Xt-1 = et has independent increments and
hence does have the Markov property

It may be shown that for p+d > 1, ARIMA(p,d,0) does not have the Markov property, but Yt = (Xt, Xt-1,
..., Xt-p-d+1)T does.
Consider the MA(1) process Xt = µ + et + .et-1. It is clear that “knowing Xn will never be enough to
deduce the value of en, on which the distribution of Xn+1 depends”. Hence an MA(1) process does not
have the Markov property.

Now consider an MA(q) = AR(∞) process. It is known that AR(p) processes Y = (Xt, Xt-1, ..., Xt-p+1)ᵀ have the Markov property if considered as a p-dimensional vector process (p finite). It follows that an MA(q) process has no finite dimensional Markov representation.

Question: Associate a vector-valued Markov process with 2Xt = 5Xt-1 - 4Xt-2 + Xt-3 + et

We have

2(Xt - Xt-1) = 3(Xt-1 - Xt-2) - (Xt-2 - Xt-3) + et

2∇Xt = 3∇Xt-1 - ∇Xt-2 + et

2∇²Xt = ∇²Xt-1 + et

→ ARIMA(1,2,0), i.e. ARIMA(p,d,q) with p = 1 and d = 2.

Since p + d = 3 > 1, Yt = (Xt, Xt-1, ..., Xt-p-d+1)ᵀ = (Xt, Xt-1, Xt-2)ᵀ is Markov.

Question: Consider the MA(1) process Xn = en + en-1, where

en = +1 with probability ½, -1 with probability ½

P(Xn = 2 | Xn-1 = 0)
 = P(en = 1, en-1 = 1 | en-1 + en-2 = 0)
 = P(en = 1) P(en-1 = 1 | en-1 + en-2 = 0)
 = ½ × ½ = ¼

P(Xn = 2 | Xn-1 = 0, Xn-2 = 2)
 = P(en = 1, en-1 = 1 | en-1 + en-2 = 0, en-2 + en-3 = 2)
 = 0

→ Not Markov: since the two probabilities differ, the conditional distribution of Xn does not depend only on the immediate past value Xn-1.
III. Non-stationarity: trends and techniques
III.1 Typical trends

Possible causes of non-stationarity in a time series are:

• Deterministic trend (e.g. linear or exponential growth)


• Deterministic cycle (e.g. seasonal effects)
• Time series is integrated (as opposed to differenced)

Example: Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6 and -1 with probability 0.4.

Here Xn is I(1), since Zn = Xn - Xn-1 is stationary. Also, E(Xn) = E(Xn-1) + 0.2, so the process has a deterministic trend.

Many techniques can be used to detect non-stationary series; among the simplest methods are:
• Plot of the time series against t
• Sample ACF

The sample ACF is an estimate of the theoretical ACF, based on the sample data and is defined later. A
plot of the time series will highlight a trend in the data and will show up any cyclic variation.

[Sketches: a time series with a trend; a time series with a seasonal pattern (e.g. over 2003-2004); and a time series with both trend and seasonal variation.]

Recall: For a stationary time series, ρk → 0 as k → ∞, i.e. the (theoretical) ACF converges toward zero.

Hence, the sample ACF should also converge toward zero. If the sample ACF decreases slowly, the time series is non-stationary, and needs to be differenced before fitting a model.

[Sketches: a sample ACF that decays only slowly with lag k, indicating non-stationarity; and a sample ACF that oscillates with period 12, indicating a seasonal pattern.]

If sample ACF exhibits periodic oscillation, there is probably a seasonal pattern in the data. This
should be removed before fitting a model (see Figures 7.3a and 7.3b). The following graph (Fig 7.3(a))
shows the number of hotel rooms occupied over several years. Inspection shows the clear seasonal
dependence, manifested as a cyclic effect.

The next graph (Fig 7.3(b)) shows the sample autocorrelation function for this data. It is clear that the
seasonal effect shows up as a cycle in this function. In particular, the period of this cycle looks to be 12
months, reinforcing the idea that it is a seasonal effect.

Figure 7.3: Seasonal variation - hotel room occupancy 1963-1976 (7.3a) and its sample ACF (7.3b)

Methods for removing a linear trend:


• Least squares
• Differencing

Methods for removing a seasonal effect

• Seasonal differencing
• Method of Moving Averages
• Method of seasonal means

III.2 Least squares trend removal

Fit a model,

Xt = a + bt + Yt

where Yt is a zero-mean, stationary process.

Recall: et = error variables ("true residuals") in a regression model. Assume et ~ IN(0, σ²).

• Estimate the parameters a and b using linear regression

• Fit a stationary model to the residuals:

ŷt = xt - (â + b̂t)

Note: least squares may also be used to remove nonlinear trends from a time series. It is naturally possible to model any observed nonlinear trend by some term g(t) within

Xt = g(t) + Yt

which can be estimated using least squares. For example, a plot of hourly energy load data over a one-day time frame may indicate quadratic variation over the day; in this case one could use g(t) = a + bt².

III.3 Differencing

a. Differencing and linear trend removal

Use differencing if the sample ACF decreases slowly. If there is a linear trend, e.g. xt = a + bt + yt, then

∇xt = xt - xt-1 = b + ∇yt,

so differencing has removed the linear trend. If xt is I(d), then differencing xt d times will make it stationary.

Differencing xt once will remove any linear trend, as above.


Suppose xt is I(1) with a linear trend. If we difference xt once, then !xt is stationary and we have
removed the trend.

However, if we remove the trend using linear regression we will still be left with an I(1) process that is
non-stationary.

Example: Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6 and -1 with probability 0.4.

Let X0 = 0. Then E(X1) = 0.2, since E(Z1) = 0.2, and

E(X2) = 0.2 × 2, ..., E(Xn) = 0.2n.

Then Xn is I(1) AND Xn has a linear trend.

Let Yn = Xn - 0.2n. Then E(Yn) = 0, so we have removed the linear trend, but

Yn - Yn-1 = Xn - Xn-1 - 0.2 = Zn - 0.2

Hence Yn is a random walk (which is non-stationary) and ∇Yn is stationary, so Yn is an I(1) process.

b. Selection of d

How many times (d) do we have to difference the time series Xt to convert it to stationarity? This will
determine the parameter d in the fitted ARIMA(p,d,q) model.

Recall the three causes of non-stationarity:

• Trend
• Cycle
• Time series is an integrated series

We are assuming that linear trends and cycles have been removed, so if the plot of the time series and its
SACF indicate non-stationarity, it could be that the time series is a realisation of an integrated process
and so must be differenced a number of times to achieve stationarity.

Choosing an appropriate value of d:

• Look at the SACF. If the SACF decays slowly to zero, this indicates a need for differencing (for
a stationary ARMA model, the SACF decays rapidly to zero).

• Look at the sample variance of the original time series X and its difference.

Let σ̂²(d) be the sample variance of z(d) = ∇^d x. It is normally the case that σ̂²(d) first decreases with d until stationarity is reached, and then starts to increase, since differencing too much introduces correlation. Take d equal to the value that minimises σ̂²(d).

[Sketch: σ̂²(d) plotted against d = 0, 1, 2, 3, decreasing to a minimum and then rising again.]

In the example sketched, take d = 2, the value for which the estimated variance is minimised. (A short numerical illustration follows.)
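A sketch of this rule (illustrative, numpy only): difference a simulated I(1) series 0, 1, 2 and 3 times and compare the sample variances; the minimum indicates a sensible d.

import numpy as np

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=500))        # a random walk, i.e. an I(1) series

for d in range(4):
    z = np.diff(x, n=d) if d > 0 else x
    print(d, round(z.var(), 3))            # variance is smallest at d = 1 here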

III.4 Seasonal differencing

Example: Let X be the monthly average temperature in London. Suppose that the model

xt = µ + st + yt

applies, where st is a periodic function with period 12 and yt is stationary. The seasonal difference of X is defined as:

(∇12 x)t = xt - xt-12

But: xt - xt-12 = (µ + st + yt) - (µ + st-12 + yt-12) = yt - yt-12, since st = st-12.

Hence xt - xt-12 is a stationary process. We can model xt - xt-12 as a stationary process and thus get a model for xt.

Example: In the UK, monthly inflation figures are obtained by seasonal differencing of the retail prices index (RPI). If xt is the value of the RPI for month t, then the annual inflation figure for month t is

[(xt - xt-12) / xt-12] × 100%

Remark 1: the number of seasonal differences taken is denoted by D. For example, for the seasonal differencing Xt - Xt-12 = ∇12 Xt we have D = 1.

Remark 2: in practice, for most time series we would need at most d = 1 and D = 1.

III.5 Method of moving averages

This method makes use of a simple linear filter to eliminate the effects of periodic variation. If X is a time series with seasonal effects of even period d = 2h, we define a smoothed process Y by

yt = (1/2h) [ ½xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + ½xt+h ]

This ensures that each period makes an equal contribution to yt.

Example with quarterly data: A yearly period will have d = 4 = 2h, so h = 2, and

yt = ¼ ( ½xt-2 + xt-1 + xt + xt+1 + ½xt+2 )

This is a centred moving average, since the average is taken symmetrically around time t. Such an average can only be calculated retrospectively.

For odd periods d = 2h + 1, the end terms xt-h and xt+h need not be halved:

yt = (1/(2h+1)) ( xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + xt+h )

Example: with data every 4 months, a yearly period will have d = 3 = 2h + 1, so h = 1 and

yt = ⅓ ( xt-1 + xt + xt+1 )
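A minimal sketch of the quarterly centred moving average above (numpy only; the data values are made up). The weights (1/8, 1/4, 1/4, 1/4, 1/8) implement ¼(½xt-2 + xt-1 + xt + xt+1 + ½xt+2):

import numpy as np

x = np.array([10.0, 14.0, 9.0, 7.0, 11.0, 15.0, 10.0, 8.0, 12.0, 16.0])  # quarterly data

weights = np.array([1/8, 1/4, 1/4, 1/4, 1/8])
y = np.convolve(x, weights, mode="valid")   # centred moving average; loses 2 points at each end

print(np.round(y, 2))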

III.6 Seasonal means

In fitting the seasonal model

xt = µ + st + yt with E(yt) = 0 (additive model)

to a monthly time series x extending over 10 years from January 1990, the estimate of µ is x̄ (the average over all 120 observations) and the estimate of sJanuary is

ŝJanuary = (1/10)(x1 + x13 + ... + x109) - x̄,

the difference between the average value for January and the overall average over all the months.

Recall that st is a periodic function with period 12 and yt is stationary. Thus, st contains the deviation of the model (from the overall mean µ) at time t due to the seasonal effect.

Month \ Year     1     2    ...   10
January          x1    x13  ...   x109   → ŝ1
...
December         x12   x24  ...   x120   → ŝ12
Overall mean: x̄

III.7 Filtering, smoothing

Filtering and exponential smoothing techniques are commonly applied to time series in order to “clean”
the original series from undesired artifacts. The moving average is an example of a filtering technique.
Other filters may be applied depending on the nature of the input series.
Exponential smoothing is another common set of techniques. It is used typically to “simplify” the input
time series by dampening its variations so as to retain in priority the underlying dynamics.

III.8 Transformations

Recall: In the simple linear model

yi = β0 + β1xi + ei

where ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the assumptions about the model (e.g. the normality of the error variables ei or the constant variance of the error variables ei). To test the latter assumption we plot the residuals against the fitted values ŷi.

[Sketch: residuals êi plotted against fitted values ŷi, scattered evenly about 0 with no pattern.]

If the plot does not appear as above, the data is transformed, and the most common transformation is the logarithmic transformation.

Similarly, if after fitting an ARMA model to a time series xt, a plot of the "residuals" versus the "fitted values" indicates a dependence, then we should consider modelling a transformation of the time series xt, and the most common transformation is the logarithmic transformation

Yt = ln(Xt)
IV. Box-Jenkins methodology
IV.1 Overview

We consider how to fit an ARIMA(p,d,q) model to historical data {x1, x2, ...xn}. We assume that trends
and seasonal effects have been removed from the data.

The methodology developed by Box and Jenkins consists in 3 distinct steps:

• Tentative identification of an ARIMA model


• Estimation of the parameters of the identified model
• Diagnostic checks

If the tentatively identified model passes the diagnostic tests, it can be used for forecasting.

If it does not, the diagnostic tests should indicate how the model should be modified, and a new cycle of

• Identification
• Estimation
• Diagnostic checks

is performed.

IV.2 Model selection

a. Identification of white noise

Recall: in a simple linear regression model, yi = β0 + β1xi + ei, ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the goodness of fit of the model, i.e. whether the assumptions ei ~ IN(0, σ²) are justified.

The error variables ei form a zero-mean white noise process: they are uncorrelated, with common variance σ².

Recall: {et : t ∈ ℤ} is a zero-mean white noise process if

E(et) = 0 for all t,   and   γk = Cov(et, et-k) = σ² if k = 0, and 0 otherwise

Thus the ACF and PACF of a white noise process (when plotted against k) show a single spike at lag 0: apart from ρ0 = 1, we have ρk = 0 for k = 1, 2, ... and φk = 0 for k = 1, 2, ...

Question: how do we test whether the residuals from a time series model look like a realisation of a white noise process?

Answer: we look at the SACF and SPACF of the residuals. In studying the SACF and SPACF, we realise that even if the original process were white noise, we would not expect rk = 0 for k = 1, 2, ... and φ̂k = 0 for k = 1, 2, ..., since rk is only an estimate of ρk and φ̂k is only an estimate of φk.

Question: how close to 0 should rk and φ̂k be, if ρk = 0 for k = 1, 2, ... and φk = 0 for k = 1, 2, ...?

Answer: If the original model is white noise, Xt = µ + et, then for each k the SACF and SPACF satisfy, approximately,

rk ~ N(0, 1/n)   and   φ̂k ~ N(0, 1/n)

This is true for large samples, i.e. for large values of n.

Values of rk or φ̂k outside the range (-2/√n, 2/√n) can be taken as suggesting that a white noise model is inappropriate.

However, these are only approximate 95% confidence intervals. If ρk = 0, we can be 95% certain that rk lies between these limits. This means that 1 value in 20 will lie outside these limits even if the white noise model is correct.

Hence a single value of rk or φ̂k outside these limits would not be regarded as significant on its own, but three such values might well be significant.

There is an overall goodness-of-fit test, based on all the rk's in the SACF rather than on individual rk's, called the portmanteau test of Ljung and Box. It consists in checking whether the m sample autocorrelation coefficients of the residuals are too large to resemble those of a white noise process (for which they should all be negligible).

Given the residuals from an estimated ARMA(p,q) model, under the null hypothesis that all the ρk are 0, the Q-statistic

Q = n(n + 2) Σ_{k=1}^{m} rk² / (n - k)

is asymptotically χ²-distributed with s = m - p - q degrees of freedom, or, if a constant (say µ) is also estimated, s = m - p - q - 1 degrees of freedom.

If the Q-statistic is found to be greater than the 95th percentile of that χ² distribution, the null hypothesis is rejected, which means that the alternative hypothesis that "at least one autocorrelation is non-zero" is accepted. Statistical packages print these statistics. For large n, the Ljung-Box Q-statistic closely approximates the Box-Pierce statistic:

n(n + 2) Σ_{k=1}^{m} rk² / (n - k) ≈ n Σ_{k=1}^{m} rk²

The overall diagnostic test is therefore performed as follows (for centred realisations):
• Fit an ARMA(p,q) model
• Estimate the (p + q) parameters
• Test whether

Q = n(n + 2) Σ_{k=1}^{m} rk² / (n - k) ~ χ²_{m-p-q}

Remark: the above Ljung-Box Q-statistic was first suggested to improve upon the simpler Box-Pierce test statistic

Q = n Σ_{k=1}^{m} rk²

which was found to perform poorly even for moderately large sample sizes. (A usage sketch follows.)
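In practice the Ljung-Box statistic is available directly in software. A minimal sketch (assuming statsmodels; the residuals here are a made-up stand-in for those of a fitted ARMA(p,q) model with p + q = 2):

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(2)
resid = rng.normal(size=300)        # stand-in for the residuals of a fitted model

# model_df = p + q, so the chi-square reference has m - p - q degrees of freedom
lb = acorr_ljungbox(resid, lags=[20], model_df=2)
print(lb)                           # Q statistic and its p-value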

b. Identification of MA(q)

Recall: for an MA(q) process, ρk = 0 for all k > q, i.e. the "ACF cuts off after lag q".

To test whether an MA(q) model is appropriate, we check whether rk is close to 0 for all k > q. If the data do come from an MA(q) model, then for k > q (the first q + 1 coefficients ρ0, ρ1, ..., ρq being the ones that may be non-zero),

rk ~ N( 0, (1/n)(1 + 2 Σ_{i=1}^{q} ρi²) )   approximately,

and 95% of the rk's should lie in the interval

[ -1.96 √((1 + 2 Σ_{i=1}^{q} ρi²)/n) ,  +1.96 √((1 + 2 Σ_{i=1}^{q} ρi²)/n) ]

(note that it is common to use 2 instead of 1.96 in the above formula). We would expect 1 in 20 values to lie outside the interval. In practice, the ρi's are replaced by the ri's. The "confidence limits" on SACF plots are based on this. If rk lies outside these limits it is "significantly different from zero" and we conclude that ρk ≠ 0. Otherwise, rk is not significantly different from zero and we conclude that ρk = 0.

[Sketch: SACF plot of rk against lag k, with dashed horizontal confidence limits above and below zero.]

For q = 0, the limits for k = 1 are

( -1.96/√n , 1.96/√n )

as for testing for a white noise model. The coefficient r1 is compared with these limits. For q = 1, the limits for k = 2 are

( -1.96 √((1 + 2r1²)/n) , 1.96 √((1 + 2r1²)/n) )

and r2 is compared with these limits. Again, 2 is often used in place of 1.96.

c. Identification of AR(p)

Recall: for an AR(p) process, we have φk = 0 for all k > p, i.e. the "PACF cuts off after lag p".

To test whether an AR(p) model is appropriate, we check whether the sample estimate of φk is close to 0 for all k > p. If the data do come from an AR(p) model, then for k > p,

φ̂k ~ N(0, 1/n)   approximately,

and 95% of the sample estimates should lie in the interval

( -2/√n , 2/√n )

The "confidence limits" on SPACF plots are based on this: if the sample estimate of φk lies outside these limits, it is "significant".
[Sketch: sample PACF of an AR(1) series plotted against lag k (k = 1 to 15), with a single large spike at lag 1 and values inside the confidence limits thereafter.]
IV.3 Model fitting

a. Fitting an ARMA(p,q) model

We make the following assumptions:

• An appropriate value of d has been found and the differenced series {zd+1, zd+2, ..., zn} is stationary.

• The sample mean z̄ = 0; if not, subtract µ̂ = z̄ from each zi.

• For simplicity, we assume that d = 0 (to simplify the upper and lower limits of sums).

We look for an ARMA(p,q) model for the data z:

• If the SACF appears to cut off after lag q, an MA(q) model is indicated (we use the tests of significance described previously).

• If the SPACF appears to cut off after lag p, an AR(p) model is indicated.

If neither the SACF nor the SPACF cuts off, mixed models must be considered, starting with ARMA(1,1).

b. Parameter estimation: LS and ML

Having identified values for p and q, we must now estimate the values of the parameters α1, α2, ..., αp and β1, β2, ..., βq in the model

Zt = α1Zt-1 + ... + αpZt-p + et + β1et-1 + ... + βqet-q

Least squares (LS) estimation is equivalent to maximum likelihood (ML) estimation if et is assumed normally distributed.

Example: in the AR(p) model, et = Zt - α1Zt-1 - ... - αpZt-p. The estimators α̂1, ..., α̂p are chosen to minimise

Σ_{t=p+1}^{n} (zt - α̂1zt-1 - ... - α̂pzt-p)²

Once these estimates are obtained, the residual at time t is given by

êt = zt - α̂1zt-1 - ... - α̂pzt-p

For general ARMA models, êt cannot be deduced directly from the zt. In the MA(1) model for instance,

êt = zt - β̂1êt-1

We can solve this iteratively for êt as long as some starting value ê0 is assumed. For an ARMA(p,q) model, the list of starting values is (ê0, ê1, ..., êq-1). The starting values are estimated recursively by backforecasting:
0. Assume (ê0, ê1, ..., êq-1) are all zero
1. Estimate the αi and βj
2. Use forecasting on the time-reversed process {zn, ..., z1} to predict values for (ê0, ê1, ..., êq-1)
3. Repeat cycle (1)-(2) until the estimates converge. (A minimal fitting sketch follows.)
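A minimal end-to-end sketch of the estimation step (assuming statsmodels; the simulated data and the (p, d, q) choice are illustrative, not from the notes):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

true = ArmaProcess(np.array([1, -0.7]), np.array([1, 0.4]))   # ARMA(1,1) used to make data
z = true.generate_sample(nsample=400)

res = ARIMA(z, order=(1, 0, 1)).fit()   # ML estimation of the AR, MA, constant and variance terms
print(res.params)                       # fitted parameters
resid = res.resid                       # residuals, used in the diagnostic checks below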

c. Parameter estimation: method of moments

• Calculate the theoretical ACF of ARMA(p,q): the ρk's will be functions of the α's and β's.

• Set ρk = rk and solve for the α's and β's. These are the method-of-moments estimators.

Example: you have decided to fit the following MA(1) model

xn = en + β en-1,   en ~ N(0, 1)

You have calculated γ̂0 = 1, γ̂1 = -0.25. Estimate β.

We have r1 = γ̂1/γ̂0 = -0.25.

Recall: γ0 = (1 + β²)σ² = 1 + β² and γ1 = βσ² = β here, from which ρ1 = β/(1 + β²).

Setting ρ1 = r1 = β/(1 + β²) = -0.25 and solving for β gives β = -0.268 or β = -3.732.

Recall: the MA(1) process is invertible IFF |β| < 1. So for β = -0.268 the model is invertible. But for β = -3.732 the model is not invertible.

Note: If γ̂1 = -0.5 here, then ρ1 = r1 = β/(1 + β²) = -0.5, which gives (β + 1)² = 0, so β = -1, and neither estimate gives an invertible model.

Now, let us estimate σ² = Var(et).

Recall that in the simple linear model Yi = β0 + β1Xi + ei, ei ~ IN(0, σ²), σ² is estimated by

σ̂² = (1/(n-2)) Σ_{i=1}^{n} êi²

where êi = yi - β̂0 - β̂1xi is the i-th residual. Here we use

σ̂² = (1/n) Σ_{t=p+1}^{n} êt²
    = (1/n) Σ_{t=p+1}^{n} (zt - α̂1zt-1 - ... - α̂pzt-p - β̂1êt-1 - ... - β̂qêt-q)²

No matter which estimation method is used, this parameter is estimated last, as estimates of the α's and β's are required first.

Note: In using either least squares or maximum likelihood estimation we also find the residuals êt, whereas using the method of moments to estimate the α's and β's these residuals have to be calculated afterwards.

Note: for large n, there will be little difference between LS, ML and method-of-moments estimators.

d. Diagnostic checking

Assume we have identified a tentative ARIMA(p,d,q) model and calculated the estimates µ̂, σ̂, α̂1, ..., α̂p, β̂1, ..., β̂q.

We must perform diagnostic checks based on the residuals. If the ARMA(p,q) model is a good approximation to the underlying time series process, then the residuals êt will form a good approximation to a white noise process.

(I) Tests to see if the residuals are white noise:

• Study the SACF and SPACF of the residuals. Do rk and φ̂k lie outside (-1.96/√n, 1.96/√n)?

• Portmanteau test of the residuals (carried out on the residual SACF):

n(n + 2) Σ_{k=1}^{m} rk² / (n - k) ~ χ²_{m-s},   for s = number of parameters of the model

If the SACF or SPACF of the residuals has too many values outside the interval (-1.96/√n, 1.96/√n), we conclude that the fitted model does not have enough parameters and a new model with additional parameters should be fitted.

The Portmanteau test may also be used for this purpose. Other tests are:

• Inspection of the graph of {eˆ t }

• Counting turning points

• Study the sample spectral density function of the residuals

(II) Inspection of the graph of {êt}:

• plot êt against t
• plot êt against zt

Any patterns evident in these plots may indicate that the residuals are not a realisation of a set of independent (uncorrelated) variables and so the model is inadequate.
(III) Counting Turning Points:

This is a test of independence: are the residuals a realisation of a set of independent variables?

Considering three successive values, there are six possible orderings; a turning point occurs in four of them, so the probability of observing a turning point at a given time is 4/6 = 2/3.

If y1, y2, ..., yn is a sequence of numbers, the sequence has a turning point at time k if

either yk-1 < yk AND yk > yk+1
or     yk-1 > yk AND yk < yk+1

Result: if Y1, Y2, ..., YN is a sequence of independent random variables, then

• the probability of a turning point at time k is 2/3

• the expected number of turning points is (2/3)(N - 2)

• the variance is (16N - 29)/90

[Kendall and Stuart, "The Advanced Theory of Statistics", 1966, vol. 3, p. 351]

Therefore, the number of turning points in a realisation of Y1, Y2, ..., YN should lie within the 95% confidence interval:

[ (2/3)(N - 2) - 1.96 √((16N - 29)/90) ,  (2/3)(N - 2) + 1.96 √((16N - 29)/90) ]
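A minimal sketch of the turning-point check (numpy only, applied to a made-up stand-in for the residuals):

import numpy as np

rng = np.random.default_rng(4)
e = rng.normal(size=200)                       # stand-in for residuals
N = len(e)

# a turning point at k: e[k] is a strict local maximum or local minimum
tp = np.sum((e[1:-1] > e[:-2]) & (e[1:-1] > e[2:]) |
            (e[1:-1] < e[:-2]) & (e[1:-1] < e[2:]))

mean = 2 * (N - 2) / 3
sd = np.sqrt((16 * N - 29) / 90)
print(tp, (mean - 1.96 * sd, mean + 1.96 * sd))   # observed count vs 95% interval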

Study the sample spectral density function of the residuals:

Recall: the spectral density function of a white noise process is f(ω) = σ²/2π, -π < ω < π. So the sample spectral density function of the residuals should be roughly constant for a white noise process.
V. Forecasting
V.1 The Box-Jenkins approach

Having fitted an ARMA model to {x1, x2, ..., xn} we have the equation:

Xn+k = µ + α1(Xn+k-1 - µ) + ... + αp(Xn+k-p - µ) + en+k + β1en+k-1 + ... + βqen+k-q

[Sketch: the observed values x1, x2, ..., xn up to time n; the value at time n + k is still to be forecast.]

x̂ n (k) = Forecast value of xn+k, given all observations up until time n.

= k-step ahead forecast at time n.

In the Box-Jenkins approach, x̂ n (k) is taken as E(Xn+k | X1 , ... , Xn), i.e. x̂ n (k) is the conditional
expectation of the future value of the process, given the information currently available.

From result 2 in ST3053 (section A), we know that E(Xn+k | X1 , ... , Xn) minimises the mean square error
E(Xn+k – h( X1 , ... , Xn))2 of all functions h(X1 , ... , Xn).

x̂ n (k) is calculated as follows from the equation for Xn+k:

• Replace all unknown parameters by their estimated values

• Replace random variables X1, ..., Xn by their observed values x1 , ... , xn.

• Replace random variables Xn+1 , ... , Xn+k-1 by their forecast values, x̂ n (1) , ... , x̂ n (k-1)

• Replace variables e1 , ... , en by the residuals eˆ1 , ... , eˆ n

• Replace variables en+1 , ... , en+k-1 by their expectations 0.

Example: AR(2) model xn = µ + α1(xn-1 - µ) + α2(xn-2 - µ) + en. Since

Xn+1 = µ + α1(Xn - µ) + α2(Xn-1 - µ) + en+1

Xn+2 = µ + α1(Xn+1 - µ) + α2(Xn - µ) + en+2

we have

x̂n(1) = µ̂ + α̂1(xn - µ̂) + α̂2(xn-1 - µ̂)
x̂n(2) = µ̂ + α̂1(x̂n(1) - µ̂) + α̂2(xn - µ̂)

Example: 2-step ahead forecast of an ARMA(2,2) model

xn = µ + α1(xn-1 - µ) + α2(xn-2 - µ) + en + β2en-2.

Since xn+2 = µ + α1(xn+1 - µ) + α2(xn - µ) + en+2 + β2en, we have

x̂n(2) = µ̂ + α̂1(x̂n(1) - µ̂) + α̂2(xn - µ̂) + β̂2ên

The (forecast) error of the forecast x̂ n (k) is

x n+k - xˆ n (k )

The expected value of this error is

E(xn+k - xˆ n (k) | x1,...,x n ) = xˆ n (k) - xˆ n (k) = 0

Hence the variance of the forecast error is

E((x n+k ! xˆ n (k ))2 | x1 ,..., x n )

This is needed for confidence interval forecasts as it is more useful than a point estimate.

For stationary processes, it may be shown that x̂n(k) → µ as k → ∞. Hence, the variance of the forecast error tends to E(Xn+k - µ)² = σ² as k → ∞, where σ² is the variance of the process.

V.2 Forecasting ARIMA processes

If X is ARIMA(p,d,q) then Z = ∇^d X is ARMA(p,q).

• Use the methods reviewed above to produce forecasts for Z

• Reverse the differencing procedure to produce forecasts for X

Example: if X is ARIMA(0,1,1) then Z = ∇X is ARMA(0,1), leading to the forecast ẑn(1).

But Xn+1 = Xn + Zn+1, so x̂n(1) = xn + ẑn(1)

Question: Find x̂n(2) for an ARIMA(1,2,1) process.

Let Zn = ∇²Xn and assume Zn = µ + α(Zn-1 - µ) + en + βen-1, but

Zn+2 = ∇²Xn+2 = (Xn+2 - Xn+1) - (Xn+1 - Xn) = Xn+2 - 2Xn+1 + Xn

so Xn+2 = 2Xn+1 - Xn + Zn+2. Hence,

x̂n(2) = 2x̂n(1) - xn + ẑn(2) = 2x̂n(1) - xn + µ̂ + α̂(ẑn(1) - µ̂)

V.3 Exponential smoothing and Holt-Winters

• The Box-Jenkins method requires a skilled operator in order to obtain reliable results.

• For cases where only a simple forecast is needed, exponential smoothing is much simpler (Holt,
1958).

A weighted combination of past values is used to predict future observations. For example, the first
forecast for an AR model is obtained by

( 2
xˆn (1) = ! xn + (1 " ! ) xn "1 + (1 – ! ) xn "2 + ... )
or

"
!
xˆn (1) = ! # (1- ! )i xn-i = xn
i =0 1- (1- ! ) B

!
!
• The sum of the weights is ! " (1-!)i = =1
i=0 1-(1-!)

• Generally we use a value of α such that 0 < α < 1, so that there is less emphasis on historic values further back in time (usually 0.2 ≤ α ≤ 0.3).

• There is only one parameter to control, usually estimated via least squares.

• The weights decrease geometrically – hence the name exponential smoothing.

Updating forecasts is easy with exponential smoothing:

[Diagram: the one-step forecast made at time n–1 is combined with the new observation xn to give the updated forecast at time n.]

It is easy to see that

x̂n(1) = (1 – α)x̂n-1(1) + αxn = x̂n-1(1) + α(xn – x̂n-1(1))

Current forecast = previous forecast + α × (error in previous forecast).
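A minimal Python sketch of this update rule; the smoothing parameter alpha, the initialisation and the example series are illustrative assumptions:

def exp_smooth_forecast(x, alpha=0.25):
    """One-step ahead forecast by simple exponential smoothing.

    Each new forecast is the previous forecast plus alpha times the
    previous forecast error.
    """
    forecast = x[0]                        # crude initialisation
    for obs in x:
        forecast = forecast + alpha * (obs - forecast)
    return forecast                        # x_hat_n(1), the next-period forecast

print(exp_smooth_forecast([23.1, 24.0, 22.8, 23.5, 24.2], alpha=0.25))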


• Simple exponential smoothing can’t cope with trend or seasonal variation.

• Holt-Winters smoothing can cope with trend and seasonal variation.

• Holt-Winters can sometimes outperform Box-Jenkins forecasts.

V.4 Linear filtering

[Diagram: input time series xt → linear filter (filter weights) → output time series yt.]

A linear filter is a transformation of a time series {xt} (the input series) to create an output series {yt}
which satisfies:

yt = Σ_{k=-∞}^{∞} ak xt-k.

The collection of weights {ak : k ∈ Z} forms a complete description of the filter.

The purpose of filtering is to modify the input series to meet particular objectives, or to highlight specific features of the data. For example, an important problem in the analysis of economic time series is the detection, isolation and removal of deterministic trends.

In practice, a filter {ak : k ∈ Z} normally contains only a relatively small number of non-zero components.

Example: regular differencing. This is used to remove a linear trend. Here a0 = 1, a1 = -1, ak = 0
otherwise. Hence yt = xt – xt-1.

Example: seasonal differencing. Here a0 = 1, a12 = -1, ak = 0 otherwise, and yt = xt – xt-12.

Example: if the input series is white noise and the filter takes the form {θ0 = 1, θ1, ..., θq}, then the output series is MA(q), since

yt = Σ_{k=0}^{q} θk et-k

If the input series, x, is AR(p), and the filter takes the form {φ0 = 1, –φ1, ..., –φp}, then the output series is white noise:

yt = xt – Σ_{k=1}^{p} φk xt-k = et
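As an illustration, a linear filter can be applied with a short loop; the sketch below (Python, with an assumed helper name linear_filter and a made-up input series) applies the regular-differencing filter {a0 = 1, a1 = –1}:

import numpy as np

def linear_filter(x, weights):
    """Apply a (causal) linear filter y_t = sum_k a_k * x_{t-k}.

    weights[k] plays the role of a_k; output starts once enough past
    values of x are available.
    """
    q = len(weights) - 1
    return np.array([sum(a * x[t - k] for k, a in enumerate(weights))
                     for t in range(q, len(x))])

x = np.array([5.0, 5.5, 6.1, 6.0, 6.4])
print(linear_filter(x, [1, -1]))           # regular differencing: y_t = x_t - x_{t-1}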
VI. Multivariate time series analysis

VI.1 Principal component analysis and dimension reduction

a. Principal Component Analysis

See lectures and practicals.

b. Multivariate correlation: basic properties

The multivariate process (X, Y, Z) defined by

(X, Y, Z) ~ N( (µX, µY, µZ)' , Σ ),   where

Σ = [ ΣXX  ΣXY  ΣXZ ]
    [ ΣYX  ΣYY  ΣYZ ]
    [ ΣZX  ΣZY  ΣZZ ]

satisfies the following:

X adjusted for Z:

X – E(X | Z) = (X – µX) – ΣXZ ΣZZ^-1 (Z – µZ)

(here E(X | Z) = µX + β(Z – µZ), with β = ΣXZ ΣZZ^-1)

Y adjusted for Z:

Y – E(Y | Z) = (Y – µY) – ΣYZ ΣZZ^-1 (Z – µZ)

Partial covariance:

Cov[ (X – µX) – ΣXZ ΣZZ^-1 (Z – µZ), (Y – µY) – ΣYZ ΣZZ^-1 (Z – µZ) ]

= E[ ((X – µX) – ΣXZ ΣZZ^-1 (Z – µZ)) ((Y – µY) – ΣYZ ΣZZ^-1 (Z – µZ)) ]

= ΣXY – ΣXZ ΣZZ^-1 ΣZY

Variance:

Var[ (X – µX) – ΣXZ ΣZZ^-1 (Z – µZ) ] = ΣXX – ΣXZ ΣZZ^-1 ΣZX

and

Var[ (Y – µY) – ΣYZ ΣZZ^-1 (Z – µZ) ] = ΣYY – ΣYZ ΣZZ^-1 ΣZY

from which we get the partial correlation

P(X, Y | Z) = (ΣXY – ΣXZ ΣZZ^-1 ΣZY) / √[ (ΣXX – ΣXZ ΣZZ^-1 ΣZX)(ΣYY – ΣYZ ΣZZ^-1 ΣZY) ]

Now substituting

X = Xt:      ΣXY = γ2
Z = Xt+1:    ΣXZ = ΣYZ = γ1
Y = Xt+2:    ΣZZ = ΣXX = ΣYY = γ0

we get

P(Xt, Xt+2 | Xt+1) = (γ2 – γ1 γ0^-1 γ1) / (γ0 – γ1 γ0^-1 γ1)

= (γ2/γ0 – (γ1/γ0)^2) / (1 – (γ1/γ0)^2)

= (ρ2 – ρ1^2) / (1 – ρ1^2) = φ2

which can also be written as a ratio of 2 × 2 determinants:

= det[ 1  ρ1 ; ρ1  ρ2 ] / det[ 1  ρ1 ; ρ1  1 ]

i.e. the partial autocorrelation of the process at lag 2.

VI.2 Vector AR processes

A univariate time series consists of a sequence of random variables Xt, where Xt is the value of the
single variable X of interest at time t.

An m-dimensional multivariate time series consists of a sequence of random vectors X1, X2, ... There are m variables of interest, denoted X(1), ..., X(m), and Xt(i) is the value of X(i) at time t.

Thus at time t we have a vector of observations

Xt = ( Xt(1), ..., Xt(m) )'

[Diagram: a univariate series records the single variable X at times 1, ..., n; a multivariate series records each of the m variables X(1), ..., X(m) at every time point.]

As for the second-order properties of {Xt}, we use

• Vectors of expected values µt = E(Xt)

• Covariance Matrices Cov(Xt, Xt+k) for all pairs of random vectors

The vector process {Xt} is weakly stationary if E(Xt) and Cov(Xt, Xt+k) are independent of t. Let µ denote the common mean vector E(Xt) and Σk denote the common lag k covariance matrix, i.e.

Σk = Cov(Xt, Xt+k)

In the stationary case, Σk is an (m × m) matrix whose rows and columns are indexed by X(1), ..., X(m), so that its (i, j) entry describes the dependence between X(i) and X(j) at lag k.

For k = 0, Σ0 is the variance/covariance matrix of X(1), ..., X(m).

Σk(1,1) = Cov(Xt(1), Xt+k(1)) = autocovariance at lag k for X(1).

Σk(i,j) = Cov(Xt(i), Xt+k(j)) = lag k cross-covariance of X(i) with X(j).

Example: Multivariate White Noise. Recall that univariate white noise is a sequence e1, e2, ... of random variables with E(et) = 0 and Cov(et, et+k) = σ^2 1(k=0) (where 1(.) is the indicator function). Multivariate white noise is the simplest example of a multivariate random process.

Let e1, e2, ... be a sequence of independent, zero-mean random vectors, each with the same covariance matrix Σ. Thus for k = 0, the lag k covariance matrix of the et's is Σ0 = Σ.

Since the et's are independent vectors, Σk = 0 for k ≠ 0. Note that Σ need not be a diagonal matrix, i.e. the components of et at a given time t need not be independent of each other. However, the et's are independent vectors: the components of et and et+k are independent for k ≠ 0.

Example: A vector autoregressive process of order p, VAR(p), is a sequence of m-component random vectors {X1, X2, ...} satisfying

Xt = µ + Σ_{j=1}^{p} Aj (Xt-j – µ) + et

where e is an m-dimensional white noise process and the Aj are (m × m) matrices.

Example: Let it denote the interest rate at time t and It the tendency to invest at time t. We might believe these two are related as follows:

it – µi = a11 (it-1 – µi) + et(i)

It – µI = a21 (it-1 – µi) + a22 (It-1 – µI) + et(I)

where e(i) and e(I) are zero-mean, univariate white noise. They may have different variances and are not necessarily uncorrelated, i.e. we do not require that Cov(et(i), et(I)) = 0 for any t. However, we do require Cov(et(i), es(I)) = 0 for s ≠ t.

The model can be expressed as a 2-dimensional VAR(1):

[ it – µi ]   [ a11   0  ] [ it-1 – µi ]   [ et(i) ]
[ It – µI ] = [ a21  a22 ] [ It-1 – µI ] + [ et(I) ]
The theory and analysis of the VAR(1) process closely parallels that of a univariate AR(1).

Recall: the AR(1) model xt = µ + φ(xt-1 – µ) + et is stationary if and only if |φ| < 1. For the VAR(p) process with p = 1 (VAR(1))

Xt = µ + A(Xt-1 – µ) + et

we have

Xt = µ + Σ_{j=0}^{t-1} A^j et-j + A^t (X0 – µ)

In order that X should represent a stationary time series, the powers of A should converge to zero in some sense: this will happen if all eigenvalues of the matrix A are less than 1 in absolute value.

Recall eigenvalues (see appendix): λ is an eigenvalue of the n × n matrix A if there is a non-zero vector x (called an eigenvector) such that

Ax = λx    or    (A – λI)x = 0

These equations have a non-zero solution x if and only if |A – λI| = 0. This equation is solved for λ to find the eigenvalues.

Example: Find the eigenvalues of the matrix [ 2  1 ; 4  2 ]. Solution: solve

det[ 2–λ  1 ; 4  2–λ ] = 0

which is equivalent to (2 – λ)^2 – 4 = λ^2 – 4λ = λ(λ – 4) = 0. The eigenvalues are 0 and 4.

Question: Is the following multivariate time series stationary?

[ xt ]   [ 0.3  0.5 ] [ xt-1 ]   [ etx ]
[ yt ] = [ 0.2  0.2 ] [ yt-1 ] + [ ety ]

We find the eigenvalues of [ 0.3  0.5 ; 0.2  0.2 ]:

det[ 0.3–λ  0.5 ; 0.2  0.2–λ ] = (0.3 – λ)(0.2 – λ) – 0.1 = λ^2 – 0.5λ – 0.04 = 0

giving λ ≈ 0.57 and λ ≈ –0.07.

Since |λ| < 1 for both eigenvalues, the process is stationary.
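The same check can be carried out numerically; a small numpy sketch using the coefficient matrix from the question:

import numpy as np

A = np.array([[0.3, 0.5],
              [0.2, 0.2]])                  # VAR(1) coefficient matrix
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                           # approximately 0.57 and -0.07
print(np.all(np.abs(eigenvalues) < 1))       # True => the VAR(1) process is stationary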

Question: Write the model in the previous question in terms of Xt only. Show that Xt is stationary in its own right. Solution: The model can be written as:

Xt = 0.3Xt-1 + 0.5Yt-1 + etX    (1)
Yt = 0.2Xt-1 + 0.2Yt-1 + etY    (2)

Rearranging (1): Yt-1 = 2(Xt – 0.3Xt-1 – etX), so Yt = 2(Xt+1 – 0.3Xt – et+1X). Substituting for Yt and Yt-1 in (2) and tidying up:

Xt+1 = 0.5Xt + 0.04Xt-1 + et+1X – 0.2etX + 0.5etY

Since the white noise terms do not affect stationarity, and the model can be written as (1 – 0.5B – 0.04B^2)Xt = ..., the characteristic equation is

1 – 0.5λ – 0.04λ^2 = 0

Its roots are λ1 = 1.75 and λ2 = –14.25. Since |λ| > 1 for both roots, the Xt process is stationary.

Example: A 2-dimensional VAR(2). Let Yt denote the national income over a period of time, Ct the total consumption over the same period, and It the total investment over the same period. We assume Ct = αYt-1 + et(1), where e(1) is a zero-mean white noise (consumption over a period depends on the income over the previous period).

We assume It = β(Ct-1 – Ct-2) + et(2), where e(2) is another zero-mean white noise.

We assume Yt = Ct + It (any part of the national income is either consumed or invested).

Eliminating Yt, we get the following 2-dimensional VAR(2):

Ct = αCt-1 + αIt-1 + et(1)

It = β(Ct-1 – Ct-2) + et(2)

Using matrix notation, we get

[ Ct ]   [ α  α ] [ Ct-1 ]   [  0  0 ] [ Ct-2 ]   [ et(1) ]
[ It ] = [ β  0 ] [ It-1 ] + [ –β  0 ] [ It-2 ] + [ et(2) ]

VI.3 Cointegration

Cointegration provides a way to analyse non-stationary multivariate time series.

Recall: X is integrated of order d (X is I(d)) if Y = ∇^d X is stationary.

For univariate models, we have seen that a stochastic trend can be removed by differencing, so that the
resulting time series can be estimated using the univariate Box-Jenkins approach. In the multivariate
case, the appropriate way to treat non-stationary variables is not so straightforward, since it is possible
for there to be a linear combination of integrated variables that is stationary. In this case, the variables
are said to be cointegrated. This property can be found in many econometric models.
Definition: Two time series X and Y are called cointegrated if:

i) X and Y are I(1) random processes

ii) There exists a non-zero vector (α, β) such that αX + βY is stationary.

Thus X and Y are themselves non-stationary (being I(1)), but their movements are correlated in such a way that a certain weighted average of the two processes is stationary. The vector (α, β) is called a cointegrating vector.

We may expect two processes to be cointegrated if

• one of the processes is driving the other;

• both are being driven by the same underlying process.

Remarks:
R1 – Any equilibrium relationship among a set of non-stationary variables indicates that the variables
cannot move independently of each other, and implies that their stochastic trends must be linked. This
linkage implies that the variables are cointegrated.
R2 – If the linear relationship (as made obvious by cointegration) is already stationary, differencing the
relationship entails a misspecification error.
R3 – There are two main popular tests for cointegration, but they are not the only ones.
Reference: see e.g. Enders, “Applied Econometric Time Series”, Wiley 2004.

Example: Let Xt denote the U.S. Dollar/GB Pound exchange rate. Let Pt be the consumer price index for the U.S. and Qt the consumer price index for the U.K.

It is assumed that Xt fluctuates around the purchasing power ratio Pt/Qt according to the following model:

ln Xt = ln(Pt/Qt) + Yt
Yt = µ + φ(Yt-1 – µ) + et + θet-1

where e is a zero-mean white noise.

We assume ln P and ln Q follow ARIMA(1,1,0) models:

(1 – B) ln Pt = µ1 + φ1[(1 – B) ln Pt-1 – µ1] + et(1)

(1 – B) ln Qt = µ2 + φ2[(1 – B) ln Qt-1 – µ2] + et(2)

where e(1) and e(2) are zero-mean white noise, possibly correlated. Since ln Pt and ln Qt are both ARIMA(1,1,0) processes, they are both I(1) and hence non-stationary, and ln Xt is also non-stationary. However,

ln Xt – ln Pt + ln Qt = Yt

and Yt is a stationary ARMA(1,1) process. Hence, the sequence of random vectors

{(ln Xt, ln Pt, ln Qt): t = 1, 2, ...}

is a cointegrated model with cointegrating vector (1, –1, 1).

Question: Show that the two processes Xt and Yt defined by

Xt = 0.65Xt-1 + 0.35Yt-1 + etX    (1)
Yt = 0.35Xt-1 + 0.65Yt-1 + etY    (2)

are cointegrated, with cointegrating vector (1, –1).

Solution: We have to show that Xt – Yt is a stationary process. If we subtract the second equation from the first, we get

Xt – Yt = 0.3Xt-1 – 0.3Yt-1 + etX – etY

        = 0.3(Xt-1 – Yt-1) + etX – etY

Hence the process Xt – Yt is stationary, since |0.3| < 1; the white noise terms do not affect the stationarity.
Strictly speaking, we should also show that the processes Xt and Yt are both I(1). We use the same substitution method as in the earlier VAR(1) question to write the model in terms of Xt only: from the first equation (1) we have

Yt-1 = (1/0.35)(Xt – 0.65Xt-1 – etX)

and so

Yt = (1/0.35)(Xt+1 – 0.65Xt – et+1X).

Substituting in the second equation (2) gives

(1/0.35)(Xt+1 – 0.65Xt – et+1X) = 0.35Xt-1 + (0.65/0.35)(Xt – 0.65Xt-1 – etX) + etY

Tidying up, we have:

Xt+1 = 1.3Xt – 0.3Xt-1 + et+1X – 0.65etX + 0.35etY

If this is to be an I(1) process, we need to show that the first difference is I(0). Look at the characteristic equation, or re-write the above equation in terms of differences:

∇Xt+1 = 0.3∇Xt + et+1X – 0.65etX + 0.35etY

Since |0.3| < 1, the differenced process is I(0) and so Xt is I(1). Similarly, Yt can be shown to be I(1).
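A simulation can make this plausible; the Python sketch below (sample size, seed and the informal variance comparison are illustrative choices, not a formal cointegration test) generates the two series and contrasts the behaviour of Xt with that of Xt – Yt:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = np.array([[0.65, 0.35],
              [0.35, 0.65]])                # eigenvalues 1 and 0.3: each series is I(1)

xy = np.zeros((n, 2))
for t in range(1, n):
    xy[t] = A @ xy[t - 1] + rng.standard_normal(2)

x, y = xy[:, 0], xy[:, 1]
spread = x - y                               # the combination given by the vector (1, -1)
print(np.var(x[:n // 2]), np.var(x[n // 2:]))            # variance of X keeps growing
print(np.var(spread[:n // 2]), np.var(spread[n // 2:]))  # variance of X - Y settles down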

VI.4 Other common models

a. Bilinear models

The simplest example of this class is

Xn + α(Xn-1 – µ) = µ + en + βen-1 + b(Xn-1 – µ)en-1

Considered as a function of X, this relation is linear; it is also linear when considered as a function of e
only; hence, the name “bilinear”.

• Many bilinear models exhibit “burst” behaviour: When the process is far from its mean, it tends
to exhibit larger fluctuations.
• The difference between this model and ARMA(1,1) is in the final term: b(Xn-1 – µ)en-1.
If Xn-1 is far from µ and en-1 is far from 0, this term assumes a much greater significance.

b. Threshold AR models

Let us look at a simple example:

$! ( X " µ ) + en , if X n"1 # d
X n = µ + % 1 n"1
&! 2 ( X n"1 " µ ) + en , if X n "1 > d

These models exhibit cyclic behaviour.

Example: set φ2 = 0. Xn follows an AR(1) process with parameter φ1 until it passes the threshold value d. Then Xn returns to (the vicinity of) µ and the process effectively starts again. Thus we get cyclic behaviour as the process keeps resetting.
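A short simulation illustrates this resetting behaviour; the parameter values below (φ1 = 0.9, φ2 = 0, d = 2) are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)
mu, phi1, phi2, d, n = 0.0, 0.9, 0.0, 2.0, 200
x = np.zeros(n)
for t in range(1, n):
    phi = phi1 if x[t - 1] <= d else phi2    # switch regime at the threshold d
    x[t] = mu + phi * (x[t - 1] - mu) + rng.standard_normal()
# While x stays below d it behaves like an AR(1); once it exceeds d it is
# pulled straight back towards mu, producing rough cycles.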

c. Random coefficient AR models

Consider a simple example: Xt = µ + αt(Xt-1 – µ) + et, where {α1, α2, ...} is a sequence of independent random variables.

Example: Xt = value of an investment fund at time t. We have Xt = (1 + it)Xt-1 + et. It follows that µ = 0 and αt = 1 + it, where it is the random rate of return. The behaviour of such models is generally more irregular than that of the corresponding AR(1) model.

VI.5 ARCH and GARCH

a. ARCH

Recall: Homoscedastic = constant variance

Heteroscedastic = non-constant variance

Financial assets often display the following behaviour:

- A large change in asset price is followed by a period of high volatility.


- A small change in asset price tends to be followed by further small changes.

Thus the variance of the process is dependent upon the size of the previous value. This is what is meant
by conditional heteroscedasticity.

The class of autoregressive models with conditional heteroscedasticity of order p – the ARCH(p) models – is defined by:

Xt = µ + et √( α0 + Σ_{k=1}^{p} αk (Xt-k – µ)^2 )

where e is a sequence of independent standard normal variables.

Example: The ARCH(1) model

Xt = µ + et √( α0 + α1 (Xt-1 – µ)^2 )

A significant deviation of Xt-1 from the mean µ gives rise to an increase in the conditional variance of Xt, given Xt-1:

(Xt – µ)^2 = et^2 ( α0 + α1 (Xt-1 – µ)^2 )

E[(Xt – µ)^2 | Xt-1] = α0 + α1 (Xt-1 – µ)^2
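A minimal Python sketch of simulating an ARCH(1) series; the parameter values α0 = 0.1 and α1 = 0.5 are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(42)
mu, alpha0, alpha1, n = 0.0, 0.1, 0.5, 1000
x = np.full(n, mu)
for t in range(1, n):
    e = rng.standard_normal()                          # e_t, independent N(0, 1)
    cond_var = alpha0 + alpha1 * (x[t - 1] - mu) ** 2  # conditional variance given x_{t-1}
    x[t] = mu + e * np.sqrt(cond_var)
# Large deviations of x_{t-1} from mu inflate the conditional variance of x_t,
# producing the volatility clustering typical of financial returns.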

Example: Let Zt denote the price of an asset at the end of the t-th trading day, and let Xt = ln(Zt/Zt-1) be the daily rate of return on day t.
It has been found that ARCH models can be used to model Xt.

Brief history of cointegration and ARCH modelling:

• Cointegration (1981 – ): Granger
• ARCH (1982 – ): Engle
• 2003 Nobel Prize in Economics: Engle and Granger

b. GARCH
