CHAPTER 1: Modelling and Forecasting Stationary Univariate
Time Series
1. Time series and random process
1.1 Definitions
Definition : A time series is a succession of observations of a variable over time.
Example : Daily value of the Dow Jones Index from January
2000 to January 2007
Definition : Let $X$ be a random variable. The set of values taken by $X$ over time $t$ is called a random process $\{X_t\}_{t \in \mathbb{Z}}$.
A time series is a time-indexed random process.
1.2 Some useful characteristics of a random process
Expectation : $E(X_t) = \int x f(x)\,dx$ (continuous random variable)
- Estimator calculated on a sample of $T$ observations of the process $\{X_t\}_{t \in \mathbb{Z}}$ : $\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$
Variance : $V(X_t) = E\left[(X_t - E(X_t))^2\right] = E(X_t^2) - E(X_t)^2$
- Estimator calculated on a sample of $T$ observations : $S_T^2 = \frac{1}{T}\sum_{t=1}^{T} (X_t - \bar{X}_T)^2$
Autocovariance : $\gamma(h) = \mathrm{cov}(X_t, X_{t+h}) = E\left[(X_t - E(X_t))(X_{t+h} - E(X_{t+h}))\right] = E(X_t X_{t+h}) - E(X_t)E(X_{t+h})$
- Estimator : $\hat{\gamma}(h) = \frac{1}{T-h}\sum_{t=h+1}^{T} (X_t - \bar{X}_T)(X_{t-h} - \bar{X}_{T-h})$, with $\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$ and $\bar{X}_{T-h} = \frac{1}{T-h}\sum_{t=h+1}^{T} X_{t-h}$
Autocorrelation (ACF) : $\mathrm{cor}(X_t, X_{t+h}) = \frac{\mathrm{cov}(X_t, X_{t+h})}{\sigma_{X_t}\,\sigma_{X_{t+h}}}$
- Remark: in the case of a stationary process, $\sigma_{X_t} = \sigma_{X_{t+h}}$, so that $\mathrm{cor}(X_t, X_{t+h}) = \mathrm{cov}(X_t, X_{t+h}) / V(X_t) = \gamma(h)/\gamma(0)$
- Estimator calculated on a sample of $T$ observations (the case of a stationary process) : $\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$,
with $\hat{\gamma}(h) = \frac{1}{T-h}\sum_{t=h+1}^{T} (X_t - \bar{X}_T)(X_{t-h} - \bar{X}_{T-h})$ and $\hat{\gamma}(0) = S_T^2 = \frac{1}{T}\sum_{t=1}^{T} (X_t - \bar{X}_T)^2$.
- Definition : Correlogram of a process = graph of $\rho(h)$ as a function of $h$.
Partial Autocorrelation (PACF)
- It measures the correlation between the current observation and the observation $h$ periods ago, after
controlling for the observations at intermediate lags (i.e. all lags $< h$). That is, it measures the correlation between
$X_t$ and $X_{t-h}$ after removing the effects of $X_{t-h+1}, X_{t-h+2}, \dots, X_{t-1}$.
- Estimator $\hat{\phi}_{hh}$ of $\phi_{hh}$ : the OLS estimator of the last parameter in the regression below
$X_t = \hat{c} + \hat{\phi}_{h1} X_{t-1} + \hat{\phi}_{h2} X_{t-2} + \cdots + \hat{\phi}_{h,h-1} X_{t-h+1} + \hat{\phi}_{hh} X_{t-h} + \hat{\varepsilon}_t$
- Definition : Partial correlogram of a process = graph of $\phi_{hh}$ as a function of $h$.
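Illustration : a minimal NumPy sketch of these two estimators (the array `x`, the lag `h` and the function names are placeholders for exposition; for simplicity, the same full-sample mean $\bar{X}_T$ is used in both factors of $\hat{\gamma}(h)$):

```python
import numpy as np

def acf_hat(x, h):
    """Empirical autocorrelation: rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    xbar = x.mean()                                      # X-bar_T
    gamma_0 = np.mean((x - xbar) ** 2)                   # gamma_hat(0) = S_T^2
    gamma_h = np.mean((x[h:] - xbar) * (x[:-h] - xbar))  # average over t = h+1, ..., T
    return gamma_h / gamma_0

def pacf_hat(x, h):
    """Empirical partial autocorrelation phi_hat_hh: OLS estimate of the last
    coefficient in a regression of X_t on a constant and X_{t-1}, ..., X_{t-h}."""
    T = len(x)
    X = np.column_stack([np.ones(T - h)] +
                        [x[h - j:T - j] for j in range(1, h + 1)])  # lag matrix
    y = x[h:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]                                      # coefficient on X_{t-h}
```

These can be cross-checked against `statsmodels.tsa.stattools.acf` and `pacf` (with `method="ols"` for the latter).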
1.3 The concept of stationarity
Definition : The process X t is called second-order stationary (or weakly Stationary) if its two first moments
are finite and time invariant (they don't depend on the period t), that is:
i) E( X t ) m
ii) V (Xt ) 2
iii) cov( X t , X t h ) (h) t , h Z
Examples and counter-examples : $X_t = a + \varepsilon_t$, $X_t = \sum_{i=1}^{t} \varepsilon_i$, $X_t = a + bt + \varepsilon_t$, with $\varepsilon_t \sim i.i.d.(0, \sigma^2)$
Graphically : one should observe regular fluctuations around a constant level.
[Figure: example of a stationary series and counter-example of a non-stationary series]
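To see the difference on simulated data, here is a small sketch of the three example processes above (the parameter values $a = 2$, $b = 0.5$, $\sigma = 1$ are arbitrary choices): only the first keeps a roughly stable mean and variance across subsamples.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
eps = rng.normal(0.0, 1.0, T)              # eps_t ~ i.i.d.(0, sigma^2)

x_stat  = 2.0 + eps                        # X_t = a + eps_t         (stationary)
x_rw    = np.cumsum(eps)                   # X_t = sum_{i<=t} eps_i  (variance grows with t)
x_trend = 2.0 + 0.5 * np.arange(T) + eps   # X_t = a + b*t + eps_t   (mean depends on t)

# Compare mean and variance over the two halves of the sample: they are
# roughly stable only for the stationary process.
for name, x in [("a + eps_t", x_stat), ("random walk", x_rw), ("trend", x_trend)]:
    h1, h2 = x[:T // 2], x[T // 2:]
    print(f"{name:12s} mean {h1.mean():8.2f} -> {h2.mean():8.2f}"
          f"   var {h1.var():8.2f} -> {h2.var():8.2f}")
```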
2. ARMA (Autoregressive Moving Average) processes.
2.1 Definitions and properties
a) Definitions
Definition 1 : A standard white noise process $\varepsilon_t$ has the following properties:
(i) $E(\varepsilon_t) = 0 \quad \forall t$ (zero expectation)
(ii) $V(\varepsilon_t) = \sigma^2 \quad \forall t$ (homoskedastic)
(iii) $\mathrm{cov}(\varepsilon_t, \varepsilon_{t'}) = 0$ for $t \neq t'$ (not autocorrelated)
In what follows we will denote it by $\varepsilon_t \sim WN(0, \sigma^2)$.
Definition 2 : A process $X_t$ is a moving average of order $q$ (MA(q)) if it satisfies the following equation:
$X_t = m + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} \iff X_t = m + \Theta(L)\varepsilon_t$, with $\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$,
where the coefficients $\theta_i \in \mathbb{R}$, $i = 1, \dots, q$, $\theta_q \neq 0$ and $\varepsilon_t \sim WN(0, \sigma^2)$.
Definition 3 : A process $X_t$ is an autoregressive process of order $p$ (AR(p)) if it satisfies the following equation:
$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t \iff \Phi(L) X_t = c + \varepsilon_t$, with $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$,
where the coefficients $\phi_i \in \mathbb{R}$, $i = 1, \dots, p$, $\phi_p \neq 0$ and $\varepsilon_t \sim WN(0, \sigma^2)$.
Definition 4 : A process $X_t$ is an autoregressive moving average process of orders $p$ and $q$ (ARMA(p,q)) if it satisfies
the following equation:
$X_t = c + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$
$\iff \Phi(L) X_t = c + \Theta(L)\varepsilon_t$, with $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ and $\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$,
where $\phi_i \in \mathbb{R}$, $i = 1, \dots, p$, $\theta_i \in \mathbb{R}$, $i = 1, \dots, q$, $\phi_p \neq 0$, $\theta_q \neq 0$ and $\varepsilon_t \sim WN(0, \sigma^2)$.
Note: The ARMA form yields a more parsimonious representation of the data, since it can approximate, in an
appropriate way, a time series that would otherwise require a pure AR or pure MA model of relatively large order.
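Illustration : a minimal simulation sketch of these definitions (the parameter values $c = 0.2$, $\phi_1 = 0.7$, $\theta_1 = 0.4$, $\sigma = 1$ are arbitrary choices for exposition):

```python
import numpy as np

def simulate_arma(c, phi, theta, sigma, T, burn=200, seed=0):
    """Simulate X_t = c + sum_i phi_i X_{t-i} + eps_t + sum_j theta_j eps_{t-j},
    with phi = [phi_1, ..., phi_p] and theta = [theta_1, ..., theta_q]."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    eps = rng.normal(0.0, sigma, T + burn)
    x = np.zeros(T + burn)
    for t in range(max(p, q), T + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q))
        x[t] = c + ar + eps[t] + ma
    return x[burn:]   # drop the burn-in so the arbitrary start-up values are forgotten

# ARMA(1,1) example: X_t = 0.2 + 0.7 X_{t-1} + eps_t + 0.4 eps_{t-1}
x = simulate_arma(c=0.2, phi=[0.7], theta=[0.4], sigma=1.0, T=300)
```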
b) Properties
Theorem 1 : An MA(q) process
$X_t = m + \Theta(L)\varepsilon_t$, with $\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$,
is said to be invertible if all the roots in $L$ of the polynomial $\Theta(L)$ lie outside the unit circle (i.e. all have modulus
strictly greater than one).
Theorem 2 : An AR(p) process
$\Phi(L) X_t = c + \varepsilon_t$, with $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$,
is said to be stationary if all the roots in $L$ of the polynomial $\Phi(L)$ lie outside the unit circle (i.e. all have modulus
strictly greater than one).
Theorem 3 : An ARMA(p,q) process
$\Phi(L) X_t = c + \Theta(L)\varepsilon_t$, with $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ and $\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$,
is said to be stationary and invertible if all the roots in $L$ of the $\Phi(L)$ and $\Theta(L)$ polynomials lie outside the unit circle
(i.e. all have modulus strictly greater than one).
Example : $(1 - \phi_1 L) X_t = c + \varepsilon_t$ is stationary if $|\phi_1| < 1$ (the root $L = 1/\phi_1$ then has modulus greater than one).
Implication : if the process is stationary, the effect of a shock dies out towards zero over time.
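Illustration : a small sketch that checks these root conditions numerically (the helper name is illustrative; for an MA polynomial $1 + \theta_1 L + \cdots + \theta_q L^q$, pass `[-theta_1, ..., -theta_q]`):

```python
import numpy as np

def roots_outside_unit_circle(coeffs):
    """True if all roots (in L) of 1 - c_1 L - ... - c_p L^p lie outside
    the unit circle; coeffs = [c_1, ..., c_p]."""
    poly = np.r_[-np.array(coeffs, dtype=float)[::-1], 1.0]  # highest degree first
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0)), roots

print(roots_outside_unit_circle([0.5]))  # AR(1), phi = 0.5: root L = 2 -> stationary
print(roots_outside_unit_circle([1.0]))  # AR(1), phi = 1: root L = 1 -> unit root
```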
Proof in the case of an AR(1) process $(1 - \phi L) X_t = \varepsilon_t$:
$(1 - \phi L) X_t = \varepsilon_t \iff X_t = \frac{1}{1 - \phi L}\varepsilon_t = \sum_{i=0}^{\infty} \phi^i L^i \varepsilon_t$ if $|\phi| < 1$, i.e. $X_t = \varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \phi^3 \varepsilon_{t-3} + \cdots$
Assuming a shock $\varepsilon_t$ at date $t$, all other shocks being equal to zero, we have:
$X_t = \varepsilon_t$, $X_{t+1} = \phi \varepsilon_t$, $X_{t+2} = \phi^2 \varepsilon_t$, …, $X_{t+i} = \phi^i \varepsilon_t \to 0$ if $|\phi| < 1$; on the contrary, if $\phi = 1$ (unit root), the
persistence of the shock is infinite.
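The geometric decay (or lack of it) can be read directly from $\phi^i$, as in this small sketch:

```python
import numpy as np

# Effect of a unit shock at date t on X_{t+i} is phi**i (see the proof above).
for phi in (0.5, 0.9, 1.0):
    irf = phi ** np.arange(0, 21, 5)          # horizons i = 0, 5, 10, 15, 20
    print(f"phi = {phi}:", np.round(irf, 3))
# phi = 0.5: fast geometric decay; phi = 0.9: slow decay; phi = 1.0: no decay
```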
Some useful results for the identification of ARMA processes
Result 1 : A stationary AR(p) process has the following features:
- Its autocorrelation function is geometrically declining (as in graph 1) or has a sinusoidal shape (as in graph 2).
- Only the first $p$ terms of its partial autocorrelation function are significantly different from 0.
[Graph 1: $X_t = 0.9 X_{t-1} - 0.2 X_{t-2} + \varepsilon_t$ | Graph 2: $X_t = 0.9 X_{t-1} - 0.8 X_{t-2} + \varepsilon_t$]
Result 2 : An invertible MA(q) process has the following features:
- Only the first $q$ terms of its autocorrelation function are significantly different from 0.
- Its partial autocorrelation function is geometrically declining (as in graph 1) or has a sinusoidal shape (as in graph 2).
[Graph 1: $X_t = \varepsilon_t + 0.9 \varepsilon_{t-1} + 0.2 \varepsilon_{t-2}$ | Graph 2: $X_t = \varepsilon_t + 0.9 \varepsilon_{t-1} + 0.8 \varepsilon_{t-2}$]
Result 3 : A stationary and invertible ARMA(p,q) process has the following features:
- Its autocorrelation function is geometrically declining (as in graph 1) or has a sinusoidal shape (as in graph 2).
- Its partial autocorrelation function is geometrically declining (as in graph 1) or has a sinusoidal shape (as in graph 2).
[Graph 1: $X_t = 0.9 X_{t-1} + \varepsilon_t + 0.5 \varepsilon_{t-1}$ | Graph 2: $X_t = 0.8 X_{t-1} + \varepsilon_t + 0.5 \varepsilon_{t-1}$]
2.2 Identification, estimation and forecast of stationary processes
a) Identification of the p and q orders
1st method: correlogram and partial correlogram
- Principle: calculation and representation of the empirical autocorrelations and empirical partial autocorrelations
$\hat{\rho}(h)$ and $\hat{\phi}_{hh}$, $h = 1, \dots, H_{max}$, for the observations $\{x_t\}_{t=1}^{T}$, and use of Results 1, 2 and 3 to identify the nature and
the order of the process.
- Examples: [Figures: correlograms of the growth rate of monthly French unemployment over the 1989.12 to 2004.2 period, and of the growth rate of the Dow Jones Index from March 2000 to January 2007]
2nd method : Information Criteria
- Principle: estimation of the ARMA(p,q) process for the different lag orders $p = 0, 1, \dots, p_{max}$ and $q = 0, 1, \dots, q_{max}$, and choice
of the model minimizing the Akaike information criterion (AIC) or the Bayesian information criterion (BIC, also
called the Schwarz criterion).
- Definition :
The Akaike information criterion can be calculated as $AIC(K) = T \ln \hat{\sigma}_e^2 + 2K$
The Bayesian information criterion can be calculated as $BIC(K) = T \ln \hat{\sigma}_e^2 + K \ln T$
, with $\hat{\sigma}_e^2$ the estimated variance of the residuals, $K$ the number of parameters of the model, and $T$ the number
of observations.
- Example : growth rate of monthly French unemployment from 1989.12 to 2004.2
Bayesian information criterion for the estimated ARMA(p,q) processes:
p\q     0      1      2      3      4      5      6      7      8      9     10     11     12
0    -6.55  -6.66  -6.74  -6.95  -6.92  -6.93  -6.98  -6.96  -6.96  -6.93  -6.90  -6.90  -6.88
1    -6.75  -6.98  -7.06  -7.04  -7.03  -7.02  -6.99  -6.97  -6.94  -6.91  -6.89  -6.88  -6.85
2    -6.92  -7.02  -7.04  -7.02  -6.99  -6.98  -6.96  -6.94  -6.92  -6.89  -6.86  -6.87  -6.84
3    -7.14  -7.11  -7.08  -7.06  -7.08  -7.08  -7.06  -7.04  -7.01  -6.98  -6.95  -6.92  -6.86
4    -7.11  -7.09  -7.06  -7.03  -7.08  -7.05  -7.01  -7.01  -6.98  -6.93  -6.86  -6.91  -6.93
5    -7.09  -7.08  -7.13  -7.04  -7.01  -7.04  -7.00  -7.03  -7.00  -6.97  -6.95  -6.92  -6.82
6    -7.07  -7.05  -7.06  -7.03  -6.98  -6.98  -7.03  -6.93  -6.97  -6.94  -6.92  -6.83  -6.81
7    -7.04  -7.01  -7.00  -6.99  -7.00  -6.95  -6.94  -6.90  -6.87  -7.01  -6.82  -6.85  -6.92
8    -7.01  -6.98  -7.04  -7.12  -7.00  -6.93  -6.87  -7.04  -6.92  -6.85  -6.90  -6.79  -6.83
9    -6.97  -6.99  -7.00  -6.93  -6.96  -6.96  -6.95  -6.91  -6.93  -6.84  -6.86  -6.81  -6.78
10   -6.94  -6.95  -6.97  -6.86  -6.96  -6.91  -6.95  -6.86  -6.89  -6.83  -6.78  -6.76  -6.73
11   -6.93  -6.90  -6.87  -6.94  -6.91  -6.90  -6.92  -6.89  -6.80  -6.75  -6.73  -6.77  -6.66
12   -6.89  -6.90  -6.87  -6.80  -6.87  -6.86  -6.82  -6.85  -6.77  -6.72  -6.62  -6.66  -6.66
The minimum (-7.14) is reached for p = 3, q = 0, so the BIC selects an AR(3) model.
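A sketch of the corresponding model search using statsmodels (assumed available; note that statsmodels computes the criteria from the log-likelihood, so the levels differ from the $T \ln \hat{\sigma}_e^2$ formulas above, but the ranking of models is what matters):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def bic_grid(x, p_max=3, q_max=3):
    """Fit ARMA(p,q) for each pair (p,q) and collect the BIC."""
    out = pd.DataFrame(index=range(p_max + 1), columns=range(q_max + 1), dtype=float)
    for p in range(p_max + 1):
        for q in range(q_max + 1):
            try:
                out.loc[p, q] = ARIMA(x, order=(p, 0, q)).fit().bic
            except Exception:
                out.loc[p, q] = np.nan   # skip (p,q) pairs that fail to converge
    return out

# Example on a simulated AR(1); the retained model minimizes the criterion.
rng = np.random.default_rng(0)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.7 * x[t - 1] + rng.normal()
grid = bic_grid(x)
print(grid.round(2))
print("best (p, q):", grid.stack().idxmin())
```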
b) Estimation of ARMA processes
Method : generally, maximum likelihood estimation (MLE) or ordinary least squares (OLS).
Properties : the estimators of the coefficients $\phi_i$, $i = 1, \dots, p$, and $\theta_i$, $i = 1, \dots, q$, are consistent and asymptotically
normally distributed.
Consequences : usual test statistics can be used to test for the significance of the estimated coefficients
(Student's t-test, Fisher's F-test).
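A minimal estimation sketch with statsmodels (the simulated ARMA(1,1) and its parameter values are illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARMA(1,1) and re-estimate it by (Gaussian) maximum likelihood.
rng = np.random.default_rng(0)
eps = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]

res = ARIMA(x, order=(1, 0, 1)).fit()
print(res.params)    # const, ar.L1 (~0.7), ma.L1 (~0.4), sigma2
print(res.tvalues)   # t-statistics of the estimated coefficients
print(res.pvalues)   # reject non-significance when the p-value < alpha
```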
c) Specification Tests
Test for the significance of the estimated coefficients
Test for serial autocorrelation: the Ljung–Box test
Let $\rho_j$ denote the autocorrelation of order $j$ of the error process $\varepsilon_t$ of an ARMA(p,q).
- Hypotheses : $H_0: \rho_1 = \rho_2 = \cdots = \rho_H = 0$ against $H_1: \exists j \in \{1, 2, \dots, H\}$ such that $\rho_j \neq 0$
- The Ljung–Box test statistic: $LB(H) = T(T+2) \sum_{h=1}^{H} \frac{\hat{\rho}_h^2}{T-h} \underset{H_0}{\sim} \chi^2(H-K)$, where $K$ is the number of parameters of
the model.
- Decision rule: if $LB(H) > \chi^2_{H-K;\,1-\alpha}$, reject $H_0$ (or when Prob $< \alpha$)
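A sketch of the test using statsmodels (assuming a recent version; `res` is an ARIMA fit such as the one in the estimation sketch above, and the values of `H` and `K` are illustrative choices):

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on the residuals of the fitted model; model_df = K (the
# number of estimated ARMA parameters) sets the degrees of freedom to H - K,
# as in the formula above.
H = 12
K = 2                      # e.g. one AR and one MA coefficient
lb = acorr_ljungbox(res.resid, lags=[H], model_df=K)
print(lb)                  # columns lb_stat, lb_pvalue; reject H0 if lb_pvalue < alpha
```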
d) Forecasting with ARMA Models
Definition
For a forecast horizon $h$, the optimal prediction made at $T$ of $X_{T+h}$, denoted $\hat{X}_T(h)$ or $\hat{X}_{T+h|T}$, is given by:
$\hat{X}_T(h) = E[X_{T+h} \mid I_T]$, with $I_T = \{X_1, \dots, X_T\}$ the set of information available at date $T$.
The associated forecast error $\hat{e}_T(h)$ can be written as: $\hat{e}_T(h) = X_{T+h} - \hat{X}_T(h)$.
Proposition
Let $X_t$ be an ARMA process [$\Phi(L) X_t = c + \Theta(L)\varepsilon_t$] with all the roots in $L$ of the $\Phi(L)$ and $\Theta(L)$ polynomials lying
outside the unit circle. Then, for all $i \geq 1$:
$E[\varepsilon_{T+i} \mid I_T] = 0$ and $E[\varepsilon_{T+1-i} \mid I_T] = \varepsilon_{T+1-i}$
$E[X_{T+i} \mid I_T] = \hat{X}_T(i)$ and $E[X_{T+1-i} \mid I_T] = X_{T+1-i}$
, where $I_T = \{X_1, \dots, X_T\}$ denotes the set of information available at date $T$. The $\varepsilon_t$ process is called the innovation,
since it is the unpredictable part of the process.
Example : prediction of a stationary AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$, $\varepsilon_t \sim i.i.d.\ N(0, \sigma^2)$
$\hat{X}_T(1) = E[X_{T+1} \mid I_T] = \phi E[X_T \mid I_T] = \phi X_T$
$\hat{X}_T(2) = E[X_{T+2} \mid I_T] = \phi E[X_{T+1} \mid I_T] = \phi \hat{X}_T(1) = \phi^2 X_T$
$\hat{X}_T(3) = E[X_{T+3} \mid I_T] = \phi E[X_{T+2} \mid I_T] = \phi \hat{X}_T(2) = \phi^3 X_T$
…
$\hat{X}_T(h) = E[X_{T+h} \mid I_T] = E[\phi X_{T+h-1} + \varepsilon_{T+h} \mid I_T] = \phi \hat{X}_T(h-1) = \phi^h X_T$
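A minimal sketch of these recursive forecasts (the values of $X_T$, $\phi$ and the horizon are illustrative, and the constant is taken to be zero as in the example above):

```python
import numpy as np

def ar1_forecasts(x_T, phi, h_max):
    """Recursive forecasts of a stationary AR(1) without constant:
    X_hat_T(h) = phi * X_hat_T(h-1) = phi**h * X_T."""
    out, x_hat = [], x_T
    for _ in range(h_max):
        x_hat = phi * x_hat
        out.append(x_hat)
    return np.array(out)

print(ar1_forecasts(x_T=2.0, phi=0.8, h_max=5))
# [1.6 1.28 1.024 0.8192 0.65536] -> converges to the unconditional mean (here 0)
```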
TECHNICAL APPENDIX
Skewness coefficient, Kurtosis coefficient, and the Jarque–Bera test
Definition : The skewness coefficient of a variable $X_t$ with mean $m_X$ and standard deviation $\sigma_X$
is defined as:
$SK_X = E\left[\frac{(X_t - m_X)^3}{\sigma_X^3}\right]$
Its estimator calculated on a sample of $T$ observations of the $X_t$ process is given by:
$\hat{SK}_X = \frac{1}{T} \sum_{t=1}^{T} \frac{(X_t - \bar{X}_T)^3}{S_X^3}$
, with $\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$ and $S_X = \sqrt{\frac{1}{T}\sum_{t=1}^{T} (X_t - \bar{X}_T)^2}$.
Definition : The kurtosis coefficient of a variable $X_t$ with mean $m_X$ and standard deviation $\sigma_X$ is
defined as:
$K_X = E\left[\frac{(X_t - m_X)^4}{\sigma_X^4}\right]$
Its estimator calculated on a sample of $T$ observations of the $X_t$ process is given by:
$\hat{K}_X = \frac{1}{T} \sum_{t=1}^{T} \frac{(X_t - \bar{X}_T)^4}{S_X^4}$
, with $\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$ and $S_X = \sqrt{\frac{1}{T}\sum_{t=1}^{T} (X_t - \bar{X}_T)^2}$.
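A minimal NumPy sketch of the two estimators (the function names are illustrative; note the $1/T$ convention for $S_X$, as in the formulas above):

```python
import numpy as np

def skewness_hat(x):
    """SK_hat = (1/T) * sum((X_t - X_bar)^3) / S_X^3."""
    z = x - x.mean()
    s = np.sqrt(np.mean(z ** 2))          # S_X (1/T convention)
    return np.mean(z ** 3) / s ** 3

def kurtosis_hat(x):
    """K_hat = (1/T) * sum((X_t - X_bar)^4) / S_X^4 (= 3 under normality)."""
    z = x - x.mean()
    s2 = np.mean(z ** 2)                  # S_X^2
    return np.mean(z ** 4) / s2 ** 2

x = np.random.default_rng(0).normal(size=10_000)
print(skewness_hat(x), kurtosis_hat(x))   # close to 0 and 3 for a normal sample
```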
Reference values for these two coefficients: those of the normal distribution.
The skewness coefficient
- SK = 0: case of the normal distribution (symmetric distribution)
- SK < 0: asymmetric distribution, skewed to the left (i.e. the smallest values occur more frequently
than the largest ones)
- SK > 0: asymmetric distribution, skewed to the right (i.e. the largest values occur more frequently
than the smallest ones)
[Figure: three empirical distributions with the same mean and standard deviation, illustrating SK < 0 ($P(X < m-a) > P(X > m+a)$), SK = 0 ($P(X < m-a) = P(X > m+a)$) and SK > 0 ($P(X < m-a) < P(X > m+a)$)]
The kurtosis coefficient
- $K_X$ = 3: case of the normal distribution
- $K_X$ < 3: thin-tailed distribution (platykurtic distribution)
- $K_X$ > 3: fat-tailed distribution (leptokurtic distribution); a sample randomly drawn from such a
distribution will exhibit more extreme values than one drawn from the normal distribution.
[Figure: three empirical distributions with the same mean and standard deviation, illustrating K = 1.80 (platykurtic), K = 3 (normal) and K = 11 (leptokurtic)]
Normality test: the Jarque–Bera test
i) Null hypothesis : $H_0$: the series is normally distributed.
ii) Test statistic and distribution under $H_0$:
$JB = \frac{T}{6}\left[\hat{S}^2 + \frac{1}{4}(\hat{K} - 3)^2\right] \underset{H_0}{\sim} \chi^2(2)$, or, when the test is applied to the residuals of an estimated model,
$JB = \frac{T-k}{6}\left[\hat{S}^2 + \frac{1}{4}(\hat{K} - 3)^2\right] \underset{H_0}{\sim} \chi^2(2)$
, with $T$ the number of observations, $\hat{S}$ the empirical skewness coefficient, $\hat{K}$ the empirical
kurtosis coefficient (and $k$ the number of estimated parameters of the model).
iii) Decision rule: if $JB > \chi^2_{2;\,1-\alpha}$ at the $\alpha$ level, reject $H_0$ (or when Prob $< \alpha$).
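A minimal sketch of the test (the function name is illustrative; the $\chi^2(2)$ p-value is computed with SciPy):

```python
import numpy as np
from scipy.stats import chi2

def jarque_bera(x, k=0):
    """JB = (T - k)/6 * [S_hat^2 + (K_hat - 3)^2 / 4]  ~  chi2(2) under H0.
    k = number of estimated parameters (0 when testing a raw series)."""
    T = len(x)
    z = x - x.mean()
    s2 = np.mean(z ** 2)
    sk = np.mean(z ** 3) / s2 ** 1.5       # empirical skewness
    ku = np.mean(z ** 4) / s2 ** 2         # empirical kurtosis
    return (T - k) / 6.0 * (sk ** 2 + (ku - 3.0) ** 2 / 4.0)

x = np.random.default_rng(1).normal(size=5_000)
jb = jarque_bera(x)
print(jb, chi2.sf(jb, df=2))   # reject H0 (normality) when the p-value < alpha
```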