Unit Roots: A Selected Survey
Gabriel Rodríguez
Pontificia Universidad Católica del Perú
© Gabriel Rodríguez, 2010
Motivation (1)
[Figure: Random Walk (No Drift)]
[Figure: Japan: Exchange Rate, Yen per Dollar]
Motivation (2)
[Figure: Random Walk (with Drift)]
[Figure: Federal Reserve Board's Industrial Production Index]
Motivation (3)
[Figure: series Y_AR_0.5, Y_AR_0.97, Y_AR_0.98, Y_AR_0.99 and Y_RANDOM_WALK]
Outline
Basic References: Campbell and Perron (1991), Stock (1994), Phillips
and Xiao (1999), Maddala and Kim (2000), Haldrup and Jansson (2006)
Data Generating Process
Classical Unit Root Statistics
Other Unit Root Statistics
Recent Unit Root Statistics
Some Issues on Unit Roots
Structural Change and Unit Roots
The Role of the Initial Condition and Unit Roots
Covariates and Unit Roots
Additive Outliers and Unit Roots
Further Issues and/or Limitations of this Survey
The Data Generating Process (DGP)
$$y_t = d_t + u_t, \quad t = 1, \ldots, T, \tag{1}$$
$$u_t = \alpha u_{t-1} + v_t, \tag{2}$$
$u_0 = 0$ (initial condition);
$v_t = \sum_{i=0}^{\infty} \gamma_i \eta_{t-i}$ with $\sum_{i=0}^{\infty} i|\gamma_i| < \infty$, where $\{\eta_t\}$ is a martingale difference sequence;
$v_t$ has a non-normalized spectral density at frequency zero given by $\sigma^2 = \sigma_\eta^2 \gamma(1)^2$, where $\sigma_\eta^2 = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(\eta_t^2)$;
Under $H_0$, the Functional Central Limit Theorem (FCLT) says: $T^{-1/2} \sum_{t=1}^{[rT]} v_t \Rightarrow \sigma W(r)$, where $W(r)$ is a standard Wiener process.
$d_t = \psi' z_t$, where $z_t$ is a set of deterministic components;
$H_0: \alpha = 1$; $H_A: |\alpha| < 1$;
Local-to-unity framework: $\alpha = 1 + c/T$ (used later).
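The DGP in (1)-(2) can be simulated directly. The following is a minimal sketch, assuming i.i.d. standard normal $v_t$ and $d_t = 0$ (so $y_t = u_t$); the function name `simulate_local_to_unity` and its arguments are illustrative and not part of the survey.

```python
import random

def simulate_local_to_unity(T, c=0.0, seed=0):
    """Simulate u_t = alpha*u_{t-1} + v_t with alpha = 1 + c/T and u_0 = 0.

    Assumptions of this sketch: v_t ~ i.i.d. N(0, 1) and d_t = 0, so y_t = u_t.
    """
    rng = random.Random(seed)
    alpha = 1.0 + c / T
    u_prev = 0.0                      # initial condition u_0 = 0
    y = []
    for _ in range(T):
        u_prev = alpha * u_prev + rng.gauss(0.0, 1.0)
        y.append(u_prev)
    return alpha, y

# c = 0 gives a pure random walk; c < 0 gives a near-integrated series.
alpha, y = simulate_local_to_unity(T=200, c=-7.0, seed=1)
```

With `c=0.0` the series is the cumulative sum of the shocks, which is the null model of all the tests discussed below.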
4 Classical Unit Root Statistics
4.1 The Dickey-Fuller (DF) Statistic
References: Dickey and Fuller (1979, 1981).
The regression model is
$$y_t = \psi' z_t + \alpha y_{t-1} + v_t. \tag{3}$$
Assume that $v_t \sim i.i.d.(0, \sigma^2)$ and $z_t = \{\emptyset\}$. Then, the asymptotic distributions are
$$T(\hat\alpha - 1) \Rightarrow \frac{\int_0^1 W(r)\,dW(r)}{\int_0^1 W(r)^2\,dr} = \frac{0.5[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}, \tag{4}$$
$$t_{\hat\alpha} \Rightarrow \frac{\int_0^1 W(r)\,dW(r)}{[\int_0^1 W(r)^2\,dr]^{1/2}} = \frac{0.5[W(1)^2 - 1]}{[\int_0^1 W(r)^2\,dr]^{1/2}}. \tag{5}$$
If $z_t = \{1\}$ or $z_t = \{1, t\}$, then $W$ is replaced by $W^i(r) = W(r) - \left(\int_0^1 W Z'\right)\left(\int_0^1 Z Z'\right)^{-1} Z(r)$, for $i = \mu, \tau$. $W^i$ is the projection of $W$ onto the space orthogonal to $z$.
Asymptotic critical values at 5.0% are: -1.94, -2.86, -3.43 for $z_t = \{\emptyset\}$, $z_t = \{1\}$ and $z_t = \{1, t\}$, respectively.
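As an illustration of regression (3) without deterministic terms, the sketch below computes $\hat\alpha$, the normalized bias $T(\hat\alpha - 1)$ and the t-statistic for $\alpha = 1$. The function name `dickey_fuller` is hypothetical, and the degrees-of-freedom correction in the residual variance is a choice made for this sketch.

```python
import math

def dickey_fuller(y):
    """OLS of y_t on y_{t-1} with no deterministic terms (z_t empty).

    Returns (alpha_hat, n*(alpha_hat - 1), t-statistic for alpha = 1),
    where n is the number of usable observations.
    """
    y_lag, y_cur = y[:-1], y[1:]
    n = len(y_cur)
    sxx = sum(x * x for x in y_lag)
    sxy = sum(x * z for x, z in zip(y_lag, y_cur))
    alpha_hat = sxy / sxx
    resid = [z - alpha_hat * x for x, z in zip(y_lag, y_cur)]
    s2 = sum(e * e for e in resid) / (n - 1)     # residual variance
    se = math.sqrt(s2 / sxx)                     # standard error of alpha_hat
    return alpha_hat, n * (alpha_hat - 1.0), (alpha_hat - 1.0) / se

alpha_hat, norm_bias, t_stat = dickey_fuller([1.0, 1.0, 2.0, 2.0])
```

In an application, `t_stat` would be compared with the 5% critical value quoted above for $z_t = \{\emptyset\}$ (-1.94), rejecting the unit root when the statistic is smaller.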
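4.2 The Parametric ADF
Reference: Said and Dickey (1984).
Now, assume that $v_t$ is I(0), as in Section 3. In general, $v_t$ is an ARMA(p, q) process.
Assuming, as before, that $z_t = \{\emptyset\}$, then
$$T(\hat\alpha - 1) \Rightarrow \frac{0.5[W(1)^2 - 1] + \lambda}{\int_0^1 W(r)^2\,dr}, \tag{6}$$
$$t_{\hat\alpha} \Rightarrow \frac{\sigma}{\sigma_v} \frac{\{0.5[W(1)^2 - 1]\} + \lambda}{[\int_0^1 W(r)^2\,dr]^{1/2}}, \tag{7}$$
where $\lambda = (\sigma^2 - \sigma_v^2)/(2\sigma^2)$ and $\sigma_v^2$ is the variance of $v_t$.
The distributions depend on nuisance parameters.
The autocorrelation is corrected using the following autoregression:
$$\Delta y_t = \psi' z_t + \beta_0 y_{t-1} + \sum_{i=1}^{k} b_i \Delta y_{t-i} + e_{tk}, \tag{8}$$
where $\beta_0 = \alpha - 1$. Then, $H_0: \beta_0 = 0$.
If we define a detrended time series as $\tilde y_t = y_t - \hat\psi' z_t$, then (8) is equivalently written as
$$\Delta \tilde y_t = \beta_0 \tilde y_{t-1} + \sum_{i=1}^{k} b_i \Delta \tilde y_{t-i} + e_{tk}. \tag{9}$$
If $k \to \infty$ and $k^3/T \to 0$, then $T(\hat\alpha - 1)$ and $t_{\hat\alpha}$ converge to the expressions (4) and (5).
Important empirical application: Nelson and Plosser (1982).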
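The autoregression (8) can be estimated by OLS. The following is a minimal sketch using NumPy; the function name `adf_regression` and the `trend` codes ("n", "c", "ct") are conventions assumed for this example, not notation from the survey.

```python
import numpy as np

def adf_regression(y, k, trend="c"):
    """ADF regression (8): dy_t = psi'z_t + beta0*y_{t-1} + sum_i b_i*dy_{t-i} + e_tk.

    trend: "n" (no deterministics), "c" (constant), "ct" (constant and trend).
    Returns (beta0_hat, t-statistic for beta0 = 0, residual variance).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = dy.size - k                                   # usable observations
    cols = [y[k:-1]]                                  # y_{t-1}
    cols += [dy[k - i:dy.size - i] for i in range(1, k + 1)]   # dy_{t-i}
    if trend in ("c", "ct"):
        cols.append(np.ones(n))
    if trend == "ct":
        cols.append(np.arange(1.0, n + 1.0))
    X = np.column_stack(cols)
    target = dy[k:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[0], coef[0] / np.sqrt(cov[0, 0]), s2

# Example: a stationary AR(1) with coefficient 0.2 should reject the unit root.
rng = np.random.default_rng(0)
shocks = rng.standard_normal(500)
y_stat = np.zeros(500)
for t in range(1, 500):
    y_stat[t] = 0.2 * y_stat[t - 1] + shocks[t]
beta0_hat, t_beta0, s2_ek = adf_regression(y_stat, k=2, trend="c")
```

The t-statistic on $\hat\beta_0$ is the ADF statistic; it is compared with the same critical values as the DF statistic for the corresponding $z_t$.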
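4.3 The Semi-Parametric $Z_{\hat\alpha}$ and $Z_t$ Statistics
References: Phillips (1987, 1988), Phillips and Perron (1988).
The coefficient $\alpha$ is estimated from equation (3). Residuals $\hat v_t$ are used in constructing an estimator of $\sigma^2$. Therefore, the autocorrelation is taken into account in a non-parametric way:
$$s^2 = T^{-1} \sum_{t=1}^{T} \hat v_t^2 + 2T^{-1} \sum_{\tau=1}^{k} w(\tau, k) \sum_{t=\tau+1}^{T} \hat v_t \hat v_{t-\tau}, \tag{10}$$
$$w(\tau, k) = 1 - \frac{\tau}{k+1}. \tag{11}$$
Using (10) and (11), and with $\hat s_v^2 = T^{-1} \sum_{t=1}^{T} \hat v_t^2$, we have that
$$Z_{\hat\alpha} = T(\hat\alpha - 1) - \frac{0.5(s^2 - \hat s_v^2)}{T^{-2} \sum_{t=2}^{T} y_{t-1}^2} \Rightarrow \frac{0.5[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}, \tag{12}$$
$$Z_t = \left(\frac{\hat s_v}{s}\right) t_{\hat\alpha} - \frac{0.5(s^2 - \hat s_v^2)}{s\,[T^{-2} \sum_{t=2}^{T} y_{t-1}^2]^{1/2}} \Rightarrow \frac{0.5[W(1)^2 - 1]}{[\int_0^1 W(r)^2\,dr]^{1/2}}, \tag{13}$$
which are the same as in (4) and (5), respectively.
Asymptotic critical values of $Z_{\hat\alpha}$ at 5.0% are -8.0, -14.1, and -21.7 for $z_t = \{\emptyset\}$, $z_t = \{1\}$ and $z_t = \{1, t\}$, respectively.
Asymptotic critical values of $Z_t$ at 5.0% are -1.94, -2.86, -3.43 for $z_t = \{\emptyset\}$, $z_t = \{1\}$ and $z_t = \{1, t\}$, respectively.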
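To make (10)-(13) concrete, here is a minimal sketch for the case without deterministic terms. The function name `phillips_perron` is hypothetical, and the sketch assumes that the short-run variance and the t-statistic both divide by the number of usable observations.

```python
import numpy as np

def phillips_perron(y, k):
    """Z_alpha and Z_t for y_t = alpha*y_{t-1} + v_t (z_t empty), Bartlett window of lag k."""
    y = np.asarray(y, dtype=float)
    y_lag, y_cur = y[:-1], y[1:]
    n = y_cur.size
    sxx = y_lag @ y_lag
    alpha_hat = (y_lag @ y_cur) / sxx
    v = y_cur - alpha_hat * y_lag                     # residuals from (3)
    s2_v = v @ v / n                                  # short-run variance
    s2 = s2_v
    for tau in range(1, k + 1):
        w = 1.0 - tau / (k + 1.0)                     # Bartlett weight (11)
        s2 += 2.0 * w * (v[tau:] @ v[:-tau]) / n      # long-run variance (10)
    t_alpha = (alpha_hat - 1.0) / np.sqrt(s2_v / sxx)
    corr = 0.5 * (s2 - s2_v)
    z_alpha = n * (alpha_hat - 1.0) - corr / (sxx / n**2)                   # (12)
    z_t = np.sqrt(s2_v / s2) * t_alpha - corr / np.sqrt(s2 * sxx / n**2)    # (13)
    return z_alpha, z_t

z_alpha0, z_t0 = phillips_perron([1.0, 1.0, 2.0, 2.0], k=0)
z_alpha1, z_t1 = phillips_perron([1.0, 1.0, 2.0, 2.0], k=1)
```

With `k=0` the correction term vanishes and the two statistics reduce to the normalized bias and the t-statistic of the DF regression.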
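4.4 The M-Statistics
References: Stock (1999), Perron and Ng (1996).
Definitions:
$$MZ_\alpha = \frac{T^{-1} \tilde y_T^2 - s^2}{2T^{-2} \sum_{t=1}^{T} \tilde y_{t-1}^2}, \tag{14}$$
$$MSB = \left[\frac{T^{-2} \sum_{t=1}^{T} \tilde y_{t-1}^2}{s^2}\right]^{1/2}, \tag{15}$$
$$MZ_t = \frac{T^{-1} \tilde y_T^2 - s^2}{[4 s^2 T^{-2} \sum_{t=1}^{T} \tilde y_{t-1}^2]^{1/2}}, \tag{16}$$
where $\tilde y_t = y_t - \hat\psi' z_t$, $s^2 = s_{ek}^2 / [1 - \hat b(1)]^2$, $s_{ek}^2 = (T-k)^{-1} \sum_{t=k+1}^{T} \hat e_{tk}^2$, and $\hat b(1) = \sum_{j=1}^{k} \hat b_j$, all obtained from the autoregression (9).
The limiting distributions of $MZ_\alpha$ and $MZ_t$ are the expressions (4) and (5), respectively. For the third statistic, $MSB \Rightarrow [\int_0^1 W(r)^2\,dr]^{1/2}$.
Asymptotically: $MZ_t = MZ_\alpha \times MSB$.
Asymptotic critical values: see Stock (1999).
Monte Carlo simulation evidence.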
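Definitions (14)-(16) translate into a few lines of code once a detrended series and a long-run variance estimate are available. In the sketch below the function name `m_statistics` is hypothetical, `s2` is supplied by the caller, and the sum of squares is taken over the first T-1 detrended observations (equivalently, the pre-sample value is set to zero), which is an assumption of this example.

```python
import numpy as np

def m_statistics(y_tilde, s2):
    """MZ_alpha, MSB and MZ_t from a detrended series and a long-run variance estimate s2."""
    y_tilde = np.asarray(y_tilde, dtype=float)
    T = y_tilde.size
    ssq = np.sum(y_tilde[:-1] ** 2) / T**2        # T^{-2} * sum of lagged squares
    num = y_tilde[-1] ** 2 / T - s2               # T^{-1} * y~_T^2 - s^2
    mz_alpha = num / (2.0 * ssq)                  # (14)
    msb = np.sqrt(ssq / s2)                       # (15)
    mz_t = num / np.sqrt(4.0 * s2 * ssq)          # (16)
    return mz_alpha, msb, mz_t

mz_alpha, msb, mz_t = m_statistics([1.0, 2.0, 3.0, 4.0], s2=1.0)
```

With these definitions the product of `mz_alpha` and `msb` reproduces `mz_t`, which is the relation stated above.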
5 Recent Unit Root Statistics
5.1 The $ADF^{GLS}$
References: Elliott, Rothenberg and Stock (ERS, 1996), Ng and Perron (2001).
Under the local-to-unity framework, $\alpha = 1 + c/T$. Then, $T^{-1/2} u_{[Tr]} \Rightarrow \sigma W_c(r)$, where $W_c(r)$ is an Ornstein-Uhlenbeck process.
It bridges the gap between I(0) and I(1) asymptotics. If $c \to -\infty$, $T(\hat\alpha - 1)$ and $t_{\hat\alpha}$ have I(0) distributions. If $c \to +\infty$, $T(\hat\alpha - 1)$ and $t_{\hat\alpha}$ have Cauchy and Normal distributions, respectively.
Particular characteristic: use of GLS detrended data with $\bar\alpha = 1 + \bar c/T$.
Construction of GLS detrended data:
$$y_t^{\bar\alpha} = [y_1, (1 - \bar\alpha L) y_t], \quad t = 2, \ldots, T, \tag{17}$$
$$z_t^{\bar\alpha} = [z_1, (1 - \bar\alpha L) z_t], \quad t = 2, \ldots, T. \tag{18}$$
Let $\hat\psi_{GLS}$ be the estimator that minimizes
$$S(\bar\alpha) = (y_t^{\bar\alpha} - \psi' z_t^{\bar\alpha})'(y_t^{\bar\alpha} - \psi' z_t^{\bar\alpha}). \tag{19}$$
Detrended series: $\tilde y_t = y_t - \hat\psi_{GLS}' z_t$.
All unit root statistics may be used with $\tilde y_t$. For the ADF, see ERS (1996) and for the M-statistics, see Ng and Perron (2001).
When $z_t = \{1\}$ and $z_t = \{1, t\}$, the limiting distributions are, respectively:
$$DF^{GLS} \Rightarrow \frac{0.5[W_c(1)^2 - 1]}{[\int_0^1 W_c(r)^2\,dr]^{1/2}}, \tag{20}$$
$$DF^{GLS} \Rightarrow \frac{0.5[V_{c,\bar c}(1)^2 - 1]}{[\int_0^1 V_{c,\bar c}(r)^2\,dr]^{1/2}}, \tag{21}$$
where $V_{c,\bar c}(r) = W_c(r) - rb$, $b = \lambda W_c(1) + 3(1 - \lambda) \int_0^1 r W_c(r)\,dr$, and here $\lambda = (1 - \bar c)/(1 - \bar c + \bar c^2/3)$.
Asymptotic critical values: see ERS (1996), Ng and Perron (2001).
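The GLS detrending steps (17)-(19) can be sketched as follows. The function name `gls_detrend` and the `trend` codes are assumptions of this example; the values of `c_bar` used in practice are the ones reported later in the survey (-7.0 for a constant, -13.5 for a constant and trend).

```python
import numpy as np

def gls_detrend(y, c_bar, trend="c"):
    """Quasi-difference y and z at alpha_bar = 1 + c_bar/T, estimate psi by OLS on the
    transformed data, and return (y - z @ psi_hat, psi_hat)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    alpha_bar = 1.0 + c_bar / T
    z = np.ones((T, 1))                                   # z_t = {1}
    if trend == "ct":                                     # z_t = {1, t}
        z = np.column_stack([z, np.arange(1.0, T + 1.0)])
    y_qd = np.concatenate([y[:1], y[1:] - alpha_bar * y[:-1]])   # (17)
    z_qd = np.vstack([z[:1], z[1:] - alpha_bar * z[:-1]])        # (18)
    psi_hat, *_ = np.linalg.lstsq(z_qd, y_qd, rcond=None)        # minimizes (19)
    return y - z @ psi_hat, psi_hat

# A series that is exactly a linear trend is detrended to zero.
t_idx = np.arange(1.0, 51.0)
y_detrended, psi_hat = gls_detrend(3.0 + 0.5 * t_idx, c_bar=-13.5, trend="ct")
```

Note that the coefficients are estimated on quasi-differenced data but the detrended series is formed from the data in levels, as in the definition of $\tilde y_t$.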
5.2 A Feasible Point Optimal Test
References: Dufour and King (1991), ERS (1996).
This test is denoted by $P_T^{GLS}$ and defined by:
$$P_T^{GLS}(c, \bar c) = \frac{S(\bar\alpha) - \bar\alpha S(1)}{s^2}, \tag{22}$$
where $S(\bar\alpha)$ and $S(1)$ are the sums of squared errors from GLS regressions with $\alpha = \bar\alpha$ and $\alpha = 1$, respectively.
Limiting distributions:
$$P_T^{GLS,\mu}(c, \bar c) \Rightarrow \bar c^2 \int_0^1 W_c(r)^2\,dr - \bar c W_c(1)^2, \tag{23}$$
$$P_T^{GLS,\tau}(c, \bar c) \Rightarrow \bar c^2 \int_0^1 V_{c,\bar c}(r)^2\,dr + (1 - \bar c) V_{c,\bar c}(1)^2, \tag{24}$$
for $z_t = \{1\}$ and $z_t = \{1, t\}$, respectively.
Selection of $\bar c$.
Asymptotic critical values: see ERS (1996), Ng and Perron (2001).
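A minimal sketch of the statistic in (22), assuming the long-run variance estimate `s2` is supplied by the caller; the helper names `gls_ssr` and `point_optimal_stat` are hypothetical.

```python
import numpy as np

def gls_ssr(y, alpha, trend="c"):
    """S(alpha): sum of squared residuals from regressing quasi-differenced y
    on quasi-differenced z, with z_t = {1} ("c") or z_t = {1, t} ("ct")."""
    y = np.asarray(y, dtype=float)
    T = y.size
    z = np.ones((T, 1))
    if trend == "ct":
        z = np.column_stack([z, np.arange(1.0, T + 1.0)])
    y_qd = np.concatenate([y[:1], y[1:] - alpha * y[:-1]])
    z_qd = np.vstack([z[:1], z[1:] - alpha * z[:-1]])
    psi, *_ = np.linalg.lstsq(z_qd, y_qd, rcond=None)
    resid = y_qd - z_qd @ psi
    return float(resid @ resid)

def point_optimal_stat(y, c_bar, s2, trend="c"):
    """P_T = [S(alpha_bar) - alpha_bar * S(1)] / s2 with alpha_bar = 1 + c_bar/T."""
    T = len(y)
    alpha_bar = 1.0 + c_bar / T
    return (gls_ssr(y, alpha_bar, trend) - alpha_bar * gls_ssr(y, 1.0, trend)) / s2

# Toy check: with T = 5 and c_bar = -5, alpha_bar = 0 and S(0) is the demeaned sum of squares.
p_t = point_optimal_stat([1.0, 2.0, 3.0, 4.0, 5.0], c_bar=-5.0, s2=1.0, trend="c")
```

Small values of the statistic are evidence against the unit root, so the test rejects in the lower tail.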
6 Some Issues on Unit Root Tests
6.1 The Asymptotic Gaussian Power Envelope
There is no uniformly most powerful (UMP) or uniformly most powerful invariant (UMPI) statistic in the unit root framework.
With $\alpha = 1 + c/T$, derivation of the asymptotic Gaussian power envelope.
The power envelope allows us to judge between different alternative statistics.
The asymptotic Gaussian power envelope is defined by:
$$\pi(c) = \Pr[H^{P_T^{GLS}}(c, c) < b^{P_T^{GLS}}(c)], \tag{25}$$
where $b^{P_T^{GLS}}(c)$ is such that
$$\Pr[H^{P_T^{GLS}}(0, c) < b^{P_T^{GLS}}(c)] = \nu, \tag{26}$$
with $\nu$ the size of the test.
Selection of $\bar c$ (-7.0 for $z_t = \{1\}$ and -13.5 for $z_t = \{1, t\}$).
6.2 Asymptotic Power Functions
The asymptotic power functions of the tests are defined by:
$$\pi^J(c, \bar c) = \Pr[H^{J^{GLS}}(c, \bar c) < b^{J^{GLS}}(\bar c)],$$
where $J = MZ_\alpha$, $MSB$, $MZ_t$, and $ADF$, and the constant $b^{J^{GLS}}(\bar c)$ is such that $\Pr[H^{J^{GLS}}(0, \bar c) < b^{J^{GLS}}(\bar c)] = \nu$, the size of the tests.
6.3 Selection of the Lag Length
Information Criteria: AIC, BIC
$$k_{aic} = \arg\min_{\{k\}} \log(s_{ek}^2) + \frac{2k}{T}, \tag{27}$$
$$k_{bic} = \arg\min_{\{k\}} \log(s_{ek}^2) + \frac{\log(T) k}{T}. \tag{28}$$
Recursive t-sig method
Modified Information Criteria: MAIC, MBIC (Ng and Perron, 2001):
$$k_{mic} = \arg\min_{\{k\}} \log(s_{ek}^2) + \frac{C_T [\tau_T(k) + k]}{T}, \tag{29}$$
where
$$\tau_T(k) = (s_{ek}^2)^{-1} \hat\beta_0^2 \sum_{t=1}^{T} \tilde y_{t-1}^2. \tag{30}$$
The MAIC uses $C_T = 2$ and the MBIC uses $C_T = \log(T)$.
Ng and Perron (2001), based on theoretical considerations and simulations, recommend the MAIC.
The advantage of the MIC is that it takes into account the possible dependence of $\hat\beta_0$ on $k$.
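The modified criteria (29)-(30) can be sketched as a loop over candidate lag lengths. This example assumes the series has already been detrended, estimates every autoregression (9) on the same sample (dropping the first `kmax` differences), and normalizes by that effective sample size; the function name `select_lag_mic` is hypothetical.

```python
import numpy as np

def select_lag_mic(y_tilde, kmax, penalty="maic"):
    """Return the lag k in {0, ..., kmax} minimizing the MAIC (C_T = 2) or MBIC (C_T = log n)."""
    y = np.asarray(y_tilde, dtype=float)
    dy = np.diff(y)
    n = dy.size - kmax                        # common effective sample for all k
    c_T = 2.0 if penalty == "maic" else np.log(n)
    target = dy[kmax:]
    y_lag = y[kmax:-1]
    best_k, best_val = 0, np.inf
    for k in range(kmax + 1):
        cols = [y_lag] + [dy[kmax - i:dy.size - i] for i in range(1, k + 1)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        s2_ek = resid @ resid / n
        tau = coef[0] ** 2 * (y_lag @ y_lag) / s2_ek      # (30)
        mic = np.log(s2_ek) + c_T * (tau + k) / n         # (29)
        if mic < best_val:
            best_k, best_val = k, mic
    return best_k

# Example: a unit root process whose first differences follow an AR(1) with coefficient 0.8.
rng = np.random.default_rng(1)
shocks = rng.standard_normal(400)
dy_sim = np.zeros(400)
for t in range(1, 400):
    dy_sim[t] = 0.8 * dy_sim[t - 1] + shocks[t]
k_selected = select_lag_mic(np.cumsum(dy_sim), kmax=6)
```

Because the first differences are strongly autocorrelated in this example, the criterion should pick at least one lagged difference.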
6.4 Summary of Monte-Carlo Evidence
All asymptotically valid tests exhibit finite-sample size distortions for models close to the I(0) model.
Importance of data-dependent methods to select the lag length.
The presence of non-normality or conditional heteroskedasticity increases size distortions.
Including additional trend terms reduces the power of the unit root test if the trends are unnecessary.
The span of the data is important, not the sampling frequency.
The power of the unit root test depends on the initial condition $u_0$.
If the trend is underspecified, unit root tests and estimators are inconsistent.
7 Structural Change and Unit Root Tests
7.1 Introduction
References: Perron (1989), Christiano (1992), Banerjee et al. (1992), Zivot and Andrews (1992), Perron (1997), Perron and Rodríguez (2003a).
Basic idea: misspecification of the trend function is responsible for the nonrejection of the null hypothesis of a unit root in Nelson and Plosser (1982).
Models (I, II, III):
$$z_t = \{1, 1(t > T_B), t\},$$
$$z_t = \{1, t, 1(t > T_B)(t - T_B)\},$$
$$z_t = \{1, 1(t > T_B), t, 1(t > T_B)(t - T_B)\},$$
where $1(\cdot)$ is the indicator function and $T_B$ is the break point. Assume that $T_B = \delta T$, for some $\delta \in (0, 1)$.
Perron (1989)
Christiano (1992)
Zivot and Andrews (1992)
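The deterministic regressors of Models I, II and III are easy to build explicitly. In the sketch below the function name `break_regressors` is hypothetical and the break date is obtained by rounding the product of the break fraction and T down to an integer, which is an assumption of this example.

```python
import numpy as np

def break_regressors(T, delta, model):
    """Deterministic components z_t for Models I, II and III with break fraction delta."""
    t = np.arange(1, T + 1, dtype=float)
    T_B = int(delta * T)                      # assumption: round delta*T down
    const = np.ones(T)
    step = (t > T_B).astype(float)            # 1(t > T_B): change in level
    slope = step * (t - T_B)                  # 1(t > T_B)(t - T_B): change in slope
    if model == "I":
        return np.column_stack([const, step, t])
    if model == "II":
        return np.column_stack([const, t, slope])
    if model == "III":
        return np.column_stack([const, step, t, slope])
    raise ValueError("model must be 'I', 'II' or 'III'")

z_model_2 = break_regressors(T=10, delta=0.5, model="II")
```

The resulting matrix can be used as the regressor set in any of the detrending steps discussed earlier.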
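7.2 GLS Detrended Data and Structural Change
Reference: Perron and Rodríguez (2003a).
Limiting distributions of the unit root statistics (Models II and III):
$$MZ_\alpha^{GLS}(\delta) \Rightarrow \frac{0.5 K_1(c, \bar c, \delta)}{K_2(c, \bar c, \delta)} \equiv H^{MZ_\alpha}(c, \bar c, \delta), \tag{31}$$
$$MSB^{GLS}(\delta) \Rightarrow (K_2(c, \bar c, \delta))^{1/2} \equiv H^{MSB}(c, \bar c, \delta), \tag{32}$$
$$MZ_t^{GLS}(\delta) \Rightarrow \frac{0.5 K_1(c, \bar c, \delta)}{(K_2(c, \bar c, \delta))^{1/2}} \equiv H^{MZ_t}(c, \bar c, \delta), \tag{33}$$
$$ADF^{GLS}(\delta) \Rightarrow \frac{0.5 K_1(c, \bar c, \delta)}{(K_2(c, \bar c, \delta))^{1/2}} \equiv H^{ADF}(c, \bar c, \delta), \tag{34}$$
where:
$$K_1(c, \bar c, \delta) = V_{c\bar c}^{(1)}(1, \delta)^2 - 2 V_{c\bar c}^{(2)}(1, \delta) - 1,$$
$$K_2(c, \bar c, \delta) = \int_0^1 V_{c\bar c}^{(1)}(r, \delta)^2\,dr - 2 \int_\delta^1 V_{c\bar c}^{(2)}(r, \delta)\,dr.$$
What happens with Model I under GLS detrended data? Reference: Rodríguez (2007).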
7.3 Selection of the Break Point
Method 1: Estimate $\delta$ as the break point that yields the minimal value of the statistics; see Zivot and Andrews (1992), i.e. using
$$\inf_{\delta} J^{GLS}(\delta),$$
where $J = MZ_\alpha$, $MSB$, $MZ_t$, and $ADF$.
By the Continuous Mapping Theorem (CMT), the limiting distribution using method 1 is:
$$\inf_{\delta \in (0,1)} J^{GLS}(\delta) \Rightarrow \inf_{\delta \in (0,1)} H^J(c, \bar c, \delta). \tag{35}$$
Method 2: Choose the break point such that the absolute value of the t-statistic on the coefficient of the change in slope is maximized; see Perron (1997):
$$\hat\delta = \arg\max_{\delta \in (\varepsilon, 1-\varepsilon)} |t_{\hat\beta_2}(\delta)|.$$
Limiting distribution using method 2: $\hat\delta \Rightarrow \delta^*$, where $\delta^*$ maximizes the limit of $|t_{\hat\beta_2}(\delta)|$ over $\delta \in (\varepsilon, 1-\varepsilon)$.
Hence, the limiting distributions of the statistics are given by
$$J^{GLS}(\hat\delta) \Rightarrow H^J(c, \bar c, \delta^*). \tag{36}$$
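Method 2 amounts to a grid search over candidate break dates. The sketch below applies it to Model II with plain OLS on the levels of the series, which is a simplification assumed for this example (the statistics above use GLS detrended data); the function name `break_by_slope_tstat` and the trimming value `eps=0.15` are also assumptions.

```python
import numpy as np

def break_by_slope_tstat(y, eps=0.15):
    """Pick the break date maximizing |t| on the slope-change regressor in Model II."""
    y = np.asarray(y, dtype=float)
    T = y.size
    t = np.arange(1, T + 1, dtype=float)
    best_tb, best_abs_t = None, -np.inf
    for tb in range(int(eps * T), int((1 - eps) * T) + 1):
        X = np.column_stack([np.ones(T), t, np.where(t > tb, t - tb, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (T - 3)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
        abs_t = abs(coef[2] / se)
        if abs_t > best_abs_t:
            best_tb, best_abs_t = tb, abs_t
    return best_tb, best_abs_t

# Example: a trend whose slope increases by 2 after observation 30, plus small noise.
rng = np.random.default_rng(2)
t_idx = np.arange(1.0, 61.0)
y_break = t_idx + 2.0 * np.where(t_idx > 30, t_idx - 30, 0.0) + 0.01 * rng.standard_normal(60)
tb_hat, abs_t_hat = break_by_slope_tstat(y_break)
```

Method 1 has the same structure: the loop evaluates the unit root statistic at each candidate date and keeps the minimum instead of the maximum absolute t-statistic.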
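7.4 The Feasible Optimal Point Test
When $\delta$ is unknown:
$$P_T^{GLS}(c, \bar c) = \left\{\inf_{\delta \in [\varepsilon, 1-\varepsilon]} S(\bar\alpha, \delta) - \bar\alpha \inf_{\delta \in [\varepsilon, 1-\varepsilon]} S(1, \delta)\right\} / s^2. \tag{37}$$
Limiting distribution:
$$P_T^{GLS}(c, \bar c) \Rightarrow \sup_{\delta \in [\varepsilon, 1-\varepsilon]} M(c, 0, \delta) - \sup_{\delta \in [\varepsilon, 1-\varepsilon]} M(c, \bar c, \delta) - 2\bar c \int_0^1 W_c(r)\,dW(r) + (\bar c^2 - 2\bar c c) \int_0^1 W_c(r)^2\,dr - \bar c \equiv H^{P_T^{GLS}}(c, \bar c). \tag{38}$$
Derivation of the power envelope.
Selection of $\bar c$ ($\bar c = -22.5$).
Asymptotic Power Functions.
Finite-Sample Size and Power.
Empirical Evidence.
Figure 1: Gaussian Local Power Envelope and the Local Asymptotic Power Functions of the Tests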
The Role of the Initial Condition
Traditionally, theoretical work assumes that the starting value of the time series is zero or has finite expectation. The effect of the initial observation then disappears asymptotically.
Exceptions: Elliott (1999) and Müller and Elliott (2003), in models without structural change.
Hui and Rodríguez (2006) introduce both an unknown structural break and a random initial condition under the alternative hypothesis.
The data generating process (DGP) is the same as before, except that:
Condition A (Initial condition assumption). We assume that $u_0$ is zero when $\alpha = 1$, so $u_1 = v_1$; while $u_1$ has mean zero and variance $\sigma^2/(1 - \alpha^2)$ when $\alpha < 1$.
The innovations $\{v_t\}$ satisfy the standard assumptions.
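Condition A can be illustrated with a small simulation of the first observation. The sketch assumes Gaussian draws, which Condition A itself does not require, and the function name `draw_initial_u1` is hypothetical.

```python
import math
import random

def draw_initial_u1(alpha, sigma=1.0, rng=None):
    """Draw u_1 under Condition A (Gaussian draws are an assumption of this sketch).

    alpha == 1: u_0 = 0, so u_1 = v_1 with variance sigma^2.
    alpha < 1 : u_1 has mean zero and variance sigma^2 / (1 - alpha^2).
    """
    rng = rng or random.Random(0)
    if alpha == 1.0:
        return rng.gauss(0.0, sigma)
    return rng.gauss(0.0, sigma / math.sqrt(1.0 - alpha**2))

rng = random.Random(42)
draws = [draw_initial_u1(0.6, 1.0, rng) for _ in range(20000)]
sample_var = sum(d * d for d in draws) / len(draws)   # close to 1 / (1 - 0.36)
```

As the autoregressive parameter approaches one from below, the variance of the initial observation grows without bound, which is why the initial condition matters for power under local alternatives.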
Same statistics as in Perron and Rodríguez (2003a).
For Models I and II, the statistics have the following limiting distributions:
$$MZ_\alpha^{GLS}(\delta) \Rightarrow \frac{0.5 g_1(c, \bar c, \delta)}{g_2(c, \bar c, \delta)} \equiv J^{MZ_\alpha^{GLS}}(c, \bar c, \delta),$$
$$MSB^{GLS}(\delta) \Rightarrow (g_2(c, \bar c, \delta))^{1/2} \equiv J^{MSB^{GLS}}(c, \bar c, \delta),$$
$$MZ_t^{GLS}(\delta) \Rightarrow \frac{0.5 g_1(c, \bar c, \delta)}{(g_2(c, \bar c, \delta))^{1/2}} \equiv J^{MZ_t^{GLS}}(c, \bar c, \delta),$$
$$ADF^{GLS}(\delta) \Rightarrow \frac{0.5 g_1(c, \bar c, \delta)}{(g_2(c, \bar c, \delta))^{1/2}} \equiv J^{ADF^{GLS}}(c, \bar c, \delta).$$
Using the power envelope, we obtain $\bar c = -24$.
The asymptotic power function of each statistic is calculated with T = 1000 and 10,000 replications.
The power function curves lie below the power envelope, but not far from it.
Using the infimum method to choose the break point sometimes gives a slightly higher power function than the supremum method.
Figure 1. Gaussian Power Envelope and Asymptotic Power Functions; Infimum Method and Fixed and Random Initial Condition.
Figure 2. Gaussian Power Envelope and Asymptotic Power Functions; Supremum Method and Fixed and Random Initial Condition.
Covariates and Unit Root Tests
Importance of covariates in improving the power of unit root tests.
References: Hansen (1995), Elliott and Jansson (2003). For structural change models: Hui and Rodríguez (2006).
The data generating process (DGP):
$$y_t = d_{yt} + u_{yt}, \tag{39}$$
$$x_t = d_{xt} + u_{xt}, \tag{40}$$
$$A(L) \begin{bmatrix} (1 - \alpha L) u_{y,t} \\ u_{x,t} \end{bmatrix} = A(L) u_t(\alpha) = e_t, \tag{41}$$
$$\alpha = 1 + cT^{-1}, \tag{42}$$
where $x_t$, an $m \times 1$ vector, contains an arbitrary number of stationary covariates carrying extra information about $y_t$, the variable to be tested. $\Omega$ is defined as the spectral density at frequency zero (scaled by $2\pi$) of $u_t(\alpha)$. Therefore $R^2 = \omega_{yy}^{-1} \omega_{yx} \Omega_{xx}^{-1} \omega_{yx}'$ is a measure of the long-run correlation between shocks to $x_t$ and quasi-differences of $y_t$ at frequency zero. The value of $R^2$ represents the contribution of the covariates to the explanation of $y_t$, and it ranges from zero to unity.
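The long-run $R^2$ is a simple function of the partitioned long-run covariance matrix. A minimal sketch, assuming the matrix is ordered with $y$ first and the covariates after it; the function name `long_run_r2` is hypothetical.

```python
import numpy as np

def long_run_r2(omega):
    """R^2 = omega_yx Omega_xx^{-1} omega_yx' / omega_yy for a matrix ordered as (y, x)."""
    omega = np.asarray(omega, dtype=float)
    w_yy = omega[0, 0]          # long-run variance of the quasi-differenced y shocks
    w_yx = omega[0, 1:]         # long-run covariances between y and the covariates
    omega_xx = omega[1:, 1:]    # long-run covariance matrix of the covariates
    return float(w_yx @ np.linalg.solve(omega_xx, w_yx) / w_yy)

r2_example = long_run_r2([[1.0, 0.5], [0.5, 1.0]])   # one covariate, long-run correlation 0.5
```

With a single covariate the measure is the squared long-run correlation, so a correlation of 0.5 gives 0.25, and uncorrelated covariates give zero.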
The optimal statistic is defined by
$$P^i(1, \bar\alpha) = \inf_{\delta \in (0,1)} \sum_{t=1}^{T} \hat u_t^i(\bar\alpha, \delta)' \hat\Omega^{-1} \hat u_t^i(\bar\alpha, \delta) - \inf_{\delta \in (0,1)} \sum_{t=1}^{T} \hat u_t^i(1, \delta)' \hat\Omega^{-1} \hat u_t^i(1, \delta) - \bar c, \tag{43}$$
where $\hat u_t^i(r) = z_t(r) - d_t(r)' \hat\psi$ and $r = \bar\alpha, 1$.
The Theorem establishes that for cases $i = 1$ and $2$:
$$P^i(1, \bar\alpha) \Rightarrow \varphi_1(c, \bar c, R^2) + \varphi_2^i(c, \bar c, \delta, R^2). \tag{44}$$
The asymptotic power depends on $\bar c$, which corresponds to one particular point under the alternative hypothesis.
The distribution of the $P^i(1, \bar\alpha)$ test also depends on the parameter $R^2$. When $R^2 = 0$, there is no covariate correlated with the quasi-differences of $y_t$ and consequently we retrieve the same asymptotic distribution as that derived in Perron and Rodríguez (2003). When $R^2$ is greater than zero, the limiting distribution is a function of $R^2$, indicating that the extra information contained in the covariates may make a difference in the performance of the test.
Figure 1. Power Envelopes for $R^2$ = 0.0, 0.3, 0.5, 0.7, 0.9.