Paper Econometrica

The document discusses selecting an appropriate lag length for unit root tests and proposes modified information criteria. It considers applying local generalized least squares detrending to modified unit root tests to improve size and power. Simulation results show the modified information criteria approach yields better size properties than other methods for selecting the lag length.

Lag Length Selection and the Construction of Unit Root Tests with Good Size and Power

Author(s): Serena Ng and Pierre Perron

Reviewed work(s):
Source: Econometrica, Vol. 69, No. 6 (Nov., 2001), pp. 1519-1554
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/2692266 .
Accessed: 25/05/2012 08:12

Econometrica, Vol. 69, No. 6 (November, 2001), 1519-1554

LAG LENGTH SELECTION AND THE CONSTRUCTION OF UNIT ROOT TESTS WITH GOOD SIZE AND POWER

BY SERENA NG AND PIERRE PERRON1

It is widely known that when there are errors with a moving-average root close to -1,
a high order augmented autoregression is necessary for unit root tests to have good size,
but that information criteria such as the AIC and the BIC tend to select a truncation lag
(k) that is very small. We consider a class of Modified Information Criteria (MIC) with
a penalty factor that is sample dependent. It takes into account the fact that the bias in
the sum of the autoregressive coefficients is highly dependent on k and adapts to the type
of deterministic components present. We use a local asymptotic framework in which the
moving-average root is local to -1 to document how the MIC performs better in selecting
appropriate values of k. In Monte-Carlo experiments, the MIC is found to yield huge size
improvements to the $DF^{GLS}$ and the feasible point optimal $P_T$ test developed in Elliott,
Rothenberg, and Stock (1996). We also extend the M tests developed in Perron and Ng
(1996) to allow for GLS detrending of the data. The MIC along with GLS detrended data
yield a set of tests with desirable size and power properties.

KEYWORDS: Integrated processes, truncation lag, GLS detrending, information criteria.

1. INTRODUCTION

MANY UNIT ROOT TESTS have been developed for testing the null hypothesis of
a unit root against the alternative of stationarity. While the presence or absence
of a unit root has important implications, many remain skeptical about the con-
clusions drawn from such tests. This concern is justifiable, as these tests gener-
ally suffer from two problems. First, many tests have low power when the root
of the autoregressive polynomial is close to but less than unity (e.g., DeJong et
al. (1992)). Second, the majority of the tests suffer from severe size distortions
when the moving-average polynomial of the first differenced series has a large
negative root (e.g., Schwert (1989), Perron and Ng (1996)). Although less severe,
the problem also arises when there is a large negative autoregressive root in the
residuals. The consequence is over-rejections of the unit root hypothesis. While
few economic time series are found to have negative serial correlation of the
autoregressive type, many do exhibit a large negative moving-average root. The
inflation rate is an example, and omitted outliers have also been shown to induce
a negative moving average root in the error process (e.g., Franses and Haldrup
(1994)). It is therefore desirable to have powerful unit root tests that are robust
to size distortions.

1 This is a substantially revised version of a paper previously circulated as "Constructing Unit
Root Tests with Good Size and Power." We thank three anonymous referees and Alain Monfort,
the co-editor, for comments on earlier drafts. The authors thank the Social Sciences and Humanities
Research Council of Canada (SSHRC), the Natural Sciences and Engineering Research Council of
Canada (NSERC), and the Fonds de la Formation de Chercheurs et l'Aide à la Recherche du Québec
(FCAR) for financial support.

The implementation of unit root tests often necessitates the selection of an
autoregressive truncation lag, say, k. This is required in the autoregression used
to form the Said-Dickey-Fuller test. It is also required in constructing an autore-
gressive estimate of the spectral density at frequency zero. However, simulations
have repeatedly shown a strong association between k and the severity of size dis-
tortions and/or the extent of power loss (e.g., Ng and Perron (1995), and Lopez
(1997)). To date, there exists no operational procedure for selecting k that yields
robust and satisfactory results. The problem is that while a small k is adequate
for finite order autoregressive errors and ARMA processes with small moving-average
components, a large k is generally necessary for noise functions with a
moving-average root that is large. For the latter class of models, selection rules
such as the Schwarz or Bayesian Information Criterion (BIC) and the Akaike
Information Criterion (AIC) tend to select values of k that are generally too
small for unit root tests to have good sizes.
This paper sets out to resolve two problems. One is to enhance the power of
several tests that have been shown to have small size distortions. More precisely,
in Perron and Ng (1996), we analyzed a class of modified tests, originally pro-
posed by Stock (1990), and showed that these tests are far more robust to size
distortions than other unit root tests in the literature when the residuals have neg-
ative serial correlation. Exploiting the findings in Elliott, Rothenberg, and Stock
(1996) (hereafter ERS) and Dufour and King (1991) that local GLS detrending
of the data yields substantial power gains, we apply the idea of GLS detrending
to the modified tests and show that non-negligible size and power gains can be
made when used in conjunction with an autoregressive spectral density estimator
at frequency zero provided the truncation lag is appropriately selected.
Our second task is to provide an improved procedure for choosing the trun-
cation lag. We argue that the penalty of k assigned to overfitting in information
criteria such as the AIC under-estimates the cost of a low order model when the
unit root process has a negative moving-average root and, hence, tends to select
a lag length that is too small. We suggest a class of modified information criteria
that takes better account of the cost of underfitting. The modified AIC is shown
to lead to substantial size improvements over standard information criteria in
all the unit root tests considered. Taking the two steps together, GLS detrend-
ing along with the selection rule for k provide procedures which allow for the
construction of unit root tests with much improved size and power. Additionally,
using GLS detrended data to estimate the spectral density at frequency zero is
found to have favorable size and power implications.
The paper proceeds as follows. After setting up the preliminaries in Section 2,
Section 3 considers the class of M tests with GLS detrending. Section 4 explains
why information criteria will underestimate k and Section 5 proposes an alterna-
tive penalty factor in the context of integrated data. Section 6 presents the prop-
erties of the modified information criteria using a local asymptotic framework
and shows why the AIC version is preferable. Simulation results are presented
in Section 7, and the procedures are illustrated using inflation rates for the G7
countries in Section 8. Section 9 offers brief concluding remarks. Mathematical
derivations are given in the Appendix.

2. PRELIMINARIES

We assume that we have $T + 1$ observations generated by (for $t = 0, \ldots, T$):

(1)   $y_t = d_t + u_t, \quad u_t = \alpha u_{t-1} + v_t,$

where $E(u_0^2) < \infty$ (even as $T \to \infty$), $v_t = \delta(L)e_t = \sum_{j=0}^{\infty} \delta_j e_{t-j}$ with $\sum_{j=0}^{\infty} j|\delta_j| < \infty$
and $\{e_t\} \sim$ i.i.d. $(0, \sigma_e^2)$. The non-normalized spectral density at frequency zero of
$v_t$ is given by $\sigma^2 = \sigma_e^2 \delta(1)^2$. Also, $T^{-1/2} \sum_{t=1}^{[sT]} v_t \Rightarrow \sigma W(s)$, where $\Rightarrow$ denotes weak
convergence in distribution and $W(s)$ is a Wiener process defined on $C[0, 1]$.
In (1), $d_t = \psi' z_t$, where $z_t$ is a set of deterministic components. We consider
$d_t = \sum_{i=0}^{p} \psi_i t^i$, with special focus on $p = 0, 1$, though the analysis remains valid
for more general cases.2 Of interest is testing the null hypothesis $\alpha = 1$ against
$\alpha < 1$. The DF test due to Dickey and Fuller (1979) and Said and Dickey (1984)
is the t statistic for $\beta_0 = 0$ in the autoregression:

(2)   $\Delta y_t = d_t + \beta_0 y_{t-1} + \sum_{j=1}^{k} \beta_j \Delta y_{t-j} + e_{tk}.$

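In Perron and Ng (1996), we analyzed the properties of three tests: $MZ_\alpha$, $MZ_t$,
and $MSB$, collectively referred to as the M tests. These are defined as (for the
no deterministic term case):

(3)   $MZ_\alpha = \left(T^{-1} y_T^2 - s_{AR}^2\right)\left(2T^{-2} \sum_{t=1}^{T} y_{t-1}^2\right)^{-1},$

(4)   $MSB = \left(T^{-2} \sum_{t=1}^{T} y_{t-1}^2 / s_{AR}^2\right)^{1/2},$

and $MZ_t = MZ_\alpha \times MSB$. All three tests are based on $s_{AR}^2$, an autoregressive
estimate of the spectral density at frequency zero of $v_t$. Let $\hat\beta(1) = \sum_{i=1}^{k} \hat\beta_i$,
$\hat\sigma_k^2 = (T - k)^{-1} \sum_{t=k+1}^{T} \hat e_{tk}^2$, with $\hat\beta_i$ and $\{\hat e_{tk}\}$ obtained from (2) estimated by OLS. Then,

(5)   $s_{AR}^2 = \hat\sigma_k^2 / (1 - \hat\beta(1))^2.$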

The M tests for $p = 0$ and 1 can be obtained upon replacing $y_{t-1}$ and $y_T$ by the
residuals obtained from least squares detrending. The $MZ_\alpha$ and $MZ_t$ tests can be
viewed as modified versions of the Phillips (1987) and Phillips-Perron (1988) $Z_\alpha$
and $Z_t$ tests, referred to as the Z tests below (see Perron and Ng (1998)). The Z
tests suffer from severe size distortions when $v_t$ has a negative moving-average
root. When the root is close to minus one (e.g., -.8), the rejection rate can be
as high as 100% (see Schwert (1989)). Perron and Ng (1996) showed that the M
tests have dramatically smaller size distortions than most (if not all) unit root tests
in the literature in cases of negative moving-average errors if the autoregressive
spectral density estimator defined above is used in conjunction with a suitably chosen
$k$. The three M tests have similar properties and we use $MZ_\alpha$ for illustration.

2 Most of the results presented here, in particular Theorem 1 below, hold with the less restrictive
assumption that $\{e_t\}$ is a martingale difference sequence with respect to the sigma algebra
$\mathcal{F}_t = \{e_s; s \le t\}$ with $\sigma_e^2 = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(e_t^2 \mid \mathcal{F}_{t-1})$.
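
As a rough illustration of (3) to (5) in the no-deterministic-term case, the Python sketch below fits (2) by OLS, forms $s_{AR}^2$, and then computes the three M statistics. The function names `s2_ar` and `m_tests`, the use of NumPy, and the simulated random walk are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def s2_ar(y, k):
    """s2_AR as in (5), from an OLS fit of (2) with no deterministic term."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[t-1] holds Delta y_t, t = 1..T
    T = dy.size
    # Rows t = k+1..T; columns: y_{t-1}, Delta y_{t-1}, ..., Delta y_{t-k}
    cols = [y[k:-1]] + [dy[k - j:T - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ b
    sigma2_k = resid @ resid / (T - k)
    return sigma2_k / (1.0 - b[1:].sum()) ** 2

def m_tests(y, k):
    """MZ_alpha, MSB and MZ_t as in (3) and (4)."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    s2 = s2_ar(y, k)
    ssq = np.sum(y[:-1] ** 2)            # sum over t = 1..T of y_{t-1}^2
    mza = (y[-1] ** 2 / T - s2) / (2.0 * ssq / T ** 2)
    msb = np.sqrt(ssq / T ** 2 / s2)
    return mza, msb, mza * msb

# Hypothetical data: a driftless random walk with i.i.d. N(0, 1) increments
rng = np.random.default_rng(0)
y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(2000))])
mza, msb, mzt = m_tests(y, k=2)
```

With i.i.d. increments the spectral density at frequency zero equals the innovation variance, so `s2_ar(y, 2)` should land near one in a long sample.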

3. THE TESTS UNDER GLS DETRENDING

To improve power, ERS proposed local to unity GLS detrending of the data.
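For any series $\{x_t\}_{t=0}^{T}$, define $(x_0^{\bar\alpha}, x_t^{\bar\alpha}) = (x_0, (1 - \bar\alpha L)x_t)$, $t = 1, \ldots, T$, for some
chosen $\bar\alpha = 1 + \bar c/T$. The GLS detrended series is defined as

(6)   $\tilde y_t \equiv y_t - \hat\psi' z_t$

where $\hat\psi$ minimizes $S(\bar\alpha, \psi) = (y^{\bar\alpha} - \psi' z^{\bar\alpha})'(y^{\bar\alpha} - \psi' z^{\bar\alpha})$. If $v_t$ is i.i.d. normal, the
point optimal test of the null hypothesis $\alpha = 1$ against the alternative hypothesis
$\alpha = \bar\alpha$ is the likelihood ratio statistic, $L = S(\bar\alpha) - S(1)$, where $S(\bar\alpha) = \min_\psi S(\bar\alpha, \psi)$.
ERS considered a feasible point optimal test that takes into account that $v_t$ may
be serially correlated. The statistic is

(7)   $P_T = [S(\bar\alpha) - \bar\alpha S(1)]/s_{AR}^2,$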

where $s_{AR}^2$ is defined as in (5). The value of $\bar c$ is chosen such that the asymptotic
local power function of the test is tangent to the power envelope at 50% power.
For $p = 0$ this is -7.0 and for $p = 1$, it is -13.5. The $P_T$ test did not perform
well in simulations when $k$ was selected by the BIC. Large size distortions were
recorded. ERS then recommended the $DF^{GLS}$ statistic (whose local asymptotic
power function also lies close to the Gaussian local power envelope) as the t-statistic
for testing $\beta_0 = 0$ from the following regression estimated by OLS:

(8)   $\Delta \tilde y_t = \beta_0 \tilde y_{t-1} + \sum_{j=1}^{k} \beta_j \Delta \tilde y_{t-j} + e_{tk}.$
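
The detrending step (6) and the regression (8) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `gls_detrend` and `df_gls` are hypothetical, $z_t = (1, t, \ldots, t^p)'$ is assumed, and the degrees-of-freedom correction used for the t-statistic is a choice made here that the text does not specify.

```python
import numpy as np

def gls_detrend(y, p, c_bar):
    """Local GLS detrending (6) with z_t = (1, t, ..., t^p)' and a_bar = 1 + c_bar/T."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    a_bar = 1.0 + c_bar / T
    t = np.arange(T + 1, dtype=float)
    z = np.column_stack([t ** i for i in range(p + 1)])
    # Quasi-differences: keep the first observation, apply (1 - a_bar L) to the rest
    y_a = np.concatenate([y[:1], y[1:] - a_bar * y[:-1]])
    z_a = np.vstack([z[:1], z[1:] - a_bar * z[:-1]])
    psi_hat, *_ = np.linalg.lstsq(z_a, y_a, rcond=None)
    return y - z @ psi_hat

def df_gls(y, p, k):
    """t-statistic on beta_0 in regression (8), using c_bar = -7.0 (p=0) or -13.5 (p=1)."""
    c_bar = {0: -7.0, 1: -13.5}[p]
    yt = gls_detrend(y, p, c_bar)
    dy = np.diff(yt)
    T = dy.size
    X = np.column_stack([yt[k:-1]] + [dy[k - j:T - j] for j in range(1, k + 1)])
    b, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])   # assumed d.o.f. correction
    return b[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])

# Hypothetical data: random walk plus a linear trend
rng = np.random.default_rng(1)
u = np.cumsum(rng.standard_normal(201))
trend = 3.0 + 0.5 * np.arange(201)
stat = df_gls(u + trend, p=1, k=2)
```

Because $\hat\psi$ is a linear function of the data, adding any element of the span of $z_t$ to the series leaves $\tilde y_t$, and hence the statistic, unchanged.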

3.1. The $M^{GLS}$ Tests

While the power gains of the $DF^{GLS}$ from using GLS detrended data are
impressive, simulations also show that the test exhibits strong size distortions
when $v_t$ is an MA(1) with a negative coefficient. For $T = 100$, ERS report rejection
rates of 30 and 50% for $p = 0$ and 1, respectively, when the nominal rate is
5%. An explanation is provided in Section 4. Size distortions, however, are less
of an issue with the M tests in theory as we showed in Perron and Ng (1996).
In practice, it does require us to have a way to find the appropriate $k$. Suppose
for the moment that this lag length selection problem is solved. Since the power
gains from the $DF^{GLS}$ over the DF come from the use of GLS detrended data,
it is natural to consider the M tests under GLS detrending. We now state the
asymptotic properties of the $M^{GLS}$ tests.

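THEOREM 1: Let $y_t$ be generated as in (1) with $\alpha = 1 + c/T$ and consider
data obtained from local GLS detrending at $\bar\alpha = 1 + \bar c/T$. Also, let $s_{AR}^2$ be a
consistent estimate of $\sigma^2$. If $p = 0$, $MZ_\alpha^{GLS} \Rightarrow 0.5(J_c(1)^2 - 1)\left(\int_0^1 J_c(r)^2 dr\right)^{-1}$ and
$MSB^{GLS} \Rightarrow \left(\int_0^1 J_c(r)^2 dr\right)^{1/2}$. If $p = 1$, $MZ_\alpha^{GLS} \Rightarrow 0.5(V_{c,\bar c}(1)^2 - 1)\left(\int_0^1 V_{c,\bar c}(r)^2 dr\right)^{-1}$
and $MSB^{GLS} \Rightarrow \left(\int_0^1 V_{c,\bar c}(r)^2 dr\right)^{1/2}$, where $J_c(r)$ is an Ornstein-Uhlenbeck process
defined by $dJ_c(r) = cJ_c(r)dr + dW(r)$ with $J_c(0) = 0$, and
$V_{c,\bar c}(r) = J_c(r) - r\left[\lambda J_c(1) + 3(1 - \lambda)\int_0^1 sJ_c(s)ds\right]$, $\lambda = (1 - \bar c)/(1 - \bar c + \bar c^2/3)$. Results for $MZ_t^{GLS}$ can
be obtained in view of the fact that $MZ_t^{GLS} = MZ_\alpha^{GLS} \times MSB^{GLS}$.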

The proof is based on standard results for sample moments of GLS detrended
data and therefore omitted. The $MZ_\alpha^{GLS}$, $MZ_t^{GLS}$, and $MSB^{GLS}$ have similar size
and power properties and will be referred to as the $M^{GLS}$ tests. We have computed,
via simulations, the local asymptotic power functions of the M tests with
OLS and GLS detrending. These showed the $M^{GLS}$ tests to be uniformly more
powerful than the M tests and also to have local asymptotic power functions
that are indistinguishable from those of the $DF^{GLS}$ and the Gaussian asymptotic
local power envelope. A graph is therefore not presented. The important point
is that the $M^{GLS}$ tests, like the $DF^{GLS}$, achieve local asymptotic power that is
approximately the same as the feasible point-optimal test. We also consider two
modified feasible point optimal tests and derive their limiting distributions:

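(9)   $p = 0$:  $MP_T^{GLS} = \left[\bar c^2 T^{-2} \sum_{t=1}^{T} \tilde y_{t-1}^2 - \bar c T^{-1} \tilde y_T^2\right] / s_{AR}^2 \Rightarrow \bar c^2 \int_0^1 J_c(r)^2 dr - \bar c J_c(1)^2,$

      $p = 1$:  $MP_T^{GLS} = \left[\bar c^2 T^{-2} \sum_{t=1}^{T} \tilde y_{t-1}^2 + (1 - \bar c) T^{-1} \tilde y_T^2\right] / s_{AR}^2 \Rightarrow \bar c^2 \int_0^1 V_{c,\bar c}(r)^2 dr + (1 - \bar c) V_{c,\bar c}(1)^2.$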

These tests are based on the same motivation that leads to the definition of the
M tests in Stock (1990), namely, to provide functionals of sample moments that
have the same asymptotic distributions as well known unit root tests. The $MP_T^{GLS}$
is important because its limiting distribution coincides with that of the feasible
point optimal test considered in ERS.
Critical values for the limiting distributions of the $M^{GLS}$ tests were obtained
by numerical simulations that approximate the Wiener process $W(r)$ by partial
sums of $N(0, 1)$ random variables with 5,000 steps and 20,000 simulations. These
are summarized in Table I.
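
The simulation design just described can be sketched at a reduced scale. The Python code below approximates $W(r)$ by scaled partial sums and evaluates the $p = 0$ limits under the null ($c = 0$, so that $J_c = W$). The function name `simulate_limits`, the smaller defaults (500 steps, 4,000 replications), and the seed are assumptions chosen to keep the example fast; they are not the settings used for Table I.

```python
import numpy as np

def simulate_limits(c_bar=-7.0, n_steps=500, n_reps=4000, seed=0):
    """Draws from the p = 0 limiting distributions under c = 0 (unit root null)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n_reps, n_steps))
    w = np.cumsum(e, axis=1) / np.sqrt(n_steps)        # W(i/N), i = 1..N
    w_lag = np.hstack([np.zeros((n_reps, 1)), w[:, :-1]])
    int_w2 = np.mean(w_lag ** 2, axis=1)               # Riemann sum for int_0^1 W(r)^2 dr
    w1 = w[:, -1]                                      # W(1)
    mza = 0.5 * (w1 ** 2 - 1.0) / int_w2               # limit of MZ_alpha^GLS
    msb = np.sqrt(int_w2)                              # limit of MSB^GLS
    mpt = c_bar ** 2 * int_w2 - c_bar * w1 ** 2        # limit of MP_T^GLS, from (9)
    return mza, msb, mpt

mza, msb, mpt = simulate_limits()
# Left-tail 5% quantiles, to be compared with the p = 0 rows of Table I
q05 = {name: float(np.quantile(x, 0.05))
       for name, x in [("MZa", mza), ("MSB", msb), ("MPT", mpt)]}
```

With this few steps and replications the quantiles in `q05` are only approximate; increasing `n_steps` and `n_reps` toward the values quoted in the text should tighten them.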

TABLE I
ASYMPTOTIC CRITICAL VALUES (a)

Percentile   MZ_α^GLS   MZ_t^GLS and ADF^GLS   MSB^GLS   P_T^GLS and MP_T^GLS

Case: p = 0, c̄ = -7.0
.01          -13.8      -2.58                  .174      1.78
.05           -8.1      -1.98                  .233      3.17
.10           -5.7      -1.62                  .275      4.45

Case: p = 1, c̄ = -13.5
.01          -23.8      -3.42                  .143      4.03
.05          -17.3      -2.91                  .168      5.48
.10          -14.2      -2.62                  .185      6.67

(a) For MZ_α^GLS, MZ_t^GLS, and ADF^GLS with p = 0, the critical values are from Fuller
(1976). The other entries were obtained using 20,000 simulations and 5,000 steps to
approximate the Wiener processes.

3.2. The Construction of the Autoregressive Spectral Density Estimator and the $\bar M^{GLS}$
All that is required for Theorem 1 and the results of ERS to hold is that
$s_{AR}^2$ converges to $\sigma^2$ under the null hypothesis of a unit root, and for the tests
to be consistent, that $T s_{AR}^2$ diverges under the alternative of stationarity. The
attractiveness of GLS detrending is that it estimates the deterministic function
with more precision and leads to reduced bias in $\hat\beta_i$. Since this should lead to a
more precise estimate of $\beta(1)$, we also consider estimating $s_{AR}^2$ using the autoregression
(8) based on GLS detrended data. An additional advantage is that this
estimator is invariant to the slope of the trend function.3 Experimentation with
different values of $\bar c$ when constructing the autoregressive spectral density gave
similar results. Hence, $\bar c = -7.0$ and $-13.5$ will continue to be used. The tests so
constructed are denoted $\bar Z^{GLS}$ and $\bar M^{GLS}$. These are to be distinguished from the
$Z^{GLS}$ and $M^{GLS}$ statistics that use least squares detrended data to estimate $s_{AR}^2$.
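
Putting the pieces of this section together, the following Python sketch computes the M statistics and the modified point optimal statistic of (9) from GLS detrended data, with $s_{AR}^2$ taken from regression (8). It is an illustration under stated assumptions: the names `gls_detrend` and `m_gls_tests` are hypothetical, $z_t = (1, t, \ldots, t^p)'$ is assumed, and only $p = 0, 1$ with $\bar c = -7.0, -13.5$ are handled.

```python
import numpy as np

def gls_detrend(y, p, c_bar):
    """Local GLS detrending (6) with z_t = (1, t, ..., t^p)'."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    a_bar = 1.0 + c_bar / T
    t = np.arange(T + 1, dtype=float)
    z = np.column_stack([t ** i for i in range(p + 1)])
    y_a = np.concatenate([y[:1], y[1:] - a_bar * y[:-1]])
    z_a = np.vstack([z[:1], z[1:] - a_bar * z[:-1]])
    psi_hat, *_ = np.linalg.lstsq(z_a, y_a, rcond=None)
    return y - z @ psi_hat

def m_gls_tests(y, p, k):
    """M statistics and MP_T on GLS detrended data, s2_AR from regression (8)."""
    c_bar = {0: -7.0, 1: -13.5}[p]
    yt = gls_detrend(y, p, c_bar)
    dy = np.diff(yt)
    T = dy.size
    X = np.column_stack([yt[k:-1]] + [dy[k - j:T - j] for j in range(1, k + 1)])
    b, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ b
    s2 = (resid @ resid / (T - k)) / (1.0 - b[1:].sum()) ** 2   # as in (5)
    ssq = np.sum(yt[:-1] ** 2) / T ** 2                         # T^-2 sum of y~_{t-1}^2
    mza = (yt[-1] ** 2 / T - s2) / (2.0 * ssq)
    msb = np.sqrt(ssq / s2)
    end_weight = -c_bar if p == 0 else 1.0 - c_bar              # the two cases of (9)
    mpt = (c_bar ** 2 * ssq + end_weight * yt[-1] ** 2 / T) / s2
    return {"MZa": mza, "MSB": msb, "MZt": mza * msb, "MPT": mpt}

# Hypothetical data: random walk plus linear trend, T = 200
rng = np.random.default_rng(2)
u = np.cumsum(rng.standard_normal(201))
trend = 1.0 + 0.3 * np.arange(201)
stats = m_gls_tests(u + trend, p=1, k=3)
```

Since both the detrended series and the regression are built from $\tilde y_t$ alone, the statistics computed this way do not change when a linear trend is added to the data, which is the invariance property noted above.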

4. THE SIZE ISSUE, THE SELECTION OF k, AND INFORMATION CRITERIA

To see the influence of the lag order $k$, we consider the finite sample size of
the tests $MZ_\alpha^{GLS}$, $DF^{GLS}$, and $Z_\alpha^{GLS}$. The setup, which will be used throughout,
is to consider data generated by (1) with $\alpha = 1$ and $v_t$ either an MA(1), i.e.
$v_t = e_t + \theta e_{t-1}$, or an AR(1), i.e. $v_t = \rho v_{t-1} + e_t$, with $e_t \sim$ i.i.d. $N(0, 1)$. The
errors are generated from the rndn() function in Gauss with 999 as the seed, and
5,000 replications are used. We use the nominal size of 5% as the benchmark.
3 When p = 1, Perron and Ng (1996), Stock (1990), and ERS estimate equation (2) with only a
constant by ordinary least-squares. This still ensures a consistent estimate under the null hypothesis
but the omission of the trend term implies a noninvariance with respect to the slope of the trend
function in finite samples under the alternative hypothesis. This noninvariance is such that the tests
lose power as the slope increases.

TABLE II.A
EXACT SIZE OF THE TESTS AT SELECTED VALUES OF k: p = 0

MA Case
Test      T        θ  k=0    1      2      3      4      5      6      7      8      9      10
Z_α^GLS   100   -0.8  0.994  0.904  0.808  0.740  0.698  0.672  0.654  0.644  0.636  0.633  0.627
                -0.5  0.618  0.322  0.224  0.193  0.176  0.181  0.182  0.192  0.197  0.207  0.216
                 0.0  0.072  0.082  0.089  0.104  0.112  0.125  0.137  0.152  0.164  0.179  0.190
                 0.5  0.010  0.143  0.056  0.116  0.100  0.129  0.135  0.153  0.158  0.179  0.187
                 0.8  0.006  0.172  0.036  0.147  0.071  0.155  0.108  0.166  0.138  0.188  0.169
          250   -0.8  0.987  0.861  0.725  0.641  0.592  0.559  0.539  0.524  0.515  0.507  0.504
                -0.5  0.553  0.258  0.166  0.124  0.112  0.110  0.106  0.112  0.111  0.115  0.116
                 0.0  0.060  0.062  0.063  0.066  0.069  0.075  0.077  0.081  0.087  0.087  0.092
                 0.5  0.008  0.111  0.047  0.080  0.065  0.077  0.076  0.083  0.084  0.091  0.095
                 0.8  0.004  0.145  0.030  0.108  0.049  0.099  0.062  0.096  0.074  0.100  0.085
MZ_α^GLS  100   -0.8  0.990  0.828  0.543  0.319  0.187  0.129  0.095  0.086  0.079  0.077  0.078
                -0.5  0.595  0.241  0.126  0.097  0.084  0.092  0.097  0.106  0.120  0.132  0.141
                 0.0  0.065  0.070  0.082  0.095  0.104  0.119  0.130  0.145  0.158  0.172  0.185
                 0.5  0.008  0.139  0.054  0.113  0.097  0.126  0.132  0.150  0.156  0.176  0.186
                 0.8  0.004  0.168  0.035  0.143  0.068  0.153  0.106  0.164  0.136  0.186  0.168
          250   -0.8  0.987  0.828  0.597  0.407  0.260  0.166  0.107  0.075  0.055  0.044  0.035
                -0.5  0.545  0.227  0.121  0.081  0.067  0.063  0.062  0.066  0.070  0.071  0.074
                 0.0  0.058  0.059  0.061  0.064  0.066  0.072  0.076  0.079  0.084  0.084  0.088
                 0.5  0.008  0.110  0.045  0.078  0.064  0.076  0.075  0.082  0.083  0.090  0.095
                 0.8  0.004  0.145  0.029  0.108  0.048  0.099  0.061  0.095  0.074  0.100  0.084
DF^GLS    100   -0.8  0.991  0.843  0.624  0.455  0.322  0.243  0.187  0.157  0.129  0.114  0.104
                -0.5  0.605  0.263  0.147  0.108  0.089  0.087  0.081  0.079  0.076  0.078  0.075
                 0.0  0.073  0.070  0.072  0.074  0.072  0.072  0.070  0.070  0.071  0.074  0.073
                 0.5  0.010  0.124  0.046  0.081  0.063  0.075  0.074  0.077  0.078  0.075  0.071
                 0.8  0.005  0.150  0.030  0.107  0.043  0.089  0.054  0.082  0.059  0.080  0.064
          250   -0.8  0.985  0.823  0.613  0.449  0.332  0.247  0.194  0.155  0.128  0.109  0.099
                -0.5  0.535  0.227  0.123  0.084  0.068  0.064  0.062  0.062  0.061  0.058  0.060
                 0.0  0.057  0.054  0.054  0.053  0.056  0.055  0.058  0.058  0.056  0.055  0.054
                 0.5  0.009  0.103  0.043  0.070  0.055  0.061  0.057  0.058  0.057  0.060  0.060
                 0.8  0.004  0.133  0.027  0.090  0.039  0.074  0.046  0.068  0.049  0.065  0.051

AR Case
Test      T        ρ  k=0    1      2      3      4      5      6      7      8      9      10
Z_α^GLS   100   -0.8  0.802  0.286  0.286  0.295  0.300  0.308  0.314  0.322  0.328  0.340  0.346
                 0.8  0.000  0.102  0.115  0.135  0.143  0.158  0.172  0.191  0.201  0.217  0.226
          250   -0.8  0.737  0.170  0.171  0.171  0.173  0.176  0.179  0.182  0.185  0.184  0.183
                 0.8  0.000  0.069  0.071  0.077  0.082  0.084  0.090  0.094  0.093  0.099  0.100
MZ_α^GLS  100   -0.8  0.784  0.027  0.035  0.050  0.065  0.081  0.089  0.113  0.120  0.140  0.149
                 0.8  0.000  0.100  0.114  0.134  0.142  0.157  0.170  0.190  0.199  0.216  0.225
          250   -0.8  0.730  0.043  0.041  0.047  0.048  0.050  0.058  0.062  0.064  0.071  0.071
                 0.8  0.000  0.069  0.071  0.077  0.082  0.084  0.090  0.094  0.092  0.099  0.100
DF^GLS    100   -0.8  0.791  0.073  0.074  0.077  0.079  0.080  0.076  0.081  0.075  0.083  0.078
                 0.8  0.000  0.066  0.070  0.069  0.068  0.069  0.071  0.072  0.071  0.077  0.079
          250   -0.8  0.718  0.057  0.054  0.056  0.054  0.055  0.057  0.057  0.057  0.059  0.058
                 0.8  0.000  0.059  0.061  0.063  0.062  0.060  0.060  0.060  0.058  0.058  0.054

TABLE II.B
EXACT SIZE OF THE TESTS AT SELECTED VALUES OF k: p = 1

MA Case
Test      T        θ  k=0    1      2      3      4      5      6      7      8      9      10
Z_α^GLS   100   -0.8  1.000  0.994  0.965  0.940  0.918  0.906  0.893  0.888  0.882  0.882  0.883
                -0.5  0.810  0.470  0.340  0.299  0.290  0.309  0.323  0.358  0.376  0.413  0.450
                 0.0  0.053  0.076  0.100  0.136  0.169  0.222  0.259  0.313  0.346  0.400  0.430
                 0.5  0.001  0.169  0.059  0.163  0.155  0.226  0.253  0.303  0.343  0.397  0.432
                 0.8  0.001  0.219  0.023  0.229  0.091  0.284  0.190  0.350  0.297  0.422  0.389
          250   -0.8  1.000  0.992  0.960  0.927  0.899  0.875  0.859  0.848  0.839  0.833  0.830
                -0.5  0.801  0.404  0.253  0.202  0.181  0.174  0.176  0.179  0.188  0.199  0.209
                 0.0  0.047  0.057  0.064  0.077  0.081  0.094  0.104  0.113  0.129  0.142  0.154
                 0.5  0.002  0.127  0.037  0.092  0.071  0.097  0.095  0.120  0.122  0.143  0.153
                 0.8  0.001  0.176  0.014  0.129  0.038  0.131  0.068  0.139  0.099  0.162  0.136
MZ_α^GLS  100   -0.8  1.000  0.925  0.592  0.345  0.220  0.185  0.159  0.163  0.163  0.185  0.183
                -0.5  0.750  0.252  0.104  0.093  0.099  0.132  0.159  0.204  0.234  0.285  0.317
                 0.0  0.032  0.055  0.080  0.115  0.152  0.206  0.243  0.300  0.332  0.390  0.419
                 0.5  0.001  0.161  0.054  0.155  0.148  0.219  0.246  0.299  0.338  0.392  0.427
                 0.8  0.001  0.211  0.021  0.224  0.087  0.277  0.186  0.346  0.294  0.417  0.386
          250   -0.8  1.000  0.978  0.800  0.514  0.276  0.147  0.087  0.058  0.048  0.040  0.036
                -0.5  0.786  0.311  0.126  0.074  0.057  0.059  0.065  0.069  0.082  0.100  0.106
                 0.0  0.040  0.049  0.057  0.070  0.077  0.088  0.099  0.108  0.124  0.136  0.149
                 0.5  0.002  0.124  0.034  0.090  0.069  0.095  0.094  0.117  0.121  0.142  0.150
                 0.8  0.001  0.174  0.014  0.128  0.036  0.130  0.066  0.137  0.098  0.160  0.135
DF^GLS    100   -0.8  1.000  0.977  0.815  0.600  0.415  0.293  0.203  0.167  0.125  0.116  0.098
                -0.5  0.838  0.375  0.176  0.112  0.087  0.084  0.074  0.070  0.068  0.069  0.065
                 0.0  0.070  0.072  0.067  0.072  0.069  0.073  0.072  0.071  0.069  0.068  0.061
                 0.5  0.003  0.160  0.045  0.091  0.064  0.073  0.062  0.066  0.061  0.067  0.062
                 0.8  0.002  0.209  0.021  0.127  0.036  0.099  0.046  0.085  0.053  0.076  0.061
          250   -0.8  1.000  0.984  0.875  0.685  0.504  0.362  0.259  0.194  0.150  0.120  0.099
                -0.5  0.809  0.352  0.161  0.101  0.072  0.061  0.059  0.053  0.053  0.056  0.052
                 0.0  0.053  0.055  0.055  0.056  0.056  0.054  0.054  0.054  0.055  0.055  0.054
                 0.5  0.002  0.126  0.034  0.071  0.048  0.057  0.053  0.054  0.052  0.056  0.050
                 0.8  0.001  0.172  0.015  0.098  0.024  0.075  0.032  0.063  0.038  0.058  0.042

AR Case
Test      T        ρ  k=0    1      2      3      4      5      6      7      8      9      10
Z_α^GLS   100   -0.8  0.960  0.497  0.509  0.526  0.549  0.574  0.593  0.616  0.640  0.669  0.690
                 0.8  0.000  0.123  0.162  0.213  0.252  0.312  0.351  0.410  0.441  0.489  0.514
          250   -0.8  0.956  0.319  0.327  0.330  0.335  0.345  0.352  0.360  0.369  0.375  0.379
                 0.8  0.000  0.070  0.077  0.089  0.099  0.119  0.129  0.150  0.159  0.176  0.183
MZ_α^GLS  100   -0.8  0.941  0.001  0.008  0.030  0.058  0.104  0.140  0.191  0.233  0.290  0.329
                 0.8  0.000  0.121  0.160  0.211  0.249  0.309  0.349  0.408  0.439  0.487  0.513
          250   -0.8  0.?52  0.011  0.015  0.020  0.028  0.037  0.050  0.062  0.076  0.092  0.106
                 0.8  0.000  0.070  0.076  0.089  0.099  0.119  0.128  0.150  0.159  0.175  0.183
DF^GLS    100   -0.8  0.967  0.069  0.067  0.067  0.060  0.068  0.059  0.065  0.063  0.065  0.065
                 0.8  0.000  0.072  0.067  0.070  0.071  0.069  0.069  0.074  0.066  0.072  0.071
          250   -0.8  0.959  0.052  0.051  0.052  0.050  0.049  0.050  0.050  0.051  0.052  0.053
                 0.8  0.000  0.057  0.055  0.057  0.054  0.055  0.058  0.056  0.055  0.059  0.058

Reported in Table II are the sizes of the tests for $p = 0, 1$ at selected values of $\theta$
and $\rho$. We report results for $T = 100$ and 250 to highlight the fact that the size
issue in the negative moving average case is not a small sample problem.
Several features of the results for MA errors are of note. First, for a $\theta$ of the
same absolute value, a negative $\theta$ always requires a larger lag to obtain a more
accurate size. Second, for a positive $\theta$, the sizes of the tests are significantly better
when $k$ is even. Third, the larger the number of deterministic terms, the more
distant are the exact from the nominal sizes. Fourth, there is always a value of
$k$ that will yield an $MZ_\alpha^{GLS}$ with a reasonably accurate size when $\theta = -0.80$.
For example, for $p = 0$ when $T = 100$ and 250, exact sizes of .077 and .055,
respectively, are achievable with lags of 9 and 8. In contrast, there does not exist
a $k$ that will result in an acceptable size for $Z_\alpha^{GLS}$, and increasing the sample size
will not reduce the size distortions adequately. For the sample sizes reported,
the exact sizes for $Z_\alpha^{GLS}$ are well above 0.5 at $\theta = -.8$, and size distortions are
non-negligible even when $\theta = -0.5$. Fifth, the $DF^{GLS}$ also exhibits a dependence
on $k$. For $p = 0$ at $T = 100$, the size is .104 at $k = 10$, but is .624 when $k = 2$.
For AR errors, size discrepancies between $Z^{GLS}$ and $M^{GLS}$ also exist when
$\rho$ is large and negative, albeit not as dramatic as in the MA case. The results
reveal that the $MZ_\alpha^{GLS}$ is inferior to the $DF^{GLS}$ when $p = 1$, the sample size is
small, and $|\rho|$ is large. For large negative AR coefficients, the $MZ_\alpha^{GLS}$ has a more
accurate size at $k$ greater than 1. For large positive AR coefficients, the $MZ_\alpha^{GLS}$
is oversized for small $T$, though the size improves as $T$ increases.
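
The dependence of size on $k$ can be reproduced in miniature. The Python sketch below runs a small Monte Carlo for the $DF^{GLS}$ with $p = 0$ under MA(1) errors, rejecting when the statistic falls below the 5% asymptotic critical value of -1.98 from Table I. Everything about the implementation is an assumption made for illustration: the function names, NumPy's generator in place of Gauss's rndn(), 500 replications instead of 5,000, the initial condition, and the degrees-of-freedom correction in the t-statistic. The rejection rates it produces will therefore differ somewhat from Table II.

```python
import numpy as np

def df_gls_p0(y, k, c_bar=-7.0):
    """DF-GLS t-statistic for p = 0 (constant only), regression (8)."""
    T = y.size - 1
    a_bar = 1.0 + c_bar / T
    # GLS estimate of the constant from quasi-differenced data
    y_a = np.concatenate([y[:1], y[1:] - a_bar * y[:-1]])
    z_a = np.concatenate([[1.0], np.full(T, 1.0 - a_bar)])
    yt = y - (z_a @ y_a) / (z_a @ z_a)
    dy = np.diff(yt)
    X = np.column_stack([yt[k:-1]] + [dy[k - j:T - j] for j in range(1, k + 1)])
    b, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])   # assumed d.o.f. correction
    return b[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])

def mc_size(theta, k, T=100, reps=500, crit=-1.98, seed=999):
    """Rejection frequency under alpha = 1 with v_t = e_t + theta * e_{t-1}."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        e = rng.standard_normal(T + 2)
        v = e[1:] + theta * e[:-1]       # T + 1 MA(1) errors
        y = np.cumsum(v)                 # unit root, no deterministic component
        rejections += int(df_gls_p0(y, k) < crit)
    return rejections / reps

size_k0 = mc_size(theta=-0.8, k=0)
size_k10 = mc_size(theta=-0.8, k=10)
size_iid = mc_size(theta=0.0, k=0)
```

Comparing `size_k0` with `size_k10` shows the pattern discussed above: with $\theta = -0.8$ a short autoregression rejects far too often, and a long one brings the rejection rate much closer to the nominal level.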
Clearly, the choice of k is crucial especially in the case of negative moving
average errors. Various practical solutions have been used against this problem.
In ERS, the BIC is used to select k but they set the lower bound to be 3,
because even larger size distortions would have resulted if zero was the lower
bound. An alternative method is the sequential t test for the significance of the
last lag considered in Ng and Perron (1995). It has the ability to yield higher
k's than the BIC when there are negative moving-average errors and reduce size
distortions. But, the sequential test tends to overparameterize in other cases.
This, as does ERS's implementation of the BIC, leads to less efficient estimates
and subsequently to power losses. Neither approach is fully satisfactory.
The AIC and the BIC belong to the class of information based rules (hereafter,
IC) where the chosen value of $k$ is $k_{ic} = \arg\min_{k \in [0, k_{max}]} IC(k)$ where4

(10)   $IC(k) = \ln(\hat\sigma_k^2) + (k + p + 1) C_T/(T - k_{max}),$

$\hat\sigma_k^2 = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \hat e_{tk}^2$, $C_T/T \to 0$ as $T \to \infty$, and $C_T > 0$. The various
criteria differ in $C_T$, the weight applied to overfitting, but all use $k$ as the penalty
to overfitting. We argue below that, with integrated data, this penalty may be a
poor approximation to the cost of underfitting.
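
A minimal Python sketch of the selection rule in (10) follows. It assumes that the residuals come from OLS fits of (2) with a polynomial trend of order $p$, that every candidate $k$ is fitted on the common sample $t = k_{max}+1, \ldots, T$, and that the AIC and BIC correspond to $C_T = 2$ and $C_T = \ln(T - k_{max})$. The function name `select_k_ic` and the simulated data are hypothetical.

```python
import numpy as np

def select_k_ic(y, p, kmax, c_t):
    """Minimize IC(k) of (10) over k = 0..kmax using OLS fits of (2)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    T = dy.size
    t = np.arange(kmax + 1, T + 1, dtype=float)     # common sample t = kmax+1..T
    det = [t ** i for i in range(p + 1)]            # deterministic terms 1, t, ..., t^p
    target = dy[kmax:]                              # Delta y_t
    ic = np.empty(kmax + 1)
    for k in range(kmax + 1):
        lags = [dy[kmax - j:T - j] for j in range(1, k + 1)]   # Delta y_{t-j}
        X = np.column_stack(det + [y[kmax:-1]] + lags)
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ b
        sigma2_k = resid @ resid / (T - kmax)
        ic[k] = np.log(sigma2_k) + (k + p + 1) * c_t / (T - kmax)
    return int(np.argmin(ic)), ic

# Hypothetical data: unit root with AR(1) errors, v_t = 0.6 v_{t-1} + e_t, T = 500
rng = np.random.default_rng(3)
e = rng.standard_normal(501)
v = np.empty(501)
v[0] = e[0]
for i in range(1, 501):
    v[i] = 0.6 * v[i - 1] + e[i]
y = np.cumsum(v)
T, kmax = y.size - 1, 10
k_aic, _ = select_k_ic(y, p=0, kmax=kmax, c_t=2.0)
k_bic, _ = select_k_ic(y, p=0, kmax=kmax, c_t=np.log(T - kmax))
```

Because the penalty in (10) is increasing in $k$ and the BIC weight exceeds the AIC weight here, `k_bic` cannot exceed `k_aic` on the same data.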

4 See Ng and Perron (2000) for an analysis pertaining to the adequacy of the particular formulation
of the information criterion stated.

5. AN ALTERNATIVE PENALTY FOR INTEGRATED DATA

To motivate the main idea, we consider the derivation of the AIC for data generated
by a finite order AR($k_0$) with normal errors and a unit root (i.e., the DGP
is (2) with $k = k_0$, $\beta_0 = 0$, and $e_{tk} = e_t \sim$ i.i.d. $N(0, 1)$). The family of parametric
models fitted is given by (8). For notation, we let $\beta^0(k) = (0, \beta_1, \ldots, \beta_k)'$, $\hat\beta(k) =
(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)'$, $\beta^0_{-0}(k) = (\beta_1, \ldots, \beta_k)'$, and $\hat\beta_{-0}(k) = (\hat\beta_1, \ldots, \hat\beta_k)'$. In what
follows, we adopt the treatment of Gourieroux and Monfort (1995, pp. 307-309).
The goal is to select an autoregressive order between 0 and some upper bound
$k_{max}$. We assume this upper bound satisfies $k_{max}/T \to 0$ as $T \to \infty$. Let $f(\Delta y \mid \beta^0(k))$
be the likelihood function of the data $(\Delta y_{k_{max}+1}, \ldots, \Delta y_T)$ conditional on
the initial observations $(y_0, \ldots, y_{k_{max}})$. This ensures that each competing model is
evaluated with the same number of effective observations, namely $T - k_{max}$. The
Kullback distance between the true probability distribution and the estimated
parametric model is $Q = E_0[\ln(f(\Delta y \mid \beta^0(k))) - \ln(f(\Delta y \mid \hat\beta(k)))]$ with sample
analog:

$\hat Q = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \ln(f(\Delta y_t \mid \beta^0(k))) - (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \ln(f(\Delta y_t \mid \hat\beta(k))).$

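As discussed in Gourieroux-Monfort (1995), Akaike's suggestion was to find a
$Q^*$ such that $\lim_{T \to \infty} E[T(Q - Q^*)] = 0$ so that $Q^*$ is unbiased for $Q$ to order
$T^{-1}$. Let $X_t = (\tilde y_{t-1}, Z_t')'$ with $Z_t = (\Delta \tilde y_{t-1}, \ldots, \Delta \tilde y_{t-k})'$ and

$\Phi_T(k) = (1/\hat\sigma_k^2)(\hat\beta(k) - \beta^0(k))' \sum_{t=k_{max}+1}^{T} X_t X_t' (\hat\beta(k) - \beta^0(k)),$

where $\hat\sigma_k^2 = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \hat e_{tk}^2$. Using Taylor series expansions, we have
$TQ = \Phi_T(k)/2 + o_p(1)$ and $T\hat Q = -\Phi_T(k)/2 + o_p(1)$. Since $T(Q - \hat Q) = \Phi_T(k) + o_p(1)$,
$\lim_{T \to \infty} E[T(Q - Q^*)] = 0$ if $Q^* = \hat Q + \Phi_T(k)/T$ and the remainder term is
uniformly integrable. Now consider the term $\Phi_T(k)$ in the context of integrated
data. Given the asymptotic block diagonality of the matrix $D_T \sum_{t=k_{max}+1}^{T} X_t X_t' D_T$
with $D_T = \mathrm{diag}((T - k_{max})^{-1}, (T - k_{max})^{-1/2}, \ldots, (T - k_{max})^{-1/2})$, we have

(11)   $\Phi_T(k) = (1/\hat\sigma_k^2)\, \hat\beta_0^2 \sum_{t=k_{max}+1}^{T} \tilde y_{t-1}^2 + \chi_k^2 + o_p(1),$

where $\chi_k^2$ is a chi-square random variable with $k$ degrees of freedom that is
asymptotically independent of the first term. Hence, assuming the remainder
term is uniformly integrable, a $Q^*$ that will satisfy $\lim_{T \to \infty} E[T(Q - Q^*)] = 0$ is

$Q^* = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \ln(f(\Delta y_t \mid \beta^0(k))) - (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \ln(f(\Delta y_t \mid \hat\beta(k)))$
$\qquad + (T - k_{max})^{-1} \left[(1/\hat\sigma_k^2)\, \hat\beta_0^2 \sum_{t=k_{max}+1}^{T} \tilde y_{t-1}^2 + k\right].$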

Since the first term is common to all models and the second term is proportional
to $-((T - k_{max})/2) \ln(\hat\sigma_k^2)$ under normality, minimizing $Q^*$ is equivalent to
minimizing

(12)   $MAIC(k) = \ln(\hat\sigma_k^2) + \dfrac{2(\tau_T(k) + k)}{T - k_{max}},$

where $\tau_T(k) = (\hat\sigma_k^2)^{-1} \hat\beta_0^2 \sum_{t=k_{max}+1}^{T} \tilde y_{t-1}^2$ and $\hat\sigma_k^2 = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \hat e_{tk}^2$. Note
that the same result holds in the general case where the data are generated by (1).
The important step is the relation given by (11), which can be shown to hold in
the general case, in the sense that the remainder term is $o_p(1)$ uniformly in $k$ for
$0 < k < K_T$, where $K_T = o(T^{1/3})$. This follows using the results of Berk (1974) and
Said and Dickey (1984) provided $k \to \infty$, $k^3/T \to 0$, and $T^{-1/2} \sum_{i=k+1}^{\infty} |\beta_i| \to 0$.
As a natural generalization of the MAIC($k$), we propose a class of Modified
Information Criteria (MIC) that selects $k$ as $k_{mic} = \arg\min_k MIC(k)$ where

(13)   $MIC(k) = \ln(\hat\sigma_k^2) + \dfrac{C_T(\tau_T(k) + k)}{T - k_{max}},$

with $C_T > 0$ and $C_T/T \to 0$ as $T \to \infty$. The MBIC obtains with $C_T = \ln(T - k_{max})$.
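
To make (12) and (13) concrete, the Python sketch below computes $\tau_T(k)$ and $MIC(k)$ from regression (8) on GLS detrended data and returns the minimizing lag. It is an illustration under assumptions: the names `gls_detrend` and `select_k_mic` are hypothetical, $z_t = (1, t, \ldots, t^p)'$ with $\bar c = -7.0$ or $-13.5$ is assumed, and `kmax = 12` in the example is an arbitrary choice. Setting `c_t=2.0` gives the MAIC; passing `np.log(T - kmax)` gives the MBIC.

```python
import numpy as np

def gls_detrend(y, p, c_bar):
    """Local GLS detrending (6) with z_t = (1, t, ..., t^p)'."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    a_bar = 1.0 + c_bar / T
    t = np.arange(T + 1, dtype=float)
    z = np.column_stack([t ** i for i in range(p + 1)])
    y_a = np.concatenate([y[:1], y[1:] - a_bar * y[:-1]])
    z_a = np.vstack([z[:1], z[1:] - a_bar * z[:-1]])
    psi_hat, *_ = np.linalg.lstsq(z_a, y_a, rcond=None)
    return y - z @ psi_hat

def select_k_mic(y, p, kmax, c_t=2.0):
    """Minimize MIC(k) of (13) over k = 0..kmax; returns (k, MIC values, tau values)."""
    c_bar = {0: -7.0, 1: -13.5}[p]
    yt = gls_detrend(y, p, c_bar)
    dy = np.diff(yt)
    T = dy.size
    target, ylag = dy[kmax:], yt[kmax:-1]        # common sample t = kmax+1..T
    mic, tau = np.empty(kmax + 1), np.empty(kmax + 1)
    for k in range(kmax + 1):
        X = np.column_stack([ylag] + [dy[kmax - j:T - j] for j in range(1, k + 1)])
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ b
        sigma2_k = resid @ resid / (T - kmax)
        tau[k] = b[0] ** 2 * (ylag @ ylag) / sigma2_k      # tau_T(k) as in (12)
        mic[k] = np.log(sigma2_k) + c_t * (tau[k] + k) / (T - kmax)
    return int(np.argmin(mic)), mic, tau

# Hypothetical data: unit root with MA(1) errors, theta = -0.8, T = 250
rng = np.random.default_rng(4)
e = rng.standard_normal(252)
y = np.cumsum(e[1:] + (-0.8) * e[:-1])
k_maic, mic, tau = select_k_mic(y, p=0, kmax=12)
```

Rescaling the data shifts $\ln(\hat\sigma_k^2)$ by the same constant for every $k$ and leaves $\tau_T(k)$ unchanged, so the selected lag does not depend on the units of the series.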
There are two important elements that distinguish (13) from standard information
criteria. The first is the imposition of the null hypothesis $\beta_0 = 0$. This
idea of imposing the null hypothesis in model selection when the ultimate objective
is hypothesis testing appears new and may have implications beyond the unit
root issue.5 This is an interesting avenue of research, but is beyond the scope of
this paper. The second element is that we retain a term in the penalty function
that is stochastic. Now, as T -?oo,(1/6o)go ET=k_ +t_l X== (-2/o-)[W(1)2 -

1]2/[2 fl W(r)2dr] -r(W), say, where W is a detrended Wiener process, whose

exact form depends on the deterministic components and the method of detrend-
ing. In principle, we could use the mean of r(W), which is independent of k,
instead of rT(k), to construct Q* and the objective function would then reduce

5We thank a referee for bringing this feature to our attention.


to the standard AIC. A central theme of our argument for retaining $\tau_T(k)$ is that, unless both $T$ and $k$ are very large, it varies substantially with $k$, especially when a negative MA component is present. In a sense, the imposition of the null hypothesis, $\beta_0 = 0$, allows us to avoid using asymptotic expected values to approximate the penalty factor. We can therefore hope that $\tau_T(k)$ will better capture the relevant cost of selecting different orders in finite samples since it depends not only on $k$, but also on the nature of the deterministic components and the detrending procedure. Hence, the penalty term will also depend on these factors.
To get an idea of the dependence of $\tau_T(k)$ on $k$, Figure 1 graphs the average over 5,000 replications of $\tau_T(k)$ against $k$ for $p = 0, 1$ using the DGP $\Delta y_t = e_t + \theta e_{t-1}$ with $e_t \sim$ i.i.d. $N(0, 1)$. For positive $\theta$, $\tau_T(k)$ is indeed fairly invariant to $k$ (as is the case for AR processes, not reported). But there is a clear inverse and nonlinear relation between $\tau_T(k)$ and $k$ for negative values of $\theta$. This relation exists even when $\theta$ is $-.5$ for both $p = 0$ and $p = 1$. The basic reason for the dependence of $\tau_T(k)$ on $k$ is that the bias in the sum of the estimated autoregressive coefficients (i.e., $\hat\beta_0$) is very high for small values of $k$. This suggests that $\tau_T(k)$ will play a role in determining the appropriate $k$ when there are negative

FIGURE 1: $\tau_T(k)$ plotted against $k$ (horizontal axis $k = 0$ to $10$). Panels: $p = 0$, $T = 100$; $p = 0$, $T = 250$; $p = 1$, $T = 100$; $p = 1$, $T = 250$. Labeled curves include $\theta = -0.8$ and $\theta = -.5$.

moving-average errors. To make this claim precise, the next section uses a local asymptotic framework to analyze the properties of $\tau_T(k)$ and the MIC.
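The dependence of $\tau_T(k)$ on $k$ under a negative MA root can be explored with a small Monte Carlo. The sketch below is a simplified stand-in for the experiment behind Figure 1: it uses only 200 replications and no detrending (the series starts at zero and enters the regression as is), whereas the paper works with detrended data for $p = 0, 1$. The names `tau_path` and `average_tau` are ours.

```python
import numpy as np

def tau_path(y, kmax):
    """tau_T(k), k = 0..kmax, from regressing dy_t on y_{t-1} and k lags of dy."""
    dy = np.diff(y)
    T = dy.size
    n_eff = T - kmax
    lhs, ylag = dy[kmax:], y[kmax:-1]
    tau = np.empty(kmax + 1)
    for k in range(kmax + 1):
        cols = [ylag] + [dy[kmax - j:T - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
        resid = lhs - X @ beta
        sigma2 = resid @ resid / n_eff
        tau[k] = beta[0] ** 2 * (ylag @ ylag) / sigma2
    return tau

def average_tau(theta, T=100, kmax=8, reps=200, seed=0):
    """Average tau_T(k) over replications of dy_t = e_t + theta * e_{t-1}."""
    rng = np.random.default_rng(seed)
    total = np.zeros(kmax + 1)
    for _ in range(reps):
        e = rng.standard_normal(T + 1)
        dy = e[1:] + theta * e[:-1]
        y = np.concatenate(([0.0], np.cumsum(dy)))   # unit root imposed
        total += tau_path(y, kmax)
    return total / reps

tau_neg = average_tau(theta=-0.8)   # should fall sharply as k grows
tau_pos = average_tau(theta=0.5)    # for comparison with a positive MA root
```

Comparing `tau_neg` with `tau_pos` entry by entry gives a rough numerical counterpart to the contrast described in the text.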

6. LOCAL ASYMPTOTIC PROPERTIES

As discussed above, $\tau_T(k)$ will likely influence the choice of $k$ when there is a large negative moving-average root, a situation for which standard unit root tests exhibit severe size distortions. To understand the issue of lag length selection in this setting, a useful device is an asymptotic framework in which the moving average coefficient is parameterized as local to $-1$:

(14) $y_t - \alpha_T y_{t-1} = e_t + \theta_T e_{t-1}$,

$\alpha_T = 1 + c/T$, $\theta_T = -1 + \delta/\sqrt{T}$,

$e_t \sim$ i.i.d. $(0, \sigma_e^2)$, $y_0 = e_0 = 0$, $\delta > 0$, and $c < 0$. As $T$ increases to infinity, $\alpha_T \to 1$, $\theta_T \to -1$, and $y_t$ becomes a white noise process. However, $y_t$ is a nearly integrated nearly white noise process in finite samples. Nabeya and Perron (1994)
used this asymptotic framework to derive approximations to the finite sample
properties of the least-squares estimator in a first-order autoregressive model
with negative moving-average errors and showed these approximations not only
to be superior to those obtained from standard asymptotic distributions but also
to be quite accurate even for small sample sizes. Perron and Ng (1996) used the
same setup to analyze the size and power of various unit root tests based upon
augmented autoregressions. Our aim here is to use the same device to first obtain
useful qualitative features about the relative properties of the IC and MIC, and
second, to guide us to an appropriate choice of the penalty weight CT.
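For readers who want to experiment with this framework, the following sketch generates data from (14). It assumes the parameterization $\alpha_T = 1 + c/T$ and $\theta_T = -1 + \delta/\sqrt{T}$ with Gaussian innovations; the Gaussian choice and the function name `simulate_local_ma` are ours, since the framework only requires i.i.d. $(0, \sigma_e^2)$ errors.

```python
import numpy as np

def simulate_local_ma(T, c, delta, sigma_e=1.0, rng=None):
    """Simulate y_0..y_T from y_t = alpha_T*y_{t-1} + e_t + theta_T*e_{t-1}.

    alpha_T = 1 + c/T and theta_T = -1 + delta/sqrt(T); y_0 = e_0 = 0.
    """
    rng = np.random.default_rng(rng)
    alpha = 1.0 + c / T
    theta = -1.0 + delta / np.sqrt(T)
    # e[0] is the initial condition e_0 = 0; e[1:] are the T innovations.
    e = np.concatenate(([0.0], rng.normal(0.0, sigma_e, T)))
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = alpha * y[t - 1] + e[t] + theta * e[t - 1]
    return y

# Example: T = 100, unit root (c = 0), delta = 1 so that theta_T = -0.9.
y_demo = simulate_local_ma(T=100, c=0.0, delta=1.0, rng=0)
```

With $c = 0$ and $\delta$ shrunk toward zero, the simulated series approaches white noise, which is the sense in which the process is "nearly integrated nearly white noise".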

6.1. Local Asymptotic Properties of $\tau_T(k)$

The following Theorem (proved in the Appendix) summarizes the general properties of $\tau_T(k)$ in this local asymptotic framework.

THEOREM 2: Let $y_t$ be generated by (14), $Z_1 = (1 + \delta^2\int_0^1 J_c(r)^2 dr)/[\delta^2\int_0^1 J_c(r)^2 dr]^2$ when $p = 0$, and $Z_1 = (1 + \delta^2\int_0^1 V_{c,\bar c}(r)^2 dr)/[\delta^2\int_0^1 V_{c,\bar c}(r)^2 dr]^2$ when $p = 1$. Let $k \to \infty$, $K_{1T} = o(T^{1/2})$, $T^{1/2}/K_{2T} = o_p(1)$, and $K_{3T} = o(T)$. Then, (i) uniformly in $0 < k < K_{1T}$: $(k^2/T)\tau_T(k) \Rightarrow Z_1$; (ii) if $k/T^{1/2} \to \kappa$, $\tau_T(k) \Rightarrow \vartheta(\kappa)^2 Z_2$, where $E(Z_2) = E(Z_1)$ and $\vartheta(\kappa) = 2\delta\exp(-2\kappa\delta)/(1 - \exp(-2\kappa\delta))$; and (iii) uniformly in $K_{2T} \le k < K_{3T}$, $\tau_T(k) \Rightarrow Z_3$, where $Z_3$ is $O_p(1)$.

Theorem 2 shows that the local limit of $\tau_T(k)$ is unbounded unless $k$ increases at least at rate $T^{1/2}$. Since $\tau_T(0)$ diverges at rate $T$ (see Nabeya and Perron (1994)), and $\tau_T(k)$ diverges at rate $T/k^2$ if $k^2/T \to 0$, the cost to overfitting is not linear in $k$ as is assumed for the IC, and Figure 1 bears this out. Theorem 2 also highlights the fact that $\tau_T(k)$ will depend, via $Z_1$, on the nature of the deterministic components and the method of detrending.
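A quick numerical check shows how parts (i) and (ii) of Theorem 2 fit together. Taking $\vartheta(\kappa) = 2\delta\exp(-2\kappa\delta)/(1 - \exp(-2\kappa\delta))$ as stated in the theorem, $\vartheta(\kappa) \approx 1/\kappa$ for small $\kappa$, so with $\kappa = k/T^{1/2}$ the limit in (ii) behaves like $(T/k^2)Z_2$, the same rate as in (i). For large $\kappa$ the factor decays exponentially, in line with the bounded limit in (iii). The function name `vartheta` is ours.

```python
import math

def vartheta(kappa, delta):
    """vartheta(kappa) = 2*delta*exp(-2*kappa*delta) / (1 - exp(-2*kappa*delta)).

    Written as 2*delta / expm1(2*kappa*delta) for accuracy near kappa = 0.
    """
    return 2.0 * delta / math.expm1(2.0 * kappa * delta)

delta = 1.0
small = 1e-3
# kappa * vartheta(kappa) should be close to 1 when kappa is small.
ratio_small = small * vartheta(small, delta)
# For large kappa the factor is negligible.
tail = vartheta(10.0, delta)
```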

Given Theorem 2 and the fact that $\hat\sigma_k^2$ is consistent for $\sigma_e^2$ irrespective of the rate of increase of $k$ (shown in the Appendix), it can be shown that, in this local asymptotic framework, the limiting distribution of the DF test diverges to $-\infty$ unless $k$ grows at least at rate $T^{1/2}$.$^6$ Hence, the unit root will always be rejected if standard critical values are used. This explains why the DF test tends to be oversized when $k$ is small. In Perron and Ng (1996, 1998), we showed that $k^2 s_{AR}^2 = O_p(1)$ but we also need $T s_{AR}^2$ to be $O_p(1)$ for the M tests to be bounded under the null hypothesis in this local framework. But $k^2 s_{AR}^2$ and $T s_{AR}^2$ can both be bounded only if $k/T^{1/2} \to \kappa$ for some $0 < \kappa < \infty$. In the next section, we will consider model selection procedures with this requirement in mind.

6.2. Local Asymptotic Properties of the MIC and IC

In the standard asymptotic framework, with data generated by (1), $\tau_T(k)$ is invariant to $k$ for $T$ and $k$ large. It follows that $k_{mic}$ grows at a logarithmic rate from Hannan and Deistler (1988). We now consider the properties of $k$ chosen by the MIC and the IC in the local asymptotic framework with the additional technical requirement that $C_T^2/T \to 0$. Using Theorem 2 and an expansion for $\ln(\hat\sigma_k^2)$ derived in the Appendix, we have the following expansions for the MIC and IC.

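LEMMA 1: Let $y_t$ be generated by (14) and $K_{1T}$, $K_{2T}$, and $K_{3T}$ as defined in Theorem 2; then (i) uniformly in $0 < k < K_{1T}$:

(15) $MIC(k) = \ln(\sigma_e^2) + \dfrac{1}{k} + \dfrac{C_T}{k^2} Z_1 + \dfrac{C_T k}{T} + O_p(T^{-1/2})$;

(ii) for $k/T^{1/2} \to \kappa$, with $\vartheta(\kappa)$ defined in Theorem 2:

(16) $MIC(k) = \ln(\sigma_e^2) + \dfrac{\vartheta(\kappa)}{\sqrt{T}} + \dfrac{C_T}{T}\vartheta(\kappa)^2 Z_2 + \dfrac{C_T k}{T} + O(T^{-1}) + O_p(T^{-1/2})$;

(iii) uniformly in $K_{2T} < k < K_{3T}$:

(17) $MIC(k) = \ln(\sigma_e^2) + \dfrac{2\delta}{\sqrt{T}\exp(2k\delta/\sqrt{T})} + \dfrac{C_T}{T} Z_3 + \dfrac{C_T k}{T} + O_p(T^{-1/2})$.

The expansions for the IC(k) are the same except that the terms involving the Z's are absent.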

The objects of ultimate interest are the values $k_{ic}$ and $k_{mic}$ chosen by the IC and the MIC, respectively, over all possible values of $k$.

6 This is a more general result than Pantula (1991) who used a different local asymptotic framework for modeling the negative moving average component in unit root tests. Pantula's result applies only when $k = O_p(T^{1/4})$.

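THEOREM 3: Let $y_t$ be generated by (14) and consider the rate of increase of $k$ as $T \to \infty$. Define:

$F_{MIC(1)}(\kappa) = -\dfrac{4\delta^2\exp(-2\kappa\delta)}{[1 - \exp(-2\kappa\delta)]^2} - \dfrac{16\delta^3 C_T Z_1\exp(-4\kappa\delta)}{\sqrt{T}\,[1 - \exp(-2\kappa\delta)]^3} + C_T$,

$F_{MIC(2)}(\kappa) = \dfrac{1}{\kappa^2} + \dfrac{2 Z_1}{\kappa^3}\sqrt{C_T^3/T} - 1$,

$F_{IC}(\kappa) = -\dfrac{4\delta^2\exp(-2\kappa\delta)}{[1 - \exp(-2\kappa\delta)]^2} + C_T$.

When $C_T$ is fixed, $k_{mic}/(\kappa_{mic(1)}\sqrt{T}) \to 1$, where $\kappa_{mic(1)}$ solves $F_{MIC(1)}(\kappa) = 0$, and $k_{ic}/(\kappa_{ic}\sqrt{T}) \to 1$, where $\kappa_{ic}$ solves $F_{IC}(\kappa) = 0$. When $C_T \to \infty$, $k_{ic}/\sqrt{T/C_T} \to 1$. When $C_T \to \infty$ and $C_T^3/T \to d\ (\ge 0)$, $k_{mic}/(\kappa_{mic(2)}\sqrt{T/C_T}) \to 1$, where $\kappa_{mic(2)}$ solves $F_{MIC(2)}(\kappa) = 0$. If $C_T \to \infty$ and $C_T^3/T \to \infty$, $k_{mic}/(2Z_1T)^{1/3} \to 1$.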

Provided $C_T$ is constant (as is the case with Akaike's value of 2), both the IC and the MIC will choose $k$ proportional to $T^{1/2}$ asymptotically, much faster than the logarithmic rate attained under the standard asymptotic framework. The rate of $\sqrt{T}$ is optimal for unit root tests because this is the rate that will render the class of DF as well as the M tests nondivergent under the null hypothesis in this local asymptotic framework. However, when $C_T$ increases with $T$, this optimal rate is not achieved.
The proportionality factor for the rate of increase of $k_{ic}$ when $C_T$ increases with $T$ is one and highlights a major shortcoming of the IC in such cases, namely a complete independence of $\delta$, the parameter of the local asymptotic framework. For the $k_{mic}$ when $C_T$ increases with $T$, the proportionality factor depends on the parameters of the model via $Z_1$. This effect, however, still vanishes as $T$ increases. This lack of flexibility does not arise when $C_T$ is fixed since $\kappa_{mic}$ and $\kappa_{ic}$ always maintain a dependence on $\delta$. Although $k_{ic}$ increases at the appropriate rate of $\sqrt{T}$, it too has drawbacks because $\kappa_{ic}$ is invariant to $T$ and it is the same whether $p = 0$ or $p = 1$. In contrast, $\kappa_{mic}$ adapts to the sample size and the trend function via a second order factor involving $Z_1$. Thus, the MIC with $C_T$ fixed maintains a proportionality factor that is the most flexible with respect to $T$, the regression model, as well as parameters of the DGP.
If the weight $C_T$ on the penalty factor is too large, that is $C_T^3/T \to \infty$, the MIC chooses a rate that is independent of $C_T$, namely $T^{1/3}$, while the IC continues to select a $k$ that is proportional to $\sqrt{T/C_T}$. This, however, is a result of theoretical rather than practical interest since almost all information criteria that have been suggested are such that $C_T^3/T \to 0$.
To illustrate these properties, we consider the MAIC and AIC with $C_T = 2$ and the MBIC and BIC with $C_T = \ln(T)$. We denote the proportionality factors by $\kappa_{maic}$, $\kappa_{aic}$, $\kappa_{mbic}$, and $\kappa_{bic}$, respectively. Numerical values of $\kappa$ are reported in

TABLE III.A
VALUES OF THE PROPORTIONALITY FACTOR K

p=0 p=1
T\δ 0.50 1.00 1.50 2.00 5.00 8.00 0.50 1.00 1.50 2.00 5.00 8.00

100 bic 0.99 0.96 0.93 0.89 0.67 0.54 0.99 0.96 0.93 0.89 0.67 0.54
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.59 1.73 1.24 1.04 0.68 0.54 6.34 2.89 1.87 0.40 0.71 0.54
maic 1.70 0.88 0.68 0.59 0.39 0.30 2.96 1.37 0.90 0.70 0.39 0.30
150 bic 0.99 0.96 0.93 0.90 0.69 0.55 0.99 0.96 0.93 0.90 0.69 0.55
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.58 1.73 1.24 1.04 0.69 0.55 6.41 2.92 1.88 1.42 0.72 0.56
maic 1.63 0.85 0.67 0.59 0.39 0.30 2.87 1.32 0.88 0.69 0.39 0.30
250 bic 0.99 0.97 0.94 0.90 0.70 0.57 0.99 0.67 0.94 0.90 0.70 0.57
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.55 1.71 1.24 1.04 0.71 0.57 6.46 2.93 1.89 1.42 0.73 0.57
maic 1.55 0.82 0.66 0.58 0.39 0.30 2.75 1.27 0.85 0.67 0.39 0.30
500 bic 0.99 0.97 0.94 0.91 0.72 0.58 0.99 0.97 0.94 0.91 0.72 0.58
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.47 1.68 1.23 1.04 0.72 0.58 6.46 2.92 1.89 1.42 0.75 0.59
maic 1.44 0.79 0.65 0.58 0.39 0.30 2.60 1.20 0.81 0.65 0.39 0.30
5000 bic 0.99 0.98 0.96 0.93 0.76 0.63 0.99 0.98 0.96 0.93 0.76 0.63
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.06 1.52 1.16 1.02 0.76 0.63 6.12 2.74 1.78 1.36 0.78 0.63
maic 1.14 0.71 0.62 0.57 0.39 0.30 2.12 1.00 0.72 0.60 0.39 0.30
50000 bic 0.99 0.98 0.96 0.94 0.79 0.66 0.99 0.98 0.96 0.94 0.79 0.66
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 2.56 1.34 1.08 0.99 0.79 0.67 5.43 2.42 1.59 1.25 0.80 0.66
maic 0.92 0.67 0.61 0.57 0.39 0.30 1.69 0.85 0.66 0.58 0.39 0.30

Table III.A for selected values of $\delta$ and $T$ under the null hypothesis of a unit root with $c = 0$. In the case of the MIC these values are random variables because of the presence of $Z_1$. The numerical calculations reported use its expected value.$^7$ Table III.A shows first that the $\kappa_{mic}$'s are drastically larger than the $\kappa_{ic}$'s when $\delta$ is small. Second, the $\kappa_{ic}$'s vary much less with $\delta$ than the $\kappa_{mic}$'s. Third, the $\kappa_{mic}$'s are larger when $p = 1$ than when $p = 0$, but the $\kappa_{ic}$'s are invariant to the order of the deterministic terms in the regression.
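The proportionality factors for fixed $C_T$ can be computed numerically. The sketch below is our reading of the first-order conditions rather than the authors' code: it assumes $F_{IC}(\kappa) = \vartheta'(\kappa) + C_T$ and $F_{MIC(1)}(\kappa) = \vartheta'(\kappa) + (2C_T/\sqrt{T})\,\vartheta(\kappa)\vartheta'(\kappa)E[Z_1] + C_T$, obtained by differentiating expansion (16) with respect to $\kappa$, and it uses the crude approximation $E[Z_1] = (4 + 2\delta^2)/\delta^4$ for $p = 0$. Function names and the bisection routine are ours.

```python
import math

def dvartheta(kappa, delta):
    """Derivative of vartheta(kappa) = 2*delta / (exp(2*kappa*delta) - 1)."""
    u = math.exp(-2.0 * kappa * delta)
    return -4.0 * delta**2 * u / (1.0 - u) ** 2

def f_ic(kappa, delta, c_t):
    """Assumed first-order condition for the IC with fixed C_T."""
    return dvartheta(kappa, delta) + c_t

def f_mic1(kappa, delta, c_t, T, ez1):
    """Assumed first-order condition for the MIC with fixed C_T."""
    u = math.exp(-2.0 * kappa * delta)
    extra = -16.0 * delta**3 * c_t * ez1 * u**2 / (math.sqrt(T) * (1.0 - u) ** 3)
    return dvartheta(kappa, delta) + extra + c_t

def solve(f, lo=1e-6, hi=50.0, iters=200):
    """Bisection for an increasing function with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

delta, c_t, T = 1.0, 2.0, 100
ez1_p0 = (4.0 + 2.0 * delta**2) / delta**4      # p = 0 approximation
kappa_aic = solve(lambda k: f_ic(k, delta, c_t))
kappa_maic = solve(lambda k: f_mic1(k, delta, c_t, T, ez1_p0))
```

Under these assumptions the two roots for $\delta = 1$, $T = 100$, $p = 0$ come out near 0.66 and 0.88, in line with the aic and maic entries of Table III.A (0.65 and 0.88).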
Values of k selected by the IC and the MIC are in Table III.B. These are
constructed using the expected value of the MIC given by (for the IC(k) the term

7 To get the expected value we use the following crude approximation. For $p = 0$, $E[Z_1] \approx (E[1 + \delta^2\int_0^1 W(r)^2 dr])/\delta^4(E[\int_0^1 W(r)^2 dr])^2 = (4 + 2\delta^2)/\delta^4$. For $p = 1$, $E[Z_1] \approx E[1 + \delta^2\int_0^1 V_{0,\bar c}(r)^2 dr]/\delta^4(E[\int_0^1 V_{0,\bar c}(r)^2 dr])^2$. For $\bar c = -13.5$, tedious but straightforward algebra gives $E[Z_1] \approx (1 + \delta^2(.102475))/(\delta^4(.102475)^2)$.

TABLE III.B
k CHOSEN BY MINIMIZING THE ASYMPTOTIC APPROXIMATION OF THE MIC AND IC (18)

p=0 p=1
T\δ 0.50 1.00 1.50 2.00 5.00 8.00 0.50 1.00 1.50 2.00 5.00 8.00

100 bic 4 4 4 4 3 2 4 4 4 4 3 2
aic 6 6 6 5 3 3 6 6 6 5 3 3
mbic 16 8 5 4 3 2 29 13 8 6 3 2
maic 17 8 6 5 3 3 29 13 9 7 3 3
150 bic 5 5 5 4 3 3 5 5 5 4 3 3
aic 8 8 7 7 4 3 8 8 7 7 4 3
mbic 19 9 6 5 3 3 35 15 10 7 3 3
maic 20 10 8 7 4 3 35 16 10 8 4 3
250 bic 6 6 6 6 4 3 6 6 6 6 4 3
aic 10 10 9 9 6 4 10 10 9 9 6 4
mbic 23 11 8 7 4 3 43 19 12 9 4 3
maic 24 13 10 9 6 4 43 20 13 10 6 4
500 bic 8 8 8 8 6 5 8 8 8 8 6 5
aic 15 14 13 12 8 6 15 14 13 12 8 6
mbic 31 15 11 9 6 5 57 26 16 12 6 5
maic 32 17 14 13 8 6 58 27 18 14 8 6
5000 bic 24 23 23 22 18 15 24 23 23 22 18 15
aic 49 46 43 40 27 21 49 46 43 40 27 21
mbic 74 37 28 24 18 15 148 66 43 33 19 15
maic 80 50 44 40 27 21 150 71 50 42 27 21
50000 bic 67 66 65 63 53 45 67 66 65 63 53 45
aic 154 147 137 128 88 68 154 147 137 128 88 68
mbic 174 91 74 67 54 45 369 164 108 85 54 45
maic 207 151 138 128 88 68 379 190 147 130 88 68

with Z1 is absent)

2) (1- - O
(18) E[MIC(k)] = lnf(e ?)+ )2(k?l)

+ CT ([( j ( )2(k+l) ] ( T
This expression is valid uniformly in $0 < k < K_{3T}$ for $K_{3T} = o(T)$ and is obtained using equation (A.4) and Lemma A.5. When $\delta$ is small and especially when $p = 1$, the $k_{mic}$ are several folds larger than the $k_{ic}$. As well, differences between $k_{aic}$ and $k_{maic}$ are nontrivial even for sample sizes deemed unrealistically large.
Table III.B also provides some guidance for the $k$'s that would likely be selected in practice. When $\theta = -.9$, for example, the local asymptotic results for $\delta = 1$ and $T = 100$ suggest that $k_{bic} = 4$ for both $p = 0$ and 1, but that the $k_{maic}$ are 8 and 13, respectively. The local asymptotic results therefore support the earlier claim that the IC lacks the flexibility to handle DGP's with a large negative moving

average root and/or alternative deterministic components. The issue of practical

interest is whether in finite samples and for realistic values of the moving average
parameter, such large discrepancies remain and what are the implications for
unit root tests. The rest of the analysis presents evidence to this effect.

6.3. The Finite Sample Properties of $k_{mic}$

To evaluate the finite sample properties of the MIC vis-a-vis the IC, we performed the following simulation experiment. For a given DGP, we construct the $MZ_\alpha^{GLS}$ and $DF^{GLS}$ tests at each $k \in [0, 10]$, and record the exact sizes. We then find the so-called optimal values of $k$, denoted $k^*_{mz}$ and $k^*_{dfgls}$, as the first $k$ with a size within .03 of the nominal size of 0.05. If no such $k$ exists, $k^*$ is the $k$ for which the exact size of the test has the smallest absolute deviation from the nominal size of 5%. We then obtain $k_{bic}$ as the median value selected by the BIC over the range 0 to 10. A similar procedure is used to obtain $k_{mbic}$, $k_{aic}$, and $k_{maic}$. The setup for the simulations is as outlined in Section 4.
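The rule for picking $k^*$ from a vector of simulated sizes can be written compactly. The sketch below assumes the exact sizes have already been estimated for $k = 0, 1, \ldots$ and are passed in as a list; the function name `optimal_k` and the example numbers are hypothetical.

```python
def optimal_k(sizes, nominal=0.05, tol=0.03):
    """Return k*: the first k whose size is within tol of nominal.

    If no k qualifies, fall back to the k with the smallest absolute
    deviation from the nominal size.
    """
    for k, size in enumerate(sizes):
        if abs(size - nominal) <= tol:
            return k
    return min(range(len(sizes)), key=lambda k: abs(sizes[k] - nominal))

# Hypothetical exact sizes for k = 0, 1, 2, 3.
k_star = optimal_k([0.40, 0.20, 0.07, 0.05])   # first k within .03 of 0.05
```

Note that the rule stops at the first qualifying $k$, so a later lag with a size even closer to 5% is not selected.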
In Table IV.A ($p = 0$) and IV.B ($p = 1$), we report the values for $k^*_{mz}$, $k^*_{dfgls}$, $k_{ic}$, and $k_{mic}$ for selected parameters for $C_T = 2$ and $C_T = \ln(T - k_{max})$. While the $k_{ic}$'s are in the same range as the $k^*$'s for many cases, there are important differences between the two when $\theta < 0$. The $k_{mic}$ is much closer to $k^*_{mz}$ and $k^*_{dfgls}$ in negative MA models than the $k_{ic}$. For example, when $\theta = -.8$, $k^*_{mz}$ and $k^*_{dfgls}$ are 10 at $T = 250$ and $p = 1$; the BIC chooses 3 over half the time whereas $k_{maic}$ has 8 as the median value. The $k_{aic}$ is closer to $k^*_{mz}$ and $k^*_{dfgls}$ than the $k_{bic}$, but still not as close as the $k_{maic}$. For the AR case, the $k_{maic}$ indeed selects the appropriate order of one, but when $p = 1$, this sometimes differs from $k^*_{mz}$. This does not reflect a weakness of the MIC but rather that the $M^{GLS}$ tests are undersized when the autoregressive coefficient is large and negative. The results show that $k_{maic}$ corresponds closely to the $k^*_{dfgls}$ in the AR case.

7. SIZE AND POWER OF THE TESTS IN FINITE SAMPLES

In this section, we need to establish two things: first, that the MIC chooses values of $k$ that are appropriate in the sense of minimizing size distortions under the null hypothesis and not overparameterizing under the alternative; second, that the $M^{GLS}$ and $\bar M^{GLS}$ statistics have power comparable to the $DF^{GLS}$ and dominate the M tests. Whenever an estimate of $\sigma^2$ is required, $s_{AR}^2$ based upon the regression (8) is used. Because the MAIC dominates all other criteria from both theoretical and numerical perspectives we only consider the $M^{GLS}$ tests constructed using the MAIC. In the simulations, the lower bound is always zero to reduce the chance of overparameterizing when a large $k$ is not necessary. The upper bound is $k_{max} = int(12(T/100)^{1/4})$.$^8$ The results are tabulated for $T = 100$, 150, and 250. The ten statistics being considered are detailed in the notes to Table V.A and Table VI.A.
8 In practice, the upper bound should be set such that the constraint does not bind. In simulations,
the upper bound occasionally binds when T = 100 but not for larger sample sizes (e.g. T = 150).
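The search range used in the simulations is easy to reproduce. The sketch below evaluates the upper bound $k_{max} = int(12(T/100)^{1/4})$ and the penalty weights $C_T = 2$ (MAIC) and $C_T = \ln(T - k_{max})$ (MBIC); the function names are ours.

```python
import math

def kmax_rule(T):
    """Upper bound on the lag length: int(12 * (T/100)^(1/4))."""
    return int(12 * (T / 100) ** 0.25)

def penalty_weight(T, kmax, criterion="maic"):
    """C_T for the two modified criteria discussed in the text."""
    if criterion == "maic":
        return 2.0
    if criterion == "mbic":
        return math.log(T - kmax)
    raise ValueError("criterion must be 'maic' or 'mbic'")

# Upper bounds for the three sample sizes used in the tables.
bounds = {T: kmax_rule(T) for T in (100, 150, 250)}
```

For $T = 100$, 150, and 250 this gives upper bounds of 12, 13, and 15 lags, respectively.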

For $p = 0$, the results for size are reported in Table V.A. The M and $M^{GLS}$ tests still have inflated sizes at $T = 100$ when $\theta = -0.8$. However, use of GLS detrended data to construct $s_{AR}^2$ produces substantial size improvements. The $DF^{GLS}$ based on the BIC yields an exact size around 0.4 at $\theta = -0.8$, but

TABLE IV.A
SELECTED k FOR p = 0

MA Case: v_t = (1 + θL)e_t
T = 100 (columns 2-7), T = 250 (columns 8-13)

θ k*_mz k*_dfgls k_bic k_aic k_mbic k_maic k*_mz k*_dfgls k_bic k_aic k_mbic k_maic
-0.8 8 8 2 4 5 6 7 10 4 6 5 7
-0.7 6 10 2 3 3 4 6 7 3 5 4 5
-0.6 5 7 1 2 2 3 5 6 2 4 3 4
-0.5 3 7 1 2 2 2 4 4 2 3 2 3
-0.4 2 4 1 2 1 2 2 2 1 2 2 2
-0.3 2 3 1 1 1 1 2 2 1 2 1 2
-0.2 1 2 0 1 1 1 1 1 1 1 1 1
-0.1 1 1 0 0 0 0 1 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 0 0 0 0 0 0 0 0 0 1 0 1
0.2 0 0 0 1 0 1 1 1 1 1 1 1
0.3 2 2 1 1 1 1 2 1 1 2 1 2
0.4 2 2 1 2 1 2 2 2 1 2 2 2
0.5 2 2 1 2 2 2 2 2 2 3 2 3
0.6 2 2 2 3 2 3 2 2 2 4 2 4
0.7 2 2 2 4 2 4 2 2 3 5 3 5
0.8 2 2 3 5 3 5 2 2 4 7 4 7
AR Case: (1 - ρL)v_t = e_t
T = 100 (columns 2-7), T = 250 (columns 8-13)

ρ k*_mz k*_dfgls k_bic k_aic k_mbic k_maic k*_mz k*_dfgls k_bic k_aic k_mbic k_maic
-0.8 1 1 1 1 1 1 1 1 1 1 1 1
-0.7 1 1 1 1 1 1 1 1 1 1 1 1
-0.6 1 1 1 1 1 1 1 1 1 1 1 1
-0.5 1 1 1 1 1 1 1 1 1 1 1 1
-0.4 1 1 1 1 1 1 1 1 1 1 1 1
-0.3 1 1 1 1 1 1 1 1 1 1 1 1
-0.2 1 1 0 1 1 1 1 1 1 1 1 1
-0.1 1 1 0 0 0 0 1 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 0 0 0 0 0 0 0 0 0 1 0 1
0.2 1 0 0 1 0 1 1 1 1 1 1 1
0.3 1 1 1 1 1 1 1 1 1 1 1 1
0.4 0 1 1 1 1 1 1 1 1 1 1 1
0.5 1 1 1 1 1 1 1 1 1 1 1 1
0.6 1 1 1 1 1 1 1 1 1 1 1 1
0.7 1 1 1 1 1 1 1 1 1 1 1 1
0.8 1 1 1 1 1 1 1 1 1 1 1 1
Note: The entries are the median values of the selected k using 5,000 replications.

TABLE IV.B
SELECTED k FOR p = 1

MA Case: v_t = (1 + θL)e_t
T = 100 (columns 2-7), T = 250 (columns 8-13)

θ k*_mz k*_dfgls k_bic k_aic k_mbic k_maic k*_mz k*_dfgls k_bic k_aic k_mbic k_maic

-0.8 6 10 0 2 6 6 7 10 3 5 7 8
-0.7 4 8 1 2 4 5 5 8 3 4 5 6
-0.6 4 6 1 2 3 4 4 6 2 3 4 4
-0.5 3 6 1 2 2 3 3 4 2 3 3 3
-0.4 2 3 1 1 2 2 2 3 1 2 2 2
-0.3 2 2 0 1 1 1 2 2 1 1 1 2
-0.2 1 4 0 1 1 1 1 1 1 1 1 1
-0.1 0 1 0 0 0 1 1 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 1 0 0 1 0 0 1 0 0 1 0 1
0.2 1 2 0 1 0 1 1 1 1 1 0 1
0.3 0 2 1 1 0 1 2 2 1 2 1 2
0.4 2 2 1 2 1 2 2 2 1 2 2 2
0.5 2 2 1 2 2 2 2 2 2 3 2 3
0.6 2 2 2 3 2 2 2 2 3 4 2 4
0.7 2 2 2 4 2 4 2 2 3 5 3 5
0.8 2 2 3 5 2 4 4 4 4 7 4 6
AR Case: (1 - ρL)v_t = e_t
T = 100 (columns 2-7), T = 250 (columns 8-13)

ρ k*_mz k*_dfgls k_bic k_aic k_mbic k_maic k*_mz k*_dfgls k_bic k_aic k_mbic k_maic
-0.8 3 1 1 1 1 1 4 1 1 1 1 1
-0.7 3 1 1 1 1 1 1 1 1 1 1 1
-0.6 2 1 1 1 1 1 1 1 1 1 1 1
-0.5 1 1 1 1 1 1 1 1 1 1 1 1
-0.4 1 1 1 1 1 1 1 1 1 1 1 1
-0.3 1 1 1 1 1 1 1 1 1 1 1 1
-0.2 1 1 0 1 1 1 1 1 1 1 1 1
-0.1 0 1 0 0 0 0 0 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 1 0 0 1 0 0 1 0 0 1 0 1
0.2 1 1 1 1 0 1 1 1 1 1 0 1
0.3 1 1 1 1 0 1 1 1 1 1 1 1
0.4 1 1 1 1 1 1 1 1 1 1 1 1
0.5 1 1 1 1 1 1 1 1 1 1 1 1
0.6 1 1 1 1 1 1 1 1 1 1 1 1
0.7 1 1 1 1 1 1 1 1 1 1 1 1
0.8 1 1 1 1 1 1 1 1 1 1 1 1

Note: The entries are the median values of the selected k using 5,000 replications.

the MAIC improves the size to 0.107. The MAIC also yields a substantial size improvement over $DF_t^{GLS}$, which has a size of .35 when $\theta = -0.8$. The results
also confirm that the PT test constructed with k chosen by BIC suffers from sub-

TABLE V.A
SIZE OF THE TESTS; p = 0

MA Case: v_t = e_t + θe_{t-1}, e_t ~ N(0, 1)

T θ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.058 0.041 0.036 0.394 0.411 0.356 0.107 0.429 0.034 0.036
-0.5 0.028 0.013 0.056 0.082 0.104 0.095 0.073 0.109 0.048 0.051
0.0 0.018 0.005 0.043 0.072 0.077 0.072 0.050 0.094 0.036 0.040
0.5 0.027 0.013 0.062 0.081 0.082 0.080 0.048 0.118 0.048 0.054
0.8 0.042 0.018 0.067 0.095 0.097 0.091 0.039 0.145 0.054 0.062
150 -0.8 0.034 0.018 0.026 0.407 0.390 0.304 0.095 0.420 0.025 0.027
-0.5 0.025 0.009 0.054 0.085 0.093 0.086 0.066 0.091 0.047 0.049
0.0 0.017 0.005 0.049 0.062 0.075 0.070 0.051 0.085 0.040 0.046
0.5 0.028 0.011 0.054 0.066 0.072 0.071 0.046 0.090 0.047 0.050
0.8 0.040 0.013 0.060 0.079 0.077 0.070 0.034 0.108 0.047 0.052
250 -0.8 0.016 0.006 0.022 0.370 0.323 0.236 0.088 0.380 0.021 0.021
-0.5 0.036 0.012 0.058 0.075 0.089 0.079 0.063 0.086 0.049 0.054
0.0 0.021 0.006 0.047 0.059 0.062 0.062 0.045 0.069 0.039 0.042
0.5 0.034 0.009 0.056 0.059 0.068 0.069 0.045 0.078 0.047 0.051
0.8 0.044 0.014 0.059 0.066 0.062 0.058 0.042 0.081 0.050 0.053
AR Case: v_t = ρv_{t-1} + e_t, e_t ~ N(0, 1)

T ρ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.002 0.000 0.015 0.067 0.082 0.076 0.049 0.063 0.013 0.014
-0.5 0.010 0.003 0.037 0.067 0.076 0.071 0.046 0.087 0.028 0.033
0.5 0.029 0.012 0.064 0.082 0.087 0.082 0.053 0.108 0.051 0.057
0.8 0.047 0.025 0.090 0.091 0.081 0.077 0.053 0.125 0.068 0.077
150 -0.8 0.003 0.000 0.021 0.062 0.069 0.067 0.048 0.051 0.019 0.020
-0.5 0.013 0.004 0.039 0.060 0.074 0.069 0.045 0.072 0.031 0.035
0.5 0.030 0.010 0.055 0.063 0.069 0.066 0.047 0.081 0.046 0.051
0.8 0.043 0.017 0.070 0.067 0.071 0.069 0.047 0.092 0.055 0.063
250 -0.8 0.007 0.001 0.029 0.056 0.057 0.054 0.040 0.049 0.026 0.028
-0.5 0.022 0.006 0.046 0.052 0.057 0.058 0.047 0.062 0.040 0.042
0.5 0.028 0.008 0.052 0.056 0.061 0.061 0.045 0.070 0.043 0.046
0.8 0.037 0.010 0.060 0.060 0.061 0.063 0.051 0.073 0.051 0.056
Notes: The results reported for the $M_{mic}$ tests are those corresponding to $MZ_\alpha$. Since the numerical results for $MZ_\alpha$, $MZ_t$, and MSB are similar, we use the label M in the tables. Three variants of the M tests are considered. The $M_{mic}$ is the original $MZ_\alpha$, the $M^{GLS}_{mic}$ uses GLS detrended data to construct the statistic but uses least squares detrending in the estimation of $s_{AR}^2$, and $\bar M^{GLS}_{mic}$ uses GLS detrending at $c = -7.0$ when constructing $s_{AR}^2$. When the BIC is used, $k_{max} = 6$ and $k_{min} = 3$ as in ERS. For the MIC, $k_{max} = int(12(T/100)^{1/4})$, $k_{min} = 0$, and $C_T = 2$, giving the MAIC. The $DF_t^{GLS}$ is based on the 10% sequential t test for the significance of the last lag.

stantial size distortions. However, the PT and MPT show remarkably little size
distortions when the MAIC is used to select k.
To assess the relative power of the tests, we tabulate the rejection rates at $\alpha = 1 + c/T$. For $p = 0$ and $c = -7.0$, the values of $\alpha$ are 0.93, 0.953, and 0.972

TABLE V.B
SIZE-ADJUSTED POWER OF THE TESTS; p = 0

MA Case: v_t = e_t + θe_{t-1}, e_t ~ N(0, 1)

T θ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.193 0.237 0.311 0.241 0.419 0.412 0.385 0.313 0.326 0.311
-0.5 0.253 0.351 0.344 0.191 0.448 0.429 0.385 0.414 0.362 0.355
0.0 0.315 0.452 0.458 0.131 0.432 0.436 0.454 0.418 0.471 0.442
0.5 0.259 0.332 0.324 0.122 0.414 0.402 0.370 0.402 0.351 0.347
0.8 0.180 0.257 0.264 0.134 0.366 0.335 0.343 0.348 0.296 0.281
150 -0.8 0.180 0.252 0.296 0.262 0.448 0.417 0.415 0.381 0.307 0.292
-0.5 0.267 0.387 0.384 0.195 0.450 0.430 0.393 0.421 0.398 0.396
0.0 0.339 0.463 0.460 0.152 0.422 0.405 0.454 0.413 0.467 0.453
0.5 0.263 0.382 0.380 0.160 0.441 0.418 0.405 0.429 0.393 0.398
0.8 0.233 0.346 0.363 0.147 0.382 0.362 0.404 0.387 0.377 0.374
250 -0.8 0.190 0.319 0.334 0.287 0.441 0.420 0.437 0.419 0.335 0.327
-0.5 0.264 0.404 0.396 0.201 0.477 0.455 0.428 0.458 0.425 0.409
0.0 0.338 0.481 0.477 0.152 0.464 0.446 0.483 0.455 0.485 0.478
0.5 0.258 0.411 0.411 0.169 0.439 0.424 0.431 0.441 0.422 0.413
0.8 0.246 0.385 0.395 0.143 0.432 0.413 0.419 0.434 0.406 0.405
AR Case: v_t = ρv_{t-1} + e_t, e_t ~ N(0, 1)

T ρ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.246 0.349 0.350 0.154 0.433 0.429 0.404 0.384 0.362 0.342
-0.5 0.330 0.437 0.432 0.145 0.428 0.420 0.455 0.405 0.452 0.437
0.5 0.279 0.365 0.368 0.115 0.338 0.330 0.385 0.379 0.385 0.374
0.8 0.196 0.251 0.241 0.084 0.297 0.292 0.298 0.261 0.276 0.269
150 -0.8 0.275 0.398 0.416 0.161 0.477 0.437 0.432 0.442 0.419 0.409
-0.5 0.332 0.453 0.454 0.161 0.404 0.390 0.468 0.423 0.463 0.458
0.5 0.307 0.419 0.411 0.140 0.417 0.410 0.446 0.407 0.436 0.430
0.8 0.232 0.333 0.330 0.103 0.341 0.324 0.362 0.325 0.349 0.353
250 -0.8 0.325 0.479 0.472 0.157 0.485 0.480 0.493 0.468 0.477 0.471
-0.5 0.300 0.452 0.447 0.179 0.493 0.477 0.466 0.469 0.459 0.460
0.5 0.336 0.443 0.436 0.158 0.447 0.434 0.456 0.428 0.468 0.456
0.8 0.279 0.375 0.366 0.131 0.386 0.382 0.381 0.381 0.387 0.383
Note: Power is evaluated at α = 1 + c/T, c = -7.0. These are 0.93, 0.953, and 0.972 for T = 100, T = 150, and T = 250, respectively.

for $T = 100$, 150, and 250, respectively.$^9$ The size-adjusted power of the tests is reported in Table V.B. Although the tests should reject the null hypothesis 50%
of the time in large samples, none of the tests quite achieve the 50% (asymptotic)
target for sample sizes less than 200. Thus, in finite samples, the need to account
for serial correlation reduces the power of the tests across the board. However,
the GLS based tests have more power than tests that do not implement GLS

9 We also performed extensive power simulations with other values of a and T. The results are
qualitatively similar and do not affect the conclusions concerning the relative performance of the
different tests.

TABLE VI.A
SIZE OF THE TESTS; p = 1

MA Case: v_t = e_t + θe_{t-1}, e_t ~ N(0, 1)

T θ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.140 0.123 0.059 0.509 0.567 0.528 0.123 0.562 0.061 0.061
-0.5 0.037 0.025 0.035 0.092 0.107 0.111 0.069 0.145 0.036 0.036
0.0 0.011 0.005 0.018 0.075 0.074 0.078 0.039 0.161 0.017 0.018
0.5 0.032 0.022 0.048 0.084 0.085 0.094 0.038 0.190 0.046 0.047
0.8 0.048 0.033 0.056 0.096 0.105 0.109 0.017 0.250 0.053 0.056
150 -0.8 0.074 0.061 0.027 0.581 0.586 0.477 0.082 0.570 0.028 0.028
-0.5 0.032 0.017 0.028 0.087 0.103 0.104 0.049 0.114 0.028 0.029
0.0 0.017 0.008 0.023 0.066 0.067 0.072 0.038 0.111 0.023 0.024
0.5 0.025 0.016 0.032 0.068 0.084 0.085 0.029 0.142 0.032 0.034
0.8 0.058 0.035 0.058 0.080 0.088 0.085 0.018 0.181 0.055 0.057
250 -0.8 0.025 0.016 0.012 0.566 0.531 0.381 0.064 0.554 0.012 0.012
-0.5 0.031 0.016 0.033 0.094 0.105 0.099 0.048 0.103 0.034 0.035
0.0 0.023 0.011 0.029 0.056 0.059 0.065 0.038 0.083 0.030 0.031
0.5 0.032 0.012 0.037 0.062 0.066 0.070 0.028 0.097 0.036 0.037
0.8 0.059 0.032 0.059 0.062 0.067 0.060 0.026 0.119 0.059 0.060
AR Case: v_t = ρv_{t-1} + e_t, e_t ~ N(0, 1)

T ρ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic
100 -0.8 0.000 0.000 0.000 0.072 0.079 0.084 0.034 0.067 0.001 0.000
-0.5 0.006 0.003 0.012 0.071 0.074 0.077 0.037 0.122 0.013 0.013
0.5 0.033 0.021 0.042 0.076 0.083 0.088 0.038 0.178 0.040 0.043
0.8 0.081 0.056 0.093 0.081 0.086 0.091 0.043 0.210 0.080 0.085
150 -0.8 0.001 0.000 0.003 0.061 0.062 0.069 0.028 0.043 0.001 0.004
-0.5 0.017 0.006 0.024 0.059 0.070 0.075 0.038 0.090 0.014 0.023
0.5 0.037 0.018 0.044 0.056 0.070 0.074 0.039 0.127 0.041 0.043
0.8 0.045 0.022 0.056 0.063 0.066 0.073 0.043 0.142 0.056 0.055
250 -0.8 0.001 0.000 0.003 0.057 0.054 0.058 0.028 0.009 0.004 0.004
-0.5 0.017 0.006 0.024 0.046 0.047 0.053 0.038 0.025 0.023 0.023
0.5 0.037 0.018 0.044 0.059 0.063 0.068 0.039 0.046 0.042 0.043
0.8 0.045 0.022 0.056 0.054 0.056 0.061 0.043 0.054 0.052 0.055
Note: The results reported for the $M_{mic}$ tests are those corresponding to $MZ_\alpha$. Since the numerical results for $MZ_\alpha$, $MZ_t$, and MSB are similar, we use the label M in the tables. Three variants of the M tests are considered. The $M_{mic}$ is the original $MZ_\alpha$, the $M^{GLS}_{mic}$ uses GLS detrended data to construct the statistic but uses least squares detrending in the estimation of $s_{AR}^2$, and $\bar M^{GLS}_{mic}$ uses GLS detrending at $c = -13.5$ when constructing $s_{AR}^2$. When the BIC is used, $k_{max} = 6$ and $k_{min} = 3$ as in ERS. For the MIC, $k_{max} = int(12(T/100)^{1/4})$, $k_{min} = 0$, and $C_T = 2$, giving the MAIC. The $DF_t^{GLS}$ is based on the 10% t test for the significance of the last lag.

detrending. Furthermore, the power increase from T = 100 to T = 150 is quite

substantial, especially when we take into account the fact that we are evaluating
at a closer alternative at T = 150 than at T = 100.
For $p = 1$, the results for size are reported in Table VI.A. The size distortions are somewhat larger when there is a linear trend in the regression, but there are clearly gains from using GLS detrended data to construct $s_{AR}^2$ and selecting $k$

TABLE VI.B
SIZE-ADJUSTED POWER OF THE TESTS; p = 1

MA Case: v_t = e_t + θe_{t-1}, e_t ~ N(0, 1)

T θ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.296 0.266 0.348 0.283 0.398 0.414 0.357 0.355 0.350 0.348
-0.5 0.320 0.246 0.300 0.258 0.421 0.420 0.324 0.324 0.311 0.305
0.0 0.450 0.266 0.439 0.175 0.362 0.341 0.432 0.488 0.448 0.442
0.5 0.196 0.147 0.207 0.161 0.316 0.310 0.259 0.320 0.224 0.218
0.8 0.149 0.166 0.154 0.169 0.282 0.269 0.254 0.260 0.163 0.155
150 -0.8 0.236 0.269 0.316 0.309 0.444 0.416 0.354 0.326 0.316 0.314
-0.5 0.283 0.348 0.325 0.291 0.434 0.415 0.358 0.384 0.333 0.330
0.0 0.407 0.470 0.457 0.200 0.384 0.352 0.462 0.369 0.466 0.465
0.5 0.312 0.347 0.348 0.212 0.365 0.365 0.375 0.354 0.344 0.348
0.8 0.171 0.209 0.217 0.201 0.317 0.310 0.309 0.302 0.227 0.225
250 -0.8 0.247 0.285 0.296 0.371 0.460 0.430 0.375 0.406 0.300 0.297
-0.5 0.308 0.370 0.366 0.278 0.459 0.441 0.392 0.422 0.374 0.371
0.0 0.380 0.458 0.451 0.250 0.437 0.418 0.447 0.432 0.455 0.456
0.5 0.311 0.374 0.365 0.258 0.426 0.406 0.388 0.441 0.367 0.363
0.8 0.225 0.268 0.271 0.233 0.364 0.354 0.310 0.388 0.280 0.282
AR Case: v_t = ρv_{t-1} + e_t, e_t ~ N(0, 1)

T ρ M_mic M^GLS_mic Mbar^GLS_mic DF_bic DF^GLS_bic DF^GLS_t DF^GLS_mic PT_bic PT^GLS_mic MPT^GLS_mic

100 -0.8 0.257 0.308 0.296 0.204 0.380 0.372 0.385 0.256 0.301 0.299
-0.5 0.332 0.396 0.382 0.205 0.367 0.366 0.431 0.303 0.398 0.393
0.5 0.235 0.265 0.270 0.133 0.262 0.264 0.286 0.247 0.279 0.280
0.8 0.114 0.127 0.129 0.098 0.167 0.165 0.206 0.144 0.150 0.144
150 -0.8 0.311 0.363 0.349 0.240 0.435 0.409 0.409 0.345 0.351 0.347
-0.5 0.388 0.433 0.423 0.240 0.404 0.376 0.437 0.385 0.438 0.429
0.5 0.296 0.338 0.345 0.207 0.338 0.313 0.366 0.333 0.358 0.355
0.8 0.189 0.211 0.218 0.133 0.239 0.228 0.251 0.211 0.228 0.224
250 -0.8 0.379 0.452 0.441 0.245 0.472 0.456 0.468 0.464 0.446 0.442
-0.5 0.372 0.453 0.451 0.300 0.509 0.484 0.445 0.445 0.452 0.457
0.5 0.331 0.387 0.385 0.233 0.391 0.372 0.392 0.401 0.395 0.388
0.8 0.259 0.280 0.288 0.189 0.314 0.292 0.289 0.312 0.285 0.284
Note: Power is evaluated at α = 1 + c/T, c = -13.5. These are 0.865, 0.910, and 0.946 for T = 100, T = 150, and T = 250, respectively.

with the MAIC. The resulting $\bar M^{GLS}$ tests have size close to the nominal size for all parameter values, while the M and $M^{GLS}$ tests continue to have size close to .15. As well, the MAIC yields exact sizes for the $DF^{GLS}$ and the $P_T$ tests that are far more accurate than those obtained using the sequential t test and the BIC.
The size-adjusted power of the tests for $p = 1$ is reported in Table VI.B. Power is again evaluated at $\alpha = 1 + c/T$. For $c = -13.5$, $\alpha$ is 0.865 for $T = 100$, 0.910 for $T = 150$, and 0.946 for $T = 250$. Compared to the size-adjusted
LAG LENGTH SELECTION 1543

power reported in Table VB, all GLS based tests have lower power when p = 1.
However, there are non-negligible power gains from GLS detrending.
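
The local alternatives quoted above follow directly from α = 1 + c/T. A minimal sketch of the arithmetic is given below; the function name is ours and purely illustrative.

```python
def local_alternative_root(c: float, T: int) -> float:
    """Autoregressive root alpha = 1 + c/T at which power is evaluated."""
    return 1.0 + c / T

# Values of alpha for c = -13.5 at the three sample sizes used in the tables.
for T in (100, 150, 250):
    print(T, round(local_alternative_root(-13.5, T), 3))
```

The alternative moves closer to the unit root as T grows, so larger samples are tested against a root that is harder to distinguish from one.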
A feature of interest is that the MIC allows the $M^{GLS}$ tests to have an exact size
that is closer to 5% than fixing k at k* assuming a known DGP. For example,
in the MA case with θ = -.8 and p = 1, the size is .059 using the MIC and .159
using a value of k fixed at k* = 6. When T = 150, the corresponding values
are .027 with the MAIC and .093 at k* = 8 (not reported). Hence, the MAIC
chooses a k that adapts to the particular sample analyzed. Size improvements
obtained using the MIC compared to fixing k at k* are even larger when the
number of deterministic components increases (see Perron and Rodriguez (1998)
for the case of a trend with a change in slope).
The size reported for $M^{GLS}$ is remarkable, given that when θ < 0, the Z tests
based on kernel estimates of σ² reject with probability close to one under the null
hypothesis. The DF tests have higher size-unadjusted power in the case of negative
MA errors. Such high probabilities of rejections are misleading, however, as
they reflect size distortions. Although the $DF^{GLS}_{bic}$ has higher size-adjusted power
than the $M^{GLS}_{mic}$ for some parameter values, this result must also be put into
perspective. Consider θ = -0.8, T = 100, and p = 1. While the size-adjusted power of
the $DF^{GLS}_{bic}$ is higher than that of the $M^{GLS}_{mic}$ by 5 basis points, its size is more distorted by
45 basis points. This is a substantial increase in the probability of a Type I error.
The $DF^{GLS}_{mic}$ fares better, but there is still a size advantage in favor of the $M^{GLS}_{mic}$.
Overall, the $M^{GLS}_{mic}$ tests have better sizes, while the $DF^{GLS}$ has better power.
A drawback of the $M^{GLS}_{mic}$ test is that it is substantially undersized for pure AR
processes with a coefficient close to -1. Although economic time series rarely
display such a property, this nonetheless translates into lower (size unadjusted)
power. As discussed in Section 6.2, this is not due to a weakness of the MIC in
selecting k, but rather is an inherent property of the $M^{GLS}$ tests. The trade-offs
between the $DF^{GLS}$ and the $M^{GLS}_{mic}$ are to be determined by the practitioner.

8. EMPIRICAL ILLUSTRATIONS

To illustrate the ease of application of our procedures and the differences in
inference that can result, we consider inflation series from the G7 countries. Data
for the GDP deflator are taken from the OECD International Sector Database.
We constructed annualized quarterly inflation rate series covering the period
1960:2 to 1997:2 (T + 1 = 149 observations) as (400 times) the logged differences
of successive quarters. All estimation results are presented in Table VII. As a pre-
liminary step, we first estimated the simple ARMA(1, 1) model for each series.
The results show autoregressive roots that are indeed close to one (ranging from
.92 for Germany to .98 for France). Also, the MA coefficients are negative for
all series, ranging from -.27 for Italy to -.72 for France. This suggests that such
series are likely affected by the types of problems we discussed.
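
The construction of the inflation series described above (400 times the log difference of successive quarterly deflator values) can be sketched as follows. The function name and the toy input are illustrative assumptions; the OECD data themselves are not reproduced here.

```python
import numpy as np

def annualized_inflation(deflator):
    """Annualized quarterly inflation: 400 * (log p_t - log p_{t-1})."""
    p = np.asarray(deflator, dtype=float)
    return 400.0 * np.diff(np.log(p))

# Toy quarterly deflator growing 1% per quarter (hypothetical numbers).
example = annualized_inflation([100.0, 101.0, 102.01])
```

A series of n deflator observations yields n - 1 inflation observations, since one observation is lost to differencing.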
The first step to constructing the tests is to generate GLS detrended series
as defined by (6) with c = -7 as the noncentrality parameter (since inflation
series are nontrending; otherwise use c = -13.5). The next step is to estimate by

TABLE VII
EMPIRICAL RESULTS FOR INFLATION SERIES FROM THE G7 COUNTRIES

                  USA     Canada   UK        Japan     Italy    France   Germany

AR(α):            .95     .95      .93       .87       .92      .98      .92
MA(θ):            -.32    -.51     -.55      -.42      -.27     -.72     -.54

BIC    k          2       1        1         1         0        3        3
       Z^GLS      -4.04   -9.63b   -26.81a   -34.12a   -7.39c   -5.70c   -26.58a
       MZ^GLS     -3.71   -8.13b   -18.05a   -23.95a   -7.17c   -3.49    -16.57a
       DF^GLS     -1.37   -2.07b   -3.31a    -3.85a    -1.95c   -1.36    -3.00a
       P_T^GLS    7.39    3.36c    1.40a     1.01a     4.06c    7.92     1.96b
       MP_T^GLS   6.60    3.12b    1.41a     1.02a     3.42c    7.01     1.98b

MAIC   k          2       2        8         12        5        7        5
       Z^GLS      -4.04   -6.98c   -14.75a   -13.35a   -2.71    -3.27    -18.83a
       MZ^GLS     -3.71   -5.48    -6.00c    -3.17     -2.49    -1.07    -8.82b
       DF^GLS     -1.37   -1.69c   -1.77c    -1.33     -1.08    -0.77    -2.21b
       P_T^GLS    7.39    4.92     4.13c     7.64      11.68    25.85    3.42c
       MP_T^GLS   6.60    4.58     4.17c     7.73      9.83     22.89    3.44c

Note: a, b, and c denote a statistic significant at the 1%, 5%, and 10% level, respectively.

OLS the autoregression (8) for all values of k ranging from 0 to some maximal
order, say $k_{\max}$. We use $k_{\max} = \mathrm{int}(12(T/100)^{1/4})$ but other values are valid. In
the model selection stage, each autoregression uses the same number of effective
observations, namely $T - k_{\max}$. The selected $k_{maic}$ is obtained as the value of
k that minimizes (12). Given $k_{maic}$, the $DF^{GLS}$ is obtained as the t-statistic on
$\beta_0 = 0$ from re-estimation of (8) with $T - k_{maic}$ effective observations. The same
regression is used to construct $s^2_{AR}$ according to (5). Then, the $M^{GLS}$ tests are
constructed from (3) and (4), the $P_T^{GLS}$ test from (7), and the $MP_T^{GLS}$ test from
(9), all with GLS detrended series $\tilde{y}_t$. Using the BIC, the steps are the same
except that the term $\tau_T(k)$ is omitted from (12).
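
The steps just described can be sketched in code. The following is an illustrative outline only, not the authors' implementation. Equations (6), (8), and (12) are not reproduced in this section, so the sketch assumes the usual forms: quasi-differencing at $1 + \bar{c}/T$ for the GLS detrending, an autoregression of $\Delta\tilde{y}_t$ on $\tilde{y}_{t-1}$ and k lagged differences, and MAIC(k) = $\ln(\hat\sigma_k^2) + 2(\tau_T(k) + k)/(T - k_{\max})$ with $\tau_T(k) = \hat\beta_0^2 \sum \tilde{y}_{t-1}^2/\hat\sigma_k^2$. All function names are ours.

```python
import numpy as np

def gls_detrend(y, cbar=-7.0, trend=False):
    """Local GLS detrending (assumed form of (6)): quasi-difference, fit, subtract."""
    y = np.asarray(y, dtype=float)
    n = y.size
    abar = 1.0 + cbar / n                      # sample size used for T (assumption)
    z = np.column_stack([np.ones(n), np.arange(1.0, n + 1)]) if trend else np.ones((n, 1))
    yq = np.r_[y[0], y[1:] - abar * y[:-1]]    # quasi-differenced data
    zq = np.vstack([z[:1], z[1:] - abar * z[:-1]])
    psi = np.linalg.lstsq(zq, yq, rcond=None)[0]
    return y - z @ psi

def adf_design(yd, k, start):
    """Regressand and regressors of the autoregression, first usable index = start >= k."""
    dy = np.diff(yd)                           # dy[i] = yd[i+1] - yd[i]
    cols = [yd[start:-1]] + [dy[start - j:-j] for j in range(1, k + 1)]
    return np.column_stack(cols), dy[start:]

def select_k_maic(yd, kmax):
    """Minimize the assumed MAIC over k = 0..kmax on a common sample of T - kmax points."""
    best_k, best_ic = 0, np.inf
    for k in range(kmax + 1):
        X, lhs = adf_design(yd, k, start=kmax)
        b = np.linalg.lstsq(X, lhs, rcond=None)[0]
        e = lhs - X @ b
        s2 = e @ e / lhs.size
        tau = b[0] ** 2 * (X[:, 0] @ X[:, 0]) / s2
        ic = np.log(s2) + 2.0 * (tau + k) / lhs.size
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k

def df_gls(y, cbar=-7.0, trend=False):
    """Return (selected k, t-statistic on the lagged level) after re-estimation."""
    yd = gls_detrend(y, cbar, trend)
    T = yd.size - 1
    kmax = int(12 * (T / 100.0) ** 0.25)
    k = select_k_maic(yd, kmax)
    X, lhs = adf_design(yd, k, start=k)        # T - k effective observations
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ lhs
    e = lhs - X @ b
    s2 = e @ e / (lhs.size - X.shape[1])       # degrees-of-freedom choice is ours
    return k, b[0] / np.sqrt(s2 * XtX_inv[0, 0])
```

Calling `df_gls(series)` on a nontrending series uses $\bar{c} = -7$; for trending data one would pass `cbar=-13.5, trend=True`. The key design point, taken from the text, is that all candidate values of k are compared on the same $T - k_{\max}$ observations, while the final statistic is recomputed on the longer sample.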
The results show interesting differences between using the MAIC or the BIC
to select the lag length. First, for all countries, except the U.S., the selected lag
length is higher with MAIC than with BIC. Second, in all cases (except the U.S.),
the evidence in favor of stationarity is weaker with the MAIC than with the BIC.
For example, with the BIC, the $MZ^{GLS}$ and $DF^{GLS}$ tests suggest a rejection at
the 1% significance level for Japan and Germany, at the 5% level for Canada,
and at the 10% level for Italy. When using the MAIC, there are only rejections
at the 5% level for Germany (showing that the tests still have power) and at
the 10% level for the U.K. For Canada, the $DF^{GLS}$ is significant at the 10%
level but barely. For the U.S. and France, there is no difference in outcome. Of
interest also is the fact that the $Z^{GLS}$ test based on the MAIC still shows strong rejections at the 1%
level for Japan, the U.K., and Germany. This accords with the simulation results
that the $Z^{GLS}$ test has high size distortions even when constructed using the
MAIC.

9. CONCLUSION

The analysis of ERS shows that detrending data by GLS yields power gains
for unit root tests. We find that GLS detrending also allows for a more pre-
cise autoregressive spectral density estimate and ensures that it is invariant to
the parameters of the trend function. We also show that the BIC or AIC are
not sufficiently flexible for unit root tests. We suggest a modification, the MAIC,
which is evidently more robust when there are negative moving-average errors.
Use of the MAIC in conjunction with GLS detrended data results in a class
of MGLS tests that have good size and power. The key distinction between the
MAIC and standard information criteria is that the former takes account of the
fact that the bias in the estimate of the sum of the autoregressive coefficients is
highly dependent on k. In this paper, we show that the MAIC is useful for the
DFGLS test and the construction of the autoregressive estimate of the spectral
density at frequency zero. We also expect the MAIC, or the more general MIC,
to be useful in a broader range of applications because macroeconomic data are
known to be highly persistent, and in such cases, the bias in the estimate of the
sum of the autoregressive coefficients should depend on the order of the autore-
gression. However, it should also be noted that while the suggested information
criterion is useful in the context of unit root tests as we have shown, it may not
be appropriate when selecting the lag order in other contexts. The general use-
fulness of the MIC also awaits further investigation.

Dept. of Economics, Johns Hopkins University, Baltimore, MD 21218, U.S.A.;
Serena.Ng@jhu.edu
and
Dept. of Economics, Boston University, 270 Bay State Rd., Boston, MA 02215,
U.S.A.; perron@bu.edu

Manuscript received May, 1997; final revision received August, 2000.

APPENDIX

PROOF OF THEOREM 2: In this Appendix, for simplicity, we consider the derivations in the case
where the DGP is (14) and no deterministic component is present in the regression (8), in which
case $y_t$ is used instead of $\tilde{y}_t$. All the results carry over to the more general case with least-squares
or GLS detrending. With GLS detrending, the stated results remain exactly the same when $p = 0$
and are as stated in the text with $V_{c,\bar{c}}(r)$ replacing $J_c(r)$ when $p = 1$. As a matter of notation, we
let $\Rightarrow$ denote weak convergence in distribution and $\to$ convergence in probability. Also, to alleviate
notational burden, we let $N = T - k_{\max}$. Note that we can write the DGP as

$$\Delta y_t = -\sum_{i=1}^{k} (-\theta_T)^i \Delta y_{t-i} + \frac{c}{T}\sum_{i=0}^{k} (-\theta_T)^i y_{t-i-1} + e_t - (-\theta_T)^{k+1} e_{t-k-1}.$$

Let

$$z_{t-j} = \sum_{i=0}^{k-j} (-\theta_T)^i \Delta y_{t-i-j} = \frac{c}{T}\sum_{i=0}^{k-j} (-\theta_T)^i y_{t-i-j-1} + e_{t-j} - (-\theta_T)^{k+1-j} e_{t-k-1};$$

the DGP can also be written as

$$\Delta y_t = \frac{c}{T} y_{t-1} - (-\theta_T) z_{t-1} + \frac{c}{T}\sum_{i=1}^{k} (-\theta_T)^i y_{t-i-1} + e_t - (-\theta_T)^{k+1} e_{t-k-1} \tag{A.1}$$

$$\phantom{\Delta y_t} = \beta_0 y_{t-1} + \sum_{i=1}^{k} \beta_i z_{t-i} + e_{tk}$$

with $\beta_0 = c/T$, $\beta_1 = \theta_T$, $\beta_i = 0$ for $i > 1$, and

$$e_{tk} = \frac{c}{T}\sum_{i=1}^{k} (-\theta_T)^i y_{t-i-1} + e_t - (-\theta_T)^{k+1} e_{t-k-1}. \tag{A.2}$$

Since $\{z_{t-i}, i = 1, \ldots, k\}$ is a linear function of $\{\Delta y_{t-i}, i = 1, \ldots, k\}$, the OLS estimate of $\beta_0$ and
the sum of squared residuals from the regression

$$\Delta y_t = \beta_0 y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + e_{tk} \tag{A.3}$$

have the same properties as the corresponding quantities from estimating (A.1). We start with a
Lemma that will be used extensively.
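
The first rewriting of the DGP above is an exact algebraic identity, which can be checked numerically. The sketch below assumes that (14) takes the form $y_t = (1 + c/T) y_{t-1} + e_t + \theta_T e_{t-1}$ with $\theta_T = -1 + \delta/\sqrt{T}$, which is consistent with the expressions in this Appendix but is not restated in this section; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, c, delta, k = 200, -5.0, 2.0, 4
theta = -1.0 + delta / np.sqrt(T)            # assumed local-to-(-1) MA coefficient

# Simulate the assumed DGP with y_0 = 0.
e = rng.standard_normal(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    y[t] = (1.0 + c / T) * y[t - 1] + e[t] + theta * e[t - 1]

dy = np.full(T + 1, np.nan)
dy[1:] = y[1:] - y[:-1]                      # dy[t] = y_t - y_{t-1}

def rewritten_rhs(t):
    """Right-hand side of the rewritten DGP for a given date t >= k + 1."""
    w = (-theta) ** np.arange(k + 1)
    ar_part = -sum(w[i] * dy[t - i] for i in range(1, k + 1))
    level_part = (c / T) * sum(w[i] * y[t - i - 1] for i in range(k + 1))
    return ar_part + level_part + e[t] - (-theta) ** (k + 1) * e[t - k - 1]

max_gap = max(abs(dy[t] - rewritten_rhs(t)) for t in range(k + 1, T + 1))
```

Here `max_gap` should be zero up to floating-point rounding, since the identity follows from multiplying the DGP by $\sum_{i=0}^{k}(-\theta_T)^i L^i$.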

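LEMMA A.1: Let $J_c(r) = \int_0^r \exp((r - s)c)\,dW(s)$, $K_{3T} = o(T)$, and $\delta_{ij} = 1$ if $i = j$ and 0 otherwise
with $i, j = 1, \ldots, k$. Then, uniformly for $i, j \le k_{\max} \le K_{3T} = o(T)$:
(a) $N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1}^2 = \sigma_e^2\left(1 + \delta^2\int_0^1 J_c(r)^2\,dr\right) + O_p(T^{-1/2})$;
(b) $N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1} z_{t-j} = \sigma_e^2\delta_{1j} + O_p(T^{-1/2})$;
(c) $N^{-1}\sum_{t=k_{\max}+1}^{T} z_{t-i} z_{t-j} = \sigma_e^2\left(\delta_{ij} + (-\theta_T)^{2(k+1)-i-j}\right) + O_p(T^{-1/2})$;
(d) $N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t y_{t-1} = -\sigma_e^2 + O_p(T^{-1/2})$;
(e) $N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-j} = -\delta_{1j}\sigma_e^2 + O_p(T^{-1/2})$.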

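PROOF: From Nabeya and Perron (1994), we can write $y_t = a_T e_t + b_T X_t$ where $X_t = (1 + c/T)X_{t-1} + e_t$,
$a_T = (-\theta_T)(1 + c/T)^{-1}$, $b_T = 1 - (1 + c/T)^{-1}(-\theta_T)$, $a_T \to 1$, $T^{1/2} b_T \to \delta$ as $T \to \infty$. Also,
$\sum_{t=1}^{T} X_{t-1} e_t = O_p(T)$. For part (a), we have:

$$N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1}^2 = N^{-1}\sum_{t=k_{\max}+1}^{T} \left(a_T^2 e_{t-1}^2 + b_T^2 X_{t-1}^2 + 2 a_T b_T X_{t-1} e_{t-1}\right) = \sigma_e^2\left(1 + \delta^2\int_0^1 J_c(r)^2\,dr\right) + O_p(T^{-1/2}).$$

For part (b), we have

$$N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1} z_{t-j} = N^{-1}\sum_{t=k_{\max}+1}^{T} \left(a_T e_{t-1} + b_T X_{t-1}\right)\left(\frac{c}{T}\sum_{i=0}^{k-j} (-\theta_T)^i y_{t-i-j-1} + e_{t-j} - (-\theta_T)^{k+1-j} e_{t-k-1}\right).$$

Since $X_t = \sum_{j=1}^{t} \exp((t-j)c/T) e_j$, $X_{t-1} = X_{t-k-1} + \sum_{j=t-k}^{t-1} \exp((t-j)c/T) e_j$. We have
$\sum_{t=k_{\max}+1}^{T} e_t e_s = O_p(T^{1/2})$ for $t \ne s$, $T^{1/2} b_T = O_p(1)$, and $\sum_{t=k_{\max}+1}^{T} X_{t-1} e_t = O_p(T)$; hence,
$N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1} z_{t-1} - \sigma_e^2 = O_p(T^{-1/2})$ and $N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1} z_{t-j} = O_p(T^{-1/2})$ for $j > 1$,
uniformly in $0 < i, j \le k_{\max} \le K_{3T}$.
For part (c),

$$N^{-1}\sum_{t=k_{\max}+1}^{T} z_{t-i} z_{t-j} = N^{-1}\sum_{t=k_{\max}+1}^{T} \left[e_{t-i} - e_{t-k-1}(-\theta_T)^{k+1-i}\right]\left[e_{t-j} - e_{t-k-1}(-\theta_T)^{k+1-j}\right] + O_p(T^{-1/2})$$

$$= \sigma_e^2\left(\delta_{ij} + (-\theta_T)^{2(k+1)-i-j}\right) + O_p(T^{-1/2}).$$

Part (d) follows using the fact that

$$N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t y_{t-1} = \frac{c}{T} N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1}^2 + N^{-1}\sum_{t=k_{\max}+1}^{T} \left(a_T e_{t-1} + b_T X_{t-1}\right)\left(e_t + \theta_T e_{t-1}\right),$$

and part (e) using the fact that

$$N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-j} = N^{-1}\sum_{t=k_{\max}+1}^{T} \left(\frac{c}{T} y_{t-1} + e_t - (-\theta_T) e_{t-1}\right)\left(\frac{c}{T}\sum_{i=0}^{k-j} (-\theta_T)^i y_{t-i-j-1} + e_{t-j} - (-\theta_T)^{k+1-j} e_{t-k-1}\right).$$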

The proof of Theorem 2 follows directly from the following Lemma concerned with the limit of
$\hat\beta_0$, the OLS estimate of $\beta_0$ from the regression (A.1), which is equivalent to the OLS estimate of
$\beta_0$ from the autoregression (8).

LEMMA A.2: Consider the OLS estimate of $\beta_0$ from the autoregression (8). Let $x = \delta^2\int_0^1 J_c(r)^2\,dr$
when $p = 0$ and $x = \delta^2\int_0^1 V_{c,\bar{c}}(r)^2\,dr$ when $p = 1$. Then, (i) uniformly in $0 < k \le K_{1T}$: $k\hat\beta_0 \Rightarrow -x^{-1}$;
(ii) if $k/T^{1/2} \to \kappa$, $T^{1/2}\hat\beta_0 \Rightarrow A$, where $A$ is a random variable with $E(A) = -\vartheta(\kappa)E(x^{-1})$, $\vartheta(\kappa) =
2\delta\exp(-2\kappa\delta)/(1 - \exp(-2\kappa\delta))$; (iii) uniformly in $K_{2T} \le k \le K_{3T}$, $T^{1/2}\hat\beta_0 = O_p(1)$.

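(i) Case $0 < k \le K_{1T}$. Let $X_t = (y_{t-1}, Z_t')'$, $Z_t = (z_{t-1}, \ldots, z_{t-k})'$. Define the following two $(k + 1)$
by $(k + 1)$ matrices:

$$R_T = N^{-1}\sum_{t=k_{\max}+1}^{T} X_t X_t' = N^{-1}\begin{bmatrix} \sum_{t=k_{\max}+1}^{T} y_{t-1}^2 & \sum_{t=k_{\max}+1}^{T} y_{t-1} Z_t' \\ \sum_{t=k_{\max}+1}^{T} Z_t y_{t-1} & \sum_{t=k_{\max}+1}^{T} Z_t Z_t' \end{bmatrix}, \qquad R = \sigma_e^2\begin{bmatrix} 1 + x & e' \\ e & I_k + \iota\iota' \end{bmatrix},$$

where $x = \delta^2\int_0^1 J_c(r)^2\,dr$, $e' = (1, 0_{1\times(k-1)})$ and $\iota$ is a $k \times 1$ vector of ones. The inverse of $R$ is

$$R^{-1} = \sigma_e^{-2}\,\frac{1}{1 + (k+1)x}\begin{bmatrix} k+1 & -k & 1 & 1 & \cdots & 1 \\ -k & k(1+x) & -(1+x) & -(1+x) & \cdots & -(1+x) \\ 1 & -(1+x) & 1+kx & -x & \cdots & -x \\ 1 & -(1+x) & -x & 1+kx & \cdots & -x \\ \vdots & \vdots & \vdots & & \ddots & \vdots \\ 1 & -(1+x) & -x & -x & \cdots & 1+kx \end{bmatrix}.$$

In what follows, the structure of $R^{-1}$ will be exploited. Let

$$V_T = \left(N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t y_{t-1},\; N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t Z_t'\right)', \qquad V = (-1, -1, 0_{1\times(k-1)})'.$$

Note that $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)' = R_T^{-1} V_T$.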
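
The first row of $R^{-1}$, which drives the limit of $k\hat\beta_0$, can be verified numerically for a given $k$ and $x$. The sketch below sets $\sigma_e^2 = 1$ and picks arbitrary illustrative values; the helper name is ours.

```python
import numpy as np

def limit_moment_matrix(k, x):
    """R for case (i) with sigma_e^2 = 1: [[1 + x, e'], [e, I_k + ones]]."""
    R = np.zeros((k + 1, k + 1))
    R[0, 0] = 1.0 + x
    R[0, 1] = R[1, 0] = 1.0
    R[1:, 1:] = np.eye(k) + np.ones((k, k))
    return R

k, x = 5, 0.7
R_inv = np.linalg.inv(limit_moment_matrix(k, x))
scale = 1.0 + (k + 1) * x
first_row_closed_form = np.r_[k + 1, -k, np.ones(k - 1)] / scale
V = np.r_[-1.0, -1.0, np.zeros(k - 1)]
leading_term = R_inv[0] @ V          # e' R^{-1} V, expected to equal -1 / scale
```

Multiplying `leading_term` by $k$ gives $-k/(1 + (k+1)x)$, which approaches $-x^{-1}$ as $k$ grows; this is the term labelled D in the argument that follows.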

LEMMA A.3: (a) $\max_{0<k\le K_{1T}} \|R_T - R\| = O_p(K_{1T}/\sqrt{T})$; (b) $\max_{0<k\le K_{1T}} \|R_T^{-1} - R^{-1}\| = O_p(K_{1T}/\sqrt{T})$.

PROOF: Using Lemma A.1, we have $R_{T,ij} - R_{ij} = C_{ij}$ where $C_{ij} = O_p(T^{-1/2})$ uniformly in $0 < k \le K_{1T}$
for $i, j = 1, \ldots, k + 1$ and part (a) follows. To prove part (b), note that the lower $k \times k$ submatrix
of $R$ is, say, $R_k = (I_k + \iota\iota')$. Then $(I_k + \iota\iota')^{-1} = I_k - (1 + \iota'\iota)^{-1}\iota\iota'$ has eigenvalues of 1 and $(1 + \iota'\iota)^{-1}$.
Hence $\max_{0<k\le K_{1T}} \|R_k^{-1}\| = O_p(1)$. Using arguments as in Perron and Ng (1998), the $(k + 1) \times (k + 1)$
matrix $R$ also has eigenvalues of the same order as $R_k$. Hence $\max_{0<k\le K_{1T}} \|R^{-1}\| = O_p(1)$. Let $Q_T =
\|R_T^{-1} - R^{-1}\|$. Then $\max_{0<k\le K_{1T}} Q_T \le \max_{0<k\le K_{1T}} \|R_T^{-1}\|\,\|R_T - R\|\,\|R^{-1}\| = O_p(K_{1T}/\sqrt{T})$ by arguments
analogous to Berk (1974).

LEMMA A.4: (a) $\max_{0<k\le K_{1T}} \|V_T - V\| = O_p(\sqrt{K_{1T}/T})$; (b) $\max_{0<k\le K_{1T}} \|V_T\| = O_p(1)$.

PROOF: Part (a) follows directly from Lemma A.1 (d, e). Since $V = (-1, -1, 0_{1\times(k-1)})'$,
$\max_{0<k\le K_{1T}} \|V\| = O_p(1)$ and $\max_{0<k\le K_{1T}} \|V_T\| = O_p(1)$ in view of part (a) of the Lemma.

Let $e' = (1, 0_{1\times k})$. We have

$$\hat\beta_0 = e'R_T^{-1}V_T = e'(R_T^{-1} - R^{-1})(V_T - V) + e'(R_T^{-1} - R^{-1})V + e'R^{-1}(V_T - V) + e'R^{-1}V$$

$$= A + B + C + D.$$

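For a matrix $M$, denote the $i, j$th element of $M^{-1}$ by $m^{ij}$. We now consider the limit of each of the
four terms. For A,

$$\max_{0<k\le K_{1T}} \|k e'(R_T^{-1} - R^{-1})(V_T - V)\|^2 \le K_{1T}^2 \sum_{i=1}^{k+1} \|r_T^{1i} - r^{1i}\|^2\,\|V_{i,T} - V_i\|^2 \le \frac{K_{1T}^2}{T}\sum_{i=1}^{k+1} \|r_T^{1i} - r^{1i}\|^2 \le O_p\!\left(\frac{K_{1T}^4}{T^2}\right) \to 0,$$

since $\sum_{i=1}^{k+1} \|r_T^{1i} - r^{1i}\|^2 \le \|R_T^{-1} - R^{-1}\|^2 = O_p(k^2/T) \le O_p(K_{1T}^2/T)$. For the term B, we need
$\max_{0<k\le K_{1T}} \|k e'(R_T^{-1} - R^{-1})V\|$. Recall that $R_T = R + C$, where $C_{ij} = O_p(T^{-1/2})$, $i, j = 1, \ldots, k + 1$.
Then $R_T^{-1} = R^{-1} - R^{-1}CR^{-1} + O_p(T^{-1})$. We have (ignoring, without loss, the $O_p(T^{-1})$ term):

$$\|e'(R_T^{-1} - R^{-1})V\| = \|e'R^{-1}CR^{-1}V\| = \left\|\sum_{j=1}^{k+1} (r^{1j} + r^{2j})\sum_{i=1}^{k+1} r^{1i} C_{ij}\right\|$$

$$\le \sum_{j=1}^{k+1} \|r^{1j} + r^{2j}\| \sum_{i=1}^{k+1} \|r^{1i}\|\,\|C_{ij}\| \le c^* \sum_{j=1}^{k+1} \|r^{1j} + r^{2j}\| \sum_{i=1}^{k+1} \|r^{1i}\|,$$

where $c^* = \max_{i,j} C_{ij}$. But the sum of the norms of the elements of the first row of $R^{-1}$ is
$[2k + (k - 1)][1 + (k + 1)x]^{-1} = O_p(1)$, and the sum of the norms of the sum of the first two
elements of each row of $R^{-1}$ is given by $(1 + kx + (k - 1)x)/(1 + (k + 1)x) = O_p(1)$. Since $c^* =
O_p(T^{-1/2})$, $\max_{0<k\le K_{1T}} \|k e'(R_T^{-1} - R^{-1})V\| = O_p(K_{1T}/T^{1/2})$. For the term C, first note that

$$(V_T - V)' = \left(N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t y_{t-1} + 1,\; N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-1} + 1,\; N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-2},\; \ldots,\; N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-k}\right).$$

Thus,

$$\max_{0<k\le K_{1T}} \|k e'R^{-1}(V_T - V)\| \le \frac{K_{1T}}{1 + (K_{1T} + 1)x}\left[(K_{1T} + 1)\left(N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t y_{t-1} + 1\right) - K_{1T}\left(N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-1} + 1\right) + \sum_{j=2}^{k} N^{-1}\sum_{t=k_{\max}+1}^{T} \Delta y_t z_{t-j}\right]$$

$$\le O_p\!\left(\frac{K_{1T}}{T^{1/2}}\right) \to 0.$$

For D,

$$\max_{0<k\le K_{1T}} (k e'R^{-1}V) = -\frac{K_{1T}}{1 + (K_{1T} + 1)x} \to -\frac{1}{x} = -\left(\delta^2\int_0^1 J_c(r)^2\,dr\right)^{-1}.$$

Combining the results, we have $k\hat\beta_0 = o_p(1) + o_p(1) + o_p(1) + O_p(1) = O_p(1)$. Thus, $k\hat\beta_0 \Rightarrow
-\left(\delta^2\int_0^1 J_c(r)^2\,dr\right)^{-1}$ uniformly in $0 < k \le K_{1T}$.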

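(ii) Case $k/T^{1/2} \to \kappa$:

For the case where $k/T^{1/2} \to \kappa$, we first note that the relevant matrix $R$ is given by

$$R = \sigma_e^2\begin{bmatrix} 1 + x & e' \\ e & I_k + \mu\mu' \end{bmatrix},$$

where $\mu' = [(-\theta_T)^k, \ldots, (-\theta_T)]$. The limiting vector $V$ remains unchanged. Now we have $\sqrt{T}\hat\beta_0 =
\sqrt{T}e'R^{-1}V + \sqrt{T}W$, where $W = A + B + C$. It is easy to see that $E(\sqrt{T}W) = 0$ in large samples. So,
we need to evaluate $\sqrt{T}e'R^{-1}V$, (minus) the sum of the first two elements of the first row of $R^{-1}$. By partitioned
matrix inversion, these are given by

$$R^{11} = \left[(1 + x) - e'(I_k + \mu\mu')^{-1}e\right]^{-1},$$

$$R^{12} = -\left[(1 + x) - e'(I_k + \mu\mu')^{-1}e\right]^{-1} e'(I_k + \mu\mu')^{-1}e.$$

Since $(I_k + \mu\mu')^{-1} = I_k - \mu\mu'/\Delta$, $\Delta = 1 + \mu'\mu$, $e'(I_k + \mu\mu')^{-1}e = e'e - (e'\mu\mu'e/\Delta) = 1 - [(-\theta_T)^{2k}/\Delta]$.
Hence, $1 - e'(I_k + \mu\mu')^{-1}e = (-\theta_T)^{2k}/\Delta$ and substituting for $\Delta = (1 - (-\theta_T)^{2(k+1)})/(1 - \theta_T^2)$,

$$\sqrt{T}e'R^{-1}V = \frac{-\sqrt{T}(-\theta_T)^{2k}(1 - \theta_T^2)}{[1 - (-\theta_T)^{2(k+1)}]x + (-\theta_T)^{2k}(1 - \theta_T^2)} = \frac{-\sqrt{T}(-\theta_T)^{2k}(1 - \theta_T^2)}{[1 - (-\theta_T)^{2(k+1)}]x} + O_p(T^{-1/2}), \tag{A.4}$$

which converges to $-\vartheta(\kappa)/x$ setting $k = \kappa T^{1/2}$ and taking the limit as $T \to \infty$.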
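
The limit in (A.4) can be illustrated numerically. The sketch below assumes $\theta_T = -1 + \delta/\sqrt{T}$, the local parameterization consistent with $T^{1/2} b_T \to \delta$ in Lemma A.1 but not restated in this section, and compares the finite-sample factor multiplying $-1/x$ with $\vartheta(\kappa)$ from Lemma A.2. Function names are ours.

```python
import numpy as np

def finite_sample_factor(T, kappa, delta):
    """sqrt(T) * theta^(2k) * (1 - theta^2) / (1 - theta^(2(k+1))) with k = kappa * sqrt(T)."""
    theta = -1.0 + delta / np.sqrt(T)     # assumed local-to-(-1) MA coefficient
    k = int(round(kappa * np.sqrt(T)))
    a = theta ** 2                        # equals (-theta)^2
    return np.sqrt(T) * a ** k * (1.0 - a) / (1.0 - a ** (k + 1))

def vartheta(kappa, delta):
    """Limit function: 2*delta*exp(-2*kappa*delta) / (1 - exp(-2*kappa*delta))."""
    z = np.exp(-2.0 * kappa * delta)
    return 2.0 * delta * z / (1.0 - z)

gap = abs(finite_sample_factor(10 ** 8, 0.5, 1.0) - vartheta(0.5, 1.0))
```

For large $T$ the two quantities agree closely, reflecting $\sqrt{T}(1 - \theta_T^2) \to 2\delta$ and $(-\theta_T)^{2k} \to \exp(-2\kappa\delta)$.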

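(iii) Case $K_{2T} \le k \le K_{3T}$.

Note that $(-\theta_T)^k \to 0$ if $k/\sqrt{T} \to \infty$. Let

$$R = \sigma_e^2\begin{bmatrix} 1 + x & e' \\ e & I_k \end{bmatrix}$$

and note that

$$R^{-1} = \sigma_e^{-2}\begin{bmatrix} x^{-1} & -x^{-1} & 0_{1\times(k-1)} \\ -x^{-1} & (1 + x)x^{-1} & 0 \\ 0_{(k-1)\times 1} & 0 & I_{k-1} \end{bmatrix}.$$

Thus, $\|R\| = O_p(1)$ and $\|R^{-1}\| = O_p(1)$. From Lemma A.1 (a, b, c), $\max_{K_{2T}\le k\le K_{3T}} \|R_T - R\| = O_p(K_{3T}/T^{1/2})$.
Also, $\max_{K_{2T}\le k\le K_{3T}} \|R_T\| \le \max_{K_{2T}\le k\le K_{3T}} \|R_T - R\| + \max_{K_{2T}\le k\le K_{3T}} \|R\| = O_p(K_{3T}/T^{1/2})$
since $\max_{K_{2T}\le k\le K_{3T}} \|R\| = O_p(1)$ and $K_{3T}/T^{1/2} \to \infty$. Since $\max_{K_{2T}\le k\le K_{3T}} \|R_T^{-1}\| \le
1/\max_{K_{2T}\le k\le K_{3T}} \|R_T\|$, $\max_{K_{2T}\le k\le K_{3T}} \|R_T^{-1}\| = O_p(T^{1/2}/K_{3T})$. It follows that

$$\max_{K_{2T}\le k\le K_{3T}} \|R_T^{-1} - R^{-1}\| \le \max_{K_{2T}\le k\le K_{3T}} \|R_T^{-1}\|\,\|R_T - R\|\,\|R^{-1}\| \le O_p(T^{1/2}/K_{3T})\,O_p(K_{3T}/T^{1/2})\,O_p(1) = O_p(1).$$

We also have $\max_{K_{2T}\le k\le K_{3T}} \|V_T - V\| = O_p(K_{3T}^{1/2}/T^{1/2})$ and $\|V\| = O_p(1)$. Thus,

$$T^{1/2}\hat\beta_0 = T^{1/2}e'R_T^{-1}V_T = T^{1/2}e'(R_T^{-1} - R^{-1})(V_T - V) + T^{1/2}e'(R_T^{-1} - R^{-1})V + T^{1/2}e'R^{-1}(V_T - V) + T^{1/2}e'R^{-1}V$$

$$= A + B + C + D.$$

For A, we have

$$\max_{K_{2T}\le k\le K_{3T}} \|T^{1/2}e'(R_T^{-1} - R^{-1})(V_T - V)\|^2 \le \max_{K_{2T}\le k\le K_{3T}} T\sum_{i=1}^{k+1} \|r_T^{1i} - r^{1i}\|^2\,\|V_{i,T} - V_i\|^2 \le \max_{K_{2T}\le k\le K_{3T}} c\sum_{i=1}^{k+1} \|r_T^{1i} - r^{1i}\|^2 \le \max_{K_{2T}\le k\le K_{3T}} c\,\|R_T^{-1} - R^{-1}\|^2 = O_p(1).$$

For B, $\|T^{1/2}e'(R_T^{-1} - R^{-1})V\| = \|T^{1/2}e'R^{-1}CR^{-1}V\| + O_p(T^{-1/2})$ for some $R$ such that $R_T = R + C$,
where the matrix $C$ has elements that are $O_p(T^{-1/2})$. Now

$$e'R^{-1} = [x^{-1}, -x^{-1}, 0, 0, \ldots, 0],$$

$$R^{-1}V = [0, -1, 0, 0, \ldots, 0]'.$$

Therefore $CR^{-1}V = -C_2$ where $C_2$ is the second column of the matrix $C$. Then $e'R^{-1}CR^{-1}V =
x^{-1}(-C_{12} + C_{22})$. Since $C_{12}$ and $C_{22}$ are $O_p(T^{-1/2})$, $\max_{K_{2T}\le k\le K_{3T}} \|T^{1/2}e'R^{-1}CR^{-1}V\| = O_p(1)$. For the
third term C,

$$\max_{K_{2T}\le k\le K_{3T}} \|T^{1/2}e'R^{-1}(V_T - V)\| = \max_{K_{2T}\le k\le K_{3T}} \|T^{1/2}[x^{-1}, -x^{-1}, 0, \ldots, 0](V_T - V)\| = \max_{K_{2T}\le k\le K_{3T}} \left\|\frac{T^{1/2}}{x}\left[(V_{1,T} - V_1) - (V_{2,T} - V_2)\right]\right\| \le \frac{T^{1/2}}{x}\,\frac{c}{T^{1/2}} = O_p(1).$$

For D,

$$T^{1/2}e'R^{-1}V = T^{1/2}\left[-x^{-1} + x^{-1}\right] = 0.$$

Combining the results, we have $T^{1/2}\hat\beta_0 = O_p(1)$ uniformly in $K_{2T} \le k \le K_{3T}$.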
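
PROOF OF LEMMA 1

LEMMA A.5: Consider the estimated autoregression (8) when the DGP is given by (14). Let $\hat\sigma_k^2 =
N^{-1}\sum_{t=k_{\max}+1}^{T} \hat{e}_{tk}^2$; then uniformly over $0 < k \le K_{3T}$,

$$\ln(\hat\sigma_k^2) = \ln(\sigma_e^2) + \frac{(-\theta_T)^{2(k+1)}(1 - \theta_T^2)}{1 - (-\theta_T)^{2(k+1)}} + O_p(T^{-1/2}).$$

PROOF: Let $E_k = (e_{k_{\max}+1,k}, \ldots, e_{Tk})'$, where $e_{tk} = \frac{c}{T}\sum_{i=1}^{k} (-\theta_T)^i y_{t-i-1} + e_t - (-\theta_T)^{k+1}e_{t-k-1}$. Let
$X = (X_{k_{\max}+1}, \ldots, X_T)'$. Then

$$\hat\sigma_k^2 = N^{-1}E_k'E_k - N^{-1}E_k'X(X'X)^{-1}X'E_k.$$

By direct calculations,

$$N^{-1}E_k'E_k = N^{-1}\sum_{t=k_{\max}+1}^{T} e_t^2 + (-\theta_T)^{2(k+1)}N^{-1}\sum_{t=k_{\max}+1}^{T} e_{t-k-1}^2 + O_p(T^{-1/2}) \tag{A.5}$$

$$= \sigma_e^2\left(1 + (-\theta_T)^{2(k+1)}\right) + O_p(T^{-1/2}). \tag{A.6}$$

Consider $N^{-1}E_k'X = \left(N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1}e_{tk},\; N^{-1}\sum_{t=k_{\max}+1}^{T} Z_t'e_{tk}\right)$. Using results established in the
proof of Theorem 2,

$$N^{-1}\sum_{t=k_{\max}+1}^{T} y_{t-1}e_{tk} = N^{-1}\sum_{t=k_{\max}+1}^{T} \left(a_T e_{t-1} + b_T X_{t-1}\right)\left(e_t - (-\theta_T)^{k+1}e_{t-k-1} + \frac{c}{T}\sum_{i=1}^{k} (-\theta_T)^i y_{t-i-1}\right) = O_p(T^{-1/2})$$

and

$$N^{-1}\sum_{t=k_{\max}+1}^{T} z_{t-j}e_{tk} = N^{-1}\sum_{t=k_{\max}+1}^{T} \left(e_{t-j} - (-\theta_T)^{k+1-j}e_{t-k-1}\right)\left(e_t - (-\theta_T)^{k+1}e_{t-k-1}\right) + O_p(T^{-1/2})$$

$$= (-\theta_T)^{2(k+1)-j}N^{-1}\sum_{t=k_{\max}+1}^{T} e_{t-k-1}^2 + O_p(T^{-1/2}) = \sigma_e^2(-\theta_T)^{2(k+1)-j} + O_p(T^{-1/2}).$$

Therefore,

$$N^{-1}E_k'X = \sigma_e^2\left[0, (-\theta_T)^{2(k+1)-1}, \ldots, (-\theta_T)^{2(k+1)-k}\right] + O_p(T^{-1/2}) = \sigma_e^2(-\theta_T)^{k+1}[0, \mu'] + O_p(T^{-1/2}),$$

where $\mu' = [(-\theta_T)^k, \ldots, (-\theta_T)]$. From Lemma A.1,

$$N^{-1}(X'X) = \sigma_e^2\begin{bmatrix} 1 + x & e' \\ e & I_k + \mu\mu' \end{bmatrix} + O_p(T^{-1/2}) \equiv R + O_p(T^{-1/2}). \tag{A.7}$$

In view of the properties of $N^{-1}E_k'X$,

$$N^{-1}E_k'X(X'X)^{-1}X'E_k = (-\theta_T)^{2(k+1)}\mu'R^{-1}(k)\mu + O_p(T^{-1/2}),$$

where $R^{-1}(k)$ is the lower right $k \times k$ submatrix of $R^{-1}$. By partitioned matrix inversion,

$$R^{-1}(k) = \left[I_k + \mu\mu' - ee'/(1 + x)\right]^{-1} = \left[J_k + \mu\mu'\right]^{-1},$$

where $J_k = I_k - ee'/(1 + x)$. For a $k \times k$ matrix $A$ and $k \times 1$ vectors $u$ and $v$, $(A + uv')^{-1} = A^{-1} -
A^{-1}u(1 + v'A^{-1}u)^{-1}v'A^{-1}$. Hence, for $\Delta = 1 + \mu'J_k^{-1}\mu$,

$$R^{-1}(k) = J_k^{-1} - J_k^{-1}\mu\mu'J_k^{-1}/\Delta,$$

$$\mu'R^{-1}(k)\mu = \mu'J_k^{-1}\mu - (\mu'J_k^{-1}\mu)^2/\Delta = \frac{\mu'J_k^{-1}\mu}{1 + \mu'J_k^{-1}\mu}.$$

Since $J_k = I_k - ee'/(1 + x)$, $J_k^{-1} = I_k + ee'/x$, and

$$\mu'J_k^{-1}\mu = \mu'\mu + \mu'ee'\mu/x = \mu'\mu + (-\theta_T)^{2k}/x.$$

We also have $\mu'\mu = \sum_{i=1}^{k} (-\theta_T)^{2i}$, hence

$$\mu'\mu = \frac{(-\theta_T)^2 - (-\theta_T)^{2(k+1)}}{1 - (-\theta_T)^2}, \qquad 1 + \mu'\mu = \frac{1 - (-\theta_T)^{2(k+1)}}{1 - (-\theta_T)^2}.$$

After some simplifications,

$$\frac{\mu'J_k^{-1}\mu}{1 + \mu'J_k^{-1}\mu} = \frac{(-\theta_T)^2[1 - (-\theta_T)^{2k}] + (-\theta_T)^{2k}[1 - (-\theta_T)^2]/x}{[1 - (-\theta_T)^{2(k+1)}] + (-\theta_T)^{2k}[1 - (-\theta_T)^2]/x} = (-\theta_T)^2\,\frac{1 - (-\theta_T)^{2k}}{1 - (-\theta_T)^{2(k+1)}} + O_p(T^{-1/2}).$$

Hence,

$$N^{-1}E_k'X(X'X)^{-1}X'E_k = (-\theta_T)^{2(k+1)}(-\theta_T)^2\,\frac{1 - (-\theta_T)^{2k}}{1 - (-\theta_T)^{2(k+1)}} + O_p(T^{-1/2}).$$

It follows that

$$\frac{\hat\sigma_k^2}{\sigma_e^2} = 1 + (-\theta_T)^{2(k+1)} - (-\theta_T)^{2(k+1)}\left[\frac{(-\theta_T)^2\left(1 - (-\theta_T)^{2k}\right)}{1 - (-\theta_T)^{2(k+1)}}\right] + O_p(T^{-1/2}),$$

$$\ln(\hat\sigma_k^2) = \ln(\sigma_e^2) + \left[\frac{(-\theta_T)^{2(k+1)}(1 - \theta_T^2)}{1 - (-\theta_T)^{2(k+1)}}\right] + O_p(T^{-1/2}).$$

Lemma 1 is a consequence of Theorem 2 and Lemma A.5.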
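
The Sherman-Morrison step in the proof of Lemma A.5 gives an exact closed form for $\mu'R^{-1}(k)\mu$ before the $O_p(T^{-1/2})$ approximation is applied. The sketch below checks that closed form against a direct numerical inversion, taking $\sigma_e^2 = 1$ and arbitrary illustrative values of $\theta_T$, $k$, and $x$; function names are ours.

```python
import numpy as np

def quad_form_numeric(theta, k, x):
    """mu' R^{-1}(k) mu computed by inverting the full (k+1) x (k+1) matrix R."""
    mu = (-theta) ** np.arange(k, 0, -1)           # [(-theta)^k, ..., (-theta)]
    e = np.zeros(k)
    e[0] = 1.0
    R = np.block([[np.array([[1.0 + x]]), e[None, :]],
                  [e[:, None], np.eye(k) + np.outer(mu, mu)]])
    lower_block = np.linalg.inv(R)[1:, 1:]         # lower-right k x k block of R^{-1}
    return mu @ lower_block @ mu

def quad_form_closed(theta, k, x):
    """Closed form from the proof, before dropping the terms in 1/x."""
    a = theta ** 2
    num = a * (1.0 - a ** k) + a ** k * (1.0 - a) / x
    den = (1.0 - a ** (k + 1)) + a ** k * (1.0 - a) / x
    return num / den

numeric = quad_form_numeric(-0.9, 6, 0.8)
closed = quad_form_closed(-0.9, 6, 0.8)
```

The two values coincide up to rounding for any admissible inputs; the terms in $1/x$ are the ones that vanish at rate $T^{-1/2}$ when $\theta_T$ is local to $-1$.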

PROOF OF THEOREM 3

The strategy of the proof is to first determine the rate of increase of $k$ through a global minimization
of the objective function. Once this rate is found, the proportionality factor is deduced. In the
following, we use the notation $k \propto f(T)$ for the case where $k/f(T) \to c$ for some finite constant $c$
strictly above 0. We also let $d > 0$ be a finite constant not necessarily the same throughout.
For both the IC and MIC, it is readily seen that the objective function is $O_p(C_T/T^{1/2})$ when
$k/T^{1/2} \to \kappa\,(> 0)$, and that it is $O_p(C_T k/T)$ when $k/T^{1/2} \to \infty$. Hence, irrespective of the choice of
$C_T$, having $k/T^{1/2} \to \kappa$ will always, in large samples, yield a smaller value of the objective function
than with $k/T^{1/2} \to \infty$. Hence, we only need to compare the cases $k/T^{1/2} \to \kappa\,(> 0)$ and $k/T^{1/2} \to 0$.
We start with the analysis for the IC. When $k/T^{1/2} \to 0$, there are two cases of interest. The first
is when $k^2 C_T/T \to d$, in which case the objective function is $O_p(1/k)$. The minimal value is obtained
if $k$ is chosen as the maximal value permissible, which means $k \propto \sqrt{T/C_T}$. The minimized objective
function is then $O_p(\sqrt{C_T}/\sqrt{T})$. The second case is when $T/(k^2 C_T) \to d$ and the objective function
is $O_p(k C_T/T)$, which is minimized by taking the minimal permissible rate of increase for $k$, again
$k \propto \sqrt{T/C_T}$, and the minimized objective function is then $O_p(\sqrt{C_T}/\sqrt{T})$. So the case with $k/T^{1/2} \to
\kappa\,(> 0)$ yields an objective function of order $O_p(C_T/T^{1/2})$ and the case $k/T^{1/2} \to 0$ an objective
function of order $O_p(\sqrt{C_T}/\sqrt{T})$. Hence, $k \propto \sqrt{T/C_T}$. To derive the constant of proportionality, note
that if $C_T$ is fixed, $k = \kappa\sqrt{T}$ and the objective function is (ignoring constants)

$$\frac{\vartheta(\kappa)}{\sqrt{T}} + \frac{C_T\,\kappa}{\sqrt{T}}.$$

Taking derivatives and equating to zero yields the stated solution. When $C_T \to \infty$ as $T \to \infty$, the
objective function (ignoring constants) is

$$\frac{1}{\kappa\sqrt{T/C_T}} + \frac{C_T\,\kappa\sqrt{T/C_T}}{T}$$

and the value of $\kappa$ that minimizes this expression is 1. Consider now the MIC(k). The first case is
where

$$\frac{C_T}{k} \to d \quad\text{and}\quad \frac{C_T k^2}{T} \to d. \tag{A.8}$$

Then the objective function when $k/T^{1/2} \to 0$ is $O_p(k^{-1})$. Hence, the highest possible rate is chosen
and is given by $k \propto \sqrt{T/C_T}$ and the minimized objective function is of order $O_p(\sqrt{C_T/T})$. Since
the objective function when $k/T^{1/2} \to \kappa\,(> 0)$ is $O_p(C_T/T^{1/2})$, we have $k \propto T^{1/2}$ if $C_T$ is fixed and
$k \propto \sqrt{T/C_T}$ if $C_T \to \infty$. However, in view of the requirement that $C_T/k \to d$, this implies the need
for $C_T^3/T \to d$. To obtain the constant of proportionality when $C_T$ is fixed, we substitute $k = \kappa\sqrt{T}$
in (16) and equate its first derivative to zero and solve for $\kappa$. When $C_T$ is increasing, we substitute
$k = \kappa\sqrt{T/C_T}$ in (15), and again equate the first derivative to zero and solve for $\kappa$.
Consider now the case where $C_T^3/T \to \infty$. There are two cases to cover when either of the conditions
in (A.8) is not satisfied. Suppose first that $k^3/T \to d$, in which case $k/C_T \to d$. The objective
function in the case $k/T^{1/2} \to 0$ is $O_p(C_T/k^2)$ and the highest rate for $k$ is chosen, namely $k \propto T^{1/3}$.
The minimized objective function is then of order $O_p(C_T/T^{2/3})$, which is of a smaller order than
the case with $k/T^{1/2} \to \kappa\,(> 0)$. Hence, $k \propto T^{1/3}$ is the resulting rate. If $T/k^3 \to d$, in which case
$C_T k^2/T \to \infty$, the objective function is of order $O_p(k C_T/T)$ and the smallest rate for $k$ is optimal.
So again, $k \propto T^{1/3}$ and the minimized objective function is also of order $O_p(C_T/T^{2/3})$. To obtain the
factor of proportionality, let $k = \kappa T^{1/3}$. The objective function is then (ignoring constants)

$$\frac{1}{\kappa T^{1/3}} + \frac{C_T Z_1}{\kappa^2 T^{2/3}} + \frac{C_T\,\kappa}{T^{2/3}}.$$

When $C_T^3/T \to \infty$, the first term is negligible compared to the last two. Hence, minimizing the last
two terms with respect to $\kappa$ yields $\kappa = (2Z_1)^{1/3}$.
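
The two proportionality constants obtained above follow from elementary first-order conditions. As a sketch, write $Z$ for the positive quantity multiplying $C_T/(\kappa^2 T^{2/3})$ in the last display (the symbol is used here only for illustration). For the IC with $C_T \to \infty$ and for the MIC with $C_T^3/T \to \infty$, respectively:

$$\begin{aligned}
f(\kappa) &= \sqrt{\frac{C_T}{T}}\left(\frac{1}{\kappa} + \kappa\right), & f'(\kappa) &= \sqrt{\frac{C_T}{T}}\left(1 - \frac{1}{\kappa^2}\right) = 0 \;\Longrightarrow\; \kappa = 1, \\
g(\kappa) &= \frac{C_T}{T^{2/3}}\left(\frac{Z}{\kappa^2} + \kappa\right), & g'(\kappa) &= \frac{C_T}{T^{2/3}}\left(1 - \frac{2Z}{\kappa^3}\right) = 0 \;\Longrightarrow\; \kappa = (2Z)^{1/3}.
\end{aligned}$$

In both cases the second derivative is positive for $\kappa > 0$, so the stationary point is a minimum.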

REFERENCES

BERK, K. N. (1974): "Consistent Autoregressive Spectral Estimates," The Annals of Statistics, 2, 489-502.
DEJONG, D. N., J. C. NANKERVIS, N. E. SAVIN, AND C. H. WHITEMAN (1992): "The Power Problem
of Unit Root Tests in Time Series with Autoregressive Errors," Journal of Econometrics, 53,
323-343.
DICKEY, D. A., AND W. A. FULLER (1979): "Distribution of the Estimators for Autoregressive Time
Series with a Unit Root," Journal of the American Statistical Association, 74, 427-431.
DUFOUR, J. M., AND M. KING (1991): "Optimal Invariant Tests for the Autocorrelation Coefficient
in Linear Regressions with Stationary and Nonstationary Errors," Journal of Econometrics, 47,
115-143.
ELLIOTT, G., T. J. ROTHENBERG, AND J. H. STOCK (1996): "Efficient Tests for an Autoregressive
Unit Root," Econometrica, 64, 813-836.
FRANSES, P. H., AND N. HALDRUP (1994): "The Effects of Additive Outliers on Tests of Unit Roots
and Cointegration," Journal of Business and Economic Statistics, 12, 471-478.
FULLER, W. A. (1976): Introduction to Statistical Time Series. New York: John Wiley.
GOURIEROUX, C., AND A. MONFORT (1995): Statistical and Econometric Models, Vol. 1. Cambridge:
Cambridge University Press.
HANNAN, E. J., AND M. DEISTLER (1988): The Statistical Theory of Linear Systems. New York: John
Wiley.
LOPEZ, J. H. (1997): "The Power of the ADF Test," Economics Letters, 57, 5-10.
NABEYA, S., AND P. PERRON (1994): "Local Asymptotic Distribution Related to the AR(1) Model
with Dependent Errors," Journal of Econometrics, 62, 229-264.
NG, S., AND P. PERRON (1995): "Unit Root Tests in ARMA Models with Data Dependent Methods
for the Selection of the Truncation Lag," Journal of the American Statistical Association, 90, 268-281.
NG, S., AND P. PERRON (2000): "A Note on the Selection of Time Series Models," Manuscript, Department of
Economics, Boston College.
PANTULA, S. G. (1991): "Asymptotic Distributions of Unit Root Tests When the Process is Nearly
Stationary," Journal of Business and Economic Statistics, 9, 63-71.
PERRON, P., AND S. NG (1996): "Useful Modifications to Unit Root Tests with Dependent Errors
and their Local Asymptotic Properties," Review of Economic Studies, 63, 435-465.
PERRON, P., AND S. NG (1998): "An Autoregressive Spectral Density Estimator at Frequency Zero for
Nonstationarity Tests," Econometric Theory, 14, 560-603.
PERRON, P., AND G. RODRIGUEZ (1998): "GLS Detrending, Efficient Unit Root Tests and Structural
Change," Manuscript, Boston University.
PHILLIPS, P. C. B. (1987): "Time Series Regression with Unit Roots," Econometrica, 55, 277-302.
PHILLIPS, P. C. B., AND P. PERRON (1988): "Testing for a Unit Root in Time Series Regression,"
Biometrika, 75, 335-346.
SAID, S. E., AND D. A. DICKEY (1984): "Testing for Unit Roots in Autoregressive-Moving Average
Models of Unknown Order," Biometrika, 71, 599-607.
SCHWERT, G. W. (1989): "Tests for Unit Roots: A Monte Carlo Investigation," Journal of Business
and Economic Statistics, 7, 147-160.
STOCK, J. H. (1990): "A Class of Tests for Integration and Cointegration," Manuscript, Harvard
University.

1 page
Kui Luk Catalog
No ratings yet
Kui Luk Catalog
20 pages
12 Rao Elements 2004
No ratings yet
12 Rao Elements 2004
9 pages
RXT 10GE Manual
No ratings yet
RXT 10GE Manual
98 pages
Age of Fantasy - Saurians v2.0: Background About OPR
No ratings yet
Age of Fantasy - Saurians v2.0: Background About OPR
2 pages
LiDAR Full Notes
No ratings yet
LiDAR Full Notes
32 pages
Library Management System Report IPT
No ratings yet
Library Management System Report IPT
6 pages
Test 3 Global WF With Answers
No ratings yet
Test 3 Global WF With Answers
5 pages
Isometric Projections
100% (1)
Isometric Projections
20 pages
Sawasdee SET: S-T Retracement, Opportunity To Buy
No ratings yet
Sawasdee SET: S-T Retracement, Opportunity To Buy
14 pages
92-005-541 - Poly-Pro All-Format Polish, 1 Gallon
No ratings yet
92-005-541 - Poly-Pro All-Format Polish, 1 Gallon
8 pages
Kangen Water - Change Your Water.. Change Your Life! Presentation
No ratings yet
Kangen Water - Change Your Water.. Change Your Life! Presentation
22 pages
Catalytic Bleaching
100% (1)
Catalytic Bleaching
195 pages
Technology Globalization and Ethics
No ratings yet
Technology Globalization and Ethics
30 pages
Qatar Tourism
No ratings yet
Qatar Tourism
3 pages
Apollo Hospital Report
No ratings yet
Apollo Hospital Report
16 pages
Grease Programme
No ratings yet
Grease Programme
2 pages
Environmental Science Unit 2 IA
No ratings yet
Environmental Science Unit 2 IA
76 pages
Deep Learning Revision Guide
No ratings yet
Deep Learning Revision Guide
6 pages
Nfpa 2
No ratings yet
Nfpa 2
15 pages
MCA Exam: Mathematical Foundations
No ratings yet
MCA Exam: Mathematical Foundations
3 pages
NSTP1 Syllabus 2024-2025
No ratings yet
NSTP1 Syllabus 2024-2025
6 pages
D9R Hydraulic System
No ratings yet
D9R Hydraulic System
24 pages
New Generation History Grade 11 Learners Book
100% (1)
New Generation History Grade 11 Learners Book
4 pages
Important MCQs On RBI Circular For SBI Clerk
No ratings yet
Important MCQs On RBI Circular For SBI Clerk
246 pages
CSS Module 1-Grade 12
No ratings yet
CSS Module 1-Grade 12
41 pages
2022年港澳杯初赛 P1
No ratings yet
2022年港澳杯初赛 P1
4 pages
DRAW10W - Assembly Drawing
No ratings yet
DRAW10W - Assembly Drawing
8 pages
SAP Extended Warehouse Management (EWM) Material Flow System (MFS) (Connecting A PLC)
100% (2)
SAP Extended Warehouse Management (EWM) Material Flow System (MFS) (Connecting A PLC)
175 pages
Global Economic Impact of COVID-19
100% (1)
Global Economic Impact of COVID-19
15 pages
Order Receipt: Gate Academy Learning
No ratings yet
Order Receipt: Gate Academy Learning
1 page

Paper Econometrica

Uploaded by

Paper Econometrica

Uploaded by

Lag Length Selection and the Construction of Unit Root Tests with Good Size and Power

Author(s): Serena Ng and Pierre Perron

LAG LENGTH SELECTION AND THE CONSTRUCTION OF UNIT ROOT TESTS WITH GOOD SIZE AND POWER

BY SERENA NG AND PIERRE PERRON¹

KEYWORDS: Integrated processes, truncation lag, GLS detrending, information

¹This is a substantially revised version of a paper previously circulated as "Constructing Unit

We assume that we have T + 1 observations generated by (for t = 0, ..., T):

(1) $y_t = d_t + u_t, \quad u_t = \alpha u_{t-1} + v_t,$

(3) $MZ_\alpha = (T^{-1} y_T^2 - s_{AR}^2)\left(2T^{-2}\sum_{t=1}^{T} y_{t-1}^2\right)^{-1}$

(5) $s_{AR}^2 = \hat\sigma_k^2/(1 - \hat\beta(1))^2.$
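As a rough illustration of how (3) and (5) fit together, the sketch below estimates the autoregressive spectral density at frequency zero from an ADF-type regression with k lagged differences and then forms $MZ_\alpha$. The function names, the use of NumPy least squares, and the choice to divide the residual sum of squares by the number of usable observations are assumptions made for this example; the series passed in is assumed to be already demeaned or detrended.

```python
import numpy as np

def ar_spectral_variance(y, k):
    """Sketch of s2_AR = sigma2_k / (1 - beta(1))^2 from a regression of
    dy_t on y_{t-1} and k lagged differences (no deterministic terms)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[i] = y[i+1] - y[i]
    target = dy[k:]                          # differences with k lags available
    cols = [y[k:-1]]                         # lagged level y_{t-1}
    for j in range(1, k + 1):
        cols.append(dy[k - j:len(dy) - j])   # j-th lagged difference
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2_k = resid @ resid / len(target)   # assumed normalization
    beta_one = beta[1:].sum()                # sum of lagged-difference coefficients
    return sigma2_k / (1.0 - beta_one) ** 2

def mz_alpha(y, k):
    """Sketch of MZ_alpha as in (3), with T = len(y) - 1."""
    y = np.asarray(y, dtype=float)
    T = len(y) - 1
    s2_ar = ar_spectral_variance(y, k)
    return (y[-1] ** 2 / T - s2_ar) / (2.0 * np.sum(y[:-1] ** 2) / T ** 2)
```

With k = 0 the spectral estimate reduces to the residual variance of the regression of the first difference on the lagged level, which gives a quick sanity check on the indexing.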

viewed as modified versions of the Phillips (1987) and Phillips-Perron (1988) $Z_\alpha$

3. THE TESTS UNDER GLS DETRENDING

(7) $P_T = [S(\bar\alpha) - \bar\alpha S(1)]/s_{AR}^2,$

3.1. The $M^{GLS}$ Tests

THEOREM 1: Let $y_t$ be generated as in (1) with $\alpha = 1 + c/T$ and consider

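(9) p = 0: $MP_T^{GLS} = \left[\bar c^2 T^{-2}\sum_{t=1}^{T}\tilde y_{t-1}^2 - \bar c\, T^{-1}\tilde y_T^2\right]/s_{AR}^2$

p = 1: $MP_T^{GLS} = \left[\bar c^2 T^{-2}\sum_{t=1}^{T}\tilde y_{t-1}^2 + (1-\bar c)\, T^{-1}\tilde y_T^2\right]/s_{AR}^2$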

$\Rightarrow \bar c^2\int_0^1 V_{c,\bar c}(r)^2\,dr + (1-\bar c)\,V_{c,\bar c}(1)^2$
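A minimal sketch of how the statistic in (9) might be computed is given below. It assumes local GLS detrending by quasi-differencing at $\bar\alpha = 1 + \bar c/T$, with the values $\bar c = -7$ (p = 0) and $\bar c = -13.5$ (p = 1) that are customary following Elliott, Rothenberg, and Stock; the function names are hypothetical, and the spectral estimate `s2_ar` is simply passed in rather than estimated here.

```python
import numpy as np

def gls_detrend(y, p):
    """Local GLS detrending sketch: p = 0 removes a constant, p = 1 a linear trend.
    Assumes c_bar = -7 (p = 0) or -13.5 (p = 1) and T = len(y) - 1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = n - 1
    c_bar = -7.0 if p == 0 else -13.5
    a_bar = 1.0 + c_bar / T
    z = np.ones((n, 1)) if p == 0 else np.column_stack([np.ones(n), np.arange(n)])
    # Quasi-difference data and regressors; the first observation is kept in levels.
    y_q = np.concatenate([y[:1], y[1:] - a_bar * y[:-1]])
    z_q = np.vstack([z[:1], z[1:] - a_bar * z[:-1]])
    psi, *_ = np.linalg.lstsq(z_q, y_q, rcond=None)
    return y - z @ psi, c_bar

def mp_gls(y, p, s2_ar):
    """Modified feasible point optimal statistic, following the two cases in (9)."""
    y_tilde, c_bar = gls_detrend(y, p)
    T = len(y_tilde) - 1
    kappa = np.sum(y_tilde[:-1] ** 2) / T ** 2   # T^{-2} * sum of squared lagged levels
    end = y_tilde[-1] ** 2 / T                   # T^{-1} * squared last observation
    if p == 0:
        num = c_bar ** 2 * kappa - c_bar * end
    else:
        num = c_bar ** 2 * kappa + (1.0 - c_bar) * end
    return num / s2_ar
```

Because $\bar c$ is negative, both terms in the numerator are nonnegative in either case, so the statistic is positive whenever `s2_ar` is.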

3.2. The Construction of the Autoregressive Spectral Density

4. THE SIZE ISSUE, THE SELECTION OF k, AND

(10) $IC(k) = \ln(\hat\sigma_k^2) + (k + p + 1)\,C_T/(T - k_{max}),$
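To make the role of $C_T$ in (10) concrete, here is a small sketch that evaluates a criterion of this form over a set of candidate lags and returns the minimizer. The function names and the `extra_params` argument (the part of the parameter count that does not vary with k) are hypothetical. Setting $C_T = 2$ gives an AIC-type rule and $C_T = \ln(T - k_{max})$ a BIC-type rule.

```python
import math

def information_criterion(sigma2_k, n_params, c_t, n_eff):
    """ln(sigma2_k) plus a penalty n_params * C_T / (T - k_max)."""
    return math.log(sigma2_k) + n_params * c_t / n_eff

def select_lag(sigma2_by_k, c_t, n_eff, extra_params=0):
    """Return the k minimizing the criterion; sigma2_by_k[k] is the residual
    variance of the regression with k lags, all fitted on the same sample."""
    scores = [
        information_criterion(s2, k + extra_params, c_t, n_eff)
        for k, s2 in enumerate(sigma2_by_k)
    ]
    return min(range(len(scores)), key=scores.__getitem__)

# Illustrative (made-up) residual variances for k = 0, 1, 2, 3 with T - k_max = 100.
sigma2 = [2.0, 1.0, 0.96, 0.95]
k_aic = select_lag(sigma2, c_t=2.0, n_eff=100)
k_bic = select_lag(sigma2, c_t=math.log(100), n_eff=100)
```

In this made-up example the heavier BIC-type penalty picks a shorter lag than the AIC-type penalty. Since `extra_params` shifts every score by the same amount, it does not affect the selected k.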

5. AN ALTERNATIVE PENALTY FOR INTEGRATED DATA

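$\hat Q = (T - k_{max})^{-1}\sum_{t=k_{max}+1}^{T}\ln(f(\Delta y_t \mid \beta^0(k)))$

$\quad - (T - k_{max})^{-1}\sum_{t=k_{max}+1}^{T}\ln(f(\Delta y_t \mid \hat\beta(k))).$

As discussed in Gourieroux-Monfort (1995), Akaike's suggestion was to find a

$\tau_T(k) = (1/\hat\sigma_k^2)\,(\hat\beta(k) - \beta^0(k))'\sum_t X_t X_t'\,(\hat\beta(k) - \beta^0(k)),$

$TQ = \tau_T(k)/2 + o_p(1)$ and $T\hat Q = -\tau_T(k)/2 + o_p(1)$. Since $T(Q - \hat Q) = \tau_T(k) +$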

(11) $\tau_T(k) = (1/\hat\sigma_k^2)\,\hat\beta_0^2\sum_t \tilde y_{t-1}^2 + \chi_k^2$

where $\chi_k^2$ is a chi-square random variable with k degrees of freedom that is

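(12) $MAIC(k) = \ln(\hat\sigma_k^2) + \dfrac{2(\tau_T(k) + k)}{T - k_{max}}$

where $\tau_T(k) = (\hat\sigma_k^2)^{-1}\hat\beta_0^2\sum_{t=k_{max}+1}^{T}\tilde y_{t-1}^2$ and $\hat\sigma_k^2 = (T - k_{max})^{-1}\sum_{t=k_{max}+1}^{T}\hat e_{tk}^2$. Note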
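The following sketch shows one way the MAIC in (12) could be evaluated for k = 0, ..., k_max. It assumes the input series has already been detrended, and it fits every candidate regression on the same sample t = k_max + 1, ..., T so that the residual variances are comparable. The function name and return format are choices made for this example.

```python
import numpy as np

def maic_select(y_tilde, k_max):
    """Return (k_hat, values), where values[k] is MAIC(k) as in (12)."""
    y = np.asarray(y_tilde, dtype=float)
    dy = np.diff(y)                       # dy[i] = y[i+1] - y[i]
    T = len(dy)
    n_eff = T - k_max                     # common effective sample size
    target = dy[k_max:]
    y_lag = y[k_max:-1]                   # lagged levels matching target
    sum_y2 = np.sum(y_lag ** 2)
    values = []
    for k in range(k_max + 1):
        cols = [y_lag] + [dy[k_max - j:T - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2_k = resid @ resid / n_eff
        tau = beta[0] ** 2 * sum_y2 / sigma2_k    # tau_T(k) from (12)
        values.append(np.log(sigma2_k) + 2.0 * (tau + k) / n_eff)
    return int(np.argmin(values)), values
```

The only difference from a standard AIC computation is the extra term `tau`, which depends on the estimated coefficient on the lagged level and therefore varies with k.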

1]$^2$/[2$\int_0^1 W(r)^2\,dr$] $\equiv \tau(W)$, say, where $W$ is a detrended Wiener process, whose

⁵We thank a referee for bringing this feature to our attention.

[Figure panels: p = 0, T = 100; p = 0, T = 250]

6. LOCAL ASYMPTOTIC PROPERTIES

(14) $y_t - \alpha_T y_{t-1} = e_t + \theta_T e_{t-1},$

$e_t \sim$ i.i.d. $(0, \sigma_e^2)$, $y_0 = e_0 = 0$, $\delta > 0$, and $c < 0$. As T increases to infinity,

6.1. Local Asymptotic Properties of $\tau_T(k)$

THEOREM 2: Let $y_t$ be generated by (14), $Z_1 = (1 + \delta^2\int_0^1 J_c(r)^2\,dr)/$

6.2. Local Asymptotic Properties of the MIC and IC

LEMMA 1: Let $y_t$ be generated by (14) and $K_{1T}$, $K_{2T}$, and $K_{3T}$ as defined in

(15) MIC(k) = ln(o-e2)+? + CT Z1+

(ii) for k/T1/2 -?> K, with #(K) defined in Theorem 2:

(iii) uniformly in $K_{2T} < k < K_{3T}$:

(17) MIC(k)=In(oe2) + CT Z3 + CT k + OP(T-1/2)

THEOREM 3: Let $y_t$ be generated by (14) and consider the rate of increase of k

FIC (K) = -482 exp(-2K8) ? CT

When CT is fixed, k nicI(Knlic(l)T -I? 1 where =, and

average root and/or alternative deterministic components. The issue of practical

6.3. The Finite Sample Properties of $k_{mic}$

7. SIZE AND POWER OF THE TESTS IN FINITE SAMPLES

k*Z k*fgs kbic kaic k,nbic k* k*fgls kbic kaic k,nbic

k* k*fg1s kbic kaic k,nbic k* k*fg kbic kaic k,nbic

MA Case: $v_t = e_t + \theta e_{t-1}$, $e_t \sim N(0, 1)$

detrending. Furthermore, the power increase from T = 100 to T = 150 is quite

MA Case: $v_t = e_t + \theta e_{t-1}$, $e_t \sim N(0, 1)$

To illustrate the ease of application of our procedures and the differences in

         USA   Canada   UK    Japan   Italy   France   Germany
AR(a):   .95   .95      .93   .87     .92     .98      .92

Dept. of Economics, Johns Hopkins University, Baltimore, MD 21218, U.S.A.;

Manuscript received May, 1997; final revision received August, 2000.

the DGP can also be written as

(A.1) Ayt = Yt-l - (-OT)Zt-l + T (-OT)iYt-i-1 + et (0 )klet-k-1

with go0= c/T, 11 =OT, Pi = 0 for i > 1, and

$= \sigma_e^2\left(1 + \delta^2\int_0^1 J_c(r)^2\,dr\right) + O_p(T^{-1/2}).$

For part (b), we have

Since Xt =Et=1 exp((t-j)c/T)ej, Xt_1 = Xtk+1 E 1 exp((t-j)c/T)ejI We have ET-=kmax+lee =

=o-r2(5 + (_0T)2(k+l)-i-j ) + O(T-1/2).

Part (d) follows using the fact that

and part (e) using the fact that

(T j=o T)jYt-i-j-l1+et-i-et_ I _0T k+1-i)

In what follows, the structure of R-1 will be exploited. Let

Note that 0 1, . . .,3k) = RJTV

A.4: (a) maxo<k<KlT IIVT
