Paper Econometrica
Econometrica, Vol. 69, No. 6 (November, 2001), 1519-1554
It is widely known that when there are errors with a moving-average root close to -1,
a high order augmented autoregression is necessary for unit root tests to have good size,
but that information criteria such as the AIC and the BIC tend to select a truncation lag
(k) that is very small. We consider a class of Modified Information Criteria (MIC) with
a penalty factor that is sample dependent. It takes into account the fact that the bias in
the sum of the autoregressive coefficients is highly dependent on k and adapts to the type
of deterministic components present. We use a local asymptotic framework in which the
moving-average root is local to -1 to document how the MIC performs better in selecting
appropriate values of k. In Monte-Carlo experiments, the MIC is found to yield huge size
improvements to the $DF^{GLS}$ and the feasible point optimal $P_T$ test developed in Elliott,
Rothenberg, and Stock (1996). We also extend the M tests developed in Perron and Ng
(1996) to allow for GLS detrending of the data. The MIC along with GLS detrended data
yield a set of tests with desirable size and power properties.
1. INTRODUCTION
MANY UNIT ROOT TESTS have been developed for testing the null hypothesis of
a unit root against the alternative of stationarity. While the presence or absence
of a unit root has important implications, many remain skeptical about the con-
clusions drawn from such tests. This concern is justifiable, as these tests gener-
ally suffer from two problems. First, many tests have low power when the root
of the autoregressive polynomial is close to but less than unity (e.g., DeJong et
al. (1992)). Second, the majority of the tests suffer from severe size distortions
when the moving-average polynomial of the first differenced series has a large
negative root (e.g., Schwert (1989), Perron and Ng (1996)). Although less severe,
the problem also arises when there is a large negative autoregressive root in the
residuals. The consequence is over-rejections of the unit root hypothesis. While
few economic time series are found to have negative serial correlation of the
autoregressive type, many do exhibit a large negative moving-average root. The
inflation rate is an example, and omitted outliers have also been shown to induce
1520 SERENA NG AND PIERRE PERRON
a negative moving average root in the error process (e.g., Franses and Haldrup
(1994)). It is therefore desirable to have powerful unit root tests that are robust
to size distortions.
The implementation of unit root tests often necessitates the selection of an
autoregressive truncation lag, say, k. This is required in the autoregression used
to form the Said-Dickey-Fuller test. It is also required in constructing an autore-
gressive estimate of the spectral density at frequency zero. However, simulations
have repeatedly shown a strong association between k and the severity of size dis-
tortions and/or the extent of power loss (e.g., Ng and Perron (1995), and Lopez
(1997)). To date, there exists no operational procedure for selecting k that yields
robust and satisfactory results. The problem is that while a small k is adequate
for finite order autoregressive errors and ARMA processes with small moving-
average components, a large k is generally necessary for noise functions with a
moving-average root that is large. For the latter class of models, selection rules
such as the Schwarz or Bayesian Information Criterion (BIC) and the Akaike
Information Criterion (AIC) tend to select values of k that are generally too
small for unit root tests to have good sizes.
This paper sets out to resolve two problems. One is to enhance the power of
several tests that have been shown to have small size distortions. More precisely,
in Perron and Ng (1996), we analyzed a class of modified tests, originally pro-
posed by Stock (1990), and showed that these tests are far more robust to size
distortions than other unit root tests in the literature when the residuals have neg-
ative serial correlation. Exploiting the findings in Elliott, Rothenberg, and Stock
(1996) (hereafter ERS) and Dufour and King (1991) that local GLS detrending
of the data yields substantial power gains, we apply the idea of GLS detrending
to the modified tests and show that non-negligible size and power gains can be
made when used in conjunction with an autoregressive spectral density estimator
at frequency zero provided the truncation lag is appropriately selected.
Our second task is to provide an improved procedure for choosing the trun-
cation lag. We argue that the penalty of k assigned to overfitting in information
criteria such as the AIC under-estimates the cost of a low order model when the
unit root process has a negative moving-average root and, hence, tends to select
a lag length that is too small. We suggest a class of modified information criteria
that takes better account of the cost of underfitting. The modified AIC is shown
to lead to substantial size improvements over standard information criteria in
all the unit root tests considered. Taking the two steps together, GLS detrending
along with the selection rule for k provides procedures that allow for the
construction of unit root tests with much improved size and power. Additionally,
using GLS detrended data to estimate the spectral density at frequency zero is
found to have favorable size and power implications.
The paper proceeds as follows. After setting up the preliminaries in Section 2,
Section 3 considers the class of M tests with GLS detrending. Section 4 explains
why information criteria will underestimate k and Section 5 proposes an alterna-
tive penalty factor in the context of integrated data. Section 6 presents the prop-
erties of the modified information criteria using a local asymptotic framework
LAG LENGTH SELECTION 1521
and shows why the AIC version is preferable. Simulation results are presented
in Section 7, and the procedures are illustrated using inflation rates for the G7
countries in Section 8. Section 9 offers brief concluding remarks. Mathematical
derivations are given in the Appendix.
2. PRELIMINARIES
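In Perron and Ng (1996), we analyzed the properties of three tests: $MZ_\alpha$, $MZ_t$,
and MSB, collectively referred to as the M tests. These are defined as (for the
no deterministic term case):
(3) $MZ_\alpha = \left(T^{-1} y_T^2 - s_{AR}^2\right)\left(2\,T^{-2}\sum_{t=1}^{T} y_{t-1}^2\right)^{-1}$
(4) $MSB = \left(T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \,/\, s_{AR}^2\right)^{1/2}$
and $MZ_t = MZ_\alpha \times MSB$. All three tests are based on $s_{AR}^2$, an autoregressive
estimate of the spectral density at frequency zero of $v_t$. Let $\hat\beta(1) = \sum_{i=1}^{k}\hat\beta_i$ and
$\hat\sigma_k^2 = (T-k)^{-1}\sum_{t=k+1}^{T}\hat e_{tk}^2$, with $\hat\beta_i$ and $\{\hat e_{tk}\}$ obtained from (2) estimated by OLS. Then,
(5) $s_{AR}^2 = \hat\sigma_k^2 / (1 - \hat\beta(1))^2$.
The M tests for p = 0 and 1 can be obtained upon replacing $y_{t-1}$ and $y_T$ by the
residuals obtained from least squares detrending. The $MZ_\alpha$ and $MZ_t$ tests can be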
2 Most of the results presented here, in particular Theorem 1 below, hold with the less restrictive
assumption that $\{e_t\}$ is a martingale difference sequence with respect to the sigma algebra
$\mathcal{F}_t = \{e_s; s \le t\}$ with $\sigma_e^2 = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E(e_t^2 \mid \mathcal{F}_{t-1})$.
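The autoregressive spectral density estimate and the MSB statistic described above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: it assumes that regression (2), in the no deterministic term case, has the same form as regression (8) (a regression of $\Delta y_t$ on $y_{t-1}$ and k lagged differences), and the function names `ar_spectral_density` and `msb_statistic` are hypothetical.

```python
import numpy as np

def ar_spectral_density(y, k):
    """Return (s2_ar, sigma2_k, beta) from an OLS regression of dy_t on
    y_{t-1} and k lagged differences (assumed form of regression (2),
    no deterministic terms)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[i] = y_{i+1} - y_i
    n = dy.size                           # n = T
    cols = [y[k:-1]] + [dy[k - j:n - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ beta
    sigma2_k = resid @ resid / (n - k)    # (T - k)^{-1} * sum of squared residuals
    beta_one = beta[1:].sum()             # beta_hat(1): sum of lag coefficients
    s2_ar = sigma2_k / (1.0 - beta_one) ** 2
    return s2_ar, sigma2_k, beta

def msb_statistic(y, k):
    """MSB = (T^{-2} * sum_{t=1}^T y_{t-1}^2 / s2_ar)^{1/2}."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    s2_ar, _, _ = ar_spectral_density(y, k)
    return np.sqrt((y[:-1] ** 2).sum() / T**2 / s2_ar)
```

For a pure random walk with unit-variance innovations, `s2_ar` should be close to 1 for moderate k, since the spectral density at frequency zero of $v_t$ then equals the innovation variance.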
To improve power, ERS proposed local to unity GLS detrending of the data.
For any series $\{x_t\}_{t=0}^{T}$, define $(x_0^{\bar\alpha}, x_t^{\bar\alpha}) \equiv (x_0, (1 - \bar\alpha L)x_t)$, $t = 1, \dots, T$, for some
chosen $\bar\alpha = 1 + \bar c/T$. The GLS detrended series is defined as
(6) $\tilde y_t \equiv y_t - \hat\psi' z_t$
where $\hat\psi$ minimizes $S(\bar\alpha, \psi) = (y^{\bar\alpha} - \psi' z^{\bar\alpha})'(y^{\bar\alpha} - \psi' z^{\bar\alpha})$. If $v_t$ is i.i.d. normal, the
point optimal test of the null hypothesis $\alpha = 1$ against the alternative hypothesis
$\alpha = \bar\alpha$ is the likelihood ratio statistic, $L = S(\bar\alpha) - S(1)$, where $S(\bar\alpha) = \min_\psi S(\bar\alpha, \psi)$.
ERS considered a feasible point optimal test that takes into account that $v_t$ may
be serially correlated. The statistic is
(7) $P_T = [S(\bar\alpha) - \bar\alpha S(1)] / s_{AR}^2$
where $s_{AR}^2$ is defined as in (5). The value of $\bar c$ is chosen such that the asymptotic
local power function of the test is tangent to the power envelope at 50% power.
For p = 0 this is -7.0 and for p = 1, it is -13.5. The $P_T$ test did not perform
well in simulations when k was selected by the BIC. Large size distortions were
recorded. ERS then recommended the $DF^{GLS}$ statistic (whose local asymptotic
power function also lies close to the Gaussian local power envelope) as the
t-statistic for testing $\beta_0 = 0$ from the following regression estimated by OLS:
(8) $\Delta\tilde y_t = \beta_0 \tilde y_{t-1} + \sum_{j=1}^{k} \beta_j \Delta\tilde y_{t-j} + e_{tk}$.
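The GLS detrending step and the regression in (8) can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the ERS implementation: it assumes that p = 0 corresponds to $z_t = 1$ and p = 1 to $z_t = (1, t)'$, it uses the values of $\bar c$ quoted above, and the function names `gls_detrend` and `df_gls` are hypothetical.

```python
import numpy as np

def gls_detrend(y, p):
    """Local-to-unity GLS detrending; assumes z_t = 1 (p = 0) or (1, t)' (p = 1)."""
    y = np.asarray(y, dtype=float)
    T = y.size - 1
    cbar = -7.0 if p == 0 else -13.5          # values quoted in the text
    abar = 1.0 + cbar / T
    t = np.arange(T + 1, dtype=float)
    z = np.ones((T + 1, 1)) if p == 0 else np.column_stack([np.ones(T + 1), t])
    # quasi-differences: first observation kept as is, then (1 - abar L) x_t
    yq = np.concatenate([y[:1], y[1:] - abar * y[:-1]])
    zq = np.vstack([z[:1], z[1:] - abar * z[:-1]])
    psi, *_ = np.linalg.lstsq(zq, yq, rcond=None)   # minimizes S(abar, psi)
    return y - z @ psi

def df_gls(y, p, k):
    """t-statistic on beta_0 in regression (8), run on GLS detrended data."""
    yt = gls_detrend(y, p)
    dy = np.diff(yt)
    n = dy.size
    X = np.column_stack([yt[k:-1]] + [dy[k - j:n - j] for j in range(1, k + 1)])
    b, *_ = np.linalg.lstsq(X, dy[k:], rcond=None)
    resid = dy[k:] - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])  # degrees-of-freedom correction is a choice made here
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se
```

Because $\hat\psi$ is a least squares coefficient on the quasi-differenced data, the detrended series is unchanged when a constant (p = 0) or a linear trend (p = 1) is added to $y_t$.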
for the moment that this lag length selection problem is solved. Since the power
gains from the $DF^{GLS}$ over the DF come from the use of GLS detrended data,
it is natural to consider the M tests under GLS detrending. We now state the
asymptotic properties of the $M^{GLS}$ tests.
The proof is based on standard results for sample moments of GLS detrended
data and therefore omitted. The $MZ_\alpha^{GLS}$, $MZ_t^{GLS}$, and $MSB^{GLS}$ have similar size
and power properties and will be referred to as the $M^{GLS}$ tests. We have computed,
via simulations, the local asymptotic power functions of the M tests with
OLS and GLS detrending. These showed the $M^{GLS}$ tests to be uniformly more
powerful than the M tests and also to have local asymptotic power functions
that are indistinguishable from those of the $DF^{GLS}$ and the Gaussian asymptotic
local power envelope. A graph is therefore not presented. The important point
is that the $M^{GLS}$ tests, like the $DF^{GLS}$, achieve local asymptotic power that is
approximately the same as the feasible point-optimal test. We also consider two
modified feasible point optimal tests and derive their limiting distributions:
These tests are based on the same motivation that leads to the definition of the
M tests in Stock (1990), namely, to provide functionals of sample moments that
have the same asymptotic distributions as well known unit root tests. The $MP_T^{GLS}$
is important because its limiting distribution coincides with that of the feasible
point optimal test considered in ERS.
Critical values for the limiting distributions of the $M^{GLS}$ tests were obtained
by numerical simulations that approximate the Wiener process W(r) by partial
sums of N(0, 1) random variables with 5,000 steps and 20,000 simulations. These
are summarized in Table I.
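The simulation design described above (partial sums of N(0, 1) variables standing in for the Wiener process) can be sketched as follows. Since the limiting distributions themselves are not reproduced in this section, the sketch assumes that for p = 0 the $MZ_\alpha^{GLS}$ statistic has the Dickey-Fuller limit $0.5[W(1)^2 - 1]/\int_0^1 W(r)^2 dr$, which is consistent with the note to Table I that these critical values are from Fuller (1976). The function name, seed, and the smaller default numbers of steps and replications are choices made here for speed.

```python
import numpy as np

def simulate_mza_critical_values(n_steps=500, n_sims=5000, seed=0):
    """Approximate the 1%, 5%, and 10% quantiles of
    0.5 * (W(1)^2 - 1) / int_0^1 W(r)^2 dr by partial sums of N(0, 1) draws."""
    rng = np.random.default_rng(seed)
    incr = rng.standard_normal((n_sims, n_steps)) / np.sqrt(n_steps)
    W = np.cumsum(incr, axis=1)                    # discretized Wiener paths
    W1 = W[:, -1]
    # Riemann sum of W(r)^2 using the path lagged one step (W(0) = 0)
    W_lag = np.concatenate([np.zeros((n_sims, 1)), W[:, :-1]], axis=1)
    int_W2 = (W_lag ** 2).sum(axis=1) / n_steps
    stat = 0.5 * (W1 ** 2 - 1.0) / int_W2
    return np.quantile(stat, [0.01, 0.05, 0.10])
```

With more steps and replications, as in the paper's design of 5,000 steps and 20,000 simulations, the quantiles should settle near the p = 0 entries for $MZ_\alpha^{GLS}$ in Table I, up to simulation error.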
TABLE I
ASYMPTOTIC CRITICAL VALUES^a
Percentile   MZ_α^GLS   MZ_t^GLS, ADF^GLS   MSB^GLS   P_T^GLS, MP_T^GLS
Case: p = 0, c̄ = -7.0
.01          -13.8      -2.58               .174      1.78
.05          -8.1       -1.98               .233      3.17
.10          -5.7       -1.62               .275      4.45
Case: p = 1, c̄ = -13.5
.01          -23.8      -3.42               .143      4.03
.05          -17.3      -2.91               .168      5.48
.10          -14.2      -2.62               .185      6.67
a For $MZ_\alpha^{GLS}$, $MZ_t^{GLS}$, and $ADF^{GLS}$ with p = 0, the critical values are from Fuller
(1976). The other entries were obtained using 20,000 simulations and 5,000 steps to approximate
the Wiener processes.
To see the influence of the lag order k, we consider the finite sample size of
the tests $MZ_\alpha^{GLS}$, $DF^{GLS}$, and $Z_\alpha^{GLS}$. The setup, which will be used throughout,
is to consider data generated by (1) with $\alpha = 1$ and $v_t$ either an MA(1), i.e.
$v_t = e_t + \theta e_{t-1}$, or an AR(1), i.e. $v_t = \rho v_{t-1} + e_t$, with $e_t \sim$ i.i.d. $N(0, 1)$. The
errors are generated from the rndn() function in Gauss with 999 as the seed, and
5,000 replications are used. We use the nominal size of 5% as the benchmark.
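The data generating process just described can be sketched in a few lines. This is an illustration only: it assumes no deterministic component ($d_t = 0$) and a zero pre-sample value, it uses NumPy's generator rather than the Gauss rndn() function (so the draws will not match the paper's), and the function name `simulate_unit_root` is hypothetical.

```python
import numpy as np

def simulate_unit_root(T, theta=0.0, rho=0.0, seed=999):
    """Simulate y_0..y_T with alpha = 1 and MA(1) or AR(1) errors v_t.

    Assumes d_t = 0; NumPy's generator replaces Gauss's rndn(), so the
    seed 999 does not reproduce the paper's draws."""
    if theta != 0.0 and rho != 0.0:
        raise ValueError("choose either MA(1) (theta) or AR(1) (rho), not both")
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T + 2)            # one extra draw for the lagged innovation
    v = np.empty(T + 1)
    if rho == 0.0:
        v[:] = e[1:] + theta * e[:-1]         # v_t = e_t + theta * e_{t-1}
    else:
        v[0] = e[1]
        for t in range(1, T + 1):
            v[t] = rho * v[t - 1] + e[t + 1]  # v_t = rho * v_{t-1} + e_t
    return np.cumsum(v)                       # u_t = u_{t-1} + v_t with alpha = 1

y_ma = simulate_unit_root(100, theta=-0.8)    # negative MA root, T = 100
y_ar = simulate_unit_root(250, rho=0.8)       # positive AR errors, T = 250
```

A Monte Carlo size experiment would loop over seeds, apply a test with a fixed k to each simulated series, and record the rejection frequency at the 5% critical value.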
3 When p = 1, Perron and Ng (1996), Stock (1990), and ERS estimate equation (2) with only a
constant by ordinary least-squares. This still ensures a consistent estimate under the null hypothesis
but the omission of the trend term implies a noninvariance with respect to the slope of the trend
function in finite samples under the alternative hypothesis. This noninvariance is such that the tests
lose power as the slope increases.
TABLE II.A
EXACT SIZE OF THE TESTS AT SELECTED VALUES OF k: p = 0
MA Case
Test      T    θ     k=0   1     2     3     4     5     6     7     8     9     10
Z_α^GLS   100  -0.8  0.994 0.904 0.808 0.740 0.698 0.672 0.654 0.644 0.636 0.633 0.627
Z_α^GLS   100  -0.5  0.618 0.322 0.224 0.193 0.176 0.181 0.182 0.192 0.197 0.207 0.216
Z_α^GLS   100  0.0   0.072 0.082 0.089 0.104 0.112 0.125 0.137 0.152 0.164 0.179 0.190
Z_α^GLS   100  0.5   0.010 0.143 0.056 0.116 0.100 0.129 0.135 0.153 0.158 0.179 0.187
Z_α^GLS   100  0.8   0.006 0.172 0.036 0.147 0.071 0.155 0.108 0.166 0.138 0.188 0.169
Z_α^GLS   250  -0.8  0.987 0.861 0.725 0.641 0.592 0.559 0.539 0.524 0.515 0.507 0.504
Z_α^GLS   250  -0.5  0.553 0.258 0.166 0.124 0.112 0.110 0.106 0.112 0.111 0.115 0.116
Z_α^GLS   250  0.0   0.060 0.062 0.063 0.066 0.069 0.075 0.077 0.081 0.087 0.087 0.092
Z_α^GLS   250  0.5   0.008 0.111 0.047 0.080 0.065 0.077 0.076 0.083 0.084 0.091 0.095
Z_α^GLS   250  0.8   0.004 0.145 0.030 0.108 0.049 0.099 0.062 0.096 0.074 0.100 0.085
MZ_α^GLS  100  -0.8  0.990 0.828 0.543 0.319 0.187 0.129 0.095 0.086 0.079 0.077 0.078
MZ_α^GLS  100  -0.5  0.595 0.241 0.126 0.097 0.084 0.092 0.097 0.106 0.120 0.132 0.141
MZ_α^GLS  100  0.0   0.065 0.070 0.082 0.095 0.104 0.119 0.130 0.145 0.158 0.172 0.185
MZ_α^GLS  100  0.5   0.008 0.139 0.054 0.113 0.097 0.126 0.132 0.150 0.156 0.176 0.186
MZ_α^GLS  100  0.8   0.004 0.168 0.035 0.143 0.068 0.153 0.106 0.164 0.136 0.186 0.168
MZ_α^GLS  250  -0.8  0.987 0.828 0.597 0.407 0.260 0.166 0.107 0.075 0.055 0.044 0.035
MZ_α^GLS  250  -0.5  0.545 0.227 0.121 0.081 0.067 0.063 0.062 0.066 0.070 0.071 0.074
MZ_α^GLS  250  0.0   0.058 0.059 0.061 0.064 0.066 0.072 0.076 0.079 0.084 0.084 0.088
MZ_α^GLS  250  0.5   0.008 0.110 0.045 0.078 0.064 0.076 0.075 0.082 0.083 0.090 0.095
MZ_α^GLS  250  0.8   0.004 0.145 0.029 0.108 0.048 0.099 0.061 0.095 0.074 0.100 0.084
DF^GLS    100  -0.8  0.991 0.843 0.624 0.455 0.322 0.243 0.187 0.157 0.129 0.114 0.104
DF^GLS    100  -0.5  0.605 0.263 0.147 0.108 0.089 0.087 0.081 0.079 0.076 0.078 0.075
DF^GLS    100  0.0   0.073 0.070 0.072 0.074 0.072 0.072 0.070 0.070 0.071 0.074 0.073
DF^GLS    100  0.5   0.010 0.124 0.046 0.081 0.063 0.075 0.074 0.077 0.078 0.075 0.071
DF^GLS    100  0.8   0.005 0.150 0.030 0.107 0.043 0.089 0.054 0.082 0.059 0.080 0.064
DF^GLS    250  -0.8  0.985 0.823 0.613 0.449 0.332 0.247 0.194 0.155 0.128 0.109 0.099
DF^GLS    250  -0.5  0.535 0.227 0.123 0.084 0.068 0.064 0.062 0.062 0.061 0.058 0.060
DF^GLS    250  0.0   0.057 0.054 0.054 0.053 0.056 0.055 0.058 0.058 0.056 0.055 0.054
DF^GLS    250  0.5   0.009 0.103 0.043 0.070 0.055 0.061 0.057 0.058 0.057 0.060 0.060
DF^GLS    250  0.8   0.004 0.133 0.027 0.090 0.039 0.074 0.046 0.068 0.049 0.065 0.051
AR Case
Test      T    ρ     k=0   1     2     3     4     5     6     7     8     9     10
Z_α^GLS   100  -0.8  0.802 0.286 0.286 0.295 0.300 0.308 0.314 0.322 0.328 0.340 0.346
Z_α^GLS   100  0.8   0.000 0.102 0.115 0.135 0.143 0.158 0.172 0.191 0.201 0.217 0.226
Z_α^GLS   250  -0.8  0.737 0.170 0.171 0.171 0.173 0.176 0.179 0.182 0.185 0.184 0.183
Z_α^GLS   250  0.8   0.000 0.069 0.071 0.077 0.082 0.084 0.090 0.094 0.093 0.099 0.100
MZ_α^GLS  100  -0.8  0.784 0.027 0.035 0.050 0.065 0.081 0.089 0.113 0.120 0.140 0.149
MZ_α^GLS  100  0.8   0.000 0.100 0.114 0.134 0.142 0.157 0.170 0.190 0.199 0.216 0.225
MZ_α^GLS  250  -0.8  0.730 0.043 0.041 0.047 0.048 0.050 0.058 0.062 0.064 0.071 0.071
MZ_α^GLS  250  0.8   0.000 0.069 0.071 0.077 0.082 0.084 0.090 0.094 0.092 0.099 0.100
DF^GLS    100  -0.8  0.791 0.073 0.074 0.077 0.079 0.080 0.076 0.081 0.075 0.083 0.078
DF^GLS    100  0.8   0.000 0.066 0.070 0.069 0.068 0.069 0.071 0.072 0.071 0.077 0.079
DF^GLS    250  -0.8  0.718 0.057 0.054 0.056 0.054 0.055 0.057 0.057 0.057 0.059 0.058
DF^GLS    250  0.8   0.000 0.059 0.061 0.063 0.062 0.060 0.060 0.060 0.058 0.058 0.054
TABLE II.B
EXACT SIZE OF THE TESTS AT SELECTED VALUES OF k: p = 1
MA Case
Test      T    θ     k=0   1     2     3     4     5     6     7     8     9     10
Z_α^GLS   100  -0.8  1.000 0.994 0.965 0.940 0.918 0.906 0.893 0.888 0.882 0.882 0.883
Z_α^GLS   100  -0.5  0.810 0.470 0.340 0.299 0.290 0.309 0.323 0.358 0.376 0.413 0.450
Z_α^GLS   100  0.0   0.053 0.076 0.100 0.136 0.169 0.222 0.259 0.313 0.346 0.400 0.430
Z_α^GLS   100  0.5   0.001 0.169 0.059 0.163 0.155 0.226 0.253 0.303 0.343 0.397 0.432
Z_α^GLS   100  0.8   0.001 0.219 0.023 0.229 0.091 0.284 0.190 0.350 0.297 0.422 0.389
Z_α^GLS   250  -0.8  1.000 0.992 0.960 0.927 0.899 0.875 0.859 0.848 0.839 0.833 0.830
Z_α^GLS   250  -0.5  0.801 0.404 0.253 0.202 0.181 0.174 0.176 0.179 0.188 0.199 0.209
Z_α^GLS   250  0.0   0.047 0.057 0.064 0.077 0.081 0.094 0.104 0.113 0.129 0.142 0.154
Z_α^GLS   250  0.5   0.002 0.127 0.037 0.092 0.071 0.097 0.095 0.120 0.122 0.143 0.153
Z_α^GLS   250  0.8   0.001 0.176 0.014 0.129 0.038 0.131 0.068 0.139 0.099 0.162 0.136
MZ_α^GLS  100  -0.8  1.000 0.925 0.592 0.345 0.220 0.185 0.159 0.163 0.163 0.185 0.183
MZ_α^GLS  100  -0.5  0.750 0.252 0.104 0.093 0.099 0.132 0.159 0.204 0.234 0.285 0.317
MZ_α^GLS  100  0.0   0.032 0.055 0.080 0.115 0.152 0.206 0.243 0.300 0.332 0.390 0.419
MZ_α^GLS  100  0.5   0.001 0.161 0.054 0.155 0.148 0.219 0.246 0.299 0.338 0.392 0.427
MZ_α^GLS  100  0.8   0.001 0.211 0.021 0.224 0.087 0.277 0.186 0.346 0.294 0.417 0.386
MZ_α^GLS  250  -0.8  1.000 0.978 0.800 0.514 0.276 0.147 0.087 0.058 0.048 0.040 0.036
MZ_α^GLS  250  -0.5  0.786 0.311 0.126 0.074 0.057 0.059 0.065 0.069 0.082 0.100 0.106
MZ_α^GLS  250  0.0   0.040 0.049 0.057 0.070 0.077 0.088 0.099 0.108 0.124 0.136 0.149
MZ_α^GLS  250  0.5   0.002 0.124 0.034 0.090 0.069 0.095 0.094 0.117 0.121 0.142 0.150
MZ_α^GLS  250  0.8   0.001 0.174 0.014 0.128 0.036 0.130 0.066 0.137 0.098 0.160 0.135
DF^GLS    100  -0.8  1.000 0.977 0.815 0.600 0.415 0.293 0.203 0.167 0.125 0.116 0.098
DF^GLS    100  -0.5  0.838 0.375 0.176 0.112 0.087 0.084 0.074 0.070 0.068 0.069 0.065
DF^GLS    100  0.0   0.070 0.072 0.067 0.072 0.069 0.073 0.072 0.071 0.069 0.068 0.061
DF^GLS    100  0.5   0.003 0.160 0.045 0.091 0.064 0.073 0.062 0.066 0.061 0.067 0.062
DF^GLS    100  0.8   0.002 0.209 0.021 0.127 0.036 0.099 0.046 0.085 0.053 0.076 0.061
DF^GLS    250  -0.8  1.000 0.984 0.875 0.685 0.504 0.362 0.259 0.194 0.150 0.120 0.099
DF^GLS    250  -0.5  0.809 0.352 0.161 0.101 0.072 0.061 0.059 0.053 0.053 0.056 0.052
DF^GLS    250  0.0   0.053 0.055 0.055 0.056 0.056 0.054 0.054 0.054 0.055 0.055 0.054
DF^GLS    250  0.5   0.002 0.126 0.034 0.071 0.048 0.057 0.053 0.054 0.052 0.056 0.050
DF^GLS    250  0.8   0.001 0.172 0.015 0.098 0.024 0.075 0.032 0.063 0.038 0.058 0.042
AR Case
Test      T    ρ     k=0   1     2     3     4     5     6     7     8     9     10
Z_α^GLS   100  -0.8  0.960 0.497 0.509 0.526 0.549 0.574 0.593 0.616 0.640 0.669 0.690
Z_α^GLS   100  0.8   0.000 0.123 0.162 0.213 0.252 0.312 0.351 0.410 0.441 0.489 0.514
Z_α^GLS   250  -0.8  0.956 0.319 0.327 0.330 0.335 0.345 0.352 0.360 0.369 0.375 0.379
Z_α^GLS   250  0.8   0.000 0.070 0.077 0.089 0.099 0.119 0.129 0.150 0.159 0.176 0.183
MZ_α^GLS  100  -0.8  0.941 0.001 0.008 0.030 0.058 0.104 0.140 0.191 0.233 0.290 0.329
MZ_α^GLS  100  0.8   0.000 0.121 0.160 0.211 0.249 0.309 0.349 0.408 0.439 0.487 0.513
MZ_α^GLS  250  -0.8  0.?52 0.011 0.015 0.020 0.028 0.037 0.050 0.062 0.076 0.092 0.106
MZ_α^GLS  250  0.8   0.000 0.070 0.076 0.089 0.099 0.119 0.128 0.150 0.159 0.175 0.183
DF^GLS    100  -0.8  0.967 0.069 0.067 0.067 0.060 0.068 0.059 0.065 0.063 0.065 0.065
DF^GLS    100  0.8   0.000 0.072 0.067 0.070 0.071 0.069 0.069 0.074 0.066 0.072 0.071
DF^GLS    250  -0.8  0.959 0.052 0.051 0.052 0.050 0.049 0.050 0.050 0.051 0.052 0.053
DF^GLS    250  0.8   0.000 0.057 0.055 0.057 0.054 0.055 0.058 0.056 0.055 0.059 0.058
Reported in Table II are the sizes of the tests for p = 0, 1 at selected values of θ
and ρ. We report results for T = 100 and 250 to highlight the fact that the size
issue in the negative moving average case is not a small sample problem.
Several features of the results for MA errors are of note. First, for a θ of the
same absolute value, a negative θ always requires a larger lag to obtain a more
accurate size. Second, for a positive θ, the size of the tests is significantly better
when k is even. Third, the larger the number of deterministic terms, the more
distant are the exact from the nominal sizes. Fourth, there is always a value of
k that will yield an $MZ_\alpha^{GLS}$ with a reasonably accurate size when θ = -0.80.
For example, for p = 0 when T = 100 and 250, exact sizes of .077 and .055,
respectively, are achievable with lags of 9 and 8. In contrast, there does not exist
a k that will result in an acceptable size for $Z_\alpha^{GLS}$, and increasing the sample size
will not reduce the size distortions adequately. For the sample sizes reported,
the exact sizes for $Z_\alpha^{GLS}$ are well above 0.5 at θ = -.8, and size distortions are
non-negligible even when θ = -0.5. Fifth, the $DF^{GLS}$ also exhibits a dependence
on k. At T = 100 with p = 0 and θ = -0.8, the size is .104 at k = 10, but is .624 when k = 2.
For AR errors, size discrepancies between $Z_\alpha^{GLS}$ and $M^{GLS}$ also exist when
ρ is large and negative, albeit not as dramatic as in the MA case. The results
reveal that the $MZ_\alpha^{GLS}$ is inferior to the $DF^{GLS}$ when p = 1, the sample size is
small, and |ρ| is large. For large negative AR coefficients, the $MZ_\alpha^{GLS}$ has a more
accurate size at k greater than 1. For large positive AR coefficients, the $MZ_\alpha^{GLS}$
is oversized for small T, though the size improves as T increases.
Clearly, the choice of k is crucial especially in the case of negative moving
average errors. Various practical solutions have been used against this problem.
In ERS, the BIC is used to select k but they set the lower bound to be 3,
because even larger size distortions would have resulted if zero was the lower
bound. An alternative method is the sequential t test for the significance of the
last lag considered in Ng and Perron (1995). It has the ability to yield higher
k's than the BIC when there are negative moving-average errors and reduce size
distortions. But, the sequential test tends to overparameterize in other cases.
This, as does ERS's implementation of the BIC, leads to less efficient estimates
and subsequently to power losses. Neither approach is fully satisfactory.
The AIC and the BIC belong to the class of information based rules (hereafter,
IC) where the chosen value of k is $k_{ic} = \arg\min_{k \in [0, k_{max}]} IC(k)$ where$^4$
(9) $IC(k) = \ln(\hat\sigma_k^2) + k\,C_T/(T - k_{max})$,
$\hat\sigma_k^2 = (T - k_{max})^{-1}\sum_{t=k_{max}+1}^{T}\hat e_{tk}^2$, $C_T/T \to 0$ as $T \to \infty$, and $C_T > 0$. The various
criteria differ in $C_T$, the weight applied to overfitting, but all use k as the penalty
to overfitting. We argue below that, with integrated data, this penalty may be a
poor approximation to the cost of underfitting.
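A standard information criterion of this class can be sketched as below. The sketch assumes the criterion takes the form $\ln(\hat\sigma_k^2) + k\,C_T/(T - k_{max})$, with $C_T = 2$ for the AIC and $C_T = \ln(T - k_{max})$ for the BIC, and that every candidate regression uses the same $T - k_{max}$ observations; the function name `select_lag_ic` is hypothetical.

```python
import numpy as np

def select_lag_ic(y, kmax, criterion="aic"):
    """Pick k in [0, kmax] minimizing ln(sigma2_k) + k * C_T / (T - kmax).

    Each regression of dy_t on y_{t-1} and k lagged differences uses the same
    effective sample t = kmax+1, ..., T (an assumption of this sketch)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = dy.size
    n_eff = n - kmax
    if criterion == "aic":
        c_t = 2.0
    elif criterion == "bic":
        c_t = np.log(n_eff)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    target = dy[kmax:]
    values = []
    for k in range(kmax + 1):
        cols = [y[kmax:-1]] + [dy[kmax - j:n - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2 = resid @ resid / n_eff
        values.append(np.log(sigma2) + c_t * k / n_eff)
    return int(np.argmin(values)), values
```

Because the penalty is linear in k with weight $C_T$, a larger $C_T$ can never select a longer lag on the same data, so the BIC choice is at most the AIC choice whenever $\ln(T - k_{max}) > 2$.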
4See Ng and Perron (2000) for an analysis pertaining to the adequacy of the particular formulation
of the information criterion stated.
To motivate the main idea, we consider the derivation of the AIC for data generated
by a finite order AR($k_0$) with normal errors and a unit root (i.e., the DGP
is (2) with $k = k_0$, $\beta_0 = 0$, and $e_{tk} = e_t \sim$ i.i.d. $N(0, 1)$). The family of parametric
models fitted is given by (8). For notation, we let $\beta^0(k) = (0, \beta_1, \dots, \beta_k)'$, $\hat\beta^0(k) =
(0, \hat\beta_1, \dots, \hat\beta_k)'$, $\beta_{-0}(k) = (\beta_1, \dots, \beta_k)'$ and $\hat\beta_{-0}(k) = (\hat\beta_1, \dots, \hat\beta_k)'$. In what
follows, we adopt the treatment of Gourieroux and Monfort (1995, pp. 307-309).
The goal is to select an autoregressive order between 0 and some upper bound
$k_{max}$. We assume this upper bound satisfies $k_{max}/T \to 0$ as $T \to \infty$. Let $f(\Delta y \mid
\beta^0(k))$ be the likelihood function of the data $(\Delta y_{k_{max}+1}, \dots, \Delta y_T)$ conditional on
the initial observations $(y_0, \dots, y_{k_{max}})$. This ensures that each competing model is
evaluated with the same number of effective observations, namely $T - k_{max}$. The
Kullback distance between the true probability distribution and the estimated
parametric model is $Q = E^0[\ln(f(\Delta y \mid \beta^0(k))) - \ln(f(\Delta y \mid \hat\beta^0(k)))]$ with sample
analog:
(10) [equation illegible in source]
where $\hat\sigma_k^2 = (T - k_{max})^{-1}\sum_{t=k_{max}+1}^{T}\hat e_{tk}^2$. Using Taylor series expansions, we have
(11) [leading terms illegible in source] $\dots + \frac{1}{\hat\sigma_k^2}\hat\beta_0^2\sum_{t=k_{max}+1}^{T}\tilde y_{t-1}^2 + k$.
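Since the first term is common to all models and the second term is proportional
to $-((T - k_{max})/2)\ln(\hat\sigma_k^2)$ under normality, minimizing $Q^*$ is equivalent to
minimizing
(13) $MIC(k) = \ln(\hat\sigma_k^2) + \frac{C_T(\tau_T(k) + k)}{T - k_{max}}$
where $\tau_T(k) = (\hat\sigma_k^2)^{-1}\hat\beta_0^2\sum_{t=k_{max}+1}^{T}\tilde y_{t-1}^2$,
with $C_T > 0$ and $C_T/T \to 0$ as $T \to \infty$. The MBIC obtains with $C_T = \ln(T - k_{max})$.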
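The modified criterion can be sketched in code as follows. This is an illustrative reading of (13), not the authors' program: it assumes the input series has already been GLS detrended, that $\tau_T(k) = (\hat\sigma_k^2)^{-1}\hat\beta_0^2\sum_{t=k_{max}+1}^{T}\tilde y_{t-1}^2$ is computed from regression (8) on the common sample $t = k_{max}+1, \dots, T$, and that $C_T = 2$ gives the modified AIC. The function name `mic_select` is hypothetical.

```python
import numpy as np

def mic_select(y_tilde, kmax, c_t=2.0):
    """Select k by minimizing ln(sigma2_k) + c_t * (tau_T(k) + k) / (T - kmax).

    y_tilde is assumed to be the (already) GLS detrended series y_0..y_T.
    Returns the chosen k and a list of (tau, mic) pairs for k = 0..kmax."""
    y_tilde = np.asarray(y_tilde, dtype=float)
    dy = np.diff(y_tilde)
    n = dy.size
    n_eff = n - kmax
    ylag = y_tilde[kmax:-1]                  # y_{t-1} for t = kmax+1, ..., T
    target = dy[kmax:]
    sum_y2 = ylag @ ylag
    table = []
    for k in range(kmax + 1):
        cols = [ylag] + [dy[kmax - j:n - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2 = resid @ resid / n_eff
        tau = beta[0] ** 2 * sum_y2 / sigma2  # stochastic part of the penalty
        mic = np.log(sigma2) + c_t * (tau + k) / n_eff
        table.append((tau, mic))
    best_k = int(np.argmin([m for _, m in table]))
    return best_k, table
```

Setting `c_t=0.0` removes the penalty altogether, in which case the largest model is selected because the residual variance cannot increase as lags are added; comparing the `tau` column across k shows how strongly the extra penalty term varies with the lag length for a given series.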
There are two important elements that distinguish (13) from standard information
criteria. The first is the imposition of the null hypothesis $\beta_0 = 0$. This
idea of imposing the null hypothesis in model selection when the ultimate objective
is hypothesis testing appears new and may have implications beyond the unit
root issue.5 This is an interesting avenue of research, but is beyond the scope of
this paper. The second element is that we retain a term in the penalty function
that is stochastic. Now, as $T \to \infty$, $(1/\hat\sigma_k^2)\hat\beta_0^2\sum_{t=k_{max}+1}^{T}\tilde y_{t-1}^2 \Rightarrow (\sigma^2/\sigma_e^2)[W(1)^2 - \dots$
to the standard AIC. A central theme of our argument for retaining $\tau_T(k)$ is
that, unless both T and k are very large, it varies substantially with k, especially
when a negative MA component is present. In a sense, the imposition of the null
hypothesis, $\beta_0 = 0$, allows us to avoid using asymptotic expected values to approximate
the penalty factor. We can therefore hope that $\tau_T(k)$ will better capture
the relevant cost of selecting different orders in finite samples since it depends
not only on k, but also on the nature of the deterministic components and the
detrending procedure. Hence, the penalty term will also depend on these factors.
To get an idea of the dependence of $\tau_T(k)$ on k, Figure 1 graphs the average
over 5,000 replications of $\tau_T(k)$ against k for p = 0, 1 using the DGP
$\Delta y_t = e_t + \theta e_{t-1}$ with $e_t \sim$ i.i.d. $N(0, 1)$. For positive θ, $\tau_T(k)$ is indeed fairly
invariant to k (as is the case for AR processes, not reported). But, there is a clear
inverse and nonlinear relation between $\tau_T(k)$ and k for negative values of θ. This
relation exists even when θ is -.5 for both p = 0 and p = 1. The basic reason for
the dependence of $\tau_T(k)$ on k is that the bias in the sum of the estimated autoregressive
coefficients (i.e., $\hat\beta_0$) is very high for small values of k. This suggests that
$\tau_T(k)$ will play a role in determining the appropriate k when there are negative
moving-average errors. To make this claim precise, the next subsection uses a
local asymptotic framework to analyze the properties of $\tau_T(k)$ and the MIC.
FIGURE 1: $\tau_T(k)$. [Panels include p = 1, T = 100 and p = 1, T = 250; horizontal axis: k from 0 to 10; curves labeled θ = -0.8, θ = -.5, and θ = .8.]
As discussed above, $\tau_T(k)$ will likely influence the choice of k when there is a
large negative moving-average root, a situation for which standard unit root tests
exhibit severe size distortions. To understand the issue of lag length selection
in this setting, a useful device is an asymptotic framework in which the moving
average coefficient is parameterized as local to -1.
$\alpha_T = 1 + c/T$, $\theta_T = -1 + \delta/\sqrt{T}$,
Theorem 2 shows that the local limit of $\tau_T(k)$ is unbounded unless k increases
at least at rate $T^{1/2}$. Since $\tau_T(0)$ diverges at rate T (see Nabeya and Perron
(1994)), and $\tau_T(k)$ diverges at rate $T/k^2$ if $k^2/T \to 0$, the cost to overfitting is
not linear in k as is assumed for the IC, and Figure 1 bears this out. Theorem 2
also highlights the fact that $\tau_T(k)$ will depend, via $Z_1$, on the nature of the
deterministic components and the method of detrending.
Given Theorem 2 and the fact that $\hat\sigma_k^2$ is consistent for $\sigma_e^2$ irrespective of the
rate of increase of k (shown in the Appendix), it can be shown that, in this local
asymptotic framework, the limiting distribution of the DF test diverges to $-\infty$
unless k grows at least at rate $T^{1/2}$.6 Hence, the unit root will always be rejected
if standard critical values are used. This explains why the DF test tends to be
oversized when k is small. In Perron and Ng (1996, 1998), we showed that
$k^2 s_{AR}^2 = O_p(1)$ but we also need $T s_{AR}^2$ to be $O_p(1)$ for the M tests to be bounded
under the null hypothesis in this local framework. But $k^2 s_{AR}^2$ and $T s_{AR}^2$ can both
be bounded only if $k/T^{1/2} \to \kappa$ for some $0 < \kappa < \infty$. In the next section, we will
consider model selection procedures with this requirement in mind.
(16) $MIC(k) = \ln(\sigma_e^2) + \dots + O_p(T^{-1/2})$ [intermediate terms illegible in source]
The expansions for the IC(k) are the same except that the terms involving the Z's
are absent.
The objects of ultimate interest are the values $k_{ic}$ and $k_{mic}$ chosen by the IC
and the MIC, respectively, over all possible values of k.
6 This is a more general result than Pantula (1991) who used a different local asymptotic framework
for modeling the negative moving average component in unit root tests. Pantula's result applies only
when $k = O_p(T^{1/4})$.
[Display equations illegible in source.]
Provided $C_T$ is constant (as is the case with Akaike's value of 2), both the IC
and the MIC will choose k proportional to $T^{1/2}$ asymptotically, much faster than
the logarithmic rate attained under the standard asymptotic framework. The rate
of $\sqrt{T}$ is optimal for unit root tests because this is the rate that will render the
class of DF as well as the M tests nondivergent under the null hypothesis in this
local asymptotic framework. However, when $C_T$ increases with T, this optimal
rate is not achieved.
The proportionality factor for the rate of increase of $k_{ic}$ when $C_T$ increases
with T is one and highlights a major shortcoming of the IC in such cases, namely
a complete independence of δ, the parameter of the local asymptotic framework.
For the $k_{mic}$ when $C_T$ increases with T, the proportionality factor depends on
the parameters of the model via $Z_1$. This effect, however, still vanishes as T
increases. This lack of flexibility does not arise when $C_T$ is fixed since $\kappa_{mic}$ and
$\kappa_{ic}$ always maintain a dependence on δ. Although $k_{ic}$ increases at the appropriate
rate of $\sqrt{T}$, it too has drawbacks because $\kappa_{ic}$ is invariant to T and it is the same
whether p = 0 or p = 1. In contrast, $\kappa_{mic}$ adapts to the sample size and the trend
function via a second order factor involving $Z_1$. Thus, the MIC with $C_T$ fixed
maintains a proportionality factor that is the most flexible with respect to T, the
regression model, as well as parameters of the DGP.
If the weight $C_T$ on the penalty factor is too large, that is $C_T^3/T \to \infty$, the MIC
chooses a rate that is independent of $C_T$, namely $T^{1/3}$, while the IC continues to
select a k that is proportional to $\sqrt{T/C_T}$. This, however, is a result of theoretical
rather than practical interest since almost all information criteria that have been
suggested are such that $C_T^3/T \to 0$.
To illustrate these properties, we consider the MAIC and AIC with $C_T = 2$
and the MBIC and BIC with $C_T = \ln(T)$. We denote the proportionality factors
by $\kappa_{maic}$, $\kappa_{aic}$, $\kappa_{mbic}$, and $\kappa_{bic}$, respectively. Numerical values of κ are reported in
TABLE III.A
VALUES OF THE PROPORTIONALITY FACTOR κ
Columns 1-6: p = 0; columns 7-12: p = 1
T\δ 0.50 1.00 1.50 2.00 5.00 8.00 0.50 1.00 1.50 2.00 5.00 8.00
100 bic 0.99 0.96 0.93 0.89 0.67 0.54 0.99 0.96 0.93 0.89 0.67 0.54
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.59 1.73 1.24 1.04 0.68 0.54 6.34 2.89 1.87 0.40 0.71 0.54
maic 1.70 0.88 0.68 0.59 0.39 0.30 2.96 1.37 0.90 0.70 0.39 0.30
150 bic 0.99 0.96 0.93 0.90 0.69 0.55 0.99 0.96 0.93 0.90 0.69 0.55
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.58 1.73 1.24 1.04 0.69 0.55 6.41 2.92 1.88 1.42 0.72 0.56
maic 1.63 0.85 0.67 0.59 0.39 0.30 2.87 1.32 0.88 0.69 0.39 0.30
250 bic 0.99 0.97 0.94 0.90 0.70 0.57 0.99 0.97 0.94 0.90 0.70 0.57
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.55 1.71 1.24 1.04 0.71 0.57 6.46 2.93 1.89 1.42 0.73 0.57
maic 1.55 0.82 0.66 0.58 0.39 0.30 2.75 1.27 0.85 0.67 0.39 0.30
500 bic 0.99 0.97 0.94 0.91 0.72 0.58 0.99 0.97 0.94 0.91 0.72 0.58
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.47 1.68 1.23 1.04 0.72 0.58 6.46 2.92 1.89 1.42 0.75 0.59
maic 1.44 0.79 0.65 0.58 0.39 0.30 2.60 1.20 0.81 0.65 0.39 0.30
5000 bic 0.99 0.98 0.96 0.93 0.76 0.63 0.99 0.98 0.96 0.93 0.76 0.63
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 3.06 1.52 1.16 1.02 0.76 0.63 6.12 2.74 1.78 1.36 0.78 0.63
maic 1.14 0.71 0.62 0.57 0.39 0.30 2.12 1.00 0.72 0.60 0.39 0.30
50000 bic 0.99 0.98 0.96 0.94 0.79 0.66 0.99 0.98 0.96 0.94 0.79 0.66
aic 0.69 0.65 0.61 0.57 0.39 0.30 0.69 0.65 0.61 0.57 0.39 0.30
mbic 2.56 1.34 1.08 0.99 0.79 0.67 5.43 2.42 1.59 1.25 0.80 0.66
maic 0.92 0.67 0.61 0.57 0.39 0.30 1.69 0.85 0.66 0.58 0.39 0.30
Table III.A for selected values of δ and T under the null hypothesis of a unit root
with c = 0. In the case of the MIC these values are random variables because of
the presence of $Z_1$. The numerical calculations reported use its expected value.7
Table III.A shows first that the $\kappa_{mic}$'s are drastically larger than the $\kappa_{ic}$'s when δ
is small. Second, the $\kappa_{ic}$'s vary much less with δ than the $\kappa_{mic}$'s. Third, the $\kappa_{mic}$'s
are larger when p = 1 than when p = 0, but the $\kappa_{ic}$'s are invariant to the order
of the deterministic terms in the regression.
Values of k selected by the IC and the MIC are in Table III.B. These are
constructed using the expected value of the MIC given by (for the IC(k) the term
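7 To get the expected value we use the following crude approximation. For $p = 0$,
$E[Z_1] \approx E[1 + \delta^2 \int_0^1 W(r)^2 dr] / (\delta^4 E([\int_0^1 W(r)^2 dr]^2)) = (4 + 2\delta^2)/\delta^4$.
For $p = 1$, $E[Z_1] \approx E[1 + \delta^2 \int_0^1 V_{0,\bar{c}}(r)^2 dr] / (\delta^4 E(\int_0^1 V_{0,\bar{c}}(r)^2 dr)^2)$.
For $\bar{c} = -13.5$, tedious but straightforward algebra gives
$E[Z_1] \approx (1 + \delta^2(.102475)) / (\delta^4(.102475)^2)$.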
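The approximation in footnote 7 is easy to evaluate numerically. The following is a minimal sketch, assuming the footnote reads $E[Z_1] \approx (4 + 2\delta^2)/\delta^4$ for p = 0 and $E[Z_1] \approx (1 + \delta^2 m)/(\delta^4 m^2)$ with $m = .102475$ for p = 1; the function name `expected_z1` is hypothetical and not part of the paper.

```python
def expected_z1(delta: float, p: int) -> float:
    """Crude approximation to E[Z_1] under the null (c = 0), as read from footnote 7.

    p = 0: constant only; p = 1: linear trend with c-bar = -13.5.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if p == 0:
        # (1 + delta^2 / 2) / (delta^4 / 4), using E[int W^2] = 1/2
        return (4.0 + 2.0 * delta ** 2) / delta ** 4
    if p == 1:
        m = 0.102475  # constant quoted in the footnote for c-bar = -13.5
        return (1.0 + delta ** 2 * m) / (delta ** 4 * m ** 2)
    raise ValueError("p must be 0 or 1")


# The deltas used in Tables III.A and III.B.
for d in (0.5, 1.0, 1.5, 2.0, 5.0, 8.0):
    print(d, expected_z1(d, 0), expected_z1(d, 1))
```

Under this reading the expected value is much larger for p = 1 than for p = 0 at small $\delta$, which is in line with the larger $\kappa_{mic}$ values reported for p = 1.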
TABLE III.B
k CHOSEN BY MINIMIZING THE ASYMPTOTIC APPROXIMATION OF THE MIC AND IC (18)
p = 0 p = 1
T\$\delta$ 0.50 1.00 1.50 2.00 5.00 8.00 0.50 1.00 1.50 2.00 5.00 8.00
100 bic 4 4 4 4 3 2 4 4 4 4 3 2
aic 6 6 6 5 3 3 6 6 6 5 3 3
mbic 16 8 5 4 3 2 29 13 8 6 3 2
maic 17 8 6 5 3 3 29 13 9 7 3 3
150 bic 5 5 5 4 3 3 5 5 5 4 3 3
aic 8 8 7 7 4 3 8 8 7 7 4 3
mbic 19 9 6 5 3 3 35 15 10 7 3 3
maic 20 10 8 7 4 3 35 16 10 8 4 3
250 bic 6 6 6 6 4 3 6 6 6 6 4 3
aic 10 10 9 9 6 4 10 10 9 9 6 4
mbic 23 11 8 7 4 3 43 19 12 9 4 3
maic 24 13 10 9 6 4 43 20 13 10 6 4
500 bic 8 8 8 8 6 5 8 8 8 8 6 5
aic 15 14 13 12 8 6 15 14 13 12 8 6
mbic 31 15 11 9 6 5 57 26 16 12 6 5
maic 32 17 14 13 8 6 58 27 18 14 8 6
5000 bic 24 23 23 22 18 15 24 23 23 22 18 15
aic 49 46 43 40 27 21 49 46 43 40 27 21
mbic 74 37 28 24 18 15 148 66 43 33 19 15
maic 80 50 44 40 27 21 150 71 50 42 27 21
50000 bic 67 66 65 63 53 45 67 66 65 63 53 45
aic 154 147 137 128 88 68 154 147 137 128 88 68
mbic 174 91 74 67 54 45 369 164 108 85 54 45
maic 207 151 138 128 88 68 379 190 147 130 88 68
with Z1 is absent)
2) (1- - O
(18) E[MIC(k)] = lnf(e ?)+ )2(k?l)
+ CT ([( j ( )2(k+l) ] ( T
This expression is valid uniformly in $0 < k < K_{3T}$ for $K_{3T} = o(T)$ and is obtained
using equation (A.4) and Lemma A.5. When $\delta$ is small and especially when
p = 1, the $k_{mic}$ are several times larger than the $k_{ic}$. As well, differences between
$k_{aic}$ and $k_{maic}$ are nontrivial even for sample sizes deemed unrealistically large.
Table III.B also provides some guidance for the k's that would likely be selected
in practice. When $\theta = -.9$, for example, the local asymptotic results for $\delta = 1$ and
T = 100 suggest that $k_{bic} = 4$ for both p = 0 and 1, but that the $k_{maic}$ are 8 and
13, respectively. The local asymptotic results therefore support the earlier claim
that the IC lacks the flexibility to handle DGP's with a large negative moving
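The mapping between a fixed MA coefficient and the local parameter used to read Table III.B can be sketched as follows. This assumes the local parameterization $\theta_T = -1 + \delta/\sqrt{T}$, which is consistent with $T^{1/2} b_T \to \delta$ in the Appendix; the helper names are hypothetical.

```python
import math


def implied_delta(theta: float, T: int) -> float:
    """Local parameter delta implied by an MA coefficient theta at sample size T."""
    return math.sqrt(T) * (1.0 + theta)


def implied_theta(delta: float, T: int) -> float:
    """MA coefficient implied by delta at sample size T (inverse of implied_delta)."""
    return -1.0 + delta / math.sqrt(T)


# theta = -0.9 with T = 100 corresponds to delta = 1, the case quoted in the text.
print(implied_delta(-0.9, 100))
```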
In this section, we need to establish two things: first, that the MIC chooses
values of k that are appropriate in the sense of minimizing size distortions under
the null hypothesis and not overparameterizing under the alternative; second,
that the $\bar{M}^{GLS}$ and $M^{GLS}$ statistics have power comparable to the $DF^{GLS}$ and
dominate the M tests. Whenever an estimate of $\sigma^2$ is required, $s^2_{AR}$ based upon
the regression (8) is used. Because the MAIC dominates all other criteria from
both theoretical and numerical perspectives we only consider the $M^{GLS}$ tests
constructed using the MAIC. In the simulations, the lower bound is always zero
to reduce the chance of overparameterizing when a large k is not necessary.
The upper bound is $k_{max} = \mathrm{int}(12(T/100)^{1/4})$.8 The results are tabulated for T =
100, 150, and 250. The ten statistics being considered are detailed in the notes
to Table V.A and Table VI.A.
8 In practice, the upper bound should be set such that the constraint does not bind. In simulations,
the upper bound occasionally binds when T = 100 but not for larger sample sizes (e.g. T = 150).
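For reference, the upper bound on the lag length can be computed directly from the rule stated in the text. This is a small sketch; the function name `max_lag` is hypothetical.

```python
def max_lag(T: int) -> int:
    """Upper bound k_max = int(12 * (T / 100)^(1/4)) used in the simulations."""
    return int(12 * (T / 100) ** 0.25)


# Sample sizes used in the simulations.
for T in (100, 150, 250):
    print(T, max_lag(T))
```

For the three sample sizes considered this gives 12, 13, and 15 lags respectively.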
For p = 0, the results for size are reported in Table V.A. The M and $M^{GLS}$
tests still have inflated sizes at T = 100 when $\theta = -0.8$. However, use of GLS
detrended data to construct $s^2_{AR}$ produces substantial size improvements. The
$DF^{GLS}$ based on the BIC yields an exact size around 0.4 at $\theta = -0.8$, but
TABLE IV.A
SELECTED k FOR p = 0
MA Case: $v_t = (1 + \theta L)e_t$
T = 100 T = 250
$\theta$ $k^*_{mz}$ $k^*_{dfgls}$ $k_{bic}$ $k_{aic}$ $k_{mbic}$ $k_{maic}$ $k^*_{mz}$ $k^*_{dfgls}$ $k_{bic}$ $k_{aic}$ $k_{mbic}$ $k_{maic}$
-0.8 8 8 2 4 5 6 7 10 4 6 5 7
-0.7 6 10 2 3 3 4 6 7 3 5 4 5
-0.6 5 7 1 2 2 3 5 6 2 4 3 4
-0.5 3 7 1 2 2 2 4 4 2 3 2 3
-0.4 2 4 1 2 1 2 2 2 1 2 2 2
-0.3 2 3 1 1 1 1 2 2 1 2 1 2
-0.2 1 2 0 1 1 1 1 1 1 1 1 1
-0.1 1 1 0 0 0 0 1 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 0 0 0 0 0 0 0 0 0 1 0 1
0.2 0 0 0 1 0 1 1 1 1 1 1 1
0.3 2 2 1 1 1 1 2 1 1 2 1 2
0.4 2 2 1 2 1 2 2 2 1 2 2 2
0.5 2 2 1 2 2 2 2 2 2 3 2 3
0.6 2 2 2 3 2 3 2 2 2 4 2 4
0.7 2 2 2 4 2 4 2 2 3 5 3 5
0.8 2 2 3 5 3 5 2 2 4 7 4 7
AR Case: (1-pL)vt =et
T =100 T =250
TABLE IV.B
SELECTED k FOR p = 1
MA Case: $v_t = (1 + \theta L)e_t$
T = 100 T = 250
$\theta$ $k^*_{mz}$ $k^*_{dfgls}$ $k_{bic}$ $k_{aic}$ $k_{mbic}$ $k_{maic}$ $k^*_{mz}$ $k^*_{dfgls}$ $k_{bic}$ $k_{aic}$ $k_{mbic}$ $k_{maic}$
-0.8 6 10 0 2 6 6 7 10 3 5 7 8
-0.7 4 8 1 2 4 5 5 8 3 4 5 6
-0.6 4 6 1 2 3 4 4 6 2 3 4 4
-0.5 3 6 1 2 2 3 3 4 2 3 3 3
-0.4 2 3 1 1 2 2 2 3 1 2 2 2
-0.3 2 2 0 1 1 1 2 2 1 1 1 2
-0.2 1 4 0 1 1 1 1 1 1 1 1 1
-0.1 0 1 0 0 0 1 1 1 0 1 0 1
0.0 0 0 0 0 0 0 0 0 0 0 0 0
0.1 1 0 0 1 0 0 1 0 0 1 0 1
0.2 1 2 0 1 0 1 1 1 1 1 0 1
0.3 0 2 1 1 0 1 2 2 1 2 1 2
0.4 2 2 1 2 1 2 2 2 1 2 2 2
0.5 2 2 1 2 2 2 2 2 2 3 2 3
0.6 2 2 2 3 2 2 2 2 3 4 2 4
0.7 2 2 2 4 2 4 2 2 3 5 3 5
0.8 2 2 3 5 2 4 4 4 4 7 4 6
AR Case: (1 - pL)vt = et
T =100 T =250
Note: The entries are the median values of the selected k using 5,000 replications.
the MAIC improves the size to 0.107. The MAIC also yields a substantial size
improvement over $DF^{GLS}(t)$, which has a size of .35 when $\theta = -0.8$. The results
also confirm that the PT test constructed with k chosen by BIC suffers from sub-
TABLE V.A
SIZE OF THE TESTS; p = 0
100 -0.8 0.058 0.041 0.036 0.394 0.411 0.356 0.107 0.429 0.034 0.036
-0.5 0.028 0.013 0.056 0.082 0.104 0.095 0.073 0.109 0.048 0.051
0.0 0.018 0.005 0.043 0.072 0.077 0.072 0.050 0.094 0.036 0.040
0.5 0.027 0.013 0.062 0.081 0.082 0.080 0.048 0.118 0.048 0.054
0.8 0.042 0.018 0.067 0.095 0.097 0.091 0.039 0.145 0.054 0.062
150 -0.8 0.034 0.018 0.026 0.407 0.390 0.304 0.095 0.420 0.025 0.027
-0.5 0.025 0.009 0.054 0.085 0.093 0.086 0.066 0.091 0.047 0.049
0.0 0.017 0.005 0.049 0.062 0.075 0.070 0.051 0.085 0.040 0.046
0.5 0.028 0.011 0.054 0.066 0.072 0.071 0.046 0.090 0.047 0.050
0.8 0.040 0.013 0.060 0.079 0.077 0.070 0.034 0.108 0.047 0.052
250 -0.8 0.016 0.006 0.022 0.370 0.323 0.236 0.088 0.380 0.021 0.021
-0.5 0.036 0.012 0.058 0.075 0.089 0.079 0.063 0.086 0.049 0.054
0.0 0.021 0.006 0.047 0.059 0.062 0.062 0.045 0.069 0.039 0.042
0.5 0.034 0.009 0.056 0.059 0.068 0.069 0.045 0.078 0.047 0.051
0.8 0.044 0.014 0.059 0.066 0.062 0.058 0.042 0.081 0.050 0.053
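AR Case: $v_t = \rho v_{t-1} + e_t$, $e_t \sim N(0, 1)$
T $\rho$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$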
100 -0.8 0.002 0.000 0.015 0.067 0.082 0.076 0.049 0.063 0.013 0.014
-0.5 0.010 0.003 0.037 0.067 0.076 0.071 0.046 0.087 0.028 0.033
0.5 0.029 0.012 0.064 0.082 0.087 0.082 0.053 0.108 0.051 0.057
0.8 0.047 0.025 0.090 0.091 0.081 0.077 0.053 0.125 0.068 0.077
150 -0.8 0.003 0.000 0.021 0.062 0.069 0.067 0.048 0.051 0.019 0.020
-0.5 0.013 0.004 0.039 0.060 0.074 0.069 0.045 0.072 0.031 0.035
0.5 0.030 0.010 0.055 0.063 0.069 0.066 0.047 0.081 0.046 0.051
0.8 0.043 0.017 0.070 0.067 0.071 0.069 0.047 0.092 0.055 0.063
250 -0.8 0.007 0.001 0.029 0.056 0.057 0.054 0.040 0.049 0.026 0.028
-0.5 0.022 0.006 0.046 0.052 0.057 0.058 0.047 0.062 0.040 0.042
0.5 0.028 0.008 0.052 0.056 0.061 0.061 0.045 0.070 0.043 0.046
0.8 0.037 0.010 0.060 0.060 0.061 0.063 0.051 0.073 0.051 0.056
Notes: The results reported for the $M_{mic}$ tests are those corresponding to $MZ_\alpha$. Since the numerical results for $MZ_\alpha$, $MZ_t$,
and $MSB$ are similar, we use the label M in the tables. Three variants of the M tests are considered. The $M_{mic}$ is the original
$MZ_\alpha$, the $M^{GLS}_{mic}$ uses GLS detrended data to construct the statistic but uses least squares detrending in the estimation of $s^2_{AR}$, and
$\bar{M}^{GLS}_{mic}$ uses GLS detrending at $\bar{c} = -7.0$ when constructing $s^2_{AR}$. When the BIC is used, $k_{max} = 6$ and $k_{min} = 3$ as in ERS. For the
MIC, $k_{max} = \mathrm{int}(12(T/100)^{1/4})$, $k_{min} = 0$, and $C_T = 2$, giving the MAIC. The $DF^{GLS}_{t}$ is based on the 10% sequential t test for the
significance of the last lag.
stantial size distortions. However, the PT and MPT show remarkably little size
distortions when the MAIC is used to select k.
To assess the relative power of the tests, we tabulate the rejection rates at
$\alpha = 1 + \bar{c}/T$. For p = 0 and $\bar{c} = -7.0$, the values of $\alpha$ are 0.93, 0.953, and 0.972
TABLE V.B
SIZE-ADJUSTED POWER OF THE TESTS; p = 0
100 -0.8 0.193 0.237 0.311 0.241 0.419 0.412 0.385 0.313 0.326 0.311
-0.5 0.253 0.351 0.344 0.191 0.448 0.429 0.385 0.414 0.362 0.355
0.0 0.315 0.452 0.458 0.131 0.432 0.436 0.454 0.418 0.471 0.442
0.5 0.259 0.332 0.324 0.122 0.414 0.402 0.370 0.402 0.351 0.347
0.8 0.180 0.257 0.264 0.134 0.366 0.335 0.343 0.348 0.296 0.281
150 -0.8 0.180 0.252 0.296 0.262 0.448 0.417 0.415 0.381 0.307 0.292
-0.5 0.267 0.387 0.384 0.195 0.450 0.430 0.393 0.421 0.398 0.396
0.0 0.339 0.463 0.460 0.152 0.422 0.405 0.454 0.413 0.467 0.453
0.5 0.263 0.382 0.380 0.160 0.441 0.418 0.405 0.429 0.393 0.398
0.8 0.233 0.346 0.363 0.147 0.382 0.362 0.404 0.387 0.377 0.374
250 -0.8 0.190 0.319 0.334 0.287 0.441 0.420 0.437 0.419 0.335 0.327
-0.5 0.264 0.404 0.396 0.201 0.477 0.455 0.428 0.458 0.425 0.409
0.0 0.338 0.481 0.477 0.152 0.464 0.446 0.483 0.455 0.485 0.478
0.5 0.258 0.411 0.411 0.169 0.439 0.424 0.431 0.441 0.422 0.413
0.8 0.246 0.385 0.395 0.143 0.432 0.413 0.419 0.434 0.406 0.405
AR Case: $v_t = \rho v_{t-1} + e_t$, $e_t \sim N(0, 1)$
T $\rho$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$
100 -0.8 0.246 0.349 0.350 0.154 0.433 0.429 0.404 0.384 0.362 0.342
-0.5 0.330 0.437 0.432 0.145 0.428 0.420 0.455 0.405 0.452 0.437
0.5 0.279 0.365 0.368 0.115 0.338 0.330 0.385 0.379 0.385 0.374
0.8 0.196 0.251 0.241 0.084 0.297 0.292 0.298 0.261 0.276 0.269
150 -0.8 0.275 0.398 0.416 0.161 0.477 0.437 0.432 0.442 0.419 0.409
-0.5 0.332 0.453 0.454 0.161 0.404 0.390 0.468 0.423 0.463 0.458
0.5 0.307 0.419 0.411 0.140 0.417 0.410 0.446 0.407 0.436 0.430
0.8 0.232 0.333 0.330 0.103 0.341 0.324 0.362 0.325 0.349 0.353
250 -0.8 0.325 0.479 0.472 0.157 0.485 0.480 0.493 0.468 0.477 0.471
-0.5 0.300 0.452 0.447 0.179 0.493 0.477 0.466 0.469 0.459 0.460
0.5 0.336 0.443 0.436 0.158 0.447 0.434 0.456 0.428 0.468 0.456
0.8 0.279 0.375 0.366 0.131 0.386 0.382 0.381 0.381 0.387 0.383
Note: Power is evaluated at $\alpha = 1 + \bar{c}/T$, $\bar{c} = -7.0$. These are 0.93, 0.953, and 0.972 for T = 100, T = 150, and T = 250, respectively.
for T = 100, 150, and 250, respectively.9 The size-adjusted power of the tests is
reported in Table V.B. Although the tests should reject the null hypothesis 50%
of the time in large samples, none of the tests quite achieves the 50% (asymptotic)
target for sample sizes less than 200. Thus, in finite samples, the need to account
for serial correlation reduces the power of the tests across the board. However,
the GLS based tests have more power than tests that do not implement GLS
9 We also performed extensive power simulations with other values of a and T. The results are
qualitatively similar and do not affect the conclusions concerning the relative performance of the
different tests.
TABLE VI.A
SIZE OF THE TESTS; p = 1
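T $\theta$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$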
100 -0.8 0.140 0.123 0.059 0.509 0.567 0.528 0.123 0.562 0.061 0.061
-0.5 0.037 0.025 0.035 0.092 0.107 0.111 0.069 0.145 0.036 0.036
0.0 0.011 0.005 0.018 0.075 0.074 0.078 0.039 0.161 0.017 0.018
0.5 0.032 0.022 0.048 0.084 0.085 0.094 0.038 0.190 0.046 0.047
0.8 0.048 0.033 0.056 0.096 0.105 0.109 0.017 0.250 0.053 0.056
150 -0.8 0.074 0.061 0.027 0.581 0.586 0.477 0.082 0.570 0.028 0.028
-0.5 0.032 0.017 0.028 0.087 0.103 0.104 0.049 0.114 0.028 0.029
0.0 0.017 0.008 0.023 0.066 0.067 0.072 0.038 0.111 0.023 0.024
0.5 0.025 0.016 0.032 0.068 0.084 0.085 0.029 0.142 0.032 0.034
0.8 0.058 0.035 0.058 0.080 0.088 0.085 0.018 0.181 0.055 0.057
250 -0.8 0.025 0.016 0.012 0.566 0.531 0.381 0.064 0.554 0.012 0.012
-0.5 0.031 0.016 0.033 0.094 0.105 0.099 0.048 0.103 0.034 0.035
0.0 0.023 0.011 0.029 0.056 0.059 0.065 0.038 0.083 0.030 0.031
0.5 0.032 0.012 0.037 0.062 0.066 0.070 0.028 0.097 0.036 0.037
0.8 0.059 0.032 0.059 0.062 0.067 0.060 0.026 0.119 0.059 0.060
AR Case: $v_t = \rho v_{t-1} + e_t$, $e_t \sim N(0, 1)$
T $\rho$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$
100 -0.8 0.000 0.000 0.000 0.072 0.079 0.084 0.034 0.067 0.001 0.000
-0.5 0.006 0.003 0.012 0.071 0.074 0.077 0.037 0.122 0.013 0.013
0.5 0.033 0.021 0.042 0.076 0.083 0.088 0.038 0.178 0.040 0.043
0.8 0.081 0.056 0.093 0.081 0.086 0.091 0.043 0.210 0.080 0.085
150 -0.8 0.001 0.000 0.003 0.061 0.062 0.069 0.028 0.043 0.001 0.004
-0.5 0.017 0.006 0.024 0.059 0.070 0.075 0.038 0.090 0.014 0.023
0.5 0.037 0.018 0.044 0.056 0.070 0.074 0.039 0.127 0.041 0.043
0.8 0.045 0.022 0.056 0.063 0.066 0.073 0.043 0.142 0.056 0.055
250 -0.8 0.001 0.000 0.003 0.057 0.054 0.058 0.028 0.009 0.004 0.004
-0.5 0.017 0.006 0.024 0.046 0.047 0.053 0.038 0.025 0.023 0.023
0.5 0.037 0.018 0.044 0.059 0.063 0.068 0.039 0.046 0.042 0.043
0.8 0.045 0.022 0.056 0.054 0.056 0.061 0.043 0.054 0.052 0.055
Note: The results reported for the $M_{mic}$ tests are those corresponding to $MZ_\alpha$. Since the numerical results for $MZ_\alpha$, $MZ_t$,
and $MSB$ are similar, we use the label M in the tables. Three variants of the M tests are considered. The $M_{mic}$ is the original
$MZ_\alpha$, the $M^{GLS}_{mic}$ uses GLS detrended data to construct the statistic but uses least squares detrending in the estimation of $s^2_{AR}$, and
$\bar{M}^{GLS}_{mic}$ uses GLS detrending at $\bar{c} = -13.5$ when constructing $s^2_{AR}$. When the BIC is used, $k_{max} = 6$ and $k_{min} = 3$ as in ERS. For the
MIC, $k_{max} = \mathrm{int}(12(T/100)^{1/4})$, $k_{min} = 0$, and $C_T = 2$, giving the MAIC. The $DF^{GLS}_{t}$ is based on the 10% t test for the significance
of the last lag.
TABLE VI.B
SIZE-ADJUSTED POWER OF THE TESTS; p = 1
T $\theta$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$
100 -0.8 0.296 0.266 0.348 0.283 0.398 0.414 0.357 0.355 0.350 0.348
-0.5 0.320 0.246 0.300 0.258 0.421 0.420 0.324 0.324 0.311 0.305
0.0 0.450 0.266 0.439 0.175 0.362 0.341 0.432 0.488 0.448 0.442
0.5 0.196 0.147 0.207 0.161 0.316 0.310 0.259 0.320 0.224 0.218
0.8 0.149 0.166 0.154 0.169 0.282 0.269 0.254 0.260 0.163 0.155
150 -0.8 0.236 0.269 0.316 0.309 0.444 0.416 0.354 0.326 0.316 0.314
-0.5 0.283 0.348 0.325 0.291 0.434 0.415 0.358 0.384 0.333 0.330
0.0 0.407 0.470 0.457 0.200 0.384 0.352 0.462 0.369 0.466 0.465
0.5 0.312 0.347 0.348 0.212 0.365 0.365 0.375 0.354 0.344 0.348
0.8 0.171 0.209 0.217 0.201 0.317 0.310 0.309 0.302 0.227 0.225
250 -0.8 0.247 0.285 0.296 0.371 0.460 0.430 0.375 0.406 0.300 0.297
-0.5 0.308 0.370 0.366 0.278 0.459 0.441 0.392 0.422 0.374 0.371
0.0 0.380 0.458 0.451 0.250 0.437 0.418 0.447 0.432 0.455 0.456
0.5 0.311 0.374 0.365 0.258 0.426 0.406 0.388 0.441 0.367 0.363
0.8 0.225 0.268 0.271 0.233 0.364 0.354 0.310 0.388 0.280 0.282
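AR Case: $v_t = \rho v_{t-1} + e_t$, $e_t \sim N(0, 1)$
T $\rho$ $M_{mic}$ $M^{GLS}_{mic}$ $\bar{M}^{GLS}_{mic}$ $DF_{bic}$ $DF^{GLS}_{bic}$ $DF^{GLS}_{t}$ $DF^{GLS}_{mic}$ $P_{T,bic}$ $P^{GLS}_{T,mic}$ $MP^{GLS}_{T,mic}$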
100 -0.8 0.257 0.308 0.296 0.204 0.380 0.372 0.385 0.256 0.301 0.299
-0.5 0.332 0.396 0.382 0.205 0.367 0.366 0.431 0.303 0.398 0.393
0.5 0.235 0.265 0.270 0.133 0.262 0.264 0.286 0.247 0.279 0.280
0.8 0.114 0.127 0.129 0.098 0.167 0.165 0.206 0.144 0.150 0.144
150 -0.8 0.311 0.363 0.349 0.240 0.435 0.409 0.409 0.345 0.351 0.347
-0.5 0.388 0.433 0.423 0.240 0.404 0.376 0.437 0.385 0.438 0.429
0.5 0.296 0.338 0.345 0.207 0.338 0.313 0.366 0.333 0.358 0.355
0.8 0.189 0.211 0.218 0.133 0.239 0.228 0.251 0.211 0.228 0.224
250 -0.8 0.379 0.452 0.441 0.245 0.472 0.456 0.468 0.464 0.446 0.442
-0.5 0.372 0.453 0.451 0.300 0.509 0.484 0.445 0.445 0.452 0.457
0.5 0.331 0.387 0.385 0.233 0.391 0.372 0.392 0.401 0.395 0.388
0.8 0.259 0.280 0.288 0.189 0.314 0.292 0.289 0.312 0.285 0.284
Note: Power is evaluated at $\alpha = 1 + \bar{c}/T$, $\bar{c} = -13.5$. These are 0.865, 0.910, and 0.946 for T = 100, T = 150, and T = 250,
respectively.
with the MAIC. The resulting $\bar{M}^{GLS}$ tests have size close to the nominal size for
all parameter values, while the M and $M^{GLS}$ tests continue to have size close
to .15. As well, the MAIC yields exact sizes for the $DF^{GLS}$ and the $P_T$ tests that
are far more accurate than those obtained using the sequential t test and the
BIC.
The size-adjusted power of the tests for p = 1 is reported in Table VI.B.
Power is again evaluated at $\alpha = 1 + \bar{c}/T$. For $\bar{c} = -13.5$, $\alpha$ is 0.865 for T =
100, 0.910 for T = 150, and 0.946 for T = 250. Compared to the size-adjusted
power reported in Table V.B, all GLS based tests have lower power when p = 1.
However, there are non-negligible power gains from GLS detrending.
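The local alternatives at which power is evaluated follow from $\alpha = 1 + \bar{c}/T$. A small sketch that reproduces the values quoted for both deterministic specifications (the function name `local_alternative` is hypothetical):

```python
def local_alternative(c_bar: float, T: int) -> float:
    """Autoregressive root alpha = 1 + c_bar / T at which power is evaluated."""
    return 1.0 + c_bar / T


# c_bar = -7.0 is used for p = 0 and c_bar = -13.5 for p = 1.
for c_bar in (-7.0, -13.5):
    print(c_bar, [round(local_alternative(c_bar, T), 3) for T in (100, 150, 250)])
```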
A feature of interest is that the MIC allows the $M^{GLS}$ tests to have an exact size
that is closer to 5% than fixing k at k* assuming a known DGP. For example,
in the MA case with $\theta = -.8$, p = 1, and T = 100, the size is .059 using the MIC and .159
using a value of k fixed at k* = 6. When T = 150, the corresponding values
are .027 with the MAIC and .093 at k* = 8 (not reported). Hence, the MAIC
chooses a k that adapts to the particular sample analyzed. Size improvements
obtained using the MIC compared to fixing k at k* are even larger when the
number of deterministic components increases (see Perron and Rodriguez (1998)
for the case of a trend with a change in slope).
The size reported for $M^{GLS}$ is remarkable, given that when $\theta < 0$, the Z tests
based on kernel estimates of $\sigma^2$ reject with probability close to one under the null
hypothesis. The DF tests have higher size-unadjusted power in the case of negative
MA errors. Such high probabilities of rejections are misleading, however, as
they reflect size distortions. Although the $DF^{GLS}$ has higher size-adjusted power
than the $M^{GLS}$ for some parameter values, this result must also be put into perspective.
Consider $\theta = -0.8$, T = 100, and p = 1. While the size-adjusted power of
the $DF^{GLS}_{bic}$ is higher than the $M^{GLS}$ by 5 basis points, its size is more distorted by
45 basis points. This is a substantial increase in the probability of a Type I error.
The $DF^{GLS}_{mic}$ fares better, but there is still a size advantage in favor of the $M^{GLS}$.
Overall, the $M^{GLS}$ tests have better sizes, while the $DF^{GLS}$ has better power.
A drawback of the $M^{GLS}$ test is that it is substantially undersized for pure AR
processes with a coefficient close to -1. Although economic time series rarely
display such a property, this nonetheless translates into lower (size unadjusted)
power. As discussed in Section 6.2, this is not due to a weakness of the MIC in
selecting k, but rather is an inherent property of the $M^{GLS}$ tests. The trade-offs
between the $DF^{GLS}$ and the $M^{GLS}$ are to be determined by the practitioner.
8. EMPIRICAL ILLUSTRATIONS
TABLE VII
EMPIRICAL RESULTS FOR INFLATION SERIES FROM THE G7 COUNTRIES
MAIC k 2 2 8 12 5 7 5
$Z^{GLS}_\alpha$ -4.04 -6.98c -14.75a -13.35a -2.71 -3.27 -18.83a
$MZ^{GLS}_\alpha$ -3.71 -5.48 -6.00c -3.17 -2.49 -1.07 -8.82b
$DF^{GLS}$ -1.37 -1.69c -1.77c -1.33 -1.08 -0.77 -2.21b
$P^{GLS}_T$ 7.39 4.92 4.13c 7.64 11.68 25.85 3.42c
$MP^{GLS}_T$ 6.60 4.58 4.17c 7.73 9.83 22.89 3.44c
Note: a, b, and c denote a statistic significant at the 1%, 5%, and 10% level, respectively.
OLS the autoregression (8) for all values of k ranging from 0 to some maximal
order, say $k_{max}$. We use $k_{max} = \mathrm{int}(12(T/100)^{1/4})$ but other values are valid. In
the model selection stage, each autoregression uses the same number of effective
observations, namely $T - k_{max}$. The selected $k_{maic}$ is obtained as the value of
k that minimizes (12). Given $k_{maic}$, the $DF^{GLS}$ is obtained as the t-statistic on
$\beta_0 = 0$ from re-estimation of (8) with $T - k_{maic}$ effective observations. The same
regression is used to construct $s^2_{AR}$ according to (5). Then, the $M^{GLS}$ tests are
constructed from (3) and (4), the $P^{GLS}_T$ test from (7), and the $MP^{GLS}_T$ test from
(9), all with GLS detrended series $\tilde{y}_t$. Using the BIC, the steps are the same
except that the term $\tau_T(k)$ is omitted from (12).
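The model selection stage described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code. It assumes that criterion (12) has the form $MIC(k) = \ln(\hat\sigma_k^2) + C_T(\tau_T(k) + k)/(T - k_{max})$ with $\tau_T(k) = \hat\beta_0^2 \sum_{t=k_{max}+1}^{T} \tilde{y}_{t-1}^2 / \hat\sigma_k^2$, and that the input series has already been detrended. The helper names (`solve`, `criterion`, `select_lag`) and the simulated data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def criterion(y, k, kmax, c_t=2.0, modified=True):
    """IC or MIC for the order-k autoregression, fitted on t = kmax+1, ..., T."""
    T = len(y) - 1  # observations y[0], ..., y[T]
    dy = [0.0] + [y[t] - y[t - 1] for t in range(1, T + 1)]  # dy[t] = y_t - y_{t-1}
    # Regress dy_t on y_{t-1} and k lagged differences, always from t = kmax+1.
    rows = [[y[t - 1]] + [dy[t - j] for j in range(1, k + 1)]
            for t in range(kmax + 1, T + 1)]
    target = [dy[t] for t in range(kmax + 1, T + 1)]
    n_obs, p = len(rows), k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    beta = solve(xtx, xty)
    ssr = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, v in zip(rows, target))
    sigma2 = ssr / n_obs
    # Assumed form of the modification term tau_T(k); it is dropped for the IC.
    tau = beta[0] ** 2 * sum(r[0] ** 2 for r in rows) / sigma2 if modified else 0.0
    return math.log(sigma2) + c_t * (tau + k) / n_obs


def select_lag(y, c_t=2.0, modified=True):
    """Return the k in 0..kmax that minimizes the chosen criterion."""
    T = len(y) - 1
    kmax = int(12 * (T / 100) ** 0.25)
    return min(range(kmax + 1), key=lambda k: criterion(y, k, kmax, c_t, modified))


# Illustrative data only: a unit root process with MA(1) coefficient -0.8.
rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in range(202)]
y = [0.0]
for t in range(1, 201):
    y.append(y[-1] + e[t] - 0.8 * e[t - 1])

k_maic = select_lag(y, modified=True)   # C_T = 2 with the tau term
k_aic = select_lag(y, modified=False)   # C_T = 2 without the tau term
print(k_maic, k_aic)
```

Keeping the estimation sample fixed at $t = k_{max}+1, \ldots, T$ for every k mirrors the requirement that each autoregression use the same number of effective observations. The test statistics themselves would then be computed from a re-estimation with the selected lag, which this sketch does not cover.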
The results show interesting differences between using the MAIC and the BIC
to select the lag length. First, for all countries, except the U.S., the selected lag
length is higher with MAIC than with BIC. Second, in all cases (except the U.S.),
the evidence in favor of stationarity is weaker with the MAIC than with the BIC.
For example, with the BIC, the $MZ^{GLS}_\alpha$ and $DF^{GLS}$ tests suggest a rejection at
the 1% significance level for Japan and Germany, at the 5% level for Canada,
and at the 10% level for Italy. When using the MAIC, there are only rejections
at the 5% level for Germany (showing that the tests still have power) and at
the 10% level for the U.K. For Canada, the $DF^{GLS}$ is significant at the 10%
level but barely. For the U.S. and France, there is no difference in outcome. Of
interest also is the fact that the $Z^{GLS}_\alpha$ still shows strong rejections at the 1%
level for Japan, the U.K., and Germany. This accords with the simulation results
that the $Z^{GLS}_\alpha$ test has high size distortions even when constructed using the
MAIC.
9. CONCLUSION
The analysis of ERS shows that detrending data by GLS yields power gains
for unit root tests. We find that GLS detrending also allows for a more pre-
cise autoregressive spectral density estimate and ensures that it is invariant to
the parameters of the trend function. We also show that the BIC or AIC are
not sufficiently flexible for unit root tests. We suggest a modification, the MAIC,
which is evidently more robust when there are negative moving-average errors.
Use of the MAIC in conjunction with GLS detrended data results in a class
of MGLS tests that have good size and power. The key distinction between the
MAIC and standard information criteria is that the former takes account of the
fact that the bias in the estimate of the sum of the autoregressive coefficients is
highly dependent on k. In this paper, we show that the MAIC is useful for the
DFGLS test and the construction of the autoregressive estimate of the spectral
density at frequency zero. We also expect the MAIC, or the more general MIC,
to be useful in a broader range of applications because macroeconomic data are
known to be highly persistent, and in such cases, the bias in the estimate of the
sum of the autoregressive coefficients should depend on the order of the autore-
gression. However, it should also be noted that while the suggested information
criterion is useful in the context of unit root tests as we have shown, it may not
be appropriate when selecting the lag order in other contexts. The general use-
fulness of the MIC also awaits further investigation.
APPENDIX
PROOF OF THEOREM 2: In this Appendix, for simplicity, we consider the derivations in the case
where the DGP is (14) and no deterministic component is present in the regression (8), in which
case $y_t$ is used instead of $\tilde{y}_t$. All the results carry over to the more general case with least-squares
or GLS detrending. With GLS detrending, the stated results remain exactly the same when p = 0
and are as stated in the text with $V_{c,\bar{c}}(r)$ replacing $J_c(r)$ when p = 1. As a matter of notation, we
let $\Rightarrow$ denote weak convergence in distribution and $\to$ convergence in probability. Also, to alleviate
notational burden, we let $N = T - k_{max}$. Note that we can write the DGP as
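$$\Delta y_t = -\sum_{i=1}^{k} (-\theta_T)^i \Delta y_{t-i} + \frac{c}{T}\sum_{i=0}^{k} (-\theta_T)^i y_{t-i-1} + e_t - (-\theta_T)^{k+1} e_{t-k-1}.$$
Let
$$z_{t-j} = \sum_{i=0}^{k-j} (-\theta_T)^i \Delta y_{t-i-j} = \frac{c}{T}\sum_{i=0}^{k-j} (-\theta_T)^i y_{t-i-j-1} + e_{t-j} - (-\theta_T)^{k-j+1} e_{t-k-1};$$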
$$\Delta y_t = \beta_0 y_{t-1} + \sum_{i=1}^{k} \beta_i z_{t-i} + e_{tk}.$$
Since $\{z_{t-i}, i = 1, \ldots, k\}$ is a linear function of $\{\Delta y_{t-i}, i = 1, \ldots, k\}$, the OLS estimate of $\beta_0$ and
the sum of squared residuals from the regression
(A.3) $$\Delta y_t = \beta_0 y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + e_{tk}$$
have the same properties as the corresponding quantities from estimating (A.1). We start with a
Lemma that will be used extensively.
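LEMMA A.1: Let $J_c(r) = \int_0^r \exp((r - s)c)\,dW(s)$, $K_{3T} = o(T)$, and $\delta_{ij} = 1$ if $i = j$ and 0 otherwise
with $i, j = 1, \ldots, k$. Then, uniformly for $i, j \le k_{max} \le K_{3T} = o(T)$:
(a) $N^{-1}\sum_{t=k_{max}+1}^{T} y_{t-1}^2 = \sigma_e^2(1 + \delta^2\int_0^1 J_c(r)^2 dr) + O_p(T^{-1/2})$;
(b) $N^{-1}\sum_{t=k_{max}+1}^{T} y_{t-1}z_{t-j} = \delta_{1j}\sigma_e^2 + O_p(T^{-1/2})$;
(c) $N^{-1}\sum_{t=k_{max}+1}^{T} z_{t-i}z_{t-j} = \sigma_e^2(\delta_{ij} + (-\theta_T)^{2(k+1)-i-j}) + O_p(T^{-1/2})$;
(d) $N^{-1}\sum_{t=k_{max}+1}^{T} \Delta y_t y_{t-1} = -\sigma_e^2 + O_p(T^{-1/2})$;
(e) $N^{-1}\sum_{t=k_{max}+1}^{T} \Delta y_t z_{t-j} = -\delta_{1j}\sigma_e^2 + O_p(T^{-1/2})$.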
PROOF: From Nabeya and Perron (1994), we can write $y_t = a_T e_t + b_T X_t$ where $X_t = (1 +
c/T)X_{t-1} + e_t$, $a_T = (-\theta_T)(1 + c/T)^{-1}$, $b_T = 1 - (1 + c/T)^{-1}(-\theta_T)$,
$a_T \to 1$, $T^{1/2}b_T \to \delta$ as $T \to \infty$. Also,
$\sum_{t=1}^{T} X_{t-1}e_t = O_p(T)$. For part (a), we have:
T T
2 2
N-1N-1 , ~Yt-i N'1
=N E (aTet-1 + bT X21+
t-1 +
aTb -e
TXt-l-1)
t=kmax+l t=kmax+l
T / c k-j
N Y, +bTXt-1)
- -(-OT) ek1
+tk1
(aTe-1 Z(-OT)Yt-i-j-1
t=kmax+l i=O
T
AT
N- ~, [et-i - et-k-1 (-OT) ][et-j t-k-1 (-OT)klJ] + (T12)
t=kmax+1
T T T
N-1 Z AYtYt1-= cN t +N
Yt- (aTe_l - + bTXt-l)(et + OTet-1),
t=kmax+1 t=kmax+l t=kmax+l
T T /C
N-1 Z Aytzt = N-1 t m
Tyt-k+( +e
(-(-OT)etl)
t=kmax+1 t=kmax+1
The proof of Theorem 2 follows directly from the following Lemma concerned with the limit of
$\hat\beta_0$, the OLS estimate of $\beta_0$ from the regression (A.1), which is equivalent to the OLS estimate of
$\beta_0$ from the autoregression (8).
LEMMA A.2: Consider the OLS estimate of $\beta_0$ from the autoregression (8). Let $x = \delta^2\int_0^1 J_c(r)^2 dr$
when p = 0 and $x = \delta^2\int_0^1 V_{c,\bar{c}}(r)^2 dr$ when p = 1. Then, (i) uniformly in $0 < k < K_{1T}$: $k\hat\beta_0 \Rightarrow -x^{-1}$;
(ii) if $k/T^{1/2} \to \kappa$, $T^{1/2}\hat\beta_0 \Rightarrow A$, where A is a random variable with $E(A) = -\vartheta(\kappa)E(x^{-1})$, $\vartheta(\kappa) =
2\delta\exp(-2\kappa\delta)/(1 - \exp(-2\kappa\delta))$; (iii) uniformly in $K_{2T} < k < K_{3T}$, $T^{1/2}\hat\beta_0 = O_p(1)$.
(i) Case 0 < k < K1T. Let Xt = (Yt-1, Z), Zt = (Zt-*1 Zt-k). Define the following two (k + 1)
by (k + 1) matrices:
- T T
2 Yt-lzt,
yt2_1
T t=kmax+l t=kmax +1
RT= N-1 XtX = N-' ,
t=kmax +1 T
Lt=kmax +l t=kmax +l
l+x el
_ e Ik + CC'-
where x 82 f J, (r)2dr, e' = (1, 0lxk) and c is a k x 1 vector of ones. The inverse of R is
-k +l -k 1 1 1
-k k(l +x) -(1+x) -(1+x) -(1 + x)
R1=J21 1 -(l+X) k(l+x) -x _x
1 k+1 x
1 -(1 + x) -x k(l + x) -x
L -(1 +x) -x -x -x k(l+x)
LEMMA A.3: (a) $\max_{0<k<K_{1T}} \|R_T - R\| = O_p(K_{1T}/\sqrt{T})$, (b) $\max_{0<k<K_{1T}} \|R_T^{-1} - R^{-1}\| =
O_p(K_{1T}/\sqrt{T})$.
PROOF: Using Lemma A.1, we have $R_{T,ij} - R_{ij} = C_{ij}$ where $C_{ij} = O_p(T^{-1/2})$ uniformly in $0 < k < K_{1T}$
for $i, j = 1, \ldots, k + 1$ and part (a) follows. To prove part (b), note that the lower $k \times k$ submatrix
of R is, say, $R_k = (I_k + \iota\iota')$. Then $(I_k + \iota\iota')^{-1} = I_k - (1 + \iota'\iota)^{-1}\iota\iota'$ has eigenvalues of 1 and $(1 + \iota'\iota)^{-1}$.
Hence $\max_{0<k<K_{1T}} \|R_k^{-1}\| = O_p(1)$. Using arguments as in Perron and Ng (1998), the $(k + 1) \times (k + 1)$
matrix R also has eigenvalues of the same order as $R_k$. Hence $\max_{0<k<K_{1T}} \|R^{-1}\| = O_p(1)$. Let $Q_T =
\|R_T^{-1} - R^{-1}\|$. Then $\max_{0<k<K_{1T}} Q_T \le \max_{0<k<K_{1T}} \|R_T^{-1}\| \|R_T - R\| \|R^{-1}\| = O_p(K_{1T}/\sqrt{T})$ by arguments
analogous to Berk (1974).
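The inverse and eigenvalue claim used in part (b) can be checked numerically for the case where $\iota$ is a vector of ones, so that $\iota'\iota = k$. The snippet below is a sanity check under that assumption, written with plain lists; the helper names are hypothetical.

```python
def inverse_identity_plus_ones(k):
    """Return (I_k + ii')^{-1} = I_k - ii'/(1 + k) for i a k-vector of ones."""
    return [[(1.0 if r == c else 0.0) - 1.0 / (1.0 + k) for c in range(k)]
            for r in range(k)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[r][j] * b[j][c] for j in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


k = 5
a = [[(1.0 if r == c else 0.0) + 1.0 for c in range(k)] for r in range(k)]  # I_k + ii'
a_inv = inverse_identity_plus_ones(k)
prod = matmul(a, a_inv)
max_err = max(abs(prod[r][c] - (1.0 if r == c else 0.0))
              for r in range(k) for c in range(k))

# Eigenvector check: the vector of ones is scaled by 1/(1 + k), while a vector
# orthogonal to it, here (1, -1, 0, ..., 0)', is left unchanged (eigenvalue 1).
ones = [[1.0] for _ in range(k)]
contrast = [[1.0], [-1.0]] + [[0.0] for _ in range(k - 2)]
scaled_ones = matmul(a_inv, ones)
kept_contrast = matmul(a_inv, contrast)
print(max_err, scaled_ones[0][0], kept_contrast[0][0])
```

Both eigenvalues are bounded by one for every k, which is what delivers the $O_p(1)$ bound on the norm of the inverse.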
PROOF: Part (a) follows directly from Lemma A.1 (d, e). Since V = (-1, -1, Olxk-l)',
maxO<k<KlT 1IvII =
Op(l) and maxo<k<KlT I1T I = Op(l) in view of part (a) of the Lemma.
Let e' = (1, Olxk)- We have
For a matrix M, denote the i, jth element of M-1 by me'. We now consider the limit of each of the
four terms. For A,
O m<ax
IIke(R+1
maxjj|ke (RT
-R(VT
R)(VTl)| - V)12<KIT
< KlT I1 -Fli 112
p
li 11T-
i, T
|| P
II
Vi 1
O<k<KlT i=1
K2 k K'
<1 KIT pIli - P112 < (Kl-T O-
since i=r-r1-I2 < - R-1 12= Op(k2/T) < Op(KlT/T). For the term B, we need
k+1 k+1
where c* = maxi, Cij. But the sum of the norms of the elements of the first row of R-1 is
[2k + (k - 1)][1 + (k + 1)x1-l = OP(1), and the sum of the norms of the sum of the first two
elements of each row of R-1 is given by (1 + kx + (k - 1)x)/(1 + (k + 1)x) = OP(l). Since c* =
OP(T-1/2), maxO<k<KlT Ike'(RTl - K-)Flll = OP(KlT/T1/2). For the term C, first note that
(VT -V) =
NA( YtYt-1+1)( Yz-+1,
T T 1
Z 'YtZ,-2 Z, Y(Zt-kJ.
t=kmax+1 t=kmax+1
Thus,
max IIke'R-1(VT-V)jj
O<k'KlT
( 1T +
1 + (KIT + l)X 0[(KlT
<+(KT+) + 1)) ((N-1 ?, Atyty
t=kmax +1
+ i)
- K tll
=kmaxjY j=2 t=kmax+l
For D,
max (ke'R- V) K= T 1 -1
O<KT 1 1KT + 1)X X=8
fo J, (r)dr
Combining the results, we have $k\hat\beta_0 = o_p(1) + o_p(1) + o_p(1) + O_p(1) = O_p(1)$. Thus, $k\hat\beta_0 \Rightarrow
-(\delta^2\int_0^1 J_c(r)^2 dr)^{-1}$ uniformly in $0 < k < K_{1T}$.
For the case where kIT112 K, we first note that the relevant matrix R is given by
O-
1r+x Ie'4 I
e Ik + AA J
where_' = [(-OT)k . , (-OT)]. The limiting vector V remains unchanged. Now we have v7I3T=
v-7e'R-' V + <7W, where W = A + B + C. It is easy to see that E(VYW) = 0 in large samples. So,
we need to evaluate v/7e'RK-1, (minus) the sum of the first two elements of R-1. By partitioned
matrix inversion, these are given by
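Since $(I_k + \iota\iota')^{-1} = I_k - \iota\iota'/\Delta$, $\Delta = 1 + \iota'\iota$, $e'(I_k + \iota\iota')^{-1}e = e'e - (e'\iota\iota'e/\Delta) = 1 - [(-\theta_T)^{2k}/\Delta]$.
Hence, $1 - e'(I_k + \iota\iota')^{-1}e = (-\theta_T)^{2k}/\Delta$ and substituting for $\Delta = (1 - (-\theta_T)^{2(k+1)})/(1 - \theta_T^2)$,
(A.4) $$\sqrt{T}e'R^{-1}V = \frac{-\sqrt{T}(-\theta_T)^{2k}(1 - \theta_T^2)}{[1 - (-\theta_T)^{2(k+1)}]x + (-\theta_T)^{2k}(1 - \theta_T^2)} = \frac{-\sqrt{T}(-\theta_T)^{2k}(1 - \theta_T^2)}{[1 - (-\theta_T)^{2(k+1)}]x} + O_p(T^{-1/2}),$$
which converges to $-\vartheta(\kappa)/x$ setting $k = \kappa T^{1/2}$ and taking the limit as $T \to \infty$.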
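The limit in (A.4) can be illustrated numerically. The sketch below assumes the local parameterization $\theta_T = -1 + \delta/\sqrt{T}$ and compares the finite-sample factor $\sqrt{T}(-\theta_T)^{2k}(1 - \theta_T^2)/[1 - (-\theta_T)^{2(k+1)}]$ at $k = \kappa\sqrt{T}$ with $2\delta\exp(-2\kappa\delta)/(1 - \exp(-2\kappa\delta))$. The function names are hypothetical.

```python
import math


def finite_sample_factor(delta: float, kappa: float, T: int) -> float:
    """sqrt(T) * theta^(2k) * (1 - theta^2) / (1 - theta^(2(k+1))) at k = kappa * sqrt(T)."""
    theta = -1.0 + delta / math.sqrt(T)  # assumed local-to-(-1) MA coefficient
    k = int(round(kappa * math.sqrt(T)))
    # Even integer powers, so (-theta)^(2k) equals theta^(2k).
    num = math.sqrt(T) * theta ** (2 * k) * (1.0 - theta ** 2)
    den = 1.0 - theta ** (2 * (k + 1))
    return num / den


def limit_factor(delta: float, kappa: float) -> float:
    """Limiting value 2 delta exp(-2 kappa delta) / (1 - exp(-2 kappa delta))."""
    decay = math.exp(-2.0 * kappa * delta)
    return 2.0 * delta * decay / (1.0 - decay)


for T in (10 ** 4, 10 ** 6, 10 ** 8):
    print(T, finite_sample_factor(1.0, 1.0, T), limit_factor(1.0, 1.0))
```

The two pieces of the limit are visible in the code: $(-\theta_T)^{2k} \to \exp(-2\kappa\delta)$ and $\sqrt{T}(1 - \theta_T^2) \to 2\delta$.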
2 1+x e' ]
and note that
? k-lxl ? Ik--1_
Thus, IRII = Op(l) IIR-111= Op(l). From Lemma A.1 (a, b, c), maXK2T<k<K3T IT-
and
RI = OP(K3T/T1 maXK2T<k<K3T IIRTII< maxK2T<k<K3T IRT1 RI + maxK2T<k<K3T
). Also, RI=
Op(K3TIT12) I
since maXK2T<k<K3T RI = Op(l) and K3T/T112 + c. Since maxK2T<k<K3TIIR1 <
X <k<K3TIRT I, maXK2T<k<K3T II ll Op(T112/K3T). It follows that
1k1-
max glRT-II < max IlRTl
1IJIRT-RIIIIJ-1II
K2)T<k<K3T K2T<k K3T
We also have maxK2T<k<K3T IIpT-VF = Op(K12 /T 1/2) and I1VII= Op(1). Thus,
T1/2 -T1=T/22elR1-J PT
= A + B + C + D.
For A, we have
k
max IIT 1e'(R R)JT V) 12< max TZj Fl1 i 12I J"1, 112i
K2Tk<K3T K2T_kKT i-
k
< max cL)71i-1i I2
K2T<k<K3T i=1
-I 112=
<KkaKC jlI-
RT Op()
Therefore CR-V =-C2 where C2 is the second column of the matrix C. Then e'R1 CR1 V=
x'1(-C12+C22). Since C12 and C22 = Op (T-1/2), maxK2T <k<K3T IITI/2e'Rk-CR-1 V11 = Op(l). For the
third term C,
T 1/2 C
<_ Op(l).
x T12
T1/
=
For D,
Combining the results, we have $T^{1/2}\hat\beta_0 = O_p(1)$ uniformly in $K_{2T} < k < K_{3T}$.
PROOF OF LEMMA 1
LEMMA A.5: Consider the estimated autoregression (8) when the DGP is given by (14). Let &k,=
N-1 ET ; then uniformly over 0 < k < K3T,
61k2
= N-XE'Ek-N E'X(X'X) X'Ek.
By direct calculations,
T T
(A.5) N-1E'Ek = N-1
Ze2e + (-OT)2(k?l)Nl1e+kl
)2(k+l)N-1 ? e2 + (T )
t=kmax+1 t=kmax+1
T ( C k
= N-1 (aT t-l+ bTXt-1) (et-(-OT) t-k-1 + T ( OT) Yt-i-iJ
t=kmax+l i
=Op(T-1/2)
and
T
N1 E zt_etk
t=kma +1
T
= N-1 , (et - (-OT)kljet-k-l) (et - (-OT)k1 et-k-1) + Op(T1/2)
t=kmax1
T
= (_0 )2(k+l)-jN- E e21 + Op(T1/2),
t=kmax+1
Therefore,
e
(A.7) N-1 (X'X) = o-e2 [ + ] + OP(T-1/2) _ R + OP(T 1/2).
where R-1 (k) is the lower right k x k submatrix of R-1. By partitioned matrix inversion,
= [Jk + ]1 t7,
R1(k) = Jk 1- J- /A,
'R-l (k) =
= ~1C
~_1
_k-
-
-(/_j
1
_L)2 /A,
h4/
1 + t'-1
1 + 'J1 -
[1- (oTO)2(k+l)] + (-OT)2k [1 -
(OT)2]/X
1 - (_ OT)2k 1
= (
(T 2
OT)2l - (_0T )2(k?1)Ol(T+
J p(- /)
).
Hence,
It follows that
(
ln(rk2) = ln(o-e2) + [( o ?)2(k+l
( (OT2 + Op(T
PROOF OF THEOREM 3
The strategy of the proof is to first determine the rate of increase of k through a global minimization
of the objective function. Once this rate is found, the proportionality factor is deduced. In the
following, we use the notation $k \propto f(T)$ for the case where $k/f(T) \to c$ for some finite constant c
strictly above 0. We also let d > 0 be a finite constant not necessarily the same throughout.
For both the IC and MIC, it is readily seen that the objective function is $O_p(C_T/T^{1/2})$ when
$k/T^{1/2} \to \kappa$ (> 0), and that it is $O_p(C_T k/T)$ when $k/T^{1/2} \to \infty$. Hence, irrespective of the choice of
$C_T$, having $k/T^{1/2} \to \kappa$ will always, in large samples, yield a smaller value of the objective function
than with $k/T^{1/2} \to \infty$. Hence, we only need to compare the cases $k/T^{1/2} \to \kappa$ (> 0) and $k/T^{1/2} \to 0$.
We start with the analysis for the IC. When $k/T^{1/2} \to 0$, there are two cases of interest. The first
is when $k^2C_T/T \to d$ in which case the objective function is $O_p(1/k)$. The minimal value is obtained
if k is chosen as the maximal value permissible, which means $k \propto \sqrt{T/C_T}$. The minimized objective
function is then $O_p(\sqrt{C_T}/\sqrt{T})$. The second case is when $T/(k^2C_T) \to d$ and the objective function
is $O_p(kC_T/T)$, which is minimized by taking the minimal permissible rate of increase for k, again
$k \propto \sqrt{T/C_T}$, and the minimized objective function is then $O_p(\sqrt{C_T}/\sqrt{T})$. So the case with $k/T^{1/2} \to
\kappa$ (> 0) yields an objective function of order $O_p(C_T/T^{1/2})$ and the case $k/T^{1/2} \to 0$ an objective
function of order $O_p(\sqrt{C_T}/\sqrt{T})$. Hence, $k \propto \sqrt{T/C_T}$. To derive the constant of proportionality, note
that if $C_T$ is fixed, $k = \kappa\sqrt{T}$ and the objective function is (ignoring constants)
CT
2- (K) +
T
TT
Taking derivatives and equating to zero yields the stated solution. When $C_T \to \infty$ as $T \to \infty$, the
objective function (ignoring constants) is
$$\frac{1}{\kappa \sqrt{T/C_T}} + \frac{C_T \, \kappa \sqrt{T/C_T}}{T},$$
and the value of $\kappa$ that minimizes this expression is 1. Consider now the MIC(k). The first case is
where
(A.8) $$\frac{C_T}{k} \to d \quad \text{and} \quad \frac{C_T k^2}{T} \to d.$$
Then the objective function when $k/T^{1/2} \to 0$ is $O_p(k^{-1})$. Hence, the highest possible rate is chosen
and is given by $k \propto \sqrt{T/C_T}$, and the minimized objective function is of order $O_p(\sqrt{C_T/T})$. Since
the objective function when $k/T^{1/2} \to \kappa (> 0)$ is $O_p(C_T/T^{1/2})$, we have $k \propto T^{1/2}$ if $C_T$ is fixed and
$k \propto \sqrt{T/C_T}$ if $C_T \to \infty$. However, in view of the requirement that $C_T/k \to d$, this implies the need
for $C_T^3/T \to d$. To obtain the constant of proportionality when $C_T$ is fixed, we substitute $k = \kappa \sqrt{T}$
in (16), equate its first derivative to zero, and solve for $\kappa$. When $C_T$ is increasing, we substitute
$k = \kappa \sqrt{T/C_T}$ in (15), and again equate the first derivative to zero and solve for $\kappa$.
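The rate argument used above when $C_T \to \infty$ can be checked numerically. The sketch below minimizes the stylized objective $1/k + C_T k/T$ over integer k and compares the minimizer with $\sqrt{T/C_T}$, which corresponds to $\kappa = 1$. Setting every constant to one is an assumption of this illustration, and the function names and the values of T and $C_T$ are hypothetical.

```python
import math


def stylized_ic_objective(k, T, C_T):
    """Sum of the two orders compared in the proof, with constants set to one."""
    return 1.0 / k + C_T * k / T


def argmin_k(T, C_T):
    """Brute-force minimizer over k = 1, ..., 4 * floor(sqrt(T))."""
    k_max = 4 * int(math.sqrt(T))
    return min(range(1, k_max + 1),
               key=lambda k: stylized_ic_objective(k, T, C_T))


# Illustrative sample sizes and penalty values (assumptions).
for T, C_T in [(10_000, 4.0), (1_000_000, 25.0)]:
    k_star = argmin_k(T, C_T)
    print(T, C_T, k_star, math.sqrt(T / C_T))
```

In both configurations the brute-force minimizer coincides with $\sqrt{T/C_T}$, so a larger penalty $C_T$ pulls the selected lag down at the rate $C_T^{-1/2}$.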
Consider now the case where $C_T^3/T \to \infty$. There are two cases to cover when either of the
conditions in (A.8) is not satisfied. Suppose first that $k^3/T \to d$, in which case $k/C_T \to d$. The objective
function in the case $k/T^{1/2} \to 0$ is $O_p(C_T/k^2)$ and the highest rate for k is chosen, namely $k \propto T^{1/3}$.
The minimized objective function is then of order $O_p(C_T/T^{2/3})$, which is of a smaller order than
the case with $k/T^{1/2} \to \kappa (> 0)$. Hence, $k \propto T^{1/3}$ is the resulting rate. If $T/k^3 \to d$, in which case
$C_T k^2/T \to \infty$, the objective function is of order $O_p(k C_T/T)$ and the smallest rate for k is optimal.
So again, $k \propto T^{1/3}$ and the minimized objective function is also of order $O_p(C_T/T^{2/3})$. To obtain the
factor of proportionality, let $k = \kappa T^{1/3}$. The objective function is then (ignoring constants)
$$\frac{1}{\kappa T^{1/3}} + \frac{C_T Z_1}{\kappa^2 T^{2/3}} + \frac{C_T \kappa}{T^{2/3}}.$$
When $C_T^3/T \to \infty$, the first term is negligible compared to the last two. Hence, minimizing the last
two terms with respect to $\kappa$ yields $\kappa = (2 Z_1)^{1/3}$.
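The last minimization can be verified with a short numerical check. After factoring $C_T/T^{2/3}$ out of the last two terms, the function to minimize is $Z_1/\kappa^2 + \kappa$, whose first-order condition $-2 Z_1/\kappa^3 + 1 = 0$ gives $\kappa = (2 Z_1)^{1/3}$. The sketch below treats $Z_1$ as a given positive number, which is an assumption of the illustration, and uses a golden-section search; the function names and the values of $Z_1$ are hypothetical.

```python
import math


def last_two_terms(kappa, z1):
    """Z_1 / kappa^2 + kappa, i.e. the last two terms with C_T / T^(2/3) factored out."""
    return z1 / kappa ** 2 + kappa


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Illustrative values of Z_1 (assumptions, not taken from the paper).
for z1 in (0.5, 4.0):
    numeric = golden_section_min(lambda x: last_two_terms(x, z1), 0.01, 10.0)
    print(z1, numeric, (2.0 * z1) ** (1.0 / 3.0))
```

The objective is convex on the positive half-line, so the search interval only needs to bracket the minimizer.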
REFERENCES
DICKEY, D. A., AND W. A. FULLER (1979): "Distribution of the Estimators for Autoregressive Time
Series with a Unit Root," Journal of the American Statistical Association, 74, 427-431.
DUFOUR, J. M., AND M. KING (1991): "Optimal Invariant Tests for the Autocorrelation Coefficient
in Linear Regressions with Stationary and Nonstationary Errors," Journal of Econometrics, 47,
115-143.
ELLIOTT, G., T. J. ROTHENBERG, AND J. H. STOCK (1996): "Efficient Tests for an Autoregressive
Unit Root," Econometrica, 64, 813-836.
FRANSES, P. H., AND N. HALDRUP (1994): "The Effects of Additive Outliers on Tests of Unit Roots
and Cointegration," Journal of Business and Economic Statistics, 12, 471-478.
FULLER, W. A. (1976): Introduction to Statistical Time Series. New York: John Wiley.
GOURIEROUX, C., AND A. MONFORT (1995): Statistical and Econometric Models, Vol. 1. Cambridge:
Cambridge University Press.
HANNAN, E. J., AND M. DEISTLER (1988): The Statistical Theory of Linear Systems. New York: John
Wiley.
LOPEZ, J. H. (1997): "The Power of the ADF Test," Economics Letters, 57, 5-10.
NABEYA, S., AND P. PERRON (1994): "Local Asymptotic Distribution Related to the AR(1) Model
with Dependent Errors," Journal of Econometrics, 62, 229-264.
NG, S., AND P. PERRON (1995): "Unit Root Tests in ARMA Models with Data Dependent Methods
for the Selection of the Truncation Lag," Journal of the American Statistical Association, 90, 268-281.
NG, S., AND P. PERRON (2000): "A Note on the Selection of Time Series Models," Manuscript,
Department of Economics, Boston College.
PANTULA, S. G. (1991): "Asymptotic Distributions of Unit Root Tests When the Process is Nearly
Stationary," Journal of Business and Economic Statistics, 9, 63-71.
PERRON, P., AND S. NG (1996): "Useful Modifications to Unit Root Tests with Dependent Errors
and their Local Asymptotic Properties," Review of Economic Studies, 63, 435-465.
PERRON, P., AND S. NG (1998): "An Autoregressive Spectral Density Estimator at Frequency Zero
for Nonstationarity Tests," Econometric Theory, 14, 560-603.
PERRON, P., AND G. RODRIGUEZ (1998): "GLS Detrending, Efficient Unit Root Tests and Structural
Change," Manuscript, Boston University.
PHILLIPS, P. C. B. (1987): "Time Series Regression with Unit Roots," Econometrica, 55, 277-302.
PHILLIPS, P. C. B., AND P. PERRON (1988): "Testing for a Unit Root in Time Series Regression,"
Biometrika, 75, 335-346.
SAID, S. E., AND D. A. DICKEY (1984): "Testing for Unit Roots in Autoregressive-Moving Average
Models of Unknown Order," Biometrika, 71, 599-607.
SCHWERT, G. W. (1989): "Tests for Unit Roots: A Monte Carlo Investigation," Journal of Business
and Economic Statistics, 7, 147-160.
STOCK, J. H. (1990): "A Class of Tests for Integration and Cointegration," Manuscript, Harvard
University.