Auto Correlation
12.1 Motivation
• Autocorrelation occurs when something that happens today has an impact on what happens tomorrow, i.e., the errors in the model are correlated over time.
• Note: Autocorrelation can only happen into the past, not into the future.
    ε_i = ρ ε_{i-1} + u_i
• Note: In general we can have AR(p) errors, which implies p lagged terms in the error structure, i.e.,
    ε_i = ρ_1 ε_{i-1} + ρ_2 ε_{i-2} + · · · + ρ_p ε_{i-p} + u_i
• Note: We will need |ρ| < 1 for stability and stationarity. If |ρ| < 1 happens to fail then we have the following problems:
    1. ρ = 1: The process is a random walk and its variance grows without bound
    2. ρ > 1: The process explodes
• The consequences for OLS: β̂ is unbiased and consistent but no longer efficient, and the usual statistical inference is rendered invalid.
• Lemma:
    ε_i = Σ_{j=0}^∞ ρ^j u_{i-j}
• Proof:
    ε_i = ρ ε_{i-1} + u_i
  and ε_{i-1} = ρ ε_{i-2} + u_{i-1}, so that
    ε_i = ρ² ε_{i-2} + ρ u_{i-1} + u_i
  If we continue to substitute for ε_{i-k} we get
    ε_i = Σ_{j=0}^∞ ρ^j u_{i-j}
• Taking expectations,
    E[ε_i] = E[Σ_{j=0}^∞ ρ^j u_{i-j}] = Σ_{j=0}^∞ ρ^j E[u_{i-j}] = Σ_{j=0}^∞ ρ^j · 0 = 0
• The variance of ε is
    var(ε_i) = E[ε_i²] = E[(ρ ε_{i-1} + u_i)²] = ρ² E[ε_{i-1}²] + 2ρ E[ε_{i-1} u_i] + E[u_i²]
• Note: E[u_i u_j] = 0 for all i ≠ j via the white-noise assumption, so the cross term E[ε_{i-1} u_i] is wiped out. This is not the same as claiming E[ε_i ε_j] = 0. Therefore,
    var(ε_i) = σ_u² + ρ² var(ε_{i-1})
  But, assuming homoscedasticity, var(ε_i) = var(ε_{i-1}), so that
    var(ε_i) = σ_u² + ρ² var(ε_i)
    var(ε_i) = σ_u² / (1 − ρ²) ≡ σ²
• Note: This is why we need |ρ| < 1 for stability in the process.
• If |ρ| > 1 then the denominator is negative, but var(ε_i) cannot be negative.
• The covariance between adjacent errors is
    cov(ε_i, ε_{i-1}) = E[ε_i ε_{i-1}] = E[(ρ ε_{i-1} + u_i) ε_{i-1}] = ρ var(ε_{i-1}) = (ρ / (1 − ρ²)) σ_u²
• In general,
    cov(ε_i, ε_{i-j}) = E[ε_i ε_{i-j}] = (ρ^j / (1 − ρ²)) σ_u²
• which implies that

    σ²Ω = (σ_u² / (1 − ρ²)) ×
          [ 1         ρ         ρ²       · · ·   ρ^{N−1} ]
          [ ρ         1         ρ        · · ·   ρ^{N−2} ]
          [ ρ²        ρ         1        · · ·   ρ^{N−3} ]
          [ ·         ·         ·        · · ·   ·       ]
          [ ρ^{N−1}   ρ^{N−2}   · · ·    ρ       1       ]
• The first-order autocorrelation is then
    corr(ε_i, ε_{i-1}) = cov(ε_i, ε_{i-1}) / √(var(ε_i) var(ε_{i-1}))
                       = [(ρ / (1 − ρ²)) σ_u²] / [(1 / (1 − ρ²)) σ_u²] = ρ
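As a sanity check on these results, a short simulation (a sketch with an arbitrary ρ = 0.7 and σ_u = 1, not values from the notes) can confirm that var(ε) ≈ σ_u²/(1 − ρ²) and corr(ε_i, ε_{i-1}) ≈ ρ:

```python
import random

random.seed(42)
rho, sigma_u, N = 0.7, 1.0, 200_000

# Simulate eps_i = rho * eps_(i-1) + u_i with white-noise u_i ~ N(0, sigma_u^2)
eps = [0.0]
for _ in range(N):
    eps.append(rho * eps[-1] + random.gauss(0.0, sigma_u))
eps = eps[5000:]  # discard burn-in so the start-up value does not matter

mean = sum(eps) / len(eps)
var = sum((e - mean) ** 2 for e in eps) / len(eps)
cov1 = sum((eps[i] - mean) * (eps[i - 1] - mean)
           for i in range(1, len(eps))) / (len(eps) - 1)

print(round(var, 2))         # theory: sigma_u^2 / (1 - rho^2) = 1.96
print(round(cov1 / var, 2))  # theory: corr(eps_i, eps_(i-1)) = rho = 0.7
```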
• Note: If we know Ω then we can apply our previous results of GLS for an easy fix.
• This implies that the naive estimate σ²(X′X)⁻¹ tends to understate the true covariance σ²(X′X)⁻¹X′ΩX(X′X)⁻¹ if ρ > 0.
• This implies that t-statistics are over-stated and we may commit Type I errors in our inferences.
• How do we know if we have autocorrelation or not? One non-parametric option is the Runs Test.
• Take the sign of each residual and write them out in time order, e.g.,
    + + + + − − − − − − − + + + + − + − − − + + + + + +
• Let a "run" be an uninterrupted sequence of the same sign, and let the "length" of a run be the number of residuals in that sequence.
• Here we have 7 runs: 4 plus, 7 minus, 4 plus, 1 minus, 1 plus, 3 minus, 6 plus.
    N = n1 + n2     Total observations
    n1              Number of positive residuals
    n2              Number of negative residuals
    k               Number of runs
  where
    E[k] = 2 n1 n2 / (n1 + n2) + 1 ;   σ_k² = 2 n1 n2 (2 n1 n2 − n1 − n2) / ((n1 + n2)² (n1 + n2 − 1))
• Here we have n1 = 15, n2 = 11, and k = 7, thus E[k] = 13.69, σ_k² = 5.93, and σ_k = 2.43.
• Under the null of random errors, z = (k − E[k])/σ_k = (7 − 13.69)/2.43 = −2.75, so we reject the null hypothesis that the errors are truly random.
• In STATA, after a reg command, calculate the fitted residuals and use the runtest command.
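The runs-test arithmetic above is easy to script; this sketch reproduces the numbers for the 26-residual example (tiny rounding differences from the notes aside):

```python
import math

def runs_test(signs):
    """Runs test on a '+'/'-' residual-sign string: returns the number of
    runs k, E[k], var(k), and the z-statistic under the null of randomness."""
    n1, n2 = signs.count('+'), signs.count('-')
    N = n1 + n2
    k = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    E_k = 2 * n1 * n2 / N + 1
    var_k = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (N ** 2 * (N - 1))
    z = (k - E_k) / math.sqrt(var_k)
    return k, E_k, var_k, z

# The 26-residual example from the notes: runs of 4+, 7-, 4+, 1-, 1+, 3-, 6+
signs = '+' * 4 + '-' * 7 + '+' * 4 + '-' + '+' + '-' * 3 + '+' * 6
k, E_k, var_k, z = runs_test(signs)
print(k, round(E_k, 2), round(var_k, 2), round(z, 2))  # → 7 13.69 5.94 -2.75
```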
4. Durbin-Watson Test
• The Durbin-Watson test is a very popular test for AR(1) error terms.
• Assumptions: the X's are non-stochastic (in particular, no lagged dependent variables on the right-hand side) and the regression includes an intercept.
• The test statistic is
    d = Σ_{t=2}^N (ε̂_t − ε̂_{t-1})² / Σ_{t=1}^N ε̂_t²
  which is equivalent to
    d = ε̂′Aε̂ / ε̂′ε̂   where A =
        [  1  −1   0  · · ·   0   0 ]
        [ −1   2  −1  · · ·   0   0 ]
        [  0  −1   2  · · ·   0   0 ]
        [  ·   ·   ·  · · ·   ·   · ]
        [  0   0  · · ·  −1   2  −1 ]
        [  0   0  · · ·   0  −1   1 ]
• Some statistical packages report the Durbin-Watson statistic for every regression by default.
• A rule of thumb for the DW test: a statistic very close to 2, either above or below, suggests little evidence of first-order autocorrelation.
• There is a potential problem with the DW test, however. The DW test has three
  regions: We can reject the null, we can fail to reject the null, or we may have an
  inconclusive result.
• The reason for the ambiguity is that the DW statistic does not follow a standard distribution. The distribution of the statistic depends on the ε̂_t, which in turn depend on the X_t's in the model. Further, each application of the test has a different number of degrees of freedom.
• To implement the Durbin-Watson test:
    (a) Estimate the model via OLS and compute d from the fitted residuals.
    (b) Using N, the number of observations, and k, the number of rhs variables (excluding the intercept), determine the upper and lower bounds of the DW statistic, DW_L and DW_U, from the DW tables.
• Then if
    d < DW_L                  Reject H0: Evidence of positive correlation
    DW_L ≤ d ≤ DW_U           Inconclusive
    DW_U < d < 4 − DW_U       Fail to reject H0
    4 − DW_U ≤ d ≤ 4 − DW_L   Inconclusive
    d > 4 − DW_L              Reject H0: Evidence of negative correlation
• For example, let N = 25, k = 3; then DW_L = 0.906 and DW_U = 1.409. If d = 1.78 then d > DW_U but d < 4 − DW_U and we fail to reject the null.
• Some packages will generate a P-value for the calculated DW statistic. If not, then non-parametric tests may be used instead.
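Computing d itself is a one-liner; here is a sketch on a short, made-up residual series whose signs cluster (positive autocorrelation), which drives d well below 2:

```python
def durbin_watson(e):
    """d = sum_t (e_t - e_(t-1))^2 / sum_t e_t^2 on a residual series."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# A made-up residual series whose signs cluster (positive autocorrelation)
e = [1.0, 1.2, 0.9, 1.1, 0.8, -0.2, -0.9, -1.1, -1.0, -0.7, 0.1, 0.8, 1.0]
d = durbin_watson(e)
print(round(d, 2))  # well below 2, pointing to positive autocorrelation
```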
• Let's look a little closer at our DW statistic.
    DW = [Σ_{i=2}^N ε̂_i² − 2 Σ_{i=2}^N ε̂_i ε̂_{i-1} + Σ_{i=2}^N ε̂_{i-1}²] / ε̂′ε̂
       = [ε̂′ε̂ − 2 Σ_{i=2}^N ε̂_i ε̂_{i-1} + ε̂′ε̂ − ε̂_1² − ε̂_N²] / ε̂′ε̂
  Therefore,
    DW = [2 ε̂′ε̂ − 2 Σ_{i=2}^N ε̂_i ε̂_{i-1} − ε̂_1² − ε̂_N²] / ε̂′ε̂
       = 2 − [2 Σ_{i=2}^N (ρ̂ ε̂_{i-1} + u_i) ε̂_{i-1} + ε̂_1² + ε̂_N²] / ε̂′ε̂
  Defining
    γ_1 = Σ_{i=2}^N ε̂_{i-1}² / ε̂′ε̂   and   γ_2 = (ε̂_1² + ε̂_N²) / ε̂′ε̂
  and noting that Σ u_i ε̂_{i-1} ≈ 0, we have DW ≈ 2 − 2ρ̂γ_1 − γ_2. For large N, γ_1 → 1 and γ_2 → 0, so DW ≈ 2(1 − ρ̂).
5. Durbin’s h-Test
• The Durbin-Watson test assumes that X is non-stochastic. This may not always
be the case, e.g., if we include lagged dependent variables on the right-hand side.
     • Durbin offers an alternative test in this case.
    h = (1 − d/2) √( N / (1 − N · var(α̂)) )
  where α̂ is the estimated coefficient on the lagged dependent variable; under the null of no autocorrelation, h is asymptotically N(0, 1).
• Note: If N · var(α̂) > 1 then we have a problem because we can't take the square root of a negative number.
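A small helper makes the h-test mechanical; the inputs below (d = 1.5, N = 100, var(α̂) = 0.004) are hypothetical, chosen only to illustrate both the computation and the N·var(α̂) ≥ 1 failure case:

```python
import math

def durbin_h(d, N, var_alpha):
    """Durbin's h; var_alpha is the estimated variance of the coefficient on
    the lagged dependent variable. Returns None when N*var_alpha >= 1,
    in which case h is undefined (negative number under the root)."""
    if N * var_alpha >= 1:
        return None
    return (1 - d / 2) * math.sqrt(N / (1 - N * var_alpha))

print(round(durbin_h(1.5, 100, 0.004), 3))  # → 3.227
print(durbin_h(1.5, 300, 0.004))            # → None (300 * 0.004 > 1)
```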
6. Wald Test
7. Breusch-Godfrey Test
• Regress ²̂i on Xi , ²̂i−1 , . . . , ²̂i−p and obtain N R2 ∼ χ2p where p is the number of
lagged residuals.
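Stata's bgodfrey implements this, but the mechanics are simple enough to sketch directly. The helper below runs the auxiliary regression and returns N·R²; the data are simulated (hypothetical, not from the notes), with AR(1) errors so the test should reject:

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c and A[r][c] != 0.0:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def breusch_godfrey(X, e, p):
    """Auxiliary regression of e_t on X_t and p lagged residuals
    (dropping the first p observations); returns N*R^2 ~ chi2(p)."""
    rows = [X[t] + [e[t - j] for j in range(1, p + 1)] for t in range(p, len(e))]
    y = e[p:]
    b = ols(rows, y)
    fit = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return len(y) * (1 - ss_res / ss_tot)

# Simulated (hypothetical) data: y = 1 + 2x + eps, eps AR(1) with rho = 0.6
random.seed(1)
N, rho = 500, 0.6
x = [random.gauss(0, 1) for _ in range(N)]
eps, prev = [], 0.0
for _ in range(N):
    prev = rho * prev + random.gauss(0, 1)
    eps.append(prev)
y = [1 + 2 * xi + ei for xi, ei in zip(x, eps)]
X = [[1.0, xi] for xi in x]
b = ols(X, y)
e = [yi - (b[0] + b[1] * xi) for yi, xi in zip(y, x)]
stat = breusch_godfrey(X, e, 1)
print(stat > 3.84)  # True: N*R^2 exceeds the chi2(1) 5% critical value
```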
8. Box-Pierce Test
    Q = N Σ_{i=1}^L r_i²,   where r_i = Σ_{j=i+1}^N ε̂_j ε̂_{j-i} / Σ_{j=1}^N ε̂_j²
  Under the null of no autocorrelation, Q is asymptotically χ² with L degrees of freedom.
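A sketch of Q on simulated AR(1) residuals (ρ = 0.6, hypothetical data) shows the statistic far exceeding the χ²₅ 5% critical value of 11.07:

```python
import random

def box_pierce(e, L):
    """Q = N * sum_{i=1..L} r_i^2, with r_i the lag-i autocorrelation of e."""
    N = len(e)
    den = sum(x * x for x in e)
    r = [sum(e[j] * e[j - i] for j in range(i, N)) / den
         for i in range(1, L + 1)]
    return N * sum(ri * ri for ri in r)

# Hypothetical residuals from an AR(1) process with rho = 0.6
random.seed(7)
rho, N = 0.6, 1000
e, prev = [], 0.0
for _ in range(N):
    prev = rho * prev + random.gauss(0, 1)
    e.append(prev)
q = box_pierce(e, 5)
print(q > 11.07)  # True: Q far exceeds the chi2(5) 5% critical value
```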
• One way to fix the problem is to get the error term of the estimated equation to satisfy the full ideal conditions. One way to do this might be through substitution. Starting from
    y_t = β_0 + β_1 X_t + ε_t,   ε_t = ρ ε_{t-1} + u_t
  substitution gives
    y_t = β_0 + β_1 X_t + ρ ε_{t-1} + u_t
  and, since ε_{t-1} = y_{t-1} − β_0 − β_1 X_{t-1},
    y_t − ρ y_{t-1} = β_0(1 − ρ) + β_1(X_t − ρ X_{t-1}) + u_t
• We can estimate the transformed model, which satisfies the full ideal conditions, as
    y_t* = β_0* + β_1 X_t* + u_t,   where y_t* = y_t − ρ y_{t-1}, X_t* = X_t − ρ X_{t-1}, β_0* = β_0(1 − ρ)
• One downside is the loss of the first observation, which can be a considerable sacrifice in degrees of freedom. For instance, if our sample size were 30 observations, this would mean giving up more than 3% of the sample.
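This substitution idea is the Cochrane-Orcutt procedure, usually iterated: estimate β by OLS, estimate ρ from the residuals, quasi-difference the data, re-estimate, and repeat. A pure-Python sketch on simulated data (ρ = 0.8, β = (1, 2), all hypothetical):

```python
import random

def ols_line(x, y):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1

def cochrane_orcutt(x, y, iterations=10):
    """Iterated Cochrane-Orcutt: OLS, estimate rho from residuals,
    quasi-difference (losing the first observation), re-estimate, repeat."""
    b0, b1 = ols_line(x, y)
    rho = 0.0
    for _ in range(iterations):
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        rho = (sum(e[i] * e[i - 1] for i in range(1, len(e)))
               / sum(ei ** 2 for ei in e[1:]))
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        a0, b1 = ols_line(xs, ys)
        b0 = a0 / (1 - rho)  # recover beta0 from beta0 * (1 - rho)
    return b0, b1, rho

# Hypothetical data: y = 1 + 2x + eps with AR(1) errors, rho = 0.8
random.seed(3)
N, rho_true = 2000, 0.8
x = [random.gauss(0, 1) for _ in range(N)]
eps, prev = [], 0.0
for _ in range(N):
    prev = rho_true * prev + random.gauss(0, 1)
    eps.append(prev)
y = [1 + 2 * xi + ei for xi, ei in zip(x, eps)]
b0, b1, rho = cochrane_orcutt(x, y)
print(round(b1, 1), round(rho, 1))  # close to the true values 2 and 0.8
```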
  where
    Ω = (1 / (1 − ρ²)) ×
        [ 1         ρ         ρ²     · · ·   ρ^{N−1} ]
        [ ρ         1         ρ      · · ·   ρ^{N−2} ]
        [ ·         ·         ·      · · ·   ·       ]
        [ ρ^{N−2}   · · ·     ·      1       ρ       ]
        [ ρ^{N−1}   · · ·     ·      ρ       1       ]
• Note that for GLS we seek Ω^{−1/2} such that Ω^{−1/2} Ω (Ω^{−1/2})′ = I and transform the model. Thus we estimate
    Ω^{−1/2} y = Ω^{−1/2} X β + Ω^{−1/2} ε
  where
    Ω^{−1/2} =
        [ √(1 − ρ²)   0     0   · · ·    0 ]
        [ −ρ          1     0   · · ·    0 ]
        [ 0          −ρ     1   · · ·    0 ]
        [ ·           ·     ·   · · ·    · ]
        [ 0         · · ·   ·    −ρ      1 ]
• This implies that
    1st observation:         √(1 − ρ²) y_1 = √(1 − ρ²) X_1 β + √(1 − ρ²) ε_1
    observations 2, …, N:    y_i − ρ y_{i-1} = (X_i − ρ X_{i-1}) β + u_i
  where u_i = ε_i − ρ ε_{i-1}
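A quick simulation (hypothetical ρ = 0.8, σ_u = 1) confirms what the transformation buys us: the transformed errors have constant variance σ_u² and no first-order correlation:

```python
import math
import random

random.seed(9)
rho, sigma_u, N = 0.8, 1.0, 100_000

# Simulate stationary AR(1) errors; draw eps_1 from the stationary distribution
eps = [random.gauss(0.0, sigma_u / math.sqrt(1 - rho ** 2))]
for _ in range(N - 1):
    eps.append(rho * eps[-1] + random.gauss(0.0, sigma_u))

# Prais-Winsten transform: keep the first observation, quasi-difference the rest
star = [math.sqrt(1 - rho ** 2) * eps[0]]
star += [eps[i] - rho * eps[i - 1] for i in range(1, N)]

var = sum(s * s for s in star) / N
lag1 = sum(star[i] * star[i - 1] for i in range(1, N)) / (N - 1)
print(round(var, 2), round(lag1 / var, 2))  # near sigma_u^2 = 1 and 0
```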
• Thus,
    cov(β̃) = σ_u² (X′Ω⁻¹X)⁻¹
  and
    σ̃² = (1/N) (y − Xβ̃)′ Ω⁻¹ (y − Xβ̃)
• Methods of estimating ρ
  1. Use the OLS residuals:
    ρ̂ = Σ_{i=2}^N ε̂_i ε̂_{i-1} / Σ_{i=2}^N ε̂_i²
  2. Durbin's Method (1960): Estimate the unrestricted regression
    y_i = ρ y_{i-1} + β_0(1 − ρ) + β_1 X_i − ρβ_1 X_{i-1} + u_i
  by OLS. From this we obtain ρ̂, which is the coefficient on y_{i-1}. This parameter estimate is biased but consistent.
We can correct the covariance matrix of β̂ much like we did in the case of heteroscedasticity. This extension of White (1980) was offered by Newey and West:
    cov̂(β̂) = (X′X)⁻¹ [X′ΩX]̂ (X′X)⁻¹
  where
    [X′ΩX]̂ = (1/N) Σ_{i=1}^N ε̂_i² X_i X_i′ + (1/N) Σ_{i=1}^L Σ_{j=i+1}^N ω_i ε̂_j ε̂_{j-i} (X_j X_{j-i}′ + X_{j-i} X_j′)
  where
    ω_i = 1 − i / (L + 1)
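The Bartlett weights ω_i and the middle matrix above are straightforward to code; this is a sketch of the formula as written, not production code (statsmodels and Stata's newey implement the full estimator):

```python
def bartlett_weights(L):
    """Newey-West (Bartlett) weights w_i = 1 - i/(L+1), i = 1, ..., L."""
    return [1 - i / (L + 1) for i in range(1, L + 1)]

def nw_middle(X, e, L):
    """The (1/N) [X'Omega X] estimate from the formula above; X is a list of
    regressor rows, e the fitted residuals, L the lag truncation."""
    N, k = len(X), len(X[0])
    w = bartlett_weights(L)
    # lag-0 (White) term
    S = [[sum(e[t] ** 2 * X[t][a] * X[t][b] for t in range(N)) / N
          for b in range(k)] for a in range(k)]
    # weighted cross-lag terms, kept symmetric by adding both outer products
    for i in range(1, L + 1):
        for t in range(i, N):
            c = w[i - 1] * e[t] * e[t - i] / N
            for a in range(k):
                for b in range(k):
                    S[a][b] += c * (X[t][a] * X[t - i][b] + X[t - i][a] * X[t][b])
    return S

print(bartlett_weights(3))  # → [0.75, 0.5, 0.25]
```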
A possible problem in this approach is determining L, or how far back into the past the correlation in the errors extends. In practice we can tell when we have included enough lags because the estimated coefficient on the k-th lag will be insignificant.
• This approach is useful in time-series studies with lots of data, where you are safely within the large-sample conditions the estimator requires.
• Having estimated β̃_GLS, we know that β̃_GLS is BLUE when cov(ε) = σ²Ω with Ω ≠ I.
• With an AR(1) process, we know that tomorrow's error depends upon today's error, and we can exploit this when forecasting.
• We estimate
    y_t = X_t β + ε_t,   where ε_t = ρ ε_{t-1} + u_t
• The forecast becomes
    ŷ_{t+1} = X_{t+1} β̃ + ρ̂ ε̃_t
• To finish the forecast, we need ρ̂ from our previous estimation techniques, and then we recognize that
    ε̃_t = y_t − X_t β̃
• What if X_{t+1} doesn't exist? This occurs when we try to perform out-of-sample forecasting. Perhaps we use X_t β̃?
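A simulation illustrates the payoff: adding ρ̂ε̃_t to the forecast cuts the one-step-ahead MSE from var(ε) = σ_u²/(1 − ρ²) down toward var(u). Here β and ρ are plugged in at their true values purely for illustration (a sketch; in practice they come from the estimation above):

```python
import random

# Hypothetical model y_t = 1 + 2 x_t + eps_t with AR(1) errors, rho = 0.7
random.seed(5)
b0, b1, rho = 1.0, 2.0, 0.7
N = 3000
x = [random.gauss(0, 1) for _ in range(N)]
eps, prev = [], 0.0
for _ in range(N):
    prev = rho * prev + random.gauss(0, 1)
    eps.append(prev)
y = [b0 + b1 * xi + ei for xi, ei in zip(x, eps)]

# One-step-ahead forecasts with and without the rho * (today's error) term
se_ar, se_naive = [], []
for t in range(1000, N - 1):
    e_t = y[t] - (b0 + b1 * x[t])          # today's estimated error
    f_ar = b0 + b1 * x[t + 1] + rho * e_t  # exploits the AR(1) structure
    f_naive = b0 + b1 * x[t + 1]           # ignores it
    se_ar.append((y[t + 1] - f_ar) ** 2)
    se_naive.append((y[t + 1] - f_naive) ** 2)
mse_ar = sum(se_ar) / len(se_ar)
mse_naive = sum(se_naive) / len(se_naive)
print(mse_ar < mse_naive)  # True: the correction lowers forecast MSE
```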
• In this example we look at the relationship between the U.S. average retail price of gasoline and the wholesale price of gasoline from January 1985 through February 2006, using the Stata data file gasprices.dta.
• As an initial step, we plot the two series over time and notice a highly correlated set
of series:
[Figure: time-series plot of allgradesprice and wprice, which move closely together]
• The results suggest that for every penny increase in the wholesale price, there is a 1.21 penny increase in the average retail price of gasoline. The constant term suggests that, on average, there is approximately a 32 cent difference between retail and wholesale prices.
   . dwstat
  Durbin-Watson d-statistic(         2,        254) =    .1905724
• The DW statistic suggests that the data suffer from significant autocorrelation. Re-
  versing out an estimate of ρ̂ = 1 − d/2 suggests that ρ = 0.904.
 .    reg allgradesprice wprice, r
• The robust regression results suggest that the naive OLS over-states the variance in
       the parameter estimate on wprice, but the positive value of ρ suggests the opposite is
       likely true.
F(1,252) =   3558.42
Prob > F       =    0.0000
----------------------------------------------------------------------
           |            Newey-West
allgrades |      Coef.   Std. Err.    t    P>|t| [95% Conf. Interval]
----------+-----------------------------------------------------------
   wprice |   1.219083   .0204364 59.65    0.000 1.178835     1.259331
    _cons |   31.98693   2.023802 15.81    0.000 28.00121     35.97265
     • The Newey-West corrected standard errors, assuming AR(1) errors, are significantly
       higher than the robust OLS standard errors but are only slightly lower than those in
naive OLS.
  • Prais-Winsten using Cochrane-Orcutt transformation (note: the first observation is
lost):
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    0.190572
Durbin-Watson statistic (transformed) 2.052344
• Notice that both Prais-Winsten results reduce the parameter on WPRICE and increase its standard error. The t-statistic drops, although the qualitative result doesn't change.
• In both cases, the DW stat on the transformed data is nearly two, indicating zero
autocorrelation.
• We can try the "large sample fix" by going back to the original model and including lagged values of the dependent variable on the right-hand side:
. durbina
---------------------------------------------------------------------------
                        H0: no serial correlation
• The large sample fix suggests a smaller parameter estimate on WPRICE; the standard error is larger and the t-statistic is much lower than in the original OLS model.
• In this case, we included three lagged values of the dependent variable. Note that they are all significant. If we include four or more lags, the fourth (and higher) lags are insignificant. Notice that the marginal effect of wholesale price on retail price is smaller than in the naive OLS model.
• In our next example, the dependent variable is the percentage of people who answer "I don't know" to the Gallup poll question "How is the president doing in his job?" The data are posted at the course website.
  • Our first step is to take a crack at the standard OLS model:
------------------------------------------------------------------------------
    dontknow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     newpres |   4.876033   .6340469     7.69   0.000     3.624252    6.127813
unemployment | -.6698213    .1519708    -4.41   0.000    -.9698529   -.3697897
     eleyear |   -1.94002   .5602409    -3.46   0.001    -3.046087   -.8339526
   inflation |   .1646866   .0770819     2.14   0.034     .0125061    .3168672
       _cons |   16.12475   .9076888    17.76   0.000     14.33273    17.91678
  • Things look pretty good. All variables are statistically significant and take reasonable
    values and signs. We wonder if there is autocorrelation in the data, as the data are
    time series. If there is autocorrelation it is possible that the standard errors are biased
    downwards, t-stats are biased upwards, and Type I errors are possible (falsely rejecting
    the null hypothesis).
We grab the fitted residuals from the above regression: . predict e1, resid. We then plot the residuals using scatter and tsline (twoway tsline e1 || scatter e1 yearq):
[Figure: fitted residuals plotted over time]
• It's not readily apparent, but the data look to be AR(1) with positive autocorrelation. How do we know? A positive residual tends to be followed by another positive residual, and a negative residual by another negative residual.
[Figure: partial autocorrelations of e1 through lag 40, with 95% confidence bands (se = 1/sqrt(n))]
  • We see that the first lag is the most important, the other lags (4, 27, 32) are also
    important statistically, but perhaps not economically/politically.
  • Can we test for AR(1) process in a more statistically valid way? How about the Runs
    test? Use the STATA command runtest and give the command the error term defined
    above, e1.
. runtest e1
 N(e1 <= -.3066953718662262) = 86
 N(e1 > -.3066953718662262) = 86
        obs = 172
    N(runs) = 54
         z = -5.05
   Prob>|z| = 0
• Looks like the error terms are not distributed randomly (the p-value is small). The threshold(0) option splits the residuals at zero rather than at the median:
. runtest e1, threshold(0)
 N(e1 <= 0) = 97
 N(e1 > 0) = 75
        obs =    172
    N(runs) =    56
         z =     -4.6
   Prob>|z| =    0
• It still looks like the error terms are not distributed randomly.
. dwstat
Durbin-Watson d-statistic(  5,   172) =  1.016382
• The results suggest there is positive autocorrelation (DW stat is less than 2). We can
test this more directly by regressing the current error term on the previous period’s
error term
. reg e1 l.e1
------------------------------------------------------------------------------
e1           |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
e1           |
          L1 |   .4827093    .066574     7.25   0.000     .3512854    .6141331
_cons        | -.0343568    .2074016    -0.17   0.869    -.4437884    .3750747
  • The l.e1 variable tells STATA to use the once-lagged value of e1. In the results, notice
the L1 tag for e1 - the parameter estimate suggests positive autocorrelation with rho
close to 0.48.
• Just for giggles, we find that 2·(1 − ρ̂) = 2·(1 − 0.483) = 1.034 is "close" to the reported DW stat.
. reg e1 l.e1,noc
------------------------------------------------------------------------------
e1           |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
e1           |
          L1 |   .4826932   .0663833     7.27   0.000     .3516515    .6137348
------------------------------------------------------------------------------
e1           |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
e1           |
          L1 |   .4986584   .0766115     6.51   0.000     .3474132    .6499036
          L2 | -.0572775    .0759676    -0.75   0.452    -.2072515    .0926966
• It doesn't look like there is an AR(2) process. Let's test for AR(2) with the Breusch-Godfrey test:
------------------------------------------------------------------------------
e1           |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
e1           |
          L1 |   .5015542   .0775565     6.47   0.000     .3484092    .6546992
          L2 | -.0400182    .0795952    -0.50   0.616    -.1971888    .1171524
newpres      | -.5280386    .5741936    -0.92   0.359    -1.661855    .6057781
 ( 1)   L.e1 = 0
 ( 2)   L2.e1 = 0
       F(   2,   163) =      24.80
             Prob > F =       0.0000
• Notice that we reject the null hypothesis that the once and twice lagged error terms are jointly equal to zero. The t-stat on the twice lagged error term is not different from zero, however.
• An AR(1) process has been well confirmed. What do we do to "correct" the original model? One option is the Prais-Winsten estimator:
------------------------------------------------------------------------------
    dontknow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     newpres |   5.248822   .7448114     7.05   0.000     3.778298    6.719347
unemployment | -.6211503    .2441047    -2.54   0.012      -1.1031   -.1392004
     eleyear | -2.630427    .6519536    -4.03   0.000    -3.917617   -1.343238
   inflation |   .1577092   .1247614     1.26   0.208    -.0886145    .4040329
       _cons |   15.91768   1.474651    10.79   0.000     13.00619    18.82917
-------------+----------------------------------------------------------------
         rho |   .5022053
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.016382
Durbin-Watson statistic (transformed) 1.941491
• Now, inflation is insignificant; did autocorrelation lead to a Type I error in the naive OLS? Notice the new DW statistic on the transformed data is close to 2. The Cochrane-Orcutt version gives similar results:
------------------------------------------------------------------------------
    dontknow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     newpres |   5.209751    .749466     6.95   0.000     3.730103      6.6894
unemployment | -.5766546    .2457891    -2.35   0.020    -1.061909   -.0914003
     eleyear | -2.707991    .6550137    -4.13   0.000    -4.001165   -1.414816
   inflation |   .1100198   .1229761     0.89   0.372    -.1327684    .3528079
       _cons |   15.98344   1.493388    10.70   0.000     13.03509     18.9318
-------------+----------------------------------------------------------------
         rho |   .5069912
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.016382
Durbin-Watson statistic (transformed) 1.918532
                                                                   Prob > F           =       0.0000
------------------------------------------------------------------------------
             |             Newey-West
    dontknow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     newpres |   4.876033    1.06021     4.60   0.000      2.78289    6.969175
unemployment | -.6698213    .1502249    -4.46   0.000    -.9664059   -.3732366
     eleyear |   -1.94002   .5212213    -3.72   0.000    -2.969052   -.9109878
   inflation |   .1646866    .083663     1.97   0.051    -.0004868    .3298601
       _cons |   16.12475   .9251126    17.43   0.000     14.29833    17.95117
------------------------------------------------------------------------------
             |               Robust
    dontknow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     newpres |   4.876033   .9489814     5.14   0.000     3.002486    6.749579
unemployment | -.6698213    .1257882    -5.32   0.000    -.9181613   -.4214812
     eleyear |   -1.94002    .419051    -4.63   0.000     -2.76734     -1.1127
   inflation |   .1646866   .0702247     2.35   0.020     .0260441    .3033292
       _cons |   16.12475   .7825265    20.61   0.000     14.57983    17.66967
• Here, inflation is still significant, although the standard error on inflation is a bit smaller with the robust correction than with Newey-West.
1. The Cochrane-Orcutt approach assumes a constant rho over the entire sample period.
2. The Prais-Winsten approach retains the first observation but likewise assumes a constant rho.
3. Newey-West standard errors do not adjust the parameter estimates but do alter the estimated standard errors.
4. Robust standard errors are perhaps the most flexible option; the correction might help, but robust standard errors are not guaranteed to accurately control for the first-order autocorrelation.