Panel Econometrics III
ECONOMETRICS
Today’s outline
                              Concluding the basics
   Relation between RE, FE, and pooled OLS
   Which estimator should one choose?
   More on the Hausman test
   Up next
Comparison of estimators and associated transformations
    OLS:           regression of yi on Xi → weights within and between variance equally
    FE (within):   regression of ÿi on Ẍi → uses only within variance
    Between:       regression of ȳi on X̄i → uses only between variance
    RE:            regression of yi − (1 − φ̂)ȳi on Xi − (1 − φ̂)X̄i
                   → can be interpreted as combination of OLS and between estimators
                   regression of ÿi + φ̂ȳi on Ẍi + φ̂X̄i
                   → can be interpreted as combination of within and between estimators
                   Important: combination is function of φ̂ and thus data-dependent
                   → weights within and between variance in data-dependent way
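The two RE transformations listed above are the same regression written in two ways. A short derivation, assuming ÿi = yi − ȳi as in the within transformation and taking φ̂ as the weight defined in the earlier lectures (its formula is not restated here):

```latex
\begin{align*}
y_i - (1-\hat\phi)\,\bar y_i
  &= (y_i - \bar y_i) + \hat\phi\,\bar y_i
   = \ddot y_i + \hat\phi\,\bar y_i , \\
\hat\phi = 1 &:\quad \text{regression of } y_i \text{ on } X_i
   \quad \text{(pooled OLS)}, \\
\hat\phi = 0 &:\quad \text{regression of } \ddot y_i \text{ on } \ddot X_i
   \quad \text{(FE / within)}.
\end{align*}
```

The same algebra applies to Xi, so RE moves between pooled OLS and FE as φ̂ moves between 1 and 0.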
Efficiency of the RE estimator
   Based on the previous results we can understand why the RE estimator (if it is
   consistent) is more efficient than the FE estimator (and also the between estimator).
    - It optimally adjusts to the within versus between regressor dispersion.
    - It optimally adjusts to the within versus between error variance.
   In contrast, the FE estimator neglects any between variance and thus is only as good
   as the RE estimator if the between dimension is negligible.
   (But recall the exogeneity assumptions...)
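The efficiency gain can be illustrated on simulated data. The following is a minimal sketch to be run as a do-file, not part of the original slides: the data-generating process, the seed, and all variable names (id, t, c, xb, x, y) are hypothetical, and ci is drawn independently of x so that RE is consistent.

```stata
* Simulated balanced panel: N = 500 units, T = 5 periods (all values hypothetical)
clear
set seed 12345
set obs 500
gen id = _n
gen double c  = rnormal()            // unit effect, independent of x
gen double xb = rnormal()            // between component of x
expand 5
bysort id: gen t = _n
xtset id t
gen double x = xb + rnormal()        // x has both within and between variation
gen double y = 1*x + c + rnormal()   // true slope is 1

xtreg y x, fe                        // uses only within variation
scalar se_fe = _se[x]
xtreg y x, re                        // also uses (down-weighted) between variation
scalar se_re = _se[x]
display "SE(FE) = " se_fe "   SE(RE) = " se_re
```

In this design the RE standard error should come out smaller than the FE standard error, because x has substantial between variation that FE discards.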
FE estimator or FD estimator?
   The assumptions of the FE and FD estimators differ in one respect:
    - FE: uit is white noise over t
    - FD: uit is a random walk (at least under classical error assumptions)
   Choose according to what is more likely.
   Often, reality is in between: there is some serial correlation but not as much as
   predicted by the random walk assumption. In these cases, it might be helpful to apply
   both estimators with robust variance matrix → both point estimators and SEs are
   consistent and the differences should be relatively small.
   If the differences are large, the strict exogeneity assumption may be invalid (for a test
   see next page).
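Applying both estimators with a robust variance matrix might look as follows. This is a sketch on simulated data to be run as a do-file: the AR(1) error with coefficient 0.5, the seed, and all variable names are assumptions chosen to mimic the "in between" case, and the FD regression omits the constant because the simulated model has no time trend.

```stata
* Hypothetical balanced panel with serially correlated errors (AR(1), rho = 0.5)
clear
set seed 2024
set obs 400
gen id = _n
gen double c = rnormal()
expand 6
bysort id: gen t = _n
xtset id t
gen double x = 0.5*c + rnormal()              // x correlated with c
gen double u = rnormal()
bysort id (t): replace u = 0.5*u[_n-1] + rnormal() if _n > 1
gen double y = 1*x + c + u                    // true slope is 1

xtreg y x, fe vce(cluster id)                 // FE with cluster-robust SEs
scalar b_fe = _b[x]
regress D.y D.x, noconstant vce(cluster id)   // FD with cluster-robust SEs
scalar b_fd = _b[D.x]
display "FE: " b_fe "   FD: " b_fd
```

Since x is strictly exogenous in this simulation, the two point estimates should be close to each other.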
Testing for strict exogeneity
   Recall that strict exogeneity means E(yit |Xi1 , . . . , XiT , ci ) = E(yit |Xit , ci ).
   A simple test for T ≥ 3 is to apply FE or FD to the augmented regression
                                   yit = Xit β + wi,t +1 γ + ci + uit ,
   where wi,t +1 is a subset of Xi,t +1 .
   Under the null of strict exogeneity, γ = 0. This can be tested with a simple F test.
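In Stata the augmented regression can be set up with the lead operator F. on xtset data. The sketch below uses simulated data in which x is strictly exogenous by construction; the data-generating process and the variable names are hypothetical, and w is simply taken to be x itself.

```stata
* Hypothetical panel where x is strictly exogenous, so gamma should be near 0
clear
set seed 777
set obs 400
gen id = _n
gen double c = rnormal()
expand 6
bysort id: gen t = _n
xtset id t
gen double x = 0.5*c + rnormal()
gen double y = 1*x + c + rnormal()

* Augmented FE regression: F.x is x at t+1 (the last period of each unit is lost)
xtreg y x F.x, fe
test F.x                      // F test of H0: gamma = 0
```

Adding vce(cluster id) to the xtreg call would give a version of the test that is robust to serial correlation and heteroskedasticity.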
FE estimator or RE estimator?
   The main difference between FE and RE assumptions is whether the ci are allowed to
   correlate with Xi or not.
    - Consistency of the RE estimator requires this correlation be zero. We have
      discussed many examples where this assumption is likely to fail. In such cases, you
      should not use the RE estimator.
    - Instead, when ci is expected to correlate with Xi you should use the FE (or FD)
      estimator which is still consistent.
    - If you are unsure, you can use a Hausman test to compare FE and RE estimators
      (see below). Beware of effects of such pretests, though.
    - Given the robustness of the FE estimator with respect to the question whether the
      ci are allowed to correlate with Xi , it is natural to ask why one should use the RE
      estimator at all. The answer is efficiency (see above).
Data example: pooled OLS / RE estimator inconsistent
Positive partial effect of xi on yi but ci negatively correlated with xi
Data example: pooled OLS / RE estimator consistent
Positive partial effect of xi on yi and ci uncorrelated with xi
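Both data examples can be mimicked with a small simulation. This is a sketch with a hypothetical data-generating process (true partial effect +1, seed and variable names invented for illustration), to be run as a do-file.

```stata
* Hypothetical design: true partial effect of x on y is +1 in both cases
clear
set seed 42
set obs 500
gen id = _n
gen double c = rnormal()
expand 5
bysort id: gen t = _n
xtset id t
gen double x_neg = -1*c + rnormal()        // c negatively correlated with x
gen double x_ind = rnormal()               // c uncorrelated with x
gen double y_neg = 1*x_neg + c + rnormal()
gen double y_ind = 1*x_ind + c + rnormal()

* Case 1: pooled OLS and RE are pulled below 1, FE is not
regress y_neg x_neg
xtreg   y_neg x_neg, re
xtreg   y_neg x_neg, fe

* Case 2: pooled OLS, RE and FE all estimate the slope consistently
regress y_ind x_ind
xtreg   y_ind x_ind, re
xtreg   y_ind x_ind, fe
```

In the first case the pooled OLS slope should land near 0.5 under this design, with RE between pooled OLS and FE.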
Hausman testing principle
   The Hausman test is a general testing principle that compares two estimators β̂A and
   β̂B .
   Under the null, both estimators are consistent but only β̂B is efficient, i.e.,
   Avar( β̂A ) − Avar( β̂B ) > 0.
   Under the alternative, β̂B is inconsistent while β̂A remains consistent.
   Under general conditions, the Hausman statistic
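   \[
   (\hat\beta_A - \hat\beta_B)' \left[ \widehat{\mathrm{Avar}}(\hat\beta_A) - \widehat{\mathrm{Avar}}(\hat\beta_B) \right]^{-1} (\hat\beta_A - \hat\beta_B) \xrightarrow{d} \chi^2_r ,
   \]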
   where r is the number of parameters.
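The reason the middle matrix is a simple difference of variances is the standard Hausman argument: under the null, the efficient estimator is asymptotically uncorrelated with the difference between the two estimators. A short sketch of that step (Acov denotes the asymptotic covariance, a symbol not used elsewhere on these slides):

```latex
\begin{align*}
\mathrm{Acov}(\hat\beta_B,\ \hat\beta_A - \hat\beta_B) = 0
  \;&\Rightarrow\; \mathrm{Acov}(\hat\beta_A, \hat\beta_B) = \mathrm{Avar}(\hat\beta_B), \\
\mathrm{Avar}(\hat\beta_A - \hat\beta_B)
  &= \mathrm{Avar}(\hat\beta_A) + \mathrm{Avar}(\hat\beta_B)
     - 2\,\mathrm{Acov}(\hat\beta_A, \hat\beta_B) \\
  &= \mathrm{Avar}(\hat\beta_A) - \mathrm{Avar}(\hat\beta_B).
\end{align*}
```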
Hausman test to compare FE and RE estimators
   Test H0 : ci uncorrelated with Xi versus H1 : ci correlated with Xi .
   Under the null both RE and FE estimators are consistent but RE is more efficient,
   while under the alternative only FE is consistent.
   Suppose the strict exogeneity, invertibility and homoskedasticity/white noise
   assumptions (RE.1a, RE.2, RE.3) hold throughout. Further assume the regressors do
   not include variables that vary solely across t (such as time dummies). We will return to
   this.
   If the regressors include time-invariant variables, their parameters are not identified by
   the FE estimator. Hence, only the parameters of the time-varying regressors can be
   compared (Stata automatically excludes the others). In the following, for simplicity we
   assume Xit includes only variables that vary both with i and t.
Concretely
   In terms of assumption RE.1 (b)-(c) the hypotheses are:
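   H0 : E(ci |Xi1 , . . . , XiT ) = E(ci ) = 0 vs. H1 : E(ci |Xi1 , . . . , XiT ) ≠ E(ci )
   The classical Hausman statistic:
   \[
   H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \widehat{\mathrm{Avar}}(\hat\beta_{FE}) - \widehat{\mathrm{Avar}}(\hat\beta_{RE}) \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \xrightarrow{d} \chi^2_K ,
   \]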
   where K is the number of parameters (= the length of the vectors β̂FE and β̂RE ).
   The null is rejected if H exceeds the critical value derived from the χ²_K distribution.
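The statistic H can also be assembled by hand from the stored FE and RE results. The sketch below is an illustration on simulated data (hypothetical variable names and seed, K = 2 slope coefficients, constant excluded), to be run as a do-file. It uses each estimator's own reported variance matrix, which to my understanding is what hausman does when neither sigmamore nor sigmaless is specified.

```stata
* Hypothetical balanced panel; x1 and x2 vary over both i and t
clear
set seed 99
set obs 500
gen id = _n
gen double c = rnormal()
expand 5
bysort id: gen t = _n
xtset id t
gen double x1 = rnormal()
gen double x2 = rnormal()
gen double y  = x1 - x2 + c + rnormal()

xtreg y x1 x2, fe
estimates store fixed
matrix bfe = e(b)
matrix Vfe = e(V)
xtreg y x1 x2, re
matrix bre = e(b)
matrix Vre = e(V)

* Keep only the K = 2 slope coefficients (drop the constant)
matrix bfe = bfe[1, 1..2]
matrix bre = bre[1, 1..2]
matrix Vfe = Vfe[1..2, 1..2]
matrix Vre = Vre[1..2, 1..2]

matrix d = bfe - bre
matrix H = d * invsym(Vfe - Vre) * d'
display "H = " H[1,1] "   p-value = " chi2tail(2, H[1,1])

hausman fixed .               // built-in statistic for comparison
```

If the variance difference is not positive definite, invsym() returns a generalized inverse and the hand-computed value should be read with care.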
Hausman test – implementation
   The tricky thing is estimating the difference between the FE and RE (homoskedastic)
   variance matrices,
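   \[
   \widehat{\mathrm{Avar}}(\hat\beta_{FE}) - \widehat{\mathrm{Avar}}(\hat\beta_{RE})
     = \hat\sigma_u^2 \left( \sum_{i=1}^{N} \ddot X_i' \ddot X_i \right)^{-1}
     - \left( \sum_{i=1}^{N} X_i' \hat\Omega^{-1} X_i \right)^{-1} .
   \]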
   In finite samples, this difference may not be positive definite.
   To mitigate this problem, Wooldridge (p. 331) suggests using the same estimator of
   σu² to estimate the FE variance
   \[
   \hat\sigma_u^2 \left( \sum_{i=1}^{N} \ddot X_i' \ddot X_i \right)^{-1}
   \]
   and the RE variance
   \[
   \left( \sum_{i=1}^{N} X_i' \hat\Omega^{-1} X_i \right)^{-1} .
   \]
Remarks
  The Hausman test has some important details:
    - It is (both under the null and the alternative) based on the strict exogeneity
      assumption. If this assumption fails, the plims of the RE and FE estimators will
      generally differ and the test will tend to reject.
    - It is, at least in the conventional form presented here, based on assumption
      RE.3. If this assumption fails, the asymptotic χ² distribution will not hold and
      test decisions will be unreliable. (But a robust form is available, see below.)
... there is more
   In addition, the Hausman test can only compare estimators of regressors that vary
   both with i and t:
      - The parameters of time-invariant regressors are not identified by the FE estimator
        and thus cannot be compared to the RE estimator.
      - The parameters of regressors that vary solely across t (such as time dummies)
        have the same asymptotic variance when estimated by FE or RE. Hence, the test
        cannot distinguish the two estimation approaches.
      - Fortunately, Stata will automatically apply the Hausman test only to those
        parameters that are eligible.
   Note: Be sure that K is only the number of regressors that vary across both i and t.
   Regressors that are time-invariant or vary solely across t are excluded! (Again, Stata
   handles this automatically.)
Hausman test – Stata
   You first have to tell Stata that you have panel data:
   xtset id year
   FE estimator with classical variance matrix is computed and stored as “fixed”:
   xtreg y x1 x2 x3, fe
   estimates store fixed
   RE estimator with classical variance matrix is computed:
   xtreg y x1 x2 x3, re
   The Hausman test is computed based on the more efficient RE estimate of σu²:
   hausman fixed ., sigmamore
   The Hausman test is computed based on the less efficient FE estimate of σu²:
   hausman fixed ., sigmaless
Example: Effects of job training grants on scrap rates
Example 10.4 taken from Wooldridge’s textbook
    Note: regression includes two time dummies and one time-invariant variable (union)!
    *** load data and set panel ***
    use "jtrain1.dta", clear
    xtset fcode year
    *** FE regression and store result (Stata skips union) ***
    xtreg lscrap d88 d89 union grant grant_1, fe
    estimates store fixed
    *** run RE regression ***
    xtreg lscrap d88 d89 union grant grant_1, re
    *** compute test based on efficient estimate of Var(u) ***
    hausman fixed ., sigmamore
Stata output
Hausman variable addition test
   Under maintained assumption RE.3, it can be shown that the Hausman statistic can
   also be obtained from estimating the augmented equation
                                    yit = Xit β + X̄i δ + vit
   by means of the RE estimator and computing the Wald statistic for exclusion of X̄i :
   \[
   W = \hat\delta' \left[ \widehat{\mathrm{Avar}}(\hat\delta) \right]^{-1} \hat\delta .
   \]
   The Hausman statistic is identical to this Wald statistic.
Hausman variable addition test with general regressors
   A nice feature of the variable addition test is that we can use it even if we include
   regressors that do not vary across both i and t.
   Let us split the regressor set into
    - Xit which vary across both i and t,
    - zt which vary only across t (e.g., time dummies), and
    - hi which vary only across i (e.g., gender or race dummies).
   Then the structural equation is written as
                                yit = Xit β + zt γ + hi θ + vit .
   Since we can only compare the RE and FE estimators of β, the augmented regression is
                             yit = Xit β + zt γ + hi θ + X̄i δ + vit .
   Estimating this equation by RE and computing the Wald statistic for H0 : δ = 0 yields
   again the Hausman statistic.
Hausman variable addition test with general covariance
   The variable addition test can even be used when assumption RE.3 does not hold.
   In this case, we estimate the augmented equation
                             yit = Xit β + zt γ + hi θ + X̄i δ + vit
   again by RE but now compute a robust estimator of Var( β̂RE ).
   Based on this robust variance estimator, we compute the Wald statistic for H0 : δ = 0.
   This yields a robust version of the Hausman statistic.
Further insights from the variable addition test ?
   Let us consider the structural equation
                               yit = Xit β + zt γ + hi θ + ci + uit .
   Now split Xit = (Xit − X̄i ) + X̄i and rewrite the equation accordingly:
                       yit = (Xit − X̄i ) β + zt γ + hi θ + X̄i β + ci + uit .
   In this equation, only hi and X̄i can correlate with ci . This implies:
      - Without controls hi , the Hausman null is H0 : Corr(X̄i , ci ) = 0.
      - With controls hi , the Hausman null is H0 : Corr(X̄i − L(X̄i |hi ), ci ) = 0, where
        X̄i − L(X̄i |hi ) is the linear projection error (the part of X̄i that is left after
        controlling for hi ).
      - Hence, with a rich set of individual-specific controls hi , it is possible for
        X̄i − L(X̄i |hi ) to be uncorrelated with ci even though X̄i is correlated with ci .
      - Practical consequence: include many controls hi !
And another view ?
   Compare the structural equation
                      yit = (Xit − X̄i ) β + zt γ + hi θ + X̄i β + ci + uit .
   with the augmented equation
   \[
   y_{it} = (X_{it} - \bar X_i)\beta + z_t\gamma + h_i\theta + \bar X_i \underbrace{(\delta + \beta)}_{\kappa} + c_i + u_{it} .
   \]
   First note that the coefficient of Xit − X̄i will be estimated consistently by RE,
   β̂RE →p β, because Xit − X̄i does not correlate with ci .
     - If the Hausman null H0 : Corr(X̄i − L(X̄i |hi ), ci ) = 0 holds, then the RE
        estimator of the coefficient on X̄i will also converge to β: κ̂RE →p β and thus
        δ̂RE →p 0. The null is thus equivalent to H0 : δ = 0.
     - If the null does not hold, then the RE estimator of the coefficient on X̄i will not
        converge to β, and thus δ̂RE will not converge to 0. (The correlation with the
        disturbance, here ci , leads to asymptotic bias in κ̂RE .)
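One way to see what δ measures is to write ci as a linear projection on the time-invariant terms. The following is a sketch of that argument; the symbols ξ, η and ai are introduced here only for illustration and do not appear on the slides, and ai is assumed uncorrelated with all regressors.

```latex
\begin{align*}
c_i &= \bar X_i \xi + h_i \eta + a_i
  \quad \text{(linear projection, assumed)} \\
y_{it} &= X_{it}\beta + z_t\gamma + h_i\theta + c_i + u_{it} \\
       &= X_{it}\beta + z_t\gamma + h_i(\theta + \eta) + \bar X_i \xi + a_i + u_{it} .
\end{align*}
```

Comparing with the augmented regression gives δ = ξ, so δ = 0 exactly when X̄i has no partial correlation with ci once hi is controlled for. Note that the coefficient on hi then estimates θ + η rather than θ.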
Hausman variable addition test – Stata
   Classical Hausman test:
   xtreg y x1 x2 z1 z2 h1 h2, fe
   estimates store fixed
   xtreg y x1 x2 z1 z2 h1 h2, re
   hausman fixed ., sigmaless
   Compute one time average per individual (assume x1 and x2 vary with i and t):
   by id, sort: egen x1bar = mean(x1)
   by id, sort: egen x2bar = mean(x2)
   Classical Hausman variable addition test:
   xtreg y x1 x2 z1 z2 h1 h2 x1bar x2bar, re
   test x1bar x2bar
   Robust Hausman variable addition test:
   xtreg y x1 x2 z1 z2 h1 h2 x1bar x2bar, re vce(robust)
   test x1bar x2bar
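The commands above can be tried end to end on simulated data. The sketch below is a do-file with a hypothetical data-generating process in which ci is correlated with x, so the null should be rejected; all variable names and the seed are invented for illustration.

```stata
* Hypothetical balanced panel in which c is correlated with x (H0 is false)
clear
set seed 31415
set obs 500
gen id = _n
gen double c = rnormal()
gen double h = rbinomial(1, 0.5)            // time-invariant control
expand 5
bysort id: gen t = _n
xtset id t
gen double x = 0.7*c + rnormal()
gen double y = 1*x + 0.5*h + c + rnormal()

by id, sort: egen double xbar = mean(x)     // time average per unit

xtreg y x h xbar, re                        // classical variable addition test
scalar b_aug = _b[x]
test xbar
xtreg y x h xbar, re vce(robust)            // robust variable addition test
test xbar
scalar p_rob = r(p)

xtreg y x, fe                               // FE slope, for comparison with b_aug
scalar b_fe = _b[x]
display "RE slope in augmented regression: " b_aug "   FE slope: " b_fe
```

In a balanced panel the RE coefficient on x in the augmented regression should coincide with the FE estimate, which gives a quick check that the time averages were built correctly.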
Example: Effects of job training grants on scrap rates
Example 10.4 taken from Wooldridge’s textbook
    Question: shall we use RE or FE?
    Note: regression includes two time dummies (d88 and d89) and one time-invariant
    variable (union).
    *** load data and set panel ***
    use "jtrain1.dta", clear
    xtset fcode year
    *** compute Hausman test based on the FE (less efficient) estimate of Var(u) ***
    xtreg lscrap d88 d89 union grant grant_1, fe
    estimates store fixed
    xtreg lscrap d88 d89 union grant grant_1, re
    hausman fixed ., sigmaless
Example continued
   *** compute time averages ***
   by fcode, sort: egen gm = mean(grant)
   by fcode, sort: egen gm_1 = mean(grant_1)
   *** classical Hausman variable addition test ***
   xtreg lscrap d88 d89 union grant grant_1 gm gm_1, re
   test gm gm_1
   *** robust Hausman variable addition test ***
   xtreg lscrap d88 d89 union grant grant_1 gm gm_1, re vce(robust)
   test gm gm_1
Coming up
     Specification tests for panel data models