Panel Cookbook
Spring 2004

REGRESSION (Revision)
Cross section:  i = 1, …, N  (NON-ORDERED)
Individuals, firms, countries etc.
Time series:  t = 1, …, T  (ORDERED)

Variables:
y_i (y_t):  DEPENDENT (Endogenous)
x_ki (x_kt):  INDEPENDENT (Exogenous),  k = 1, …, K

MODEL
y_i = α + Σ_{j=1}^{K} β_j x_ji + e_i
α and the β_j are parameters; e_i is a stochastic error.
NOTE: The model can be written in matrix form
y = Xβ + e,
where y = (y_1, …, y_N)′, β = (β_1, …, β_K)′, etc.
The simple regression model
y_i = α + β x_i + e_i
is used as the running example in this course. We nevertheless always write "K" for the number of exogenous variables.

ASSUMPTIONS
1) Correct model:  E(e_i) = 0
2) Exogeneity:  Cor(x_i, e_i) = 0
3) Homoscedasticity:  Var(e_i) = σ², constant
4) Serial independence:  Cor(e_i, e_j) = 0,  i ≠ j
5) Normality:  e_i Normal
6) No incidental parameters (K does not grow with N)

(1), (2) and (6) are needed for CONSISTENCY (i.e., in large samples OLS parameter estimates will be correct "on average").
(3) and (4) are needed for EFFICIENCY (i.e., in large samples OLS yields "best" estimates, significance tests are correct, etc.).
(5) is needed for small-sample properties.
Problem when Cor(x, e) ≠ 0
[Figure: y against x, showing the true line and the observations; points with positive errors lie above the line, points with negative errors below, so correlation between x and e tilts the fitted line away from the true line.]
FORMULAE for OLS (Ordinary Least Squares)
β̂ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
α̂ = ȳ − β̂ x̄
Residual:  ê_i = y_i − α̂ − β̂ x_i
Error variance:  σ̂² = Σ ê_i² / ν,
where ν = N − K − 1, the degrees of freedom.
Variance:  Var(β̂) = σ² / Σ(x_i − x̄)²
Standard error:  se(β̂) = √[σ̂² / Σ(x_i − x̄)²]
t-value:  t(β̂) = β̂ / se(β̂)

MATRIX FORMULAE
β̂ = (X′X)⁻¹ X′y
Var(β̂) = σ²(X′X)⁻¹, etc.
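The scalar and matrix OLS formulae above can be checked against each other in a few lines of numpy. This is a minimal sketch on synthetic data; the seed and data-generating values are illustrative assumptions, not from the course.

```python
import numpy as np

# Synthetic data (illustrative values)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

# Scalar formulae
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
e = y - alpha - beta * x                       # residuals
nu = len(y) - 1 - 1                            # degrees of freedom: N - K - 1
s2 = np.sum(e ** 2) / nu                       # error variance
se_beta = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
t_beta = beta / se_beta

# Matrix formulae: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)          # (alpha_hat, beta_hat)
```

The matrix solution reproduces the scalar α̂ and β̂ exactly; the two sets of formulae are algebraically identical.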
PANEL DATA MODELS
y_it, x_it,  i = 1, …, N,  t = 1, …, T

THE POOLED MODEL
y_it = α + β x_it + e_it
[Figure: pooled scatter of y against x with a single fitted line.]
Here we are NOT using any panel information. The data are treated as if there were only a single index.

TRADITIONAL PANEL MODEL
y_it = α_i + β x_it + e_it
[Figure: y against x with three parallel lines, one for each individual constant α_1, α_2, α_3.]
The constant terms, α_i, vary from individual to individual.
This is called INDIVIDUAL (UNOBSERVED) HETEROGENEITY.
The slopes are, however, the same for all individuals.

In both the Pooled and Panel models we assume that the errors are homoscedastic and serially independent, both within and between individuals:
Var(e_it) = σ²
Cor(e_it, e_js) = 0  when i ≠ j and/or t ≠ s
SUR MODEL
SEEMINGLY UNRELATED REGRESSIONS
y_it = α_i + β_i x_it + e_it
[Figure: y against x with three lines of different intercepts and slopes.]
The constant terms, α_i, and slopes, β_i, vary from individual to individual.
In SUR models the errors are allowed to be contemporaneously correlated and heteroscedastic between individuals. We still assume serial independence as well as homoscedasticity within individuals:
Var(e_it) = σ_i²
Cov(e_it, e_jt) = σ_ij
Cor(e_it, e_js) = 0  when t ≠ s
TWO COMMON SITUATIONS
1)  There are a LARGE number of independent individuals observed for a FEW time periods:
N >> T
N is often in the range 500-20,000, while T lies between 2 and 10. In this case it is not possible to estimate different individual slopes for all the exogenous variables.
The PANEL DATA MODEL is most appropriate.
2)  There are some MEDIUM-length time series for RELATIVELY FEW, possibly dependent, equations (countries, firms, sectors etc):
T > N
T is usually in the range 30-150, while N often lies between 2 and 15.
In this case the SUR MODEL is appropriate.
Efficient (SUR) estimation is used when T >> N.
Equation-by-equation OLS is used if T − N < K.
Panel models are MORE general than Pooled models but LESS general than SUR models.
FIXED EFFECTS MODELS
Here we treat the individual heterogeneity as N parameters that are to be estimated:
y_it = α_i + β x_it + e_it,  i = 1, …, N,  t = 1, …, T
N is large (and can often be increased). T is small and fixed.

WHY CAN'T WE USE OLS?
The individual heterogeneity can be represented by N dummy variables. A regression with N + K variables (so-called Least Squares Dummy Variables (LSDV) regression) must therefore be estimated. There are two problems with LSDV regression.
1)  There are INCIDENTAL PARAMETERS
The number of α_i grows as N increases. The usual proof of consistency therefore does not hold for LSDV.
2)  Inverting an (N + K) × (N + K) matrix can be impossible if N is very large. Even when possible it can be impracticable and/or inaccurate.

 
WE NEED A "TRICK" TO REMOVE THE INCIDENTAL PARAMETERS!
The original model is
y_it = α_i + β x_it + e_it   (1)
Averaging over the T observations for each individual yields
ȳ_i. = α_i + β x̄_i. + ē_i.   (2)
where the "dot" notation is simply
ȳ_i. = (1/T) Σ_t y_it, etc.
Subtracting (2) from (1) gives
(y_it − ȳ_i.) = β(x_it − x̄_i.) + (e_it − ē_i.)   (3)
This is called the WITHIN REGRESSION. There are no incidental parameters and the errors still satisfy the usual assumptions. We can therefore use LS on (3) to obtain consistent estimates.

 
The Within Regression Estimates
To simplify the notation we define
ỹ_it = y_it − ȳ_i.,  x̃_it = x_it − x̄_i., etc.
The within regression can thus be written
ỹ_it = β x̃_it + ẽ_it
The estimates can thus be written
β̂_w = Σ x̃_it ỹ_it / Σ x̃_it² = Σ(x_it − x̄_i.)(y_it − ȳ_i.) / Σ(x_it − x̄_i.)²
and the individual effects can be estimated as
α̂_w,i = ȳ_i. − β̂_w x̄_i.
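The within estimates above can be computed directly in numpy. This is a minimal sketch on a synthetic panel (seed, sizes and true parameter values are illustrative assumptions); it also verifies numerically that LSDV and the within regression give identical estimates.

```python
import numpy as np

# Synthetic panel (illustrative): N individuals, T periods, true beta = 2
rng = np.random.default_rng(1)
N, T = 6, 5
alpha_i = rng.normal(size=N)                        # individual effects
x = rng.normal(size=(N, T))
y = alpha_i[:, None] + 2.0 * x + rng.normal(size=(N, T))

# Within transformation: subtract each individual's time average
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)

beta_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)       # within estimate
alpha_w = y.mean(axis=1) - beta_w * x.mean(axis=1)  # estimated individual effects

# Cross-check: LSDV (N dummies plus x) gives identical estimates
D = np.kron(np.eye(N), np.ones((T, 1)))             # NT x N dummy matrix
X = np.column_stack([D, x.reshape(-1)])
b_lsdv = np.linalg.lstsq(X, y.reshape(-1), rcond=None)[0]
```

The last coefficient of the LSDV fit equals β̂_w and the first N coefficients equal the α̂_w,i, which is the LSDV equivalence discussed in these notes.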
PROPERTIES OF THE WITHIN (FE) ESTIMATES

• β̂_w is consistent if either N or T becomes large.
• α̂_w,i is only consistent when T becomes large.
• The number of degrees of freedom must be adjusted:
Degrees of freedom = #obs − #pars, i.e.
ν = NT − N − K = N(T − 1) − K
Usual OLS programs, which are not explicitly designed for panel data, assume that the degrees of freedom are NT − K. Their standard errors, test statistics and P-values must therefore be corrected.
• The parameter estimates from LSDV are the same as from the within regression!
This is NOT a general result (incidental parameters do cause inconsistencies in many models).
THE FIXED EFFECTS MODEL: A SUMMARY
1)  Calculate the within averages: ȳ_i. and x̄_i.
2)  Calculate the differences from the within averages:
ỹ_it = y_it − ȳ_i.  and  x̃_it = x_it − x̄_i.
3)  Regress ỹ_it on x̃_it to obtain β̂_w and se(β̂_w).
4)  Estimate the individual effects (if required):
α̂_w,i = ȳ_i. − β̂_w x̄_i.
5)  If the regression has been performed with an ordinary least squares program, then the degrees of freedom etc. must be adjusted:
ν_a = ν_u − N
se_a(β̂_w) = se_u(β̂_w) √(ν_u / ν_a)
t_a(β̂_w) = t_u(β̂_w) √(ν_a / ν_u)
which is distributed t_{ν_a} under H_0.
"a" denotes ADJUSTED and "u" UNADJUSTED.
RANDOM EFFECTS MODELS
In Fixed Effects models:
• We aren't interested in the individual effects
• We can't estimate them consistently
WHY BOTHER WITH THEM?

The individual effects have an "empirical" distribution
[Figure: histogram (frequency against α) of the individual effects.]
which has certain characteristics, e.g.
average:  (1/N) Σ α_i = ᾱ = α
variance of α_i:  σ_α²
We can use these definitions to rewrite the panel data model:
y_it = α + β x_it + (α_i − α) + e_it
Defining the new error
u_it = (α_i − α) + e_it
we can write
y_it = α + β x_it + u_it
This is the RANDOM EFFECTS MODEL.

This looks almost the same as the POOLED model, but note two differences:
• The constant term can be interpreted as the average individual effect
• The error term now has a special form

We can obviously estimate the RE model using OLS to obtain estimates of α and β.
When is this consistent?
If consistent, is it efficient?
WHEN IS THE RANDOM EFFECTS MODEL CONSISTENT?
Two conditions must be fulfilled:
• E(u_it) = E(α_i − α) + E(e_it) = 0
• Cov(u_it, x_it) = Cov(α_i, x_it) + Cov(e_it, x_it) = 0
The first condition is OK as long as the original errors are unbiased.
The second condition needs x_it to be independent of e_it (which has already been assumed) and of α_i.
IS IT REASONABLE TO ASSUME THAT THE INDIVIDUAL EFFECTS ARE INDEPENDENT OF THE EXOGENOUS VARIABLES?
EXAMPLE:
y_it = # days unemployed in year t
x_it = income
α_i = unmeasured individual propensity to be unemployed (depends on such factors as education, health status etc.)
THIS ASSUMPTION MUST BE TESTED!
IS OLS EFFICIENT IN THE RANDOM EFFECTS MODEL?
Efficient OLS needs homoscedasticity and serial independence in the errors, u_it.
Remembering that u_it = (α_i − α) + e_it, we obtain
Var(u_it) = σ_α² + σ_e²   (assuming that α_i and e_it are independent)
Cov(u_it, u_js) = 0, j ≠ i   (can be assumed if all individuals are independent)
Cov(u_it, u_is) = σ_α² ≠ 0   (since α_i is the same for all t within the same individual)
The last condition violates the "serial independence" assumption.
OLS is thus INEFFICIENT in the random effects model, and yields INCORRECT standard errors and tests.
EFFICIENT ESTIMATION IN THE RANDOM EFFECTS MODEL
The Random Effects model can be efficiently estimated using GLS (Generalised Least Squares).
1)  Define
θ = 1 − σ_e / σ_1,  where  σ_1² = T σ_α² + σ_e²
2)  Calculate the "pseudo within differences"
y*_it = y_it − θ ȳ_i.,  x*_it = x_it − θ x̄_i.
3)  Perform an OLS regression on
y*_it = α* + β x*_it + u*_it
where α* = (1 − θ)α and u*_it satisfies the LS assumptions.
4)  The Random Effects estimate of β is given by
β̂_re = Σ(x*_it − x̄*)(y*_it − ȳ*) / Σ(x*_it − x̄*)²
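The GLS steps above can be sketched in numpy. In this illustration the variance components are treated as known, so θ can be computed directly; the seed, sizes and parameter values are assumptions for demonstration only.

```python
import numpy as np

# Synthetic RE panel (illustrative values); variance components known here
rng = np.random.default_rng(2)
N, T = 200, 4
sigma_a, sigma_e, beta = 1.0, 0.5, 2.0
a = rng.normal(scale=sigma_a, size=N)
x = rng.normal(size=(N, T))
y = 1.0 + beta * x + a[:, None] + rng.normal(scale=sigma_e, size=(N, T))

sigma_1 = np.sqrt(T * sigma_a ** 2 + sigma_e ** 2)
theta = 1.0 - sigma_e / sigma_1

# Pseudo within differences
y_s = y - theta * y.mean(axis=1, keepdims=True)
x_s = x - theta * x.mean(axis=1, keepdims=True)

# OLS on the transformed data gives the RE estimate of beta
beta_re = (np.sum((x_s - x_s.mean()) * (y_s - y_s.mean()))
           / np.sum((x_s - x_s.mean()) ** 2))
```

Note the two limiting cases: θ = 0 reproduces pooled OLS, while θ = 1 reproduces the within (FE) estimator.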
PROBLEM: θ is not known
Unfortunately σ_e² and σ_α² are unknown.
If the errors u and e (or α) were known, we could estimate the variances using
σ̂_1² = (T/N) Σ_i ū_i.²   (4)
σ̂_e² = [1/(N(T−1))] Σ(u_it − ū_i.)² = [1/(N(T−1))] Σ(e_it − ē_i.)²   (5)
σ̂_α² = [1/(N−1)] Σ(α_i − ᾱ)²   (6)

Since u, e and α are unknown, there are a number of suggestions for how they can be estimated.
These methods use various residuals instead of the unknown errors:
û_ols = RE residuals from the POOLED regression
  y_it = α + β x_it + u_it   (#obs = NT)
û_b = RE residuals from the BETWEEN regression
  ȳ_i. = α + β x̄_i. + ū_i.   (#obs = N)
ê_w = FE residuals from the WITHIN regression
  ỹ_it = β x̃_it + ẽ_it   (#obs = NT)
û_w = RE residuals from the WITHIN regression
  = ê_w + (α̂_w,i − ᾱ̂_w)
û_re = residuals from the RE regression
  y*_it = α* + β x*_it + u*_it   (#obs = NT)

SOME DIFFERENT METHODS OF ESTIMATING θ
I    WALLACE and HUSSAIN
     Use û_ols instead of u in (4) and (5)
II   AMEMIYA
     Use û_w in (4) and ê_w in (5)
III  SWAMY and ARORA
     Use û_b in (4) and ê_w in (5)
IV   NERLOVE
     Use α̂_w in (6) and ê_w in (5)
V    MAXIMUM LIKELIHOOD
     Start with one of the previous methods, estimate the RE parameters and then use û_re to calculate a new θ. Iterate.
Different authors suggest different degrees-of-freedom corrections in the variance formulae. For example, (5) is often calculated as
σ̂_e² = [1/(N(T−1) − K)] Σ ê²_w,it
where we have also used the fact that ē_w = 0.
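A feasible version of (4)-(5) can be sketched in numpy. This follows the Wallace-Hussain idea of plugging pooled residuals into (4), with within residuals and the df correction in (5); the synthetic data and all numbers are illustrative assumptions.

```python
import numpy as np

# Feasible variance components on synthetic data (illustrative sketch)
rng = np.random.default_rng(3)
N, T, K = 50, 5, 1
a = rng.normal(size=N)
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + a[:, None] + 0.5 * rng.normal(size=(N, T))

# Within residuals e_w for (5), with the df correction N(T-1) - K
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)
beta_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)
e_w = y_w - beta_w * x_w
sig2_e = np.sum(e_w ** 2) / (N * (T - 1) - K)

# Pooled residuals u_ols for (4)
xc, yc = x.ravel() - x.mean(), y.ravel() - y.mean()
beta_p = np.sum(xc * yc) / np.sum(xc ** 2)
u = (y - y.mean()) - beta_p * (x - x.mean())
sig2_1 = (T / N) * np.sum(u.mean(axis=1) ** 2)     # (4)

sig2_a = (sig2_1 - sig2_e) / T        # from sigma_1^2 = T sigma_a^2 + sigma_e^2
theta = 1.0 - np.sqrt(sig2_e / sig2_1)
```

With these data the estimated θ lies strictly between 0 and 1, so the feasible GLS transformation is well defined.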
PROPERTIES
Research has established the following:
• There is not much difference between I-V when the Random Effects model is correct.
• Only NERLOVE guarantees that σ̂_α² > 0. Many users of the other methods set θ = 1 (Fixed Effects) if a negative value of σ̂_α² is obtained.
• It is difficult to give any general rules as to which method to use. SWAMY/ARORA is probably the most common.
• The Random Effects estimates are more efficient than the Fixed Effects estimates when the RE model is correct. They are inconsistent, however, when the model is incorrect.
• It is important to test which model is correct.
INDIVIDUAL SPECIFIC VARIABLES
In many cases we have some exogenous variables that vary between individuals, but which do not vary over time within a given individual (e.g., gender, race, nationality).
Denote such an individual specific variable as q_i.

In a FIXED EFFECTS model we will thus write
y_it = α_i + γ q_i + β x_it + e_it
The term (α_i + γ q_i) does not vary over time, and will thus be removed by the within transformation, i.e.,
(y_it − ȳ_i.) = β(x_it − x̄_i.) + (e_it − ē_i.)
The parameters of the individual specific variables (γ) cannot be estimated in the Fixed Effects model (that is, we cannot distinguish between observed and unobserved heterogeneity).
If q_i only varies slightly over time, and only for a few individuals, then γ will be estimated with poor precision (for example, education, marital status).
In a RANDOM EFFECTS model we will write
y_it = α + γ q_i + β x_it + u_it
in which case γ can be estimated (although not when using the NERLOVE method).
For the Random Effects model to be appropriate, however, the observed heterogeneity (q) must be independent of the unobserved heterogeneity (α).
The Random Effects model therefore has the added advantage of allowing us to estimate parameters in which we are probably interested.
TESTING
Hypothesis testing is central to statistical inference. In econometric modelling we often distinguish between three types of tests:
1)  SPECIFICATION TESTS
Is the model correct? (e.g., POOLED, RE, FE, SUR)
2)  MISSPECIFICATION TESTS
Are any of the statistical assumptions violated? (e.g., serial independence, homoscedasticity)
3)  PARAMETER TESTS
Do the parameters have specified values? (e.g., is a parameter "significant"?)

(1) and (2) are really two aspects of the same question - we are asking "Can the model be estimated efficiently using Least Squares?". They should be answered together.
(3) can only be addressed after (1) and (2).
PARAMETER TESTS
There are no new problems with panel data models. The same principles that apply to ordinary regression can be applied here.
The usual way to test hypotheses concerning the parameters in regression models is to use t-tests (one parameter) and F-tests (several parameters).
These tests can be calculated in two ways, which give identical results in linear models. We use whichever method of calculation is easiest.
Sum of Squares Tests
We have
• an UNRESTRICTED model, where all the parameters are estimated, and
• a RESTRICTED model, where the parameters satisfy a number of restrictions.
The null hypothesis (H_0) is that the RESTRICTED model is true, while the alternative hypothesis (H_1) is that the UNRESTRICTED model holds.

The restrictions are usually of the form that certain parameters are zero - i.e., some variables are not important.
If the hypothesis that a single parameter is zero is rejected, then we say that this parameter is significant (more correctly: "the parameter is significantly different from zero at the given significance level α").
The unrestricted model, containing p_u parameters, is estimated using LS. The residual sum of squares is calculated:
URSS = Σ ê²_u,i
The restricted model, which has p_r (< p_u) parameters, is also estimated using LS. The residual sum of squares is also calculated here:
RRSS = Σ ê²_r,i
Defining
ν_1 = p_u − p_r = # restrictions
ν_2 = # observations − p_u
we calculate the test statistic
F = [(RRSS − URSS)/ν_1] / [URSS/ν_2] ~ F_{ν_1,ν_2} under H_0
• The SS test is derived through a small-sample adjustment of the Likelihood Ratio test.
• If ν_1 = 1 then the square root of F is t-distributed under the null.
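The sum-of-squares F test, and the ν_1 = 1 identity F = t², can be verified in numpy. This is a sketch on synthetic data; the seed and data-generating values are illustrative assumptions.

```python
import numpy as np

# Synthetic data where beta_2 = 0, so the restriction (H0) is true
rng = np.random.default_rng(4)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def rss(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

Xu = np.column_stack([np.ones(n), x1, x2])    # unrestricted: p_u = 3
Xr = np.column_stack([np.ones(n), x1])        # restricted (beta_2 = 0): p_r = 2
URSS, RRSS = rss(Xu, y), rss(Xr, y)

nu1, nu2 = 3 - 2, n - 3
F = ((RRSS - URSS) / nu1) / (URSS / nu2)

# Standard-error form of the same test: the t-value of x2
b = np.linalg.lstsq(Xu, y, rcond=None)[0]
s2 = URSS / nu2
se2 = np.sqrt(s2 * np.linalg.inv(Xu.T @ Xu)[2, 2])
t2 = b[2] / se2
```

In a linear model F equals t² exactly when there is one restriction, illustrating that the two ways of calculating the test give identical results.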
Standard Error Tests
The t-test for a single parameter can also be written
t = β̂ / se(β̂) ~ t_ν under H_0: β = 0
• This is a small-sample adjustment of the Wald test.
• The F-test for multiple parameters can also be derived from the equivalent Wald test, but is then expressed in matrix terms.
In general:
• tests for single parameters are easiest to use in standard error form, while
• tests for several parameters are calculated most easily in sum-of-squares form.

CRITICAL VALUES AND P-VALUES
The traditional way to test a hypothesis is to see whether the t-statistic is greater than the critical value for a given significance level.
[Figure: t-density over −3 to 3 with 2.5% rejection regions in each tail.]
The null hypothesis is rejected if |t(β̂)| > c_α.
In the above diagram α = 5%, c_α = 1.96 and t = 1.51, i.e. the hypothesis is not rejected.
This method of describing test results is not very informative, however.
The reader is dependent on what significance level the author considers to be interesting. At best the author will use the "star" convention (one, two or three stars to show significance at the 5%, 1% and 0.1% levels).
An alternative, which is gaining more and more popularity, is to quote P-VALUES.

[Figure: t-density over −3 to 3 with 6.3% in each tail beyond ±|t(β̂)|.]
The P-value is
P(t > |t(β̂)|) + P(t < −|t(β̂)|).
The null hypothesis is rejected if the P-value is less than the significance level.
In the above diagram the P-value is 12.6%, which is greater than 5%. The hypothesis would therefore not be rejected at the 5% level.
SPECIFICATION TESTS
So far we have considered four models for panel data:
SUR:             y_it = α_i + β_i x_it + e_it
FIXED EFFECTS:   ỹ_it = β x̃_it + ẽ_it
RANDOM EFFECTS:  y*_it = α* + β x*_it + u*_it
POOLED:          y_it = α + β x_it + e_it
The error terms in these models satisfy the OLS assumptions IF the respective model is correct (this is why we have expressed the FE and RE models in their transformed form).
We will not be considering the SUR model in detail, since it is not possible to estimate if T is small.
TESTS FOR VARIOUS MODELS
[Diagram: the FIXED EFFECTS, RANDOM EFFECTS and POOLED models linked by the tests between them - the Chow test (POOLED vs. FIXED EFFECTS), the Hausman test (RANDOM vs. FIXED EFFECTS) and the LM test (POOLED vs. RANDOM EFFECTS).]
A TEST FOR UNOBSERVED HETEROGENEITY
The CHOW TEST of the POOLED MODEL against the FIXED EFFECTS MODEL.

H_0: POOLED MODEL (Restricted)
H_1: FIXED EFFECTS MODEL (Unrestricted)
The URSS is calculated using the residuals from the Within regression (ê_w), and the RRSS using the OLS residuals from the pooled regression (ê_ols). The numbers of parameters are p_u = N + K and p_r = K + 1.
The number of observations is NT in both cases.
The Sum-of-Squares test of H_0 is thus
CHOW = [(RRSS − URSS)/(N − 1)] / [URSS/(NT − N − K)]
which is distributed F_{N−1, NT−N−K} under H_0.
This test is called a CHOW test because of its similarity to the well-known CHOW test for parameter stability.
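The Chow statistic can be computed from the pooled and within residual sums of squares. A minimal numpy sketch on synthetic data (seed, sizes and parameter values are illustrative assumptions):

```python
import numpy as np

# Chow test of POOLED (restricted) against FIXED EFFECTS (unrestricted)
rng = np.random.default_rng(5)
N, T, K = 30, 6, 1
a = rng.normal(size=N)                         # real heterogeneity: H0 false
x = rng.normal(size=(N, T))
y = a[:, None] + 2.0 * x + rng.normal(size=(N, T))

# RRSS: pooled OLS
xc, yc = x.ravel() - x.mean(), y.ravel() - y.mean()
b_p = np.sum(xc * yc) / np.sum(xc ** 2)
RRSS = np.sum((yc - b_p * xc) ** 2)

# URSS: within regression
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)
URSS = np.sum((y_w - b_w * x_w) ** 2)

CHOW = ((RRSS - URSS) / (N - 1)) / (URSS / (N * T - N - K))
# Compare CHOW with the F(N-1, NT-N-K) critical value
```

Because the pooled model is nested in the FE model, URSS can never exceed RRSS, so the statistic is non-negative by construction.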
INDIVIDUAL SPECIFIC VARIABLES
If there are p_q individual specific variables in the model, then these are INCLUDED in the POOLED model, but EXCLUDED from the FIXED EFFECTS model.
This is reasonable, since we want to test for unobserved heterogeneity, not observed heterogeneity!
In this case we must use p_r = K + 1 + p_q and ν_1 = N − 1 − p_q in the Chow test.
RANDOM or FIXED EFFECTS? THE HAUSMAN TEST
The Hausman test is a general test procedure which is used when we want to test the validity of an assumption that is necessary for efficient estimation.
For the test to work we need two estimation methods:
METHOD 1, called β̂_a, is both consistent and efficient under H_0, but is inconsistent under H_1.
METHOD 2, called β̂_b, is consistent under both H_0 and H_1, but is inefficient under H_0.
If there is only one parameter to be tested, then the test statistic is very simple:
h = (β̂_b − β̂_a)² / (s_b² − s_a²) ~ χ²_1 under H_0
where s_a and s_b are the standard errors of the parameter estimates.

• Although σ_b² > σ_a² under the null, this relation need not hold in small samples for the estimated standard errors. If s_b² < s_a² the test is not applicable.
• If there are J > 1 parameters to be compared, the Hausman test statistic must be expressed in matrix terms and is distributed χ²_J.
• There often exists an "omitted variables" version of the Hausman test, which has the same asymptotic properties and which is never negative.
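The single-parameter Hausman statistic is a two-line computation. A minimal sketch in which all estimates and standard errors are hypothetical numbers, not from any real regression:

```python
import numpy as np

# Single-parameter Hausman statistic; all numbers are hypothetical
beta_a, s_a = 1.95, 0.040    # efficient-under-H0 estimate (e.g. RE)
beta_b, s_b = 2.10, 0.055    # consistent-under-both estimate (e.g. FE)

var_diff = s_b ** 2 - s_a ** 2
assert var_diff > 0, "test not applicable when s_b^2 <= s_a^2"

h = (beta_b - beta_a) ** 2 / var_diff    # ~ chi2(1) under H0
# Reject H0 at the 5% level if h > 3.84
```

Note that the variance difference, not the sum, appears in the denominator; this is what can go negative in small samples and make the test inapplicable.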
The Hausman test for RE vs. FE
In the case of testing for random effects we have the following situation:
H_0: Random Effects model  [Cor(α_i, x_it) = 0]
H_1: Fixed Effects model   [Cor(α_i, x_it) ≠ 0]
Our estimates satisfy the Hausman conditions:
β̂_re is consistent and efficient under H_0, but inconsistent under H_1.
β̂_w is consistent under H_0 and H_1, but inefficient under H_0.

The Hausman test can now be calculated in matrix terms or through an omitted variables procedure.
OMITTED VARIABLES VERSION
Define as before x̃_it = x_it − x̄_i. and run the augmented RE regression
y*_it = α* + β x*_it + γ x̃_it + w_it
The alternative Hausman test is a simple F-test that γ is zero.
This is appropriate since
H_0: γ = 0  ⟺  H_0: Cor(α_i, x_it) = 0

If there are individual specific variables we simply test H_0: γ = 0 in the regression
y*_it = α* + δ q*_i + β x*_it + γ x̃_it + w_it
Note that we must now assume that the q_i are independent of the α_i (this is not testable).
POOLED or RANDOM EFFECTS? THE BREUSCH-PAGAN LM TEST
The RE model reduces to the POOLED model if the variance of the individual effects becomes zero. The hypothesis we wish to test is thus
H_0: σ_α² = 0
H_1: σ_α² > 0
LM tests are useful when it is easy to estimate the model under the null (here the POOLED model) and more complicated under the alternative (here the RE model).
The Breusch-Pagan statistic is calculated using the OLS residuals from the pooled model (ê = ê_ols):
LM = [NT/(2(T − 1))] · [Σ_i (T ē_i.)² / ΣΣ ê_it² − 1]² ~ χ²_1 under H_0
• Unfortunately, the Breusch-Pagan test is two-sided against the alternative σ_α² ≠ 0, in spite of the fact that we know that variances cannot be negative.
• An improvement suggested by HONDA is to use the one-sided test
HONDA = √[NT/(2(T − 1))] · [Σ_i (T ē_i.)² / ΣΣ ê_it² − 1] ~ N(0,1) under H_0
A one-sided P-value is calculated: P(x > HONDA).
• Another problem is that LM tests often have low power.
Experiments have shown that in many cases it is better to use the CHOW test for FE against POOLED even if we suspect that RE is the correct alternative.
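Both statistics above come from the same residual ratio, so they are cheap to compute together. A numpy sketch on synthetic data generated under H_0 (seed and values are illustrative assumptions):

```python
import numpy as np

# Breusch-Pagan LM and Honda statistics from pooled OLS residuals
rng = np.random.default_rng(6)
N, T = 100, 5
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + rng.normal(size=(N, T))    # no individual effects: H0 true

xc, yc = x.ravel() - x.mean(), y.ravel() - y.mean()
b = np.sum(xc * yc) / np.sum(xc ** 2)
e = (yc - b * xc).reshape(N, T)                # pooled residuals

ratio = np.sum((T * e.mean(axis=1)) ** 2) / np.sum(e ** 2)
LM = N * T / (2 * (T - 1)) * (ratio - 1) ** 2          # ~ chi2(1) under H0
HONDA = np.sqrt(N * T / (2 * (T - 1))) * (ratio - 1)   # ~ N(0,1) under H0
```

By construction HONDA² = LM; the one-sided Honda version simply keeps the sign of the deviation instead of squaring it away.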
SUR MODELS
SUR models can easily be tested against FE and POOLED models using Chow tests, if we assume homoscedasticity and independence between and within individuals.
SUR models can be estimated when there is heteroscedasticity and/or correlation between individuals. In this case we must adapt the Chow tests.
Testing SUR against RE always needs generalised Chow tests.
MISSPECIFICATION TESTS
When T is small it is very difficult to investigate the time series properties of a panel data model. This becomes quite possible as T gets larger, however.
Misspecification testing should be performed on the most general model being considered.
SMALL T
Autocorrelation can only be tested with great difficulty.
Heteroscedasticity can be tested, but it is difficult to distinguish within-individual from between-individual differences.
TEST FOR HETEROSCEDASTICITY
The proposed test is the Bickel version of the Breusch-Pagan test. This tests for both within and between heteroscedasticity, and is performed in three steps:
1)  Estimate the within regression. Obtain the residuals (ê_it).
2)  Calculate the total residual variance
s² = [1/(NT − N − K)] Σ ê_it²
Remembering that ē_i. = 0, calculate the within-individual variances
s_i² = [1/(T − 1)] Σ_t ê_it²
3)  Calculate the Bartlett statistic
B = (T − 1)[N ln s² − Σ_i ln s_i²] / (1 + (N + 1)/{3N(T − 1)}) ~ χ²_{N−1} under H_0
Bickel's test can also be used if we suspect heteroscedasticity within individuals.
TESTS FOR AUTOCORRELATION
The first-order within-individual autocorrelation coefficient is calculated from the within regression residuals:
r = Σ_{i=1}^{N} Σ_{t=2}^{T} ê_it ê_i,t−1 / Σ_{i=1}^{N} Σ_{t=2}^{T} ê_it²
The simplest test is the LM test due to Breusch and Godfrey:
LM = √[NT/(T − 1)] · r ~ N(0,1) under H_0
The autocorrelation coefficient is known to have a slow convergence to normality, however, so a superior alternative is probably a test due to Fisher:
z = [√(NT − N − K)/2] · ln[(1 + r)/(1 − r)] ~ N(0,1) under H_0
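The autocorrelation coefficient and the Fisher-type statistic can be computed directly from the within residuals. A numpy sketch on synthetic data (seed, sizes and parameter values are illustrative assumptions):

```python
import numpy as np

# Within-individual AR(1) coefficient and the Fisher-type test statistic
rng = np.random.default_rng(7)
N, T, K = 40, 8, 1
a = rng.normal(size=N)
x = rng.normal(size=(N, T))
y = a[:, None] + 2.0 * x + rng.normal(size=(N, T))

# Within residuals
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)
e = y_w - b_w * x_w

# First-order within-individual autocorrelation of the residuals
r = np.sum(e[:, 1:] * e[:, :-1]) / np.sum(e ** 2)

# Fisher z-transform based statistic
z = np.sqrt(N * T - N - K) * 0.5 * np.log((1 + r) / (1 - r))
```

Note that within residuals have some built-in negative serial correlation from the demeaning, so r need not be centred exactly at zero even when the errors are independent.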
ROBUST STANDARD ERRORS
If we discover (or even suspect) heteroscedasticity or serial autocorrelation we must decide what to do.
One approach is to try to model these variances and/or correlations. This can be difficult even for large T, and is generally impossible for small T.
An alternative approach is to accept the usual estimates, but to calculate their so-called Robust Standard Errors.
• If we only suspect heteroscedasticity then we can use WHITE'S ROBUST ERRORS.
• If we suspect heteroscedasticity and/or within-individual autocorrelation we can use ARELLANO'S ROBUST ERRORS.
WHITE'S method is a standard approach performed by most econometric software.
The robust variance estimate for a fixed effects model with one exogenous variable is
Var(β̂) = Σ x̃_it² ê_it² / (Σ x̃_it²)²
where the residuals and variables are from the within regression.
In the general case with K exogenous variables the variance-covariance matrix is given by
Var(β̂) = (X̃′X̃)⁻¹ (Σ ê_it² X̃′_it X̃_it) (X̃′X̃)⁻¹
where X̃ is the (NT × K) "difference-from-mean" matrix of all the exogenous variables and X̃_it is the (1 × K) row vector of variables for a given observation.
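Both forms of White's variance can be computed in numpy, and the sandwich form reduces to the scalar formula when K = 1. A sketch on synthetic heteroscedastic data (all generating values are illustrative assumptions):

```python
import numpy as np

# White's heteroscedasticity-robust variance for the within estimator
rng = np.random.default_rng(8)
N, T = 50, 5
x = rng.normal(size=(N, T))
y = 2.0 * x + rng.normal(size=(N, T)) * (0.5 + np.abs(x))   # heteroscedastic

x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)
e = y_w - b_w * x_w

# Scalar formula (one exogenous variable)
var_white = np.sum(x_w ** 2 * e ** 2) / np.sum(x_w ** 2) ** 2

# General sandwich form, which reduces to the scalar formula when K = 1
Xw = x_w.reshape(-1, 1)
score = Xw * e.reshape(-1, 1)
V = np.linalg.inv(Xw.T @ Xw) @ (score.T @ score) @ np.linalg.inv(Xw.T @ Xw)
```

The agreement of `V[0, 0]` with `var_white` confirms that the matrix expression is the K-variable generalisation of the scalar one.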
ARELLANO'S method is not standard.
For one variable we obtain
Var(β̂) = Σ_i (Σ_t x̃_it ê_it)² / (Σ x̃_it²)²
while in the general case we have
Var(β̂) = (X̃′X̃)⁻¹ (Σ_i X̃′_i ê_i ê′_i X̃_i) (X̃′X̃)⁻¹
where X̃_i is the (T × K) "difference-from-mean" matrix of exogenous variables, and ê_i is the (T × 1) vector of residuals, for the i-th individual.
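The only change from White's formula is that the scores are summed within each individual before squaring, which is what allows for arbitrary within-individual correlation. A numpy sketch on synthetic serially correlated data (illustrative assumptions throughout):

```python
import numpy as np

# Arellano's cluster-by-individual robust variance for the within estimator
rng = np.random.default_rng(9)
N, T = 60, 6
x = rng.normal(size=(N, T))
u = 0.5 * np.cumsum(rng.normal(size=(N, T)), axis=1)  # serially correlated errors
y = 2.0 * x + u

x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_w = np.sum(x_w * y_w) / np.sum(x_w ** 2)
e = y_w - b_w * x_w

# One-variable case: sum the scores within each individual before squaring
scores = np.sum(x_w * e, axis=1)                      # one score per individual
var_arellano = np.sum(scores ** 2) / np.sum(x_w ** 2) ** 2

# White's variance for comparison (ignores within-individual correlation)
var_white = np.sum(x_w ** 2 * e ** 2) / np.sum(x_w ** 2) ** 2
```

With positively autocorrelated errors the Arellano variance will typically exceed White's, since the latter discards the cross-period covariance terms.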
STRATEGY
1)  Test for heteroscedasticity and serial correlation in the most general model available (SUR if possible, FE otherwise).
2)  If there is no violation of the assumptions we test
a) RE vs. FE (Hausman)
b) POOLED vs. FE (Chow)
If (b) is not significant: use the POOLED model.
If (b) is significant but (a) is not: use the RE model.
If both tests are significant: use the FE model.
If T is large we can also test the FE model against SUR using a generalised Chow test or a Wald test of β_i = β ∀i.
3)  If the assumptions are violated
a) For small T, estimate the FE model with Arellano standard errors.
b) For medium T, use the following strategy in the FE model. Test against POOLED after
i)  adjusting for autocorrelation by making the model dynamic,
ii) adjusting for heteroscedasticity between individuals by using weighted least squares.
c) For large T, estimate the SUR model. Test against FE and POOLED after making the model dynamic.
Estimating SUR with the restriction β_i = β ∀i is sometimes called Park's model.
ONE-WAY PANEL MODEL
We have written the one-way panel model as
y_it = α_i + β x_it + e_it   (1)
This is often rewritten as
y_it = α + μ_i + β x_it + e_it   (2a)
where
α = (1/N) Σ α_i,  μ_i = α_i − α  and  Σ μ_i = 0   (2b)
α is the AVERAGE individual effect, while μ_i is the individual DEVIATION FROM AVERAGE.

(2) seems to be just a complicated way of writing (1). BUT it has the advantage that it can easily be extended to two-way models.
TWO-WAY PANEL MODELS
In the one-way model we assume that there exists an unobserved individual heterogeneity, but that the model is homogeneous over time.
Is it reasonable to assume that all time heterogeneity can be captured using observed explanatory variables?

Assume that the individual and time effects are additive, i.e. there is no interaction.
[Figure: y against x with parallel shifts for the combinations i=1,t=1; i=2,t=1; i=1,t=2; i=2,t=2.]
This is the Two-Way Panel Model.
The Two-Way Panel Model is written
y_it = α + μ_i + λ_t + β x_it + e_it   (3a)
where  Σ_i μ_i = 0,  Σ_t λ_t = 0   (3b)
We can define the individual/time effect as
α_it = α + μ_i + λ_t   (4)
Using the usual "dot" notation we obtain
α = (1/NT) Σ_i Σ_t α_it   (average effect)   (5a)
α + μ_i = ᾱ_i. = (1/T) Σ_t α_it   (individual effect)   (5b)
α + λ_t = ᾱ._t = (1/N) Σ_i α_it   (time effect)   (5c)
Note that some programmes report the individual effects as α + μ_i, while others report μ_i.
Note also that we can substitute (5) into (4) to obtain
α_it − ᾱ_i. − ᾱ._t + α = 0   (6)
THE TWO-WAY MODEL WITH FIXED EFFECTS
The Two-Way model (3) has incidental parameters as either N or T goes to infinity.
We need a new "within" transformation to remove these. We can see from (6) how this can be done:
ỹ_it = y_it − ȳ_i. − ȳ._t + ȳ
The Two-Way Within model can thus be written
ỹ_it = β x̃_it + ẽ_it
and estimated by OLS, giving β̂_w. The average, individual and time effects can now be estimated:
α̂_w = ȳ − β̂_w x̄
α̂_w,i = ȳ_i. − β̂_w x̄_i.
α̂_w,t = ȳ._t − β̂_w x̄._t
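The two-way within transformation and the effect estimates can be written compactly in numpy. A sketch on a synthetic two-way panel (seed, sizes and generating values are illustrative assumptions):

```python
import numpy as np

# Two-way within transformation and the implied effect estimates
rng = np.random.default_rng(10)
N, T = 8, 6
mu = rng.normal(size=N); mu -= mu.mean()       # individual effects, sum to 0
lam = rng.normal(size=T); lam -= lam.mean()    # time effects, sum to 0
x = rng.normal(size=(N, T))
y = 1.0 + mu[:, None] + lam[None, :] + 2.0 * x + rng.normal(size=(N, T))

def twoway(z):
    """Subtract individual and time means, add back the grand mean."""
    return (z - z.mean(axis=1, keepdims=True)
              - z.mean(axis=0, keepdims=True) + z.mean())

x_t, y_t = twoway(x), twoway(y)
beta_w = np.sum(x_t * y_t) / np.sum(x_t ** 2)

alpha_w = y.mean() - beta_w * x.mean()                  # average effect
alpha_w_i = y.mean(axis=1) - beta_w * x.mean(axis=1)    # individual effects
alpha_w_t = y.mean(axis=0) - beta_w * x.mean(axis=0)    # time effects
```

After the transformation both the row means and the column means of the data are exactly zero, which is what removes both sets of incidental parameters at once.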
• α̂_w and β̂_w are consistent as either N or T → ∞.
• α̂_w,i is only T-consistent.
• α̂_w,t is only N-consistent.
• The Two-Way within transformation removes both observed and unobserved heterogeneity, for both individual and time effects.
A dummy for an "oil shock" or a "flu epidemic" will disappear in an FE estimation.
• If T is small then the 2-Way FE model can easily be estimated using a 1-Way program. We write
y_it = α_i + β x_it + Σ_{s=1}^{T−1} λ_s D_st + e_it   (7)
where the D_s are dummies for year s. We can simply treat these dummies as explanatory variables. (Note that λ is now defined as the difference from year T, not the difference from average.)
ONE- and TWO-WAY MODELS with FIXED and RANDOM EFFECTS
A One-Way model has either fixed or random effects. Let
{μ_F}, {μ_R}, {λ_F} and {λ_R}
denote the One-Way fixed and random models for individual and time effects.
In a Two-Way model both the individual effects and the time effects can be fixed or random. Let
{μ_F, λ_F} denote the fully FE model,
{μ_R, λ_F} and {μ_F, λ_R} denote the mixed FE/RE models, and
{μ_R, λ_R} denote the fully RE model.
• Estimation of the One-Way and fully FE Two-Way models has been described earlier.

60 
THE FULLY RANDOM EFFECTS 
TWO-WAY MODEL  
The model can be written

y_it = α + β·x_it + u_it, with u_it = μ_i + λ_t + ε_it,

where μ, λ, ε and x are independent.

OLS will be consistent but inefficient. The efficient
estimate is obtained by regressing y** on x**, where

y**_it = y_it − θ_1·ȳ_i − θ_2·ȳ_t + θ_3·ȳ

and where

θ_1 = 1 − σ_ε/σ_1 with σ_1² = T·σ_μ² + σ_ε²,   (8a)

θ_2 = 1 − σ_ε/σ_2 with σ_2² = N·σ_λ² + σ_ε²,   (8b)

θ_3 = θ_1 + θ_2 + σ_ε/σ_3 − 1 with σ_3² = σ_1² + σ_2² − σ_ε²   (8c)
61 
PROBLEM: The θ's are unknown.
If the two-way errors u and ε were known then

σ̂_1² = (T/N)·Σ_i ū_i²   (9a)

σ̂_2² = (N/T)·Σ_t ū_t²   (9b)

σ̂_ε² = 1/((N−1)(T−1)) · Σ_i Σ_t ε̃_it²   (9c)

Similar alternatives as in the One-Way RE case are available
• WALLACE uses OLS residuals
• AMEMIYA uses within residuals
• SWAMY/ARORA uses the between-individual and
between-time residuals for (9a) and (9b) and
within residuals for (9c)
• NERLOVE estimates σ_μ² and σ_λ² directly from
the FE model, and uses within residuals for (9c).
• + more complicated alternatives
• It is common to adjust the denominators of (9) for
degrees of freedom
62 
• Negative estimates of σ_μ² and σ_λ² are
possible for all methods except NERLOVE. A
common procedure is to use max(σ̂², 0).
63 
MIXED FE/RE TWO-WAY MODELS
If the number of time periods (or individuals) is
small then a mixed model can be estimated by using
RE on a One-Way model with dummies (as in (7)).
Otherwise we proceed as follows:
1) Adjust the model for the fixed effects
2) Adjust for the random effects
3) Regress adjusted y on adjusted x (no constant)

Step                      {μ_R, λ_F}               {μ_F, λ_R}
1) Within transformation  ỹ_it = y_it − ȳ_t        ỹ_it = y_it − ȳ_i
2) RE transformation      y*_it = ỹ_it − θ_1·ỹ_i   y*_it = ỹ_it − θ_2·ỹ_t
   Theta estimation       θ_1 from (8a) and (9a)   θ_2 from (8b) and (9b)
3) RE regression          y*_it on x*_it           y*_it on x*_it
   (no constant)

Note that ỹ_i = ȳ_i − ȳ and ỹ_t = ȳ_t − ȳ
64 
PARAMETER TESTS
There are 9 different models when we allow for the
possibility of both individual and time effects:

{μ_F, λ_F}   {μ_R, λ_F}   {λ_F}
{μ_F, λ_R}   {μ_R, λ_R}   {λ_R}
{μ_F}        {μ_R}        POOLED

We must therefore choose:
• The level (2-way, 1-way, pooled).
  – CHOW tests for FE
  – LM tests for RE
• The type of effects (FE, RE)
  – HAUSMAN tests for a given level

This is a "chicken-and-egg" problem. But
• LM tests have poor power in small samples and
are complicated to adjust.
• Chow tests have good power even in RE models
65 
TWO-WAY CHOW TESTS 
The best power is obtained by always testing against
the unrestricted two-way model.

Model            # Parameters (p)   RSS
IT  {μ_F, λ_F}   N + T + K − 1      RSS_IT
I   {μ_F}        N + K              RSS_I
T   {λ_F}        T + K              RSS_T
0   POOLED       K                  RSS_0

We perform three Chow tests, for m = 0, T, I:

CHOW_m = [(RSS_m − RSS_IT)/(N + T − 1 + K − p_m)] / [RSS_IT/((N − 1)(T − 1) − K)]

Reject 0/IT   Reject T/IT    Reject I/IT   Conclusion
(something)   (individual)   (time)
YES           YES            YES           2-way
YES           YES            NO            Individual
YES           NO             YES           Time
NO            NO             NO            Pooled
YES           NO             NO            ?? (2-way)
NO            YES            NO            ? (Individ.)
NO            NO             YES           ? (Time)
NO            YES            YES           ?? (2-way)
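The three Chow statistics are simple to compute once the four residual sums of squares are in hand. The RSS values and sample sizes below are made-up numbers, purely to illustrate the formula:

```python
def chow_two_way(rss_m, rss_it, p_m, N, T, K):
    """F statistic for testing a restricted model m against the
    two-way FE model. Numerator df: p_IT - p_m; denominator df:
    (N-1)(T-1) - K."""
    p_it = N + T + K - 1
    num = (rss_m - rss_it) / (p_it - p_m)
    den = rss_it / ((N - 1) * (T - 1) - K)
    return num / den

# Hypothetical RSS values for N=100, T=10, K=2
f_pooled = chow_two_way(520.0, 430.0, p_m=2, N=100, T=10, K=2)     # 0 vs IT
f_indiv  = chow_two_way(445.0, 430.0, p_m=102, N=100, T=10, K=2)   # I vs IT
```

Each statistic is then compared with the appropriate F critical value.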
66 
TWO-WAY HAUSMAN TESTS 
The best power is obtained by always testing against
the fully FE model. The omitted variables variant of
the tests is as follows:
1) To test {μ_R, λ_R} against {μ_F, λ_F}:
   Regress y** on x**, x̃
2) To test {μ_R, λ_F} against {μ_F, λ_F}:
   Regress y* on x*, x̃
3) To test {μ_F, λ_R} against {μ_F, λ_F}:
   Regress y* on x*, x̃
 
Reject (1)   Reject (2)   Reject (3)   Conclusion
(some FE)    (ind. FE)    (time FE)
YES          YES          YES          {μ_F, λ_F}
YES          YES          NO           {μ_F, λ_R}
YES          NO           YES          {μ_R, λ_F}
NO           NO           NO           {μ_R, λ_R}
YES          NO           NO           ?? {μ_F, λ_F}
NO           YES          NO           ? {μ_F, λ_R}
NO           NO           YES          ? {μ_R, λ_F}
NO           YES          YES          ?? {μ_F, λ_F}
 
67 
STRATEGY 
1)  Test for Heteroscedasticity and Serial 
Correlation in Two-Way FE model 
2)  If there is no violation of the assumptions we  
a)  First choose level with CHOW tests 
b)  Then decide RE/FE with Hausman tests 
3)  If the assumptions are violated: see p. 53. 
68 
INCOMPLETE PANELS 
Panel data studies where all individuals are observed 
at each time period are called COMPLETE.  
INCOMPLETE surveys are those with missing data. 
These can occur for several reasons  
1) We can plan our survey so that it is incomplete. 
We have DETERMINISTIC missing data 
2) The missing data is unplanned, but the selection 
rule is independent of the data (observed and 
unobserved). We have RANDOMLY missing data. 
3) There is a correlation between the selection rule 
and the data. There is a SELECTION BIAS 
 
Complete surveys are BALANCED, i.e. each
individual and each time period is observed equally
often (T and N times, respectively).
Stochastic missing data is UNBALANCED, while 
deterministic missing data can be either. 
69 
  DETERMINISTIC and RANDOM missing values 
are methodologically equivalent  
  UNBALANCED models without selection bias 
only cause technical problems 
• The data for unbalanced panels is written
{y_it, x_it} for i = 1, …, N, t = 1, …, T_i
  SELECTION BIAS is a serious problem that 
needs complicated estimation methods in panel 
data models 
  Missing data is often caused by ATTRITION; the 
tendency of individuals to drop out of surveys that 
stretch over many periods. We often suspect that 
the causes of attrition are correlated with the data. 
70 
UNBALANCED PANELS 
Assumption: There is no selection bias 
ONE-WAY FIXED EFFECTS
• The individual means are redefined:

ȳ_i = (1/T_i)·Σ_t y_it

• As in the balanced model we regress ỹ on x̃

ONE-WAY RANDOM EFFECTS
• The GLS transformation is now:

y*_it = y_it − θ_i·ȳ_i

with θ_i = 1 − σ_ε/ν_i and ν_i² = T_i·σ_μ² + σ_ε²

• The variances can be estimated consistently as

σ̂_ε² = 1/(N(T̄ − 1) − K) · Σ ε̃_it², where T̄ = (1/N)·Σ_i T_i

σ̂_b² = RSS/(N − K) from regressing √T_i·ȳ_i on √T_i·x̄_i

σ̂_μ² = (σ̂_b² − σ̂_ε²)/T̄

• The estimates are obtained by regressing y* on x*
71 
TESTING: Chow and Hausman (omitted variables)
tests work as before. LM tests must be adjusted slightly.
TWO-WAY MODELS are messy, but not difficult 
SELECTION BIAS models are difficult to estimate 
(we need numerical integration). Some simple 
specification tests exist, however 
  Hausman-type tests are available if we estimate 
the model in the full (unbalanced) sample and a 
balanced sub-sample. The two methods should 
give the same results if there is no selection bias 
  Omitted variable tests can be used with such extra 
variables as 
• # times the i-th individual is in the sample
• dummy for whether the i-th individual is in the
whole sample
• dummy for whether the i-th individual was present
in the previous period
72 
ROTATING PANELS 
Surveyors are wary of designs where individuals 
have to answer questions many times over a long 
period of time. This often leads to a large degree of 
attrition, which can very well include selection bias 
One method of avoiding this is to introduce a 
deterministic attrition. By only interviewing each 
individual a few times we hope to reduce the 
stochastic attrition. 
The most common deterministic design is the 
method of ROTATING PANELS.  
  Period 1  Period 2  Period 3  Period 4 
Wave 1  N  N/2     
Wave 2    N/2  N/2   
Wave 3      N/2  N/2 
Wave 4        N/2 
 
73 
DYNAMIC MODELS 
Dynamic models include lagged values of the 
endogenous variable on the RHS (they can also 
include lagged exogenous variables) 
y_it = α_i + γ·y_i,t−1 + β·x_it + ε_it

ε_it is assumed independent of x_it and y_i,t−1

⇒ ε_it and y_it are dependent
⇒ ε_it is correlated with ȳ_i
⇒ β̂_w is only T-consistent, not N-consistent
 
The standard way of estimating models with 
correlation between the errors and the RHS 
variables is to use the INSTRUMENTAL 
VARIABLES method 
74 
INSTRUMENTAL VARIABLES (IV) 
Consider a simple linear regression 
y = α + β·x + ε

where E(x·ε) ≠ 0.
A variable z is called an instrumental variable if
E(z·ε) = 0 and Cov(x, z) ≠ 0.
The IV estimate of β is given by

β̂_IV = Σ(y_i − ȳ)(z_i − z̄) / Σ(x_i − x̄)(z_i − z̄)
• In matrix terms β̂_IV = (Z′X)⁻¹Z′y if there are as
many instruments as RHS variables.
• If there are more instruments than RHS variables
then β̂_IV = (X̂′X̂)⁻¹X̂′y, where X̂ = Z(Z′Z)⁻¹Z′X are
the fitted values from regressing X on Z (2SLS).
 
explanatory variables x: 1 × k
Model: P(y = 1) = F(x′β)

x′β = Σ_j β_j·x_j, where x_1 is usually the constant

• Note that y is binomially distributed
for given x with P = F(x′β)
• E(y | x) = P and Var(y | x) = P(1 − P)
79 
MARGINAL EFFECTS 
The change in the dependent variable (y) for a given
change in an explanatory variable (x_j) is called the
MARGINAL EFFECT of that variable.

If x_j is continuous: ME_c = ∂E(y)/∂x_j

If x_j is a dummy: ME_d = E(y | x_j = 1) − E(y | x_j = 0)

In the LINEAR model ME_c = ME_d = β
In NONLINEAR models
• ME ≠ β
• ME is a function of x
• ME_c ≠ ME_d, but often ME_c ≈ ME_d
• We are not always so interested in β
80 
LINEAR PROBABILITY MODEL 
(LPM) 
The simplest binary choice model is to assume that F
is linear

F(x′β) = x′β  ⇒  E(y | x) = x′β

⇒ y_i = x_i′β + ε_i

⇒ We can fit a linear regression, treating y as an
ordinary variable.

A technical problem is that ε is heteroscedastic, since

Var(ε_i) = P_i(1 − P_i) = (x_i′β)(1 − x_i′β) ≠ constant

LPM can be estimated using
• OLS, with White's robust variance estimates
• WLS
• ML
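The first option — OLS with White's robust variance — can be sketched as follows. The data-generating values are made up; the sandwich formula is the standard heteroscedasticity-robust estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])
p = 0.5 + 0.3 * x                        # true linear probability
y = (rng.uniform(size=n) < p).astype(float)

# OLS fit of the LPM
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# White's robust variance: (X'X)^-1 X'diag(e^2)X (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * e[:, None] ** 2).T @ X
V_white = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V_white))
```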
81 
A more serious problem is the assumption behind 
the model. 
Plotting y against x in a model with only one
explanatory variable yields

[Figure: the 0/1 observations of y plotted against x, with the fitted OLS line]

The OLS line estimates P(y = 1) < 0 for some values
of x, and P(y = 1) > 1 for some others!!
82 
PROBIT and LOGIT MODELS 
Two commonly used functions that always lie in the 
interval (0,1) are 
PROBIT: F(x′β) = Φ(x′β), the standard normal
distribution function

LOGIT: F(x′β) = e^(x′β)/(1 + e^(x′β))
The LOGIT function was first proposed as an 
approximation to the PROBIT. In most cases they 
give very similar results. 
 
83 
LATENT VARIABLE INTERPRETATION 
There is an alternative interpretation to these binary 
choice models, which is in some ways more attractive 
Assume that there is a "true", but unobservable, 
variable  y
*
, e.g. the propensity to be sick. There is 
also an observed variable y, the incidence of being 
sick 
The latent variable is explained in an ordinary linear 
regression 
y* = x′β + ε,   (1)

and the observed variable is given by

y = 0 if y* < 0
y = 1 if y* ≥ 0   (2)

PROBIT: ε is normally distributed
LOGIT: ε is logistically distributed
84 
IDENTIFICATION PROBLEMS 
Multiplying (1) by a positive constant:
⇒ the sign of y* is unchanged
⇒ y is unchanged
⇒ β is unidentified
However
• The sign of β_j and the ratio β_j/β_k are identified.
• The marginal effects are also identified
• The probit model is normalised by letting ε be
standard normally distributed (σ² = 1)
• Imposing the logistic distribution normalises β
• These normalised parameters are related:

β_logit ≈ 1.6·β_probit ≈ 4·β_LPM, except for the constant
term where β_logit ≈ 1.6·β_probit ≈ 4·β_LPM − 2.
85 
ML ESTIMATION 
The nonlinear probit and logit models are estimated 
using maximum likelihood 
Likelihood = Joint probability of the sample
           = Π_{y_i=1} P(y_i = 1) · Π_{y_i=0} P(y_i = 0)
           = Π_i [F(x_i′β)]^(y_i) · [1 − F(x_i′β)]^(1−y_i)

and thus

log likelihood = Σ_i { y_i·ln F(x_i′β) + (1 − y_i)·ln[1 − F(x_i′β)] }
This is easy to maximise iteratively for LOGIT and 
PROBIT models 
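For the logit, the iteration is just Newton-Raphson on the log-likelihood above. A self-contained sketch with simulated data (made-up coefficients, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=n) < p).astype(float)

# Newton-Raphson for the logit log-likelihood
b = np.zeros(2)
for _ in range(25):
    F = 1 / (1 + np.exp(-X @ b))
    grad = X.T @ (y - F)                 # score vector
    W = F * (1 - F)
    H = -(X * W[:, None]).T @ X          # Hessian
    b = b - np.linalg.solve(H, grad)
```

The probit version only changes F and the weights W; statistical packages do exactly this kind of iteration internally.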
86 
PREDICTIONS  
What do we mean by a prediction from a binary
choice model? The intuitive F(x′β̂) is a prediction
of P(y = 1), not of y.
The standard definition is

ŷ = 0 if ŷ* < 0
ŷ = 1 if ŷ* ≥ 0

where ŷ* = x′β̂. Note that

ŷ* ≥ 0  ⇔  F(ŷ*) ≥ 0.5

This rule seems reasonable if P_obs ≈ 0.5, where P_obs
is the observed proportion of ones amongst the y's. It
will however lead to nearly all the predictions being
zeroes or ones if P_obs is small (large).
An alternative definition is therefore

ŷ = 0 if F(ŷ*) < P_obs
ŷ = 1 if F(ŷ*) ≥ P_obs
Measures of fit:
1) The correlation Cor(y, ŷ)
2) Efron's R² = 1 − Σ(y_i − ŷ_i)²/Σ(y_i − ȳ)²
3) McFadden's R² = 1 − ln L/ln L_0,
where L is the likelihood from the estimated model
and L_0 is from the model with only a constant
4) The proportion of correct predictions:
#(ŷ_i = y_i)/N
The first two measures reduce to the ordinary R²
measure in a linear model.
We can of course replace ŷ with F(ŷ*) in all except (3).
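Measures (2)–(4) can be computed directly from the fitted probabilities; a sketch with a tiny made-up example:

```python
import numpy as np

def fit_measures(y, p):
    """Efron's R^2, McFadden's R^2, and the proportion of correct
    predictions for a binary model with fitted probabilities p."""
    yhat = (p >= 0.5).astype(float)        # standard 0.5 cutoff
    efron = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    ll = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum()
    p0 = y.mean()                          # constant-only model
    ll0 = (y * np.log(p0) + (1 - y) * np.log(1 - p0)).sum()
    mcfadden = 1 - ll / ll0
    correct = (yhat == y).mean()
    return efron, mcfadden, correct

e2, m2, c2 = fit_measures(np.array([0., 0., 1., 1.]),
                          np.array([0.2, 0.4, 0.6, 0.8]))
```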
88 
RESULTS IN LIMDEP 
LIMDEP includes the following output when using 
the PROBIT/LOGIT commands 
1) LPM (start values) 
2) PROBIT/LOGIT model 
3) Measure of fit (4) 
4) LogL and LogL0  Measure of fit (3) 
One can request the marginal effects 
1) ME_c(x̄): evaluated at the average of the x's
2) ME_c(x̄_s): evaluated at strata averages
Note that ME_c(x_i) and ME_d(x_i) must be explicitly
calculated.
One can also save
1) The predictions ŷ
2) The residuals y − ŷ
3) The probabilities F(ŷ*)
89 
PANEL DATA LOGIT/PROBIT 
A panel data binary choice model can be written 
y*_it = α + μ_i + β·x_it + ε_it,

where the observed variable is given as usual by

y = 0 if y* < 0
y = 1 if y* ≥ 0
There are two problems here
1) FIXED EFFECTS: The incidental parameters 
cannot be swept away by a simple transformation 
of the data 
2) RANDOM EFFECTS: Maximising the likelihood
involves numerical integration over T dimensions
90 
Fixed Effect Panel Logit - Chamberlain's Approach 
Chamberlain has shown that the incidental
parameters are removed if the likelihood is
conditioned on Σ_t y_it.
The number of observations and regressors available 
are substantially reduced in the FE logit model 
  Individuals that have the same y for all time 
periods don't contribute to the likelihood and can 
be removed. 
  As for all fixed effects models we cannot include 
individual specific regressors. 
  As for all unbalanced panels we can remove all 
individuals that are only observed once 
Other "problems" are
• There is no obvious way of estimating the
individual effects or the marginal effects
• It would be possible to estimate the marginal
effects for the "average" person, i.e., when μ_i = 0.
LIMDEP doesn't do this, however.
• No "conditional ML" Probit is available
91 
Random Effect Panel Probit and Logit 
The full RE model is not possible to estimate. The
usual approach is to impose the "equi-correlation"
restriction

Cor(u_it, u_is) = σ_μ²/(σ_μ² + σ_ε²) = ρ

The Probit model also assumes that ε ~ N(0, 1)
The Logit model assumes that ε is logistically distr.
Both assume that μ_i is normally distributed
The marginal effects are difficult to estimate since
E(y) is now highly nonlinear. Replacing μ_i with its
expected value (zero) in E(y) leads to the usual
probit/logit formulae, however.
TESTS
Testing FE vs. RE vs. POOLED can be performed
with Hausman tests. In the Probit model a Wald test
is also available for RE vs. POOLED (ρ = 0).
LRT or F-tests are used for parameter testing. 
92 
OTHER LIMDEP MODELS 
Multinomial Logit (MNL) 
Each individual "chooses" between alternatives
0, 1, …, J. Thus y_i = j if alternative j is chosen.
The explanatory variables x_ij are of two types:
z_ij: the choice specific attributes
w_i: the individual specific characteristics
The multinomial logit model is written

P(y_i = j) = e^(z_ij′γ + w_i′δ_j) / Σ_{k=0}^{J} e^(z_ik′γ + w_i′δ_k)

where δ_0 is normalised to 0.
Note that the attributes have choice independent 
parameters, while the characteristics can have choice 
dependent parameters. 
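The choice probabilities are a softmax over the utility indices. A sketch with hypothetical numbers (three alternatives, one attribute, one characteristic — none of these values come from the slides):

```python
import numpy as np

def mnl_probs(z, gamma, w, delta):
    """Multinomial logit choice probabilities.
    z: (J+1, m) choice-specific attributes, gamma: (m,)
    w: (k,) individual characteristics, delta: (J+1, k) with delta[0] = 0."""
    v = z @ gamma + delta @ w       # utility index for each alternative
    ev = np.exp(v - v.max())        # subtract max for numerical stability
    return ev / ev.sum()

z = np.array([[1.0], [0.5], [0.0]])
w = np.array([2.0])
gamma = np.array([0.8])
delta = np.array([[0.0], [0.3], [-0.2]])   # first row normalised to zero
p = mnl_probs(z, gamma, w, delta)
```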
93 
A property of the MNL model is the so called 
Independence of Irrelevant Alternatives (IIA). When 
choosing between alternatives 1 and 2 it does not 
matter if alternative 3 exists or not. 
The multinomial Probit model (MNP) does not 
impose IIA automatically. Estimating MNP needs 
Monte Carlo integration, however.  
In LIMDEP we estimate MNL models using 
• LOGIT if there are no attributes (z_ij = 0)
• NLOGIT if the characteristics have δ_j = δ for all j
• NLOGIT + choice dummies/interactions for the
general model
 
MNP models can also be estimated in LIMDEP 
94 
Ordered Logit/Probit 
In these models we assume that there is a strict 
ranking between the alternatives (the classic example 
is school grades). 
The model is given by

P(y_i = j) = P(r_{j−1} < y*_i ≤ r_j)
where the r's are to be estimated. F is logistic or 
normal depending on whether an Ordered Logit or 
Ordered Probit is used. 
Random Effect versions of the ordered models are 
also available in LIMDEP 
95 
Count Models 
If we are modelling, for example, the number of sick 
days it may not seem unreasonable to assume that 
y ~ Poisson(λ), where ln λ = x′β
A problem with a Poisson regression is that it forces 
the mean and variance of y (given x) to be equal. An 
alternative which allows for "overdispersion" is the 
Negative Binomial regression 
In many cases the Poisson and Neg.Bin. models 
underestimate the number of zeroes. This may be
due to the fact that there are two processes at work:
1) A binary choice model, which determines if we 
report sick 
2) A count model, which determines how many days 
we are absent if we report sick  
Such models are called Zero-Inflated. 
LIMDEP estimates Poisson, NegBin, ZIP, ZINB, 
Fixed and Random Effect Poisson and NegBin., and 
some sample selection count models 
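A plain Poisson regression with ln λ = x′β can be fitted by the same Newton-Raphson recipe as the logit. The simulated data use made-up coefficients; this is a sketch of the estimator, not LIMDEP's code:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.2, 0.5])
lam = np.exp(X @ beta_true)          # ln(lambda) = x'beta
y = rng.poisson(lam)

# Newton-Raphson for the Poisson log-likelihood
b = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ b)
    grad = X.T @ (y - mu)            # score vector
    H = -(X * mu[:, None]).T @ X     # Hessian
    b = b - np.linalg.solve(H, grad)
```

Comparing the sample mean and variance of y given x is the quick diagnostic for the overdispersion that motivates the Negative Binomial alternative.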
96 
Truncated and Censored Models 
In binary choice models we only observe, for 
example, if consumption occurs. In a censored model 
we observe the amount of consumption, but only if it 
is non-negative. The model is 
y*_i = x_i′β + ε_i

y_i = y*_i if y*_i > L_i
y_i = L_i  if y*_i ≤ L_i

This model is left censored if x_i is observed for all
observations and left truncated if x_i is only observed
when y_i = y*_i. Right censoring and truncation are
also possible.
The limit, L_i, can be a constant or a variable. If y is
observed consumption, then y*_i is the propensity to
consume and the limit value is zero.
Censored and Truncated models usually assume that 
e is normally distributed. In this case the censored 
model is usually called a TOBIT model 
Random Effects, nested, bivariate and sample 
selection forms of the TOBIT model are available in 
LIMDEP. 
97 
Sample Selection 
In many situations (more than we like to think) the 
data we have available has not been obtained 
randomly from the population of interest. 
In addition there may well be a correlation between 
the mechanism that determines what data is 
observed and the process we are interested in. 
For example, physicians may tend to "deselect" 
patients considered too ill to take part in a clinical 
trial. Too few of these patients will therefore 
participate, which will obviously bias our results. 
Sample selection models consist of two parts 
1) A selection model (probit, logit, etc) 
2) An explanatory model (regression, tobit, etc). 
These models are very easily estimated in LIMDEP. 
Note that the main problem consists of formulating a
reasonable selection model!