Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + . . . + βkxk + u
   1. Estimation
Recap of Simple Regression
 Assumptions:
The population model is linear in parameters:
y = β0 + β1x + u                              [SLR.1]
There is a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population model.   [SLR.2]
Assume E(u|x) = 0 and thus E(ui|xi) = 0       [SLR.3]
Assume there is variation in the xi           [SLR.4]
Var(u|x) = σ2                                 [SLR.5]
Recap of Simple Regression
Applying the least squares method yields
estimators of β0 and β1
      R2 = 1-(SSR/SST) is a measure of goodness
      of fit
       Under SLR.1-SLR.4 the OLS estimators
      are unbiased…
       …and, with SLR.5, have variances which
      can be estimated from the sample…
Recap of Simple Regression
 … if we estimate σ2 with SSR/(n-2).
 Adding SLR.6, Normality, gives normally
 distributed errors and allows us to state that …
(β̂1 − β1) / (σ̂/sx) = (β̂1 − β1) / se(β̂1)  ~  t(n−2)
 …which is the basis of statistical inference.
 Alternative, useful, functional forms are possible.
Limitations of Simple Regression
      In simple regression we explicitly control for only
      a single explanatory variable.
      We deal with this by assuming SLR.3
      e.g. wage = β0 + β1educ + β2exper + u
      Simple regression of wage on educ puts exper in u
      and assumes educ and u independent.
      Simple regression puts a lot of weight on
      conditional mean independence.
Multiple Regression Model
 In the population we assume
y = β0 + β1x1 + β2x2 + . . . + βkxk + u
       We are still explaining y.
       There are k explanatory variables.
       There are k+1 parameters.
       k = 1 gets us back to simple regression.
Parallels with Simple Regression
β0 is still the intercept
β1 to βk are all called slope parameters
       u is still the error term (or disturbance)
       Still need to make a zero conditional mean
      assumption, so now assume that
       E(u|x1,x2, …,xk) = 0        or E(u|x) = 0
       Still minimizing the sum of squared
      residuals, however...
Applying Least Squares
A residual is   ûi = yi − ŷi
And a fitted value is   ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik
So, summing over i = 1, …, n,
Σ ûi² = Σ (yi − ŷi)² = Σ (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik)²
Some Important Notation
• xij is the i’th observation on the j’th
explanatory variable
• e.g. x32 is the 3rd observation on explanatory
variable 2
• Not such a problem when we use variable
names e.g. educ3
The First Order Conditions
∂(Σ ûi²)/∂β̂0 = −2 Σ (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
∂(Σ ûi²)/∂β̂1 = −2 Σ xi1 (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
⋮
∂(Σ ûi²)/∂β̂k = −2 Σ xik (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
(all sums over i = 1, …, n)
The First Order Conditions
• There are k + 1 first order conditions, which are
tedious to solve by hand.
• A matrix approach is “easier” but beyond the
scope of our course. See Wooldridge, Appendix E.
• In general each β̂j is a function of all the x's
and the y's.
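The matrix solution mentioned above is easy to compute numerically. A minimal numpy sketch on simulated data (all variable names and parameter values here are illustrative, not taken from the lecture):

```python
# Minimal sketch: solve the k+1 first order conditions via the normal
# equations X'X b = X'y, using simulated data (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                                   # sample size, number of regressors

X = rng.normal(size=(n, k))                     # explanatory variables x1, ..., xk
beta_true = np.array([1.0, 0.5, -2.0, 0.3])     # beta0, beta1, ..., betak
X1 = np.column_stack([np.ones(n), X])           # add a column of ones for the intercept
u = rng.normal(size=n)                          # error term
y = X1 @ beta_true + u

# The first order conditions are equivalent to the normal equations,
# solved here for beta_hat = (beta0_hat, ..., betak_hat).
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(beta_hat)                                 # should be close to beta_true
```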
Interpreting Multiple Regression
ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk, so
Δŷ = β̂1Δx1 + β̂2Δx2 + … + β̂kΔxk,
so holding x2, …, xk fixed implies that
Δŷ = β̂1Δx1, that is, each β̂ has
a ceteris paribus interpretation
Example
 Consider the multiple regression model:
            wage = β0 + β1educ + β2exper + u
β̂1 is the estimated increase in the wage for a unit
increase in educ holding exper constant
 Using the data in wage.wfl…
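A sketch of the same regression in Python with statsmodels, assuming the Eviews workfile has been exported to a CSV file named wage.csv (a hypothetical file name) with columns wage, educ and exper:

```python
# Hypothetical replication of the Eviews regression, assuming wage.wfl has
# been exported to "wage.csv" with columns wage, educ and exper.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage.csv")                        # hypothetical export of wage.wfl
fit = smf.ols("wage ~ educ + exper", data=df).fit()
print(fit.summary())                                # coefficients, std errors, R-squared
```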
Example (Eviews output)
[Eviews regression output of wage on educ and exper not reproduced here]
Example - interpretation
The fitted equation is: ŵage = −3.40 + 0.64 educ + 0.07 exper
 • Holding experience fixed, a one year increase in
 education increases the hourly wage by 64 cents.
 • Holding education fixed, a one year increase in
 experience increases the hourly wage by 7 cents.
 • When education and experience are zero wages are
 predicted to be -$3.40!
Simple vs Multiple Estimates
Compare the simple regression   ỹ = β̃0 + β̃1x1
with the multiple regression    ŷ = β̂0 + β̂1x1 + β̂2x2
Generally, β̃1 ≠ β̂1 unless:
β̂2 = 0 (i.e. no partial effect of x2), OR
x1 and x2 are uncorrelated in the sample
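A quick numerical illustration of this point, using simulated data in which x2 has a non-zero partial effect and is correlated with x1 (all numbers are illustrative):

```python
# Sketch comparing the simple and multiple regression slopes when x1 and x2
# are correlated and x2 has a non-zero partial effect (simulated data).
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x1 and x2 correlated in the sample
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# multiple regression of y on x1 and x2
Xm = np.column_stack([np.ones(n), x1, x2])
b_multiple = np.linalg.lstsq(Xm, y, rcond=None)[0]

# simple regression of y on x1 only
Xs = np.column_stack([np.ones(n), x1])
b_simple = np.linalg.lstsq(Xs, y, rcond=None)[0]

print("beta1_hat (multiple):", b_multiple[1])   # close to 2
print("beta1_tilde (simple):", b_simple[1])     # roughly 2 + 3*0.8 = 4.4
```

The gap between the two slopes previews the omitted variable bias algebra a few slides below.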
Assumptions for Unbiasedness
Population model is linear in parameters:
y = β0 + β1x1 + β2x2 +…+ βkxk + u              [MLR.1]
{(xi1, xi2,…, xik, yi): i=1, 2, …, n} is a random sample
from the population model, so that
yi = β0 + β1xi1 + β2xi2 +…+ βkxik + ui         [MLR.2]
E(u|x1, x2,… xk) = 0, implying that all of the explanatory
variables are uncorrelated with the error      [MLR.3]
None of the x's is constant, and there are no exact linear
relationships among them                       [MLR.4]
Unbiasedness of OLS
Under these assumptions:  E(β̂j) = βj,  j = 0, 1, …, k
 All of the OLS estimators of the parameters of
the multiple regression model are unbiased
estimators.
This is not generally true if any one of MLR.1–
MLR.4 is violated.
Note that MLR.4 is more involved than SLR.4 and
rules out perfect multicollinearity.
Is MLR.3 more plausible than SLR.3?
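A small Monte Carlo sketch of this unbiasedness result, using simulated data that satisfies MLR.1–MLR.4 by construction (parameter values are illustrative):

```python
# Monte Carlo sketch of unbiasedness: averaged over many random samples,
# the OLS estimates are centred on the true parameters (simulated data).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
beta = np.array([1.0, 0.5, -1.5])                    # beta0, beta1, beta2
estimates = np.zeros((reps, 3))

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)                           # E(u|x1,x2) = 0 by construction
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))                        # close to [1.0, 0.5, -1.5]
```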
Too Many or Too Few Variables
       What happens if we include variables in
      our specification that don’t belong?
           OLS estimators remain unbiased
           There will, however, be an impact on the
            variance of the estimators
      What if we exclude a variable from our
      specification that does belong?
           OLS will usually be biased
Omitted Variable Bias
True model:    y = β0 + β1x1 + β2x2 + u
We estimate:   y = β̃0 + β̃1x1 + u
Then  β̃1 = [Σ (xi1 − x̄1) yi] / [Σ (xi1 − x̄1)²]
Omitted Variable Bias (cont)
Recall that the true model is
yi = β0 + β1xi1 + β2xi2 + ui, so the
numerator becomes
Σ (xi1 − x̄1)(β0 + β1xi1 + β2xi2 + ui)
  = β1 Σ (xi1 − x̄1)² + β2 Σ (xi1 − x̄1) xi2 + Σ (xi1 − x̄1) ui
Omitted Variable Bias (cont)
β̃1 = β1 + β2 [Σ (xi1 − x̄1) xi2] / [Σ (xi1 − x̄1)²] + [Σ (xi1 − x̄1) ui] / [Σ (xi1 − x̄1)²]
Since E(ui) = 0, taking expectations we have
E(β̃1) = β1 + β2 [Σ (xi1 − x̄1) xi2] / [Σ (xi1 − x̄1)²]
Omitted Variable Bias (cont)
Consider the regression of x2 on x1:
x̃2 = δ̃0 + δ̃1 x1,  where  δ̃1 = [Σ (xi1 − x̄1) xi2] / [Σ (xi1 − x̄1)²]
so  E(β̃1) = β1 + β2 δ̃1
Summary of Direction of Bias
             Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0       Positive bias       Negative bias
β2 < 0       Negative bias       Positive bias
Omitted Variable Bias Summary
Two cases where bias is equal to zero
    β2 = 0, that is x2 doesn't really belong in the model
    x1 and x2 are uncorrelated in the sample
       If correlations between x2 , x1 and x2 , y are
      the same sign, bias will be positive
       If correlations between x2 , x1 and x2 , y are
      the opposite sign, bias will be negative
Omitted Variable Bias: Example
Suppose the model
log(wage) = β0 + β1educ + β2abil + u
satisfies MLR.1–MLR.4.
abil is typically hard to observe so we
estimate log(wage) = β0 + β1educ + u
On average we expect the estimate of β1 to be
too high, E(β̂1) > β1, since we expect β2 > 0 and
Corr(abil, educ) > 0.
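A Monte Carlo sketch of the upward bias in this example, using simulated data in which ability is omitted (the coefficients and the educ–abil relationship below are made up for illustration, not estimated from real wage data):

```python
# Sketch of upward omitted variable bias: abil has a positive coefficient and
# is positively correlated with educ, so the simple-regression slope on educ
# is too high on average (simulated, illustrative values).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 5000
b_educ, b_abil = 0.08, 0.05                       # both positive, as in the example
slopes = np.zeros(reps)

for r in range(reps):
    abil = rng.normal(size=n)
    educ = 12 + 2 * abil + rng.normal(size=n)     # Corr(abil, educ) > 0
    u = rng.normal(scale=0.3, size=n)
    logwage = 0.5 + b_educ * educ + b_abil * abil + u
    X = np.column_stack([np.ones(n), educ])       # abil omitted
    slopes[r] = np.linalg.lstsq(X, logwage, rcond=None)[0][1]

print(slopes.mean())   # noticeably above the true value 0.08 (upward bias)
```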
The More General Case
Technically, we can only sign the bias in the
more general case if all of the included x's
are uncorrelated
       Typically, then, we work through the bias
      assuming the x’s are uncorrelated, as a
      useful guide even if this assumption is not
      strictly true
Variance of the OLS Estimators
Assume Var(u|x1, x2,…, xk) = σ2
(Homoscedasticity)
Let x stand for (x1, x2,…, xk)
Assuming that Var(u|x) = σ2 also implies
that Var(y|x) = σ2
       The 4 assumptions for unbiasedness, plus
      this homoscedasticity assumption are
      known as the Gauss-Markov assumptions
Variance of OLS (cont)
Given the Gauss-Markov Assumptions
Var(β̂j) = σ² / [SSTj (1 − Rj²)],  where
SSTj = Σ (xij − x̄j)²  and  Rj² is the R2
from regressing xj on all the other x's
Components of OLS Variances
1.  The error variance: a larger σ2 implies a larger
    variance for the OLS estimators
 2. The total sample variation: a larger SSTj implies
    a smaller variance for the estimators
 3. Linear relationships among the independent
    variables: a larger Rj2 implies a larger variance
    for the estimators (multicollinearity)
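A sketch of the third component, plugging simulated regressors into the variance formula with σ2 fixed at 1: raising the correlation between x1 and x2 raises R1² and hence Var(β̂1) (numbers are illustrative):

```python
# Sketch of the multicollinearity component: with sigma^2 fixed at 1 and
# SST1 of similar size, a higher correlation between x1 and x2 gives a
# larger R1^2 and hence a larger Var(beta1_hat). Simulated data.
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 500, 1.0

def var_beta1(rho):
    """Plug simulated regressors into Var(beta1_hat) = sigma2 / (SST1 * (1 - R1^2))."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    sst1 = np.sum((x1 - x1.mean()) ** 2)
    r1_sq = np.corrcoef(x1, x2)[0, 1] ** 2   # R^2 from regressing x1 on x2
    return sigma2 / (sst1 * (1 - r1_sq))

print("Var(beta1_hat), low collinearity (rho = 0.1) :", var_beta1(0.1))
print("Var(beta1_hat), high collinearity (rho = 0.95):", var_beta1(0.95))
```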
Misspecified Models
Consider again the misspecified model
ỹ = β̃0 + β̃1x1,  so that  Var(β̃1) = σ² / SST1
Thus, Var(β̃1) < Var(β̂1) unless x1 and
x2 are uncorrelated, in which case they are the same.
Misspecified Models (cont)
Assuming that x1 and x2 are not uncorrelated,
we can draw the following conclusions:
1. When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and Var(β̃1) < Var(β̂1)
2. When β2 = 0, β̃1 and β̂1 are both unbiased, and Var(β̃1) < Var(β̂1)
From the second conclusion, it is clear that β̃1
is preferred if β2 = 0.
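A Monte Carlo sketch of these conclusions for the case β2 = 0 (simulated, illustrative data): both estimators are centred on β1, but the simple regression estimator has the smaller sampling variance:

```python
# Monte Carlo sketch: with beta2 = 0, both the simple (misspecified) and
# multiple estimators of beta1 are unbiased, but the simple one has the
# smaller sampling variance (simulated data, illustrative values).
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 5000
b_simple = np.zeros(reps)
b_multiple = np.zeros(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + rng.normal(scale=0.5, size=n)        # x1, x2 strongly correlated
    y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)   # beta2 = 0
    Xs = np.column_stack([np.ones(n), x1])
    Xm = np.column_stack([np.ones(n), x1, x2])
    b_simple[r] = np.linalg.lstsq(Xs, y, rcond=None)[0][1]
    b_multiple[r] = np.linalg.lstsq(Xm, y, rcond=None)[0][1]

print("mean, var (simple)  :", b_simple.mean(), b_simple.var())
print("mean, var (multiple):", b_multiple.mean(), b_multiple.var())
```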
Misspecified Models (cont)
       While the variance of the estimator is smaller for
the misspecified model, unless β2 = 0 the
      misspecified model is biased
Corollary: including an extraneous or irrelevant
      variable cannot decrease the variance of the
      estimator
       As the sample size grows, the variance of each
      estimator shrinks to zero, making the variance
      difference less important
Gauss-Markov Assumptions
Linear in parameters: y = β0 + β1x1 + β2x2 +…+
βkxk + u                                  [MLR.1]
{(xi1, xi2,…, xik, yi): i=1, 2, …, n} is a random
sample from the population model, so that
yi = β0 + β1xi1 + β2xi2 +…+ βkxik + ui    [MLR.2]
E(u|x1, x2,… xk) = E(u) = 0. Conditional mean
independence                              [MLR.3]
No exact multicollinearity                [MLR.4]
Var(u|x) = Var(u) = σ2. Homoscedasticity. [MLR.5]
The Gauss-Markov Theorem
       Given our 5 Gauss-Markov Assumptions it
      can be shown that OLS is “BLUE”
       Best
       Linear
       Unbiased
       Estimator
       Thus, if the assumptions hold, use OLS
Estimating the Error Variance
σ̂² = Σ ûi² / (n − k − 1) = SSR / df
thus,  se(β̂j) = σ̂ / √[SSTj (1 − Rj²)]
     df = n – (k + 1), or df = n – k – 1
     df (i.e. degrees of freedom) is the (number
     of observations) – (number of estimated
     parameters)
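A numpy sketch of this calculation on simulated data, computing σ̂² = SSR/(n − k − 1) and se(β̂1) from the formula above, and checking the result against the diagonal of σ̂²(X′X)⁻¹ (the standard matrix form; the data and values are illustrative):

```python
# Sketch: compute sigma2_hat = SSR/(n-k-1) and se(beta1_hat) from the
# SST/R^2 formula, then check against sigma2_hat * (X'X)^(-1). Simulated data.
import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - k - 1)              # SSR / df

# se(beta1_hat) via the SST and R^2 formula
sst1 = np.sum((x1 - x1.mean()) ** 2)
r1_sq = np.corrcoef(x1, x2)[0, 1] ** 2                # R^2 of x1 on x2
se_formula = np.sqrt(sigma2_hat / (sst1 * (1 - r1_sq)))

# se(beta1_hat) via the diagonal of sigma2_hat * (X'X)^(-1)
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

print(se_formula, se_matrix)                          # the two agree
```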
Goodness-of-Fit
  R2 can also be used in the multiple regression
  context.
  R2 = SSE/SST = 1 – SSR/SST
0 ≤ R2 ≤ 1
  R2 has the same interpretation - the
  proportion of the variation in y explained by
  the independent (x) variables
More about R-squared
       R2 can never decrease when another
      independent variable is added to a
      regression, and usually will increase
       This is because SSR is non-increasing in k
       Because R2 will usually increase with the
      number of independent variables, it is not a
      good way to compare models
Adjusted R-Squared (Section 6.3)
       An alternative measure of goodness of fit is
      sometimes used
      The adjusted R2 takes into account the number of
      variables in a model, and may decrease
adj-R2 = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
       = 1 − σ̂² / [SST/(n − 1)]
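A short sketch computing R2 and adjusted R2 on simulated data, before and after adding an irrelevant regressor (illustrative values):

```python
# Sketch of R^2 versus adjusted R^2 when an irrelevant regressor is added
# (the extra variable "noise" has no true effect on y; simulated data).
import numpy as np

rng = np.random.default_rng(6)
n = 60
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                       # irrelevant variable

def r2_and_adj(Xvars):
    X = np.column_stack([np.ones(n)] + Xvars)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    k = len(Xvars)
    r2 = 1 - ssr / sst
    adj = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
    return r2, adj

print(r2_and_adj([x1]))          # R^2 and adjusted R^2 with x1 only
print(r2_and_adj([x1, noise]))   # R^2 never falls; adjusted R^2 may fall
```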
Adjusted R-Squared (cont)
It’s easy to see that the adjusted R2 is just
1 − (1 – R2)(n – 1) / (n – k – 1), but Eviews will
give you both R2 and adj-R2
       You can compare the fit of 2 models (with
      the same y) by comparing the adj-R2
       You cannot use the adj-R2 to compare
      models with different y’s (e.g. y vs. ln(y))
Eviews output again
[Eviews output again, highlighting the R2, adj-R2 and σ̂ entries]
Summary: Multiple Regression
       Many of the principles the same as simple
      regression
           Functional form results the same
       Need to be aware of the role of the
      assumptions
       Have only focused on estimation; consider
      inference in the next lecture
Next Time
      Next topic is inference in (multiple)
      regression (Chapter 4)