Announcements:
• Quiz 1 on January 30, 2025 at 3 pm. It will cover material
  taught up to January 23.
• Please fill out the Google form.
Simple Linear Regression model continued
          “Desirable” properties of estimators
  • Unbiasedness
  • Efficiency (minimum variance) [we will cover this later]
These are finite-sample properties.
An estimator θ̂ of θ is said to be unbiased if E(θ̂) = θ.
In the SLR case, we want β̂0 and β̂1 to be unbiased, i.e., that
E[β̂1] = β1 and E[β̂0] = β0.
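As a simple worked example (not from the slides), the sample mean of
a random sample is an unbiased estimator of the population mean μ:

   E(x̄) = E[(1/n) Σᵢ xi] = (1/n) Σᵢ E(xi) = (1/n) · nμ = μ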
             Assumptions in the SLR model
Given the regression model y = β0 + β1 x + u, we assume
 1. The model is linear in parameters.
 2. There is a random sample of n observations on y and x.
 3. Not all the x have the same value.
 4. E(u|x) = 0. By the Law of Iterated Expectations, this ⇒ E(u) = 0.
 5. Var(u|x) = σ². This assumption is called homoskedasticity.
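A minimal simulation sketch of a data-generating process satisfying
these assumptions; the parameter values β0 = 1, β1 = 2, and σ = 0.5
are illustrative choices, not from the lecture:

import numpy as np

rng = np.random.default_rng(0)
n = 100
beta0, beta1, sigma = 1.0, 2.0, 0.5   # illustrative population parameters

x = rng.uniform(0, 10, size=n)        # random sample; not all x equal (assumptions 2, 3)
u = rng.normal(0, sigma, size=n)      # E(u|x) = 0, Var(u|x) = σ² (assumptions 4 and 5)
y = beta0 + beta1 * x + u             # linear in parameters (assumption 1)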
                      Assumption 4
• We needed only the first three assumptions to derive the OLS
  estimator.
• We need assumption 4 to demonstrate unbiasedness of the
  OLS estimator.
• Another implication of (4) is that E(y|x) = β0 + β1x (see the
  derivation below). This is referred to as the population
  regression function. It emphasizes how y changes on average
  with changes in x.
• This is a key assumption and we will refer to it several times
  in this course.
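Taking conditional expectations of the model and applying
assumption 4:

   E(y|x) = E(β0 + β1x + u | x) = β0 + β1x + E(u|x) = β0 + β1x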
                 Unbiasedness of β̂1 and β̂0
Before proving unbiasedness, it is useful to recognize the following
identity (all sums below run over i = 1, …, n):

   Σᵢ (xi − x̄)(yi − ȳ) = Σᵢ (xi − x̄)yi

It holds because Σᵢ (xi − x̄)(yi − ȳ) = Σᵢ (xi − x̄)yi − ȳ Σᵢ (xi − x̄),
and Σᵢ (xi − x̄) = 0.
We use this to rewrite the expression for β̂1 as follows:
   β̂1 = Σᵢ (xi − x̄)(yi − ȳ) / Σᵢ (xi − x̄)² = Σᵢ (xi − x̄)yi / Σᵢ (xi − x̄)²
Substitute for yi from the population model
   ⇒ β̂1 = Σᵢ (xi − x̄)(β0 + β1xi + ui) / Σᵢ (xi − x̄)²
                  Unbiasedness of β̂1 and β̂0
   = β0 Σᵢ (xi − x̄) / Σᵢ (xi − x̄)² + β1 Σᵢ (xi − x̄)xi / Σᵢ (xi − x̄)²
     + Σᵢ (xi − x̄)ui / Σᵢ (xi − x̄)²
Now use Σᵢ (xi − x̄) = 0 and Σᵢ (xi − x̄)xi = Σᵢ (xi − x̄)²:

   ⇒ β̂1 = 0 + β1 Σᵢ (xi − x̄)² / Σᵢ (xi − x̄)² + Σᵢ (xi − x̄)ui / Σᵢ (xi − x̄)²
         = β1 + Σᵢ (xi − x̄)ui / Σᵢ (xi − x̄)²
This is a key relationship and has many uses.
This means that, conditional on x, E(β̂1|x) = β1 + 0 = β1, hence β̂1 is
unbiased. By the Law of Iterated Expectations, unbiasedness then holds
unconditionally as well: E(β̂1) = E[E(β̂1|x)] = β1.
Similarly, β̂0 is also unbiased. Proof left as an exercise.
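A minimal Monte Carlo sketch of this result, with illustrative
parameter values (not from the lecture): averaging β̂1 across many
simulated samples should recover β1.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000
beta0, beta1, sigma = 1.0, 2.0, 1.0   # illustrative population parameters

x = rng.uniform(0, 10, size=n)        # condition on a fixed set of x's
sxx = np.sum((x - x.mean()) ** 2)
slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)  # fresh errors each replication
    y = beta0 + beta1 * x + u
    # OLS slope: Σ (xi − x̄)yi / Σ (xi − x̄)²
    slopes[r] = np.sum((x - x.mean()) * y) / sxx

print(slopes.mean())                  # ≈ 2.0, consistent with E(β̂1) = β1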
                      Assumption 5
• Assumption 5 states that Var(u|x) = σ², i.e., the errors are
  homoskedastic. This in turn implies that Var(y|x) = σ².
• σ² is a scalar, and does not vary with observation i. When
  Var(u|x) is not a constant, we term this a case of
  heteroskedasticity. We will consider this case later.
• It is important to reiterate that both assumptions (4) and (5)
  pertain to the unobserved error u and not to the observed
  residual û.
[Figure: Homoskedasticity (from Wooldridge, Chapter 2)]
[Figure: Heteroskedasticity (from Wooldridge, Chapter 2)]
Assumption 5 rules out heteroskedasticity (for now)
                          Variance of β̂1
Assumption 5 is needed to derive the variance of the estimated
coefficients. We start with the slope coefficient β̂1. But first, recall
that the variance of any estimator θ̂ is given by:

   Var(θ̂) = E[(θ̂ − E(θ̂))²]
   ⇒ Var(β̂1) = E[(β̂1 − β1)²] = E[(Σᵢ (xi − x̄)ui / Σᵢ (xi − x̄)²)²]
where all expectations are conditional on the sample x's (recall from
the unbiasedness derivation that β̂1 − β1 = Σᵢ (xi − x̄)ui / Σᵢ (xi − x̄)²).
   = E[(Σᵢ (xi − x̄)ui)²] / (Σᵢ (xi − x̄)²)²
     = [1 / (Σᵢ (xi − x̄)²)²] Σᵢ (xi − x̄)² E(ui²)

We were able to do this because, conditional on x, the cross terms
vanish: random sampling implies E(ui uj) = 0 for i ≠ j. Note further
that, conditional on x, E(ui²) = σ², since E(ui|x) = 0 gives
Var(ui|x) = E(ui²|x) = σ², a constant. After cancellation,
   ⇒ Var(β̂1) = σ² / Σᵢ (xi − x̄)²
We will show later that this is the lowest variance among all linear
unbiased estimators (the Gauss-Markov theorem).
An analogous expression can be derived for β̂0 . Proof left as an
exercise.
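A minimal simulation check of this formula, continuing the earlier
illustrative setup: the variance of β̂1 across replications should match
σ² / Σᵢ (xi − x̄)².

import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20_000
beta0, beta1, sigma = 1.0, 2.0, 1.0   # illustrative population parameters

x = rng.uniform(0, 10, size=n)        # hold the x's fixed across replications
sxx = np.sum((x - x.mean()) ** 2)
slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)
    y = beta0 + beta1 * x + u
    slopes[r] = np.sum((x - x.mean()) * y) / sxx

print(slopes.var())                   # empirical Var(β̂1)
print(sigma**2 / sxx)                 # theoretical σ² / Σ (xi − x̄)²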
                  Estimated variance of β̂1
We are not done yet: σ² is a population parameter and is not observed,
so Var(β̂1) cannot be computed directly. To calculate it, we need an
estimator σ̂² of σ².
The estimator of σ² is given by

   σ̂² = Σᵢ ûi² / (n − 2)
It turns out that E(σ̂²) = σ². We will show this formally in the general
K-variable case.
Dividing the SSR by n instead of n − 2 would yield a biased estimator
of σ²; dividing by n − 2 is known as a degrees of freedom correction.
Intuitively, this is because there are two restrictions that the
residuals must satisfy (illustrated in the sketch below):
   • Σᵢ ûi = 0, and
   • Σᵢ xi ûi = 0
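A minimal sketch, using the same illustrative data-generating values
as before, that computes σ̂² with the n − 2 correction and checks the
two residual restrictions:

import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)   # illustrative model

b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - (b0 + b1 * x)                         # OLS residuals

print(np.sum(uhat))                   # ≈ 0 (first restriction)
print(np.sum(x * uhat))               # ≈ 0 (second restriction)
print(np.sum(uhat**2) / (n - 2))      # σ̂², the unbiased estimator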
                  Estimated variance of β̂1
Therefore the estimated variance of β̂1, denoted V̂ar(β̂1), is

   V̂ar(β̂1) = σ̂² / Σᵢ (xi − x̄)²
The variance of β̂0 can be derived analogously. This is left as an
exercise.
The distinction between Var(β̂1) and V̂ar(β̂1) is important: the former
depends on the unknown σ², while the latter is a statistic computed
from the sample.
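A minimal sketch computing V̂ar(β̂1) by hand on the same illustrative
data, and comparing its square root to the standard error reported by
statsmodels (an outside library, used here only as a check):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)   # illustrative model

b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - (b0 + b1 * x)
sigma2_hat = np.sum(uhat**2) / (n - 2)
var_b1_hat = sigma2_hat / np.sum((x - x.mean()) ** 2)

print(np.sqrt(var_b1_hat))                         # manual se(β̂1)
print(sm.OLS(y, sm.add_constant(x)).fit().bse[1])  # statsmodels se for the slope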
                       Dummy variables
When x is a binary variable, taking only the values 0 or 1, it is
called a dummy variable.
Consider a regression of adult heights y (in cms) on gender x, which
takes the value 1 if male and 0 if female.
E[y|x] = β0 + β1x ⇒ E[y|x = 0] = β0 and E[y|x = 1] = β0 + β1
Thus, β0 is the average height of women in cms, while the ‘slope’
coefficient β1 represents the average difference in heights between
men and women.
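A minimal sketch with hypothetical heights data (not the lecture's
dataset) showing that OLS on a dummy reproduces the two group means:

import numpy as np

rng = np.random.default_rng(0)
female = rng.normal(152, 2, size=30)   # hypothetical female heights (cm)
male = rng.normal(164, 2, size=30)     # hypothetical male heights (cm)
y = np.concatenate([female, male])
x = np.concatenate([np.zeros(30), np.ones(30)])   # gender dummy

b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, female.mean())               # β̂0 equals the female mean
print(b0 + b1, male.mean())            # β̂0 + β̂1 equals the male mean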
                                     OLS results
Based on state-level data on heights, the results are:
      Source |       SS           df       MS      Number of obs   =       58
-------------+----------------------------------   F(1, 56)        =   496.08
       Model | 2171.62041          1 2171.62041    Prob > F        =   0.0000
    Residual | 245.141349         56 4.37752409    R-squared       =   0.8986
-------------+----------------------------------   Adj R-squared   =   0.8968
       Total | 2416.76176         57 42.3993292    Root MSE        =   2.0923
------------------------------------------------------------------------------
      height | Coefficient Std. err.       t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      gender |   12.23793   .5494526    22.27   0.000     11.13724    13.33862
       _cons |   152.2379   .3885217   391.84   0.000     151.4596    153.0162
------------------------------------------------------------------------------
                            Comparison of means
These are identical to results from a simple comparison of means
. by gender: summ height
--------------------------------------------------------------------------------------------------
-> gender = 0
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      height |         29    152.2379    1.537255      149.3      154.8
--------------------------------------------------------------------------------------------------
-> gender = 1
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      height |         29    164.4759     2.52822      157.5      168.4
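Note how the two sets of results line up: β̂0 = 152.2379 is exactly the
mean height for gender = 0, and β̂0 + β̂1 = 152.2379 + 12.2379 ≈ 164.4759
is the mean for gender = 1.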