FINM3123 Introduction to Econometrics
Chapter 8
Heteroskedasticity
Multiple Regression Analysis: Heteroskedasticity
Consequences of heteroskedasticity for OLS
§ OLS is still unbiased and consistent under heteroskedasticity
§ The interpretation of R-squared is also unchanged:
the unconditional error variance is unaffected by heteroskedasticity
(which refers to the conditional error variance)
§ Heteroskedasticity invalidates the usual variance formulas for the OLS estimators
§ The usual 𝐹-tests and 𝑡-tests are not valid under heteroskedasticity
§ Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE);
there may be more efficient linear estimators
Heteroskedasticity-robust inference after OLS
§ Formulas for OLS standard errors and related statistics have been developed
that are robust to heteroskedasticity of unknown form
§ These robust formulas are only valid in large samples
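§ Formula for the heteroskedasticity-robust OLS standard error:
$\widehat{\mathrm{Var}}(\hat\beta_j) = \sum_{i=1}^{n} \hat r_{ij}^2 \hat u_i^2 / SSR_j^2$
The square roots are also called White/Eicker/Huber standard errors.
Here $\hat u_i$ are the OLS residuals, $\hat r_{ij}$ is the $i$-th residual from the
regression of $x_j$ on all other explanatory variables, and $SSR_j$ is the sum of
squared residuals from that regression.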
§ Using these formulas, the usual 𝑡-test is valid asymptotically
§ The usual 𝐹-statistic does not work under heteroskedasticity, but
heteroskedasticity-robust versions are available in most software
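The robust standard errors described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a replacement for the routines in statistical software: it uses the basic White (HC0) "sandwich" form without small-sample corrections, and the simulated data, coefficient values, and the function name `ols_with_robust_se` are assumptions made for the example.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS with usual and heteroskedasticity-robust (White/HC0) standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Usual formula sigma^2 (X'X)^{-1}: only valid under homoskedasticity
    sigma2 = resid @ resid / (n - p)
    se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Robust "sandwich": (X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_robust

# Simulated data (hypothetical): the error spread grows with x1
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n) * x1
X = np.column_stack([np.ones(n), x1, x2])

beta, se_usual, se_robust = ols_with_robust_se(X, y)
print("coefficients:", beta)
print("usual se:    ", se_usual)
print("robust se:   ", se_robust)
```

For each coefficient, the diagonal of the sandwich matrix equals the sum of squared products of the OLS residuals and the residuals from regressing that regressor on all the others, divided by the squared sum of squared residuals from that auxiliary regression.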
Example: Hourly wage equation
Heteroskedasticity-robust standard errors may be larger or smaller than
their nonrobust counterparts. The differences are often small in practice,
and the F-statistics are often not too different either.
If there is strong heteroskedasticity, the differences may
be larger. To be on the safe side, it is advisable to
always compute robust standard errors.
Testing for heteroskedasticity
§ It may still be interesting to test whether there is heteroskedasticity because then
OLS may not be the most efficient linear estimator anymore
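Breusch-Pagan test for heteroskedasticity
$H_0: \mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$
Under MLR.4, $\mathrm{Var}(u \mid x_1, \dots, x_k) = E(u^2 \mid x_1, \dots, x_k)$, so the null hypothesis is
$E(u^2 \mid x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$
The mean of $u^2$ must not vary with $x_1, x_2, \dots, x_k$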
Breusch-Pagan test for heteroskedasticity (cont.)
Regress the squared OLS residuals on all explanatory variables and test whether
this regression has explanatory power:
$\hat u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \text{error}$, $H_0: \delta_1 = \dots = \delta_k = 0$
$F = \frac{R^2_{\hat u^2}/k}{(1 - R^2_{\hat u^2})/(n - k - 1)} \sim F_{k,\,n-k-1}$
A large test statistic (= a high R-squared) is evidence against the null hypothesis.
Alternative test statistic (= Lagrange multiplier statistic, LM):
$LM = n \cdot R^2_{\hat u^2} \sim \chi^2_k$
Again, high values of the test statistic (= high R-squared)
lead to rejection of the null hypothesis that the expected
value of $u^2$ is unrelated to the explanatory variables.
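The two Breusch-Pagan statistics can be computed directly from the auxiliary regression. The sketch below is a minimal NumPy illustration under stated assumptions: the data are simulated, the helper names `r_squared` and `breusch_pagan` are made up for the example, and 5.99 is the 5% critical value of the chi-square distribution with 2 degrees of freedom (k = 2 here).

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def breusch_pagan(X, y):
    """Return (F, LM) for the Breusch-Pagan test; X includes a constant."""
    n, p = X.shape
    k = p - 1  # number of explanatory variables
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2          # squared OLS residuals
    r2 = r_squared(X, u2)             # R-squared of the auxiliary regression
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm_stat = n * r2
    return f_stat, lm_stat

# Simulated data (hypothetical): same regressors, two different error structures
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
e = rng.normal(size=n)
y_het = 1.0 + 0.5 * x1 - 0.3 * x2 + e * x1   # error variance depends on x1
y_hom = 1.0 + 0.5 * x1 - 0.3 * x2 + e        # constant error variance

f_het, lm_het = breusch_pagan(X, y_het)
f_hom, lm_hom = breusch_pagan(X, y_hom)
print(f"heteroskedastic errors: F = {f_het:.2f}, LM = {lm_het:.2f}")
print(f"homoskedastic errors:   F = {f_hom:.2f}, LM = {lm_hom:.2f}")
print("reject H0 at 5% (LM > 5.99):", lm_het > 5.99, lm_hom > 5.99)
```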
Example: heteroskedasticity in housing price equations
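Level specification: $R^2_{\hat u^2} = .1601$, $p\text{-value}_F = .002$, $p\text{-value}_{LM} = .0028$ ⟹ heteroskedasticity
Logarithmic specification: $R^2_{\hat u^2} = .0480$, $p\text{-value}_F = .245$, $p\text{-value}_{LM} = .2390$
In the logarithmic specification, homoskedasticity cannot be rejected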
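As a rough arithmetic check, the statistics behind these p-values can be recomputed from the reported R-squared values. The sample size and number of regressors are not stated above; the sketch assumes n = 88 and k = 3, as in the standard textbook housing data set, so treat those two numbers as assumptions. The closed form for the chi-square tail probability applies only to 3 degrees of freedom.

```python
import math

def bp_statistics(r2, n, k):
    """F and LM statistics from the R-squared of the squared-residual regression."""
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm_stat = n * r2
    return f_stat, lm_stat

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

n, k = 88, 3  # assumed values, not given in the text above
for label, r2 in [("first equation", 0.1601), ("logarithmic", 0.0480)]:
    f_stat, lm_stat = bp_statistics(r2, n, k)
    p_lm = chi2_sf_df3(lm_stat)
    print(f"{label}: F = {f_stat:.2f}, LM = {lm_stat:.2f}, p-value(LM) = {p_lm:.4f}")
```

Under these assumptions the LM p-values come out close to the reported .0028 and .2390; small differences in the last digit are expected because the R-squared values are rounded.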
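White test for heteroskedasticity
Regress squared residuals on all explanatory variables,
their squares, and interactions (here: example for k=3):
$\hat u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + \text{error}$
$H_0: \delta_1 = \delta_2 = \dots = \delta_9 = 0$, $LM = n \cdot R^2_{\hat u^2} \sim \chi^2_9$
The White test detects more general deviations from
homoskedasticity than the Breusch-Pagan test
Disadvantage of this form of the White test:
§ Including all squares and interactions leads to a large number of estimated parameters
(e.g. k=6 leads to 27 slope parameters to be estimated)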
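The parameter counts quoted for the White test follow from simple counting: k levels, k squares, and one interaction for each pair of regressors. A short sketch (the function name `white_regressors` is made up for the example) reproduces both cases; the intercept is not included in the count.

```python
from math import comb

def white_regressors(k):
    """Number of regressors (excluding the intercept) in the full White regression."""
    levels = k
    squares = k
    interactions = comb(k, 2)  # all pairwise products x_j * x_l with j < l
    return levels + squares + interactions

for k in (3, 6):
    print(f"k = {k}: {white_regressors(k)} regressors")
```

This gives 9 regressors for k = 3 and 27 for k = 6, which is also the number of degrees of freedom of the chi-square distribution used for the LM statistic.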
Alternative form of the White test
$\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + \text{error}$
This regression indirectly tests the dependence of the squared residuals
on the explanatory variables, their squares, and interactions, because the
predicted value of y and its square implicitly contain all of these terms.
$H_0: \delta_1 = \delta_2 = 0$, $LM = n \cdot R^2_{\hat u^2} \sim \chi^2_2$
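A minimal sketch of this alternative form of the White test, assuming simulated data and a made-up function name `white_special_lm`: the squared OLS residuals are regressed on the fitted values and their squares, and LM is n times the R-squared of that regression. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-LM/2), so no statistics library is needed for the p-value.

```python
import math
import numpy as np

def white_special_lm(X, y):
    """LM statistic from regressing u_hat^2 on y_hat and y_hat^2 (X includes a constant)."""
    n = X.shape[0]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    u2 = (y - y_hat) ** 2
    Z = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
    delta, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ delta
    r2 = 1.0 - e @ e / np.sum((u2 - u2.mean()) ** 2)
    return n * r2

# Simulated data (hypothetical): error standard deviation proportional to x1
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.3 * x2 + rng.normal(size=n) * x1
X = np.column_stack([np.ones(n), x1, x2])

lm = white_special_lm(X, y)
p_value = math.exp(-lm / 2.0)  # chi-square upper tail with 2 degrees of freedom
print(f"LM = {lm:.2f}, p-value = {p_value:.4g}")
```

Whatever the number of explanatory variables, this form always uses 2 degrees of freedom, which is its main advantage over the full White regression.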
Example: heteroskedasticity in (log) housing price equations
Weighted least squares estimation
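Assume heteroskedasticity is known up to a multiplicative constant:
$\mathrm{Var}(u_i \mid \mathbf{x}_i) = \sigma^2 h(\mathbf{x}_i)$, $h(\mathbf{x}_i) = h_i > 0$
The functional form of the heteroskedasticity, $h$, is known
Transformed model (divide the equation for observation $i$ by $\sqrt{h_i}$):
$y_i/\sqrt{h_i} = \beta_0 (1/\sqrt{h_i}) + \beta_1 (x_{i1}/\sqrt{h_i}) + \dots + \beta_k (x_{ik}/\sqrt{h_i}) + u_i/\sqrt{h_i}$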
Example: Savings and income
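$sav_i = \beta_0 + \beta_1 inc_i + u_i$, $\mathrm{Var}(u_i \mid inc_i) = \sigma^2 inc_i$
$sav_i/\sqrt{inc_i} = \beta_0 (1/\sqrt{inc_i}) + \beta_1 \sqrt{inc_i} + u_i^*$, with $u_i^* = u_i/\sqrt{inc_i}$
Note that this regression model has no intercept
The transformed model is homoskedastic:
$E(u_i^{*2} \mid inc_i) = E(u_i^2 \mid inc_i)/inc_i = \sigma^2$
If the other Gauss-Markov assumptions hold as well, OLS applied to the transformed
model is the best linear unbiased estimator!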
OLS in the transformed model is called weighted least squares (WLS):
$\min_{b_0, \dots, b_k} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \dots - b_k x_{ik})^2 / h_i$, where $\mathrm{Var}(u_i \mid \mathbf{x}_i) = \sigma^2 h_i$
Observations with a large variance get a
smaller weight in the optimization problem
Why is WLS more efficient than OLS in the original model?
§ Observations with a large variance are less informative than observations with small
variance and therefore should get less weight
WLS is a special case of generalized least squares (GLS)
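The equivalence between "OLS on the transformed model" and "weighted least squares" can be checked numerically. The sketch below is an illustration under assumptions: the savings and income data are simulated with an error variance proportional to income, and the function name `wls` and all coefficient values are hypothetical.

```python
import numpy as np

def wls(X, y, h):
    """WLS for Var(u_i | x_i) = sigma^2 * h_i: OLS after dividing each row by sqrt(h_i)."""
    w = 1.0 / np.sqrt(h)
    X_star = X * w[:, None]   # the column of ones becomes 1/sqrt(h_i): no intercept left
    y_star = y * w
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta

# Simulated data (hypothetical): error variance proportional to income
rng = np.random.default_rng(1)
n = 200
inc = rng.uniform(1, 10, n)
sav = 0.5 + 0.15 * inc + rng.normal(size=n) * np.sqrt(inc)
X = np.column_stack([np.ones(n), inc])

beta_wls = wls(X, sav, h=inc)

# Same estimate from the weighted normal equations (X'WX) b = X'Wy with W = diag(1/h_i)
W = np.diag(1.0 / inc)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ sav)
print("WLS via transformed model: ", beta_wls)
print("WLS via weighted equations:", beta_check)
```

Both routes minimize the same weighted sum of squared residuals, so the two coefficient vectors agree up to floating-point error.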
Example: Financial wealth equation
Dependent variable: net financial wealth. The explanatory variables include
participation in a 401K pension plan.
Assumed form of heteroskedasticity: $\mathrm{Var}(u \mid inc) = \sigma^2 inc$
WLS estimates have considerably smaller standard errors (which is
in line with the expectation that they are more efficient).
Important special case of heteroskedasticity
§ If the observations are reported as averages at the city/county/state/country/firm level,
they should be weighted by the size of the unit
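$\overline{contrib}_i = \beta_0 + \beta_1 \overline{earns}_i + \beta_2 \overline{age}_i + \beta_3 mrate_i + \bar u_i$
where $\overline{contrib}_i$ is the average contribution to the pension plan in firm $i$,
$\overline{earns}_i$ and $\overline{age}_i$ are average earnings and age in firm $i$, $mrate_i$ is the
percentage the firm contributes to the plan, and $\bar u_i$ is a heteroskedastic error term.
Error variance if errors are homoskedastic at the employee level:
$\mathrm{Var}(\bar u_i) = \sigma^2 / m_i$, where $m_i$ is the number of employees in firm $i$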
If errors are homoskedastic at the employee level, WLS with weights equal to firm size 𝑚𝑖
should be used. If the assumption of homoskedasticity at the employee level is not exactly
right, one can calculate robust standard errors after WLS (i.e. for the transformed model).
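The weighting by firm size can be derived in one line. The sketch below assumes, in addition to homoskedasticity at the employee level, that the errors of different employees within a firm are uncorrelated; the employee index $e$ and the symbol $u_{i,e}$ are notation introduced here for illustration.

```latex
\bar u_i = \frac{1}{m_i} \sum_{e=1}^{m_i} u_{i,e}
\quad\Longrightarrow\quad
\mathrm{Var}(\bar u_i)
  = \frac{1}{m_i^2} \sum_{e=1}^{m_i} \mathrm{Var}(u_{i,e})
  = \frac{m_i \sigma^2}{m_i^2}
  = \frac{\sigma^2}{m_i}
```

In the notation of weighted least squares this means $h_i = 1/m_i$, so each firm-level observation gets weight $1/h_i = m_i$, and the transformed model multiplies every variable by $\sqrt{m_i}$.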
Unknown heteroskedasticity function (feasible GLS)
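$\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k) = \sigma^2 h(\mathbf{x})$
Assumed general form of heteroskedasticity; the exponential
function is used to ensure positivity
$u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\, v$
Multiplicative error $v$ (assumption: independent of the explanatory variables)
$\log(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$
Replace $u^2$ by the squared OLS residuals $\hat u^2$, estimate this regression by OLS, and
compute $\hat h_i = \exp(\hat\alpha_0 + \hat\delta_1 x_{i1} + \dots + \hat\delta_k x_{ik})$
Use inverse values of the estimated heteroskedasticity
function, $1/\hat h_i$, as weights in WLS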
Feasible GLS is consistent and asymptotically more efficient than OLS.
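The feasible GLS procedure can be written out step by step. This is a minimal sketch under the exponential variance assumption described above; the simulated data, the true coefficient values, and the helper names `ols` and `fgls` are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X includes a constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fgls(X, y):
    """Feasible GLS with an exponential variance function; X includes a constant."""
    # Step 1: OLS residuals from the original model
    u_hat = y - X @ ols(X, y)
    # Step 2: regress log(u_hat^2) on the explanatory variables
    g_hat = X @ ols(X, np.log(u_hat ** 2))
    # Step 3: estimated variance function, positive by construction
    h_hat = np.exp(g_hat)
    # Step 4: WLS with weights 1/h_hat, i.e. OLS on data divided by sqrt(h_hat)
    w = 1.0 / np.sqrt(h_hat)
    beta_fgls = ols(X * w[:, None], y * w)
    return beta_fgls, h_hat

# Simulated data (hypothetical): Var(u | x1) = exp(0.2 + 0.8 * x1)
rng = np.random.default_rng(2)
n = 1000
x1 = rng.uniform(0, 4, n)
X = np.column_stack([np.ones(n), x1])
sd = np.sqrt(np.exp(0.2 + 0.8 * x1))
y = 1.0 + 0.5 * x1 + sd * rng.normal(size=n)

beta_ols = ols(X, y)
beta_fgls, h_hat = fgls(X, y)
print("OLS: ", beta_ols)
print("FGLS:", beta_fgls)
```

The intercept of the log regression is not needed for the weights, because rescaling all weights by a common constant leaves the WLS estimates unchanged.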
Example: demand for cigarettes
§ Estimation by OLS
Dependent variable: cigarettes smoked per day. The explanatory variables include
logged income and cigarette price, and smoking restrictions in restaurants.
The test for heteroskedasticity rejects homoskedasticity.
§ Estimation by FGLS
Discussion
§ The income elasticity is now statistically significant; other coefficients are also more
precisely estimated (without changing the qualitative results)
What if the assumed heteroskedasticity function is wrong?
§ If the heteroskedasticity function is misspecified, WLS is still consistent under
MLR.1 – MLR.4, but robust standard errors should be computed
§ WLS is consistent under MLR.4 but not necessarily under MLR.4′
§ If OLS and WLS produce very different estimates, this typically indicates that some other
assumptions (e.g. MLR.4) are wrong
§ If there is strong heteroskedasticity, it is often still better to use WLS with a wrong form of
heteroskedasticity than OLS, because this can still increase efficiency
WLS in the linear probability model
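In the LPM, the exact form of heteroskedasticity is known:
$\mathrm{Var}(y \mid \mathbf{x}) = p(\mathbf{x})[1 - p(\mathbf{x})]$, with $p(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
Estimate it by $\hat h_i = \hat y_i (1 - \hat y_i)$ and use the inverse values $1/\hat h_i$
as weights in WLS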
Discussion
§ Infeasible if LPM predictions are below zero or greater than one
§ If such cases are rare, they may be adjusted to values such as .01/.99
§ Otherwise, it is probably better to use OLS with robust standard errors
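The procedure for the linear probability model, including the adjustment of out-of-range predictions, might look as follows. This is a sketch under assumptions: the binary outcome is simulated, the function name `lpm_wls` is hypothetical, and the bounds .01/.99 are the values suggested above.

```python
import numpy as np

def lpm_wls(X, y, lower=0.01, upper=0.99):
    """WLS for the linear probability model; X includes a constant, y is 0/1."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_ols
    # Predictions outside (0, 1) would give zero or negative variance estimates
    n_outside = int(np.sum((y_hat <= 0) | (y_hat >= 1)))
    y_adj = np.where(y_hat <= 0, lower, np.where(y_hat >= 1, upper, y_hat))
    h_hat = y_adj * (1.0 - y_adj)      # estimated conditional variance of y
    w = 1.0 / np.sqrt(h_hat)
    beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_wls, n_outside

# Simulated data (hypothetical): P(y = 1 | x) = 0.2 + 0.15 * x with x in [0, 4]
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(0, 4, n)
y = (rng.uniform(size=n) < 0.2 + 0.15 * x).astype(float)
X = np.column_stack([np.ones(n), x])

beta_wls, n_outside = lpm_wls(X, y)
print("WLS coefficients:", beta_wls)
print("predictions outside (0, 1):", n_outside)
```

If `n_outside` is more than a small share of the sample, the adjustment becomes arbitrary, which is the case where OLS with robust standard errors is the safer choice.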
Summary
§ Testing for heteroskedasticity
• Breusch-Pagan test
• White test
• Alternative form of the White test
§ If heteroskedasticity exists, corrective measures
• Heteroskedasticity-robust standard errors
• WLS if heteroskedasticity is known up to a multiplicative constant
• WLS weighted by group size if observations are reported as averages at the group level
• WLS in the linear probability model
• FGLS for unknown heteroskedasticity function