Chapter 4: Violation of Assumptions
Sied Hassen (PhD)
Department of Economics, Addis Ababa University
December, 2015
Outline
Multicollinearity
Heteroscedasticity
Autocorrelation
Multicollinearity
Multicollinearity refers to a linear relationship among some or
all explanatory variables of a multiple linear regression model.
The relationship can be perfect (exact), as in
λ1 X1 + λ2 X2 + ... + λk Xk = 0 (1)
or it can be less than perfect, as in
λ1 X1 + λ2 X2 + ... + λk Xk + vi = 0 (2)
where vi is an error term.
As stated above, multicollinearity refers only to linear
correlation. It does not rule out non-linear relations such as
Yi = α + β1 X1 + β2 X1² + β3 X1³ + ui (3)
Causes of multicollinearity
Data collection method employed:
sampling over a limited range of the values taken by the
explanatory variables in the population
constraints on the model or population
For example, in the regression of electricity consumption on
income (X1 ) and house size (X2 ) there is a physical constraint
in the population in that families with higher incomes generally
have larger homes than families with lower incomes.
model specification
for example, adding polynomial terms to a regression model,
especially when the range of the X variable is small.
An over-determined model
This happens when the model has more explanatory variables
than the number of observations
Consequences of Multicollinearity
If multicollinearity is perfect, the regression coefficients are
indeterminate and their standard errors are infinite
If multicollinearity is less than perfect, the regression
coefficients are determinate but their standard errors are
large
This makes the t-values of the coefficients smaller, and hence
most of the coefficients appear individually insignificant
It also tends to make the model R² very large, usually greater
than 80%
The high R² makes the slope coefficients jointly
significant
Detection of Multicollinearity
High R² (> 0.8) but few significant coefficients
High pair-wise correlation among the explanatory
variables (r_{x1x2} > 0.8)
This is a sufficient but not a necessary condition:
multicollinearity can be present (and the auxiliary R² high)
even when the pair-wise correlations are low
Using R²_{x1·x2x3...xk} from an auxiliary regression, i.e. a regression of
one explanatory variable on the remaining explanatory variables,
X1 = λ1 X2 + λ2 X3 + ... + λ_{k−1} Xk + vi (4)
In this case there is multicollinearity if R²_{x1·x2x3...xk} from the
auxiliary regression is greater than the R² from the regression of
Y on all the explanatory variables (the X's)
Detection of Multicollinearity
A more formal test using R²_{x1·x2x3...xk} from the auxiliary regression is

F = [R²_{x1·x2x3...xk} / (k − 2)] / [(1 − R²_{x1·x2x3...xk}) / (n − k + 1)] (5)

If the computed F is greater than the critical F(k − 2, n − k + 1),
there is a multicollinearity problem
Variance inflation factor (VIF) or tolerance (TOL)

VIF = 1 / (1 − R²_{x1·x2x3...xk}) (6)

TOL = 1 / VIF

If VIF > 10 (equivalently TOL < 0.1, i.e. R²_{x1·x2x3...xk} > 0.9), it is an
indication of high collinearity
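As a minimal illustrative sketch (simulated, hypothetical data and variable names), the Python code below computes VIF and TOL with statsmodels for a case in which X2 is almost a linear function of X1:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(pd.DataFrame({"X1": x1, "X2": x2, "X3": x3}))

for j, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, j)   # equals 1 / (1 - R²_j) of eq. (6)
    print(f"{name}: VIF = {vif:.1f}, TOL = {1 / vif:.3f}")

Here X1 and X2 should show VIF well above 10 (TOL below 0.1), while X3 should not.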
Remedial Measure
Do nothing
or follow some rule of thumb
The rule of thumb
Drop the variable(s) that cause the problem of
multicollinearity
However, this may cause an omitted-variable bias
Transform the variables: (1) first difference
Yt = α + β1 X1t + β2 X2t + ut (7)
The one-period lag of the above equation is
Yt−1 = α + β1 X1t−1 + β2 X2t−1 + ut−1 (8)
Taking the difference of the two equations gives
Yt − Yt−1 = β1 (X1t − X1t−1) + β2 (X2t − X2t−1) + (ut − ut−1) (10)
This can be rewritten as
∆Yt = β1 ∆X1t + β2 ∆X2t + ∆ut (11)
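A minimal sketch of the first-difference transformation (simulated, hypothetical data): the two regressors share a common trend, so they are highly correlated in levels, and eq. (11) is then estimated on the differenced series without an intercept.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 50
trend = np.arange(T, dtype=float)
x1 = trend + rng.normal(size=T)          # both regressors follow a common trend,
x2 = 0.5 * trend + rng.normal(size=T)    # so they are highly correlated in levels
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=T)

# Differenced model of eq. (11): no intercept, since alpha cancels out.
dy, dx1, dx2 = np.diff(y), np.diff(x1), np.diff(x2)
diff_fit = sm.OLS(dy, np.column_stack([dx1, dx2])).fit()
print(diff_fit.params)   # estimates of beta1 and beta2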
The rule of thumb
Transform the variables: (2) ratio transformation
Yt = α + β1 X1t + β2 X2t + ut (12)
Dividing through by X2t gives
Yt/X2t = α (1/X2t) + β1 (X1t/X2t) + β2 + ut/X2t (13)
(a short numerical sketch of this transformation follows the list below)
Use additional or new data
Reduce collinearity in polynomial regression
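As referenced above, a minimal sketch of the ratio transformation of eq. (13) on simulated, hypothetical data: after dividing by X2t, the transformed intercept estimates β2, the coefficient on 1/X2t estimates α, and the coefficient on X1t/X2t estimates β1.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 60
x2 = rng.uniform(1.0, 5.0, size=T)                 # kept positive so we can divide by it
x1 = 1.5 * x2 + rng.normal(size=T)                 # correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=T)

# Regress Y/X2 on a constant (-> beta2), 1/X2 (-> alpha) and X1/X2 (-> beta1).
Z = sm.add_constant(np.column_stack([1.0 / x2, x1 / x2]))
ratio_fit = sm.OLS(y / x2, Z).fit()
print(ratio_fit.params)   # [beta2_hat, alpha_hat, beta1_hat]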
Heteroscedasticity
Constant variance of the disturbance term is one of the
classical regression assumptions, i.e,
E (ui2 ) = σ 2 = constant (14)
This means the variance of the error term is constant regardless
of the values of the explanatory variables
If the variance varies with the values of the explanatory variables,
we have a heteroscedastic variance, i.e.
E (ui²) = σi² ≠ constant (15)
This is most common in cross-sectional data.
Heteroscedasticity
Figure: (a) homoscedastic error variance; (b) heteroscedastic error variance
Matrix Representation of the Variance of the Error Term

E(UU′) =
| E(u1²)   E(u1u2)  ...  E(u1un) |
| E(u1u2)  E(u2²)   ...  E(u2un) |
|   ...      ...    ...    ...   |
| E(u1un)  E(u2un)  ...  E(un²)  |
Matrix Representation of Homoscedasticity
Assuming that there is no autocorrelation (E(Ui Uj) = 0 for i ≠ j), the
homoscedastic variance of the error term is given by

E(UU′) =
| σ²  0   ...  0  |
| 0   σ²  ...  0  |
| ...             |
| 0   0   ...  σ² |
= σ² ·
| 1  0  ...  0 |
| 0  1  ...  0 |
| ...          |
| 0  0  ...  1 |

E(UU′) = σ² I
Matrix Representation of Heteroscedasticity
Assuming that there is no autocorrelation (E(Ui Uj) = 0 for i ≠ j), the
heteroscedastic variance of the error term is given by

E(UU′) =
| σ1²  0    ...  0   |
| 0    σ2²  ...  0   |
| ...                |
| 0    0    ...  σn² |
Causes of heteroscedasticity
1. Error-learning models:
As people learn, their errors of behaviour become smaller over
time. In this case σi² is expected to decrease.
Example: as the number of hours of typing practice increases,
the average number of typing errors as well as their variance
decreases.
2. As data collection techniques improve, σi² is likely to
decrease.
3. Heteroscedasticity can also arise as a result of the
presence of outliers.
An outlier is an observation that is much different (either very
small or very large) in relation to the other observations in the
sample.
Consequences of Heteroscedasticity
The OLS estimators are still linear, unbiased and consistent
However, they are no longer efficient
With homoscedastic variance and one explanatory variable, the
variance of the slope coefficient β̂ is

var(β̂) = σ² / ∑xi² (16)

However, if we have heteroscedasticity and a single regressor, the
correct variance of the slope coefficient is

var(β̂) = ∑xi²σi² / (∑xi²)² (17)

If heteroscedasticity is not corrected, it can be shown that

∑xi²σi² / (∑xi²)² > σ² / ∑xi² (18)
Consequences of Heteroscedasticity
If heteroscedasticity is not corrected, the inflated variances
make the t-values lower.
The estimated coefficients then appear to be insignificant
when in fact they may not be.
This makes our inference and prediction wrong.
Detecting Heteroscedasticity
There are several methods to detect the presence or absence
of heteroscedasticity
The main reference, Gujarati, Basic Econometrics, 4th edition,
has a discussion of these methods
Here we focus only on the ones most commonly applied in research
and in regression software
Plotting the squared residuals against the predicted Y (Ŷ) or an
explanatory variable (X) can give a hint, as in the sketch below
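A minimal plotting sketch on simulated, hypothetical data: the error standard deviation is made to grow with X, so the squared residuals fan out against the fitted values.

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)    # error spread grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()

# A funnel/fan shape suggests heteroscedasticity; a flat band suggests
# a roughly constant error variance.
plt.scatter(fit.fittedvalues, fit.resid ** 2, s=10)
plt.xlabel("fitted Y")
plt.ylabel("squared residual")
plt.show()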
Detecting Heteroscedasticity
White's test for Heteroscedasticity
Unlike other tests, the White test does not rely on the normality
assumption and is easy to implement
As an illustration of the basic idea, consider the following two-
explanatory-variable regression model
Yi = α + β1 X1i + β2 X2i + ui (19)
The White test proceeds as follows:
Stage 1: estimate the model and obtain the residuals
ei = Yi − α̂ − β̂1 X1i − β̂2 X2i (20)
Stage 2: run the following auxiliary regression
ei² = λ0 + λ1 X1i + λ2 X2i + λ3 X1i² + λ4 X2i² + λ5 X1i X2i + vi (21)
White's test for Heteroscedasticity
Stage 3: obtain R² from the above auxiliary regression.
Under the null hypothesis that there is no heteroscedasticity,
it can be shown that
n · R² ∼ χ²_df (22)
where R² is from the auxiliary regression and df, the degrees of
freedom, equals the number of regressors (excluding the constant)
in the auxiliary regression; here df = 5
Stage 4: if the chi-square value obtained in (22) exceeds the
critical chi-square value at the chosen level of significance,
the conclusion is that there is heteroscedasticity.
If it does not exceed the critical chi-square value, there is no
heteroscedasticity, which is to say that in the auxiliary regression
λ1 = λ2 = λ3 = λ4 = λ5 = 0 (23)
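A minimal sketch of the White test using statsmodels' het_white on simulated, hypothetical data; het_white runs the auxiliary regression of eq. (21) internally and returns the n·R² statistic of eq. (22) with its p-value.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
n = 300
x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(1, 10, size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(scale=0.5 * x1)   # error variance rises with x1

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"n*R^2 = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")

A small p-value (the statistic exceeding the critical chi-square value) points to heteroscedasticity.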
Solution (Remedial Measure) to Heteroscedasticity
The solution or remedial measure for heteroscedasticity is to
transform the regression equation.
The purpose of the transformation is to make the error term
in the transformed model homoscedastic.
Applying OLS to the transformed model is called generalized
least squares (GLS) or weighted least squares (WLS).
Hence, the solution to heteroscedasticity is to use GLS or WLS.
Solution (Remedial Measure) to Heteroscedasticity
For the model
Yi = α + βXi + ui (24)
the transformation is generally done as follows.
If the variance of the error term is given by
var(ui) = σ² f(Xi) (25)
the transformation is done by dividing the regression model
by √f(Xi):
Yi/√f(Xi) = α/√f(Xi) + β Xi/√f(Xi) + ui/√f(Xi) (26)
For example, if var(ui) = E(ui²) = σ² Xi,
the transformed model is
Yi/√Xi = α/√Xi + β Xi/√Xi + ui/√Xi (27)
Solution (Remedial Measure) to Heteroscedasticity
In equation 27 it can be shown that the variance of the
transformed error term (ui/√Xi) is constant:
var(ui/√Xi) = E(ui/√Xi)² = (1/Xi) E(ui²) (28)
From above, E(ui²) = σ² Xi
This implies that equation 28 can be written as
var(ui/√Xi) = (1/Xi) E(ui²) = (1/Xi)(σ² Xi) = σ² = constant (29)
Hence, in the transformed model the error term is
homoscedastic.
Applying OLS to equation 27 is called GLS or WLS
GLS/WLS estimates are BLUE
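A minimal WLS sketch for the case var(ui) = σ²Xi on simulated, hypothetical data; passing weights 1/Xi to statsmodels' WLS is equivalent to applying OLS to the model divided by √Xi as in eq. (27).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + rng.normal(scale=np.sqrt(2.0 * x))   # var(u_i) = sigma^2 * X_i with sigma^2 = 2

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
wls_fit = sm.WLS(y, X, weights=1.0 / x).fit()        # weights proportional to 1/var(u_i)
print(ols_fit.bse)   # OLS standard errors (no longer appropriate here)
print(wls_fit.bse)   # WLS (GLS) standard errors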
Autocorrelation
In both the simple and the multiple regression models, we assumed
that successive values of the error term are independent:
Cov(Ut, Ut−1) = E(Ut Ut−1) = 0 (30)
This is called the assumption of no autocorrelation.
If successive values of the error term are correlated, then
there is autocorrelation, as in
Yt = α + βXt + Ut (31)
Ut = ρUt−1 + vt ; ρ ≠ 0 (32)
where vt is assumed to be non-autocorrelated and homoscedastic
Auto-correlation is most common in time series data
Difference between correlation and autocorrelation
Autocorrelation is a special case of correlation: it refers to
the relationship between successive values of the same variable,
while correlation may also refer to the relationship between
two or more different variables
Autocorrelation is also sometimes called serial correlation,
but some economists distinguish between the two terms.
Autocorrelation is the lag correlation of a given series with
itself, lagged by a number of time units, e.g. between
(U2, U3, ..., U11) and (U1, U2, ..., U10)
Whereas correlation between two different time series such as
(U1, U2, ..., U10) and (V1, V2, ..., V10), where U and V are two
different time series, is called serial correlation
Graphical representation of Autocorrelation
Graphs (a)-(d) show the existence of autocorrelation, while
graph (e) shows no autocorrelation
Matrix representation of Autocorrelation

Homoscedastic and autocorrelated:
E(UU′) =
| σ²   σ12  ...  σ1n |
| σ21  σ²   ...  σ2n |
| ...                |
| σn1  σn2  ...  σ²  |

Heteroscedastic and autocorrelated:
E(UU′) =
| σ1²  σ12  ...  σ1n |
| σ21  σ2²  ...  σ2n |
| ...                |
| σn1  σn2  ...  σn² |
Causes of Autocorrelation
1. Cyclical fluctuations:
Time series such as GNP, price indices, production, employment
and unemployment exhibit business cycles
2. Exclusion of variables from the regression model:
The error term captures any variable excluded from the model
Thus the error term will show a systematic change as these
excluded variables change
3. Incorrect functional form:
The error term also captures any mistake in the functional form
This will also make the error terms correlated
For example, suppose that the correct model is
Yt = α + βXt + λYt−1 + Ut (33)
but for some reason we instead estimate
Yt = α + βXt + Vt (34)
Causes of Autocorrelation
This implies that
Vt = λYt−1 + Ut (35)
Hence Vt shows a systematic change, reflecting autocorrelation
Consequences of Autocorrelation
The OLS estimators are still linear, unbiased and consistent
However, they are no longer efficient
The OLS estimates will appear to be statistically
insignificant when in fact they may not be
Detection of Autocorrelation
The most commonly used method in testing for the presence
or absence of autocorrelation is the Durbin-Watson d test
d = ∑_{t=2}^{n} (et − et−1)² / ∑_{t=1}^{n} et² (36)
Note that in the numerator of the d statistic the number of
observations is n − 1, because one observation is lost in taking
successive differences
There are certain assumptions that underlie this test
1. The regression model includes an intercept term
Detection of Autocorrelation
2. The explanatory variables, the Xs, are non-stochastic, or
fixed in repeated sampling.
3. The disturbances are generated by the first-order
autoregressive scheme
Ut = ρUt−1 + εt (37)
This is the first-order autoregressive model, AR(1).
(For contrast, Ut = ρ1 Ut−1 + ρ2 Ut−2 + εt (38)
is the second-order autoregressive model, AR(2), which is not
what the test assumes.)
4. The regression model does not include lagged values of Y,
the dependent variable, as explanatory variables, as in
Yt = α + βXt + λYt−1 + Ut (39)
5. There are no missing observations in the data
Detection of Autocorrelation
d can also be rewritten as
d* = ∑_{t=2}^{n} (et² + et−1² − 2 et et−1) / ∑_{t=1}^{n} et² (40)
For large sample sizes, ∑_{t=2}^{n} et² ≈ ∑_{t=2}^{n} et−1²
This implies that
d* ≈ 2 ∑_{t=2}^{n} et² / ∑_{t=1}^{n} et² − 2 ∑_{t=2}^{n} et et−1 / ∑_{t=1}^{n} et² (41)
d* = 2 (1 − ∑_{t=2}^{n} et et−1 / ∑_{t=1}^{n} et²) (42)
d* = 2 (1 − ρ̂) (43)
where
ρ̂ = ∑_{t=2}^{n} et et−1 / ∑_{t=1}^{n} et² (44)
−1 ≤ ρ̂ ≤ 1 ⇒ 0 ≤ d* ≤ 4 (45)
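A minimal sketch computing the Durbin-Watson statistic on simulated, hypothetical data with AR(1) errors, both directly from eq. (36) and with statsmodels' durbin_watson, together with the approximation d* ≈ 2(1 − ρ̂) of eq. (43).

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
T = 100
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # AR(1) errors with rho = 0.7
y = 1 + 2 * x + u

e = sm.OLS(y, sm.add_constant(x)).fit().resid
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)          # eq. (36)
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)      # eq. (44)
print(d, durbin_watson(e), 2 * (1 - rho_hat))          # the three values nearly coincide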
Detection of Autocorrelation
If ρ̂ = 0 ⇒ d* = 2 ⇒ no autocorrelation
If ρ̂ = 1 ⇒ d* = 0 ⇒ positive autocorrelation
If ρ̂ = −1 ⇒ d* = 4 ⇒ negative autocorrelation
The formal test compares d* with two critical values,
the lower critical value (dL) and the upper critical
value (dU)
If d* < dL or d* > (4 − dL) ⇒ we reject the null hypothesis of
no autocorrelation in favour of the alternative, which implies
the existence of autocorrelation.
If dU < d* < 4 − dU ⇒ we accept the null hypothesis of no
autocorrelation
If dL < d* < dU or 4 − dU < d* < 4 − dL ⇒ the test is inconclusive,
which means we neither accept nor reject the null
hypothesis of no autocorrelation
Graphical representation of d-test
(Figure: the d statistic on its 0-4 scale, showing the rejection
region for positive autocorrelation below dL, the indeterminate
zones between dL and dU and between 4 − dU and 4 − dL, the
acceptance region for no autocorrelation between dU and 4 − dU,
and the rejection region for negative autocorrelation above 4 − dL.)
Example on d-test
Consider the simple regression model Yt = α̂ + β̂Xt + et.
Given the following information, test whether there is
autocorrelation in the estimated model using the Durbin-Watson
d test, where x and y are in deviation form:
∑xt yt = 255; ∑xt² = 280; ∑yt² = 274; X̄ = 8; Ȳ = 7 (46)
∑_{t=2}^{n} (et − et−1)² = 60.21; ∑_{t=1}^{n} et² = 41.767; dL = 1.08; dU = 1.36 (47)
Solution:
d = ∑_{t=2}^{n} (et − et−1)² / ∑_{t=1}^{n} et² = 60.21 / 41.767 = 1.44 (48)
4 − dU = 4 − 1.36 = 2.64; 4 − dL = 4 − 1.08 = 2.92
Since dU < d* < 4 − dU, i.e. 1.36 < 1.44 < 2.64, we accept the
null hypothesis of no autocorrelation. Hence the estimated
model is not autocorrelated.
Exercise on d-test
From the table below, first estimate the coefficients, then the
residuals (et and et−1), then ∑_{t=2}^{n}(et − et−1)² / ∑_{t=1}^{n} et², and hence the d-value.
Make sure that you get the same d-value as in the example above.
t  : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Xt : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Yt : 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
(A code sketch that can be used to check the hand computation follows below.)
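As referenced above, a minimal sketch (assuming Python with numpy and statsmodels) that can be used to check the hand computation; the data are those in the table.

import numpy as np
import statsmodels.api as sm

X = np.arange(1, 16)
Y = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11])

e = sm.OLS(Y, sm.add_constant(X)).fit().resid
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)    # eq. (36)
print(round(d, 2))   # should reproduce the d-value from the example above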
Solution of Auto-correlation
Generally the solution is to apply OLS to a transformed
model. Again, this is called GLS or WLS.
The transformation depends on whether ρ in the model below is
known or not:
Yt = α + βXt + Ut (49)
Ut = ρUt−1 + vt ; ρ ≠ 0 (50)
If ρ is known, the transformation procedure is as follows.
Take the one-period lag of Yt = α + βXt + Ut and multiply
it through by ρ:
ρYt−1 = ρα + ρβXt−1 + ρUt−1 (51)
Subtract this equation from Yt = α + βXt + Ut, i.e.
(Yt − ρYt−1) = α(1 − ρ) + β(Xt − ρXt−1) + (Ut − ρUt−1) (52)
Note that (Ut − ρUt−1) = vt, and vt is not autocorrelated
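A minimal sketch of this quasi-difference (GLS) transformation for a known ρ, on simulated, hypothetical data; OLS is applied to eq. (52), and since the transformed intercept equals α(1 − ρ), α can be recovered afterwards.

import numpy as np
import statsmodels.api as sm

rho = 0.7                                   # treated as known here
rng = np.random.default_rng(7)
T = 120
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()    # AR(1) errors as in eq. (50)
y = 1 + 2 * x + u

y_star = y[1:] - rho * y[:-1]               # quasi-differenced Y
x_star = x[1:] - rho * x[:-1]               # quasi-differenced X
gls_fit = sm.OLS(y_star, sm.add_constant(x_star)).fit()
alpha_hat = gls_fit.params[0] / (1 - rho)   # recover the original intercept
print(alpha_hat, gls_fit.params[1])         # should be close to 1 and 2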
Solution of Auto-correlation
Applying OLS to the above transformed model is GLS, and the
estimated coefficients are BLUE.
When ρ is not known, we rely either on prior information or
use an estimated ρ.
1. Prior information: a researcher usually makes a
reasonable guess. The most commonly used value is ρ = 1; in this
case the transformation is called first differencing:
(Yt − Yt−1) = β(Xt − Xt−1) + (Ut − Ut−1) (53)
(the intercept drops out since α − α = 0)
2. Use an estimated ρ: for example, we can estimate ρ from
d*, the Durbin-Watson statistic, as
d* = 2(1 − ρ̂) ⇒ ρ̂ = 1 − d*/2 (54)
or estimate it from
ρ̂ = ∑_{t=2}^{n} et et−1 / ∑_{t=1}^{n} et² (55)
A sketch of this feasible GLS procedure follows below.
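As referenced above, a minimal sketch of feasible GLS with an estimated ρ on simulated, hypothetical data: ρ is estimated from the OLS residuals using eq. (55) (or from d via eq. (54)) and then used in the quasi-difference transformation.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
T = 120
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

e = sm.OLS(y, sm.add_constant(x)).fit().resid
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)   # eq. (55)
print(rho_hat, 1 - durbin_watson(e) / 2)            # the two estimates are close

# Quasi-difference with the estimated rho and re-estimate by OLS (feasible GLS).
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]
fgls_fit = sm.OLS(y_star, sm.add_constant(x_star)).fit()
print(fgls_fit.params[0] / (1 - rho_hat), fgls_fit.params[1])

statsmodels' GLSAR class automates this iteration, but the manual version above mirrors the transformation in the slides.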