Chap 9 Multicollinearity
9.1 Introduction
• Recall from Chapter 3 that multicollinearity exists when there are near linear dependencies among the regressors.
2. Constraints
Suppose that an electric utility is investigating the effect of family income ($x_1$) and house size ($x_2$) on residential electricity consumption ($y$). The levels of the two regressors obtained in the sample data are shown in the scatter diagram below. Note that the data lie approximately along a straight line, indicating a potential multicollinearity problem. In this case, a physical constraint in the population has caused this phenomenon, namely, families with higher incomes generally have larger homes, while families with lower incomes usually have smaller homes. When physical constraints such as this are present, multicollinearity will exist regardless of the sampling method employed. Constraints often occur in problems involving production or chemical processes, where the regressors are the components of a product, and these components add to a constant.
3. Choice of Model
If the range of a regressor $x$ is small, adding polynomial terms like $x^2$ can result in significant multicollinearity.
4. Overdefined Model
An overdefined model has more regressors than observations. These models are sometimes encountered in medical and behavioral research, where there may be only a small number of subjects available, and information is collected on a large number of regressors for each subject. The usual approach to dealing with multicollinearity in this context is to eliminate some of the regressors from consideration.
1. Large Var($\hat{\boldsymbol{\beta}}$)
Recall that in matrix notation, the multiple linear regression model with $k$ regressors and $n$ observations is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
Similar to the Unit Normal Scaling under the Standardized Regression Coefficients in Chapter 3, we can scale the variables using the following Unit Length Scaling
$$w_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{S_{jj}}} \quad \text{and} \quad y_i^0 = \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, k$$
where $S_{jj} = \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$ and $S_{yy} = SS_T = \sum_{i=1}^n (y_i - \bar{y})^2$ are the corrected sums of squares. Under this scaling, each new regressor $w_j$ has mean $\bar{w}_j = 0$ with length $\sqrt{\sum_{i=1}^n (w_{ij} - \bar{w}_j)^2} = 1$, and the transformed model is
$$y_i^0 = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \epsilon_i, \qquad i = 1, 2, \ldots, n$$
or in matrix form
$$\mathbf{y}^0 = \mathbf{W}\mathbf{b} + \boldsymbol{\epsilon}$$
A useful result from this scaling is that the $\mathbf{W}'\mathbf{W}$ matrix is in the form of a correlation matrix
$$\mathbf{W}'\mathbf{W} = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{12} & 1 & r_{23} & \cdots & r_{2k} \\ r_{13} & r_{23} & 1 & \cdots & r_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{1k} & r_{2k} & r_{3k} & \cdots & 1 \end{bmatrix}$$
where
$$r_{ij} = \frac{\sum_{u=1}^n (x_{ui} - \bar{x}_i)(x_{uj} - \bar{x}_j)}{\sqrt{S_{ii} S_{jj}}} = \frac{S_{ij}}{\sqrt{S_{ii} S_{jj}}}$$
is the simple correlation between regressors $x_i$ and $x_j$.
Similarly,
$$\mathbf{W}'\mathbf{y}^0 = \begin{bmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \\ \vdots \\ r_{ky} \end{bmatrix}$$
where
$$r_{jy} = \frac{\sum_{u=1}^n (x_{uj} - \bar{x}_j)(y_u - \bar{y})}{\sqrt{S_{jj} S_{yy}}} = \frac{S_{jy}}{\sqrt{S_{jj} S_{yy}}}$$
is the simple correlation between regressor $x_j$ and the response $y$.
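A minimal R sketch of the unit length scaling, assuming a hypothetical data frame dat with response y and regressors x1, x2, x3 (all names are illustrative only):

    # centre each variable and divide by the square root of its corrected sum of squares
    unit_length <- function(v) (v - mean(v)) / sqrt(sum((v - mean(v))^2))

    W  <- apply(as.matrix(dat[, c("x1", "x2", "x3")]), 2, unit_length)  # scaled regressors
    y0 <- unit_length(dat$y)                                            # scaled response

    t(W) %*% W    # reproduces cor(dat[, c("x1", "x2", "x3")]), the regressor correlation matrix
    t(W) %*% y0   # the simple correlations r_jy between each regressor and y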
Suppose that there are only two regressors, $x_1$ and $x_2$. The model, assuming that $x_1$, $x_2$ and $y$ are scaled to unit length, is
$$y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
(note that there is no intercept, since the variables are centered). The matrix $\mathbf{X}'\mathbf{X}$ is
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix}$$
with inverse
$$\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{1}{1 - r_{12}^2} & \dfrac{-r_{12}}{1 - r_{12}^2} \\ \dfrac{-r_{12}}{1 - r_{12}^2} & \dfrac{1}{1 - r_{12}^2} \end{bmatrix}$$
so the least squares estimators are
$$\hat{\beta}_1 = \frac{r_{1y} - r_{12}\, r_{2y}}{1 - r_{12}^2}, \qquad \hat{\beta}_2 = \frac{r_{2y} - r_{12}\, r_{1y}}{1 - r_{12}^2}$$
If $x_1$ and $x_2$ are strongly correlated, $|r_{12}|$ is close to 1 and $\mathrm{Var}(\hat{\beta}_j) = C_{jj}\sigma^2 = \sigma^2/(1 - r_{12}^2)$ becomes very large.
When there are more than two regressors, multicollinearity produces similar effects. It can be shown that the diagonal elements of the matrix $\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1}$ are
$$C_{jj} = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k$$
where $R_j^2$ is the coefficient of multiple determination from the regression of $x_j$ on the remaining $k - 1$ regressors.
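A small R sketch, continuing with the hypothetical dat above, that checks this identity numerically for $x_1$:

    C <- solve(cor(dat[, c("x1", "x2", "x3")]))               # (X'X)^{-1} under unit length scaling
    R2_1 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared   # R_1^2 from regressing x1 on the others
    c(C[1, 1], 1 / (1 - R2_1))                                # the two numbers agree (up to rounding)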
To see this, consider the squared distance from $\hat{\boldsymbol{\beta}}$ to the true parameter $\boldsymbol{\beta}$,
$$L_1^2 = (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$$
Then
$$E(L_1^2) = E[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})] = \sum_{j=1}^k E(\hat{\beta}_j - \beta_j)^2 = \sum_{j=1}^k \mathrm{Var}(\hat{\beta}_j) = \sum_{j=1}^k \frac{\sigma^2}{1 - R_j^2}$$
The presence of multicollinearity implies that the distance from the least squares estimate to the true parameter may be large.
$$\Longrightarrow \quad E(\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}'\boldsymbol{\beta} + \sum_{j=1}^k \frac{\sigma^2}{1 - R_j^2}$$
That is, the vector $\hat{\boldsymbol{\beta}}$ is generally "longer" than the vector $\boldsymbol{\beta}$, implying that the method of least squares produces estimated regression coefficients that are large in absolute value.
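As an illustration with hypothetical numbers: if there are $k = 4$ unit-length-scaled regressors, each with $R_j^2 = 0.95$, then
$$E(L_1^2) = \sum_{j=1}^{4} \frac{\sigma^2}{1 - 0.95} = 4 \times 20\,\sigma^2 = 80\,\sigma^2$$
compared with $4\sigma^2$ when the regressors are orthogonal ($R_j^2 = 0$), so the least squares estimates are, on average, much farther from the true coefficients.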
Although the method of least squares will generally produce poor estimates
of the individual model parameters when multicollinearity is present, this
does not necessarily imply that the fitted model gives poor predictions.
• We will now discuss and illustrate the diagnostic measures to detect multicollinearity.
These are typical chemical process data for which a full quadratic response surface (polynomial model) in all three regressors is considered to be an appropriate initial model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2 + \epsilon$$
where the regressors are standardized in the usual way (centered and scaled):
$$x_1 = \frac{\text{reactor temperature} - 1212.50}{80.623}, \qquad x_2 = \frac{\text{ratio of H}_2 \text{ to } n\text{-heptane} - 12.44}{5.662}, \qquad x_3 = \frac{\text{contact time} - 0.0403}{0.03164}$$
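A sketch of fitting this model in R, assuming a hypothetical data frame acet that holds the response y and the standardized regressors x1, x2, x3:

    # full quadratic response surface: linear, interaction, and squared terms
    fit <- lm(y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + I(x1^2) + I(x2^2) + I(x3^2), data = acet)
    summary(fit)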
Use the following scatterplot matrix and correlation matrix to discuss the problem of multicollinearity in this question.
[Scatterplot matrix and correlation matrix of the regressors $x_1, x_2, x_3$ and the derived terms $x_1 x_2, x_1 x_3, x_2 x_3, x_1^2, x_2^2, x_3^2$.]
Rule of thumb: $|r| > 0.8$ indicates a potential multicollinearity problem. Here, for example, $\mathrm{cor}(x_1, x_3) = -0.958$, and several correlations involving the cross-product and squared terms are also large in magnitude (e.g., $-0.972$, $-0.975$, $0.874$).
Observing the correlation matrix, we can see that the high correlations are mainly due to the negative linear relationship between reactor temperature ($x_1$) and contact time ($x_3$). A multicollinearity problem is present.
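A sketch of how these diagnostics could be produced in R, again using the hypothetical acet data frame:

    pairs(acet[, c("x1", "x2", "x3")])            # scatterplot matrix of the three basic regressors
    round(cor(acet[, c("x1", "x2", "x3")]), 3)    # flag any pairwise |r| > 0.8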
and
$$\sum_{j=1}^{4} x_{ij} = 10 \quad \text{for observations 2 to 12,}$$
so $x_1, x_2, x_3, x_4$ are nearly linearly dependent. The regressors $x_5$ and $x_6$ were obtained from a table of normal random numbers. The responses $y_i$ were generated from a specified relationship among the regressors. Note that the pairwise correlations $r_{ij}$ cannot reveal a near linear dependence involving three or more regressors, so the variance inflation factors are more useful diagnostics here. The $R_j^2$ for a particular regressor is obtained by regressing it on the other regressors, for example

    Model4 <- lm(x4 ~ x1 + x2 + x3 + x5)
    summary(Model4)
As $x_j$ becomes more nearly linearly dependent on the remaining regressors, $R_j^2 \to 1$ and $\mathrm{VIF}_j = 1/(1 - R_j^2) \to \infty$. We can also use the VIFs to identify which regressors are involved in the multicollinearity: the regressors with very large VIFs are the ones taking part in the near linear dependence.
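A minimal R sketch computing all the VIFs at once, assuming a hypothetical data frame dat2 whose columns x1, ..., x6 are the six regressors of this example:

    X <- as.matrix(dat2[, paste0("x", 1:6)])
    vif <- diag(solve(cor(X)))   # VIF_j = 1/(1 - R_j^2) = C_jj under unit length scaling
    round(vif, 1)                # regressors with very large VIFs take part in the near dependence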
3. Eigensystem Analysis of X'X
The eigenvalues of $\mathbf{X}'\mathbf{X}$, $\lambda_1, \lambda_2, \ldots, \lambda_k$, can be used to measure the extent of multicollinearity in the data, where $\mathbf{X}$ is the matrix of regressors after the unit length scaling. One or more small eigenvalues imply the existence of multicollinearity. The condition number of $\mathbf{X}'\mathbf{X}$ is
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$
and the condition indices are
$$\kappa_j = \frac{\lambda_{\max}}{\lambda_j}, \qquad j = 1, 2, \ldots, k$$
(these formulas are not provided in the exam and need to be remembered). The number of condition indices that are large (generally $> 1000$) is a useful measure of the number of near linear dependencies in $\mathbf{X}'\mathbf{X}$.
For example,
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{4.205230}{9.693644 \times 10^{-5}} = 43381.31 \geq 1000$$
for one of the data sets above, and
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{2.4287873}{0.001106051} = 2195.908 > 1000$$
for the other, so severe multicollinearity is indicated in both cases.
We can further use the eigenvectors to identify the nature of the near linear dependencies in the data. For the data set with six regressors, we conclude that there is one near linear dependence. The smallest eigenvalue is $\lambda_6 = 0.0011$ and the corresponding eigenvector is
$$\begin{bmatrix} 0.44768 \\ 0.42114 \\ 0.54169 \\ 0.57337 \\ 0.00605 \\ 0.00217 \end{bmatrix}$$
where the elements correspond to $x_1, x_2, \ldots, x_6$, respectively. The elements for $x_1, x_2, x_3, x_4$ are large while those for $x_5$ and $x_6$ are near zero, so the near linear dependence involves $x_1, x_2, x_3, x_4$ (consistent with the constraint $x_1 + x_2 + x_3 + x_4 \approx 10$).
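A sketch of this eigensystem analysis in R, reusing the hypothetical regressor matrix X from the VIF sketch above:

    ev <- eigen(cor(X))                          # eigen-decomposition of the unit-length-scaled X'X
    lambda <- ev$values
    max(lambda) / min(lambda)                    # condition number kappa
    max(lambda) / lambda                         # condition indices kappa_j
    round(ev$vectors[, which.min(lambda)], 5)    # eigenvector for the smallest eigenvalue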
Consider the delivery time data. The regressors cases and distance have a
strong positive linear relationship, indicating a problem of multicollinearity.
This could be avoided by collecting some additional data at points designed
to break up the relationship, that is, at points where cases are small and
distance is large and at points where cases are large and distance is small.
2. Model Respecification
If two highly correlated regressors are used in a regression model, we can choose to respecify the model by dropping one of the regressors. This is often a highly effective technique. However, it may not provide a satisfactory solution if the regressors dropped from the model have significant explanatory power relative to the response. That is, eliminating regressors to solve the multicollinearity problem may damage the predictive power of the model.
3. Ridge Regression
With the existence of multicollinearity, the least squares estimator $\hat{\boldsymbol{\beta}}$ has large variance, though it is unbiased. A biased estimator with small variance may perform "better" than an unbiased estimator with large variance.
$$\Longrightarrow \quad \hat{\boldsymbol{\beta}}_R = (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{y}$$
where the matrix $\mathbf{X}$ and the vector $\mathbf{y}$ are the ones after the unit length scaling, and $\lambda \geq 0$ is a constant selected by the analyst. Note that when $\lambda = 0$, the ridge estimator reduces to the least squares estimator.
To compare a biased estimator with an unbiased one, suppose $\hat{\theta}_1$ is unbiased and $\hat{\theta}_2$ is biased with $\mathrm{Var}(\hat{\theta}_2) < \mathrm{Var}(\hat{\theta}_1)$. Comparing the mean squared errors, $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$: if $\mathrm{MSE}(\hat{\theta}_2) < \mathrm{MSE}(\hat{\theta}_1)$, then $\hat{\theta}_2$ is the better estimator; otherwise $\hat{\theta}_1$ is better.
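For illustration with hypothetical numbers: if $\mathrm{Var}(\hat{\theta}_1) = 10$ with zero bias, and $\mathrm{Var}(\hat{\theta}_2) = 2$ with $[\mathrm{Bias}(\hat{\theta}_2)]^2 = 3$, then
$$\mathrm{MSE}(\hat{\theta}_1) = 10 + 0 = 10, \qquad \mathrm{MSE}(\hat{\theta}_2) = 2 + 3 = 5 < 10,$$
so $\hat{\theta}_2$ is preferred even though it is biased.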
For a biased estimator, we look at the mean squared error (MSE) in place of the variance. It can be shown that
$$E(\hat{\boldsymbol{\beta}}_R) = (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{X}\,\boldsymbol{\beta}$$
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}_R) = \sigma^2 (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}$$
As $\lambda$ increases, the squared bias increases but the variance decreases. In using ridge regression we would like to choose a value of $\lambda$ such that the reduction in the variance is greater than the increase in the squared bias.
Note that the negative least squares estimates of $b_6$ and $b_7$ become positive under ridge regression.
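A minimal R sketch of the ridge estimator on the unit-length-scaled data, reusing the hypothetical W and y0 from the earlier scaling sketch (the lambda values are illustrative):

    ridge_beta <- function(W, y0, lambda) {
      solve(t(W) %*% W + lambda * diag(ncol(W)), t(W) %*% y0)   # (X'X + lambda I)^{-1} X'y
    }
    ridge_beta(W, y0, lambda = 0)      # lambda = 0 reproduces the least squares estimates
    ridge_beta(W, y0, lambda = 0.02)   # a small amount of shrinkage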
If $\hat{b}_j$ are the estimators of the standardized regression coefficients (for both Unit Normal Scaling and Unit Length Scaling), then the estimators of the unstandardized regression coefficients are
$$\hat{\beta}_j = \hat{b}_j \sqrt{\frac{S_{yy}}{S_{jj}}}, \qquad j = 1, 2, \ldots, k, \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \sum_{j=1}^{k} \hat{\beta}_j \bar{x}_j$$
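A sketch of this back-transformation in R, assuming hypothetical objects: b_hat (the standardized coefficients) and the original data frame dat with response y and regressors named as below:

    xnames <- c("x1", "x2", "x3")                                   # illustrative regressor names
    Sjj <- apply(dat[, xnames], 2, function(v) sum((v - mean(v))^2))
    Syy <- sum((dat$y - mean(dat$y))^2)
    beta_hat  <- b_hat * sqrt(Syy / Sjj)                            # unstandardized slopes
    beta0_hat <- mean(dat$y) - sum(beta_hat * colMeans(dat[, xnames]))   # intercept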