
CHAPTER 9 MULTICOLLINEARITY

9.1 Introduction

• Recall from Chapter 3 that multicollinearity exists when there are near linear dependencies among the regressors.

• If there is no linear relationship between two regressors, they are said to be orthogonal. In practice, regressors are unlikely to be perfectly orthogonal.

• If multicollinearity is present, inferences based on the regression model can be misleading or erroneous.

• In Figure 3.13(a), a strong linear relationship is observed between x1 and x2, so multicollinearity is present. Fitting a regression model to these data is analogous to placing a plane through the dots. This plane is unstable and sensitive to relatively small changes in the data points. By contrast, in Figure 3.13(b), x1 and x2 appear to be orthogonal, and the plane fitted to the points is more stable.


9.2 Sources of Multicollinearity

• There are four primary sources of multicollinearity:

1. Data Collection Method

Consider the soft drink delivery time data discussed in Chapter 3. From the scatterplot matrix, we can observe that the regressors cases and distance have a fairly strong linear relationship. Observations with a small number of cases generally also have a short distance, while observations with a large number of cases usually also have a long distance. If this positive correlation is strong enough, a multicollinearity problem will occur. In this case, the problem could have been avoided simply by also collecting data with a small number of cases and a long distance (and with a large number of cases and a short distance).

2. Constraints

Suppose that an electric utility is investigating the effect of family income (x1) and house size (x2) on residential electricity consumption (y). The levels of the two regressors obtained in the sample data are shown in the scatter diagram below. Note that the data lie approximately along a straight line, indicating a potential multicollinearity problem. In this case, a physical constraint in the population has caused this phenomenon, namely, families with higher incomes generally have larger homes, while families with lower incomes usually have smaller homes. When physical constraints such as this are present, multicollinearity will exist regardless of the sampling method employed. Constraints often occur in problems involving production or chemical processes, where the regressors are the components of a product, and these components add to a constant.

3. Choice of Model

If the range of a regressor x is small, adding polynomial terms such as x² can result in significant multicollinearity, because x and x² are then nearly linearly related (see the sketch after this list).

4. Overdefined Model

An overdefined model has more regressors than observations. These models are sometimes encountered in medical and behavioral research, where there may be only a small number of subjects available and information is collected on a large number of regressors for each subject. The usual approach to dealing with multicollinearity in this context is to eliminate some of the regressors from consideration.
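To make the third source concrete, here is a small numerical sketch (Python with NumPy; the data and ranges are made up purely for illustration) showing how confining a regressor to a narrow range makes x and x² nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(0)

# Regressor confined to a narrow range (values are illustrative only).
x_narrow = rng.uniform(95, 105, size=50)
print(np.corrcoef(x_narrow, x_narrow**2)[0, 1])   # essentially 1

# The same construction over a much wider range.
x_wide = rng.uniform(0, 105, size=50)
print(np.corrcoef(x_wide, x_wide**2)[0, 1])       # still high, but noticeably smaller
```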

9.3 Effects of Multicollinearity

• The presence of multicollinearity has a number of serious effects on the regression analysis.

1. Large Var(β̂)

Recall that in matrix notation, the multiple linear regression model with k regressors and n observations is

$$y = X\beta + \epsilon$$

where

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
$$

Similar to the Unit Normal Scaling used for the Standardized Regression Coefficients in Chapter 3, we can scale the variables using the following Unit Length Scaling:

$$w_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{S_{jj}}} \quad \text{and} \quad y_i' = \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}, \qquad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, k$$

where $S_{jj} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ and $S_{yy} = SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$ are the corrected sums of squares. Under this scaling, each new regressor $w_j$ has mean $\bar{w}_j = 0$ and unit length $\sqrt{\sum_{i=1}^{n} (w_{ij} - \bar{w}_j)^2} = 1$, and the transformed model (which has no intercept) is

$$y_i' = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \epsilon_i, \qquad i = 1, 2, \ldots, n$$

or in matrix form

$$y' = Wb + \epsilon$$

A useful result from this scaling is that the W'W matrix is in the form of a correlation matrix

$$
W'W = \begin{bmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1k} \\
r_{12} & 1 & r_{23} & \cdots & r_{2k} \\
r_{13} & r_{23} & 1 & \cdots & r_{3k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{1k} & r_{2k} & r_{3k} & \cdots & 1
\end{bmatrix}
$$

where

$$r_{ij} = \frac{\sum_{u=1}^{n} (x_{ui} - \bar{x}_i)(x_{uj} - \bar{x}_j)}{\sqrt{S_{ii} S_{jj}}} = \frac{S_{ij}}{\sqrt{S_{ii} S_{jj}}}$$

is the simple correlation between regressors xi and xj.

Similarly,

$$W'y' = \begin{bmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \\ \vdots \\ r_{ky} \end{bmatrix}$$

where

$$r_{jy} = \frac{\sum_{u=1}^{n} (x_{uj} - \bar{x}_j)(y_u - \bar{y})}{\sqrt{S_{jj} S_{yy}}} = \frac{S_{jy}}{\sqrt{S_{jj} S_{yy}}}$$

is the simple correlation between regressor xj and the response y.
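This scaling is easy to verify numerically. The following sketch (Python with NumPy; the helper name unit_length_scale and the synthetic data are illustrative, not from the chapter) builds W and y' and checks that W'W and W'y' reproduce the correlation quantities just described:

```python
import numpy as np

def unit_length_scale(X, y):
    """Unit length scaling: each column of W has mean 0 and length 1."""
    Xc = X - X.mean(axis=0)
    W = Xc / np.sqrt((Xc**2).sum(axis=0))     # w_ij = (x_ij - xbar_j) / sqrt(S_jj)
    yc = y - y.mean()
    y0 = yc / np.sqrt((yc**2).sum())          # y_i' = (y_i - ybar) / sqrt(S_yy)
    return W, y0

# Made-up data in which x3 nearly duplicates x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=20)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=20)

W, y0 = unit_length_scale(X, y)
print(np.round(W.T @ W, 3))    # the correlation matrix of the regressors
print(np.round(W.T @ y0, 3))   # simple correlations r_jy between each x_j and y
```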

Suppose that there are only two regressors, x1 and x2. The model, assuming that x1, x2, and y are scaled to unit length, is

$$y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

(with no intercept). The matrix X'X is

$$X'X = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix}$$

and so the inverse is

$$C = (X'X)^{-1} = \begin{bmatrix} \dfrac{1}{1 - r_{12}^2} & \dfrac{-r_{12}}{1 - r_{12}^2} \\[2ex] \dfrac{-r_{12}}{1 - r_{12}^2} & \dfrac{1}{1 - r_{12}^2} \end{bmatrix}$$

We can then obtain the least squares estimator of β,

$$\hat{\beta} = (X'X)^{-1} X'y = \begin{bmatrix} \dfrac{1}{1 - r_{12}^2} & \dfrac{-r_{12}}{1 - r_{12}^2} \\[2ex] \dfrac{-r_{12}}{1 - r_{12}^2} & \dfrac{1}{1 - r_{12}^2} \end{bmatrix} \begin{bmatrix} r_{1y} \\ r_{2y} \end{bmatrix} = \begin{bmatrix} \dfrac{r_{1y} - r_{12} r_{2y}}{1 - r_{12}^2} \\[2ex] \dfrac{r_{2y} - r_{12} r_{1y}}{1 - r_{12}^2} \end{bmatrix}$$

and the variance of β̂,

$$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1} = \begin{bmatrix} \dfrac{\sigma^2}{1 - r_{12}^2} & \dfrac{-r_{12}\,\sigma^2}{1 - r_{12}^2} \\[2ex] \dfrac{-r_{12}\,\sigma^2}{1 - r_{12}^2} & \dfrac{\sigma^2}{1 - r_{12}^2} \end{bmatrix}$$

If multicollinearity exists, |r12 | will be large, resulting in large variances and
covariances for the least squares estimators of the regression coefficients.
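As a quick numerical illustration (the value r12 = 0.95 is chosen for illustration only): with r12 = 0.95,

$$\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{1 - 0.95^2} = \frac{\sigma^2}{0.0975} \approx 10.3\,\sigma^2,$$

more than ten times the variance σ² that would be obtained with orthogonal regressors (r12 = 0).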

When there are more than two regressors, multicollinearity produces similar effects. It can be shown that the diagonal elements of the matrix $C = (X'X)^{-1}$ are

$$C_{jj} = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k$$

where Rj² is the coefficient of multiple determination from the regression of xj on the remaining k − 1 regressors.

If multicollinearity exists between xj and any subset of the other k − 1 regressors, the value of Rj² will be close to unity, causing

$$\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{1 - R_j^2}$$

to be large. Generally, the covariances of β̂j and the other β̂i's will also be large.

2. “Large” Estimates β̂

Although β̂ is unbiased, multicollinearity tends to produce least squares estimates β̂j that are large in absolute value.

To see this, consider the squared distance from β̂ to the true parameter β,

$$L_1^2 = (\hat{\beta} - \beta)'(\hat{\beta} - \beta)$$

The expected value is

$$
E(L_1^2) = E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)]
= \sum_{j=1}^{k} E(\hat{\beta}_j - \beta_j)^2
= \sum_{j=1}^{k} \mathrm{Var}(\hat{\beta}_j)
= \sum_{j=1}^{k} \frac{\sigma^2}{1 - R_j^2}
$$

The presence of multicollinearity implies that the distance from the least
squares estimate to the true parameter may be large.

Equivalently, we can show that

$$E(L_1^2) = E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)] = E(\hat{\beta}'\hat{\beta} - 2\hat{\beta}'\beta + \beta'\beta)$$

$$\Longrightarrow \quad E(\hat{\beta}'\hat{\beta}) = \beta'\beta + \sum_{j=1}^{k} \frac{\sigma^2}{1 - R_j^2}$$

That is, the vector β̂ is generally “longer” than the vector β, implying that the method of least squares produces estimated regression coefficients that are large in absolute value.
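As a hypothetical illustration of this result, suppose there are k = 6 regressors and each has Rj² = 0.9. Then

$$E(L_1^2) = \sum_{j=1}^{6} \frac{\sigma^2}{1 - 0.9} = 60\,\sigma^2,$$

ten times the value 6σ² that would be obtained if the regressors were orthogonal (Rj² = 0 for all j).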

Although the method of least squares will generally produce poor estimates of the individual model parameters when multicollinearity is present, this does not necessarily imply that the fitted model gives poor predictions.

If predictions are confined to regions of the x space where the multicollinearity holds, the fitted model often produces satisfactory predictions. This can occur because the linear combination $\sum_{j=1}^{k} \beta_j x_{ij}$ may be estimated well, even though the individual parameters βj are estimated poorly.

9.4 Multicollinearity Diagnostics

• We will now discuss and illustrate the diagnostic measures to detect multi-
collinearity.

1. Scatterplot Matrix and Correlation Matrix

A very simple measure of multicollinearity is an inspection of the scatterplot matrix together with the correlation matrix. If regressors xi and xj are nearly linearly dependent, then their scatter diagram will show points lying approximately along a straight line and the magnitude of their simple correlation |rij| will be near unity.
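The check below is a minimal sketch of this diagnostic (Python with NumPy; the helper flag_high_correlations and the synthetic data are illustrative assumptions, and the 0.8 cutoff is just one common rule of thumb):

```python
import numpy as np

def flag_high_correlations(X, names, cutoff=0.8):
    """Report pairs of regressors whose simple correlation is large in magnitude."""
    R = np.corrcoef(X, rowvar=False)
    k = X.shape[1]
    for i in range(k):
        for j in range(i + 1, k):
            if abs(R[i, j]) > cutoff:
                print(f"{names[i]} and {names[j]}: r = {R[i, j]:.3f}")
    return R

# Made-up data with a strong negative relationship between x1 and x3.
rng = np.random.default_rng(2)
x1 = rng.normal(size=16)
x2 = rng.normal(size=16)
x3 = -x1 + 0.1 * rng.normal(size=16)
flag_high_correlations(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```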

Example: The Acetylene Data


Table 9.1 presents data concerning the percentage of conversion of n-heptane
to acetylene and three regressors.


These are typical chemical process data for which a full quadratic response surface in all three regressors is considered to be an appropriate initial model:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2 + \epsilon
$$

where

y = percentage of conversion of n-heptane to acetylene

$$x_1 = \frac{\text{reactor temperature} - 1212.50}{80.623}, \qquad
x_2 = \frac{\text{ratio of H}_2 \text{ to n-heptane} - 12.44}{5.662}, \qquad
x_3 = \frac{\text{contact time} - 0.0403}{0.03164}$$

(the usual standardization of each regressor).

Use the following scatterplot matrix and correlation matrix to discuss the problem of multicollinearity in this question.

[Scatterplot matrix and correlation matrix for the acetylene data. A common rule of thumb flags |r| > 0.8; here cor(x1, x3) = −0.958, and several other correlations involving x1, x3, and their cross-product and squared terms have magnitudes between roughly 0.87 and 0.975.]

Observing the correlation matrix, we can see that the high correlations are mainly due to the negative linear relationship between reactor temperature (x1) and contact time (x3). A problem of multicollinearity is present.

Example: The Webster, Gunst, and Mason Data


Webster, Gunst, and Mason generated the following data, having

$$\sum_{j=1}^{4} x_{1j} = 11 \quad \text{for observation 1}$$

and

$$\sum_{j=1}^{4} x_{ij} = 10 \quad \text{for observations 2 to 12.}$$

x5 and x6 were obtained from a table of normal random numbers. The responses yi were generated by the relationship

$$y = 10 + 2x_1 + x_2 + 0.2x_3 - 2x_4 + 3x_5 + 10x_6 + \epsilon$$

where the errors ε are iid N(0, 1).

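The construction can be mimicked in code. The sketch below (Python with NumPy; the Dirichlet device for producing rows with a fixed sum is my own choice, and the x-values differ from the published table) reproduces the structure described above, a near-constant row sum for x1 through x4:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12

# x1..x4: each row scaled so the four values sum to 10 (11 for observation 1),
# mimicking the structure described above; the actual table values differ.
P = rng.dirichlet(np.ones(4), size=n)          # rows of proportions summing to 1
row_sums = np.full(n, 10.0)
row_sums[0] = 11.0
X14 = P * row_sums[:, None]

# x5 and x6 from independent standard normal draws.
x5 = rng.normal(size=n)
x6 = rng.normal(size=n)
X = np.column_stack([X14, x5, x6])

# Response generated from the stated relationship with N(0, 1) errors.
beta = np.array([2.0, 1.0, 0.2, -2.0, 3.0, 10.0])
y = 10.0 + X @ beta + rng.normal(size=n)

# Because x1 + x2 + x3 + x4 is (almost) constant across rows, the centered
# columns x1..x4 are close to linearly dependent.
print(np.round(np.corrcoef(X, rowvar=False), 2))
```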

In this example, more than two regressors, namely x1, x2, x3, and x4, are involved in a near linear dependence. Clearly, multicollinearity is present.

Observe the given correlation matrix and scatterplot matrix and comment on using them to detect multicollinearity.

None of the pairwise correlations are suspiciously large, and consequently we have no indication of the near linear dependence among the regressors. This illustrates a limitation of the first method: the scatterplot matrix and correlation matrix are only helpful in detecting multicollinearity between pairs of regressors; they cannot reveal a near linear dependence involving three or more regressors. Hence, a further check is required.

2. Variance Inflation Factors

Recall that the matrix $C = (X'X)^{-1}$, where X is the matrix of regressors after the unit length scaling, is closely related to multicollinearity. We define the variance inflation factor, VIF, as

$$\mathrm{VIF}_j = C_{jj} = \frac{1}{1 - R_j^2}$$

where Cjj is the jth diagonal element of C and Rj² is the coefficient of multiple determination from the regression of xj on the remaining k − 1 regressors.

If xj is nearly orthogonal to the remaining regressors, Rj² is small and Cjj is close to 1; if xj is nearly linearly dependent on some subset of the remaining regressors, Rj² is near 1 and Cjj is large. Since Var(β̂j) = Cjj σ², we can view Cjj as the factor by which the variance is increased.

One or more large VIFs indicate multicollinearity. Practical experience suggests that VIF > 5 is large. As xj becomes orthogonal to the other regressors, Rj² → 0 and VIFj → 1; as xj becomes linearly dependent on them, Rj² → 1 and VIFj → ∞.
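A direct way to compute VIFs is to carry out each auxiliary regression of xj on the remaining regressors. The following is a minimal sketch (Python with NumPy; the function name vifs and the test data are illustrative):

```python
import numpy as np

def vifs(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + others
        coef, *_ = np.linalg.lstsq(A, xj, rcond=None)
        resid = xj - A @ coef
        r2 = 1.0 - (resid @ resid) / ((xj - xj.mean()) @ (xj - xj.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

# Quick check on made-up data in which x3 nearly duplicates x1.
rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
x3 = x1 + 0.05 * rng.normal(size=30)
X = np.column_stack([x1, x2, x3])
print(np.round(vifs(X), 1))
# Equivalently, the VIFs are the diagonal of the inverse correlation matrix:
print(np.round(np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False))), 1))
```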

Example: The Acetylene Data

Check if there exists a problem of multicollinearity using the given VIFs.

The largest VIF is 6563.345193, which is far larger than the typical cutoff of 5, indicating that multicollinearity is present. The VIFs also identify which regressors are involved: the large values correspond to x1, x3, and the terms formed from them (x1x3, x1², x3²).

Example: The Webster, Gunst, and Mason Data

Use the following VIFs to detect multicollinearity.

The largest VIF is 297.714658, which is larger than the typical value of 5. This suggests that multicollinearity is present. The regressors involved are x1, x2, x3, and x4.

3. Eigensystem Analysis of X'X

The eigenvalues of X'X, λ1, λ2, ..., λk, can be used to measure the extent of multicollinearity in the data, where X is the matrix of regressors after the unit length scaling. One or more small eigenvalues imply the existence of multicollinearity.

Some analysts prefer to examine the condition number of X'X, defined as

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$

which is a measure of the spread in the eigenvalue spectrum of X'X. Generally,

κ < 100: no serious problem of multicollinearity
100 ≤ κ ≤ 1000: moderate to strong multicollinearity
κ > 1000: severe multicollinearity

We can also compute the condition indices of X'X,

$$\kappa_j = \frac{\lambda_{\max}}{\lambda_j}, \qquad j = 1, 2, \ldots, k$$

The number of condition indices that are large (generally > 1000) is a useful measure of the number of near linear dependencies in X'X.
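A sketch of these computations (Python with NumPy; the function and data are illustrative, and np.corrcoef is used because X'X under unit length scaling is exactly the correlation matrix of the regressors):

```python
import numpy as np

def condition_diagnostics(X):
    """Eigenvalues, condition number, and condition indices of X'X after unit
    length scaling, i.e. of the regressor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    kappa = eigvals.max() / eigvals.min()      # condition number
    cond_indices = eigvals.max() / eigvals     # condition indices kappa_j
    return eigvals, eigvecs, kappa, cond_indices

# Made-up data in which x3 nearly duplicates x1.
rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
x3 = x1 + 0.05 * rng.normal(size=30)
vals, vecs, kappa, ci = condition_diagnostics(np.column_stack([x1, x2, x3]))
print(round(kappa, 1), np.round(ci, 1))
# The eigenvector paired with the smallest eigenvalue points to the dependence:
# its large components here are on x1 and x3.
print(np.round(vecs[:, 0], 2))
```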

Example: The Acetylene Data

Check if there exists a problem of multicollinearity using the given eigenvalues.

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{4.205230}{9.693644 \times 10^{-5}} = 43381.31 > 1000$$

implying that severe multicollinearity is present.

Since only one condition index is greater than 1000, there is only one near linear dependence relationship among the regressors.

Example: The Webster, Gunst, and Mason Data

Use the following eigenvalues to detect multicollinearity.

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{2.4287873}{0.001106051} = 2195.908 > 1000$$

implying that multicollinearity is present.

Since only one condition index is greater than 1000, we conclude that there is only one near linear dependence relationship among the regressors.

We can further use the eigenvectors to identify the nature of the near linear
dependencies in the data.


We concluded that there is one near linear dependence in the data. The smallest eigenvalue is λ6 = 0.0011 and the corresponding eigenvector is

$$\begin{bmatrix} 0.44768 \\ 0.42114 \\ 0.54169 \\ 0.57337 \\ 0.00605 \\ 0.00217 \end{bmatrix}$$

with components corresponding to x1, x2, ..., x6. This implies the relationship

$$0.44768 x_1 + 0.42114 x_2 + 0.54169 x_3 + 0.57337 x_4 + 0.00605 x_5 + 0.00217 x_6 \approx 0$$

Since the coefficients of x5 and x6 are negligibly small, this is approximately

$$x_1 + 0.941 x_2 + 1.210 x_3 + 1.281 x_4 \approx 0$$



9.5 Methods for Dealing with Multicollinearity

• There are a few approaches to combat the problems induced by multicollinearity.

1. Collecting Additional Data


Collecting additional data has been suggested as the best method of combat-
ing multicollinearity. The additional data should be collected in a manner
designed to break up the multicollinearity in the existing data.

Consider the delivery time data. The regressors cases and distance have a
strong positive linear relationship, indicating a problem of multicollinearity.
This could be avoided by collecting some additional data at points designed
to break up the relationship, that is, at points where cases are small and
distance is large and at points where cases are large and distance is small.

However, there are situations in which this approach is not applicable:

• The multicollinearity is due to constraints on the model or in the population.
• Economic constraints.
• The process being studied is no longer available.

2. Model Respecification

If two highly correlated regressors are used in a regression model, we can choose to respecify the model by dropping one of the regressors. This is often a highly effective technique. However, it may not provide a satisfactory solution if the regressors dropped from the model have significant explanatory power relative to the response. That is, eliminating regressors to solve the multicollinearity problem may damage the predictive power of the model.

If we have x1, x2, and x3 being nearly linearly dependent, another possibility is to find some function such as x = (x1 + x2)/x3 or x = x1x2x3 that preserves the information content in the original regressors but solves the problem of multicollinearity.

3. Ridge Regression

When multicollinearity exists, the least squares estimator β̂ has large variance, though it is unbiased. A biased estimator with small variance may perform “better” than an unbiased estimator with large variance.

A number of procedures have been developed for obtaining biased estimators of regression coefficients. One of these procedures is ridge regression. The ridge estimator β̂_R is found by solving a modified version of the normal equations

$$(X'X + kI)\,\hat{\beta}_R = X'y \quad \Longrightarrow \quad \hat{\beta}_R = (X'X + kI)^{-1} X'y$$

where the matrix X and vector y are the ones after the unit length scaling, and k ≥ 0 (the biasing parameter) is a constant selected by the analyst. Note that when k = 0, the ridge estimator reduces to the least squares estimator.


For a biased estimator, we look at the mean squared error (MSE) in place of the variance. It can be shown that

$$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = \mathrm{Var}(\hat{\theta}) + \bigl[E(\hat{\theta}) - \theta\bigr]^2 = \text{Variance} + (\text{Bias})^2$$

For the ridge estimator β̂_R,

$$E(\hat{\beta}_R) = (X'X + kI)^{-1} X'X\,\beta$$

$$\mathrm{Var}(\hat{\beta}_R) = \sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}$$

As k increases, the squared bias increases but the variance decreases. In using ridge regression, we would like to choose a value of k such that the reduction in the variance is greater than the increase in the squared bias.

We determine the value of k by inspecting the ridge trace, which is a plot of β̂_R versus k, where k usually ranges between 0 and 1. If multicollinearity is severe, the instability of β̂_R will show up in the ridge trace, and as k increases, β̂_R will stabilize. The objective is to select a reasonably small k at which the ridge estimates are stable.
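A minimal sketch of the ridge calculation and the ridge trace (Python with NumPy; the data are made up and ridge_path is an illustrative helper, not code from the chapter):

```python
import numpy as np

def ridge_path(W, y0, ks):
    """Ridge estimates b_R(k) = (W'W + k I)^(-1) W'y' over a grid of k values,
    with W and y' already in unit length form."""
    WtW, Wty, p = W.T @ W, W.T @ y0, W.shape[1]
    return np.array([np.linalg.solve(WtW + k * np.eye(p), Wty) for k in ks])

# Made-up collinear data, scaled to unit length as in Section 9.3.
rng = np.random.default_rng(6)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=30)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=30)
Xc, yc = X - X.mean(axis=0), y - y.mean()
W = Xc / np.sqrt((Xc**2).sum(axis=0))
y0 = yc / np.sqrt((yc**2).sum())

ks = np.linspace(0.0, 1.0, 101)
coefs = ridge_path(W, y0, ks)   # row 0 (k = 0) is the least squares solution
# Plotting each column of coefs against ks gives the ridge trace; pick the
# smallest k at which the estimates have stabilized.
```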

Example: The Acetylene Data


Use ridge regression to fit the model.

Least squares estimates (unit length scaled):

$$\hat{y}' = 0.33648681 w_1 + 0.23349593 w_2 - 0.67589625 w_3 - 0.47995686 w_4 - 2.03395608 w_5 - 0.26571830 w_6 - 0.83454188 w_7 - 0.09035419 w_8 - 1.00085767 w_9$$

Ridge regression estimates:

$$\hat{y}' = 0.5031290 w_1 + 0.2042090 w_2 - 0.3785296 w_3 - 0.1728259 w_4 - 0.05463787 w_5 + 0.06781794 w_6 + 0.1250744 w_7 - 0.04605682 w_8 - 0.02776430 w_9$$

Note that the negative estimates of b6 and b7 under least squares are now positive under ridge regression.

The estimates of b5, b6, and b9 are considerably smaller in magnitude under ridge regression than the least squares estimates.

If the b̂j are the estimators of the standardized regression coefficients (for both Unit Normal Scaling and Unit Length Scaling), then the estimators of the unstandardized regression coefficients are

$$\hat{\beta}_j = \hat{b}_j \sqrt{\frac{S_{yy}}{S_{jj}}}, \quad j = 1, 2, \ldots, k, \qquad \hat{\beta}_0 = \bar{y} - \sum_{j=1}^{k} \hat{\beta}_j \bar{x}_j$$

For the acetylene data, transforming the ridge estimates back to the original scale gives

$$
\hat{y} = 35.16593 + 5.986614 x_1 + 2.429835 x_2 - 4.504035 x_3 - 2.324997 x_1 x_2 - 0.7247702 x_1 x_3 + 0.9617223 x_2 x_3 + 1.876934 x_1^2 - 0.4958324 x_2^2 - 0.3216017 x_3^2
$$
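The back-transformation is mechanical, so a short sketch may help (Python with NumPy; unstandardize is an illustrative helper and the call at the end is hypothetical):

```python
import numpy as np

def unstandardize(b_hat, X, y):
    """Convert unit-length-scaled coefficients b_hat back to the original scale:
    beta_j = b_j * sqrt(S_yy / S_jj),  beta_0 = ybar - sum_j beta_j * xbar_j."""
    Xc = X - X.mean(axis=0)
    Sjj = (Xc**2).sum(axis=0)
    Syy = ((y - y.mean())**2).sum()
    beta = b_hat * np.sqrt(Syy / Sjj)
    beta0 = y.mean() - beta @ X.mean(axis=0)
    return beta0, beta

# Hypothetical usage: b_ridge would hold the ridge estimates in unit length
# form, and X, y the regressor matrix (including the cross-product and squared
# columns) and the response on their original scales.
# beta0, beta = unstandardize(b_ridge, X, y)
```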

As k increases, SS_Res increases, therefore MS_Res increases, but R² decreases:

$$SS_{Res} = \bigl(y - X\hat{\beta}\bigr)'\bigl(y - X\hat{\beta}\bigr) + \bigl(\hat{\beta}_R - \hat{\beta}\bigr)' X'X \bigl(\hat{\beta}_R - \hat{\beta}\bigr)$$

k        MS_Res                       R²
0        0.01811² = 0.0003279721      0.9977
0.02     0.001240819                  0.9913143
Chapter 9 Problems 1, 6, 7, 8, 10, 13, 14, 19, 23(a)(b)(c).
