Chap 9 Multicollinearity
9.1 Introduction
• Recall from Chapter 3 that multicollinearity exists when there are near linear dependencies among the regressors.
2. Constraints
Suppose that an electric utility is investigating the effect of family income ($x_1$) and house size ($x_2$) on residential electricity consumption ($y$). The levels of the two regressors obtained in the sample data are shown in the scatter diagram below. Note that the data lie approximately along a straight line, indicating a potential multicollinearity problem. In this case, a physical constraint in the population has caused this phenomenon, namely, families with higher incomes generally have larger homes, while families with lower incomes usually have smaller homes. When physical constraints such as this are present, multicollinearity will exist regardless of the sampling method employed. Constraints often occur in problems involving production or chemical processes, where the regressors are the components of a product, and these components add to a constant.
3. Choice of Model
If the range of a regressor $x$ is small, adding polynomial terms like $x^2$ can result in significant multicollinearity.
4. Overdefined Model
An overdefined model has more regressors than observations. These models are sometimes encountered in medical and behavioral research, where there may be only a small number of subjects available, and information is collected on a large number of regressors for each subject. The usual approach to dealing with multicollinearity in this context is to eliminate some of the regressors from consideration.
1. Large Var($\hat{\boldsymbol{\beta}}$)
Recall that in matrix notation, the multiple linear regression model with $k$ regressors and $n$ observations is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
Similar to the Unit Normal Scaling under the Standardized Regression Coefficients in Chapter 3, we can scale the variables using the following Unit Length Scaling
$$w_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{S_{jj}}} \quad \text{and} \quad y_i^0 = \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, k$$
where $S_{jj} = \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$ and $S_{yy} = SS_T = \sum_{i=1}^n (y_i - \bar{y})^2$ are the corrected sums of squares. Under this scaling, each new regressor $w_j$ has mean $\bar{w}_j = 0$ with length $\sqrt{\sum_{i=1}^n (w_{ij} - \bar{w}_j)^2} = 1$, and the transformed model is
$$y_i^0 = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \epsilon_i, \qquad i = 1, 2, \ldots, n$$
or in matrix form
$$\mathbf{y}^0 = \mathbf{W}\mathbf{b} + \boldsymbol{\epsilon}$$
A useful result from this scaling is that the $\mathbf{W}'\mathbf{W}$ matrix is in the form of a correlation matrix
$$\mathbf{W}'\mathbf{W} = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{12} & 1 & r_{23} & \cdots & r_{2k} \\ r_{13} & r_{23} & 1 & \cdots & r_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{1k} & r_{2k} & r_{3k} & \cdots & 1 \end{bmatrix}$$
where
$$r_{ij} = \frac{\sum_{u=1}^n (x_{ui} - \bar{x}_i)(x_{uj} - \bar{x}_j)}{\sqrt{S_{ii} S_{jj}}} = \frac{S_{ij}}{\sqrt{S_{ii} S_{jj}}}$$
is the simple correlation between regressors $x_i$ and $x_j$.
Similarly,
$$\mathbf{W}'\mathbf{y}^0 = \begin{bmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \\ \vdots \\ r_{ky} \end{bmatrix}$$
where
$$r_{jy} = \frac{\sum_{u=1}^n (x_{uj} - \bar{x}_j)(y_u - \bar{y})}{\sqrt{S_{jj} S_{yy}}} = \frac{S_{jy}}{\sqrt{S_{jj} S_{yy}}}$$
is the simple correlation between regressor $x_j$ and the response $y$.
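A minimal R sketch of the unit length scaling, assuming a hypothetical data frame dat with response y and regressors x1, x2, x3 (all names are illustrative only):

    # centre each variable and divide by the square root of its corrected sum of squares
    unit_length <- function(v) (v - mean(v)) / sqrt(sum((v - mean(v))^2))

    W  <- apply(as.matrix(dat[, c("x1", "x2", "x3")]), 2, unit_length)  # scaled regressors
    y0 <- unit_length(dat$y)                                            # scaled response

    t(W) %*% W    # reproduces cor(dat[, c("x1", "x2", "x3")]), the regressor correlation matrix
    t(W) %*% y0   # the simple correlations r_jy between each regressor and y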
Suppose that there are only two regressors, $x_1$ and $x_2$. The model, assuming that $x_1$, $x_2$ and $y$ are scaled to unit length, is
$$y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
(note that there is no intercept, since the variables are centered). The matrix $\mathbf{X}'\mathbf{X}$ is
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix}$$
with inverse
$$\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{1}{1 - r_{12}^2} & \dfrac{-r_{12}}{1 - r_{12}^2} \\ \dfrac{-r_{12}}{1 - r_{12}^2} & \dfrac{1}{1 - r_{12}^2} \end{bmatrix}$$
so the least squares estimators are
$$\hat{\beta}_1 = \frac{r_{1y} - r_{12}\, r_{2y}}{1 - r_{12}^2}, \qquad \hat{\beta}_2 = \frac{r_{2y} - r_{12}\, r_{1y}}{1 - r_{12}^2}$$
If $x_1$ and $x_2$ are strongly correlated, $|r_{12}|$ is close to 1 and $\mathrm{Var}(\hat{\beta}_j) = C_{jj}\sigma^2 = \sigma^2/(1 - r_{12}^2)$ becomes very large.
When there are more than two regressors, multicollinearity produces similar effects. It can be shown that the diagonal elements of the matrix $\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1}$ are
$$C_{jj} = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k$$
where $R_j^2$ is the coefficient of multiple determination from the regression of $x_j$ on the remaining $k - 1$ regressors.
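A small R sketch, continuing with the hypothetical dat above, that checks this identity numerically for $x_1$:

    C <- solve(cor(dat[, c("x1", "x2", "x3")]))               # (X'X)^{-1} under unit length scaling
    R2_1 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared   # R_1^2 from regressing x1 on the others
    c(C[1, 1], 1 / (1 - R2_1))                                # the two numbers agree (up to rounding)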
To see this, consider the squared distance from $\hat{\boldsymbol{\beta}}$ to the true parameter $\boldsymbol{\beta}$,
$$L_1^2 = (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$$
Then
$$E(L_1^2) = E[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})] = \sum_{j=1}^k E(\hat{\beta}_j - \beta_j)^2 = \sum_{j=1}^k \mathrm{Var}(\hat{\beta}_j) = \sum_{j=1}^k \frac{\sigma^2}{1 - R_j^2}$$
The presence of multicollinearity implies that the distance from the least squares estimate to the true parameter may be large.
$$\Longrightarrow \quad E(\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}'\boldsymbol{\beta} + \sum_{j=1}^k \frac{\sigma^2}{1 - R_j^2}$$
That is, the vector $\hat{\boldsymbol{\beta}}$ is generally "longer" than the vector $\boldsymbol{\beta}$, implying that the method of least squares produces estimated regression coefficients that are large in absolute value.
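As an illustration with hypothetical numbers: if there are $k = 4$ unit-length-scaled regressors, each with $R_j^2 = 0.95$, then
$$E(L_1^2) = \sum_{j=1}^{4} \frac{\sigma^2}{1 - 0.95} = 4 \times 20\,\sigma^2 = 80\,\sigma^2$$
compared with $4\sigma^2$ when the regressors are orthogonal ($R_j^2 = 0$), so the least squares estimates are, on average, much farther from the true coefficients.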
Although the method of least squares will generally produce poor estimates
of the individual model parameters when multicollinearity is present, this
does not necessarily imply that the fitted model gives poor predictions.
• We will now discuss and illustrate the diagnostic measures to detect multicollinearity.
These are typical chemical process data for which a full quadratic response surface (polynomial model) in all three regressors is considered to be an appropriate initial model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2 + \epsilon$$
where the regressors are standardized in the usual way (centered and scaled):
$$x_1 = \frac{\text{reactor temperature} - 1212.50}{80.623}, \qquad x_2 = \frac{\text{ratio of H}_2 \text{ to } n\text{-heptane} - 12.44}{5.662}, \qquad x_3 = \frac{\text{contact time} - 0.0403}{0.03164}$$
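A sketch of fitting this model in R, assuming a hypothetical data frame acet that holds the response y and the standardized regressors x1, x2, x3:

    # full quadratic response surface: linear, interaction, and squared terms
    fit <- lm(y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + I(x1^2) + I(x2^2) + I(x3^2), data = acet)
    summary(fit)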
Use the following scatterplot matrix and correlation matrix to discuss the problem of multicollinearity in this question.
[Scatterplot matrix and correlation matrix of the regressors $x_1, x_2, x_3$ and the derived terms $x_1 x_2, x_1 x_3, x_2 x_3, x_1^2, x_2^2, x_3^2$.]
Rule of thumb: $|r| > 0.8$ indicates a potential multicollinearity problem. Here, for example, $\mathrm{cor}(x_1, x_3) = -0.958$, and several correlations involving the cross-product and squared terms are also large in magnitude (e.g., $-0.972$, $-0.975$, $0.874$).
Observing the correlation matrix, we can see that the high correlations are mainly due to the negative linear relationship between reactor temperature ($x_1$) and contact time ($x_3$). A multicollinearity problem is present.
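A sketch of how these diagnostics could be produced in R, again using the hypothetical acet data frame:

    pairs(acet[, c("x1", "x2", "x3")])            # scatterplot matrix of the three basic regressors
    round(cor(acet[, c("x1", "x2", "x3")]), 3)    # flag any pairwise |r| > 0.8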
and
$$\sum_{j=1}^{4} x_{ij} = 10 \quad \text{for observations 2 to 12,}$$
so $x_1, x_2, x_3, x_4$ are nearly linearly dependent. The regressors $x_5$ and $x_6$ were obtained from a table of normal random numbers. The responses $y_i$ were generated from a specified relationship among the regressors. Note that the pairwise correlations $r_{ij}$ cannot reveal a near linear dependence involving three or more regressors, so the variance inflation factors are more useful diagnostics here. The $R_j^2$ for a particular regressor is obtained by regressing it on the other regressors, for example

    Model4 <- lm(x4 ~ x1 + x2 + x3 + x5)
    summary(Model4)
As $x_j$ becomes more nearly linearly dependent on the remaining regressors, $R_j^2 \to 1$ and $\mathrm{VIF}_j = 1/(1 - R_j^2) \to \infty$. We can also use the VIFs to identify which regressors are involved in the multicollinearity: the regressors with very large VIFs are the ones taking part in the near linear dependence.
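A minimal R sketch computing all the VIFs at once, assuming a hypothetical data frame dat2 whose columns x1, ..., x6 are the six regressors of this example:

    X <- as.matrix(dat2[, paste0("x", 1:6)])
    vif <- diag(solve(cor(X)))   # VIF_j = 1/(1 - R_j^2) = C_jj under unit length scaling
    round(vif, 1)                # regressors with very large VIFs take part in the near dependence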
3. Eigensystem Analysis of X'X
The eigenvalues of $\mathbf{X}'\mathbf{X}$, $\lambda_1, \lambda_2, \ldots, \lambda_k$, can be used to measure the extent of multicollinearity in the data, where $\mathbf{X}$ is the matrix of regressors after the unit length scaling. One or more small eigenvalues imply the existence of multicollinearity. The condition number of $\mathbf{X}'\mathbf{X}$ is
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$
and the condition indices are
$$\kappa_j = \frac{\lambda_{\max}}{\lambda_j}, \qquad j = 1, 2, \ldots, k$$
(these formulas are not provided in the exam and need to be remembered). The number of condition indices that are large (generally $> 1000$) is a useful measure of the number of near linear dependencies in $\mathbf{X}'\mathbf{X}$.
For example,
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{4.205230}{9.693644 \times 10^{-5}} = 43381.31 \geq 1000$$
for one of the data sets above, and
$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{2.4287873}{0.001106051} = 2195.908 > 1000$$
for the other, so severe multicollinearity is indicated in both cases.
We can further use the eigenvectors to identify the nature of the near linear dependencies in the data. For the data set with six regressors, we conclude that there is one near linear dependence. The smallest eigenvalue is $\lambda_6 = 0.0011$ and the corresponding eigenvector is
$$\begin{bmatrix} 0.44768 \\ 0.42114 \\ 0.54169 \\ 0.57337 \\ 0.00605 \\ 0.00217 \end{bmatrix}$$
where the elements correspond to $x_1, x_2, \ldots, x_6$, respectively. The elements for $x_1, x_2, x_3, x_4$ are large while those for $x_5$ and $x_6$ are near zero, so the near linear dependence involves $x_1, x_2, x_3, x_4$ (consistent with the constraint $x_1 + x_2 + x_3 + x_4 \approx 10$).
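A sketch of this eigensystem analysis in R, reusing the hypothetical regressor matrix X from the VIF sketch above:

    ev <- eigen(cor(X))                          # eigen-decomposition of the unit-length-scaled X'X
    lambda <- ev$values
    max(lambda) / min(lambda)                    # condition number kappa
    max(lambda) / lambda                         # condition indices kappa_j
    round(ev$vectors[, which.min(lambda)], 5)    # eigenvector for the smallest eigenvalue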
Consider the delivery time data. The regressors cases and distance have a
strong positive linear relationship, indicating a problem of multicollinearity.
This could be avoided by collecting some additional data at points designed
to break up the relationship, that is, at points where cases are small and
distance is large and at points where cases are large and distance is small.
2. Model Respecification
If two highly correlated regressors are used in a regression model, we can choose to respecify the model by dropping one of the regressors. This is often a highly effective technique. However, it may not provide a satisfactory solution if the regressors dropped from the model have significant explanatory power relative to the response. That is, eliminating regressors to solve the multicollinearity problem may damage the predictive power of the model.
3. Ridge Regression
With the existence of multicollinearity, the least squares estimator $\hat{\boldsymbol{\beta}}$ has large variance, though it is unbiased. A biased estimator with small variance may perform "better" than an unbiased estimator with large variance.
$$\Longrightarrow \quad \hat{\boldsymbol{\beta}}_R = (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{y}$$
where the matrix $\mathbf{X}$ and the vector $\mathbf{y}$ are the ones after the unit length scaling, and $\lambda \geq 0$ is a constant selected by the analyst. Note that when $\lambda = 0$, the ridge estimator reduces to the least squares estimator.
To compare a biased estimator with an unbiased one, suppose $\hat{\theta}_1$ is unbiased and $\hat{\theta}_2$ is biased with $\mathrm{Var}(\hat{\theta}_2) < \mathrm{Var}(\hat{\theta}_1)$. Comparing the mean squared errors, $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$: if $\mathrm{MSE}(\hat{\theta}_2) < \mathrm{MSE}(\hat{\theta}_1)$, then $\hat{\theta}_2$ is the better estimator; otherwise $\hat{\theta}_1$ is better.
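For illustration with hypothetical numbers: if $\mathrm{Var}(\hat{\theta}_1) = 10$ with zero bias, and $\mathrm{Var}(\hat{\theta}_2) = 2$ with $[\mathrm{Bias}(\hat{\theta}_2)]^2 = 3$, then
$$\mathrm{MSE}(\hat{\theta}_1) = 10 + 0 = 10, \qquad \mathrm{MSE}(\hat{\theta}_2) = 2 + 3 = 5 < 10,$$
so $\hat{\theta}_2$ is preferred even though it is biased.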
For a biased estimator, we look at the mean squared error (MSE) in place of the variance. It can be shown that
$$E(\hat{\boldsymbol{\beta}}_R) = (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{X}\,\boldsymbol{\beta}$$
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}_R) = \sigma^2 (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}$$
As $\lambda$ increases, the squared bias increases but the variance decreases. In using ridge regression we would like to choose a value of $\lambda$ such that the reduction in the variance is greater than the increase in the squared bias.
Note that the negative least squares estimates of $b_6$ and $b_7$ become positive under ridge regression.
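A minimal R sketch of the ridge estimator on the unit-length-scaled data, reusing the hypothetical W and y0 from the earlier scaling sketch (the lambda values are illustrative):

    ridge_beta <- function(W, y0, lambda) {
      solve(t(W) %*% W + lambda * diag(ncol(W)), t(W) %*% y0)   # (X'X + lambda I)^{-1} X'y
    }
    ridge_beta(W, y0, lambda = 0)      # lambda = 0 reproduces the least squares estimates
    ridge_beta(W, y0, lambda = 0.02)   # a small amount of shrinkage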
If $\hat{b}_j$ are the estimators of the standardized regression coefficients (for both Unit Normal Scaling and Unit Length Scaling), then the estimators of the unstandardized regression coefficients are
$$\hat{\beta}_j = \hat{b}_j \sqrt{\frac{S_{yy}}{S_{jj}}}, \qquad j = 1, 2, \ldots, k, \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \sum_{j=1}^{k} \hat{\beta}_j \bar{x}_j$$
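A sketch of this back-transformation in R, assuming hypothetical objects: b_hat (the standardized coefficients) and the original data frame dat with response y and regressors named as below:

    xnames <- c("x1", "x2", "x3")                                   # illustrative regressor names
    Sjj <- apply(dat[, xnames], 2, function(v) sum((v - mean(v))^2))
    Syy <- sum((dat$y - mean(dat$y))^2)
    beta_hat  <- b_hat * sqrt(Syy / Sjj)                            # unstandardized slopes
    beta0_hat <- mean(dat$y) - sum(beta_hat * colMeans(dat[, xnames]))   # intercept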