Regression
for
Stat 5313
J. D. Tubbs
Department of Mathematical Sciences
1 Least Squares 1
1.1 Inference in Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Regression Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Analysis of Variance for Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 ANOVA Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.4 Continuation of the Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.3 Expected Values of the Sum of Squares . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.4 Distribution of the Mean Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8 Multicollinearity 87
8.1 Detecting Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.1.1 Tolerances and Variance Inflation Factors . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.1.2 Eigenvalues and Condition Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.1.3 SAS – Collinearity Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.1.4 SAS Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9 Ridge Regression 96
9.0.5 Ridge Plot Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Chapter 1
Least Squares
Suppose that one observes n pairs of data, given by (xi, yi), i = 1, 2, . . . , n. The Least Squares problem consists of finding the linear equation given by ŷ = β̂0 + β̂1x which minimizes the following equation

Q(β0, β1) = Σ_{i=1}^n [yi − ŷi]² = Σ_{i=1}^n [yi − (β0 + β1xi)]².
The solution can be found by taking the partial derivatives of Q(β0, β1) with respect to both β0 and β1. That is,

∂Q(β0, β1)/∂β0 = −2 Σ_{i=1}^n [yi − β0 − β1xi]

and

∂Q(β0, β1)/∂β1 = −2 Σ_{i=1}^n [yi − β0 − β1xi]xi.
Setting both of these equations equal to zero, one obtains the following:

nβ0 + (Σ xi)β1 = Σ yi
(Σ xi)β0 + (Σ xi²)β1 = Σ xiyi

The above equations are called the normal equations for the linear least squares problem. They have a unique solution given by

β̂0 = ȳ − β̂1x̄  and  β̂1 = SSxy/SSxx,
where the basic notation gives

ȳ = Σ_{i=1}^n yi/n,   x̄ = Σ_{i=1}^n xi/n,

SSxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ) = Σ xiyi − nx̄ȳ,

SSxx = Σ_{i=1}^n (xi − x̄)² = Σ xi² − nx̄² = (n − 1)sx²,

SSyy = Σ_{i=1}^n (yi − ȳ)² = Σ yi² − nȳ² = (n − 1)sy².
The residuals are defined by êi = yi − β̂0 − β̂1xi for i = 1, 2, . . . , n. The residual sum of squares, or sum of squares due to the error, is

SSE = Q(β̂0, β̂1) = Σ êi² = SSyy − SSxy²/SSxx.
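As a check that the stated solution satisfies the normal equations, substitute β0 = ȳ − β1x̄ from the first equation into the second (using Σ xi = nx̄):

(Σ xi)(ȳ − β1x̄) + (Σ xi²)β1 = Σ xiyi
β1(Σ xi² − nx̄²) = Σ xiyi − nx̄ȳ
β̂1 = SSxy/SSxx,  and then  β̂0 = ȳ − β̂1x̄.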
The predicted value (line at x = x*) is given by µ̂y|x* = β̂0 + β̂1x*, with estimated standard error

σ̂µ̂y|x* = σ̂ √(1/n + (x* − x̄)²/SSxx).

• The slope (β1):
β̂1 = SSxy/SSxx
and
σ̂β̂1 = σ̂/√SSxx.

• The y-intercept (β0):
β̂0 = ȳ − β̂1x̄
and
σ̂β̂0 = σ̂ √(1/n + x̄²/SSxx).
From these values one can find (1 − γ)100% confidence intervals for:
• The slope:
β̂1 ± tγ/2 (df = (n − 2))σ̂β̂1 .
• The line at x = x∗ :
µ̂y|x∗ ± tγ/2 (df = (n − 2))σ̂µ̂y|x∗ .
where tγ/2 (df = (n − 2)) is the critical point from a t-distribution with df = n-2.
The last two lines of the data table denote the column totals and averages. Hence x̄ = 21.54 and ȳ = 5.589. The totals of the (x − x̄)² and (y − ȳ)² columns give SSxx = 1441 and SSyy = 60.23, so the sample variance of x is SSxx/(n-1) = 1441/8 = 180.08 and the sample variance of y is SSyy/8 = 60.23/8 = 7.529. The least squares estimates of the slope and y-intercept are

β̂1 = SSxy/SSxx = .20221  and  β̂0 = ȳ − β̂1x̄ = 1.233,

and a 95% confidence interval for the slope is

.20221 ± 2.365(.01146) = (.1751, .2293),

where .01146 = .435/√1440 and 2.365 is obtained from the t distribution table under the .025 column with 7 degrees of freedom.
Suppose that you want to test the hypothesis that the slope of the line is equal to zero at the .05 level. One
can reject this hypothesis since 0 is not contained in the above interval. Note: one can only use confidence intervals to formulate tests of hypotheses when the confidence level of the CI is comparable to the specified type I error.
Suppose that one wanted a 95% CI for the line at x = 20. The interval is given by

(β̂0 + β̂1 · 20) ± t.025(n − 2) σ̂ √(1/n + (20 − x̄)²/SSxx)
= 1.233 + (.20221)(20) ± 2.365(.435) √(1/9 + (20 − 21.54)²/1440)
= 5.277 ± (2.365)(.146)
= (4.932, 5.622).
1.1.2 Analysis of Variance for Regression
The total (corrected) sum of squares can be partitioned as

SST = Σ_{i=1}^n (yi − ȳ)² = SSM + SSE,

where

SSM = Σ_{i=1}^n (ŷi − ȳ)²  and  SSE = Σ_{i=1}^n (yi − ŷi)².

The value SSM is called the sum of squares due to the model and SSE is called the sum of squares due to the error or the lack of fit.
The above identity follows from

Σ(yi − ŷi)² = Σ(yi − ȳ + ȳ − ŷi)²
            = Σ[(yi − ȳ)² + (ŷi − ȳ)² − 2(yi − ȳ)(ŷi − ȳ)]
            = Σ(yi − ȳ)² + Σ(ŷi − ȳ)² − 2 Σ(yi − ȳ)(ŷi − ȳ)
            = Σ(yi − ȳ)² − Σ(ŷi − ȳ)²,

since Σ(yi − ȳ)(ŷi − ȳ) = Σ(ŷi − ȳ)².
If the slope of the line is nonzero then one would expect that a sizeable amount of the variability in y, as given by SST, would be attributable to SSM. One way of measuring this is by computing R² = SSM/SST = SSM/(SSM + SSE). R² is a number between 0 and 1 which is usually expressed as a percentage. The closer the value is to 1 (or 100%), the more of the variability found in SST is explained by the model (in this case by having β̂1 nonzero). On the other hand, if R² is close to zero then very little of the variability in the data is explained by the model (in this case the linear equation), which means that one does not need x in order to explain the variability in y.
Source              Sum of Squares      Degrees of Freedom    Mean Square
due to β1 | β0      Σ(ŷi − ȳ)²          1                     MSM = SSM/1
Residual            Σ(yi − ŷi)²         n − 2                 MSE = SSE/(n − 2)
Corrected Total     Σ(yi − ȳ)²          n − 1
due to β0           nȳ²                 1
Total               Σ yi²               n
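The hypothesis H0 : β1 = 0 is tested with the ratio of the first two mean squares,

F = MSM/MSE = (SSM/1)/(SSE/(n − 2)),

which has an F(1, n − 2) distribution under H0; equivalently, F = t² where t = β̂1/σ̂β̂1.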
Dependent variable is: consumption
SAS Plots
Chapter 2
2.1.2 Addition
C = A ± B is defined as cij = aij ± bij provided both A and B have the same number of rows and columns.
It can easily be shown that (A ± B) ± C = A ± (B ± C) and A + B = B + A.
2.1.3 Multiplication
C = AB is defined as cij = Σ_{k=1}^p aik bkj provided A and B are conformable matrices (A is r × p and B is p × c). Note: even if both AB and BA are defined they are not necessarily equal. It follows that A(B ± C) = AB ± AC. Two vectors a and b are said to be orthogonal, denoted by a ⊥ b, if a′b = Σ_{i=1}^n ai bi = 0.
2.1.5 Inverse
An n × n matrix A is said to be nonsingular if there exists a matrix B satisfying AB = BA = In. B is called the inverse of A and is denoted by A⁻¹.
2.1.6 Transpose
If A is r × c then the transpose of A, denoted by A0 , is a c × r matrix. It follows that
1. (A0 )0 = A
2. (A ± B)0 = A0 ± B 0
3. (AB)0 = B 0 A0
4. If A = A0 then A is said to be symmetric.
5. A0 A and AA0 are symmetric.
6. (A ⊗ B)0 = (A0 ⊗ B 0 ).
2.1.7 Trace
Definition 2.1 Suppose that the matrix A = (aij), i = 1, . . . , n, j = 1, . . . , n. Then the trace of A is given by tr[A] = Σ_{i=1}^n aii.
2.1.8 Rank
Suppose that A is an r × c matrix. The r rows a1, a2, . . . , ar are said to be linearly independent if no ai can be expressed as a linear combination of the remaining rows, that is, there does not exist a non-null vector c = (c1, c2, . . . , cr)′ such that Σ_{i=1}^r ci ai = 0. It can be shown that the number of linearly independent rows is equal to the number of linearly independent columns of any matrix A, and that number is the rank of the matrix. If the rank of A is r then the matrix A is said to be of full row rank. If the rank of A is c then A is said to be of full column rank.
1. rank[A] = 0 if and only if A = 0.
2. rank[A] = rank[A0 ].
3. rank[A] = rank[A0 A] = rank[AA0 ].
2.1.10 Positive Semidefinite Matrices
A symmetric matrix A is said to be positive semidefinite (p.s.d.) if and only if q = x0 Ax ≥ 0 for all x.
1. The eigenvalues of p.s.d. matrices are nonnegative.
2. If A is p.s.d. then tr[A] ≥ 0.
3. A is p.s.d. of rank r if and only if there exists an n × n matrix R of rank r such that A = RR0 .
4. If A is an n × n p.s.d. matrix of rank r, then there exists an n × r matrix S of rank r such that
S 0 AS = Ir .
5. If A is p.s.d., then X 0 AX = 0 ⇒ AX = 0.
2.1.13 Orthogonal Matrices
An n × n matrix A is said to be orthogonal if and only if A⁻¹ = A′. If A is orthogonal then
1. −1 ≤ aij ≤ 1 for every element of A.
2. AA′ = A′A = In.
3. | A | = ±1.
Vector Differentiation
Let X be an n × m matrix with elements xij. If f(X) is a function of the elements of X, we define df/dX to be the matrix with (i, j) element df/dxij. Then

1. d(β′a)/dβ = a.
2. d(β′Aβ)/dβ = 2Aβ (A symmetric).
3. if f(X) = a′Xb, then df/dX = ab′.
4. if f(X) = tr[AXB], then df/dX = A′B′.
5. if X is symmetric and f(X) = a′Xb, then df/dX = ab′ + ba′ − diag(ab′).
6. if X is symmetric and f(X) = tr[AXB], then df/dX = A′B′ + BA − diag(BA).
7. if X and A are symmetric and f(X) = tr[AXAX], then df/dX = 2AXA.
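For example, rules 1 and 2 give the derivative that is used repeatedly in the following chapters. With Q(β) = (y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ and A = X′X symmetric,

dQ(β)/dβ = −2X′y + 2X′Xβ,

and setting this equal to zero yields the normal equations X′Xβ = X′y.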
2.2 Random Variables and Vectors
2.2.1 Expectations
Let U denote a random variable with expectation E(U) and Var(U) = E[(U − E(U))²]. Let a and b denote any constants; then we have
1. E(aU ± b) = aE(U) ± b.
2. Var(aU ± b) = a²Var(U).
Suppose that t(x) is a statistic that is used to estimate a parameter θ. If E(t(x)) = θ, the statistic is said to be an unbiased estimate for θ. If E(t(x)) = η ≠ θ then t(x) is biased and the bias is given by Bias = (θ − η), in which case the mean square error is given by

MSE(t(x)) = E[(t(x) − θ)²] = Var(t(x)) + Bias².
2.2.2 Covariance
Let U and V denote two random variables with respective means µu and µv. The covariance between the two random variables is defined by cov(U, V) = E[(U − µu)(V − µv)].
Let u denote an n × 1 random vector with mean E(u) and covariance matrix cov(u) = Σ = E[(u − E(u))(u − E(u))′] = (σij), and let d denote a constant vector. Then
3. cov(u + d) = cov(u).
4. tr[cov(u)] = tr E[(u − E(u))(u − E(u))′] = E[(u − E(u))′(u − E(u))] = Σ_{i=1}^n σii is the total variance of u.
Suppose that A is an r × n matrix and one defines v = Au ± b; then
5. E(v) = AE(u) ± b.
6. cov(v) = A cov(u) A′ = AΣA′. Note cov(v) is an r × r symmetric and at least p.s.d. matrix.
Suppose that C is an s × n matrix and one defines w = Cu ± d; then
7. cov(v, w) = AΣC′. Note cov(v, w) is an r × s matrix.
2.3 Distributions
2.3.1 Multivariate Normal
Recall that in the univariate case the normal density function for y is given by

f(y) = k exp{−(y − µ)²/(2σ²)},  k = (2πσ²)⁻¹/².

In the multivariate case the density of the n × 1 vector y is

f(y) = k exp{−½(y − µ)′Σ⁻¹(y − µ)},

where
1. k = (2π)⁻ⁿ/² | Σ |⁻¹/² is the normalizing constant and | Σ | is the determinant of Σ.
2. E(y) = µ = (µ1, µ2, . . . , µn)′ and cov(y) = Σ.
3. Q = (y − E(y))′Σ⁻¹(y − E(y)) ∼ χ²n, where χ²n is a Chi-square with n degrees of freedom.
4. y is said to have an n-dimensional multivariate normal distribution with mean µ and covariance matrix Σ provided Σ is nonsingular. This is denoted by y ∼ Nn(µ, Σ).

Suppose that y ∼ Nn(µ, Σ) and A is r × n. Define u = Ay ± b; then u ∼ Nr(µu = Aµ ± b, Σu = AΣA′) provided AΣA′ is nonsingular (i.e. rank(A) = r).
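As a special case, take A to be a single row a′ (r = 1): for any constant n × 1 vector a with a′Σa > 0,

a′y ∼ N(a′µ, a′Σa),

so in particular each component satisfies yi ∼ N(µi, σii).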
5. If u ∼ χ²(n) and v ∼ χ²(m) are independent, then (u/n)/(v/m) ∼ F(n, m).
6. If z = (z1, z2, . . . , zn)′ ∼ Nn(0, In), then z′z = Σ_{i=1}^n zi² ∼ χ²(n).
7. If x ∼ N(µ, 1) then x² ∼ χ²(df = 1, λ = µ²); x² is said to have a noncentral Chi-square distribution with noncentrality parameter λ.
Chapter 3
In this chapter the approach given in the previous chapter is modernized by using matrix notation. The
linear regression model can be written as
yi = β0 + β1 xi + ei
for i = 1, 2, . . . , n. The term ei represents the unobserved error or residual, that is, the amount by which the observed value yi deviates from the line β0 + β1xi. This model can be written as

~y = X~β + ~e,

where ~y = (y1, y2, . . . , yn)′ and ~e = (e1, e2, . . . , en)′ are n × 1 vectors, X is the n × 2 matrix whose ith row is (1  xi), and ~β = (β0, β1)′.

Note: I will assume that the symbol y is a vector rather than using the notation ~y.
From which one obtains the normal equations given by
X 0 Xβ = X 0 y.
β̂ = (X 0 X)−1 X 0 y
Let L = (X′X)⁻¹X′; then it follows that the least squares estimate for β is a linear function of y, i.e. β̂ = Ly.
In the linear regression model we have

β̂ = (β̂0, β̂1)′ = (X′X)⁻¹X′y,  where  X′X = [[n, Σxi], [Σxi, Σxi²]]  and  X′y = (Σyi, Σxiyi)′,

so that

(X′X)⁻¹ = 1/(n Σ(xi − x̄)²) [[Σxi², −Σxi], [−Σxi, n]].
H = X(X′X)⁻¹X′.

H is an n × n matrix called the Hat Matrix. The error in prediction or residual is given by ê = y − ŷ = y − Hy = (I − H)y. The residual sum of squares is given by

SSE = ê′ê = y′(I − H)y.

Note that (I − H) and H are idempotent matrices, and it follows that H(I − H) = (I − H)H = 0.
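These matrix formulas are easy to verify numerically. A minimal SAS/IML sketch (assuming SAS/IML is available; the data values below are made-up toy numbers used only for illustration):

proc iml;
   x = {1, 2, 3, 4, 5};                    /* toy predictor values */
   y = {2.1, 3.9, 6.2, 8.1, 9.8};          /* toy responses */
   n = nrow(x);
   xmat = j(n, 1, 1) || x;                 /* design matrix [1  x] */
   betahat = solve(xmat`*xmat, xmat`*y);   /* solves the normal equations X'Xb = X'y */
   h = xmat * inv(xmat`*xmat) * xmat`;     /* hat matrix H */
   ehat = y - xmat*betahat;                /* residuals (I - H)y */
   sse = ehat`*ehat;                       /* residual sum of squares */
   print betahat sse;
quit;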
The above derivation is similar to that given in the previous chapter. At this stage the problem is one of optimization rather than statistics. In order to create a statistical problem it is necessary to introduce some distributional assumptions. When using the linear models approach one makes assumptions concerning the error structure. That is, assume that the unobserved errors ei, i = 1, 2, . . . , n, are i.i.d. normal with mean 0 and variance σ². This assumption becomes

e = (e1, e2, . . . , en)′ ∼ Nn(0, σ²In).

Using the properties of linear transformations of normal variates one has

y = (y1, y2, . . . , yn)′ ∼ Nn(Xβ, σ²In).
(d) corr(β̂i, β̂j) = ((X′X)⁻¹)ij/[((X′X)⁻¹)ii((X′X)⁻¹)jj]¹/².
2. ŷ ∼ Nn (Xβ, σ 2 H).
(a) var(ŷi ) = σ 2 hii .
(b) cov(ŷi, ŷj) = σ²hij, where H = (hij). Notice that the ŷi's are not independent of one another unless hij = 0.
(c) corr(ŷi , ŷj ) = hij /[hii hjj ]1/2 .
3. ê ∼ Nn (0, σ 2 (I − H)).
(a) var(êi ) = σ 2 (1 − hii ).
(b) cov(êi , êj ) = −σ 2 hij .
(c) corr(êi , êj ) = −hij /[(1 − hii )(1 − hjj )]1/2 .
3.1.1 Estimation of σ²
The estimation of σ² follows from observing that the residual sum of squares Q(β̂) is a quadratic form given by ê′ê = y′(I − H)y. Using the expected value of a quadratic form, i.e., E(q) = E(y′Ay) = tr[A cov(y)] + µ′Aµ, one has

E(Q(β̂)) = E(y′(I − H)y) = σ²tr[I − H] + β′X′(I − H)Xβ = σ²tr[I − H] = σ²(tr[I] − tr[H]) = σ²(n − 2),

since

X′(I − H) = X′ − X′X(X′X)⁻¹X′ = X′ − X′ = 0,  (I − H)X = X − X(X′X)⁻¹X′X = X − X = 0,

and tr[H] = tr[X(X′X)⁻¹X′] = tr[(X′X)⁻¹X′X] = tr[I2] = 2. From here one can define an estimate for σ² with

σ̂² = y′(I − H)y/(n − 2) = SSE/(n − 2).
3.1.3 Expected Values of the Sum of Squares
Observe that the terms under the sum of squares column are actually quadratic forms. Using the properties
of the expected value of quadratic forms, i.e.,
Let x = (x1, x2, . . . , xn)′ ∼ Nn(µ, V) (this means that the xi's are not necessarily independent of one another).
Define the quadratic form q1 = x′Ax; then the expected value of q1 is E(q1) = tr[AV] + µ′Aµ. It follows
when V = σ 2 In that
1. E(y 0 Hy) = σ 2 tr[H] + β 0 X 0 HXβ = 2σ 2 + β 0 X 0 Xβ.
2. E(y 0 (I − H)y) = σ 2 tr[(I − H)] + β 0 X 0 (I − H)Xβ = (n − 2)σ 2 , since X 0 (I − H) = (I − H)X = 0.
Chapter 4
In this chapter the approach given in the previous chapter is extended to higher order regression models. The generalized linear regression model (not to be confused with the generalized linear model) can be written as
y = Xβ + e
where y = (y1, y2, . . . , yn)′ is the n × 1 vector of dependent observations, e = (e1, e2, . . . , en)′ is the n × 1 vector of unobservable errors or residuals, β = (β0, β1, . . . , βp−1)′ is the p × 1 vector of parameters, and X is the n × p matrix of independent observations or known values whose ith row is (1, x1i, x2i, . . . , x(p−1)i).
The least squares problem is to find β̂ which minimizes Q(β) = (y − Xβ)′(y − Xβ). By using the properties of differentiation with matrices one has

∂Q(β)/∂β = −2X′y + 2X′Xβ = 0.
From which one obtains the normal equations given by
X 0 Xβ = X 0 y.
β̂ = (X 0 X)−1 X 0 y.
4.0.5 Inference
The above derivation is similar to that given in the previous chapter. At this stage the problem is one of optimization rather than statistics. In order to create a statistical problem it is necessary to introduce some distributional assumptions. When using the linear models approach one makes assumptions concerning the error structure. That is, assume that the unobserved errors ei, i = 1, 2, . . . , n, are i.i.d. normal with mean 0 and variance σ². This assumption becomes

e = (e1, e2, . . . , en)′ ∼ Nn(0, σ²In),

and it follows that

y = (y1, y2, . . . , yn)′ ∼ Nn(Xβ, σ²In).

1. β̂ = (X′X)⁻¹X′y ∼ Np(β, σ²(X′X)⁻¹).
2. ŷ ∼ Nn (Xβ, σ 2 H).
(a) var(ŷi ) = σ 2 hii .
(b) cov(ŷi, ŷj) = σ²hij, where H = (hij). Notice that the ŷi's are not independent of one another unless hij = 0.
(c) corr(ŷi , ŷj ) = hij /[hii hjj ]1/2 .
3. ê ∼ Nn (0, σ 2 (I − H)).
(a) var(êi ) = σ 2 (1 − hii ).
(b) cov(êi , êj ) = −σ 2 hij .
(c) corr(êi , êj ) = −hij /[(1 − hii )(1 − hjj )]1/2 .
4.0.6 Estimation of σ²
The estimation of σ² follows from observing that the residual sum of squares Q(β̂) is a quadratic form given by ê′ê = y′(I − H)y. Using the expected value of a quadratic form, i.e., E(q) = E(y′Ay) = tr[A cov(y)] + µ′Aµ, one has

E(Q(β̂)) = E(y′(I − H)y) = σ²tr[I − H] + β′X′(I − H)Xβ = σ²tr[I − H] = σ²(tr[I] − tr[H]) = σ²(n − p),

since X′(I − H) = (I − H)X = 0 and tr[H] = tr[X(X′X)⁻¹X′] = tr[(X′X)⁻¹X′X] = tr[Ip] = p. From here one can define the estimate σ̂² = y′(I − H)y/(n − p) = SSE/(n − p).

The model sum of squares can be partitioned as

y′Hy = y′(H − jj′)y + y′jj′y,

where y′jj′y = nȳ² is the correction factor or the sum of squares due to β0. In which case the ANOVA table becomes the multiple regression analogue of the table in Chapter 1, with model (β* | β0) degrees of freedom p − 1 and error degrees of freedom n − p.
4.0.8 Expected Values of the Sum of Squares
Observe that the terms under the sum of squares column are actually quadratic forms. Using the properties of the expected value of quadratic forms, i.e., let x = (x1, x2, . . . , xn)′ ∼ Nn(µ, V) (this means that the xi's are not necessarily independent of one another) and define the quadratic form q1 = x′Ax; then the expected value of q1 is E(q1) = tr[AV] + µ′Aµ. It follows when V = σ²In that

1. E(y′Hy) = σ²tr[H] + β′X′HXβ = pσ² + β′X′Xβ.
2. E(y′(I − H)y) = σ²tr[I − H] + β′X′(I − H)Xβ = (n − p)σ², since X′(I − H) = (I − H)X = 0.

The R² is an indicator of how much of the variation in the data is explained by the model. It is defined as

R² = SSβ*|β0 / SSCT = (y′Hy − y′jj′y)/(y′y − y′jj′y).
The adjusted R² is the R² value adjusted for the number of parameters in the model and is given by

R²adj = 1 − [(n − i)/(n − p)](1 − R²),

where i is 1 if the model includes the y intercept (β0), and is 0 otherwise. Tolerance (TOL) and variance inflation factors (VIF) measure the strength of the interrelationships among the regressor variables in the model. For a given regressor they are given as

TOL = 1 − R²  and  VIF = TOL⁻¹,

where this R² is the one obtained by regressing that regressor on the remaining regressors (see Chapter 8).
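For example, if regressing a given regressor xj on the other regressors yields R² = 0.90, then TOL = 1 − 0.90 = 0.10 and VIF = 1/0.10 = 10, so the variance of β̂j is inflated by a factor of 10 relative to the case of uncorrelated regressors.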
4.1 The Reduction Notation
In constructing the ANOVA tables one is interested in describing the sum of squares attributable to the various terms in the model. This can be done in one of two ways, sequential or partial. Suppose that one has the three term model given by

yi = β0 + β1x1i + β2x2i + β3x3i + ei.

If the model represents a polynomial regression model (i.e., xji = xi^j for j = 1, 2, 3 and i = 1, 2, . . . , n) then the model sum of squares SSM can be written as the sequential decomposition

SSM = S(β1 | β0) + S(β2 | β0, β1) + S(β3 | β0, β1, β2),

where each of these terms is an independent quadratic form with a noncentral chi-square distribution with degrees of freedom = 1.

However, if the model represents a multiple regression model where each of the xj represents a different measurement or random variable, one is interested in determining the amount of reduction in sum of squares that can be attributed to each of the terms in the model. That is, one wants to be able to compute the partial sums of squares

S(β3 | β0, β1, β2),  S(β2 | β0, β1, β3),  S(β1 | β0, β2, β3).

These quadratic forms are no longer independent, hence their sum does not equal SSM, but they are independent of the error sum of squares and they can be shown to be chi-square distributed with one degree of freedom.
When using SAS the sequential sum of squares is the standard output and can be specified with the
Type I SS statement. The partial SS can be obtained using the Type II SS statement. SAS is a very general
statistics package and one can analyse other linear models besides the regression models. The Type I SS
should be used when
• Balanced ANOVA models with terms in their proper sequence or order.
• Purely nested model in proper order.
• Polynomial regression models.
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=join v=none color=blue;
symbol3 i=join color=green v=none;
title1 ’Appendix 1A page 46’;
data a1;
infile ’C:\MyDocuments\Regression\Data\DS_data\01a’;
input obs x1-x10 @@; y=x1; x=x4;
*proc print;*run;
proc sort; by x;
title2 ’Scatterplot of Data’;*proc gplot data=a1; *plot y*x=1; *run;
proc reg graphics;
model y=x4; run;
proc reg graphics;
model y = x2 x3 x5 x4 / ss1 ss2; run;
The output for this program is as follows;
Appendix 1A page 46
Scatterplot of Data
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
Parameter Estimates
Variable DF Type II SS
Intercept 1 0.54367
x2 1 2.16136
x3 1 0.36049
x5 1 0.72591
x4 1 17.29818
Consider testing the general linear hypothesis

H0 : Lβ = c,

where L is a q × p matrix of rank q. The approach is to estimate Lβ − c with Lβ̂ − c. Using the properties of expectation and covariance we have
1. E(Lβ̂ − c) = Lβ − c.
2. cov(Lβ̂ − c) = σ²L(X′X)⁻¹L′.
3. The quadratic form given by

Q = (Lβ̂ − c)′(L(X′X)⁻¹L′)⁻¹(Lβ̂ − c)/σ² ∼ χ²(df = q, λ = (Lβ − c)′(L(X′X)⁻¹L′)⁻¹(Lβ − c)/(2σ²)).

The restricted least squares estimate under H0 can be found by minimizing the Lagrangian Λ = (y − Xβ)′(y − Xβ) + 2λ′(Lβ − c). Taking the derivative of Λ with respect to both β and λ and setting equal to zero gives the following solution:

β̂c = β̂ − (X′X)⁻¹L′(L(X′X)⁻¹L′)⁻¹(Lβ̂ − c).
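Replacing σ² by σ̂² gives the usual test statistic: under H0,

F = (Lβ̂ − c)′(L(X′X)⁻¹L′)⁻¹(Lβ̂ − c)/(q σ̂²) ∼ F(q, n − p),

and H0 is rejected for large values of F.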
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Root MSE 0.43454 R-Square 0.9781
Dependent Mean 5.58889 Adj R-Sq 0.9749
Coeff Var 7.77501
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 0 58.81974 . . .
Error 8 1.40914 0.17614
Corrected Total 8 60.22889
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Test of Hypothesis
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Mean
Source DF Square F Value Pr > F
Chapter 5
In this chapter the underlying assumptions of the fitted model are examined. Various techniques have been proposed for examining the assumptions using the fitted model. There are several results found in the previous chapters that will be needed. They are:
1. β̂ = (X 0 X)−1 X 0 y ∼ Np (β, σ 2 (X 0 X)−1 ).
2. ŷ = X β̂ ∼ Nn (Xβ, σ 2 H).
(a) var(ŷi ) = σ 2 hii .
(b) cov(ŷi, ŷj) = σ²hij, where H = (hij). Notice that the ŷi's are not independent of one another unless hij = 0.
(c) corr(ŷi , ŷj ) = hij /[hii hjj ]1/2 .
to data that should be modeled by a quadratic model. SAS allows for a visual examination for the normality
of the residuals. These plots are called the QQ plots and the PP plots. If normality holds then these plots
will resemble a 45 degree line passing through the origin of your x-y plot. There are some tests of fit for
normality although these are usually not needed when assessing the normality of the residuals.
In the QQ plot one usually plots the standardized residuals versus the γi's, the expected normal order statistics (normal scores). If the data are normal then the resulting scatterplot should form a diagonal 45 degree line.
In the PP plot one plots the ordered pair for the ith observation as (Φ(z(i)), i/n).
Sometimes the patterns in the residuals may indicate that one has correlated errors, i.e. corr(ei, ej) ≠ 0. The following test, called the Durbin-Watson test, is commonly used to test for dependency in the error structure. The test statistic is

d = Σ_{i=2}^n (êi − êi−1)² / Σ_{i=1}^n êi².

Note that d = 0 when êi = êi−1, d = 4 when êi = −êi−1, while d ≈ 2 when the first order autocorrelation φ = corr(ei, ei−1) is 0. Since this statistic has these properties, the decision rule for rejecting the null hypothesis depends upon whether one is testing for the alternative of positive or negative correlation.
In order to test for first order autoregressive error structure one will need to find critical values from the
Durbin-Watson tables (Table 7.1 page 184 in Draper and Smith). The table provides two critical points dL
and dU for specified values of n and k=p-1. The test procedure is as follows;
1. One sided test against the alternative hypothesis Ha : ρ > 0 where ρ = corr(ei , ei−1 ).
If d < dL , conclude d is significant and reject Ho at the α level.
If d > dU , conclude d is not significant and do not reject Ho at the α level.
If dL ≤ d ≤ dU , the test is said to be inconclusive (in practice this implies that one does not have
a reason to reject the null hypothesis).
2. One sided test against the alternative hypothesis Ha : ρ < 0. Repeat above using (4 − d) instead of d.
3. One sided test against the alternative hypothesis Ha : ρ 6= 0.
If d < dL or (4 − d) < dL , conclude d is significant and reject Ho at the 2α level.
If d > dU and (4 − d) > dU , conclude d is not significant and do not reject Ho at the 2α level.
Otherwise, the test is said to be inconclusive.
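In SAS the statistic d is requested with the DW option on the MODEL statement. A minimal sketch, reusing the data set and variable names from the Chapter 1 example (data a1 with response y and regressor x) and assuming that data set is still available:

proc reg data=a1;
   model y = x / dw;   /* prints Durbin-Watson D and the 1st order autocorrelation */
run;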
σ̂² = MSE

or

σ̂²(i) = [(n − p)σ̂² − êi²/(1 − hii)] / (n − p − 1),

where σ̂²(i) is the mean square for the error whenever the ith observation has been omitted from the regression model. From here one can define

1. the Internally Studentized Residual, given by

si = êi / [σ̂(1 − hii)¹/²],

2. and the Externally Studentized Residual (RSTUDENT in the SAS output), given by

ti = êi / [σ̂(i)(1 − hii)¹/²].
5.0.5 Leverages
The hat matrix H has the following properties:
1. SS(β) = β̂′X′y = y′Hy = y′H²y = ŷ′ŷ.
2. Σ_{i=1}^n var(ŷi)/n = tr[σ²H]/n = σ²p/n.
3. H1 = 1 whenever the y intercept is included in the model; thus every row and every column of H sums to 1.
4. 0 ≤ hii ≤ 1 and Σ_{i=1}^n hii = p = rank(X). Since the average hii is p/n, the ith observation is said to be a leverage point if hii ≥ 2p/n.
5. Since ŷ = Hy we have

ŷi = hii yi + Σ_{j≠i} hij yj.

This indicates that the importance yi has upon ŷi is given by the magnitude of the leverage hii.
Another measure of influence is the COVRATIO, which is the ratio of the determinant of the covariance matrix of β̂(i), the estimate computed with the ith observation removed, to the determinant of the covariance matrix of β̂ computed from all of the data. That is,

COVRATIO = det[σ̂²(i) (X′(i)X(i))⁻¹] / det[σ̂²(X′X)⁻¹].

This statistic should be close to one whenever the observation has little influence upon the estimation of β. If the statistic is much different from one then the observation is said to be influential.
5.1 Examples
5.1.1 Heat Consumption Example – Residual Plots
Splus Residual Plots
5.1.2 Exercise 1 Chapter 3 problem k page 101
Run the following SAS program and compare the results with the answer to the questions for the problem.
title2 ’Least Squares Fit to the Data’; plot y*x / pred95; run;
title2 ’Residual Plot’; plot r.*x/ vref=0; run;
title2 ’Leverage Plot’; plot h.*obs.; run;
title2 ’Check for Outlier’; plot student.*obs.; run;
title2 ’Check for outlier ith obs missing’; plot rstudent.*obs.; run;
title2 ’PP Plot’; plot npp.*r. /nostat; run;
title2 ’QQ Plot’; plot r. * nqq. /nostat; run;
*Test to determine if the lack of fit is significant, not useful in this example;
Output from Splus
*** Linear Model ***
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 1.0021 0.0109 92.0401 0.0000
x -0.0029 0.0002 -12.4346 0.0000
Correlation of Coefficients:
(Intercept)
x -0.785
Response: y
Splus Plots Scatterplot

Splus Plots QQ plot

Output from JMP
JMP Layout 1

JMP Layout 2
SAS Output
Variable Intercept x y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Correlation of Estimates
Variable Intercept x
Output Statistics
4 0.0382 -0.107 | | | 0.000
5 0.0382 -0.428 | | | 0.005
6 0.0383 -0.239 | | | 0.002
7 0.0383 -0.271 | | | 0.002
8 0.0383 0.162 | | | 0.001
9 0.0383 -1.996 | ***| | 0.104
10 0.0384 0.132 | | | 0.000
11 0.0385 0.738 | |* | 0.012
12 0.0385 0.714 | |* | 0.011
13 0.0385 -0.143 | | | 0.000
14 0.0385 -0.395 | | | 0.003
15 0.0386 -0.806 | *| | 0.013
16 0.0386 0.816 | |* | 0.013
17 0.0386 1.272 | |** | 0.030
18 0.0386 0.0301 | | | 0.000
19 0.0387 -1.790 | ***| | 0.056
20 0.0387 1.905 | |*** | 0.057
21 0.0387 1.575 | |*** | 0.038
22 0.0387 0.580 | |* | 0.005
23 0.0387 -0.374 | | | 0.002
24 0.0387 -0.0832 | | | 0.000
25 0.0387 -1.428 | **| | 0.035
26 0.0384 1.135 | |** | 0.031
27 0.0383 -0.606 | *| | 0.010
28 0.0379 1.526 | |*** | 0.088
29 0.0379 -0.202 | | | 0.002
30 0.0379 0.684 | |* | 0.018
31 0.0372 1.868 | |*** | 0.207
32 0.0372 -1.113 | **| | 0.074
33 0.0359 -1.986 | ***| | 0.401
34 0.0346 -0.613 | *| | 0.055
Sum of Residuals 0
Sum of Squared Residuals 0.04949
Predicted Residual SS (PRESS) 0.05706
SAS Plots
The REG Procedure
Model: MODEL1
Dependent Variable: y
Output Statistics
9 0.4000 0.4605 0.0305 0.3934 0.5276 -0.0605
10 0.6000 0.4605 0.0305 0.3934 0.5276 0.1395
11 0.3000 0.4095 0.0404 0.3205 0.4985 -0.1095
12 0.5000 0.4095 0.0404 0.3205 0.4985 0.0905
13 0.3000 0.3585 0.0528 0.2423 0.4747 -0.0585
Output Statistics
-------DFBETAS-------
Obs DFFITS Intercept x
Sum of Residuals 0
Sum of Squared Residuals 0.09573
Predicted Residual SS (PRESS) 0.13405
5.2 Transformations
In some cases one needs to transform the dependent variable y to ensure that the model assumptions are correct. Suppose that one suspects that the homogeneity of variance assumption is violated. One solution
might be to use a more general linear regression model called weighted least squares (this will be discussed
in a later chapter). Another approach is to define what are called variance stabilizing transformations. The
concept is fairly simple and is presented as follows;
Using a form of the Taylor series approximation, any function f (y) of y with continuous first derivative
f 0 (y) and finite second derivative f 00 (y) can be expressed as
f (y) − f (η) = (y − η)f 0 (η) + 1/2(y − η)2 f 00 (θ),
where θ lies between y and η and E[y] = η. Thus whenever (y − η)² is small, we have

f(y) − f(η) ≈ (y − η)f′(η).

Squaring both sides and taking the expectation, we have
var(f (y)) ≈ (f 0 (η))2 σ 2 (η),
where σ 2 (η) is the variance of the random variable y with mean η. Thus in order to find a variance stabilizing
transformation f of y which makes the var(f (y)) approximately constant, we need to solve the equation
f 0 (η) = c/σ(η),
where c is any constant. Such a transformation f is called a variance stabilizing transformation.
As an example, suppose that y is a count; then σ²(η) ∝ η and one needs an f such that

f′(η) = c/η¹/².

If one lets c = 1/2 then f(η) = η¹/² is a solution, and therefore one should use the square root transformation √y whenever the response variable is a count. This same method provides a solution when

• y is a proportion of counts, which means that y ∼ Binomial with var(y) = n⁻¹η(1 − η) where E(y) = η. Then f = n¹/² sin⁻¹√y is the appropriate transformation.

• y is such that σ(η) = √(σ²(η)) = η, from which f = log(η).
Draper and Smith provide a table (Table 13.11) containing additional variance stabilizing transformations on page 292.
Another method of finding transformations when one does not have a precise form for the variance of y
is the general Box-Cox family of transformations.
5.2.1 Box-Cox Transformations
Suppose that y > 0, and define the Box-Cox transformation as

yi(λ) = (yi^λ − 1)/λ  when λ ≠ 0,   yi(λ) = ln yi  when λ = 0,

where i = 1, 2, . . . , n. One determines λ by maximizing the profile log-likelihood

Lmax(λ) = (λ − 1) Σ_{i=1}^n ln(yi) − (n/2) log[σ̂²(λ)],

where σ̂²(λ) = (1/n) ~y(λ)′[I − H]~y(λ), i.e., it is the error sum of squares (divided by n) when yi(λ) is used instead of yi, and ~y(λ) = (y1(λ), y2(λ), . . . , yn(λ))′.

Since there is not a closed form solution to the above maximization, one usually plots Lmax(λ) versus λ. Another approach is to compute a confidence interval using the fact that 2[Lmax(λ̂) − Lmax(λ)] is approximately χ²(df = 1); one can then use any λ for which Lmax(λ) lies within ½χ²₁(1 − α) of the maximum.
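Besides the macro listed in the next subsection, a Box-Cox analysis can also be obtained with PROC TRANSREG. A minimal sketch, assuming a data set bc with response y and regressors x1–x3 as in the example below (the lambda grid is an arbitrary choice):

proc transreg data=bc;
   model boxcox(y / lambda=-2 to 2 by 0.25) = identity(x1 x2 x3);   /* profiles the likelihood over lambda */
run;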
5.2.2 Examples of Box Cox Transformations
Splus Plots

SAS Plots
The SAS code (including a listing of the boxcox macro) is as follows;
%macro boxcox(name1=, lower2=, number=, increas=);
/*
This program contains the MACRO in order to find the BOX-COX Power
Transformation (Box and Cox 1964).
The program reads an input SAS filename bc, i.e.,
data bc;
infile ’R:/export/home/jtubbs/ftp/Regression/Data/DS_data/13.2’;
input obs f p y;
%boxcox(name1=y, lower2=-.95, number=20, increas=.1)
The user inputs the lower bound where the search is to begin and the
number of searches to be made.The increment for the search is also input
by the user.
USEAGE: %BOXCOX
INPUT:
REFERENCES:
Box,G.E.P., and D.R.Cox. (1964). An analysis of
transformations (with discussion). Journal of the Royal
Statistical Society.B.26:211-252.
Johnson,R.A., and D.Wichern. (1982) Applied Multivariate
Statistical Analysis.Englewood Cliffs,N.J.:Prentice-Hall.
*/
data aa;set a;
mn =_N_;
if mn = 1;
drop mn;
data a;set a;
y=log(xx);
z=(xx**lambda-1)/lambda;
/* The log-likelihood is calculated at each stage of the search */
data b;set b;
c91=var2*(n2-1)/n2;
c71=sum1;
c51=(-n2*log(c91)/2)+(lambda-1)*c71;
if abs(lambda)<0.000001 then lambda=0;
lambda&i=lambda;
loglik&i=c51;
loglik=c51;
data dat&i;set b;
keep lambda&i loglik&i;
%end;
data cc;set dat1;
lambda=lambda1;
loglik=loglik1;
keep loglik lambda;
indicate=’a’;
data ee;merge cc dd;
by indicate;
if loglik=maxlik;
drop indicate;
maxl_lik=maxlik;
%mend boxcox;
data bc;
infile ’c:/mydocuments/Regression/Data/DS_data/e13g’;
input obs x1 x2 x3 y; ly=log(y); lx1=log(x1); lx2=log(x2); lx3=log(x3); run;
title2 ’Transformation for Response variable Y’;
%boxcox(name1=x2, lower2=-0.85, number=30, increas=.1)
run;
title2 ’Transformation for Independent variable X1’;
%boxcox(name1=x3, lower2=-2.15, number=30, increas=.1)
run;
title2 ’Regular Linear regression’;
proc reg data=bc graphics;
model y = x1-x3/ss1 ss2 dw;
title2 ’Student Residuals’;plot student.*obs./nostat;run;
title2 ’Leverage Plot’; plot h.*obs./nostat;run;
title2 ’QQ plot’;plot student.*nqq./nostat;run;
run;
proc reg data=bc graphics;
model ly = x1-x3/ss1 ss2 dw;
title2 ’Student Residuals’;plot student.*obs./nostat;run;
title2 ’Leverage Plot’; plot h.*obs./nostat;run;
title2 ’QQ plot’;plot student.*nqq./nostat;run;
run;
title2 ’Regression using logs’;
proc reg data=bc graphics;
model ly = lx1 lx2 lx3/ ss1 ss2 dw;
title2 ’Student Residuals’;plot student.*obs./nostat;run;
title2 ’Leverage Plot’; plot h.*obs./nostat;run;
title2 ’QQ plot’;plot student.*nqq./nostat;run;
run;
The SAS output (excluding the boxcox macro) is;
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS Type II SS
Durbin-Watson D 1.447
Number of Observations 35
1st Order Autocorrelation 0.249
--------------------------------------------------------------------------------------------------------
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS Type II SS
Durbin-Watson D 1.586
Number of Observations 35
1st Order Autocorrelation 0.173
-------------------------------------------------------------------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: ly
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Root MSE 0.05065 R-Square 0.9552
Dependent Mean 3.23975 Adj R-Sq 0.9509
Coeff Var 1.56347
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS Type II SS
Durbin-Watson D 1.901
Number of Observations 35
1st Order Autocorrelation 0.016
Chapter 6
In this chapter the approach given in the previous chapters is extended to the case where there is a general error structure.
This model can be written as
y = Xβ + e
where y = (y1, y2, . . . , yn)′, e = (e1, e2, . . . , en)′, and β = (β0, β1, . . . , βp−1)′ are defined as in the previous chapter, X is the n × p matrix whose ith row is (1, x1i, x2i, . . . , x(p−1)i), and where

E(e) = 0,   Var(e) = σ²Σ,   e ∼ Nn(0, σ²Σ).
The least squares problem becomes: find β̂ which minimizes

Q(β) = (y − Xβ)′Σ⁻¹(y − Xβ),

for which

∂Q(β)/∂β = −2X′Σ⁻¹y + 2X′Σ⁻¹Xβ = 0.

From which one obtains the normal equations given by

X′Σ⁻¹Xβ = X′Σ⁻¹y.
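Provided X′Σ⁻¹X is nonsingular, these equations have the generalized least squares solution

β̂ = (X′Σ⁻¹X)⁻¹X′Σ⁻¹y,  with  cov(β̂) = σ²(X′Σ⁻¹X)⁻¹.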
6.0.3 Inference
The above derivation is similar to that given in the previous chapter. At this stage the problem is one of optimization rather than statistics. In order to create a statistical problem it is necessary to introduce some distributional assumptions. When using the linear models approach one makes assumptions concerning the error structure. That is, assume that the unobserved errors ei, i = 1, 2, . . . , n, are jointly normal with mean 0 and covariance matrix σ²Σ. This assumption becomes

e = (e1, e2, . . . , en)′ ∼ Nn(0, σ²Σ),

and it follows that

y = (y1, y2, . . . , yn)′ ∼ Nn(Xβ, σ²Σ).
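The important special case of weighted least squares takes Σ = diag(1/w1, . . . , 1/wn) with known weights wi, so that minimizing Q(β) amounts to minimizing

Σ_{i=1}^n wi (yi − xi′β)²,

where xi′ denotes the ith row of X; this is what the WEIGHT statement in PROC REG (used in the example below) computes.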
3.00 2.60 7.84
3.00 2.67 7.84
3.00 2.66 7.84
3.00 2.78 7.84
3.00 2.80 7.84
5.34 5.92 7.43
5.38 5.35 6.99
5.40 4.33 6.78
5.40 4.89 6.78
5.45 5.21 6.30
7.70 7.68 0.89
7.80 9.81 0.84
7.81 6.52 0.83
7.85 9.71 0.82
7.87 9.82 0.81
7.91 9.81 0.79
7.94 8.50 0.78
9.03 9.47 0.47
9.07 11.45 0.46
9.11 12.14 0.45
9.14 11.50 0.45
9.16 10.65 0.44
9.37 10.64 0.41
10.17 9.78 0.31
10.18 12.39 0.31
10.22 11.03 0.30
10.22 8.00 0.30
10.22 11.90 0.30
10.18 8.68 0.31
10.50 7.25 0.28
10.23 13.46 0.30
10.03 10.19 0.32
10.23 9.93 0.30
;
proc reg;
model y = x / influence dw; run;
title2 ’Regression Plot’; plot y*x; run;
/*
title2 ’Residual Plot’; plot r.*obs./ vref=0; run;
title2 ’Leverage Plot’; plot h.*obs.; run;
*/
title2 ’Check for Outlier’; plot student.*obs.; run;
/*
title2 ’Check for outlier ith obs missing’; plot rstudent.*obs.; run;
title2 ’PP Plot’; plot npp.*r. /nostat; run;
title2 ’QQ Plot’; plot r. * nqq. /nostat; run;
*/
* Weighted Least Squares;
proc reg;
model y = x / influence dw;
weight w;run;
/*
title2 ’Regression Plot’; plot y*x; run;
title2 ’Residual Plot’; plot r.*obs./ vref=0; run;
title2 ’Leverage Plot’; plot h.*obs.; run;
*/
title2 ’Check for Outlier’; plot student.*obs.; run;
The output is;
Table 9.1 data X, Y, W page 226
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 1.952
Number of Observations 35
1st Order Autocorrelation 0.015
Output Statistics
3 -0.2273 -0.1615 0.0946 1.1727 -0.0522 -0.0511 0.0436
4 -0.1573 -0.1118 0.0946 1.1737 -0.0361 -0.0353 0.0302
5 -0.1673 -0.1189 0.0946 1.1736 -0.0384 -0.0376 0.0321
6 -0.0473 -0.0336 0.0946 1.1745 -0.0109 -0.0106 0.0091
7 -0.0273 -0.0194 0.0946 1.1746 -0.0063 -0.0061 0.0052
8 0.4359 0.3016 0.0426 1.1045 0.0636 0.0529 -0.0365
9 -0.1795 -0.1240 0.0421 1.1091 -0.0260 -0.0215 0.0147
10 -1.2222 -0.8537 0.0418 1.0610 -0.1783 -0.1468 0.1002
11 -0.6622 -0.4589 0.0418 1.0954 -0.0958 -0.0789 0.0539
12 -0.3990 -0.2758 0.0411 1.1038 -0.0571 -0.0466 0.0315
13 -0.4837 -0.3324 0.0290 1.0877 -0.0575 -0.0140 -0.0072
14 1.5328 1.0704 0.0293 1.0211 0.1860 0.0391 0.0295
15 -1.7686 -1.2425 0.0293 0.9971 -0.2160 -0.0447 -0.0350
16 1.3760 0.9577 0.0295 1.0356 0.1669 0.0323 0.0292
17 1.4633 1.0204 0.0295 1.0279 0.1781 0.0333 0.0324
18 1.4079 0.9807 0.0297 1.0330 0.1716 0.0298 0.0335
19 0.0638 0.0438 0.0298 1.0960 0.0077 0.0013 0.0016
20 -0.2037 -0.1405 0.0386 1.1048 -0.0281 0.0046 -0.0143
21 1.7308 1.2212 0.0390 1.0103 0.2461 -0.0424 0.1274
22 2.3754 1.7120 0.0395 0.9292 0.3473 -0.0634 0.1828
23 1.7014 1.2000 0.0399 1.0143 0.2446 -0.0464 0.1304
24 0.8287 0.5748 0.0402 1.0854 0.1176 -0.0229 0.0631
25 0.5802 0.4020 0.0430 1.1001 0.0852 -0.0208 0.0493
26 -1.1881 -0.8359 0.0566 1.0796 -0.2047 0.0815 -0.1441
27 1.4105 0.9970 0.0568 1.0606 0.2447 -0.0978 0.1725
28 0.005126 0.003570 0.0576 1.1285 0.0009 -0.0004 0.0006
29 -3.0249 -2.2698 0.0576 0.8372 -0.5611 0.2280 -0.3983
30 0.8751 0.6130 0.0576 1.1024 0.1515 -0.0616 0.1076
31 -2.2995 -1.6689 0.0568 0.9542 -0.4095 0.1638 -0.2887
32 -4.0928 -3.3136 0.0635 0.6295 -0.8630 0.3868 -0.6401
33 2.4238 1.7687 0.0578 0.9366 0.4381 -0.1787 0.3115
34 -0.6191 -0.4316 0.0539 1.1111 -0.1030 0.0386 -0.0706
35 -1.1062 -0.7777 0.0578 1.0872 -0.1926 0.0786 -0.1370
Sum of Residuals 0
Sum of Squared Residuals 70.01571
Predicted Residual SS (PRESS) 77.79464
--------------------------------------------------------------------------------------------------------
Weight: w
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 1.662
Number of Observations 35
1st Order Autocorrelation 0.160
Output Statistics
20 0.4700 -0.1570 -0.0950 0.0304 1.0962 -0.0168
21 0.4600 1.7764 1.0825 0.0302 1.0205 0.1910
22 0.4500 2.4198 1.4805 0.0300 0.9603 0.2603
23 0.4500 1.7449 1.0506 0.0303 1.0248 0.1858
24 0.4400 0.8716 0.5122 0.0299 1.0785 0.0899
25 0.4100 0.6171 0.3493 0.0300 1.0881 0.0615
26 0.3100 -1.1745 -0.5799 0.0297 1.0734 -0.1015
27 0.3100 1.4239 0.7049 0.0298 1.0629 0.1236
28 0.3000 0.0173 0.008371 0.0292 1.0955 0.0015
29 0.3000 -3.0127 -1.5061 0.0292 0.9553 -0.2613
30 0.3000 0.8873 0.4299 0.0292 1.0829 0.0746
31 0.3100 -2.2861 -1.1458 0.0298 1.0115 -0.2009
32 0.2800 -4.0887 -2.0277 0.0297 0.8607 -0.3550
33 0.3000 2.4357 1.2030 0.0293 1.0028 0.2091
34 0.3200 -0.6014 -0.3005 0.0293 1.0895 -0.0522
35 0.3000 -1.0943 -0.5310 0.0293 1.0765 -0.0923
Output Statistics
-------DFBETAS-------
Obs Intercept x
1 0.1293 -0.1124
2 -0.1221 0.1005
3 -0.0053 0.0038
4 0.0621 -0.0438
5 0.0524 -0.0370
6 0.1684 -0.1188
7 0.1879 -0.1325
8 0.0364 0.1635
9 -0.0012 -0.0071
10 -0.0423 -0.3148
11 -0.0185 -0.1373
12 -0.0046 -0.0646
13 0.0335 -0.0515
14 -0.1375 0.2087
15 0.1428 -0.2164
16 -0.1230 0.1856
17 -0.1302 0.1959
18 -0.1241 0.1858
19 -0.0116 0.0173
20 0.0112 -0.0153
21 -0.1273 0.1738
22 -0.1741 0.2372
23 -0.1246 0.1695
24 -0.0604 0.0820
25 -0.0420 0.0565
26 0.0733 -0.0953
27 -0.0893 0.1161
28 -0.0011 0.0014
29 0.1892 -0.2457
30 -0.0540 0.0701
31 0.1451 -0.1887
32 0.2608 -0.3356
33 -0.1514 0.1966
34 0.0374 -0.0489
35 0.0668 -0.0868
Sum of Residuals 0
Sum of Squared Residuals 42.37373
Predicted Residual SS (PRESS) 47.09259
6.1.2 SAS Scatterplot

6.1.3 SAS Student Plot

6.1.4 SAS Student Plot for Weighted LS
Chapter 7
In this chapter the aim is to determine the best subset model when using a multiple regression model. Draper and Smith introduce this concept in chapter 15. They propose using the statistics: 1) R², 2) s², the residual mean square, or 3) Mallows' Cp statistic. The first two statistics are found by running each of the possible subset models and computing the corresponding R² and s² = σ̂². For a subset model with p parameters, Mallows' Cp is

Cp = SSEp/s²(full) − (n − 2p),

where SSEp is the error sum of squares of the subset model and s²(full) is the residual mean square from the full model.
1. A regression equation containing all of the candidate predictor variables is computed first.
2. The partial F-value is calculated for every predictor variable using the type II sum of squares.
3. The lowest partial F-value, FL , is compared with a preselected or default significance level, F0 .
(a) If FL < F0 , remove the variable corresponding to FL , say XL then recompute the model using
the reduced model.
(b) If FL > F0 , adopt the regression model as calculated.
A list of the procedures used by SAS is as follows;
Stepwise (STEPWISE)
The stepwise method is a modification of the forward-selection technique and differs in that variables already
in the model do not necessarily stay there. As in the forward-selection method, variables are added one by
one to the model, and the F statistic for a variable to be added must be significant at the SLENTRY=
level. After a variable is added, however, the stepwise method looks at all the variables already included in
the model and deletes any variable that does not produce an F statistic significant at the SLSTAY= level.
Only after this check is made and the necessary deletions accomplished can another variable be added to the
model. The stepwise process ends when none of the variables outside the model has an F statistic significant
at the SLENTRY= level and every variable in the model is significant at the SLSTAY= level, or when the
variable to be added to the model is the one just deleted from it.
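For the stepwise method both SLENTRY= and SLSTAY= default to 0.15 (this is the 0.1500 level that appears in the output below); they can also be set explicitly on the MODEL statement. A minimal sketch, using the data set and variable names from the example that follows:

proc reg data=t01a;
   model x1 = x2-x10 / selection=stepwise slentry=0.15 slstay=0.15;
run;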
R2 Selection (RSQUARE)
The RSQUARE method finds subsets of independent variables that best predict a dependent variable by
linear regression in the given sample. You can specify the largest and smallest number of independent
variables to appear in a subset and the number of subsets of each size to be selected. The RSQUARE
method can efficiently perform all possible subset regressions and display the models in decreasing order of
R2 magnitude within each subset size. Other statistics are available for comparing subsets of different sizes.
These statistics, as well as estimated regression coefficients, can be displayed or output to a SAS data set.
The subset models selected by the RSQUARE method are optimal in terms of R2 for the given sample,
but they are not necessarily optimal for the population from which the sample is drawn or for any other
sample for which you may want to make predictions. If a subset model is selected on the basis of a large
R2 value or any other criterion commonly used for model selection, then all regression statistics computed
for that model under the assumption that the model is given a priori, including all statistics computed by
PROC REG, are biased.
While the RSQUARE method is a useful tool for exploratory model building, no statistical method can
be relied on to identify the “true” model. Effective model building requires substantive theory to suggest
relevant predictors and plausible functional forms for the model.
The RSQUARE method differs from the other selection methods in that RSQUARE always identifies
the model with the largest R2 for each number of variables considered. The other selection methods are not
guaranteed to find the model with the largest R2 . The RSQUARE method requires much more computer
time than the other selection methods, so a different selection method such as the STEPWISE method is a
good choice when there are many independent variables to consider.
Adjusted R2 Selection (ADJRSQ)
This method is similar to the RSQUARE method, except that the adjusted R2 statistic is used as the
criterion for selecting models, and the method finds the models with the highest adjusted R2 within the
range of sizes.
7.2.1 Example
The following SAS program illustrates the output for the stepwise and backward selection methods;
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=join v=none color=blue;
symbol3 i=join color=green v=none;
title1 ’Data on page 46’;
data t01a;
*infile ’R:/export/home/jtubbs/ftp/Regression/DS_data/t01a’;
input obs x1-x10 ;
cards;
1 10.98 5.20 0.61 7.4 31 20 22 35.3 54.8 4
2 11.13 5.12 0.64 8.0 29 20 25 29.7 64.0 5
3 12.51 6.19 0.78 7.4 31 23 17 30.8 54.8 4
4 8.40 3.89 0.49 7.5 30 20 22 58.8 56.3 4
5 9.27 6.28 0.84 5.5 31 21 0 61.4 30.3 5
6 8.73 5.76 0.74 8.9 30 22 0 71.3 79.2 4
7 6.36 3.45 0.42 4.1 31 11 0 74.4 16.8 2
8 8.50 6.57 0.87 4.1 31 23 0 76.7 16.8 5
9 7.82 5.69 0.75 4.1 30 21 0 70.7 16.8 4
10 9.14 6.14 0.76 4.5 31 20 0 57.5 20.3 5
11 8.24 4.84 0.65 10.3 30 20 11 46.4 106.1 4
12 12.19 4.88 0.62 6.9 31 21 12 28.9 47.6 4
13 11.88 6.03 0.79 6.6 31 21 25 28.1 43.6 5
14 9.57 4.55 0.60 7.3 28 19 18 39.1 53.3 5
15 10.94 5.71 0.70 8.1 31 23 5 46.8 65.6 4
16 9.58 5.67 0.74 8.4 30 20 7 48.5 70.6 4
17 10.09 6.72 0.85 6.1 31 22 0 59.3 37.2 6
18 8.11 4.95 0.67 4.9 30 22 0 70.0 24.0 4
19 6.83 4.62 0.45 4.6 31 11 0 70.0 21.2 3
20 8.88 6.60 0.95 3.7 31 23 0 74.5 13.7 4
21 7.68 5.01 0.64 4.7 30 20 0 72.1 22.1 4
22 8.47 5.68 0.75 5.3 31 21 1 58.1 28.1 6
23 8.86 5.28 0.70 6.2 30 20 14 44.6 38.4 4
24 10.36 5.36 0.67 6.8 31 20 22 33.4 46.2 4
25 11.08 5.87 0.70 7.5 31 22 28 28.6 56.3 5
;
proc reg graphics;
model x1 = x2 x3 x4 x5 x6 x7 x8 x9 x10 / selection=stepwise; run;
model x1 = x2 x3 x4 x5 x6 x7 x8 x9 x10 / selection=backward; run;
model x1 = x2 x3 x4 x5 x6 x7 x8 x9 x10 / selection=rsquare best =5; run;
plot cp.*np./ chocking=red cmallows=blue; run;
The output is;
Data on page 46
The REG Procedure
Model: MODEL1
Dependent Variable: x1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Bounds on condition number: 1.0534, 9.3248
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
--------------------------------------------------------------------------------------------------------
The REG Procedure
Model: MODEL2
Dependent Variable: x1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
x6 0.17935 0.08095 1.59339 4.91 0.0426
x7 -0.01818 0.02451 0.17859 0.55 0.4697
x8 -0.07742 0.01659 7.06721 21.77 0.0003
x9 -0.08585 0.05200 0.88475 2.73 0.1195
x10 -0.34501 0.21070 0.87036 2.68 0.1223
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable Estimate Error Type II SS F Value Pr > F
x9 -0.09798 0.04523 1.38878 4.69 0.0439
x10 -0.40827 0.18658 1.41681 4.79 0.0421
All variables left in the model are significant at the 0.1000 level.
--------------------------------------------------------------------------------------------------------
Number in
Model R-Square Variables in Model
1 0.7144 x8
1 0.4104 x7
1 0.2874 x6
1 0.2250 x4
1 0.1557 x9
-----------------------------------------------------
2 0.8600 x2 x8
2 0.8491 x6 x8
2 0.8467 x3 x8
2 0.7555 x5 x8
2 0.7495 x8 x10
-----------------------------------------------------
3 0.8849 x5 x6 x8
3 0.8796 x2 x6 x8
3 0.8651 x2 x8 x10
3 0.8638 x2 x5 x8
3 0.8635 x6 x8 x9
-----------------------------------------------------
4 0.8934 x2 x5 x6 x8
4 0.8914 x2 x6 x8 x10
4 0.8899 x5 x6 x7 x8
4 0.8884 x5 x6 x8 x9
4 0.8879 x2 x3 x6 x8
-----------------------------------------------------
5 0.8995 x2 x3 x6 x8 x10
5 0.8988 x2 x5 x6 x8 x10
5 0.8975 x4 x5 x6 x8 x9
5 0.8975 x2 x3 x5 x6 x8
5 0.8974 x2 x6 x8 x9 x10
-----------------------------------------------------
6 0.9165 x2 x4 x6 x8 x9 x10
6 0.9075 x2 x3 x6 x8 x9 x10
6 0.9066 x2 x4 x5 x6 x8 x9
6 0.9052 x2 x3 x4 x6 x8 x10
6 0.9037 x2 x3 x5 x6 x8 x10
-----------------------------------------------------
7 0.9201 x2 x4 x5 x6 x8 x9 x10
7 0.9197 x2 x4 x6 x7 x8 x9 x10
7 0.9186 x2 x3 x4 x6 x8 x9 x10
7 0.9122 x3 x4 x5 x6 x8 x9 x10
7 0.9109 x2 x3 x6 x7 x8 x9 x10
-----------------------------------------------------
8 0.9226 x2 x4 x5 x6 x7 x8 x9 x10
8 0.9220 x2 x3 x4 x6 x7 x8 x9 x10
8 0.9209 x2 x3 x4 x5 x6 x8 x9 x10
8 0.9158 x3 x4 x5 x6 x7 x8 x9 x10
8 0.9119 x2 x3 x5 x6 x7 x8 x9 x10
-----------------------------------------------------
9 0.9237 x2 x3 x4 x5 x6 x7 x8 x9 x10
7.2.2 Selection using JMP
Chapter 8
Multicollinearity
In this chapter the problem of having linear dependence among the “independent” variables is discussed.
Draper and Smith discuss this topic in chapter 16 under the title of “Ill-conditioning in regression data”.
The problem arises whenever there is a linear dependency among the independent variables. That is, let X = [~x0, ~x1, . . . , ~xp−1], where ~xj is the n × 1 vector of responses for the jth variable and ~x0 = ~j, the column of ones. The independent variables are said to have linear dependence whenever

Σ_{j=0}^{p−1} tj ~xj = 0

for some tj not all equal to zero. If the above condition holds then (X′X)⁻¹ does not exist. Seldom does the above linear dependency hold exactly; rather, one nearly has linear dependency, which implies that (X′X)⁻¹ is ill conditioned, hence any estimates using (X′X)⁻¹ are “poor”. Think about the problem of dividing a number by a very small positive value. Montgomery and Peck list four primary reasons for having a multicollinearity problem within the regression model. They are:
1. The data collection method employed.
2. Constraints on the model or in the population.
3. Model specification.
4. An over-defined model.
Consider the following artificial example given in Sen and Srivastava, which shows what can happen whenever collinearity is present. Note that the different y columns are nearly the same, yet the parameter estimates vary greatly.
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=join v=none color=blue;
symbol3 i=join color=green v=none;
title1 ’Artifical Multicollinearity Example’;
data t01a;
input x1 x2 y1 y2 y3 ;
cards;
2.705 2.659 4.1 4.1 4.06
2.995 3.005 4.34 4.73 4.39
3.255 3.245 4.95 4.81 5.02
3.595 3.605 5.36 5.3 5.23
3.805 3.795 5.64 5.75 5.57
4.145 4.155 6.18 6.26 6.5
4.405 4.395 6.69 6.61 6.65
4.745 4.755 7.24 7.13 7.26
4.905 4.895 7.46 7.3 7.48
4.845 4.855 7.23 7.32 7.39
;
proc reg graphics;
model y1 y2 y3 = x1 x2 / noint collin vif tol; run;
The SAS output is;
Artifical Multicollinearity Example
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Tolerance
Variance
Variable DF Inflation
x1 1 54102
x2 1 54102
Collinearity Diagnostics
Condition --Proportion of Variation-
Number Eigenvalue Index x1 x2
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Tolerance
Variance
Variable DF Inflation
x1 1 54102
x2 1 54102
Collinearity Diagnostics
The REG Procedure
Model: MODEL1
Dependent Variable: y3
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Tolerance
Variance
Variable DF Inflation
x1 1 54102
x2 1 54102
Collinearity Diagnostics
Condition --Proportion of Variation-
Number Eigenvalue Index x1 x2
8.1 Detecting Multicollinearity
8.1.1 Tolerances and Variance Inflation Factors
Let R²j denote the squared multiple correlation obtained by regressing the jth regressor xj on the remaining regressors. Then

TOLj = (1 − R²j),  j = 1, 2, . . . , p − 1,

and

VIFj = TOLj⁻¹.

Some authors suggest that if VIFj > 10 then collinearity may be present.
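The name reflects the role VIFj plays in the variance of the corresponding coefficient estimate: in a model containing an intercept,

var(β̂j) = σ²/[(1 − R²j) Σi (xij − x̄j)²] = σ² VIFj / Σi (xij − x̄j)²,

so an R²j near one inflates var(β̂j) accordingly.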
8.1.2 Eigenvalues and Condition Numbers
Let λj denote the jth eigenvalue of the matrix X′X, with the eigenvalues ordered so that λ0 ≥ λ1 ≥ . . . ≥ λp−1. Whenever one of the λj's becomes small the matrix X′X is said to be ill conditioned. The condition number is defined as

CNj = √(λ0/λj).

If CNj > 30, collinearity may be present.
8.1.4 SAS Example
The SAS code for the example is;
options center nodate pagesize=100 ls=80;
symbol1 color=black v=x ;
symbol2 v=circle color=red;
symbol3 color=blue v=square;
symbol4 v=triangle color=green;
symbol5 v=plus color=orange;
title1 ’Data on page 46’;
data t01a;
*infile ’MacIntosh HD:Regression:DS_data:t01a’;
input obs x1-x10 ;
cards;
1 10.98 5.20 0.61 7.4 31 20 22 35.3 54.8 4
2 11.13 5.12 0.64 8.0 29 20 25 29.7 64.0 5
3 12.51 6.19 0.78 7.4 31 23 17 30.8 54.8 4
4 8.40 3.89 0.49 7.5 30 20 22 58.8 56.3 4
5 9.27 6.28 0.84 5.5 31 21 0 61.4 30.3 5
6 8.73 5.76 0.74 8.9 30 22 0 71.3 79.2 4
7 6.36 3.45 0.42 4.1 31 11 0 74.4 16.8 2
8 8.50 6.57 0.87 4.1 31 23 0 76.7 16.8 5
9 7.82 5.69 0.75 4.1 30 21 0 70.7 16.8 4
10 9.14 6.14 0.76 4.5 31 20 0 57.5 20.3 5
11 8.24 4.84 0.65 10.3 30 20 11 46.4 106.1 4
12 12.19 4.88 0.62 6.9 31 21 12 28.9 47.6 4
13 11.88 6.03 0.79 6.6 31 21 25 28.1 43.6 5
14 9.57 4.55 0.60 7.3 28 19 18 39.1 53.3 5
15 10.94 5.71 0.70 8.1 31 23 5 46.8 65.6 4
16 9.58 5.67 0.74 8.4 30 20 7 48.5 70.6 4
17 10.09 6.72 0.85 6.1 31 22 0 59.3 37.2 6
18 8.11 4.95 0.67 4.9 30 22 0 70.0 24.0 4
19 6.83 4.62 0.45 4.6 31 11 0 70.0 21.2 3
20 8.88 6.60 0.95 3.7 31 23 0 74.5 13.7 4
21 7.68 5.01 0.64 4.7 30 20 0 72.1 22.1 4
22 8.47 5.68 0.75 5.3 31 21 1 58.1 28.1 6
23 8.86 5.28 0.70 6.2 30 20 14 44.6 38.4 4
24 10.36 5.36 0.67 6.8 31 20 22 33.4 46.2 4
25 11.08 5.87 0.70 7.5 31 22 28 28.6 56.3 5
;
title2 ’COLLINEARITY OUTPUT’;
proc reg;
model x1 = x2-x10 / vif tol collin collinoint; run;
The SAS output is;
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Tolerance
Variance
Variable DF Inflation
Intercept 1 0
x2 1 15.74659
x3 1 20.13711
x4 1 126.62562
x5 1 1.83663
x6 1 4.41192
x7 1 4.69501
x8 1 6.06743
x9 1 107.59089
x10 1 2.38505
Collinearity Diagnostics
3 0.16428 7.42994 0.00001022 0.00016643 0.00019850
4 0.05575 12.75456 0.00027529 0.00184 0.00367
5 0.01457 24.94866 0.00000153 0.00745 0.01573
6 0.00945 30.98408 0.00530 0.00400 0.00521
7 0.00494 42.86180 0.00132 0.04999 0.02661
8 0.00098671 95.86934 0.02321 0.61220 0.59483
9 0.00041422 147.96509 0.00030721 0.01339 0.13346
10 0.00014186 252.83547 0.96957 0.31092 0.22023
----------------------Proportion of Variation----------------------
Number x4 x5 x6 x7 x8
Collinearity Diagnostics
--Proportion of Variation-
Number x9 x10
1 0.00002062 0.00017870
2 0.00024277 0.00011191
3 0.00717 0.00301
4 0.00010957 0.04011
5 0.00000490 0.72075
6 0.00512 0.00608
7 0.00965 0.00776
8 0.01404 0.01752
Collinearity Diagnostics
--Proportion of Variation-
Number x9 x10
9 0.80868 0.13873
10 0.15496 0.06575
Collinearity Diagnostics (intercept adjusted) -- partial listing

            -Proportion of Variation-
Number                         x10
     1                     0.00135
     2                     0.02465
     3                     0.00618
     4                     0.10686
     5                     0.58110
     6                     0.13660
     7                     0.01522
     8                     0.07364
     9                     0.05440
Chapter 9
Ridge Regression
Ridge regression is a popular method for detecting multicollinearity within a regression model. It was first
proposed by Hoerl and Kennard (1970) and was one of the first biased estimation procedures. The idea
is fairly simple. Since the matrix X'X is ill-conditioned or nearly singular, one can add a positive constant
to its diagonal and ensure that the resulting matrix is not ill-conditioned. That is, consider the biased
normal equations given by

\[ (X'X + kI)\tilde{\beta} = X'y, \]

with the resulting biased estimate for β given by

\[ \tilde{\beta} = (X'X + kI)^{-1} X'y, \]

where k is called the shrinkage parameter. Since E[β̃] ≠ β, some do not want to use such a procedure.
However, in spite of the fact that it is biased, it does have the effect of reducing the variance of the estimator.
It can be shown that

\[ var(\hat{\beta}_j) = \sigma^2 / \lambda_j, \]

where λ_j is the jth eigenvalue of X'X. So when X'X is ill-conditioned, some of the λ_j's are very small and
hence var(β̂_j) is very large. However,

\[ var(\tilde{\beta}_j) = \sigma^2 \lambda_j / (\lambda_j + k)^2. \]
Consider the example where σ² = 1, λ_1 = 2.985, λ_2 = 0.01, and λ_3 = 0.005. The usual least squares
estimation gives

\[ \sum_{j=1}^{3} var(\hat{\beta}_j) = \sigma^2 \sum_{j=1}^{3} 1/\lambda_j = 0.3350 + 100 + 200 = 300.3350. \]
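For comparison (a worked illustration added here; the value k = 0.01 is chosen only for concreteness), the
ridge variance formula above gives

\[ \sum_{j=1}^{3} var(\tilde{\beta}_j) = \frac{2.985}{(2.985+0.01)^2} + \frac{0.01}{(0.01+0.01)^2} + \frac{0.005}{(0.005+0.01)^2} \approx 0.33 + 25 + 22.2 \approx 47.6, \]

a substantial reduction from 300.335, obtained at the price of some bias.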
This reduction of the total variance is very desirable and has led people to propose similar estimation
procedures, called shrinkage estimators. In this class we are interested in using this procedure as a way of
identifying multicollinearity and the variables which may contribute to the problem. The best way of
illustrating the procedure is with an example.
96
9.0.5 Ridge Plot Example
It can be shown that

\[ \lim_{k \to 0} \tilde{\beta}(k) = \hat{\beta}. \]

Whenever multicollinearity is present, the values of the estimates often change dramatically as k approaches
zero. Notice in the following plot the effect on the least squares estimates for x3 and x4.
The SAS code is,
options center nodate pagesize=100 ls=80;
symbol1 color=black v=x ;
symbol2 v=circle color=red;
symbol3 color=blue v=square;
symbol4 v=triangle color=green;
symbol5 v=plus color=orange;
title1 ’Data on page 46’;
data t01a;
input obs x1-x10 ;
cards;
1 10.98 5.20 0.61 7.4 31 20 22 35.3 54.8 4
2 11.13 5.12 0.64 8.0 29 20 25 29.7 64.0 5
3 12.51 6.19 0.78 7.4 31 23 17 30.8 54.8 4
4 8.40 3.89 0.49 7.5 30 20 22 58.8 56.3 4
5 9.27 6.28 0.84 5.5 31 21 0 61.4 30.3 5
6 8.73 5.76 0.74 8.9 30 22 0 71.3 79.2 4
7 6.36 3.45 0.42 4.1 31 11 0 74.4 16.8 2
8 8.50 6.57 0.87 4.1 31 23 0 76.7 16.8 5
9 7.82 5.69 0.75 4.1 30 21 0 70.7 16.8 4
10 9.14 6.14 0.76 4.5 31 20 0 57.5 20.3 5
11 8.24 4.84 0.65 10.3 30 20 11 46.4 106.1 4
12 12.19 4.88 0.62 6.9 31 21 12 28.9 47.6 4
13 11.88 6.03 0.79 6.6 31 21 25 28.1 43.6 5
14 9.57 4.55 0.60 7.3 28 19 18 39.1 53.3 5
15 10.94 5.71 0.70 8.1 31 23 5 46.8 65.6 4
16 9.58 5.67 0.74 8.4 30 20 7 48.5 70.6 4
17 10.09 6.72 0.85 6.1 31 22 0 59.3 37.2 6
18 8.11 4.95 0.67 4.9 30 22 0 70.0 24.0 4
19 6.83 4.62 0.45 4.6 31 11 0 70.0 21.2 3
20 8.88 6.60 0.95 3.7 31 23 0 74.5 13.7 4
21 7.68 5.01 0.64 4.7 30 20 0 72.1 22.1 4
22 8.47 5.68 0.75 5.3 31 21 1 58.1 28.1 6
23 8.86 5.28 0.70 6.2 30 20 14 44.6 38.4 4
24 10.36 5.36 0.67 6.8 31 20 22 33.4 46.2 4
25 11.08 5.87 0.70 7.5 31 22 28 28.6 56.3 5
;
title2 ’Ridge Plot’;
proc reg outest=b outvif ridge = 0 to 0.02 by .002;
model x1 = x2 - x10 / vif tol collin collinoint; run;
plot /ridgeplot nomodel legend=legend2 nostat vref=0 lvref=1 cvref=blue; run;
The SAS output is similar to that when considering the collinearity problem of the previous chapter. The
main difference is the following plot which is called a ridge trace plot.
Note how the magnitudes of the coefficients are adjusted as k increases.
4 49.409 53.84 118.75 117.73 65 281.30
5 72.958 49.17 119.72 117.69 25 282.20
6 107.702 47.61 168.38 173.46 45 216.14
7 97.239 64.19 169.85 169.85 45 223.88
8 105.856 52.73 169.85 170.86 45 222.80
9 99.348 51.00 170.89 173.92 80 218.84
10 111.907 47.37 171.31 173.34 25 218.12
11 100.008 43.18 171.43 171.43 45 219.20
12 175.380 71.23 171.59 263.49 45 168.62
13 117.800 49.30 171.63 171.63 45 217.58
14 217.409 50.87 171.93 170.91 10 219.92
15 41.725 54.44 173.92 71.73 45 296.60
16 151.139 47.93 221.44 217.39 65 189.14
17 220.630 42.91 222.74 221.73 25 186.08
18 131.666 66.60 228.90 114.40 25 285.80
19 80.537 64.94 231.19 113.52 65 286.34
20 152.966 43.18 236.84 167.77 45 221.72
;
title2 ’COLLINEARITY OUTPUT’;
proc reg data = t01a;
model y1 y2 = x1-x4 / ss1 ss2 vif tol collin collinoint; run;
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Coeff Var 18.25196
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
Parameter Estimates
Variance
Variable DF Type II SS Inflation
Intercept 1 117.55264 0
X1 1 12900 1.01124
X2 1 1501.68695 19.75531
X3 1 10449 1.00062
X4 1 38.03088 19.80063
Collinearity Diagnostics
Condition
Number Eigenvalue Index
1 4.74606 1.00000
2 0.11521 6.41820
3 0.09539 7.05362
4 0.04297 10.50897
5 0.00035682 115.33036
Collinearity Diagnostics
----------------------Proportion of Variation----------------------
Number Intercept X1 X2 X3 X4
101
Collinearity Diagnostics(intercept adjusted)
Condition
Number Eigenvalue Index
1 1.99079 1.00000
2 1.00202 1.40953
3 0.98158 1.42413
4 0.02561 8.81690
-----------------Proportion of Variation----------------
Number X1 X2 X3 X4
COLLINEARITY OUTPUT
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
102
X1 1 0.03502 0.03646 0.96 0.3520 72.79590
X2 1 0.48264 0.15456 3.12 0.0070 3.82753
X3 1 0.04319 0.09192 0.47 0.6452 15.25565
X4 1 0.60627 0.18551 3.27 0.0052 543.85694
Parameter Estimates
Variance
Variable DF Type II SS Inflation
Intercept 1 338.77239 0
X1 1 46.98637 1.01124
X2 1 496.50004 19.75531
X3 1 11.24086 1.00062
X4 1 543.85694 19.80063
Collinearity Diagnostics
Condition
Number Eigenvalue Index
1 4.74606 1.00000
2 0.11521 6.41820
3 0.09539 7.05362
4 0.04297 10.50897
5 0.00035682 115.33036
Collinearity Diagnostics
----------------------Proportion of Variation----------------------
Number Intercept X1 X2 X3 X4
Condition
Number Eigenvalue Index
1 1.99079 1.00000
2 1.00202 1.40953
3 0.98158 1.42413
4 0.02561 8.81690
103
Collinearity Diagnostics(intercept adjusted)
-----------------Proportion of Variation----------------
Number X1 X2 X3 X4
104
105
106
Chapter 10
Dummy Variables
The material in this chapter is similar to that given in Draper and Smith, chapter 14. I will not include any
discussion of why and how we use dummy variables. Rather, I have chosen to consider the example given in
their chapter 14 and to illustrate how one can use PROC GLM in SAS to reproduce the results without having
to define these dummy variables. In their example 1 on page 302, the aim is to determine whether or not
there are differences among states (y-intercepts) for the regression model of weights (y) as a function of
age (x). The model will then be expanded to determine whether there are differing slopes for the regression
of y on x.
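For reference, a minimal sketch of the dummy-variable coding that the PROC REG call below (model y = x z1 z2)
relies on is the following; the data set name is hypothetical, and the state codes G, V, and W are taken from
the CONTRAST statements that appear later.

/* Sketch only: create 0/1 indicators so that state W serves as the reference level. */
data dummy;
   set weights;                /* 'weights' is an assumed data set name */
   z1 = (state = 'G');         /* 1 if state G, 0 otherwise             */
   z2 = (state = 'V');         /* 1 if state V, 0 otherwise             */
run;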
title2 ’Least Squares Fit to the Data’;
proc sort; by x;
proc gplot;plot y*x=state;run;
proc reg graphics;
model y = x;
plot y*x;
run;
proc reg ;
model y = x z1 z2;
run;
title2 ’Proc GLM Using Dummy variable for State’;
proc glm;
class state;
model y = x state / E solution;
contrast ’Test for G - V’
state 1 -1 0;
contrast ’Test for V - W’
state 0 1 -1;
contrast ’Test for G - W’
state 1 0 -1;
run;
proc glm;
class state;
model y = x state x*state / e solution;
contrast ’Test for G - V y-int’
state 1 -1 0;
contrast ’Test for V - W y-int’
state 0 1 -1;
contrast ’Test for G - W y-int’
state 1 0 -1;
contrast ’Test for G - V slope’
state*x 1 -1 0;
contrast ’Test for V - W slope’
state*x 0 1 -1;
contrast ’Test for G - W slope’
state*x 1 0 -1;
run;
The SAS output is;
Example 1 page 303 Using Dummy Variables
Least Squares Fit to the Data
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 26.20192 26.20192 21.81 0.0007
Error 11 13.21500 1.20136
Corrected Total 12 39.41692
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
109
Using Proc GLM model to determine if states have different y-intercepts
Effect Coefficients
Intercept L1
x L2
state G L3
state V L4
state W L1-L3-L4
Dependent Variable: y
Sum of
Source DF Squares Mean Square F Value Pr > F
110
Contrast DF Contrast SS Mean Square F Value Pr > F
Standard
Parameter Estimate Error t Value Pr > |t|
NOTE: The X’X matrix has been found to be singular, and a generalized inverse
was used to solve the normal equations. Terms whose estimates are
followed by the letter ’B’ are not uniquely estimable.
Effect Coefficients
Intercept L1
x L2
state G L3
state V L4
state W L1-L3-L4
x*state G L6
x*state V L7
x*state W L2-L6-L7
Dependent Variable: y
Sum of
Source DF Squares Mean Square F Value Pr > F
111
Corrected Total 12 39.41692308
Standard
Parameter Estimate Error t Value Pr > |t|
NOTE: The X’X matrix has been found to be singular, and a generalized inverse
was used to solve the normal equations. Terms whose estimates are
followed by the letter ’B’ are not uniquely estimable.
The graphs are;
10.0.7 Harris County Discrimination Example
The following example concerns potential gender-salary discrimination in Harris County Texas. The SAS
code is;
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=none v=square color=blue;
symbol3 i=none color=green v=diamond;
title1 ’Harris County Discrimination’;
data harris;
input SALARY EDUCAt EXPER MONTHS Gender @@;
cards;
3900 12 0 1 0 4020 10 44 7 0 4290 12 5 30 0
4380 8 6 7 0 4380 8 8 6 0 4380 12 0 7 0
4380 12 0 10 0 4380 12 5 6 0 4440 15 75 2 0
4500 8 52 3 0 4500 12 8 19 0 4620 12 52 3 0
4800 8 70 20 0 4800 12 6 23 0 4800 12 11 12 0
4800 12 11 17 0 4800 12 63 22 0 4800 12 144 24 0
4800 12 163 12 0 4800 12 228 26 0 4800 12 381 1 0
4800 16 214 15 0 4980 8 318 25 0 5100 8 96 33 0
5100 12 36 15 0 5100 12 59 14 0 5100 15 115 1 0
5100 15 165 4 0 5100 16 123 12 0 5160 12 18 12 0
5220 8 102 29 0 5220 12 127 29 0 5280 8 90 11 0
5280 8 190 1 0 5280 12 107 11 0 5400 8 173 34 0
5400 8 228 33 0 5400 12 26 11 0 5400 12 36 33 0
5400 12 38 22 0 5400 12 82 29 0 5400 12 169 27 0
5400 12 244 1 0 5400 15 24 13 0 5400 15 49 27 0
5400 15 51 21 0 5400 15 122 33 0 5520 12 97 17 0
5520 12 196 32 0 5580 12 133 30 0 5640 12 55 9 0
5700 12 90 23 0 5700 12 117 25 0 5700 15 51 17 0
5700 15 61 11 0 5700 15 241 34 0 6000 12 121 30 0
6000 15 79 13 0 6120 12 209 21 0 6300 12 87 33 0
6300 15 231 15 0 4620 12 12 22 1 5040 15 14 3 1
5100 12 180 15 1 5100 12 315 2 1 5220 12 29 14 1
5400 12 7 21 1 5400 12 38 11 1 5400 12 113 3 1
5400 15 18 8 1 5400 15 359 11 1 5700 15 36 5 1
6000 8 320 21 1 6000 12 24 2 1 6000 12 32 17 1
6000 12 49 8 1 6000 12 56 33 1 6000 12 252 11 1
6000 12 272 19 1 6000 15 25 13 1 6000 15 36 32 1
6000 15 56 12 1 6000 15 64 33 1 6000 15 108 16 1
6000 16 46 3 1 6300 15 72 17 1 6600 15 64 16 1
6600 15 84 33 1 6600 15 216 16 1 6840 15 42 7 1
6900 12 175 10 1 6900 15 132 24 1 8100 16 55 33 1
;
title2 ’Least Squares Fit to the Data’;
proc sort; by exper;
proc gplot;plot salary*exper=gender salary*educat=gender salary*months=gender;run;
proc reg;
model salary = EDUCAT EXPER MONTHS /ss1 ss2;
run;
title2 ’Proc GLM Using Gender’;
proc glm;
class gender;
model salary = educat exper months gender / E solution;
contrast ’Test for gender y-int’
gender 1 -1;
run;
title2 ’Test for different slopes’;
proc glm;
class gender;
model salary = educat exper months gender gender*educat gender*exper gender*months / e solution;
contrast ’Test for gender y-int’
gender 1 -1 ;
contrast ’Test for gender*educat slope’
gender*educat 1 -1;
contrast ’Test for gender*exper slope’
gender*exper 1 -1;
contrast ’Test for gender*months slope’
gender*months 1 -1;
run;
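In terms of a single regression equation with a 0/1 gender indicator G, the second model above can be written
as

\[ SALARY = \beta_0 + \beta_1 EDUCAT + \beta_2 EXPER + \beta_3 MONTHS + \gamma_0 G
          + \gamma_1 (G \times EDUCAT) + \gamma_2 (G \times EXPER) + \gamma_3 (G \times MONTHS) + \epsilon, \]

so the CONTRAST statements test whether the individual γ terms are zero, that is, whether the intercept and
the slopes differ between the two gender groups.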
The SAS output is;
Harris County Discrimination
Least Squares Fit to the Data
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
Intercept 1 3179.47980 383.42484 8.29 <.0001 2732330410
EDUCAT 1 139.60934 27.71249 5.04 <.0001 7862534
EXPER 1 1.48405 0.69705 2.13 0.0360 2047106
MONTHS 1 20.62908 6.15440 3.35 0.0012 4081607
Parameter Estimates
Variable DF Type II SS
Intercept 1 24980136
EDUCAT 1 9219791
EXPER 1 1646670
MONTHS 1 4081607
Effect Coefficients
Intercept L1
EDUCAT L2
EXPER L3
MONTHS L4
Gender 0 L5
Gender 1 L1-L5
Sum of
Source DF Squares Mean Square F Value Pr > F
MONTHS 1 4081606.609 4081606.609 15.86 0.0001
Gender 1 9678502.666 9678502.666 37.60 <.0001
Standard
Parameter Estimate Error t Value Pr > |t|
NOTE: The X’X matrix has been found to be singular, and a generalized inverse
was used to solve the normal equations. Terms whose estimates are
followed by the letter ’B’ are not uniquely estimable.
Effect Coefficients
Intercept L1
EDUCAT L2
EXPER L3
MONTHS L4
Gender 0 L5
Gender 1 L1-L5
EDUCAT*Gender 0 L7
EDUCAT*Gender 1 L2-L7
EXPER*Gender 0 L9
EXPER*Gender 1 L3-L9
MONTHS*Gender 0 L11
MONTHS*Gender 1 L4-L11
Sum of
Source DF Squares Mean Square F Value Pr > F
EDUCAT*Gender 1 393532.787 393532.787 1.52 0.2209
EXPER*Gender 1 108312.735 108312.735 0.42 0.5194
MONTHS*Gender 1 29539.879 29539.879 0.11 0.7363
Contrast Pr > F
Standard
Parameter Estimate Error t Value Pr > |t|
NOTE: The X’X matrix has been found to be singular, and a generalized inverse
was used to solve the normal equations. Terms whose estimates are
followed by the letter ’B’ are not uniquely estimable.
The graphs are;
Chapter 11
Transformation Regression: PROC TRANSREG
SAS PROC TRANSREG is a very powerful procedure for finding transformations for a number of different types
of models.
The TRANSREG (transformation regression) procedure fits linear models, optionally with spline and other
nonlinear transformations, and it can be used to code experimental designs prior to their use in other
analyses.
The TRANSREG procedure fits many types of linear models, including
• ordinary regression and ANOVA
• metric and nonmetric conjoint analysis (Green and Wind 1975; de Leeuw, Young, and Takane 1976)
• metric and nonmetric vector and ideal point preference mapping (Carroll 1972)
• simple, multiple, and multivariate regression with variable transformations (Young, de Leeuw, and
Takane 1976; Winsberg and Ramsay 1980; Breiman and Friedman 1985)
• redundancy analysis (Stewart and Love 1968) with variable transformations (Israels 1984)
• canonical correlation analysis with variable transformations (van der Burg and de Leeuw 1983)
• response surface regression (Meyers 1976; Khuri and Cornell 1987) with variable transformations
The data set can contain variables measured on nominal, ordinal, interval, and ratio scales (Siegel 1956).
Any mix of these variable types is allowed for the dependent and independent variables. The TRANSREG
procedure can transform
• nominal variables by scoring the categories to minimize squared error (Fisher 1938), or they can be
expanded into dummy variables
• ordinal variables by monotonically scoring the ordered categories so that order is weakly preserved
(adjacent categories can be merged) and squared error is minimized. Ties can be optimally untied or
left tied (Kruskal 1964). Ordinal variables can also be transformed to ranks.
• interval and ratio scale of measurement variables linearly or nonlinearly with spline (de Boor 1978;
van Rijckevorsel 1982) or monotone spline (Winsberg and Ramsay 1980) transformations. In addition,
smooth, logarithmic, exponential, power, logit, and inverse trigonometric sine transformations are
available.
123
Transformations produced by the PROC TRANSREG multiple regression algorithm, requesting spline trans-
formations, are often similar to transformations produced by the ACE smooth regression method of Breiman
and Friedman (1985). However, ACE does not explicitly optimize a loss function (de Leeuw 1986), while
PROC TRANSREG always explicitly optimizes a squared-error loss function.
PROC TRANSREG extends the ordinary general linear model by providing optimal variable transfor-
mations that are iteratively derived using the method of alternating least squares (Young 1981). PROC
TRANSREG iterates until convergence, alternating
1. finding least-squares estimates of the parameters of the model given the current scoring of the data
(that is, the current vectors)
2. finding least-squares estimates of the scoring parameters given the current set of model parameters
For more background on alternating least-squares optimal scaling methods and transformation regression
methods, refer to Young, de Leeuw, and Takane (1976), Winsberg and Ramsay (1980), Young (1981), Gifi
(1990), Schiffman, Reynolds, and Young (1981), van der Burg and de Leeuw (1983), Israels (1984), Breiman
and Friedman (1985), and Hastie and Tibshirani (1986). (These are just a few of the many relevant sources.)
124
• redundancy analysis (Stewart and Love 1968)
– model identity(y1-y4) = identity(x1-x5) / method=redundancy;
• preference mapping, vector model (Carroll 1972)
– model identity(Attrib1-Attrib3) = identity(Dim1-Dim2);
• preference mapping, ideal point model (Carroll 1972)
– model identity(Attrib1-Attrib3) = point(Dim1-Dim2);
• preference mapping, ideal point model, elliptical (Carroll 1972)
– model identity(Attrib1-Attrib3) = epoint(Dim1-Dim2);
• preference mapping, ideal point model, quadratic (Carroll 1972)
– model identity(Attrib1-Attrib3) = qpoint(Dim1-Dim2);
• metric conjoint analysis
– model identity(Subj1-Subj50) = class(a b c d e f / zero=sum);
• nonmetric conjoint analysis
– model monotone(Subj1-Subj50) = class(a b c d e f / zero=sum);
• main effects, two-way interaction
– model identity(y) = class(a|b);
• less-than-full-rank model (main effects and two-way interaction constrained to sum to zero)
– model identity(y) = class(a|b / zero=sum);
• main effects and all two-way interactions
– model identity(y) = class(a|b|c@2);
• main effects and all two- and three-way interactions
– model identity(y) = class(a|b|c);
• main effects and just B*C two-way interaction
– model identity(y) = class(a b c b*c);
• seven main effects, three two-way interactions
– model identity(y) = class(a b c d e f g a*b a*c a*d);
• deviations-from-means (effects or (1, 0, -1)) coding, with an A reference level of '1' and a B reference
level of '2'
– model identity(y) = class(a|b / deviations zero='1' '2');
• cell-means coding (implicit intercept)
– model identity(y) = class(a*b / zero=none);
• reference cell model
– model identity(y) = class(a|b / zero='1' '1');
• reference line with change in line parameters
– model identity(y) = class(a) | identity(x);
• reference curve with change in curve parameters
– model identity(y) = class(a) | spline(x);
• separate curves and intercepts
– model identity(y) = class(a / zero=none) | spline(x);
• quantitative effects with interaction
– model identity(y) = identity(x1 | x2);
• separate quantitative effects with interaction within each cell
– model identity(y) = class(a * b / zero=none) | identity(x1 | x2);
126
11.1.2 Example 65.4: Transformation Regression of Exhaust Emissions Data
In this example, the MORALS algorithm is applied to data from an experiment in which nitrogen oxide
emissions from a single cylinder engine are measured for various combinations of fuel, compression ratio, and
equivalence ratio. The data are provided by Brinkman (1981).
The equivalence ratio and nitrogen oxide variables are continuous and numeric, so spline transformations
of these variables are requested. Each spline is degree three with nine knots (one at each decile) in order
to allow PROC TRANSREG a great deal of freedom in finding transformations. The compression ratio
variable has only five discrete values, so an optimal scoring is requested. The character variable Fuel is
nominal, so it is designated as a classification variable. No monotonicity constraints are placed on any of
the transformations. Observations with missing values are excluded with the NOMISS a-option.
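The statements that produced this nonparametric fit are not reproduced in the notes; a sketch of what they
might look like, using the SPLINE, OPSCORE, and CLASS transformations described above (the exact options
and output data set name are assumptions), is

proc transreg data=Gas test nomiss;
   title2 'Iteratively Estimate NOx, CPRATIO and EQRATIO';
   model spline(NOx / nknots=9) = spline(EqRatio / nknots=9)   /* degree-3 splines, 9 knots */
                                  opscore(CpRatio)             /* optimal scoring           */
                                  class(Fuel / zero=none);     /* nominal classification    */
   output out=Results1;
run;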
The squared multiple correlation for the initial model is less than 0.25. PROC TRANSREG increases
the R2 to over 0.95 by transforming the variables. The transformation plots show how each variable is
transformed. The transformation of compression ratio (TCpRatio) is nearly linear. The transformation of
equivalence ratio (TEqRatio) is nearly parabolic. It can be seen from this plot that the optimal transfor-
mation of equivalence ratio is nearly uncorrelated with the original scoring. This suggests that the large
increase in R2 is due to this transformation. The transformation of nitrogen oxide (TNOx) is something like
a log transformation.
These results suggest the parametric model

\[ \log(NOx) = \beta_0 + \beta_1 EqRatio + \beta_2 EqRatio^2 + \beta_3 CpRatio + \sum_j \beta_{(j)} Fuel_{(j)} + error. \]
You can perform this analysis with PROC TRANSREG using the following MODEL statement:
model log(NOx)= psp(EqRatio / deg=2) identity(CpRatio)
class(Fuel / zero=first);
The LOG transformation computes the natural log. The PSPLINE expansion expands EqRatio into a linear
term, EqRatio, and a squared term, EqRatio2. A linear transformation of CpRatio and a dummy variable
expansion of Fuel is requested with the first level as the reference level. These should provide a good
parametric operationalization of the optimal transformations. The final model has an R2 of 0.91 (smaller
than before since the model uses fewer degrees of freedom, but still quite good).
The following statements produce Output 65.4.1 through Output 65.4.3:
title ’Gasoline Example’;
data Gas;
input Fuel :$8. CpRatio EqRatio NOx @@;
label Fuel = ’Fuel’
CpRatio = ’Compression Ratio (CR)’
EqRatio = ’Equivalence Ratio (PHI)’
NOx = ’Nitrogen Oxide (NOx)’;
datalines;
Ethanol 12.0 0.907 3.741 Ethanol 12.0 0.761 2.295
Ethanol 12.0 1.108 1.498 Ethanol 12.0 1.016 2.881
Ethanol 12.0 1.189 0.760 Ethanol 9.0 1.001 3.120
Ethanol 9.0 1.231 0.638 Ethanol 9.0 1.123 1.170
Ethanol 12.0 1.042 2.358 Ethanol 12.0 1.215 0.606
Ethanol 12.0 0.930 3.669 Ethanol 12.0 1.152 1.000
Ethanol 15.0 1.138 0.981 Ethanol 18.0 0.601 1.192
Ethanol 7.5 0.696 0.926 Ethanol 12.0 0.686 1.590
Ethanol 12.0 1.072 1.806 Ethanol 15.0 1.074 1.962
Ethanol 15.0 0.934 4.028 Ethanol 9.0 0.808 3.148
Ethanol 9.0 1.071 1.836 Ethanol 7.5 1.009 2.845
Ethanol 7.5 1.142 1.013 Ethanol 18.0 1.229 0.414
Ethanol 18.0 1.175 0.812 Ethanol 15.0 0.568 0.374
Ethanol 15.0 0.977 3.623 Ethanol 7.5 0.767 1.869
Ethanol 7.5 1.006 2.836 Ethanol 9.0 0.893 3.567
Ethanol 15.0 1.152 0.866 Ethanol 15.0 0.693 1.369
Ethanol 15.0 1.232 0.542 Ethanol 15.0 1.036 2.739
Ethanol 15.0 1.125 1.200 Ethanol 9.0 1.081 1.719
Ethanol 9.0 0.868 3.423 Ethanol 7.5 0.762 1.634
Ethanol 7.5 1.144 1.021 Ethanol 7.5 1.045 2.157
Ethanol 18.0 0.797 3.361 Ethanol 18.0 1.115 1.390
Ethanol 18.0 1.070 1.947 Ethanol 18.0 1.219 0.962
Ethanol 9.0 0.637 0.571 Ethanol 9.0 0.733 2.219
Ethanol 9.0 0.715 1.419 Ethanol 9.0 0.872 3.519
Ethanol 7.5 0.765 1.732 Ethanol 7.5 0.878 3.206
Ethanol 7.5 0.811 2.471 Ethanol 15.0 0.676 1.777
Ethanol 18.0 1.045 2.571 Ethanol 18.0 0.968 3.952
Ethanol 15.0 0.846 3.931 Ethanol 15.0 0.684 1.587
Ethanol 7.5 0.729 1.397 Ethanol 7.5 0.911 3.536
Ethanol 7.5 0.808 2.202 Ethanol 7.5 1.168 0.756
Indolene 7.5 0.831 4.818 Indolene 7.5 1.045 2.849
Indolene 7.5 1.021 3.275 Indolene 7.5 0.970 4.691
Indolene 7.5 0.825 4.255 Indolene 7.5 0.891 5.064
Indolene 7.5 0.710 2.118 Indolene 7.5 0.801 4.602
Indolene 7.5 1.074 2.286 Indolene 7.5 1.148 0.970
Indolene 7.5 1.000 3.965 Indolene 7.5 0.928 5.344
Indolene 7.5 0.767 3.834 Ethanol 7.5 0.749 1.620
Ethanol 7.5 0.892 3.656 Ethanol 7.5 1.002 2.964
82rongas 7.5 0.873 6.021 82rongas 7.5 0.987 4.467
82rongas 7.5 1.030 3.046 82rongas 7.5 1.101 1.596
82rongas 7.5 1.173 0.835 82rongas 7.5 0.931 5.498
82rongas 7.5 0.822 5.470 82rongas 7.5 0.749 4.084
82rongas 7.5 0.625 0.716 94%Eth 7.5 0.818 2.382
94%Eth 7.5 1.128 1.004 94%Eth 7.5 1.191 0.623
94%Eth 7.5 1.132 1.030 94%Eth 7.5 0.993 2.593
94%Eth 7.5 0.866 2.699 94%Eth 7.5 0.910 3.177
94%Eth 12.0 1.139 1.151 94%Eth 12.0 1.267 0.474
94%Eth 12.0 1.017 2.814 94%Eth 12.0 0.954 3.308
94%Eth 12.0 0.861 3.031 94%Eth 12.0 1.034 2.537
94%Eth 12.0 0.781 2.403 94%Eth 12.0 1.058 2.412
94%Eth 12.0 0.884 2.452 94%Eth 12.0 0.766 1.857
94%Eth 7.5 1.193 0.657 94%Eth 7.5 0.885 2.969
94%Eth 7.5 0.915 2.670 Ethanol 18.0 0.812 3.760
Ethanol 18.0 1.230 0.672 Ethanol 18.0 0.804 3.677
Ethanol 18.0 0.712 . Ethanol 12.0 0.813 3.517
Ethanol 12.0 1.002 3.290 Ethanol 9.0 0.696 1.139
Ethanol 9.0 1.199 0.727 Ethanol 9.0 1.030 2.581
Ethanol 15.0 0.602 0.923 Ethanol 15.0 0.694 1.527
Ethanol 15.0 0.816 3.388 Ethanol 15.0 0.896 .
Ethanol 15.0 1.037 2.085 Ethanol 15.0 1.181 0.966
Ethanol 7.5 0.899 3.488 Ethanol 7.5 1.227 0.754
Indolene 7.5 0.701 1.990 Indolene 7.5 0.807 5.199
Indolene 7.5 0.902 5.283 Indolene 7.5 0.997 3.752
Indolene 7.5 1.224 0.537 Indolene 7.5 1.089 1.640
Ethanol 9.0 1.180 0.797 Ethanol 7.5 0.795 2.064
Ethanol 18.0 0.990 3.732 Ethanol 18.0 1.201 0.586
Methanol 7.5 0.975 2.941 Methanol 7.5 1.089 1.467
Methanol 7.5 1.150 0.934 Methanol 7.5 1.212 0.722
Methanol 7.5 0.859 2.397 Methanol 7.5 0.751 1.461
Methanol 7.5 0.720 1.235 Methanol 7.5 1.090 1.347
Methanol 7.5 0.616 0.344 Gasohol 7.5 0.712 2.209
Gasohol 7.5 0.771 4.497 Gasohol 7.5 0.959 4.958
Gasohol 7.5 1.042 2.723 Gasohol 7.5 1.125 1.244
Gasohol 7.5 1.097 1.562 Gasohol 7.5 0.984 4.468
Gasohol 7.5 0.928 5.307 Gasohol 7.5 0.889 5.425
Gasohol 7.5 0.827 5.330 Gasohol 7.5 0.674 1.448
Gasohol 7.5 1.031 3.164 Methanol 7.5 0.871 3.113
Methanol 7.5 1.026 2.551 Methanol 7.5 0.598 0.204
Indolene 7.5 0.973 5.055 Indolene 7.5 0.980 4.937
Indolene 7.5 0.665 1.561 Ethanol 7.5 0.629 0.561
Ethanol 9.0 0.608 0.563 Ethanol 12.0 0.584 0.678
Ethanol 15.0 0.562 0.370 Ethanol 18.0 0.535 0.530
94%Eth 7.5 0.674 0.900 Gasohol 7.5 0.645 1.207
Ethanol 18.0 0.655 1.900 94%Eth 7.5 1.022 2.787
94%Eth 7.5 0.790 2.645 94%Eth 7.5 0.720 1.475
94%Eth 7.5 1.075 2.147
;
129
*-Fit the Parametric Model Suggested by the Nonparametric Analysis-;
proc transreg data=Gas dummy ss2 short nomiss;
title ’Gasoline Example’;
title2 ’Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +’;
title3 ’b3*CpRatio + Sum b(j)*Fuel(j) + Error’;
model log(NOx)= pspline(EqRatio / deg=2) identity(CpRatio)
class(Fuel / zero=first);
output out=Results2;
run;
The SAS output is;
Gasoline Example
Iteratively Estimate NOx, CPRATIO and EQRATIO
Algorithm converged.
Sum of Mean
Source DF Squares Square F Value Liberal p
The above statistics are not adjusted for the fact that the dependent
variable was transformed and so are generally liberal.
Gasoline Example
Iteratively Estimate NOx, CPRATIO and EQRATIO
The TRANSREG Procedure
The Wilks’ Lambda, Pillai’s Trace, and Hotelling-Lawley Trace statistics are a conservative
adjustment of the normal statistics. Roy’s Greatest Root is liberal. These statistics are
normally defined in terms of the squared canonical correlations which are the eigenvalues of the
matrix H*inv(H+E). Here the R-Square is used for the first eigenvalue and all other eigenvalues
are set to zero since only one linear combination is used. Degrees of freedom are computed
assuming all linear combinations contribute to the Lambda and Trace statistics, so the F tests
for those statistics are conservative. The p values for the liberal and conservative statistics
provide approximate lower and upper bounds on p. A liberal test statistic with conservative
degrees of freedom and a conservative test statistic with liberal degrees of freedom yield at
best an approximate p value, which is indicated by a "~" before the p value.
Gasoline Example
Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +
b3*CpRatio + Sum b(j)*Fuel(j) + Error
Log(NOx)
Algorithm converged.
Sum of Mean
Source DF Squares Square F Value Pr > F
131
Root MSE 0.21573 R-Square 0.9142
Dependent Mean 0.63130 Adj R-Sq 0.9099
Coeff Var 34.17294
Gasoline Example
Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +
b3*CpRatio + Sum b(j)*Fuel(j) + Error
Type II
Sum of Mean
Variable DF Coefficient Squares Square F Value Pr > F Label
Chapter 12
Principal Component Regression
In the previous chapters we have addressed the problem of having an unstable model by either
1. model selection using one of the stepwise procedures, or
2. transformations such as the Box-Cox transformation or PROC TRANSREG.
In this chapter we want to consider another approach whereby we use a subset model in which the new
variables are linear combinations of the original variables. The approach is called principal component
regression. Before considering an example using this method it is necessary to review some matrix algebra
concerning eigenvalues and eigenvectors.
An eigenvalue problem is defined by the equation
A~x = λ~x,
or equivalently (A − λI)~x = 0, which has a nontrivial solution ~x ≠ 0 only if
|A − λI| = 0.
This last equation is called the characteristic equation for an n × n matrix A. The matrix A has up to n
roots or solutions for the value λ in the characteristic equation. These solutions are called the characteristic
values, characteristic roots, or eigenvalues of the matrix A. Suppose that λ_1 is a solution and A~x_1 = λ_1~x_1;
then ~x_1 is said to be a characteristic vector or eigenvector of A corresponding to the eigenvalue λ_1. Note:
the eigenvalues may or may not be real numbers.
136
12.1.1 Properties of Eigenvalues–Eigenvectors
1. The n × n matrix A has at least one eigenvalue equal to zero if and only if A is singular.
2. The matrices A, C^{-1}AC, and CAC^{-1} have the same set of eigenvalues for any nonsingular matrix C.
3. The matrices A and A0 have the same set of eigenvalues but need not have the same eigenvectors.
4. Let A be a nonsingular matrix with eigenvalue λ, then 1/λ is an eigenvalue of A−1 .
5. The eigenvectors are not unique, for if ~x_1 is an eigenvector corresponding to λ_1 then c~x_1 is also an
eigenvector, since A(c~x_1) = λ_1(c~x_1), for any nonzero value c.
6. Let A be an n × n real matrix, then there exists a nonsingular, complex matrix Q such that Q0 AQ = T ,
where T is a complex, upper triangular matrix, and the eigenvalues of A are the diagonal elements of
the matrix T .
7. Suppose that A is a symmetric matrix;
(a) Then the eigenvalues of A are real numbers.
(b) For each eigenvalue there exists a real eigenvector (each element is a real number).
(c) Let λ1 and λ2 be eigenvalues of A with corresponding eigenvectors ~x1 and ~x2 , then ~x1 and ~x2 are
orthogonal vectors, that is, ~x01 ~x2 = 0.
(d) There exists an orthogonal matrix P (P 0 P = P P 0 = In ) such that P 0 AP = D, where D is a
diagonal matrix whose diagonal elements are the eigenvalues of the matrix A.
The principal components of X = (~x_1, ~x_2, . . . , ~x_p) are linear combinations ~y_i = ~l_i'X chosen to maximize
var(~y_i) = ~l_i'Σ~l_i subject to ~l_i'~l_i = 1, with cov(~y_i, ~y_k) = ~l_i'Σ~l_k = 0 for i ≠ k and
var(~y_1) ≥ var(~y_2) ≥ . . . ≥ var(~y_p), where ~l_i' = (l_{1i}, l_{2i}, . . . , l_{pi}).
This problem has the following solution:
1. Suppose that the matrix Σ has associated real eigenvalue–eigenvector pairs (λ_i, ~e_i) where λ_1 ≥ λ_2 ≥ . . . ≥ λ_p ≥ 0;
then the ith principal component is given by
~y_i = ~e_i'X = e_{i1}~x_1 + e_{i2}~x_2 + . . . + e_{ip}~x_p,
and var(~y_i) = λ_i for i = 1, 2, . . . , p, with cov(~y_i, ~y_k) = ~e_i'Σ~e_k = 0 for i ≠ k. Note that the eigenvalues
λ_i are unique; however, the eigenvectors (and hence the components ~y_i) are not.
2. The total variance for the p dimensions is tr[Σ] = \sum_{i=1}^{p} λ_i. Hence, the proportion of variance
explained by the kth principal component is λ_k / \sum_{i=1}^{p} λ_i.
3. If the matrix X is centered and scaled so that Σ is the correlation matrix, then \sum_{i=1}^{p} λ_i = p.
137
12.2.1 Example – FOC Sales
This example uses the FOC sales data found in Dielman's text.
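The SAS statements for this step do not appear in the notes. A minimal PROC PRINCOMP sketch that would
produce the summaries listed below (the data set name foc and the variable list are assumptions based on
names appearing later in the output) is

proc princomp data=foc;
   var fov composite industrial trans utility finance prod house;  /* the eight predictors */
run;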
Observations 265
Variables 8
Simple Statistics
Correlation Matrix
Eigenvalues of the Correlation Matrix
Eigenvectors
In order to consider this problem in the regression setting, consider the following output from the multiple
regression problem.
139
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS Type II SS
Collinearity Diagnostics
Collinearity Diagnostics
------------------------Proportion of Variation------------------------
Number TRANS UTILITY FINANCE PROD HOUSE
Durbin-Watson D 1.821
Number of Observations 265
1st Order Autocorrelation 0.085
From the condition numbers one observes that only 7 variables are needed when including the intercept.
The ridge trace plots confirm the fact that the model is very unstable.
If one considers the model selection solution one has the following:
Regular Linear regression
141
Summary of Stepwise Selection
Number of Root
Extracted Mean Prob >
Factors PRESS T**2 T**2
Number of          Model Effects             Dependent Variables
Extracted
Factors          Current       Total        Current        Total
    1            78.7219     78.7219        62.4695      62.4695
    2            11.1137     89.8356         8.8480      71.3175
    3             8.0377     97.8734         3.2763      74.5938

Number of
Extracted
Factors         FOV   COMPOSITE  INDUSTRIAL      TRANS    UTILITY    FINANCE       PROD      HOUSE
    1      0.196279    0.395296    0.394798   0.385564   0.380283   0.392185   0.389979   0.224016
    2      0.767804    0.025306    0.036677   0.016319  -0.079879   0.029463  -0.054709  -0.630859
    3      0.601657   -0.149454   -0.144876  -0.201304  -0.111648  -0.169716   0.064182   0.713282
One can now use the principal components as independent variables in a new regression problem.
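The scores t1-t3 in the data set outpls used below could be produced, for example, with PROC PLS and the
principal-components method; the following sketch uses assumed data set and option values (nfac=3 and
cv=one), chosen to be consistent with the extraction summary shown above.

proc pls data=foc method=pcr nfac=3 cv=one;
   model sales = fov composite industrial trans utility finance prod house;
   output out=outpls xscore=t;       /* creates the score variables t1-t3 */
run;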
proc reg data=outpls;
model sales = t1-t3;run;
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
144
Chapter 13
Robust Regression
The purpose of this chapter is to introduce some additional regression procedures which are robust, that is,
not highly affected by the presence of outliers in the data. In statistics there are procedures which are called
robust estimators. For example, the sample median is a robust estimator of the population mean, µ, for a
symmetric p.d.f. since it is not affected by outlying observations, whereas the sample mean is not. In this
chapter we discuss how these estimation procedures are used in the regression problem.
The methods that we consider here are called M-estimators since they are "maximum likelihood type"
estimators. That is, suppose the errors are independently distributed and all follow the same distribution
f(ε). Then the maximum likelihood estimator (MLE) of β is given by β̂, which maximizes the quantity

\[ \prod_{i=1}^{n} f(y_i - x_i'\beta), \]

where x_i' is the ith row of X, i = 1, 2, . . . , n, in the model Y = Xβ + ε. Equivalently, the MLE of β maximizes

\[ \sum_{i=1}^{n} \ln f(y_i - x_i'\beta). \]

When the data are normally distributed, this leads to minimizing the sum of squares

\[ \sum_{i=1}^{n} (y_i - x_i'\beta)^2. \]

If the p.d.f. is the double exponential, one instead minimizes

\[ \sum_{i=1}^{n} | y_i - x_i'\beta |. \]

The basic idea is to extend this as follows. Suppose ρ(u) is a given function of u and suppose that s is an
estimate of scale. Define a robust estimator as one that minimizes

\[ \sum_{i=1}^{n} \rho(\epsilon_i/s) = \sum_{i=1}^{n} \rho\!\left(\frac{y_i - x_i'\beta}{s}\right). \]

Minimizing this expression with respect to the parameters β_j gives the estimating equations

\[ \sum_{i=1}^{n} x_{ij}\, \psi\!\left(\frac{y_i - x_i'\beta}{s}\right) = 0, \qquad j = 0, 1, \ldots, k, \]
where ψ(u) = ∂ρ/∂u and x_{ij} is the jth entry of x_i' = (1, x_{i1}, x_{i2}, . . . , x_{ik}). This system does not
usually have a closed-form solution, so it is solved with a procedure called iteratively reweighted least squares
(IRLS). That is, define the weights by

\[ w_{i\beta} = \frac{\psi\big((y_i - x_i'\beta)/s\big)}{(y_i - x_i'\beta)/s}, \]

so that the estimating equations become the weighted normal equations

\[ X' W_\beta X \beta = X' W_\beta Y, \]

where W_β is a diagonal matrix with elements w_{iβ}. These are the weighted least squares equations, which
can be solved for β as

\[ \hat{\beta} = (X' W_\beta X)^{-1} X' W_\beta Y. \]

This result is used in an iterative scheme,

\[ \hat{\beta}_{q+1} = (X' W_q X)^{-1} X' W_q Y, \qquad q = 0, 1, 2, \ldots, \]

where the procedure terminates whenever the specified convergence criterion is met. SAS PROC NLIN
allows for the IRLS procedure.
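As a concrete illustration (a standard choice, stated here for reference), Huber's ρ-function is

\[ \rho(u) = \begin{cases} u^2/2, & |u| \le b, \\ b|u| - b^2/2, & |u| > b, \end{cases} \qquad
   \psi(u) = \begin{cases} u, & |u| \le b, \\ b\,\mathrm{sign}(u), & |u| > b, \end{cases} \]

which gives IRLS weights w(u) = ψ(u)/u equal to 1 when |u| ≤ b and b/|u| otherwise; these are exactly the
weights used in the Huber PROC NLIN step of the phone-data example below (with b = 2).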
146
;
title ’Steel Example’;
symbol1 v=star color=black;
symbol2 v=none i=join color=red;
symbol3 v=none i=join color=blue;
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Weighted
Iter b0 b1 SS
Estimation Summary
Method Gauss-Newton
Iterations 11
R 5.312E-6
PPC(b0) 0.000039
RPC(b0) 0.000088
Object 9.54E-7
Objective 2064.649
Observations Read 10
Observations Used 10
Observations Missing 0
Approx
Parameter Estimate Std Error Approximate 95% Confidence Limits
b0 1.0000000 -0.7274205
b1 -0.7274205 1.0000000
and corresponding graphs are;
13.1.2 Phone Data Example
A more illustrative example is as follows; The SAS code is,
option ls=80 center nodate;
symbol1 v=star color=black;
symbol2 v=none i=join color=red;
symbol3 v=none i=join color=blue;
data phones;
input year calls @@;
cards;
50 4.4 51 4.7 52 4.7 53 5.9 54 6.6
55 7.3 56 8.1 57 8.8 58 10.6 59 12.0
60 13.5 61 14.9 62 16.1 63 21.2 64 119.0
65 124.0 66 142.0 67 159.0 68 182.0 69 212.0
70 43.0 71 24.0 72 27.0 73 29.0
;run;
title ’Analysis of Phone Data’;
*Robust procedure using Proc NLIN with iterated reweighted least squares;
title2 ’Tukey Biweight using IRLS’;
title3 ’Robust = red LS = blue’;
proc nlin data=a nohalve;
parms b0=-260 b1=5;
model calls = b0 + b1*year;
resid=calls-model.calls;
p=model.calls;
sigma=20;der.b0=1;der.b1=year;
b=4.685;
r=abs(resid/sigma);
if r<=b then _weight_=(1-(r/b)**2);
else _weight_=0;
output out=c r=rbi p=p; run;
data c; set c; sigma=40; b=4.685; r=abs(rbi/sigma);
if r<=b then _weight_=(1-(r/b)**2);
else _weight_=0;
proc gplot data=c;
plot calls*year p*year yhat*year/ overlay;
run;
*Robust procedure using Proc NLIN with iterated reweighted least squares;
title2 ’Andrews method using IRLS’;
title3 ’Robust = red LS = blue’;
proc nlin data=a nohalve;
parms b0=-260 b1=5;
model calls = b0 + b1*year;
resid=calls-model.calls;
p=model.calls;
sigma=20;der.b0=1;der.b1=year;
b=4.21;
r=abs(resid/sigma);
if r<=b then _weight_=b*sin(r/b)/r;
else _weight_=0;
output out=c r=rbi p=p; run;
data c; set c; sigma=40; b=4.21; r=abs(rbi/sigma);
if r<=b then _weight_=b*sin(r/b)/r;
else _weight_=0;
proc gplot data=c;
plot calls*year p*year yhat*year/ overlay;
run;
*Robust procedure using Proc NLIN with iterated reweighted least squares;
title2 ’Hubers method using IRLS’;
title3 ’Robust = red LS = blue’;
proc nlin data=a nohalve;
parms b0=-260 b1=5;
model calls = b0 + b1*year;
resid=calls-model.calls;
p=model.calls;
sigma=20;der.b0=1;der.b1=year;
b=2;
r=abs(resid/sigma);
if r>=b then _weight_=b/abs(r);
else _weight_=1;
output out=c r=rbi p=p; run;
data c; set c; sigma=40; b=2; r=abs(rbi/sigma);
if r>=b then _weight_=b/abs(r);
else _weight_=1;
proc gplot data=c;
plot calls*year p*year yhat*year/ overlay;
run;
*Robust procedure using Proc NLIN with iterated reweighted least squares;
title2 ’Ramseys method using IRLS’;
title3 ’Robust = red LS = blue’;
proc nlin data=a nohalve;
parms b0=-260 b1=5;
model calls = b0 + b1*year;
resid=calls-model.calls;
p=model.calls;
sigma=20;der.b0=1;der.b1=year;
b=0.3;
r=abs(resid/sigma);
_weight_=exp(-b*abs(r));
output out=c r=rbi p=p; run;
data c; set c; sigma=40; b=0.3; r=abs(rbi/sigma);
if r<=b then _weight_=exp(-b*abs(r));
else _weight_=0;
proc gplot data=c;
plot calls*year p*year yhat*year/ overlay;
run;
The SAS output is;
Analysis of Phone Data
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
______________________________________________________
Analysis of Phone Data
Tukey Biweight using IRLS
Robust = red LS = blue
Weighted
Iter b0 b1 SS
0 -260.0 5.0000 20671.7
1 -166.1 3.2452 9784.0
2 -82.1794 1.6768 664.4
3 -63.5502 1.3053 302.8
4 -63.2587 1.3000 302.7
5 -63.2555 1.2999 302.7
Estimation Summary
Method Gauss-Newton
Iterations 5
R 1.392E-6
PPC(b0) 5.676E-7
RPC(b0) 0.000051
Object 2.276E-6
Objective 302.7444
Observations Read 24
Observations Used 18
Observations Missing 6
b0 1.0000000 -0.9928513
b1 -0.9928513 1.0000000
154
_______________________________________________________
Analysis of Phone Data
Andrews method using IRLS
Robust = red LS = blue
Weighted
Iter b0 b1 SS
Estimation Summary
Method Gauss-Newton
Iterations 5
R 2.767E-8
PPC(b0) 1.132E-8
RPC(b0) 4.999E-6
Object 4.418E-8
Objective 307.7286
Observations Read 24
Observations Used 18
Observations Missing 6
155
Approx Approximate 95% Confidence
Parameter Estimate Std Error Limits
b0 1.0000000 -0.9928441
b1 -0.9928441 1.0000000
__________________________________________________________
Analysis of Phone Data
Hubers method using IRLS
Robust = red LS = blue
Weighted
Iter b0 b1 SS
Estimation Summary
Method Gauss-Newton
Iterations 10
R 6.595E-6
PPC(b0) 7.908E-6
RPC(b0) 0.000042
Object 0.000013
Objective 32251.95
Observations Read 24
Observations Used 24
Observations Missing 0
____________________________________________________________
Analysis of Phone Data
Hubers method using IRLS
Robust = red LS = blue
b0 1.0000000 -0.9932326
b1 -0.9932326 1.0000000
Weighted
Iter b0 b1 SS
6 -128.7 2.5465 15700.3
7 -127.2 2.5182 15639.8
8 -126.5 2.5052 15612.3
9 -126.2 2.4994 15599.9
10 -126.1 2.4967 15594.2
11 -126.0 2.4955 15591.6
12 -126.0 2.4949 15590.5
13 -126.0 2.4947 15590.0
14 -126.0 2.4946 15589.7
15 -125.9 2.4945 15589.6
Estimation Summary
Method Gauss-Newton
Iterations 15
R 6.817E-6
PPC(b0) 9.849E-6
RPC(b0) 0.000022
Object 6.892E-6
Objective 15589.62
Observations Read 24
Observations Used 24
Observations Missing 0
_________________________________________________________
Analysis of Phone Data
Ramseys method using IRLS
Robust = red LS = blue
b1 2.4945 0.9421 0.5408 4.4483
b0 1.0000000 -0.9933620
b1 -0.9933620 1.0000000
with corresponding graphs;
13.1.3 Phone Example with Splus
The Splus script for running this example is;
library(MASS)                    # rlm() and lqs() are provided by the MASS library
phones.lm <- lm(calls ~ year, phones)
attach(phones); plot(year, calls); detach()
abline(phones.lm$coef)
abline(rlm(calls ~ year, phones, maxit=50), lty=2, col=2)
abline(lqs(calls ~ year, phones), lty=3, col=3)
legend(locator(1), legend=c("least squares", "M-estimate", "LTS"), lty=1:3, col=1:3)
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) -260.0592 102.6070 -2.5345 0.0189
year 5.0415 1.6579 3.0408 0.0060
Coefficients:
Value Std. Error t value
(Intercept) -102.6222 26.6082 -3.8568
year 2.0414 0.4299 4.7480
Residuals:
Min 1Q Median 3Q Max
-68.15 -29.46 -11.52 22.74 132.7
Coefficients:
Value Std. Error t value
(Intercept) -227.9250 101.8740 -2.2373
year 4.4530 1.6461 2.7052
Coefficients:
Value Std. Error t value
(Intercept) -52.3025 2.7530 -18.9985
year 1.0980 0.0445 24.6846
Chapter 14
PROC AUTOREG
In this chapter I wish to consider a SAS procedure called AUTOREG, which has some useful techniques for
detecting and handling violations of the independence and homogeneous-error assumptions of linear regression.
I will consider two examples which illustrate how one might use this procedure. The first violates the
independence assumption for the residuals. The second is an example in which one has a heteroscedastic
error structure. In both cases, PROC AUTOREG provides a number of capabilities that are beyond the
scope of this course.
The linear regression model is

\[ y = X\beta + e, \]

where

\[ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
   X = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{(p-1)1} \\
                       1 & x_{12} & x_{22} & \cdots & x_{(p-1)2} \\
                       \vdots & \vdots & \vdots & & \vdots \\
                       1 & x_{1n} & x_{2n} & \cdots & x_{(p-1)n} \end{pmatrix}, \quad
   \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad
   e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}, \]

and
y is an n × 1 vector of dependent observations,
X is an n × p matrix of independent observations or known values,
β is a p × 1 vector of parameters,
e is an n × 1 vector of unobservable errors or residuals.
Adding the distributional assumption that the unobserved errors e_i, i = 1, 2, . . . , n, are i.i.d. normal with
mean 0 and variance σ², this becomes

\[ e \sim N_n(0, \sigma^2 I_n). \]
In this section one considers the problem where the error covariance matrix is V ≠ σ²I_n and the errors
behave according to an autoregressive error structure given by

\[ e_i = \rho_1 e_{i-1} + \rho_2 e_{i-2} + \cdots + \rho_p e_{i-p} + \epsilon_i, \]

where the ε_i are independent N(0, σ²) disturbances. This error structure is called an AR(p) model, an
autoregressive model of order p. When p = 1, the model is said to be an AR(1) or Markov error model, and
it follows that

\[ var(e_i) = \frac{\sigma^2}{1-\rho_1^2} \quad \text{and} \quad cov(e_i, e_{i-j}) = \rho_1^{\,j}\, \frac{\sigma^2}{1-\rho_1^2}. \]
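For the AR(1) model, a common estimation strategy (essentially what the Yule-Walker estimates reported by
PROC AUTOREG below are based on) is to apply least squares to the transformed model

\[ y_i - \rho y_{i-1} = \beta_0(1-\rho) + \beta_1(x_i - \rho x_{i-1}) + \epsilon_i, \qquad i = 2, \ldots, n, \]

using an estimate of ρ obtained from the ordinary least squares residuals.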
25.64 153.1
26.36 157.3
26.98 160.7
27.52 164.2
27.78 165.6
28.24 168.7
28.78 171.7
;
title ’Blaisdell Data’;
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=join v=none color=blue;
symbol3 i=join color=green v=none;
title2 ’Output from OLS’;
PROC reg data=blais graphics;
MODEL COMSALE=INDSALE / DW;
output out=new p=yhat r=r ucl=u lcl=l;run;
proc gplot data=new;
plot comsale*indsale=1 l*indsale=3 u*indsale=3 yhat*indsale=2/overlay;
plot r*indsale /vref=0; run;
title2 ’Output from Autoreg’;
PROC AUTOREG data=blais;
model comsale=indsale/nlag=1 archtest dwprob;
output out=new2 p=yhat r=r ucl=u lcl=l; RUN;
proc gplot data=new2;
plot comsale*indsale=1 yhat*indsale=2 u*indsale=3 l*indsale=3/overlay;
plot r*indsale /vref=0; run;
The SAS output is;
Blaisdell Data
Output from OLS
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 0.735
Number of Observations 20
1st Order Autocorrelation 0.626
_________________________________________________________________
Blaisdell Data
Output from Autoreg
168
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Estimates of Autocorrelations
Standard
Lag Coefficient Error t Value
1 -0.626005 0.189134 -3.31
Yule-Walker Estimates
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
This graph shows the confidence bands as given by ordinary least squares (PROC REG). The next graph gives
the residual plot using ordinary least squares. The next two graphs are similar, except that PROC AUTOREG
is used with an autoregressive error model of order one.
14.2 Detecting Heteroscedastic Error Structure
PROC AUTOREG also implements the theory developed for modeling conditional heteroscedasticity, the
generalized autoregressive conditional heteroscedasticity (GARCH) model, given by

\[ y_i = x_i'\beta + \nu_i, \]
\[ \nu_i = \epsilon_i - \rho_1 \nu_{i-1} - \cdots - \rho_m \nu_{i-m}, \]
\[ \epsilon_i = \sqrt{h_i}\, e_i, \]
\[ h_i = \omega + \sum_{j=1}^{q} \alpha_j \epsilon_{i-j}^2 + \sum_{k=1}^{p} \gamma_k h_{i-k}. \]
plot r*n /vref=0;
PROC AUTOREG data=mt71;
model effic = veloc volt/dw=1 dwprob archtest garch=(p=1,q=1);
output out=new2 p=yhat r=r ucl=u lcl=l cpev=vhat;
title2 ’Output from Autoreg’;
RUN;
data new3; set new2; shat=sqrt(vhat); w=1/shat;run;
proc gplot data=new3;
plot effic*n=1 yhat*n=2 u*n=3 l*n=3/overlay;
title3 ’Estimated SE and residual vs n’;
plot r*n=1 shat*n=2/vref=0 cvref=black overlay;
run;
proc reg data=new3 graphics;
model effic = veloc volt
/ noprint ss1 ss2 collinoint vif dw;
weight w;
title3 ’Weighted LS’;
plot r.*obs./vref=0;
run;
The SAS output is;
Meyers Table 7.1 Data
Output from OLS
The REG Procedure
Dependent Variable: effic
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
Parameter Estimates
Variance
Variable DF Type II SS Inflation
Intercept 1 17763 0
veloc 1 681.45025 1.00000
volt 1 3435.46225 1.00000
Durbin-Watson D 1.883
Number of Observations 40
1st Order Autocorrelation 0.016
___________________________________________________________
Output from Autoreg
9 28.8634 0.0007 30.8401 0.0003
10 28.8708 0.0013 30.8769 0.0006
11 30.5517 0.0013 30.9659 0.0011
12 32.1185 0.0013 30.9694 0.0020
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Algorithm converged.
GARCH Estimates
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Root MSE 2.14715 R-Square 0.8640
Dependent Mean 78.99993 Adj R-Sq 0.8566
Coeff Var 2.71792
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| Type I SS
Parameter Estimates
Variance
Variable DF Type II SS Inflation
Intercept 1 5456.44356 0
veloc 1 211.02182 1.00000
volt 1 872.09087 1.00000
Durbin-Watson D 2.001
Number of Observations 40
1st Order Autocorrelation -0.024
14.2.2 Draper and Smith Example Using Weighted Least Squares
In chapter 9, Draper and Smith consider the problem of Generalized Least Squares and Weighted Least
Squares. In section 9.3 they consider an example (data found in Table 9.1) where they estimate their
179
weights from modeling the sample variance for the independent data xi . The weights given in Table 9.1 are
the inverse of the estimated sample variance for the individual x0i s.
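For reference, with weight matrix W = diag(w_1, . . . , w_n) the weighted least squares estimator is

\[ \hat{\beta}_W = (X'WX)^{-1} X'W y, \]

so observations with large estimated variance (small w_i) receive less influence in the fit.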
In this example I am going to estimate the weights using PROC AUTOREG and compare the results
using weighted least squares. The SAS code is,
options center nodate pagesize=100 ls=80;
symbol1 i=none color=red v=star ;
symbol2 i=join v=none color=blue;
symbol3 i=join color=green v=none;
title1 ’Table 9.1 data X, Y, W page 226’;
data t91;
*infile ’R:/export/home/jtubbs/ftp/Regression/DS_data/15a’;
input x y w ;n=_n_;
cards;
1.15 0.99 1.24
1.90 0.98 2.18
3.00 2.60 7.84
3.00 2.67 7.84
3.00 2.66 7.84
3.00 2.78 7.84
3.00 2.80 7.84
5.34 5.92 7.43
5.38 5.35 6.99
5.40 4.33 6.78
5.40 4.89 6.78
5.45 5.21 6.30
7.70 7.68 0.89
7.80 9.81 0.84
7.81 6.52 0.83
7.85 9.71 0.82
7.87 9.82 0.81
7.91 9.81 0.79
7.94 8.50 0.78
9.03 9.47 0.47
9.07 11.45 0.46
9.11 12.14 0.45
9.14 11.50 0.45
9.16 10.65 0.44
9.37 10.64 0.41
10.17 9.78 0.31
10.18 12.39 0.31
10.22 11.03 0.30
10.22 8.00 0.30
10.22 11.90 0.30
10.18 8.68 0.31
10.50 7.25 0.28
10.23 13.46 0.30
10.03 10.19 0.32
10.23 9.93 0.30
;
proc reg data=t91 graphics;
model y = x / dw; run;
title2 ’Least Squares’;
title3 ’Regression Plot’; plot y*x p.*x uclm.*x lclm.*x/overlay; run;
title3 ’Residual Plot’; plot r.*obs./ vref=0; run;
* Weighted Least Squares;
proc reg data=t91 graphics;
model y = x / dw;
weight w;run;
title2 ’Weighted LS’;
title3 ’Regression Plot’; plot y*x p.*x uclm.*x lclm.*x/overlay; run;
title3 ’Residual Plot’; plot r.*obs./ vref=0; run;
PROC AUTOREG data=t91;
* model y = x/dw=2 dwprob archtest;
model y = x/garch=(p=1,q=1);
output out=new2 p=yhat r=r ucl=u lcl=l cpev=vhat;
title2 ’Output from Autoreg’;
RUN;
data new3; set new2; shat=sqrt(vhat); wn=1/vhat;run;
proc gplot data=new3;
plot y*x=1 yhat*x=2 u*x=3 l*x=3/overlay;
title3 ’Estimated SE and residual vs n’;
plot r*n=1 shat*n=2/vref=0 cvref=black overlay;
run;
proc reg data=new3 graphics;
model y = x / dw;
weight wn;
title3 ’Weighted LS’;
run;
proc print data=new3;run;
The SAS output is;
Table 9.1 data X, Y, W page 226
The REG Procedure
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 1.952
Number of Observations 35
1st Order Autocorrelation 0.015
________________________________________________________________
Table 9.1 data X, Y, W page 226
Weighted Least Squares
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 1.662
Number of Observations 35
1st Order Autocorrelation 0.160
_________________________________________________________________
Table 9.1 data X, Y, W page 226
Output from Autoreg
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -0.5790 0.6792 -0.85 0.4001
x 1 1.1354 0.0862 13.17 <.0001
Algorithm converged.
GARCH Estimates
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Durbin-Watson D 1.929
Number of Observations 35
1st Order Autocorrelation 0.031
184
Chapter 15
Regression Trees
Classification and regression trees are a relatively new method for exploring relationships among data in
either a regression or a classification problem. These methods are very easy to use and interpret, although
they are somewhat difficult to understand. I have downloaded some material from Salford Systems, which
produces and markets a commercial product called CART, developed by Breiman and Friedman in the
early 1980's.
Splitting Rules
To split a node into two child nodes, CART always asks questions that have a ‘yes’ or ‘no’ answer. For
example, the questions might be: is age <= 55? Or is credit score <= 600?
How do we come up with candidate splitting rules? CART's method is to look at all possible splits for all
variables included in the analysis. For example, consider a data set with 215 cases and 19 variables. CART
considers up to 215 times 19 splits for a total of 4085 possible splits. Any problem has a finite number
of candidate splits, and CART conducts a brute-force search through them all.
Choosing a Split
CART’s next activity is to rank order each splitting rule on the basis of a quality-of-split criterion. The
default criterion used in CART is the GINI rule, essentially a measure of how well the splitting rule separates
the classes contained in the parent node. (Alternative criteria are also available).
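For reference (a standard definition, not taken from the Salford material), the Gini impurity of a node t with
class proportions p(j|t) is

\[ i(t) = 1 - \sum_j p(j \mid t)^2, \]

and a split is chosen to maximize the decrease in impurity from the parent node to its two child nodes.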
Class Assignment
Once a best split is found, CART repeats the search process for each child node, continuing recursively until
further splitting is impossible or stopped. Splitting is impossible if only one case remains in a particular
node or if all the cases in that node are exact copies of each other (on predictor variables). CART also allows
splitting to be stopped for several other reasons, including that a node has too few cases. (The default for
this lower limit is 10 cases, but may be set higher or lower to suit a particular analysis).
Once a terminal node is found we must decide how to classify all cases falling within it. One simple
criterion is the plurality rule: the group with the greatest representation determines the class assignment.
CART goes a step further: because each node has the potential for being a terminal node, a class assignment
is made for every node whether it is terminal or not. The rules of class assignment can be modified from
simple plurality to account for the costs of making a mistake in classification and to adjust for over- or
under-sampling from certain classes.
A common technique among the first generation of tree classifiers was to continue splitting nodes (growing
the tree) until some goodness-of-split criterion failed to be met. When the quality of a particular split fell
below a certain threshold, the tree was not grown further along that branch. When all branches from the
root reached terminal nodes, the tree was considered complete. While this technique is still embodied in
several commercial programs, including CHAID and KnowledgeSEEKER, it often yields erroneous results.
CART uses a completely different technique.
Pruning Trees
Instead of attempting to decide whether a given node is terminal or not, CART proceeds by growing trees
until it is not possible to grow them any further. Once CART has generated what we call a maximal tree,
it examines smaller trees obtained by pruning away branches of the maximal tree. Unlike other methods,
CART does not stop in the middle of the tree-growing process, because there might still be important
information to be discovered by drilling down several more levels.
Testing
Once the maximal tree is grown and a set of sub-trees are derived from it, CART determines the best tree
by testing for error rates or costs. With sufficient data, the simplest method is to divide the sample into
learning and test sub-samples. The learning sample is used to grow an overly-large tree. The test sample is
then used to estimate the rate at which cases are misclassified (possibly adjusted by misclassification costs).
The misclassification error rate is calculated for the largest tree and also for every sub-tree. The best sub-tree
is the one with the lowest or near-lowest cost, which may be a relatively small tree.
Some studies will not have sufficient data to allow a good-sized separate test sample. The tree-growing
methodology is data intensive, requiring many more cases than classical regression. When data are in short
supply, CART employs the computer-intensive technique of cross validation.
Cross Validation
Cross validation is used if data are insufficient for a separate test sample. In such cases, CART grows a
maximal tree on the entire learning sample. This is the tree that will be pruned back. CART then proceeds
by dividing the learning sample into 10 roughly-equal parts, each containing a similar distribution for the
dependent variable. CART takes the first 9 parts of the data, constructs the largest possible tree, and uses
the remaining 1/10 of the data to obtain initial estimates of the error rate of selected sub-trees. The same
process is then repeated (growing the largest possible tree) on another 9/10 of the data while using a different
1/10 part as the test sample. The process continues until each part of the data has been held in reserve one
time as a test sample. The results of the 10 mini-test samples are then combined to form error rates for
trees of each possible size; these error rates are applied to the tree based on the entire learning sample.
The upshot of this complex process is a set of fairly reliable estimates of the independent predictive
accuracy of the tree. This means that we can know how well any tree will perform on completely fresh
data, even if we do not have an independent test sample. Because the conventional methods of assessing tree
accuracy can be wildly optimistic, cross validation is the method CART normally uses to obtain objective
measures for smaller data sets.
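The cv.tree() function in the tree library carries out this V-fold scheme (K = 10 by default) over the sequence of pruned sub-trees; a sketch using the iris data for illustration follows.

library(tree)

big.tree <- tree(Species ~ ., data = iris, mindev = 0, minsize = 2)

## 10-fold cross validation over the pruning sequence; with
## FUN = prune.misclass the reported deviance is a misclassification count
set.seed(1)
cv.out <- cv.tree(big.tree, FUN = prune.misclass, K = 10)
cv.out$size                  # number of terminal nodes of each sub-tree
cv.out$dev                   # cross-validated error for each size

## prune the tree grown on the full data to the best cross-validated size
best.size  <- cv.out$size[which.min(cv.out$dev)]
final.tree <- prune.misclass(big.tree, best = best.size)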
Conclusion
CART uses a combination of exhaustive searches and computer-intensive testing techniques to identify useful
tree structures in data. It can be applied to virtually any data set and can proceed with little or no guidance
from the user. Thus, if you have a data set and have no idea how to proceed with its analysis, you can
simply hand it over to CART and let it do the work. If this sounds too good to be true, the natural question
is: does CART really deliver useful results that you can trust?
The surprising answer is a resounding yes. When automatic CART analyses are compared with stepwise
logistic regressions or discriminant analyses, CART typically performs about 10 to 15 percent better on the learning
sample. CART’s performance on test samples is even more important. Because CART does not suffer from
the statistical deficiencies that plague conventional stepwise techniques, CART will typically be far more
accurate on new data. Further, when automatic CART analyses are compared with the best parametric
models of sophisticated teams of statisticians, CART is still competitive. CART can often generate models
in an hour or two that are only slightly worse in predictive accuracy than models that may take specialists
several days to develop.
• CART is extremely robust to the effects of outliers.
Outliers among the independent variables generally do not affect CART because splits usually occur
at non-outlier values. Outliers in the dependent variable are often separated into nodes where they
no longer affect the rest of the tree. Also, in regression models, least absolute deviations can be used
instead of least squares, diminishing the effect of outliers in the dependent variable.
16 9.58 5.67 0.74 8.4 30 20 7 48.5 70.6 4
17 10.09 6.72 0.85 6.1 31 22 0 59.3 37.2 6
18 8.11 4.95 0.67 4.9 30 22 0 70.0 24.0 4
19 6.83 4.62 0.45 4.6 31 11 0 70.0 21.2 3
20 8.88 6.60 0.95 3.7 31 23 0 74.5 13.7 4
21 7.68 5.01 0.64 4.7 30 20 0 72.1 22.1 4
22 8.47 5.68 0.75 5.3 31 21 1 58.1 28.1 6
23 8.86 5.28 0.70 6.2 30 20 14 44.6 38.4 4
24 10.36 5.36 0.67 6.8 31 20 22 33.4 46.2 4
25 11.08 5.87 0.70 7.5 31 22 28 28.6 56.3 5
> attach(x1A)
> x1A.tree<-tree(y ~ x2+x3+x4+x5+x6+x7+x8+x9+x10)
> x1A.tree
node), split, n, deviance, yval
* denotes terminal node
Regression tree:
tree(formula = y ~ x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)
Variables actually used in tree construction:
[1] "x8" "x5"
Number of terminal nodes: 4
Residual mean deviance: 0.683 = 14.34 / 21
Distribution of residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.504 -0.4671 -0.07 8.882e-016 0.64 1.358
> plot(prune.tree(x1A.tree,k=2))
> text(prune.tree(x1A.tree,k=2))
> summary(prune.tree(x1A.tree,k=2))
Regression tree:
snip.tree(tree = x1A.tree, nodes = 6)
Variables actually used in tree construction:
[1] "x8"
Number of terminal nodes: 3
Residual mean deviance: 0.7003 = 15.41 / 22
Distribution of residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.504 -0.4671 -0.04375 5.329e-016 0.6363 1.684
> prune.tree(x1A.tree,k=2)
node), split, n, deviance, yval
* denotes terminal node
Cereal Example
The breakfast cereal example output becomes;
Regression tree:
tree(formula = rating ~ cal + protein + fat + sodium + fiber + carbs +
sugar + potass + vitm, data = cereal, na.action = na.exclude,
mincut = 5, minsize = 10, mindev = 0.01)
Variables actually used in tree construction:
[1] "cal" "sugar" "carbs" "fat" "fiber"
Number of terminal nodes: 11
Residual mean deviance: 105.7 = 6341 / 60
Distribution of residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-25.73 -5.23 0.4829 2.802e-015 4.76 24
node), split, n, deviance, yval
* denotes terminal node
Chapter 16
LOESS Regression
The SAS LOESS procedure is a nonparametric method for finding a regression relationship between variables.
16.1 Overview
The LOESS procedure implements a nonparametric method for estimating regression surfaces pioneered
by Cleveland, Devlin, and Grosse (1988), Cleveland and Grosse (1991), and Cleveland, Grosse, and Shyu
(1992). The LOESS procedure allows great flexibility because no assumptions about the parametric form of
the regression surface are needed.
The SAS System provides many regression procedures such as the GLM, REG, and NLIN procedures for
situations in which you can specify a reasonable parametric model for the regression surface. You can use
the LOESS procedure for situations in which you do not know a suitable parametric form of the regression
surface. Furthermore, the LOESS procedure is suitable when there are outliers in the data and a robust
fitting method is necessary.
The main features of the LOESS procedure are as follows:
• fits nonparametric models
• supports the use of multidimensional data
• supports multiple dependent variables
• supports both direct and interpolated fitting using kd trees
• performs statistical inference
• performs iterative reweighting to provide robust fitting when there are outliers in the data
• supports multiple SCORE statements
Local regression assumes that, for i = 1, 2, . . . , n, the ith response yi and the corresponding predictor value xi are related by
yi = g(xi) + ei
where g is the regression function and ei is a random error. The idea of local regression is that at a predictor
x, the regression function g(x) can be locally approximated by the value of a function in some specified
parametric class. Such a local approximation is obtained by fitting a regression surface to the data points
within a chosen neighborhood of the point x.
In the loess method, weighted least squares is used to fit linear or quadratic functions of the predictors at
the centers of neighborhoods. The radius of each neighborhood is chosen so that the neighborhood contains a
specified percentage of the data points. The fraction of the data, called the smoothing parameter, in each local
neighborhood controls the smoothness of the estimated surface. Data points in a given local neighborhood
are weighted by a smooth decreasing function of their distance from the center of the neighborhood.
In a direct implementation, such fitting is done at each point at which the regression surface is to be
estimated. A much faster computational procedure is to perform such local fitting at a selected sample of
points in predictor space and then to blend these local polynomials to obtain a regression surface.
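The same local-fitting method is available in R/S-Plus through the loess() function, which can serve as a quick check before running PROC LOESS. In the sketch below the data are simulated purely for illustration; span plays the role of the smoothing parameter (the fraction of the data in each neighborhood) and degree is the degree of the local polynomials.

## simulated data, for illustration only
set.seed(1)
x <- seq(0, 10, length.out = 101)
y <- sin(x) + rnorm(101, sd = 0.3)

## small span = wiggly fit, large span = smooth fit
fit.rough  <- loess(y ~ x, span = 0.30, degree = 2)
fit.smooth <- loess(y ~ x, span = 0.75, degree = 2)

## fitted values (with pointwise standard errors) at new predictor values
pr <- predict(fit.smooth, newdata = data.frame(x = c(2.5, 5.0)), se = TRUE)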
You can use the LOESS procedure to perform statistical inference provided the error distribution satisfies
some basic assumptions. In particular, such analysis is appropriate when the errors ei are i.i.d. normal random
variables with mean 0. By using iterative reweighting, the LOESS procedure can also provide statistical
inference when the error distribution is symmetric but not necessarily normal. Furthermore, by doing
iterative reweighting, you can use the LOESS procedure to perform robust fitting in the presence of outliers
in the data.
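In R's loess() the robust, iteratively re-weighted fit is requested with family = "symmetric"; the sketch below (simulated data with one artificial outlier) contrasts it with the ordinary local least squares fit.

## simulated data with a single gross outlier, for illustration only
set.seed(2)
x <- seq(0, 10, length.out = 101)
y <- sin(x) + rnorm(101, sd = 0.3)
y[50] <- y[50] + 5

fit.ls     <- loess(y ~ x, span = 0.5)                        # local least squares
fit.robust <- loess(y ~ x, span = 0.5, family = "symmetric")  # re-weighted (robust)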
While all output of the LOESS procedure can be optionally displayed, most often the LOESS procedure
is used to produce output data sets that will be viewed and manipulated by other SAS procedures. PROC
LOESS uses the Output Delivery System (ODS) to place results in output data sets. This is a departure
from older SAS procedures that provide OUTPUT statements to create SAS data sets from analysis results.
16.2 Examples
16.2.1 Phone Example
The SAS code is,
68 182.0
69 212.0
70 43.0
71 24.0
72 27.0
73 29.0
;run;
/*
data phones;
infile 'R:/export/home/jtubbs/ftp/Regression/phones.dat';
input year calls; run;
*/
title2 'Linear Regression';
*Least Squares Procedure;
proc reg data=phones;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.1.html';
model calls = year;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.2.html';
output out=a p=yhat;
*plot calls*year;
run;
symbol1 v=star color=black;
symbol2 v=none i=join color=red;
symbol3 v=none i=join color=blue;
*Robust procedure using Proc NLIN with iterated reweighted least squares;
title2 'Tukey Biweight using IRLS';
title3 'Robust = red LS = blue';
proc nlin data=a nohalve;
parms b0=-260 b1=5;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.3.html';
model calls = b0 + b1*year;
resid=calls-model.calls;
p=model.calls;
sigma=20;der.b0=1;der.b1=year;
b=4.685;
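* Tukey biweight: scaled residuals r = |resid|/sigma receive weight
  (1-(r/b)**2)**2 when r <= b and weight 0 otherwise (b = 4.685 is the
  usual tuning constant);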
r=abs(resid/sigma);
if r<=b then _weight_=(1-(r/b)**2)**2;
else _weight_=0;
output out=c r=rbi p=p; run;
data c; set c; sigma=40; b=4.685; r=abs(rbi/sigma);
if r<=b then _weight_=(1-(r/b)**2)**2;
else _weight_=0;
proc gplot data=c;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.4.html';
plot calls*year p*year yhat*year/ overlay;
run;
symbol1 color=blue value=dot;
symbol2 color=red interpol=spline value=none;
symbol3 color=green interpol=spline value=none;
symbol4 color=green interpol=spline value=none;
*Loess Procedure for data;
title2 'LOESS Regression';
proc loess data=phones;
ods output OutputStatistics = phonefit FitSummary=Summary;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.5.html';
model calls = year / degree=2 smooth= 0.3 0.6 1.0
details(outputstatistics) details(predatvertices);
run;
proc sort data=phonefit;
by SmoothingParameter year;
proc gplot data=phonefit;
by SmoothingParameter;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.6.html';
plot (DepVar Pred LowerCL UpperCL)*year / overlay;
run;
*Loess Procedure for residuals;
title2 'Loess for Phone Data';
proc loess data=phonefit;
ods output OutputStatistics = residout;
* ods html body='R:/export/home/jtubbs/public_html/loess.phones.7.html';
model Residual = year / degree=2 smooth= 0.3 0.6 1.0 details(kdtree)
details(outputstatistics) details(predatvertices);
run;
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Dependent Mean 49.99167 Adj R-Sq 0.2639
Coeff Var 112.46553
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
_______________________________________________________________
Phones Data
Tukey Biweight using IRLS
Robust = red LS = blue
Weighted
Iter b0 b1 SS
Estimation Summary
Method Gauss-Newton
Iterations 5
R 2.786E-6
PPC(b0) 1.131E-6
RPC(b0) 0.000052
Object 4.621E-6
Objective 296.5022
Observations Read 24
Observations Used 18
Observations Missing 6
Sum of Mean Approx
Source DF Squares Square F Value Pr > F
Approx
Parameter Estimate Std Error Approximate 95% Confidence Limits

Approximate Correlation Matrix
b0 b1
b0 1.0000000 -0.9928601
b1 -0.9928601 1.0000000
_________________________________________________________________
Phones Data
LOESS Regression
Statistic year
Vertex Predicted
Number year Value
1 50.00000 4.41320
2 51.00000 4.58248
3 52.00000 5.02038
4 53.00000 5.72984
5 54.00000 6.63094
6 55.00000 7.32320
7 56.00000 7.99172
8 57.00000 9.00883
9 58.00000 10.44531
10 59.00000 12.05414
11 60.00000 13.49227
12 61.00000 14.57516
13 62.00000 9.54896
14 63.00000 42.41547
15 64.00000 96.46968
16 65.00000 133.26582
17 66.00000 140.37578
18 67.00000 159.46406
19 68.00000 198.01022
20 69.00000 169.07405
21 70.00000 79.89311
22 71.00000 28.91523
23 72.00000 13.93099
24 73.00000 34.06159
Output Statistics
Predicted
Obs year calls calls
Dependent Variable: calls
Output Statistics
Predicted
Obs year calls calls
Fit Summary
Phones Data
LOESS Regression
Vertex Predicted
Number year Value
1 50.00000 4.34864
2 51.00000 4.66114
3 52.00000 5.10032
4 54.00000 6.36513
5 55.00000 7.19961
6 57.00000 9.23011
7 58.00000 8.87318
8 60.00000 7.67110
9 61.00000 16.01639
10 63.00000 55.63975
11 64.00000 85.66250
12 66.00000 154.05459
13 67.00000 166.58650
14 69.00000 138.61922
15 70.00000 114.06367
16 72.00000 37.10261
17 73.00000 -14.50157
Output Statistics
Predicted
Obs year calls calls
Smoothing Parameter: 0.6
Dependent Variable: calls
Output Statistics
Predicted
Obs year calls calls
Fit Summary
Phones Data
LOESS Regression
Vertex Predicted
Number year Value
1 50.00000 15.12017
2 52.00000 3.09888
3 55.00000 -1.96298
4 58.00000 9.31005
5 61.00000 36.79542
6 64.00000 90.13736
7 67.00000 104.06626
8 70.00000 89.41573
9 73.00000 40.80335
Output Statistics
Predicted
Obs year calls calls
Fit Summary
Degree of Local Polynomials 2
Smoothing Parameter 1.00000
Points in Local Neighborhood 24
Residual Sum of Squares 38006
The SAS graphs are;
16.2.2 Motor Cycle Example
The SAS graphs are;