LINEAR REGRESSION ANALYSIS
MODULE – V
Lecture - 21
Correcting Model
Inadequacies Through
Transformation and Weighting
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Analytical methods for selecting a transformation on study variable
The Box-Cox method
Suppose the normality and/or constant variance of the study variable y can be corrected through a power transformation
on y. This means y is to be transformed as $y^\lambda$, where $\lambda$ is a parameter to be determined. For example, if $\lambda = 0.5$,
then the transformation is the square root and $\sqrt{y}$ is used as the study variable in place of y.
Now the linear regression model has parameters $\beta$, $\sigma^2$ and $\lambda$. The Box-Cox method tells us how to estimate
$\lambda$ and the other parameters of the model simultaneously using the method of maximum likelihood.
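For orientation, SciPy's scipy.stats.boxcox implements this maximum likelihood idea for a single sample (without regressors, so slightly simpler than the regression setting developed below); a minimal sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=100))   # positive, right-skewed sample

# With lmbda=None, boxcox returns the transformed data together with
# the lambda that maximizes the log-likelihood.
y_t, lam_hat = stats.boxcox(y)
print(lam_hat)                     # near 0 (the log transform) for lognormal data
```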
Note that as $\lambda$ approaches zero, $y^\lambda$ approaches 1. So there is a problem at $\lambda = 0$, because this makes all the
observations on y equal to unity, and it is meaningless for all the observations on the study variable to be constant. So there is a
discontinuity at $\lambda = 0$. One approach to resolve this difficulty is to use $\dfrac{y^\lambda - 1}{\lambda}$ as the study variable.

Note that as $\lambda \to 0$, $\dfrac{y^\lambda - 1}{\lambda} \to \ln y$. So a possible solution is to use the transformed study variable
$$W = \begin{cases} \dfrac{y^\lambda - 1}{\lambda} & \text{for } \lambda \neq 0 \\[2mm] \ln y & \text{for } \lambda = 0. \end{cases}$$
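A quick numerical check of this continuity (illustrative values only):

```python
import numpy as np

# As lam -> 0, (y**lam - 1)/lam approaches ln(y), so defining
# W = ln(y) at lam = 0 makes the family continuous.
y = 3.7
for lam in [0.5, 0.1, 0.01, 0.001]:
    print(lam, (y**lam - 1) / lam)   # tends toward the limit below
print("ln y =", np.log(y))           # limiting value, about 1.308
```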
So the family W is continuous. Still, it has a drawback: as $\lambda$ changes, the values of W change dramatically, so it is difficult
to obtain the best value of $\lambda$. If different analysts obtain different values of $\lambda$, they will fit different models, and it may then
not be appropriate to compare the models with different values of $\lambda$. So it is preferable to use the alternative form
$$y^{(\lambda)} = V = \begin{cases} \dfrac{y^\lambda - 1}{\lambda\, y_*^{\lambda - 1}} & \text{for } \lambda \neq 0 \\[2mm] y_* \ln y & \text{for } \lambda = 0 \end{cases}$$
where $y_*$ is the geometric mean of the $y_i$'s, $y_* = (y_1 y_2 \cdots y_n)^{1/n}$, which is constant.
For calculation purposes, we can use $\ln y_* = \dfrac{1}{n}\sum_{i=1}^{n} \ln y_i$.
When V is applied to each $y_i$, we get $V = (V_1, V_2, \ldots, V_n)'$ as the vector of observations on the transformed study variable, and
we use it to fit the linear model $V = X\beta + \varepsilon$ by least squares or the maximum likelihood method, as sketched below.
The quantity $\lambda\, y_*^{\lambda - 1}$ in the denominator is related to the nth root of the Jacobian of the transformation. See how:
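A minimal sketch of this transform-and-fit step, assuming simulated data; box_cox_v and the variable names are illustrative, not standard functions:

```python
import numpy as np

def box_cox_v(y, lam):
    """Scaled Box-Cox transform using the geometric mean y_* of y."""
    log_gm = np.mean(np.log(y))      # ln y_* = (1/n) sum ln y_i
    gm = np.exp(log_gm)              # geometric mean y_*
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 60)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 60))   # positive response
X = np.column_stack([np.ones_like(x), x])

V = box_cox_v(y, lam=0.5)                            # transformed study variable
beta_hat, *_ = np.linalg.lstsq(X, V, rcond=None)     # fit V = X beta + eps
print(beta_hat)
```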
We want to convert $y_i$ into $y_i^{(\lambda)}$ as
$$y_i^{(\lambda)} = W_i = \frac{y_i^\lambda - 1}{\lambda}; \quad \lambda \neq 0.$$
Let
$$y = (y_1, y_2, \ldots, y_n)', \qquad W = (W_1, W_2, \ldots, W_n)'.$$
Note that if $W_1 = \dfrac{y_1^\lambda - 1}{\lambda}$, then
$$\frac{\partial W_1}{\partial y_1} = \frac{\lambda y_1^{\lambda - 1}}{\lambda} = y_1^{\lambda - 1}, \qquad \frac{\partial W_1}{\partial y_2} = 0.$$
In general,
$$\frac{\partial W_i}{\partial y_j} = \begin{cases} y_i^{\lambda - 1} & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$$
The Jacobian of the transformation is given by
$$J(y_i \to W_i) = \frac{\partial y_i}{\partial W_i} = \frac{1}{\partial W_i / \partial y_i} = \frac{1}{y_i^{\lambda - 1}}.$$
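A small finite-difference check of these derivatives (illustrative values):

```python
import numpy as np

# Verify numerically that dW_i/dy_i = y_i**(lam-1), as derived above.
lam, y1 = 0.5, 4.0
W = lambda y: (y**lam - 1) / lam
h = 1e-6
fd = (W(y1 + h) - W(y1 - h)) / (2 * h)   # central difference
print(fd, y1**(lam - 1))                  # both approximately 0.5 here
```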
$$J(W \to y) = \begin{vmatrix} \dfrac{\partial W_1}{\partial y_1} & \dfrac{\partial W_1}{\partial y_2} & \cdots & \dfrac{\partial W_1}{\partial y_n} \\[1mm] \dfrac{\partial W_2}{\partial y_1} & \dfrac{\partial W_2}{\partial y_2} & \cdots & \dfrac{\partial W_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial W_n}{\partial y_1} & \dfrac{\partial W_n}{\partial y_2} & \cdots & \dfrac{\partial W_n}{\partial y_n} \end{vmatrix} = \begin{vmatrix} y_1^{\lambda-1} & 0 & \cdots & 0 \\ 0 & y_2^{\lambda-1} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & y_n^{\lambda-1} \end{vmatrix} = \prod_{i=1}^{n} y_i^{\lambda-1} = \left( \prod_{i=1}^{n} y_i \right)^{\lambda-1}$$

$$J(y \to W) = \frac{1}{J(W \to y)} = \left( \frac{1}{\prod_{i=1}^{n} y_i} \right)^{\lambda-1}.$$
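A numerical sanity check of this determinant identity, under an assumed toy setup:

```python
import numpy as np

# The finite-difference Jacobian matrix dW/dy is diagonal with entries
# y_i**(lam-1), so its determinant matches the product formula.
lam = 0.5
y = np.array([1.0, 2.0, 4.0])

def W(y):                                 # elementwise Box-Cox transform
    return (y**lam - 1) / lam

h = 1e-6
n = len(y)
Jmat = np.zeros((n, n))
for j in range(n):                        # column j: perturb y_j only
    e = np.zeros(n); e[j] = h
    Jmat[:, j] = (W(y + e) - W(y - e)) / (2 * h)

print(np.linalg.det(Jmat))                # approximately 0.3536
print(np.prod(y**(lam - 1)))              # prod y_i**(lam-1) = 8**(-1/2)
```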
This is the Jacobian when we transform the whole vector y into the whole vector W. If an individual $y_i$ is to be
transformed into $W_i$, then we take its geometric mean as
$$J(y_i \to W_i) = \left( \frac{1}{\prod_{i=1}^{n} y_i} \right)^{\frac{\lambda-1}{n}}.$$
The quantity
$$J(y \to W) = \frac{1}{\prod_{i=1}^{n} y_i^{\lambda-1}}$$
ensures that unit volume is preserved in moving from the set of $y_i$ to the set of $V_i$. This is a factor which scales and ensures
that the residual sums of squares obtained for different values of $\lambda$ can be compared, as illustrated below.
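The following sketch, under an assumed simulated setup, shows the point: residual sums of squares on the unscaled W scale differ in magnitude across $\lambda$, while on the scaled V scale they are comparable:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 80)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 80))
X = np.column_stack([np.ones_like(x), x])
gm = np.exp(np.mean(np.log(y)))           # geometric mean y_*

def rss(z):
    b = np.linalg.lstsq(X, z, rcond=None)[0]
    return np.sum((z - X @ b)**2)

for lam in [0.1, 1.0]:
    W = (y**lam - 1) / lam                # unscaled transform
    V = W / gm**(lam - 1)                 # scaled by y_***(lam-1)
    print(lam, rss(W), rss(V))            # rss(W) not comparable; rss(V) is
```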
To find the appropriate member of the family, consider
$$y^{(\lambda)} = V = X\beta + \varepsilon$$
where
$$y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda\, y_*^{\lambda-1}}, \qquad \varepsilon \sim N(0, \sigma^2 I).$$
Applying the method of maximum likelihood, the likelihood function for $y^{(\lambda)}$ is
$$L\left(y^{(\lambda)}\right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\sum_{i=1}^{n} \varepsilon_i^2}{2\sigma^2} \right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\varepsilon'\varepsilon}{2\sigma^2} \right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)}{2\sigma^2} \right)$$
$$\ln L\left(y^{(\lambda)}\right) = -\frac{n}{2}\ln\sigma^2 - \frac{(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)}{2\sigma^2} \quad \text{(ignoring the constant).}$$
Solving
$$\frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \beta} = 0, \qquad \frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \sigma^2} = 0$$
gives the maximum likelihood estimators
$$\hat{\beta}(\lambda) = (X'X)^{-1} X' y^{(\lambda)}$$
$$\hat{\sigma}^2(\lambda) = \frac{1}{n}\, y^{(\lambda)\prime} \left[ I - X(X'X)^{-1}X' \right] y^{(\lambda)} = \frac{y^{(\lambda)\prime} H y^{(\lambda)}}{n}, \qquad H = I - X(X'X)^{-1}X',$$
for a given value of $\lambda$.
Substituting these estimates into the log-likelihood function $\ln L\left(y^{(\lambda)}\right)$ gives
$$L(\lambda) = -\frac{n}{2}\ln\hat{\sigma}^2 = -\frac{n}{2}\ln\left[ SS_{res}(\lambda) \right]$$
(up to an additive constant), where $SS_{res}(\lambda)$ is the sum of squares due to residuals, which is a function of $\lambda$. Now maximize $L(\lambda)$ with respect to $\lambda$.
It is difficult to obtain any closed form for the estimator of $\lambda$, so we maximize it numerically.
The function $-\dfrac{n}{2}\ln\left[ SS_{res}(\lambda) \right]$ is called the Box-Cox objective function.
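A minimal numerical maximization of the Box-Cox objective by grid search, assuming simulated data (ss_res is an illustrative helper; any numerical optimizer could replace the grid):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 100))  # true lambda near 0
X = np.column_stack([np.ones_like(x), x])
n = len(y)
gm = np.exp(np.mean(np.log(y)))                      # geometric mean y_*

def ss_res(lam):
    """Residual sum of squares from fitting the scaled transform V on X."""
    if abs(lam) < 1e-12:
        v = gm * np.log(y)
    else:
        v = (y**lam - 1) / (lam * gm**(lam - 1))
    b = np.linalg.lstsq(X, v, rcond=None)[0]
    return np.sum((v - X @ b)**2)

grid = np.linspace(-2, 2, 401)
objective = np.array([-(n / 2) * np.log(ss_res(l)) for l in grid])
lam_max = grid[np.argmax(objective)]
print("lambda_max approx", lam_max)                  # near 0 for these data
```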
Let $\lambda_{max}$ be the value of $\lambda$ which maximizes the Box-Cox objective function. Then, under fairly general conditions, when $\lambda$ is the
true value of the transformation parameter,
$$n \ln\left[ SS_{res}(\lambda) \right] - n \ln\left[ SS_{res}(\lambda_{max}) \right]$$
has approximately a $\chi^2(1)$ distribution. This result is based on the large-sample behaviour of the likelihood ratio statistic.
This is explained as follows:
The likelihood ratio test statistic in our case is
$$\eta_n \equiv \eta = \frac{\underset{\Omega_0}{\max}\, L}{\underset{\Omega}{\max}\, L} = \frac{\underset{\Omega_0}{\max} \left( \dfrac{1}{\sigma^2} \right)^{n/2}}{\underset{\Omega}{\max} \left( \dfrac{1}{\sigma^2} \right)^{n/2}} = \frac{\left( \dfrac{1}{\hat{\sigma}^2(\lambda)} \right)^{n/2}}{\left( \dfrac{1}{\hat{\sigma}^2(\lambda_{max})} \right)^{n/2}} = \left[ \frac{1/SS_{res}(\lambda)}{1/SS_{res}(\lambda_{max})} \right]^{n/2}.$$
$$\ln\eta = \frac{n}{2}\ln\frac{SS_{res}(\lambda_{max})}{SS_{res}(\lambda)}$$
$$-\ln\eta = \frac{n}{2}\ln\frac{SS_{res}(\lambda)}{SS_{res}(\lambda_{max})} = \frac{n}{2}\ln\left[ SS_{res}(\lambda) \right] - \frac{n}{2}\ln\left[ SS_{res}(\lambda_{max}) \right] = -L(\lambda) + L(\lambda_{max})$$
where
$$L(\lambda) = -\frac{n}{2}\ln\left[ SS_{res}(\lambda) \right], \qquad L(\lambda_{max}) = -\frac{n}{2}\ln\left[ SS_{res}(\lambda_{max}) \right].$$
Since, under certain regularity conditions, $-2\ln\eta_n$ converges in distribution to $\chi^2(1)$ when the null hypothesis is true,
$$-2\ln\eta \sim \chi^2(1)$$
$$\text{or} \quad -\ln\eta \sim \frac{\chi^2(1)}{2}$$
$$\text{or} \quad L(\lambda_{max}) - L(\lambda) \sim \frac{\chi^2(1)}{2}.$$
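This result is what makes a likelihood-based interval for $\lambda$ possible: all $\lambda$ with $n\ln\left[SS_{res}(\lambda)\right] - n\ln\left[SS_{res}(\lambda_{max})\right] \le \chi^2_{\alpha}(1)$ form an approximate $100(1-\alpha)\%$ confidence interval. A sketch under the same illustrative setup as the grid-search example:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 100))
X = np.column_stack([np.ones_like(x), x])
n = len(y)
gm = np.exp(np.mean(np.log(y)))

def ss_res(lam):
    if abs(lam) < 1e-12:
        v = gm * np.log(y)
    else:
        v = (y**lam - 1) / (lam * gm**(lam - 1))
    b = np.linalg.lstsq(X, v, rcond=None)[0]
    return np.sum((v - X @ b)**2)

grid = np.linspace(-2, 2, 401)
lnss = np.array([np.log(ss_res(l)) for l in grid])
# Keep lambda where n*ln SS(lam) - n*ln SS(lam_max) <= chi2(0.95, 1),
# i.e. ln SS(lam) <= ln SS(lam_max) + chi2 cutoff / n.
cutoff = lnss.min() + chi2.ppf(0.95, df=1) / n
ci = grid[lnss <= cutoff]
print("approx 95%% CI for lambda: [%.2f, %.2f]" % (ci[0], ci[-1]))
```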