
LINEAR REGRESSION ANALYSIS

MODULE – V
Lecture - 21
Correcting Model
Inadequacies Through
Transformation and Weighting
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Analytical methods for selecting a transformation on study variable

The Box-Cox method


Suppose the normality and/or constant variance of the study variable $y$ can be achieved through a power transformation on $y$. This means $y$ is transformed as $y^{\lambda}$, where $\lambda$ is a parameter to be determined. For example, if $\lambda = 0.5$, then the transformation is the square root and $\sqrt{y}$ is used as the study variable in place of $y$.

Now the linear regression model has parameters $\beta$, $\sigma^2$ and $\lambda$. The Box-Cox method tells how to estimate $\lambda$ and the parameters of the model simultaneously using the method of maximum likelihood.

Note that as $\lambda$ approaches zero, $y^{\lambda}$ approaches 1. So there is a problem at $\lambda = 0$, because this makes all the observations on $y$ equal to unity, and it is meaningless for all the observations on the study variable to be constant. So there is a discontinuity at $\lambda = 0$. One approach to resolve this difficulty is to use $(y^{\lambda} - 1)/\lambda$ as the study variable.

Note that as $\lambda \to 0$, $\dfrac{y^{\lambda} - 1}{\lambda} \to \ln y$. So a possible solution is to use the transformed study variable

$$W = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\[2mm] \ln y & \text{for } \lambda = 0. \end{cases}$$
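As an illustration, here is a minimal sketch of this transformation in Python (NumPy assumed; the function name box_cox_w is made up for this example):

import numpy as np

def box_cox_w(y, lam):
    # W = (y^lambda - 1)/lambda for lambda != 0, and ln(y) for lambda = 0
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam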


So the family $W$ is continuous. Still, it has a drawback: as $\lambda$ changes, the values of $W$ change dramatically, so it is difficult to obtain the best value of $\lambda$. If different analysts obtain different values of $\lambda$, they will fit different models, and it may then not be appropriate to compare models with different values of $\lambda$. So it is preferable to use the alternative form

$$V = y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, y_*^{\lambda - 1}} & \text{for } \lambda \neq 0 \\[2mm] y_* \ln y & \text{for } \lambda = 0 \end{cases}$$

where $y_*$ is the geometric mean of the $y_i$'s, $y_* = (y_1 y_2 \cdots y_n)^{1/n}$, which is constant.

For calculation purposes, we can use $\ln y_* = \dfrac{1}{n} \sum_{i=1}^{n} \ln y_i$.
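A minimal sketch of this scaled transform (Python/NumPy assumed; the name box_cox_v is hypothetical):

import numpy as np

def box_cox_v(y, lam):
    # Scaled Box-Cox transform V = y^(lambda), using the geometric mean y_*
    y = np.asarray(y, dtype=float)
    y_star = np.exp(np.mean(np.log(y)))   # ln y_* = (1/n) sum ln y_i
    if lam == 0:
        return y_star * np.log(y)
    return (y**lam - 1.0) / (lam * y_star**(lam - 1.0))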

When $V$ is applied to each $y_i$, we get $V = (V_1, V_2, \ldots, V_n)'$ as a vector of observations on the transformed study variable, and we use it to fit the linear model $V = X\beta + \varepsilon$ by least squares or maximum likelihood. The quantity $\lambda\, y_*^{\lambda - 1}$ in the denominator is related to the $n$th power of the Jacobian of the transformation. This can be seen as follows:

We want to convert $y_i$ into $y_i^{(\lambda)}$ as

$$y_i^{(\lambda)} = W_i = \frac{y_i^{\lambda} - 1}{\lambda}; \quad \lambda \neq 0.$$

Let $y = (y_1, y_2, \ldots, y_n)'$ and $W = (W_1, W_2, \ldots, W_n)'$.

Note that if $W_1 = \dfrac{y_1^{\lambda} - 1}{\lambda}$, then

$$\frac{\partial W_1}{\partial y_1} = \frac{\lambda y_1^{\lambda - 1}}{\lambda} = y_1^{\lambda - 1}, \qquad \frac{\partial W_1}{\partial y_2} = 0.$$

In general,

$$\frac{\partial W_i}{\partial y_j} = \begin{cases} y_i^{\lambda - 1} & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$$

The Jacobian of the transformation is given by

$$J(y_i \to W_i) = \frac{\partial y_i}{\partial W_i} = \frac{1}{\left(\dfrac{\partial W_i}{\partial y_i}\right)} = \frac{1}{y_i^{\lambda - 1}}.$$

$$J(W \to y) = \begin{vmatrix} \dfrac{\partial W_1}{\partial y_1} & \dfrac{\partial W_1}{\partial y_2} & \cdots & \dfrac{\partial W_1}{\partial y_n} \\ \dfrac{\partial W_2}{\partial y_1} & \dfrac{\partial W_2}{\partial y_2} & \cdots & \dfrac{\partial W_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial W_n}{\partial y_1} & \dfrac{\partial W_n}{\partial y_2} & \cdots & \dfrac{\partial W_n}{\partial y_n} \end{vmatrix} = \begin{vmatrix} y_1^{\lambda-1} & 0 & \cdots & 0 \\ 0 & y_2^{\lambda-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & y_n^{\lambda-1} \end{vmatrix} = \prod_{i=1}^{n} y_i^{\lambda-1} = \left( \prod_{i=1}^{n} y_i \right)^{\lambda-1}$$

$$J(y \to W) = \frac{1}{J(W \to y)} = \left( \prod_{i=1}^{n} \frac{1}{y_i} \right)^{\lambda-1}.$$
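A quick numerical check of this determinant identity (a sketch with made-up values):

import numpy as np

lam = 0.5
y = np.array([1.0, 2.0, 4.0])
# Jacobian matrix dW_i/dy_j is diagonal with entries y_i^(lam - 1)
J = np.diag(y**(lam - 1.0))
# Its determinant equals the product of the diagonal entries
print(np.isclose(np.linalg.det(J), np.prod(y**(lam - 1.0))))  # True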

This is the Jacobian for transforming the whole vector $y$ into the whole vector $W$. If an individual $y_i$ is to be transformed into $W_i$, then take its geometric mean as

$$J(y_i \to W_i) = \left( \prod_{i=1}^{n} \frac{1}{y_i} \right)^{\frac{\lambda-1}{n}},$$

which equals $1/y_*^{\lambda - 1}$, the factor appearing in the denominator of $V$. The quantity

$$J(y \to W) = \frac{1}{\prod_{i=1}^{n} y_i^{\lambda-1}}$$

ensures that unit volume is preserved in moving from the set of $y_i$ to the set of $V_i$. This is a scaling factor which ensures that the residual sums of squares obtained for different values of $\lambda$ can be compared.

To find the appropriate family, consider

$$V = y^{(\lambda)} = X\beta + \varepsilon$$

where

$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda\, y_*^{\lambda - 1}}, \qquad \varepsilon \sim N(0, \sigma^2 I).$$

Applying the method of maximum likelihood, the likelihood function for $y^{(\lambda)}$ is

$$L\left(y^{(\lambda)}\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n} \varepsilon_i^2}{2\sigma^2}\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\varepsilon'\varepsilon}{2\sigma^2}\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\left(y^{(\lambda)} - X\beta\right)'\left(y^{(\lambda)} - X\beta\right)}{2\sigma^2}\right)$$

$$\ln L\left(y^{(\lambda)}\right) = -\frac{n}{2}\ln\sigma^2 - \frac{\left(y^{(\lambda)} - X\beta\right)'\left(y^{(\lambda)} - X\beta\right)}{2\sigma^2} \quad \text{(ignoring the constant).}$$

Solving

$$\frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \beta} = 0, \qquad \frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \sigma^2} = 0$$

gives the maximum likelihood estimators

$$\hat{\beta}(\lambda) = (X'X)^{-1} X' y^{(\lambda)}$$

$$\hat{\sigma}^2(\lambda) = \frac{1}{n}\, y^{(\lambda)'} \left[ I - X(X'X)^{-1}X' \right] y^{(\lambda)} = \frac{y^{(\lambda)'} H y^{(\lambda)}}{n}, \qquad H = I - X(X'X)^{-1}X',$$

for a given value of $\lambda$.
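A minimal sketch of these two estimators for a fixed $\lambda$ (Python/NumPy assumed; the function name mle_estimates is hypothetical):

import numpy as np

def mle_estimates(v, X):
    # beta_hat = (X'X)^{-1} X' v
    beta_hat = np.linalg.solve(X.T @ X, X.T @ v)
    # sigma2_hat = v' H v / n; since H is idempotent, v'Hv equals the
    # residual sum of squares from the least squares fit
    resid = v - X @ beta_hat
    sigma2_hat = resid @ resid / len(v)
    return beta_hat, sigma2_hat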

Substituting these estimates into the log-likelihood function $\ln L\left(y^{(\lambda)}\right)$ gives

$$L(\lambda) = -\frac{n}{2}\ln \hat{\sigma}^2 = -\frac{n}{2}\ln\left[SS_{res}(\lambda)\right]$$

where $SS_{res}(\lambda)$ is the sum of squares due to residuals, which is a function of $\lambda$. Now maximize $L(\lambda)$ with respect to $\lambda$. It is difficult to obtain any closed form for the estimator of $\lambda$, so we maximize it numerically.

The function $-\dfrac{n}{2}\ln\left[SS_{res}(\lambda)\right]$ is called the Box-Cox objective function.
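A sketch of this numerical maximization by grid search (Python/NumPy assumed; the data-generating step is made up for illustration):

import numpy as np

def box_cox_objective(y, X, lam):
    # L(lambda) = -(n/2) ln SS_res(lambda), with SS_res from the fit of V on X
    n = len(y)
    y_star = np.exp(np.mean(np.log(y)))           # geometric mean of y
    if lam == 0:
        v = y_star * np.log(y)
    else:
        v = (y**lam - 1.0) / (lam * y_star**(lam - 1.0))
    _, ss_res, _, _ = np.linalg.lstsq(X, v, rcond=None)
    return -0.5 * n * np.log(ss_res[0])

# Hypothetical data whose square root is linear in x, so lambda near 0.5 should win
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 100)
y = (2.0 + 0.5 * x + rng.normal(0.0, 0.2, 100))**2
X = np.column_stack([np.ones_like(x), x])

grid = np.linspace(-2.0, 2.0, 401)
vals = [box_cox_objective(y, X, lam) for lam in grid]
lam_max = grid[int(np.argmax(vals))]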

Let $\lambda_{max}$ be the value of $\lambda$ which maximizes the Box-Cox objective function. Then under fairly general conditions, for any other $\lambda$,

$$n \ln\left[SS_{res}(\lambda)\right] - n \ln\left[SS_{res}(\lambda_{max})\right]$$

has approximately a $\chi^2(1)$ distribution. This result is based on the large-sample behaviour of the likelihood ratio statistic. It is explained as follows:

The likelihood ratio test statistic in our case is

$$\eta_n \equiv \eta = \frac{\displaystyle\max_{\Omega_o} L}{\displaystyle\max_{\Omega} L} = \frac{\displaystyle\max_{\Omega_o} \left(\frac{1}{\sigma^2}\right)^{n/2}}{\displaystyle\max_{\Omega} \left(\frac{1}{\sigma^2}\right)^{n/2}} = \frac{\left(\dfrac{1}{\hat{\sigma}^2(\lambda)}\right)^{n/2}}{\left(\dfrac{1}{\hat{\sigma}^2(\lambda_{max})}\right)^{n/2}} = \left[\frac{1/SS_{res}(\lambda)}{1/SS_{res}(\lambda_{max})}\right]^{n/2}$$

$$\ln \eta = \frac{n}{2} \ln\left[\frac{SS_{res}(\lambda_{max})}{SS_{res}(\lambda)}\right]$$

$$-\ln \eta = \frac{n}{2} \ln\left[\frac{SS_{res}(\lambda)}{SS_{res}(\lambda_{max})}\right] = \frac{n}{2}\ln\left[SS_{res}(\lambda)\right] - \frac{n}{2}\ln\left[SS_{res}(\lambda_{max})\right] = -L(\lambda) + L(\lambda_{max})$$

where

$$L(\lambda) = -\frac{n}{2}\ln\left[SS_{res}(\lambda)\right], \qquad L(\lambda_{max}) = -\frac{n}{2}\ln\left[SS_{res}(\lambda_{max})\right].$$

Since, under certain regularity conditions, $-2\ln\eta_n$ converges in distribution to $\chi^2(1)$ when the null hypothesis is true,

$$-2\ln\eta \sim \chi^2(1)$$

or

$$-\ln\eta \sim \frac{\chi^2(1)}{2}$$

or

$$L(\lambda_{max}) - L(\lambda) \sim \frac{\chi^2(1)}{2}.$$
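This result can be used to form an approximate $100(1-\alpha)\%$ confidence region for $\lambda$: retain those $\lambda$ with $L(\lambda_{max}) - L(\lambda) \le \frac{1}{2}\chi^2_{1,\alpha}$. Continuing the grid-search sketch above (grid and vals as computed there; SciPy assumed for the chi-square quantile):

import numpy as np
from scipy.stats import chi2

# Keep lambda values whose objective is within chi2(1)/2 of the maximum
cutoff = 0.5 * chi2.ppf(0.95, df=1)        # alpha = 0.05
vals = np.asarray(vals)
ci = grid[vals >= vals.max() - cutoff]
print(ci.min(), ci.max())                  # approximate 95% interval endpoints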
