LINEAR REGRESSION ANALYSIS
MODULE – V
Lecture - 21
Correcting Model
Inadequacies Through
Transformation and Weighting
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Analytical methods for selecting a transformation on study variable
The Box-Cox method
Suppose the normality and/or constant variance of the study variable y can be corrected through a power transformation
on y. This means y is to be transformed as $y^\lambda$, where $\lambda$ is a parameter to be determined. For example, if $\lambda = 0.5$,
then the transformation is the square root and $\sqrt{y}$ is used as the study variable in place of y.
Now the linear regression model has parameters $\beta$, $\sigma^2$ and $\lambda$. The Box-Cox method tells us how to estimate
$\lambda$ and the other parameters of the model simultaneously using the method of maximum likelihood.
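For orientation, SciPy's scipy.stats.boxcox implements this maximum likelihood idea for a single sample (without regressors, so slightly simpler than the regression setting developed below); a minimal sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=100))   # positive, right-skewed sample

# With lmbda=None, boxcox returns the transformed data together with
# the lambda that maximizes the log-likelihood.
y_t, lam_hat = stats.boxcox(y)
print(lam_hat)                     # near 0 (the log transform) for lognormal data
```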
Note that as $\lambda$ approaches zero, $y^\lambda$ approaches 1. So there is a problem at $\lambda = 0$, because this makes all the
observations on y equal to unity, and it is meaningless for all the observations on the study variable to be constant. So there is a
discontinuity at $\lambda = 0$. One approach to resolve this difficulty is to use $\dfrac{y^\lambda - 1}{\lambda}$ as the study variable.

Note that as $\lambda \to 0$, $\dfrac{y^\lambda - 1}{\lambda} \to \ln y$. So a possible solution is to use the transformed study variable
$$W = \begin{cases} \dfrac{y^\lambda - 1}{\lambda} & \text{for } \lambda \neq 0 \\[2mm] \ln y & \text{for } \lambda = 0. \end{cases}$$
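A quick numerical check of this continuity (illustrative values only):

```python
import numpy as np

# As lam -> 0, (y**lam - 1)/lam approaches ln(y), so defining
# W = ln(y) at lam = 0 makes the family continuous.
y = 3.7
for lam in [0.5, 0.1, 0.01, 0.001]:
    print(lam, (y**lam - 1) / lam)   # tends toward the limit below
print("ln y =", np.log(y))           # limiting value, about 1.308
```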
So the family W is continuous. Still, it has a drawback: as $\lambda$ changes, the values of W change dramatically, so it is difficult
to obtain the best value of $\lambda$. If different analysts obtain different values of $\lambda$, they will fit different models, and it may then
not be appropriate to compare the models with different values of $\lambda$. So it is preferable to use the alternative form
$$y^{(\lambda)} = V = \begin{cases} \dfrac{y^\lambda - 1}{\lambda\, y_*^{\lambda - 1}} & \text{for } \lambda \neq 0 \\[2mm] y_* \ln y & \text{for } \lambda = 0 \end{cases}$$
where $y_*$ is the geometric mean of the $y_i$'s, $y_* = (y_1 y_2 \cdots y_n)^{1/n}$, which is constant.
For calculation purposes, we can use $\ln y_* = \dfrac{1}{n}\sum_{i=1}^{n} \ln y_i$.
When V is applied to each $y_i$, we get $V = (V_1, V_2, \ldots, V_n)'$ as the vector of observations on the transformed study variable, and
we use it to fit the linear model $V = X\beta + \varepsilon$ by least squares or the maximum likelihood method, as sketched below.
The quantity $\lambda\, y_*^{\lambda - 1}$ in the denominator is related to the nth root of the Jacobian of the transformation. See how:
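A minimal sketch of this transform-and-fit step, assuming simulated data; box_cox_v and the variable names are illustrative, not standard functions:

```python
import numpy as np

def box_cox_v(y, lam):
    """Scaled Box-Cox transform using the geometric mean y_* of y."""
    log_gm = np.mean(np.log(y))      # ln y_* = (1/n) sum ln y_i
    gm = np.exp(log_gm)              # geometric mean y_*
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 60)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 60))   # positive response
X = np.column_stack([np.ones_like(x), x])

V = box_cox_v(y, lam=0.5)                            # transformed study variable
beta_hat, *_ = np.linalg.lstsq(X, V, rcond=None)     # fit V = X beta + eps
print(beta_hat)
```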
We want to convert $y_i$ into $y_i^{(\lambda)}$ as
$$y_i^{(\lambda)} = W_i = \frac{y_i^\lambda - 1}{\lambda}; \quad \lambda \neq 0.$$
Let
$$y = (y_1, y_2, \ldots, y_n)', \qquad W = (W_1, W_2, \ldots, W_n)'.$$
Note that if $W_1 = \dfrac{y_1^\lambda - 1}{\lambda}$, then
$$\frac{\partial W_1}{\partial y_1} = \frac{\lambda y_1^{\lambda - 1}}{\lambda} = y_1^{\lambda - 1}, \qquad \frac{\partial W_1}{\partial y_2} = 0.$$
In general,
$$\frac{\partial W_i}{\partial y_j} = \begin{cases} y_i^{\lambda - 1} & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$$
The Jacobian of the transformation is given by
$$J(y_i \to W_i) = \frac{\partial y_i}{\partial W_i} = \frac{1}{\partial W_i / \partial y_i} = \frac{1}{y_i^{\lambda - 1}}.$$
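A small finite-difference check of these derivatives (illustrative values):

```python
import numpy as np

# Verify numerically that dW_i/dy_i = y_i**(lam-1), as derived above.
lam, y1 = 0.5, 4.0
W = lambda y: (y**lam - 1) / lam
h = 1e-6
fd = (W(y1 + h) - W(y1 - h)) / (2 * h)   # central difference
print(fd, y1**(lam - 1))                  # both approximately 0.5 here
```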
$$J(W \to y) = \begin{vmatrix} \dfrac{\partial W_1}{\partial y_1} & \dfrac{\partial W_1}{\partial y_2} & \cdots & \dfrac{\partial W_1}{\partial y_n} \\[1mm] \dfrac{\partial W_2}{\partial y_1} & \dfrac{\partial W_2}{\partial y_2} & \cdots & \dfrac{\partial W_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial W_n}{\partial y_1} & \dfrac{\partial W_n}{\partial y_2} & \cdots & \dfrac{\partial W_n}{\partial y_n} \end{vmatrix} = \begin{vmatrix} y_1^{\lambda-1} & 0 & \cdots & 0 \\ 0 & y_2^{\lambda-1} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & y_n^{\lambda-1} \end{vmatrix} = \prod_{i=1}^{n} y_i^{\lambda-1} = \left( \prod_{i=1}^{n} y_i \right)^{\lambda-1}$$

$$J(y \to W) = \frac{1}{J(W \to y)} = \left( \frac{1}{\prod_{i=1}^{n} y_i} \right)^{\lambda-1}.$$
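A numerical sanity check of this determinant identity, under an assumed toy setup:

```python
import numpy as np

# The finite-difference Jacobian matrix dW/dy is diagonal with entries
# y_i**(lam-1), so its determinant matches the product formula.
lam = 0.5
y = np.array([1.0, 2.0, 4.0])

def W(y):                                 # elementwise Box-Cox transform
    return (y**lam - 1) / lam

h = 1e-6
n = len(y)
Jmat = np.zeros((n, n))
for j in range(n):                        # column j: perturb y_j only
    e = np.zeros(n); e[j] = h
    Jmat[:, j] = (W(y + e) - W(y - e)) / (2 * h)

print(np.linalg.det(Jmat))                # approximately 0.3536
print(np.prod(y**(lam - 1)))              # prod y_i**(lam-1) = 8**(-1/2)
```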
This is the Jacobian when we transform the whole vector y into the whole vector W. If an individual $y_i$ is to be
transformed into $W_i$, then we take its geometric mean as
$$J(y_i \to W_i) = \left( \frac{1}{\prod_{i=1}^{n} y_i} \right)^{\frac{\lambda-1}{n}}.$$
The quantity
$$J(y \to W) = \frac{1}{\prod_{i=1}^{n} y_i^{\lambda-1}}$$
ensures that unit volume is preserved in moving from the set of $y_i$ to the set of $V_i$. This is a factor which scales and ensures
that the residual sums of squares obtained for different values of $\lambda$ can be compared, as illustrated below.
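The following sketch, under an assumed simulated setup, shows the point: residual sums of squares on the unscaled W scale differ in magnitude across $\lambda$, while on the scaled V scale they are comparable:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 80)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 80))
X = np.column_stack([np.ones_like(x), x])
gm = np.exp(np.mean(np.log(y)))           # geometric mean y_*

def rss(z):
    b = np.linalg.lstsq(X, z, rcond=None)[0]
    return np.sum((z - X @ b)**2)

for lam in [0.1, 1.0]:
    W = (y**lam - 1) / lam                # unscaled transform
    V = W / gm**(lam - 1)                 # scaled by y_***(lam-1)
    print(lam, rss(W), rss(V))            # rss(W) not comparable; rss(V) is
```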
To find the appropriate member of the family, consider
$$y^{(\lambda)} = V = X\beta + \varepsilon$$
where
$$y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda\, y_*^{\lambda-1}}, \qquad \varepsilon \sim N(0, \sigma^2 I).$$
Applying the method of maximum likelihood, the likelihood function for $y^{(\lambda)}$ is
$$L\left(y^{(\lambda)}\right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\sum_{i=1}^{n} \varepsilon_i^2}{2\sigma^2} \right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\varepsilon'\varepsilon}{2\sigma^2} \right) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)}{2\sigma^2} \right)$$
$$\ln L\left(y^{(\lambda)}\right) = -\frac{n}{2}\ln\sigma^2 - \frac{(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)}{2\sigma^2} \quad \text{(ignoring the constant).}$$
Solving
$$\frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \beta} = 0, \qquad \frac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \sigma^2} = 0$$
gives the maximum likelihood estimators
$$\hat{\beta}(\lambda) = (X'X)^{-1} X' y^{(\lambda)}$$
$$\hat{\sigma}^2(\lambda) = \frac{1}{n}\, y^{(\lambda)\prime} \left[ I - X(X'X)^{-1}X' \right] y^{(\lambda)} = \frac{y^{(\lambda)\prime} H y^{(\lambda)}}{n}, \qquad H = I - X(X'X)^{-1}X',$$
for a given value of $\lambda$.
Substituting these estimates into the log-likelihood function $\ln L\left(y^{(\lambda)}\right)$ gives
$$L(\lambda) = -\frac{n}{2}\ln\hat{\sigma}^2 = -\frac{n}{2}\ln\left[ SS_{res}(\lambda) \right]$$
(up to an additive constant), where $SS_{res}(\lambda)$ is the sum of squares due to residuals, which is a function of $\lambda$. Now maximize $L(\lambda)$ with respect to $\lambda$.
It is difficult to obtain any closed form for the estimator of $\lambda$, so we maximize it numerically.
The function $-\dfrac{n}{2}\ln\left[ SS_{res}(\lambda) \right]$ is called the Box-Cox objective function.
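A minimal numerical maximization of the Box-Cox objective by grid search, assuming simulated data (ss_res is an illustrative helper; any numerical optimizer could replace the grid):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 100))  # true lambda near 0
X = np.column_stack([np.ones_like(x), x])
n = len(y)
gm = np.exp(np.mean(np.log(y)))                      # geometric mean y_*

def ss_res(lam):
    """Residual sum of squares from fitting the scaled transform V on X."""
    if abs(lam) < 1e-12:
        v = gm * np.log(y)
    else:
        v = (y**lam - 1) / (lam * gm**(lam - 1))
    b = np.linalg.lstsq(X, v, rcond=None)[0]
    return np.sum((v - X @ b)**2)

grid = np.linspace(-2, 2, 401)
objective = np.array([-(n / 2) * np.log(ss_res(l)) for l in grid])
lam_max = grid[np.argmax(objective)]
print("lambda_max approx", lam_max)                  # near 0 for these data
```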
Let $\lambda_{max}$ be the value of $\lambda$ which maximizes the Box-Cox objective function. Then, under fairly general conditions, when $\lambda$ is the
true value of the transformation parameter,
$$n \ln\left[ SS_{res}(\lambda) \right] - n \ln\left[ SS_{res}(\lambda_{max}) \right]$$
has approximately a $\chi^2(1)$ distribution. This result is based on the large-sample behaviour of the likelihood ratio statistic.
This is explained as follows:
The likelihood ratio test statistic in our case is
$$\eta_n \equiv \eta = \frac{\underset{\Omega_0}{\max}\, L}{\underset{\Omega}{\max}\, L} = \frac{\underset{\Omega_0}{\max} \left( \dfrac{1}{\sigma^2} \right)^{n/2}}{\underset{\Omega}{\max} \left( \dfrac{1}{\sigma^2} \right)^{n/2}} = \frac{\left( \dfrac{1}{\hat{\sigma}^2(\lambda)} \right)^{n/2}}{\left( \dfrac{1}{\hat{\sigma}^2(\lambda_{max})} \right)^{n/2}} = \left[ \frac{1/SS_{res}(\lambda)}{1/SS_{res}(\lambda_{max})} \right]^{n/2}.$$
$$\ln\eta = \frac{n}{2}\ln\frac{SS_{res}(\lambda_{max})}{SS_{res}(\lambda)}$$
$$-\ln\eta = \frac{n}{2}\ln\frac{SS_{res}(\lambda)}{SS_{res}(\lambda_{max})} = \frac{n}{2}\ln\left[ SS_{res}(\lambda) \right] - \frac{n}{2}\ln\left[ SS_{res}(\lambda_{max}) \right] = -L(\lambda) + L(\lambda_{max})$$
where
$$L(\lambda) = -\frac{n}{2}\ln\left[ SS_{res}(\lambda) \right], \qquad L(\lambda_{max}) = -\frac{n}{2}\ln\left[ SS_{res}(\lambda_{max}) \right].$$
Since, under certain regularity conditions, $-2\ln\eta_n$ converges in distribution to $\chi^2(1)$ when the null hypothesis is true,
$$-2\ln\eta \sim \chi^2(1)$$
$$\text{or} \quad -\ln\eta \sim \frac{\chi^2(1)}{2}$$
$$\text{or} \quad L(\lambda_{max}) - L(\lambda) \sim \frac{\chi^2(1)}{2}.$$
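This result is what makes a likelihood-based interval for $\lambda$ possible: all $\lambda$ with $n\ln\left[SS_{res}(\lambda)\right] - n\ln\left[SS_{res}(\lambda_{max})\right] \le \chi^2_{\alpha}(1)$ form an approximate $100(1-\alpha)\%$ confidence interval. A sketch under the same illustrative setup as the grid-search example:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 100))
X = np.column_stack([np.ones_like(x), x])
n = len(y)
gm = np.exp(np.mean(np.log(y)))

def ss_res(lam):
    if abs(lam) < 1e-12:
        v = gm * np.log(y)
    else:
        v = (y**lam - 1) / (lam * gm**(lam - 1))
    b = np.linalg.lstsq(X, v, rcond=None)[0]
    return np.sum((v - X @ b)**2)

grid = np.linspace(-2, 2, 401)
lnss = np.array([np.log(ss_res(l)) for l in grid])
# Keep lambda where n*ln SS(lam) - n*ln SS(lam_max) <= chi2(0.95, 1),
# i.e. ln SS(lam) <= ln SS(lam_max) + chi2 cutoff / n.
cutoff = lnss.min() + chi2.ppf(0.95, df=1) / n
ci = grid[lnss <= cutoff]
print("approx 95%% CI for lambda: [%.2f, %.2f]" % (ci[0], ci[-1]))
```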