Multiple Linear Regression: Diagnostics: Statistics 203: Introduction To Regression and Analysis of Variance

This document provides an overview of topics in multiple linear regression diagnostics, including spline models, assumptions of regression models, problems in regression functions, outlier detection methods, and influence diagnostics. Some key points covered include using partial residual and added variable plots to check for non-linearity, assumptions that errors are normally distributed and have constant variance, and how to measure the influence of observations using statistics like DFFITS and Cook's distance.

Statistics 203: Introduction to Regression and Analysis of Variance

Multiple Linear Regression: Diagnostics

Jonathan Taylor

Today

Splines + other bases.

Diagnostics

Outline:
Spline models
What are the assumptions?
Problems in the regression function
Partial residual plot
Added-variable plot
Problems with the errors
Outliers & Influence
Dropping an observation
Different residuals
Crude outlier detection test
Bonferroni correction for multiple comparisons
DFFITS
Cook's distance
DFBETAS

Spline models

Splines are piecewise polynomial functions, i.e. on an interval between knots $(t_i, t_{i+1})$ the spline $f(x)$ is polynomial but the coefficients change from one interval to the next.

Example: cubic spline with knots at $t_1 < t_2 < \dots < t_h$

$$f(x) = \sum_{j=0}^{3} \beta_{0j} x^j + \sum_{i=1}^{h} \beta_i (x - t_i)_+^3$$

where

$$(x - t_i)_+ = \begin{cases} x - t_i & \text{if } x - t_i \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

Here is an example.

Conditioning problem again: B-splines span the same model subspace but give a less ill-conditioned design matrix.

Other bases one might use: Fourier: sin and cos waves; Wavelet: space/time localized basis for functions.
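The truncated power basis for a cubic spline can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the function name `truncated_power_basis`, the sine-curve data, and the knot locations are made up for the example and are not from the lecture.

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Design matrix for a cubic spline in the truncated power basis.

    Columns: 1, x, x^2, x^3, then (x - t_i)_+^3 for each knot t_i.
    """
    x = np.asarray(x, dtype=float)
    poly = np.column_stack([x**j for j in range(4)])
    # (x - t)_+ equals x - t where that is non-negative and 0 otherwise
    trunc = np.column_stack([np.maximum(x - t, 0.0) ** 3 for t in knots])
    return np.hstack([poly, trunc])

# Hypothetical data: a smooth curve on [0, 1] with made-up knots
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
knots = [0.25, 0.5, 0.75]

B = truncated_power_basis(x, knots)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)  # ordinary least squares fit
fitted = B @ coef
```

With many knots the columns of `B` become nearly collinear, which is the conditioning problem that motivates B-splines.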

What are the assumptions?

What is the full model for a given design matrix $X$?

$$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$

Errors $\varepsilon \sim N(0, \sigma^2 I)$.

What can go wrong?
Regression function can be wrong: missing predictors, nonlinear.
Assumptions about the errors can be wrong.
Outliers & influential observations: both in predictors and observations.

Problems in the regression function

True regression function may have higher-order non-linear terms, e.g. $X_1^2$, or even interactions $X_1 X_2$.
How to fix? Difficult in general: we will look at two plots, added variable plots and partial residual plots.

Partial residual plot

For $1 \le j \le p-1$ let

$$e_{ij} = e_i + b_j X_{ij}$$

and plot $e_{ij}$ against $X_{ij}$.

Can help to determine whether the variance depends on $X_j$ and to spot outliers.
If there is a non-linear trend, it is evidence that a linear term in $X_j$ is not sufficient.
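The partial residuals can be computed directly from an ordinary least squares fit. The sketch below assumes NumPy and uses synthetic data in which the truth is quadratic in the first predictor; all variable names and the data-generating model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Hypothetical design: intercept plus two predictors; the truth is quadratic in X1
X1 = rng.uniform(-2, 2, n)
X2 = rng.normal(size=n)
Y = 1.0 + X1**2 + 0.5 * X2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b  # ordinary residuals

def partial_residuals(j):
    """e_ij = e_i + b_j X_ij for predictor column j (1 <= j <= p-1)."""
    return e + b[j] * X[:, j]

pr1 = partial_residuals(1)

# A straight-line fit of pr1 on X1 recovers the slope b[1], because the
# residuals e are orthogonal to the intercept column and to X1.
line_slope = np.polyfit(X1, pr1, 1)[0]

# Curvature check: the leading coefficient of a quadratic fit to the plot
quad_coef = np.polyfit(X1, pr1, 2)[0]
```

Plotting `pr1` against `X1` (for example with matplotlib) would show the curved trend that the straight line misses; a `quad_coef` far from zero is a numerical hint of the same thing.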

Added-variable plot

For $1 \le j \le p-1$ let $H_{(j)}$ be the hat matrix with this predictor deleted. Plot

$$(I - H_{(j)})Y \quad \text{vs.} \quad (I - H_{(j)})X_j.$$

Plot should be linear and slope should be $\beta_j$. Why?

$$Y = X_{(j)} \beta_{(j)} + \beta_j X_j + \varepsilon$$

$$(I - H_{(j)})Y = (I - H_{(j)})X_{(j)} \beta_{(j)} + \beta_j (I - H_{(j)})X_j + (I - H_{(j)})\varepsilon$$

$$(I - H_{(j)})Y = \beta_j (I - H_{(j)})X_j + (I - H_{(j)})\varepsilon$$

The last line uses $(I - H_{(j)})X_{(j)} = 0$, since $H_{(j)}$ projects onto the column space of $X_{(j)}$.

Also can be helpful for detecting outliers.

If there is a non-linear trend, it is evidence that a linear term in $X_j$ is not sufficient.
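The two sets of residuals in an added-variable plot can be computed as follows. This is a sketch under assumed synthetic data, using NumPy; the helper `residualize` is a hypothetical name. It also checks numerically that the least-squares slope of the plot equals the fitted coefficient $b_j$ from the full model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, j = 80, 2
# Hypothetical design: intercept plus two predictors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(scale=0.5, size=n)

def residualize(v, Z):
    """Return (I - H_Z) v: the residual of v after least squares on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

X_minus_j = np.delete(X, j, axis=1)   # design with predictor j deleted
ry = residualize(Y, X_minus_j)        # (I - H_(j)) Y
rx = residualize(X[:, j], X_minus_j)  # (I - H_(j)) X_j

slope = (rx @ ry) / (rx @ rx)         # least-squares slope through the origin
b_full, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Here `slope` and `b_full[j]` agree up to rounding, so a scatter of `ry` against `rx` shows exactly the relationship that determines the $j$-th coefficient.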

Problems with the errors

Errors may not be normally distributed. We will look at a QQ plot for a graphical check. May not affect inference in large samples.
Variance may not be constant. Transformations can sometimes help correct this. Non-constant variance affects our estimates of $SE(\hat\beta)$, which can change $t$ and $F$ statistics substantially!
Graphical checks of non-constant variance: added variable plots, partial residual plots, fitted vs. residual plots.
Errors may not be independent. This can seriously affect our estimates of $SE(\hat\beta)$.

Outliers & Influence

Some residuals may be much larger than others, which can affect the overall fit of the model. This may be evidence of an outlier: a point where the model has very poor fit. This can be caused by many factors and such points should not be automatically deleted from the dataset.
Even if an observation does not have a large residual, it can exert a strong influence on the regression function.
General strategy to measure influence: for each observation, drop it from the model and measure how much the model changes.

Dropping an observation

A subscript $(i)$ indicates that the $i$-th observation was not used in fitting the model.
For example: $\hat Y_{j(i)}$ is the regression function evaluated at the $j$-th observation's predictors BUT the coefficients $(b_{0,(i)}, \dots, b_{p-1,(i)})$ were fit after deleting the $i$-th row of data.

Basic idea: if $\hat Y_{j(i)}$ is very different from $\hat Y_j$ (using all the data) then $i$ is an influential point for determining $\hat Y_j$.
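The leave-one-out idea can be written as a plain refitting loop. The sketch below assumes NumPy and synthetic data with one deliberately shifted response; the helper `fit` and the array `Yhat_loo` are illustrative names only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.0]) + rng.normal(scale=0.4, size=n)
Y[0] += 5.0  # hypothetical contaminated observation

def fit(X, Y):
    """Least squares coefficients."""
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return b

b_all = fit(X, Y)
Yhat = X @ b_all

# Yhat_loo[j, i] holds \hat Y_{j(i)}: the prediction at row j
# from the model fit with row i left out.
Yhat_loo = np.empty((n, n))
for i in range(n):
    keep = np.arange(n) != i
    Yhat_loo[:, i] = X @ fit(X[keep], Y[keep])

# Change in the fitted value at i when i itself is dropped
shift = Yhat - np.diag(Yhat_loo)
```

Refitting $n$ times is fine for small data; for least squares the same shift is also available in closed form as $H_{ii} e_i / (1 - H_{ii})$, which avoids the loop.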

Different residuals

Ordinary residuals: $e_i = Y_i - \hat Y_i$

Standardized residuals: $r_i = e_i / s(e_i) = e_i / \left(\hat\sigma \sqrt{1 - H_{ii}}\right)$, where $H$ is the hat matrix. (rstandard)

Studentized residuals: $t_i = e_i / \left(\hat\sigma_{(i)} \sqrt{1 - H_{ii}}\right) \sim t_{n-p-1}$. (rstudent)
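The three kinds of residuals can be computed from the hat matrix. This is a NumPy sketch on synthetic data, not the R functions `rstandard` and `rstudent` named on the slide; it assumes $p$ counts all coefficients including the intercept, and it uses the standard identity $SSE_{(i)} = SSE - e_i^2/(1 - H_{ii})$ to get $\hat\sigma_{(i)}$ without refitting.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)                          # leverages H_ii
e = Y - H @ Y                           # ordinary residuals
sigma2 = e @ e / (n - p)                # sigma-hat squared on n - p degrees of freedom

r = e / np.sqrt(sigma2 * (1 - h))       # standardized residuals

# Leave-one-out variance estimate on n - p - 1 degrees of freedom
sigma2_loo = (e @ e - e**2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(sigma2_loo * (1 - h))   # studentized residuals
```

The only difference between `r` and `t` is which variance estimate is used: `t` excludes observation $i$ from its own scale estimate, so a gross outlier cannot hide by inflating $\hat\sigma$.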

Crude outlier detection test

If the studentized residuals are large: observation may be an outlier.
Problem: if $n$ is large and we threshold at $t_{1-\alpha/2,\,n-p-1}$, we will get many outliers by chance even if the model is correct.
Solution: Bonferroni correction, threshold at $t_{1-\alpha/(2n),\,n-p-1}$.

Bonferroni correction for multiple comparisons

If we are doing many $t$ (or other) tests, say $m > 1$, we can control the overall false positive rate at $\alpha$ by testing each one at level $\alpha/m$.

Proof:

$$P(\text{at least one false positive}) = P\left(\bigcup_{i=1}^{m} \left\{ |T_i| \ge t_{1-\alpha/(2m),\,n-p-1} \right\}\right) \le \sum_{i=1}^{m} P\left(|T_i| \ge t_{1-\alpha/(2m),\,n-p-1}\right) = \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha.$$
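A quick numerical illustration of why the correction is needed. The sketch assumes, purely for illustration, that the $m$ tests are independent (the union bound in the proof does not need this); the function name `familywise_rate` is hypothetical.

```python
def familywise_rate(per_test_level, m):
    """P(at least one false positive) among m independent tests,
    each run at the given per-test level."""
    return 1.0 - (1.0 - per_test_level) ** m

alpha = 0.05
for m in (1, 10, 100, 1000):
    uncorrected = familywise_rate(alpha, m)        # every test at level alpha
    corrected = familywise_rate(alpha / m, m)      # every test at level alpha / m
    print(f"m={m:5d}  uncorrected={uncorrected:.3f}  Bonferroni={corrected:.4f}")
```

Without correction the chance of at least one false positive approaches 1 as $m$ grows, while testing at level $\alpha/m$ keeps it at or below $\alpha$.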

DFFITS

$$DFFITS_i = \frac{\hat Y_i - \hat Y_{i(i)}}{\hat\sigma_{(i)} \sqrt{H_{ii}}}$$

This quantity measures how much the regression function changes at the $i$-th observation when the $i$-th observation is deleted.
For small/medium datasets: a value of 1 or greater is considered suspicious. For large datasets: a value of $2\sqrt{p/n}$ or greater.
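DFFITS can be computed literally from its definition by refitting without each observation in turn. The following is a sketch with NumPy on synthetic data; the variable names and the data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)          # leverages H_ii
Yhat = H @ Y
e = Y - Yhat

dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
    res_i = Y[keep] - X[keep] @ b_i
    sigma_i = np.sqrt(res_i @ res_i / (n - 1 - p))   # sigma-hat_(i)
    dffits[i] = (Yhat[i] - X[i] @ b_i) / (sigma_i * np.sqrt(h[i]))

# The two rules of thumb from the slide
flag_small = np.abs(dffits) >= 1.0
flag_large = np.abs(dffits) >= 2 * np.sqrt(p / n)
```

The loop agrees with the closed form $t_i \sqrt{H_{ii} / (1 - H_{ii})}$, where $t_i$ is the studentized residual, so in practice no refitting is required.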

Cook's distance

$$D_i = \frac{\sum_{j=1}^{n} \left(\hat Y_j - \hat Y_{j(i)}\right)^2}{p\,\hat\sigma^2}$$

This quantity measures how much the entire regression function changes when the $i$-th observation is deleted.
Should be comparable to $F_{p,n-p}$: if $D_i$ falls at or above the 50th percentile of $F_{p,n-p}$, then the $i$-th point is likely influential: investigate this point further.
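Cook's distance can likewise be computed directly from the definition. This is a sketch with NumPy on synthetic data, assuming $p$ counts all coefficients including the intercept; names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
Yhat = H @ Y
e = Y - Yhat
sigma2 = e @ e / (n - p)   # full-data variance estimate

cooks = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
    # Sum over all n fitted values of the squared change caused by dropping i
    cooks[i] = np.sum((Yhat - X @ b_i) ** 2) / (p * sigma2)

most_influential = int(np.argmax(cooks))
```

If SciPy is available, `scipy.stats.f.ppf(0.5, p, n - p)` gives the 50th percentile of $F_{p,n-p}$ to compare against `cooks`.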

DFBETAS

$$DFBETAS_{j(i)} = \frac{b_j - b_{j(i)}}{\sqrt{\hat\sigma^2_{(i)} \,(X^T X)^{-1}_{jj}}}$$

This quantity measures how much the coefficients change when the $i$-th observation is deleted.
For small/medium datasets: a value of 1 or greater is suspicious. For large datasets: a value of $2/\sqrt{n}$ or greater.

Here is an example.
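A direct computation of DFBETAS for every observation and coefficient might look like this. It is a sketch with NumPy on synthetic data; the array layout (rows are observations $i$, columns are coefficients $j$) is a choice made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y                    # full-data coefficients
scale = np.diag(XtX_inv)                 # (X^T X)^{-1}_{jj}

dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
    res_i = Y[keep] - X[keep] @ b_i
    sigma2_i = res_i @ res_i / (n - 1 - p)   # sigma-hat squared with row i deleted
    dfbetas[i] = (b - b_i) / np.sqrt(sigma2_i * scale)

# Large-sample rule of thumb from the slide
flagged = np.abs(dfbetas) >= 2 / np.sqrt(n)
```

Each row of `dfbetas` shows, in standard-error units, how far every coefficient moves when that one observation is removed.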
