2DI90 ch11

Chapter 11 of the Probability & Statistics course discusses the inadequacies of traditional statistical models when analyzing relationships between two related quantities, exemplified by the correlation between hydrocarbon levels and oxygen purity in a chemical distillation process. It introduces simple linear regression and least squares methods for modeling these relationships, emphasizing the importance of model validity and the interpretation of results. The chapter also covers confidence intervals, prediction intervals, and the significance of regression models while cautioning against assuming causation from correlation.

2DI90

Probability & Statistics

2DI90 – Chapter 11 of MR
Motivation
So far in the course we have dealt with statistical models that
assume the data are a random sample from some distribution.

Although this is quite powerful, it is inadequate in many
situations, for instance when we observe two quantities that
are related.

Example: Data was collected about several models of laptop
computers available in a big online store. In particular, for each
model of computer we noted the processor speed and the
time it took to perform a certain benchmark task (e.g. encoding
a 1-minute .divx video).

As expected, computers with faster processors can typically do
the task in less time, but can we say what the exact relation
between processor speed and the time to complete the task is?

2
Example 11.1 (MR)
Data was collected in a chemical distillation plant producing
oxygen for medical applications. One of the steps reduces the
impurities by condensation. The percentage of hydrocarbons
collected in the condenser might give a good indication of the
oxygen purity:

Hydrocarbon level (%)   Purity (%)
0.99                    90.01
1.02                    89.05
1.15                    91.43
1.29                    93.74
1.46                    96.73
1.36                    94.45
0.87                    87.59
1.23                    91.77
1.55                    99.42
1.40                    93.65
1.19                    93.54
1.15                    92.52
0.98                    90.56
1.01                    89.54
1.11                    89.85
1.20                    90.39
1.26                    93.25
1.32                    93.41
1.43                    94.98
0.95                    87.33

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

3
Example 11.1 (MR)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

Clearly it seems the hydrocarbon level is telling us
something about the purity of the oxygen produced.

4
Example 11.1 (MR)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

One possible way to construct a model for the observations
is to assume the oxygen purity levels are measured with a
(small) error.

5
Simple Linear Regression
Definition: Simple Regression Model
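A sketch of the simple linear regression model in the usual textbook notation (standard form, not copied verbatim from the slide):

```latex
\[
  Y_i = \beta_0 + \beta_1 x_i + \epsilon_i , \qquad i = 1, \dots, n ,
\]
where the errors $\epsilon_i$ are independent with $\mathbb{E}[\epsilon_i] = 0$ and
$\mathrm{Var}(\epsilon_i) = \sigma^2$; $\beta_0$ is the intercept and $\beta_1$ the slope.
```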

6
Simple Linear Regression

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

We want ALL the distances between the fitted line and the
points to be small:

7
Least Squares

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

Minimize instead the sum of the SQUARED distances !!!

8
Least Squares
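A sketch of the least squares criterion in the usual notation (minimizing the sum of squared vertical distances from the data points to the line):

```latex
\[
  (\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \; L(\beta_0, \beta_1) ,
  \qquad
  L(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_i \bigr)^2 .
\]
```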

9
Least Squares
Definition: Least Squares Estimates
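Setting the partial derivatives of the least squares criterion to zero gives the standard estimates (usual textbook form):

```latex
\[
  \hat\beta_1 = \frac{S_{xy}}{S_{xx}}
             = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                    {\sum_{i=1}^{n} (x_i - \bar{x})^2} ,
  \qquad
  \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} .
\]
```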

10
Least Squares

11
Least Squares
Definition: Least Squares Estimates

Definition: Fitted Regression Line and Residuals
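In the standard notation, the fitted regression line and the residuals are:

```latex
\[
  \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i ,
  \qquad
  e_i = y_i - \hat{y}_i , \qquad i = 1, \dots, n .
\]
```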

12
Example 11.1 (MR)
(Same data and scatter plot as before: hydrocarbon level (%) in the condenser
versus oxygen purity (%) for the chemical distillation plant.)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

13
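As a hedged sketch (the vector names `hydrocarbon` and `purity` are my own choice), the least squares fit for these data can be reproduced in R:

```r
# Oxygen purity data from Example 11.1 (n = 20 observations)
hydrocarbon <- c(0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
                 1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95)
purity <- c(90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
            93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33)

# Fit the simple linear regression: purity = beta0 + beta1 * hydrocarbon + error
model <- lm(purity ~ hydrocarbon)

coef(model)                # least squares estimates of the intercept and slope
plot(hydrocarbon, purity)  # scatter plot of the data
abline(model)              # add the fitted regression line
```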
Example 11.1 (MR)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

You must always be careful with the interpretation of the
results you have… Your inferences are only valid insofar as
the model you are using is reasonable !!!

14
Estimating the Variance

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

Definition: Estimate of the Variance
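In the standard form, the error variance is estimated from the residual sum of squares:

```latex
\[
  \hat{\sigma}^2 = \frac{SS_E}{n-2} ,
  \qquad
  SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 ,
\]
where the divisor $n-2$ accounts for the two estimated coefficients.
```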

15
Least Squares

16
Properties of Least Squares

Proposition: Least Squares Coefficients
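A sketch of the standard properties (unbiasedness and variances of the least squares coefficients):

```latex
\[
  \mathbb{E}[\hat\beta_0] = \beta_0 , \qquad \mathbb{E}[\hat\beta_1] = \beta_1 ,
\]
\[
  \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}} ,
  \qquad
  \mathrm{Var}(\hat\beta_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right] .
\]
```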

17
Partial proof on the board (maybe)…
Relation to Maximum Likelihood Estimation

Proposition: Maximum Likelihood in Regression
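A sketch of the standard result under normally distributed errors: the log-likelihood is maximized in the coefficients exactly where the sum of squares is minimized, so the MLEs coincide with the least squares estimates:

```latex
\[
  \ell(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2} \log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
  \;\Longrightarrow\;
  \hat\beta_j^{\mathrm{ML}} = \hat\beta_j^{\mathrm{LS}} ,
  \quad
  \hat\sigma^2_{\mathrm{ML}} = \frac{SS_E}{n} .
\]
```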

Home Exercise: Prove the above proposition

18
Distribution of the Regression Coeff.’s

19
Distribution of the Regression Coeff.’s
Theorem:
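In the usual form, under normally distributed errors the estimators satisfy:

```latex
\[
  \hat\beta_1 \sim N\!\left( \beta_1 , \frac{\sigma^2}{S_{xx}} \right) ,
  \qquad
  \hat\beta_0 \sim N\!\left( \beta_0 , \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right] \right) ,
  \qquad
  \frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2} .
\]
```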

20
Testing in Linear Regression

21
Example 11.1 (MR)
(Same data and scatter plot as in Example 11.1: hydrocarbon level (%) in the
condenser versus oxygen purity (%) for the chemical distillation plant.)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

22
Example

23
Example 11.2 (MR)

24
Summary of Testing Procedures
Tests for the Slope:
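A sketch of the standard t-test for the slope (with $\hat\sigma^2$ estimated as above):

```latex
\[
  H_0 : \beta_1 = \beta_{1,0}
  \quad\text{vs.}\quad
  H_1 : \beta_1 \neq \beta_{1,0} ,
  \qquad
  T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}}
  \;\overset{H_0}{\sim}\; t_{n-2} ;
\]
reject $H_0$ at level $\alpha$ when $|T_0| > t_{\alpha/2,\,n-2}$ (one-sided versions are analogous).
```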

25
Summary of Testing Procedures
Tests for the Intercept:
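Similarly, the standard t-test for the intercept:

```latex
\[
  H_0 : \beta_0 = \beta_{0,0} ,
  \qquad
  T_0 = \frac{\hat\beta_0 - \beta_{0,0}}
             {\sqrt{\hat{\sigma}^2 \left[ \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right]}}
  \;\overset{H_0}{\sim}\; t_{n-2} .
\]
```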

26
Summary of Testing Procedures
Tests for the Variance:
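And the standard chi-square test for the error variance:

```latex
\[
  H_0 : \sigma^2 = \sigma_0^2 ,
  \qquad
  X_0^2 = \frac{(n-2)\,\hat{\sigma}^2}{\sigma_0^2}
  \;\overset{H_0}{\sim}\; \chi^2_{n-2} .
\]
```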

27
Analysis of Variance

28
ANOVA - ANalysis Of VAriance

[Three scatter plots of y versus x]

29
ANOVA - Analysis of Variance
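In the usual notation, the ANOVA decomposition and the F-test for significance of regression are:

```latex
\[
  \underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SS_T}
  =
  \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SS_R}
  +
  \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SS_E} ,
  \qquad
  F_0 = \frac{SS_R / 1}{SS_E / (n-2)}
  \;\overset{H_0 : \beta_1 = 0}{\sim}\; F_{1,\, n-2} .
\]
```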

30
Example 11.3 (MR)
(Same data and scatter plot as in Example 11.1: hydrocarbon level (%) in the
condenser versus oxygen purity (%) for the chemical distillation plant.)

[Scatter plot: Purity of oxygen (%) vs. hydrocarbon level (%)]

31
Example 11.3 (MR)

32
Example 11.3 (MR)

33
Example 11.3 (MR)

34
Example 11.3 (MR)

[Software output: descriptive statistics of the residuals; test statistics and p-values]
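Output of this kind (coefficient estimates, standard errors, test statistics, p-values, and residual summaries) can be obtained in R from the fitted `model` object; a minimal sketch:

```r
summary(model)             # coefficients, standard errors, t statistics, p-values, R-squared
anova(model)               # ANOVA table: regression and residual sums of squares, F statistic
summary(model$residuals)   # descriptive statistics of the residuals
```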

35
Relation Between ANOVA and t-tests
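A sketch of the standard connection: for testing $H_0 : \beta_1 = 0$ in simple linear regression, the ANOVA F statistic is the square of the t statistic, so the two tests are equivalent:

```latex
\[
  F_0 = \frac{SS_R}{SS_E / (n-2)} = T_0^2 ,
  \qquad
  T_0^2 \sim F_{1,\,n-2} \ \text{ whenever } \ T_0 \sim t_{n-2} .
\]
```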

36
Confidence Intervals
Recall the result we have shown before:

Theorem:

37
CI for the Regr. Coeff.’s and the Variance
Confidence Intervals:
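In the standard form, two-sided $100(1-\alpha)\%$ confidence intervals for the coefficients and the variance are:

```latex
\[
  \hat\beta_1 \pm t_{\alpha/2,\,n-2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} ,
  \qquad
  \hat\beta_0 \pm t_{\alpha/2,\,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]} ,
  \qquad
  \left( \frac{(n-2)\,\hat{\sigma}^2}{\chi^2_{\alpha/2,\,n-2}} ,\;
         \frac{(n-2)\,\hat{\sigma}^2}{\chi^2_{1-\alpha/2,\,n-2}} \right)
  \ \text{for } \sigma^2 .
\]
```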

One-sided confidence intervals are obtained as we have seen before…


38
Example 11.4 (MR)

39
CIs on the Mean Response

Definition:
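In the usual notation, the mean response at a predictor value $x_0$ and its natural estimator are:

```latex
\[
  \mu_{Y \mid x_0} = \mathbb{E}[Y \mid x = x_0] = \beta_0 + \beta_1 x_0 ,
  \qquad
  \hat{\mu}_{Y \mid x_0} = \hat\beta_0 + \hat\beta_1 x_0 .
\]
```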

Proposition:

40
CIs on the Mean Response

Proposition:

We have all the pieces needed to construct a nice
confidence interval for the mean response…

41
CIs on the Mean Response
Confidence Interval for the Mean Response:
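A sketch of the standard CI for the mean response at $x_0$:

```latex
\[
  \hat{\mu}_{Y \mid x_0} \pm t_{\alpha/2,\,n-2}
  \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} .
\]
```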

It is quite interesting to note that the CI is narrowest
when x0 equals the sample mean of the predictor values, and
becomes wider elsewhere.

As before, using these is a matter of plugging in the
proper quantities…

42
Example 11.5 (MR)
We can plot these CIs for each point, and get a nice
(pointwise) confidence band around the regression line !!!

[Plot: Purity of oxygen (%) vs. hydrocarbon level (%), showing the regression line and the 95% CI for the mean response]

43
Regression Prediction Intervals

Proposition:

44
Prediction Interval for Regression
Prediction Interval for Regression:
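In the standard form, the $100(1-\alpha)\%$ prediction interval for a new observation at $x_0$ adds the variance of the new error to the variance of the estimated mean response:

```latex
\[
  \hat{y}_0 \pm t_{\alpha/2,\,n-2}
  \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} ,
  \qquad
  \hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0 .
\]
```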

Note these intervals are always significantly wider than
the CIs of the previous slides…

45
Example 11.6 (MR)
[Plot: Purity of oxygen (%) vs. hydrocarbon level (%), showing the regression line, the 95% CI for the mean response, and the 95% prediction interval]

46
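A sketch of how both bands in such a figure can be computed in R with `predict()` (the grid and object names are my own, building on the `model` fitted earlier):

```r
# Pointwise 95% confidence band for the mean response and 95% prediction band
new_x <- data.frame(hydrocarbon = seq(0.5, 2.0, by = 0.01))

ci   <- predict(model, newdata = new_x, interval = "confidence", level = 0.95)
pred <- predict(model, newdata = new_x, interval = "prediction", level = 0.95)

plot(hydrocarbon, purity, xlim = c(0.5, 2.0), ylim = c(80, 105))
lines(new_x$hydrocarbon, ci[, "fit"])             # fitted regression line
lines(new_x$hydrocarbon, ci[, "lwr"], lty = 2)    # 95% CI for the mean response (lower)
lines(new_x$hydrocarbon, ci[, "upr"], lty = 2)    # 95% CI for the mean response (upper)
lines(new_x$hydrocarbon, pred[, "lwr"], lty = 3)  # 95% prediction interval (lower)
lines(new_x$hydrocarbon, pred[, "upr"], lty = 3)  # 95% prediction interval (upper)
```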
Important Remarks
• All the predictions, estimates, and confidence statements are
only valid if the model assumptions are reasonable…
• It is quite dangerous to extrapolate the response for
predictor values outside the range you observed.
• If the regression model is significant, it means the response
and predictor variable are correlated, but it doesn't say
anything about a causal relation !!!

47
Correlation does NOT IMPLY Causation

48
Adequacy of the Regression Model
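The adequacy question is whether the assumed model structure actually holds; a sketch in the usual form (not copied verbatim from the slide):

```latex
\[
  \mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x ,
  \qquad
  \epsilon_i \ \text{i.i.d.}, \quad \mathbb{E}[\epsilon_i] = 0 , \quad \mathrm{Var}(\epsilon_i) = \sigma^2 .
\]
```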

Our regression model only makes sense if the above expression
is approximately true. Furthermore, to do hypothesis tests we
also need to assume that the error variables follow a normal
distribution.

You should always take such assumptions with a grain of salt,
and see if there is enough evidence to reject them !!!

49
Residual Analysis

[Two plots: residuals (model$residuals) versus x, and a normal Q-Q plot of the residuals (sample quantiles versus theoretical quantiles); Shapiro-Wilk p-value = 0.9293]

50
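A sketch of how such residual diagnostics can be produced in R from the fitted `model` (variable names as before):

```r
# Residuals versus the predictor, and a normal Q-Q plot of the residuals
plot(hydrocarbon, model$residuals)
abline(h = 0)                    # residuals should scatter around zero with no pattern

qqnorm(model$residuals)          # normal Q-Q plot of the residuals
qqline(model$residuals)          # reference line through the quartiles

shapiro.test(model$residuals)    # Shapiro-Wilk test of normality of the residuals
```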
Residual Analysis

51
Coefficient of Determination

Definition:
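In the standard notation, the coefficient of determination is the proportion of the total variability explained by the regression:

```latex
\[
  R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} ,
  \qquad
  0 \le R^2 \le 1 .
\]
```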

52
Coefficient of Determination

53
Final Remarks
The topic of regression models is very broad, and we barely
scratched the surface !!! There are other courses offered at the
TU/e exclusively on this topic…

• We can consider models with transformed variables

• We can consider logistic regression, which allows us to deal
with categorical data

• There are also non-parametric regression models, as well as
non-linear regression procedures

• Most practical methods in machine learning are regression-like
procedures (e.g. Support Vector Machines).

Always question the validity of the models you are using.

All models are wrong…
…but some are useful.

54
