
Chapter 17
Curve Fitting: Least-Squares Regression

"These notes are only to be used in class presentations"
Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Curve Fitting

• Data are given only at discrete values.
• Goal: fit the best curve to a discrete data set and obtain estimates at points between the discrete values.

Two common applications in engineering:

Trend analysis. Predicting values of the dependent variable: extrapolation beyond the data points or interpolation between data points.
Hypothesis testing. Comparing an existing mathematical model with measured data.
Two general approaches:

• Data exhibit a significant degree of scatter (error or "noise"): find a single curve that represents the general trend of the data. The function does not have to intersect the points. This is least-squares regression.

• Data are very precise: pass a curve (or curves) exactly through each of the points. This is interpolation.
Basic Statistics

Given a set of data $y_1, y_2, \ldots, y_n$:

Arithmetic mean – the sum of the individual data points $y_i$ divided by the number of points:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Standard deviation – a common measure of spread for a sample (the sample standard deviation), where $n - 1$ is the number of degrees of freedom:

$$s_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

If the individual measurements are spread out widely around the mean, $S_t$ (and consequently $s_y$) will be large; if they are grouped tightly, $s_y$ will be small.
Variance:

$$s_y^2 = \frac{S_t}{n-1}$$

Coefficient of variation – quantifies the spread of the data relative to its mean (similar to a relative error):

$$\text{c.v.} = \frac{s_y}{\bar{y}} \times 100\%$$
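As a quick illustration (a minimal sketch assuming Python with NumPy, not part of the original notes; `basic_stats` is an illustrative name), these statistics follow directly from their definitions:

```python
import numpy as np

def basic_stats(y):
    """Return mean, total sum of squares St, sample std dev, and c.v. in %."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.sum() / n                # arithmetic mean
    St = np.sum((y - ybar) ** 2)      # sum of squares around the mean
    sy = np.sqrt(St / (n - 1))        # sample standard deviation
    cv = sy / ybar * 100.0            # coefficient of variation, %
    return ybar, St, sy, cv
```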
Linear Least-Squares Regression

• Set of data points: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
• The goal is to come up with a straight line $y = a_0 + a_1 x$ that comes close to fitting the given data points, where $a_1$ is the slope and $a_0$ the intercept.
• The closeness is determined by the error, or residual, $e$. Each measured value $y_i$ satisfies

$$y_i = a_0 + a_1 x_i + e, \qquad e = y_i - a_0 - a_1 x_i$$
Choosing Criteria for a "Best Fit"

• Minimize the sum of the residual errors for all available data?

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$

Inadequate! Positive and negative errors can cancel out.

• Minimize the sum of the absolute values of the errors?

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

Inadequate! May not give a unique best fit.

• How about minimizing the maximum distance that an individual point falls from the line?

Inadequate! May be overly influenced by outliers.
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

$S_r$: sum of the squares of the residuals around the regression line. This criterion:

• Yields a unique line for a given set of data
• Is easy to differentiate
• Positive errors don't cancel out negative errors
• Large errors are magnified
Least-Squares Fit of a Straight Line

• Need to compute $a_0$ and $a_1$ such that $S_r$ is minimized:

$$\text{Minimize error:} \quad S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0 \;\;\Rightarrow\;\; \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a_0 - \sum_{i=1}^{n} a_1 x_i = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)\, x_i = 0 \;\;\Rightarrow\;\; \sum_{i=1}^{n} y_i x_i - \sum_{i=1}^{n} a_0 x_i - \sum_{i=1}^{n} a_1 x_i^2 = 0$$
Since $\sum_{i=1}^{n} a_0 = n a_0$, these reduce to the normal equations, which can be solved simultaneously:

$$n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i$$

$$\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i$$

In matrix form:

$$\begin{bmatrix} n & \sum_{i=1}^{n} x_i \\[4pt] \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} \sum_{i=1}^{n} y_i \\[4pt] \sum_{i=1}^{n} x_i y_i \end{Bmatrix}$$
When these two equations are solved:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the mean values. These closed-form expressions are convenient for computer applications.
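A minimal sketch of these closed-form formulas (assuming NumPy as above; `linear_fit` is an illustrative name, not from the notes):

```python
def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1*x from the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    a1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
    a0 = y.mean() - a1 * x.mean()     # a0 = ybar - a1*xbar
    return a0, a1
```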
"Goodness" of Our Fit

$s_{y/x}$: the standard error of the estimate. It quantifies the error of the predicted value of y corresponding to a particular value of x:

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$$

Notice the improvement in the error due to linear regression: the spread around the regression line, $s_{y/x}$, is smaller than the spread around the mean, $s_y$.
The improvement of the total error is measured by the correlation coefficient.

• $S_r$: sum of the squares of the residuals around the regression line,

$$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

• $S_t$: sum of the squares of the residuals around the mean,

$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• $(S_t - S_r)$ quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.

$r^2$: coefficient of determination,

$$r^2 = \frac{S_t - S_r}{S_t}$$

$r$: correlation coefficient,

$$r = \sqrt{\frac{S_t - S_r}{S_t}}$$
• For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100% of the variability of the data.

• For $r = r^2 = 0$, $S_r = S_t$ and the fit represents no improvement over simply using the mean.
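Continuing the sketch (reusing `linear_fit` from above; `fit_quality` is an illustrative name), the fit statistics follow the formulas just given:

```python
def fit_quality(x, y, a0, a1):
    """Standard error of the estimate and correlation coefficient for a line fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    St = np.sum((y - y.mean())**2)        # squared residuals around the mean
    Sr = np.sum((y - a0 - a1 * x)**2)     # squared residuals around the line
    s_yx = np.sqrt(Sr / (n - 2))          # standard error of the estimate
    r = np.sqrt((St - Sr) / St)           # correlation coefficient
    return s_yx, r
```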
Example 17.1:
Fit a straight line to the data set below. Calculate the standard deviation, the standard error of the estimate, and the correlation coefficient. Approximate y = f(x) for x = 2.5.

x   1     2     3     4     5     6     7
y   0.5   2.5   2.0   4.0   3.5   6.0   5.5
      xi    yi      xi^2   xi*yi    (yi - ybar)^2   f(xi) = a0 + a1*xi   (yi - f(xi))^2
      1     0.5     1      0.5      8.5765          0.9107               0.1687
      2     2.5     4      5.0      0.8622          1.7500               0.5625
      3     2.0     9      6.0      2.0408          2.5893               0.3473
      4     4.0     16     16.0     0.3265          3.4286               0.3265
      5     3.5     25     17.5     0.0051          4.2679               0.5897
      6     6.0     36     36.0     6.6122          5.1072               0.7971
      7     5.5     49     38.5     4.2908          5.9465               0.1994
Sum   28    24      140    119.5    22.7143 (St)                         2.9911 (Sr)
Avg   4     3.4286  20.0   17.0714

f(x) = 0.0714 + 0.8393x
s_y/x = 0.7735
s_y = 1.9457
r = 0.932
f(2.5) = 2.1697
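The sketches above reproduce these numbers (a usage check, not part of the original notes):

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linear_fit(x, y)                 # ~0.0714, ~0.8393
s_yx, r = fit_quality(x, y, a0, a1)       # ~0.7735, ~0.932
print(a0 + a1 * 2.5)                      # f(2.5) ~ 2.1697
```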
Linearization of Nonlinear Relationships

• Linear regression is based on the premise that the relationship between the dependent and independent variables is linear.
• If the function is not linear, you will need polynomial regression techniques or other nonlinear techniques.

(Figure: (a) data that are ill-suited for linear least-squares regression; (b) an indication that a parabola may be more suitable.)
Linearization of Nonlinear Relationships

For certain classes of functions, you can linearize the data and still use linear regression:

• Translate the non-polynomial relation into a linear relation (and transform the data set accordingly)
• Use linear regression on the transformed data set
• Translate the linear relation back to the original relation
Exponential Equation

$$y = \alpha e^{\beta x} \;\Rightarrow\; \ln y = \ln \alpha + \beta x$$

Plotting $\ln y$ versus $x$ gives a straight line with slope $\beta$ and intercept $\ln \alpha$.
Power Equation

$$y = \alpha x^{\beta} \;\Rightarrow\; \log y = \log \alpha + \beta \log x$$

Plotting $\log y$ versus $\log x$ gives a straight line with slope $\beta$ and intercept $\log \alpha$ (this is the transform used in Example 17.2 below).
Saturation Growth-Rate Equation

$$y = \alpha \frac{x}{\beta + x} \;\Rightarrow\; \frac{1}{y} = \frac{1}{\alpha} + \frac{\beta}{\alpha} \cdot \frac{1}{x}$$

Plotting $1/y$ versus $1/x$ gives a straight line with slope $\beta/\alpha$ and intercept $1/\alpha$.
• Rational function

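As one concrete instance of the transform-fit-back-transform recipe above, here is a sketch of the exponential case (reusing `linear_fit`; `exponential_fit` is an illustrative name, and y must be positive):

```python
def exponential_fit(x, y):
    """Fit y = alpha * exp(beta*x) by regressing ln(y) on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b0, b1 = linear_fit(x, np.log(y))   # intercept = ln(alpha), slope = beta
    return np.exp(b0), b1               # back-transform to (alpha, beta)
```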
Example 17.2:
Fit a power equation to the given data set. Calculate the correlation coefficient. Approximate y = f(x) for x = 3.2.

x   1     2     3     4     5
y   0.5   1.7   3.4   5.7   8.4

Answer:

f(x) = 0.5 x^1.75
r = 1.0000
f(3.2) = 3.8281
fi: calculated value = a0 + a1*log(xi)

      xi    yi    log xi   log yi    (log xi)^2   (log xi)(log yi)   (log yi - mean log y)^2   fi        (log yi - fi)^2
      1.0   0.5   0.0000   -0.3010   0.0000       0.0000             0.5318                    -0.3000   0.00000
      2.0   1.7   0.3010   0.2304    0.0906       0.0694             0.0391                    0.2268    0.00001
      3.0   3.4   0.4771   0.5315    0.2276       0.2536             0.0107                    0.5350    0.00001
      4.0   5.7   0.6021   0.7559    0.3625       0.4551             0.1074                    0.7536    0.00001
      5.0   8.4   0.6990   0.9243    0.4886       0.6460             0.2461                    0.9232    0.00000
Sum               2.0792   2.1411    1.1693       1.4241             0.9350 (St)                         0.000033 (Sr)
Avg   3.0   3.94  0.4158   0.4282

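A power-equation version of the same recipe reproduces Example 17.2 (a sketch; `power_fit` is an illustrative name):

```python
def power_fit(x, y):
    """Fit y = alpha * x**beta by regressing log10(y) on log10(x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b0, b1 = linear_fit(np.log10(x), np.log10(y))  # intercept = log10(alpha)
    return 10.0**b0, b1

alpha, beta = power_fit([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])
print(alpha, beta)            # ~0.5, ~1.75
print(alpha * 3.2**beta)      # f(3.2) ~ 3.83
```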
Polynomial Regression

• Some engineering data are poorly represented by a straight line. A curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.

General equation for an mth-order polynomial:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$$

A straight line is the m = 1 case: a 1st-order polynomial.
• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations for $a_0, a_1, \ldots, a_m$.
• (m+1) equations with (m+1) unknowns:

$$\begin{bmatrix}
n & \sum x_i & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{Bmatrix}
=
\begin{Bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{Bmatrix}$$

(all sums run over $i = 1, \ldots, n$)
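A sketch of this system built and solved directly (assuming NumPy as before; in practice `np.polyfit` solves the same least-squares problem more robustly via the Vandermonde matrix):

```python
def poly_fit(x, y, m):
    """Least-squares mth-order polynomial: build and solve the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    powers = np.arange(m + 1)
    # A[k, j] = sum of x_i^(k+j); b[k] = sum of x_i^k * y_i
    A = np.array([[np.sum(x**(k + j)) for j in powers] for k in powers])
    b = np.array([np.sum(x**k * y) for k in powers])
    return np.linalg.solve(A, b)    # coefficients a0, a1, ..., am
```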
Standard error of the estimate:

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

where

$$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m)^2$$
Example 17.3:
Fit a parabola to the given data set. Calculate the standard error of the estimate.

x   0     1     2      3      4      5
y   2.1   7.7   13.6   27.2   40.9   61.1

f(xi) = a0 + a1*xi + a2*xi^2

      xi   yi      xi^2   xi^3   xi^4   xi*yi   xi^2*yi   (yi - ybar)^2    f(xi)     (yi - f(xi))^2
      0    2.1     0      0      0      0       0         544.4444         2.4786    0.1433
      1    7.7     1      1      1      7.7     7.7       314.4711         6.6986    1.0028
      2    13.6    4      8      16     27.2    54.4      140.0278         14.6400   1.0816
      3    27.2    9      27     81     81.6    244.8     3.1211           26.3028   0.8050
      4    40.9    16     64     256    163.6   654.4     239.2178         41.6870   0.6194
      5    61.1    25     125    625    305.5   1527.5    1272.1111        60.7926   0.0945
Sum   15   152.6   55     225    979    585.6   2488.8    2513.3933 (St)             3.7466 (Sr)
Avg   2.5  25.4333

f(x) = 2.4786 + 2.3593x + 1.8607x^2
s_y/x = 1.1175
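The `poly_fit` sketch reproduces these numbers:

```python
x = np.array([0, 1, 2, 3, 4, 5], float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
a = poly_fit(x, y, m=2)                        # ~[2.4786, 2.3593, 1.8607]
Sr = np.sum((y - np.polyval(a[::-1], x))**2)   # np.polyval wants highest power first
print(np.sqrt(Sr / (len(x) - (2 + 1))))        # s_y/x ~ 1.1175
```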
General comments on regression:

• Each x has a fixed value; it is not random and is known without error.
• The y values are independent random variables and all have the same variance.
• Regression of y versus x is not the same as regression of x versus y.
• The y values for a given x must be normally distributed.

Multiple Linear Regression

This is the case where y is a linear function of two or more independent variables; for two variables,

$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$

Minimizing $S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2$ again leads to a set of normal equations for the coefficients.

The two-dimensional case can easily be extended to m dimensions,

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$$

and the coefficients are determined similarly.

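A sketch of multiple linear regression via the matrix normal equations (assuming NumPy; `multiple_linear_fit` is an illustrative name):

```python
def multiple_linear_fit(X, y):
    """Fit y = a0 + a1*x1 + ... + am*xm; X has one column per variable."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    Z = np.column_stack([np.ones(len(y)), X])   # leading column of ones for a0
    # Normal equations: (Z^T Z) a = Z^T y
    return np.linalg.solve(Z.T @ Z, Z.T @ y)
```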
Example 17.4:

Use multiple linear regression to fit the given data set.

General Linear Least Squares

$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$$

where $z_0, z_1, \ldots, z_m$ are $m + 1$ basis functions. In matrix form:

$$\{Y\} = [Z]\{A\} + \{E\}$$

[Z]: matrix of the calculated values of the basis functions at the measured values of the independent variable
{Y}: observed values of the dependent variable
{A}: unknown coefficients
{E}: residuals

$$S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2$$

$S_r$ is minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equation equal to zero, which yields the normal equations $[Z]^T [Z] \{A\} = [Z]^T \{Y\}$.
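A sketch of the general case (assuming NumPy; `general_linear_fit` and `basis` are illustrative names), solving those normal equations directly:

```python
def general_linear_fit(basis, x, y):
    """General linear least squares: y = sum_j a_j * z_j(x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Z = np.column_stack([z(x) for z in basis])   # Z[i, j] = z_j(x_i)
    # Normal equations (np.linalg.lstsq is more robust for ill-conditioned Z)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

# Example: the parabola of Example 17.3 as a general linear model
a = general_linear_fit([np.ones_like, lambda x: x, lambda x: x**2],
                       [0, 1, 2, 3, 4, 5], [2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
```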