
Statistics 252 (March 11, 2005)

Linear Regression and Least Squares


Consider the linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is a mean-zero random variable.
Our goal is to predict the linear trend

$$E(Y) = \beta_0 + \beta_1 x$$

by estimating the intercept and the slope of this line. That is, we seek estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ such that

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$
We choose $\hat{\beta}_0$ and $\hat{\beta}_1$ in such a way that they minimize SSE, the sum of the squares of the errors. That is, we want to minimize

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where $y_i$ is the $i$th observation of the random variable $Y$ and corresponds to the input $x_i$.
Substituting $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ into the equation for SSE, we see that SSE can be viewed as a function of $\hat{\beta}_0$ and $\hat{\beta}_1$. We can now use the methods of elementary calculus to minimize this function: we find the first partial derivatives, set them equal to 0, and solve for the critical points. We then use the second derivative test to check that the critical points are indeed minimizers. Thus,
$$\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$
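In code, once the data are fixed, SSE really is just a function of the two candidate coefficients. Here is a minimal Python sketch of that viewpoint (the function name sse and the toy data are our own illustration, not part of the notes):

import numpy as np

def sse(b0, b1, x, y):
    # Sum of squared errors for a candidate intercept b0 and slope b1,
    # given data arrays x and y of the same length.
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Evaluate SSE at one guessed (intercept, slope) pair for toy data
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(sse(0.0, 2.0, x, y))  # residuals (0, 0.1, -0.1), so SSE is about 0.02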
Next, we find
$$\frac{\partial}{\partial \hat{\beta}_0}\,\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = -2\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = -2\sum_{i=1}^{n} y_i + 2n\hat{\beta}_0 + 2\hat{\beta}_1\sum_{i=1}^{n} x_i = -2n\bar{y} + 2n\hat{\beta}_0 + 2n\hat{\beta}_1\bar{x}$$

and
$$\frac{\partial}{\partial \hat{\beta}_1}\,\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = -2\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\,x_i = -2\sum_{i=1}^{n} x_i y_i + 2\hat{\beta}_0\sum_{i=1}^{n} x_i + 2\hat{\beta}_1\sum_{i=1}^{n} x_i^2.$$
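If you want to double-check these two derivatives symbolically, here is a minimal sketch in Python using sympy (our own tooling choice; the sample size is fixed at a small arbitrary value, and b0, b1 stand in for $\hat{\beta}_0$ and $\hat{\beta}_1$):

import sympy as sp

n = 4  # small fixed sample size, just for the symbolic check
x = sp.symbols('x1:5')        # x1, ..., x4
y = sp.symbols('y1:5')        # y1, ..., y4
b0, b1 = sp.symbols('b0 b1')  # stand-ins for the estimators

SSE = sum((y[i] - b0 - b1 * x[i]) ** 2 for i in range(n))

# Expands to -2*sum(y_i) + 2*n*b0 + 2*b1*sum(x_i), term by term
print(sp.expand(sp.diff(SSE, b0)))
# Expands to -2*sum(x_i*y_i) + 2*b0*sum(x_i) + 2*b1*sum(x_i**2), term by term
print(sp.expand(sp.diff(SSE, b1)))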

From the first equation,

$$\frac{\partial}{\partial \hat{\beta}_0}\,\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = 0$$

implies

$$-2n\bar{y} + 2n\hat{\beta}_0 + 2n\hat{\beta}_1\bar{x} = 0,$$

so that

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}.$$
From the second equation,

$$\frac{\partial}{\partial \hat{\beta}_1}\,\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = 0$$

implies

$$-2\sum_{i=1}^{n} x_i y_i + 2\hat{\beta}_0\sum_{i=1}^{n} x_i + 2\hat{\beta}_1\sum_{i=1}^{n} x_i^2 = 0,$$

so that, substituting $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$,

$$-\sum_{i=1}^{n} x_i y_i + (\bar{y} - \hat{\beta}_1\bar{x})\sum_{i=1}^{n} x_i + \hat{\beta}_1\sum_{i=1}^{n} x_i^2 = 0.$$

Distributing, and collecting $\hat{\beta}_1$, gives

$$-\sum_{i=1}^{n} x_i y_i + \bar{y}\sum_{i=1}^{n} x_i - \hat{\beta}_1\bar{x}\sum_{i=1}^{n} x_i + \hat{\beta}_1\sum_{i=1}^{n} x_i^2 = 0$$

so that

$$\hat{\beta}_1\left(\bar{x}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i^2\right) = \bar{y}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i.$$

Thus, we find

$$\hat{\beta}_1 = \frac{\bar{y}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i}{\bar{x}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i^2}.$$

Now, since $\sum_{i=1}^{n} x_i = n\bar{x}$, we can write this in a nicer way:

$$\hat{\beta}_1 = \frac{n\bar{y}\,\bar{x} - \sum_{i=1}^{n} x_i y_i}{n\bar{x}^2 - \sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{y}\,\bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2},$$

where the second equality comes from multiplying the numerator and the denominator by $-1$.

With a bit of algebra, we can write the numerator as

$$\sum_{i=1}^{n} x_i y_i - n\bar{y}\,\bar{x} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) =: S_{XY}$$

and the denominator as

$$\sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 =: S_{XX}.$$

Thus, we can write $\hat{\beta}_1$ as

$$\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}.$$
This version might be useful for remembering the formula, and it often appears when you are doing numerical calculations with computer software (for example, with SAS). The software tends to return the values of $S_{XY}$ and $S_{XX}$ because they are useful for residual analysis in general (this is not Stat 252 stuff). One final note: the formulæ for $\hat{\beta}_0$ and $\hat{\beta}_1$ agree with those in the Stat 151 textbook (pages 529–532 in the first edition and pages 491–494 in the second edition).
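To make the computation concrete, here is a minimal Python sketch with made-up toy data (numpy is our own choice of tool, not something the notes use; np.polyfit appears only as an independent cross-check):

import numpy as np

# Toy data, invented purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
Sxy = np.sum((x - xbar) * (y - ybar))  # S_XY
Sxx = np.sum((x - xbar) ** 2)          # S_XX

beta1_hat = Sxy / Sxx
beta0_hat = ybar - beta1_hat * xbar

# Cross-check against numpy's built-in least-squares line fit;
# polyfit returns coefficients from highest degree down: (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(beta0_hat, beta1_hat)  # should match intercept and slope below
print(intercept, slope)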
For Culture: The Multivariable Second Derivative Test
The second derivative test for functions of two variables is not required for Stat 252, although
the ability to compute partial derivatives IS required for Stat 252.

Suppose that $f$ is a (twice continuously differentiable) function of two variables, say $f(x, y)$. The critical points of the function $f$ occur at those points $(x_0, y_0)$ where, simultaneously,

$$\frac{\partial f}{\partial x}(x_0, y_0) = 0 \quad \text{and} \quad \frac{\partial f}{\partial y}(x_0, y_0) = 0.$$

The point $(x_0, y_0)$ is a local minimum if

$$\frac{\partial^2 f}{\partial x^2}(x_0, y_0) > 0 \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2}(x_0, y_0) \cdot \frac{\partial^2 f}{\partial y^2}(x_0, y_0) - \left(\frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0)\right)^2 > 0.$$

The point $(x_0, y_0)$ is a local maximum if

$$\frac{\partial^2 f}{\partial x^2}(x_0, y_0) < 0 \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2}(x_0, y_0) \cdot \frac{\partial^2 f}{\partial y^2}(x_0, y_0) - \left(\frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0)\right)^2 > 0.$$

[Yes, the inequalities are correct.]

Using this second derivative test, you can verify that $\hat{\beta}_0$ and $\hat{\beta}_1$ are, in fact, the minimizers of SSE. To quote our textbook, “We leave this for you to prove.” (page 538)
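As a hint of how that verification goes, here is a minimal symbolic sketch in Python with sympy (our own tooling choice, not the textbook's; the sample size is fixed at a small arbitrary value, and b0, b1 stand in for $\hat{\beta}_0$ and $\hat{\beta}_1$):

import sympy as sp

n = 4  # small fixed sample size, just for the symbolic check
x = sp.symbols('x1:5')        # x1, ..., x4
y = sp.symbols('y1:5')        # y1, ..., y4
b0, b1 = sp.symbols('b0 b1')  # stand-ins for the estimators

SSE = sum((y[i] - b0 - b1 * x[i]) ** 2 for i in range(n))

f00 = sp.diff(SSE, b0, 2)   # second partial in b0: equals 2n > 0
f11 = sp.diff(SSE, b1, 2)   # second partial in b1: equals 2*sum(x_i^2)
f01 = sp.diff(SSE, b0, b1)  # mixed partial: equals 2*sum(x_i)

# The discriminant f00*f11 - f01**2 simplifies to
# 4*(n*sum(x_i**2) - (sum x_i)**2) = 4n*S_XX,
# which is positive whenever the x_i are not all equal,
# so the critical point is indeed a local minimum.
print(f00)
print(sp.factor(sp.expand(f00 * f11 - f01 ** 2)))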

Exercise: If you are taking Math 212, Math 213, or Math 214 this semester, guess what?
