The Simple Regression Model: Introductory Econometrics: A Modern Approach (Wooldridge)

The document summarizes key aspects of simple linear regression models: 1) A simple regression model relates a dependent variable y to an independent variable x using an equation of the form y = β0 + β1x + u, where u is an error term. 2) Ordinary least squares (OLS) estimation minimizes the sum of squared residuals to estimate the slope β1 and intercept β0 parameters. 3) The OLS slope estimate is the sample covariance between x and y divided by the sample variance of x, and the intercept estimate is chosen so that the OLS residuals average to zero, which makes the fitted line pass through the sample means.

15/11/2011

Introductory Econometrics:
A modern approach (Wooldridge)
Chapter 2

The Simple Regression Model

Dr. Lê Văn Chơn – FTU, 2011 Chap 9-1

2.1 Definition of Simple Regression Model



Applied econometric analysis often begins with 2 variables y and x.


We are interested in “studying how y varies with changes in x”.
E.g., x is years of education, y is hourly wage.
x is number of police officers, y is a community crime rate.

In the simple linear regression model:


$y = \beta_0 + \beta_1 x + u$ (2.1)
y is called the dependent variable, the explained variable, or the
regressand.
x is called the independent variable, the explanatory variable, or the
regressor.
u, called the error term or disturbance, represents factors other than x
that affect y. u stands for “unobserved”.

If the other factors in u are held fixed, so that $\Delta u = 0$, then x has a linear effect on y: $\Delta y = \beta_1 \Delta x$
β1 is the slope parameter. This is of primary interest in applied
economics.

A one-unit change in x has the same effect on y, regardless of the initial value of x. This is often unrealistic.
E.g., in the wage-education example, we might want to allow for increasing returns.

CuuDuongThanCong.com https://fb.com/tailieudientucntt

An assumption: the average value of u in the population is zero:
$E(u) = 0$ (2.5)
This assumption is not restrictive, since we can always use $\beta_0$ to normalize $E(u)$ to 0.

Because u and x are random variables, we can define the conditional distribution of u given any value of x.
Crucial assumption: the average value of u does not depend on x:
$E(u|x) = E(u)$ (2.6)

(2.5) + (2.6) ⇒ $E(u|x) = 0$, the zero conditional mean assumption.

This implies $E(y|x) = \beta_0 + \beta_1 x$.


Population regression function (PRF): E(y|x) is a linear function of x.


For any value of x, the distribution of y is centered about E(y|x).

[Figure: the PRF $E(y|x) = \beta_0 + \beta_1 x$, with the density f(y) of y centered on the line at $x_1$ and $x_2$.]

2.2 Ordinary Least Squares



How can we estimate the population parameters $\beta_0$ and $\beta_1$ from a sample?

Let $\{(x_i, y_i): i = 1, 2, \dots, n\}$ denote a random sample of size n from the population.

For each observation in this sample, it will be the case that
$y_i = \beta_0 + \beta_1 x_i + u_i$


PRF, sample data points and the associated error terms:


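[Figure: the PRF $E(y|x) = \beta_0 + \beta_1 x$ with sample points $(x_1, y_1), \dots, (x_4, y_4)$; the errors $u_1, \dots, u_4$ are the vertical distances from the points to the PRF.]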

To derive the OLS estimates, we need to realize that our main assumption $E(u|x) = E(u) = 0$ also implies that
$\mathrm{Cov}(x, u) = E(xu) = 0$ (2.11)
Why? Since $E(u) = 0$, $\mathrm{Cov}(x, u) = E(xu) - E(x)E(u) = E(xu)$, and by the law of iterated expectations $E(xu) = E_x[E(xu|x)] = E_x[x\,E(u|x)] = 0$.

We can write the two restrictions (2.5) and (2.11) in terms of x, y, $\beta_0$ and $\beta_1$:
$E(y - \beta_0 - \beta_1 x) = 0$ (2.12)
$E[x(y - \beta_0 - \beta_1 x)] = 0$ (2.13)

(2.12) and (2.13) are two moment restrictions with two unknown parameters. They can be used to obtain good estimators of $\beta_0$ and $\beta_1$.


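The method of moments approach to estimation imposes the population moment restrictions on the sample moments.

Given a sample, we choose estimates $\hat\beta_0$ and $\hat\beta_1$ to solve the sample versions:
$\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$ (2.14)

$\frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$ (2.15)

Given the properties of summation, (2.14) can be rewritten as
$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$ (2.16)
or $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ (2.17)
where $\bar{x}$ and $\bar{y}$ are the sample averages of the $x_i$ and $y_i$.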
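As a minimal numerical sketch in Python (the function name `mom_estimates` and the five data points are made up for illustration), multiplying (2.14) and (2.15) by n turns them into a 2×2 linear system in $\hat\beta_0$ and $\hat\beta_1$, which can be solved directly with Cramer's rule:

```python
def mom_estimates(x, y):
    """Solve the sample moment conditions (2.14)-(2.15) for (b0_hat, b1_hat).

    Multiplying both conditions by n gives the linear system
        n      * b0 + sum(x)   * b1 = sum(y)
        sum(x) * b0 + sum(x^2) * b1 = sum(x*y)
    which is solved here with Cramer's rule.
    """
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # equals n * sum((x - x_bar)^2); zero if x never varies
    if det == 0:
        raise ValueError("x has no sample variation, so the system has no unique solution")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical toy sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = mom_estimates(x, y)
print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")  # b0_hat = 2.200, b1_hat = 0.600
```

The determinant check is the sample counterpart of the requirement that x must vary; with constant x the two equations are not independent.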

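Drop 1/n in (2.15) and plug (2.17) into (2.15):
$\sum_{i=1}^{n} x_i \left(y_i - [\bar{y} - \hat\beta_1 \bar{x}] - \hat\beta_1 x_i\right) = 0$
$\Rightarrow \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$
Because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, this is the same as
$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2$
Provided that
$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$ (2.18)
the estimated slope is
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (2.19)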
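A minimal Python sketch of (2.19) followed by (2.17), assuming a small made-up sample (`ols_simple` is a hypothetical helper name, not from the text); it raises an error when condition (2.18) fails:

```python
def ols_simple(x, y):
    """OLS estimates for y = b0 + b1*x: slope from (2.19), intercept from (2.17)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    if sst_x == 0:  # condition (2.18) fails: x does not vary in the sample
        raise ValueError("x must vary in the sample")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / sst_x          # (2.19)
    b0 = y_bar - b1 * x_bar    # (2.17)
    return b0, b1


# Hypothetical toy sample
b0, b1 = ols_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b0_hat = {b0:.3f}, b1_hat = {b1:.3f}")  # b0_hat = 2.200, b1_hat = 0.600
```

Dividing the numerator and denominator of (2.19) by n − 1 turns them into the sample covariance of x and y and the sample variance of x; the common factor cancels, so the slope is unchanged.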


Summary of OLS slope estimate:
- The slope estimate is the sample covariance between x and y divided by the sample variance of x.
- If x and y are positively correlated in the sample, the slope will be positive.
- If x and y are negatively correlated in the sample, the slope will be negative.
- We only need x to vary in the sample.

$\hat\beta_0$ and $\hat\beta_1$ given in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$.


To justify this name, for any $\hat\beta_0$ and $\hat\beta_1$, define a fitted value for y given $x = x_i$: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ (2.20)

The residual for observation i is the difference between the actual $y_i$ and its fitted value: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$

Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible, hence the term “ordinary least squares”.
Formal minimization problem:
$\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$ (2.22)
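To see the minimization in (2.22) at work, this Python sketch (made-up data; the grid of perturbations is arbitrary) computes the OLS line from (2.19) and (2.17) and checks that every other candidate line on a small grid around it has a strictly larger sum of squared residuals. A grid check like this only illustrates the result; it is not a proof.

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals (2.22) for the candidate line y = b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS estimates from (2.19) and (2.17)
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0_hat = y_bar - b1_hat * x_bar
ssr_min = ssr(b0_hat, b1_hat, x, y)

# Compare with every other line on a small grid of perturbations around the OLS line.
steps = [-0.5, -0.1, -0.01, 0.0, 0.01, 0.1, 0.5]
others = [
    ssr(b0_hat + d0, b1_hat + d1, x, y)
    for d0 in steps
    for d1 in steps
    if (d0, d1) != (0.0, 0.0)
]
print(f"SSR at OLS: {ssr_min:.4f}; smallest SSR elsewhere on the grid: {min(others):.4f}")
print(all(s > ssr_min for s in others))  # True
```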


Sample regression line, sample data points and residuals:


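[Figure: the sample regression line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ with sample points $(x_1, y_1), \dots, (x_4, y_4)$; the residuals $\hat{u}_1, \dots, \hat{u}_4$ are the vertical distances from the points to the line.]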

To solve (2.22), we obtain two first order conditions which, after dropping the common factor −2, are the same as (2.14) and (2.15) multiplied by n.

Once we have determined the OLS estimates $\hat\beta_0$ and $\hat\beta_1$, we have the OLS regression line: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ (2.23)

(2.23) is also called the sample regression function (SRF) because it is the estimated version of the population regression function (PRF) $E(y|x) = \beta_0 + \beta_1 x$.

Remember that the PRF is fixed but unknown.
Different samples generate different SRFs.


The slope estimate $\hat\beta_1$ is of primary interest. It tells us the amount by which $\hat{y}$ changes when x increases by one unit:
$\Delta\hat{y} = \hat\beta_1 \Delta x$

E.g., we study the relationship between firm performance and CEO compensation:
$salary = \beta_0 + \beta_1 roe + u$
salary = CEO’s annual salary in thousands of dollars,
roe = average return (%) on the firm’s equity for the previous 3 years.
Because a higher roe is good for the firm, we think $\beta_1 > 0$.

Data set CEOSAL1 contains information on 209 CEOs in 1990.
OLS regression line: $\widehat{salary} = 963.191 + 18.501\,roe$ (2.26)
If roe increases by one percentage point, predicted salary increases by 18.501 thousand dollars.
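Because (2.26) is a fitted line, predictions and the rule $\Delta\hat{y} = \hat\beta_1 \Delta x$ are simple arithmetic. A minimal Python sketch using only the two coefficients reported in (2.26) (the helper name `predicted_salary` and the chosen roe values are illustrative):

```python
# Coefficients reported in (2.26): salary in thousands of dollars, roe in percent.
B0_HAT = 963.191
B1_HAT = 18.501


def predicted_salary(roe):
    """Fitted CEO salary ($'000) from (2.26) for a return on equity of `roe` percent."""
    return B0_HAT + B1_HAT * roe


for roe in (0, 10, 30):
    print(f"roe = {roe:2d}% -> predicted salary = {predicted_salary(roe):,.3f} thousand dollars")

# Delta y_hat = b1_hat * Delta x: one more percentage point of roe adds b1_hat thousand dollars.
change = predicted_salary(31) - predicted_salary(30)
print(f"change for +1 point of roe: {change:.3f} thousand dollars")
```

For example, at roe = 30 the predicted salary is 963.191 + 18.501 × 30 = 1,518.221 thousand dollars, just over 1.5 million dollars.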

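E.g., for the population of the workforce in 1976, let
y = wage, in dollars per hour,
x = educ, years of schooling.

Using data in WAGE1 with 526 observations, we obtain the OLS regression line:
$\widehat{wage} = -0.90 + 0.54\,educ$ (2.27)

Implication of the intercept? A person with no education has a predicted hourly wage of −0.90 dollars, which makes no sense. Why?
Only 18 people in the sample have less than 8 years of education ⇒ the regression line does poorly at very low levels of education.
Implication of the slope? One more year of education increases the predicted hourly wage by 54 cents.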

2.3 Mechanics of OLS



Fitted Values and Residuals


Given $\hat\beta_0$ and $\hat\beta_1$, we can obtain the fitted value $\hat{y}_i$ for each observation. Each $\hat{y}_i$ is on the OLS regression line.

The OLS residual associated with observation i, $\hat{u}_i$, is the difference between $y_i$ and its fitted value.
If $\hat{u}_i$ is positive, the line underpredicts $y_i$.
If $\hat{u}_i$ is negative, the line overpredicts $y_i$.

In most cases every $\hat{u}_i \neq 0$, i.e., none of the data points lies exactly on the OLS line.


Algebraic Properties of OLS Statistics


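(1) The sum, and thus the sample average, of the OLS residuals is zero:
$\sum_{i=1}^{n} \hat{u}_i = 0$ and thus $\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = 0$

(2) The sample covariance between the regressor and the OLS residuals is zero:
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$

(3) The OLS regression line always goes through the point of sample means: $\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$

Properties (1) and (2) are the first order conditions (2.14) and (2.15) written in terms of the residuals; (3) is (2.16).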
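These properties are easy to confirm numerically. A short Python sketch on a made-up sample; the sums come out zero only up to floating-point rounding, hence the tolerance:

```python
# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS estimates from (2.19) and (2.17)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]             # fitted values
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]  # OLS residuals

sum_resid = sum(u_hat)                                   # property (1)
sum_x_resid = sum(xi * ui for xi, ui in zip(x, u_hat))   # property (2)
gap_at_means = (b0 + b1 * x_bar) - y_bar                 # property (3)

tol = 1e-9  # allow for floating-point rounding
print(abs(sum_resid) < tol, abs(sum_x_resid) < tol, abs(gap_at_means) < tol)  # True True True
```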


We can think of each observation i as being made up of an explained part and an unexplained part: $y_i = \hat{y}_i + \hat{u}_i$.

We define the following:
$\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares (SST),
$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE),
$\sum_{i=1}^{n} \hat{u}_i^2$ is the residual sum of squares (SSR).

Then SST = SSE + SSR (2.36)


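Proof:
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 = \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$
$= \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$= \text{SSR} + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \text{SSE}$
and we know that $\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0$, because $\hat{y}_i - \bar{y} = \hat\beta_1 (x_i - \bar{x})$ and, by properties (1) and (2), $\sum_{i} \hat{u}_i = \sum_{i} x_i \hat{u}_i = 0$.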


Goodness-of-Fit
How well does the OLS regression line fit the data?

Divide (2.36) by SST (assuming SST > 0, i.e., y varies in the sample) to get:
1 = SSE/SST + SSR/SST
The R-squared of the regression, or the coefficient of determination, is
$R^2 \equiv \dfrac{\text{SSE}}{\text{SST}} = 1 - \dfrac{\text{SSR}}{\text{SST}}$ (2.38)
It is the fraction of the sample variation in y that is explained by x.
$0 \le R^2 \le 1$
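A small Python sketch computing SST, SSE, SSR and both forms of (2.38) on a made-up sample. It also checks a standard fact for simple regression with an intercept: $R^2$ equals the squared sample correlation between x and y.

```python
import math


def sums_of_squares(x, y):
    """Return (SST, SSE, SSR) for the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained sum of squares
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return sst, sse, ssr


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
sst, sse, ssr = sums_of_squares(x, y)

r2 = sse / sst          # (2.38), first form
r2_alt = 1 - ssr / sst  # (2.38), second form

# Squared sample correlation between x and y
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
corr = sxy / math.sqrt(sum((xi - x_bar) ** 2 for xi in x) * sum((yi - y_bar) ** 2 for yi in y))

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")  # SST = 6.00, SSE = 3.60, SSR = 2.40
print(f"R^2 = {r2:.2f} = {r2_alt:.2f} = {corr ** 2:.2f}")      # R^2 = 0.60 = 0.60 = 0.60
```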


E.g., CEOSAL1: roe explains only about 1.3% of the variation in salaries for this sample.
⇒ 98.7% of the salary variation for these CEOs is left unexplained!

Notice that a seemingly low R2 does not mean that an OLS


regression equation is useless.
It is still possible that (2.26) is a good estimate of the ceteris paribus
relationship between salary and roe.


2.4 Units of Measurement



OLS estimates change when the units of measurement of the dependent and independent variables change.

E.g., CEOSAL1. Rather than measuring salary in thousands of dollars, we measure it in dollars: salardol = 1,000·salary.
Without running the regression, we know that
$\widehat{salardol} = 963{,}191 + 18{,}501\,roe$ (2.40)
Multiplying the intercept and the slope in (2.26) by 1,000 gives (2.40), so (2.26) and (2.40) have the same interpretation.

Define roedec = roe/100, where roedec is roe expressed as a decimal. The slope is multiplied by 100 and the intercept is unchanged:
$\widehat{salary} = 963.191 + 1850.1\,roedec$ (2.41)


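What happens to R² when the units of measurement change?
Nothing: rescaling y multiplies SSE and SST by the same factor, and rescaling x leaves the fitted values unchanged.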
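The effect of rescaling can be checked numerically. In this Python sketch (made-up data standing in for roe and salary, not CEOSAL1; `fit` is a hypothetical helper), multiplying y by 1,000 multiplies both OLS coefficients by 1,000, dividing x by 100 multiplies the slope by 100 and leaves the intercept alone, and R² is the same in all three fits.

```python
def fit(x, y):
    """Return (b0_hat, b1_hat, R^2) for the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / sst


# Hypothetical data, standing in for (roe, salary)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

base = fit(x, y)
y_times_1000 = fit(x, [1000 * yi for yi in y])  # like salardol = 1,000 * salary
x_over_100 = fit([xi / 100 for xi in x], y)     # like roedec = roe / 100

for label, (b0, b1, r2) in [("original", base), ("y * 1000", y_times_1000), ("x / 100", x_over_100)]:
    print(f"{label:9s} b0_hat = {b0:9.3f}  b1_hat = {b1:8.3f}  R^2 = {r2:.3f}")
```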


2.4 Nonlinearities in Simple Regression



It is rather easy to incorporate many nonlinearities into simple regression analysis by appropriately defining y and x.

E.g., WAGE1. $\hat\beta_1 = 0.54$ means that each additional year of education increases the predicted wage by 54 cents, whether it is the first year or the twentieth ⇒ maybe not reasonable.

Suppose instead that the percentage increase in wage is the same for each additional year of education.
(2.27) does not imply a constant percentage increase.

New model: $\log(wage) = \beta_0 + \beta_1 educ + u$ (2.42)
where log(·) denotes the natural logarithm.


For each additional year of education, the percentage change in wage is the same ⇒ the change in wage in dollars increases with education.
(2.42) thus implies an increasing return to education.

Estimating this model and the mechanics of OLS are the same as before:
$\widehat{\log(wage)} = 0.584 + 0.083\,educ$ (2.44)
⇒ wage increases by approximately 8.3 percent (100 × 0.083) for every additional year of education.
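The 8.3 percent figure uses the approximation $\%\Delta wage \approx 100\,\hat\beta_1 \Delta educ$. Holding u fixed, (2.42) implies that wage is multiplied by $\exp(\beta_1)$ for each extra year, so the exact percentage change is $100[\exp(\hat\beta_1) - 1]$. A small Python sketch comparing the two and showing that a constant percentage change means a larger dollar change at higher wages (the wage levels in the loop are hypothetical):

```python
import math

B1_HAT = 0.083  # slope on educ reported in (2.44)

approx_pct = 100 * B1_HAT                 # approximation: 100 * b1_hat
exact_pct = 100 * (math.exp(B1_HAT) - 1)  # exact change implied by the log model, u held fixed

print(f"approximate: {approx_pct:.1f}%  exact: {exact_pct:.2f}%")  # approximate: 8.3%  exact: 8.65%

# Same percentage change, growing dollar change (hypothetical wage levels, dollars per hour)
for wage in (5.0, 10.0, 20.0):
    extra = wage * (math.exp(B1_HAT) - 1)
    print(f"wage {wage:5.2f} -> about {extra:.2f} more per hour with one more year of educ")
```

For a coefficient as small as 0.083 the two numbers are close; the approximation deteriorates as $|\hat\beta_1|$ grows.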


Another important use of the natural log is in obtaining a constant elasticity model.

E.g., CEOSAL1. We can estimate a constant elasticity model relating CEO salary (thousands of dollars) to firm sales (millions of dollars):
$\log(salary) = \beta_0 + \beta_1 \log(sales) + u$ (2.45)
where $\beta_1$ is the elasticity of salary with respect to sales.

If we change the units of measurement of y, what happens to $\beta_1$?
Nothing: because $\log(c\,y) = \log(c) + \log(y)$ for any constant c > 0, only the intercept changes.
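A Python sketch of this invariance, using made-up sales and salary figures (not the CEOSAL1 data): measuring salary in dollars instead of thousands of dollars leaves the estimated elasticity unchanged and shifts the intercept by exactly log(1000).

```python
import math


def ols(x, y):
    """Return (b0_hat, b1_hat) for the simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1


# Hypothetical data (not CEOSAL1): sales in $ millions, salary in $'000
sales = [120.0, 450.0, 900.0, 2500.0, 7000.0]
salary = [600.0, 850.0, 1000.0, 1300.0, 1900.0]

log_sales = [math.log(s) for s in sales]
b0_thou, b1_thou = ols(log_sales, [math.log(w) for w in salary])       # salary in $'000
b0_dol, b1_dol = ols(log_sales, [math.log(1000 * w) for w in salary])  # salary in $

print(f"elasticity: {b1_thou:.4f} vs {b1_dol:.4f}")
print(f"intercept shift: {b0_dol - b0_thou:.4f}, log(1000) = {math.log(1000):.4f}")
```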


2.4 Meaning of Linear Regression



We have seen a model that allows for nonlinear relationships. So what does “linear” mean?

An equation $y = \beta_0 + \beta_1 x + u$ is linear in the parameters $\beta_0$ and $\beta_1$.
There are no restrictions on how y and x relate to the original dependent and independent variables.

Plenty of models cannot be cast as linear regression models because they are not linear in their parameters.
E.g., $cons = 1/(\beta_0 + \beta_1 inc) + u$


2.5 Unbiasedness of OLS



Unbiasedness of OLS is established under a set of assumptions:

Assumption SLR.1 (Linear in Parameters)
The population model is linear in parameters as
$y = \beta_0 + \beta_1 x + u$ (2.47)
where $\beta_0$ and $\beta_1$ are the population intercept and slope parameters.
Realistically, y, x, and u are all viewed as random variables.

Assumption SLR.2 (Random Sampling)
We have a random sample of size n, $\{(x_i, y_i): i = 1, 2, \dots, n\}$, from the population model.


Not all cross-sectional samples can be viewed as random samples, but many may be.

We can write (2.47) in terms of the random sample as
$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \dots, n$ (2.48)

To obtain unbiased estimators of $\beta_0$ and $\beta_1$, we need to impose
Assumption SLR.3 (Zero Conditional Mean)
$E(u|x) = 0$

Together with random sampling (SLR.2), this assumption implies $E(u_i|x_i) = 0$ for all i = 1, 2, …, n.


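Assumption SLR.4 (Sample Variation in the Independent Variable)
In the sample, the outcomes $x_i$, i = 1, 2, …, n, are not all equal to the same constant.
This assumption is equivalent to $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$.

From (2.19), and since $\sum_{i=1}^{n} (x_i - \bar{x})\bar{y} = 0$:
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Plug (2.48) into this, writing $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and using $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) x_i = SST_x$:
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{SST_x} = \beta_1 + \dfrac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}$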


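The errors $u_i$ are generally different from 0 ⇒ in any given sample, $\hat\beta_1$ differs from $\beta_1$.

The first important statistical property of OLS:
Theorem 2.1 (Unbiasedness of OLS)
Using Assumptions SLR.1 through SLR.4,
$E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$ (2.53)
The OLS estimators of $\beta_0$ and $\beta_1$ are unbiased.

Proof (expectations are conditional on the sample values $x_1, \dots, x_n$, so $SST_x$ and $x_i - \bar{x}$ act as constants; SLR.2 and SLR.3 give $E(u_i|x_1, \dots, x_n) = 0$):
$E(\hat\beta_1) = \beta_1 + E\left[\frac{1}{SST_x}\sum_{i=1}^{n} (x_i - \bar{x}) u_i\right] = \beta_1 + \frac{1}{SST_x}\sum_{i=1}^{n} (x_i - \bar{x}) E(u_i) = \beta_1$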
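Unbiasedness is a statement about the average of $\hat\beta_1$ over repeated samples, which a small Monte Carlo simulation can illustrate. This Python sketch assumes a hypothetical population with $\beta_0 = 1$, $\beta_1 = 0.5$, x uniform on [0, 10] and normal errors drawn independently of x (so SLR.1 through SLR.4 hold); the sample size, number of replications and seed are arbitrary choices.

```python
import random


def ols_slope(x, y):
    """OLS slope (2.19)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den


def average_slope(beta0=1.0, beta1=0.5, n=30, reps=5000, seed=42):
    """Average of the OLS slope over `reps` random samples of size n (hypothetical design)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        u = [rng.gauss(0, 2) for _ in range(n)]  # drawn independently of x, so E(u|x) = 0
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        total += ols_slope(x, y)
    return total / reps


avg = average_slope()
print(f"average slope estimate: {avg:.4f} (true beta1 = 0.5)")
```

Individual estimates scatter around 0.5; only their average across many samples settles near the true value.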


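(2.17) implies, after averaging (2.48) to get $\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$:
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat\beta_1 \bar{x} = \beta_0 + (\beta_1 - \hat\beta_1)\bar{x} + \bar{u}$
$E(\hat\beta_0) = \beta_0 + E[(\beta_1 - \hat\beta_1)\bar{x}] + E(\bar{u}) = \beta_0$, since $E(\hat\beta_1) = \beta_1$ and $E(\bar{u}) = 0$ (conditional on the $x_i$).

Remember that unbiasedness is a feature of the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$. It says nothing about the estimate we obtain for a given sample.

If any of the four assumptions fails, then OLS is not necessarily unbiased.

When u contains factors that affect y and are also correlated with x, the result can be spurious correlation.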


E.g., let math10 denote the % of tenth graders at a high school receiving a passing score on a standardized math exam.
Let lnchprg denote the % of students eligible for the federally funded school lunch program.
We expect the lunch program to have a positive effect on performance:
$math10 = \beta_0 + \beta_1 lnchprg + u$

MEAP93 has data on 408 Michigan high schools for the 1992-1993 school year.
$\widehat{math10} = 32.14 - 0.319\,lnchprg$
Why is the estimated effect negative? u contains factors such as the poverty rate of children attending the school, which affects student performance and is highly correlated with eligibility for the lunch program.
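A stylized simulation shows how this can happen. In this hypothetical Python sketch, the true effect of lnchprg on math10 is set to +0.1, but a poverty variable left in u lowers math10 and is strongly correlated with lnchprg, so SLR.3 fails and the simple regression slope comes out negative. All numbers are invented for illustration and are not taken from MEAP93.

```python
import random


def ols_slope(x, y):
    """OLS slope (2.19)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den


rng = random.Random(0)
n = 2000
poverty = [rng.uniform(10, 60) for _ in range(n)]  # hypothetical % of poor students
lnchprg = [p + rng.gauss(0, 5) for p in poverty]   # eligibility closely tracks poverty

# Hypothetical true model: lunch program +0.1, poverty -0.5.
# poverty is not a regressor, so u = -0.5 * poverty + noise is correlated with lnchprg.
math10 = [40 + 0.1 * lunch - 0.5 * p + rng.gauss(0, 5) for lunch, p in zip(lnchprg, poverty)]

slope = ols_slope(lnchprg, math10)
print(f"estimated slope on lnchprg: {slope:.3f} (true effect: +0.1)")
```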

2.5 Variances of the OLS Estimators



Now we know that the sampling distribution of our estimator is centered about the true parameter.
How spread out is this distribution? ⇒ its variance.

We need to add an assumption.
Assumption SLR.5 (Homoskedasticity)
$\mathrm{Var}(u|x) = \sigma^2$

This assumption is distinct from Assumption SLR.3: $E(u|x) = 0$.
This assumption simplifies the variance calculations for $\hat\beta_0$ and $\hat\beta_1$, and it implies that OLS has certain efficiency properties.


$\mathrm{Var}(u|x) = E(u^2|x) - [E(u|x)]^2 = E(u^2|x) = \sigma^2$ ⇒ $\mathrm{Var}(u) = E(u^2) = \sigma^2$ (using $E(u) = 0$ and the law of iterated expectations).

$\sigma^2$ is often called the error variance.
$\sigma$, the square root of the error variance, is called the standard deviation of the error.

We can say that
$E(y|x) = \beta_0 + \beta_1 x$ (2.55)
$\mathrm{Var}(y|x) = \sigma^2$ (2.56)


Homoskedastic case:
[Figure: conditional densities f(y|x) at $x_1$ and $x_2$, centered on $E(y|x) = \beta_0 + \beta_1 x$ and with the same spread.]

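Heteroskedastic case:
[Figure: conditional densities f(y|x) at $x_1$, $x_2$ and $x_3$, centered on $E(y|x) = \beta_0 + \beta_1 x$, with a spread that changes with x.]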

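Theorem 2.2 (Sampling variances of the OLS estimators)
Under Assumptions SLR.1 through SLR.5,
$\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{\sigma^2}{SST_x}$ (2.57)
and
$\mathrm{Var}(\hat\beta_0) = \dfrac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (2.58)

Proof (conditional on the $x_i$; by SLR.2 the $u_i$ are independent across i, and by SLR.5 $\mathrm{Var}(u_i) = \sigma^2$):
$\mathrm{Var}(\hat\beta_1) = \dfrac{1}{SST_x^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \mathrm{Var}(u_i) = \dfrac{1}{SST_x^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma^2 = \dfrac{SST_x\,\sigma^2}{SST_x^2} = \dfrac{\sigma^2}{SST_x}$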
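Formula (2.57) can be compared with the spread of $\hat\beta_1$ across simulated samples. This Python sketch uses a hypothetical design (not from the text): the x values are drawn once and then held fixed, matching the conditioning on the $x_i$ in the proof, and each replication draws new homoskedastic normal errors with σ = 2.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope (2.19)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den


rng = random.Random(1)
n, beta0, beta1, sigma = 40, 1.0, 0.5, 2.0

x = [rng.uniform(0, 10) for _ in range(n)]  # drawn once, then held fixed
x_bar = sum(x) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)

slopes = []
for _ in range(4000):
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]  # Var(u|x) = sigma^2 (SLR.5)
    slopes.append(ols_slope(x, y))

simulated = statistics.variance(slopes)
formula = sigma ** 2 / sst_x  # (2.57)
print(f"simulated Var(b1_hat) = {simulated:.5f}, formula (2.57) = {formula:.5f}")
```

Making the error standard deviation depend on x inside the loop is one way to see the formula break down under heteroskedasticity.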


(2.57) and (2.58) are invalid in the presence of heteroskedasticity.

(2.57) and (2.58) imply that (for j = 0, 1):
(i) The larger the error variance, the larger is $\mathrm{Var}(\hat\beta_j)$.
(ii) The larger the variability in the $x_i$, the smaller is $\mathrm{Var}(\hat\beta_j)$.

Problem: the error variance $\sigma^2$ is unknown because we don’t observe the errors $u_i$.


2.5 Estimating the Error Variance



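What we observe are the residuals $\hat{u}_i$. We can use the residuals to form an estimate of the error variance.

We write the residuals as a function of the errors:
$\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i = (\beta_0 + \beta_1 x_i + u_i) - \hat\beta_0 - \hat\beta_1 x_i$
$\hat{u}_i = u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i$ (2.59)

Under Assumptions SLR.1 through SLR.5, an unbiased estimator of $\sigma^2$ is
$\hat\sigma^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2 = \dfrac{\text{SSR}}{n-2}$ (2.61)
The divisor n − 2 (the degrees of freedom) reflects the two restrictions $\sum_{i} \hat{u}_i = 0$ and $\sum_{i} x_i \hat{u}_i = 0$ that the OLS residuals must satisfy.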
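A Monte Carlo sketch can illustrate why the divisor is n − 2 rather than n. In this hypothetical Python example (fixed x = 1, …, 10, normal errors with σ² = 2.25; all settings arbitrary), the average of SSR/(n − 2) over many samples is close to σ², while SSR/n is too small on average by the factor (n − 2)/n.

```python
import random


def sigma2_hat(x, y):
    """Unbiased error-variance estimator (2.61): SSR / (n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    b0 = y_bar - b1 * x_bar
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    return ssr / (n - 2)


rng = random.Random(7)
n, sigma = 10, 1.5
x = [float(i) for i in range(1, n + 1)]

draws = []
for _ in range(20000):
    y = [2.0 + 0.3 * xi + rng.gauss(0, sigma) for xi in x]
    draws.append(sigma2_hat(x, y))

avg_unbiased = sum(draws) / len(draws)
avg_divide_by_n = avg_unbiased * (n - 2) / n  # average of SSR / n
print(f"SSR/(n-2): {avg_unbiased:.3f}  SSR/n: {avg_divide_by_n:.3f}  sigma^2: {sigma ** 2:.3f}")
```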


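$\hat\sigma = \sqrt{\hat\sigma^2}$ = standard error of the regression (SER).

Recall that $\mathrm{sd}(\hat\beta_1) = \sigma/\sqrt{SST_x}$. If we substitute $\hat\sigma$ for $\sigma$, then we have the standard error of $\hat\beta_1$:
$\mathrm{se}(\hat\beta_1) = \dfrac{\hat\sigma}{\sqrt{SST_x}} = \dfrac{\hat\sigma}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{1/2}}$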
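Putting (2.61) and the formula for $\mathrm{se}(\hat\beta_1)$ together, a minimal Python sketch on a made-up five-point sample (`ols_with_se` is a hypothetical helper name):

```python
import math


def ols_with_se(x, y):
    """Return (b0_hat, b1_hat, SER, se(b1_hat)) for the simple OLS regression of y on x."""
    n = len(x)
    if n <= 2:
        raise ValueError("need n > 2 observations to estimate sigma^2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ser = math.sqrt(ssr / (n - 2))   # sigma_hat, from (2.61)
    se_b1 = ser / math.sqrt(sst_x)   # se(b1_hat) = sigma_hat / sqrt(SST_x)
    return b0, b1, ser, se_b1


b0, b1, ser, se_b1 = ols_with_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1_hat = {b1:.3f}, SER = {ser:.4f}, se(b1_hat) = {se_b1:.4f}")
# b1_hat = 0.600, SER = 0.8944, se(b1_hat) = 0.2828
```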


2.6 Regression through the Origin



In rare cases, we impose the restriction that when x = 0, $E(y|0) = 0$.
E.g., if income (x) is zero, income tax revenues (y) must also be zero.

Equation: $\tilde{y} = \tilde\beta_1 x$ (2.63)
Obtaining (2.63) is called regression through the origin.

We still use the OLS method, with the corresponding first order condition
$\sum_{i=1}^{n} x_i (y_i - \tilde\beta_1 x_i) = 0 \;\Rightarrow\; \tilde\beta_1 = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$ (2.66)
provided that not all $x_i$ are zero.

If $\beta_0 \neq 0$, then $\tilde\beta_1$ is a biased estimator of $\beta_1$.
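Formula (2.66) and the warning about a nonzero intercept can be illustrated with noise-free, made-up data in Python: when the true line passes through the origin, $\tilde\beta_1$ recovers the slope exactly; when the true intercept is 5, forcing the line through the origin distorts the slope, while the usual OLS slope (2.19) does not. The function names are hypothetical.

```python
def slope_through_origin(x, y):
    """Regression-through-the-origin slope (2.66): sum(x*y) / sum(x^2)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("at least one x must be nonzero")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def slope_with_intercept(x, y):
    """Usual OLS slope (2.19), estimated together with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)


x = [1, 2, 3, 4, 5]
y_origin = [2 * xi for xi in x]       # true line y = 2x      (beta0 = 0)
y_shifted = [5 + 2 * xi for xi in x]  # true line y = 5 + 2x  (beta0 = 5)

print(slope_through_origin(x, y_origin))   # 2.0
print(slope_through_origin(x, y_shifted))  # about 3.364, far from 2
print(slope_with_intercept(x, y_shifted))  # 2.0
```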
