
Econ 103 Lecture 5: Multiple Regression Part 1

Instructor: Lucas Zhang


November 6, 2023

In the last section, we studied the linear regression model with one regressor:

Y_i = \beta_0 + \beta_1 X_i + u_i

In particular, we discussed in detail the basic properties of the OLS estimator \hat{\beta}_1 and how to conduct inference. The same procedures for testing hypotheses and constructing confidence intervals hold for \hat{\beta}_0.[1]

[1] We have omitted the formula for the standard error of \hat{\beta}_0 for pedagogical reasons. In practice, the standard errors are read off the software output directly, and in most cases researchers and policy makers care more about \beta_1 than \beta_0.

One of the key assumptions we made in that section is that the regressor X_i and the error u_i are uncorrelated. Is this a plausible assumption? Suppose we regress "test score" on "student-teacher ratio":

test score = \beta_0 + \beta_1 \times (student-teacher ratio) + error

Since the errors capture all other variation in "test score" that is not explained by "student-teacher ratio", is it reasonable to assume that "student-teacher ratio" and the errors are uncorrelated? Probably not. Schools in more affluent communities are likely better funded: they can afford to hire more teachers and better teaching resources. Moreover, the students themselves are likely to come from affluent families who can afford private or extracurricular tutors.

As we will see in the next section, omitting these factors has severe consequences for the OLS estimators. In fact, when the regressor X and the error u are correlated, the OLS estimator \hat{\beta}_1 on X does not converge to the true \beta_1, making the OLS estimator questionable. One way to resolve this issue is to include those omitted factors directly, which is precisely what we are going to do in the regression with multiple regressors.

1 Regression with Multiple Regressors


Let's consider the simple linear regression model with one regressor:

Y_i = \beta_0 + \beta_1 X_i + u_i

Moreover, let's suppose that Cov(X_i, u_i) \neq 0. This in fact implies that E[u_i | X_i] \neq 0 and hence violates the OLS assumption we made in the last section.[2] What's the consequence? The OLS estimators are no longer consistent. One can show (not in this class)[3]

\hat{\beta}_1 \xrightarrow{p} \beta_1 + \underbrace{\frac{Cov(X_i, u_i)}{Var(X_i)}}_{\neq 0}

Note that the bias persists even when we have a large sample, which makes the estimator unreliable. One way to address this issue is to include additional variables in our model.

[2] Recall that we derived E[u_i | X_i] = 0 \implies Cov(X_i, u_i) = 0 before. By the contrapositive, Cov(X_i, u_i) \neq 0 \implies E[u_i | X_i] \neq 0.
[3] We will revisit this in later sections.
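
To make the inconsistency concrete, here is a minimal simulation sketch (not from the lecture; the variable names, such as affluence, are made up for illustration). The regressor is correlated with an omitted factor that ends up in the error term, so the estimated slope stays away from the true \beta_1 = 2 even as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0

for n in [1_000, 100_000]:
    affluence = rng.normal(size=n)             # omitted factor (illustrative)
    x = 0.8 * affluence + rng.normal(size=n)   # regressor correlated with it
    u = 1.5 * affluence + rng.normal(size=n)   # error absorbs the omitted factor
    y = beta0 + beta1 * x + u
    # simple-regression OLS slope: sample Cov(x, y) / Var(x)
    b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    print(n, round(b1_hat, 3))  # stays near beta1 + Cov(x, u)/Var(x), not at beta1
```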

1.1 The Multiple Regression Model


The multiple regression model extends the simple linear model to include additional regressors:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i,    i = 1, \dots, n

where

• Y is the dependent variable;

• X_1, X_2, \dots, X_k are the regressors;

• u is the error;

• the subscript i on Y_i, X_{1i}, \dots, X_{ki} indicates that they are the i-th observation from the sample;

• \beta_0 is the coefficient on the constant 1, \beta_1 is the coefficient on regressor X_1, \beta_2 is the coefficient on regressor X_2, and so on.

Suppose the error u_i conditional on all the regressors X_{1i}, X_{2i}, \dots, X_{ki} has mean 0:

E[u_i | X_{1i}, X_{2i}, \dots, X_{ki}] = 0

Then we have

E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] = E[\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i | X_{1i}, X_{2i}, \dots, X_{ki}]
                                       = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \underbrace{E[u_i | X_{1i}, X_{2i}, \dots, X_{ki}]}_{=0}
                                       = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}    (population regression line)

That is, the conditional expectation of Y_i given X_{1i}, \dots, X_{ki} is simply the regression "line" without the error term. Now suppose we change X_1 by \Delta X_1 units while holding X_2, \dots, X_k constant. We have

E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] + \Delta E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] = \beta_0 + \beta_1 (X_{1i} + \Delta X_1) + \beta_2 X_{2i} + \dots + \beta_k X_{ki}

Subtracting the population regression line E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} from this gives us

\Delta E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] = \beta_1 \Delta X_1

For \Delta X_1 = 1, the above result indicates that the coefficient \beta_1 on X_{1i} is the expected change in Y given a unit change in X_1 while holding all other regressors X_2, \dots, X_k constant.

• More formally, \beta_1 is the partial effect of X_1 on the expected value of Y while holding all other regressors X_2, \dots, X_k constant: taking the partial derivative with respect to X_{1i} (supposing X_1 is continuous),

  \frac{\partial}{\partial X_{1i}} E[Y_i | X_{1i}, X_{2i}, \dots, X_{ki}] = \beta_1

  per the definition of the partial derivative. (A numerical illustration follows after these bullets.)

• A similar interpretation holds for all other regressors X_2, \dots, X_k.

• How to interpret \beta_0?

  – Whenever it makes economic sense, \beta_0 is the expected value of Y_i when all the regressors X_1, X_2, \dots, X_k are zero.

  – Otherwise, it is simply the extrapolation of the population regression line to the point where all the regressors are zero.
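
As a concrete illustration of fitting a multiple regression and reading the coefficients as partial effects, here is a small sketch using statsmodels. The data are simulated and the variable names (str_ratio, pct_esl) are made up for illustration; they are not from the lecture.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
str_ratio = rng.normal(20, 2, size=n)   # hypothetical student-teacher ratio
pct_esl = rng.normal(15, 5, size=n)     # hypothetical share of ESL students
test_score = 680 - 1.5 * str_ratio - 0.6 * pct_esl + rng.normal(0, 5, size=n)

X = sm.add_constant(np.column_stack([str_ratio, pct_esl]))
fit = sm.OLS(test_score, X).fit()
print(fit.params)  # estimates of beta_0, beta_1, beta_2
# The coefficient on str_ratio is read as: the expected change in test_score
# for a one-unit change in str_ratio, holding pct_esl constant.
```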

1.2 The OLS Estimators

As in the simple regression case, we employ the ordinary least squares (OLS) estimation method. Recall that in the simple linear regression model, for any intercept b_0 and slope b_1 we choose, there is an error Y_i - (b_0 + b_1 X_i) between the point on the line, b_0 + b_1 X_i, and the point Y_i. The OLS method searches over all possible b_0 and b_1 to minimize the sum of squared residuals (SSR) \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2, and we call the resulting estimators \hat{\beta}_0 and \hat{\beta}_1.

A similar procedure extends to multiple regression: we search over all possible sets of b_0, b_1, b_2, \dots, b_k that minimize the sum of squared residuals:

\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k = \arg\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^n \big(Y_i - (b_0 + b_1 X_{1i} + \dots + b_k X_{ki})\big)^2

where the above expression reads: "\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k are the arguments/solutions that minimize the sum of squared residuals over all possible b_0, b_1, \dots, b_k".

• fitted/predicted value:

  \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki},    i = 1, \dots, n

• fitted residual:

  \hat{u}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}) = Y_i - \hat{Y}_i,    i = 1, \dots, n

Unfortunately, without further background in linear algebra, we do not have a simple formula for the OLS estimators \hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k. This won't pose a problem for our class: we don't need to compute them by hand, and it is more important for us to understand the properties and interpretations of these estimators.[4]

[4] If you are interested in learning more about this, you should consider taking at least introductory linear algebra and more advanced econometrics. I'm also happy to chat with you about this.
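
In practice the minimization is done numerically. A minimal sketch with simulated data: numpy's least-squares routine solves exactly the arg-min problem above, so no hand-derived formula is needed (variable names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])          # prepend the constant regressor
beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)

y_hat = Xmat @ beta_hat   # fitted/predicted values
u_hat = y - y_hat         # fitted residuals
print(beta_hat)           # (beta0_hat, beta1_hat, ..., betak_hat)
```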

1.3 The Goodness of Fit in Multiple Regression


1.3.1 Standard Error of the Regression

The standard error of the regression (SER) is an estimator of the standard deviation of the error u_i. Recall that the error, u_i = Y_i - (\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), is the distance between the actual data point and the true regression line, so the smaller the standard deviation, the tighter the points are around the regression line. Since the errors u_i are unknown, we estimate them with the fitted residuals:

SER = \sqrt{\frac{1}{n-k-1} \sum_{i=1}^n \hat{u}_i^2},    \hat{u}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki})

where in n - k - 1 = n - (k + 1), the k + 1 adjusts the degrees of freedom since we have already estimated k + 1 coefficients. The adjustment is negligible for a large sample size n.
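
A short helper sketching this formula (the function name ser and the argument Xmat are illustrative; Xmat is assumed to include the column of ones):

```python
import numpy as np

def ser(y, Xmat):
    """Standard error of the regression; Xmat has k + 1 columns (constant included),
    so the denominator below is n - k - 1."""
    n, k_plus_1 = Xmat.shape
    beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    u_hat = y - Xmat @ beta_hat
    return np.sqrt(np.sum(u_hat ** 2) / (n - k_plus_1))
```
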
1.3.2 The R^2 (basically repeats the simple regression discussion)

The R^2 is defined similarly as before. Note that Y_i can be decomposed into two components:

Y_i = \hat{Y}_i + (Y_i - \hat{Y}_i) = \underbrace{\hat{Y}_i}_{\text{fitted/predicted value}} + \underbrace{\hat{u}_i}_{\text{residual}}

Recall that \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}. So we can think of \hat{Y}_i as the part of Y_i explained/predicted by the regressors. The R^2 measures the fraction of the sample variance of Y_i that can be explained by the regressors. Formally, we define:

• The total sum of squares (TSS) measures the sample variance of Y_i:

  TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2

• The explained sum of squares (ESS) measures the sample variance of the predicted value \hat{Y}_i:[5]

  ESS = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2

  [5] Here we use the fact that \bar{\hat{Y}} = \bar{Y}, that is, the average of \hat{Y}_i equals the average of Y_i.

• The sum of squared residuals (SSR) is the sum of the squared OLS residuals:

  SSR = \sum_{i=1}^n \hat{u}_i^2

One can show that

TSS = ESS + SSR

and we formally define R^2 as the fraction of the explained sum of squares (ESS) over the total sum of squares (TSS):

R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}
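
A quick sketch verifying the decomposition and the two equivalent R^2 formulas on simulated data (numpy only; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
y_hat = Xmat @ beta_hat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
print(np.isclose(tss, ess + ssr))  # TSS = ESS + SSR
print(ess / tss, 1 - ssr / tss)    # the two expressions for R^2 agree
```
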
• As a ratio of nonnegative numbers, R^2 is always between 0 and 1.

• If R^2 is close to 1, the regressors are good at predicting Y_i; similarly, if R^2 is close to 0, the regressors are not good at predicting Y_i.

• What does R^2 not tell us? The R^2 does not tell us whether the regression is "good" or "bad". In particular, a low R^2 simply indicates that there are other factors (possibly unobserved) that influence Y_i, which does not mean that the current regressors are unimportant.

• The word "explain" as we use it here does not convey any economic or causal meaning.

R^2 is always nondecreasing as we add more regressors. There are two ways to see this.

• Intuitively, adding more regressors (hence more model parameters) makes the model more flexible, so it can potentially explain the data better, hence a smaller SSR.

• For a slightly more rigorous way of looking at this, consider the following example. We can think of the simple regression

  Y_i = \beta_0 + \beta_1 X_{1i} + u_i

  as a special case of the multiple regression

  Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i

  in which the coefficients on X_2, \dots, X_k are set to zero. If we run the multiple regression and OLS just happens to set those coefficients to 0, the R^2 will be the same for the simple and multiple regressions. On the other hand, if OLS picks nonzero coefficients \hat{\beta}_2, \dots, \hat{\beta}_k, the sum of squared residuals from the multiple regression will be smaller, by the definition of OLS:

  SSR(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k) < SSR(\hat{\beta}_0, \hat{\beta}_1, 0, \dots, 0)

  That is, (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k) minimizes the SSR while (\hat{\beta}_0, \hat{\beta}_1, 0, \dots, 0) does not. Examining the formula R^2 = 1 - SSR/TSS then gives the desired result, as the sketch below illustrates.
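
A small sketch of this point: adding a regressor that is pure noise cannot lower R^2 (simulated data; the helper r_squared is illustrative):

```python
import numpy as np

def r_squared(y, Xmat):
    beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    ssr = np.sum((y - Xmat @ beta_hat) ** 2)
    return 1 - ssr / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add a pure-noise regressor
print(r_squared(y, X_small) <= r_squared(y, X_big))     # True: R^2 cannot fall
```
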
1.3.3 The Adjusted R^2

Sometimes practitioners may throw in irrelevant regressors to increase R^2 and hence inflate the apparent fit. To prevent this, we define the adjusted R^2:

\bar{R}^2 := 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}

which mechanically makes \bar{R}^2 smaller than R^2, where k is the number of regressors. Some comments on the adjusted R^2 (a small helper implementing this formula appears after these comments):

• Since \frac{n-1}{n-k-1} is always greater than 1 (the numerator exceeds the denominator), \bar{R}^2 is always less than R^2.

• Adding a regressor has two opposite effects: it decreases the SSR, which increases \bar{R}^2; and it increases k and hence \frac{n-1}{n-k-1}, which decreases \bar{R}^2.

• \bar{R}^2 can be negative.
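
A sketch of the adjusted R^2 formula (the helper name and Xmat, which is assumed to include the constant column, are illustrative):

```python
import numpy as np

def adjusted_r_squared(y, Xmat):
    """R-bar^2 = 1 - (n-1)/(n-k-1) * SSR/TSS, with the constant included in Xmat."""
    n, k_plus_1 = Xmat.shape
    beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    ssr = np.sum((y - Xmat @ beta_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) / (n - k_plus_1) * ssr / tss
```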

1.4 The Least Squares Assumptions


Recall that we model X_{1i}, X_{2i}, \dots, X_{ki}, Y_i as having a linear relationship:

Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i

The following assumptions give the OLS estimators \hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k many good properties.

1. \{(X_{1i}, X_{2i}, \dots, X_{ki}, Y_i)\}_{i=1}^n is an i.i.d. random sample.

2. E[u_i | X_{1i}, \dots, X_{ki}] = 0, which is a formal way of saying that (i) conditional on X_{1i} = x_1, \dots, X_{ki} = x_k, the error u_i has mean zero; and (ii) the regressors X_{1i}, \dots, X_{ki} and the error (the other factors) u_i are uncorrelated.

3. Large outliers are unlikely. There are many ways to make this statement formal; for example, in some popular texts, 0 < E[X_1^4] < \infty, \dots, 0 < E[X_k^4] < \infty and E[Y^4] < \infty.

4. There is no perfect multicollinearity: formally, no regressor is a perfect linear function of the other regressors.

   – Intuitively, running a regression with perfect multicollinearity is asking for the impossible: if one regressor is a perfect linear function of the other regressors, then that regressor cannot change while all the other regressors are held constant. However, the coefficient on that regressor that we are trying to estimate is the effect of changing that regressor while holding the other regressors constant. This is simply preposterous. (A minimal sketch of what this looks like in practice appears after this list.)
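
A minimal sketch of perfect multicollinearity in practice, under simulated data: when one column of the design matrix is an exact linear function of the others, the matrix loses full column rank, so the OLS problem has no unique solution. The code only checks the rank; it is illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
x2 = 3 * x1 + 1                      # x2 is a perfect linear function of x1
Xmat = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(Xmat))   # 2 rather than 3: no unique OLS solution
```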

1.5 The Least Squares Properties
Under the least squares assumptions, in large samples the OLS estimators \hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k are jointly normal.[6] In particular, in large samples, each individual coefficient satisfies

\hat{\beta}_j \approx N(\beta_j, Var(\hat{\beta}_j))

for all j = 0, 1, \dots, k. That is, in large samples, the distributions of the OLS estimators are well approximated by normal distributions centered at the true parameters of the regression. Unlike the simple regression case, there is no easy formula for the variance Var(\hat{\beta}_j) without additional background in linear algebra. However, in practice the regression software will automatically produce the estimates \hat{\beta}_j and the estimated standard errors \widehat{SE}(\hat{\beta}_j) = \sqrt{\widehat{Var}(\hat{\beta}_j)}.
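
A sketch illustrating the approximate normality: simulate many samples, re-estimate \hat{\beta}_1 in each, and note that the spread of the estimates is close to the standard error a package reports (statsmodels; simulated data, illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, reps, true_beta1 = 200, 2000, 2.0
estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1 + true_beta1 * x1 - 0.5 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    estimates.append(sm.OLS(y, X).fit().params[1])  # beta1_hat from this sample

# the simulated distribution is roughly normal and centered at true_beta1;
# its standard deviation matches the reported SE(beta1_hat) closely
print(np.mean(estimates), np.std(estimates))
```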

[6] For those who are interested in the intuition: these estimators are still averages, just in a more complicated matrix/vector form, and a central limit theorem still applies. The derivation is well beyond the scope of this class.
