
Econometrics I: Fundamentals of Regression Analysis

Part 2

Javier Abellán, Màxim Ventura and Carlos Suárez

Universitat Pompeu Fabra

Javier Abellán, Màxim Ventura and Carlos Suárez (UPF) Topic 1 April 3, 2024 1 / 73
Fundamentals of Regression Analysis The OLS estimator assumptions

The OLS estimator assumptions


Why do we use OLS instead of other possible estimators?

• OLS is a generalisation of the sample average: if the "line" were just an intercept (that is, if the model did not include any regressor), the OLS estimator would be the sample average of Y1, ..., Yn (Ȳ)
• Like Ȳ, the OLS estimator has some desirable properties:
▶ Under certain assumptions, it is an unbiased estimator: E(β̂1) = β1
▶ Under certain assumptions, it has a tighter sampling distribution (that is, a lower variance) than some other candidate unbiased estimators of β1.
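The first bullet can be checked numerically. The following is a minimal sketch, assuming made-up data: it searches a grid of candidate intercepts for the one that minimises the sum of squared residuals and compares it with the sample average. The data values and the helper name `sum_squared_residuals` are illustrative assumptions, not part of the course material.

```python
# Intercept-only model: Y_i = b0 + u_i. The least-squares b0 should equal the sample mean.
y = [3.0, 5.0, 4.0, 8.0, 10.0]  # made-up observations


def sum_squared_residuals(b0, values):
    """Sum of squared deviations of the values from a candidate intercept b0."""
    return sum((v - b0) ** 2 for v in values)


y_bar = sum(y) / len(y)

# Try candidate intercepts 0.00, 0.01, ..., 15.00 and keep the one with the smallest SSR.
grid = [i / 100 for i in range(0, 1501)]
b0_hat = min(grid, key=lambda b0: sum_squared_residuals(b0, y))

print("sample mean:", y_bar)
print("least-squares intercept found on the grid:", b0_hat)
```

A grid search is used here only to make the minimisation visible; the closed-form solution of the same problem is the sample mean itself.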


Least-Squares assumptions

Yi = β0 + β1 Xi + ui , i = 1, ..., n

In order for the OLS estimators, β̂0 and β̂1, to be appropriate estimators of the true parameters β0 and β1, the following three assumptions need to hold:

• Assumption 1: The conditional distribution of ui given Xi has a mean of zero.
E(ui | Xi) = 0
• Assumption 2: Observations are independently and identically distributed.
(Xi, Yi), i = 1, ..., n are i.i.d.
• Assumption 3: Large outliers are unlikely.
0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞


Assumption 1: E(ui | Xi ) = 0

If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• All the "other factors" captured in the error term ui (those that
explain Yi but have not been included in the model) are (linearly)
unrelated to Xi : Cov(X, u) = 0 (See Appendix)
• The conditional distribution of Yi is centered on the population regression line: that is, on average, the prediction of Yi is right (see Appendix)
• We will frequently come back to this assumption during the course


With experimental data:

• In randomized controlled trials (RCTs), Xi is randomly assigned to individuals without taking their characteristics into account
• As a consequence, Xi is unrelated to all the characteristics of the individual that affect Yi, which in our model are captured by ui
• Therefore, in well-designed experimental settings:
▶ ui and Xi are independently distributed
▶ E(ui |Xi ) = 0
With observational data:
• Xi is not randomly assigned across the population
• So we should check carefully whether this assumption is actually valid in the data


Assumption 2: (Xi , Yi ), i = 1, ....., n are i.i.d

(Xi , Yi ), i = 1, ....., n are independently and identically distributed

• If observations are selected by simple random sampling from a single
large population, this assumption is true.
• Let’s continue with our example of housing prices in Barcelona. Let X be the area of a dwelling and Y its sale price.
• If we randomly sample n dwellings from the population of dwellings sold in Barcelona between 1998 and 2000:
▶ because all observations are drawn from the same population, the joint distribution of area and price is the same for each i and equals the joint distribution of area and price in the population (identically distributed).
▶ because the sample is selected at random, knowing the area and price of dwelling 1 tells us nothing about the area and price of the remaining n − 1 dwellings (independently distributed).


Assumption 3: Large outliers are unlikely

• Outlier: an observation with values of Xi, Yi or both far outside the usual range of the data
• Extreme values can prevent the sample variance s² from converging to the population variance σ², which makes the OLS estimates misleading.
• Mathematically, the statement that extreme values are unlikely is:
▶ X and Y have nonzero finite fourth moments: 0 < E(X⁴) < ∞ and 0 < E(Y⁴) < ∞
▶ Another way to put this is that X and Y have finite kurtosis


Assumption 3: Large outliers are unlikely

• The validity of this assumption will depend on the characteristics of the data
• For instance, the area of a dwelling will probably satisfy the assumption
• The same goes for exam grades, age of a person, etc.
• However, for other variables such as returns of the stock market, we
should check whether this is actually the case


Twin roles of the least squares assumptions

1. Mathematical role: if the three assumptions hold...
▶ the OLS estimators will be unbiased estimators of the true parameters
▶ the OLS estimators will be consistent estimators of the true parameters
▶ the OLS estimators will have sampling distributions that are approximately normal in large samples.
2. Practical role: they organise the circumstances in which OLS can run into trouble, namely when an assumption does not hold
▶ Corr(X, u) ≠ 0
▶ Observations not i.i.d.
▶ Outliers


The sampling distribution of the OLS estimator


Sampling distribution of the OLS estimators

If the OLS estimators are computed from randomly drawn samples, βˆ0 and βˆ1
will be random variables themselves with a sampling distribution.

Under the least squares assumptions:

• β̂0 and β̂1 are unbiased estimators of β0 and β1:

E(β̂0) = β0 and E(β̂1) = β1

• In large samples, by the central limit theorem, the joint sampling distribution of β̂0 and β̂1 is well approximated by a bivariate normal distribution.
▶ The marginal distributions of β̂0 and β̂1 are therefore (approximately) normal in large samples:

$$\hat\beta_0 \xrightarrow{d} N(\beta_0, \sigma^2_{\hat\beta_0}) \quad\text{and}\quad \hat\beta_1 \xrightarrow{d} N(\beta_1, \sigma^2_{\hat\beta_1})$$


Unbiasedness of βˆ1

$$\hat\beta_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}$$

• If we replace Yi by its value according to the true model (Yi = β0 + β1 Xi + ui) and work out the math, we can show that (see Appendix):

$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\,u_i}{\sum (X_i - \bar X)^2}$$

• This is one of the most important formulas we will see during this course


Unbiasedness of βˆ1

$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\,u_i}{\sum (X_i - \bar X)^2}$$

• The intuitive idea is that our estimator is equal to the true parameter plus ’something’ else
• If the expected value of that ’something’ else is zero, our estimator is unbiased; otherwise it is biased
• If assumption #1 holds (E(ui | Xi) = 0, so the error term is uncorrelated with X), the expected value of the second term is zero and our estimator is unbiased (E(β̂1) = β1)
• However, if the model leaves in the error term a relevant factor that is correlated with X (it explains Y, so it belongs to the error term), the expected value of the second term is not zero and our estimator is biased
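The two cases above can be illustrated with a small simulation. This is a sketch under stated assumptions: the true values β0 = 1 and β1 = 2, the standard normal draws, the coefficient 0.5 linking the error to X, and the function names are all hypothetical choices made for the illustration.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum of cross-deviations divided by the sum of squared deviations of x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def average_slope(error_depends_on_x, n=50, reps=500, beta0=1.0, beta1=2.0, seed=0):
    """Average estimated slope over many simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        noise = [rng.gauss(0, 1) for _ in range(n)]
        if error_depends_on_x:
            # A factor correlated with X has been left in the error term.
            u = [0.5 * xi + ei for xi, ei in zip(x, noise)]
        else:
            # E(u | X) = 0: the error is pure noise, independent of X.
            u = noise
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_slope(x, y))
    return sum(slopes) / reps


mean_ok = average_slope(error_depends_on_x=False)
mean_biased = average_slope(error_depends_on_x=True)
print(f"E(u|X) = 0 holds:    average slope = {mean_ok:.3f}")
print(f"u correlated with X: average slope = {mean_biased:.3f}")
```

In this setup the first average should sit close to the true slope of 2, while the second should sit close to 2.5, because the part of the error that moves with X is attributed to X.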


Normal approximation of βˆ1 and βˆ0 in large samples

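The large sample approximation of β̂1 is:

$$N(\beta_1, \sigma^2_{\hat\beta_1}) \quad\text{where}\quad \sigma^2_{\hat\beta_1} = \frac{1}{n}\,\frac{\mathrm{var}[(X_i - \mu_X)\,u_i]}{[\mathrm{var}(X_i)]^2}$$

The large sample approximation of β̂0 is:

$$N(\beta_0, \sigma^2_{\hat\beta_0}) \quad\text{where}\quad \sigma^2_{\hat\beta_0} = \frac{1}{n}\,\frac{\mathrm{var}(H_i u_i)}{[E(H_i^2)]^2} \quad\text{and}\quad H_i = 1 - \left[\frac{\mu_X}{E(X_i^2)}\right] X_i$$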


From the variance formula of the OLS estimators we can see several things:

1. Other things equal, the larger the variance of Xi, the smaller the variance of β̂1
▶ Intuitively, the wider the range of X, the ’better’ the information available to draw the regression line.
2. Other things equal, the smaller the variance of ui, the smaller the variance of β̂1
▶ Intuitively, if we have a very good model (the errors are smaller), the data will have a tighter scatter around the population regression line, so its slope will be estimated more precisely.
3. Other things equal, the larger the sample size (n), the smaller the variance of β̂1
▶ Intuitively, a larger n means more dots (information) to draw the regression line


Consistency of βˆ1 and βˆ0

The variance formulas of the OLS estimators also show that:

• β̂0 and β̂1 are consistent estimators of β0 and β1

▶ As n gets larger, the variances of β̂0 and β̂1 go to zero
▶ Since n is in the denominator of the variance formulas, if assumption #3 holds (so the other terms are finite), the variances converge to zero as n → ∞
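The shrinking variance can be seen in a short simulation. The sketch below assumes a hypothetical true model Y = 1 + 2X + u with standard normal X and u, and estimates the variance of the OLS slope across repeated samples for three sample sizes; none of these values come from the slides.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope for a simple regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_variance(n, reps=300, seed=1):
    """Sample variance of the estimated slope across `reps` simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.variance(slopes)


variances = {n: slope_variance(n) for n in (25, 100, 400)}
for n, v in variances.items():
    print(f"n = {n:3d}: variance of the slope estimate = {v:.4f}")
```

Each fourfold increase in n should cut the simulated variance roughly by a factor of four, which is the 1/n behaviour in the formula.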


Estimator of the variance and standard error of βˆ1 and βˆ0

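The variances of the OLS estimators, $\sigma^2_{\hat\beta_1}$ and $\sigma^2_{\hat\beta_0}$, are unknown parameters, so they need to be estimated as well.

The estimators of $\sigma^2_{\hat\beta_1}$ and $\sigma^2_{\hat\beta_0}$ are, respectively:

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^{n} (X_i - \bar X)^2\,\hat u_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2} \quad\text{and}\quad \hat\sigma^2_{\hat\beta_0} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^{n} \hat H_i^2\,\hat u_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n} \hat H_i^2\right]^2}$$

where $\hat H_i = 1 - \left[\bar X \Big/ \left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)\right] X_i$.

The standard errors of β̂1 and β̂0 are estimators of the standard deviations of β̂1 and β̂0, $\sigma_{\hat\beta_1}$ and $\sigma_{\hat\beta_0}$:

$$se(\hat\beta_1) = \sqrt{\hat\sigma^2_{\hat\beta_1}} \quad\text{and}\quad se(\hat\beta_0) = \sqrt{\hat\sigma^2_{\hat\beta_0}}$$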
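The estimators above translate almost line by line into code. The following is a minimal sketch in plain Python, assuming a simple regression with one regressor; the function name `ols_robust` and the dwelling data are hypothetical and serve only to show the computation.

```python
import math


def ols_robust(x, y):
    """OLS coefficients and the standard errors built from the variance estimators above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Slope: (1/n) * [1/(n-2) * sum (Xi - Xbar)^2 * uhat_i^2] / [(1/n) * sum (Xi - Xbar)^2]^2
    num1 = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, resid)) / (n - 2)
    var_b1 = num1 / (sxx / n) ** 2 / n

    # Intercept: same structure with Hhat_i = 1 - [Xbar / ((1/n) * sum Xi^2)] * Xi
    mean_x2 = sum(xi ** 2 for xi in x) / n
    h = [1 - (x_bar / mean_x2) * xi for xi in x]
    num0 = sum(hi ** 2 * ui ** 2 for hi, ui in zip(h, resid)) / (n - 2)
    var_b0 = num0 / (sum(hi ** 2 for hi in h) / n) ** 2 / n

    return b0, b1, math.sqrt(var_b0), math.sqrt(var_b1)


# Hypothetical data: dwelling area (square meters) and price (thousand euros).
area = [40, 55, 60, 75, 90, 110, 130]
price = [95, 120, 150, 160, 230, 240, 330]
b0, b1, se_b0, se_b1 = ols_robust(area, price)
print(f"b0 = {b0:.2f} (se {se_b0:.2f}), b1 = {b1:.3f} (se {se_b1:.3f})")
```

With only seven observations the large-sample justification for these standard errors does not apply; the tiny dataset is there only so that every step can be followed by hand.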


Homoskedasticity and heteroskedasticity


Homoskedasticity

Let’s add a fourth assumption:

• Assumption 4: the errors ui are homoskedastic

The error term ui is homoskedastic if:

▶ The variance of the conditional distribution of ui given Xi, Var(ui | Xi), is constant for i = 1, ..., n
▶ In particular, it does not depend on Xi
Otherwise, the error term is said to be heteroskedastic.


Graphically:

• In the left-hand figure, the spread of the conditional distribution of ui given Xi (student-teacher ratio in the example) does not depend on the value of Xi.
• On the contrary, in the right-hand figure, the spread of the conditional distribution of ui given Xi is tight for low values of Xi and greater for larger values of Xi. So it does depend on Xi.


Mathematical implications of homoskedasticity

If the three least squares assumptions hold and the errors are homoskedastic:
1. The OLS estimators remain unbiased, consistent and asymptotically normal
▶ Note that unbiasedness and consistency do not depend on whether errors are heteroskedastic or homoskedastic
▶ For these properties to hold, we only need the first three least squares assumptions

2. The OLS estimators β̂0 and β̂1 are efficient among all estimators that are linear in Y1, ..., Yn and unbiased (Gauss-Markov theorem).
▶ That is, the OLS estimators are the most efficient linear conditionally unbiased estimators (they are BLUE)


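3. Because the conditional variance of ui given Xi is constant, Var(ui | Xi) = σu², the formulas for the variances of β̂1 and β̂0 simplify to:

$$\sigma^2_{\hat\beta_1} = \frac{\sigma_u^2}{n\,\sigma_X^2} \quad\text{and}\quad \sigma^2_{\hat\beta_0} = \frac{E(X_i^2)}{n\,\sigma_X^2}\,\sigma_u^2$$

• Consequently, if the errors are homoskedastic, the formulas for the standard errors of β̂0 and β̂1 simplify as well. The homoskedasticity-only standard errors are:

$$se(\hat\beta_1) = \sqrt{\tilde\sigma^2_{\hat\beta_1}} \quad\text{where}\quad \tilde\sigma^2_{\hat\beta_1} = \frac{s^2_{\hat u}}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$

$$se(\hat\beta_0) = \sqrt{\tilde\sigma^2_{\hat\beta_0}} \quad\text{where}\quad \tilde\sigma^2_{\hat\beta_0} = \frac{\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right) s^2_{\hat u}}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$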
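The homoskedasticity-only formulas can be computed by hand on a tiny dataset. The sketch below makes one assumption that the slide does not spell out: s²û is taken to be the sum of squared residuals divided by n − 2. The function name and the four data points are hypothetical.

```python
import math


def ols_homoskedastic_se(x, y):
    """OLS coefficients with homoskedasticity-only standard errors (simple regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Assumed definition: s^2_uhat = SSR / (n - 2)
    s2_u = sum(ui ** 2 for ui in resid) / (n - 2)

    var_b1 = s2_u / sxx
    var_b0 = (sum(xi ** 2 for xi in x) / n) * s2_u / sxx
    return b0, b1, math.sqrt(var_b0), math.sqrt(var_b1)


# Made-up data, small enough to verify by hand.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1, se_b0, se_b1 = ols_homoskedastic_se(x, y)
print(f"b0 = {b0:.3f} (se {se_b0:.4f}), b1 = {b1:.3f} (se {se_b1:.4f})")
```

For these four points the residuals are 0.1, 0.2, −0.7 and 0.4, so s²û = 0.70 / 2 = 0.35 and the slope variance is 0.35 / 5 = 0.07.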


Warning

• When the errors are heteroskedastic, the homoskedasticity-only formulas for the standard errors are inappropriate. Specifically:
▶ The t-statistic computed using the homoskedasticity-only standard
error does not have a standard normal distribution, even in large
samples.
▶ The 95% confidence intervals constructed using 1.96 as a critical
value and the homoskedasticity-only standard error will not contain
the true value of the parameter with 95% probability, even in large
samples.


• In contrast, using heteroskedasticity-robust standard errors (the formulas initially presented for se(β̂0) and se(β̂1)) leads to valid statistical inferences whether or not the errors are heteroskedastic.
▶ At a general level, economic theory rarely gives any reason to
believe that the error term is homoskedastic
▶ So we will generally assume that errors are heteroskedastic and we
will use heteroskedasticity-robust standard errors.


Variance of the residuals in the dwellings example

• Does the variance of the residuals ûi depend on Xi?

• As we can see, the residuals û are more spread out for larger values of X. Therefore, it is quite likely that assumption #4 (homoskedasticity) does not hold


Hypothesis test and confidence intervals


Testing hypotheses about β1

The general approach to testing hypotheses about the unknown parameter β1 is the same as the one used to test hypotheses about the population mean, µ.

Steps:

1. Set a null hypothesis (H0) about β1 and assume it is true
2. Characterise the sampling distribution of β̂1 under H0
3. Calculate β̂1 from the randomly selected sample
4. Choose a significance level (α)
5. Reject H0 or not accordingly. Three alternative ways:
5.1 Calculate the t-statistic and compare it to the critical value t*
5.2 Calculate the p-value and compare it to α
5.3 Calculate the confidence interval for β1 and check whether β1,0 is in it


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 1: Convert your empirical question into a hypothesis

Our empirical question concerns the slope of the population regression line
that relates the size of an apartment with its price:

Price = β0 + β1 Size

Concretely, we want to know if this relation exists at all. Therefore, our null
and alternative hypotheses are:
• H0: β1 = 0 (NO relation between Size and Price in the population)
• H1: β1 ≠ 0 (Relation between Size and Price in the population)

More generally: H0: β1 = β1,0 and H1: β1 ≠ β1,0


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 2: Characterize the sampling distribution of βˆ1 under H0

We have seen in the previous section that in large samples the distribution of β̂1 is well approximated by:

$$N(\beta_1, \sigma^2_{\hat\beta_1}) \quad\text{where}\quad \sigma^2_{\hat\beta_1} = \frac{1}{n}\,\frac{\mathrm{var}[(X_i - \mu_X)\,u_i]}{[\mathrm{var}(X_i)]^2}$$

So under our H0, if the sample is large enough, the distribution of β̂1 is approximately $N(0, \sigma^2_{\hat\beta_1})$.

Or, more generally, approximately $N(\beta_{1,0}, \sigma^2_{\hat\beta_1})$.


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 3: Calculate βˆ1 from a randomly selected sample.

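$$\hat\beta_1^{act} = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \frac{s_{X,Y}}{s_X^2}$$

Or in our example:

$$\hat\beta_1^{act} = \frac{\sum (Size_i - \overline{Size})(Price_i - \overline{Price})}{\sum (Size_i - \overline{Size})^2} = \frac{s_{Size,Price}}{s^2_{Size}}$$

Step 4: Choose a significance level (α).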

• Commonly set at 5% (α = 0.05)


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 5: Reject or not the null hypothesis that β1 = 0

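Alternative 1: Calculate the t-statistic using $\hat\beta_1^{act}$ and compare it to the critical value t* (for α = 0.05, t* = 1.96)

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)} \;\longrightarrow\; t^{act} = \frac{\hat\beta_1^{act} - 0}{se(\hat\beta_1)} = \frac{\hat\beta_1^{act}}{se(\hat\beta_1)}$$

where se(β̂1) is the standard error of β̂1, which is the estimator of the standard deviation of β̂1, $\sigma_{\hat\beta_1}$:

$$se(\hat\beta_1) = \sqrt{\hat\sigma^2_{\hat\beta_1}} \quad\text{where}\quad \hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^{n} (X_i - \bar X)^2\,\hat u_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}$$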


Alternative 1: Rejection rule

• If $|t^{act}| > 1.96$ → Reject H0 at a 5% significance level
• If $|t^{act}| \le 1.96$ → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 2: Calculate the p-value and compare it to α.

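p-value: probability, under the assumption that H0 is true, of observing a value of β̂1 at least as far from β1,0 as your estimate $\hat\beta_1^{act}$.

$$\begin{aligned}
p\text{-value} &= \Pr_{H_0}\left[\,|\hat\beta_1 - \beta_{1,0}| > |\hat\beta_1^{act} - \beta_{1,0}|\,\right] \\
&= \Pr_{H_0}\left[\,\left|\frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)}\right| > \left|\frac{\hat\beta_1^{act} - \beta_{1,0}}{se(\hat\beta_1)}\right|\,\right] \\
&= \Pr_{H_0}\left[\,|t| > |t^{act}|\,\right]
\end{aligned}$$

Because β̂1 is approximately normally distributed in large samples, under H0 the t-statistic is approximately distributed as a standard normal, so:

$$p\text{-value} = \Pr\left[\,|Z| > |t^{act}|\,\right] = 2\,\Phi(-|t^{act}|)$$

where Φ(·) is the cumulative distribution function (CDF) of the standard normal distribution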

Alternative 2: Rejection rule

• If p − value ≤ 0.05 → Reject H0 at a 5% significance level
• If p − value > 0.05 → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 3: Calculate the confidence interval for β1 and check if β1,0 is in it.

95% confidence interval (CI) of β1 : an interval that contains the true value
of β1 with 95% probability. Or equivalently, the set of values of β1 that cannot
be rejected by a 5% two-sided hypothesis test.

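By rearranging the rejection rule based on the t-statistic:

$$\text{Do not reject } H_0 \text{ if } \left|\frac{\hat\beta_1^{act} - \beta_{1,0}}{se(\hat\beta_1)}\right| \le 1.96$$

we can establish the set of values of β1 that are not rejected at a 5% significance level:

$$95\%\ \text{CI for } \beta_1 = \{\hat\beta_1^{act} \pm 1.96\,se(\hat\beta_1)\} = \left[\hat\beta_1^{act} - 1.96\,se(\hat\beta_1),\ \hat\beta_1^{act} + 1.96\,se(\hat\beta_1)\right]$$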


Alternative 3: Rejection rule

• If $\beta_{1,0} \notin \{\hat\beta_1^{act} \pm 1.96\,se(\hat\beta_1)\}$ → Reject H0 at a 5% significance level
• If $\beta_{1,0} \in \{\hat\beta_1^{act} \pm 1.96\,se(\hat\beta_1)\}$ → Do not reject H0 at a 5% significance level


Confidence interval for predicted effect of changing Size

The 95% confidence interval for β1 can be used to construct a 95% interval
for the predicted effect of a general change in Size (∆Size) on Price (∆Price).
According to our model, the predicted change in Price will be:

∆Price = β1 ∆Size

β1 is unknown, but because we can construct a confidence interval for β1, we can also construct a confidence interval for the predicted effect β1 ∆Size:

$$95\%\ \text{CI for } \beta_1 \Delta Size = \left[\hat\beta_1^{act}\,\Delta Size - 1.96\,se(\hat\beta_1) \times \Delta Size,\ \hat\beta_1^{act}\,\Delta Size + 1.96\,se(\hat\beta_1) \times \Delta Size\right]$$

For example, the confidence interval for the predicted change in price for a 15 m² increase in house size will be:

95% CI for β1 ∆Size = [1641.24 × 15 − 1.96 × 73.43 × 15, 1641.24 × 15 + 1.96 × 73.43 × 15] = [22459.76, 26777.44]


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Price = β0 + β1 Size

• H0: β1 = 0 (NO relation between Size and Price in the population)

• H1: β1 ≠ 0 (Relation between Size and Price in the population)

We can use any of the three alternatives to test H0 at a 5% significance level:
1. Using the t-statistic: 22.35 > 1.96 → Reject H0
2. Using the p-value: 0.000 < 0.05 → Reject H0
3. Using the 95% confidence interval: 0 ∉ [1497.2, 1785.3] → Reject H0

So we conclude that the size of an apartment affects its sale price.
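The three decision rules can be reproduced with a few lines of Python. This sketch takes the slope estimate (1641.24) and its standard error (73.43) from the worked one-sided example in these slides and uses the normal critical value 1.96; the helper `std_normal_cdf` and the variable names are illustrative assumptions.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with erfc so it stays accurate far in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


beta1_hat = 1641.24  # slope estimate used in the slides
se_beta1 = 73.43     # its standard error, as used in the slides
beta1_null = 0.0     # value of beta1 under H0
z_crit = 1.96        # two-sided 5% critical value of the standard normal

t_act = (beta1_hat - beta1_null) / se_beta1
p_value = 2 * std_normal_cdf(-abs(t_act))
ci_low = beta1_hat - z_crit * se_beta1
ci_high = beta1_hat + z_crit * se_beta1

reject_by_t = abs(t_act) > z_crit
reject_by_p = p_value <= 0.05
reject_by_ci = not (ci_low <= beta1_null <= ci_high)

print(f"t = {t_act:.2f}, p-value = {p_value:.3g}")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print("reject H0:", reject_by_t, reject_by_p, reject_by_ci)
```

The t-statistic matches the 22.35 reported above. The interval computed with 1.96 is about [1497.3, 1785.2], marginally narrower than the one on the slide, possibly because the software that produced the slide used a slightly different critical value. The three rules always agree, since they are rearrangements of the same inequality.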


One-sided hypothesis concerning β1

Empirical question:
Is the increase in the price of sale for an additional square meter greater
than 1600 euros?

Step 1: Convert your empirical question into a hypothesis

Again, our empirical question concerns the slope of the regression line:

Price = β0 + β1 Size

But now we want to know if this slope is greater than 1600. Therefore, now
our null and alternative hypotheses are:
• H0 : β1 = 1600
• H1 : β1 > 1600

More generally: H0 : β1 = β1,0 and H1 : β1 > β1,0


One-sided hypothesis concerning β1

Empirical question:
Is the increase in the price of sale for an additional square meter greater
than 1600 euros?

Step 2: Characterize the sampling distribution of βˆ1 under H0

• Under our H0, the distribution of β̂1 is approximately $N(1600, \sigma^2_{\hat\beta_1})$

Step 3: Calculate β̂1 with a randomly selected sample.

$$\hat\beta_1^{act} = \frac{\sum (Size_i - \overline{Size})(Price_i - \overline{Price})}{\sum (Size_i - \overline{Size})^2} = \frac{s_{Size,Price}}{s^2_{Size}}$$

Step 4: Choose a significance level (α).

• Commonly set at 5% (α = 0.05)


One-sided hypothesis concerning β1

Empirical question:
Is the increase in the price of sale for an additional square meter greater
than 1600 euros?

Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 1: Calculate the t-statistic using $\hat\beta_1^{act}$ and compare it to the critical value t*

The t-statistic is constructed in the same way as for a two-sided hypothesis test:

$$t^{act} = \frac{\hat\beta_1^{act} - 1600}{se(\hat\beta_1)}$$

When using a one-tailed test, we test for the possibility of a relationship in one direction and completely disregard the possibility of a relationship in the other direction. Therefore, we concentrate on only one side of the standard normal distribution, and the critical value (t*) changes. Concretely, in a one-sided test at a 5% significance level, the critical value is 1.645 in absolute value.

Alternative 1: Rejection rule

• If H1: β1 > β1,0: Reject H0 if $t^{act} > 1.645$
• If H1: β1 < β1,0: Reject H0 if $t^{act} < -1.645$

So in our example:

$$t^{act} = \frac{\hat\beta_1^{act} - 1600}{se(\hat\beta_1)} = \frac{1641.24 - 1600}{73.43} = 0.56$$

• 0.56 < 1.645 → We cannot reject the null that β1 = 1600

So, with the evidence at hand, we cannot conclude that the increase in price for an additional square meter is greater than 1600 euros.


Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 2: Calculate the p-value and compare it to α.

The p-value for a one-sided test is:

• For H1: β1 > β1,0: $p\text{-value} = \Pr_{H_0}[\hat\beta_1 - \beta_{1,0} > \hat\beta_1^{act} - \beta_{1,0}]$
• For H1: β1 < β1,0: $p\text{-value} = \Pr_{H_0}[\hat\beta_1 - \beta_{1,0} < \hat\beta_1^{act} - \beta_{1,0}]$

And it is obtained from the cumulative standard normal distribution as:

• For H1: β1 > β1,0: $p\text{-value} = \Pr(Z > t^{act}) = 1 - \Phi(t^{act})$
• For H1: β1 < β1,0: $p\text{-value} = \Pr(Z < t^{act}) = \Phi(t^{act})$

Alternative 2: Rejection rule

In our example: p-value = Pr(Z > 0.56) = 1 − Φ(0.56) = 1 − 0.7123 = 0.2877
• 0.2877 > 0.05 → We cannot reject the null that β1 = 1600
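The one-sided calculation can be checked in the same way. This is a minimal sketch using the estimate and standard error from the example above; the helper `std_normal_cdf` and the variable names are assumptions introduced for the illustration.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


beta1_hat = 1641.24   # slope estimate from the example
se_beta1 = 73.43      # its standard error
beta1_null = 1600.0   # value of beta1 under H0
alpha = 0.05

t_act = (beta1_hat - beta1_null) / se_beta1

# H1: beta1 > 1600, so only the upper tail counts.
p_value = 1 - std_normal_cdf(t_act)

reject_by_t = t_act > 1.645
reject_by_p = p_value <= alpha

print(f"t = {t_act:.2f}, one-sided p-value = {p_value:.4f}")
print("reject H0:", reject_by_t, reject_by_p)
```

Because the code keeps the unrounded t-statistic, its p-value (about 0.287) differs in the fourth decimal from the 0.2877 obtained on the slide from the rounded value 0.56. The conclusion is the same: H0 is not rejected.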


Appendix


Assumption 1: E(ui |Xi ) = 0

If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• There are different ways of showing the implications of this assumption

• The "other factors" contained in ui are unrelated to Xi :

(1) Cov(X, u) = E[(X − E(X))(u − E(u))] = E[(X − E(X))u]

(2) Cov(X, u) = E[Xu − E(X)u] = E(Xu) − E(X)E(u)

By the law of iterated expectations: E(u) = E[E(u|X)]

(3) Cov(X, u) = E(Xu) − E(X)E[E(u|X)] = E(Xu)

By the law of iterated expectations: E(Xu) = E[E(Xu|X)]

(4) Cov(X, u) = E[E(Xu|X)] = E[E(u|X)X] = 0 → Corr(X, u) = 0


If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• The conditional distribution of Yi is centered on the population regression line (on average, the prediction of Yi is right).

(1) E(Yi | Xi ) = E(β0 + β1 Xi + ui | Xi )

(2) E(Yi | Xi ) = E(β0 + β1 Xi | Xi ) + E(ui | Xi )

(3) E(Yi | Xi ) = β0 + β1 Xi


It is often convenient to discuss the conditional mean assumption in terms of the correlation between ui and Xi.

When doing so, remember that:

• If E(ui |Xi ) = 0, then Corr(Xi , ui ) is always zero

• However, Corr(Xi , ui ) = 0 does not necessarily imply that E(ui |Xi ) = 0

▶ That is, the correlation only captures the linear relationship between
Xi and ui

• But Corr(Xi , ui ) ̸= 0 necessarily implies that E(ui |Xi ) ̸= 0
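A standard textbook-style example, which is not taken from the slides, makes the second bullet concrete. Assume X is standard normal and the error is u = X² − 1. Then u is a deterministic function of X, so the conditional mean assumption fails, yet the correlation is zero because the relationship is purely nonlinear:

```latex
% Illustrative example (an assumption, not from the slides): X ~ N(0,1), u = X^2 - 1
\begin{aligned}
E(u \mid X) &= X^2 - 1 \neq 0 \quad \text{(except when } X = \pm 1\text{)} \\
E(u) &= E(X^2) - 1 = 1 - 1 = 0 \\
\mathrm{Cov}(X, u) &= E(Xu) - E(X)\,E(u) = E(X^3 - X) - 0 = 0 - 0 = 0
\end{aligned}
```

The last line uses the fact that the first and third moments of a standard normal variable are both zero.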


Unbiasedness of βˆ1

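$$\hat\beta_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}$$

First, represent β̂1 in terms of X and u

• Hint: $Y_i - \bar Y = \beta_1 (X_i - \bar X) + (u_i - \bar u)$

$$\hat\beta_1 = \frac{\sum (X_i - \bar X)\left[\beta_1 (X_i - \bar X) + (u_i - \bar u)\right]}{\sum (X_i - \bar X)^2} \tag{1}$$

$$\hat\beta_1 = \frac{\beta_1 \sum (X_i - \bar X)^2 + \sum (X_i - \bar X)(u_i - \bar u)}{\sum (X_i - \bar X)^2} \tag{2}$$

$$\hat\beta_1 = \beta_1\,\frac{\sum (X_i - \bar X)^2}{\sum (X_i - \bar X)^2} + \frac{\sum (X_i - \bar X)(u_i - \bar u)}{\sum (X_i - \bar X)^2} = \beta_1 + \frac{\sum (X_i - \bar X)(u_i - \bar u)}{\sum (X_i - \bar X)^2} \tag{3}$$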


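$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\,u_i - \sum (X_i - \bar X)\,\bar u}{\sum (X_i - \bar X)^2} \tag{4}$$

• Hint: $\bar X = \frac{\sum_{i=1}^{n} X_i}{n} \;\rightarrow\; \sum_{i=1}^{n} X_i = n\bar X$
• Hint: $\sum (X_i - \bar X)\,\bar u = \left[\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar X\right]\bar u = [n\bar X - n\bar X]\,\bar u = 0$

$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\,u_i}{\sum (X_i - \bar X)^2} \tag{5}$$

Then, take the expectation of β̂1.

$$E(\hat\beta_1) = E\left[\beta_1 + \frac{\sum (X_i - \bar X)\,u_i}{\sum (X_i - \bar X)^2}\right] \tag{6}$$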


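• By the law of iterated expectations: E(ui) = E[E(ui | X1, ..., Xn)]

$$E(\hat\beta_1) = E\left[\beta_1 + \frac{\sum (X_i - \bar X)\,E(u_i \mid X_1, \ldots, X_n)}{\sum (X_i - \bar X)^2}\right] \tag{7}$$

• Because observations are independently distributed, E(ui | X1, ..., Xn) = E(ui | Xi), and by assumption #1, E(ui | Xi) = 0:

$$E(\hat\beta_1) = E\left[\beta_1 + \frac{\sum (X_i - \bar X)\,E(u_i \mid X_i)}{\sum (X_i - \bar X)^2}\right] = \beta_1 \tag{8}$$

• Equivalently, by the law of iterated expectations:

$$E(\hat\beta_1) = E\left[E(\hat\beta_1 \mid X_1, \ldots, X_n)\right] = \beta_1 \tag{9}$$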


Unbiasedness example

• What does E(β̂1) = β1 mean intuitively?

• The concept of unbiasedness should make us reflect about things such
as probability and expected value
• We will do a little simulated experiment to understand what is behind the
concept of unbiasedness
• And to what extent it is important or relevant to ask an estimator to be
unbiased.


Data simulation

First, we will use Stata to simulate some observations from a true model
(remember: we never know the true model and the whole point is to estimate
its parameters)
• The true model is Wagei = β0 + β1 × Agei + ui, with β0 = 21 and β1 = 2
• Age is in years and wage is in euros per hour
• Let’s assume, for the sake of simplicity, that the unknown error term is i.i.d., u ∼ N(0, 3), and satisfies the assumption E(u | X) = 0
• As we said, we will treat the model as known, and we will generate 1000
values for Age and u, and using the true values of the parameters β0 and
β1 we will generate 1000 values for the wage
• Therefore, the 1000 data points for (Yi , Xi , ui ) will be our population
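The population described above can be generated in a few lines. This is a hedged sketch in Python rather than the Stata code used in the lecture, which is not shown in the slides. Two things are assumptions of this example: Age is drawn uniformly between 20 and 65 (the slides do not say how Age is generated), and the 3 in N(0, 3) is read as a variance, so the standard deviation is √3.

```python
import math
import random

rng = random.Random(2024)        # arbitrary seed for reproducibility
BETA0, BETA1 = 21.0, 2.0         # true parameters from the slide
N_POP = 1000
SIGMA = math.sqrt(3)             # assumption: the 3 in N(0, 3) is a variance

# Assumption: ages are uniform on [20, 65]; the slides do not specify this.
age = [rng.uniform(20, 65) for _ in range(N_POP)]
# Errors are drawn independently of age, so E(u | Age) = 0 holds by construction.
u = [rng.gauss(0, SIGMA) for _ in range(N_POP)]
wage = [BETA0 + BETA1 * a + e for a, e in zip(age, u)]

population = list(zip(wage, age, u))   # (Y_i, X_i, u_i) triples
print(len(population), "data points; mean of u:", round(sum(u) / N_POP, 3))
```

Because u is generated separately from Age, the first least-squares assumption is satisfied here by design, which is what the experiment relies on.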


True regression line: Wagei = 21 + 2Agei


Simulation

1. Let’s take a random sample of n = 50 observations from the 1000 data points
2. Then estimate the parameters β0 and β1 by applying OLS to those data
3. That is, let’s now pretend that we don’t know the true population
parameters and use our random sample to compute the estimates β̂0 and β̂1

• What do we expect as the result of this estimation?

• How does the estimated line relate to the true line?
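The three steps above can be sketched as follows. This is an illustrative Python version, not the lecture's Stata code. The helper `ols`, the seed, the uniform Age distribution and the reading of N(0, 3) as a variance of 3 are all assumptions of this example, so the numbers it prints will not match the figures in the slides.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(1)   # arbitrary seed

# Population of 1000 points from Wage = 21 + 2 * Age + u (assumed Age range and error sd).
age = [rng.uniform(20, 65) for _ in range(1000)]
wage = [21.0 + 2.0 * a + rng.gauss(0, math.sqrt(3)) for a in age]

# Step 1: random sample of n = 50 observations, drawn without replacement.
idx = rng.sample(range(1000), 50)
sample_age = [age[i] for i in idx]
sample_wage = [wage[i] for i in idx]

# Steps 2 and 3: estimate beta0 and beta1 by OLS using only the sample.
b0_hat, b1_hat = ols(sample_age, sample_wage)
print(f"estimated line: Wage = {b0_hat:.2f} + {b1_hat:.2f} * Age")
```

The estimated line should be close to, but not identical to, Wage = 21 + 2 Age: the sample contains only 50 of the 1000 points, each with its own error.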


The red dots are the points from the population that were chosen in the random sampling, shown together with the line
estimated from those 50 points, Wagei = 21.87 + 1.75Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Continuing the experiment

• Let’s repeat the experiment 10 times

• Each of these ten times, we will pick 50 points randomly from the entire
population and estimate the OLS regression again
• What does this suggest regarding the unbiasedness of the
OLS estimator and the assumption E(u|X) = 0?


Again, in red we have the points chosen in the second random sampling and the regression line with 50 data points
Wagei = 20.6 + 2.06Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the third random sampling and the regression line with 50 data points
Wagei = 22.09 + 1.81Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fourth random sampling and the regression line with 50 data points
Wagei = 21.12 + 1.94Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fifth random sampling and the regression line with 50 data points
Wagei = 20.96 + 2.04Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the sixth random sampling and the regression line with 50 data points
Wagei = 19.66 + 2.23Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the seventh random sampling and the regression line with 50 data points
Wagei = 21.73 + 1.85Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the eighth random sampling and the regression line with 50 data points
Wagei = 22.71 + 1.86Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the ninth random sampling and the regression line with 50 data points
Wagei = 20.43 + 2.14Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the tenth random sampling and the regression line with 50 data points
Wagei = 20.57 + 2.12Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15

Average 21.178 1.983
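The Average row can be reproduced from the ten estimates in the table. The snippet below is a small Python check (an illustration only; the variable names are chosen for this example) that also reports how far each average is from the true values β0 = 21 and β1 = 2.

```python
from statistics import mean

# Estimates copied from the table above (samples 1 to 10).
b0_hats = [21.87, 20.66, 22.09, 21.12, 20.96, 19.66, 21.73, 22.71, 20.43, 20.55]
b1_hats = [1.75, 2.06, 1.81, 1.94, 2.04, 2.23, 1.85, 1.86, 2.14, 2.15]

avg_b0 = round(mean(b0_hats), 3)
avg_b1 = round(mean(b1_hats), 3)

print("average of the intercept estimates:", avg_b0)   # 21.178
print("average of the slope estimates:   ", avg_b1)    # 1.983
print("distance from true values:", round(avg_b0 - 21, 3), round(avg_b1 - 2, 3))
```

No single sample recovers the true line, but the averages across the ten samples are already closer to (21, 2) than most of the individual estimates.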


Simulation: continuation

• How would the table change if we repeated the experiment 30 times?


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15
11 20.57 2.12
12 21.86 1.88
13 21.36 2.00
14 22.60 1.75
15 21.50 1.95
16 20.49 2.11
17 20.81 2.02
18 21.20 1.98
19 22.23 1.75
20 20.80 2.12
21 19.69 2.16
22 21.58 1.83
23 21.00 2.05
24 19.99 2.04
25 20.89 2.12
26 20.89 2.06
27 21.73 1.99
28 21.85 2.03
29 21.95 1.87
30 20.31 2.12

Average 21.17 1.99


• How would the result change if we could repeat the process more
times?
• How would the result change if we repeated it from scratch 1000 times, but now
using random samples of n = 100?
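Both questions can be explored with a small Monte Carlo experiment. The Python sketch below is an illustration under stated assumptions, not the lecture's Stata code: Age is assumed uniform on [20, 65], N(0, 3) is read as a variance of 3, and the helper names `ols_slope` and `simulate_slopes` are hypothetical. It repeats the sampling 1000 times for n = 50 and for n = 100 and compares the mean and spread of the slope estimates.

```python
import math
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slopes(age, wage, n, reps, rng):
    """Draw `reps` random samples of size n and return the slope estimate of each."""
    slopes = []
    for _ in range(reps):
        idx = rng.sample(range(len(age)), n)   # sampling without replacement
        slopes.append(ols_slope([age[i] for i in idx], [wage[i] for i in idx]))
    return slopes


rng = random.Random(7)   # arbitrary seed

# Population of 1000 points from Wage = 21 + 2 * Age + u (assumed Age range and error sd).
age = [rng.uniform(20, 65) for _ in range(1000)]
wage = [21.0 + 2.0 * a + rng.gauss(0, math.sqrt(3)) for a in age]

slopes_50 = simulate_slopes(age, wage, n=50, reps=1000, rng=rng)
slopes_100 = simulate_slopes(age, wage, n=100, reps=1000, rng=rng)

for n, s in ((50, slopes_50), (100, slopes_100)):
    print(f"n = {n:3d}: mean slope = {mean(s):.4f}, std. dev. = {stdev(s):.4f}")
```

With more repetitions the average of the estimates settles near the true slope, and with n = 100 the estimates are less dispersed than with n = 50. Since the samples come from one fixed population of 1000 points, the average settles on the slope of the population regression line, which is close to but not exactly 2.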


Conclusions

• The estimator is a random variable: with different samples we get
different results
• Unbiased means that if we repeat the random sampling many times,
the average of the estimates will approach the true parameter
• But it does not state anything about a particular sample and
its estimated coefficient
• Unbiasedness is a concept linked to fairness
• Its interpretation will depend on the concept of probability that we use
• In the frequentist view, the expected value is the average that would arise if
we repeated the experiment a sufficient number of times (LLN)
• If the concept of probability is more subjective (Bayesian), it would be the value we
expect to observe in light of our knowledge of how the true model works
