
Lecture 24: Weighted and Generalized Least Squares

1 Weighted Least Squares


When we use ordinary least squares to estimate linear regression, we minimize the mean squared
error:
\[
\mathrm{MSE}(b) = \frac{1}{n}\sum_{i=1}^{n} (Y_i - X_{i\cdot} b)^2 \tag{1}
\]
where $X_{i\cdot}$ is the $i$th row of $X$. The solution is
\[
\hat{\beta}_{OLS} = (X^T X)^{-1} X^T Y. \tag{2}
\]

Suppose we minimize the weighted MSE
\[
\mathrm{WMSE}(b, w_1, \ldots, w_n) = \frac{1}{n}\sum_{i=1}^{n} w_i (Y_i - X_{i\cdot} b)^2. \tag{3}
\]

This includes ordinary least squares as the special case where all the weights $w_i = 1$. We can solve
it by the same kind of linear algebra we used to solve the ordinary linear least squares problem. If
we write $W$ for the matrix with the $w_i$ on the diagonal and zeroes everywhere else, then
\begin{align}
\mathrm{WMSE} &= n^{-1} (Y - Xb)^T W (Y - Xb) \tag{4}\\
&= \frac{1}{n}\left( Y^T W Y - Y^T W X b - b^T X^T W Y + b^T X^T W X b \right). \tag{5}
\end{align}
Differentiating with respect to $b$, we get as the gradient
\[
\nabla_b\, \mathrm{WMSE} = \frac{2}{n}\left( -X^T W Y + X^T W X b \right).
\]
Setting this to zero at the optimum and solving,
\[
\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W Y. \tag{6}
\]
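As a quick check on Eq. 6, here is a minimal R sketch (the data, true coefficients, and noise variances are invented for illustration) confirming that the closed-form matrix solution agrees with what lm() returns when given the same weights:

# Minimal sketch: the closed-form WLS solution of Eq. 6 versus lm() with weights.
# The data, true coefficients, and noise variances below are invented for illustration.
set.seed(1)
n <- 100
x <- runif(n, -1, 1)
sigma2 <- 0.1 + x^2                        # known heteroskedastic noise variances
y <- 1 + 2 * x + rnorm(n, sd = sqrt(sigma2))
w <- 1 / sigma2                            # inverse-variance weights

X <- cbind(1, x)                           # design matrix with an intercept column
W <- diag(w)
beta.wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)   # Eq. 6

fit <- lm(y ~ x, weights = w)
cbind(by.formula = as.vector(beta.wls), by.lm = coef(fit))   # columns should agree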

But why would we want to minimize Eq. 3?

1. Focusing accuracy. We may care much more strongly about predicting the response for certain
values of the input than for others: values we expect to see often again, or values where mistakes
are especially costly, embarrassing, or painful. If we give the points near that region big
weights, and points elsewhere smaller weights, the regression will be pulled towards matching
the data in that region.

2. Discounting imprecision. Ordinary least squares gives equal weight to the squared error at every
observation, which is appropriate when the variance of the noise terms $\epsilon_i$ is constant over all
observations, so that we measure the regression function with the same precision everywhere. This
situation, of constant noise variance, is called homoskedasticity. Often, however, the magnitude
of the noise is not constant, and the data are heteroskedastic. Let $\sigma_i^2 = \mathrm{Var}[\epsilon_i \mid X_i]$.
Homoskedasticity means that $\sigma_i^2 = \sigma^2$ for all $i$; heteroskedasticity means that the $\sigma_i^2$ can differ.

When we have heteroskedasticity, ordinary least squares is no longer the optimal estimate:
we'll see soon that other estimators can be unbiased and have smaller variance. If, however,
we know the noise variance $\sigma_i^2$ at each measurement $i$, and set $w_i = 1/\sigma_i^2$, we minimize
the variance of estimation.

3. Doing something else. There are a number of other optimization problems which can be
transformed into, or approximated by, weighted least squares. The most important of these
arises from generalized linear models, where the mean response is some nonlinear function
of a linear predictor; we will look at them in 402.

2 Heteroskedasticity
Suppose that
\[
Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} + \epsilon_i
\]
where $E[\epsilon_i] = 0$ and $\mathrm{Var}[\epsilon_i] = \sigma_i^2$. (As usual, we are treating the $X_i$'s as fixed.) This is called the
heteroskedastic linear regression model. For now, assume we know $\sigma_1, \ldots, \sigma_n$.
The model is
\[
Y = X\beta + \epsilon
\]
where now (treating $X$ as fixed), $E[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \Sigma$, and $\Sigma$ is no longer of the form $\sigma^2 I$. The
weighted least squares estimator is
\[
\hat{\beta} = K Y
\]
where $K = (X^T W X)^{-1} X^T W$. So,
\[
\hat{\beta} = K Y = K(X\beta + \epsilon) = \beta + K\epsilon.
\]
Hence,
\[
E[\hat{\beta}] = \beta, \qquad \mathrm{Var}(\hat{\beta}) = K \Sigma K^T. \tag{7}
\]
Note that the estimator is unbiased.
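To make Eq. 7 concrete, here is a small Monte Carlo sketch in R (the design, noise variances, and weight matrix are all invented for illustration) comparing the empirical mean and covariance of the estimates across repeated samples with $\beta$ and $K\Sigma K^T$:

# Sketch: check E[beta.hat] = beta and Var(beta.hat) = K Sigma K^T by simulation.
# The design, noise variances, and weight matrix are invented for illustration.
set.seed(2)
n <- 200
x <- runif(n, -1, 1)
X <- cbind(1, x)
beta <- c(1, 2)
sigma2 <- 0.2 + abs(x)                    # known noise variances
Sigma <- diag(sigma2)
W <- diag(1 / sigma2)                     # any fixed weight matrix would do here
K <- solve(t(X) %*% W %*% X) %*% t(X) %*% W

B <- 2000
est <- replicate(B, {
  y <- X %*% beta + rnorm(n, sd = sqrt(sigma2))
  as.vector(K %*% y)
})
rowMeans(est)                             # close to beta: the estimator is unbiased
cov(t(est))                               # close to the next line, i.e. Eq. 7
K %*% Sigma %*% t(K)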

3 The Gauss-Markov Theorem


We’ve seen that when we do weighted least squares, our estimates of β are linear in Y, and unbiased:

βb = (XT WX)−1 XT WY

and E[β]
b = β.
Let us consider a special case: suppose we take W = Σ−1 . Then, from (7)
" # " #
b = KΣK T = (XT Σ−1 X)−1 XT Σ−1 Σ Σ−1 X(XT Σ−1 X)−1 = (XT Σ−1 X)−1 .
Var(β) (8)

We will now show that βb is, in a certain sense, optimal.


As with any optimality result, it is crucial to lay out carefully the range of possible alternatives,
and the criterion by which those alternatives will be compared. The classical optimality result for
estimating linear models is the Gauss-Markov theorem, which takes the range of possibilities
to be linear, unbiased estimators of $\beta$, and the criterion to be the variance of the estimator.
Any linear estimator, say $\tilde{\beta}$, could be written as
\[
\tilde{\beta} = Q Y
\]
where $Q$ would be a $(p + 1) \times n$ matrix. We will show that if $\tilde{\beta}$ is unbiased, then its
variance is at least as large as that of $\hat{\beta}_{WLS}$.
For $\tilde{\beta}$ to be an unbiased estimator, we must have
\[
E[Q Y \mid X] = Q X \beta = \beta.
\]
Since this must hold for all $\beta$ and all $X$, we have to have $Q X = I$. The variance is then
\[
\mathrm{Var}[Q Y \mid X] = Q\, \mathrm{Var}[\epsilon \mid X]\, Q^T = Q \Sigma Q^T \tag{9}
\]

where $\Sigma = \mathrm{Var}[\epsilon \mid X]$. Let
\[
K \equiv (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1}.
\]
Now, whatever $Q$ might be, we can always write
\[
Q = K + R \tag{10}
\]
for some matrix $R$. The unbiasedness constraint on $Q$ implies that
\[
R X = 0
\]
because $K X = I$. Now we substitute Eq. 10 into Eq. 9:

\begin{align}
\mathrm{Var}\big[\tilde{\beta}\big] &= (K + R)\Sigma(K + R)^T \tag{11}\\
&= K\Sigma K^T + R\Sigma K^T + K\Sigma R^T + R\Sigma R^T \tag{12}\\
&= (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} \Sigma \Sigma^{-1} X (X^T \Sigma^{-1} X)^{-1} \tag{13}\\
&\quad + R \Sigma \Sigma^{-1} X (X^T \Sigma^{-1} X)^{-1} \notag\\
&\quad + (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} \Sigma R^T \notag\\
&\quad + R\Sigma R^T \notag\\
&= (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} X (X^T \Sigma^{-1} X)^{-1} \tag{14}\\
&\quad + R X (X^T \Sigma^{-1} X)^{-1} + (X^T \Sigma^{-1} X)^{-1} X^T R^T \notag\\
&\quad + R\Sigma R^T \notag\\
&= (X^T \Sigma^{-1} X)^{-1} + R\Sigma R^T \tag{15}
\end{align}
where the last step uses the fact that $R X = 0$ (and so $X^T R^T = 0^T$).


Hence,
\[
\mathrm{Var}\big[\tilde{\beta}_i\big] = \big[(X^T \Sigma^{-1} X)^{-1}\big]_{ii} + R_i \Sigma R_i^T = \mathrm{Var}\big[\hat{\beta}_i\big] + R_i \Sigma R_i^T
\]
where $R_i$ denotes the $i$th row of $R$. Since $\Sigma$ is a covariance matrix, it's positive definite, meaning
that $a \Sigma a^T \geq 0$ for any row vector $a$. Hence,
\[
\mathrm{Var}\big[\tilde{\beta}_i\big] \geq \mathrm{Var}\big[\hat{\beta}_i\big].
\]

We conclude that WLS, with $W = \Sigma^{-1}$, has the least variance among all possible linear, unbiased
estimators of the regression coefficients.
Notes:

1. If all the noise variances are equal, then we’ve proved the optimality of OLS.

2. The theorem doesn’t rule out linear, biased estimators with smaller variance. As an example,
albeit a trivial one, 0Y is linear and has variance 0, but is (generally) very biased.

3. The theorem also doesn’t rule out non-linear unbiased estimators of smaller variance. Or
indeed non-linear biased estimators of even smaller variance.

4. The proof actually doesn’t require the variance matrix to be diagonal.
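As a numerical illustration of the theorem (not part of the original notes; the data-generating choices are invented), the following R sketch compares the sampling variance of the OLS slope with that of the WLS slope using $W = \Sigma^{-1}$, i.e. inverse-variance weights:

# Sketch: under heteroskedastic noise, WLS with inverse-variance weights should show
# a smaller sampling variance than OLS for the same coefficient. Both are unbiased.
# All data-generating choices here are invented for illustration.
set.seed(3)
n <- 150
x <- runif(n, -1, 1)
sigma2 <- 0.05 + 2 * x^2                  # known, strongly heteroskedastic variances
B <- 2000
slopes <- replicate(B, {
  y <- 1 + 2 * x + rnorm(n, sd = sqrt(sigma2))
  c(ols = coef(lm(y ~ x))[2],
    wls = coef(lm(y ~ x, weights = 1 / sigma2))[2])
})
apply(slopes, 1, var)                     # the WLS variance should be the smaller one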

4 Finding the Variance and Weights


So far we have assumed that we know $\sigma_1, \ldots, \sigma_n$. Here are some cases where this might be true.

Multiple measurements. The easiest case is when our measurements of the response are actually
averages over individual measurements, each with some variance $\sigma^2$. If some $Y_i$ are based on
averaging more individual measurements than others, there will be heteroskedasticity. The variance
of the average of $n_i$ uncorrelated measurements will be $\sigma^2 / n_i$, so in this situation we could take
$w_i \propto n_i$.
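For instance, here is a hypothetical R sketch in which each recorded response is the average of $n_i$ raw measurements, so the weights are just the counts:

# Sketch: each response is the average of n.i raw measurements with unit variance,
# so Var(Ybar.i) = 1 / n.i and weights proportional to n.i are appropriate.
# The data below are simulated purely for illustration.
set.seed(4)
m <- 60
x <- runif(m, -1, 1)
n.i <- sample(1:20, m, replace = TRUE)            # measurements behind each average
ybar <- 1 + 2 * x + rnorm(m, sd = sqrt(1 / n.i))  # averages, variance 1/n.i
fit <- lm(ybar ~ x, weights = n.i)
summary(fit)$coefficients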

Binomial counts. Suppose our response variable is a count, derived from a binomial distribution,
i.e., $Y_i \sim \mathrm{Binom}(n_i, p_i)$. We would usually model $p_i$ as a function of the predictor variables; at
this level of statistical knowledge, a linear function. This would imply that $Y_i$ had expectation $n_i p_i$,
and variance $n_i p_i (1 - p_i)$. We would be well-advised to use this formula for the variance, rather
than pretending that all observations had equal variance.

Proportions based on binomials. If our response variable is a proportion based on a binomial,
we'd see an expectation value of $p_i$ and a variance of $p_i(1 - p_i)/n_i$. Again, this is not equal across
different values of $n_i$, or for that matter different values of $p_i$.
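As a rough sketch of how this plays out in practice (this two-step reweighting is an illustration of the idea, not a recipe from the notes), one can estimate the $p_i$ from an initial unweighted fit, plug them into the variance formula, and refit; this is close in spirit to how WLS approximates generalized linear models:

# Sketch: WLS for binomial proportions, where Var(Y.i / n.i) = p.i (1 - p.i) / n.i.
# Since p.i is unknown, estimate it from an initial unweighted fit, then refit.
# The data are simulated for illustration; clipping keeps the estimated variances positive.
set.seed(5)
m <- 100
x <- runif(m, 0, 1)
n.i <- sample(20:80, m, replace = TRUE)
p.true <- 0.2 + 0.5 * x                           # a linear model for the proportion
prop <- rbinom(m, n.i, p.true) / n.i

fit0 <- lm(prop ~ x)                              # initial, unweighted fit
p.hat <- pmin(pmax(fitted(fit0), 0.01), 0.99)     # clip away from 0 and 1
w <- n.i / (p.hat * (1 - p.hat))                  # inverse of the binomial variance
fit1 <- lm(prop ~ x, weights = w)
summary(fit1)$coefficients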

Poisson counts. Binomial counts have a hard upper limit, $n_i$; if the upper limit is immense or
even (theoretically) infinite, we may be better off using a Poisson distribution. In such situations,
the mean of the Poisson, $\lambda_i$, will be a (possibly linear) function of the predictors, and the variance
will also be equal to $\lambda_i$.

Other counts. The binomial and Poisson distributions rest on independence across "trials" (whatever
those might be). There is a range of discrete probability models that allow for correlation
across trials (leading to more or less variance). These may, in particular situations, be more
appropriate.

5 Conditional Variance Function Estimation
If we don’t know the variances, we might be able to estimate them. There are two common ways
to estimate conditional variances, which differ slightly in how they use non-parametric smoothing.
(We will discuss non-parametric smoothing in more detail later and in 402).
Method 1:

1. Estimate $m(x)$ with your favorite regression method, getting $\hat{m}(x)$.

2. Construct the squared residuals, $u_i = (Y_i - \hat{m}(x_i))^2$.

3. Use your favorite non-parametric method to estimate the conditional mean of the $u_i$, call it
$\hat{q}(x)$.

4. Estimate the variance using $\hat{\sigma}^2_x = \hat{q}(x)$. (A code sketch of this method follows the list.)
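Here is a minimal R sketch of Method 1 (the simulated data and the use of loess as the non-parametric smoother are choices made for this illustration; §5.1 below works through Method 2 in the same style):

# Sketch of Method 1: smooth the squared residuals to estimate the variance function.
# The data and the choice of loess as the smoother are for illustration only.
set.seed(6)
n <- 200
x <- runif(n, -1, 1)
y <- 2 * x + rnorm(n, sd = 0.2 + abs(x))          # heteroskedastic noise

out <- lm(y ~ x)                                  # step 1: estimate the regression
u <- resid(out)^2                                 # step 2: squared residuals
q.hat <- loess(u ~ x)                             # step 3: smooth the squared residuals
s2 <- pmax(fitted(q.hat), 1e-6)                   # step 4: estimated variances, clipped
                                                  # at a small positive value (the issue
                                                  # that Method 2 below avoids)
out.w <- lm(y ~ x, weights = 1 / s2)              # WLS with the estimated weights
summary(out.w)$coefficients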

Here is Method 2:

1. Estimate $m(x)$ with your favorite regression method, getting $\hat{m}(x)$.

2. Construct the log squared residuals, $z_i = \log (Y_i - \hat{m}(x_i))^2$.

3. Use your favorite non-parametric method to estimate the conditional mean of the $z_i$, call it
$\hat{s}(x)$.

4. Predict the variance using $\hat{\sigma}^2_x = \exp \hat{s}(x)$.

The second method ensures that the estimated variances are positive.
We are estimating the variance function in order to do weighted least squares, but these methods can
be used more generally. It's often important to understand variance in its own right, and this is a
general method for estimating it. Our estimate of the variance function depends on first having a
good estimate of the regression function.

5.1 Example
> plot(x,y)
> out = lm(y ~ x)            # initial OLS fit
> abline(out)
> summary(out)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02934    0.03935   0.745    0.458
x            1.89866    0.07092  26.772   <2e-16 ***

> plot(x,rstudent(out))      # studentized residuals reveal the heteroskedasticity
> abline(h=0)
>
> u = log((resid(out))^2)    # log squared residuals (Method 2, step 2)
> tmp = loess(u ~ x)         # non-parametric smooth of the log squared residuals
> s2 = exp(tmp$fitted)       # estimated conditional variances

> plot(x,s2)
> w = 1/s2                   # inverse-variance weights
> out2 = lm(y ~ x,weights=w) # weighted least squares fit
> summary(out2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002967   0.008252    0.36     0.72
x           2.046929   0.043872   46.66   <2e-16 ***

6 Correlated Noise and Generalized Least Squares


Sometimes, we might believe the right model is (in matrix form)
\begin{align}
Y &= X\beta + \epsilon \tag{16}\\
E[\epsilon \mid X] &= 0 \tag{17}\\
\mathrm{Var}[\epsilon \mid X] &= \Sigma \tag{18}
\end{align}
where the matrix $\Sigma$ is not diagonal. The off-diagonal entries represent covariance in the noise
terms, $\mathrm{Cov}[\epsilon_i, \epsilon_j] = \Sigma_{ij}$. How should we estimate $\beta$? Here are two approaches. We assume that $\Sigma$
is known.
Approach 1. Because $\Sigma$ is a variance matrix, we know it is square, symmetric, and positive-definite.
This implies that there exists a square root matrix $S$ (itself symmetric) such that $SS = \Sigma$. Now we
multiply our model by $S^{-1}$:
\[
S^{-1} Y = S^{-1} X \beta + S^{-1} \epsilon.
\]
This is a linear regression of $S^{-1} Y$ on $S^{-1} X$, with the same coefficients $\beta$ as our original regression.
However, we have improved the properties of the noise. The noise is still zero in expectation,
\[
E\big[S^{-1} \epsilon \mid X\big] = S^{-1} 0 = 0,
\]
but
\begin{align}
\mathrm{Var}\big[S^{-1} \epsilon \mid X\big] &= S^{-1}\, \mathrm{Var}[\epsilon \mid X]\, S^{-T} \tag{19}\\
&= S^{-1} \Sigma S^{-T} \tag{20}\\
&= S^{-1} S S^T S^{-T} \tag{21}\\
&= I. \tag{22}
\end{align}
To sum up, if we know $\Sigma$, we can estimate $\beta$ by doing an ordinary least squares regression of
$S^{-1} Y$ on $S^{-1} X$. The estimate is
\begin{align}
\hat{\beta} &= ((S^{-1} X)^T S^{-1} X)^{-1} (S^{-1} X)^T S^{-1} Y \tag{23}\\
&= (X^T S^{-T} S^{-1} X)^{-1} X^T S^{-T} S^{-1} Y \tag{24}\\
&= (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y. \tag{25}
\end{align}
This looks just like our weighted least squares estimate, only with $\Sigma^{-1}$ in place of $W$.
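Here is a minimal R sketch of Approach 1 (the AR(1)-style $\Sigma$ is invented for illustration; chol() supplies a lower-triangular factor $L$ with $LL^T = \Sigma$ in place of the symmetric square root $S$, which is enough, since the argument above only uses $\Sigma = SS^T$):

# Sketch of Approach 1: de-correlate the noise with a factor of Sigma, then run OLS.
# Sigma here is an invented AR(1)-style covariance; any L with L %*% t(L) = Sigma works.
set.seed(7)
n <- 100
x <- runif(n, -1, 1)
X <- cbind(1, x)
rho <- 0.6
Sigma <- 0.5 * rho^abs(outer(1:n, 1:n, "-"))      # correlated noise covariance
L <- t(chol(Sigma))                               # lower-triangular, L %*% t(L) = Sigma
eps <- as.vector(L %*% rnorm(n))                  # noise with covariance Sigma
y <- as.vector(X %*% c(1, 2) + eps)

Xs <- solve(L, X)                                 # S^{-1} X
ys <- as.vector(solve(L, y))                      # S^{-1} Y
beta.trans <- coef(lm(ys ~ Xs - 1))               # OLS on the transformed variables

Si <- solve(Sigma)
beta.gls <- solve(t(X) %*% Si %*% X, t(X) %*% Si %*% y)   # Eq. 25 directly
cbind(transformed = as.vector(beta.trans), direct = as.vector(beta.gls))  # should agree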

Figure 1: Top left: data with fitted line. Top right: residuals. Bottom left: plot of estimated variances.
Bottom right: data with weighted least squares line.

Approach 2. This resemblance is no mere coincidence. We can write the WLS problem as
that of minimizing $(Y - X\beta)^T W (Y - X\beta)$ for a diagonal matrix $W$. Suppose we try instead to
minimize
\[
(Y - X\beta)^T W (Y - X\beta)
\]
for a non-diagonal, but still symmetric and positive-definite, matrix $W$. This is called a generalized
least squares (GLS) problem. Every single step we went through before is still valid, because
none of it rested on $W$ being diagonal, so
\[
\hat{\beta}_{GLS} = (X^T W X)^{-1} X^T W Y. \tag{26}
\]

We have just seen that if we set $W = \Sigma^{-1}$, we also get this solution when we transform the
variables so as to de-correlate the noise, and then do ordinary least squares. This should at least
make it plausible that this is a good way to estimate $\beta$ in the face of correlated noise.
To go beyond plausibility, refer back to §3. At no point in our reasoning did we actually rely
on $\mathrm{Var}[\epsilon \mid X]$ being diagonal. It follows that if we set $W = \mathrm{Var}[\epsilon \mid X]^{-1}$, we get the linear, unbiased
estimator of minimum variance. If we believe that the noise is Gaussian, then this is also the
maximum likelihood estimator.
Where Do the Covariances Come From? In general we cannot estimate an unrestricted $\Sigma$ from
the data alone, so we can only use this approach when $\Sigma$ comes from external information. For
example, the data may have some time series structure or spatial structure which suggests the form of $\Sigma$.

7 WLS versus Model Errors


When you find that your residuals from an initial model have non-constant variance or are correlated
with each other, there are (at least) two possible explanations. One is that the fluctuations around
the regression line really are heteroskedastic and/or correlated. In that case, you should try to
model that variance and those correlations, and use WLS or GLS. The other explanation is that
something is wrong with your model. If there’s an important predictor variable which is just missing
from your model, for example, then its contribution to the response will be part of your residuals.
If that omitted variable is larger in some parts of the data than in others, or is itself correlated
across observations, then that will make your residuals change in magnitude and be correlated. More
subtly, having the wrong functional form for a variable you do include can produce those effects as
well.
