GLS & WLS
Introduction
In both the ordinary least squares and maximum likelihood approaches to parameter estimation, we made the assumption of constant variance, that is, the variance of an observation is the same regardless of the values of the explanatory variables associated with it.
Another assumption is that all the random error components are identically and independently distributed.
When this assumption is violated, the ordinary least squares estimator of the regression coefficients loses its property of minimum variance in the class of linear unbiased estimators.
Introduction
There are many real situations in which this assumption is inappropriate. In some cases the measurement system used might be a source of variability, and the size of the measurement error is proportional to the measured quantity.
In other cases the errors are correlated. Also, when the underlying distribution is continuous but skewed, such as the lognormal or gamma, the variance is not constant, and in many cases the variance is a function of the mean.
An important point is that the constant-variance assumption is linked to the assumption of a normal distribution for the response.
Introduction
A violation of this assumption can arise in any one of the following situations:
1. The variance of the random error components is not constant.
2. The random error components are not independent.
3. The random error components neither have constant variance nor are independent.
When the assumption of constant variance is not satisfied, a possible solution is to transform the data (for example, taking the log of the response variable and/or the explanatory variables) to achieve constant variance.
Introduction
In such cases, the covariance matrix of the random error components is no longer (a multiple of) an identity matrix but can be any positive definite matrix.
Under this assumption, the OLSE is no longer efficient, unlike in the case of an identity covariance matrix.
The generalized or weighted least squares method is used in such situations
to estimate the parameters of the model.
Introduction
In this method, each squared deviation between the observed and expected values of yi is multiplied by a weight ωi, where ωi is chosen to be inversely proportional to the variance of yi.
For a simple linear regression model, the weighted least squares function is
S(β0, β1) = Σi ωi (yi − β0 − β1 xi)².
The least-squares normal equations are obtained by differentiating S(β0, β1) with respect to β0 and β1 and equating them to zero:
β̂0 Σi ωi + β̂1 Σi ωi xi = Σi ωi yi
β̂0 Σi ωi xi + β̂1 Σi ωi xi² = Σi ωi xi yi
The solution of these two normal equations gives the weighted least squares estimates of β0 and β1.
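As a quick numerical illustration of these equations, the sketch below uses synthetic x, y and weights (purely for illustration, not data from the slides), solves the two weighted normal equations directly with NumPy, and checks the result against statsmodels' WLS routine.

```python
# Minimal sketch: solve the weighted normal equations for simple linear regression
# and compare with statsmodels' WLS (synthetic data, illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)     # error sd grows with x
w = 1.0 / x**2                                    # weight chosen inversely proportional to Var(y_i)

# Solve the 2x2 system of weighted normal equations for (b0, b1)
A = np.array([[w.sum(),       (w * x).sum()],
              [(w * x).sum(), (w * x**2).sum()]])
rhs = np.array([(w * y).sum(), (w * x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

# The same estimates from statsmodels
wls = sm.WLS(y, sm.add_constant(x), weights=w).fit()
print(b0, b1)        # closed-form solution of the normal equations
print(wls.params)    # should match [b0, b1]
```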
Generalized least squares estimation
Suppose that in the usual multiple regression model
y = Xβ + ε with E(ε) = 0, V(ε) = σ²I,
the assumption V(ε) = σ²I is violated and becomes
V(ε) = σ²V,
where V is a known n × n nonsingular, positive definite and symmetric matrix.
This structure of V incorporates both cases:
- when V is diagonal but with unequal variances, and
- when V is not necessarily diagonal; owing to the presence of correlated errors, some of the off-diagonal elements are nonzero.
Generalized least squares estimation
The OLSE of β is
b = (X′X)⁻¹X′y.
In such cases, the OLSE is still unbiased but has more variability, since
E(b) = (X′X)⁻¹X′E(y) = (X′X)⁻¹X′Xβ = β
V(b) = (X′X)⁻¹X′ V(y) X(X′X)⁻¹ = σ²(X′X)⁻¹X′VX(X′X)⁻¹
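The two facts above can be checked numerically. The sketch below (synthetic design matrix and an arbitrary positive definite V, illustration only) simulates the OLS estimator under errors with covariance σ²V: the simulated mean stays near β and the sample covariance approaches the sandwich formula.

```python
# Sketch: verify E(b) = beta and V(b) = sigma^2 (X'X)^-1 X'VX (X'X)^-1 by simulation
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 50, 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([1.0, 0.5])

A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)              # an arbitrary positive definite V
L = np.linalg.cholesky(sigma2 * V)       # errors will have covariance sigma^2 V

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = sigma2 * XtX_inv @ X.T @ V @ X @ XtX_inv

draws = []
for _ in range(5000):
    y = X @ beta + L @ rng.standard_normal(n)
    draws.append(XtX_inv @ X.T @ y)      # OLS estimate b for this draw
draws = np.array(draws)

print(draws.mean(axis=0))                # close to beta: OLS is still unbiased
print(np.cov(draws.T))                   # close to the sandwich matrix below
print(sandwich)
```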
Generalized least squares estimation
Now we attempt to find a better estimator as follows.
Since V is positive definite and symmetric, there exists a nonsingular matrix K such that
KK′ = V.
Premultiplying the model
y = Xβ + ε
by K⁻¹ gives
K⁻¹y = K⁻¹Xβ + K⁻¹ε,
or
z = Bβ + g,
where z = K⁻¹y, B = K⁻¹X and g = K⁻¹ε.
Generalized least squares estimation
Now observe that
E(g) = K⁻¹E(ε) = 0
and
V(g) = E[{g − E(g)}{g − E(g)}′]
     = E(gg′)
     = E[K⁻¹εε′K′⁻¹]
     = K⁻¹E(εε′)K′⁻¹
     = σ²K⁻¹VK′⁻¹
     = σ²K⁻¹KK′K′⁻¹
     = σ²I.
Thus the elements of g have mean 0 and are uncorrelated, each with variance σ².
Generalized least squares estimation
So we can either minimize
S(β) = g′g = ε′V⁻¹ε = (y − Xβ)′V⁻¹(y − Xβ)
and get the normal equations
(X′V⁻¹X)β̂ = X′V⁻¹y,
i.e.
β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y,
or, alternatively, apply OLS to the transformed model and obtain the OLSE of β as
β̂ = (B′B)⁻¹B′z
  = (X′K′⁻¹K⁻¹X)⁻¹X′K′⁻¹K⁻¹y
  = (X′V⁻¹X)⁻¹X′V⁻¹y.
This is termed the generalized least squares estimator (GLSE) of β.
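A small NumPy sketch (synthetic X, y and V, purely illustrative) of the two equivalent routes just described: the direct formula (X′V⁻¹X)⁻¹X′V⁻¹y, and OLS applied to the model transformed with K⁻¹, where K comes from a Cholesky factorisation so that KK′ = V.

```python
# Sketch: GLSE via the direct formula vs. OLS on the K^-1 transformed model
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
beta = np.array([1.0, 2.0])

V = np.diag(rng.uniform(0.5, 4.0, n))    # a positive definite V (heteroskedastic errors)
K = np.linalg.cholesky(V)                # K K' = V
y = X @ beta + K @ rng.standard_normal(n)

# Route 1: direct GLS formula
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Route 2: OLS applied to the transformed model z = B*beta + g
Kinv = np.linalg.inv(K)
z, B = Kinv @ y, Kinv @ X
beta_transformed = np.linalg.solve(B.T @ B, B.T @ z)

print(beta_gls)           # identical (up to rounding) ...
print(beta_transformed)   # ... to this
```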
Generalized least squares estimation
State and prove the Gauss-Aitken theorem on the generalized least squares estimator (GLSE), i.e., show that the GLSE is the best linear unbiased estimator (BLUE) of β.
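The proof is left as stated above. As an empirical illustration only (synthetic data, not a proof), the sketch below compares the sampling variability of the OLS and GLS slope estimators under heteroskedastic errors; the GLSE shows the smaller variance, consistent with it being BLUE.

```python
# Sketch: compare sampling variance of the OLSE and GLSE when V != I
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([2.0, 1.0])
sd = 0.5 * x                               # error sd proportional to x
Vinv = np.diag(1.0 / sd**2)                # inverse of V (up to sigma^2)

ols_slopes, gls_slopes = [], []
for _ in range(4000):
    y = X @ beta + rng.normal(scale=sd)
    ols_slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
    gls_slopes.append(np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)[1])

print(np.var(ols_slopes))   # larger
print(np.var(gls_slopes))   # smaller: the GLSE is the more efficient linear unbiased estimator
```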
Generalized least squares estimation
Analysis of Variance for Generalized Least Squares
Example
The following linear model of county-level poverty rates in the United States uses socioeconomic factors such as the median age of the county’s inhabitants, the vacancy rate of houses for sale in the county, and the percentage of the county’s population with at least one college degree.
We’ll use data from the US Census Bureau aggregated at the county level. The data set used in this example can be downloaded from the link below:
https://gist.github.com/sachinsdate/0b8ebc2b26afb67a1e83e752c69e1a25
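A sketch of how such a model could be fitted in Python with statsmodels. The file name and the column names (poverty_rate, median_age, vacancy_rate, pct_college) are placeholders and must be replaced with the actual names in the downloaded data set.

```python
# Sketch: fit the county-level poverty model by OLS (file and column names are hypothetical)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('county_poverty.csv')    # the file downloaded from the gist above

ols_model = smf.ols(
    'poverty_rate ~ median_age + vacancy_rate + pct_college', data=df
).fit()
print(ols_model.summary())                # coefficient estimates, standard errors, p-values
```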
Example
The OLS output shows that all coefficient estimates are significant at p < .001. If the errors are not homoskedastic, however, this output will yield misleading results.
Example
A visual test of heteroskedasticity should tell us if the variance of the errors (as estimated by the variance of the fitted model’s residuals) is constant. We’ll plot the fitted model’s residuals (which are unbiased estimates of the model’s errors) against the fitted (i.e., predicted) values of the response variable.
The resulting plot shows that the model’s errors are clearly heteroskedastic; the use of the GLS estimator is indicated for this data set.
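Continuing the sketch above (same hypothetical column names), a residuals-versus-fitted plot can be produced with matplotlib as follows.

```python
# Sketch: residuals vs. fitted values as a visual check for heteroskedasticity
import matplotlib.pyplot as plt

plt.scatter(ols_model.fittedvalues, ols_model.resid, s=10)
plt.axhline(0.0, linestyle='--')
plt.xlabel('Fitted (predicted) poverty rate')
plt.ylabel('Residual')
plt.title('Residuals vs. fitted values (OLS model)')
plt.show()
```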
Example
Let’s compare the parameter estimates from the OLS model and the GLS model.
Example
Let’s also compare the standard errors of the parameter estimates from the OLS model and the GLS model.
The comparative output shows what we would expect from the GLS-estimated model, namely that its standard errors are much smaller than those from the OLS-estimated model.
The GLS-estimated model is therefore more precise than the OLS-estimated model for this data set.
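The slides do not show the code used for the GLS fit; one assumed, feasible-GLS style workflow in statsmodels (continuing the earlier sketch with the same hypothetical column names) is to model the error variance from the OLS residuals, build a diagonal covariance matrix from it, and pass that matrix to sm.GLS.

```python
# Sketch: a feasible-GLS fit using an error variance function estimated from OLS residuals
import numpy as np
import statsmodels.api as sm

# Regress log squared residuals on the fitted values to estimate Var(eps_i)
aux_y = np.log(ols_model.resid ** 2)
aux_X = sm.add_constant(ols_model.fittedvalues)
est_var = np.exp(sm.OLS(aux_y, aux_X).fit().fittedvalues)   # estimated variances (up to scale)

# GLS with a diagonal covariance matrix built from the estimated variances
y = df['poverty_rate']
X = sm.add_constant(df[['median_age', 'vacancy_rate', 'pct_college']])
gls_model = sm.GLS(y, X, sigma=np.diag(est_var)).fit()

# Compare coefficients and standard errors with the OLS fit
print(ols_model.params, gls_model.params)
print(ols_model.bse, gls_model.bse)
```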
Suppose yi = β1 + β2 xi + εi, i = 1, 2, …, n, with E(εi) = 0 and
E(εi εj) = 0 for i ≠ j,
E(εi εj) = σ²/xi² for i = j.
Find the GLSE of β1 and β2.
Repeat with E(εi εj) = k² xi for i = j, and with E(εi εj) = k xi² for i = j.
When are GLS and OLS estimators equivalent?
Let us look at the two expressions first:
b = (X′X)⁻¹X′y and β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y.
By looking at the two expressions, one can easily deduce that if y belongs to the column space of X, then the two estimators are identical. Are there any other situations? Recall that y can be written as the sum of its projection onto the column space of X and a residual vector orthogonal to that space.
Weighted Least Squares
Weighted Least Squares is an extension of Ordinary Least Squares regression in which non-negative constants (weights) are attached to the data points. It is used when any of the following are true:
● Your data violates the assumption of homoscedasticity. In simple terms, this means the spread of the dependent variable about the regression line should be similar across the range of the predictors, creating an even scatter pattern. If your data doesn’t have equal variances, you shouldn’t use OLS.
● You want to concentrate on certain areas (like low input values). OLS can’t “target” specific areas, while weighted least squares works well for this task. You may want to highlight specific regions in your study, for example ones that might be expensive or painful to reproduce. By giving these areas bigger weights than others, you pull the analysis toward that region’s data. This focuses the analysis on the areas that matter (Shalizi, 2015).
● You’re running the procedure as part of logistic regression or some other nonlinear function. With any nonlinear procedure, linear regression is usually not the most appropriate modeling tool unless you can group the data. In addition, the error terms in logistic regression are heteroscedastic, which means you can’t use OLS.
● You have any other situation where data points should not be treated equally. For example, you might give more weight to points you know have been precisely measured and less weight to points that are only estimated.
Weighted Least Squares
One way to handle heteroscedasticity is to use weighted least squares regression instead, which places weights on the observations such that those with small error variance are given more weight, since they contain more information than observations with larger error variance.
Introduction
When performing WLS, you need to know the weights. They may come from:
1. Experience or prior information from some theoretical model.
2. The residuals of a preliminary model; for example, if var(εi) = σ²xi, then we may decide to use wi = 1/xi (as sketched below).
3. The structure of the responses; if each yi is the average of ni observations at xi, so that var(yi) = var(εi) = σ²/ni, then we may decide to use wi = ni.
4. Measurement considerations; sometimes we know that different observations have been measured by different instruments with some (known or estimated) accuracy. In this case we may decide to use weights inversely proportional to the variance of the measurement errors.
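A minimal sketch of weighting choice 2, assuming var(εi) = σ²xi so that wi = 1/xi (synthetic data, illustration only):

```python
# Sketch: weights w_i = 1/x_i when Var(eps_i) is proportional to x_i
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 20, 50)
y = 3.0 + 0.8 * x + rng.normal(scale=np.sqrt(x))   # Var(eps_i) = x_i

wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x).fit()
print(wls_fit.params, wls_fit.bse)
```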
Weighted Least Squares
When the errors ε are uncorrelated but have unequal variances, so that the covariance matrix of ε is, say,
σ²V = σ² diag(1/w1, 1/w2, …, 1/wn),
the estimation procedure is usually called weighted least squares.
Let W = V⁻¹. Since V is a diagonal matrix, W is also diagonal, with diagonal elements (weights) w1, w2, …, wn.
Weighted Least Squares
The weighted least-squares normal equations are
(X′WX)β̂ = X′Wy,
so
β̂ = (X′WX)⁻¹X′Wy
is the weighted least-squares estimator.
Note that observations with large variances will have smaller weights than observations with small variances.
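The matrix form can be checked directly against a library routine. The sketch below (synthetic data, illustration only) computes β̂ = (X′WX)⁻¹X′Wy by hand and compares it with statsmodels' WLS, which solves the same normal equations.

```python
# Sketch: weighted least-squares estimator via the matrix formula vs. statsmodels WLS
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 40)
w = 1.0 / (1.0 + x)                                   # larger variance -> smaller weight
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(1.0 + x))

X = sm.add_constant(x)
W = np.diag(w)
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # (X'WX)^-1 X'Wy

wls_fit = sm.WLS(y, X, weights=w).fit()
print(beta_hat)          # matches ...
print(wls_fit.params)    # ... the library estimates
```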
WLS: Example
A step-by-step example of how to perform weighted least squares regression
Step 1: Create the Data
The data contains the number of hours studied and the corresponding exam score for 16
students.
Step 2: Perform Linear Regression
Fit a simple linear regression model that uses hours as the predictor variable and score
as the response variable.
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    60.467       5.128   11.791  1.17e-08 ***
Hours           5.500       1.127    4.879  0.000244 ***
Residual standard error: 9.224 on 14 degrees of freedom
WLS: Example
Step 3: Test for Heteroscedasticity
Next, we’ll create a residual vs. fitted values plot to visually check for heteroscedasticity.
We can see from the plot that the residuals exhibit a “cone” shape: they’re not distributed with equal variance throughout the plot.
WLS: Example
Step 4: Perform Weighted Least Squares Regression
Since heteroscedasticity is present, we will perform weighted least squares by
defining the weights in such a way that the observations with lower variance
are given more weight:
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   63.9689      5.1587   12.400  6.13e-09 ***
Hours          4.7091      0.8709    5.407  9.24e-05 ***
Residual standard error: 1.199 on 14 degrees of freedom
From the output we can see that the coefficient estimate for the predictor
variable hours changed a bit and the overall fit of the model improved.
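The slide does not show the code that produced this output. One common way to define such weights, shown here as an assumed Python/statsmodels sketch (the printed output above comes from R's lm()) and assuming a DataFrame df with columns hours and score, is to regress the absolute OLS residuals on the fitted values and take each weight as the inverse of the squared fitted absolute residual.

```python
# Sketch: build variance-based weights from an OLS fit, then refit by weighted least squares
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

ols_fit = smf.ols('score ~ hours', data=df).fit()            # Step 2: plain OLS

# Model the error spread: |residual| as a function of the fitted values
spread_fit = sm.OLS(np.abs(ols_fit.resid),
                    sm.add_constant(ols_fit.fittedvalues)).fit()
weights = 1.0 / spread_fit.fittedvalues ** 2                 # lower variance -> higher weight

wls_fit = smf.wls('score ~ hours', data=df, weights=weights).fit()
print(wls_fit.summary())                                     # compare with the OLS summary
```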
WLS: Example
The weighted least squares model has a residual standard error of 1.199 compared to
9.224 in the original simple linear regression model.
This indicates that the predicted values produced by the weighted least squares model
are much closer to the actual observations compared to the predicted values produced by
the simple linear regression model.
The weighted least squares model also has an R-squared of 0.6762 compared to 0.6296
in the original simple linear regression model.
This indicates that the weighted least squares model is able to explain more of the
variance in exam scores compared to the simple linear regression model.
These metrics indicate that the weighted least squares model offers a better fit to the
data compared to the simple linear regression model.
Advantages and Disadvantages
Advantages:
● It’s well suited to extracting maximum information from small data sets.
● It is the only method that can be used for data points of varying quality.
Disadvantages:
● It requires that you know exactly what the weights are. Estimating weights can have unpredictable results, especially when dealing with small samples. The technique should therefore only be used when your weight estimates are fairly precise, which in practice is often not the case.
● Sensitivity to outliers is a problem. A rogue outlier given an inappropriate
weight could dramatically skew your results.
Thank you