

Anale. Seria Informatică. Vol. XVIII fasc. 2 – 2020
Annals. Computer Science Series. 18th Tome 2nd Fasc. – 2020

COMPARISON OF DIFFERENT TESTS FOR DETECTING HETEROSCEDASTICITY IN DATASETS

Obabire Akinleye A.¹, Agboola Julius O.¹, Ajao Isaac O.¹ and Adegbilero-Iwari Oluwaseun E.²
¹ Department of Mathematics & Statistics, The Federal Polytechnic, Ado-Ekiti, Nigeria
² Department of Community Medicine, Afe Babalola University, Ado-Ekiti, Nigeria

Corresponding authors: obabireakinleye@gmail.com, juliusagboola21@gmail.com, isaacoluwaseyiajao@gmail.com, adegbilero-iwari@abuad.edu.ng

ABSTRACT: Heteroscedasticity occurs mostly because of errors in variables, incorrect data transformation, incorrect functional form, omission of important variables, a non-detailed model, outliers, and skewness in the distribution of one or more independent variables in the model. All analyses were carried out in the R statistical package using the lmtest, zoo and base packages. Five heteroscedasticity tests were selected, namely the Park test, Glejser test, Breusch-Pagan test, White test and Goldfeld-Quandt test, and applied to simulated datasets of sample sizes 20, 30, 40, 50, 60, 70, 80, 90 and 100 at different levels of heteroscedasticity (low level when sigma = 0.5, mild level when sigma = 1.0 and high level when sigma = 2.0), with significance criterion alpha = 0.05. Each test was repeated 1000 times and the percentage of rejections was computed over the 1000 trials. The average empirical type I error rate of the Glejser test rejects more than expected, while the Goldfeld-Quandt test has the least power value. Nevertheless, the Glejser test has the highest capacity to detect heteroscedasticity, most especially on simulated datasets.

KEYWORDS: Park test, Glejser test, Breusch-Pagan test, White test, Goldfeld test

1. INTRODUCTION

One of the assumptions of the classical linear regression model states that the disturbances $u_i$ featuring in the population regression function all have the same variance; that is, they are homoscedastic. In other words, the variance of the residuals should not increase with the fitted values of the response variable (SelvaPrabhakara, 2016). When this assumption fails, the consequence is what is termed heteroscedasticity, which can be expressed as

$E(u_i^2) = \sigma_i^2$,  $i = 1, 2, \ldots, n$.

There are several reasons why the variance of $u_i$ may vary. The study of error-learning models shows that as people learn over time, their error rates fall, so that $\sigma_i^2$ is expected to decrease; practically, as one increases the number of hours spent typing, the rate of errors committed decreases and the variance naturally decreases with it. Also, as income grows, one has more choices on how to dispose of the income, so $\sigma_i^2$ is more likely to increase with income; a company with a large profit will pay more dividends to its shareholders than a newly established company. In a situation where the data-collecting technique improves, $\sigma_i^2$ is more likely to decrease: banks with good data-processing equipment are less prone to errors when processing the monthly statements of account of their customers than banks without good facilities. Heteroscedasticity can also come into play because of outliers, that is, observations that are very different from the rest, being either too large or too small in relation to the other observations in the sample. It can also be a result of violating the assumption that the regression model is correctly specified, or a result of skewness in the distribution of one or more regressors included in the model (for example, a right-skewed distribution). Heteroscedasticity may equally be the result of incorrect data transformation or an incorrect functional form. It is not a property restricted to cross-sectional data; it also arises in time series data when an external shock or change in circumstances creates uncertainty about y [7]. Cross-sectional data deal with members of a population at a particular point in time, and these members may be of different types, sizes and composition, whereas time series data are similar in order of magnitude [6]. More often than not, heteroscedasticity arises when important variables are omitted from the model or superfluous variables are included in it (model specification error).

1.1 Model specification

Applied econometrics is based on understanding, intuition and skill. Users of economic data must be able to give models that other people can rely on [10].


The assumption of the classical linear regression model states that the regression model used in the analysis is correctly specified; otherwise, a problem called model specification error or model specification bias is encountered. Users of economic data must have the following in their mindset:
a. be able to have a standard for choosing a model for empirical analysis,
b. recognize the types of model specification error that can be encountered in practice,
c. have in mind the consequences of specification error,
d. know how to detect specification error, and recognize the tools that can be used,
e. discover the remedies that can be applied, and their benefits, once a specification error has been detected,
f. know how to judge the strength of competing models.

1.2 How to identify heteroscedasticity

The residuals from a linear regression are

$e = Y - \hat{Y} = Y - X\hat{\beta}$,

and they are used in place of the unobservable errors $\varepsilon$ [3]. Residuals are used to detect the behaviour of the variance within a dataset. Residual plots, where the residuals are plotted on the y-axis against the fitted values $\hat{y}_i$ on the x-axis, are the most commonly used tool [2]. When heteroscedasticity is present, most especially when the variance is proportional to a power of the mean, the residuals will appear in the form of a fan shape. This may, however, not be the best method for detecting heteroscedasticity, as it is difficult to interpret, particularly when the positive and negative residuals do not exhibit the same general pattern [3]. Cook and Weisberg suggested plotting the squared residuals $e_i^2$ to account for this; a wedge shape bounded below by 0 would then indicate heteroscedasticity. However, as [2] pointed out, squaring residuals that are large in magnitude creates scaling problems, resulting in a plot where patterns in the rest of the residuals are difficult to see. They instead advocate plotting the absolute residuals. This way, we do not need to identify positive and negative patterns and do not need to worry about scaling issues. A wedge shape in the absolute residuals also indicates heteroscedasticity in which the variance increases with the mean. This is the plotting method used for identifying heteroscedasticity in this study.
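As an illustration of this plotting approach, the short R sketch below fits an OLS model on hypothetical data (the variable names x and y and the data-generating process are assumptions for illustration, not the paper's data) and plots the absolute residuals against the fitted values; a wedge opening to the right suggests variance increasing with the mean.

```r
# Sketch: absolute-residual plot for spotting heteroscedasticity.
# 'x' and 'y' are placeholder variables, not the paper's data.
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.5 * x)   # error variance grows with x

fit  <- lm(y ~ x)
res  <- residuals(fit)
yhat <- fitted(fit)

# A wedge shape (absolute residuals fanning out as fitted values grow)
# suggests that the error variance increases with the mean.
plot(yhat, abs(res),
     xlab = "Fitted values", ylab = "|Residuals|",
     main = "Absolute residuals vs fitted values")
```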
2. COMMON HETEROSCEDASTICITY PATTERNS

There are some likely assumptions (patterns) about heteroscedasticity; they include:

Pattern 1:
The error variance is proportional to $X_i^2$, i.e.

$E(u_i^2) = \sigma^2 X_i^2$

Graphical methods, or the Park and Glejser approaches, may reveal that the variance of $u_i$ is proportional to the square of the explanatory variable $X_i$. If that is true, the original model

$E(Y_i) = \beta_1 + \beta_2 X_i$   (2.0)

can be transformed by dividing it through by $X_i$:

$\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i}$   (2.1)
$= \beta_1 \frac{1}{X_i} + \beta_2 + v_i$   (2.2)

where $v_i$ is the transformed disturbance term, equal to $u_i / X_i$. Now it is easy to verify that

$E(v_i^2) = E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2} E(u_i^2)$   (2.3)
$= \sigma^2$  (by using $E(u_i^2) = \sigma^2 X_i^2$)   (2.4)

It should be noted that in the transformed regression the intercept term $\beta_2$ is the slope coefficient of the original equation, and the slope coefficient $\beta_1$ is the intercept term of the original model. Therefore, to get back to the original model, we have to multiply the estimated transformed equation (2.2) by $X_i$.

Pattern 2:
The error variance is proportional to $X_i$ (the square root transformation):

$E(u_i^2) = \sigma^2 X_i$   (2.5)

If it is believed that the variance of $u_i$, instead of being proportional to the square of $X_i$, is proportional to $X_i$ itself, then the original model can be transformed as

$\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}}$   (2.6)
$= \beta_1 \frac{1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + v_i$   (2.7)

where $v_i = u_i / \sqrt{X_i}$ and $X_i > 0$.
Given pattern 2, one can readily verify that $E(v_i^2) = \sigma^2$, a homoscedastic situation. One may therefore apply ordinary least squares to equation 2.7, regressing $Y_i / \sqrt{X_i}$ on $1 / \sqrt{X_i}$ and $\sqrt{X_i}$. It is worthy of note that an important feature of the transformed model is that it has no intercept term, so a regression-through-the-origin model has to be used to estimate $\beta_1$ and $\beta_2$. Having run equation 2.7, one can get back to the original model simply by multiplying equation 2.7 by $\sqrt{X_i}$.
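As a minimal sketch of how patterns 1 and 2 can be handled in practice (hypothetical data; the paper does not give code for these transformations), note that dividing the model through by $X_i$ or $\sqrt{X_i}$ is equivalent to weighted least squares with weights $1/X_i^2$ or $1/X_i$ respectively, which R's lm() supports directly:

```r
# Sketch: the Pattern 1 and Pattern 2 transformations expressed as weighted least squares.
# Dividing the model by X (Pattern 1) corresponds to weights 1/X^2;
# dividing by sqrt(X) (Pattern 2) corresponds to weights 1/X.
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = x)          # illustrative error sd proportional to x

ols      <- lm(y ~ x)                        # ignores the heteroscedasticity
pattern1 <- lm(y ~ x, weights = 1 / x^2)     # assumes E(u^2) = sigma^2 * x^2
pattern2 <- lm(y ~ x, weights = 1 / x)       # assumes E(u^2) = sigma^2 * x

# The weighted fits reproduce the transformed regressions without building
# Y/X, 1/X, etc. by hand; the coefficients stay on the original scale.
summary(pattern1)$coefficients
```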

79
Anale. Seria Informatică. Vol. XVIII fasc. 2 – 2020
Annals. Computer Science Series. 18th Tome 2nd Fasc. – 2020

Pattern 3:
The error variance is proportional to the square of the mean value of Y, i.e.

$E(u_i^2) = \sigma^2 [E(Y_i)]^2$   (2.8)

Equation 2.8 asserts that the variance of $u_i$ is proportional to the square of the expected value of Y. Therefore, with

$E(Y_i) = \beta_1 + \beta_2 X_i$   (2.9)

the original equation can be transformed as follows:

$\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)}$
$= \beta_1 \frac{1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + v_i$   (2.10)

where $v_i = u_i / E(Y_i)$. It can be seen that $E(v_i^2) = \sigma^2$; that is, the disturbances $v_i$ are homoscedastic. Hence, it is the regression in equation 2.10 that will satisfy the homoscedasticity assumption of the classical regression model.
The transformation in equation 2.10 is, however, not operational, because $E(Y_i)$ depends on $\beta_1$ and $\beta_2$, which are unknown. Of course, we have $\hat{Y}_i$. Then, using the estimated $\hat{Y}_i$, we transform the model to

$\frac{Y_i}{\hat{Y}_i} = \beta_1 \frac{1}{\hat{Y}_i} + \beta_2 \frac{X_i}{\hat{Y}_i} + v_i$   (2.11)

where $v_i = u_i / \hat{Y}_i$. Though the $\hat{Y}_i$ are not exactly $E(Y_i)$, they are consistent estimators; that is, as the sample size increases indefinitely, they converge to the true $E(Y_i)$. Hence, the transformation in equation 2.11 will perform satisfactorily in practice if the sample size is reasonably large.

Pattern 4:
A log transformation such as

$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$   (2.12)

more often than not reduces heteroscedasticity compared with the regression

$Y_i = \beta_1 + \beta_2 X_i + u_i$   (2.13)

This result arises because the log transformation compresses the scales in which the variables are measured, reducing a tenfold difference between two values to roughly a twofold difference. For instance, 90 is ten times 9, but ln 90 (= 4.4998) is only about twice as large as ln 9 (= 2.1972). Another benefit of the log transformation is that the slope coefficient $\beta_2$ measures the elasticity of Y with respect to X, that is, the percentage change in Y for a percentage change in X. For example, if Y is consumption and X is income, $\beta_2$ in equation 2.12 will measure the income elasticity, whereas in the original model $\beta_2$ measures only the rate of change of mean consumption for a unit change in income. This is one reason why log models are quite popular in empirical econometrics [6].

3. HETEROSCEDASTICITY TESTS

Park Test:
Park assumed in his work that the variance of the error term is proportional to the square of the independent variable [8]. He also formalized the graphical method by suggesting that $\sigma_i^2$ is some function of the explanatory variable $X_i$, proposing the functional form

$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$,  or  $\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$,

where $v_i$ is the stochastic disturbance term. Since $\sigma_i^2$ is generally not known, Park suggested using $\hat{u}_i^2$ as a proxy and running the regression

$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$   (3.0)
$= \alpha + \beta \ln X_i + v_i$   (3.1)

If $\beta$ turns out to be statistically significant, this is evidence that heteroscedasticity is present in the data; otherwise, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage procedure: in the first stage we run the OLS regression disregarding the heteroscedasticity question and obtain $\hat{u}_i$; in the second stage we run the regression in equation 3.1.
It should be noted that, empirically speaking, the Park test has some difficulties. Goldfeld and Quandt have argued that the error term $v_i$ entering equation 3.0 may not satisfy the ordinary least squares assumptions and may itself be heteroscedastic. Nevertheless, as an exploratory method, the Park test may still be used.
The Park test is used when one has some variable Z that one thinks might explain the differing variances of the residuals. There are different forms of the Park test; the log form is the commonest of all and is the one described here, where $\log(\hat{u}_i^2) = b + m \log(X)$. Another form that can be used is the linear form $\hat{u}_i^2 = b + m X$, which is closely similar to the Breusch-Pagan test.

Steps for running the Park test:
Step 1: On the data, run ordinary least squares and be sure that the regression produces a table of residuals.
Step 2: Square the residuals from step 1.
Step 3: Take the natural log of the squared residuals from step 2.
Step 4: Take the natural log of Z, the variable which you suspect is causing the heteroscedastic behaviour.


Step 5: Run ordinary least squares once again for the natural log of Z (step 4) against the natural log of the squared residuals (step 3); in other words, the log of Z is your independent variable and the log of the squared residuals is the dependent variable for this regression.
Step 6: Calculate the t-statistic for the Z variable; a large t-statistic indicates the presence of heteroscedasticity.
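These steps translate directly into a few lines of R. The sketch below is illustrative only (hypothetical variables x and y, with x itself playing the role of the suspect variable Z); it is not the script used in the study.

```r
# Sketch of the Park test: regress the log of the squared OLS residuals on log(Z).
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.8 * x)

fit  <- lm(y ~ x)                 # Step 1: OLS on the original model
u2   <- residuals(fit)^2          # Step 2: squared residuals
park <- lm(log(u2) ~ log(x))      # Steps 3-5: ln(u^2) on ln(Z), here Z = x

# Step 6: a large (significant) t-statistic on log(x) signals heteroscedasticity.
summary(park)$coefficients["log(x)", ]
```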
Glejser Test:
This test is similar to the Park test. After obtaining the residuals $\hat{u}_i$ from the ordinary least squares regression, Glejser suggested regressing the absolute values of $\hat{u}_i$ on the X variable(s) thought to be closely related to $\sigma_i^2$. The Glejser test places weaker restrictions on the shape of the error distribution under the null hypothesis than the Breusch and Pagan (1979) test based on the normal likelihood function (Machado, 2000). It uses the following functional forms:

$|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i$   (3.2)
$|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$   (3.3)
$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$   (3.4)
$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$   (3.5)
$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$   (3.6)
$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$   (3.7)

where $v_i$ is the error term.
As far as empirical work is concerned, the Glejser approach can be used. But Goldfeld and Quandt pointed out that the error term $v_i$ has some problems: its expected value is nonzero, it is serially correlated and, ironically, it is itself heteroscedastic. Another problem with the Glejser method is that models such as equations 3.6 and 3.7 are nonlinear in the parameters and therefore cannot be estimated with the usual ordinary least squares procedure. For large samples, Glejser found that the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity. Practically, the Glejser technique may be used, for large and small samples alike, strictly as a qualitative device to get an indication of heteroscedasticity.
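A minimal R sketch of the Glejser approach on hypothetical data, using only the first functional form (3.2); the other linear-in-parameters forms simply replace x by sqrt(x), 1/x or 1/sqrt(x).

```r
# Sketch of the Glejser test, functional form 3.2: |u_hat| = b1 + b2 * X + v.
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.8 * x)

fit     <- lm(y ~ x)
abs_res <- abs(residuals(fit))

glejser <- lm(abs_res ~ x)             # forms 3.3-3.5 swap x for sqrt(x), 1/x, 1/sqrt(x)
summary(glejser)$coefficients["x", ]   # a significant slope suggests heteroscedasticity
```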

Breusch-Pagan Test
The success of the Goldfeld-Quandt test depends both on the value of c (the number of central observations to be omitted) and on identifying the correct X variable with which to order the observations. The Breusch-Pagan test, by contrast, tests the null hypothesis that the residuals' variances are unrelated to a set of explanatory variables, against the alternative hypothesis that the residuals' variances are a parametric function of the predictor variables (Andreas G., 2016). We can do away with those limitations by considering this Breusch-Pagan-Godfrey (BPG) test.
The test can be illustrated by considering the k-variable linear regression model

$Y_i = \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$   (3.9)

Assume that the error variance $\sigma_i^2$ is described as

$\sigma_i^2 = f(\alpha_1 + \alpha_2 Z_{2i} + \ldots + \alpha_m Z_{mi})$,

that is, $\sigma_i^2$ is a linear function of the Z's. If $\alpha_2 = \alpha_3 = \ldots = \alpha_m = 0$, then $\sigma_i^2 = \alpha_1$, which is a constant. Therefore, to test whether $\sigma_i^2$ is homoscedastic, one can test the hypothesis that $\alpha_2 = \alpha_3 = \ldots = \alpha_m = 0$. This is the idea behind the Breusch-Pagan test. The actual test procedure is as follows.
Step 1: Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$ by ordinary least squares and obtain the residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$.
Step 2: Find $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$. (It should be noted that the OLS estimator of $\sigma^2$ is $\sum \hat{u}_i^2 / (n-k)$.)
Step 3: Construct variables $p_i$ defined as $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$, which is simply each squared residual divided by $\tilde{\sigma}^2$.
Step 4: Regress the $p_i$ thus constructed on the Z's as $p_i = \alpha_1 + \alpha_2 Z_{2i} + \ldots + \alpha_m Z_{mi} + v_i$, where $v_i$ is the residual term of this regression.
Step 5: Obtain the explained sum of squares (ESS) from the regression in step 4 and define

$\Theta = \frac{1}{2}(ESS)$

Assuming the $u_i$ are normally distributed, one can show that, under homoscedasticity and as the sample size n increases indefinitely,

$\Theta \sim \chi^2_{m-1}$,

that is, $\Theta$ follows the chi-square distribution with (m − 1) degrees of freedom. Therefore, if in an application the computed $\Theta$ exceeds the critical $\chi^2$ value at the chosen level of significance, one can reject the hypothesis of homoscedasticity; otherwise, one does not reject it.
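The study states that its analyses were run in R with the lmtest package; a hedged sketch on hypothetical data is shown below, using lmtest's bptest() (with studentize = FALSE for the original BPG statistic) and, for comparison, a manual construction of the Θ statistic following steps 1-5.

```r
# Sketch of the Breusch-Pagan test on hypothetical data.
library(lmtest)

set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.8 * x)
fit <- lm(y ~ x)

bptest(fit, studentize = FALSE)   # packaged Breusch-Pagan-Godfrey test

# Manual construction following steps 1-5, with Z = x:
u2    <- residuals(fit)^2
sig2  <- sum(u2) / length(u2)             # step 2: ML estimate of sigma^2
p     <- u2 / sig2                        # step 3
aux   <- lm(p ~ x)                        # step 4
ESS   <- sum((fitted(aux) - mean(p))^2)   # step 5: explained sum of squares
theta <- ESS / 2
pchisq(theta, df = 1, lower.tail = FALSE) # chi-square with m - 1 = 1 degree of freedom
```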
White Test
Recall that the Goldfeld-Quandt test requires reordering the observations with respect to the X variable that supposedly brings about the heteroscedasticity, and that the BPG test is sensitive to the normality assumption.


The general test of homoscedasticity proposed by White does not rely on the normality assumption and is easy to carry out. Consider the three-variable regression model

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$   (3.10)

The White test proceeds as follows:
Step 1: Given the data, we estimate equation 3.10 and obtain the residuals $\hat{u}_i$.
Step 2: We then run the following auxiliary regression:

$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$

That is, the squared residuals from the original regression are regressed on the original X variables (regressors), their squared values and the cross product(s) of the regressors. Higher powers of the regressors can also be introduced. Note that there is a constant term in this equation even though the original regression may or may not contain one. Obtain the $R^2$ from this (auxiliary) regression.
Step 3: Under the null hypothesis that there is no heteroscedasticity, it can be shown that the sample size (n) times the $R^2$ obtained from the auxiliary regression asymptotically follows the chi-square distribution with degrees of freedom equal to the number of regressors (excluding the constant term) in the auxiliary regression. That is,

$n \cdot R^2 \sim \chi^2_{df}$

where df is as defined previously.
Step 4: If the chi-square value obtained from $n \cdot R^2$ is greater than the critical chi-square value at the chosen level of significance, it can be concluded that there is heteroscedasticity. If it is not greater than the critical chi-square value, there is no heteroscedasticity, which is to say that, in the auxiliary regression for the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$.
If a model has several regressors, then introducing all the regressors, their squared (or higher-powered) terms and their cross products can quickly consume degrees of freedom, so one must use caution when applying the test. The White procedure can be a test of heteroscedasticity, of specification error, or of both: it has been shown that if no cross-product terms are present in the White test procedure, then it is a test of pure heteroscedasticity, while if cross-product terms are present, it is a test of both heteroscedasticity and specification bias.
The White test has the following features:
1. It will not force you to specify a model of the structure of the heteroscedasticity, if it is present.
2. It does not rely on the assumption that the errors are normally distributed.
3. It specifically tests whether the presence of heteroscedasticity causes the ordinary least squares formulae for the variances and covariances of the estimates to be inconsistent.
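A sketch of the White procedure in R on hypothetical two-regressor data (not the paper's script): the auxiliary regression and the n·R² statistic can be computed by hand, and lmtest's bptest() reproduces the same test when the auxiliary terms are passed explicitly.

```r
# Sketch of the White test for a model with regressors x2 and x3 (hypothetical data).
library(lmtest)

set.seed(1)
x2 <- runif(100, 1, 10)
x3 <- runif(100, 1, 10)
y  <- 1 + 2 * x2 + 3 * x3 + rnorm(100, sd = 0.5 * x2)
fit <- lm(y ~ x2 + x3)

# Auxiliary regression: squared residuals on the regressors, their squares
# and their cross product; then n * R^2 is compared with a chi-square.
u2   <- residuals(fit)^2
aux  <- lm(u2 ~ x2 + x3 + I(x2^2) + I(x3^2) + I(x2 * x3))
stat <- length(y) * summary(aux)$r.squared
pchisq(stat, df = 5, lower.tail = FALSE)   # 5 regressors in the auxiliary model

# Equivalent packaged form: bptest() with the same auxiliary terms.
bptest(fit, ~ x2 + x3 + I(x2^2) + I(x3^2) + I(x2 * x3))
```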
Goldfeld-Quandt Test:
This test is used specifically when it is assumed that the heteroscedastic variance $\sigma_i^2$ is positively related to one of the explanatory variables in the regression model. For clarity, let us consider the two-variable model

$Y_i = \beta_1 + \beta_2 X_i + u_i$   (3.11)

Suppose $\sigma_i^2$ is positively related to $X_i$ as

$\sigma_i^2 = \sigma^2 X_i^2$,

where $\sigma^2$ is a constant; this says that $\sigma_i^2$ is proportional to the square of the X variable. If this assumption is appropriate, it would mean that $\sigma_i^2$ is larger the larger the values of $X_i$, and in that case heteroscedasticity is most likely to be present in the model. To verify this assertion, Goldfeld and Quandt suggest the following steps:
Step 1: Rank the observations according to the values of $X_i$, beginning with the lowest X value.
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups, each of (n − c)/2 observations.
Step 3: Fit separate ordinary least squares regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and find the respective residual sums of squares RSS1 and RSS2, RSS1 representing the RSS from the regression corresponding to the smaller $X_i$ values (the small-variance group) and RSS2 that from the larger $X_i$ values (the large-variance group). Each of these RSS has

$\frac{n-c}{2} - k$  or  $\frac{n - c - 2k}{2}$  degrees of freedom,

where k is the number of parameters to be estimated, including the intercept.
Step 4: Compute the ratio

$\lambda = \frac{RSS_2 / df}{RSS_1 / df}$

If the $u_i$ are assumed to be normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution with numerator and denominator degrees of freedom each equal to $(n - c - 2k)/2$.

If, in an application, the computed $\lambda$ exceeds the critical F value at the chosen level of significance, we can reject the hypothesis of homoscedasticity, meaning that heteroscedasticity is more likely to be present.
It should be noted that the c central observations are omitted to sharpen the difference between the small-variance group (RSS1) and the large-variance group (RSS2), but the ability of the Goldfeld-Quandt test to do this successfully depends on how c is chosen. Practically, the power of the test depends on how c is chosen; in statistics, the power of a test is measured by the probability of rejecting the null hypothesis when it is false [i.e. by 1 − Prob(type II error)].
It may also be noted that if there is more than one X variable in the model, the ranking of observations (the first step in the test) can be done according to any one of them. Thus, in the model

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$,

the ranking may be based on any one of the X variables. The test is applicable for large samples, and it assumes that the observations are at least twice as many as the parameters to be estimated. The test also assumes normality and serially independent error terms. It compares the variance of the error terms across discrete subgroups [5].
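The lmtest package used in the study also provides a ready-made Goldfeld-Quandt test, gqtest(); the sketch below applies it to hypothetical data, omitting roughly a fifth of the central observations.

```r
# Sketch of the Goldfeld-Quandt test on hypothetical data.
library(lmtest)

set.seed(1)
d   <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = 0.8 * d$x)
fit <- lm(y ~ x, data = d)

# Order by x, omit about 20% of the central observations, and compare the RSS of
# the two halves with an F test (alternative: variance increases with x).
gqtest(fit, order.by = ~ x, data = d, fraction = 0.2, alternative = "greater")
```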
4. ANALYSIS AND RESULTS

Homoscedastic datasets of 20, 30, 40, 50, 60, 70, 80, 90 and 100 observations were simulated, and the percentage of times that the Breusch-Pagan test, White test, Goldfeld-Quandt test, Park test and Glejser test rejected the null hypothesis was noted at alpha = 0.05; the test with the highest percentage indicates the best test for detecting homogeneity.
In the same vein, heteroscedastic datasets of 20, 30, 40, 50, 60, 70, 80, 90 and 100 observations were simulated, and the number of times that the Breusch-Pagan test, White test, Goldfeld-Quandt test, Park test and Glejser test rejected the null hypothesis was noted at alpha = 0.05 for the low, mild and high levels of heteroscedasticity. The test with the highest percentage indicates the best test among the selected tests. The results obtained from the simulation study, expressed in percentages, are shown in Tables 4.1-4.4 below.
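The paper does not reproduce its simulation script. The R sketch below illustrates the kind of Monte Carlo loop described above, under an assumed data-generating process in which the error standard deviation grows with x at rate sigma (sigma = 0 corresponding to the homoscedastic case); it estimates the rejection percentage of one test (here the Breusch-Pagan test) over 1000 replications.

```r
# Sketch of the Monte Carlo loop: percentage of rejections at alpha = 0.05.
# The data-generating process is an assumption for illustration only.
library(lmtest)

rejection_rate <- function(n, sigma, reps = 1000, alpha = 0.05) {
  rejections <- 0
  for (r in seq_len(reps)) {
    x <- runif(n, 1, 10)
    y <- 2 + 3 * x + rnorm(n, sd = 1 + sigma * x)  # sigma = 0 gives homoscedastic errors
    if (bptest(lm(y ~ x))$p.value < alpha) rejections <- rejections + 1
  }
  100 * rejections / reps          # percentage of rejections over the trials
}

# Empirical type I error (homoscedastic) and power (heteroscedastic) at n = 50:
rejection_rate(50, sigma = 0)      # should be close to 5%
rejection_rate(50, sigma = 0.5)    # a low heteroscedasticity level
```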

Table 4.1: Empirical type I error at alpha = 0.05
Sample size | Breusch-Pagan | White | Goldfeld | Park | Glejser
20  | 4.2 | 3.1 | 4.3 | 5.3 | 7.1
30  | 5.1 | 5.7 | 5.7 | 4.6 | 8.8
40  | 4.6 | 5.5 | 5.2 | 5.7 | 8.5
50  | 6.1 | 4.7 | 4.7 | 4.8 | 10.0
60  | 4.4 | 4.6 | 4.8 | 4.9 | 8.6
70  | 4.0 | 5.2 | 3.8 | 4.9 | 9.6
80  | 4.4 | 4.2 | 5.3 | 3.5 | 8.7
90  | 5.5 | 5.4 | 5.4 | 4.5 | 9.8
100 | 5.4 | 4.3 | 4.8 | 5.9 | 10.5

Figure 4.1: Empirical type I error at alpha = 0.05

Table 4.1 and Figure 4.1 show the empirical type I error rate for the five tests. The scenario corresponds to the case where a homoscedastic model was simulated and each test was carried out on the model residuals. At each sample size, each test was repeated 1000 times and the percentage of rejections was computed over the 1000 trials. On average, the Breusch-Pagan test, White test, Goldfeld-Quandt test and Park test yield an empirical type I error rate of about 5%, which shows that these tests are valid, since the values obtained are close to the imposed 5% level. However, for the Glejser test, the average empirical type I error rate rejects more than expected. This suggests that care should be taken while using the Glejser test, as a homoscedastic scenario may be deemed heteroscedastic.

Table 4.2: Empirical power at low heteroscedasticity level at alpha = 0.05
Sample size | Breusch-Pagan | White | Goldfeld | Park | Glejser
20  | 22.4 | 14.2 | 5.7 | 19.5 | 28.9
30  | 37.8 | 26.5 | 5.9 | 30.1 | 49.6
40  | 50.1 | 37.2 | 5.8 | 40.7 | 62.2
50  | 61.9 | 51.2 | 6.1 | 47.6 | 75.5
60  | 71.9 | 63.1 | 7.2 | 58.2 | 85.4
70  | 78.3 | 73.2 | 5.6 | 65.9 | 91.7
80  | 85.0 | 82.0 | 6.7 | 71.7 | 95.7
90  | 89.9 | 88.2 | 6.6 | 80.0 | 96.9
100 | 93.6 | 93.1 | 4.5 | 83.4 | 98.6

Figure 4.2: Empirical power at low heteroscedasticity level at alpha = 0.05

Table 4.2 and Figure 4.2 show the empirical power of the selected tests at the low heteroscedasticity level. Here, a model with low heteroscedasticity was simulated and each test was carried out on the model residuals. Each test was repeated 1000 times at each sample size and the percentage of rejections was computed over the 1000 trials. In the long run, the power of all the tests except the Goldfeld-Quandt test increases with the sample size, which shows that those tests are valid. The Glejser test is the best test among all because it has the highest power.

Table 4.3: Empirical power at mild heteroscedasticity level at alpha = 0.05
Sample size | Breusch-Pagan | White | Goldfeld | Park | Glejser
20  | 57.9 | 40.2 | 7.1 | 52.6 | 74.8
30  | 81.7 | 70.8 | 8.0 | 78.9 | 95.1
40  | 92.7 | 87.9 | 8.5 | 89.6 | 99.1
50  | 97.8 | 96.7 | 7.6 | 97.3 | 99.8
60  | 99.4 | 99.5 | 7.3 | 98.5 | 100.0
70  | 99.7 | 99.6 | 7.4 | 99.7 | 100.0
80  | 100.0 | 100.0 | 8.2 | 99.8 | 100.0
90  | 100.0 | 100.0 | 7.7 | 100.0 | 100.0
100 | 100.0 | 100.0 | 8.2 | 100.0 | 100.0

Figure 4.3: Empirical power at mild heteroscedasticity level at alpha = 0.05

Table 4.3 and Figure 4.3 show the empirical power of the selected tests at the mild heteroscedasticity level. The scenario corresponds to the case where a mildly heteroscedastic model was simulated and each test was carried out on the model residuals. At each sample size, each test was repeated 1000 times and the percentage of rejections was computed over the 1000 trials. All the tests except the Goldfeld-Quandt test behave similarly, with a minimum sample size of 40 required to achieve a reasonable power of at least 80%. At a sample size of 40, the most powerful of the tests is the Glejser test, and likewise it is the best test on average. However, bearing in mind that the Glejser test rejects more than expected, the best test for mild heteroscedasticity at a reasonable sample size of 40 is the Breusch-Pagan test. At sample sizes above 40, all the tests except Goldfeld-Quandt behave similarly in their respective power.

Table 4.4: Empirical power at high heteroscedasticity level at alpha = 0.05
Sample size | Breusch-Pagan | White | Goldfeld | Park | Glejser
20  | 93.6 | 80.8 | 9.7 | 90.6 | 99.3
30  | 99.7 | 97.5 | 9.3 | 98.5 | 100.0
40  | 100.0 | 99.6 | 9.4 | 99.6 | 100.0
50  | 100.0 | 99.9 | 12.1 | 99.9 | 100.0
60  | 100.0 | 100.0 | 9.9 | 100.0 | 100.0
70  | 100.0 | 100.0 | 10.6 | 100.0 | 100.0
80  | 100.0 | 100.0 | 12.6 | 100.0 | 100.0
90  | 100.0 | 100.0 | 11.2 | 100.0 | 100.0
100 | 100.0 | 100.0 | 13.7 | 100.0 | 100.0

Figure 4.4: Empirical power at high heteroscedasticity level at alpha = 0.05

Table 4.4 and Figure 4.4 show the empirical power at the high heteroscedasticity level at alpha = 0.05. A model with high heteroscedasticity was simulated and each test was carried out on the model residuals. At each sample size, each test was repeated 1000 times and the percentage of rejections was computed over the 1000 trials. The power values of all the tests except the Goldfeld-Quandt test are considered adequate because they are above 80%, meaning there is at least an 80% chance of detecting a difference between the population mean and the target when a difference actually exists. The highest power (sensitivity) of the Glejser test makes it the best.

5. SUMMARY OF RESULTS

For Table 4.1, the focus was to assess the validity of the tests by comparing the empirical type I error with the imposed 0.05 level; the results show that the best test is the test with the highest power.
For Table 4.2, the power at the low heteroscedasticity level was compared, and it was found that, across the various sample sizes, the best test is the test with the highest power.
For Table 4.3, the power at the mild heteroscedasticity level was compared, and it was found that, across the various sample sizes, the best test is the test with the highest power.

For Table 4.4, the power at the high heteroscedasticity level was compared, and it was found that, across the various sample sizes, the best test is the test with the highest power.

CONCLUSION

This research work focused on the use of five selected tests, namely the Breusch-Pagan, White, Goldfeld-Quandt, Park and Glejser tests, for detecting heteroscedasticity in simulated datasets with sample sizes ranging over 20, 30, 40, 50, 60, 70, 80, 90 and 100. The datasets were simulated at all levels of heteroscedasticity: low when sigma was set at 0.5, mild when sigma was set at 1.0, and high when sigma was set at 2.0. All the tests were run on the simulated datasets at the 5% significance threshold. The results show that the Glejser test has the highest tendency to detect heteroscedasticity at all levels, while the Goldfeld-Quandt test is the weakest of the selected tests as far as the simulated datasets are concerned.

REFERENCES

[1] Andreas, G., Carla, G., Rebecca, D., Buchner, S., Diestel, K., Box, G. E. P. & Cox, D. R. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.
[2] Carroll, R. J. & Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall.
[3] Cook, R. D. & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.
[4] Glejser, H. (1969). A New Test for Heteroscedasticity. Journal of the American Statistical Association, 64(235), 315-323.
[5] Goldfeld, S. M. & Quandt, R. E. (1965). Some Tests of Homoscedasticity. Journal of the American Statistical Association, 60(310), 539-547.
[6] Gujarati, D. N. & Porter, D. C. (2004). Basic Econometrics, 4th Edn. McGraw-Hill Companies Inc., New York, 387-418.
[7] Hill, R. C., Griffiths, W. E. & Lim, G. C. (2007). Principles of Econometrics. John Wiley & Sons Inc., New Jersey.
[8] Park, R. E. (1966). Estimation with Heteroscedastic Error Terms. Econometrica, 34(4), 888.
[9] SelvaPrabhakara (2016). Regression Models in R.
[10] Wooldridge, J. M. (2015). Introductory Econometrics: A Modern Approach. Nelson Education.


