
A Critical Examination of Orthogonal Regression

Gishan Dissanaike a, Shiyun Wang b, *


a University of Cambridge
b University of Sheffield and University of Cambridge

ABSTRACT

The method of orthogonal regression has a long and distinguished history in statistics and
economics. It has been viewed as superior to ordinary least squares in certain situations.
However, our theoretical and empirical study shows that this method is flawed in that it
implicitly assumes equations without the error term. A direct result is that it over-
optimistically estimates the slope coefficient. It also cannot be applied to testing if there is
an equal proportionate relationship between two variables, a case where orthogonal
regression has been frequently used in previous research. We offer an alternative adjusted
orthogonal estimator and show that it performs better than all the previous orthogonal
regression models and, in most cases, better than ordinary least squares.

Keywords: Orthogonal Regression, Errors-in-Variables, OLS.


JEL classification: C20, C52.

*University of Sheffield Management School, 9 Mappin Street, Sheffield S1 4DT, United Kingdom. Email:
s.wang@sheffield.ac.uk. Phone: +44 (0) 114 2223455. Fax: +44 (0) 114 222 3348. The authors are grateful
for comments and advice from Ian Garrett, Phil Holmes, Paul Kattuman, Richard Woodhouse, Eden Yin and
participants at various seminars and conferences. The usual disclaimer applies. Dissanaike acknowledges
support from the Economic and Social Research Council (Grant No. H53627503496), and the Institute of
Chartered Accountants of England & Wales. Wang is grateful for support from the Charterhouse Trust.
A Critical Examination of Orthogonal Regression

INTRODUCTION

The method of Orthogonal Regression has a long and distinguished history in statistics and

economics. The method, which involves minimising the perpendicular distance between

the observations and the fitted line, has been viewed as superior to Ordinary Least Squares

in two different contexts.1 First, it has often been advocated when the independent and

dependent variables in a two-variable linear regression cannot be pre-determined. This is

because the minimising of perpendicular distance does not depend on a specific axis (see,

e.g., Smyth, Boyes and Peseau (1975), Shalit and Sankar (1977), Reza (1978), and Jackson

and Dunlevy (1988)).2 Second, the method has also been extensively used when there are

errors in the independent variables and, for this reason, is sometimes called the errors-in-

variables model (see Fuller (1987)).

Depending on how the estimators are derived, the method of orthogonal regression

has also been accorded other names, although the estimators are virtually identical. It is

called orthogonal distance regression (see Boggs and Spiegelman (1988)), the generalized

least squares estimator (see Anderson (1984)), moment estimator (see Fuller (1987)), or

maximum likelihood estimator (see Brown (1982)), etc. However, the phrase “orthogonal

[Footnote 1: When the method of Ordinary Least Squares is used, the coefficient estimates are obtained by minimising the vertical distance between the observations and the fitted line (the direct regression), or by minimising the horizontal distance between the observations and the fitted line (the reverse regression).]
regression” has been the one most frequently used in different econometric textbooks (see

Malinvaud (1970), Kmenta (1997), Kennedy (1998), and Maddala (2001)), or in journal

articles (see Anderson (1984), Carroll and Ruppert (1996) and Liao (2002)).3 For

consistency, throughout this paper, we use the term “orthogonal regression” to refer to the

idea of getting the slope estimator by minimising the weighted squared distance (deviation)

between the observations and the fitted line.

The orthogonal regression method has long been of interest to researchers in

different areas since Adcock (1878). Early studies focused on the derivation of the

orthogonal regression, often in different ways. The contributors included Adcock (1878),

Kummell (1879), Pearson (1901), Dent (1935), Koopmans (1937), Allen (1939), Tintner

(1945), Lindley (1947), and Madansky (1959). As stated by Anderson (1984), “the method

of orthogonal regression was discovered and rediscovered many times, often

independently.” Recent studies tend to examine the usefulness and limitations of this

methodology. Boggs and Spiegelman (1988) compare OLS to orthogonal regression for

fitting both linear and non-linear models when there are errors in the independent

variables. They conclude that orthogonal regression never performs appreciably worse than

OLS and often performs considerably better. Carroll and Ruppert (1996) also discuss the

use of orthogonal regression in the errors-in-variables context, and argue that because of

[Footnote 2: For example, when studying the interchangeability between two size measures (such as assets and employment), theory does not specify which size measure should be the dependent variable and which should be the independent variable.]

[Footnote 3: The listed authors here all deal with the same estimator as that in this study. It should be noted that the term "orthogonal regression" has sometimes been used in a different context. For example, Greene (2000, 1997) describes orthogonal regression as follows. "If the variables in a multiple regression are not correlated (i.e. are orthogonal), then the multiple regression slopes are the same as the slopes in the individual simple regression." Greene's use of the term "orthogonal regression" is different to that used in this paper.]
the failure to account for equation errors, orthogonal regression is often misused in errors-

in-variables regression. However, Carroll and Ruppert (1996) do not try to derive the

correct model to take account of equation errors. It is one of our purposes in this paper to

derive the correct version of the orthogonal regression estimator to acknowledge the

existence of equation errors. In the calibration literature, Liao (2002) studies a class of

weighted least-squares estimators that includes the classical OLS regression, the reverse

regression and the orthogonal regression approaches. By choosing the optimal weight

between the vertical and the horizontal distances, Liao concludes that the weighted least-

squares estimator (i.e. the orthogonal regression estimator according to our definition)

performs better than the classical and reverse regression estimators.

This article provides both a theoretical and empirical examination of the orthogonal

regression method both when there are no errors in the variables and when there are. We

argue that the orthogonal regression method is flawed in that it implicitly assumes

equations without the error term. A direct result is that it over-optimistically estimates the

slope coefficient and cannot be applied to testing whether there is an equal proportionate

relationship between two variables, a case where orthogonal regression has been frequently

used in previous research. Further, because the orthogonal estimator is a function of the weighted difference of the variances of the variables, whenever the weighted variances of the two variables are equal the estimator will equal 1 in the case of no errors in the variables, and will equal the square root of the measurement errors' variance ratio when there are errors in the variables. As the variances of the variables can be manipulated by re-scaling, one will almost always be able to obtain unit elasticity between the variables when there are no errors in variables.4 Therefore, a unit slope

thus obtained is not reliable. We therefore develop an adjusted orthogonal estimator and it

[Footnote 4: Similarly, when there are errors in the variables, re-scaling will alter the slope estimate.]
is shown in this paper that after adjusting for the equation error, our model is an

asymptotically consistent estimator of the true slope. In the simulation tests, this adjusted

estimator performs better than all the other previous orthogonal models and in most cases

better than the ordinary least squares model in obtaining an unbiased slope coefficient.

This paper is organised as follows. We describe the method of orthogonal

regression in the next section, and discuss its theoretical flaw in Section II. We propose the

new, adjusted orthogonal estimators in section III. Section IV categorises the various

orthogonal models according to whether there are measurement errors and/or equation

errors, and section V compares them using simulation tests. Section VI concludes the

paper.

I. Orthogonal Regression

Assume that two variables, Y and X, are theoretically linearly related. That is,

Y = α + β X + u, (1)

where α is the intercept, β is the slope and u is the equation error with zero mean and i.i.d.

normal property. The most commonly used way to estimate β is ordinary least

squares which minimises the vertical distance between the observations and the fitted line.

However, the OLS methodology will not be valid when the dependent and independent

variables cannot be pre-determined, or when there are measurement errors in variables. In

such cases the orthogonal regression method is thought to be more applicable. The

orthogonal regression estimators are obtained by minimising directly the distance between

the observations and the fitted line, and are (see Appendix I for the derivation):

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \qquad (2)$$

$$\hat{\beta} = \frac{\sigma_Y^2 - \sigma_X^2 + \sqrt{(\sigma_Y^2 - \sigma_X^2)^2 + 4\sigma_{XY}^2}}{2\sigma_{XY}} \qquad (3)$$

where $\sigma_{XY}$ is the covariance between X and Y, and $\sigma_X^2$, $\sigma_Y^2$, $\sigma_u^2$ are the variances of X, Y

and u respectively. Since the estimators of the orthogonal regression are obtained by

minimizing the perpendicular distance between the observations and the fitted line, the

orthogonal regression is especially applicable to the case when the independent and

dependent variables (the regressor and the regressand) cannot be pre-determined.
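To make the estimator concrete, the following Python sketch (our own illustration; the function name is ours, not the authors') computes the classical orthogonal intercept and slope of equations (2) and (3) from sample moments:

```python
import numpy as np

def orthogonal_regression(x, y):
    """Classical orthogonal regression, equations (2) and (3).

    Returns (alpha_hat, beta_hat) obtained by minimising the perpendicular
    distance between the observations and the fitted line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x, var_y = np.var(x), np.var(y)
    cov_xy = np.cov(x, y, bias=True)[0, 1]      # slope is undefined if cov_xy == 0
    # Equation (3): the root whose sign matches the covariance.
    beta = (var_y - var_x + np.sqrt((var_y - var_x) ** 2 + 4 * cov_xy ** 2)) / (2 * cov_xy)
    alpha = y.mean() - beta * x.mean()          # equation (2)
    return alpha, beta
```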

Orthogonal regression has also been advocated when there are errors in the

variables. Assume that x and y are two variables measured with errors. That is,

$$x = X + \varepsilon_x, \qquad (4)$$

$$y = Y + \varepsilon_y, \qquad (5)$$

where $\varepsilon_x$ and $\varepsilon_y$ are measurement errors for x and y respectively. The orthogonal regression slope estimator when there are errors in the variables is (see Appendix II for the derivation):5

$$\hat{\beta} = \frac{s_y^2 - \lambda s_x^2 + \sqrt{(s_y^2 - \lambda s_x^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}, \qquad (6)$$

where $s_{xy}$ is the sample covariance between x and y, and $s_x^2$, $s_y^2$ are the sample variances of x and y, respectively.

[Footnote 5: This estimator can be derived in different ways and therefore has been accorded various names. For example, it is called orthogonal distance regression (see Boggs and Spiegelman, 1988), generalized least squares estimator (see Anderson, 1984), moment estimator (see Fuller, 1987), or maximum likelihood estimator (see Brown, 1982).]
II. The problem with using the orthogonal regression method

In equation (3), it is clear that, whenever $\sigma_x^2 = \sigma_y^2$, the orthogonal slope estimator will

always be equal to 1. Therefore, whenever the variances of the two variables are close to

each other, no matter whether the two variables are really related, the orthogonal

regression method will always render a significant and close to unit relationship between

the two variables. The following example illustrates this deficiency.

Assume that there are two variables x and y. x has three observations taking the

values 2, 7, and 9. y has exactly the same values as x, but in a different order, say, 7, 2, and

9. We then have:

            Observations            Variance    Correlation      Orthogonal
            A      B      C                     between x, y     estimate of slope
x           2      7      9         8.67        0.04             1
y           7      2      9         8.67

The correlation coefficient of 0.04 is not significantly different from 0. However, when we

run the orthogonal regression, we find unit elasticity between the two variables ($\hat{\beta} = 1$), and

the unit elasticity hypothesis cannot be rejected at any conventional level of significance!

This is clearly a deficiency that has not been recorded in the literature before. However,

orthogonal regression has been frequently used in previous studies to test for unit elasticity

between two variables (see e.g., Smyth et al. (1975), Shalit and Sankar (1977), Nguyen

(1995), and Clark et al. (1999)).
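The three-observation example can be verified numerically. The snippet below (our own check, reusing the orthogonal_regression sketch given earlier) reproduces the equal variances, the near-zero correlation and the unit orthogonal slope:

```python
import numpy as np

x = np.array([2.0, 7.0, 9.0])
y = np.array([7.0, 2.0, 9.0])            # same values as x, in a different order

print(np.var(x), np.var(y))              # both population variances are about 8.67
print(np.corrcoef(x, y)[0, 1])           # about 0.04: essentially uncorrelated
print(orthogonal_regression(x, y)[1])    # orthogonal slope estimate is exactly 1
```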

Likewise, in equation (6) when there are measurement errors in the variables, the estimator will be equal to $\sqrt{\lambda}$ whenever the two variables' variance ratio equals the variance ratio of the measurement errors. Also, it suffers from the problem that the estimated relationship depends on the measurement units used for the variables. For example, one can always find a significant $\sqrt{\lambda}$ proportionate relationship between two variables by simply changing the measurement unit of one of the variables (i.e. rescaling).

III. The adjusted classical orthogonal estimator

The problem with using orthogonal regression to test for unit elasticity between variables

(see section II) is due to the fact that it fails to account for the variation of the error term.

To derive the classical orthogonal estimator, we set $\partial U/\partial \alpha = 0$ and $\partial U/\partial \beta = 0$ (see Appendix I). But it can be shown that as long as the variance of the equation error is not zero ($\sigma_u^2 \neq 0$), $\partial U/\partial \beta$ cannot take a value of 0. To see this, rewrite (A3) as,

$$\frac{\partial U}{\partial \beta} = \frac{-2}{(1+\beta^2)^2}\sum_{i=1}^{n}\left[(1+\beta^2)u_i x_i + \beta u_i^2\right] = \frac{-2n\beta}{(1+\beta^2)^2}\sigma_u^2, \qquad (7)$$

assuming that $\mathrm{Cov}(u_i, x_i) = 0$,6 and $\sigma_u^2$ is the variance of the equation error term. Therefore, we propose an alternative adjusted orthogonal moment estimator7 (see Appendix III for the derivation):

$$\hat{\beta} = \frac{\sigma_y^2 - \sigma_x^2 - \sigma_u^2 + \sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}} \qquad (8)$$

It can be shown that this new, adjusted estimator is smaller in absolute value than

the biased classical orthogonal regression estimator described in equation (3) (see

Appendix V).

Similarly, when there are measurement errors in the variables, equation (A16), in

Appendix II, can be re-written as,

$$\beta^2 s_{xy} - \beta(s_y^2 - \lambda s_x^2 - \sigma_u^2) - \lambda s_{xy} = 0. \qquad (9)$$

Solving (9), we obtain a new, adjusted orthogonal regression estimator for the case

of errors in variables, (see Appendix IV for the derivation):

$$\hat{\beta} = \frac{s_y^2 - \lambda s_x^2 - \sigma_u^2 + \sqrt{(s_y^2 - \lambda s_x^2 - \sigma_u^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}. \qquad (10)$$

It can be shown that this adjusted estimator is also smaller in absolute value than

the biased estimator when the equation error is omitted (proof is the same as that in

Appendix V).
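A minimal sketch of the adjusted estimators in equations (8) and (10) follows (our own illustration; it assumes the user supplies the equation error variance σ_u² and, for the errors-in-variables case, the ratio λ; setting σ_u² = 0 in the second function recovers the traditional estimator in equation (6)):

```python
import numpy as np

def adjusted_orthogonal_slope(x, y, sigma_u2):
    """Adjusted orthogonal slope, equation (8): equation error, no measurement error in x."""
    var_x, var_y = np.var(x), np.var(y)
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    a = var_y - var_x - sigma_u2
    return (a + np.sqrt(a ** 2 + 4 * cov_xy ** 2)) / (2 * cov_xy)

def adjusted_eiv_orthogonal_slope(x, y, lam, sigma_u2):
    """Adjusted errors-in-variables orthogonal slope, equation (10).

    lam is the measurement error variance ratio sigma_ey^2 / sigma_ex^2;
    with sigma_u2 = 0 this reduces to equation (6).
    """
    var_x, var_y = np.var(x), np.var(y)
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    a = var_y - lam * var_x - sigma_u2
    return (a + np.sqrt(a ** 2 + 4 * lam * cov_xy ** 2)) / (2 * cov_xy)
```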

[Footnote 6: Although the perpendicular error is dependent on both X and Y, the vertical error and the horizontal error are only dependent on either Y or X, depending on whether the equation is presented in the direct or reverse order. Therefore, the vertical error $u_i$ is asymptotically uncorrelated with $x_i$.]

[Footnote 7: This is not an orthogonal least squares estimator, since $\partial U/\partial \beta \neq 0$.]
Our adjusted orthogonal estimators are asymptotically unbiased estimates of the

true slope. To see this, apply equations (A12), (A13), (A15) and (A19) in the appendices to

(10). We then have,

$$\hat{\beta} = \frac{(\beta^2\sigma_X^2 + \sigma_{\varepsilon_y}^2 + \sigma_u^2) - \lambda(\sigma_X^2 + \sigma_{\varepsilon_x}^2) - \sigma_u^2 + \sqrt{\left[(\beta^2\sigma_X^2 + \sigma_{\varepsilon_y}^2 + \sigma_u^2) - \lambda(\sigma_X^2 + \sigma_{\varepsilon_x}^2) - \sigma_u^2\right]^2 + 4\lambda\beta^2\sigma_X^4}}{2\beta\sigma_X^2}$$
$$= \frac{(\beta^2 - \lambda)\sigma_X^2 + \sqrt{\left[(\beta^2 - \lambda)\sigma_X^2\right]^2 + 4\lambda\beta^2\sigma_X^4}}{2\beta\sigma_X^2} = \frac{(\beta^2 - \lambda) + \sqrt{(\beta^2 + \lambda)^2}}{2\beta} = \beta. \qquad (11)$$

It is obvious that when there are no errors in the variables (which corresponds to

the case when λ = 1), our adjusted orthogonal estimator is also an unbiased slope

estimator, as is the OLS estimator. However, the traditional orthogonal regression

estimator (see equation 3) overestimates the true slope.

In the case of errors in the variables, it is well known that the OLS slope estimator

underestimates the true slope in absolute terms, that is,

$$\hat{\beta}_{OLS} = \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{\sigma_X^2 + \sigma_{\varepsilon_x}^2} < \frac{s_{xy}}{\sigma_X^2} = \beta, \quad \text{when } \beta > 0,$$
$$\hat{\beta}_{OLS} > \frac{s_{xy}}{\sigma_X^2} = \beta, \quad \text{when } \beta < 0. \qquad (12)$$

It is therefore interesting to note that in the case of errors in the variables, while the

OLS slope estimator underestimates the slope, the traditional orthogonal regression

estimator overestimates it. Only our new adjusted orthogonal model provides an unbiased

slope estimator.

IV. THE USE OF ORTHOGONAL REGRESSION

Based on the properties of the data, specific models of orthogonal regression might be

chosen according to whether there are measurement errors and/or equation errors. We

describe the various permutations in this section:

Model 1: Classical orthogonal regression (OR1), defined as in equation (3). This

estimator assumes no measurement errors in the independent variables as well as no

equation errors. This model only takes into account the explained sum of squares and

therefore over-optimistically estimates the slope. Theoretically, as the assumptions are

rarely met in reality, this model is in general not appropriate to use. However, this model

has certain advantages. It is easy to use since it does not require prior knowledge of the

unobserved measurement errors and the unobserved equation errors that are difficult to

obtain in practice. Moreover, unlike OLS which only minimises the vertical or the

horizontal errors, this method minimises both the vertical and the horizontal errors. As a

result, in cases when there are measurement errors in both dependent and independent

variables, the classical orthogonal regression model (OR1) would seem superior to the

OLS, especially when the variance ratio of the measurement errors is close to 1, or when

the equation error’s variance is considered not too large.

Model 2: This is our new adjusted orthogonal regression method (OR2), defined as

in equation (8). The assumptions made when using this model are that there are equation

errors but no measurement error in the independent variable. The use of this method is

appropriate when there is either no measurement error in the independent variable or when

there are measurement errors in both variables but the variance ratio of the measurement

errors is equal to 1. In practice, since $\sigma_u^2$ is unknown, we use OR1 to estimate $\sigma_u^2$, and then run OR2. We call this the two-step adjusted orthogonal regression, denoted OR2(1).
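A sketch of this two-step procedure is given below. The paper does not spell out how $\sigma_u^2$ is extracted from the first-step fit, so the residual-based estimate used here is our assumption; the function reuses the orthogonal_regression and adjusted_orthogonal_slope sketches from earlier sections.

```python
import numpy as np

def two_step_adjusted_slope(x, y):
    """Two-step adjusted orthogonal regression OR2(1), under our reading.

    Step 1: fit the classical orthogonal regression (OR1) and estimate the
    equation error variance from the vertical residuals of that fit
    (this residual-based estimate is assumed; the text only says OR1 is
    used to estimate sigma_u^2).
    Step 2: plug the estimate into the adjusted estimator OR2, equation (8).
    """
    alpha1, beta1 = orthogonal_regression(x, y)           # step 1: OR1 fit
    resid = np.asarray(y) - alpha1 - beta1 * np.asarray(x)
    sigma_u2_hat = np.var(resid)                          # assumed equation error variance estimate
    return adjusted_orthogonal_slope(x, y, sigma_u2_hat)  # step 2: OR2, equation (8)
```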

Model 3. Orthogonal regression in errors-in-variables model (OR3), defined as in

equation (6). This model assumes that there is no equation error, although it acknowledges

the existence of measurement errors. The use of this model will be appropriate, say, when

two different measurements are used to measure the same quantity. A common example of

this case is in experimental research.

Model 4. This is our new adjusted orthogonal regression method for errors-in-

variables (OR4), defined as in equation (10). This model acknowledges the existence of

measurement errors in variables and of equation errors. Therefore, theoretically this model

might be more applicable than the others in the real world, as most types of data in the

social sciences are measured with errors and their relationships rarely fall exactly on a

straight line (except when two measurements measure the same phenomenon). However, a

severe limitation in using this method is the difficulty in obtaining the measurement error

variance ratio and the variance of the equation errors. It is suggested that some additional

estimations are done to obtain replicates of observations for the purpose of estimating the

measurement error variance (see, e.g., Carroll and Ruppert (1996)). However, while

obtaining replicated observations might be possible in some areas (such as in the physical

sciences), it might not be as easy to do in non-experimental areas (such as finance or

economics) that mostly utilise historical data.

V. MONTE CARLO SIMULATION

We simulate two random variables and test their relationship using OLS and the four

different models of orthogonal regression (OR1 to OR4). We choose S&P 500 companies’

employment data during 1996 – 1999, denoted as X, as the true independent variable. It

should be pointed out that the choice of actual data for simulation purposes has no effect

on the power of the test and on the comparison of model performance.

A. The performance of classical orthogonal regression when the variances of the two

variables are equal

As we discussed in Sections II and III, the classical orthogonal estimator is

deficient in that it fails to account for the equation errors. As a result, the estimator will

always be equal to 1 as long as the variance ratio between the two variables is equal to 1.

We assume that employment, X, is the independent variable, and that the true slope, β,

takes a value between (-1, 1). For each given value of β, we generate the dependent

variable, Y, conditional on Y’s variance being equal to that of X’s. For each given β, we

run the regression of Y on X 500 times. We thus compare the estimated slopes obtained

using different regression models. See Appendix VI(A) for details of the simulation

procedures. The comparison of performance is given in Table I.

In Table I, all the slope estimates using classical orthogonal regression (OR1) are

statistically indistinguishable from 1, despite the fact that the true slope takes a value from –0.9 to

0.9. This result indicates that there are severe estimation mistakes made by OR1 whenever

the variance ratio between the dependent and independent variables is close to 1. If the

equation errors are taken into account, as shown in Table I, the adjusted orthogonal

regression (OR2) correctly estimates the true slopes. The results using OR2 are similar to

that based on OLS.

B. Comparison of estimations in errors-in-variables models

We choose the measurement error variance ratios λ= 0.5, 1, 1.5, and β = -1.5, -0.5,

0.5, 1. For each combination of (λ,β), we simulate two variables as our “observations”

with random measurement errors. We do not consider the case of equal variances of the

dependent and independent variables, as we have examined this in section V.A. For each

combination of the parameters, we run the regressions 500 times to compare the

performance of different regression models. See Appendix VI(B) for details of the

simulation.

The models we will test are listed in Table II. Using the estimators in Table II, we

consider, 1) the deviation of the slope estimator from the true slope and 2) the prediction

error.

B.1. Model performance when there are no equation errors

The variance of the equation error is assumed to be zero. The performance of the

different models is shown in Table III.

When there are no equation errors but only measurement errors in the variables, we

expect a priori that the use of the classical errors-in-variables model (OR3) will be

appropriate, and that OLS will be inferior since it does not account for the errors in the

variables. (The traditional model OR1 is a special case of OR3 when the measurement

error variance ratio λ is 1.) The above predictions are found to be valid in Table III. In terms of the slope estimations, for all combinations of λ and β, both the traditional orthogonal regression method for no errors in variables (OR1) and that for errors in variables (OR3) perform better than OLS. The best unbiased estimator is OR3.8 Regarding

the prediction errors, the OLS prediction errors are slightly smaller than those for the other

methods. However the difference between those of OLS and OR3 are very small.

Therefore, our overall assessment is that, in the presence of measurement errors, the errors-

in-variables orthogonal regression model (OR3) is the most preferred model for obtaining

an unbiased slope estimator.

B.2. Model performance when there are equation errors

We set the mean deviation due to the error term to be about 30% of the total mean

deviation of the dependent variable.9 Using the estimators in Table II, we run each

regression 500 times and compare their performance. Table IV gives the results.

Consistent with our expectation, our adjusted errors-in-variable orthogonal

regression estimator (OR4) performs very well. OR4 performs better than OR2 (which

accounts for equation errors but not errors in variables), OR3 (which accounts for errors in

variables but not equation errors), OR1 (which accounts for neither errors in variables nor

equation errors), and OLS (which does not account for errors in variables). Regarding the

prediction errors, although OLS performs slightly better than the other models, the

differences between OLS and OR4 are very small. Overall, we can confirm that the

traditional orthogonal regression estimators (OR1 and OR3) are inferior to our adjusted

[Footnote 8: Note that OR2 and OR4 are not applicable here as we are assuming that there are no equation errors.]

[Footnote 9: We also ran the simulation using other values (5%, 50%) but the conclusion remained the same. That is, the lower the variance of the dependent variable (compared to that of the independent variable), the more appropriate is the use of OR1 or OR2.]
orthogonal regression estimators (OR2 and OR4), since OR2 and OR4 are unbiased

models.

To simulate the case where the variance of the equation errors is unknown, we run

a two-step orthogonal regression. In the first step we obtain the estimated variance of the

equation error by using OR1 or OR3, and in the second step we run our adjusted

orthogonal regression OR2 or OR4 using the estimated equation error variance. We use

$\hat{\beta}_{OR2(1)}$ and $\hat{\beta}_{OR4(3)}$ to denote the estimators obtained by this two-step orthogonal regression

based on OR2 and OR4, respectively. From Table IV, it is noted that OR2(1) which is our

2-step orthogonal slope estimator using the estimated equation error variance obtained

from OR1, is one of the best estimators for slope estimation. It can be seen that in all cases

when the variance of the equation errors is unknown, OR2(1) performs much better than

all the other models including OLS. This is encouraging because, in practice, it is

hard to obtain the measurement error variance ratio and the variance of the equation errors.

Based on our simulation tests, we can see that even if one does not have prior information

about the error variances, we can still obtain a relatively precise slope estimate using

OR2(1).

C. Comparison of estimations when there are equation errors but no errors in

variables

The models we will test are listed in Table II, except that, since there are no errors

in variables, the errors-in-variables models OR3 and OR4 are not applicable. Assuming β

= -1.5, -1, -0.5, 0.5, 1, 1.5, for each model, we compare the deviation of the slope estimator

from the true slope and the prediction errors. The results are displayed in Table V.

Appendix VI(B) gives details of the simulation.

It can be seen from Table V that in all cases, our adjusted orthogonal regression

(OR2) and the OLS give better slope estimates than the traditional orthogonal regression

estimator OR1. The slope estimates obtained by OR2 are statistically similar to those

obtained by OLS. This result is reasonable since in the case of equation errors but no

measurement errors, both OLS and our adjusted OR2 are unbiased slope estimators.

A summary of all our empirical results presented in section V is given in Table VI.

VI. CONCLUSION

In this paper, we perform a thorough examination of two-variable, linear orthogonal

regression, both in the case of classical regression assuming no measurement errors and the

case of errors-in-variables models. We prove that the orthogonal regression estimators used

by previous authors are inappropriate in most cases. The problem lies in the fact that

previous orthogonal regression estimators assume an exact straight-line relationship

between the dependent and the independent variables with no equation errors. This is too

restrictive and not applicable to most practical uses. We therefore developed an adjusted

orthogonal regression estimator, taking into account the equation errors. We show that,

whether or not there are measurement errors in the variables, our adjusted orthogonal

regression estimators provide unbiased slope estimation. In the case of errors in the

variables, while the OLS slope estimator underestimates the slope in absolute terms, the

traditional orthogonal regression estimator overestimates it. Only our new adjusted

orthogonal model provides an unbiased slope estimator. Using Monte Carlo simulation, we

compared the performance of the previous versions of orthogonal estimators, the adjusted

orthogonal estimators and the simple ordinary least squares estimators. The theoretical and

simulation results show that:

1. When there are equation errors but no measurement errors in the independent

variable, the classical orthogonal regression (OR1) fails to account for the equation

errors. One implication of this is that, whenever the two variables' variances are

equal, an (incorrect) unit slope estimate will always be rendered. This means that

one can always obtain a unit slope relationship between two variables by simply re-

scaling one of the variables – i.e., using a different measurement unit. We show in

simulation tests that when the two variables’ variances are equal, no matter what

the true slope value is, the classical orthogonal regression will incorrectly render a

unit slope estimate. Our adjusted orthogonal estimator (OR2) successfully corrects

for this problem.

2. When there are equation errors and measurement errors in the independent

variable, the conventional orthogonal estimator for errors-in-variables (OR3) will

(incorrectly) produce a unit slope estimate whenever the two variables' variance ratio and the measurement errors' variance ratio are both equal to one. This is once again because the traditional 'errors-in-variables' orthogonal regression estimator

(OR3) fails to account for equation errors. In our simulation tests, when there are

errors in the variables and the variance of the equation errors is known, the best

performing model is our adjusted orthogonal regression (OR4). However, when the

variance ratio of the equation errors and variance ratio of the measurement errors

are not known, the best performing model is our two-step adjusted orthogonal

regression OR2(1).

3. When there are errors-in-variables but no equation errors, the best performing

model (in simulation tests) is the classical orthogonal regression for errors in

variables (OR3). However, for most cases in the real world, there are likely to be

equation errors. Therefore, the classical orthogonal regression estimator is unlikely

to be applicable in many practical situations.10

We conclude from our simulations that for most practical uses, that is, when there

are both measurement errors and equation errors and we have no prior information about

the error variances, our 2-step orthogonal regression method, OR2(1), can deliver a much

more precise slope estimator than the traditional orthogonal regression method (OR1 or

OR3) or OLS. We therefore recommend the use of our new adjusted orthogonal regression

model OR2(1) in such instances.

[Footnote 10: Incidentally, when there are no errors in variables and no equation errors, all the regression models are invalid because there is no error to minimise.]
References

Adcock, R. J., 1878, A Problem in Least Squares, The Analyst 5, 53-54.

Allen, R. G. G., 1939, The Assumptions of Linear Regression, Economica 6, 191-201.

Anderson, T. W., 1984, Estimating Linear Statistical Relationships, The Annals of Statistics 12, 1-45.

Boggs, P. T., and C. H. Spiegelman, 1988, A Computational Examination of Orthogonal Distance Regression, Journal of Econometrics 38, 169-201.

Brown, M. L., 1982, Robust Line Estimation With Errors in Both Variables, Journal of the American Statistical Association 77, 71-79.

Carroll, R. J., and D. Ruppert, 1996, The Use and Misuse of Orthogonal Regression in Linear Errors-in-Variables Models, The American Statistician 50, 1-6.

Clark, D. P., 1999, Regional Exchange Rate Indexes for the United States, Journal of Regional Science 39, 149-166.

Dent, B., 1935, On Observations of Points Connected by a Linear Relation, Proceedings of the Physical Society 47, 92-106.

Fuller, W. A., 1987, Measurement Error Models, Wiley, New York.

Greene, W. H., 1997, Econometric Analysis, 3rd ed., Prentice Hall.

Greene, W. H., 2000, Econometric Analysis, 4th ed., Prentice Hall.

Hart, P. E., and N. Oulton, 1996, Growth and Size of Firms, The Economic Journal 106, 1242-1252.

Jackson, J. D., and J. A. Dunlevy, 1988, Orthogonal Least Squares and the Interchangeability of Alternative Proxy Variables in the Social Sciences, Statistician 37, 7-14.

Kennedy, P., 1998, A Guide to Econometrics, 4th ed., Blackwell Publishers Ltd., Oxford.

Kmenta, J., 1997, Elements of Econometrics, 2nd ed., The University of Michigan Press, Ann Arbor.

Koopmans, T. C., 1937, Linear Regression Analysis of Economic Time Series, De Erven F. Bohn, Haarlem, The Netherlands.

Kummell, C. H., 1879, Reduction of Observed Equations Which Contain More Than One Observed Quantity, The Analyst 6, 97-105.

Leamer, E. E., 1978, Specification Searches: Ad Hoc Inference with Nonexperimental Data, John Wiley & Sons.

Liao, J. J. Z., 2002, An Insight into Linear Calibration: Univariate Case, Statistics & Probability Letters 50, 271-281.

Lindley, D. V., 1947, Regression Lines and the Linear Functional Relationship, Supplement to the Journal of the Royal Statistical Society 9, 218-244.

Madansky, A., 1959, The Fitting of Straight Lines When Both Variables Are Subject to Error, Journal of the American Statistical Association 54, 173-205.

Maddala, G. S., 2001, Introduction to Econometrics, 3rd ed., John Wiley and Sons.

Malinvaud, E., 1970, Statistical Methods of Econometrics, North-Holland, Amsterdam.

Nguyen, T. H., and J. C. Cosset, 1995, The Measurement of the Degree of Foreign Involvement, Applied Economics 27, 343-351.

Pearson, K., 1901, On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine 2, 559-572.

Prais, S. J., 1958, The Statistical Conditions for a Change in Business Concentration, The Review of Economics and Statistics 40, 268-272.

Reza, A. M., 1978, Geographical Differences in Earnings and Unemployment Rates, The Review of Economics and Statistics 60, 201-208.

Shalit, S. S., and U. Sankar, 1977, The Measurement of Firm Size, The Review of Economics and Statistics 59, 290-298.

Smyth, D. J., W. J. Boyes, and D. E. Peseau, 1975, The Measurement of Firm Size: Theory and Evidence for the United States and the United Kingdom, The Review of Economics and Statistics 57, 111-114.

Tintner, G., 1945, A Note on Rank, Multicollinearity and Multiple Regression, Annals of Mathematical Statistics 16, 304-308.
Appendix I. Derivation of classical orthogonal regression estimators

If we directly minimise the distance between the observations and the fitted line, the

orthogonal regression estimator will be obtained. In figure 1, θ is the angle that the fitted

line makes with the X-axis. We have $\beta = \tan(\theta)$. The squared distance between an observation $A(x_i, y_i)$ and the fitted line is

$$AD^2 = \left[\cos(\theta)(y_i - \alpha - \beta x_i)\right]^2 = \frac{(y_i - \alpha - \beta x_i)^2}{1 + \beta^2} \qquad (A1)$$

We now use orthogonal least squares to minimise $U = \sum \frac{(y_i - \alpha - \beta x_i)^2}{1 + \beta^2}$. Differentiating U partially with respect to α and β and setting the derivatives equal to 0 yields:

$$\frac{\partial U}{\partial \alpha} = \sum_{i=1}^{n} \frac{-2(y_i - \alpha - \beta x_i)}{1 + \beta^2} = 0, \qquad (A2)$$

$$\frac{\partial U}{\partial \beta} = \sum_{i=1}^{n} \frac{-2(1+\beta^2)(y_i - \alpha - \beta x_i)x_i - 2\beta(y_i - \alpha - \beta x_i)^2}{(1+\beta^2)^2} = 0, \qquad (A3)$$

where n is the number of observations. From equation (A2), we get,

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}, \qquad (A4)$$

where $\bar{y}$ and $\bar{x}$ are the sample means of y and x respectively. Applying (A4) to (A3) and rearranging, we get,

$$\beta^2 \sigma_{xy} - \beta(\sigma_y^2 - \sigma_x^2) - \sigma_{xy} = 0, \qquad (A5)$$

where $\sigma_x^2$ and $\sigma_y^2$ are the sample variances of x and y, and $\sigma_{xy}$ is the sample covariance

between x and y. Solving equation (A5) for β, we obtain,

$$\beta = \frac{\sigma_y^2 - \sigma_x^2 \pm \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}}. \qquad (A6)$$

Applying (A4) to (A1),

$$U = \sum_{i=1}^{n} \frac{\left[(y_i - \bar{y}) - \beta(x_i - \bar{x})\right]^2}{1 + \beta^2} = \frac{(n-1)(\sigma_y^2 + \beta^2\sigma_x^2 - 2\beta\sigma_{xy})}{1 + \beta^2} \qquad (A7)$$

To minimize U, β and σxy should take the same sign. Therefore, the numerator of the right

hand side in equation (A6) should be positive. Thus, the classical orthogonal least squares

slope estimator is,

$$\hat{\beta} = \frac{\sigma_y^2 - \sigma_x^2 + \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}} \qquad (3)$$
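As a numerical sanity check on this derivation (our own, not part of the paper), one can minimise U directly over (α, β) and compare the result with the closed-form slope in equation (3):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)

def U(params):
    """Sum of squared perpendicular distances, as in (A1)."""
    alpha, beta = params
    return np.sum((y - alpha - beta * x) ** 2) / (1 + beta ** 2)

beta_numeric = minimize(U, x0=[0.0, 1.0]).x[1]

var_x, var_y = np.var(x), np.var(y)
cov_xy = np.cov(x, y, bias=True)[0, 1]
beta_closed = (var_y - var_x + np.sqrt((var_y - var_x) ** 2 + 4 * cov_xy ** 2)) / (2 * cov_xy)

print(beta_numeric, beta_closed)   # the two slope estimates coincide
```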

Appendix II. Derivation of classical orthogonal regression estimators when there are

measurement errors in variables

Assume x and y are two variables measured with errors. That is,

$$x = X + \varepsilon_x, \qquad (4)$$

$$y = Y + \varepsilon_y, \qquad (5)$$

where $\varepsilon_x$ and $\varepsilon_y$ are measurement errors for x and y respectively. The true variables X and Y should be independent of the measurement errors $\varepsilon_x$ and $\varepsilon_y$. Assuming the true variables X and Y take the relationship,

$$Y = \alpha + \beta X + u, \qquad (A8)$$

where,

$$E(\varepsilon_x \ \varepsilon_y \ u)' = (0 \ 0 \ 0)', \qquad (A9)$$

$$\mathrm{Var\text{-}cov}(\varepsilon_x \ \varepsilon_y \ u)' = \begin{pmatrix} \sigma_{\varepsilon_x}^2 & 0 & 0 \\ 0 & \sigma_{\varepsilon_y}^2 & 0 \\ 0 & 0 & \sigma_u^2 \end{pmatrix}. \qquad (A10)$$

Traditional errors-in-variables models assume no equation error (see Fuller (1987)). That is, equation (A8) is written as,

$$Y = \alpha + \beta X. \qquad (A11)$$

From equations (4), (5) and (A11), we obtain,

$$s_x^2 = \sigma_X^2 + \sigma_{\varepsilon_x}^2, \qquad (A12)$$

$$s_y^2 = \sigma_Y^2 + \sigma_{\varepsilon_y}^2, \qquad (A13)$$

$$\sigma_Y^2 = \beta^2 \sigma_X^2, \qquad (A14)$$

$$s_{xy} = \beta \sigma_X^2, \qquad (A15)$$

where we substitute the sample moments for the population moments of the variance-covariance of $(x \ y)'$ and $\mathrm{cov}(x, y) = \mathrm{cov}(X, Y)$. Assuming the measurement error variance ratio is known, $\sigma_{\varepsilon_y}^2 / \sigma_{\varepsilon_x}^2 = \lambda$, from equations (A12) to (A15), we have,

$$\beta^2 s_{xy} - \beta(s_y^2 - \lambda s_x^2) - \lambda s_{xy} = 0. \qquad (A16)$$

Solving (A16), we obtain the orthogonal estimator when there are errors in variables11,

$$\hat{\beta} = \frac{s_y^2 - \lambda s_x^2 + \sqrt{(s_y^2 - \lambda s_x^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}, \qquad (6)$$

where β should take the same sign as $s_{xy}$, as shown in equation (A15).

Appendix III. Derivation of the adjusted orthogonal regression estimators

The estimator in equation (3) is flawed in that it is obtained by omitting the effect of

the error term. Examining (A3), we find that the least squares method utilizes the

questionable assumption that $\partial U/\partial \beta$ can take the value zero, whereas in fact the value it can take is $\frac{-2n\beta}{(1+\beta^2)^2}\sigma_u^2$. To see this,

[Footnote 11: This estimator can be derived in different ways and therefore has been accorded various names. For example, it is called orthogonal distance regression (see Boggs and Spiegelman, 1988), generalized least squares estimator (see Anderson, 1984), moment estimator (see Fuller, 1987), or maximum likelihood estimator (see Brown, 1982).]
$$\frac{\partial U}{\partial \beta} = \frac{-2}{(1+\beta^2)^2}\sum_{i=1}^{n}\left[(1+\beta^2)u_i x_i + \beta u_i^2\right] = \frac{-2n\beta}{(1+\beta^2)^2}\sigma_u^2, \qquad (A17)$$

assuming that $\mathrm{Cov}(u_i, x_i) = 0$, and $\sigma_u^2$ is the maximum likelihood estimator of the equation error variance. Therefore, equation (A5) becomes,

$$\beta^2 \sigma_{xy} - \beta(\sigma_y^2 - \sigma_x^2 - \sigma_u^2) - \sigma_{xy} = 0. \qquad (A18)$$

Solving equation (A18), we obtain the adjusted orthogonal moment estimator,

$$\hat{\beta} = \frac{\sigma_y^2 - \sigma_x^2 - \sigma_u^2 + \sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}} \qquad (8)$$

Appendix IV. Derivation of the adjusted orthogonal regression estimators when there

are errors-in-variables

When there are measurement errors in variables, we can follow equation (A8)

instead of (A11) and re-write equation (A14) as,

$$\sigma_Y^2 = \beta^2 \sigma_X^2 + \sigma_u^2, \qquad (A19)$$

and equation (A16) can be re-written as,

$$\beta^2 s_{xy} - \beta(s_y^2 - \lambda s_x^2 - \sigma_u^2) - \lambda s_{xy} = 0. \qquad (A20)$$

Solving (A20), we obtain the adjusted orthogonal moment estimator for the case

where there are errors in variables,

$$\hat{\beta} = \frac{s_y^2 - \lambda s_x^2 - \sigma_u^2 + \sqrt{(s_y^2 - \lambda s_x^2 - \sigma_u^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}. \qquad (10)$$

Appendix V. Proof that the adjusted orthogonal estimator is smaller in absolute value

than the classical orthogonal regression estimator.

The classical slope estimator of the orthogonal regression is,

$$\hat{\beta}_{OR1} = \frac{\sigma_y^2 - \sigma_x^2 + \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}} \qquad (3)$$

Our new, adjusted orthogonal estimator to account for the equation errors is,

$$\hat{\beta}_{OR2} = \frac{\sigma_y^2 - \sigma_x^2 - \sigma_u^2 + \sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}} \qquad (8)$$

We would like to prove that the absolute value of the adjusted estimator is smaller than the

unadjusted one, that is,

$$|\hat{\beta}_{OR2}| < |\hat{\beta}_{OR1}|. \qquad (A21)$$

Proving (A21) is equivalent to proving,

$$-\sigma_u^2 + \sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2} < \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}. \qquad (A22)$$

(A22) can be written as,

$$\sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2} < \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2} + \sigma_u^2. \qquad (A23)$$

Since both sides are positive, squaring both sides and rearranging we get,

$$-2\sigma_u^2(\sigma_y^2 - \sigma_x^2) < 2\sigma_u^2\sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}. \qquad (A24)$$

The right hand side of (A24) is larger because the square root on the right is at least $|\sigma_y^2 - \sigma_x^2|$, which is in turn at least $-(\sigma_y^2 - \sigma_x^2)$; hence (A21) holds.

Appendix VI. Procedures for Monte Carlo Simulations

We simulated two random variables and tested their relationship using the OLS

regression method and different models of orthogonal regressions. We choose S&P 500

companies’ employment data during 1996 – 1999, denoted as X, as the true independent

variable. The descriptive statistics of the log employment data are as follow.

Mean    Sample Variance    Minimum    Maximum    Count
9.89    1.86               5.04       13.72      1411

It should be pointed out that the choice of actual data for simulation purposes has no effect

on the power of the test and therefore we do not discuss the details of the actual data.

Appendix VI.A. The classical orthogonal regression when the variances of variables

are equal

Assume the employment data, X, are the “true” independent observations, and

assume the true slope, β, takes a series of values between (-1, 1). Without loss of

generality, we set a zero intercept. For each given β, we generate the dependent variable Y

as,

$$Y = \beta X + u, \qquad (A25)$$

where u is the error term with zero mean and i.i.d. normal property. The variance of u is,

$$\hat{\sigma}_u^2 = \sigma_x^2 - \beta^2 \sigma_x^2, \qquad (A26)$$

where σx2 is the variance of the independent variable, X. Thus the variance of Y is equal to

that of X,

$$\hat{\sigma}_y^2 = \beta^2 \sigma_x^2 + \sigma_u^2 = \sigma_x^2 \qquad (A27)$$

From equation (A26), we can see that β can only take values between (-1, 1).
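One replication of this design can be sketched as follows (our own illustration; a synthetic X stands in for the employment data so the snippet is self-contained, and the estimator functions are the sketches introduced earlier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the log employment series used as the true X.
X = rng.normal(loc=9.89, scale=np.sqrt(1.86), size=1411)

beta = 0.5                                    # true slope, must lie in (-1, 1)
var_x = np.var(X)
sigma_u2 = var_x - beta ** 2 * var_x          # equation (A26): forces var(Y) = var(X)
Y = beta * X + rng.normal(0.0, np.sqrt(sigma_u2), size=X.size)   # equation (A25)

# OR1 returns a slope close to 1 regardless of beta, while OR2 and OLS recover it.
print(orthogonal_regression(X, Y)[1])
print(adjusted_orthogonal_slope(X, Y, sigma_u2))
print(np.cov(X, Y, bias=True)[0, 1] / var_x)  # OLS slope
```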

Appendix VI.B. Comparison of estimations

Step 1: Without loss of generality, we use the following equations,

$$Y = \beta X + u, \qquad (A28)$$

$$x = X + \varepsilon_x, \qquad (A29)$$

$$y = Y + \varepsilon_y, \qquad (A30)$$

where X, Y are the true variables, x, y are the "observations", $\varepsilon_x$, $\varepsilon_y$ are the measurement errors, and u is the equation error which is normally distributed with zero mean. When there are errors in variables, $\varepsilon_x$ and $\varepsilon_y$ are normally distributed with zero mean. Let $\lambda = \sigma_{\varepsilon_y}^2/\sigma_{\varepsilon_x}^2$, $d = \sigma_{\varepsilon_x}^2/\sigma_x^2$ and $\delta = \sigma_u^2/(\sigma_x^2 + \sigma_y^2)$, where $\sigma_{\varepsilon_y}^2$, $\sigma_{\varepsilon_x}^2$, $\sigma_x^2$, $\sigma_y^2$ are the variances for

$\varepsilon_x$, $\varepsilon_y$, X and Y. Without loss of generality, we choose δ = 0 to assume no equation error and δ = 0.3 to simulate equation error, and we choose d = 0.1.12,13 When there are no errors in variables, $\sigma_{\varepsilon_y}^2 = \sigma_{\varepsilon_x}^2 = 0$ and λ does not exist.

Taking a combination of (λ, β), where λ = 0.5, 1, 1.5, and β = -1.5, -0.5, 0.5, 1.5,

we can get a set of “observations” of x, Y and y based on one series of X. When there are

no errors in variables, we only change the value of β.

Step 2: Using x, y, run all regression models as listed in Table II. For each

regression, obtain the estimators $\hat{\alpha}$ and $\hat{\beta}$. Then calculate the absolute value of, 1) $\hat{\beta} - \beta$, and 2) the average prediction error, $\hat{y}_e = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{\alpha} - \hat{\beta} x_i|$.

Step 3: Fixing the combination of (λ, β), we run the process from step 1 to step 2

500 times. Thus, for each combination of (λ, β) and for each regression model, we have

500 sets of estimates as stated in Step 2. Obtain the mean values and standard deviations of

those estimates as follows:

[Footnote 12: We also ran the simulation using other values of d (0.005, 0.05, 0.20), the conclusion of which is basically the same in that the lower the variance of the dependent variable (compared to that of the independent variable), the more appropriate the use of OR1 or OR2.]

[Footnote 13: Some previous researchers have used or implied different values of d. For example, Boggs and Spiegelman (1988) assumed d was about 0.006 and Carroll and Ruppert's (1996) case studies covered d values from 0.0424 to 0.80.]
$$\bar{\beta}_{bias} = \frac{1}{500}\sum_{i=1}^{500} |\hat{\beta}_i - \beta|, \qquad (A31)$$

$$Sd_{\beta bias} = \sqrt{\frac{\sum_{i=1}^{500}\left(|\hat{\beta}_i - \beta| - \bar{\beta}_{bias}\right)^2}{500(500-1)}}, \qquad (A32)$$

$$\bar{y}_{bias} = \frac{1}{500}\sum_{i=1}^{500} \hat{y}_{ei}, \qquad (A33)$$

$$Sd_{ybias} = \sqrt{\frac{\sum_{i=1}^{500}\left(\hat{y}_{ei} - \bar{y}_{bias}\right)^2}{500(500-1)}} \qquad (A34)$$

Step 4: change the combination of (λ, β), run the simulation from step 1 to step 3 and

obtain the related estimates for comparison.
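The simulation loop in steps 1 to 4 can be compressed into the following sketch (our own illustration rather than the authors' code; a synthetic X replaces the employment data, and only the OLS, OR1 and OR4 estimators defined in the earlier sketches are compared):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(9.89, np.sqrt(1.86), size=1411)   # stand-in for the true independent variable

def one_replication(beta, lam=1.0, d=0.1, delta=0.3):
    """Steps 1 and 2: simulate (x, y) and return absolute slope errors."""
    var_X = np.var(X)
    sigma_u2 = delta * (1 + beta ** 2) * var_X / (1 - delta)  # solves delta = var_u / (var_X + var_Y)
    Y = beta * X + rng.normal(0, np.sqrt(sigma_u2), X.size)   # equation (A28)
    var_ex = d * var_X                                        # d = var_ex / var_X
    x = X + rng.normal(0, np.sqrt(var_ex), X.size)            # equation (A29)
    y = Y + rng.normal(0, np.sqrt(lam * var_ex), X.size)      # equation (A30)
    b_ols = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b_or1 = orthogonal_regression(x, y)[1]
    b_or4 = adjusted_eiv_orthogonal_slope(x, y, lam, sigma_u2)
    return abs(b_ols - beta), abs(b_or1 - beta), abs(b_or4 - beta)

# Steps 3 and 4: average the absolute slope errors over 500 replications.
errors = np.array([one_replication(beta=1.5, lam=0.5) for _ in range(500)])
print(errors.mean(axis=0))   # mean |slope error| for OLS, OR1, OR4
```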

Table I
Slope estimates using OLS, classical orthogonal regression and adjusted
orthogonal regression

We generate the dependent variable Y based on the given X according to various pre-assigned slope values, and conditional on the variance of Y being equal to that of X. See Appendix VI(A) for the simulation details. The regression equation is Y = α + βX + u. The estimators are described in Table II. $\hat{\beta}_{OR1}$ is the classical orthogonal estimator; $\hat{\beta}_{OR2}$ is our proposed new orthogonal estimator adjusted for equation error (assuming $\sigma_u^2$ known), and $\hat{\beta}_{OLS}$ is the OLS estimator. The values in brackets are the standard errors of the slope estimates based on 500 simulations.

True β        -0.9     -0.7     -0.5     -0.3     -0.1     0.1      0.3      0.5      0.7      0.9

β̂_OR1         -1.001   -1.000   -1.005   -1.005   -1.017   1.006    0.993    0.995    0.999    1.000
              (0.001)  (0.002)  (0.004)  (0.006)  (0.019)  (0.020)  (0.006)  (0.003)  (0.002)  (0.001)

β̂_OR2         -0.901   -0.699   -0.504   -0.304   -0.105   0.101    0.298    0.496    0.699    0.900
              (0.001)  (0.002)  (0.003)  (0.003)  (0.002)  (0.003)  (0.003)  (0.002)  (0.002)  (0.001)

β̂_OLS         -0.901   -0.697   -0.504   -0.303   -0.105   0.101    0.298    0.496    0.698    0.900
              (0.001)  (0.002)  (0.002)  (0.003)  (0.002)  (0.003)  (0.003)  (0.002)  (0.002)  (0.001)
Table II
Regression models used in this study

Models    Slope estimator                                                                                                      Main assumptions

OLS       $\hat{\beta}_{OLS} = \sigma_{xy} / \sigma_x^2$                                                                        $\sigma_{\varepsilon_x}^2 = 0$, $\mathrm{Cov}(u, x) = 0$.

OR1       $\hat{\beta}_{OR1} = \dfrac{\sigma_y^2 - \sigma_x^2 + \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}}$                                               $\sigma_{\varepsilon_x}^2 = 0$, $\sigma_{\varepsilon_y}^2 = 0$, $\sigma_u^2 = 0$.

OR2       $\hat{\beta}_{OR2} = \dfrac{\sigma_y^2 - \sigma_x^2 - \sigma_u^2 + \sqrt{(\sigma_y^2 - \sigma_x^2 - \sigma_u^2)^2 + 4\sigma_{xy}^2}}{2\sigma_{xy}}$                     $\sigma_{\varepsilon_x}^2 = 0$, $\mathrm{Cov}(u, x) = 0$.

OR3       $\hat{\beta}_{OR3} = \dfrac{s_y^2 - \lambda s_x^2 + \sqrt{(s_y^2 - \lambda s_x^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}$                                                    $\sigma_u^2 = 0$.

OR4       $\hat{\beta}_{OR4} = \dfrac{s_y^2 - \lambda s_x^2 - \sigma_u^2 + \sqrt{(s_y^2 - \lambda s_x^2 - \sigma_u^2)^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}$

where OLS is ordinary least squares and ORi is orthogonal regression model i. We assume that y = α + βx + ε. The estimator for the intercept is the same for all models, namely $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.
Table III. Errors-in-variables regressions when there are no equation errors

Setting the values for β and the measurement error variance ratio λ, we simulate two random variables. Using the estimators in Table II, we consider, 1) the deviation of the slope estimator from the true slope and 2) the prediction error. We run the simulation 500 times. In each cell, the values in the first row are the average deviation, and those in the second row (in brackets) are the standard errors. βbias is the mean absolute deviation of the slope estimates, and ybias is the absolute deviation of the prediction errors.

True β                        -1.5                                                       1.5
λ                  0.5               1                 2                 0.5               1                 2
                   βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias

β̂_OR1              0.0227  0.5073    0.0119  0.5151    0.0462  0.5336    0.0237  0.5088    0.0122  0.5162    0.0455  0.5330
                  (0.0006)(0.0005)  (0.0004)(0.0005)  (0.0008)(0.0005)  (0.0006)(0.0004)  (0.0004)(0.0005)  (0.0008)(0.0005)

β̂_OR3              0.0108  0.5146    0.0119  0.5151    0.0142  0.5152    0.0110  0.5161    0.0122  0.5162    0.0150  0.5146
                  (0.0004)(0.0005)  (0.0004)(0.0005)  (0.0005)(0.0005)  (0.0004)(0.0005)  (0.0004)(0.0005)  (0.0005)(0.0005)

β̂_OLS              0.1354  0.4905    0.1369  0.4912    0.1369  0.4914    0.1370  0.4920    0.1360  0.4919    0.1370  0.4907
                  (0.0005)(0.0004)  (0.0006)(0.0004)  (0.0007)(0.0004)  (0.0005)(0.0004)  (0.0006)(0.0004)  (0.0007)(0.0004)

True β                        -0.5                                                       0.5
λ                  0.5               1                 2                 0.5               1                 2
                   βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias

β̂_OR1              0.0193  0.1669    0.0083  0.1723    0.0426  0.1930    0.0199  0.1667    0.0081  0.1722    0.0430  0.1929
                  (0.0003)(0.0002)  (0.0003)(0.0002)  (0.0006)(0.0004)  (0.0003)(0.0002)  (0.0003)(0.0002)  (0.0006)(0.0004)

β̂_OR3              0.0059  0.1722    0.0083  0.1723    0.0104  0.1728    0.0058  0.1720    0.0081  0.1722    0.0106  0.1726
                  (0.0002)(0.0002)  (0.0003)(0.0002)  (0.0003)(0.0002)  (0.0002)(0.0002)  (0.0003)(0.0002)  (0.0003)(0.0002)

β̂_OLS              0.0454  0.1642    0.0454  0.1642    0.0454  0.1648    0.0459  0.1641    0.0453  0.1641    0.0452  0.1646
                  (0.0003)(0.0001)  (0.0004)(0.0001)  (0.0005)(0.0002)  (0.0003)(0.0001)  (0.0004)(0.0002)  (0.0005)(0.0002)
Table IV. Errors-in-variables regressions when there are equation errors
Assuming different values for β and the measurement error variance ratio λ, we simulate two random variables. We assume that the equation error is 30% of the sum of the variances of the two variables. Using the estimators as in Table II, we compare, 1) the deviation of the slope estimator from the true slope and 2) the prediction error. We run each regression 500 times. In each cell, the value in the first row is the average deviation, and that in the second row (in brackets) is the standard deviation. βbias is the mean absolute deviation of the slope estimates, and ybias is the absolute deviation of the prediction errors. Apart from the estimators listed in Table II, we add two estimators based on the estimated equation error variance $\hat{\sigma}_u^2$. $\hat{\beta}_{OR2(1)}$ is the OR2 slope estimator using the estimated equation error variance obtained by OR1, and $\hat{\beta}_{OR4(3)}$ is the OR4 slope estimator using the estimated equation error variance obtained by OR3.

Panel A: β = -1.5, 1.5

True β                        -1.5                                                       1.5
λ                  0.5               1                 2                 0.5               1                 2
                   βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias

β̂_OR1              0.1305  0.5782    0.1571  0.5937    0.2034  0.6253    0.1307  0.5785    0.1545  0.5928    0.2039  0.6262
                  (0.0010)(0.0006)  (0.0010)(0.0007)  (0.0012)(0.0008)  (0.0010)(0.0007)  (0.0010)(0.0007)  (0.0012)(0.0008)

β̂_OR2              0.0252  0.5085    0.0182  0.5161    0.0468  0.5340    0.0262  0.5091    0.0177  0.5155    0.0472  0.5344
                  (0.0008)(0.0005)  (0.0006)(0.0006)  (0.0011)(0.0006)  (0.0008)(0.0005)  (0.0006)(0.0005)  (0.0011)(0.0006)

β̂_OR2(1)           0.0147  0.5123    0.0133  0.5138    0.0143  0.5167    0.0159  0.5126    0.0135  0.5137    0.0146  0.5167
                  (0.0005)(0.0005)  (0.0005)(0.0005)  (0.0005)(0.0005)  (0.0005)(0.0005)  (0.0004)(0.0005)  (0.0005)(0.0005)

β̂_OR3              0.1810  0.6104    0.1571  0.5937    0.1187  0.5710    0.1814  0.6107    0.1545  0.5928    0.1188  0.5715
                  (0.0010)(0.0007)  (0.0010)(0.0007)  (0.0011)(0.0007)  (0.0011)(0.0007)  (0.0010)(0.0007)  (0.0011)(0.0007)

β̂_OR4              0.0171  0.5158    0.0182  0.5161    0.0193  0.5157    0.0181  0.5164    0.0177  0.5155    0.0196  0.5159
                  (0.0005)(0.0005)  (0.0006)(0.0006)  (0.0006)(0.0005)  (0.0006)(0.0006)  (0.0006)(0.0005)  (0.0007)(0.0006)

β̂_OR4(3)           0.0840  0.4952    0.0133  0.5138    0.0784  0.5490    0.0845  0.4957    0.0135  0.5137    0.0783  0.5493
                  (0.0007)(0.0004)  (0.0005)(0.0005)  (0.0009)(0.0006)  (0.0008)(0.0004)  (0.0004)(0.0005)  (0.0009)(0.0006)

β̂_OLS              0.1352  0.4916    0.1354  0.4918    0.1362  0.4919    0.1356  0.4921    0.1361  0.4918    0.1371  0.4921
                  (0.0008)(0.0004)  (0.0008)(0.0004)  (0.0009)(0.0004)  (0.0009)(0.0004)  (0.0008)(0.0004)  (0.0010)(0.0005)
Table IV. (Continued)

Panel B: β = -0.5, 0.5

True β                        -0.5                                                       0.5
λ                  0.5               1                 2                 0.5               1                 2
                   βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias     βbias   ybias

β̂_OR1              0.0321  0.1865    0.0546  0.2002    0.1040  0.2374    0.0321  0.1869    0.0546  0.2003    0.1034  0.2365
                  (0.0006)(0.0003)  (0.0007)(0.0004)  (0.0009)(0.0007)  (0.0006)(0.0003)  (0.0007)(0.0004)  (0.0009)(0.0007)

β̂_OR2              0.0194  0.1671    0.0107  0.1729    0.0427  0.1936    0.0196  0.1675    0.0115  0.1729    0.0418  0.1929
                  (0.0005)(0.0002)  (0.0004)(0.0002)  (0.0008)(0.0005)  (0.0005)(0.0002)  (0.0004)(0.0003)  (0.0008)(0.0005)

β̂_OR2(1)           0.0163  0.1681    0.0122  0.1715    0.0245  0.1827    0.0168  0.1685    0.0134  0.1715    0.0239  0.1820
                  (0.0005)(0.0002)  (0.0004)(0.0002)  (0.0008)(0.0005)  (0.0005)(0.0002)  (0.0004)(0.0003)  (0.0007)(0.0005)

β̂_OR3              0.0936  0.2280    0.0546  0.2002    0.0300  0.1862    0.0937  0.2284    0.0546  0.2003    0.0291  0.1855
                  (0.0006)(0.0005)  (0.0007)(0.0004)  (0.0007)(0.0004)  (0.0006)(0.0005)  (0.0007)(0.0004)  (0.0007)(0.0004)

β̂_OR4              0.0101  0.1725    0.0107  0.1729    0.0133  0.1736    0.0103  0.1729    0.0115  0.1729    0.0136  0.1731
                  (0.0003)(0.0002)  (0.0004)(0.0002)  (0.0005)(0.0003)  (0.0004)(0.0002)  (0.0004)(0.0003)  (0.0004)(0.0003)

β̂_OR4(3)           0.0634  0.2059    0.0122  0.1715    0.0302  0.1666    0.0635  0.2063    0.0134  0.1715    0.0309  0.1663
                  (0.0008)(0.0005)  (0.0004)(0.0002)  (0.0007)(0.0002)  (0.0008)(0.0005)  (0.0004)(0.0003)  (0.0007)(0.0002)

β̂_OLS              0.0449  0.1642    0.0454  0.1650    0.0454  0.1654    0.0450  0.1647    0.0451  0.1648    0.0460  0.1653
                  (0.0005)(0.0002)  (0.0005)(0.0002)  (0.0007)(0.0002)  (0.0005)(0.0001)  (0.0006)(0.0002)  (0.0007)(0.0002)
Table V. Regressions with equation errors but no errors in variables

Setting the values for β and assuming no measurement errors in variables, we simulate two random variables. We assume that the equation error is 30% of the sum of the variances of the two variables. Using the estimators in Table II, we consider, 1) the deviation of the slope estimator from the true slope and 2) the prediction error. We run the simulation 500 times. In each cell, the value in the first row is the average deviation, and that in the second row (in brackets) is the standard error. βbias is the mean absolute deviation of the slope estimates, and ybias is the absolute deviation of the prediction errors. Apart from the estimators listed in Table II, $\hat{\beta}_{OR2(1)}$ is the OR2 slope estimator using the estimated equation error variance obtained by OR1.

True β             -1.5               -1                 -0.5
                   βbias   ybias      βbias   ybias      βbias   ybias

β̂_OR1              0.1543  0.1645     0.1066  0.1138     0.0547  0.0591
                  (0.0007)(0.0007)   (0.0006)(0.0006)   (0.0005)(0.0005)

β̂_OR2              0.0125  0.0225     0.0108  0.0180     0.0082  0.0145
                  (0.0004)(0.0005)   (0.0003)(0.0004)   (0.0003)(0.0003)

β̂_OR2(1)           0.0763  0.0827     0.0950  0.1016     0.0242  0.0285
                  (0.0006)(0.0007)   (0.0005)(0.0006)   (0.0005)(0.0005)

β̂_OLS              0.0118  0.0220     0.0102  0.0176     0.0082  0.0145
                  (0.0004)(0.0005)   (0.0003)(0.0004)   (0.0003)(0.0003)

True β             0.5                1                  1.5
                   βbias   ybias      βbias   ybias      βbias   ybias

β̂_OR1              0.0539  0.0581     0.1033  0.1103     0.1547  0.1650
                  (0.0005)(0.0005)   (0.0006)(0.0006)   (0.0007)(0.0007)

β̂_OR2              0.0078  0.0139     0.0103  0.0174     0.0128  0.0229
                  (0.0003)(0.0003)   (0.0003)(0.0004)   (0.0004)(0.0005)

β̂_OR2(1)           0.0234  0.0273     0.0920  0.0984     0.0767  0.0832
                  (0.0005)(0.0005)   (0.0005)(0.0005)   (0.0006)(0.0006)

β̂_OLS              0.0078  0.0139     0.0098  0.0170     0.0118  0.0220
                  (0.0003)(0.0003)   (0.0003)(0.0004)   (0.0004)(0.0005)
Table VI
The best performing models: Empirical summary

                           No equation error          With equation error ($\sigma_u^2 \neq 0$)
                           ($\sigma_u^2 = 0$) but     No errors in variables                       Errors in variables
Models                     with measurement errors    $\sigma_u^2$ known   $\sigma_u^2$ unknown    $\sigma_u^2$ known   $\sigma_u^2$ unknown
OLS                                                   √                    √
OR1
OR2                                                   √
OR2(1)                                                                                                                  √
OR3                        √
OR4                                                                                                √
OR4(3)

where $\sigma_u^2$ is the variance of the equation errors;
OLS - ordinary least squares;
OR1 - classical orthogonal regression;
OR2 - proposed new adjusted orthogonal regression;
OR2(1) - proposed two-step adjusted orthogonal regression;
OR3 - classical orthogonal regression for errors-in-variables;
OR4 - proposed new adjusted orthogonal regression for errors-in-variables;
OR4(3) - proposed two-step adjusted orthogonal regression for errors-in-variables.

Figure 1. Illustration of least squares estimation of orthogonal regression
[Figure omitted: the fitted line makes an angle θ with the X-axis, and AD is the perpendicular distance from an observation A(x_i, y_i) to the fitted line.]
