
Unit 2

This document provides an overview of the estimation of two-variable regression models, including concepts such as the error term, least-squares estimation, and the coefficient of determination. It explains how to estimate regression coefficients and the significance of the error term in modeling economic relationships. Additionally, it includes examples and exercises to reinforce the understanding of these concepts.

Uploaded by

has.labeeb

UNIT 2 ESTIMATION OF TWO-VARIABLE REGRESSION MODEL
Structure
2.0 Objectives
2.1 Introduction
2.2 Error Term
2.3 Estimation
2.4 Least-Squares Estimator
2.5 Coefficient of Determination
2.6 Intrinsically Linear Models
2.7 Let Us Sum Up
2.8 Key Words
2.9 Some Useful Books/References
2.10 Answers/Hints to Check Your Progress Exercises

2.0 OBJECTIVES

After going through this unit you will be able to:

explain the concept of the two-variable model;
explain the justification of the error term;
estimate beta coefficients;
explain the concept of the coefficient of determination; and
explain the concept of intrinsically linear models.

2.1 INTRODUCTION

The simplest economic relation is represented through a two-variable model.
For example, quantity demanded is a function of price; consumption is a
function of income, etc. However, more realistic formulations require the
specification of several variables in each model. To simplify the presentation
of the statistical estimation of these relationships, we first examine the
two-variable linear equation model, namely

$Y = a + bX$    ...(2.1)

where a and b are unknown parameters indicating the intercept and the slope
of the model respectively. They are also called the regression coefficients. Y,
the left-hand variable, is called the dependent variable and X, the right-hand
variable, is called the independent variable.
2.2 ERROR TERM

Economic theory specifies exact functional relationships among its variables.
In reality the measurement of these functional relationships
is not exact. For a given observed value of X (the independent variable), we
may observe many possible values of Y (the dependent variable). As an
example, consider the consumption of an individual who receives a certain
amount of income each year. Since the amount of money spent on a
particular item, say food, is likely to vary each year, we assume that for
each observation X, observations on Y will differ randomly. To describe this
situation formally, we add a random "error" term to the model and write it
as

$Y = \alpha + \beta X + \varepsilon$    ...(2.2)

where Y is a random variable, X is fixed or non-stochastic, and $\varepsilon$ is a
random error term whose value is based on an underlying probability
distribution. We have switched our notation to use the Greek letters $\alpha$ and
$\beta$ to represent the intercept and slope of the line, i.e., the regression
parameters, since our model now contains a random error term.
There are several justifications for the introduction of an error term into the
equation. First, we may have errors of specification. Errors appear because the
model is a simplification of reality. It is not always possible to include all
relevant variables in the functional relation. For example, we assume that
price is the sole determinant of demand for a product. In fact several other
variables, like individual tastes, population, income, etc., do influence
the demand for the product but are omitted from the function. Some of these
variables have positive effects while others have negative effects. The net
effect of these omitted variables is represented by the error
term. If the effects of these omitted variables are small, it is reasonable to
assume that the error term is random.
A second source of error is associated with the collection and measurement of
the data. It is likely that the income and consumption information we obtain
from a household may not be accurate. If the data are obtained from a
government statistical report, the data may not be accurate as a result of
clerical handling and rounding error.

Fig. 2.1: Least Squares Fitting and Random Disturbances (horizontal axis: Income)
Basic Econometric Theory
A third source of error is the sampling error. For instance, consider
eq. (2.2) as
a household-consumption function, where Y is consumption and X is income.
Even if eq. (2.2) is a correct relationship, the sample we randomly choose to
examine may turn out to be predominantly poor families. Thus our estimates
of $\alpha$ and $\beta$ from this sample group may not be as good as the estimates from
a balanced sample group.
When eq. (2.2) is subject to one or more of these three sources of error, the
introduction of an error term is justified. Figure 2.1 illustrates eq. (2.2)
using household consumption as an example.
Given the above sources of error, the representation of the relationship in eq.
(2.2) as a stochastic one is clear. For every value of X there exists a
probability distribution of $\varepsilon$ and therefore a probability distribution of the
Y's. Thus we say the variable Y is stochastic in nature. On the other
hand, the variable X is non-stochastic. Its values are kept fixed from sample
to sample.
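
The stochastic nature of Y for fixed X can be illustrated with a short
simulation. The sketch below is only an illustration under assumed values: the
parameter values (intercept 10, slope 0.8), the normal distribution with
standard deviation 2 for the error term, the income levels and the function
name simulate_y are all hypothetical choices of ours, not taken from the unit.
X is held fixed while the error is redrawn, so the same X gives a different Y
in each sample.

```python
import random

# Hypothetical values chosen only for illustration (not from the unit).
ALPHA = 10.0   # assumed intercept
BETA = 0.8     # assumed slope
SIGMA = 2.0    # assumed standard deviation of the error term


def simulate_y(x_values, rng):
    """Draw one sample of Y = ALPHA + BETA * X + error for fixed X values."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in x_values]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    incomes = [50, 60, 70, 80]      # X is non-stochastic: same in every sample
    for sample in range(3):
        consumption = simulate_y(incomes, rng)
        print(sample, [round(y, 2) for y in consumption])
```

Each printed row uses the same four income levels, yet the consumption figures
differ from row to row because a new error is drawn every time.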

2.3 ESTIMATION
In the specification of the model as described by eq. (2.2), the values of the
parameters $\alpha$ and $\beta$ are not known; as a result the population regression
line is not known. When the values of $\alpha$ and $\beta$ are estimated, we obtain a
sample regression line that serves as an estimate of the population regression
line. If $\alpha$ and $\beta$ are estimated by $\hat{\alpha}$ and $\hat{\beta}$
respectively, then the sample regression
line is given by

$\hat{Y} = \hat{\alpha} + \hat{\beta} X$    ...(2.3)

where $\hat{Y}$ is the fitted value of Y.


The theory of estimation can be divided into two parts: point estimation and
interval estimation. In point estimation, the aim is to use the prior and the
sample information for the purpose of calculating a value which would be, in
some sense, our best guess as to the actual value of the parameter of interest.
In interval estimation, on the other hand, the same information is used for the
purpose of producing an interval which would contain the true value of the
parameter with some given level of probability. Since an interval is fully
characterised by its limits, estimating an interval is equivalent to estimating
its limits. The interval itself is usually called a confidence interval.
Confidence intervals can also be viewed as possible measures of the precision
of a point estimator.
The problem of point estimation is that of producing an estimate that would
represent our best guess about the value of the parameter. To solve this
problem we have to do two things: first, specify what we mean by 'best
guess' and second, devise estimators that would meet this criterion. The
first part of the problem amounts to specifying various properties of an
estimator that can be considered desirable. These properties
are: unbiasedness, efficiency, minimum mean square error and consistency.
The second part, on the other hand, involves devising estimators that would
have at least some of the desirable properties. The estimators are given names
that indicate the nature of the principle used in devising the formula. We will
discuss here the least-squares method of devising estimators.
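
Of the properties listed above, unbiasedness lends itself to a quick numerical
check. The following is a minimal sketch, not part of the unit: it assumes
hypothetical true values (intercept 2, slope 0.5), normally distributed errors
with standard deviation 1 and fixed X values 1 to 20, and it uses the
least-squares slope in deviation form that Section 2.4 develops. The function
names are our own. If the estimator is unbiased, the average of the slope
estimates over many repeated samples should settle close to the true slope.

```python
import random


def ols_slope(x_values, y_values):
    """Least-squares slope, computed from deviations about the means."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_bar) ** 2 for x in x_values)
    return s_xy / s_xx


def average_slope_estimate(true_alpha, true_beta, sigma, x_values, n_samples, seed):
    """Average the slope estimate over repeated samples drawn with X held fixed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y_values = [true_alpha + true_beta * x + rng.gauss(0.0, sigma)
                    for x in x_values]
        total += ols_slope(x_values, y_values)
    return total / n_samples


if __name__ == "__main__":
    xs = list(range(1, 21))   # hypothetical fixed X values
    # The printed average should be close to the assumed true slope of 0.5.
    print(average_slope_estimate(2.0, 0.5, 1.0, xs, 2000, seed=42))
```

Any single sample gives a slope somewhat above or below 0.5; it is the average
over repeated samples that stays near the true value.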

Check Your Progress 1

1) What do you mean by a two-variable regression model?

2) Give the justifications of the error term in a regression model.

2.4 LEAST SQUARES METHOD OF ESTIMATION

The dominant and most powerful estimating principle is that of least squares. We
want to estimate the relationship between Y and X from the sample
observations such that

$\hat{Y} = \hat{\alpha} + \hat{\beta} X$,

where $\hat{\alpha}$ and $\hat{\beta}$
are estimates of the unknown parameters $\alpha$ and $\beta$, and $\hat{Y}$ is
the estimated value of Y. The deviations between the observed and estimated
values of Y are called residuals $\hat{\varepsilon}$, i.e.,

$\hat{\varepsilon} = Y - \hat{Y}$.    ...(2.4)

The principle of least squares is to choose the $\hat{\alpha}$ and $\hat{\beta}$
values that will minimise
the sum of squared deviations between the observed and estimated values of
Y.
The estimated equation will be the best fitting curve on the least squares
criterion. We have therefore

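$\sum \hat{\varepsilon}_i^2 = \sum (Y_i - \hat{Y}_i)^2$, where $i = 1, 2, \ldots, N$,

$= \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$.

Now taking the partial derivatives of this sum with respect to $\hat{\alpha}$ and
$\hat{\beta}$ and equating them to zero, we have

$\dfrac{\partial \sum \hat{\varepsilon}_i^2}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$

$\dfrac{\partial \sum \hat{\varepsilon}_i^2}{\partial \hat{\beta}} = -2 \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$

Rearranging these two equations gives the normal equations

$\sum Y_i = N \hat{\alpha} + \hat{\beta} \sum X_i$

and

$\sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2$.

Now solving the above two normal equations, we can get the values of
$\hat{\alpha}$ and $\hat{\beta}$:

$\hat{\beta} = \dfrac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N \sum X_i^2 - (\sum X_i)^2}$

$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$

where

$\bar{Y} = \dfrac{\sum Y_i}{N}$

and

$\bar{X} = \dfrac{\sum X_i}{N}$.

The value of $\hat{\beta}$ can also be shown in deviations of Y and X from their
means.

Using $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$, we get

$\hat{\beta} = \dfrac{\sum x_i y_i}{\sum x_i^2}$.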
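
The least-squares calculation can be sketched in a few lines of code. This is a
minimal illustration rather than a definitive implementation: the function
name fit_two_variable and the five data points are hypothetical, made up for
this example. The slope is obtained by solving the two normal equations
simultaneously, and the intercept from the fact that the fitted line passes
through the point of means.

```python
def fit_two_variable(x_values, y_values):
    """Return (alpha_hat, beta_hat) for Y = alpha + beta * X by least squares."""
    n = len(x_values)
    sum_x = sum(x_values)
    sum_y = sum(y_values)
    sum_xy = sum(x * y for x, y in zip(x_values, y_values))
    sum_xx = sum(x * x for x in x_values)

    # Slope from the two normal equations solved simultaneously.
    beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: the fitted line passes through the point of means.
    alpha_hat = sum_y / n - beta_hat * sum_x / n
    return alpha_hat, beta_hat


if __name__ == "__main__":
    # Hypothetical data, made up for this illustration only.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    alpha_hat, beta_hat = fit_two_variable(xs, ys)
    print(alpha_hat, beta_hat)   # approximately 2.2 and 0.6

    # Both normal equations should hold (up to rounding) at the solution.
    n = len(xs)
    print(sum(ys) - (n * alpha_hat + beta_hat * sum(xs)))
    print(sum(x * y for x, y in zip(xs, ys))
          - (alpha_hat * sum(xs) + beta_hat * sum(x * x for x in xs)))
```

For these made-up figures the sums are 15 for X, 20 for Y, 66 for the
cross-products and 55 for the squares of X, which gives a slope of 30/50 = 0.6
and an intercept of 4 - 0.6 x 3 = 2.2.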

Example 2.1
We can illustrate the application of the two-variable model discussed in the
earlier section. Following is a numerical illustration of estimating a
least-squares equation.

Let us take the relationship between savings and disposable income as
follows:

$Y = \alpha + \beta X + \varepsilon$

where Y is savings and X is disposable income. We take the data for the
above variables as follows:

Year    Savings (Y)    Disposable income (X)
        (Rs)           (Rs)
1996    16.95          0.84
1997    18.25          1.34
1998    19.56          1.75
1999    20.46          1.55
2000    21.76          1.63

From the above table we can get the following values

Using these aggregates, we can get

N, the number of observations, is equal to 10.

The regression coefficient is estimated as:

$\hat{\beta} = 0.099$

The intercept is calculated by using the expression as

Having estimated the intercept and slope of the regression line, we can now
construct the regression equation of savings on income as

Example 2.2
Let us take the relationship between consumption and income as follows:

$Y = \alpha + \beta X + \varepsilon$

where Y is consumption and X is income. Let us take the hypothetical data
for the above variables as follows:

Consumption (Y) (Rs)    Income (X) (Rs)
70                      80
65

From the above table we can get the following values

Using these aggregates, we can get

N, the number of observations, is equal to 10.

The regression coefficient is estimated as:

The intercept is calculated by using the expression as

The estimated regression line therefore is

2.5 COEFFICIENT OF DETERMINATION

Regression residuals can provide a useful measure of the fit between the
estimated regression line and the data. A good regression equation is one
which helps explain a large proportion of the variance of Y. To find a
measure of goodness of fit, it is reasonable to use the residual variance
divided by the variance of Y. The variation in Y is given by

variation (Y) = $\sum (Y_i - \bar{Y})^2$

Our goal is to divide the variation of Y into two parts, the first accounted for
by the regression equation and the second associated with the unexplained
portion of the model.
Consider the following identity, which holds for all observations:

$(Y_i - \bar{Y}) = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$    ...(2.13)

The term on the left-hand side of the equals sign denotes the difference
between the sample value of Y and the mean of Y. The first right-hand term
gives the residual $\hat{\varepsilon}_i$, and the second right-hand term gives the difference
between the predicted value of Y and the mean of Y.
To measure variation, we square both sides of eq. (2.13) and sum over all
observations (1 to N),

$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$    ...(2.14)

The last term in eq. (2.14) can be shown to be 0 by using two properties of
least-squares residuals, $\sum \hat{\varepsilon}_i = 0$ and
$\sum \hat{\varepsilon}_i X_i = 0$. It follows that

$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2$    ...(2.15)

Variation in Y = Residual variation + Explained variation

or, TSS = ESS + RSS

Total sum of squares = Error (residual) sum of squares + Regression sum of squares

From the above we say that the 'total sum of squares' (TSS) is the sum of the
'error sum of squares' and the 'regression sum of squares'.
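
The step that removes the cross-product term in eq. (2.14) can be written out
as follows. This is a sketch of the standard argument, using the fitted line
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ and the two residual properties
quoted above; $\hat{\varepsilon}_i$ stands for the residual $Y_i - \hat{Y}_i$.

```latex
\begin{align*}
\sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})
  &= \sum \hat{\varepsilon}_i \,(\hat{\alpha} + \hat{\beta} X_i - \bar{Y}) \\
  &= (\hat{\alpha} - \bar{Y}) \sum \hat{\varepsilon}_i
     + \hat{\beta} \sum \hat{\varepsilon}_i X_i \\
  &= (\hat{\alpha} - \bar{Y}) \cdot 0 + \hat{\beta} \cdot 0 = 0 .
\end{align*}
```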
We divide both sides of eq. (2.15) by the total sum of squares (TSS) to get

$1 = \dfrac{ESS}{TSS} + \dfrac{RSS}{TSS}$

Then we define $r^2$ as

$r^2 = 1 - \dfrac{ESS}{TSS} = \dfrac{RSS}{TSS} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

Since $\sum (\hat{Y}_i - \bar{Y})^2$ has a limit between $\sum (Y_i - \bar{Y})^2$
and 0, $r^2$ will lie between 0 and 1.
Here $r^2$ is called the coefficient of determination. It measures the proportion
of the variation in Y that is explained by the regression. $\sum (Y_i - \bar{Y})^2$
is the total variation of the Y values, and
$\sum (\hat{Y}_i - \bar{Y})^2$ is the variation of Y explained by variations in X.
Therefore,
this coefficient indicates the proportion of Y variance explained by the
variation of X.

In most cases we use the notation $R^2$ for the coefficient of determination
instead of $r^2$. There is a slight difference between $r^2$ and $R^2$: $r^2$ denotes the
coefficient of determination between two variables, while $R^2$ denotes the
coefficient of determination in cases with more than two variables. Therefore $r^2$ is
called the simple coefficient of determination, while $R^2$ is called the multiple
coefficient of determination. For simplicity, we will call $R^2$ the coefficient of
determination regardless of the number of variables contained in the equation.
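
The decomposition of the total variation and the resulting coefficient of
determination can be checked numerically. The sketch below is an illustration
only: the function name sums_of_squares and the five data points are
hypothetical. It follows this unit's notation, in which ESS is the error
(residual) sum of squares and RSS is the regression sum of squares.

```python
def sums_of_squares(x_values, y_values):
    """Return (TSS, ESS, RSS) in the unit's notation.

    ESS is the error (residual) sum of squares and RSS is the
    regression (explained) sum of squares.
    """
    n = len(x_values)
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
                / sum((x - x_bar) ** 2 for x in x_values))
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * x for x in x_values]

    tss = sum((y - y_bar) ** 2 for y in y_values)
    ess = sum((y - f) ** 2 for y, f in zip(y_values, fitted))
    rss = sum((f - y_bar) ** 2 for f in fitted)
    return tss, ess, rss


if __name__ == "__main__":
    # Hypothetical data, made up for this illustration only.
    tss, ess, rss = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(tss, ess + rss)   # the two should agree up to rounding
    print(rss / tss)        # r squared
    print(1 - ess / tss)    # the same value, computed the other way
```

For these made-up figures TSS is 6, ESS is 2.4 and RSS is 3.6, so the
coefficient of determination is 0.6 whichever of the two expressions is used.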
Note: In some textbooks on Econometrics the acronym ESS denotes 'explained
sum of squares' and RSS denotes 'residual sum of squares', so that $R^2$ is given by
ESS/TSS. Be careful and ascertain how these two acronyms are defined. In this course we
uniformly follow the notation that ESS stands for 'error sum of squares' and RSS for
'regression sum of squares'.

Example 2.3

Now to know how much of the sum of the squared variation of Y is explained
by the regression equation, we calculate $r^2$. Following is a numerical
example based on the data in Example 2.1.

We have $\hat{\beta} = 0.099$.

Now $r^2 = \dfrac{\hat{\beta} \sum x_i y_i}{\sum y_i^2} = \dfrac{0.099 \times 15.3880}{1.8266} = 0.834$

With $r^2 = 0.834$, we could say that over 83 per cent of the variation of Y
(savings) about its mean value is accounted for by the relationship found.
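
The arithmetic of Example 2.3 can be reproduced directly from the three figures
quoted in it. In the short sketch below the variable names are our own, and
reading the second and third figures as the sum of cross-products of deviations
and the sum of squared deviations of Y is an assumption on our part.

```python
# Figures quoted in Example 2.3.
slope = 0.099               # estimated slope from Example 2.1
numerator_term = 15.3880    # assumed to be the sum of x*y in deviation form
denominator_term = 1.8266   # assumed to be the sum of y*y in deviation form

r_squared = slope * numerator_term / denominator_term
print(round(r_squared, 3))   # 0.834
```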
Example 2.4
Let us take an illustration of the relationship between coffee consumption and
the average retail price of coffee. The hypothetical data are presented in the
following table. Our purpose is to fit the two-variable linear model.

Year    Coffee consumption per person    Price of coffee
        per day (no. of cups) (Y)        (Rs) (X)
1980    2.57                             0.77

The results can be obtained by the formulae used in Examples 2.2 and 2.3
as follows:

$\hat{Y} = 2.6911 - 0.4795 X$    ...(2.17)

and $r^2 = 0.6628$.
The interpretation of the estimated regression is as follows: If the average real
retail price of coffee goes up by a rupee, the average consumption of coffee
per day is expected to decrease by about half a cup. If the price of coffee were
to be zero, the average per person consumption of coffee is expected to be
about 2.69 cups a day. The $r^2$ value means that about 66 per cent of the
variation in per capita daily coffee consumption is explained by variation in
the retail price of coffee.
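
The interpretation of eq. (2.17) can be made concrete by evaluating the fitted
line at a few prices. This is a small sketch; the function name predicted_cups
is our own, and only the two coefficients and the 1980 price of 0.77 come from
the example.

```python
INTERCEPT = 2.6911   # from eq. (2.17)
SLOPE = -0.4795      # from eq. (2.17)


def predicted_cups(price):
    """Fitted cups of coffee per person per day at a given retail price (Rs)."""
    return INTERCEPT + SLOPE * price


if __name__ == "__main__":
    print(predicted_cups(0.0))                        # the intercept
    print(predicted_cups(1.0) - predicted_cups(2.0))  # drop per one-rupee rise
    print(predicted_cups(0.77))                       # fitted value at the 1980 price
```

At the 1980 price of Rs 0.77 the fitted value is about 2.32 cups, against an
observed 2.57 cups; the gap between the two is the residual for that year.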

2.6 INTRINSICALLY LINEAR MODELS

The linear regression model can be applied to a more general class of equations
that are intrinsically (inherently) linear. Intrinsically linear models can be
expressed in a form that is linear in the parameters by transforming the
variables. That is, if a model is non-linear but becomes linear after
transformation of its variables, then the model is said to be intrinsically linear.
Let us take the case of the following non-linear model:

$Y = \alpha X^{\beta} e^{\varepsilon}$    ...(2.18)

This model is intrinsically linear if it can be transformed into a form that is
linear in the parameters. Taking the logarithm of both sides of eq. (2.18), we
get the following transformed equation:

$\log Y = \log \alpha + \beta \log X + \varepsilon$    ...(2.20)

The relationship in eq. (2.20) is intrinsically linear because it is linear with
respect to the parameters $\log \alpha$ and $\beta$. We can apply OLS to estimate these
parameters. Because of this linearity in the logarithms, such models are also
called log-log, double-log, or log-linear models.
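
The transformation can be sketched in code: take logarithms of both variables,
apply least squares to the transformed data, and convert the intercept back.
The sketch below is illustrative only. The function name fit_log_log and the
data are hypothetical, and natural logarithms are used as an assumption; the
slope estimate does not depend on the base of the logarithm, whereas the
intercept does.

```python
import math


def fit_log_log(x_values, y_values):
    """Fit log Y = log(alpha) + beta * log X by least squares.

    Returns (alpha_hat, beta_hat), with alpha_hat converted back from logs.
    """
    log_x = [math.log(x) for x in x_values]
    log_y = [math.log(y) for y in y_values]
    n = len(log_x)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    beta_hat = (sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
                / sum((lx - mean_lx) ** 2 for lx in log_x))
    log_alpha_hat = mean_ly - beta_hat * mean_lx
    return math.exp(log_alpha_hat), beta_hat


if __name__ == "__main__":
    # Hypothetical, error-free data generated from Y = 3 * X ** (-0.5).
    prices = [1.0, 2.0, 4.0, 8.0]
    quantities = [3.0 * p ** -0.5 for p in prices]
    print(fit_log_log(prices, quantities))   # close to (3.0, -0.5)
```

Because the made-up data contain no error term, the fit recovers the assumed
values almost exactly; with real data the estimates would only approximate them.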
One attractive feature of this model, which has made it popular in applied
work, is that the slope coefficient $\beta$ measures the elasticity of Y with respect
to X, that is, the percentage change in Y for a given percentage change in X. Thus
if Y represents the quantity of a commodity demanded and X its unit price,
$\beta$ measures the price elasticity of demand. If the relationship between
quantity demanded and price is as shown in Fig. 2.2 (a), the transformed equation as
shown in Fig. 2.2 (b) will then give the estimate of the price elasticity ($-\beta$).
Figure 2.2 (a) and Figure 2.2 (b) (line label in the figure: $\log Y = \log \alpha - \beta \log X$)
Example 2.5
We can refer to the coffee demand example for the data set and find the
logarithmic values of the variables as follows:

Year    Coffee consumption per person    Price of coffee    Log Y    Log X
        per day (no. of cups) (Y)        (Rs) (X)

Using these aggregates, we can get

$\sum (\log x)(\log y) = \sum (\log X)(\log Y) - \dfrac{\sum \log X \sum \log Y}{N} = -0.0491$

N, the number of observations, is equal to 11.

The regression coefficient is estimated as:

$\hat{\beta} = \dfrac{\sum (\log x)(\log y)}{\sum (\log x)^2} = -0.2541$

$\widehat{\log \alpha} = \overline{\log Y} - \hat{\beta} \, \overline{\log X} = \dfrac{\sum \log Y}{N} - \hat{\beta} \dfrac{\sum \log X}{N}$

$\log \hat{Y} = 0.3376 - 0.2541 \log X$    ...(2.21)

From this result we see that the price elasticity coefficient is -0.2541,
implying that for a 1 per cent increase in the real price of coffee, the demand
for coffee (as measured by cups of coffee consumed per day) on the average
decreases by about 0.25 per cent. Since the price elasticity value of 0.25 is
less than 1 in absolute terms, we can say that the demand for coffee is
price-inelastic.
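
The elasticity reading can be checked against the fitted log-log equation
itself. In the sketch below, which is only an illustration with a function name
of our own, the ratio of fitted demand after and before a price change is
(1 + change) raised to the power of the slope, whatever the base of the
logarithm and whatever the intercept.

```python
ELASTICITY = -0.2541   # slope of the fitted log-log equation


def percent_change_in_demand(percent_change_in_price):
    """Percentage change in fitted demand implied by a constant-elasticity model."""
    ratio = (1.0 + percent_change_in_price / 100.0) ** ELASTICITY
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    print(percent_change_in_demand(1.0))    # roughly -0.25
    print(percent_change_in_demand(10.0))   # not exactly 10 times the figure above
```

For a 1 per cent price rise the implied fall in demand is about 0.25 per cent,
in line with the text; for larger price changes the elasticity figure is only
an approximation to the exact percentage change.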

Now considering the results of the linear demand function (Example 2.4) and
the log-linear demand function (Example 2.5), we may question which model is
better. Can we say that eq. (2.21) is better than eq. (2.17) because its $r^2$ value
is higher (0.7456 vs. 0.6628)? Unfortunately, we cannot say that, for when the
dependent variables of two models are not the same (here, log Y vs. Y), the two $r^2$
values are not directly comparable. We cannot directly compare the slope
coefficients either, for in eq. (2.17) the slope coefficient gives the effect of a
unit change in the price of coffee as a constant absolute (i.e., not relative)
decrease in coffee consumption, which is 0.4795 cups per day. On
the other hand, the coefficient of -0.2541 obtained from eq. (2.21) gives the
constant percentage decrease in coffee consumption as a result of a 1 per cent
increase in the price of coffee.

Check Your Progress 2

1) Explain how you would estimate the beta coefficient in a two-variable
regression model.

2) Define the coefficient of determination. What does it signify?

3) Find the values of $\alpha$, $\beta$ and $r^2$ using the following data.

4) What do you mean by an intrinsically linear model? Can you compare the
results of an intrinsically linear model with those of a linear model? Why
or why not?

2.7 LET US SUM UP

In this unit we studied the basic framework of regression analysis. The reasons
for including an error term in the regression model were described vividly. We
also studied the estimation of the beta coefficients, which gives a sample
regression line that serves as an estimate of
the population regression line. The overall goodness of fit of the regression
model is measured by the coefficient of determination, $r^2$. It tells what
proportion of the variation in the dependent variable is explained by the
explanatory (independent) variable. The unit also discussed
intrinsically linear models, which can be expressed in a form that is linear in the
parameters by transforming the variables. A number of illustrative
applications are presented in this unit to aid understanding.
2.8 KEY WORDS

Two-variable Model : An equation in which there are two variables, one
dependent variable and the other independent variable, is called a
two-variable model.

Error Term : The measurement of the functional relationships among variables
in reality is not exact. For a given independent variable X, we may observe
many possible values of the dependent variable Y. To describe this situation
formally, we add a random "error" term to the model.

Stochastic Nature : The representation of the relationship in an equation can
be said to be stochastic in nature if for every value of X (independent
variable) there exists a probability distribution of $\varepsilon$ and therefore
a probability distribution of the Y's (dependent variable).

Estimation : In the specification of the two-variable model, the values of the
parameters $\alpha$ and $\beta$ are not known; as a result the population
regression line is not known. When the values of $\alpha$ and $\beta$ are
estimated, we obtain a sample regression line that serves as an estimate of the
population regression line. This is known as estimation.

Least-Squares : The principle of least squares is to choose the $\hat{\alpha}$
and $\hat{\beta}$ values that will minimise the sum of squared deviations
between the observed and estimated values of Y.

Coefficient of Determination : The coefficient of determination indicates the
proportion of Y variance explained by the variation of X.

Intrinsically Linear Models : If a model is non-linear and after transformation
of its variables becomes linear, then the model can be said to be intrinsically
linear.

2.9 SOME USEFUL BOOKS/REFERENCES

Gujarati, Damodar N., 1995, Basic Econometrics, McGraw-Hill Inc.,
Singapore.
Johnston, Jack and John Dinardo, 1997, Econometric Methods, The McGraw-Hill
Companies Inc., Singapore.
Pindyck, Robert S. and Daniel L. Rubinfeld, 1998, Econometric Models and
Economic Forecasts, Irwin/McGraw-Hill, Singapore.

2.10 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES

Check Your Progress 1

1) Go through Section 2.1.
2) Go through Section 2.2.

Check Your Progress 2

1) Go through Section 2.4.
2) Go through Section 2.5.
3) Go through Sections 2.4 and 2.5. The results are 1, 1.75 and 0.9879
respectively.
4) Go through Sections 2.4, 2.5 and 2.6.
