UNIT 2 ESTIMATION OF TWO-VARIABLE REGRESSION MODEL
Structure
2.0 Objectives
2.1 Introduction
2.2 Error Term
2.3 Estimation
2.4 Least-Squares Estimator
2.5 Coefficient of Determination
2.6 Intrinsically Linear Models
2.7 Let Us Sum Up
2.8 Key Words
2.9 Some Useful Books/References
2.10 Answers/Hints to Check Your Progress Exercises
2.0 OBJECTIVES
2.1 INTRODUCTION
[Fig. 2.1: Least Squares Fitting and Random Disturbances (horizontal axis: Income)]
Basic Econometric Theory
A third source of error is sampling error. For instance, consider eq. (2.2) as a household consumption function, where Y is consumption and X is income. Even if eq. (2.2) is the correct relationship, the sample we randomly choose to examine may turn out to consist predominantly of poor families. Our estimates of $\alpha$ and $\beta$ from this sample may then not be as good as the estimates from a balanced sample.
When eq. (2.2) is subject to one or more of these three sources of error, the introduction of an error term is justified. Figure 2.1 illustrates eq. (2.2) using household consumption as an example.
Given the above sources of error, it is clear why the relationship in eq. (2.2) is represented as a stochastic one. For every value of X there exists a probability distribution of $\varepsilon$ and therefore a probability distribution of the Y's. Thus we say that the variable Y is stochastic in nature. The variable X, on the other hand, is non-stochastic: its values are kept fixed from sample to sample.
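The idea that X is fixed while Y has a probability distribution at each X can be illustrated with a small simulation. This is only a sketch: the parameter values, the income levels and the normally distributed disturbance are assumptions made for the illustration, not values from this unit.

```python
import random
import statistics

# Hypothetical population parameters, assumed only for this illustration.
ALPHA, BETA, SIGMA = 10.0, 0.8, 2.0

# X is non-stochastic: the same income levels are used in every sample.
X_FIXED = [50.0, 100.0, 150.0, 200.0]


def draw_sample(rng):
    """Return one sample of Y values, one for each fixed X."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in X_FIXED]


def simulate(n_samples=5000, seed=0):
    """Draw repeated samples and return the mean of Y at each fixed X."""
    rng = random.Random(seed)
    samples = [draw_sample(rng) for _ in range(n_samples)]
    # zip(*samples) groups together the Y values that share the same X.
    return [statistics.fmean(ys) for ys in zip(*samples)]


if __name__ == "__main__":
    for x, y_bar in zip(X_FIXED, simulate()):
        print(f"X = {x:6.1f}  mean of Y = {y_bar:8.3f}  "
              f"alpha + beta*X = {ALPHA + BETA * x:8.3f}")
```

Each individual Y differs from sample to sample because of the disturbance, but across many samples the mean of Y at each fixed X settles close to the corresponding point on the assumed population line.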
2.3 ESTIMATION
In the specification of the model described by eq. (2.2), the values of the parameters $\alpha$ and $\beta$ are not known; as a result, the population regression line is not known. When the values of $\alpha$ and $\beta$ are estimated, we obtain a sample regression line that serves as an estimate of the population regression line. If $\alpha$ and $\beta$ are estimated by $\hat{\alpha}$ and $\hat{\beta}$ respectively, then the sample regression line is given by
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \qquad \ldots(2.3)$$
Check Your Progress 1
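2.4 LEAST-SQUARES ESTIMATOR
The sample regression line is
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i,$$
where $\hat{\alpha}$ and $\hat{\beta}$ are estimates of the unknown parameters $\alpha$ and $\beta$, and $\hat{Y}_i$ is the estimated value of Y. The residual is the difference between the observed and estimated values of Y,
$$\hat{\varepsilon}_i = Y_i - \hat{Y}_i \qquad \ldots(2.4)$$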
The principle of least squares is to choose the values of $\hat{\alpha}$ and $\hat{\beta}$ that minimise the sum of squared deviations between the observed and estimated values of Y. The estimated equation will then be the best-fitting line by the least-squares criterion. We have therefore
$$\sum \hat{\varepsilon}_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$
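Now taking the partial derivatives of this sum of squares with respect to $\hat{\alpha}$ and $\hat{\beta}$ and equating them to zero, we have
$$\sum Y_i = N\hat{\alpha} + \hat{\beta} \sum X_i$$
and
$$\sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2$$
Now solving the above two normal equations, we can get the values of $\hat{\alpha}$ and $\hat{\beta}$:
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y,
$$x_i = X_i - \bar{X}$$
and
$$y_i = Y_i - \bar{Y}$$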
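The solution of the two normal equations can be sketched in a few lines of code. The function below uses the standard closed form: the slope is the sum of cross-products of deviations from the means divided by the sum of squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. The function name and the data are hypothetical and chosen only so that the result is easy to verify by hand.

```python
def ols_two_variable(xs, ys):
    """Return (alpha_hat, beta_hat) for the line Y = alpha + beta * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the sample means (lower-case x and y).
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    if sum_xx == 0:
        raise ValueError("X must vary across observations")
    beta_hat = sum_xy / sum_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

if __name__ == "__main__":
    alpha_hat, beta_hat = ols_two_variable(xs, ys)
    print("intercept:", alpha_hat, "slope:", beta_hat)
```

Because the illustrative points lie exactly on a line, the estimates reproduce that line's intercept and slope; with real data the fitted line only minimises the sum of squared residuals.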
Example 2.1
We can illustrate the application of the two-variable model discussed in the earlier section. The following is a numerical illustration of estimating a least-squares equation in which Y is savings and X is disposable income. We take the data for these variables as follows:
Having estimated the intercept and slope of the regression line, we can now construct the regression equation of savings on income as
Example 2.2
Let us take the relationship between consumption and income as follows:
Consumption (Y) (Rs)    Income (X) (Rs)
70                      80
65
2.5 COEFFICIENT OF DETERMINATION
Regression residuals can provide a useful measure of the fit between the estimated regression line and the data. A good regression equation is one that accounts for a large proportion of the variation in Y. The variation of Y about its mean is defined as
$$\text{variation}(Y) = \sum (Y_i - \bar{Y})^2$$
Our goal is to divide the variation of Y into two parts, the first accounted for by the regression equation and the second associated with the unexplained portion of the model.
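Consider the following identity, which holds for all observations:
$$(Y_i - \bar{Y}) = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}) \qquad \ldots(2.13)$$
The term on the left-hand side of the equals sign denotes the difference between the sample value of Y and the mean of Y. The first right-hand term gives the residual $\hat{\varepsilon}_i$, and the second right-hand term gives the difference between the predicted value of Y and the mean of Y.
To measure variation, we square both sides of eq. (2.13) and sum over all observations (1 to N),
$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) \qquad \ldots(2.14)$$
The last term in eq. (2.14) can be shown to be 0 by using two properties of least-squares residuals, $\sum \hat{\varepsilon}_i = 0$ and $\sum \hat{\varepsilon}_i X_i = 0$. Since $\hat{Y}_i - \bar{Y} = \hat{\beta}(X_i - \bar{X})$, the cross-product sum equals $\hat{\beta}\left(\sum \hat{\varepsilon}_i X_i - \bar{X} \sum \hat{\varepsilon}_i\right) = 0$. It follows that
$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2$$
that is, TSS = ESS + RSS, where TSS is the total sum of squares, ESS the error sum of squares and RSS the regression sum of squares. Then we define $r^2$ as
$$r^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}}$$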
Notice that in some textbooks on econometrics the acronym ESS denotes 'explained sum of squares' and RSS denotes 'residual sum of squares', so that $r^2$ is given by ESS/TSS. Be careful to ascertain how these two acronyms are defined. In this course we uniformly follow the notation that ESS stands for 'error sum of squares' and RSS for 'regression sum of squares'.
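The decomposition of the variation of Y, and the two residual properties it relies on, can be checked numerically. The following is a minimal sketch using this course's convention (ESS for the error sum of squares, RSS for the regression sum of squares); the function name and the five data points are hypothetical.

```python
def fit_and_decompose(xs, ys):
    """Fit Y = alpha + beta*X by least squares and split the variation of Y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
                / sum((x - x_bar) ** 2 for x in xs))
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    tss = sum((y - y_bar) ** 2 for y in ys)       # total variation of Y
    ess = sum(e ** 2 for e in residuals)          # error sum of squares
    rss = sum((f - y_bar) ** 2 for f in fitted)   # regression sum of squares
    return {"tss": tss, "ess": ess, "rss": rss,
            "r2": rss / tss, "residuals": residuals}


# Hypothetical data, not taken from the unit's examples.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

if __name__ == "__main__":
    out = fit_and_decompose(xs, ys)
    print("TSS =", out["tss"], "ESS =", out["ess"], "RSS =", out["rss"])
    print("r2 =", out["r2"])
    print("sum of residuals =", sum(out["residuals"]))
```

For these illustrative numbers the residuals sum to zero (up to rounding), they are uncorrelated with X, and the error and regression sums of squares add up to the total.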
Estimation of Two-variable
Regression Model
We have $\hat{\beta} = 0.099$. Now
$$r^2 = \frac{0.099 \times 15.3880}{1.8266} = 0.834$$
With $r^2 = 0.834$, we could say that over 83 per cent of the variation of Y (savings) about its mean value is accounted for by the relationship found.
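The arithmetic behind this value can be verified in a couple of lines. The variable names are illustrative, and the three figures are simply taken as printed in the text.

```python
beta_hat = 0.099           # slope reported in the text
numerator_term = 15.3880   # second factor in the numerator, as printed
denominator = 1.8266       # denominator, as printed

r_squared = beta_hat * numerator_term / denominator

if __name__ == "__main__":
    print(round(r_squared, 3))
```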
Example 2.4
Let us take an illustration of the relationship between coffee consumption and the average retail price of coffee. The hypothetical data are presented below. Our purpose is to fit the two-variable linear model.
Year    Coffee consumption per person per day (no. of cups) (Y)    Price of coffee (Rs) (X)
1980    2.57                                                       0.77
The results can be obtained by the formulae used in Examples 2.2 and 2.3 as follows:
$$\hat{Y}_i = 2.6911 - 0.4795 X_i \qquad \ldots(2.17)$$
and $r^2 = 0.6628$.
The interpretation of the estimated regression is as follows. If the average real retail price of coffee goes up by a rupee, the average consumption of coffee per day is expected to decrease by about half a cup. If the price of coffee were zero, the average per person consumption of coffee would be expected to be about 2.69 cups a day. The $r^2$ value means that about 66 per cent of the variation in per capita daily coffee consumption is explained by variation in the retail price of coffee.
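This reading of the intercept and slope can be sketched in code, using the coefficient values reported for eq. (2.17). The function name and the prices fed into it are hypothetical and serve only to show the constant absolute effect of a one-rupee price change.

```python
INTERCEPT = 2.6911   # estimated intercept reported for eq. (2.17)
SLOPE = -0.4795      # estimated slope reported for eq. (2.17)


def predicted_cups(price):
    """Predicted cups of coffee per person per day at a given price (Rs)."""
    return INTERCEPT + SLOPE * price


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the interpretation.
    for price in (0.0, 1.0, 2.0):
        print(f"price = {price:.2f}  predicted cups = {predicted_cups(price):.4f}")
```

Whatever the starting price, raising it by one rupee lowers the predicted consumption by the same amount, which is what a linear specification implies.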
2.6 INTRINSICALLY LINEAR MODELS
Taking logarithms of both sides of eq. (2.18), we get the following transformed equation:
$$\log Y = \log\alpha + \beta \log X + \varepsilon \qquad \ldots(2.20)$$
The relationship in eq. (2.20) is intrinsically linear because it is linear in the parameters $\log\alpha$ and $\beta$. We can therefore apply OLS to estimate these parameters. Because they are linear in the logarithms of the variables, such models are also called log-log, double-log, or log-linear models.
One attractive feature of this model, which has made it popular in applied work, is that the slope coefficient $\beta$ measures the elasticity of Y with respect to X, that is, the percentage change in Y for a 1 per cent change in X. Thus if Y represents the quantity of a commodity demanded and X its unit price, $\beta$ measures the price elasticity of demand. If the relationship between quantity demanded and price is as shown in Fig. 2.2 (a), the transformed equation shown in Fig. 2.2 (b) will then give the estimate of the price elasticity ($-\beta$).
[Figure 2.2 (a): quantity demanded plotted against price. Figure 2.2 (b): the transformed relationship $\log Y = \log\alpha - \beta \log X$, with log quantity plotted against log price.]
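That the slope of the log-log regression is the elasticity can be seen in a small sketch. It assumes the untransformed model has the multiplicative form Y = alpha * X**beta, with the error term left out so that the fit is exact. The parameter values, the prices and the function name are all hypothetical.

```python
import math

# Hypothetical constant-elasticity demand curve, assumed for illustration only.
ALPHA, BETA = 5.0, -0.5


def loglog_fit(xs, ys):
    """Fit log Y = log(alpha) + beta * log X by OLS.

    Returns (log_alpha_hat, beta_hat).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
    beta_hat = (sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly))
                / sum((a - lx_bar) ** 2 for a in lx))
    log_alpha_hat = ly_bar - beta_hat * lx_bar
    return log_alpha_hat, beta_hat


prices = [1.0, 2.0, 4.0, 8.0]
quantities = [ALPHA * p ** BETA for p in prices]

if __name__ == "__main__":
    log_alpha_hat, beta_hat = loglog_fit(prices, quantities)
    print("estimated elasticity:", round(beta_hat, 6))
    print("estimated alpha:", round(math.exp(log_alpha_hat), 6))
```

The ordinary least-squares formulas are applied unchanged; only the variables are replaced by their logarithms. The intercept estimates log(alpha), so alpha itself is recovered by taking the antilog.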
Example 2.5
We can refer to the coffee demand example for the data set and find the logarithmic values of the variables as follows:
Year    Coffee consumption per person per day (no. of cups) (Y)    Price of coffee (Rs) (X)    Log Y    Log X
$$\widehat{\log\alpha} = \overline{\log Y} - \hat{\beta}\,\overline{\log X} = \frac{\sum \log Y}{N} - \hat{\beta}\,\frac{\sum \log X}{N}$$
From this result we see that the price elasticity coefficient is -0.2541, implying that for a 1 per cent increase in the real price of coffee, the demand for coffee (as measured by cups of coffee consumed per day) decreases on average by about 0.25 per cent. Since the price elasticity value of 0.25 is less than 1 in absolute terms, we can say that the demand for coffee is price-inelastic.
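The statement that a 1 per cent price rise lowers demand by about 0.25 per cent can be checked against the exact change implied by a constant-elasticity model. This is a sketch under that assumption; the function name is hypothetical and the elasticity is the value reported in the text.

```python
ELASTICITY = -0.2541  # slope of the log-log model reported in the text


def pct_change_in_demand(pct_change_in_price, elasticity=ELASTICITY):
    """Exact percentage change in Y implied by a constant-elasticity model."""
    ratio = (1.0 + pct_change_in_price / 100.0) ** elasticity
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    print(f"{pct_change_in_demand(1.0):.4f} per cent")
    # Demand is price-inelastic when the elasticity is below 1 in absolute value.
    print("price-inelastic:", abs(ELASTICITY) < 1)
```

For small price changes the exact figure is very close to the elasticity itself, which is why the coefficient can be read directly as a percentage response.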
Now considering the results of the linear demand function (Example 2.4) and the log-linear demand function (Example 2.5), we may ask which model is better. Can we say that eq. (2.21) is better than eq. (2.17) because its $r^2$ value is higher (0.7456 vs. 0.6628)? Unfortunately, we cannot, for when the dependent variables of two models are not the same (here, log Y vs. Y), the two $r^2$ values are not directly comparable. We cannot directly compare the slope coefficients either. In eq. (2.17) the slope coefficient gives the constant absolute (i.e., not relative) decrease in coffee consumption resulting from a unit increase in the price of coffee, which is 0.4795 cups per day. The coefficient of -0.2541 obtained from eq. (2.21), on the other hand, gives the constant percentage decrease in coffee consumption resulting from a 1 per cent increase in the price of coffee.
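One common way of putting the two slopes on a comparable footing, which this unit does not develop, is to convert the linear slope into a point elasticity (slope times X divided by Y) at a chosen observation. The sketch below does this at the 1980 observation from Example 2.4 (X = 0.77, Y = 2.57). The function name is hypothetical, and the result depends on the point chosen, so it is an illustration and not a test between the two models.

```python
LINEAR_SLOPE = -0.4795        # slope reported for eq. (2.17)
LOGLOG_ELASTICITY = -0.2541   # slope reported for eq. (2.21)


def point_elasticity(slope, x, y):
    """Elasticity implied by a linear model at the point (x, y)."""
    return slope * x / y


if __name__ == "__main__":
    at_1980 = point_elasticity(LINEAR_SLOPE, x=0.77, y=2.57)
    print(f"linear model, elasticity at the 1980 point: {at_1980:.4f}")
    print(f"log-log model, constant elasticity:         {LOGLOG_ELASTICITY:.4f}")
```

In the linear model the implied elasticity changes from one point on the line to another, whereas in the log-log model it is the same everywhere.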
4) What do you mean by an intrinsically linear model? Can you compare the results of an intrinsically linear model with those of a linear model? Why or why not?
2.7 LET US SUM UP
In this unit we studied the basic framework of regression analysis. The reasons for including an error term in the regression model were described. We also studied the estimation of the coefficients $\alpha$ and $\beta$, which gives a sample regression line that serves as an estimate of the population regression line. The overall goodness of fit of the regression model is measured by the coefficient of determination, $r^2$. It tells us what proportion of the variation in the dependent variable is explained by the explanatory (independent) variable. The unit also discussed intrinsically linear models, which can be expressed in a form that is linear in the parameters by transforming the variables. A number of illustrative applications are presented in this unit to aid understanding.
2.8 KEY WORDS
Two-variable Model : An equation in which there are two variables, one dependent and the other independent, is called a two-variable model.
Error Term : The measurement of the functional relationships among variables in reality is not exact. For a given value of the independent variable X, we may observe many possible values of the dependent variable Y. To describe this situation formally, we add a random "error" term to the model.
Stochastic Nature : The relationship represented by an equation is said to be stochastic in nature if for every value of X (the independent variable) there exists a probability distribution of $\varepsilon$ and therefore a probability distribution of the Y's (the dependent variable).
Estimation : In the specification of the two-variable model, the values of the parameters $\alpha$ and $\beta$ are not known; as a result, the population regression line is not known. When the values of $\alpha$ and $\beta$ are estimated, we obtain a sample regression line that serves as an estimate of the population regression line. This is known as estimation.
Least-Squares : The principle of least squares is to choose the values of $\hat{\alpha}$ and $\hat{\beta}$ that minimise the sum of squared deviations between the observed and estimated values of Y.
Coefficient of Determination : The coefficient of determination indicates the proportion of the variation in Y explained by the variation in X.
Intrinsically Linear Models : If a model is non-linear but becomes linear after transformation of its variables, the model is said to be intrinsically linear.