

KENYATTA UNIVERSITY

DIGITAL SCHOOL

IN COLLABORATION WITH

SCHOOL OF ECONOMICS

DEPARTMENT: ECONOMETRICS AND STATISTICS

UNIT CODE AND TITLE: EES 401 – FUNDAMENTALS OF

ECONOMETRICS II

WRITTEN BY:

DR. ANGELICA NJUGUNA

MR. JACOB WANYONYI NATO

Copyright © Kenyatta University, 2015


All Rights Reserved
Published By:
KENYATTA UNIVERSITY PRESS
COURSE DESCRIPTION

INTRODUCTION

This course is the second part of the University Undergraduate Econometrics course, and
picks up from EES 400 – Fundamentals of Econometrics I.
The aim, therefore, is to develop students' knowledge and understanding of Econometrics
beyond what was covered in EES 400, and to motivate their desire to undertake further
studies in Econometrics.
You are therefore encouraged to work hard, give attention to each lecture in the module,
and examine how you may apply the concepts studied in your future career. Also, develop a
liking for quantitative techniques.

OBJECTIVES

Econometrics II introduces you to the concept of dummy variables. Dummy
variables are categorical variables used to capture qualitative characteristics in a model. They
may appear either as explanatory variables or as the dependent variable, and you will study
both cases in this module.
Simultaneous equation models will also be discussed, since some relationships require the
modelling of a system of equations rather than a single equation. Thereafter, you will be
introduced to time series Econometrics, and the key issues that arise when analysing time
series data. Finally, you will be introduced to panel data models, which are gaining much
attention in the field of Econometrics.

TABLE OF CONTENTS

LECTURE ONE: DUMMY INDEPENDENT VARIABLE MODELS
1.1 Introduction to dummy independent variables
1.2 Incorporating dummy variables in a regression model
1.3 Interaction effects of dummy variables (intercept and slope effects)

1.4 ANOVA and ANCOVA models
1.5 The dummy variable approach to the Chow test

LECTURE TWO: DUMMY DEPENDENT VARIABLE MODELS


2.1 Introduction to dummy dependent variables
2.2 The Linear Probability Model
2.3 The logit and probit model (Binary cases)
2.4 Marginal effects for Dummy dependent Variable models
2.5 Goodness of fit for dummy dependent variable models

LECTURE THREE: SIMULTANEOUS EQUATION MODELS


3.1 Introduction to simultaneous equations
3.2 Endogenous versus Exogenous variables
3.3 Structural equations and reduced form equations
3.4 The Simultaneity bias in OLS estimation
3.5 Identification of Simultaneous equations – Order and Rank conditions
3.6 Methods of estimating simultaneous equations

LECTURE FOUR: TIME SERIES ECONOMETRICS


4.1 Introduction to Time series Econometrics
4.2 Characteristics of time series
4.3 Data generating processes
4.4 Stationary and non-stationary series: Unit roots
4.5 Testing for stationarity of time series and remedial measures for non-
stationary series
4.6 Integrated time series
4.7 Cointegration and the Error-Correction Model

LECTURE FIVE: PANEL DATA ANALYSIS


5.1 Introduction to Panel Data
5.2 Merits and demerits of Panel Data

5.3 Types of panel data
5.4 Models for estimation of Panel Data
5.5 Fixed Effects and Random effects: The Hausman Test

ASSESSMENT

The course will be assessed as follows: Class Assignments (15%), Sit-in CAT (15%) and Final
semester Examination (70%).

RECOMMENDED TEXT BOOK

Students are encouraged to read widely on the subject; any textbook on Econometrics will be
useful. However, the textbooks by Damodar Gujarati, G. S. Maddala, and Stock and Watson will
form the main reference material for the course.

LECTURE ONE (1): DUMMY INDEPENDENT VARIABLE MODELS

INTRODUCTION

This chapter introduces you to the concept of dummy variables and how they can be
incorporated into a regression model. Its focus is on helping you appreciate why dummy
variables are useful, and the different effects that dummy variables can have in a model. The
various models for dummy variables will also be discussed, and thereafter how to test for
structural stability when dummy variables are incorporated in a model.

LECTURE OBJECTIVES

By the end of this lecture, you will be able to do the following:


(i) Explain the concept of dummy variables
(ii) Incorporate dummy variables in a regression model
(iii) Describe the interaction effects caused by dummy variables (intercept and slope)
(iv) Distinguish between ANOVA and ANCOVA models
(v) Test for structural stability in dummy variable analysis (the Chow Test)

1. INTRODUCTION TO DUMMY VARIABLES

A dummy variable is one that takes on discrete values only, that is, 0, 1, 2, 3, … rather than a
continuum of values. Thus, a binary dummy has only two possible values, 0 and 1, indicating
the absence and presence of a particular characteristic respectively.

Dummy variables are also known as categorical variables, discrete variables or binary variables.

In regression analysis, dummy variables are mainly used to capture qualitative attributes or
characteristics, such as gender, religion, employment status, marital status, political party
membership, and so on.

Since these variables cannot be measured numerically yet do influence the dependent variable in
a regression model, we need a way of capturing them in the model. This is usually done through
the process of CODING, in which we assign a specific numerical value to each particular
attribute. For example:

- Gender: male (0), female (1)


- Employment status: unemployed (0), employed (1)
- Voting in elections: no (0), yes (1), undecided (2)

- Political party membership: republican (0), democrat (1), others (2)
- Marital status: single (0), married (1), divorced (2), separated (3), widowed (4), etc.

Dummy variables thus sort the data into mutually exclusive categories, which makes them
useful for incorporating into a model:

(i) Qualitative characteristics,


(ii) Seasonal and regional analysis,
(iii) Occurrence of major events,
(iv) Changes in regimes, and so on.

2. INCORPORATING DUMMY VARIABLES IN A REGRESSION MODEL

Dummy variables are incorporated into regression models in the same way as quantitative
explanatory variables.

For example, consider the following model for salary of employees as a function of gender and
level of education:

Salary = 0 + 1 Gender + 2 Education +ut

Where: Salary is measured in thousands of Kenya shillings;

Gender = 1 if female, and 0 otherwise.

Education is measured by the number of years spent in school.

From the above regression model, we note that salary and education are quantitative variables,
whereas gender is a qualitative or dummy variable.

A binary dummy explanatory variable such as gender can take only two values, 0 and 1, as
shown. The category assigned the value 0 (i.e., male in this case) is not given its own dummy;
its effect is absorbed into the intercept. This omitted category is called the REFERENCE
CATEGORY or BASE CATEGORY or BENCHMARK CATEGORY, because it is the category
against which all comparisons are made.

On the other hand, the category given the value 1 (i.e., female in this case) enters the
regression model by CHANGING THE INTERCEPT OF THE REGRESSION MODEL. This is
demonstrated below:

I. DUMMY VARIABLE MEASURING CHANGE IN INTERCEPT

MALE REGRESSION: Salary = β0 + β1 (0) + β2 Education + ut

Salary = β0 + β2 Education + ut

FEMALE REGRESSION: Salary = β0 + β1 (1) + β2 Education + ut

Salary = (β0 + β1) + β2 Education + ut

The intercept for the male regression is β0 while that for the female regression is β0 + β1.
However, the slopes of the two regressions are equal (β2), so the regression lines for MALE
and FEMALE are parallel to each other. The diagram below illustrates the dummy variable
effect for a change in intercept:

Figure 1: dummy variable effect for change in intercept

Figure 1 above thus shows an intercept shift in salary between males and females. The difference
in salary between males and females is β1, holding education constant. Thus, the coefficient β1
helps us to determine whether there is any discrimination in salary between men and women.

Apart from capturing wage differences between male and female workers, dummy variables can
also be used to capture SEASONAL EFFECTS and also GEOGRAPHICAL LOCATIONS. The
four main seasons are: summer, autumn, winter and spring.

If we assume that spring is the reference or base season, then we can create the following dummy
variables to capture the remaining three seasons:

D1 = 1 if summer, 0 otherwise,

D2 = 1 if autumn, 0 otherwise, and

D3 = 1 if winter, 0 otherwise.

For the case of geographical locations, if we assume only three regions are present, say: north,
south and west, from which we want to study variations in teacher’s salaries, then we need to
define 2 dummy variables, and let one location be the reference category. For example, if we let
west to be the reference category, then we can define the following 2 dummies:

D1 = 1 if teacher is from the north, 0 otherwise

D2 = 1 if teacher is from the south, 0 otherwise.

In all the examples above, notice that we have always been omitting one category, i.e., male,
summer and west respectively, which are actually the benchmark categories.

The reason for such omission is to AVOID PERFECT MULTICOLLINEARITY among the
explanatory variables, a problem commonly known as the DUMMY VARIABLE TRAP. If we
fall into the dummy variable trap, the model cannot be estimated. Thus, the general rule is: if a
qualitative variable has k categories, introduce only (k − 1) dummies, letting the omitted
category serve as the reference category.

EXAMPLE

Consider the following model which shows teachers’ salaries as a function of their location and
government expenditure on schools:

Yi = 0 + 1 D1i + 2 D2i + 3Xi +ui

Where Yi is salary of a teacher in a public school measured in US dollars,

D1i is defined as: D1i = 1 if from the north, 0 otherwise

D2i is defined as: D2i = 1 if from the south, 0 otherwise

Xi is government expenditure on public schools

Assuming that we collect cross-sectional data on all the variables above and run the regression
equation, and say we obtain the following regression equation:

Yi = 13,269.11 – 1,673.514 D1i – 1,144.157 D2i + 3.289 Xi

This model can thus be interpreted as follows:

 A teacher in a public school from the west region (base category) will earn, on average,
about US $ 13,269.11, holding Xi constant.
 A teacher in a public school from the north will on average earn US dollars 1,673.514
less (notice the negative sign) than their counterpart who comes from the west. Thus,
holding Xi constant, the average salary of a teacher from the north will be US $ 13,269.11
– US $ 1,673.514 = US $ 11,595.596
 A teacher in a public school from the south will on average earn US dollars 1,144.157
less than their counterpart who comes from the west. Thus, holding Xi constant, the
average salary of a teacher from the south will be US $ 13,269.11 – US $ 1,144.157 = US
$ 12,124.953

 Since the salary of a teacher in the north is about US $ 11,595.596, while their
counterpart in the south earns about US $ 12,124.953, then we can also determine the
difference in salary between a teacher in the north and that in the south, by taking the
difference between their expected salaries as follows: US $ 12,124.953 – US $
11,595.596 = US $ 529. Thus, the average salary of a teacher in the north is lower than
that of a teacher in the south by about US $ 529, holding all other factors constant.
 A unit increase in government expenditure on public schools (Xi) will lead to an increase
in a typical public school teacher's salary of about US $ 3.289, holding all other factors
constant.
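The bullet-point arithmetic above can be checked directly from the fitted equation. A small sketch (coefficients copied from the regression reported above; Xi is held at zero purely so the regional comparison is clean):

```python
# Coefficients from the fitted teacher-salary equation reported in the text.
b0, b1, b2, b3 = 13269.11, -1673.514, -1144.157, 3.289

def expected_salary(d1, d2, x):
    """d1 = 1 if north, d2 = 1 if south; west is the base category."""
    return b0 + b1 * d1 + b2 * d2 + b3 * x

west = expected_salary(0, 0, 0)    # 13269.11 (base category)
north = expected_salary(1, 0, 0)   # 11595.596
south = expected_salary(0, 1, 0)   # 12124.953
# The north-south gap, south - north, is about 529, as derived in the text.
```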

Figure 2 below illustrates the change in teachers’ salaries due to their locations:

From the graph, the average salary of a teacher in the north is lower than that of a teacher in the
west by about US $ 1,673 (i.e., β1). On the other hand, the average salary of a teacher in the south
is lower than that of a teacher in the west by about US $ 1,144 (i.e., β2).

We can also argue that the average salary of a teacher in the north is lower than that of a teacher
in the south by about US $ 529 (i.e., β1 − β2). In all these instances, we have held Xi constant.

II. DUMMY VARIABLE MEASURING CHANGE IN SLOPE

EXAMPLE:

Consider the following model for salary of employees as a function of gender and education:

Salary = β0 + β1 Education + β2 (Gender × Education) + ut

Using this regression model, we can conclude as follows:

MALE REGRESSION: Salary = β0 + β1 Education + β2 (0 × Education) + ut

Salary = β0 + β1 Education + ut

FEMALE REGRESSION: Salary = β0 + β1 Education + β2 (1 × Education) + ut

Salary = β0 + (β1 + β2) Education + ut

Figure 3: dummy variable effect for change in slope

From the diagram, we note that the slope for males is β1 while the slope for females is (β1 + β2).
However, the two regression lines for male and female salaries share an equal intercept at β0.
This is what is called the interaction effect of a dummy variable used to measure a change in
slope. The difference in slope between males and females is β2.

III. DUMMY VARIABLE MEASURING CHANGE IN BOTH INTERCEPT AND SLOPE

A final interaction effect of dummy variables is used to measure CHANGE IN BOTH
INTERCEPT AND SLOPE. In this regard, we modify the regression equation as follows:

Salary = β0 + β1 Gender + β2 Education + β3 (Gender × Education) + ut

Using this regression model, we can conclude as follows:

MALE REGRESSION: Salary = β0 + β1 (0) + β2 Education + β3 (0 × Education) + ut

Salary = β0 + β2 Education + ut

FEMALE REGRESSION: Salary = β0 + β1 (1) + β2 Education + β3 (1 × Education) + ut

Salary = (β0 + β1) + (β2 + β3) Education + ut

Comparing the two regression equations for males and females, we notice a change in both
intercept and slope for male and female regression. This is demonstrated in figure four:

Figure 4: dummy variable effect for change in both intercept and slope

The intercept for males is β0 while that for females is (β0 + β1), so the difference in intercept is
β1. On the other hand, the slope for males is β2 while that for females is (β2 + β3), so the
difference in slope is β3.
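The general model can be coded to show how the two group equations fall out as special cases. This is an illustrative sketch with made-up coefficient values (not estimates from the text):

```python
def expected_salary(gender, education, b0, b1, b2, b3):
    """Model III: the gender dummy shifts both the intercept and the slope.

    gender is 0 for male (the reference category) and 1 for female.
    """
    return (b0 + b1 * gender) + (b2 + b3 * gender) * education

# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 10.0, 2.0, 3.0, 1.0

# With gender = 0 the function collapses to b0 + b2*education (male line);
# with gender = 1 it becomes (b0 + b1) + (b2 + b3)*education (female line).
male = expected_salary(0, 4, b0, b1, b2, b3)    # 10 + 3*4 = 22.0
female = expected_salary(1, 4, b0, b1, b2, b3)  # (10+2) + (3+1)*4 = 28.0
```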

The interaction effect for dummy variable to measure change in both intercept and slope can thus
be considered as more general.

ASSIGNMENT 1

(a) The quantity demanded for wheat flour in a particular household is defined as follows:

Qd = β0 + β1 P + β2 Y + β3 D + β4 DY + ui

Where: Qd is the quantity demanded of wheat flour in a year,

P is the price of wheat flour per bag,

Y is the income of the household, and

D is a dummy variable defined as D = 1 if the Christmas holiday month of December,

and 0 if any other month of the year (January to November),

ui is the error term whose expected value is zero.

(i) Holding price and income constant, what is the mean quantity demanded for wheat
flour, in:
 December
 Other month of the year
(ii) Draw diagrams to illustrate your answer in a (i) above
(iii) What type of interaction effect is depicted in the model given above?

(b) Now, assume that people's marginal propensity to consume (MPC) goes up in December,
so that the regression model is: Qd = β0 + β1 P + β2 Y + β3 YD + ui, where the variables are
defined as in (a) above;

(i) Holding price and income constant, what is the mean quantity demanded for wheat flour,
in:
 December
 Other month of the year
(ii) Draw diagrams to illustrate your answer in b (i) above
(iii) What type of interaction effect is depicted in the model given above?
(c) How would you modify the given equation to show a dummy variable measuring change
in intercept? Show the equation and demonstrate using a diagram.
3. ANOVA AND ANCOVA MODELS
(a) ANOVA MODELS

ANOVA stands for Analysis of Variance, while ANCOVA stands for Analysis of Covariance.

ANOVA is a regression model in which the dependent variable is quantitative in nature, but all
the explanatory variables are qualitative in nature (dummies).

There are two major types of ANOVA models:

(i) ANOVA model with one qualitative variable


(ii) ANOVA model with two qualitative variables

I. ANOVA MODEL WITH ONE QUALITATIVE VARIABLE

EXAMPLE 1:

An analyst wishes to study how teacher’s salaries vary in three key regions: North, South and
West. He collects his data and runs a regression model given as:

Yi = 0 + 1 D1i + 2 D2i +ui

Where: Yi is the salary of a teacher in a public school measured in dollars,

D1i is defined as D1i = 1 if from the North, 0 otherwise,

D2i is defined as D2i = 1 if from the South, 0 otherwise.

In this case, the West region has been treated as the reference category. Since this model has only
dummies as its regressors, it is an ANOVA model with one qualitative variable – the region a
teacher comes from – which has been divided into three (3) categories: West, North and South.

Thus, the average salary of a typical teacher in a public school is as follows:

- If the teacher is from the WEST: Yi = β0 + β1 (0) + β2 (0) + ui
  Thus, the expected salary is E(Yi) = β0 dollars
- If the teacher is from the NORTH: Yi = β0 + β1 (1) + β2 (0) + ui
  Thus, the expected salary is E(Yi) = (β0 + β1) dollars
- If the teacher is from the SOUTH: Yi = β0 + β1 (0) + β2 (1) + ui
  Thus, the expected salary is E(Yi) = (β0 + β2) dollars

Recall that the expected value of the error term is zero, E(ui) = 0.

The process of estimating the model and evaluating its properties is the same as for any
regression model; however, it is often convenient to use matrix algebra.
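A convenient way to see what the ANOVA coefficients mean: in a regression on mutually exclusive dummies, the intercept equals the mean of the reference group, and each dummy coefficient equals the gap between that group's mean and the reference mean. A small sketch with made-up salary figures (not data from the text):

```python
# Hypothetical salaries by region, for illustration only.
salaries = {"west": [30, 34], "north": [24, 26], "south": [27, 29]}

means = {region: sum(v) / len(v) for region, v in salaries.items()}

b0 = means["west"]                    # intercept = mean of the reference group
b1 = means["north"] - means["west"]   # coefficient on D1 (north dummy)
b2 = means["south"] - means["west"]   # coefficient on D2 (south dummy)
```

Running OLS on the dummies would reproduce exactly these group-mean differences, which is why ANOVA-by-regression and classical ANOVA give the same fitted values.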

EXAMPLE 2:

Consider the following data which shows sales in thousands of Kenya shillings for a sales person
as a function of their area of specialization while in college.

Sales in 000's of Kenya shillings   Area of specialization

48                                  Marketing
41                                  Marketing
29                                  Management
38                                  Finance
43                                  Marketing
33                                  Management
43                                  Finance
42                                  Marketing
37                                  Finance
46                                  Marketing
30                                  Management
46                                  Finance
35                                  Finance
37                                  Management
51                                  Marketing

Required:

(i) Taking finance as the reference category, regress sales on area of specialization
(ii) Interpret the results of the estimated regression model
(iii) What sales value will a sales person with finance option make?
(iv) Calculate R2 and adjusted R2
(v) Test for the overall significance of the model
(vi) Derive the variance-covariance matrix, and test for the statistical significance for each
parameter in the model

SOLUTION

(i) Taking Finance as the reference category, regress sales on area of specialization
- Since finance is the reference category, we shall omit it from the regression model in
order to avoid the “dummy variable trap” problem. Given 3 categories of specialization,
we shall introduce k – 1 = 3 – 1 = 2 dummies:
D1 = 1 if marketing specialization, 0 otherwise (management or finance)
D2 = 1 if management specialization, 0 otherwise (marketing or finance).
- Thus, we shall define the regression model as follows:
  Y = β0 + β1 D1 + β2 D2 + ui
  Sales = β0 + β1 Marketing + β2 Management + ui
- The table below shows the computed values to be used in the regression model:

Y    Area of specialization   D1   D2   D1²   D2²   D1D2   Y·D1   Y·D2   Y²

48   Marketing                1    0    1     0     0      48     0      2304
41   Marketing                1    0    1     0     0      41     0      1681
29   Management               0    1    0     1     0      0      29     841
38   Finance                  0    0    0     0     0      0      0      1444
43   Marketing                1    0    1     0     0      43     0      1849
33   Management               0    1    0     1     0      0      33     1089
43   Finance                  0    0    0     0     0      0      0      1849
42   Marketing                1    0    1     0     0      42     0      1764
37   Finance                  0    0    0     0     0      0      0      1369
46   Marketing                1    0    1     0     0      46     0      2116
30   Management               0    1    0     1     0      0      30     900
46   Finance                  0    0    0     0     0      0      0      2116
35   Finance                  0    0    0     0     0      0      0      1225
37   Management               0    1    0     1     0      0      37     1369
51   Marketing                1    0    1     0     0      51     0      2601
599  Summation (Σ)            6    4    6     4     0      271    129    24517

Number of observations (N) = 15, mean of Y = ΣY/N = 599/15 = 39.9333


- The formula for obtaining the regression coefficients (the OLS estimator) for the
  regression equation Y = β0 + β1 X1 + β2 X2 + u by using the matrix approach is:

  β̂ = (X'X)⁻¹(X'Y), that is, "X-transpose-X inverse, times X-transpose-Y"

- where β̂, (X'X) and (X'Y) are defined as follows:

       [ β0 ]           [ n     ΣX1     ΣX2   ]            [ ΣY   ]
  β̂ = [ β1 ] ; (X'X) = [ ΣX1   ΣX1²    ΣX1X2 ] ; (X'Y) =  [ ΣYX1 ]
       [ β2 ]           [ ΣX2   ΣX1X2   ΣX2²  ]            [ ΣYX2 ]

- However, the regression of interest to us is:

  Y = β0 + β1 D1 + β2 D2 + u

  so we slightly modify the formula by replacing X with D:

  β̂ = (D'D)⁻¹(D'Y)

- where (D'D) and (D'Y) are defined as follows:

           [ n     ΣD1     ΣD2   ]            [ ΣY   ]
  (D'D) =  [ ΣD1   ΣD1²    ΣD1D2 ] ; (D'Y) =  [ ΣYD1 ]
           [ ΣD2   ΣD1D2   ΣD2²  ]            [ ΣYD2 ]

- From the table of values computed above, we can now write:

  [ β0 ]   [ 15   6   4 ]⁻¹ [ 599 ]
  [ β1 ] = [  6   6   0 ]   [ 271 ]
  [ β2 ]   [  4   0   4 ]   [ 129 ]

- We therefore need to obtain the matrix inverse of (D'D), and this is done by computing:
  the minor, the cofactor, the determinant, the adjoint and finally the inverse of that matrix.

(a) MINOR OF (D'D)

- Minor of element a11 = 15 is (6 × 4) − (0 × 0) = 24
- Minor of element a12 = 6 is (6 × 4) − (4 × 0) = 24
- Minor of element a13 = 4 is (6 × 0) − (4 × 6) = −24
- Minor of element a21 = 6 is (6 × 4) − (4 × 0) = 24
- Minor of element a22 = 6 is (15 × 4) − (4 × 4) = 44
- Minor of element a23 = 0 is (15 × 0) − (4 × 6) = −24
- Minor of element a31 = 4 is (6 × 0) − (6 × 4) = −24
- Minor of element a32 = 0 is (15 × 0) − (6 × 4) = −24
- Minor of element a33 = 4 is (15 × 6) − (6 × 6) = 54

                    [  24   24  −24 ]
  Minor (D'D) =     [  24   44  −24 ]
                    [ −24  −24   54 ]

(b) COFACTOR OF (D'D)

- The cofactor of each element is (−1)^(i+j) × the corresponding element of the minor
  matrix, where i and j are the row and column of that element. In practice, this means
  applying the alternating sign pattern to the minor matrix.

                    [  24  −24  −24 ]
  Cofactor (D'D) =  [ −24   44   24 ]
                    [ −24   24   54 ]

(c) DETERMINANT OF (D'D)

- The determinant of a matrix can be found by LAPLACE EXPANSION along the first row:
  determinant = Σ aij cij = a11 c11 + a12 c12 + a13 c13, where aij is the element in the ith row
  and jth column of the parent matrix (D'D), and cij is the corresponding element of the
  cofactor matrix.
- From the parent matrix (D'D), a11 = 15, a12 = 6 and a13 = 4. From the cofactor matrix,
  c11 = 24, c12 = −24 and c13 = −24. Thus, using LAPLACE EXPANSION:
  determinant (D'D) = (15 × 24) + (6 × −24) + (4 × −24) = 360 − 144 − 96 = 120

(d) ADJOINT OF (D'D)

- The adjoint of a matrix is the transpose of its cofactor matrix, i.e., rows become columns
  and columns become rows.

                    [  24  −24  −24 ]
  Adjoint (D'D) =   [ −24   44   24 ]
                    [ −24   24   54 ]

- An important point to note: because (D'D) is symmetric, its minor and cofactor matrices
  are also symmetric. The adjoint is therefore identical to the cofactor matrix in this case.

(e) INVERSE OF (D'D)

- The inverse of a matrix is defined as: Inverse = (1/determinant) × Adjoint
- Thus:

                      [  24  −24  −24 ]
  (D'D)⁻¹ = (1/120)   [ −24   44   24 ]
                      [ −24   24   54 ]

THE OLS ESTIMATORS

- Recall the formula for the OLS estimators expressed earlier:

  [ β0 ]   [ 15   6   4 ]⁻¹ [ 599 ]
  [ β1 ] = [  6   6   0 ]   [ 271 ]
  [ β2 ]   [  4   0   4 ]   [ 129 ]

- Since we now know the inverse matrix, we simply substitute it into the equation. This
  yields:

  [ β0 ]             [  24  −24  −24 ] [ 599 ]   [ 39.8    ]
  [ β1 ] = (1/120) × [ −24   44   24 ] [ 271 ] = [  5.3667 ]
  [ β2 ]             [ −24   24   54 ] [ 129 ]   [ −7.55   ]

- Thus, the OLS estimators are: β̂0 = 39.8, β̂1 = 5.3667 and β̂2 = −7.55
- The estimated regression model is:
  Sales in 000's Kshs = 39.8 + 5.3667 Marketing − 7.55 Management
- Point to note: we have applied the matrix inversion procedure to obtain the solution
  values, but you may also use Cramer's Rule.
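The entire hand computation above can be verified numerically. A minimal sketch, assuming NumPy is available (the data are typed in from the table):

```python
import numpy as np

# Rebuilding the OLS estimate b = (D'D)^{-1} D'y for the sales example.
y = np.array([48, 41, 29, 38, 43, 33, 43, 42, 37, 46, 30, 46, 35, 37, 51],
             dtype=float)
areas = ["Marketing", "Marketing", "Management", "Finance", "Marketing",
         "Management", "Finance", "Marketing", "Finance", "Marketing",
         "Management", "Finance", "Finance", "Management", "Marketing"]

d1 = np.array([a == "Marketing" for a in areas], dtype=float)
d2 = np.array([a == "Management" for a in areas], dtype=float)
D = np.column_stack([np.ones(len(y)), d1, d2])  # intercept, D1, D2

b = np.linalg.inv(D.T @ D) @ (D.T @ y)
# b ≈ [39.8, 5.3667, -7.55], matching the hand computation above.
```

In practice `np.linalg.lstsq` or a regression library would be preferred over forming the explicit inverse, but the inverse mirrors the textbook derivation step for step.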

(ii) Interpret the results of the estimated regression model

- The expected sales value for an individual who specialized in finance is Kshs 39,800,
  holding other factors constant.
- An individual who specialized in marketing will on average make Kshs 5,366.7 more in
  sales than one who specialized in finance, holding other factors constant.
  Thus, a marketing graduate is expected to make Kshs 39,800 + Kshs 5,366.7 = Kshs 45,166.7
- An individual who specialized in management will on average make Kshs 7,550 less in
  sales than one who specialized in finance, holding other factors constant.
  Thus, a management graduate is expected to make Kshs 39,800 − Kshs 7,550 = Kshs 32,250

(iii) What sales value will a sales person with finance option make?
- The expected sales value for an individual who specialized in Finance is Kshs 39,800,
holding other factors constant.

(iv) Calculate R² and adjusted R²

- Goodness of fit is given by: R² = ESS/TSS, the explained sum of squares divided by the
  total sum of squares.
- The explained sum of squares is ESS = β̂'(D'Y) − nȲ², where Ȳ is the mean of Y,
  obtained as: Ȳ = ΣY/n = 599/15 = 39.9333
- Hence:

                                     [ 599 ]
  ESS = [ 39.8  5.3667  −7.55 ]  ×   [ 271 ]  −  15 × 39.9333²
                                     [ 129 ]

  ESS = (39.8 × 599) + (5.3667 × 271) + (−7.55 × 129) − 15 × 39.9333²
  ESS = 400.56

- The total sum of squares is TSS = ΣY² − nȲ²
  TSS = 24,517 − (15 × 39.9333²)
  TSS = 596.93
- R² = ESS/TSS = 400.56/596.93 = 0.6710 or 67.10%
- Thus, marketing and management specializations account for 67.10% of the total
  variation in sales, leaving about 32.90% of the variation to be explained by other factors.
- The formula for adjusted R² is: adjusted R² = 1 − (1 − R²)(N − 1)/(N − K)
  Thus, adjusted R² = 1 − (1 − 0.6710)(15 − 1)/(15 − 3)
  Adjusted R² = 1 − 0.3838 = 0.6162 or 61.62%

The adjusted R² takes the degrees of freedom into account; with more than one regressor it is
always less than the unadjusted R², as shown.
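The R² arithmetic above can be reproduced in plain Python, with fitted values taken from the estimated coefficients reported earlier:

```python
# A quick numerical check of the R-squared computation above.
y = [48, 41, 29, 38, 43, 33, 43, 42, 37, 46, 30, 46, 35, 37, 51]
areas = ["Marketing", "Marketing", "Management", "Finance", "Marketing",
         "Management", "Finance", "Marketing", "Finance", "Marketing",
         "Management", "Finance", "Finance", "Management", "Marketing"]

# Each area's fitted value from the estimated regression.
fitted = {"Finance": 39.8, "Marketing": 39.8 + 5.3667, "Management": 39.8 - 7.55}
y_bar = sum(y) / len(y)                             # 39.9333

ess = sum((fitted[a] - y_bar) ** 2 for a in areas)  # explained sum of squares
tss = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares

r2 = ess / tss                                      # ≈ 0.671
n, k = len(y), 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)           # ≈ 0.616
```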

(v) Test for the overall significance of the model

- The null and alternative hypotheses respectively are:

  H0: β1 = β2 = 0 (the model is not significant)
  HA: at least one of β1, β2 ≠ 0 (the model is significant)

- For the overall significance of the model, we use the F-ratio, defined as:

  F = [R²/(K − 1)] / [(1 − R²)/(N − K)]

  with (K − 1) degrees of freedom in the numerator, (N − K) degrees of freedom in the
  denominator, and α the level of significance.

  Thus, F = [0.6710/(3 − 1)] / [(1 − 0.6710)/(15 − 3)] = 0.3355/0.0274

  F calculated = 12.24

- F critical = F(K−1, N−K, α) = F(2, 12, 0.05) = 3.89
- The decision rule: if F calculated is greater than F critical, we reject the null hypothesis
  and conclude that the model is significant. If F calculated is less than F critical, we do
  not reject the null hypothesis, and conclude that the model is not significant.
- For our particular case, F calculated exceeds F critical, so we reject H0. Therefore, the
  model is significant.
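The F-ratio arithmetic can be reproduced in two lines, with the values taken from the computations above:

```python
r2, n, k = 0.6710, 15, 3

# F = [R^2/(k-1)] / [(1-R^2)/(n-k)]
f_calc = (r2 / (k - 1)) / ((1 - r2) / (n - k))
# f_calc ≈ 12.24, which exceeds the 5% critical value F(2, 12) = 3.89,
# so the null hypothesis of no overall significance is rejected.
```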

(vi) Derive the variance-covariance matrix, and test for the statistical significance of
each parameter in the model

- The formula for the variance-covariance (VAR-COV) matrix of the OLS estimator β̂ is:

  VAR-COV(β̂) = σ²(X'X)⁻¹

  where σ² is the error variance, estimated as: σ̂² = RSS/(n − k) = (TSS − ESS)/(n − k)

  For our problem, we replace X by D, so the formula becomes: VAR-COV(β̂) = σ̂²(D'D)⁻¹

- We already found TSS = 596.93 and ESS = 400.56. Thus:

  σ̂² = (TSS − ESS)/(n − k) = (596.93 − 400.56)/(15 − 3) = 16.3642

- Recall that:

                      [  24  −24  −24 ]
  (D'D)⁻¹ = (1/120)   [ −24   44   24 ]
                      [ −24   24   54 ]

- Therefore, the variance-covariance matrix of the OLS estimator β̂ becomes:

                                   [  24  −24  −24 ]   [  3.2728  −3.2728  −3.2728 ]
  VAR-COV(β̂) = (16.3642/120)  ×   [ −24   44   24 ] = [ −3.2728   6.0002   3.2728 ]
                                   [ −24   24   54 ]   [ −3.2728   3.2728   7.3639 ]

- The variance-covariance matrix is so named because the elements along the main
  diagonal are the VARIANCES, while the off-diagonal elements are the COVARIANCES:

                 [ var(β0)      cov(β0, β1)  cov(β0, β2) ]
  VAR-COV(β̂) =  [ cov(β0, β1)  var(β1)      cov(β1, β2) ]
                 [ cov(β0, β2)  cov(β1, β2)  var(β2)     ]

- Of main interest to us are the variances: var(β̂0) = 3.2728, var(β̂1) = 6.0002 and
  var(β̂2) = 7.3639.
- The square roots of the variances are the standard errors: se(β̂0) = √3.2728 = 1.809,
  se(β̂1) = √6.0002 = 2.45 and se(β̂2) = √7.3639 = 2.714.
- The t-ratio for each coefficient is obtained by dividing the coefficient by its standard
  error: t = β̂ / se(β̂)
- Hence, the complete regression results show the parameter estimates together with their
  standard errors and t-ratios:

  Sales in 000's Kenya shillings = 39.8 + 5.3667 Marketing − 7.55 Management
  Standard errors:                (1.809)   (2.45)            (2.714)
  Calculated |t|-ratios:           22.00     2.19              2.78

- The critical t-statistic is t(n − k, α/2) for a two-tailed test. Thus, the critical
  t-statistics at various levels of significance are as follows:
  - At the 1% level of significance, t-critical = t(12, 0.005) = 3.055
  - At the 5% level of significance, t-critical = t(12, 0.025) = 2.179
  - At the 10% level of significance, t-critical = t(12, 0.05) = 1.782

- The respective hypotheses are as follows:
 For intercept: H0: 0 = 0 against HA: 0  0

 For marketing : H0: 1 = 0 against HA: 1  0


 For management : H0: 2 = 0 against HA: 2  0

- The decision rule is: If t-calculated is greater than t-critical, we reject the null hypothesis.
However, if t-calculated is less than t-critical, we do not reject the null hypothesis.
- Therefore, if we compare t-calculated as shown in the complete regression model and t-
critical at various levels of significance, we can conclude as follows:
 The intercept is significant at 1% level of significance,
 The coefficient for marketing is significant at 5% level of significance, but not at
1% level of significance,
 The coefficient for management is significant at 5% level of significance, but is
not significant at 1% level of significance.
- Apart from the t ratios for testing for significance, we may also make use of the
probability (p) values that are usually reported when a regression equation is performed
using computer software.
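The standard-error and t-ratio arithmetic above can be checked in a few lines. The sketch below (Python; variable names are ours) recomputes the t-ratios from the variances on the diagonal of the variance-covariance matrix and compares them with the critical values quoted from the t-tables:

```python
import math

# Diagonal of the variance-covariance matrix (sales example in the text)
variances = {"intercept": 3.2728, "marketing": 6.0002, "management": 7.3639}
# Estimated coefficients of the sales regression
coefs = {"intercept": 39.8, "marketing": 5.3667, "management": -7.55}

t_ratios = {}
for name, var in variances.items():
    se = math.sqrt(var)                     # standard error = sqrt(variance)
    t_ratios[name] = abs(coefs[name]) / se  # |t| for a two-tailed test

# Critical values t(12, alpha/2) from the t-tables, as quoted in the text
t_critical = {0.01: 3.055, 0.05: 2.179, 0.10: 1.782}

# A coefficient is significant at level alpha when |t| > t-critical
significant_5pct = {name: t > t_critical[0.05] for name, t in t_ratios.items()}
```

Running this reproduces the t-ratios 22.00, 2.19 and 2.78, and confirms that marketing clears the 5% critical value (2.179) but not the 1% value (3.055).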

II. ANOVA MODEL WITH TWO QUALITATIVE VARIABLES

EXAMPLE: (For students to work out)

An analyst wishes to study how expenditure on travel by an individual varies with their gender
and employment status. They collect cross-sectional data for these variables and the data is as
follows:

Travel expenditure in dollars Gender Employment Status
40 Male Employed
31 Female Unemployed
18 Male Unemployed
19 Male Unemployed
47 Male Employed
27 Female Unemployed
26 Female Unemployed
17 Male Unemployed
43 Male Employed
49 Male Employed
15 Male Unemployed
25 Female Unemployed
29 Female Unemployed
20 Male Unemployed

41 Female Employed

Required:

(i) Taking gender as D1i = 1 if individual is female, but 0 otherwise; and employment
status as D2i = 1 if employed, but 0 otherwise; regress travel expenditure on gender
and employment status
(ii) Interpret the results of the estimated regression model
(iii) What expenditure value will a male unemployed person make?
(iv) Calculate R2 and adjusted R2
(v) Test for the overall significance of the model
(vi) Derive the variance-covariance matrix, and test for the statistical significance for each
parameter in the model

HINT: solve for this problem using the same approach as discussed in the previous example, and
discuss among yourselves.
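Without working the exercise for you, the mechanical first step (coding the dummies and accumulating the sums that enter the normal equations) can be sketched as follows in Python; the variable names are ours:

```python
# Travel-expenditure data from the table above: (expenditure, gender, status)
data = [
    (40, "Male", "Employed"), (31, "Female", "Unemployed"),
    (18, "Male", "Unemployed"), (19, "Male", "Unemployed"),
    (47, "Male", "Employed"), (27, "Female", "Unemployed"),
    (26, "Female", "Unemployed"), (17, "Male", "Unemployed"),
    (43, "Male", "Employed"), (49, "Male", "Employed"),
    (15, "Male", "Unemployed"), (25, "Female", "Unemployed"),
    (29, "Female", "Unemployed"), (20, "Male", "Unemployed"),
    (41, "Female", "Employed"),
]

# Dummy coding as specified in the question: D1 = 1 if female, D2 = 1 if employed
rows = [(y, 1 if g == "Female" else 0, 1 if s == "Employed" else 0)
        for y, g, s in data]

N       = len(rows)
sum_y   = sum(y for y, d1, d2 in rows)
sum_d1  = sum(d1 for y, d1, d2 in rows)       # number of females
sum_d2  = sum(d2 for y, d1, d2 in rows)       # number of employed
sum_d12 = sum(d1 * d2 for y, d1, d2 in rows)  # employed females
sum_yd1 = sum(y * d1 for y, d1, d2 in rows)
sum_yd2 = sum(y * d2 for y, d1, d2 in rows)
```

These sums fill the (X′X) and (X′Y) matrices exactly as in the worked example that follows.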

(b) ANCOVA MODELS

An ANCOVA model is a regression model in which the dependent variable is quantitative in
nature, while the explanatory variables are a mixture of qualitative variables (dummies) and
quantitative variables.

EXAMPLE:

Consider the following data which shows the amount of time in hours people spend watching the
television as a function of their age and gender:

Hours watching TV Gender Age of individual


0 Male 41
180 Male 19
360 Female 54
900 Male 22
0 Male 48
360 Female 52
3600 Female 24
630 Male 60
1440 Female 28
0 Male 58
360 Female 35
4680 Female 67
630 Female 30
1440 Female 21
90 Male 32

360 Male 33
5760 Female 39
720 Female 56
2160 Female 31
90 Male 57

In order to regress hours on gender and age, we can define the model as follows:

Yi = β0 + β1Di + β2Xi + ui

Where: Yi = the number of hours individual i spends watching TV,

Di = gender, such that Di = 1 if male, 0 otherwise,

Xi = the age of individual i in years.

Required:

(a) Regress hours against gender and age of individual and interpret the results of the model
(b) Compute R-squared and adjusted R-squared
(c) Derive the variance-covariance matrix and use it to obtain the standard errors of the
regression coefficients
(d) Test the statistical significance of:
(i) The regression parameters
(ii) The regression model

SOLUTION

(a) Regress hours against gender and age of individual and interpret the results of the
model
- Since female is the reference category, we shall omit it from the regression model.
- The regression model has already been defined as follows:
Y = β0 + β1D + β2X + u
Hours watching TV = β0 + β1 Gender + β2 Age
- The table below shows the computed values to be used in the regression model:

Hours Gender Age D D2 X2 XD YD YX Y2


(Y) (X)
0 Male 41 1 1 1681 41 0 0 0
180 Male 19 1 1 361 19 180 3,420 32,400
360 Female 54 0 0 2916 0 0 19,440 129,600
900 Male 22 1 1 484 22 900 19,800 810,000
0 Male 48 1 1 2304 48 0 0 0
360 Female 52 0 0 2704 0 0 18,720 129,600

3600 Female 24 0 0 576 0 0 86,400 12,960,000
630 Male 60 1 1 3600 60 630 37,800 396,900
1440 Female 28 0 0 784 0 0 40,320 2,073,600
0 Male 58 1 1 3364 58 0 0 0
360 Female 35 0 0 1225 0 0 12,600 129,600
4680 Female 67 0 0 4489 0 0 313,560 21,902,400
630 Female 30 0 0 900 0 0 18,900 396,900
1440 Female 21 0 0 441 0 0 30,240 2,073,600
90 male 32 1 1 1024 32 90 2,880 8,100
360 Male 33 1 1 1089 33 360 11,880 129,600
5760 Female 39 0 0 1521 0 0 224,640 33,177,600
720 Female 56 0 0 3136 0 0 40,320 518,400
2160 Female 31 0 0 961 0 0 66,960 4,665,600
90 Male 57 1 1 3249 57 90 5,130 8,100
23,760 (Σ) 807 9 9 36809 370 2250 953,010 79,542,000

Number of observations (N) = 20, mean of Y = ΣY/N = 23,760/20 = 1,188


- The formula for obtaining the regression coefficients (the OLS estimator) for the
regression equation Y = β0 + β1D + β2X + u by using the matrix approach is
β̂ = (X′X)⁻¹(X′Y), where X is the matrix of regressors (a column of ones, D and age)
and X′ is read "X-transpose".
- Here, β̂, (X′X) and (X′Y) are defined as follows:

β̂ = | β̂0 |     X′X = | N    ΣD    ΣX   |     X′Y = | ΣY  |
    | β̂1 |           | ΣD   ΣD²   ΣDX  |           | ΣYD |
    | β̂2 |           | ΣX   ΣDX   ΣX²  |           | ΣYX |

(Note that ΣD² = ΣD, since D takes only the values 0 and 1.)

- By substituting the values from the table above, we obtain the following expression:

| β̂0 |   | 20    9    807   |⁻¹ | 23760  |
| β̂1 | = | 9     9    370   |   | 2250   |
| β̂2 |   | 807   370  36809 |   | 953010 |

- We therefore need to obtain the matrix inverse of (X′X), and this is shown below:

MINORS OF (X′X)

- Minor of element a11 = 20 is (9 × 36809) − (370 × 370) = 194,381
- Minor of element a12 = 9 is (9 × 36809) − (370 × 807) = 32,691
- Minor of element a13 = 807 is (9 × 370) − (9 × 807) = −3,933
- Minor of element a21 = 9 is (9 × 36809) − (370 × 807) = 32,691
- Minor of element a22 = 9 is (20 × 36809) − (807 × 807) = 84,931
- Minor of element a23 = 370 is (20 × 370) − (9 × 807) = 137
- Minor of element a31 = 807 is (9 × 370) − (9 × 807) = −3,933
- Minor of element a32 = 370 is (20 × 370) − (9 × 807) = 137
- Minor of element a33 = 36809 is (20 × 9) − (9 × 9) = 99

Therefore, minor(X′X) = | 194,381    32,691   −3,933 |
                        |  32,691    84,931      137 |
                        |  −3,933       137       99 |

COFACTOR OF (X′X)

- The cofactor of element aij is given by (−1)^(i+j) × its minor; that is, the minors are given
alternating signs in a checkerboard pattern.

Thus, cofactor(X′X) = | 194,381   −32,691   −3,933 |
                      | −32,691    84,931     −137 |
                      |  −3,933      −137       99 |

DETERMINANT OF (X′X)

- The determinant of a matrix can be found by LAPLACE EXPANSION along the first row as
follows: determinant = Σ aij·cij = a11c11 + a12c12 + a13c13, where aij is the element in the ith
row and jth column of the parent matrix (X′X), while cij is the element in the ith row and jth
column of the cofactor matrix.
- From the parent matrix (X′X), we notice that a11 = 20, a12 = 9 and a13 = 807. From the
cofactor matrix, we notice that c11 = 194,381, c12 = −32,691 and c13 = −3,933.
Thus, using LAPLACE EXPANSION, the determinant of (X′X) is:
det(X′X) = (20 × 194,381) + (9 × −32,691) + (807 × −3,933) = 419,470

ADJOINT OF (X′X)

- The adjoint of a matrix is obtained by transposing the cofactor matrix, i.e., rows become
columns and columns become rows. Since the cofactor matrix here is symmetric, it is equal
to its adjoint.

Thus, adjoint(X′X) = | 194,381   −32,691   −3,933 |
                     | −32,691    84,931     −137 |
                     |  −3,933      −137       99 |

INVERSE OF (X′X)

- The inverse of a matrix is defined as: Inverse = (1/determinant) × Adjoint

Thus, (X′X)⁻¹ = (1/419,470) × | 194,381   −32,691   −3,933 |
                              | −32,691    84,931     −137 |
                              |  −3,933      −137       99 |

THE OLS ESTIMATES

- Recall the formula for obtaining the OLS estimates expressed earlier:

| β̂0 |   | 20    9    807   |⁻¹ | 23760  |
| β̂1 | = | 9     9    370   |   | 2250   |
| β̂2 |   | 807   370  36809 |   | 953010 |

- Since we now know the inverse matrix, we simply substitute it into the equation. This yields:

| β̂0 |                 | 194,381   −32,691   −3,933 | | 23760  |   | 1,899.42  |
| β̂1 | = (1/419,470) × | −32,691    84,931     −137 | | 2250   | = | −1,707.41 |
| β̂2 |                 |  −3,933      −137       99 | | 953010 |   | 1.41      |

- Thus, the OLS estimates are: β̂0 = 1,899.42, β̂1 = −1,707.41 and β̂2 = 1.41
- The regression model is: Hours watching TV = 1899.42 − 1707.41 Gender + 1.41 Age
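The whole matrix computation (minors, cofactors, adjoint, determinant, inverse, and the final product) can be reproduced in a short script. The sketch below (Python; function and variable names are ours) inverts the 3×3 matrix by the same cofactor-and-adjoint route used in the text:

```python
# OLS via the normal equations, beta-hat = (X'X)^-1 (X'Y), for the TV-hours
# example. The sums come from the worked table above.
XtX = [[20, 9, 807],
       [9, 9, 370],
       [807, 370, 36809]]
XtY = [23760, 2250, 953010]

def inverse_3x3(m):
    """Invert a 3x3 matrix via cofactors and the adjoint, as in the text."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]  # Laplace expansion
    # inverse = adjoint / determinant, where adjoint = transpose of cofactors
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]

inv = inverse_3x3(XtX)
beta = [sum(inv[r][c] * XtY[c] for c in range(3)) for r in range(3)]
```

This recovers the estimates 1,899.42, −1,707.41 and 1.41 to two decimal places.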

Interpret the results of the estimated regression model

- The intercept (1,899.42) is, formally, the expected number of hours spent watching TV by a
female (the reference category) of age zero; it mainly serves to anchor the regression line.
- Males spend about 1,707.41 fewer hours watching TV than their female counterparts,
holding age constant.
- An increase in an individual's age by one year increases the number of hours they spend
watching TV by about 1.41 hours, holding gender constant.

(b) Compute R² and adjusted R²

- Goodness of fit (R²) is given by: R² = explained sum of squares (ESS) / total sum of squares (TSS)
- The explained sum of squares is ESS = β̂′(X′Y) − N·Ȳ², where Ȳ is the mean of Y,
obtained as: Ȳ = ΣY/N = 23,760/20 = 1,188
- Hence:
ESS = (1899.42 × 23760) + (−1707.41 × 2250) + (1.41 × 953010) − (20 × 1188²)
ESS = 14,405,410.8
- The total sum of squares is TSS = ΣY² − N·Ȳ²
TSS = 79,542,000 − (20 × 1188²)
TSS = 51,315,120

- R² = ESS/TSS = 14,405,410.8 / 51,315,120 = 0.2807
- Thus, gender and age of an individual account for 28.07% of the total variation in hours
watching TV, leaving about 71.93% of the variation to be explained by other factors.
- The formula for adjusted R² is: adjusted R² = 1 − (1 − R²) × (N − 1)/(N − K)
Thus, adjusted R² = 1 − (1 − 0.2807) × (20 − 1)/(20 − 3)
Adjusted R² = 1 − 0.8039 = 0.1961 or 19.61%

Since adjusted R-squared takes into account the degrees of freedom, it is smaller than the
unadjusted R-squared whenever the model estimates more than one parameter.
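The R² arithmetic above can be reproduced directly from the sums in the worked table. A minimal sketch (Python; coefficients rounded exactly as in the text):

```python
# R-squared and adjusted R-squared for the TV-hours model, reproducing the
# hand calculation above.
beta = [1899.42, -1707.41, 1.41]
XtY = [23760, 2250, 953010]
N, K = 20, 3
sum_y, sum_y2 = 23760, 79542000

y_bar = sum_y / N
ess = sum(b * v for b, v in zip(beta, XtY)) - N * y_bar ** 2  # explained SS
tss = sum_y2 - N * y_bar ** 2                                 # total SS
r2 = ess / tss
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - K)
```

Both figures match the hand computation: R² ≈ 0.2807 and adjusted R² ≈ 0.1961.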

(c) Derive the variance-covariance matrix, and use it to obtain the standard errors of
the regression coefficients
- The formula for the variance-covariance (VAR-COV) matrix of the OLS estimator β̂ is
given by:
VAR-COV(β̂) = σ̂²(X′X)⁻¹
where σ̂² is the estimated variance of the error term, given as: σ̂² = RSS/(n − k) = (TSS − ESS)/(n − k)
- We already found the values for TSS and ESS as: TSS = 51,315,120 and ESS = 14,405,410.8. Thus:
σ̂² = (51,315,120 − 14,405,410.8)/(20 − 3) = 2,171,159.365
- Recall that:
(X′X)⁻¹ = (1/419,470) × | 194,381   −32,691   −3,933 |
                         | −32,691    84,931     −137 |
                         |  −3,933      −137       99 |
- Therefore, the variance-covariance matrix for the OLS estimator β̂ becomes:
VAR-COV(β̂) = σ̂²(X′X)⁻¹ = (2,171,159.365/419,470) × | 194,381   −32,691   −3,933 |
                                                     | −32,691    84,931     −137 |
                                                     |  −3,933      −137       99 |

VAR-COV(β̂) = | 1,006,108.281   −169,207.31   −20,357.05 |
              |   −169,207.31    439,599.46      −709.11 |
              |    −20,357.05       −709.11       512.42 |

- The variances of β̂0, β̂1 and β̂2 respectively are: var(β̂0) = 1,006,108.281,
var(β̂1) = 439,599.46 and var(β̂2) = 512.42
- The standard errors are the square roots of the variances: se(β̂0) = √1,006,108.281 = 1,003.05,
se(β̂1) = √439,599.46 = 663.02 and se(β̂2) = √512.42 = 22.64
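The residual variance, standard errors and t-ratios can be checked numerically. A minimal sketch (Python; variable names ours), using the determinant and the diagonal of the adjoint from the matrix work above:

```python
import math

# Standard errors and t-ratios for the TV-hours model, from the diagonal of
# sigma^2 (X'X)^-1.
tss, ess = 51315120, 14405410.8
n, k = 20, 3
det = 419470
adj_diag = [194381, 84931, 99]   # diagonal of adj(X'X)
beta = [1899.42, -1707.41, 1.41]

sigma2 = (tss - ess) / (n - k)                    # residual variance
variances = [sigma2 * a / det for a in adj_diag]  # var(beta_j)
std_errors = [math.sqrt(v) for v in variances]
t_ratios = [b / se for b, se in zip(beta, std_errors)]
```

This reproduces σ̂² = 2,171,159.365 and the t-ratios 1.8936, −2.5752 and 0.0623 to the precision quoted in the text.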

(d) Test the statistical significance of:


(i) The regression coefficients
- The t-ratios are obtained by dividing each coefficient by its respective standard error,
i.e., t-ratio = β̂ / se(β̂).
- Hence, the complete regression model is a model that shows the parameters for each
variable, the standard errors, as well as their t-ratios, as follows:
Hours watching TV = 1899.42 − 1707.41 Gender + 1.41 Age
Standard error:        (1,003.05)   (663.02)   (22.64)
Calculated t-ratios:    1.8936       2.5752     0.0623
- The critical t-statistic is t(n − k, α/2) for a two-tailed test. Thus, the critical t-statistics at
various levels of significance are as follows:
 At the 1% level of significance, t-critical = t(20 − 3, 0.01/2) = t(17, 0.005) = 2.898
 At the 5% level of significance, t-critical = t(20 − 3, 0.05/2) = t(17, 0.025) = 2.110
 At the 10% level of significance, t-critical = t(20 − 3, 0.10/2) = t(17, 0.05) = 1.740
- The respective hypotheses are as follows:
 For the intercept: H0: β0 = 0 against HA: β0 ≠ 0
 For gender: H0: β1 = 0 against HA: β1 ≠ 0
 For age: H0: β2 = 0 against HA: β2 ≠ 0
- The decision rule is: if the absolute value of t-calculated is greater than t-critical, we reject
the null hypothesis; otherwise, we do not reject the null hypothesis.
- Therefore, if we compare t-calculated as shown in the complete regression model with t-
critical at the various levels of significance, we can conclude as follows:
 The intercept is significant at the 10% level of significance, but not at the 5% level,
 The coefficient for gender is significant at the 5% level of significance, but not at the
1% level,
 The coefficient for age is not significant at any of the levels considered.

(ii) Test for the overall significance of the model

- For the overall significance of the model, we use the F-ratio, which is defined as:

F = [R²/(K − 1)] / [(1 − R²)/(N − K)]

Thus, F = [0.2807/(3 − 1)] / [(1 − 0.2807)/(20 − 3)] = 0.14035/0.04231
F calculated = 3.317

- F critical = F(k − 1, n − k, α) = F(3 − 1, 20 − 3, 0.05) = F(2, 17) at 5% = 3.59


- The null and alternative hypotheses respectively are:

H0: β1 = β2 = 0 (the slope coefficients are jointly zero; the model is not significant)
HA: at least one of β1, β2 ≠ 0 (the model is significant)

- The decision rule is: if F calculated is greater than F critical, we reject the null hypothesis
and conclude that the model is significant. On the other hand, if F calculated is less
than F critical, we do not reject the null hypothesis, and conclude that the model is not
significant.

- For our particular case, F calculated is less than F critical, thus do not reject H 0.
Therefore, the model is not significant.
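The F-ratio above is a one-liner to verify. A minimal sketch (Python; variable names ours):

```python
# Overall F-test for the TV-hours model: F = [R^2/(K-1)] / [(1-R^2)/(N-K)]
r2, n, k = 0.2807, 20, 3

f_calc = (r2 / (k - 1)) / ((1 - r2) / (n - k))
f_crit_5pct = 3.59  # F(2, 17) at the 5% level, from the F tables

model_significant = f_calc > f_crit_5pct  # False: do not reject H0
```

The calculated F of about 3.317 falls just short of the 5% critical value of 3.59, matching the conclusion in the text.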

4. THE DUMMY VARIABLE APPROACH TO THE CHOW TEST

The Chow test is used to test for the equivalence, and hence stability, of two separate regressions,
i.e., to test whether the coefficients of the separate regressions are the same or not.

The following are the steps in conducting the chow test for stability:

(1) Run the pooled regression model and obtain the residual sum of squares (RSSP). The
pooled regression model is a single regression equation estimated over the observations
of both groups combined.
For example, run the model: salary = β0 + β1 Gender + β2 Age over both males and
females together, and obtain the pooled residual sum of squares.
(2) Run two other separate regressions for each group separately and obtain the residual sum
of squares for the separate groups respectively, RSS 1 and RSS2. For example, we could
run two separate regression models: one for females only and get RSS F and the other for
males only and get RSSM.
(3) The test statistic follows an F distribution and is defined by the formula:

F = { [RSSP − (RSSM + RSSF)] / K } / { (RSSM + RSSF) / (N − 2K) }

Where: RSSP is the residual sum of squares of the pooled regression model,
RSSM is the residual sum of squares for the regression model for males only,
RSSF is the residual sum of squares for the regression model for females only,
N is the sample size and K is the number of regression coefficients in the pooled
regression model.
(4) The critical F statistic is obtained from the F tables with K degrees of freedom for the
numerator and N − 2K degrees of freedom for the denominator, i.e., F(K, N − 2K) at
various levels of significance.
(5) Finally, we compare the calculated F value against the critical F value and make a
decision. If the calculated F value exceeds the critical F value, we reject the null
hypothesis that the coefficients are the same in the two groups, and conclude that there
is value in splitting the sample rather than using the pooled regression.

EXAMPLE ON CHOW TEST:

Consider the following data which shows the time in hours which people spend watching TV
annually, as a function of their Age and Gender:

Hours watching TV Gender Age of individual


0 Male 41
180 Male 19
360 Female 54
900 Male 22
0 Male 48
360 Female 52
3600 Female 24
630 Male 60
1440 Female 28
0 Male 58
360 Female 35
4680 Female 67
630 Female 30
1440 Female 21
90 Male 32
360 Male 33
5760 Female 39
720 Female 56
2160 Female 31
90 Male 57

The following are the three regression results for the pooled regression, regression with males
only, and regression with females only, respectively.

Pooled regression: Hours watching TV = 1899.42 – 1707.41Gender + 1.41 Age

RSSP = 36,909,243.1

Males regression: Hours watching TV = 494.9897 – 5.959209 Age

RSSm = 753,532.764

Females regression: Hours watching TV = 1651.001 + 7.6636 Age

RSSf = 35,960,451.2

Hence, the calculated F statistic is:

F = { [36,909,243.1 − (753,532.764 + 35,960,451.2)] / 3 } / { (753,532.764 + 35,960,451.2) / (20 − 6) }
F = 0.0248
The critical F statistic at the 5% level of significance is given as: F (3, 14) = 3.34. Since
calculated F is less than critical F, we do not reject the null hypothesis. Thus, we conclude that
there is no value in splitting the sample, and use the pooled regression model instead.
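The Chow statistic above can be verified with the reported residual sums of squares. A minimal sketch (Python; variable names ours):

```python
# Chow test for the pooled vs. separate TV-hours regressions, using the
# residual sums of squares reported above.
rss_pooled = 36909243.1
rss_males = 753532.764
rss_females = 35960451.2
n, k = 20, 3  # k = number of coefficients in the pooled model

rss_groups = rss_males + rss_females
f_calc = ((rss_pooled - rss_groups) / k) / (rss_groups / (n - 2 * k))

f_crit_5pct = 3.34  # F(3, 14) at the 5% level, from the F tables
split_sample = f_calc > f_crit_5pct  # False: keep the pooled regression
```

The tiny F of 0.0248 is far below the critical value, confirming there is no gain from splitting the sample.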

TOPIC 2: DUMMY DEPENDENT VARIABLE MODELS

INTRODUCTION

Many regression models have dummy independent variables, i.e., a model in which one or more
of the independent variable(s) is a dummy variable – takes value of 0 or 1.

However, there are also some regression models in which the dependent variable is treated as a
dummy or dichotomous in nature. Such models are known as dummy dependent variable
models, or qualitative dependent variable models or limited dependent variable models.

There are four main dummy dependent variable models:

(i) The linear probability model


(ii) The logit model
(iii) The probit model
(iv) The tobit model

THE LINEAR PROBABILITY MODEL

A linear probability model (LPM) is a regression model of the dependent variable (a dummy)
against a set of independent variables.

For example, if we assume an OLS regression model such as: Di = β0 + β1X1i + β2X2i + ui,
where Di = 1 if a particular attribute is present, and 0 otherwise, the Xi are the independent
variables, β is the vector of regression coefficients, and ui is the error term, then this equation is
a linear probability model.

The term linear probability model comes from the fact that the right-hand side of the equation is
linear, while the expected value of the dependent variable is interpreted as a probability.

Taking a specific case for Di as follows: Di = 1 if an individual is female, 0 otherwise, then the
expected value of the left-hand side measures the probability that Di = 1, i.e., that an individual
is female. Thus, D̂i measures the probability that Di = 1 for the ith observation, and is given as:

D̂i = Pr(Di = 1) = β̂0 + β̂1X1i + β̂2X2i

Interpreting the coefficients of a linear probability model

Since an LPM is such that the dependent variable is a dummy, we must be careful when
interpreting the coefficients of an LPM.

Consider the following LPM: D̂i = Pr(Di = 1) = β̂0 + β̂1X1i + β̂2X2i

Since D̂i measures the probability that Di = 1, a coefficient in an LPM tells us the
percentage-point change in the probability that Di = 1 caused by a one-unit increase in the
independent variable concerned, holding constant the other independent variables in the
equation.

Example of a Linear Probability Model

A study of the labor force participation of women was done and the resulting regression
equation given as:

D̂i = Pr(Di = 1) = −0.28 − 0.38Mi + 0.09Si
                           (0.15)  (0.03)

N = 30, R² = 0.32, R²P = 0.81

Where: Di = 1 if the ith woman is in the labor force, and 0 otherwise,

Mi = marital status, defined as Mi = 1 if married, 0 otherwise,

Si = number of years of schooling of the ith woman.

(The figures in parentheses are the standard errors of the coefficients on Mi and Si, and R²P is
the proportion of cases correctly predicted.)

We can now interpret the regression results of this model as follows:

- The probability of a woman participating in the labor force falls (notice the negative sign)
by 38 percentage points if she is married, holding schooling constant.
- In addition, each year of schooling increases the probability of female labor force
participation by 9 percentage points, holding constant marital status. Notice that these
results are consistent with theory.
- Marital status and years of schooling explain about 32 percent of the variation in the
probability that a woman participates in the labor force.
- The model "correctly" predicts 81 percent of the cases: i.e., for 81 out of 100 women, the
participation status predicted by the model matches their actual status.

Advantages of the linear probability model

(i) The LPM is easy to estimate, as it utilizes the OLS methodology

(ii) The results of the LPM are easy to interpret

Problems or limitations of the linear probability model

1. D̂i is not bounded by 0 and 1

Since Di is a dummy variable, we would expect D̂i to be limited to the range 0 to 1, i.e.,
0 ≤ D̂i ≤ 1. This is the meaningful range.

However, depending on the values of the X's and the β's, the right-hand side might well be
outside the 0 to 1 range. For example, in the model we have just discussed:
D̂i = Pr(Di = 1) = −0.28 − 0.38Mi + 0.09Si, let us assume the case of a woman who is not
married (Mi = 0) and has no schooling at all (Si = 0). The probability that she will participate
in the labor force will be:

D̂i = Pr(Di = 1) = −0.28 − 0.38(0) + 0.09(0) = −0.28

This is a meaningless result, since we expect 0 ≤ D̂i ≤ 1. On the other hand, it is also possible
for D̂i to go beyond the value 1. The diagram below illustrates the unbounded nature of
Pr(Di = 1).

To get around this failure to adhere to 0 ≤ D̂i ≤ 1, two practices have been introduced:

(a) To use D̂i = 0.5 as the value that distinguishes a prediction of Di = 1 from Di = 0, so that
if D̂i > 0.5, we predict that Di = 1, and if D̂i < 0.5, we predict that Di = 0. However,
this practice is not a must.

(b) Another possible way is to simply ignore the problem, so that we assign D̂i = 1 to all
values of D̂i above 1 and D̂i = 0 to all negative values.
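Practice (b) amounts to a one-line clipping rule. The sketch below (Python; the function name is ours) applies it to the labor-force LPM above, and shows two fitted values that fall outside the [0, 1] range:

```python
# Fitted probabilities from the labor-force LPM:
#   D_hat = -0.28 - 0.38*M + 0.09*S
# Practice (b): clip fitted values into the meaningful [0, 1] range.
def lpm_prob(married, schooling):
    d_hat = -0.28 - 0.38 * married + 0.09 * schooling
    return max(0.0, min(1.0, d_hat))  # clip to [0, 1]

raw_unschooled = -0.28 - 0.38 * 0 + 0.09 * 0   # -0.28: below 0
raw_graduate = -0.28 - 0.38 * 0 + 0.09 * 16    # about 1.16: above 1
```

After clipping, the unmarried unschooled woman gets probability 0 and the unmarried woman with 16 years of schooling gets probability 1, while in-range cases (e.g. a married woman with 12 years of schooling, 0.42) are unaffected.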

2. The error term in a linear probability model is not homoskedastic (it does not have a
constant variance), yet OLS assumes error term has a constant variance.
- To show that the variance of the error term is not constant, note that ui takes the value
(1 − Pi) with probability Pi (when Di = 1) and the value (−Pi) with probability (1 − Pi)
(when Di = 0). Hence var(ui) = Pi(1 − Pi)² + (1 − Pi)Pi² = Pi(1 − Pi). We notice that the
variance of the error term depends on the value of the probability Pi; it is not constant.
- Since the error term of the LPM is heteroskedastic, we use the heteroskedasticity-robust
OLS standard errors for confidence intervals and hypothesis testing. Also, we can
transform the model to remove the problem.

3. The error term does not have a normal distribution; instead, it follows a binomial
(Bernoulli) distribution, yet OLS assumes that the error term is normally distributed. The
error term has a distribution which depends on Pi. However, the lack of normality is not a
big problem as far as estimation of the parameters is concerned, because a binomial
distribution approaches the normal distribution as the sample size increases; normality
does, however, matter when performing hypothesis testing.

4. It is not only the error term that fails to have a normal distribution. The same problem
applies to the dependent variable, which also has a Bernoulli distribution, since it too
depends on Pi, as shown below:

Yi      Probability           Yi × Probability
0       1 − Pi                0 × (1 − Pi) = 0
1       Pi                    1 × Pi = Pi
Total   (1 − Pi) + Pi = 1     E(Yi) = 0 + Pi = Pi

Thus, the expected value of the dependent variable depends on Pi, and so Yi cannot be
normally distributed.

5. R² is no longer an accurate measure of goodness of fit under the LPM, and indeed under
any dummy-dependent-variable model. It becomes questionable and cannot be relied
upon. In practice, it is difficult to get an R² much higher than about 0.7, even when the
model classifies most observations correctly.

6. Another problem with the LPM is that it assumes constant marginal effects, i.e., it
assumes that the rate of change of the probability per unit change in the value of the
explanatory variable is constant, given by the value of the slope. For example, we
interpreted the variable schooling in the above LPM as follows: "each year of schooling
increases the probability of female labor force participation by 9 percentage points,
holding constant marital status". However, consider one individual at primary
school and another at university level: each additional year of schooling will increase the
probability of female labor force participation for both, but the effect need not be the
same 9 percentage points for each; it could be higher for the university student than for
the primary student, and so on. In reality, marginal effects are rarely constant.

Conclusion

Due to the above limitations of the LPM, other standard methodologies have been developed to
go around these problems. These are the logit model, the probit model and the tobit model. We
shall now examine each of these models in turn, and see how they overcome the aforementioned
limitations.

THE LOGIT MODEL

(a) The Binomial Logit model


- The logit model avoids the unbounded nature of the LPM by using a variant of the
cumulative logistic function:

Di = 1 / (1 + e^−(β0 + β1X1i + β2X2i + ui))

- Indeed, the logistic function ensures that the requirement 0 ≤ D̂i ≤ 1 is met, and this is
demonstrated as follows:
 As β0 + β1X1i + β2X2i + ui approaches positive infinity, e^−(·) approaches 0, so
D̂i = 1/(1 + 0) = 1
 As β0 + β1X1i + β2X2i + ui approaches negative infinity, e^−(·) approaches infinity, so
D̂i = 1/(1 + ∞) = 0

- As a result, the logit model takes on a sigmoid or S-shaped pattern, since it is bounded by
0 and 1. Thus, while in the LPM D̂i is linearly related to Xi, in a binomial logit model
D̂i is nonlinearly related to Xi. Even exceptionally large or small values of Xi will not
produce probabilities that lie outside the meaningful 0 to 1 interval, unlike in the LPM.
- The major difference between a logit and the LPM is that whereas the LPM is a straight
line and thus has a constant slope, the logit is sigmoid-shaped and thus the slope changes
as D̂i moves from 0 to 1. Thus, the change in the probability that Di = 1 caused by a
unit increase in an independent variable (holding the other independent variables
constant) will vary as we move from D̂i = 0 to D̂i = 1.

- As a result, unlike the LPM, which is estimated by OLS, the logit is estimated by the
MAXIMUM LIKELIHOOD ESTIMATOR (MLE), which requires large samples.
- The major difference between OLS and MLE is that whereas OLS aims at finding the β
that minimizes the sum of squared error terms (the residual sum of squares), MLE aims
at finding the β that maximizes the likelihood of the sample data set being observed
(i.e., the β that maximizes the log of the probability, or likelihood, of observing the
particular set of values of the dependent variable in the sample for a given set of Xs).
MLE is an iterative process, since it performs a number of iterations until the best
estimate is obtained.
- However, the estimates obtained by OLS and MLE are identical for a linear equation that
meets the classical assumptions. MLE is mostly preferred for the following reasons:
(i) MLE is a large sample estimator
(ii) MLE is consistent and asymptotically efficient (unbiased and minimum variance
for large samples)
(iii) With very large samples, MLE has the added advantage of producing normally
distributed coefficient estimates, allowing the use of typical hypothesis testing
techniques
- When estimating a logit model, we first determine the ODDS RATIO, given as:
odds ratio = Di/(1 − Di). The odds ratio tells us the likelihood that something will occur
as opposed to it not occurring. For example, if the probability of buying a car is 0.8, the
probability of not buying is 0.2, and the odds ratio is 0.8/0.2 = 4. Thus, we say the odds
are 4 to 1 in favour of buying a car.
- Having obtained the odds ratio, we take its natural logarithm and regress it on the
explanatory variables. Thus, the logit model is: ln(Di/(1 − Di)) = β0 + β1X1i + β2X2i + ui,
where Di is the dummy variable.
Derivation of the Logit Model

 The probability of an event happening, using the logistic function, is:

Pi = 1 / (1 + e^−(β0 + β1X1i + β2X2i + ui))

 Hence, the probability of that event not happening is:

1 − Pi = 1 − 1/(1 + e^−(β0 + β1X1i + β2X2i + ui)) = e^−(β0 + β1X1i + β2X2i + ui) / (1 + e^−(β0 + β1X1i + β2X2i + ui))

 The odds ratio is the probability of the event occurring divided by the probability of it not
occurring. Writing (·) for (β0 + β1X1i + β2X2i + ui):

Pi / (1 − Pi) = [1/(1 + e^−(·))] ÷ [e^−(·)/(1 + e^−(·))] = 1/e^−(·) = e^(β0 + β1X1i + β2X2i + ui)

 The logit model is obtained by taking the natural logarithm of the odds ratio. Thus:

ln(Pi / (1 − Pi)) = β0 + β1X1i + β2X2i + ui

This is the logit model. Notice that the dependent variable is the natural logarithm of the odds
ratio, which is linear in the parameters.
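The boundedness of the logistic function and the odds-ratio algebra above can be checked numerically. A minimal sketch (Python; the function names are ours):

```python
import math

# The logistic function and its inverse, the log-odds (logit), as derived above.
def logistic(z):
    """P = 1 / (1 + e^-z): maps any real z into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """ln(p / (1 - p)): the natural logarithm of the odds ratio."""
    return math.log(p / (1.0 - p))

p = logistic(1.5)        # a probability strictly between 0 and 1
z_back = log_odds(p)     # recovers the original index, 1.5

odds_car = 0.8 / (1 - 0.8)  # the car-buying example: odds of 4 to 1
```

However extreme the index z, `logistic(z)` stays inside (0, 1), which is exactly the property the LPM lacks.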

EXAMPLE OF A LINEAR PROBABILITY MODEL (LPM), LOGIT AND PROBIT MODEL

The following data shows the effect of personalized system of instruction (PSI) on course grades:

observation GPA TUCE PSI Grade Letter


grade
1 2.66 20 0 0 C
2 2.89 22 0 0 B
3 3.28 24 0 0 B
4 2.92 12 0 0 B
5 4.00 21 0 1 A
6 2.86 17 0 0 B
7 2.76 17 0 0 B
8 2.87 21 0 0 B
9 3.03 25 0 0 C
10 3.92 29 0 1 A
11 2.63 20 0 0 C
12 3.32 23 0 0 B
13 3.57 23 0 0 B
14 3.26 25 0 1 A
15 3.53 26 0 0 B
16 2.74 19 0 0 B
17 2.75 25 0 0 C
18 2.83 19 0 0 C
19 3.12 23 1 0 B
20 3.16 25 1 1 A
21 2.06 22 1 0 C
22 3.62 28 1 1 A
23 2.89 14 1 0 C
24 3.51 26 1 0 B
25 3.54 24 1 1 A
26 2.83 27 1 1 A
27 3.39 17 1 1 A

28 2.67 24 1 0 B
29 3.65 21 1 1 A
30 4.00 23 1 1 A
31 3.10 21 1 0 C
32 2.39 19 1 1 A
Where: grade = 1 if the final grade is A, but 0 if it is B or C

TUCE = score of matriculation exam upon admission

PSI = 1 if a new teaching method is adopted, and 0 otherwise

GPA = grade point average

The dependent variable is grade (a dummy variable), regressed on GPA, TUCE and PSI.

Using STATA software, the results for the linear probability model, logit and probit models are
presented as follows:

The commands of interest using the software are as follows:

- regress grade gpa tuce psi


- mfx
- predict gradehat1
- logit grade gpa tuce psi
- mfx
- predict gradehat2
- probit grade gpa tuce psi
- mfx
- predict gradehat3
- browse
. regress grade gpa tuce psi

Source SS df MS Number of obs = 32


F( 3, 28) = 6.65
Model 3.00227631 3 1.00075877 Prob > F = 0.0016
Residual 4.21647369 28 .150588346 R-squared = 0.4159
Adj R-squared = 0.3533
Total 7.21875 31 .232862903 Root MSE = .38806

grade Coef. Std. Err. t P>|t| [95% Conf. Interval]

gpa .4638517 .1619563 2.86 0.008 .1320992 .7956043


tuce .0104951 .0194829 0.54 0.594 -.0294137 .0504039
psi .3785548 .1391727 2.72 0.011 .0934724 .6636372
_cons -1.498017 .5238886 -2.86 0.008 -2.571154 -.4248801

. mfx

Marginal effects after regress


y = Fitted values (predict)
= .34375

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

gpa .4638517 .16196 2.86 0.004 .146423 .78128 3.11719


tuce .0104951 .01948 0.54 0.590 -.027691 .048681 21.9375
psi* .3785548 .13917 2.72 0.007 .105781 .651328 .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. logit grade gpa tuce psi

Iteration 0: log likelihood = -20.59173


Iteration 1: log likelihood = -13.259768
Iteration 2: log likelihood = -12.894606
Iteration 3: log likelihood = -12.889639
Iteration 4: log likelihood = -12.889633
Iteration 5: log likelihood = -12.889633

Logistic regression Number of obs = 32


LR chi2(3) = 15.40
Prob > chi2 = 0.0015
Log likelihood = -12.889633 Pseudo R2 = 0.3740

grade Coef. Std. Err. z P>|z| [95% Conf. Interval]

gpa 2.826113 1.262941 2.24 0.025 .3507938 5.301432


tuce .0951577 .1415542 0.67 0.501 -.1822835 .3725988
psi 2.378688 1.064564 2.23 0.025 .29218 4.465195
_cons -13.02135 4.931325 -2.64 0.008 -22.68657 -3.35613

. mfx

Marginal effects after logit


y = Pr(grade) (predict)
= .25282025

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

gpa .5338589 .23704 2.25 0.024 .069273 .998445 3.11719


tuce .0179755 .02624 0.69 0.493 -.033448 .069399 21.9375
psi* .4564984 .18105 2.52 0.012 .10164 .811357 .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. probit grade gpa tuce psi

Iteration 0:   log likelihood =  -20.59173
Iteration 1:   log likelihood = -12.908126
Iteration 2:   log likelihood = -12.818963
Iteration 3:   log likelihood = -12.818803
Iteration 4:   log likelihood = -12.818803

Probit regression                                 Number of obs   =        32
                                                  LR chi2(3)      =     15.55
                                                  Prob > chi2     =    0.0014
Log likelihood = -12.818803                       Pseudo R2       =    0.3775

------------------------------------------------------------------------------
       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |    1.62581   .6938825     2.34   0.019     .2658255    2.985795
        tuce |   .0517289   .0838903     0.62   0.537    -.1126929    .2161508
         psi |   1.426332   .5950379     2.40   0.017     .2600795    2.592585
       _cons |   -7.45232   2.542472    -2.93   0.003    -12.43547   -2.469166
------------------------------------------------------------------------------

. mfx

Marginal effects after probit
      y  = Pr(grade) (predict)
         =  .26580809
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+--------------------------------------------------------------------
     gpa |   .5333471      .23246    2.29   0.022   .077726  .988968   3.11719
    tuce |   .0169697      .02712    0.63   0.531  -.036184  .070123   21.9375
    psi* |    .464426      .17028    2.73   0.006   .130682   .79817     .4375
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

The results of these regressions are now summarized in the following table:

Dependent variable (grade)     Linear probability    Logit model    Probit model
                               model (LPM)
gpa        Coefficient         0.4639                2.8261         1.6258
           p-value             0.008                 0.025          0.019
           Slope               0.4639                0.5339         0.5333
tuce       Coefficient         0.0105                0.0952         0.0517
           p-value             0.594                 0.501          0.537
           Slope               0.0105                0.0180         0.0170
psi        Coefficient         0.3786                2.3787         1.4263
           p-value             0.011                 0.025          0.017
           Slope               0.3786                0.4565         0.4644
constant   Coefficient         -1.4980               -13.0214       -7.4523
           p-value             0.008                 0.008          0.003
(Pseudo) R-squared             0.4159                0.3740         0.3775
Predicted probability          0.34375               0.2528         0.2658
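As a cross-check on the table, the probit slope for gpa can be reproduced by hand: the marginal effect at the means is the standard normal density evaluated at the mean index, scaled by the coefficient. The sketch below (plain Python, using the estimates and variable means from the output above) is illustrative, not part of the original Stata session; note that for the dummy psi, mfx instead reports the discrete 0-to-1 change.

```python
import math

# Probit estimates and variable means taken from the output above
b = {"_cons": -7.45232, "gpa": 1.62581, "tuce": 0.0517289, "psi": 1.426332}
xbar = {"gpa": 3.11719, "tuce": 21.9375, "psi": 0.4375}

# Index x'b evaluated at the means
index = b["_cons"] + sum(b[k] * xbar[k] for k in xbar)

# Standard normal density at the index
phi = math.exp(-index ** 2 / 2) / math.sqrt(2 * math.pi)

# Marginal effect of gpa: phi(x'b) * b_gpa
print(round(phi * b["gpa"], 4))  # 0.5333, matching the mfx output
```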

Interpreting the results:

THE RESULTS OF THE LPM

 Holding all other factors constant, if GPA goes up by one point, the probability of
obtaining an A increases on average by 0.4639, irrespective of the value of GPA from
which we measure the change. GPA is statistically significant at the 1% level of
significance.
 Holding other factors constant, the probability of obtaining an A increases by
0.0105 for every unit increase in the matriculation score (TUCE), irrespective of
the value of TUCE from which we measure the change. However, TUCE has no
discernible impact on the probability of securing an A.
 Holding other factors constant, the probability of scoring an A increases by 0.3786
for students who have been exposed to the new method of instruction (PSI). The
new method of instruction seems to be effective.
 GPA, TUCE and PSI together explain about 42% of the variation in the probability
that a student will get an A.
 On average, there is a 0.34375 chance that a student will obtain an A, if we
consider only GPA, TUCE and PSI.

THE RESULTS OF THE LOGIT MODEL

- A coefficient in a logit model tells us the change in the log of the odds ratio (the logit) per
unit change in the independent variable concerned, measured from its mean.
- The marginal effect of an independent variable gives us the change in the estimated
probability D̂i caused by a one-unit increase in X1i, holding constant the other independent
variables in the equation. For the logit model, the marginal effect equals β̂1·D̂i(1 - D̂i).
We can therefore interpret the above logit model as follows:

 Holding all other factors constant, every unit change in a student's GPA increases the log
of the odds ratio (logit) of getting an A by 2.8261, measured from the mean of GPA. Also, if a
student's GPA increases by one unit from the mean of GPA, their probability of getting an
A increases by 0.5339, ceteris paribus. GPA is statistically significant.
 Holding all other factors constant, every unit change in a student's TUCE score increases
the logit of getting an A by 0.0952 from the mean of TUCE. Also, if a student's TUCE
increases by one unit from the mean, their probability of getting an A increases by
0.0180, ceteris paribus. TUCE is not statistically significant.

 Students who have been exposed to new instruction methods have a higher logit of
getting an A by 2.3787 and also, their probability of getting an A increases by 0.4565,
ceteris paribus. PSI is significant.
 The pseudo R-squared is 0.3740. Thus, GPA, TUCE and PSI can explain about 37%
of the probability of a student getting an A.
 On average, there is a 0.2528 chance that a student will obtain an A, if we consider only
GPA, TUCE and PSI.

Under the logit model, the marginal effect equals β̂·D̂i(1 - D̂i). For example, taking the case of
GPA, the coefficient is 2.8261 while the predicted probability is 0.2528. Hence, the marginal
effect or slope is 2.8261 × 0.2528 × (1 - 0.2528) = 0.5339, and so on.

Also, as a rough rule of thumb, if we multiply a logit coefficient by 0.25, we get an
approximately equivalent linear probability model coefficient.

The logit model for our example above is: ln(Pi/(1 - Pi)) = -13.0214 + 2.8261GPA +
0.0952TUCE + 2.3787PSI. Thus, if we randomly pick any particular observation, say
observation 20, which had the following characteristics: GPA = 3.16, TUCE = 25, PSI = 1
and grade = 1, and substitute these into the logit model, we get: ln(Pi/(1 - Pi)) =
-13.0214 + 2.8261(3.16) + 0.0952(25) + 2.3787(1) = 0.6678. Thus, for observation 20, the
logit (log of the odds ratio) for scoring an A is 0.6678. To get the odds ratio in favour of
scoring an A, we take the antilogarithm of both sides. Hence, Pi/(1 - Pi) = e^0.6678 = 1.95.
Therefore, the probability of scoring an A is Pi = 1.95/2.95 = 0.6610.
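The observation-20 calculation can be reproduced step by step; a small sketch using the estimated coefficients:

```python
import math

# Estimated logit coefficients (from the output above)
const, b_gpa, b_tuce, b_psi = -13.0214, 2.8261, 0.0952, 2.3787

# Observation 20: GPA = 3.16, TUCE = 25, PSI = 1
logit = const + b_gpa * 3.16 + b_tuce * 25 + b_psi * 1  # log of the odds ratio
odds = math.exp(logit)                                  # odds in favour of an A
prob = odds / (1 + odds)                                # probability of an A
print(round(logit, 4), round(odds, 2), round(prob, 3))  # 0.6678 1.95 0.661
```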

EXERCISE: What is the probability that observation 27 scores an A?

TOPIC 3: SIMULTANEOUS EQUATIONS

INTRODUCTION

So far we have been assuming a situation in which the variables are related by a single equation.
However, in real life, variables may be linked up by a system of equations (simultaneous
equations).

A good example of such equations is the KEYNESIAN national income model given as:

Yt = Ct + It + Gt
Ct = β0 + β1Yt - β2Tt + e1t
It = α0 + α1Yt-1 - α2Rt + e2t

Where: Yt is national income, Ct is aggregate consumption, It is aggregate investment, Tt is taxes,


Yt-1 is income lagged by one period, and Rt is interest rate, and e1t and e2t are the stochastic terms.

The variables Yt, Ct and It are the ENDOGENOUS variables, while Gt, Tt, Yt-1 and Rt are the
EXOGENOUS variables. Thus, the model has 3 endogenous and 4 exogenous variables.

STRUCTURAL VERSUS REDUCED FORM EQUATIONS

A structural equation is one that is derived by being informed by economic theory. They express
the relationship between the endogenous variable as a function of the exogenous as well as other
endogenous variables. Thus, the Keynesian national model given above is a structural system of
equations.

Reduced form equation is one that expresses the relationship between the endogenous variable as
a function of the exogenous variables only. Thus, given structural equations, we can derive the
reduced-form equations, and even investigate the relationship among them.

Example 1:

Consider the following system of simultaneous equations:

C = α + βY + u

Y=C+I

Where C is aggregate consumption, Y is the level of national income, I is investment


expenditure, α and β are parameters such that α>0, 0<β<1 and u is the disturbance term.

HOW TO OBTAIN REDUCED FORM EQUATIONS FROM THE STRUCTURAL


EQUATIONS:

We want to obtain the reduced form equations from the structural equations. We proceed as
follows:

Step 1: identify the endogenous and exogenous variables

- The endogenous variables are C and Y


- The exogenous variable is I

Step 2: express the endogenous variables as functions of the exogenous variables only.
Thus, we transfer all endogenous variables to the left hand side as follows:

C – βY = α + u

-C + Y = I

Step 3: express the system of equations in matrix form:

[  1   -β ] [ C ]   [ α + u ]
[ -1    1 ] [ Y ] = [   I   ]

     A        X   =     D

Step 4: find the determinant of matrix A

Det. A = (1 × 1) - (-1 × -β) = 1 - β

Step 5: we can solve for the equilibrium values of the endogenous variables Y and C using
either the MATRIX INVERSION METHOD or CRAMER'S RULE.

In this example, we shall use Cramer's rule:

(i) Replace the first column of matrix A with column vector D, and find the determinant
of the resulting matrix, named A1.

A1 = [ α + u   -β ]
     [   I      1 ]

Thus, Det. A1 = [(α + u) × 1] - (-β × I) = α + βI + u

(ii) Replace the second column of matrix A with column vector D, and find the
determinant of the resulting matrix, named A2.

A2 = [  1   α + u ]
     [ -1     I   ]

Thus, Det. A2 = (1 × I) - [-1 × (α + u)] = α + I + u

(iii) The solution for equilibrium C is obtained by dividing Det. A1 by Det. A as follows:

C = Det. A1 / Det. A = (α + βI + u) / (1 - β)

Thus, equilibrium C is C = α/(1 - β) + [β/(1 - β)]I + u/(1 - β)

Therefore, the reduced form equation for C is simply expressed as:

C = π10 + π11I + v1t

Where π10 = α/(1 - β), π11 = β/(1 - β) and v1t = u/(1 - β)

(iv) The solution for equilibrium Y is obtained by dividing Det. A2 by Det. A as follows:

Y = Det. A2 / Det. A = (α + I + u) / (1 - β)

Thus, equilibrium Y is Y = α/(1 - β) + [1/(1 - β)]I + u/(1 - β)

Therefore, the reduced form equation for Y is simply expressed as:

Y = π20 + π21I + v2t

Where π20 = α/(1 - β), π21 = 1/(1 - β) and v2t = u/(1 - β)
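A quick numerical check of these reduced forms; the parameter and data values below are illustrative assumptions, chosen only to confirm that the reduced-form solutions satisfy the structural equations:

```python
# Illustrative values (assumptions): alpha = 7, beta = 0.8, I = 10, u = 0
alpha, beta, I, u = 7.0, 0.8, 10.0, 0.0

# Reduced forms derived above
C = (alpha + beta * I + u) / (1 - beta)
Y = (alpha + I + u) / (1 - beta)
print(round(C, 6), round(Y, 6))  # 75.0 85.0

# The structural equations hold at the solution
assert abs(C - (alpha + beta * Y + u)) < 1e-9
assert abs(Y - (C + I)) < 1e-9
```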

HOW TO OBTAIN STRUCTURAL COEFFICIENTS GIVEN NUMERICAL VALUES


FOR THE REDUCED FORM EQUATIONS:

Assuming that the reduced form equations for the above equations are given as follows:

C = 35 + 4I + v1t

Y = 35 + 5I + v2t

Obtain the structural coefficients α and β from the reduced form equations above.

- To obtain α and β, we compare C = 35 + 4I + v1t with C = α/(1 - β) + [β/(1 - β)]I +
u/(1 - β). Thus, α/(1 - β) = 35. Therefore, cross multiplying yields: α = 35(1 - β)
…………………equation (1).
- We also equate the coefficients of I as follows: β/(1 - β) = 4. Thus, cross multiplying
yields: β = 4(1 - β) ………………………………………………………… equation (2).
- We can solve equation (2) to get 5β = 4. Therefore, β = 4/5 or 0.8
- Similarly, from equation (1), we can solve to get α = 35(1 - 0.8). Hence, α = 7
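The same back-substitution can be written as a two-line computation; a sketch using the reduced-form values from this example:

```python
# Reduced-form estimates from the example: C = 35 + 4I, Y = 35 + 5I
pi10, pi11 = 35.0, 4.0

# From beta = pi11 * (1 - beta)  =>  beta = pi11 / (1 + pi11)
beta = pi11 / (1 + pi11)
# From alpha = pi10 * (1 - beta)
alpha = pi10 * (1 - beta)
print(beta, round(alpha, 6))  # 0.8 7.0
```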

HOW TO OBTAIN THE RELATIONSHIP BETWEEN THE REDUCED FORM


COEFFICIENTS:

The reduced form equations were obtained as follows:


C = π10 + π11I + v1t and Y = π20 + π21I + v2t

To obtain the relationship between the structural coefficients and the reduced form coefficients,
we obtain ratios as follows:

 π10/π20 = [α/(1 - β)] / [α/(1 - β)] = 1. Thus, π10 = π20
 π11/π21 = [β/(1 - β)] / [1/(1 - β)] = β. Thus, π11 = β·π21
 π10/π11 = [α/(1 - β)] / [β/(1 - β)] = α/β. Thus, β·π10 = α·π11
 π20/π21 = [α/(1 - β)] / [1/(1 - β)] = α. Thus, π20 = α·π21

Example 2:

Consider the following simultaneous equation model:

Yt = Ct + It + Gt
Ct = β0 + β1Yt - β2Tt + e1t
It = α0 + α1Yt-1 - α2Rt + e2t

Obtain the reduced form equations for this model.

SOLUTION:

To obtain the reduced from equations from the structural equations, we proceed as follows:

STEP 1: Express the endogenous variables in each equation as a function of the exogenous
variables only. This is done by collecting all the endogenous variables on the left hand side of the
model:

Yt - Ct - It = Gt

-β1Yt + Ct = β0 - β2Tt + e1t

It = α0 + α1Yt-1 - α2Rt + e2t

STEP 2: Express the resulting equations in matrix form:

[  1    -1   -1 ] [ Yt ]   [ Gt                       ]
[ -β1    1    0 ] [ Ct ] = [ β0 - β2Tt + e1t          ]
[  0     0    1 ] [ It ]   [ α0 + α1Yt-1 - α2Rt + e2t ]

        A           X    =   D

Note A is a matrix of coefficients of order 3×3, X is a matrix of endogenous variables of order
3×1, and D is a matrix containing the exogenous variables, as well as the error terms and the
intercept terms.

STEP 3: Obtain the inverse of matrix A

(i) THE MINOR MATRIX
- Minor of element a11 = 1 is (1×1) - (0×0) = 1
- Minor of element a12 = -1 is (-β1×1) - (0×0) = -β1
- Minor of element a13 = -1 is (-β1×0) - (1×0) = 0
- Minor of element a21 = -1 is (-1×1) - (-1×0) = -1
- Minor of element a22 = 1 is (1×1) - (-1×0) = 1
- Minor of element a23 = 0 is (1×0) - (-1×0) = 0
- Minor of element a31 = 0 is (-1×0) - (-1×1) = 1
- Minor of element a32 = 0 is (1×0) - (-1×-β1) = -β1
- Minor of element a33 = 1 is (1×1) - (-1×-β1) = 1 - β1

Thus, the minor matrix of A is:

[  1    -β1     0    ]
[ -1     1      0    ]
[  1    -β1   1-β1   ]

(ii) THE COFACTOR MATRIX

The cofactor matrix is obtained by applying the alternating sign pattern to the minor matrix.
Thus, the cofactor matrix of A is:

[ 1    β1     0    ]
[ 1    1      0    ]
[ 1    β1   1-β1   ]

(iii) THE DETERMINANT

Using Laplace expansion along the first row, the determinant is obtained as follows:

Det A = a11c11 + a12c12 + a13c13 = (1×1) + (-1×β1) + (-1×0) = 1 - β1

(iv) THE ADJOINT

The adjoint is obtained by transposing the cofactor matrix:

Adjoint of A =
[ 1     1     1    ]
[ β1    1     β1   ]
[ 0     0    1-β1  ]

(v) THE INVERSE

The inverse is (1/determinant) × adjoint:

A⁻¹ = 1/(1 - β1) ×
[ 1     1     1    ]
[ β1    1     β1   ]
[ 0     0    1-β1  ]

STEP 4: Pre-multiply both sides of AX = D by A⁻¹. This yields A⁻¹AX = A⁻¹D, or X = A⁻¹D:

[ Yt ]                 [ 1     1     1    ] [ Gt                       ]
[ Ct ] = 1/(1 - β1) ×  [ β1    1     β1   ] [ β0 - β2Tt + e1t          ]
[ It ]                 [ 0     0    1-β1  ] [ α0 + α1Yt-1 - α2Rt + e2t ]
The solutions are as follows:

Yt = [(β0 + α0) + Gt - β2Tt + α1Yt-1 - α2Rt + (e1t + e2t)] / (1 - β1)

Yt = (β0 + α0)/(1 - β1) + [1/(1 - β1)]Gt - [β2/(1 - β1)]Tt + [α1/(1 - β1)]Yt-1 -
[α2/(1 - β1)]Rt + (e1t + e2t)/(1 - β1)

Yt = π10 + π11Gt - π12Tt + π13Yt-1 - π14Rt + v1t. This equation is known as the reduced form
equation for National Income.

Where π10 = (β0 + α0)/(1 - β1), π11 = 1/(1 - β1), π12 = β2/(1 - β1), π13 = α1/(1 - β1),
π14 = α2/(1 - β1) and v1t = (e1t + e2t)/(1 - β1). The coefficients π10, π11, π12, π13 and π14
are the reduced form coefficients.

In a similar way, we can obtain the reduced-form equations and coefficients for the other
endogenous variables, which are for Ct and It.

The solution for Ct is obtained as follows:

Ct = [(β0 + β1α0) + β1Gt - β2Tt + β1α1Yt-1 - β1α2Rt + (e1t + β1e2t)] / (1 - β1)

Ct = (β0 + β1α0)/(1 - β1) + [β1/(1 - β1)]Gt - [β2/(1 - β1)]Tt + [β1α1/(1 - β1)]Yt-1 -
[β1α2/(1 - β1)]Rt + (e1t + β1e2t)/(1 - β1)

Ct = π20 + π21Gt - π22Tt + π23Yt-1 - π24Rt + v2t is the reduced form equation for the aggregate
consumption function.

Where π20 = (β0 + β1α0)/(1 - β1), π21 = β1/(1 - β1), π22 = β2/(1 - β1), π23 = β1α1/(1 - β1),
π24 = β1α2/(1 - β1) and v2t = (e1t + β1e2t)/(1 - β1). The coefficients π20, π21, π22, π23 and
π24 are the reduced form coefficients.

In a similar way, the solution for It is as follows:

It = [(1 - β1)(α0 + α1Yt-1 - α2Rt + e2t)] / (1 - β1)

It = α0 + α1Yt-1 - α2Rt + e2t

It = π30 + π31Yt-1 - π32Rt + v3t is the reduced form equation for the aggregate investment function.

Where 30 = 0 , 31 = 1, 32 = 2, and v3t = e2t. The coefficients 30, 31, and 32 are the reduced
form coefficients for the investment function.
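The reduced forms for all three equations can be checked numerically; every parameter and data value below is an illustrative assumption, used only to verify that the solution satisfies the national income identity:

```python
# Assumed structural parameters and data (for illustration only)
b0, b1, b2 = 10.0, 0.8, 0.5   # consumption function
a0, a1, a2 = 5.0, 0.3, 0.4    # investment function
G, T, Ylag, R = 20.0, 15.0, 100.0, 8.0

# Reduced-form solutions derived above (errors set to zero)
I_t = a0 + a1 * Ylag - a2 * R
Y_t = (b0 + a0 + G - b2 * T + a1 * Ylag - a2 * R) / (1 - b1)
C_t = b0 + b1 * Y_t - b2 * T

# The national income identity Yt = Ct + It + Gt must hold
print(round(Y_t, 6), round(C_t + I_t + G, 6))  # 271.5 271.5
```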

DIRECT EFFECTS, INDIRECT EFFECTS AND TOTAL EFFECTS

The concept of Total Effect, Direct Effect and Indirect effect is very useful for policy purposes.
This is because any policy, if implemented, would have both direct (wanted) and indirect
(unwanted) effects. For example, the free maternity policy by the government would have:

- The direct effect of encouraging delivery in public health facilities, or even reduce
chances of death and other complications arising from delivery,
- The indirect effect is however, say population growth, congestion of such
facilities, poor service delivery, among others, as those who would have afforded
private facilities would prefer the free public facilities

Thus, in simultaneous equations, we can measure: Total effects, Direct Effects and Indirect
effects.

TOTAL EFFECT measures the partial derivative of the dependent variable of the reduced form
equations with respect to any of the exogenous or explanatory variables that it contains.

Recall that the reduced form equation for the national income function was expressed as:

Yt = π10 + π11Gt - π12Tt + π13Yt-1 - π14Rt + v1t. Thus, the partial derivatives of Yt with
respect to Gt, Tt, Yt-1 and Rt give the total effects.

For example, ∂Yt/∂Gt = π11 = 1/(1 - β1). This is therefore the total effect of Gt on Yt. Also,
∂Yt/∂Tt = -π12 = -β2/(1 - β1), and so on for Yt-1 and Rt respectively. The same applies to all
the other reduced-form equations and coefficients. Overall, since we have 10 reduced form
coefficients on the exogenous variables, we can measure 10 total effects.

DIRECT EFFECT measures the partial derivative of the dependent variable of the structural
equations with respect to any of the exogenous or explanatory variables that it contains.

Recall that the structural equation for the national income function was expressed as: Yt = Ct + It + Gt.
Thus, the partial derivative of Yt with respect to the exogenous variable Gt gives the direct
effect.

For example, ∂Yt/∂Gt = 1. This is therefore the direct effect of Gt on Yt. In a similar way, the
structural equation for consumption was given as: Ct = β0 + β1Yt - β2Tt + e1t. The partial
derivative of Ct with respect to the exogenous variable Tt is ∂Ct/∂Tt = -β2. This is therefore
the direct effect of Tt on Ct. The same applies to all the other structural equations and coefficients.

INDIRECT EFFECT therefore, measures the difference between the total effect and the direct
effect for the particular equation and exogenous variable concerned, i.e., INDIRECT EFFECT =
TOTAL EFFECT – DIRECT EFFECT.

For example, the total effect and direct effect of Gt on Yt were as follows:

- Total effect of Gt on Yt is ∂Yt/∂Gt = π11 = 1/(1 - β1)
- Direct effect of Gt on Yt is ∂Yt/∂Gt = 1
- Indirect effect of Gt on Yt = total effect - direct effect = 1/(1 - β1) - 1 = β1/(1 - β1)

Using the same analysis, we can find the total effect, direct effect and indirect effect for the
remaining equations and variables.

In general, given the reduced form coefficients (the total effects), we can find the structural
coefficients (the direct effects), by simply examining the relationships among the reduced form
coefficients.

For example, we notice that π11 = 1/(1 - β1), and π12 = β2/(1 - β1). Therefore, if we divide the
two coefficients, we shall get: π11/π12 = [1/(1 - β1)] / [β2/(1 - β1)] = 1/β2

Since π11/π12 = 1/β2, then if we cross multiply, we get: π12 = β2·π11. This is the relationship
between π11 and π12. We can therefore find the relationships between all the reduced-form
coefficients with each other, and even with the structural coefficients, in the same way as
illustrated.

Hence it is possible to show, for each equation, the direct, indirect and total effects, as
illustrated in the table below.

Equation         Effects    Constant                Tt               Yt-1             Rt                Gt
Consumption      Total      (β0 + β1α0)/(1 - β1)    -β2/(1 - β1)     β1α1/(1 - β1)    -β1α2/(1 - β1)    β1/(1 - β1)
                 Direct     β0                      -β2              0                0                 0
                 Indirect   β1(α0 + β0)/(1 - β1)    -β1β2/(1 - β1)   β1α1/(1 - β1)    -β1α2/(1 - β1)    β1/(1 - β1)
Investment       Total      α0                      0                α1               -α2               0
                 Direct     α0                      0                α1               -α2               0
                 Indirect   0                       0                0                0                 0
National Income  Total      (β0 + α0)/(1 - β1)      -β2/(1 - β1)     α1/(1 - β1)      -α2/(1 - β1)      1/(1 - β1)
                 Direct     0                       0                0                0                 1
                 Indirect   (β0 + α0)/(1 - β1)      -β2/(1 - β1)     α1/(1 - β1)      -α2/(1 - β1)      β1/(1 - β1)

For example, the indirect effect of the constant in the consumption function is obtained as
follows:

Indirect effect = Total effect - Direct effect = (β0 + β1α0)/(1 - β1) - β0
= [β0 + β1α0 - β0(1 - β1)]/(1 - β1) = β1(α0 + β0)/(1 - β1), and so on.
The justification or need for simultaneous equations, is that variables usually have a two-way
causality, i.e., X influences Y but at the same time Y influences X, hence a simultaneous equation
model, rather than a single regression equation.

WORK TO DO IN GROUPS:

Consider the following simultaneous equation model:

Y1t = 10 + 12Y2t + 11X1t + u1t

Y2t = 20 + 21Y1t + 22X2t + u2t

(a) Obtain the reduced-form equations for this model.


(b) What is the relationship between the reduced-form coefficients from (a) above, and the
structural coefficients?
(c) Given that the estimated reduced-form equations are as follows (error terms are now
omitted):
Y1t = 4 + 3X1t + 8X2t
Y2t = 2 + 6X1t + 10X2t
Obtain the values of the structural parameters (coefficients)

THE IDENTIFICATION PROBLEM

By the identification problem, we mean: is it possible to extract or retrieve the structural


coefficients from the reduced form coefficients?

If it is possible to extract or retrieve the structural coefficients from the reduced-form
coefficients, then we say that the equation has been identified. Thus, for any equation to be said
to be identified, then it must be possible to separate the total effect, direct effect and indirect
effect.

On the other hand, if it is not possible to extract the structural coefficients from the reduced form
coefficients, then the equation is said either to be un-identified or under-identified. It is thus very
difficult to separate total, direct and indirect effects.

In general, the identification problem goes as follows: if you have n variables and n equations,
then the model is identified. For example, if you have 2x + 3y = 13 and 3x – y = 14, the
equations are exactly or just identified since it has 2 variables and 2 equations. Thus, we can
solve for x and y to give x = 5 and y =1. However, if we have 2x + 3y + z = 15 and 3x – y – z =
12, since we have 3 variables and 2 equations, it may be difficult to get solution values for x, y
and z. Thus, if a system has n variables and (n - k) equations, the model is under- or not
identified, and we cannot solve it. Also, if a system has n variables and (n + k) equations,
then it is over-identified, since there are more equations than there are unknowns.
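The two-equation example above can be solved mechanically, for instance by Cramer's rule; a small sketch:

```python
# Solve 2x + 3y = 13 and 3x - y = 14 by Cramer's rule
a, b, c = 2, 3, 13   # first equation:  a*x + b*y = c
d, e, f = 3, -1, 14  # second equation: d*x + e*y = f

det = a * e - b * d         # -2 - 9 = -11
x = (c * e - b * f) / det   # (-13 - 42) / -11 = 5.0
y = (a * f - c * d) / det   # (28 - 39) / -11 = 1.0
print(x, y)  # 5.0 1.0
```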

There are 2 official tests for identification, which are:


1. The order condition
2. The Rank condition

The order condition is the starting point to test for the identification status of each and every
equation in a system of equations.

The order condition for identification status

The order condition for identification is a necessary but not a sufficient condition. It states as
follows:

"An equation is identified if and only if the number of variables excluded from the equation to
be identified is at least as great as the number of endogenous variables, less one."

Thus, the formula for the order condition for identification is as follows: K - M ≥ G - 1.

Where: K is the number of variables in the model (system of equations),

M is the number of variables in the equation to be identified,

K – M is the number of variables excluded from the equation to be identified,

G is the number of endogenous variables, or even the number of equations.

Hence, from the order condition, there are three (3) possible cases:

 Case 1: K – M > G – 1, the equation is said to be over identified

 Case 2: K – M = G – 1, the equation is said to be exactly or just identified
 Case 3: K – M < G – 1, the equation is said to under or not identified.

Using the example for the Keynesian model of national income determination as shown above,
we can thus, test for the identification status of each equation using the order condition as
illustrated below:

 Consumption function: Ct = β0 + β1Yt - β2Tt + e1t
 Investment function: It = α0 + α1Yt-1 - α2Rt + e2t
 National income identity: Yt = Ct + It + Gt

The model has 3 endogenous variables (Yt, Ct, It) and 4 exogenous variables (Tt, Yt-1, Rt, Gt).
Thus, we can use the order condition to test the identification status of each equation as follows.

Equation K M K-M G G-1 K-M, G-1 Identification status

Consumption 7 3 4 3 2 4>2 Over - identified

Investment 7 3 4 3 2 4>2 Over - identified

Income 7 4 3 3 2 3>2 Over - identified

Hence, from the order condition, the consumption function, the investment function and the
national income identity are all over-identified.
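The order-condition table above can be generated mechanically; a sketch for the Keynesian model (K = 7 variables, G = 3 equations, with M counted per equation as in the table):

```python
K, G = 7, 3  # variables in the model; equations (= endogenous variables)

# Number of variables appearing in each equation (M)
M = {"consumption": 3, "investment": 3, "national income": 4}

def order_condition(K, M_eq, G):
    """Classify an equation by the order condition K - M vs G - 1."""
    excluded = K - M_eq
    if excluded > G - 1:
        return "over-identified"
    if excluded == G - 1:
        return "exactly identified"
    return "under-identified"

for eq, m in M.items():
    print(eq, order_condition(K, m, G))  # all three are over-identified
```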

The Rank Condition of Testing for Identification

While the order condition is only a necessary condition, the rank condition is a sufficient
condition for testing for the identification of equations.
The Rank condition for testing for identification is stated as: “if it is possible to obtain at least
one non-zero determinant of order G-1 from the model, then the specific equation being
examined is identified, and vice versa”.

Therefore, when testing for the identification status of equations using the rank condition, the
following steps are useful:-
 For each equation, transfer all variables to the left hand side of the model, except the
error term.
 Present the resulting equations in a tabular format with all the equations and variables,
including the intercepts, but excluding the error terms.
 To identify each equation, cancel out its entire row and every column in which it has a
non-zero coefficient.

 Obtain the determinant of the resulting matrix.
 If it is possible to obtain at least one non-zero determinant of order G-1, the equation is
identified; but if not, the equation is not identifiable.

To illustrate the Rank condition, let us still go back to the Keynesian Income Model:-

Ct = β0 + β1Yt - β2Tt + e1t (Consumption function)

It = α0 + α1Yt-1 - α2Rt + e2t (Investment function)

Yt = Ct + It + Gt (National income identity)

STEP 1: For each equation transfer all variables to the left-hand side of the model, except error
term.

Consumption function: Ct = β0 + β1Yt - β2Tt + e1t becomes Ct - β0 - β1Yt + β2Tt = e1t

Investment function: It = α0 + α1Yt-1 - α2Rt + e2t becomes It - α0 - α1Yt-1 + α2Rt = e2t

National income identity: Yt = Ct + It + Gt becomes Yt - Ct - It - Gt = 0

STEP 2: Present all the equations resulting above into a tabular format including all the variables
and intercept term, but excluding the error term as follows:

Equation          Intercept   Ct    It    Yt    Tt    Yt-1   Rt    Gt
Consumption       -β0         1     0     -β1   β2    0      0     0
Investment        -α0         0     1     0     0     -α1    α2    0
National Income   0           -1    -1    1     0     0      0     -1

(Ct, It and Yt are the endogenous variables; Tt, Yt-1, Rt and Gt are the exogenous variables.)

STEP 3: To identify any particular equation, cancel out the equation to be identified and every
column in which it has a non-zero coefficient.

For example, to identify the consumption function, we cancel out the consumption row and its
non-zero columns (Intercept, Ct, Yt and Tt), as shown below.

Equation          Intercept   Ct    It    Yt    Tt    Yt-1   Rt    Gt
Consumption       -β0         1     0     -β1   β2    0      0     0    (cancelled)
Investment        -α0         0     1     0     0     -α1    α2    0
National Income   0           -1    -1    1     0     0      0     -1

The resultant matrix is:

[  1   -α1    α2    0 ]
[ -1    0     0    -1 ]

# To identify the investment function, we cancel out the investment row and its non-zero
columns (Intercept, It, Yt-1 and Rt), as shown below:

Equation          Intercept   Ct    It    Yt    Tt    Yt-1   Rt    Gt
Consumption       -β0         1     0     -β1   β2    0      0     0
Investment        -α0         0     1     0     0     -α1    α2    0    (cancelled)
National Income   0           -1    -1    1     0     0      0     -1

The resultant matrix is:

[  1   -β1    β2    0 ]
[ -1    1     0    -1 ]

# To identify the National Income identity, we cancel out the National Income row and its
non-zero columns (Ct, It, Yt and Gt), as shown below:

Equation          Intercept   Ct    It    Yt    Tt    Yt-1   Rt    Gt
Consumption       -β0         1     0     -β1   β2    0      0     0
Investment        -α0         0     1     0     0     -α1    α2    0
National Income   0           -1    -1    1     0     0      0     -1   (cancelled)

The resulting matrix is:

[ -β0    β2    0     0  ]
[ -α0    0    -α1    α2 ]

STEP 4: Obtain the determinants of order G - 1 = 2 from the resulting matrices:

The consumption function:

| 1   -α1 |          | -α1   α2 |         | α2    0 |
| -1   0  | = -α1    |  0    0  | = 0     | 0    -1 | = -α2

The investment function:

| 1   -β1 |              | -β1   β2 |          | β2    0 |
| -1   1  | = 1 - β1     |  1    0  | = -β2    | 0    -1 | = -β2

The national income identity:

| -β0   β2 |            | β2    0   |            | 0     0  |
| -α0   0  | = α0β2     | 0    -α1  | = -α1β2    | -α1   α2 | = 0

STEP 5: If it is possible to obtain at least one non-zero determinant of order G - 1, the equation
is identified. Since each of the three matrices above yields at least one non-zero determinant,
all the equations in the model are identified by the rank condition.
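The rank-condition check — does the residual 2×4 matrix contain at least one non-zero determinant of order G - 1 = 2? — can be automated. The numeric values given to the structural coefficients in the sketch below are illustrative assumptions; only the fact that they are non-zero matters.

```python
from itertools import combinations

# Assumed (non-zero) structural coefficients, for illustration only
a1, a2, b1, b2, a0, b0 = 0.3, 0.4, 0.8, 0.5, 5.0, 10.0

def has_nonzero_minor(m):
    """True if some 2x2 minor of the 2-row matrix m is non-zero."""
    for c1, c2 in combinations(range(len(m[0])), 2):
        if abs(m[0][c1] * m[1][c2] - m[0][c2] * m[1][c1]) > 1e-12:
            return True
    return False

# Residual matrices from the cancellation step above
consumption = [[1, -a1, a2, 0], [-1, 0, 0, -1]]
investment = [[1, -b1, b2, 0], [-1, 1, 0, -1]]
income = [[-b0, b2, 0, 0], [-a0, 0, -a1, a2]]

for name, m in [("consumption", consumption), ("investment", investment),
                ("income", income)]:
    print(name, has_nonzero_minor(m))  # True for all three: identified
```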

ESTIMATION OF SIMULTANEOUS EQUATIONS

When we want to estimate simultaneous equations, we cannot use the Ordinary Least Squares
(OLS) estimator. Application of OLS on a system of simultaneous equations will result in a
problem known as Simultaneous equation bias. This is the bias arising from the application of
classical least squares to an equation belonging to a system of simultaneous equations.

Due to the problem of simultaneous equation bias, then simultaneous equations cannot be
estimated by ordinary least squares(OLS); since the estimated coefficients will not only be
BIASED , but would be INCONSISTENT.

Therefore, simultaneous equations are estimated by either of the following methods.

1. Indirect Least Squares (ILS) method.


2. Two - Stage Least Squares (2SLS) method
3. Three - Stage Least Squares (3SLS) method.
4. Instrumental Variable (IV) method
5. Maximum likelihood (ML) method

These are now discussed as follows:-

i) Indirect Least Squares (ILS) Method

ILS is used to estimate exactly identified equations. This Method is also known as REDUCED-
FORM METHOD.

ILS is appropriate when structural equations contain both exogenous and endogenous variables.

Steps in the Indirect Least Squares method:-

i) Obtain the reduced form equations from the structural equations as illustrated above.

ii) Use the method of Ordinary least Squares (OLS) to estimate the reduced form
coefficients/parameters.

iii) Substitute the parameter estimates obtained in the reduced-form equations to obtain
the coefficients in the structural equation by comparing the direct effect and the total
effect.

The structural parameters will be unique if the structural model was exactly identified. However,
ILS cannot be used when there is a multicollinearity problem among the explanatory variables.

Similarly, the error term should satisfy the usual assumptions of the classical linear regression
model, i.e., E(Ui) = 0; E(Ui²) = σ²; E(UiUj) = 0 for i ≠ j; and E(UiXi) = 0.

ii) Two – Stage Least Squares(2SLS) Method

Consider the following system of equations

Y=C+I
C=a + b Y (a>0;0<b<1)

Where: Y is National Income; C is consumption and I is the level of Investment.

The parameters a, and b are autonomous consumption and marginal propensity to consume
respectively. We can estimate these equations using 2SLS method using following steps:

STEP 1: Obtain the reduced form equations from structural equations.

STEP 2: Use the method of ordinary Least Squares to estimate the reduced form coefficients and
the reduced form equation. OLS estimation will therefore yield the estimated values for Y and C
(i.e Ŷ and Ĉ).

STEP 3: Replace the estimated values Ŷ and Ĉ on the right hand side of the structural equations
where they appear as follows.

Y = C + I becomes Y = Ĉ + I
C = a + bY becomes C = a + bŶ

STEP 4: Run another OLS regression to these equations to obtain the structural estimates.

2SLS is used to estimate over-identified equations, although it can also be applied to exactly
identified equations, in which case the ILS and 2SLS results are identical.

In 2SLS, OLS estimates of the reduced-form equations are obtained in the first stage, while OLS
estimates of the structural equations are obtained in the second stage.
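The two stages can be illustrated on simulated data. Everything below — the parameter values α = 7, β = 0.8, the sample size and the error distribution — is an assumption made for the demonstration; a direct OLS fit of C on Y is included to show the simultaneity bias that 2SLS removes.

```python
import random

random.seed(1)

# Simulate the model C = alpha + beta*Y + u, Y = C + I (assumed parameters)
alpha, beta, n = 7.0, 0.8, 5000
I = [random.uniform(5, 15) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
Y = [(alpha + i + e) / (1 - beta) for i, e in zip(I, u)]  # reduced form for Y
C = [y - i for y, i in zip(Y, I)]                         # from the identity

def ols(x, y):
    """Bivariate OLS: returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Naive OLS of C on Y is biased upward by simultaneity
_, b_ols = ols(Y, C)

# Stage 1: regress Y on the exogenous I and form fitted values Y_hat
g0, g1 = ols(I, Y)
Y_hat = [g0 + g1 * i for i in I]

# Stage 2: regress C on Y_hat to recover the structural parameters
a_hat, b_hat = ols(Y_hat, C)
print(round(b_ols, 3), round(a_hat, 2), round(b_hat, 2))
```

With a large simulated sample the naive OLS slope settles above the true 0.8, while the 2SLS estimates land close to (7, 0.8).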

iii) The Method of Instrumental Variables

The instrumental variable method is a single equation method, being applied to one equation of
the system at a time. It has been developed as a solution to the simultaneous equation bias and is
appropriate for over-identified models. In order to reduce the dependence of the error term and
the explanatory variable (the simultaneous bias), the method proposes to use appropriate
exogenous variables as instruments; and then makes use of OLS to estimate equation.

Steps in the instrumental variables method:

STEP 1: Choose the appropriate instrumental variables which will replace the endogenous
variables appearing as explanatory variables on the right hand side of the structural equation.

STEP 2: Multiply the structural equation through by each one of the instrumental variables (as
well as by the exogenous variables already present in it, since these predetermined variables are
their own instruments) and sum the equation over all sample observations.

This will provide as many linear equations as there are known parameters. From the solutions of
these equations, we obtain estimates of the structural parameters.

An instrumental variable is an exogenous variable located somewhere in the system of


simultaneous equations which satisfies the following conditions.

i) It must be strongly correlated with the endogenous variable which it will replace in the
structural equation.

ii) It must be truly exogenous and hence uncorrelated with the error term of the structural
equation.

iii) It should be as little correlated as possible with the exogenous variables already appearing
in the set of explanatory variables of the particular structural equation. This is important so
that the problem of multicollinearity does not arise.

N.B. Choose as many instrumental variables as there are endogenous variables in the set of
explanatory variables of a particular structural equation.

# Assignment: Read on showing that application of OLS leads to inconsistent estimators for
Simultaneous equations.

TOPIC FOUR: TIME SERIES ECONOMETRICS

Introduction

A time series (TS) is a sequence of numerical data in which each item is associated with a
particular instance in time.

It is also defined as a collection of random variables (Xt) ordered in time; hence a time series
is also referred to as a stochastic process.

Examples of Time Series

 Unemployment (monthly)
 Money supply (weekly)
 Stock price indices (daily)
 Gross domestic product (quarterly)
 Population (decennially)
Hence in time series analysis the frequency at which data is measured (periodicity) is an
important aspect.
A time series spreadsheet would appear as follows:

Year    Var 1   Var 2   Var 3
1950    -       -       -
1951    -       -       -
1952    -       -       -
.       -       -       -
.       -       -       -
.       -       -       -
2010    -       -       -

There are 2 types of Time series analysis

1. Univariate Time Series Analysis

This is analysis of a single sequence of data.

2. Multivariate Time Series Analysis

This is analysis of several sets of data for the same sequence of time periods.

The aim of time series analysis is to study the dynamics of the data or to understand the
temporal structure of the data.

STATIONARITY OF TIME SERIES

A time series is said to be stationary if its mean and variance are constant over time and the value
of its covariance between two time periods depends only on the lag or distance between the two
time periods. Such a time series is said to be weakly stationary or second-order stationary.

Therefore, for a stochastic process Yt to be weakly stationary it must satisfy the following three
properties:

1. Constant mean:
   E(Yt) = μ

2. Constant variance:
   Var(Yt) = E(Yt − μ)² = σ²

3. Covariance depending only on the lag k between the two time periods t and t+k:
   Cov(Yt, Yt+k) = E[(Yt − μ)(Yt+k − μ)] = γk

If there is no lag, i.e. k = 0, then the covariance equals the variance, as demonstrated below:

   Cov(Yt, Yt) = E[(Yt − μ)(Yt − μ)] = E(Yt − μ)² = Var(Yt) = γ0

In summary, if a time series is stationary, then its moments (mean, variance and covariance) are
time invariant, i.e. they do not vary with time. Such a time series will also tend to return to its
mean. This is called mean reversion.

A Time Series is said to be strictly stationary if all the moments of its probability distribution
are invariant over time. Thus if a Time Series is such that one or more of its moments vary with
time, then such a series is non-stationary.
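The weak-stationarity moments above can be checked numerically. Below is a minimal sketch (our own illustration, not part of the notes) that simulates a white-noise series with NumPy and compares the sample moments across the two halves of the sample; all names and the fixed seed are our choices.

```python
import numpy as np

# Simulate a white-noise (hence stationary) series and compare sample
# moments across the two halves of the sample. Seed fixed for reproducibility.
rng = np.random.default_rng(42)
y = rng.normal(loc=0.0, scale=1.0, size=10_000)

first, second = y[:5_000], y[5_000:]

mean_diff = abs(first.mean() - second.mean())  # near 0: constant mean
var_diff = abs(first.var() - second.var())     # near 0: constant variance

# Sample autocovariance at lag 1: for white noise, gamma_k is ~0 for k >= 1.
centred = y - y.mean()
gamma_1 = np.mean(centred[:-1] * centred[1:])

print(mean_diff, var_diff, gamma_1)
```

For a non-stationary series (for example a random walk), the two halves would typically show very different means and variances.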

Stationary series

[Figure: a stationary series Yt fluctuating around a constant mean/trend line over time t]

Non-Stationary Series

[Figure: a non-stationary series Yt drifting away from its mean/trend line over time t]

Therefore, before conducting any time series analysis, it is important to ensure that the time
series is stationary, because the results of a stationary series can be generalized to other time
periods, unlike a non-stationary series, whose results cannot be generalized and are of little
practical value. In fact, for a stationary series, any shock to the series will eventually die away.

Therefore, a time series Yt is said to be stationary, if it fulfills the following conditions:

(i) Has a constant mean (mean reverting)

(ii) Has a constant variance

(iii) Has a constant covariance, which only depends on the time lag

(iv) Has a constant skewness and constant kurtosis

(v) Has no unit root, which means that it is integrated of order zero

(vi) Has no trend, that is, no sustained upward or downward movement over time.

(vii) Has transitory rather than permanent innovations (that is, a shock to a stationary
series will die away soon, whereas for a non-stationary series it persists for
long).

DATA GENERATION PROCESS

There are six common data generating processes for a stochastic process. They describe the
behaviour of a time series variable.

1. A Purely Random (White Noise) process


2. A Random Walk process
3. An Autoregressive (AR) process
4. A Moving Average (MA) process
5. An Autoregressive Moving Average(ARMA) process
6. An Autoregressive Integrated Moving Average (ARIMA) process.
These are now explained as follows:
1. A Purely Random (White Noise) process
A purely random process is one in which the time series Yt consists only of its error term Ut, i.e.

Yt = Ut

For this reason, a purely random process is also known as a white noise process. A white noise
process is stationary and satisfies all the assumptions of the classical linear regression
model (CLRM).

2. A Random Walk process


A random walk process is similar to a white noise process except that the series Yt changes due
to the error term Ut as well as its own lagged value.

Thus a random walk process is defined as:

Yt = Yt-1 + Ut
Examples of variables which exhibit such behavior include: stock market prices, exchange
rates, asset prices.

[Figure: a random walk path Yt wandering over time t]

A random walk process is commonly referred to as a drunkard's walk. A random walk process
is therefore non-stationary.

There are 2 types of random walks

i) Random walk without a drift (intercept/constant):

Yt = Yt-1 + Ut

ii) Random walk with a drift (intercept):

Yt = δ + Yt-1 + Ut

where δ is the drift/intercept.
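The two random walks above can be sketched in a short simulation (our own illustration; the drift value δ = 0.5 is an arbitrary choice). The level series wander, while differencing once recovers the shocks, plus the drift where present.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
u = rng.normal(size=n)           # the shocks U_t

rw = np.cumsum(u)                # Y_t = Y_{t-1} + U_t   (no drift)
delta = 0.5                      # arbitrary drift for illustration
rw_drift = np.cumsum(delta + u)  # Y_t = delta + Y_{t-1} + U_t

# Differencing a random walk returns the (stationary) shocks:
recovered = np.diff(rw)                # equals u[1:]
drift_est = np.diff(rw_drift).mean()   # estimates delta
print(drift_est)
```

The drifting walk ends far above the driftless one because the drift accumulates: after n steps it contributes n·δ to the level.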

3. An Autoregressive (AR) process


An AR process is one in which Yt depends on its own lagged values, such that the coefficient
of the lagged dependent variable (α) lies between 0 and 1, i.e. 0 < α < 1.

Hence, an example of an AR process is:

Yt = αYt-1 + Ut ........................................ AR(1)

The model above is an autoregressive process of order 1, simply written AR(1), because the
maximum lag length is 1.

Notice that if α = 1, then an AR(1) becomes a random walk process.

Apart from an AR(1) we can have an AR(2), AR(3), ... etc., as follows:

Yt = α1Yt-1 + Ut ....................................... AR(1)
Yt = α1Yt-1 + α2Yt-2 + Ut ............................. AR(2)
.
.
Yt = α1Yt-1 + α2Yt-2 + ... + αpYt-p + Ut .............. AR(p)

or, compactly,

Yt = Σ(i=1 to p) αiYt-i + Ut .......................... AR(p)

Whether an AR process is stationary or non-stationary depends on the value of α.
If α = 1, then the process is non-stationary. But if |α| < 1, then it is stationary.
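A quick illustration (our own, with α values chosen arbitrarily) of how the value of α governs the stationarity of an AR(1):

```python
import numpy as np

def ar1(alpha, shocks):
    """Simulate Y_t = alpha*Y_{t-1} + U_t starting from Y_0 = 0."""
    y = np.zeros(len(shocks))
    for t in range(1, len(shocks)):
        y[t] = alpha * y[t - 1] + shocks[t]
    return y

rng = np.random.default_rng(1)
u = rng.normal(size=2_000)

stationary = ar1(0.5, u)   # |alpha| < 1: fluctuates around its mean
unit_root = ar1(1.0, u)    # alpha = 1: a random walk, wanders freely

print(np.var(stationary), np.var(unit_root))
```

With α = 0.5 the sample variance settles near the theoretical value σ²/(1 − α²); with α = 1 the variance of the path is far larger because the series never reverts to its mean.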

4. A Moving Average (MA) process

A Moving Average (MA) process is similar to an AR process, except for the fact that the
dependent variable Yt is regressed on the present as well as the past error terms as follows:

Yt = β1Ut-1 + Ut ...................................... MA(1)
Yt = β1Ut-1 + β2Ut-2 + Ut ............................. MA(2)
:
:
Yt = β1Ut-1 + β2Ut-2 + ... + βqUt-q + Ut .............. MA(q)

or, compactly,

Yt = Σ(i=1 to q) βiUt-i + Ut .......................... MA(q)

Point to note:

i. The MA process is, in a sense, the inverse of the AR process: an invertible MA
process can be rewritten as an infinite AR process, and vice versa.

ii. By the very nature of its construction (it consists of error terms only), the MA
process is always stationary.
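To see an MA process numerically, the sketch below (our own; β = 0.5 is arbitrary) simulates an MA(1) and checks that its autocorrelation is noticeable at lag 1, in theory β/(1 + β²) = 0.4 here, but essentially zero at lag 2: the characteristic cut-off of an MA(q) after lag q.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=20_000)
beta = 0.5
y = u[1:] + beta * u[:-1]   # MA(1): Y_t = U_t + beta*U_{t-1}

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    c = x - x.mean()
    return np.mean(c[:-k] * c[k:]) / np.var(x)

rho1, rho2 = acf(y, 1), acf(y, 2)
print(rho1, rho2)   # rho1 near 0.4, rho2 near 0
```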

5. Auto regressive Moving Average (ARMA) process.

As the name suggests, the ARMA process is obtained by combining the AR and MA processes,
and is usually specified as an ARMA of order (p, q), where:

p = order of the AR process
q = order of the MA process

Therefore ARMA(p, q) = AR(p) + MA(q):

Yt = Σ(i=1 to p) αiYt-i + Σ(i=1 to q) βiUt-i + Ut

For example, an ARMA(2, 3) combines an AR(2) with an MA(3) as shown below:

ARMA(2, 3): Yt = α1Yt-1 + α2Yt-2 + β1Ut-1 + β2Ut-2 + β3Ut-3 + Ut

where the first two terms are the AR(2) part and the β terms are the MA(3) part.

Since the MA is always stationary by construction, as already noted, then the stationarity of
ARMA depends on the stationarity of the AR which in turn depends on the value of α . If α
=1, then it is a random walk which is always non-stationary. However, if α <1, then the time
series is stationary.

6. Autoregressive Integrated Moving Average (ARIMA) process.


If a time series Yt is such that its AR part is not stationary, then it needs to be made stationary
by differencing the series d times (where d = 0, 1, 2, 3, ...).

If a series has to be differenced d times in order to make it stationary, then such a series is said to
be integrated of order d. i.e. I (d). Hence an ARIMA model is usually given as, ARIMA (p, d, q)
such that:
p = order of AR
d = number of times that AR needs to be differenced to make it stationary
q = order of MA

For example, an ARIMA (2, 1, 3) is a combination of an AR (2) and an MA (3), where the
series needs to be differenced once (d = 1) to make it stationary.

The main limitation of differencing a series, as a way of making it stationary, is that each
successive difference loses information (observations and degrees of freedom).
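The loss of observations from differencing is easy to see in code (a trivial sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(size=500))   # an I(1) series: one unit root

d1 = np.diff(walk)        # differenced once: now stationary, I(0)
d2 = np.diff(walk, n=2)   # differenced twice

# Each round of differencing costs one observation (a degree of freedom).
print(len(walk), len(d1), len(d2))   # 500 499 498
```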

PROBLEMS OF NON-STATIONARY SERIES


If an OLS regression is performed on variables which are non-stationary, two problems will
arise.
i) Nonsensical or spurious regression results

If an OLS regression is performed and one or more variables are non-stationary, then the
results are spurious, i.e. they will have no meaning.

ii) Inconsistent results

The goodness of fit (R2) or coefficient of determination will be very high but it will not
mean that the model has a good fit.

Similarly, hypothesis testing will become invalid because the standard errors of the
estimates will not be exact and hence inferential procedures will be misleading.

Therefore, before conducting an OLS regression, we need to test the stationarity for each and
every variable in the model.

TYPES OF NON STATIONARITY

There are two main types of non-stationarity: difference stationarity and trend stationarity.

A DIFFERENCE STATIONARY series, also known as stochastically non-stationary, is a series that
becomes stationary only after differencing. For example, a random walk with drift, as
discussed above, is a difference stationary series.

A TREND STATIONARY series, also known as deterministically non-stationary, is a series that
becomes stationary only after including a time trend in the model, or after detrending the series.
Thus, a trend stationary process is stationary around a linear trend, that is, Yt = α + βt + Ut,
where t is the time trend.

TESTING FOR STATIONARITY

There are five main tests for stationarity:

1. The Unit Root Test
2. The Dickey-Fuller (DF) Test
3. The Augmented Dickey-Fuller (ADF) Test
4. The Phillips-Perron (PP) Test
5. The KPSS Test

1. The Unit Root Test


The stationarity of a time series can be tested directly with a unit root test.
To illustrate, consider an AR (1) model of Y t given as: Yt = ρYt-1 + Ut, where we assume
Ut is a random disturbance term with zero mean and constant variance.
Now, if ρ = 1, then Yt is the random walk, which is non-stationary. Thus, if ρ = 1, then
the series is said to contain a UNIT ROOT.

On the other hand, if |ρ| < 1, then the AR (1) process is stationary. Therefore, the unit root
test of stationarity is a test of the null hypothesis H0: ρ = 1 (non-stationarity) against the
alternative hypothesis HA: |ρ| < 1 (stationarity).

If an AR (1) is non stationary, it can be made stationary by differencing it once. In this


case, we say the series is integrated of order one, that is, I (1). Stationary series do not
need any differencing, and are thus said to be integrated of order zero, I (0).

The STRENGTH of the unit root test, as shown above, is that it is very easy to understand
and apply in testing for stationarity.
The main LIMITATIONS of the unit root test are three-fold:
(i) The low power of the test, that is, if ρ = 0.95, the test may conclude that the series
is non stationary (since ρ is very close to unity), yet it is stationary.
(ii) The null hypothesis is that of non stationarity, but if we reject this null hypothesis,
it does not automatically imply that now the series is stationary – further tests will
be needed to confirm that.
(iii) The test begins with an AR (1). However, we cannot tell beforehand if indeed the
series to be tested follows an AR (1), or the other types of data generating
processes that were discussed above.
Due to these limitations, the unit root test is rarely used in this simple form; more robust
tests, such as the ADF, Phillips-Perron, and KPSS tests, are preferred.

2. The Dickey-Fuller (DF) Test
The Dickey Fuller test is a test for stationarity that finds out whether a time series contains a
unit root or not. If a time series contains a unit root, then it is said to be non-stationary.
The DF test is derived from an AR (1) as follows:

Yt = αYt-1 + Ut ................................. AR(1)

Subtract Yt-1 from both sides (i.e. difference once):

Yt − Yt-1 = αYt-1 − Yt-1 + Ut
ΔYt = (α − 1)Yt-1 + Ut
ΔYt = φYt-1 + Ut, where φ = α − 1

This equation is known as the Dickey-Fuller equation and the stationarity is defined as
follows:-

- If φ = 0 (i.e. |α| = 1), then Yt is said to contain a UNIT ROOT. If a series contains a unit
root, it is said to be non-stationary.

- If φ < 0 (i.e. |α| < 1), then Yt does not have a unit root, and hence it is stationary.

The Dickey-Fuller equation has 3 different specifications:


ΔYt = φYt-1 + Ut ............................ (No drift, No trend)
ΔYt = β0 + φYt-1 + Ut ....................... (Drift present, No trend)
ΔYt = β0 + φYt-1 + β1t + Ut ................. (Drift and Trend present)

The Dickey-Fuller test for testing for the presence of a unit root is done the same way as testing
for statistical significance of coefficients in ordinary regression Models.
Thus the tau statistic is calculated as the estimated coefficient divided by its standard error:

tau-calculated = φ / Se(φ)

Rather than using the normal t-tables, as in ordinary regression models, we instead use the
Dickey-Fuller (DF) tables to find the critical value, better known as the critical 'tau'.
The 'tau' values in the DF tables are usually negative; thus, we always consider the absolute
value in hypothesis testing.

The rule of thumb (working with absolute values) is as follows:

i) If |tau-calculated| > |tau-critical|
We reject H0 of a unit root (non-stationarity) and thus conclude that the series is
stationary.

ii) If |tau-calculated| < |tau-critical|
We do not reject H0 of a unit root and thus conclude that the series is non-stationary.

The levels of significance are the same as in ordinary regression, i.e. 1%, 5%, 10%.
However, the DF test assumes an AR (1), which means it only accommodates one lag. Where
there is more than one lag, e.g. AR (2), AR (3), etc., the appropriate test is the Augmented
Dickey-Fuller test.
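The DF regression and its tau statistic can be sketched with plain OLS. The code below is our own construction for the no-drift, no-trend case; the −1.95 mentioned in the comment is the approximate 5% critical value from the DF tables for this specification and should be checked against a published table.

```python
import numpy as np

def df_tau(y):
    """tau for phi in the DF regression  dY_t = phi * Y_{t-1} + U_t."""
    dy, ylag = np.diff(y), y[:-1]
    phi = (ylag @ dy) / (ylag @ ylag)     # OLS slope through the origin
    resid = dy - phi * ylag
    s2 = (resid @ resid) / (len(dy) - 1)  # residual variance
    se = np.sqrt(s2 / (ylag @ ylag))      # standard error of phi-hat
    return phi / se

rng = np.random.default_rng(4)
u = rng.normal(size=500)

walk = np.cumsum(u)                       # unit root: tau should be near 0
ar = np.zeros(500)                        # stationary AR(1) with alpha = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + u[t]

tau_walk, tau_ar = df_tau(walk), df_tau(ar)
print(tau_walk, tau_ar)   # tau_ar falls far below the ~-1.95 critical value
```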

The Augmented Dickey-Fuller (ADF) Test

The Dickey Fuller test explained above is useful for testing the presence of unit-roots
(stationarity) but assumes that the explanatory variable is the dependent variable that has been
lagged by one period only as follows.
Yt = αYt-1 + Ut ................................. AR(1)

However, it is possible to have a situation in which the explanatory variables are the dependent
variables which have been lagged by more than one period, say lagged by 2 periods or 3 periods,
and so on. For example, consider an AR (3), as shown below:

Yt = α1Yt-1 + α2Yt-2 + α3Yt-3 + Ut .............. AR(3)

An AR(3) can be re-parameterised in terms of first differences. Subtracting Yt-1 from both
sides and rearranging gives:

ΔYt = φYt-1 + γ1ΔYt-1 + γ2ΔYt-2 + Ut

where φ = (α1 + α2 + α3) − 1, γ1 = −(α2 + α3) and γ2 = −α3. In general, for an AR(k):

ΔYt = φYt-1 + Σ(i=1 to k-1) γiΔYt-i + Ut

This equation is the equation of interest in the Augmented Dickey-Fuller (ADF) test.
Therefore, the ADF test of stationarity is similar in spirit to the Dickey-Fuller test, but it
improves upon the DF test by including extra lagged differences of the dependent variable as
explanatory variables.

The reason for including extra lagged terms of the dependent variable is to make the residuals
serially uncorrelated (to remove the problem of auto-correlation).

The name ‘Augmented’ is due to the fact that the model is a Dickey Fuller Test which has simply
been augmented with extra lagged variables of the dependent variable.

The Augmented Dickey Fuller model has 3 different specifications just like the Dickey Fuller
model.
i) ΔYt = φYt-1 + Σ γiΔYt-i + Ut .................... (No constant, No trend)

ii) ΔYt = β0 + φYt-1 + Σ γiΔYt-i + Ut .............. (Constant present, No trend)

iii) ΔYt = β0 + φYt-1 + β1t + Σ γiΔYt-i + Ut ....... (Constant present, Trend present)

The last equation, which has a constant and a trend, is said to be the least restrictive and thus
the most useful. Therefore equation (iii) is the most preferred for testing for stationarity.
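The least restrictive ADF regression can be sketched by OLS with NumPy (our own construction, not from the notes); k is the number of lagged differences included, and the resulting tau must still be compared against the DF/ADF tables, not the normal t-tables.

```python
import numpy as np

def adf_tau(y, k=2):
    """tau for phi in  dY_t = b0 + phi*Y_{t-1} + b1*t + sum_i g_i*dY_{t-i} + U_t."""
    dy = np.diff(y)
    rows, target = [], []
    for t in range(k, len(dy)):
        # regressors: constant, lagged level, trend, k lagged differences
        rows.append([1.0, y[t], float(t)] + [dy[t - i] for i in range(1, k + 1)])
        target.append(dy[t])
    X, z = np.asarray(rows), np.asarray(target)
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ z
    resid = z - X @ b
    s2 = (resid @ resid) / (len(z) - X.shape[1])
    return b[1] / np.sqrt(s2 * xtx_inv[1, 1])   # tau for phi (column 1)

rng = np.random.default_rng(5)
u = rng.normal(size=500)
walk = np.cumsum(u)                 # unit root
ar = np.zeros(500)                  # stationary AR(1), alpha = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + u[t]

print(adf_tau(walk), adf_tau(ar))   # compare each against the DF-table tau
```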

Steps for testing stationarity using the ADF Test

#Step 1
Start with the least restrictive model, that is the one with constant and trend and test for the
following hypothesis.
ΔYt = β0 + φYt-1 + β1t + Σ γiΔYt-i + Ut

with H0: φ = 0 (unit root) against HA: φ < 0 (no unit root).
In doing so, we shall compare the calculated tau against the critical tau.

- If the calculated tau is less than the critical tau (i.e. more negative), reject H0: φ = 0 and
conclude that there is no unit root. Hence, STOP THERE.

- If the calculated tau is greater than the critical tau, do not reject H0: φ = 0, conclude that
there is a unit root, and proceed to Step 2.

#Step 2
Test the joint hypothesis:

H0: φ = β1 = 0
HA: φ ≠ 0 and/or β1 ≠ 0

In this step, we test for the significance of the trend, given that the model was found to have a
unit root.

- If the joint null hypothesis is rejected, the trend is significant; then test again for the presence
of a unit root (H0: φ = 0), but this time using the standard normal tables rather than the
Dickey-Fuller tables. If H0: φ = 0 is rejected, conclude that there is no unit root; if it is not
rejected, conclude that the series has a unit root. STOP THERE.

- If the joint null hypothesis is not rejected, the trend is not statistically significant, and
therefore proceed to Step 3.

#Step 3
Estimate the model without the trend, that is, equation (ii):

ΔYt = β0 + φYt-1 + Σ γiΔYt-i + Ut .................. (Constant present, No trend)

and test for the presence of a unit root using the Dickey-Fuller tables for the case with a
constant present and no trend.

- If H0: φ = 0 is rejected, there is no unit root; STOP THERE.

- If H0: φ = 0 is not rejected, there is a unit root, and hence test for the presence of the
constant/drift β0. To test for the presence of the constant/drift, the hypotheses are:

H0: β0 = φ = 0
HA: β0 ≠ 0 and/or φ ≠ 0
- If the joint null hypothesis above is rejected, the drift/constant is statistically significant;
test again for the presence of a unit root (H0: φ = 0), but using the standard normal
distribution rather than the Dickey-Fuller tables. If H0: φ = 0 is rejected, there is no unit
root; if it is not rejected, the series has a unit root. STOP THERE.

- On the other hand, if the joint null hypothesis is not rejected, the drift/constant is not
statistically significant, and therefore proceed to Step 4.

#Step 4
Estimate the model without drift and without trend, that is, equation (i):

ΔYt = φYt-1 + Σ γiΔYt-i + Ut ....................... (No constant, No trend)

and test for the presence of a unit root (H0: φ = 0) using the Dickey-Fuller tables for the case
with no constant and no trend.

- If H0: φ = 0 is rejected, there is no unit root; STOP THERE.
- If H0: φ = 0 is not rejected, conclude that the series has a unit root.

THE PHILLIPS-PERRON (PP) TEST

The Phillips-Perron test is similar to the ADF test, but it corrects for autocorrelated residuals
(serial correlation) non-parametrically, rather than by adding lagged difference terms as the
ADF does. The PP test generally yields the same conclusions as the ADF test.

Thus the Phillips-Perron test is suitable if the residuals are highly correlated, and its results
will be similar to those of the ADF test.

INTEGRATED SERIES

If a time series has a unit root, it is said to be non-stationary, but if a series has no unit root, it
is said to be stationary. If a series is non-stationary, then we need to make it stationary,
because any regression on non-stationary series will yield two problems:
1. Spurious regression
2. Inconsistent results

There are two methods of converting a non-stationary series to become stationary.

- Differencing
- De-trending

i. Differencing

Here, we keep differencing the series until it becomes stationary. (Including lagged terms of
the dependent variable was in fact the idea behind the ADF test.)

Such a series is therefore said to be difference stationary. Differencing is the most commonly
used approach but it leads to loss of information.

ii. Detrending

Detrending a series is removing the trend to make it stationary and such a series is said to be
trend stationary.

The main limitation of detrending a series is that while it removes the non-stationarity, it can
introduce an MA (1) component into the resulting series.

- If a series is found to contain one unit root (non-stationary) it needs to be differenced


once to make it stationary. Such a series that needs to be differenced once to make it
stationary is said to be integrated of order one, which is I (1).

- On the other hand, if a series has 2 unit roots, which means that it is non-stationary it
needs to be differenced twice to make it stationary and such a series is integrated of
order 2 i.e I(2)

Overall, if a series Yt has d unit roots, it needs to be differenced d times to make it
stationary, and the series is said to be integrated of order d, expressed as:
Yt ~ I(d).

Therefore, if a series Xt has no unit roots, it is stationary and there is no need for
differencing. In such a case the series is said to be integrated of order 0, i.e.
Xt ~ I(0).

Characteristics of integrated series

1. If Xt ~ I(0) and Yt ~ I(1), then Zt = (Xt + Yt) ~ I(1)
i.e if one combines a stationary series and a non-stationary series, the resulting series will be
non-stationary.

2. If Xt ~I (1) and Yt ~I (2), then Zt= (Xt+Yt) ~ I (2).

i.e if we combine two non-stationary series of different orders, then the resulting series will
also be non-stationary but its order of integration will be equal to the series which has a
higher order of integration.

3. If Xt ~I (1) then Zt= (a+bXt) ~ I (1).

I.e. if a series is non stationary, then its linear-combination will also be non-stationary of the
same order as before the combination.

4. If Xt ~I (1) and Yt ~I (1), then Zt= (Xt+Yt) ~ I (1).

I.e if one combines 2 non-stationary series of the same order, then resulting series, will also
be non-stationary with the same order.

However, it is also possible that Xt ~I (1) and Yt ~I (1), but Zt= (Xt+Yt) ~ I (0).

If 2 variables are non-stationary but their linear combination is stationary then such variables
such as Xt and Yt are said to be co-integrated.

COINTEGRATION
If two variables, Xt and Yt, have a long-term equilibrium relationship between them, then the
two variables are said to be cointegrated. This means that they move together, or that they do
not wander or drift away from each other, as demonstrated in the figure below.

[Figure: two cointegrated series Yt and Xt moving together over time t]

Therefore, if two time series are individually non-stationary, but their linear combination is
stationary, then the two series are said to be cointegrated. Good examples of economic variables
that are cointegrated include:

 Money Supply and Inflation


 Unemployment and inflation (the Phillips curve)

 Equity prices and dividends
 Spot prices and futures prices.
They move together because they have a long term relationship.

Testing for Co integration

Assume two time series Xt and Yt are thought to be cointegrated, perhaps from economic
theory. We therefore need to formally test this.

First, estimate the regression Yt = β0 + β1Xt + Ut and obtain the residuals. Use the residuals
to estimate the model Ût = ρÛt-1 + et, where ρ = rho.

Test the following hypothesis:

H0: ρ = 1
HA: ρ < 1
- If ρ = 1, then the residuals are non-stationary (they have a unit root), and thus Xt and Yt
are not cointegrated.

- However, if ρ < 1, the residuals do not have a unit root (they are stationary), and therefore
Xt and Yt are cointegrated.

In a nutshell, if Xt ~ I(1) and Yt ~ I(1), but Ut ~ I(0), then Xt and Yt are cointegrated:
the two variables are individually non-stationary, but their linear combination is stationary.

The above procedure for testing for cointegration is known as the Engle-Granger two-step
procedure. However, the limitation of the Engle-Granger test is that it considers only two
variables at a time, such as Xt and Yt. Thus, the test assumes only one cointegrating
equation between the variables.

Where there are multiple variables (more than two), we use the Johansen test.
With the Johansen test, where there could be three or more variables, we can find more than one
cointegrating equation. Generally, with n variables, we can have up to n − 1 cointegrating
equations.

The Johansen test is based on the eigenvalues (the maximum eigenvalue and trace statistics),
the cointegrating vectors and the rank of the matrix. All of these can be implemented in
EViews and Stata software.
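Below is a hedged sketch of the Engle-Granger two-step procedure on simulated data (all numbers and names are our own; in practice the step-2 statistic should be compared against Engle-Granger critical values, not the ordinary DF table).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = np.cumsum(rng.normal(size=n))        # X_t ~ I(1)
y = 2.0 + 0.8 * x + rng.normal(size=n)   # cointegrated with X_t by construction

# Step 1: static OLS of Y on X, keep the residuals.
X = np.column_stack([np.ones(n), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - b0 - b1 * x

# Step 2: unit-root style regression on the residuals:
# dU_t = (rho - 1) * U_{t-1} + e_t.  A strongly negative slope means the
# residuals are stationary, hence X and Y are cointegrated.
du, ulag = np.diff(resid), resid[:-1]
slope = (ulag @ du) / (ulag @ ulag)      # estimates (rho - 1)
print(b1, slope)
```

Note how accurately the static regression recovers the long-run coefficient (0.8 here): in a cointegrating regression the OLS estimate is "superconsistent".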

ERROR CORRECTION MODEL (ECM)

While co integration gives us a long term equilibrium relationship between variables, the error
correction model on the other hand gives us a short run relationship between variables.

Therefore the ECM appreciates the fact that while in the long run variables would have a long
run equilibrium relationship, in the short run there could be some disequilibrium which the ECM
takes care of.

The short run relationship between variables is represented by the ECM. Therefore, ECM gives
both the long run and short run relationships between variables.

Consider a simple bi-variate regression model: Yt = β0 + β1Xt + Ut. From this model, the
error term (Ut) can be expressed as follows:

Ut = Yt − β0 − β1Xt

Therefore the error term can be regarded as a disequilibrium error: it will take a value of 0 if
Xt and Yt are in equilibrium (cointegrated).

Now include the lagged values of Xt and Yt in the bi-variate regression model:

Yt = β0 + β1Xt + β2Xt-1 + αYt-1 + Ut, where 0 < α < 1

Take the first difference (subtract Yt-1 from both sides):

Yt − Yt-1 = β0 + β1Xt + β2Xt-1 + αYt-1 − Yt-1 + Ut

ΔYt = β0 + β1Xt + β2Xt-1 + (α − 1)Yt-1 + Ut

Add and subtract β1Xt-1 on the RHS of the equation:

ΔYt = β0 + β1Xt − β1Xt-1 + β1Xt-1 + β2Xt-1 + (α − 1)Yt-1 + Ut

ΔYt = β0 + β1(Xt − Xt-1) + (β1 + β2)Xt-1 + (α − 1)Yt-1 + Ut

ΔYt = β0 + β1ΔXt + (β1 + β2)Xt-1 − (1 − α)Yt-1 + Ut

To simplify, let (1 − α) = λ and (β1 + β2) = φλ. Then:

ΔYt = β0 + β1ΔXt + φλXt-1 − λYt-1 + Ut

Also let β0 = δλ:

ΔYt = δλ + β1ΔXt + φλXt-1 − λYt-1 + Ut

Factorise λ:

ΔYt = β1ΔXt − λ[Yt-1 − δ − φXt-1] + Ut

or, equivalently,

ΔYt = β1ΔXt + λ[δ + φXt-1 − Yt-1] + Ut

This equation is known as the first-order Error Correction Model (ECM), since we only
introduced the first lags of Xt and Yt.

This model can be interpreted as follows: the current change in Yt depends on the change in
Xt as well as the disequilibrium error in the last period. The expression Yt-1 − δ − φXt-1 is
the disequilibrium error in the previous period; more formally, it is called the error correction
term (ECM).

Therefore the first-order ECM obtained above can be written as:

ΔYt = β1ΔXt − λ·ECMt-1 + Ut, where ECMt-1 = Yt-1 − δ − φXt-1

The coefficient λ on the ECM term is called the adjustment factor: it shows the proportion of
the previous period's disequilibrium error (ECMt-1) that is corrected by the current change in
Yt, i.e. ΔYt.

NOTE:
The ECM is developed only for variables which are cointegrated, i.e. they have a long-term
relationship even though the variables themselves are non-stationary.

Therefore, before estimating an ECM, we first need to test whether the variables are
cointegrated, because the ECM assumes that the variables are cointegrated but may exhibit
short-run disequilibrium.
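A minimal sketch of estimating a first-order ECM on simulated cointegrated data (our own construction; in this data generating process the true short-run coefficient is 0.8 and adjustment is complete within one period, so λ should be near 1):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = np.cumsum(rng.normal(size=n))            # X_t ~ I(1)
y = 1.0 + 0.8 * x + rng.normal(size=n)       # long-run relation: cointegrated

# Long-run regression: Y_t = delta + phi*X_t, save the disequilibrium error.
X1 = np.column_stack([np.ones(n), x])
delta_hat, phi_hat = np.linalg.lstsq(X1, y, rcond=None)[0]
ecm = y - delta_hat - phi_hat * x            # ECM_t = Y_t - delta - phi*X_t

# Short-run regression: dY_t = b1*dX_t - lambda*ECM_{t-1} + U_t.
dy, dx = np.diff(y), np.diff(x)
X2 = np.column_stack([dx, ecm[:-1]])
b1_hat, coef_ecm = np.linalg.lstsq(X2, dy, rcond=None)[0]
lam = -coef_ecm                              # speed-of-adjustment factor
print(b1_hat, lam)
```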

TOPIC FIVE: PANEL DATA MODELS

INTRODUCTION

Panel data refers to the pooling or combination of cross-section observations and time series
data. For example, data on GDP for East African countries from 1960 to 2015.

 Cross-section models use observations at the same point in time but from different
entities, that is, countries, firms, individuals, etc. Cross-section observations are
indexed as i = 1, 2, 3, ..., N.
 Time series data are measured across different time periods, but for the same entity. Time
series data are given as t = 1, 2, 3 … T.

Pooling cross-section and time-series data enables us to study both changes over time, as well as
differences across entities, thus we shall have NT observations. By combining the data sets, we
will have a larger sample size. If the data set contains observations from the same entities over
time, then the pooled data are called panel data. Panel data is more useful than non-panel data
because it allows us to compare what happens across time more easily.

The use of panel data has become popular due to the increasing availability of time series data
(macro-economic data) for various countries. Panel data may be balanced or unbalanced. It is
balanced if every cross-section follows the same regular frequency with the same date
observations. On the other hand, it is unbalanced if some units do not appear in each time period,
usually due to natural attrition.

Advantages of panel data:

(i) Pooling cross-section and time-series data enables us to study both changes over time,
as well as differences across entities, so we capture dynamics better with panel data.
(ii) With panel data, we have so many observations, a much larger sample size and much
more information. Actually, we have NT observations.
(iii) Panel data can be used to deal with heterogeneity in micro units, and thus address
omitted variable bias.
(iv) By having more information than a pure cross-section or a pure time-series, panel
data yields more efficient estimators. Panel data also creates more variability.
(v) Panel data can be used to examine issues that cannot be studied using time series or
cross-section data, say the role of technological change in production.

A key limitation of panel data is that it may be costly to collect or acquire the data.

ESTIMATION TECHNIQUES FOR PANEL DATA

To estimate panel data models, we cannot simply run separate time-series regressions for each
entity, nor separate cross-section regressions for each year. If we did, we would miss out on
information contained in the data set for one entity that may be affecting another entity, and
our estimates would be less accurate. Indeed, entities are related to each other, and events that
occur over time affect many entities. Combining the data sets therefore increases the sample
size and thus provides more information to estimate the coefficients.

There are three estimation techniques that we can use with pooled data:

(i) Seemingly Unrelated Regression


(ii) A Dummy Variable Specification (Fixed Effects Regression)
(iii) Error Components Model (Random Effects Model)

We now discuss each of these models as shown below:

1. SEEMINGLY UNRELATED REGRESSION

Seemingly unrelated regression (SUR) is a set of regression equations that seem unrelated but in
reality, they are actually related – that is, they actually do have something in common with each
other.

As an example, the yields of maize, beans and wheat all depend on factors such as: rainfall,
temperature, fertilizer and input costs. Thus, we can have three regression equations, one for
maize, the other for beans, and the last one for wheat, as the dependent variables, but each is
regressed on the same set of independent variables: Rainfall, temperature, fertilizer and input
costs. These three separate but related regressions are seemingly unrelated regressions (SUR). In
simple terms, SUR are regression equations in which the independent variables are the same but
the dependent variables are different.

With SUR, the error terms will be correlated across separate but related regressions. In this way,
the SUR procedure can cause the correlation between error terms to improve the estimates.

The following are the STEPS in SUR:

(a) estimate each equation separately using OLS


(b) save the error term observations from the regressions in step (a) above
(c) use the error term observations to estimate the error variances and correlations between error
terms for different regressions
(d) Utilize the estimated error variances and correlations in a generalized least squares (GLS)
procedure to estimate the regressions jointly.
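The four steps above can be sketched on a small simulated two-equation system. This is a minimal illustration with hypothetical data (two equations whose errors are deliberately generated with a cross-equation correlation), not a full-featured SUR routine:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100  # observations per equation

# Two "seemingly unrelated" equations (e.g. maize and beans yields),
# each with its own regressor values but correlated error terms.
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=T)
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([0.5, -1.0]) + errs[:, 1]

# Step (a): estimate each equation separately by OLS
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]

# Steps (b)-(c): save the residuals and estimate the cross-equation
# error variances and covariance
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Sigma = E.T @ E / T  # 2x2 estimated error covariance matrix

# Step (d): feasible GLS on the stacked two-equation system
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
beta_sur = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
# beta_sur stacks both equations' coefficients: [b1_sur, b2_sur]
```

Note that if the two X matrices contained identical values, the GLS step would simply reproduce the equation-by-equation OLS estimates, in line with the equivalence result discussed next.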

If all the independent variables take the same value across the different regressions, SUR and
OLS (Ordinary Least Squares) will give the same results.

There are two situations in which SUR turns out the same as running separate OLS regressions:

- if every value for each independent variable is the same across every equation,
- If the correlation between errors terms in the individual regressions is zero (no
autocorrelation in the individual regression equations).

In many cases, however, SUR substantially improves upon the estimates found by OLS (that is,
SUR usually gives lower standard errors of the estimates). SUR is also unbiased and consistent.

In SUR, the parameters of the function differ across entities but are constant across time. Thus, a
SUR model is given as: yit = β1i + β2iX2it + β3iX3it + eit

We can test for cross-section restrictions, that is, whether the different equations have identical
coefficients or not. The null hypothesis is that the coefficients are identical across all the
equations, against the alternative hypothesis that at least one pair of coefficients is not equal.

To perform this test, we can use an F test or a chi-square (χ2) test, both of which are large-
sample approximate tests:

 The F test has J and (MT – K) degrees of freedom, where M is the number of equations,
K is the total number of coefficients in the whole system, and J is the number of
restrictions.
 The chi-square test has J degrees of freedom, where J is the number of restrictions.

At a particular level of significance, if the computed value exceeds the critical test statistic, then
we reject the null hypothesis of equal coefficients, and vice versa.

2. FIXED EFFECTS MODEL

The fixed effects model is also known as the dummy variable specification.

In this model, the intercept parameter varies across firms or entities but not over time. This
means that the intercept is different for each entity but each intercept stays constant over time.
The slope coefficients, otherwise known as response parameters, do not vary.

Hence, the Fixed Effects (FE) model is given as: yit = β1i + β2X2it + β3X3it + eit

The independent variables must also vary over time. Therefore, all behavioral differences
between individual firms and over time are captured by the intercept. Also, we assume that the
errors are independent and normally distributed with a mean of zero and constant variance for all
entities and in all time periods.

The FE model works by using dummy variables for the intercepts. Each cross-sectional entity in
the model has its own intercept dummy variable, and thus, its own intercept.

To avoid the dummy variable trap, we leave out the regular β0 intercept, so that the coefficient
of each dummy represents a different intercept for that entity. Thus:

yit = β11D1i + β12D2i + … + β1NDNi + β2X2it + β3X3it + eit

Where D1i = 1 if i = 1, and 0 otherwise

D2i = 1 if i = 2, and 0 otherwise

D3i = 1 if i = 3, and 0 otherwise, and so on

Thus, instead of estimating N separate regressions (since we have N dummies), each with a
sample size of T, we can estimate just one regression with a sample size of NT. Each entity gets
its own intercept. The different intercepts allow the FE model to capture differences between
entities.
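As an illustrative sketch (simulated data with hypothetical parameter values), the dummy-variable FE regression can be run as a single pooled OLS in which each entity receives its own intercept dummy and the slope is common:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 25                                     # entities and time periods
alphas = np.array([1.0, 3.0, -2.0, 0.5])         # true entity-specific intercepts

# Simulate panel data: y_it = alpha_i + 2*x_it + e_it
x = rng.normal(size=(N, T))
y = alphas[:, None] + 2.0 * x + rng.normal(scale=0.5, size=(N, T))

# One intercept dummy per entity; the regular intercept is omitted,
# so there is no dummy variable trap.
D = np.kron(np.eye(N), np.ones((T, 1)))          # (NT x N) dummy matrix
X = np.column_stack([D, x.reshape(-1)])          # dummies plus the slope regressor
beta = np.linalg.lstsq(X, y.reshape(-1), rcond=None)[0]
# beta[:N] are the N entity intercepts; beta[N] is the common slope
```

In practice one would usually rely on a fixed-effects option in econometrics software rather than build the dummy matrix by hand, but the two approaches produce the same estimates.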

By introducing a fixed effects component in the model, the FE model helps to reduce the effects
of omitted variable bias. The omitted variable bias arises if a variable that is correlated with the
included variables is excluded from the model.

FE is really just a special type of OLS with dummy variables, used on pooled cross-section
time-series data. Some econometrics software includes a fixed effects option that runs fixed
effects automatically, so that you do not have to add the dummy variables yourself.

The FE model can also be run with the regular β0 intercept if one dummy variable is omitted.
Each dummy variable coefficient would then equal the difference between the intercept for that
entity or firm and the intercept for the entity or firm for which we did not specify a dummy
variable (that is, the base entity or firm).

Advantages of Fixed Effects over SUR:

(i) FE allows all the data to be used in one regression, so the sample size is much bigger
(ii) FE therefore has more degrees of freedom than SUR, and hence, more accurate
estimates
(iii) Because it includes the dummy variables, FE estimates more coefficients than the
SUR.

One limitation of FE is that by estimating many coefficients, it consumes degrees of freedom
rapidly. Also, the R2 of FE is usually very high, but the model does not explain why differences
occur across entities.

The F test for Fixed Effects Model

The F test for the FE model tests whether we should assume that the intercept is the same for all
the entities (use only one intercept for all the entities) or allow different intercepts for different
entities (the fixed effects model).

To test whether to use one intercept for all entities or different intercepts for various entities, we
specify the null and alternative hypothesis as follows:

H0: β11 = β12 = … =β1N (the restricted model)

HA: the β1i are not all equal (the unrestricted model or the FE model)

We therefore obtain the residual sum of squares (RSS) for both the restricted and unrestricted
models. Then, we compute the F statistic manually as follows:

F(J, NT – K) = [(RSS_RESTRICTED – RSS_UNRESTRICTED) / J] / [RSS_UNRESTRICTED / (NT – K)]

If the computed F statistic exceeds the critical F statistic at a particular level of significance and
(J, NT – K) degrees of freedom, then we reject the null hypothesis that we should use one
intercept. We therefore conclude that the FE model is superior to the model with just one
common intercept.
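As a numerical illustration with hypothetical RSS values and sample dimensions, the F statistic can be computed as follows:

```python
# Hypothetical residual sums of squares from the restricted model
# (one common intercept) and the unrestricted fixed effects model.
rss_restricted = 120.0
rss_unrestricted = 80.0

N, T, K = 5, 20, 7   # entities, time periods, total coefficients in the FE model
J = N - 1            # restrictions: N - 1 equalities among the N intercepts

F = ((rss_restricted - rss_unrestricted) / J) / (rss_unrestricted / (N * T - K))
# F = (40/4) / (80/93) = 11.625, to be compared with the critical
# F value at (J, NT - K) = (4, 93) degrees of freedom
```

Since an F(4, 93) critical value at the 5% level is roughly 2.5, a computed statistic of 11.625 would lead us to reject the null hypothesis of a single common intercept in this hypothetical example.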

3. THE RANDOM EFFECTS MODEL

The Random Effects (RE) model is also known as the Error Components Model. In this model,
each entity or firm is allowed to still have a different intercept (as in the fixed effects model), but
the intercepts are now random variables.

Under the fixed effects model, the differences between intercepts for different entities are
considered constant, not random. For the random effects model, the intercepts for different cross-
sectional units are considered to be random variables, that is, they are randomly drawn from a
normal probability distribution.

A random effects model is appropriate if each industry or entity in the cross-sectional data is
chosen at random to represent a larger population; the differences between the intercepts then
occur because of random variation. For the fixed effects model, the sample actually represents
the entire population; for the random effects model, the sample is representative of a larger
population.

The similarity between RE and FE is that in both, the slope coefficients are forced to remain
constant.

The RE model has more degrees of freedom than the FE model, for the same study, because
unlike FE which uses intercept dummy variables, RE does not use intercept dummy variables.

The random effects model takes the intercept β1i to be random, and models it as: β1i = β̄1 + µi,
where i = 1, 2, 3, …, N. Here, β̄1 is an unknown parameter that represents the population mean
intercept, E(β1i) = β̄1, and µi is an unobservable random error that accounts for the individual
differences in firm behavior.

If we substitute β1i = β̄1 + µi into yit = β1i + β2X2it + β3X3it + eit, we get:

yit = (β̄1 + µi) + β2X2it + β3X3it + eit

yit = β̄1 + β2X2it + β3X3it + (eit + µi)

yit = β̄1 + β2X2it + β3X3it + vit

where vit = eit + µi. The phrase "error components" comes from the fact that the error term
vit = eit + µi consists of two components:

- the overall error eit which follows the classical OLS assumptions, and
- The individual error µi which reflects individual preferences – it varies across individuals
but is constant across time, and thus, it does not follow the OLS assumptions.

The error term vit has the following properties:

i) it has a zero mean: E(vit) = 0
ii) it has a constant variance (it is homoskedastic): var(vit) = σ2
iii) it has a non-zero covariance across time: cov(vit, vis) = σ2µ for t ≠ s

Due to the above properties, especially property (iii), OLS is not the optimal technique to
estimate the model. The generalized least squares (GLS) estimator is a better estimator, both for
estimation and for hypothesis testing.
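A quick simulation (with hypothetical variance values) can verify the error-component properties empirically: the composite error vit = eit + µi has mean zero, variance σ2µ + σ2e, and covariance σ2µ between any two periods for the same entity:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 6
sigma_u, sigma_e = 1.5, 1.0   # hypothetical standard deviations

u = rng.normal(scale=sigma_u, size=(N, 1))   # individual error: constant over time
e = rng.normal(scale=sigma_e, size=(N, T))   # overall error: varies over i and t
v = u + e                                    # composite error v_it = e_it + u_i

# Empirical moments should match the stated properties:
mean_v = v.mean()                    # ~ 0
var_v = v.var()                      # ~ sigma_u^2 + sigma_e^2 = 3.25
cov_ts = np.mean(v[:, 0] * v[:, 1])  # ~ sigma_u^2 = 2.25 for t != s
```

The non-zero within-entity covariance in the last line is exactly why OLS is inefficient here and GLS is preferred.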

Sometimes, the results of FE and RE show very little difference in the coefficient estimates for
the respective independent variables. In other situations, however, the results can be so different
that it matters a great deal whether one uses the FE or the RE model.

