Economics Students' Statistics Guide
COURSE GUIDE
Course Developer:
Okojie, Daniel Esene
Economics Department
University of Lagos
Edited By:
Dr. Ibrahim Bakare
Department of Economics,
Lagos State University
Course Reviewer
Dr. Mutiu Rasaki
Department of Economics,
Augustine University, Ilara, Epe, Lagos State
COURSE GUIDE
ECO 253
STATISTICS FOR ECONOMIST 1
© 2023 by NOUN Press
National Open University of Nigeria
Headquarters
University Village
Plot 91, Cadastral Zone,
Nnamdi Azikiwe Expressway
Jabi, Abuja
Lagos Office
14/16 Ahmadu Bello Way
Victoria Island, Lagos.
e-mail: centralinfo@nou.edu.ng
URL: www.nou.edu.ng
ISBN:
Introduction
Course Competences
Course Objectives
Working through this Course
Study Units
References and Further Readings
Presentation Schedule
Assessment
How to Get the Most from This Course
Online Facilitations
Course Information
Course Unit: 2
Course Blurb:
Course Team
Course Developer: NOUN
Instructional Designer:
Learning Technologists:
Copy Editor
Introduction
Statistics for Economist 1 is a three-credit, first-semester undergraduate course for Economics students at the National Open University of Nigeria. The course builds progressively from an elementary understanding of distribution functions to other inferential statistical techniques. It focuses on the practical issues involved in the substantive interpretation of economic data using sampling, estimation, hypothesis testing, correlation, and regression. For this reason, empirical case studies that apply these techniques to real-life data are stressed and discussed throughout the course, and students are required to perform several statistical analyses on their own.
This course will be very useful in your academic pursuit and will broaden your understanding of the role of statistics in the study of economics. It is designed to guide you on what statistics for economists entails and on the course materials you will use, in line with the course learning structure. The learning structure suggests general guidelines on the time frame required for each unit so that you achieve the course aims and objectives effectively.
Course Competences
This course is basically an introduction to statistics and application of Statistics in Economics. The topics
covered in this course include: The Normal, Binomial and Poisson Distributions, Estimate Theory, Test
of Statistical hypothesis including t, f and chi- square tests analysis of least square method, correlation
and Regression analyses. Others are elementary sampling theory and design of experiments, non-
parametric methods, introduction to the central limit theory (CLT) and the law of large numbers
Course Objectives
To achieve the aims set above in addition with the overall slated course objectives, each unit would also
have its specific objectives. The unit objectives are included at the beginning of a unit; you should read
them before you start working through the unit. You may want to refer to them during your study of the
unit to check on your progress. You should always look at the unit objectives after completing a unit.
In this way, you can be certain you have done what was necessary of you by the unit. The course
objectives are set below for you to achieve the aims of the course. On successful conclusion of the course,
you should be able to:
To successfully complete this course, you are required to read the study units, referenced books and other
materials on the course.
Each unit contains self-assessment exercises, called Student Assessment Exercises (SAEs). At some points in the course, you will be required to submit assignments for assessment purposes. At the end of the course there is a final examination. The course should take about twelve weeks to complete; its components are outlined under the course material subsection.
Study Units
There are 20 units in this course which should be studied carefully and diligently.
Module 1 (Units 1-4) presents the common probability distribution functions as a general background to the course: the discreteness of the Bernoulli, Binomial and Poisson distributions and the continuous nature of the Normal distribution are shown. Module 2 (Units 5-9) explains some statistical hypothesis tests; the t-test, F-test, chi-square test, analysis of variance (ANOVA), and parametric and non-parametric test methods are all introduced. Their usage, significance, comparison of samples and application for economists are also explained.

Correlation and regression coefficient analyses are contained in Module 3 (Units 10-13). This module explores Pearson's correlation coefficient, Spearman's rank correlation coefficient, the least squares regression line and forecasting in regression. Module 4 (Units 14-17) covers a detailed introduction to the central limit theorem (CLT). Central limit theorems for independent sequences and for dependent processes, and the relation to the law of large numbers, are brought to your knowledge here. Extensions of the theorem beyond the classical framework are presented in Unit 17 of Module 4. Basic concepts and notation of elementary index numbers and an introduction to research methods in the social sciences are in Units 18-20 of Module 5, which covers index numbers, statistical data, and samples and sampling techniques.
Each study unit will take at least two hours and includes an introduction, objectives, main content, examples, In-Text Questions (ITQs) and their solutions, a self-assessment exercise, a conclusion, a summary and references. Other areas border on the Tutor-Marked Assessment (TMA) questions. Some of the ITQs and self-assessment exercises will require you to brainstorm and work through them with some of your colleagues. You are advised to do so in order to comprehend and get acquainted with how important statistics is in making the most of economics.
There are also statistical materials, textbooks listed under the references and other (online and offline) resources for further reading. They are meant to give you additional information whenever you avail yourself of such opportunities. You are required to study the materials and to practise the ITQs, self-assessment exercises and TMA questions for a greater and more in-depth understanding of the course. By doing so, the stated learning objectives of the course will have been achieved.
Spiegel, M. R. & Walpole, R. E. (1992). Theory and Problems of Statistics: Introduction to Statistics (2nd ed.). Collier Macmillan International Editions.

Dowling, E. T. (2001). Mathematical Economics (2nd ed.). Schaum's Outline Series.

Esan, E. O. & Okafor, R. O. Basic Statistical Methods. Lagos, Nigeria: JAS Publishers. ISBN 978-33180-0-4.
Presentation Schedule
The presentation schedule included in your course materials gives you the important dates this year for completing your tutor-marked assignments and attending tutorials. Remember, you are required to submit all your assignments by the due dates. You should guard against falling behind in your work.
Assessment
There are two types of assessment in this course. The first is the tutor-marked assignment; the second is a written examination.
In attempting the assignments, you are expected to apply the information, knowledge and techniques gathered during the course. The assignments must be submitted to your tutor/lecturer for formal assessment in accordance with the deadlines stated in the Presentation Schedule and the Assignments File. The work you submit to your tutor for assessment will count for 30% of your total course mark. At the end of the course, you will need to sit a final written examination of three hours' duration. This examination will count for the remaining 70% of your total course mark.
An advantage of distance learning is that the study units replace the university lecturer. You can read and work through the specially designed study materials at your own pace, and at a time and place that suit you.

Think of it as solving statistical problems yourself rather than listening to and copying the solutions a lecturer provides. In the same way that a lecturer might set you practice exercises and ITQs, the study units tell you when to solve problems, when to read your books or other material, and when to embark on discussions with your colleagues. Just as a lecturer might give you an in-class exercise, your study units provide exercises for you to do at appropriate points.
Each of the study units follows a common format. The first item is an introduction to the subject
matter of the unit and how a particular unit is integrated with the other units and the course as a
whole. Next is a set of learning objectives. These objectives let you know what you should be able to do
by the time you have completed the unit. You should use these objectives to guide your study. When you
have finished the unit you must go back and check whether you have achieved the objectives. If you
make a habit of doing this you will significantly improve your chances of passing the course and
getting the best grade.
The main body of the unit guides you through the required understanding from other sources. This will
usually be either from your set books or from a readings section.
Some units require you to undertake a practical review of real-life statistical events. You will be directed when you need to embark on discussions and will be guided through the tasks you must do.

The purpose of the practical review of real-life statistical events is twofold. First, it will enhance your understanding of the material in the unit. Second, it will give you practical experience and the skills to evaluate economic arguments and to understand the role of statistics in guiding current economic problems, calculations, analyses, solutions and debates outside your studies. In any event, most of the critical-thinking skills you develop while studying are applicable in normal working practice, so it is important that you encounter them during your studies.
Self-assessments are interspersed throughout the units, and answers are given at the ends of the units. Working through these tests will help you to achieve the objectives of the unit and prepare you for the assignments and the examination. You should do each self-assessment exercise as you come to it in the study unit. Also, make sure you master the major statistical theorems and models as you study the material.
The following is a practical strategy for working through the course. If you run into any trouble, consult
your tutor. Remember that your tutor's job is to help you. When you need help, don't hesitate to call and
ask your tutor to provide it.
Your tutor will mark and comment on your assignments, keep a close watch on your progress
and on any difficulties you might encounter, and provide assistance to you during the course.
You must mail your tutor-marked assignments to your tutor well before the due date (at least
two working days are required). They will be marked by your tutor and returned to you as soon
as possible.
Do not hesitate to contact your tutor by telephone, e-mail, or discussion board if you need
help. The following might be circumstances in which you would find help necessary.
Contact your tutor if:
• You do not understand any part of the study units or the assigned readings.
• You have difficulty with the self-assessment exercises.
• You have a question or problem with an assignment, with your tutor's comments on an assignment, or with the grading of an assignment.
You should try your best to attend the tutorials. This is the only chance to have face to face
contact with your tutor and to ask questions which are answered instantly. You can raise any
problem encountered in the course of your study. To gain the maximum benefit from course
tutorials, prepare a question list before attending them. You will learn a lot from participating in
discussions actively.
The general aim of this module is to provide learners with a thorough understanding of probability and statistical distribution functions. The focus here is to present the common probability distribution functions as a general background to the course. The discreteness of the Bernoulli, Binomial and Poisson distributions and the continuous nature of the Normal distribution are presented in this module.
Module 1: Probability and Statistical Distribution Functions

The four units that constitute this module are statistically linked. At the end of this module, learners will be able to list, differentiate and link these common probability distribution functions, as well as identify them and use them to solve related statistical problems. The units to be studied are:
UNIT 1: BERNOULLI DISTRIBUTION

Unit Structure
1.1 Introduction
1.2 Learning Outcomes
1.3 Bernoulli Distribution
1.4 Bernoulli Trials
1.5 Bernoulli Process
1.5.1 Interpretation
1.5.2 Further Explanation
1.5.3 Solved Examples
1.6 Summary
1.7 References/Further Reading/ Web Resources
1.8 Possible Answers to Self-Assessment Exercises (SAEs)
1.1 Introduction
The Bernoulli distribution is a discrete probability distribution, meaning it is concerned with discrete random variables. A discrete random variable is one that has a finite or countable number of possible values, for example the number of heads you get when tossing three coins at once. A Bernoulli distribution is a discrete distribution with only two possible values for the random variable. The distribution has only two possible outcomes and a single trial, which is called a Bernoulli trial. The two possible outcomes are labelled n = 0 and n = 1, in which n = 1 (success) occurs with probability p and n = 0 (failure) occurs with probability 1 - p.
A classical example of a Bernoulli experiment is a single toss of a coin. The coin might come up heads with probability p and tails with probability 1 - p. The experiment is called fair if p = 0.5, indicating the origin of the terminology in betting (a bet is fair if both possible outcomes have the same probability).
The probability mass function of the Bernoulli distribution over the two possible outcomes k is:

f(k; p) = p^k (1 - p)^(1-k), for k = 0 or 1;

that is, f(1; p) = p and f(0; p) = 1 - p.

The expected value of a Bernoulli random variable X is E(X) = p, and its variance is

Var(X) = p(1 - p) = pq
Note that the Bernoulli distribution is a special case of the Binomial distribution with n = 1. The excess kurtosis goes to infinity for high and low values of p, but for p = 0.5 the Bernoulli distribution has a lower excess kurtosis than any other probability distribution, namely -2. The maximum likelihood estimator of p based on a random sample is the sample mean.
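As a quick numerical check on these results, the short Python sketch below (illustrative only; the function names are my own, not from the course text) computes the Bernoulli pmf, mean, variance and excess kurtosis directly from the formulas above:

```python
def bernoulli_pmf(k, p):
    """f(k; p): probability that a Bernoulli(p) variable equals k (0 or 1)."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p if k == 1 else 1 - p

def bernoulli_mean(p):
    # E(X) = 0*(1 - p) + 1*p = p
    return p

def bernoulli_variance(p):
    # Var(X) = p - p^2 = p(1 - p) = pq
    return p * (1 - p)

def bernoulli_excess_kurtosis(p):
    # (1 - 6p(1 - p)) / (p(1 - p)); equals -2 when p = 0.5
    q = 1 - p
    return (1 - 6 * p * q) / (p * q)
```

For p = 0.5 the excess kurtosis evaluates to -2, the minimum over all probability distributions, as stated above.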
1.5 Bernoulli Process

A Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic (involving or showing random behaviour) process that takes only two values, namely 0 and 1. The component Bernoulli variables Xi are identically distributed and independent. In the ordinary sense, a Bernoulli process is repeated coin flipping, possibly with an unfair coin (but with consistent unfairness). Every variable Xi in the sequence is associated with a Bernoulli trial or experiment, and they all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes (such as the process for a six-sided die); this generalization is known as the Bernoulli scheme.
The problem of determining the process, given only a limited sample of Bernoulli trials, may be called the problem of checking whether a coin is fair.
In other words, a Bernoulli process is a sequence of independent, identically distributed Bernoulli trials. Independence of the trials implies that the process has no memory: given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.) If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process; this is the fresh-start property.
1.5.1 Interpretation
The two possible values of each Xi are often called "success" and "failure". Thus, when expressed as a
number 0 or 1, the outcome may be called the number of successes on the ith "trial". Two other
common interpretations of the values are true or false and yes or no. Under any interpretation
of the two values, the individual variables Xi may be called Bernoulli trials with parameter p. In
many applications time passes between trials, as the index i increases. In effect, the trials X1, X2, ... Xi,
... happen at "points in time" 1, 2, ..., i, .... However, passage of time and the associated notions of "past"
and "future" are not necessary. Most generally, any Xi and Xj in the process are simply two members of a set of random variables indexed by {1, 2, ..., n} or by {1, 2, 3, ...}, in the finite and infinite cases respectively.
Several random variables and probability distributions besides the Bernoulli itself may be derived from the Bernoulli process:
- The number of successes in the first n trials, which has a Binomial distribution B(n, p)
- The number of trials needed to get r successes, which has a negative Binomial
distribution NB(r, p)
- The number of trials needed to get one success, which has a geometric distribution
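These derived distributions can be illustrated by simulation. The sketch below (hypothetical code, standard library only) draws many short Bernoulli processes and checks that the number of successes in the first n trials averages to np, as the Binomial distribution predicts, and that the waiting time for the first success averages to 1/p, as the geometric distribution predicts:

```python
import random

def bernoulli_process(p, n, rng):
    """One realisation of n independent Bernoulli(p) trials (1 = success)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def trials_until_first_success(p, rng):
    """Geometric waiting time: number of trials needed for one success."""
    count = 1
    while rng.random() >= p:
        count += 1
    return count

rng = random.Random(2023)
p, n = 0.3, 10

# Number of successes in the first n trials ~ Binomial B(n, p):
counts = [sum(bernoulli_process(p, n, rng)) for _ in range(20000)]
mean_successes = sum(counts) / len(counts)   # expected value is np = 3.0

# Waiting time for the first success ~ Geometric(p), mean 1/p:
waits = [trials_until_first_success(p, rng) for _ in range(20000)]
mean_wait = sum(waits) / len(waits)          # expected value is 1/p, about 3.33
```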
The Bernoulli process can be formalized in the language of probability spaces as a random sequence of independent realisations of a random variable that can take the values heads or tails. The state space for an individual value is denoted 2 = {H, T}. Specifically, one considers the countably infinite direct product of copies of 2 = {H, T}. It is common to examine either the one-sided set Ω = 2^N = {H, T}^N or the two-sided set Ω = 2^Z. There is a natural topology on this space, called the product topology. The sets in this topology are finite sequences of coin flips, that is, finite-length strings of H and T, with the rest of the (infinitely long) sequence taken as "don't care". These sets of finite sequences are referred to as cylinder sets in the product topology. The set of all such strings forms a sigma algebra, specifically a Borel algebra. This algebra is commonly written as (Ω, Ƒ), where the elements of Ƒ are the finite-length sequences of coin flips (the cylinder sets).

If the chances of flipping heads or tails are given by the probabilities {p, 1-p}, then one can define a natural measure on the product space, given by P = {p, 1-p}^N (or by P = {p, 1-p}^Z for the two-sided process). Given a cylinder set, that is, a specific sequence of coin-flip results [w1, w2, w3, ..., wn] at times 1, 2, 3, ..., n, the probability of observing this particular sequence is given by:

P([w1, w2, w3, ..., wn]) = p^k (1-p)^(n-k)

where k is the number of times that H appears in the sequence and n-k is the number of times that T appears. There are several different notations for the above; a common one is to write

P(X1 = w1, X2 = w2, ..., Xn = wn) = p^k (1-p)^(n-k)

where each Xi is a binary-valued random variable. It is common to write xi for wi. This probability P is commonly called the Bernoulli measure.
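The Bernoulli measure of a cylinder set is easy to compute. The sketch below (illustrative only, with H denoting success) evaluates p^k (1-p)^(n-k) for a specific finite sequence:

```python
def bernoulli_measure(sequence, p):
    """Probability of observing a specific finite sequence of coin flips,
    written as a string of 'H' (success) and 'T' (failure):
    P = p^k * (1 - p)^(n - k), where k = number of H's and n = length."""
    k = sequence.count("H")
    n = len(sequence)
    return p ** k * (1 - p) ** (n - k)
```

For a fair coin every length-n sequence has the same measure (1/2)^n, e.g. bernoulli_measure("HHT", 0.5) gives 1/8; as n grows this tends to zero, which is exactly why any single infinite sequence has measure zero.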
Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because the probability p^k (1-p)^(n-k) of any finite prefix tends to zero as n grows, for any 0 < p < 1. One says that any given infinite sequence has measure zero. Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others; this is given by the asymptotic equipartition property.
To conclude the formal definition, a Bernoulli process is then given by the probability triple, (Ω, Ƒ, P)
(as defined above).
A Bernoulli random variable is one which has only 0 and 1 as possible values. Let p = P(X = 1). Thus a Bernoulli random variable X has the following table:

Possible values of X:  0      1
Probabilities:         1 - p  p

A Bernoulli random variable is the simplest random variable. It models an experiment in which there are only two outcomes. Generically, we say that X = 1 is a success and X = 0 is a failure, and that p is the "success" probability.
Mean and Variance: For a Bernoulli random variable X with success probability p:

E(X) = 0(1 - p) + 1(p) = p

Var(X) = E(X^2) - [E(X)]^2 = 0^2 (1 - p) + 1^2 (p) - p^2 = p - p^2 = p(1 - p)
Example 1: A fair die is tossed. Let X = 1 only if the first toss shows a "4" or a "5".

Solution: X is a Bernoulli random variable with success probability p = P(X = 1) = 2/6 = 1/3 and failure probability 1 - p = 2/3.

Similarly, if in a single throw of a die the outcome "5" is called a success and any other outcome is called a failure, then successive throws of the die constitute Bernoulli trials. In that case the probability of success is p = 1/6 and the probability of failure is q = 5/6.
In this unit, you are expected to have learnt the essentials and applications of the Bernoulli distribution. Learners should by now also be able to identify Bernoulli distribution function problems and solve them accordingly.
Spiegel, M. R. & Stephens, L. J. (2008). Statistics (4th ed.). New York: McGraw-Hill.

Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance (2nd ed.). London: Macmillan Publishers.

McCullagh, P. & Nelder, J. (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5. Web site: http://en.wikipedia.org/wiki/Bernoulli_distribution

Johnson, N. L., Kotz, S. & Kemp, A. (1993). Univariate Discrete Distributions (2nd ed.). Wiley. ISBN 0-471-54897-9. Web site: http://en.wikipedia.org/wiki/Bernoulli_distribution

Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.

Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.

McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
Answers to SAE 1
1. A Bernoulli trial is an instantiation of a Bernoulli event: an experiment with exactly two possible outcomes, success and failure.
2. Examples of Bernoulli trials are coin tosses, the outcome of a football match (win or not), and rolling dice (for example, a six or not).
3. A Bernoulli process is a sequence of Bernoulli trials. It is a finite or infinite sequence of binary random variables, involving or showing random behaviour, that takes only two values, namely 0 and 1.
UNIT 2: BINOMIAL DISTRIBUTION
Unit Structure
2.1 Introduction
2.6 Summary
2.1 INTRODUCTION
The binomial probability distribution is a discrete probability distribution that has many applications. It is associated with a multiple-step experiment that we call the binomial experiment, which satisfies the following conditions:

(i) The random experiment is performed repeatedly a finite and fixed number of times. In other words, n, the number of trials, is finite and fixed.
(ii) The outcome of the random experiment (trial) results in a dichotomous classification of events. In other words, the outcome of each trial may be classified into two mutually disjoint categories, called success (the occurrence of the event) and failure (the non-occurrence of the event); i.e. there is no middle event.
(iii) All trials are independent; i.e. the result of any trial is not affected in any way by the results of the preceding trials and does not affect the results of succeeding trials.
(iv) The probability of success (the happening of the event) in any trial is p and is constant for each trial; q = 1 - p is then termed the probability of failure (the non-occurrence of the event) and is also constant for each trial.
For example, if we toss a fair coin n times (n fixed and finite), then the outcome of any trial is one of the mutually exclusive events head (success) and tail (failure). Furthermore, all the trials are independent, since the result of any throw of the coin does not affect, and is not affected by, the results of other throws. Moreover, the probability of success (a head) in any trial is 1/2, which is constant for each trial. Hence, coin-tossing problems give rise to the Binomial distribution.
Similarly, dice-throwing problems will also conform to the Binomial distribution. More precisely, we expect a Binomial distribution under the following conditions:

(i) There is a finite and fixed number of trials, n.
(ii) Each trial results in mutually exclusive and exhaustive outcomes, termed success and failure.
(iii) Trials are independent.
(iv) p, the probability of success, is constant for each trial; q = 1 - p is then the probability of failure in any trial.

Note: Trials satisfying the above four conditions are also known as Bernoulli trials.
If X denotes the number of successes in n trials satisfying the above conditions, then X is a random variable which can take the values 0, 1, 2, 3, ..., n; since in n trials we may get no success (i.e. all failures), one success, two successes, ..., or n successes. The general expression for the probability of r successes is given by:

P(X = r) = nCr p^r q^(n-r); r = 0, 1, 2, ..., n   ...equation (1)

Proof: Let Si denote a success and Fi a failure at the ith trial, i = 1, 2, ..., n. Then we have:

P(Si) = p and P(Fi) = q; i = 1, 2, 3, ..., n   ...equation (2)

Since the trials are independent, the probability of r successes, and consequently (n-r) failures, in a sequence of n trials in any fixed specified order, say S1 F2 S3 S4 F5 F6 ... Sn-1 Fn, where S occurs r times and F occurs (n-r) times, is p^r q^(n-r). But r successes can occur among the n trials in nCr ways, all of which are mutually disjoint, and the probability for each of these nCr mutually exclusive ways is the same, viz. p^r q^(n-r). Hence, by the addition theorem of probability, the required probability of getting r successes, and consequently (n-r) failures, in n trials, in any order whatsoever, is given by:

P(X = r) = nCr p^r q^(n-r); r = 0, 1, 2, ..., n
r	P(r) = P(X = r)
0	nC0 p^0 q^n = q^n
1	nC1 p^1 q^(n-1)
2	nC2 p^2 q^(n-2)
...	...
n	nCn p^n q^0 = p^n
Note:
1. Putting r = 0, 1, 2, ..., n in equation (1), we get the probabilities of 0, 1, 2, ..., n successes respectively in n trials, as tabulated above. Since these probabilities are the successive terms in the Binomial expansion of (q + p)^n, this is called the BINOMIAL DISTRIBUTION.
2. The total probability is unity, i.e. ∑ P(r) = (q + p)^n = 1.
3. The expression for P(X = r) in equation (1) is known as the probability mass function of the Binomial distribution with parameters n and p. A random variable X following the probability law in equation (1) is called a Binomial variate with parameters n and p. The Binomial distribution is completely determined, i.e. all the probabilities can be obtained, if n and p are known. Obviously, q is known when p is given, because q = 1 - p.
4. Since the random variable X takes only integral values, the Binomial distribution is a discrete probability distribution.
5. For n trials, the binomial probability distribution consists of (n+1) terms, the successive binomial coefficients being nC0, nC1, nC2, nC3, ..., nC(n-1), nCn. Since nC0 = nCn = 1, the first and last coefficients will always be 1. Further, since nCr = nC(n-r), the binomial coefficients are symmetric.

Mean: the mean of the distribution is

∑ r P(r) = np[q^(n-1) + (n-1) q^(n-2) p + ((n-1)(n-2)/2) q^(n-3) p^2 + ... + p^(n-1)]
= np (q + p)^(n-1)
= np

Second moment:

∑ r^2 P(r) = 1^2 nC1 q^(n-1) p + 2^2 nC2 q^(n-2) p^2 + 3^2 nC3 q^(n-3) p^3 + ... + n^2 p^n
= np[q^(n-1) + 2(n-1) q^(n-2) p + (3/2)(n-1)(n-2) q^(n-3) p^2 + ... + n p^(n-1)]
= np[{q^(n-1) + (n-1) q^(n-2) p + ((n-1)(n-2)/2) q^(n-3) p^2 + ... + p^(n-1)}
+ {(n-1) q^(n-2) p + (n-1)(n-2) q^(n-3) p^2 + ... + (n-1) p^(n-1)}]
= np[(q + p)^(n-1) + (n-1) p (q + p)^(n-2)]
= np[1 + (n-1)p]

so the variance is np[1 + (n-1)p] - (np)^2 = np(1 - p) = npq.
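The conclusions mean = np and variance = npq can be verified numerically. The sketch below (illustrative Python, not from the course text) computes the moments of a Binomial variate by direct summation over its probability mass function:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n-r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_moments(n, p):
    """Mean and variance computed by direct summation over the pmf."""
    mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
    second_moment = sum(r * r * binomial_pmf(r, n, p) for r in range(n + 1))
    return mean, second_moment - mean ** 2

n, p = 10, 0.3
mean, variance = binomial_moments(n, p)
# The derivation gives mean = np = 3.0 and variance = npq = 2.1.
```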
Example 1: Ten unbiased coins are tossed simultaneously. Find the probability of obtaining:
(i) Exactly six heads
(ii) Exactly eight heads
(iii) Exactly nine heads
(iv) No head
(v) At least one head
(vi) At most three heads
(vii) At least four heads

Solution: Here n = 10 and p = q = 1/2. Recall from the Binomial probability law that the probability of r heads is

P(X = r) = 10Cr (1/2)^r (1/2)^(10-r) = 10Cr / 1024

(i) P(exactly 6 heads) = 10C6 / 1024 = 210/1024 = 0.205

(ii) P(exactly 8 heads) = 10C8 / 1024 = 45/1024 = 0.044

(iii) P(exactly 9 heads) = 10C9 / 1024 = 10/1024 = 0.010

(iv) P(no head) = 10C0 / 1024 = 1/1024 = 0.001

(v) P(at least one head) = 1 - P(no head) = 1 - 1/1024 = 1 - 0.001 = 0.999

(vi) P(at most 3 heads) = P(X ≤ 3) = (10C0 + 10C1 + 10C2 + 10C3) / 1024
= (1 + 10 + 45 + 120) / 1024
= 176/1024
= 0.172

(vii) P(at least 4 heads) = P(X ≥ 4) = 1 - P(X ≤ 3) = 1 - 0.172 = 0.828
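The coin-tossing results above can be reproduced with a few lines of Python (a sketch for checking the arithmetic, using the standard library's math.comb):

```python
from math import comb

def prob_heads(r, n=10, p=0.5):
    """P(exactly r heads in n tosses of a fair coin) = nCr / 2^n."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p_six = prob_heads(6)                                   # 210/1024, about 0.205
p_eight = prob_heads(8)                                 # 45/1024, about 0.044
p_none = prob_heads(0)                                  # 1/1024, about 0.001
p_at_least_one = 1 - p_none                             # about 0.999
p_at_most_three = sum(prob_heads(r) for r in range(4))  # 176/1024, about 0.172
p_at_least_four = 1 - p_at_most_three                   # about 0.828
```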
Example 2: There are five flights daily from Pittsburgh via US Airways into the Bradford, Pennsylvania Regional Airport. Suppose the probability that any flight arrives late is 0.20.
(a) What is the probability that none of the flights is late today?
(b) What is the probability that exactly one of the flights is late today?

Solution: The probability that a particular flight is late is 0.20, so

p = 0.2; q = 1 - p = 1 - 0.2 = 0.8

There are five flights, so n = 5, and x, the random variable, refers to the number of successes. In this case a "success" is a plane that arrives late.
(a) Using P(X = r) = nCr p^r q^(n-r):

P(X = 0) = 5C0 (0.2)^0 (0.8)^5 = (1)(1)(0.3277) = 0.3277
(b) What is the probability that exactly one of the flights is late today?

P(X = 1) = 5C1 (0.2)^1 (0.8)^4 = (5)(0.2)(0.4096) = 0.4096
Self-Assessment Exercises 1
1. The NCC survey shows that 70% of Nigerian households have mobile phones. If 15
households are chosen at random, what is the probability that
2. According to the NCC, 70% of Nigerian households have mobile phones. If 300
households are chosen at random, determine the
2.6 SUMMARY
In this unit, learners have been made to understand that a Binomial distribution is the sum of independent Bernoulli random variables, and that the Binomial distribution describes the distribution of binary data from a finite sample. Thus it gives the probability of getting r events out of n trials. In summary, the binomial distribution describes the behaviour of a count variable X if the following conditions apply:
1. The number of observations n is fixed.
2. Each observation is independent of the others.
3. Each observation represents one of two outcomes, success or failure.
4. The probability of success, p, is the same for each observation.
If, in your application of the Binomial, these conditions are met, then X has a Binomial distribution with parameters n and p, abbreviated B(n, p).
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance (2nd ed.). London: Macmillan Publishers.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.

Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.

Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.

McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
1. (a) 0.2061
(b) 0.1268
2. (a) 210
(b) 63 (c) 7.9
UNIT 3: NORMAL DISTRIBUTION
Unit Structure
3.1 Introduction
3.6 Summary
3.1 INTRODUCTION
The Normal probability distribution, commonly called the normal distribution, is one of the most important continuous theoretical distributions in statistics. Most of the data relating to economic and business statistics, and even to the social and physical sciences, conform to this distribution. The normal distribution was first discovered by the mathematician Abraham De Moivre (1667-1754), who in 1733 obtained the mathematical equation for this distribution while dealing with problems arising in games of chance. The normal distribution is also known as the Gaussian distribution (Gaussian law of errors) after Carl Friedrich Gauss (1777-1855), who used the distribution to describe the theory of accidental errors of measurement involved in the calculation of the orbits of heavenly bodies.

Today, the normal probability model is one of the most important probability models in statistical analysis.
Its graph, called the normal curve, is shown below:

[Figure: the standard normal curve, peaking at z = 0, with approximately 68.27% of the area within one standard deviation of the mean, 95.45% within two, and 99.73% within three.]
1. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ, the mean.
2. The curve is symmetrical about a vertical axis through the mean μ.
3. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.
4. The total area under the curve and above the horizontal axis is equal to 1.
When does the normal distribution arise?
Because the normal probability density function (pdf) peaks at the mean and "tails off" towards the extremes, the normal distribution provides a good approximation for many naturally occurring random variables. However, the normal distribution occurs even more widely for the following reasons:
1. The total (and also the average) of a large number of random variables which have the same probability distribution approximately has a normal distribution. For instance, if the amount taken by a shop in a day has a particular (maybe unknown) distribution, the total of 100 days' takings is the sum of 100 identically distributed random variables and so will (approximately) have a normal distribution. Many random variables are normal for this reason. For example, the amount of rainfall which falls during a month is the total of the amounts of rainfall which have fallen each day or each hour of the month, and so is likely to have a normal distribution. In the same way, the average or total of a large sample will usually have a normal distribution. This can be explored through further reading on populations and samples.
2. The normal distribution provides approximate probabilities for the binomial distribution when n, the number of trials, is large.
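Point 2 can be checked numerically. The Python sketch below (the values of n, p and k are illustrative choices, not taken from the text) compares the exact binomial cumulative probability with its normal approximation using a continuity correction:

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    # Exact binomial P(X <= k), summing the pmf term by term
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    # Normal approximation with continuity correction: X is approx N(np, np(1-p))
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.4, 45          # illustrative values
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))  # the two agree closely for large n
```

For these values the exact and approximate probabilities agree to about two decimal places, which is why the approximation is useful when binomial tables run out.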
3.3.2 Definitions
1. A random variable X has a normal distribution, and is referred to as a normal random variable, if and only if its probability density is given by:
f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),   −∞ < x < ∞, σ > 0
where π and e are the usual mathematical constants (π ≈ 22/7, so that √(2π) ≈ 2.5066).
Note
(i) The density above is used to determine probabilities relating to random variables having normal distributions other than the standard normal distribution.
(ii) Because a normal curve is symmetrical about its mean, P(z < -a) = P(z > a).
(iii) Only values of P(z < a) are shown in most statistical tables. For P(z > a), 1 - P(z < a) is used.
Students are encouraged to make copies of normal tables from any standard statistics textbook.
3.4 The Standard Normal Distribution
The number of normal distributions is unlimited, each having a different mean (μ), standard deviation (σ), or both. While it is possible to provide probability tables for discrete distributions such as the binomial and the Poisson, providing tables for the infinite number of normal distributions is impossible. Fortunately, one member of the family can be used to determine the probabilities for all normal distributions. It is called the standard normal distribution, and it is unique because it has a mean of 0 and a standard deviation of 1.
Any normal distribution can be converted into a standard normal distribution by subtracting the mean from each observation and dividing this difference by the standard deviation. The results are called z values. They are also referred to as z scores, z statistics, standard normal deviates, standard normal values, or just normal deviates.
The formula for the z score or value is given as:
z = (X − μ)/σ
where:
𝑋 is the value of any particular observation or measurement.
𝜇 is the mean of the distribution.
𝜎 is the standard deviation of the distribution.
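The conversion can be sketched in code. The Python fragment below uses only the standard library's NormalDist for the standard normal; the income figures are those used in the solved examples of this unit:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    # Standardize: how many standard deviations x lies from the mean
    return (x - mu) / sigma

std = NormalDist()                # the standard normal: mean 0, standard deviation 1
z = z_score(1100, 1000, 100)      # e.g. an income of $1,100 when mu = $1,000, sigma = $100
print(z)                          # 1.0
print(round(std.cdf(0.50), 4))    # P(z < 0.50) = 0.6915
print(round(std.cdf(-2.50), 4))   # P(z < -2.50) = 0.0062
```

Once an observation has been standardized, any normal probability can be read off the single standard normal table (or, as here, computed directly).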
3.4.1 Solved Examples
1. Using normal tables, find the values of the following probabilities:
(a) P(z < 0.50)
(b) P(z < -2.50)
Solution
(a) P(z < 0.50) = 0.6915
(b) P(z < -2.50) = 0.0062
2. A random variable X is normally distributed with mean 230 and standard deviation 20. Find:
(a) P(X > 280)
(b) P(X < 220)
(c) P(220 < X < 280)
Solution
(a) z = (280 − 230)/20 = 2.50
Therefore, P(X > 280) = P(z > 2.50) = 1 − P(z < 2.50) = 1 − 0.9938 = 0.0062
(b) z = (220 − 230)/20 = −0.50
Therefore, P(X < 220) = P(z < −0.50) = 0.3085
(c) P(220 < X < 280) = P(−0.50 < z < 2.50) = P(z < 2.50) − P(z < −0.50) = 0.9938 − 0.3085 = 0.6853
3. The weekly incomes of shift foremen in the glass industry are normally distributed with a mean of $1,000 and a standard deviation of $100. What is the z value for the income X of a foreman who earns:
(a) $1,100 per week?
(b) $900 per week?
Solution
Using the formula z = (X − μ)/σ:
(a) X = $1,100: z = (1100 − 1000)/100 = 1.00
(b) X = $900: z = (900 − 1000)/100 = −1.00
In the light of the above, binomial-related problems can be solved approximately through the normal (or Poisson) approximation, using a combination of both where appropriate.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
These are the possible answers to the SAEs within the content.
Answers to SAEs 1
1. (i) The curve is symmetric about a vertical axis through the mean μ.
(ii) The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.
2. (a) -0.25 (b) 1.25
4.1 INTRODUCTION
The Poisson distribution was derived in 1837 by the French mathematician Siméon D. Poisson (1781–1840). The Poisson distribution may be obtained as a limiting case of the Binomial probability distribution under the following conditions:
(i) n, the number of trials, is indefinitely large, i.e. n tends towards infinity;
(ii) p, the constant probability of success for each trial, is indefinitely small, i.e. p tends towards zero;
(iii) np = μ is finite.
Under the above three conditions the Binomial probability function tends to the probability function of the Poisson distribution, given as:
P(X = r) = (μ^r e^−μ)/r!,   r = 0, 1, 2, ...
where X (or r) is the number of successes (occurrences of the event), μ = np, and e = 2.71828 (the base of the system of natural logarithms).
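The limiting behaviour can be demonstrated numerically. In the Python sketch below (the values of μ and r are illustrative), the binomial probability approaches the Poisson probability as n grows while np = μ is held fixed:

```python
import math

def binom_pmf(r, n, p):
    # Binomial P(X = r) = C(n, r) p^r (1-p)^(n-r)
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mu):
    # Poisson P(X = r) = mu^r e^(-mu) / r!
    return mu**r * math.exp(-mu) / math.factorial(r)

mu, r = 2.0, 3                      # illustrative values
for n in (10, 100, 10000):          # n grows while p = mu/n shrinks, keeping np = mu
    print(n, round(binom_pmf(r, n, mu / n), 5))
print("Poisson limit:", round(poisson_pmf(r, mu), 5))
```

As n increases the binomial probabilities converge to the Poisson value, which is exactly the limiting argument stated in conditions (i)-(iii) above.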
4.5 Formula for Poisson Distribution
P(X) = (μ^X e^−μ)/X!
Example 1: The mean number of misprints per page in a book is 1.2. What is the probability of finding, on a particular page:
(a) No misprints?
(b) Three or more misprints?
Solution
μ = 1.2
(a) Pr(no misprints) = Pr(X = 0) = e^−1.2 = 0.301
(b) Pr(3 or more misprints) = Pr(X ≥ 3) = 1 − [Pr(0) + Pr(1) + Pr(2)]
Pr(0) = 0.301, as in (a) above
Pr(1) = 1.2 e^−1.2 = 0.3612
Pr(2) = (1.2² e^−1.2)/2! = 0.21672
Pr(0) + Pr(1) + Pr(2) = 0.87892
Pr(X ≥ 3) = 1 − 0.87892 = 0.12108 ≈ 0.121
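The computation above can be verified with a few lines of Python (a sketch using only the standard library):

```python
import math

def poisson_pmf(r, mu):
    # P(X = r) = mu^r e^(-mu) / r!
    return mu**r * math.exp(-mu) / math.factorial(r)

mu = 1.2
p_none = poisson_pmf(0, mu)                                   # e^-1.2
p_three_or_more = 1 - sum(poisson_pmf(r, mu) for r in range(3))
print(round(p_none, 3))            # 0.301
print(round(p_three_or_more, 3))   # 0.121
```

The complement trick in part (b), subtracting Pr(0) + Pr(1) + Pr(2) from 1, is the same one used in the worked solution.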
Example 2: Suppose a random sample of 1,000 flights shows a total of 300 bags were lost. Thus, the
arithmetic mean number of lost bags per flight is 0.3, found by 300/1,000. Find the
probability of not losing any bags.
Solution
P(X) = (μ^X e^−μ)/X!
X = 0; μ = 0.3
P(0) = (0.3⁰ e^−0.3)/0! = e^−0.3 = 0.7408
4.6 SUMMARY
In this unit, students will have learnt the rudiments and applications of the Poisson distribution, and how to solve problems using it.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
2. 0.2222
Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis testing, which can justify conclusions even when no scientific theory exists.
Common test statistics are the t-test, z-test, chi-square test and F-test, the last of which is sometimes referred to as the analysis of variance (ANOVA) test.
In this module, five statistical tests will be discussed and analyzed in order to help learners appreciate and understand the different statistical hypothesis tests. These statistical tests are:
Unit 1: T- test
Unit 2: F- test
Unit 3: Chi square test
Unit 4: ANOVA
Unit 5: Parametric and Non-Parametric test Methods
UNIT 1: T–TEST
UNIT STRUCTURE
1.1 Introduction
1.2 Learning Outcomes
1.3 T-Test
1.3.1 Characteristics of T-test
1.3.2 Application of t-distribution
1.4 Test for single mean
1.4.1 Assumptions for Student's t-test
1.4.2 One-tailed test
1.4.3 Two-tailed test
1.5. Solved Examples
1.6 Summary
1.7 References/Further Reading/Web Resources
1.8 Possible Answers to Self-Assessment Exercises (SAEs) within the content
1.1 INTRODUCTION
A t-test (also known as Student's t-test) is a tool for evaluating the means of one or two populations using
hypothesis testing. A t-test may be used to evaluate whether a single group differs from a known value (a one-
sample t-test), whether two groups differ from each other (an independent two-sample t-test), or whether there
is a significant difference in paired measurements (a paired, or dependent samples t-test).
If the population variance is unknown, then for large samples the estimate provided by the sample variance S² is used and the normal test is applied. For small samples, an unbiased estimate of the population variance σ² is given by:
S² = (1/(n−1)) Σ(xᵢ − x̄)²
It is quite conventional to replace σ² by S² (for small samples) and then apply the normal test even for small samples. W. S. Gosset, who wrote under the pen name of Student, obtained the sampling distribution of the statistic (x̄ − μ)/(S/√n) for small samples and showed that it is far from normality. This
discovery started a new field, viz. 'Exact Sample Tests', in the history of statistical inference.
Note: If x₁, x₂, ..., xₙ is a random sample of size n from a normal population with mean μ and variance σ², then the Student's t statistic is defined as:
t = (x̄ − μ)/(S/√n)
where x̄ = (1/n) Σxᵢ is the sample mean and S² = (1/(n−1)) Σ(xᵢ − x̄)² is an unbiased estimate of the population variance σ².
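The statistic can be computed directly. The Python sketch below defines the one-sample t statistic using the unbiased sample standard deviation; the sample data are hypothetical, chosen only to illustrate the calculation:

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    # t = (xbar - mu0) / (S / sqrt(n)); statistics.stdev gives the
    # unbiased (n - 1 divisor) sample standard deviation S
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Hypothetical data: weights (kg) of 10 items, tested against a claimed mean of 12 kg
weights = [11.9, 11.7, 12.1, 11.8, 11.6, 11.9, 11.8, 12.0, 11.7, 11.8]
t = one_sample_t(weights, 12.0)
print(round(t, 2))
```

A large negative t here would suggest the sample mean lies well below the claimed value, the same reasoning used in the worked examples later in this unit.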
1.3 T-test
1.3.1 Characteristics of T-test
1. It is a continuous distribution.
2. It is bell-shaped and symmetrical.
3. There is a family of t distributions. Each time the degrees of freedom change, a new distribution is
created.
4. As the number of degrees of freedom increases, the shape of the t distribution approaches that of the
standard normal distribution.
5. The t distribution is flatter, or more spread out, than the standard normal distribution.
1.3.2 Application of t-distribution
(i) t-test for the significance of a single mean, the population variance being unknown;
(ii) t-test for the significance of the difference between two sample means, the population variances being equal but unknown;
(iii) t-test for the significance of an observed sample correlation coefficient.
1.4 Test for Single Mean
This test is used to decide whether:
(i) the given normal population has a specified value of the population mean, say μ₀;
(ii) the sample mean differs significantly from a specified value μ₀ of the population mean;
(iii) a given random sample x₁, x₂, ..., xₙ of size n has been drawn from a normal population with specified mean μ₀.
Basically, all three problems are the same. We set up the corresponding null hypothesis thus:
(a) Ho: μ = μ₀, i.e. the population mean equals the specified value μ₀.
(b) Ho: There is no significant difference between the sample mean and the population mean.
(c) Ho: The given random sample has been drawn from the normal population with mean μ₀.
The test statistic is:
t = (X̄ − μ)/(s/√n)
We compute the test-statistic using the formula above under Ho and compare it with the tabulated value
of t for (n-1) d.f at the given level of significance. If the absolute value of the calculated t is greater than
tabulated t, we say it is significant and the null hypothesis is rejected. But if the calculated t is less than
tabulated t, Ho may be accepted at the level of significance adopted.
1.4.1 Assumptions for Student's t-test
(i) The parent population from which the sample is drawn is normal.
(ii) The sample observations are independent, i.e. the given sample is random.
(iii) The population standard deviation σ is unknown.
1.4.2 One-tailed test
Example: A sample of 60 ratings of Heathrow Airport gives a sample mean of 7.25 and a sample standard deviation of 1.052; Heathrow is to be classified as a superior service airport if the population mean rating is greater than 7. We want to develop a hypothesis test for which the decision to reject H0 will lead to the conclusion that the population mean rating for Heathrow Airport is greater than 7. Thus, an upper tail test with Ha: μ > 7 is required. The null and alternative hypotheses for this upper tail test are as follows:
𝐻0 : 𝜇 ≤ 7
𝐻1 : 𝜇 > 7
We will use α = .05 as the level of significance for the test. Using the formula with x̄ = 7.25, μ₀ = 7, s = 1.052, and n = 60, the value of the test statistic is:
t = (x̄ − μ₀)/(s/√n) = (7.25 − 7)/(1.052/√60) = 1.84
The sampling distribution of t has 𝑛 − 1 = 60 − 1 = 59 degrees of freedom. Because the test is an upper
tail test, the p-value is the area under the curve of the t distribution to the right of
𝑡 = 1.84
The t distribution table provided in most textbooks will not contain sufficient detail to determine the exact p-value, such as the p-value corresponding to t = 1.84. For instance, the t distribution with 59 degrees of freedom gives t = 1.671 for an upper-tail area of .05 and t = 2.001 for an upper-tail area of .025. We see that t = 1.84 lies between 1.671 and 2.001. Although the table does not provide the exact p-value, the values in the "Area in Upper Tail" row show that the p-value must be less than .05 and greater than .025. With a level of significance of α = .05, this placement is all we need to know to make the decision to reject the null hypothesis and conclude that Heathrow should be classified as a superior service airport.
Exact computation with t = 1.84 provides the upper-tail p-value of .0354 for the Heathrow Airport hypothesis test.
Since .0354 < .05, we reject the null hypothesis and conclude that Heathrow should be classified as a superior service airport.
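For learners with access to a computer, the exact upper-tail area can be approximated by numerically integrating the t density. This is only a rough sketch; statistical packages compute the value exactly:

```python
import math

def t_upper_tail(t_stat, df, steps=100000):
    # Upper-tail area P(T > t_stat) for Student's t with df degrees of freedom,
    # found by trapezoidal integration of the t density
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    hi = t_stat + 60.0              # the density is negligible beyond this point
    h = (hi - t_stat) / steps
    area = 0.5 * (pdf(t_stat) + pdf(hi)) * h
    area += h * sum(pdf(t_stat + i * h) for i in range(1, steps))
    return area

p = t_upper_tail(1.84, 59)   # Heathrow test: t = 1.84 with 59 d.f.
print(round(p, 4))           # approximately .0354, the p-value quoted above
```

This reproduces the exact p-value that the printed table could only bracket between .025 and .05.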
1.4.3 Two-tailed test
Example: Holiday is bringing out a new product, and its marketing director estimates that the population mean order quantity per retail outlet will be 40 units. The hypotheses are:
H0: μ = 40
H1: μ ≠ 40
If H0 cannot be rejected, Holiday will continue its production planning based on the marketing director's estimate that the population mean order quantity per retail outlet will be μ = 40 units. However, if H0 is rejected, Holiday will immediately re-evaluate its production plan for the product. A two-tailed hypothesis test is used because Holiday wants to re-evaluate the production plan if the population mean quantity per retail outlet is less than anticipated or greater than anticipated. Because no historical data are available (it is a new product), the population mean μ and the population standard deviation must both be estimated using x̄ and s from the sample data.
The sample of 25 retailers provided a mean of x̄ = 37.4 and a standard deviation of s = 11.79 units. Using the formula, with x̄ = 37.4, μ₀ = 40, s = 11.79, and n = 25, the value of the test statistic is:
t = (x̄ − μ₀)/(s/√n) = (37.4 − 40)/(11.79/√25) = −1.10
Because we have a two-tailed test, the p-value is two times the area under the curve of the t distribution for t ≤ −1.10. The t distribution table for 24 degrees of freedom only contains positive t values. Because the t distribution is symmetric, however, the area under the curve to the right of t = 1.10 is the same as the area under the curve to the left of t = −1.10. We see that t = 1.10 lies between 0.857 and 1.318.
From the “Area in Upper Tail” row, we see that the area in the tail to the right of t = 1.10 is between .20 and
.10. When we double these amounts, we see that the p-value must be between .40 and .20. With a level of
significance of α = .05, we now know that the p-value is greater than α. Therefore, 𝐻0 cannot be rejected.
Sufficient evidence is not available to conclude that Holiday should change its production plan for the
coming season.
The exact p-value is .2822. With a level of significance of α = .05, we cannot reject H0 because .2822 > .05. The test statistic can also be compared to the critical value to make the two-tailed hypothesis testing decision. With α = .05 and the t distribution with 24 degrees of freedom, −t.025 = −2.064 and t.025 = 2.064 are the critical values for the two-tailed test. The rejection rule using the test statistic is: reject H0 if t ≤ −2.064 or if t ≥ 2.064.
Based on the test statistic t = -1.10, 𝐻0 cannot be rejected. This result indicates that Holiday should continue
its production planning for the coming season based on the expectation that 𝜇 = 40.
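The two-tailed decision rule can be sketched in a few lines of Python, using the figures from the Holiday example above:

```python
import math

def t_statistic(xbar, mu0, s, n):
    # One-sample t statistic: (xbar - mu0) / (s / sqrt(n))
    return (xbar - mu0) / (s / math.sqrt(n))

t = t_statistic(37.4, 40, 11.79, 25)   # the Holiday sample figures
critical = 2.064                        # t for 24 d.f. with .025 in each tail
print(round(t, 2))                      # -1.1
print(abs(t) >= critical)               # False: H0 cannot be rejected
```

Comparing |t| with the critical value gives the same decision as comparing the p-value with α.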
Example 1: A sample of 10 items has a mean weight of 11.8 kg and a standard deviation of 0.15 kg. Test whether the sample could have come from a population with mean 12 kg.
Hint: You are given that for d.f. = 9, t0.05 = 2.26.
Null hypothesis, Ho: μ = 12 kg (i.e. the sample mean of x̄ = 11.8 kg does not differ significantly from the population mean μ = 12 kg).
Solution
t = (X̄ − μ)/(s/√n) = (11.8 − 12)/(0.15/√10) = −0.2/(0.15/3.1623) = −0.2/0.0474 = −4.219
|t| = 4.219
The tabulated value of t for 9 d.f. at the 5% level of significance is 2.26. Since the calculated |t| is much greater than the tabulated t, it is highly significant. Hence, the null hypothesis is rejected at the 5% level of significance and we conclude that the sample mean differs significantly from 12 kg.
Example 2
The mean life of a battery used in a digital clock is 305 days. The lives of the batteries follow the normal
distribution. The battery was recently modified with the objective of making it last longer. A sample of 20 of
the modified batteries had a mean life of 311 days with a standard deviation of 12 days. Did the modification
increase the mean life of the battery?
(a) State the null hypothesis and the alternate hypothesis.
(b) Compute the value of t. What is your decision regarding the null hypothesis? Briefly summarize your
results.
Solution
(a)
𝐻0 : 𝜇 ≤ 305
𝐻1 : 𝜇 > 305
(b)
t = (X̄ − μ)/(s/√n) = (311 − 305)/(12/√20) = 6/2.6833 = 2.236
Reject Ho because 2.236 > 1.729 (the tabulated t for 19 d.f. at the 5% level, one-tailed). The modification increased the mean battery life to more than 305 days.
1. The mean length of a small counterbalance bar is 43 millimeters. The production supervisor is
concerned that the adjustments of the machine producing the bars have changed. He asks the
Engineering Department to investigate. Engineering selects a random sample of 12 bars and measures
each. The results are reported below in millimeters.
42 39 42 45 43 40 39 41 40 42 43 42
(a) Formulate the hypothesis
(b) Is it reasonable to conclude that there has been a change in the mean length of the bars? Use the
.05 significance level.
1.6 SUMMARY
In summary, learners will have learnt how to apply the t-test in solving statistical problems, such as testing whether a mean takes a certain value and testing the significance of the difference between two means, among others.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
UNIT 2: F Distribution
UNIT STRUCTURE
2.1 Introduction
2.2 Learning Outcomes
2.3 F distribution
2.3.1 Characteristics of F distribution
2.4 Applications of the F-distribution
2.5 Comparing Two Population Variances
2.5.1 Assumption for F distribution for equality of variances
2.5.2 Solved Examples
2.6 Summary
2.7 References/Further Reading/ Web Resources
2.8 Possible Answers to Self-Assessment Exercises (SAEs) within the content
2.1 INTRODUCTION
The F distribution was named to honor Sir Ronald Fisher, one of the founders of modern-day statistics. This
probability distribution is used as the distribution of the test statistic for several situations. It is used to test
whether two samples are from populations having equal variances, and it is also applied when we want to
compare several population means simultaneously.
2.3 F distribution
The F family of distributions resembles the χ² distribution in shape: it is always non-negative and is skewed to the right. It has two sets of degrees of freedom (these are its parameters, labelled ν₁ and ν₂) and these determine its precise shape.
2.4 Applications of the F-distribution
• Two Barth shearing machines are set to produce steel bars of the same length. The bars, therefore, should have the same mean length. We want to ensure that in addition to having the same mean length they also have similar variation.
• The mean rate of return on two types of common stock may be the same, but there may be more variation in the rate of return in one than the other. A sample of 10 Internet stocks and 10 utility stocks shows the same mean rate of return, but there is likely more variation in the Internet stocks.
• A study by the marketing department for a large newspaper found that men and women spent about the same
amount of time per day reading the paper. However, the same report indicated
there was nearly twice as much variation in time spent per day among the men than the women.
The F distribution is also used to test assumptions for some statistical tests. Recall that, in the previous unit, when small samples were assumed, we used the t test to investigate whether the means of two independent populations differed. To employ that test, we assume that the variances of the two normal populations are the same. The F distribution provides a means for conducting a test regarding the variances of two normal populations.
2.5 Comparing Two Population Variances
Regardless of whether we want to determine if one population has more variation than another population or validate an assumption for a statistical test, we first state the null hypothesis. The null hypothesis could be that the variance of one normal population, σ₁², equals the variance of the other normal population, σ₂². The alternate hypothesis is that the variances differ. In this instance the null and alternate hypotheses are:
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
To conduct the test, we select a random sample of 𝑛1 observations from one population, and a sample of 𝑛2
observations from the second population. The test statistic is defined as follows:
F = s₁²/s₂²
where s₁² and s₂² are the respective sample variances.
The test statistic follows the F distribution with n₁ − 1 and n₂ − 1 degrees of freedom. In order to reduce the size of the table of critical values, the larger sample variance is placed in the numerator; hence, the tabled F ratio is always larger than 1.00. Thus, the right-tail critical value is the only one required. The critical value
of F for a two-tailed test is found by dividing the significance level in half (𝛼/2) and then referring to the
appropriate degrees of freedom.
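The convention of placing the larger variance in the numerator can be expressed as a small helper. The Python sketch below uses made-up sample variances purely for illustration:

```python
def f_statistic(var1, var2):
    # Put the larger sample variance in the numerator so that F >= 1
    # and only the right-tail critical value of the table is needed
    return max(var1, var2) / min(var1, var2)

F = f_statistic(16.4, 22.3)   # made-up sample variances
print(round(F, 2))            # 1.36
print(F >= 1)                 # True by construction
```

Because the ratio is forced to be at least 1, the test is always carried out in the right tail, exactly as described above.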
2.5.1 Assumption for F-test for equality of variances
1. The samples are simple random samples
2. The samples are independent of each other
3. The parent populations from which the samples are drawn are normal
N.B. (1) Since most available tables of the significant values of F are for the right-tail test, i.e. against the alternative H1: σ₁² > σ₂², in numerical problems we will take the greater of the variances s₁² or s₂² as the numerator and adjust the degrees of freedom accordingly. Thus, in F ~ (ν₁, ν₂), ν₁ refers to the degrees of freedom of the larger variance, which must be taken as the numerator while computing F.
If Ho is true, i.e. σ₁² = σ₂² = σ², the value of F should be around 1; otherwise, it should be greater than 1. If the value of F is far greater than 1, Ho should be rejected. Finally, if we take the larger of s₁² or s₂² as the numerator, all tests based on the F-statistic become right-tailed tests.
- All one-tailed tests for Ho at level of significance α will be right-tailed tests only, with area α in the right tail.
- For two-tailed tests, the critical value is located in the right tail of the F-distribution, with area α/2 in the right tail.
Formula for F distribution: F = s₁²/s₂², with ν₁ = n₁ − 1 and ν₂ = n₂ − 1 degrees of freedom.
2.5.2 Solved Examples
Example 1: The times taken (in minutes) by drivers to drive from Town A to Town B driving two types of car were recorded as follows:
Car Type X: 20 16 26 27 23 22
Car Type Y: 27 33 42 35 32 34 38
Do the data show that the variances of the time distributions in the populations from which the samples were drawn differ significantly? (Use the 5% level of significance.)
Solution:
Using assumed means 22 and 35 (d = x − 22, D = y − 35):

X     d     d²       Y     D     D²
20    −2    4        27    −8    64
16    −6    36       33    −2    4
26    4     16       42    7     49
27    5     25       35    0     0
23    1     1        32    −3    9
22    0     0        34    −1    1
                     38    3     9
Total: Σd = 2, Σd² = 82;  ΣD = −4, ΣD² = 136

s₁² = [Σd² − (Σd)²/n₁]/(n₁ − 1) = [82 − (2)²/6]/5 = 81.33/5 = 16.27
s₂² = [ΣD² − (ΣD)²/n₂]/(n₂ − 1) = [136 − (−4)²/7]/6 = 133.71/6 = 22.29
Taking the larger variance as the numerator:
F = s₂²/s₁² = 22.29/16.27 = 1.37
Tabulated F0.05(6,5) = 4.95
Since the calculated F is less than the tabulated F, it is not significant. Hence Ho may be accepted at the 5% level of significance, and we may conclude that the variability of the time distributions in the two populations is the same.
Example 2: Lammers Limos offers limousine service from the city hall in Toledo, Ohio, to Metro Airport in
Detroit. Sean Lammers, president of the company, is considering two routes. One is via U.S. 25 and the other
via Interstate-75. He wants to study the time it takes to drive to the airport using each route and then compare
the results. He collected the following sample data, which is reported in minutes. Using the .10 significance
level, is there a difference in the variation in the driving times for the two routes?
US Route 25:   52 67 56 45 70 54 64
Interstate 75: 59 60 61 51 56 63 57 65
U.S. Route 25
X̄ = ΣX/n = 408/7 = 58.29
s = √[Σ(X − X̄)²/(n − 1)] = √(485.43/6) = 8.9947
Interstate 75
X̄ = ΣX/n = 472/8 = 59.00
s = √[Σ(X − X̄)²/(n − 1)] = √(134/7) = 4.3753
There is more variation, as measured by the standard deviation, in the U.S. 25 route than in the 1-75 route. This
is somewhat consistent with his knowledge of the two routes; the U.S. 25 route contains more stoplights,
whereas 1-75 is a limited-access interstate highway. However, the 1-75 route is several miles longer. It is
important that the service offered be both timely and consistent, so he decides to conduct a statistical test to
determine whether there really is a difference in the variation of the two routes.
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
TABLE 2-1 Critical Values of the F Distribution, α = 0.05

Degrees of freedom    Degrees of freedom for numerator
for denominator       5       6       7       8
1                     230     234     237     239
2                     19.3    19.3    19.4    19.4
3                     9.01    8.94    8.89    8.85
4                     6.26    6.16    6.09    6.04
5                     5.05    4.95    4.88    4.82
6                     4.39    4.28    4.21    4.15
7                     3.97    3.87    3.79    3.73
8                     3.69    3.58    3.50    3.44
9                     3.48    3.37    3.29    3.23
10                    3.33    3.22    3.14    3.07
Step 5: The final step is to take the ratio of the two sample variances, determine the value of the test statistic, and make a decision regarding the null hypothesis. Note that the formula refers to the sample variances but we calculated the sample standard deviations. We need to square the standard deviations to determine the variances.
F = s₁²/s₂² = (8.9947)²/(4.3753)² = 4.23
The decision is to reject the null hypothesis, because the computed F value (4.23) is larger than the critical
value (3.87). We conclude that there is a difference in the variation of the travel times along the two routes.
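The computation can be verified in a few lines of Python, using the standard deviations found above and the critical value quoted in the text:

```python
s1 = 8.9947          # U.S. Route 25 sample standard deviation
s2 = 4.3753          # Interstate 75 sample standard deviation
F = s1**2 / s2**2    # larger variance in the numerator
critical = 3.87      # critical value quoted in the text (alpha/2 = .05 in the right tail)
print(round(F, 2))   # 4.23
print(F > critical)  # True: reject H0
```

Squaring the standard deviations before taking the ratio is the step that is easy to forget by hand; the code makes it explicit.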
2.6 SUMMARY
In summary, learners will have learnt the theory and application of the F-test. Such knowledge will enhance learners' ability to solve more challenging statistical problems related to the F-test.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
2. 𝐹 = 4.23
UNIT 3: CHI-SQUARE TEST
UNIT STRUCTURE
3.1 Introduction
3.2 Learning Outcomes
3.3 Chi-Square
3.1 INTRODUCTION
A chi-square test is a statistical test that is used to compare observed and expected results. The goal of this test
is to identify whether a disparity between actual and predicted data is due to chance or to a link between the
variables under consideration. As a result, the chi-square test is an ideal choice for aiding in our understanding
and interpretation of the connection between our two categorical variables.
A chi-square test or comparable nonparametric test is required to test a hypothesis regarding the distribution of
a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be
nominal or ordinal. They cannot have a normal distribution since they can only have a few particular values.
3.3 Chi-Square
Chi-square is a measure of the discrepancy existing between observed and expected frequencies, supplied by the statistic χ², given by:
χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ
which follows the χ²-distribution with ν = n − 1 d.f., where O₁, O₂, ..., Oₙ are the observed frequencies and E₁, E₂, ..., Eₙ are the corresponding expected or theoretical frequencies obtained under some theory or hypothesis.
3.4.1 Steps for computing χ² and drawing conclusions
(i) Compute the expected frequencies E₁, E₂, ..., Eₙ corresponding to the observed frequencies O₁, O₂, ..., Oₙ under some theory or hypothesis.
(ii) Compute the deviations (O − E) for each frequency and then square them to obtain (O − E)².
(iii) Divide the squared deviations (O − E)² by the corresponding expected frequency to obtain (O − E)²/E.
(iv) Add the values obtained in step (iii) to compute χ² = Σ (O − E)²/E.
(v) Under the null hypothesis that the theory fits the data well, the statistic follows the χ²-distribution with ν = n − 1 d.f.
(vi) Look up the tabulated (critical) value of χ² for (n − 1) d.f. at a certain level of significance, usually 5% or 1%, from any chi-square distribution table.
If the calculated value of χ² obtained in step (iv) is less than the corresponding tabulated value obtained in step (vi), it is said to be non-significant at the required level of significance.
This implies that the discrepancy between observed values (experiment) and expected values (theory) may be attributed to chance, i.e. fluctuations of sampling. In other words, the data do not provide any evidence against the null hypothesis [given in step (v)], which may therefore be accepted at the required level of significance, and we may conclude that there is good correspondence (fit) between theory and experiment.
(vii) On the other hand, if the calculated value of χ² is greater than the tabulated value, it is said to be significant and the null hypothesis is rejected at the required level of significance.
[r × s contingency table of cell frequencies (AiBj), with row totals (Ai) and column totals (Bj)]
where (Ai) is the frequency of the ith attribute Ai, i.e. the number of persons possessing the attribute Ai, i = 1, 2, ..., r; (Bj) is the number of persons possessing the attribute Bj, j = 1, 2, ..., s; and (AiBj) is the number of persons possessing both attributes Ai and Bj (i = 1, 2, ..., r; j = 1, 2, ..., s).
Under the hypothesis that the two attributes A and B are independent, the expected frequency for (AiBj) is given by:
E[(AiBj)] = N·P[AiBj] = N·P[Ai ∩ Bj] = N·P[Ai]·P[Bj] = N · (Ai)/N · (Bj)/N = (Ai)(Bj)/N
If (AiBj)₀ denotes the expected frequency of (AiBj), then
(AiBj)₀ = (Ai)(Bj)/N;   (i = 1, 2, ..., r; j = 1, 2, ..., s)
Thus, under the null hypothesis of independence of attributes, the expected frequencies for each of the cell
frequencies of the above table can be obtained on using this last equation. The rule in the last can be stated
in the words as follows:
―Under the hypothesis of independence of attributes the expected frequency for any of the cell
frequencies can be obtained by multiplying the row totals and the column totals in which the frequency
occurs and dividing the product by the total frequency N”.
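The rule can be sketched in code. The Python fragment below computes expected frequencies from row and column totals; the observed table is the movie-survey data used later in this unit:

```python
def expected_frequencies(table):
    # (AiBj)0 = (row total * column total) / N, under independence of attributes
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    return [[r * c / N for c in col_totals] for r in row_totals]

# Observed frequencies from the movie-survey table used later in this unit
observed = [[320, 80, 110, 200],
            [50, 15, 70, 60],
            [30, 5, 20, 40]]
expected = expected_frequencies(observed)
print(expected[0][0])   # 710 * 400 / 1000 = 284.0
```

Note that the expected table has the same row and column totals as the observed one, which is a quick check on hand calculations.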
Here, we have a set of r × s observed frequencies (AiBj) and the corresponding expected frequencies (AiBj)₀. Applying the χ²-test of goodness of fit, the statistic is:
χ² = Σᵢ Σⱼ [(AiBj) − (AiBj)₀]²/(AiBj)₀
Note: For contingency table data, the null hypothesis is always set up that the attributes under consideration are independent. It is only under this hypothesis that the formula (AiBj)₀ = (Ai)(Bj)/N (i = 1, 2, ..., r; j = 1, 2, ..., s) can be used for computing expected frequencies.
It should be noted that the expected frequencies of the sums, if the dice are fair, are determined from the distribution of x as in the table below:
Sum (x) P(x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36
To obtain the expected frequencies, the P(x) is multiplied by the total number of trials
ECO 253 STATISTICS FOR ECONOMIST 1
Therefore:
χ12 = (O1 – E1)2/E1 = (15 – 13.9)2/13.9 = 0.09
χ22 = (O2 – E2)2/E2 = (35 – 27.8)2/27.8 = 1.86
χ32 = (O3 – E3)2/E3 = (49 – 41.7)2/41.7 = 1.28
χ42 = (O4 – E4)2/E4 = (58 – 55.6)2/55.6 = 0.10
χ52 = (O5 – E5)2/E5 = (65 – 69.5)2/69.5 = 0.29
χ62 = (O6 – E6)2/E6 = (76 – 83.4)2/83.4 = 0.66
The contributions of the remaining five categories are computed in the same way.
Therefore, χ2 = 0.09 + 1.86 + 1.28+ 0.10 + 0.29 + 0.66 + 0.09 + 0.35 + 1.08 + 0.05 + 4.49
χ2 = 10.34
Decision: Since the calculated value of 10.34 is less than the table (critical) value, the null hypothesis is accepted.
Conclusion: There is no significant difference between observed and expected frequencies. The
slight observed differences occurred due to chance.
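The goodness-of-fit computation above can be sketched in a few lines of Python. This is a minimal illustration using only the six category contributions listed in the worked example (the full dice example has eleven categories, for sums 2 to 12); the variable names are my own.

```python
# Chi-square goodness-of-fit: sum of (O - E)^2 / E over the categories.
# Observed and expected counts are the six pairs shown in the example above.
observed = [15, 35, 49, 58, 65, 76]
expected = [13.9, 27.8, 41.7, 55.6, 69.5, 83.4]

def chi_square_statistic(obs, exp):
    """Return the chi-square goodness-of-fit statistic for paired counts."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

partial = chi_square_statistic(observed, expected)
print(round(partial, 2))  # contribution of these six categories: 4.28
```

The printed value matches the sum of the first six terms of the hand computation (0.09 + 1.86 + 1.28 + 0.10 + 0.29 + 0.66).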
Example 2: A movie producer is bringing out a new movie. In order to map out her advertising, she
wants to determine whether the movie will appeal most to a particular age group or whether it will
appeal equally to all age groups. The producer takes a random sample from persons attending a preview show of the new movie and obtains the results in the table below. Use the Chi-square (χ2) test to arrive at a conclusion (α = 0.05).
Age-groups (in years)
Persons Under 20 20-39 40 – 59 60 and over Total
Liked the movie 320 80 110 200 710
Disliked the movie 50 15 70 60 195
Indifferent 30 5 20 40 95
Total 400 100 200 300 1,000
Solution:
It should be noted that the two attributes being considered here are the age groups of the people
and their level of likeness of the new movie. Our concern here is to determine whether the two
attributes are independent or not.
Null hypothesis (Ho): Liking of the movie is independent of age group (i.e. the movie appeals the same way to different age groups)
Alternative hypothesis (Ha): Liking of the movie depends on age group (i.e. the movie appeals differently across age groups)
As earlier explained, to calculate the expected value in the cell of row 1 column 1, we divide the product of the row 1 total and the column 1 total by the grand total (N), i.e.
Eij = (row i total × column j total)/N; for example, E11 = (710 × 400)/1000 = 284.
χ2calculated = ∑i ∑j (Oij − Eij)2/Eij = 4.56 + 1.14 + 7.21 + 0.79 + 10.05 + 1.04 + 24.64 + 0.04 + 1.68 + 2.13 + 0.05 + 4.64 = 57.97
Recall that the d.f. is (number of rows minus one) × (number of columns minus one) = (3 − 1)(4 − 1) = 6
χ20.05, 6 = 12.59 (critical value)
Decision: Since the calculated χ2 value is greater than the table (critical value) we shall reject the null
hypothesis and accept the alternative.
Conclusion: It can be concluded that the movie appealed differently to different age groups (i.e likeness
of the movie is dependent on age).
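The expected-frequency rule and the χ2 computation for the movie example can be sketched as follows. This is a minimal illustration, not part of the original text; it reproduces the hand computation up to rounding.

```python
# Chi-square test of independence for the movie example:
# expected cell frequency = (row total x column total) / grand total.
observed = [
    [320, 80, 110, 200],  # liked the movie
    [50, 15, 70, 60],     # disliked the movie
    [30, 5, 20, 40],      # indifferent
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected frequency
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # ~57.99 with 6 degrees of freedom
```

Since 57.99 far exceeds the critical value 12.59, the code agrees with the decision reached by hand.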
2. A marketing manager for a manufacturer of sports cards collected the sample below. The number of cards sold for each player is shown:
Players TL TM TN TO TP TQ
Card sold 13 33 14 7 36 17
(a) Can the manager conclude the sales are not the same for each player?
(b) What is the decision regarding the null hypothesis?
3.6 SUMMARY
In this unit, we have examined the concept of chi-square and its scope. We also looked at its methodology and applications. It has been emphasised that it is not just an ordinary statistical exercise but a practical tool for solving day-to-day business and economic problems.
3.7 REFERENCES/FURTHER READINGS/WEB RESOURCES
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
Lind, D. A., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
https://www.simplilearn.com/tutorials/statistics-tutorial/chi-square-test
(b) The computed χ2 of 34.40 is in the rejection region beyond the critical value of 11.070. The decision, therefore, is to reject H0 at the .05 level and to accept H1. The difference between the observed and the expected frequencies is not due to chance. Rather, the differences between f0 and fe are large enough to be considered significant. The chance that these differences are due to sampling error is very small. So we conclude that it is unlikely that card sales are the same among the six players.
UNIT 4: ANALYSIS OF VARIANCE (ANOVA)
UNIT STRUCTURE
4.1 Introduction
4.5 Steps for testing hypothesis for more than two means (ANOVA)
4.5.1 Solved Examples
4.6 Summary
4.1 INTRODUCTION
In day-to-day business management and in the sciences, instances arise where we need to compare means. If there are only two means, e.g. the average recharge-card expenditure of male and female students in a faculty of a university, the usual t-test for the difference of two means is handy for solving this type of problem. However, in real life we are often confronted with situations where more than two means must be compared at the same time. The t-test for the difference of two means cannot handle this directly; the obvious workaround is to compare two means at a time using the t-test treated earlier. This process is very time-consuming, since as few as 4 sample means would require 4C2 = 6 different tests to compare the 6 possible pairs of sample means. Therefore, there must be a
procedure that can compare all means simultaneously. One such procedure is the analysis of
variance (ANOVA). For instance, we may be interested in the mean telephone recharge
expenditures of various groups of students in the university, such as students in the faculties of Science, Arts, Social Sciences, Medicine, and Engineering. We may be interested in testing whether the average monthly expenditures of students in the five faculties are equal, or whether they are drawn from the same normal population. The answer to this problem is provided by the technique of analysis of variance. It
should be noted that the basic purpose of the analysis of variance is to test the homogeneity of several
means.
The term Analysis of Variance was introduced by Prof. R. A. Fisher in the 1920s to deal with problems in the analysis of agronomical data. Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as: (i) variation due to assignable causes, and (ii) variation due to chance causes.
The variation due to assignable causes can be detected and measured, whereas the variation due to chance is beyond human control and cannot be traced separately.
4.2 LEARNING OUTCOMES
4.3 ANOVA
4.3.1 Assumption for ANOVA test
The ANOVA test is based on the test statistic F (or variance ratio). For the validity of the F-test in ANOVA, the following assumptions are made:
(i) The observations are independent.
(ii) The parent populations from which the observations are taken are normal.
(iii) The various treatment and environmental effects are additive in nature.
ANOVA as a tool has different dimensions and complexities. ANOVA can be (a) one-way classification or (b) two-way classification. However, only the one-way ANOVA will be dealt with in this course material.
Note
(i) The ANOVA technique enables us to compare several population means simultaneously and thus results in a lot of saving in terms of time and money, as compared to the several experiments required for comparing two population means at a time.
(ii) The origin of the ANOVA technique lies in agricultural experiments and as such its language
is loaded with such terms as treatments, blocks, plots etc. However, ANOVA technique is so
versatile that it finds applications in almost all types of design of experiments in various
diverse fields such as industry, education, psychology, business, economics etc.
(iii) It should be clearly understood that ANOVA technique is not designed to test equality of several
population variances. Rather, its objective is to test the equality of several population means or the
homogeneity of several independent sample means.
(iv) In addition to testing the homogeneity of several sample means, the ANOVA technique is now
frequently applied in testing the linearity of the fitted regression line or the
significance of the correlation ratio.
Assume n sample observations of a random variable X are divided into k classes on the basis of some criterion or factor of classification. Let the ith class consist of ni observations.
The n sample observations, where Xij denotes the jth observation in the ith class, can be expressed as in the table below:

Class 1:  X11  X12  ...  X1n1
Class 2:  X21  X22  ...  X2n2
:         :    :         :
Class k:  Xk1  Xk2  ...  Xknk
Such scheme of classification according to a single criterion is called one-way classification and its
analysis of variance is known as one-way analysis of variance.
The total variation in the observations Xij can be split into the following two components:
(i) The variation between the classes or the variation due to different bases of classification
(commonly known as treatments in pure sciences, medicine and agriculture). This type of
variation is due to assignable causes which can be detected and controlled by human
endeavour.
(ii) The variation within the classes, i.e. the inherent variation of the random variable within
the observations of a class. This type of variation is due to chance causes which are beyond the
control of man.
The main objective of the analysis of variance technique is to examine if there is significant
difference between the class means in view of the inherent variability within the separate classes.
4.5 Steps for testing hypothesis for more than two means (ANOVA): Here, we adopt the rejection
region method and the steps are as follows:
Also, compute the mean of all the data observations in the k classes by the formula:
X̄ = (1/n) ∑i ∑j Xij = (1/n) ∑i ni X̄i
Step 5: Obtain the Within Classes Sum of Squares (WSS) by the formula:
WSS = ∑i ∑j (Xij − X̄i)2
Step 8: Find the critical value of the test statistic F for the degree of freedom and at desired level
of significance in any standard statistical table.
If the computed value of the test statistic F is greater than the critical (tabulated) value, reject Ho; otherwise Ho may be regarded as true.
Example 1: The table below shows the length of stay (in days) of patients sampled at three hospitals, A, B and C. Test whether the mean length of stay differs across the three hospitals (α = 0.05).
Hospital A: 8 5 9 2 7 8 2
Hospital B: 4 3 8 7 7 1 5
Hospital C: 1 4 9 8 7 2 3
Calculations for various Sum of Squares
XA   XB   XC   (XA − 5.86)2   (XB − 5)2   (XC − 4.86)2
8    4    1    4.5796         1           14.8996
5    3    4    0.7396         4           0.7396
9    8    9    9.8596         9           17.1396
2    7    8    14.8996        4           9.8596
7    7    7    1.2996         4           4.5796
8    1    2    4.5796         16          8.1796
2    5    3    14.8996        0           3.4596
Totals: T1 = 41, T2 = 35, T3 = 34;   50.8572    38    58.8572

X̄1 = ∑X1j/n1 = 41/7 = 5.86
X̄2 = ∑X2j/n2 = 35/7 = 5.00
X̄3 = ∑X3j/n3 = 34/7 = 4.86
Within Sample Sum of Squares: To find the variation within the samples, we compute the sum of the squares of the deviations of the observations in each sample from the mean values of the respective samples (see the table above):
Sum of Squares within Samples = ∑i ∑j (Xij − X̄i)2 = 50.8572 + 38 + 58.8572 = 147.71
Between Samples Sum of Squares: To obtain the variation between samples, we compute the sum of the squares of the deviations of the sample means from the grand mean X̄ = 110/21 = 5.24:
(X̄1 − X̄)2 = (5.86 − 5.24)2 = 0.3844
(X̄2 − X̄)2 = (5.00 − 5.24)2 = 0.0576
(X̄3 − X̄)2 = (4.86 − 5.24)2 = 0.1444
Sum of Squares Between Samples (hospitals):
∑ ni(X̄i − X̄)2 = 7(0.3844) + 7(0.0576) + 7(0.1444) = 4.10
Note: Sum of Squares Within Samples + S.S Between Samples = 147.71 + 4.10 =151.81
ANOVA TABLE
Source of Variation          d.f.          Sum of Squares   Mean Sum of Squares     Variance Ratio (F)
Between Samples (Hospitals)  3 − 1 = 2     4.10             4.10/2 = 2.05           F = 2.05/8.21 = 0.25
Within Samples (Error)       21 − 3 = 18   147.71           147.71/18 = 8.21
Total                        20            151.81
Critical Value: The tabulated (critical) value of F for d.f (v1=2, v2=18) d.f at 5% level of
significance is 3.55
Since the calculated F = 0.25 is less than the critical value 3.55, it is not significant. Hence we fail to reject Ho.
However, in cases like this, when the MSS between classes is less than the MSS within classes, we need not calculate F, and we may conclude that the means X̄1, X̄2 and X̄3 do not differ significantly. Hence, Ho may be regarded as true.
Conclusion: Ho : μ1 = μ2 = μ3, may be regarded as true and we may conclude that there is no
significant difference in the average stay at each of the three hospitals.
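The whole hospital example can be reproduced with a short script. This is a sketch, not part of the original text; the dictionary keys are just labels for the three samples.

```python
# One-way ANOVA: partition the total sum of squares into between-sample
# and within-sample components, then form the variance ratio F.
samples = {
    "A": [8, 5, 9, 2, 7, 8, 2],
    "B": [4, 3, 8, 7, 7, 1, 5],
    "C": [1, 4, 9, 8, 7, 2, 3],
}

all_obs = [x for xs in samples.values() for x in xs]
n = len(all_obs)           # 21 observations in total
k = len(samples)           # 3 classes
grand_mean = sum(all_obs) / n

# Between-samples SS: sum of n_i * (class mean - grand mean)^2
bss = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
          for xs in samples.values())
# Within-samples SS: deviations of each observation from its class mean
wss = sum(sum((x - sum(xs) / len(xs)) ** 2 for x in xs)
          for xs in samples.values())

f_ratio = (bss / (k - 1)) / (wss / (n - k))
print(round(bss, 2), round(wss, 2), round(f_ratio, 2))  # 4.1 147.71 0.25
```

With F ≈ 0.25 against the critical value 3.55, the script reaches the same decision as the hand computation.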
Critical Difference: If the classes (called treatments in pure sciences) show a significant effect, then we would be interested in finding out which pair(s) of treatments differ significantly. Instead of calculating Student's t for different pairs of class (treatment) means, we calculate the Least Significant Difference (LSD) at the given level of significance. This LSD is also known as the Critical Difference (CD).
The LSD between any two class (treatment) means, say X̄i and X̄j, at level of significance α is given by:
LSD(X̄i − X̄j) = [the critical value of t at level of significance α and error d.f.] × [S.E.(X̄i − X̄j)]
Note: S.E. means Standard Error. Therefore, S.E.(X̄i − X̄j) above means the standard error of the difference between the two means being considered:
LSD = t(n−k)(α/2) × √(MSSE(1/ni + 1/nj))
Step 5: Compute the Total Sum of Squares: TSS = RSS − CF, where RSS = ∑i ∑j Xij2 is the raw sum of squares and CF = G2/n is the correction factor (G being the grand total of all observations).
Step 6: Compute the Between Classes Sum of Squares: BCSS = ∑i Ti2/ni − CF.
Step 7: Within Classes or Error Sum of Squares = Total S.S − Between Classes S.S.
The calculations here are much simpler and shorter than in the first method.
Application: Let us now apply this alternative method to solve the same problem treated earlier.
n = total number of observations = 7 + 7 + 7 = 21
Grand Total (G) = ∑∑Xij = 41 + 35 + 34 = 110
Correction Factor (CF) = G2/n = 1102/21 = 576.19
Raw Sum of Squares (RSS) = ∑∑Xij2 = 728
Total Sum of Squares (TSS) = RSS − CF = 728 − 576.19 = 151.81
But BCSS = ∑Ti2/ni − CF = (412 + 352 + 342)/7 − 576.19
Therefore, BCSS = 580.29 − 576.19 = 4.10
Therefore, Within Classes (hospitals) Sum of Squares or Error S.S = TSS − BCSS = 151.81 − 4.10 = 147.71
Having arrived at the same Sums of Squares figures, computations can proceed as done earlier.
Example 2: The table below gives the retail prices of a commodity in some shops selected at random
in four cities of Lagos, Calabar, Kano and Abuja. Carry out the Analysis of Variance (ANOVA) to
test the significance of the differences between the mean prices of the commodity in
the four
cities.
If significant difference is established, calculate the Least Significant Difference (LSD) and use it to
compare all the possible combinations of two means (α=0.05).
Solution:
Using the alternative method of obtaining the sum of square
City Price per unit of the commodity in different shops Total Means
Lagos 9 7 10 8 34 8.5
Calabar 5 4 5 6 20 5
Kano 10 8 9 9 36 9
Abuja 7 8 9 8 32 8
Grand Total (G) = 34 + 20 + 36 + 32 = 122
Correction Factor (CF) = G2/n = 1222/16 = 930.25
Raw Sum of Squares (RSS) = ∑∑Xij2 = 980
Total Sum of Squares (TSS) = RSS − CF = 980 − 930.25 = 49.75
Between Cities Sum of Squares (BCSS) = (342 + 202 + 362 + 322)/4 − CF = 969 − 930.25 = 38.75
Within Cities (Error) Sum of Squares = 49.75 − 38.75 = 11.00
With d.f (v1 = 3, v2 = 12): MSS between = 38.75/3 = 12.92; MSS within = 11/12 = 0.92; F = 12.92/0.92 = 14.09. The critical value F(3, 12, 0.05) = 3.49.
Decision: Since the computed F is greater than the table value F(v1, v2, α), the null hypothesis is rejected and the alternative is accepted.
Conclusion: At least one of the means is significantly different from the others.
But the standard error of (X̄i − X̄j) = √(MSSE(1/ni + 1/nj)) = √(0.92 × (1/4 + 1/4)) = 0.678
LSD = t12(0.025) × S.E. = 2.18 × 0.678 = 1.48
Comparison between different means (a difference is significant if it exceeds LSD = 1.48):
|X̄Lagos − X̄Calabar| = |8.5 − 5| = 3.5 (significant)
|X̄Lagos − X̄Kano| = |8.5 − 9| = 0.5 (not significant)
|X̄Lagos − X̄Abuja| = |8.5 − 8| = 0.5 (not significant)
|X̄Calabar − X̄Kano| = |5 − 9| = 4.0 (significant)
|X̄Calabar − X̄Abuja| = |5 − 8| = 3.0 (significant)
|X̄Kano − X̄Abuja| = |9 − 8| = 1.0 (not significant)
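The MSE, standard error and LSD for the city-price example can be checked with a short sketch. The t value 2.18 is read from a t-table (error d.f. = 12, α/2 = 0.025), as in the text; it is hard-coded here rather than computed.

```python
import math

# Least Significant Difference (LSD) for the city-price example.
samples = {
    "Lagos":   [9, 7, 10, 8],
    "Calabar": [5, 4, 5, 6],
    "Kano":    [10, 8, 9, 9],
    "Abuja":   [7, 8, 9, 8],
}

n = sum(len(xs) for xs in samples.values())   # 16 observations
k = len(samples)                              # 4 cities
wss = sum(sum((x - sum(xs) / len(xs)) ** 2 for x in xs)
          for xs in samples.values())
mse = wss / (n - k)                           # mean square error, 11/12

t_crit = 2.18                                 # t(12 d.f.), from the table
se = math.sqrt(mse * (1/4 + 1/4))             # S.E. of a difference of means
lsd = t_crit * se
print(round(lsd, 2))                          # ~1.48

# Any pair of city means differing by more than the LSD is significant.
means = {city: sum(xs) / len(xs) for city, xs in samples.items()}
```

Comparing each pair of means against the LSD reproduces the conclusions of the text, e.g. Lagos vs Calabar (3.5 > 1.48, significant) but Lagos vs Kano (0.5 < 1.48, not significant).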
Self-Assessment Exercise (SAE 1)
1. Professor Ade had the 22 students in his 10 a.m. Introduction to Statistics class rate his performance as Excellent, Good, Fair, or Poor. A graduate student collected the ratings and assured the students that Professor Ade would not receive them until after course grades had been sent to the Dean's office. The rating (i.e., the treatment) a student gave the professor was matched with his or her course grade, which could range from 0 to 100. The sample information is reported below.
Course grades
Excellent Good Fair Poor
94 75 70 68
90 68 73 70
85 77 76 72
80 83 78 65
88 80 74
68 65
65
(a) Is there a difference in the mean score of the students in each of the four rating
categories? Use the .01 significance level.
(b) Interpret your answer
2. MAH Clean is a new all-purpose cleaner being test marketed by placing displays in three
different locations within various supermarkets. The number of 12-ounce bottles sold from
each location within the supermarket is reported below.
MZ 18 14 19 17
MM 12 18 10 16
Other cleaners 26 28 30 32
At the .05 significance level, is there a difference in the mean number of bottles sold at the three
locations?
(a) State the null hypothesis and the alternate hypothesis.
(b) Compute the values of SS total, SST, and SSE.
(c) Develop an ANOVA table.
(d) What is your decision regarding the null hypothesis?
4.6 SUMMARY
In summary, ANOVA is very useful in the multiple comparison of means, among other important uses in both the social and applied sciences.
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. and Enlarged ed.). Mumbai, India: Himalayan Publishing House.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
Lind, D. A., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
(b) The computed value of F is 8.99, which is greater than the critical value of 5.09, so the null hypothesis
is rejected. We conclude the population means are not all equal. The mean scores are not the same in
each of the four ratings groups. It is likely that the grades students earned in the course are related to the
opinion they have of the overall competency and classroom performance of Prof Ade, the instructor.
(a)
H0: μ1 = μ2 = μ3
H1: At least one treatment mean is different
(b)
SS Total = 578
SSE = 74
SST = 504
(c)
Source of Variation       SS    d.f.   MS     F
Treatments (locations)    504   2      252    F = 252/8.22 = 30.65
Error                     74    9      8.22
Total                     578   11
(d)
𝐻0 is rejected. There is a difference in the mean number of bottles sold at the various locations.
UNIT 5: NON-PARAMETRIC TEST METHODS
UNIT STRUCTURE
5.1 Introduction
5.6 Summary
5.7 References/Further Reading/Web Resources
5.8 Possible Answers to Self-Assessment Exercise(s) within the content
5.1 INTRODUCTION
In statistics, the term non-parametric statistics refers to statistics that do not assume the data or population
have any characteristic structure or parameters. For example, non-parametric statistics are suitable for
examining the order in which runners complete a race, while parametric statistics would be more appropriate
for looking at the actual race times (which may possess parameters such as a mean and standard deviation).
In other words, the order (or "rank") of the values is used rather than the actual values themselves.
(1) The first meaning of non-parametric covers techniques that do not rely on data belonging to any
particular distribution. These include, among others:
(a) distribution free methods, which do not rely on assumptions that the data are drawn from a given
probability distribution. As such it is the opposite of parametric statistics. It includes non-parametric
descriptive statistics, statistical models, inference and statistical tests.
(b) non-parametric statistics (in the sense of a statistic over data, which is defined to be a function on a
sample that has no dependency on a parameter), whose interpretation does not depend on the population
fitting any parameterised distributions. Order statistics, which are based on the ranks of observations,
are one example of such statistics and these play a central role in many non-parametric approaches.
(2) The second meaning of non-parametric covers techniques that do not assume that the structure
of a model is fixed. Typically, the model grows in size to accommodate the complexity of the
data. In these techniques, individual variables are typically assumed to belong to parametric
distributions, and assumptions about the types of connections among variables are also made. These
techniques include, among others:
(a) non-parametric regression, which refers to modeling where the structure of the relationship between
variables is treated non-parametrically, but where nevertheless there may be parametric assumptions
about the distribution of model residuals.
(b) non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which
allow the number of latent variables to grow as necessary to fit the data, but here individual variables
still follow parametric distributions and even the process controlling the rate of growth of latent variables
follows a parametric distribution.
Non-parametric methods are widely used for studying populations that take on a ranked order (such as
movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data
have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels
of measurement, non-parametric methods result in "ordinal" data.
As non-parametric methods make fewer assumptions, their applicability is much wider than the
corresponding parametric methods. In particular, they may be applied in situations where less is
known about the application in question. Also, due to the reliance on fewer assumptions, non- parametric
methods are more robust.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the
use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this
simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving
less room for improper use and misunderstanding.
The wider applicability and increased robustness of non-parametric tests comes at a cost: in cases where a
parametric test would be appropriate, non-parametric tests have less power. In other words, a larger
sample size can be required to draw conclusions with the same degree of confidence.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term non-parametric is not meant to imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance.
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for
statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability
distributions of the variables being assessed.
4. Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with
different mean ranks
5. Mann–Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the
same distribution, as compared to a given alternative hypothesis
6. Anderson–Darling test: tests whether a sample is drawn from a given distribution
7. Statistical Bootstrap Methods: estimates the accuracy/sampling distribution of a statistic
8. Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes have
identical effects
9. Cohen's kappa: measures inter-rater agreement for categorical items
10. Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized block designs
have identical effects
11. Kaplan–Meier: estimates the survival function from lifetime data, modelling censoring
14. Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two
samples are drawn from the same distribution
15. Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations
such as day of the week
16. Logrank Test: compares survival distributions of two right-skewed, censored samples
17. McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched
pairs of subjects, row and column marginal frequencies are equal
18. Median test: tests whether two samples are drawn from distributions with equal medians
19. Pitman's permutation test: a statistical significance test that yields exact p values by examining all
possible rearrangements of labels
20. Rank products: detects differentially expressed genes in replicated microarray experiments
21. Siegel–Tukey test: tests for differences in scale between two groups
22. Squared ranks test: tests equality of variances in two or more samples
23. Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually
independent/random
The Kruskal–Wallis test is usually based on large-sample theory: the sampling distribution of the statistic H can be closely approximated by a chi-square distribution with k − 1 degrees of freedom.
5.4.2 The Sign Test
The sign test is a versatile nonparametric method for hypothesis testing that uses the binomial distribution with p = 0.5 as the sampling distribution. It does not require any assumption about the distribution of the population.
Note that p is always ½. To get t, we look for the largest cumulative probability not exceeding our chosen α (e.g. α = 0.05) under the column p = 0.5 in the binomial distribution table; the value on the left-hand side of the table corresponding to this probability is t.
Example 1: Suppose the signs of the differences in a paired sample are as follows:
Number of +'s = 8
Number of −'s = 1
Number of ties = 1
Solution:
n = number of +'s and −'s = 9
T = number of +'s = 8
We now enter the binomial distribution table with p = ½ (i.e. under the column of 0.50) and n = 9, and we look for the value close to 0.05 but not more than it. In this case, what we have is 0.0195, which corresponds to 1, i.e. t = 1.
Therefore, we reject Ho if T ≥ n − t.
T = 8, n = 9, t = 1; n − t = 9 − 1 = 8.
Since 8 ≥ 8, we reject Ho.
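The binomial table lookup used above can be done directly in code. This is a sketch, not part of the original text: it computes the cumulative binomial probability instead of reading it from the table, then applies the same rejection rule.

```python
from math import comb

# Sign test: find the largest t with P(X <= t) <= alpha for X ~ Binomial(n, 0.5),
# then reject Ho if T (the number of plus signs) >= n - t.
def binom_cdf(t, n, p=0.5):
    """P(X <= t) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(t + 1))

n, T, alpha = 9, 8, 0.05

t = 0
while binom_cdf(t + 1, n) <= alpha:
    t += 1
# For n = 9: P(X <= 1) = 10/512 ~ 0.0195, while P(X <= 2) ~ 0.0898 > 0.05, so t = 1.

reject = T >= n - t
print(t, reject)  # 1 True
```

The code reproduces the table value 0.0195 and the decision to reject Ho at α = 0.05.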
Example 2: The following are measurements of households' weekly demand for water in litres:
163, 165, 160, 189, 161, 171, 158, 151, 169, 162, 163, 139, 172, 165, 148, 166, 172, 163, 187, 173.
Computations:
Replacing each value exceeding 160 with a plus sign, each value less than 160 with a minus sign, and discarding those actually equal to 160, we have the following:
+ + + + + − − + + + − + + − + + + + +
The runs test (a test of randomness) examines the order in which observations occur. Consider, for example, the following sequence of males (M) and females (F):
M F M M M F F F M F M F M M M F F M M M
In the above example, there are 11 runs, represented by u, i.e.
u = 11
n1 = 12 (for M's)
n2 = 8 (for F's)
When n1 and n2 are small, tests of the null hypothesis of randomness are usually based on specially constructed tables found in standard statistical tables. However, when n1 and n2 are both 10 or more, the sampling distribution of u (the total number of runs) can be approximated with a normal distribution. For this, we require the following results:
E(u) = 2n1n2/(n1 + n2) + 1
Var(u) = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)2(n1 + n2 − 1)]
Z = (u − E(u))/√Var(u)
Solution:
Ho: the arrangement is random
H1: the arrangement is not random
Critical region: Z < −2.58 or Z > 2.58 (α = 0.01)
where
Z = (u ± ½ − E(u))/√Var(u), the ½ being a continuity correction
Computation:
n1 = 20 (for n's)
n2 = 12 (for d's)
U = 9 (total number of runs)
E(u) = 2n1n2/(n1 + n2) + 1 = 2(20)(12)/32 + 1 = 16
Since u = 9 is less than E(u) = 16, we use u + ½ in the continuity correction.
Var(u) = 2(20)(12)[2(20)(12) − 20 − 12] / [(32)2(31)] = 6.77
Z = (9 + 0.5 − 16)/√6.77 = −6.5/2.60 = −2.50
Decision: Since Z = −2.50 falls between −2.58 and 2.58, we cannot reject the null hypothesis of randomness.
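The normal approximation to the runs test can be sketched as follows. This is an illustration, not part of the original text; the function names are my own.

```python
import math

# Normal approximation for the runs test (u = number of runs), with the
# continuity correction used in the example above.
def runs_test_z(u, n1, n2):
    eu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    # continuity correction: +0.5 if u < E(u), -0.5 if u > E(u)
    correction = 0.5 if u < eu else -0.5
    return (u + correction - eu) / math.sqrt(var)

z = runs_test_z(u=9, n1=20, n2=12)
print(round(z, 2))  # ~ -2.50
```

Since z ≈ −2.50 lies inside the critical region bounds (−2.58, 2.58), the code agrees with the decision not to reject randomness.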
Example 2: Consider the following sample observations taken from three different populations. Use the Kruskal–Wallis test to determine whether the population means are equal (α = 0.05).
94 85 89
88 82 67
91 79 72
74 84 76
87 61 69
97 72
80
Solution:
Ho: μ1 = μ2 = μ3
H1: The three means are not equal
Rank all the observations together from the smallest to the highest, as if they were from one sample.
Population I   RI    Population II   RII    Population III   RIII
94             17    85              12     89               15
88             14    82              10     67               2
91             16    79              8      72               4.5
74             6     84              11     76               7
87             13    61              1      69               3
97             18    72              4.5
                     80              9
Rank sums:     R1 = 84               R2 = 55.5               R3 = 31.5
Applying the Kruskal–Wallis statistic:
H = [12/(n(n + 1))] ∑ (Ri2/ni) − 3(n + 1)
= [12/(18 × 19)] (842/6 + 55.52/7 + 31.52/5) − 3(19)
= 63.67 − 57
= 6.67
χ20.05, 2 = 5.991
Decision: Since H = 6.67 is greater than the critical value χ20.05, 2 = 5.991, we reject Ho and conclude that the three population means are not all equal.
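The ranking and the H statistic for this example can be reproduced with a short sketch (not part of the original text), including the average ranks assigned to the tied values of 72.

```python
# Kruskal-Wallis H for the three-population example, with average ranks
# assigned to tied observations.
groups = [
    [94, 88, 91, 74, 87, 97],
    [85, 82, 79, 84, 61, 72, 80],
    [89, 67, 72, 76, 69],
]

pooled = sorted(x for g in groups for x in g)
# average rank for each value (ties share the mean of their rank positions)
ranks = {}
for value in set(pooled):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    ranks[value] = sum(positions) / len(positions)

n = len(pooled)
rank_sums = [sum(ranks[x] for x in g) for g in groups]

h = 12 / (n * (n + 1)) * sum(r**2 / len(g) for r, g in zip(rank_sums, groups)) \
    - 3 * (n + 1)
print(rank_sums, round(h, 2))  # rank sums 84, 55.5, 31.5 and H ~ 6.67
```

Since H ≈ 6.67 exceeds the chi-square critical value 5.991 with k − 1 = 2 degrees of freedom, the code reaches the same conclusion as the hand computation.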
5.6 SUMMARY
The unit has explored the concept of non-parametric tests, viz. their definition, types, applications (including hypothesis setting and testing) and the interpretation of various tests. It emphasised that non-parametric tests, being distribution-free, do not make assumptions about the probability distributions of the variables being assessed. This also contributes to their flexibility and wide applicability.
Self-Assessment Exercises (SAE 1)
1. Explain the meaning of non-parametric method
5.7 REFERENCES/FURTHER READINGS/WEB RESOURCES
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. and Enlarged ed.). Mumbai, India: Himalayan Publishing House.
Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, p. 43.
MODULE 3: CORRELATION AND REGRESSION ANALYSIS
This module focuses on explaining the statistical relationship and interdependence among economic
variables. The two techniques considered are correlation and regression analysis. They are applied to measure the strength of the relationship between two or more economic variables and its level of significance.
Correlation provides an estimate of the relationship between two measurements, without any assumption of whether one comes before the other. For example, muscle mass and fat mass are correlated; both depend on body size. Correlation coefficients have a value between −1 and +1. A positive coefficient means that the x and y values increase and decrease in the same direction. A negative correlation means that x and y move in opposite directions: one increases as the other decreases. A coefficient of 0 means x and y are associated randomly.
The correlation measures only the degree of linear association between two variables while regression
analysis is a statistical process for estimating the relationships among variables. In this module the under-listed topics will be considered:
1.1 INTRODUCTION
Pearson's correlation coefficient is based on pairs of measurements (x, y), and the data are entered in two columns, each pair in a row. The coefficient, and whether it significantly differs from null (0), is usually presented. More recently, the 95% confidence interval of the coefficient is also presented, and the correlation can be considered statistically significant if the 95% confidence interval does not overlap the zero (0) value. Sample size calculations or tables can be used for estimating sample size requirements or the power of the results in correlation.
Correlation measures the strength of the relationship between two (or more) ratio-scale variables. Correlation is a technique used by economists and forecasters. It can be used to answer such questions as:
(i) Is there a link between the money supply and the price level?
(ii) Do bigger firms produce at lower cost than smaller firms?
(iii) Does spending more on advertising increase sales?
Each of these questions is about economics or business as much as about statistics. The statistical analysis is
part of a wider investigation into the problem; it cannot provide a complete answer to the problem but, used
sensibly, is a vital input. Correlation technique may be applied to time-series or cross-section data.
The graphs are helpful, but it would be useful to have a simple numerical summary measure of each
relationship. For this purpose, we use the correlation coefficient between any pair of variables. The correlation
coefficient is a number which summarizes the relationship between two variables.
The different types of possible relationship between any two variables, X and Y, may be summarised as follows:
-High values of X tend to be associated with low values of Y and vice versa. This is termed negative correlation
and appears to be the case for B and G.
-High (low) values of X tend to be associated with high (low) values of Y. This is positive correlation and
reflects (rather weakly) the relationship between B and the income ratio (IR).
- No relationship between X and Y exists. High (low) values of X are associated about equally with high and low values of Y. This is zero, or the absence of, correlation.
There appears to be little correlation between the birth rate and per capita GNP. It should be noted that positive
correlation does not mean that high values of X are always associated with high values of Y, but usually they
are. It is also the case that correlation only measures a linear relationship between the two variables. As a
counter-example, consider the backward-bending labour supply curve, as suggested by economic theory
(higher wages initially encourage extra work effort, but above a certain point the benefit of higher wage rates
is taken in the form of more leisure). The relationship is non-linear and the measured degree of correlation
between wages and hours of work is likely to be low, even though the former obviously influences the latter.
The sample correlation coefficient, r, is a numerical statistic which distinguishes between the types of cases
shown in Figure 7.1. It has the following properties:
- It always lies between -1 and +1. This makes it relatively easy to judge the strength of an association.
-A positive value of r indicates positive correlation, a higher value indicating a stronger correlation between X
and Y (i.e. the observations lie closer to a straight line). A value of r = 1 indicates perfect positive correlation
and means that all the observations lie precisely on a straight line with positive slope.
-A negative value of r indicates negative correlation. Similar to the above, a larger negative value indicates
stronger negative correlation and r = -1 signifies perfect negative correlation.
- A value of r = 0 (or close to it) indicates a lack of correlation between X and Y.
- The relationship is symmetric, i.e. the correlation between X and Y is the same as between Y and X. It does
not matter which variable is labelled Y and which is labelled X.
1.3.1 Types of correlation
(a) Positive Correlation: Situations may arise when the values of two variables deviate in the same direction, i.e. if an increase in the values of one variable results, on average, in a corresponding increase in the values of the other variable, or if a decrease in the values of one variable results, on average, in a corresponding decrease in the values of the other. In such cases the correlation is said to be positive.
Some examples of possible positive correlations are:
- Price and supply of a commodity
(b) Negative Correlation: Correlation is said to be negative or inverse if the variables deviate in the
opposite direction i.e.; if the increase (or decrease) in the values of one variable results, on the
average, in a corresponding decrease (or increase) in the value of the other variable. Example of
negative correlation are:
- Quantity demanded and price
(c) Linear Correlation: This describes a situation where, for a unit change in one variable, there is a constant corresponding change in the other variable over the entire range of values. E.g.
x: 1  2  3  4  5
y: 2  5  8  11  14
As seen above, for a unit change in x there is a constant change (i.e. 3) in the corresponding value of y. This can be expressed as y = 3x − 1.
In general, two variables are said to be linearly related if they have a relationship of the form
y = a + bx
(d) Non-linear or curvilinear correlation: This describes situations in which, corresponding to a unit change in one variable, the other variable changes not at a constant rate but at a fluctuating rate.
A high degree of observed correlation between two variables may arise for any of the following reasons:
(a) Mutual dependence: This is the situation when the phenomena under study inter-influence each other. Such instances are usually observed in data relating to economic and business situations.
(b) Both variables being influenced by the same external factor(s): A high degree of correlation
between the two variables may be due to the effect or interaction of a third variable or a number of
variables on each of these two variables.
(c) Chance: It may happen that a small randomly selected sample from a bivariate distribution shows a fairly high degree of correlation even though the variables are not actually correlated in the population. Such correlation may be due to chance fluctuation. For example, one may observe a high degree of correlation between height and intelligence in a group of people. Such correlation is called spurious or nonsense correlation.
Even if the raw data are available, they may be highly skewed, so the correlation coefficient may be influenced heavily by a few outliers. The hypothesis test for correlation may then be misleading, as it is based on the assumption of underlying Normal distributions for the data. In this case we could transform the values to ranks and calculate the correlation of the ranks. In a similar manner to the median, described in Chapter 1, this can effectively deal with heavily skewed distributions.
Note the difference between the two cases. In the first, we would prefer to have the actual school pass rates and expenditures because our analysis would be better: we could actually see how much extra we have to spend in order to get better results. In the second case we actually prefer to use the ranks because the original data might mislead us, through the presence of outliers for example. Non-parametric statistics are those which are robust to the distribution of the data, such as the calculation of the median, rather than the mean, which is a parametric measure. The rank correlation coefficient is one of the non-parametric measures.
An alternative formula that relies on the deviation of each individual observation from the mean is also frequently used, where the deviations from the means are x = X − X̄ and y = Y − Ȳ. Here X̄ and Ȳ are the sample means of the sets of data Xi and Yi respectively. This formula is given as:
r = ∑xy / √(∑x² ∑y²)
1.3.6 Interpretation of the value of r
The closer r is to +1 or −1, the stronger the linear relationship between the two variables; values close to 0 indicate little or no linear relationship. The sign of r indicates the direction of the relationship: a positive r means the variables move together, while a negative r means they move in opposite directions.
1.3.8 Assumptions one must meet in order to use the Pearson product-moment correlation
1. The measures are approximately normally distributed.
2. The variance of the two measures is similar (homoscedasticity).
3. The relationship is linear.
4. The sample represents the population.
5. The variables are measured on an interval or ratio scale.
Apart from spurious correlation, there are four possible reasons for a non-zero
value of r:
(1) X influences Y.
(2) Y influences X.
(3) X and Y jointly influence each other.
(4) Another variable, Z, influences both X and Y.
Correlation alone does not allow us to distinguish between these alternatives. For example, wages (X) and
prices (Y) are highly correlated. Some people believe this is due to cost–push inflation, i.e. that wage rises
lead to price rises. This is case (1) above. Others believe that wages rise to keep up with the cost of living
(i.e. rising prices), which is (2). Perhaps a more convincing explanation is (3), a wage– price spiral where
each feeds upon the other. Others would suggest that it is the growth of the money supply, Z, which allows
both wages and prices to rise. To
distinguish between these alternatives is important for the control of inflation, but correlation alone does
not allow that distinction to be made.
Correlation is best used therefore as a suggestive and descriptive piece of analysis, rather than a technique
which gives definitive answers. It is often a preparatory piece of analysis, which gives some clues to what
the data might yield, to be followed by more sophisticated techniques such as regression.
Solution: Let the advertising expenses (in ₦'000) be denoted by the variable X and the sales (in ₦'000,000) be denoted by the variable Y.
X    Y    x = X − X̄    y = Y − Ȳ    x²    y²    xy
65 53 0 -13 0 169 0
62 58 -3 -8 9 64 24
82 62 17 -4 289 16 -68
75 68 10 2 100 4 20
Using the deviation-from-the-mean formula:
r = ∑xy / √(∑x² ∑y²)
Example 2: The following table shows the marks obtained in Mathematics (X) and English (Y)
by ten students chosen randomly from a group of final year students in a Senior Secondary
School.
X     Y
75    82
80 78
93 86
65 72
87 91
71 80
98 95
68 75
84 89
77 74
Calculate the product moment correlation coefficient between the two subjects and interpret your result.
r = [n∑XY − ∑X∑Y] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
From the data: n = 10, ∑X = 798, ∑Y = 822, ∑X² = 64722, ∑Y² = 68116, ∑XY = 66249.
r = [10(66249) − (798)(822)] / √{[10(64722) − (798)²][10(68116) − (822)²]}
r = 6534 / √(10416 × 5476)
r = 6534 / 7552.35
r ≈ 0.87
It can be said that there is a strong positive relationship between the marks obtained in Mathematics and English by the ten students.
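The worked computation above can be cross-checked with a short Python sketch. The lists hold the ten students' marks from the table; `pearson_r` is an illustrative helper name, not a function from the text:

```python
import math

def pearson_r(xs, ys):
    # Product-moment formula: r = [nΣXY − ΣXΣY] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

maths = [75, 80, 93, 65, 87, 71, 98, 68, 84, 77]
english = [82, 78, 86, 72, 91, 80, 95, 75, 89, 74]
print(round(pearson_r(maths, english), 3))  # 0.865
```

The result agrees with the hand calculation, confirming a strong positive correlation.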
1.6 SUMMARY
In summary, learners should now be able to see that Pearson's correlation is a measure of the relationship between two (or more) variables that change together.
1.7 REFERENCES/FURTHER READING
Spiegel, M. R. and Stephens, L. J. (2008). Statistics (4th ed.). New York: McGraw-Hill Press.
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. and Enlarged ed.). Mumbai, India: Himalayan Publishing House.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
UNIT 2: SPEARMAN'S RANK CORRELATION METHOD
UNIT STRUCTURE
2.1 Introduction
2.2 Learning Outcomes
2.3 Spearman’s Rank Correlation Method
2.3.1 Meaning of Spearman’s Rank Correlation
2.4 Solved Examples
2.5 Challenges using Spearman’s Rank
2.6 Summary
2.1 INTRODUCTION
In certain instances, we come across statistical series in which the variables under consideration cannot be measured quantitatively but can only be arranged in serial order. This is always the situation when we are dealing with qualitative attributes such as intelligence, preference, honesty, morality etc. In such cases, Karl Pearson's coefficient of correlation cannot be used. A British psychologist, Charles Edward Spearman, developed a formula in 1904 which can be used to obtain the correlation coefficient between the ranks of n individuals in the two variables or attributes being studied.
For example, assume we are interested in determining the correlation between fluency in English Language (A) and beauty (B) among a group of n young ladies. These are variables which cannot be measured directly, but we can arrange the group of n individuals in order of merit (ranks) with respect to the two attributes. Let the random variables X
and Y denote the ranks of the individuals in the characteristics A and B respectively. Also, if it is assumed that there is no tie, i.e. no two individuals get the same rank in a characteristic, then X and Y assume numerical values ranging from 1 to n.
Spearman's rank correlation coefficient, usually denoted by ρ (rho), is given by the formula:
ρ = 1 − 6∑d² / [n(n² − 1)]
where d is the difference between the pair of ranks of the same individual in the two variables (d = X − Y) and n is the number of pairs.
Example 1: Fifteen members of staff of the administrative unit of an organization were studied to determine
the correlation between their punctuality at work (X) and the compliance of their dresses with
organizational dress code (Y) and the following ranks as given in the table below
were observed:
Rank in (X) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Rank in (Y) 10 7 2 6 4 8 3 1 11 15 9 5 14 12 13
Calculate the Spearman‘s rank correlation coefficient between the two characteristics.
Solution:
Spearman's rank correlation coefficient is given by the formula:
ρ = 1 − 6∑d² / [n(n² − 1)]
Rank X    Rank Y    d = X − Y    d²
1    10    −9    81
2 7 -5 25
3 2 1 1
4 6 -2 4
5 4 1 1
6 8 -2 4
7 3 4 16
8 1 7 49
9 11 -2 4
10 15 -5 25
11 9 2 4
12 5 7 49
13 14 -1 1
14 12 2 4
15 13 2 4
Σd = 0    Σd² = 272
Therefore:
ρ = 1 − 6(272) / [15(15² − 1)]
ρ = 1 − 1632/3360
ρ = 1 − 0.486 = 0.514
There is a moderate positive correlation between punctuality and compliance with the dress code.
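As a quick check, the coefficient for Example 1 can be computed in Python using the formula ρ = 1 − 6Σd²/[n(n² − 1)]; variable names here are illustrative:

```python
rank_x = list(range(1, 16))  # ranks in punctuality (X): 1 to 15
rank_y = [10, 7, 2, 6, 4, 8, 3, 1, 11, 15, 9, 5, 14, 12, 13]  # ranks in dress code (Y)

n = len(rank_x)
sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n * n - 1))  # Spearman: ρ = 1 − 6Σd²/[n(n²−1)]
print(sum_d2, round(rho, 4))  # 272 0.5143
```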
Example 2: Calculate the Spearman‘s rank correlation coefficient between advert expenditure and sales
revenue recorded by some randomly selected companies in an industrial estate as given
below:
Advert (N ‘000) 24 29 19 14 30 19 27 30 20 28 11
Sales (N ‘000) 37 35 16 26 45 27 28 33 16 41 21
Solution:
X (Advert)    Y (Sales)    Rank X    Rank Y    d    d²
24    37    6      3      3      9
29    35    3      4     −1      1
19    16    8.5   10.5   −2      4
14    26    10     8      2      4
30    45    1.5    1      0.5    0.25
19    27    8.5    7      1.5    2.25
27    28    5      6     −1      1
30    33    1.5    5     −3.5   12.25
20    16    7     10.5   −3.5   12.25
28    41    4      2      2      4
11    21    11     9      2      4
Σd = 0    Σd² = 54
ρ = 1 − 6(54) / [11(11² − 1)]
ρ = 1 − 324/1320
ρ = 1 − 0.245 ≈ 0.75
There is a fairly strong positive correlation between advertising expenditure and sales revenue. (The tied values 30, 19 and 16 have been assigned common ranks, as described in Case II below.)
2.5 Some challenging cases may arise when using Spearman's rank correlation method. These include:
Case I: When ranks are not given: Spearman's rank correlation formula can be used even when dealing with variables which are measured quantitatively, i.e. when the actual data, but not the ranks, relating to the two variables are given. In such cases, we shall have to convert the data into ranks. The highest (or smallest) observation is given rank 1, the next highest (or next lowest) observation is given rank 2, and so on. It does not matter in which direction (ascending or descending) the ranks are assigned; however, the same approach should be followed for all the variables under consideration.
Case II: Repeated ranks: In the case of attributes, if there is a tie, i.e. if any two or more individuals are placed together in any classification with respect to an attribute, or if in the case of variable data there is more than one item with the same value in either or both series, then the Spearman formula for calculating the rank correlation coefficient breaks down, since in this case the variables X (the ranks of individuals in the first characteristic) and Y (the ranks of individuals in the second characteristic) do not take the values from 1 to n as assumed in the derivation of the formula. In such instances, common ranks are assigned to the repeated items. The common rank assigned is the arithmetic mean of the ranks which these items would have got if they had been different from each other, and the next item gets the rank following the ranks used in computing the common rank. For example, suppose an item is repeated at rank 6; then the common rank to be assigned to each of the two items is (6 + 7)/2, i.e. 6.5, which is the average of 6 and 7, the ranks which the two observations would have assumed if they had been different. The next item will then be assigned rank 8. Similarly, if an item is repeated thrice at rank 9, then the common rank to be assigned to each of the three will be (9 + 10 + 11)/3, i.e. 10, which is the arithmetic mean of the three ranks, and the next rank to be assigned will be 12.
2.6 SUMMARY
This unit should by now have helped learners to calculate and interpret the simple correlation between two variables and to determine whether the correlation is significant.
Siegel, S. and Castellan Jr., N. J. (1988). Nonparametric Statistics for the Behavioral Sciences (2nd ed.). Boston, Massachusetts: McGraw-Hill. ISBN 0-07-057357-3, pp. 235–244.
Lind, A. D., Marchal, W. G. & Wathen, S. A. (2006). Basic Statistics for Business and Economics (5th ed.). New York: McGraw-Hill.
Anderson, D. R., Sweeney, D. J., Camm, J. D. & Cochran, J. J. (2014). Statistics for Business and Economics (12th ed.). South-Western.
Barrow, M. (2017). Statistics for Economics, Accounting and Business Studies (7th ed.). Pearson, United Kingdom.
McClave, J. T., Benson, P. G. & Sincich, T. (2017). Statistics for Business and Economics (13th ed.). Pearson, UK.
UNIT 3: LEAST SQUARE REGRESSION ANALYSIS
UNIT STRUCTURE
3.1 Introduction
3.2 Learning Outcomes
3.3 Least Square Regression Analysis
3.4 Meaning of Regression Analysis
3.5 Solved Examples
3.6 Summary
3.7 References/Further Reading/Web Resources
3.8 Possible Answers to Self-Assessment Exercise(s) within the content
3.1 INTRODUCTION
If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is possible
to predict values of one variable from the other. This observation leads to a very important concept
known as ‗Regression Analysis‘.
The literal or dictionary meaning of the word "regression" is "stepping back or returning to the average value". The term was first used by the British biometrician Sir Francis Galton in the late 19th century in connection with some studies he conducted on estimating the extent to which the stature of the sons of tall parents reverts, or regresses, back to the mean stature of the population. He studied the relationship between the heights of about one thousand fathers and sons and published the results in a paper, "Regression towards Mediocrity in Hereditary Stature".
Regression analysis is a more sophisticated way of examining the relationship between two (or more) variables
than is correlation. The major differences between correlation and regression are the following:
-Regression can investigate the relationships between two or more variables.
- A direction of causality is asserted, from the explanatory variable (or variables) to the dependent variable.
- The influence of each explanatory variable upon the dependent variable is measured.
- The significance of each explanatory variable’s influence can be ascertained.
Thus regression permits answers to such questions as:
- Does the growth rate influence a country’s birth rate?
- If the growth rate increases, by how much might a country’s birth rate be expected to fall?
-Are other variables important in determining the birth rate?
3.4.1 Simple Regression: This is a type of regression in which only two variables are studied: one dependent and one independent. By contrast, multiple regression studies more than two variables, and this is often the case in our day-to-day life because, more often than not, a particular phenomenon is affected by a multiplicity of factors. For example, demand for a particular product is determined not only by its market price but also by the prices of substitutes, the income of buyers, population, and taste and fashion, among others. In regression analysis there are two types of variables and these are:
Dependent Variable: This is the variable whose value is influenced or is to be predicted. For example, elementary economic theory states that "the higher the price, the lower the quantity demanded". Here it is clear that quantity demanded is influenced by the price of the commodity. Therefore, quantity demanded of the commodity is described as the "dependent variable".
Independent Variable: This is the variable which influences the value of the dependent variable or which is used for prediction. In our example involving the law of demand, the price of the commodity determines or influences the quantity demanded. Therefore, price is described as the "independent variable".
In regression analysis, the dependent variable is also known as regressand, regressed or explained variable.
On the other hand, the independent variable is also known as the regressor, predictor or explanatory
variable.
3.4.2 Line of Regression: This is the line which gives the best estimate of one variable for any given value
of the other variable. Therefore, the line of regression of y on x is the line which gives the best estimates
for the value of y for any specified value of x.
The term best fit is interpreted in accordance with the principle of least squares, which involves minimising the sum of the squares of the residuals or errors of the estimates, i.e., the deviations between the given observed values of the variable and their corresponding estimated values as given by the line of best fit. It should be noted that several lines can be drawn through the same set of pairs of observations plotted in the form of a scatter diagram, but the line of best fit gives the best estimate of the dependent variable for any given level of the independent variable.
The term "a" represents the intercept of the model: it is the value of Y when X is equal to zero. It is represented by the formula:
a = [∑Y ∑X² − ∑X ∑XY] / [n∑X² − (∑X)²]
Furthermore, the term "b" represents the slope of the regression model: it is the amount of change in the dependent variable Y resulting from a unit change in the value of the independent variable X. It is represented by the formula:
b = [n∑XY − ∑X ∑Y] / [n∑X² − (∑X)²]
Having obtained "b", the intercept "a" can also be obtained from:
a = Ȳ − bX̄
where Ȳ is the sample mean of the dependent variable Y and X̄ is the sample mean of the independent variable X. Equivalently, in deviation form, with x = X − X̄ and y = Y − Ȳ, the slope is:
b = ∑xy / ∑x²
Solution:
X    Y    X²    XY    x    y    xy    x²    y²
25 10 625 250 -6 -1 6 36 1
24 8 576 192 -7 -3 21 49 9
43 12 1849 516 12 1 12 144 1
23 11 529 253 -8 0 0 64 0
30 13 900 390 -1 2 -2 1 4
34 13 1156 442 3 2 6 9 4
Using the deviation form:
b = ∑xy / ∑x²
a = Ȳ − bX̄
Alternatively, having obtained the estimated value of the slope parameter b, the intercept term a can be obtained using the formula:
a = (∑Y − b∑X) / n
In the above problem
Example 2: Suppose data collected from a sample of 10 Foodco restaurants located near NOUN campuses are as shown in the table below:
Foodco restaurant Student population (1000s) Sales per quarter (N1000s)
1 2 58
2 6 105
3 8 88
4 8 118
5 12 117
6 16 137
7 20 157
8 20 169
9 22 149
10 26 202
x̄ = ∑X/n = 140/10 = 14
ȳ = ∑Y/n = 1300/10 = 130
𝑥̅ = mean of the independent variable
𝑦̅= mean of the dependent variable
Foodco restaurant    x    y    x − x̄    y − ȳ    (x − x̄)(y − ȳ)    (x − x̄)²
1 2 58 -12 -72 864 144
2 6 105 -8 -25 200 64
3 8 88 -6 -42 252 36
4 8 118 -6 -12 72 36
5 12 117 -2 -13 26 4
6 16 137 2 -7 14 4
7 20 157 6 27 162 36
8 20 169 6 39 234 36
9 22 149 8 19 152 64
10 26 202 12 72 864 144
Total ∑ 𝑋 =140 ∑ 𝑌 =1300 2840 568
b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 2840/568 = 5
b₀ = ȳ − b₁x̄
b₀ = 130 − 5(14)
b₀ = 130 − 70
b₀ = 60
The estimated regression equation is:
ŷ = 60 + 5x
The slope of the estimated regression equation (𝑏1 = 5) is positive, implying that as student
population increases, sales increase. In fact, we can conclude (based on sales measured in N1000s and
student population in 1000s) that an increase in the student population of 1000 is associated with an increase
of N5000 in expected sales; that is, quarterly sales are expected to increase by N5 per student.
If we believe the least squares estimated regression equation adequately describes the relationship between x
and y, it would seem reasonable to use the estimated regression equation
to predict the value of y for a given value of x. For example, if we wanted to predict quarterly sales for a
restaurant to be located near a campus with 16,000 students, we would compute:
ŷ = 60 + 5(16)
ŷ = 60 + 80
ŷ = 140
Hence we would predict quarterly sales of N140,000 for this restaurant.
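The Foodco calculation can be reproduced with a short Python sketch of the least squares formulas b₁ = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and b₀ = ȳ − b₁x̄; variable names here are illustrative:

```python
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]           # x: student population, 1000s
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]  # y: quarterly sales, N1000s

n = len(students)
x_bar = sum(students) / n
y_bar = sum(sales) / n
# Slope: b1 = sum of cross-deviations / sum of squared x-deviations
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(students, sales)) / \
    sum((x - x_bar) ** 2 for x in students)
b0 = y_bar - b1 * x_bar  # intercept
print(b1, b0)            # 5.0 60.0
print(b0 + b1 * 16)      # 140.0, the prediction for 16,000 students
```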
3.6 SUMMARY
With reference to the explanations and illustrations demonstrated above, learners can now apply least
square method to study the nature of the relation between two variables.
4.1 INTRODUCTION
In general, we are interested in point forecasts, which predict a single number in each forecast period.
Needless to say, the information provided by the forecasts can be very useful. For instance, it can help
not only in policy and decision making, but also in validating the model from which the forecasts are made.
In the forecasting process, we are usually given the values of the independent variables and our goal is to
predict the value of the dependent variable. This raises the question of whether the values of the
independent variables are known with certainty. If so, then we are making an unconditional forecast.
Otherwise, we are making a conditional forecast. To see the difference between the two, consider the
following settings:
1. Suppose that we use the following linear regression model to describe the relationship between the demand for beef and household income: Y = a + bX + u, as used in the previous unit. Once we obtain estimators of the regression parameters a and b, we can use the resulting regression line to make forecasts. Specifically, the forecasted household demand for beef will be given by Ŷ = â + b̂X. Sometimes the value of X, the household income, may depend on some unpredictable factors that are not known with certainty at the time of forecast. Thus, our forecast for Y will be conditional on our forecast for X.
2. Suppose that we use the linear regression model to describe the relationship between the monthly auto sales St and the production capacity Ct:
St = a + bCt−2 + εt
In other words, sales in the t-th month depend linearly on the production capacity of the (t−2)nd month. If we are currently at the T-th month and we want to forecast the auto sales in the (T+1)st month, then we need the production capacity of the (T−1)st month, which can be determined with certainty. Thus, in this case, the forecast will be unconditional.
Forecasting methods can be classified as qualitative or quantitative. Qualitative methods generally involve the
use of expert judgment to develop forecasts. Such methods are appropriate when historical data on the variable
being forecast are either not applicable or unavailable. Quantitative forecasting methods can be used when (1)
past information about the variable being forecast is available, (2) the information can be quantified, and (3) it
is reasonable to assume that the pattern of the past will continue into the future. In such cases, a forecast can
be developed using a time series method or a causal method.
If the historical data are restricted to past values of the variable to be forecast, the forecasting procedure is
called a time series method and the historical data are referred to as a time series. The objective of time series
analysis is to discover a pattern in the historical data or time series and then extrapolate the pattern into the
future; the forecast is based solely on past values of the variable and/or on past forecast errors.
Causal forecasting methods are based on the assumption that the variable we are forecasting has a cause-effect
relationship with one or more other variables. In the discussion of regression analysis, we showed how one
or more independent variables could be used to predict the value of a single dependent variable. Looking at
regression analysis as a forecasting tool, we can view the time series value that we want to forecast as the
dependent variable. Hence, if we can identify a good set of related independent, or explanatory, variables, we
may be able to develop an estimated regression equation for predicting or forecasting the time series. For
instance, the sales for many products are influenced by advertising expenditures, so regression analysis may be
used to develop an equation showing how sales and advertising are related. Once the advertising budget for the
next period is determined, we could substitute this value into the equation to develop a prediction or forecast
of the sales volume for that period. Note that if a time series method were used to develop the forecast,
advertising expenditures would not be considered; that is, a time series method would base the forecast solely
on past sales.
By treating time as the independent variable and the time series as a dependent variable, regression analysis
can also be used as a time series method. To help differentiate the application of regression analysis in these
two cases, we use the terms cross-sectional regression and time series regression. Thus, time series regression
refers to the use of regression analysis when the independent variable is time.
- the t and F statistics, which test the explanatory power of the independent variables;
It should be noted that these are quite different measures of model reliability, and they need not subsume each other. For instance, a regression model can have significant t-statistics and a high R² value, and yet still forecast poorly. This could happen if there is a structural change in the system during the forecasting period, which occurs after the estimation period (i.e., the period during which we collect data and estimate the coefficients of the regression model). On the other hand, one may be able to obtain good forecasts from regression models that have insignificant regression coefficients or relatively low R² values. This could happen when there is very little variation in the dependent variable, so that although it is not well explained by the regression model, it can still be forecast easily.
Forecast or predict the demand for beef by households whose incomes are N35,000, N40,000 and N45,000.
Solution: Following the regression formulas for obtaining the intercept term a and the slope estimate b
discussed in the last unit one can easily obtain these values by substituting the values of income (X) into the
estimated regression equation.
Recall that the intercept term is given as:
a = [∑Y ∑X² − ∑X ∑XY] / [n∑X² − (∑X)²]
However, one may instead obtain the estimate of the slope parameter b and use the formula a = Ȳ − bX̄ to estimate the intercept term a. For the above problem, the estimated regression equation is given as:
ŷ = 2.6 + 0.27X
For values of the explanatory variable of 35, 40 and 45 (remember that '000 was factored out of the calculations):
(i) For household income X = N35,000: ŷ = 2.6 + 0.27(35) = 12.05
(ii) For household income X = N40,000: ŷ = 2.6 + 0.27(40) = 13.40
(iii) For household income X = N45,000: ŷ = 2.6 + 0.27(45) = 14.75
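Assuming the estimated equation ŷ = 2.6 + 0.27X from the solution above, the three forecasts can be generated with a few lines of Python (a sketch; the loop and names are ours):

```python
# Estimated demand equation from the unit: y_hat = 2.6 + 0.27x, income x in N'000
for income in (35, 40, 45):
    y_hat = 2.6 + 0.27 * income
    print(income, round(y_hat, 2))  # 35 12.05, then 40 13.4, then 45 14.75
```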
4.6 SUMMARY
In summary, the unit has explored the rudiments of least squares regression, the conditions under which it can be used, and the procedure for estimating the relevant parameters and interpreting them.
Spiegel, M. R. and Stephens, L. J. (2008). Statistics (4th ed.). New York: McGraw-Hill Press.
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. and Enlarged ed.). Mumbai, India: Himalayan Publishing House.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London: Macmillan.
1.1 INTRODUCTION
If a random sample of N cases is drawn from a population with a mean µ and standard deviation σ, then the sampling distribution of the mean has:
1. a mean equal to the population mean: µx̄ = µ;
2. a standard deviation (standard error) equal to the population standard deviation divided by the square root of the sample size N: σx̄ = σ/√N;
3. a shape that approaches the normal distribution as N increases.
Rule 1. µȳ = µ.
Rule 2. σȳ = σ/√n. This rule is approximately correct as long as no more than 5% of the population is included in the sample.
Rule 3. When the population distribution is normal, the sampling distribution of ȳ is also normal for any sample size n.
Rule 4. When n is sufficiently large, the sampling distribution of ȳ is well approximated by a normal curve, even when the population distribution is not itself normal.
Suppose that a sample of size n is selected from a population that has mean µ and standard deviation σ. Let X1, X2, ..., Xn be the n observations, which are independent and identically distributed (i.i.d.). Define the sample mean and the total of these n observations as follows:
X̄ = (1/n) ∑ Xi
T = ∑ Xi
The central limit theorem states that the sample mean follows approximately the normal distribution with mean µ and standard deviation σ/√n, where µ and σ are the mean and standard deviation of the population from which the sample was selected. The sample size n has to be large (usually n > 30) if the population from which the sample is taken is non-normal. If the population follows the normal distribution, then the sample size n can be either small or large. The sample mean of a large random sample of random variables with mean µ and finite variance σ² has approximately the normal distribution with mean µ and variance σ²/n. This result helps to justify the use of the normal distribution as a model for many random variables that can be thought of as being made up of many independent parts. Another version of the central limit theorem applies to independent random variables that are not identically distributed.
To summarize:
Z = (X̄ − µ) / (σ/√n) is approximately standard normal for large n.
Let us suppose that Y1, Y2, ..., Yn are independent and identically distributed with mean µ and finite variance σ². Two results about the mean and variance of the sample mean follow:
Theorem 1: E(Ȳ) = µ
Theorem 2: Var(Ȳ) = σ²/n
In probability theory, the central limit theorem states that, under certain conditions, the mean of a sufficiently large number of iterates of independent random variables will be approximately normally distributed.
The CLT can tell us about the distribution of large sums of random variables even if the distribution
of the random variables is almost unknown. With this result we are able to approximate how likely
it is that the arithmetic mean deviates from its expected value.
Using the CLT we can verify hypotheses by making statistical decisions, because we are able to approximate the distribution of a centred and normalized sum of independent and identically distributed (i.i.d.) random variables by the standard normal distribution.
Example: Let X be a random variable with µ = 10 and σ = 4. A sample of size 100 is taken from this population. Find the probability that the sample mean of these 100 observations is less than 9. We write:
P(X̄ < 9) = P(Z < (9 − 10)/(4/√100)) = P(Z < −2.5) = 0.0062
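The probability can be checked with Python's standard library; `statistics.NormalDist` provides the standard normal cdf, and the variable names here are ours:

```python
from statistics import NormalDist

mu, sigma, n = 10, 4, 100
se = sigma / n ** 0.5            # standard error of the mean: 4/√100 = 0.4
z = (9 - mu) / se                # standardised value: -2.5
p = NormalDist().cdf(z)          # P(sample mean < 9)
print(round(z, 2), round(p, 4))  # -2.5 0.0062
```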
Let {X1, ..., Xn} be a random sample of size n — that is, a sequence of independent and identically distributed random variables drawn from a distribution with expected value µ and finite variance σ². Suppose we are interested in the sample average
Sn = (X1 + X2 + ... + Xn)/n
of these random variables. By the law of large numbers, the sample average converges in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. More precisely, it states that as n gets larger, the distribution of the difference between the sample average Sn and its limit µ, when multiplied by the factor √n (that is, √n(Sn − µ)), approximates the normal distribution with mean 0 and variance σ². For large enough n, the distribution of Sn is close to the normal distribution with mean µ and variance σ²/n. The usefulness of the theorem is that the distribution of √n(Sn − µ) approaches normality regardless of the shape of the distribution of the individual Xi's. Formally, the theorem can be stated as follows:
1. Lindeberg–Lévy CLT
Suppose {X1, X2, ...} is a sequence of independent and identically distributed (i.i.d.) random variables with E[Xi] = µ and Var[Xi] = σ² < ∞. Then as n approaches infinity, the random variables √n(Sn − µ) converge in distribution to a normal N(0, σ²):
√n(Sn − µ) → N(0, σ²) in distribution.
In the case σ > 0, convergence in distribution means that the cumulative distribution functions (cdf) of √n(Sn − µ) converge pointwise to the cdf of the N(0, σ²) distribution: for every real number z,
lim(n→∞) Pr[√n(Sn − µ) ≤ z] = Φ(z/σ),
where Φ(x) is the standard normal cdf evaluated at x. Note that the convergence is uniform in z, in the sense that
lim(n→∞) sup(z∈R) | Pr[√n(Sn − µ) ≤ z] − Φ(z/σ) | = 0,
where sup denotes the least upper bound (or supremum) of the set.
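The Lindeberg–Lévy statement can be illustrated by simulation. The sketch below uses Uniform(0, 1) draws purely for illustration (our own choice): it standardizes many sample means and checks that the resulting values behave like draws from N(0, 1).

```python
import math
import random
import statistics

random.seed(0)

n, reps = 100, 2000
mu = 0.5                          # mean of Uniform(0, 1)
sigma = math.sqrt(1.0 / 12.0)     # sd of Uniform(0, 1)

# For each replication, draw n uniforms and form sqrt(n) * (Sn - mu) / sigma,
# where Sn is the sample mean; by the CLT this is approximately N(0, 1).
zs = []
for _ in range(reps):
    sn = statistics.fmean(random.random() for _ in range(n))
    zs.append(math.sqrt(n) * (sn - mu) / sigma)

print(round(statistics.fmean(zs), 2), round(statistics.stdev(zs), 2))
```

The printed mean and standard deviation should be close to 0 and 1 respectively; the approximation improves as n grows.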
2. Lyapunov CLT
The theorem is named after the Russian mathematician Aleksandr Lyapunov. In this variant of the central limit theorem the random variables Xi have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |Xi| have moments of some order (2 + δ), and that the rate of growth of these moments is limited by the Lyapunov condition given below.
Suppose {X1, X2, ...} is a sequence of independent random variables, each with finite expected value µi and variance σi². Define
sn² = ∑(i=1..n) σi².
If for some δ > 0 the Lyapunov condition
lim(n→∞) (1/sn^(2+δ)) ∑(i=1..n) E[|Xi − µi|^(2+δ)] = 0
is satisfied, then the sum of (Xi − µi)/sn converges in distribution to a standard normal random variable as n goes to infinity:
(1/sn) ∑(i=1..n) (Xi − µi) → N(0, 1) in distribution.
In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition. The converse implication, however, does not hold.
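A minimal simulation sketch of the Lyapunov setting, using independent but not identically distributed Uniform(−i, i) variables (our own illustrative choice, for which the standardized sum is indeed approximately standard normal):

```python
import math
import random
import statistics

random.seed(1)

# Independent but NOT identically distributed: Xi ~ Uniform(-i, i),
# so mu_i = 0 and Var(Xi) = (2i)^2 / 12 = i^2 / 3.
n, reps = 50, 2000
sn2 = sum(i * i / 3.0 for i in range(1, n + 1))   # s_n^2 = sum of variances
sn = math.sqrt(sn2)

zs = []
for _ in range(reps):
    total = sum(random.uniform(-i, i) for i in range(1, n + 1))
    zs.append(total / sn)   # standardized sum; approx N(0, 1)

print(round(statistics.fmean(zs), 2), round(statistics.stdev(zs), 2))
```

Again the printed mean and standard deviation should land near 0 and 1, even though the summands have very different spreads.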
3. Lindeberg CLT
In the same setting and with the same notation as above, the Lyapunov condition can be replaced with the following weaker one, called Lindeberg's condition: for every ε > 0,
lim(n→∞) (1/sn²) ∑(i=1..n) E[ (Xi − µi)² · 1{|Xi − µi| > ε·sn} ] = 0,
where 1{…} is the indicator function. Then the distribution of the standardized sums (1/sn) ∑(i=1..n) (Xi − µi) converges to the standard normal distribution N(0, 1).
4. Multidimensional CLT
Proofs that use characteristic functions can be extended to the case where each individual X1, ..., Xn is an independent and identically distributed random vector in R^k, with mean vector µ = E(Xi) and covariance matrix Σ (amongst the individual components of the vector). Now, if we take the summations of these vectors as being done component-wise, then the multidimensional central limit theorem states that, when scaled, these sums converge to a multivariate normal distribution.
Let Xi = (Xi(1), ..., Xi(k))ᵀ be the k-vector. The bold in Xi means that it is a random vector, not a random (univariate) variable. Then the sum of the random vectors is X1 + ... + Xn, and therefore
(1/√n) ∑(i=1..n) (Xi − µ) → Nk(0, Σ) in distribution.
1.6 SUMMARY
This unit has highlighted the role of independence (and its relaxations) in the central limit theorem, to give learners an understanding of inferential statistics and hypothesis testing. Attempting the assignments below will go a long way in consolidating the knowledge already acquired above.
Bradley, R. (2007), Introduction to Strong Mixing Conditions (1st ed.), Heber City, UT: Kendrick Press, ISBN 0-9740427-9-X.
Bradley, R. (2005), "Basic Properties of Strong Mixing Conditions. A Survey and Some Open Questions", Probability Surveys 2: 107–144, arXiv:math/0511078v1, doi:10.1214/154957805100000104.
Dinov, I., Christou, N. and Sanchez, J. (2008), "Central Limit Theorem: New SOCR Applet and Demonstration Activity", Journal of Statistics Education (ASA) 16 (2), http://www.amstat.org/publications/jse/v16n2/dinov.html.
Website: www.wikipedia.com
(ii) When the population distribution is normal, the sampling distribution of Ȳ is also normal for any sample size n.
2. (i) Lindeberg–Lévy CLT
(ii) Lyapunov CLT
(iii) Lindeberg CLT
(iv) Multidimensional CLT
2.1 INTRODUCTION
A useful generalization of a sequence of independent identically distributed random variables is a mixing
random process in discrete time; "mixing" means, roughly, that random variables temporally far apart
from one another are nearly independent. Several kinds of mixing are used in ergodic theory and probability
theory. Strong mixing (also called α-mixing) is defined by α(n) → 0, where α(n) is the so-called strong mixing coefficient.
A simplified formulation of the central limit theorem under strong mixing is provided in the following:
▪ CLT under weak dependence
2.4 Theorems
1. Theorem. Suppose that X1, X2, … is stationary and α-mixing with αn = O(n⁻⁵) and that E(Xn) = 0 and E(Xn¹²) < ∞. Denote Sn = X1 + … + Xn; then the limit
σ² = lim(n→∞) E(Sn²)/n
exists, and if σ ≠ 0 then Sn/(σ√n) converges in distribution to N(0, 1).
In fact, σ² = E(X1²) + 2 ∑(k=1..∞) E(X1 X1+k).
The assumption σ ≠ 0 cannot be omitted, since asymptotic normality fails for Xn = Yn − Yn−1, where Yn is another stationary sequence.
There is a stronger version of the theorem: the assumption E(Xn¹²) < ∞ is replaced with E(|Xn|^(2+δ)) < ∞, and the assumption αn = O(n⁻⁵) is replaced with ∑(n) αn^(δ/(2+δ)) < ∞.
Existence of such δ > 0 ensures the conclusion.
But this limit is just the characteristic function of a standard normal distribution N(0, 1), and the
central limit theorem follows from the Lévy continuity theorem, which confirms that the convergence of
characteristic functions implies convergence in distribution.
2.6 SUMMARY
The central limit theorem applies in particular to sums of independent and identically distributed discrete
random variables. A sum of discrete random variables is still a discrete random variable, so that we are
confronted with a sequence of discrete random variables whose cumulative
probability distribution function converges towards a cumulative probability distribution function
corresponding to a continuous variable (namely that of the normal distribution). This means that if we build
a histogram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centres of the upper faces of the rectangles forming the histogram converges toward a Gaussian curve as n approaches infinity; this relation is known as the de Moivre–Laplace theorem. The binomial distribution provides such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.
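The de Moivre–Laplace approximation described above can be checked numerically. The sketch below compares exact Binomial(100, 0.5) probabilities with the corresponding normal density values; the parameter choices are ours, for illustration:

```python
import math

# Exact Binomial(n, p) pmf versus the normal-density approximation
# suggested by the de Moivre-Laplace theorem.
n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)   # 50 and 25

def binom_pmf(k: int) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x: float) -> float:
    """Normal density with the binomial's mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for k in (40, 50, 60):
    print(k, round(binom_pmf(k), 4), round(normal_pdf(k), 4))
```

At each k the two numbers agree to about three decimal places, which is the de Moivre–Laplace theorem in action.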
Artstein, S., Ball, K., Barthe, F. and Naor, A. (2004), "Solution of Shannon's Problem on the Monotonicity of Entropy", Journal of the American Mathematical Society 17 (4): 975–982, doi:10.1090/S0894-0347-04-00459-X.
Rosenthal, J. S. (2000), A First Look at Rigorous Probability Theory, World Scientific, ISBN 981-02-
Website: www.wikipedia.com
3.1 Introduction
3.2 Learning Outcomes
3.3 The Law of Large Numbers
3.4 Solved Examples
3.5 Summary
3.6 References/Further Reading/Web Resources
3.7 Possible Answers to Self-Assessment Exercise(s) within the content
3.1 INTRODUCTION
The law of large numbers states that as the number of observations increases, the average of the samples tends to approach the mean of the population. More precisely, the sample mean of a random sample converges in probability to the mean µ of the individual random variables, provided the variance exists. This means that the sample mean will be close to µ if the size of the random sample is sufficiently large.
Suppose that X1, ..., Xn form a random sample from a distribution for which the mean is µ and for which the variance is finite. Let X̄n denote the sample mean. Then X̄n converges in probability to µ as n → ∞.
Proof: Let the variance of each Xi be σ². It then follows from the Chebyshev inequality that for every number ε > 0,
Pr(|X̄n − µ| ≥ ε) ≤ σ²/(nε²).
Hence, Pr(|X̄n − µ| ≥ ε) → 0 as n → ∞, which is the required convergence in probability.
Dividing both parts by φ1(n) and taking the limit will produce a1, the coefficient of the highest-order term in the expansion, which represents the rate at which f(n) changes in its leading term.
Informally, one can say: "f(n) grows approximately as a1 φ1(n)". Taking the difference between
f(n) and its approximation and then dividing by the next term in the expansion, we arrive at a more refined
statement about f(n):
Here one can say that the difference between the function and its approximation grows approximately as
a2φ2(n). The idea is that dividing the function by appropriate normalizing functions, and looking at
the limiting behavior of the result, can tell us much about the limiting behavior of the original function itself.
Informally, something along these lines is happening when the sum, Sn, of independent identically
distributed random variables, X1, ...,Xn, is studied in classical probability theory. If each Xi has finite
mean µ, then by the law of large numbers, Sn/n → µ. If in addition each Xi has finite variance σ², then by the central limit theorem,
(Sn − nµ)/√n → ξ,
where ξ is distributed as N(0, σ²). This provides the values of the first two constants in the informal expansion
Sn ≈ µn + ξ√n.
In the case where the Xi's do not have finite mean or variance, convergence of the shifted and rescaled sum can also occur with different centering and scaling factors:
(Sn − an)/bn → Ξ,
or informally
Sn ≈ an + Ξbn.
Distributions Ξ which can arise in this way are called stable. Clearly, the normal distribution is stable, but there are also other stable distributions, such as the Cauchy distribution, for which the mean or variance are not defined. The scaling factor bn may be proportional to n^c, for any c ≥ 1/2; it may also be multiplied by a slowly varying function of n.
Since n must be chosen so that Pr(|X̄n − µ| < 1) ≥ 0.99, it follows from the Chebyshev inequality (with σ² = 4) that n must be chosen so that 4/n ≤ 0.01, that is, n ≥ 400.
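The Chebyshev calculation above can be reproduced directly (assuming, as in the text, a population variance of 4 and an allowed deviation of 1 unit):

```python
import math

# Chebyshev bound: Pr(|Xbar - mu| >= eps) <= sigma^2 / (n * eps^2).
# Requiring the bound to be at most 0.01 guarantees
# Pr(|Xbar - mu| < eps) >= 0.99.
sigma2 = 4.0   # assumed population variance
eps = 1.0      # allowed deviation from mu
alpha = 0.01   # allowed probability of a deviation of eps or more

n = math.ceil(sigma2 / (alpha * eps ** 2))
print(n)   # 400
```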
According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the accuracy increasing as more dice are rolled.
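The dice illustration can be simulated; with more rolls, the sample mean settles near 3.5 (the seed and roll counts below are arbitrary choices):

```python
import random
import statistics

random.seed(42)

# Roll increasing numbers of fair six-sided dice; the sample mean of the
# rolls should approach the expected value (1 + 2 + ... + 6) / 6 = 3.5.
for rolls in (100, 10_000, 200_000):
    mean = statistics.fmean(random.randint(1, 6) for _ in range(rolls))
    print(rolls, round(mean, 3))
```

The deviation from 3.5 shrinks roughly like 1/√n, consistent with the Chebyshev bound discussed above.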
3.5 SUMMARY
In summary, the explanations and illustrations presented above should have provided a clear understanding of this unit. If learners encounter difficulties in understanding any area, they should consult the further reading list at the end of this unit. Such reference is expected to enhance learners' ability to solve problems involving the law of large numbers.
Rosenthal, J. S. (2000), A First Look at Rigorous Probability Theory, World Scientific, ISBN 981-02-
Johnson, O. T. (2004), Information Theory and the Central Limit Theorem, Imperial College Press.
Uchaikin, V. V. and Zolotarev, V. M. (1999), Chance and Stability: Stable Distributions and Their Applications, VSP, ISBN 90-6764-301-7 (pp. 61–62).
4.1 Introduction
4.2 Learning Outcomes
4.3 Extension to the CLT and beyond the Classical Framework
4.4 Lacunary trigonometric series
4.5 Linear functions of orthogonal matrices
4.6 Summary
4.7 References/Further Reading/Web Resources
4.8 Possible Answers to Self-Assessment Exercise(s) within the content
4.1 INTRODUCTION
Convex body
Theorem. There exists a sequence εn ↓ 0 for which the following holds. Let n ≥ 1, and let random variables X1, …, Xn have a log-concave joint density f such that f(x1, …, xn) = f(|x1|, …, |xn|) for all x1, …, xn, and E(Xk²) = 1 for all k = 1, …, n. Then the distribution of
(X1 + … + Xn)/√n
is εn-close to N(0, 1) in the total variation distance.
These two εn-close distributions have densities (in fact, log-concave densities); thus, the total variation distance between them is the integral of the absolute value of the difference between the densities. Convergence in total variation is stronger than weak convergence.
An important example of a log-concave density is a function constant inside a given convex body and
vanishing outside; it corresponds to the uniform distribution on the convex body, which explains the term
"central limit theorem for convex bodies".
Another example: f(x1, …, xn) = const · exp(−(|x1|^α + … + |xn|^α)^β), where α > 1 and αβ > 1. If β = 1, then f(x1, …, xn) factorizes into const · exp(−|x1|^α) … exp(−|xn|^α), which means independence of X1, …, Xn. In general, however, they are dependent.
The condition f(x1, …, xn) = f(|x1|, …, |xn|) ensures that X1, …, Xn are of zero mean and uncorrelated; still, they need not be independent, nor even pairwise independent. By the way, pairwise independence cannot replace independence in the classical central limit theorem.
Below is a Berry–Esseen type result.
Theorem. Let X1, …, Xn satisfy the assumptions of the previous theorem; then
| Pr( a ≤ (X1 + … + Xn)/√n ≤ b ) − (1/√(2π)) ∫(a..b) e^(−t²/2) dt | ≤ C/n
for all a < b; here C is a universal (absolute) constant. Moreover, for every c1, …, cn ∈ R such that c1² + … + cn² = 1,
| Pr( a ≤ c1X1 + … + cnXn ≤ b ) − (1/√(2π)) ∫(a..b) e^(−t²/2) dt | ≤ C(c1⁴ + … + cn⁴).
The distribution of c1X1 + … + cnXn need not be approximately normal (in fact, it can be uniform). However, the distribution of c1X1 + … + cnXn is close to N(0, 1) (in the total variation distance) for most vectors (c1, …, cn) according to the uniform distribution on the sphere c1² + … + cn² = 1.
4.3 MAIN CONTENT: EXTENSION TO THE CLT AND BEYOND THE CLASSICAL FRAMEWORK
4.4 Lacunary trigonometric series
Theorem (Salem–Zygmund). Let U be a random variable distributed uniformly on (0, 2π), and Xk = rk cos(nkU + ak), where
- nk satisfy the lacunarity condition: there exists q > 1 such that nk+1 ≥ qnk for all k,
- rk are such that r1² + r2² + … = ∞ and rk²/(r1² + … + rk²) → 0,
- 0 ≤ ak < 2π.
Then
(X1 + … + Xk)/√(r1² + … + rk²)
converges in distribution to N(0, 1/2).
Gaussian polytopes
Theorem. Let A1, ..., An be independent random points on the plane R², each having the two-dimensional standard normal distribution. Let Kn be the convex hull of these points, and Xn the area of Kn. Then
(Xn − E(Xn))/√(Var(Xn))
converges in distribution to N(0, 1) as n tends to infinity. The same holds in all dimensions (2, 3, ...), and it also holds for the number of vertices (of the Gaussian polytope), the number of edges, and in fact, faces of all dimensions.
4.5 Linear Functions of Orthogonal Matrices
A linear function of a matrix M is a linear combination of its elements (with given coefficients): M ↦ tr(AM), where A is the matrix of the coefficients (this is the trace inner product). A random orthogonal matrix is said to be distributed uniformly if its distribution is the normalized Haar measure on the orthogonal group O(n, R).
Theorem. Let M be a random orthogonal n × n matrix distributed uniformly, and A a fixed n × n matrix such that tr(AA*) = n, and let X = tr(AM). Then the distribution of X is close to N(0, 1) in the total variation metric up to 2√3/(n − 1).
Implications
Theorem. Let random variables X1, X2, … ∈ L2(Ω) be such that Xn → 0 weakly in L2(Ω) and Xn² → 1 weakly in L1(Ω). Then there exist integers n1 < n2 < … such that
(Xn1 + … + Xnk)/√k
converges in distribution to N(0, 1) as k tends to infinity.
Q-analogues
A generalized q-analogue of the classical central limit theorem has been described by Umarov, Tsallis and Steinberg, in which the independence constraint for the i.i.d. variables is relaxed to an extent defined by the q parameter, with independence recovered as q → 1. In analogy to the classical central limit theorem, such random variables with fixed mean and variance tend towards the q-Gaussian distribution, which maximizes the Tsallis entropy under these constraints. Umarov, Tsallis, Gell-Mann and Steinberg have also defined q-analogues of all symmetric alpha-stable distributions, and have formulated a number of conjectures regarding their relevance to an even more general central limit theorem.
4.6 SUMMARY
In this unit, we have treated extensions of the classical central limit theorem beyond the classical framework, including products of positive random variables, the theorem for convex bodies, lacunary trigonometric series, linear functions of orthogonal matrices and their implications, among others. Students are expected to become proficient in the use of the theorem so as to be able to apply it to solving practical day-to-day problems.
Zygmund, A. (1959), Trigonometric Series, Volume II, Cambridge (2003 combined volumes I and II: ISBN 0-521-89053-5) (Sect. XVI.5, Theorem 5-5).
Meckes, E. (2008), "Linear functions on the classical matrix groups", Transactions of the American Mathematical Society, doi:10.1090/S0002-9947-08-04444-9.
CONTENTS
Unit 1: Index Number
Unit 2: Statistical Data
Unit 3: Sample and Sampling Techniques
UNIT STRUCTURE
1.1 Introduction
1.2 Learning Outcomes
1.3 Index Number
1.3.1 Uses of index numbers
1.3.2 Types of index number
1.4 Methods of constructing index numbers
1.4.1 Solved Examples
1.5 Problems encountered in the construction of index numbers
1.6 Summary
1.7 References / Further Reading/Web Resources
1.8 Possible Answers to Self-Assessment Exercise(s) within the content
1.1 INTRODUCTION
Index numbers are indicators which reflect the relative changes in the level of certain phenomenon
in any given period (or over a specified period of time) called the current period with respect to its value in
some fixed period called the base period selected for comparison. The phenomenon or variable under
consideration may be price, volume of trade, factory production, agricultural production, imports or exports,
shares, sales, national income, wage structure, bank deposits, foreign exchange reserves, cost of living of
people of a particular community etc.
Index numbers may be classified in terms of the variables they measure. They are generally classified into three categories:
1. Price Index Number: The most common index numbers are the price index numbers which
study changes in price level of commodities over a period of time. They are of two types:
(a) Wholesale price index number – They depict changes in the general price level of the
economy.
(b) Retail Price Index Number – They reflect changes in the retail prices of different commodities. They are normally constructed for different classes of consumers.
2. Quantity Index Number – They reflect changes in the volume of goods produced or consumed.
3. Value Index Number – They study changes in the total value (price × quantity), e.g. an index of the value of total sales.
1.4 Methods of Constructing Index Numbers
(1) Simple (unweighted) Aggregate Method – The aggregate of prices (of all the selected commodities) in the current year is expressed as a percentage of the aggregate of prices in the base year:
P01 = (∑p1 / ∑p0) × 100,
where P01 is the price index number of the current year with respect to the base year, p1 denotes current-year prices and p0 denotes base-year prices.
Exercise: From the following data, calculate the index number by the simple aggregate method.
Commodity:   A    B    C    D
Price 2012:  85   82   95   73
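The base-year prices for this exercise did not survive in the source, so the sketch below uses hypothetical base prices (clearly labelled as such) purely to demonstrate the simple aggregate formula:

```python
# Simple (unweighted) aggregate index: P01 = (sum of current prices /
# sum of base prices) * 100.
# The 2012 prices come from the exercise; the base-year prices are
# made-up placeholders, since they are missing from the source.
base_prices = {"A": 80, "B": 80, "C": 90, "D": 70}     # hypothetical base year
current_prices = {"A": 85, "B": 82, "C": 95, "D": 73}  # 2012 (from exercise)

p01 = 100.0 * sum(current_prices.values()) / sum(base_prices.values())
print(round(p01, 2))   # 104.69
```

With real base-year figures, only the base_prices dictionary would change.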
(2) Weighted Aggregate Method – In this method, appropriate weights are assigned to the various commodities to reflect their relative importance in the group. The weights can be production figures, consumption figures or distribution figures.
Limitations of the Simple Aggregate Method
(i) Laspeyre's Price Index or base-year method – Taking the base-year quantities as weights (i.e. w = q0 in the equation above), Laspeyre's price index is given as:
P01(La) = (∑p1q0 / ∑p0q0) × 100
(ii) Paasche's Price Index or current-year method – Taking the current-year quantities as weights (i.e. w = q1), Paasche's price index is given as:
P01(Pa) = (∑p1q1 / ∑p0q1) × 100
(iv) Fisher's Price Index – Irving Fisher advocated the geometric cross (mean) of Laspeyre's and Paasche's price index numbers, given as:
P01(F) = √[ (∑p1q0 / ∑p0q0) × (∑p1q1 / ∑p0q1) ] × 100
Fisher's index is termed an ideal index since it satisfies the time reversal and factor reversal tests for the consistency of index numbers.
Many business and economic series reported over time, such as company sales, industry sales, and inventories,
are measured in dollar amounts. These time series often show an increasing growth pattern over time, which is
generally interpreted as indicating an increase in the physical volume associated with the activities. For
example, a total dollar amount of inventory up by 10% might be interpreted to mean that the physical inventory
is 10% larger. Such interpretations can be misleading if a time series is measured in terms of Naira or dollars,
and the total Naira or dollar amount is a combination of both price and quantity changes. Hence, in periods
when price changes are significant, the changes in the Naira or dollar amounts may not be indicative of quantity
changes unless we are able to adjust the time series to eliminate the price change effect.
For example, let us assume that from 2006 to 2020, the total amount of spending in the construction industry
increased approximately by 75%. That figure suggests excellent growth in construction activity. However,
construction prices were increasing just as fast as—or sometimes even faster than—the 75% rate. In fact, while
total construction spending was increasing, construction activity was staying relatively constant or, as in the
case of new housing starts, decreasing. To interpret construction activity correctly for the 2006–2020 period,
we must adjust the total spending series by a price index to remove the price increase effect. Whenever we
remove the price increase effect from a time series, we say we are deflating the series.
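Deflating a series as described above is a one-line computation: real value = (nominal value ÷ price index) × 100. The figures below are illustrative only, not source data:

```python
# Deflating a nominal series with a price index (base period index = 100).
# All numbers here are made-up examples, not figures from the text.
nominal_spending = [100.0, 120.0, 150.0, 175.0]   # e.g. construction spending
price_index = [100.0, 115.0, 140.0, 170.0]        # price index per period

real_spending = [100.0 * s / i for s, i in zip(nominal_spending, price_index)]
print([round(r, 1) for r in real_spending])   # [100.0, 104.3, 107.1, 102.9]
```

Note that although nominal spending rises 75% over the four periods, the deflated series shows only a modest change in real activity, which is exactly the construction-industry point made above.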
In relation to personal income and wages, we often hear discussions about issues such as "real wages" or the "purchasing power" of wages. These concepts are based on the notion of deflating a wage index. For example, suppose salaries increase from N30,000 per month to N60,000 per month from 2022 to 2023. Should workers be pleased with this growth in monthly salaries? The answer depends on what happened to the purchasing power of their salaries. If we can compare the purchasing power of the N30,000 monthly salary in 2022 with the purchasing power of the N60,000 monthly salary in 2023, we will be better able to judge the relative improvement in wages.
Example 1: From the following data, calculate the price index for 2012 with 2010 as the base year.
Commodity | Price per unit 2010 (N) | Price per unit 2012 (N) | Consumption value 2010 (N)
Beans     | 90                      | 110                     | 900
Solution: In the above problem, we are given the base-year (2010) consumption values (p0q0), and the current-year quantities (q1) are not given; the appropriate formula for the index number here is Laspeyre's price index.
∑p0q0 = 3620 and ∑p1q0 = 4350.
Therefore, the Laspeyre's price index for 2012 with respect to (w.r.t.) base 2010 is given by:
P01(La) = (∑p1q0 / ∑p0q0) × 100 = (4350 / 3620) × 100 = 120.17
Example 2: From the following data, calculate the price index for 2012 with 2007 as the base year by
(i) Laspeyre's method, (ii) Paasche's method, (iii) Fisher's method and
(iv) the Dorbish–Bowley price index method.
Solution:
Commodity | p0 (2007 Price) | q0 (2007 Qty) | p1 (2012 Price) | q1 (2012 Qty) | p0q0 | p0q1 | p1q0 | p1q1
Gaari     | 20 | 8  | 40 | 6  | 160 | 120 | 320 | 240
Rice      | 50 | 10 | 60 | 5  | 500 | 250 | 600 | 300
Fish      | 40 | 15 | 50 | 15 | 600 | 600 | 750 | 750
Palm-oil  | 20 | 20 | 20 | 25 | 400 | 500 | 400 | 500
Total     |    |    |    |    | ∑p0q0 = 1660 | ∑p0q1 = 1470 | ∑p1q0 = 2070 | ∑p1q1 = 1790
(i) Laspeyre's: P01(La) = (∑p1q0 / ∑p0q0) × 100 = (2070/1660) × 100 = 1.24699 × 100 = 124.70
(ii) Paasche's: P01(Pa) = (∑p1q1 / ∑p0q1) × 100 = (1790/1470) × 100 = 1.21769 × 100 = 121.77
(iii) Fisher's: P01(F) = √[(2070/1660) × (1790/1470)] × 100 = 1.23225 × 100 ≈ 123.23
(iv) Dorbish–Bowley: P01(DB) = ½[(2070/1660) + (1790/1470)] × 100 = 1.23234 × 100 ≈ 123.23
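The four index numbers of Example 2 can be verified with a short script using the table's data:

```python
import math

# Data from Example 2: (p0, q0) for 2007 and (p1, q1) for 2012.
data = {
    "Gaari":    (20, 8, 40, 6),
    "Rice":     (50, 10, 60, 5),
    "Fish":     (40, 15, 50, 15),
    "Palm-oil": (20, 20, 20, 25),
}

p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())   # 1660
p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())   # 1470
p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())   # 2070
p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())   # 1790

laspeyre = 100.0 * p1q0 / p0q0                 # base-year weights
paasche = 100.0 * p1q1 / p0q1                  # current-year weights
fisher = math.sqrt(laspeyre * paasche)         # geometric mean of the two
dorbish_bowley = (laspeyre + paasche) / 2.0    # arithmetic mean of the two

print(round(laspeyre, 2), round(paasche, 2),
      round(fisher, 2), round(dorbish_bowley, 2))
# 124.7 121.77 123.23 123.23
```

Changing the data dictionary is enough to reuse the script for any other price/quantity table.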
1.5 Problems Encountered in the Construction of Index Numbers
1. The purpose of the index number – This must be carefully defined, as there is no general-purpose index number.
2. Selection of base period – The base period is the previous period with which comparison of some
later period is made. The index of the base period is taken to be 100. The following points should be
borne in mind while selecting a base period:
(a) The base period should be a normal period, devoid of natural disasters, economic booms, depressions, political instability, famine, etc.
(b) The base period should not be too distant from the given period. This is because circumstances such as tastes, customs, habits and fashion keep changing.
(c) One must determine whether to use fixed-base or chain-base method
3. Selection of commodities – Commodities to be selected must be relevant to the study; must not be
too large nor too small and must be of the same quality in different periods.
4. Data for the index number- Data to be used must be reliable.
6. Choice of formula – There are different types of formulas and the choice is mostly dependent
on available data.
7. System of weighting – Different weights should be assigned to different commodities according
to their relative importance in the group.
1.6 SUMMARY
Price and quantity indexes are important measures of changes in price and quantity levels within the business
and economic environment. Price relatives are simply the ratio of the current unit price of an item to a base-
period unit price multiplied by 100, with a value of 100 indicating no difference in the current and base-period
prices. Aggregate price indexes are created as a composite measure of the overall change in prices for a given
group of items or products. Usually the items in an aggregate price index are weighted by their quantity of
usage. A weighted aggregate price index can also be computed by weighting the price relatives by the usage
quantities for the items in the index.
In this unit, we have been able to introduce students to the concept of index numbers, its uses and methods
of calculation. Students are now expected to be proficient in the calculation, use and interpretation of index
numbers. This is useful in the study and interpretation of inflation, cost of living, trends of economic
variables among others.
2. (i) Price Index Number: The most common index numbers are the price index numbers
which study changes in price level of commodities over a period of time. They are of two types:
(a) Wholesale Price Index Number – They depict changes in the general price level of the economy.
(b) Retail Price Index Number – They reflect changes in the retail prices of different commodities. They are normally constructed for different classes of consumers.
(ii). Quantity Index Number – They reflect changes in the volume of goods produced or consumed
(iii). Value Index Number
3. (i) Selection of commodities – Commodities to be selected must be relevant to the study; must not be
too large nor too small and must be of the same quality in different periods.
(ii) Data for the index number- Data to be used must be reliable.
(iv). Choice of formula – There are different types of formulas and the choice is mostly dependent
on available data.
(v). System of weighting – Different weights should be assigned to different commodities according
to their relative importance in the group.
UNIT STRUCTURE
2.1 Introduction
2.2 Learning Outcomes
2.3. Statistical Data
2.3.1 Types of data
2.4 Classification based on form of the data
2.5 Sources of Data
2.6 Summary
2.7 References/Further Reading/Web Resources
2.8 Possible Answers to Self-Assessment Exercise(s) within the content
2.1 INTRODUCTION
Statistics is a branch of mathematics that deals with the collection, organization, and analysis of numerical
data and with such problems as experiment design and decision making. Simple forms of statistics have
been used since the beginning of civilization, when pictorial representations or other symbols were used to
record numbers of people, animals, and inanimate objects on skins, slabs, or sticks of wood and the walls
of caves. Before 3000 BC the Babylonians used small clay tablets to record tabulations of agricultural
yields and of commodities bartered or sold. The Egyptians analyzed the population and material wealth
of their country before beginning to build the pyramids in the 31st century BC. The biblical books of
Numbers and first Chronicles are primarily statistical works, the former containing two separate censuses
of the Israelites and the latter describing the material wealth of various Jewish tribes. Similar numerical
records existed in China before 2000 BC. The ancient Greeks held censuses to be used as bases for taxation as early as 594 BC. The Roman Empire was the first government to gather extensive data about the
population, area, and wealth of the territories that it controlled.
At present, however, statistics is a reliable means of describing accurately the values of economic, political,
social, psychological, biological, and physical data and serves as a tool to correlate and analyze such data.
The work of the statistician is no longer confined to gathering and tabulating data, but is chiefly a process
of interpreting the information. The development of the theory of probability increased the scope of
statistical applications. Much data can be approximated accurately by certain probability distributions, and
the results of probability distributions can be used in analyzing statistical data. Probability can be used
to test the reliability of statistical inferences and to indicate the kind and amount of data required for a
particular problem.
1. Based on sources – Data can be classified based on the sources from which they are obtained. In this regard, we have:
(a) Primary data – These are data collected directly from the field of enquiries by the user(s)
or researcher(s) themselves.
Advantages
- They are always relevant to the subject under study because they are collected primarily for that purpose.
- They are more accurate and reliable.
- They provide an opportunity for the researcher to interact with the study population.
- Information on other relevant issues can be obtained.
Disadvantages
- They are always costly to collect.
- There may be inadequate cooperation from the study population.
- Collection consumes a lot of time and energy.
(b) Secondary Data: These are data which have been collected by someone else or some
organization either in published or unpublished forms.
Advantages
- It is easier to get
- It is less expensive
Disadvantages
- They may not completely meet the needs of the research at hand, because they were not collected primarily for that particular purpose.
- There is always a problem of missing periods.
December 2013 will form a panel data.
Note that Social and Economic data of national importance are collected routinely as by-product of
governmental activities e.g. information on trade, wages, prices, education, health, crime, aids and grants
etc.
Data can be obtained from existing sources or from surveys and experimental studies designed to collect
new data.
2.5.1 Existing Sources
In some cases, data needed for a particular application already exist. Companies maintain a variety of
databases about their employees, customers, and business operations. Data on
employee salaries, ages, and years of experience can usually be obtained from internal personnel
records. Other internal records contain data on sales, advertising expenditures, distribution costs,
inventory levels, and production quantities. Most companies also maintain detailed data about their
customers.
Organizations that specialize in collecting and maintaining data make available substantial
amounts of business and economic data. Companies access these external data
sources through leasing arrangements or by purchase. Dun & Bradstreet, Bloomberg, and
Dow Jones & Company are three firms that provide extensive business database services
to clients. ACNielsen and Information Resources, Inc., built successful businesses collecting and
processing data that they sell to advertisers and product manufacturers. Data are also available from a
variety of industry associations and special interest organizations.
The Travel Industry Association maintains travel-related information such as the number of tourists and
travel expenditures by states. Such data would be of interest to firms and individuals in the travel
industry. The Post-graduate Admission Council maintains data on test scores, student characteristics,
and graduate management education programs. Most of the data from these types of sources are
available to qualified users at a modest cost. The Internet continues to grow as an important source of
data and statistical information. Almost all companies maintain websites that provide general
information about the company as well as data on sales, number of employees, number of products,
product prices, and product specifications. In addition, a number of companies now specialize in making
information available over the Internet. As a result, one can obtain access to stock quotes, meal prices
at restaurants, salary data, and an almost infinite variety of information. Government agencies are
another important source of existing data. For instance, the National Bureau of Statistics maintains
considerable data on employment rates. Most government agencies that collect and process data also
make the results available through a website.
Non-experimental, or observational, statistical studies make no attempt to control the variables of interest. A
survey is perhaps the most common type of observational study. For instance, in a personal interview survey,
research questions are first identified. Then a questionnaire is designed and administered to a sample of
individuals. Some restaurants use observational studies to obtain data about customer opinions on the quality
of food, quality of service, atmosphere, and so on. A customer opinion questionnaire is used to gather
information about the restaurant. The customers are asked to fill out the questionnaire by providing ratings,
including overall experience, greeting by hostess, manager (table visit), overall service, and so on. The response
categories of excellent, good, average, fair, and poor provide categorical data that enable the restaurant
management to maintain high standards for the restaurant's food and service.
(ii) Economic and Financial Review (by CBN)
(iv) Population of Nigeria (by NPC)
Data collection requires one of the following scales of measurement: nominal, ordinal, interval, or ratio. The
scale of measurement determines the amount of information contained in the data and indicates the most
appropriate data summarization and statistical analyses.
When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of
measurement is considered a nominal scale. In cases where the scale of measurement is nominal, a numeric
code as well as nonnumeric labels may be used. For example, to facilitate data collection and to prepare the
data for entry into a computer database, we might use a numeric code by letting 1 denote primary education, 2
denote secondary education, and 3 denote tertiary education. In this case the numeric values 1, 2, and 3 identify
the level of education. The scale of measurement is nominal even though the data appear as numeric values.
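As a minimal sketch of this coding idea (the variable names and survey values below are illustrative, not from the course text), nominal labels can be stored under numeric codes without the numbers carrying any quantitative meaning:

```python
# Hypothetical numeric coding for a nominal education variable; the codes
# 1, 2, 3 are labels only, so arithmetic on them would be meaningless.
education_codes = {1: "primary", 2: "secondary", 3: "tertiary"}

responses = [3, 1, 2, 2, 3]  # coded answers from an illustrative survey
labels = [education_codes[r] for r in responses]
print(labels)
```

Decoding the numbers back to their labels is the only legitimate operation here; averaging the codes, for instance, would produce a number with no interpretation.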
The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal
data and the order or rank of the data is meaningful. For example, Abuja Automotive sends customers a
questionnaire designed to obtain data on the quality of its automotive repair service. Each customer provides a
repair service rating of excellent, good, or poor. Because the data obtained are the labels—excellent, good, or
poor—the data have the properties of nominal data. In addition, the data can be ranked, or ordered, with respect
to the service quality. Data recorded as excellent indicate the best service, followed by good and then poor.
Thus, the scale of measurement is ordinal. Ordinal data can also be provided using a numeric code, for example,
your class rank in school.
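A short sketch can make the ordinal property concrete (the rank codes below are an assumption for illustration): the ratings can be ordered from best to worst, but the numeric gap between ranks says nothing about how much better one rating is than another.

```python
# Hypothetical rank codes for ordinal service ratings: order is meaningful,
# but the "distance" between ranks is not.
rank = {"poor": 1, "good": 2, "excellent": 3}

ratings = ["good", "excellent", "poor", "good"]
best_to_worst = sorted(ratings, key=rank.get, reverse=True)
print(best_to_worst)
```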
The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data
and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always
numeric. Scholastic Aptitude Test (SAT) scores are an example of interval-scaled data. For example, three
students with SAT math scores of 620, 550, and 470 can be ranked or ordered in terms of best performance to
poorest performance. In addition, the differences between the scores are meaningful. For instance, student 1
scored 620 − 550 = 70 points more than student 2, while student 2 scored 550 − 470 = 80 points more than
student 3.
The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and
the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale of
measurement. This scale requires that a zero value be included to indicate that nothing exists for the variable at
the zero point. For example, consider the cost of an automobile. A zero value for the cost would indicate that
the automobile has no cost and is free. In addition, if we compare the cost of N300,000 for one automobile to
the cost of N150,000 for a second automobile, the ratio property shows that the first automobile is
N300,000/N150,000 = 2 times, or twice, the cost of the second automobile.
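The two worked examples above can be checked with a short sketch (the student names are illustrative labels): differences are the meaningful operation on interval data, while ratios become meaningful once the scale has a true zero.

```python
# Interval scale: differences between SAT scores are meaningful.
scores = {"student1": 620, "student2": 550, "student3": 470}
diff_1_2 = scores["student1"] - scores["student2"]
diff_2_3 = scores["student2"] - scores["student3"]

# Ratio scale: a true zero makes ratios meaningful (automobile costs in naira).
cost_first, cost_second = 300_000, 150_000
ratio = cost_first / cost_second
print(diff_1_2, diff_2_3, ratio)  # 70 80 2.0
```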
2.6 SUMMARY
This unit has acquainted you with the transformation of the processed data into statistics and steps in the
statistical cycle. The transformation involves analysis and interpretation of data to identify important
characteristics of a population and provide insights into the topic being investigated.
3.1 INTRODUCTION
Researchers collect data in order to test hypotheses and to provide empirical support for explanations and
predictions. Once investigators have constructed their measuring instrument in order to collect sufficient
data pertinent to the research problem, the subsequent explanations and predictions must be capable of being
generalized to be of scientific value. Generalizations are important not only for testing hypotheses but also
for descriptive purposes. Typically, generalizations are not based on data collected from all the
observations, all the respondents, or all the events that are defined by the research problem, as this is often
not possible or, where possible, too expensive to undertake. Instead, researchers use a relatively small number
of cases (a sample) as the basis for making inferences about all the cases (a population).
The entire set of relevant units of analysis, or data, is called the population. When the data serving as the
basis for generalizations consist of a subset of the population, that subset is called a sample. A
particular value of the population, such as the mean income or the level of formal education, is called a
parameter; its counterpart in the sample is termed the statistic. The major objective of sampling theory is
to provide accurate estimates of unknown values of the parameters from sample statistics that can be easily
calculated. To accurately estimate unknown parameters from known statistics, researchers have to effectively
deal with three major problems:
3.3.1 Population
Methodologically, a population is the "aggregate of all cases that conform to some designated set of
specifications". For example, a population may be composed of all the residents in a specific neighbourhood,
legislators, houses, records, and so on. The specific nature of the population depends on the research
problem. If you are investigating consumer behaviour in a particular city, you might define the population
as all the households in that city. Therefore, one of the first problems facing a researcher who wishes to
estimate a population value from a sample value is how to determine the population involved.
or a nation.
A well-designed sample ensures that if a study were to be repeated on a number of different samples drawn
from a given population, the findings from each sample would not differ from the population parameters by
more than a specified amount. A probability sample design makes it possible for researchers to estimate
the extent to which the findings based on one sample are likely to differ from what they would have
found by studying the entire population. When a researcher is using a probability sample design, it is
possible for him or her to estimate the population's parameters on the basis of the sample statistics
calculated.
3.4.1 Non-probability Sample Designs
Three major designs utilizing non-probability samples have been employed by social scientists:
convenience samples, purposive samples, and quota samples.
(a) Convenience sampling: Researchers obtain a convenience sample by selecting whatever sampling
units are conveniently available. Thus a university professor may select students in a class, or a
researcher may take the first 200 people encountered on the street who are willing to be interviewed.
The researcher has no way of estimating the representativeness of a convenience sample, and therefore
cannot estimate the population's parameters.
(b) Purposive sampling: With purposive samples (occasionally referred to as judgment samples),
researchers select sampling units subjectively in an attempt to obtain a sample that appears to be
representative of the population. In other words, the chance that a particular sampling unit will be selected
for the sample depends on the subjective judgment of the researcher. At times, the main reason for
selecting a unit in purposive sampling is its possession of pre-determined characteristic(s), which may
differ from those of the main population. For example, in a study of demand preference for cigarette
brands in a city, a researcher will need to select smokers purposively.
(c) Quota sampling: The chief aim of a quota sample is to select a sample that is as similar as possible to
the sampling population. For example, if it is known that the population has equal numbers of males and
females, the researcher selects equal numbers of males and females in the sample. In quota sampling,
interviewers are assigned quota groups characterized by specific variables such as gender, age, place of
residence, and ethnicity.
Four common designs of probability samples are simple random sampling, systematic sampling, stratified
sampling, and cluster sampling.
(a) Simple random sampling – is the basic probability sampling design, and it is incorporated into all
the more elaborate probability sampling designs. Simple random sampling is a procedure that gives each
of the total sampling units of the population an equal and known nonzero probability of being selected.
For example, when you toss a perfect coin, the probability that you will get a head or a tail is equal
and known (50 percent), and each subsequent outcome is independent of the previous outcomes.
Random selection procedures ensure that every sampling unit of the population has an equal and known
probability of being included in the sample; this probability is n/N, where n stands for the size of the
sample and N for the size of the population. For example, if we are interested in selecting 60 households
from a population of 300 households using simple random sampling, the probability of a particular household
being selected is 60/300 = 1/5.
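The household example above can be sketched in a few lines (the household IDs and the seed are illustrative assumptions): every one of the 300 households has the same probability n/N = 1/5 of entering the sample.

```python
import random

# Sketch of simple random sampling: 60 households from 300 (IDs illustrative).
population = list(range(1, 301))    # household IDs 1..300
n, N = 60, len(population)

rng = random.Random(42)             # fixed seed so the sketch is reproducible
sample = rng.sample(population, n)  # each household has probability n/N

print(n / N)             # 0.2, i.e. 1/5
print(len(set(sample)))  # 60 distinct households, sampled without replacement
```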
(b) Systematic sampling: It consists of selecting every kth sampling unit of the population after the first
sampling unit is selected at random from the total of sampling units. Thus if you wish to select a
sample of 100 persons from a total population of 10,000, you would take every hundredth individual
(k = N/n = 10,000/100 = 100). Suppose that the fourteenth person were selected; the sample would
then consist of individuals numbered 14, 114, 214, 314, 414, and so on. Systematic sampling is more
convenient than simple random sampling. Systematic samples are also more amenable for use with very
large populations or when large samples are to be selected.
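A minimal sketch of the procedure, using the figures from the example above (the random seed is an illustrative assumption): pick a random start in 1..k, then take every kth unit after it.

```python
import random

# Sketch of systematic sampling: n = 100 from N = 10,000, interval k = N // n.
N, n = 10_000, 100
k = N // n                                    # 100
start = random.Random(0).randrange(1, k + 1)  # random start in 1..k
sample = list(range(start, N + 1, k))         # start, start+k, start+2k, ...

print(len(sample))  # 100 units, evenly spaced k apart
```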
(c) Stratified sampling: Researchers use this method primarily to ensure that different groups of the
population are adequately represented in the sample, so as to increase the level of accuracy when
estimating parameters. Furthermore, all other things being equal, stratified sampling considerably
reduces the cost of execution. The underlying idea in stratified sampling is to use available information
on the population "to divide it into groups such that the elements within each group are more alike
than are the elements in the population as a whole". That is, you create a set of homogeneous samples
based on the variables you are interested in studying. If a series of homogeneous groups can be sampled
in such a way that, when the samples are combined, they constitute a sample of a more heterogeneous
population, you will increase the accuracy of your parameter estimates.
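The idea of sampling homogeneous groups and combining them can be sketched with proportional allocation (the urban/rural strata, sizes, and seed below are illustrative assumptions, not from the course text):

```python
import random

# Sketch of proportional stratified sampling over two hypothetical strata.
strata = {
    "urban": list(range(1, 201)),    # 200 urban households
    "rural": list(range(201, 301)),  # 100 rural households
}
total = sum(len(units) for units in strata.values())
sample_size = 30
rng = random.Random(1)

sample = []
for units in strata.values():
    n_stratum = round(sample_size * len(units) / total)  # proportional share
    sample.extend(rng.sample(units, n_stratum))

print(len(sample))  # 30 in total: 20 urban + 10 rural
```

Each stratum is sampled separately, so both groups are guaranteed representation in proportion to their size.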
(d) Cluster sampling: It is frequently used in large-scale studies because it is the least expensive sample
design. Cluster sampling involves first selecting large groupings, called clusters, and then selecting
the sampling units from the clusters. The clusters are selected by a simple random sample or
a stratified sample. Depending on the research problem, researchers can include all the sampling
units in these clusters in the sample or make a selection within the clusters using simple or stratified
sampling procedures.
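The two stages described above can be sketched as follows (the district clusters, their sizes, and the seed are illustrative assumptions): first clusters are drawn at random, then units are drawn within the chosen clusters.

```python
import random

# Sketch of two-stage cluster sampling with hypothetical district clusters.
clusters = {f"district_{i}": [f"d{i}_hh{j}" for j in range(50)] for i in range(10)}
rng = random.Random(7)

chosen = rng.sample(sorted(clusters), 3)  # stage 1: pick 3 clusters at random
sample = [hh for c in chosen              # stage 2: 10 households per cluster
          for hh in rng.sample(clusters[c], 10)]

print(len(sample))  # 30 households drawn from 3 clusters
```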
There are several misconceptions about the necessary size of a sample. One is that the sample size must be a
certain proportion (often set at 5 percent) of the population; another is that the sample should total about
2,000; still another is that any increase in the sample size will increase the precision of the sample results.
These are faulty notions because they do not derive from sampling theory. To estimate the adequate
size of a sample properly, researchers need to determine what level of accuracy is expected of their
estimates; that is, how large a standard error is acceptable.
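This reasoning can be turned into a short calculation. Since the standard error of a sample mean is sigma/sqrt(n), the sample size needed for a given acceptable error is n = (sigma/SE)^2; the two numbers below are illustrative assumptions, not from the course text.

```python
import math

# Sketch: required sample size from an acceptable standard error of the mean.
sigma = 12.0         # assumed population standard deviation
acceptable_se = 1.5  # largest standard error the researcher will accept

n_required = math.ceil((sigma / acceptable_se) ** 2)  # n = (sigma / SE)^2
print(n_required)  # 64
```

Note that the required n depends on the variability of the population and the desired precision, not on what proportion of the population is sampled.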
Advantages of mail questionnaires
➢ The cost is low compared to other methods.
➢ Biasing error is reduced because respondents are not influenced by interviewer
characteristics or techniques.
➢ Questionnaires provide a high degree of anonymity for respondents. This is especially important
when sensitive issues are involved.
➢ Respondents have time to think about their answers and /or consult other sources.
➢ Questionnaires provide wide access to geographically dispersed samples at low cost
Disadvantages of mail questionnaires
➢ Questionnaires require simple, easily understood questions and instructions
➢ Mail questionnaires do not offer researchers the opportunity to probe for additional information
or to clarify answers.
➢ Researchers cannot control who fills out the questionnaire.
➢ Response rates are low
Researchers use various strategies to overcome the difficulty of securing an acceptable response rate to
mail questionnaires and to increase the response rate. The format of a mail questionnaire also involves
several considerations: typography, colour, and the length and type of cover letter.
Personal interview
The personal interview is a face-to-face, interpersonal role situation in which an interviewer asks
respondents questions designed to elicit answers pertinent to the research hypotheses. The questions, their
wording, and their sequence define the structure of the interview.
• High response rate: The personal interview results in a higher response rate than the mail
questionnaire.
• Fuller information: An interviewer can collect supplementary information about
respondents. This may include background information, personal characteristics, and details of
their environment that can aid the researcher in interpreting the results.
Telephone interview
It is also called a telephone survey, and it can be characterised as a semi-personal method of collecting
information. In comparison with the personal interview, the telephone is convenient, and it produces very
significant cost savings.
SAE 2
3.6 SUMMARY
You should now be able to discern that a sample is a subset of a population selected to meet specific
objectives. You should also be familiar with the sampling techniques for selecting a sample and with the
guiding principle that a sample must, as far as possible, have the essential characteristics of the target
population.
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. and Enlarged ed.). Mumbai, India: Himalayan
Publishing House.
Esan, E. O. and Okafor, R. O. (1995). Basic Statistical Methods (1st ed.). Lagos, Nigeria: JAS
Publishers, pages 72-89.
SAE 2
1. (i) Mail questionnaires
(ii) Personal interviews
(iii) Telephone interviews