
Sta 111 Note

The document outlines the course STA 111 (Descriptive Statistics), covering topics such as statistical data types, data presentation methods, measures of location and dispersion, and the Nigerian statistical system. It distinguishes between primary and secondary data, discusses data collection methods, and emphasizes the importance of statistics in decision-making and planning. Additionally, it includes examples of data presentation techniques like frequency distribution, cumulative frequency, and various types of charts.

Uploaded by

keriansfaustina

STA 111 (DESCRIPTIVE STATISTICS) 2 UNITS

COURSE OUTLINE

1 STATISTICAL DATA: (TYPES, SOURCES, METHOD OF COLLECTION)

2 PRESENTATION OF DATA: (TABLES, CHARTS AND GRAPHS)

3 ERRORS AND APPROXIMATIONS

4 FREQUENCY AND CUMULATIVE DISTRIBUTIONS

5 MEASURES OF LOCATION, PARTITION AND DISPERSION

6 SKEWNESS AND KURTOSIS

7 RATES, RATIOS AND INDEX NUMBERS

8 NIGERIAN STATISTICAL SYSTEM

9 OFFICIAL STATISTICS AND THEIR PROBLEMS

TEXT BOOKS- FUNDAMENTALS OF STATISTICS (S.C. GUPTA)

AN INTRODUCTORY STATISTICS (N.S. UDOH, K.M. NSINI)

INTRODUCTION

MEANING OF STATISTICS

Statistics is a branch of knowledge that deals with the scientific methods used in the proper
collection, presentation, analysis and dissemination of information that can be expressed in
numerical form. In the plural sense, statistics means a set of numerical data.

In the singular sense, statistics means the science of statistical methods, embodying the theory and
techniques used for collecting, analyzing and drawing inferences from numerical data.

Statistics is divided into 2 parts namely:

1. Descriptive (applied) statistics


2. Inferential (mathematical) statistics

1. Descriptive (applied) statistics- This deals with methods of summarizing and illustrating a mass of data
in order to provide more precise information that can be readily assimilated
or used in decision making.

Business statistics is descriptive (applied) statistics, as it deals with the analysis, measurement
and presentation of business facts, past or present, on which decisions about
various business problems are usually based.

USES OF STATISTICS

1 Decision making
2 Planning
3 Routine administration
4 Forecasting
5 Quality control

STATISTICAL DATA: These are the measurable, enumerable or observable characteristics
of variables. Statistical data are information that can be expressed in numerical form.

DATA are arranged information or facts. Information collected through censuses and surveys,
in a routine manner, or from other sources is called raw data.

TYPES OF STATISTICAL DATA

There are 2 types of statistical data namely: Primary and Secondary data.

1. Primary Data- These are data collected by a particular person or organization for
their own use from the primary source.

2. Secondary Data- These are data collected by some other person or organization for
their own use, which the investigator then obtains for his or her own use.

Sources of Data

Thus, we have primary and secondary sources.

1. Primary sources- When an investigator plans and collects the needed information at first
hand (in person), the data so collected are called primary data and the source a primary
source.

Advantages

a. You have confidence in the data because of self supervision.


b. You can obtain a highly comprehensive result
c. They are more reliable.

Disadvantages

a. Often time consuming.


b. It is expensive to obtain

2. Secondary sources- When the bulk of the statistical data is obtained from existing sources,
published or unpublished (for example, government agencies, newspapers and
research institutes), we refer to such data as secondary data and to the sources as secondary sources.

Advantages

a. They are convenient to obtain


b. They are readily available
c. They are valuable past information.

Disadvantages

a. Official protocols- The data might not be easily obtainable.
b. The data can be abused or misused.
c. Precision may be difficult to assess because the method of collection might be unknown.
d. The data may not be in a form suitable for the problem at hand, that is, less can be expected
of the data.

Preference of primary data to secondary data

1. Reliability- Primary data are usually more reliable and meaningful than secondary
data.
2. Greater details- The primary data are usually more comprehensive and also show
data in greater details.

3. The primary source frequently includes definition of terms and units used in data
collection.

Methods of data collection.

1. The use of documentary source- Data on events relating to the past are often obtained
from documentary source such as:

A Sources of information about the survey population, for example, records of Nigerian
Medical Association.

B Sources of information about individual units of the population, for example, employment
records.

C Personal documents- Those documents which come directly from the informants, for
example, Diaries, letters, essays etc.

2 The observation method- The systematic method of data collection in which the events are
watched as they occur naturally and notes are taken. Here the required information is
obtained directly rather than through reports of others. It can fairly be referred to as the
classic method of scientific investigation.

3 The interview method- An interview is a conversation between two persons, the interviewer and
the respondent, with the aim of eliciting certain information from the respondent. The interviewer
is the person asking the questions, while the respondent is the person answering them. An
ideal interviewer is expected to have the following personal qualities:

1 Honesty

2 Interest

3 Accuracy

4 Adaptability

5 Intelligence

6 Tactfulness

Advantages of the interview method

1 High response rate.
2 Instant response.
3 Errors can be corrected.
4 It is useful in developing countries where a large part of the population can neither
read nor write.

Disadvantages

1 High cost
2 Interviewer bias or errors
3 No room for consultation

4 The mail questionnaire- This method of data collection involves sending a set of
well-worded questions on the subject under investigation by post to carefully selected
respondents.

Qualities of a good questionnaire

The questions asked in a questionnaire should be:

A Clear

B Brief (as small as possible)

C Corroborative

D Non- offending

E courteous in tone

F unambiguous

G Questions of a sensitive and personal nature should be avoided.

H Questions should be arranged in a natural logical sequence.

Things that will help one to design a good questionnaire

1. You have to decide the information required.


2. Define the target respondent.
3. Choose the method of reaching your target respondent (for example, by post, by telephone,
electronically or in person).
4. Decide on the question content
5. Develop the question wording
6. Put the question into a meaningful order and format
7. Check the length of the question (not to be too lengthy)
8. Develop the final survey form.

PRESENTATION OF DATA
When the data are presented in tables, charts or graphs in order to bring out their salient features,
it is called the presentation of data.
The raw data can be arranged in the following ways:
1. Serial order
Roll No. Marks
1 41
2 25
3 5
4 33
5 12
6 21
7 19
8 39
9 19
10 21

2. Ascending order- 5, 12, 19, 19, 21, 21, 25, 33, 39, 41

3. Descending order- 41, 39, 33, 25, 21, 21, 19, 19, 12, 5

Note- when raw data are put in ascending or descending order of magnitude, the result is called an array.

4 Data in tabular form- A simple table has three columns; thus


a. Variable or characteristic column
b. Tally marks
c. Frequency (number of time an observation occurs)

A tabular arrangement of data by classes together with the corresponding class frequencies is
called frequency distribution or frequency table.

Example 1- The following are the ages of some students in 100 level statistics: 16, 20, 15, 25, 30, 15,
25, 20, 20, 25, 15, 25, 20, 30, 20. Form the frequency table for the given data.

Age     Tally    Frequency
15      |||      3
16      |        1
20      |||||    5
25      ||||     4
30      ||       2
Total            15

The above table is called an ungrouped frequency table.
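
The tallying in Example 1 can be checked with a short script. This is only a sketch in Python; the variable names `ages`, `freq` and `table` are illustrative and not part of the course text.

```python
from collections import Counter

# Ages from Example 1.
ages = [16, 20, 15, 25, 30, 15, 25, 20, 20, 25, 15, 25, 20, 30, 20]

# Count how often each distinct age occurs, then list the ages in ascending order.
freq = Counter(ages)
table = sorted(freq.items())

for age, f in table:
    # Print the age, one tally stroke per observation, and the frequency.
    print(f"{age:>3}  {'|' * f:<6} {f}")
print("Total", sum(freq.values()))
```

The total of the frequency column should always equal the number of observations, which is a quick check that no value was missed.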

The ungrouped frequency distribution is not helpful when the data set is very large or when the
values making up the data have a large spread. For such data, the table would be too long or, at
times, impractical to construct.

GROUPED FREQUENCY DISTRIBUTION

Instead of using single values, the grouped frequency distribution method groups values
together. The frequency of each group is then matched with it to form the desired table.

Note the following-

Class interval- this is the range of values covered by a class, for example, 1-5, 6-10, 11-15, 16-20,
etc.

Class size or class width- denoted by "i", this is the number of values covered by a class. In the example above, the class
size is 5.

Class limits- these are the extreme values of a class, that is, the lower and upper limits. In the class 1-5 above,
the lower class limit is 1 and the upper class limit is 5.

Class boundary- to get this, subtract ½ (0.5) from the lower limit of each class and add ½ (0.5) to
the upper limit of each class; for example, the class 6-10 becomes 5.5-10.5 and 11-15 becomes 10.5-15.5.

Example 2

Form the grouped frequency distribution for the data given below using class size [i] = 3

9 15 12 14 17 20 18 19 23

13 10 15 16 18 21 17 22 24

11 14 17 18 19 20 19 22 24

11 12 18 18 20 21 18 17 25

13 14 12 14 17 16 15 14 23

NOTE- $\text{number of class intervals} = \dfrac{\text{range}}{\text{class size}}$

Range = maximum value – minimum value

Here the range is 25 – 9 = 16, so 16/3 ≈ 5.3, which is rounded up to 6 classes.

Solution:

Class    Tally               Frequency
9-11     ||||                4
12-14    ||||| |||||         10
15-17    ||||| |||||         10
18-20    ||||| ||||| ||      12
21-23    ||||| |             6
24-26    |||                 3
Total                        45
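
The grouping in Example 2 can be reproduced with a few lines of Python. This is a sketch under the assumption that the first class starts at the smallest observation (9) and that each class covers `width` consecutive whole numbers; the helper `class_of` is a hypothetical name.

```python
# The 45 observations from Example 2, row by row.
data = [
    9, 15, 12, 14, 17, 20, 18, 19, 23,
    13, 10, 15, 16, 18, 21, 17, 22, 24,
    11, 14, 17, 18, 19, 20, 19, 22, 24,
    11, 12, 18, 18, 20, 21, 18, 17, 25,
    13, 14, 12, 14, 17, 16, 15, 14, 23,
]
width = 3            # class size i
lowest = min(data)   # assumed lower limit of the first class


def class_of(x):
    """Return the (lower limit, upper limit) of the class containing x."""
    lower = lowest + ((x - lowest) // width) * width
    return (lower, lower + width - 1)


grouped = {}
for x in data:
    key = class_of(x)
    grouped[key] = grouped.get(key, 0) + 1

table = sorted(grouped.items())
for (lo, hi), f in table:
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(grouped.values()))
```

Counting by program is a useful cross-check on a hand tally, since a single misplaced stroke changes two class frequencies at once.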

Cumulative frequency and Relative frequency distribution.

Cumulative frequency is the running total of successive frequencies.

Relative frequency is each class frequency divided by the total frequency.

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Total frequency}}$$

Relative frequencies should always sum to 1.00 or 100%.

Example 3.

Form a cumulative and relative frequency table for the data below:

X   6   7   8   9   10

F   5   9   6   8   2

Solution :

X       F     Cumulative Freq.    Relative Freq.
6       5     5                   5/30 = 0.167
7       9     14 = (5+9)          9/30 = 0.3
8       6     20 = (14+6)         6/30 = 0.2
9       8     28 = (20+8)         8/30 = 0.267
10      2     30 = (28+2)         2/30 = 0.067
Total   30
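
As a check on Example 3, the two columns can be computed directly. The sketch below uses Python's standard `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

x = [6, 7, 8, 9, 10]
f = [5, 9, 6, 8, 2]

total = sum(f)
cumulative = list(accumulate(f))        # running totals of the frequencies
relative = [fi / total for fi in f]     # each frequency divided by the total

for xi, fi, ci, ri in zip(x, f, cumulative, relative):
    print(f"{xi:>2} {fi:>2} {ci:>3} {ri:.3f}")
```

The last cumulative frequency equals the total frequency, and the unrounded relative frequencies add up to exactly 1; rounded values may differ from 1 slightly in the last digit.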

Assignment 1- The heights (in cm) of 40 persons are: 110, 112, 125, 135, 150, 155, 152, 150,
159, 130, 128, 138, 133, 143, 147, 151, 154, 156, 112, 116, 119, 111, 113, 115, 118, 121, 123,
120, 125, 121, 110, 113, 114, 149, 153, 155, 150, 155, 152, 111. Array the data and form a
cumulative frequency table with a class interval of 10 {110-120, 120-130…}. Also compute the
relative frequencies.

DIAGRAMMATIC AND GRAPHICAL PRESENTATION OF DATA

1 Line diagram
2 Bar charts
3 Pie charts
4 Pictogram
5 Histogram
6 Frequency polygon
7 Cumulative frequency curve

Line diagram (line graph): This is the simplest of all diagrams. Note that in graphs, the vertical
axis is always the y-axis while the horizontal axis is always the x-axis.

Example 1

Plot a line graph for the following set of data given below.

X 6 7 8 9 10

F 5 9 6 8 2

Bar charts- a bar chart is a series of rectangular bars, drawn either vertically or horizontally for
each class with the height of each bar being proportional to the magnitude of its frequency. The
bar and pie charts are usually suitable for presenting qualitative data (discrete).

Types of bar chart

1 Simple bar chart
2 Component bar chart
3 Multiple bar chart
4 Compound bar chart

1 Simple bar chart- These are blocks or rectangular diagrams in which single bars are used
to represent the frequency of each category.

Example 2- Below are data showing the expenditure (₦’000) of a family in 2019. Use the
information to illustrate the preparation of a simple bar chart, a component bar chart, a multiple bar
chart and a pie chart.

                        Period
Item         Jan-April   May-Aug.   Sept-Dec.   Total
Food         330         190        425         945
Clothing     65          85         175         325
Rent         170         170        170         510
Medi care    75          100        45          220
Others       100         125        175         400

Solution:

[Figure: simple bar chart of total expenditure by item]

2 Component bar chart- this is a chart that shows part of a whole. Each bar is subdivided
such that the ratio of each subdivision indicates its relationship to the total.

[Figure: component bar chart of expenditure by item (Food, Clothing, Rent, Medi care, Others), each bar subdivided by period]

3 Multiple bar chart- In this case the bars are shown in multiples, each component value is
shown as a bar of its own joining to another.

[Figure: multiple bar chart of expenditure by item (Food, Clothing, Rent, Medi care, Others), with one bar per period]

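4 Pie chart- This is a circular diagram cut into segments, with the size of each segment being
proportional to the size of the category it represents. The value of each item is expressed
in degrees.

Total expenditure = 2400 (₦’000)

$$\text{Food} = \frac{945}{2400} \times 360^\circ \approx 142^\circ \ (39\%)$$

$$\text{Clothing} = \frac{325}{2400} \times 360^\circ \approx 49^\circ \ (14\%)$$

$$\text{Rent} = \frac{510}{2400} \times 360^\circ = 76.5^\circ \ (21\%)$$

$$\text{Medicare} = \frac{220}{2400} \times 360^\circ = 33^\circ \ (9\%)$$

$$\text{Others} = \frac{400}{2400} \times 360^\circ = 60^\circ \ (17\%)$$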
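
The sector angles for the pie chart can be computed for all items at once. This is a minimal sketch; the dictionary name `expenditure` is illustrative, and the totals are taken from the expenditure table in Example 2.

```python
# Total expenditure per item (in thousands of naira) from Example 2.
expenditure = {"Food": 945, "Clothing": 325, "Rent": 510, "Medicare": 220, "Others": 400}

total = sum(expenditure.values())

# Each sector angle is the item's share of the total, scaled to 360 degrees.
angles = {item: value * 360 / total for item, value in expenditure.items()}
percents = {item: value * 100 / total for item, value in expenditure.items()}

for item in expenditure:
    print(f"{item:<9} {angles[item]:7.2f} degrees  {percents[item]:5.1f}%")
```

The angles must add up to 360 degrees and the percentages to 100, which gives a simple check before drawing the chart.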

[Figure: pie chart with sectors for Food, Clothing, Rent, Medi care and Others]

Compound bar chart- these are charts used to facilitate comparison.

Example – The table below represents the birth and death rate in Imo state from 2002 to 2007

Year         2002   2003   2004   2005   2006   2007
Birth rate   120    100    90     50     30     60
Death rate   70     110    80     40     90     50

Represent the information above using a compound bar chart.

[Figure: compound bar chart of birth rate and death rate by year, 2002 to 2007]

Pictogram (pictograph)- This simply means the use of pictures to represent statistical data.
A pictogram helps in quick visual comparison.

For example, a pictogram might be used to represent the number of 100 level students in
MOUAU.

Where one symbol = 1000 students:

2012 = 3 symbols = 3000

2013 = 5 symbols = 5000

2014 = 2 symbols = 2000

Histogram – This is a graph of quantitative data, consisting of rectangular bars drawn on a
continuous scale. The bars are of varying height, each corresponding to the frequency of the class
it represents. To construct a histogram, we plot the frequency against the class
boundaries.

Example: the table below shows the distribution of 40 students by weight.

Class       51-55   56-60   61-65   66-70   71-75   76-80   81-85
Frequency   3       5       9       11      5       5       2

Represent the above information using a histogram, frequency polygon and cumulative frequency
curve (Ogive).

Solution-

Class    Frequency   Class boundary   Class mark   Cumulative freq.
51-55    3           50.5-55.5        53           3
56-60    5           55.5-60.5        58           8
61-65    9           60.5-65.5        63           17
66-70    11          65.5-70.5        68           28
71-75    5           70.5-75.5        73           33
76-80    5           75.5-80.5        78           38
81-85    2           80.5-85.5        83           40
Total    40

[Figure: histogram]

Frequency polygon- This is obtained by plotting the frequency against the class mark, or by joining the mid-points of the tops of
the bars of the histogram.

Cumulative frequency curve (Ogive)- Here we plot the cumulative frequency against the upper
class boundaries.

[Figure: ogive]
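
The columns needed for the three graphs (class boundaries for the histogram, class marks for the frequency polygon, and cumulative frequencies against upper class boundaries for the ogive) can be derived from the class limits. The following is a sketch only; it assumes whole-number class limits, so that boundaries lie 0.5 beyond each limit, and the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the weight distribution example.
classes = [(51, 55), (56, 60), (61, 65), (66, 70), (71, 75), (76, 80), (81, 85)]
freqs = [3, 5, 9, 11, 5, 5, 2]

boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]   # x-axis of the histogram
marks = [(lo + hi) / 2 for lo, hi in classes]               # x-axis of the frequency polygon
cumulative = list(accumulate(freqs))

# Points of the ogive: (upper class boundary, cumulative frequency).
ogive_points = [(upper, c) for (_, upper), c in zip(boundaries, cumulative)]

for row in zip(classes, freqs, boundaries, marks, cumulative):
    print(row)
```

These lists can then be passed to any plotting tool, or plotted by hand on graph paper.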

Assignment. (1) The following figures show how many people visited a cinema each day for 50
days.

32 39 30 60 53 54 35 51 13 42

36 43 44 44 38 39 52 25 30 58

28 27 31 44 29 46 49 42 47 51

43 34 52 50 39 53 51 20 38 20

54 33 35 45 51 59 19 28 34 25

A Using the intervals 11-20, 21-30, etc., arrange the figures in classes and obtain a
frequency distribution.

B (i) Draw a histogram. (ii) Draw the frequency polygon. (iii) Obtain the cumulative
frequency curve for the distribution.

(2) 120 students were asked what their favourite course is. Their replies are as follows.

Courses   GSS111   MTH111   STA111   STA112   CMP111   UGC111
Number    30       21       15       16       18       10

Show the information on (a) a pie chart (b) a bar chart.

ERRORS AND APPROXIMATIONS

The word error refers to the difference between the true value of a population parameter and its
estimated or approximated value provided by an appropriate sample statistic computed by some
statistical device.

Some factors that bring about errors in statistics.

1 Approximations in measurements- for example, age may be measured correct to the
nearest month, distance may be measured correct to the nearest meter, etc.
2 Approximation in rounding off figures to the nearest hundred, thousand, etc., or in
rounding off decimals.
3 Bias due to faulty collection and analysis of data, or personal bias of the investigator.

Errors are classified into 2 groups, namely sampling and non-sampling errors.

Sampling errors- these errors occur because only a subset of the population (that is,
a sample) has been used to estimate the population parameters and draw inferences about the
population. Note that sampling error is present only in a sample survey and is completely absent in
the census method.

Some causes of sampling errors-

1 Faulty selection of the sample- that is, the deliberate selection of a "representative" sample
instead of the use of a simple random sampling technique, which reduces bias.
2 Substitution- this obviously leads to bias because the substituted unit has different
characteristics from the original unit.
3 Faulty demarcation of sampling units- this depends largely on the discretion of the
investigator.

4 Error due to bias in the estimation method- in simple random sampling, if $x_1, x_2, \ldots, x_n$ are
observations on the n sampled units, then the sample variance
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{X})^2$$
is a biased estimate of the population variance $\sigma^2$, while an unbiased estimate of $\sigma^2$ is
given by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{X})^2.$$
5 Variability of the population.
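
The difference between the two variance formulas in cause 4 can be seen numerically. The sketch below uses a small hypothetical sample (not from the course text) and compares the hand computation with Python's standard `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1).

```python
import statistics

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n
sum_sq_dev = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations from the mean

biased = sum_sq_dev / n          # divisor n: tends to underestimate the population variance
unbiased = sum_sq_dev / (n - 1)  # divisor n - 1: unbiased estimate

print(biased, statistics.pvariance(sample))
print(unbiased, statistics.variance(sample))
```

The divisor n − 1 always gives the larger value, and the gap between the two shrinks as the sample size grows.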

Non-sampling errors- These errors arise from causes that can be traced and may occur at any
stage of the enquiry. The factors causing such errors are within human control. Non-sampling
errors are present in both census and sample surveys.

Some factors responsible for non-sampling errors:

1 Personal bias of the investigator.
2 Lack of trained and qualified investigators and lack of supervisory staff.
3 Failure of respondents’ memory to recall past events.
4 Defective methods of interviewing and asking questions.
5 Exaggerated or wrong answers from the respondents, for example, reducing one's age out of
self-interest.
6 Non-response or incomplete response.
7 Faulty planning.
8 Improper coverage.
9 Compiling errors
10 Publication errors.

Biased and unbiased errors

Biased errors- this type of error is due to:

1 Bias in the measuring instrument used.
2 Bias due to non-response.
3 Bias due to faulty collection of data.
4 Bias in the technique of approximation.
5 Respondent’s bias (over-statement or under-statement).

Biased errors are also known as cumulative errors. The magnitude of biased errors is directly
proportional to the number of observations.

Unbiased errors- errors are said to be unbiased if the chance of making an over-estimate is
almost the same as the chance of making an under-estimate. For example, rounding to the nearest
hundred gives 380 ≈ 400 and 415 ≈ 400. The magnitude of the unbiased error is inversely proportional to
the number of items.

Measure of statistical errors (Absolute and Relative errors)

Absolute error- an absolute error (A.E) is the difference between the true value of any particular
observed item or variable and its estimated or approximated value.

Symbolically it is written as A.E = |a − e|, where a is the actual value, e is the estimated value
and |a − e| represents the modulus (absolute value) of (a − e).

Example. If the number 5,487,350 is approximated to 5,500,000,

then A.E = |a − e| = |5,487,350 – 5,500,000| = |−12,650| = 12,650.

Relative error (R.E)- this is the ratio of the absolute error to the actual value.

Symbolically written as
$$\text{R.E} = \frac{\text{A.E}}{\text{Actual value}} = \frac{|a-e|}{a}$$

From the above example,
$$\text{R.E} = \frac{12{,}650}{5{,}487{,}350} = 0.0023$$

or, as a percentage,
$$\text{R.E} = \frac{12{,}650}{5{,}487{,}350} \times 100 = 0.23\%.$$
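
The absolute and relative error of the example can be verified with a few lines of Python. This is a sketch only; the variable names are illustrative.

```python
actual = 5_487_350     # true value a from the example
estimate = 5_500_000   # approximated value e

absolute_error = abs(actual - estimate)          # |a - e|
relative_error = absolute_error / actual         # A.E divided by the actual value
percentage_error = relative_error * 100

print(absolute_error)                 # 12650
print(round(relative_error, 4))       # 0.0023
print(round(percentage_error, 2))     # 0.23
```

Note that the relative error is divided by the actual value, not the estimate; using the estimate would give a slightly different figure.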

Assignment-(1) List 7 control measures for non-sampling errors.

(2) Distinguish between sampling and non-sampling errors (at least 10 differences) in a tabular
form.

MEASURES OF CENTRAL TENDENCY OR LOCATION

There are five measures of central tendency or location:

Arithmetic mean (A.M)
Median
Mode
Geometric mean (G.M)
Harmonic mean (H.M)

1. The Arithmetic mean (X̅)- This is the sum of all the items in the distribution divided by the
number of items.
Given a set of numbers or values of a random variable $x_1, x_2, \ldots, x_n$, the A.M of these n numbers is defined as
$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(for ungrouped data)}$$
Example 1. Find the arithmetic mean of the numbers 5, 8, 7, 10, 20

Solution-
$$\bar{X} = \frac{5+8+7+10+20}{5} = \frac{50}{5} = 10$$

Mean of Grouped Data

If the numbers $x_1, x_2, \ldots, x_n$ are associated with frequencies $f_1, f_2, \ldots, f_n$, then the arithmetic mean
is given by

$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

Eg 2. Find the arithmetic mean of the data given below

X     6    7    8    9    10   Total
F     5    9    6    8    2    30
FX    30   63   48   72   20   233

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{233}{30} = 7.77$$

Where X̅ (x-bar) = sample mean
$x_i$ = class mark
$f_i$ = number of observations falling in the ith class
∑ (sigma) = Greek symbol meaning the sum of all values from i = 1 to n
n = number of groups or classes in the frequency distribution.

Example 3- Calculate the arithmetic mean

Class       51 – 55   56 – 60   61 – 65   66 – 70   71 – 75   76 – 80   81 – 85
Frequency   3         5         9         11        5         5         2

Solution-

Class     Frequency (f)   Class mark (x)   fx
51 – 55   3               53               159
56 – 60   5               58               290
61 – 65   9               63               567
66 – 70   11              68               748
71 – 75   5               73               365
76 – 80   5               78               390
81 – 85   2               83               166
Total     40                               2685

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2685}{40} = 67.1$$
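
The grouped mean in Example 3 can be checked as follows. This is a sketch in which each class is represented by its class mark (the midpoint of the class limits); the variable names are illustrative.

```python
# Class limits and frequencies from Example 3.
classes = [(51, 55), (56, 60), (61, 65), (66, 70), (71, 75), (76, 80), (81, 85)]
freqs = [3, 5, 9, 11, 5, 5, 2]

marks = [(lo + hi) / 2 for lo, hi in classes]     # class marks x
fx = [f * x for f, x in zip(freqs, marks)]        # products f * x

grouped_mean = sum(fx) / sum(freqs)
print(sum(fx), sum(freqs), grouped_mean)
```

Because every observation in a class is treated as if it sat at the class mark, the grouped mean is an approximation to the mean of the raw data.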
COMPUTATION OF MEAN USING THE ASSUMED MEAN.
The method is useful when the numbers in the distribution are fairly large.
For ungrouped data the formula is given by
$$\bar{X} = A + \frac{\sum_{i=1}^{n}(x_i - A)}{n} = A + \frac{\sum_{i=1}^{n} v_i}{n}$$

Where A = assumed mean
$v_i$ = deviation of each value from the assumed mean, that is, $v_i = x_i - A$
n = number of observations

For a set of grouped data, the procedure for computing the mean is as follows
i. Choose the assumed mean (A) from the class marks, preferably the middle one.
ii. Find the deviation of each class mark $x_i$ from the assumed mean, that is,
$v_i = x_i - A$.
iii. Find the products $f_i v_i$ and the sum $\sum f_i v_i$.
iv. Obtain the correction factor $\dfrac{\sum f_i v_i}{\sum f_i}$.
v. Add the correction factor to the assumed mean to obtain the mean of the distribution, using the formula
$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i (x_i - A)}{\sum_{i=1}^{n} f_i} = A + \frac{\sum_{i=1}^{n} f_i v_i}{\sum_{i=1}^{n} f_i}$$

Example 1- Ten coins were tossed together and the number of tails showing was
observed. The operation was performed 1050 times and the frequencies observed for the
different numbers of tails (x) are shown in the following table. Calculate the arithmetic mean using the
assumed mean method.

X   0   1   2    3     4     5     6     7     8    9   10
F   2   8   43   133   207   260   213   120   54   9   1

Solution –

Let 5 be the assumed mean. That is, A = 5

X       F            V = X − 5   FV
0       2            -5          -10
1       8            -4          -32
2       43           -3          -129
3       133          -2          -266
4       207          -1          -207
5       260          0           0
6       213          1           213
7       120          2           240
8       54           3           162
9       9            4           36
10      1            5           5
Total   ∑F = 1050                ∑FV = 12

From the formula,
$$\bar{X} = A + \frac{\sum f_i v_i}{\sum f_i} = 5 + \frac{12}{1050} = 5 + 0.0114 = 5.0114$$
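
The assumed mean calculation for the coin-tossing example can be checked against the direct formula. The sketch below uses illustrative variable names; `A` is the assumed mean chosen in the solution.

```python
# Number of tails (x) and observed frequencies (f) from the example.
x = list(range(11))
f = [2, 8, 43, 133, 207, 260, 213, 120, 54, 9, 1]

A = 5                                            # assumed mean
v = [xi - A for xi in x]                         # deviations from the assumed mean
fv = [fi * vi for fi, vi in zip(f, v)]

assumed_mean_result = A + sum(fv) / sum(f)       # A plus the correction factor
direct_result = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

print(sum(f), sum(fv), round(assumed_mean_result, 4))
```

Both routes give the same mean whatever value of A is chosen; the assumed mean only makes the hand arithmetic smaller.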

Example 2 The number of days spent by patients in hospital after being operated upon is
shown in the following table.

Days     Number of patients (f)
1-5      6
6-10     38
11-15    60
16-20    25
21-25    16
26-30    3
31-35    2

Calculate the mean using the assumed mean method.
Solution-

Let the assumed mean be A = 18

Days     Class mark (X)   F     V = (X − A)   FV
1-5      3                6     -15           -90
6-10     8                38    -10           -380
11-15    13               60    -5            -300
16-20    18               25    0             0
21-25    23               16    5             80
26-30    28               3     10            30
31-35    33               2     15            30
Total                     150   0             -630

Using
$$\bar{X} = A + \frac{\sum FV}{\sum F} = 18 + \frac{-630}{150} = 18 - 4.2 = 13.8$$

COMPUTATION OF THE MEAN USING THE CODING METHOD.

Using this method, the mean is defined as
$$\bar{X} = A + \left(\frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i}\right) W$$
Where- W = class width
$f_i$ = frequency
A = assumed mean (the middle class mark)
$u_i = \dfrac{x_i - A}{W}$
$x_i$ = class mark
Using the data of Example 2 above, calculate the mean using the coding method.
Solution- using A = 18, W = 5

Xi      Vi = (Xi − A)   F     Ui    FiUi
3       -15             6     -3    -18
8       -10             38    -2    -76
13      -5              60    -1    -60
18      0               25    0     0
23      5               16    1     16
28      10              3     2     6
33      15              2     3     6
Total   0               150   0     -126

$$\bar{X} = A + \left(\frac{\sum f_i u_i}{\sum f_i}\right) W = 18 + \frac{-126}{150} \times 5 = 18 - 4.2 = 13.8$$
Note that here the deviations $v_i = x_i - A$ and the coded values $u_i$ each sum to zero because A is
the middle class mark of equally spaced classes; for other choices of A they need not sum to zero.
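
The coding method can be verified on the hospital-stay data. This is a sketch with illustrative names; it assumes equally spaced class marks, so that every coded value is a whole number.

```python
# Class marks and frequencies for the hospital-stay example.
marks = [3, 8, 13, 18, 23, 28, 33]
freqs = [6, 38, 60, 25, 16, 3, 2]

A = 18   # assumed mean (middle class mark)
W = 5    # class width

u = [(x - A) // W for x in marks]                # coded values u = (x - A) / W
fu = [f * ui for f, ui in zip(freqs, u)]

coded_mean = A + sum(fu) / sum(freqs) * W
print(u, sum(fu), coded_mean)
```

The correction term is multiplied by the class width W, which undoes the division by W used in forming the coded values.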

Assignment- list 4 properties, 4 advantages and 4 disadvantages of the mean.

Median (Xm)- For a set of ungrouped data, the median is the middle term, or the mean of the two
middle values if the number of observations is even. In other words, the median is the value which
divides the distribution into two equal parts. To calculate the median, first arrange the data in
ascending or descending order of magnitude.

For example,
A 13, 14, 15, 16, 16, 18, 19. The median is 16
B 13, 14, 15, 15, 16, 16, 18, 19. The median is 15.5

Median of grouped data
$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - C_b}{F_m}\right) W$$
Where- $L_m$ = lower class boundary of the median class
$\frac{N}{2}$ = position of the median term
$C_b$ = cumulative frequency of the class before the median class
W = width of the median class
$F_m$ = frequency of the median class

Example 1 Find the median of the following frequency distribution.

Class       2-4   5-7   8-10   11-13   14-16   17-19
Frequency   2     3     8      3       3       1

Solution-

Class    F    Class boundary   Cumulative Freq.
2-4      2    1.5 – 4.5        2
5-7      3    4.5 – 7.5        5
8-10     8    7.5 – 10.5       13
11-13    3    10.5 – 13.5      16
14-16    3    13.5 – 16.5      19
17-19    1    16.5 – 19.5      20
Total    20

N = 20, so N/2 = 20/2 = 10

Median class = 8 – 10, with $L_m$ = 7.5, $C_b$ = 5, $F_m$ = 8 and W = 3

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - C_b}{F_m}\right) W = 7.5 + \frac{(10-5)}{8} \times 3 = 7.5 + \frac{15}{8} = 7.5 + 1.875 = 9.375$$
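
The grouped median formula can be written as a small function. The sketch below assumes whole-number class limits (so boundaries lie 0.5 beyond each limit); `grouped_median` is a hypothetical helper name, not a library function.

```python
def grouped_median(classes, freqs):
    """Median of grouped data by linear interpolation within the median class."""
    half = sum(freqs) / 2          # N/2, the position of the median term
    cum_before = 0                 # cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if cum_before + f >= half:
            lower_boundary = lo - 0.5
            width = (hi + 0.5) - lower_boundary
            return lower_boundary + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("empty distribution")


classes = [(2, 4), (5, 7), (8, 10), (11, 13), (14, 16), (17, 19)]
freqs = [2, 3, 8, 3, 3, 1]
median = grouped_median(classes, freqs)
print(median)
```

The loop finds the first class whose cumulative frequency reaches N/2, which is exactly how the median class is located in the table.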

Assignment- Find the median in the frequency distribution below,

X 1 2 3 4 5
F 2 6 7 3 2

MODE (X0)- The mode can simply be defined as the value or score with the highest frequency in
a distribution. The mode may not exist, and even when it exists, it may not be unique, as a distribution may be
bi-modal. In a grouped frequency table, the modal class is the interval containing the highest number of observations. In
the histogram, the modal group is the one corresponding to the tallest rectangle. The
mode is the only measure of central tendency that may be appropriately used with a nominal
scale. The mode is often not very useful as a measure of central tendency because it is
unstable: samples drawn from the same population may have different modes. Again, a distribution
may have more than one mode.
Example- Find the mode in each of the following sets of numbers.
A 2, 3, 5, 7, 9, 9, 9, 10, 11, 12, 18
B 3, 5, 8, 10, 12, 15, 16
C 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9
Solution-
A the mode is 9
B no mode
C the modes are 4 and 7 (bi-modal)

Mode for grouped frequency distribution

The formula is given by
$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) W$$
Where- $L_1$ = the lower class boundary of the modal class
$D_1$ = the absolute difference in frequency between the modal class and the previous
class
$D_2$ = the absolute difference in frequency between the modal class and the following class
W = modal class width.

Example 1- the following is the distribution of marks of 100 students in a statistics test.

Number of marks   Number of candidates   Class boundary
20-24             5                      19.5-24.5
25-29             23                     24.5-29.5
30-34             34                     29.5-34.5
35-39             25                     34.5-39.5
40-44             10                     39.5-44.5
45-49             3                      44.5-49.5

Estimate the mode for this set of data using (i) the formula (ii) the histogram
Solution-
Modal class = 30-34
Modal class frequency = 34
$L_1$ = 29.5
$D_1$ = 34 − 23 = 11
$D_2$ = 34 − 25 = 9
W = 5
$$\text{Mode} = X_0 = 29.5 + \frac{11}{11+9} \times 5 = 29.5 + 2.75 = 32.25$$
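
The grouped mode formula can be sketched as a function. The helper name `grouped_mode` is hypothetical; the sketch assumes whole-number class limits and a single modal class, and it treats a missing neighbouring class as having frequency 0, which is an assumption not stated in the notes.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data using L1 + D1 / (D1 + D2) * W."""
    i = freqs.index(max(freqs))                        # position of the modal class
    prev_f = freqs[i - 1] if i > 0 else 0              # assumed 0 if there is no previous class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0 # assumed 0 if there is no following class
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    lo, hi = classes[i]
    lower_boundary = lo - 0.5
    width = hi - lo + 1
    return lower_boundary + d1 / (d1 + d2) * width


classes = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [5, 23, 34, 25, 10, 3]
mode = grouped_mode(classes, freqs)
print(round(mode, 2))
```

When two classes share the highest frequency, `freqs.index` picks the first one, so a bi-modal distribution needs separate treatment.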

(ii) The histogram

[Figure: histogram of the marks distribution, from which the mode is estimated]

Geometric mean (X̅G) (G.M)- The geometric mean of a set of data $X_1, X_2, \ldots, X_n$ is the nth root of
the product of the n values involved. That is,
$$\bar{X}_G = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} \quad \ldots (G)$$
Because of the difficulty in evaluating (G) directly, the logarithms of the observations are taken
and the antilogarithm of the result is obtained:
$$\log \bar{X}_G = \frac{\sum_{i=1}^{n} \log X_i}{n}, \qquad \bar{X}_G = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log X_i}{n}\right)$$
For grouped data, the G.M is defined as
$$\log \bar{X}_G = \frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i}, \qquad \bar{X}_G = \text{Antilog}\left(\frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i}\right)$$

Harmonic mean (X̅H) or (H.M)- the harmonic mean of a set of numbers is the reciprocal of the
arithmetic mean of the reciprocals of the numbers.
Given $X_1, X_2, \ldots, X_n$,
$$\text{H.M} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_n}}$$
For a set of grouped data, the harmonic mean is given by
$$\text{H.M} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / X_i)}$$
Example – calculate the geometric mean and harmonic mean for the following distributions.
A 1, 4, 10, 8, 10

B
Class interval   1-3   4-6   7-9   10-12
Frequency        1     3     4     2

Solution (A) using the geometric mean

$$\bar{X}_G = \sqrt[5]{1 \times 4 \times 10 \times 8 \times 10} = \sqrt[5]{3200} = (3200)^{1/5}$$
$$\log \bar{X}_G = \frac{\log 3200}{5} = \frac{3.5052}{5} = 0.7010$$
OR

X       log X
1       0.0000
4       0.6021
10      1.0000
8       0.9031
10      1.0000
Total   3.5052

$$\log \bar{X}_G = \frac{\sum_{i=1}^{n} \log X_i}{n} = \frac{3.5052}{5} = 0.7010, \qquad \bar{X}_G = \text{Antilog}(0.7010) \approx 5.02$$

Using the harmonic mean-

$$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{5}{1 + \frac{1}{4} + \frac{1}{10} + \frac{1}{8} + \frac{1}{10}} = \frac{5}{\frac{63}{40}} \approx 3.2$$

(B)
Xi      fi    log Xi    fi log Xi   fi/Xi
2       1     0.3010    0.3010      0.5
5       3     0.6990    2.0970      0.6
8       4     0.9031    3.6124      0.5
11      2     1.0414    2.0828      0.18
Total   10              8.0932      1.78

The geometric mean is given by
$$\log \bar{X}_G = \frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i} = \frac{8.0932}{10} = 0.80932$$
X̅G = Antilog (0.80932) = 6.45

The harmonic mean is given by
$$\text{H.M} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / X_i)} = \frac{10}{1.78} = 5.62$$
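
The geometric and harmonic means of both distributions can be checked in Python. This is a sketch using only the standard `math` module; variable names are illustrative, and because the script does not round intermediate table entries, its results may differ from the hand calculation in the last decimal place.

```python
import math

# (A) Ungrouped data.
values = [1, 4, 10, 8, 10]
n = len(values)
am = sum(values) / n
gm = 10 ** (sum(math.log10(v) for v in values) / n)   # antilog of the mean of the logs
hm = n / sum(1 / v for v in values)                   # reciprocal of the mean reciprocal

# (B) Grouped data, using class marks.
marks = [2, 5, 8, 11]
freqs = [1, 3, 4, 2]
total_f = sum(freqs)
gm_grouped = 10 ** (sum(f * math.log10(x) for f, x in zip(freqs, marks)) / total_f)
hm_grouped = total_f / sum(f / x for f, x in zip(freqs, marks))

print(round(gm, 2), round(hm, 2))
print(round(gm_grouped, 2), round(hm_grouped, 2))
```

For the ungrouped data the three means come out in the order A.M ≥ G.M ≥ H.M, as expected for positive numbers.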

For two positive numbers, the relationship between the A.M, G.M and H.M is given by (A.M)(H.M) = (G.M)². Note also
that, for any set of positive numbers, A.M ≥ G.M ≥ H.M.

Also, the empirical relationship between the mean, median and mode is given by
mode = mean − 3(mean − median).
That is, mode = 3 median − 2 mean, so that

$$\text{Mean} = \frac{3\,\text{median} - \text{mode}}{2}$$

MEASURES OF PARTITION (FRACTILES)

Fractiles, also known as quantiles, are points on a number scale that divide a set
of observations into groups with known proportions in each group. There are basically three
measures of partition, namely quartiles, deciles and percentiles.
1 Quartiles- Just as the median divides the data into two equal halves, the quartiles divide the
data into four equal parts.
1st quartile Q1: 25% of the values lie below it
2nd quartile Q2: the median, with 50% of the values below it
3rd quartile Q3: 75% of the values lie below it
4th quartile Q4: 100% of the values lie at or below it
2 Deciles- These are the nine points that divide a distribution into ten equal parts. They are
denoted by D1, D2, …, D9, where the 5th decile (D5) is the median of the distribution.
3 Percentiles- These are the 99 points that divide the distribution into 100 equal parts. The
1st percentile has 1% of the observations below it. The 50th percentile is the median.

Example- The data below are the frequency distribution of the scores of students in an examination

Scores               25-32   33-40   41-48   49-56   57-64   65-72   73-80   81-88   89-96
Number of students   3       4       6       9       8       10      3       4       3

A(i) Draw a histogram to represent the data.
A(ii) From your histogram in A(i) above, estimate the mode.
B(i) Draw the cumulative frequency curve (ogive).
B(ii) Use the curve to estimate (1) the median (2) the lower quartile (Q1) (3) the third decile (4) the 70th
percentile (5) the semi inter-quartile range.

Solution-
Scores   F    Class boundary   Cumulative freq.
25-32    3    24.5-32.5        3
33-40    4    32.5-40.5        7
41-48    6    40.5-48.5        13
49-56    9    48.5-56.5        22
57-64    8    56.5-64.5        30
65-72    10   64.5-72.5        40
73-80    3    72.5-80.5        43
81-88    4    80.5-88.5        47
89-96    3    88.5-96.5        50
Total    50

[Figure: histogram]

Mode estimated from the histogram = 66.5
Using the formula-
$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) W = 64.5 + \left(\frac{2}{2+7}\right) \times 8 = 64.5 + \frac{16}{9} = 66.3$$

27
The Ogive

Where N = ∑f = 50
N/2 = 50/2 = 25
Median from Ogive = 59.5
Median using the formula = Lm + ((N/2 - Cb) / Fm) * W
Where the median class = 57-64, Lm = 56.5, Cb = 22, W = 8, Fm = 8
Median = 56.5 + ((25 - 22) / 8) * 8 = 56.5 + (3/8) * 8 = 56.5 + 3 = 59.5
Position of Q1 = (1/4) * N = (1/4) * 50 = 12.5
Q1 (read from the ogive) = 48.0
Position of D3 = (3/10) * 50 = 15
D3 = 50.5
Position of the 70th percentile P70 = (70/100) * 50 = 35
P70 = 68.5
Position of Q3 = (3/4) * 50 = 37.5
Q3 = 70.5
Semi interquartile range = (Q3 - Q1) / 2 = (70.5 - 48.0) / 2 = 11.25
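
The ogive readings can be cross-checked by applying the interpolation formula used for the median to any fractile. The sketch below is only an illustration: the function name grouped_quantile and its arguments are assumptions, not part of the course text, and it takes the class boundaries and frequencies of the worked example as given.

```python
def grouped_quantile(boundaries, freqs, p):
    """Estimate the fractile at proportion p (0 < p <= 1) of grouped data.

    boundaries: lower class boundaries followed by the last upper boundary
    freqs: class frequencies
    Uses L + ((p*N - Cb) / f) * W, the same form as the median formula.
    """
    n = sum(freqs)
    target = p * n        # position of the fractile, e.g. N/2 for the median
    cum_before = 0        # cumulative frequency before the current class (Cb)
    for i, f in enumerate(freqs):
        if f > 0 and cum_before + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (target - cum_before) / f * width
        cum_before += f
    return boundaries[-1]


boundaries = [24.5, 32.5, 40.5, 48.5, 56.5, 64.5, 72.5, 80.5, 88.5, 96.5]
freqs = [3, 4, 6, 9, 8, 10, 3, 4, 3]

median = grouped_quantile(boundaries, freqs, 0.50)
q1 = grouped_quantile(boundaries, freqs, 0.25)
d3 = grouped_quantile(boundaries, freqs, 0.30)
p70 = grouped_quantile(boundaries, freqs, 0.70)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(median, q1, d3, p70, q3, (q3 - q1) / 2)
```

Values interpolated this way can differ slightly from values read off a hand-drawn ogive, so small gaps for Q1 and D3 are to be expected.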

Assignment-
1 The marks of 50 candidates in an examination for which the maximum mark was 100 are
given below
62 21 4 26 7 38 32 64 12 38
45 6 33 55 62 48 49 7 9 41
21 30 31 3 25 57 48 8 18 43
72 23 5 8 37 3 31 39 65 53
4 75 17 4 61 50 51 38 36 40
A Group the sample into a frequency distribution with intervals of 0-9, 10-19, etc.
B Plot the sample cumulative distribution (ogive).
C Use the method of coding to evaluate the sample mean.

2 Given the set of observations

X   6   7   8   9   10
F   5   9   6   8    2

Calculate the (i) arithmetic mean, (ii) geometric mean, (iii) harmonic mean. (iv) What is the
relationship between (i), (ii) and (iii) above from your calculations?
3 The distribution of marks scored by 100 students in a given examination were recorded as
shown below.

Marks (%)            0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Number of students    2     7      8     13     24     30      6      5      3      2

a Draw the cumulative frequency curve, from your curve determine (i) median, (ii) quartiles
(iii) quartile deviation.

b Draw the histogram for the distribution. From your histogram, estimate the (i) median, (ii)
mode.

CODING METHOD FOR LARGE NUMBERS

Find the mean of the data given below

X   304  308  312  316  320  324
F     1    5    9    4    4    2

Solution

X       f     fX     Y = X - 312   fY
304     1     304    -8            -8
308     5    1540    -4           -20
312     9    2808     0             0
316     4    1264     4            16
320     4    1280     8            32
324     2     648    12            24
Total  25    7844                  44

X̅ = ∑fX / ∑f = 7844 / 25 = 313.76

Ӯ = ∑fY / ∑f = 44 / 25 = 1.76

Y = X - 312

Ӯ = X̅ - 312

X̅ = Ӯ + 312, X̅ = 1.76 + 312 = 313.76

(2)

Time     f     X      Y = X - 45.5   fY     fX
11-20    1     15.5   -30            -30     15.5
21-30    2     25.5   -20            -40     51
31-40    5     35.5   -10            -50    177.5
41-50   11     45.5     0              0    500.5
51-60    8     55.5    10             80    444
61-70    2     65.5    20             40    131
71-80   11     75.5    30            330    830.5
Total   40                           330   2150

Ӯ = ∑fY / ∑f = 330 / 40 = 8.25

Y = X - 45.5

Ӯ = X̅ - 45.5

X̅ = Ӯ + 45.5

X̅ = 8.25 + 45.5 = 53.75

Check by the direct method: X̅ = ∑fX / ∑f = 2150 / 40 = 53.75
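
The coding method can be sketched in a few lines of Python as a check on the two tables above. The function name coded_mean and its argument names are illustrative assumptions; the data are the values, class marks and frequencies of the two worked examples.

```python
def coded_mean(values, freqs, assumed_mean):
    """Mean by the coding method: X-bar = A + (sum of fY / sum of f), with Y = X - A."""
    total_f = sum(freqs)
    sum_fy = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + sum_fy / total_f


# Example 1: large values coded about A = 312
mean_1 = coded_mean([304, 308, 312, 316, 320, 324], [1, 5, 9, 4, 4, 2], 312)

# Example 2: class marks coded about A = 45.5
mean_2 = coded_mean([15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5],
                    [1, 2, 5, 11, 8, 2, 11], 45.5)
print(mean_1, mean_2)
```

Any value of A gives the same mean; choosing A near the centre of the data simply keeps the coded numbers small.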

Assignment- Using the coding method, find the mean of the following numbers.

4610, 4612, 4614, 4616, 4618, 4620

MEASURES OF DISPERSION

These measure the degree to which items vary in numerical value. They are also called measures
of variability.

1 The range
2 Mean deviation
3 Variance

4 Standard deviation
5 Relative dispersion

The range- The range is the simplest measure of dispersion with respect to its concept and
computation. It is the difference between the minimum and maximum items of the series. For
grouped data in a frequency distribution, the range is the difference between the upper limit of the
class at the upper extreme and the lower limit of the class at the lower extreme of the distribution.

Absolute range = Xmax – Xmin or L – S, where L = largest, S = smallest

Coefficient of range or relative range = Absolute range / Sum of the two extremes = (L - S) / (L + S)
Example – Find the range and coefficient of range of the weight of 10 students from the following data.

41 20 15 65 73 84 53 35 71 55

Solution- First arrange the data in ascending order

15 20 35 41 53 55 65 71 73 84

L= 84, S = 15

Range = L – S = 84 – 15 = 69

Coefficient of range = (L - S) / (L + S) = (84 - 15) / (84 + 15) = 69 / 99 = 0.697
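
As a quick check of the arithmetic, the same two quantities can be computed directly from the ten weights. This is a minimal sketch; the variable names are illustrative only.

```python
weights = [41, 20, 15, 65, 73, 84, 53, 35, 71, 55]

largest, smallest = max(weights), min(weights)   # L and S
absolute_range = largest - smallest              # L - S
coefficient_of_range = absolute_range / (largest + smallest)   # (L - S) / (L + S)

print(absolute_range, round(coefficient_of_range, 3))
```

Sorting the data first, as in the solution, is convenient by hand but is not needed when max and min are used.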

MEAN DEVIATION- The mean deviation for a set of ungrouped data is given by

Mean deviation = ∑│d│ / n = ∑│X - X̅│ / n

Example (1) calculate the mean deviation about the mean for the following series –

15 20 17 19 21 13 12 10 17 9 12

Solution- n = 11

Mean = X̅ = (15 + 20 + 17 + 19 + 21 + 13 + 12 + 10 + 17 + 9 + 12) / 11 = 165 / 11 = 15

X d =X -X̅ │d│= │X - X̅│
15 0 0
20 5 5
17 2 2
19 4 4
21 6 6
13 -2 2
12 -3 3
10 -5 5
17 2 2
9 -6 6
12 -3 3

∑│d│ = ∑│X - X̅│ = 38

Mean deviation = ∑│d│ / n = 38 / 11 = 3.455

Mean deviation for grouped data- Let X1, X2, X3, …, Xn occur with frequencies f1, f2, f3, …, fn
respectively, and let ∑f = n and X̅ = mean; then the mean deviation is given by the formula

Mean deviation = ∑f│X - X̅│ / ∑f = ∑f│d│ / n

Example (2) find the mean deviation from (i) mean (ii) Median, for the following data.

Marks             20  18  16  14  12  10   8   6
No. of students    2   4   9  18  27  25  14   1

Solution-

Marks (x) No. of students (f) f*X │d│= │X- X̅│ f * │d│
6 1 6 6 6
8 14 112 4 56
10 25 250 2 50
12 27 324 0 0
14 18 252 2 36
16 9 144 4 36
18 4 72 6 24
20 2 40 8 16
Total   ∑f = 100   ∑fX = 1200   ∑f│d│ = 224

Mean = X̅ = ∑fX / ∑f = 1200 / 100 = 12

Mean deviation = ∑f│X - X̅│ / ∑f = ∑f│d│ / ∑f = 224 / 100 = 2.24

Mean deviation about the median

Solution-

Marks (x) No. of students (f) Cumulative frequency f│d│= f│x - 12│
6 1 1 6
8 14 15 56
10 25 40 50
12 27 67 0
14 18 85 36
16 9 94 36
18 4 98 24
20 2 100 16
Total   ∑f = 100   ∑f│d│ = 224

Median = average of the (n/2)th and (n/2 + 1)th items = average of the 50th and 51st items = 12

Mean deviation about the median = ∑f│d│ / ∑f = 224 / 100 = 2.24

Example (3) calculate the mean deviation from the mean for the following data.

Marks        0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of stu     6     5      8     15      7      6      3

Marks Mid-value (x) Frequency (f) F*X │x - x̅│ F│x - x̅│


0-10 5 6 30 28.4 170.4
10-20 15 5 75 18.4 92.0
20-30 25 8 200 8.4 67.2
30-40 35 15 525 1.6 24.0
40-50 45 7 315 11.6 81.2
50-60 55 6 330 21.6 129.6
60-70 65 3 195 31.6 94.8
Total   ∑f = 50   ∑fx = 1670   ∑f│x - x̅│ = 659.2

Mean = x̅ = ∑fx / ∑f = 1670 / 50 = 33.4

Mean deviation = ∑f│x - x̅│ / ∑f = 659.2 / 50 = 13.184
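
The grouped mean deviation formula can be sketched as a small Python function. The name mean_deviation and the optional centre argument are assumptions made for illustration; passing a median as centre gives the mean deviation about the median.

```python
def mean_deviation(values, freqs, centre=None):
    """Frequency-weighted mean absolute deviation about `centre` (default: the mean)."""
    total_f = sum(freqs)
    if centre is None:
        centre = sum(f * x for x, f in zip(values, freqs)) / total_f
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total_f


# Example (2): discrete marks with frequencies
marks = [6, 8, 10, 12, 14, 16, 18, 20]
students = [1, 14, 25, 27, 18, 9, 4, 2]
md_discrete = mean_deviation(marks, students)

# Example (3): class mid-values with frequencies
midpoints = [5, 15, 25, 35, 45, 55, 65]
counts = [6, 5, 8, 15, 7, 6, 3]
md_grouped = mean_deviation(midpoints, counts)

print(round(md_discrete, 2), round(md_grouped, 3))
```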

Standard deviation- This is the positive square root of the average of squared deviations taken
from the mean. Let x be a random variate which takes on n values thus, X1, X2, X3, …,Xn, then
the standard deviation of these n observations is given by;

S.D = σ = √( ∑(x - x̅)² / n )    OR

S.D = σ = √( ∑x²/n - (∑x/n)² )    OR    σ = √( ∑x²/n - (x̅)² )

In the case of a frequency distribution, the S.D is given by σ = √( (1/N) ∑f(x - x̅)² )

where x̅ = ∑x / n (or ∑fx / N for a frequency distribution) and N or n = ∑f

Variance- Variance is the square of the standard deviation and is denoted by σ²

σ² = (1/N) ∑(x - x̅)²

Coefficient of standard deviation = σ / x̅

Example (1) Find the standard deviation, variance and coefficient of standard deviation of the
following numbers. 3, 4, 5, 6

Solution- n = 4, ∑x = 3 + 4 + 5 + 6 = 18

∑x² = 3² + 4² + 5² + 6² = 9 + 16 + 25 + 36 = 86

x̅ = ∑x / n = 18 / 4 = 4.5

(i) σ = √( ∑x²/n - (∑x/n)² ) = √( 86/4 - (18/4)² )
σ = √( 21.5 - (4.5)² ) = √( 21.5 - 20.25 ) = √1.25
σ = 1.12
(ii) Variance = σ² = (1.12)² = 1.25

(iii) Coefficient of S.D = σ / x̅ = 1.12 / 4.5 = 0.25
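
The three results for the numbers 3, 4, 5, 6 can be reproduced with the computational formula and compared against Python's standard library. This is a sketch only: the variable names are illustrative, and statistics.pstdev is used as the check because it divides by n, as the formula in these notes does.

```python
import math
import statistics

values = [3, 4, 5, 6]
n = len(values)

mean = sum(values) / n
# sigma = sqrt( sum(x^2)/n - (sum(x)/n)^2 )
sigma = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
variance = sigma ** 2
coefficient_of_sd = sigma / mean

# statistics.pstdev is the population standard deviation (divisor n)
print(round(sigma, 2), round(variance, 2), round(coefficient_of_sd, 2))
print(round(statistics.pstdev(values), 2))
```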

Example 2 Find the standard deviation and variance of the following data.

Size        10  11  12  13  14  15  16
Frequency    2   7  11  15  10   4   1

Solution-

Size (X)   Frequency (f)   fX     X²     fX²
10          2               20    100     200
11          7               77    121     847
12         11              132    144    1584
13         15              195    169    2535
14         10              140    196    1960
15          4               60    225     900
16          1               16    256     256
Total 91   50              640           8282

σ = √( ∑fx²/n - (∑fx/n)² ) = √( 8282/50 - (640/50)² ) = √( 165.64 - 163.84 ) = √1.8

σ = 1.342

σ² = (1.342)² = 1.8

Calculation of standard deviation using the short-cut method or assumed mean method

The formula is given by; σ = √( ∑fd²/n - (∑fd/n)² ), where d = X - A

OR

σ = i * √( ∑fd²/n - (∑fd/n)² ), where d = (X - A) / i, n = ∑f, A = assumed mean and i = class width
Example 3. Find the standard deviation for the following distribution.

Marks        10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of stu     5     12     15     20     10      4      2

Solution- Let A = 45 and i = 10

Marks (class interval)   No. of students (f)   Mid value (m)   d = (m - 45)/10    fd    fd²
10-20                     5                    15              -3                -15    45
20-30                    12                    25              -2                -24    48
30-40                    15                    35              -1                -15    15
40-50                    20                    45               0                  0     0
50-60                    10                    55               1                 10    10
60-70                     4                    65               2                  8    16
70-80                     2                    75               3                  6    18
Total                    68                                                      -30   152

σ = i * √( ∑fd²/n - (∑fd/n)² )

= 10 * √( 152/68 - (-30/68)² ) = 10 * √( 2.2353 - 0.1946 )

= 10 * √2.0407 = 14.3

σ² = (14.3)² = 204.49
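
A short sketch of the assumed mean (step-deviation) method, checked against the direct formula on the same class marks. The function names are illustrative assumptions; the mid values, frequencies, A = 45 and class width 10 are those of Example 3.

```python
import math


def sd_step_deviation(midpoints, freqs, assumed_mean, width):
    """sigma = i * sqrt( sum(f d^2)/n - (sum(f d)/n)^2 ), with d = (m - A) / i."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def sd_direct(midpoints, freqs):
    """sigma = sqrt( sum(f m^2)/n - (sum(f m)/n)^2 ), without coding."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return math.sqrt(sum(f * m * m for f, m in zip(freqs, midpoints)) / n - mean ** 2)


midpoints = [15, 25, 35, 45, 55, 65, 75]
freqs = [5, 12, 15, 20, 10, 4, 2]

sigma_coded = sd_step_deviation(midpoints, freqs, assumed_mean=45, width=10)
sigma_plain = sd_direct(midpoints, freqs)
print(round(sigma_coded, 1), round(sigma_plain, 1))
```

Both routes give the same standard deviation; the coded version only keeps the intermediate numbers small.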

Assignment- (1) In a study on customers, the following data was obtained. Find the standard
deviation and variance of the data.

Ages (in years)   10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89
Number of items     1      0      1     10     17     38      9      3

2 The marks obtained by 10 students in an examination were as follows: 70, 65, 68, 70, 75,
73, 80, 70, 83, 86. Find the mode, mean, median, range and mean deviation about the
mean. Also find the coefficient of mean deviation based on (i) mean, (ii) median, (iii)
mode. Also find the coefficient of range.

RELATIVE DISPERSION

The most commonly used measure of relative dispersion is the coefficient of variation (C.V),
given by;

C.V = (σ / x̅) * 100, where σ = standard deviation, x̅ = mean

If the coefficient of variation (C.V) is greater in one series than in another, then;

(A) There will be more variation in that series
(B) There will be less stability in its composition
(C) There will be less consistency in its composition

Example – calculate the coefficient of variation for the following data; 9.5, 13.7, 10.4, 8.6

X       X²
9.5      90.25
13.7    187.69
10.4    108.16
8.6      73.96
∑X = 42.2   ∑X² = 460.06

x̅ = ∑x / n = 42.2 / 4 = 10.55

σ = √( ∑x²/n - (∑x/n)² ) = √( 460.06/4 - (42.2/4)² )

σ = √( 115.015 - 111.3025 ) = √3.7125 = 1.9268

C.V = (σ / x̅) * 100 = (1.9268 / 10.55) * 100 = 18.26%
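
The coefficient of variation for the four observations can be checked with Python's statistics module. This is a minimal sketch with illustrative variable names; statistics.pstdev is used because it divides by n, matching the formula for σ in these notes. Rounding intermediate values by hand can change the last digit of the result.

```python
import statistics

data = [9.5, 13.7, 10.4, 8.6]

mean = statistics.mean(data)
sigma = statistics.pstdev(data)   # population standard deviation (divisor n)
cv = sigma / mean * 100           # coefficient of variation, in percent

print(round(mean, 2), round(sigma, 4), round(cv, 2))
```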

Assignment- Calculate the coefficient of variation for the quarterly profits of 2 companies.

Quarters Company A Company B


January - March 15.5 3.3
April – June 21.2 5.7
July – September 23.4 8.9
October - December 18.8 2.6

SKEWNESS AND KURTOSIS

Skewness- This is the degree of asymmetry of a distribution.

Relative measure of skewness or coefficient of skewness.

There are four types of relative measures of skewness;

1. Karl Pearson’s coefficient of skewness


2. Bowley’s coefficient of skewness
3. Kelly’s coefficient of skewness
4. Measure of skewness based on the moments.

Interpretation of skewness;

1. Its value usually lies between -1 and +1


2. When its value is zero, there is no skewness. That is, the distribution is symmetrical.
3. When its value is negative, the distribution is negatively skewed or skewed to the left.
4. When its value is positive, the distribution is positively skewed or skewed to the right.
1. Karl Pearson’s measure of skewness (SKP)- this is given by

SKP = (mean - mode) / standard deviation = (x̅ - mode) / σ

In a moderately skewed distribution we have

Mode = 3 Median – 2 Mean

SKP = (mean - (3 median - 2 mean)) / standard deviation

SKP = 3(mean - median) / standard deviation = 3(x̅ - median) / σ

2. Bowley’s measure of skewness (quartile coefficient of skewness).

The absolute measure is given by;

SKB = (Q3 - median) – (median – Q1)

and the coefficient is given by;

SKB = (Q3 + Q1 - 2 median) / (Q3 - Q1)

3. Kelly’s measure of skewness (percentile coefficient of skewness).

This is given by;

SKK = ((P90 - median) - (median - P10)) / (P90 - P10)

SKK = (P10 + P90 - 2 median) / (P90 - P10) or (P10 + P90 - 2 P50) / (P90 - P10)

4. Measure of skewness based on moments.

This is given by; µ3 / σ³

Where;

µ3 is the third moment about the mean, given by; µ3 = ∑(X - X̅)³ / N or ∑f(X - X̅)³ / N
(equivalently ∑x³ / N or ∑fx³ / N, where x = X - X̅ is the deviation from the mean)
and σ³ = 3rd power of the standard deviation.

Example 1- From the information given below, calculate (a) Karl Pearson’s coefficient of
skewness (b) the quartile coefficient of skewness.

Measure              Product A   Product B
Mean                 150         140
Median               142         155
Standard deviation    30          55
Third quartile       195         260
Lower quartile        62          80

Solution-
Product A
Given X̅ = 150, median = 142, σ = 30, Q3 = 195, Q1 = 62

Mode = 3 median – 2 mean

= 3 * 142 – 2 * 150 = 426 – 300 = 126

Product B

Given X̅ = 140, median = 155, σ = 55, Q3 =260, Q1 = 80

Mode = 3 median – 2 mean

= 3 * 155 – 2 *140 = 465 – 280 = 185

Karl Pearson’s coefficient of skewness:

SKP = (mean - mode) / standard deviation = (x̅ - mode) / σ

Product A

SKP(A) = (150 - 126) / 30 = 24 / 30 = 0.8

Product B

SKP(B) = (140 - 185) / 55 = -45 / 55 = -0.82

Quartile coefficient of skewness = (Q3 + Q1 - 2 median) / (Q3 - Q1)

Product A

SKB(A) = (195 + 62 - 2 * 142) / (195 - 62) = -27 / 133 = -0.203

Product B

SKB(B) = (260 + 80 - 2 * 155) / (260 - 80) = 30 / 180 = 0.167
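
Both coefficients for Products A and B can be recomputed with two small helper functions. The function names are assumptions made for this sketch; the Pearson version uses the median form 3(mean - median)/σ, which is equivalent to (mean - mode)/σ when the mode is taken as 3 median - 2 mean. Note the sign: a mean below the median gives a negative coefficient.

```python
def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient in its median form: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Quartile coefficient of skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


pearson_a = pearson_skewness(mean=150, median=142, sd=30)
pearson_b = pearson_skewness(mean=140, median=155, sd=55)
bowley_a = bowley_skewness(q1=62, median=142, q3=195)
bowley_b = bowley_skewness(q1=80, median=155, q3=260)

print(round(pearson_a, 2), round(pearson_b, 2))
print(round(bowley_a, 3), round(bowley_b, 3))
```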

Calculate kelly’s coefficient of skewness for the data given below

Size Frequency Size Frequency


1 3 7 32
2 8 8 28
3 15 9 20
4 23 10 46
5 35 11 14
6 40 12 6

Solution-

Size Frequency Cum. Freq. Size Frequency Cum. Freq.


1 3 3 7 32 156
2 8 11 8 28 184
3 15 26 9 20 204
4 23 49 10 46 250
5 35 84 11 14 264
6 40 124 12 6 270

N = ∑f = 270

Tenth percentile = P10: (10/100) * 270 = 27th item and (10/100) * 271 = 27.1th item

P10 = 4

Median or P50: (50/100) * 270 or 270/2 = 135th item and (50/100) * 271 or 271/2 = 135.5th item

P50 = median = 7

Ninetieth percentile = P90: (90/100) * 270 = 243rd item and (90/100) * 271 = 243.9th item

P90 = 10

Kelly’s coefficient of skewness = (P10 + P90 - 2 P50) / (P90 - P10) = (10 + 4 - 2 * 7) / (10 - 4) = 0
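
The percentile lookup in the cumulative frequency table can be sketched as follows. The helper percentile_from_table is a hypothetical name; it assumes the percentile is the first size whose cumulative frequency reaches the (p/100) * N position, which is how the items are located in the solution above.

```python
def percentile_from_table(sizes, freqs, percent):
    """Return the size whose cumulative frequency first reaches (percent/100) * N."""
    n = sum(freqs)
    position = percent * n / 100
    cumulative = 0
    for size, f in zip(sizes, freqs):
        cumulative += f
        if cumulative >= position:
            return size
    return sizes[-1]


sizes = list(range(1, 13))
freqs = [3, 8, 15, 23, 35, 40, 32, 28, 20, 46, 14, 6]

p10 = percentile_from_table(sizes, freqs, 10)
p50 = percentile_from_table(sizes, freqs, 50)
p90 = percentile_from_table(sizes, freqs, 90)
kelly = (p10 + p90 - 2 * p50) / (p90 - p10)

print(p10, p50, p90, kelly)
```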

KURTOSIS- This is the degree or extent of peakedness or flatness of a curve of a distribution.


The peakedness here refers to the shape of the top of the distribution.

There are 3 types of kurtosis curve namely;

1. Leptokurtic- A distribution with a relatively high peak.
2. Mesokurtic- A distribution with a normal peak.
3. Platykurtic- A distribution with a low or flat peak.

Coefficient of kurtosis using the moments is given by;

Km = µ4 / (σ²)²

The percentile coefficient of kurtosis is given by; Kp = ½(Q3 - Q1) / (P90 - P10)

Example- Given that Q1 = 53.75, Q3 = 76.33, P10 = 43.0 and P90 = 85.5, find the percentile
coefficient of kurtosis. How would you describe the shape of the distribution?

Kp = ½(76.33 - 53.75) / (85.5 - 43.0) = 11.29 / 42.5 = 0.266

This implies that the distribution is slightly platykurtic, since the calculated value is slightly higher
than 0.263.

Example 2- Given µ4 = 4253.21, σ² = 34.32. Calculate the moment coefficient of kurtosis.

Km = µ4 / (σ²)² = 4253.21 / (34.32)² = 4253.21 / 1177.86 = 3.61

Km = 3.61 implies that the distribution is leptokurtic.

Note- When Km = 3, the distribution is normal (mesokurtic)

When Km < 3, the distribution is platykurtic

When Km > 3, the distribution is leptokurtic

When Kp = 0.263, the distribution is normal (mesokurtic)

When Kp > 0.263, the distribution is platykurtic

When Kp < 0.263, the distribution is leptokurtic.
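
The two kurtosis coefficients and the classification rules in the note can be sketched together. The function names and the classify helper are illustrative assumptions; the reference values 3 and 0.263 are the ones stated in the note.

```python
def percentile_kurtosis(q1, q3, p10, p90):
    """Kp = (1/2)(Q3 - Q1) / (P90 - P10)."""
    return 0.5 * (q3 - q1) / (p90 - p10)


def moment_kurtosis(mu4, variance):
    """Km = mu4 / (sigma^2)^2."""
    return mu4 / variance ** 2


def classify(value, normal, larger_means):
    """Name the curve; `larger_means` is the label used when value > normal."""
    if value == normal:
        return "mesokurtic"
    other = "leptokurtic" if larger_means == "platykurtic" else "platykurtic"
    return larger_means if value > normal else other


kp = percentile_kurtosis(q1=53.75, q3=76.33, p10=43.0, p90=85.5)
km = moment_kurtosis(mu4=4253.21, variance=34.32)

print(round(kp, 3), classify(kp, 0.263, larger_means="platykurtic"))
print(round(km, 2), classify(km, 3, larger_means="leptokurtic"))
```

The two coefficients run in opposite directions: a larger Km means a sharper peak, while a larger Kp means a flatter one.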

Assignment-

(1) From the information given below calculate the Karl Pearson’s coefficient of skewness.

Measure Place (A) Place (B)

Mean 256.5 240.8
Median 201.0 201.6
Standard deviation 215.4 181.1
Third quartile 260.0 242.0
First quartile 157.0 164.2

(2) The scores obtained in an aptitude test by 100 candidates are given below. Calculate Karl Pearson’s
coefficient of skewness.

Marks No. of candidates Marks No. of candidates


0-10 10 40-50 10
10-20 15 50-60 10
20-30 24 60-70 6
30-40 25

INDEX NUMBERS

An index number is a ratio or an average of ratios expressed as a percentage. An index number is
therefore a relative number. It is used to express the relationship between two similarly measured
figures, of which one is called the base. It is also used to describe changes over time in business
and economic activities.

TYPES OF INDEX NUMBERS

1. Price index numbers

2. Value index numbers
3. Quantity index numbers

PRICE RELATIVE- A price relative expresses the unit price of an item in a given period as a
percentage of the unit price of the item in the base period. Another name for price relative is simple price
index.

Price relative = (Current year’s price / Base year’s price) * 100 = (P1 / P0) * 100

Example 1- The table below gives the collection of the average unit price of four food items
consumed by a certain family in 2010 and 2015. Using 2010 as the base year, calculate the price
relative.

Food item      2010 (P0)   2015 (Pn)   Price relative = (Pn / P0) * 100
Meat (kg)      185         230         124.32
Milk (litre)    58          70         120.69
Egg (tens)     120         142         118.33
Bread (loaf)    50          80         160.00

Example 2- The table below shows price for meat for the years 2000 to 2004, using year 2000 as
the base year, calculate the price relative.

Year   Unit price (₦)   Price relative = (Pn / P0) * 100
2000   450              (450 / 450) * 100 = 100
2001   430              (430 / 450) * 100 = 95.56
2002   480              (480 / 450) * 100 = 106.67
2003   540              (540 / 450) * 100 = 120
2004   620              (620 / 450) * 100 = 137.78
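
Price relatives against a fixed base year are a one-line calculation once the prices are in a table. The sketch below uses the meat prices of Example 2; the dictionary and variable names are illustrative only.

```python
# Unit price of meat (naira) by year, from Example 2
prices = {2000: 450, 2001: 430, 2002: 480, 2003: 540, 2004: 620}

base_year = 2000
base_price = prices[base_year]

# Price relative = (Pn / P0) * 100, rounded to two decimal places
price_relatives = {year: round(price / base_price * 100, 2)
                   for year, price in prices.items()}

for year, relative in price_relatives.items():
    print(year, relative)
```

Changing base_year to another year in the table re-bases every relative without any other change.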

Quantity/volume relative = (Q1 / Q0) * 100

Where Q1= quantity in the current year

Q0 = quantity in the base year

Example 3- The table below shows the production figures in thousand tonnes of three agricultural
crops for the years 2007 and 2009. Calculate the quantity relative using 2007 as the base year.

             Production in (000) tonnes
Crop         2007 (Q0)   2009 (Q1)   Q.R = (Q1 / Q0) * 100
Groundnut    650         710         109.23
Soya beans   138         130          94.20
Palm oil     720         794         110.28

Value relative = (P1Q1 / P0Q0) * 100 = (V1 / V0) * 100

Example 4- The table below gives the production figures and the unit prices of the items in the
previous table. Using 2007 as the base year, calculate the value relative.

             Production in (000) tonnes   Unit price in (₦ ‘000’)
Crop         2007 (Q0)   2009 (Q1)   2007 (P0)   2009 (P1)   P0Q0    P1Q1    Value relative = (P1Q1 / P0Q0) * 100
Groundnut    650         710         22          26          14300   18460   129.1
Soya beans   138         130         15          20           2070    2600   125.6
Palm oil     720         794         16          18          11520   14292   124.06

Aggregate indices (simple and weighted)

Simple aggregative price index (SAPI) = (∑P1 / ∑P0) * 100

Example 5- Using the table below, calculate the SAPI

Food item      1990 (P0)   1995 (P1)
Meat (kg)      185         230
Milk (litre)    58          70
Egg (ten)      120         142
Bread (loaf)    50          80
Total          413         522

SAPI = (∑P1 / ∑P0) * 100 = (522 / 413) * 100 = 126.39

Simple aggregative quantity index (SAQI) = (∑Q1 / ∑Q0) * 100

Example 6- Using the table below, calculate the SAQI

             Production in (000) tonnes
Crop         2007 (Q0)   2009 (Q1)
Groundnut    650         710
Soya beans   138         130
Palm oil     720         794

SAQI = (∑Q1 / ∑Q0) * 100 = ((710 + 130 + 794) / (650 + 138 + 720)) * 100 = (1634 / 1508) * 100 = 108.36

Simple aggregative value index (SAVI) = (∑P1Q1 / ∑P0Q0) * 100

Using the table below, calculate the SAVI

Production in (000) tonnes
2007 (Q0)   2009 (Q1)   2007 (P0)   2009 (P1)   P1Q1    P0Q0
650         710         22          26          18460   14300
138         130         15          20           2600    2070
720         794         16          18          14292   11520
Total                                           35352   27890

SAVI = (∑P1Q1 / ∑P0Q0) * 100 = (35352 / 27890) * 100 = 126.76

WEIGHTED AGGREGATIVE INDEX NUMBERS

Some of the important methods of constructing weighted aggregative index numbers include;

1. Laspeyres’ price index (LPI)

P01 = (∑P1Q0 / ∑P0Q0) * 100

2. Paasche’s price index (PPI)

P01 = (∑P1Q1 / ∑P0Q1) * 100

3. Kelly’s price index (KPI)

P01 = (∑P1Q / ∑P0Q) * 100, where Q = (Q0 + Q1) / 2

4. Laspeyres’ quantity index (LQI)

Q01 = (∑Q1P0 / ∑Q0P0) * 100

5. Paasche’s quantity index (PQI)

Q01 = (∑Q1P1 / ∑Q0P1) * 100

6. Kelly’s quantity index (KQI)

Q01 = (∑Q1P / ∑Q0P) * 100, where P = (P0 + P1) / 2

Example 7- Compute Laspeyres’, Paasche’s and Kelly’s price index numbers for 2005 from the
following data.

             2000                   2005
Commodity    Price (Rs)  Quantity   Price  Quantity
A            15          15         22     12
B            20           5         27      4
C             4          10          7      5

Solution- Let Q = (Q0 + Q1) / 2

Commodity   P0   Q0   P1   Q1   P0Q0   P1Q1   P1Q0   P0Q1   Q0+Q1   Q      P1Q     P0Q
A           15   15   22   12   225    264    330    180    27      13.5   297.0   202.5
B           20    5   27    4   100    108    135     80     9       4.5   121.5    90.0
C            4   10    7    5    40     35     70     20    15       7.5    52.5    30.0
Total                           365    407    535    280                   471.0   322.5

LP1 = (∑P1Q0 / ∑P0Q0) * 100 = (535 / 365) * 100 = 146.58

PP1 = (∑P1Q1 / ∑P0Q1) * 100 = (407 / 280) * 100 = 145.36

KP1 = (∑P1Q / ∑P0Q) * 100 = (471 / 322.5) * 100 = 146.05
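
The three weighted price indices differ only in the quantities used as weights, which a single helper function makes explicit. This is a sketch under that reading: the function name weighted_price_index is an assumption, and the data are the prices and quantities of commodities A, B and C in Example 7.

```python
def weighted_price_index(current_prices, base_prices, weights):
    """(sum of P1*W / sum of P0*W) * 100 for a chosen set of quantity weights W."""
    numerator = sum(p1 * w for p1, w in zip(current_prices, weights))
    denominator = sum(p0 * w for p0, w in zip(base_prices, weights))
    return numerator / denominator * 100


p0 = [15, 20, 4]    # base-year prices
q0 = [15, 5, 10]    # base-year quantities
p1 = [22, 27, 7]    # current-year prices
q1 = [12, 4, 5]     # current-year quantities

laspeyres = weighted_price_index(p1, p0, q0)   # base-year weights
paasche = weighted_price_index(p1, p0, q1)     # current-year weights
average_q = [(a + b) / 2 for a, b in zip(q0, q1)]
kelly = weighted_price_index(p1, p0, average_q)   # average-quantity weights

print(round(laspeyres, 2), round(paasche, 2), round(kelly, 2))
```

Swapping the roles of prices and quantities in the call gives the corresponding quantity indices.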

LINK RELATIVE (L.R) (CHAIN INDEX)

L.R = (Price in the current year / Price in the previous year) * 100

L.R = (P1 / P0) * 100, where P1 = price in the current year and P0 = price in the previous year

OR L.R = (Price relative in the current year / Price relative in the previous year) * 100

Example 8-. From the fixed base index numbers given below. Calculate the link relative

Year   Price relative   Link relative
2000   188              (188 / 188) * 100 = 100
2001   199              (199 / 188) * 100 = 105.85
2002   204              (204 / 199) * 100 = 102.51
2003   190              (190 / 204) * 100 = 93.14
2004   196              (196 / 190) * 100 = 103.16
2005   200              (200 / 196) * 100 = 102.04
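
Link relatives compare each year with the year immediately before it, so they can be generated by pairing the series with itself shifted by one. The sketch below uses the fixed-base index numbers of Example 8 and, as in the table, takes the first year as 100; the variable names are illustrative.

```python
years = [2000, 2001, 2002, 2003, 2004, 2005]
price_relatives = [188, 199, 204, 190, 196, 200]

# First year is compared with itself; each later year with its predecessor.
link_relatives = [100.0] + [
    round(current / previous * 100, 2)
    for previous, current in zip(price_relatives, price_relatives[1:])
]

for year, link in zip(years, link_relatives):
    print(year, link)
```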

Assignment (1)- The following data relate to the price of rice per kg in different years.
Year          1998  1999  2000  2001  2002  2003  2004  2005
Price in Rs      6     7     7     8    10    14    12    13
Find the price relatives

1. Taking 1998 as base
2. Taking 2002 as base
3. Taking the average of 1998, 1999 and 2000 as the base
4. Calculate the link relatives

Assignment (2)- Using Example 7 above on weighted aggregative index numbers, calculate
LQI, PQI and KQI.

Assignment (3)- For the following, compute Laspeyres’ and Kelly’s price and quantity index
numbers for 2005 with 1995 as the base year.

             1995                2005
Commodity    Quantity   Value    Quantity   Value
A             50        350       60        420
B            120        600      140        700
C             30        330       20        200
D             20        360       15        300
E              5         40        5         50

Hint: V0 = P0Q0, V1 = P1Q1

RATIO

A ratio is a comparison of two numbers (two related quantities). It can be written in many ways.
For example, the ratio of 5 to 9 can be expressed as: 5 to 9, 5 : 9, or 5/9.
A ratio compares quantities of the same kind with the same units in a definite order. The ratio itself
has no units.
In a class, there are 15 girls, 20 boys and 1 teacher.
The ratio of girls to boys is 15:20 or 15 to 20.
The ratio of teachers to students is 1:35 or 1 to 35.
Ratios containing two quantities can be written as a fraction. Once the ratio is
written as a fraction it is then possible for it to be written as a decimal and
percentage.
Ratio   Fraction   Decimal                                         Percentage
1:4     1/4        1/4 = (1 * 25) / (4 * 25) = 25/100 = 0.25       0.25 * 100 = 25%

When writing ratios, it is expected that they will be written in their simplest form containing whole
numbers. Remember ratios can be written as fractions and simplifying fractions ideas are used to
simplify ratios.
Examples:

Simplify 12:20
To simplify, the Highest Common Factor of 12 and 20 is required. Remember the HCF is the highest
number that goes into 12 and 20, the HCF is 4. If the common factor used is not the HCF, then
simplifying may take two or more stages.
=12÷4:20÷4 = 3:5

Simplify 35:75
In this question the HCF is 5.
=35÷5:75÷5 = 7:15

Simplify 300mm:1.2m
In this question the units are different, so the first step is make the units the same. Generally, it is
better to express both in the smaller unit. (1.2m = 1.2 x 1000 =1200 mm)
=300:1200 =300÷300:1200÷300 = 1:4
Simplify 0.5:1.25
Because both numbers are expressed as decimals, multiplying both sides by 10, 100, 1000, etc is
required to make both whole numbers. The number with the most decimal place is 1.25, that is 2
decimal places, multiplying by 100 is required to remove decimals.
=0.5×100:1.25×100 = 50:125
Now the HCF of 50 and 125 is 25, the ratio can be simplified.
=50:125 = 50÷25:125÷25 = 2:5

Simplify ₦5.00 is to 25k
₦5.00 = 5 * 100 = 500k (converting naira to kobo to have the same unit)
500k : 25k = 500/25 = 20/1 = 20 : 1

Further Examples

1 Divide ₦200 between A and B in the ratio 2 : 3. Find their shares.

Solution
Total share = 2 + 3 = 5
Each share will be ₦200 / 5 = ₦40
Share for A = 2 * ₦40 = ₦80. Share for B = 3 * ₦40 = ₦120
OR For A = (2/5) * 200 = ₦80, For B = (3/5) * 200 = ₦120

2 If 2 : 5 = 120 : x, calculate x
Solution: 2/5 = 120/x
2x = 120 * 5, so x = (120 * 5) / 2 = 300

3 The ages of three men are in the ratio 3 : 4 : 5. If the difference between the ages of the
oldest and youngest is 18 years, find the sum of the ages of the three men.
Solution
Total ratio = 3 + 4 + 5 = 12
Let the sum of their ages be x
The oldest = 5x/12, the youngest = 3x/12
5x/12 - 3x/12 = 18, so (5x - 3x)/12 = 18
2x = 18 * 12 = 216, x = 216/2 = 108
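
Simplifying a ratio by the HCF and sharing an amount in a given ratio can both be sketched with Python's standard library. The function names simplify_ratio and share_in_ratio are illustrative assumptions; math.gcd returns the highest common factor and fractions.Fraction keeps the shares exact.

```python
from fractions import Fraction
from math import gcd


def simplify_ratio(a, b):
    """Simplify a : b for whole numbers by dividing both terms by their HCF."""
    hcf = gcd(a, b)
    return a // hcf, b // hcf


def share_in_ratio(total, parts):
    """Split `total` in the ratio given by `parts`, e.g. [2, 3] for 2 : 3."""
    whole = sum(parts)
    return [Fraction(total * part, whole) for part in parts]


print(simplify_ratio(12, 20))      # whole numbers
print(simplify_ratio(300, 1200))   # 300 mm : 1.2 m, both expressed in mm
print(simplify_ratio(50, 125))     # 0.5 : 1.25 after multiplying both by 100
print(share_in_ratio(200, [2, 3])) # shares for A and B
```

Decimals and mixed units still have to be converted to whole numbers in a common unit first, exactly as in the worked examples.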

Exercises

1. Complete this table

Ratio   Fraction   Decimal   Percentage
1:5
        7/8
                   0.65
                             240%

2. Divide ₦420 among Ada, Olu and Femi such that Ada has twice as much as Olu and Olu
twice as much as Femi.

FURTHER EXERCISES
Find the definition/meaning of the following:-
(1) Variable (2) Random Variable (3) Event (4) parameter
(5) Sample Space (6) Sample (7) Statistic

NIGERIAN STATISTICAL SYSTEM (NSS)

What is a system?
A system is a network of things, agents etc. connected or working towards a given
objective/output.

Overview of the Nigerian Statistical System

According to the 2006 population census, Nigeria is home to over 140 million people on an
area of approximately 923,768 sq. kilometres. The country operates a federal system of
government with thirty-six (36) states and the Federal Capital Territory (FCT). The entire
country is further sub-divided into six (6) geopolitical zones and seven hundred and seventy four
(774) Local Government Areas (LGAs). With over 500 ethnic groups, over 250
languages and over 20 million households, Nigeria is well diversified in culture and rich in
human resources. Nigeria attained independence in 1960 and Republic status in 1963. Its
political development has been characterized by instability, with nearly three (3) decades of military
dictatorship, but the country has enjoyed uninterrupted democratic rule since May 29, 1999.

The current statistical system is made up of the following:

1. The institutions involved in data production and compilation. These include:
National Bureau of Statistics (NBS), Central Bank of Nigeria (CBN), National
Population Commission, Department of Planning, Research and Statistics (DPRS) of
Ministries and Parastatals, the State Statistical Agencies (SSAs), Budget and Planning
of Local Government Councils among others.

2. The data users or the clientele of data production systems. This group demands and
utilizes statistical products and services. The members are quite diverse in nature and
the number is large and ever increasing. Some of them include: Policy and decision
makers in government ministries and institutions, Politicians (e.g. Members of the
national and state assemblies, Political parties, etc), Researchers, Academicians, Non
Governmental Organisations (NGOs), Donor Community, International
Organisations, the Media and the general public.

3. Those who cooperate and supply data to the data collecting agencies popularly called
data suppliers. These are mainly individuals, groups, households and establishments.

4. The research and training institutions. Among others, the research institutions include the
Nigerian Institute of Social and Economic Research (NISER), the Centre for Econometric
and Allied Research (CEAR) and the Federal Institute of Industrial Research, Oshodi
(FIIRO). The training institutions include the Federal School of Statistics (FSS) and
the Departments of Statistics in Nigerian tertiary institutions, particularly the Department
of Statistics in the Nigerian Premier University, University of Ibadan.

The objectives of the Nigerian statistical system as stipulated in the Statistics Act 2007 are
to:
1. Raise public awareness about the importance and role of statistical information to society;
2. Collect, process, analyze and disseminate quality statistical data;
3. Promote the use of best practice and international standards in statistical production,
management and dissemination;
4. Promote the use of statistical data and information at individual, institutional, Local
Government Area, State, National and International levels, especially for evidence-based
policy design and decision-making;
5. Build sustainable capacity for the production and use of statistical data and information in
the country for planning purposes.

Uses of statistics
Statistical information is needed for:

1. Evidence-based policy formulation (Vision 20:2020)
2. Programme planning and design
3. Project execution
4. Monitoring and evaluation of government interventions
5. Design of production and marketing strategies in the formal and informal sectors

Major challenges
1. Inadequate funding
2. Slow pace of statistical development at the sub-national level
3. Inadequate competent staff at the sub-national level
4. Apathy towards statistics
5. Overlapping Roles Among Agencies
6. Problems with statistical outputs

Recommendations

1. Improved funding by both Governments and Development Partners
2. Effective implementation of the statistical strategy
3. Intensive advocacy on the importance of statistics

Solution to Exercise 2 (dividing ₦420 among Ada, Olu and Femi);
Let Femi’s share be x
Olu will be 2x
Ada will be 4x
x + 2x + 4x = 7x
Ada = (4x / 7x) * 420 = ₦240
Olu = (2x / 7x) * 420 = ₦120
Femi = (x / 7x) * 420 = ₦60
OR
7x = 420, x = 60
Femi = x = ₦60
Olu = 2x = 60 * 2 = ₦120
Ada = 4x = 60 * 4 = ₦240
