Sta 111 Note
COURSE OUTLINE
INTRODUCTION
MEANING OF STATISTICS
Statistics is a branch of knowledge that deals with the scientific methods used in the proper
collection, presentation, analysis and dissemination of information that can be expressed in
numerical form. In the plural sense, statistics means a set of numerical data.
In the singular sense, statistics means the science of statistical methods, embodying the theory and
techniques used for collecting, analyzing and drawing inferences from numerical data.
1. Descriptive (applied) statistics- This deals with methods of describing and illustrating a mass of data
in order to provide more precise information that can be readily assimilated
or used in decision making.
Business statistics are descriptive applied statistics as they deal with the analysis, measurement
and presentation of business facts relating to past or present on which certain decisions about
various business problems are usually taken.
USES OF STATISTICS
1 Decision making
2 Planning
3 Routine administration
4 Forecasting
5 Quality control
DATA are arranged information or facts. The information collected through censuses and surveys,
in a routine manner, or from other sources is called raw data.
There are 2 types of statistical data namely: Primary and Secondary data.
1. Primary Data- These are those data collected by a particular person or organization for
his own use from the primary source.
2. Secondary Data- These are those data collected by some other person or organization for
their own use, but the investigator also gets it for his use.
Sources of Data
1. Primary sources- When an investigator plans and collects the needed information at first
hand (in person), the data so collected are called primary data and the source a primary
source.
Advantages
Disadvantages
Secondary sources- When the bulk of the statistical data is obtained from existing sources,
published or unpublished (for example, government agencies, newspapers and
research institutes), we refer to such data as secondary data and the sources as secondary sources.
Advantages
Disadvantages
1. Reliability- Primary data are usually more reliable and meaningful than secondary
data.
2. Greater details- The primary data are usually more comprehensive and also show
data in greater details.
3. The primary source frequently includes definition of terms and units used in data
collection.
1. The use of documentary source- Data on events relating to the past are often obtained
from documentary source such as:
A Sources of information about the survey population, for example, records of Nigerian
Medical Association.
B Sources of information about individual units of the population, for example, employment
records.
C Personal documents- Those documents which come directly from the informants, for
example, Diaries, letters, essays etc.
2 The observation method- The systematic method of data collection in which the events are
watched as they occur naturally and notes are taken. Here the required information is
obtained directly rather than through reports of others. It can fairly be referred to as the
classic method of scientific investigation.
The interview method- An interview is a conversation between two persons, the interviewer and
the respondent with the aim of eliciting certain information from the respondent. The interviewer
is the person asking the question, while the respondent is the person doing the answering. An
ideal interviewer is expected to have the following personal qualities:
1 Honesty
2 Interest
3 Accuracy
4 Adaptability
5 Intelligence
6 Tactfulness
1 High rate.
2 Instant response.
3 Errors could be corrected.
4 It is useful in developing countries where a large part of the population can neither
read nor write.
Disadvantages
1 High cost
2 Interviewer bias or errors
3 No room for consultation
The mail questionnaire- This method of data collection involves sending a set of well-worded
questions on the subject under investigation by post to carefully selected
respondents.
A Clear
C Corroborative
D Non- offending
E courteous in tone
F unambiguous
PRESENTATION OF DATA
When the data are presented in tables, charts or graphs in order to bring out their salient features,
it is called the presentation of data.
The raw data can be arranged in the following ways:
1. Serial order
Roll No. Marks
1 41
2 25
3 5
4 33
5 12
6 21
7 19
8 39
9 19
10 21
2. Ascending order- 5, 12, 19, 19, 21, 21, 25, 33, 39, 41
3. Descending order- 41, 39, 33, 25, 21, 21, 19, 19, 12, 5
Note- when raw data are put in ascending or descending order of magnitude, the result is called an array.
A tabular arrangement of data by classes together with the corresponding class frequencies is
called frequency distribution or frequency table.
Example 1- The following are ages of some students in 100 level statistics: 16, 20, 15, 25, 30, 15,
25, 20, 20, 25, 15, 25, 20, 30, 20, form the frequency table for the given data.
Age     Tally    Frequency
15      |||      3
16      |        1
20      |||||    5
25      ||||     4
30      ||       2
Total            15
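The tallying in Example 1 can be checked with a short script. This is only a sketch: the function name `frequency_table` is made up for illustration, and the ages are the fifteen values listed in the example.

```python
from collections import Counter

# Ages of the 100 level students from Example 1.
ages = [16, 20, 15, 25, 30, 15, 25, 20, 20, 25, 15, 25, 20, 30, 20]

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_table(ages)
for value, freq in table:
    # One tally stroke is printed per observation.
    print(f"{value:>3}  {'|' * freq:<6} {freq}")
print("Total", sum(freq for _, freq in table))
```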
The ungrouped frequency distribution is not helpful when the data set is very large or when the
values have a large spread. For such data, the table will be too long or, at
times, impractical to construct.
Instead of using single values, the grouped frequency distribution method groups values
together. The frequency of each group is then matched with it to form the desired table.
Class interval- this refers to the length of interval of a class. Example, 1-5, 6-10, 11-15, 16-20,
etc.
Class size or class width-denoted by “i” is the number of elements in a class. From above, class
size is 5.
Class limits- are the extreme scores, that is, lower and upper limits. In the example above 1-5,
the lower class limit is 1 and the upper class limit is 5.
Class boundary- to get this, subtract ½ (0.5) from the lower limit of each class and add ½ (0.5) to
the upper of each class, example, the class 6-10 will be 5.5-10.5, 11-15 will be 10.5-15.5.
Example 2
Form the grouped frequency distribution for the data given below using class size (i) = 3
9 15 12 14 17 20 18 19 23
13 10 15 16 18 21 17 22 24
11 14 17 18 19 20 19 22 24
11 12 18 18 20 21 18 17 25
13 14 12 14 17 16 15 14 23
NOTE- the number of class intervals is given by
\[ \text{number of class intervals} = \frac{\text{range}}{\text{class size}} \]
Solution:
Class    Frequency
9-11     4
12-14    10
15-17    10
18-20    12
21-23    6
24-26    3
Total    45
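The grouping in Example 2 can be reproduced mechanically. The sketch below assumes whole-number data and classes that start at the smallest value (9) with class size 3; the helper name `grouped_frequency` is an assumption, not part of the notes.

```python
# The 45 observations of Example 2, row by row.
data = [
    9, 15, 12, 14, 17, 20, 18, 19, 23,
    13, 10, 15, 16, 18, 21, 17, 22, 24,
    11, 14, 17, 18, 19, 20, 19, 22, 24,
    11, 12, 18, 18, 20, 21, 18, 17, 25,
    13, 14, 12, 14, 17, 16, 15, 14, 23,
]

def grouped_frequency(values, start, width):
    """Count integer values per class (lower limit, upper limit)."""
    counts = {}
    for v in values:
        k = (v - start) // width          # index of the class containing v
        lower = start + k * width         # lower class limit
        label = (lower, lower + width - 1)
        counts[label] = counts.get(label, 0) + 1
    return dict(sorted(counts.items()))

grouped = grouped_frequency(data, start=9, width=3)
for (lower, upper), freq in grouped.items():
    print(f"{lower}-{upper}: {freq}")
```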
\[ \text{Relative frequency} = \frac{\text{Class frequency}}{\text{Total frequency}} \]
Example 3.
Form a cumulative and relative frequency table for the data below:
X 6 7 8 9 10
F 5 9 6 8 2
Solution :
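X        F     Cumulative frequency    Relative frequency
6        5     5                       5/30 = 0.167
7        9     14                      9/30 = 0.300
8        6     20                      6/30 = 0.200
9        8     28                      8/30 = 0.267
10       2     30                      2/30 = 0.067
Total    30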
Assignment 1- The heights (in cm) of 40 persons are:- 110, 112, 125, 135, 150, 155, 152, 150,
159, 130, 128, 138, 133, 143, 147, 151, 154, 156, 112, 116, 119, 111, 113, 115, 118, 121, 123,
120, 125, 121, 110, 113, 114, 149, 153, 155, 150, 155, 152, 111. Array the data and form a
cumulative frequency table with a class interval of 10 {110-120, 120-130, …}. Also compute the
relative frequencies.
DIAGRAMMATIC AND GRAPHICAL PRESENTATION OF DATA
1 Line diagram
2 Bar charts
3 Pie charts
4 Pictogram
5 Histogram
6 Frequency polygon
7 Cumulative frequency curve
Line diagram (line graph): This is the simplest of all diagrams. Note that in graphs, the vertical
axis is always the y-axis while the horizontal axis is always the x-axis.
Example 1
Plot a line graph for the following set of data given below.
X 6 7 8 9 10
F 5 9 6 8 2
Bar charts- a bar chart is a series of rectangular bars, drawn either vertically or horizontally for
each class with the height of each bar being proportional to the magnitude of its frequency. The
bar and pie charts are usually suitable for presenting qualitative data (discrete).
1 Simple bar chart- These are blocks or rectangular diagrams in which single bars are used
to represent the frequency of each category.
Example 2- Below are data showing the expenditure (₦’000) of a family in 2019. Use the
information to illustrate the preparation of a simple bar chart, component bar chart, multiple bar
chart and pie chart.
Period
Item Jan-April May-Aug. Sept-Dec. Total
Food 330 190 425 945
Clothing 65 85 175 325
Rent 170 170 170 510
Medi care 75 100 45 220
Others 100 125 175 400
Solution;
2 Component bar chart- this is a chart that shows part of a whole. Each bar is subdivided
such that the ratio of each subdivision indicates its relationship to the total.
3 Multiple bar chart- In this case the bars are shown in multiples, each component value is
shown as a bar of its own joining to another.
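Pie chart- the angle of each sector is the item's share of the total expenditure (2400) multiplied by 360°.
\[ \text{Food} = \frac{945}{2400} \times 360^\circ \approx 142^\circ \ (39\%) \]
\[ \text{Clothing} = \frac{325}{2400} \times 360^\circ \approx 49^\circ \ (14\%) \]
\[ \text{Rent} = \frac{510}{2400} \times 360^\circ = 76.5^\circ \ (21\%) \]
\[ \text{Medicare} = \frac{220}{2400} \times 360^\circ = 33^\circ \ (9\%) \]
\[ \text{Others} = \frac{400}{2400} \times 360^\circ = 60^\circ \ (17\%) \]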
[Figure: a pie chart of the family's expenditure with sectors for Food, Clothing, Rent, Medicare and Others]
Compound bar chart- these are charts used to facilitate comparison.
Example – The table below represents the birth and death rate in Imo state from 2002 to 2007
[Figure: compound bar chart of birth rate and death rate; vertical axis from 0 to 120, horizontal axis labelled 1 to 6]
Pictogram (pictograph)- This simply means the use of pictures to represent statistical data.
Pictogram helps in quick visualization of comparison.
For example, a pictogram might be used to represent the number of 100 level students in
MOUAU.
Where one picture symbol = 1000 students:
2012 = 3 symbols = 3000
2013 = 5 symbols = 5000
2014 = 2 symbols = 2000
Example: the table below shows the distribution of 40 students by weight.
Solution-
Frequency polygon- This is obtained by plotting the frequency against the class mark, or by joining the mid-points of the tops of
the bars of the histogram.
Cumulative frequency curve (Ogive)- Here we plot the cumulative frequency against the upper
class boundaries.
[Figure: an ogive]
Assignment. (1) The following figures show how many people visited a cinema each day for 50
days.
32 39 30 60 53 54 35 51 13 42
36 43 44 44 38 39 52 25 30 58
28 27 31 44 29 46 49 42 47 51
43 34 52 50 39 53 51 20 38 20
54 33 35 45 51 59 19 28 34 25
A Using the intervals of 11-20, 21-30 etc, arrange the figures in classes and obtain a
frequency distribution.
B (i) Draw a histogram. (ii) Draw the frequency polygon. (iii) Obtain the cumulative
frequency curve for the distribution.
(2) 120 students were asked what their favourite course is, their replies are as follows.
The word error refers to the difference between the true value of a population parameter and its
Statistical device.
Errors are classified into 2 groups namely sampling and non-sampling errors
Sampling errors- these errors occur because only a subset of the population (a sample)
has been used to estimate the population parameters and draw inferences about the
population. Note that sampling error is present only in sample surveys and is completely absent in
the census method.
4 Error due to bias in the estimation method- in simple random sampling, if $x_1, x_2, \dots, x_n$ are
observations on the n sampled units, then the sample variance
\[ S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2 \]
is a biased estimate of the population variance $\sigma^2$, while an unbiased estimate of $\sigma^2$ is
given by
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 . \]
5 Variability of the population.
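The difference between the two variance estimates in item 4 can be seen numerically. The sketch below uses a small made-up sample; the function name `sample_variance` and the data are assumptions for illustration only.

```python
def sample_variance(values, unbiased=False):
    """Sum of squared deviations divided by n (biased) or n - 1 (unbiased)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if unbiased else n
    return squared_deviations / divisor

sample = [3, 4, 5, 6]  # hypothetical sample of n = 4 observations
biased = sample_variance(sample)
unbiased = sample_variance(sample, unbiased=True)
print(biased, unbiased)
```

The divisor n - 1 always gives the larger value, and the gap shrinks as n grows.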
Non-sampling errors- These errors occur due to causes which can be traced and may arise at any
stage of the enquiry. The factors causing such errors are within human control. Non-sampling
errors are present in both census and sample surveys.
Biased errors are also known as cumulative errors. The magnitude of biased errors is directly
proportional to the number of observations.
Unbiased errors- errors are said to be unbiased if the chance of making an over-estimate is
almost the same as the chance of making an under-estimate. For example, rounding to the nearest
hundred gives 380 ≈ 400 and 415 ≈ 400. The magnitude of the unbiased error is inversely proportional to
the number of items.
Measure of statistical errors (Absolute and Relative errors)
Absolute error- an absolute error (A.E) is the difference between the true value of any particular
observed item or variable and its estimated or approximated values.
Symbolically it is written as A.E = │a-e│. where a is the actual value and e is the estimated value
and │a-e│ represents the modulus value of (a-e).
Relative error (R.E)- this is the ratio of the absolute error to the actual value.
Symbolically, it is written as
\[ R.E = \frac{A.E}{\text{Actual value}} = \frac{|a - e|}{a} \]
From the above example,
\[ R.E = \frac{12{,}650}{54{,}87{,}350} = 0.0023 \]
or, as a percentage,
\[ R.E = \frac{12{,}650}{54{,}87{,}350} \times 100 = 0.23\% \]
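The two error measures can be sketched in a few lines. The actual value 54,87,350 (that is, 5487350) and the absolute error 12,650 come from the notes; the estimate 5500000 is an assumption chosen only because it reproduces that absolute error.

```python
def absolute_error(actual, estimate):
    """A.E = |a - e|."""
    return abs(actual - estimate)

def relative_error(actual, estimate):
    """R.E = |a - e| / a."""
    return absolute_error(actual, estimate) / actual

actual = 5_487_350     # actual value quoted in the notes
estimate = 5_500_000   # hypothetical estimate, assumed for illustration

ae = absolute_error(actual, estimate)
re = relative_error(actual, estimate)
print(ae, round(re, 4), round(re * 100, 2))
```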
(2) Distinguish between sampling and non-sampling errors (at least 10 differences) in a tabular
form.
5+8+7+10+ 2 50
Solution- X̅¿ = =5
5 5
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} F_i X_i}{\sum_{i=1}^{n} F_i} \]
X     6     7     8     9     10    Total
F     5     9     6     8     2     30
FX    30    63    48    72    20    233
\[ \bar{X} = \frac{\sum_{i=1}^{n} F_i X_i}{\sum_{i=1}^{n} F_i} = \frac{233}{30} = 7.77 \]
Class        51 – 55   56 – 60   61 – 65   66 – 70   71 – 75   76 – 80   81 – 85
Frequency    3         5         9         11        5         5         2
Solution-
Class      Frequency    Xi    fX
51 – 55    3            53    159
56 – 60    5            58    290
61 – 65    9            63    567
66 – 70    11           68    748
71 – 75    5            73    365
76 – 80    5            78    390
81 – 85    2            83    166
Total      40                 2685
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{2685}{40} = 67.1 \]
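The grouped-data mean above can be checked with a short sketch that uses the class marks (mid-points of the class limits). The helper name `grouped_mean` is hypothetical.

```python
# Class limits and frequencies from the worked example.
classes = [(51, 55), (56, 60), (61, 65), (66, 70), (71, 75), (76, 80), (81, 85)]
freqs = [3, 5, 9, 11, 5, 5, 2]

def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * class mark) / sum(f)."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    return total_fx / sum(freqs)

mean = grouped_mean(classes, freqs)
print(round(mean, 1))
```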
COMPUTATION OF MEAN USING THE ASSUMED MEAN.
The method is useful when the sizes of the numbers in the distribution are fairly large.
For ungrouped data the formula is given by;
\[ \bar{X} = A + \frac{\sum_{i=1}^{n} (x_i - A)}{n} = A + \frac{\sum_{i=1}^{n} V_i}{n} \]
For a set of grouped data, the procedure for computing the mean is as follows
i. Choose the Assumed mean (A) from the class mark, preferably the middle number.
ii. Find the deviations (xi – A) of each class mark xi from the assumed mean, that is
Vi =xi – A.
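iii. Find the products $f_i v_i$ and the sum $\sum f_i v_i$.
iv. Obtain the correction factor $\frac{\sum f_i v_i}{\sum f_i}$.
v. Add the correction factor to the assumed mean to obtain the mean of the distribution, using the formula
\[ \bar{X} = A + \frac{\sum_{i=1}^{n} f_i (x_i - A)}{\sum_{i=1}^{n} f_i} = A + \frac{\sum f_i V_i}{\sum f_i} \]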
Example 1- Ten coins were tossed together and the number of tails obtained was
observed. The operation was performed 1050 times, and the frequencies observed for the
different numbers of tails (x) are shown in the following table. Calculate the arithmetic mean using the
assumed mean method.
X 0 1 2 3 4 5 6 7 8 9 10
F 2 8 43 133 207 260 213 120 54 9 1
Solution –
Let 5 be the assumed mean. That is, A = 5
X F V=x-5 Fv
0 2 -5 -10
1 8 -4 -32
2 43 -3 -129
3 133 -2 -266
4 207 -1 -207
5 260 0 0
6 213 1 213
7 120 2 240
8 54 3 162
9 9 4 36
10 1 5 5
Total    ∑f = 1050    ∑fv = 12
From the formula,
\[ \bar{X} = A + \frac{\sum f_i V_i}{\sum f_i} = 5 + \frac{12}{1050} = 5 + 0.0114 = 5.0114 \]
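The assumed mean calculation for the coin example can be sketched as follows. The function name `assumed_mean` is an assumption; the data are the tails counts and frequencies from the table.

```python
tails = list(range(11))  # x = 0, 1, ..., 10
freqs = [2, 8, 43, 133, 207, 260, 213, 120, 54, 9, 1]

def assumed_mean(values, freqs, A):
    """Mean = A + sum(f * (x - A)) / sum(f)."""
    sum_fv = sum(f * (x - A) for x, f in zip(values, freqs))
    return A + sum_fv / sum(freqs)

mean = assumed_mean(tails, freqs, A=5)
print(round(mean, 4))
```

Any other choice of A gives the same mean; a value near the centre only keeps the deviations small.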
Example 2 The number of days spent by patients in hospital after being operated upon is
shown in the following table.
Days Number of patients (f)
1-5 6
6-10 38
11-15 60
16-20 25
21-25 16
26-30 3
31-35 2
Calculate the mean using the assumed mean method.
Solution-
Using the formula with A = 18,
\[ \bar{X} = A + \frac{\sum FV}{\sum F} = 18 + \frac{-630}{150} = 18 - 4.2 = 13.8 \]
Using this method, the mean is defined as
\[ \bar{X} = A + \left( \frac{\sum_{i=1}^{n} F_i U_i}{\sum_{i=1}^{n} F_i} \right) W \]
Where- W = class width
F = frequency
A = Assumed mean (mid class mark)
\[ U_i = \frac{X_i - A}{W} \]
Xi = class mark
Using the data of example 2 above calculate the mean using the coding method.
Solution- using A = 18, W = 5
Xi Vi= (X i –A) F Ui F iU i
3 -15 6 -3 -18
8 -10 38 -2 -76
13 -5 60 -1 -60
18 0 25 0 0
23 5 16 1 16
28 10 3 2 6
33 15 2 3 6
Total 0 150 0 -126
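\[ \bar{X} = A + \left( \frac{\sum_{i=1}^{n} F_i U_i}{\sum_{i=1}^{n} F_i} \right) W = 18 + \frac{-126}{150} \times 5 = 18 - 4.2 = 13.8 \]
Note that here the deviations from the assumed mean V i = (Xi-A) and the coded values Ui sum to
zero because A is the middle class mark of equally spaced classes; this does not hold for every choice of A.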
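The coding method for the hospital-stay data can be sketched as below, with A = 18 and W = 5 as in the solution. The helper name `coded_mean` is hypothetical.

```python
# Class limits (days) and number of patients from Example 2.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [6, 38, 60, 25, 16, 3, 2]

def coded_mean(classes, freqs, A, W):
    """Mean = A + (sum(f * u) / sum(f)) * W, where u = (class mark - A) / W."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    coded = [(x - A) / W for x in marks]
    sum_fu = sum(f * u for f, u in zip(freqs, coded))
    return A + (sum_fu / sum(freqs)) * W

mean = coded_mean(classes, freqs, A=18, W=5)
print(round(mean, 1))
```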
Median(Xm)- For a set of ungrouped data, the median is the middle term, or the mean of the two
middle values if the number of observations is even. In other words, the median is that value which
divides the distribution into two equal parts. To calculate the median, first arrange the data in
ascending or descending order of magnitude.
For example,
A 13, 14, 15, 16, 16, 18, 19. The median is = 16
B 13, 14, 15, 15, 16, 16, 18, 19. The median is = 15.5
Median of grouped data
\[ \text{Median} = L_m + \left( \frac{\frac{N}{2} - C_b}{F_m} \right) W \]
Where- Lm = lower class boundary of the median class
N/2 = the median term
Cb = cumulative frequency of the class before the median class
W = width of the median class
Fm = frequency of the median class
Solution-
\[ \text{Median} = L_m + \left( \frac{\frac{N}{2} - C_b}{F_m} \right) W \]
Median class = 8 – 10
N = 20, N/2 = 20/2 = 10
\[ \text{Median} = 7.5 + \frac{10 - 5}{8} \times 3 = 7.5 + \frac{5}{8} \times 3 = 7.5 + \frac{15}{8} = 7.5 + 1.875 = 9.375 \]
X 1 2 3 4 5
F 2 6 7 3 2
MODE (X0)- The mode is the value or score with the highest frequency in
a distribution. The mode may not exist, and even when it exists, it may not be unique, as a distribution may be
bi-modal. In a grouped frequency table, the modal class is the interval containing the highest number of observations. In
the histogram, the modal group is the one corresponding to the tallest rectangle. The
mode is the only measure of central tendency that may be appropriately used with a nominal
scale. The mode is often not very useful as a measure of central tendency because it is
unstable: samples drawn from the same population may have different modes. Again, a distribution
may have more than one mode.
Example- Find the mode in each of the following set of numbers.
A 2, 3, 5, 7, 9, 9, 9, 10, 11, 12, 18
B 3, 5, 8, 10, 12, 15, 16
C 2, 3, 4, 4, 5, 5, 7, 7, 7, 9, 9
Solution-
A the mode is 9
B no mode
C the mode is 7 (it occurs three times, while 4, 5 and 9 each occur only twice)
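Finding the mode amounts to counting. The sketch below treats a data set in which no value repeats as having no mode, following the convention of the example; the function name `modes` and the third data set are assumptions for illustration.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency, or [] if nothing repeats.

    Assumes `values` is not empty.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

set_a = [2, 3, 5, 7, 9, 9, 9, 10, 11, 12, 18]
set_b = [3, 5, 8, 10, 12, 15, 16]
bimodal = [1, 2, 2, 3, 3]  # hypothetical bi-modal data set

print(modes(set_a), modes(set_b), modes(bimodal))
```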
Example 1- the following are the distribution of marks of 100 students in a statistics test.
The histogram
Geometric mean (X̅G) (G.M)- The geometric mean of a set of data X1, X2, …, Xn is the nth root of
the product of the n values involved. That is,
\[ \bar{X}_G = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} \quad \dots (G) \]
Because of the difficulty in evaluating (G) directly, the observations are instead transformed to
logarithms and the anti-logarithm of the result is taken:
\[ \log \bar{X}_G = \frac{\sum_{i=1}^{n} \log X_i}{n}, \qquad \bar{X}_G = \text{Antilog}\left( \frac{\sum_{i=1}^{n} \log X_i}{n} \right) \]
For grouped data, the G.M is defined by
\[ \log \bar{X}_G = \frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i}, \qquad \bar{X}_G = \text{Antilog}\left( \frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i} \right) \]
Harmonic mean (X̅H) or (H.M)- the harmonic mean of a set of numbers is the reciprocal of the
arithmetic mean of the reciprocals of the numbers.
Given X1, X2, …, Xn,
\[ H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_n}} \]
For a set of grouped data, the harmonic mean is given by
\[ H.M = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / X_i)} \]
Example – calculate the geometric mean and harmonic mean for the following distributions.
A 1, 4, 10, 8, 10
B-
Class interval 1-3 4-6 7-9 10-12
frequency 1 3 4 2
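(A)
\[ \bar{X}_G = \sqrt[5]{X_1 \times X_2 \times \dots \times X_n} = \sqrt[5]{1 \times 4 \times 10 \times 8 \times 10} = \sqrt[5]{3200} = (3200)^{1/5} \]
\[ \log \bar{X}_G = \frac{\log 3200}{5} = \frac{3.5052}{5} = 0.7010 \]
OR
X        logX
1        0.0000
4        0.6021
10       1.0000
8        0.9031
10       1.0000
Total    3.5052
\[ \log \bar{X}_G = \frac{\sum_{i=1}^{n} \log X_i}{n} = \frac{3.5052}{5} = 0.7010 \]
so that X̅G = Anti-log (0.7010) ≈ 5.02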
(B)
Xi fi logXi fi log Xi f/Xi
2 1 0.3010 0.3010 0.5
5 3 0.6990 2.0970 0.6
8 4 0.9031 3.6124 0.5
11 2 1.0414 2.0828 0.18
Total 10 8.0932 1.78
The geometric mean is given by
\[ \log \bar{X}_G = \frac{\sum_{i=1}^{n} f_i \log X_i}{\sum_{i=1}^{n} f_i} = \frac{8.0932}{10} = 0.80932 \]
X̅G = Anti-log (0.80932) = 6.45
The harmonic mean is given by
\[ H.M = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / X_i)} = \frac{10}{1.78} = 5.62 \]
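Both means can be computed directly from the class marks and frequencies of part (B), and from the raw values of part (A) by giving every value a frequency of 1. This is a sketch with hypothetical function names; small differences from the hand calculation come from rounding in the log table.

```python
import math

def geometric_mean(values, freqs):
    """Antilog of sum(f * log10(x)) / sum(f)."""
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / sum(freqs))

def harmonic_mean(values, freqs):
    """sum(f) / sum(f / x)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Part (B): class marks of 1-3, 4-6, 7-9, 10-12 and their frequencies.
marks = [2, 5, 8, 11]
freqs = [1, 3, 4, 2]
gm_b = geometric_mean(marks, freqs)
hm_b = harmonic_mean(marks, freqs)

# Part (A): ungrouped values, each with frequency 1.
values_a = [1, 4, 10, 8, 10]
gm_a = geometric_mean(values_a, [1] * len(values_a))

print(round(gm_a, 2), round(gm_b, 2), round(hm_b, 2))
```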
The relationship (A.M)(H.M) = (G.M)² holds for any two positive numbers. In general, note
that A.M ≥ G.M ≥ H.M.
Also the empirical relationship between mean, median and mode is given by-
mode = mean-3(mean – median).
That is, mode = 3 median – 2 mean
\[ \text{Mean} = \frac{3\,\text{median} - \text{mode}}{2} \]
A fractile, also known as a quantile, is a point on a number scale that divides a set
of observations into two groups with known proportions in each group. There are basically three
measures of partition, namely quartiles, deciles and percentiles.
1 Quartiles- Just as the median divides the data into two equal halves, the quartiles divide the
data into four equal parts.
1st quartile is Q1 (25% of values below)
2nd quartile is Q2 = median (50% of values below)
3rd quartile is Q3 (75% of values below)
4th quartile is Q4 (100% of values below)
2 Deciles- These are the nine points that divide a distribution into ten equal parts. They are
denoted by D1, D2, …, D9, where the 5th decile (D5) is the median of the distribution.
3 Percentiles- These are the 99 points that divide the distribution into 100 equal parts. The
1st percentile has 1% of the observations below it. The 50th percentile is the median.
Example- The data below is the frequency distribution of the scores of students in an examination
Scores                25-32  33-40  41-48  49-56  57-64  65-72  73-80  81-88  89-96
Number of students    3      4      6      9      8      10     3      4      3
Solution-
Scores    F     Class boundary    Cumulative freq.
25-32     3     24.5-32.5         3
33-40     4     32.5-40.5         7
41-48     6     40.5-48.5         13
49-56     9     48.5-56.5         22
57-64     8     56.5-64.5         30
65-72     10    64.5-72.5         40
73-80     3     72.5-80.5         43
81-88     4     80.5-88.5         47
89-96     3     88.5-96.5         50
Total     50
A histogram
Mode = 66.5
Using the formula-
\[ \text{Mode} = L_1 + \left( \frac{D_1}{D_1 + D_2} \right) W = 64.5 + \left( \frac{2}{2 + 7} \right) \times 8 = 64.5 + \frac{16}{9} = 66.3 \]
The Ogive
Where N = ∑f, N/2 = 50/2 = 25
Median from Ogive = 59.5
Median using the formula:
\[ \text{Median} = L_m + \left( \frac{\frac{N}{2} - C_b}{F_m} \right) W \]
Where the median class = 57-64, Lm = 56.5, Cb = 22, W = 8, Fm = 8
\[ \text{Median} = 56.5 + \frac{25 - 22}{8} \times 8 = 56.5 + \frac{3}{8} \times 8 = 56.5 + 3 = 59.5 \]
Position of Q1 = (1/4) × N = (1/4) × 50 = 12.5
Q1 = 48.0
Position of D3 = (3/10) × 50 = 15
D3 = 50.5
Position of the 70th percentile P70 = (70/100) × 50 = 35
P70 = 68.5
Position of Q3 = (3/4) × 50 = 37.5
Q3 = 70.5
\[ \text{Semi interquartile range} = \frac{Q_3 - Q_1}{2} = \frac{70.5 - 48.0}{2} = 11.25 \]
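The median formula generalises to any quartile, decile or percentile by replacing N/2 with the required fraction of N. The sketch below applies that interpolation to the scores table; the function name `grouped_quantile` is an assumption. Values read from an ogive can differ slightly from the interpolated ones.

```python
# Class boundaries and frequencies from the scores example.
boundaries = [(24.5, 32.5), (32.5, 40.5), (40.5, 48.5), (48.5, 56.5), (56.5, 64.5),
              (64.5, 72.5), (72.5, 80.5), (80.5, 88.5), (88.5, 96.5)]
freqs = [3, 4, 6, 9, 8, 10, 3, 4, 3]

def grouped_quantile(boundaries, freqs, fraction):
    """Interpolate the point below which `fraction` of the observations lie."""
    target = fraction * sum(freqs)   # e.g. N/2 for the median
    cum_before = 0                   # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")

median = grouped_quantile(boundaries, freqs, 0.50)
q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
p70 = grouped_quantile(boundaries, freqs, 0.70)
semi_iqr = (q3 - q1) / 2
print(median, round(q1, 2), q3, p70, round(semi_iqr, 2))
```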
Assignment-
1 The marks of 50 candidates in an examination for which the maximum mark was 100 are
given below
62 21 4 26 7 38 32 64 12 38
45 6 33 55 62 48 49 7 9 41
21 30 31 3 25 57 48 8 18 43
72 23 5 8 37 3 31 39 65 53
4 75 17 4 61 50 51 38 36 40
A group your sample into a frequency distribution with intervals of 0-9, 10-19 etc.
B plot the sample cumulative distribution (Ogive)
C use the method of coding to evaluate the sample mean.
Calculate the (i) Arithmetic mean, (ii) Geometric mean, (iii) Harmonic mean. (iv) What is the
relationship between (i), (ii) and (iii) above from your calculations?
3 The distribution of marks scored by 100 students in a given examination were recorded as
shown below.
Marks (%)             0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Number of students    2    7      8      13     24     30     6      5      3      2
a Draw the cumulative frequency curve, from your curve determine (i) median, (ii) quartiles
(iii) quartile deviation.
b Draw the histogram for the distribution. From your histogram, estimate the (i) median, (ii)
mode.
X f fX Y = X-312 fY
304 1 304 -8 -8
308 5 1540 -4 -20
312 9 2808 0 0
316 4 1264 4 16
320 4 1280 8 32
324 2 648 12 24
25 7844 44
\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{7844}{25} = 313.76 \]
\[ \bar{Y} = \frac{\sum fY}{\sum f} = \frac{44}{25} = 1.76 \]
Y = X - 312
Ȳ = X̅ - 312
X̅ = Ȳ + 312, X̅ = 1.76 + 312 = 313.76
(2)
Time f X Y = X-45.5 fY fX
11-20 1 15.5 -30 -30 15.5
21-30 2 25.5 -20 -40 51
31-40 5 35.5 -10 -50 177.5
41-50 11 45.5 0 0 500.5
51-60 8 55.5 10 80 444
61-70 2 65.5 20 40 131
71-80 11 75.5 30 330 830.5
40 330 2150
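\[ \bar{X} = \frac{\sum fX}{\sum f} \quad \text{and} \quad \bar{Y} = \frac{\sum fY}{\sum f} = \frac{330}{40} = 8.25 \]
Y = X - 45.5
Ȳ = X̅ - 45.5
X̅ = Ȳ + 45.5
\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{2150}{40} = 53.75 \]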
Assignment- Using the coding method, find the mean of the following numbers.
MEASURES OF DISPERSION
These measure the degree to which items vary in numerical value. They are also called measures
of variability.
1 The range
2 Mean deviation
3 Variance
4 Standard deviation
5 Relative dispersion
The range- The range is the simplest measure of dispersion with respect to its concept and
computation. It is the difference between the minimum and maximum items of the series. For
grouped data in a frequency distribution, the range is the difference between the upper limit of the
class at the upper extreme and the lower limit of the class at the lower extreme of the distribution.
41 20 15 65 73 84 53 35 71 55
15 20 35 41 53 55 65 71 73 84
L= 84, S = 15
Range = L – S = 84 – 15 = 69
\[ \text{Coefficient of range} = \frac{L - S}{L + S} = \frac{84 - 15}{84 + 15} = \frac{69}{99} = 0.697 \]
MEAN DEVIATION- the mean deviation for a set of ungrouped data is given by
\[ \text{Mean deviation} = \frac{\sum |d|}{n} = \frac{\sum |X - \bar{X}|}{n} \]
Example (1) calculate the mean deviation about the mean for the following series –
15 20 17 19 21 13 12 10 17 9 12
Solution- n = 11
X d =X -X̅ │d│= │X - X̅│
15 0 0
20 5 5
17 2 2
19 4 4
21 6 6
13 -2 2
12 -3 3
10 -5 5
17 2 2
9 -6 6
12 -3 3
∑│d│ = ∑│X - X̅│ = 38
\[ \text{Mean deviation} = \frac{\sum |d|}{n} = \frac{38}{11} = 3.455 \]
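The mean deviation about the mean for the series above can be verified with a few lines. The function name `mean_deviation` is hypothetical.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

series = [15, 20, 17, 19, 21, 13, 12, 10, 17, 9, 12]
md = mean_deviation(series)
print(round(md, 3))
```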
Mean deviation for grouped data- Let X1, X2, X3, …, Xn occur with frequencies f1, f2, f3, …, fn
respectively, and let ∑f = n and X̅ = mean; then the mean deviation is given by the formula
\[ \text{Mean deviation} = \frac{\sum f |X - \bar{X}|}{\sum f} = \frac{\sum f |d|}{n} \]
Example (2) find the mean deviation from (i) mean (ii) Median, for the following data.
Marks              20   18   16   14   12   10   8    6
No. of students    2    4    9    18   27   25   14   1
Solution-
Marks (x) No. of students (f) f*X │d│= │X- X̅│ f * │d│
6 1 6 6 6
8 14 112 4 56
10 25 250 2 50
12 27 324 0 0
14 18 252 2 36
16 9 144 4 36
18 4 72 6 24
20 2 40 8 16
Total    ∑f = 100    ∑fx = 1200    ∑f│d│ = 224
\[ \text{Mean} = \bar{X} = \frac{\sum fx}{\sum f} = \frac{1200}{100} = 12 \]
\[ \text{Mean deviation} = \frac{\sum f |X - \bar{X}|}{\sum f} = \frac{\sum f |d|}{\sum f} = \frac{224}{100} = 2.24 \]
Solution-
Marks (x) No. of students (f) Cumulative frequency f│d│= f│x - 12│
6 1 1 6
8 14 15 56
10 25 40 50
12 27 67 0
14 18 85 36
16 9 94 36
18 4 98 24
20 2 100 16
Total    ∑f = 100    ∑f│d│ = 224
Median = average of the (n/2)th and (n/2 + 1)th items = average of the 50th and 51st items = 12
Mean deviation from the median = ∑f│d│ / ∑f = 224/100 = 2.24
Example (3) calculate the mean deviation from the mean for the following data.
\[ \text{Mean} = \bar{x} = \frac{\sum fx}{\sum f} = \frac{1670}{50} = 33.4 \]
\[ \text{Mean deviation} = \frac{\sum f |x - \bar{x}|}{\sum f} = \frac{659.2}{50} = 13.184 \]
Standard deviation- This is the positive square root of the average of squared deviations taken
from the mean. Let x be a random variate which takes on n values thus, X1, X2, X3, …,Xn, then
the standard deviation of these n observations is given by;
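\[ S.D = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]
OR
\[ S.D = \sigma = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} \quad \text{OR} \quad \sigma = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} \]
where X̅ = ∑x / n and N or n = ∑f.
The variance is
\[ \sigma^2 = \frac{1}{N} \sum (x - \bar{x})^2 \]
\[ \text{Coefficient of standard deviation} = \frac{\sigma}{\bar{X}} \]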
Example (1) Find the standard deviation, variance and coefficient of standard deviation of the
following numbers. 3, 4, 5, 6
Solution- n = 4, ∑ X = 3+4+5+6 = 18
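∑x² = 3² + 4² + 5² + 6² = 9 + 16 + 25 + 36 = 86
\[ \bar{X} = \frac{\sum x}{n} = \frac{18}{4} = 4.5 \]
(i)
\[ \sigma = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} = \sqrt{\frac{86}{4} - \left( \frac{18}{4} \right)^2} = \sqrt{21.5 - (4.5)^2} = \sqrt{21.5 - 20.25} = \sqrt{1.25} \]
σ = 1.12
(ii) variance = σ² = (1.12)² = 1.25
(iii)
\[ \text{Coefficient of S.D} = \frac{\sigma}{\bar{X}} = \frac{1.12}{4.5} = 0.25 \]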
Example 2 Find the standard deviation and variance of the following data.
Size 10 11 12 13 14 15 16
frequency 2 7 11 15 10 4 1
Solution-
\[ \sigma = \sqrt{\frac{\sum fx^2}{n} - \left( \frac{\sum fx}{n} \right)^2} = \sqrt{\frac{8282}{50} - \left( \frac{640}{50} \right)^2} = \sqrt{165.64 - 163.84} = \sqrt{1.8} \]
σ = 1.342
σ² = (1.342)² = 1.8
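Both standard deviation examples use the same computational formula, so one function can cover them by treating ungrouped data as having frequency 1 for every value. This is a sketch; the name `std_dev` is an assumption.

```python
import math

def std_dev(values, freqs=None):
    """Population standard deviation: sqrt(sum(f*x^2)/n - (sum(f*x)/n)^2)."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data
    n = sum(freqs)
    mean_x = sum(f * x for x, f in zip(values, freqs)) / n
    mean_x2 = sum(f * x * x for x, f in zip(values, freqs)) / n
    return math.sqrt(mean_x2 - mean_x ** 2)

# Example (1): the numbers 3, 4, 5, 6.
sigma_1 = std_dev([3, 4, 5, 6])

# Example 2: sizes with their frequencies.
sizes = [10, 11, 12, 13, 14, 15, 16]
freqs = [2, 7, 11, 15, 10, 4, 1]
sigma_2 = std_dev(sizes, freqs)

print(round(sigma_1, 2), round(sigma_2, 3), round(sigma_2 ** 2, 2))
```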
Calculation of standard deviation using the short-cut method or assumed mean method
\[ \sigma = \sqrt{\frac{\sum fd^2}{n} - \left( \frac{\sum fd}{n} \right)^2} \times i \]
where d = (x − A)/i, n = ∑f, A = Assumed mean and i = class size
Example 3. Find the standard deviation for the following distribution.
Solution- Let A = 4
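\[ \sigma = \sqrt{\frac{\sum fd^2}{n} - \left( \frac{\sum fd}{n} \right)^2} \times i = 10 \times \sqrt{\frac{152}{68} - \left( \frac{-30}{68} \right)^2} = 10 \times \sqrt{2.2352 - 0.1946} = 10 \times \sqrt{2.0406} = 14.3 \]
σ² = (14.3)² = 204.49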
Assignment- (1) In a study on customers, the following data was obtained. Find the standard
deviation and variance of the data.
Ages (in years)    10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89
Number of items    1      0      1      10     17     38     9      3
2 The marks obtained by 10 students in an examination were as follows: 70, 65, 68, 70, 75,
73, 80, 70, 83, 86. Find the mode, mean, median, range and mean deviation about the
mean. Also find the coefficient of mean deviation based on (i) mean, (ii) median, (iii)
mode. Also find the coefficient of range.
RELATIVE DISPERSION
The most commonly used measure of relative dispersion is the coefficient of variation (C.V),
given by
\[ C.V = \frac{\sigma}{\bar{x}} \times 100 \]
where σ = standard deviation, x̅ = mean
Example – calculate the coefficient of variation for the following data; 9.5, 13.7, 10.4, 8.6
X X2
9.5 90.25
13.7 187.69
10.4 108.16
8.6 73.96
42.2 460.06
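\[ \bar{X} = \frac{\sum x}{n} = \frac{42.2}{4} = 10.55 \]
\[ \sigma = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} = \sqrt{\frac{460.06}{4} - \left( \frac{42.2}{4} \right)^2} = \sqrt{115.015 - 111.3025} = \sqrt{3.7125} \approx 1.9268 \]
\[ C.V = \frac{\sigma}{\bar{x}} \times 100 = \frac{1.9268}{10.55} \times 100 \approx 18.26\% \]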
Assignment- Calculate the coefficient of variation for the quarterly profits of 2 companies.
Interpretation of skewness;
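Karl Pearson's coefficient of skewness:
\[ SK_P = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{x} - \text{mode}}{\sigma} \]
\[ SK_P = \frac{\text{mean} - (3\,\text{median} - 2\,\text{mean})}{\text{standard deviation}} \]
Quartile coefficient of skewness:
\[ \frac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1} \]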
Example 1- From the information given below, calculate (a) Karl Pearson’s coefficient of
skewness (b) the quartile coefficient of skewness.
                      Product A    Product B
Mean                  150          140
Median                142          155
Standard deviation    30           55
Third quartile        195          260
Lower quartile        62           80
Solution-
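Product A
Given X̅ = 150, median = 142, σ = 30, Q3 = 195, Q1 = 62
Product B
Given X̅ = 140, median = 155, σ = 55, Q3 = 260, Q1 = 80
\[ SK_P = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{x} - \text{mode}}{\sigma}, \quad \text{where mode} = 3\,\text{median} - 2\,\text{mean} \]
Product A
mode = 3(142) − 2(150) = 126
\[ SK_P(A) = \frac{150 - 126}{30} = \frac{24}{30} = 0.8 \]
Product B
mode = 3(155) − 2(140) = 185
\[ SK_P(B) = \frac{140 - 185}{55} = \frac{-45}{55} = -0.82 \]
\[ \text{Quartile coefficient of skewness} = \frac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1} \]
Product A
\[ \frac{195 + 62 - 2 \times 142}{195 - 62} = -0.203 \]
Product B
\[ \frac{260 + 80 - 2 \times 155}{260 - 80} = 0.167 \]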
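The two coefficients can be computed from the summary figures used in the solution. In this sketch the mode is taken from the empirical relation mode = 3 median − 2 mean, as in the worked example; the function names are assumptions.

```python
def pearson_skewness(mean, median, sd):
    """(mean - mode) / sd, with mode estimated as 3*median - 2*mean."""
    mode = 3 * median - 2 * mean
    return (mean - mode) / sd

def quartile_skewness(q1, median, q3):
    """(Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Figures used in the worked solution for the two products.
sk_a = pearson_skewness(mean=150, median=142, sd=30)
sk_b = pearson_skewness(mean=140, median=155, sd=55)
qs_a = quartile_skewness(q1=62, median=142, q3=195)
qs_b = quartile_skewness(q1=80, median=155, q3=260)
print(round(sk_a, 2), round(sk_b, 2), round(qs_a, 3), round(qs_b, 3))
```

A negative value indicates a longer tail on the left, a positive value a longer tail on the right.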
Calculate kelly’s coefficient of skewness for the data given below
Solution-
N = ∑f = 270
Tenth percentile = P10: (10/100) × 270 = 27th item and (10/100) × 271 = 27.1th item
P10 = 4
Median or P50: (50/100) × 270 = 270/2 = 135th item and (50/100) × 271 = 271/2 = 135.5th item
P50 = median = 7
Ninetieth percentile = P90: (90/100) × 270 = 243rd item and (90/100) × 271 = 243.9th item
P90 = 10
\[ \text{Kelly's coefficient of skewness} = \frac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}} = \frac{4 + 10 - 2 \times 7}{10 - 4} = 0 \]
1. Leptokurtic- A distribution with a relatively high peak.
2. Mesokurtic- A distribution with a normal peak.
3. Platykurtic- A distribution with a low or flat peak.
The moment coefficient of kurtosis is given by
\[ K_m = \frac{\mu_4}{(\sigma^2)^2} \]
The percentile coefficient of kurtosis is given by
\[ K_p = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}} \]
Example- Given that Q1 = 53.75, Q3 = 76.33, P 10 = 43.0 and P90 = 85.5. find the percentile
coefficient of kurtosis and how would you describe the shape of the distribution?
\[ K_p = \frac{\frac{1}{2}(76.33 - 53.75)}{85.5 - 43.0} = \frac{11.29}{42.5} = 0.266 \]
This implies that the distribution is slightly platykurtic since the calculated value is slightly higher
than 0.263.
\[ K_m = \frac{\mu_4}{(\sigma^2)^2} = \frac{4253.21}{(34.32)^2} = \frac{4253.21}{1177.86} = 3.61 \]
Assignment-
(1) From the information given below calculate the Karl Pearson’s coefficient of skewness.
Mean 256.5 240.8
Median 201.0 201.6
Standard deviation 215.4 181.1
Third quartile 260.0 242.0
First quartile 157.0 164.2
(2) The series at an aptitude test by 100 candidates are given below. Calculate Karl Pearson’s
coefficient of skewness.
INDEX NUMBERS
2. Value index numbers
3. Quantity index numbers
PRICE RELATIVE- A price relative expresses the unit price of an item in each period as a
percentage of the item's unit price in the base period. Another name for price relative is simple price
index.
\[ \text{Price relative} = \frac{\text{current year's price}}{\text{base year's price}} \times 100 = \frac{P_1}{P_0} \times 100 \]
Example 1- The table below gives the collection of the average unit price of four food items
consumed by a certain family in 2010 and 2015. Using 2010 as the base year, calculate the price
relative.
Example 2- The table below shows price for meat for the years 2000 to 2004, using year 2000 as
the base year, calculate the price relative.
\[ \text{Quantity/volume relative} = \frac{Q_1}{Q_0} \times 100 \]
Example 3- The table below shows the production figures in thousand tones of three agricultural
crops for the years 2007 and 2009. Calculate the quantity relative using 2007 as the base year.
\[ \text{Value relative} = \frac{P_1 Q_1}{P_0 Q_0} \times 100 = \frac{V_1}{V_0} \times 100 \]
Example 4- The table below gives the production figure and the unit prices of items in the
previous table, using 2007 as the base year, calculate the value relative.
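\[ SAPI = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{522}{413} \times 100 = 126.39 \]
\[ SAQI = \frac{\sum Q_1}{\sum Q_0} \times 100 = \frac{710 + 130 + 794}{650 + 138 + 720} \times 100 = \frac{1634}{1508} \times 100 = 108.36 \]
\[ SAVI = \frac{\sum P_1 Q_1}{\sum P_0 Q_0} \times 100 = \frac{35352}{27890} \times 100 = 126.76 \]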
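Some of the important methods of constructing weighted aggregative index numbers include:
1. Laspeyres' price index
\[ P_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 \]
2. Paasche's price index
\[ P_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100 \]
3. Kelly's price index (KPI)
\[ P_{01} = \frac{\sum P_1 Q}{\sum P_0 Q} \times 100, \quad \text{where } Q = \frac{Q_0 + Q_1}{2} \]
4. Laspeyres' quantity index
\[ Q_{01} = \frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times 100 \]
5. Paasche's quantity index
\[ Q_{01} = \frac{\sum Q_1 P_1}{\sum Q_0 P_1} \times 100 \]
6. Kelly's quantity index (KQI)
\[ Q_{01} = \frac{\sum Q_1 P}{\sum Q_0 P} \times 100, \quad \text{where } P = \frac{P_0 + P_1}{2} \]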
Example 7- Compute Laspeyres', Paasche's and Kelly's price index numbers for 2005 from the
following data.
2000 2005
Commodity Price Rs Quantity Price Quantity
A 15 15 22 12
B 20 5 27 4
C 4 10 7 5
Solution-
Commodity  P0  Q0  P1  Q1  P0Q0  P1Q1  P1Q0  P0Q1  Q0+Q1  Q = (Q0+Q1)/2  P1Q    P0Q
A          15  15  22  12  225   264   330   180   27     13.5           297.0  202.5
B          20  5   27  4   100   108   135   80    9      4.5            121.5  90.0
C          4   10  7   5   40    35    70    20    15     7.5            52.5   30.0
Total                      365   407   535   280                         471    322.5
\[ L_{P1} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 = \frac{535}{365} \times 100 = 146.58 \]
\[ P_{P1} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100 = \frac{407}{280} \times 100 = 145.36 \]
\[ K_{P1} = \frac{\sum P_1 Q}{\sum P_0 Q} \times 100 = \frac{471}{322.5} \times 100 = 146.05 \]
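The three price indices of Example 7 can be reproduced from the price and quantity columns. This sketch follows the notes in calling the average-quantity index Kelly's index; the function names are assumptions.

```python
# Base year (2000) and current year (2005) prices and quantities for A, B, C.
p0, q0 = [15, 20, 4], [15, 5, 10]
p1, q1 = [22, 27, 7], [12, 4, 5]

def weighted_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price(p0, q0, p1):
    """Base-year quantities as weights."""
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100

def paasche_price(p0, p1, q1):
    """Current-year quantities as weights."""
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

def kelly_price(p0, q0, p1, q1):
    """Average of base-year and current-year quantities as weights."""
    q = [(a + b) / 2 for a, b in zip(q0, q1)]
    return weighted_sum(p1, q) / weighted_sum(p0, q) * 100

lp = laspeyres_price(p0, q0, p1)
pp = paasche_price(p0, p1, q1)
kp = kelly_price(p0, q0, p1, q1)
print(round(lp, 2), round(pp, 2), round(kp, 2))
```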
\[ L.R = \frac{P_1}{P_0} \times 100 \]
Example 8-. From the fixed base index numbers given below. Calculate the link relative
Assignment (1)- The following data relate to the price of rice per kg in different years.
Year         1998  1999  2000  2001  2002  2003  2004  2005
Price (Rs)      6     7     7     8    10    14    12    13
Find the price relatives.
Example (3)- For the following data, compute Laspeyres' and Kelly's price and quantity index
numbers for 2005 with 1995 as the base year.
                  1995                2005
Commodity   Quantity   Value   Quantity   Value
A               50       350       60       420
B              120       600      140       700
C               30       330       20       200
D               20       360       15       300
E                5        40        5        50
RATIO
A ratio is a comparison of two numbers (two related quantities). It can be written in many ways.
For example, the ratio of 5 to 9 can be expressed as 5 to 9, 5 : 9, or $\frac{5}{9}$.
A ratio compares quantities of the same kind with the same units in a definite order. The ratio itself
has no units.
In a class, there are 15 girls, 20 boys and 1 teacher.
The ratio of girls to boys is 15:20 or 15 to 20.
The ratio of teachers to students is 1:35 or 1 to 35.
Ratios containing two quantities can be written as a fraction. Once the ratio is
written as a fraction it is then possible for it to be written as a decimal and
percentage.
Ratio   Fraction                                                             Decimal   Percentage
1:4     $\frac{1}{4} = \frac{1 \times 25}{4 \times 25} = \frac{25}{100}$     0.25      0.25 * 100 = 25%
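The conversions in the table can be sketched in Python with the standard `fractions` module. The function name `ratio_forms` is an assumption for this illustration. It treats the ratio a:b as the fraction a/b, as the table does.

```python
from fractions import Fraction


def ratio_forms(a: int, b: int):
    """Return the ratio a:b as a fraction, a decimal and a percentage."""
    fraction = Fraction(a, b)    # reduced to lowest terms automatically
    decimal = float(fraction)
    percentage = decimal * 100
    return fraction, decimal, percentage


fraction, decimal, percentage = ratio_forms(1, 4)
print(fraction, decimal, f"{percentage}%")  # prints 1/4 0.25 25.0%
```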
When writing ratios, it is expected that they will be written in their simplest form, using whole
numbers. Because ratios can be written as fractions, the methods used to simplify fractions also
apply to ratios.
Examples:
Simplify 12:20
To simplify, the Highest Common Factor (HCF) of 12 and 20 is required. The HCF is the largest
number that divides both 12 and 20 exactly; here the HCF is 4. If the common factor used is not the
HCF, then simplifying may take two or more stages.
12:20 = 12÷4 : 20÷4 = 3:5
Simplify 35:75
In this question the HCF is 5.
=35÷5:75÷5 = 7:15
Simplify 300mm:1.2m
In this question the units are different, so the first step is to make the units the same. Generally, it is
better to express both in the smaller unit (1.2 m = 1.2 × 1000 = 1200 mm).
300:1200 = 300÷300 : 1200÷300 = 1:4
Simplify 0.5:1.25
Because both numbers are expressed as decimals, both sides must be multiplied by 10, 100, 1000, etc.
to make them whole numbers. The number with the most decimal places is 1.25 (2 decimal places), so
multiplying by 100 removes the decimals.
0.5×100 : 1.25×100 = 50:125
The HCF of 50 and 125 is 25, so the ratio can be simplified further.
50:125 = 50÷25 : 125÷25 = 2:5
Simplify ₦5.00 : 25k
₦5.00 = 5 * 100 = 500k (converting naira to kobo so that both have the same unit)
500k : 25k = $\frac{500}{25} = \frac{20}{1}$ = 20 : 1
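The simplification steps used in these examples (make both terms whole numbers, then divide by the HCF) can be sketched in Python. The function name `simplify_ratio` is an assumption. Both terms are assumed to be already in the same unit, and decimals are passed as strings so that they are read exactly.

```python
from fractions import Fraction
from math import gcd


def simplify_ratio(a, b):
    """Reduce the ratio a:b to its simplest whole-number form.

    a and b may be integers or decimal strings such as "1.25".
    """
    fa, fb = Fraction(str(a)), Fraction(str(b))
    # Multiply both terms by the lowest common multiple of the denominators
    # so that both become whole numbers (like multiplying by 10, 100, ...).
    scale = fa.denominator * fb.denominator // gcd(fa.denominator, fb.denominator)
    x, y = int(fa * scale), int(fb * scale)
    # Divide both terms by their HCF (called gcd in Python).
    hcf = gcd(x, y)
    return x // hcf, y // hcf


print(simplify_ratio(12, 20))         # prints (3, 5)
print(simplify_ratio(35, 75))         # prints (7, 15)
print(simplify_ratio(300, 1200))      # 300 mm : 1200 mm, prints (1, 4)
print(simplify_ratio("0.5", "1.25"))  # prints (2, 5)
print(simplify_ratio(500, 25))        # 500k : 25k, prints (20, 1)
```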
Further Examples
2 If 2 : 5 = 120 : x, calculate x.
Solution: $\frac{2}{5} = \frac{120}{x}$
$2x = 120 \times 5$, so $x = \frac{120 \times 5}{2} = 300$
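The cross-multiplication step can be written as a one-line Python function. This is a sketch, and the name `solve_proportion` is an assumption. It solves a : b = c : x for x.

```python
from fractions import Fraction


def solve_proportion(a: int, b: int, c: int) -> Fraction:
    """Solve a : b = c : x for x by cross-multiplying: a * x = b * c."""
    if a == 0:
        raise ValueError("the first term of the ratio must be non-zero")
    return Fraction(b * c, a)


x = solve_proportion(2, 5, 120)
print(x)  # prints 300
```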
3 The ages of three men are in the ratio 3 : 4 : 5. If the difference between the ages of the
oldest and youngest is 18 years, find the sum of the ages of the three men.
Solution
Total ratio = 3+4+5 = 12
Let the sum of their ages be x.
The oldest = $\frac{5x}{12}$ and the youngest = $\frac{3x}{12}$, so
$\frac{5x}{12} - \frac{3x}{12} = 18$, that is, $\frac{5x - 3x}{12} = \frac{2x}{12} = 18$
$2x = 18 \times 12 = 216$, so $x = \frac{216}{2} = 108$
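The same answer can be reached by finding the value of one ratio part first: the oldest and youngest differ by 5 - 3 = 2 parts, and 2 parts equal 18 years. A minimal Python sketch of this alternative route follows. The function name `total_from_difference` is an assumption.

```python
from fractions import Fraction


def total_from_difference(parts, difference):
    """Find the total shared in the ratio `parts`, given the difference
    between the largest and the smallest share."""
    gap = max(parts) - min(parts)
    if gap == 0:
        raise ValueError("largest and smallest parts must differ")
    unit = Fraction(difference, gap)   # value of one ratio part
    return unit * sum(parts)


total = total_from_difference([3, 4, 5], 18)
print(total)  # prints 108
```

Here one part is 9 years, so the three ages are 27, 36 and 45, which add up to 108.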
Exercises
2. Divide ₦420 among Ada, Olu and Femi such that Ada has twice as much as Olu and Olu
twice as much as Femi.
FURTHER EXERCISES
Find the definition/meaning of the following:-
(1) Variable (2) Random Variable (3) Event (4) parameter
(5) Sample Space (6) Sample (7) Statistic
What is a system?
A system is a network of things, agents etc. connected or working towards a given
objective/output.
According to the 2006 population census, Nigeria is home to over 140 million people on a total
area of approximately 923,768 sq. kilometres. The country operates a federal system of
government with thirty-six (36) states and the Federal Capital Territory (FCT). The entire
country is further sub-divided into six (6) geopolitical zones and seven hundred and seventy four
(774) Local Government Areas (LGAs). With over 500 ethnic groups, over 250
languages and over 20 million households, Nigeria is well diversified in culture and rich in
human resources. Nigeria attained independence in 1960 and became a republic in 1963. Its
political development has been characterized by instability, with nearly three (3) decades of military
dictatorship, but the country has enjoyed uninterrupted democratic rule since May 29, 1999.
1. The institutions involved in the data production and compilation. These include:
National Bureau of Statistics (NBS), Central Bank of Nigeria (CBN), National
Population Commission, Department of Planning, Research and Statistics (DPRS) of
Ministries and Parastatals, the State Statistical Agencies (SSAs), Budget and Planning
of Local Government Councils among others.
2. The data users or the clientele of data production systems. This group demands and
utilizes statistical products and services. The members are quite diverse in nature and
the number is large and ever increasing. Some of them include: Policy and decision
makers in government ministries and institutions, Politicians (e.g. Members of the
national and state assemblies, Political parties, etc.), Researchers, Academicians,
Non-Governmental Organisations (NGOs), Donor Community, International
Organisations, the Media and the general public.
3. Those who cooperate and supply data to the data collecting agencies popularly called
data suppliers. These are mainly individuals, groups, households and establishments.
4. The research and training institutions. Among others, the research institutions include
Nigerian Institute of Social and Economic Research (NISER), Centre for Econometric
and Allied Research (CEAR) and Federal Institute of Industrial Research, Oshodi
(FIIRO). The training institutions include the Federal School of Statistics (FSS) and
the Departments of Statistics in Nigerian tertiary institutions, particularly the Department
of Statistics in the Nigerian Premier University, University of Ibadan.
The objectives of the Nigerian statistical system as stipulated in the Statistics Act 2007 are
to:
1. Raise public awareness about the importance and role of statistical information to society;
2. Collect, process, analyze and disseminate quality statistical data;
3. Promote the use of best practice and international standards in statistical production,
management and dissemination;
4. Promote the use of statistical data and information at individual, institutional, Local
Government Area, State, National and International levels, especially for evidence-based
policy design and decision-making.
5. Build sustainable capacity for the production and use of statistical data and information in
the country for planning purposes.
Uses of statistics
Statistical information is needed for:
Major challenges
1. Inadequate funding
2. Slow pace of statistical development at the sub-national level
3. Inadequate competent staff at the sub-national level
4. Apathy towards statistics
5. Overlapping Roles Among Agencies
6. Problems with statistical outputs
Recommendations
Solution (Exercise 2: dividing ₦420 among Ada, Olu and Femi):
Let Femi's share be x
Olu will be 2x
Ada will be 4x
x + 2x + 4x = 7x
Ada = $\frac{4x}{7x} \times 420$ = ₦240
Olu = $\frac{2x}{7x} \times 420$ = ₦120
Femi = $\frac{x}{7x} \times 420$ = ₦60
OR
7x = 420, x = 60
Femi = x = ₦60
Olu = 2x = 60 * 2 = ₦120
Ada = 4x = 60 * 4 = ₦240
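Sharing an amount in a given ratio follows the first method above: each share is its ratio part divided by the total of the parts, times the amount. A minimal Python sketch follows. The function name `share_in_ratio` is an assumption.

```python
from fractions import Fraction


def share_in_ratio(total, parts):
    """Split `total` in proportion to the ratio `parts`, e.g. [4, 2, 1]."""
    whole = sum(parts)
    return [Fraction(total * part, whole) for part in parts]


# Ada : Olu : Femi = 4 : 2 : 1, sharing 420 naira.
ada, olu, femi = share_in_ratio(420, [4, 2, 1])
print(ada, olu, femi)  # prints 240 120 60
```

The shares always add back to the original amount, which is a quick check on the arithmetic.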