Unit 3

This document covers statistics used in data science, focusing on measures of central tendency, including mean, median, and mode. It explains how to calculate these measures for both raw and grouped data, providing examples and formulas for clarity. Additionally, it introduces the concept of measures of dispersion and various methods for calculating the mean, such as direct and assumed mean methods.

Uploaded by

favoha4730
Copyright
© © All Rights Reserved

Integrated Master of Science (IT) [iMSc (IT)]

(Core Course)
Semester IV
221601404 DATA SCIENCE
Unit 3
Statistics used in Data Science
Measures of Central Tendency

MEAN

MEDIAN
MODE
Measures of Dispersion

Range

Quartile Deviation & Co-efficient of QD


Standard Deviation and Co-efficient of standard Deviation & Variance
MODULE - 6 Measures of Central Tendency
Statistics

25
MEASURES OF CENTRAL TENDENCY

In the previous lesson, we have learnt that the data could be summarised to some extent
by presenting it in the form of a frequency table. We have also seen how data were
represented graphically through bar graphs, histograms and frequency polygons to get
some broad idea about the nature of the data.
Some aspects of the data can be described quantitatively to represent certain features of
the data. An average is one such representative measure. As an average is a number
indicating the representative or central value of the data, it lies somewhere between the
two extremes. For this reason, an average is called a measure of central tendency.
In this lesson, we will study some common measures of central tendency, viz.

(i) Arithmetical average, also called mean


(ii) Median
(iii) Mode

OBJECTIVES
After studying this lesson, you will be able to
• define mean of raw/ungrouped and grouped data;
• calculate mean of raw/ungrouped data and also of grouped data by ordinary
and short-cut-methods;
• define median and mode of raw/ungrouped data;
• calculate median and mode of raw/ungrouped data.

25.1 ARITHMETIC AVERAGE OR MEAN


You must have heard people talking about average speed, average rainfall, average height,
average score (marks) etc. If we are told that the average height of students is 150 cm, it does
not mean that the height of each student is 150 cm. In general, it conveys that the heights of
students are spread around 150 cm. Some of the students may have a height less than it,
some may have a height greater than it and some may have a height of exactly 150 cm.

25.1.1 Mean (Arithmetic average) of Raw Data


To calculate the mean of raw data, all the observations of the data are added and their sum
is divided by the number of observations. Thus, the mean of n observations $x_1, x_2, \dots, x_n$ is

$$\frac{x_1 + x_2 + \dots + x_n}{n}$$

It is generally denoted by $\bar{x}$, so

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(I)}$$

where the symbol "Σ" is the capital letter 'SIGMA' of the Greek alphabet and is used to
denote summation. We use it to economise the space required in writing such lengthy
expressions.
In $\sum_{i=1}^{n} x_i$, i is called the index of summation.

Example 25.1: The weights of four bags of wheat (in kg) are 103, 105, 102, 104. Find the
mean weight.
Solution: Mean weight $\bar{x} = \frac{103 + 105 + 102 + 104}{4}$ kg $= \frac{414}{4}$ kg $= 103.5$ kg
Example 25.2: The enrolment in a school in the last five years was 605, 710, 745, 835 and
910. What was the average enrolment per year?
Solution: Average enrolment (or mean enrolment)

$$= \frac{605 + 710 + 745 + 835 + 910}{5} = \frac{3805}{5} = 761$$
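Formula (I) translates directly into a few lines of code. The following is a minimal sketch, not part of the original lesson; the function name `mean` and the variable names are chosen here for illustration.

```python
def mean(observations):
    """Arithmetic mean of raw data: the sum of the observations divided by their number (Formula I)."""
    if not observations:
        raise ValueError("the mean of empty data is undefined")
    return sum(observations) / len(observations)


weights_kg = [103, 105, 102, 104]        # data of Example 25.1
enrolment = [605, 710, 745, 835, 910]    # data of Example 25.2

print(mean(weights_kg))   # 103.5
print(mean(enrolment))    # 761.0
```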

Example 25.3: The following are the marks in a Mathematics test of 30 students of
Class IX in a school:
40 73 49 83 40 49 27 91 37 31
91 40 31 73 17 49 73 62 40 62
49 50 80 35 40 62 73 49 31 28
Find the mean marks.
Solution: Here, the number of observations (n) = 30
x1 = 40, x2 = 73, ........., x10 = 31
x11 = 91, x12 = 40, ........., x20 = 62
x21 = 49, x22 = 50, ........., x30 = 28
From the Formula (I), the mean marks of the students is given by

$$\bar{x} = \frac{\sum_{i=1}^{30} x_i}{n} = \frac{40 + 73 + \dots + 28}{30} = \frac{1555}{30} \approx 51.83$$
Example 25.4: Refer to Example 25.1. Show that the sum of $x_1 - \bar{x}$, $x_2 - \bar{x}$, $x_3 - \bar{x}$ and
$x_4 - \bar{x}$ is 0, where the $x_i$'s are the weights of the four bags and $\bar{x}$ is their mean.
Solution: $x_1 - \bar{x}$ = 103 – 103.5 = – 0.5, $x_2 - \bar{x}$ = 105 – 103.5 = 1.5
$x_3 - \bar{x}$ = 102 – 103.5 = – 1.5, $x_4 - \bar{x}$ = 104 – 103.5 = 0.5
So, $(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x})$ = – 0.5 + 1.5 + (–1.5) + 0.5 = 0
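The result of Example 25.4 is not special to these four weights. As a short sketch that uses only Formula (I), the same cancellation holds for any n observations:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x}) &= \sum_{i=1}^{n} x_i - n\bar{x} \\
&= n\bar{x} - n\bar{x} \qquad \left(\text{since } \bar{x} = \tfrac{1}{n}\sum_{i=1}^{n} x_i\right) \\
&= 0
\end{aligned}
```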
Example 25.5: The mean of marks obtained by 30 students of Section A of Class X is
48, that of 35 students of Section B is 50. Find the mean marks obtained by 65 students
in Class X.
Solution: Mean marks of 30 students of Section A = 48
So, total marks obtained by 30 students of Section A = 30 × 48 = 1440
Similarly, total marks obtained by 35 students of Section B = 35 × 50 = 1750
Total marks obtained by both sections = 1440 + 1750 = 3190
Mean of marks obtained by 65 students $= \frac{3190}{65} = 49.1$ approx.
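The reasoning of Example 25.5 (recover each group's total from its mean, then divide the grand total by the grand count) can be sketched as follows. The function name `combined_mean` and its input format are assumptions made for this illustration.

```python
def combined_mean(groups):
    """Mean of pooled data, given (number_of_observations, mean) for each group."""
    grand_total = sum(count * group_mean for count, group_mean in groups)
    grand_count = sum(count for count, _ in groups)
    return grand_total / grand_count


# Section A: 30 students with mean 48; Section B: 35 students with mean 50
class_mean = combined_mean([(30, 48), (35, 50)])
print(round(class_mean, 1))   # 49.1
```

Note that the combined mean is not the plain average of 48 and 50, because the two sections have different sizes.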
Example 25.6: The mean of 6 observations was found to be 40. Later on, it was detected
that one observation 82 was misread as 28. Find the correct mean.
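No solution appears at this point in the text. One way to work the example, using only Formula (I), is to correct the total first and then divide again:

```latex
\begin{aligned}
\text{Incorrect sum of the 6 observations} &= 6 \times 40 = 240 \\
\text{Correct sum} &= 240 - 28 + 82 = 294 \\
\text{Correct mean} &= \frac{294}{6} = 49
\end{aligned}
```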


25.1.2 Mean of Ungrouped Data


We will explain how to find the mean of ungrouped data through an example.
Find the mean of the marks (out of 15) obtained by 20 students.
12 10 5 8 15 5 2 8 10 5
10 12 12 2 5 2 8 10 5 10
This data is in the form of raw data. We can find the mean of the data by using the Formula (I),
i.e., $\frac{\sum x_i}{n}$. But this process will be time consuming.
We can also find the mean of this data by first making a frequency table of the data and
then applying the formula:

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \qquad \text{(II)}$$

where $f_i$ is the frequency of the ith observation $x_i$.

Frequency table of the data is:
Marks (xi)    Number of students (fi)
2             3
5             5
8             3
10            5
12            3
15            1
              Σfi = 20
To find the mean of this distribution, we first find fixi by multiplying each xi with its
corresponding frequency fi and append a column of fixi to the frequency table as given
below.
Marks (xi)    Number of students (fi)    fixi
2             3                          3 × 2 = 6
5             5                          5 × 5 = 25
8             3                          3 × 8 = 24
10            5                          5 × 10 = 50
12            3                          3 × 12 = 36
15            1                          1 × 15 = 15
              Σfi = 20                   Σfixi = 156

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{156}{20} = 7.8$$
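Building the frequency table by hand is where counting slips usually occur, so it can help to let a program tally the raw marks. The sketch below is an illustration only: it uses Python's `collections.Counter` for the tally, and the variable names are chosen here. It applies Formula (II) to the tallied table and compares the result with Formula (I) applied to the raw data.

```python
from collections import Counter

marks = [12, 10, 5, 8, 15, 5, 2, 8, 10, 5,
         10, 12, 12, 2, 5, 2, 8, 10, 5, 10]

freq = Counter(marks)                               # maps each mark x_i to its frequency f_i
total_f = sum(freq.values())                        # sum of f_i
total_fx = sum(x * f for x, f in freq.items())      # sum of f_i * x_i
mean_from_table = total_fx / total_f                # Formula (II)
mean_from_raw = sum(marks) / len(marks)             # Formula (I)

# Print the frequency table with the extra f_i * x_i column
for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print("mean from table:", mean_from_table, "mean from raw data:", mean_from_raw)
```

Both formulas must give the same value, since Formula (II) only regroups the terms of the sum in Formula (I).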

Example 25.7: The following data represents the weekly wages (in rupees) of the
employees:
Weekly wages (in ₹)    900   1000   1100   1200   1300   1400   1500
Number of employees    12    13     14     13     12     11     5
Find the mean weekly wages of the employees.
Solution: In the following table, entries in the first column are xi's and entries in the second
column are fi's, i.e., the corresponding frequencies. Recall that to find the mean, we require the
product of each xi with its corresponding frequency fi. So, let us put them in a column as
shown in the following table:
Weekly wages (in ₹) (xi)    Number of employees (fi)    fixi
900                         12                          10800
1000                        13                          13000
1100                        14                          15400
1200                        13                          15600
1300                        12                          15600
1400                        11                          15400
1500                        5                           7500
                            Σfi = 80                    Σfixi = 93300

Using the Formula (II),

$$\text{Mean weekly wages} = \frac{\sum f_i x_i}{\sum f_i} = ₹\,\frac{93300}{80} = ₹\,1166.25$$
Sometimes when the numerical values of xi and fi are large, finding the products fixi
becomes tedious and time consuming.
We wish to find a short-cut method. Here, we choose an arbitrary constant a, also called
the assumed mean, and subtract it from each of the values xi. The reduced value,
di = xi – a, is called the deviation of xi from a.
Thus, $x_i = a + d_i$
and $f_i x_i = a f_i + f_i d_i$

$$\sum_{i=1}^{n} f_i x_i = \sum_{i=1}^{n} a f_i + \sum_{i=1}^{n} f_i d_i \qquad \text{[summing both sides over } i \text{ from 1 to } n\text{]}$$

Hence

$$\bar{x} = \frac{1}{N}\left[a \sum f_i + \sum f_i d_i\right], \quad \text{where } \sum f_i = N$$

$$\bar{x} = a + \frac{1}{N} \sum f_i d_i \qquad \text{(III)}$$

[since Σfi = N]
This method of calculation of the mean is known as the Assumed Mean Method.
In Example 25.7, the values xi were large, so finding the products fixi was tedious and
time consuming. Let us find the mean by the Assumed Mean Method. Let us take assumed
mean a = 1200.
Weekly wages (in ₹) (xi)    Number of employees (fi)    Deviations di = xi – 1200    fidi
900                         12                          – 300                        – 3600
1000                        13                          – 200                        – 2600
1100                        14                          – 100                        – 1400
1200                        13                          0                            0
1300                        12                          100                          + 1200
1400                        11                          200                          + 2200
1500                        5                           300                          + 1500
                            Σfi = 80                                                 Σfidi = – 2700

Using Formula (III),

$$\text{Mean} = a + \frac{1}{N} \sum f_i d_i = 1200 + \frac{1}{80}(-2700) = 1200 - 33.75 = 1166.25$$

So, the mean weekly wages = ₹ 1166.25
Observe that the mean is the same whether it is calculated by the Direct Method or by the Assumed
Mean Method.
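That the two methods agree for any choice of a can also be checked numerically. The following sketch implements Formula (II) and Formula (III) side by side; the function names are chosen here for illustration, and the frequencies are those of the solution table of Example 25.7.

```python
def direct_mean(values, freqs):
    """Formula (II): sum of f_i * x_i divided by sum of f_i."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def assumed_mean_method(values, freqs, a):
    """Formula (III): a + (1/N) * sum of f_i * d_i, with d_i = x_i - a."""
    n_total = sum(freqs)
    sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))
    return a + sum_fd / n_total


wages = [900, 1000, 1100, 1200, 1300, 1400, 1500]
employees = [12, 13, 14, 13, 12, 11, 5]     # frequencies from the solution table

print(direct_mean(wages, employees))                 # 1166.25
print(assumed_mean_method(wages, employees, 1200))   # 1166.25
print(assumed_mean_method(wages, employees, 1000))   # 1166.25, a different a gives the same mean
```

The assumed mean only affects how small the intermediate numbers are, which is why a value near the middle of the data is the convenient choice.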

Example 25.8: If the mean of the following data is 20.2, find the value of k.
xi    10    15    20    25    30
fi    6     8     20    k     6
Solution:

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60 + 120 + 400 + 25k + 180}{40 + k} = \frac{760 + 25k}{40 + k}$$

So, $\frac{760 + 25k}{40 + k} = 20.2$ (Given)
or 760 + 25k = 20.2 (40 + k)
or 7600 + 250k = 8080 + 202k
or 48k = 480
or k = 10
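The same algebra works for any table with one unknown frequency: if S and F are the known parts of Σfixi and Σfi, and m is the given mean, then (S + xk·k)/(F + k) = m gives k = (mF – S)/(xk – m). The sketch below is an illustration under that derivation; the function name and the use of `None` to mark the unknown entry are conventions invented here. Exact fractions are used so that 20.2 is not subject to floating-point rounding.

```python
from fractions import Fraction


def missing_frequency(values, freqs, target_mean):
    """Solve for the single unknown frequency (marked None) that yields target_mean."""
    idx = freqs.index(None)                                   # position of the unknown f_k
    known_fx = sum(x * f for x, f in zip(values, freqs) if f is not None)
    known_f = sum(f for f in freqs if f is not None)
    m = Fraction(str(target_mean))                            # exact decimal, e.g. 20.2 -> 101/5
    if values[idx] == m:
        raise ValueError("x_k equals the mean, so the frequency is not determined")
    return (m * known_f - known_fx) / (values[idx] - m)


k = missing_frequency([10, 15, 20, 25, 30], [6, 8, 20, None, 6], "20.2")
print(k)   # 10
```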

CHECK YOUR PROGRESS 25.2


1. Find the mean marks of the following distribution:
Marks        1    2    3    4    5     6     7     8    9    10
Frequency    1    3    5    9    14    18    16    9    3    2
2. Calculate the mean for each of the following distributions:
(i)   x    6     10    15    18    22    27    30
      f    12    36    54    72    62    42    22

(ii)  x    5    5.4    6.2    7.2    7.6    8.4    9.4
      f    3    14     28     23     8      3      1
3. The weights (in kg) of 70 workers in a factory are given below. Find the mean weight
of a worker.
Weight (in kg)    Number of workers
60                10
61                8
62                14
63                16
64                15
65                7
4. If the mean of the following data is 17.45, determine the value of p:
x    15    16    17    18    19    20
f    3     8     10    p     5     4
25.1.3 Mean of Grouped Data
Consider the following grouped frequency distribution:

Daily wages (in ₹)    Number of workers
150-160               5
160-170               8
170-180               15
180-190               10
190-200               2

What we can infer from this table is that there are 5 workers earning a daily wage somewhere
from ₹ 150 to ₹ 160 (160 not included). We do not know exactly what the earnings of
each of these 5 workers are.
Therefore, to find the mean of the grouped frequency distribution, we make the following
assumption:
Frequency in any class is centred at its class mark or mid point.

Now, we can say that there are 5 workers earning a daily wage of ₹ $\frac{150 + 160}{2}$ = ₹ 155 each,
8 workers earning a daily wage of ₹ $\frac{160 + 170}{2}$ = ₹ 165 each, 15 workers earning
a daily wage of ₹ $\frac{170 + 180}{2}$ = ₹ 175 each and so on. Now we can calculate the mean of the given
data as follows, using the Formula (II):

Daily wages (in ₹)    Number of workers (fi)    Class marks (xi)    fixi
150-160               5                         155                 775
160-170               8                         165                 1320
170-180               15                        175                 2625
180-190               10                        185                 1850
190-200               2                         195                 390
                      Σfi = 40                                      Σfixi = 6960

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6960}{40} = 174$$

So, the mean daily wage = ₹ 174
This method of calculating the mean of grouped data is the Direct Method.
We can also find the mean of grouped data by using Formula (III), i.e., by the Assumed Mean
Method, as follows:
We take assumed mean a = 175
Daily wages (in ₹)    Number of workers (fi)    Class marks (xi)    Deviations di = xi – 175    fidi
150-160               5                         155                 – 20                        – 100
160-170               8                         165                 – 10                        – 80
170-180               15                        175                 0                           0
180-190               10                        185                 + 10                        100
190-200               2                         195                 + 20                        40
                      Σfi = 40                                                                  Σfidi = – 40
So, using Formula (III),

$$\text{Mean} = a + \frac{1}{N} \sum f_i d_i = 175 + \frac{1}{40}(-40) = 175 - 1 = 174$$

Thus, the mean daily wage = ₹ 174.
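The class-mark assumption makes grouped data look like an ordinary frequency table, so the Direct Method can be sketched in code as below. The function name `grouped_mean` and the representation of each class as a (lower, upper) pair are assumptions made for this illustration.

```python
def grouped_mean(classes, freqs):
    """Direct Method for grouped data: every observation in a class is
    treated as if it sat at the class mark (the mid point of the class)."""
    class_marks = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(freqs, class_marks))
    return total_fx / sum(freqs)


wage_classes = [(150, 160), (160, 170), (170, 180), (180, 190), (190, 200)]
workers = [5, 8, 15, 10, 2]

print(grouped_mean(wage_classes, workers))   # 174.0
```

The result is an estimate of the true mean, because the individual wages inside each class are unknown.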
Example 25.9: Find the mean for the following frequency distribution by (i) Direct Method,
(ii) Assumed Mean Method.
Class      Frequency
20-40      9
40-60      11
60-80      14
80-100     6
100-120    8
120-140    15
140-160    12
Total      75

Solution: (i) Direct Method
Class      Frequency (fi)    Class marks (xi)    fixi
20-40      9                 30                  270
40-60      11                50                  550
60-80      14                70                  980
80-100     6                 90                  540
100-120    8                 110                 880
120-140    15                130                 1950
140-160    12                150                 1800
           Σfi = 75                              Σfixi = 6970

$$\text{So, mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6970}{75} = 92.93$$

(ii) Assumed Mean Method

Let us take assumed mean a = 90
Class      Frequency (fi)    Class marks (xi)    Deviation di = xi – 90    fidi
20-40      9                 30                  – 60                      – 540
40-60      11                50                  – 40                      – 440
60-80      14                70                  – 20                      – 280
80-100     6                 90                  0                         0
100-120    8                 110                 + 20                      160
120-140    15                130                 + 40                      600
140-160    12                150                 + 60                      720
           N = Σfi = 75                                                    Σfidi = 220

$$\text{Mean} = a + \frac{1}{N} \sum f_i d_i = 90 + \frac{220}{75} = 92.93$$

Note that the mean comes out to be the same in both the methods.
In the table above, observe that the values in column 4 are all multiples of 20. So, if we
divide these values by 20, we would get smaller numbers to multiply with fi.
Note that 20 is also the class size of each class.

So, let $u_i = \frac{x_i - a}{h}$, where a is the assumed mean and h is the class size.

Now we calculate ui in this way and then fiui, and can find the mean of the data by using the
formula

$$\text{Mean} = \bar{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h \qquad \text{(IV)}$$

Let us find the mean of the data given in Example 25.9.

Take a = 90. Here h = 20.

Class      Frequency (fi)    Class marks (xi)    Deviation di = xi – 90    ui = (xi – a)/h    fiui
20-40      9                 30                  – 60                      – 3                – 27
40-60      11                50                  – 40                      – 2                – 22
60-80      14                70                  – 20                      – 1                – 14
80-100     6                 90                  0                         0                  0
100-120    8                 110                 + 20                      1                  8
120-140    15                130                 + 40                      2                  30
140-160    12                150                 + 60                      3                  36
           Σfi = 75                                                                           Σfiui = 11

Using the Formula (IV),

$$\text{Mean} = \bar{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h = 90 + \frac{11}{75} \times 20 = 90 + \frac{220}{75} = 92.93$$

Calculating the mean by using Formula (IV) is known as the Step-deviation Method.
Note that the mean comes out to be the same by using the Direct Method, Assumed Mean Method or
Step-deviation Method.
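Formula (IV) can be sketched in code in the same way as the earlier formulas. The function name and argument order below are chosen for illustration; the data are the class marks and frequencies of Example 25.9.

```python
def step_deviation_mean(class_marks, freqs, a, h):
    """Formula (IV): a + (sum of f_i * u_i / sum of f_i) * h, with u_i = (x_i - a) / h."""
    u = [(x - a) / h for x in class_marks]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    return a + (sum_fu / sum(freqs)) * h


marks_25_9 = [30, 50, 70, 90, 110, 130, 150]
freqs_25_9 = [9, 11, 14, 6, 8, 15, 12]

result = step_deviation_mean(marks_25_9, freqs_25_9, a=90, h=20)
print(round(result, 2))   # 92.93
```

Dividing by h pays off only when all classes have the same width, which is what makes every ui a small whole number.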
Example 25.10: Calculate the mean daily wage from the following distribution by using the
Step-deviation Method.
Daily wages (in ₹)    150-160    160-170    170-180    180-190    190-200
Number of workers     5          8          15         10         2

Solution: We have already calculated the mean by using the Direct Method and the Assumed
Mean Method. Let us find the mean by the Step-deviation Method.
Let us take a = 175. Here h = 10.
Daily wages (in ₹)    Number of workers (fi)    Class marks (xi)    Deviation di = xi – 175    ui = (xi – a)/h    fiui
150-160               5                         155                 – 20                       – 2                – 10
160-170               8                         165                 – 10                       – 1                – 8
170-180               15                        175                 0                          0                  0
180-190               10                        185                 10                         1                  10
190-200               2                         195                 20                         2                  4
                      Σfi = 40                                                                                    Σfiui = – 4

Using Formula (IV),

$$\text{Mean daily wages} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h = 175 + \frac{-4}{40} \times 10 = ₹\,174$$

Note: Here again the mean is the same whether it is calculated using the Direct
Method, Assumed Mean Method or Step-deviation Method.

CHECK YOUR PROGRESS 25.3


1. The following table shows the marks obtained by 100 students in a mathematics test:
Marks                 0-10    10-20    20-30    30-40    40-50    50-60
Number of students    12      15       25       25       17       6
Calculate the mean marks of the students by using the Direct Method.
2. The following is the distribution of bulbs kept in boxes:
Number of bulbs    50-52    52-54    54-56    56-58    58-60
Number of boxes    15       100      126      105      30
Find the mean number of bulbs kept in a box. Which method of finding the mean did
you choose?
3. The weekly observations on cost of living index in a certain city for a particular year
are given below:
Cost of living index    140-150    150-160    160-170    170-180    180-190    190-200
Number of weeks         5          8          20         9          6          4

Calculate the mean weekly cost of living index by using the Step-deviation Method.
4. Find the mean of the following data by using (i) Assumed Mean Method and (ii) Step-
deviation Method.
Class        150-200    200-250    250-300    300-350    350-400
Frequency    48         32         35         20         10

25.2 MEDIAN
In an office there are 5 employees: a supervisor and 4 workers. The workers draw a
salary of ₹ 5000, ₹ 6500, ₹ 7500 and ₹ 8000 per month while the supervisor gets
₹ 20000 per month.

$$\text{In this case, mean (salary)} = ₹\,\frac{5000 + 6500 + 7500 + 8000 + 20000}{5} = ₹\,\frac{47000}{5} = ₹\,9400$$

Note that 4 out of 5 employees have salaries much less than ₹ 9400. The mean salary
₹ 9400 does not give even an approximate estimate of any one of their salaries.
This is a weakness of the mean. It is affected by the extreme values of the observations in
the data.
This weakness of the mean drives us to look for another average which is unaffected by a few
extreme values. Median is one such measure of central tendency.
Median is a measure of central tendency which gives the value of the middle-most
observation in the data when the data is arranged in ascending (or descending)
order.

25.2.1 Median of Raw Data

Median of raw data is calculated as follows:

(i) Arrange the (numerical) data in an ascending (or descending) order.

(ii) When the number of observations (n) is odd, the median is the value of the $\left(\frac{n+1}{2}\right)$th
observation.

(iii) When the number of observations (n) is even, the median is the mean of the $\left(\frac{n}{2}\right)$th
and $\left(\frac{n}{2}+1\right)$th observations.
Let us illustrate this with the help of some examples.
Example 25.11: The weights (in kg) of 15 dogs are as follows:
9, 26, 10, 22, 36, 13, 20, 20, 10, 21, 25, 16, 12, 14, 19
Find the median weight.
Solution: Let us arrange the data in ascending order:
9, 10, 10, 12, 13, 14, 16, 19, 20, 20, 21, 22, 25, 26, 36
Here, number of observations = 15

So, the median will be the $\left(\frac{n+1}{2}\right)$th, i.e., the $\left(\frac{15+1}{2}\right)$th, i.e., the 8th observation, which is 19 kg.
Remark: The median weight 19 kg conveys the information that about half of the dogs have weights
less than 19 kg and about half have weights more than 19 kg.
Example 25.12: The points scored by a basketball team in a series of matches are as
follows:
16, 1, 6, 26, 14, 4, 13, 8, 9, 23, 47, 9, 7, 8, 17, 28
Find the median of the data.
Solution: Here number of observations = 16

So, the median will be the mean of the $\left(\frac{16}{2}\right)$th and $\left(\frac{16}{2}+1\right)$th, i.e., the mean of the 8th and 9th
observations, when the data is arranged in ascending (or descending) order as:
1, 4, 6, 7, 8, 8, 9, 9, 13, 14, 16, 17, 23, 26, 28, 47
Here the 8th term is 9 and the 9th term is 13.
So, the median $= \frac{9 + 13}{2} = 11$
Remark: Here again the median 11 conveys the information that the values of 50% of the
observations are less than 11 and the values of 50% of the observations are more than 11.
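Steps (i) to (iii) for the median of raw data can be sketched as a short function. The name `median_raw` is chosen here for illustration; note that the text counts observations from 1, while Python lists are indexed from 0.

```python
def median_raw(data):
    """Median of raw data following steps (i)-(iii) above."""
    ordered = sorted(data)                 # step (i): arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                # step (ii): the ((n+1)/2)th observation
    # step (iii): mean of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


dog_weights = [9, 26, 10, 22, 36, 13, 20, 20, 10, 21, 25, 16, 12, 14, 19]
team_points = [16, 1, 6, 26, 14, 4, 13, 8, 9, 23, 47, 9, 7, 8, 17, 28]

print(median_raw(dog_weights))   # 19
print(median_raw(team_points))   # 11.0
```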

25.2.2 Median of Ungrouped Data

We illustrate the calculation of the median of ungrouped data through examples.

Example 25.13: Find the median of the following data, which gives the marks, out of 15,
obtained by 35 students in a mathematics test.
Marks obtained        3    5    6    11    15    14    13    7    12    10
Number of Students    4    6    5    7     1     3     2     3    3     1
Solution: First arrange the marks in ascending order and prepare a frequency table as follows:
Marks obtained                    3    5    6    7    10    11    12    13    14    15
Number of Students (frequency)    4    6    5    3    1     7     3     2     3     1

Here n = 35, which is odd. So, the median will be the $\left(\frac{n+1}{2}\right)$th, i.e., the $\left(\frac{35+1}{2}\right)$th, i.e., the 18th
observation.
To find the value of the 18th observation, we prepare a cumulative frequency table as follows:
Marks obtained    Number of students    Cumulative frequency
3                 4                     4
5                 6                     10
6                 5                     15
7                 3                     18
10                1                     19
11                7                     26
12                3                     29
13                2                     31
14                3                     34
15                1                     35
From the table above, we see that the 18th observation is 7.
So, Median = 7
Example 25.14: Find the median of the following data:
Weight (in kg)        40    41    42    43    44    45    46    48
Number of students    2     5     7     8     13    26    6     3

Solution: Here n = 2 + 5 + 7 + 8 + 13 + 26 + 6 + 3 = 70, which is even, and the weights are
already arranged in ascending order. Let us prepare a cumulative frequency table of the
data:
Weight (in kg)    Number of students (frequency)    Cumulative frequency
40                2                                 2
41                5                                 7
42                7                                 14
43                8                                 22
44                13                                35    (35th observation)
45                26                                61    (36th observation)
46                6                                 67
48                3                                 70

Since n is even, the median will be the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations,
i.e., the 35th and 36th observations. From the table, we see that
the 35th observation is 44
and the 36th observation is 45

So, Median $= \frac{44 + 45}{2} = 44.5$
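Looking up the k-th observation in a cumulative frequency column is a search for the first cumulative frequency that reaches k. The sketch below is one possible way to code that lookup; it uses `itertools.accumulate` and `bisect.bisect_left` from the standard library, and the function names are invented for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def median_from_frequencies(values, freqs):
    """Median of ungrouped data given as a frequency table."""
    pairs = sorted(zip(values, freqs))                  # arrange values in ascending order
    xs = [x for x, _ in pairs]
    cumulative = list(accumulate(f for _, f in pairs))  # cumulative frequency column
    n = cumulative[-1]

    def kth_observation(k):
        # first row whose cumulative frequency is at least k (k counted from 1)
        return xs[bisect_left(cumulative, k)]

    if n % 2 == 1:
        return kth_observation((n + 1) // 2)
    return (kth_observation(n // 2) + kth_observation(n // 2 + 1)) / 2


# Example 25.13 (marks given in the unsorted order of the question)
print(median_from_frequencies([3, 5, 6, 11, 15, 14, 13, 7, 12, 10],
                              [4, 6, 5, 7, 1, 3, 2, 3, 3, 1]))     # 7
# Example 25.14
print(median_from_frequencies([40, 41, 42, 43, 44, 45, 46, 48],
                              [2, 5, 7, 8, 13, 26, 6, 3]))          # 44.5
```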

CHECK YOUR PROGRESS 25.4


1. Following are the goals scored by a team in a series of 11 matches
1, 0 , 3, 2, 4, 5, 2, 4, 4, 2, 5
Determine the median score.
2. In a diagnostic test in mathematics given to 12 students, the following marks (out of
100) are recorded
46, 52, 48, 39, 41, 62, 55, 53, 96, 39, 45, 99
Calculate the median for this data.

3. A fair die is thrown 100 times and its outcomes are recorded as shown below:
Outcome      1     2     3     4     5     6
Frequency    17    15    16    18    16    18
Find the median outcome of the distribution.
4. For each of the following frequency distributions, find the median:
(a) xi    2    3    4     5     6     7
    fi    4    9    16    14    11    6

(b) xi    5    10    15    20    25    30    35    40
    fi    3    7     12    20    28    31    28    26

(c) xi    2.3    3    5.1    5.8    7.4    6.7    4.3
    fi    5      8    14     21     13     5      7

25.3 MODE
Look at the following example:
A company produces readymade shirts of different sizes. The company kept record of its
sale for one week which is given below:

Size (in cm)        90    95     100    105    110    115
Number of shirts    50    125    190    385    270    28

From the table, we see that the sale of shirts of size 105 cm is the highest. So, the company
will go ahead producing this size in the largest number. Here, 105 is nothing but the mode
of the data. Mode is also one of the measures of central tendency.
The observation that occurs most frequently in the data is called mode of the
data.
In other words, the observation with maximum frequency is called mode of the data.
The readymade garments and shoe industries etc, make use of this measure of central
tendency. Based on mode of the demand data, these industries decide which size of the
product should be produced in large numbers to meet the market demand.

25.3.1 Mode of Raw Data

In the case of raw data, it is easy to pick out the mode by just looking at the data. Let us consider
the following example:

Example 25.15: The number of goals scored by a football team in 12 matches are:
1, 2, 2, 3, 1, 2, 2, 4, 5, 3, 3, 4
What is the modal score?

Solution: Just by looking at the data, we find that the frequency of 2 is 4, which is more than the
frequency of every other score.
So, the mode of the data is 2, or the modal score is 2.
Example 25.16: Find the mode of the data:
9, 6, 8, 9, 10, 7, 12, 15, 22, 15
Solution: Arranging the data in increasing order, we have
6, 7, 8, 9, 9, 10, 12, 15, 15, 22
We find that both the observations 9 and 15 have the same maximum frequency 2. So,
both are modes of the data.
Remarks: 1. In this lesson, we will take up data having a single mode only.
2. In the data, if each observation has the same frequency, then we say that the data does
not have a mode.
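For longer lists, picking the mode by eye becomes unreliable, and a tally does the job. The following sketch returns every value with the maximum frequency, and returns an empty list when all observations are equally frequent, following Remark 2. The function name `modes` and the choice to return a list are conventions assumed for this illustration.

```python
from collections import Counter


def modes(data):
    """All observations that occur with the maximum frequency."""
    if not data:
        raise ValueError("the mode of empty data is undefined")
    counts = Counter(data)
    top = max(counts.values())
    # Remark 2: if every distinct observation has the same frequency, there is no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 1, 2, 2, 4, 5, 3, 3, 4]))    # [2]      (Example 25.15)
print(modes([9, 6, 8, 9, 10, 7, 12, 15, 22, 15]))     # [9, 15]  (Example 25.16)
```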

25.3.2 Mode of Ungrouped Data

Let us illustrate finding of the mode of ungrouped data through an example


Example 25.17: Find the mode of the following data:
Weight (in kg) 40 41 42 43 44 45 46 48
Number of Students 2 6 8 9 10 22 13 5
Solution: From the table, we see that the weight 45 kg has maximum frequency 22 which
means that maximum number of students have their weight 45 kg. So, the mode is 45 kg or
the modal weight is 45 kg.

CHECK YOUR PROGRESS 25.5


1. Find the mode of the data:
5, 10, 3, 7, 2, 9, 6, 2, 11, 2
2. The number of TV sets in each of 15 households are found as given below:
2, 2, 4, 2, 1, 1, 1, 2, 1, 1, 3, 3, 1, 3, 0
What is the mode of this data?



Median and Interquartile Range – Grouped Data
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide the class that contains the median. The median class is the first class
whose cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:

$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$$

where:
n = the total frequency
F = the cumulative frequency before the median class
$f_m$ = the frequency of the median class
i = the class width
$L_m$ = the lower boundary of the median class
Example: Based on the grouped data below, find the median:
Time to travel to work (minutes)    Frequency
1 – 10                              8
11 – 20                             14
21 – 30                             12
31 – 40                             9
41 – 50                             7
Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work (minutes)    Frequency    Cumulative Frequency
1 – 10                              8            8
11 – 20                             14           22
21 – 30                             12           34
31 – 40                             9            43
41 – 50                             7            50

$\frac{n}{2} = \frac{50}{2} = 25$, so the median class is the 3rd class (21 – 30), the first class whose
cumulative frequency (34) is at least 25.
So, F = 22, $f_m$ = 12, $L_m$ = 20.5 and i = 10
Therefore,

$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i = 20.5 + \left(\frac{25 - 22}{12}\right) \times 10 = 20.5 + 2.5 = 23$$

Thus, about 25 persons take less than 23 minutes to travel to work and another 25 persons
take more than 23 minutes to travel to work.
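The three steps for the median of grouped data can be sketched as below. The function name and the representation of each class by its (lower boundary, upper boundary) pair are assumptions made for this illustration; the boundaries 0.5, 10.5, 20.5 and so on follow from the lower boundary $L_m$ = 20.5 stated in the example. With these inputs the formula evaluates to 20.5 + (3/12) × 10 = 23.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_m + ((n/2 - F) / f_m) * i."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    cumulative_before = 0                                  # F: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative_before + f >= n / 2:                 # first class reaching n/2 is the median class
            width = upper - lower                          # i: class width
            return lower + ((n / 2 - cumulative_before) / f) * width
        cumulative_before += f
    raise ValueError("no median class found")


travel_boundaries = [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5), (30.5, 40.5), (40.5, 50.5)]
travel_freqs = [8, 14, 12, 9, 7]

print(grouped_median(travel_boundaries, travel_freqs))   # 23.0
```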
What is Mode of Grouped Data?
Mode is a measure of central tendency that summarises a data set by its most frequently occurring
value. When dealing with ungrouped data, the mode is simply the item with the highest frequency.
For grouped data, the mode is estimated by a formula applied to the modal class, i.e., the class
with the highest frequency.
Empirical Formula for Mean, Median and Mode
For a moderately skewed frequency distribution, there exists an approximate relationship between
mean, median and mode, which is given below:
Mode = 3 Median – 2 Mean

Example: For a given distribution the values of mean and median are 44 and 43 respectively.
Find the value of mode.
Solution:
We know,

Mode = 3 Median – 2 Mean

⇒ Mode = 3×43 – 2×44

⇒ Mode = 129 – 88 = 41

Find the mode of grouped data presented in the table below:

Class Interval    Frequency
10-20             8
20-30             15
30-40             12
40-50             5

Solution:
Modal class = 20 – 30 (the class with the highest frequency)

Lower limit of the modal class (L) = 20

Frequency of the modal class ($f_1$) = 15

Frequency of the class preceding the modal class ($f_0$) = 8

Frequency of the class succeeding the modal class ($f_2$) = 12

Size of the class interval (h) = 10

The mode of grouped data is given by

$$\text{Mode} = L + h \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right)$$

⇒ Mode = 20 + 10 × (15 – 8)/(2×15 – 8 – 12)

⇒ Mode = 20 + 10 × (7/10)

⇒ Mode = 20 + 7 = 27
Therefore, Mode = 27
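The calculation above can be sketched as a function. The function name is invented here, and two choices are assumptions rather than something the text states: a missing neighbour (when the modal class is the first or last class) is treated as having frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode of grouped data: L + h * (f1 - f0) / (2*f1 - f0 - f2)."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # index of the modal class (first on ties)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # assumption: no preceding class counts as 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0       # assumption: no succeeding class counts as 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("the formula is undefined when 2*f1 = f0 + f2")
    return lower_limits[k] + h * (f1 - f0) / denominator


print(grouped_mode([10, 20, 30, 40], [8, 15, 12, 5], h=10))   # 27.0
```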

Measure of Dispersion in Statistics


Measures of dispersion describe the scattering of the data, i.e., how widely the values are spread
out in the data set. In statistics, a measure of dispersion is a single number, such as the range,
the interquartile range or the standard deviation, that summarises this spread.
These measures of dispersion capture the variation between different values of the data.

Types of Measures of dispersion

Range of Data Set


The range is the difference between the largest and the smallest values in the distribution.
Thus, it can be written as
R = L – S

where
L is the largest value in the distribution and S is the smallest value in the distribution.
Example: Find the range of the data set 10, 20, 15, 0, 100.
Solution:
• Smallest value in the data = 0
• Largest value in the data = 100

Thus, the range of the data set is

R = 100 – 0 = 100

Range cannot be calculated for open-ended frequency distributions. Open-ended
frequency distributions are those distributions in which either the lower limit of the lowest
class or the higher limit of the highest class is not defined.

Range for Ungrouped Data


To find the range of an ungrouped data set, we first find the smallest and the largest values
of the data set by inspection. The difference between them gives the range of the ungrouped data.
We can understand this with the help of the following example:
Example: Find the range of the following observations: 20, 24, 31, 17, 45, 39, 51, 61.
Solution:
• Largest value = 61
• Smallest value = 17
Thus, the range of the data set is

Range = 61 – 17 = 44
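In code, the range is a one-line computation. The sketch below is only an illustration; the name `data_range` is chosen here to avoid clashing with Python's built-in `range`.

```python
def data_range(data):
    """Range R = L - S: the largest value minus the smallest value."""
    return max(data) - min(data)


print(data_range([10, 20, 15, 0, 100]))               # 100
print(data_range([20, 24, 31, 17, 45, 39, 51, 61]))   # 44
```

Because the range depends only on the two extreme values, a single unusual observation (such as the 100 in the first data set) can dominate it.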

Range for Grouped Data


For grouped data, the range is the difference between the upper limit of the highest class and
the lower limit of the lowest class, as the following example shows.
Example: Find the range of the following frequency distribution of the marks
scored by class 10 students.

Marks Intervals    Number of Students
0-10               5
10-20              8
20-30              15
30-40              9
Solution:
• For the largest value: taking the higher limit of the highest class = 40
• For the smallest value: taking the lower limit of the lowest class = 0

Thus, the range of the given data set is

Range = 40 – 0 = 40

Interquartile Range (IQR)


The quartiles of a ranked set of data values are three points that divide the data into four
equal parts, each part comprising a quarter of the data.
1. Q1 is defined as the middle number between the smallest number and the median of the data
set.
2. Q2 is the median of the data.
3. Q3 is the middle value between the median and the highest value of the data set.
The interquartile range (IQR) tells us the range within which the middle half of the values lie.
It is calculated by subtracting the first quartile from the third quartile.
IQR = Q3 – Q1

Example:

Input: 1, 19, 7, 6, 5, 9, 12, 27, 18, 2, 15

Output: 13
The data set after being sorted is
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27
As mentioned above, Q2 is the median of the data.
Hence Q2 = 9
Q1 is the median of the lower half, taking Q2 as pivot.
So Q1 = 5
Q3 is the median of the upper half, taking Q2 as pivot.
So Q3 = 18
Therefore the IQR for the given data = Q3 – Q1 = 18 – 5 = 13
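The "median of each half" procedure used in this example can be sketched as below. The function name is invented for this illustration, and one detail is an assumption based on the example: when the number of values is odd, the median itself (the pivot) is left out of both halves.

```python
from statistics import median


def iqr_median_of_halves(data):
    """IQR where Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are needed")
    half = n // 2
    lower_half = ordered[:half]
    # for odd n the middle value (Q2) is the pivot and belongs to neither half
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(upper_half) - median(lower_half)


print(iqr_median_of_halves([1, 19, 7, 6, 5, 9, 12, 27, 18, 2, 15]))   # 13
```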

The upper quartile is calculated using the formula:

Q3 = ((3 × (n + 1))/4)th term

The lower quartile is calculated using the formula:

Q1 = ((n + 1)/4)th term

where n is the total number of terms.

Question 1: Find the interquartile range for the data 20, 10, 50, 40, 25, 70, 30

Solution:
Step 1: The given data is unsorted, so sort it in ascending order.

10, 20, 25, 30, 40, 50, 70

Step 2: Find the first quartile.

Q1 = ((n + 1)/4)th term

Here n = 7 (7 terms in total)

= ((7 + 1)/4)th term
= (8/4)th term
= 2nd term

The 2nd term is 20

So Q1 = 20

Find the upper (third) quartile.

Q3 = ((3 × (n + 1))/4)th term

Here n = 7 (7 terms in total)

= ((3 × (7 + 1))/4)th term
= ((3 × 8)/4)th term
= (24/4)th term
= 6th term

The 6th term is 50

So Q3 = 50

Step 3: Find the IQR (interquartile range).

IQR = Q3 – Q1
= 50 – 20
= 30

The interquartile range for the given data is 30.

Question 2: Find the interquartile range for the data 22, 12, 55, 45, 25, 75, 30, 26, 49

Solution:
Step 1: The given data is unsorted, so sort it in ascending order.

12, 22, 25, 26, 30, 45, 49, 55, 75

Step 2: Find the first quartile.

Q1 = ((n + 1)/4)th term

Here n = 9 (9 terms in total)

= ((9 + 1)/4)th term
= (10/4)th term
= 2.5th term

The 2.5th term is the average of the 2nd and 3rd terms

2.5th term = (22 + 25)/2
= 47/2 = 23.5

So Q1 = 23.5

Find the upper (third) quartile.

Q3 = ((3 × (n + 1))/4)th term

Here n = 9 (9 terms in total)

= ((3 × (9 + 1))/4)th term
= ((3 × 10)/4)th term
= (30/4)th term
= 7.5th term

The 7.5th term is the average of the 7th and 8th terms

7.5th term = (49 + 55)/2
= 104/2
= 52

So Q3 = 52

Step 3: Find the IQR (interquartile range).

IQR = Q3 – Q1
= 52 – 23.5
= 28.5

The interquartile range for the given data is 28.5
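The position formulas for Q1 and Q3 can be sketched as follows. The function names are invented here. The text only shows how to read a ".5th" term (as the average of its two neighbours); treating other fractional positions by linear interpolation between neighbours is an assumption of this sketch, and it reduces to the same average for a .5 position.

```python
def term_at(sorted_data, position):
    """Value at a 1-based position; fractional positions are linearly
    interpolated between the two neighbouring terms (an assumption)."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)


def iqr_by_position(data):
    """Return (Q1, Q3, IQR) using the ((n+1)/4)th and (3(n+1)/4)th terms."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("the position formulas need at least three values")
    q1 = term_at(ordered, (n + 1) / 4)
    q3 = term_at(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


print(iqr_by_position([20, 10, 50, 40, 25, 70, 30]))           # (20, 50, 30)
print(iqr_by_position([22, 12, 55, 45, 25, 75, 30, 26, 49]))   # (23.5, 52.0, 28.5)
```

Different textbooks and software packages use slightly different quartile conventions, so results for small data sets can differ from one tool to another.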


What is Standard Deviation?
Standard Deviation is a measure of the dispersion of the data points about their mean value. It
tells us how much the values of the data points vary around the mean.
The standard deviation of a data set is defined as the square root of the variance of the data
set. The variance of n values (say x1, x2, x3, …, xn) is calculated by taking the sum of the
squares of the differences of each value from the mean and dividing it by n, i.e.

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}$$

Example: Find the Standard Deviation of the data set X = {2, 3, 4, 5, 6}

Solution:
Given: n = 5, and observations xi = {2, 3, 4, 5, 6}

We know,

Mean (μ) = (Sum of Observations)/(Number of Observations)

⇒ μ = (2 + 3 + 4 + 5 + 6)/5
⇒ μ = 4

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$

⇒ $\sigma^2 = \frac{1}{5}\left[(2 - 4)^2 + (3 - 4)^2 + (4 - 4)^2 + (5 - 4)^2 + (6 - 4)^2\right]$

⇒ $\sigma^2 = \frac{10}{5} = 2$

Thus, $\sigma = \sqrt{2} \approx 1.414$
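The calculation above divides by n, which is the population form of the variance. A minimal sketch of it is given below; the function name is chosen here, and the standard library's `statistics.pstdev` is used only as a cross-check.

```python
import math
from statistics import pstdev


def population_std(data):
    """Standard deviation with divisor n: the square root of the mean squared deviation."""
    n = len(data)
    if n == 0:
        raise ValueError("the standard deviation of empty data is undefined")
    mu = sum(data) / n                                   # mean
    variance = sum((x - mu) ** 2 for x in data) / n      # sigma squared
    return math.sqrt(variance)


x = [2, 3, 4, 5, 6]
print(round(population_std(x), 3))   # 1.414
print(round(pstdev(x), 3))           # 1.414, cross-check with the standard library
```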


Standard Deviation of Discrete Data by Assumed mean Method
Standard Deviation of Discrete Data by Step Deviation Method
