MEASURES OF LOCATION OR CENTRAL TENDENCY
Adegbilero-Iwari Oluwaseun Eniola (PhD)
Department of Community Medicine, College of Medicine and Health Sciences, Afe Babalola University
Ado-Ekiti, Ekiti.
Introduction
The first step in looking at a data set is to describe it in some concise way. Measures of location or central tendency are descriptive measures concerned with summarizing and interpreting raw data (which usually come as large sets of unorganized numerical values) in a meaningful way, so that the important features of the statistical investigation can be identified easily. These measures play a crucial role in making inferences about a population.
Objectives
The aim of this chapter is to introduce you to descriptive measures such as the measures of location. At
the end of this chapter, you should be able to:
i. Understand and use the summation operator in computation
ii. Understand measures of location and use them to summarise data.
MEASURES OF LOCATION OR CENTRAL TENDENCY
The basic problem of statistics can be stated as follows. Consider a sample of data $x_1, x_2, \ldots, x_n$, where $x_1$ corresponds to the first sample point and $x_n$ to the $n$th sample point. Presuming that the sample is drawn from some population $P$, what inferences or conclusions can be made about $P$ from the sample?
Before answering this question, the data must be summarized as much as possible, especially when the sample size is large, so as not to lose track of the overall picture. One measure useful for this purpose is called a measure of location.
Measures of location or central tendency are central values used to locate the centre of a frequency distribution, that is, the middlemost value about which the other values cluster.
The three most commonly used measures of central tendency are the mean, median and mode.
The Mean
The arithmetic mean (or mean or sample mean) is the most commonly used measure of central tendency.
It is the sum of a collection of observations divided by the number of observations. It is expressed as:
$$\bar{x} = \frac{\sum x_i}{n} \qquad \text{or} \qquad \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $f_i$ is the frequency of the value $x_i$. The sign Σ (sigma, a Greek letter) is a summation operator: $\sum x_i$ is simply a short way of writing $x_1 + x_2 + \cdots + x_n$.
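The two formulas can be sketched in Python as follows. This is a minimal illustration, assuming the observations are held in plain Python lists; the function names sample_mean and frequency_mean are hypothetical helpers introduced here, not part of any library. Python's built-in sum plays the role of the Σ operator, and the last two lines show that writing the same data as raw observations or as a frequency distribution gives the same mean.

```python
def sample_mean(xs):
    """Arithmetic mean: (x_1 + x_2 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# The same data written both ways: raw observations 2, 2, 5
# versus the value 2 occurring twice and the value 5 occurring once.
print(sample_mean([2, 2, 5]))          # 3.0
print(frequency_mean([2, 5], [2, 1]))  # 3.0
```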
Properties of the Mean
1. Uniqueness: For a given set of data there is one and only one arithmetic mean.
2. Simplicity: It is easily understood and easy to compute.
3. Its major disadvantage is that it is affected by extreme values, particularly when the number of observations is small, because each and every value in a data set enters into the computation of the mean.
Example 1
Given 4, 2, 3, 1, 6 and 7 as the ratings of the quality of health care by six patients in a hospital, compute the following using the Σ notation:
1. $\sum x_i$
2. $\sum x_i^2$
3. $\left(\frac{\sum x_i}{4}\right)^2$
Answer
1. $\sum x_i = 4 + 2 + \cdots + 7 = 23$
2. $\sum x_i^2 = 4^2 + 2^2 + \cdots + 7^2 = 115$
3. $\left(\frac{\sum x_i}{4}\right)^2 = \left(\frac{23}{4}\right)^2 = \frac{23^2}{4^2} = \frac{529}{16} = 33.06$
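As a quick check of Example 1, the three quantities can be reproduced in Python. This is only an illustrative sketch; the variable names are chosen here for readability. Notice the order of operations: in $\sum x_i^2$ each value is squared before the values are added, whereas in item 3 the values are added first and the total is then divided by 4 and squared.

```python
ratings = [4, 2, 3, 1, 6, 7]  # quality-of-care ratings from Example 1

sum_x = sum(ratings)                     # Σ x_i
sum_x_sq = sum(x ** 2 for x in ratings)  # Σ x_i^2: square each value, then add
item3 = (sum_x / 4) ** 2                 # (Σ x_i / 4)^2: add first, then divide and square

print(sum_x)            # 23
print(sum_x_sq)         # 115
print(round(item3, 2))  # 33.06
```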
Example 2
Some researchers studied 13 HIV-positive patients who were treated with highly active antiretroviral therapy (HAART) for at least 6 months. The CD4 T cell counts ($\times 10^6$/L) at baseline for the 13 subjects are listed below:
230, 205, 313, 207, 227, 245, 173, 58, 103, 181, 105, 301, 169. Find the mean.
Answer
$$\bar{x} = \frac{\sum x_i}{n} = \frac{230 + 205 + \cdots + 169}{13} = \frac{2517}{13} = 193.62$$
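The arithmetic in Example 2 can be checked with a few lines of Python. This sketch computes the mean directly from the formula and also with the standard library's statistics.mean; the list name cd4 is simply an illustrative label for the 13 baseline counts.

```python
from statistics import mean

# CD4 T cell counts (x10^6/L) at baseline for the 13 subjects in Example 2
cd4 = [230, 205, 313, 207, 227, 245, 173, 58, 103, 181, 105, 301, 169]

total = sum(cd4)
print(total)                       # 2517
print(round(total / len(cd4), 2))  # 193.62, using sum(x_i) / n
print(round(mean(cd4), 2))         # 193.62, using the standard library
```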
Example 3
Suppose that in Example 2 the 5th observation (227) is replaced by 2. What is the effect on the mean of the data?
Answer: The mean decreases considerably, from $\frac{2517}{13} = 193.62$ to $\frac{2517 - 227 + 2}{13} = \frac{2292}{13} = 176.31$. This illustrates a major disadvantage of the mean: it is affected by extreme values, particularly when the number of observations is small.
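The effect described in Example 3 can be reproduced by changing one value and recomputing. This is a small sketch: Python lists are indexed from 0, so the 5th observation sits at index 4; a copy is taken so the original data stay unchanged.

```python
from statistics import mean

cd4 = [230, 205, 313, 207, 227, 245, 173, 58, 103, 181, 105, 301, 169]

# Replace the 5th observation (index 4, originally 227) by the extreme value 2
cd4_extreme = cd4.copy()
cd4_extreme[4] = 2

print(round(mean(cd4), 2))          # 193.62
print(round(mean(cd4_extreme), 2))  # 176.31
```

A single changed value out of 13 moves the mean by more than 17 units, which is exactly the sensitivity to extreme values noted in the properties of the mean.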
Example 4 (Grouped data- Frequency distribution)
The following data are marks obtained in a test by 88 medical students.
Table 1: Frequency distribution of marks obtained by 88 medical students in a test
Marks        0 ≤ x < 10   10 ≤ x < 20   20 ≤ x < 30   30 ≤ x < 40   40 ≤ x ≤ 50
Frequency    6            16            24            25            17
Obtain the mean.
Answer
The mean of a frequency distribution is expressed as:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
Table 2: Frequency distribution of marks obtained by 88 medical students: computation of the mean
Marks          $x_i$                 $f_i$    $f_i x_i$
0 ≤ x < 10     (0 + 10)/2 = 5        6        30
10 ≤ x < 20    (10 + 20)/2 = 15      16       240
20 ≤ x < 30    (20 + 30)/2 = 25      24       600
30 ≤ x < 40    (30 + 40)/2 = 35      25       875
40 ≤ x ≤ 50    (40 + 50)/2 = 45      17       765
Total                                88       2510
where
$f_i$ is the frequency of the $i$th class, and
$x_i$ is the mid-interval value (class mark) of the $i$th class, that is, the average of its lower and upper bounds.
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2510}{88} = 28.523$$
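The grouped-data calculation of Example 4 can be sketched as below. As in the worked answer, every mark in a class is assumed to sit at the class mid-point, so the result approximates the mean of the (unknown) raw marks. The names classes, freqs and midpoints are illustrative.

```python
# Class limits and frequencies from Table 1
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [6, 16, 24, 25, 17]

# Mid-interval value (class mark) of each class
midpoints = [(lower + upper) / 2 for lower, upper in classes]

sum_f = sum(freqs)                                      # Σ f_i
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # Σ f_i x_i

print(midpoints)                 # [5.0, 15.0, 25.0, 35.0, 45.0]
print(sum_f, sum_fx)             # 88 2510.0
print(round(sum_fx / sum_f, 3))  # 28.523
```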
The Median
The median of a collection of observations is the middlemost measurement when the observations are arranged according to size from smallest to largest. The sample median is:
the $\left(\frac{n+1}{2}\right)$th observation in the ordered sample if $n$ is odd;
the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations in the ordered sample if $n$ is even.
The rationale for these definitions is to ensure an equal number of sample points on both sides of the
sample median.
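The odd/even rule translates directly into code. The sketch below is a hypothetical helper named sample_median, written to mirror the definition; note that the positions in the definition count from 1 while Python lists are indexed from 0. Python's standard library offers statistics.median, which applies the same rule.

```python
def sample_median(xs):
    """Median following the odd/even rule (positions in the rule are 1-based)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th ordered observation
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n/2)th and (n/2 + 1)th ordered observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([3, 1, 2]))     # 2
print(sample_median([4, 1, 3, 2]))  # 2.5
```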
Properties of the Median
1. Uniqueness: For a given set of data there exists only one median.
2. Simplicity: It is easy to calculate.
3. Its principal strength is that it is insensitive to very large or very small values.
4. Its principal weakness is that it is determined mainly by the middle points in a sample and is less
sensitive to the actual numeric values of the remaining data points.
Example 5
Some investigators performed a 4-year retrospective review of 102 women undergoing radical
hysterectomy for cervical or endometrial cancer. Catheter-associated urinary tract infection was
observed in 7 of the subjects. Below are the numbers of postoperative days until diagnosis of the infection for each subject who experienced an infection:
12, 13, 16, 15, 17, 18 and 20. Determine the median number of postoperative days until diagnosis.
Answer
The first step is to arrange the sample in ascending order:
12, 13, 15, 16, 17, 18, 20
Since $n$ (the number of observations) is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation in the ordered sample, that is, the $\left(\frac{7+1}{2}\right)$th = 4th observation = 16 days.
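The steps of Example 5 (sort, locate the middle position, read off the value) can be followed in Python. This sketch uses only the standard library; the final line cross-checks the hand calculation with statistics.median.

```python
from statistics import median

# Postoperative days until diagnosis of infection (Example 5)
days = [12, 13, 16, 15, 17, 18, 20]

ordered = sorted(days)    # step 1: arrange in ascending order
n = len(ordered)
position = (n + 1) // 2   # n = 7 is odd, so the ((7 + 1) / 2)th = 4th observation

print(ordered)                          # [12, 13, 15, 16, 17, 18, 20]
print(position, ordered[position - 1])  # 4 16
print(median(days))                     # 16
```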
Example 6
Now suppose that the second observation in Example 5 (13) is replaced by 5, or that the last (20) is replaced by 50. In either case the median remains 16. This illustrates its principal strength: it is insensitive to very large or very small values.
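The contrast with the mean is easy to see by computing both measures for the three versions of the Example 5 data. This is an illustrative sketch; the labels low_changed and high_changed are names chosen here.

```python
from statistics import mean, median

days = [12, 13, 16, 15, 17, 18, 20]
low_changed = [12, 5, 16, 15, 17, 18, 20]    # second observation 13 -> 5
high_changed = [12, 13, 16, 15, 17, 18, 50]  # last observation 20 -> 50

for label, data in [("original", days), ("13 -> 5", low_changed), ("20 -> 50", high_changed)]:
    # print the label, the median, and the mean rounded to 2 decimals
    print(label, median(data), round(mean(data), 2))
```

The median stays at 16 in every case, while the mean moves from 15.86 to 14.71 and to 20.14.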
Relationship between the sample mean and median
In many samples, the relationship between the sample mean and the sample median can be used to
assess the symmetry of a distribution. In particular, for symmetric distributions (such as the bell-shaped curve), the
mean will be approximately the same as the median. For a positively skewed distribution, the mean will
tend to be larger than the median while for a negatively skewed distribution, the mean will tend to be
smaller than the median.
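To see this rule of thumb at work, the sketch below applies it to the 500 menstrual-interval observations of Table 3 (Example 7). It expands the frequency table into individual observations and compares the mean with the median. The comparison only suggests the direction of skewness; it is a heuristic, not a formal test, and the variable names are illustrative.

```python
from statistics import mean, median

# Table 3 (Example 7): interval length in days -> number of women
table3 = {24: 5, 25: 10, 26: 28, 27: 64, 28: 185, 29: 96, 30: 63, 31: 24,
          32: 9, 33: 2, 34: 7, 35: 3, 36: 2, 37: 1, 38: 1}

# Expand the frequency table into the individual observations
intervals = [value for value, freq in table3.items() for _ in range(freq)]

mean_days = mean(intervals)
median_days = median(intervals)
print(len(intervals), round(mean_days, 2), median_days)

if mean_days > median_days:
    print("mean > median: suggests a positively (right) skewed distribution")
elif mean_days < median_days:
    print("mean < median: suggests a negatively (left) skewed distribution")
else:
    print("mean = median: consistent with a symmetric distribution")
```

For these data the mean is about 28.54 days and the median is 28 days, so the heuristic points to a positive skew, consistent with the tail of intervals stretching up to 38 days in Table 3.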
The Mode
The mode is the most frequently occurring value among all the observations in a sample. A collection of
data could be unimodal (one mode), bimodal (2 modes), trimodal (3 modes) or multimodal (many
modes).
Properties of the Mode
1. Not unique: For a given set of data, it is possible to have more than one mode.
2. Simplicity: It can be obtained simply by inspection.
3. Its principal strength is its insensitivity to very large or very small values, since neither extreme of the data set is needed in its computation.
4. Its principal weakness is that it is determined mainly by the most frequently occurring observation in a sample and is less sensitive to the actual numeric values of the remaining data points.
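Because the mode need not be unique, a helper that finds it should be able to return more than one value. The sketch below is a hypothetical function named modes built on collections.Counter; Python's standard library also provides statistics.multimode for the same purpose. One caveat: when every value occurs exactly once, this sketch returns all of them, whereas some texts would say such data have no mode.

```python
from collections import Counter


def modes(xs):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(xs)         # value -> number of occurrences
    top = max(counts.values())   # highest frequency in the data
    return sorted(value for value, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))     # [2]     (unimodal)
print(modes([1, 1, 2, 3, 3]))  # [1, 3]  (bimodal)
```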
Example 7
Consider the sample of time intervals between successive menstrual periods for a group of 500 college
ladies aged 18-21, as shown in the table below. The frequency column gives the number of women who
reported each of the respective durations. Obtain the mode.
Table 3: Time intervals between successive menstrual periods for a group of 500 college ladies aged 18-21
Value (days)   Frequency   Value (days)   Frequency   Value (days)   Frequency
24             5           29             96          34             7
25             10          30             63          35             3
26             28          31             24          36             2
27             64          32             9           37             1
28             185         33             2           38             1
Answer
The mode is the most frequently occurring value among all the observations. In Table 3 the largest frequency is 185, which corresponds to an interval of 28 days. Hence the mode is 28 days.
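When the data already come as a frequency table, as in Example 7, the mode is simply the value with the largest frequency. The sketch below stores Table 3 in a dictionary (an illustrative choice) and also checks that the frequencies add up to 500.

```python
# Table 3: interval length in days -> frequency
table3 = {24: 5, 25: 10, 26: 28, 27: 64, 28: 185, 29: 96, 30: 63, 31: 24,
          32: 9, 33: 2, 34: 7, 35: 3, 36: 2, 37: 1, 38: 1}

total = sum(table3.values())              # should equal the 500 women surveyed
mode_value = max(table3, key=table3.get)  # value with the largest frequency

print(total)                           # 500
print(mode_value, table3[mode_value])  # 28 185
```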
Choice of Measure of Central Tendency
Your choice of a measure of central tendency depends on how you intend to use it. The median is preferable when the data may contain extreme values. When qualitative data are to be described, the mode is the most useful. However, for the purpose of inference, the mean is the most useful. For research purposes, it is usually beneficial to report all three measures.
Exercise
An investigation was carried out on the survival times of 15 patients following a new treatment for prostate cancer. The times, in months, were:
25, 16, 20, 25, 30, 35, 20, 27, 35, 40, 28, 40, 24, 25 and 30.
Obtain the mean, median and mode for the data set above.
3.2 MEASURES OF DISPERSION (SPREAD OR VARIABILITY)
It is possible for two data sets to have the same mean, the same median or the same mode, yet differ in their degree of spread around the centre. For instance, consider two samples of cholesterol measurements, each on the same person, but using different measurement techniques.
Autoanalyzer method (mg/dl): 177, 193, 195, 209, 226
Microenzymatic method (mg/dl): 192, 197, 202, 209
The two samples have the same mean, $\bar{x} = 200$, but you will discover that the Autoanalyzer method has a greater variability (spread) than the Microenzymatic method. We now consider the most common measure of this spread.
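The claim about the two cholesterol samples can be checked numerically. The sketch below computes each sample's mean with Python's statistics.mean and prints the smallest and largest values as an informal look at the spread; it does not compute any of the formal measures of dispersion.

```python
from statistics import mean

autoanalyzer = [177, 193, 195, 209, 226]  # mg/dl
microenzymatic = [192, 197, 202, 209]     # mg/dl

for name, sample in [("Autoanalyzer", autoanalyzer), ("Microenzymatic", microenzymatic)]:
    # sample mean, then the smallest and largest readings
    print(name, mean(sample), min(sample), max(sample))
```

Both means equal 200 mg/dl, but the Autoanalyzer readings run from 177 to 226 while the Microenzymatic readings run only from 192 to 209.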