3.0   MEASURES OF CENTRE AND MEASURES OF DISPERSION
3.1   Measures of Centre (also referred to as location parameters)
A fundamental task in many statistical analyses is to find a typical (or central) value that best describes
the data, or "location parameter for the distribution". There are three common measures of centre:
         3.1.1     Mode
The mode is the value of the observation that occurs with the greatest frequency. It is not necessarily
unique. There may be a single dominant observation in the data, or perhaps two or more modes. This is
usually evident from a histogram of the data. When taking samples from continuous populations, we
need to be somewhat careful in how we define the mode. A specific value may not occur more than
once if the data are continuous. What may be more meaningful is the midpoint of the class interval
of the histogram with the highest peak. In this case the mode is an entire class, and it is referred to as
"the modal class".
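As a rough illustration of the modal class idea, the sketch below groups continuous values into equal-width class intervals and reports the interval with the highest count. The function name `modal_class`, the class width, the starting point and the sample values are all assumptions made for this example; they do not come from the text.

```python
from collections import Counter


def modal_class(values, width, start=0.0):
    """Return (lower, upper) limits of the class interval holding the most values."""
    # Map each value to the index of the equal-width class it falls in.
    counts = Counter(int((v - start) // width) for v in values)
    # If two classes tie, the one encountered first in the data is returned.
    index, _ = max(counts.items(), key=lambda item: item[1])
    lower = start + index * width
    return lower, lower + width


# Hypothetical continuous measurements (not taken from the text).
sample = [4.3, 4.9, 5.0, 5.4, 5.5, 5.8, 6.1, 7.0, 7.6, 8.5]

lower, upper = modal_class(sample, width=1.0, start=4.0)
midpoint = (lower + upper) / 2  # midpoint of the modal class
print(lower, upper, midpoint)
```

With a different class width or starting point the modal class may change, which is why the choice of class intervals should be reported alongside the mode.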
         3.1.2     Mean
It can be very useful to summarize a group of numerical values by finding their average value. The
mean gives a rough idea of the size of the values that you are dealing with, without having to look at
every one of them. The mean (or to use its proper name, the arithmetic mean) is another term for the
'average'. The mean is the sum of the observations divided by the number of observations.
         3.1.3     Median
The median is the value that has half the data smaller than it and half the data larger than it.
That is, if X1, X2, ..., XN is a random sample sorted from the smallest value to the
largest value, the median is the observation in position (N+1)/2 when N is odd, or the average of the
observations in positions N/2 and (N/2)+1 when N is even.
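The position rule for the median can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method; the function name `median_by_position` and both samples are hypothetical.

```python
def median_by_position(values):
    """Median by the position rule: (N+1)/2 if N is odd, else the mean of N/2 and N/2 + 1."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        # Position (N+1)/2 is 1-based; subtract 1 for Python's 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Positions N/2 and N/2 + 1 (1-based) are indices n//2 - 1 and n//2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical samples (not taken from the text).
odd_sample = [7, 3, 9, 5, 1]   # sorted: 1 3 5 7 9, so position 3 holds the median
even_sample = [7, 3, 9, 5]     # sorted: 3 5 7 9, so positions 2 and 3 are averaged

print(median_by_position(odd_sample), median_by_position(even_sample))
```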
Table 3.1 shows the percentage of glycosylated haemoglobin circulating in the blood for patients
with diabetes. The mean, median and mode are as shown in the frequency distribution in Figure.
Table 3.1: Percentage of glycosylated haemoglobin circulating in the blood for patients with diabetes
                                                   %     Frequency
                                                  (X)           (f)     X.f
                                                  4.3             1     4.3
                                                  4.4             1     4.4
                                                  4.5             1     4.5
                                                  4.7             1     4.7
                                                  4.9             2     9.8
                              Mode                5.0             3      15
                                                  5.4             1     5.4
                                                  5.5             2      11
                                                  5.8             1     5.8
                              Median              6.0             1       6
                                                  6.1             1     6.1
                                                  6.2             1     6.2
                              Mean                7.0             1       7
                                                  7.6             1     7.6
                                                  7.9             1     7.9
                                                  8.5             1     8.5
                                                  8.9             1     8.9
                                                  9.9             1     9.9
                                                 10.7             1    10.7
                                                 10.8             1    10.8
                                                  1.9             1     1.9
                                                 11.2             1    11.2
                                             SUM                26    167.6
                                             Mean                     6.446
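The SUM and Mean rows of Table 3.1 can be checked with a short script. The sketch below copies the (X, f) pairs exactly as they are printed in the table, in table order, and recomputes the total frequency, the X.f total, the mean and the mode. The variable names are illustrative only.

```python
# (value, frequency) pairs copied from Table 3.1 as printed, in table order.
table_3_1 = [
    (4.3, 1), (4.4, 1), (4.5, 1), (4.7, 1), (4.9, 2), (5.0, 3),
    (5.4, 1), (5.5, 2), (5.8, 1), (6.0, 1), (6.1, 1), (6.2, 1),
    (7.0, 1), (7.6, 1), (7.9, 1), (8.5, 1), (8.9, 1), (9.9, 1),
    (10.7, 1), (10.8, 1), (1.9, 1), (11.2, 1),
]

n = sum(f for _, f in table_3_1)                  # total frequency
sum_xf = sum(x * f for x, f in table_3_1)         # total of the X.f column
mean = sum_xf / n                                 # sum of observations / number of observations
mode = max(table_3_1, key=lambda row: row[1])[0]  # value with the greatest frequency

print(n, round(sum_xf, 1), round(mean, 3), mode)  # 26 167.6 6.446 5.0
```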
3.2   Measures of Dispersion
         3.2.1     Range
The range is the largest observation minus the smallest observation.
         3.2.2     Quartiles and inter-quartile range
The quartiles, namely the lower quartile, the median and the upper quartile, divide the data into four
equal parts; that is there will be approximately equal numbers of observations in the four sections (and
exactly equal if the sample size is divisible by four and the measures are all distinct). The quartiles are
calculated in a similar way to the median; first order the data and then count the appropriate number
from the bottom. The interquartile range is a useful measure of variability and is given by the difference
between the upper and lower quartiles. The interquartile range is not vulnerable to outliers and, whatever the
distribution of the data, about 50% of the observations lie between the lower and upper quartiles.
Example 3.1
       Find the median and the inter-quartile range of x if x has the frequency distribution below:
       x      4      5      6       7       8      9         10
       f      3      5      8       4       2      1         1
Solution:
To find the median, we first calculate the cumulative frequencies and record them in a row labelled CF:
        x      4      5        6       7     8       9      10
        f      3      5        8       4     2       1      1
        CF     3      8        16      20    22      23     24
The cumulative frequencies show the last position occupied by each observation in the arranged data;
for example, the last position occupied by a 4 is the 3rd, the last position occupied by a 5 is the 8th, the
last position occupied by a 6 is the 16th, and so on. The last CF gives the total number of observations,
which here is n = 24. Since n is even, the median is the average of the observations in positions n/2
and (n/2)+1, that is, the 12th and 13th positions. Both the 12th and the 13th observations are 6;
therefore the median is (6+6)/2 = 6. This median divides the data set into two halves
of 12 observations each, i.e.
        x      4       5       6
        f      3       5       4
        CF     3       8       12
 and:
        x      6       7       8       9       10
        f      4       4       2       1       1
        CF     4       8       10      11      12
The lower quartile, Q1, is the median of the lower half, which is the average of the observations in the
6th and 7th positions; this is 5. Similarly, the upper quartile, Q3, is the median of the
upper half, which is 7. The interquartile range is calculated as Q3 - Q1 = 7 - 5 = 2.
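The cumulative-frequency reasoning in this solution can be sketched in Python without writing out all 24 observations. The helper names `value_at` and `middle` are assumptions made for this illustration, and the split into two halves as written relies on n being even, as it is here.

```python
from bisect import bisect_left
from itertools import accumulate

x = [4, 5, 6, 7, 8, 9, 10]
f = [3, 5, 8, 4, 2, 1, 1]

cf = list(accumulate(f))   # cumulative frequencies: last position occupied by each x
n = cf[-1]                 # total number of observations


def value_at(position):
    """Observation occupying a 1-based position in the ordered data."""
    # The first class whose cumulative frequency reaches the position holds it.
    return x[bisect_left(cf, position)]


def middle(first, last):
    """Median of the ordered observations in positions first..last (inclusive)."""
    count = last - first + 1
    lo = first + (count - 1) // 2
    hi = first + count // 2      # equals lo when count is odd
    return (value_at(lo) + value_at(hi)) / 2


median = middle(1, n)            # positions 12 and 13
q1 = middle(1, n // 2)           # lower half: positions 6 and 7
q3 = middle(n // 2 + 1, n)       # upper half: positions 18 and 19
iqr = q3 - q1

print(cf, median, q1, q3, iqr)   # [3, 8, 16, 20, 22, 23, 24] 6.0 5.0 7.0 2.0
```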
 Note that to find the quartiles using SPSS, we need to expand the data and write each observation as many times as it appears, such that the data view will be a column
 showing numbers as follows:
 x
 4
 4
 4
 5
 5
 5
 5
 5
 etc...
 The variable view for the data will be simply one row, as shown in Figure 3.1.
 Figure 3.1: Variable view for the data in the frequency distribution table above
 From the above data, SPSS can be used to prepare a frequency distribution table such as the one in Table 3.2 below or a summary of statistics such as the one in Table 3.3
 by using "Analyze", then "Frequencies" or "Statistics".
 Table 3.2: Frequency distribution table
                                                                        x             Frequency           Percent
                                                        Valid           4                  3                12.5
                                                                        5                  5                20.8
                                                                        6                  8                33.3
                                                                        7                  4                16.7
                                                                        8                  2                8.3
                                                                        9                  1                4.2
                                                                        10                 1                4.2
                                                                        Total             24               100.0
 Table 3.3: Summary statistics
                                                       Mean                                                      6.17
                                                       Median                                                    6.00
                                                       Mode                                                         6
                                                       Std. Deviation                                           1.523
                                                       Percentiles              25                               5.00
                                                                                50                               6.00
                                                                                75                               7.00
 The same data can also be used to draw a box and whisker diagramme such as the one in Figure 3.2
                                                                                              Figure 3.2: Box and whisker diagramme for the data above
Box 3: Use of SPSS for the calculation of Median and Quartiles and for the construction of box and
       whisker diagramme
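The figures in Table 3.3 can be cross-checked outside SPSS. The sketch below expands the frequency distribution into one entry per observation, as in the SPSS data view, and uses Python's standard `statistics` module. It is offered as an independent check under the assumption that the sample (n - 1) form of the standard deviation is wanted, not as a description of how SPSS computes its output.

```python
import statistics

x = [4, 5, 6, 7, 8, 9, 10]
f = [3, 5, 8, 4, 2, 1, 1]

# Expand the table into one entry per observation: 4, 4, 4, 5, 5, ...
data = [value for value, freq in zip(x, f) for _ in range(freq)]

summary = {
    "mean": round(statistics.mean(data), 2),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "std_dev": round(statistics.stdev(data), 3),  # sample SD (n - 1 divisor)
}
print(summary)
```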
         3.2.3     Variance and standard deviation
We have seen that the inter-quartile range indicates the variation of data where the median is the
measure of central tendency. Standard deviation is used where this measure is the mean. It indicates how
far, on average, the values in a group lie from their mean, taking all of the data into account. Although this
means that it may be influenced by extreme values, the standard deviation plays an important role in
many tests of statistical significance. The larger the standard deviation, the more the values differ from
the mean, and therefore the more widely they are spread out. For example, one small group of patients
in a particular outpatient clinic may wait for a mean time of 11 minutes to be seen by a doctor, and the
standard deviation from the mean for this group is 5.701. Individual waiting times vary widely - from 7
minutes up to 21 minutes. There is wide variation between these waiting times, and they are quite
widely spread out from their mean. These waiting times are therefore heterogeneous or dissimilar. On
another day, another group of patients from the same clinic may also have a mean waiting time of 11
minutes, but their standard deviation is 0.707. This is much less than the first group's standard deviation
of 5.701. Looking at this group's actual waiting times, it can be seen that they only vary from 10 to 12
minutes. Waiting times for the second group are more homogeneous - that is, the data are more similar
to each other. They are less widely spread out around their mean than the first group. Let us look at the
actual waiting times recorded for each group, as shown in Table 3.4. You can see that the data in Group
1 are much more spread out than those in Group 2. This difference in standard deviations can be
explained by the fact that, although most patients in Group 1 waited a very short time, one patient had
to wait for a long time (21 minutes).
Table 3.4: Waiting times and standard deviation for each patient group
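The individual waiting times are not reproduced here, so the sketch below uses ASSUMED values. They were chosen only because they agree with the figures quoted in the text (mean of 11 minutes in both groups, ranges of 7 to 21 and 10 to 12 minutes, standard deviations of 5.701 and 0.707); they should not be read as the actual contents of Table 3.4.

```python
import statistics

# ASSUMED values: chosen only to match the mean, range and standard deviation
# quoted in the text. They are not copied from Table 3.4.
group_1 = [7, 8, 9, 10, 21]
group_2 = [10, 11, 11, 11, 12]

for name, times in (("Group 1", group_1), ("Group 2", group_2)):
    mean = statistics.mean(times)
    sd = statistics.stdev(times)   # sample standard deviation (n - 1 divisor)
    print(name, mean, round(sd, 3))
```

In these illustrative data, the single 21-minute wait accounts for most of the first group's squared deviation, which is the effect described above.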
Although the abbreviations SD or s.d. are used to represent standard deviation generally, s is used to
represent standard deviation for samples, and σ is used to represent standard deviation for populations.
The most usual formula for standard deviation is as follows:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

where $x$ = individual value, $\bar{x}$ = sample mean and $n$ = number of values. The above equation is only suitable for a sample
(or population estimate). This will usually be the case, since we rarely know the true population value
(which in this case is the mean).
The following steps are used to work out a standard deviation.
1          Find the mean of the group.
2          Subtract this from every value in the group individually - this shows the deviation from the
           mean, for every value.
3          Work out the square (x²) of every deviation; this produces a squared deviation for every value.
4          Add up all of the squared deviations.
5          Count the number of observed values, and subtract 1.
6          Divide the sum of squared deviations by this number, to produce the sample variance.
7          Work out the square root of the variance.
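The seven steps above map directly onto code. The sketch below follows them one at a time; the function name `sample_standard_deviation` and the three-value sample are hypothetical and are used only to show the arithmetic.

```python
import math


def sample_standard_deviation(values):
    """Sample standard deviation, following the seven steps one at a time."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / len(values)           # step 1: mean of the group
    deviations = [v - mean for v in values]    # step 2: deviation of every value
    squared = [d ** 2 for d in deviations]     # step 3: squared deviations
    total = sum(squared)                       # step 4: sum of squared deviations
    divisor = len(values) - 1                  # step 5: number of values minus 1
    variance = total / divisor                 # step 6: sample variance
    return math.sqrt(variance)                 # step 7: square root of the variance


# Hypothetical sample: mean 3, squared deviations 4 + 0 + 4 = 8, variance 8 / 2 = 4.
print(sample_standard_deviation([1, 3, 5]))    # 2.0
```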
Example 3.2
                     Find the standard deviation of x where x has the frequency distribution given below:
                     x      1      2       3      4       5       6
                     f      4      6       9      3       4       2
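No worked solution is given for this example, so the following is only a sketch of one way to carry out the calculation. It applies the sample (n - 1) formula to the frequency distribution by weighting each squared deviation by its frequency. The variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5, 6]
f = [4, 6, 9, 3, 4, 2]

n = sum(f)                                                   # number of observations
mean = sum(xi * fi for xi, fi in zip(x, f)) / n              # 87 / 28
# Each squared deviation is counted once per occurrence of the value.
sum_sq_dev = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f))
variance = sum_sq_dev / (n - 1)                              # sample variance
sd = math.sqrt(variance)                                     # sample standard deviation

print(n, round(mean, 3), round(variance, 3), round(sd, 3))   # 28 3.107 2.173 1.474
```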
    To find the standard deviation using SPSS, select "Analyze", then "Descriptive Statistics", then "Descriptives", select the
    appropriate variable and, in the options, tick "standard deviation", then click OK.
Box 4: Use of SPSS for the calculation of Standard deviation