2.2 Correlation
Objectives
After completing this chapter, you should be able to:
● Calculate the value of the product moment correlation coefficient, understand the effect of coding on it and understand the conditions for its use
● Calculate and interpret Spearman’s rank correlation coefficient
● Carry out hypothesis tests for zero correlation using either Spearman’s rank correlation coefficient or the product moment correlation coefficient
[Figure: scatter graph with horizontal axis Age (years)]
Raw data is often tightly grouped or contains very small or very large values. Changing the scale (which is equivalent to linear coding) can make the scatter graph easier to read.
The degree of linear correlation is unaffected by the change of scale. The value of r for the uncoded and coded values will be the same.
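The claim that coding leaves r unchanged can be checked numerically. The sketch below uses hypothetical data (not taken from the text) and a hypothetical helper `pmcc`; it assumes the coding subtracts a constant and then multiplies or divides by a positive constant.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical raw data, chosen only to illustrate the point.
x = [1020, 1040, 1050, 1070, 1100]
y = [0.31, 0.35, 0.34, 0.40, 0.44]

# Linear coding: shift by a constant, then scale by a positive constant.
p = [(xi - 1000) / 10 for xi in x]
q = [100 * yi - 30 for yi in y]

r_raw = pmcc(x, y)
r_coded = pmcc(p, q)
print(round(r_raw, 6), round(r_coded, 6))  # the two values should agree
```

The two printed values should match up to floating-point rounding, because the shift cancels in each of Sxx, Syy and Sxy and the scale factors cancel in the ratio.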
Example 1
Example 2
Data are collected on the amount of dietary supplement, d grams, given to a sample of 8 cows and their milk yield, m litres. The data were coded using $x = \frac{d}{2} - 6$ and $y = \frac{m}{20}$. The following summary statistics were obtained:
$\sum d^2 = 4592$   $S_{dm} = 90.6$   $\sum x = 44$   $S_{yy} = 0.05915$
a Use the formula for $S_{yy}$ to show that $S_{mm} = 23.66$.
b Find the value of the product moment correlation coefficient between d and m.
a $S_{yy} = \sum\left(\frac{m}{20}\right)^2 - \frac{\left(\sum\frac{m}{20}\right)^2}{8}$
(Substitute the code for y into the formula for $S_{yy}$.)
$0.05915 = \frac{1}{400}\sum m^2 - \frac{1}{400}\cdot\frac{\left(\sum m\right)^2}{8}$
$= \frac{1}{400}\left(\sum m^2 - \frac{\left(\sum m\right)^2}{8}\right)$
(Problem-solving: If you take a factor of $\frac{1}{20^2} = \frac{1}{400}$ out of each term on the right-hand side you are left with the formula for $S_{mm}$.)
$= \frac{1}{400} S_{mm}$
Hence $S_{mm} = 0.05915 \times 400 = 23.66$
(Substitute and simplify to find the value of $S_{mm}$.)
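The arithmetic in part a, and one possible route through part b, can be sketched as follows. The part b working is an illustration derived from the summary statistics given in the question, not the printed solution; it assumes that decoding $x = \frac{d}{2} - 6$ gives $\sum d = 2(\sum x + 6n)$. All variable names are hypothetical.

```python
from math import sqrt

n = 8
sum_d_sq = 4592   # sum of d^2, from the question
s_dm = 90.6       # S_dm, from the question
sum_x = 44        # sum of coded values x = d/2 - 6
s_yy = 0.05915    # S_yy for coded values y = m/20

# Part a: y = m/20, so S_yy = S_mm / 20^2, hence S_mm = 400 * S_yy.
s_mm = 400 * s_yy

# Part b (sketch): d = 2(x + 6), so sum(d) = 2 * (sum(x) + 6n).
sum_d = 2 * (sum_x + 6 * n)
s_dd = sum_d_sq - sum_d ** 2 / n
r = s_dm / sqrt(s_dd * s_mm)

print(round(s_mm, 2), s_dd, round(r, 3))
```

Because S_dm is already given for the uncoded variables, only S_dd has to be recovered before the usual formula for r can be applied.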
Exercise 2A
1 Given that $S_{xx} = 92$, $S_{yy} = 112$ and $S_{xy} = 100$, find the value of the product moment correlation coefficient between x and y.
2 Given the following summary data,
$\sum x = 367$   $\sum y = 270$   $\sum x^2 = 33\,845$   $\sum y^2 = 12\,976$   $\sum xy = 17\,135$   $n = 6$
calculate the product moment correlation coefficient, r, using the formula
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
E 3 The ages, a years, and heights, h cm, of seven members of a team were recorded. The data were
summarised as follows:
$\sum a = 115$   $\sum a^2 = 1899$   $S_{hh} = 571.4$   $S_{ah} = 72.1$
a Find $S_{aa}$. (1 mark)
b Find the value of the product moment correlation coefficient between a and h. (1 mark)
c Describe and interpret the correlation between the age and height of these seven people
based on these data. (2 marks)
E 4 In research on the quality of bacon produced by different breeds of pig, data were obtained
about the leanness, l, and taste, t, of the bacon. The data are shown in the table.
Leanness, l 1.5 2.6 3.4 5.0 6.1 8.2
Taste, t 5.5 5.0 7.7 9.0 10.0 10.2
a Find $S_{ll}$, $S_{tt}$ and $S_{lt}$. (3 marks)
b Calculate the product moment correlation coefficient between l and t using the values
found in part a. (2 marks)
E 5 Eight children had their IQ measured and then took a general knowledge test.
Their IQ, x, and their marks, y, for the test were summarised as follows:
$\sum x = 973$   $\sum x^2 = 120\,123$   $\sum y = 490$   $\sum y^2 = 33\,000$   $\sum xy = 61\,595$.
a Calculate the product moment correlation coefficient. (3 marks)
b Describe and interpret the correlation coefficient between IQ and general knowledge. (2 marks)
7 The following data are to be coded using the coding p = x and q = y – 100.
x 0 5 3 2 1
y 100 117 112 110 106
8 The product moment correlation is to be worked out for the following data set using coding.
x 50 40 55 45 60
y 4 3 5 4 6
E 9 A shopkeeper thinks that the more newspapers he sells in a week the more sweets he sells. He
records the amount of money (m pounds) that he takes in newspaper sales and also the amount
of money he takes in sweet sales (s pounds) each week for seven weeks. The data are shown in
the following table.
Newspaper sales, m pounds 380 402 370 365 410 392 385
Sweet sales, s pounds 560 543 564 573 550 544 530
a Use the coding x = m – 365 and y = s – 530 to find Sxx, Syy and Sxy. (4 marks)
b Calculate the product moment correlation coefficient for m and s. (1 mark)
c State, with a reason, whether or not what the shopkeeper thinks is correct. (1 mark)
Unfortunately, she forgot to label the axes on her scatter diagram and left the summary data
calculations incomplete.
A second student was asked to complete the analysis of the data.
a Show that $S_{ff} = 11\,148$. (3 marks)
(Problem-solving: Substitute the code into the formula for $S_{xx}$.)
b Find the value of the product moment correlation coefficient between f and g. (4 marks)
c With reference to the scatter diagram, comment on the result in part b. (1 mark)
E/P 11 Alice, a market gardener, measures the amount of fertiliser, x litres, that she adds to the
compost for a random sample of 7 chilli plant beds. She also measures the yield of chillies, y kg.
The data are shown in the table below:
x, litres 1.1 1.3 1.4 1.7 1.9 2.1 2.5
y, kg 6.2 10.5 12 15 17 18 19
For example, suppose a manufacturer of tea produced a number of different blends; you could taste
each blend and place the blends in order of preference. You do not, however, have a continuous
numerical scale for measuring your preference. Similarly, it may be quicker to arrange a group of
individuals in order of height than to measure each one. Under these circumstances, Spearman’s rank
correlation coefficient is used.
● one or both data sets already represent a ranking (as in Example 3 below).
Example 3
Two tea tasters were asked to rank nine blends of tea in their order of preference. The tea they
liked best was ranked 1. Their orders of preference are shown in the table:
Blend A B C D E F G H I
Taster 1 (x) 3 6 2 8 5 9 7 1 4
Taster 2 ( y) 5 6 4 2 7 8 9 1 3
Calculate Spearman’s rank correlation coefficient for these data.
Online: Explore Spearman’s rank correlation coefficient using GeoGebra.

Find $x_i^2$, $y_i^2$ and $x_i y_i$:
x_i   y_i   x_i²   y_i²   x_i y_i
3     5     9      25     15
6     6     36     36     36
2     4     4      16     8
8     2     64     4      16
5     7     25     49     35
9     8     81     64     72
7     9     49     81     63
1     1     1      1      1
4     3     16     9      12
Find $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum y_i^2$ and $\sum x_i y_i$:
$\sum x_i = 45$   $\sum y_i = 45$   $\sum x_i^2 = 285$   $\sum y_i^2 = 285$   $\sum x_i y_i = 258$

Use the standard formula to calculate $r_s$:
$r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}$
$= \frac{258 - \frac{45 \times 45}{9}}{\sqrt{\left(285 - \frac{45 \times 45}{9}\right)\left(285 - \frac{45 \times 45}{9}\right)}} = \frac{33}{\sqrt{60 \times 60}} = 0.55$
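The calculation in Example 3 can be reproduced by applying the product moment formula directly to the two sets of ranks. This is a minimal sketch; the variable names are illustrative only.

```python
from math import sqrt

# Ranks given by the two tasters for blends A to I (Example 3).
taster_1 = [3, 6, 2, 8, 5, 9, 7, 1, 4]
taster_2 = [5, 6, 4, 2, 7, 8, 9, 1, 3]
n = len(taster_1)

sum_x = sum(taster_1)
sum_y = sum(taster_2)
sum_x_sq = sum(x * x for x in taster_1)
sum_y_sq = sum(y * y for y in taster_2)
sum_xy = sum(x * y for x, y in zip(taster_1, taster_2))

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x_sq - sum_x ** 2 / n
s_yy = sum_y_sq - sum_y ** 2 / n

r_s = s_xy / sqrt(s_xx * s_yy)
print(sum_x, sum_y, sum_x_sq, sum_y_sq, sum_xy, r_s)
```

The intermediate sums should match the totals row of the table in the example.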
You can calculate Spearman’s rank correlation coefficient more quickly by looking at the differences
between the ranks of each observation.
■ If there are no tied ranks, Spearman’s rank correlation coefficient, $r_s$, is calculated using
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
where d is the difference between the ranks of each observation, and n is the number of pairs of observations.
Watch out: Tied ranks occur when two or more data values in one of the data sets are the same. If there are only one or two tied ranks, this formula gives a reasonable estimate for $r_s$ but if there are many tied ranks then you should use the PMCC formula with the ranked data.
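As an illustration, the difference formula can be applied to the tea tasters’ ranks from Example 3, which contain no tied ranks. The function name `spearman_from_ranks` is hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no tied ranks."""
    n = len(rank_x)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n * n - 1))

# Ranks from Example 3 (blends A to I).
taster_1 = [3, 6, 2, 8, 5, 9, 7, 1, 4]
taster_2 = [5, 6, 4, 2, 7, 8, 9, 1, 3]

sum_d_sq = sum((a - b) ** 2 for a, b in zip(taster_1, taster_2))
r_s = spearman_from_ranks(taster_1, taster_2)
print(sum_d_sq, r_s)
```

Here the sum of squared differences is 54, and the result agrees with the value 0.55 obtained from the standard formula.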
Example 4
During a cattle show, two judges ranked ten cattle for quality according to the following table.
Cattle A B C D E F G H I J
Judge A 1 5 2 6 4 8 3 7 10 9
Judge B 3 6 2 7 5 8 1 4 9 10
Find Spearman’s rank correlation coefficient between the two judges and comment on the result.
If you are ranking data, and two or more data values are equal, then these data values will have a
tied rank.
■ Equal data values should be assigned a rank equal to the mean of the tied ranks.
For example:
Data value   200   350   350   400   700   800   800   800   1200
Rank         1     2.5   2.5   4     5     7     7     7     9
The 2nd and 3rd ranks are tied, so assign a rank of 2.5 to each of these data values.
The 6th, 7th and 8th ranks are tied, so assign a rank of $\frac{6 + 7 + 8}{3} = 7$ to each of these data values.
When ranks are tied, the formula for the Spearman’s rank correlation coefficient only gives an
approximate value of rs. This approximation is sufficient when there are only a small number of tied
ranks. If there are many tied ranks, then you should use the PMCC formula with the ranked data.
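The mean-rank rule for tied values can be sketched in code. The helper `mean_ranks` below is a hypothetical illustration that ranks from the smallest value upwards; it is applied to the data values from the table above.

```python
def mean_ranks(values):
    """Rank values from smallest (rank 1); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values equal to the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start+1 .. end+1 are tied; their mean is the shared rank.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

data = [200, 350, 350, 400, 700, 800, 800, 800, 1200]
print(mean_ranks(data))
```

With many tied ranks, the ranks produced this way would then be used in the PMCC formula rather than in the difference formula.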
Example 5
The marks of eight pupils in French and German tests were as follows:
A B C D E F G H
French, f % 52 25 86 33 55 55 54 46
German, g % 40 48 65 57 40 39 63 34
a Use the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ to find an estimate for Spearman’s rank correlation coefficient, showing clearly how you deal with tied ranks. Give your answer to 2 decimal places.
b Without recalculating the correlation coefficient, state how your answer to part a would change if:
i pupil H’s mark for German was changed to 38%
ii a ninth pupil was included who scored 95% in French and 89% in German.
The teacher collects extra data from other students in the class and finds that there are now many
tied ranks.
c Describe how she would now find a measure of the correlation.
Exercise 2B
P 1 The scatter graph shows the length, l m, and mass, m kg, of 10 randomly selected male Siberian tigers.
[Scatter graph: mass, m (vertical axis, 295 to 325) against length, l (horizontal axis, 3.05 to 3.4)]
A student wishes to analyse the correlation between l and m.
Give one reason why the student might choose to use:
a the product moment correlation coefficient
b Spearman’s rank correlation coefficient.
P 2 A college is trying to determine whether a published placement test (PPT) gives a good indicator
of the likely student performance in a final exam. Data on past performances are shown in the
scatter graph:
[Scatter graph: final exam percentage (vertical axis, 20 to 90) against PPT score (horizontal axis, 200 to 240)]
Give a reason why the college should not use the product moment correlation coefficient to
measure the strength of the correlation of the two variables.
P 3 A sports science researcher is investigating whether there is a correlation between the height of a
basketball player and the number of attempts it takes them to score a free throw. The researcher
proposes to collect a random sample of data and then calculate the product moment correlation
coefficient between the two variables.
Give a reason why the PMCC would not be appropriate in this situation and state an alternative
method that the researcher can use.
4 For each of the data sets of ranks given below, calculate the Spearman’s rank correlation
coefficient and interpret the result.
a rx 1 2 3 4 5 6
ry 3 2 1 5 4 6
b rx 1 2 3 4 5 6 7 8 9 10
ry 2 1 4 3 5 8 7 9 6 10
c rx 5 2 6 1 4 3 7 8
ry 5 6 3 8 7 4 2 1
5 Match the scatter graphs with the given values of Spearman’s rank correlation coefficient.
[Four scatter graphs, labelled a to d, each plotting y against x]
E 6 The number of goals scored by football teams and their positions in the league were recorded
as follows for the top 12 teams.
Team A B C D E F G H I J K L
Goals 49 44 43 36 40 39 29 21 28 30 33 26
League position 1 2 3 4 5 6 7 8 9 10 11 12
a Find $\sum d^2$, where d is the difference between the ranks of each observation. (3 marks)
b Calculate Spearman’s rank correlation coefficient for these data.
What conclusions can be drawn from this result? (3 marks)
7 A veterinary surgeon and a trainee veterinary surgeon both rank a small herd of cows for
quality. Their rankings are shown below.
Cow A D F E B C H J
Qualified vet 1 2 3 4 5 6 7 8
Trainee vet 1 2 5 6 4 3 8 7
Find Spearman’s rank correlation coefficient for these data, and comment on the experience of
the trainee vet.
a Explain why you would use Spearman’s rank correlation coefficient in this case.
b Calculate Spearman’s rank correlation coefficient rs, and comment on how well the judges agree.
It turns out that Judge 1 incorrectly recorded their score for competitor A and it should have been
7.7.
c Explain how you would now deal with equal data values if you had to recalculate the
Spearman’s rank correlation coefficient.
E 9 In a diving competition, two judges scored each of 7 divers on a forward somersault with twist.
Diver A B C D E F G
Judge 1 score 4.5 5.1 5.2 5.2 5.4 5.7 5.8
Judge 2 score 5.2 4.8 4.9 5.1 5.0 5.3 5.4
a Give one reason to support the use of Spearman’s rank correlation coefficient in
this case. (1 mark)
b Calculate Spearman’s rank correlation coefficient for these data. (4 marks)
The judges also scored the divers on a back somersault with two twists.
Spearman’s rank correlation coefficient for their ranks in this case was 0.676.
c Compare the judges’ ranks for the two dives. (1 mark)
E/P 10 Two tea tasters sample 6 different teas and give each a score out of 10.
Tea A B C D E F
Taster 1 score 7 8 6 7 9 10
Taster 2 score 8 9 5 7 10 10
Product moment coefficient

Sample size   Level
              0.10     0.05     0.025    0.01     0.005
4             0.8000   0.9000   0.9500   0.9800   0.9900
5             0.6870   0.8054   0.8783   0.9343   0.9587
6             0.6084   0.7293   0.8114   0.8822   0.9172
7             0.5509   0.6694   0.7545   0.8329   0.8745
8             0.5067   0.6215   0.7067   0.7887   0.8343
9             0.4716   0.5822   0.6664   0.7498   0.7977

For a sample size of 8 you see from the table that the critical value of r to be significant at the 5% level on a one-tailed test is 0.6215. An observed value of r greater than 0.6215 from a sample of size 8 would provide sufficient evidence to reject the null hypothesis and conclude that ρ > 0. Similarly, an observed value of r less than −0.6215 would provide sufficient evidence to conclude that ρ < 0.
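The table lookup described above can be sketched as a small dictionary holding only the excerpt shown (sample sizes 4 to 9). The names `PMCC_CRITICAL` and `is_significant` are hypothetical, and the function covers one-tailed tests only.

```python
# Excerpt of the product moment critical values: sample size -> level -> value.
PMCC_CRITICAL = {
    4: {0.10: 0.8000, 0.05: 0.9000, 0.025: 0.9500, 0.01: 0.9800, 0.005: 0.9900},
    5: {0.10: 0.6870, 0.05: 0.8054, 0.025: 0.8783, 0.01: 0.9343, 0.005: 0.9587},
    6: {0.10: 0.6084, 0.05: 0.7293, 0.025: 0.8114, 0.01: 0.8822, 0.005: 0.9172},
    7: {0.10: 0.5509, 0.05: 0.6694, 0.025: 0.7545, 0.01: 0.8329, 0.005: 0.8745},
    8: {0.10: 0.5067, 0.05: 0.6215, 0.025: 0.7067, 0.01: 0.7887, 0.005: 0.8343},
    9: {0.10: 0.4716, 0.05: 0.5822, 0.025: 0.6664, 0.01: 0.7498, 0.005: 0.7977},
}

def is_significant(r, n, level, tail="upper"):
    """One-tailed test: 'upper' tests for rho > 0, 'lower' tests for rho < 0."""
    critical = PMCC_CRITICAL[n][level]
    return r > critical if tail == "upper" else r < -critical

# Sample size 8 at the 5% level, one-tailed: critical value 0.6215.
print(PMCC_CRITICAL[8][0.05])
print(is_significant(0.70, 8, 0.05))                 # above 0.6215
print(is_significant(-0.70, 8, 0.05, tail="lower"))  # below -0.6215
```

For a two-tailed test at a given significance level, the column for half that level would be used with the absolute value of r.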
Example 6
A chemist observed 20 reactions, and recorded the mass of the reactant, x grams, and the duration
of a reaction, y minutes.
She summarised her findings as follows:
$\sum x = 20$   $\sum y = 35$   $\sum xy = 65$   $\sum x^2 = 35$   $\sum y^2 = 130$
Test, at the 5% significance level, whether these results show evidence of any correlation between
the mass of the reactant and the duration of the reaction.
$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}$
$= \frac{65 - \frac{20 \times 35}{20}}{\sqrt{\left(35 - \frac{20^2}{20}\right)\left(130 - \frac{35^2}{20}\right)}}$
(Use the given summary statistics to calculate r.)
$= 0.934\ldots$
$0.934\ldots > 0.4438$. The observed value of r lies within the critical region, so reject $H_0$.
(You reject $H_0$ if the observed value lies inside the critical region.)
There is evidence, at the 5% level of significance, that there is a correlation between the mass of the reactant and the duration of the reaction.
(Write a conclusion in the context of the original question.)
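The calculation of r in Example 6 can be checked from the summary statistics. In this sketch the critical value 0.4438 is the one quoted in the worked solution; the comparison uses |r| on the assumption that the test is two-tailed, since the question asks about any correlation.

```python
from math import sqrt

n = 20
sum_x, sum_y, sum_xy = 20, 35, 65
sum_x_sq, sum_y_sq = 35, 130

s_xy = sum_xy - sum_x * sum_y / n      # 65 - 35
s_xx = sum_x_sq - sum_x ** 2 / n       # 35 - 20
s_yy = sum_y_sq - sum_y ** 2 / n       # 130 - 61.25

r = s_xy / sqrt(s_xx * s_yy)

critical_value = 0.4438                # quoted in the worked solution for n = 20
reject_h0 = abs(r) > critical_value    # assumed two-tailed comparison
print(round(r, 3), reject_h0)
```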
You can carry out a hypothesis test for zero correlation using Spearman’s rank correlation coefficient
in the same way. The table of critical values for Spearman’s coefficient is also given in the formulae
booklet and on page 216.
Spearman’s coefficient

Sample size   Level
              0.05     0.025    0.01
4             1.0000   –        –
5             0.9000   1.0000   1.0000
6             0.8286   0.8857   0.9429
7             0.7143   0.7857   0.8929
8             0.6429   0.7381   0.8333
9             0.6000   0.7000   0.7833

For a sample size of 8 you see from the table that the critical value of $r_s$ to be significant at the 0.025 level on a one-tailed test is ±0.7381.
Example 7
The popularity of 16 subjects at a comprehensive school was found by counting the number of
boys and the number of girls who chose each subject and then ranking the subjects. The results are
shown in the table below.
Subject A B C D E F G H I J K L M N O P
Boys’ ranks, b 2 5 9 8 1 3 15 16 6 10 12 14 4 7 11 13
Girls’ ranks, g 4 7 11 3 6 9 12 16 5 13 10 8 2 1 15 14
a $\sum d^2 = 214$
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{16(16^2 - 1)} = 0.685\ldots$
b $H_0: \rho = 0$
$H_1: \rho > 0$
(State your hypotheses. You are testing for positive correlation so this is a one-tailed test.)
From the tables for a sample size of 16 the critical value is 0.5824.
(Find the critical value.)
Since $0.685\ldots > 0.5824$, the result is significant at the 1% level.
(See if your value of $r_s$ is significant.)
You reject $H_0$ and accept $H_1$: there is evidence that boys’ and girls’ choices are positively correlated.
(Draw a conclusion.)
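The figures in Example 7 can be reproduced from the table of ranks. This is a minimal sketch with illustrative variable names; the critical value 0.5824 is the one quoted in the solution for a sample size of 16.

```python
# Ranks for subjects A to P (Example 7).
boys = [2, 5, 9, 8, 1, 3, 15, 16, 6, 10, 12, 14, 4, 7, 11, 13]
girls = [4, 7, 11, 3, 6, 9, 12, 16, 5, 13, 10, 8, 2, 1, 15, 14]
n = len(boys)

sum_d_sq = sum((b - g) ** 2 for b, g in zip(boys, girls))
r_s = 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

critical_value = 0.5824          # quoted in the solution, one-tailed, n = 16
reject_h0 = r_s > critical_value
print(sum_d_sq, round(r_s, 3), reject_h0)
```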
Exercise 2C
1 A sample of 7 observations (x, y) was taken, and the following values were calculated:
$\sum x = 29$   $\sum x^2 = 131$   $\sum y = 28$   $\sum y^2 = 140$   $\sum xy = 99$
a Calculate the product moment correlation coefficient for this sample.
b Test H0: ρ = 0 against H1: ρ ≠ 0. Use a 1% significance level and state any assumptions you
have made.
E 2 The ages, X years, and heights, Y cm, of 11 members of an athletics club were recorded and the
following statistics were used to summarise the results.
$\sum X = 168$   $\sum Y = 1275$   $\sum XY = 20\,704$   $\sum X^2 = 2585$   $\sum Y^2 = 320\,019$
a Calculate the product moment correlation coefficient for these data. (3 marks)
b Test the assertion that the ages and heights of the club members are positively correlated.
State your conclusion in words and any assumptions you have made.
Use a 5% level of significance. (5 marks)
3 A sample of 30 compact cars was taken, and the fuel consumption and engine sizes of the cars
were ranked.
A consumer group wants to test whether fuel consumption and engine size are related.
a Find the critical region for a hypothesis test based on Spearman’s rank correlation coefficient.
Use a 5% level of significance.
A Spearman’s rank correlation coefficient of rs = 0.5321 was calculated for the sample.
b Comment on this value in light of your answer to part a.
E/P 4 For one of the activities at a gymnastics competition, 8 gymnasts were awarded marks out of 10
for artistic performance and for technical ability. The results were as follows.
Gymnast A B C D E F G H
Technical ability 8.5 8.6 9.5 7.5 6.8 9.1 9.4 9.2
Artistic performance 6.2 7.5 8.2 6.7 6.0 7.2 8.0 9.1
The value of the product moment correlation coefficient for these data is 0.774.
a Stating your hypotheses clearly, and using a 1% level of significance, test for evidence of a
positive association between technical ability and artistic performance.
Interpret this value. (4 marks)
b Calculate the value of Spearman’s rank correlation coefficient for these data. (3 marks)
c Give one reason why a hypothesis test based on Spearman’s rank correlation coefficient
might be more suitable for this data set. (1 mark)
d Use your answer to part b to carry out a second hypothesis test for evidence of a positive
correlation between technical ability and artistic performance.
Use a 1% significance level. (4 marks)
E/P 5 Two judges ranked 8 ice skaters in a competition according to the table below.
Skater    i   ii   iii   iv   v   vi   vii   viii
Judge A   2   5    3     7    8   1    4     6
Judge B   3   2    6     5    7   4    1     8
A test is to be carried out to see if there is a positive association between the rankings of the judges.
a Give a reason to support the use of Spearman’s rank correlation coefficient in this
case. (1 mark)
b Evaluate Spearman’s rank correlation coefficient. (3 marks)
c Carry out the test at the 5% level of significance, stating your hypotheses clearly. (4 marks)
P 6 Each of the teams in a school hockey league had the total number of goals scored by them and
against them recorded, with the following results.
Team A B C D E F G
Goals for 39 40 28 27 26 30 42
Goals against 22 28 27 42 24 38 23
Investigate whether there is any association between the goals for and those against by using
Spearman’s rank correlation coefficient. Use a suitable test at the 1% level to investigate the
statement, ‘A team that scores a lot of goals concedes very few goals’.
E 7 The weekly takings and weekly profits for six different branches of a kebab restaurant are
shown in the table below.
Shop 1 2 3 4 5 6
Takings (£) 400 6200 3600 5100 5000 3800
Profits (£) 400 1100 450 750 800 500
a Calculate Spearman’s rank correlation coefficient, rs, between the takings and
profit. (3 marks)
b Test, at the 5% significance level, the assertion that profits and takings are positively
correlated. (4 marks)
a Calculate Spearman’s rank correlation coefficient, $r_s$, showing your value of $\sum d^2$. (3 marks)
b Test the assertion that there is no correlation between these subjects. State the null
and alternative hypotheses used. Use a 5% significance level. (4 marks)
E/P 9 A child is asked to place 10 objects in order and gives the ordering
A C H F B D G E J I
The correct ordering is
A B C D E F G H I J
Conduct, at the 5% level of significance, a suitable hypothesis test to determine whether there is
a positive association between the child’s order and the correct ordering. You must state clearly
which correlation coefficient you are using and justify your selection. (8 marks)
E/P 10 The crop of a root vegetable was measured over six consecutive years, the years being ranked
for wetness. The results are given in the table below.
Year 1 2 3 4 5 6
Crop (10 000 tons) 62 73 52 77 63 61
Rank of wetness 5 4 1 6 3 2
A seed producer claims that crop yield and wetness are not correlated. Test this assertion using
a 5% significance level. You must state which correlation coefficient you are using and justify
your selection. (8 marks)
P 11 A researcher collects data on the heights and masses of a random sample of gorillas. She finds
that the correlation coefficient between the data is 0.546.
a Explain which measure of correlation the researcher is likely to have used.
Given that the value of the correlation coefficient provided sufficient evidence to accept the
alternative hypothesis that there is positive correlation between the variables,
b find the smallest possible significance level given that she collected data from 14 gorillas
c find the smallest possible sample size given that she carried out the test at the 5% level of
significance.
Mixed exercise 2
E 1 Wai wants to know whether the 10 people in her group are as good at science as they are at art.
She collected the end of term test marks for science (s), and art (a), and coded them using $x = \frac{s}{10}$ and $y = \frac{a}{10}$.
The data she collected can be summarised as follows,
$\sum x = 67$   $\sum x^2 = 465$   $\sum y = 65$   $\sum y^2 = 429$   $\sum xy = 434$.
a Work out the product moment correlation coefficient for x and y. (3 marks)
b Write down the product moment correlation coefficient for s and a. (1 mark)
c Write down whether or not it is true to say that the people in Wai’s group who are good at
science are also good at art. Give a reason for your answer. (1 mark)
E 2 Nimer thinks that oranges that are very juicy cost more than those that are not very juicy. He
buys 20 oranges from different places, and measures the amount of juice ( j ml), that each orange
produces. He also notes the price ( p) of each orange.
The data can be summarised as follows,
$\sum j = 979$   $\sum p = 735$   $\sum j^2 = 52\,335$   $\sum p^2 = 32\,156$   $\sum jp = 39\,950$.
a Find Sjj, Spp and Sjp. (3 marks)
b Using your answers to part a, calculate the product moment correlation coefficient. (1 mark)
c Describe the type of correlation between the amount of juice and the cost and state, with a
reason, whether or not Nimer is correct. (2 marks)
c Write down the value of the correlation coefficient between x and y. (1 mark)
d With reference to the scatter diagram, comment on the result in part b. (1 mark)
E 4 Two judges at a cat show place the 10 entries in the following rank orders.
Cat A B C D E F G H I J
First judge 4 6 1 2 5 3 10 9 8 7
Second judge 2 9 3 1 7 4 6 8 5 10
a Explain why Spearman’s rank correlation coefficient is appropriate for these data. (1 mark)
b Find the value of Spearman’s rank correlation coefficient. (3 marks)
c Explain briefly the role of the null and alternative hypotheses in a test of significance. (1 mark)
d Stating your hypotheses clearly, carry out a test at the 5% level of significance and use your
result to comment on the extent of the agreement between the two judges. (4 marks)
E/P 5 a Explain briefly the conditions under which you would measure association using Spearman’s
rank correlation coefficient. (1 mark)
b Nine applicants for places at a college were interviewed by two tutors. Each tutor ranked the
applicants in order of merit. The rankings are shown below.
Applicant A B C D E F G H I
Tutor 1 1 2 3 4 5 6 7 8 9
Tutor 2 1 3 5 4 2 7 9 8 6
By carrying out a suitable hypothesis test, investigate the extent of the agreement between the
two tutors. (7 marks)
E 6 In a ski jumping contest each competitor made two jumps. The orders of merit for the
10 competitors who completed both jumps are shown.
Ski jumper A B C D E F G H I J
First jump 2 9 7 4 10 8 6 5 1 3
Second jump 4 10 5 1 8 9 2 7 3 6
a Calculate, to 2 decimal places, Spearman’s rank correlation coefficient for the performance of
the ski jumpers in the two jumps. (3 marks)
b Using a 5% significance level, and quoting from the table of critical values, investigate whether
there is a positive association between performance on the first and second jumps. State your
null and alternative hypotheses clearly. (4 marks)
E/P 7 An expert on porcelain is asked to place seven china bowls in date order of manufacture,
assigning the rank 1 to the oldest bowl. The actual dates of manufacture and the order given by
the expert are shown below.
Bowl A B C D E F G
Date of manufacture 1920 1857 1710 1896 1810 1690 1780
Order given by expert 7 3 4 6 2 1 5
Carry out a hypothesis test to determine whether the expert is able to judge relative age
accurately. You must state:
• the significance level of your test
• your null and alternative hypotheses
• which correlation coefficient, with justification. (8 marks)
E 8 A small bus company provides a service for a small town and some neighbouring villages.
In a study of their service a random sample of 20 journeys was taken and the distances x, in
kilometres, and journey times t, in minutes, were recorded. The average distance was 4.535 km
and the average journey time was 15.15 minutes.
a Using $\sum x^2 = 493.77$, $\sum t^2 = 4897$, $\sum xt = 1433.8$, calculate the product moment correlation
coefficient for these data. (3 marks)
b Stating your hypotheses clearly test, at the 5% level, whether or not there is evidence of a
positive correlation between journey time and distance. (4 marks)
c State any assumptions that have to be made to justify the test in part b. (1 mark)
E 9 A group of students scored the following marks in their statistics and geography exams.
Student A B C D E F G H
Statistics 64 71 49 38 72 55 54 68
Geography 55 50 51 47 65 45 39 82
a Find the value of Spearman’s rank correlation coefficient between the marks of these
students. (3 marks)
b Stating your hypotheses and using a 5% level of significance, test whether marks in
statistics and marks in geography are associated. (4 marks)
E/P 10 An international study of female literacy investigated whether there was any correlation
between the life expectancy of females and the percentage of adult females who were literate.
A random sample of 8 countries was taken and the following data were collected.
E 11 Six Friesian cows were ranked in order of merit at an agricultural show by the official judge and
by a student vet. The ranks were as follows:
Official judge 1 2 3 4 5 6
Student vet 1 5 4 2 6 3
E 12 As part of a survey in a particular profession, age, x years, and salary, £y thousands, were recorded.
The values of x and y for a randomly selected sample of ten members of the profession are as
follows:
x 30 52 38 48 56 44 41 25 32 27
y 22 38 40 34 35 32 28 27 29 41
E 13 A machine hire company kept records of the age, X months, and the maintenance costs,
£Y, of one type of machine. The following table summarises the data for a random sample of
10 machines.
Machine A B C D E F G H I J
Age, x 63 12 34 81 51 14 45 74 24 89
Maintenance costs, y 111 25 41 181 64 21 51 145 43 241
E 14 The data below show the height above sea level, x metres, and the temperature, y °C, at
7.00 a.m., on the same day in summer at nine places in Europe.
Height, x (m) 1400 400 280 790 390 590 540 1250 680
Temperature, y (°C) 6 15 18 10 16 14 13 7 13
a Use your calculator to find the product moment correlation coefficient for this
sample. (1 mark)
b Test, at the 5% significance level, whether height above sea level and temperature are
negatively correlated. (4 marks)
On the same day the number of hours of sunshine was recorded and Spearman’s rank
correlation coefficient between hours of sunshine and temperature, based on $\sum d^2 = 28$,
was 0.767.
c Stating your hypotheses and using a 5% two-tailed test, interpret this rank correlation
coefficient. (4 marks)
E 15 a Explain briefly the conditions under which you would measure association using Spearman’s
rank correlation coefficient rather than the product moment correlation coefficient. (1 mark)
At an agricultural show 10 Shetland sheep were ranked by a qualified judge and by a trainee
judge. Their rankings are shown in the table.
Qualified judge 1 2 3 4 5 6 7 8 9 10
Trainee judge 1 2 5 6 7 8 10 4 3 9
E 16 The positions in a league table of 8 rugby clubs at the end of a season are shown, together with
the average attendance (in hundreds) at home matches during the season.
Club A B C D E F G H
Position 1 2 3 4 5 6 7 8
Average attendance 30 32 12 19 27 18 15 25
Calculate Spearman’s rank correlation coefficient between position in the league and home
attendance. Comment on your results. (4 marks)
E/P 17 The ages, in months, and the weights, in kg, of a random sample of nine babies are shown in
the table below.
Baby A B C D E F G H I
Age (x) 1 2 2 3 3 3 4 4 5
Weight (y) 4.4 5.2 5.8 6.4 6.7 7.2 7.6 7.9 8.4
a The product moment correlation coefficient between weight and age for these babies was
found to be 0.972. By testing for positive correlation at the 5% significance level interpret
this value. (4 marks)
A boy who does not know the weights or ages of these babies is asked to list them, by
guesswork, in order of increasing weight. He puts them in the order
A C E B G D I F H
b Obtain, to 3 decimal places, a rank correlation coefficient between the boy’s order and the
true weight order. (3 marks)
c By carrying out a suitable hypothesis test at the 5% significance level, assess the boy’s ability
to correctly rank the babies by weight. (4 marks)
Challenge
$x_i$ and $y_i$ are ranked variables with no ties, so that each takes the values 1, 2, 3, … n exactly once.
The difference for each pair of data values is defined as $d_i = y_i - x_i$.
a Explain why $\sum x_i^2 = \sum y_i^2 = \sum_{r=1}^{n} r^2$, and hence express the quantity in terms of n.
b Explain why $\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} = \sum (x_i - \bar{x})^2$, and hence show that $\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} = \frac{n(n^2 - 1)}{12}$
c By expanding $\sum (y_i - x_i)^2$, show that $\sum x_i y_i = \sum x_i^2 - \frac{\sum d_i^2}{2}$.
d Hence show that $\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{n(n^2 - 1)}{12} - \frac{\sum d_i^2}{2}$
e Hence, or otherwise, prove that: $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
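Before attempting the proof, the identity in part e can be sanity-checked numerically. The sketch below is not a proof and does not replace one; it assumes n = 6 and compares both sides for every possible ordering of the ranks. The function names are hypothetical.

```python
from itertools import permutations
from math import isclose, sqrt

def pmcc(xs, ys):
    """Left-hand side of part e: PMCC computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def spearman_d(xs, ys):
    """Right-hand side of part e: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    sum_d_sq = sum((y - x) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * sum_d_sq / (n * (n * n - 1))

n = 6  # small enough to try all 720 orderings of the ranks
ranks = list(range(1, n + 1))

# Part b: sum of squared deviations of the ranks should equal n(n^2 - 1)/12.
s_xx = sum((x - sum(ranks) / n) ** 2 for x in ranks)

all_agree = all(
    isclose(pmcc(ranks, ys), spearman_d(ranks, ys), abs_tol=1e-12)
    for ys in permutations(ranks)
)
print(s_xx, n * (n * n - 1) / 12, all_agree)
```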