Understanding Correlation Basics

The document discusses correlation and the correlation coefficient, which is a measure used to indicate the strength and direction of a linear relationship between two continuous variables. It provides formulas to calculate the correlation coefficient and covers cases when the correlation is positive, negative, and when there is no correlation. Examples are included to demonstrate calculating the correlation coefficient from data sets.

Uploaded by

JAGADISH
Copyright
© All Rights Reserved

Correlation and Correlation Coefficient

There are many situations in which two variables x and y are related in such a way that a change
in one variable influences a change in the other. Such a relationship is called
correlation (or covariation). Correlation is used to describe the linear relationship between two
variables x and y.
If x and y increase or decrease together, the correlation is said to be positive. If an increase
(or decrease) in x corresponds to a decrease (or increase) in y, the correlation is said to be
negative. If no linear relationship is indicated between the variables x and y, they are said to be
uncorrelated (loosely, independent).
Let $x_1, x_2, x_3, \ldots, x_n$ be n values of the variable x and $y_1, y_2, y_3, \ldots, y_n$ be the corresponding values
of the variable y. Then the measure of correlation (called the coefficient of correlation) is defined by the
relation

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y} $$ ---(i)

where $\bar{x}$ is the mean of the x-series, $\bar{y}$ is the mean of the y-series, $\sigma_x$ is the standard deviation of the x-series and
$\sigma_y$ is the standard deviation of the y-series.

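Let $X_i = x_i - \bar{x}$ be the deviation from the mean $\bar{x}$ and $Y_i = y_i - \bar{y}$ the deviation from the mean
$\bar{y}$. Then we can write (i) as

$$ r = \frac{\sum X_i Y_i}{n\,\sigma_x \sigma_y} $$ ---(ii)

or

$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$ ---(iii)

where $\sigma_{xy} = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum X_i Y_i$ is called the covariance of x and y.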
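Note:
(1) $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$
(2) $\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum X_i^2$ and $\sigma_y^2 = \frac{1}{n}\sum (y_i - \bar{y})^2 = \frac{1}{n}\sum Y_i^2$ are called the variances of x
and y respectively.
(3) The coefficient of correlation is also known as Karl Pearson's coefficient of correlation.
(4) $-1 \le r \le 1$
(5) If $r = \pm 1$, we say that x and y are perfectly correlated.
(6) If $r = 0$, we say that x and y are non-correlated (uncorrelated). Strictly, $r = 0$ rules out only a linear relationship; it does not by itself imply that x and y are independent.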
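The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pearson_r` and the small data set are hypothetical, and the divide-by-n (population) forms of the variance and covariance are used, as in formula (iii).

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r = sigma_xy / (sigma_x * sigma_y), divide-by-n forms."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]          # deviations X_i = x_i - x_bar
    Y = [y - y_bar for y in ys]          # deviations Y_i = y_i - y_bar
    cov_xy = sum(a * b for a, b in zip(X, Y)) / n
    sigma_x = math.sqrt(sum(a * a for a in X) / n)
    sigma_y = math.sqrt(sum(b * b for b in Y) / n)
    # Note: a constant series has sigma = 0, so r is undefined (division by zero).
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical data: y rises exactly linearly with x, then falls exactly linearly.
xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])   # close to +1 (perfect positive)
r_neg = pearson_r(xs, [10 - 3 * x for x in xs])  # close to -1 (perfect negative)
print(r_pos, r_neg)
```

Because of floating-point rounding, the two printed values may differ from +1 and -1 in the last decimal places.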
Problems
1. The following table gives the age (in years) of 10 couples. Calculate the covariance and the
coefficient of correlation between these ages.
age of husband (x): 23 27 28 29 30 31 33 35 36 39
age of wife (y): 18 22 23 24 25 26 28 29 30 32

Solution: $n = 10$, $\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(311) = 31.1$, $\bar{y} = \frac{1}{n}\sum y_i = \frac{1}{10}(257) = 25.7$

$x_i$   $y_i$   $X_i = x_i - \bar{x}$   $Y_i = y_i - \bar{y}$   $X_i^2$   $Y_i^2$   $X_i Y_i$
23 18 -8.1 -7.7 65.61 59.29 62.37
27 22 -4.1 -3.7 16.81 13.69 15.17
28 23 -3.1 -2.7 9.61 7.29 8.37
29 24 -2.1 -1.7 4.41 2.89 3.57
30 25 -1.1 -0.7 1.21 0.49 0.77
31 26 -0.1 0.3 0.01 0.09 -0.03
33 28 1.9 2.3 3.61 5.29 4.37
35 29 3.9 3.3 15.21 10.89 12.87
36 30 4.9 4.3 24.01 18.49 21.07
39 32 7.9 6.3 62.41 39.69 49.77
Total: $\sum X_i^2 = 202.9$, $\sum Y_i^2 = 158.1$, $\sum X_i Y_i = 178.3$

$\sigma_x^2 = \frac{1}{n}\sum X_i^2 = \frac{1}{10}(202.9) = 20.29 \Rightarrow \sigma_x = 4.5044$

$\sigma_y^2 = \frac{1}{n}\sum Y_i^2 = \frac{1}{10}(158.1) = 15.81 \Rightarrow \sigma_y = 3.9762$

The covariance of x and y is $\sigma_{xy} = \frac{1}{n}\sum X_i Y_i = \frac{1}{10}(178.3) = 17.83$

The coefficient of correlation between x and y is $r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{17.83}{4.5044 \times 3.9762} = 0.9955$

So the correlation is almost perfect, i.e., in the given data, the ages of husbands and wives are
almost perfectly correlated.
2. Find the coefficient of correlation between industrial production and export (both in crore tons)
using the following data:
production (x): 55 56 58 59 60 60 62
export (y): 35 38 38 39 44 43 45

Solution: $n = 7$, $\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{7}(410) = 58.5714$, $\bar{y} = \frac{1}{n}\sum y_i = \frac{1}{7}(282) = 40.2857$

$x_i$   $y_i$   $X_i = x_i - \bar{x}$   $Y_i = y_i - \bar{y}$   $X_i^2$   $Y_i^2$   $X_i Y_i$
55 35 -3.5714 -5.2857 12.7549 27.9386 18.8773
56 38 -2.5714 -2.2857 6.6121 5.2244 5.8774
58 38 -0.5714 -2.2857 0.3265 5.2244 1.306
59 39 0.4286 -1.2857 0.1837 1.653 -0.5511
60 44 1.4286 3.7143 2.0409 13.796 5.3062
60 43 1.4286 2.7143 2.0409 7.3674 3.8776
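62 45 3.4286 4.7143 11.7553 22.2246 16.1634
Total: $\sum X_i^2 = 35.7143$, $\sum Y_i^2 = 83.4284$, $\sum X_i Y_i = 50.8568$

$\sigma_x^2 = \frac{1}{n}\sum X_i^2 = \frac{1}{7}(35.7143) = 5.102 \Rightarrow \sigma_x = 2.2588$

$\sigma_y^2 = \frac{1}{n}\sum Y_i^2 = \frac{1}{7}(83.4284) = 11.9183 \Rightarrow \sigma_y = 3.4523$

The covariance of x and y is $\sigma_{xy} = \frac{1}{n}\sum X_i Y_i = \frac{1}{7}(50.8568) = 7.2653$

The coefficient of correlation between x and y is $r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{7.2653}{2.2588 \times 3.4523} = 0.9317$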

3. Psychological tests of intelligence and computational ability were applied to ten children.
Following is the record showing intelligence ratio (IR) and ability ratio (AR). Calculate the
coefficient of correlation.
IR (x): 105 104 102 101 100 99 98 96 95 94
AR (y): 101 103 100 98 95 96 104 97 97 96

Solution: $n = 10$, $\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(994) = 99.4$, $\bar{y} = \frac{1}{n}\sum y_i = \frac{1}{10}(987) = 98.7$

$x_i$   $y_i$   $X_i = x_i - \bar{x}$   $Y_i = y_i - \bar{y}$   $X_i^2$   $Y_i^2$   $X_i Y_i$
105 101 5.6 2.3 31.36 5.29 12.88
104 103 4.6 4.3 21.16 18.49 19.78
102 100 2.6 1.3 6.76 1.69 3.38
101 98 1.6 -0.7 2.56 0.49 -1.12
100 95 0.6 -3.7 0.36 13.69 -2.22
99 96 -0.4 -2.7 0.16 7.29 1.08
98 104 -1.4 5.3 1.96 28.09 -7.42
96 97 -3.4 -1.7 11.56 2.89 5.78
95 97 -4.4 -1.7 19.36 2.89 7.48
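94 96 -5.4 -2.7 29.16 7.29 14.58
Total: $\sum X_i^2 = 124.4$, $\sum Y_i^2 = 88.1$, $\sum X_i Y_i = 54.2$

$\sigma_x^2 = \frac{1}{n}\sum X_i^2 = \frac{1}{10}(124.4) = 12.44 \Rightarrow \sigma_x = 3.527$

$\sigma_y^2 = \frac{1}{n}\sum Y_i^2 = \frac{1}{10}(88.1) = 8.81 \Rightarrow \sigma_y = 2.9682$

The covariance of x and y is $\sigma_{xy} = \frac{1}{n}\sum X_i Y_i = \frac{1}{10}(54.2) = 5.42$

The coefficient of correlation between x and y is $r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{5.42}{3.527 \times 2.9682} = 0.5177$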

4. Obtain the coefficient of correlation for the following data


x: 10 14 18 22 26 30
y: 18 12 24 6 30 36

Solution: $n = 6$, $\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{6}(120) = 20$, $\bar{y} = \frac{1}{n}\sum y_i = \frac{1}{6}(126) = 21$

$x_i$   $y_i$   $X_i = x_i - \bar{x}$   $Y_i = y_i - \bar{y}$   $X_i^2$   $Y_i^2$   $X_i Y_i$
10 18 -10 -3 100 9 30
14 12 -6 -9 36 81 54
18 24 -2 3 4 9 -6
22 6 2 -15 4 225 -30
26 30 6 9 36 81 54
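30 36 10 15 100 225 150
Total: $\sum X_i^2 = 280$, $\sum Y_i^2 = 630$, $\sum X_i Y_i = 252$

$\sigma_x^2 = \frac{1}{n}\sum X_i^2 = \frac{1}{6}(280) = 46.6667 \Rightarrow \sigma_x = 6.8313$

$\sigma_y^2 = \frac{1}{n}\sum Y_i^2 = \frac{1}{6}(630) = 105 \Rightarrow \sigma_y = 10.247$

The covariance of x and y is $\sigma_{xy} = \frac{1}{n}\sum X_i Y_i = \frac{1}{6}(252) = 42$

The coefficient of correlation between x and y is $r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{42}{6.8313 \times 10.247} = 0.6$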
Alternate formula to compute coefficient of correlation (r)
If $z = ax + by$ and r is the coefficient of correlation between x and y, show that

$$ \sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y \quad \text{or} \quad r = \frac{\sigma_z^2 - (a^2\sigma_x^2 + b^2\sigma_y^2)}{2ab\,\sigma_x\sigma_y}. $$

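Proof: Since $z = ax + by$, we get $\bar{z} = a\bar{x} + b\bar{y}$, where $\bar{x}$, $\bar{y}$, $\bar{z}$ are the means of x, y, z.

Let $z_i = ax_i + by_i$, where $i = 1, 2, 3, \ldots, n$.

Consider $z_i - \bar{z} = a(x_i - \bar{x}) + b(y_i - \bar{y})$.

On squaring both sides,

$$ (z_i - \bar{z})^2 = a^2 (x_i - \bar{x})^2 + b^2 (y_i - \bar{y})^2 + 2ab\,(x_i - \bar{x})(y_i - \bar{y}) $$

Taking the summation over the n terms,

$$ \sum (z_i - \bar{z})^2 = a^2 \sum (x_i - \bar{x})^2 + b^2 \sum (y_i - \bar{y})^2 + 2ab \sum (x_i - \bar{x})(y_i - \bar{y}) $$

Using the definitions of the variances, and $\sum (x_i - \bar{x})(y_i - \bar{y}) = n\,r\,\sigma_x\sigma_y$ from the definition of r,

$$ n\sigma_z^2 = a^2 n\sigma_x^2 + b^2 n\sigma_y^2 + 2ab\,r\,n\,\sigma_x\sigma_y $$

$$ \therefore\ \sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y \quad \text{or} \quad r = \frac{\sigma_z^2 - (a^2\sigma_x^2 + b^2\sigma_y^2)}{2ab\,\sigma_x\sigma_y}. $$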

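Note:
1. $\sigma_{ax+by}^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$

2. $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\,\sigma_x\sigma_y$ (taking $a = 1$ and $b = 1$)

3. $\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\,\sigma_x\sigma_y$ (taking $a = 1$ and $b = -1$)

4. The above formula is useful for computing $r$, $\sigma_x$ and $\sigma_y$ when $\sigma_{ax+by}$ is given (as in the following problems).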
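
As a numerical sanity check of the identity for the variance of $ax + by$, the sketch below compares the directly computed variance of $z = ax + by$ with $a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$. The data are the x and y series of Problem 4 in the first set of problems; the coefficients $a = 2$, $b = -3$ and the helper names `pvar` and `pcov` are arbitrary choices made for illustration.

```python
import math

def pvar(values):
    """Variance with divisor n, as in the notes."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def pcov(xs, ys):
    """Covariance with divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

xs = [10, 14, 18, 22, 26, 30]
ys = [18, 12, 24, 6, 30, 36]
a, b = 2, -3   # arbitrary coefficients, assumed for this check only

zs = [a * x + b * y for x, y in zip(xs, ys)]
sigma_x, sigma_y = math.sqrt(pvar(xs)), math.sqrt(pvar(ys))
r = pcov(xs, ys) / (sigma_x * sigma_y)

lhs = pvar(zs)                                   # variance of z computed directly
rhs = (a * a * pvar(xs) + b * b * pvar(ys)
       + 2 * a * b * r * sigma_x * sigma_y)      # right-hand side of the identity
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding; a negative b simply makes the cross term negative, which is the case used in Note 3.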

Problems
1. If the variables x and y are such that (i) $x + y$ has variance 15, (ii) $x - y$ has variance 11 and (iii)
$2x + y$ has variance 29, find $\sigma_x$, $\sigma_y$ and the coefficient of correlation between x and y.

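Solution: We know that $\sigma_{ax+by}^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$.

So, $\sigma_{x+y}^2 = 15 \Rightarrow 15 = \sigma_x^2 + \sigma_y^2 + 2r\,\sigma_x\sigma_y$ ---(i)

$\sigma_{x-y}^2 = 11 \Rightarrow 11 = \sigma_x^2 + \sigma_y^2 - 2r\,\sigma_x\sigma_y$ ---(ii)

$\sigma_{2x+y}^2 = 29 \Rightarrow 29 = 4\sigma_x^2 + \sigma_y^2 + 4r\,\sigma_x\sigma_y$ ---(iii)

(i) + (ii) gives $26 = 2(\sigma_x^2 + \sigma_y^2) \Rightarrow 13 = \sigma_x^2 + \sigma_y^2$ ---(iv)

2 × (ii) + (iii) gives $51 = 6\sigma_x^2 + 3\sigma_y^2 \Rightarrow 17 = 2\sigma_x^2 + \sigma_y^2$ ---(v)

(v) - (iv) gives $\sigma_x^2 = 4 \Rightarrow \sigma_x = 2$

Using this in (iv) gives $\sigma_y^2 = 9 \Rightarrow \sigma_y = 3$

From (ii), we get $r = \frac{\sigma_x^2 + \sigma_y^2 - 11}{2\sigma_x\sigma_y} = \frac{4 + 9 - 11}{2 \times 2 \times 3} = \frac{1}{6} = 0.1667$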

2. The standard deviations of x and y are 2 and 3 respectively. If the coefficient of correlation
between x and y is 0.4, find the standard deviations of $x + y$ and $x - y$.

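Solution: Given $\sigma_x = 2$, $\sigma_y = 3$ and $r = 0.4$.

We know $\sigma_{ax+by}^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$

So, $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\,\sigma_x\sigma_y = 4 + 9 + 2 \times 0.4 \times 2 \times 3 = 17.8 \Rightarrow \sigma_{x+y} = 4.219$

$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\,\sigma_x\sigma_y = 4 + 9 - 2 \times 0.4 \times 2 \times 3 = 8.2 \Rightarrow \sigma_{x-y} = 2.8636$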

Rank Correlation
A group of n individuals may be arranged in order of merit with respect to some
characteristic. The same group would, in general, give different orders for different characteristics.
Considering the orders corresponding to two characteristics A and B, the correlation between these
n pairs of ranks is called the rank correlation in the characteristics A and B for that group of
individuals.
Let $x_i$, $y_i$ be the ranks of the ith individual in A and B respectively. Assuming that no two
individuals are bracketed equal in either case, each of the variables takes the values 1, 2, 3, …, n, and
we have

$$ \bar{x} = \bar{y} = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2} $$
If $X_i$, $Y_i$ are the deviations of $x_i$, $y_i$ from their means, then

$$ \sum X_i^2 = \sum (x_i - \bar{x})^2 = \sum \left(x_i^2 + \bar{x}^2 - 2x_i\bar{x}\right) = \sum x_i^2 + n\bar{x}^2 - 2\bar{x}\sum x_i $$

$$ = \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)^2}{4} - 2\left(\frac{n+1}{2}\right)\frac{n(n+1)}{2} $$

$$ = \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)^2}{4} - \frac{n(n+1)^2}{2} = \frac{1}{12}\left(n^3 - n\right) $$

Similarly, $\sum Y_i^2 = \frac{1}{12}\left(n^3 - n\right)$.

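Now let $d_i = x_i - y_i$, so that $d_i = (x_i - \bar{x}) - (y_i - \bar{y}) = X_i - Y_i$ (since $\bar{x} = \bar{y}$).

$$ d_i^2 = X_i^2 + Y_i^2 - 2X_iY_i \ \Rightarrow\ \sum d_i^2 = \sum X_i^2 + \sum Y_i^2 - 2\sum X_iY_i $$

$$ \Rightarrow\ \sum X_iY_i = \frac{1}{2}\left(\sum X_i^2 + \sum Y_i^2 - \sum d_i^2\right) $$

$$ \Rightarrow\ \sum X_iY_i = \frac{1}{2}\left[\frac{1}{12}\left(n^3 - n\right) + \frac{1}{12}\left(n^3 - n\right) - \sum d_i^2\right] = \frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2 $$

Also, $\sigma_x^2 = \frac{1}{n}\sum X_i^2 = \frac{1}{12n}\left(n^3 - n\right)$ and $\sigma_y^2 = \frac{1}{n}\sum Y_i^2 = \frac{1}{12n}\left(n^3 - n\right)$.
Hence the coefficient of correlation between these variables is

$$ r = \frac{\sum X_iY_i}{n\,\sigma_x\sigma_y} = \frac{\frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2}{n \cdot \frac{1}{12n}\left(n^3 - n\right)} = \frac{\frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2}{\frac{1}{12}\left(n^3 - n\right)} = 1 - \frac{6\sum d_i^2}{n^3 - n} $$

This is called the rank correlation coefficient and is denoted by $\rho$.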
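
The derivation says that, for ranks without ties, $1 - 6\sum d_i^2/(n^3 - n)$ is simply Karl Pearson's coefficient applied to the ranks. A minimal Python sketch can confirm this numerically. The function names are hypothetical, and the two rank lists are those of Judges A and B from the first problem that follows.

```python
import math

def spearman_rho(rank_x, rank_y):
    """Rank correlation 1 - 6*sum(d^2)/(n^3 - n); assumes ranks 1..n with no ties."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def pearson_on_ranks(rank_x, rank_y):
    """Pearson's r on the same ranks: sum(X*Y) / sqrt(sum(X^2) * sum(Y^2))."""
    n = len(rank_x)
    mean = (n + 1) / 2                      # common mean of the ranks 1..n
    X = [x - mean for x in rank_x]
    Y = [y - mean for y in rank_y]
    sum_xy = sum(a * b for a, b in zip(X, Y))
    return sum_xy / math.sqrt(sum(a * a for a in X) * sum(b * b for b in Y))

judge_a = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge_b = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = spearman_rho(judge_a, judge_b)
r = pearson_on_ranks(judge_a, judge_b)
print(rho, r)   # the two values should agree up to rounding
```

If ties were present, the sum of squared deviations would no longer equal $(n^3 - n)/12$ and the two functions would generally disagree, which is why the no-ties assumption is stated in the derivation.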

Problems
1. Ten competitors in a beauty contest are ranked by two judges A and B in the following order:
ID No. of competitors 1 2 3 4 5 6 7 8 9 10
Judge A 1 6 5 10 3 2 4 9 7 8
Judge B 6 4 9 8 1 2 3 10 5 7

Calculate the rank correlation coefficient.


Solution: Here $n = 10$.
Ranks by A ($= x_i$)   Ranks by B ($= y_i$)   $d_i = x_i - y_i$   $d_i^2$
1 6 -5 25
6 4 2 4
5 9 -4 16
10 8 2 4
3 1 2 4
2 2 0 0
4 3 1 1
9 10 -1 1
7 5 2 4
8 7 1 1
Total: $\sum d_i^2 = 60$

$$ \rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 60}{990} = 0.6364 $$

2. The following are the marks of 8 students in Statistics and Mathematics.


Marks in statistics 25 43 27 35 54 61 37 45
Marks in mathematics 35 47 20 37 63 54 28 40

Calculate the rank correlation coefficient.


Solution: $n = 8$
Marks in Statistics   Marks in Mathematics   Rank in Statistics ($= x_i$)   Rank in Mathematics ($= y_i$)   $d_i = x_i - y_i$   $d_i^2$
25 35 8 6 2 4
43 47 4 3 1 1
27 20 7 8 -1 1
35 37 6 5 1 1
54 63 2 1 1 1
61 54 1 2 -1 1
37 28 5 7 -2 4
45 40 3 4 -1 1
Total: $\sum d_i^2 = 14$

$$ \rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 14}{504} = 0.8333 $$

3. Ten competitors in a beauty contest are ranked by three judges A, B and C in the following order:
ID No. of competitors 1 2 3 4 5 6 7 8 9 10
Judge A 1 6 5 10 3 2 4 9 7 8
Judge B 3 5 8 4 7 10 2 1 6 9
Judge C 6 4 9 8 1 2 3 10 5 7

Find which pair of judges has the nearest common approach.


Solution: Here $n = 10$.
Ranks by A ($= x$)   B ($= y$)   C ($= z$)   $d_1 = x - y$   $d_2 = y - z$   $d_3 = z - x$   $d_1^2$   $d_2^2$   $d_3^2$
1 3 6 -2 -3 5 4 9 25
6 5 4 1 1 -2 1 1 4
5 8 9 -3 -1 4 9 1 16
10 4 8 6 -4 -2 36 16 4
3 7 1 -4 6 -2 16 36 4
2 10 2 -8 8 0 64 64 0
4 2 3 2 -1 -1 4 1 1
9 1 10 8 -9 1 64 81 1
7 6 5 1 1 -2 1 1 4
8 9 7 -1 2 -1 1 4 1
Total: $\sum d_1^2 = 200$, $\sum d_2^2 = 214$, $\sum d_3^2 = 60$

$$ \rho(x, y) = 1 - \frac{6\sum d_1^2}{n^3 - n} = 1 - \frac{6 \times 200}{990} = -0.2121 $$

$$ \rho(y, z) = 1 - \frac{6\sum d_2^2}{n^3 - n} = 1 - \frac{6 \times 214}{990} = -0.29697 $$

$$ \rho(z, x) = 1 - \frac{6\sum d_3^2}{n^3 - n} = 1 - \frac{6 \times 60}{990} = 0.636364 $$

Since $\rho(z, x)$ is the maximum, judges A and C have the nearest common approach.
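
The pairwise comparison in this problem can be written as a short Python sketch. The dictionary layout and the function name `spearman_rho` are hypothetical choices; the ranks are copied from the problem statement.

```python
from itertools import combinations

ranks = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def spearman_rho(rank_x, rank_y):
    """Rank correlation 1 - 6*sum(d^2)/(n^3 - n); assumes no tied ranks."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# One coefficient per pair of judges: (A, B), (A, C), (B, C).
rhos = {pair: spearman_rho(ranks[pair[0]], ranks[pair[1]])
        for pair in combinations(ranks, 2)}

for pair, rho in rhos.items():
    print(pair, round(rho, 4))

closest_pair = max(rhos, key=rhos.get)   # pair with the largest coefficient
print("Nearest common approach:", closest_pair)
```

Since $d_i^2$ is unchanged when the two rankings are swapped, the order of the judges within a pair does not matter.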
