Lesson 9.
Correlation Coefficient
Correlation is a measure of the relationship between two variables of
interest; many commonly measured quantities tend to be related.
The relationship between two variables can be determined through the
use of a correlation coefficient.
The correlation coefficient indicates two things:
1. The direction of the relationship
2. The strength or degree of the relationship
Direction of the Relationship
If the correlation coefficient is positive, the relationship is said to be
direct:
- this suggests that a high value in one variable X corresponds to a
high value in the other variable Y
- example: students who perform well in Math tend to perform well also in
English
If the correlation coefficient is negative, the relationship is said to be
inverse:
- this suggests that a high value in one variable X corresponds to a
low value in the other variable Y
- example: students who perform well in Math tend to perform poorly in English
If the correlation coefficient is zero, no relationship exists:
- a high value in one variable does not necessarily correspond to
a high or low value in the other variable
Perfect positive (direct) relationship
- this happens when the computed correlation coefficient is exactly
+ 1.
[Figure: A Graphical Summary]
Strength or Degree of Correlation
An r of 0 denotes no relationship
An r from ± 0.01 to ± 0.20 denotes negligible correlation
An r from ± 0.21 to ± 0.40 denotes low or slight correlation
An r from ± 0.41 to ± 0.70 denotes marked or moderate correlation
An r from ± 0.71 to ± 0.90 denotes high relationship
An r from ± 0.91 to ± 0.99 denotes very high relationship
An r of ± 1.00 denotes perfect relationship
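The scale above can be turned into a small lookup. The following is a minimal Python sketch and not part of the original lesson: the function name describe_r is hypothetical, and it assumes r is rounded to two decimal places before it is compared with the table, so that no value falls between two categories.

```python
def describe_r(r):
    """Return (direction, strength) labels for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = round(abs(r), 2)  # assumption: the table is read to two decimals
    if size == 0:
        return ("none", "no relationship")
    # The sign gives the direction, the magnitude gives the strength.
    direction = "direct" if r > 0 else "inverse"
    if size <= 0.20:
        strength = "negligible correlation"
    elif size <= 0.40:
        strength = "low or slight correlation"
    elif size <= 0.70:
        strength = "marked or moderate correlation"
    elif size <= 0.90:
        strength = "high relationship"
    elif size < 1.00:
        strength = "very high relationship"
    else:
        strength = "perfect relationship"
    return (direction, strength)


print(describe_r(0.35))   # ('direct', 'low or slight correlation')
print(describe_r(-0.95))  # ('inverse', 'very high relationship')
```

The sign and the size of r are handled separately because they answer the two different questions listed earlier: direction and strength.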
A. Pearson Product – Moment Correlation Coefficient
Pearson r correlation:
Pearson r correlation is the most widely used correlation statistic to measure
the degree of the relationship between linearly related variables. For example,
in the stock market, if we want to measure how two stocks are related to each
other, Pearson r correlation is used to measure the degree of relationship
between the two. The point - biserial correlation is conducted with the
Pearson correlation formula except that one of the variables is dichotomous.
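The following formula is used to calculate the Pearson r correlation from raw
scores:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}} $$
where n is the number of paired observations (x, y).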
Types of Correlation
1. Direct and Inverse
Direct correlation is obtained when the changes in two variables
are in the same direction; as one variable increases (or decreases),
the other also increases (or decreases).
Example: employment increases when business activity increases; it
decreases when business declines.
Inverse correlation is obtained when the changes in the two
variables are in the opposite direction.
Example: business failures decrease in frequency when business
improves, but they increase when business declines.
2. Linear and Non-Linear
Linear correlation is obtained when the amount of change in one
variable tends to bear a constant ratio to the amount of change in
the other variable.
Example: if income is doubled, wealth is also doubled.
Non-linear or curvilinear correlation is obtained when the amount of change
in one variable does not bear a constant ratio to the amount of
change in the other.
Example: if the amount of rainfall is doubled, the amount of palay
harvest is not necessarily doubled.
3. Simple, Multiple, Partial
Simple correlation is obtained when only two attributes or
characteristics are considered.
Multiple correlation is obtained when more than two variables
are considered.
Partial correlation is obtained when more than two variables are
recognized but only two are considered to be influencing each
other, the effect of the other variables being held constant.
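The distinction between linear and non-linear correlation in item 2 above rests on whether the ratio of changes stays constant. The following minimal Python sketch illustrates this; the helper name change_ratios and both data sets are hypothetical and chosen only for illustration.

```python
def change_ratios(x, y):
    """Ratio of the change in y to the change in x between successive points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


x = [1, 2, 3, 4, 5]
linear = [2 * v for v in x]   # y = 2x: every unit change in x adds 2 to y
curved = [v ** 2 for v in x]  # y = x squared: the change grows with x

print(change_ratios(x, linear))  # [2.0, 2.0, 2.0, 2.0]  constant ratio
print(change_ratios(x, curved))  # [3.0, 5.0, 7.0, 9.0]  ratio keeps changing
```

A constant list of ratios corresponds to a linear relationship; a changing list corresponds to a curvilinear one. Real data will rarely give exactly constant ratios, so this is a way to see the definition at work, not a formal test.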
Example.
The College Entrance Examination scores of 7 students and their
corresponding grade point averages (GPA) are shown below:

Entrance Exam   GPA
      x          y        xy        x²        y²
     80         75       6000      6400      5625
     72         82       5904      5184      6724
     80         76       6080      6400      5776
     75         74       5550      5625      5476
     70         78       5460      4900      6084
     77         72       5544      5929      5184
     75         75       5625      5625      5625
∑x = 529   ∑y = 532   ∑xy = 40163   ∑x² = 40063   ∑y² = 40494
n = 7
Solution:
$$ r = \frac{7(40163) - (529)(532)}{\sqrt{[\,7(40063) - (529)^2\,]\,[\,7(40494) - (532)^2\,]}} $$
$$ r = \frac{281{,}141 - 281{,}428}{\sqrt{(280{,}441 - 279{,}841)(283{,}458 - 283{,}024)}} $$
$$ r = \frac{-287}{\sqrt{(600)(434)}} = \frac{-287}{\sqrt{260{,}400}} = \frac{-287}{510.29} $$
r = -0.56
The value of r = -0.56 means that there is a moderate negative (inverse)
relationship between the College Entrance Examination scores and the GPA.
This means that, in this sample, those who have high scores in the college
entrance exam tend to have lower GPAs.
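The computation above can be checked with a short script. This is a minimal Python sketch of the raw-score formula, using the seven (x, y) pairs from the table; the function name pearson_r is hypothetical. Note that the numerator 281,141 - 281,428 is -287, so the coefficient carries a negative sign.

```python
import math


def pearson_r(x, y):
    """Pearson r from raw scores, using sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


entrance_exam = [80, 72, 80, 75, 70, 77, 75]
gpa = [75, 82, 76, 74, 78, 72, 75]

r = pearson_r(entrance_exam, gpa)
print(round(r, 2))  # -0.56
```

The same function can be applied to the data of Exercise 9A by replacing the two lists.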
Exercise 9A
Solve the following:
1. A random sample of 15 married couples, both of whom earned
income (in thousand pesos), showed the following:
Income of Husband (x) Income of Wife (y)
13,000 8,000
18,000 12,000
24,000 16,000
76,000 48,000
25,000 30,000
19,000 16,000
40,000 32,000
52,000 60,000
8,000 42,000
19,000 17,000
26,000 25,000
18,000 19,000
33,000 45,000
37,000 32,000
12,000 28,000
What is the correlation coefficient between the income of the husband and
that of the wife? What is the degree of correlation? Interpret the result.
B. Point Biserial Correlation (rpb)
This is used when a variable which is interval in nature is correlated
with another variable which is a real (true) dichotomy.
Examples:
- Relationship between performance rating (y) and sex (x),
categorized as male (1) and female (0)
- Relationship between production (y) and the presence of modern
equipment (x), categorized as with (1) and without (0)
Formula:
$$ r_{pb} = \frac{\bar{y}_1 - \bar{y}_0}{S_y}\sqrt{\frac{n_1 n_0}{n(n-1)}} $$
Where:
$\bar{y}_1$ = mean of the y values of those labeled 1
$\bar{y}_0$ = mean of the y values of those labeled 0
$n_1$ = number of samples labeled 1 in x
$n_0$ = number of samples labeled 0 in x
$n$ = total number of samples = $n_1 + n_0$
$S_y$ = standard deviation of all the y values (computed with n - 1 in the denominator)
Example
1. A researcher would like to find out whether the use of modern
equipment (x) is related to an increase in the production of rice in cavans (y).

Farmer   Use of Equipment (x)        Production
         (0) not using, (1) using    in cavans (y)
  1               1                      120
  2               0                      100
  3               1                      160
  4               0                      110
  5               1                      150
  6               1                      180
  7               0                      180
  8               0                       60
  9               1                       70
Solution:
n1 = 5, n0 = 4, n = 5 + 4 = 9
$$ \bar{y}_1 = \frac{120 + 160 + 150 + 180 + 70}{5} = 136 $$
$$ \bar{y}_0 = \frac{100 + 110 + 180 + 60}{4} = 112.5 $$
$$ S_y = 44.75 $$
$$ r_{pb} = \frac{136 - 112.5}{44.75}\sqrt{\frac{5(4)}{9(8)}} = \frac{23.5}{44.75}(0.527) $$
r pb = 0.277
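The steps of this solution can be reproduced with Python's standard library. This is a minimal sketch; the function name point_biserial is hypothetical, and it assumes Sy is the sample standard deviation (divisor n - 1), which is what statistics.stdev computes and which reproduces the value 44.75 used above.

```python
import math
import statistics


def point_biserial(labels, y):
    """Point biserial r from 0/1 labels and an interval-scale variable y."""
    y1 = [v for g, v in zip(labels, y) if g == 1]
    y0 = [v for g, v in zip(labels, y) if g == 0]
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    s_y = statistics.stdev(y)  # sample standard deviation of all y values
    mean_diff = statistics.mean(y1) - statistics.mean(y0)
    return mean_diff / s_y * math.sqrt(n1 * n0 / (n * (n - 1)))


uses_equipment = [1, 0, 1, 0, 1, 1, 0, 0, 1]
production = [120, 100, 160, 110, 150, 180, 180, 60, 70]

print(round(statistics.stdev(production), 2))  # 44.75
r_pb = point_biserial(uses_equipment, production)
print(round(r_pb, 3))  # 0.277
```

On the strength scale given earlier in the lesson, 0.277 falls in the low or slight range.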
Relationship between the continuous variable and the dichotomous
response (like true or false)
Formula:
$$ r_{pb} = \frac{\sum f\,(\sum f_1 Y) - \sum f_1\,(\sum f Y)}{\sqrt{\sum f_1\,(\sum f_2)\,[\,\sum f\,(\sum f Y^2) - (\sum f Y)^2\,]}} $$
Where:
f = sum of the frequencies of the two groups (f1 + f2)
Y = continuous variable
f1 = frequency of the larger (major) group
f2 = frequency of the smaller (minor) group
Example:
A study was conducted to determine the relation of sex to emotional
intelligence. A group of 90 students was selected at random and the
following results were obtained.

EQ Scores   No. of Males        No. of Females      Total no. of Students
            obtaining scores    obtaining scores    obtaining scores
            (f2)                (f1)
   80            3                   4                    7
   85            6                   8                   14
   92            1                   4                    5
   76            7                   2                    9
   79            5                   4                    9
   94            3                   6                    9
   97            4                   7                   11
  102            2                   5                    7
   87            5                   6                   11
   82            4                   4                    8
Total           40                  50                   90

Using the point biserial correlation, determine the relationship between
emotional intelligence and sex.
Solution :
  Y     f1    f2    f = f1 + f2     Y²      fY       fY²      f1Y
  80     4     3         7         6400     560     44800     320
  85     8     6        14         7225    1190    101150     680
  92     4     1         5         8464     460     42320     368
  76     2     7         9         5776     684     51984     152
  79     4     5         9         6241     711     56169     316
  94     6     3         9         8836     846     79524     564
  97     7     4        11         9409    1067    103499     679
 102     5     2         7        10404     714     72828     510
  87     6     5        11         7569     957     83259     522
  82     4     4         8         6724     656     53792     328
Total   50    40        90                 7845    689325    4439
$$ r_{pb} = \frac{90(4439) - (50)(7845)}{\sqrt{(50)(40)\,[\,90(689325) - (7845)^2\,]}} $$
$$ = \frac{399{,}510 - 392{,}250}{\sqrt{2{,}000\,[\,62{,}039{,}250 - 61{,}544{,}025\,]}} $$
$$ = \frac{7260}{\sqrt{2000(495{,}225)}} = \frac{7260}{31{,}471.42} $$
= 0.23
The result gives a low positive point biserial correlation.
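The frequency-table form of the formula can be checked in the same way. This is a minimal Python sketch using the EQ data above; the function name point_biserial_from_frequencies is hypothetical, with f1 taken as the larger group (females) and f2 as the smaller group (males), as in the solution table.

```python
import math


def point_biserial_from_frequencies(scores, f1, f2):
    """Point biserial r from scores Y and the two groups' frequencies."""
    f = [a + b for a, b in zip(f1, f2)]  # total frequency at each score
    sum_f, sum_f1, sum_f2 = sum(f), sum(f1), sum(f2)
    sum_fy = sum(fi * y for fi, y in zip(f, scores))
    sum_fy2 = sum(fi * y * y for fi, y in zip(f, scores))
    sum_f1y = sum(fi * y for fi, y in zip(f1, scores))
    numerator = sum_f * sum_f1y - sum_f1 * sum_fy
    denominator = math.sqrt(
        sum_f1 * sum_f2 * (sum_f * sum_fy2 - sum_fy ** 2)
    )
    return numerator / denominator


eq_scores = [80, 85, 92, 76, 79, 94, 97, 102, 87, 82]
females = [4, 8, 4, 2, 4, 6, 7, 5, 6, 4]  # f1, larger group (50)
males = [3, 6, 1, 7, 5, 3, 4, 2, 5, 4]    # f2, smaller group (40)

r_pb = point_biserial_from_frequencies(eq_scores, females, males)
print(round(r_pb, 2))  # 0.23
```

The same function can be reused for Exercise 9B by supplying the Y, f1 and f2 columns.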
Exercise 9B
1. Using the following data, compute the point biserial correlation.
Interpret the result as to the degree of correlation.
Y f1 f2
20 5 3
25 6 2
30 15 9
15 7 2
10 12 4
40 7 3
2. In an achievement test administered to senior students, a study was made
to determine the relationship between test scores and type of school. The
following results were obtained. Compute the point biserial correlation and interpret.
Test Scores fA fB
95 10 4
72 3 5
65 6 2
75 13 7
84 14 9
90 20 18
87 15 10
89 8 5
98 12 7
82 11 4
C. Spearman’s Rank Correlation Coefficient (rho)
Spearman rank correlation is a non-parametric test that is used to
measure the degree of association between two variables. The Spearman
rank correlation test does not carry any assumptions about the distribution of
the data and is the appropriate correlation analysis when the variables are
measured on a scale that is at least ordinal.
The following formula is used to calculate the Spearman rank correlation:
$$ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$
Where:
ρ = Spearman rank correlation
di = the difference between the ranks of corresponding variables
(written as D in the example)
n = number of observations
Example
Two judges ranked 10 contestants in a beauty pageant in order of their
preference. The results were as follows:
Contestants 1 2 3 4 5 6 7 8 9 10
Judge A 7 1 5 8 2 9 6 3 10 4
Judge B 9 3 7 6 1 10 4 2 8 5
Did the judges tend to agree in their choice? Use α = 0.05
Solution:
Judge A   Judge B   Difference   Squares
                       |D|          D²
   7         9          2            4
   1         3          2            4
   5         7          2            4
   8         6          2            4
   2         1          1            1
   9        10          1            1
   6         4          2            4
   3         2          1            1
  10         8          2            4
   4         5          1            1
                               ∑D² = 28
n = no. of contestants = 10
$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$
$$ = 1 - \frac{6(28)}{10(10^2 - 1)} $$
$$ = 1 - \frac{168}{990} $$
= 1 - 0.17
rho (ρ) = 0.83
This means that there is a high positive relationship between the rankings
of the two judges; that is, the judges tend to agree in their preference of
contestants.
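The ranking example can be verified with a few lines of Python. This is a minimal sketch; the function names spearman_rho and ranks are hypothetical. It assumes the inputs are already ranks with no ties, as in the judges' data. The ranks helper is an extra convenience for raw values, and it assumes no tied values and that rank 1 goes to the highest value.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rho from two lists of ranks (no ties assumed)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def ranks(values):
    """Rank raw values from 1 (highest) to n (lowest); assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


judge_a = [7, 1, 5, 8, 2, 9, 6, 3, 10, 4]
judge_b = [9, 3, 7, 6, 1, 10, 4, 2, 8, 5]

rho = spearman_rho(judge_a, judge_b)
print(round(rho, 2))  # 0.83

print(ranks([10, 30, 20]))  # [3, 1, 2]
```

When the data are raw measurements instead of ranks, each variable is ranked first and the ranks are then passed to spearman_rho.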
Exercise 9C
1. Ten applicants for the position of analyst engineer in an international
firm were ranked on their performance in the Engineering board examination.
They were also ranked on their actual job performance. The data gathered are
tabulated below, where the highest is 1 and the lowest is 10.

Applicant   Rank in       Rank in
            Board Exam    Job Performance
    1           6              5
    2           3              2
    3           1              3
    4           8              6
    5           4              7
    6           2              1
    7          10              9
    8           9              8
    9           5              4
   10           7             10
Find the correlation coefficient and interpret the result.
2. Given the following data (in millions of pesos), do capital and profit
tend to agree? Interpret the correlation coefficient.
Capital Profit
6.5 1
10.8 1.6
21.4 5.3
18.7 3.4
9.3 2
15.1 4.5
Check-up Test
Choose which of the correlation tests should be used on the
given data.
A group of administrators attended a training program on Total Quality
Management (TQM). Their administrative competence, measured through a
questionnaire, and data about their profile were determined before and after
attending the program.
The data are tabulated as follows:
Admin   Sex           Age   Admin        Admin        Admin
        1 = male            Competence   Competence   Experience
        0 = female          Before       After        (Years)
  1         0         42        4            5            3
  2         1         52        4            5            5
  3         1         39        3            4            1
  4         0         47        3            4            5
  5         0         40        4            4            1
  6         1         50        5            5            8
  7         1         52        5            4            7
  8         0         48        4            4            4
  9         1         46        5            4            6
 10         0         54        4            5            8
Determine the degree of relationship between the following variables:
A. Sex and change in Admin Competence
B. Age and change in Admin Competence
C. Admin Experience and Admin Competence Before