8.3 Correlation
Introduction to Correlation
In statistics, correlation measures the strength and direction of the linear relationship between two quantitative variables. It quantifies how a change in one variable is associated with a change in the other. Measuring the degree of linear relationship helps determine how useful a regression analysis will be in a specific application.
Types of Correlation:
Positive Correlation: as one variable increases, the other also increases.
Negative Correlation: as one variable increases, the other decreases.
No Correlation: no apparent relationship between the variables.
Linear Correlation Coefficient (Pearson's Correlation Coefficient)
The Pearson correlation coefficient (r) measures the linear relationship between two continuous variables.
Mathematical Formulas:
First Formula:
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2 \,\sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{1/2}} $$
Where:
$x_i, y_i$: individual data points of variables X and Y.
$\bar{x}, \bar{y}$: means of variables X and Y.
$r$: correlation coefficient; ranges from -1 to +1.
Interpretation:
$r = +1$: perfect positive linear correlation
$r = -1$: perfect negative linear correlation
$r = 0$: no linear correlation
Second Formula:
$$ r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\left[\left(\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)\left(\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right)\right]^{1/2}} $$
Equivalently, multiplying numerator and denominator by n:
$$ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\left[\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)\right]^{1/2}} $$
Third Formula:
$$ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} $$
Where:
$SS_{xy}$: the sum of products of deviations of x and y (proportional to their covariance).
$$ SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) $$
$SS_{xx}$: the sum of squared deviations of the x-values (proportional to their variance).
$$ SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n}(x_i - \bar{x})^2 $$
$SS_{yy}$: the sum of squared deviations of the y-values (proportional to their variance).
$$ SS_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = \sum_{i=1}^{n}(y_i - \bar{y})^2 $$
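As a consistency check, the deviation form and the raw-sum form of Pearson's r can be computed side by side; the small data set below is hypothetical, chosen only for illustration:

```python
# Compute Pearson's r two ways on hypothetical data and confirm they agree.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# First formula: deviations from the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
r1 = ss_xy / sqrt(ss_xx * ss_yy)

# Second formula: raw sums only, no means needed.
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi ** 2 for xi in x)
syy = sum(yi ** 2 for yi in y)
r2 = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(round(r1, 4), round(r2, 4))
```

Both forms yield the same value because the second formula is just the first with every sum expanded and multiplied through by n.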
Examples:
Ex 1:
Ex 2:
Rank Correlation Coefficient (Spearman's Rank Correlation Coefficient)
Spearman's rank correlation measures the strength and direction of a monotonic relationship between two ranked variables. It is useful when the data do not meet the assumptions of Pearson's correlation (non-linear relationships or ordinal qualitative data).
Mathematical Formula:
$$ \rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$
Where:
$d_i$: difference between the ranks of corresponding values of X and Y.
$n$: number of data pairs of values for the variables (X, Y).
$\rho$: Spearman's rank correlation coefficient; ranges from -1 to +1.
Interpretation:
$\rho = +1$: perfect positive correlation.
$\rho = -1$: perfect negative correlation.
$\rho = 0$: no correlation.
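A minimal sketch of the rank-then-difference procedure, on hypothetical data and assuming no tied ranks (the simple $d_i$ formula only holds without ties):

```python
# Spearman's rho via the d_i formula on hypothetical, tie-free data.
x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]

def ranks(values):
    # Rank 1 = smallest value; assumes no ties in the data.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

rx, ry = ranks(x), ranks(y)
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(round(rho, 4))
```

With tied ranks, each tie is conventionally assigned the average of the ranks it spans, and for heavy ties Pearson's r computed on the ranks is the safer route.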
Examples:
Ex 1:
Ex 2:
Cramer's Coefficient of Contingency
The coefficient of contingency $r_c$ is used to measure the association between two categorical variables in a contingency table.
Mathematical Formula:
Suppose we observe two phenomena: the first has $n_a$ elements with characteristic A and $n_b$ elements with characteristic B, while the second has $m_a$ elements with characteristic A and $m_b$ elements with characteristic B.
The coefficient of contingency for the two phenomena is:
$$ r_c = \frac{n_a\,m_b - n_b\,m_a}{n_a\,m_b + n_b\,m_a} $$
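A minimal sketch of this 2x2 coefficient; the counts below are hypothetical:

```python
# Coefficient of contingency r_c for two phenomena (hypothetical counts).
na, nb = 40, 10   # first phenomenon: elements with characteristics A and B
ma, mb = 15, 35   # second phenomenon: elements with characteristics A and B

rc = (na * mb - nb * ma) / (na * mb + nb * ma)
print(round(rc, 4))
```

Since the numerator can never exceed the denominator in absolute value, $r_c$ always falls between -1 and +1.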
Examples:
Ex 1:
Ex 2:
The Coefficient of Association
If we have descriptive or quantitative data representing two phenomena under study, summarized in a contingency table with more than four cells; that is, the first characteristic is divided into n categories and the second characteristic into m categories, with n·m > 4, and $f_{ij}$ is the number of items in row i and column j, as in the following table:
Formula:
$$ C = \frac{(f_{11})^2}{(f_{1\cdot})(f_{\cdot 1})} + \frac{(f_{12})^2}{(f_{1\cdot})(f_{\cdot 2})} + \dots + \frac{(f_{nm})^2}{(f_{n\cdot})(f_{\cdot m})} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{f_{ij}^2}{f_{i\cdot}\,f_{\cdot j}} $$
Then we calculate the coefficient of association $r_a$ from the relation:
$$ r_a = \sqrt{\frac{C - 1}{C}} $$
Where:
A dot subscript ($\cdot$) denotes summation over that index, so $f_{i\cdot}$ is the total of row i and $f_{\cdot j}$ is the total of column j.
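A minimal sketch for a hypothetical n x m table of counts, following the two steps above (compute C from the cell counts and marginal totals, then take $r_a$):

```python
# Coefficient of association r_a for an n x m contingency table;
# the counts f_ij below are hypothetical.
from math import sqrt

f = [
    [30, 10, 5],   # row 1: f_11, f_12, f_13
    [10, 20, 15],  # row 2: f_21, f_22, f_23
]
row_tot = [sum(row) for row in f]         # f_i. (row totals)
col_tot = [sum(col) for col in zip(*f)]   # f._j (column totals)

C = sum(f[i][j] ** 2 / (row_tot[i] * col_tot[j])
        for i in range(len(f)) for j in range(len(f[0])))
ra = sqrt((C - 1) / C)
print(round(C, 4), round(ra, 4))
```

C is always at least 1 for a table of counts, so the quantity under the square root is non-negative.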
Examples:
Ex 1: