Biostatistics I: Descriptive Statistics
Correlation
Eleni-Rosalina Andrinopoulou
Department of Biostatistics, Erasmus Medical Center
e.andrinopoulou@erasmusmc.nl
@erandrinopoulou
In this Section
▶ Correlation coefficients
▶ Examples
Correlation
Correlation is a measure that describes the strength of the association
between two variables. For two continuous variables, we can observe the
following relationships:
[Figure: three scatter plots of Variable 2 against Variable 1, showing a positive correlation, a negative correlation, and no correlation]
Pearson Correlation
▶ magnitude of association
▶ linear association
▶ direction of the relationship
A relationship is linear when a change in one variable is associated with a
proportional change in the other variable
Pearson Correlation: $\mathrm{corr}(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$,
where cov(X, Y) is the covariance and sd(X), sd(Y) are the standard
deviations.
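As a minimal sketch of the definition above, the Pearson correlation can be computed by hand from the covariance and the standard deviations and compared with R's built-in cor(). The vectors x and y are made-up illustrative values, not data from these slides.

```r
# Hypothetical illustrative data (not from the slides)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson correlation from its definition: cov(X, Y) / (sd(X) * sd(Y))
pearson_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in equivalent; "pearson" is the default method of cor()
pearson_builtin <- cor(x, y, method = "pearson")

print(pearson_manual)   # approximately 0.77
print(pearson_builtin)  # same value
```

Both results agree because cor() uses the same sample covariance and sample standard deviations (denominator n - 1) as cov() and sd().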
Spearman Correlation
▶ direction of the relationship
▶ monotonic relationship
In a monotonic relationship, the variables tend to change together, but
not always at a constant rate (as in the linear case)
The Spearman correlation coefficient is based on the ranked values:
$\mathrm{corr}_R(X, Y) = \dfrac{\mathrm{cov}(R_X, R_Y)}{\mathrm{sd}(R_X)\,\mathrm{sd}(R_Y)}$,
where $R_X$ and $R_Y$ are the ranks of X and Y
What is rank?
The rank of a value is its position when the values are sorted in
ascending order. E.g. the ranks of 3, 10, 16, 6, 2 are 2, 4, 5, 3, 1:
rank(c(3, 10, 16, 6, 2))
[1] 2 4 5 3 1
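A short sketch of how the ranks feed into the Spearman coefficient: apply the Pearson formula to the ranks and compare with cor(..., method = "spearman"). The vector x reuses the values from the rank example; y is a made-up companion variable used only for illustration.

```r
# x reuses the values from the rank example; y is hypothetical
x <- c(3, 10, 16, 6, 2)
y <- c(1, 8, 30, 2, 5)

# Replace each value by its rank
rx <- rank(x)  # 2 4 5 3 1
ry <- rank(y)  # 1 4 5 2 3

# Spearman correlation = Pearson formula applied to the ranks
spearman_manual <- cov(rx, ry) / (sd(rx) * sd(ry))

# Built-in equivalent
spearman_builtin <- cor(x, y, method = "spearman")

print(spearman_manual)   # 0.7
print(spearman_builtin)  # 0.7
```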
Difference between Pearson and Spearman
Weak positive correlation: Pearson = 0.5, Spearman = 0.51
Strong positive correlation: Pearson = 0.9, Spearman = 0.8
[Figure: two scatter plots of Variable 2 against Variable 1, one for the weak and one for the strong positive correlation]
Difference between Pearson and Spearman
Linear VS monotonic relationship: Pearson = −0.85, Spearman = −0.99
What if there is a correlation but this is not linear? Pearson = 0.06, Spearman = 0.04
[Figure: two scatter plots of Variable 2 against Variable 1, one for each case]
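The two situations can be imitated with small constructed examples. This is only a sketch: the data below are hypothetical and are not the data behind the plots on this slide. A strictly decreasing but curved relationship gives a perfect Spearman correlation while Pearson is weaker, and a symmetric U-shaped relationship gives a coefficient of zero for both, even though the variables are clearly related.

```r
# Case 1: monotonic but not linear (hypothetical data)
x <- 1:10
y_mono <- exp(-x)  # strictly decreasing, curved

pearson_mono  <- cor(x, y_mono, method = "pearson")
spearman_mono <- cor(x, y_mono, method = "spearman")
print(pearson_mono)   # negative, but weaker than -1
print(spearman_mono)  # -1: the ranks are perfectly reversed

# Case 2: related, but neither linear nor monotonic (hypothetical data)
x_sym <- -5:5
y_u   <- x_sym^2   # symmetric U shape

pearson_u  <- cor(x_sym, y_u, method = "pearson")
spearman_u <- cor(x_sym, y_u, method = "spearman")
print(pearson_u)   # 0
print(spearman_u)  # 0
```

A coefficient near zero therefore rules out only a linear (Pearson) or monotonic (Spearman) association, so plotting the data remains useful.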
Other Correlation Measures
▶ Point-Biserial: It evaluates the association between a continuous
variable and a dichotomous (two-category) variable
▶ Intraclass: It evaluates the association between continuous
measurements that are structured in groups
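For the point-biserial case, a minimal sketch: when the dichotomous variable is coded 0/1, the point-biserial correlation equals the ordinary Pearson correlation, and it can also be written in terms of the two group means. The names score and group and their values are hypothetical.

```r
# Hypothetical data: a continuous score and a 0/1 group indicator
score <- c(4.1, 5.3, 6.0, 7.2, 8.4, 9.0)
group <- c(0, 0, 0, 1, 1, 1)

# Point-biserial correlation as a Pearson correlation with a 0/1 variable
r_pb <- cor(score, group)

# Equivalent form using the group means:
# (mean of group 1 - mean of group 0) / sd_pop * sqrt(p * (1 - p)),
# where sd_pop is the standard deviation with denominator n
m1 <- mean(score[group == 1])
m0 <- mean(score[group == 0])
p  <- mean(group)
sd_pop <- sqrt(mean((score - mean(score))^2))
r_pb_means <- (m1 - m0) / sd_pop * sqrt(p * (1 - p))

print(r_pb)
print(r_pb_means)  # same value as r_pb
```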
Note
▶ Correlation must not be confused with causality
▶ If two variables are correlated, it does not imply that one variable
causes the changes in the other variable