Correlation
Bivariate Distribution: A bivariate distribution is a distribution of two variables, typically
denoted X and Y, that are measured or observed together. The objective of analysing a
bivariate distribution is to understand the relationship between these two variables. For
example, if we measure the heights and weights of a certain group of persons, we get a
bivariate distribution, with one variable relating to height and the other relating to
weight.
Scatter Diagram: It is the simplest diagrammatic representation of bivariate data. For the
bivariate distribution (x_i, y_i); i = 1, 2, 3, …, n, if the values of the variables X and Y
are plotted along the x-axis and y-axis respectively in the xy-plane, the diagram of dots so
obtained is known as a scatter diagram.
Figure: Scatter plot example.
Coefficient of Correlation: As a measure of the intensity or degree of linear relationship
between two variables, Karl Pearson (1857-1936), a British biometrician, developed a formula
called the correlation coefficient.
The correlation coefficient between two random variables X and Y, usually denoted by
r(X, Y) or simply r_XY, is a numerical measure of the linear relationship between them and is
defined as
\[ r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y} \]
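If (x_i, y_i); i = 1, 2, 3, …, n, is the bivariate distribution, then
\[ \mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
\[ \sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
\[ \sigma_Y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} \]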
Where, 𝑟(𝑋, 𝑌) = Correlation Coefficient between variables X and Y.
𝐶𝑜𝑣(𝑋, 𝑌) = Covariance between the variables X and Y
𝜎𝑋 = Standard deviation of variable X
and 𝜎𝑌 = Standard deviation of variable Y
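Substituting the covariance and the standard deviations into the definition, the factors of
1/n in the numerator and denominator cancel, so the correlation coefficient between X and Y
can be written as
\[ r(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
Here, \bar{x} and \bar{y} are the means of the X and Y data respectively.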
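The definition above translates almost line by line into code. The following is a minimal Python sketch, not a reference implementation: the function name pearson_r and the height and weight figures are invented for illustration, and the 1/n convention for the covariance and standard deviations is the one used in these notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r(X, Y) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Covariance and standard deviations, each with the 1/n factor.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)

    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical heights (cm) and weights (kg), invented for illustration only.
heights = [150, 160, 165, 172, 180]
weights = [52, 58, 63, 70, 77]

r = pearson_r(heights, weights)
print("r(height, weight) =", round(r, 4))
```

Because the 1/n factors cancel, dividing the sum of products by the square roots of the two sums of squares gives the same value; the covariance form is used here only to mirror the definition.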
Example: Calculate the correlation coefficient for the following bivariate data.
X    3    4    5    6    7
Y    12   14   16   18   20
Solution: Make a table of the quantities needed to calculate the correlation coefficient.
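X     Y     (x − x̄)   (x − x̄)²   (y − ȳ)   (y − ȳ)²   (x − x̄)(y − ȳ)
3     12    -2        4          -4        16         8
4     14    -1        1          -2        4          2
5     16    0         0          0         0          0
6     18    1         1          2         4          2
7     20    2         4          4         16         8
Sum                   10                   40         20
Here x̄ = 25/5 = 5 and ȳ = 80/5 = 16.
From the above table, we get
∑(x_i − x̄)² = 10
∑(y_i − ȳ)² = 40
and ∑(x_i − x̄)(y_i − ȳ) = 20
Then the correlation coefficient between X and Y is calculated as
\[ r(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{20}{\sqrt{10} \, \sqrt{40}} = \frac{20}{\sqrt{400}} = \frac{20}{20} = 1 \]
Hence, the correlation coefficient between X and Y is r(X, Y) = 1. This is the expected
result, because the data satisfy Y = 2X + 6 exactly, which is a perfect positive linear
relationship.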
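The tabulation for this example can be reproduced with a few lines of Python. This is only a checking sketch: the variable names are illustrative, and the data are the five (X, Y) pairs given in the example.

```python
# Data from the worked example.
xs = [3, 4, 5, 6, 7]
ys = [12, 14, 16, 18, 20]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# One row per observation, with the same columns as the table:
# x, y, (x - x_bar), (x - x_bar)^2, (y - y_bar), (y - y_bar)^2, product.
rows = [
    (x, y, x - x_bar, (x - x_bar) ** 2, y - y_bar, (y - y_bar) ** 2,
     (x - x_bar) * (y - y_bar))
    for x, y in zip(xs, ys)
]
for row in rows:
    print(*row, sep="\t")

# Column sums needed by the formula.
sum_dx2 = sum(row[3] for row in rows)
sum_dy2 = sum(row[5] for row in rows)
sum_dxdy = sum(row[6] for row in rows)

r = sum_dxdy / (sum_dx2 ** 0.5 * sum_dy2 ** 0.5)
print("means:", x_bar, y_bar)
print("column sums:", sum_dx2, sum_dy2, sum_dxdy)
print("r =", r)
```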
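Range of correlation coefficient: The correlation coefficient r(X, Y) lies between -1 and 1.
Mathematically, −1 ≤ r(X, Y) ≤ 1.
Interpretation
1. Strength: The magnitude satisfies 0 ≤ |r(X, Y)| ≤ 1. Values near 1 indicate a strong
linear relationship; values near 0 indicate a weak or absent linear relationship.
2. Direction: A positive value indicates a direct relationship (Y tends to increase as X
increases); a negative value indicates an inverse relationship.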
Rank Correlation Coefficient: Suppose that a group of n individuals is arranged in order of
merit or proficiency in the possession of two characteristics A and B. The ranks in the two
characteristics will, in general, be different. For example, if we consider the relation
between intelligence and beauty, it is not necessary that a beautiful individual is also
intelligent.
Let (x_i, y_i); i = 1, 2, 3, …, n be the ranks of the i-th individual in the two
characteristics A and B respectively. The correlation coefficient between the ranks x_i and
y_i is called the rank correlation coefficient between A and B for the group of individuals.
Assuming that no two individuals are bracketed equal (that is, given the same rank) in either
classification, each of the variables X and Y takes the values 1, 2, 3, …, n.
Then, the rank correlation coefficient is defined as
\[ r(X, Y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \]
where d_i = x_i − y_i; i = 1, 2, 3, …, n.
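The rank formula can be sketched in Python as follows. The function name rank_correlation and the five-person rank lists are assumptions made for illustration; the check that both lists are permutations of 1, …, n reflects the no-ties assumption stated above.

```python
def rank_correlation(rank_x, rank_y):
    """Rank correlation coefficient from two lists of untied ranks 1..n."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length with at least 2 entries")

    # The formula assumes no ties: each list must be a permutation of 1..n.
    expected = list(range(1, n + 1))
    if sorted(rank_x) != expected or sorted(rank_y) != expected:
        raise ValueError("ranks must be the integers 1..n with no ties")

    # d_i = x_i - y_i, squared and summed over all individuals.
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks of five individuals in characteristics A and B.
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]

r_s = rank_correlation(rank_a, rank_b)
print("rank correlation =", r_s)
```

Identical rankings give a coefficient of 1 and exactly reversed rankings give -1, which is a quick sanity check on any implementation of the formula.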
Example: The ranks of the same 10 students in Mathematics and Physics are as follows. The two
numbers within each bracket denote the ranks of a student in Mathematics and Physics
respectively.
(1, 10) (2, 9) (3, 4) (5, 3) (4, 6)
(7, 2) (8, 1) (6, 8) (10, 9) (9, 7)
Calculate the rank correlation coefficient for the proficiencies of this group in Mathematics
and Physics.
Solution:
Rank in Maths (X)   Rank in Physics (Y)   d = x − y   d²
1                   10                    -9          81
2                   9                     -7          49
3                   4                     -1          1
5                   3                     2           4
4                   6                     -2          4
7                   2                     5           25
8                   1                     7           49
6                   8                     -2          4
10                  9                     1           1
9                   7                     2           4
                                          ∑d² =       222
Thus, the rank correlation coefficient is calculated as
\[ r(X, Y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \]
Using the table, we get n = 10 and ∑ d_i² = 222, so
\[ r(X, Y) = 1 - \frac{6 \times 222}{10 \times (10^2 - 1)} = 1 - \frac{1332}{10 \times 99} = 1 - \frac{1332}{990} \approx 1 - 1.35 = -0.35 \]
Hence, the rank correlation coefficient is r(X, Y) ≈ −0.35, which indicates a negative
(inverse) relationship between the ranks in Mathematics and Physics.
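The arithmetic of this example can be rechecked with a short Python sketch. The pairs are copied exactly as listed in the example; note that Physics rank 9 appears twice and rank 5 does not appear in the listed data, so the no-ties assumption is not strictly met and the formula is applied to the data as given. The variable names are illustrative.

```python
# (Mathematics rank, Physics rank) pairs exactly as listed in the example.
pairs = [(1, 10), (2, 9), (3, 4), (5, 3), (4, 6),
         (7, 2), (8, 1), (6, 8), (10, 9), (9, 7)]

n = len(pairs)

# Sum of squared rank differences d_i = x_i - y_i.
sum_d2 = sum((x - y) ** 2 for x, y in pairs)

# Denominator n(n^2 - 1) and the rank correlation coefficient.
denominator = n * (n ** 2 - 1)
r = 1 - 6 * sum_d2 / denominator

print("n =", n)
print("sum of d^2 =", sum_d2)
print("n(n^2 - 1) =", denominator)
print("r =", round(r, 2))
```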
Uses of correlation coefficient: The correlation coefficient measures the strength and
direction of the linear relationship between two variables. Here are some uses:
Statistical Analysis
1. Hypothesis Testing: Test the significance of a correlation.
2. Regression Analysis: Identify predictor variables.
3. Inference: Make predictions based on correlations.
Real-World Applications
1. Finance: Analyse relationships between stock prices and trading volumes.
2. Medicine: Study relationships between diseases and treatment outcomes.
3. Social Sciences: Examine relationships between social and economic factors.
Normal Probability
The normal distribution, also known as the Gaussian distribution, is a continuous probability
distribution that is symmetric about its mean. It is widely used in statistics, mathematics,
and science to model real-valued random variables.
Some key characteristics:
1. Symmetry: The normal distribution is symmetric about its mean.
2. Bell-shaped: The normal distribution has a bell-shaped curve.
3. Mean (𝜇): The average value of the distribution.
4. Standard Deviation (𝜎): It measures the spread or dispersion of the distribution.
5. Total Area: The total area under the curve is 1.
Normal Distribution formula
The probability density function of a normal distribution is given by
\[ f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} \]
Where 𝑥 is the random variable, 𝜇 is the mean, 𝜎 is the standard deviation, and 𝑒 is the base
of the natural logarithm.
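The density formula and two of the listed characteristics (symmetry about the mean and total area 1) can be checked numerically. The Python sketch below is illustrative only: the function name normal_pdf, the parameters μ = 50 and σ = 10, and the choice of a trapezoidal rule over μ ± 8σ are all assumptions made for the demonstration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma ** 2)


# Hypothetical parameters chosen for illustration.
mu, sigma = 50.0, 10.0

# Symmetry: the density is the same at equal distances either side of the mean.
left = normal_pdf(mu - 7.5, mu, sigma)
right = normal_pdf(mu + 7.5, mu, sigma)

# Total area: trapezoidal rule over mu +/- 8 sigma (the tails beyond are negligible).
steps = 16000
a, b = mu - 8 * sigma, mu + 8 * sigma
h = (b - a) / steps
area = h * (
    0.5 * normal_pdf(a, mu, sigma)
    + sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    + 0.5 * normal_pdf(b, mu, sigma)
)

print("f(mu - 7.5) =", left)
print("f(mu + 7.5) =", right)
print("approximate total area =", area)
```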
Types of Normal Distributions
1. Standard Normal Distribution: A normal distribution with a mean of 0 and a standard
deviation of 1 is called standard normal distribution.
2. Non-standard Normal Distribution: A normal distribution whose mean is not 0 or whose
standard deviation is not 1 is called a non-standard normal distribution.
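The two types are linked by standardisation: if X is normal with mean μ and standard deviation σ, then z = (x − μ)/σ follows the standard normal distribution. This step is not spelled out in the notes above, so the following Python sketch is offered as a supplementary illustration; it uses statistics.NormalDist from the standard library, and the parameters μ = 50, σ = 10 and the point x = 65 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical non-standard normal distribution and a point of interest.
mu, sigma = 50.0, 10.0
x = 65.0

# Standardised value: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# P(X <= x) computed directly, and via the standard normal at z.
p_nonstandard = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print("z =", z)
print("P(X <= x) from N(mu, sigma):", p_nonstandard)
print("P(Z <= z) from N(0, 1):     ", p_standard)
```

The two probabilities agree up to floating-point rounding, which is why tables of the standard normal distribution suffice for any normal distribution.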
Real-World Applications
1. Finance: Modelling stock prices and returns.
2. Medicine: Modelling the distribution of blood pressure and other health metrics.
3. Engineering: Modelling measurement errors and tolerances.