Lecture 5

The document discusses bivariate distribution, which involves analyzing the relationship between two variables, X and Y, using methods like scatter diagrams and correlation coefficients. It explains how to calculate the correlation coefficient to measure the strength and direction of the linear relationship between the variables, as well as introducing rank correlation coefficients for ranked data. Additionally, it covers normal probability distributions, their characteristics, formulas, and real-world applications.

Correlation

Bivariate Distribution: A bivariate distribution is a distribution of two variables, typically
denoted X and Y, which are measured or observed together. The objective of analysing a bivariate
distribution is to understand the relationship between these two variables. For example, if we
measure the heights and weights of a certain group of persons, we get a bivariate distribution,
with one variable relating to height and the other to weight.

Scatter Diagram: This is the simplest diagrammatic representation of bivariate data.
For the bivariate distribution $(x_i, y_i)$; $i = 1, 2, 3, \dots, n$, if the values of the
variables X and Y are plotted along the x-axis and y-axis respectively in the $xy$-plane, the
diagram of dots so obtained is known as a scatter diagram.
[Figure: scatter plot example]

Coefficient of Correlation: As a measure of the intensity or degree of linear relationship between
two variables, Karl Pearson (1857-1936), a British biometrician, developed a formula called the
correlation coefficient.
The correlation coefficient between two random variables X and Y, usually denoted by
$r(X, Y)$ or simply $r_{XY}$, is a numerical measure of the linear relationship between them and
is defined as
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
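If $(x_i, y_i)$; $i = 1, 2, 3, \dots, n$, is the bivariate distribution, then
$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
$$\sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\sigma_Y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2}$$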

where $r(X, Y)$ = correlation coefficient between the variables X and Y,
$\mathrm{Cov}(X, Y)$ = covariance between the variables X and Y,
$\sigma_X$ = standard deviation of variable X,
and $\sigma_Y$ = standard deviation of variable Y.
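Because the factor $1/n$ cancels between the numerator and the denominator, the correlation
coefficient between X and Y can be written as
$$r(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Here, $\bar{x}$ and $\bar{y}$ are the means of the X and Y data respectively.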
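The two forms of the formula can be compared numerically. The following is a minimal sketch in Python, not part of the original lecture; the function names `pearson_r` and `pearson_r_sums` and the height/weight values are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient as Cov(X, Y) / (sigma_X * sigma_Y), with 1/n divisors."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / (sigma_x * sigma_y)

def pearson_r_sums(xs, ys):
    """Same coefficient from the sum form, where the 1/n factors have cancelled."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 165, 172, 180]
weights = [52, 58, 63, 70, 77]

r_definition = pearson_r(heights, weights)
r_sum_form = pearson_r_sums(heights, weights)
print(r_definition, r_sum_form)
```

Both functions should agree up to floating-point rounding, which illustrates that the sum form is only an algebraic rearrangement of the definition.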

Example: Calculate the correlation coefficient for the following bivariate data.
X    3    4    5    6    7
Y   12   14   16   18   20

Solution: Construct a table of the quantities required to calculate the correlation coefficient.
X     Y     x - x̄    (x - x̄)²    y - ȳ    (y - ȳ)²    (x - x̄)(y - ȳ)
3     12    -2       4           -4       16          8
4     14    -1       1           -2       4           2
5     16     0       0            0       0           0
6     18     1       1            2       4           2
7     20     2       4            4       16          8
Sum                  10                   40          20
Here x̄ = 5 and ȳ = 16.

From the above table, we get
$$\sum (x_i - \bar{x})^2 = 10, \qquad \sum (y_i - \bar{y})^2 = 40, \qquad \sum (x_i - \bar{x})(y_i - \bar{y}) = 20$$
Then the correlation coefficient between X and Y is calculated as
$$r(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{20}{\sqrt{10}\,\sqrt{40}} = \frac{20}{\sqrt{400}} = \frac{20}{20} = 1$$
Hence, the correlation coefficient between X and Y is $r(X, Y) = 1$. This is the expected result,
because the data satisfy $Y = 2X + 6$ exactly, which is a perfect positive linear relationship.
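The column sums of this example can be recomputed directly from the data. The sketch below is an illustrative check in Python, not part of the original lecture; the variable names are arbitrary.

```python
import math

xs = [3, 4, 5, 6, 7]
ys = [12, 14, 16, 18, 20]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Deviations from the means, one entry per table row.
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]

sxx = sum(d * d for d in dx)               # sum of (x - x_bar)^2
syy = sum(d * d for d in dy)               # sum of (y - y_bar)^2
sxy = sum(a * b for a, b in zip(dx, dy))   # sum of (x - x_bar)(y - y_bar)

r = sxy / math.sqrt(sxx * syy)

print(x_bar, y_bar)   # 5.0 16.0
print(sxx, syy, sxy)  # 10.0 40.0 20.0
print(r)              # 1.0
```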
Range of correlation coefficient: The correlation coefficient $r(X, Y)$ lies between -1
and 1. Mathematically, $-1 \le r(X, Y) \le 1$.
Interpretation
1. Strength: The magnitude satisfies $0 \le |r(X, Y)| \le 1$; values near 1 indicate a strong linear relationship and values near 0 a weak one.
2. Direction: A positive value indicates a direct relationship, a negative value an inverse relationship.
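The range and the sign convention can be illustrated on small data sets. This is a sketch in Python with hypothetical data chosen for illustration (an exactly increasing line, an exactly decreasing line, and a symmetric pattern with no linear trend); the helper `pearson_r` is an assumed name, not something defined in the lecture.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient in its sum form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
increasing = [2 * x + 1 for x in xs]    # direct relationship, r close to +1
decreasing = [10 - 3 * x for x in xs]   # inverse relationship, r close to -1
no_trend = [3, 1, 5, 1, 3]              # symmetric about x = 3, r close to 0

r_up = pearson_r(xs, increasing)
r_down = pearson_r(xs, decreasing)
r_flat = pearson_r(xs, no_trend)

for label, r in [("increasing", r_up), ("decreasing", r_down), ("no trend", r_flat)]:
    direction = "direct" if r > 1e-9 else "inverse" if r < -1e-9 else "none"
    print(f"{label}: r = {r:.3f}, strength |r| = {abs(r):.3f}, direction = {direction}")
```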

Rank Correlation Coefficient: Suppose that a group of n individuals is arranged in
order of merit or proficiency in the possession of two characteristics A and B. The ranks in the
two characteristics will, in general, be different. For example, if we consider the relation
between intelligence and beauty, a beautiful individual is not necessarily intelligent as well.
Let $(x_i, y_i)$; $i = 1, 2, 3, \dots, n$ be the ranks of the $i$-th individual in the two
characteristics A and B respectively. Then the correlation coefficient between the ranks $x_i$
and $y_i$ is called the rank correlation coefficient between A and B for the group of individuals.
Assuming that no two individuals are bracketed equal (that is, given the same rank) in either
classification, each of the variables X and Y takes the values 1, 2, 3, …, n.
Then the rank correlation coefficient is defined as
$$r(X, Y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$
where $d_i = x_i - y_i$; $i = 1, 2, 3, \dots, n$.
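The rank formula translates into a few lines of code. The following Python sketch is illustrative only; the function name `rank_correlation` and the sample rankings are hypothetical, and the function assumes, as the text does, that there are no tied ranks.

```python
def rank_correlation(rank_x, rank_y):
    """Rank correlation 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of equal length, n >= 2")
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of five individuals.
ranks_a = [1, 2, 3, 4, 5]
same_order = [1, 2, 3, 4, 5]      # identical ranking
reverse_order = [5, 4, 3, 2, 1]   # exactly opposite ranking
nearly_same = [2, 1, 4, 3, 5]     # neighbouring pairs swapped

r_same = rank_correlation(ranks_a, same_order)
r_reverse = rank_correlation(ranks_a, reverse_order)
r_near = rank_correlation(ranks_a, nearly_same)
print(r_same, r_reverse, r_near)
```

Identical rankings give the largest possible value and exactly opposite rankings the smallest, so the rank coefficient has the same range as the ordinary correlation coefficient.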

Example: The ranks of the same 10 students in Mathematics and Physics are as follows. The two
numbers within each bracket denote the ranks of a student in Mathematics and Physics respectively.
(1, 10) (2, 9) (3, 4) (5, 3) (4, 6)
(7, 2) (8, 1) (6, 8) (10, 9) (9, 7)
Calculate the rank correlation coefficient for the proficiencies of this group in Mathematics and
Physics.
Solution:
Rank in Maths (X)    Rank in Physics (Y)    d = x - y    d²
1                    10                     -9           81
2                    9                      -7           49
3                    4                      -1           1
5                    3                       2           4
4                    6                      -2           4
7                    2                       5           25
8                    1                       7           49
6                    8                      -2           4
10                   9                       1           1
9                    7                       2           4
                                            ∑ d² = 222

Thus, the rank correlation coefficient is calculated as
$$r(X, Y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$
Using the table, we get $n = 10$ and $\sum_{i=1}^{n} d_i^2 = 222$, so
$$r(X, Y) = 1 - \frac{6 \times 222}{10 \times (10^2 - 1)} = 1 - \frac{1332}{10 \times 99} = 1 - \frac{1332}{990} \approx 1 - 1.35 = -0.35$$
Hence, the rank correlation coefficient is $r(X, Y) \approx -0.35$, which indicates an inverse
relationship between the two rankings.
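The arithmetic of this example can be checked with a short script. The Python sketch below is an illustration, not part of the original lecture. It applies the formula to the ranks exactly as printed in the example; note that rank 9 occurs twice in the Physics column as printed, whereas the formula assumes no tied ranks.

```python
# (Mathematics rank, Physics rank) pairs, copied as printed in the example.
pairs = [(1, 10), (2, 9), (3, 4), (5, 3), (4, 6),
         (7, 2), (8, 1), (6, 8), (10, 9), (9, 7)]

n = len(pairs)
d = [x - y for x, y in pairs]            # rank differences d = x - y
d_squared_sum = sum(di ** 2 for di in d)

denominator = n * (n ** 2 - 1)           # 10 * 99 = 990
r = 1 - 6 * d_squared_sum / denominator

print(d_squared_sum)   # 222
print(denominator)     # 990
print(round(r, 2))     # -0.35
```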

Uses of correlation coefficient: The correlation coefficient measures the strength and direction
of the linear relationship between two variables. Some of its uses are:
Statistical Analysis
1. Hypothesis Testing: Test the significance of a correlation.
2. Regression Analysis: Identify predictor variables.
3. Inference: Make predictions based on correlations.
Real-World Applications
1. Finance: Analyse stock prices and trading volumes.
2. Medicine: Study disease relationships and treatment outcomes.
3. Social Sciences: Examine social and economic factors.
Normal Probability
The normal distribution, also known as the Gaussian distribution, is a continuous probability
distribution that is symmetric about the mean. It is widely used in statistics, mathematics, and
science to model real-valued random variables.
Some key characteristics:
1. Symmetry: The normal distribution is symmetric about its mean.
2. Bell-shaped: The normal distribution has a bell-shaped curve.
3. Mean (𝜇): The average value of the distribution.
4. Standard Deviation (𝜎): It measures the spread or dispersion of the distribution.
5. Total Area: The total area under the curve is 1.

Normal Distribution formula
The probability density function of a normal distribution is given by
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
where $x$ is a value of the random variable, $\mu$ is the mean, $\sigma$ is the standard
deviation, and $e$ is the base of the natural logarithm.
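The density formula and two of the listed characteristics (symmetry about the mean and total area 1) can be checked numerically. The Python sketch below is illustrative only; the function names, the values mu = 100 and sigma = 15, and the integration settings are assumptions made for the example, and the area is approximated with a simple trapezoidal rule over mu ± 8 sigma.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) = exp(-0.5 * ((x - mu) / sigma)^2) / sqrt(2 * pi * sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma ** 2)

def area_under_pdf(mu, sigma, width=8.0, steps=10_000):
    """Trapezoidal approximation of the area over [mu - width*sigma, mu + width*sigma]."""
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for k in range(1, steps):
        total += normal_pdf(a + k * h, mu, sigma)
    return total * h

mu, sigma = 100.0, 15.0   # hypothetical parameters

peak_standard = normal_pdf(0.0)           # height of the standard normal curve at its mean
left = normal_pdf(mu - 20.0, mu, sigma)   # 20 units below the mean
right = normal_pdf(mu + 20.0, mu, sigma)  # 20 units above the mean
area = area_under_pdf(mu, sigma)

print(peak_standard, left, right, area)
```

The two values on either side of the mean should match, and the approximate area should be very close to 1.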
Types of Normal Distributions
1. Standard Normal Distribution: A normal distribution with a mean of 0 and a standard
deviation of 1 is called the standard normal distribution.
2. Non-standard Normal Distribution: A normal distribution whose mean is not 0 or whose
standard deviation is not 1 is called a non-standard normal distribution.
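The two types are linked by standardization: if X is normal with mean μ and standard deviation σ, then z = (x − μ)/σ follows the standard normal distribution. This step is not stated in the lecture and is added here as an illustration. The sketch uses `NormalDist` from Python's standard `statistics` module; the parameters 100 and 15 and the value 115 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0            # hypothetical non-standard normal distribution
x = 115.0                          # hypothetical observed value

non_standard = NormalDist(mu, sigma)
standard = NormalDist()            # defaults to mean 0 and standard deviation 1

z = (x - mu) / sigma               # standardized value, here 1.0

p_non_standard = non_standard.cdf(x)   # P(X <= 115) under N(100, 15^2)
p_standard = standard.cdf(z)           # P(Z <= 1) under N(0, 1)

print(z, p_non_standard, p_standard)
```

The two probabilities should agree up to rounding, which is why tables of the standard normal distribution can be used for any normal distribution.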
Real-World Applications
1. Finance: Modelling stock prices and returns.
2. Medicine: Modelling the distribution of blood pressure and other health metrics.
3. Engineering: Modelling measurement errors and tolerances.
