CHAPTER TWO
Correlation Theory
2.1. Basic concepts of Correlation
Correlation Analysis
Economic variables tend to move together:
a change in one variable is, on average,
accompanied by a change in another variable.
This situation is known as correlation.
Correlation may be defined as the degree of relationship existing
between two or more variables.
The degree of relationship existing between two variables is called
simple correlation.
The degree of relationship connecting three or more variables is
called multiple correlation.
Cont’d…
A correlation is also said to be partial if it studies the degree of
relationship between two variables while keeping all other variables
constant.
Correlation may be linear,
when all points (X, Y) on a scatter diagram seem to cluster near a straight line,
or nonlinear, when all points seem to lie near a curve.
In other words, correlation is said to be linear if a change in one
variable brings a constant change in the other.
It is non-linear if a change in one variable brings a varying
change in the other.
Cont’d…
Correlation may also be positive or negative.
If it is positive, an increase or a decrease in one variable is
accompanied by an increase or a decrease in the other,
i.e. the two variables change in the same direction.
E.g., a supply function: Qs = a + bP
If it is negative, an increase or a decrease in one variable is
accompanied by a decrease or an increase in the other,
i.e. the two variables change in opposite directions.
E.g., a demand function: Qd = a - bP
Methods of Measuring Correlation
There are three methods of measuring correlation.
These are:
The Scatter Diagram or Graphic Method
The Simple Linear Correlation coefficient
The coefficient of Rank Correlation
A. The Scatter Diagram or Graphic Method
It plots each pair of observations (X, Y) as a point on the X-Y plane,
from the smallest values of the data set to the largest.
Cont’d…
The closer the data points come together and make a straight line,
the higher the correlation between the two variables, or the stronger
the relationship.
A perfect positive correlation is given the value of 1.
A perfect negative correlation is given the value of -1.
Cont’d…
The closer r is to +1, the stronger the positive correlation.
The closer r is to -1, the stronger the negative correlation. Examples
of strong and weak correlations are shown below.
2.2 Correlation coefficient and types of Correlation coefficient
The Population Correlation Coefficient 'ρ' and its Sample Estimate 'r'
For a precise quantitative measurement of the degree of
correlation between Y and X we use a parameter which is
called the correlation coefficient and is usually designated by
the Greek letter ρ (rho).
With the variables whose correlation it measures as subscripts,
ρ refers to the correlation of all the values of the
population of X and Y.
Its estimate from any particular sample (the sample statistic
for correlation) is denoted by r with the relevant subscripts.
Cont’d…
For example, if we measure the correlation between X and Y,
the population correlation coefficient is represented by ρXY and
its sample estimate by rXY.
The simple correlation coefficient is used to measure relationships
which are simple and linear only.
It cannot be used to measure non-linear relationships or multiple
correlation.
Sample correlation coefficient is defined by the formula:

r = Σxiyi / √(Σxi² · Σyi²)

or, in terms of the raw observations,

r = (nΣXiYi - ΣXiΣYi) / √{[nΣXi² - (ΣXi)²][nΣYi² - (ΣYi)²]}

Where, xi = Xi - X̄ and yi = Yi - Ȳ are deviations from the sample means.
Cont’d…
Example 2.1: The following table shows the quantity supplied for a
commodity with the corresponding price values. Determine the type
of correlation that exists between these two variables.
Table 1: Data for computation of correlation coefficient
Time period (in days) | Quantity supplied Yi (in tons) | Price Xi (in shillings)
1 10 2
2 20 4
3 50 6
4 40 8
5 50 10
6 60 12
7 80 14
8 90 16
9 90 18
10 120 20
Solution
Table 2: Computations of inputs for correlation coefficients
Y     X    xi=Xi-X̄  yi=Yi-Ȳ   xi²    yi²    xiyi    XY     X²     Y²
10    2      -9      -51      81    2601    459     20      4    100
20    4      -7      -41      49    1681    287     80     16    400
50    6      -5      -11      25     121     55    300     36   2500
40    8      -3      -21       9     441     63    320     64   1600
50   10      -1      -11       1     121     11    500    100   2500
60   12       1       -1       1       1     -1    720    144   3600
80   14       3       19       9     361     57   1120    196   6400
90   16       5       29      25     841    145   1440    256   8100
90   18       7       29      49     841    203   1620    324   8100
120  20       9       59      81    3481    531   2400    400  14400
Sum=610  110   0        0     330   10490   1810   8520   1540  47700
Mean=61   11

r = Σxiyi / √(Σxi² · Σyi²) = 1810 / √(330 × 10490) ≈ 0.973

This result shows that there is a strong positive correlation between
the quantity supplied and the price of the commodity under
consideration.
• Check this result by using the raw-sums formula.
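The computation above can be reproduced in plain Python. The snippet below is a minimal sketch (variable names are ours; only the standard library is used) that evaluates r with both the deviation form and the raw-sums form, which should agree:

```python
# Pearson correlation for the data in Example 2.1 (hypothetical
# variable names; a sketch using only the standard library).
from math import sqrt

Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]  # quantity supplied (tons)
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]       # price (shillings)
n = len(X)

# Deviation form: r = sum(x*y) / sqrt(sum(x^2) * sum(y^2))
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
sxx = sum((xi - x_bar) ** 2 for xi in X)
syy = sum((yi - y_bar) ** 2 for yi in Y)
r_dev = sxy / sqrt(sxx * syy)

# Raw-sums form: r = (n*SXY - SX*SY) / sqrt((n*SX2 - SX^2)(n*SY2 - SY^2))
sx, sy = sum(X), sum(Y)
sxy_raw = sum(xi * yi for xi, yi in zip(X, Y))
sx2 = sum(xi ** 2 for xi in X)
sy2 = sum(yi ** 2 for yi in Y)
r_raw = (n * sxy_raw - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(r_dev, 4), round(r_raw, 4))  # both forms print 0.9728
```

Both forms reproduce the strong positive correlation found in the table.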
Cont’d…
The simple correlation coefficient has the value always ranging
between -1 and +1.
i.e. the value of correlation coefficient cannot be < -1 and cannot be > +1.
Its minimum value is -1 and its maximum value is +1.
If r > 0, there is positive correlation between the two variables, and
movement from zero towards +1 increases the degree of positive
correlation.
If the correlation coefficient is zero, it indicates that there is no
linear relationship between the two variables.
If the two variables are independent, the value of correlation
coefficient is zero
but zero correlation coefficient does not show us that the two
variables are independent.
Cont’d…
Interpretation of Pearson’s Correlation Coefficient
The sign of the correlation coefficient determines whether the correlation
is positive or negative.
The magnitude of the correlation coefficient determines the strength of
the correlation.
Correlation is an effect size, and the absolute value of r can be
interpreted as:
0.00-0.19 "very weak"
0.20-0.39 "weak"
0.40-0.59 "moderate"
0.60-0.79 "strong"
0.80-1.00 "very strong"
For example a correlation value of r=0.42 would be a “moderate positive
correlation”.
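The verbal scale above can be expressed as a small helper function (a sketch; the function name and the treatment of the boundaries as half-open intervals are our choices):

```python
def correlation_strength(r):
    """Map |r| to the verbal scale used above (boundaries are conventional)."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"

print(correlation_strength(0.42))  # "moderate", matching the example above
```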
Properties of Simple Correlation Coefficient
The simple correlation coefficient has the following important properties:
1. The value of correlation coefficient always ranges between -1 and +1.
2. The correlation coefficient is symmetric,
i.e. rXY = rYX, where rXY is the correlation coefficient of X on Y and
rYX is the correlation coefficient of Y on X.
3. The correlation coefficient is independent of change of origin and change
of scale.
4. If X and Y variables are independent, the correlation coefficient is zero.
But the converse is not true.
5. The correlation coefficient has the same sign with that of regression
coefficients.
6. The correlation coefficient is the geometric mean of the two
regression coefficients, i.e. r = ±√(bYX · bXY).
Cont’d…
The major limitations of correlation coefficients are;
1. The correlation coefficient always assumes a linear relationship,
whether or not that assumption is true.
2. Great care must be exercised in interpreting the value of this
coefficient as very often the coefficient is misinterpreted.
E.g., a high correlation between lung cancer and smoking does not by
itself show that smoking causes lung cancer.
3. The value of the coefficient is unduly (overly) affected by the
extreme values
4. The coefficient requires the quantitative measurement of both
variables.
Definition of Covariance
It is a systematic relationship between a pair of random variables whereby a change
in one variable is accompanied by a corresponding change in the other variable.
Covariance : Measures how changes in one variable are associated
with changes in a second variable.
Variance: Measures how spread out the values are in a given dataset.
Covariance can take any value between -∞ to +∞,
wherein a negative value indicates a negative relationship and a
positive value indicates a positive relationship.
Further, it captures only the linear relationship between variables.
Therefore, when the value is zero, it indicates no linear relationship.
In addition, when all the observations of either variable are the
same, the covariance will be zero.
Cont’d…
With covariance, when we change the unit of measurement of either or
both of the two variables,
the strength of the relationship between the two variables does not
change, but the value of the covariance does.
For two random variables X, Y their covariance is defined by
cov(X, Y) = E[(X - EX)(Y - EY)].
Alternative expression: cov(X, Y) = E(XY) - (EX)(EY).
Proof: cov(X, Y) = E[XY - X·EY - Y·EX + (EX)(EY)]
= E(XY) - (EX)(EY) - (EX)(EY) + (EX)(EY) = E(XY) - (EX)(EY)
(EX and EY are constants; use linearity of the expectation).
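The identity cov(X, Y) = E(XY) - (EX)(EY) can be checked numerically for a small sample treated as the whole population (a sketch in plain Python; the data values are made up for illustration):

```python
# Numerical check of cov(X, Y) = E[XY] - E[X]E[Y] on a small sample,
# treating the sample as the population (divide by n).
def mean(v):
    return sum(v) / len(v)

def cov(X, Y):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(X), mean(Y)
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / len(X)

X = [2.0, 4.0, 6.0, 8.0]
Y = [1.0, 3.0, 2.0, 5.0]
lhs = cov(X, Y)
rhs = mean([x * y for x, y in zip(X, Y)]) - mean(X) * mean(Y)
print(lhs, rhs)  # the two sides agree
```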
Cont’d…
Random variables X, Y are uncorrelated if cov(X, Y) = 0.
Uncorrelatedness means there is no linear statistical relationship
between the variables; it is weaker than independence:
independent variables are always uncorrelated, but uncorrelated
variables need not be independent.
Key differences between covariance and correlation
A measure used to indicate the extent to which two random
variables change in tandem is known as covariance.
A measure used to represent how strongly two random
variables are related is known as correlation.
Covariance is an absolute measure of co-movement;
correlation is the scaled (standardized) form of covariance.
Cont’d…
The value of correlation lies between -1 and +1, while the value of
covariance lies between -∞ and +∞.
Covariance is affected by a change in scale, i.e. if all the values of
one variable are multiplied by a constant, the covariance changes.
Correlation, by contrast, is not influenced by a change of scale.
Correlation is dimensionless, i.e. a unit-free measure of the
relationship between variables,
unlike covariance, whose value depends on the units of measurement.
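These differences can be demonstrated in plain Python (a sketch; the data and the scaling factor are made up for illustration). Multiplying one variable by a constant rescales the covariance by that constant but leaves the correlation unchanged:

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(X, Y):
    """Population covariance (divide by n)."""
    mx, my = mean(X), mean(Y)
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / len(X)

def corr(X, Y):
    """Pearson correlation: covariance scaled by the standard deviations."""
    return cov(X, Y) / sqrt(cov(X, X) * cov(Y, Y))

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 1.0, 4.0, 3.0, 5.0]
X_scaled = [100 * x for x in X]  # e.g. metres -> centimetres

print(cov(X, Y), cov(X_scaled, Y))    # covariance changes by the factor 100
print(corr(X, Y), corr(X_scaled, Y))  # correlation is unchanged
```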
The Rank Correlation Coefficient
In many cases the variables may be qualitative (categorical)
and hence cannot be measured numerically.
For example, profession, education, and preferences for particular
brands are such categorical variables,
so it is impossible to calculate the value of the correlation
coefficient with the formulae developed in the preceding section.
For such cases it is possible to use another statistic, the rank
correlation coefficient (or Spearman's correlation coefficient).
We rank the observations in a specific sequence for example in order
of size, importance, etc., using the numbers 1, 2, 3… n.
Cont’d…
If two variables X and Y are ranked in such a way that the values are
ranked in ascending or descending order, the rank correlation
coefficient may be computed by the formula

rs = 1 - (6ΣD²) / (n(n² - 1))

Where,
D = difference between ranks of corresponding pairs of X and Y
n = number of observations.
The values that r may assume range from + 1 to – 1.
Example, A market researcher asks experts to express their preference for twelve
different brands of soap. Their replies are shown in the following table.
Cont’d…
Table 3: Example for rank correlation coefficient
Brands of soap A B C D E F G H I J K L
Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9
Table 4: Computation for rank correlation coefficient
Brands of soap A B C D E F G H I J K L Total
Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9
Di 2 2 1 0 -2 -1 1 -4 0 3 1 -3
Di2 4 4 1 0 4 1 1 16 0 9 1 9 50
The rank correlation coefficient is
rs = 1 - (6 × 50) / (12(12² - 1)) = 1 - 300/1716 ≈ 0.825
This figure, 0.825, shows a marked similarity of the preferences of
the two persons for the various brands of soap.
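The calculation can be reproduced in plain Python (a sketch; the function name is ours, and the inputs are assumed to be rankings with no ties):

```python
def spearman(rank1, rank2):
    """Spearman's rank correlation: r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(rank1)
    d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Rankings of the twelve soap brands from Table 3
person1 = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
person2 = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]

print(round(spearman(person1, person2), 3))  # prints 0.825
```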
Partial Correlation Coefficients
A partial correlation coefficient measures the relationship between any
two variables, when all other variables connected with those two are
kept constant.
For example, let us assume that we want to measure the correlation
between the number of hot drinks (X1) consumed in a summer resort
and the number of tourists (X2) coming to that resort.
It is obvious that both these variables are strongly influenced by
weather conditions, which we may designate by X3.
Cont’d…
On a priori grounds we expect X1 and X2 to be positively correlated:
when a large number of tourists arrive in the summer resort, one should expect a
high consumption of hot drinks and vice versa.
The computation of the simple correlation coefficient between X1 and
X2 may not reveal the true relationship connecting these two variables,
however, because of the influence of the third variable, weather
conditions (X3).
In other words, the above positive relationship between number of
tourists and number of hot drinks consumed is expected to hold if
weather conditions can be assumed constant.
Cont’d…
If weather condition changes, the relationship between X1 and X2 may
change to such an extent as to appear even negative.
Thus, if the weather is hot, the number of tourists will be large, but
because of the heat they will prefer to consume more cold drinks and
ice-cream rather than hot drinks.
If we overlook the weather and look only at X1 and X2 we will observe a
negative correlation between these two variables which is explained
by the fact that hot drinks as well as number of visitors are affected
by heat.
Cont’d…
In order to measure the true correlation between X1 and X2, we must
find some way of accounting for changes in X3.
This is achieved with the partial correlation coefficient between X1
and X2, when X3 is kept constant.
In our example there are three simple correlation coefficients
– r12 = correlation coefficient between X1 and X2
– r13 = correlation coefficient between X1 and X3
– r23 = correlation coefficient between X2 and X3
Cont’d…
The partial correlation coefficient between X1 and X2, keeping the
effect of X3 constant, is given by:

r12.3 = (r12 - r13 r23) / √[(1 - r13²)(1 - r23²)]

Similarly, the partial correlation between X1 and X3, keeping the effect
of X2 constant, is given by:

r13.2 = (r13 - r12 r23) / √[(1 - r12²)(1 - r23²)]

and

r23.1 = (r23 - r12 r13) / √[(1 - r12²)(1 - r13²)]
Example: The following table gives data on the yield of corn per acre (Y),
the amount of fertilizer used (X1) and the amount of insecticide used (X2).
Cont’d…
Then Compute the partial correlation coefficient between the yield of
corn and the fertilizer used keeping the effect of insecticide constant.
Table 5: Data on yield of corn, fertilizer and insecticides used
Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
Y 40 44 46 48 52 58 60 68 74 80
X1 6 10 12 14 16 18 22 24 26 32
X2 4 4 5 7 9 12 14 20 21 24
Solution: the simple correlation coefficients are
ryx1 = 0.9854, ryx2 = 0.9917 and rx1x2 = 0.9725.
Then the partial correlation between yield and fertilizer, keeping the
effect of insecticide constant, is

ryx1.x2 = (ryx1 - ryx2 · rx1x2) / √[(1 - ryx2²)(1 - rx1x2²)]
        = (0.9854 - (0.9917)(0.9725)) / √[(1 - 0.9917²)(1 - 0.9725²)]
        ≈ 0.7003

Home work: Compute the partial correlation coefficient between the
yield of corn and the insecticide used, keeping the effect of
fertilizer constant.
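The partial-correlation formula can be sketched in plain Python (the function and variable names are ours), using the three simple correlation coefficients from the corn example:

```python
from math import sqrt

def partial_corr(r12, r13, r23):
    """Partial correlation r12.3 of variables 1 and 2, holding variable 3 constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Simple correlation coefficients from the corn example
r_yx1, r_yx2, r_x1x2 = 0.9854, 0.9917, 0.9725

# Yield vs fertilizer, keeping insecticide constant
r_yx1_x2 = partial_corr(r_yx1, r_yx2, r_x1x2)
print(round(r_yx1_x2, 4))
```

The same function answers the homework question: call `partial_corr(r_yx2, r_yx1, r_x1x2)` to hold fertilizer constant instead.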
End of Ch-2!
Thank you for your
attention!!!