Correlation Analysis
LEARNING OBJECTIVES
After studying this chapter, you should be able to
• express quantitatively the degree and direction of the covariation or association between two variables.
• determine the validity and reliability of the covariation or association between two variables.
• provide a test of hypothesis to determine whether a linear relationship actually exists between the variables.
13.1 INTRODUCTION
The statistical methods discussed so far are used to analyse data involving only one variable. Often an analysis of data concerning two or more quantitative variables is needed to look for any statistical relationship or association between them that can describe specific numerical features of the association. The knowledge of such a relationship is important to make inferences from the relationship between variables in a given situation. A few instances where the knowledge of an association or a relationship between two variables would be helpful to make decisions are as follows:
• Income and expenditure on luxury items.
• Yield of a crop and quantity of fertilizer used.
• Sales revenue and expenses incurred on advertising.
• Frequency of smoking and lung damage.
• Weight and height of individuals.
A statistical technique that is used to analyse the strength (magnitude) and direction of the relationship between two quantitative variables is called correlation analysis. A few definitions of correlation analysis are as follows:
• An analysis of the relationship of two or more variables is usually called correlation.
• When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. - Croxton and Cowden
The coefficient of correlation is a number that indicates the strength (magnitude) and direction of the statistical relationship between two variables.
• The strength of the relationship is determined by the closeness of the points to a straight line when pairs of values of the two variables are plotted on a graph. A straight line is used as the reference for evaluating the relationship.
• The direction is determined by whether one variable generally increases or decreases when the other variable increases.
The importance of examining the statistical relationship between two or more variables can be expressed through the following questions, and accordingly requires statistical methods to answer these questions:
(i) Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
(ii) Is the relationship strong or significant enough to be useful to arrive at a desirable conclusion?
(iii) Can the relationship be used for predictive purposes, that is, to predict the value of a dependent variable from the value of an independent variable?
The first two questions will be answered in this chapter, while the third question will be answered in the next chapter.
For correlation analysis, the data on values of two variables must come from sampling in pairs, one for each of the two variables.
13.2 SIGNIFICANCE OF MEASURING CORRELATION
The objective of any scientific research is to establish relationships between two or more sets of observations or variables to arrive at some valid conclusion. A few advantages of measuring an association (or correlation) between two or more variables are as under:
1. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective. - W. A. Neiswanger
2. The effect of correlation is to reduce the range of uncertainty of our prediction. The prediction based on correlation analysis will be more reliable and nearer to reality. - Tippett
3. In economic theory, an association (or correlation) between two or more variables is established, such as between price, supply and quantity demanded; between customer retention and convenience, amenities and service standards; and between the yield of a crop and the quantity of fertilizer applied, type of soil, quality of seeds, rainfall and so on.
4. In healthcare, an association (or correlation) between two or more variables is established, such as the validity and reliability of clinical measures; the effect on health of certain biological or environmental factors; blood pressure and age of a person; inter-observer reliability for two doctors who are assessing a patient's disease, and so on.
Coefficient of correlation: A statistical measure of the degree of association between two variables.
13.3 CORRELATION AND CAUSATION
Correlation is one of the three criteria for establishing a causal relationship between two or more variables. The correlation coefficient only measures the strength of a linear relationship; it does not necessarily imply a causal relationship. The following factors should be examined to interpret the nature and extent of relationship between two or more variables:
(i) Chance Coincidence: The inferences drawn from the value of correlation coefficient may not be of any statistical significance because variables might be entirely different and unrelated. Any association between them may be due to chance. For example, (i) a positive correlation between growth in population and wheat production in the country has no statistical significance; (ii) a correlation between sales revenue and expenditure on advertisements over a period of time should be statistically significant and not just due to biased sampling or sampling error.
(ii) Influence of Third Variable: Clinically, it has been proved that smoking causes lung damage. However, there are often multiple reasons, such as stress, food and air pollution, for health problems. Similarly, the yield of rice and that of another crop may be positively correlated because both crops are influenced by the amount of rainfall, but the yield of any one is not influenced by the other.
(iii) Mutual Influence: Although two variables might be highly correlated, it may not be clear as to which variable is influencing the other. For example, variables like price, supply, and demand of a commodity are mutually correlated. As the price of a commodity increases, its demand decreases, so price influences the demand level. But when demand of a commodity increases, its price also increases, so demand influences the price.
13.4 TYPES OF CORRELATIONS
There are three broad types of correlations:
(i) Positive and negative
(ii) Linear and non-linear
(iii) Simple, partial and multiple
In this chapter, we will discuss simple linear positive or negative correlation analysis.
13.4.1 Positive and Negative Correlations
The positive (or direct) correlation refers to an association between two variables where their values change (i.e., increasing or decreasing) in the same direction. The negative (or inverse) correlation refers to an association between two variables where their values change (i.e., increasing or decreasing) in the opposite direction.
Illustration
Positive Correlation
Increasing → x :  5   8  10  15  17
Increasing → y : 10  12  16  18  20
Decreasing → x : 17  15  10   8   5
Decreasing → y : 20  18  16  12  10
Negative Correlation
Increasing → x :  5   8  10  15  17
Decreasing → y : 20  18  16  12  10
Decreasing → x : 17  15  12  10   6
Increasing → y :  2   7   9  13  14
Remarks: The change (increasing or decreasing) in values of both the variables may not be proportional or fixed.
13.4.2 Linear and Non-linear Correlations
A linear correlation refers to an association between two variables where variation in their values is either proportional or fixed. The following pattern of variation in the values of two variables x and y reveals linear correlation.
[Table of illustrative x and y values illegible in the source.]
In a non-linear correlation, when pairs of values of x and y are plotted on a graph paper, the line joining these points would not be a straight line; rather, it would be curvilinear.
13.4.3 Simple, Partial and Multiple Correlations
The distinction between simple, partial and multiple correlations is based upon the number of variables involved in the correlation analysis.
If only two variables are chosen to study correlation between them, then such a correlation is referred to as a simple correlation. A study of the yield of a crop with respect to only the amount of fertilizer used, or of sales revenue with respect to the amount of money spent on advertisement, are examples of simple correlation.
In partial correlation, two variables are chosen to study the correlation between them, but the effect of other influencing variables is kept constant. For example, (i) yield of a crop is influenced by the amount of fertilizer applied, whereas the effect of other influencing variables such as rainfall, quality of seed, type of soil and pesticides is kept constant, and (ii) sales revenue from a product is influenced by the level of advertising expenditure, whereas the effect of other influencing variables such as quality of the product, price, competitors, distribution and so on is kept constant.
In multiple correlation, more than two variables are chosen to study the correlation among them. For example, (i) the employer-employee relationship in any organization may be examined with reference to training and development facilities; medical, housing and children's education facilities; salary structure; grievance handling system and so on, and (ii) sales revenue from a product may be examined in relation with the level of advertising expenditure, quality of the product, price, competitors, distribution and so on.
13.5 METHODS OF CORRELATION ANALYSIS
The correlation between two ratio-scaled (numeric) variables is represented by the letter r, which takes on values between -1 and +1 only and is referred to as 'Pearson product moment correlation' or correlation coefficient. The correlation coefficient is a relative (scale-free) number and so its interpretation is independent of the units of measurement of two variables, say x and y.
In this chapter, the following methods of calculating a correlation coefficient between two variables x and y are discussed:
• Scatter Diagram method
• Karl Pearson's Coefficient of Correlation method
• Spearman's Rank Correlation method
• Method of Least-squares
Figure 13.1 shows how the strength of the association between two variables is represented by the coefficient of correlation.
Figure 13.1 Interpretation of Correlation Coefficient (panels: perfect negative correlation, moderate negative correlation, no correlation, moderate positive correlation)
Scatter Diagram: A graph of pairs of values of two variables that is plotted to indicate a visual display of the pattern of their relationship.
13.5.1 Scatter Diagram Method
The scatter diagram method is an at-a-glance method to understand an apparent relationship (if any) between two variables. A scatter diagram is obtained on a graph paper by plotting pairs of values of variables x and y, taking values of variable x on the x-axis and values of variable y on the y-axis. The horizontal and vertical axes are scaled in units corresponding to the variables x and y, respectively. A straight line drawn through these pairs of values describes different types of relationships between the two variables.
Figure 13.2 shows examples of different types of relationships based on pairs of values of x and y. The patterns in Figs 13.2(a) and 13.2(b) represent linear relationships since the patterns are described by straight lines. The pattern in Fig. 13.2(a) shows a positive relationship since the value of y tends to increase as the value of x increases, whereas the pattern in Fig. 13.2(b) shows a negative relationship since the value of y tends to decrease as the value of x increases.
The pattern shown in Fig. 13.2(c) illustrates very low or no relationship between the values of x and y, whereas Fig. 13.2(d) represents a curvilinear relationship since it is described by a curve rather than a straight line. The wider scattering indicates that there is a lower degree of association between the two variables x and y than there is in Fig. 13.2.
Interpretation of Correlation Coefficients
While interpreting correlation coefficient r, the following points should be taken into account:
• A low positive or negative value of correlation coefficient, r, indicates that the relationship is poorly described by a straight line. A non-linear relationship may also exist.
• A correlation is an observed association and does not indicate any cause-and-effect relationship.
Types of Correlation Coefficients
Table 13.1 shows several types of correlation coefficients used in statistics along with the conditions of their use. All of them are appropriate for quantifying a linear relationship between two variables x and y.
Table 13.1: Types of Correlation Coefficients
Coefficient    Conditions Applied for Use
φ (phi)        Both x and y variables are measured on a nominal scale
ρ (rho)        Both x and y variables are measured on, or changed to, ordinal scales (rank data)
r              Both x and y variables are measured on an interval or ratio scale (numeric data)
The correlation coefficient denoted by η (eta) is used for quantifying non-linear relationships (it is beyond the scope of this text). In this chapter, methods of calculating the correlation coefficients r and Spearman's rank correlation coefficient are discussed.
Figure 13.2 panels: (a) Positive Linear Relationship; (b) Negative Linear Relationship; (c) No Relationship; (d) Non-Linear Relationship
Features of the Correlation Coefficient
The following are the common features among all correlation coefficients:
(i) The value of correlation coefficient, r, depends on the slope of the line passing through the data points and the scattering of the pairs of values of variables x and y about this line.
(ii) The sign of the correlation coefficient indicates the direction of the relationship. The positive correlation denoted by + (positive sign) indicates that the direction of increase (or decrease) in the value of two variables is the same, while negative correlation denoted by - (minus sign) indicates that the direction of increase (or decrease) in the value of two variables is opposite.
(iii) The values of the correlation coefficient range from +1 to -1 regardless of the units of measurement of x and y. That is, the correlation coefficient is a pure number, independent of the unit of measurement.
(iv) The value of correlation coefficient r = +1 or -1 indicates perfect linear association (relationship) between two variables, x and y. A perfect correlation implies that every observed pair of values of x and y falls on the straight line.
(v) The value of correlation coefficient indicates the strength of association (relationship) between two variables, i.e., the closeness of the observed pairs of values of x and y to the straight line. The magnitude (absolute value) of the correlation coefficient indicates the strength of the linear relationship.
(vi) The value of correlation coefficient remains unchanged when a constant is added to or subtracted from every value of variables x and y (also referred to as change of origin), and also when the values of variables x and y are divided or multiplied by a positive constant (also referred to as change of scale).
(vii) The value of correlation coefficient r = 0 indicates that the straight line through the data points is horizontal, and therefore there is no linear association between two variables x and y.
(viii) The square, r², of correlation coefficient, r, is called the coefficient of determination.
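Feature (vi) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `pearson_r` and the chosen constants are assumptions made for illustration, and the data are the increasing x and y rows of the illustration in Section 13.4.1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the covariance definition (divisor n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Illustrative values (increasing x and y rows of the Section 13.4.1 illustration).
x = [5, 8, 10, 15, 17]
y = [10, 12, 16, 18, 20]

r_original = pearson_r(x, y)
# Change of origin: subtract an arbitrary constant from each variable.
r_shifted = pearson_r([v - 10 for v in x], [v - 15 for v in y])
# Change of scale: divide x by 5 and multiply y by 2 (both positive constants).
r_scaled = pearson_r([v / 5 for v in x], [v * 2 for v in y])
# Multiplying only one variable by a negative constant reverses the sign of r.
r_flipped = pearson_r([-v for v in x], y)

print(round(r_original, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_flipped, 4))
```

The first three printed values should agree, while the last has the same magnitude with the opposite sign, which is why the change-of-scale property is stated here for positive constants.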
Example 13.1: Given the following data:
Student                    :   1    2    3    4    5    6    7    8    9   10
Management aptitude score  : 400  675  475  350  425  600  550  325  675  459
Grade point average        : 18 38 28 17 28 3.1 26 19 52 95
(a) Draw this data on a graph paper.
(b) Is there any correlation between management aptitude score and grade point average? If yes, what is your opinion?
Solution: By taking an appropriate scale on the x and y axes, the pairs of observations are plotted on a graph paper as shown in Fig. 13.3. The scatter diagram in Fig. 13.3, with a straight line fitted through it, represents the relationship between x and y.
Figure 13.3 Scatter Diagram (x-axis: Management Aptitude Score; y-axis: Grade Point Average)
Interpretation: Since pairs of values of the two variables are very close to a straight line passing through them, it appears that there is a high degree of association between the two variables. The pattern of dotted points also indicates a high degree of linear positive correlation.
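13.5.2 Karl Pearson's Correlation Coefficient
Karl Pearson's correlation coefficient quantitatively measures the degree of association (relationship) between two variables x and y. For a set of n pairs of values of x and y, Pearson's correlation coefficient, r, is given by
$$ r = \frac{\text{Covariance}(x, y)}{\sigma_x \sigma_y} = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y} $$
where
$$ \text{Cov}(x, y) = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) $$
$$ \sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{(standard deviation of sample data on variable } x\text{)} $$
$$ \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} \quad \text{(standard deviation of sample data on variable } y\text{)} $$
Substituting values of Cov(x, y), σx and σy, we have
$$ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} \tag{13-1} $$
Step Deviation Method for Ungrouped Data
When the actual mean values x̄ and ȳ are fractions, the calculation of Pearson's correlation coefficient can be simplified by taking deviations of x and y values from their assumed means A and B, respectively, that is, d_x = x - A and d_y = y - B. The formula then becomes
$$ r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n \sum d_x^2 - (\sum d_x)^2}\,\sqrt{n \sum d_y^2 - (\sum d_y)^2}} \tag{13-2} $$
Step Deviation Method for Grouped Data
When values of variables x and y are classified into a frequency distribution, formula (13-2) is modified as
$$ r = \frac{n \sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{n \sum f d_x^2 - (\sum f d_x)^2}\,\sqrt{n \sum f d_y^2 - (\sum f d_y)^2}} \tag{13-3} $$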
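The raw-sums form (13-1) and the step deviation form (13-2) should give the same value for any choice of assumed means. The sketch below illustrates this under stated assumptions: the function names and the small data set are hypothetical and are not taken from the text.

```python
import math


def pearson_r_sums(xs, ys):
    """Pearson's r from raw sums, following formula (13-1)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


def pearson_r_step(xs, ys, a, b):
    """Pearson's r from deviations about assumed means a and b, following formula (13-2)."""
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    # Formula (13-2) has the same algebraic form as (13-1), applied to the deviations.
    return pearson_r_sums(dx, dy)


# Hypothetical data for illustration only.
x = [2, 4, 6, 8, 10]
y = [3, 7, 5, 11, 14]

r_raw = pearson_r_sums(x, y)
r_step = pearson_r_step(x, y, a=6, b=8)
print(round(r_raw, 4), round(r_step, 4))
```

Reusing the raw-sums function on the deviations is a design choice that makes the equivalence of the two formulas explicit; a hand calculation would normally tabulate d_x, d_y and their squares and products instead.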
Assumptions for Using Pearson's Correlation Coefficient
1. Pearson's correlation coefficient is used only when both variables x and y are measured on an interval or a ratio scale.
2. Pearson's correlation coefficient is used only when two variables x and y are linearly related.
Advantages and Disadvantages of Pearson's Correlation Coefficient
The numerical value of correlation coefficient between -1 and 1 indicates the strength as well as the direction (positive or negative) of association between two variables. A few limitations of Pearson's method are as follows:
1. Pearson's correlation coefficient is used only when two variables x and y are linearly related.
2. The value of the coefficient is unduly affected by extreme values of the two variables.
3. Comparatively, the computational time required to calculate the value of Pearson's correlation coefficient, r, is lengthy.
13.5.3 Probable Error and Standard Error of Coefficient of Correlation
The probable error (PE) of Pearson's correlation coefficient, r, indicates the extent to which its value depends on the condition of random sampling. If r is the value of correlation coefficient in a sample of n pairs of observations, then its standard error SE_r is given by
$$ SE_r = \frac{1 - r^2}{\sqrt{n}} $$
The probable error of the coefficient of correlation is calculated as follows:
$$ PE_r = 0.6745\, SE_r = 0.6745\, \frac{1 - r^2}{\sqrt{n}} $$
The amount of PE_r is helpful to determine the range, ρ_r = r ± PE_r, within which the population coefficient of correlation is expected to fall, where ρ_r (rho) represents the population coefficient of correlation.
1. If r < PE_r, then the value of r is not significant, i.e. there is no relationship between two variables.
2. If r > 6PE_r, then the value of r is significant, i.e. there exists a relationship between two variables.
Illustration: If r = 0.8 and n = 25, then PE_r becomes
$$ PE_r = 0.6745 \times \frac{1 - (0.8)^2}{\sqrt{25}} = 0.6745 \times \frac{0.36}{5} = 0.0486 $$
which is taken as 0.048. Thus the limits within which the population correlation coefficient (ρ_r) should fall are
r ± PE_r = 0.8 ± 0.048 or 0.752 ≤ ρ_r ≤ 0.848.
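The arithmetic of the illustration can be reproduced with a short sketch. The function names below are assumptions made for this example; the formulas are the standard error and probable error expressions of this section, and the absolute value of r is used in the significance check so that the same code also covers negative coefficients.

```python
import math


def standard_error_r(r, n):
    """Standard error of r for n pairs of observations: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error_r(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error_r(r, n)


r, n = 0.8, 25
se = standard_error_r(r, n)
pe = probable_error_r(r, n)
lower, upper = r - pe, r + pe
# Rule of thumb from the text: r is treated as significant when it exceeds 6 PE.
is_significant = abs(r) > 6 * pe

print(f"SE = {se:.4f}, PE = {pe:.4f}")
print(f"limits: {lower:.3f} to {upper:.3f}, significant: {is_significant}")
```

Carrying four decimals gives PE_r close to 0.0486, so the limits come out marginally wider than those obtained with the value 0.048.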
13.5.4 Coefficient of Determination
Coefficient of determination: The proportion of the variation in the dependent variable that is explained by the independent variable.
The coefficient of determination, r², always has a value between 0 and 1. While squaring the value of correlation coefficient, the information about the strength of the relationship is retained but the information about the direction is lost. The value of coefficient of determination represents the proportion (or percentage) of the total variability in the dependent variable, y, that is explained by the independent variable, x. The proportion (or percentage) of variation in y that x can explain determines precisely the extent or strength of association between two variables x and y (see Chapter 14 for detail).
According to Tuttle, the coefficient of correlation, r, has been grossly overrated and is used entirely too much. Its square, the coefficient of determination r², is a much more useful measure of the linear covariance of two variables. The reader should develop the habit of squaring every correlation coefficient he finds cited or stated before coming to any conclusion about the extent of the linear relationship between two correlated variables.
Interpretation of Coefficient of Determination
The knowledge of coefficient of determination is helpful in interpreting the strength of association in terms of percentage between two variables. Figure 13.4 illustrates the proportion (percentage) of explained variation in the value of dependent variable, y.
• If r² = 0, then no variation in y is due to variation in values of x. That is, there is no association between x and y.
• If r² = 1, then the entire variation in y is due to variation in values of x. That is, there is perfect association between x and y.
• If 0 < r² < 1, then the degree of variation in y due to variation in values of x depends on the value of r². A value of r² close to zero shows a low proportion of variation in y due to variation in values of x. On the other hand, a value of r² close to one shows that almost the entire variation in y is due to variation in values of x.
Figure 13.4 Interpretation of Coefficient of Determination (scale of r² from 0.00 to 1.00, showing strength of association between x and y and the proportion (percentage) of explained variation in y)
Mathematically, the coefficient of determination is determined as
$$ r^2 = \frac{\text{Explained variability in } y}{\text{Total variability in } y} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = \frac{a \sum y + b \sum xy - n(\bar{y})^2}{\sum y^2 - n(\bar{y})^2} $$
where ŷ = a + bx is the estimated value of y for given values of x.
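The ratio of explained to total variability can be evaluated directly once the line ŷ = a + bx is fitted. The sketch below assumes the least-squares estimates of a and b (the method named in Section 13.5 and treated in the next chapter); the function names are hypothetical, and the data are the increasing x and y rows of the illustration in Section 13.4.1.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line y = a + b x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def r_squared_from_variability(xs, ys):
    """Explained variability in y divided by total variability in y."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    explained = sum((a + b * x - mean_y) ** 2 for x in xs)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


def r_squared_shortcut(xs, ys):
    """Computational form: (a*sum(y) + b*sum(xy) - n*ybar^2) / (sum(y^2) - n*ybar^2)."""
    a, b = fit_line(xs, ys)
    n = len(ys)
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_yy = sum(y * y for y in ys)
    return (a * sum(ys) + b * sum_xy - n * mean_y ** 2) / (sum_yy - n * mean_y ** 2)


x = [5, 8, 10, 15, 17]
y = [10, 12, 16, 18, 20]
r2_definition = r_squared_from_variability(x, y)
r2_shortcut = r_squared_shortcut(x, y)
print(round(r2_definition, 4), round(r2_shortcut, 4))
```

Both forms should agree, and both equal the square of Pearson's r for the same data.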
For example, consider the correlation between variable x (height) and variable y (weight). A coefficient of determination r² = 0.49 (or 49 per cent) implies that only 49 per cent of the variations (changes) in the value of variable y (weight) can be accounted for in terms of variable x (height). The remaining 51 per cent of the variations may be due to other factors, say, a tendency to eat fatty foods.
It is important to know that 'variability' refers to the dispersion of values of a variable around its mean value. The greater the correlation coefficient, the greater the coefficient of determination, and the more of the variability in the dependent variable can be accounted for in terms of the independent variable.
Example 13.2: The following table gives indices of industrial production and number of registered unemployed people. Calculate the value of the correlation coefficient.
Year                 : 1991 1992 1993 1994 1995 1996 1997 1998
Index of production  :  100  102  104  107  105  112  103   99
Number unemployed    :   15   12   13   11   12   12   19   26
Solution: Calculations of Karl Pearson's correlation coefficient are shown below:
Year    Production   d_x = x - x̄   d_x²   Unemployed   d_y = y - ȳ   d_y²   d_x d_y
        x                                  y
1991    100          -4             16     15             0             0       0
1992    102          -2              4     12            -3             9      +6
1993    104           0              0     13            -2             4       0
1994    107          +3              9     11            -4            16     -12
1995    105          +1              1     12            -3             9      -3
1996    112          +8             64     12            -3             9     -24
1997    103          -1              1     19            +4            16      -4
1998     99          -5             25     26           +11           121     -55
Total   832           0            120    120             0           184     -92
x̄ = Σx/n = 832/8 = 104 and ȳ = Σy/n = 120/8 = 15
$$ r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n \sum d_x^2 - (\sum d_x)^2}\,\sqrt{n \sum d_y^2 - (\sum d_y)^2}} = \frac{8 \times (-92)}{\sqrt{8 \times 120}\,\sqrt{8 \times 184}} = \frac{-92}{148.593} = -0.619 $$
Interpretation: Since the coefficient of correlation r = -0.619 is moderately negative, it indicates that as the production index increases, the number of unemployed decreases and vice versa.
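The column totals of the working table and the final value of r in this example can be verified with a few lines. This is only a checking sketch: the variable names are illustrative, and the unemployment figures are those implied by the deviation columns of the working table.

```python
import math

production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]
n = len(production)

mean_x = sum(production) / n   # 104
mean_y = sum(unemployed) / n   # 15

dx = [x - mean_x for x in production]
dy = [y - mean_y for y in unemployed]

sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Deviations are taken from the actual means, so sum(dx) = sum(dy) = 0 and
# formula (13-2) reduces to the ratio below.
r = (n * sum_dxdy) / (math.sqrt(n * sum_dx2) * math.sqrt(n * sum_dy2))
print(sum_dx2, sum_dy2, sum_dxdy, round(r, 3))
```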
: owing table gives the distribution of items of production andl also
Fe ets 018 ‘them, according to size groups. Find the correlation
coefficient between size and defect in quality.
Si ds : W-16 16-17-1718 B19 19-20 20-21
SS iN goo 270 40860400, 300
Jo. of i
ive items @)) i505 162, 170) 18018014
No, of defeive ies ©) [Delhi Univ, B.Com, 2007}
sour Let group size be denoted by variable x and number of defective items by
we * alcatations for Karl Pearson's correlation coefficient are shown below:je
EE Pewent of d= 3-30
Difedive Noms
Size
Group
4%
1 60
050
150 A
4 45 th
3 9 8 f
et
Substituting values in the formula of Karl Pearson's correlation coefficient r, we haye i
n¥dgd, ~ (Ed) (Edy) ;
re
nEd? ~(Ed,)* nEdy - (dy
6 x-106= 3418 036-54
[gx 19 ~(3 Yo 894 — (18) Vios J5010
Interpretation: Since value of r is negative, and is close to =1, association (relationship)
between x(size group) and y(percent of defective items) is moderate and negative. Hence,
it may be concluded that when size of group increases, the number of defective items
decreases and versa. gs
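Example 13.4: The following data relate to age of employees and the number of days they reported sick in a month.
Employee  :  1   2   3   4   5   6   7   8   9  10
Age       : 30  32  35  40  48  50  52  55  57  61
Sick days :  1   0   2   5   2   4   6   5   7   8
Calculate Karl Pearson's coefficient of correlation and interpret it.
[Kashmir Univ., B.Com., 2005]
Solution: Let age and sick days be represented by variables x and y, respectively. Calculations for the value of correlation coefficient are shown below:
Age     d_x = x - x̄   d_x²    Sick days   d_y = y - ȳ   d_y²   d_x d_y
x                              y
30      -16            256     1           -3             9      48
32      -14            196     0           -4            16      56
35      -11            121     2           -2             4      22
40       -6             36     5            1             1      -6
48        2              4     2           -2             4      -4
50        4             16     4            0             0       0
52        6             36     6            2             4      12
55        9             81     5            1             1       9
57       11            121     7            3             9      33
61       15            225     8            4            16      60
Total   460    0      1092    40            0            64     230
x̄ = Σx/n = 460/10 = 46 and ȳ = Σy/n = 40/10 = 4
$$ r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n \sum d_x^2 - (\sum d_x)^2}\,\sqrt{n \sum d_y^2 - (\sum d_y)^2}} = \frac{10 \times 230}{\sqrt{10 \times 1092}\,\sqrt{10 \times 64}} = \frac{230}{264.363} = 0.87 $$
Interpretation: Since the value of r is positive, age of employees and number of sick days are positively correlated to a high degree. Hence, we may conclude that as the age of an employee increases, he is likely to go on sick leave more often than others.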
Example 13.5: The following table shows the frequency, according to the marks, obtained by 67 students in an intelligence test. Measure the degree of relationship between age and marks.
Test Marks     Age in years            Total
               18    19    20    21
200-250         4     4     2     1     11
250-300         3     5     4     2     14
300-350         2     6     8     5     21
350-400         1     4     6    10     21
Total          10    19    20    18     67
[Allahabad Univ., B.Com., 2007]
Solution: Let age of students and marks obtained by them be represented by variables x and y, respectively. Calculations for correlation coefficient for this bivariate data are based on the deviations d_x = -1, 0, 1, 2 for ages x = 18, 19, 20, 21 and on the columns Total f, f d_y, f d_y² and f d_x d_y.
[The body of the calculation table and the resulting value of r are missing from the source.]
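Formula (13-3) can be applied to the bivariate table of this example in a few lines. The sketch below is illustrative rather than the book's own working: the function name is hypothetical, the cell frequencies are those consistent with the row and column totals of the table, and the marks deviations are taken as d_y = (mid-value - 275)/50, an assumed choice of origin and scale that does not affect r.

```python
import math


def grouped_pearson_r(freq, dx, dy):
    """Formula (13-3); freq[i][j] is the frequency for row deviation dy[i] and column deviation dx[j]."""
    n = sum(sum(row) for row in freq)
    sum_fdx = sum(f * dx[j] for row in freq for j, f in enumerate(row))
    sum_fdy = sum(f * dy[i] for i, row in enumerate(freq) for f in row)
    sum_fdx2 = sum(f * dx[j] ** 2 for row in freq for j, f in enumerate(row))
    sum_fdy2 = sum(f * dy[i] ** 2 for i, row in enumerate(freq) for f in row)
    sum_fdxdy = sum(f * dx[j] * dy[i] for i, row in enumerate(freq) for j, f in enumerate(row))
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    denominator = (math.sqrt(n * sum_fdx2 - sum_fdx ** 2)
                   * math.sqrt(n * sum_fdy2 - sum_fdy ** 2))
    return numerator / denominator


# Rows: marks 200-250, 250-300, 300-350, 350-400; columns: ages 18, 19, 20, 21.
frequencies = [
    [4, 4, 2, 1],
    [3, 5, 4, 2],
    [2, 6, 8, 5],
    [1, 4, 6, 10],
]
age_deviations = [-1, 0, 1, 2]     # d_x = x - 19
marks_deviations = [-1, 0, 1, 2]   # d_y = (mid-value - 275) / 50, an assumed scaling

r = grouped_pearson_r(frequencies, age_deviations, marks_deviations)
print(round(r, 3))
```

Under these assumptions the coefficient comes out positive and moderate, suggesting that marks tend to rise with age in this sample.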
 x    y                   x    y
 6 | 14    instead of     8 | 12
 8 |  6                   6 |  8
Obtain the correct value of correlation coefficient between x and y.
[MD Univ., M.Com., 2006; Kumaon Univ., MBA, 2007]
Solution: The corrected values of the terms required for the formula of Pearson's correlation coefficient are determined as follows:
Correct Σx = 125 - (6 + 8 - 8 - 6) = 125
Correct Σy = 100 - (14 + 6 - 12 - 8) = 100
Correct Σx² = 650 - {(6)² + (8)² - (8)² - (6)²} = 650 - (36 + 64 - 64 - 36) = 650
Correct Σy² = 460 - {(14)² + (6)² - (12)² - (8)²} = 460 - (196 + 36 - 144 - 64) = 436
Correct Σxy = 508 - {(6 × 14) + (8 × 6) - (8 × 12) - (6 × 8)} = 508 - (84 + 48 - 96 - 48) = 520
Applying the formula
$$ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}} = \frac{13{,}000 - 12{,}500}{\sqrt{625}\,\sqrt{900}} = \frac{500}{25 \times 30} = 0.667 $$
Thus, the correct value of correlation coefficient between x and y is 0.667.
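The correction procedure used in this solution (remove the contribution of the wrongly copied pairs from each sum, then add the contribution of the correct pairs) can be written as a small routine. The function name and dictionary keys below are assumptions for illustration; the figures are those appearing in the solution.

```python
import math


def corrected_sums(sums, wrong_pairs, right_pairs):
    """Subtract the wrong pairs' contributions from each sum and add the right pairs'."""
    fixed = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            fixed["x"] += sign * x
            fixed["y"] += sign * y
            fixed["xx"] += sign * x * x
            fixed["yy"] += sign * y * y
            fixed["xy"] += sign * x * y
    return fixed


n = 25
reported = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}
fixed = corrected_sums(reported, wrong_pairs=[(6, 14), (8, 6)], right_pairs=[(8, 12), (6, 8)])

numerator = n * fixed["xy"] - fixed["x"] * fixed["y"]
denominator = (math.sqrt(n * fixed["xx"] - fixed["x"] ** 2)
               * math.sqrt(n * fixed["yy"] - fixed["y"] ** 2))
r = numerator / denominator
print(fixed, round(r, 3))
```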
Self-practice Problems 13A
13.1 Making use of the data summarized below, calculate the coefficient of correlation.
[Data table illegible in the source.]
13.2 Find the correlation coefficient by Karl Pearson's method between x and y and interpret its value.
[Data rows for x and y illegible in the source.]
13.3 Calculate the coefficient of correlation from the following data:
x : 100  200  300  400  500  600  700
y :  30   50   60   80  100  110  130
13.4 Calculate the coefficient of correlation between x and y from the following data and calculate the probable errors. Assume 69 and 112 as the mean value for x and y, respectively.
x :  78   89   99   60   50   79   68   61
y : 125  137  156  112  107  136  123  108
13.5 Find the coefficient of correlation from the following data:
Cost  : 39  65  62  90  82  75  25  98  36  78
Sales : 47  53  58  86  62  68  60  91  51  84
[Madras Univ., B.Com., 2005]
13.6 Calculate Karl Pearson's coefficient of correlation between age and playing habits from the data given below. Also calculate the probable error and comment on the value.
Age             :  20   21   22   23   24   25
No. of students : 500  400  300  240  200  160
Regular players : 400  300  180   96   60   24
[HP Univ., MBA, 2005]
13.7 Find the coefficient of correlation between age and the sum assured (in Rs 1000) from the following table:
Age Group     Sum Assured (in Rs)
              10    20    30    40    50
20-30          4     6     3     7     1
30-40          2     8   (illegible)   7     1
40-50          3     9     2     6     2
50-60          8     4     2     -     -
[Delhi Univ., MBA, 2007]
13.8 Family income and its percentage spent on food in the case of one hundred families gave the following bivariate frequency distribution. Calculate the coefficient of correlation and interpret its value.
Food Expenditure (in percent) by Monthly Family Income (Rs): 2000-3000, 3000-4000, 4000-5000, 5000-6000
[Body of the frequency table illegible in the source.]
[Delhi Univ., M...]
13.9 With the following data in 6 cities, calculate Pearson's coefficient of correlation between the density of population and death rate:
City    Area    Population    No. of deaths
A       150     30             300
B       180     90            1440
C       100     40             560
D        60     42            (illegible)
E       120     72            1224
[Sukhadia Univ., B.Com., 2006]
13.10 The coefficient of correlation between two variables x and y is 0.3. The covariance is 9. The variance of x is 16. Find the standard deviation of y series.
Hints and Answers
13.2 r = -0.554        13.3 r = 0.997
13.4 r = 0.014         13.5 r = 0.780
13.6 r = 0.005         13.7 r = -0.256
13.8 r = -0.438        13.9 r = 0.988
13.10 Given σx = √16 = 4 and Cov(x, y) = 9; r = Cov(x, y)/(σx σy) or 0.3 = 9/(4σy) or σy = 7.5.
13.5.5 Spearman's Rank Correlation Coefficient
In 1904, a British psychologist Charles Edward Spearman developed a method to measure the statistical association (relationship) between two variables, say x and y, when only ordinal (or rank) data are available. This implies that Spearman's rank correlation method is applied in a situation where a quantitative measure of qualitative factors such as judgment, brand personalities, beauty, intelligence, honesty, efficiency, TV programmes, leadership, colour and taste cannot be fixed but individual observations can be arranged in a definite order (or rank). The ranking is done by using a set of ordinal rank numbers, with 1 for the individual observation ranked first; 2 for the individual observation ranked second, and so on. Spearman's rank correlation coefficient is given by
$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \tag{13-4} $$
where R = rank correlation coefficient; R_1 is the rank of observations with respect to first variable; R_2 is the rank of observations with respect to second variable; d = R_1 - R_2 is the difference in a pair of ranks; and n is the number of paired observations or individuals being ranked.
The number '6' is placed in the formula as a scaling device.
Spearman's rank correlation coefficient: This method is developed to measure the statistical association (relationship) between two variables, say x and y, when only ordinal (or rank) data are available.
Advantages
(i) This method is easy to understand and its application is simpler than Pearson's method.
(ii) This method is useful for correlation analysis when variables are expressed in qualitative terms.
Disadvantages
(i) Values of both the variables are assumed to be normally distributed and describing a linear relationship rather than a non-linear relationship.
(ii) A large computational time is required when pairs of values of two variables exceed 30.
(iii) This method cannot be applied on grouped data to measure the association between two variables.
Case 1: When Ranks Are Given
When observations in a data set are already arranged in a particular order (rank), take the differences in pairs of observations to determine the difference, d. Square these differences and obtain the total, Σd². Apply the formula to calculate Spearman's correlation coefficient.
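The three steps just described (difference, square, total, then formula (13-4)) translate directly into code. A minimal sketch, assuming untied ranks; the function name and the two rank lists are hypothetical.

```python
def spearman_from_ranks(rank_1, rank_2):
    """Spearman's R = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for two lists of untied ranks."""
    n = len(rank_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given to five items by two assessors.
first = [1, 2, 3, 4, 5]
second = [2, 1, 4, 3, 5]

R = spearman_from_ranks(first, second)
print(round(R, 3))
```

Identical rankings give R = 1 and exactly reversed rankings give R = -1, which is a quick sanity check on any implementation.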
Example 13.8: The coefficient of rank correlation between debenture prices and share prices is found to be 0.143. If the sum of the squares of the differences in ranks is given to be 48, then find the value of n.
Solution: Apply the formula of Spearman's correlation coefficient:
$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
Given R = 0.143 and Σd² = 48. Substituting values in the formula, we get
$$ 0.143 = 1 - \frac{6 \times 48}{n(n^2 - 1)} = 1 - \frac{288}{n^3 - n} $$
0.143(n³ - n) = (n³ - n) - 288, so that n³ - n = 288/0.857 ≈ 336
n³ - n - 336 = 0 or (n - 7)(n² + 7n + 48) = 0
This implies that either n - 7 = 0, that is, n = 7, or n² + 7n + 48 = 0. But n² + 7n + 48 = 0 gives no real value of n because its discriminant b² - 4ac is negative. Hence, n = 7.
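Since n must be a whole number, the cubic can also be solved by a simple search over candidate values of n. The sketch below is an alternative to the factorization used in the solution, not the book's method; the function name and the search limit are assumptions.

```python
def pairs_from_rank_correlation(r_value, sum_d_squared, max_n=1000):
    """Find the integer n whose n^3 - n is closest to 6*sum(d^2) / (1 - R)."""
    target = 6 * sum_d_squared / (1 - r_value)   # equals n(n^2 - 1) from formula (13-4)
    return min(range(2, max_n + 1), key=lambda n: abs(n ** 3 - n - target))


n = pairs_from_rank_correlation(0.143, 48)
print(n)
```

The closest-match search tolerates the rounding in R = 0.143, for which 6Σd²/(1 - R) is approximately, not exactly, 336.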
Example 13.9: The ranks of 15 students in two subjects A and B are given below. The two numbers within brackets denote the ranks of a student in A and B subjects, respectively.
(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
(9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)
Find Spearman's rank correlation coefficient. [Sukhadia Univ., 2005]
Solution: Since ranks of students with respect to their performance in two subjects are given, calculations for rank correlation coefficient are shown below:
Rank in A    Rank in B    Difference        d²
R_1          R_2          d = R_1 - R_2
 1           10           -9                81
 2            7           -5                25
 3            2            1                 1
 4            6           -2                 4
 5            4            1                 1
 6            8           -2                 4
 7            3            4                16
 8            1            7                49
 9           11           -2                 4
10           15           -5                25
11            9            2                 4
12            5            7                49
13           14           -1                 1
14           12            2                 4
15           13            2                 4
                                     Σd² = 272
Apply the formula,
$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 272}{15\{(15)^2 - 1\}} = 1 - \frac{1632}{3360} = 1 - 0.4857 = 0.5143 $$
The result shows a moderate positive correlation between performances of students in two subjects.
Example 13.10: There are 12 clerks working in an office. The long-serving clerks feel that they should get a seniority increment based on length of service built into their salary structure. Based on assessment of their efficiency by the HR department, a ranking of efficiency was developed. The ranking of efficiency together with a ranking of their length of service is as follows:
Ranking according to length of service : 1  2  3  4  5   6   7   8  9  10  11  12
Ranking according to efficiency        : 2  3  5  1  9  10  11  12  8   7   6   4
Do the data support the clerks' claim for seniority increment?
[Sukhadia Univ., MBA, 2000]
Solution: Since ranks are already given, calculations for rank correlation coefficient are shown below:
Rank According to      Rank According to    Difference        d²
Length of Service R_1  Efficiency R_2       d = R_1 - R_2
 1                      2                   -1                 1
 2                      3                   -1                 1
 3                      5                   -2                 4
 4                      1                    3                 9
 5                      9                   -4                16
 6                     10                   -4                16
 7                     11                   -4                16
 8                     12                   -4                16
 9                      8                    1                 1
10                      7                    3                 9
11                      6                    5                25
12                      4                    8                64
                                                      Σd² = 178
Applying the formula,
$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 178}{12(144 - 1)} = 1 - \frac{1068}{1716} = 0.378 $$
Since the result shows a low degree positive correlation between length of service and efficiency, the claim of the clerks for a seniority increment based on length of service may not be justified.
Example 13.11: Ten competitors in a beauty contest are ranked by three judges in the following order:
Judge 1 :  1  6  5 10  3  2  4  9  7  8
Judge 2 :  3  5  8  4  7 10  2  1  6  9
Judge 3 :  6  4  9  8  1  2  3 10  5  7
Use the rank correlation coefficient to determine which pair of judges has the nearest approach for judgment of beauty. [MD Univ., MBA, 2004]
Solution: The pair of judges who have the nearest approach for judgment of beauty can be obtained in $^3C_2 = 3$ ways as follows:
(i) Judge 1 and judge 2.
(ii) Judge 2 and judge 3.
(iii) Judge 3 and judge 1.
Calculations after comparing the ranking of judges are shown below:
Judge 1   Judge 2   Judge 3   d1² = (R1 - R2)²   d2² = (R2 - R3)²   d3² = (R3 - R1)²
  R1        R2        R3
   1         3         6            4                  9                 25
   6         5         4            1                  1                  4
   5         8         9            9                  1                 16
  10         4         8           36                 16                  4
   3         7         1           16                 36                  4
   2        10         2           64                 64                  0
   4         2         3            4                  1                  1
   9         1        10           64                 81                  1
   7         6         5            1                  1                  4
   8         9         7            1                  4                  1
                                  200                214                 60
Applying the formula,
$$R_{12} = 1 - \frac{6\sum d_1^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10(100 - 1)} = 1 - \frac{1200}{990} = -0.212$$
$$R_{23} = 1 - \frac{6\sum d_2^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10(100 - 1)} = 1 - \frac{1284}{990} = -0.297$$
$$R_{13} = 1 - \frac{6\sum d_3^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10(100 - 1)} = 1 - \frac{360}{990} = 0.636$$
Since the correlation coefficient $R_{13} = 0.636$ is highest, the judges 1 and 3 have the nearest approach for judgment of beauty.
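Comparing every pair of judges is a small loop over combinations. The sketch below is illustrative only and not taken from the original text: the helper `spearman_from_ranks` and the dictionary layout are assumptions made for the example, and the rankings are those of the three judges in Example 13.11.

```python
from itertools import combinations


def spearman_from_ranks(rank_a, rank_b):
    """Spearman's R for two untied rankings of the same n items."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


judges = {
    "Judge 1": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Judge 2": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "Judge 3": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# One coefficient per unordered pair of judges (3C2 = 3 pairs)
pair_r = {
    (a, b): spearman_from_ranks(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}

# The pair with the largest coefficient has the nearest approach
closest_pair = max(pair_r, key=pair_r.get)

for pair, r in pair_r.items():
    print(pair, round(r, 3))
print("closest:", closest_pair)
```

Taking the maximum of the signed coefficient, rather than its absolute value, matches the example's reasoning: a strongly negative coefficient would indicate opposed rather than similar judgments.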
Case 2: When Ranks Are Not Given
If observations in a data set are not arranged in a particular order (rank), then ranks are assigned by taking either the highest value or the lowest value as rank one and so on for values of both the variables.
Example 13.12: Quotations of index numbers of security prices of a certain joint stock company are given below:
Year   Debenture Price   Share Price
 1          97.8            73.2
 2          99.2            85.8
 3          98.8            78.9
 4          98.3            75.8
 5          98.4            77.2
 6          96.7            87.2
 7          97.1            83.8
Use rank correlation method to determine the relationship between debenture prices and share prices. [Calicut Univ., B.Com, 2005]
Solution: Let us start ranking from the lowest value for both the variables as shown below:
Debenture    Rank   Share Price   Rank   Difference    d² = (R1 - R2)²
Price (x)     R1       (y)         R2    d = R1 - R2
  97.8         3       73.2         1         2              4
  99.2         7       85.8         6         1              1
  98.8         6       78.9         4         2              4
  98.3         4       75.8         2         2              4
  98.4         5       77.2         3         2              4
  96.7         1       87.2         7        -6             36
  97.1         2       83.8         5        -3              9
                                                       Σd² = 62
Applying the formula,
$$R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 62}{(7)^3 - 7} = 1 - \frac{372}{336} = -0.107$$
The result shows a low degree of negative correlation between the debenture prices and share prices of a certain joint stock company.
Example 13.13: An economist wanted to find out if there is any relationship between the unemployment rate in a country and its inflation rate. Data gathered from 7 countries are given below:
Country   Unemployment Rate   Inflation Rate
             (Per cent)         (Per cent)
   A            4.0                3.2
   B            8.5                8.2
   C            5.5                9.4
   D            0.8                5.1
   E            7.3               10.1
   F            5.8                7.8
   G            2.1                4.7
Find the degree of linear association between the unemployment rate in a country and its inflation rate.
Solution: Ranking from the lowest value for both the variables as shown below:
Unemployment   Rank   Inflation   Rank   Difference    d² = (R1 - R2)²
Rate (x)        R1    Rate (y)     R2    d = R1 - R2
   4.0           3       3.2        1         2              4
   8.5           7       8.2        5         2              4
   5.5           4       9.4        6        -2              4
   0.8           1       5.1        3        -2              4
   7.3           6      10.1        7        -1              1
   5.8           5       7.8        4         1              1
   2.1           2       4.7        2         0              0
                                                       Σd² = 18
Applying the formula,
$$R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 18}{(7)^3 - 7} = 1 - \frac{108}{336} = 0.679$$
The result shows a moderately high degree of positive correlation between unemployment rate and inflation rate of seven countries.
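When only raw values are available, as in Case 2, the ranking step can be automated before the formula is applied. The following is a hedged Python sketch using the debenture and share price data of Example 13.12, taking the fourth debenture price as 98.3 (the value used in the solution table). The helper names are illustrative, and `ranks_lowest_first` assumes no value is repeated.

```python
def ranks_lowest_first(values):
    """Rank 1 goes to the smallest value; assumes no repeated values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(rank_a, rank_b):
    """Spearman's R = 1 - 6*sum(d^2) / (n^3 - n) for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


debenture = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
share = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]

rank_x = ranks_lowest_first(debenture)
rank_y = ranks_lowest_first(share)
r = spearman_from_ranks(rank_x, rank_y)

print(rank_x)
print(rank_y)
print(round(r, 3))
```

Ranking both variables from the highest value instead of the lowest would leave every difference d unchanged in magnitude, so the coefficient is the same either way, provided the same direction is used for both variables.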
Case 3: When Ranks Are Equal
If two or more observations of equal size are found at the time of ranking observations in the data set by taking either the highest value or lowest value as rank one, then the rank to be assigned to individual observations is an average of the ranks that these individual observations deserved. For example, if two observations are ranked equal at third place, then the average rank of (3 + 4)/2 = 3.5 is assigned to these two observations. Similarly, if three observations are ranked equal at third place, then the average rank of (3 + 4 + 5)/3 = 4 is assigned to these three observations.
The modified Spearman rank correlation coefficient formula for such a case is given by
$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right\}}{n(n^2 - 1)}$$
where $m_i$ (i = 1, 2, 3, ...) stands for the number of times an observation is repeated in the data set for both variables.
Example 13.14: A financial analyst wanted to find out whether inventory turnover influences any company's earnings per share (in per cent). A random sample of 7 companies listed in a stock exchange was selected and the following data was recorded for each.
Company   Inventory Turnover   Earnings per Share
          (Number of Times)       (Per cent)
   A              4                   11
   B              5                    9
   C              7                   13
   D              8                    7
   E              6                   13
   F              3                    8
   G              5                    8
Find the strength of association between inventory turnover and earnings per share. Interpret this finding.
Solution: Ranking from lowest value for both the variables. Since observations of equal size are found at the time of ranking in the data set, the rank to be assigned to repeated observations is an average of the ranks that these individual observations deserved, as shown below.
Inventory      Rank   Earnings per   Rank   Difference    d² = (R1 - R2)²
Turnover (x)    R1    Share (y)       R2    d = R1 - R2
    4           2        11           5        -3.0           9.00
    5           3.5       9           4        -0.5           0.25
    7           6        13           6.5      -0.5           0.25
    8           7         7           1         6.0          36.00
    6           5        13           6.5      -1.5           2.25
    3           1         8           2.5      -1.5           2.25
    5           3.5       8           2.5       1.0           1.00
                                                        Σd² = 51
It may be noted that a value 5 of variable x is repeated twice ($m_1 = 2$) and values 8 and 13 of variable y are also repeated twice, so $m_2 = 2$ and $m_3 = 2$. Applying the formula:
$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3)\right\}}{n(n^2 - 1)}$$
$$= 1 - \frac{6\left\{51 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right\}}{7(49 - 1)}$$
$$= 1 - \frac{6(51 + 0.5 + 0.5 + 0.5)}{336} = 1 - 0.9375 = 0.0625$$
The result shows a very weak positive association between inventory turnover and earnings per share.
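The tie-handling rule of Case 3 (average ranks plus a correction of $(m^3 - m)/12$ for each group of m repeated values) can be sketched in code as follows. This is an illustrative Python sketch under the assumption that ranking starts from the lowest value; the function names are not from the original text. The data are the inventory turnover and earnings per share figures of Example 13.14.

```python
from collections import Counter


def average_ranks(values):
    """Rank from the lowest value; repeated values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the group while the next sorted value is a repeat
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


turnover = [4, 5, 7, 8, 6, 3, 5]
earnings = [11, 9, 13, 7, 13, 8, 8]
r = spearman_with_ties(turnover, earnings)
print(average_ranks(turnover))
print(round(r, 4))
```

For these data the three tie groups each contribute 0.5, so the adjusted sum is 52.5 and the result agrees with the hand calculation of 0.0625.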
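Example 13.15: Obtain the rank correlation coefficient between the variables x and y from the following pairs of observed values.
x :  50   55   65   50   55   60   50   65   70   75
y : 110  110  115  125  140  115  130  120  115  160
[Mangalore Univ., B.Com, 2005]
Solution: Ranking from lowest value for both the variables. Since observations of equal size are found at the time of ranking in the data set, the rank to be assigned to repeated observations is an average of the ranks that these individual observations deserved, as shown below.
Variable   Rank   Variable   Rank   Difference    d² = (R1 - R2)²
   x        R1       y        R2    d = R1 - R2
  50        2       110       1.5       0.5           0.25
  55        4.5     110       1.5       3.0           9.00
  65        7.5     115       4         3.5          12.25
  50        2       125       7        -5.0          25.00
  55        4.5     140       9        -4.5          20.25
  60        6       115       4         2.0           4.00
  50        2       130       8        -6.0          36.00
  65        7.5     120       6         1.5           2.25
  70        9       115       4         5.0          25.00
  75       10       160      10         0.0           0.00
                                                Σd² = 134.00
It may be noted that for variable x, 50 is repeated thrice ($m_1 = 3$), 55 is repeated twice ($m_2 = 2$) and 65 is repeated twice ($m_3 = 2$). Also for variable y, 110 is repeated twice ($m_4 = 2$) and 115 thrice ($m_5 = 3$). Applying the formula:
$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4) + \frac{1}{12}(m_5^3 - m_5)\right\}}{n(n^2 - 1)}$$
$$= 1 - \frac{6\left\{134 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right\}}{10(100 - 1)}$$
$$= 1 - \frac{6(134 + 2 + 0.5 + 0.5 + 0.5 + 2)}{990} = 1 - \frac{6 \times 139.5}{990} = 1 - \frac{837}{990} = 1 - 0.845 = 0.155$$
The result shows a weak positive association between variables x and y.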
13.5.6 Method of Least Squares
The method of least squares to calculate the correlation coefficient requires the values of regression coefficients $b_{xy}$ and $b_{yx}$, so that
$$r = \sqrt{b_{xy} \times b_{yx}}$$
In other words, the correlation coefficient is the geometric mean of the two regression coefficients (see Chapter 14 for details).
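The geometric-mean relation can be verified numerically. The sketch below is an illustration, not part of the original text: the five data pairs are hypothetical values made up for the check, the function names are arbitrary, and the sign of r is taken from the regression coefficients, since both coefficients always share the sign of the covariance.

```python
import math


def regression_coefficients(x, y):
    """Return (b_xy, b_yx): slopes of x on y and of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / s_yy, s_xy / s_xx


def r_from_regression(x, y):
    """r as the geometric mean of b_xy and b_yx, with their common sign."""
    b_xy, b_yx = regression_coefficients(x, y)
    return math.copysign(math.sqrt(b_xy * b_yx), b_yx)


# Hypothetical data, used only to illustrate the identity
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_xy, b_yx = regression_coefficients(x, y)
r = r_from_regression(x, y)
print(round(b_xy, 3), round(b_yx, 3), round(r, 4))
```

For these values $b_{xy} = 1.0$ and $b_{yx} = 0.6$, so r equals the square root of 0.6, the same value given by the product-moment formula.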
13.5.7 Auto Correlation Coefficient
The auto correlation coefficient describes mutual dependence between values of the same variable at different time periods. Thus, it provides information on how a variable relates to itself for a specific time lag. The difference in the period before a cause-and-effect relationship is established is referred to as lead time or lag. While computing the correlation, the time gap must be considered; otherwise misleading conclusions may be arrived at. For example, the decrease or increase in supply of a commodity may not immediately reflect on its price; it may take some lead time or time lag.
The formula for auto-correlation coefficient at time lag k is stated as:
$$r_k = \frac{\sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where k is the length of time lag; n is the number of observations; and $\bar{x}$ is the mean of all observations.
Example 13.16: The monthly sales of a product, in thousands of units, in the last 6 months are given below:
Month :  1    2    3    4    5    6
Sales : 1.8  2.5  3.1  3.0  4.2  3.4
Compute the auto-correlation coefficient up to lag 2. What conclusion can be derived from these values regarding the presence of a trend in the data?
Solution: The calculations for auto-correlation coefficient are shown below:
Time   Sales   x1 = One Time Lag      x2 = Two Time Lags
 (t)    (x)    Variable Constructed   Variable Constructed
               From x                 From x
  1     1.8          2.5                    3.1
  2     2.5          3.1                    3.0
  3     3.1          3.0                    4.2
  4     3.0          4.2                    3.4
  5     4.2          3.4                     -
  6     3.4           -                      -
The mean of all observations is $\bar{x} = \frac{1}{6}(1.8 + 2.5 + 3.1 + 3.0 + 4.2 + 3.4) = 3$.
For k = 1,
$$r_1 = \frac{(1.8-3)(2.5-3) + (2.5-3)(3.1-3) + (3.1-3)(3-3) + (3-3)(4.2-3) + (4.2-3)(3.4-3)}{(1.8-3)^2 + (2.5-3)^2 + (3.1-3)^2 + (3-3)^2 + (4.2-3)^2 + (3.4-3)^2}$$
$$= \frac{(-1.2)(-0.5) + (-0.5)(0.1) + (0.1)(0) + (0)(1.2) + (1.2)(0.4)}{1.44 + 0.25 + 0.01 + 0 + 1.44 + 0.16} = \frac{0.6 - 0.05 + 0.48}{3.3} = 0.312$$
For k = 2,
$$r_2 = \frac{(1.8-3)(3.1-3) + (2.5-3)(3-3) + (3.1-3)(4.2-3) + (3-3)(3.4-3)}{(1.8-3)^2 + (2.5-3)^2 + \cdots + (3.4-3)^2}$$
$$= \frac{(-1.2 \times 0.1) + (-0.5 \times 0) + (0.1 \times 1.2) + (0 \times 0.4)}{3.3} = \frac{-0.12 + 0.12}{3.3} = 0$$
Since the value of $r_1$ is positive, it implies that there is a seasonal pattern of 6 months, and $r_2 = 0$ implies that there is no significant change in sales.
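The lag-k calculation follows directly from the formula: products of deviations k periods apart, divided by the total sum of squared deviations. Below is a minimal Python sketch, assuming the six monthly sales figures 1.8, 2.5, 3.1, 3.0, 4.2 and 3.4 from Example 13.16; the function name `autocorrelation` is an illustrative choice.

```python
def autocorrelation(series, lag):
    """Auto-correlation coefficient r_k of a series at time lag k."""
    n = len(series)
    mean = sum(series) / n
    # numerator: products of deviations that are `lag` periods apart
    numerator = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    # denominator: sum of squared deviations over all n observations
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator


sales = [1.8, 2.5, 3.1, 3.0, 4.2, 3.4]
r1 = autocorrelation(sales, 1)
r2 = autocorrelation(sales, 2)
print(round(r1, 3), round(r2, 3))
```

Note that the denominator uses all n observations for every lag, while the numerator has only n - k terms, which is why the coefficient shrinks mechanically as the lag grows on a short series.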
y jent of rank correlation of the marks
st, Of py 10 students in statistics and account
it pate Pg be 0:2. It was later discovered thatthe
cos funy ranks in two subjects obtained by one of
aifererus was wrongly taken as 9 instead of 7. Find
{he sder coefficient of rank correlation,
he (Delhi Unia, B.Com, 2004)
ranking of 10 students in accordance with their
ss janet Oo subjects A and B are as follows:
Self-practice Problems 13B
13.16 An in ing data with
3.11 sea gitor collected the following. data wit
espect to the socio-economic status and severity of
rs Ae scio-economic status and severity of
Socio-economic
status (rank)
Severity of illness
rank)
67235418
534371
cep 8 wk 4 OT Ba 2
ey ese mOMnEscEnOMtriustea Calculate the rank correlation coefficient and
comment on its value.
ae the rank correlation coefficient and 13-17 You are given the following data of marks obtained
fomment on its value by 11 students in statistics in two tests, one before and
spetae Spearman's coeffident of corration other after special coaching:
heen marks assigned to ten students by judges x Fa Tea Send Tet
{na certain competitive test as shown below: we coathi i
andy in a certain compet Stas shown below: (Before coaching) (After coaching)
Grndent_ Marks by Judge x Marks by Judge y 23 um
1 52 6 ” os
19 2
2 58 68 21 1B
4 42 43 18 20
4 60 38 20 22
5 45 7 ue ES
6 41 48 0 a
7 37 35 7 20
8 38 30 3 23
9 25 25 16 20
10 27 50 3 a
oS
18.4 An examination of eight applicants for a clerical post
was taken by a firm. From the marks obtained by the
applicants in the accountancy and statistics papers,
‘compute the rank correlation coellicient,
apes ie
‘Applicant A BODE FGH
Marks in
accountancy: 15 20 28 12 40 60 20 80
Marks in
statistics; 40 30 50 30 20 10 30 60
statistics +40 305080
18.15Seven methods of imparting business education were
ranked by the MBA students of two universities as
follows:
ree
Weed feng 1 2 3 4 5 ST
Do the marks indicate that the special coaching has
benefited the students? (Delhi Uni, M.Com, 2000]
13.18 Two departmental managers ranked a few trainees
according to their perceived abilities. The ranking
are given below:
Traine =: AB CDE 1
Manager A: 19) 1
Manager B: 310
°
x
62
81
0
9
Calculate an appropriate correlation coefficient
measure the consistency in the ranking,
18.19 In an office some keyboard operators, who were
already ranked on their speed, were also ranked
oon accuracy by their supervisor. The results were as
follows:
8
Rank by swdents gg 4 7 6 Opraor :ABCDEFGHI J
of Univ. A : Speed ae omay ech ater 000
Rank by students : 32475 6 Acuray :7 9 3 4 1 6 8 2 10 5
Gaatae the rank correlation coefficient and
‘comment on its value-
Calculate the appropriate correlation coefficient
between speed and accuracy.{478_oueren ny
13.20 The personnel departmentisimerested in comparing leant
the tangs of fob applicants when menered by nH
variety of standard test. The ratings of 8 applicants Interview
views and standard. pychological test are Standard test
show below:
mele
Caleulate Spearman's rank correlation coeffi
comment ont vale. scien
Hints and Answers
1S. Given R= 02. = 10; R= 1- 24 or fed? (n?-m)+ ty (mt-m)h
nin?) 1314R ao 2
nF
ie
OF dor Ed? = 100
02=
=o
82100 0,394 8OI=D
Correct value of R= 1 =
1099
13.158 = 0.50 13.16 R= 0477
13.42R = 1 8435 2 o7s2 13.178 = 0.71 13.18 R= 0.812
Loe 13.19 R = 0.006 13.20 R= 0.817
0276 _ 4.539
ISSR = 1-
Toa00-
13.6 HYPOTHESIS TESTING FOR CORRELATION COEFFICIENT
The sample correlation coefficient is often used as an estimator to test whether the possible relationship between two random variables in the population exists. In other words, the simple correlation coefficient, r, is used as an estimator for testing a hypothesis about the true population correlation coefficient ρ (Greek letter rho) with the assumption that the two random variables are normally distributed.
13.6.1 Hypothesis Testing About Population Correlation Coefficient (Small Sample)
To determine whether there exists any significant correlation between two variables x and y, we test the null hypothesis that the value of the population correlation coefficient, ρ, is equal to zero. The population correlation coefficient, ρ, measures the degree of association between two variables in a population of interest. The null and alternative hypotheses are expressed as
Two-tailed Test
H0: ρ = 0 (No correlation between variables x and y)
H1: ρ ≠ 0 (Correlation exists between variables x and y)
One-tailed Test
H0: ρ = 0, and H1: ρ > 0 (or ρ < 0)
The test statistic for testing the null hypothesis is given by
$$t = \frac{r - \rho}{s_r} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
where r is the sample correlation coefficient; $s_r$ is the standard error of the correlation coefficient and n is the sample size. The t-test statistic follows the t-distribution with n - 2 degrees of freedom. The standard error of the correlation coefficient is given by
$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
The calculated value of the test statistic is compared with its critical (or table) value at n - 2 degrees of freedom and level of significance α to arrive at a decision as follows:
One-tailed Test: Reject H0 if $t_{cal} > t_{\alpha,\, n-2}$ or $t_{cal} < -t_{\alpha,\, n-2}$. Otherwise accept H0.
Two-tailed Test: Reject H0 if $t_{cal} > t_{\alpha/2,\, n-2}$ or $t_{cal} < -t_{\alpha/2,\, n-2}$. Otherwise accept H0.
Example 13.17: A random sample of 27 pairs of observations from a normal population gives a correlation coefficient of 0.42. Is it likely that the variables in the population are uncorrelated? [Delhi Univ., M.Com, 2005]
Solution: Let us take the null hypothesis that there is no significant difference in the sample and population correlation coefficients, that is,
H0: ρ = 0
Given n = 27, df = n - 2 = 25, r = 0.42. Applying t-test statistic as follows:
$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.42}{\sqrt{\{1 - (0.42)^2\}/(27 - 2)}} = \frac{0.42}{0.1815} = 2.314$$
Since the calculated value of $t_{cal} = 2.314$ is more than its critical value, $t_{\alpha} = 1.708$ at α = 0.05 level of significance and df = 25, the null hypothesis is rejected. Hence, it may be concluded that there is a significant difference in the sample and population correlation coefficients, that is, the variables in the population are unlikely to be uncorrelated.
Example 13.18: Is a correlation coefficient of 0.5 significant which is obtained from a random sample of 11 pairs of values from a normal population? [Madras Univ., B.Com, 2005]
Solution: Let us take the null hypothesis that the given correlation coefficient is not significant. Applying t-test
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.5\sqrt{11 - 2}}{\sqrt{1 - (0.5)^2}} = \frac{0.5 \times 3}{0.866} = 1.732$$
where r = 0.5, n = 11.
The calculated value of $t_{cal} = 1.732$ at α = 0.05 level of significance and df, ν = 9 is less than the table value $t_{\alpha/2} = 2.26$ and hence the given correlation coefficient is not significant.
Example 13.19: How many pairs of observations must be included in a sample so that an observed correlation coefficient of value 0.42 shall have a calculated value of t greater than 2.72?
Solution: Given, r = 0.42, t = 2.72. Applying t-test statistic, we get
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \quad \text{or} \quad t^2 = \frac{r^2 (n - 2)}{1 - r^2}$$
$$n - 2 = \frac{t^2 (1 - r^2)}{r^2} = \frac{(2.72)^2 \{1 - (0.42)^2\}}{(0.42)^2} = \frac{7.3984(0.8236)}{0.1764} = \frac{6.0933}{0.1764} = 34.542$$
$$n = 2 + 34.542 = 36.542 \cong 37$$
Hence, the sample size should be of 37 pairs of observations.
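The small-sample test statistic and the sample-size calculation of Examples 13.17 to 13.19 can be sketched as follows. This is an illustrative Python sketch, not part of the original text; the function names are assumptions, and the critical values are still read from a t-table rather than computed here.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def min_pairs_for_t(r, t_target):
    """Smallest n for which the calculated t is at least t_target."""
    return math.ceil(2 + t_target ** 2 * (1 - r ** 2) / r ** 2)


t_27 = t_statistic(0.42, 27)          # Example 13.17
t_11 = t_statistic(0.5, 11)           # Example 13.18
n_needed = min_pairs_for_t(0.42, 2.72)  # Example 13.19

print(round(t_27, 3), round(t_11, 3), n_needed)

# Decision against table values taken from the text
print("13.17 reject H0:", t_27 > 1.708)
print("13.18 reject H0:", abs(t_11) > 2.26)
```

Carried to full precision, the first statistic comes to about 2.314; small differences from hand-worked figures arise from rounding the standard error before dividing.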
Example 13.20: To study the correlation between the stature of father and son, a sample of 1600 is taken from the universe of fathers and sons. The sample study gives the correlation between the two to be 0.80. Within what limits does it hold true for the universe?
Solution: Since the sample size is large, the standard error of the correlation coefficient is given by
$$SE_r = \frac{1 - r^2}{\sqrt{n}}$$
Given correlation coefficient, r = 0.8 and n = 1600. Thus,
$$SE_r = \frac{1 - (0.8)^2}{\sqrt{1600}} = \frac{1 - 0.64}{40} = \frac{0.36}{40} = 0.009$$
The limits within which the correlation coefficient should hold true are given by
$$r \pm 3\,SE_r = 0.80 \pm 3(0.009) \quad \text{or} \quad 0.773 \le r \le 0.827$$
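The large-sample limits of Example 13.20 reduce to one standard-error formula and a band of three standard errors around r. Below is a minimal Python sketch; the function name and the `multiple` parameter are illustrative assumptions, with 3 as the multiple used in the example.

```python
import math


def large_sample_limits(r, n, multiple=3):
    """Return (SE, lower, upper) with SE = (1 - r^2) / sqrt(n)."""
    se = (1 - r ** 2) / math.sqrt(n)
    return se, r - multiple * se, r + multiple * se


se, lower, upper = large_sample_limits(0.8, 1600)
print(round(se, 3), round(lower, 3), round(upper, 3))
```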
13.6.2 Hypothesis Testing About Population Correlation Coefficient (Large Sample)
Since the distribution of the sample correlation coefficient, r, is not normal and its probability curve is skewed in the neighborhood of population correlation coefficient, ρ = ±1, even for large sample size n, we use Fisher's z-transformation for transforming r into z as follows:
$$z = \frac{1}{2} \log_e \frac{1 + r}{1 - r}$$
The value of z for different values of r can be seen from the standard table given in the Appendix.
Changing natural logarithm (to the base e) to the base 10 by multiplying with the constant 2.3026 as follows:
$$\log_e x = 2.3026 \log_{10} x$$
where x is a positive number. Thus the transformation formula becomes
$$z = \frac{1}{2}(2.3026) \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1 + r}{1 - r}$$
Fisher's z-transformation transforms r into z, which is approximately normally distributed with
$$\text{Mean } z_\rho = \frac{1}{2} \log_e \frac{1 + \rho}{1 - \rho} = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} \quad \text{and} \quad \text{Standard deviation } \sigma_z = \frac{1}{\sqrt{n - 3}}$$
This approximation is useful for large sample sizes. However, it can also be used for small sample sizes of at least n ≥ 10.
The Z-test statistic to test the null hypothesis H0: ρ = 0 and H1: ρ ≠ 0 is given by
$$Z = \frac{z - z_\rho}{\sigma_z}$$
where $\sigma_z$ is the standard error of z.
Decision Rule
• If $|Z_{cal}|$ < table value of $Z_{\alpha/2}$, then accept the null hypothesis H0.
• Otherwise reject H0.
13.6.3 Hypothesis Testing About the Difference Between Two Independent Correlation Coefficients
The test statistic for testing a hypothesis about the correlation coefficient in a single population can be generalized to test the hypothesis of two correlation coefficients $r_1$ and $r_2$ derived from two independent samples as follows:
$$Z = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$
where
$$z_1 = \frac{1}{2} \log_e \frac{1 + r_1}{1 - r_1} = 1.1513 \log_{10} \frac{1 + r_1}{1 - r_1} \quad \text{and} \quad z_2 = \frac{1}{2} \log_e \frac{1 + r_2}{1 - r_2} = 1.1513 \log_{10} \frac{1 + r_2}{1 - r_2}$$
The Z-test statistic is approximately normally distributed with zero mean and unit standard deviation.
If the absolute value $|Z_{cal}|$ is less than its table value, $Z_{\alpha/2}$, then accept the null hypothesis; otherwise reject H0.
Example 13.21: What is the probability that a correlation coefficient of 0.75 or less arises in a sample of 30 pairs of observations from a normal population in which the true correlation is 0.9?
Solution: Given, r = 0.75, n = 30, and ρ = 0.9. Applying Fisher's z-transformation, we get
$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1.75}{0.25} = 1.1513[\log_{10} 1.75 - \log_{10} 0.25]$$
$$= 1.1513(0.24304 + 0.60206) = 0.973$$
The distribution of z is normal around the true population correlation value ρ = 0.9. Thus,
$$\text{Mean } z_\rho = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} = 1.1513 \log_{10} \frac{1 + 0.90}{1 - 0.90} = 1.1513(\log_{10} 1.90 - \log_{10} 0.10)$$
$$= 1.1513(0.27875 + 1) = 1.472$$
The Z-test statistic is given by
$$Z = \frac{z - z_\rho}{\sigma_z} = (z - z_\rho)\sqrt{n - 3} = (0.973 - 1.472)\sqrt{27} = -0.499 \times 5.196 = -2.59$$
Hence, P(r ≤ 0.75) = P[Z ≤ -2.59] = 1 - 0.9952 = 0.0048.
Example 13.22: Test the significance of the correlation, r = 0.5 from a sample of size 18 against the hypothesized population correlation, ρ = 0.70.
Solution: Take the null hypothesis that the difference is not significant, that is,
H0: ρ = 0.70 and H1: ρ ≠ 0.70
Given, n = 18, r = 0.5. Applying z-transformation, we have
$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1 + 0.5}{1 - 0.5} = 1.1513 \log_{10} \frac{1.50}{0.50} = 1.1513 \log_{10} 3$$
$$= 1.1513(0.4771) = 0.5492$$
and
$$\text{Mean } z_\rho = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} = 1.1513 \log_{10} \frac{1 + 0.70}{1 - 0.70} = 1.1513 \log_{10} \frac{1.70}{0.30} = 1.1513 \log_{10} 5.67$$
$$= 1.1513(0.7536) = 0.8676$$
Applying Z-test statistic, we get
$$Z = \frac{|z - z_\rho|}{\sigma_z} = |z - z_\rho|\sqrt{n - 3} = |0.5492 - 0.8676|\sqrt{15} = 0.3184(3.872) = 1.233$$
Since the calculated value of $Z_{cal} = 1.233$ is less than its table value $Z_{\alpha/2} = 1.96$ at 5 per cent significance level, the null hypothesis is accepted. Hence, it may be concluded that the difference (if any) is due to sampling error.
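Fisher's z-transformation is the inverse hyperbolic tangent of r, so the one-sample test can be sketched with the standard library alone. The following Python sketch is illustrative and not part of the original text; the function names are assumptions, and it uses natural logarithms directly instead of the 1.1513 log10 shortcut, so results may differ from the hand calculations in the third decimal place.

```python
import math


def fisher_z(r):
    """z = 0.5 * ln((1 + r) / (1 - r)), which equals atanh(r)."""
    return math.atanh(r)


def z_test_against_rho(r, n, rho):
    """Z = (z - z_rho) / sigma_z with sigma_z = 1 / sqrt(n - 3)."""
    return (fisher_z(r) - fisher_z(rho)) * math.sqrt(n - 3)


# Example 13.21: r = 0.75, n = 30, rho = 0.9
z_21 = z_test_against_rho(0.75, 30, 0.9)

# Example 13.22: r = 0.5, n = 18, rho = 0.70
z_22 = z_test_against_rho(0.5, 18, 0.70)

print(round(z_21, 2), round(abs(z_22), 3))
print("13.22 accept H0 at 5 per cent:", abs(z_22) < 1.96)
```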
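Example 13.23: Two independent samples of size 23 and 28 pairs of observations were analysed and their coefficients of correlation were found as 0.5 and 0.8, respectively. Do these values differ significantly?
Solution: Take the null hypothesis that the two values do not differ significantly, that is, the samples are drawn from the same population.
Given $n_1 = 23$, $r_1 = 0.5$; $n_2 = 28$, $r_2 = 0.8$. Applying Z-test statistic as follows:
$$Z = \frac{|z_1 - z_2|}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$
where
$$z_1 = 1.1513 \log_{10} \frac{1 + r_1}{1 - r_1} = 1.1513 \log_{10} \frac{1 + 0.5}{1 - 0.5} = 1.1513 \log_{10} 3 = 0.55$$
$$z_2 = 1.1513 \log_{10} \frac{1 + r_2}{1 - r_2} = 1.1513 \log_{10} \frac{1 + 0.8}{1 - 0.8} = 1.1513 \log_{10} 9 = 1.10$$
so that
$$Z = \frac{|0.55 - 1.10|}{\sqrt{\dfrac{1}{20} + \dfrac{1}{25}}} = \frac{0.55}{0.30} = 1.833$$
Since the calculated value of $Z_{cal} = 1.833$ is less than its table value $Z_{\alpha/2} = 1.96$ at 5 per cent significance level, the null hypothesis is accepted. Hence, the difference in correlation values is not significant.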
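The two-sample comparison differs from the one-sample test only in its standard error. Below is a hedged Python sketch, assuming sample sizes of 23 and 28 pairs (the sizes consistent with the stated result of 1.833); the function name is illustrative and not part of the original text.

```python
import math


def z_test_two_samples(r1, n1, r2, n2):
    """Z for the difference of two independent correlation coefficients."""
    # standard error of z1 - z2 for independent samples
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


z = z_test_two_samples(0.5, 23, 0.8, 28)
print(round(abs(z), 3))
print("accept H0 at 5 per cent:", abs(z) < 1.96)
```

The value printed is slightly below the hand-calculated 1.833 because the text rounds $z_1$ and $z_2$ to two decimal places before taking the difference.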
Conceptual Questions 13A
1. What is the meaning of the coefficient of correlation?
2. Explain the meaning and significance of the term correlation. [Delhi Univ., MBA, 2003]
3. What is meant by 'correlation'? Distinguish between positive, negative, and zero correlation. [Ranchi Univ., MA, 2004]
4. What are the numerical limits of r² and r? What does it mean when r equals one? zero? minus one?
5. What is correlation? Clearly explain its role with suitable illustration from simple business problems. [Delhi Univ., MBA, 2005]
6. What is the relationship between the coefficient of determination and the coefficient of correlation? How is the coefficient of determination interpreted?
7. Does correlation always signify a cause-and-effect relationship between the variables? [Osmania Univ., MBA, 2000]
8. What information is provided by the coefficient of correlation of a sample? Why is it necessary to perform a test of a hypothesis for correlation?
9. When the result of a ...
10. What is the t-statistic that is used in a test for correlation? What is meant by the number of degrees of freedom in a test for correlation and how is it used?
11. What is coefficient of rank correlation? Bring out its usefulness. How does this coefficient differ from the coefficient of correlation? [Delhi Univ., MBA, 2006]
12. What is Spearman's rank correlation coefficient? How does it differ from Karl Pearson's coefficient of correlation?
13. (a) What is a scatter diagram? How do you interpret a scatter diagram?
(b) What is a scatter diagram? How does it help in studying the correlation between two variables in respect of both its direction and degree? [Delhi Univ., MBA, 2007]
14. Define correlation coefficient 'r' and give its limitations. What interpretation would you give if told that the correlation between the number of truck accidents per year and the age of the drivers is (-) 0.60 if only drivers with at least one accident are considered?
sation between the price of two,
econ Sample of 60 is 0.68, Could
ina
ease?
‘ue NE “correlated population?
rom X population in which true correlation was 0 gp
‘ ing data give sample size .
wing data give sample sizes and
1 est the significance of the
oF ee values using Fisher's 2
‘commodities
the observed
correlation
difference
transformation,
be Value of +
‘Sample Sie
~s 0.870
12 0.560
a
suapany wants t0 study the relationship between
8 onFenditure (it €1000's) and annual prof tin
860 oF Te following table presents the information
ft ast 8 Yeats.
1988 87 86 85 84 83 82 81
ipepenses: 9 7 510 4 5 3 8
Kanal profit: 45 42 41 60 30 34 25 99
{ fsimate the sample correlation coefficient.
‘iar
Hints and Answers
aera
140.68
ler
216) 2 = 1151S ogg ** = 1.1513 logo TE
1.68
= 1.1513 log) “2% = 0.829
11515 181059
= 013
Sandard error, 6, = ts . e
Test statistic: = 2B = 0820-0
2 0.13
6.38
Since deviation of z from zp is 6 times more than .,
the hypothesis is not correct, that is, population is
correlated.
l+p
t= E41 _ [0829-1099] _ 998 > 2 times
o, 0.13
andard error, p is likely to be less than 0.8.
*Let Hy samples are drawn from the same population.
140.87
4 = 1.1513 logig a = 1.1513 logy 597
“hi
= 1.333
13.254 small retail bus
yepract ce Problems 13¢ CORRELATIONANALYSIS _481
sre
(©) Tes the signicance of correlation coefciet ata
© 5 per cent level o
13.24; eof significance.
: a the least value of rin a sample of 27 pairs from
‘variate normal population at a = 0.05 level of
Significance, where gq, = 2.06 atdf = 25.
4 siness has determined that the
correlation coefficient between monthly expenses and
Profits for the past year, measured at the end of each
month, is = 0.56. Assuming that both expenses and
Profits are approximately normal, test at a = 0.05
level of significance the null hypothesis that there is,
‘no correlation between them.
18.26 The manager of a small shop is hopeful that his sales
are rising significantly week by week. Treating the
sales for the previous six weeks as a typical example
of this rising trend, he recorded them in 2100's and
analysed the results. Has the rise been significant?
Wek: 1 2 3 4 «5 6
Sales + 2.69 262 280 2.70 2.75 2.81
Find the correlation coefficient between sales and
‘week and test it for significance at a = 0.05.
= LisISogig} 2 = 11815 o6F “5g
= 0633
a 7 = 0,895
Since the calculated value Z = 0.895 is less than its
table value Z, = 2.58 at a = 0.01 level of significance,
His accepted.
13.23 (a) r= 0.95 (8) Let Ho: = Oand Hy:r #0
r . 0.95
=P yi(m-2 — YU- 0.95" /8-9
= 7512
447 for df= 6, the Hy is
Since ty = 7.512 > 4
rejected.
rjn-2
St > 2.06
=r
re
or |r |= 0.381
19.25 r = 0.560 and (,,) = 0.576, Hp is rejected.
13.26 1 = 0.656 and f,,, = 0.729, Hy is rejected.[482 carrer‘
_[Formutae used
1, Karl Pearson's correlation coefficient
Covariance between x and y
Covariance between = an
Beem YEU -9"
ion from assumed mean
nSdy d, ~ Ed.) Edy)
© Yaid? ~ Gd.) fndd? - (2d,
4, =2-A.d,=y-B
AB = constants
+ Bivariate frequency distribution
nE fll ~ (© fil.) (© fy)
Vref fa? \nd fi fay
Using actual values of x andy
nEsy- (2s) @y)
nEx? — (Ex) Yn¥y? - (Ey)?
2. Standard error of correlation coefficient, r
* Probable error of correlation coefficient,
2
PE, = 0.674
lchapter Concepts Quiz
True or False
1. [T][F] There are several types of correlation coefficients, the selection of which is determined by the level of scaling of the two variables.
2. [T][F] When both variables are measured on an interval or ratio scale, Pearson's correlation coefficient is most appropriate.
3. [T][F] To use Pearson's correlation coefficient, it is assumed that both variables are continuous and normally distributed.
4. [T][F] When there is no linear association between two variables, the value of r will be close to zero.
5. [T][F] A correlation coefficient r = -1 represents a very low linear correlation.
6. [T][F] The coefficient of determination is the square of the correlation coefficient.
3. Coefficient of determination
Explained variance
“Total variance ~
4. Spearman's rank correlation coeticien, ©
+ Ranks are not equal
6rd?
a(n? 1)
Re
© Ranks are equal
2.
6) xd" +g (h-m)
n(n? 1)
t
Bp
5. Hypothesis testing
+ Population correlation coefficient r for
sample asmal
+ Population correlation coefficient fora la
sample 7
7. [T][F] As the correlation coefficient approaches zero, the possible error in linear prediction increases.
8. [T][F] The closer the correlation coefficient is to zero, the greater the predictive validity of a test.
9. [T][F] If a correlation coefficient for reliability of a test is close to 1, then the test is unreliable.
10. [T][F] Even a high correlation is not necessarily indicative of a causal relationship between two variables.
11. [T][F] As the value of r increases, the proportion of variability of one variable y that can be accounted for by another variable x decreases.
12. [T][F] If the relationship between two variables is nonlinear, the value of the correlation
coefficient must be negative.jonan’s correlation coe
i SPE" one oF both variables
1 oer scaling.
inter iagFAM iS Used 10 het
spe
ad Pt decide
questions
reese
1 er
6 at oof zand y values
| 9 ooship between variables x andy ig
|, eran the scatter diagram? # linear,
yet exactly on a straight
ph yfllonacurve
| OM crepresent population parameters
0 eves represented by a straight line
ct ceationship between x and
the
oy decreases, variable x
a () decreases
(@) changes linearly
(b) must be linear
7 is positive, ag
na nega ;
1 sinereases,y increases
decreases, y decreases
| ward ae
() perinereases,j decreases
| {i both (a) and (b)
‘he lowest strength of association is reflected by
stich ofthe following correlation coefficients?
| oss () ~0.60
| 035 (a) 0.29
|. The highest strength of association is reflected by
hich of the following correlation coefficients?
@)-10 (b) -0.95
oa (@) 0.85
2, There isa high inverse association between measures
‘overweight’ and ‘life expectancy’. A correlation
coeficent consistent with the above statement is:
a) r= 0.80 (b) r= 0.20
(©) r=-0.20 (@) r= -0.80
4% OF the following measurement levels which is the
required evel for the valid calculation of the Pearson
correlation coefficient
6) nominal
(0 internal
(b) ordinal
(@) ratio
‘OF the ‘following measurement levels, which is
tired for the valid calculation of the Spearman
Correlation coefficient?
(@) nominal
f () ordinal
_ fimeral (@) ratio
Teresa high direct association between measures
comgetete smoking’ and ‘lung damage’. ‘The
ration coefficient consistent with the above
Menem
130 (b) 0.80
(80 (030
ger telation coefficient appropriate for
imlshing the degree of correlation between the
“ables (assuming a linear relationship)
he lating
15. cry lear orensanliP between wwo variables
(ry When Glevlating
‘oefficieny Eat
ing ‘Spearman's correlation
ifference ie
ithe timatinesqunchie
cen he means aT of
8 which measure of correlation to
eee ct of data, you should consider
relationship is i
the et tM eationship is linear or nonines 6)
(© bau ey meatal ‘of measurement for each variable
(@) neither (nor
28. The propor
8 INE Proportion of variance acount for bythe level
(ay arrelation between two variables i ealelated by
¥ br
© =x
(@) not possible
29. The value of correlation «x i
ficiem
(3) depends on the origin
(©) depends on the unit of scale
(© depends on both origin and unit of scale
@ Ss itdlependent with respect to origin and unit of
0. Which ofthe fotiowing statements is false?
(@) In a perfect positive correlation, each individual
obtains the same z value on each variable
(0) Spearman's correlation coefficient
(d) A correlation of r = 0.85 implies a stronger
association than r = = 0,70
31. The strength of a linear relationship between two
variables x and y is measured by
@r oe
© R (@) bor by
32, If value of r?=0.64, then what is the coefficient of
correlation?
(a) 0.40 (b) 0.04
(©) 0.80 (@) 0.08
33. If both dependent and independent variables
increase in an estimating equation, then coelficient of|
correlation falls in the range
(a) -1srst ) Osrsi
(©) -38rs3 (€) none of these
34, If unexplained variation between variables x and y is,
0.28, then 1? is,
(a) 0.25 () 0.50
© 075 (@) none of these
35, What type of relationship between the two variablesis
indicated by the sign of r?
{a) direct relation
(©) both (a) and (b)
(by indirect relation
(@) none of these[484 cmprer 3
Concepts Qulz Answers -
TT 27 ST 47 S&F aT TBR
ee oer ae TM) ag gy
Oo mw he ew BO MO BO RO me
ao) 3390) SL@ SO ED © 3 @
Review Self-practice Problems
13.27 The following are the monthly figures of the advertising expenditure and sales of a firm. It is generally found that advertising expenditure has its impact on sales generally after 2 months. Allowing for this time lag, calculate the coefficient of correlation.

Months   Advertising   Sales     Months   Advertising   Sales
         Expenditure                      Expenditure
Jan.          50       1200      July         140       2400
Feb.          60       1500      Aug.         160       2600
March         70       1600      Sep.         170       2800
April         90       2000      Oct.         190       2900
May          120       2200      Nov.         200       3100
June         150       2500      Dec.         250       3900

13.28 The coefficient of correlation between two variables x and y is 0.64. Their covariance is 16. The variance of x is 19. Find the standard deviation of the y series.

13.29 Given r = 0.8, Σxy = 60, σy = 2.5 and Σx² = 90, find the number of observations (items). x and y are deviations from arithmetic mean.
[Delhi Univ., B.Com, 2006]

13.30 Calculate the Karl Pearson's coefficient of correlation between age and playing habits from the data given below. Comment on the value.

Age             :  20   21   22   23   24   25
No. of students : 500  400  300  240  200  160
Regular players : 400  300  180   96   60   24
[Osmania Univ., MA, 2006]

13.31 A survey regarding income and savings provided the following data:

Income (Rs)          Saving (Rs)
               500   1000   1500   2000
4000             8      4      -      -
6000             -     12     24      6
8000             -      9      7      2
10,000           -      -     10      5
12,000           -      -      9      4

Compute Karl Pearson's coefficient of correlation and interpret its value. [Kurukshetra Univ., MBA, 2005]

13.32 A company gives on-the-job training to its salesmen, followed by a test. It is considering whether it should terminate the services of any salesman who does not do well in the test. The following data give the test scores and sales (in Rs 1000s) made by nine salesmen during the last one year.

Test scores : 14  19  24  21  26  22  15  20  19
Sales       : 31  36  48  37  50  45  33  41  39

Compute the coefficient of correlation between test scores and sales. Does it indicate that termination of the services of salesmen with low test scores is justified?

13.33 Calculate the coefficient of correlation and its probable error from the following:

Subject      Per cent Marks in    Per cent Marks in
             Final Year Exams     Sessionals
Hindi               75                  62
English             81                  68
Physics             70                  65
Chemistry           76                  60
Maths               77                  69
Statistics          81                  72
Botany              84                  76
Zoology             75                  72

13.34 Following figures give the rainfall in inches for the year and the production (in 100's kg) for the Rabi crop and Kharif crops. Calculate Karl Pearson's coefficient of correlation between rainfall and total production.

Rainfall          : 20  22  24  26  28  30  32
Rabi production   : 15  18  20  32  40  39  40
Kharif production : 15  17  20  18  20  21  15
[Pune Univ., BA, 2004]

13.35 The president of a consulting firm is interested in the relationship between environmental work factors and the employees' turnover rate. He defines environmental factors as those aspects of a job other than salary and benefits. He visited ten similar plants and gave each plant a rating from 1 to 25 on its environmental factors. He then obtained each plant's turnover rate (annual, in percentage) and examined the relationship.
Environmental rating : 11  19   7  12  13  10  16  22  14  12
Turnover rate        :  6   4   8   3   7   8   3   2   5   6

Compute the correlation coefficient between turnover rate and environmental rating and test it.
[IGNOU, 2004]

13.36 Sixteen companies in a state have been ranked according to profit earned during a particular financial year, and the working capital for that year. Calculate the rank correlation coefficient.

Company   Rank (Profit)   Rank (Working capital)
A               1                 13
B               2                 16
C               3                 14
D               4                 15
E               5                 10
F               6                 12
G               7                  4
H               8                 11
I               9                  5
J              10                  9
K              11                  8
L              12                  3
M              13                  1
N              14                  6
O              15                  7
P              16                  2

13.37 Following are the percentage figures of expenditure incurred on clothing (in Rs 100's) and entertainment (in Rs 100's) by an average working class family in a period of 10 years.

Year                         : 1989  90  91  92  93  94  95  96  97  98
Expenditure on clothing      :   24  27  31  32  20  25  33  30  28  22
Expenditure on entertainment :   11   8   5   3  13  10   2   7   9   2
Compute Spearman's rank correlation coefficient and comment on the result.

Hints and Answers
13.27 r = 0.918
13.29 $(0.8)^2 = \dfrac{(60)^2}{(90n) \times 6.25} = \dfrac{3600}{90n \times 6.25}$, so $n = 10$
13.30 r = -0.991      13.31 r = 0.0522
13.32 r = 0.947       13.33 r = 0.623, PE_r = 0.146
13.34 r = 0.917       13.35 r = -0.801
13.36 R = -0.8176     13.37 R = -0.60
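The printed answers can be checked numerically. The following is a minimal Python sketch under stated assumptions, not the book's own worked solution. The function names are hypothetical. The data are the figures of Problems 13.27 and 13.37 as read from the problem tables, with December sales taken as 3900. The tie-corrected rank formula R = 1 - 6(Σd² + Σ(m³ - m)/12) / (n(n² - 1)) and the probable-error formula PE = 0.6745(1 - r²)/√n are assumed to be the ones intended.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_pearson_r(x, y, lag):
    """Pair x in period t with y in period t + lag, then compute r."""
    return pearson_r(x[:len(x) - lag], y[lag:])


def probable_error(r, n):
    """Probable error of r (assumed formula: 0.6745 (1 - r^2) / sqrt(n))."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's R with the tie correction added to the sum of d^2."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    tie = sum((m ** 3 - m) / 12
              for ranks in (rx, ry)
              for m in Counter(ranks).values() if m > 1)
    return 1 - 6 * (d2 + tie) / (n * (n * n - 1))


# Problem 13.27: monthly figures, Jan. to Dec. (Dec. sales taken as 3900).
advertising = [50, 60, 70, 90, 120, 150, 140, 160, 170, 190, 200, 250]
sales = [1200, 1500, 1600, 2000, 2200, 2500, 2400, 2600, 2800, 2900, 3100, 3900]

# Problem 13.37: expenditure on clothing and entertainment, 1989 to 1998.
clothing = [24, 27, 31, 32, 20, 25, 33, 30, 28, 22]
entertainment = [11, 8, 5, 3, 13, 10, 2, 7, 9, 2]

if __name__ == "__main__":
    print(round(lagged_pearson_r(advertising, sales, 2), 3))
    print(round(spearman_r(clothing, entertainment), 2))
    print(round(probable_error(0.623, 8), 3))
```

With a lag of 2 months, advertising for January to October is paired with sales for March to December, which reproduces the printed r = 0.918. The rank calculation reproduces R = -0.60 for Problem 13.37, and the probable error for r = 0.623 with n = 8 agrees with the printed 0.146.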