CORRELATION
Correlation analysis studies the relationship between two variables. It measures the direction
and intensity of the relationship among variables. Correlation measures covariation, not
causation.
Univariate distributions are those where observations are given on a single variable. For such
a distribution we obtain a measure of central tendency and a measure of dispersion of the set
of values of the variable.
Suppose we have observations on two variables, X and Y, for several individuals. This is a
bivariate distribution. We can calculate the central value and dispersion of each variable
separately. However, to know the association between the two variables we study the
correlation between them.
Correlation between two variables X and Y means that when the value of one variable changes
in one direction, the value of the other variable changes either in the same direction or in the
opposite direction, but in a definite way.
In the words of Croxton and Cowden, ‘when the relationship is of a quantitative nature, the
appropriate statistical tool for discovering and measuring the relationship and expressing it in
a brief formula is known as correlation.’
IMPORTANCE OF CORRELATION
   1. Measurement of relationship- the coefficient of correlation measures the extent of the
      relationship between two variables, e.g. in the study of bivariate distributions.
   2. Facilitates regression- when there is correlation between two variables the value of
      one variable can be estimated given the value of another variable. Regression
      equations can be used to find such values.
   3. Facilitates decision making- decisions are easier to make because we can study how a
      change in one variable is accompanied by a change in the other variable.
   4. Formulation of policies- correlation analysis helps in understanding economic
      behaviour through the relationships that exist between two or more variables, and so
      helps in formulating and implementing policies.
       TYPES OF CORRELATION
       1. On the basis of direction of change
          a) Negative correlation
          b) Positive correlation
       2. On the basis of ratio of change
          a) Linear correlation
          b) Non-linear correlation
       3. On the basis of number of variables
          a) Simple correlation
         b) Partial correlation
         c) Multiple correlation
            1. On the basis of direction of change, correlation can be classified as positive
               or negative. Positive correlation is when both variables move in the
               same direction, i.e. both increase or both decrease. Negative correlation
               is when the variables move in opposite directions, i.e. when one
               decreases the other increases, and vice versa.
            2. The correlation is said to be linear if the ratio of change between the two
               variables is constant. In other words, there is a proportionate change
               in the values of the two variables. If the two sets of data are plotted on
               graph paper, the points fall along a straight line. When every point lies
               exactly on the line, the correlation is also perfect. Linear correlation
               may be positive or negative.
               Correlation is said to be non-linear, or curvilinear, if the ratio of
               change between the two variables is not constant. There is no
               proportionate change in the values of the two variables, and when plotted
               on graph paper the points do not fall along a straight line. It may be
               positive or negative.
            3. In simple correlation we study the relationship between only two
               variables. In partial correlation more than two variables are involved
               in the statistical study, but we study the relationship between only two
               of them while the other variables are assumed to remain constant. In
               multiple correlation more than two variables are involved and the
               relationship among them is studied simultaneously, with none of them
               assumed to be constant.
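As a rough illustration of partial correlation, the sketch below uses the standard first-order formula r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)), which these notes do not state. The function name and the three input coefficients are made up for the example.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant.

    Standard first-order formula (an addition, not taken from the notes):
    r12.3 = (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2))
    """
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlations between three variables.
r12_3 = partial_correlation(r12=0.8, r13=0.6, r23=0.5)
print(round(r12_3, 4))
```

Here the simple correlation between variables 1 and 2 is 0.8, and the partial coefficient shows how much of it remains once the third variable is held constant.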
DEGREE OF CORRELATION
  1. Perfect correlation
  2. Absence of correlation
  3. Limited degree of correlation
     1. Perfect correlation is that where changes in two related variables are exactly
        proportional. If the proportionate changes occur in the same direction, the
        coefficient of correlation is +1. If the proportionate changes occur in the
        opposite direction, the coefficient is -1.
     2. Absence of correlation, or zero correlation, means that the coefficient of
        correlation is 0 and the variables are uncorrelated. It does not necessarily
        imply the absence of any relationship between the two variables, only the
        absence of a linear one. For example, if Y = X squared and X takes the values
        -2, -1, 0, 1, 2, the coefficient is 0 although Y depends entirely on X.
     3. Limited degree of correlation refers to a situation where a change in one variable
        need not be accompanied by an exactly corresponding change in the other variable.
        The coefficient lies between 0 and (+/-)1. This can be further classified as
        a. High degree of correlation- if the coefficient of correlation lies between
           (+/-)0.75 and (+/-)1
        b. Moderate degree of correlation- if the coefficient of correlation lies between
           (+/-)0.25 and (+/-)0.75
        c. Low degree of correlation- if the coefficient of correlation lies between
           0 and (+/-)0.25
        In each case the sign shows whether the correlation is positive or negative.
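The classification above can be sketched as a small lookup. The function name is hypothetical, and because the notes do not say which band a value exactly on a boundary (0.25 or 0.75) belongs to, the sketch assumes it goes to the higher band.

```python
import math


def degree_of_correlation(r):
    """Label a coefficient of correlation r using the bands in the notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "absent"
    if math.isclose(size, 1.0):
        level = "perfect"
    elif size >= 0.75:  # assumption: boundary values go to the higher band
        level = "high"
    elif size >= 0.25:
        level = "moderate"
    else:
        level = "low"
    direction = "positive" if r > 0 else "negative"
    return f"{level} {direction}"


for r in (1.0, 0.82, -0.4, 0.1, 0.0):
    print(r, "->", degree_of_correlation(r))
```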
       METHODS OF ESTIMATING CORRELATION
       1. Scatter Diagram
       2. Karl Pearson’s coefficient of correlation method
       3. Spearman’s rank correlation coefficient method
            1. Scatter diagram is a simple method of studying correlation. It is
               a graphic presentation of the relation between two variables. It shows
               the direction and degree of change between two variables graphically.
               The X variable is taken on the horizontal axis and the Y variable
               on the vertical axis, and one point is plotted for each pair of values.
               After the entire data is plotted, a cluster of points is obtained on the
               graph paper. The trend of the plotted points is observed and the
               direction and degree of relationship between the X and the Y variables
               is studied.
               Scatter diagrams for the following cases show the relationship between
               the two variables and help us to study the trend.
               1. Perfect Positive Correlation
               2. Perfect Negative Correlation
               3. Limited Degree of Correlation
                  a. High degree of positive correlation
                  b. Low degree of positive correlation
                  c. High degree of negative correlation
                  d. Low degree of negative correlation
               4. Absence of correlation or no correlation
               Merits of scatter diagram
               1. It is easy to draw
               2. It helps to determine whether the relationship between X and Y
                  is linear or non-linear
               3. In case of a linear relationship it gives a clear visual picture of
                  the direction and closeness of the relationship
               Demerits of scatter diagram
               1. The degree of association between two variables cannot be
                  determined in numerical terms from a scatter diagram
               2. It is not possible to draw a scatter diagram on graph paper if
                  there are more than two variables
Karl Pearson’s Coefficient of Correlation
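This method overcomes the limitation of the scatter diagram, which does not indicate the
precise magnitude of correlation. It expresses the correlation as a single number, the
coefficient of correlation r, which lies between -1 and +1.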
Calculating with Actual mean method
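The notes name the actual mean method without giving its formula. A minimal sketch, assuming the standard form r = Σxy / sqrt(Σx² × Σy²), where x and y are the deviations of X and Y from their arithmetic means. The function name and the data are invented for illustration.

```python
import math


def pearson_actual_mean(xs, ys):
    """Pearson's r from deviations taken about the actual means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations x = X - mean of X
    dy = [y - mean_y for y in ys]  # deviations y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    # Undefined (division by zero) if either series is constant.
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Hypothetical data: means are 3 and 4, so the deviations are whole numbers.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_actual_mean(X, Y)
print(round(r, 4))
```

For this data Σxy = 6, Σx² = 10 and Σy² = 6, so r = 6 / sqrt(60), a high degree of positive correlation.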
Calculating with direct method
   1.   We calculate the sum of the values of series X and get summation X
   2.   We calculate the sum of the values of series Y and get summation Y
   3.   We find the squares of the values of series X and get summation X square
   4.   We find the squares of the values of series Y and get summation Y square
   5.   We find the products of the corresponding values of series X and series Y and get
        summation XY
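The five sums in the steps above are then substituted into the direct-method formula, which the notes leave out. A minimal sketch, assuming the standard form r = (NΣXY - ΣXΣY) / sqrt((NΣX² - (ΣX)²)(NΣY² - (ΣY)²)); the function name and data are illustrative only.

```python
import math


def pearson_direct(xs, ys):
    """Pearson's r by the direct method, using raw sums only."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    n = len(xs)
    sum_x = sum(xs)                                # step 1: summation X
    sum_y = sum(ys)                                # step 2: summation Y
    sum_x2 = sum(x * x for x in xs)                # step 3: summation X square
    sum_y2 = sum(y * y for y in ys)                # step 4: summation Y square
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # step 5: summation XY
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data: sums are 30, 15, 220, 55 and 106.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]
r = pearson_direct(X, Y)
print(round(r, 4))
```

With these figures the numerator is 5 × 106 - 30 × 15 = 80 and the denominator is sqrt(200 × 50) = 100, so r = 0.8.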
        Merits of Karl Pearson’s Coefficient of Correlation
        1. It gives an exact measurement of the correlation between the variables
        2. It expresses both the direction and the degree of change in the two variables
        3. Interpretation of results becomes easy as this method gives the precise magnitude of
           the correlation
        Demerits of Karl Pearson’s Coefficient of Correlation
        1. The method cannot be used for qualitative attributes that cannot be measured
           numerically
        2. Since the method is based on the arithmetic mean, the value of the coefficient is
           strongly affected by extreme items
                  Spearman’s Rank Correlation
                  This method is used when the variables cannot be measured in quantitative
                  terms. It provides a measure of linear association between the ranks
                  assigned to individuals according to the quality of their attributes.
                  There are three types of problems that we need to cover here
                  1. When ranks are given
                  2. When ranks are not given
                  3. When ranks are repeated
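The three cases above can be handled by one routine. This is a sketch under stated assumptions: it uses the standard formula R = 1 - 6ΣD² / (N³ - N), ranks the largest value as 1, gives tied values the average of the ranks they occupy, and adds the usual textbook correction (m³ - m) / 12 to ΣD² for each group of m repeated ranks. None of these formulas appear in the notes, and the function names and data are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values (largest = 1); tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(ranks):
    """Sum of (m^3 - m) / 12 over every group of m repeated ranks."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)


def spearman_rank(rank_x, rank_y):
    """Rank correlation from two lists of ranks, with the tie correction."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    sum_d2 += tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Case 1: ranks are given.
given = spearman_rank([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

# Case 2: ranks are not given, so the values are ranked first.
not_given = spearman_rank(average_ranks([50, 60, 70, 80]),
                          average_ranks([30, 45, 40, 60]))

# Case 3: ranks are repeated (20 occurs twice and gets rank 2.5 each time).
repeated = spearman_rank(average_ranks([10, 20, 20, 30]),
                         average_ranks([1, 2, 3, 4]))

print(given, not_given, repeated)
```

When no ranks are repeated the correction term is zero, so the same function covers all three cases.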
                  Merits
                  1. It is a simple and easy method of calculating the coefficient of correlation
                  2. It is suitable for qualitative attributes such as honesty and intelligence
                  3. When the data contain extreme values, the simple correlation
                     coefficient, which is based on the arithmetic mean and standard
                     deviation, may be quite different from the coefficient calculated
                     without those values. Here rank correlation provides a better
                     alternative to simple correlation.
                  Demerits
                  1. If there is a large number of observations the method is not suitable
                  2. This method has limited use when a grouped frequency distribution is given
                  3. This method ignores the actual magnitude of the data, so the results
                     are not exact because the effect of the actual values is not studied
The line of best fit is the line that passes through the scattered points so that it represents
most of them. Roughly half of the scattered points should lie on either side of the line.
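The notes describe the line of best fit as a line drawn through the scatter. One common way to fix it numerically, not mentioned in the notes, is the least-squares line of Y on X; the sketch below swaps in that technique and then counts the points on each side of the line. The function name and data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line Y = intercept + slope * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(X, Y)

# Count the points lying above and below the fitted line.
above = sum(1 for x, y in zip(X, Y) if y > intercept + slope * x)
below = sum(1 for x, y in zip(X, Y) if y < intercept + slope * x)
print(round(slope, 2), round(intercept, 2), above, below)
```

For this data the line is roughly Y = 2.2 + 0.6X, with two points above it and three below, in keeping with the rule of thumb that about half the points fall on either side.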