Rajib Dolai
Correlation and Regression                                                https://rajib1.weebly.com/
    Correlation is concerned with measuring the ‘strength of association’ between variables,
    while regression is concerned with the ‘prediction’ of the most likely value of one variable when the
     value of the other variable is known.
      When statistical data relate to the simultaneous measurement of two variables, each pair of
       observations can be represented geometrically as a point; this representation is known as a
       scatter diagram.
      The word correlation is used to denote the degree of association between variables.
      If y tends to increase as x increases, the variables are said to be positively correlated.
      If y tends to decrease as x increases, the variables are said to be negatively correlated.
      If the values of y are not affected by changes in the values of x, the variables are said to be
       uncorrelated.
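   $$\operatorname{Cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\,\bar{y}$$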
    Variance is always non-negative, whereas covariance may be positive, negative or zero.
    If x and y are two independent variables, their covariance is zero, i.e. Cov(x, y) = 0.
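Both covariance formulas can be checked on a small data set. The following is a minimal Python sketch, assuming the divide-by-n definition used in these notes; the helper names and the data values are illustrative, not taken from the original.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


def cov_definition(x, y):
    # Cov(x, y) = (1/n) * sum of (x_i - x_bar)(y_i - y_bar)
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)


def cov_shortcut(x, y):
    # Cov(x, y) = (1/n) * sum of x_i * y_i, minus x_bar * y_bar
    return sum(xi * yi for xi, yi in zip(x, y)) / len(x) - mean(x) * mean(y)


# Illustrative data (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

c1 = cov_definition(x, y)
c2 = cov_shortcut(x, y)
assert isclose(c1, c2)          # both forms agree, up to floating-point rounding
print(c1, c2)
print(cov_definition(x, x))     # Cov(x, x) is the variance of x, never negative
```

For this data both forms give about 1.2, and Cov(x, x) = 2.0 is the variance of x.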
ASSUMPTIONS:
   1. X and Y have a linear relationship.
   2. Both variables should be normally distributed.
   3. Homoscedasticity: the spread of one variable is the same across the values of the other.
   $$r = \frac{\operatorname{Cov}(x,y)}{\sigma_x\,\sigma_y}$$
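The definition of r translates directly into code. A minimal Python sketch, again assuming the divide-by-n covariance of these notes and illustrative data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    # Population covariance: (1/n) * sum of (x_i - x_bar)(y_i - y_bar)
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)


def pearson_r(x, y):
    # r = Cov(x, y) / (sigma_x * sigma_y), where sigma_x^2 = Cov(x, x)
    return cov(x, y) / (sqrt(cov(x, x)) * sqrt(cov(y, y)))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
assert -1 <= r <= 1
print(round(r, 4))
```

This prints 0.7746. Because the 1/n factors cancel in the ratio, using the sample (n - 1) definitions of covariance and standard deviation gives the same r.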
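    r is independent of change of both origin and scale of observation: if
   $$u = \frac{x - a}{h}, \qquad v = \frac{y - a'}{h'}, \qquad h, h' > 0,$$
    then the correlation coefficient between x and y equals the correlation coefficient between u and v,
    i.e. $r_{xy} = r_{uv}$. (If h and h' have opposite signs, r changes sign but not magnitude.)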
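The invariance of r under a change of origin and scale can be demonstrated numerically. A small Python sketch; the origins `a`, `a_prime` and scales `h`, `h_prime` are arbitrary illustrative values, and `pearson_r` is a hypothetical helper.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    # r = S_xy / sqrt(S_xx * S_yy); the 1/n factors of Cov and sigma cancel
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Illustrative origins and positive scales: u = (x - a)/h, v = (y - a')/h'
a, h = 3, 2
a_prime, h_prime = 4, 0.5
u = [(xi - a) / h for xi in x]
v = [(yi - a_prime) / h_prime for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
assert isclose(r_xy, r_uv)      # same r after change of origin and scale

# With scales of opposite sign, only the sign of r changes
v_neg = [(yi - a_prime) / -h_prime for yi in y]
assert isclose(pearson_r(u, v_neg), -r_xy)
print(r_xy, r_uv)
```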
    r is a pure number: it is unit-free.
    r lies between -1 and +1, i.e. $-1 \le r \le +1$.
      When r = +1, there is perfect positive correlation between the variables.
      When r = -1, there is perfect negative correlation between the variables.
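 r is a measure of the degree of (linear) association between two variables.
 The correlation coefficient is due to Karl Pearson (Pearson's product-moment correlation coefficient).
 If two variables are independent, their correlation coefficient is zero. But the converse is not true:
  r = 0 only rules out a linear relationship.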
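A standard counterexample shows why the converse fails. In this Python sketch (illustrative data, hypothetical helper `pearson_r`), y is completely determined by x through y = x², yet r is exactly zero because the relationship is not linear.

```python
from math import sqrt


def pearson_r(x, y):
    # r = S_xy / sqrt(S_xx * S_yy)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# y depends entirely on x, so x and y are NOT independent ...
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

# ... yet the contributions of negative and positive x cancel, giving r = 0
r = pearson_r(x, y)
print(r)   # 0.0
```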
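   Total variation = Unexplained variation + Explained variation
   Dividing both sides by the total variation,
   $$1 = \frac{\text{Unexplained variation (UV)}}{\text{Total variation (TV)}} + \frac{\text{Explained variation (EV)}}{\text{Total variation (TV)}}$$
    Now, $r^2$ (the coefficient of determination) is
   $$r^2 = \frac{\text{Explained variation (EV)}}{\text{Total variation (TV)}}$$
                When $r^2 = 1$, TV = EV and UV = 0.
                When $r^2 = 0$, EV = 0.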
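The decomposition can be verified by fitting the least-squares line of y on x and splitting the total variation. A minimal Python sketch with illustrative data; it assumes "variation" means a sum of squared deviations about the mean.

```python
from math import isclose

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line of y on x: y_hat = y_bar + b_yx * (x - x_bar)
b_yx = s_xy / s_xx
y_hat = [y_bar + b_yx * (xi - x_bar) for xi in x]

tv = s_yy                                              # total variation
ev = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
uv = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
r_squared = s_xy ** 2 / (s_xx * s_yy)

assert isclose(tv, ev + uv)         # TV = UV + EV
assert isclose(ev / tv, r_squared)  # r^2 = EV / TV
print(tv, ev, uv, r_squared)
```

For this data TV = 6, EV is about 3.6 and UV about 2.4, so r² = 0.6: 60% of the variation in y is explained by the regression on x.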
   The sign of r only indicates whether x and y move in the same direction or in opposite directions, but $r^2$ is
    always non-negative.
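   Coefficient of Non-Determination:
   $$K^2 = \frac{\text{Total variation} - \text{Explained variation}}{\text{Total variation}} = 1 - \frac{\text{EV}}{\text{TV}} = 1 - r^2$$
   Coefficient of Alienation: $K = \pm\sqrt{1 - r^2}$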
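Given r, the three coefficients follow by simple arithmetic. A short Python sketch; `determination_measures` is a hypothetical name, and it returns the non-negative root for K where the notes write ±.

```python
from math import sqrt


def determination_measures(r):
    # Returns (r^2, K^2, K) for a correlation coefficient r in [-1, 1]
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    r_squared = r ** 2          # coefficient of determination
    k_squared = 1 - r_squared   # coefficient of non-determination
    k = sqrt(k_squared)         # coefficient of alienation (non-negative root of the ±)
    return r_squared, k_squared, k


print(determination_measures(0.6))
```

For r = 0.6 this gives r² = 0.36, K² = 0.64 and K = 0.8, up to floating-point rounding.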
   The correlation coefficient is a symmetric function of x and y, i.e. $r_{xy} = r_{yx}$. But the regression
    coefficients are not symmetric functions of x and y, i.e. in general $b_{yx} \neq b_{xy}$.
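   $$b_{yx} = \frac{\operatorname{Cov}(x,y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = \frac{\operatorname{Cov}(x,y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y}$$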
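The two regression coefficients, and the fact that they are not symmetric while r is, can be checked with a short Python sketch. The helper `regression_coefficients` and the data are illustrative assumptions, using the divide-by-n definitions of these notes.

```python
from math import isclose, sqrt


def regression_coefficients(x, y):
    # Returns (b_yx, b_xy, r, sigma_x, sigma_y) with divide-by-n definitions
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    var_y = sum((yi - y_bar) ** 2 for yi in y) / n
    r = cov_xy / sqrt(var_x * var_y)
    b_yx = cov_xy / var_x   # regression coefficient of y on x
    b_xy = cov_xy / var_y   # regression coefficient of x on y
    return b_yx, b_xy, r, sqrt(var_x), sqrt(var_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r, sigma_x, sigma_y = regression_coefficients(x, y)
assert isclose(b_yx, r * sigma_y / sigma_x)   # b_yx = r * sigma_y / sigma_x
assert isclose(b_xy, r * sigma_x / sigma_y)   # b_xy = r * sigma_x / sigma_y

# Swapping the roles of x and y leaves r unchanged but swaps the coefficients
b_yx_swapped, b_xy_swapped, r_swapped, _, _ = regression_coefficients(y, x)
assert isclose(r_swapped, r)
assert isclose(b_yx_swapped, b_xy)
print(b_yx, b_xy)
```

Here $b_{yx} = 0.6$ and $b_{xy} = 1.0$, so the two coefficients clearly differ.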
1. $y - \bar{y} = b_{yx}(x - \bar{x})$
    $x - \bar{x} = b_{xy}(y - \bar{y})$
    where $b_{yx}$ and $b_{xy}$ are respectively the regression coefficient of y on x and the regression
    coefficient of x on y.
2. The product of the two regression coefficients is equal to the square of the correlation coefficient:
        $b_{yx} \cdot b_{xy} = r^2$
3. r, $b_{yx}$ and $b_{xy}$ all have the same sign. If the correlation coefficient r is zero, the regression coefficients
         $b_{yx}$ and $b_{xy}$ are also zero.
4. The regression lines always intersect at the point $(\bar{x}, \bar{y})$. The slopes of the regression line of y on x and
    the regression line of x on y are respectively $b_{yx}$ and $1/b_{xy}$.
5. The angle between the two regression lines depends on the correlation coefficient r. When r = 0, the
    two lines are perpendicular to each other; when r = +1 or r = -1, they coincide. As r increases
    numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
6. The two regression equations are usually different. However, when r = ±1, they become identical,
    and in this case there is an exact linear relationship between the variables. When r = 0, the regression
    equations reduce to $y = \bar{y}$ and $x = \bar{x}$, and neither y nor x can be estimated from the linear regression
    equations.
7. If the variables are uncorrelated, i.e. r = 0, then the lines are perpendicular.
8. If one of the regression coefficients is numerically greater than one, the other must be numerically less than one
    (since $b_{yx}\,b_{xy} = r^2 \le 1$).
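9. The A.M. of the regression coefficients, $(b_{yx} + b_{xy})/2$, is numerically greater than or equal to the correlation
    coefficient: $\left|(b_{yx} + b_{xy})/2\right| \ge |r|$ (for r > 0 this reads $(b_{yx} + b_{xy})/2 \ge r$).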
10. Regression coefficients are independent of change of origin but not of scale.
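Several of the properties above can be checked numerically on one data set. The following Python sketch is illustrative only: the data and the helper `regression_summary` are assumptions, and passing checks on one example is a sanity check, not a proof.

```python
from math import isclose, sqrt


def regression_summary(x, y):
    # Returns (x_bar, y_bar, b_yx, b_xy, r); the 1/n factors cancel in each ratio
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, s_xy / s_xx, s_xy / s_yy, s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy, r = regression_summary(x, y)

# Property 2: b_yx * b_xy = r^2
assert isclose(b_yx * b_xy, r ** 2)

# Property 3: r, b_yx and b_xy share the same sign
assert (r > 0) == (b_yx > 0) == (b_xy > 0)

# Property 4: solve y = c1 + b_yx x and x = c2 + b_xy y together (needs r^2 != 1)
c1 = y_bar - b_yx * x_bar
c2 = x_bar - b_xy * y_bar
x_meet = (c2 + b_xy * c1) / (1 - b_yx * b_xy)
y_meet = c1 + b_yx * x_meet
assert isclose(x_meet, x_bar) and isclose(y_meet, y_bar)

# Property 8: both coefficients cannot exceed one in absolute value
assert not (abs(b_yx) > 1 and abs(b_xy) > 1)

# Property 9: |(b_yx + b_xy) / 2| >= |r|
assert abs((b_yx + b_xy) / 2) >= abs(r)

# Property 10: a change of origin leaves b_yx unchanged; a change of scale does not
b_shifted = regression_summary([xi - 10 for xi in x], [yi + 7 for yi in y])[2]
b_rescaled = regression_summary([2 * xi for xi in x], y)[2]
assert isclose(b_shifted, b_yx)
assert isclose(b_rescaled, b_yx / 2)
print("properties 2, 3, 4, 8, 9 and 10 hold for this data")
```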
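 Correlation need not imply a cause-and-effect relationship between the variables. Regression analysis, by
  contrast, treats one variable as dependent on the other and is used to study a presumed cause-and-effect
  relationship, although a fitted regression does not by itself prove causation.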
Example 1:
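 Let the two regression lines be given as 3x = 10 + 5y and 4y = 5 + 15x. Then the correlation
    coefficient between x and y is …….
    Suppose the first line is the regression of x on y and the second is the regression of y on x:
       $x = \frac{10}{3} + \frac{5}{3}\,y$   (1)
       $y = \frac{5}{4} + \frac{15}{4}\,x$   (2)
   $$r^2 = b_{xy}\,b_{yx} = \frac{5}{3} \times \frac{15}{4} = \frac{25}{4} = 6.25 > 1$$
    (i.e. |r| = 5/2 = 2.5), which is impossible.
    So the 1st equation must be the regression of y on x and the 2nd the regression of x on y, which become
       $y = -\frac{10}{5} + \frac{3}{5}\,x$   and   $x = -\frac{5}{15} + \frac{4}{15}\,y$
   $$r^2 = b_{yx}\,b_{xy} = \frac{3}{5} \times \frac{4}{15} = \frac{4}{25} = 0.16 < 1, \qquad r = \frac{2}{5} = 0.4$$
    Since both regression coefficients are positive, r is positive, so the answer is 0.4.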
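The arithmetic of Example 1 can be checked with exact fractions. A small Python sketch using the standard `fractions` module; the variable names are illustrative, and it relies on the fact that r has the same sign as the regression coefficients.

```python
from fractions import Fraction
from math import isqrt

# Lines from Example 1: 3x = 10 + 5y and 4y = 5 + 15x

# Trial 1: first line as x on y, second as y on x
b_xy_trial = Fraction(5, 3)     # x = 10/3 + (5/3) y
b_yx_trial = Fraction(15, 4)    # y = 5/4 + (15/4) x
r_sq_trial = b_xy_trial * b_yx_trial
print(r_sq_trial, r_sq_trial > 1)   # 25/4 True, impossible since r^2 <= 1

# Trial 2: first line as y on x, second as x on y
b_yx = Fraction(3, 5)           # y = -2 + (3/5) x
b_xy = Fraction(4, 15)          # x = -1/3 + (4/15) y
r_sq = b_yx * b_xy

# r^2 = 4/25 is a ratio of perfect squares, so take the exact root;
# r is positive because both regression coefficients are positive
r = Fraction(isqrt(r_sq.numerator), isqrt(r_sq.denominator))
assert r * r == r_sq
print(r_sq, r, float(r))   # 4/25 2/5 0.4
```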
Example 2:
 In a two-variable regression, Y is the dependent variable and X is the independent variable. The correlation
   coefficient between Y and X is 0.6. What proportion of the variation in Y is explained by X?
   Y = a + bX,                   where Y = dependent variable
                                        X = independent variable
      Here r = 0.6, so
          r² = 0.36.
      So 36% of the variation in Y is explained by X.