Unit-2 Linear Regression Numericals
Linear regression is the most basic and commonly used predictive analysis. One variable is considered to
be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler
might want to relate the weights of individuals to their heights using a linear regression model.
Several related regression analyses are available to the researcher:
Simple linear regression
    • One dependent variable (interval or ratio)
    • One independent variable (interval or ratio or dichotomous)
Multiple linear regression
    • One dependent variable (interval or ratio)
    • Two or more independent variables (interval or ratio or dichotomous)
Logistic regression
    • One dependent variable (binary)
    • Two or more independent variables (interval or ratio or dichotomous)
Ordinal regression
    • One dependent variable (ordinal)
    • One or more independent variable(s) (nominal or dichotomous)
Multinomial regression
    • One dependent variable (nominal)
    • One or more independent variable(s) (interval or ratio or dichotomous)
Discriminant analysis
    • One dependent variable (nominal)
    • One or more independent variable(s) (interval or ratio)
The formula for the linear regression equation is:
    y = a + bx
a and b are given by the following formulas:
    b (slope) = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²)
    a (intercept) = (ΣyΣx² − ΣxΣxy) / (nΣx² − (Σx)²)
Where,
x and y are the two variables on the regression line.
b = Slope of the line.
a = y-intercept of the line.
x = Values of the first data set.
y = Values of the second data set.
n = Number of (x, y) data pairs.
Solved Examples
Question: Find the linear regression equation for the following two sets of data:
        x               2                4               6                  8
        y               3                7               5                  10
Solution:
Construct the following table:
          x                y      x²      xy
          2                3      4       6
          4                7      16      28
          6                5      36      30
          8                10     64      80
          Σx = 20          Σy = 25   Σx² = 120   Σxy = 144
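b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²)
  = (4×144 − 20×25) / (4×120 − 20²)
  = 76 / 80
b = 0.95
a = (ΣyΣx² − ΣxΣxy) / (nΣx² − (Σx)²)
  = (25×120 − 20×144) / (4×120 − 20²)
  = 120 / 80
a = 1.5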
Linear regression is given by:
y = a + bx
y = 1.5 + 0.95 x
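The table-based calculation above can be checked with a short script. The following is a minimal sketch in Python; the function name fit_line is an assumption made for illustration and does not come from these notes. It applies the two formulas for a and b directly to the column sums.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2                  # n*Σx² − (Σx)²
    b = (n * sum_xy - sum_x * sum_y) / denom         # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom    # y-intercept
    return a, b


a, b = fit_line([2, 4, 6, 8], [3, 7, 5, 10])
print(f"y = {a} + {b} x")
```

The denominator is zero when all x values are equal, so a fuller version would need to guard against that case.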
Linear Regression Problems with Solutions
Linear regression and modelling problems are presented below, followed by their solutions. A linear
regression calculator and grapher may also be used to check answers and create more opportunities
for practice.
Review
If the plot of n pairs of data (x, y) for an experiment appears to indicate a "linear relationship" between y
and x, then the method of least squares may be used to write a linear relationship between x and y.
The least squares regression line is the line that minimizes the sum of the squares (d1² + d2² + d3² + d4²) of
the vertical deviations from each data point to the line (see the figure below as an example with 4 points).
Figure 1. Linear regression where the sum of the squared vertical distances d1² + d2² + d3² + d4² between
observed and predicted (line and its equation) values is minimized.
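The least squares regression line for the set of n data points is given by the equation of a line in
slope-intercept form:
y = ax + b
where a and b are given by
a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
b = (1/n)(Σy - aΣx)
Figure 2. Formulas for the constants a and b included in the linear regression.
Note that in this part a denotes the slope and b the y-intercept, the reverse of the notation y = a + bx used
in the first part of these notes.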
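To make the "minimizes the sum of the squares" statement concrete, here is a small illustrative sketch in Python. It reuses the four data points and the fitted line y = 1.5 + 0.95x from the solved example earlier in these notes. The helper name sum_squared_deviations and the step size 0.1 are assumptions chosen for illustration. The script compares the fitted line with eight nearby lines.

```python
def sum_squared_deviations(points, intercept, slope):
    """Sum of squared vertical deviations of the points from y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)


points = [(2, 3), (4, 7), (6, 5), (8, 10)]   # data of the first solved example
best = sum_squared_deviations(points, 1.5, 0.95)

# Nudge the intercept and the slope by +/-0.1 in every combination
# (the step size is an arbitrary choice for this illustration).
nearby = [
    sum_squared_deviations(points, 1.5 + di, 0.95 + ds)
    for di in (-0.1, 0.0, 0.1)
    for ds in (-0.1, 0.0, 0.1)
    if (di, ds) != (0.0, 0.0)
]

print("fitted line:", best)
print("best nearby line:", min(nearby))
```

Every nudged line gives a larger sum of squared deviations than the fitted line, which is what the least squares criterion requires.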
    •   Problem 1
        Consider the following set of points: {(-2 , -1) , (1 , 1) , (3 , 2)}
        a) Find the least square regression line for the given data points.
        b) Plot the given points and the regression line in the same rectangular system of axes.
    •   Problem 2
        a) Find the least square regression line for the following set of data
        {(-1 , 0),(0 , 2),(1 , 4),(2 , 5)}
        b) Plot the given points and the regression line in the same rectangular system of axes.
    •   Problem 3
        The values of x and their corresponding values of y are shown in the table below
        x               0                1           2              3              4
        y               2                3           5              4              6
        a) Find the least square regression line y = a x + b.
        b) Estimate the value of y when x = 10.
    •   Problem 4
        The sales of a company (in million dollars) for each year are shown in the table below.
        x (year)                2005         2006        2007           2008           2009
        y (sales)               12           19          29             37             45
        a) Find the least square regression line y = a x + b.
        b) Use the least square regression line as a model to estimate the sales of the company in 2012.
Solutions to the Above Problems
   1. a) Let us organize the data in a table.
        x                   y                   xy                     x²
        -2                  -1                  2                      4
        1                   1                   1                      1
        3                   2                   6                      9
        Σx = 2              Σy = 2              Σxy = 9                Σx² = 14
        We now use the above formulas to calculate a and b as follows
        a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (3*9 - 2*2) / (3*14 - 2²) = 23/38
        b = (1/n)(Σy - aΣx) = (1/3)(2 - (23/38)*2) = 5/19
        b) We now graph the regression line given by y = a x + b and the given points.
        Figure 3. Graph of linear regression in problem 1.
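The fractions 23/38 and 5/19 can be confirmed with exact rational arithmetic. The sketch below uses Python's standard fractions module; the function name fit_line_exact is a hypothetical helper, not something defined in these notes.

```python
from fractions import Fraction


def fit_line_exact(points):
    """Return exact (a, b) for the least squares line y = a*x + b."""
    n = len(points)
    sum_x = sum(Fraction(x) for x, _ in points)
    sum_y = sum(Fraction(y) for _, y in points)
    sum_xy = sum(Fraction(x) * y for x, y in points)
    sum_x2 = sum(Fraction(x) * x for x, _ in points)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    b = (sum_y - a * sum_x) / n                                    # y-intercept
    return a, b


a, b = fit_line_exact([(-2, -1), (1, 1), (3, 2)])
print(a, b)   # prints: 23/38 5/19
```

Working with fractions avoids rounding, so the result can be compared directly with the hand calculation.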
   2. a) We use a table as follows
        x                 y                     xy                         x²
        -1                0                     0                          1
        0                 2                     0                          0
        1                 4                     4                          1
        2                 5                     10                         4
        Σx = 2            Σy = 11               Σxy = 14                   Σx² = 6
        We now use the above formulas to calculate a and b as follows
        a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (4*14 - 2*11) / (4*6 - 2²) = 17/10 = 1.7
        b = (1/n)(Σy - aΣx) = (1/4)(11 - 1.7*2) = 1.9
        b) We now graph the regression line given by y = a x + b and the given points.
        Figure 4. Graph of linear regression in problem 2.
   3. a) We use a table to calculate a and b.
        x                      y                 xy                       x²
        0                      2                 0                        0
        1                      3                 3                        1
        2                      5                 10                       4
        3                      4                 12                       9
        4                      6                 24                       16
        Σx = 10                Σy = 20           Σxy = 49                 Σx² = 30
        We now calculate a and b using the least square regression formulas for a and b.
        a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (5*49 - 10*20) / (5*30 - 10²) = 0.9
        b = (1/n)(Σy - aΣx) = (1/5)(20 - 0.9*10) = 2.2
        b) Now that we have the least square regression line y = 0.9 x + 2.2, substitute x by 10 to find the
        value of the corresponding y.
        y = 0.9 * 10 + 2.2 = 11.2
   4. a) We first change the variable x into t such that t = x - 2005, so that t represents the
        number of years after 2005. Using t instead of x makes the numbers smaller and therefore
        more manageable. The table of values becomes:
        t (years after 2005)                               0      1       2        3     4
        y (sales)                                          12     19      29       37    45
        We now use the table to calculate a and b included in the least square regression line formula.
        t                  y                      ty                          t²
        0                  12                     0                           0
        1                  19                     19                          1
        2                  29                     58                          4
        3                  37                     111                         9
        4                  45                     180                         16
        Σt = 10            Σy = 142               Σty = 368                   Σt² = 30
        We now calculate a and b using the least square regression formulas for a and b.
        a = (nΣty - ΣtΣy) / (nΣt² - (Σt)²) = (5*368 - 10*142) / (5*30 - 10²) = 8.4
        b = (1/n)(Σy - aΣt) = (1/5)(142 - 8.4*10) = 11.6
        b) In 2012, t = 2012 - 2005 = 7
        The estimated sales in 2012 are: y = 8.4 * 7 + 11.6 = 70.4 million dollars.
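The change of variable t = x - 2005 only shifts the horizontal axis, so it should leave the slope and the 2012 estimate unchanged. The following Python sketch checks this by fitting the line twice, once on t and once on the raw years. The helper name fit_line is an assumption for illustration, and exact fractions are used so that the large year values cause no rounding.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Return exact (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


years = [2005, 2006, 2007, 2008, 2009]
sales = [12, 19, 29, 37, 45]

# Fit on shifted years (t = x - 2005) and on the raw years.
slope_t, intercept_t = fit_line([x - 2005 for x in years], sales)
slope_x, intercept_x = fit_line(years, sales)

estimate_t = slope_t * (2012 - 2005) + intercept_t
estimate_x = slope_x * 2012 + intercept_x

print(float(slope_t), float(intercept_t), float(estimate_t))
print(float(slope_x), float(intercept_x), float(estimate_x))
```

Only the intercept differs between the two fits, because it is measured at t = 0 in one case and at x = 0 in the other.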
Example 9.9
Calculate the regression coefficients and obtain the lines of regression for the following data
Solution:
Regression coefficient of X on Y
(i) Regression equation of X on Y
(ii) Regression coefficient of Y on X
(iii) Regression equation of Y on X
Y = 0.929X–3.716+11
= 0.929X+7.284
The regression equation of Y on X is Y= 0.929X + 7.284
Example 9.10
Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations
from the actual means of X and Y.
Estimate the likely demand when the price is Rs.20.
Solution:
Calculation of Regression equation
(i) Regression equation of X on Y
(ii) Regression Equation of Y on X
When X is 20, Y will be
Y = –0.25(20) + 44.25
  = –5 + 44.25
  = 39.25 (when the price is Rs. 20, the likely demand is 39.25)
Example 9.11
Obtain regression equation of Y on X and estimate Y when X=55 from the following
Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X
Y – 51.57 = 0.942(X – 48.29)
Y = 0.942X – 45.49 + 51.57
Y = 0.942X + 6.08
The regression equation of Y on X is Y = 0.942X + 6.08
Estimation of Y when X = 55
Y = 0.942(55) + 6.08 = 57.89
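The step from the form Y – 51.57 = 0.942(X – 48.29) to the form Y = 0.942X + 6.08 is a small piece of arithmetic that can be sketched in Python. The function name line_through_means is a hypothetical helper. The means and the regression coefficient are the values quoted in the solution above, since the original data table is not reproduced here.

```python
def line_through_means(mean_x, mean_y, b_yx):
    """Rewrite Y - mean_y = b_yx * (X - mean_x) as Y = slope*X + intercept."""
    slope = b_yx
    intercept = mean_y - b_yx * mean_x
    return slope, intercept


slope, intercept = line_through_means(48.29, 51.57, 0.942)
estimate = slope * 55 + intercept   # estimated Y when X = 55

print(round(intercept, 2), round(estimate, 2))
```

The script keeps the unrounded intercept until the final step; rounding to two decimals gives the same figures as the worked solution.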
Example 9.12
Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:
2Y–X–50 = 0
3Y–2X–10 = 0.
Solution:
We are given
2Y – X – 50 = 0 ... (1)
3Y – 2X – 10 = 0 ... (2)
Solving equations (1) and (2),
we get Y = 90.
Putting the value of Y in equation (1),
we get X = 130.
Calculating correlation coefficient
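Let us assume equation (1) to be the regression equation of Y on X
2Y = X + 50
Y = 0.5X + 25, so the regression coefficient of Y on X is bYX = 0.5
Let us assume equation (2) to be the regression equation of X on Y
2X = 3Y – 10
X = 1.5Y – 5, so the regression coefficient of X on Y is bXY = 1.5
r = √(bYX × bXY) = √(0.5 × 1.5) = √0.75 = 0.866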
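The means and the correlation coefficient in this example can be cross-checked numerically. The sketch below is written in Python under two stated assumptions: each equation is stored as a hypothetical triple (p, q, c) meaning pX + qY + c = 0, and equation (1) is taken as the regression of Y on X with equation (2) as the regression of X on Y. The two regression lines intersect at the point of means, and r is the square root of the product of the two regression coefficients.

```python
from fractions import Fraction
from math import sqrt

# Each equation is stored as (p, q, c), meaning p*X + q*Y + c = 0.
eq1 = (-1, 2, -50)   # 2Y - X - 50 = 0
eq2 = (-2, 3, -10)   # 3Y - 2X - 10 = 0


def intersection(e1, e2):
    """Solve the two lines for (X, Y) using Cramer's rule."""
    (p1, q1, c1), (p2, q2, c2) = e1, e2
    det = p1 * q2 - p2 * q1
    return Fraction(q1 * c2 - q2 * c1, det), Fraction(p2 * c1 - p1 * c2, det)


mean_x, mean_y = intersection(eq1, eq2)

b_yx = Fraction(-eq1[0], eq1[1])   # coefficient of X when (1) is solved for Y
b_xy = Fraction(-eq2[1], eq2[0])   # coefficient of Y when (2) is solved for X

# Both coefficients are positive here, so r takes the positive root.
r = sqrt(b_yx * b_xy)

print(mean_x, mean_y, b_yx, b_xy, round(r, 3))
```

If the roles of the two equations were swapped, the product of the coefficients would exceed 1, which is not possible for a squared correlation coefficient.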
Example 9.13
Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:
4X–5Y+33 = 0
20X–9Y–107 = 0
Solution:
We are given
4X – 5Y + 33 = 0 ... (1)
20X – 9Y – 107 = 0 ... (2)
Solving equations (1) and (2),
we get Y = 17.
Putting the value of Y in equation (1),
we get X = 13.
Calculating correlation coefficient
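Let us assume equation (1) to be the regression equation of X on Y
4X = 5Y – 33, so the regression coefficient of X on Y is bXY = 5/4 = 1.25
Let us assume equation (2) to be the regression equation of Y on X
9Y = 20X – 107, so the regression coefficient of Y on X is bYX = 20/9 = 2.22
But this is not possible because both regression coefficients are greater than 1.
So our above assumption is wrong. Therefore, treating equation (1) as the regression equation of Y on X and
equation (2) as the regression equation of X on Y, we get
5Y = 4X + 33, so bYX = 4/5 = 0.8
20X = 9Y + 107, so bXY = 9/20 = 0.45
r = √(bYX × bXY) = √(0.8 × 0.45) = √0.36 = 0.6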
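The trial-and-error step in this example, deciding which equation is the regression of Y on X, can be sketched in Python. The representation of each equation as a triple (p, q, c) meaning pX + qY + c = 0 and the helper names are assumptions made for illustration. An assignment is accepted only if the product of the two regression coefficients lies between 0 and 1, since that product equals r².

```python
from fractions import Fraction
from math import sqrt

# Each equation is stored as (p, q, c), meaning p*X + q*Y + c = 0.
eq1 = (4, -5, 33)      # 4X - 5Y + 33 = 0
eq2 = (20, -9, -107)   # 20X - 9Y - 107 = 0


def slope_y_on_x(eq):
    """Coefficient of X after solving the equation for Y."""
    p, q, _ = eq
    return Fraction(-p, q)


def slope_x_on_y(eq):
    """Coefficient of Y after solving the equation for X."""
    p, q, _ = eq
    return Fraction(-q, p)


candidates = {
    "(1) is X on Y, (2) is Y on X": (slope_y_on_x(eq2), slope_x_on_y(eq1)),
    "(1) is Y on X, (2) is X on Y": (slope_y_on_x(eq1), slope_x_on_y(eq2)),
}

valid = {}
for label, (b_yx, b_xy) in candidates.items():
    product = b_yx * b_xy            # equals r squared when the assignment is right
    print(label, "b_yx =", b_yx, "b_xy =", b_xy, "product =", product)
    if 0 <= product <= 1:
        valid[label] = sqrt(product)  # both coefficients are positive, so r > 0

print(valid)
```

Only the second assignment survives the check, in line with the reasoning in the solution.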
Example 9.16
For 5 pairs of observations the following results are obtained: ∑X = 15, ∑Y = 25, ∑X² = 55, ∑Y² = 135,
∑XY = 83. Find the equations of the lines of regression and estimate the value of X on the first line
when Y = 12 and the value of Y on the second line if X = 8.
Solution:
Y – 5 = 0.8(X – 3)
Y = 0.8X + 2.6
When X = 8 the value of Y is estimated as
Y = 0.8(8) + 2.6
  = 9
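Both regression lines in this example can be obtained directly from the five given sums. The sketch below is a Python illustration under stated assumptions: the variable and function names are hypothetical, and the X on Y line, which is not written out in the solution above, is computed with the analogous formula bXY = (N∑XY – ∑X∑Y) / (N∑Y² – (∑Y)²).

```python
from fractions import Fraction

n = 5
sum_x, sum_y = 15, 25
sum_x2, sum_y2, sum_xy = 55, 135, 83

mean_x = Fraction(sum_x, n)
mean_y = Fraction(sum_y, n)

numerator = n * sum_xy - sum_x * sum_y                 # shared by both coefficients
b_yx = Fraction(numerator, n * sum_x2 - sum_x ** 2)    # regression coefficient of Y on X
b_xy = Fraction(numerator, n * sum_y2 - sum_y ** 2)    # regression coefficient of X on Y


def y_on_x(x):
    """Estimate Y from X using Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


def x_on_y(y):
    """Estimate X from Y using X - mean_x = b_xy * (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


print("b_yx =", b_yx, " b_xy =", b_xy)
print("X when Y = 12:", float(x_on_y(12)))
print("Y when X = 8:", float(y_on_x(8)))
```

The estimate of Y at X = 8 agrees with the value 9 obtained above.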