Tutorial 1:
Regression
EE4802/IE4213 Tutorial Session
Question 1
    Suppose that the MSE of model A is less than the MSE of model B. What can we
    say about the $R^2$ of models A and B?
Solution:
 $$R^2 = 1 - \frac{\sum (\text{Actual Weight} - \text{Estimated Weight})^2}{\sum (\text{Actual Weight} - \text{Mean Weight})^2}$$
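 Observe that the denominator is the same for models A and B, since both are evaluated on the same data.
 The numerator is the residual sum of squares, which equals the number of data points times the MSE. Since the MSE of model A is less than that of model B, the numerator is smaller under model A than under model B.
 Therefore, the $R^2$ of model A is larger than that of model B.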
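The argument above can be checked numerically. The following is a minimal sketch in plain Python; the weights and the two sets of predictions are hypothetical values chosen for illustration, not data from the tutorial.

```python
def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R^2 = 1 - RSS / TSS, written here as 1 - MSE / (TSS / n)."""
    n = len(actual)
    mean = sum(actual) / n
    tss = sum((a - mean) ** 2 for a in actual)
    return 1 - mse(actual, predicted) / (tss / n)


# Hypothetical weights (kg) and two hypothetical sets of predictions.
actual = [60.0, 62.0, 65.0, 70.0, 73.0]
pred_a = [60.5, 61.5, 65.5, 69.0, 73.5]  # model A: closer to the data
pred_b = [62.0, 60.0, 67.0, 67.0, 75.0]  # model B: further from the data

mse_a, mse_b = mse(actual, pred_a), mse(actual, pred_b)
r2_a, r2_b = r_squared(actual, pred_a), r_squared(actual, pred_b)

# The model with the smaller MSE has the larger R^2, because TSS is shared.
print("MSE:", mse_a, mse_b)
print("R^2:", r2_a, r2_b)
```

Because the total sum of squares depends only on the observed values, the ordering of the two MSEs alone fixes the ordering of the two $R^2$ values.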
Question 2
   Show that adding more factors increases the $R^2$ of a linear regression model.
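Solution:           Consider the following two models:
                 • Model A: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$
                 • Model B: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \beta_{n+1} x_{n+1}$
                    Suppose linear regression is performed using Model B.
                 • If the resulting $\beta_{n+1} = 0$, models A and B are equivalent. In this case, the $R^2$ of model
                   A is the same as that of model B.
                 • If the resulting $\beta_{n+1} \neq 0$, the MSE of model B is no greater than the MSE of model A
                   (and strictly less when the least-squares fit is unique): linear regression minimizes MSE, and
                   model A's fit remains available to model B by setting $\beta_{n+1} = 0$.
                 • Therefore: MSE of model A ≥ MSE of model B. By the argument in Question 1, $R^2$ of model B ≥ $R^2$ of model A.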
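The nesting argument can be illustrated with the smallest possible case. This is a sketch under stated assumptions: model A has no factors (intercept only), model B adds a single factor, and the data are hypothetical values, not taken from the tutorial.

```python
def r_squared(y, fitted):
    """R^2 = 1 - RSS / TSS for a list of fitted values."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss


def fit_intercept_only(y):
    """Model A: y = b0. Least squares gives b0 = mean(y)."""
    b0 = sum(y) / len(y)
    return [b0] * len(y)


def fit_simple_linear(x, y):
    """Model B: y = b0 + b1 * x, fitted by ordinary least squares."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return [b0 + b1 * xi for xi in x]


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_a = r_squared(y, fit_intercept_only(y))    # model A: no factors
r2_b = r_squared(y, fit_simple_linear(x, y))  # model B: one added factor
print("R^2 of A:", r2_a, " R^2 of B:", r2_b)
```

Model B can always reproduce model A by choosing a zero slope, so its least-squares fit cannot have a lower $R^2$ on the same data.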
Question 3
Question 3(a)
   Express the RSS of the above regression line as a function of $\beta_0$ and $\beta_1$.
Solution:
       $$\begin{aligned}
       \mathrm{RSS} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 \\
       &= 5\beta_0^2 + 13.9953\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935
       \end{aligned}$$
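The coefficients of the expanded form follow from a general identity. As a sketch, write the data as $n$ pairs $(x_i, y_i)$, with $x_i$ the predictor and $y_i$ the response; the symbols $n$, $x_i$ and $y_i$ are notation introduced here, not taken from the tutorial.

```latex
\begin{aligned}
\mathrm{RSS} &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
&= n\beta_0^2 + \Big(\sum_i x_i^2\Big)\beta_1^2 + 2\Big(\sum_i x_i\Big)\beta_0\beta_1
   - 2\Big(\sum_i y_i\Big)\beta_0 - 2\Big(\sum_i x_i y_i\Big)\beta_1 + \sum_i y_i^2
\end{aligned}
```

Matching terms with the numeric expansion gives $n = 5$, $\sum x_i = 8.35$, $\sum x_i^2 = 13.9953$, $\sum y_i = 315.45$, $\sum x_i y_i = 529.2134$ and $\sum y_i^2 = 20039.2935$.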
Question 3(b)
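   Find the stationary point of the above RSS function (i.e., the $\beta_0$ and $\beta_1$ values such
   that the derivatives of RSS equal zero).
Solution:           From 3(a) we have
   $$\begin{aligned}
   \mathrm{RSS} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 \\
   &= 5\beta_0^2 + 13.9953\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935
   \end{aligned}$$
   $$\frac{\partial\,\mathrm{RSS}}{\partial \beta_0} = 10\beta_0 + 16.7\beta_1 - 630.9 = 0 \;\Rightarrow\; \beta_0 = 63.09 - 1.67\beta_1$$
   $$\frac{\partial\,\mathrm{RSS}}{\partial \beta_1} = 27.9906\beta_1 + 16.7\beta_0 - 1058.4268 = 27.9906\beta_1 + 16.7(63.09 - 1.67\beta_1) - 1058.4268 = 0$$
   $$\Rightarrow\; \beta_1 = \frac{1058.4268 - 16.7 \times 63.09}{27.9906 - 16.7 \times 1.67} = 47.4783$$
   $$\Rightarrow\; \beta_0 = 63.09 - 1.67 \times 47.4783 = -16.1988$$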
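Setting both derivatives to zero gives a 2×2 linear system, so the stationary point can also be obtained directly. The sketch below solves it with Cramer's rule in plain Python; the helper name `solve_2x2` is hypothetical, and the numeric coefficients are the ones from the derivatives above.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [x1, x2] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: no unique stationary point")
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2


# dRSS/dbeta0 = 0:  10.0 * beta0 + 16.7    * beta1 = 630.9
# dRSS/dbeta1 = 0:  16.7 * beta0 + 27.9906 * beta1 = 1058.4268
beta0, beta1 = solve_2x2(10.0, 16.7, 16.7, 27.9906, 630.9, 1058.4268)
print(round(beta0, 4), round(beta1, 4))
```

The result should agree with the substitution method to the four decimal places quoted in the solution.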
Question 3(c)
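   Determine if the stationary point is a minimum, maximum or inflection point.
Solution:           From 3(b) we have
   $$\frac{\partial\,\mathrm{RSS}}{\partial \beta_0} = 10\beta_0 + 16.7\beta_1 - 630.9, \qquad \frac{\partial\,\mathrm{RSS}}{\partial \beta_1} = 27.9906\beta_1 + 16.7\beta_0 - 1058.4268$$
   $$\frac{\partial^2\,\mathrm{RSS}}{\partial \beta_0^2} = \frac{\partial}{\partial \beta_0}(10\beta_0 + 16.7\beta_1 - 630.9) = 10$$
   $$\frac{\partial}{\partial \beta_1}\frac{\partial\,\mathrm{RSS}}{\partial \beta_0} = \frac{\partial}{\partial \beta_1}(10\beta_0 + 16.7\beta_1 - 630.9) = 16.7$$
   $$\frac{\partial^2\,\mathrm{RSS}}{\partial \beta_1^2} = \frac{\partial}{\partial \beta_1}(27.9906\beta_1 + 16.7\beta_0 - 1058.4268) = 27.9906$$
   Because 10 > 0 and 10 × 27.9906 − 16.7 × 16.7 = 1.016 > 0, the Hessian matrix is positive definite, and thus
   $(\beta_0, \beta_1) = (-16.1988, 47.4783)$ is a minimum point.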
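The second-derivative test can be checked in two equivalent ways: the leading principal minors of the Hessian, or its eigenvalues. The following is a sketch in plain Python; the function name `symmetric_2x2_eigenvalues` is hypothetical, and the entries are the second derivatives computed in this solution.

```python
import math


def symmetric_2x2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


# Hessian of RSS: [[d2/dbeta0^2, cross term], [cross term, d2/dbeta1^2]]
a, b, c = 10.0, 16.7, 27.9906

# Test 1: both leading principal minors are positive.
det = a * c - b * b
is_positive_definite = a > 0 and det > 0

# Test 2: both eigenvalues are positive.
lam_min, lam_max = symmetric_2x2_eigenvalues(a, b, c)

print("determinant:", det)
print("positive definite:", is_positive_definite)
print("eigenvalues:", lam_min, lam_max)
```

The smaller eigenvalue is positive but close to zero, which indicates that the RSS surface is nearly flat along one direction in the $(\beta_0, \beta_1)$ plane.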
Question 4
Question 4(a)
   Express the ridge loss as a function of $\beta_0$ and $\beta_1$. Assume $\alpha = 1$.
Solution:
   $$\begin{aligned}
   \mathrm{Loss} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 + \beta_1^2 \\
   &= 5\beta_0^2 + 14.9953\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935
   \end{aligned}$$
Question 4(b)
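   Find the stationary point of the above loss function. Assume $\alpha = 1$.
Solution:         From 4(a) we have
   $$\begin{aligned}
   \mathrm{Loss} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 + \beta_1^2 \\
   &= 5\beta_0^2 + 14.9953\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935
   \end{aligned}$$
   $$\frac{\partial\,\mathrm{Loss}}{\partial \beta_0} = 10\beta_0 + 16.7\beta_1 - 630.9 = 0 \;\Rightarrow\; \beta_0 = 63.09 - 1.67\beta_1$$
   $$\frac{\partial\,\mathrm{Loss}}{\partial \beta_1} = 29.9906\beta_1 + 16.7\beta_0 - 1058.4268 = 29.9906\beta_1 + 16.7(63.09 - 1.67\beta_1) - 1058.4268 = 0$$
   $$\Rightarrow\; \beta_1 = \frac{1058.4268 - 16.7 \times 63.09}{29.9906 - 16.7 \times 1.67} = 2.2953$$
   $$\Rightarrow\; \beta_0 = 63.09 - 1.67 \times 2.2953 = 59.2569$$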
Question 4(c)
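    Express the stationary point of the above loss function as a function of $\alpha$.
Solution:
   $$\begin{aligned}
   \mathrm{Loss} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 + \alpha\beta_1^2 \\
   &= 5\beta_0^2 + (13.9953 + \alpha)\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935
   \end{aligned}$$
   $$\frac{\partial\,\mathrm{Loss}}{\partial \beta_0} = 10\beta_0 + 16.7\beta_1 - 630.9 = 0 \;\Rightarrow\; \beta_0 = 63.09 - 1.67\beta_1$$
   $$\frac{\partial\,\mathrm{Loss}}{\partial \beta_1} = (27.9906 + 2\alpha)\beta_1 + 16.7\beta_0 - 1058.4268 = (27.9906 + 2\alpha)\beta_1 + 16.7(63.09 - 1.67\beta_1) - 1058.4268 = 0$$
   $$\Rightarrow\; \beta_1 = \frac{1058.4268 - 16.7 \times 63.09}{27.9906 + 2\alpha - 16.7 \times 1.67} = \frac{4.8238}{0.1016 + 2\alpha}$$
   $$\Rightarrow\; \beta_0 = 63.09 - 1.67 \times \frac{4.8238}{0.1016 + 2\alpha}$$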
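The closed form can be cross-checked by solving the two stationarity equations for a given penalty. This is a minimal sketch in plain Python; the function name `ridge_stationary_point` is hypothetical, and it assumes the penalty applies to $\beta_1$ only, as in the loss above.

```python
def ridge_stationary_point(alpha):
    """Return (beta0, beta1) where both partial derivatives of the loss vanish.

    Stationarity equations (RSS gradient plus the penalty term 2*alpha*beta1):
        10.0 * beta0 + 16.7 * beta1                   = 630.9
        16.7 * beta0 + (27.9906 + 2 * alpha) * beta1  = 1058.4268
    """
    a11, a12, a22 = 10.0, 16.7, 27.9906 + 2.0 * alpha
    r1, r2 = 630.9, 1058.4268
    det = a11 * a22 - a12 * a12
    beta1 = (a11 * r2 - a12 * r1) / det
    beta0 = (r1 - a12 * beta1) / a11
    return beta0, beta1


for alpha in (0.0, 1.0, 10.0):
    beta0, beta1 = ridge_stationary_point(alpha)
    closed_form_beta1 = 4.8238 / (0.1016 + 2.0 * alpha)
    print(alpha, round(beta0, 4), round(beta1, 4), round(closed_form_beta1, 4))
```

With $\alpha = 0$ the function should reproduce the least-squares solution of Question 3(b), and with $\alpha = 1$ the solution of Question 4(b).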
Question 4(d)
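   Does $R^2$ increase or decrease with $\alpha$? Does this make sense?
Solution:       First, we express RSS as a function of $\beta_1$, using $\beta_0 = 63.09 - 1.67\beta_1$:
   $$\begin{aligned}
   \mathrm{RSS} &= (57.48 - \beta_0 - 1.52\beta_1)^2 + (56.57 - \beta_0 - 1.60\beta_1)^2 + \cdots + (70.19 - \beta_0 - 1.80\beta_1)^2 \\
   &= 5\beta_0^2 + 13.9953\beta_1^2 + 16.7\beta_0\beta_1 - 630.9\beta_0 - 1058.4268\beta_1 + 20039.2935 \\
   &= 5(63.09 - 1.67\beta_1)^2 + 13.9953\beta_1^2 + 16.7(63.09 - 1.67\beta_1)\beta_1 - 630.9(63.09 - 1.67\beta_1) - 1058.4268\beta_1 + 20039.2935 \\
   &= 0.0508\beta_1^2 - 4.8238\beta_1 + 137.553
   \end{aligned}$$
   Taking the first derivative:
   $$\frac{d\,\mathrm{RSS}}{d\beta_1} < 0 \;\Rightarrow\; 0.1016\beta_1 - 4.8238 < 0 \;\Rightarrow\; \beta_1 < 47.478$$
   It follows that RSS decreases with $\beta_1$ for all $\beta_1 < 47.478$.
   Recall that $\beta_1 = \frac{4.8238}{0.1016 + 2\alpha}$ is decreasing in $\alpha$ and that $\beta_1 < \frac{4.8238}{0.1016} = 47.478$ for all $\alpha > 0$.
   Hence: Decreasing $\alpha$ -> Increasing $\beta_1$ -> Decreasing RSS -> Increasing $R^2$
   Equivalently, $R^2$ decreases as $\alpha$ increases. This makes sense: $\alpha = 0$ is ordinary least squares, which minimizes RSS on this data, so any penalty $\alpha > 0$ can only increase the RSS.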
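The chain of implications can be traced numerically using only the expanded RSS from Question 3(a). The sketch below is plain Python; the function names are hypothetical, and the total sum of squares is obtained as the RSS of the mean-only model ($\beta_0 = 63.09$, $\beta_1 = 0$), which is an identity rather than a value stated in the tutorial.

```python
def rss(beta0, beta1):
    """Expanded residual sum of squares from Question 3(a)."""
    return (5 * beta0 ** 2 + 13.9953 * beta1 ** 2 + 16.7 * beta0 * beta1
            - 630.9 * beta0 - 1058.4268 * beta1 + 20039.2935)


def ridge_fit(alpha):
    """Stationary point of RSS + alpha * beta1**2 (intercept not penalized)."""
    det = 10.0 * (27.9906 + 2.0 * alpha) - 16.7 * 16.7
    beta1 = (10.0 * 1058.4268 - 16.7 * 630.9) / det
    beta0 = (630.9 - 16.7 * beta1) / 10.0
    return beta0, beta1


# Total sum of squares: RSS of the model that always predicts the mean (63.09).
TSS = rss(63.09, 0.0)


def r_squared(alpha):
    """Training R^2 of the ridge fit for a given alpha."""
    beta0, beta1 = ridge_fit(alpha)
    return 1 - rss(beta0, beta1) / TSS


alphas = [0.0, 0.01, 0.1, 1.0, 10.0]
r2_values = [r_squared(a) for a in alphas]
for a, r2 in zip(alphas, r2_values):
    print(f"alpha = {a:<5}  beta1 = {ridge_fit(a)[1]:8.4f}  R^2 = {r2:.4f}")
```

The printed $R^2$ values should fall as $\alpha$ grows, with the largest value at $\alpha = 0$.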
Question 5
   Verify your solutions to Question 4 using scikit-learn Ridge.
Solution:
   Decreasing $\alpha$ -> Increasing $R^2$