1. (40pts) TABLE 1.
Decision: CATEGORY
    Weekend (Example)   Weather   Parents   Money   Decision (Category)
    W1                  Sunny     Yes       Rich    Cinema
    W2                  Sunny     No        Rich    Tennis
    W3                  Windy     Yes       Rich    Cinema
    W4                  Rainy     Yes       Poor    Cinema
    W5                  Rainy     No        Rich    Stay in
    W6                  Rainy     Yes       Poor    Cinema
    W7                  Windy     No        Poor    Cinema
    W8                  Windy     No        Rich    Shopping
    W9                  Windy     Yes       Rich    Cinema
    W10                 Sunny     No        Rich    Tennis
    a. Create a Decision Tree
    b. Create a Model based on Naïve Bayes
    c. Student Number ending in Prime Number: Determine the Decision if
               Weather: Rainy, Parents: Yes, Money: Rich
       Student Number ending in Non-Prime Number: Determine the Decision if
               Weather: Sunny, Parents: Yes, Money: Poor
         A. Decision Tree
Step 1
Cinema = 6
Tennis = 2
Stay in = 1
Shopping = 1
H(Category) = H(6/10, 2/10, 1/10, 1/10)
= -(6/10 log2 6/10) – (2/10 log2 2/10) – (1/10 log2 1/10) – (1/10 log2 1/10)
=0.444+ 0.464 + 0.332 + 0.332
H(Category)=1.572
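The entropy in Step 1 can be checked with a few lines of Python. This is a minimal sketch; the helper name `entropy` is an illustrative choice, not something from the assignment. Computed without rounding the individual terms, the result is about 1.571, which agrees with the hand value 1.572 up to rounding.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped, which matches the convention 0 * log2(0) = 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Class counts from Step 1: Cinema, Tennis, Stay in, Shopping.
h_category = entropy([6, 2, 1, 1])
print(round(h_category, 3))
```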
Step 2
H(Category/weather), H(Category/parents), H(Category/money)
H(Category/weather)
Sunny * H + Windy * H + Rainy * H = 3/10 (1/3, 2/3, 0/3, 0/3) + 4/10 (3/4, 0/4, 0/4, 1/4) + 3/10 (2/3, 0/3, 1/3, 0/3)
H(Category/weather) = 3/10 ((-1/3 log2 1/3) – (2/3 log2 2/3)) + 4/10 ((-3/4 log2 3/4) – (1/4 log2 1/4)) + 3/10 ((-2/3 log2 2/3) – (1/3 log2 1/3)), where every 0 log2 0 term is taken as 0
         = 3/10 (0.92) + 4/10 (0.81) + 3/10 (0.92)
         = 0.276 + 0.324 + 0.276
H(Category/weather) = 0.88
H(Category/parents)
Yes * H + No * H = 5/10 (5/5, 0/5, 0/5, 0/5) + 5/10 (1/5, 2/5, 1/5, 1/5)
H(Category/parents) = 5/10 (-5/5 log2 5/5) + 5/10 ((-1/5 log2 1/5) – (2/5 log2 2/5) – (1/5 log2 1/5) – (1/5 log2 1/5))
=5/10 (0) + 5/10 (1.92)
= 0 + 0.96
H(Category/parents) = 0.96
H(Category/money)
Rich * H + Poor * H = 7/10 (3/7, 2/7, 1/7, 1/7) + 3/10 (3/3, 0/3, 0/3, 0/3)
H(Category/money) = 7/10 ((-3/7 log2 3/7) – (2/7 log2 2/7) – (1/7 log2 1/7) – (1/7 log2 1/7)) + 3/10 (-3/3 log2 3/3)
= 7/10 (1.842) + 3/10 (0)
= 1.289 + 0
H(Category/money) = 1.289
Step 3
H(Category/weather) = 0.88 | H(Category/parents) = 0.96 | H(Category/money) = 1.289
I(Category/weather) = 1.572 – 0.88 = 0.692
I(Category/parents) = 1.572 – 0.96 = 0.612
I(Category/money) = 1.572 – 1.289 = 0.283
Max(0.692, 0.612, 0.283) = 0.692, so Weather gives the largest information gain and becomes the root node
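Steps 2 and 3 can be reproduced directly from the rows of Table 1. The sketch below is illustrative only: the names `ROWS`, `ATTRS`, and `info_gain` are assumptions made for this example. It keeps full precision, so the gains may differ from the hand-rounded values in the third decimal place, but the ranking of the attributes is the same.

```python
from collections import Counter, defaultdict
from math import log2

# Rows of Table 1: (weather, parents, money, decision).
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"),
    ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
]
ATTRS = {"weather": 0, "parents": 1, "money": 2}

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, col):
    """H(Category) minus the weighted entropy after splitting on column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder

gains = {name: info_gain(ROWS, col) for name, col in ATTRS.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"I(Category/{name}) = {value:.3f}")
print("best split:", best)
```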
Step 4
Sunny = H(1/3, 2/3, 0/3, 0/3) = 0.92
H (Category/parents)
1/3 (1/1, 0/1, 0/1, 0/1) + 2/3 (0/2, 2/2, 0/2, 0/2)
H (Category/parents) = 1/3 ((-1/1 log2 1/1)) + 2/3 ((-2/2 log2 2/2))
= 1/3 (0) + 2/3 (0)
H (Category/parents) = 0
I (Category/parents) = H(1/3, 2/3, 0/3, 0/3) – 0
I (Category/parents) = 0.92
H (Category/money)
3/3 ( 1/3, 2/3, 0/3, 0/3) + 0/3 (0,0,0,0)
H (Category/money) = 3/3 ((-1/3 log2 1/3) – (2/3 log2 2/3))
= 3/3 (0.92)
H (Category/money) = 0.92
I (Category/money) = 0.92 – 0.92 = 0
Max (0.92, 0) = 0.92, so Parents is best for the Sunny branch
Windy = H (3/4, 0/4, 0/4, 1/4) = 0.811
H (Category/parents)
2/4 (2/2, 0/2, 0/2, 0/2) + 2/4 (1/2, 0/2, 0/2, 1/2)
H (Category/parents) = 2/4 (-2/2 log2 2/2) + 2/4 ((-1/2 log2 1/2) – (1/2 log2 1/2))
= 2/4 (0) + 2/4 (1)
= 0 + 0.5
H (Category/parents) = 0.5
I (Category/parents) = 0.811 – 0.5 = 0.311
H (Category/money)
3/4 (2/3, 0/3, 0/3, 1/3) + 1/4 (1/1, 0/1, 0/1, 0/1)
H (Category/money) = 3/4 ((-2/3 log2 2/3) – (1/3 log2 1/3)) + 1/4 (-1/1 log2 1/1)
= 3/4 (0.92) + 1/4 (0)
H (Category/money) = 0.69
I (Category/money) = 0.811 – 0.69 = 0.121
Max (0.311, 0.121) = 0.311, so Parents is best for the Windy branch
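The branch-by-branch procedure of Step 4 is the recursive part of ID3. The sketch below applies it to every branch of Table 1, including Rainy, which is not worked by hand in this section. It is an illustration under stated assumptions: the function names `build` and `classify` are made up for this example, and ties between attributes are broken by column order (Weather, Parents, Money). In the Rainy branch Parents and Money tie, so a different tie-break rule would give a different subtree there.

```python
from collections import Counter, defaultdict
from math import log2

# Rows of Table 1: (weather, parents, money, decision).
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"), ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
]
NAMES = ["Weather", "Parents", "Money"]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, col):
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder

def build(rows, cols):
    """Return a nested dict {attribute: {value: subtree}} or a class label."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))  # first column wins a tie
    branches = defaultdict(list)
    for row in rows:
        branches[row[best]].append(row)
    rest = [c for c in cols if c != best]
    return {NAMES[best]: {v: build(sub, rest) for v, sub in branches.items()}}

def classify(tree, example):
    """Follow the tree until a leaf (a plain string) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

tree = build(ROWS, [0, 1, 2])
query = {"Weather": "Rainy", "Parents": "Yes", "Money": "Rich"}
print(tree)
print("decision:", classify(tree, query))
```

With this tie-break the query from part c (Rainy, Yes, Rich) ends at Cinema.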
         B. Naïve Bayes
Stud Number: 2020100575 Prime
Given: Weather: Rainy, Parents: Yes, Money: Rich
Step 1
P(C1) = P (Decision = Cinema) = 6/10 = 0.6
P (C2) = P (Decision = Tennis) = 2/10 = 0.2
P (C3) = P (Decision = Stay In) = 1/10 = 0.1
P (C4) = P (Decision = Shopping) = 1/10 = 0.1
Step 2
P(Rainy/Cinema) = 2/6 = 0.333
P(Rainy/Tennis) = 0/2 = 0
P(Rainy/Stay In) = 1/1 = 1
P(Rainy/Shopping) = 0/1 = 0
P (Yes / Cinema) = 5/6 = 0.833
P (Yes/ Tennis) = 0/2 = 0
P (Yes/ Stay In) = 0/1 = 0
P (Yes/ Shopping) = 0/1 = 0
P (Rich/ Cinema) = 3/6 = 0.5
P (Rich/ Tennis) = 2/2 = 1
P (Rich/ Stay In) = 1/1 = 1
P (Rich/ Shopping) = 1/1 = 1
P(x/Cinema) = 0.333 * 0.833 * 0.5 = 0.139
P(x/Tennis) = 0 * 0 * 1 = 0
P(x/Stay In) = 1 * 0 * 1 = 0
P(x/Shopping) = 0 * 0 * 1 = 0
P(Cinema) * P(x/Cinema) = 0.6 * 0.139 = 0.083
P(Tennis) * P(x/Tennis) = P(Stay In) * P(x/Stay In) = P(Shopping) * P(x/Shopping) = 0
Prediction: Cinema, the only class with a non-zero score
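The Naïve Bayes calculation above can be sketched in Python by counting directly in Table 1. The function name `naive_bayes_scores` is an assumption for this example, and, like the hand calculation, it applies no smoothing, so any attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter

# Rows of Table 1: (weather, parents, money, decision).
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"), ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
]

def naive_bayes_scores(rows, query):
    """Unnormalized P(class) * product of P(value | class), without smoothing."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(rows)  # prior P(class)
        for col, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == cls and r[col] == value)
            score *= matches / n_cls  # conditional P(value | class)
        scores[cls] = score
    return scores

scores = naive_bayes_scores(ROWS, ("Rainy", "Yes", "Rich"))
prediction = max(scores, key=scores.get)
for cls, score in scores.items():
    print(f"{cls}: {score:.3f}")
print("prediction:", prediction)
```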
    2. (40pts) Table 2. Decision: BUYS-RRSP
    a. Create a Decision Tree
    b. Create a Model based on Naïve Bayes
    c. Student Number ending in Prime Number: Determine the Decision if
               Sector: Farming, Income: medium, Self-Employed: Yes, Credit-Rating: Fair
       Student Number ending in Non-Prime Number: Determine the Decision if
               Sector: Banking, Income: medium, Self-Employed: Yes, Credit-Rating: Excellent
          A. Decision Tree
Step 1
Yes = 8
No = 6
H(Buys-RRSP) = H(8/14, 6/14)
=-(8/14 log2 8/14) – (6/14 log2 6/14)
=0.463 + 0.527
H(Buys-RRSP) =0.99
Step 2
H(Buys-RRSP/Sector), H(Buys-RRSP/Income), H(Buys-RRSP/Self-Employed), H(Buys-RRSP/Credit-Rating)
H(Buys-RRSP/Sector)
Farming * H + Oil * H + Banking * H = 5/14 (2/5, 3/5) + 5/14 (2/5, 3/5) + 4/14 (4/4, 0/4)
H(Buys-RRSP/Sector) = 5/14 ((-2/5 log2 2/5) – (3/5 log2 3/5)) + 5/14 ((-2/5 log2 2/5) – (3/5 log2 3/5)) +
4/14 (-4/4 log2 4/4)
= 5/14(0.97) + 5/14 (0.97) + 4/14 (0)
= 0.346 + 0.346
H(Buys-RRSP/Sector) = 0.692
H(Buys-RRSP/Income)
Low * H + Medium * H + High * H = 4/14(3/4, ¼) + 6/14 (3/6, 3/6) + 4/14 (2/4, 2/4)
H(Buys-RRSP/Income) = 4/14 ((-3/4 log2 ¾) – (1/4 log2 ¼)) + 6/14 ((-3/6 log2 3/6) – (3/6 log2 3/6)) +
4/14 ((-2/4 log2 2/4) – (2/4 log2 2/4))
= 0.232 + 0.429 + 0.286
H(Buys-RRSP/Income) = 0.947
H(Buys-RRSP/Self-Employed)
Yes * H + No * H = 7/14 (5/7, 2/7) + 7/14 (3/7, 4/7)
H(Buys-RRSP/Self-Employed) = 7/14 ((-5/7 log2 5/7) – (2/7 log2 2/7)) + 7/14 ((-3/7 log2 3/7) – (4/7 log2 4/7))
= 0.432 + 0.493
H(Buys-RRSP/Self-Employed) = 0.925
H(Buys-RRSP/Credit-Rating)
Fair * H + Excellent * H = 8/14 (3/8, 5/8) + 6/14 ( 5/6, 1/6)
H(Buys-RRSP/Credit-Rating) = 8/14 ((-3/8 log2 3/8) – (5/8 log2 5/8)) + 6/14 ((-5/6 log2 5/6) – (1/6 log2
1/6))
= 0.545 + 0.279
H(Buys-RRSP/Credit-Rating) = 0.824
Step 3
H(Buys-RRSP/Sector) = 0.692 | H(Buys-RRSP/Income) = 0.947 | H(Buys-RRSP/Self-Employed) = 0.925 | H(Buys-RRSP/Credit-Rating) = 0.824
I(Buys-RRSP/Sector) = 0.99 – 0.692 = 0.298
I(Buys-RRSP/Income) = 0.99 – 0.947 = 0.043
I(Buys-RRSP/Self-employed) = 0.99 – 0.925 = 0.065
I(Buys-RRSP/credit-rating) = 0.99 – 0.824 = 0.166
Max (0.298, 0.043, 0.065, 0.166) = 0.298, so Sector is Best
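Because Step 2 lists the (Yes, No) counts for every attribute value, the gains can be recomputed from those counts alone. The sketch below does that; `SPLITS`, `entropy`, and `remainder` are names chosen for this example. The Self-Employed = No pair is written as (3, 4) so that the Yes column sums to 8, and the order inside a pair does not change the entropy. The exact gains come out slightly lower than the hand values because the text rounds the root entropy up to 0.99, but Sector is still the best split.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def remainder(partitions):
    """Weighted entropy after a split; each partition is a (yes, no) count pair."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)

# (Yes, No) counts per attribute value, copied from Step 2.
SPLITS = {
    "Sector": [(2, 3), (2, 3), (4, 0)],          # Farming, Oil, Banking
    "Income": [(3, 1), (3, 3), (2, 2)],          # Low, Medium, High
    "Self-Employed": [(5, 2), (3, 4)],           # Yes, No
    "Credit-Rating": [(3, 5), (5, 1)],           # Fair, Excellent
}

root = entropy((8, 6))  # 8 Yes and 6 No from Step 1
gains = {name: root - remainder(parts) for name, parts in SPLITS.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"I(Buys-RRSP/{name}) = {value:.3f}")
print("best split:", best)
```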
Step 4
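Banking is already pure (4/4 Yes), so only a mixed Sector branch needs a further split. Farming and Oil each hold 5 examples with class split (2/5, 3/5) in Step 2, so the branch entropy is H(2/5, 3/5) = 0.971. The gains inside the branch are measured against this value, not against the root entropy 0.99.
H(Buys-RRSP/Income), H(Buys-RRSP/Self-Employed), H(Buys-RRSP/Credit-Rating)
H(Buys-RRSP/Income) = 3/5 (1/3, 2/3) + 2/5 (1/2, 1/2) + 0/5 (0, 0)
= 0.551 + 0.4
H(Buys-RRSP/Income) = 0.951
I (Buys-RRSP/Income) = 0.971 – 0.951 = 0.020
H (Buys-RRSP/Self-Employed) = 3/5 (1/3, 2/3) + 2/5 (1/2, 1/2)
= 0.551 + 0.4
H (Buys-RRSP/Self-Employed) = 0.951
I (Buys-RRSP/Self-Employed) = 0.971 – 0.951 = 0.020
H (Buys-RRSP/Credit-Rating) = 0
I (Buys-RRSP/Credit-Rating) = 0.971 – 0 = 0.971
Max (0.020, 0.020, 0.971) = 0.971, so Credit-Rating is best for this branch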
         B. Naïve Bayes
Stud Number: 2020100575 Prime
Given: Sector: Farming, Income: medium, Self-Employed: Yes, Credit-Rating: Fair
Step 1
P (C1) = P (Buys – RRSP = Yes) = 8/14 = 0.571
P (C2) = P (Buys – RRSP = No) = 6/14 = 0.429
Step 2
P(Farming/Yes) = 2/8 = 0.25
P(farming/No) = 3/6 = 0.5
P(Medium / Yes) = 3/8 = 0.375
P(Medium / No) = 3/6 = 0.5
P(Yes/ Yes) = 5/8 = 0.625
P(Yes/ No) = 2/6 = 0.333
P(Fair/Yes) = 3/8 = 0.375
P(Fair/No) = 5/6 = 0.833
P(x/yes) = 0.25 * 0.375 * 0.625 * 0.375 = 0.022
P(x/no) = 0.5 * 0.5 * 0.333 * 0.833 = 0.069
P(x/yes) * P(yes) = 0.022 * 0.571 = 0.013
P(x/no) * P(no) = 0.069 * 0.429 = 0.030
Prediction: No, since 0.030 > 0.013. These scores are unnormalized; the normalized probability of No is 0.030 / (0.030 + 0.013) = 0.70
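The two class scores and their normalization can be checked with exact fractions. This is a small sketch that uses only the priors and conditional probabilities listed in Steps 1 and 2; the dictionary names `PRIORS` and `LIKELIHOODS` are assumptions for the example.

```python
from fractions import Fraction
from math import prod

# Priors from Step 1.
PRIORS = {"Yes": Fraction(8, 14), "No": Fraction(6, 14)}

# Conditionals from Step 2, in the order Farming, Medium, Self-Employed = Yes, Fair.
LIKELIHOODS = {
    "Yes": [Fraction(2, 8), Fraction(3, 8), Fraction(5, 8), Fraction(3, 8)],
    "No": [Fraction(3, 6), Fraction(3, 6), Fraction(2, 6), Fraction(5, 6)],
}

# Unnormalized score: P(class) * product of P(value | class).
scores = {cls: PRIORS[cls] * prod(LIKELIHOODS[cls]) for cls in PRIORS}

# Dividing by the sum turns the scores into posterior probabilities.
evidence = sum(scores.values())
posterior = {cls: score / evidence for cls, score in scores.items()}

prediction = max(posterior, key=posterior.get)
for cls in scores:
    print(f"{cls}: score {float(scores[cls]):.3f}, posterior {float(posterior[cls]):.3f}")
print("prediction:", prediction)
```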
    3. (20PTS) TABLE 3. APPLY APRIORI ALGORITHM FOR BASKET ANALYSIS
        Min support: 2
        Min Confidence: 50%
        Step 1
 Items Bought                    Support Count
 A                               5
 B                               7
 C                               5
 D                               9
 E                               6
        Step 2
 Items Bought                    Support Count
 {A,B}                           3
 {A,C}                           2
 {A,D}                           4
 {A,E}                           4
 {B,C}                           3
 {B,D}                           6
 {B,E}                           4
 {C,D}                           4
 {C,E}                           2
 {D,E}                           6
All ten pairs meet the minimum support of 2, so none is pruned and every item (A, B, C, D, E) carries forward.
Step 3
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE
 Items Bought                 Support Count
 {A, B, C}                    1 (below min support)
 {A, B, D}                    2
 {A, B, E}                    2
 {A, C, D}                    1 (below min support)
 {A, C, E}                    1 (below min support)
 {A, D, E}                    4
 {B, C, D}                    2
 {B, C, E}                    1 (below min support)
 {B, D, E}                    4
 {C, D, E}                    2
Frequent 3-itemsets (support count of at least 2)
 Items Bought                 Support Count
 {A, B, D}                    2
 {A, B, E}                    2
 {A, D, E}                    4
 {B, C, D}                    2
 {B, D, E}                    4
 {C, D, E}                    2
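Step 3 can be sketched as candidate generation followed by a support filter. Since the transactions themselves are not reproduced in this section, the sketch works from the support counts recorded in Steps 2 and 3. It is a simplified variant: it enumerates all 3-combinations of the surviving items and keeps those whose 2-item subsets are all frequent (the Apriori prune), where a classic implementation would join frequent pairs. The names `PAIR_SUPPORT`, `TRIPLE_SUPPORT`, and `candidates` are assumptions for this example.

```python
from itertools import combinations

MIN_SUPPORT = 2

# Support counts copied from Step 2.
PAIR_SUPPORT = {
    ("A", "B"): 3, ("A", "C"): 2, ("A", "D"): 4, ("A", "E"): 4, ("B", "C"): 3,
    ("B", "D"): 6, ("B", "E"): 4, ("C", "D"): 4, ("C", "E"): 2, ("D", "E"): 6,
}

# Support counts copied from Step 3.
TRIPLE_SUPPORT = {
    ("A", "B", "C"): 1, ("A", "B", "D"): 2, ("A", "B", "E"): 2, ("A", "C", "D"): 1,
    ("A", "C", "E"): 1, ("A", "D", "E"): 4, ("B", "C", "D"): 2, ("B", "C", "E"): 1,
    ("B", "D", "E"): 4, ("C", "D", "E"): 2,
}

def candidates(frequent, k):
    """k-itemsets whose (k-1)-item subsets are all frequent (Apriori prune)."""
    frequent = set(frequent)
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        c for c in combinations(items, k)
        if all(sub in frequent for sub in combinations(c, k - 1))
    ]

frequent_pairs = [p for p, n in PAIR_SUPPORT.items() if n >= MIN_SUPPORT]
cands = candidates(frequent_pairs, 3)
frequent_triples = [c for c in cands if TRIPLE_SUPPORT[c] >= MIN_SUPPORT]

print("candidates:", ["".join(c) for c in cands])
print("frequent:  ", ["".join(c) for c in frequent_triples])
```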
Step 4
{A, B, D}, {A, B, E}, {A, D, E}, {B, C, D}, {B, D, E}, and {C, D, E}
 RULES        SUPPORT   CONFIDENCE
 A -> D^E     3         Sup(A^D^E) / Sup(A)   = 3/5 = 0.60 = 60%
 B -> D^E     4         Sup(B^D^E) / Sup(B)   = 4/7 = 0.57 = 57%
 A -> B^D     2         Sup(A^B^D) / Sup(A)   = 2/5 = 0.40 = 40%
 B -> C^D     2         Sup(B^C^D) / Sup(B)   = 2/7 = 0.29 = 29%
 B^D -> E     4         Sup(B^D^E) / Sup(B^D) = 4/6 = 0.67 = 67%
 B^E -> D     4         Sup(B^D^E) / Sup(B^E) = 4/4 = 1.00 = 100%
 C^D -> E     2         Sup(C^D^E) / Sup(C^D) = 2/4 = 0.50 = 50%
 D^A -> E     3         Sup(A^D^E) / Sup(D^A) = 3/4 = 0.75 = 75%
 D -> E^A     3         Sup(A^D^E) / Sup(D)   = 3/9 = 0.33 = 33%
 C -> B^D     2         Sup(B^C^D) / Sup(C)   = 2/5 = 0.40 = 40%
 D^E -> A     3         Sup(A^D^E) / Sup(D^E) = 3/6 = 0.50 = 50%
 D^E -> B     4         Sup(B^D^E) / Sup(D^E) = 4/6 = 0.67 = 67%
 D -> B^E     4         Sup(B^D^E) / Sup(D)   = 4/9 = 0.44 = 44%
 E^A -> D     3         Sup(A^D^E) / Sup(E^A) = 3/4 = 0.75 = 75%
 E -> B^D     4         Sup(B^D^E) / Sup(E)   = 4/6 = 0.67 = 67%
 E -> D^A     3         Sup(A^D^E) / Sup(E)   = 3/6 = 0.50 = 50%
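The confidence of a rule X -> Y is Sup(X ^ Y) / Sup(X), and a rule is kept when that ratio reaches the 50% minimum confidence. The sketch below applies this to the single itemset {B, D, E}, taking the support counts from Steps 1 to 3 (including Sup(B^D) = 6 from Step 2). The names `SUPPORT` and `rules` are assumptions for this example.

```python
from itertools import combinations

MIN_CONFIDENCE = 0.5

# Support counts for {B, D, E} and its subsets, copied from Steps 1 to 3.
SUPPORT = {
    frozenset("B"): 7, frozenset("D"): 9, frozenset("E"): 6,
    frozenset("BD"): 6, frozenset("BE"): 4, frozenset("DE"): 6,
    frozenset("BDE"): 4,
}

def rules(itemset):
    """Yield (lhs, rhs, confidence) for every non-empty proper subset as lhs."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            confidence = SUPPORT[itemset] / SUPPORT[lhs]
            yield "^".join(sorted(lhs)), "^".join(sorted(itemset - lhs)), confidence

all_rules = list(rules("BDE"))
strong = [r for r in all_rules if r[2] >= MIN_CONFIDENCE]

for lhs, rhs, confidence in all_rules:
    verdict = "keep" if confidence >= MIN_CONFIDENCE else "drop"
    print(f"{lhs} -> {rhs}: {confidence:.0%} ({verdict})")
```

Under these counts only D -> B^E (4/9, about 44%) falls below the threshold; the other five rules from {B, D, E} are kept.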