Random Forest
nilashishsarkar70@gmail.com
• Employee of XYZ: knows internal functionality; lacks a broader perspective on competitors.
• Financial Advisor of XYZ: has perspective on companies vs competition; has been right 75% of the time.
• Stock Market Trader: has observed the company’s stock price over the past 3 years; has been right 70% of the time.
• Employee of a competitor: knows the internal functionality of the competitor firms; has been right 60% of the time.
• Market Research team: analyzes customer preference for XYZ’s product; has been right 75% of the time.
• Social Media Expert: understands product positioning; unaware of details beyond digital marketing.
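The point of gathering several imperfect opinions can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the advisers err independently of one another (the slide does not say this), and it uses just three of the quoted track records (75%, 70%, 60%) so that a majority always exists. The function name `majority_correct_probability` is made up for this example.

```python
from itertools import product

def majority_correct_probability(accuracies):
    """Probability that strictly more than half of the voters are right.

    Assumption: voters are right or wrong independently of each other.
    """
    n = len(accuracies)
    total = 0.0
    # Enumerate every right/wrong pattern across the voters
    for outcome in product([True, False], repeat=n):
        if sum(outcome) * 2 > n:  # strict majority is right
            p = 1.0
            for right, acc in zip(outcome, accuracies):
                p *= acc if right else 1.0 - acc
            total += p
    return total

# Three of the track records quoted on the slide
slide_accuracies = [0.75, 0.70, 0.60]
print(round(majority_correct_probability(slide_accuracies), 3))
```

Under the independence assumption, the majority of these three is right slightly more often than the best single adviser (75%). If their errors were strongly correlated, the gain would shrink.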
Sno   X1    X2   Y
1     432   29   Yes
2     529   34   Yes
3     125   67   No
4     144   29   No

Random sample rows with replacement

Sno   X1    X2   Y
3     125   67   No
4     144   29   No
4     144   29   No

Sno   X1    X2   Y
3     125   67   No
2     529   34   Yes
3     125   67   No

Sno   X1    X3    Y
3     125   317   No
2     529   379   Yes
3     125   317   No
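Sampling rows with replacement can be sketched in a few lines. This is a minimal illustration using the four example rows; the helper name `bootstrap_sample` and the seed are arbitrary choices for this example, and the rows drawn depend on the seed.

```python
import random

# The four-row example table: (Sno, X1, X2, Y)
rows = [
    (1, 432, 29, "Yes"),
    (2, 529, 34, "Yes"),
    (3, 125, 67, "No"),
    (4, 144, 29, "No"),
]

def bootstrap_sample(data, n_rows, rng):
    """Draw n_rows rows at random, with replacement."""
    return [rng.choice(data) for _ in range(n_rows)]

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
sample = bootstrap_sample(rows, n_rows=len(rows), rng=rng)
for row in sample:
    print(row)
```

Because each draw is independent, the same row can appear more than once and some rows may not appear at all.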
Step 1 – create a bootstrapped dataset
Step 2 – create a decision tree using the bootstrapped dataset, but only use a random subset of variables at each step
Step 3 – repeat the same and create multiple trees
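The three steps can be sketched in plain Python. This is a teaching sketch under stated assumptions, not a production implementation: splits are chosen by Gini impurity (the slides do not name a criterion), trees are grown until the leaves are pure, and all function names are invented for this example.

```python
import random
from collections import Counter

# Example rows: features X1..X4 and label Y
X = [[432, 29, 313, 6], [529, 34, 379, 2], [125, 67, 317, 4], [144, 29, 103, 8]]
y = ["Yes", "Yes", "No", "No"]

def gini(labels):
    """Gini impurity of a list of labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_vars, rng):
    """Grow a tree; each step considers only a random subset of variables."""
    if len(set(labels)) == 1:
        return majority(labels)  # pure leaf
    best = None
    for j in rng.sample(range(len(rows[0])), n_vars):
        # Candidate thresholds: every observed value except the largest
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[j] <= t]
            right = [l for r, l in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return majority(labels)  # sampled variables offer no usable split
    _, j, t = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[j] > t]
    return (
        j,
        t,
        build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx], n_vars, rng),
        build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx], n_vars, rng),
    )

def predict_tree(tree, row):
    """Walk down the tree until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def build_forest(X, y, n_trees, n_vars, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # Step 3: repeat and create multiple trees
        idx = [rng.randrange(len(X)) for _ in X]  # Step 1: bootstrapped dataset
        boot_X = [X[i] for i in idx]
        boot_y = [y[i] for i in idx]
        forest.append(build_tree(boot_X, boot_y, n_vars, rng))  # Step 2
    return forest

forest = build_forest(X, y, n_trees=25, n_vars=2)
print([predict_tree(tree, X[0]) for tree in forest])
```

With only four rows the individual trees are tiny, which is enough to show where the two sources of randomness (rows and variables) enter.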
Sno   X1    X2   X3    X4   Y
1     432   29   313   6    Yes
2     529   34   379   2    Yes
3     125   67   317   4    No
4     144   29   103   8    No

Sno   X3    X4   Y
3     317   4    No
4     103   8    No
4     103   8    No

Sno   X1    X3    Y
3     125   317   No
2     529   379   Yes
3     125   317   No

• Some rows do not end up in the bootstrapped dataset
• This is called the out-of-bag dataset
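Finding the out-of-bag rows is a set difference between all row numbers and those drawn into the bootstrapped dataset. A minimal sketch using the row numbers from the two bootstrapped tables ([3, 4, 4] and [3, 2, 3]); the helper name `out_of_bag` is hypothetical.

```python
def out_of_bag(all_ids, bootstrapped_ids):
    """Rows that never made it into the bootstrapped dataset."""
    drawn = set(bootstrapped_ids)
    return [i for i in all_ids if i not in drawn]

all_ids = [1, 2, 3, 4]
print(out_of_bag(all_ids, [3, 4, 4]))  # rows 1 and 2 were never drawn
print(out_of_bag(all_ids, [3, 2, 3]))  # rows 1 and 4 were never drawn
```

Since a tree never saw its own out-of-bag rows during training, they can serve as a built-in test set for that tree.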
… variables
• Then try a few settings above and below the value
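Generating the settings to try can be sketched as follows. The starting value did not survive in the text, so the default below is an assumption: the square root of the number of variables, a common convention for classification forests. Pass `start` explicitly to use a different value. The function name and the `spread` parameter are invented for this example.

```python
import math

def candidate_settings(n_variables, start=None, spread=1):
    """Settings to try: the starting value plus a few above and below it."""
    if start is None:
        # Assumption: sqrt of the variable count, a common convention
        start = max(1, round(math.sqrt(n_variables)))
    low = max(1, start - spread)             # at least one variable
    high = min(n_variables, start + spread)  # no more than all variables
    return list(range(low, high + 1))

print(candidate_settings(16))
print(candidate_settings(4, start=2))
```

Each candidate would then be used to fit a forest, and the setting with the best out-of-bag or validation performance kept.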
• Consists of a large number of individual decision trees that operate as an ensemble
• Each tree in the random forest spits out a class prediction
• Class with most votes becomes the model’s prediction
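The voting step amounts to counting the class predictions and taking the most frequent one. A minimal sketch with made-up votes from five trees; the function name `forest_prediction` is hypothetical.

```python
from collections import Counter

def forest_prediction(tree_predictions):
    """Return the class with the most votes across the trees."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

# Illustrative votes from five trees (not real model output)
print(forest_prediction(["Yes", "No", "Yes", "Yes", "No"]))
```

In this sketch a tie goes to whichever class was counted first; using an odd number of trees avoids ties in a two-class problem.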
EDA – bivariate
Predict for train & test
Model performance
• Accuracy, sensitivity, specificity
• AUC
Variable importance
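The performance measures named here can be computed directly from predictions. The sketch below uses made-up labels and scores purely for illustration, treats "Yes" as the positive class, and applies a 0.5 cut-off; all three choices are assumptions. AUC is computed as the share of positive/negative pairs in which the positive row gets the higher score.

```python
def accuracy_sens_spec(actual, predicted, positive="Yes"):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

def auc(actual, scores, positive="Yes"):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: true labels and predicted probability of "Yes"
actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
predicted = ["Yes" if s >= 0.5 else "No" for s in scores]  # assumed 0.5 cut-off

print(accuracy_sens_spec(actual, predicted))
print(auc(actual, scores))
```

Running the same functions on both the train and the test predictions shows whether the model performs much better on data it has already seen.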