When comparing the accuracy of different linear regression models, RMSE is a better choice than R-squared, because RMSE is expressed in the same units as the target variable and directly penalizes large errors, whereas R-squared only measures the fraction of variance explained.
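For a concrete comparison, here is a minimal NumPy sketch of both metrics; the y_true and y_pred arrays below are made-up values for illustration only:

    import numpy as np

    def rmse(y_true, y_pred):
        # Root Mean Squared Error, in the same units as the target.
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

    def r_squared(y_true, y_pred):
        # Fraction of the target's variance explained by the model.
        ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
        return 1 - ss_res / ss_tot

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.3, 6.9, 9.4])
    print(rmse(y_true, y_pred))       # error in target units
    print(r_squared(y_true, y_pred))  # unitless score, at most 1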
    Decision Trees
    In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and
    edges (arrows) and is built from a dataset (a table whose columns represent features/attributes
    and whose rows correspond to records). Each node either makes a decision (known as a decision
    node) or represents an outcome (known as a leaf node).
    [Figure: Decision tree example]
    The figure above depicts a decision tree that is used to classify whether a person is
    Fit or Unfit.
    The decision nodes here are questions like 'Is the person less than 30 years of age?', 'Does
    the person eat junk food?', etc., and the leaves are one of the two possible outcomes, viz.
    Fit and Unfit.
    Looking at the decision tree, we can make the following decisions:
    if a person is less than 30 years of age and doesn't eat junk food, then he is Fit; if a person
    is less than 30 years of age and eats junk food, then he is Unfit; and so on.
    The initial node is called the root node (colored in blue), the final nodes are called the leaf
    nodes (colored in green) and the rest of the nodes are called intermediate or internal nodes.
    The root and intermediate nodes represent the decisions while the leaf nodes represent the
    outcomes.
    ID3
    ID3 stands for Iterative Dichotomiser 3 and is named as such because the algorithm iteratively
    (repeatedly) dichotomizes (divides) the features into two or more groups at each step.
    Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In
    simple words, the top-down approach means that we start building the tree from the top and
    the greedy approach means that at each iteration we select the best feature at the present
    moment to create a node.
    ID3 is generally used only for classification problems with nominal (categorical) features.
    ID3 Steps
        1. Calculate the Information Gain of each feature (a sketch of this computation follows
           the list).
        2. If the rows do not all belong to the same class, split the dataset S into subsets
           using the feature for which the Information Gain is maximum.
        3. Make a decision tree node using the feature with the maximum Information Gain.
        4. If all rows belong to the same class, make the current node a leaf node with the class
           as its label.
        5. Repeat for the remaining features until we run out of features or the decision tree
           consists entirely of leaf nodes.
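    To make Step 1 concrete, here is a minimal Python sketch of the entropy and Information Gain
    computations; the eats_junk/fitness lists are toy data invented purely for illustration:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a list of class labels.
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def information_gain(feature_values, labels):
        # Entropy of the parent set minus the weighted entropy of the
        # subsets produced by splitting on the feature.
        total = len(labels)
        remainder = 0.0
        for v in set(feature_values):
            subset = [lab for f, lab in zip(feature_values, labels) if f == v]
            remainder += (len(subset) / total) * entropy(subset)
        return entropy(labels) - remainder

    # Toy data (assumed): does eating junk food predict fitness?
    eats_junk = ["yes", "yes", "no", "no", "yes"]
    fitness   = ["Unfit", "Unfit", "Fit", "Fit", "Fit"]
    print(information_gain(eats_junk, fitness))  # ~0.42: a fairly informative split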
    CART Algorithm
    The CART algorithm works via the following process:
        1. Find the best split point for each input feature.
        2. Based on the best split points of each input from Step 1, identify the overall best
           split point.
        3. Split the chosen input according to that best split point.
        4. Continue splitting until a stopping rule is satisfied or no further desirable split
           is available.
    The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so by
    searching for the split that yields the most homogeneous sub-nodes, with the help of the Gini
    index criterion.
    Gini index/Gini impurity
    The Gini index is the metric CART uses for classification tasks. It is based on the sum of the
    squared probabilities of each class: Gini(S) = 1 - sum_i (p_i)^2, where p_i is the proportion
    of rows in S belonging to class i. It measures the probability that a specific element would be
    wrongly classified if it were labeled randomly according to the class distribution, and it is a
    variation of the Gini coefficient. It works on categorical variables, yields outcomes such as
    "success" or "failure", and hence conducts binary splitting only.
    The value of the Gini index varies from 0 to 1, where:
        o   A value of 0 indicates that all the elements belong to a single class, i.e., the node
            is pure.
        o   A value approaching 1 indicates that the elements are randomly distributed across many
            classes.
        o   A value of 0.5 indicates that the elements are uniformly distributed between two
            classes (the maximum impurity for a binary split).
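    As a small illustration, here is a Python sketch of the computation; the label lists are made
    up to show the pure and maximally mixed cases:

    from collections import Counter

    def gini_impurity(labels):
        # Gini impurity: 1 minus the sum of squared class probabilities.
        total = len(labels)
        return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

    print(gini_impurity(["Fit", "Fit", "Fit"]))            # 0.0 -> pure node
    print(gini_impurity(["Fit", "Unfit", "Fit", "Unfit"])) # 0.5 -> 50/50 mix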
    Classification tree
    A classification tree is an algorithm where the target variable is categorical. The algorithm
    is used to identify the class within which the target variable is most likely to fall.
    Classification trees are used when the dataset needs to be split into classes belonging to the
    response variable (for example, yes or no).
    Regression tree
    A regression tree is an algorithm where the target variable is continuous and the tree is used
    to predict its value; for example, when the response variable is the temperature of the day.
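    To illustrate the contrast between the two tree types, here is a minimal sketch using
    scikit-learn's tree estimators; the tiny datasets are invented for illustration only:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Classification tree: categorical target (1 = Fit, 0 = Unfit).
    X_cls = [[25, 1], [45, 1], [28, 0], [60, 0]]  # [age, eats_junk]
    y_cls = [0, 0, 1, 1]
    clf = DecisionTreeClassifier().fit(X_cls, y_cls)
    print(clf.predict([[26, 0]]))  # predicted class label

    # Regression tree: continuous target (made-up daily temperatures).
    X_reg = [[1], [2], [3], [4], [5]]  # day index
    y_reg = [21.5, 22.0, 25.3, 24.8, 23.1]
    reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
    print(reg.predict([[3]]))  # predicted temperature for day 3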
    Pseudo-code of the CART algorithm
    d = 0, endtree = 0
    Node(0) = 1, Node(1) = 0, Node(2) = 0
    while endtree < 1
        if Node(2^d - 1) + Node(2^d) + ... + Node(2^(d+1) - 2) = 2^d - 2^(d+1)
            endtree = 1
        else
            do i = 2^d - 1, 2^d, ..., 2^(d+1) - 2
                if Node(i) > -1
                    Split tree
                else
                    Node(2i+1) = -1
                    Node(2i+2) = -1
                end if
            end do
        end if
        d = d + 1
    end while
    Here Node(i) holds the state of node i in a complete binary tree stored in level order (the
    children of node i are nodes 2i+1 and 2i+2), with -1 marking a terminal node. The sum in the
    test equals 2^d - 2^(d+1) = -2^d exactly when all 2^d nodes at depth d are terminal, at which
    point the tree is complete.
    CART model representation
    CART models are formed by picking input variables and evaluating split points on those
    variables until an appropriate tree is produced.
    Steps to create a Decision Tree using the CART algorithm:
        o   Greedy algorithm: the input space is divided using a greedy method known as recursive
            binary splitting. This is a numerical procedure in which all the values are lined up
            and several candidate split points are tried and assessed using a cost function (see
            the sketch after this list).
        o   Stopping criterion: as it works its way down the tree with the training data, the
            recursive binary splitting method described above must know when to stop splitting.
            The most frequent halting method is to require a minimum number of training rows at
            each leaf node; if a split would leave a node with fewer rows than this threshold, the
            split is rejected and the node becomes a leaf.
        o   Tree pruning: a decision tree's complexity is defined as the number of splits in the
            tree. Trees with fewer branches are recommended, as they are simpler to grasp and less
            prone to overfitting the data. The quickest and simplest pruning approach is to work
            through each leaf node in the tree and evaluate the effect of deleting it using a
            hold-out test set.
        o   Data preparation for CART: no special data preparation is required for the CART
            algorithm.
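    Tying the greedy split search and the stopping criterion together, here is a minimal,
    self-contained Python sketch of recursive binary splitting with a Gini cost function. The toy
    dataset and the min_samples threshold are assumptions for illustration; this is not CART's
    full machinery (no pruning is shown):

    from collections import Counter

    def gini(labels):
        # Gini impurity of a list of class labels.
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_split(rows, labels):
        # Greedy search: try every observed value of every feature as a
        # threshold and keep the split with the lowest weighted Gini cost.
        best = None  # (cost, feature_index, threshold)
        for f in range(len(rows[0])):
            for t in set(row[f] for row in rows):
                left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
                right = [lab for row, lab in zip(rows, labels) if row[f] > t]
                if not left or not right:
                    continue  # degenerate split, skip
                cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or cost < best[0]:
                    best = (cost, f, t)
        return best

    def build_tree(rows, labels, min_samples=2):
        # Stopping criterion: too few rows, a pure node, or no valid split.
        split = None if len(labels) < min_samples or gini(labels) == 0 \
            else best_split(rows, labels)
        if split is None:
            return Counter(labels).most_common(1)[0][0]  # leaf: majority class
        _, f, t = split
        li = [i for i, row in enumerate(rows) if row[f] <= t]
        ri = [i for i, row in enumerate(rows) if row[f] > t]
        return {
            "feature": f,
            "threshold": t,
            "left": build_tree([rows[i] for i in li], [labels[i] for i in li], min_samples),
            "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri], min_samples),
        }

    # Toy data (assumed): [age, eats_junk] -> 1 = Fit, 0 = Unfit
    rows = [[25, 1], [45, 1], [28, 0], [60, 0], [22, 0]]
    labels = [0, 0, 1, 1, 1]
    print(build_tree(rows, labels))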
    Naïve Bayes Classifier Algorithm
        o   Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
            theorem and used for solving classification problems.
        o   It is mainly used in text classification that includes a high-dimensional training dataset.
        o   Naïve Bayes Classifier is one of the simplest and most effective classification
            algorithms, and it helps in building fast machine learning models that can make
            quick predictions.
        o   It is a probabilistic classifier, which means it predicts on the basis of the
            probability of an object.
        o   Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
            analysis, and classifying articles.
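    As a concrete example of the text-classification use case, here is a minimal scikit-learn
    sketch of a Naïve Bayes spam filter; the four toy messages are invented for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy training data (assumed): 1 = spam, 0 = not spam.
    messages = ["win a free prize now", "meeting at noon tomorrow",
                "free cash click now", "lunch with the team"]
    is_spam = [1, 0, 1, 0]

    vec = CountVectorizer()                  # high-dimensional word-count features
    X = vec.fit_transform(messages)
    model = MultinomialNB().fit(X, is_spam)  # learns P(word | class) and P(class)

    test = vec.transform(["free prize tomorrow"])
    print(model.predict(test))        # most probable class
    print(model.predict_proba(test))  # per-class probabilities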
    Why is it called Naïve Bayes?