When comparing the accuracy of different linear regression models, RMSE is a better choice than R-squared, because RMSE is expressed in the same units as the target variable and directly penalizes large errors, whereas R-squared only measures the fraction of variance explained.
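For a concrete comparison, here is a minimal NumPy sketch of both metrics; the y_true and y_pred arrays below are made-up values for illustration only:

    import numpy as np

    def rmse(y_true, y_pred):
        # Root Mean Squared Error, in the same units as the target.
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

    def r_squared(y_true, y_pred):
        # Fraction of the target's variance explained by the model.
        ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
        return 1 - ss_res / ss_tot

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.3, 6.9, 9.4])
    print(rmse(y_true, y_pred))       # error in target units
    print(r_squared(y_true, y_pred))  # unitless score, at most 1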
    Decision Trees
    In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and
    edges (arrows) and is built from a dataset (a table whose columns represent features/attributes
    and whose rows correspond to records). Each node either makes a decision (known as a decision
    node) or represents an outcome (known as a leaf node).
    [Figure: Decision tree example]
    The figure above depicts a decision tree that is used to classify whether a person is
    Fit or Unfit.
    The decision nodes here are questions like 'Is the person less than 30 years of age?', 'Does
    the person eat junk food?', etc., and the leaves are one of the two possible outcomes, viz.
    Fit and Unfit.
    Looking at the decision tree, we can make the following decisions:
    if a person is less than 30 years of age and doesn't eat junk food, then he is Fit; if a person
    is less than 30 years of age and eats junk food, then he is Unfit; and so on.
    The initial node is called the root node (colored in blue), the final nodes are called the leaf
    nodes (colored in green) and the rest of the nodes are called intermediate or internal nodes.
    The root and intermediate nodes represent the decisions while the leaf nodes represent the
    outcomes.
    ID3
    ID3 stands for Iterative Dichotomiser 3 and is named as such because the algorithm iteratively
    (repeatedly) dichotomizes (divides) the features into two or more groups at each step.
    Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In
    simple words, the top-down approach means that we start building the tree from the top and
    the greedy approach means that at each iteration we select the best feature at the present
    moment to create a node.
    ID3 is generally used only for classification problems with nominal (categorical) features.
    ID3 Steps
        1. Calculate the Information Gain of each feature (a sketch of this computation follows
           the list).
        2. If the rows do not all belong to the same class, split the dataset S into subsets
           using the feature for which the Information Gain is maximum.
        3. Make a decision tree node using the feature with the maximum Information Gain.
        4. If all rows belong to the same class, make the current node a leaf node with the class
           as its label.
        5. Repeat for the remaining features until we run out of features or the decision tree
           consists entirely of leaf nodes.
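    To make Step 1 concrete, here is a minimal Python sketch of the entropy and Information Gain
    computations; the eats_junk/fitness lists are toy data invented purely for illustration:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a list of class labels.
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def information_gain(feature_values, labels):
        # Entropy of the parent set minus the weighted entropy of the
        # subsets produced by splitting on the feature.
        total = len(labels)
        remainder = 0.0
        for v in set(feature_values):
            subset = [lab for f, lab in zip(feature_values, labels) if f == v]
            remainder += (len(subset) / total) * entropy(subset)
        return entropy(labels) - remainder

    # Toy data (assumed): does eating junk food predict fitness?
    eats_junk = ["yes", "yes", "no", "no", "yes"]
    fitness   = ["Unfit", "Unfit", "Fit", "Fit", "Fit"]
    print(information_gain(eats_junk, fitness))  # ~0.42: a fairly informative split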
    CART Algorithm
    The CART algorithm works via the following process:
        1. Find the best split point for each input feature.
        2. Based on the best split points of each input from Step 1, identify the overall best
           split point.
        3. Split the chosen input according to that best split point.
        4. Continue splitting until a stopping rule is satisfied or no further desirable split
           is available.
    The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so by
    searching for the split that yields the most homogeneous sub-nodes, with the help of the Gini
    index criterion.
    Gini index/Gini impurity
    The Gini index is the metric CART uses for classification tasks. It is based on the sum of the
    squared probabilities of each class: Gini(S) = 1 - sum_i (p_i)^2, where p_i is the proportion
    of rows in S belonging to class i. It measures the probability that a specific element would be
    wrongly classified if it were labeled randomly according to the class distribution, and it is a
    variation of the Gini coefficient. It works on categorical variables, yields outcomes such as
    "success" or "failure", and hence conducts binary splitting only.
    The value of the Gini index varies from 0 to 1, where:
        o   A value of 0 indicates that all the elements belong to a single class, i.e., the node
            is pure.
        o   A value approaching 1 indicates that the elements are randomly distributed across many
            classes.
        o   A value of 0.5 indicates that the elements are uniformly distributed between two
            classes (the maximum impurity for a binary split).
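    As a small illustration, here is a Python sketch of the computation; the label lists are made
    up to show the pure and maximally mixed cases:

    from collections import Counter

    def gini_impurity(labels):
        # Gini impurity: 1 minus the sum of squared class probabilities.
        total = len(labels)
        return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

    print(gini_impurity(["Fit", "Fit", "Fit"]))            # 0.0 -> pure node
    print(gini_impurity(["Fit", "Unfit", "Fit", "Unfit"])) # 0.5 -> 50/50 mix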
    Classification tree
    A classification tree is an algorithm where the target variable is categorical. The algorithm
    is used to identify the class within which the target variable is most likely to fall.
    Classification trees are used when the dataset needs to be split into classes belonging to the
    response variable (for example, yes or no).
    Regression tree
    A regression tree is an algorithm where the target variable is continuous and the tree is used
    to predict its value; for example, when the response variable is the temperature of the day.
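    To illustrate the contrast between the two tree types, here is a minimal sketch using
    scikit-learn's tree estimators; the tiny datasets are invented for illustration only:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Classification tree: categorical target (1 = Fit, 0 = Unfit).
    X_cls = [[25, 1], [45, 1], [28, 0], [60, 0]]  # [age, eats_junk]
    y_cls = [0, 0, 1, 1]
    clf = DecisionTreeClassifier().fit(X_cls, y_cls)
    print(clf.predict([[26, 0]]))  # predicted class label

    # Regression tree: continuous target (made-up daily temperatures).
    X_reg = [[1], [2], [3], [4], [5]]  # day index
    y_reg = [21.5, 22.0, 25.3, 24.8, 23.1]
    reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
    print(reg.predict([[3]]))  # predicted temperature for day 3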
    Pseudo-code of the CART algorithm
    d = 0, endtree = 0
    Node(0) = 1, Node(1) = 0, Node(2) = 0
    while endtree < 1
        if Node(2^d - 1) + Node(2^d) + ... + Node(2^(d+1) - 2) = 2^d - 2^(d+1)
            endtree = 1
        else
            do i = 2^d - 1, 2^d, ..., 2^(d+1) - 2
                if Node(i) > -1
                    Split tree
                else
                    Node(2i+1) = -1
                    Node(2i+2) = -1
                end if
            end do
        end if
        d = d + 1
    end while
    Here Node(i) holds the state of node i in a complete binary tree stored in level order (the
    children of node i are nodes 2i+1 and 2i+2), with -1 marking a terminal node. The sum in the
    test equals 2^d - 2^(d+1) = -2^d exactly when all 2^d nodes at depth d are terminal, at which
    point the tree is complete.
    CART model representation
    CART models are formed by picking input variables and evaluating split points on those
    variables until an appropriate tree is produced.
    Steps to create a Decision Tree using the CART algorithm:
        o   Greedy algorithm: the input space is divided using a greedy method known as recursive
            binary splitting. This is a numerical procedure in which all the values are lined up
            and several candidate split points are tried and assessed using a cost function (see
            the sketch after this list).
        o   Stopping criterion: as it works its way down the tree with the training data, the
            recursive binary splitting method described above must know when to stop splitting.
            The most frequent halting method is to require a minimum number of training rows at
            each leaf node; if a split would leave a node with fewer rows than this threshold, the
            split is rejected and the node becomes a leaf.
        o   Tree pruning: a decision tree's complexity is defined as the number of splits in the
            tree. Trees with fewer branches are recommended, as they are simpler to grasp and less
            prone to overfitting the data. The quickest and simplest pruning approach is to work
            through each leaf node in the tree and evaluate the effect of deleting it using a
            hold-out test set.
        o   Data preparation for CART: no special data preparation is required for the CART
            algorithm.
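    Tying the greedy split search and the stopping criterion together, here is a minimal,
    self-contained Python sketch of recursive binary splitting with a Gini cost function. The toy
    dataset and the min_samples threshold are assumptions for illustration; this is not CART's
    full machinery (no pruning is shown):

    from collections import Counter

    def gini(labels):
        # Gini impurity of a list of class labels.
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_split(rows, labels):
        # Greedy search: try every observed value of every feature as a
        # threshold and keep the split with the lowest weighted Gini cost.
        best = None  # (cost, feature_index, threshold)
        for f in range(len(rows[0])):
            for t in set(row[f] for row in rows):
                left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
                right = [lab for row, lab in zip(rows, labels) if row[f] > t]
                if not left or not right:
                    continue  # degenerate split, skip
                cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or cost < best[0]:
                    best = (cost, f, t)
        return best

    def build_tree(rows, labels, min_samples=2):
        # Stopping criterion: too few rows, a pure node, or no valid split.
        split = None if len(labels) < min_samples or gini(labels) == 0 \
            else best_split(rows, labels)
        if split is None:
            return Counter(labels).most_common(1)[0][0]  # leaf: majority class
        _, f, t = split
        li = [i for i, row in enumerate(rows) if row[f] <= t]
        ri = [i for i, row in enumerate(rows) if row[f] > t]
        return {
            "feature": f,
            "threshold": t,
            "left": build_tree([rows[i] for i in li], [labels[i] for i in li], min_samples),
            "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri], min_samples),
        }

    # Toy data (assumed): [age, eats_junk] -> 1 = Fit, 0 = Unfit
    rows = [[25, 1], [45, 1], [28, 0], [60, 0], [22, 0]]
    labels = [0, 0, 1, 1, 1]
    print(build_tree(rows, labels))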
    Naïve Bayes Classifier Algorithm
        o   Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
            theorem and used for solving classification problems.
        o   It is mainly used in text classification that includes a high-dimensional training dataset.
        o   Naïve Bayes Classifier is one of the simplest and most effective classification
            algorithms, and it helps in building fast machine learning models that can make
            quick predictions.
        o   It is a probabilistic classifier, which means it predicts on the basis of the
            probability of an object.
        o   Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
            analysis, and classifying articles.
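    As a concrete example of the text-classification use case, here is a minimal scikit-learn
    sketch of a Naïve Bayes spam filter; the four toy messages are invented for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy training data (assumed): 1 = spam, 0 = not spam.
    messages = ["win a free prize now", "meeting at noon tomorrow",
                "free cash click now", "lunch with the team"]
    is_spam = [1, 0, 1, 0]

    vec = CountVectorizer()                  # high-dimensional word-count features
    X = vec.fit_transform(messages)
    model = MultinomialNB().fit(X, is_spam)  # learns P(word | class) and P(class)

    test = vec.transform(["free prize tomorrow"])
    print(model.predict(test))        # most probable class
    print(model.predict_proba(test))  # per-class probabilities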
    Why is it called Naïve Bayes?