
Machine Learning
Prepared By: Dr. Sara Sweidan

Divide and Conquer – Classification Using Decision Trees and Rules
To Know

Decision trees and rule learners: two machine learning methods that make complex decisions from sets of simple choices.

These methods present their knowledge in the form of logical structures that can be understood with no statistical knowledge. This aspect makes these models particularly useful for business strategy and process improvement.
Understanding of decision tree
• Concept of Decision Tree

• Use of Decision Tree to classify data

• Basic algorithm to build Decision Tree

• Concept of Entropy
• Basic concept of entropy in information theory
• Mathematical formulation of entropy
• Calculation of entropy of a training set

• Decision Tree induction algorithms


• ID3, CART, C4.5, C5.0
Why use decision trees
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• It is a graphical representation for obtaining all the possible solutions to a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into sub-trees.
Decision tree terminology

The decision tree comprises a root node, leaf nodes, branches, parent/child nodes, etc. The following is an explanation of this terminology.
• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
• Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A sub-tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child Node: The root node of the tree is called the parent node, and the other nodes are called child nodes.
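To make this terminology concrete, here is a minimal Python sketch of how decision nodes and leaf nodes might be represented; the class names, the Salary attribute values, and the leaf labels are illustrative assumptions, not taken from the slides.

class LeafNode:
    """Leaf node: the final output, labeled with a class; no further branches."""
    def __init__(self, label):
        self.label = label

class DecisionNode:
    """Decision (parent) node: tests one attribute and has one branch per value,
    each leading to a child node (a sub-tree or a leaf)."""
    def __init__(self, attribute, children):
        self.attribute = attribute   # attribute tested at this node
        self.children = children     # dict: attribute value -> child node

def classify(node, record):
    """Follow the branches from the root node down to a leaf node."""
    while isinstance(node, DecisionNode):
        node = node.children[record[node.attribute]]
    return node.label

# Hypothetical tree for a job-offer decision: the root splits on Salary,
# and each branch here ends in a leaf.
tree = DecisionNode("Salary", {
    "high": LeafNode("Accept"),
    "medium": LeafNode("Negotiate"),
    "low": LeafNode("Decline"),
})
print(classify(tree, {"Salary": "high"}))   # -> Accept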
Understanding of decision tree
• A Decision Tree is an important data structure known to solve many computational problems

Example 1: Binary Decision Tree

A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7
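Read as a program, the binary decision tree of Example 1 is a chain of two-way tests on A, B, and C, with one leaf (minterm) per row of the truth table; the sketch below simply encodes that table.

def f(A, B, C):
    """Binary decision tree for the truth table of Example 1: each level tests one
    Boolean attribute and each leaf corresponds to one minterm m0..m7."""
    if A == 0:
        if B == 0:
            return "m0" if C == 0 else "m1"
        return "m2" if C == 0 else "m3"
    if B == 0:
        return "m4" if C == 0 else "m5"
    return "m6" if C == 0 else "m7"

# Every row of the table reaches its own leaf:
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, f(a, b, c))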
Understanding of decision tree
• In Example 1, we considered a decision tree where the values of every attribute are binary. Decision trees are also possible where attributes are of a continuous data type.

Example 2: Decision Tree with numeric data


DT characteristics
• Decision tree may be n-ary, n ≥ 2.
• There is a special node called root node.
• All nodes drawn with circle (ellipse) are called internal nodes.
• All nodes drawn with square boxes are called terminal nodes or leaf nodes.
• Edges of a node represent the outcome for a value of the node.
• In a path, a node with the same label is never repeated.

• Decision tree is not unique, as different ordering of internal nodes can give different
decision tree.
• Example: Suppose a candidate has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an Attribute Selection Measure).
Building Decision Tree
• In principle, there are exponentially many decision trees that can be constructed from a given dataset (also called training data).

• Some of the trees may not be optimum
• Some of them may give inaccurate results

• Two approaches are known:

➢ Greedy strategy
• A top-down, recursive, divide-and-conquer approach
➢ Modification of the greedy strategy
• ID3, C4.5, CART, C5.0
Definition of decision tree

Definition 1: Decision Tree

Given a database D = {t₁, t₂, …, tₙ}, where tᵢ denotes a tuple defined by a set of attributes A = {A₁, A₂, …, Aₘ}. Also given is a set of classes C = {c₁, c₂, …, cₖ}.

A decision tree T is a tree associated with D that has the following properties:

• Each internal node is labeled with an attribute Aᵢ
• Each edge is labeled with a predicate (value) that can be applied to the attribute associated with its parent node
• Each leaf node is labeled with a class cⱼ

Decision Tree algorithm for classification

• Step-1: Begin the tree with the root node, say D, which contains the complete dataset.

• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM), e.g., information gain or the Gini index.

• Step-3: Divide D into subsets, one for each possible value of the best attribute.

• Step-4: Generate the decision tree node that contains the best attribute.

• Step-5: Recursively build new decision trees using the subsets created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; such final nodes are called leaf nodes.
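As a hedged sketch of these steps in practice, scikit-learn's DecisionTreeClassifier lets the ASM of Step-2 be chosen through the criterion parameter ("entropy" for information gain, "gini" for the Gini index); the toy job-offer data below is invented purely for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up toy data: [salary_in_thousands, commute_minutes] -> accept the offer (1) or not (0)
X = [[40, 60], [55, 45], [70, 30], [80, 20], [45, 90], [90, 15]]
y = [0, 0, 1, 1, 0, 1]

# Step-2: choose the ASM via `criterion`; Steps 3-5 (splitting the data and recursing
# until only leaf nodes remain) happen inside fit().
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Print the learned tree: internal nodes test attributes, leaves give the class.
print(export_text(clf, feature_names=["salary", "commute"]))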
Build Decision Tree (BuildDT) Algorithm
• Input: D : Training dataset
• Output: T : Decision tree

Steps

1. If all tuples in D belong to the same class Cj
   Add a leaf node labeled as Cj
   Return // Termination condition

2. Select an attribute Ai (so that it is not selected twice in the same branch)

3. Partition D = {D1, D2, …, Dp} based on the p different values of Ai in D

4. For each Dk ∈ D
   Create a node and add an edge between D and Dk labeled with the value of Ai in Dk

5. For each Dk ∈ D
   BuildDT(Dk) // Recursive call

6. Stop
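A minimal Python sketch of this recursion follows; to keep it short, the attribute in Step 2 is chosen naively as the first unused one rather than by an ASM, and the data and attribute names in the usage example are made up.

from collections import Counter

def build_dt(D, attributes, target):
    """BuildDT sketch. D: list of dicts (tuples); attributes: names not yet used on this branch."""
    classes = [row[target] for row in D]

    # Step 1: all tuples belong to the same class Cj -> leaf labeled Cj (termination condition)
    if len(set(classes)) == 1:
        return classes[0]
    # No attribute left to select -> label the leaf with the majority class
    if not attributes:
        return Counter(classes).most_common(1)[0][0]

    # Step 2: select an attribute Ai (naively, the first unused one; an ASM would go here)
    Ai, rest = attributes[0], attributes[1:]

    # Steps 3-5: partition D on the distinct values of Ai, add one labeled edge per partition,
    # and recurse on each partition Dk
    node = {}
    for value in sorted({row[Ai] for row in D}):
        Dk = [row for row in D if row[Ai] == value]
        node[(Ai, value)] = build_dt(Dk, rest, target)
    return node

# Made-up usage: decide whether to accept a job offer
D = [{"Salary": "high", "Commute": "short", "Accept": "yes"},
     {"Salary": "low",  "Commute": "short", "Accept": "no"},
     {"Salary": "high", "Commute": "long",  "Accept": "yes"}]
print(build_dt(D, ["Salary", "Commute"], target="Accept"))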
Node Splitting in BuildDT Algorithm
• The BuildDT algorithm must provide a method for expressing an attribute test condition and the corresponding outcomes for different attribute types

• Case: Binary attribute


• This is the simplest case of node splitting

• The test condition for a binary attribute generates only two outcomes
Node Splitting in BuildDT Algorithm
• Case: Nominal attribute
• Since a nominal attribute can have many values, its test condition can be expressed in two
ways:
• A multi-way split
• A binary split

• Multi-way split: the outcome depends on the number of distinct values of the corresponding attribute

• Binary splitting by grouping attribute values


Node Splitting in BuildDT Algorithm
• Case: Ordinal attribute
• It also can be expressed in two ways:
• A multi-way split
• A binary split

• Multi-way split: the same as in the case of a nominal attribute

• Binary split: attribute values should be grouped while maintaining the order property of the attribute values
Node Splitting in BuildDT Algorithm
• Case: Numerical attribute
• For a numeric attribute (with discrete or continuous values), a test condition can be expressed as a comparison test
• Binary outcome: A > v or A ≤ v
• In this case, decision tree induction must consider all possible split positions

• Range query: vᵢ ≤ A < vᵢ₊₁ for i = 1, 2, …, q (if q ranges are chosen)
• Here, q should be decided a priori
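As a small illustration of enumerating the possible split positions v for a numeric attribute, one common choice (a sketch, not prescribed by the slides) is the midpoints between consecutive distinct sorted values; the height values used below are taken from the training set on the next slide.

def candidate_split_positions(values):
    """Candidate thresholds v for a numeric attribute A: midpoints between consecutive
    distinct sorted values; each v defines one binary test A <= v versus A > v."""
    v = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(v, v[1:])]

# Height values from the training set that follows
heights = [1.6, 2.0, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 2.2, 2.1, 1.8, 1.95, 1.9, 1.8, 1.75]
print(candidate_split_positions(heights))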
Node Splitting in BuildDT Algorithm
Example illustration of BuildDT Algorithm
• Consider a training data set as shown.

Person  Gender  Height  Class
1       F       1.6     S
2       M       2.0     M
3       F       1.9     M
4       F       1.88    M
5       F       1.7     S
6       M       1.85    M
7       F       1.6     S
8       M       1.7     S
9       M       2.2     T
10      M       2.1     T
11      F       1.8     M
12      M       1.95    M
13      F       1.9     M
14      F       1.8     M
15      F       1.75    S

Attributes:
Gender = {Male (M), Female (F)}   // Binary attribute
Height = {1.5, …, 2.5}            // Continuous attribute
Class  = {Short (S), Medium (M), Tall (T)}

Given a person, we are to test to which class s/he belongs.
Node Splitting in BuildDT Algorithm
Example illustration of BuildDT Algorithm
• To build a decision tree, we can select attributes in two different orderings: <Gender, Height> or <Height, Gender>

• Further, for each ordering, we can choose different ways of splitting

• Different instances are shown in the following.

• Approach 1 : <Gender, Height>
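A minimal sketch of one tree consistent with Approach 1: split on Gender first and then threshold Height. The thresholds (1.75 for females; 1.7 and 2.0 for males) are chosen here by inspection of the 15 tuples above, not produced by any particular induction algorithm.

# Training set from the previous slide: (Gender, Height, Class)
data = [
    ("F", 1.6, "S"), ("M", 2.0, "M"), ("F", 1.9, "M"), ("F", 1.88, "M"),
    ("F", 1.7, "S"), ("M", 1.85, "M"), ("F", 1.6, "S"), ("M", 1.7, "S"),
    ("M", 2.2, "T"), ("M", 2.1, "T"), ("F", 1.8, "M"), ("M", 1.95, "M"),
    ("F", 1.9, "M"), ("F", 1.8, "M"), ("F", 1.75, "S"),
]

def classify(gender, height):
    """<Gender, Height> ordering: the root node splits on Gender, then Height is thresholded."""
    if gender == "F":
        return "S" if height <= 1.75 else "M"
    # Male branch
    if height <= 1.7:
        return "S"
    return "M" if height <= 2.0 else "T"

# This hand-built tree reproduces every class label in the training set.
assert all(classify(g, h) == c for g, h, c in data)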


Node Splitting in BuildDT Algorithm
• Approach 2: <Height, Gender>
Concept of entropy

More ordered / organized data is less probable and has less entropy; less ordered / disorganized data is more probable and has higher entropy.
Concept of entropy
• Entropy is a measure of the uncertainty associated with a random variable
• As the uncertainty and/or randomness of a result set increases, so does the entropy
• For a two-class problem, values range from 0 to 1
• To represent the entropy of information:

Entropy(D) = − Σ pᵢ log₂(pᵢ), where the sum runs over the classes i = 1, …, k

(the greater the uncertainty, the greater the entropy)
Concept of entropy
Entropy(D) = − Σ pᵢ log₂(pᵢ), summed over the classes i = 1, …, k

• Note:
• The above formula should be summed over the non-empty classes only, that is, classes for which pᵢ ≠ 0

• E is always a non-negative quantity

• E takes its minimum value (zero) if and only if all the instances have the same class (i.e., the training set has only one non-empty class, whose probability is 1).

• Entropy takes its maximum value when the instances are equally distributed among the k possible classes. In this case, the maximum value of E is log₂ k.
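A minimal sketch of this entropy computation in Python (the example label lists are illustrative):

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy(D) = -sum(p_i * log2(p_i)), summed over the non-empty classes only."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 10))          # 0.0: all instances in the same class (minimum)
print(entropy(["yes", "no"] * 5))     # 1.0: two equally likely classes (maximum, log2 2)
print(entropy(["a", "b", "c", "d"]))  # 2.0: four equally likely classes (maximum, log2 4)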
Concept of entropy
• Entropy is an important concept used in physics in the context of heat and, thereby, the uncertainty of the states of matter.

• At a later stage, with the growth of information technology, entropy became an important concept in information theory.

• For the classification job, entropy is an important concept, considered as
• an information-theoretic measure of the "uncertainty" contained in training data due to the presence of more than one class.
Concept of entropy
A question about an event that is certain (or impossible) is answered with 0 uncertainty.

Does the sun rise in the east? (answered with 0 uncertainty)

Will a mother give birth to a male baby? (answered with ½ uncertainty)

Is there a planet like Earth in the galaxy? (answered with extreme uncertainty)
Concept of entropy

➢ Entropy is a measure of the randomness in the information being


processed. The higher the entropy, the harder it is to draw any
conclusions from that information.
➢ It is a measure of disorder or impurity or unpredictability or
uncertainty.
➢ Low entropy means less uncertain and high entropy means more
uncertain.
Information Gain:

• Information gain is the measurement of the change in entropy after the dataset is segmented on an attribute.
• It calculates how much information a feature provides about a class.
• According to the value of information gain, we split the node and build the decision tree. A decision tree algorithm always tries to maximize information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:

Information Gain = Entropy(S) − [Weighted Average] × Entropy(each feature)
Entropy:
Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data. For a two-class (yes/no) problem it can be calculated as:

Entropy(S) = − P(yes) log₂ P(yes) − P(no) log₂ P(no)
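A minimal sketch of information gain built on the same entropy definition (the function names, the grouping logic, and the example split are illustrative, not from the slides):

from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy(S) minus the weighted average entropy of the subsets
    obtained by segmenting the dataset on one attribute."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted_avg = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted_avg

# e.g. a split into pure subsets recovers all of the parent entropy:
print(information_gain(["yes", "yes", "no", "no"], ["a", "a", "b", "b"]))   # 1.0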
Table: Class‐labeled training tuples from All Electronics customer database.
• Class Y: buys_computer = “yes”
• Class N: buys_computer = “no”
• Entropy(D) = − (9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940
• Compute the expected information requirement for each attribute: start with the
attribute age
• Gain(age, D) = Entropy(D) − Σ (|Sᵥ| / |S|) · Entropy(Sᵥ), summed over v ∈ {Youth, Middle-aged, Senior}

  = 0.940 − (5/14) Entropy(S_youth) − (4/14) Entropy(S_middle) − (5/14) Entropy(S_senior)

  = 0.940 − (5/14)(−(2/5) log₂(2/5) − (3/5) log₂(3/5)) − (4/14)(−(4/4) log₂(4/4)) − (5/14)(−(3/5) log₂(3/5) − (2/5) log₂(2/5)) = 0.247

• Gain(income, D) = 0.029
• Gain(student, D) = 0.151
• Gain(credit_rating, D) = 0.048
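The figures above can be reproduced from the class counts stated on the slide (9 "yes" / 5 "no" overall; age = Youth: 2 yes / 3 no, Middle-aged: 4 yes / 0 no, Senior: 3 yes / 2 no); the count-based entropy helper below is a small sketch.

from math import log2

def entropy(counts):
    """Entropy from per-class counts, skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c)

E_D = entropy([9, 5])                     # overall: 9 "yes", 5 "no"
remainder = (5 / 14) * entropy([2, 3]) \
          + (4 / 14) * entropy([4, 0]) \
          + (5 / 14) * entropy([3, 2])    # Youth, Middle-aged, Senior partitions
print(round(E_D, 3))                      # 0.94
print(round(E_D - remainder, 3))          # 0.247  (= Gain(age, D))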
The attribute age has the highest information gain and therefore becomes the splitting attribute at the root node of the decision tree. Branches are grown for each outcome of age, and the tuples are partitioned accordingly.
Strengths
• An all-purpose classifier that does well on most problems
• Highly automatic learning process, which can handle numeric or nominal features, as well as missing data
• Excludes unimportant features
• Can be used on both small and large datasets
• Results in a model that can be interpreted without a mathematical background (for relatively small trees)
• More efficient than other complex models

Weaknesses
• Decision tree models are often biased toward splits on features having a large number of levels
• It is easy to overfit or underfit the model
• Can have trouble modeling some relationships due to reliance on axis-parallel splits
• Small changes in the training data can result in large changes to decision logic
• Large trees can be difficult to interpret and the decisions they make may seem counterintuitive
Underfitting & overfitting

• Underfitting model: Training = 50%, Test = 48%, Bias = 50% (high bias), Variance = 2 (low variance)
• Fitting model: Training = 98%, Test = 92%, Bias = 2% (low bias), Variance = 6 (low variance)
• Overfitting model: Training = 99%, Test = 60%, Bias = 1% (low bias), Variance = 39 (high variance)

Underfitting & overfitting
• Bias
• measures the accuracy or quality of the estimator
• low bias implies that, on average, we will accurately estimate the true parameter from training data

• Variance
• measures the precision or specificity of the estimator
• low variance implies that the estimator does not change much as the training set varies
[Figure: model fit versus complexity, showing an underfit region and an overfit region.]

Regression: Complexity versus Goodness of Fit
• Model complexity = low: highest bias, lowest variance (underfitting)
• Model complexity = medium: low bias, low variance (good fit)
• Model complexity = high: low bias, high variance (overfitting)
Credit: Stanford Machine Learning
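A hedged sketch of the complexity-versus-goodness-of-fit idea using polynomial regression; the data, noise level, and polynomial degrees are invented for illustration, and typically the high-degree fit has the lowest training error but a much higher test error.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)
X_train, y_train = X[::2], y[::2]      # half the points for training
X_test, y_test = X[1::2], y[1::2]      # the other half for testing

for degree in (1, 4, 15):              # low / medium / high model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")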
Fixes to try:

• Try getting more training examples → fixes high variance.
• Try a smaller set of features → fixes high variance.
• Try a larger set of features → fixes high bias.
• Run gradient descent for more iterations → fixes the optimization algorithm.
