Machine Learning
Dr. Shylaja S S
Professor
Department of Computer Science & Engineering
Acknowledgement: Dr. Rajanikanth K (Former Principal MSRIT, PhD (IISc), Academic Advisor, PESU)
Prof. Preet Kanwal (Associate Professor, PESU)
Teaching Assistant: Arya Rajiv Chaloli
Decision Trees
● A decision tree is a flowchart-like diagram used to make decisions based on a series of conditions.
● The learned function is represented by a decision tree.
● A decision tree can also be re-represented as a set of if-then rules to improve human readability.
Decision Trees
● Decision trees can be used for
classification and regression
problems
● Decision trees classify instances by
sorting them down the tree
from the root to some leaf node.
○ Node: Specifies some attribute of
an instance to be tested
○ Branch: Corresponds to one of
the possible values for an
attribute
Decision Tree (multiple branches) - Loan Application
[Figure: example decision tree for a loan application. Root node: Income Range (<30K, 30-70K, >70K). The <30K branch tests Govt job (Yes -> Approve, No -> Reject); the 30-70K branch tests No. of years in current job (<1 -> Reject, 1-5 -> test Credit card outstanding balance, >5 -> Approve); the >70K branch tests Criminal record (Yes -> Reject, No -> Approve). The Credit card outstanding balance node branches on high / medium / low and leads to Reject, Approve, or Approve with premium.]
Terminology
Root Node:
It’s the topmost node where we
start building the tree.
Leaf node:
It’s the end of the decision tree
and carries the decision.
Decision Node:
A node where branching (testing) occurs. A certain attribute is tested and, based on the attribute value, a particular branch is taken.
Terminology - Note
• Every node except a leaf node is a decision node.
• Each decision node tests for an attribute.
• Each branch corresponds to an attribute value.
• Each leaf node assigns a classification label.
• Decision trees represent a disjunction of conjunctions (more expressive than FIND-S).
[Figure: example decision tree for the concept to play Tennis. Root: Outlook; sunny -> Humidity (High -> no, Normal -> yes); overcast -> yes; rain -> Wind (Strong -> no, Weak -> yes).]
For example, the decision is yes whenever: (Outlook = Sunny ^ Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain ^ Wind = Weak)
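To make the tree-to-rules correspondence concrete, here is a small illustrative Python sketch (not from the original slides) that encodes the PlayTennis tree above as if-then rules; the function name play_tennis is made up for this example.

```python
# The PlayTennis tree above, re-represented as if-then rules (illustrative sketch;
# the helper name play_tennis is hypothetical, not part of the slides).
def play_tennis(outlook, humidity, wind):
    if outlook == "Sunny":
        return "yes" if humidity == "Normal" else "no"
    if outlook == "Overcast":
        return "yes"
    if outlook == "Rain":
        return "yes" if wind == "Weak" else "no"

print(play_tennis("Sunny", "Normal", "Strong"))  # yes
print(play_tennis("Rain", "High", "Strong"))     # no
```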
Building Decision Trees
1. Select the “best” decision attribute for the next node - the crux of the problem.
2. Assign that attribute as the decision attribute for the node.
3. For each value of the attribute, create a new descendant of the node (a new branch is opened).
○ It will depend on the subset of instances that fall under that branch.
○ If no further decision making is required, you will reach a leaf node.
4. Sort the training examples to the new nodes.
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes.
[Figure: decision tree for the concept to play Tennis. Root: Outlook; sunny -> Humidity (High -> no, Normal -> yes); overcast -> yes; rain -> Wind (Strong -> no, Weak -> yes).]
Multiple decision trees for the same task
Decision trees vary based on the algorithm used to select the next “best” decision attribute.
Example: to find whether a student plays basketball or not.
[Figure: two different decision trees for the same task, built from the attributes Gender (girl / boy), Performance (above / below average) and Height (above / below 5.5 ft). One tree splits on Gender first, the other on Performance first; both end in YES / NO leaves.]
Which one do we prefer and why?
Some Decision Tree Algorithms
ID3 Algorithm (Iterative Dichotomiser 3)
It was developed in 1986 by Ross Quinlan. The algorithm creates a multiway tree, finding for each node (i.e. in a greedy manner) the categorical feature that will yield the largest information gain for the targets. It is used for classification with categorical features.
CART
Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman. It can be used for classification or regression predictive modeling problems. It uses Gini impurity as the heuristic for classification, and the sum of least squares as the heuristic for regression.
ID3 Algorithm - Entropy
• At a given point, we prefer an attribute as the decision node that
partitions the set into subsets which are as homogeneous as possible.
Ideally, no more decisions are required!
• Entropy is an indicator of non-homogeneity / uncertainty. It is a measure
of disorder (indicator of how messy your data is)
■ Entropy needs to be reduced.
• The higher the entropy, the more non-homogeneous / uncertain / disordered the data set is.
• If the attribute A has c possible values, its entropy with respect to a set S
is given by:
Entropy(A) = H(A) = - ∑ p(xi) log2 (p(xi)); i = 1 to c
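As an illustration (not part of the original slides), a minimal Python sketch of this entropy formula might look as follows; the helper name entropy and the list-of-labels input are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

print(entropy(["yes"] * 5 + ["no"] * 5))   # 1.0   -> maximum disorder for two classes
print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.94 -> the PlayTennis set used later in these slides
```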
ID3 Algorithm - Entropy
BUCKET 1 - LESS ENTROPY: We know for sure that the ball coming out is red.
BUCKET 2 - MEDIUM ENTROPY: We know with 75% certainty that the ball is red, and with 25% certainty that it’s green.
BUCKET 3 - HIGH ENTROPY: We know with 50% certainty that the ball is red, and with 25% certainty each that it’s yellow or green.
● So, we have more knowledge regarding bucket 1 and bucket 1 has the least entropy.
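As a quick check (an illustrative sketch, not from the slides), the three bucket entropies can be computed directly from the class probabilities:

```python
import math

def entropy(probs):
    # Entropy from class probabilities: -sum(p * log2 p); zero-probability terms contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))               # 0 (Python prints -0.0): bucket 1, least entropy
print(entropy([0.75, 0.25]))        # ~0.811: bucket 2, medium entropy
print(entropy([0.5, 0.25, 0.25]))   # 1.5: bucket 3, highest entropy
```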
ID3 Algorithm - Information Gain
● Information gain is a measure of the effectiveness of an attribute in
classifying the training data.
● Information Gain = (Entropy before split – Entropy after split)
■ Information gain should be as high as possible
● Information gain is the expected reduction in entropy caused by
partitioning the examples according to the attribute.
ID3 Algorithm - Information Gain
● Average information entropy: I(A) is the weighted average of the entropies of the individual subsets produced by splitting on attribute A:
I(A) = ∑v ( |Sv| / |S| ) * Entropy(Sv), where the sum runs over the values v of attribute A,
and Gain(S, A) = Entropy(S) - I(A).
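A hedged sketch of how I(A) and Gain(S, A) could be computed in Python, assuming rows are represented as dictionaries; the names information_gain, attr and target are made up for this illustration.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Gain(S, A) = Entropy(S) - I(A), where I(A) is the weighted average entropy
    of the subsets obtained by splitting the rows on attribute A."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row[target])
    avg_info = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - avg_info

# Tiny usage example with four made-up rows.
toy = [
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Sunny", "Play": "Yes"},
    {"Outlook": "Overcast", "Play": "Yes"},
    {"Outlook": "Overcast", "Play": "Yes"},
]
print(information_gain(toy, "Outlook", "Play"))  # ~0.311
```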
ID3 Algorithm - Iterative Dichotomiser 3 (ID3)
Steps to create a decision tree using the ID3 algorithm
1. Compute the entropy for the data set, Entropy(S).
2. For each attribute:
• Calculate the entropy of each subset obtained by splitting on that attribute.
• Take the average information entropy for the current attribute.
• Calculate the gain for the current attribute.
3. Pick the highest gain attribute.
4. Repeat until we get the desired tree (every branch ends in a leaf).
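Putting the steps together, a compact, illustrative Python sketch of the ID3 loop (not an authoritative implementation) could look like this, assuming rows as dictionaries and a list of candidate attribute names:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attr, target):
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row[target])
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - after

def id3(rows, attrs, target):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:                  # pure subset -> leaf node
        return labels[0]
    if not attrs:                              # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))   # step 3: highest-gain attribute
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):         # one branch per attribute value
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best], target)
    return tree
```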
Construct Decision Tree - Example 1
Outlook Temp Humidity Windy Play tennis
Sunny High High Weak No
Sunny High High Strong No
Overcast High High Weak Yes
Rainy Medium High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Medium High Weak No
Sunny Cool Normal Weak Yes
Rainy Medium Normal Weak Yes
Sunny Medium Normal Strong Yes
Overcast Medium High Strong Yes
Overcast High Normal Weak Yes
Rainy Medium High Strong No
Entropy Calculation before Split
ENTROPY(S) - Calculating for the entire set S (the Example 1 table above)
|S| = 14 (training instances provided)
P(play tennis = yes) = p1 = 9/14 = 0.643
P(play tennis = no) = p2 = 5/14 = 0.357
Entropy(S) = - (p1 log2 p1 + p2 log2 p2) = 0.94
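A three-line check of this number (illustrative only):

```python
import math

p1, p2 = 9 / 14, 5 / 14
print(round(-(p1 * math.log2(p1) + p2 * math.log2(p2)), 2))  # 0.94
```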
Calculating Entropy for a Subset
• We will now see what happens if we split based on:
■ Outlook
■ Temp
■ Humidity
■ Windy
• First, we will see what will happen if we split based on Outlook.
■ Outlook has 3 values, which means there will be three branches (three subsets): sunny, overcast and rain.
■ We will calculate the entropy for each of these subsets.
■ We take a weighted sum of the entropies of the three subsets, which becomes the “Entropy after the split”.
■ We then calculate the gain(Outlook) = Entropy before split - Entropy after split
• We repeat this process for each attribute.
• The attribute with the maximum gain is picked as the decision node.
Calculating Entropy for a Subset
Outlook = Sunny subset:
Outlook Temp Humidity Windy Play tennis
Sunny High High Weak No
Sunny High High Strong No
Sunny Medium High Weak No
Sunny Cool Normal Weak Yes
Sunny Medium Normal Strong Yes

Outlook = Overcast subset:
Outlook Temp Humidity Windy Play tennis
Overcast High High Weak Yes
Overcast Cool Normal Strong Yes
Overcast Medium High Strong Yes
Overcast High Normal Weak Yes

Outlook = Rainy subset:
Outlook Temp Humidity Windy Play tennis
Rainy Medium High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Rainy Medium Normal Weak Yes
Rainy Medium High Strong No
Entropy after Split based on Outlook and Gain Calculation
|S| = 14, Entropy(S) = 0.94
● Splitting based on Outlook - There will be three subsets
Ssunny , Sovercast , Srain such that |Ssunny|= 5, |Sovercast|= 4 and |Srain|= 5
● Entropy of these subsets:
■ E(Ssunny) = - ( ⅖ log2 ⅖ + ⅗ log2 ⅗ ) = 0.971
■ E(Sovercast) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Srain) = - ( ⅗ log2 ⅗ + ⅖ log2 ⅖ ) = 0.971
● Avg Information of Outlook : weighted avg of the entropies of the subsets:
I(Outlook) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693
● Gain(S, outlook) = Entropy(S) - I(outlook) = 0.94 - 0.693 = 0.247
Entropy after Split based on Temp and Gain Calculation
|S| = 14, Entropy(S) = 0.94
● Splitting based on Temp - There will be three subsets
Shigh , Smedium , Scool such that |Shigh|= 4, |Smedium|= 6 and |Scool|= 4
● Entropy of these subsets:
■ E(Shigh) = - (2/4 log2 2/4 + 2/4 log2 2/4 ) =1
■ E(Smedium) = - ( 4/6 log2 4/6 + 2/6 log2 2/6 ) = 0.918
■ E(Scool) = - ( 3/4 log2 3/4 + 1/4 log2 1/4 ) = 0.81125
● Avg Information of Temp : weighted avg of the entropies of the subsets:
I(Temp) = 4/14 * 1 + 6/14 * 0.918 + 4/14 * 0.81125 = 0.9109 ≈ 0.911
● Gain(S, Temp) = Entropy(S) - I(Temp) = 0.94 - 0.911 = 0.029
Picking an Attribute as Decision Node
• We do a similar calculation to calculate the following:
■ Gain(S, Humidity)
■ G(S, Windy)
• Hence, we can summarize the calculations as:
property value
Entropy(S) 0.94
G(S, Outlook) 0.247
G(S, Temp) 0.029
G(S, Humidity) 0.152
G(S, Windy) 0.048
[Figure: Outlook chosen as the root node, with branches sunny, overcast and rain.]
• We pick the attribute with maximum Gain as the decision node.
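The table above can be reproduced with a short script (an illustrative sketch, assuming the helper names below) run over the 14 Example 1 rows:

```python
import math
from collections import Counter

# The 14 Example 1 rows: (Outlook, Temp, Humidity, Windy, PlayTennis)
data = [
    ("Sunny", "High", "High", "Weak", "No"),        ("Sunny", "High", "High", "Strong", "No"),
    ("Overcast", "High", "High", "Weak", "Yes"),    ("Rainy", "Medium", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),     ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),("Sunny", "Medium", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),     ("Rainy", "Medium", "Normal", "Weak", "Yes"),
    ("Sunny", "Medium", "Normal", "Strong", "Yes"), ("Overcast", "Medium", "High", "Strong", "Yes"),
    ("Overcast", "High", "Normal", "Weak", "Yes"),  ("Rainy", "Medium", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    subsets = {}
    for row in rows:
        subsets.setdefault(row[col], []).append(row[-1])   # group labels by attribute value
    avg_info = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[-1] for row in rows]) - avg_info

for name, col in [("Outlook", 0), ("Temp", 1), ("Humidity", 2), ("Windy", 3)]:
    print(name, round(gain(data, col), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048
```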
Arriving at a Decision Node - Attribute with highest Gain
● We choose Outlook as our root node and split according to its three values.
● The entropy of the overcast subset was 0, which means it is single-valued (all yes), so we can reach a conclusion (a leaf) directly on the overcast branch.
E(Ssunny) = 0.971, E(Sovercast) = 0, E(Srain) = 0.971
[Figure: partial tree. Root: outlook, with branches sunny, overcast and rain; the overcast branch ends in the leaf "yes", while the sunny and rain branches remain to be expanded.]
Repeat the process to pick the descendant node for each branch
With Outlook = Sunny, the data would look like this:
Outlook Temp Humidity Windy Play tennis
Sunny High High Weak No
Sunny High High Strong No
Sunny Medium High Weak No
Sunny Cool Normal Weak Yes
Sunny Medium Normal Strong Yes
E(Ssunny) = 0.971, E(Sovercast) = 0, E(Srain) = 0.971
[Figure: partial tree with outlook at the root; the overcast branch already ends in the leaf "yes".]
Entropy after Split based on Temp and Gain Calculation
|Ssunny| = 5, Entropy(Ssunny) = 0.971
● Splitting based on Temp - There will be three subsets
Shigh , Smedium , Scool such that |Shigh|= 2, |Smedium|= 2 and |Scool|= 1
● Entropy of these subsets:
■ E(Shigh) = - ( 0 log2 0 + 1 log2 1 ) = 0
■ E(Smedium) = - ( ½ log2 ½ + ½ log2 ½ ) = 1
■ E(Scool) = - (1 log2 1 + 0 log2 0 ) = 0
● Avg Information of Temp : weighted avg of the entropies of the subsets:
I(Temp) = 2/5 * 1 = 0.4
● Gain(Ssunny, Temp) = Entropy(Ssunny) - I(Temp) = 0.971 - 0.4 = 0.571
Entropy after Split based on Humidity and Gain Calculation
|Ssunny| = 5, Entropy(Ssunny) = 0.971
● Splitting based on Humidity - There will be two subsets
Shigh , Snormal such that |Shigh|= 2, |Snormal|= 3
● Entropy of these subsets:
■ E(Shigh) = - ( 0 log2 0 + 1 log2 1 ) = 0
■ E(Snormal) = - ( 1 log2 1 + 0 log2 0 ) = 0
● Avg Information of Humidity : weighted avg of the entropies of the subsets:
I(Humidity) = 0
● Gain(Ssunny, Humidity) = Entropy(Ssunny) - I(Humidity) = 0.971 - 0 = 0.971
Entropy after Split based on Windy and Gain Calculation
|Ssunny| = 5, Entropy(Ssunny) = 0.971
● Splitting based on Windy - There will be two subsets
Sweak , Sstrong such that |Sweak| = 3, |Sstrong| = 2
● Entropy of these subsets:
■ E(Sweak) = - ( ⅓ log2 ⅓ + ⅔ log2 ⅔ ) = 0.918
■ E(Sstrong) = - ( ½ log2 ½ + ½ log2 ½ ) = 1
● Avg Information of Windy : weighted avg of the entropies of the subsets:
I(Windy) = 3/5 * 0.918 + 2/5 * 1 = 0.951
● Gain(Ssunny, Windy) = Entropy(Ssunny) - I(Windy) = 0.971 - 0.951 = 0.02
Arriving at a Decision node
● Humidity has the highest gain, hence we choose Humidity as the decision node at the second level, under the Outlook = Sunny branch.
● Remember that both values of Humidity (high and normal) gave subsets with entropy 0, which means we have reached leaf nodes on those branches.
● Doing the same procedure for the table with Outlook = Rain, we expand the rain branch to get the following decision tree.
Entropy(Ssunny) = 0.971, G(Ssunny, temp) = 0.571, G(Ssunny, humidity) = 0.971, G(Ssunny, windy) = 0.02
[Figure: the resulting decision tree. Root: outlook; sunny -> humidity (high -> no, normal -> yes); overcast -> yes; rain -> windy (strong -> no, weak -> yes).]
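The same kind of check (an illustrative sketch, not from the slides) reproduces the gains for the Outlook = Sunny subset:

```python
import math
from collections import Counter

# The Outlook = Sunny subset: (Temp, Humidity, Windy, PlayTennis)
sunny = [
    ("High", "High", "Weak", "No"), ("High", "High", "Strong", "No"),
    ("Medium", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "Yes"),
    ("Medium", "Normal", "Strong", "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    subsets = {}
    for row in rows:
        subsets.setdefault(row[col], []).append(row[-1])
    avg_info = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[-1] for row in rows]) - avg_info

for name, col in [("Temp", 0), ("Humidity", 1), ("Windy", 2)]:
    print(name, round(gain(sunny, col), 3))
# Temp 0.571, Humidity 0.971, Windy 0.02
```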
Construct Decision Tree - Example 2
Consider the data set for the concept of accepting a job:
salary Location job acceptance
Tier 1 MUM YES
Tier 1 BLR YES
Tier 2 BLR NO
Tier 1 HYD NO
Tier 2 MUM YES
Tier 1 HYD NO
Ensure there are no duplicate rows in your data (the last row repeats an earlier one, so the solution works with |S| = 5).
Construct Decision Tree
Solution
Step 1 : Entropy of the entire set:
|S| = 5
P(yes) = 3/5, P(no) = 2/5
Entropy(S) = 0.971
Step 2 : Entropy after split
a) Based on salary
→ |STier1| = 3, P(yes) = 2/3, P(no) = 1/3, Entropy(STier1) = 0.918
→ |STier2| = 2, P(yes) = 1/2, P(no) = 1/2, Entropy(STier2) = 1
→ I(salary) = 3/5 * 0.918 + 2/5 * 1 = 0.9508
→ Gain(S, salary) = 0.971 - 0.9508 = 0.0202
Construct Decision Tree
Solution
Step 1 : Entropy of the entire set:
|S| = 5, P(yes) = 3/5, P(no) = 2/5, Entropy(S) = 0.971
Step 2 : Entropy after split
b) Based on Location
→ |SMUM| = 2, P(yes) = 1, P(no) = 0, Entropy(SMUM) = 0
→ |SBLR| = 2, P(yes) = 1/2, P(no) = 1/2, Entropy(SBLR) = 1
→ |SHYD| = 1, P(yes) = 0, P(no) = 1, Entropy(SHYD) = 0
→ I(Location) = 2/5 * 1 = 0.4
→ Gain(S, Location) = 0.971 - 0.4 = 0.571
Construct Decision Tree
Decision Node - Max Gain
Gain(S, salary) = 0.0202
Gain(S, location) = 0.571
The root node : Location
[Figure: final decision tree. Root: Location; MUM -> Yes; HYD -> No; BLR -> Salary (Tier 1 -> Yes, Tier 2 -> No).]
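A small check (illustrative sketch) of the Example 2 gains, using the five rows left after dropping the duplicate:

```python
import math
from collections import Counter

# Example 2 rows after dropping the duplicate: (salary, Location, job acceptance)
rows = [
    ("Tier 1", "MUM", "YES"), ("Tier 1", "BLR", "YES"), ("Tier 2", "BLR", "NO"),
    ("Tier 1", "HYD", "NO"),  ("Tier 2", "MUM", "YES"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(data, col):
    subsets = {}
    for row in data:
        subsets.setdefault(row[col], []).append(row[-1])
    avg_info = sum(len(s) / len(data) * entropy(s) for s in subsets.values())
    return entropy([row[-1] for row in data]) - avg_info

print(round(gain(rows, 0), 3))  # salary   -> 0.02 (the slides show 0.0202 from rounded intermediate values)
print(round(gain(rows, 1), 3))  # Location -> 0.571, so Location becomes the root node
```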
Construct Decision Tree - Example 3
M N O Y
A C X T
A C Z E
A D X T
A D Z T
B C X E
B C Z F
B D X F
B D Z F
Construct Decision Tree
Entropy of the data set (the Example 3 table above): the target Y takes the three values T, E and F with counts 3, 2 and 3 out of 8, so
Entropy(S) = - (3/8 log2 3/8 + 2/8 log2 2/8 + 3/8 log2 3/8) = 1.56
Note that if the target attribute can take on c possible values, the entropy can be as large as log2 c.
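A short check (illustrative sketch) of the multi-class entropy for Example 3:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

y = ["T", "E", "T", "T", "E", "F", "F", "F"]   # target column Y of Example 3
print(round(entropy(y), 2))      # 1.56
print(round(math.log2(3), 2))    # 1.58 -> the upper bound log2(c) for c = 3 classes
```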
Construct Decision Tree - Example 4
Weather Mood Genre Alone Watch Movie
Sunny Happy Comedy No Yes
Sunny Bored Drama Yes No
Cloudy Lazy Thriller Yes Yes
Rainy Sad Romance No Yes
Rainy Bored Comedy Yes No
Rainy Happy Thriller Yes No
Cloudy Happy Romance No Yes
Sunny Bored Comedy No No
Sunny Happy Romance Yes Yes
Rainy Lazy Drama No Yes
Sunny Lazy Thriller Yes Yes
Cloudy Bored Comedy Yes Yes
Cloudy Sad Romance No Yes
Rainy Sad Thriller Yes No
Entropy Calculation before Split
ENTROPY(S) - Calculating for the entire set S
|S| = 14 (training instances provided)
P(watch movie = yes) = p1 = 9/14 = 0.643
P(watch movie = no) = p2 = 5/14 = 0.357
Entropy(S) = - (p1 log2 p1 + p2 log2 p2) = 0.94
Calculating Entropy for a Subset
• We will now see what happens if we split based on:
■ Weather
■ Mood
■ Genre
■ Alone
• First, we will see what will happen if we split based on Weather.
■ Weather has 3 values, which means there will be three branches (three subsets): sunny, cloudy and rainy.
■ We calculate the entropy for each of these subsets.
■ We take a weighted sum of the entropies of the three subsets, which becomes the “Entropy after the split”.
■ We then calculate the gain(weather) = Entropy before split - Entropy after split
• We repeat this process for each attribute.
• The attribute with the maximum gain is picked as the decision node.
Calculating Entropy for a Subset
Weather Mood Genre Alone Watch Movie
Cloudy Lazy Thriller Yes Yes
Cloudy Happy Romance No Yes
Cloudy Bored Comedy Yes Yes
Cloudy Sad Romance No Yes
Rainy Sad Romance No Yes
Rainy Bored Comedy Yes No
Rainy Happy Thriller Yes No
Rainy Lazy Drama No Yes
Rainy Sad Thriller Yes No
Sunny Happy Comedy No Yes
Sunny Bored Drama Yes No
Sunny Bored Comedy No No
Sunny Happy Romance Yes Yes
Sunny Lazy Thriller Yes Yes
Entropy after Split based on Weather and Gain Calculation
|S| = 14, Entropy(S) = 0.94
● Splitting based on weather - there will be 3 subsets
|Ssunny| = 5, |Scloudy| = 4 and |Srainy| = 5
● Entropy of these subsets:
■ E(Ssunny) = - ( ⅖ log2 ⅖ + ⅗ log2 ⅗ ) = 0.971
■ E(Scloudy) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Srainy) = - ( ⅗ log2 ⅗ + ⅖ log2 ⅖ ) = 0.971
● Weighted avg of the entropies of the subsets:
I(weather) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693
● Gain(S, weather) = Entropy(S) - I(weather) = 0.94 - 0.693 = 0.247
Picking an Attribute as Decision Node
• We do a similar calculation (by splitting based on the other attributes) to
calculate the gain
• We can summarize the calculations as:
property value
Entropy(S) 0.94
G(S, weather) 0.247
G(S, mood) …
G(S, genre) …
G(S, alone) …
● We pick the attribute with maximum Gain as the decision node.
Arriving at a Decision Node - Attribute with highest Gain
● We choose weather as our root node and split according to its three values.
● The entropy of the cloudy subset was 0, which means it is single-valued (all yes), so we can reach a conclusion (a leaf) directly on the cloudy branch.
property value
Entropy(S) 0.94
G(S, weather) 0.247
G(S, mood) …
G(S, genre) …
G(S, alone) …
E(Ssunny) = 0.971, E(Scloudy) = 0, E(Srainy) = 0.971
[Figure: partial tree. Root: weather, with branches sunny, cloudy and rainy; the cloudy branch ends in the leaf "yes".]
Repeat the process to pick the descendant node for each branch
With Weather = Sunny, the data would look like this:
Weather Mood Genre Alone Watch Movie
Sunny Happy Comedy No Yes
Sunny Bored Drama Yes No
Sunny Bored Comedy No No
Sunny Happy Romance Yes Yes
Sunny Lazy Thriller Yes Yes
E(Ssunny) = 0.971, E(Scloudy) = 0, E(Srainy) = 0.971
[Figure: partial tree with weather at the root; the cloudy branch already ends in the leaf "yes".]
Entropy after Split based on Mood and Gain Calculation
|Ssunny| = 5, Entropy(Ssunny) = 0.971
● Splitting based on Mood - there will be 3 subsets
|Shappy|= 2, |Sbored|= 2 and |Slazy|= 1
● Entropy of these subsets:
■ E(Shappy) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Sbored) = - ( 0 log2 0 + 1 log2 1 ) =0
■ E(Slazy) = - ( 1 log2 1 + 0 log2 0 ) =0
● Avg Information of Mood: weighted avg of the entropies of the subsets:
I(Mood) = 0
● Gain(Ssunny, Mood) = Entropy(Ssunny) - I(Mood) = 0.971 - 0 = 0.971
Arriving at a Decision node
● Mood has the highest gain, hence we choose Mood as the decision node at the second level, under the Weather = Sunny branch.
● Remember that all values of Mood in this subset have entropy 0, which means we have reached leaf nodes on those branches.
Entropy(Ssunny) = 0.971, G(Ssunny, mood) = 0.971, G(Ssunny, genre) = …, G(Ssunny, alone) = …
[Figure: partial tree. Root: weather; sunny -> mood (bored -> no, not bored -> yes); cloudy -> yes; the rainy branch remains to be expanded.]
Arriving at a Decision node
● Doing the same procedure for the table with Weather = Rainy, we expand the rainy branch to get the following decision tree.
[Figure: final decision tree. Root: weather; sunny -> mood (bored -> no, not bored -> yes); cloudy -> yes; rainy -> alone (yes -> no, no -> yes).]
Dr. Shylaja S S
Department of Computer Science & Engineering
shylaja.sharath@pes.edu