
Machine Learning

Dr. Shylaja S S
Professor
Department of Computer Science & Engineering

Acknowledgement: Dr. Rajanikanth K (Former Principal MSRIT, PhD (IISc), Academic Advisor, PESU)
Prof. Preet Kanwal (Associate Professor, PESU)
Teaching Assistant: Arya Rajiv Chaloli
Machine Learning
Decision Trees

● A decision tree is a flowchart-like diagram used to make decisions based on a series of conditions.

● The learned function is represented by a decision tree.

● A decision tree can also be re-represented as if-then rules to improve human readability.
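As a concrete illustration of the if-then-rule view, the sketch below (Python, not part of the original slides) encodes the PlayTennis decision tree that appears later in this deck as nested if-then rules; attribute and value names follow that example.

def play_tennis(outlook, humidity, wind):
    # Walk the decision tree as a set of if-then rules and return "yes" or "no".
    if outlook == "Sunny":
        return "yes" if humidity == "Normal" else "no"
    elif outlook == "Overcast":
        return "yes"
    else:  # Rain
        return "yes" if wind == "Weak" else "no"

print(play_tennis("Sunny", "Normal", "Strong"))  # -> yes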
Machine Learning
Decision Trees

● Decision trees can be used for classification and regression problems.

● Decision trees classify instances by sorting them down the tree from the root to some leaf node.
  ○ Node: specifies some attribute of an instance to be tested.
  ○ Branch: corresponds to one of the possible values for that attribute.
Machine Learning
Decision Tree (multiple branches) - Loan Application

Income Range
  <30K: Govt job
    Yes: Approve
    No: Reject
  30-70K: No. of years in current job
    <1: Reject
    1-5: Credit card outstanding balance
      high: Reject
      medium: Approve with double premium
      low: Approve with premium
    >5: Approve
  >70K: Criminal record
    Yes: Reject
    No: Approve
Machine Learning
Terminology

Root Node:
It’s the topmost node where we
start building the tree.

Leaf node:
It’s the end of the decision tree
and carries the decision.

Decision Node:
A node where branching (testing) occurs: a certain attribute is tested and,
based on the attribute value, a particular branch is taken.
Machine Learning
Terminology - Note

• Every node except a leaf node is a decision node.

• Each decision node tests for an attribute.

• Each branch corresponds to an attribute value.

• Each leaf node assigns a classification label.

• Decision trees represent a disjunction of conjunctions (more expressive than FIND-S).

Example decision tree for the concept "play Tennis":
Outlook
  Sunny: Humidity
    High: no
    Normal: yes
  Overcast: yes
  Rain: Wind
    Strong: no
    Weak: yes

For example, the decision is yes whenever: (Outlook = Sunny ^ Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain ^ Wind = Weak)
Machine Learning
Building Decision Trees

1. Select the “best” decision attribute for the next node - the crux of the problem.
2. Assign that attribute as the decision attribute for the node.
3. For each value of the attribute, create a new descendant of the node (a new branch is opened).
   ○ It will depend on the subset of instances that fall under that branch.
   ○ If no further decision making is required, you will reach a leaf node.
4. Sort training examples to the new nodes.
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes.

Decision tree for the concept "play Tennis" (for reference):
Outlook
  Sunny: Humidity
    High: no
    Normal: yes
  Overcast: yes
  Rain: Wind
    Strong: no
    Weak: yes
Machine Learning
Multiple decision trees for the same task

Decision trees vary based on the algorithm used to select the next “best” decision attribute for a node.

Example: to find whether a student plays basketball or not.

[Figure: alternative decision trees for the same task - one splits first on Gender and then on Performance / Height, another splits first on Performance and then on Height / Gender; all reach YES / NO leaves.]

Which one do we prefer and why?


Machine Learning
Some Decision Trees Algorithms

ID3 Algorithm (Iterative Dichotomiser 3)
Developed in 1986 by Ross Quinlan. The algorithm creates a multiway tree, finding for each node (i.e. in a greedy manner) the categorical feature that will yield the largest information gain for the targets. It can be used for classification and regression.

CART (Classification and Regression Trees)
A term introduced by Leo Breiman. It can be used for classification or regression predictive modeling problems. It uses Gini impurity as the heuristic for classification, and the sum of least squares as the heuristic for regression.
Machine Learning
ID3 Algorithm - Entropy

• At a given point, we prefer as the decision node an attribute that partitions the set into subsets which are as homogeneous as possible. Ideally, no more decisions are required!

• Entropy is an indicator of non-homogeneity / uncertainty. It is a measure of disorder (an indicator of how messy your data is).
  ■ Entropy needs to be reduced.

• The higher the entropy, the more non-homogeneous / uncertain / disordered the data set is.

• If the attribute A has c possible values, its entropy with respect to a set S is given by:
  Entropy(A) = H(A) = - ∑ p(xi) log2 (p(xi)); i = 1 to c
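A minimal sketch (Python, not from the slides) of this entropy formula, computed from a list of labels:

import math
from collections import Counter

def entropy(labels):
    # H = - sum over distinct values of p(x) * log2(p(x))
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# The 14-instance PlayTennis set used later has 9 "yes" and 5 "no":
print(round(entropy(["yes"] * 9 + ["no"] * 5), 2))  # -> 0.94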
Machine Learning
ID3 Algorithm - Entropy

BUCKET 1 - LESS ENTROPY: we know for sure that the ball coming out is red.
BUCKET 2 - MEDIUM ENTROPY: we know with 75% certainty that the ball is red, and with 25% certainty that it's green.
BUCKET 3 - HIGH ENTROPY: we know with 50% certainty that the ball is red, and with 25% certainty each that it's yellow or green.

● So, we have more knowledge regarding bucket 1, and bucket 1 has the least entropy.
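The three buckets can be checked numerically with the entropy function sketched above; the bucket contents below are an assumption (four balls each) chosen to match the stated percentages.

bucket1 = ["red"] * 4                        # 100% red
bucket2 = ["red"] * 3 + ["green"]            # 75% red, 25% green
bucket3 = ["red", "red", "yellow", "green"]  # 50% red, 25% yellow, 25% green

for i, bucket in enumerate([bucket1, bucket2, bucket3], start=1):
    print(f"bucket {i}: entropy = {entropy(bucket):.3f}")
# bucket 1: 0.000 (least), bucket 2: 0.811, bucket 3: 1.500 (highest)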
Machine Learning
ID3 Algorithm - Information Gain

● Information gain is a measure of the effectiveness of an attribute in classifying the training data.

● Information Gain = (Entropy before split - Entropy after split)
  ■ Information gain should be as high as possible.

● Information gain is the expected reduction in entropy caused by partitioning the examples according to the attribute.
Machine Learning
ID3 Algorithm - Information Gain

● Average information entropy I(A) is the weighted average of the entropies of the individual subsets Sv obtained by splitting S on the values v of attribute A:
  I(A) = ∑ ( |Sv| / |S| ) * Entropy(Sv), summed over all values v of A.
  The gain is then Gain(S, A) = Entropy(S) - I(A).
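A minimal sketch (Python, not from the slides) of I(A) and Gain(S, A), computed from parallel lists of attribute values and class labels and reusing the entropy function sketched earlier:

from collections import defaultdict

def avg_information(values, labels):
    # I(A): weighted average of the entropies of the subsets induced by attribute A.
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    n = len(labels)
    return sum(len(subset) / n * entropy(subset) for subset in subsets.values())

def information_gain(values, labels):
    # Gain(S, A) = Entropy(S) - I(A)
    return entropy(labels) - avg_information(values, labels)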
Machine Learning
ID3 Algorithm - Iterative Dichotomiser 3 (ID3)

Steps to create a decision tree using the ID3 algorithm (sketched in code below):

1. Compute the entropy for the data set, Entropy(S).

2. For each attribute:
   • calculate the entropy of each subset induced by the attribute's values,
   • take the average information entropy for the current attribute,
   • calculate the gain for the current attribute.

3. Pick the attribute with the highest gain.

4. Repeat until we get the tree we desire.
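The steps above can be written as a short recursive function. This is an illustrative simplification (not the original lecture code); it assumes the entropy and information_gain helpers sketched earlier and represents each training example as a dictionary.

from collections import Counter

def id3(rows, target, attributes):
    labels = [row[target] for row in rows]
    # Stop when the subset is pure or no attributes remain; return the majority label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-3: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain([row[a] for row in rows], labels))
    tree = {best: {}}
    # Steps 3-5: open one branch per attribute value and recurse on that subset.
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree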


Machine Learning
Construct Decision Tree - Example 1

Outlook Temp Humidity Windy Play tennis


Sunny High High Weak No
Sunny High High Strong No
Overcast High High Weak Yes
Rainy Medium High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Medium High Weak No
Sunny Cool Normal Weak Yes
Rainy Medium Normal Weak Yes
Sunny Medium Normal Strong Yes
Overcast Medium High Strong Yes
Overcast High Normal Weak Yes
Rainy Medium High Strong No
Machine Learning
Entropy Calculation before Split

ENTROPY(S) - calculating for the entire set S (the 14 training instances of Example 1):

|S| = 14

P(play tennis = yes) = p1 = 9/14 = 0.643
P(play tennis = no) = p2 = 5/14 = 0.357

Entropy(S) = - (p1 log2 p1 + p2 log2 p2) = 0.94
Machine Learning
Calculating Entropy for a Subset

• We will now see what happens if we split based on: Outlook, Temp, Humidity, or Windy.

• First, we will see what happens if we split based on Outlook.
  ■ Outlook has 3 values, which means there will be three branches (three subsets): sunny, overcast and rain.
  ■ We calculate the entropy for each of these subsets.
  ■ We take a weighted sum of the entropies of the three subsets, which becomes the “entropy after the split”.
  ■ We then calculate Gain(Outlook) = Entropy before split - Entropy after split.

• We repeat this process for each attribute.

• The attribute with the maximum gain is picked as the decision node.
Machine Learning
Calculating Entropy for a Subset

Data of Example 1 sorted by Outlook value:

Outlook  Temp    Humidity  Windy   Play tennis
Sunny    High    High      Weak    No
Sunny    High    High      Strong  No
Sunny    Medium  High      Weak    No
Sunny    Cool    Normal    Weak    Yes
Sunny    Medium  Normal    Strong  Yes
Overcast High    High      Weak    Yes
Overcast Cool    Normal    Strong  Yes
Overcast Medium  High      Strong  Yes
Overcast High    Normal    Weak    Yes
Rainy    Medium  High      Weak    Yes
Rainy    Cool    Normal    Weak    Yes
Rainy    Cool    Normal    Strong  No
Rainy    Medium  Normal    Weak    Yes
Rainy    Medium  High      Strong  No
Machine Learning
Entropy after Split based on Outlook and Gain Calculation

|S| = 14, Entropy(S) = 0.94

● Splitting based on Outlook - There will be three subsets


Ssunny , Sovercast , Srain such that |Ssunny|= 5, |Sovercast|= 4 and |Srain|= 5

● Entropy of these subsets:


■ E(Ssunny) = - ( ⅖ log2 ⅖ + ⅗ log2 ⅗ ) = 0.971
■ E(Sovercast) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Srain) = - ( ⅗ log2 ⅗ + ⅖ log2 ⅖ ) = 0.971

● Avg Information of Outlook: weighted avg of the entropies of the subsets:
  I(Outlook) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693

● Gain(S, Outlook) = Entropy(S) - I(Outlook) = 0.94 - 0.693 = 0.247
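These numbers can be double-checked with the information_gain helper sketched earlier (a small verification script, not from the slides), using the 14 rows of Example 1:

data = [  # (Outlook, Temp, Humidity, Windy, PlayTennis)
    ("Sunny", "High", "High", "Weak", "No"),         ("Sunny", "High", "High", "Strong", "No"),
    ("Overcast", "High", "High", "Weak", "Yes"),     ("Rainy", "Medium", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),      ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Medium", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rainy", "Medium", "Normal", "Weak", "Yes"),
    ("Sunny", "Medium", "Normal", "Strong", "Yes"),  ("Overcast", "Medium", "High", "Strong", "Yes"),
    ("Overcast", "High", "Normal", "Weak", "Yes"),   ("Rainy", "Medium", "High", "Strong", "No"),
]
labels = [row[4] for row in data]
for i, name in enumerate(["Outlook", "Temp", "Humidity", "Windy"]):
    print(name, round(information_gain([row[i] for row in data], labels), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048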


Machine Learning
Entropy after Split based on Temp and Gain Calculation

|S| = 14, Entropy(S) = 0.94

● Splitting based on Temp - There will be three subsets


Shigh , Smedium , Scool such that |Shigh|= 4, |Smedium|= 6 and |Scool|= 4

● Entropy of these subsets:


■ E(Shigh) = - (2/4 log2 2/4 + 2/4 log2 2/4 ) =1
■ E(Smedium) = - ( 4/6 log2 4/6 + 2/6 log2 2/6 ) = 0.918
■ E(Scool) = - ( 3/4 log2 3/4 + 1/4 log2 1/4 ) = 0.81125

● Avg Information of Temp: weighted avg of the entropies of the subsets:
  I(Temp) = 4/14 * 1 + 6/14 * 0.918 + 4/14 * 0.81125 = 0.9109 ≈ 0.911

● Gain(S, Temp) = Entropy(S) - I(Temp) = 0.94 - 0.911 = 0.029
Machine Learning
Picking an Attribute as Decision Node

• We do a similar calculation to obtain Gain(S, Humidity) and Gain(S, Windy).

• Hence, we can summarize the calculations as:

  property        value
  Entropy(S)      0.94
  G(S, Outlook)   0.247
  G(S, Temp)      0.029
  G(S, Humidity)  0.152
  G(S, Windy)     0.048

• We pick the attribute with the maximum gain (here, Outlook) as the decision node; its branches are sunny, overcast and rain.
Machine Learning
Arriving at a Decision Node - Attribute with highest Gain

● We choose Outlook as our root node and split according to its three values.

● The entropy of the overcast subset is 0, which means it is single valued (all yes) and hence we can come to a conclusion (a leaf) for overcast.

  E(Ssunny)    0.971
  E(SOvercast) 0
  E(Srain)     0.971

Outlook
  sunny: ?
  overcast: yes
  rain: ?
Machine Learning
Repeat the process to pick the descendant node for each branch

With Outlook = Sunny, the data subset looks like this:

Outlook  Temp    Humidity  Windy   Play tennis
Sunny    High    High      Weak    No
Sunny    High    High      Strong  No
Sunny    Medium  High      Weak    No
Sunny    Cool    Normal    Weak    Yes
Sunny    Medium  Normal    Strong  Yes

Partial tree so far: Outlook (sunny: ?, overcast: yes, rain: ?), with
E(Ssunny) = 0.971, E(SOvercast) = 0, E(Srain) = 0.971.
Machine Learning
Entropy after Split based on Temp and Gain Calculation

|Ssunny| = 5, Entropy(Ssunny) = 0.971

● Splitting based on Temp - There will be three subsets


Shigh , Smedium , Scool such that |Shigh|= 2, |Smedium|= 2 and |Scool|= 1

● Entropy of these subsets:


■ E(Shigh) = - ( 0 log2 0 + 1 log2 1 ) = 0
■ E(Smedium) = - ( ½ log2 ½ + ½ log2 ½ ) = 1
■ E(Scool) = - (1 log2 1 + 0 log2 0 ) = 0

● Avg Information of Temp: weighted avg of the entropies of the subsets:
  I(Temp) = 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4

● Gain(Ssunny, Temp) = Entropy(Ssunny) - I(Temp) = 0.971 - 0.4 = 0.571


Machine Learning
Entropy after Split based on Humidity and Gain Calculation

|Ssunny| = 5, Entropy(Ssunny) = 0.971

● Splitting based on Humidity - there will be two subsets,
  Shigh and Snormal, such that |Shigh| = 3 and |Snormal| = 2.

● Entropy of these subsets:
  ■ E(Shigh) = - ( 0 log2 0 + 1 log2 1 ) = 0 (all three instances are No)
  ■ E(Snormal) = - ( 1 log2 1 + 0 log2 0 ) = 0 (both instances are Yes)

● Avg Information of Humidity: weighted avg of the entropies of the subsets:
  I(Humidity) = 3/5 * 0 + 2/5 * 0 = 0

● Gain(Ssunny, Humidity) = Entropy(Ssunny) - I(Humidity) = 0.971 - 0 = 0.971


Machine Learning
Entropy after Split based on Windy and Gain Calculation

|Ssunny| = 5, Entropy(Ssunny) = 0.971

● Splitting based on Windy - there will be two subsets,
  Sweak and Sstrong, such that |Sweak| = 3 and |Sstrong| = 2.

● Entropy of these subsets:
  ■ E(Sweak) = - ( ⅓ log2 ⅓ + ⅔ log2 ⅔ ) = 0.918
  ■ E(Sstrong) = - ( ½ log2 ½ + ½ log2 ½ ) = 1

● Avg Information of Windy: weighted avg of the entropies of the subsets:
  I(Windy) = 3/5 * 0.918 + 2/5 * 1 = 0.951

● Gain(Ssunny, Windy) = Entropy(Ssunny) - I(Windy) = 0.971 - 0.951 = 0.02
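The same helper applied only to the five Outlook = Sunny rows reproduces the three gains above (a sketch reusing information_gain from earlier):

sunny = [  # (Temp, Humidity, Windy, PlayTennis) for the Outlook = Sunny subset
    ("High", "High", "Weak", "No"), ("High", "High", "Strong", "No"),
    ("Medium", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "Yes"),
    ("Medium", "Normal", "Strong", "Yes"),
]
labels = [row[3] for row in sunny]
for i, name in enumerate(["Temp", "Humidity", "Windy"]):
    print(name, round(information_gain([row[i] for row in sunny], labels), 3))
# Temp 0.571, Humidity 0.971, Windy 0.02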


Machine Learning
Arriving at a Decision node

● Humidity has the highest gain, hence we choose Humidity as the decision node at the second level, under Outlook = sunny.
● Remember that both values of Humidity (high and normal) have an entropy of 0, which means we have reached leaf nodes with this split.
● Doing the same procedure for the table with Outlook = rain, we expand the rain branch to get the following decision tree.

  property             value
  Entropy(Ssunny)      0.971
  G(Ssunny, temp)      0.571
  G(Ssunny, humidity)  0.971
  G(Ssunny, windy)     0.02

Outlook
  sunny: Humidity
    high: no
    normal: yes
  overcast: yes
  rain: Windy
    strong: no
    weak: yes
Machine Learning
Construct Decision Tree

Example 2: consider the data set for the concept of accepting a job.

Salary  Location  Job acceptance
Tier 1  MUM       YES
Tier 1  BLR       YES
Tier 2  BLR       NO
Tier 1  HYD       NO
Tier 2  MUM       YES
Tier 1  HYD       NO

Ensure there are no duplicate rows in your data (here, "Tier 1 / HYD / NO" appears twice, so only 5 distinct rows remain).

Machine Learning
Construct Decision Tree

Salary  Location  Job acceptance
Tier 1  MUM       YES
Tier 1  BLR       YES
Tier 2  BLR       NO
Tier 1  HYD       NO
Tier 2  MUM       YES

Solution
Step 1: Entropy of the entire set:
  |S| = 5
  P(yes) = 3/5, P(no) = 2/5
  Entropy(S) = 0.971

Step 2: Entropy after split
a) Based on Salary
  → |STier1| = 3, P(yes) = 2/3, P(no) = 1/3, Entropy(STier1) = 0.918
  → |STier2| = 2, P(yes) = 1/2, P(no) = 1/2, Entropy(STier2) = 1
  → I(Salary) = 3/5 * 0.918 + 2/5 * 1 = 0.9508
  → Gain(S, Salary) = 0.971 - 0.9508 = 0.0202
Machine Learning
Construct Decision Tree

Solution (contd.)
Step 2: Entropy after split
b) Based on Location
  → |SMUM| = 2, P(yes) = 1, P(no) = 0, Entropy(SMUM) = 0
  → |SBLR| = 2, P(yes) = 1/2, P(no) = 1/2, Entropy(SBLR) = 1
  → |SHYD| = 1, P(yes) = 0, P(no) = 1, Entropy(SHYD) = 0
  → I(Location) = 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4
  → Gain(S, Location) = 0.971 - 0.4 = 0.571
Machine Learning
Construct Decision Tree

Decision Node - Max Gain

Gain(S, Salary) = 0.0202
Gain(S, Location) = 0.571

The root node: Location

Location
  MUM: Yes
  BLR: Salary
    Tier 1: Yes
    Tier 2: No
  HYD: No
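A quick numeric check of Example 2 (a sketch reusing the information_gain helper from earlier, on the five distinct rows):

rows = [("Tier 1", "MUM", "YES"), ("Tier 1", "BLR", "YES"), ("Tier 2", "BLR", "NO"),
        ("Tier 1", "HYD", "NO"),  ("Tier 2", "MUM", "YES")]
labels = [row[2] for row in rows]
print(round(information_gain([row[0] for row in rows], labels), 3))  # Salary   -> 0.02 (0.0202 on the slide via intermediate rounding)
print(round(information_gain([row[1] for row in rows], labels), 3))  # Location -> 0.571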
Machine Learning
Construct Decision Tree

Example 3:

M  N  O  Y
A  C  X  T
A  C  Z  E
A  D  X  T
A  D  Z  T
B  C  X  E
B  C  Z  F
B  D  X  F
B  D  Z  F
Machine Learning
Construct Decision Tree

For this data set the target attribute Y takes three values (T, E and F) with counts 3, 2 and 3 out of 8 instances, so

Entropy(S) = - ( 3/8 log2 3/8 + 2/8 log2 2/8 + 3/8 log2 3/8 ) = 1.56

Note that if the target attribute can take on c possible values, the entropy can be as large as log2 c.
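A quick check of this value (a sketch reusing the entropy helper from earlier):

import math
y = ["T", "E", "T", "T", "E", "F", "F", "F"]  # the Y column of Example 3
print(round(entropy(y), 2))     # -> 1.56
print(round(math.log2(3), 2))   # -> 1.58, the maximum possible entropy for a 3-valued target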
Machine Learning
Construct Decision Tree - Example 4

Weather Mood Genre Alone Watch Movie

Sunny Happy Comedy No Yes

Sunny Bored Drama Yes No

Cloudy Lazy Thriller Yes Yes

Rainy Sad Romance No Yes

Rainy Bored Comedy Yes No

Rainy Happy Thriller Yes No


Cloudy Happy Romance No Yes

Sunny Bored Comedy No No

Sunny Happy Romance Yes Yes

Rainy Lazy Drama No Yes

Sunny Lazy Thriller Yes Yes

Cloudy Bored Comedy Yes Yes

Cloudy Sad Romance No Yes

Rainy Sad Thriller Yes No


Machine Learning
Entropy Calculation before Split

ENTROPY(S) - calculating for the entire set S (the 14 training instances of Example 4):

|S| = 14

P(watch movie = yes) = p1 = 9/14 = 0.643
P(watch movie = no) = p2 = 5/14 = 0.357

Entropy(S) = - (p1 log2 p1 + p2 log2 p2) = 0.94
Machine Learning
Calculating Entropy for a Subset

• We will now see what happens if we split based on: Weather, Mood, Genre, or Alone.

• First, we will see what happens if we split based on Weather.
  ■ Weather has 3 values, which means there will be three branches (three subsets): sunny, cloudy and rainy.
  ■ We calculate the entropy for each of these subsets.
  ■ We take a weighted sum of the entropies of the three subsets, which becomes the “entropy after the split”.
  ■ We then calculate Gain(Weather) = Entropy before split - Entropy after split.

• We repeat this process for each attribute.

• The attribute with the maximum gain is picked as the decision node.
Machine Learning
Calculating Entropy for a Subset

Weather Mood Genre Alone Watch Movie

Cloudy Lazy Thriller Yes Yes

Cloudy Happy Romance No Yes

Cloudy Bored Comedy Yes Yes

Cloudy Sad Romance No Yes

Rainy Sad Romance No Yes

Rainy Bored Comedy Yes No

Rainy Happy Thriller Yes No

Rainy Lazy Drama No Yes

Rainy Sad Thriller Yes No

Sunny Happy Comedy No Yes

Sunny Bored Drama Yes No

Sunny Bored Comedy No No

Sunny Happy Romance Yes Yes

Sunny Lazy Thriller Yes Yes


Machine Learning
Entropy after Split based on Weather and Gain Calculation

|S| = 14, Entropy(S) = 0.94

● Splitting based on weather - there will be 3 subsets


|Ssunny| = 5, |Scloudy| = 4 and |Srainy| = 5

● Entropy of these subsets:


■ E(Ssunny) = - ( ⅖ log2 ⅖ + ⅗ log2 ⅗ ) = 0.971
■ E(Scloudy) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Srainy) = - ( ⅗ log2 ⅗ + ⅖ log2 ⅖ ) = 0.971

● Avg Information of Weather: weighted avg of the entropies of the subsets:
  I(Weather) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693

● Gain(S, Weather) = Entropy(S) - I(Weather) = 0.94 - 0.693 = 0.247
Machine Learning
Picking an Attribute as Decision Node

• We do a similar calculation (by splitting based on the other attributes) to calculate their gains.

• We can summarize the calculations as:

  property       value
  Entropy(S)     0.94
  G(S, weather)  0.247
  G(S, mood)     …
  G(S, genre)    …
  G(S, alone)    …

● We pick the attribute with the maximum gain as the decision node.
Machine Learning
Arriving at a Decision Node - Attribute with highest Gain

● We choose Weather as our root node and split according to its three values.

● The entropy of the cloudy subset is 0, which means it is single valued (all yes) and hence we can come to a conclusion (a leaf) for cloudy.

  E(Ssunny)  0.971
  E(Scloudy) 0
  E(Srainy)  0.971

Weather
  sunny: ?
  cloudy: yes
  rainy: ?
Machine Learning
Repeat the process to pick the descendant node for each branch

With Weather = Sunny, the data subset looks like this:

Weather  Mood   Genre     Alone  Watch Movie
Sunny    Happy  Comedy    No     Yes
Sunny    Bored  Drama     Yes    No
Sunny    Bored  Comedy    No     No
Sunny    Happy  Romance   Yes    Yes
Sunny    Lazy   Thriller  Yes    Yes

Partial tree so far: Weather (sunny: ?, cloudy: yes, rainy: ?), with
E(Ssunny) = 0.971, E(Scloudy) = 0, E(Srainy) = 0.971.
Machine Learning
Entropy after Split based on Mood and Gain Calculation

|Ssunny| = 5, Entropy(Ssunny) = 0.971

● Splitting based on Mood - there will be 3 subsets


|Shappy|= 2, |Sbored|= 2 and |Slazy|= 1

● Entropy of these subsets:


■ E(Shappy) = - ( 1 log2 1 + 0 log2 0 ) =0
■ E(Sbored) = - ( 0 log2 0 + 1 log2 1 ) =0
■ E(Slazy) = - ( 1 log2 1 + 0 log2 0 ) =0

● Avg Information of Mood: weighted avg of the entropies of the subsets:
  I(Mood) = 2/5 * 0 + 2/5 * 0 + 1/5 * 0 = 0

● Gain(Ssunny, Mood) = Entropy(Ssunny) - I(Mood) = 0.971 - 0 = 0.971


Machine Learning
Arriving at a Decision node

● Mood has the highest gain, hence we choose Mood as the decision node at the second level, under Weather = sunny.
● Remember that all values of Mood have an entropy of 0, which means we have reached leaf nodes with this split.

  property           value
  Entropy(Ssunny)    0.971
  G(Ssunny, mood)    0.971
  G(Ssunny, genre)   …
  G(Ssunny, alone)   …

Weather
  sunny: Mood
    not bored: yes
    bored: no
  cloudy: yes
  rainy: ?
Machine Learning
Arriving at a Decision node

● Doing the same procedure for the table with Weather = rainy, we expand the rainy branch to get the following decision tree.

Weather
  sunny: Mood
    not bored: yes
    bored: no
  cloudy: yes
  rainy: Alone
    no: yes
    yes: no
Machine Learning
Dr. Shylaja S S
Department of Computer Science & Engineering
shylaja.sharath@pes.edu
