0% found this document useful (0 votes)
37 views11 pages

06 06 Information Gain 11-43

Topics in Machine Learning - Information Gain

Uploaded by

Name
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views11 pages

06 06 Information Gain 11-43

Topics in Machine Learning - Information Gain

Uploaded by

Name
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Back to decision trees

• Which attribute do we choose at each level?


• The one with the highest information gain
– The one that reduces the unpredictability the most

Bill Howe, UW
Before: 14 records, 9 are “yes”

outlook temperature humidity windy play


overcast cool normal TRUE yes
overcast hot high FALSE yes If we choose outlook:
overcast hot normal FALSE yes
overcast : 4 records, 4 are “yes”
overcast mild high TRUE yes
rainy cool normal TRUE no
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes rainy : 5 records, 3 are “yes”
rainy mild normal FALSE yes
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no sunny : 5 records, 2 are “yes”
sunny cool normal FALSE yes
sunny mild normal TRUE yes

Expected new entropy:

= 0.69
Before: 14 records, 9 are “yes”

outlook temperature humidity windy play


overcast cool normal TRUE yes
overcast hot high FALSE yes If we choose temperature:
overcast hot normal FALSE yes
cool : 4 records, 3 are “yes”
overcast mild high TRUE yes
rainy cool normal TRUE no 0.81
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes rainy : 4 records, 2 are “yes”
rainy mild normal FALSE yes
sunny hot high FALSE no 1.0
sunny hot high TRUE no
sunny mild high FALSE no sunny : 6 records, 4 are “yes”
sunny cool normal FALSE yes
sunny mild normal TRUE yes 0.92

Expected new entropy:

0.81(4/14) + 1.0(4/14) + 0.92(6/14)

= 0.91
Before: 14 records, 9 are “yes”

outlook temperature humidity windy play


overcast cool normal TRUE yes
overcast hot high FALSE yes If we choose humidity:
overcast hot normal FALSE yes
normal : 7 records, 6 are “yes”
overcast mild high TRUE yes
rainy cool normal TRUE no 0.59
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes high : 7 records, 2 are “yes”
rainy mild normal FALSE yes
sunny hot high FALSE no 0.86
sunny hot high TRUE no
sunny mild high FALSE no
sunny cool normal FALSE yes
sunny mild normal TRUE yes

Expected new entropy:

0.59(7/14) + 0.86(7/14)

= 0.725
Before: 14 records, 9 are “yes”

outlook temperature humidity windy play


overcast cool normal TRUE yes
overcast hot high FALSE yes If we choose windy:
overcast hot normal FALSE yes
TRUE : 8 records, 6 are “yes”
overcast mild high TRUE yes
rainy cool normal TRUE no 0.81
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes FALSE : 5 records, 3 are “yes”
rainy mild normal FALSE yes
sunny hot high FALSE no 0.97
sunny hot high TRUE no
sunny mild high FALSE no
sunny cool normal FALSE yes
sunny mild normal TRUE yes

Expected new entropy:

0.81(8/14) + 0.97(6/14)

= 0.87
Before: 14 records, 9 are “yes”

outlook temperature humidity windy play


overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes outlook
overcast mild high TRUE yes
rainy cool normal TRUE no 0.94 – 0.69 = 0.25 highest gain
rainy mild high TRUE no
rainy cool normal FALSE yes temperature
rainy mild high FALSE yes
rainy mild normal FALSE yes 0.94 – 0.91 = 0.03
sunny hot high FALSE no
sunny hot high TRUE no
humidity
sunny mild high FALSE no
sunny cool normal FALSE yes
sunny mild normal TRUE yes 0.94 – 0.725 = 0.215

windy

0.94 – 0.87 = 0.07


Document Classification

Includes “Falcons” Includes “Mars”

yes no yes no

50% “Sports” 50% “Sports” 2% “Sports” 50% “Sports”


50% “Science” 50% “Science” 95% “Science” 50% “Science”

Bill Howe, UW
Building a Decision Tree (ID3 Algorithm)
• Assume attributes are discrete
– Discretize continuous attributes
• Choose the attribute with the highest Information Gain
• Create branches for each value of attribute
• Examples partitioned based on selected attributes
• Repeat with remaining attributes
• Stopping conditions
– All examples assigned the same label
– No examples left

Bill Howe, UW
Problems

• Expensive to train
• Prone to overfitting
– Drive to perfection on training data, bad on
test data
– Pruning can help: remove or aggregate
subtrees that provide little discriminatory
power (C45)

Bill Howe, UW
C4.5 Extensions
• Continuous Attributes
outlook temperature humidity windy play
overcast cool 60 TRUE yes
overcast hot 80 FALSE yes
overcast hot 63 FALSE yes
overcast mild 81 TRUE yes
rainy cool 58 TRUE no
rainy mild 90 TRUE no
rainy cool 54 FALSE yes
rainy mild 92 FALSE yes
rainy mild 59 FALSE yes
sunny hot 90 FALSE no
sunny hot 89 TRUE no
sunny mild 90 FALSE no
sunny cool 60 FALSE yes
sunny mild 62 TRUE yes

Bill Howe, UW
Consider every possible binary
partition; choose the partition
with the highest gain

outlook temperature humidity windy play


rainy mild 54 FALSE yes
overcast hot 58 FALSE yes
overcast cool 59 TRUE yes
rainy cool 60 FALSE yes E(6/6)
overcast mild 60 TRUE yes
overcast hot 62 FALSE yes
= 0.0 E(9/10)+ E(1/10)
rainy mild 63 TRUE no
sunny cool 80 FALSE yes = 0.47
rainy mild 81 FALSE yes
sunny mild 89 TRUE yes
sunny hot 90 FALSE no E(3/8) + E(5/8)
rainy cool 90 TRUE no
sunny hot 90 TRUE no = 0.95 E(4/4)
sunny mild 92 FALSE no
= 0.0

Expect = 8/14*0.95 + 6/14*0 Expect = 10/14*0.47 + 4/14*0


= 0.54 = 0.33

Bill Howe, UW

You might also like