Ensemble Learning
Prof. Navneet Goyal
Ensemble Learning
• Introduction & Motivation
• Construction of Ensemble Classifiers
   – Boosting (AdaBoost)
   – Bagging
   – Random Forests
• Empirical Comparison
Introduction & Motivation
• Suppose that you are a patient with a set of symptoms
• Instead of taking the opinion of just one doctor (classifier),
  you decide to take the opinions of a few doctors!
• Is this a good idea? Indeed it is.
• Consult many doctors and then, based on their individual
  diagnoses, arrive at a fairly accurate final diagnosis.
• Majority voting - ‘bagging’
• More weightage to the opinion of some ‘good’
  (accurate) doctors - ‘boosting’
• In bagging, you give equal weightage to all classifiers,
  whereas in boosting you give weightage according to
  the accuracy of the classifier.
Introduction & Motivation
• Suppose you are preparing for an exam
• As part of the preparation, you have to solve 50
  problems
• You, as an individual, can solve only 15 problems
• But, one of your wingies can solve 20 problems, some of
  which are the same as your 15
• Another wingie can solve a set of 12 problems
• When you put them all together, you find that you guys
  have solutions to 32 problems
• 32 is much better than what you could solve
  individually!!
• Collaborative Learning
Introduction & Motivation
• Why do you think I ask you to form groups for
  assignments?
   – I am against individual student doing an assignment
• Maybe coz I want to reduce my evaluation work!!
   – NOT TRUE!!!
• Group members complement each other and do a
  better job
   – Provided it is not just one student who actually does the
     assignment and rest piggyback!!
• If all students in a group contribute properly, it’s a
  WIN-WIN situation for both students and faculty!!
Ensemble Methods
• Construct a set of classifiers from the training data
• Predict class label of previously unseen records by
  aggregating predictions made by multiple classifiers
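• A minimal sketch of the aggregation step (Python; the stand-in
  classifiers below are illustrative decision rules, not any particular
  algorithm from these slides):

```python
from collections import Counter

def majority_vote(classifiers, record):
    """Aggregate the base classifiers' predictions for one record by majority vote."""
    votes = [clf(record) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Stand-in base classifiers: simple decision rules on a 1-D record x
c1 = lambda x: 1 if x <= 0.35 else -1
c2 = lambda x: 1 if x > 0.75 else -1
c3 = lambda x: 1 if (x <= 0.35 or x > 0.75) else -1

print(majority_vote([c1, c2, c3], 0.2))   # -> 1
print(majority_vote([c1, c2, c3], 0.5))   # -> -1
```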
General Idea
[Figure: the original training data D is resampled into multiple
 data sets D1, D2, ..., Dt (Step 1: create multiple data sets);
 a classifier Ci is built from each Di (Step 2: build multiple
 classifiers); the classifiers are then combined into a single
 ensemble classifier C* (Step 3: combine classifiers).]
Figure taken from Tan et al. book “Introduction to Data Mining”
Ensemble Classifiers (EC)
• An ensemble classifier constructs a set of ‘base
  classifiers’ from the training data
• Methods for constructing an EC
   •   Manipulating training set
   •   Manipulating input features
   •   Manipulating class labels
   •   Manipulating learning algorithms
 Ensemble Classifiers (EC)
• Manipulating training set
   • Multiple training sets are created by resampling the data
     according to some sampling distribution
   • Sampling distribution determines how likely it is that an
     example will be selected for training – may vary from one trial
     to another
   • A classifier is built from each training set using a particular
     learning algorithm
   • Examples: Bagging & Boosting
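   • A minimal sketch of this resampling step (Python/NumPy, not from
     the slides) – each trial draws a training set of size d with
     replacement, and the sampling distribution may change from one
     trial to the next:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
examples = np.arange(1, d + 1)        # example IDs 1..10

# Trial 1: uniform sampling distribution (bagging-style)
p_uniform = np.full(d, 1.0 / d)
D1 = rng.choice(examples, size=d, replace=True, p=p_uniform)

# Trial 2: non-uniform distribution, e.g. example 4 is "harder",
# so it gets more probability mass (boosting-style)
p_boost = np.full(d, 1.0 / d)
p_boost[3] *= 3.0                     # index 3 is example 4
p_boost /= p_boost.sum()              # renormalize to a valid distribution
D2 = rng.choice(examples, size=d, replace=True, p=p_boost)

print(D1)
print(D2)
```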
 Ensemble Classifiers (EC)
• Manipulating input features
   • Subset of input features chosen to form each training set
   • Subset can be chosen randomly or based on inputs given by
     Domain Experts
   • Good for data that has redundant features
   • Random Forest is an example which uses DTs as its base
     classifiers
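   • A quick sketch of choosing a random feature subset for each base
     classifier (Python/NumPy; the function name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_subset(X, n_sub):
    """Pick n_sub randomly chosen input features (columns) for one base classifier."""
    cols = rng.choice(X.shape[1], size=n_sub, replace=False)
    return X[:, cols], cols

X = rng.normal(size=(100, 8))               # 100 records, 8 possibly redundant features
X_sub, chosen = random_feature_subset(X, n_sub=3)
print("features used by this base classifier:", chosen)
```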
 Ensemble Classifiers (EC)
• Manipulating class labels
   • Used when the number of classes is sufficiently large
   • Training data is transformed into a binary class problem by
     randomly partitioning the class labels into 2 disjoint subsets,
     A0 & A1
   • Re-labelled examples are used to train a base classifier
   • By repeating the class relabelling and model building steps
     several times, an ensemble of base classifiers is obtained
   • How is a new tuple classified?
   • Example – error-correcting output coding (p. 307)
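   • A minimal sketch of the relabelling step (Python/NumPy, assumed
     names; combining the votes over many such partitions, as in
     error-correcting output coding, is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def relabel(y, classes):
    """Randomly partition the class labels into disjoint subsets A0 and A1
    and map every tuple's label to 0 (in A0) or 1 (in A1)."""
    shuffled = rng.permutation(classes)
    A0 = set(shuffled[: len(classes) // 2])
    return np.array([0 if label in A0 else 1 for label in y]), A0

classes = ["a", "b", "c", "d", "e", "f"]
y = np.array(["a", "c", "f", "b", "e", "a"])
y_binary, A0 = relabel(y, classes)
print("A0 =", A0)
print("relabelled y =", y_binary)
```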
 Ensemble Classifiers (EC)
• Manipulating learning algorithm
  • Learning algorithms can be manipulated in such a way that
    applying the algorithm several times on the same training
    data may result in different models
  • Example – ANN can produce different models by changing
    network topology or the initial weights of links between
    neurons
  • Example – an ensemble of DTs can be constructed by
    introducing randomness into the tree-growing procedure –
    instead of choosing the best split attribute at each node, we
    randomly choose one of the top k attributes
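  • A small sketch of the randomized split choice (Python/NumPy; the
    gain values are hypothetical and the rest of the tree-growing code
    is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_split_attribute(gains, k=3):
    """Instead of the single best attribute, choose randomly among the top-k by gain."""
    top_k = np.argsort(gains)[::-1][:k]     # indices of the k highest-gain attributes
    return int(rng.choice(top_k))

# Hypothetical information gains computed at one tree node for 6 attributes
gains = np.array([0.02, 0.31, 0.11, 0.27, 0.05, 0.19])
print("attribute chosen at this node:", pick_split_attribute(gains, k=3))
```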
 Ensemble Classifiers (EC)
• First 3 approaches are generic – can be applied to any
  classifier
• Fourth approach depends on the type of classifier used
• Base classifiers can be generated sequentially or in
  parallel
 Ensemble Classifiers
• Ensemble methods work better with ‘unstable
  classifiers’
• Classifiers that are sensitive to minor perturbations in
  the training set
• Examples:
   – Decision trees
   – Rule-based
   – Artificial neural networks
Why does it work?
• Suppose there are 25 base classifiers
    – Each classifier has error rate, e = 0.35
    – Assume classifiers are independent
    – Probability that the ensemble classifier makes a wrong
      prediction:
          $$ \sum_{i=13}^{25} \binom{25}{i} \, e^{i} (1 - e)^{25-i} = 0.06 $$
    – CHK out yourself if it is correct!!
Example taken from Tan et al. book “Introduction to Data Mining”
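• A quick way to check the figure yourself (Python):

```python
from math import comb

e = 0.35     # error rate of each of the 25 independent base classifiers
# The ensemble errs when 13 or more of the 25 classifiers are wrong
ensemble_error = sum(comb(25, i) * e**i * (1 - e)**(25 - i) for i in range(13, 26))
print(round(ensemble_error, 4))   # ~0.06, as claimed
```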
   Examples of Ensemble Methods
• How to generate an ensemble of classifiers?
   – Bagging
   – Boosting
   – Random Forests
    Bagging
• Also known as bootstrap aggregation
     Original Data       1    2    3     4    5    6     7    8    9    10
     Bagging (Round 1)   7    8    10    8    2    5    10   10    5     9
     Bagging (Round 2)   1    4    9     1    2    3     2    7    3     2
     Bagging (Round 3)   1    8    5    10    5    5     9    6    3     7
• Sampling uniformly with replacement
• Build classifier on each bootstrap sample
• 0.632 bootstrap
• Each bootstrap sample Di contains approx. 63.2% of
  the original training data
• Remaining (36.8%) are used as test set
     Example taken from Tan et al. book “Introduction to Data Mining”
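• A quick empirical check of the 63.2% figure (Python/NumPy, not from
  the slides); analytically, a record is missed in all n draws with
  probability (1 - 1/n)^n ≈ e^{-1} ≈ 0.368:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rounds = 10_000, 20

fractions = []
for _ in range(rounds):
    sample = rng.integers(0, n, size=n)            # uniform sampling with replacement
    fractions.append(np.unique(sample).size / n)   # fraction of distinct original records in Di

print(round(float(np.mean(fractions)), 3))         # ~0.632; the remaining ~36.8% can serve as a test set
```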
Bagging
• Accuracy of bagging:
      $$ Acc(M) = \sum_{i=1}^{k} \left( 0.632 \cdot Acc(M_i)_{test\_set} + 0.368 \cdot Acc(M_i)_{train\_set} \right) $$
• Works well for small data sets
• Example:
         X   0.1    0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
         y   1      1     1     -1    -1    -1    -1    1     1     1
   (actual class labels)
 Example taken from Tan et al. book “Introduction to Data Mining”
   Bagging
• Decision stump: a single-level binary decision tree
• The split chosen by entropy is either x <= 0.35 or x <= 0.75
• A single stump has an accuracy of at most 70% on this data
[Figure: the ten data points with their actual class labels]
   Example taken from Tan et al. book “Introduction to Data Mining”
Bagging
                     Accuracy of the ensemble classifier: 100%
Example taken from Tan et al. book “Introduction to Data Mining”
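• A sketch of the whole example in Python/NumPy (not the book's exact
  rounds): decision stumps are trained on bootstrap samples and combined
  by majority vote; depending on the random samples, the ensemble usually
  reaches 100% on this data, even though a single stump cannot exceed 70%.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from the slides
x = np.arange(0.1, 1.05, 0.1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

def train_stump(xs, ys):
    """Fit a one-level tree: pick the threshold and leaf labels with the fewest errors."""
    best = None
    for t in np.arange(0.05, 1.05, 0.1):            # candidate split points between the xs
        for left in (1, -1):
            for right in (1, -1):
                err = np.mean(np.where(xs <= t, left, right) != ys)
                if best is None or err < best[0]:
                    best = (err, t, left, right)
    return best[1:]                                  # (threshold, left label, right label)

def stump_predict(stump, xs):
    t, left, right = stump
    return np.where(xs <= t, left, right)

# Bagging: one stump per bootstrap sample, then a majority vote
k = 10
stumps = []
for _ in range(k):
    idx = rng.integers(0, len(x), size=len(x))       # bootstrap sample (with replacement)
    stumps.append(train_stump(x[idx], y[idx]))

votes = np.sum([stump_predict(s, x) for s in stumps], axis=0)
ensemble_pred = np.where(votes >= 0, 1, -1)
print("ensemble accuracy:", np.mean(ensemble_pred == y))   # often 1.0 on this data
```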
 Bagging- Final Points
• Works well if the base classifiers are unstable
• Increased accuracy because it reduces the variance
  of the individual classifier
• Does not focus on any particular instance of the
  training data
• Therefore, less susceptible to model over-fitting
  when applied to noisy data
• What if we want to focus on particular instances of the
  training data?
Boosting
• An iterative procedure to adaptively change
  distribution of training data by focusing more on
  previously misclassified records
   – Initially, all N records are assigned equal weights
   – Unlike bagging, weights may change at the end of a
     boosting round
   Boosting
• Records that are wrongly classified will have their
  weights increased
• Records that are classified correctly will have their
  weights decreased
   Original Data        1     2     3       4      5      6      7      8      9        10
   Boosting (Round 1)   7     3     2       8      7      9      4     10      6         3
   Boosting (Round 2)   5     4     9       4      2      5      1      7      4         2
   Boosting (Round 3)   4     4     8      10      4      5      4      6      3         4
                               • Example 4 is hard to classify
                               • Its weight is increased, therefore it is more likely
                               to be chosen again in subsequent rounds
     Example taken from Tan et al. book “Introduction to Data Mining”
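• A small sketch of this weighted re-sampling (Python/NumPy; the 2.0 and
  0.8 factors are arbitrary for illustration – AdaBoost's actual update
  rule comes later):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
weights = np.full(n, 1.0 / n)                      # round 1: equal weights

# Suppose example 4 (index 3) was misclassified in round 1:
# raise its weight, lower the others, then renormalize.
misclassified = np.zeros(n, dtype=bool)
misclassified[3] = True
weights[misclassified] *= 2.0
weights[~misclassified] *= 0.8
weights /= weights.sum()

# Round 2 training set: examples drawn in proportion to their weights
round2 = rng.choice(np.arange(1, n + 1), size=n, replace=True, p=weights)
print(round2)                                      # example 4 now tends to appear more often
```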
Boosting
• Equal weights are assigned to each training tuple
  (1/d for round 1)
• After a classifier Mi is learned, the weights are
  adjusted to allow the subsequent classifier Mi+1 to
  “pay more attention” to tuples that were
  misclassified by Mi.
• Final boosted classifier M* combines the votes of
  each individual classifier
• Weight of each classifier’s vote is a function of its
  accuracy
• Adaboost – popular boosting algorithm
Adaboost
• Input:
   – Training set D containing d tuples
   – k rounds
   – A classification learning scheme
• Output:
   – A composite model
Adaboost
• Data set D containing d class-labeled tuples (X1,y1),
  (X2,y2), (X3,y3),….(Xd,yd)
• Initially assign equal weight 1/d to each tuple
• To generate k base classifiers, we need k rounds or
  iterations
• In round i, tuples from D are sampled with replacement to
  form Di (of size d)
• Each tuple’s chance of being selected depends on its
  weight
Adaboost
• Base classifier Mi is derived from the training tuples of
  Di
• Error of Mi is tested using Di
• Weights of training tuples are adjusted depending
  on how they were classified
   – Correctly classified: Decrease weight
   – Incorrectly classified: Increase weight
• Weight of a tuple indicates how hard it is to classify
  it (directly proportional)
Adaboost
• Some classifiers may be better at classifying some
  “hard” tuples than others
• We finally have a series of classifiers that
  complement each other!
• Error rate of model Mi:
      $$ error(M_i) = \sum_{j=1}^{d} w_j \cdot err(X_j) $$
  where err(Xj) is the misclassification error of tuple Xj:
  it is 1 if Xj is misclassified and 0 otherwise
• If classifier error exceeds 0.5, we abandon it
• Try again with a new Di and a new Mi derived from it
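• A tiny sketch of this check (Python/NumPy, illustrative values):

```python
import numpy as np

def weighted_error(weights, predictions, labels):
    """error(Mi) = sum of the weights of the misclassified tuples (err(Xj) = 1 iff misclassified)."""
    return float(np.sum(weights * (predictions != labels)))

w = np.full(5, 1 / 5)
preds  = np.array([ 1, -1, 1,  1, -1])
labels = np.array([ 1,  1, 1, -1, -1])
err = weighted_error(w, preds, labels)
print(err, "-> abandon and resample" if err > 0.5 else "-> keep this classifier")
```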
Adaboost
• error (Mi) affects how the weights of training tuples are
  updated
• If a tuple is correctly classified in round i, its weight is
  multiplied by  $ error(M_i) / (1 - error(M_i)) $
• Adjust the weights of all correctly classified tuples this way
• Now the weights of all tuples (including the misclassified
  tuples) are normalized
• Normalization factor = $ \dfrac{sum\_of\_old\_weights}{sum\_of\_new\_weights} $
• The weight of classifier Mi's vote is
      $$ \log \frac{1 - error(M_i)}{error(M_i)} $$
Adaboost
• The lower a classifier's error rate, the more accurate it is, and
  therefore, the higher its weight for voting should be
• Weight of classifier Mi's vote is
      $$ \log \frac{1 - error(M_i)}{error(M_i)} $$
• For each class c, sum the weights of each classifier that
  assigned class c to X (unseen tuple)
• The class with the highest sum is the WINNER!
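• One round of this update and the resulting vote weight, as a sketch in
  Python/NumPy (natural log is used for the classifier weight):

```python
import numpy as np

def adaboost_round_update(weights, correct, error):
    """Apply the weight update described above for one round."""
    new_w = weights.copy()
    new_w[correct] *= error / (1.0 - error)            # shrink weights of correctly classified tuples
    new_w *= weights.sum() / new_w.sum()               # normalization factor = old weight sum / new weight sum
    vote_weight = np.log((1.0 - error) / error)        # weight of this classifier's vote
    return new_w, vote_weight

w = np.full(10, 0.1)
correct = np.array([True] * 8 + [False] * 2)           # suppose 2 of the 10 tuples were misclassified
err = float(np.sum(w[~correct]))                       # weighted error = 0.2
new_w, vote = adaboost_round_update(w, correct, err)
print(new_w.round(4))                                  # misclassified tuples end up at 0.25, the rest drop to 0.0625
print(round(vote, 3))                                  # log(0.8/0.2) ~ 1.386
```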
      Example: AdaBoost
  • Base classifiers: C1, C2, …, CT
   • Error rate:
         $$ \epsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big) $$
   • Importance of a classifier:
         $$ \alpha_i = \frac{1}{2} \ln\!\left( \frac{1 - \epsilon_i}{\epsilon_i} \right) $$
   Example: AdaBoost
• Weight update:
      $$ w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times
         \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\
                       \exp(\alpha_j)  & \text{if } C_j(x_i) \neq y_i \end{cases} $$
  where $Z_j$ is the normalization factor
• If any intermediate round produces an error rate higher
  than 50%, the weights are reverted back to 1/N and the
  re-sampling procedure is repeated
• Classification:
      $$ C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big) $$
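• The whole procedure on the earlier toy data, as a hedged sketch in
  Python/NumPy (decision stumps as base classifiers, sampling by weight
  as described above; the weights are kept normalized to sum to 1, so the
  weighted error is taken as the sum of the weights of the misclassified
  points):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data reused from the bagging example
x = np.arange(0.1, 1.05, 0.1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])
N, T = len(x), 5

def train_stump(xs, ys):
    """Best single-split stump on (xs, ys): threshold plus left/right leaf labels."""
    best = None
    for t in np.arange(0.05, 1.05, 0.1):
        for left in (1, -1):
            for right in (1, -1):
                err = np.mean(np.where(xs <= t, left, right) != ys)
                if best is None or err < best[0]:
                    best = (err, t, left, right)
    return best[1:]

def predict(stump, xs):
    t, left, right = stump
    return np.where(xs <= t, left, right)

w = np.full(N, 1.0 / N)
stumps, alphas = [], []
while len(stumps) < T:
    idx = rng.choice(N, size=N, replace=True, p=w)     # sample the round's training set by weight
    stump = train_stump(x[idx], y[idx])
    miss = predict(stump, x) != y
    eps = float(np.sum(w[miss]))                       # weighted error on the full training set
    if eps > 0.5:                                      # revert the weights and resample, as above
        w = np.full(N, 1.0 / N)
        continue
    alpha = 0.5 * np.log((1 - eps) / (eps + 1e-10))    # importance of this classifier
    w = w * np.exp(np.where(miss, alpha, -alpha))      # raise misclassified weights, lower the rest
    w /= w.sum()                                       # Z_j: renormalize so the weights sum to 1
    stumps.append(stump)
    alphas.append(alpha)

# Classification: C*(x) = argmax_y sum_j alpha_j * delta(C_j(x) = y)
scores = np.sum([a * predict(s, x) for s, a in zip(stumps, alphas)], axis=0)
print("ensemble accuracy:", np.mean(np.where(scores >= 0, 1, -1) == y))
```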
   Illustrating AdaBoost
[Figure: a one-dimensional training set of + and - points, each starting
 with weight 0.1. After boosting round 1 (classifier B1) the weights of the
 hard-to-classify points rise to 0.4623 while the others drop to 0.0094,
 with alpha = 1.9459; round 2 (B2) gives alpha = 2.9323 and round 3 (B3)
 gives alpha = 3.8744, with the weights re-adjusted after each round.
 The overall weighted vote of B1, B2 and B3 classifies every point
 correctly.]