Lecture 2

Ensemble learning is an AI/ML technique that combines multiple models to enhance prediction accuracy by leveraging their diverse strengths. It operates similarly to gathering opinions from various sources to make informed decisions, with methods like max voting, averaging, and weighted averaging. Advanced techniques include bagging and boosting, which improve model performance through parallel training and sequential learning, respectively.

AIML

Dr. Nitin Arvind Shelke


1
Ensemble Learning
• Ensemble learning is an AI/ML technique that combines multiple models to improve the accuracy of predictions.
• The idea is that combining models with different strengths and
weaknesses can produce better results than any single model.

2
Introduction
• When you want to purchase a new car, will you walk up
to the first car shop and purchase one based on the
advice of the dealer? It’s highly unlikely.
• You would likely browse a few web portals where people have posted their reviews and compare different car models, checking their features and prices. You would also probably ask your friends and colleagues for their opinion. In short, you wouldn’t directly reach a conclusion, but would instead make a decision considering the opinions of other people as well.
• Ensemble models in AI operate on a similar idea. They
combine the decisions from multiple models to
improve the overall performance.
3
Introduction to Ensemble Learning
• Let’s understand the concept of ensemble learning with another
example.
• Suppose you are a movie director and you have created a short
movie on a very important and interesting topic.
• Now, you want to take preliminary feedback (ratings) on the movie
before making it public. What are the possible ways by which you
can do that?

4
Introduction to Ensemble Learning
A: You may ask one of your friends to rate the movie for you.
Now it’s entirely possible that the person you have chosen loves you very
much and doesn’t want to break your heart by providing a 1-star rating to
the horrible work you have created.

B: Another way could be by asking 5 colleagues of yours to rate the movie.
This should provide a better idea of the movie. This method may provide
honest ratings for your movie. But a problem still exists. These 5 people
may not be “Subject Matter Experts” on the topic of your movie. Sure,
they might understand the cinematography, the shots, or the audio, but
at the same time may not be the best judges of dark humour.

5
Introduction to Ensemble Learning
C: How about asking 50 people to rate the movie?
Some of them can be your friends, some can be your colleagues and some may even be total strangers.

The responses, in this case, would be more generalized and diversified since
now you have people with different sets of skills. And as it turns out – this is a
better approach to get honest ratings than the previous cases we saw.

With these examples, you can infer that a diverse group of people is likely to make better decisions as compared to individuals. The same is true for a diverse set of models in comparison to single models. This diversification in Machine Learning is achieved by a technique called Ensemble Learning.

6
How it works
• Train multiple models: Train multiple models, such as regression
models, classification models, or neural networks, to address a
common problem
• Combine predictions: Combine the predictions from the
individual models using methods like averaging, voting, or
stacking
• Improve accuracy: The combined predictions should be more accurate than those of any single model (a short sketch of this workflow follows below)

7
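To make the workflow above concrete, here is a minimal sketch in Python, assuming scikit-learn is available and using a synthetic toy dataset; the estimators and parameter values are illustrative, not part of the lecture.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the common problem all models try to solve.
X, y = make_classification(n_samples=500, random_state=42)

# Train multiple models and combine their predictions by voting.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote; voting="soft" averages predicted probabilities
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))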
Simple Ensemble Techniques
• Max Voting
• Averaging
• Weighted Averaging

8
Max Voting
• The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The prediction made by each model is considered a ‘vote’. The prediction given by the majority of the models is used as the final prediction.

• For example, suppose you asked 5 of your colleagues to rate your movie (out of 5); three of them rated it as 4 while two of them gave it a 5. Since the majority gave a rating of 4, the final rating will be taken as 4. You can consider this as taking the mode of all the predictions (see the sketch below).

9
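A minimal sketch of max voting for the movie-rating example above, in plain Python with the numbers taken from the slide: the final prediction is the mode of the individual votes.

from collections import Counter

votes = [4, 4, 4, 5, 5]  # ratings from the 5 colleagues
final_rating, num_votes = Counter(votes).most_common(1)[0]
print(final_rating)  # 4, the value chosen by the majority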
Max Voting

10
Averaging
• Similar to the max voting technique, multiple predictions are
made for each data point in averaging. In this method, we take an
average of predictions from all the models and use it to make the
final prediction. Averaging can be used for making predictions in
regression problems or while calculating probabilities for
classification problems.
• For example, in the below case, the averaging method would take
the average of all the values.

• i.e. (5+4+5+4+4)/5 = 4.4


11
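A minimal sketch of the averaging method, using the same ratings as the slide:

ratings = [5, 4, 5, 4, 4]
final_rating = sum(ratings) / len(ratings)
print(final_rating)  # 4.4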
Averaging

12
Weighted Average
• This is an extension of the averaging method. All models are
assigned different weights defining the importance of each model
for prediction. For instance, if two of your colleagues are
critics, while others have no prior experience in this field, then
the answers by these two friends are given more importance
as compared to the other people.

• The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41.

13
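A minimal sketch of the weighted average, using the illustrative weights from the slide (0.23 for the two critic colleagues, 0.18 for the other three):

ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]  # weights sum to 1.0
final_rating = sum(r * w for r, w in zip(ratings, weights))
print(round(final_rating, 2))  # 4.41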
Weighted Average

14
Advanced Ensemble techniques
Bagging
Boosting

15
Bagging
• A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
• Each base classifier is trained in parallel with a training set which is generated by randomly drawing, with replacement, N examples (or data points) from the original training dataset, where N is the size of the original training set. The training set for each of the base classifiers is independent of the others. Many of the original data points may be repeated in the resulting training set while others may be left out.

16
Bagging Steps
• Multiple subsets are created from
the original dataset, selecting
observations with replacement.
• A base model (weak model) is
created on each of these subsets.
• The models run in parallel and are
independent of each other.
• The final predictions are determined
by combining the predictions from all
the models.

17
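A minimal sketch of these steps with scikit-learn's BaggingClassifier (assumed available; in versions before 1.2 the estimator parameter is named base_estimator). The data and parameter values are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base (weak) model
    n_estimators=10,      # number of bootstrap subsets / base models
    bootstrap=True,       # draw each subset with replacement
    n_jobs=-1,            # train the base models in parallel
    random_state=0,
)
bagging.fit(X, y)                        # predictions are combined by voting
print(bagging.predict(X[:5]))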
Bootstrapping
• The bootstrap method refers to creating multiple small subsets of data from an entire dataset. These subsets are drawn at random with replacement; this sampling with replacement is known as resampling.

18
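A minimal sketch of bootstrap resampling with NumPy (assumed available): a subset the same size as the original data is drawn uniformly at random, with replacement.

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)                                           # a toy dataset of 10 examples
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)  # some examples repeat, others are left out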
Bagging

19
How Bagging works on the training dataset?

• Since Bagging resamples the original training dataset with replacement, some instances (or data points) may be present multiple times while others are left out.

20
Algorithm

21
Boosting

• The term boosting is used to describe a family of algorithms which are able to convert weak models into a strong model.
• Boosting incrementally builds an ensemble by training each model on the same dataset, where the weights of the instances are adjusted according to the error of the last prediction. Each time the dataset is created, it is modified by adding more of the data points that the previous model failed on.
• The boosting technique follows a sequential order. The output of one base learner will be input to another. If a base classifier misclassifies an instance (red box), that instance's weight is increased (over-weighting) so that the next base learner classifies it more correctly.

22
How Boosting Algorithm Works?
The basic principle behind the working of the boosting algorithm is
to generate multiple weak learners and combine their predictions to
form one strong rule.
Firstly, a model is built from the training data. Then a second model is built which tries to correct the errors present in the first model. This procedure is continued and models are added until either the complete training dataset is predicted correctly or the maximum number of models is added.

23
How Boosting Algorithm Works?
• Step 1: The base algorithm reads the data and assigns equal
weight to each sample observation.

• Step 2: False predictions made by the base learner are identified. In the next iteration, these wrongly predicted observations are passed to the next base learner with a higher weight on the incorrect predictions.

• Step 3: Repeat step 2 until the algorithm can correctly classify the output (a simplified sketch of this loop follows below).

24
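A simplified, hedged sketch of the re-weighting loop described in these steps (not a full AdaBoost implementation; real boosting algorithms use an error-dependent weight update rather than the fixed factor used here). It assumes scikit-learn and NumPy are available.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)

weights = np.full(len(y), 1 / len(y))    # Step 1: equal weight for each observation
learners = []

for _ in range(5):
    stump = DecisionTreeClassifier(max_depth=1)   # a weak learner
    stump.fit(X, y, sample_weight=weights)
    wrong = stump.predict(X) != y        # Step 2: identify false predictions
    weights[wrong] *= 2.0                # over-weight the misclassified points (illustrative factor)
    weights = weights / weights.sum()    # renormalize the weights
    learners.append(stump)               # Step 3: repeat with the next base learner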
Boosting
• Boosting is similar to bagging in that it combines multiple base models, but the training data for each model is not drawn independently at random; each model is trained with more emphasis on the points the previous models got wrong.
• This way, the performance of each of the subsequent models increases significantly by specifically learning from the failed data points (rather than the straightforward ones).

25
Boosting

The boosting technique follows a sequential order. The output of one base learner will be input to another.
If a base classifier misclassifies an instance (red box), that instance's weight is increased (over-weighting) so that the next base learner classifies it more correctly. The next logical step is to combine the classifiers to predict the results.
26
Bagging Vs Boosting

27
Bagging Vs Boosting

28
Types Of Boosting

There are three main ways through which boosting can be carried
out:

• Adaptive Boosting or AdaBoost

• Gradient Boosting

• XGBoost

29
AdaBoost
• AdaBoost, short for Adaptive Boosting, is an ensemble learning technique used in machine learning for classification and regression problems.
• The main idea behind AdaBoost is to iteratively train the weak
classifier on the training dataset with each successive classifier
giving more weightage to the data points that are misclassified.
• The final AdaBoost model is decided by combining all the weak
classifiers that have been used for training with the weightage
given to the models according to their accuracies.
• The weak model which has the highest accuracy is given the
highest weightage while the model which has the lowest accuracy
is given a lower weightage.
30
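A minimal sketch using scikit-learn's AdaBoostClassifier (assumed available); by default it boosts decision stumps, and the data and parameter values here are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=2)

ada = AdaBoostClassifier(
    n_estimators=50,     # number of weak classifiers trained sequentially
    learning_rate=1.0,   # scales each classifier's contribution
    random_state=2,
)
ada.fit(X, y)            # misclassified points get higher weight each round
print(ada.predict(X[:5]))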
Gradient Boosting
• Gradient Boosting is also based on sequential ensemble learning. Here
the base learners are generated sequentially in such a way that the
present base learner is always more effective than the previous one,
i.e. the overall model improves sequentially with each iteration.
• The difference in this type of boosting is that the weights for misclassified outcomes are not incremented; instead, the Gradient Boosting method tries to optimize the loss function of the previous learner by adding a new weak learner that reduces the loss.
• The main idea here is to overcome the errors in the previous learner’s
predictions. This type of boosting has three main components:
31
Gradient Boosting
• A loss function that needs to be optimized.
• A weak learner for computing predictions and forming strong learners.
• An additive model that adds the weak learners together to minimize the loss function.
• Like AdaBoost, Gradient Boosting can also be used for both
classification and regression problems.

32
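A minimal sketch with scikit-learn's GradientBoostingRegressor (assumed available; older scikit-learn versions spell the squared-error loss "ls"). The three components above map onto the loss, the shallow trees, and the additive learning-rate update.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=3)

gbr = GradientBoostingRegressor(
    loss="squared_error",  # the loss function to be optimized
    n_estimators=100,      # weak learners: shallow regression trees
    learning_rate=0.1,     # additive model: shrinks each tree's contribution
    max_depth=3,
)
gbr.fit(X, y)
print(gbr.predict(X[:3]))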
XGBoost
• XGBoost is an implementation of Gradient Boosting and is a type
of ensemble learning method. Ensemble learning combines
multiple weak models to form a stronger model.
• XGBoost uses decision trees as its base learners combining them
sequentially to improve the model’s performance. Each new tree
is trained to correct the errors made by the previous tree and this
process is called boosting.
• It has built-in parallel processing to train models on large datasets
quickly. XGBoost also supports customizations allowing users to
adjust model parameters to optimize performance based on the
specific problem.

33
How XGBoost Works?
• Start with a base learner: The first model, a decision tree, is trained on the data. In regression tasks this base model simply predicts the average of the target variable.
• Calculate the errors: After training the first tree the errors between the
predicted and actual values are calculated.
• Train the next tree: The next tree is trained on the errors of the previous
tree. This step attempts to correct the errors made by the first tree.
• Repeat the process: This process continues with each new tree trying
to correct the errors of the previous trees until a stopping criterion is
met.
• Combine the predictions: The final prediction is the sum of the
predictions from all the trees.

34
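A hedged sketch of this workflow with the xgboost Python package (assumed installed); XGBRegressor follows the scikit-learn fit/predict interface, and the data and parameter values are illustrative.

from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=4)

model = XGBRegressor(
    n_estimators=200,   # trees added sequentially, each correcting the previous errors
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=4,
    n_jobs=-1,          # built-in parallelism when growing each tree
)
model.fit(X, y)
print(model.predict(X[:3]))  # final prediction is the sum of all trees' outputs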
35
