
Machine Learning TECSE

Experiment No-9

Title: Ensemble Learning - AdaBoost Algorithm

Aim: To understand ensemble learning algorithms and their different types.


Theory:
Ensemble learning is a powerful technique in machine learning where multiple models are
combined to improve the overall performance of the system. The basic idea behind ensemble
learning is that by combining multiple models, each capturing different aspects of the data or
making different kinds of errors, the ensemble can make more accurate predictions than any
individual model.
There are several approaches to ensemble learning, including:
Voting: Different models make predictions, and the final prediction is determined by a majority
vote (for classification tasks) or averaging (for regression tasks).
Bagging (Bootstrap Aggregating): Multiple copies of the same model are trained on different
subsets of the training data (with replacement), and their predictions are averaged. Random
Forests are a popular example of this approach.
Boosting: Models are trained sequentially, with each new model focusing on the examples that
the previous models found difficult. Examples include AdaBoost and Gradient Boosting
Machines (GBM).
Stacking (Stacked Generalization): In stacking, the predictions of multiple models are used as
input features for a meta-model, which then makes the final prediction. This meta-model is often
a simple linear model, but it can also be more complex.
Ensemble methods are widely used in practice because they often result in better predictive
performance compared to individual models, especially when the individual models are diverse
and make different kinds of errors. They are particularly useful when dealing with complex,
high-dimensional datasets or when the underlying relationships in the data are not well
understood.
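As a minimal illustrative sketch (not part of the original experiment), the voting approach can be demonstrated with scikit-learn's VotingClassifier. The synthetic dataset and the choice of base models below are assumptions made purely for demonstration.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Illustrative synthetic classification dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each base model casts one vote and the majority class wins
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Voting ensemble accuracy:", ensemble.score(X_test, y_test))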

Bagging:
Bagging, short for Bootstrap Aggregating, is a popular ensemble learning technique in machine
learning. It aims to improve the stability and accuracy of machine learning algorithms,
particularly decision trees and their variations.
Here's how bagging works:


Bootstrap Sampling: Bagging starts by creating multiple bootstrap samples from the original
dataset. Bootstrap sampling involves randomly selecting data points from the dataset with
replacement, meaning that the same data point can be selected multiple times or not at all.
Model Training: For each bootstrap sample, a base learner (usually a decision tree) is trained on
that sample. Because each bootstrap sample is different, each base learner is trained on a slightly
different subset of the original data.
Voting or Averaging: Once all base learners are trained, bagging combines their predictions
using a voting (for classification problems) or averaging (for regression problems) mechanism.
This aggregation helps to reduce variance and improve the overall performance of the model.
The key idea behind bagging is that training multiple models on different subsets of the data and combining their predictions reduces the variance of the final model and its tendency to overfit. This often leads to improved generalization performance, especially when dealing with complex or noisy datasets. Random Forest is one of the most well-known algorithms that uses the bagging technique.
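The three bagging steps above can be sketched with scikit-learn's BaggingClassifier. The dataset and hyperparameter values here are illustrative assumptions, not tuned settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on a bootstrap sample drawn with replacement;
# predictions are combined by majority vote
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,   # sample with replacement (bootstrap sampling)
    random_state=0,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))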

Boosting:
Boosting is a machine learning ensemble technique that combines multiple weak learners to
create a strong learner. Weak learners are models that perform slightly better than random
guessing, such as decision trees with only a few nodes. Boosting works by training a series of
weak learners sequentially, with each one focusing on the instances that the previous learners
struggled with. In essence, it pays more attention to the mistakes of earlier models, hence
"boosting" their performance.
The most popular boosting algorithm is AdaBoost (Adaptive Boosting), which assigns weights to
each training instance and adjusts them at each iteration to focus on the harder-to-classify
instances. Gradient Boosting Machines (GBMs) like XGBoost, LightGBM, and CatBoost are
also widely used and have become the go-to methods for many machine learning competitions
and real-world applications due to their exceptional performance and flexibility.
Boosting algorithms are particularly effective for tasks such as classification and regression.
They often outperform individual models and other ensemble methods like bagging (e.g.,
Random Forests) when applied correctly. However, they can be sensitive to noisy data and
outliers, and they may require careful tuning of hyperparameters to achieve optimal performance.
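A short sketch of gradient boosting using scikit-learn's GradientBoostingClassifier is given below; the hyperparameter values are common illustrative defaults, not a tuned configuration, and the dataset is synthetic.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Sequential ensemble of shallow trees; each new tree corrects the errors
# of the ensemble built so far
gbm = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=1,
)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))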

AdaBoost:
AdaBoost, short for Adaptive Boosting, is a popular ensemble learning algorithm used in
machine learning, specifically for classification tasks. It was proposed by Yoav Freund and
Robert Schapire in 1996. The main idea behind AdaBoost is to combine multiple weak learners
(classifiers that perform slightly better than random guessing) to create a strong classifier.


Here's a brief overview of how AdaBoost works:


Initialization: Each training instance is assigned an equal weight initially.
Iterative Training: AdaBoost iteratively trains a series of weak classifiers on the training data. At
each iteration:
A weak learner (e.g., a shallow decision tree or perceptron) is trained on the weighted dataset, so it focuses more on the instances that were misclassified in previous iterations.
After training, the weak learner's weighted error on the training set is evaluated.
The contribution (weight) of the weak learner in the final ensemble is calculated from this error: a lower weighted error leads to a higher contribution.
Weight Update: After each iteration, the weights of misclassified instances are adjusted.
Misclassified instances are given higher weights to ensure they receive more attention in the next
iteration. This allows AdaBoost to focus on the difficult instances that were not correctly
classified in previous rounds.
Ensemble Construction: The final strong classifier is constructed by combining the weak
classifiers, giving more weight to those with higher accuracy.
Prediction: To make predictions on new data, AdaBoost combines the predictions of all weak
classifiers using a weighted majority vote or a weighted sum.
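For reference, the standard AdaBoost update rules (for labels y_i in {-1, +1}, weak hypothesis h_t, and weighted training error ε_t) are:

α_t = (1/2) · ln((1 − ε_t) / ε_t)        (contribution of the t-th weak learner)
w_i ← w_i · exp(−α_t · y_i · h_t(x_i)),  then normalize so the weights sum to 1

The final classifier predicts H(x) = sign(Σ_t α_t · h_t(x)).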
AdaBoost is particularly effective when used with weak learners that have a slight edge over
random guessing, such as shallow decision trees or perceptrons. Despite its age, AdaBoost
remains a powerful and widely used algorithm in machine learning, especially in scenarios where
interpretability and performance are both important.
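A minimal sketch of AdaBoost with decision stumps (depth-1 trees) as weak learners, using scikit-learn's AdaBoostClassifier; the synthetic dataset and hyperparameter values are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Decision stumps as weak learners, combined by a weighted majority vote
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=100,
    learning_rate=1.0,
    random_state=7,
)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))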

Conclusion: In this experiment, we have studied different ensemble learning techniques, including bagging, boosting, and the AdaBoost algorithm, along with their applications.
