B. P. Poddar Institute of Management and Technology
Machine Learning (PEC-CS701E) CA1-PPT
TOPIC: Ensemble Methods: Combining Multiple Models for Better Predictions [Supervised Learning]
SUBTOPICS: A Deep Dive into Bagging, Boosting, and Stacking
•   Soumyadeep Das (11500122094)
•   Akashdeep Naha (11500122070)
•   Angshook Banerjee (11500122068)
Introduction to Ensemble Learning
Good morning! Today, we delve into the fascinating world of Ensemble Learning, a powerful technique in machine learning that combines
multiple models to achieve superior predictive performance compared to any single model. This approach is crucial for building robust and
accurate predictive systems across various domains.
1. The Power of Many: Explore why combining models enhances overall prediction accuracy and generalisation.
2. Key Ensemble Techniques: Focus on three primary methods: Bagging, Boosting, and Stacking.
3. Presentation Roadmap: Outline the structure of our discussion, ensuring clarity and coherence.
Bagging: Bootstrap Aggregating
Let's begin with Bagging, short for Bootstrap Aggregating. This method involves training multiple instances of the same learning algorithm
on different random subsets of the training data, then combining their predictions, typically through averaging for regression or voting for
classification.
•   Random Subsets: Models are trained on bootstrapped subsets of the original dataset, sampled with replacement. This introduces diversity among the models.
•   Notable Example: Random Forest is a prominent example of Bagging, where multiple decision trees are built and their results are aggregated.
•   Key Advantage: Bagging primarily reduces variance in predictions, making models less prone to overfitting, especially with complex base learners like decision trees.
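A minimal Python sketch of this idea with scikit-learn is given below; the synthetic dataset and hyperparameter values are illustrative assumptions rather than part of the slide content.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any tabular classification dataset could be used.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 decision trees (scikit-learn's default base estimator for BaggingClassifier),
# each fit on a bootstrap sample; class predictions are combined by voting.
bagging = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bagging.predict(X_test)))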
Bagging Process Visualised
This diagram provides a clear illustration of how Bagging operates. Observe how
the original dataset is resampled to create multiple bootstrapped subsets, each
used to train an independent model.
The final prediction is derived by either averaging the outputs of individual
models (for regression tasks) or through a majority vote (for classification tasks),
leading to a more stable and accurate result.
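For those who prefer code to diagrams, the same workflow can be sketched from scratch; the dataset, the 25-model ensemble size, and the choice of decision trees are assumptions made purely for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
models = []
for _ in range(25):
    # Bootstrapped subset: sample row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Majority vote across the 25 independently trained trees (classification case);
# for regression, the individual predictions would be averaged instead.
votes = np.stack([m.predict(X_test) for m in models])      # shape: (25, n_test)
final = (votes.mean(axis=0) >= 0.5).astype(int)            # majority of binary labels
print("Manual bagging accuracy:", (final == y_test).mean())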
Boosting: Sequential Improvement
Next, we move to Boosting, a sequential ensemble technique that constructs models in a stepwise fashion. Each new model is specifically
designed to correct the errors made by the previously trained models, iteratively improving the overall performance.
1. Sequential Training: Models are built one after another, with each subsequent model focusing on the data points that were misclassified or poorly predicted by the preceding models.
2. Prominent Algorithms: Key algorithms include AdaBoost, Gradient Boosting, and the highly optimised XGBoost, all known for their strong predictive power.
3. Primary Advantage: Boosting primarily works to reduce bias, making models more accurate by focusing on difficult examples and adapting to complex patterns in the data.
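A hedged scikit-learn sketch of two of the algorithms named above follows (XGBoost's xgboost.XGBClassifier could be swapped in the same way); the synthetic dataset and settings are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Both build weak learners one after another; each new learner concentrates on
# the examples the current ensemble still predicts poorly.
for model in (AdaBoostClassifier(n_estimators=200, random_state=42),
              GradientBoostingClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, "accuracy:", round(acc, 3))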
Boosting Process Visualised
This diagram visually explains the Boosting process. Notice how each model in
the sequence is built to give more weight to the data points that previous models
struggled with, leading to a refined and more accurate overall predictor.
This iterative correction mechanism allows Boosting algorithms to achieve high
accuracy, particularly in situations where initial models might have high bias.
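The reweighting idea in the diagram can be sketched from scratch in the style of AdaBoost; the number of rounds, the depth-1 stumps, and the dataset here are assumptions for illustration only.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)
y = np.where(y == 1, 1, -1)              # AdaBoost convention: labels in {-1, +1}

n = len(X)
w = np.full(n, 1.0 / n)                  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()             # weighted error of this round's stump
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    w *= np.exp(-alpha * y * pred)       # misclassified points receive larger weights
    w /= w.sum()                         # renormalise to a probability distribution
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote over all stumps.
agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy:", (np.sign(agg) == y).mean())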
Bagging vs. Boosting: A Comparison
Understanding the nuances between Bagging and Boosting is crucial for effective model selection. While both are powerful ensemble
techniques, they address different aspects of model error (variance and bias, respectively).
Bagging: Parallel Training
•   Reduces variance.
•   Models are trained in parallel on independent bootstrap subsets.
•   Ideal when base models are prone to overfitting (e.g., complex decision trees).
Boosting: Sequential Refinement
•   Reduces bias.
•   Models are trained sequentially, each correcting the prior one's errors.
•   Ideal when base models underfit, improving performance on challenging datasets.
    Both techniques are invaluable, but their optimal application depends on the specific characteristics of your data and the type of errors
    you aim to minimise.
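A small sketch of this contrast, assuming scikit-learn ≥ 1.2 (for the estimator keyword) and a synthetic dataset: Bagging is given a high-variance learner (deep trees) to stabilise, while Boosting is given a high-bias learner (stumps) to strengthen.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# High-variance base learner, stabilised by Bagging.
bagged_deep_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None), n_estimators=100, random_state=7)
# High-bias base learner, strengthened by Boosting.
boosted_stumps = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=7)

for name, model in [("Bagging (deep trees)", bagged_deep_trees),
                    ("Boosting (stumps)", boosted_stumps)]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))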
Stacking: The Ensemble of Ensembles
Finally, we explore Stacking, also known as Stacked Generalisation. This advanced ensemble method takes the predictions of multiple
diverse base models and uses a separate meta-model (or "meta-learner") to learn how to optimally combine these predictions for the
final output.
•   Base Models: Multiple different learning algorithms (e.g., decision trees, support vector machines, neural networks) are trained on the same original dataset.
•   Meta-Model: A higher-level model is trained on the outputs (predictions) of the base models. This meta-model learns the strengths and weaknesses of each base model.
•   Synergistic Advantage: Stacking leverages the unique strengths of various models, often resulting in performance that surpasses any single base model or even simpler ensemble techniques like Bagging or Boosting.
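A minimal sketch using scikit-learn's StackingClassifier is shown below; the particular base models, meta-model, and dataset are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Diverse base models; their cross-validated predictions become the features
# on which the logistic-regression meta-model is trained.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("svm", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(X_train, y_train)
print("Stacking accuracy:", accuracy_score(y_test, stack.predict(X_test)))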
Stacking Process Visualised
This diagram illustrates the sophisticated workflow of Stacking. You can see how
raw data is fed into multiple diverse base models, whose predictions then form a
new dataset for the meta-model to learn from, ultimately generating the final
prediction.
This hierarchical approach allows Stacking to capture complex non-linear
relationships between the base models' predictions and the true labels, leading
to highly accurate and robust models.
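The two-level workflow in the diagram can also be written out by hand, which makes the "new dataset of predictions" explicit; the base models, meta-model, and data here are assumptions for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Level 0: out-of-fold predicted probabilities form the meta-model's training
# set, so no base model ever predicts on data it was fitted on.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models])

# Level 1: the meta-model learns how much to trust each base model.
meta_model = LogisticRegression().fit(meta_train, y_train)

# At prediction time, base models refit on the full training set feed the meta-model.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models])
print("Manual stacking accuracy:", accuracy_score(y_test, meta_model.predict(meta_test)))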
A Case Study: Predicting Customer Churn
Ensemble methods are incredibly powerful in real-world applications. Take
customer churn prediction in telecommunications. Identifying customers at risk
allows proactive intervention, significantly impacting revenue.
•   The Challenge: Traditional models often struggle with complex, non-linear patterns and imbalanced churn data, leading to suboptimal accuracy.
•   Ensemble Solution: Random Forest (Bagging) or Gradient Boosting (Boosting) can dramatically improve accuracy by combining weaker learners.
•   Key Benefits: Ensemble models provide robust, accurate predictions, enabling precise targeting of at-risk customers and better understanding of churn drivers.
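A hedged sketch of how such a churn model might be set up follows; the synthetic imbalanced dataset (standing in for real telecom data), the roughly 10% churn rate, and the Random Forest settings are all assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: roughly 10% of customers churn (class 1).
X, y = make_classification(n_samples=5000, n_features=15, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" counteracts the class imbalance mentioned above.
model = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                               random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test),
                            target_names=["stay", "churn"]))

# Feature importances offer a first look at likely churn drivers.
print("Top feature importances:", model.feature_importances_.round(3)[:5])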
Conclusion & Key Takeaways
In summary, Bagging, Boosting, and Stacking represent powerful strategies for enhancing machine learning model performance by
cleverly combining multiple individual learners.
•   Bagging: Variance Reduction. Effective in mitigating overfitting by averaging predictions from models trained on diverse data subsets.
•   Boosting: Bias Correction. Minimises bias through sequential training, where each model focuses on correcting the errors of its predecessors.
•   Stacking: Synergistic Combination. Leverages the unique strengths of various models via a meta-model, often yielding superior predictive accuracy.
References
•   Breiman, L. (1996). "Bagging Predictors." Machine Learning, 24(2), 123-140.
•   Freund, Y., & Schapire, R. (1997). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." Journal of
    Computer and System Sciences, 55(1), 119-139.
•   Zhou, Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. CRC Press.