PART A
( PART A: TO BE REFERRED BY STUDENTS)
                                Experiment No. 04
A.1 Aim:
       To implement the Ensemble algorithm.
A.2 Prerequisite:
               Python Basic Concepts
A.3 Outcome:
       Students will be able to implement Ensemble algorithms.
A.4 Theory:
Ensemble Learning Techniques in Machine Learning
      ●   Machine learning models suffer from bias and/or variance. Bias is the difference
          between the value predicted by the model and the actual value. Bias is introduced
          when the model does not consider the variation in the data and creates an overly
          simple model. Such a simple model does not follow the patterns in the data, and
          hence it gives errors on training as well as testing data, i.e. the model has high
          bias (it underfits).
      ●   When the model follows even random quirks of the data as if they were genuine
          patterns, it might do very well on the training dataset, i.e. it gives low bias, but it
          fails on the test data and gives high variance (it overfits).
      ●   Therefore, to improve the accuracy of the model, ensemble learning methods were
          developed. Ensemble learning is a machine learning concept in which several
          models are trained using machine learning algorithms. It combines low-performing
          classifiers (also called weak learners or base learners) and aggregates their
          individual predictions to produce the final prediction.
      ●   On the basis of the type of base learners, ensemble methods can be categorized as
          homogeneous and heterogeneous. If all base learners are of the same type, it is a
          homogeneous ensemble method; if the base learners are of different types, it is a
          heterogeneous ensemble method.
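The two categories can be illustrated in code. The following is only a minimal sketch under
stated assumptions (scikit-learn 1.2 or newer and a synthetic dataset, neither of which is
prescribed by this experiment): a BaggingClassifier over identical decision trees is a
homogeneous ensemble, while a VotingClassifier over different model types is a
heterogeneous one.

# Minimal sketch of homogeneous vs. heterogeneous ensembles.
# Assumes scikit-learn >= 1.2 and NumPy are installed; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Homogeneous ensemble: many copies of the SAME type of base learner (decision trees).
homogeneous = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                n_estimators=10, random_state=42)

# Heterogeneous ensemble: DIFFERENT base learners combined by majority vote.
heterogeneous = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier()),
    ("knn", KNeighborsClassifier()),
    ("logreg", LogisticRegression(max_iter=1000)),
])

for name, model in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))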
Ensemble Learning Methods
Ensemble techniques are classified into three types:
      1.   Bagging
      2.   Boosting
      3.   Stacking
Bagging
Consider a scenario where you are looking at users' ratings for a product. Instead of
relying on a single user's good/bad rating, we consider the average rating given to the
product. With the average rating, we can be considerably more sure of the quality of the
product. Bagging makes use of this principle. Instead of depending on one model, it runs
the data through multiple models in parallel and averages their outputs to produce the
model's final output.
What is Bagging? How does it work?
      ●   Bagging is short for Bootstrapped Aggregation. Bootstrapping means random
          selection of records with replacement from the training dataset. 'Random
          selection with replacement' can be explained as follows:
      a. Consider that there are 8 samples in the training dataset. Out of these 8
         samples, every weak learner gets 5 samples as training data for the model.
         These 5 samples need not be unique; the same record may appear more than once.
      b. The model (weak learner) is allowed to get a sample multiple times. For
         example, Rec5 may be selected 2 times by the model. In that case, weak
         learner 1 gets Rec2, Rec5, Rec8, Rec5, Rec4 as training data.
      c. All 8 samples remain available for selection by the next weak learner, and
         any sample can again be selected multiple times by the next weak learners.
      ●   Bagging is a parallel method, which means several weak learners learn the
          data pattern independently and simultaneously (see the code sketch after the
          summary below).
      1. The output of each weak learner is averaged to generate the final output of the model.
      2. Since the weak learners' outputs are averaged, this mechanism helps to reduce the
         variance, or variability, of the predictions. However, it does not help to reduce the
         bias of the model.
      3. Since the final prediction is an average of the outputs of the weak learners, each
         weak learner has an equal say, or weight, in the final output.
To summarize:
      1. Bagging is Bootstrapped Aggregation
      2.   It is a parallel method
      3.   The final output is calculated by averaging the outputs produced by the
           individual weak learners
      4. Each weak learner has equal say
      5. Bagging reduces variance
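The bootstrapping-and-averaging procedure summarized above can be written out directly.
The following is a minimal sketch under stated assumptions (scikit-learn and NumPy
installed, decision trees as the weak learners, a synthetic binary dataset); it is illustrative
only and not the prescribed solution for B.1.

# Minimal bagging sketch: bootstrap sampling + independent weak learners + averaging.
# Assumes scikit-learn and NumPy are installed; the dataset and settings are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)  # labels 0/1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 15
learners = []

for _ in range(n_estimators):
    # Bootstrapping: random selection of records WITH replacement from the training set,
    # so the same record (e.g. "Rec5") can appear more than once for one weak learner.
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    learners.append(tree)

# Aggregation: every weak learner has an equal say; the averaged 0/1 outputs are
# thresholded at 0.5, which equals a majority vote for binary labels.
avg_pred = np.mean([t.predict(X_test) for t in learners], axis=0)
final_pred = (avg_pred >= 0.5).astype(int)
print("bagging accuracy:", np.mean(final_pred == y_test))

In practice the same idea is available off the shelf as sklearn.ensemble.BaggingClassifier
and, with decision trees as base learners, as RandomForestClassifier.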
Boosting
       We saw that in bagging every model is given equal preference, but if one model
predicts the data more correctly than the others, then a higher weightage should be given
to that model. Also, the method should attempt to reduce bias. These concepts are applied
in the second ensemble method that we are going to learn, that is, Boosting.
What is Boosting?
      1. To start with, boosting assigns equal weights to all data points as all points
         are equally important in the beginning. For example, if a training dataset has
         N samples, it assigns weight = 1/N to each sample.
      2. The weak learner classifies the data. The weak classifier classifies some
         samples correctly, while making mistakes in classifying others.
      3. After classification, the sample weights are changed: the weight of a correctly
         classified sample is reduced, and the weight of an incorrectly classified sample
         is increased. Then the next weak classifier is run.
      4. This process continues until the model as a whole gives strong predictions (a
         sketch of this loop is given after the note below).
Note: AdaBoost is an ensemble learning method that was originally designed for binary
classification.
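The weight-update loop in steps 1-4 can be sketched as a simplified AdaBoost. This is a
sketch under assumptions (scikit-learn and NumPy installed, decision stumps as weak
learners, labels re-encoded as -1/+1 for the update rule), not the prescribed B.1 solution.

# Simplified AdaBoost-style sketch of the weight-update loop (steps 1-4 above).
# Assumes scikit-learn and NumPy; labels are encoded as -1/+1 for the update rule.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, n_features=8, random_state=1)
y = np.where(y01 == 1, 1, -1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

n_rounds = 20
n = len(X_train)
w = np.full(n, 1.0 / n)                   # step 1: equal weight 1/N for every sample
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1, random_state=1)
    stump.fit(X_train, y_train, sample_weight=w)     # step 2: weak learner classifies
    pred = stump.predict(X_train)
    err = np.sum(w[pred != y_train]) / np.sum(w)     # weighted error of this round
    err = np.clip(err, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)            # better learners get a bigger say
    # Step 3: decrease weights of correctly classified samples, increase the others.
    w = w * np.exp(-alpha * y_train * pred)
    w = w / np.sum(w)
    stumps.append(stump)
    alphas.append(alpha)

# Step 4: the final prediction is the sign of the weighted vote of all weak learners.
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
print("boosting accuracy:", np.mean(np.sign(scores) == y_test))

scikit-learn offers a ready-made implementation of this algorithm as
sklearn.ensemble.AdaBoostClassifier.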
                                   PART B
                 ( PART B : TO BE COMPLETED BY STUDENTS)
Roll. No.: B43                               Name: Nikhil Aher
Class: Fourth Year (B)                       Batch: B3
Date of Experiment: 26/07/24                 Date of Submission: 02/08/24
Grade:
     B.1 Software Code written by student:
     B.2 Input and Output:
       B.3 Observations and learning:
                     Ensemble learning combines multiple models to improve predictive
accuracy and robustness. Key methods include Bagging, which builds multiple models
on different subsets of data and averages their predictions (e.g., Random Forests), and
Boosting, which sequentially trains models to correct errors made by previous ones
(e.g., Gradient Boosting). Stacking involves training a meta-model to combine predictions
from various base models. Ensembles often reduce overfitting and enhance performance
by leveraging model diversity, though they can be computationally intensive and complex
to interpret. Balancing these factors is crucial for effectively applying ensemble methods to
specific problems.
B.4 Conclusion:
                Ensemble methods enhance machine learning performance by combining
multiple models to capitalize on their individual strengths and mitigate weaknesses.
Techniques like bagging reduce variance by averaging or voting from multiple models
trained on different data subsets. Boosting improves accuracy by sequentially training
models to correct previous errors. Stacking leverages a meta-model to combine
predictions from various models for better accuracy. Overall, ensemble methods provide
robust, high-performing solutions by integrating diverse model outputs, although they
may introduce increased complexity and computational demands.
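Stacking, which the observations and conclusion above describe as training a meta-model
on the predictions of several base models, can be sketched as follows. The library, base
models and dataset are assumptions of this sketch (scikit-learn 0.22 or newer provides
StackingClassifier), not part of the original write-up.

# Minimal stacking sketch: a meta-model (logistic regression) combines the predictions
# of several base models. Assumes scikit-learn >= 0.22; models and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-model
    cv=5,  # base-model predictions for the meta-model come from cross-validation
)
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))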
                                         **********