
MACHINE LEARNING
22CS601PC

LECTURE NOTES
B.TECH III YEAR – I SEM-(R22)
(2024-25)

SVSV Prasad Sanaboina


Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


MACHINE LEARNING

UNIT-III
Learning with Trees – Decision Trees – Constructing Decision Trees – Classification and
Regression Trees – Ensemble Learning – Boosting – Bagging – Different ways to Combine
Classifiers – Basic Statistics – Gaussian Mixture Models – Nearest Neighbor Methods –
Unsupervised Learning – K means Algorithms

Learning with Trees


Learning with decision trees is a fundamental concept in machine learning. Decision trees
are versatile, interpretable, and form the basis for more advanced algorithms like Random
Forests and Gradient Boosting Machines. Below, we look at how learning works with decision trees,
their key concepts, and how they are used in machine learning.

1. What is Learning with Trees?


Learning with trees involves constructing a decision tree model from training data. The goal
is to create a model that can make predictions by learning simple decision rules inferred from
the data features.

2. Key Concepts in Learning with Trees


a) Supervised Learning
Decision trees are used in supervised learning, where the model is trained on labeled
data (input features and corresponding output labels).
For classification, the output is a class label (e.g., "Yes" or "No").
For regression, the output is a continuous value (e.g., house price).
b) Tree Structure
The tree consists of nodes (decisions based on features), branches (outcomes of
decisions), and leaf nodes (final predictions).
The tree is built by recursively splitting the data into subsets based on feature values.
c) Splitting Criteria
The algorithm selects the best feature to split the data at each node. Common criteria
include:
Gini Impurity (for classification): Measures the likelihood of
misclassification.
Entropy (for classification): Measures the disorder in the data.
Information Gain: The reduction in entropy after a split.
Variance Reduction (for regression): Minimizes the variance of the target
variable.
d) Stopping Criteria
The tree stops growing when one of the following conditions is met:
Maximum depth is reached.
Minimum number of samples in a leaf node.
No further improvement in splitting criteria.
e) Pruning
To avoid overfitting, the tree can be pruned by removing unnecessary branches after
construction.
3. Advantages of Learning with Trees
Interpretability: Trees are easy to visualize and understand.


Handles Non-Linearity: Can model complex, non-linear relationships.


Requires Little Preprocessing: No need for feature scaling or normalization.
Handles Mixed Data Types: Works with both numerical and categorical data.

4. Disadvantages of Learning with Trees


Overfitting: Trees can become too complex and fit the training data too closely.
Instability: Small changes in data can lead to completely different trees.
Bias: Trees can be biased if some classes dominate.
Poor Generalization: May not perform well on unseen data without proper tuning.

Decision Trees
A decision tree is a supervised machine learning algorithm used for both classification and
regression tasks. It is a tree-like model of decisions and their possible consequences, where
each internal node represents a decision based on a feature, each branch represents the
outcome of that decision, and each leaf node represents a final output (class label or
continuous value).

Key Components of a Decision Tree:


1. Root Node: The topmost node that represents the entire dataset.
2. Internal Nodes: Nodes that split the data based on a feature.
3. Branches: Outcomes of a decision (e.g., "Yes" or "No").
4. Leaf Nodes: Terminal nodes that represent the final decision or prediction.
5. Splitting: The process of dividing a node into sub-nodes based on a feature.
6. Pruning: Removing unnecessary branches to avoid overfitting.

How a Decision Tree Works:


1. Start at the Root Node: The algorithm selects the best feature to split the data.
2. Split the Data: The dataset is divided into subsets based on the feature's value.
3. Repeat: The process is repeated recursively for each subset until a stopping criterion
is met (e.g., maximum depth, minimum samples per leaf).
4. Assign a Label: Once a leaf node is reached, the majority class (for classification) or
average value (for regression) is assigned.


Criteria:
The choice of feature to split on is determined by specific criteria:
1. For Classification:
Gini Impurity: Measures the likelihood of an incorrect classification if a
random label is chosen.
Entropy: Measures the disorder or uncertainty in the data.
Information Gain: The reduction in entropy after a dataset is split.
2. For Regression:
Variance Reduction: Splits are chosen to minimize the variance of the target
variable.
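As a small illustration of these criteria, here is a short sketch in Python (not part of the original notes; it assumes class labels are given as 1-D NumPy arrays) that computes Gini impurity, entropy, and the information gain of a candidate split:

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2, the chance of misclassifying a random sample.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum_k p_k * log2(p_k), the disorder in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy after splitting the parent node into two children.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 1, 1])
print(gini(parent))                                                   # 0.5
print(information_gain(parent, np.array([0, 0]), np.array([1, 1])))   # 1.0

In this example the split produces two pure children, so the information gain equals the parent's entropy of 1 bit.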
Advantages of Decision Trees:
1. Easy to understand and interpret (visualizable).
2. Can handle both numerical and categorical data.
3. Requires little data preprocessing (e.g., no need for scaling).
4. Can model non-linear relationships.

Disadvantages of Decision Trees:


1. Prone to overfitting, especially with deep trees.
2. Sensitive to small changes in the data.
3. Can create biased trees if some classes dominate.
4. May not perform well with imbalanced datasets.

Popular Algorithms for Decision Trees:


1. ID3 (Iterative Dichotomiser 3): Uses information gain.
2. C4.5: An extension of ID3 that handles continuous features and missing values.
3. CART (Classification and Regression Trees): Uses Gini impurity for classification
and variance reduction for regression.
4. Random Forest: An ensemble method that builds multiple decision trees and
combines their outputs.

Applications:
 Customer segmentation
 Fraud detection
 Medical diagnosis
 Predictive modeling
 Recommendation systems

Constructing Decision Trees


Constructing decision trees is a fundamental task in machine learning, used for both
classification and regression problems. Decision trees are intuitive models that split data into
subsets based on feature values, making them easy to interpret. Below is a step-by-step guide
to constructing decision trees:
1. Understand the Basics
A decision tree consists of:
 Nodes: Represent decisions based on features.
 Edges: Represent outcomes of decisions.
 Leaves: Represent final predictions (class labels or regression values).


2. Choose a Splitting Criterion


The tree is built by recursively splitting the data based on features. The choice of feature and
split point is determined by a criterion:
 For Classification:
o Gini Impurity: Measures the likelihood of misclassification.
o Entropy: Measures the disorder or uncertainty in the data.
o Information Gain: Measures the reduction in entropy after a split.
 For Regression:
o Variance Reduction: Splits are chosen to minimize the variance of the target
variable within each subset.
3. Recursive Splitting
1. Start at the Root Node:
o Evaluate all features and possible split points using the chosen criterion.
o Select the feature and split point that maximize information gain (or minimize
impurity).
2. Create Child Nodes:
o Split the dataset into subsets based on the chosen feature and split point.
o Repeat the process for each subset until a stopping condition is met.

4. Stopping Conditions
To prevent overfitting, stop splitting when:
 All samples in a node belong to the same class (for classification).
 The node reaches a minimum number of samples.
 The tree reaches a maximum depth.
 Further splits do not improve the model significantly.

5. Pruning (Optional)
Pruning removes unnecessary branches to simplify the tree and improve generalization:
 Pre-pruning: Stop splitting early based on conditions.
 Post-pruning: Build the full tree first, then remove branches that contribute little to
accuracy.

6. Make Predictions
To predict for a new instance:
 Traverse the tree from the root node to a leaf node based on the feature values of the
instance.
 Use the majority class (classification) or average value (regression) at the leaf node as
the prediction.

7. Implement the Algorithm


You can implement decision trees from scratch or use libraries like:
 Scikit-learn (Python): DecisionTreeClassifier and DecisionTreeRegressor.
 R: rpart package.
 Other Tools: XGBoost, LightGBM, or TensorFlow Decision Forests.
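A minimal scikit-learn sketch follows; the iris dataset and the hyperparameter values (criterion, max_depth, min_samples_leaf) are illustrative assumptions rather than settings from the notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth and min_samples_leaf act as the stopping conditions described above.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3,
                             min_samples_leaf=5, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print(export_text(clf))   # text view of the learned splits

Swapping in DecisionTreeRegressor (with a criterion such as squared error) gives the regression counterpart.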

Classification and Regression


Classification and regression are two primary tasks in supervised machine learning,
where the key difference lies in the nature of the output: classification deals with discrete
outcomes (e.g., yes/no, categories), while regression handles continuous values (e.g., price,
temperature).


Both approaches require labeled data for training but differ in their objectives—
classification aims to find decision boundaries that separate classes, whereas regression
focuses on finding the best-fitting line to predict numerical outcomes. Understanding these
distinctions helps in selecting the right approach for specific machine learning tasks.

For example, a classification model can determine whether an email is spam or not, classify images as
“cat” or “dog,” or predict weather conditions like “sunny,” “rainy,” or “cloudy” by learning a decision
boundary, while regression models are used to predict house prices based on features like size and
location, or to forecast stock prices over time with a best-fit line.

Decision Boundary vs Best-Fit Line


When teaching the difference between classification and regression in machine learning, a
key concept to focus on is the decision boundary (used in classification) versus the best-fit
line (used in regression). These are fundamental tools that help models make predictions,
but they serve distinctly different purposes.

1. Decision Boundary in Classification


A decision boundary is a surface or line that separates data points into different classes in a feature
space. It can be linear (a straight line) or non-linear (a curve), depending on the complexity
of the data and the algorithm used. For example:
 A linear decision boundary might separate two classes in a 2D space with a straight line
(e.g., logistic regression).
 A more complex model may create non-linear boundaries to better fit intricate datasets.

During training, the classifier learns to partition the feature space by finding a boundary
that minimizes classification errors.
 For binary classification, this boundary separates data points into two groups (e.g., spam
vs. non-spam emails).
 In multi-class classification, multiple boundaries are created to separate more than two
classes.
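As a small illustration of a linear decision boundary, the following sketch (synthetic 2-D data and all parameter choices are assumptions for illustration) fits a logistic regression model and reads off the line w1*x1 + w2*x2 + b = 0 that separates the two classes:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two informative features so the boundary can be drawn in a 2-D feature space.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
print(f"decision boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")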


The decision boundary is not inherent to the training data but rather depends on the classifier
used; we will look at classifiers in more detail in the next chapter.
2. Best-Fit Line in Regression
In regression, a best-fit line (or regression line) represents the relationship between
independent variables (inputs) and a dependent variable (output). It is used to predict
continuous numerical values by capturing trends and relationships within the data. The best-fit line
can be linear or non-linear:
 A straight line is used for linear regression.
 Curves are used for more complex regressions, like polynomial regression

A regression fit of this kind, whether linear or polynomial, predicts continuous target values
from the input feature, in contrast to classification, which creates decision boundaries to
separate discrete classes.
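The contrast can be sketched with scikit-learn; the synthetic quadratic data below is an assumption for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 100)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + rng.normal(scale=0.5, size=100)

linear = LinearRegression().fit(X, y)                                             # straight best-fit line
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)  # curved fit

print("linear R^2:    ", linear.score(X, y))
print("polynomial R^2:", poly.score(X, y))

Because the underlying trend here is quadratic, the polynomial fit reaches a noticeably higher R^2 than the straight line.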
Classification Algorithms
There are different types of classification algorithms that have been developed over time to
give the best results for classification tasks. Don’t worry if they seem overwhelming at
first—we’ll dive deeper into each algorithm, one by one, in the upcoming chapters.
 Logistic Regression
 Decision Tree
 Random Forest
 K-Nearest Neighbors
 Support Vector Machine
 Naive Bayes
Regression Algorithms
There are different types of regression algorithms that have been developed over time to
give the best results for regression tasks.
 Lasso Regression
 Ridge Regression
 XGBoost Regressor
 LGBM Regressor


Comparison between Classification and Regression


 Output type: Classification predicts discrete categories (the target variable is discrete, e.g.,
“spam” or “not spam”); regression predicts a continuous numerical value (e.g., price, temperature).
 Goal: Classification predicts which category a data point belongs to; regression predicts an exact
numerical value based on input data.
 Example problems: Classification covers email spam detection, image recognition, and customer
sentiment analysis; regression covers house price prediction, stock market forecasting, and sales
prediction.
 Evaluation metrics: Classification uses metrics like Precision, Recall, and F1-Score; regression
uses Mean Squared Error, R2-Score, MAPE, and RMSE.
 Decision boundary: Classification has clearly defined boundaries between different classes;
regression has no distinct boundaries and focuses on finding the best-fit line.
 Common algorithms: Classification uses Logistic Regression, Decision Trees, Support Vector
Machines (SVM); regression uses Linear Regression, Polynomial Regression, Decision Trees (with a
regression objective).

Classification vs Regression : Conclusion


Classification trees are employed when there’s a need to categorize the dataset into distinct
classes associated with the response variable. Often, these classes are binary, such as “Yes”
or “No,” and they are mutually exclusive. While there are instances where there may be
more than two classes, a modified version of the classification tree algorithm is used in those
scenarios.
On the other hand, regression trees are utilized when dealing with continuous response
variables. For instance, if the response variable represents continuous values like the price of
an object or the temperature for the day, a regression tree is the appropriate choice.
There are situations where a blend of regression and classification approaches is necessary.
For instance, ordinal regression comes into play when dealing with ranked or ordered
categories, while multi-label classification is suitable for cases where data points can be
associated with multiple classes at the same time.

Ensemble Learning
Ensemble learning combines the predictions of multiple models (called "weak learners" or
"base models") to make a stronger, more reliable prediction. The goal is to reduce errors and
improve performance.
Types of Ensemble Learning in Machine Learning
There are two main types of ensemble methods:
1. Bagging (Bootstrap Aggregating): Models are trained independently on different subsets
of the data, and their results are averaged or voted on.
2. Boosting: Models are trained sequentially, with each one learning from the mistakes of
the previous model.


1. Bagging Algorithm
Bagging classifier can be used for both regression and classification tasks. Here is an
overview of Bagging classifier algorithm:
 Bootstrap Sampling: 'N' subsets of the original training data are created by randomly sampling
rows with replacement, so a given row may appear more than once in a subset or not at all. This step
ensures that the base models are trained on diverse subsets of the data.
 Base Model Training: For each bootstrapped sample we train a base model
independently on that subset of data. These weak models are trained in parallel to increase
computational efficiency and reduce time consumption. We can use different base
learners, i.e. different ML models, to bring variety and robustness.
 Prediction Aggregation: To make a prediction on testing data combine the predictions of
all base models. For classification tasks it can include majority voting or weighted
majority while for regression it involves averaging the predictions.
 Out-of-Bag (OOB) Evaluation: Some samples are excluded from the training subset of
particular base models during the bootstrapping method. These “out-of-bag” samples can
be used to estimate the model’s performance without the need for cross-validation.
 Final Prediction: After aggregating the predictions from all the base models, Bagging
produces a final prediction for each instance.
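A minimal scikit-learn sketch of this procedure follows; the synthetic dataset and the choice of 50 decision trees are illustrative assumptions. Setting oob_score=True reports the out-of-bag estimate mentioned above:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 50 trees, each trained on its own bootstrap sample of the data.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        oob_score=True, random_state=0)
bag.fit(X, y)

print("out-of-bag accuracy:", bag.oob_score_)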

2. Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners to create a strong
learner. Weak models are trained in series such that each subsequent model tries to correct the errors
of the previous model, until the entire training dataset is predicted correctly or a preset limit on the
number of models is reached. One of the most
well-known boosting algorithms is AdaBoost (Adaptive Boosting). Here is an overview of
Boosting algorithm:
 Initialize Model Weights: Begin with a single weak learner and assign equal weights to
all training examples.
 Train Weak Learner: Train a weak learner on this weighted dataset.
 Sequential Learning: Boosting works by training models sequentially where each model
focuses on correcting the errors of its predecessor. Boosting typically uses a single type of
weak learner like decision trees.
 Weight Adjustment: Boosting assigns weights to training datapoints. Misclassified
examples receive higher weights in the next iteration so that next models pay more
attention to them.
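A minimal scikit-learn sketch of this procedure follows; the synthetic dataset, the number of estimators, and the learning rate are illustrative assumptions (by default the weak learner is a decision stump):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each stage reweights the training points so the next weak learner focuses on past mistakes.
boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)

print("test accuracy:", boost.score(X_test, y_test))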

Benefits of Ensemble Learning in Machine Learning


Ensemble learning is a versatile approach that can be applied to machine learning models for:
 Reduction in Overfitting: By aggregating the predictions of multiple models, ensembles can
reduce the overfitting that individual complex models might exhibit.
 Improved Generalization: Ensembles generalize better to unseen data by reducing variance and
bias.
 Increased Accuracy: Combining multiple models gives higher predictive accuracy.
 Robustness to Noise: Ensembles mitigate the effect of noisy or incorrect data points by averaging
out predictions from diverse models.
 Flexibility: Ensembles can work with diverse models, including decision trees, neural networks,
and support vector machines, making them highly adaptable.
 Bias-Variance Tradeoff: Techniques like bagging reduce variance, while boosting
reduces bias, leading to better overall performance.


Boosting in Machine Learning


Boosting is an ensemble learning technique that improves the accuracy of weak classifiers
by combining them into a strong classifier. It works by training models sequentially, where
each model focuses on correcting the errors of its predecessor.

• Weak Learners: Models that perform slightly better than random guessing (e.g.,
shallow decision trees).
• Sequential Training: Each model is trained to reduce the mistakes of the previous
one.
• Weighted Samples: More weight is given to misclassified instances to improve
performance.
• Final Prediction: The models' outputs are combined (e.g., weighted majority voting)
to make the final decision.

Advantages of Boosting
✅ Improves accuracy significantly
✅Reduces bias and variance
✅Works well with structured/tabular data
✅Handles missing values efficiently

Disadvantages of Boosting
❌Can overfit if not tuned properly
❌Computationally expensive for large datasets
❌Sensitive to noisy data


Bagging (Bootstrap Aggregating) in Machine Learning


• Bagging (Bootstrap Aggregating) is an ensemble learning technique that improves
the accuracy and stability of machine learning models by training multiple models on
different subsets of the data and averaging their predictions.
• How Bagging Works
• Bootstrap Sampling: Randomly selects multiple subsets of data (with replacement)
from the original dataset.
• Train Weak Learners: Multiple independent models (usually the same type, e.g.,
decision trees) are trained on these subsets.
• Aggregate Predictions:
– For classification: Majority voting is used.
– For regression: Predictions are averaged

Advantages of Bagging
✅Reduces overfitting by averaging multiple models
✅Decreases variance while maintaining low bias
✅Works well with high-dimensional and noisy data
✅ Improves model stability
Disadvantages of Bagging
❌Computationally expensive due to multiple model training
❌Doesn't significantly reduce bias (especially for weak models)


Different ways to Combine Classifiers


There are multiple ways to combine classifiers, including bagging, boosting, stacking, and
classifier ensembles.

Ensemble Classifiers


Ensemble learning helps improve machine learning results by combining several models.
This approach allows the production of better predictive performance compared to a single
model. The basic idea is to learn a set of classifiers (experts) and to allow them to vote.

Advantage : Improvement in predictive accuracy.


Disadvantage : It is difficult to understand an ensemble of classifiers.

Why do ensembles work?


Dietterich (2002) showed that ensembles overcome three problems –
Statistical Problem –
The Statistical Problem arises when the hypothesis space is too large for the amount of
available data. Hence, there are many hypotheses with the same accuracy on the data and the
learning algorithm chooses only one of them! There is a risk that the accuracy of the chosen
hypothesis is low on unseen data!
Computational Problem –
The Computational Problem arises when the learning algorithm cannot guarantee finding the
best hypothesis.
Representational Problem –
The Representational Problem arises when the hypothesis space does not contain any good
approximation of the target class(es).

Main Challenge for Developing Ensemble Models?

The main challenge is not to obtain highly accurate base models, but rather to obtain base
models which make different kinds of errors. For example, if ensembles are used for
classification, high accuracies can be accomplished if different base models misclassify
different training examples, even if the base classifier accuracy is low.


Methods for Independently Constructing Ensembles –


 Majority Vote
 Bagging and Random Forest
 Randomness Injection
 Feature-Selection Ensembles
 Error-Correcting Output Coding

Methods for Coordinated Construction of Ensembles –


 Boosting
 Stacking
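Stacking can be sketched in scikit-learn as follows; the choice of base models, the meta-learner, and the synthetic dataset are illustrative assumptions. The base classifiers' predictions become the input features of a final meta-learner:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression())   # meta-learner trained on the base models' predictions
stack.fit(X, y)

print("training accuracy:", stack.score(X, y))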

Types of Ensemble Classifier –


Bagging:
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree. Given a
set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement
from D (i.e., a bootstrap sample). A classifier model Mi is then learned from each training set Di.
Each classifier Mi returns its class prediction, and the bagged classifier M* counts the votes and
assigns the class with the most votes to X (an unknown sample).
Implementation steps of Bagging –
1. Multiple subsets are created from the original data set with equal tuples, selecting
observations with replacement.
2. A base model is created on each of these subsets.
3. Each model is learned in parallel from each training set and independent of each other.
4. The final predictions are determined by combining the predictions from all the models.

Boosting
Boosting in machine learning is a technique that combines multiple weak models into a
single, more accurate model. It's a type of ensemble learning that can improve the accuracy of
classification and regression algorithms.
In machine learning, a single model may not be sufficient to solve a complex problem, as it
can be too weak to solve it on its own. To enhance predictive accuracy we combine
multiple weak models to build a more powerful and robust model. This process of
combining multiple weak learners to form a strong learner is known as Boosting.


What is Boosting?
Boosting is an ensemble learning technique that sequentially combines multiple weak
classifiers to create a strong classifier. A model is first trained on the training data and evaluated;
the next model is then built on top of it and tries to correct the errors present in the first
model. This procedure continues and models are added until either the complete training
data set is predicted correctly or a predefined number of iterations is reached.
Think of it like a classroom where a teacher focuses more on the weaker students to improve their
academic performance; boosting works in a similar way.
Adaboost and its working
To understand boosting and how it works we will use the AdaBoost technique. It is one of the
simplest and most widely used boosting techniques, which makes it a good example here.
AdaBoost assigns higher weights to misclassified data points in each iteration, ensuring that
subsequent models focus on these points. The influence of each weak
learner is determined by its classification error.
AdaBoost (Adaptive Boosting) is an ensemble learning algorithm that improves
classification accuracy by combining multiple decision trees. It assigns equal weights to all
training samples initially and iteratively adjusts these weights by focusing more on
misclassified datapoints for next model. It effectively reduces bias and variance making it
useful for classification tasks but it can be sensitive to noisy data and outliers.

A diagram in the original notes illustrates the AdaBoost algorithm on a small example. Let's try to
understand it in a stepwise process:
Step 1: Initial Model (B1)
The dataset consists of multiple data points (red, blue and green circles).
Equal weight is assigned to each data point.
The first weak classifier attempts to create a decision boundary.
8 data points are wrongly classified.
Step 2: Adjusting Weights (B2)
The misclassified points from B1 are assigned higher weights (shown as darker points in
the next step).
A new classifier is trained with a refined decision boundary focusing more on the
previously misclassified points.
Some previously misclassified points are now correctly classified.
6 data points are wrongly classified.


Step 3: Further Adjustment (B3)


The newly misclassified points from B2 receive higher weights to ensure better
classification.
The classifier adjusts again using an improved decision boundary and 4 data points remain
misclassified.
Step 4: Final Strong Model (B4 – Ensemble Model)
The final ensemble classifier combines B1, B2 and B3, drawing on the strengths of all the weak
classifiers.
By aggregating multiple models the ensemble model achieves higher accuracy than any
individual weak model.
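The reweighting in Steps 2 and 3 can be made concrete with a small numeric sketch. It assumes the standard AdaBoost formulas, alpha = 0.5 * ln((1 - err) / err), with misclassified points multiplied by exp(+alpha) and correctly classified points by exp(-alpha); the counts used below are illustrative, not the ones from the figure:

import numpy as np

n = 10
weights = np.full(n, 1.0 / n)            # start with equal weights (Step 1)
misclassified = np.zeros(n, dtype=bool)
misclassified[:3] = True                 # suppose the weak learner gets 3 of 10 points wrong

err = weights[misclassified].sum()       # weighted error of this weak learner
alpha = 0.5 * np.log((1 - err) / err)    # its influence in the final vote

weights *= np.exp(np.where(misclassified, alpha, -alpha))
weights /= weights.sum()                 # renormalise so the weights sum to 1

print("alpha:", round(alpha, 3))
print("weights:", weights.round(3))      # misclassified points now carry more weight

After normalisation the previously misclassified points carry noticeably more weight, which is exactly what makes the next weak learner focus on them.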

Types Of Boosting Algorithms


There are several types of boosting algorithms; some of the most well-known and useful are listed
below (a short sketch of the residual-fitting idea follows the list):
1. Gradient Boosting – Gradient Boosting constructs models in a sequential manner where
each weak learner minimizes the residual error of the previous one using gradient descent.
Instead of adjusting sample weights like AdaBoost Gradient Boosting reduces error
directly by optimizing a loss function.
2. XGBoost – XGBoost is an optimized implementation of Gradient Boosting that
uses regularization to prevent overfitting. It is faster and more efficient than standard
Gradient Boosting and supports handling both numerical and categorical variables.
3. CatBoost – CatBoost is particularly effective for datasets with categorical features. It
employs symmetric decision trees and a unique encoding method that considers target
values, making it superior in handling categorical data without preprocessing.
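To make the residual-fitting idea behind Gradient Boosting concrete, here is a short from-scratch sketch; the squared-error loss, the synthetic sine data, the shallow trees, and the learning rate of 0.1 are all illustrative assumptions. With squared error, the negative gradient is simply the residual:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=200)

learning_rate, trees = 0.1, []
prediction = np.full_like(y, y.mean())     # start from a constant model (the mean)

for _ in range(100):
    residuals = y - prediction             # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)                     # keep the tree for predicting on new data

print("training MSE:", np.mean((y - prediction) ** 2))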

Gaussian Mixture Model


Clustering is a key technique in unsupervised learning, used to group similar data points
together. While traditional methods like K-Means and Hierarchical Clustering are widely
used, they assume that clusters are well-separated and have rigid shapes. This can be
limiting in real-world scenarios where clusters can be more complex.
To overcome these limitations, Gaussian Mixture Models (GMM) offer a more flexible
approach. Unlike K-Means, which assigns each point to a single cluster, GMM uses a
probabilistic approach to cluster the data, allowing clusters to have more varied shapes and
soft boundaries. Let's dive into what GMM is and how it works.

• K-Means assumes each cluster is nicely separated from the others.

• Every single point is assigned one unique cluster label, which is known as hard clustering.


The idea behind GMM can be illustrated with data points in a one-dimensional space (real, numerical
values). Assume the data comes from some Gaussian distributions; here we take two. (Figures 1-4 in
the original notes illustrate the steps below.)

1. Randomly initialize two Gaussian distributions, each with its own mean and standard deviation.
2. Calculate the responsibility (the probability of belongingness) of each Gaussian for every data
point. This can be shown with colour coding: in the region between the two Gaussians a point has a
noticeable probability under both components, so we find the probability of the blue and the orange
Gaussian for each element.
3. Focusing on one particular Gaussian at a time (e.g., considering only the points weighted towards
the blue component), once again compute that component's normal distribution, i.e. its mean and
standard deviation. The mean gives the centre of the distribution, and the standard deviation gives
the shape (spread) of the normal distribution.
4. Points in the middle, which carry meaningful probability under more than one Gaussian, illustrate
soft clustering: for a final assignment we can calculate which component has the majority of
influence over the data point, while the responsibilities themselves express partial membership.

Steps 2 and 3 (computing responsibilities, then re-estimating each Gaussian's parameters) are repeated
until the mixture stops changing; this iterative procedure is how a GMM is fitted.
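A minimal scikit-learn sketch of this idea follows; the two-component one-dimensional synthetic data and its parameters are illustrative assumptions. GaussianMixture fits the means and standard deviations, and predict_proba returns the responsibilities used for soft clustering:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.8, 150),
                    rng.normal(3, 1.2, 150)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print("means:", gmm.means_.ravel())
print("std devs:", np.sqrt(gmm.covariances_).ravel())
print("responsibilities of the first 3 points:")
print(gmm.predict_proba(X[:3]).round(3))   # soft membership in each Gaussian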

K-means
K-means groups similar data points into clusters by minimizing the distance between the data points
in a cluster and their centroid (the cluster's mean value). The primary goal of the k-means
algorithm is to minimize the total distance between points and their assigned cluster
centroid.

K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled


dataset into different clusters.

Here K defines the number of pre-defined clusters that need to be created in the process, as if
K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a
way that each data point belongs to only one group of points with similar properties.

It allows us to cluster the data into different groups and provides a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any labeled training data.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between each data point and its
corresponding cluster centroid.

The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats
the process until the cluster assignments no longer improve. The value of k should
be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

Determines the best value for K center points or centroids by an iterative process.


Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.

Hence each cluster has datapoints with some commonalities, and it is away from other
clusters.


How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They can be points other than those in the input dataset.)

Step-3: Assign each data point to their closest centroid, which will form the predefined K
clusters.

Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third step, which means reassign each data point to the new closest centroid of its
cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.
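A minimal scikit-learn sketch of these steps follows; the synthetic blob data and the choice of K=3 are illustrative assumptions:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)             # steps 2-6: assign points, update centroids, repeat

print("first 10 labels:", labels[:10])
print("centroids:")
print(kmeans.cluster_centers_)
print("total within-cluster distance (inertia):", kmeans.inertia_)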
