MODULE – III
UNSUPERVISED LEARNING
Clustering: Clustering, or cluster analysis, is a machine learning technique that groups an
unlabeled dataset. It can be defined as "a way of grouping the data points into different clusters
consisting of similar data points; the objects with possible similarities remain in a group that has
few or no similarities with any other group."
It does this by finding similar patterns in the unlabeled dataset, such as shape, size, color, or
behavior, and divides the data points according to the presence or absence of those patterns.
It is an unsupervised learning method, so no supervision is provided to the algorithm, and it
works with unlabeled data.
After applying a clustering technique, each cluster or group is assigned a cluster-ID. An ML
system can use this ID to simplify the processing of large and complex datasets.
K-Means Clustering:
K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in
machine learning and data science. It groups the unlabeled dataset into different clusters. Here K
defines the number of pre-defined clusters that need to be created in the process: if K=2, there
will be two clusters; for K=3, there will be three clusters; and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way
that each data point belongs to only one group, and the points within a group have similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim
of the algorithm is to minimize the sum of distances between the data points and their
corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the
process until the clusters stop changing, i.e., until the best clusters are found. The value of K
must be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center; the data points that are near a
particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is kept apart from the
other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
K-Means Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined
K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest
centroid of its cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
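Assuming the dataset is given as a NumPy array X and the number of clusters K is chosen in
advance, the steps above can be sketched in Python roughly as follows. This is a minimal
illustration, not an optimized implementation (empty clusters are not handled):

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    # Step-2: select K random points from the dataset as the initial centroids
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step-3/5: assign each data point to its closest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step-4: place the new centroid of each cluster at the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-6: if the centroids no longer move, no reassignment occurs, so stop
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids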
Let's understand the above steps by considering the visual plots.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:
o Let's take the number of clusters as K=2, to identify the dataset and put the points
into different clusters. It means that here we will try to group these data points into
two different clusters.
o We need to choose K random points or centroids to form the clusters. These
points can either be points from the dataset or any other points. Here we are
selecting the two points shown below as K-points, which are not part of our dataset.
Consider the below image:
Now we will assign each data point of the scatter plot to its closest K-point or centroid. We
compute this by calculating the distance between each data point and each centroid;
geometrically, this amounts to drawing a median line (perpendicular bisector) between the two
centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are nearer to the K1
(blue) centroid, and the points on the right of the line are closer to the yellow centroid. Let's color
them blue and yellow for clear visualization.
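The distance used in this assignment step is typically the ordinary Euclidean distance: for a data
point (x1, y1) and a centroid (x2, y2),
d = sqrt((x2 - x1)^2 + (y2 - y1)^2),
and each data point is assigned to whichever centroid gives the smaller value of d.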
As we need to find the closest cluster, we repeat the process by choosing new centroids. To
choose the new centroids, we compute the center of gravity (mean) of the data points currently
assigned to each cluster, and these become the new centroids shown below:
Next, we reassign each data point to its new closest centroid. For this, we repeat the same process
of finding a median line between the centroids. The new median line is shown in the image below:
From the above image, we can see that one yellow point is on the left side of the line, and two
blue points are on the right of the line, so these three points will be assigned to new centroids.
As reassignment has taken place, we again go to step-4, which is finding the new centroids or
K-points.
o We repeat the process by finding the center of gravity (mean) of the points in each
cluster, so the new centroids will be as shown in the below image:
As we got the new centroids, we again draw the median line and reassign the data points. The
result is shown in the image below:
We can see in the above image that there are no dissimilar data points on either side of the line,
which means no further reassignment is needed and our model is formed. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final clusters
will be as shown in the below image:
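In practice, K-means is usually run through a library rather than by hand. Below is a minimal
sketch using scikit-learn's KMeans class on two variables M1 and M2; the sample values are
made up purely for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Made-up sample of the two variables M1 and M2 (one row per data point)
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster-ID assigned to each data point
print(kmeans.cluster_centers_)  # final centroids of the two clusters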
Dimensionality Reduction:
The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
In many cases a dataset contains a huge number of input features, which makes the predictive
modeling task more complicated. Because it is very difficult to visualize or make predictions for
a training dataset with a high number of features, dimensionality reduction techniques are
required in such cases.
A dimensionality reduction technique can be defined as "a way of converting a higher-dimensional
dataset into a lower-dimensional dataset while ensuring that it provides similar
information." These techniques are widely used in machine learning for obtaining a better-fitting
predictive model while solving classification and regression problems.
It is commonly used in fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data visualization,
noise reduction, and similar tasks.
PCA and KERNEL PCA:
PCA (PRINCIPAL COMPONENT ANALYSIS):
Principal Component Analysis is a tool used to reduce the dimension of the data. It allows us to
reduce the dimension of the data without much loss of information.
PCA reduces the dimension by finding a few orthogonal linear combinations (principal
components) of the original variables with the largest variance. The first principal component
captures most of the variance in the data. The second principal component is orthogonal to the
first and captures the variance left over by the first principal component, and so on. There are as
many principal components as there are original variables. These principal components are
uncorrelated and are ordered in such a way that the first several principal components explain
most of the variance of the original data.
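Below is a minimal sketch of PCA with scikit-learn; the feature matrix X here is randomly
generated for illustration, and the explained_variance_ratio_ attribute shows how much of the
total variance each principal component captures:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: 100 samples with 5 correlated variables
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=2)                      # keep the first two principal components
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                 # (100, 2): reduced from 5 to 2 dimensions
print(pca.explained_variance_ratio_)   # fraction of variance captured by each component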
KERNEL PCA: PCA is a linear method; that is, it is effective only for datasets that are linearly
separable. It does an excellent job for such datasets, but if we apply it to non-linear datasets, the
result may not be the optimal dimensionality reduction. Kernel PCA uses a kernel function to
project the dataset into a higher-dimensional feature space, where it becomes linearly separable.
This is similar to the kernel idea used in Support Vector Machines. There are various kernel
functions, such as linear, polynomial, and Gaussian (RBF).
In the kernel space, the two classes become linearly separable. Kernel PCA can be applied to a
non-linear dataset using scikit-learn, as sketched below.
Kernel Principal Component Analysis (Kernel PCA) is a technique for dimensionality reduction in
machine learning that uses the concept of kernel functions to transform the data into a high-
dimensional feature space. In traditional PCA, the data is transformed into a lower-dimensional
space by finding the principal components of the covariance matrix of the data. In kernel PCA,
the data is transformed into a high-dimensional feature space using a non-linear mapping
function, called a kernel function, and then the principal components are found in this high-
dimensional space.
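Below is a minimal sketch of kernel PCA with scikit-learn on a non-linear dataset (two concentric
circles generated with make_circles); the RBF (Gaussian) kernel and the gamma value used here
are illustrative choices, not the only possible ones:

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Non-linear dataset: two concentric circles, not linearly separable in 2-D
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Ordinary (linear) PCA cannot untangle the two circles
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with a Gaussian (RBF) kernel; gamma controls the kernel width
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

# In the kernel PCA projection the two classes become (approximately) linearly separable
print(X_kpca[:5])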
Advantages of Kernel PCA:
1. Non-linearity: Kernel PCA can capture non-linear patterns in the data that are not possible
with traditional linear PCA.
2. Robustness: Kernel PCA can be more robust to outliers and noise in the data, as it considers
the global structure of the data, rather than just local distances between data points.
3. Versatility: Different types of kernel functions can be used in kernel PCA to suit different
types of data and different objectives.
Disadvantages of Kernel PCA:
1. Complexity: Kernel PCA can be computationally expensive, especially for large datasets,
as it requires computing the eigenvectors and eigenvalues of the kernel matrix, whose size
grows with the number of samples.
2. Model selection: Choosing the right kernel function and the right number of components
can be challenging and may require expert knowledge or trial and error.