Unit 4
Clustering is like sorting a bunch of similar items into different groups based on their characteristics. In
data mining and machine learning, it’s a powerful technique used to group similar data points together,
making it easier to find patterns or understand large datasets.
You can cluster almost anything, and the more similar the items are in the cluster, the better your
clusters are.
Clustering is sometimes called unsupervised classification because it produces the same result as
classification but without having predefined classes.
This notion of similarity depends on a similarity measurement. Essentially, clustering helps identify
natural groupings in your data.
Types of Clustering
Clustering is a type of unsupervised learning wherein data points are grouped into different sets based
on their degree of similarity.
K-Means Clustering is an unsupervised learning algorithm that is used to solve the clustering
problems in machine learning or data science.
It is an unsupervised learning algorithm which groups an unlabeled dataset into different clusters.
Here K defines the number of pre-defined clusters that need to be created in the process: if K=2,
there will be two clusters, for K=3 there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way
that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim
of this algorithm is to minimize the sum of distances between each data point and the centroid of
its cluster.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters,
and repeats the process until it finds the best clusters. The value of k should be
predetermined in this algorithm.
Hence each cluster has data points with some commonalities and is well separated from the other clusters.
Objective of K-Means Clustering
The main objective of k-means clustering is to partition your data into a specific number (k) of groups,
where data points within each group are similar to each other and dissimilar to points in other groups. It
achieves this by minimizing the distance between data points and their assigned cluster's center, called
the centroid.
Here are the objectives:
Grouping similar data points: K-means aims to identify patterns in your data by grouping data
points that share similar characteristics together. This allows you to discover underlying structures
within the data.
Minimizing within-cluster distance: The algorithm strives to make sure data points within a cluster
are as close as possible to each other, as measured by a distance metric (usually Euclidean distance).
This ensures tight-knit clusters with high cohesiveness.
Maximizing between-cluster distance: Conversely, k-means also tries to maximize the separation
between clusters. Ideally, data points from different clusters should be far apart, making the clusters
distinct from each other.
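Formally, the quantity that k-means minimizes (the within-cluster sum of squares, sometimes called inertia) can be written as

J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

where C_j is the set of points assigned to cluster j and \mu_j is the centroid (mean) of that cluster.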
The k-means algorithm works like this.
1. Initialization: Start by randomly selecting K points from the dataset. These points will act as the
initial cluster centroids.
2. Assignment: For each data point in the dataset, calculate the distance between that point and each of
the K centroids. Assign the data point to the cluster whose centroid is closest to it. This step
effectively forms K clusters.
3. Update centroids: Once all data points have been assigned to clusters, recalculate the centroids of
the clusters by taking the mean of all data points assigned to each cluster.
4. Repeat: Repeat steps 2 and 3 until convergence. Convergence occurs when the centroids no longer
change significantly or when a specified number of iterations is reached.
5. Final Result: Once convergence is achieved, the algorithm outputs the final cluster centroids and the
assignment of each data point to a cluster.
K-Means Clustering
The algorithm will categorize the items into k groups or clusters of similarity. To calculate that similarity,
we will use the Euclidean distance as a measurement.
We categorize each item to its closest mean, and we update the mean’s coordinates, which are the
averages of the items categorized in that cluster so far.
We repeat the process for a given number of iterations and at the end, we have our clusters.
The “points” mentioned above are called means because they are the mean values of the items categorized
in them. To initialize these means, we have a lot of options. An intuitive method is to initialize the means
at random items in the data set. Another method is to initialize the means at random values between the
boundaries of the data set. For example, if for a feature x the items have values in [0,3], we will initialize the
means with values for x in the range [0,3].
Euclidean distance is the most commonly used distance measure.
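A minimal sketch of this procedure in Python, assuming NumPy is available and initializing the means at random items from the dataset; the function name k_means and the toy data are illustrative, not taken from the notes:

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups using plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k random items from the dataset as the initial means
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment: each point goes to the closest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update: each centroid becomes the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop when the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example: two obvious groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centroids, labels = k_means(X, k=2)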
Applications of the K-Means Clustering Algorithm
Image Segmentation
K-Means clustering can be used to segment an image into different regions based on the color or texture of
the pixels. This technique is widely used in computer vision applications, such as object recognition, image
retrieval, and medical imaging.
Customer Segmentation
K-Means clustering can be used to segment customers into different groups based on their purchasing
behavior or demographic characteristics. This technique is widely used in marketing applications, such as
customer retention, loyalty programs, and targeted advertising.
Anomaly Detection
K-Means clustering can be used to detect anomalies in a dataset by identifying data points that do not belong
to any cluster. This technique is widely used in fraud detection, network intrusion detection, and predictive
maintenance.
Looking at items commonly purchased together can give stores an idea of customers’ purchasing
behavior.
This knowledge, extracted from the sea of data, can be used for pricing, marketing promotions,
inventory management, and so on.
Looking for hidden relationships in large datasets is known as association analysis or association
rule learning.
The problem is, finding different combinations of items can be a time-consuming task and
prohibitively expensive in terms of computing power.
Brute-force solutions aren’t capable of solving this problem, so a more intelligent approach is
required to find frequent itemsets in a reasonable amount of time.
Association rule mining finds interesting associations and relationships among large sets of
data items.
This rule shows how frequently an itemset occurs in a transaction.
A typical example is Market Basket Analysis. Market Basket Analysis is one of the key
techniques used by large retailers to show associations between items.
It allows retailers to identify relationships between the items that people buy together
frequently.
Given a set of transactions, we can find rules that will predict the occurrence of an item based
on the occurrences of other items in the transaction.
Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps accordingly so that it can be
more profitable.
It tries to find some interesting relations or associations among the variables of the dataset. It is
based on different rules to discover the interesting relations between variables in the database.
Association rule learning is one of the important concepts of machine learning, and
it is employed in market basket analysis, web usage mining, continuous production, etc.
Here, market basket analysis is a technique used by various big retailers to discover the
associations between items.
We can understand it by taking an example of a supermarket, as in a supermarket, all products that are purchased
together are put together
Association analysis is the task of finding interesting relationships in large datasets.
These interesting relationships can take two forms: frequent item sets or association rules.
Frequent item sets are a collection of items that frequently occur together.
The second way to view interesting relationships is association rules. Association rules suggest
that a strong relationship exists between two items.
• Frequent items sets are lists of items that commonly appear together.
• From the dataset we can also find an association rule such as bread ➞ milk. This means that if
someone buys bread, there’s a good chance they’ll buy milk.
• With the frequent item sets and association rules, retailers have a much better understanding of
their customers.
• Although common examples of association analysis are from the retail industry, it can be applied
to a number of other industries, such as website traffic analysis and medicine.
If lift = 1, it would imply that the probability of occurrence of the antecedent and that of
the consequent are independent of each other.
When two events are independent of each other no rule can be drawn involving those
two events.
If lift > 1, that lets us know the degree to which two occurrences are dependent on one
another and makes those rules potentially useful for predicting the consequent in future
data sets.
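To make these measures concrete, the snippet below computes support, confidence, and lift for one hypothetical rule (bread -> milk) over a small made-up transaction list; both the data and the rule are assumptions for illustration only.

# Toy transactions (hypothetical) to illustrate support, confidence and lift
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

antecedent, consequent = {"bread"}, {"milk"}
sup_rule = support(antecedent | consequent)      # P(bread and milk)
confidence = sup_rule / support(antecedent)      # P(milk | bread)
lift = confidence / support(consequent)          # >1 indicates positive dependence

print(sup_rule, confidence, lift)                # 0.6, 0.75, 0.9375 for this toy data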
Types of Association Rule Learning: Association rule learning can be divided into three algorithms: Apriori, Eclat, and FP-Growth (the FP-Growth algorithm is covered later in this unit).
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases
that contain transactions. This algorithm uses a breadth-first search and a Hash Tree to count candidate
itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first search
technique to find frequent itemsets in a transaction database. It generally executes faster than the Apriori
algorithm.
In association analysis two things are important: frequent itemsets and association rules.
One way to find frequent itemsets is the Apriori algorithm.
The Apriori algorithm needs a minimum support level as an input and a data set.
The algorithm will generate a list of all candidate itemsets with one item.
The transaction data set will then be scanned to see which sets meet the minimum support level.
Sets that don’t meet the minimum support level will get tossed out. The remaining sets will then be
combined to make itemsets with two elements.
Again, the transaction dataset will be scanned and itemsets not meeting the minimum support level
will get tossed.
This procedure will be repeated until all sets are tossed out.
Procedure
The Apriori algorithm uses the Apriori principle to reduce the number of sets that are checked against the database.
The Apriori principle states that if an item is infrequent, then supersets containing that item will also be
infrequent.
The Apriori algorithm starts from single itemsets and creates larger sets by combining sets that meet the
minimum support measure.
Support is used to measure how often a set appears in the original data.
Once frequent itemsets have been found, you can use the frequent itemsets to generate association rules.
The significance of an association rule is measured by confidence.
Confidence tells you how many times this rule applies to the frequent itemsets.
Let's understand the concept of the Apriori algorithm with the help of an example: given a small set of
transactions, we will find the frequent itemsets and generate association rules from them.
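A minimal pure-Python sketch of the procedure on a small hypothetical bread/butter/milk transaction list; the data, the min_support and min_confidence values, and the helper names are illustrative assumptions, not taken from the notes.

from itertools import combinations

# Hypothetical transactions for illustration
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def apriori(transactions, min_support=0.6):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    support = lambda items: sum(items <= t for t in transactions) / n
    # Start with candidate itemsets of one item
    items = {item for t in transactions for item in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only candidates that meet the minimum support level
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        # Combine the surviving sets to build candidates with k+1 items
        keys = list(current)
        candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

def rules(frequent, min_confidence=0.7):
    """Generate rules antecedent -> consequent from the frequent itemsets."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out

freq = apriori(transactions, min_support=0.6)
for a, c, conf in rules(freq, min_confidence=0.7):
    print(a, "->", c, round(conf, 2))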
The Apriori Algorithm, as demonstrated in the bread-butter example, is widely used in modern
startups like Zomato, Swiggy, and other food delivery platforms. These companies use it to
perform market basket analysis, which helps them identify customer behavior patterns and
optimize recommendations.
Applications
• Association analysis can be performed on many different items. Some
common examples are items in a store and pages visited on a website.
• Association analysis has also been used to look at the voting history of
elected officials and judges.
Efficiently finding frequent itemsets with FP-growth Algorithm
The Frequent Pattern Growth (FP-Growth) algorithm improves upon the Apriori algorithm by
eliminating the need for multiple database scans and reducing computational overhead.
By using a Trie data structure and focusing on ordered-item sets, FP-Growth efficiently mines
frequent itemsets, making it a faster and more scalable solution for large datasets.
This approach ensures faster processing while maintaining accuracy, making FP-Growth a
powerful tool for data mining.
Why is FP-growth faster?
The FP-growth algorithm is an efficient way of finding frequent patterns in a dataset.
The FP-growth algorithm works with the Apriori principle but is much faster.
The Apriori algorithm generates candidate itemsets and then scans the dataset to see if they’re
frequent.
FP-growth is faster because it goes over the dataset only twice.
The dataset is stored in a structure called an FP-tree.
After the FP-tree is built, you can find frequent itemsets by finding conditional bases for an item
and building a conditional FP-tree.
This process is repeated, conditioning on more items, until the conditional FP-tree contains only one
item.
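The sketch below illustrates the two-pass idea by building an FP-tree over a small hypothetical transaction list; the FPNode class, the data, and the support threshold are illustrative assumptions, and the mining of conditional FP-trees is omitted for brevity.

from collections import defaultdict

class FPNode:
    """A node of the FP-tree: an item, a count, a parent link, and child links."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, min_support_count=2):
    # Pass 1: count the support of every single item
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support_count}

    # Pass 2: insert each transaction, items ordered by descending support
    root = FPNode(None)
    header = defaultdict(list)   # item -> list of nodes (used to trace conditional bases)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent), key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item in node.children:
                node.children[item].count += 1
            else:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header

# Hypothetical transactions
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
root, header = build_fp_tree(transactions)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})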
Applications
• The FP-growth algorithm can be used to find frequent words in a series of text documents.
• The microblogging site Twitter provides a number of APIs for developers to use their services.
• The Python module Python-Twitter allows easy access to Twitter.
• Applying the FP-growth algorithm to a Twitter feed on a certain topic can give you some summary
information for that topic.
• There are a number of other uses for frequent itemset generation such as shopping transactions,
medical diagnosis, and study of the atmosphere.
Dimensionality-reduction Techniques
While working with machine learning models, we often encounter datasets with a large number of
features. These datasets can lead to problems such as increased computation time and overfitting. To
address these issues, we use dimensionality reduction techniques.
The number of input features, variables, or columns present in a given dataset is known as dimensionality,
and the process to reduce these features is called dimensionality reduction.
These techniques are used to preprocess the inputs before applying any of the ML algorithms.
They reduce noise and improve the performance of machine learning algorithms, and they work
on both labeled and unlabeled data.
The Curse of Dimensionality
Handling the high-dimensional data is very difficult in practice, commonly known as the curse of
dimensionality.
If the dimensionality of the input dataset increases, any machine learning algorithm and model
becomes more complex.
As the number of features increases, the number of samples needed to cover the feature space also increases
proportionally, and the chance of overfitting also increases.
If the machine learning model is trained on high-dimensional data, it becomes overfitted and results
in poor performance.
Hence, it is often required to reduce the number of features, which can be done with dimensionality
reduction.
What is dimensionality reduction?
In machine learning classification problems, there are often too many factors on the basis of which the
final classification is done.
These factors are basically variables called features. The higher the number of features, the
harder it gets to visualize the training set and then work on it.
Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality
reduction algorithms come into play. Eg: humidity and rainfall, e-mail spam classification etc
Dimensionality reduction is the process of reducing the number of random variables under
consideration, by obtaining a set of principal variables.
It can be divided into feature selection and feature extraction.
A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a
simple 2 dimensional space, and a 1-D problem to a simple line.
The figure described below illustrates this concept, where a 3-D feature space is split into two 2-D feature
spaces, and later, if the features are found to be correlated, the number of features can be reduced even further.
On the left, data points exist in a 3D space (X, Y, Z), but the Z dimension appears unnecessary since the
data primarily varies along the X and Y axes. The goal of dimensionality reduction is to remove less
important dimensions without losing valuable information.
Feature selection: find a subset of the original set of variables, or features, to get a smaller subset
which can be used to model the problem.
It usually involves three ways:
Filter
Wrapper
Embedded
Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e.
a space with a smaller number of dimensions. E.g., PCA and SVD.
Feature Selection
Feature selection chooses the most relevant features from the dataset without altering them. It helps
remove redundant or irrelevant features, improving model efficiency. There are several methods for
feature selection including filter methods, wrapper methods, and embedded methods.
1. Filter methods rank the features based on their relevance to the target variable.
2. Wrapper methods use the model performance as the criteria for selecting features.
3. Embedded methods combine feature selection with the model training process.
Feature Extraction
Feature extraction involves creating new features by combining or transforming the original features.
PCA is a popular technique that projects the original features onto a lower-dimensional space while
preserving as much of the variance as possible.
Benefits of applying Dimensionality Reduction
Some benefits of applying dimensionality reduction technique to the given dataset are given below:
Faster Computation: With fewer features, machine learning algorithms can process data more
quickly. This results in faster model training and testing, which is particularly useful when working
with large datasets.
Better Visualization: As we saw in the earlier figure, reducing dimensions makes it easier to visualize
data, revealing hidden patterns.
Prevent Overfitting: With fewer features, models are less likely to memorize the training data and
overfit. This helps the model generalize better to new, unseen data, improving its ability to make
accurate predictions.
By reducing the dimensions of the features, the space required to store the dataset is also reduced.
Less computation and training time is required with a reduced number of feature dimensions.
Reduced feature dimensions help in visualizing the data quickly.
It removes redundant features (if present) by taking care of multicollinearity.
Disadvantages of Dimensionality Reduction
Data Loss & Reduced Accuracy – Some important information may be lost during
dimensionality reduction, potentially affecting model performance.
Interpretability Challenges – The transformed features (e.g., principal components) may not
have clear meanings, making it harder to understand relationships in the original data.
Choosing the Right Components – Deciding how many dimensions to keep is difficult, as
keeping too few may lose valuable information, while keeping too many can lead to overfitting.
Some data may be lost due to dimensionality reduction.
In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes
not known in advance.
Principal Component Analysis (PCA)
This method was introduced by Karl Pearson.
Principal Component Analysis is a statistical process that converts the observations of correlated features
into a set of linearly uncorrelated features with the help of orthogonal transformation.
These new transformed features are called the Principal Components.
It works on a condition that while the data in a higher dimensional space is mapped to data in a lower
dimension space, the variance of the data in the lower dimensional space should be maximum.
It is one of the popular tools used for exploratory data analysis, image processing, movie
recommendation systems, and optimizing the power allocation in various communication channels.
• It involves the following steps:
i. Construct the covariance matrix of the data.
ii. Compute the eigenvectors of this matrix.
iii. The eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large
fraction of the variance of the original data.
• Since we are left with a smaller number of eigenvectors, some data may be lost in the process.
• However, the most important variance is retained by the remaining eigenvectors.
Key Concepts of PCA
Dimensionality Reduction
PCA reduces the number of features while preserving as much variance (information)
as possible.
Orthogonal Transformation
PCA transforms the original correlated features into a set of new uncorrelated features
called principal components.
Variance Maximization
The first principal component captures the largest variance in the data.
The second principal component captures the next highest variance orthogonal to the
first.
This process continues for all components.
Principal Components in PCA
The transformed new features or the output of PCA are the Principal Components.
The number of these PCs is either equal to or less than the number of original features present in the dataset.
The aim of this step is to understand how the variables of the input data set are varying from the
mean with respect to each other, or in other words, to see if there is any relationship between
them.
Because sometimes, variables are highly correlated in such a way that they contain redundant
information. So, in order to identify these correlations, we compute the covariance matrix.
If the covariance value is positive, then it indicates that the variables are
correlated. (If X increases, Y also increases, and vice versa.)
If the covariance value is negative, then it indicates that the variables are
inversely correlated. (If X increases, Y decreases, and vice versa.)
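For reference, the covariance between two features X and Y over n samples is

\mathrm{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

and the covariance matrix collects these values for every pair of features, with the variances of the individual features on the diagonal.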
It is the eigenvectors and eigenvalues that are behind all the magic of the principal components, because the
eigenvectors of the covariance matrix are the directions of the axes with the most
variance (most information), and these are what we call the Principal Components.
And eigenvalues are simply the coefficients attached to eigenvectors, which give the amount of
variance carried in each Principal Component.
By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal
components in order of significance.
Step 4: Create a Feature Vector
In the previous step, computing the eigenvectors and ordering them by their eigenvalues in
descending order allows us to find the principal components in order of significance.
In this step, we choose whether to keep all of these components or discard those of
lesser significance (low eigenvalues), and form a matrix with the remaining eigenvectors
that we call the feature vector.
So, the feature vector is simply a matrix that has as columns the eigenvectors of the components
that we decide to keep.
This makes it the first step towards dimensionality reduction, because if we choose to keep
only p eigenvectors (components) out of n, the final data set will have only p dimensions.
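A short NumPy sketch of these steps (center the data, build the covariance matrix, take the top-p eigenvectors as the feature vector, and project); the pca function and the sample matrix are illustrative, not from the notes:

import numpy as np

def pca(X, p):
    """Project the rows of X onto the top-p principal components."""
    # Center the data (subtract the mean of each feature)
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features
    cov = np.cov(X_centered, rowvar=False)
    # Eigenvectors/eigenvalues of the covariance matrix (eigh: symmetric matrix)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]          # highest variance first
    # Feature vector = matrix whose columns are the kept eigenvectors
    feature_vector = eigenvectors[:, order[:p]]
    # Recast the data along the principal component axes
    return X_centered @ feature_vector

# Example: 5 samples with 3 correlated features, reduced to 2 dimensions
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])
X_reduced = pca(X, p=2)   # shape (5, 2)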
Disadvantages of Principal Component Analysis
Sometimes, PCA is difficult to interpret; you may find it difficult to identify the
most important features even after computing the principal components.
You may face some difficulties in calculating the covariances and covariance matrices.
Sometimes, the computed principal components are more difficult to interpret than
the original set of features.
Singular Value Decomposition
A matrix decomposition method for reducing a matrix to its constituent parts in order to make
certain subsequent matrix calculations simpler.
It represents our original data set with a much smaller data set.
It extracts the relevant features from a collection of noisy data.
It is a powerful tool used to distill information in a number of applications, from bioinformatics to finance,
including restaurant selection, etc.
Advantages of SVD
Dimensionality reduction: SVD can reduce the number of dimensions in high-dimensional data,
making it easier to visualize and analyze.
Applications in various fields: SVD has various applications in fields like image processing, natural
language processing, and recommendation systems.
Interpretability: The singular values and singular vectors obtained from SVD can provide insight into
the structure of the data and the relationship between features.
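As a brief illustration, the NumPy snippet below decomposes a small made-up matrix with numpy.linalg.svd and rebuilds a rank-k approximation by keeping only the largest singular values:

import numpy as np

# A small made-up data matrix (rows = samples, columns = features)
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 5.0],
              [1.0, 0.0, 4.0]])

# Full decomposition: A = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values for a rank-k approximation
k = 2
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reduced representation of the rows in k dimensions
A_reduced = U[:, :k] * s[:k]

print(np.round(s, 3))          # singular values, largest first
print(np.round(A_approx, 2))   # low-rank reconstruction of A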
Disadvantages of SVD
Limitations in non-linear relationships: SVD assumes that the relationships between features are linear,
which can limit its ability to capture complex non-linear relationships.
SVD is sensitive to the scale of the features.
Applications of SVD
Image Compression
SVD allows for efficient storage and compression of images by keeping only the most significant
singular values. Reduces storage requirements while maintaining image quality.
Recommender Systems
Used in collaborative filtering for building recommendation systems (e.g., movie or product
recommendations). Netflix and Amazon use SVD-based techniques to improve user recommendations.
Signal Processing
Helps in noise reduction and feature extraction in signals like speech or EEG signals. Used in
filtering and denoising audio/video signals.
Anomaly Detection
Identifies outliers in data by analyzing the contribution of singular values.
Used in fraud detection and fault diagnosis.
Data Clustering
Helps in discovering hidden patterns in high-dimensional datasets.
Quantum Computing
SVD is used in quantum algorithms, including quantum principal component analysis.
Feature selection
Feature selection is the process of selecting a subset of the relevant features and leaving out the irrelevant
features present in a dataset, in order to build a model of high accuracy.
In other words, it is a way of selecting the optimal features from the input dataset.
To find a subset of the original set of variables, or features, to get a smaller subset which can
be used to model the problem.
Feature or variable ranking
It is the process of ordering the features by the value of some scoring function, which usually
measures feature-relevance.
The score S(fi) is computed from the training data, measuring some criteria of feature fi.
By convention, a high score is indicative of a valuable (relevant) feature.
A simple method for feature selection using variable ranking is to select the k highest ranked
features according to S.
This is usually not optimal, but often preferable to other, more complicated methods. It is
computationally efficient — only calculation and sorting of n scores.
Ranking Criteria: Correlation Criteria and Information Theoretic Criteria
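A small sketch of variable ranking using the absolute Pearson correlation with the target as the scoring function S(fi); the randomly generated data and the choice of k are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X = rng.normal(size=(n_samples, n_features))
# Make the target depend mostly on features 0 and 3 (plus noise)
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n_samples)

# Score S(f_i) = |correlation(f_i, y)|; a higher score means a more relevant feature
scores = np.array([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(n_features)])

# Select the k highest-ranked features
k = 2
top_k = np.argsort(scores)[::-1][:k]
print("scores:", np.round(scores, 2))
print("selected features:", top_k)   # expected to include 0 and 3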
Important features of the feature ranking method of dimensionality reduction
Feature ranking is a popular method of dimensionality reduction that involves selecting a subset of
features from a larger set of variables.
The goal is to identify the most important features that contribute to the prediction or classification
task. Here are some important features of feature ranking methods.
Ranking Criteria: Feature ranking methods use various criteria to rank the importance of
features, such as correlation, mutual information, entropy, and so on. The ranking criteria
should reflect the relevance of each feature to the problem at hand.
Selection Method: Once the features are ranked, a selection method is used to choose a subset
of the top-ranked features. The selection method can be based on a threshold value, such as
selecting the top k features, or it can be a more sophisticated method, such as a greedy
algorithm or a genetic algorithm.
Performance Evaluation: Feature ranking methods should be evaluated based on their ability
to improve the performance of the prediction or classification model. The selected subset of
features should lead to better performance compared to using all the features.
Robustness: Feature ranking methods should be robust to noise and outliers in the data. They
should also be able to handle missing values and deal with redundant or correlated features.
Interpretability: Feature ranking methods should provide interpretable results that can be easily
understood by domain experts. The importance of each feature should be explained in terms of its
relevance to the problem at hand.
Overall, feature ranking is a useful technique for reducing the dimensionality of high-dimensional
data and improving the performance of machine learning models
Feature Subset Selection
The Goal of Feature Subset Selection is to find the optimal feature subset.
Feature Subset Selection Methods can be classified into three broad categories
Filter Methods
Wrapper Methods
Embedded Methods
Filter Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is taken.
Filter methods select subsets of variables as a pre-processing step, independently of the classifier that will be used.
Filter methods evaluate each feature independently with respect to the target variable.
Features with a high correlation with the target variable are selected, as this means the feature has some
relationship with the target and can help us in making predictions.
These methods are used in the preprocessing phase to remove irrelevant or redundant features based on
statistical tests (correlation) or other criteria.
Key features of Filter Methods for Feature Subset Selection:
Filter Methods are usually fast
Filter Methods provide generic selection of features, not tuned by given learner (universal)
Filter Methods are also often criticized because the feature set is not optimized for the classifier that is used
Filter Methods are sometimes used as a pre-processing step for other methods.
Some common techniques of filters method are:
Correlation
Chi-Square Test
ANOVA
Information Gain, etc.
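A short filter-method example, assuming scikit-learn is available: SelectKBest scores each feature against the target with an ANOVA F-test and keeps the best k, independently of any downstream classifier.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter step: score each feature with the ANOVA F-test, keep the top 2
selector = SelectKBest(score_func=f_classif, k=2)
X_filtered = selector.fit_transform(X, y)

print("scores:", selector.scores_.round(1))
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_filtered.shape)   # (150, 2)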
Wrapper Methods
The wrapper method has the same goal as the filter method, but it takes a machine learning model
for its evaluation.
The learner is considered a black box.
In this method, a subset of features is fed to the ML model, and its performance is evaluated.
The performance decides whether to add or remove features to increase the accuracy of the
model.
This method is more accurate than the filter method but is more complex to work with.
Some common techniques of wrapper methods are:
Forward Selection
Backward Selection
Bi-directional Elimination
Wrapper methods are also referred to as greedy algorithms that train the algorithm iteratively.
They use different combinations of features, compute the relationship between these feature subsets and
the target variable, and based on the conclusions, features are added or removed.
Stopping criteria for selecting the best subset are usually pre-defined by the person training the model,
such as when the performance of the model decreases or a specific number of features is reached.
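A sketch of a wrapper method, again assuming scikit-learn is available: SequentialFeatureSelector wraps a classifier (logistic regression here) and performs forward selection by evaluating model performance for each candidate subset.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Wrapper step: the model itself (treated as a black box) judges each feature subset
model = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(model, n_features_to_select=2, direction="forward", cv=5)
sfs.fit(X, y)

print("selected feature indices:", sfs.get_support(indices=True))
X_selected = sfs.transform(X)   # dataset reduced to the chosen subset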
Embedded Methods
Embedded Methods are specific to a given learning machine.
They perform variable selection (implicitly) in the process of training.
Embedded methods check the different training iterations of the machine learning model and
evaluate the importance of each feature.
E.g. WINNOW-algorithm (linear unit with multiplicative updates).
Some common techniques of Embedded methods are:
LASSO
Elastic Net
Ridge Regression, etc.
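A minimal sketch of an embedded method, assuming scikit-learn is available: LASSO performs feature selection implicitly while training, because the L1 penalty drives the coefficients of irrelevant features to exactly zero. The generated data and the alpha value are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Only features 0 and 4 actually influence the target (illustrative data)
y = 4.0 * X[:, 0] - 3.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

# Embedded step: selection happens during training via the L1 penalty
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

selected = np.nonzero(lasso.coef_)[0]
print("coefficients:", lasso.coef_.round(2))
print("features kept by LASSO:", selected)   # typically just features 0 and 4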