Unit 3,4,5 ML (CS - AI)

Distance Based Methods:

Distance-based algorithms are machine learning algorithms that classify queries by computing the distances between those queries and a set of stored training examples.
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
 K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
 The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
 The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
 The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
 K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
 It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
 At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
Why do we need a KNN algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. Which of these categories does the data point belong to? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
Advantages of KNN Algorithm:
 It is simple to implement.
 It is robust to noisy training data.
 It can be more effective when the training data is large.
Disadvantages of KNN Algorithm:
 The value of K always needs to be determined, which can be complex at times.
 The computation cost is high because the distance between the new data point and all the training samples must be calculated.
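
To make the above concrete, here is a minimal KNN sketch in Python using scikit-learn; the Iris dataset and K = 5 are illustrative choices, not part of the notes above.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# "Training" only stores the data; distances are computed at prediction time (lazy learning).
knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbors
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))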

Decision Tree Classification Algorithm:


 Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent
the decision rules and each leaf node represents the outcome.
 In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
 The decisions or tests are performed on the basis of features of the given dataset.
 It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
 It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
 In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
 A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
 Below diagram explains the general structure of a decision tree:
Decision Tree Terminologies
 Root Node: Root node is from where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
 Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
 Branch/Sub Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing the unwanted branches from the tree.
 Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.


Common decision tree algorithms:
ID3: This algorithm measures how mixed up the data is at a node using a quantity called entropy. It then chooses the feature that helps to clarify the data the most.
C4.5: This is an improved version of ID3 that can handle missing data and continuous attributes.
CART: This algorithm uses a different measure called Gini impurity to decide how to split the data. It can be used for both classification (sorting data into categories) and regression (predicting continuous values) tasks.
Did you notice anything in the above flowchart? We see that if the weather is cloudy then we must go to play. Why didn't it split further? Why did it stop there?

To answer this question, we need to know about a few more concepts like entropy, information gain, and the Gini index. But in simple terms, we can say that the output for the training dataset is always "Yes" for cloudy weather; since there is no disorder here, we don't need to split the node further.
The goal is to decrease uncertainty or disorder in the dataset, and for this we build these trees.
Now you must be thinking: how do I know what the root node should be? What should the decision node be? When should I stop splitting? To decide this, there is a metric called "Entropy", which is the amount of uncertainty in the dataset.
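
The two impurity measures mentioned above can be computed directly. The short Python sketch below uses made-up toy label lists for illustration: a pure node (like the cloudy/Play = Yes case) has zero entropy and zero Gini impurity, while a maximally mixed binary node has entropy 1 and Gini impurity 0.5.

import numpy as np

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity = 1 - sum(p_i ** 2) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

# A pure node has zero disorder:
print(entropy(["Yes", "Yes", "Yes"]))          # 0.0
print(gini(["Yes", "Yes", "Yes"]))             # 0.0

# A maximally mixed binary node:
print(entropy(["Yes", "No", "Yes", "No"]))     # 1.0
print(gini(["Yes", "No", "Yes", "No"]))        # 0.5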
How Decision tree algorithms work?

• The Decision Tree algorithm works in the following simple steps:

• Starting at the Root: The algorithm begins at the top, called the “root node,”
representing the entire dataset.

• Asking the Best Questions: It looks for the most important feature or question that
splits the data into the most distinct groups. This is like asking a question at a fork in
the tree.

• Branching Out: Based on the answer to that question, it divides the data into smaller
subsets, creating new branches. Each branch represents a possible route through the
tree.

• Repeating the Process: The algorithm continues asking questions and splitting the data
at each branch until it reaches the final “leaf nodes,” representing the predicted
outcomes or classifications.
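
As an illustration of these steps, here is a minimal sketch using scikit-learn's DecisionTreeClassifier (a CART implementation); the Iris dataset, the entropy criterion and max_depth=3 are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
iris_feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

# criterion can be "gini" (default) or "entropy"; max_depth limits splitting to reduce overfitting.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned questions (splits) from the root down to the leaf nodes.
print(export_text(tree, feature_names=iris_feature_names))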

Advantages of Decision Trees

• Easy to Understand: They are simple to visualize and interpret, making them easy to
understand even for non-experts.

• Handles Both Numerical and Categorical Data: They can work with both types of
data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require normalization or scaling of data.

• Automated Feature Selection: They automatically identify the most important features for decision-making.

• Handles Non-Linear Relationships: They can capture non-linear patterns in the data
effectively.

Disadvantages of Decision Trees

• Overfitting Risk: They can easily overfit the training data, especially if the trees are too deep.

• Unstable with Small Changes: Small changes in data can lead to completely different
trees.

• Biased with Imbalanced Data: They tend to be biased if one class dominates the
dataset.

• Limited to Axis-Parallel Splits: They struggle with diagonal or complex decision boundaries.

• Can Become Complex: Large trees can become hard to interpret and may lose their
simplicity.
Applications of Decision Trees

• Healthcare

• Diagnosing diseases based on patient symptoms

• Predicting patient outcomes and treatment effectiveness

• Identifying risk factors for specific health conditions

• Finance

• Assessing credit risk for loan approvals

• Detecting fraudulent transactions

• Predicting stock market trends and investment risks

• Education

• Predicting student performance and outcomes

• Identifying factors affecting student dropout rates

• Personalizing learning paths for students

Random Forest Algorithm

• The Random Forest Algorithm is a supervised machine learning algorithm that is extremely popular and is used for Classification and Regression problems in Machine Learning.

• We know that a forest comprises numerous trees, and the more trees there are, the more robust it will be.

• Similarly, the greater the number of trees in a Random Forest Algorithm, the higher its accuracy and problem-solving ability.

• Random Forest is a classifier that contains several decision trees trained on various subsets of the given dataset and takes the average (or majority vote) of their predictions to improve the predictive accuracy on that dataset.

• It is based on the concept of ensemble learning which is a process of combining


multiple classifiers to solve a complex problem and improve the performance of the
model.
Steps of the Random Forest Algorithm:

• The following steps explain the working of the Random Forest Algorithm:

• Step 1: Select random samples (with replacement) from the given data or training set.

• Step 2: Construct a decision tree for every sample.

• Step 3: Each decision tree produces a prediction; voting (for classification) or averaging (for regression) is performed over these predictions.

• Step 4: Finally, select the most voted prediction result as the final prediction.

This combination of multiple models is called Ensemble. Ensemble uses two methods:

Bagging: Creating a different training subset from sample training data with replacement is
called Bagging. The final output is based on majority voting.

Boosting: Combining weak learners into strong learners by creating sequential models such that the final model has the highest accuracy is called Boosting. Examples: AdaBoost, XGBoost.

Bagging: From the principle mentioned above, we can understand that Random Forest uses the Bagging technique.
 Bagging, also known as Bootstrap Aggregation, is the ensemble method used by Random Forest.
 The process begins with the original random data.
 The data is organised into samples known as Bootstrap Samples; this process is known as Bootstrapping.
 Further, the models are trained individually on these samples, and combining their different results is known as Aggregation.
 In the last step, all the results are combined, and the final output is based on majority voting.

This overall procedure is known as Bagging and is performed using an Ensemble Classifier.
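
A minimal sketch of bagging with a Random Forest in scikit-learn follows; the synthetic dataset and parameter values are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators decision trees, each trained on a bootstrap sample (bagging);
# the final class is chosen by majority voting across the trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))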

Support Vector Machine Algorithm

• Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

• The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which there are two different categories that are classified using a decision boundary or hyperplane:
• Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of cats and dogs. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:

• Types of SVM

• SVM can be of two types:


• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

• Non-linear SVM: Non-Linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

How does SVM work?

Linear SVM:

Non-Linear SVM:
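
Since the linear and non-linear cases are only illustrated by the figures above, here is a minimal scikit-learn sketch contrasting a linear-kernel SVM and an RBF-kernel SVM on a synthetic, non-linearly separable dataset; all parameter values are illustrative.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)           # straight-line boundary
nonlinear_svm = SVC(kernel="rbf", gamma=2).fit(X_train, y_train)  # kernel trick for a curved boundary

print("Linear SVM accuracy:    ", linear_svm.score(X_test, y_test))
print("Non-linear SVM accuracy:", nonlinear_svm.score(X_test, y_test))

# The support vectors are the extreme points that define the margin/hyperplane.
print("Number of support vectors:", nonlinear_svm.support_vectors_.shape[0])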
What is Unsupervised Learning?

• Unsupervised learning is a branch of machine learning that deals with unlabeled data.

• Unlike supervised learning, where the data is labeled with a specific category or outcome, unsupervised learning algorithms are tasked with finding patterns and relationships within the data without any prior knowledge of the data's meaning.

• Unsupervised machine learning algorithms find hidden patterns in data without any human intervention, i.e., we don't give the output to our model.

• The training data has only input parameter values, and the model discovers the groups or patterns on its own.

Unsupervised Learning Algorithms

• There are mainly three types of algorithms used for unsupervised datasets:

• Clustering

• Association Rule Learning

• Dimensionality Reduction
Clustering Algorithms

Clustering in unsupervised machine learning is the process of grouping unlabeled data into clusters based on their similarities. The goal of clustering is to identify patterns and relationships in the data without any prior knowledge of the data's meaning.

Broadly, this technique is applied to group data based on the different patterns, such as similarities or differences, that our machine learning model finds.

These algorithms are used to process raw, unclassified data objects into groups. For example, in the figure referred to above, no output parameter values are given, so this technique would be used to group clients based on the input parameters provided by our data.

K-means clustering

What is K-Means Clustering?

K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a dataset into a pre-defined number of clusters. The goal is to group similar data points together and discover underlying patterns or structures within the data.

For contrast, think of a project where you need to predict the sales of a big mart, or one where your task is to predict whether a loan will be approved or not: those are supervised problems with a known target variable, whereas clustering has no target at all.
Properties of K-Means Clustering

How to Apply K-Means Clustering Algorithm?

Choose the number of clusters k

The first step in k-means is to pick the number of clusters, k.

Select k random points from the data as centroids

Next, we randomly select the centroid for each cluster. Let's say we want to have 2 clusters, so k is equal to 2 here. We then randomly select the centroids.
Assign all the points to the closest cluster centroid

Once we have initialized the centroids, we assign each point to the closest cluster centroid:

• Recompute the centroids of newly formed clusters

• Now, once we have assigned all of the points to either cluster, the next step is to
compute the centroids of newly formed clusters:

• Here, the red and green crosses are the new centroids.

• Repeat steps 3 and 4

Stopping Criteria for K-Means Clustering:

• There are essentially three stopping criteria that can be adopted to stop the K-means
algorithm:

• Centroids of newly formed clusters do not change

• Points remain in the same cluster

• Maximum number of iterations is reached
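
A minimal K-Means sketch in scikit-learn, mirroring the k = 2 walkthrough above; the synthetic data and parameter values are illustrative, and max_iter corresponds to the "maximum number of iterations" stopping criterion just listed.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, random_state=0)

# n_clusters = k; fit_predict runs the assign/recompute loop until the centroids stop changing
# or max_iter is reached, then returns the cluster label of each point.
kmeans = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
labels = kmeans.fit_predict(X)

print("Final centroids:\n", kmeans.cluster_centers_)
print("Iterations run:", kmeans.n_iter_)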


DBSCAN Clustering in ML | Density based clustering


DBSCAN is a density-based clustering algorithm that groups data points that are closely
packed together and marks outliers as noise based on their density in the feature
space. It identifies clusters as dense regions in the data space, separated by areas of lower
density.
Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical,
DBSCAN excels in handling real-world data irregularities such as:
 Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
 Noise and Outliers: It effectively identifies and handles noise points without assigning
them to any cluster.

The figure above shows a data set with clustering algorithms: K-Means and
Hierarchical handling compact, spherical clusters with varying noise tolerance, while
DBSCAN manages arbitrary-shaped clusters and excels in noise handling.
Key Parameters in DBSCAN
 1. eps: This defines the radius of the neighborhood around a data point.
If the distance between two points is less than or equal to eps, they are considered neighbors.
Choosing the right eps is crucial:
 If eps is too small, most points will be classified as noise.
 If eps is too large, clusters may merge, and the algorithm may fail to distinguish between
them.
A common method to determine eps is by analyzing the k-distance graph (a sketch follows this list).
 2. MinPts: This is the minimum number of points required within the eps radius to form a
dense region.
A general rule of thumb is to set MinPts >= D+1, where D is the number of dimensions in the
dataset. For most cases, a minimum value of MinPts = 3 is recommended.
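
Below is a sketch of the k-distance heuristic mentioned above for choosing eps; the synthetic two-moons data and k = MinPts = 4 are assumptions made for illustration.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

k = 4  # usually set equal to MinPts
neighbors = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = neighbors.kneighbors(X)   # column 0 is each point's distance to itself (0)

# Sort each point's distance to its k-th nearest neighbor; the "elbow" of the curve suggests eps.
k_distances = np.sort(distances[:, -1])
plt.plot(k_distances)
plt.xlabel("Points sorted by k-distance")
plt.ylabel("Distance to %d-th nearest neighbor" % k)
plt.title("k-distance graph for choosing eps")
plt.show()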
How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. core points, which have a sufficient number of neighbors within a specified radius (epsilon);
2. border points, which are near core points but lack enough neighbors to be core points themselves;
3. noise points, which do not belong to any cluster.
By iteratively expanding clusters from core points and connecting density-reachable points,
DBSCAN forms clusters without relying on rigid assumptions about their shape or size.

Steps in the DBSCAN Algorithm


1. Identify Core Points: For each point in the dataset, count the number of points within
its eps neighborhood. If the count meets or exceeds MinPts, mark the point as a core
point.
2. Form Clusters: For each core point that is not already assigned to a cluster, create a new
cluster. Recursively find all density-connected points (points within the eps radius of the
core point) and add them to the cluster.
3. Density Connectivity: Two points, a and b, are density-connected if there exists a chain
of points where each point is within the eps radius of the next, and at least one point in the
chain is a core point. This chaining process ensures that all points in a cluster are connected
through a series of dense regions.
4. Label Noise Points: After processing all points, any point that does not belong to a cluster
is labeled as noise.

Pseudocode For DBSCAN Clustering Algorithm


DBSCAN(dataset, eps, MinPts){
    C = 0                                      # cluster index
    for each unvisited point p in dataset {
        mark p as visited
        N = find the neighboring points of p within distance eps   # region query
        if |N| < MinPts:
            mark p as noise
        else:
            C = C + 1                          # start a new cluster
            add p to cluster C
            for each point p' in N {           # expand the cluster
                if p' is not visited:
                    mark p' as visited
                    N' = find the neighboring points of p' within distance eps
                    if |N'| >= MinPts:
                        N = N U N'             # p' is also a core point: grow the neighborhood
                if p' is not a member of any cluster:
                    add p' to cluster C        # border or core point joins the cluster
            }
    }
}
Implementation Of DBSCAN Algorithm In Python

Import libraries and run DBSCAN on a synthetic dataset:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Load data in X
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.50, random_state=0)
X = StandardScaler().fit_transform(X)  # scale features so that eps=0.3 is a sensible radius

# Fit DBSCAN and record which points are core samples
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise (-1) if present
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

# Plot result; black ('k') is reserved for noise points
unique_labels = set(labels)
colors = ['y', 'b', 'g', 'r']
for k, col in zip(unique_labels, colors):
    if k == -1:
        col = 'k'  # black used for noise

    class_member_mask = (labels == k)

    # Core points of this cluster
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)

    # Border (non-core) points of this cluster
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)

plt.title('number of clusters: %d' % n_clusters_)
plt.show()

Output: a scatter plot of the clusters found in the dataset, with noise points (if any) shown in black.

DBSCAN vs. K-Means

 Number of clusters: In DBSCAN we need not specify the number of clusters, whereas K-Means is very sensitive to the number of clusters, which must be specified in advance.
 Cluster shape: Clusters formed in DBSCAN can be of any arbitrary shape, whereas clusters formed in K-Means are spherical or convex in shape.
 Noise and outliers: DBSCAN works well with datasets having noise and outliers, whereas K-Means does not; outliers can skew the clusters in K-Means to a very large extent.
 Parameters: DBSCAN requires two parameters (eps and MinPts) for training the model, whereas K-Means requires only one (the number of clusters k).
Dimensionality Reduction Techniques

Dimensionality reduction techniques can be broadly divided into two categories:

1. Feature selection: This refers to retaining the relevant (optimal) features and
discarding the irrelevant ones to ensure the high accuracy of the model. Feature
selection methods such as filter, wrapper, and embedded methods are popularly
used.
2. Feature extraction: This process is also termed feature projection, wherein the multidimensional space is converted into a space with lower dimensions. Some known feature extraction methods include principal component analysis (PCA), linear discriminant analysis (LDA), kernel PCA (K-PCA), and quadratic discriminant analysis (QDA).

Although one can perform dimensionality reduction with several techniques, the following are
the most commonly used ones:

1. Principal component analysis (PCA)

Principal component analysis performs orthogonal transformations to convert a set of observations of correlated features into a set of linearly uncorrelated features. The newly transformed features are termed 'principal components'. This statistical method is a key data analysis and predictive modeling technique.
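
A minimal PCA sketch with scikit-learn; reducing the four Iris features to two principal components is an illustrative choice.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)        # 4 correlated features -> 2 uncorrelated components

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced shape:", X_reduced.shape)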

2. Missing value ratio

When a dataset contains several missing values, such variables are eliminated, as they fail to provide relevant or reliable information. The elimination is accomplished by defining a threshold level, wherein a variable with more missing values than the threshold is immediately dropped. Note that the higher the threshold, the fewer variables are dropped, so the reduction is less aggressive.
3. Backward feature elimination

This approach is typically used while developing a linear regression or logistic regression
model. In this technique, you can specify the number of features essential for ML algorithms
based on the estimated model performance and tolerated error rate.

The process begins by training the ML model using all 'n' variables provided in the dataset. Upon training, the model's performance is evaluated. After this, features are eliminated one at a time, and the model is trained on 'n-1' features n times. The performance of the model is re-evaluated at each step.

The iteration identifies the variable whose removal makes the least (or no) difference to the model's performance. That variable is eliminated, leaving 'n-1' features. The process is repeated until no further feature can be dropped from the dataset.

4. Forward feature selection

Forward feature selection is the opposite of the backward elimination technique. Rather than deleting features, we determine the best features that produce an above-average gain in the model's performance.

In this approach, we start with a single feature and then progressively add features one at a time. The model is first trained on each feature independently, the feature with the highest performance is identified, and the model is then iteratively extended with additional features. The process is repeated until the model's performance no longer improves.
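
Both backward elimination and forward selection can be sketched with scikit-learn's SequentialFeatureSelector (available in recent scikit-learn versions); the dataset, estimator, and target of five features are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# direction="forward" adds features one at a time; "backward" removes them one at a time.
forward = SequentialFeatureSelector(model, n_features_to_select=5,
                                    direction="forward", cv=5).fit(X, y)
backward = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward", cv=5).fit(X, y)

print("Forward-selected feature indices: ", forward.get_support(indices=True))
print("Backward-selected feature indices:", backward.get_support(indices=True))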

5. Random forest

Random forest is a feature selection approach with a built-in feature significance package that
identifies feature importance. As a result, the need to program it individually is eliminated.

In this approach, multiple decision trees are constructed against the target feature, and importance statistics for each attribute are used to identify the relevant variable subset. Moreover, as random forest accepts only numeric input, a one-hot encoding step is needed to convert categorical input data into numeric data.

How Collaborative Filtering Works

Collaborative filtering works by leveraging data from multiple users to make


recommendations. Instead of relying on item attributes, it focuses on user-item interactions.
The key assumption is that if two users have agreed on certain items in the past, they are likely
to agree on other items in the future.

Example Scenario

Consider a movie recommendation system with three users:

 User U1 likes movies m1, m2, and m4


 User U2 likes movies m1, m3, and m4
 User U3 likes movie m1

To recommend a movie for User U3, we observe the following:

 User U3 has shown a preference for m1.


 Users U1 and U2 also liked m1, indicating that they have a similar taste.
 Since U1 and U2 have also liked m4, it is likely that U3 will enjoy m4 as well.
 Therefore, the system recommends movie m4 to User U3.

This example demonstrates the logic of collaborative filtering, where recommendations are
based on the preferences of similar users.
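
The movie example above can be reproduced with a few lines of plain Python; the recommend helper below is a hypothetical illustration that ranks unseen movies by how many overlapping-taste users liked them.

likes = {
    "U1": {"m1", "m2", "m4"},
    "U2": {"m1", "m3", "m4"},
    "U3": {"m1"},
}

def recommend(target, likes):
    candidates = {}
    for user, items in likes.items():
        if user == target:
            continue
        overlap = len(likes[target] & items)   # shared liked movies = crude similarity
        for item in items - likes[target]:     # movies the target has not seen yet
            candidates[item] = candidates.get(item, 0) + overlap
    # Rank unseen movies by how many similar users liked them.
    return sorted(candidates, key=candidates.get, reverse=True)

print(recommend("U3", likes))   # m4 ranks highest, as in the example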

Types of Collaborative Filtering

Collaborative filtering is divided into two main types:

1. Memory-Based Collaborative Filtering


o Uses similarity measures to find relationships between users or items.
o Does not require a learning phase, making it easier to implement.
o Can be further classified into:
 User-Based Filtering: Recommends items based on the preferences of
similar users.
 Item-Based Filtering: Recommends items similar to those a user has
previously interacted with.
2. Model-Based Collaborative Filtering
o Uses machine learning techniques such as matrix factorization, deep learning,
and Bayesian models.
o Requires training a predictive model on historical user-item interactions.
o More scalable and accurate for large datasets compared to memory-based
methods.
Advantages of Collaborative Filtering

 Personalized Recommendations: Provides highly relevant suggestions based on user


behavior.
 No Need for Item Metadata: Works purely on user interaction data, making it flexible
across different domains.
 Improves Over Time: The more users interact, the better the recommendations
become.

Challenges of Collaborative Filtering

 Cold Start Problem: New users and items lack enough interaction data for
recommendations.
 Data Sparsity: Most users interact with only a small subset of available items, making
similarity calculations difficult.
 Scalability Issues: For large datasets, real-time computation of similarities can be
computationally expensive.

Collaborative Recommender Systems

Collaborative recommender systems are a type of recommendation system that suggests items
to users based on their past interactions and the behavior of similar users. Unlike content-based
systems, which rely on item attributes, collaborative filtering uses patterns in user behavior to
make predictions. There are two main types of collaborative filtering: memory-based filtering
and model-based filtering.

Memory-Based Collaborative Filtering

Memory-based collaborative filtering, also known as neighbor-based filtering, relies on the


similarity between users or items to make recommendations. It does not require an explicit
learning phase but instead works by computing relationships in the user-item interaction
matrix. This method is divided into two subtypes:
User-Based Collaborative Filtering

In user-based collaborative filtering, recommendations are made based on the preferences of


users who exhibit similar behavior. The assumption is that if two users have rated or interacted
with many items in a similar way, they are likely to have similar tastes in the future as well.

Working Principle

1. The system calculates the similarity between users by comparing their past interactions
with items.
2. Each user is assigned a weight based on their similarity with the target user.
3. A group of similar users (nearest neighbors) is selected.
4. The system predicts the target user's preferences based on the weighted average of the
ratings given by similar users.
5. Finally, the system recommends items that the similar users have liked but the target
user has not yet interacted with.

Example

Consider a movie recommendation system where User A and User B have rated several movies
in a similar way. If User B has watched and highly rated a movie that User A has not yet seen,
that movie will be recommended to User A.
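
A sketch of the prediction step described above: neighbors are weighted by cosine similarity, and the target user's missing rating is estimated as a weighted average of their ratings. The small rating matrix is hypothetical, with 0 marking an unrated item.

import numpy as np

ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # User A (target) has not rated item 2
    [5.0, 5.0, 4.0, 1.0],   # User B
    [1.0, 2.0, 5.0, 5.0],   # User C
])

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = ratings[0]
neighbors = ratings[1:]
sims = np.array([cosine_sim(target, n) for n in neighbors])

# Predict the target's rating for item index 2 as a similarity-weighted average.
item = 2
pred = np.sum(sims * neighbors[:, item]) / np.sum(np.abs(sims))
print("Similarities to neighbors:", sims)
print("Predicted rating for item", item, "=", round(pred, 2))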

Challenges

 Data Sparsity: Since most users interact with only a small portion of items, finding
enough similar users can be difficult.
 Scalability Issues: As the number of users increases, the computation of pairwise
similarities becomes complex and time-consuming.
 Cold-Start Problem: A new user without prior interactions cannot be effectively
compared to existing users.

Item-Based Collaborative Filtering

Item-based collaborative filtering focuses on the relationships between items rather than users.
The system identifies items that have been rated or interacted with similarly by multiple users
and recommends new items to users based on their past interactions.

Working Principle

1. The system calculates similarity scores between different items based on how users
interact with them.
2. Items that frequently appear together in user preferences are considered similar.
3. If a user has shown interest in one item, the system recommends other items that are
similar based on past interactions.
Example

In an e-commerce platform, if many users who purchased a smartphone also purchased a


specific brand of earphones, the system will recommend those earphones to other users who
buy the smartphone.

Advantages Over User-Based Filtering

 More Stable Over Time: Items do not change as frequently as user preferences,
making recommendations more consistent.
 Scalability: Since the number of items is generally smaller than the number of users,
item-based filtering can be more efficient.

Challenges

 Lack of Context Awareness: The system does not consider why items are rated
similarly, which may lead to irrelevant recommendations.
 Sparsity of Data: If there are few interactions with a particular item, computing its
similarity with other items becomes difficult.
Unit 5

Association Rule Learning

Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the relationship can be exploited profitably. It tries to find interesting relations or associations among the variables of a dataset, based on different rules for discovering the interesting relations between variables in the database.

The association rule learning is one of the very important concepts of machine learning, and it
is employed in Market Basket analysis, Web usage mining, continuous production,
etc. Here market basket analysis is a technique used by the various big retailer to discover the
associations between items. We can understand it by taking an example of a supermarket, as in
a supermarket, all products that are purchased together are put together.

For example, if a customer buys bread, he most likely can also buy butter, eggs, or milk, so
these products are stored within a shelf or mostly nearby. Consider the below diagram:

Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.


How does Association Rule Learning work?

Association rule learning works on the concept of If-Then statements, such as "if A then B". Here the "If" element is called the antecedent, and the "Then" element is called the consequent. A relationship in which an association is found between two single items is known as single cardinality; as the number of items in a rule increases, the cardinality increases accordingly. To measure the associations between thousands of data items, there are several metrics, given below:

o Support
o Confidence
o Lift
Let's understand each of them:

Support

Support is the frequency of A, i.e., how frequently an item appears in the dataset. It is defined as the fraction of transactions T that contain the itemset X:

Support(X) = (Number of transactions containing X) / (Total number of transactions)

Confidence

Confidence indicates how often the rule has been found to be true, i.e., how often the items X and Y occur together in the dataset given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)

Lift

Lift is the strength of a rule, defined by the formula:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It is the ratio of the observed support to the support expected if X and Y were independent of each other. It has three possible ranges of values:

o If Lift= 1: The probability of occurrence of antecedent and consequent is independent


of each other.
o Lift>1: It determines the degree to which the two itemsets are dependent to each other.
o Lift<1: It tells us that one item is a substitute for other items, which means one item has
a negative effect on another.

Types of Association Rule Learning

Association rule learning can be divided into three algorithms:

Apriori Algorithm

This algorithm uses frequent datasets to generate association rules. It is designed to work on
the databases that contain transactions. This algorithm uses a breadth-first search and Hash
Tree to calculate the itemset efficiently.

It is mainly used for market basket analysis and helps to understand the products that can be
bought together. It can also be used in the healthcare field to find drug reactions for patients.

Eclat Algorithm

Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-
first search technique to find frequent itemsets in a transaction database. It performs faster
execution than Apriori Algorithm.

F-P Growth Algorithm

The FP-Growth algorithm stands for Frequent Pattern Growth, and it is an improved version of the Apriori Algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree). The purpose of this frequent pattern tree is to extract the most frequent patterns.

Applications of Association Rule Learning

It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:
o Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to determine
the association between items.
o Medical Diagnosis: With the help of association rules, diagnosis and treatment become easier, as the rules help in identifying the probability of illness for a particular disease.
o Protein Sequence: The association rules help in determining the synthesis of artificial
Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.
Apriori Algorithm is a foundational method in data mining used for discovering frequent
itemsets and generating association rules. Its significance lies in its ability to identify
relationships between items in large datasets which is particularly valuable in market basket
analysis.
For example, if a grocery store finds that customers who buy bread often also buy butter, it
can use this information to optimize product placement or marketing strategies.
How the Apriori Algorithm Works?
The Apriori Algorithm operates through a systematic process that involves several key steps:
1. Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to identify individual items (1-itemsets) and their frequencies. It then establishes a minimum support threshold, which determines whether an itemset is considered frequent.
2. Creating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified, the algorithm generates candidate 2-itemsets by combining frequent items. This process continues iteratively, forming larger itemsets (k-itemsets) until no more frequent itemsets can be found.
3. Removing Infrequent Itemsets: The algorithm employs a pruning technique based on the Apriori property, which states that if an itemset is infrequent, all its supersets must also be infrequent. This significantly reduces the number of combinations that need to be evaluated.
4. Generating Association Rules: After identifying frequent itemsets, the algorithm
generates association rules that illustrate how items relate to one another, using metrics
like support, confidence, and lift to evaluate the strength of these relationships.
Key Metrics of Apriori Algorithm
 Support: This metric measures how frequently an item appears in the dataset relative to
the total number of transactions. A higher support indicates a more significant presence
of the itemset in the dataset. Support tells us how often a particular item or combination
of items appears in all the transactions (“Bread is bought in 20% of all transactions.”)
 Confidence: Confidence assesses the likelihood that an item Y is purchased when item X
is purchased. It provides insight into the strength of the association between two items.
 Confidence tells us how often items go together. (“If bread is bought, butter is bought
75% of the time.”)
 Lift: Lift evaluates how much more likely two items are to be purchased together
compared to being purchased independently. A lift greater than 1 suggests a strong
positive association. Lift shows how strong the connection is between items. (“Bread and
butter are much more likely to be bought together than by chance.”)
Let us understand the concept of the Apriori Algorithm with the help of an example. Consider the following transactions of a grocery shop; we will find the frequent itemsets and generate association rules for them.

Step 1: Setting the parameters
 Minimum Support Threshold: 50% (an itemset must appear in at least 3 of the 5 transactions). Support is computed as:
Support(A) = (Number of transactions containing itemset A) / (Total number of transactions)
 Minimum Confidence Threshold: 70% (you can change the parameter values as per the use case and problem statement). Confidence is computed as:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Step 2: Find Frequent 1-Itemsets
Count how many transactions include each item in the dataset (i.e., calculate the frequency of each item).

Frequent 1-Itemsets:
All items have support ≥ 50%, so they all qualify as frequent 1-itemsets. If any item had support < 50%, it would be omitted from the frequent 1-itemsets.

Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support. For this use case we get three item pairs, (Bread, Butter), (Bread, Milk) and (Butter, Milk), and calculate their support as in Step 2.

Candidate 2-Itemsets:
Frequent 2-itemsets:
 {Bread, Butter} and {Bread, Milk} both meet the 50% threshold, but {Butter, Milk} does not meet the threshold, so it is omitted.

Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their support. For the triplet, there is only one candidate, {Bread, Butter, Milk}, and we calculate its support.

Candidate 3-Itemsets:
Since this does not meet the 50% threshold, there are no frequent 3-itemsets.
Step 5: Generate Association Rules
Now we generate rules from the frequent itemsets and calculate confidence.
Rule 1: If Bread → Butter (if customer buys bread, the customer will buy butter also)
 Support of {Bread, Butter} = 3.
 Support of {Bread} = 4.
 Confidence = 3/4 = 75% (Passes threshold).
Rule 2: If Butter → Bread (if customer buys butter, the customer will buy bread also)
 Support of {Bread, Butter} = 3.
 Support of {Butter} = 3.
 Confidence = 3/3 = 100% (Passes threshold).
Rule 3: If Bread → Milk (if customer buys bread, the customer will buy milk also)
 Support of {Bread, Milk} = 3.
 Support of {Bread} = 4.
 Confidence = 3/4 = 75% (Passes threshold).
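
The computations in Steps 2-5 can be reproduced with a short pure-Python sketch. The original transaction table is not reproduced above, so the five transactions below are a hypothetical reconstruction chosen to match the supports quoted in the example (Bread: 4, Butter: 3, {Bread, Butter}: 3, {Bread, Milk}: 3); note that with any such reconstruction an additional rule such as Milk → Bread also passes the confidence threshold.

from itertools import combinations

transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]
n = len(transactions)
min_support = 0.5        # itemset must appear in at least 3 of 5 transactions
min_confidence = 0.7

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# Frequent 1-itemsets
items = sorted(set().union(*transactions))
frequent1 = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
print("Frequent 1-itemsets:", [set(f) for f in frequent1])

# Candidate 2-itemsets are built only from frequent 1-itemsets (the Apriori property)
candidates2 = [a | b for a, b in combinations(frequent1, 2)]
frequent2 = [c for c in candidates2 if support(c) >= min_support]
print("Frequent 2-itemsets:", [set(f) for f in frequent2])

# Association rules X -> Y from the frequent 2-itemsets
for pair in frequent2:
    for x in pair:
        X, Y = frozenset([x]), pair - {x}
        conf = support(pair) / support(X)
        if conf >= min_confidence:
            print("Rule", set(X), "->", set(Y), ": confidence =", round(conf * 100), "%")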
The Apriori Algorithm, as demonstrated in the bread-butter example, is widely used in
modern startups like Zomato, Swiggy, and other food delivery platforms. These companies
use it to perform market basket analysis, which helps them identify customer behavior
patterns and optimize recommendations.
Applications of Apriori Algorithm
Below are some applications of the Apriori algorithm used in today's companies and startups:
1. E-commerce: Used to recommend products that are often bought together, like laptop +
laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger + fries, to
offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on what users often
watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalized offers, such
as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by finding
commonly purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on users' past activities, like protein shakes + workouts.
The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It is one of the popular methods of association rule mining and is a more efficient and scalable version of the Apriori algorithm. While the Apriori algorithm works in a horizontal sense, imitating the Breadth-First Search of a graph, the ECLAT algorithm works in a vertical manner, just like the Depth-First Search of a graph. This vertical approach makes ECLAT a faster algorithm than Apriori.
How does the algorithm work?
The basic idea is to use Transaction Id Set (tidset) intersections to compute the support value of a candidate, avoiding the generation of subsets that do not exist in the prefix tree. In the first call of the function, all single items are used along with their tidsets. Then the function is called recursively; in each recursive call, each item-tidset pair is verified and combined with other item-tidset pairs. This process continues until no candidate item-tidset pairs can be combined. Let us now understand this working with an example. Consider the following transaction record:

The given data is a boolean matrix where, for each cell (i, j), the value denotes whether the j-th item is included in the i-th transaction. 1 means true, while 0 means false. We now call the function for the first time and arrange each item with its tidset in tabular fashion:
k = 1, minimum support = 2

Item        Tidset
Bread       {T1, T4, T5, T7, T8, T9}
Butter      {T1, T2, T3, T4, T6, T8, T9}
Milk        {T3, T5, T6, T7, T8, T9}
Coke        {T2, T4}
Jam         {T1, T8}

We now recursively call the function till no more item-tidset pairs can be combined:

k = 2

Itemset             Tidset
{Bread, Butter}     {T1, T4, T8, T9}
{Bread, Milk}       {T5, T7, T8, T9}
{Bread, Coke}       {T4}
{Bread, Jam}        {T1, T8}
{Butter, Milk}      {T3, T6, T8, T9}
{Butter, Coke}      {T2, T4}
{Butter, Jam}       {T1, T8}
{Milk, Jam}         {T8}

k = 3

Itemset                 Tidset
{Bread, Butter, Milk}   {T8, T9}
{Bread, Butter, Jam}    {T1, T8}

k = 4

Itemset                       Tidset
{Bread, Butter, Milk, Jam}    {T8}

We stop at k = 4 because there are no more item-tidset pairs to combine. Since minimum
support = 2, we conclude the following rules from the given dataset:-
Items Bought        Recommended Products
Bread               Butter
Bread               Milk
Bread               Jam
Butter              Milk
Butter              Coke
Butter              Jam
Bread and Butter    Milk
Bread and Butter    Jam


Advantages over the Apriori algorithm:
1. Memory Requirements: Since the ECLAT algorithm uses a Depth-First Search approach, it uses less memory than the Apriori algorithm.
2. Speed: The ECLAT algorithm is typically faster than the Apriori algorithm.
3. Number of Computations: The ECLAT algorithm does not involve repeated scanning of the data to compute the individual support values.
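
A small recursive sketch of ECLAT's tidset-intersection idea, using the tidsets from the k = 1 table above; it is illustrative rather than an optimized implementation, and it reproduces the frequent itemsets listed in the k = 2 and k = 3 tables.

tidsets = {
    frozenset({"Bread"}):  {"T1", "T4", "T5", "T7", "T8", "T9"},
    frozenset({"Butter"}): {"T1", "T2", "T3", "T4", "T6", "T8", "T9"},
    frozenset({"Milk"}):   {"T3", "T5", "T6", "T7", "T8", "T9"},
    frozenset({"Coke"}):   {"T2", "T4"},
    frozenset({"Jam"}):    {"T1", "T8"},
}
min_support = 2

def eclat(prefixes, min_support, results):
    # Depth-first: extend each itemset by intersecting its tidset with its siblings' tidsets.
    items = sorted(prefixes, key=sorted)
    for i, itemset in enumerate(items):
        new_level = {}
        for other in items[i + 1:]:
            candidate = itemset | other
            tids = prefixes[itemset] & prefixes[other]   # support comes from the tidset intersection
            if len(tids) >= min_support:
                new_level[candidate] = tids
        results.update(new_level)
        if new_level:
            eclat(new_level, min_support, results)       # recurse within this equivalence class
    return results

frequent = eclat(tidsets, min_support, dict(tidsets))
for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), "->", sorted(tids))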
