Unit 3,4,5 ML (CS - AI)
ID3 : This algorithm measures how mixed up the data is at a node using something
called entropy. It then chooses the feature that reduces this disorder the most, i.e., the one with the highest information gain.
C4.5 : This is an improved version of ID3 that can handle missing data and continuous
attributes.
CART : This algorithm uses a different measure called Gini impurity to decide how to split the
data. It can be used for both classification (sorting data into categories) and regression
(predicting continuous values) tasks.
Did you notice anything in the above flowchart? We see that if the weather is cloudy, then we
go to play. Why didn't it split more? Why did it stop there?
To answer this question, we need to know about a few more concepts like entropy, information
gain, and the Gini index. But in simple terms, we can say here that the output of the training dataset
is always "yes" for cloudy weather; since there is no disorder here, we don't need to split the
node further.
The goal here is to decrease uncertainty or disorder in the dataset, and for
this, we use these trees.
Now you must be wondering: how do I know what the root node should be? What should the
decision nodes be? When should I stop splitting? To decide this, there is a metric called "Entropy",
which is the amount of uncertainty in the dataset.
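As a quick illustration of how entropy quantifies that uncertainty, here is a minimal Python sketch (the label lists are made-up examples, not data from these notes):

import math
from collections import Counter

def entropy(labels):
    # Entropy H = -sum(p_i * log2(p_i)) over the class proportions.
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A pure node (all "Yes", like the cloudy-weather branch) has zero entropy,
# so there is nothing left to split.
print(entropy(["Yes", "Yes", "Yes", "Yes"]))   # 0.0
# A perfectly mixed node has the maximum entropy of 1 bit.
print(entropy(["Yes", "Yes", "No", "No"]))     # 1.0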
How Do Decision Tree Algorithms Work?
• Starting at the Root: The algorithm begins at the top, called the “root node,”
representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature or question that
splits the data into the most distinct groups. This is like asking a question at a fork in
the tree.
• Branching Out: Based on the answer to that question, it divides the data into smaller
subsets, creating new branches. Each branch represents a possible route through the
tree.
• Repeating the Process: The algorithm continues asking questions and splitting the data
at each branch until it reaches the final “leaf nodes,” representing the predicted
outcomes or classifications.
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and interpret, making them easy to
understand even for non-experts.
• Handles Both Numerical and Categorical Data: They can work with both types of
data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require normalization or scaling of data.
• Handles Non-Linear Relationships: They can capture non-linear patterns in the data
effectively.
Disadvantages of Decision Trees
• Overfitting Risk: They can easily overfit the training data, especially if the trees are too deep.
• Unstable with Small Changes: Small changes in data can lead to completely different
trees.
• Biased with Imbalanced Data: They tend to be biased if one class dominates the
dataset.
• Can Become Complex: Large trees can become hard to interpret and may lose their
simplicity.
Applications of Decision Trees
• Healthcare
• Finance
• Education
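A minimal sketch of training a decision tree with scikit-learn (the Iris dataset and the hyperparameter values are illustrative assumptions, not prescribed by these notes):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" follows ID3/C4.5-style splits; "gini" (the default) follows CART.
# max_depth limits tree depth to reduce the overfitting risk mentioned above.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))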
Random Forest Algorithm
• We know that a forest comprises numerous trees, and the more trees it has, the more
robust it will be.
• Similarly, the greater the number of trees in a Random Forest Algorithm, the higher its
accuracy and problem-solving ability.
• Random Forest is a classifier that contains several decision trees on various subsets of
the given dataset and takes the average to improve the predictive accuracy of that
dataset.
• Step 1: Select random samples (bootstrap samples) from the given dataset.
• Step 2: The algorithm constructs a decision tree for every sample.
• Step 3: Each decision tree produces a prediction, and voting is performed over all the predictions.
• Step 4: Finally, select the most voted prediction result as the final prediction result.
This combination of multiple models is called Ensemble. Ensemble uses two methods:
Bagging: Creating a different training subset from sample training data with replacement is
called Bagging. The final output is based on majority voting.
Boosting: Combining weak learners into strong learners by creating sequential models such
that the final model has the highest accuracy is called Boosting. Examples: AdaBoost,
XGBoost.
Bagging: From the principle mentioned above, we can understand that Random Forest
uses the Bagging technique.
Bagging is also known as Bootstrap Aggregation and is used by random forest.
The process begins with the original random data. It is organised into samples drawn with
replacement, known as Bootstrap Samples; this process is known as Bootstrapping.
The models are then trained individually on these samples, each yielding its own result.
In the last step, all the results are combined (this is the Aggregation step), and the generated
output is based on majority voting.
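A small scikit-learn sketch of the bagging idea and of a random forest (the Iris dataset and the parameter values are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain bagging: 50 decision trees (the default base estimator), each trained on
# a bootstrap sample; the final prediction is a majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Random forest: bagging of decision trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Bagging accuracy:      ", bagging.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))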
• Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. Consider the below diagram, in which there are two different
categories that are classified using a decision boundary or hyperplane:
• Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs, and we
want a model that can accurately identify whether it is a cat or a dog. Such a model can
be created using the SVM algorithm. We first train our model with many images of
cats and dogs so that it can learn their different features, and then we test it with this
strange creature. The SVM creates a decision boundary between the two classes (cat
and dog) using the extreme cases (support vectors), and on the basis of these support
vectors it will classify the new example as a cat. Consider the below diagram:
• Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset
can be classified into two classes by using a single straight line, then such data is
termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means if a dataset cannot be classified by using a straight line, then such data is termed
non-linear data, and the classifier used is called a Non-linear SVM classifier.
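A short scikit-learn sketch of both variants (the make_moons toy dataset and the kernel/parameter choices are illustrative assumptions):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)          # linear SVM
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)  # non-linear SVM

print("Linear SVM accuracy:    ", linear_svm.score(X_test, y_test))
print("Non-linear SVM accuracy:", rbf_svm.score(X_test, y_test))
print("Number of support vectors:", rbf_svm.support_vectors_.shape[0])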
What is Unsupervised Learning?
• Unsupervised learning is a branch of machine learning that deals with unlabeled data.
• Unlike supervised learning, where the data is labeled with a specific category or
outcome, unsupervised learning algorithms are tasked with finding patterns and
relationships within the data without any prior knowledge of the data's meaning.
• Unsupervised machine learning algorithms find hidden patterns in data without any
human intervention, i.e., we don't give outputs to our model.
• The training data has only input parameter values, and the model discovers the groups or patterns
on its own.
• There are mainly 3 types of algorithms which are used for unsupervised datasets:
• Clustering
• Dimensionality Reduction
• Association Rule Learning
Clustering Algorithms
Clustering in unsupervised machine learning is the process of grouping unlabeled data into
clusters based on their similarities. The goal of clustering is to identify patterns and
relationships in the data without any prior knowledge of the data's meaning.
Broadly, this technique is applied to group data based on different patterns, such as
similarities or differences, that our machine learning model finds.
These algorithms are used to process raw, unclassified data objects into groups. For
example, in the above figure, we have not given output parameter values, so this technique
will be used to group clients based on the input parameters provided by our data.
K-means clustering
Or, a project where your task is to predict whether a loan will be approved or not:
Properties of K-Means Clustering
First property: all the data points within a cluster should be similar to each other.
Second property: data points from different clusters should be as different from each other as possible.
Next, we randomly select the centroid for each cluster. Let's say we want to have 2 clusters, so
k is equal to 2 here. We then randomly select the centroids.
Assign all the points to the closest cluster centroid
Once we have initialized the centroids, we assign each point to the closest cluster centroid:
• Now, once we have assigned all of the points to either cluster, the next step is to
compute the centroids of newly formed clusters:
• Here, the red and green crosses are the new centroids.
• There are essentially three stopping criteria that can be adopted to stop the K-means
algorithm: the centroids of newly formed clusters do not change, the points remain in
the same cluster, or the maximum number of iterations is reached.
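A minimal scikit-learn sketch of K-means with k = 2, matching the example above (the blob dataset and the parameter values are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data with two natural groups (k = 2, as in the example above).
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

# n_init restarts the algorithm from several random centroid initialisations and
# keeps the best result; max_iter is one of the stopping criteria listed above.
km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
labels = km.fit_predict(X)

print("Cluster centroids:\n", km.cluster_centers_)
print("First ten cluster assignments:", labels[:10])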
DBSCAN is a density-based clustering algorithm that groups data points that are closely
packed together and marks outliers as noise based on their density in the feature
space. It identifies clusters as dense regions in the data space, separated by areas of lower
density.
Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical,
DBSCAN excels in handling real-world data irregularities such as:
Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
Noise and Outliers: It effectively identifies and handles noise points without assigning
them to any cluster.
The figure above compares the clustering algorithms: K-Means and hierarchical clustering
handle compact, spherical clusters with varying noise tolerance, while DBSCAN manages
arbitrary-shaped clusters and excels at noise handling.
Key Parameters in DBSCAN
1. eps: This defines the radius of the neighborhood around a data point.
If the distance between two points is less than or equal to eps, they are considered neighbors.
Choosing the right eps is crucial:
If eps is too small, most points will be classified as noise.
If eps is too large, clusters may merge, and the algorithm may fail to distinguish between
them.
A common method to determine eps is by analyzing the k-distance graph.
2. MinPts: This is the minimum number of points required within the eps radius to form a
dense region.
A general rule of thumb is to set MinPts >= D+1, where D is the number of dimensions in the
dataset. For most cases, a minimum value of MinPts = 3 is recommended.
How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. Core points, which have a sufficient number of neighbors (at least MinPts) within the specified radius (epsilon).
2. Border points, which are near core points but lack enough neighbors to be core points
themselves.
3. Noise points, which do not belong to any cluster.
By iteratively expanding clusters from core points and connecting density-reachable points,
DBSCAN forms clusters without relying on rigid assumptions about their shape or size.
In pseudocode, the cluster-expansion step for a core point with epsilon-neighborhood N looks like this:
for each point p' in N:
    if p' is unvisited:
        mark p' as visited
        N' = neighborhood(p', eps)
        if |N'| >= MinPts:
            N = N U N'
    if p' is not a member of any cluster:
        add p' to cluster C
Implementation of the DBSCAN Algorithm in Python

# Import libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Load data in X (four blob-shaped clusters)
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.50, random_state=0)

# Fit DBSCAN and mark the core samples
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Plot result: one colour per cluster, noise points (label -1) in black
unique_labels = sorted(set(labels))
for k in unique_labels:
    colour = "black" if k == -1 else plt.cm.viridis(float(k) / max(len(unique_labels), 1))
    class_member_mask = (labels == k)
    xy = X[class_member_mask]
    plt.plot(xy[:, 0], xy[:, 1], "o", markerfacecolor=colour, markersize=6)
plt.title("DBSCAN clustering")
plt.show()
[Comparison table: DBSCAN vs. K-Means]
Dimensionality reduction can be performed in two main ways:
1. Feature selection: This refers to retaining the relevant (optimal) features and
discarding the irrelevant ones to ensure the high accuracy of the model. Feature
selection methods such as filter, wrapper, and embedded methods are popularly
used.
2. Feature extraction: This process is also termed feature projection, wherein
multidimensional space is converted into a space with lower dimensions. Some
well-known feature extraction methods include principal component analysis (PCA),
linear discriminant analysis (LDA), kernel PCA (K-PCA), and quadratic
discriminant analysis (QDA).
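A minimal sketch of feature extraction with PCA in scikit-learn (the Iris dataset and the choice of 2 components are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# PCA is sensitive to scale, so standardise the features first.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)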
Although one can perform dimensionality reduction with several techniques, the following are
the most commonly used ones:
When a dataset contains several missing values, such variables are eliminated, as they fail to
provide relevant or reliable information. The elimination is accomplished by defining a
threshold level, wherein a variable with more missing values than the threshold is immediately
dropped. This implies that the higher the threshold value, the lower the efficiency of the reduction.
3. Backward feature elimination
This approach is typically used while developing a linear regression or logistic regression
model. In this technique, you can specify the number of features essential for ML algorithms
based on the estimated model performance and tolerated error rate.
The process begins by training the ML model using all 'n' variables provided in the dataset.
Upon training, the model's performance is evaluated. After this, features are eliminated one at
a time, and the model is trained on 'n-1' features n times. The performance of the model is
re-evaluated at each step.
The above iteration is repeated until the variable that makes the least or no difference to the
model's performance is identified. Upon identification, this variable or feature is eliminated,
and you are left with 'n-1' features. The process is repeated until no more
features can be dropped from the dataset.
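scikit-learn's RFE (recursive feature elimination) follows this same backward idea of repeatedly fitting and dropping the weakest feature; a minimal sketch with illustrative dataset and parameter choices:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scale so the linear model converges cleanly

# Repeatedly fit the model and drop the least important feature
# until only 10 features remain (backward elimination).
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
selector.fit(X, y)
print("Kept feature indices:", [i for i, keep in enumerate(selector.support_) if keep])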
4. Forward feature selection
Forward feature selection is the opposite of the backward elimination technique. Rather than
deleting features, we rely on finding the best features that result in an above-average gain
in the model's performance.
In this approach, we start with a single feature and then progressively add features one at a
time. Here, the model is first trained on each feature independently. The feature with the
highest performance is identified, and the model is iteratively run using it, adding one more
feature at each step. The process is repeated until adding further features no longer improves
the model's performance.
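A corresponding sketch of forward selection using scikit-learn's SequentialFeatureSelector (the dataset, estimator, and the target of 5 features are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Start from an empty feature set and greedily add the feature that most improves
# cross-validated performance, stopping once 5 features have been selected.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, direction="forward")
sfs.fit(X, y)
print("Selected feature indices:", [i for i, keep in enumerate(sfs.get_support()) if keep])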
5. Random forest
Random forest is a feature selection approach with a built-in feature significance package that
identifies feature importance. As a result, the need to program it individually is eliminated.
In this approach, multiple decision trees are constructed against the target feature to identify the
important variable subset using statistics for each attribute. Moreover, as random forest accepts
input only in the form of numeric data, a one-hot encoding process is essential to convert
categorical input data into numeric data.
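A minimal sketch of reading the built-in feature importances of a random forest in scikit-learn (the dataset and the number of trees are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# Built-in importance scores: higher means the feature contributed more to the splits.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")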
Example Scenario
This example demonstrates the logic of collaborative filtering, where recommendations are
based on the preferences of similar users.
Cold Start Problem: New users and items lack enough interaction data for
recommendations.
Data Sparsity: Most users interact with only a small subset of available items, making
similarity calculations difficult.
Scalability Issues: For large datasets, real-time computation of similarities can be
computationally expensive.
Collaborative recommender systems are a type of recommendation system that suggests items
to users based on their past interactions and the behavior of similar users. Unlike content-based
systems, which rely on item attributes, collaborative filtering uses patterns in user behavior to
make predictions. There are two main types of collaborative filtering: memory-based filtering
and model-based filtering.
Working Principle
1. The system calculates the similarity between users by comparing their past interactions
with items.
2. Each user is assigned a weight based on their similarity with the target user.
3. A group of similar users (nearest neighbors) is selected.
4. The system predicts the target user's preferences based on the weighted average of the
ratings given by similar users.
5. Finally, the system recommends items that the similar users have liked but the target
user has not yet interacted with.
Example
Consider a movie recommendation system where User A and User B have rated several movies
in a similar way. If User B has watched and highly rated a movie that User A has not yet seen,
that movie will be recommended to User A.
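A small numeric sketch of the working principle above, using cosine similarity between users and a similarity-weighted average to predict an unseen rating (the tiny rating matrix is invented for illustration):

import numpy as np

# Rows = users A, B, C; columns = movies 1-4; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],   # User A has not seen movie 3
    [5, 5, 4, 1],   # User B
    [1, 2, 5, 5],   # User C
], dtype=float)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target, movie = 0, 2  # predict User A's rating for movie 3
others = [i for i in range(len(ratings)) if i != target and ratings[i, movie] > 0]

# Weight each neighbour's rating by how similar they are to the target user.
sims = np.array([cosine(ratings[target], ratings[i]) for i in others])
neighbour_ratings = np.array([ratings[i, movie] for i in others])
prediction = np.dot(sims, neighbour_ratings) / sims.sum()

print("Predicted rating for User A, movie 3:", round(prediction, 2))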
Challenges
Data Sparsity: Since most users interact with only a small portion of items, finding
enough similar users can be difficult.
Scalability Issues: As the number of users increases, the computation of pairwise
similarities becomes complex and time-consuming.
Cold-Start Problem: A new user without prior interactions cannot be effectively
compared to existing users.
Item-based collaborative filtering focuses on the relationships between items rather than users.
The system identifies items that have been rated or interacted with similarly by multiple users
and recommends new items to users based on their past interactions.
Working Principle
1. The system calculates similarity scores between different items based on how users
interact with them.
2. Items that frequently appear together in user preferences are considered similar.
3. If a user has shown interest in one item, the system recommends other items that are
similar based on past interactions.
Example
If many users who watched and liked Movie X also watched and liked Movie Y, then a user
who has just watched Movie X will be recommended Movie Y.
Advantages
More Stable Over Time: Items do not change as frequently as user preferences,
making recommendations more consistent.
Scalability: Since the number of items is generally smaller than the number of users,
item-based filtering can be more efficient.
Challenges
Lack of Context Awareness: The system does not consider why items are rated
similarly, which may lead to irrelevant recommendations.
Sparsity of Data: If there are few interactions with a particular item, computing its
similarity with other items becomes difficult.
Unit- 5
Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps them accordingly so that the
relationship can be exploited profitably. It tries to find interesting relations or associations
among the variables of a dataset, based on different rules for discovering the relations between
variables in the database.
Association rule learning is one of the very important concepts of machine learning, and it
is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is a technique used by various big retailers to discover the
associations between items. We can understand it by taking the example of a supermarket, where
all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he most likely will also buy butter, eggs, or milk, so
these products are stored on the same shelf or mostly nearby. Consider the below diagram:
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
Association rule learning works on the concept of an if-then statement, such as: if A, then B.
Here the "if" element is called the antecedent, and the "then" element is called the consequent. These
types of relationships, where we can find an association between two items, are
known as single cardinality. Association rule learning is all about creating rules, and as the number
of items increases, the cardinality increases accordingly. So, to measure the associations between
thousands of data items, there are several metrics. These metrics are given below:
o Support
o Confidence
o Lift
Let's understand each of them:
Support
Support is the frequency of an item, i.e., how frequently it appears in the dataset. It is defined as
the fraction of the transactions T that contain the itemset X. For an itemset X and a set of
transactions T, it can be written as:
Support(X) = Frequency(X) / |T|
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the items X
and Y occur together in the dataset, given that X already occurs. It is the ratio of the number of
transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Frequency(X ∪ Y) / Frequency(X)
Lift
Lift measures the strength of a rule relative to the chance co-occurrence of X and Y. It is the
ratio of the observed support of X and Y together to the support expected if X and Y were independent:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
If Lift = 1, X and Y are independent; Lift > 1 indicates a positive association; Lift < 1 indicates a
negative association.
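A tiny worked computation of support, confidence, and lift in plain Python (the four transactions are invented for illustration):

transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / n

sup_bread = support({"bread"})
sup_butter = support({"butter"})
sup_both = support({"bread", "butter"})

confidence = sup_both / sup_bread          # P(butter | bread)
lift = confidence / sup_butter             # > 1 means positive association

print(f"Support(bread, butter) = {sup_both:.2f}")
print(f"Confidence(bread -> butter) = {confidence:.2f}")
print(f"Lift(bread -> butter) = {lift:.2f}")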
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. It uses a breadth-first search and a hash
tree to count the itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be
bought together. It can also be used in the healthcare field to find drug reactions for patients.
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-
first search technique to find frequent itemsets in a transaction database. It executes faster
than the Apriori algorithm.
F-P Growth Algorithm
The F-P growth algorithm stands for Frequent Pattern growth, and it is an improved version of the
Apriori algorithm. It represents the database in the form of a tree structure known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent
patterns.
It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:
o Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to determine
the association between items.
o Medical Diagnosis: Association rules help in identifying the probability of illness for a
particular disease, which supports diagnosing and treating patients.
o Protein Sequence: The association rules help in determining the synthesis of artificial
Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.
Apriori Algorithm is a foundational method in data mining used for discovering frequent
itemsets and generating association rules. Its significance lies in its ability to identify
relationships between items in large datasets which is particularly valuable in market basket
analysis.
For example, if a grocery store finds that customers who buy bread often also buy butter, it
can use this information to optimize product placement or marketing strategies.
How Does the Apriori Algorithm Work?
The Apriori Algorithm operates through a systematic process that involves several key steps:
1. Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to identify
individual items (1-itemsets) and their frequencies. It then establishes a minimum support
threshold, which determines whether an itemset is considered frequent.
2. Creating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified,
the algorithm generates candidate 2-itemsets by combining frequent items. This process
continues iteratively, forming larger candidate itemsets (k-itemsets) until no more frequent
itemsets can be found.
3. Removing Infrequent Itemsets: The algorithm employs a pruning technique based
on the Apriori property, which states that if an itemset is infrequent, all its supersets
must also be infrequent. This significantly reduces the number of combinations that need
to be evaluated.
4. Generating Association Rules: After identifying frequent itemsets, the algorithm
generates association rules that illustrate how items relate to one another, using metrics
like support, confidence, and lift to evaluate the strength of these relationships.
Key Metrics of Apriori Algorithm
Support: This metric measures how frequently an item appears in the dataset relative to
the total number of transactions. A higher support indicates a more significant presence
of the itemset in the dataset. Support tells us how often a particular item or combination
of items appears in all the transactions (“Bread is bought in 20% of all transactions.”)
Confidence: Confidence assesses the likelihood that an item Y is purchased when item X
is purchased. It provides insight into the strength of the association between two items.
Confidence tells us how often items go together. (“If bread is bought, butter is bought
75% of the time.”)
Lift: Lift evaluates how much more likely two items are to be purchased together
compared to being purchased independently. A lift greater than 1 suggests a strong
positive association. Lift shows how strong the connection is between items. (“Bread and
butter are much more likely to be bought together than by chance.”)
Let's understand the concept of the Apriori algorithm with the help of an example. Consider the
following dataset; we will find frequent itemsets and generate association rules for it:
Frequent 1-Itemsets
All items have support ≥ 50%, so they qualify as frequent 1-itemsets. If any item had
support < 50%, it would be omitted from the frequent 1-itemsets.
Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support.
For this use case, we get 3 item pairs: (Bread, Butter), (Bread, Milk) and (Butter, Milk), and
we calculate their support in the same way as in Step 2.
Candidate 2-Itemsets
Frequent 2-itemsets:
{Bread, Butter} and {Bread, Milk} both meet the 50% threshold, but {Butter, Milk} does not
meet the threshold, so it is omitted.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their support.
For the triplets, we only get one candidate, i.e. {Bread, Butter, Milk}, and we calculate its
support.
Candidate 3-Itemsets
Since this does not meet the 50% threshold, there are no frequent 3-itemsets.
Step 5: Generate Association Rules
Now we generate rules from the frequent itemsets and calculate confidence.
Rule 1: If Bread → Butter (if customer buys bread, the customer will buy butter also)
Support of {Bread, Butter} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
Rule 2: If Butter → Bread (if customer buys butter, the customer will buy bread also)
Support of {Bread, Butter} = 3.
Support of {Butter} = 3.
Confidence = 3/3 = 100% (Passes threshold).
Rule 3: If Bread → Milk (if customer buys bread, the customer will buy milk also)
Support of {Bread, Milk} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
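For larger datasets, the same analysis is usually done with a library rather than by hand; the sketch below uses the third-party mlxtend package (assumed to be installed) on an illustrative transaction list, not the exact dataset from the example above:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["bread", "butter", "jam"],
    ["milk", "jam"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules with confidence >= 70%.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])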
The Apriori Algorithm, as demonstrated in the bread-butter example, is widely used in
modern startups like Zomato, Swiggy, and other food delivery platforms. These companies
use it to perform market basket analysis, which helps them identify customer behavior
patterns and optimize recommendations.
Applications of Apriori Algorithm
Below are some applications of the Apriori algorithm used in today's companies and startups:
1. E-commerce: Used to recommend products that are often bought together, like laptop +
laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger + fries, to
offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on what users often
watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalized offers, such
as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by finding
commonly purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on users' past activities,
like protein shakes + workouts.
The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal.
It is one of the popular methods of association rule mining, and a more efficient and scalable
version of the Apriori algorithm. While the Apriori algorithm works in a horizontal sense,
imitating the breadth-first search of a graph, the ECLAT algorithm works in a vertical manner,
just like the depth-first search of a graph. This vertical approach makes ECLAT faster than the
Apriori algorithm.
How does the algorithm work?
The basic idea is to use intersections of Transaction Id Sets (tidsets) to compute the support
value of a candidate and to avoid generating subsets that do not exist in the prefix
tree. In the first call of the function, all single items are used along with their tidsets. Then
the function is called recursively, and in each recursive call, each item-tidset pair is verified
and combined with other item-tidset pairs. This process continues until no more candidate item-
tidset pairs can be combined. Let us now understand the above working with an
example. Consider the following transaction record:
The above data is a boolean matrix where, for each cell (i, j), the value denotes whether
the j-th item is included in the i-th transaction or not: 1 means true, while 0 means false. We
now call the function for the first time and arrange each item with its tidset in a tabular
fashion:
k = 1, minimum support = 2 (Item | Tidset table of all single items)
We now recursively call the function till no more item-tidset pairs can be combined:
k = 2 (Item | Tidset table of candidate pairs)
k = 3 (Item | Tidset tables of candidate triples)
k = 4 (Item | Tidset table)
We stop at k = 4 because there are no more item-tidset pairs to combine. Since the minimum
support is 2, we conclude the following rules from the given dataset:
Items Bought | Recommended Products
Bread | Butter
Bread | Milk
Bread | Jam
Butter | Milk
Butter | Coke
Butter | Jam
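A minimal plain-Python sketch of the tidset-intersection idea described above (the toy transactions are invented and do not reproduce the exact table used in this example):

from itertools import combinations

transactions = {
    1: {"bread", "butter", "jam"},
    2: {"bread", "butter", "milk"},
    3: {"butter", "coke"},
    4: {"bread", "milk", "jam"},
}
min_support = 2

# Vertical representation: item -> set of transaction ids (tidset).
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(frozenset([item]), set()).add(tid)

# Level-wise ECLAT-style expansion: the support of a candidate is the size of
# the intersection of its parents' tidsets.
frequent = {k: v for k, v in tidsets.items() if len(v) >= min_support}
level = frequent
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        candidate, tids = a | b, ta & tb
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            next_level[candidate] = tids
    frequent.update(next_level)
    level = next_level

for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), "support =", len(tids))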