Unit 3,4,5 ML (CS - AI)
ID3 : This algorithm measures how mixed up the data is at a node using something
called entropy. It then chooses the feature that reduces this disorder the most, i.e., the one with the highest information gain.
C4.5 : This is an improved version of ID3 that can handle missing data and continuous
attributes.
CART : This algorithm uses a different measure called Gini impurity to decide how to split the
data. It can be used for both classification (sorting data into categories) and regression
(predicting continuous values) tasks.
Did you notice anything in the above flowchart? We see that if the weather is cloudy, then we
go to play. Why didn't it split more? Why did it stop there?
To answer this question, we need to know about a few more concepts like entropy, information
gain, and the Gini index. But in simple terms, we can say here that the output of the training dataset
is always "yes" for cloudy weather; since there is no disorder here, we don't need to split the
node further.
The goal here is to decrease uncertainty or disorder in the dataset, and for
this, we use these trees.
Now you must be wondering: how do I know what the root node should be? What should the
decision nodes be? When should I stop splitting? To decide this, there is a metric called "Entropy",
which is the amount of uncertainty in the dataset.
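As a quick illustration of how entropy quantifies that uncertainty, here is a minimal Python sketch (the label lists are made-up examples, not data from these notes):

import math
from collections import Counter

def entropy(labels):
    # Entropy H = -sum(p_i * log2(p_i)) over the class proportions.
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A pure node (all "Yes", like the cloudy-weather branch) has zero entropy,
# so there is nothing left to split.
print(entropy(["Yes", "Yes", "Yes", "Yes"]))   # 0.0
# A perfectly mixed node has the maximum entropy of 1 bit.
print(entropy(["Yes", "Yes", "No", "No"]))     # 1.0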
How Do Decision Tree Algorithms Work?
• Starting at the Root: The algorithm begins at the top, called the “root node,”
representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature or question that
splits the data into the most distinct groups. This is like asking a question at a fork in
the tree.
• Branching Out: Based on the answer to that question, it divides the data into smaller
subsets, creating new branches. Each branch represents a possible route through the
tree.
• Repeating the Process: The algorithm continues asking questions and splitting the data
at each branch until it reaches the final “leaf nodes,” representing the predicted
outcomes or classifications.
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and interpret, making them easy to
understand even for non-experts.
• Handles Both Numerical and Categorical Data: They can work with both types of
data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require normalization or scaling of data.
• Handles Non-Linear Relationships: They can capture non-linear patterns in the data
effectively.
Disadvantages of Decision Trees
• Overfitting Risk: They can easily overfit the training data, especially if the trees are too deep.
• Unstable with Small Changes: Small changes in data can lead to completely different
trees.
• Biased with Imbalanced Data: They tend to be biased if one class dominates the
dataset.
• Can Become Complex: Large trees can become hard to interpret and may lose their
simplicity.
Applications of Decision Trees
• Healthcare
• Finance
• Education
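A minimal sketch of training a decision tree with scikit-learn (the Iris dataset and the hyperparameter values are illustrative assumptions, not prescribed by these notes):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" follows ID3/C4.5-style splits; "gini" (the default) follows CART.
# max_depth limits tree depth to reduce the overfitting risk mentioned above.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))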
Random Forest Algorithm
• We know that a forest comprises numerous trees, and the more trees it has, the more
robust it will be.
• Similarly, the greater the number of trees in a Random Forest Algorithm, the higher its
accuracy and problem-solving ability.
• Random Forest is a classifier that contains several decision trees on various subsets of
the given dataset and takes the average to improve the predictive accuracy of that
dataset.
• Step 1: Select random samples (bootstrap samples) from the given dataset.
• Step 2: The algorithm constructs a decision tree for every sample.
• Step 3: Each decision tree produces a prediction, and voting is performed over all the predictions.
• Step 4: Finally, select the most voted prediction result as the final prediction result.
This combination of multiple models is called Ensemble. Ensemble uses two methods:
Bagging: Creating a different training subset from sample training data with replacement is
called Bagging. The final output is based on majority voting.
Boosting: Combining weak learners into strong learners by creating sequential models such
that the final model has the highest accuracy is called Boosting. Examples: AdaBoost,
XGBoost.
Bagging: From the principle mentioned above, we can understand that Random Forest
uses the Bagging technique.
Bagging is also known as Bootstrap Aggregation and is used by random forest.
The process begins with the original random data. It is organised into samples drawn with
replacement, known as Bootstrap Samples; this process is known as Bootstrapping.
The models are then trained individually on these samples, each yielding its own result.
In the last step, all the results are combined (this is the Aggregation step), and the generated
output is based on majority voting.
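A small scikit-learn sketch of the bagging idea and of a random forest (the Iris dataset and the parameter values are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain bagging: 50 decision trees (the default base estimator), each trained on
# a bootstrap sample; the final prediction is a majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Random forest: bagging of decision trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Bagging accuracy:      ", bagging.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))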
• Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. Consider the below diagram, in which there are two different
categories that are classified using a decision boundary or hyperplane:
• Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs, and we
want a model that can accurately identify whether it is a cat or a dog. Such a model can
be created using the SVM algorithm. We first train our model with many images of
cats and dogs so that it can learn their different features, and then we test it with this
strange creature. The SVM creates a decision boundary between the two classes (cat
and dog) using the extreme cases (support vectors), and on the basis of these support
vectors it will classify the new example as a cat. Consider the below diagram:
• Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset
can be classified into two classes by using a single straight line, then such data is
termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means if a dataset cannot be classified by using a straight line, then such data is termed
non-linear data, and the classifier used is called a Non-linear SVM classifier.
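A short scikit-learn sketch of both variants (the make_moons toy dataset and the kernel/parameter choices are illustrative assumptions):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)          # linear SVM
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)  # non-linear SVM

print("Linear SVM accuracy:    ", linear_svm.score(X_test, y_test))
print("Non-linear SVM accuracy:", rbf_svm.score(X_test, y_test))
print("Number of support vectors:", rbf_svm.support_vectors_.shape[0])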
What is Unsupervised Learning?
• Unsupervised learning is a branch of machine learning that deals with unlabeled data.
• Unlike supervised learning, where the data is labeled with a specific category or
outcome, unsupervised learning algorithms are tasked with finding patterns and
relationships within the data without any prior knowledge of the data's meaning.
• Unsupervised machine learning algorithms find hidden patterns in data without any
human intervention, i.e., we don't give outputs to our model.
• The training data has only input parameter values, and the model discovers the groups or patterns
on its own.
• There are mainly 3 types of algorithms which are used for unsupervised datasets:
• Clustering
• Dimensionality Reduction
• Association Rule Learning
Clustering Algorithms
Clustering in unsupervised machine learning is the process of grouping unlabeled data into
clusters based on their similarities. The goal of clustering is to identify patterns and
relationships in the data without any prior knowledge of the data's meaning.
Broadly, this technique is applied to group data based on different patterns, such as
similarities or differences, that our machine learning model finds.
These algorithms are used to process raw, unclassified data objects into groups. For
example, in the above figure, we have not given output parameter values, so this technique
will be used to group clients based on the input parameters provided by our data.
K-means clustering
Or, a project where your task is to predict whether a loan will be approved or not:
Properties of K-Means Clustering
First property: all the data points within a cluster should be similar to each other.
Second property: data points from different clusters should be as different from each other as possible.
Next, we randomly select the centroid for each cluster. Let's say we want to have 2 clusters, so
k is equal to 2 here. We then randomly select the centroids.
Assign all the points to the closest cluster centroid
Once we have initialized the centroids, we assign each point to the closest cluster centroid:
• Now, once we have assigned all of the points to either cluster, the next step is to
compute the centroids of newly formed clusters:
• Here, the red and green crosses are the new centroids.
• There are essentially three stopping criteria that can be adopted to stop the K-means
algorithm: the centroids of newly formed clusters do not change, the points remain in
the same cluster, or the maximum number of iterations is reached.
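A minimal scikit-learn sketch of K-means with k = 2, matching the example above (the blob dataset and the parameter values are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data with two natural groups (k = 2, as in the example above).
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

# n_init restarts the algorithm from several random centroid initialisations and
# keeps the best result; max_iter is one of the stopping criteria listed above.
km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
labels = km.fit_predict(X)

print("Cluster centroids:\n", km.cluster_centers_)
print("First ten cluster assignments:", labels[:10])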
DBSCAN is a density-based clustering algorithm that groups data points that are closely
packed together and marks outliers as noise based on their density in the feature
space. It identifies clusters as dense regions in the data space, separated by areas of lower
density.
Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical,
DBSCAN excels in handling real-world data irregularities such as:
Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
Noise and Outliers: It effectively identifies and handles noise points without assigning
them to any cluster.
The figure above compares the clustering algorithms: K-Means and hierarchical clustering
handle compact, spherical clusters with varying noise tolerance, while DBSCAN manages
arbitrary-shaped clusters and excels at noise handling.
Key Parameters in DBSCAN
1. eps: This defines the radius of the neighborhood around a data point.
If the distance between two points is less than or equal to eps, they are considered neighbors.
Choosing the right eps is crucial:
If eps is too small, most points will be classified as noise.
If eps is too large, clusters may merge, and the algorithm may fail to distinguish between
them.
A common method to determine eps is by analyzing the k-distance graph.
2. MinPts: This is the minimum number of points required within the eps radius to form a
dense region.
A general rule of thumb is to set MinPts >= D+1, where D is the number of dimensions in the
dataset. For most cases, a minimum value of MinPts = 3 is recommended.
How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. Core points, which have a sufficient number of neighbors (at least MinPts) within the specified radius (epsilon).
2. Border points, which are near core points but lack enough neighbors to be core points
themselves.
3. Noise points, which do not belong to any cluster.
By iteratively expanding clusters from core points and connecting density-reachable points,
DBSCAN forms clusters without relying on rigid assumptions about their shape or size.
In pseudocode, the cluster-expansion step for a core point with epsilon-neighborhood N looks like this:
for each point p' in N:
    if p' is unvisited:
        mark p' as visited
        N' = neighborhood(p', eps)
        if |N'| >= MinPts:
            N = N U N'
    if p' is not a member of any cluster:
        add p' to cluster C
Implementation of the DBSCAN Algorithm in Python

# Import libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Load data in X (four blob-shaped clusters)
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.50, random_state=0)

# Fit DBSCAN and mark the core samples
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Plot result: one colour per cluster, noise points (label -1) in black
unique_labels = sorted(set(labels))
for k in unique_labels:
    colour = "black" if k == -1 else plt.cm.viridis(float(k) / max(len(unique_labels), 1))
    class_member_mask = (labels == k)
    xy = X[class_member_mask]
    plt.plot(xy[:, 0], xy[:, 1], "o", markerfacecolor=colour, markersize=6)
plt.title("DBSCAN clustering")
plt.show()
[Comparison table: DBSCAN vs. K-Means]
Dimensionality reduction can be performed in two main ways:
1. Feature selection: This refers to retaining the relevant (optimal) features and
discarding the irrelevant ones to ensure the high accuracy of the model. Feature
selection methods such as filter, wrapper, and embedded methods are popularly
used.
2. Feature extraction: This process is also termed feature projection, wherein
multidimensional space is converted into a space with lower dimensions. Some
well-known feature extraction methods include principal component analysis (PCA),
linear discriminant analysis (LDA), kernel PCA (K-PCA), and quadratic
discriminant analysis (QDA).
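A minimal sketch of feature extraction with PCA in scikit-learn (the Iris dataset and the choice of 2 components are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# PCA is sensitive to scale, so standardise the features first.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)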
Although one can perform dimensionality reduction with several techniques, the following are
the most commonly used ones:
When a dataset contains several missing values, such variables are eliminated, as they fail to
provide relevant or reliable information. The elimination is accomplished by defining a
threshold level, wherein a variable with more missing values than the threshold is immediately
dropped. This implies that the higher the threshold value, the lower the efficiency of the reduction.
3. Backward feature elimination
This approach is typically used while developing a linear regression or logistic regression
model. In this technique, you can specify the number of features essential for ML algorithms
based on the estimated model performance and tolerated error rate.
The process begins by training the ML model using all 'n' variables provided in the dataset.
Upon training, the model's performance is evaluated. After this, features are eliminated one at
a time, and the model is trained on 'n-1' features n times. The performance of the model is
re-evaluated at each step.
The above iteration is repeated until the variable that makes the least or no difference to the
model's performance is identified. Upon identification, this variable or feature is eliminated,
and you are left with 'n-1' features. The process is repeated until no more
features can be dropped from the dataset.
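scikit-learn's RFE (recursive feature elimination) follows this same backward idea of repeatedly fitting and dropping the weakest feature; a minimal sketch with illustrative dataset and parameter choices:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scale so the linear model converges cleanly

# Repeatedly fit the model and drop the least important feature
# until only 10 features remain (backward elimination).
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
selector.fit(X, y)
print("Kept feature indices:", [i for i, keep in enumerate(selector.support_) if keep])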
4. Forward feature selection
Forward feature selection is the opposite of the backward elimination technique. Rather than
deleting features, we rely on finding the best features that result in an above-average gain
in the model's performance.
In this approach, we start with a single feature and then progressively add features one at a
time. Here, the model is first trained on each feature independently. The feature with the
highest performance is identified, and the model is iteratively run using it, adding one more
feature at each step. The process is repeated until adding further features no longer improves
the model's performance.
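A corresponding sketch of forward selection using scikit-learn's SequentialFeatureSelector (the dataset, estimator, and the target of 5 features are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Start from an empty feature set and greedily add the feature that most improves
# cross-validated performance, stopping once 5 features have been selected.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, direction="forward")
sfs.fit(X, y)
print("Selected feature indices:", [i for i, keep in enumerate(sfs.get_support()) if keep])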
5. Random forest
Random forest is a feature selection approach with a built-in feature significance package that
identifies feature importance. As a result, the need to program it individually is eliminated.
In this approach, multiple decision trees are constructed against the target feature to identify the
important variable subset using statistics for each attribute. Moreover, as random forest accepts
input only in the form of numeric data, a one-hot encoding process is essential to convert
categorical input data into numeric data.
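A minimal sketch of reading the built-in feature importances of a random forest in scikit-learn (the dataset and the number of trees are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# Built-in importance scores: higher means the feature contributed more to the splits.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")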
Example Scenario
This example demonstrates the logic of collaborative filtering, where recommendations are
based on the preferences of similar users.
Cold Start Problem: New users and items lack enough interaction data for
recommendations.
Data Sparsity: Most users interact with only a small subset of available items, making
similarity calculations difficult.
Scalability Issues: For large datasets, real-time computation of similarities can be
computationally expensive.
Collaborative recommender systems are a type of recommendation system that suggests items
to users based on their past interactions and the behavior of similar users. Unlike content-based
systems, which rely on item attributes, collaborative filtering uses patterns in user behavior to
make predictions. There are two main types of collaborative filtering: memory-based filtering
and model-based filtering.
Working Principle
1. The system calculates the similarity between users by comparing their past interactions
with items.
2. Each user is assigned a weight based on their similarity with the target user.
3. A group of similar users (nearest neighbors) is selected.
4. The system predicts the target user's preferences based on the weighted average of the
ratings given by similar users.
5. Finally, the system recommends items that the similar users have liked but the target
user has not yet interacted with.
Example
Consider a movie recommendation system where User A and User B have rated several movies
in a similar way. If User B has watched and highly rated a movie that User A has not yet seen,
that movie will be recommended to User A.
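A small numeric sketch of the working principle above, using cosine similarity between users and a similarity-weighted average to predict an unseen rating (the tiny rating matrix is invented for illustration):

import numpy as np

# Rows = users A, B, C; columns = movies 1-4; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],   # User A has not seen movie 3
    [5, 5, 4, 1],   # User B
    [1, 2, 5, 5],   # User C
], dtype=float)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target, movie = 0, 2  # predict User A's rating for movie 3
others = [i for i in range(len(ratings)) if i != target and ratings[i, movie] > 0]

# Weight each neighbour's rating by how similar they are to the target user.
sims = np.array([cosine(ratings[target], ratings[i]) for i in others])
neighbour_ratings = np.array([ratings[i, movie] for i in others])
prediction = np.dot(sims, neighbour_ratings) / sims.sum()

print("Predicted rating for User A, movie 3:", round(prediction, 2))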
Challenges
Data Sparsity: Since most users interact with only a small portion of items, finding
enough similar users can be difficult.
Scalability Issues: As the number of users increases, the computation of pairwise
similarities becomes complex and time-consuming.
Cold-Start Problem: A new user without prior interactions cannot be effectively
compared to existing users.
Item-based collaborative filtering focuses on the relationships between items rather than users.
The system identifies items that have been rated or interacted with similarly by multiple users
and recommends new items to users based on their past interactions.
Working Principle
1. The system calculates similarity scores between different items based on how users
interact with them.
2. Items that frequently appear together in user preferences are considered similar.
3. If a user has shown interest in one item, the system recommends other items that are
similar based on past interactions.
Example
If many users who watched and liked Movie X also watched and liked Movie Y, then a user
who has just watched Movie X will be recommended Movie Y.
Advantages
More Stable Over Time: Items do not change as frequently as user preferences,
making recommendations more consistent.
Scalability: Since the number of items is generally smaller than the number of users,
item-based filtering can be more efficient.
Challenges
Lack of Context Awareness: The system does not consider why items are rated
similarly, which may lead to irrelevant recommendations.
Sparsity of Data: If there are few interactions with a particular item, computing its
similarity with other items becomes difficult.
Unit- 5
Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps them accordingly so that the
relationship can be exploited profitably. It tries to find interesting relations or associations
among the variables of a dataset, based on different rules for discovering the relations between
variables in the database.
Association rule learning is one of the very important concepts of machine learning, and it
is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is a technique used by various big retailers to discover the
associations between items. We can understand it by taking the example of a supermarket, where
all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he most likely will also buy butter, eggs, or milk, so
these products are stored on the same shelf or mostly nearby. Consider the below diagram:
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
Association rule learning works on the concept of an if-then statement, such as: if A, then B.
Here the "if" element is called the antecedent, and the "then" element is called the consequent. These
types of relationships, where we can find an association between two items, are
known as single cardinality. Association rule learning is all about creating rules, and as the number
of items increases, the cardinality increases accordingly. So, to measure the associations between
thousands of data items, there are several metrics. These metrics are given below:
o Support
o Confidence
o Lift
Let's understand each of them:
Support
Support is the frequency of an item, i.e., how frequently it appears in the dataset. It is defined as
the fraction of the transactions T that contain the itemset X. For an itemset X and a set of
transactions T, it can be written as:
Support(X) = Frequency(X) / |T|
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the items X
and Y occur together in the dataset, given that X already occurs. It is the ratio of the number of
transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Frequency(X ∪ Y) / Frequency(X)
Lift
Lift measures the strength of a rule relative to the chance co-occurrence of X and Y. It is the
ratio of the observed support of X and Y together to the support expected if X and Y were independent:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
If Lift = 1, X and Y are independent; Lift > 1 indicates a positive association; Lift < 1 indicates a
negative association.
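A tiny worked computation of support, confidence, and lift in plain Python (the four transactions are invented for illustration):

transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / n

sup_bread = support({"bread"})
sup_butter = support({"butter"})
sup_both = support({"bread", "butter"})

confidence = sup_both / sup_bread          # P(butter | bread)
lift = confidence / sup_butter             # > 1 means positive association

print(f"Support(bread, butter) = {sup_both:.2f}")
print(f"Confidence(bread -> butter) = {confidence:.2f}")
print(f"Lift(bread -> butter) = {lift:.2f}")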
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. It uses a breadth-first search and a hash
tree to count the itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be
bought together. It can also be used in the healthcare field to find drug reactions for patients.
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-
first search technique to find frequent itemsets in a transaction database. It executes faster
than the Apriori algorithm.
F-P Growth Algorithm
The F-P growth algorithm stands for Frequent Pattern growth, and it is an improved version of the
Apriori algorithm. It represents the database in the form of a tree structure known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent
patterns.
It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:
o Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to determine
the association between items.
o Medical Diagnosis: Association rules help in identifying the probability of illness for a
particular disease, which supports diagnosing and treating patients.
o Protein Sequence: The association rules help in determining the synthesis of artificial
Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.
Apriori Algorithm is a foundational method in data mining used for discovering frequent
itemsets and generating association rules. Its significance lies in its ability to identify
relationships between items in large datasets which is particularly valuable in market basket
analysis.
For example, if a grocery store finds that customers who buy bread often also buy butter, it
can use this information to optimize product placement or marketing strategies.
How Does the Apriori Algorithm Work?
The Apriori Algorithm operates through a systematic process that involves several key steps:
1. Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to identify
individual items (1-itemsets) and their frequencies. It then establishes a minimum support
threshold, which determines whether an itemset is considered frequent.
2. Creating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified,
the algorithm generates candidate 2-itemsets by combining frequent items. This process
continues iteratively, forming larger candidate itemsets (k-itemsets) until no more frequent
itemsets can be found.
3. Removing Infrequent Itemsets: The algorithm employs a pruning technique based
on the Apriori property, which states that if an itemset is infrequent, all its supersets
must also be infrequent. This significantly reduces the number of combinations that need
to be evaluated.
4. Generating Association Rules: After identifying frequent itemsets, the algorithm
generates association rules that illustrate how items relate to one another, using metrics
like support, confidence, and lift to evaluate the strength of these relationships.
Key Metrics of Apriori Algorithm
Support: This metric measures how frequently an item appears in the dataset relative to
the total number of transactions. A higher support indicates a more significant presence
of the itemset in the dataset. Support tells us how often a particular item or combination
of items appears in all the transactions (“Bread is bought in 20% of all transactions.”)
Confidence: Confidence assesses the likelihood that an item Y is purchased when item X
is purchased. It provides insight into the strength of the association between two items.
Confidence tells us how often items go together. (“If bread is bought, butter is bought
75% of the time.”)
Lift: Lift evaluates how much more likely two items are to be purchased together
compared to being purchased independently. A lift greater than 1 suggests a strong
positive association. Lift shows how strong the connection is between items. (“Bread and
butter are much more likely to be bought together than by chance.”)
Let's understand the concept of the Apriori algorithm with the help of an example. Consider the
following dataset; we will find frequent itemsets and generate association rules for it:
Frequent 1-Itemsets
All items have support ≥ 50%, so they qualify as frequent 1-itemsets. If any item had
support < 50%, it would be omitted from the frequent 1-itemsets.
Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support.
For this use case, we get 3 item pairs: (Bread, Butter), (Bread, Milk) and (Butter, Milk), and
we calculate their support in the same way as in Step 2.
Candidate 2-Itemsets
Frequent 2-itemsets:
{Bread, Butter} and {Bread, Milk} both meet the 50% threshold, but {Butter, Milk} does not
meet the threshold, so it is omitted.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their support.
For the triplets, we only get one candidate, i.e. {Bread, Butter, Milk}, and we calculate its
support.
Candidate 3-Itemsets
Since this does not meet the 50% threshold, there are no frequent 3-itemsets.
Step 5: Generate Association Rules
Now we generate rules from the frequent itemsets and calculate confidence.
Rule 1: If Bread → Butter (if customer buys bread, the customer will buy butter also)
Support of {Bread, Butter} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
Rule 2: If Butter → Bread (if customer buys butter, the customer will buy bread also)
Support of {Bread, Butter} = 3.
Support of {Butter} = 3.
Confidence = 3/3 = 100% (Passes threshold).
Rule 3: If Bread → Milk (if customer buys bread, the customer will buy milk also)
Support of {Bread, Milk} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
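For larger datasets, the same analysis is usually done with a library rather than by hand; the sketch below uses the third-party mlxtend package (assumed to be installed) on an illustrative transaction list, not the exact dataset from the example above:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["bread", "butter", "jam"],
    ["milk", "jam"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules with confidence >= 70%.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])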
The Apriori Algorithm, as demonstrated in the bread-butter example, is widely used in
modern startups like Zomato, Swiggy, and other food delivery platforms. These companies
use it to perform market basket analysis, which helps them identify customer behavior
patterns and optimize recommendations.
Applications of Apriori Algorithm
Below are some applications of the Apriori algorithm used in today's companies and startups:
1. E-commerce: Used to recommend products that are often bought together, like laptop +
laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger + fries, to
offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on what users often
watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalized offers, such
as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by finding
commonly purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on users' past activities,
like protein shakes + workouts.
The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal.
It is one of the popular methods of association rule mining, and a more efficient and scalable
version of the Apriori algorithm. While the Apriori algorithm works in a horizontal sense,
imitating the breadth-first search of a graph, the ECLAT algorithm works in a vertical manner,
just like the depth-first search of a graph. This vertical approach makes ECLAT faster than the
Apriori algorithm.
How does the algorithm work?
The basic idea is to use intersections of Transaction Id Sets (tidsets) to compute the support
value of a candidate and to avoid generating subsets that do not exist in the prefix
tree. In the first call of the function, all single items are used along with their tidsets. Then
the function is called recursively, and in each recursive call, each item-tidset pair is verified
and combined with other item-tidset pairs. This process continues until no more candidate item-
tidset pairs can be combined. Let us now understand the above working with an
example. Consider the following transaction record:
The above data is a boolean matrix where, for each cell (i, j), the value denotes whether
the j-th item is included in the i-th transaction or not: 1 means true, while 0 means false. We
now call the function for the first time and arrange each item with its tidset in a tabular
fashion:
k = 1, minimum support = 2 (Item | Tidset table of all single items)
We now recursively call the function till no more item-tidset pairs can be combined:
k = 2 (Item | Tidset table of candidate pairs)
k = 3 (Item | Tidset tables of candidate triples)
k = 4 (Item | Tidset table)
We stop at k = 4 because there are no more item-tidset pairs to combine. Since the minimum
support is 2, we conclude the following rules from the given dataset:
Items Bought | Recommended Products
Bread | Butter
Bread | Milk
Bread | Jam
Butter | Milk
Butter | Coke
Butter | Jam
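A minimal plain-Python sketch of the tidset-intersection idea described above (the toy transactions are invented and do not reproduce the exact table used in this example):

from itertools import combinations

transactions = {
    1: {"bread", "butter", "jam"},
    2: {"bread", "butter", "milk"},
    3: {"butter", "coke"},
    4: {"bread", "milk", "jam"},
}
min_support = 2

# Vertical representation: item -> set of transaction ids (tidset).
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(frozenset([item]), set()).add(tid)

# Level-wise ECLAT-style expansion: the support of a candidate is the size of
# the intersection of its parents' tidsets.
frequent = {k: v for k, v in tidsets.items() if len(v) >= min_support}
level = frequent
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        candidate, tids = a | b, ta & tb
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            next_level[candidate] = tids
    frequent.update(next_level)
    level = next_level

for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), "support =", len(tids))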