MODULE 3
SIMILARITY BASED LEARNING,
REGRESSION ANALYSIS, DECISION
TREE LEARNING
DR. SHIVA PRASAD KM
ASSOCIATE PROFESSOR
DEPARTMENT OF CSE
RYMEC BALLARI
M: 7899964163
OVERVIEW: CHAPTERS 4, 5 & 6 FROM TEXTBOOK 1
■ Introduction to Similarity-Based Learning
■ Nearest-Neighbor Learning
■ Weighted K-Nearest-Neighbor Algorithm
■ Nearest Centroid Classifier
■ Locally Weighted Regression (LWR).
■ Problems
■ Introduction to Regression
■ Introduction to Linear Regression
■ Multiple Linear Regression
■ Polynomial Regression
■ Logistic Regression.
INTRODUCTION TO SIMILARITY-BASED LEARNING
■ Similarity-based classifiers use similarity measures to locate the nearest neighbours of a test instance and classify it accordingly, in contrast with other learning mechanisms such as decision trees or neural networks.
■ Similarity-based learning is also called instance-based learning or just-in-time learning, since it does not build an abstract model of the training instances and performs lazy classification of new instances.
■ An instance is an entity or an example in the training dataset. It is described by a set of features or
attributes. One attribute describes the class label or category of the instance.
INSTANCE-BASED LEARNING VS MODEL-BASED LEARNING
NEAREST NEIGHBOUR LEARNING
■ A natural approach to similarity-based classification is K-Nearest Neighbour (K-NN), a non-parametric method used for both classification and regression problems.
■ KNN is one of the most basic yet essential classification algorithms in machine learning. It belongs to
the supervised learning domain and finds intense application in pattern recognition, data mining, and
intrusion detection.
■ It is a simple and powerful non-parametric algorithm that predicts the category of a test instance from the k training samples closest to it and assigns it to the category with the largest probability (majority vote).
■ Various distance metrics are used in the K-NN algorithm, such as Euclidean distance, Manhattan distance and Minkowski distance.
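■ As a rough illustration, the three distance metrics can be computed as below; this is a minimal Python sketch with made-up feature vectors, not an example from the textbook.

```python
import numpy as np

def euclidean(a, b):
    # L2 distance: square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 distance: sum of absolute differences
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    # Generalisation: p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

x = np.array([6.8, 9.0])   # hypothetical training instance, e.g. (CGPA, assessment marks)
q = np.array([7.5, 8.0])   # hypothetical query / test instance
print(euclidean(x, q), manhattan(x, q), minkowski(x, q, p=2))
```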
■ Consider the student performance training dataset of 8 instances shown in the table below, which describes the performance of individual students in a course along with the CGPA obtained in previous semesters. Based on this data, classify whether a student will pass or fail using the K-Nearest Neighbour algorithm.
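■ The 8-instance table itself is not reproduced in this text, so the sketch below uses hypothetical (CGPA, result) pairs; it simply finds the k closest training instances and takes a majority vote.

```python
from collections import Counter

# Hypothetical (CGPA, result) pairs standing in for the 8-instance training table;
# the values are illustrative only.
train = [(9.2, "Pass"), (8.0, "Pass"), (8.5, "Pass"), (6.0, "Fail"),
         (6.5, "Fail"), (8.2, "Pass"), (5.8, "Fail"), (8.9, "Pass")]

def knn_predict(query_cgpa, k=3):
    # Sort training instances by distance to the query and keep the k closest
    neighbours = sorted(train, key=lambda t: abs(t[0] - query_cgpa))[:k]
    # Majority vote over the neighbours' class labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict(7.8, k=3))   # -> "Pass" with this illustrative data
```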
WEIGHTED K-NEAREST NEIGHBOUR ALGORITHM
■ The weighted KNN is an extension of K-NN. It chooses the neighbors by using the weighted
distance.
■ The K-Nearest Neighbor algorithm has some serious limitations, as its performance is solely dependent on the choice of k and on the distance metric used for the decision rule.
■ In weighted K-NN, the nearest k points are given a weight using a function called a kernel function. The intuition behind weighted K-NN is to give more weight to the points which are nearby and less weight to the points which are further away.
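■ A minimal weighted K-NN sketch using an inverse-distance kernel (one common choice of kernel function; the data values are illustrative) is shown below.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3):
    """Weighted k-NN with an inverse-distance kernel (one common choice)."""
    dists = np.linalg.norm(X_train - query, axis=1)
    idx = np.argsort(dists)[:k]                 # indices of the k nearest points
    weights = 1.0 / (dists[idx] + 1e-8)         # closer points get larger weights
    scores = {}
    for i, w in zip(idx, weights):
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + w
    return max(scores, key=scores.get)          # class with the largest weighted vote

X_train = np.array([[9.2], [8.0], [6.0], [6.5], [8.5]])
y_train = ["Pass", "Pass", "Fail", "Fail", "Pass"]
# -> "Fail": the two nearer Fail points outweigh the single Pass point
print(weighted_knn_predict(X_train, y_train, np.array([7.0]), k=3))
```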
NEAREST CENTROID ALGORITHM
■ A simple alternative to k-NN classifiers for similarity-based classification is the Nearest Centroid Classifier (also called the Mean Difference Classifier).
■ The Nearest Centroid Classifier is a simple and intuitive classification method. It assigns a class label
to a new data point based on the closest class centroid (mean of features for each class).
■ Algorithm:
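■ A minimal Python sketch of the two steps — compute the centroid (mean feature vector) of each class from the training data, then assign a query to the class with the nearest centroid — is shown below with made-up data.

```python
import numpy as np

def fit_centroids(X, y):
    # Step 1: compute the mean feature vector (centroid) of each class
    return {c: X[np.array(y) == c].mean(axis=0) for c in set(y)}

def predict_nearest_centroid(centroids, query):
    # Step 2: assign the query to the class whose centroid is closest
    return min(centroids, key=lambda c: np.linalg.norm(query - centroids[c]))

X = np.array([[3, 1], [5, 2], [4, 2], [7, 6], [6, 8], [8, 7]])
y = ["A", "A", "A", "B", "B", "B"]
centroids = fit_centroids(X, y)
print(predict_nearest_centroid(centroids, np.array([6, 5])))   # -> "B"
```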
LOCALLY WEIGHTED REGRESSION (LWR)
■ Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs local regression by combining a regression model with a nearest-neighbour model.
■ LWR is also referred to as a memory-based method, since it retains the training data at prediction time but uses only the training instances local to the point of interest.
■ Using the nearest-neighbour idea, we find the instances closest to the test instance and fit a linear function to those k nearest instances in a local regression model.
■ The key idea is to fit, for each query point, a linear function over its k neighbours that minimises the error, so that the overall prediction is no longer a single straight line but a smooth curve.
■ Ordinary linear regression, by contrast, finds a single linear relationship between the input x and the output y.
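■ A minimal LWR sketch is shown below: a Gaussian kernel (one common choice) weights the training instances around the query point, and a weighted least-squares line is fitted locally. The data and the bandwidth value are illustrative.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression at a single query point.

    tau (bandwidth) controls how quickly weights decay with distance;
    the Gaussian kernel used here is one common choice.
    """
    Xb = np.c_[np.ones(len(X)), X]                    # add bias column
    xq = np.r_[1.0, x_query]
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)                                    # diagonal weight matrix
    # Weighted normal equations: theta = (X'WX)^-1 X'Wy
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.5])
print(lwr_predict(X, y, np.array([2.5]), tau=0.8))
```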
REGRESSION AND TYPES OF REGRESSION
■ Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
■ Regression is a supervised learning technique which helps in finding the correlation between variables.
■ It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect relationship between variables.
■ Regression fits a line or curve to the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimised. This distance tells us whether the model has captured a strong relationship or not. The function of regression analysis is given by:
Y = f(x)
■ Here, y is called the dependent variable and x is called the independent variable.
TYPES OF REGRESSION
■ Linear Regression: A type of regression where a line is fitted to the given data to describe the linear relationship between one independent variable and one dependent variable.
■ Multiple Regression: A type of regression where a line is fitted to describe the linear relationship between two or more independent variables and one dependent variable.
■ Polynomial Regression: A non-linear regression method in which an Nth-degree polynomial is used to model the relationship between one dependent and one independent variable; it can also be extended to model two or more independent variables and one dependent variable.
■ Logistic Regression: Used for predicting a categorical dependent variable from one or more independent variables. When the outcome has two classes it is also called a binary classifier.
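■ As an illustration of the last type, a minimal logistic regression sketch using scikit-learn is shown below; the hours-studied/pass-fail data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied vs. pass(1)/fail(0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# The model outputs P(y = 1 | x) through the sigmoid 1 / (1 + e^-(a0 + a1*x)),
# and the class label is obtained by thresholding that probability at 0.5.
print(model.predict_proba([[4.5]]))   # probability of each class
print(model.predict([[4.5]]))         # predicted class label
```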
LINEAR REGRESSION
■ In its simplest form, linear regression is created by fitting a line through the scattered data points.
■ The line takes the form of the equation:
y = a0 + a1x + e
■ Here a0 is the intercept, which represents the bias, and a1 represents the slope of the line. These are called regression coefficients; e is the error in prediction.
■ The assumptions of linear regression are listed below:
– The observations (y) are random and are mutually independent.
– The differences between the predicted and true values are called errors. The errors are mutually independent and identically distributed, for example normally distributed with zero mean and constant variance.
– The distribution of the error term is independent of the joint distribution of the independent variables, and the unknown regression parameters are constants.
LINEAR REGRESSION
■ The formulas used to calculate the values of a0 and a1 are given below:
a1 = ( n Σ(xi yi) − Σxi Σyi ) / ( n Σ(xi²) − (Σxi)² )
a0 = ȳ − a1 x̄
where x̄ and ȳ are the means of the x and y values respectively.
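■ A minimal Python sketch applying these formulas to a small set of made-up (x, y) observations:

```python
import numpy as np

# Hypothetical (x, y) observations
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.8, 3.1, 4.1, 4.9, 6.2])

n = len(x)
# Slope and intercept from the least-squares formulas above
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
a0 = np.mean(y) - a1 * np.mean(x)
print(a0, a1)   # fitted line: y_hat = a0 + a1 * x
```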
LINEAR REGRESSION IN MATRIX FORM
■ Matrix notations can be used for representing the values of independent and dependent variables.
■ Each observation is written as yi = a0 + a1xi + ei, for i = 1, …, n, and stacking all n observations gives the matrix form:
– Y = Xa + e, where X is an n × 2 matrix, Y is an n × 1 vector, a is a 2 × 1 column vector and e is an n × 1 column vector.
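■ A minimal sketch of the matrix form, solving the standard least-squares normal equations a = (XᵀX)⁻¹XᵀY with NumPy on the same kind of made-up data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.8, 3.1, 4.1, 4.9, 6.2])

X = np.c_[np.ones(len(x)), x]          # n x 2 design matrix with columns [1, x_i]
# Normal equations: a = (X'X)^-1 X'Y
a = np.linalg.inv(X.T @ X) @ X.T @ y
print(a)                               # [a0, a1], the same values as the scalar formulas
```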
MULTIPLE LINEAR REGRESSION
■ A multiple regression model involves multiple predictors or independent variables and one dependent variable. It is an extension of the linear regression model.
■ The basic assumption of multiple linear regression is that the independent variables are not highly correlated and hence the multicollinearity problem does not exist.
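■ A minimal multiple linear regression sketch with scikit-learn, including a quick correlation check between the two (hypothetical) predictors; values near ±1 would signal a multicollinearity problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictors (e.g. study hours, attendance %) and one target
X = np.array([[2, 60], [4, 70], [6, 80], [8, 85], [10, 95]], dtype=float)
y = np.array([45.0, 55.0, 65.0, 72.0, 88.0])

# Quick check on the multicollinearity assumption: correlation between the predictors
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # a0 and the coefficient of each predictor
print(model.predict([[7, 82]]))
```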
MODULE 3
DECISION TREE AND BAYESIAN
LEARNING
OVERVIEW
• INTRODUCTION TO DECISION TREES
• ENTROPY AND INFORMATION GAIN
• DECISION TREE INDUCTION ALGORITHM: ID3
• DECISION TREE REPRESENTATION
• REGRESSION TREE AND ITS PROBLEMS
INTRODUCTION TO DECISION TREES
■ A decision tree is a supervised machine learning algorithm used for both classification
and regression tasks.
■ It's a predictive modeling tool that works by creating a tree-like structure, representing
decisions and their possible consequences. Decision tree learning involves building a
tree from a dataset by recursively splitting it based on different attributes or features.
■ The representation of a decision tree is in the form of a tree structure where each
internal node represents a feature, each branch represents a decision rule based on that
feature, and each leaf node represents the outcome or the class label.
PROBLEMS SOLVED BY DECISION TREES
■ Classification problems: Decision trees are excellent for classification tasks. Whether the classes are
binary (e.g., yes/no, true/false) or multiclass (categorizing into more than two classes), decision trees can
effectively handle these scenarios.
■ Predictive analysis: When you need to make predictions about continuous variables (regression
problems), decision trees can also be used. For instance, predicting the price of a house based on various
features like area, location, number of bedrooms, etc.
■ Feature selection: Decision trees can help identify important features in a dataset. They assess the
relevance of each feature by their placement in the tree, making them valuable for feature selection in
larger datasets.
■ Data exploration and understanding: Decision trees are great for understanding the relationships and
structures within the data. They provide a transparent, easily interpretable model that can be useful in
uncovering insights about the relationships between different attributes.
■ Handling non-linear relationships: Decision trees can model non-linear relationships between features
and the target variable, as they recursively split the data based on these relationships.
■ Medical diagnosis: Decision trees are used in the medical field for diagnostic purposes. They can be
employed to determine the likelihood of a disease based on symptoms or other medical test results.
■ Customer relationship management: For customer segmentation or churn prediction in business
applications, decision trees can help identify patterns that differentiate customer behaviors.
■ Anomaly detection: Decision trees can be utilized to identify anomalies or outliers in a dataset by
flagging cases that do not fit the patterns established by the tree.
■ Text and sentiment analysis: In natural language processing, decision trees can be applied for
sentiment analysis, spam filtering, and text categorization.
■ Multi-output problems: Decision trees can handle multiple outputs. For example, if a system needs to
make multiple decisions simultaneously based on various input features, decision trees can be structured
to handle such scenarios.
ENTROPY AND INFORMATION GAIN
■ Entropy and Information Gain are fundamental concepts used in decision tree learning, particularly in
the construction and selection of nodes during the tree-building process.
■ Entropy: Entropy is a measure of impurity or disorder in a set of examples. In the context of decision
trees, entropy is used to determine the homogeneity of a group of examples.
■ Lower entropy implies that the examples within a group are more uniform or pure, while higher entropy
indicates the presence of mixed or diverse examples.
■ The formula for entropy, often used in the context of decision trees, is given by:
Entropy = −∑ (pi · log2(pi))
where pi represents the proportion of examples in the dataset that belong to class i, and the summation runs over all classes in the dataset.
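■ A minimal Python sketch of this entropy formula; the label counts are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))   # mixed set -> about 0.940
print(entropy(["yes"] * 4))                # pure set  -> 0.0
```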
■ Information Gain: Information Gain is the measure of the effectiveness of a particular attribute in
classifying the examples. When constructing a decision tree, the algorithm aims to choose the attribute
that best splits the dataset. Information Gain helps decide which attribute to use for this split by
determining the attribute that provides the most knowledge or the most significant reduction in entropy.
■ The formula for Information Gain is based on the entropy. For a given attribute A, the Information
Gain is calculated as follows:
Information Gain = Entropy before splitting − Weighted average of entropy after splitting
■ The attribute with the highest Information Gain is considered the most informative or the best attribute
for splitting the dataset at a particular node in the decision tree.
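■ A minimal Python sketch of Information Gain as the entropy before the split minus the weighted average entropy after the split; the 14-example labels and attribute values below are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [lab for lab, a in zip(labels, attribute_values) if a == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Illustrative 14-example set split by a two-valued attribute
labels = ["yes"] * 9 + ["no"] * 5
attr   = ["weak"] * 8 + ["strong"] * 6
print(information_gain(labels, attr))
```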
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
■ The ID3 (Iterative Dichotomiser 3) algorithm is a classic approach used for constructing decision trees
by using the training dataset with labels.
■ It was developed by Ross Quinlan and is specifically designed for learning decision trees from datasets.
■ The key steps of the ID3 algorithm are as follows:
■ Calculate Entropy: Compute the entropy of the target variable in the dataset.
■ Feature Selection: For each attribute in the dataset:
– Calculate the Information Gain for that attribute.
– Select the attribute that has the highest Information Gain as the best attribute to split the dataset.
■ Create a Node: Create a decision node based on the best attribute.
■ Split the Dataset: Split the dataset into subsets based on the values of the selected attribute.
■ Repeat: For each subset, recursively apply the above steps until one of the following conditions is met:
– All the instances in the subset belong to the same class (pure node).
– There are no more attributes left to split, or a predefined stopping criterion is met.
■ Let's consider an example to demonstrate the ID3 algorithm. Suppose we have a dataset that
contains weather and play tennis data, and we aim to construct a decision tree using the ID3
algorithm.
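■ The full worked example is not reproduced here, but a minimal recursive ID3 sketch on a small, made-up slice of weather/play-tennis style data illustrates the steps: compute entropy, pick the attribute with the highest Information Gain, split, and recurse.

```python
import math
from collections import Counter

def entropy(rows, target):
    n = len(rows)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attr, target):
    total = entropy(rows, target)
    n = len(rows)
    for v in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == v]
        total -= (len(subset) / n) * entropy(subset, target)
    return total

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:              # pure node -> leaf with that class
        return labels[0]
    if not attrs:                          # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for v in set(r[best] for r in rows):   # one branch per value of the best attribute
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)
    return tree

# A small, made-up slice of a weather/play-tennis style dataset
data = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]
print(id3(data, ["Outlook", "Wind"], "Play"))
```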
REGRESSION TREE
■ Regression trees are a variant of decision trees where the target feature is a continuous-valued variable. These trees can be constructed using an algorithm called reduction in variance, which uses standard deviation to choose the best splitting attribute.
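■ A minimal sketch of this splitting criterion — the reduction in (population) standard deviation achieved by splitting on an attribute — with made-up values:

```python
import statistics

def sd_reduction(targets, attribute_values):
    """Reduction in standard deviation obtained by splitting on an attribute."""
    n = len(targets)
    sd_before = statistics.pstdev(targets)
    sd_after = 0.0
    for v in set(attribute_values):
        subset = [t for t, a in zip(targets, attribute_values) if a == v]
        sd_after += (len(subset) / n) * statistics.pstdev(subset)
    return sd_before - sd_after

# Illustrative continuous target (e.g. marks) split by a categorical attribute
marks   = [7.8, 6.9, 8.5, 9.1, 5.2, 6.0, 8.8, 7.4]
project = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
print(sd_reduction(marks, project))   # the attribute with the largest reduction is chosen
```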
PROBLEM ON REGRESSION TREE
■ Construct a regression tree using the table below, which consists of 10 data instances and 3 attributes: "Assessment", "Assignment" and "Project". The target attribute is "Result", which is a continuous attribute.
VALIDATING AND PRUNING OF DECISION TREES
■ Validating and pruning decision trees is a crucial part of building accurate and robust machine learning
models.
■ Decision trees are prone to overfitting, which means they can learn to capture noise and details in the training data that do not generalize well to new, unseen data.
■ Validation and pruning are techniques used to mitigate this issue and improve the performance of
decision tree models.
■ Pre-pruning tunes the hyperparameters of the decision tree prior to training. It uses the heuristic known as 'early stopping', which halts the growth of the decision tree before it reaches its full depth, so that the tree-building process avoids producing leaves with very small samples.
■ Post-pruning does the opposite of pre-pruning: it allows the decision tree to grow to its full depth and then removes branches to prevent the model from overfitting. The algorithm continues to partition the data into smaller subsets until the final subsets are homogeneous in terms of the outcome variable. These final subsets consist of only a few data points, so the tree learns the training data to a T; when a new data point that differs from the learned data is introduced, it may not be predicted well.
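■ A minimal scikit-learn sketch contrasting pre-pruning (hyperparameters such as max_depth and min_samples_leaf) with post-pruning via cost-complexity pruning (ccp_alpha); the iris dataset and the parameter values are illustrative, and validation is done on a held-out split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via hyperparameters such as max_depth / min_samples_leaf
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the tree fully, then prune with cost-complexity pruning (ccp_alpha)
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_tr, y_tr)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
for name, m in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, m.get_depth(), m.score(X_te, y_te))   # depth and validation accuracy
```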