BCS602 | MACHINE LEARNING| VTU Belagavi
Module-3
Chapter – 01 - Similarity-based Learning
Nearest-Neighbor Learning
k-Nearest Neighbors (k-NN) Learning
Definition:
o k-NN is a non-parametric, similarity-based algorithm used for both classification and
regression.
o It predicts the class or value of a test instance based on the ‘K’ nearest neighbors in the
training data.
Working:
o Classification:
The algorithm determines the class of a test instance by considering the ‘K’
nearest neighbors and selecting the class with the majority vote.
o Regression:
The output is the mean of the target variable values of the ‘K’ nearest
neighbors.
Assumption:
o k-NN relies on the assumption that similar objects are closer to each other in the feature
space.
Instance-Based Learning:
o Memory-Based: The algorithm does not build a prediction model ahead of time; it simply stores the training data and uses it to make predictions when a test instance is presented.
o Lazy Learning: No model is constructed during training; the learning process
happens only during testing when predictions are required.
Distance Metric:
o The most common distance metric used is Euclidean distance to measure the
closeness of training data instances to the test instance.
Choosing ‘K’:
o The value of ‘K’ determines how many neighbors should be considered for the prediction.
It is typically selected by experimenting with different values of K to find the optimal one
that produces the most accurate predictions.
Classification Process:
o For a discrete target variable (classification): The class of the test instance is
determined by the majority vote of the 'K' nearest neighbors.
o For a continuous target variable (regression): The output is the mean of the output
variable values of the ‘K’ nearest neighbors.
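As a rough illustration of the classification case described above, the following is a minimal Python/NumPy sketch of k-NN with Euclidean distance and majority voting (the toy dataset and the function name knn_predict are illustrative, not from the textbook).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the class of x_test by majority vote among its k nearest neighbors."""
    # Euclidean distance from the test instance to every training instance
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training instances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Illustrative toy data: two features, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # expected: "A"
```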
Advantages:
o Simple and intuitive.
o Effective for small to medium-sized datasets.
o Can handle multi-class classification.
Disadvantages:
o Computationally expensive during prediction because it requires calculating
distances to all training data instances.
o Performance may degrade with high-dimensional data (curse of dimensionality).
Weighted k-Nearest-Neighbor Algorithm
Overview:
o Weighted k-NN is an extension of the k-NN algorithm.
o It improves upon k-NN by assigning weights to neighbors based on their distance from the
test instance.
Motivation:
o Traditional k-NN assigns equal importance to all the ‘k’ nearest neighbors, which can lead to
poor performance when:
Neighbors are at varying distances.
The nearest instances are more relevant than the farther ones.
o Weighted k-NN addresses this by making closer neighbors more influential.
Working Principle:
o Weights are inversely proportional to distance: closer neighbors get higher weights, while farther neighbors get lower weights.
o The final prediction is based on the weighted majority vote (classification) or the
weighted average (regression) of the k nearest neighbors.
Weight Assignment:
o Uniform Weighting: All neighbors are given the same weight (as in standard k-NN).
o Distance-Based Weighting: Weights are computed based on the inverse distance, giving
closer neighbors more influence.
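A minimal sketch of distance-based weighting is given below; the inverse-distance weight and the small constant eps used to avoid division by zero are illustrative choices, not a formulation prescribed by the text.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=3, eps=1e-8):
    """Classify x_test by a weighted vote, weighting each neighbor by 1/distance."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    # Inverse-distance weights: closer neighbors contribute more to the vote
    weights = 1.0 / (distances[nearest] + eps)
    # Accumulate the weight assigned to each class label
    scores = {}
    for idx, w in zip(nearest, weights):
        label = y_train[idx]
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# For regression, the same weights give a weighted mean of the neighbors' targets:
# prediction = np.dot(weights, y_train[nearest]) / weights.sum()
```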
Advantages:
o Addresses the limitations of standard k-NN by considering the relative importance of
neighbors.
o Performs better in datasets where closer neighbors are more relevant to the
prediction.
Applications:
o Classification: Predict the class of the test instance by weighted voting of the k nearest
neighbors.
o Regression: Predict the output value by computing the weighted mean of the k nearest
neighbors.
Limitations:
o Computational cost increases as distance calculations and weight assignments are performed
for each query.
o Sensitive to the choice of the distance metric (e.g., Euclidean, Manhattan, etc.).
Nearest Centroid Classifier
The Nearest Centroid Classifier (also known as the Mean Difference Classifier) is a simple alternative to k-NN for similarity-based classification.
The idea of this classifier is to assign a test instance to the class whose centroid (mean) is closest to that instance.
Algorithm
Inputs: Training dataset T, distance metric d, test instance t
Output: Predicted class or category
1. Compute the mean/centroid of each class.
2. Compute the distance between the test instance and the mean/centroid of each class (e.g., Euclidean distance).
3. Predict the class by choosing the class with the smallest distance.
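A minimal Python/NumPy sketch of these three steps (illustrative, not the textbook's code):

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    """Assign x_test to the class whose centroid (mean vector) is closest."""
    classes = np.unique(y_train)
    # Step 1: compute the centroid (mean) of each class
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    # Step 2: Euclidean distance from the test instance to each centroid
    dists = {c: np.linalg.norm(x_test - m) for c, m in centroids.items()}
    # Step 3: choose the class with the smallest distance
    return min(dists, key=dists.get)
```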
Locally Weighted Regression (LWR)
Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs local regression by combining a regression model with the nearest neighbors approach.
LWR is also referred to as a memory-based method because it retains the training data at prediction time but uses only the training instances that lie locally around the point of interest.
Using the nearest neighbors algorithm, we find the instances that are closest to a test instance and fit a linear function to those 'K' nearest instances in the local regression model.
The key idea is to fit, for each query point, a local linear function that minimizes the error over its 'K' neighbors; because a separate local fit is made at every point, the overall prediction is no longer a straight line but a curve.
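One standard way to realize this idea is to weight each training point by its closeness to the query (for example, with a Gaussian kernel) and solve a weighted least-squares fit per query point. The sketch below follows that common formulation; the bandwidth parameter tau and the function name are illustrative assumptions.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression prediction at a single query point.

    X: (n, d) training inputs, y: (n,) targets, tau: kernel bandwidth.
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])           # add a bias column
    xq = np.hstack([1.0, np.atleast_1d(x_query)])  # query point with bias
    # Gaussian weights: points near the query get weights close to 1
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: beta = (X^T W X)^(-1) X^T W y
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ beta
```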
Chapter – 02
Regression Analysis
Introduction to Regression
Definition:
Regression analysis is a supervised learning technique used to model the relationship between
one or more independent variables (x) and a dependent variable (y).
Objective:
The goal is to predict or forecast the dependent variable (y) based on the independent variables (x), which are also called explanatory or predictor variables.
Mathematical Representation:
The relationship is represented by a function of the form y = f(x), where x denotes the independent variable(s) and y the dependent variable.
Purpose:
Regression analysis helps to determine how the dependent variable changes when an independent
variable is varied while others remain constant.
It answers key questions such as:
o What is the relationship between variables?
o What is the strength and nature (linear or non-linear) of the relationship?
o What is the relevance and contribution of each variable?
Applications:
Sales forecasting
Bond values in portfolio management
Insurance premiums
Agricultural yield predictions
Real estate pricing
Prediction Focus:
Regression is primarily used for predicting continuous or quantitative variables, such as price,
revenue, and other measurable factors.
Introduction to Linear Regression
Definition:
Linear Regression is a fundamental supervised learning algorithm used to model the
relationship between one or more independent variables (predictors) and a dependent variable
(target).
It assumes a linear relationship between the variables.
Objective:
The primary goal of linear regression is to find a linear equation that best fits the data points.
This equation is used to predict the dependent variable based on the values of the independent
variables.
Mathematical Representation:
The relationship is represented as:
y = a0 + a1x + e
where a0 is the intercept, a1 is the slope (regression coefficient), and e is the error (residual) term.
Assumptions:
Linearity: The relationship between x and y is linear.
Independence: Observations are independent of each other.
Homoscedasticity: Constant variance of errors across all levels of x.
Normality: The residuals (errors) are normally distributed.
Types of Linear Regression:
Simple Linear Regression: Involves one independent variable.
Multiple Linear Regression: Involves two or more independent variables.
Applications:
Predicting house prices based on features like size and location.
Estimating sales based on advertising expenditure.
Forecasting stock prices or other financial metrics.
Modeling growth trends in industries.
Advantages:
Easy to implement and interpret.
Computationally efficient when the underlying relationship is approximately linear.
Limitations:
Struggles with non-linear relationships.
Sensitive to outliers, which can distort predictions.
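For a concrete picture of the fit, the intercept and slope that minimize the squared error can be computed in closed form (ordinary least squares). The sketch below assumes that standard formulation; the toy data are illustrative.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return intercept a0 and slope a1 of the least-squares line y = a0 + a1*x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a0 = y_mean - a1 * x_mean
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])
a0, a1 = fit_simple_linear_regression(x, y)
print(a0, a1)  # approximately 0.0 and 2.0 for this toy data
```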
Multiple Linear Regression
Multiple regression model involves multiple predictors or independent variables and one
dependent variable.
This is an extension of the linear regression problem. A basic assumption of multiple linear regression is that the independent variables are not highly correlated, so the multicollinearity problem does not arise.
It is also assumed that the residuals are normally distributed.
Definition:
Multiple Linear Regression (MLR) is an extension of simple linear regression, where multiple
independent variables (predictors) are used to model the relationship with a single dependent
variable (target).
Mathematical Representation:
The relationship is represented as:
y = a0 + a1x1 + a2x2 + ... + anxn + e
where x1, x2, ..., xn are the independent variables (predictors), a0, a1, ..., an are the regression coefficients, and e is the error term.
Assumptions of Multiple Linear Regression:
No Multicollinearity: The independent variables should not be highly correlated with each
other. Multicollinearity can cause issues in estimating the coefficients accurately.
Normality of Residuals: The residuals (errors) should be normally distributed for valid
inference and hypothesis testing.
o Linearity: The relationship between each independent variable and the dependent variable
should be linear.
o Independence of Errors: Observations should be independent of each other.
o Homoscedasticity: The variance of residuals should be constant across all levels of the
independent variables.
Applications:
o Predicting house prices based on multiple features (size, location, number of rooms, etc.).
o Estimating the sales of a product based on various factors (price, advertising budget,
competition, etc.).
o Modeling health outcomes based on multiple risk factors (age, BMI, physical activity, etc.).
Advantages:
o Can model the relationship between multiple predictors and a single outcome.
o Provides insights into how different predictors influence the dependent variable.
Limitations:
o If multicollinearity exists (high correlation between predictors), it can affect the
stability and interpretability of the model.
o Can be computationally complex with a large number of predictors.
o Sensitive to outliers, which can distort the relationship between variables.
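Under these assumptions, the coefficients are typically estimated by ordinary least squares. A minimal sketch using the normal equations, beta = (X^T X)^(-1) X^T y, with a bias column added for the intercept, is given below (illustrative, not the textbook's notation).

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Ordinary least-squares estimate of the coefficients, intercept first."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of 1s for a0
    # Normal equations: beta = (X^T X)^(-1) X^T y (pinv used for numerical safety)
    beta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
    return beta                            # [a0, a1, ..., an]

def predict(X, beta):
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])
    return Xb @ beta
```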
Polynomial Regression
Introduction to Polynomial Regression
Definition:
Polynomial Regression is a form of regression analysis that models the relationship between the
independent variable(s) and the dependent variable as a polynomial function.
It is used when the relationship between variables is non-linear and cannot be effectively modeled
using linear regression.
Purpose:
When the data exhibits a non-linear trend, linear regression may result in large errors.
Polynomial regression overcomes this limitation by fitting a curved line to the data.
Approaches to Handle Non-Linearity:
o Transform the data (for example, with a logarithmic transformation) so that a linear model can still be applied.
o Fit a higher-degree (polynomial) model directly to the data, which is the approach taken by polynomial regression.
Features of Polynomial Regression:
Captures curved relationships between variables.
Provides a more flexible model compared to linear regression.
Applications:
Modeling growth trends in populations or markets.
Predicting real-world phenomena such as temperature variations, physics
experiments, or chemical reactions.
Engineering designs involving complex relationships.
Advantages:
Capable of modeling non-linear relationships without transforming the data.
Provides a better fit for datasets with curved trends.
Limitations:
Increasing the polynomial degree can lead to overfitting the training data.
Sensitive to outliers, which can significantly distort the fitted curve.
May require careful tuning of the degree n to balance bias and variance.
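In practice, polynomial regression is often implemented as linear regression on expanded features (x, x^2, ..., x^n). The short sketch below uses NumPy's built-in polynomial fit to illustrate the idea on toy data.

```python
import numpy as np

# Toy data with a curved (roughly quadratic) trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 5.2, 10.1, 16.8, 26.3])

# Fit a degree-2 polynomial: y ~ c2*x^2 + c1*x + c0
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

print(coeffs)                      # fitted coefficients, highest degree first
print(np.mean((y - y_hat) ** 2))   # mean squared error of the fit
```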
Logistic Regression
Introduction to Logistic Regression
Definition:
Logistic Regression is a supervised learning algorithm used for classification problems,
particularly binary classification, where the output is a categorical variable with two possible
outcomes (e.g., yes/no, pass/fail, spam/not spam).
Purpose:
Logistic Regression predicts the probability of a categorical outcome and maps the
prediction to a value between 0 and 1. It works well when the dependent variable is binary.
Applications:
o Email classification: Is the email spam or not?
o Student admission prediction: Should a student be admitted or not based on scores?
o Exam result classification: Will the student pass or fail based on marks?
Core Concept:
o Logistic Regression models the probability of a particular response variable.
o For instance, if the predicted probability of an email being spam is 0.7, there is a 70% chance
the email is spam.
Challenges with Linear Regression for Classification:
o Linear regression can predict values outside the range of 0 to 1, which is unsuitable for
probabilities.
o Logistic Regression overcomes this by using a sigmoid function to map values to the range [0,
1].
Sigmoid Function:
The sigmoid (logistic) function is used to map any real number to the range [0, 1]. It is mathematically represented as:
sigmoid(z) = 1 / (1 + e^(-z))
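A tiny sketch of the sigmoid and the usual 0.5 decision threshold (illustrative values):

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 0.85])
probs = sigmoid(z)                    # e.g. sigmoid(0.85) is about 0.70
labels = (probs >= 0.5).astype(int)   # apply the 0.5 decision threshold
print(probs, labels)
```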
Difference between Odds and Probability:
Probability is the number of favourable outcomes divided by the total number of outcomes and always lies between 0 and 1, whereas odds are the ratio of the probability that the event occurs to the probability that it does not occur: odds = p / (1 - p). Odds can range from 0 to infinity.
For example:
If the probability of an event is 0.75, the odds are 0.75 / (1 - 0.75) = 3, i.e., 3 : 1.
Features of Logistic Regression:
Logistic Regression predicts the probability of a class label.
It applies a threshold (e.g., 0.5) to determine the class label.
It is based on the log-odds transformation to linearize the relationship between
variables.
Advantages:
Simple and efficient for binary classification.
Works well when the relationship between the dependent and independent
variables is linear (in terms of log-odds).
Outputs interpretable probabilities.
Limitations:
Struggles with non-linear decision boundaries (can be addressed with extensions like
polynomial logistic regression).
Sensitive to outliers in the dataset.
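For completeness, a minimal sketch of fitting logistic regression by gradient descent on the log-loss is given below; the learning rate, iteration count, and toy data are illustrative assumptions rather than values from the textbook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iter=5000):
    """Binary logistic regression trained by batch gradient descent."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # bias column for the intercept
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                # predicted probabilities
        grad = Xb.T @ (p - y) / n          # gradient of the average log-loss
        w -= lr * grad
    return w

# Toy 1-D data: small values labelled 0, large values labelled 1
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logistic_regression(X, y)
print(sigmoid(np.hstack([1.0, [4.5]]) @ w))  # probability near 0.5 at the boundary
```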
Chapter – 03
Decision Tree Learning
Introduction to Decision Tree Learning Model
Overview:
Decision tree learning is a popular supervised predictive model for classification tasks.
It performs inductive inference, generalizing from observed examples.
It can classify both categorical and continuous target variables.
The model is often used for solving complex classification problems with high
accuracy.
Structure of a Decision Tree:
Root Node: The topmost node that represents the entire dataset.
Internal/Decision Nodes: These are nodes that perform tests on input attributes and split
the dataset based on test outcomes.
Branches: Represent the outcomes of a test condition at a decision node.
Leaf Nodes/Terminal Nodes: Represent the target labels or output of the decision process.
Path: A path from root to leaf node represents a logical rule for classification.
Process of Building a Decision Tree:
Goal: Construct a decision tree from the given training dataset.
Tree Construction:
o Start from the root and recursively find the best attribute for splitting.
o This process continues until the tree reaches leaf nodes that cannot be further split.
o The tree represents all possible hypotheses about the data.
Output: A fully constructed decision tree that represents the learned model.
Inference or Classification:
Goal: For a given test instance, classify it into the correct target class.
Classification:
o Start at the root node and traverse the tree based on the test conditions for each
attribute.
o Continue evaluating test conditions until reaching a leaf node, which provides the
target class label for the instance.
Advantages of Decision Trees:
1. Easy to model and interpret.
2. Simple to understand.
3. Can handle both discrete and continuous predictor variables.
4. Can model non-linear relationships between variables.
5. Fast to train.
Disadvantages of Decision Trees:
1. It is difficult to determine how deep the tree should grow and when to stop.
2. Sensitive to errors and missing attribute values in training data.
3. Computational complexity in handling continuous attributes, requiring
discretization.
4. Risk of overfitting with complex trees.
5. Not suitable for classifying multiple output classes.
6. Learning an optimal decision tree is an NP-complete problem.
Decision Tree Induction Algorithms
Several decision tree algorithms are widely used in classification tasks, including ID3, C4.5, and
CART, among others.
These algorithms differ in their splitting criteria, handling of attributes, and robustness to data
characteristics.
Popular Decision Tree Algorithms:
ID3 (Iterative Dichotomizer 3):
o Developed by J.R. Quinlan in 1986.
o Constructs univariate decision trees (splits based on a single attribute).
o Uses Information Gain as the splitting criterion.
o Assumes attributes are discrete or categorical.
o Works well with large datasets but is prone to overfitting on small datasets.
o Cannot handle missing values or continuous attributes directly (requires
discretization).
o No pruning is performed, making it sensitive to outliers.
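Since ID3 selects splits by Information Gain, a minimal sketch of computing entropy and the gain of a candidate categorical attribute is shown below (illustrative Python, not the textbook's notation).

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute_values, labels):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    total = entropy(labels)
    weighted = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return total - weighted

# Example: an attribute with two values against binary class labels
attr = np.array(["sunny", "sunny", "rain", "rain"])
cls = np.array(["no", "no", "yes", "yes"])
print(information_gain(attr, cls))  # 1.0 bit: the attribute separates the classes perfectly
```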
C4.5:
o An extension of ID3 developed by J.R. Quinlan in 1993.
o Uses Gain Ratio as the splitting criterion, which normalizes Information Gain.
o Can handle both categorical and continuous attributes.
o Handles missing values by estimating the best split based on available data.
o Sensitive to outliers, which can affect tree construction.
CART (Classification and Regression Trees):
o Developed by Breiman et al. in 1984.
o Can handle categorical and continuous-valued target variables.
o Uses the GINI Index as the splitting criterion for classification tasks.
o Builds binary decision trees (only two splits per node).
o Handles missing values and is robust to outliers.
o Can be used for regression tasks, making it versatile.
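For comparison with Information Gain, CART's GINI Index for a node can be computed as follows (illustrative sketch):

```python
import numpy as np

def gini_index(labels):
    """GINI impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index(np.array(["yes", "yes", "no", "no"])))  # 0.5 (maximally mixed node)
print(gini_index(np.array(["yes", "yes", "yes"])))       # 0.0 (pure node)
```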
Univariate vs. Multivariate Decision Trees:
Univariate Decision Trees:
o Split based on a single attribute at each decision node.
o Examples: ID3 and C4.5.
o Simple and axis-aligned splits.
Multivariate Decision Trees:
o Consider multiple attributes for splitting at a single decision node.
o Example: CART.
o More complex and better suited for non-linear relationships.
Features of Decision Tree Algorithms
Advantages and Limitations of ID3, C4.5, and CART:
Algorithm
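The textbook's formal induction algorithm is not reproduced here; as a rough guide, the sketch below implements an ID3-style recursive induction for categorical attributes (all names are illustrative, and tie-breaking between equally good attributes is arbitrary).

```python
import numpy as np
from collections import Counter

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_attribute(X, y, attributes):
    """Pick the attribute (column index) with the largest information gain."""
    def gain(col):
        weighted = sum(
            (np.sum(X[:, col] == v) / len(y)) * entropy(y[X[:, col] == v])
            for v in np.unique(X[:, col])
        )
        return entropy(y) - weighted
    return max(attributes, key=gain)

def id3(X, y, attributes):
    """Return a nested-dict decision tree for categorical attributes."""
    if len(np.unique(y)) == 1:           # pure node -> leaf with that class
        return y[0]
    if not attributes:                   # no attributes left -> majority class
        return Counter(y).most_common(1)[0][0]
    a = best_attribute(X, y, attributes)
    tree = {a: {}}
    for v in np.unique(X[:, a]):
        mask = X[:, a] == v
        remaining = [attr for attr in attributes if attr != a]
        tree[a][v] = id3(X[mask], y[mask], remaining)
    return tree
```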