04 Unit-IV - ML
The KNN algorithm is useful when you are performing a pattern recognition task for
classifying objects based on different features.
As illustrated above, the sharpness of claws is significant for cats, but not so much for
dogs. On the other hand, the length of ears is significant for dogs, but not quite when it comes
to cats.
Since KNN is based on feature similarity, we can perform classification tasks using
the KNN classifier. The image below shows the outcome predicted by a classifier trained
with the KNN algorithm: the new animal is classified as a cat.
What is KNN?
K-Nearest Neighbors is one of the simplest supervised machine learning algorithms
used for classification. It classifies a data point based on its neighbors’ classifications. It
stores all available cases and classifies new cases based on similar features.
The following example shows a KNN algorithm being used to predict whether a
glass of wine is red or white. The variables considered in this KNN example include
sulphur dioxide and chloride levels.
K in KNN is a parameter that refers to the number of nearest neighbors in the majority
voting process.
Here, we have taken K = 5. The class of the new data point is decided by a majority vote
among its five nearest neighbors. The glass of wine will be classified as red, since four out of
the five neighbors are red.
How to Choose the Factor ‘K’?
But, if we consider K = 7, the new data point is classified as a blue triangle instead, because
blue triangles outnumber red squares among its seven nearest neighbors.
To choose the value of K, a common rule of thumb is to take the square root of n (sqrt(n)),
where n is the total number of data points. Usually, an odd value of K is selected to avoid
ties between two classes of data.
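As a hedged illustration of this rule of thumb, the sketch below picks an odd k near sqrt(n) and fits scikit-learn's KNeighborsClassifier. The built-in three-class wine dataset is used only as a stand-in for the red/white wine example above, and the variable names are illustrative, not from the original notes.

```python
import math
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rule of thumb: k ~ sqrt(n), rounded to an odd number to reduce the chance of ties
n = len(X_train)
k = int(math.sqrt(n))
if k % 2 == 0:
    k += 1

knn = KNeighborsClassifier(n_neighbors=k)  # Euclidean distance (p=2) by default
knn.fit(X_train, y_train)
print(f"k = {k}, test accuracy = {knn.score(X_test, y_test):.2f}")
```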
To recap, the goal of the k-nearest neighbor algorithm is to identify the nearest neighbors of a
given query point, so that we can assign a class label to that point. In order to do this, KNN
has a few requirements:
In order to determine which data points are closest to a given query point, the distance
between the query point and the other data points needs to be calculated. These distance
metrics help to form decision boundaries, which partition the feature space into different
regions. Decision boundaries are commonly visualized with Voronoi diagrams.
While there are several distance measures that you can choose from, this section will only
cover the following:
Euclidean distance (p=2): This is the most commonly used distance measure, and it is
limited to real-valued vectors. Using the below formula, it measures a straight line between
the query point and the other point being measured.
Manhattan distance (p=1): This is also another popular distance metric, which measures the
absolute value between two points. It is also referred to as taxicab distance or city block
distance as it is commonly visualized with a grid, illustrating how one might navigate from
one address to another via city streets.
Minkowski distance: This distance measure is the generalized form of Euclidean and
Manhattan distance metrics. The parameter, p, in the formula below, allows for the creation
of other distance metrics. Euclidean distance is represented by this formula when p is equal to
two, and Manhattan distance is denoted with p equal to one.
Hamming distance: This technique is typically used with Boolean or string vectors,
identifying the points where the vectors do not match. As a result, it has also been referred to
as the overlap metric. It can be represented with the formula given below.
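The individual formulas referred to above are not reproduced in these notes; for two n-dimensional points x and y, their standard forms are:

Euclidean distance (p = 2): \( d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \)

Manhattan distance (p = 1): \( d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \)

Minkowski distance: \( d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \)

Hamming distance: \( d(x, y) = \sum_{i=1}^{n} \mathbf{1}[x_i \neq y_i] \)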
The algorithm selects the K data points from X that have the shortest distances to x.
For classification tasks, the algorithm assigns the label y that is most frequent among the K
nearest neighbors to x. For regression tasks, the algorithm calculates the average or weighted
average of the values y of the K nearest neighbors and assigns it as the predicted value for x.
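The procedure just described can be sketched directly in Python. This is a minimal from-scratch version, assuming numeric feature vectors stored in NumPy arrays; the function and variable names are illustrative, not from the original notes.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5, task="classification"):
    """Predict the label (or value) of x_query from its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote among the k nearest labels
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Regression: average of the k nearest target values
    return y_train[nearest].mean()

# Tiny usage example with two clusters of points
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=3))  # -> 0
```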
Advantages of the KNN Algorithm
Easy to Implement – The complexity of the algorithm is not high, so it is easy to implement.
Adapts Easily – Since the KNN algorithm stores all the data in memory, whenever a new
example or data point is added, the algorithm adjusts itself to that new example, and the new
example contributes to future predictions as well.
Few Hyperparameters – The only parameters required when training a KNN algorithm are
the value of k and the choice of distance metric.
When studying regression algorithms in machine learning, we usually touch on only a few
regression algorithms and miss out on a fundamental one: Locally Weighted Regression. It is
a non-parametric algorithm, unlike a typical linear regression algorithm, which is parametric.
A parametric algorithm is one that does not need to retain the training data when making
predictions. Let's explore this algorithm in more detail below.
Points to remember
LWLR fits a separate linear regression model for each query point based on the weights
assigned to the training data points.
The weights assigned to each training data point are inversely proportional to their
distance from the query point.
Training data points that are closer to the query point will have a higher weight and
contribute more to the linear regression model.
LWLR is useful when a global linear model does not capture the relationship between the
input and output variables well. The goal is to capture local patterns in the data.
So this is what we do in a linear regression algorithm. Then the question arises: why do
we need another regression algorithm, such as locally weighted regression?
Say we have a non-linear dataset, as shown in the image above. A single straight line
wouldn't fit this type of dataset well. In such a case, we have to introduce a locally
weighted algorithm that can predict a value very close to the actual value for a given query
point. So another thought that comes to mind is: let's break the given dataset into smaller
subsets and fit multiple smaller lines, one to each subset. Together, these multiple lines fit
the entire dataset. To understand this better, look at the diagram below.
Look at the four small lines, each fitted to a smaller subset of the data, which together fit the
entire dataset. But now the question arises of how to select the best line to predict the output
for a given query point. Before we move on to find the best-fit line, we must understand that
we will always learn a different set of parameters for each query point. Hence locally
weighted regression is a non-parametric algorithm.
Parametric
Parametric models are those which simplify the mapping function to a known form. Such a
model has a collection of parameters that summarize the data.
These parameters are fixed in number, which means that the form of the model is decided
in advance; the number of parameters does not depend on, and does not grow with, the
number of training samples.
As an example, consider the mapping function described below.
b0 + b1*x1 + b2*x2 = 0
From the equation, b0, b1, and b2 are the coefficients of the line, which control the
intercept and slope, while the input variables are represented by x1 and x2.
Non - Parametric
They have the freedom to choose any functional form from the training data. As a
result, for parametric models to estimate the mapping function they require much more data
than parametric ones.
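The weighting function whose symbols are defined next is not reproduced in these notes; its standard Gaussian form is:

\( w^{(i)} = \exp\left( -\dfrac{\left(x^{(i)} - x\right)^2}{2\tau^2} \right) \)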
x is the point where we want to make the prediction, and x(i) is the i-th training example.
τ can be called the bandwidth of the Gaussian bell-shaped curve of the weighting function.
The value of τ can be adjusted to vary the value of w based on the distance from the
query point.
A small value of τ makes the bell curve narrow, so only data points very close to the query
point receive a large weight (more weightage), while points farther away receive weights
close to zero, and vice versa.
Example
2,5,10,17,26,37,50,65,82
Taking a query point of x = 7 and three points from the dataset: 5, 10, and 26.
Thus, applying the weighting function to each of these points:
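(The computed values are missing from the original notes; the bandwidth is assumed to be τ = 10 here purely for illustration.)

\( w^{(1)} = \exp\left(-\dfrac{(5 - 7)^2}{2 \cdot 10^2}\right) = e^{-0.02} \approx 0.980 \)

\( w^{(2)} = \exp\left(-\dfrac{(10 - 7)^2}{2 \cdot 10^2}\right) = e^{-0.045} \approx 0.956 \)

\( w^{(3)} = \exp\left(-\dfrac{(26 - 7)^2}{2 \cdot 10^2}\right) = e^{-1.805} \approx 0.164 \)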
From the above example it is evident that the closer the query point (x) is to a
particular data point/sample x(1), x(2), x(3), etc., the larger the value of w. The weight
decreases/falls exponentially for data points far away from the query point.
As the distance between x(i) and x increases, the weight decreases. This reduces the
contribution of that point's error term to the cost function, and vice versa.
Locally Weighted Regression (LWR) is a type of regression analysis used when the
data exhibit heteroscedasticity, meaning the variance around the regression line is not
constant. Unlike traditional regression methods that globally fit a function to all data points,
LWR fits a separate regression model for each query point, with the model’s parameters
dependent on that query point.
At its core, LWR assigns weights to the data points based on their proximity to the
query point. The intuition behind this is that nearby points are likely to be more relevant in
predicting the value of the query point. This proximity-based weighting gives more
importance to nearby observations while down-weighting those that are farther away.
Mathematical Formulation
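The equations are not reproduced in these notes; the standard formulation is as follows. For a query point x, LWR chooses parameters θ that minimize the weighted least-squares cost

\( J(\theta) = \sum_{i=1}^{m} w^{(i)} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^2, \qquad w^{(i)} = \exp\left( -\dfrac{\| x^{(i)} - x \|^2}{2\tau^2} \right) \)

which, writing \( W = \mathrm{diag}(w^{(1)}, \ldots, w^{(m)}) \), has the closed-form solution

\( \theta = \left( X^{T} W X \right)^{-1} X^{T} W y \)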
Key Characteristics of Locally Weighted Regression
1. Non-parametric Nature: LWR doesn’t make explicit assumptions about the functional
form of the relationship between input and output variables. Instead, it adapts locally to
the data, making it suitable for modeling complex relationships.
2. Robustness: LWR gives more weight to nearby points, making it robust to outliers.
Outliers exert less influence on the fitted model unless they are close to the query point.
3. Local Adaptation: The model parameters vary depending on the query point, allowing
for local adaptation to different regions of the input space. This flexibility is particularly
useful when the relationship between variables changes across the input space.
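A minimal NumPy sketch of this idea, assuming a one-dimensional input with an added intercept column; the helper name lwr_predict and the bandwidth value are illustrative, not from the original notes.

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Locally weighted linear regression prediction at a single query point."""
    m = len(X)
    # Design matrix with an intercept column
    Xb = np.c_[np.ones(m), X]
    xq = np.array([1.0, x_query])
    # Gaussian weights: points near the query point dominate the local fit
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # Closed-form weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Usage: fit a noisy sine curve locally
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
print(lwr_predict(np.pi / 2, X, y, tau=0.5))  # close to sin(pi/2) = 1
```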
In the realm of machine learning and artificial intelligence, Neural Networks (NN)
have established their prominence due to their remarkable ability to learn from data and make
predictions or decisions without being explicitly programmed to perform the task. Among
various types of neural networks, radial basis function neural networks (RBFNN) are a
unique class that have proved to be highly effective in various applications including function
approximation, time series prediction, classification, and control.
A radial basis function (RBF) neural network is a type of artificial neural network that
uses radial basis functions as activation functions. It typically consists of three layers: an
input layer, a hidden layer, and an output layer. The hidden layer applies a radial basis
function, usually a Gaussian function, to the input. The output layer then linearly combines
these outputs to generate the final output. RBF neural networks are highly versatile and are
extensively used in pattern classification tasks, function approximation, and a variety of
machine learning applications. They are especially known for their ability to handle non-
linear problems effectively.
Input layer: This layer simply transmits the inputs to the neurons in the hidden layer.
Hidden layer: Each neuron in this layer applies a radial basis function to the inputs it
receives.
Output layer: Each neuron in this layer computes a weighted sum of the outputs from
the hidden layer, resulting in the final output.
Mathematical background
Radial basis function networks (RBFNs) work by comparing the input to known examples
from the training data to classify it.
1. RBFNs start with an input vector. This vector is fed into the input layer of the
network.
2. The network also has a hidden layer, which comprises radial basis function (RBF)
neurons.
3. Each of these RBF neurons has a center, and they measure how close the input is to
their center. They do this using a special function called a Gaussian transfer function.
The output of this function is higher when the input is close to the neuron’s center and
lower when the input is far away.
4. The outputs from the hidden layer are then combined in the output layer. Each node in
the output layer corresponds to a different category or class of data. The network
determines the input’s class by calculating a weighted sum of the outputs from the
hidden layer.
5. The final output of the network is a combination of these weighted sums, which is
used to classify the input.
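A minimal NumPy sketch of the forward pass described in the steps above; the centers, width sigma, and output weights W are assumed to have been fixed beforehand (e.g. by clustering and least squares), and all names are illustrative rather than from the original notes.

```python
import numpy as np

def rbf_forward(x, centers, sigma, W):
    """Forward pass of a simple RBF network for one input vector x."""
    # Hidden layer: Gaussian activation for each center,
    # large when x is close to the center, small when it is far away
    dists = np.linalg.norm(centers - x, axis=1)
    hidden = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    # Output layer: weighted sum of the hidden activations, one score per class
    scores = W @ hidden
    return np.argmax(scores)  # predicted class index

# Toy usage: two centers, two classes
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
W = np.array([[1.0, 0.0],      # class 0 listens to the first center
              [0.0, 1.0]])     # class 1 listens to the second center
print(rbf_forward(np.array([4.5, 5.2]), centers, sigma=1.0, W=W))  # -> 1
```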
Let’s think of radial basis function networks (RBFNs) as a team of detectives trying to
solve a mystery.
1. The mystery is the input data. It’s like a puzzle that needs to be solved or a question
that needs to be answered.
2. The detectives are the neurons in the hidden layer. Each detective has a special area of
expertise or a “center.” They are good at solving mysteries close to their area of
expertise.
3. When a new mystery comes in, each detective compares it to their area of expertise
using a tool called a Gaussian transfer function. This tool tells them how similar the
mystery is to what they know best. If the mystery is very similar to their area of
expertise, the tool gives a high score. If it’s very different, the tool gives a low score.
4. Once all the detectives have scored the mystery, their scores are combined in a way
that gives more weight to the most confident detectives. This is the weighted sum.
This is done in the output layer. Each node in this layer represents a different possible
solution to the mystery.
5. The final solution to the mystery is the one that gets the highest combined score from
all the detectives. This is how the network classifies the input data.
So, in a nutshell, RBFNs solve mysteries (classify input data) by letting a team of expert
detectives (neurons) compare the mystery to their areas of expertise and combine their scores
to find the best solution.
There are several types of Radial Basis Functions (RBFs), each with its own characteristics
and mathematical formulations. Some common types include:
Gaussian Radial Basis Function: It has a bell-shaped curve and is often employed in
various applications due to its simplicity and effectiveness. It is commonly written as
ϕ(r) = exp(−r² / (2σ²)), where r is the distance between the input and the center and σ
controls the width of the bell.
Thin Plate Spline Radial Basis Function: The Thin Plate Spline RBF is defined as
ϕ(r) = r² log(r), where r is the Euclidean distance between the input and the center. This
RBF is often used in applications involving thin-plate splines, which are used for surface
interpolation and deformation.
Cubic Radial Basis Function: The Cubic RBF is defined as ϕ(r) = r³, where r is the
Euclidean distance. It has cubic polynomial behavior and is sometimes used
in interpolation.
In this method, the system uses a case base that contains a number of previously
solved cases that are similar to the current task. The system uses this information to search
for similar cases and apply the previous solution to the current task.
The CBL process consists of three phases: retrieval, adaptation and evaluation. In the
retrieval phase, the system searches for similar cases in the database. In the adaptation phase,
the system modifies the solution of the previous case to fit the current task. In the evaluation
phase, the system evaluates the proposed solution and compares it with the optimal solution.
In the realm of machine learning, various techniques and methodologies are employed
to teach algorithms to recognize patterns, make predictions, and solve problems. One such
approach gaining traction is Case Based Learning (CBL). CBL is a method where new
problems are solved by referring to similar past cases. It operates on the premise that the past
experiences of solving problems can be leveraged to tackle new ones. Let’s delve deeper into
what Case Based Learning entails and how it is utilized in the field of machine learning.
At its core, Case Based Learning involves using past experiences or cases to guide
decision-making in new situations. Just as humans often draw from their past experiences to
handle novel problems, CBL algorithms learn from historical cases to make predictions or
decisions. These cases typically consist of a problem, its corresponding solution, and
contextual information. The key idea is to find similarities between the current problem and
past cases and then adapt or apply the solutions accordingly.
In the realm of machine learning, the process of Case Based Learning involves several
steps:
1. Case Representation: Past experiences are stored as cases, each consisting of a problem
description, its solution, and relevant contextual information.
2. Case Retrieval: When faced with a new problem, the CBL system retrieves similar cases
from its memory based on certain similarity metrics. These metrics evaluate how closely
the characteristics of the current problem match those of past cases.
3. Adaptation: Once relevant cases are retrieved, the system adapts the solutions from these
cases to fit the current problem. This may involve tweaking the solutions based on the
specific context or conditions of the new problem.
4. Solution Application: Finally, the adapted solution is applied to address the new
problem, and the outcome is observed. Over time, as more cases are encountered and
solutions are applied, the CBL system refines its decision-making abilities.
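The retrieval, adaptation, and retention steps above can be sketched with a very small Python example; the case structure, similarity measure, and function names are assumptions made for illustration, not a prescribed CBL API.

```python
import math

# A case base: each case pairs a problem description (feature vector) with its solution
case_base = [
    {"problem": [70, 1.2], "solution": "replace bearing"},
    {"problem": [30, 0.2], "solution": "routine lubrication"},
    {"problem": [65, 1.0], "solution": "replace bearing"},
]

def similarity(a, b):
    """Inverse-distance similarity between two problem descriptions."""
    return 1.0 / (1.0 + math.dist(a, b))

def retrieve(query):
    """Retrieval: find the stored case most similar to the new problem."""
    return max(case_base, key=lambda case: similarity(case["problem"], query))

def solve(query):
    best = retrieve(query)
    # Adaptation: here the past solution is reused directly;
    # a real system would modify it to fit the new context
    solution = best["solution"]
    # Retention: store the newly solved case for future reuse
    case_base.append({"problem": query, "solution": solution})
    return solution

print(solve([68, 1.1]))  # -> "replace bearing"
```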
Case Based Learning offers several advantages in the realm of machine learning:
1. Flexibility: CBL systems can adapt to a wide range of problems, as they rely on past
experiences rather than predefined rules or models.
2. Incremental Learning: As new cases are encountered, the system continuously learns
and improves its decision-making abilities, making it well-suited for dynamic
environments.
3. Interpretability: Since solutions are based on past cases, the reasoning behind decisions
made by CBL systems is often transparent and understandable.
4. Efficiency: By leveraging existing knowledge, CBL can reduce the need for extensive
computational resources or labeled data, making it a cost-effective approach in certain
scenarios.
1. Healthcare: CBL can assist in medical diagnosis by referencing similar patient cases and
treatment outcomes.
2. Finance: In financial markets, CBL algorithms can analyze past trading patterns to make
investment decisions.
4. Fault Diagnosis: In industrial settings, CBL can help identify and troubleshoot
equipment failures by comparing them to past maintenance cases.
For example, assume there is a CBR mechanism for an e-commerce application that
provides services to its customers. The CBR mechanism can be used to improve the
customer experience based on past experiences. For instance, if a customer likes a particular
category, the CBR mechanism finds similar past cases and recommends similar items to that
customer.
Types of Knowledge in case-based reasoning
There are mainly four types of knowledge containers in CBR: the vocabulary, the similarity
measure, the case base, and the adaptation (solution transformation) knowledge.
Vocabulary: The vocabulary involves determining information about the attributes and
parameters required for selecting the features that are used to describe and compare cases.
CBR Cycle
The CBR (Case-Based Reasoning) cycle is an iterative process that describes how a new
problem is approached. It involves collecting past experiences and making use of the
relevant information. Let's discuss the main steps in the case-based reasoning cycle.
New Problem: The CBR cycle starts when a new problem arrives.
Case retrieval: After properly analyzing the problem, the relevant information is
extracted by comparing similar cases with the newly arrived problem. Therefore, we put
together useful similar cases to solve the problem.
Case reuse: As a next step in the cycle, the past information is reused, and feasible
solutions are selected from similar cases.
Reasoning results: It refers to applying the filtered information and solutions by
analyzing previous similar cases to the present problem. In this step, we generally use
algorithms and heuristics to narrow down to a solution.
Case revision: After applying the necessary information, the next step is to evaluate the
output, judge its effectiveness, and produce feedback. If the result's quality is not up to the
mark, the solution is modified according to the feedback, and the improved solution is
recorded for future similar cases.
Case retain: Finally, the revised case, consisting of the present problem and its effective
solution, is stored in the case base, thus enriching it with more problem-solving
capabilities.
Case-Based Reasoning VS Other Techniques
Case-based reasoning is a well-designed technique that uses past experiences to improve its
performance. Let's discuss how it differs from other techniques.
Rule-based systems
In rule-based reasoning, pre-defined rules designed by domain experts are used for solving
problems. CBR, on the other hand, is effective in handling situations where such rules are
not efficient or simply do not exist.
Decision trees
These are a type of algorithm used for solving classification problems and are widely used
in machine learning and data mining. CBR, in contrast, uses past experiences rather than
building a decision tree from the given data.
Neural networks
These are machine learning algorithms, inspired by the structure of the human brain, that
use past information to make predictions on new data. They are used in natural language
processing, stock market prediction, intelligent search, and image recognition. CBR's
primary focus, by contrast, is learning from stored past cases.
Advantages of Case-Based Reasoning
Below are the advantages of case-based reasoning in machine learning.
CBR helps in avoiding past mistakes.
CBR considers small details, features, and information relevant to the problem for
effective decision-making.
Case retention in CBR helps in expanding knowledge and improving the system.
Disadvantages of Case-Based Reasoning
Below are the disadvantages of case-based reasoning in machine learning.
CBR can be time-consuming as it involves various stages of retrieving similar cases and
arriving at the best solution.
Scalability is also an issue in CBR. With the increase in the case base, the time needed
for retrieving and adapting the cases also increases, affecting the overall efficiency.
CBR may give out incorrect solutions if the cases are not represented correctly.
Applications of Case-Based Reasoning
Below are the applications of case-based reasoning in machine learning
CBR is used in fault diagnosis systems for resolving faults. CBR helps in providing
guidance to diagnose faults.
CBR can be used for the designing process by using previous design cases that fit into a
similar set of rules.
It is also efficient for planning a process using past experiences and arranging the action
sequence.
CBR can be used in classifying tasks by analyzing past cases and features associated
with them.
Consequences
Case-based reasoning can help solve problems more efficiently. For example, if you
are struggling to solve a problem, others will often advise you to take a break and tackle it
with fresh eyes later. Case-based reasoning, essentially, is looking at past problems and their
solutions with fresh eyes and trying to apply them to a new situation.
Case-based reasoning recognizes that usually, the second time we approach a task we
are better at it. We have some experience; we remember our mistakes and successes and
proceed accordingly. There are various factors that can influence how effective one’s use of
case-based reasoning is:
Experiences: It is likely that the more you’ve experienced, the better you will
be at case-based reasoning. If you’ve only cooked pasta once and try to whip
up fettuccine alfredo, you only have one prior experience to use as a problem-
solving tool. If you’ve cooked pasta 100 times, you have 100 pieces of
evidence to work with.
Creativity and innovativeness: Case-based reasoning is only going to be
effective if you have the ability to consider creative ways to adapt past
solutions to new problems. Even if you’ve had 100 experiences, tackling a
new problem the same way that you’ve tackled an old one hurts your chance
of success. Being innovative allows you to slightly adjust old tactics and
consider new ways of solving problems.
The ability to perceive new situations in light of old experiences: On a basic
level, case-based reasoning requires one to be able to remember past
experiences, so a decent memory is a condition for this problem-solving
approach. For case-based reasoning to work, one must find comparisons
between new problems and old problems. Drawing links between the two —
whether it be similarities or differences — will help in identifying what tactics
are going to be useful and which ones do not apply to the new situation.
Indexing experiences in such a way that makes them easy to recall is
important.
An ability to evaluate: Being rational and logical when it comes to case-based
reasoning is also important. You should be able to adequately evaluate how
successful or unsuccessful past endeavors were to know which avenues are
worth pursuing.2
Controversies
Case-based reasoning contrasts with other popular theories of how artificial
intelligence functions. Knowledge-based reasoning, for example, suggests that having general
knowledge of a problem or domain and drawing links between general ideas and specific
problems is how computers and people arrive at solutions. Case-based reasoning,
alternatively, suggests that we recall specific instances and use them to perceive the new
problem.
Since case-based reasoning is based on anecdotal past experiences, it is very personal
and deviates from facts and agreed-upon knowledge. This makes it not the best approach
when working on a team. It also is liable to personal biases and memory as someone may
think a past solution worked really well but it was actually only a good solution for them, or
they may have recalled the past inaccurately. As well, the use of anecdotal evidence means
that the success of the approach is difficult to evaluate.
There are four steps to case-based reasoning (Retrieve, Reuse, Revise, and Retain),
and with each step comes the potential for error.6 Case-based reasoning also requires
individuals to properly remember quite a few past experiences with high levels of accuracy
— it is a bit like trying to memorize an entire scientific textbook rather than trying to
understand various processes so that you’ll be able to apply them to future reasoning, which
knowledge-based reasoning employs.