
UNIT-4

1. Introduction to INSTANCE-BASED LEARNING


Machine learning systems categorized as instance-based learning are systems that learn the training examples by heart and then generalize to new instances based on some similarity measure. The approach is called instance-based because it builds its hypotheses directly from the training instances. It is also known as memory-based learning or lazy learning (because processing is delayed until a new instance must be classified). The time complexity of such an algorithm depends on the size of the training data: each time a new query is encountered, the previously stored data is examined and a target function value is assigned to the new instance.
The worst-case time complexity of this algorithm is O(n), where n is the number of training instances.
For example, if we were to create a spam filter with an instance-based learning algorithm, instead of just flagging emails that are already marked as spam, our spam filter would be programmed to also flag emails that are very similar to them. This requires a measure of resemblance between two emails. A similarity measure between two emails could be a shared sender, the repeated use of the same keywords, or something else.
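As a rough illustration (not part of the original text), the following Python sketch measures keyword overlap between two emails with a Jaccard similarity; the example emails and the 0.5 threshold are invented purely for illustration.

# A minimal sketch of one possible email-similarity measure: the Jaccard
# overlap of word sets. The tokenization and the 0.5 threshold are
# illustrative assumptions, not a prescribed method.

def jaccard_similarity(email_a: str, email_b: str) -> float:
    words_a = set(email_a.lower().split())
    words_b = set(email_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

known_spam = "win a free prize claim your free prize now"
new_email = "claim your free prize now before it expires"

# Flag the new email if it closely resembles a known spam email.
if jaccard_similarity(known_spam, new_email) >= 0.5:
    print("Flag as spam")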
Some of the instance-based learning algorithms are:
K Nearest Neighbor (KNN)
Self-Organizing Map (SOM)
Learning Vector Quantization (LVQ)
Locally Weighted Learning (LWL)
Case-Based Reasoning
• Instance-based learning methods simply store the training examples instead of learning an explicit description of the target function. Generalizing from the examples is postponed until a new instance must be classified; when a new instance is encountered, its relationship to the stored examples is examined in order to assign a target function value to the new instance.
• Instance-based learning includes nearest neighbor, locally weighted regression and
case-based reasoning methods.
• Instance-based methods are sometimes referred to as lazy learning methods because
they delay processing until a new instance must be classified.
• A key advantage of lazy learning is that instead of estimating the target function
once for the entire instance space, these methods can estimate it locally and differently for
each new instance to be classified.
2. K-Nearest Neighbour Learning
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method employed to tackle classification and regression problems. Evelyn Fix and Joseph Hodges developed this algorithm in 1951, and it was subsequently expanded by Thomas Cover. This section explores the fundamentals, workings, and implementation of the KNN algorithm.
What is the K-Nearest Neighbors Algorithm?
KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application in pattern
recognition, data mining, and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does
not make any underlying assumptions about the distribution of data (as opposed to other
algorithms such as GMM, which assume a Gaussian distribution of the given data). We are
given some prior data (also called training data), which classifies coordinates into groups
identified by an attribute.

Why Do We Need the KNN Algorithm?

The KNN algorithm is useful when you are performing a pattern recognition task for
classifying objects based on different features.

Suppose there is a dataset that contains information regarding cats and dogs. There is a new data point and you need to check whether that sample data point is a cat or a dog. To do this, you need to list the different features of cats and dogs.

Now, let us consider two features: claw sharpness and ear length. Plot these features on a 2D plane and check where the data points fit in.

As illustrated above, the sharpness of claws is significant for cats, but not so much for
dogs. On the other hand, the length of ears is significant for dogs, but not quite when it comes
to cats.

Now, if we have a new data point, based on the above features we can easily determine whether it's a cat or a dog. The new data point's features indicate that the animal is, in fact, a cat.

Since KNN is based on feature similarity, we can perform classification tasks using
the KNN classifier. The image below—trained with the KNN algorithm—shows the
predicted outcome, a black cat.

What is KNN?
K-Nearest Neighbors is one of the simplest supervised machine learning algorithms
used for classification. It classifies a data point based on its neighbors’ classifications. It
stores all available cases and classifies new cases based on similar features.

The following example shows a KNN algorithm being leveraged to predict whether a
glass of wine is red or white. Different variables that are considered in this KNN algorithm
include sulphur dioxide and chloride levels.

K in KNN is a parameter that refers to the number of nearest neighbors in the majority
voting process.

Here, we have taken K=5. The algorithm takes a majority vote among the five nearest neighbors and classifies the data point accordingly. The glass of wine will be classified as red, since four out of five neighbors are red.
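A minimal Python sketch of this majority vote; the five neighbor labels are assumed for illustration.

from collections import Counter

# Hypothetical labels of the 5 nearest neighbors of the query glass of wine.
neighbor_labels = ["red", "red", "white", "red", "red"]

# Majority vote: the most common label among the K neighbors wins.
predicted = Counter(neighbor_labels).most_common(1)[0][0]
print(predicted)  # -> "red" (four out of five neighbors are red)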
How to Choose the Factor ‘K’?

A KNN algorithm is based on feature similarity. Selecting the right K value is a process called parameter tuning, which is important for achieving higher accuracy.
There is no definitive way to determine the best value of K; it depends on the type of problem you are solving, as well as the business scenario. The most preferred value for K is five. Selecting a K value of one or two can be noisy, makes the model sensitive to outliers, and can result in overfitting: the algorithm then performs well on the training set but poorly on unseen test data.

Consider the following example to predict which class the new data point belongs to.

If you take K=3, the new data point is classified as a red square.

But if we consider K=7, the new data point is classified as a blue triangle, because among the seven nearest neighbors the blue triangles outnumber the red squares.

To choose the value of K, take the square root of n (sqrt(n)), where n is the total
number of data points. Usually, an odd value of K is selected to avoid confusion between two
classes of data.
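A small sketch of this rule of thumb, assuming a dataset of n = 100 points:

import math

# Rule of thumb from the text: K is approximately sqrt(n), nudged to an odd
# value so two classes cannot tie. n = 100 is an assumed dataset size.
n = 100
k = int(math.sqrt(n))        # 10
if k % 2 == 0:
    k += 1                   # make K odd -> 11
print(k)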

Compute KNN: distance metrics

To recap, the goal of the k-nearest neighbor algorithm is to identify the nearest neighbors of a
given query point, so that we can assign a class label to that point. In order to do this, KNN
has a few requirements:

Determine your distance metrics

In order to determine which data points are closest to a given query point, the distance
between the query point and the other data points will need to be calculated. These distance
metrics help to form decision boundaries, which partition query points into different regions.
You commonly will see decision boundaries visualized with Voronoi diagrams.

While there are several distance measures that you can choose from, this section will only
cover the following:

Euclidean distance (p=2): This is the most commonly used distance measure, and it is limited to real-valued vectors. It measures the straight-line distance between the query point and the other point being measured: d(x, y) = √(Σᵢ (xᵢ − yᵢ)²).

Manhattan distance (p=1): This is another popular distance metric, which sums the absolute differences between the coordinates of two points: d(x, y) = Σᵢ |xᵢ − yᵢ|. It is also referred to as taxicab distance or city block distance, as it is commonly visualized with a grid, illustrating how one might navigate from one address to another via city streets.

Minkowski distance: This distance measure is the generalized form of the Euclidean and Manhattan distance metrics: d(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p). The parameter p allows for the creation of other distance metrics; Euclidean distance is obtained when p equals two, and Manhattan distance when p equals one.

Hamming distance: This technique is typically used with Boolean or string vectors, counting the positions where the vectors do not match. As a result, it has also been referred to as the overlap metric: d(x, y) = Σᵢ [xᵢ ≠ yᵢ].
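The four measures can be sketched in Python as follows (a minimal illustration using NumPy; the sample vectors are arbitrary):

import numpy as np

def euclidean(a, b):
    # Minkowski distance with p = 2: straight-line distance.
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Minkowski distance with p = 1: sum of absolute coordinate differences.
    return np.sum(np.abs(a - b))

def minkowski(a, b, p):
    # Generalised form; p = 2 gives Euclidean, p = 1 gives Manhattan.
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def hamming(a, b):
    # Number of positions at which two equal-length vectors differ.
    return np.sum(a != b)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
print(euclidean(x, y), manhattan(x, y), minkowski(x, y, p=3))
print(hamming(np.array([1, 0, 1, 1]), np.array([1, 1, 0, 1])))  # -> 2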

Workings of KNN algorithm


The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity,
where it predicts the label or value of a new data point by considering the labels or values of
its K nearest neighbors in the training dataset.

Step-by-Step explanation of how KNN works is discussed below:


Step 1: Selecting the optimal value of K
K represents the number of nearest neighbors that need to be considered while making a prediction.
Step 2: Calculating distance
To measure the similarity between target and training data points, Euclidean distance is used.
Distance is calculated between each of the data points in the dataset and target point.
Step 3: Finding Nearest Neighbors
The k data points with the smallest distances to the target point are the nearest neighbors.
Step 4: Voting for Classification or Taking Average for Regression
In the classification problem, the class labels of K-nearest neighbors are determined
by performing majority voting. The class with the most occurrences among the neighbors
becomes the predicted class for the target data point.
In the regression problem, the predicted value is calculated by taking the average of the target values of the K nearest neighbors. The calculated average value becomes the predicted output for the target data point.
Let X be the training dataset with n data points, where each data point is represented by a d-dimensional feature vector x(i), and let Y be the corresponding labels or values for each data point in X. Given a new data point x, the algorithm calculates the distance between x and each data point x(i) in X using a distance metric, such as the Euclidean distance d(x, x(i)) = √(Σⱼ (xⱼ − x(i)ⱼ)²).

The algorithm selects the K data points from X that have the shortest distances to x.
For classification tasks, the algorithm assigns the label y that is most frequent among the K
nearest neighbors to x. For regression tasks, the algorithm calculates the average or weighted
average of the values y of the K nearest neighbors and assigns it as the predicted value for x.
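A minimal NumPy sketch of these steps for classification; the toy dataset, labels, and K = 3 are assumptions made for illustration.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 2: Euclidean distance from the query point to every training point.
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Step 3: indices of the k nearest neighbours.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among their labels (classification).
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2-D dataset: two clusters labelled 0 and 1.
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.0])))  # -> 1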
Advantages of the KNN Algorithm
 Easy to implement as the complexity of the algorithm is not that high.
 Adapts Easily – Because the KNN algorithm stores all the data in memory, whenever a new example or data point is added, the algorithm adjusts itself to that new example, and the new example contributes to future predictions as well.
 Few Hyperparameters – The only parameters required in the training of a KNN algorithm are the value of k and the choice of the distance metric.

Disadvantages of the KNN Algorithm


 Does not scale – The KNN algorithm is also considered a lazy algorithm, which means it needs a lot of computing power and data storage at prediction time. This makes the algorithm both time-consuming and resource-exhausting.
 Curse of Dimensionality – Due to the peaking phenomenon, the KNN algorithm is affected by the curse of dimensionality: it has a hard time classifying data points properly when the dimensionality is too high.
 Prone to Overfitting – Because the algorithm is affected by the curse of dimensionality, it is also prone to overfitting. Feature selection and dimensionality reduction techniques are therefore generally applied to deal with this problem.

3. Locally Weighted Regression


In the realm of machine learning, regression techniques are widely used for predicting
continuous values. One powerful approach in this domain is Locally Weighted Regression
(LWR). LWR is a non-parametric method that adapts the model locally to the data points,
assigning different weights to different points based on their proximity to the target point. In
this section, we'll delve into the intricacies of Locally Weighted Regression, understand its mechanics, and explore its applications.

When studying regression algorithms in machine learning, we usually touch on only a few regression algorithms and miss out on a fundamental one: Locally Weighted Regression. It is a non-parametric algorithm, unlike typical linear regression, which is a parametric algorithm. A parametric algorithm is an algorithm that does not need to retain the training data when we need to make predictions. Let's explore this algorithm in more detail below.

What is Locally Weighted Linear Regression

Locally Weighted Linear Regression (LWLR) is a non-parametric regression technique that aims to fit a linear regression model to a dataset by giving more weight to
nearby data points. For example, consider a dataset of temperature readings and
corresponding energy consumption. LWLR can be used to predict the energy consumption
for a given temperature reading by fitting a linear regression model to the training data, where
the weight assigned to each training data point is inversely proportional to its distance from
the query point. This means that training data points that are closer to the query point will
have a higher weight and contribute more to the linear regression model.

Points to remember

Some important points to remember regarding LWLR.


 LWLR is a non-parametric regression technique that fits a linear regression model to a
dataset by giving more weight to nearby data points.

 LWLR fits a separate linear regression model for each query point based on the weights
assigned to the training data points.

 The weights assigned to each training data point are inversely proportional to their
distance from the query point.
 Training data points that are closer to the query point will have a higher weight and
contribute more to the linear regression model.

 LWLR is useful when a global linear model does not adequately capture the relationship
between the input and output variables. The goal is to capture local patterns in the data.

Why Locally Weighted Linear Regression Algorithm


In the traditional linear regression algorithm, we aim to fit a general line on the given
training dataset and predict some parameters that can be generalized for any input to make
correct predictions.

So this is what we do in a linear regression algorithm. The question then arises: why do we need another regression algorithm like locally weighted regression?

Say we have a non-linear dataset, as shown in the image above. A standard line
wouldn’t fit entirely into this type of dataset. In such a case, we have to introduce a locally
weighted algorithm that can predict a very close value to the actual value of a given query
point. So another thought that comes to mind is: let's break the given dataset into smaller subsets and fit multiple smaller lines, one to each subset. Together, these multiple lines fit the entire dataset. To understand this better, look at the diagram below.

Look at the four small lines that have each been fitted to a smaller subset and that together fit the entire dataset. Now the question arises of how to select the best line to predict the output for a given query point. Before we move on to find the best-fit line, we must understand that we always learn a different set of parameters for each particular query point. Hence locally weighted regression is a non-parametric algorithm.

Understanding Locally Weighted Regression

Locally Weighted Linear Regression is a non-parametric method/algorithm. In linear regression, the data should be distributed linearly, whereas Locally Weighted Regression is suitable for non-linearly distributed data. Generally, in Locally Weighted Regression, points closer to the query point are given more weight than points farther away.

Parametric and Non-Parametric Models

Parametric

Parametric models are those which simplify the function to a known form. Such a model has a fixed collection of parameters that summarize the data.

Because these parameters are fixed in number, the model's form is decided in advance and the number of parameters does not depend on the amount of training data; it is independent of the training samples.
As an example, let us have a mapping function as described below.

b0+b1x1+b2x2=0

From the equation, b0, b1, and b2 are the coefficients of the line, which control the intercept and slope. The input variables are represented by x1 and x2.

Non-Parametric

Non-parametric algorithms do not make particular assumptions about the kind of mapping function. These algorithms do not accept a specific form of the mapping function between input and output data as true.

They have the freedom to choose any functional form from the training data. As a result, non-parametric models require much more data to estimate the mapping function than parametric ones.

Derivation of Cost Function and Weights

The cost function of linear regression is

J(θ) = Σᵢ (θᵀx(i) − y(i))²

In the case of Locally Weighted Linear Regression, the cost function is modified to

J(θ) = Σᵢ w(i) (θᵀx(i) − y(i))²

where w(i) denotes the weight of the ith training sample.

The weighting function can be defined as

w(i) = exp(−(x(i) − x)² / (2τ²))

Here, x is the point where we want to make the prediction, x(i) is the ith training example, and τ can be called the bandwidth of the Gaussian bell-shaped curve of the weighting function.

The value of τ can be adjusted to vary the values of w based on distance from the query point.

A small value of τ means that only data points at a small distance from the query point receive a large value of w (more weightage), and vice versa.

The value of w typically ranges from 0 to 1.

The Locally Weighted Regression algorithm does not have a training phase. All weights, θ, are determined during the prediction phase.

Example

Let us consider a dataset consisting of the following points:

2,5,10,17,26,37,50,65,82

Taking a query point as x = 7 and three points from the dataset 5,10,26

Therefore x(1) = 5, x(2) = 10 , x(3) = 26 . Let 𝜏 = 0.5

Thus,

w(1) = exp(−(7 − 5)² / (2 × 0.5²)) ≈ 3.35 × 10⁻⁴
w(2) = exp(−(7 − 10)² / (2 × 0.5²)) ≈ 1.52 × 10⁻⁸
w(3) = exp(−(7 − 26)² / (2 × 0.5²)) ≈ 0

From the above example it is evident that the closer the query point (x) is to a particular data point/sample x(1), x(2), x(3), etc., the larger the value of w. The weightage decreases/falls exponentially for data points far away from the query point.
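These weight values can be reproduced with a short Python sketch, assuming the Gaussian weighting function defined above:

import numpy as np

tau = 0.5
x_query = 7
x_samples = np.array([5, 10, 26])     # x(1), x(2), x(3)

# Gaussian weighting: w(i) = exp(-(x(i) - x)^2 / (2 * tau^2))
w = np.exp(-((x_samples - x_query) ** 2) / (2 * tau ** 2))
print(w)  # approx [3.35e-04, 1.52e-08, 0.0]; closer points get larger weights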

As the distance between x(i) and x increases, weights decrease. This decreases the
contribution of error term to the cost function and vice versa.
Locally Weighted Regression (LWR) is a type of regression analysis used when the
data exhibit heteroscedasticity, meaning the variance around the regression line is not
constant. Unlike traditional regression methods that globally fit a function to all data points,
LWR fits a separate regression model for each query point, with the model’s parameters
dependent on that query point.

At its core, LWR assigns weights to the data points based on their proximity to the
query point. The intuition behind this is that nearby points are likely to be more relevant in
predicting the value of the query point. This proximity-based weighting gives more
importance to nearby observations while down weighting those that are farther away.
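A minimal Python sketch of this idea, fitting a separate weighted linear model per query point via the weighted normal equations; the Gaussian weights, synthetic sine data, and τ = 0.5 are assumptions made for illustration.

import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    # Design matrix with a bias column.
    Xb = np.c_[np.ones(len(X)), X]
    xq = np.r_[1.0, x_query]
    # Gaussian weights based on distance to the query point.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Synthetic non-linear data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 50)

print(lwlr_predict(np.array([3.0]), X, y, tau=0.5))  # close to sin(3) ~ 0.14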

Mathematical Formulation

We use the modified, weighted cost function derived above to perform this local fit.

Key Characteristics of Locally Weighted Regression

1. Non-parametric Nature: LWR doesn’t make explicit assumptions about the functional
form of the relationship between input and output variables. Instead, it adapts locally to
the data, making it suitable for modeling complex relationships.
2. Robustness: LWR gives more weight to nearby points, making it robust to outliers.
Outliers exert less influence on the fitted model unless they are close to the query point.
3. Local Adaptation: The model parameters vary depending on the query point, allowing
for local adaptation to different regions of the input space. This flexibility is particularly
useful when the relationship between variables changes across the input space.

Applications of Locally Weighted Regression

1. Function Approximation: LWR is often used for approximating complex functions where a global model might not capture the nuances of the data accurately. For instance, it can be employed in robotics for sensor calibration or trajectory planning.
2. Anomaly Detection: Due to its robustness to outliers, LWR can be utilized in anomaly
detection tasks where detecting anomalies in data is crucial, such as fraud detection in
financial transactions or fault detection in industrial processes.
3. Time Series Forecasting: In time series analysis, LWR can be employed for short-term
forecasting by fitting regression models to local segments of the time series data.

Locally Weighted Regression is a versatile technique in the machine learning toolbox, offering flexibility and robustness in modeling complex relationships within data. By adapting
locally to the data points, LWR can capture intricate patterns that may be overlooked by
traditional regression methods. Its applications range from function approximation to anomaly
detection and time series forecasting. Understanding and leveraging LWR can thus empower
machine learning practitioners to tackle a wide array of real-world problems effectively.

4. Radial Basis Functions

In the realm of machine learning and artificial intelligence, Neural Networks (NN)
have established their prominence due to their remarkable ability to learn from data and make
predictions or decisions without being explicitly programmed to perform the task. Among
various types of neural networks, radial basis function neural networks (RBFNN) are a
unique class that have proved to be highly effective in various applications including function
approximation, time series prediction, classification, and control.

What are radial basis function neural networks?

A radial basis function (RBF) neural network is a type of artificial neural network that
uses radial basis functions as activation functions. It typically consists of three layers: an
input layer, a hidden layer, and an output layer. The hidden layer applies a radial basis
function, usually a Gaussian function, to the input. The output layer then linearly combines
these outputs to generate the final output. RBF neural networks are highly versatile and are
extensively used in pattern classification tasks, function approximation, and a variety of
machine learning applications. They are especially known for their ability to handle non-
linear problems effectively.

Structure of RBF neural networks

An RBF neural network typically comprises three layers:

 Input layer: This layer simply transmits the inputs to the neurons in the hidden layer.
 Hidden layer: Each neuron in this layer applies a radial basis function to the inputs it
receives.
 Output layer: Each neuron in this layer computes a weighted sum of the outputs from
the hidden layer, resulting in the final output.

Here's the basic flow diagram of the RBF neural network:



Mathematical background

The output y of an RBF network is a linear combination of radial basis functions. It is given by

y(x) = Σᵢ wᵢ ϕ(‖x − cᵢ‖)

where cᵢ are the centers of the hidden-layer radial basis functions and wᵢ are the output-layer weights.
How do RBF neural networks work?

Radial basis function networks (RBFNs) work by comparing the input to known examples
from the training data to classify it.

Here’s a simplified explanation:

1. RBFNs start with an input vector. This vector is fed into the input layer of the
network.
2. The network also has a hidden layer, which comprises radial basis function (RBF)
neurons.
3. Each of these RBF neurons has a center, and they measure how close the input is to
their center. They do this using a special function called a Gaussian transfer function.
The output of this function is higher when the input is close to the neuron’s center and
lower when the input is far away.
4. The outputs from the hidden layer are then combined in the output layer. Each node in
the output layer corresponds to a different category or class of data. The network
determines the input’s class by calculating a weighted sum of the outputs from the
hidden layer.
5. The final output of the network is a combination of these weighted sums, which is
used to classify the input.

Understanding RBFNN in a fun way

Let’s think of radial basis function networks (RBFNs) as a team of detectives trying to
solve a mystery.

1. The mystery is the input data. It’s like a puzzle that needs to be solved or a question
that needs to be answered.
2. The detectives are the neurons in the hidden layer. Each detective has a special area of
expertise or a “center.” They are good at solving mysteries close to their area of
expertise.
3. When a new mystery comes in, each detective compares it to their area of expertise
using a tool called a Gaussian transfer function. This tool tells them how similar the
mystery is to what they know best. If the mystery is very similar to their area of
expertise, the tool gives a high score. If it’s very different, the tool gives a low score.
4. Once all the detectives have scored the mystery, their scores are combined in a way
that gives more weight to the most confident detectives. This is the weighted sum.
This is done in the output layer. Each node in this layer represents a different possible
solution to the mystery.
5. The final solution to the mystery is the one that gets the highest combined score from
all the detectives. This is how the network classifies the input data.

So, in a nutshell, RBFNs solve mysteries (classify input data) by letting a team of expert
detectives (neurons) compare the mystery to their areas of expertise and combine their scores
to find the best solution.

Training an RBF network

Training an RBF network involves two steps:

1. Determining the centers c_i and the parameter β of the radial basis functions: this can be done using a clustering algorithm like K-means on the training data.
2. Determining the weights w_i: this can be done using a linear regression algorithm on the outputs of the hidden layer.
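A minimal sketch of this two-step training, assuming scikit-learn's KMeans for the centers and a least-squares fit for the output weights; the toy sine data, 10 centers, and β = 2 are illustrative choices.

import numpy as np
from sklearn.cluster import KMeans

# Toy 1-D regression data: approximate y = sin(x).
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

# Step 1: choose the centres c_i with K-means; beta controls the width.
n_centers, beta = 10, 2.0
centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_

# Gaussian hidden-layer activations: phi_i(x) = exp(-beta * ||x - c_i||^2)
def hidden(X):
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-beta * d2)

# Step 2: solve for the output weights w_i by linear least squares.
w, *_ = np.linalg.lstsq(hidden(X), y, rcond=None)

# Prediction for a new point is the weighted sum of the RBF activations.
x_new = np.array([[np.pi / 2]])
print(hidden(x_new) @ w)  # approx sin(pi/2) = 1.0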

Types of Radial Basis Function

There are several types of Radial Basis Functions (RBFs), each with its own characteristics
and mathematical formulations. Some common types include:

 Gaussian Radial Basis Function: It has a bell-shaped curve and is often employed in various applications due to its simplicity and effectiveness. It is represented as ϕ(r) = exp(−r² / (2σ²)), where σ controls the width of the bell.

 Multiquadric Radial Basis Function: It provides a smooth interpolation and is commonly used in applications like meshless methods and radial basis function interpolation. It is defined as ϕ(r) = √(r² + c²), where c is a shape parameter.

 Inverse Multiquadric Radial Basis Function: This type of function is the reciprocal of the Multiquadric RBF, resulting in a different shape: ϕ(r) = 1 / √(r² + c²).

 Thin Plate Spline Radial Basis Function: The Thin Plate Spline RBF is defined as ϕ(r) = r² log(r), where r is the Euclidean distance between the input and the center. This RBF is often used in applications involving thin-plate splines, which are used for surface interpolation and deformation.

 Cubic Radial Basis Function: The Cubic RBF is defined as ϕ(r) = r³, where r is the Euclidean distance. It has cubic polynomial behavior and is sometimes used in interpolation.
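These functions can be written compactly in Python (a sketch; σ and c are assumed shape parameters):

import numpy as np

# r is the Euclidean distance between the input and the centre;
# sigma and c are assumed shape parameters.
def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

def thin_plate_spline(r):
    # Defined as r^2 * log(r); conventionally 0 at r = 0.
    return np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-12)), 0.0)

def cubic(r):
    return r ** 3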

5. Case Based Learning

Case-based learning (CBL) is a machine learning method in which a system learns from solving previous cases similar to the current task.

In this method, the system uses a case base that contains a number of previously
solved cases that are similar to the current task. The system uses this information to search
for similar cases and apply the previous solution to the current task.

The CBL process consists of three phases: retrieval, adaptation and evaluation. In the
retrieval phase, the system searches for similar cases in the database. In the adaptation phase,
the system modifies the solution of the previous case to fit the current task. In the evaluation
phase, the system evaluates the proposed solution and compares it with the optimal solution.

Case-based learning is used in a variety of applications, such as medical diagnostic


problem solving, pattern recognition, decision making, task planning, among others.

In the realm of machine learning, various techniques and methodologies are employed
to teach algorithms to recognize patterns, make predictions, and solve problems. One such
approach gaining traction is Case Based Learning (CBL). CBL is a method where new
problems are solved by referring to similar past cases. It operates on the premise that the past
experiences of solving problems can be leveraged to tackle new ones. Let’s delve deeper into
what Case Based Learning entails and how it is utilized in the field of machine learning.

Understanding Case Based Learning

At its core, Case Based Learning involves using past experiences or cases to guide
decision-making in new situations. Just as humans often draw from their past experiences to
handle novel problems, CBL algorithms learn from historical cases to make predictions or
decisions. These cases typically consist of a problem, its corresponding solution, and
contextual information. The key idea is to find similarities between the current problem and
past cases and then adapt or apply the solutions accordingly.

How Case Based Learning Works

In the realm of machine learning, the process of Case Based Learning involves several
steps:

1. Case Representation: Each historical case is represented in a structured format, capturing relevant features and their corresponding outcomes. This representation facilitates comparison and retrieval during the decision-making process.

2. Case Retrieval: When faced with a new problem, the CBL system retrieves similar cases
from its memory based on certain similarity metrics. These metrics evaluate how closely
the characteristics of the current problem match those of past cases.

3. Adaptation: Once relevant cases are retrieved, the system adapts the solutions from these
cases to fit the current problem. This may involve tweaking the solutions based on the
specific context or conditions of the new problem.

4. Solution Application: Finally, the adapted solution is applied to address the new
problem, and the outcome is observed. Over time, as more cases are encountered and
solutions are applied, the CBL system refines its decision-making abilities.
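A minimal Python sketch of this retrieve/adapt/apply/retain loop; the toy case base, its features, and the simple linear adaptation rule are invented purely for illustration.

import numpy as np

# Toy case base: each case is (problem features, known solution).
# Features and solutions are invented purely for illustration.
case_base = [
    {"features": np.array([20.0, 0.3]), "solution": 65.0},
    {"features": np.array([5.0, 0.8]),  "solution": 80.0},
    {"features": np.array([30.0, 0.2]), "solution": 60.0},
]

def solve(new_features):
    # 2. Case retrieval: nearest stored case by Euclidean distance.
    best = min(case_base,
               key=lambda c: np.linalg.norm(c["features"] - new_features))
    # 3. Adaptation: shift the old solution by a simple linear correction
    #    proportional to the difference in the first feature (assumed rule).
    adapted = best["solution"] - 0.5 * (new_features[0] - best["features"][0])
    # 4. Solution application + retention: store the solved case for reuse.
    case_base.append({"features": new_features, "solution": adapted})
    return adapted

print(solve(np.array([22.0, 0.3])))  # adapts the closest case (20.0, 0.3)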

Benefits of Case Based Learning

Case Based Learning offers several advantages in the realm of machine learning:

1. Flexibility: CBL systems can adapt to a wide range of problems, as they rely on past
experiences rather than predefined rules or models.

2. Incremental Learning: As new cases are encountered, the system continuously learns
and improves its decision-making abilities, making it well-suited for dynamic
environments.

3. Interpretability: Since solutions are based on past cases, the reasoning behind decisions
made by CBL systems is often transparent and understandable.
4. Efficiency: By leveraging existing knowledge, CBL can reduce the need for extensive
computational resources or labeled data, making it a cost-effective approach in certain
scenarios.

Applications of Case Based Learning

Case Based Learning finds applications across various domains, including:

1. Healthcare: CBL can assist in medical diagnosis by referencing similar patient cases and
treatment outcomes.

2. Finance: In financial markets, CBL algorithms can analyze past trading patterns to make
investment decisions.

3. Customer Service: CBL systems can provide personalized recommendations or solutions


to customer queries based on past interactions.

4. Fault Diagnosis: In industrial settings, CBL can help identify and troubleshoot
equipment failures by comparing them to past maintenance cases.

Case-Based Reasoning in Machine Learning


In case-based reasoning, a new problem is solved by adapting solutions that were useful in the past. It is therefore also referred to as an experience-based, intelligent problem-solving approach: it means learning from past experiences and using that knowledge to approach new problems.

For example, assume there is a CBR mechanism for an e-commerce application that
provides services to its customers. The CBR mechanism can be used to improve the customer experience based on past experiences: if a customer likes a particular category, the CBR mechanism helps find similar cases for that customer.
Types of Knowledge in case-based reasoning
There are mainly four types of knowledge containers in CBR.
 Vocabulary: Vocabulary involves the process of determining information
about attributes and parameters required for the selection of features that are used for
determining cases.

 Similarity Measures: Similarity measures mean analyzing similarities between cases and selecting efficient methods to deal with the problem. To choose the most efficient similarity measure, we must have a complete understanding of the actual domain problem.

 Adaptation knowledge: Adaptation knowledge means adapting the


information needed to evaluate multiple stages in the CBR cycle.
It tells how differences in problems affect the solution. It involves guiding a process to
improve a solution and select those which fit better to the needs and constraints of the
given problem.

 Cases: Cases consist of information about a solved problem. They are


the representation of stored knowledge of past experiences and are a fundamental
part of the CBR cycle. Their content is determined on the basis of selected vocabulary.

CBR Cycle
The CBR (Case-Based Reasoning) cycle is an iterative process that describes how a new problem is approached. It refers to collecting past experiences and making use of the relevant information. Let's discuss the main steps in the case-based reasoning cycle.

 New Problem: The CBR cycle starts when a new problem arrives.

 Case retrieval: After properly analyzing the problem, the relevant information is
extracted by comparing similar cases with the newly arrived problem. Therefore, we put
together useful similar cases to solve the problem.

 Case reuse: As a next step in the cycle, the past information is reused, and feasible
solutions are selected from similar cases.
 Reasoning results: It refers to applying the filtered information and solutions by
analyzing previous similar cases to the present problem. In this step, we generally use
algorithms and heuristics to narrow down to a solution.

 Case revision: After applying the necessary information, the next step is to evaluate the output, judge its effectiveness, and produce feedback. If the result's quality is not up to the mark, the solution is modified according to the feedback, and an efficient solution is recorded for future similar cases.

 Case retain: Now, this revised case, with the present problem and the efficient solution, is stored in the case base, thus enriching it with more problem-solving capabilities.
Case-Based Reasoning VS Other Techniques
The case-based reasoning is an exceptionally well-designed technique that uses past
experiences to improve its performance. Let's discuss how it differs from other techniques.
Rule-based systems
In rule-based reasoning, pre-defined rules are used for solving problems. Experts in
the field design this set of rules. On the other hand, CBR is efficient in handling situations
where the rules may not be efficient or are not present.
Decision trees
These are a type of algorithm used for solving classification problems and are widely used in machine learning and data mining. CBR, in contrast, uses past experiences instead of building a decision tree from the given data.
Neural networks
These are machine learning algorithms, inspired by the structure of the human brain, that use past information to make predictions on new inputs. They are used in natural language processing, stock market prediction, intelligent searching, and image recognition. CBR's primary focus, in contrast, is learning from stored past cases.
Advantages of Case-Based Reasoning
Below are the advantages of case-based reasoning in machine learning.
 CBR helps in avoiding past mistakes.

 CBR is efficient in adapting to new problems by making use of previous cases.

 CBR considers small details, features, and information relevant to the problem for
effective decision-making.

 Case retention in CBR helps in expanding knowledge and improving the system.
Disadvantages of Case-Based Reasoning
Below are the disadvantages of case-based reasoning in machine learning.
 CBR can be time-consuming as it involves various stages of retrieving similar cases and
arriving at the best solution.

 CBR is a resource-intensive process; it requires more computational power.

 Scalability is also an issue in CBR. With the increase in the case base, the time needed
for retrieving and adapting the cases also increases, affecting the overall efficiency.
 CBR may give out incorrect solutions if the cases are not represented correctly.
Applications of Case-Based Reasoning
Below are the applications of case-based reasoning in machine learning
 CBR is used in fault diagnosis systems for resolving faults. CBR helps in providing
guidance to diagnose faults.

 CBR can be used for the designing process by using previous design cases that fit into a
similar set of rules.

 It is also efficient for planning a process using past experiences and arranging the action
sequence.

 CBR can be used in classifying tasks by analyzing past cases and features associated
with them.

Consequences
Case-based reasoning can help solve problems more efficiently. For example, if you
are struggling to solve a problem, others will often advise you to take a break and tackle it
with fresh eyes later. Case-based reasoning, essentially, is looking at past problems and their
solutions with fresh eyes and trying to apply them to a new situation.
Case-based reasoning recognizes that usually, the second time we approach a task we
are better at it. We have some experience; we remember our mistakes and successes and
proceed accordingly. There are various factors that can influence how effective one’s use of
case-based reasoning is:
 Experiences: It is likely that the more you’ve experienced, the better you will
be at case-based reasoning. If you’ve only cooked pasta once and try to whip
up fettuccine alfredo, you only have one prior experience to use as a problem-
solving tool. If you’ve cooked pasta 100 times, you have 100 pieces of
evidence to work with.
 Creativity and innovativeness: Case-based reasoning is only going to be
effective if you have the ability to consider creative ways to adapt past
solutions to new problems. Even if you’ve had 100 experiences, tackling a
new problem the same way that you’ve tackled an old one hurts your chance
of success. Being innovative allows you to slightly adjust old tactics and
consider new ways of solving problems.
 The ability to perceive new situations in light of old experiences: On a basic
level, case-based reasoning requires one to be able to remember past
experiences, so a decent memory is a condition for this problem-solving
approach. For case-based reasoning to work, one must find comparisons
between new problems and old problems. Drawing links between the two —
whether it be similarities or differences — will help in identifying what tactics
are going to be useful and which ones do not apply to the new situation.
Indexing experiences in such a way that makes them easy to recall is
important.
 An ability to evaluate: Being rational and logical when it comes to case-based
reasoning is also important. You should be able to adequately evaluate how
successful or unsuccessful past endeavors were to know which avenues are
worth pursuing.2

Controversies
Case-based reasoning contrasts with other popular theories of how artificial
intelligence functions. Knowledge-based reasoning, for example, suggests that having general
knowledge of a problem or domain and drawing links between general ideas and specific
problems is how computers and people arrive at solutions. Case-based reasoning,
alternatively, suggests that we recall specific instances and use them to perceive the new
problem.
Since case-based reasoning is based on anecdotal past experiences, it is very personal
and deviates from facts and agreed-upon knowledge. This makes it not the best approach
when working on a team. It also is liable to personal biases and memory as someone may
think a past solution worked really well but it was actually only a good solution for them, or
they may have recalled the past inaccurately. As well, the use of anecdotal evidence means
that the success of the approach is difficult to evaluate.
There are four steps to case-based reasoning (Retrieve, Reuse, Revise, and Retain),
and with each step comes the potential for error.6 Case-based reasoning also requires
individuals to properly remember quite a few past experiences with high levels of accuracy
— it is a bit like trying to memorize an entire scientific textbook rather than trying to
understand various processes so that you’ll be able to apply them to future reasoning, which
knowledge-based reasoning employs.
