ML 4

1 What is instance-based learning? Explain the K-Nearest Neighbor algorithm.

Instance-Based Learning
Instance-Based Learning (IBL) is a type of machine learning where the system stores all training examples and uses them
to make predictions when needed.

• It is also called lazy learning, because it does not learn a model during training.
• Instead, it waits until a new instance appears, then finds similar past examples to make a prediction.

Main Idea:
“Remember the past to predict the future.”

K-Nearest Neighbor (KNN) Algorithm


K-Nearest Neighbor (KNN) is a popular instance-based learning algorithm used for classification and regression.

Steps of KNN:
1. Choose a number k (e.g., 3, 5).
2. Measure the distance (e.g., Euclidean distance) between the new point and all training points.
3. Select the k nearest training points.
4. Vote: The most common class among the neighbors is chosen as the predicted class.

Distance Formula (Euclidean):

Distance = √((x₁ − x₂)² + (y₁ − y₂)²)
Example:
Suppose we have 3 students:

Height   Weight   Class
160 cm   60 kg    Fit
165 cm   65 kg    Fit
180 cm   85 kg    Unfit

Now we want to classify a new student: 170 cm, 70 kg


• Find the distance from the new student to all 3 students.
• Pick the k = 2 nearest neighbors.
• Both nearest neighbors are "Fit", so the new student is classified as "Fit" (a worked sketch follows below).
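To make this concrete, here is a minimal sketch in Python using NumPy; the data and k = 2 follow the example above, and the raw (unscaled) features are kept for simplicity.

import numpy as np

# Training data: (height cm, weight kg) with class labels
X_train = np.array([[160, 60], [165, 65], [180, 85]], dtype=float)
y_train = ["Fit", "Fit", "Unfit"]

def knn_classify(x_new, X, y, k):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X - x_new, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbor labels
    labels = [y[i] for i in nearest]
    return max(set(labels), key=labels.count)

print(knn_classify(np.array([170.0, 70.0]), X_train, y_train, k=2))  # -> Fit

The two nearest neighbors here are the 165 cm/65 kg and 160 cm/60 kg students, both "Fit", so the prediction agrees with the example.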

Advantages:
• Simple to understand and implement.
• No training time required.
• Works well with small datasets.

Disadvantages:
• Slow with large datasets.
• Performance depends on the value of k.
• Sensitive to irrelevant features and scaling.

Instance-Based Learning methods like KNN rely on past data to make decisions. KNN is easy to implement, makes no
assumptions about the data, and is widely used in pattern recognition and recommendation systems.

2 Explain Radial Basis Functions.

Radial Basis Functions (RBF) are a type of function used in machine learning, especially in neural networks and function
approximation tasks. They are most commonly used in RBF Networks, which are a kind of artificial neural network.
What is a Radial Basis Function?
A Radial Basis Function is a real-valued function whose output depends only on the distance from a central point (called
a center). It is radially symmetric, meaning the value changes only with distance and not direction.

General form:

φ(x) = φ(‖x − c‖)

Where:
• x = input vector
• c = center of the function
• ‖x − c‖ = Euclidean distance between x and c

Common example: Gaussian RBF

φ(x) = exp(−‖x − c‖² / (2σ²))

• σ controls the width (spread) of the curve.

RBF Neural Network Structure


An RBF Network typically has three layers:
1. Input Layer: Passes the input features.
2. Hidden Layer: Applies RBFs (like Gaussians) centered at selected points.
3. Output Layer: Performs a weighted sum of RBF outputs (like linear regression).

Working:
• Each hidden unit computes a radial basis function value based on the distance between the input and its center.
• These outputs are then combined linearly to give the final output (a minimal sketch follows below).
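Below is a minimal sketch of an RBF network forward pass in Python with NumPy; the centers, width σ, and output weights are invented values for illustration (in practice, centers are often chosen by clustering and weights fitted by least squares).

import numpy as np

def gaussian_rbf(x, centers, sigma):
    # Hidden layer: one Gaussian unit per center, evaluated at input x
    dists = np.linalg.norm(centers - x, axis=1)
    return np.exp(-dists**2 / (2 * sigma**2))

# Hypothetical 2-D centers, spread, and trained output weights
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
sigma = 0.8
weights = np.array([0.5, -1.2, 2.0])   # one weight per hidden unit

def rbf_network(x):
    # Output layer: weighted sum of the hidden activations
    return weights @ gaussian_rbf(x, centers, sigma)

print(rbf_network(np.array([1.0, 0.5])))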

Applications:
• Function approximation
• Pattern classification
• Time-series prediction
• Control systems

Advantages:
• Fast training and good approximation abilities.
• Effective for problems with localized patterns.
• Can approximate any continuous function with enough neurons.

Disadvantages:
• Choice of centers and spread (σ) is critical.
• Performance drops with high-dimensional data.
• Sensitive to outliers.

Radial Basis Functions are powerful tools used in neural networks for tasks like classification and function
approximation. RBF networks are especially useful when the data has localized behaviors, and they offer simple,
interpretable models with strong performance in many practical problems.

3 Discuss Locally Weighted Linear Regression and its remarks.

Locally Weighted Linear Regression (LWLR) is a non-parametric algorithm that performs linear regression but only
focuses on a local neighborhood of the input point when making predictions.

Unlike standard linear regression that fits one global model, LWLR fits many local models - each tailored to the area
near the test point.
Key Idea:
Instead of fitting a single straight line through all the data, LWLR fits a line around the point of interest, giving more
importance (weight) to nearby points and less to distant ones.

Working of LWLR:
1. For a given test point x, compute the weights for all training points using a kernel function (usually a Gaussian):

   wᵢ = exp(−(x − xᵢ)² / (2τ²))

   o wᵢ: weight for training point xᵢ
   o τ: bandwidth (controls how far the influence of nearby points extends)
2. Perform weighted linear regression using these weights.
3. Predict the output for x using the model created from the weighted training data (see the sketch below).
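A minimal sketch of LWLR in Python with NumPy, assuming 1-D inputs, illustrative data, and an illustrative bandwidth; the local model is solved by weighted least squares.

import numpy as np

def lwlr_predict(x_query, X, y, tau):
    # Gaussian kernel weights: points near x_query get weights close to 1
    w = np.exp(-(X - x_query)**2 / (2 * tau**2))
    # Design matrix with an intercept column
    A = np.column_stack([np.ones_like(X), X])
    W = np.diag(w)
    # Weighted least squares: theta = (A^T W A)^(-1) A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return theta[0] + theta[1] * x_query

# Illustrative non-linear data: y = sin(x) plus noise
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 50)
y = np.sin(X) + rng.normal(0, 0.1, size=50)

print(lwlr_predict(3.0, X, y, tau=0.5))   # local linear fit near x = 3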

Characteristics:
• It is a lazy learning algorithm (like KNN); it delays computation until prediction time.
• Uses different regression models for different query points.
• Works well when the data is non-linear or has local variations.

Remarks / Advantages:
• Flexible: can fit complex, non-linear relationships.
• No need to pre-train a global model.
• Interpretable: uses linear models locally.
• Good accuracy when data has local patterns.

Example Application:
• Predicting housing prices based on location and size.
• In different neighborhoods, price trends may vary; LWLR captures these local trends effectively.

Locally Weighted Linear Regression is a powerful and flexible method for modeling data with local patterns. Although it
is slower than standard regression during prediction, it provides more accurate and tailored predictions in many real-
world tasks where data behavior changes in different regions.

4 Illustrate Case-Based Reasoning.

Case-Based Reasoning (CBR) is a problem-solving technique in artificial intelligence where new problems are solved
using past experiences (called cases).

Instead of learning rules or models from data, CBR stores solved cases and reuses them to solve similar problems in the
future.

Key Steps in Case-Based Reasoning:


1. Retrieve:
Find the most similar past case(s) from the case library based on the current problem.
2. Reuse:
Use the solution of the retrieved case, possibly with modifications, to solve the new problem.
3. Revise:
If necessary, adapt or update the proposed solution to better fit the current situation.
4. Retain:
Store the new solution and its results as a new case in the database for future use.
Example: Medical Diagnosis
• Problem: A doctor encounters a patient with certain symptoms.
• Retrieve: The system finds a past patient case with similar symptoms.
• Reuse: Uses the same diagnosis or treatment plan.
• Revise: Adjusts the treatment slightly based on the current patient's condition.
• Retain: Stores this new case for future diagnosis (see the sketch after this list).
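To illustrate the retrieve/reuse/retain cycle, here is a minimal sketch in Python; the case library, symptom encoding, and similarity measure are all invented for the example (real CBR systems use richer case representations and adaptation rules).

import numpy as np

# Hypothetical case library: symptom vectors (fever, cough, fatigue on a
# 0-1 scale) paired with the diagnosis applied in that past case.
case_library = [
    (np.array([0.9, 0.8, 0.3]), "Flu"),
    (np.array([0.2, 0.9, 0.1]), "Bronchitis"),
    (np.array([0.1, 0.1, 0.9]), "Anemia"),
]

def retrieve(new_case):
    # Retrieve: find the stored case most similar to the new problem
    return min(case_library, key=lambda case: np.linalg.norm(case[0] - new_case))

def solve(new_case):
    symptoms, diagnosis = retrieve(new_case)
    # Reuse: propose the retrieved diagnosis (the revise step would adapt it)
    return diagnosis

new_patient = np.array([0.8, 0.7, 0.4])
proposed = solve(new_patient)
print(proposed)                              # -> Flu
# Retain: store the solved case for future problems
case_library.append((new_patient, proposed))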

Advantages:
• Learns continuously by adding new cases.
• Mimics human reasoning (people solve problems by remembering similar situations).
• Useful when a complete model is hard to build.
• Easily explainable, since it refers to real examples.

Disadvantages:
• Needs a well-organized case database.
• Retrieval can be slow for large datasets.
• Poor results if similar cases do not exist.
• May require adaptation rules for complex problems.

Applications of CBR:
• Medical diagnosis
• Customer support systems
• Legal case analysis
• Technical troubleshooting (e.g., computer support)

Case-Based Reasoning is a powerful AI method that solves problems based on previously solved similar cases. It is
especially useful in complex, real-world domains where building explicit rules is difficult, and experience plays a major
role in problem-solving.

5 Discuss the variants of the K-Nearest Neighbor learning algorithm and its remarks.

The K-Nearest Neighbor (KNN) algorithm is a type of instance-based learning where a new instance is classified based
on the majority class of its k closest neighbors in the training data.

Several variants of KNN have been developed to improve its performance, accuracy, and efficiency.

Variants of KNN Algorithm

1. Weighted KNN
• In standard KNN, all neighbors contribute equally.
• In Weighted KNN, closer neighbors are given higher weights than distant ones (see the sketch after this list of variants).
• Example: use the inverse squared distance as the weight: wᵢ = 1/dᵢ²
2. Distance Metric Variants
• Different distance metrics can be used:
  o Euclidean distance
  o Manhattan distance
  o Cosine similarity
  o Minkowski distance
• The choice affects classification performance depending on the data type.

3. Edited and Condensed KNN
• Edited KNN removes noisy or misclassified data to improve accuracy.
• Condensed KNN reduces dataset size by keeping only critical samples, making the algorithm faster.

4. Local KNN
• The value of k is selected based on the local density of data points.
• Smaller k is used in dense regions, larger k in sparse regions.

5. Fuzzy KNN
• Instead of assigning a hard class, Fuzzy KNN assigns membership values to different classes based on proximity.
• Useful when class boundaries are not clear.
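Here is a minimal sketch of the Weighted KNN variant referenced above, in Python; the toy 2-D data is invented, and each of the k neighbors votes for its class with weight 1/d².

import numpy as np
from collections import defaultdict

def weighted_knn(x_new, X, y, k):
    dists = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Inverse squared distance weight; epsilon avoids division by zero
        votes[y[i]] += 1.0 / (dists[i]**2 + 1e-9)
    return max(votes, key=votes.get)

# Toy 2-D data with two classes
X = np.array([[1.0, 1.0], [1.2, 0.9], [3.0, 3.0], [3.2, 2.8]])
y = ["A", "A", "B", "B"]

print(weighted_knn(np.array([2.0, 2.0]), X, y, k=3))   # -> A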

Remarks (Advantages and Disadvantages)

Advantages:
• Simple and easy to implement.
• No training phase - fast setup.
• Performs well with small datasets and well-separated classes.

Disadvantages:
• Slow during prediction with large datasets.
• Sensitive to irrelevant features and noise.
• Choice of k greatly affects performance.
• Requires proper feature scaling (e.g., normalization).

Applications of KNN:
• Handwriting recognition
• Recommender systems
• Image and speech classification
• Credit scoring

KNN is a powerful, intuitive algorithm with various variants that improve its accuracy, speed, and adaptability. By
choosing the right variant and parameters, KNN can be highly effective in many real-world classification and regression
tasks.

6 Discuss Locally Weighted Linear Regression and its remarks.

Locally Weighted Linear Regression (LWLR) is a non-parametric machine learning algorithm that improves regular linear
regression by focusing on local data points instead of the entire dataset.

Unlike standard linear regression that fits one global line, LWLR fits a new regression line for each test point, giving
more importance (weight) to data points that are closer to it.

How LWLR Works:


1. For a new input x, compute weights for all training data points.
2. The weights are higher for points that are closer to x and lower for faraway points.
3. Use these weights to perform a weighted linear regression.
4. The output is the predicted value for x.

Weight Function (Gaussian Kernel):

wᵢ = exp(−(x − xᵢ)² / (2τ²))

• wᵢ: weight of training point xᵢ
• τ: bandwidth parameter (controls how "local" the model is)
Example:
If we want to predict a house price based on area, LWLR would fit a line only using nearby houses (in terms of area) to
make the prediction. This helps capture local trends better than global linear regression.
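The effect of the bandwidth can be checked numerically; the sketch below uses made-up distances to show how a training point 2 units from the query is weighted under a narrow versus a wide τ.

import numpy as np

def kernel_weight(x, xi, tau):
    # Gaussian kernel from the weight function above
    return np.exp(-(x - xi)**2 / (2 * tau**2))

# A training point 2 units from the query is nearly ignored with a
# narrow bandwidth but still influential with a wide one.
print(kernel_weight(0.0, 2.0, tau=0.5))   # ~3.4e-04 (very local model)
print(kernel_weight(0.0, 2.0, tau=2.0))   # ~0.61   (smoother, more global)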
Remarks

Advantages:
• Captures local patterns in the data.
• Works well when data shows non-linear behavior.
• No need to train a global model.

Disadvantages:
• Slow at prediction time, since it fits a new model for every test point.
• Performance depends heavily on the choice of τ (bandwidth).
• Sensitive to noise if τ is too small.
• Not efficient with large datasets.

Applications:
• Time-series prediction
• Medical data analysis
• Stock market forecasting
• Any task with non-linear but smooth trends

Locally Weighted Linear Regression is a flexible and powerful method that adapts to local changes in the data. It is
especially useful when the data doesn't follow a simple global pattern, but rather changes in small regions.

7 Write the differences between Eager Learning and Lazy learning approaches.

Aspect                     Eager Learning                                  Lazy Learning
Definition                 Learns a model during training time             Delays learning until a query is made
Training Time              High (model is built in advance)                Low (no model is built beforehand)
Prediction Time            Fast (uses pre-built model)                     Slow (must compute at query time)
Memory Usage               Low (stores only the model)                     High (stores entire training data)
Examples                   Decision Trees, Naive Bayes, Neural Networks    K-Nearest Neighbor (KNN), Case-Based Reasoning
Generalization             Generalizes before seeing the query             Generalizes after seeing the query
Adaptability to New Data   Requires retraining for new data                Easily adapts (just add to the dataset)
Efficiency                 More efficient for a large number of queries    Less efficient for frequent queries
Flexibility                Less flexible (fixed model)                     More flexible (adapts locally per query)
Suitability                Good for static, large datasets                 Good for small, dynamic, or locally varying datasets

8 Illustrate the K-Nearest Neighbor algorithm.

K-Nearest Neighbor (KNN) is a supervised learning algorithm used for classification and regression. It classifies a new
data point based on the majority class of its K nearest neighbors in the training dataset.

It is a lazy learning method, meaning it does not learn a model in advance but makes decisions at prediction time.

Working of KNN (Step-by-Step):


1. Choose K (number of neighbors to consider).
2. Calculate the distance between the new point and all training data points. Common distances:
o Euclidean distance
o Manhattan distance
3. Identify the K nearest neighbors (the points with the smallest distances).
4. Majority voting (for classification):
o The class which occurs most frequently among the K neighbors is assigned to the new point.
5. For regression:
o The average of the K nearest neighbor values is the output.

Example:
Let’s classify a new fruit based on its weight and color score.

Fruit   Weight   Color Score   Class
A       100 g    0.5           Apple
B       110 g    0.6           Apple
C       120 g    0.8           Orange
D       130 g    0.9           Orange

• New fruit: 115 g, 0.7
• Use K = 3
• Find distances to all fruits.
• The closest 3 are B, C, and A (A and D are actually equidistant from the new fruit; the tie is broken in favor of A here).
• Classes: Apple, Apple, Orange → Majority = Apple
• So, the new fruit is classified as Apple.

Diagram (Optional to draw in exam):


Plot training points on a 2D graph (Weight vs Color Score), and show a circle around the test point that includes its K
nearest neighbors.
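As a numerical check on the example, here is a minimal sketch in Python; because the features are unscaled, weight dominates the distance, which is why A and D tie for the third neighbor (a stable sort keeps A ahead of D).

import numpy as np
from collections import Counter

# Fruit data: (weight g, color score) with class labels
X = np.array([[100, 0.5], [110, 0.6], [120, 0.8], [130, 0.9]])
y = ["Apple", "Apple", "Orange", "Orange"]

new_fruit = np.array([115, 0.7])
dists = np.linalg.norm(X - new_fruit, axis=1)
# Stable sort so the tie between A and D resolves to A, as in the text
nearest = np.argsort(dists, kind="stable")[:3]   # B, C, then A
votes = Counter(y[i] for i in nearest)
print(votes.most_common(1)[0][0])                # -> Apple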

Advantages:
• Simple to understand and implement
• No training time needed
• Adapts to changing data

Disadvantages:
• Slow for large datasets
• Sensitive to irrelevant features and noise
• Requires proper feature scaling

K-Nearest Neighbor is a powerful and flexible algorithm used in many real-world applications like image recognition,
recommendation systems, and medical diagnosis. Its effectiveness depends on the right choice of K and distance
metric.
