1 What is instance-based learning? Explain the K-Nearest Neighbor algorithm.
Instance-Based Learning
Instance-Based Learning (IBL) is a type of machine learning where the system stores all training examples and uses them
to make predictions when needed.
It is also called lazy learning, because it does not learn a model during training.
Instead, it waits until a new instance appears, then finds similar past examples to make a prediction.
Main Idea:
“Remember the past to predict the future.”
K-Nearest Neighbor (KNN) Algorithm
K-Nearest Neighbor (KNN) is a popular instance-based learning algorithm used for classification and regression.
Steps of KNN:
1. Choose a number k (e.g., 3, 5).
2. Measure the distance (e.g., Euclidean distance) between the new point and all training points.
3. Select the k nearest training points.
4. Vote: The most common class among the neighbors is chosen as the predicted class.
Distance Formula (Euclidean):
\text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
Example:
Suppose we have 3 students:
| Height | Weight | Class |
|--------|--------|-------|
| 160 cm | 60 kg  | Fit   |
| 165 cm | 65 kg  | Fit   |
| 180 cm | 85 kg  | Unfit |
Now we want to classify a new student: 170 cm, 70 kg
Find the distance from the new student to all 3 students.
Pick the k = 2 nearest neighbors: 165 cm/65 kg (distance ≈ 7.1) and 160 cm/60 kg (distance ≈ 14.1).
Both are "Fit", so the new student is classified as "Fit".
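A minimal Python sketch of these steps, using the small dataset from the table above (the data values are taken from the example; everything else is an illustrative assumption, not a library implementation):

```python
import math

# Training data from the table above: (height_cm, weight_kg, class_label)
students = [
    (160, 60, "Fit"),
    (165, 65, "Fit"),
    (180, 85, "Unfit"),
]

def euclidean(a, b):
    """Euclidean distance between two (height, weight) points."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def knn_classify(query, data, k):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # Sort training points by distance to the query point and keep the k closest
    neighbors = sorted(data, key=lambda row: euclidean(query, row[:2]))[:k]
    labels = [label for _, _, label in neighbors]
    # Majority vote among the neighbor labels
    return max(set(labels), key=labels.count)

print(knn_classify((170, 70), students, k=2))  # -> "Fit"
```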
Advantages:
Simple to understand and implement.
No training time required.
Works well with small datasets.
Disadvantages:
Slow with large datasets.
Performance depends on value of k.
Sensitive to irrelevant features and scaling.
Instance-Based Learning methods like KNN rely on past data to make decisions. KNN is easy to implement, makes no
assumptions about the data, and is widely used in pattern recognition and recommendation systems.
2 Explain Radial Basis Functions.
Radial Basis Functions (RBF) are a type of function used in machine learning, especially in neural networks and function
approximation tasks. They are most commonly used in RBF Networks, which are a kind of artificial neural network.
What is a Radial Basis Function?
A Radial Basis Function is a real-valued function whose output depends only on the distance from a central point (called
a center). It is radially symmetric, meaning the value changes only with distance and not direction.
General form:
\phi(x) = \phi(\| x - c \|)
Where:
x = input vector
c = center of the function
‖𝑥 − 𝑐‖ = Euclidean distance between x and c
Common example: Gaussian RBF
\phi(x) = e^{-\| x - c \|^2 / (2\sigma^2)}
σ controls the width (spread) of the curve.
RBF Neural Network Structure
An RBF Network typically has three layers:
1. Input Layer: Passes the input features.
2. Hidden Layer: Applies RBFs (like Gaussians) centered at selected points.
3. Output Layer: Performs a weighted sum of RBF outputs (like linear regression).
Working:
Each hidden unit computes a radial basis function value based on the distance between the input and its center.
These outputs are then combined linearly to give the final output.
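A minimal NumPy sketch of this forward pass (the centers, widths, and output weights below are illustrative assumptions, not values learned from data):

```python
import numpy as np

def gaussian_rbf(x, center, sigma):
    """Gaussian RBF: output depends only on the distance from x to the center."""
    return np.exp(-np.linalg.norm(x - center) ** 2 / (2 * sigma ** 2))

def rbf_network(x, centers, sigmas, weights, bias=0.0):
    """Hidden layer: one Gaussian per center; output layer: weighted sum."""
    hidden = np.array([gaussian_rbf(x, c, s) for c, s in zip(centers, sigmas)])
    return weights @ hidden + bias

# Assumed parameters for a 2-input, 3-hidden-unit network
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 0.0])]
sigmas = [0.5, 0.5, 0.5]
weights = np.array([1.0, -0.5, 2.0])

print(rbf_network(np.array([0.9, 1.1]), centers, sigmas, weights))
```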
Applications:
Function approximation
Pattern classification
Time-series prediction
Control systems
Advantages:
Fast training and good approximation abilities.
Effective for problems with localized patterns.
Can approximate any continuous function with enough neurons.
Disadvantages:
Choice of centers and spread (σ) is critical.
Performance drops with high-dimensional data.
Sensitive to outliers.
Radial Basis Functions are powerful tools used in neural networks for tasks like classification and function
approximation. RBF networks are especially useful when the data has localized behaviors, and they offer simple,
interpretable models with strong performance in many practical problems.
3 Discuss Locally Weighted Linear Regression and its remarks.
Locally Weighted Linear Regression (LWLR) is a non-parametric algorithm that performs linear regression but only
focuses on a local neighborhood of the input point when making predictions.
Unlike standard linear regression that fits one global model, LWLR fits many local models - each tailored to the area
near the test point.
Key Idea:
Instead of fitting a single straight line through all the data, LWLR fits a line around the point of interest, giving more
importance (weight) to nearby points and less to distant ones.
Working of LWLR:
1. For a given test point x, compute the weights for all training points using a kernel function (usually a Gaussian).
w_i = e^{-(x - x_i)^2 / (2\tau^2)}
o 𝑤𝑖 : Weight for training point 𝑥𝑖
o τ: Bandwidth (controls how far the influence of nearby points extends)
2. Perform weighted linear regression using these weights.
3. Predict the output for x using the model created from the weighted training data.
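A small NumPy sketch of these three steps for a one-dimensional input (the toy data and the bandwidth τ are assumed for illustration; the weighted least-squares fit is solved through the normal equations):

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=1.0):
    """Predict y at x_query by fitting a weighted linear model around it."""
    # Step 1: Gaussian weights, larger for training points near x_query
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))
    # Step 2: weighted least squares on [1, x] (intercept + slope)
    A = np.column_stack([np.ones_like(X), X])          # design matrix
    W = np.diag(w)
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    # Step 3: predict for the query point
    return theta[0] + theta[1] * x_query

# Assumed toy data with a local (non-linear) trend
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 4.8, 4.9, 5.1])
print(lwlr_predict(3.5, X, y, tau=1.0))
```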
Characteristics:
It is a lazy learning algorithm (like KNN); it delays computation until prediction time.
Uses different regression models for different query points.
Works well when the data is non-linear or has local variations.
Remarks / Advantages:
Flexible: Can fit complex, non-linear relationships.
No need to pre-train a global model.
Interpretable: Uses linear models locally.
Good accuracy when data has local patterns.
Example Application:
Predicting housing prices based on location and size.
In different neighborhoods, price trends may vary - LWLR captures these local trends effectively.
Locally Weighted Linear Regression is a powerful and flexible method for modeling data with local patterns. Although it
is slower than standard regression during prediction, it provides more accurate and tailored predictions in many real-
world tasks where data behavior changes in different regions.
4 Illustrate Case-Based Reasoning.
Case-Based Reasoning (CBR) is a problem-solving technique in artificial intelligence where new problems are solved
using past experiences (called cases).
Instead of learning rules or models from data, CBR stores solved cases and reuses them to solve similar problems in the
future.
Key Steps in Case-Based Reasoning:
1. Retrieve:
Find the most similar past case(s) from the case library based on the current problem.
2. Reuse:
Use the solution of the retrieved case, possibly with modifications, to solve the new problem.
3. Revise:
If necessary, adapt or update the proposed solution to better fit the current situation.
4. Retain:
Store the new solution and its results as a new case in the database for future use.
Example: Medical Diagnosis
Problem: A doctor encounters a patient with certain symptoms.
Retrieve: The system finds a past patient case with similar symptoms.
Reuse: Uses the same diagnosis or treatment plan.
Revise: Adjusts the treatment slightly based on current patient’s condition.
Retain: Stores this new case for future diagnosis.
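A toy Python sketch of the Retrieve, Reuse, and Retain steps (the symptom encoding, similarity measure, and case library are invented purely for illustration):

```python
# Each case: (symptom feature vector, diagnosis) -- invented toy data
case_library = [
    ([1, 1, 0, 0], "Common cold"),
    ([1, 0, 1, 1], "Flu"),
    ([0, 0, 1, 1], "Allergy"),
]

def similarity(a, b):
    """Count of matching symptom flags (a very simple similarity measure)."""
    return sum(1 for x, y in zip(a, b) if x == y)

def retrieve_and_reuse(new_symptoms):
    """Retrieve the most similar past case and reuse its diagnosis."""
    best_case = max(case_library, key=lambda case: similarity(case[0], new_symptoms))
    return best_case[1]

diagnosis = retrieve_and_reuse([1, 0, 1, 0])    # Retrieve + Reuse
# Revise: a doctor could adjust the proposed diagnosis here
case_library.append(([1, 0, 1, 0], diagnosis))  # Retain the new case
print(diagnosis)
```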
Advantages:
Learns continuously by adding new cases.
Mimics human reasoning (people solve problems by remembering similar situations).
Useful when a complete model is hard to build.
Easily explainable since it refers to real examples.
Disadvantages:
Needs a well-organized case database.
Retrieval can be slow for large datasets.
Poor results if similar cases do not exist.
May require adaptation rules for complex problems.
Applications of CBR:
Medical diagnosis
Customer support systems
Legal case analysis
Technical troubleshooting (e.g., computer support)
Case-Based Reasoning is a powerful AI method that solves problems based on previously solved similar cases. It is
especially useful in complex, real-world domains where building explicit rules is difficult, and experience plays a major
role in problem-solving.
5 Discuss the variants of the K-Nearest Neighbor learning algorithm and its remarks.
The K-Nearest Neighbor (KNN) algorithm is a type of instance-based learning where a new instance is classified based
on the majority class of its k closest neighbors in the training data.
Several variants of KNN have been developed to improve its performance, accuracy, and efficiency.
Variants of KNN Algorithm
1. Weighted KNN
In standard KNN, all neighbors contribute equally.
In Weighted KNN, closer neighbors are given higher weights than distant ones.
Example: use the inverse squared distance as the weight, w_i = 1 / d_i^2 (a short code sketch of this weighting appears after the list of variants).
2. Distance Metric Variants
Different distance metrics can be used:
o Euclidean distance
o Manhattan distance
o Cosine similarity
o Minkowski distance
The choice affects classification performance depending on data type.
3. Edited and Condensed KNN
Edited KNN removes noisy or misclassified data to improve accuracy.
Condensed KNN reduces dataset size by keeping only critical samples, making the algorithm faster.
4. Local KNN
The value of k is selected based on the local density of data points.
Smaller k in dense regions, larger k in sparse regions.
5. Fuzzy KNN
Instead of assigning a hard class, Fuzzy KNN assigns membership values to different classes based on proximity.
Useful when class boundaries are not clear.
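A short sketch of the distance-weighted voting from variant 1 (the one-dimensional toy data is assumed; weights use 1/d² as above, with a small constant added to avoid division by zero):

```python
from collections import defaultdict

# Assumed toy training data: (feature value, class label)
train = [(1.0, "A"), (1.5, "A"), (3.0, "B"), (3.2, "B")]

def weighted_knn(x, data, k=3):
    """Vote with weight 1/d^2, so closer neighbors count for more."""
    neighbors = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    votes = defaultdict(float)
    for xi, label in neighbors:
        votes[label] += 1.0 / (abs(xi - x) ** 2 + 1e-9)
    return max(votes, key=votes.get)

print(weighted_knn(2.0, train, k=3))  # -> "A" (the two nearby A points outweigh one B)
```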
Remarks (Advantages and Disadvantages)
Advantages:
Simple and easy to implement.
No training phase - fast setup.
Performs well with small datasets and well-separated classes.
Disadvantages:
Slow during prediction with large datasets.
Sensitive to irrelevant features and noise.
Choice of k greatly affects performance.
Requires proper feature scaling (e.g., normalization).
Applications of KNN:
Handwriting recognition
Recommender systems
Image and speech classification
Credit scoring
KNN is a powerful, intuitive algorithm with various variants that improve its accuracy, speed, and adaptability. By
choosing the right variant and parameters, KNN can be highly effective in many real-world classification and regression
tasks.
6 Discuss Locally Weighted Linear Regression and its remarks.
Locally Weighted Linear Regression (LWLR) is a non-parametric machine learning algorithm that improves regular linear
regression by focusing on local data points instead of the entire dataset.
Unlike standard linear regression that fits one global line, LWLR fits a new regression line for each test point, giving
more importance (weight) to data points that are closer to it.
How LWLR Works:
1. For a new input x, compute weights for all training data points.
2. The weights are higher for points that are closer to x and lower for faraway points.
3. Use these weights to perform a weighted linear regression.
4. The output is the predicted value for x.
Weight Function (Gaussian Kernel):
w_i = e^{-(x - x_i)^2 / (2\tau^2)}
𝑤𝑖 : Weight of training point 𝑥𝑖
τ: Bandwidth parameter (controls how "local" the model is)
Example:
If we want to predict a house price based on area, LWLR would fit a line only using nearby houses (in terms of area) to
make the prediction. This helps capture local trends better than global linear regression.
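To see how the bandwidth τ controls how "local" the model is, the Gaussian weights can be computed for a few values of τ (the house areas below are assumed toy values):

```python
import numpy as np

areas = np.array([800.0, 900.0, 1000.0, 1500.0, 2000.0])  # assumed training areas
query = 950.0                                              # area of the house to price

for tau in (50.0, 200.0, 1000.0):
    w = np.exp(-(areas - query) ** 2 / (2 * tau ** 2))
    print(f"tau={tau}: weights={np.round(w, 3)}")
# Small tau -> only houses with a very similar area influence the fit;
# large tau -> the fit approaches ordinary (global) linear regression.
```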
Remarks
Advantages:
Captures local patterns in the data.
Works well when data shows non-linear behavior.
No need to train a global model.
Disadvantages:
Slow at prediction time, since it fits a new model for every test point.
Performance depends heavily on choice of τ (bandwidth).
Sensitive to noise if τ is too small.
Not efficient with large datasets.
Applications:
Time-series prediction
Medical data analysis
Stock market forecasting
Any task with non-linear but smooth trends
Locally Weighted Linear Regression is a flexible and powerful method that adapts to local changes in the data. It is
especially useful when the data doesn't follow a simple global pattern, but rather changes in small regions.
7 Write the differences between Eager Learning and Lazy Learning approaches.
| Aspect | Eager Learning | Lazy Learning |
|--------|----------------|---------------|
| Definition | Learns a model during training time | Delays learning until a query is made |
| Training Time | High (model is built in advance) | Low (no model is built beforehand) |
| Prediction Time | Fast (uses pre-built model) | Slow (must compute at query time) |
| Memory Usage | Low (stores only the model) | High (stores the entire training data) |
| Examples | Decision Trees, Naive Bayes, Neural Networks | K-Nearest Neighbor (KNN), Case-Based Reasoning |
| Generalization | Generalizes before seeing the query | Generalizes after seeing the query |
| Adaptability to New Data | Requires retraining for new data | Easily adapts (just add to the dataset) |
| Efficiency | More efficient for a large number of queries | Less efficient for frequent queries |
| Flexibility | Less flexible (fixed model) | More flexible (adapts locally per query) |
| Suitability | Good for static, large datasets | Good for small, dynamic, or locally varying datasets |
8 Illustrate the K-Nearest Neighbor algorithm.
K-Nearest Neighbor (KNN) is a supervised learning algorithm used for classification and regression. It classifies a new
data point based on the majority class of its K nearest neighbors in the training dataset.
It is a lazy learning method, meaning it does not learn a model in advance but makes decisions at prediction time.
Working of KNN (Step-by-Step):
1. Choose K (number of neighbors to consider).
2. Calculate the distance between the new point and all training data points. Common distances:
o Euclidean distance
o Manhattan distance
3. Identify the K nearest neighbors (the points with the smallest distances).
4. Majority voting (for classification):
o The class which occurs most frequently among the K neighbors is assigned to the new point.
5. For regression:
o The average of the K nearest neighbor values is the output.
Example:
Let’s classify a new fruit based on its weight and color score.
| Fruit | Weight | Color Score | Class  |
|-------|--------|-------------|--------|
| A     | 100 g  | 0.5         | Apple  |
| B     | 110 g  | 0.6         | Apple  |
| C     | 120 g  | 0.8         | Orange |
| D     | 130 g  | 0.9         | Orange |
New fruit: 115g, 0.7
Use K = 3
Find distances to all fruits
Closest 3 are: A, B, C
Classes: Apple, Apple, Orange → Majority = Apple
So, the new fruit is classified as Apple
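The same example can be run with scikit-learn's KNeighborsClassifier, assuming the library is available (note that in practice the weight and color-score features should first be scaled to comparable ranges, since raw weight dominates the distance here):

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data from the table above: [weight_g, color_score]
X = [[100, 0.5], [110, 0.6], [120, 0.8], [130, 0.9]]
y = ["Apple", "Apple", "Orange", "Orange"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
# Majority class among the 3 nearest fruits (Apple in the worked example above)
print(knn.predict([[115, 0.7]]))
```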
Diagram (Optional to draw in exam):
Plot training points on a 2D graph (Weight vs Color Score), and show a circle around the test point that includes its K
nearest neighbors.
Advantages:
Simple to understand and implement
No training time needed
Adapts to changing data
Disadvantages:
Slow for large datasets
Sensitive to irrelevant features and noise
Requires proper feature scaling
K-Nearest Neighbor is a powerful and flexible algorithm used in many real-world applications like image recognition,
recommendation systems, and medical diagnosis. Its effectiveness depends on the right choice of K and distance
metric.