
Linear Regression in Machine Learning

• Linear regression is one of the easiest and most popular Machine Learning algorithms.
It is a statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or numeric variables such as
sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a dependent variable (y) and
one or more independent variables (x), hence it is called linear regression.
• Since linear regression shows a linear relationship, it finds how the value of the dependent
variable changes according to the value of the independent variable.
• The linear regression model provides a sloped straight line representing the
relationship between the variables.

Mathematically, we can represent linear regression as: y = a0 + a1x + ε


y = dependent variable
x = independent variable
a0 = intercept of the line
a1 = linear regression coefficient (slope)
ε = random error
The values of the x and y variables form the training dataset used to fit the Linear Regression
model.
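As a small illustration, the following is a minimal Python sketch (assuming NumPy is available; the data values are made up) that fits a0 and a1 with the ordinary least-squares formulas and uses them to make a prediction:

```python
import numpy as np

# Made-up training data (x = years of experience, y = salary in $1000s)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([30, 35, 42, 48, 55], dtype=float)

# Ordinary least-squares estimates of slope (a1) and intercept (a0)
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

# Predict for a new input
x_new = 6.0
y_pred = a0 + a1 * x_new
print(f"y = {a0:.2f} + {a1:.2f} * x  ->  prediction for x = 6: {y_pred:.2f}")
```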
Types of Linear Regression
Linear regression can be further divided into two types of algorithm:
Simple Linear Regression: If a single independent variable is used to predict the value
of a numerical dependent variable, the algorithm is called Simple Linear Regression.
Multiple Linear Regression: If more than one independent variable is used to predict
the value of a numerical dependent variable, the algorithm is called Multiple Linear Regression.
Linear Regression Line
A straight line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:
Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the
X-axis, such a relationship is termed a positive linear relationship.
Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the
X-axis, such a relationship is called a negative linear relationship.

The goal of the algorithm is to find the best Fit Line equation that can predict the values based
on the independent variables.
Best Fit Line:
• The best Fit Line equation provides a straight line that represents the relationship between
the dependent and independent variables.
• The slope of the line indicates how much the dependent variable changes for a unit change
in the independent variable(s).
• Our primary objective while using linear regression is to locate the best-fit line, which implies
that the error between the predicted and actual values should be kept to a minimum.
• The best-fit line is the one with the least error.
Cost or loss Function:
• It measures the error between what the model predicts (Ŷ) and the actual answer (Y).
• It tells us how wrong the model is.

• The goal is to find the best values for:


o θ₁ (intercept or starting point of the line)
o θ₂ (slope or how much y changes with x)
• This gives us the best-fit line that closely matches the data.
• The model predicts values using the formula:
ŷᵢ = θ₁ + θ₂xᵢ
(Where xᵢ is the input, and ŷᵢ is the predicted output)
• MSE Formula:
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
where n is the number of training examples, yᵢ is the actual value, and ŷᵢ is the predicted value for the i-th example.
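To make the cost function concrete, here is a minimal Python sketch (assuming NumPy; the data and learning rate are illustrative) that computes the MSE for the line ŷᵢ = θ₁ + θ₂xᵢ and reduces it with a few steps of gradient descent:

```python
import numpy as np

# Illustrative data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([30, 35, 42, 48, 55], dtype=float)

theta1, theta2 = 0.0, 0.0   # intercept and slope
lr = 0.01                   # learning rate (illustrative)

def mse(theta1, theta2):
    """Mean squared error of the line theta1 + theta2 * x on the data."""
    y_hat = theta1 + theta2 * x
    return np.mean((y - y_hat) ** 2)

for step in range(1000):
    y_hat = theta1 + theta2 * x
    error = y_hat - y
    # Gradients of the MSE with respect to theta1 and theta2
    grad1 = 2 * np.mean(error)
    grad2 = 2 * np.mean(error * x)
    theta1 -= lr * grad1
    theta2 -= lr * grad2

print(f"theta1 = {theta1:.2f}, theta2 = {theta2:.2f}, MSE = {mse(theta1, theta2):.3f}")
```

For simple linear regression the same minimum can also be found in closed form, as in the earlier sketch; gradient descent is shown here only to illustrate how the cost is reduced step by step.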
Linear separability

Linear separability implies that if there are two classes, then there exists a point, line, plane, or
hyperplane that splits the input features so that all points of one class lie in one half-space
and all points of the other class lie in the other half-space.

Linear separability is not an inherent property of the data; it depends on how the data is
represented or transformed. Proper feature engineering and data preprocessing can often
turn non-linearly separable data into linearly separable data.

Linear separability is a foundational concept in machine learning that makes binary
classification problems simpler. While many datasets are not linearly separable,
understanding this concept helps choose the right models and techniques for classification.

• For example, consider predicting whether a house is sold based on its area and price. We have a
number of data points with these two features, each labelled with the class Sold/Not Sold.

Importance in Supervised Learning:


• Especially important in binary classification problems.
• It helps in determining whether linear models like Logistic Regression, Perceptron, and Linear
SVM can classify the data accurately.
• If data is linearly separable, classification becomes easier and faster.

Use in Machine Learning Algorithms:


• Algorithms like Perceptron and Linear SVM are designed to find such separating hyperplanes.
• These models work best when data is perfectly separable using linear boundaries.

Linearly Non-Separable Data:

Many real-world datasets are not linearly separable (e.g., XOR problem).

In such cases, we need:


• Non-linear classifiers like Decision Trees, Neural Networks, etc.
• Or apply feature transformations or kernel tricks (in SVM) to make data linearly
separable in a higher-dimensional space.
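As a small illustration (a sketch assuming scikit-learn and NumPy are installed), a linear Perceptron cannot fit the XOR data in its original 2-D space, but adding the product feature x1*x2 makes the classes linearly separable in a higher-dimensional space:

```python
import numpy as np
from sklearn.linear_model import Perceptron

# XOR data: not linearly separable in the original 2-D space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

linear = Perceptron(max_iter=1000, tol=None).fit(X, y)
print("Accuracy on raw XOR features:", linear.score(X, y))      # typically below 1.0

# Add the product feature x1*x2; XOR becomes linearly separable in 3-D
X_aug = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
augmented = Perceptron(max_iter=1000, tol=None).fit(X_aug, y)
print("Accuracy with x1*x2 feature:", augmented.score(X_aug, y))  # typically 1.0
```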

Candidate Elimination and Version space


The Candidate Elimination Algorithm is a technique used in concept learning within machine
learning. It identifies all the hypotheses that correctly match the given training data. The
algorithm updates its set of possible hypotheses by using both positive and negative
examples. The group of all consistent hypotheses is known as the version space.

• The Candidate Elimination Algorithm (CEA) is used to find the correct target concept
from a hypothesis space based on given examples.

• It does this by keeping only the hypotheses that match all training examples.

• It uses two important boundaries:

o S (Specific hypothesis): The most specific hypothesis that fits all positive
examples.

o G (General hypothesis): The most general hypothesis that fits all positive
examples and rejects all negative ones.

• The area between S and G is called the version space, which contains all the possible
correct hypotheses.

Important Terms Used


1. Concept Learning

• Concept learning is the process of learning a general rule or condition (called a
hypothesis) from given examples.

• The goal is to generalize from the specific training data to correctly classify unseen
data.

2. Hypothesis

• A hypothesis is a rule that defines the concept.


For example: “If sky = sunny and humidity = normal, then play = yes.”

3. Specific Hypothesis (S)

• The most specific rule that covers only the positive examples.

• Initially, it is set to the null or most specific condition.


• Example: If attributes are [sky, temperature], an S might look like [sunny, warm].

4. General Hypothesis (G)

• The broadest rule that could still match the positive examples.

• Initially, it allows all values, like [?, ?] for two attributes.

• It becomes more specific with negative examples.

5. Version Space

• It is the space between S and G.

• It includes all hypotheses that agree with all training examples.

• As more examples are seen, the version space gets smaller, until the target concept is
found.

Steps of the Candidate Elimination Algorithm


1. Initialization:

o S = Most specific hypothesis (matches nothing).

o G = Most general hypothesis (matches everything).

2. For each training example:

o If the example is positive:

▪ Remove inconsistent hypotheses from G.

▪ Generalize S if needed to include the example.

o If the example is negative:

▪ Remove inconsistent hypotheses from S.

▪ Specialize G if needed to exclude the example.

3. Refinement:

o The algorithm updates S and G until all examples are processed.

o Final version space lies between S and G.
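The following is a minimal Python sketch of the algorithm for conjunctive hypotheses over discrete attributes (a simplifying assumption; real implementations handle more cases). Here "?" stands for "any value" and None plays the role of the most specific "no value". The example data at the end is the classic EnjoySport training set used in concept-learning texts.

```python
def consistent(h, x):
    """True if hypothesis h accepts instance x ('?' matches any value)."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [None] * n            # most specific hypothesis: matches nothing
    G = [['?'] * n]           # most general hypothesis: matches everything

    for x, positive in examples:
        if positive:
            # Remove from G hypotheses inconsistent with the positive example
            G = [g for g in G if consistent(g, x)]
            # Minimally generalize S so that it covers the example
            S = [xi if si is None else (si if si == xi else '?')
                 for si, xi in zip(S, x)]
        else:
            # Minimally specialize members of G so that they exclude the example
            new_G = []
            for g in G:
                if not consistent(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    if g[i] != '?':
                        continue
                    for v in domains[i]:
                        # keep only specializations still at least as general as S
                        if v != x[i] and (S[i] is None or S[i] == v):
                            h = list(g)
                            h[i] = v
                            new_G.append(h)
            G = new_G
    return S, G

# Classic EnjoySport data (Sky, AirTemp, Humidity, Wind, Water, Forecast)
domains = [['Sunny', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(examples, domains)
print("S:", S)   # expected: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print("G:", G)   # expected: [['Sunny','?','?','?','?','?'], ['?','Warm','?','?','?','?']]
```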

Advantages of CEA

• Uses both positive and negative examples.

• Maintains all possible consistent hypotheses.

• Provides more accurate and generalized results.


• Can handle noisy data better than Find-S.

Disadvantages of CEA

• More complex and slower to compute.

• May fail if data has too much noise.

• Difficult to implement for beginners.

• Not efficient for very small datasets.

Find-S Algorithm
The Find-S Algorithm is a simple concept learning method used in machine learning. It is
designed to find the most specific hypothesis that fits all the positive training examples. The
algorithm is based on the assumption that the hypothesis space contains the target concept
and that there are no errors in the data.

What is Concept Learning?


Concept learning is the task of finding a general rule or condition (called a hypothesis) that
correctly describes a target concept based on training examples. The goal is to generalize from
specific examples to correctly classify future data.

What is the Find-S Algorithm?


• Find-S stands for "Find the Most Specific Hypothesis."

• It works by starting with the most specific hypothesis and gradually generalizing it
using only positive examples.

• It ignores negative examples, which is both a strength and a limitation depending on
the data.


Steps in Find-S Algorithm

1. Initialize h to the most specific hypothesis in H (for example, [θ, θ, θ, θ, θ, θ]).

2. For each positive training example x:

o For each attribute constraint in h, if the constraint is already satisfied by x, keep it;
otherwise replace it with the next more general constraint that is satisfied by x (a
conflicting specific value becomes "?").

3. Ignore every negative example.

4. Output the final hypothesis h.
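A minimal Python sketch of Find-S for the attribute-vector hypothesis representation described above (an illustration only; None again plays the role of the most specific "no value"):

```python
def find_s(examples, n_attributes):
    """Return the most specific hypothesis consistent with the positive examples."""
    h = [None] * n_attributes          # most specific hypothesis (theta everywhere)
    for x, positive in examples:
        if not positive:
            continue                   # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:           # first positive example: copy its values
                h[i] = value
            elif h[i] != value:        # conflicting value: generalize to '?'
                h[i] = '?'
    return h

# Illustrative EnjoySport-style data
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples, 6))   # -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```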

Advantages of Find-S Algorithm

1. Very easy to understand and use

2. Works fast with small data

3. Finds the most exact rule for the given positive examples

4. Good when there are only correct (positive) examples and no mistakes in data

Disadvantages of Find-S Algorithm

1. Ignores negative (wrong) examples


2. Works only if the correct rule is already in the search space

3. Cannot handle wrong or missing data (noise)

4. May give a rule that is too narrow (overfits)

Machine Learning
• Machine Learning is the study of computer algorithms that allow computer programs to
automatically improve through experience.

• Machine Learning is about making computers modify or adapt their actions so that these
actions get more accurate.

• Machine learning is a subset of AI, which enables the machine to automatically learn from
data, improve performance from past experiences, and make predictions.

Types of Machine Learning:


1. Supervised Machine Learning

2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning

1. Supervised Machine Learning


Supervised learning is a machine learning technique where models are trained using labelled
data (input is already mapped to correct output). The goal is to learn a function that maps
input (X) to output (Y). Once trained, the model can predict outputs for new, unseen data.

Supervised learning is most effective when there is a large amount of accurately labeled data.
It allows the model to learn from previous examples and make future predictions. It is widely
used in applications where past data can be reliably linked to expected outcomes.

Example: Training a model with images of cats and dogs, labeled accordingly. After learning
the features like shape, size, color, and tail, the model can classify new images as cat or dog.

Types of supervised learning:


a) Classification

Classification algorithms are used to solve classification problems, in which the output
variable is categorical, such as Yes or No, Male or Female, Red or Blue, etc. Classification
algorithms predict the categories present in the dataset.
Algorithms: Decision Tree, Logistic Regression, SVM, Random Forest

Examples: Spam detection, Email filtering

b) Regression

Regression algorithms are used to solve regression problems, in which there is a relationship
between the input variables and a continuous output variable. They are used to predict
continuous output values, such as market trends, weather prediction, etc.

Algorithms: Linear Regression, Multivariate Regression, Lasso, Decision Tree

2. Unsupervised Machine Learning


Unsupervised Machine Learning: In unsupervised learning, the model is trained using
unlabeled data. The system identifies hidden patterns or groupings in the data without
guidance.

Example: Feeding images of various fruits without labels, the model will group them based on
similarities like color, shape, or size.

Types of unsupervised learning:


a) Clustering

Clustering groups similar data points into clusters based on shared features. It helps uncover
hidden structures in data. For example, it is used to group customers by their buying habits.
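For instance, here is a minimal sketch (assuming scikit-learn and NumPy; the customer data is made up) that groups points into two clusters with k-means:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer data: [annual spend, number of purchases]
X = np.array([[200, 5], [220, 6], [210, 4],      # low spenders
              [900, 30], [950, 28], [880, 32]])  # high spenders

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
```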

b) Association

Association finds interesting relationships between variables in large datasets. It identifies
which items often occur together, such as products bought together in a market basket.

3. Semi-Supervised Learning
Semi-supervised learning combines both labeled and unlabeled data for training. It is useful
when acquiring labeled data is expensive or time-consuming.

Example: A student learns from a teacher (labeled data) but also revises independently
(unlabeled data).

Semi-supervised learning helps when you have a small amount of labeled data and a large
amount of unlabeled data. It improves learning efficiency by using both types. It is commonly
used in text classification, image recognition, and speech analysis.

4. Reinforcement Learning
Reinforcement learning is based on a feedback system where an agent learns by interacting
with the environment. Good actions are rewarded; bad actions are punished. The goal is to
maximize the cumulative reward.
Example: An AI learning to play a game by trial and error, improving its moves based on scores.

Reinforcement learning focuses on decision making through trial and error. The agent receives
rewards or penalties and adjusts its actions accordingly. It is useful in situations where actions
influence future outcomes over time.

Types of Reinforcement Learning:


1. Positive Reinforcement: Encourages behavior by giving rewards. It helps increase the
likelihood of the behavior being repeated. This is the most common type used in
learning agents.

2. Negative Reinforcement: Encourages avoiding bad behavior by removing negative
outcomes. It increases the chance of repeating the correct behavior. Often used to
guide agents away from harmful actions.

Perspectives and Issues in Machine Learning:


Perspectives

• Machine learning means finding the best solution (hypothesis) from many possibilities
that match the data and what we already know.

• These possible solutions (hypotheses) can be represented using models like decision
trees, linear functions, or neural networks.

• Different problems require different types of models for better learning results.

• Learning can be seen as a “search process,” where the algorithm looks for the right
model based on the amount of training data, the size of the solution space, and how
confident we are that the model will work on new data.

Specific Issues in Machine Learning:

1. Inadequate Training Data: Too little data makes it hard for the model to learn correctly.

2. Poor Quality Data: If the data is full of errors or noise, the model will learn the wrong
patterns.

3. Non-representative Data: If the training data doesn’t reflect real-world conditions, the
model may perform poorly.

4. Overfitting and Underfitting: Overfitting happens when the model memorizes the
training data, while underfitting happens when it fails to learn the underlying pattern.

5. Monitoring and Maintenance: Machine learning models need regular updates and
monitoring to stay accurate.
6. Bad Recommendations: Poor model training can lead to incorrect suggestions,
especially in recommendation systems.

7. Lack of Skilled Resources: Building and managing ML models requires experts, which
are sometimes hard to find.

8. Customer Segmentation: Wrong grouping of customers may affect personalized
services or marketing.

9. Process Complexity: Machine learning involves many steps (data cleaning, feature
selection, model training, testing, etc.), which can be complicated.

10. Data Bias: If the data has bias, the model will learn and repeat the same bias, leading
to unfair or incorrect results.

PERCEPTRON
In Machine Learning and AI, a Perceptron is a fundamental concept and the starting point for
understanding neural networks and deep learning. It consists of input values, weights, and a
threshold (bias) to make decisions.

It was invented by Frank Rosenblatt in the mid-20th century to perform calculations that
detect patterns in data. The Perceptron is a supervised learning algorithm used mainly for
binary classification problems. It helps the model learn from data by adjusting weights
during training.

Perceptron Model
A Perceptron is a simple supervised learning algorithm used for binary classification. It is also
called an artificial neuron and is the basic unit of a neural network.

The Perceptron model works like a single-layer neural network with four key parts:

• Inputs
• Weights and Bias
• Net sum (weighted total)
• Activation function

It helps in detecting patterns from input data and is one of the simplest forms of neural
networks used in machine learning.

Binary Classifier

A binary classifier is a machine learning algorithm that classifies input data into one of two
classes, such as yes/no, spam/not spam, or positive/negative.
It works by checking if the input (usually represented as feature vectors) belongs to a specific
class. Most binary classifiers use a linear function that combines weights and features to
make a prediction

Components of Perceptron
Frank Rosenblatt proposed the perceptron model as a binary classifier containing three main
components. These are as follows:

Input Layer
The input layer is the initial component of the perceptron that accepts the raw data. Each
input node represents a feature of the dataset and typically holds a real-valued input. These
inputs are passed forward for weighted processing.

Weights and Bias


Weights determine the influence of each input on the output. Bias acts like the intercept in a
linear equation, helping adjust the output threshold.

Activation Function
Processes the weighted sum and bias to determine the output. A step function is commonly
used to produce binary results, deciding whether the neuron should activate.
Types of Activation functions:

• Sign function
• Step function
• Sigmoid function

A data scientist uses the activation function to help the model make decisions based on the
problem.
Different activation functions like Sign, Step, and Sigmoid are used in perceptrons depending
on how the learning behaves, for example whether it is slow or affected by issues such as
vanishing or exploding gradients.

Working
In Machine Learning, a Perceptron is a single-layer neural network that includes four main
parts: input values, weights and bias, net sum, and an activation function.

• Each input is multiplied by its corresponding weight.


• All weighted values are added together along with the bias to form a net sum.
• This net sum is passed through an activation function (usually a step function), which
decides the final output.
The activation function helps convert the output to a fixed range like (0, 1) or (-1, 1).

• The weight shows how important an input is.


• The bias shifts the activation curve up or down to improve learning flexibility.

The Perceptron model works in two main steps:

Step 1: Calculate the Weighted Sum

Multiply each input value xi by its corresponding weight wi, add all the products, and include
the bias b:

net = w1x1 + w2x2 + … + wnxn + b

Step 2: Apply the Activation Function

The activation function f is applied to the net sum to produce the final output:

y = f(net)

This output can be binary (e.g., 0 or 1) or continuous, depending on the function used.
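Putting the two steps together, here is a minimal Python sketch of a perceptron with a step activation function, trained on the AND function using the classic perceptron learning rule (the learning rate and number of epochs are illustrative choices):

```python
import numpy as np

def step(net):
    """Step activation: fire (1) if the net sum is positive, else 0."""
    return 1 if net > 0 else 0

# AND-gate training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        output = step(np.dot(w, xi) + b)     # Step 1 (weighted sum) + Step 2 (activation)
        error = target - output
        w += lr * error * xi                 # perceptron weight update
        b += lr * error

print("weights:", w, "bias:", b)
for xi in X:
    print(xi, "->", step(np.dot(w, xi) + b))
```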

LINEAR DISCRIMINANT ANALYSIS (LDA)


Linear Discriminant Analysis (LDA) is a popular dimensionality reduction and classification
technique used in supervised learning, especially when dealing with more than two classes.

It is also known as:

• Normal Discriminant Analysis (NDA)


• Discriminant Function Analysis (DFA)

What LDA Does:


• Reduces high-dimensional data to a lower-dimensional space to save computation and
avoid overfitting.
• Separates two or more classes by projecting data in a way that maximizes the difference
between them.
• Often used as a preprocessing step before applying classification algorithms

Why LDA

• Logistic Regression works well for only two-class problems.


• LDA can handle multiple classes efficiently.

Whenever we need to separate two or more classes that have multiple features, Linear
Discriminant Analysis (LDA) is one of the most commonly used techniques.

For example, if we try to classify two classes using just one feature, their values might
overlap, making it hard to separate them clearly.
But LDA combines multiple features to create a new axis that helps in better class separation
with less overlap.

To overcome the overlapping issue in the classification process, we often need to increase the
number of features.

Example

Imagine we have two classes of data points plotted on a 2D plane (X and Y axes), as shown
in the image.

Black circles = Class 1

Red circles = Class 2


In this 2D space, it's hard to draw a straight line that completely separates the two classes.

Linear Discriminant Analysis (LDA) helps by projecting this 2D data into a 1D space in a way
that:

• Maximizes the distance between the classes, and


• Minimizes overlap within each class

This makes classification much easier and more accurate.

How Linear Discriminant Analysis (LDA) Works

LDA is used to reduce high-dimensional data (like 2D or 3D) into a lower-dimensional space
(like 1D) to make classification easier.

Imagine we have two classes of points plotted in a 2D space (X and Y axes), and we want to
separate them clearly.

Linear Discriminant Analysis (LDA) helps by:

• Drawing a new axis (a straight line) that best separates the two classes.
• Projecting all data points onto this new axis.

This makes it easier to classify the points, as the separation between classes becomes more
visible in this new 1D space.

Hence, we can maximize the separation between these classes and reduce the 2-D plane
into 1-D.

To do this, LDA follows two key rules:

• Maximize the distance between the means of the two classes.


• Minimize the variance within each class.

By applying these rules, LDA finds a new axis that:

• Spreads the classes farther apart, and


• Keeps the points of the same class closer together.
This makes it easier to separate the classes and improves classification accuracy.
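As a concrete illustration, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis (the two-class 2-D data is made up) that projects the points onto a single discriminant axis and classifies them:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up 2-D data for two classes
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],    # class 0
              [6.0, 7.0], [6.5, 6.8], [7.0, 7.5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)        # project the 2-D data onto the 1-D discriminant axis

print("1-D projections:", X_1d.ravel())
print("Predictions:", lda.predict(X))  # LDA can also classify directly
```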

Drawbacks of Linear Discriminant Analysis (LDA)

While LDA is effective for multi-class classification, it has some limitations:

• Fails when class distributions have the same mean: If two or more classes share the same
mean, LDA cannot find a clear axis to separate them, making it ineffective.

• In such cases, non-linear techniques like Non-Linear Discriminant Analysis or kernel-based
methods are used.

Real-World Applications of LDA

• Face Recognition
• Medical Diagnosis
• Customer Identification
• Predictive Modeling
• Robotics and Learning

CONCEPT LEARNING
Concept Learning is a supervised learning approach in machine learning where the goal is to
learn a general rule or concept from specific examples.
In simple terms, it’s about learning how to classify data into positive or negative examples
based on defined conditions or rules.
Concept Learning is the task of inferring a Boolean-valued function (yes/no, true/false) from
given input-output examples.
For example:
Learning the concept of a "bird" from examples like:
• Sparrow – Yes (Bird)
• Car – No (Not a Bird)
The model tries to understand what makes something a "bird" based on its features.
Key Elements:
• Instance Space (X): All possible examples.
• Hypothesis Space (H): All possible rules the model can learn.
• Target Concept (c): The actual concept we're trying to learn.
• Training Examples (D): Labelled examples used to learn.

CONCEPT LEARNING TASK


The Concept Learning Task is the problem of automatically identifying a concept or rule from
a set of labelled training examples.
The goal is to learn a target concept (a function) that maps input instances to yes/no
(true/false) decisions.

A concept learning task is defined as:


• A set of instances (X)
• A set of training examples D, where each example is labeled as positive (belongs to
the concept) or negative (does not)
• A hypothesis space (H) that contains all possible rules or concepts
The task:
Find a hypothesis h ∈ H such that h(x) = c(x) for all x in X.
Where:
• x is an input instance
• h(x) is the hypothesis prediction
• c(x) is the actual label (target concept)

Hypothesis Representation in Concept Learning


When training a machine to learn a concept (like "good day to go sailing"), we need to
represent possible rules (called hypotheses) that the machine can choose from.

A hypothesis is basically a set of conditions that describes a possible concept.

Example

Each instance has six attributes

1. Sky (e.g., Sunny, Cloudy, Rainy)


2. AirTemp (e.g., Warm, Cold)
3. Humidity (e.g., High, Normal)
4. Wind (e.g., Strong, Weak)
5. Water (e.g., Warm, Cool)
6. Forecast (e.g., Same, Change)

So, a hypothesis is a vector (or list) with six entries, one for each of these attributes.

How Can a Hypothesis Represent an Attribute?


For each attribute, the hypothesis can say one of the following:
1. "?" (Question mark)
→ Any value is acceptable for this attribute.
→ Example: "?" in the Wind position means Wind can be Strong or Weak.

2. Specific Value
→ Only one specific value is allowed.
→ Example: "Warm" in AirTemp means it must be Warm.

3. "θ" (Theta symbol)


→ No value is acceptable. It rejects all instances.
→ Used to represent the most specific hypothesis, which accepts nothing.

Ex: [Sunny, ?, High, ?, Warm, Same]

This means:

• Sky must be Sunny

• Any value is acceptable for AirTemp and Wind

• Humidity must be High

• Water must be Warm

• Forecast must be Same

Inductive Learning Hypothesis


The Inductive Learning Hypothesis is a basic idea in machine learning. It says that if a
hypothesis (rule) works well on the training data, then it is likely to work well on new,
unseen data too.

That means, when we train a machine learning model, we give it labeled examples
(training data). The model learns a hypothesis (a rule or function) from these examples.
If this rule correctly predicts most of the training data, we assume it will also predict
future data correctly.

This is what makes generalization possible in machine learning. It allows models to make
predictions on data they've never seen before.

Concept Learning as Search


• In machine learning, concept learning can be seen as a search problem where we
search for the best hypothesis (rule) that matches the training examples.
• We are searching through a space called the Hypothesis Space (H).
• This space contains all possible hypotheses (rules) that could explain the data.
• The Goal is to find the hypothesis in H that correctly classifies all positive and
negative examples from the training set.
How It Works:
• The hypothesis space is all the possible rules the model can choose from.
• The learning algorithm searches this space to find a rule that:
o Matches positive examples (yes)
o Rejects negative examples (no)
Concept learning is like a search process, where the learner explores different hypotheses
to find the one that matches the concept based on examples.
By choosing a hypothesis representation (like specific symbols or formats),
the designer decides:
• What kind of rules the model can learn
• What the limits of the learning algorithm will be
So, the structure of the hypothesis space depends entirely on how the learning problem
is defined.
Example: Concept Learning as Search
Let’s say we want to teach a machine the concept of a "fruit that is good to eat" based
on 3 attributes:

Attribute | Possible Values
Color | Red, Green, Yellow
Texture | Smooth, Rough
Taste | Sweet, Sour

Step 1: Training Data (Examples)


Color | Texture | Taste | Good to Eat?
Red | Smooth | Sweet | Yes
Green | Rough | Sour | No
Yellow | Smooth | Sweet | Yes

Step 2: Define Hypothesis Space


We define hypotheses like this: [Color, Texture, Taste]
Each position can be "?" (any value), a specific value, or θ (no value).

Step 3: Searching the Hypothesis Space


most general hypothesis: [?, ?, ?] → Accepts everything
most specific hypothesis: [θ, θ, θ] → Accepts nothing
Then, based on training examples, we search the space by adjusting the hypothesis:
• First positive example: [Red, Smooth, Sweet]
• Generalize to include the next positive: [?, Smooth, Sweet] (matches both Red and
Yellow)
This search continues until we find a hypothesis that:
• Covers all positive examples
• Excludes all negative ones
[?, Smooth, Sweet] → Any color, but texture must be Smooth and taste must be Sweet.

General-to-Specific Ordering of Hypotheses


In concept learning, many algorithms organize the search through the hypothesis space
using a structure called the general-to-specific ordering. That means
• Some hypotheses are more general than others — they apply to more examples.
• Some hypotheses are more specific — they apply to fewer examples.
This natural structure helps us search more efficiently, even in large or infinite hypothesis
spaces, without checking every possible rule.
Example:
Let’s consider two hypotheses:
• h1 = (Sunny, ?, ?, Strong, ?, ?)
• h2 = (Sunny, ?, ?, ?, ?, ?)
Now look at the difference:
• h1 says the wind must be Strong.
• h2 doesn't care about wind (more general).
So, any instance that matches h1 will also match h2, but not the other way around.
Therefore, we can say that “h2 is more general than h1”
Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-
general-than-or-equal-to hk (written hj >= hk) if and only if:

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
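For the attribute-vector hypotheses used in these notes, this ordering can be checked position by position. A minimal Python sketch (an illustration, not part of the original definition):

```python
def more_general_or_equal(hj, hk):
    """True if hypothesis hj is more general than or equal to hk.
    At every position, hj must either accept anything ('?') or
    require exactly the same value that hk requires."""
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')

print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```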

THE BRAIN AND THE NEURON


• In animals, learning happens in the brain, which is a powerful and complex system.
• It can handle noisy, incomplete, and high-dimensional data (like images) efficiently.
• The brain weighs around 1.5 kg and still performs well even as neurons die with age.
• The basic unit of the brain is the neuron.
How a Neuron Works:
• Dendrites receive signals.
• If enough signals are received, the neuron fires by sending a signal down the axon.
• We can mimic this process in machine learning using a function that:
o Takes weighted inputs
o Adds them up
o Fires (outputs a signal) if the sum crosses a threshold (bias)

Each neuron performs a simple task, but the brain as a whole acts as a massively parallel
computer with around 10^11 neurons.
Hebb’s Rule (Learning Rule in Neuroscience)
• Proposed by Donald Hebb.

• If two neurons fire at the same time, the connection between them becomes
stronger.

• If they don’t fire together, the connection weakens or dies.

This rule forms the basis of learning in biological and artificial neural networks.
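Hebb's rule is often formalized as the weight update Δw = η · x · y (strengthen a connection when its input and the neuron's output are active together). A tiny sketch under that assumption, with made-up activations:

```python
import numpy as np

eta = 0.1                        # learning rate
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activations (inputs)
y = 1.0                          # post-synaptic activation (the neuron fired)
w = np.array([0.2, 0.2, 0.2])    # initial connection weights

# Hebbian update: connections from active inputs to an active output get stronger
w += eta * x * y
print(w)   # -> [0.3, 0.2, 0.3]; only weights on the active inputs increased
```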

McCulloch and Pitts Neurons:


➢ A mathematical model of a neuron developed in the 1940s.
➢ The neuron receives multiple inputs xi with corresponding weights wi.
➢ These inputs are summed, and:

• If the sum is greater than a threshold, the neuron fires (outputs 1).

• If not, the neuron does not fire (outputs 0).

Components:
• Weighted Inputs – Like synapses

• Adder/Sum – Like the neuron's membrane

• Activation Function – Usually a threshold function
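A McCulloch-Pitts neuron can be written in a few lines. The sketch below implements a two-input AND gate with a manually chosen threshold (the weights and threshold are hand-set, since this model has no learning rule):

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fire (1) if the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# AND gate: both Boolean inputs must be 1 for the neuron to fire
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcp_neuron([a, b], weights=[1, 1], threshold=2))
```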

Limitations of the McCulloch & Pitts Model

• Can only handle Boolean inputs (0 or 1)


• Thresholds must be set manually
• No built-in weight adjustment or learning
• Cannot solve non-linearly separable problems

LEARNING SYSTEM
A Learning System is a system that automatically learns and improves from experience
without being explicitly programmed for every possible task.

A learning system is any machine or algorithm that can:

• Take input data

• Learn patterns or relationships from that data (training)

• Make predictions or decisions based on learned knowledge

• Improve its performance over time with more data

Example: Spam filter in email systems learns to classify emails as spam or not spam
based on labelled examples.

Designing a Learning System


Designing a machine learning system involves several well-defined steps or components:
1. Define the Learning Problem

• What is the goal? (e.g., classification, regression, clustering)


• What is the output? (e.g., label, value, category)

• What kind of data is available?

2. Data Collection and Preparation

• Collect data relevant to the task

• Clean, preprocess, and transform it (e.g., handle missing values, normalize features)

3. Choose a Learning Algorithm

• Supervised learning (e.g., decision trees, neural networks)

• Unsupervised learning (e.g., k-means)

• Reinforcement learning (e.g., Q-learning)

4. Train the Model

• Use training data to learn parameters

• Adjust internal model weights to minimize errors

5. Evaluate the Model

• Use a test dataset to measure performance

• Use metrics like accuracy, precision, recall, F1-score

6. Tune Hyperparameters

• Use techniques like cross-validation, grid search

• Improve generalization by avoiding overfitting/underfitting

7. Deploy and Monitor

• Deploy the model to a real-world system

• Continuously monitor and update with new data

Key Considerations While Designing

• Data quality and quantity

• Choice of features (feature engineering)

• Model complexity vs interpretability

• Scalability and computational cost

Example: Credit Card Fraud Detection System

Step 1: Define the Learning Problem


• Goal: Detect fraudulent transactions

• Type: Binary classification

• Output: 1 (Fraud) or 0 (Genuine)

Step 2: Data Collection & Preparation

• Collect a dataset of past transactions, including:

o Amount, time, location

o Merchant type

o Cardholder's profile

o Label (fraud or not)

• Clean the data:

o Remove missing or duplicate records

o Normalize transaction amounts

o Encode categorical features (e.g., one-hot encoding for location/merchant)

Step 3: Choose Learning Algorithm

Use a Supervised Learning algorithm like:

• Logistic Regression

• Decision Trees / Random Forest

• Support Vector Machines (SVM)

• Neural Networks

Suppose we use Random Forest for its good performance on imbalanced datasets.

Step 4: Train the Model

• Split data into training and testing sets (e.g., 80/20 split)

• Train the Random Forest model on the training set

Step 5: Evaluate the Model


Use metrics suitable for imbalanced datasets:

• Accuracy (not enough alone)

• Precision: % of predicted frauds that are correct

• Recall: % of actual frauds detected

• F1-Score: Harmonic mean of precision and recall

• AUC-ROC Curve for visualization
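To tie Steps 3 to 5 together, here is a minimal sketch (assuming scikit-learn and NumPy; the feature matrix X and labels y below are random placeholders for a real, preprocessed transaction dataset, so the printed scores only demonstrate the workflow):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# Placeholder data: replace with real preprocessed transaction features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # e.g., amount, time, encoded location, ...
y = (rng.random(1000) < 0.05).astype(int)   # ~5% fraud, imbalanced as in practice

# Step 4: split 80/20 and train a Random Forest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 5: evaluate with metrics suited to imbalanced data
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, y_pred, zero_division=0))  # precision, recall, F1
print("AUC-ROC:", roc_auc_score(y_test, y_prob))
```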

Step 6: Tune Hyperparameters

• Use Grid Search or Random Search for parameters like:

o Number of trees

o Max tree depth

o Minimum samples per leaf

• Perform Cross-Validation to generalize better

Step 7: Deploy and Monitor

• Integrate model into payment systems

• Monitor for:

o Model drift (data patterns changing over time)

o False positives (flagging real users as fraud)

o False negatives (missing frauds)
