
UNIT III

LEARNING
Machine learning: definition - Classification - Regression - Approaches of machine learning models - Types of learning - Probability basics - Linear algebra - Hypothesis space and inductive bias - Evaluation - Training and test sets - Cross-validation - Concept of overfitting, underfitting - Bias and variance - Regression: Linear regression - Logistic regression.

3.1. MACHINE LEARNING

Machine learning is a subfield of artificial intelligence (AI) that involves the development of algorithms and models that enable computers to learn and make predictions or decisions without being explicitly programmed. It is a data-driven approach in which computers are trained on large amounts of data and learn patterns, relationships, and rules from the data to make informed decisions or predictions.
In machine learning, algorithms are designed to automatically analyze and
interpret complex data, extract meaningful insights, and make accurate predictions or
decisions. This process involves several key components, including:
Data: Machine learning algorithms require a large amount of structured or
unstructured data as input for training. The data can include various features or
attributes that are relevant to the problem being solved.
Training: During the training phase, the machine learning model is presented with the input data and learns from it to identify patterns and relationships. The model adjusts its internal parameters based on the data to improve its performance over time.
Algorithms: Machine learning algorithms are used to build models that can learn from data. These algorithms can be categorized into different types, such as supervised learning, unsupervised learning, reinforcement learning, and deep learning, each suited for different types of tasks.

Feature extraction: Machine learning models often require the identification and selection of relevant features from the input data. Feature extraction helps in reducing the dimensionality of the data and focusing on the most informative aspects.
Evaluation and Testing: After training, the machine learning model is evaluated
using separate test data to measure its performance and accuracy. This step helps
assess the model's generalization capabilities and ensures it can make accurate
predictions or decisions on new, unseen data.
Machine learning has a wide range of applications, including image and speech recognition, natural language processing, recommendation systems, fraud detection, autonomous vehicles, healthcare diagnostics, and many more. It continues to advance rapidly, with new techniques and models being developed to tackle increasingly complex problems and improve performance.
Machine Learning is defined as a technology that is used to train machines to
perform various actions such as predictions, recommendations, estimations, etc.
based on historical data or past experience.
Machine Learning enables computers to behave like human beings by training them with the help of past experience and predicted data.
There are three key aspects of Machine Learning, which are as follows:
Task: A task is defined as the main problem in which we are interested. This task/problem can be related to predictions, recommendations, estimations, etc.
Experience: It is defined as learning from historical or past data, which is used to estimate and resolve future tasks.
Performance: It is defined as the capacity of any machine to resolve a machine learning task or problem and provide the best outcome for it. However, performance depends on the type of machine learning problem.

3.2. TYPES OF LEARNING IN MACHINE LEARNING

Machine Learning techniques are divided mainly into the following four categories:
3.2.1. SUPERVISED LEARNING
Supervised learning is applicable when a machine has sample data, i.e., input as well as output data with correct labels. Correct labels are used to check the correctness of the model. The supervised learning technique helps us to predict future events with the help of past experience and labeled examples. Initially, it analyses the known training dataset, and later it introduces an inferred function that makes predictions about output values. Further, it also detects errors during this entire learning process and corrects those errors through algorithms.
Example: Let's assume we have a set of images tagged as "dog". A machine learning algorithm is trained with these dog images so it can easily distinguish whether an image is a dog or not.

Categories of Supervised Machine Learning


Supervised machine learning can be classified into two types of problems, which
are given below:

Classification
Regression
(a) Classification

Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are spam detection, email filtering, etc.

Some popular classification algorithms are given below:
Random Forest Algorithm
Decision Tree Algorithm
Logistic Regression Algorithm
Support Vector Machine Algorithm
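To make this concrete, here is a minimal illustrative sketch of one of the classifiers listed above, logistic regression, trained with scikit-learn; the choice of the built-in Iris dataset and the parameter values are assumptions for demonstration only:

# Minimal sketch: train a listed classifier (Logistic Regression) on a
# labeled dataset and measure how often its predicted category is correct.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                  # features and class labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=200)             # a linear classifier
clf.fit(X_train, y_train)                          # learn from labeled examples
print("Test accuracy:", clf.score(X_test, y_test))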

(b) Regression
Regression algorithms are used to solve regression problems, in which there is a relationship between the input and output variables. They are used to predict continuous output variables, such as market trends, weather, etc.
Some popular Regression algorithms are given below:
Simple Linear Regression Algorithm
Multivariate Regression Algorithm
Decision Tree Algorithm
Lasso Regression
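As a quick illustration of regression on continuous outputs, the sketch below fits a Simple Linear Regression model; the tiny dataset is an assumption made up for demonstration:

# Minimal sketch: fit a straight line to noisy points and predict a new value.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # single input feature
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])           # continuous target, roughly 2x + 1

reg = LinearRegression().fit(X, y)                 # learn slope and intercept
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("prediction for x = 6:", reg.predict([[6.0]])[0])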

Advantages and Disadvantages of Supervised Learning

Advantages:
Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
These algorithms are not able to solve complex tasks.
It may predict the wrong output if the test data is different from the training data.
It requires lots of computational time to train the algorithm.
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:
Image Segmentation: Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.
Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis purposes. It is done by using medical images and past labelled data with labels for disease conditions. With such a process, the machine can identify a disease for new patients.
Fraud Detection: Supervised learning classification algorithms are used for identifying fraud transactions, fraud customers, etc. It is done by using historic data to identify the patterns that can lead to possible fraud.
Spam Detection: In spam detection and filtering, classification algorithms are used. These algorithms classify an email as spam or not spam. The spam emails are sent to the spam folder.
Speech Recognition: Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data, and various identifications can be done using the same, such as voice-activated passwords, voice commands, etc.
3.2.2. UNSUPERVISED LEARNING
In unsupervised learning, a machine is trained with input samples only, while the output is not known. The training information is neither classified nor labeled; hence, a machine may not always provide correct output compared to supervised learning.
Although unsupervised learning is less common in practical business settings, it helps in exploring the data and can draw inferences from datasets to describe hidden structures in unlabeled data.
Example: Let's assume a machine is trained with a set of documents having different categories (Type A, B, and C), and we have to organize them into appropriate groups. Because the machine is provided only with input samples and no outputs, it can organize these documents into type A, type B, and type C categories, but whether it organizes them correctly is not guaranteed.

Categories of Unsupervised Machine Learning

Unsupervised learning can be further classified into two types, which are given below:

Clustering
Association
(i) Clustering
The clustering technique is used when we want to find the inherent groups in the data. It is a way to group the objects into clusters such that the objects with the most similarities remain in one group and have few or no similarities with the objects of other groups. An example of the clustering algorithm is grouping customers by their purchasing behaviour.


Some of the popular clustering algorithms are given below:
K-Means Clustering Algorithm
Mean-shift Algorithm
DBSCAN Algorithm
Principal Component Analysis
Independent Component Analysis
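The following is a small illustrative sketch of the first algorithm in this list, K-Means; the 2-D points and the choice of two clusters are assumptions for demonstration:

# Minimal sketch: group unlabeled 2-D points into two clusters with K-Means.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [2, 3],              # one loose group of points
              [8, 8], [9, 10], [8, 9]])            # a second, distant group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels:", kmeans.labels_)           # group assigned to each point
print("cluster centers:", kmeans.cluster_centers_)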

(ii) Association
Association rule learning is an unsupervised learning technique which finds interesting relations among variables within a large dataset. The main aim of this learning algorithm is to find the dependency of one data item on another data item and map those variables accordingly so that it can generate maximum profit. This algorithm is mainly applied in Market Basket analysis, web usage mining, continuous production, etc.
Some popular algorithms of association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.


Advantages and Disadvantages of Unsupervised Learning Algorithm
Advantages:
These algorithms can be used for more complicated tasks compared to the supervised ones, because these algorithms work on unlabeled datasets.
Unsupervised algorithms are preferable for various tasks, as getting an unlabeled dataset is easier compared to a labelled dataset.
Disadvantages:
The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm is not trained with the exact output in advance.
Working with unsupervised learning is more difficult, as it works with an unlabelled dataset that does not map to a known output.
Applications of Unsupervised Learning
Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright issues in document network analysis of text data for scholarly articles.
Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for building recommendation applications for different web applications and e-commerce websites.
Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which can identify unusual data points within the dataset. It is used to discover fraudulent transactions.
Singular Value Decomposition: Singular Value Decomposition, or SVD, is used to extract particular information from a database, for example, extracting information about each user located at a particular location.

3.2.3. REINFORCEMENT LEARNING


Reinforcement Learning is a feedback-based machine learning technique. In this type of learning, agents (computer programs) need to explore the environment, perform actions, and, on the basis of their actions, receive rewards as feedback. For each good action, they get a positive reward, and for each bad action, they get a negative reward. The goal of a reinforcement learning agent is to maximize the positive rewards. Since there is no labeled data, the agent is bound to learn from its experience only.

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:
(i) Positive Reinforcement Learning: Positive reinforcement learning specifies increasing the tendency that the required behaviour will occur again by adding something. It enhances the strength of the agent's behaviour and positively impacts it.
(ii) Negative Reinforcement Learning: Negative reinforcement learning works exactly opposite to positive RL. It increases the tendency that the specific behaviour will occur again by avoiding a negative condition.

Real-world Use Cases of Reinforcement Learning
Video Games:
RL algorithms are very popular in gaming applications and are used to attain superhuman performance. Some popular systems that use RL algorithms are AlphaGo and AlphaGo Zero.
Resource Management:
The paper "Resource Management with Deep Reinforcement Learning" showed how to use RL in computer systems to automatically learn to schedule resources across waiting jobs in order to minimize average job slowdown.

Robotics:
RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more capable with reinforcement learning. Different industries have a vision of building intelligent robots using AI and machine learning technology.
Text Mining:
Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning by the Salesforce company.

Advantages and Disadvantages of Reinforcement Learning

Advantages:
It helps in solving complex real-world problems which are difficult to solve by general techniques.
The learning model of RL is similar to the learning of human beings; hence, very accurate results can be found.
It helps in achieving long-term results.
Disadvantages:
RL algorithms are not preferred for simple problems.
RL algorithms require huge amounts of data and computation.
Too much reinforcement learning can lead to an overload of states, which can weaken the results.
The curse of dimensionality limits reinforcement learning for real physical systems.

3.2.4. SEMI-SUPERVISED LEARNING


Semi-supervised learning is an intermediate technique between supervised and unsupervised learning. It performs actions on datasets having a few labels as well as unlabeled data; however, it generally contains mostly unlabeled data. Hence, it reduces the cost of the machine learning model, as labels are costly, but for accuracy purposes, it may have a few labels. Further, it also increases the accuracy and performance of the machine learning model.
Semi-supervised learning helps data scientists to overcome the drawbacks of supervised and unsupervised learning. Speech analysis, web content classification, protein sequence classification, text document classifiers, etc., are some important applications of semi-supervised learning.
Advantages and Disadvantages of Semi-supervised Learning
Advantages:
The algorithm is simple and easy to understand.
It is highly efficient.
It is used to address the drawbacks of supervised and unsupervised learning algorithms.
Disadvantages:
Iteration results may not be stable.
We cannot apply these algorithms to network-level data.
Accuracy is low.

3.3. CLASSIFICATION ALGORITHM IN MACHINE LEARNING

As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and Classification algorithms. Regression algorithms predict the output for continuous values, but to predict categorical values, we need Classification algorithms.
The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data.
In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.
Unlike regression, the output variable of classification is a category, not a value, such as "Green or Blue", "fruit or animal", etc. Since the classification algorithm is a supervised learning technique, it takes labeled input data, which means it contains inputs with the corresponding outputs.

In the classification algorithm, a discrete output function (y) is mapped to an input variable (x):
y = f(x), where y is a categorical output
The best example of an ML classification algorithm is an Email Spam Detector.
The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for categorical data.
Classification algorithms can be better understood using the diagram below. In the diagram, there are two classes, Class A and Class B. These classes have features that are similar to each other and dissimilar to other classes.
[Figure: scatter plot of two separable groups of points, Class A and Class B]
Fig. 3.1. Classification algorithms
The algorithm which implements the classification on a dataset is known as a classifier. There are two types of classifications:
Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier. Examples: classification of types of crops, classification of types of music.
3.3.1. TYPES OF ML CLASSIFICATION ALGORITHMS:
Classification algorithms can be further divided into mainly two categories:
(i) Linear Models
Logistic Regression
Support Vector Machines
(ii) Non-linear Models
K-Nearest Neighbours
Kernel SVM
Naïve Bayes
Decision Tree Classification
Random Forest Classification

3.4. PROBABILITY IN MACHINE LEARNING

Probability is the bedrock of ML; it tells how likely an event is to occur. The value of probability always lies between 0 and 1. It is a core concept as well as a primary prerequisite for understanding ML models and their applications.
Probability can be calculated as the number of times the event occurs divided by the total number of possible outcomes. Suppose we toss a coin; then the probability of getting a head as the outcome can be calculated by the formula below:

P(H) = Number of ways a head can occur / Total number of possible outcomes = 1/2 = 0.5

where P(H) = probability of a head occurring as the outcome while tossing a coin.

3.4.1. TYPES OF PROBABILITY


For a better understanding of probability, it can be categorized further into different types as follows:
(i) Empirical Probability: Empirical probability can be calculated as the number of times the event occurs divided by the total number of incidents observed.
(ii) Theoretical Probability: Theoretical probability can be calculated as the number of ways the particular event can occur divided by the total number of possible outcomes.
(iii) Joint Probability: It tells the probability of two random events occurring simultaneously. For independent events,
P(A ∩ B) = P(A) · P(B)
where P(A ∩ B) = probability of both events A and B occurring, P(A) = probability of event A, and P(B) = probability of event B.
(iv) Conditional Probability: It is the probability of event A given that event B has occurred. The probability of an event A conditioned on an event B is denoted and defined as:
P(A|B) = P(A ∩ B) / P(B)
Similarly, P(B|A) = P(A ∩ B) / P(A).
We can write the joint probability of A and B as P(A ∩ B) = P(A) · P(B|A), which means: "The chance of both things happening is the chance that the first one happens, times the chance of the second one given that the first thing happened."
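As a quick numerical check of these relations, here is a small sketch; the two-dice experiment and the events A and B are assumptions chosen for illustration:

# Minimal sketch: verify joint and conditional probability on two fair dice.
from fractions import Fraction

space = [(a, b) for a in range(1, 7) for b in range(1, 7)]   # all 36 outcomes

A = {(a, b) for (a, b) in space if a == 6}          # first die shows 6
B = {(a, b) for (a, b) in space if a + b >= 10}     # the sum is at least 10

P = lambda E: Fraction(len(E), len(space))          # classical probability

print("P(A) =", P(A))                               # 1/6
print("P(B) =", P(B))                               # 1/6
print("P(A and B) =", P(A & B))                     # 1/12
print("P(A|B) =", P(A & B) / P(B))                  # (1/12) / (1/6) = 1/2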
We now have the basic understanding of probability required to learn machine learning. Next, we will discuss linear algebra for ML.
3.5. LINEAR ALGEBRA FOR MACHINE LEARNING
Machine learning has a strong connection with mathematics; each machine learning algorithm is based on the concepts of mathematics. With the help of mathematics, one can also choose the correct algorithm by considering training time, complexity, number of features, etc. Linear algebra is an essential field of mathematics, which defines the study of vectors, matrices, planes, mapping, and lines required for linear transformations.
Linear algebra plays a vital role and is a key foundation in machine learning; it enables ML algorithms to run on huge datasets.
The concepts of linear algebra are widely used in developing algorithms in machine learning. Although it is used in almost every concept of machine learning, it specifically supports the following tasks:
Optimization of data.
Applicable in loss functions, regularisation, covariance matrices, Singular Value Decomposition (SVD), matrix operations, and support vector machine classification.
Implementation of Linear Regression in machine learning.
Besides the above uses, linear algebra is also used in neural networks and the data science field.
Basic mathematics principles and concepts like linear algebra are the foundation of machine learning and deep learning systems. To learn and understand machine learning or data science, one needs to be familiar with linear algebra and optimization theory. In this topic, we will explain all the linear algebra concepts required for machine learning.
Below are some benefits of learning linear algebra before machine learning:
Better graphics experience
Improved statistics
Creating better machine learning algorithms
Estimating the forecast of machine learning
Easy to learn
Better Graphics Experience:
Linear algebra helps to provide better graphical processing in machine learning, such as image, audio, video, and edge detection. These are the various graphical representations supported by machine learning projects that you can work on. Further, parts of the given dataset are trained based on their categories by classifiers provided by machine learning algorithms. These classifiers also remove the errors from the trained data.

Moreover, linear algebra helps solve and compute large and complex datasets through a family of methods named matrix decomposition techniques. The two most popular matrix decomposition techniques are as follows:
QR decomposition
LU decomposition

Improved Statistics:
Statistics is an important concept for organizing and integrating data in machine learning. Linear algebra also helps in understanding the concepts of statistics in a better manner. Advanced statistical topics can be integrated using the methods, operations, and notations of linear algebra.

Creating Better Machine Learning Algorithms:
Linear algebra also helps to create better supervised as well as unsupervised machine learning algorithms.
A few supervised learning algorithms that can be created using linear algebra are as follows:
Logistic Regression
Linear Regression
Decision Trees
Support Vector Machines (SVM)
Further, below are some unsupervised learning algorithms that can also be created with the help of linear algebra:
Singular Value Decomposition (SVD)
Clustering
Principal Component Analysis

With the help of linear algebra concepts, you can also self-customize the various parameters in a live project and gain the in-depth knowledge to deliver results with more accuracy and precision.
Estimating the Forecast of Machine Learning:
If you are working on a machine learning project, then you must be a broad-minded person who can impart more perspectives. Hence, in this regard, you must increase your awareness of and affinity for machine learning concepts. You can begin with setting up different graphs and visualizations, using various parameters for diverse machine learning algorithms, or taking up things that others around you might find difficult to understand.
Easy to Learn:
Linear algebra is an important branch of mathematics that is easy to understand. It is taken into consideration whenever there is a requirement for advanced mathematics and its applications.

3.5.1. MINIMUM LINEAR ALGEBRA FOR MACHINE LEARNING
Notation:
Notation in linear algebra enables you to read algorithm descriptions in papers, books, and websites and understand how an algorithm works. Even if you use for-loops rather than matrix operations, you will be able to piece things together.
Operations:
Working at an advanced level of abstraction with vectors and matrices can make concepts clearer, and it can also help with description, coding, and even thinking capability. In linear algebra, it is required to learn the basic operations, such as addition, multiplication, inversion, and transposition of matrices and vectors.
Matrix Factorization:
One of the most recommended areas of linear algebra is matrix factorization, specifically matrix decomposition methods such as SVD and QR.
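As a small illustrative sketch (the matrix values are assumptions), both decompositions can be computed directly with NumPy:

# Minimal sketch: the QR and SVD factorizations of an assumed small matrix.
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0],
              [0.0, 5.0]])

Q, R = np.linalg.qr(A)       # A = Q @ R; Q has orthonormal columns, R is upper-triangular
U, s, Vt = np.linalg.svd(A)  # A = U @ diag(s) @ Vt

print("QR reconstructs A:", np.allclose(A, Q @ R))
print("singular values:", s)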

3.5.2. EXAMPLES OF LINEAR ALGEBRA IN MACHINE LEARNING
Below are some popular examples of linear algebra in machine learning:
Datasets and Data Files
Linear Regression
Recommender Systems
One-hot Encoding
Regularization
Principal Component Analysis
Images and Photographs
Singular-Value Decomposition
Deep Learning
Latent Semantic Analysis
(i) Datasets and Data Files
Each machine learning project works on a dataset, and we fit the machine learning model using this dataset.
Each dataset resembles a table-like structure consisting of rows and columns, where each row represents an observation and each column represents a feature/variable. This dataset is handled as a matrix, which is a key data structure in linear algebra.
Further, when this dataset is divided into input and output for a supervised learning model, it is represented as a matrix (X) and a vector (y), where the vector is also an important concept of linear algebra.
(ii) Images and Photographs
In machine learning, images/photographs are used for computer vision applications. Each image is an example of a matrix from linear algebra, because an image is a table structure consisting of a height and width for each pixel.
Moreover, different operations on images, such as cropping, scaling, and resizing, are performed using the notations and operations of linear algebra.
(iii) One-hot Encoding
In machine learning, sometimes we need to work with categorical data. Such categorical variables are encoded to make them simpler and easier to work with, and the popular encoding technique used to encode these variables is known as one-hot encoding.
In the one-hot encoding technique, a table is created that shows a variable with one column for each category and one row for each example in the dataset. Further, each row is encoded as a binary vector, which contains either a zero or a one value. This is an example of sparse representation, which is a subfield of linear algebra.
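A minimal sketch of one-hot encoding, using pandas on an assumed categorical column:

# Minimal sketch: one-hot encode a categorical variable.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
onehot = pd.get_dummies(df, columns=["color"])   # one binary column per category
print(onehot)                                    # each row is a binary vector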

(iv) Linear Regression
Linear regression is a popular technique of machine learning borrowed from statistics. It describes the relationship between input and output variables and is used in machine learning to predict numerical values. The most common way to solve linear regression problems is least-squares optimization, which is solved with the help of matrix factorization methods. Some commonly used matrix factorization methods are LU decomposition and singular-value decomposition, which are concepts of linear algebra.
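A minimal sketch of this idea, solving a least-squares linear regression directly with linear algebra (the data values are assumptions; NumPy's lstsq relies on an SVD internally):

# Minimal sketch: least-squares fit of y = b + m*x via linear algebra.
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])         # the column of ones models the intercept
y = np.array([3.0, 5.1, 6.9, 9.2])

coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("intercept and slope:", coef)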

(v) Regularization
In machine learning, we usually look for the simplest possible model that achieves the best outcome for the specific problem. Simpler models generalize well, from specific examples to unknown datasets. These simpler models are often considered models with smaller coefficient values.
A technique used to minimize the size of a model's coefficients while it is being fit on data is known as regularization. Common regularization techniques are L1 and L2 regularization. Both of these forms of regularization are, in fact, a measure of the magnitude or length of the coefficient vector, and they are methods lifted directly from linear algebra, called vector norms.
(vi) Principal Component Analysis
Generally, a dataset can contain thousands of features, and fitting a model to such a large dataset is one of the most challenging tasks of machine learning. Moreover, a model built with irrelevant features is less accurate than a model built with relevant features. There are several methods in machine learning that automatically reduce the number of columns of a dataset, and these methods are known as dimensionality reduction. The most commonly used dimensionality reduction method in machine learning is Principal Component Analysis, or PCA. This technique makes projections of high-dimensional data for both visualization and training models. PCA uses a matrix factorization method from linear algebra.
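A minimal illustrative sketch of PCA with scikit-learn; using the built-in Iris dataset and two components is an assumption for demonstration:

# Minimal sketch: project a 4-feature dataset onto its 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)          # matrix factorization under the hood

print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_2d.shape)  # (150, 2)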

(vii) Singular-Value Decomposition
Singular-value decomposition, written as SVD in short form, is also one of the popular dimensionality reduction techniques.
It is a matrix-factorization method of linear algebra, and it is widely used in different applications such as feature selection, visualization, noise reduction, and many more.

(viii) Latent Semantic Analysis
Natural Language Processing, or NLP, is a subfield of machine learning that works with text and spoken words.
NLP represents a text document as a large matrix of word occurrences. For example, the matrix columns may contain the known vocabulary words, and the rows may contain sentences, paragraphs, pages, etc., with cells in the matrix marked with the count or frequency of the number of times the word occurred. This is a sparse matrix representation of text. Documents processed in this way are much easier to compare, query, and use as the basis for a supervised machine learning model.
This form of data preparation is called Latent Semantic Analysis, or LSA for short, and is also known by the name Latent Semantic Indexing, or LSI.

(ix) Recommender Systems
A recommender system is a sub-field of machine learning; it is a predictive modelling problem that provides recommendations of products. Examples include the online recommendation of books based on a customer's previous purchase history, and the recommendation of movies and TV series, as we see on Amazon and Netflix.
The development of recommender systems is mainly based on linear algebra methods. We can understand it as an example of calculating the similarity between sparse customer behaviour vectors using distance measures such as Euclidean distance or dot products.
Different matrix factorization methods, such as singular-value decomposition, are used in recommender systems to query, search, and compare user data.
(x) Deep Learning
Artificial Neural Networks, or ANNs, are non-linear ML algorithms that, in a way similar to the brain, process and transfer information from one layer to another.
Deep learning studies these neural networks, which use newer and faster hardware for the training and development of larger networks with huge datasets. Deep learning methods achieve great results for different challenging tasks such as machine translation, speech recognition, etc. The core of neural network processing is based on linear algebra data structures, which are multiplied and added together. Deep learning algorithms also work with vectors, matrices, and tensors (matrices with more than two dimensions) of inputs and coefficients across multiple dimensions.

3.6. HYPOTHESIS AND INDUCTIVE BIAS

3.6.1. HYPOTHESIS
The hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is just a guess based on some known facts but has not yet been proven. A good hypothesis is testable, resulting in either true or false.
The hypothesis is one of the commonly used concepts of statistics in machine learning. It is specifically used in supervised machine learning, where an ML model learns a function that best maps inputs to corresponding outputs with the help of an available dataset.
Example: Let's understand the hypothesis with a common example. Some scientist claims that ultraviolet (UV) light can damage the eyes, and then it may also cause blindness.
In this example, the scientist just claims that UV rays are harmful to the eyes, but we assume they may cause blindness. However, it may or may not be possible. Hence, these types of assumptions are called a hypothesis.

[Figure: an unknown target function f: X → Y and training examples (x1, y1), ..., (xn, yn), together with the hypothesis space (H), feed into a learning algorithm (A), which outputs the final hypothesis.]
Fig. 3.2. Hypothesis


In supervised learning techniques, the main aim is to determine the possible hypothesis out of the hypothesis space that best maps the inputs to the corresponding correct outputs. There are some common methods given to find out the possible hypothesis from the hypothesis space, where the hypothesis space is represented by uppercase H and a hypothesis by lowercase h. These are defined as follows:

Hypothesis space (H):
Hypothesis space is defined as the set of all possible legal hypotheses; hence it is also known as a hypothesis set. It is used by supervised machine learning algorithms to determine the best possible hypothesis that describes the target function or best maps input to output. It is often constrained by the choice of the framing of the problem, the choice of model, and the choice of model configuration.

Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised machine learning algorithms. It is primarily based on the data as well as the bias and restrictions applied to the data. Hence, a hypothesis (h) is a single function that maps inputs to proper outputs and can be evaluated as well as used to make predictions.
The hypothesis (h) can be formulated in machine learning as follows:
y = mx + b
where y: range, m: slope of the line (change in y divided by change in x), x: domain, b: intercept (constant).
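To make the idea tangible, here is a small sketch that treats each (m, b) pair as one hypothesis h(x) = mx + b and searches a tiny hypothesis space; the data points, candidate values, and error measure are all assumptions for illustration:

# Minimal sketch: pick the best hypothesis h(x) = m*x + b from a small space H.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])         # generated by y = 2x + 1

H = [(m, b) for m in (0.5, 1.0, 2.0, 3.0)  # candidate slopes
            for b in (0.0, 1.0, 2.0)]      # candidate intercepts

def error(h):                              # squared error of one hypothesis
    m, b = h
    return np.sum((m * X + b - y) ** 2)

best = min(H, key=error)
print("best hypothesis (m, b):", best)     # (2.0, 1.0)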
Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data, as follows:
[Figure: a scatter of data points on a two-dimensional coordinate plane]
Fig. 3.3.

Now, assume we have some test data for which the ML algorithm must predict the outputs, as follows:
[Figure: the same plane with known positive (+) points and unknown (?) test points]
Fig. 3.4.
If we divide this coordinate plane in such a way that it can help us predict the output or result, we get:
[Figure: a boundary drawn through the plane, separating the two regions]
Fig. 3.5.
Based on the given test data, the output result will be as follows:
[Figure: each unknown (?) point is assigned a class by the boundary]
Fig. 3.6.
However, based on the data, algorithm, and constraints, this coordinate plane can also be divided in other ways, as follows:
[Figure: two alternative boundaries dividing the same points]
Fig. 3.7.

With the above example, we can conclude that:
Hypothesis space (H) is the composition of all legal, best possible ways to divide the coordinate plane so that it best maps inputs to proper outputs. Further, each individual best possible way is called a hypothesis (h).
[Figure: several possible hypotheses (dividing boundaries) shown together, forming the hypothesis space]
Fig. 3.8. Hypothesis in machine learning

3.6.2. HYPOTHESIS IN STATISTICS
Similar to the hypothesis in machine learning, it is also considered an assumption about the output. However, it is falsifiable, which means it can fail in the presence of sufficient evidence.
Unlike machine learning, we cannot simply accept every hypothesis in statistics, because it is just an imagined result based on probability. Before starting work on an experiment, we must be aware of two important types of hypotheses, as follows:
Null Hypothesis: A null hypothesis is a type of statistical hypothesis which tells that there is no statistically significant effect in the given set of observations. It is also known as a conjecture and is used in quantitative analysis to test theories about markets, investment, and finance, to decide whether an idea is true or false.
Alternative Hypothesis: An alternative hypothesis is a direct contradiction of the null hypothesis, which means that if one of the two hypotheses is true, then the other must be false. In other words, an alternative hypothesis is a type of statistical hypothesis which tells that there is some significant effect in the given set of observations.

Significance Level
The significance level is the primary thing that must be set before starting an experiment. It is useful for defining the tolerance of error and the level at which an effect can be considered significant. During the testing process in an experiment, a 95% significance level is commonly used, and the remaining 5% can be neglected. The significance level also determines the critical or threshold value. For example, in an experiment, if the significance level is set to 98%, then the critical value is 0.02.

P-value
The p-value in statistics is defined as the evidence against a null hypothesis. In other words, the p-value is the probability that random chance generated the data, or something else that is equal or rarer, under the null hypothesis condition.
If the p-value is smaller, the evidence against the null hypothesis is stronger, and vice versa, which means the null hypothesis can be rejected in testing. It is always represented in decimal form, such as 0.035.
Whenever a statistical test is carried out on a population and sample to find the p-value, it is always compared against the critical value. If the p-value is less than the critical value, then the effect is significant, and the null hypothesis can be rejected. Further, if it is higher than the critical value, there is no significant effect, and hence we fail to reject the null hypothesis.

3.6.3. INDUCTIVE BIAS


Inductive bias in machine learning refers to the set of assumptions or constraints that a learning algorithm uses to make predictions or generalize from training data to unseen data. It is a necessary component of any machine learning algorithm, because it helps in selecting one hypothesis or model over others when there are multiple solutions that fit the training data.
Inductive bias can be seen as a form of prior knowledge or beliefs that guides the learning process. It helps in reducing the space of possible hypotheses or models and biases the learning algorithm towards certain types of solutions. The choice of inductive bias depends on various factors, including the nature of the problem, the available data, and the design decisions made by the machine learning practitioner. Different algorithms have different inductive biases, and they may perform well in some domains but not in others.
Here are a few examples of common types of inductive bias:
Occam's Razor: This bias favors simpler hypotheses or models over more complex ones. It assumes that simpler explanations are more likely to be correct and generalize well to unseen data.
Smoothness: This bias assumes that similar inputs should have similar outputs. It encourages the learning algorithm to prefer smooth or continuous functions that don't change abruptly in response to small changes in the input.
Bias towards specific functions: Some learning algorithms, such as decision trees or neural networks, have built-in biases towards certain types of functions or structures. For example, decision trees have a bias towards piecewise constant functions, while convolutional neural networks have a bias towards translation invariance in images.
Temporal or spatial locality: This bias assumes that data points that are close to each other in time or space are more likely to have similar labels or outputs. It is commonly used in time series analysis or spatial data analysis.
Domain-specific biases: In some cases, domain-specific knowledge or assumptions are incorporated into the learning algorithm. For example, in medical diagnosis, the bias may include assumptions about the relationships between symptoms and diseases.
It's important to note that the choice of inductive bias can have a significant impact on the performance and generalization capabilities of a machine learning algorithm. It can influence the types of patterns that the algorithm can learn and the biases it may exhibit. Striking the right balance between bias and flexibility is a key challenge in machine learning research and application.

3.7. EVALUATION

Evaluating a Classification Model:
Once our model is completed, it is necessary to evaluate its performance, whether it is a classification or a regression model. For evaluating a classification model, we have the following ways:

(i) Log Loss or Cross-Entropy Loss:
It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
For a good binary classification model, the value of log loss should be near 0.
The value of log loss increases if the predicted value deviates from the actual value.
A lower log loss represents a higher accuracy of the model.
For binary classification, cross-entropy can be calculated as:
Loss = −(y log(p) + (1 − y) log(1 − p))
where y = actual output and p = predicted probability.
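A minimal sketch of this computation on assumed labels and predicted probabilities:

# Minimal sketch: binary cross-entropy (log loss) computed by hand.
import numpy as np

y = np.array([1, 0, 1, 1])              # actual labels
p = np.array([0.9, 0.1, 0.8, 0.4])      # predicted probabilities for class 1

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print("log loss:", loss)                # closer to 0 means a better model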

(ii) Confusion Matrix:
The confusion matrix provides a matrix/table as output and describes the performance of the model.
It is also known as the error matrix.
The matrix consists of the prediction results in a summarized form, giving the total numbers of correct and incorrect predictions. The matrix looks like the table below:

                        Actual Positive    Actual Negative
Predicted Positive      True Positive      False Positive
Predicted Negative      False Negative     True Negative

Accuracy = (TP + TN) / Total Population
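A minimal sketch of computing the confusion matrix and accuracy with scikit-learn; the label vectors are assumptions (note that scikit-learn places actual classes on the rows and predicted classes on the columns, the transpose of the table above):

# Minimal sketch: confusion matrix and accuracy for assumed labels.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual, columns: predicted
print("accuracy:", accuracy_score(y_true, y_pred))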

(iii) AUC-ROC Curve:
ROC stands for Receiver Operating Characteristic curve, and AUC stands for Area Under the Curve.
It is a graph that shows the performance of the classification model at different thresholds.
To visualize the performance of a classification model, we use the AUC-ROC curve.
The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) is on the Y-axis and FPR (False Positive Rate) is on the X-axis.
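A minimal sketch of computing the ROC curve points and the AUC with scikit-learn on assumed labels and scores:

# Minimal sketch: ROC curve and AUC for a binary classifier's scores.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]           # predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, scores))   # 1.0 is perfect, 0.5 is random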

3.8. TRAIN AND TEST DATASETS IN MACHINE LEARNING

Machine learning is one of the booming technologies across the world that enables computers/machines to turn huge amounts of data into predictions. However, these predictions highly depend on the quality of the data, and if we are not using the right data for our model, then it will not generate the expected result. In machine learning projects, we generally divide the original dataset into training data and test data. We train our model over a subset of the original dataset, i.e., the training dataset, and then evaluate whether it can generalize well to the new or unseen dataset, i.e., the test set. Therefore, train and test datasets are the two key concepts of machine learning, where the training dataset is used to fit the model, and the test dataset is used to evaluate the model.

3.8.1. TRAINING DATASET


The training data is the biggest (in size) subset of the original dataset, and it is used to train or fit the machine learning model. Firstly, the training data is fed to the ML algorithm, which lets the model learn how to make predictions for the given task.
For example, for training a sentiment analysis model, the training data could be as below:

Input                     Output (Labels)
The new UI is great       Positive
Update is really slow     Negative
The training data varies depending on whether we are using supervised learning or unsupervised learning algorithms.
For unsupervised learning, the training data contains unlabeled data points, i.e., inputs are not tagged with corresponding outputs. Models are required to find the patterns in the given training datasets in order to make predictions.
On the other hand, for supervised learning, the training data contains labels in order to train the model and make predictions.
The type of training data that we provide to the model is highly responsible for the model's accuracy and prediction ability. It means that the better the quality of the training data, the better the performance of the model will be. Training data is approximately more than or equal to 60% of the total data for an ML project.

3.8.2. TEST DATASET


Once we train the model with the training dataset, it's time to test the model with the test dataset. This dataset evaluates the performance of the model and ensures that the model can generalize well to new or unseen data. The test dataset is another subset of the original data, independent of the training dataset. However, it has similar feature types and a similar class probability distribution, and it is used as a benchmark for model evaluation once model training is completed. Test data is a well-organized dataset that contains data for each type of scenario the model would face when used in the real world. Usually, the test dataset is approximately 20-25% of the total original data for an ML project.
At this stage, we can also check and compare the testing accuracy with the training accuracy, i.e., how accurate our model is on the test dataset versus the training dataset. If the accuracy of the model on the training data is much greater than that on the testing data, then the model is said to be overfitting.

The testing data should:
Be representative of the original dataset.
Be large enough to give meaningful predictions.

3.8.3. NEED OF SPLITTING DATASET INTO TRAIN AND TEST SET
Splitting the dataset into train and test sets is one of the important parts of data pre-processing, as by doing so, we can improve the performance of our model and hence obtain better predictability.
We can understand it this way: if we train our model with a training set and then test it with a completely different test dataset, our model will not be able to understand the correlations between the features.

[Figure: a dataset divided into a training set and a test set]
Fig. 3.9. Train and test split

Therefore, if we train and test the model with two different datasets, it will decrease the performance of the model. Hence it is important to split a dataset into two parts, i.e., a train set and a test set.
In this way, we can easily evaluate the performance of our model. For instance, if it performs well with the training data but does not perform well with the test dataset, then it is estimated that the model may be overfitted.
For splitting the dataset, we can use the train_test_split function of scikit-learn. The lines of code below can be used to split a dataset:

1. from sklearn.model_selection import train_test_split
2. x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
Explanation:
In the first line of the above code, we have imported the train_test_split function from the sklearn library.
In the second line, we have used four variables:
x_train: used to represent the features of the training data
x_test: used to represent the features of the testing data
y_train: used to represent the target variable for the training data
y_test: used to represent the target variable for the testing data
In the train_test_split() function, we have passed four parameters, of which the first two are the arrays of data, and test_size specifies the size of the test set. The test_size may be 0.5, 0.3, or 0.2, which determines the split ratio of the training and testing sets. The last parameter, random_state, sets a seed for the random generator so that you always get the same result; the most commonly used value for it is 42.

3.8.4. OVERFITTING AND UNDERFITTING ISSUES


Overfitting and underfitting are the most common problems that occur in a machine learning model.
A model can be said to be overfitted when it performs quite well with the training dataset but does not generalize well to new or unseen data. The issue of overfitting occurs when the model tries to cover all the data points and hence starts capturing the noise present in the data. Due to this, it can't generalize well to a new dataset. Because of these issues, the accuracy and efficiency of the model degrade. Generally, a complex model has a high chance of overfitting. There are various ways by which we can avoid overfitting in a model, such as using the cross-validation method, stopping the training early, or using regularization.
On the other hand, a model is said to be under-fitted when it is not able to capture the underlying trend of the data. It means the model shows poor performance even with the training dataset. In most cases, underfitting issues occur when the model is not suitable for the problem that we are trying to solve. To avoid the underfitting issue, we can either increase the training time of the model or increase the number of features in the dataset.

Training Data vs. Testing Data
The main difference between training data and testing data is that training data is the subset of the original data that is used to train the machine learning model, whereas testing data is used to check the accuracy of the model.
The training dataset is generally larger than the testing dataset. The common ratios for splitting train and test datasets are 80:20, 70:30, or 90:10.
Training data is well known to the model, as it is used to train the model, whereas testing data is like unseen/new data to the model.
3.8.5. WORKING OF TRAINING AND TESTING DATA IN MACHINE LEARNING

Machine learning algorithms enable machines to make predictions and solve problems on the basis of past observations or experiences. These experiences or observations are taken from the training data, which is fed to the algorithm. Further, one of the great things about ML algorithms is that they can learn and improve over time on their own, as they are trained with relevant training data.
Once the model is trained enough with the relevant training data, it is tested with the test data. We can understand the whole process of training and testing in three steps, which are as follows:
1. Feed: Firstly, we need to train the model by feeding it with the training input data.
2. Define: Now, the training data is tagged with the corresponding outputs (in supervised learning), and the model transforms the training data into text vectors or a number of data features.
3. Test: In the last step, we test the model by feeding it the test data/unseen dataset. This step ensures that the model is trained efficiently and can generalize well.
The above process is explained using the flowchart given below:
[Flowchart: 1. Feed training data → 2. Tag with desired output → AI learns from training data → 3. Test; an incorrect output leads to more human tagging, a correct output means the model is ready.]
Fig. 3.10. Train and test split flowchart

Traits of Quality Training Data
As the prediction ability of an ML model highly depends on how it has been trained, it is important to train the model with quality data. Further, ML works on the concept of "Garbage In, Garbage Out": whatever type of data we input into our model, it will make its predictions accordingly. For quality training data, the below points should be considered:
(i) Relevant: The very first quality of training data is that it should be relevant to the problem that you are going to solve. It means that whatever data you are using should be relevant to the current problem. For example, if you are building a model to analyze social media data, then the data should be taken from different social sites such as Twitter, Facebook, Instagram, etc.
(ii) Uniform: There should always be uniformity among the features of a dataset. It means all data for a particular problem should be taken from the same source with the same attributes.
(iii) Consistent: In the dataset, similar attributes must always correspond to the similar label in order to ensure uniformity in the dataset.
(iv) Comprehensive: The training data must be large enough to represent the sufficient features that you need to train the model in a better way. With a comprehensive dataset, the model will be able to learn all the edge cases.

3.9. CROSS-VALIDATION

Cross-validation is a technique for validating model efficiency by training the model on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say it is a technique to check how well a statistical model generalizes to an independent dataset.
In machine learning, there is always a need to test the stability of the model; we can't fit our model based only on the training dataset. For this purpose, we reserve a particular sample of the dataset which was not part of the training dataset. After that, we test our model on that sample before deployment, and this complete process comes under cross-validation. This is something different from the general train-test split.
Hence the basic steps of cross-validation are:
Reserve a subset of the dataset as a validation set.
Provide the training to the model using the training dataset.
Now, evaluate model performance using the validation set. If the model performs well with the validation set, perform the further steps; else, check for issues.
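Before looking at the individual methods, here is a minimal sketch of cross-validation with scikit-learn's cross_val_score; the dataset and model choices are assumptions for illustration:

# Minimal sketch: 5-fold cross-validation scores for one model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

scores = cross_val_score(model, X, y, cv=5)    # 5 train/validate rounds
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())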

Methods Used for Cross-Validation
There are some common methods that are used for cross-validation. These methods are given below:
(i) Validation set approach
(ii) Leave-p-out cross-validation
(iii) Leave-one-out cross-validation
(iv) K-fold cross-validation
(v) Stratified k-fold cross-validation
(vi) Holdout method

3.9.1. VALIDATION SET APPROACH
In the validation set approach, we divide our input dataset into a training set and a test or validation set. Both subsets are given 50% of the dataset.
But it has one big disadvantage: since we are using only 50% of the dataset to train our model, the model may miss out on capturing important information from the dataset. It also tends to give an underfitted model.
(ii) Leave-P-Out Cross-Validation
In this approach, p data points are left out of the training data. This means that if there are n data points in the original input dataset, then n − p data points will be used as the training dataset and the p data points as the validation set. This process is repeated for all possible combinations, and the average error is calculated to determine the effectiveness of the model.
There is a disadvantage of this technique: it can be computationally difficult for large p.

(iii) Leave-One-Out Cross-Validation
This method is similar to leave-p-out cross-validation, but instead of p, we take one data point out of the training set. It means that, for each learning set, only one data point is reserved, and the remaining dataset is used to train the model. This process repeats for each data point. Hence for n samples, we get n different training sets and n test sets. It has the following features:
In this approach, the bias is minimal, as all the data points are used.
The process is executed n times; hence the execution time is high.
This approach leads to high variation in testing the effectiveness of the model, as we iteratively check against a single data point.

(ii) K-Fold Cross-Validation

K-fold cross-validation approach divides the input dataset into K groups of


samples of equal sizes. These samples are called folds. For each learning set, the
prediction function uses k-1 folds, and the rest of the folds are used for the test set.
This approach is a very popular CV approach because it is easy to understand, and
the output is less biased than other methods.
The steps for k-fold cross-validation are:
* Split the input dataset into k groups.
* For each group:
  - Take one group as the reserve or test data set.
  - Use the remaining groups as the training dataset.
  - Fit the model on the training set and evaluate the performance of the model using the test set.
Let's take an example of 5-fold cross-validation. So, the dataset is grouped into 5 folds. On the 1st iteration, the first fold is reserved for testing the model, and the rest are used to train the model. On the 2nd iteration, the second fold is used to test the model, and the rest are used to train the model. This process continues until each fold has been used as the test fold.
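The iteration described above can be sketched as follows (assuming scikit-learn; the 10-point array is a toy stand-in for a real dataset):

# K-fold sketch: each fold is the test set exactly once.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy datapoints
kf = KFold(n_splits=5)

for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"Iteration {i}: test fold = {test_idx}")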



Consider the below diagram:


Fig. 3.11. Five-fold cross-validation: in each iteration a different fold serves as the testing set while the remaining folds form the training set

(iv) Stratified k-fold cross-validation
This technique is similar to k-fold cross-validation with some little changes. This approach works on the stratification concept: stratification is a process of rearranging the data to ensure that each fold or group is a good representative of the complete dataset. To deal with bias and variance, it is one of the best approaches.
It can be understood with an example of housing prices, such that the prices of some houses can be much higher than those of other houses. To tackle such situations, a stratified k-fold cross-validation technique is useful.
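A sketch of stratification in code (assuming scikit-learn; the imbalanced labels are invented for illustration):

# Stratified k-fold sketch: every test fold preserves the overall class ratio.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # imbalanced: 8 of class 0, 2 of class 1

skf = StratifiedKFold(n_splits=2)
for train_idx, test_idx in skf.split(X, y):
    print("test labels:", y[test_idx])  # each fold keeps the 4:1 ratio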
(v) Holdout Method
This method is the simplest cross-validation technique among all. In this method, we need to remove a subset of the training data and use it to get prediction results by training on the rest of the dataset.
The error that occurs in this process tells how well our model will perform with the unknown dataset. Although this approach is simple to perform, it still faces the issue of high variance, and it also produces misleading results sometimes.
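A minimal holdout sketch (assuming scikit-learn; the synthetic dataset is illustrative):

# Holdout sketch: train on 70% of the data, measure error on the held-out 30%.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))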
3.9.2. COMPARISON OF CROSS-VALIDATION TO TRAIN/TEST SPLIT IN MACHINE LEARNING
Train/test split: The input data is divided into two parts, the training set and the test set, on a ratio of 70:30, 80:20, etc. It provides a high variance, which is one of its biggest disadvantages.

Training Data: The training data is used to train the model, and the dependent variable is known.

Test Data: The test data is used to make predictions from the model that is already trained on the training data. It has the same features as the training data but is not a part of it.
Cross-Validation dataset: It is used to overcome the disadvantage of train/test split by splitting the dataset into groups of train/test splits and averaging the result. It can be used if we want to optimize our model that has been trained on the training dataset for the best performance. It is more efficient as compared to the train/test split, as every observation is used for both training and testing.

3.9.3. LIMITATIONS OF CROSS-VALIDATION

There are some limitations of the cross-validation technique, which are given below:
* For ideal conditions, it provides the optimum output. But for inconsistent data, it may produce a drastic result. So, it is one of the big disadvantages of cross-validation, as there is no certainty of the type of data in machine learning.
* In predictive modeling, the data evolves over a period, due to which it may face differences between the training set and validation sets. For example, if we create a model for the prediction of stock market values and the data is trained on the previous 5 years' stock values, the realistic future values for the next 5 years may be drastically different, so it is difficult to expect the correct output for such situations.
3.9.4. APPLICATIONS OF CROSS-VALIDATION

* This technique can be used to compare the performance of different predictive modeling methods.
* It has great scope in the medical research field.
* It can also be used for meta-analysis, as it is already being used by data scientists in the field of medical statistics.

3.10. OVERFITTING AND UNDERFITTING IN MACHINE LEARNING
Overfitting and underfitting are the two main problems that occur in machine learning and degrade the performance of the machine learning models.

The main goal of each machine learning model is to generalize well. Here, generalization defines the ability of an ML model to provide a suitable output by adapting to the given set of unknown inputs. It means that after training on the dataset, it can produce reliable and accurate output. Hence, underfitting and overfitting are the two terms that need to be checked for the performance of the model and whether the model is generalizing well or not.
Before understanding overfitting and underfitting, let's understand some basic terms that will help to understand this topic well:
* Signal: It refers to the true underlying pattern of the data that helps the machine learning model to learn from the data.
* Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.
* Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine learning algorithms. Or it is the difference between the predicted values and the actual values.
* Variance: If the machine learning model performs well with the training dataset but does not perform well with the test dataset, then variance occurs.

3.10.1. OVERFITTING
Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts caching noise and the inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model. The overfitted model has low bias and high variance.
The chances of overfitting increase the more training we give to our model: the more we train our model, the higher the chances of producing an overfitted model.
Overfitting is the main problem that occurs in supervised learning.
Example: The concept of overfitting can be understood by the below linear regression output:
Fig. 3.12. Linear regression output


As we can see from the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality, it is not so: the goal of the regression model is to find the best fit line, and here we have not got any best fit, so it will generate prediction errors.

How to avoid Overfitting in the Model
Both overfitting and underfitting cause degraded performance of the machine learning model. But the main cause is overfitting, so there are some ways by which we can reduce the occurrence of overfitting in our model (a sketch of the regularization remedy follows this list):
* Cross-Validation
* Training with more data
* Removing features
* Early stopping of the training
* Regularization
* Ensembling
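To illustrate the regularization remedy, here is a minimal sketch (assuming scikit-learn and NumPy; the noisy sine data and the degree-15 polynomial are invented for illustration):

# Regularization sketch: a Ridge penalty (alpha) shrinks the coefficients of an
# over-flexible polynomial model and so reduces overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)  # true signal + noise

overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)
regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(X, y)
# The regularized pipeline follows the sine trend instead of chasing the noise.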
3.10.2. UNDERFITTING
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting in the model, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence it reduces the accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.
Example: We can understand underfitting using the below output of the linear regression model:

Fig. 3.13. Underfitting

As we can see from the above diagram, the model is unable to capture the data points present in the plot.
How to avoid underfitting (a sketch of the second remedy follows this list):
* By increasing the training time of the model.
* By increasing the number of features.
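As a hedged sketch of the second remedy, adding features (assuming scikit-learn and NumPy; the quadratic toy data is invented):

# Underfitting sketch: a plain line underfits a quadratic trend; adding
# polynomial features gives the model enough capacity to capture it.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.2, size=100)

underfit = LinearRegression().fit(X, y)
better = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Linear R^2:", underfit.score(X, y))   # low: the trend is missed
print("Quadratic R^2:", better.score(X, y))  # close to 1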

Goodness of Fit
The "goodness of fit" term is taken from statistics, and the goal of the machine learning model is to achieve the goodness of fit. In statistical modeling, it defines how closely the result or predicted values match the true values of the dataset.
The model with a good fit is between the underfitted and overfitted model, and ideally, it makes predictions with 0 errors, but in practice, it is difficult to achieve.
As we train our model for a time, the errors in the training data go down, and the same happens with test data. But if we train the model for a long duration, then the performance of the model may decrease due to overfitting, as the model also learns the noise present in the dataset.

* The errors in the test dataset start increasing, so the point just before the rise of errors is the good point, and we can stop here for achieving a good model.
* There are two other methods by which we can get a good point for our model, which are the resampling method to estimate model accuracy and a validation dataset.

3.11. BIAS AND VARIANCE IN MACHINE LEARNING
Machine learning is a branch of Artificial Intelligence, which allows machines to perform data analysis and make predictions. However, if the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance. In machine learning, these errors will always be present, as there is always a slight difference between the model predictions and actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.

Fig. 3.14. Bias and variance: training and validation error versus model complexity


3.11.1. ERRORS IN MACHINE LEARNING
In machine learning, an error is a measure of how accurately an algorithm can make predictions for the previously unknown dataset. On the basis of these errors, the machine learning model is selected that can perform best on the particular dataset. There are mainly two types of errors in machine learning, which are:

(i) Reducible errors: These errors can be reduced to improve the model accuracy. Such errors can further be classified into bias and variance.
(ii) Irreducible errors: These errors will always be present in the model regardless of which algorithm has been used. The cause of these errors is unknown variables whose values can't be reduced.

3.11.2. BIAS
In general, a machine learning model analyses the data, finds patterns in it and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction. While making predictions, a difference occurs between the prediction values made by the model and the actual/expected values, and this difference is known as bias error or error due to bias. It can be defined as the inability of machine learning algorithms such as Linear Regression to capture the true relationship between the data points. Each algorithm begins with some amount of bias, because bias occurs from assumptions in the model, which makes the target function simple to learn. A model has either:
(i) Low Bias: A low bias model will make fewer assumptions about the form of the target function.
(ii) High Bias: A model with a high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. A high bias model also cannot perform well on new data.
Generally, a linear algorithm has a high bias, as it makes it learn fast. The simpler the algorithm, the higher the bias likely to be introduced. Whereas a nonlinear algorithm often has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines. At the same time, algorithms with high bias are Linear Regression, Linear Discriminant Analysis and Logistic Regression.

3.11.3. WAYS TO REDUCE HIGH BIAS
High bias mainly occurs due to a much too simple model. Below are some ways to reduce high bias:
* Increase the input features, as the model is underfitted.
* Decrease the regularization term.
* Use more complex models, such as including some polynomial features.
3.11.4. VARIANCE ERROR
The variance would specify the amount of variation in the prediction if different training data was used. In simple words, variance tells how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between inputs and output variables. Variance errors are either of low variance or high variance.
Low variance means there is a small variation in the prediction of the target function with changes in the training data set. At the same time, high variance shows a large variation in the prediction of the target function with changes in the training dataset.
A model that shows high variance learns a lot and performs well with the training dataset but does not generalize well with the unseen dataset. As a result, such a model gives good results with the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it leads to overfitting of the model. A model with high variance has the below problems:
* A high variance model leads to overfitting.
* Increased model complexity.
Usually, nonlinear algorithms, which have a lot of flexibility to fit the model, have high variance.
Fig. 3.15. Underfitting, optimal balance, and overfitting


Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis. At the same time, algorithms with high variance are Decision Tree, Support Vector Machine, and k-Nearest Neighbours.

Ways to Reduce High Variance:
* Reduce the input features or number of parameters, as the model is overfitted.
* Do not use a much too complex model.
* Increase the training data.
* Increase the regularization term.

Different Combinations of Bias-Variance
There are four possible combinations of bias and variance, which are represented by the below diagram:

Fig. 3.16. The four possible combinations of bias and variance
(i) Low-Bias, Low-Variance: The combination of low bias and low variance shows an ideal machine learning model. However, it is not possible practically.
(ii) Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent and accurate on average. This case occurs when the model learns with a large number of parameters and hence leads to overfitting.
(iii) High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well with the training dataset or uses few numbers of parameters. It leads to underfitting problems in the model.
(iv) High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.

3.11.5. IDENTIFICATION OF HIGH VARIANCE OR HIGH BIAS
High variance can be identified if the model has:
* Low training error and high test error.
High bias can be identified if the model has:
* High training error, and the test error is almost similar to the training error.

Fig. 3.17. Identifying high variance and high bias from the training and test error curves

Bias-Variance Trade-Off
While building the machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting in the model. If the model is very simple with fewer parameters, it may have low variance and high bias. Whereas, if the model has a large number of parameters, it will have high variance and low bias. So, it is required to make a balance between bias and variance errors, and this balance between the bias error and variance error is known as the Bias-Variance trade-off.
For an accurate prediction of the model, algorithms need low variance and low bias. But this is not possible because bias and variance are related to each other:
* If we decrease the variance, it will increase the bias.
* If we decrease the bias, it will increase the variance.
The bias-variance trade-off is a central issue in supervised learning. Ideally, we need a model that accurately captures the regularities in training data and simultaneously

generalizes well with the unseen dataset. Unfortunately, doing this is not possible simultaneously. A high variance algorithm may perform well with training data, but it may lead to overfitting to noisy data. Whereas a high bias algorithm generates a much simpler model that may not even capture important regularities in the data. So, we need to find a sweet spot between bias and variance to make an optimal model.
Fig. 3.18. Bias-Variance Trade-Off: total error is minimized at an optimal model complexity
Hence, the Bias-Variance trade-off is about finding the sweet spot to make a balance between bias and variance errors.

3.12. REGRESSION
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A, which does various advertisements every year and gets sales on that. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:

Advertisement    Sales
$90              $1000
$120             $1300
$150             $1800
$100             $1200
$130             $1380
$200             ??
Fig. 3.19. Advertisement spend and corresponding sales
Now, the company wants to do an advertisement of $200 in the year 2019 and wants to know the prediction about the sales for this year. To solve such prediction problems in machine learning, we need regression analysis, as the sketch below illustrates.
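A minimal sketch of that computation (assuming scikit-learn; only the five observed pairs from the table above are used):

# Fit a simple linear regression on the observed (advertisement, sales) pairs
# and predict the sales for a $200 advertisement.
import numpy as np
from sklearn.linear_model import LinearRegression

ads = np.array([[90], [120], [150], [100], [130]])
sales = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(ads, sales)
print("Predicted sales for $200:", model.predict([[200]])[0])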

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.
In regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether a model has captured a strong relationship or not.
Some examples of regression can be as follows:
* Prediction of rain using temperature and other factors
* Determining market trends
* Prediction of road accidents due to rash driving.


3.12.1. TERMINOLOGIES RELATED TO THE REGRESSION ANALYSIS
(i) Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.

(ii) Independent Variable: The factors which affect the dependent variables, or which are used to predict the values of the dependent variables, are called independent variables, also called predictors.
(iii) Outliers: An outlier is an observation which contains either a very low value or a very high value in comparison to other observed values. An outlier may hamper the result, so it should be avoided.
(iv) Multicollinearity: If the independent variables are highly correlated with each other, then such condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most affecting variable.
(v) Underfitting and Overfitting: If our algorithm works well with the training dataset but not well with the test dataset, then such problem is called overfitting. And if our algorithm does not perform well even with the training dataset, then such problem is called underfitting.
Below are some other reasons for using regression analysis:
* Regression estimates the relationship between the target and the independent variable.
* It is used to find the trends in data.
* It helps to predict real/continuous values.
* By performing the regression, we can confidently determine the most important factor, the least important factor, and how each factor is affecting the other factors.

3.12.2. TYPES OF REGRESSION
There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable. Here we are discussing some important types of regression, which are given below:
given below:
* Linear Regression
* Logistic Regression
* Polynomial Regression
* Support Vector Regression
* Decision Tree Regression
* Random Forest Regression
* Ridge Regression
* Lasso Regression

3.12.3. LINEAR REGRESSION:
* Linear regression is a statistical regression method which is used for predictive analysis.
* It is one of the very simple and easy algorithms, which works on regression and shows the relationship between continuous variables.
* It is used for solving regression problems in machine learning.
* Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence it is called linear regression.
* If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than one input variable, then such linear regression is called multiple linear regression.
* The relationship between variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of years of experience.

Fig. 3.20. Linear regression: salary versus years of experience



Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
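The coefficients a and b can be estimated by least squares; a small sketch (assuming NumPy; the experience/salary pairs are hypothetical, in the spirit of Fig. 3.20):

# Least-squares fit of Y = aX + b: np.polyfit returns [a, b] for degree 1.
import numpy as np

experience = np.array([1, 2, 3, 5, 7, 9])                       # years (X)
salary = np.array([40000, 48000, 55000, 70000, 88000, 105000])  # target (Y)

a, b = np.polyfit(experience, salary, deg=1)
print(f"Y = {a:.1f} * X + {b:.1f}")
print("Predicted salary at 6 years:", a * 6 + b)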
Some popular applications of linear regression are:
* Analyzing trends and sales estimates
* Salary forecasting
* Real estate prediction
* Arriving at ETAs in traffic.

3.12.4. LOGISTIC REGRESSION:
* Logistic regression is another supervised learning algorithm, which is used to solve classification problems. In classification problems, we have dependent variables in a binary or discrete format, such as 0 or 1.
* The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
* It is a predictive analysis algorithm which works on the concept of probability.
* Logistic regression is a type of regression, but it is different from the linear regression algorithm in terms of how it is used.
* Logistic regression uses the sigmoid function or logistic function, which is a complex cost function. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
where:
* f(x) = output between 0 and 1
* x = input to the function
* e = base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as follows:

Fig. 3.21. The S-curve of the sigmoid function sig(x) = 1 / (1 + e^(-x))


It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
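A brief sketch tying the sigmoid and the threshold together (assuming NumPy and scikit-learn; the hours-studied data is invented):

# The sigmoid squashes any input into (0, 1); 0.5 is the usual threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # f(x) = 1 / (1 + e^(-x))

print(sigmoid(0), sigmoid(4), sigmoid(-4))  # 0.5, ~0.98, ~0.02

# Binary logistic regression on toy data: hours studied -> pass/fail.
hours = np.array([[1], [2], [3], [4], [5], [6]])
passed = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[3.5]]))  # [P(fail), P(pass)] at 3.5 hours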
There are three types of logistic regression:
* Binary (0/1, pass/fail)
* Multi (cats, dogs, lions)
* Ordinal (low, medium, high)

TWO MARKS QUESTIONS WITH ANSWERS

1. What is Machine learning?


Machine learning is a branch of computer science which deals with system programming in order to automatically learn and improve with experience. For example, robots are programmed so that they can perform tasks based on data they gather from sensors. It automatically learns programs from data.
2. What is 'Overfitting' in Machine learning?
In machine learning, when a statistical model describes random error or noise instead of the underlying relationship, 'overfitting' occurs. Overfitting is normally observed when a model is excessively complex, because of having too many parameters with respect to the number of training data types. The model which has been overfit exhibits poor performance.

3. Why does overfitting happen?
The possibility of overfitting exists as the criteria used for training the model is not the same as the criteria used to judge the efficacy of the model.

4. How can you avoid overfitting?
By using a lot of data, overfitting can be avoided; overfitting happens relatively easily when you have a small dataset and you try to learn from it. But if you have a small database, you are forced to come up with a model based on that. In such a situation, you can use a technique known as cross validation. In this method, the dataset splits into two sections, testing and training datasets; the testing dataset will only test the model, while, in the training dataset, the datapoints will come up with the model.
In this technique, a model is usually given a dataset of known data on which training (training data set) is run and a dataset of unknown data against which the model is tested. The idea of cross validation is to define a dataset to "test" the model in the training phase.
5. What is inductive machine learning?
Inductive machine learning involves the process of learning by examples, where a system, from a set of observed instances, tries to induce a general rule.
6. What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are
* Supervised Learning
* Unsupervised Learning
* Semi-supervised Learning
* Reinforcement Learning
* Transduction
* Learning to Learn
7. What are the three stages to build the hypotheses or model in machine learning?
* Model building
* Model testing
* Applying the model

8. What is 'Training set' and 'Test set'?
In various areas of information science like machine learning, a set of data is used to discover the potentially predictive relationship, known as the 'Training Set'. The training set is an example given to the learner, while the test set is used to test the accuracy of the hypotheses generated by the learner, and it is the set of examples held back from the learner. The training set is distinct from the test set.
9. List down various approaches for machine learning.
The different approaches in Machine Learning are:
* Concept Vs Classification Learning
* Symbolic Vs Statistical Learning
* Inductive Vs Analytical Learning
10. What is classification in machine learning?
Classification is a task in machine learning that involves assigning predefined
categories or labels to data instances based on their features or characteristics.
The goal is to learn a model that can accurately classify new, unseen data.

11. What is regression in machine learning?


Regression is a type of supervised learning where the goal is to predict a
continuous numerical value or a real-valued output based on input features. It
involves fitting a mathematical function or model to the training data to estimate
the relationships between the input variables and the target variable.
12. How is probability used in machine learning?
Probability is used in machine learning to model uncertainty and make probabilistic predictions. It helps quantify the likelihood of different outcomes or events based on available evidence or data. Probability theory is applied in various algorithms like Naive Bayes, Hidden Markov Models, and Bayesian Networks.
13. What role does linear algebra play in machine learning?
Linear algebra is essential in machine learning for data representation, matrix operations, linear transformations, and optimization techniques. It provides a mathematical framework for manipulating and analyzing data, as well as building and training models efficiently.
14. What is the hypothesis space and inductive bias in machine learning?
The hypothesis space refers to the set of all possible functions or models that a machine learning algorithm can learn. Inductive bias refers to the assumptions or constraints that guide the learning process by favoring certain hypotheses or models over others. It helps in generalizing from training data to unseen data.
evaluation and training 7 test sets in
15. What is the
importance of
learning? capabilities
crucial to assess the performance and generalization
Evaluation is are used to divide the
model. Training and test sets
of a machine learning model training, and the test set Used
set used for
available data, with the training
performance on unseen data. This helps estimate how
to evaluate the mnodel's
real-world scenarios.
well the modelwill perform in

REVIEW QUESTIONS

1. Explain the definition of machine learning and its key components. Discuss the importance of data and the learning process in machine learning.
2. Describe the classification and regression problems in machine learning. Provide examples of each and discuss the differences between the two.
3. Discuss different approaches to building machine learning models. Compare and contrast supervised, unsupervised, and reinforcement learning algorithms.
4. Explain the concept of probability and its relevance in machine learning. Discuss how probability is used in decision-making and uncertainty modeling.
5. Discuss the basics of linear algebra and its significance in machine learning. Explain how matrices and vectors are used to represent data and model parameters.
6. Define the hypothesis space and inductive bias in machine learning. Discuss how the choice of inductive bias influences the learning process and the generalization capabilities of a model.
