Machine Learning: Types and Techniques
LEARNING
Machine learning: definition - classification - regression - approaches of machine learning models - types of learning - probability basics - linear algebra - hypothesis space and inductive bias - evaluation - training and test sets - cross-validation - the concepts of overfitting and underfitting - bias and variance - Regression: linear regression - logistic regression.
Feature extraction: Machine learning models often require the identification and selection of relevant features from the input data. Feature extraction helps in reducing the dimensionality of the data and focusing on the most informative aspects.
Evaluation and Testing: After training, the machine learning model is evaluated
using separate test data to measure its performance and accuracy. This step helps
assess the model's generalization capabilities and ensures it can make accurate
predictions or decisions on new, unseen data.
Machine learning has a wide range of applications, including image and speech recognition, natural language processing, recommendation systems, fraud detection, autonomous vehicles, healthcare diagnostics, and many more. It continues to advance rapidly, with new techniques and models being developed to tackle increasingly complex problems and improve performance.
Machine Learning is defined as a technology that is used to train machines to
perform various actions such as predictions, recommendations, estimations, etc.
based on historical data or past experience.
Machine Learning enables computers to behave like human beings by training them with the help of past experience and predicted data.
There are three key aspects of Machine Learning, which are as follows:
Task: A task is defined as the main problem in which we are interested. This task/problem can be related to predictions, recommendations, estimations, etc.
Experience: It is defined as learning from historical or past data, which is used to estimate and resolve future tasks.
Performance: It is defined as the capacity of any machine to resolve a machine learning task or problem and provide the best outcome for the same. However, performance is dependent on the type of machine learning problem.
(a) Classification
(b) Regression
Regression algorithms are used to solve regression problems in which there is a linear relationship between input and output variables. These are used to predict continuous output variables, such as market trends, weather prediction, etc.
Some popular Regression algorithms are given below:
Simple Linear Regression Algorithm
Lasso Regression
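As a rough illustration, here is a minimal scikit-learn sketch of two of the algorithms named above; the synthetic dataset and the alpha value are assumptions chosen purely for demonstration:

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))                       # three input features
y = 2.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 100)   # linear target + noise

lin = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)        # L1 penalty shrinks weak coefficients

print("Linear coefficients:", lin.coef_)
print("Lasso coefficients: ", lasso.coef_)   # the irrelevant third feature tends to 0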
Advantages:
Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
These algorithms are not able to solve complex tasks.
It may predict the wrong output if the test data is different from the
training data.
It requires lots of computational time to train the algorithm.
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:
Image Segmentation: Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.
Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis purposes. It is done by using medical images and past labelled data with labels for disease conditions. With such a process, the machine can identify a disease for new patients.
Fraud Detection: Supervised learning classification algorithms are used for identifying fraud transactions, fraud customers, etc. It is done by using historic data to identify the patterns that can lead to possible fraud.
Spam Detection: In spam detection and filtering, classification algorithms are used. These algorithms classify an email as spam or not spam. The spam emails are sent to the spam folder.
Speech Recognition: Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data, and various identifications can be done using the same, such as voice-activated passwords, voice commands, etc.
3.2.2. UNSUPERVISED LEARNING
In unsupervised learning, a machine is trained with some input samples only, while the output is not known. The training information is neither classified nor labeled; hence, a machine may not always provide correct output compared to supervised learning.
Although unsupervised learning is less common in practical business settings, it helps in exploring the data and can draw inferences from datasets to describe hidden structures in unlabeled data.
Example: Let's assume a machine is trained with some set of documents having different categories (Type A, B, and C), and we have to organize them into appropriate groups. Because the machine is provided only with input samples and no outputs, it can organize these datasets into Type A, Type B, and Type C categories, but it is not guaranteed that they are organized correctly.
Unsupervised learning can be further classified into two types, which are given below:
Clustering
Association
(i) Clustering
The clustering technique is used when we want to find the inherent groups in the data. It is a way to group the objects into clusters such that the objects with the most similarities remain in one group and have few or no similarities with the objects of other groups. An example of the clustering algorithm is grouping customers by their purchasing behaviour.
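A minimal clustering sketch, assuming scikit-learn and synthetic blob data, might look as follows:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # unlabeled points
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(kmeans.labels_[:10])        # cluster id assigned to each point
print(kmeans.cluster_centers_)    # learned group centres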
(ii) Association
Association rule learning is an unsupervised learning technique, which finds interesting relations among variables within a large dataset. The main aim of this learning technique is to find the dependency of one data item on another and map the variables accordingly so that it can generate maximum benefit.
Video Games:
RL algorithms are very popular in gaming applications, where they are used to attain superhuman performance. Some popular games that use RL algorithms are AlphaGo and AlphaGo Zero.
Resource Management:
The paper "Resource Management with Deep Reinforcement Learning" showed how to use RL in computers to automatically learn to schedule computing resources for waiting jobs, in order to minimize average job slowdown.
Robotics:
RL is widely used in robotics applications. Robots are used in the industrial and manufacturing area, and these robots are made more powerful with reinforcement learning. Different industries have their vision of building intelligent robots using AI and machine learning technology.
Text Mining
Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning by the Salesforce company.
Advantages
It helps in solving complex real-world problems which are difficult to solve by general techniques.
The learning model of RL is similar to the learning of human beings; hence highly accurate results can be found.
It helps in achieving long-term results.
Disadvantage
RL algorithms are not preferred for simple problems.
In the classification algorithm, a discrete output function (y) is mapped to an input variable (x):
y = f(x), where y = categorical output
The best example of an ML classification algorithm is the Email Spam Detector.
The main goal of the classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for categorical data.
Classification algorithms can be better understood using the below diagram. In the diagram, there are two classes, Class A and Class B. These classes have features that are similar to each other and dissimilar to other classes.
Fig. 3.1. Classification algorithms (two classes, Class A and Class B, separated in feature space)
The algorithm which implements the classification on a dataset is known as a classifier. There are two types of classifications:
Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier.
Some popular classification algorithms are given below:
Naïve Bayes
Decision Tree Classification
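A hedged sketch of the two classifiers listed above, assuming scikit-learn and its bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)                       # learn y = f(x) on labelled data
    print(type(model).__name__, model.score(X_test, y_test))   # accuracy on unseen data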
Probability is the bedrock of ML, which tells how likely an event is to occur. The value of probability always lies between 0 and 1. It is the core concept as well as a primary prerequisite to understanding ML models and their applications.
Probability can be calculated by dividing the number of times the event occurs by the total number of possible outcomes. Suppose we toss a coin; then the probability of getting a head as a possible outcome can be calculated with the below formula:
P(H) = Number of ways a head can occur / Total number of possible outcomes
P(H) = 1/2
P(H) = 0.5
Where: P(H) = probability of getting a head as the outcome while tossing a coin.
Probability can be calculated as:
(i) Empirical Probability: Empirical probability can be calculated as the number of times the event occurs divided by the total number of incidents observed.
(ii) Theoretical Probability: Theoretical probability can be calculated as the number of ways the particular event can occur divided by the total number of possible outcomes.
(iii) Joint Probability: It tells the probability of two random events occurring simultaneously. For independent events A and B:
P(A ∩ B) = P(A) . P(B)
Where: P(A ∩ B) = probability of events A and B both occurring,
P(A) = probability of event A,
P(B) = probability of event B.
(iv) Conditional Probability: It is given by the probability of event A given that event B occurred.
The probability of an event A conditioned on an event B is denoted and defined as:
P(A|B) = P(A ∩ B) / P(B)
Similarly, P(B|A) = P(A ∩ B) / P(A).
We can write the joint probability of A and B as P(A ∩ B) = P(A) . P(B|A), which means: "the chance of both things happening is the chance that the first one happens, times the chance that the second one happens given that the first one has happened."
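As a quick check of these formulas, here is a small simulation sketch in plain Python; the dice events chosen for the conditional-probability part are illustrative assumptions:

import random

random.seed(0)
n = 100_000
tosses = [random.choice("HT") for _ in range(n)]

p_head_empirical = tosses.count("H") / n   # times event occurred / trials observed
print(p_head_empirical)                    # close to the theoretical 1/2 = 0.5

# Conditional probability on two dice: P(A|B) = P(A and B) / P(B)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]
b = [r for r in rolls if r[0] == 6]              # event B: first die shows 6
a_and_b = [r for r in b if r[0] + r[1] >= 10]    # event A: total is at least 10
print(len(a_and_b) / len(b))                     # ~ P(A|B) = 3/6 = 0.5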
Basic mathematics principles and concepts like Linear algebra are the foundation
of Machine Learning and Deep Learning systems. To learn and understand Machine
Learning or Data Science, one needs to be familiar with linear algebra and
optimization theory. In this topic, we will explain all the linear algebra concepts
required for machine learning.
Below are some benefits of learning Linear Algebra before Machine learning:
Improved Statistics:
Statistics is an important concept to organize and integrate data in Machine Learning. Linear algebra also helps to understand the concepts of statistics in a better manner. Advanced statistical topics can be integrated using the methods, operations, and notations of linear algebra.
(v) Regularization
In machine learning, we usually look for the simplest possible model to achieve the best outcome for the specific problem. Simpler models generalize well, from specific examples to unknown datasets. These simpler models are often considered models with smaller coefficient values.
A technique used to minimize the size of the coefficients of a model while it is being fit on data is known as regularization. Such techniques measure the magnitude or length of the coefficients as a vector and are methods lifted directly from linear algebra, called the vector norm.
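A minimal sketch of the vector norms mentioned above, assuming NumPy; the coefficient values are made up for illustration:

import numpy as np

coef = np.array([3.0, -4.0, 0.5])    # model coefficients viewed as a vector
l1 = np.linalg.norm(coef, ord=1)     # L1 norm: sum of absolute values
l2 = np.linalg.norm(coef)            # L2 norm: Euclidean length
print(l1, l2)                        # 7.5 and ~5.02; L1/L2 regularization penalize these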
(vi) Principal Component Analysis
Generally, each dataset contains thousands of features, and fitting the model with such a large dataset is one of the most challenging tasks of machine learning. Moreover, a model built with irrelevant features is less accurate than a model built with relevant features. There are several methods in machine learning that automatically reduce the number of columns of a dataset, and these methods are known as dimensionality reduction. The most commonly used dimensionality reduction method in machine learning is Principal Component Analysis, or PCA. This technique makes projections of high-dimensional data for both visualization and training models. PCA uses the matrix factorization method from linear algebra. A related matrix factorization method is the Singular-Value Decomposition, SVD for short, which is also known by the name Latent Semantic Indexing or LSI.
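A hedged PCA sketch, assuming scikit-learn and its bundled digits dataset:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per sample
pca = PCA(n_components=2)             # keep the 2 directions of highest variance
X_2d = pca.fit_transform(X)           # matrix-factorization-based projection

print(X.shape, "->", X_2d.shape)      # (1797, 64) -> (1797, 2)
print(pca.explained_variance_ratio_)  # variance retained by each component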
3.6.1. HYPOTHESIS
The hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is just a guess based on some known facts, but it has not yet been proven. A good hypothesis is testable and results in either true or false.
Example: a scientist claims that UV rays are harmful to the eyes, and we assume that they may also cause blindness. However, it may or may not be true. Hence, these types of assumptions are called a hypothesis.
Fig. 3.2. The unknown target function f: X -> Y, the training examples (x1, y1) ... (xn, yn), the learning algorithm (A) searching the hypothesis space (H), and the final hypothesis.
Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised machine learning algorithms. It is primarily based on the data as well as the bias and restrictions applied to the data. Hence, the hypothesis (h) can be concluded as a single hypothesis that maps input to proper output and can be evaluated as well as used to make predictions.
The hypothesis (h) can be formulated in machine learning as follows:
y = mx + b
Where: y = range, m = slope of the line (change in y divided by change in x), x = domain, b = intercept (constant).
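A minimal sketch of fitting this hypothesis to data, assuming NumPy; the sample points are invented for illustration:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m, b = np.polyfit(x, y, deg=1)       # least-squares slope and intercept
h = lambda x_new: m * x_new + b      # the learned hypothesis h from space H
print(m, b, h(6.0))                  # use h to predict an unseen input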
Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data as follows:
Fig. 3.3. (distribution of the data on the coordinate plane)
Fig. 3.4. (the data points, marked + and ?)
If we divide this coordinate plane in such a way that it can help you to predict the output or result, we get the following:
Fig. 3.5. (the plane divided by a candidate decision boundary)
Based on the given test data, the output result will be as follows:
Fig. 3.6. (the predicted labels for the test points under the chosen boundary)
However, based on the data, algorithm, and constraints, this coordinate plane can also be divided in the following ways:
Fig. 3.7. (alternative divisions of the plane; each possible hypothesis is a member of the hypothesis space)
The null hypothesis claims that there is no statistically significant effect in the given set of observations.
Significance level
The significance level is the primary thing that must be set before starting an experiment. It is useful to define the tolerance of error and the level at which an effect can be considered significant. During the testing process in an experiment, a 95% significance level is commonly used, and the remaining 5% can be neglected. The significance level also determines the critical or threshold value. For example, in an experiment, if the significance level is set to 98%, then the critical value is 0.02.
P-value
The p-value in statistics is defined as the evidence against a null hypothesis. In other words, the p-value is the probability that random chance generated the data, or something else that is equal or rarer, under the null hypothesis condition.
If the p-value is smaller, the evidence is stronger, and vice versa, which means the null hypothesis can be rejected in testing. It is always represented in decimal form, such as 0.035.
Whenever a statistical test is carried out on the population and sample to find the p-value, it is compared against the critical value. If the p-value is less than the critical value, it shows the effect is significant, and the null hypothesis can be rejected. Further, if it is higher than the critical value, it shows that there is no significant effect, and hence we fail to reject the null hypothesis.
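A hedged sketch of obtaining a p-value, assuming SciPy; the sample values and the population mean of 5.0 are illustrative assumptions:

from scipy import stats

sample = [5.1, 4.9, 5.3, 5.6, 4.8, 5.2, 5.4, 5.0]
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)   # one-sample t-test

alpha = 0.05                 # critical value from a 95% significance level
print(p_value)
if p_value < alpha:
    print("Reject the null hypothesis: the effect is significant.")
else:
    print("Fail to reject the null hypothesis.")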
3.7. EVALUATION
(i) Confusion Matrix:
The confusion matrix provides us a matrix/table as output and describes the performance of the model.
It is also known as the error matrix.
The matrix consists of the prediction results in a summarized form, which has the total number of correct predictions and incorrect predictions. The matrix looks like the below table:
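A minimal confusion-matrix sketch, assuming scikit-learn; the actual and predicted label vectors are hand-made so the table is easy to read:

from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_actual, y_predicted))
# Rows are actual classes, columns are predicted classes:
# [[4 1]     4 true negatives, 1 false positive
#  [1 4]]    1 false negative, 4 true positives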
Machine Learning is one of the booming technologies across the world that enables computers/machines to turn a huge amount of data into predictions. However, these predictions highly depend on the quality of the data, and if we are not using the right data for our model, then it will not generate the expected result. In machine learning projects, we generally divide the original dataset into training data and test data. We train our model over a subset of the original dataset, i.e., the training dataset, and then evaluate whether it can generalize well to the new or unseen dataset, i.e., the test set. Therefore, train and test datasets are the two key concepts of machine learning, where the training dataset is used to fit the model, and the test dataset is used to evaluate the model.
The type of training data that we provide to the model is highly responsible for the model's accuracy and prediction ability. It means that the better the quality of the training data, the better the performance of the model will be. Training data is approximately more than or equal to 60% of the total data for an ML project.
We can understand it as follows: we train our model with the training set and then test it with the test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
Explanation:
In the first line of the above code, we have imported the train_test_split function from the sklearn library.
In the second line, we have used four variables, which are:
x_train: It is used to represent the features for the training data.
x_test: It is used to represent the features for the testing data.
y_train: It is used to represent the dependent variable for the training data.
y_test: It is used to represent the dependent variable for the testing data.
In the train_test_split() function, we have passed four parameters. The first two are for the arrays of data, and test_size is for specifying the size of the test set. The test_size may be .5, .3, or .2, which tells the dividing ratio of the training and testing sets.
The last parameter, random_state, is used to set a seed for a random generator so that you always get the same result, and the most used value for this is 42.
The main difference between training data and testing data is that training
data is the subset of original data that is used to train the machine learning
model, whereas testing data is used to check the accuracy of the model.
The training dataset is generally larger in size compared to the testing dataset. The general ratios for splitting train and test datasets are 80:20, 70:30, or 90:10.
Training data is well known to the model as it is used to train the model,
whereas testing data is like unseen/new data to the model.
3.8.5. WORKING OF TRAINING AND TESTING DATA IN MACHINE LEARNING
Machine Learning algorithms enable the machines to make predictions and solve problems on the basis of past observations or experiences. These experiences or observations an algorithm can take from the training data, which is fed to it. Further, one of the great things about ML algorithms is that they can learn and improve over time on their own, as they are trained with the relevant training data. Once the model is trained enough with the relevant training data, it is tested with the test data. The whole process of training and testing is shown in the below flowchart:
Fig. 3.10. Train and test split flowchart (train on the training data, then test; a correct output means the model is ready, while an incorrect output sends it back for further training)
Since the model's predictions depend on the data on which it is trained, it is important to train the model with quality data. Further, ML works on the concept of "Garbage In, Garbage Out." It means that whatever type of data we input into our model, it will make the predictions accordingly. For quality training data, the below points should be considered:
(i) Relevant:
The very first quality of training data is that it should be relevant to the problem that you are going to solve. It means that whatever data you are using should be relevant to the current problem. For example, if you are building a model to analyze social media data, then the data should be taken from different social sites such as Twitter, Facebook, Instagram, etc.
(ii) Uniform:
There should always be uniformity among the features of a dataset. It means all data for a particular problem should be taken from the same source with the same attributes.
(iii) Consistent:
In the dataset, similar attributes must always correspond to the similar label in order to ensure uniformity in the dataset.
(iv) Comprehensive:
The training data must be large enough to represent sufficient features that you need to train the model in a better way. With a comprehensive dataset, the model will be able to learn all the edge cases.
3.9. CROSS-VALIDATION
In cross-validation, the dataset is divided into folds. On the 1st iteration, the first fold is reserved for testing the model, and the rest are used to train the model. On the 2nd iteration, the second fold is used to test the model, and the rest are used to train the model. This process continues until each fold has been used for testing.
Fig. (k-fold split: in each iteration one fold serves as the testing set and the remaining folds form the training set)
Test Data: The test data is used to make predictions from the model that is already trained on the training data. It has the same features as the training data but is not part of it.
Cross-Validation dataset: It is used to overcome the disadvantage of the train/test split by splitting the dataset into groups of train/test splits and averaging the result. It can be used if we want to optimize our model that has been trained on the training dataset for the best performance. It is more efficient compared to the train/test split, as every observation is used for both training and testing.
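A hedged k-fold cross-validation sketch, assuming scikit-learn and its bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)          # one accuracy score per fold
print(scores.mean())   # averaged result, as described above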
There are some limitations of the cross-validation technique, which are given below:
For ideal conditions, it provides the optimum output. But for inconsistent data, it may produce a drastic result. This is one of the big disadvantages of cross-validation, as there is no certainty about the type of data in machine learning.
In predictive modeling, the data evolves over a period of time, due to which it may face differences between the training set and validation sets. For example, if we create a model for the prediction of stock market values and the data is trained on the previous 5 years' stock values, the realistic future values for the next 5 years may be drastically different, so it is difficult to expect the correct output for such situations.
3.9.4. APPLICATIONS OF CROSS-VALIDATION
3.10. OVERFITTING AND UNDERFITTING IN MACHINE LEARNING
Overfitting and underfitting are the two main problems that occur in machine learning and degrade the performance of machine learning models.
3.10.1. OVERFITTING
Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts caching the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model. The overfitted model has low bias and high variance.
The chances of overfitting increase the more training we give to our model. It means the more we train our model, the higher the chances of producing an overfitted model.
How to avoid overfitting:
Removing features
Regularization
Ensembling
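As a rough sketch of the regularization remedy, assuming scikit-learn: an L2 (ridge) penalty shrinks the coefficients that a plain linear model would otherwise inflate to chase noise. The synthetic data and alpha value are assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 15))          # few samples, many features
y = X[:, 0] + rng.normal(0, 0.5, 30)   # only one feature truly matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)     # larger alpha = stronger penalty

print(np.abs(plain.coef_).sum())       # large, noise-chasing coefficients
print(np.abs(ridge.coef_).sum())       # visibly smaller after shrinkage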
3.10.2. UNDERFITTING
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting in the model, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence it reduces the accuracy and produces unreliable predictions. An underfitted model has high bias and low variance.
Example: We can understand underfitting using the below output of the linear regression model. As we can see from the diagram, the model is unable to capture the data points present in the plot.
How to avoid underfitting:
By increasing the training time of the model.
By increasing the number of features.
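A hedged sketch of the second remedy, assuming scikit-learn and synthetic data: a straight line underfits a quadratic trend, while an added polynomial feature captures it:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.5, 50)    # quadratic trend + noise

line = LinearRegression().fit(x, y)            # underfits: high bias
poly = PolynomialFeatures(degree=2).fit_transform(x)
curve = LinearRegression().fit(poly, y)        # extra feature x^2 fixes the fit

print(line.score(x, y), curve.score(poly, y))  # R^2 jumps once features suffice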
Goodness of Fit
The "goodness of fit" term is taken from statistics, and the goal of the machine learning model is to achieve a good fit.
Fig. (training and validation error plotted against model complexity)
(i) Reducible errors: These errors can be reduced to improve the model accuracy. Such errors can further be classified into bias and variance.
(ii) Irreducible errors: These errors will always be present in the model regardless of which algorithm has been used. The cause of these errors is unknown variables whose values can't be reduced.
3.11.2. BIAS
In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction. While making predictions, a difference occurs between the prediction values made by the model and the actual/expected values, and this difference is known as bias error, or error due to bias. It can be defined as the inability of machine learning algorithms such as linear regression to capture the true relationship between the data points. Each algorithm begins with some amount of bias, because bias comes from assumptions in the model that make the target function simpler to learn. A model has either:
(i) Low Bias: A low-bias model will make fewer assumptions about the form of the target function.
(ii) High Bias: A model with high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. A high-bias model also cannot perform well on new data.
Generally, a linear algorithm has high bias, as this is what makes it learn fast. The simpler the algorithm, the more bias it is likely to introduce, whereas a nonlinear algorithm often has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines. At the same time, algorithms with high bias are Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Since, with high variance, the model learns too much from the dataset, it leads to overfitting of the model. A model with high variance has the below problems:
Fig. (combinations of low/high bias and low/high variance, and training error plotted against the number of training instances)
Bias-Variance Trade-Off
While building the machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting in the model. If the model is very simple with fewer parameters, it may have low variance and high bias. Whereas, if the model has a large number of parameters, it will have high variance and low bias. So, it is required to strike a balance between bias and variance errors so that the model generalizes well to the unseen dataset. Unfortunately, doing both simultaneously is not possible, because a high-variance algorithm may perform well with training data but may overfit to noisy data, whereas a high-bias algorithm generates a much simpler model that may not even capture the important regularities in the data. So, we need to find a sweet spot between bias and variance to make an optimal model.
Fig. 3.18. Bias-variance trade-off (total error vs. model complexity, with the optimal complexity where the bias and variance curves cross)
Hence, the bias-variance trade-off is about finding the sweet spot that balances bias and variance errors.
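A minimal sketch of the trade-off, assuming scikit-learn and synthetic data: as the polynomial degree grows, the training fit keeps improving while the validation fit typically peaks at a moderate complexity:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 60)
x_tr, x_va, y_tr, y_va = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 3, 12):                          # simple, balanced, complex
    pf = PolynomialFeatures(degree)
    model = LinearRegression().fit(pf.fit_transform(x_tr), y_tr)
    print(degree,
          model.score(pf.transform(x_tr), y_tr),   # training fit rises with degree
          model.score(pf.transform(x_va), y_va))   # validation fit peaks mid-way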
3.12. REGRESSION
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A, which does various advertisements every year and gets sales from them. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:
Advertisement | Sales
$90  | $1000
$120 | $1300
$150 | $1800
$100 | $1200
$130 | $1380
$200 | ??
Fig. 3.19. Advertisement vs. sales example
Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction for the sales in this year. To solve such prediction problems in machine learning, we need regression analysis.
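Using exactly the five advertisement/sales pairs from the table above, a minimal scikit-learn sketch of this prediction would be:

import numpy as np
from sklearn.linear_model import LinearRegression

ads   = np.array([[90], [120], [150], [100], [130]])   # advertisement spend
sales = np.array([1000, 1300, 1800, 1200, 1380])       # corresponding sales

model = LinearRegression().fit(ads, sales)
print(model.predict(np.array([[200]])))   # estimated sales for a $200 advertisement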
It estimates the relationship between the target and the independent variable.
It is used to find the trends in data.
It helps to predict real/continuous values.
By performing regression, we can confidently determine the most important factor, the least important factor, and how each factor affects the other factors.
Some important types of regression are given below:
Linear Regression
Logistic Regression
Polynomial Regression
Fig. (simple linear regression example: salary plotted against years of experience, with a fitted straight line)
y = a + bX
Where: y = dependent variable (target variable), X = independent variable (predictor variable), and a and b are the linear coefficients.
Some popular applications of linear regression are:
Analyzing trends and sales estimates
Salary forecasting
Real estate prediction
Arriving at ETAs in traffic
Logistic regression predicts categorical outputs in terms of probability.
Logistic regression is a type of regression, but it is different from the linear regression algorithm in terms of how it is used.
Logistic regression uses the sigmoid function, or logistic function, which is a complex cost function. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
Where: f(x) = output between 0 and 1, x = input to the function, and e = base of the natural logarithm.
The curve of the sigmoid function is as follows:
Fig. (the sigmoid function sig(x) = 1 / (1 + e^(-x)): an S-shaped curve rising from 0 toward 1 as x increases)
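A hedged sketch of the sigmoid above and of logistic regression on a toy binary problem, assuming NumPy and scikit-learn:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # f(x) = 1 / (1 + e^(-x)), output in (0, 1)

print(sigmoid(np.array([-6.0, 0.0, 6.0])))   # ~0.0025, 0.5, ~0.9975

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)
print(clf.predict_proba(X[:2]))   # class probabilities produced via the sigmoid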
Semi-supervised Learning
Reinforcement Learning
Transduction
Learning to Learn
7. What are the three stages to build the hypotheses or model in machine learning?
Model building
Model testing
Applying the model
REVIEW QUESTIONS