Machine Learning
Introduction
Syllabus:
Introduction to Machine Learning, Examples of Machine Learning Applications, Learning Types, Supervised Learning - Learning a Class from Examples, Vapnik-Chervonenkis (VC) Dimension, Probably Approximately Correct (PAC) Learning, Noise, Learning Multiple Classes, Regression, Model Selection and Generalization, Dimensions of a Supervised Machine Learning Algorithm
Dimensionality Reduction - Introduction, Subset Selection, Principal Components Analysis, Factor Analysis, Multidimensional Scaling, Linear Discriminant Analysis, Isomap, Locally Linear Embedding
Introduction: What is Machine Learning?
Machine learning is an umbrella term for a set of techniques and tools that help computers learn and adapt on their own. Machine learning algorithms help AI systems learn without being explicitly programmed to perform the desired action. By learning a pattern from sample inputs, a machine learning algorithm predicts and performs tasks based solely on the learned pattern rather than on predefined program instructions. Machine learning is especially valuable in cases where writing strict, rule-based algorithms is not possible: the system learns the required process from previous patterns and applies that knowledge to new situations.
One of the machine learning applications we are familiar with is the way our email providers
help us deal with spam. Spam filters use an algorithm to identify and move incoming junk
email to your spam folder. Several e-commerce companies also use machine learning
algorithms in conjunction with other IT security tools to prevent fraud and improve their
recommendation engine performance.
Popular Machine Learning Applications and Examples
1. Social Media Features
Social media platforms use machine learning algorithms and approaches to create attractive and useful features. For instance, Facebook notices and records your activities, chats, likes, comments, and the time you spend on specific kinds of posts. Machine learning learns from this behaviour and makes friend and page suggestions for your profile.
2. Product Recommendations
Product recommendation is one of the most popular and well-known applications of machine learning. It is a standout feature of almost every e-commerce website today and an advanced application of machine learning techniques.
Using machine learning and AI, websites track your behaviour based on your previous purchases, search patterns, and cart history, and then make product recommendations.
3. Image Recognition
Image recognition, an approach for cataloging and detecting a feature or an object in a digital image, is one of the most significant and notable machine learning and AI techniques. It is used for further analysis, such as pattern recognition, face detection, and face recognition.
4. Sentiment Analysis
Sentiment analysis is one of the most important applications of machine learning. It is a real-time machine learning application that determines the emotion or opinion of the speaker or writer. For instance, if someone has written a review or an email (or any form of document), a sentiment analyzer will instantly determine the actual sentiment and tone of the text. Sentiment analysis can be used in review-based websites, decision-making applications, and more.
5. Automating Employee Access Control
Organizations are actively implementing machine learning algorithms to determine the level
of access employees would need in various areas, depending on their job profiles. This is one
of the coolest applications of machine learning.
6. Marine Wildlife Preservation
Machine learning algorithms are used to develop behaviour models for endangered cetaceans and other marine species, helping scientists monitor and manage their populations.
7. Regulating Healthcare Efficiency and Medical Services
Major healthcare providers are actively exploring machine learning algorithms to manage operations more effectively, for example by predicting patient waiting times in emergency rooms across various hospital departments. The models use key factors such as staffing details at various times of day, patient records, complete logs of department communications, and the layout of the emergency rooms. Machine learning algorithms also play a role in detecting a disease, therapy planning, and predicting how a disease will progress. This is one of the most important machine learning applications.
8. Predict Potential Heart Failure
An algorithm designed to scan a doctor’s free-form e-notes and identify patterns in a patient’s cardiovascular history is making waves in medicine. Instead of a physician digging through multiple health records to arrive at a sound diagnosis, the computer performs the analysis based on the available information, reducing redundant effort.
9. Banking Domain
Banks are now using the latest machine learning technology to help prevent fraud and protect accounts from hackers. The algorithms determine which factors to consider when creating a filter to keep harm at bay. Unauthentic sites are automatically filtered out and blocked from initiating transactions.
10. Language Translation
One of the most common machine learning applications is language translation. Machine
learning plays a significant role in the translation of one language to another. We are amazed
at how websites can translate from one language to another effortlessly and give contextual
meaning as well. The technology behind the translation tool is called ‘machine translation.’ It
has enabled people to interact with others from all around the world; without it, life would
not be as easy as it is now. It has provided confidence to travelers and business associates to
safely venture into foreign lands with the conviction that language will no longer be a barrier.
Types of Machine Learning:
Machine learning is a subset of AI, which enables the machine to automatically learn from
data, improve performance from past experiences, and make predictions. Machine learning
contains a set of algorithms that work on a huge amount of data. Data is fed to these
algorithms to train them, and on the basis of training, they build the model & perform a
specific task.
These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Associations, etc.
Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machines using a "labelled" dataset, and based on that training, the machine predicts the output. Here, the labelled data specifies that some of the inputs are already mapped to the outputs. More precisely: first, we train the machine with the input and corresponding output, and then we ask the machine to predict the output for a test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to understand the images: the shape and size of the tail of a cat and a dog, the shape of the eyes, colour, height (dogs are taller, cats are smaller), and so on. After training is complete, we input a picture of a cat and ask the machine to identify the object and predict the output. The machine is now well trained, so it will check all the features of the object, such as height, shape, colour, eyes, ears, and tail, and conclude that it is a cat. So, it puts it in the Cat category. This is how the machine identifies objects in supervised learning.
The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y). Some real-world applications of supervised learning are Risk
Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. Classification algorithms predict the categories present in the dataset. Some real-world examples of classification are spam detection, email filtering, etc. A minimal classification sketch follows the algorithm list below.
Some popular classification algorithms are given below:
o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
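As a rough illustration (not taken from the original text), the sketch below shows how a labelled dataset can be used to train and evaluate a classifier with scikit-learn. The dataset, the choice of logistic regression, and the parameter values are assumptions made only for demonstration.

# Minimal supervised classification sketch (illustrative assumptions only).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labelled dataset: each row of X is an input, each entry of y is a class label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train on the labelled examples, then predict labels for unseen test data.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))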
b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is a continuous quantity and the goal is to model the relationship between the input and output variables. They are used to predict continuous outputs, such as market trends, weather, prices, etc. A minimal regression sketch follows the algorithm list below.
Some popular Regression algorithms are given below:
o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
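A minimal regression sketch on synthetic data follows; the made-up data-generating rule (slope 3, intercept 2, Gaussian noise) and the use of plain linear regression are assumptions chosen only to show the idea of predicting a continuous output.

# Minimal regression sketch on synthetic data (illustrative assumptions only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # single input feature
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 100)    # continuous target with noise

model = LinearRegression().fit(X, y)
print("Learned slope:", model.coef_[0], "intercept:", model.intercept_)
print("Prediction at x=5:", model.predict([[5.0]])[0])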
Advantages and Disadvantages of Supervised Learning
Advantages:
o Since supervised learning works with a labelled dataset, we have an exact idea about the classes of objects.
o These algorithms are helpful in predicting outputs on the basis of prior experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o They may predict the wrong output if the test data differs substantially from the training data.
o Training the algorithm requires a lot of computational time.
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:
o Image Segmentation:
Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes, using medical images and past data labelled with disease conditions. With such a process, the machine can identify a disease for new patients.
o Fraud Detection - Supervised learning classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc. This is done by using historical data to identify patterns that can indicate possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used.
These algorithms classify an email as spam or not spam. The spam emails are sent to
the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications can
be done using the same, such as voice-activated passwords, voice commands, etc.
2. Unsupervised Machine Learning
Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning, the
machine is trained using the unlabeled dataset, and the machine predicts the output without
any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
Let's take an example to understand this more precisely: suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data. It is
a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups. An
example of the clustering algorithm is grouping the customers by their purchasing behaviour.
Some popular clustering and related unsupervised learning algorithms are given below:
o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis
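To make the clustering idea concrete, here is a minimal k-means sketch; the synthetic blob data, the choice of three clusters, and the random seeds are assumptions used purely for illustration.

# Minimal clustering sketch with k-means (illustrative assumptions only).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unlabelled data: no target values are given to the algorithm.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster sizes:", [sum(kmeans.labels_ == k) for k in range(3)])
print("Cluster centres:\n", kmeans.cluster_centers_)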
2) Association
Association rule learning is an unsupervised learning technique, which finds interesting
relations among variables within a large dataset. The main aim of this learning algorithm is to
find the dependency of one data item on another data item and map those variables
accordingly so that it can generate maximum profit. This algorithm is mainly applied
in Market Basket analysis, Web usage mining, continuous production, etc.
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-
growth algorithm.
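The core quantities behind association rule learning, support and confidence, can be computed directly. The toy transactions below are made-up examples used only to show these two measures; this is not a full Apriori or FP-growth implementation.

# Toy support/confidence computation for one association rule (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule {bread} -> {milk}: confidence = support(bread and milk) / support(bread).
antecedent, consequent = {"bread"}, {"milk"}
confidence = support(antecedent | consequent) / support(antecedent)
print("support:", support(antecedent | consequent), "confidence:", confidence)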
Advantages and Disadvantages of Unsupervised Learning Algorithm
Advantages:
o These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabeled datasets.
o Unsupervised algorithms are preferable for many tasks, as obtaining an unlabeled dataset is easier than obtaining a labelled one.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm is not trained on the exact expected output in advance.
o Working with unsupervised learning is more difficult, as it uses unlabelled data that is not mapped to any output.
Applications of Unsupervised Learning
o Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright infringement through document network analysis of text data in scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to
discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition, or SVD, is used to extract particular information from a database, for example, extracting information about users located in a particular region.
3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning. It represents the intermediate ground
between Supervised (With Labelled training data) and Unsupervised learning (with no
labelled training data) algorithms and uses the combination of labelled and unlabeled datasets
during the training period.
Although semi-supervised learning is the middle ground between supervised and unsupervised learning and operates on data that contains a few labels, most of the data is unlabeled. Labels are costly to obtain, so in practice an organization may have only a small number of them. Semi-supervised learning differs from both supervised and unsupervised learning, which are defined by the presence or absence of labels.
The concept of semi-supervised learning was introduced to overcome the drawbacks of supervised and unsupervised learning algorithms. The main aim of semi-supervised learning is to make effective use of all the available data, rather than only the labelled data as in supervised learning. Initially, similar data is clustered using an unsupervised learning algorithm, and the clusters then help to assign labels to the unlabeled data. This is useful because labelled data is considerably more expensive to acquire than unlabeled data.
We can illustrate these algorithms with an example. Supervised learning is like a student who is under the supervision of an instructor at home and at college. If the student instead analyses the same concept on their own, without any help from an instructor, that corresponds to unsupervised learning. Under semi-supervised learning, the student first studies the concept under the guidance of an instructor at college and then revises it on their own.
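As a rough sketch of the "few labels, many unlabeled examples" setting, the example below hides roughly 90% of the iris labels (marking them as -1, which scikit-learn's label propagation treats as unlabeled) and lets the algorithm infer them. The dataset, the 10% labelling rate, and the choice of LabelPropagation are assumptions for illustration only.

# Minimal semi-supervised sketch with label propagation (illustrative assumptions only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Pretend labels are costly: keep only ~10% of them, mark the rest as unlabeled (-1).
y_partial = np.where(rng.random(len(y)) < 0.1, y, -1)

model = LabelPropagation().fit(X, y_partial)
print("Accuracy on the full, originally labelled data:", model.score(X, y))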
Advantages and disadvantages of Semi-supervised Learning
Advantages:
o It is simple and easy to understand the algorithm.
o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial: taking actions, learning from experience, and improving its performance. The agent gets rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents learn
from their experiences only.
The reinforcement learning process is similar to how a human being learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the agent's moves at each step define the states, and the goal of the agent is to get a high score. The agent receives feedback in terms of rewards and punishments.
Due to its way of working, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.
A reinforcement learning problem can be formalized using Markov Decision
Process(MDP). In MDP, the agent constantly interacts with the environment and performs
actions; at each action, the environment responds and generates a new state.
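To make the agent-environment loop concrete, here is a tiny tabular Q-learning sketch on a made-up one-dimensional corridor: the agent moves left or right and is rewarded only on reaching the rightmost cell. The environment, the reward of 1.0, and the hyperparameters are assumptions chosen purely to illustrate the reward-driven update.

# Tiny tabular Q-learning sketch on a hypothetical 5-cell corridor (illustrative only).
import random

n_states, moves = 5, [-1, +1]          # actions: move left or move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(n_states)]

def pick_action(qvals):
    """Epsilon-greedy choice; ties between equal Q-values are broken at random."""
    if random.random() < epsilon:
        return random.randrange(2)
    best = max(qvals)
    return random.choice([i for i, q in enumerate(qvals) if q == best])

for episode in range(200):
    state = 0
    while state != n_states - 1:                       # episode ends at the goal cell
        a = pick_action(Q[state])
        next_state = min(max(state + moves[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: nudge Q towards reward + discounted best future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print("Learned Q-values per state:", [[round(q, 2) for q in row] for row in Q])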
Categories of Reinforcement Learning
Reinforcement learning is categorized mainly into two types of methods/algorithms:
o Positive Reinforcement Learning: Positive reinforcement learning specifies
increasing the tendency that the required behaviour would occur again by adding
something. It enhances the strength of the behaviour of the agent and positively
impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour
would occur again by avoiding the negative condition.
Real-world Use cases of Reinforcement Learning
o Video Games:
RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Some well-known systems that use RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed that
how to use RL in computer to automatically learn and schedule resources to wait for
different jobs in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial
and manufacturing area, and these robots are made more powerful with reinforcement
learning. There are different industries that have their vision of building intelligent
robots using AI and Machine learning technology.
o Text Mining
Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning by Salesforce.
Advantages and Disadvantages of Reinforcement Learning
Advantages
o It helps in solving complex real-world problems which are difficult to be solved by
general techniques.
o The learning model of RL is similar to how human beings learn; hence it can produce highly accurate results.
o It helps in achieving long-term results.
Disadvantage
o RL algorithms are not preferred for simple problems.
o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken
the results.
Vapnik-Chervonenkis Dimension
The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a hypothesis set to
fit different data sets. It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the
1970s and has become a fundamental concept in statistical learning theory. The VC
dimension is a measure of the complexity of a model, which can help us understand how well
it can fit different data sets.
The VC dimension of a hypothesis set H is the largest number of points that can be shattered
by H. A hypothesis set H shatters a set of points S if, for every possible labeling of the points
in S, there exists a hypothesis in H that correctly classifies the points. In other words, a
hypothesis set shatters a set of points if it can fit any possible labeling of those points.
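The shattering idea can be checked by brute force for a very small hypothesis class. The sketch below, using made-up points, tests whether 1-D threshold classifiers (predict 1 when x >= t) can realize every possible labeling of a given point set; such thresholds shatter any single point but no set of two points, so their VC dimension is 1.

# Brute-force shattering check for 1-D threshold classifiers (illustrative only).
from itertools import product

def shatters(points):
    """True if, for every labeling, some threshold t labels each point as 1 iff x >= t."""
    candidate_thresholds = [float("-inf")] + [x for x in points] + [x + 1e-9 for x in points]
    for labeling in product([0, 1], repeat=len(points)):
        ok = any(all((1 if x >= t else 0) == lbl for x, lbl in zip(points, labeling))
                 for t in candidate_thresholds)
        if not ok:
            return False
    return True

print(shatters([2.0]))        # True: a single point can be labeled either way
print(shatters([1.0, 3.0]))   # False: labeling (1, 0) is impossible, so the VC dimension is 1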
Bounds of VC – Dimension
The VC dimension provides both upper and lower bounds on the number of training examples required to achieve a given level of accuracy. Roughly speaking, the number of examples needed grows linearly with the VC dimension and only logarithmically with 1/δ, where δ is the allowed failure probability.
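For reference, a commonly quoted sufficient sample size for PAC-learning a hypothesis class H with VC dimension d, error at most ε, and failure probability at most δ is stated below (a standard result from the literature, quoted here without derivation; the exact constants vary between sources):

m \ge \frac{1}{\varepsilon}\left(4\log_2\frac{2}{\delta} + 8\,d\,\log_2\frac{13}{\varepsilon}\right)

This grows roughly linearly in the VC dimension d and only logarithmically in 1/δ.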
Applications of VC – Dimension
The VC dimension has a wide range of applications in machine learning and statistics. For
example, it is used to analyze the complexity of neural networks, support vector machines,
and decision trees. The VC dimension can also be used to design new learning algorithms
that are robust to noise and can generalize well to unseen data.
The VC dimension can be extended to more complex learning scenarios, such as multiclass
classification and regression. The concept of the VC dimension can also be applied to other
areas of computer science, such as computational geometry and graph theory.
Probably Approximately Correct:
PAC (Probably Approximately Correct) learning is a theoretical framework in machine
learning that deals with the learning of hypotheses from a given set of data. It provides
guarantees on the generalization performance of a learning algorithm by bounding the error
between the learned hypothesis and the true hypothesis.
In PAC learning, the goal is to find a hypothesis that performs well on unseen examples,
given a sample of labeled data for training. The framework assumes that the training data is
drawn independently and identically from an unknown probability distribution, and it aims to
find a hypothesis that has a low error rate on future unseen examples.
Key concepts in PAC learning include:
1. Sample Complexity: It refers to the number of training examples required to learn a
hypothesis within a specified level of confidence and accuracy.
2. PAC Guarantee: PAC learning guarantees that the learned hypothesis will have a low
generalization error with high probability, provided that the sample complexity is
sufficient.
3. Probably Approximately Correct: The "probably" part indicates that the guarantees
hold with high probability, while the "approximately correct" part implies that the
learned hypothesis may not perfectly match the true hypothesis but will be close
enough.
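As a small numerical illustration of sample complexity, the classic bound for a finite hypothesis class and a consistent learner, m >= (1/ε)(ln|H| + ln(1/δ)), can be evaluated directly. The values of ε, δ, and |H| below are made-up assumptions used only to show the calculation.

# Sample-complexity estimate for a finite hypothesis class (illustrative values only).
import math

epsilon = 0.05             # maximum tolerated generalization error
delta = 0.01               # maximum tolerated failure probability
hypothesis_count = 10**6   # assumed size |H| of the finite hypothesis class

m = math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)
print("Training examples sufficient for a consistent learner:", m)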
Applications of PAC learning in machine learning include:
1. Supervised Learning: PAC learning is particularly applicable to supervised learning
problems, where a model is trained on labeled examples to make predictions on new
unseen examples. It helps in understanding the generalization performance of learning
algorithms and determining the minimum sample size required for accurate
predictions.
2. Sample Selection: PAC learning guides the process of selecting representative
training samples, ensuring that the selected samples are informative and cover the
underlying distribution adequately.
3. Model Selection and Evaluation: PAC learning provides a theoretical framework for
selecting and evaluating models based on their generalization performance. It helps in
assessing the trade-off between model complexity, sample size, and expected
accuracy.
4. Active Learning: Active learning strategies use PAC learning principles to actively
query and select the most informative or uncertain instances from an unlabeled
dataset. This approach reduces the labeling effort by focusing on the samples that are
most beneficial for improving the model's performance.
5. Computational Learning Theory: PAC learning is an important component of
computational learning theory, which studies the theoretical foundations and limits of
machine learning algorithms. It provides insights into the feasibility and complexity
of learning tasks.
Overall, PAC learning provides a theoretical framework for understanding the sample
complexity, generalization performance, and guarantees in learning from data. It plays a
crucial role in shaping the design, evaluation, and analysis of machine learning algorithms.
Machine Learning Multiple Classes:
Machine learning is a field of study and is concerned with algorithms that learn from
examples.
Classification is a task that requires the use of machine learning algorithms that learn how to
assign a class label to examples from the problem domain. An easy to understand example is
classifying emails as “spam” or “not spam.”
There are many different types of classification tasks that you may encounter in machine
learning and specialized approaches to modeling that may be used for each. They are
1. Classification Predictive Modeling
2. Binary Classification
3. Multi-Class Classification
4. Multi-Label Classification
5. Imbalanced Classification
1. Classification Predictive Modeling:
In machine learning, classification refers to a predictive modeling problem where a class label is predicted for a given example of input data. Examples of classification problems include:
Given an example, classify if it is spam or not.
Given a handwritten character, classify it as one of the known characters.
Given recent user behavior, classify as churn or not.
2. Binary Classification:
Binary classification refers to those classification tasks that have two class labels.
Examples include:
Email spam detection (spam or not).
Churn prediction (churn or not).
Conversion prediction (buy or not).
Typically, binary classification tasks involve one class that is the normal state and another
class that is the abnormal state.
For example “not spam” is the normal state and “spam” is the abnormal state. Another
example is “cancer not detected” is the normal state of a task that involves a medical test and
“cancer detected” is the abnormal state.
Popular algorithms that can be used for binary classification include:
Logistic Regression
k-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
3. Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two class
labels.
Examples include:
Face classification.
Plant species classification.
Optical character recognition.
Unlike binary classification, multi-class classification does not have the notion of normal and
abnormal outcomes. Instead, examples are classified as belonging to one among a range of
known classes.
Popular algorithms that can be used for multi-class classification include:
k-Nearest Neighbors.
Decision Trees.
Naive Bayes.
Random Forest.
Gradient Boosting.
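The sketch below shows multi-class classification on the three-class iris dataset with one of the algorithms listed above; the dataset, the random forest model, and the parameter values are assumptions for illustration only.

# Minimal multi-class classification sketch on the 3-class iris dataset (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                  # y takes three class values: 0, 1, 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Multi-class test accuracy:", accuracy_score(y_test, clf.predict(X_test)))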
4. Multi-Label Classification
Multi-label classification refers to those classification tasks that have two or more class
labels, where one or more class labels may be predicted for each example.
Consider the example of photo classification, where a given photo may have multiple objects
in the scene and a model may predict the presence of multiple known objects in the photo,
such as “bicycle,” “apple,” “person,” etc.
This is unlike binary classification and multi-class classification, where a single class label is
predicted for each example.
It is common to model multi-label classification tasks with a model that predicts multiple outputs, with each output predicted as a Bernoulli probability distribution. This is essentially a model that makes multiple binary classification predictions for each example.
Classification algorithms used for binary or multi-class classification cannot be used directly
for multi-label classification. Specialized versions of standard classification algorithms can
be used, so-called multi-label versions of the algorithms, including:
Multi-label Decision Trees
Multi-label Random Forests
Multi-label Gradient Boosting
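The sketch below mirrors the "multiple Bernoulli outputs" idea by fitting one binary classifier per label; the synthetic multi-label dataset and the wrapped logistic regression model are assumptions used only for illustration, not one of the specialized multi-label algorithms named above.

# Minimal multi-label classification sketch (illustrative assumptions only).
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

# Each row of Y holds several 0/1 labels, e.g. which objects appear in a photo.
X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# One binary classifier per label, so several labels can be predicted for one example.
clf = MultiOutputClassifier(LogisticRegression(max_iter=2000)).fit(X_train, Y_train)
print("Predicted labels for one example:", clf.predict(X_test[:1]))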
5. Imbalanced Classification
Imbalanced classification refers to classification tasks where the number of examples in each
class is unequally distributed.
Typically, imbalanced classification tasks are binary classification tasks where the majority
of examples in the training dataset belong to the normal class and a minority of examples
belong to the abnormal class.
Examples include:
Fraud detection.
Outlier detection.
Medical diagnostic tests.
These problems are modeled as binary classification tasks, although they may require specialized techniques.
Specialized techniques may be used to change the composition of samples in the training
dataset by undersampling the majority class or oversampling the minority class.
Examples include:
Random Undersampling.
SMOTE Oversampling.
Specialized modeling algorithms may be used that pay more attention to the minority class
when fitting the model on the training dataset, such as cost-sensitive machine learning
algorithms.
Examples include:
Cost-sensitive Logistic Regression.
Cost-sensitive Decision Trees.
Cost-sensitive Support Vector Machines.
Finally, alternative performance metrics may be required as reporting the classification
accuracy may be misleading.
Examples include:
Precision.
Recall.
F-Measure.
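The sketch below ties the pieces above together: a synthetic dataset where about 95% of the examples belong to the majority class, a cost-sensitive option (class weighting) that up-weights the minority class, and precision/recall/F-measure instead of plain accuracy. All dataset and parameter choices are assumptions for illustration only.

# Minimal imbalanced classification sketch using class weighting (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Synthetic data where roughly 95% of examples belong to the majority class.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" is one cost-sensitive option that up-weights the minority class.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))  # precision, recall, F-measure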
What is Noise in Machine Learning?
Real-world data, which is used to feed data mining algorithms, is influenced by a number of factors. The existence of noise is a major one. It is an inevitable problem, but one that a data-driven organization must address.
Humans are prone to making mistakes when collecting data, and data collection instruments
may be unreliable, resulting in dataset errors. The errors are referred to as noise. Data noise in
machine learning can cause problems since the algorithm interprets the noise as a pattern and
can start generalizing from it.
A noisy dataset will wreak havoc on the entire analysis pipeline. Analysts and data scientists can measure noise as a signal-to-noise ratio.
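As a toy illustration of the signal-to-noise ratio, the sketch below compares the power of an assumed "true" sine-wave signal with that of added Gaussian noise; the signal, noise level, and decibel formula are assumptions chosen only to show the calculation.

# Toy signal-to-noise ratio computation for a noisy measurement (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 500))          # assumed "true" signal
noise = rng.normal(0, 0.3, signal.shape)                 # measurement noise
observed = signal + noise

snr_db = 10 * np.log10(np.var(signal) / np.var(noise))   # power ratio in decibels
print("Estimated SNR: %.1f dB" % snr_db)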
Supervised Learning Models: An Overview
Classification predictive modeling problems are different from regression predictive modeling problems: classification is the task of predicting a discrete class label, while regression is the task of predicting a continuous quantity. However, both share the same concept of using known variables to make predictions, and there is significant overlap between the two families of models, so the models for classification and regression are presented together here.
Some models can be used for both classification and regression with small modifications. These include K-nearest neighbors, decision trees, support vector machines, ensemble bagging/boosting methods, and ANNs (including deep neural networks). However, some models, such as linear regression and logistic regression, cannot (or cannot easily) be used for both problem types.
Machine Learning: As discussed above, machine learning is a field of study that allows computers to “learn” like humans without the need for explicit programming.
What is Predictive Modeling: Predictive modeling is a probabilistic process that allows us
to forecast outcomes, on the basis of some predictors. These predictors are basically features
that come into play when deciding the final result, i.e. the outcome of the model.
Dimensionality reduction is the process of reducing the number of features (or dimensions) in
a dataset while retaining as much information as possible. This can be done for a variety of
reasons, such as to reduce the complexity of a model, to improve the performance of a
learning algorithm, or to make it easier to visualize the data. There are several techniques for
dimensionality reduction, including principal component analysis (PCA), singular value
decomposition (SVD), and linear discriminant analysis (LDA). Each technique uses a
different method to project the data onto a lower-dimensional space while preserving
important information.
What is Dimensionality Reduction?
Dimensionality reduction is a technique used to reduce the number of features in a dataset
while retaining as much of the important information as possible. In other words, it is a
process of transforming high-dimensional data into a lower-dimensional space that still
preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of features or
variables. The curse of dimensionality is a common problem in machine learning, where the
performance of the model deteriorates as the number of features increases. This is because
the complexity of the model increases with the number of features, and it becomes more
difficult to find a good solution. In addition, high-dimensional data can also lead to
overfitting, where the model fits the training data too closely and does not generalize well to
new data.
Dimensionality reduction can help to mitigate these problems by reducing the complexity of
the model and improving its generalization performance. There are two main approaches to
dimensionality reduction: feature selection and feature extraction.
Feature Selection:
Feature selection involves selecting a subset of the original features that are most relevant to
the problem at hand. The goal is to reduce the dimensionality of the dataset while retaining
the most important features. There are several methods for feature selection, including filter
methods, wrapper methods, and embedded methods. Filter methods rank the features based
on their relevance to the target variable, wrapper methods use the model performance as the
criteria for selecting features, and embedded methods combine feature selection with the
model training process.
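As a rough illustration of a filter-style feature selection method, the sketch below ranks features by a univariate statistic and keeps only the top five; the dataset, the f_classif score function, and the value of k are assumptions for demonstration only.

# Minimal filter-style feature selection sketch with SelectKBest (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)             # 30 original features
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)

X_reduced = selector.transform(X)
print("Shape before/after selection:", X.shape, X_reduced.shape)
print("Indices of the selected features:", selector.get_support(indices=True))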
Feature Extraction:
Feature extraction involves creating new features by combining or transforming the original
features. The goal is to create a set of features that captures the essence of the original data in
a lower-dimensional space. There are several methods for feature extraction, including
principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed
stochastic neighbor embedding (t-SNE). PCA is a popular technique that projects the original
features onto a lower-dimensional space while preserving as much of the variance as
possible.
Why is Dimensionality Reduction important in Machine Learning and Predictive
Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple e-mail
classification problem, where we need to classify whether the e-mail is spam or not. This can
involve a large number of features, such as whether or not the e-mail has a generic title, the
content of the e-mail, whether the e-mail uses a template, etc. However, some of these
features may overlap. In another condition, a classification problem that relies on both
humidity and rainfall can be collapsed into just one underlying feature, since both of the
aforementioned are correlated to a high degree. Hence, we can reduce the number of features
in such problems. A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-dimensional plane, and a 1-D problem to a simple line. For instance, a 3-D feature space can first be split into two 2-D feature spaces, and later, if the features are found to be correlated, the number of features can be reduced even further.
Components of Dimensionality Reduction
There are two components of dimensionality reduction:
Feature selection: In this, we try to find a subset of the original set of variables, or
features, to get a smaller subset which can be used to model the problem. It usually
involves three ways:
1. Filter
2. Wrapper
3. Embedded
Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be both linear and non-linear, depending upon the method
used. The prime linear method, called Principal Component Analysis, or PCA, is discussed
below.
Principal Component Analysis
This method was introduced by Karl Pearson. It works on the condition that while the data in
a higher dimensional space is mapped to data in a lower dimension space, the variance of the
data in the lower dimensional space should be maximum.
It involves the following steps:
Construct the covariance matrix of the data.
Compute the eigenvectors of this matrix.
Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large
fraction of variance of the original data.
Hence, we are left with a lesser number of eigenvectors, and there might have been some data
loss in the process. But, the most important variances should be retained by the remaining
eigenvectors.
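The sketch below follows the steps just listed, covariance matrix, eigenvectors, and projection onto the top components, on made-up correlated 3-D data; the data-generating matrix and the choice to keep two components are assumptions for illustration only.

# PCA sketch following the steps above: covariance matrix -> eigenvectors -> projection.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.2, 0.1]])   # correlated 3-D data
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)          # step 1: covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)          # step 2: eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]               # step 3: keep the top components
components = eigvecs[:, order[:2]]

X_projected = X_centered @ components           # data reduced from 3-D to 2-D
print("Retained variance fraction:", eigvals[order[:2]].sum() / eigvals.sum())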
Advantages of Dimensionality Reduction
It helps in data compression, and hence reduces the required storage space.
It reduces computation time.
It also helps remove redundant features, if any.
Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D or 3D,
which can help in better understanding and analysis.
Overfitting Prevention: High dimensional data may lead to overfitting in machine
learning models, which can lead to poor generalization performance. Dimensionality
reduction can help in reducing the complexity of the data, and hence prevent
overfitting.
Feature Extraction: Dimensionality reduction can help in extracting important features
from high dimensional data, which can be useful in feature selection for machine
learning models.
Data Preprocessing: Dimensionality reduction can be used as a preprocessing step
before applying machine learning algorithms to reduce the dimensionality of the data
and hence improve the performance of the model.
Improved Performance: Dimensionality reduction can help in improving the
performance of machine learning models by reducing the complexity of the data, and
hence reducing the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
It may lead to some amount of data loss.
PCA tends to find linear correlations between variables, which is sometimes
undesirable.
PCA fails in cases where mean and covariance are not enough to define datasets.
We may not know how many principal components to keep; in practice, some rules of thumb are applied.
Interpretability: The reduced dimensions may not be easily interpretable, and it may
be difficult to understand the relationship between the original features and the
reduced dimensions.
Overfitting: In some cases, dimensionality reduction may lead to overfitting,
especially when the number of components is chosen based on the training data.
Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to
outliers, which can result in a biased representation of the data.
Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing with
large datasets.
Important points:
Dimensionality reduction is the process of reducing the number of features in a
dataset while retaining as much information as possible.
This can be done to reduce the complexity of a model, improve the performance of a
learning algorithm, or make it easier to visualize the data.
Techniques for dimensionality reduction include: principal component analysis
(PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
Each technique projects the data onto a lower-dimensional space while preserving
important information.
Dimensionality reduction is performed during the pre-processing stage, before building a model, to improve performance.
It is important to note that dimensionality reduction can also discard useful
information, so care must be taken when applying these techniques.
Locally-Linear Embedding (LLE)
Locally-Linear Embedding has several advantages over Isomap and other approaches; for example, when combined with sparse matrix algorithms it can offer faster optimization and better results on many problems. The LLE algorithm starts by finding a set of nearest neighbours for each point. It then computes a set of weights for each point that best describe the point as a linear combination of its neighbours. In the final step, it finds a low-dimensional embedding of the points through an eigenvector-based optimization, such that each point is still described by the same linear combination of its neighbours. LLE has no internal model.
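The sketch below runs LLE on the S-curve toy dataset, a 3-D point cloud lying on a curved 2-D manifold; the dataset, the number of neighbours, and the target dimension are assumptions chosen only to show the workflow.

# Minimal Locally-Linear Embedding sketch on the S-curve dataset (illustrative only).
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_s_curve(n_samples=1000, random_state=0)     # 3-D points on a curved manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)
print("Embedded shape:", X_2d.shape)                    # (1000, 2)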
Isomap:
Isomap stands for isometric mapping. Isomap is a non-linear dimensionality reduction method based on spectral theory that tries to preserve geodesic distances in the lower dimension. Isomap starts by creating a neighborhood network. It then uses graph distances to approximate the geodesic distances between all pairs of points. Finally, through eigenvalue decomposition of the geodesic distance matrix, it finds a low-dimensional embedding of the dataset. In non-linear manifolds, the Euclidean distance metric holds good only if the neighborhood structure can be approximated as linear. If the neighborhood contains holes, Euclidean distances can be highly misleading.
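The sketch below applies Isomap to the Swiss-roll toy dataset, where straight-line Euclidean distances between points on different folds are misleading but geodesic distances along the roll are not; the dataset and parameter values are assumptions for illustration only.

# Minimal Isomap sketch: neighborhood graph -> geodesic distances -> embedding (illustrative only).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 3-D "rolled" manifold
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
print("Embedded shape:", X_2d.shape)                      # (1000, 2)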
Limitations of PAC Learning
5. Lack of Guidance for Algorithm Selection: PAC learning provides theoretical guarantees on generalization performance, but it does not offer specific guidance on which learning algorithm or approach to choose for a particular problem. Determining the most appropriate algorithm within the PAC framework often requires additional empirical evaluations and domain expertise.
6. Noisy and Corrupted Data: PAC learning assumes that the training data is noise-free and accurately labelled. However, in real-world scenarios, data can often be noisy, corrupted, or contain errors. PAC learning algorithms may struggle to handle such noisy data, leading to degraded performance and compromised generalization capabilities.
7. Lack of Scalability: Some PAC learning algorithms may struggle to scale well with large and high-dimensional datasets. The computational and memory requirements of these algorithms may become prohibitive as the size and complexity of the dataset increase, limiting their practical use.