Machine Learning
BITS F464
Dr. Pranav M. Pawar
BITS Pilani, Dubai Campus
Contents
• Discussion on course handout
• What is Machine learning (ML)?
• Relationship between AI, ML and DL
• Why do we require ML?
• Traditional Programming vs ML
• Recipe of ML
• Types of Learning
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning
• Bayesian Learning
• Instance-based Learning
• Active Learning
• Deep Learning
• High-Dimensional Data and the Curse of Dimensionality
Discussion on course handout
• ML course handout
What is ML?
• Machine Learning (ML) is a sub-field of computer
science that evolved from the study of pattern
recognition and computational learning theory in
artificial intelligence.
• Using algorithms that iteratively learn from data
• Allowing computers to discover patterns without being explicitly
programmed where to look
• Other Definitions
• Arthur Samuel (1959): Machine Learning is the field of study that gives
the computer the ability to learn without being explicitly programmed.
• Tom Mitchell (1998): A computer program is said to learn from
experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves
with experience E.
• Ethem Alpaydın (2014): Machine learning is programming computers to
optimize a performance criterion using example data or past experience.
Relationship between AI, ML and DL
Source: https://www.aunalytics.com/artificial-intelligence-machine-learning-and-deep-learning%E2%81%A0/
Why do we require ML?
• For tasks that are easily performed by humans but
are complex for computer systems to emulate
• Vision: Identify faces in a photograph, objects in a video or still
image, etc.
• Natural language: Translate a sentence from Hindi to English,
question answering, identify sentiment of text, etc.
• Speech: Recognise spoken words, speak sentences naturally, etc.
• Game playing: Play games like chess, Go, Dota.
• Robotics: Walking, jumping, displaying emotions, etc.
• Driving a car, navigating a maze, etc.
Traditional Programming vs ML
Source: https://morioh.com/p/1063d43ef15e,
https://insightsoftware.com/blog/machine-learning-vs-traditional-programming/#:~:text=Traditional%20programming%20is%20a%20manual,the%20rules%20from%20the%20data
Recipe of ML (1)
• The six main components of the ML recipe are:
• Data, Task, Model, Loss Function, Learning, and Evaluation.
• Data
• Acts as the fuel for the ML model.
• Data can be structured or unstructured.
• Sources of data
• Open dataset repositories, crowdsourcing marketplaces, or data we create
ourselves.
• Splitting of data
• Training data is used to train our model.
• Once our model is ready, we use test data to measure its accuracy (a short
split sketch follows at the end of this slide).
• Tasks
• A machine learning task is the type of prediction or inference being
made, based on the problem or question that is being asked, and the
available data.
• For example, the classification task assigns data to categories, and the
clustering task groups data according to similarity.
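A minimal sketch of the train/test split described under "Splitting of data" above, assuming scikit-learn is available; the iris dataset and the 80/20 ratio are arbitrary choices used only for illustration.

    # Hypothetical illustration of splitting a labelled dataset into train and test sets.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)            # any labelled dataset works here

    # Hold out 20% of the data for testing; the ratio is an assumption.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    print(len(X_train), "training examples,", len(X_test), "test examples")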
Recipe of ML (2)
• Model
• A model is a mathematical function that defines the relationship
between input data and output data.
• Concepts that govern how well a model predicts include:
• Bias-variance trade-off
• Regularization
• Overfitting
• Loss function
• A loss function is used to optimize a machine learning algorithm. In
simple words, it tells us how good or bad our model is.
• Examples: mean squared error, cross-entropy, etc.
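A brief sketch of the two loss functions named above, computed with NumPy on made-up predictions; the numbers are purely illustrative.

    import numpy as np

    # Made-up targets and predictions for illustration only.
    y_true = np.array([1.0, 0.0, 1.0, 1.0])
    y_pred = np.array([0.9, 0.2, 0.8, 0.4])

    # Mean squared error: average squared difference between prediction and target.
    mse = np.mean((y_true - y_pred) ** 2)

    # Binary cross-entropy: penalises confident wrong predictions heavily.
    eps = 1e-12  # avoid log(0)
    bce = -np.mean(y_true * np.log(y_pred + eps)
                   + (1 - y_true) * np.log(1 - y_pred + eps))

    print(f"MSE = {mse:.3f}, cross-entropy = {bce:.3f}")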
Recipe of ML (3)
• Learning algorithm
• Identifies the parameters of the model.
• For example, for a model with parameters a, b, c, we need to find the
values of a, b, c such that the loss function is minimum (a toy sketch
follows at the end of this slide).
• Common learning algorithms or optimizers:
• Gradient descent, AdaGrad, etc.
• Evaluation
• Measures how accurate your model is.
• Metrics: accuracy, precision, recall, F1-score, etc.
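A toy gradient-descent sketch in NumPy that finds parameters a, b, c of a quadratic model by minimizing the mean squared error; the synthetic data, learning rate, and step count are assumptions made only for illustration.

    import numpy as np

    # Synthetic data from an assumed "true" quadratic y = 2x^2 - 3x + 1, plus noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, size=200)
    y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.1, size=x.shape)

    a, b, c = 0.0, 0.0, 0.0      # parameters to learn
    lr = 0.01                    # learning rate (an arbitrary choice)

    for step in range(5000):
        y_hat = a * x**2 + b * x + c
        err = y_hat - y
        # Gradients of the mean squared error with respect to a, b, c.
        a -= lr * np.mean(2 * err * x**2)
        b -= lr * np.mean(2 * err * x)
        c -= lr * np.mean(2 * err)

    print(f"learned a={a:.2f}, b={b:.2f}, c={c:.2f}")  # should be close to 2, -3, 1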
Supervised Learning
• Uses labelled data.
• A class of problems that involves using a model to learn a
mapping between input examples and the target variable.
• Two main types of supervised learning problems
• Classification: Supervised learning problem that involves predicting a class label.
• Regression: Supervised learning problem that involves predicting a numerical
label.
• Example
• For classification: The MNIST handwritten digits dataset where the inputs are
images of handwritten digits (pixel data) and the output is a class label for what
digit the image represents (numbers 0 to 9).
• For regression: The Boston house prices dataset where the inputs are variables
that describe a neighbourhood and the output is a house price in dollars.
• Common algorithms
• K-nearest neighbour
• Linear regression
• Decision tree
• Support vector machine etc.
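A minimal supervised-learning sketch, assuming scikit-learn: k-nearest neighbours classifying the small 8x8 digits dataset bundled with the library, a scaled-down stand-in for the MNIST example above. The choice of k and the split ratio are arbitrary.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)        # 8x8 digit images, labels 0-9
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    clf = KNeighborsClassifier(n_neighbors=3)  # k=3 is an arbitrary choice
    clf.fit(X_train, y_train)                  # "training" mostly stores the examples

    print("test accuracy:", clf.score(X_test, y_test))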
Unsupervised Learning
• Unlabelled data
• Describes a class of problems that involves using a model to
describe or extract relationships in data.
• Compared to supervised learning, unsupervised learning
operates upon only the input data without outputs or target
variables.
• Two main types of unsupervised learning problems
• Clustering: Unsupervised learning problem that involves finding groups in data.
• Density Estimation: Unsupervised learning problem that involves summarizing the
distribution of data.
• Example
• Clustering is broadly used in many applications such as market research, pattern
recognition, data analysis, and image processing. Clustering can also help marketers
discover distinct groups in their customer base.
• Application of density estimation involves using small groups of closely related data
samples to estimate the distribution for new points in the problem space. The same can
be used for geographical data analysis and market data analysis.
• Common algorithms
• K-means clustering
• Hierarchical clustering
• Kernel density estimation
• Principal component analysis
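An unsupervised-learning sketch, assuming scikit-learn: k-means finding groups in synthetic, unlabelled 2-D data. The data generator and the number of clusters are arbitrary choices for illustration.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic unlabelled data with three hidden groups (the labels are discarded).
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
    print("cluster centres:\n", kmeans.cluster_centers_)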
Semi-supervised Learning
• Labelled + unlabelled data
• Learning where the training data contains very few
labelled examples and a large number of unlabelled
examples.
• Cluster analysis is a method that seeks to partition a dataset
into homogeneous subgroups, grouping similar data
together so that the data in each group differ from the data
in the other groups.
• Semi-supervised clustering uses some known cluster
information in order to classify other unlabelled data,
meaning it uses both labelled and unlabelled data just like
semi-supervised machine learning.
• Example: A text document classifier => Semi-supervised
learning allows the algorithm to learn from a small
amount of labelled text documents while still classifying a
large amount of unlabelled text documents in the training
data.
• Common algorithms: Co-training, semi-supervised SVM
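A semi-supervised sketch using scikit-learn's label-spreading approach (a stand-in, not the co-training or semi-supervised SVM algorithms named above): most labels are hidden by marking them -1, and the model spreads the few known labels to the unlabelled points. The dataset and the number of kept labels are arbitrary.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.semi_supervised import LabelSpreading

    X, y = load_digits(return_X_y=True)

    # Pretend only 50 examples were labelled; the rest get the "unlabelled" marker -1.
    rng = np.random.default_rng(0)
    y_partial = np.full_like(y, -1)
    labelled_idx = rng.choice(len(y), size=50, replace=False)
    y_partial[labelled_idx] = y[labelled_idx]

    model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)

    # Evaluate on the points whose true labels were hidden from the model.
    mask = y_partial == -1
    print("accuracy on unlabelled points:",
          (model.transduction_[mask] == y[mask]).mean())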
Reinforcement Learning (1)
• Describes a class of problems
where an agent operates in an
environment and must learn to
operate using feedback.
• Reinforcement Learning has
four essential elements:
• Agent: The program you train, with the
aim of doing a job you specify.
• Environment: The world, real or virtual,
in which the agent performs actions.
• Action: A move made by the agent,
which causes a status change in the
environment.
• Rewards: The evaluation of an action,
which can be positive or negative.
Reinforcement Learning (2)
• An example of a reinforcement learning problem is playing a
game where the agent has the goal of getting a
high score, can make moves in the game, and
receives feedback in terms of punishments or
rewards.
• Common algorithms
• Q-learning
• Markov Decision Process etc.
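A tiny tabular Q-learning sketch on a made-up 1-D corridor: the agent starts in the middle, and moving right eventually reaches a rewarding goal state. The environment, rewards, and hyperparameters are all invented for illustration.

    import numpy as np

    N_STATES, GOAL = 6, 5            # corridor states 0..5, reward only at the goal
    ACTIONS = [-1, +1]               # move left or move right
    alpha, gamma, eps = 0.5, 0.9, 0.1
    rng = np.random.default_rng(0)
    Q = np.zeros((N_STATES, len(ACTIONS)))

    for episode in range(200):
        s = 2                                     # every episode starts in the middle
        while s != GOAL:
            # Epsilon-greedy choice; break ties between equal Q-values at random.
            if rng.random() < eps or Q[s, 0] == Q[s, 1]:
                a = int(rng.integers(2))
            else:
                a = int(Q[s].argmax())
            s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = 1.0 if s_next == GOAL else 0.0
            # Q-learning update: bootstrap from the best action in the next state.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next

    print("greedy action per state (0=left, 1=right):", Q.argmax(axis=1))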
Bayesian Learning
• Based on the concept of Bayes' theorem.
• Each observed training example can incrementally
decrease or increase the estimated probability that a
hypothesis is correct.
• This provides a more flexible approach to learning than algorithms that
completely eliminate a hypothesis if it is found to be inconsistent with any single
example.
• Prior knowledge can be combined with observed data to
determine the final probability of a hypothesis. In Bayesian
learning, prior knowledge is provided by asserting
• A prior probability for each candidate hypothesis, and
• A probability distribution over observed data for each possible hypothesis.
• Example: To determine the accuracy of medical test results
by taking into consideration how likely any given person is
to have a disease and the general accuracy of the test.
• Common algorithms: Naive Bayes classifier, Gibbs
algorithm, etc.
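A worked version of the medical-test example above, applying Bayes' theorem directly; the prevalence and test accuracies are invented numbers used only for illustration.

    # Assumed numbers: 1% of people have the disease, the test detects it 95% of
    # the time, and it falsely flags 5% of healthy people.
    p_disease = 0.01
    p_pos_given_disease = 0.95
    p_pos_given_healthy = 0.05

    # Total probability of a positive test result.
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # about 0.16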
Instance-based Learning (1)
• Instance-based learning methods simply store the
training examples instead of learning an explicit
description of the target function.
• Generalizing the examples is postponed until a new instance must
be classified.
• When a new instance is encountered, its relationship to the stored
examples is examined in order to assign a target function value for
the new instance.
• Instance-based methods are sometimes referred to
as lazy learning methods because they delay
processing until a new instance must be classified.
Instance-based Learning (2)
• A key advantage of lazy learning is that instead of
estimating the target function once for the entire
instance space, these methods can estimate it
locally and differently for each new instance to be
classified.
• Example: Recommender Systems. If you know a
user likes a particular item, then you can
recommend similar items for them. To find similar
items, you compare the set of users who like each
item—if a similar set of users like two different
items, then the items themselves are probably
similar.
• Common algorithm: KNN, Locally Weighted
Regression etc.
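A small sketch of the item-similarity idea from the recommender example above, on a made-up user-item matrix: items liked by similar sets of users get a high cosine similarity, and nothing is computed until a query arrives (lazy learning). The data and helper function are hypothetical.

    import numpy as np

    # Rows = users, columns = items; 1 means the user liked the item (made-up data).
    likes = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
    ])

    def most_similar_item(item, likes):
        """Return the index of the item whose set of fans is most similar (cosine)."""
        v = likes[:, item].astype(float)
        sims = likes.T @ v / (np.linalg.norm(likes, axis=0) * np.linalg.norm(v) + 1e-12)
        sims[item] = -1.0               # exclude the item itself
        return int(sims.argmax())

    print("item most similar to item 0:", most_similar_item(0, likes))  # expect item 1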
Active Learning (1)
• Key idea: A machine learning algorithm can
achieve greater accuracy with fewer labelled
training instances if it is allowed to choose the data
from which it learns.
• An active learner may ask queries in the form of
unlabelled instances to be labelled by a human
annotator.
Active Learning (2)
• Active learning is well-
motivated in many modern
machine learning problems,
where unlabelled data may
be abundant but labels are
difficult, time-consuming, or
expensive to obtain.
• Example: Gene expression and cancer classification. Active
learning takes 31 points to achieve the same accuracy as
passive learning with 174.
• Common techniques: Query Strategy Framework,
analysis of active learning.
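A minimal pool-based uncertainty-sampling sketch (one common query strategy), assuming scikit-learn; the dataset, base classifier, and query budget are arbitrary choices, and it is not a reproduction of the gene-expression study cited above.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    X, y = load_digits(return_X_y=True)
    rng = np.random.default_rng(0)

    labelled = list(rng.choice(len(y), size=10, replace=False))  # tiny initial label set
    pool = [i for i in range(len(y)) if i not in labelled]

    for round_ in range(20):                                     # budget of 20 queries
        clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
        proba = clf.predict_proba(X[pool])
        # Query the pool point the current model is least confident about.
        query = pool[int(proba.max(axis=1).argmin())]
        labelled.append(query)                                   # "ask the annotator"
        pool.remove(query)

    clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
    print("accuracy after 20 active queries:", clf.score(X, y))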
Deep Learning
• Deep learning is a subfield of machine learning; it is
what powers the most human-like artificial
intelligence.
• Deep learning structures algorithms in layers to
create an "artificial neural network" that can learn
and make intelligent decisions on its own.
• Major types of artificial neural networks
• Feedforward Neural Network
• Recurrent Neural Network
• Convolutional Neural Network, etc.
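A bare-bones feedforward network sketch with one hidden layer, written in NumPy only (no deep-learning framework assumed), trained by backpropagation on the classic XOR toy problem; the layer size and learning rate are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
    y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer (8 units, arbitrary)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    lr = 0.5

    for step in range(5000):
        # Forward pass through the two layers.
        h = np.tanh(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: cross-entropy gradient, then the chain rule through tanh.
        d_logits = out - y
        d_h = (d_logits @ W2.T) * (1 - h**2)
        W2 -= lr * h.T @ d_logits / len(X);  b2 -= lr * d_logits.mean(axis=0)
        W1 -= lr * X.T @ d_h / len(X);       b1 -= lr * d_h.mean(axis=0)

    print("predictions:", out.ravel().round(2))      # should approach 0, 1, 1, 0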
Multidimensional Data and Its Problems
• Dimension: Number of Observables (e.g. age and
income)
• If the number of observables is increased:
• More time to compute
• More memory to store inputs and intermediate results
• More complicated explanations (knowledge from learning)
• Regression from 100 vs. 2 parameters
• No simple visualization
• 2D vs. 10D graph
• The severe difficulty that can arise in spaces of
many dimensions is called the curse of
dimensionality.
Curse of Dimensionality (1)
• Handling high-dimensional data poses severe
problems.
• It is an important factor in designing ML algorithms.
• Example 1:
o Dataset: measurements of a mixture
of oil, water & gas (from pipelines)
o Categories of data: homogeneous
(red), annular (green) & laminar
(blue)
o The figure shows 100 data points in
two dimensions (x6 and x7)
o In reality, each data point is a 12-
dimensional input vector.
o We want to predict the class of a new point x.
Curse of Dimensionality (2)
• A user might classify the new point as:
• Red => as it is surrounded by numerous
red points.
• Green => as there are plenty of nearby green
points.
• Blue => unlikely to be blue.
• Intuition: Identity of the cross
should be more strongly
determined by nearby points
from the training set and less
strongly by more distant points.
• How do we turn this intuition into a learning
algorithm?
Curse of Dimensionality (3)
• Simple solution
• Divide the input space into regular cells.
• Predict the class according to the cell it
belongs to.
• Other solution: take a vote on which class it
belongs to, etc.
• The solution becomes
more complex as the number of
input variables (dimensions D) increases:
• The number of cells grows
exponentially.
• A large number of
training data points is needed.
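A quick illustration of that exponential growth: dividing each input dimension into just 10 bins, the number of cells (and hence the data needed to populate them) explodes with the dimension D. The bin count of 10 is an arbitrary assumption.

    bins_per_dim = 10
    for D in (1, 2, 3, 5, 10, 12):
        print(f"D = {D:2d}: {bins_per_dim ** D:,} cells")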
Curse of Dimensionality (4)
• Example 2:
• For D input variables, a general polynomial with order 3 takes the form
$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D}\sum_{j=1}^{D} w_{ij} x_i x_j + \sum_{i=1}^{D}\sum_{j=1}^{D}\sum_{k=1}^{D} w_{ijk} x_i x_j x_k$
• As D increases, the number of coefficients grows proportionally
to D^3 in this case.
• Capturing complex dependencies may need a higher-order
polynomial.
• For a polynomial of order M, the number of coefficients grows as D^M.
• Such a method has limited practical utility because of this complexity.
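A quick numeric check of the growth claim above, counting only the dominant highest-order terms; the values of D and M are arbitrary.

    for D in (2, 10, 100):
        for M in (2, 3, 5):
            print(f"D = {D:3d}, M = {M}: roughly {D ** M:,} coefficients")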
Curse of Dimensionality (5)
• Remark
• Increasing the number of features will not always improve
classification accuracy.
• In practice, the inclusion of more features might actually
lead to worse performance.
• The number of training examples required increases
exponentially with dimensionality d (i.e., as k^d for k bins per dimension).
• Solutions
• Dimensionality reduction
• Feature extraction
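A dimensionality-reduction sketch, assuming scikit-learn: PCA projecting the 64-dimensional digits data down to a handful of components while keeping most of the variance. The component count of 10 is an arbitrary choice.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X, _ = load_digits(return_X_y=True)      # 64-dimensional inputs

    pca = PCA(n_components=10).fit(X)        # 10 components is an arbitrary choice
    X_reduced = pca.transform(X)

    print("original shape:", X.shape, "-> reduced shape:", X_reduced.shape)
    print("variance kept:", round(pca.explained_variance_ratio_.sum(), 3))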
Sources
• https://productcoalition.com/difference-between-traditional-programming-versus-machine-learning-from-a-pm-perspective-3802b02bc7f6
• Chapter 1, Tom M. Mitchell, Machine Learning, The McGraw-Hill Companies, 1st edition, 2013.
• Chapter 1, Alpaydin Ethem, Introduction to Machine Learning, 3e, PHI, 2014.
• Recipe Of Machine Learning Theory | by Pratyush Singh Kushwah | Medium
• What is a machine learning model? | Microsoft Docs
• Machine learning tasks - ML.NET | Microsoft Docs
• https://www.cse.iitb.ac.in/~cs725/notes
• https://machinelearningmastery.com/types-of-learning-in-machine-learning/
• https://algorithmia.com/blog/semi-supervised-learning
• https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-1-a-brief-introduction-a53a849771cf
• https://medium.com/datadriveninvestor/bayesian-learning-for-ml-13d30485267e#:~:text=Bayesian%20learning%20treats%20model%20parameters,based%20on%20the%20observed%20data.
• https://web.cs.hacettepe.edu.tr/~ilyas/Courses/BIL712
• https://minds.wisconsin.edu/handle/1793/60660
BITS Pilani
Dubai Campus
Thank You!