Selecting the ML Approach
The data modeling approach for machine learning is chosen based on the structure
and volume of the data at hand, regardless of the use case. Any of the
following approaches can be chosen after considering these factors.
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning
Quiz Time
Guess what ML approach is used by spam detection?
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning
Answer: Supervised Learning
Fundamentals of Machine Learning and Deep Learning
Topic 5: Algorithms of Machine Learning
Machine Learning Algorithms
• There are four main types of machine learning algorithms.
• The choice of the algorithm depends on the type of data in the use case.
Types of Supervised Learning
The two main types of supervised learning that use labeled data are
regression and classification.
Classification
• Classification is applied when the output has finite and
discrete values.
• For example, social media sentiment analysis has three
potential outcomes: positive, negative, or neutral.
Regression
• Regression is applied when the output is a continuous
number.
• A simple regression model takes the form y = wx + b: for example,
the relationship between environmental temperature (y) and
humidity levels (x).
Classification vs. Regression
By fitting a model to the labeled training set, you can find the optimal model
parameters to predict unknown labels on other objects (the test set).
• If the label is a real number, the task is called regression. For example,
predicting the actual value of a house price based on features like location,
construction year, etc.
• If the label is from a limited number of unordered values, the task is called
classification. For example, classifying images of animals into separate groups
(labels) of dogs and cats.
Linear Regression
• Linear regression is an equation that describes the line
that best represents the relationship between the input
variables (x) and the output variable (y).
• It does so by finding specific weightings for the input
variables, called coefficients (B).
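The line-fitting idea above can be sketched in a few lines of NumPy; the humidity/temperature numbers below are made up for illustration.

```python
# Minimal linear regression sketch: fit y = w*x + b by ordinary least squares.
import numpy as np

# Hypothetical training data: humidity (x) vs. temperature (y)
x = np.array([30.0, 45.0, 60.0, 75.0, 90.0])
y = np.array([20.0, 23.0, 26.0, 29.0, 32.0])

# Stack a column of ones so the intercept b is learned alongside the weight w
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

print(f"y = {w:.2f}x + {b:.2f}")              # learned line
print(f"prediction at x=50: {w * 50 + b:.1f}")
```

On this toy data the fit is exact, so the learned coefficients recover the line the points were drawn from.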
Quiz Time
Which of these is a use case for linear regression?
• Spam detection
• Google Translate
• Car mileage based on brand, model, year, weight, etc.
• Robot learning to walk
Answer: Car mileage based on brand, model, year, weight, etc.
Meaning of Decision Tree
• A decision tree is a graphical representation of all the
possible solutions to a decision based on a few conditions.
• It uses predictive models to achieve results.
• A decision tree is drawn upside down with its root at the
top.
Classification and Regression Trees
[Figure: decision tree for "Should I accept a new job offer?", with internal nodes "Commute more than 1 hour" and "Offers free coffee", and leaves for accepting or declining the offer]
• The tree splits into branches based on a condition or internal node; the topmost split is the root node.
• The end of the branch that doesn't split anymore is the decision/leaf.
• In this case, the outcome, whether the employee accepts or rejects the job offer, is represented as green oval-shaped boxes.
• This tree is called a classification tree, as the target is to classify whether the job offer is accepted by the employee or not.
• Regression trees are represented in the same manner, but they predict continuous values, like the price of a house.
• Decision tree algorithms are referred to as CART, or Classification and Regression Trees.
• Each node represents a single input variable (x) and a split point on that variable, assuming the variable is numeric.
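The job-offer tree can be written as ordinary nested conditions; the split order follows the slide's figure, while the "accept" leaf and the commute threshold are reconstructed for illustration.

```python
# Sketch of the job-offer decision tree as plain if/else logic.
# Conditions follow the slide's example; the "accept" leaf is assumed.
def job_offer_decision(commute_hours: float, free_coffee: bool) -> str:
    # Internal (root) node: commute more than 1 hour?
    if commute_hours > 1:
        return "Decline offer"      # leaf
    # Internal node: does the company offer free coffee?
    if free_coffee:
        return "Accept offer"       # leaf
    return "Decline offer"          # leaf

print(job_offer_decision(1.5, True))   # long commute -> decline
print(job_offer_decision(0.5, True))   # short commute, free coffee -> accept
```

Each `if` corresponds to an internal node (a variable and a split point), and each `return` is a leaf.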
Quiz Time
Can you think of a use case for
decision tree?
Naive Bayes
• Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling.
• The model comprises two types of probabilities: the probability of each class (the prior) and
the conditional probability of the data given each class (the likelihood).
• Once calculated, this probability model can be used to make predictions for new data
using Bayes' theorem.
• When your data is real-valued, the probabilities can be estimated using a bell curve (Gaussian distribution).
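As a sketch of the bell-curve case, the toy example below estimates a class prior and a per-class Gaussian likelihood from made-up 1-D data, then multiplies them to score each class, as Bayes' theorem prescribes.

```python
# Gaussian ("bell curve") Naive Bayes sketch on a toy 1-D dataset (made-up numbers).
import math

# Hypothetical real-valued feature, grouped by class label
data = {"spam": [0.9, 1.1, 1.0, 0.8], "ham": [0.1, 0.2, 0.0, 0.1]}
n_total = sum(len(v) for v in data.values())

def gaussian_pdf(x, mean, std):
    # Density of the normal distribution with the given mean and std at x
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def predict(x):
    scores = {}
    for label, values in data.items():
        prior = len(values) / n_total                       # P(class)
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        scores[label] = prior * gaussian_pdf(x, mean, std)  # P(class) * P(x | class)
    return max(scores, key=scores.get)

print(predict(0.95))  # near the "spam" cluster
print(predict(0.05))  # near the "ham" cluster
```

The class with the largest prior-times-likelihood product wins, which is exactly the comparison made in the worked example that follows.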
Naive Bayes Example
How does an email client classify between valid and spam emails?
[Figure: example emails sorted into Spam/Junk and Ham/Inbox folders]
Naive Bayes Classification
• The objects can be classified as either GREEN or RED. The task is to classify new cases
as they arrive.
• For example, using Naive Bayes, you can decide the class label of a new case based on the
currently observed objects.
• Since there are twice as many green objects as red, it is reasonable to believe that a
new case (which has not been observed yet) is twice as likely to be green as red.
Naive Bayes Classification
• In Bayesian analysis, this belief is known as prior probability.
• Prior probabilities are based on previous experience.
• Prior probability of green: number of green objects/total number of objects
• Prior probability of red: number of red objects/total number of objects
Naive Bayes Classification
Since there is a total of 60 objects, 40 of which are green and 20 red, the prior probabilities
for class membership are:
• Prior probability for green: 40/60
• Prior probability for red: 20/60
Naive Bayes Classification
• The more green (or red) objects there are in the vicinity of X, the more likely that the new
cases will belong to that particular color.
• To measure the likelihood, draw a circle around X which encompasses a number of points
irrespective of their class labels.
• Then, calculate the number of points in the circle that belong to each class label.
Naive Bayes Classification
CALCULATION OF LIKELIHOOD
In this illustration, the likelihood of X given GREEN is smaller than the likelihood of
X given RED, since the circle encompasses 1 GREEN object and 3 RED ones:
likelihood of X given GREEN = 1/40, while likelihood of X given RED = 3/20.
Naive Bayes Classification
CALCULATION OF POSTERIOR PROBABILITY
• Although the prior probabilities indicate that X may belong to GREEN (given that there
are twice as many GREEN compared to RED) the likelihood indicates otherwise.
• The class membership of X is RED (given that there are more RED objects in the vicinity
of X than GREEN).
• In Bayesian analysis, the final classification is produced by combining both sources of
information, i.e., the prior and the likelihood, to form a posterior probability using
Bayes' rule (named after Rev. Thomas Bayes 1702-1761).
Naive Bayes Classification
Finally, we classify X as RED since its class membership achieves the largest posterior probability.
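The whole calculation above can be checked numerically: 40 green and 20 red objects overall, with 1 green and 3 red neighbors inside the circle around X.

```python
# Worked version of the example: priors from class counts, likelihoods from
# the neighbors of X inside the circle, posteriors via Bayes' rule.
n_green, n_red = 40, 20
total = n_green + n_red

prior_green = n_green / total          # 40/60
prior_red = n_red / total              # 20/60

# Likelihood: neighbors of X inside the circle, divided by each class total
likelihood_green = 1 / n_green         # 1/40
likelihood_red = 3 / n_red             # 3/20

# Posterior is proportional to prior * likelihood (the shared denominator cancels)
posterior_green = prior_green * likelihood_green   # 1/60
posterior_red = prior_red * likelihood_red         # 3/60

print("X is RED" if posterior_red > posterior_green else "X is GREEN")
```

Even though the prior favors GREEN, the larger RED likelihood dominates, so the posterior classifies X as RED.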
Machine Learning Algorithms
The next algorithm is K-Means clustering.
K-Means Clustering
• K-Means clustering is an algorithm that can be used for any type of grouping.
• Examples of K-Means clustering:
o Group images
o Detect activity types in motion sensors
o Detect bots or anomalies
o Segment by purchasing history
• Meaningful changes in data can be detected by monitoring to see if a tracked data point
switches groups over time.
K-Means Clustering: Use Cases
Behavioral segmentation:
• Segment by purchase history
• Segment by activities on application, website, or platform
• Define personas based on interests
• Create profiles based on activity monitoring
Inventory categorization:
• Group inventory by sales activity
• Group inventory by manufacturing metrics
Sorting sensor measurements:
• Detect activity types in motion sensors
• Group images
• Separate audio
• Identify groups in health monitoring
Detecting bots or anomalies:
• Separate valid activity groups from bots
• Group valid activity to clean up outlier detection
K-Means Clustering for Unsupervised Learning
• To run a K-Means algorithm, first randomly initialize K points called the cluster centroids.
• There are three cluster centroids in the image given below, since the data is grouped into three
clusters.
K-Means is an iterative algorithm that involves two steps:
• Step 1: Cluster assignment
• Step 2: Move centroid
K-Means Clustering for Unsupervised Learning
Step 1:
The algorithm travels through the data points and, depending on which centroid is closest,
assigns each point to the red, blue, or green cluster.
Step 2:
The algorithm calculates the average of all points in each cluster and moves the centroid to that average location.
K-Means Clustering for Unsupervised Learning
• Steps 1 and 2 are repeated until the clusters no longer change or a specified stopping
condition is met.
• K is chosen by the user; an elbow plot or silhouette score can help decide its value.
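The two steps above can be sketched directly in NumPy; the three blobs of points below are synthetic.

```python
# Minimal K-Means sketch showing the two iterative steps on toy 2-D data.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: three loose blobs of 20 points each
points = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ((0, 0), (3, 3), (0, 3))])

k = 3
# Random initialization: pick k data points as the starting centroids
centroids = points[rng.choice(len(points), k, replace=False)]

for _ in range(20):
    # Step 1: cluster assignment -- each point joins its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 2: move centroid -- each centroid moves to the mean of its points
    # (a centroid with no assigned points stays where it is)
    new_centroids = np.array([
        points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
        for i in range(k)
    ])
    if np.allclose(new_centroids, centroids):  # stop: clusters no longer change
        break
    centroids = new_centroids

print(np.bincount(labels, minlength=k))  # cluster sizes
```

The loop mirrors the slide's stopping rule: iterate the two steps until the assignments stabilize or an iteration cap is reached.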