Lec 3

The document discusses various machine learning approaches, including supervised, unsupervised, semi-supervised, and reinforcement learning, emphasizing the importance of data structure and volume in selecting the appropriate method. It details types of supervised learning, such as classification and regression, and introduces algorithms like decision trees, Naive Bayes, and K-Means clustering, explaining their applications and functionalities. Additionally, it highlights the process of K-Means clustering and its iterative nature in grouping data points.

Selecting the ML Approach

The data modeling approach for machine learning is driven by the structure and volume of the data at hand, whatever the use case. Weighing all of these factors, any of the following approaches can be chosen.

• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning


Quiz Time

Guess what ML approach is used by spam detection?

• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning

Answer: Supervised Learning
Fundamentals of Machine Learning and Deep Learning
Topic 5: Algorithms of Machine Learning
Machine Learning Algorithms

• There are four main types of machine learning algorithms.


• The choice of the algorithm depends on the type of data in the use case.
Types of Supervised Learning

The two main types of supervised learning that use labeled data are
regression and classification.
Classification

• Classification is applied when the output has finite and discrete values.
• For example, social media sentiment analysis has three potential outcomes: positive, negative, or neutral.
Regression

• Regression is applied when the output is a continuous number.
• A simple regression algorithm: y = wx + b. For example, the relationship between environmental temperature (y) and humidity levels (x).
Classification vs. Regression

By fitting to the labeled training set, you can find the optimal model parameters to predict unknown labels on other objects (the test set).

• If the label is a real number, the task is called regression. For example, predicting the actual value of a house price based on features like location, construction year, etc.
• If the label is from a limited number of unordered values, the task is called classification. For example, classifying images of animals into separate groups (labels) of dogs and cats.
Linear Regression

• Linear regression is an equation that describes a line representing the relationship between the input variables (x) and the output variables (y).
• It does so by finding specific weightings for the input variables, called coefficients (B).
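The line-fitting idea above can be sketched with NumPy's least-squares solver. The humidity/temperature numbers below are made up for illustration; they are not values from the lecture.

```python
# A minimal sketch of fitting y = w*x + b with ordinary least squares.
import numpy as np

x = np.array([20.0, 40.0, 60.0, 80.0])   # humidity (%), illustrative data
y = np.array([30.0, 27.0, 24.0, 21.0])   # temperature, illustrative data

# Design matrix [x, 1] so lstsq solves for the weight w and bias b.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(w, b)        # w = -0.15, b = 33.0 for this exact data
print(w * 50 + b)  # predicted temperature at 50% humidity: 25.5
```

Real data would not fit the line exactly; least squares then returns the coefficients that minimize the squared prediction error.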
Quiz Time

Which of these is a use case for linear regression?

• Spam detection
• Google Translate
• Car mileage based on brand, model, year, weight, etc.
• Robot learning to walk

Answer: Car mileage based on brand, model, year, weight, etc.

Meaning of Decision Tree

• A decision tree is a graphical representation of all the possible solutions to a decision based on a few conditions.
• It uses predictive models to achieve results.
• A decision tree is drawn upside down, with its root at the top.
Classification and Regression Trees

[Figure: decision tree for "Should I accept a new job offer?" — the root node splits on "Commute more than 1 hour", a later internal node splits on "Offers free coffee", and the leaves are decisions such as "Decline offer".]

• The tree splits into branches based on a condition, or internal node; the first split is the root node.
• The end of a branch that doesn't split anymore is the decision, or leaf.
• In this case, the decision of whether the employee accepts or rejects the job offer is represented as green, oval-shaped boxes.
• This tree is called a classification tree, as the target is to classify whether the job is accepted by the employee or not.
• Regression trees are represented in the same manner, but they predict continuous values, like the price of a house.
• Decision tree algorithms are referred to as CART, or Classification and Regression Trees.
• Each node represents a single input variable (x) and a split point on that variable, assuming the variable is numeric.
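A decision tree is ultimately a chain of if/else splits, so the job-offer tree can be sketched as plain Python. The one-hour threshold comes from the slide; the accept-on-free-coffee leaf is an assumption about the figure, since only the "Decline offer" leaves survive in the text.

```python
# A pure-Python sketch of the job-offer classification tree.
# Split conditions mirror the figure; the "accept" leaf is assumed.
def accept_offer(commute_hours: float, free_coffee: bool) -> str:
    # Root node: split on commute time.
    if commute_hours > 1:
        return "Decline offer"
    # Internal node: split on the free-coffee perk.
    if free_coffee:
        return "Accept offer"
    return "Decline offer"

print(accept_offer(0.5, True))   # "Accept offer"
print(accept_offer(2.0, True))   # "Decline offer"
```

A learning algorithm such as CART chooses these split variables and thresholds automatically from labeled data rather than having them hand-written.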
Quiz Time

Can you think of a use case for a decision tree?
Naive Bayes

• Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling.
• The model comprises two types of probabilities: the probability of each class, and the conditional probability for each class given each value of x.
• Once calculated, this probability model can be used to make predictions for new data using Bayes' theorem.
• The probabilities can be easily estimated with a bell curve (a Gaussian distribution) when your data is real-valued.
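Both ingredients — per-class priors and a bell-curve likelihood per class — can be shown in a tiny hand-rolled Gaussian Naive Bayes for a single real-valued feature. The "ham"/"spam" feature values below are invented for illustration.

```python
# A minimal Gaussian Naive Bayes sketch for one real-valued feature:
# each class's likelihood is a bell curve (mean, std), and prediction
# picks the class maximizing prior * likelihood.
import math

def gaussian_pdf(x, mean, std):
    # Probability density of a normal distribution at x.
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def fit(xs, labels):
    model = {}
    for cls in set(labels):
        vals = [x for x, y in zip(xs, labels) if y == cls]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        model[cls] = (len(vals) / len(xs), mean, std)  # (prior, mean, std)
    return model

def predict(model, x):
    return max(model, key=lambda c: model[c][0] * gaussian_pdf(x, *model[c][1:]))

xs = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]                  # made-up feature values
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]
model = fit(xs, labels)
print(predict(model, 1.1))  # "ham"
print(predict(model, 5.1))  # "spam"
```

With several features, "naive" Bayes simply multiplies one such likelihood per feature, assuming the features are independent given the class.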
Naive Bayes Example

How does an email client classify between valid (ham/inbox) and spam (junk) emails?
Naive Bayes Classification

• The objects can be classified as either green or red. The task is to classify new cases as they arrive.
• For example, using Naive Bayes, you can classify the class labels based on the current objects.
• Since there are twice as many green objects as red, it is reasonable to believe that a new case (which has not been observed yet) is twice as likely to be green.
Naive Bayes Classification

• In Bayesian analysis, this belief is known as prior probability.

• Prior probabilities are based on previous experience.

• Prior probability of green: number of green objects/total number of objects

• Prior probability of red: number of red objects/total number of objects


Naive Bayes Classification

Since there is a total of 60 objects, 40 of which are green and 20 are red, prior probabilities
for class membership are:
• Prior probability for green: 40/60
• Prior probability for red: 20/60 (number of red objects/total number of objects)
Naive Bayes Classification

• The more green (or red) objects there are in the vicinity of X, the more likely that the new
cases will belong to that particular color.

• To measure the likelihood, draw a circle around X which encompasses a number of points
irrespective of their class labels.

• Then, calculate the number of points in the circle that belong to each class label.
Naive Bayes Classification
CALCULATION OF LIKELIHOOD

In this illustration, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones.
Naive Bayes Classification
CALCULATION OF POSTERIOR PROBABILITY

• Although the prior probabilities indicate that X may belong to GREEN (given that there
are twice as many GREEN compared to RED) the likelihood indicates otherwise.

• The class membership of X is RED (given that there are more RED objects in the vicinity
of X than GREEN).
• In Bayesian analysis, the final classification is produced by combining both sources of
information, i.e., the prior and the likelihood, to form a posterior probability using
Bayes' rule (named after Rev. Thomas Bayes 1702-1761).
Naive Bayes Classification

Finally, we classify X as RED since its class membership achieves the largest posterior probability.
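Putting the example's numbers together (60 objects, 40 green and 20 red; 1 green and 3 red points inside the circle around X):

```python
# Posterior = prior * likelihood (up to a shared normalizing constant).
prior_green = 40 / 60
prior_red = 20 / 60

# Likelihood of X given each class: points of that class inside the
# circle, divided by that class's total count.
likelihood_green = 1 / 40
likelihood_red = 3 / 20

posterior_green = prior_green * likelihood_green  # 1/60 ≈ 0.017
posterior_red = prior_red * likelihood_red        # 3/60 = 0.05

print("RED" if posterior_red > posterior_green else "GREEN")  # RED
```

Even though the prior favors GREEN, the likelihood from the circle outweighs it, so the posterior picks RED — exactly the combination Bayes' rule formalizes.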
Machine Learning Algorithms

The next algorithm is K-Means clustering.


K-Means Clustering

• K-Means clustering is an algorithm that can be used for many types of grouping problems.
• Examples of K-Means clustering:
o Group images
o Detect activity types in motion sensors
o Detect bots or anomalies
o Segment customers by purchasing history
• Meaningful changes in data can be detected by monitoring whether a tracked data point switches groups over time.
K-Means Clustering: Use Cases

Behavioral segmentation:
• Segment by purchase history
• Segment by activities on application, website, or platform
• Define personas based on interests
• Create profiles based on activity monitoring

Inventory categorization:
• Group inventory by sales activity
• Group inventory by manufacturing metrics

Sorting sensor measurements:
• Detect activity types in motion sensors
• Group images
• Separate audio
• Identify groups in health monitoring

Detecting bots or anomalies:
• Separate valid activity groups from bots
• Group valid activity to clean up outlier detection
K-Means Clustering for Unsupervised Learning

• To run a K-Means algorithm, randomly initialize K points called the cluster centroids.
• There are three cluster centroids in the image given below, since the data is grouped into three clusters.

K-Means is an iterative algorithm that involves two steps:

Step 1: Cluster assignment
Step 2: Move centroid

K-Means Clustering for Unsupervised Learning

Step 1: The algorithm travels through the data points and, depending on which cluster centroid is closer, assigns each point to the red, blue, or green cluster.

Step 2: The algorithm calculates the average of all points in each cluster and moves that cluster's centroid to the average location.
K-Means Clustering for Unsupervised Learning

• Steps 1 and 2 are repeated until there are no changes in the clusters or a specified stopping condition is met.
• K may be chosen arbitrarily, or methods such as the elbow plot or silhouette score can help decide it.
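The two-step loop above can be sketched compactly in NumPy. The three 2-D blobs and the seeding are illustrative assumptions; a real run would initialize the centroids from randomly chosen data points, often with several restarts.

```python
# A compact K-Means sketch: Step 1 assigns each point to its nearest
# centroid, Step 2 moves each centroid to the mean of its points; the
# loop repeats until no centroid moves.
import numpy as np

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc, 0.3, size=(20, 2))        # three synthetic groups
    for loc in ([0, 0], [4, 0], [2, 4])
])

k = 3
centroids = points[[0, 20, 40]].copy()        # one seed point per group

while True:
    # Step 1: cluster assignment (nearest centroid, Euclidean distance).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 2: move each centroid to the average of its assigned points.
    moved = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    if np.allclose(moved, centroids):
        break
    centroids = moved

print(np.round(centroids, 1))  # centers land near [0,0], [4,0], [2,4]
```

Production implementations also handle empty clusters and rerun from multiple random initializations, since K-Means only finds a local optimum.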
