
MODULE 4

CHAPTER 8
BAYESIAN LEARNING
8.1 INTRODUCTION TO PROBABILITY-BASED
LEARNING
• Probability-based learning is one of the most important practical
learning methods which combines prior knowledge or prior
probabilities with observed data.
• Probabilistic learning uses the concept of probability theory that
describes how to model randomness, uncertainty, and noise to predict
future events.
• It is a tool for modelling large datasets and uses Bayes rule to infer
unknown quantities, predict and learn from data.
• In a probabilistic model, randomness plays a major role and the solution is expressed as a
probability distribution, while in a deterministic model there is no randomness: run with the
same initial conditions, the model produces the same single possible outcome as the solution
every time.
• Bayesian learning differs from probabilistic learning as it uses
subjective probabilities (i.e., probability that is based on an individual’s
belief or interpretation about the outcome of an event and it can
change over time) to infer parameters of a model.
• Two practical learning algorithms called Naïve Bayes learning and
Bayesian Belief Network (BBN) form the major part of Bayesian
learning. These algorithms use prior probabilities and apply Bayes rule
to infer useful information.
• Bayesian Learning is a learning method that describes and
represents knowledge in an uncertain domain and provides
a way to reason about this knowledge using probability
measures.
• It uses Bayes theorem to infer the unknown parameters of a
model.
• Bayesian inference is useful in many applications that
involve reasoning and diagnosis, such as game theory,
medicine, etc. Bayesian inference is much more powerful in
handling missing data and in estimating uncertainty in
predictions.
For Understanding
• The prior probability is the probability assigned to an event before
the arrival of some information that makes it necessary to revise
the assigned probability.
• The revision of the prior is carried out using Bayes' rule. The new
probability assigned to the event after the revision is called posterior
probability.
What is prior probability in Naive Bayes?
• The probability of each class before any features are observed
is known as the prior probability in the Naive Bayes method.
• Posterior probability = prior probability updated with new data (evidence)
For Understanding
What is likelihood probability in Machine Learning with example?
• In simple words, as the name suggests, the likelihood is a function
that tells us how well a specific data point fits the existing data
distribution.
For example,
• Suppose there are two data points in the dataset; if the first lies closer to the
centre of the fitted distribution than the second, the likelihood of the first data
point is greater than that of the second.
For Understanding

Examples of Probability and Likelihood

Example 1 – Coin Toss


• In the context of coin tosses, likelihood and probability represent
different aspects of the same experiment.
• The likelihood refers to the probability of observing a specific
outcome given a particular model or hypothesis.
• On the other hand, probability represents the long-term frequency of
an event occurring over multiple trials.
For Understanding
• To recap: probability is generally something we consider when
we have a model with a fixed set of parameters and we are
interested in the types of data that might be generated.
• Conversely, likelihood comes into play when we have already
observed data and we want to examine how likely certain model
parameters are.

• The distinction between probability and likelihood is


fundamentally important: Probability attaches to possible
results; likelihood attaches to hypotheses.
For Understanding
What is Probability?
Probability is a measure of the likelihood that an event will actually
occur based on information or assumptions that are currently known.
The probability of the event is commonly stated as a number between 0
and 1, where 0 indicates impossibility and 1 indicates inevitability.

To determine probability, use the following formula:

Probability = Number of favorable outcomes / Total number of outcomes

For instance, the probability of getting heads when flipping a fair coin is
0.5 because there are two possible outcomes (heads or tails), and each
outcome has an equal likelihood of occurring.
Probability is used to describe the likelihood of events based on
assumptions or to make predictions about the future.
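• As a quick check of the formula above, the following minimal Python sketch computes a probability as favorable outcomes divided by total outcomes; the die example is added purely for illustration.

```python
# Probability = Number of favorable outcomes / Total number of outcomes
def probability(favorable: int, total: int) -> float:
    return favorable / total

print(probability(1, 2))  # fair coin: P(heads) = 0.5
print(probability(3, 6))  # fair die: P(even number) = 0.5 (illustrative extra example)
```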
For Understanding

• Probability is used to make predictions about future events, whereas likelihood
is used to estimate unknown parameters based on seen evidence.
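• The following minimal Python sketch illustrates this distinction with coin tosses; the observed data (7 heads in 10 tosses) and the candidate bias values are invented for illustration.

```python
from math import comb

def binomial_prob(k: int, n: int, p: float) -> float:
    """P(k heads in n tosses) for a coin with bias p (probability of heads)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability: the parameter is fixed (fair coin, p = 0.5) and the data varies.
for k in range(11):
    print(f"P({k} heads in 10 tosses | p = 0.5) = {binomial_prob(k, 10, 0.5):.4f}")

# Likelihood: the data is fixed (7 heads observed in 10 tosses) and the parameter varies.
observed_heads, tosses = 7, 10
for p in (0.3, 0.5, 0.7, 0.9):
    print(f"L(p = {p} | 7 heads in 10 tosses) = {binomial_prob(observed_heads, tosses, p):.4f}")
```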
8.2 FUNDAMENTALS OF BAYES THEOREM
• Naïve Bayes Model relies on Bayes theorem that works on the principle of three
kinds of probabilities called prior probability, likelihood probability, and posterior
probability.
• Prior Probability: It is the general probability of an uncertain event before an
observation is seen or some evidence is collected. It is the initial probability that
is believed before any new information is collected.
• Likelihood Probability: It is the relative probability of the observation occurring
for each class, or the sampling density of the evidence given the hypothesis. It is
stated as P (Evidence | Hypothesis), which denotes how likely the evidence is to
occur given the parameters.
• Posterior Probability: It is the updated or revised probability of an event taking
into account the observations from the training data. P (Hypothesis | Evidence)
is the posterior distribution representing the belief about the hypothesis, given
the evidence from the training data. Therefore,
• Posterior probability = prior probability updated with new evidence
8.3 CLASSIFICATION USING BAYES MODEL
• Naïve Bayes Classification models work on the principle of Bayes
theorem.
• Bayes’ rule is a mathematical formula used to determine the
posterior probability, given prior probabilities of events.
• Generally, Bayes theorem is used to select the most probable
hypothesis from data, considering both prior knowledge and
posterior distributions. It is based on the calculation of the posterior
probability and is stated as:
P (Hypothesis h | Evidence E)
• where, Hypothesis h is the target class to be classified and Evidence E
is the given test instance.
• P (Hypothesis h| Evidence E) is calculated from the prior probability P
(Hypothesis h), the likelihood probability P (Evidence E |Hypothesis h)
and the marginal probability P (Evidence E).
• It can be written as:

P (Hypothesis h | Evidence E) = [P (Evidence E | Hypothesis h) × P (Hypothesis h)] / P (Evidence E)    (Eq. 8.1)

• where, P (Hypothesis h) is the prior probability of the hypothesis h


without observing the training data or considering any evidence.
• It denotes the prior belief or the initial probability that the hypothesis h
is correct. P (Evidence E) is the prior probability of the evidence E from
the training dataset without any knowledge of which hypothesis holds.
It is also called the marginal probability.
• P (Evidence E | Hypothesis h) is the conditional probability of Evidence E
given Hypothesis h.
• It is the likelihood probability of observing the Evidence E in the training
data when the hypothesis h is correct.
• P (Hypothesis h | Evidence E) is the posterior probability of
Hypothesis h given Evidence E.
• It is the probability of the hypothesis h after observing the evidence E
from the training data.
• In other words, from the Bayes equation (Eq. 8.1), one can observe
that:
Posterior Probability ∝ Prior Probability × Likelihood Probability
• Bayes theorem helps in calculating the posterior probability for a
number of hypotheses, from which the hypothesis with the highest
probability can be selected.
• This selection of the most probable hypothesis from a set of
hypotheses is formally defined as Maximum A Posteriori (MAP)
Hypothesis
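• A minimal Python sketch of this selection is given below; the prior and likelihood values for the three hypotheses are invented for illustration.

```python
# Hypothetical priors P(h) and likelihoods P(E | h) for three candidate hypotheses
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.2, "h2": 0.6, "h3": 0.4}

# Unnormalized posterior: P(h | E) is proportional to P(E | h) * P(h)
unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnormalized.values())  # P(E), the marginal probability
posteriors = {h: v / evidence for h, v in unnormalized.items()}

# MAP hypothesis: the hypothesis with the highest posterior probability
map_hypothesis = max(posteriors, key=posteriors.get)
print(posteriors)       # {'h1': 0.277..., 'h2': 0.5, 'h3': 0.222...}
print(map_hypothesis)   # 'h2'
```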
• What is Naive Bayes Classifier?
• Naive Bayes classifier is a probabilistic machine learning model based
on Bayes’ theorem. It assumes independence between features and
calculates the probability of a given input belonging to a particular
class. It’s widely used in text classification, spam filtering, and
recommendation systems.
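• As a rough illustration of such a use case, a minimal spam-filtering sketch with scikit-learn might look like the following; it assumes scikit-learn is installed, and the toy messages and labels are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy spam-filtering data (hypothetical messages and labels)
messages = ["win a free prize now", "meeting at 10 am tomorrow",
            "free offer claim your prize", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)   # word-count features

model = MultinomialNB()                  # Naive Bayes variant suited to count data
model.fit(X, labels)

test = vectorizer.transform(["claim your free prize"])
print(model.predict(test))               # expected to print ['spam']
```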
8.3.1 NAÏVE BAYES ALGORITHM
• It is a supervised binary class or multi class classification algorithm that
works on the principle of Bayes theorem.
• There is a family of Naïve Bayes classifiers based on a common principle.
• These algorithms assume that the features of the dataset are independent
of each other and that each feature is given equal weightage.
• It works particularly well for large datasets and is very fast. It is one of the
most effective and simple classification algorithms.
• This algorithm considers all features to be independent of each other given
the class, even though each of them individually depends on the class of the
object being classified.
• Each of the features contributes a probability value independently during
classification and hence this algorithm is called the Naïve Bayes algorithm. Some
important applications of these algorithms are text classification,
recommendation systems and face recognition.
• Solution: The training dataset T consists of 10 data instances with
attributes such as ‘CGPA’, ‘Interactiveness’, ‘Practical Knowledge’
and ‘Communication Skills’ as shown in Table 8.1.
• The target variable is Job Offer which is classified as Yes or No for a
candidate student.
• Step 1: Compute the prior probability for the target feature ‘Job
Offer’. The target feature ‘Job Offer’ has two classes, ‘Yes’ and ‘No’.
• It is a binary classification problem.
• Given a student instance, we need to classify whether ‘Job Offer =
Yes’ or ‘Job Offer = No’.
• From the training dataset, we observe that the frequency or the
number of instances with ‘Job Offer = Yes’ is 7 and ‘Job Offer = No’ is
3.
• The prior probability for the target feature is calculated by dividing
the number of instances belonging to a particular target class by the
total number of instances.
• Hence, the prior probability for ‘Job Offer = Yes’ is 7/10 and ‘Job Offer
= No’ is 3/10 as shown in Table 8.2.
• Step 2: Compute the frequency matrix and likelihood probability for each
of the features.
• Step 2(a): Feature – CGPA. Table 8.3 shows the frequency matrix for the
feature CGPA.
• Table 8.4 shows how the likelihood probability is calculated for CGPA
using conditional probability.
• As explained earlier, the likelihood probability is stated as the sampling
density of the evidence given the hypothesis.
• It is denoted as P (Evidence | Hypothesis), which denotes how likely the
occurrence of the evidence is given the parameters.
• It is calculated as the number of instances with each attribute value and a
given class value divided by the total number of instances with that class value.
• For example P (CGPA ≥9 | Job Offer = Yes) denotes the number of
instances with ‘CGPA ≥9’ and ‘Job Offer = Yes’ divided by the total number
of instances with ‘Job Offer = Yes’.
• From the Table 8.3 Frequency Matrix of CGPA, number of instances with
‘CGPA ≥9’ and ‘Job Offer = Yes’ is 3. The total number of instances with
‘Job Offer = Yes’ is 7. Hence, P (CGPA ≥9 | Job Offer = Yes) = 3/7.
• Similarly, the Likelihood probability is calculated for all attribute values of
feature CGPA.
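• A minimal sketch of this calculation; the only counts used are the ones stated above (3 instances with 'CGPA ≥ 9' and 'Job Offer = Yes' out of 7 'Yes' instances), and the counts for the remaining attribute values would be read off Table 8.3 in the same way.

```python
def likelihood(count_value_and_class: int, count_class: int) -> float:
    """P(attribute value | class) = count of instances with that value and class
    divided by the count of instances with that class."""
    return count_value_and_class / count_class

# P(CGPA >= 9 | Job Offer = Yes) = 3 / 7
print(likelihood(3, 7))   # 0.428571...
```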
• Step 2(b): Feature – Interactiveness. Table 8.5 shows the frequency
matrix for the feature Interactiveness.
8.3.4 Gibbs Algorithm
• The main drawback of Bayes optimal classifier is that it computes the
posterior probability for all hypotheses in the hypothesis space and
then combines the predictions to classify a new instance.
• Gibbs algorithm is a sampling technique which randomly selects a
hypothesis from the hypothesis space according to the posterior
probability distribution and classifies a new instance.
• It is found that, under certain assumptions, the expected prediction error of the Gibbs
algorithm is at most twice that of the Bayes optimal classifier.
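• A minimal sketch of the Gibbs selection step is given below; the posterior values over three hypotheses are invented for illustration.

```python
import random

# Hypothetical posterior probabilities over the hypothesis space
posteriors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}

# Gibbs algorithm: draw ONE hypothesis at random according to the posterior
# distribution, instead of combining the predictions of all hypotheses.
chosen = random.choices(list(posteriors), weights=list(posteriors.values()), k=1)[0]
print(chosen)   # e.g. 'h1' (drawn about half of the time)

# The new instance would then be classified using only the chosen hypothesis.
```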
8.4 NAÏVE BAYES ALGORITHM FOR
CONTINUOUS ATTRIBUTES
• There are two ways to predict with the Naive Bayes algorithm for
continuous attributes:
• 1. Discretize the continuous feature into a discrete feature.
• 2. Apply a Normal or Gaussian distribution for the continuous feature.
Gaussian Naive Bayes Algorithm
In Gaussian Naive Bayes, the values of continuous features are assumed to be
sampled from a Gaussian distribution.
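• A minimal sketch of the Gaussian likelihood computation is given below; the feature value, mean and standard deviation are invented for illustration, and in practice the mean and standard deviation are estimated per class from the training data.

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Normal density used as P(feature value | class) in Gaussian Naive Bayes."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * std)

# Hypothetical: for class 'Yes', a continuous feature has mean 8.5 and std 0.6;
# the likelihood of observing the value 9.0 under that class is:
print(gaussian_pdf(9.0, mean=8.5, std=0.6))
```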
Thank You