
MCSE0007: Machine Learning

Naïve Bayes Classification


Classification Problem
 Classification consists of assigning a class label to a set of unclassified data
 More precisely, a classification problem can be stated as below:

Definition: Classification Problem

Given a database D = {t1, t2, ..., tm} of tuples and a set of classes C = {c1, c2, ..., ck}, the classification problem is to define a mapping f : D → C, where each ti is assigned to one class.

Note that a tuple ti ∈ D is defined by a set of attributes A = {A1, A2, ..., An}.
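To make the definition concrete, here is a minimal Python sketch (the tuples, classes, and the hand-written rule are purely illustrative, not from the slides); a classifier's job is to learn such a mapping f from data:

```python
# A toy database D of tuples and a set of classes C.
D = [("sunny", 30), ("rainy", 18), ("sunny", 22), ("rainy", 12)]
C = {"hot", "cold"}

# A hand-written mapping f : D -> C; a classifier learns such a mapping.
def f(t):
    weather, temperature = t
    return "hot" if temperature >= 20 else "cold"

for t in D:
    print(t, "->", f(t))  # each tuple ti is assigned to exactly one class
```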


Classification Techniques
⚫ A number of classification techniques are known, which can be broadly
classified into the following categories:

1. Statistical-Based Methods
• Regression
• Bayesian Classifier

2. Distance-Based Classification
• K-Nearest Neighbours

3. Decision Tree-Based Classification
• ID3, C4.5, CART

4. Classification using Machine Learning (SVM)

5. Classification using Neural Network (ANN)


Bayesian Classifier
⚫ Principle
⚫ If it walks like a duck, quacks like a duck, then it is probably a duck
Bayesian Classifier …
⚫ A statistical classifier
⚫ Performs probabilistic prediction, i.e., predicts class
membership probabilities
⚫ Foundation
⚫ Based on Bayes’ Theorem.
⚫ Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.
⚫ Called “Naïve” classifier because of these assumptions.
⚫ Empirically proven to be useful.
⚫ Scales very well.
Bayesian Classifier …
⚫ In many applications, the relationship between the attribute set and the class variable is non-deterministic.
o In other words, a test instance cannot be assigned to a class label with certainty.
o In such a situation, the classification can be achieved probabilistically.
⚫ The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.
⚫ More precisely, the Bayesian classifier uses Bayes’ Theorem of probability for classification.
Theory of Probability
Simple Probability
Definition: Simple Probability

If there are n elementary events associated with a random experiment and m of them are favourable to an event A, then the probability of occurrence of A is

P(A) = m / n

⚫ Suppose, A and B are any two events and P(A), P(B) denote the probabilities that the events A and B will occur, respectively.

⚫ Mutually Exclusive Events:
⚫ Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
Example: Tossing a coin (two events)
Rolling a ludo die (six events)
Simple Probability …
⚫ Independent Events: Two events are independent if the occurrence of one does not alter the probability of occurrence of the other.

Example: Tossing a coin and rolling a ludo die together.

Definition: Joint Probability

If P(A) and P(B) are the probabilities of two events, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then P(A ∩ B) = 0

If A and B are independent events, then P(A ∩ B) = P(A) · P(B)

Thus, for mutually exclusive events

P(A ∪ B) = P(A) + P(B)
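As a quick check of these identities, here is a minimal Python sketch (the coin-and-die sample space is illustrative) that verifies the addition rule and, for independent events, P(A ∩ B) = P(A) · P(B) by exhaustive counting:

```python
from fractions import Fraction
from itertools import product

# Sample space: toss a coin and roll a die together (12 equally likely outcomes).
omega = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

def prob(event):
    """Probability of an event (a subset of outcomes) by counting."""
    return Fraction(len(event), len(omega))

A = {(c, d) for c, d in omega if c == "H"}      # coin shows heads
B = {(c, d) for c, d in omega if d % 2 == 0}    # die shows an even number

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# A and B are independent, so P(A ∩ B) = P(A) · P(B)
assert prob(A & B) == prob(A) * prob(B)
print(prob(A), prob(B), prob(A & B))  # 1/2 1/2 1/4
```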
Conditional Probability
Definition: Conditional Probability

If events are dependent, then their probability is expressed by conditional probability. The probability that A occurs given that B has occurred is denoted by P(A|B).

Suppose A and B are two events associated with a random experiment. The probability of A under the condition that B has already occurred, with P(B) ≠ 0, is given by

P(A|B) = (Number of events in B which are favourable to A) / (Number of events in B)
       = (Number of events favourable to A ∩ B) / (Number of events favourable to B)
       = P(A ∩ B) / P(B)
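Continuing the counting view, a minimal Python sketch (the die example is illustrative): P(A|B) computed directly as the fraction of outcomes in B that also lie in A.

```python
from fractions import Fraction

# Roll a fair die once.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 3, 5}        # outcome is a prime number
B = {2, 4, 6}        # outcome is an even number

# P(A|B) = |A ∩ B| / |B| = P(A ∩ B) / P(B)
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)   # 1/3: only the outcome 2 is both prime and even
```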
Conditional Probability …
Corollary: Conditional Probability

P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0
or P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0

For three events A, B and C

P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

For n events A1, A2, ..., An, if all events are mutually independent of each other,

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) · P(A2) ... P(An)

Note:
P(A|B) = 0 if the events are mutually exclusive
P(A|B) = P(A) if A and B are independent
P(A|B) · P(B) = P(B|A) · P(A) otherwise, since P(A ∩ B) = P(B ∩ A)
Prior and Posterior Probabilities

⚫ P(A) and P(B) are called prior probabilities
⚫ P(A|B), P(B|A) are called posterior probabilities

Example 8.6: Prior versus Posterior Probabilities

    X    Y
    x1   A
    x2   A
    x3   B
    x3   A
    x2   B
    x1   A
    x1   B
    x3   B
    x2   B
    x2   A

⚫ This table shows that the event Y has two outcomes, namely A and B, which depend on another event X with outcomes x1, x2 and x3.
⚫ Case 1: Suppose we don’t have any information about X. Then, from the given sample space, we can calculate P(Y = A) = 5/10 = 0.5.
⚫ Case 2: Now, suppose we want to calculate P(X = x2 | Y = A) = 2/5 = 0.4.
⚫ The latter is the conditional or posterior probability, whereas the former is the prior probability.
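The two cases can be reproduced by counting over the table; here is a minimal Python sketch (using fractions to keep the arithmetic exact):

```python
from fractions import Fraction

# The (X, Y) sample from Example 8.6.
data = [("x1", "A"), ("x2", "A"), ("x3", "B"), ("x3", "A"), ("x2", "B"),
        ("x1", "A"), ("x1", "B"), ("x3", "B"), ("x2", "B"), ("x2", "A")]

# Case 1: prior probability P(Y = A), ignoring X entirely.
prior = Fraction(sum(y == "A" for _, y in data), len(data))

# Case 2: posterior P(X = x2 | Y = A), counting only the rows where Y = A.
rows_A = [x for x, y in data if y == "A"]
posterior = Fraction(rows_A.count("x2"), len(rows_A))

print(prior, posterior)  # 1/2 2/5
```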
Naïve Bayesian Classifier
⚫ Suppose Y is a class variable and X = {X1, X2, ..., Xn} is a set of attributes; each training tuple pairs an instance of X with an instance of Y.

    INPUT (X)            CLASS (Y)
    ...                  ...
    x1, x2, ..., xn      yi
    ...                  ...

⚫ The classification problem can then be expressed as the class-conditional probability

P(Y = yi | X1 = x1 AND X2 = x2 AND ... AND Xn = xn)
Naïve Bayesian Classifier …
⚫ The naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, as follows.
⚫ From Bayes’ theorem on conditional probability, we have

P(Y|X) = P(X|Y) · P(Y) / P(X)
       = P(X|Y) · P(Y) / [P(X|Y = y1) · P(Y = y1) + ... + P(X|Y = yk) · P(Y = yk)]

where
P(X) = Σ (i = 1 to k) P(X|Y = yi) · P(Y = yi)

Note:
 P(X) is called the evidence (also the total probability) and it is a constant.
 The probability P(Y|X) (also called the class-conditional probability) is therefore proportional to P(X|Y) · P(Y).
 Thus, P(Y|X) can be taken as a measure of the plausibility of Y given X:

P(Y|X) ∝ P(X|Y) · P(Y)
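A minimal numeric sketch of this computation (the priors and likelihoods are made-up illustrative numbers): the evidence P(X) is just the normalizing sum, so the posteriors come out of P(X|Y) · P(Y) after normalization.

```python
# Hypothetical two-class example: priors P(Y = yi) and likelihoods P(X = x | Y = yi).
priors = {"y1": 0.6, "y2": 0.4}
likelihoods = {"y1": 0.10, "y2": 0.30}   # P(X = x | Y = yi) for one observed x

# Evidence P(X) = sum_i P(X|Y = yi) * P(Y = yi) -- a constant for this x.
evidence = sum(likelihoods[y] * priors[y] for y in priors)

# Posterior P(Y = yi | X) = P(X|Y = yi) * P(Y = yi) / P(X).
posteriors = {y: likelihoods[y] * priors[y] / evidence for y in priors}
print(posteriors)  # {'y1': 0.333..., 'y2': 0.666...}: y2 is the stronger class
```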
Naïve Bayesian Classifier …
⚫ Suppose, for a given instance of X (say x = (X1 = x1, ..., Xn = xn)), consider any two class-conditional probabilities, namely P(Y = yi | X = x) and P(Y = yj | X = x).

⚫ If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj for the instance X = x.

⚫ The strongest yi is the classification for the instance X = x.


Naïve Bayesian Classifier …
The Naïve Bayes algorithm (for discrete input attributes) has two phases:

1. Learning Phase: Given a training set S,
   for each target value ci (ci ∈ {c1, ..., cL}):
       P̂(C = ci) ← estimate P(C = ci) with the examples in S;
       for every attribute value xjk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj):
           P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with the examples in S;
   Output: conditional probability tables; for each Xj, a table of Nj × L elements.
   Learning is easy: just create probability tables.

2. Test Phase: Given an unknown instance X′ = (a1, ..., an),
   look up the tables to assign the label c* to X′ if
       [P̂(a1|c*) ⋯ P̂(an|c*)] · P̂(c*) > [P̂(a1|c) ⋯ P̂(an|c)] · P̂(c) for all c ≠ c*, c ∈ {c1, ..., cL}
   Classification is easy: just multiply probabilities.

A runnable sketch of both phases follows.
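Below is a minimal, self-contained Python version of the two phases for discrete attributes (the toy weather data and the add-one smoothing are illustrative assumptions, not part of the slides):

```python
from collections import Counter, defaultdict

def train(S):
    """Learning phase: build the probability tables from training set S."""
    class_counts = Counter(c for _, c in S)
    n_attrs = len(S[0][0])
    cond = defaultdict(int)    # cond[(j, v, c)] = count of attribute j = v in class c
    values = defaultdict(set)  # distinct values Nj of each attribute Xj
    for x, c in S:
        for j, v in enumerate(x):
            cond[(j, v, c)] += 1
            values[j].add(v)
    priors = {c: n / len(S) for c, n in class_counts.items()}
    def likelihood(j, v, c):
        # Add-one (Laplace) smoothing so unseen values don't zero the product.
        return (cond[(j, v, c)] + 1) / (class_counts[c] + len(values[j]))
    return priors, likelihood, n_attrs

def classify(x, priors, likelihood, n_attrs):
    """Test phase: pick the class c maximizing P̂(c) * prod_j P̂(a_j | c)."""
    def score(c):
        s = priors[c]
        for j in range(n_attrs):
            s *= likelihood(j, x[j], c)
        return s
    return max(priors, key=score)

# Toy training set: (outlook, windy) -> play
S = [(("sunny", "no"), "yes"), (("sunny", "yes"), "no"),
     (("rainy", "yes"), "no"), (("overcast", "no"), "yes"),
     (("rainy", "no"), "yes")]
priors, likelihood, n_attrs = train(S)
print(classify(("sunny", "no"), priors, likelihood, n_attrs))  # -> 'yes'
```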


Naïve Bayesian Classifier …
• Naïve Bayes is based on the independence assumption
– Training is very easy and fast; it only requires considering each attribute in each class separately
– Testing is straightforward; it only requires looking up tables or calculating conditional probabilities (with normal distributions for continuous attributes)

• Naïve Bayes is a popular generative classifier model

1. The performance of naïve Bayes is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
2. It has many successful applications, e.g., spam mail filtering (see the sketch below)
3. It is a good candidate as a base learner in ensemble learning
4. Apart from classification, naïve Bayes can do more…
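As one concrete application, here is a minimal spam-filtering sketch using scikit-learn's MultinomialNB (a multinomial naïve Bayes; the toy messages are illustrative, and scikit-learn is assumed to be installed):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative corpus: 1 = spam, 0 = ham.
messages = ["win a free prize now", "cheap meds online", "free offer click now",
            "meeting at noon tomorrow", "please review the attached report",
            "lunch with the team today"]
labels = [1, 1, 1, 0, 0, 0]

# Bag-of-words counts play the role of the discrete attributes X1, ..., Xn.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

model = MultinomialNB()  # learns P(word | class) tables and class priors
model.fit(X, labels)

test = vectorizer.transform(["free prize offer", "report for the meeting"])
print(model.predict(test))  # expected: [1 0]
```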
Any Questions?
