
Naïve Bayes

Lương Thái Lê
Course Outline

1. Introduction to probability

2. Naïve Bayes Theorem

3. Naïve Bayes Algorithm

4. Naïve Bayes Examples

5. Naïve Bayes Problems

6. Naïve Bayes Conclusion


Basic concepts of probability
• Suppose we have an experiment (e.g. rolling a die) whose outcome is random
• Space of possibilities S: the set of all possible outcomes
• Eg: S = {1,2,3,4,5,6} for the die-rolling experiment
• An event E: a subset of S
• Eg: E = {1,3,5}
• Event space W: the set of all possible events, i.e. the power set P(S)
• Random variable A: a variable that represents an event; there is a degree of
probability that this event will occur
• Eg: A = “The number of dots is odd when the die is rolled”
Probability representation
• P(A) = the fraction of the event space (W) in which A is true
Some Properties
• 0 ≤ P(A) ≤ 1
• P(not A) = 1 − P(A)
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
• Suppose a random variable A can take one of k (> 2) values
{v1, v2, …, vk}, then:
• P(A = vi ∧ A = vj) = 0 if i ≠ j
• P(A = v1 ∨ A = v2 ∨ … ∨ A = vk) = Σ_{i=1}^{k} P(A = vi) = 1
• P(B ∧ [A = v1 ∨ A = v2 ∨ … ∨ A = vm]) = Σ_{i=1}^{m} P(B ∧ A = vi)
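A minimal Python sketch (not part of the slides) that checks these properties on the die-rolling example above; the events A and B below are illustrative choices:

# Checking basic probability properties for a fair die, S = {1,...,6}.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(E) = |E| / |S| for equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {1, 3, 5}   # "the number of dots is odd"
B = {4, 5, 6}   # "the number of dots is greater than 3" (illustrative event)

assert prob(S - A) == 1 - prob(A)                      # P(not A) = 1 - P(A)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)  # P(A or B) = P(A)+P(B)-P(A and B)
assert sum(prob({v}) for v in S) == 1                  # probabilities of all values sum to 1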
Conditional Probability
• P(A|B) is the part of the space W in which A is true, provided
that B is true
• Eg:
A: “I will play football tomorrow”
B: “It won’t rain tomorrow”
=> P(A|B) is the probability that I will play football tomorrow if it
won’t rain
• Let P(A ∧ B) = P(A,B), then
P(A|B) = P(A,B) / P(B)
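A small Python sketch (not from the slides) of the definition P(A|B) = P(A,B)/P(B), reusing the die example; the events are illustrative:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # "odd number of dots"
B = {4, 5, 6}   # "more than 3 dots"

def prob(event):
    return Fraction(len(event & S), len(S))

p_A_given_B = prob(A & B) / prob(B)   # P(A,B) / P(B)
print(p_A_given_B)                    # 1/3: only the outcome 5 is both odd and > 3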
Probability independent variables (1)
• Two events (random variables) A and B are said to be probabilistically
independent if the probability of event A is the same whether:
• B happens
• B does not happen
• Eg:
• A: I will go swimming tomorrow
• B: Long will go swimming tomorrow

P(A|B) = P(A)
Probability independent variables (2)
Conditional independence (more than two variables)
• P(A|B,C) is the probability of A when B and C are known
• Two variables A and C are said to be conditionally independent
given variable B if the probability of A given B is equal to
the probability of A given both B and C:
P(A|B,C) = P(A|B)
• Eg:
A: I will play football tomorrow
B: The football match will take place indoors
C: It will rain tomorrow
Important Rules of Probability
• Chain rule:
• P(A,B) = P(A|B) P(B) = P(B|A) P(A)
• P(A|B) = P(A,B)/P(B) = P(B|A).P(A)/P(B)
• P(A,B|C) = P(A,B,C)/P(C) = P(A|B,C).P(B,C)/P(C)
= P(A|B,C).P(B|C)
• Probabilistic independence and conditional independence
• P(A|B) = P(A); if A and B are probabilistically independent
• P(A,B|C) = P(A|C).P(B|C); if A and B are conditionally
independent given C
• P(A1,…,An|C) = P(A1|C)…P(An|C); if the Ai are conditionally
independent given C
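As an illustration (not from the slides), the chain rule can be checked numerically on a small, made-up joint distribution over two binary variables:

from fractions import Fraction

# Made-up joint probabilities P(A=a, B=b) for two binary variables.
joint = {
    (True, True):   Fraction(3, 10),
    (True, False):  Fraction(2, 10),
    (False, True):  Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

def p_B(b):
    return sum(p for (_, bb), p in joint.items() if bb == b)

def p_A_given_B(a, b):
    return joint[(a, b)] / p_B(b)

# Chain rule: P(A,B) = P(A|B) P(B) for every combination of values.
for (a, b), p_ab in joint.items():
    assert p_ab == p_A_given_B(a, b) * p_B(b)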

Bayes Theorem

P(h|D) = P(D|h) · P(h) / P(D)

• P(h): the prior probability of the (categorical) hypothesis h
• P(D): the prior probability of observing the data D
• P(D|h): the conditional probability of observing the data D, given that the
hypothesis (category) h is true
• P(h|D): the conditional probability of the (categorical) hypothesis h being
true, given that the data D is observed
=> Probabilistic classification methods use this conditional probability
(called the posterior probability), as in the numeric sketch below
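A minimal numeric sketch of Bayes' theorem; the prior and likelihood values are illustrative assumptions, not taken from the slides:

# P(h|D) = P(D|h) P(h) / P(D), with P(D) = P(D|h) P(h) + P(D|not h) P(not h).
p_h = 0.6               # assumed prior P(h), e.g. "he plays tennis"
p_D_given_h = 0.2       # assumed likelihood P(D|h)
p_D_given_not_h = 0.05  # assumed likelihood P(D|not h)

p_D = p_D_given_h * p_h + p_D_given_not_h * (1 - p_h)  # total probability of the data
p_h_given_D = p_D_given_h * p_h / p_D                  # posterior
print(round(p_h_given_D, 3))                           # 0.857 for these assumed values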
Bayes Theorem – Example (1)
• Suppose we have this data set
Bayes Theorem – Example (2)
• Data D: Outlook is Sunny and Wind is Strong
• Categorical hypothesis h: He plays tennis
• The prior probability P(h): probability that he plays tennis (regardless of
Outlook and Wind)
• The prior probability P(D): probability that Outlook is Sunny and
Wind is Strong
• P(D|h): probability that Outlook is Sunny and Wind is Strong, given that he
plays tennis
• P(h|D): probability that he plays tennis, given that Outlook is Sunny and Wind
is Strong
Maximum a Posteriori – MAP
• Given a set of possible hypotheses (target classes) H, the learning
system will find the most probable hypothesis h(∈H) for the observed
data D
• This hypothesis h is called the maximum a posteriori (MAP) hypothesis:

hMAP = argmax_{h∈H} P(h|D)
     = argmax_{h∈H} P(D|h) · P(h) / P(D)   (Bayes theorem)
     = argmax_{h∈H} P(D|h) · P(h)          (P(D) is the same for all h)
MAP - Example
• The set H includes 2 hypotheses:
• h1: He plays tennis
• h2: He does not play tennis
• Calculate the 2 conditional probabilities: P(h1|D), P(h2|D)
• If P(h1|D) > P(h2|D) then hMAP = h1,
else hMAP = h2
• Because P(D) = P(D,h1) + P(D,h2) is the same for both hypotheses h1
and h2, P(D) can be ignored.
• So if P(D|h1).P(h1) > P(D|h2).P(h2) then he plays tennis, else he
doesn't
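A short sketch of this MAP decision rule; the prior and likelihood numbers are illustrative assumptions, not values from the slide's data set:

# h_MAP = argmax_h P(D|h) * P(h); P(D) is ignored because it is the same for all h.
priors = {"plays": 9/14, "does_not_play": 5/14}       # assumed P(h)
likelihoods = {"plays": 0.05, "does_not_play": 0.10}  # assumed P(D|h)

h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_map)   # "does_not_play" for these assumed numbers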
Maximum Likelihood Estimation (MLE)
• Suppose that all hypotheses have the same prior probability:
P(hi) = P(hj), ∀ hi, hj ∈ H
• The MLE method finds the hypothesis that maximizes the value
P(D|h), where P(D|h) is called the likelihood of the data D for h
• The maximum likelihood hypothesis:
hMLE = argmax_{h∈H} P(D|h)
MLE - Example
• The set H includes 2 hypotheses:
• h1: He plays tennis
• h2: He does not play tennis
• D: the data (days) in which the Outlook attribute is Sunny and the Wind is
Strong
• Find P(D|h1) and P(D|h2):
• P(Outlook=Sunny, Wind=Strong|h1) = 1/8
• P(Outlook=Sunny, Wind=Strong|h2) = 1/4
=> hMLE = h2 => He does not play tennis
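The same comparison in code, using only the likelihoods quoted above (MLE ignores the priors):

likelihoods = {"plays": 1/8, "does_not_play": 1/4}   # P(D|h) as quoted on the slide
h_mle = max(likelihoods, key=likelihoods.get)
print(h_mle)   # "does_not_play"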

Naïve Bayes Classification – Idea (1)
• Classification problem:
• Input:
• A training data set D = {(x(i), c(j))}; i = 1,…,m; j = 1,…,k, where:
• x: a training example, represented by an n-dimensional vector (x1, x2, …, xn)
• c: a class label from the target class set C = {c1, c2, …, ck}
• A test example z that does not belong to D
• Output:
• A classification model F
• The class that F determines z belongs to
• Motivation: find cMAP, the most suitable class for z:
cMAP = argmax_{ci∈C} P(ci|z1, z2, …, zn)
Naïve Bayes Classification – Idea (2)
• Because P(z1, z2, …, zn) is the same for every class ci, we instead find:
cMAP = argmax_{ci∈C} P(z1, z2, …, zn|ci) P(ci)
• Assumption of the Naïve Bayes classifier: the attributes are conditionally
independent given the class:
P(z1, z2, …, zn|ci) = ∏_{j=1}^{n} P(zj|ci)
• Naïve Bayes finds the most likely class for z:
cNB = argmax_{ci∈C} P(ci) ∏_{j=1}^{n} P(zj|ci)
Naïve Bayes Classification – Alg
• Training phase: from the training examples x = (x1, x2, …, xn)
• Calculate P(ci) for each class ci ∈ C
• Calculate P(xj|ci) for each attribute value xj of vector x
• Classification phase: for a new example z = (z1, z2, …, zn)
• Calculate P(ci) ∏_{j=1}^{n} P(zj|ci) for each class ci
• Find the most suitable class c* for z (see the code sketch below):
c* = argmax_{ci∈C} P(ci) ∏_{j=1}^{n} P(zj|ci)
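A minimal Python sketch of this training/classification procedure for categorical attributes; the function names and the tiny weather data set are made up for illustration and are not part of the slides:

from collections import Counter, defaultdict

def train(examples):
    """examples: list of (attribute_dict, class_label). Returns P(ci) and the counts for P(xj|ci)."""
    class_counts = Counter(label for _, label in examples)
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    cond_counts = defaultdict(Counter)   # cond_counts[(class, attribute)][value] = count
    for attrs, label in examples:
        for attr, value in attrs.items():
            cond_counts[(label, attr)][value] += 1
    return priors, cond_counts, class_counts

def classify(z, priors, cond_counts, class_counts):
    """Return c* = argmax_ci P(ci) * prod_j P(zj|ci)."""
    scores = {}
    for c in priors:
        score = priors[c]
        for attr, value in z.items():
            score *= cond_counts[(c, attr)][value] / class_counts[c]   # P(zj|ci)
        scores[c] = score
    return max(scores, key=scores.get)

# Usage with a made-up weather data set:
data = [({"Outlook": "Sunny", "Wind": "Strong"}, "No"),
        ({"Outlook": "Sunny", "Wind": "Weak"},   "Yes"),
        ({"Outlook": "Rain",  "Wind": "Weak"},   "Yes"),
        ({"Outlook": "Rain",  "Wind": "Strong"}, "No")]
priors, cond_counts, class_counts = train(data)
print(classify({"Outlook": "Sunny", "Wind": "Weak"}, priors, cond_counts, class_counts))  # "Yes"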

Naïve Bayes Classification – Example (1)
Would a young student with a medium income and a fair credit rating buy a computer?
Naïve Bayes Classification – Example (2)
• Problem modeling:
• z = (Age=Young, Income=Medium, Student=Yes, Credit_Rating=Fair)
• There are 2 classes: c1 (Buys computer); c2 (Does not buy computer)
• Calculate the prior probabilities
• P(c1) = 9/14
• P(c2) = 5/14
• Calculate the probability of each attribute value for each class
Naïve Bayes Classification – Example (3)
• Calculate the probability (likelihood) of the example z for each class ci:
• P(z|c1) = 0.044
• P(z|c2) = 0.019
• Determine the most probable class:
• P(c1) P(z|c1) = (9/14) × 0.044 = 0.028
• P(c2) P(z|c2) = (5/14) × 0.019 = 0.007

=> Conclusion: He (z) will buy a computer
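The final step can be reproduced in a few lines; the priors and the likelihood values 0.044 and 0.019 are those quoted on the slide (the underlying data table is not reproduced here):

priors = {"buys": 9/14, "does_not_buy": 5/14}
likelihood_of_z = {"buys": 0.044, "does_not_buy": 0.019}   # P(z|ci) from the slide

scores = {c: priors[c] * likelihood_of_z[c] for c in priors}
print({c: round(s, 3) for c, s in scores.items()})   # {'buys': 0.028, 'does_not_buy': 0.007}
print(max(scores, key=scores.get))                    # "buys" -> he will buy a computer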



Naïve Bayes Classification – Problems (1)
• If no training example associated with class ci has the attribute value zj, then:
P(zj|ci) = 0 ⇒ P(ci) ∏_{j=1}^{n} P(zj|ci) = 0
• Solution: use a Bayesian (m-estimate) approach to estimate P(zj|ci), as sketched below:
P(zj|ci) = (n(ci, zj) + m·p) / (n(ci) + m)
• n(ci) = number of training examples associated with ci
• n(ci, zj) = number of training examples associated with ci that have attribute value zj
• p: a prior estimate of the probability value P(zj|ci)
=> p = 1/k if the attribute has k possible values
• m: a chosen weight
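A small sketch of this m-estimate; the function name and the example counts are illustrative, not from the slides:

def m_estimate(n_ci_zj, n_ci, k, m=1.0):
    """Smoothed estimate of P(zj|ci) for an attribute with k possible values (p = 1/k)."""
    p = 1.0 / k
    return (n_ci_zj + m * p) / (n_ci + m)

# Even when the value zj never occurs with class ci, the estimate stays above zero:
print(m_estimate(n_ci_zj=0, n_ci=9, k=3, m=3))   # 0.0833... instead of 0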
Naïve Bayes Classification – Problems (2)
• Limited precision of computer arithmetic (floating-point underflow)
• Each P(zj|ci) < 1, so if the number of attributes n is large then:
lim_{n→∞} ∏_{j=1}^{n} P(zj|ci) = 0
• Solution: use the logarithm of the probability values, as sketched below
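A short sketch (not from the slides) of the log-space trick: log(P(ci) ∏_j P(zj|ci)) = log P(ci) + Σ_j log P(zj|ci), and the argmax is unchanged. The numbers below are illustrative:

import math

def log_score(prior, cond_probs):
    """prior = P(ci); cond_probs = [P(z1|ci), ..., P(zn|ci)], assumed already smoothed (> 0)."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)

# With many attributes the plain product underflows to 0.0, but the log score does not:
many_small_probs = [0.01] * 200
print(math.prod(many_small_probs))        # 0.0 (underflow)
print(log_score(0.5, many_small_probs))   # about -921.7, still usable for the argmax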

Naïve Bayes Classification – Conclusion
• One of the most commonly used machine learning methods in practice
• Although it assumes that the attributes are conditionally independent given the
class, the Naïve Bayes classifier still achieves good classification results
in many practical applications
• When to use?
• The training data set has a medium or large size
• Examples are represented by a large number of attributes
• Attributes are conditionally independent given the target classes
