Lecture 11

Bayesian Decision Theory

Primary source of reference: Pattern Classification by Duda and Hart
An Example
• "Sorting incoming fish on a conveyor according to species using optical sensing"
• Species: sea bass (Class 1) or salmon (Class 2)
• Problem Analysis
  – Set up a camera and take some sample images to extract features such as:
    • Length of the fish
    • Lightness (based on the gray level)
    • Width of the fish
This is a linear classifier, like the Perceptron.
Introduction
• The sea bass/salmon example (a two-class problem)
  – For example, suppose we randomly catch 100 fish and 75 of them are sea bass and 25 are salmon.
  – Let the rule in this case be: for any fish, say its class is sea bass.
  – What is the error rate of this rule? (Here it is 0.25, since every salmon is misclassified.)
  – This information, which is independent of feature values, is called a priori knowledge.
• Let the two classes be ω1 and ω2
  – P(ω1) + P(ω2) = 1
  – The state of nature (class) is a random variable
  – If P(ω1) = P(ω2), we say the priors are uniform
    • The catch of salmon and sea bass is equally probable
• Decision rule with only the prior information
  – Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• This is not a good classifier.
• We should take feature values into account!
• If x is the pattern we want to classify, then use the rule:

  If P(ω1 | x) > P(ω2 | x), then assign class ω1;
  else assign class ω2.

• P(ω1 | x) is called the posterior probability of class ω1 given that the pattern is x.
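As a small illustration of the two rules above (the prior-only rule versus the posterior-based rule), here is a minimal Python sketch. The function names are mine, and the posterior values are simply assumed to be available; how to obtain them via Bayes rule is the subject of the next slides.

```python
# Minimal sketch of the two decision rules described above.
# The posterior values are assumed to be given here (hypothetical arguments).

def decide_with_priors(p_w1, p_w2):
    # Prior-only rule: always pick the class with the larger prior.
    return "w1" if p_w1 > p_w2 else "w2"

def decide_with_posteriors(p_w1_given_x, p_w2_given_x):
    # Posterior-based rule: pick the class with the larger posterior P(class | x).
    return "w1" if p_w1_given_x > p_w2_given_x else "w2"

print(decide_with_priors(0.75, 0.25))       # 'w1' for every pattern
print(decide_with_posteriors(0.4, 0.6))     # 'w2' for this particular x
```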
Bayes rule
• From data it might be possible for us to estimate p(x | ωj), where j = 1 or 2. These are called class-conditional distributions.
• It is also easy to find the a priori probabilities P(ω1) and P(ω2). How can this be done? (For example, from the fraction of each class in a random sample.)
• Bayes rule combines the a priori probabilities with the class-conditional distributions to find the posterior probabilities.
Bayes Rule

             P(A, B)     P(A|B) * P(B)
  P(B|A) = ----------- = ---------------
              P(A)            P(A)

This is Bayes Rule.

Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
               p(x | ωj) . P(ωj)
  P(ωj | x) = -------------------
                     p(x)

  – where, in the case of two categories,

    p(x) = Σ (j = 1 to 2) p(x | ωj) P(ωj)

                  Likelihood . Prior
  – Posterior = ----------------------
                       Evidence
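As a minimal sketch of the formula above (the function and variable names are mine, not from the lecture), the two-class posterior can be computed directly from the likelihood and prior values:

```python
# Minimal sketch: P(w_j | x) = p(x | w_j) P(w_j) / p(x) for two categories,
# where p(x) = sum_j p(x | w_j) P(w_j) is the evidence.

def posteriors(likelihoods, priors):
    """likelihoods[j] = p(x | w_j) at the observed x; priors[j] = P(w_j)."""
    evidence = sum(likelihoods[j] * priors[j] for j in likelihoods)  # p(x)
    return {j: likelihoods[j] * priors[j] / evidence for j in likelihoods}

# Illustrative numbers only (not from the lecture):
print(posteriors({"w1": 0.6, "w2": 0.2}, {"w1": 0.75, "w2": 0.25}))
# -> {'w1': 0.9, 'w2': 0.1}; the posteriors always sum to 1.
```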
• Decision given the posterior probabilities

  x is an observation for which:
    if P(ω1 | x) > P(ω2 | x), decide that the true state of nature is ω1
    if P(ω1 | x) < P(ω2 | x), decide that the true state of nature is ω2

  Therefore, whenever we observe a particular x, the probability of error is:
    P(error | x) = P(ω1 | x) if we decide ω2
    P(error | x) = P(ω2 | x) if we decide ω1
• Minimizing the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

  Therefore:
    P(error | x) = min [P(ω1 | x), P(ω2 | x)]
    (the error of the Bayes decision)
Average error rate
The average probability of error, P(error), is:

  P(error) = ∫ P(error | x) p(x) dx

This is the expected value of P(error | x) w.r.t. x, i.e., E_x[P(error | x)].
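The integral above can be approximated numerically. The sketch below does this for two assumed Gaussian class-conditional densities; all numeric values (means, priors, integration range) are illustrative assumptions, not numbers from the lecture. It uses P(error | x) = min[P(ω1 | x), P(ω2 | x)] from the previous slide.

```python
# Numerical sketch of P(error) = ∫ min[P(w1|x), P(w2|x)] p(x) dx.
# The unit-variance Gaussian class-conditionals below are assumed for illustration.
from math import exp, pi, sqrt

priors = {"w1": 0.75, "w2": 0.25}
means  = {"w1": 4.0,  "w2": 6.0}

def p_x_given(label, x):
    # Assumed class-conditional density p(x | w_label): Gaussian, unit variance.
    return exp(-0.5 * (x - means[label]) ** 2) / sqrt(2.0 * pi)

def bayes_error(lo=-5.0, hi=15.0, n=20000):
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        joint = {c: p_x_given(c, x) * priors[c] for c in priors}  # p(x|w_c) P(w_c)
        # min posterior * evidence == min joint, so the evidence cancels here.
        total += min(joint.values()) * dx
    return total

print(bayes_error())   # the Bayes error: a lower bound for this assumed model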
• Consider a one-dimensional two-class problem. The feature used is the color of the fish; the color can be either white or dark. P(ω1) = 0.75, P(ω2) = 0.25.

• But what is the error if we use only the a priori probabilities?

• Same error? Where is the advantage?!

• But P(error) based on the a priori probabilities only is 0.5.
• The error of the Bayes classifier is the lower bound.
  – Any classifier's error is greater than or equal to this, since for every x the conditional error of any decision rule is at least min [P(ω1 | x), P(ω2 | x)].
• One can prove this!
• Consider a one-dimensional two-class problem. The feature used is the color of the fish; the color can be either white or dark. P(ω1) = 0.75, P(ω2) = 0.25.
• Can you solve this?
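A possible way to work this out is sketched below. The priors P(ω1) = 0.75 and P(ω2) = 0.25 come from the problem statement, but the class-conditional probabilities of the color feature are not preserved in this text, so the values used here are assumed purely for illustration; the numbers worked out on the original slides may differ.

```python
# Sketch of the one-dimensional white/dark example. Priors are from the slide;
# the class-conditional probabilities P(color | class) are assumed values.

priors = {"w1": 0.75, "w2": 0.25}              # sea bass, salmon
likelihood = {                                  # assumed, for illustration only
    "w1": {"white": 0.8, "dark": 0.2},
    "w2": {"white": 0.3, "dark": 0.7},
}

def posteriors(color):
    evidence = sum(likelihood[c][color] * priors[c] for c in priors)  # P(color)
    return {c: likelihood[c][color] * priors[c] / evidence for c in priors}, evidence

bayes_error = 0.0
for color in ("white", "dark"):
    post, evidence = posteriors(color)
    decision = max(post, key=post.get)
    bayes_error += min(post.values()) * evidence
    print(color, "-> decide", decision, post)

print("Bayes error:", bayes_error)   # compare with the prior-only error of 0.25
```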
A priori probabilities play an important role.
This is knowledge about the domain.
Example
• Given the height of a person, we wish to classify whether he/she is from India or Nepal.
• We assume that there are no other classes. (Each and every person belongs either to the class "India" or to the class "Nepal".)
• For the time being, assume that we have only height. (Only one feature.)
Example: continued …
• Let h be the height and c be the class of a person.
• Let the height be discretized as 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0.
• If the height is 5.6, we round it to 5.5.
• We randomly took 100 people who are all Nepalis. For each height value, we counted how many people there are.
Example: continued
• If we randomly take 100 Nepalis, their heights are distributed as below.
• From the counts we found probabilities (these are approximate probability values; a short sketch of this estimation follows the table).
• These probabilities are called class-conditional probabilities, i.e., P(h | Nepal).
• For example, P(h = 3.5 | class = Nepal) = 0.1

Height       2    2.5   3     3.5   4     4.5   5     5.5   6     6.5   7     7.5   8
Count        0    1     5     10    10    25    25    10    10    4     0     0     0
Probability  0    0.01  0.05  0.1   0.1   0.25  0.25  0.1   0.1   0.04  0     0     0
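The estimation described above can be sketched in a few lines: count how many of the 100 sampled Nepalis fall at each discretized height and divide by the sample size. The code below uses the counts from the table; the variable names are mine.

```python
# Minimal sketch: estimating the class-conditional probabilities P(h | Nepal)
# from the counts in the table above (100 randomly sampled Nepalis).

heights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
counts  = [0,   1,   5,   10,  10,  25,  25,  10,  10,  4,   0,   0,   0]

total = sum(counts)                                   # 100 people in the sample
p_h_given_nepal = {h: c / total for h, c in zip(heights, counts)}

print(p_h_given_nepal[3.5])                           # 0.1, i.e. P(h = 3.5 | Nepal)
```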
Class-conditional Distribution
• Class-conditional distribution for Nepalis
[Bar chart: probability (0 to 0.25) versus height (2 to 8), plotting the values in the table above]
Example: continued …
• Similarly, we randomly took 100 persons who are Indians and found their respective class-conditional probabilities.
[Bar chart: class-conditional distribution for the class "India", probability versus height (2 to 8)]
Example: continued …
• So you took these probabilities to IIIT Sri City.
• You are asked to classify a student whose height is 4.5.
• You searched the tables and found that P(4.5 | "Nepal") = 0.25 and P(4.5 | "India") = 0.1.
• So, you declared that the person is a Nepali.
• … Somewhere … something is wrong …!
Example: continued …
• The security person at the gate who is watching you says, in a surprised tone, "Sir, don't you know that in our college we have only Indians and there are no Nepalis?"
• This is what is called prior knowledge.
• If you randomly take 100 people and 50 of them are Indians and 50 of them are Nepalis, then the rule you applied is correct.
  – In IIITS, if you randomly take 100 students, all of them will be Indians… so this rule is incorrect!
Example: continued …
• Actually, you need to find out P(Nepal | height = 4.5) and P(India | height = 4.5), and classify accordingly.
• These are called posterior probabilities.
Posterior Probability: Bayes Rule
• P(class = Nepal | height = 4.5)

       P(height = 4.5 | class = Nepal) P(Nepal)
    = ------------------------------------------
                  P(height = 4.5)

• Here, P(Nepal) is the prior probability.
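A minimal sketch of this computation follows. The likelihoods P(4.5 | Nepal) = 0.25 and P(4.5 | India) = 0.1 are from the tables above; the priors are assumed for illustration (the story only tells us that almost everyone at IIIT Sri City is Indian), so the exact numbers are hypothetical.

```python
# Sketch: posterior P(class | height = 4.5) via Bayes rule.
# Likelihoods are from the slides; the priors below are assumed for illustration.

likelihood_45 = {"Nepal": 0.25, "India": 0.10}   # P(height = 4.5 | class)
priors        = {"Nepal": 0.01, "India": 0.99}   # assumed, not from the slides

evidence = sum(likelihood_45[c] * priors[c] for c in priors)   # P(height = 4.5)
posterior = {c: likelihood_45[c] * priors[c] / evidence for c in priors}

print(posterior)                           # roughly {'Nepal': 0.025, 'India': 0.975}
print(max(posterior, key=posterior.get))   # 'India': the prior reverses the decision
```

With equal priors the likelihoods alone would favour Nepal; a strong prior for India flips the decision, which is exactly the point of the security person's remark.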


RELATIONSHIP BETWEEN K-NNC AND THE BAYES CLASSIFIER

