Bayes Classification for Fish Sorting

This document provides an overview of Bayes classification and how it can be used to classify fish into different species using optical sensing. It discusses key concepts like conditional probability, Bayes' theorem, class-conditional probabilities, posterior probabilities, and the Bayes decision rule. It then presents an example of using length as a feature to classify between salmon and sea bass using training samples to estimate class-conditional probability distributions and applying the Bayes decision rule to assign new samples to a class.


Bayes Classification

Richa Singh

Slides are prepared from several sources, including Duda, Hart, and Stork
Probability
§ Conditional probability of A given B:  P(A | B) = P(A, B) / P(B)

§ Deriving the chain rule from the above:  P(A, B) = P(A | B) P(B)

2
Bayes Theorem

P(A | B) = P(B | A) P(A) / P(B)

4–5
Bayes Classification
An Example

• “Sorting incoming fish on a conveyor according to species using optical sensing”

• Species: sea bass or salmon

• Let us build a machine learning system that distinguishes between sea bass and salmon

7
Fish Classification: Salmon vs. Sea Bass

• Set up a camera and take some sample images
• Preprocessing involves image enhancement and segmentation:
  • separate touching or occluding fish, and
  • extract the fish contour

8
State of Nature/Prior
• Prior probabilities reflect the domain expert’s knowledge of how likely each type of fish is to appear, before we actually see it.

– The state of nature is a random variable, with priors P(w1), P(w2)
– Uniform priors: the catch of salmon and sea bass is equiprobable, P(w1) = P(w2)
– P(w1) + P(w2) = 1 (exclusivity and exhaustivity)

9
Problem Analysis

• Extract features from the images:
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.
• This is the set of all suggested features to explore for use in our classifier

10
Representation: Fish Length as Feature

Training Samples

11
Class-conditional Probabilities

• Use of the class-conditional information
• P(x | w1) and P(x | w2) describe how the feature (length or lightness) is distributed in the sea bass and salmon populations

12
Class-conditional PDF

13
Bayes’ Classification
• Posterior = likelihood × prior / evidence:  P(w_j | x) = p(x | w_j) P(w_j) / p(x)

– Evidence, in the case of two categories:  p(x) = p(x | w1) P(w1) + p(x | w2) P(w2)

16
Posterior Probabilities

17
Bayes’ Decision
• Decision given the posterior probabilities: decide w1 if P(w1 | x) > P(w2 | x), otherwise decide w2

• Therefore, whenever we observe a particular x, the probability of error is:
  P(error | x) = min[ P(w1 | x), P(w2 | x) ]

18
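To make the decision rule concrete, here is a minimal Python sketch of the fish example. The Gaussian class-conditional densities over length (means 30 and 40, standard deviations 4 and 5) are assumed placeholders, not values from the slides; only the structure (Bayes' theorem plus the argmax-posterior rule) follows the lecture.

```python
import numpy as np
from scipy.stats import norm

# Illustrative (assumed) class-conditional densities over fish length in cm.
prior = {"salmon": 0.5, "sea_bass": 0.5}        # uniform priors P(w1) = P(w2)
likelihood = {
    "salmon":   norm(loc=30, scale=4),          # p(x | salmon), assumed
    "sea_bass": norm(loc=40, scale=5),          # p(x | sea bass), assumed
}

def posteriors(x):
    """Return P(w_j | x) for both classes via Bayes' theorem."""
    joint = {c: likelihood[c].pdf(x) * prior[c] for c in prior}
    evidence = sum(joint.values())              # p(x) = sum_j p(x | w_j) P(w_j)
    return {c: joint[c] / evidence for c in joint}

def bayes_decide(x):
    """Bayes decision rule: pick the class with the larger posterior."""
    post = posteriors(x)
    decision = max(post, key=post.get)
    p_error = 1.0 - post[decision]              # P(error | x) = min_j P(w_j | x)
    return decision, p_error

print(bayes_decide(33.0))                       # e.g. ('salmon', ...) for a shortish fish
```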
Review
• Classification based on a single feature
• Two class classification
• Sample is assigned to one of the two classes
• The cost of making a false accept or a false reject is the same

20
Bayesian Decision Theory

• Generalization of the preceding ideas


– Use of more than one feature
– Use more than two states of nature
– Allowing actions other than deciding on the state of nature
• Allowing actions other than classification primarily allows the
possibility of rejection
• Refusing to make a decision in close or bad cases!
– Introduce a loss function which is more general than the
probability of error
• The loss function states how costly each action taken is
21
Multiple Features
• Digit Recognition

[Figure: an input digit image fed to a classifier, which outputs “5”]

• X1,…,Xn ∈ {0,1} (black vs. white pixels)
• Y ∈ {5,6} (predict whether a digit is a 5 or a 6)

22
The Bayes Classifier
• Let’s expand this for our digit recognition task:
  P(Y = 5 | X1,…,Xn) ∝ P(X1,…,Xn | Y = 5) P(Y = 5)
  P(Y = 6 | X1,…,Xn) ∝ P(X1,…,Xn | Y = 6) P(Y = 6)

• To classify, we will simply compute these two probabilities and predict based on which one is greater

23
Model Parameters
• How many parameters are required to specify the likelihood?
  – (Supposing that each image is 30 x 30 pixels, i.e. n = 900)

• # of parameters for modeling P(X1,…,Xn | Y):
  – 2(2^n − 1), i.e. 2(2^900 − 1)

• # of parameters for modeling P(X1 | Y),…,P(Xn | Y):
  – 2n, i.e. 1800

25
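A quick sanity check of these counts for a 30 x 30 binary image; a rough sketch where the point is the size of the gap, not the exact print format:

```python
# Parameter counting for a 30 x 30 binary image (n = 900 pixels, 2 classes).
n = 30 * 30

full_joint = 2 * (2 ** n - 1)   # modelling P(X1,...,Xn | Y) explicitly
naive_bayes = 2 * n             # modelling each P(Xi | Y) separately

print(f"full joint : 2*(2^{n} - 1), roughly 10^{len(str(full_joint)) - 1} parameters")
print(f"naive Bayes: {naive_bayes} parameters")
```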
Model Parameters
• The problem with explicitly modeling P(X1,…,Xn|Y) is that
there are usually way too many parameters:
– We’ll run out of space
– We’ll run out of time
– And we’ll need tons of training data (which is usually not
available)

26
The Naïve Bayes Model
• The Naïve Bayes Assumption: assume that all features are independent given the class label Y

• In equation form:
  P(X1,…,Xn | Y) = ∏_{i=1}^{n} P(Xi | Y)

27
The Bayes Classifier
• A good strategy is to predict the most probable class given the observed features:
  argmax_y P(Y = y | X1,…,Xn)

  – (for example: what is the probability that the image represents a 5, given its pixels?)

• So … how do we compute that?

The Bayes Classifier
• Let’s expand this for our digit recognition task:
  P(Y = y | X1,…,Xn) = P(X1,…,Xn | Y = y) P(Y = y) / P(X1,…,Xn),  for y ∈ {5, 6}

• To classify, we’ll simply compute these two probabilities and predict based on which one is greater
Naïve Bayes Training
• Now that we’ve decided to use a Naïve Bayes classifier, we need to train it with some data:

[Figure: samples from the MNIST training data]
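A minimal training sketch along these lines, assuming binarised MNIST-style images are already loaded as a 0/1 NumPy array X with labels y; the Laplace-smoothing constant alpha and the function names are illustrative, not from the slides.

```python
import numpy as np

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(Y) and P(Xi = 1 | Y) from binarised images.

    X: (N, n) array of 0/1 pixels; y: (N,) array of labels (e.g. 5 or 6).
    alpha is a smoothing count (an assumption, not from the slides).
    """
    classes = np.unique(y)
    priors = {c: float(np.mean(y == c)) for c in classes}
    # theta[c][i] = P(Xi = 1 | Y = c), estimated by smoothed pixel frequencies
    theta = {c: (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
             for c in classes}
    return priors, theta

def predict(x, priors, theta):
    """Pick argmax_y  log P(Y=y) + sum_i log P(Xi = xi | Y=y)."""
    scores = {}
    for c in priors:
        log_lik = np.sum(x * np.log(theta[c]) + (1 - x) * np.log(1 - theta[c]))
        scores[c] = np.log(priors[c]) + log_lik
    return max(scores, key=scores.get)
```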


The Normal Density
• Univariate density
  – Continuous density
  – Many processes are asymptotically Gaussian
  – Handwritten characters and speech sounds can be viewed as ideal prototypes corrupted by random processes (central limit theorem)

N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]

35
Univariate density

36
The Normal Density
• Multivariate density: the multivariate normal density in d dimensions is

N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2}(x-\mu)^t \Sigma^{-1} (x-\mu)\right]

x = (x1, x2, …, xd)^t feature vector
\mu = (\mu_1, \mu_2, …, \mu_d)^t mean vector
\Sigma = d × d covariance matrix
|\Sigma| and \Sigma^{-1} are its determinant and inverse, respectively
37
Multivariate density

38
Discriminant Functions for the Normal Density

• Minimum error-rate classification can be achieved by the discriminant functions

• g1(x) = ln p(x | w1) + ln P(w1) – ln p(x)
• g2(x) = ln p(x | w2) + ln P(w2) – ln p(x)

39
Discriminant Functions for the Normal Density

• Minimum error-rate classification can be achieved by the discriminant function
  g_i(x) = \ln p(x | \omega_i) + \ln P(\omega_i)

• Case of the multivariate normal (using the density from slide 37):

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

40
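A small sketch of this discriminant, assuming NumPy and illustrative toy parameters (the means, covariance, and priors below are made up for the example, not taken from the slides):

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(w_i)."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    d = x.size
    diff = x - mu
    inv = np.linalg.inv(sigma)
    return (-0.5 * diff @ inv @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# Toy 2-D example: classify x by the larger g_i(x).
mu1, mu2 = [0, 0], [2, 2]
S = np.array([[1.0, 0.3], [0.3, 1.0]])
x = [1.2, 0.8]
print(gaussian_discriminant(x, mu1, S, 0.5) > gaussian_discriminant(x, mu2, S, 0.5))
```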

Analyzing Covariance Matrix

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

• Two situations:
  – Features are independent
  – Features are dependent

• Three cases for \Sigma_i:
  – Case \Sigma_i = \sigma^2 I (I stands for the identity matrix)
  – Case \Sigma_i = \Sigma (the covariances of all classes are identical but arbitrary!)
  – Case \Sigma_i = arbitrary (each class has its own covariance)

41
Discriminant Functions for the Normal Density

• Case \Sigma_i = \sigma^2 I (I stands for the identity matrix)
  – \sigma_{ij} = 0: features are statistically independent
  – \sigma_{ii} = \sigma^2 is the same for all features

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

Here \Sigma_i^{-1} = (1/\sigma^2) I, while the \ln 2\pi and \ln|\Sigma_i| terms are constant for all the classes and can be dropped.

42
Discriminant Functions for the Normal Density…

• Disregarding the class-independent quadratic term x^t x, we get a linear discriminant function

g_i(x) = w_i^t x + w_{i0}, where
w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)

(w_{i0} is called the threshold, or bias, for the i-th category!)

44
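A sketch of the resulting linear machine for the Σ_i = σ²I case, with assumed toy means and priors (the numbers are only there to exercise the formulas):

```python
import numpy as np

def linear_machine(mus, priors, sigma2):
    """Weights of the linear discriminants for the case Sigma_i = sigma^2 I.

    w_i = mu_i / sigma^2,   w_i0 = -mu_i^T mu_i / (2 sigma^2) + ln P(w_i).
    """
    mus = np.asarray(mus, float)
    ws = [mu / sigma2 for mu in mus]
    w0s = [-np.dot(mu, mu) / (2 * sigma2) + np.log(p) for mu, p in zip(mus, priors)]
    return ws, w0s

def classify(x, ws, w0s):
    scores = [w @ np.asarray(x, float) + w0 for w, w0 in zip(ws, w0s)]
    return int(np.argmax(scores))            # index of the winning class

ws, w0s = linear_machine([[0, 0], [3, 3]], [0.5, 0.5], sigma2=1.0)
print(classify([1.0, 1.0], ws, w0s))          # 0: the point is closer to the first mean
```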
Discriminant Functions for the Normal Density…

• A classifier that uses linear discriminant functions is called a “linear machine”
• The decision surfaces for a linear machine are hyperplanes defined by g_i(x) = g_j(x)

45
Discriminant Functions for the Normal Density…
– The hyperplane separating R_i and R_j passes through the point

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)

and is always orthogonal to the line linking the means!

If P(\omega_i) = P(\omega_j), then x_0 = \frac{1}{2}(\mu_i + \mu_j), and the classifier reduces to a minimum-distance rule with g_i(x) = -\|x - \mu_i\|^2.

46
Incorporating Risk Function

Bayesian Decision Theory – Risk Function
• Loss function (loss matrix): suppose sea bass sells for 600 per kg and salmon for 300 per kg

  L = [   0   −300 ]   decide sea bass
      [ +300     0 ]   decide salmon
        sea bass  salmon   (true class)

• Zero-one loss function:

  L = [ 0  1 ]
      [ 1  0 ]

55
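A sketch of the conditional-risk computation with this loss matrix, reading the rows as actions (decide sea bass, decide salmon) and the columns as the true class; the posterior values are assumed for illustration:

```python
import numpy as np

# Loss matrix as read from the slide: rows = actions, columns = true class.
L = np.array([[0.0, -300.0],
              [300.0,   0.0]])

def conditional_risk(posterior):
    """R(a_i | x) = sum_j lambda(a_i | w_j) P(w_j | x), for each action a_i."""
    return L @ np.asarray(posterior, float)

post = [0.4, 0.6]                     # assumed P(sea bass | x), P(salmon | x)
risks = conditional_risk(post)
print(risks, "-> take action", int(np.argmin(risks)))   # pick the minimum-risk action
```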
Bayesian Decision Theory – Risk Function

• Let {w1, w2,…, wc} be the set of c states of nature (or “categories”)
• Let {α1, α2,…, αa} be the set of a possible actions
• Let λ(αi | wj) be the loss incurred for taking action αi when the true state of nature is wj

56
Two-category Classification

• Actions:
  • α1: deciding w1
  • α2: deciding w2

• λij = λ(αi | wj): the loss incurred for deciding αi when the true state of nature is wj

57
Two-category Classification

• α1: deciding w1; α2: deciding w2
• λij = λ(αi | wj): loss incurred for deciding αi when the true state of nature is wj
• Conditional risk:
  R(α1 | x) = λ11 P(w1 | x) + λ12 P(w2 | x)
  R(α2 | x) = λ21 P(w1 | x) + λ22 P(w2 | x)

58
At the decision boundary the two conditional risks are equal, R(α1 | x) = R(α2 | x):

\lambda_{11}\,p(x|w_1)P(w_1)/p(x) + \lambda_{12}\,p(x|w_2)P(w_2)/p(x) = \lambda_{21}\,p(x|w_1)P(w_1)/p(x) + \lambda_{22}\,p(x|w_2)P(w_2)/p(x)

Cancelling p(x):

\lambda_{11}\,p(x|w_1)P(w_1) + \lambda_{12}\,p(x|w_2)P(w_2) = \lambda_{21}\,p(x|w_1)P(w_1) + \lambda_{22}\,p(x|w_2)P(w_2)

Collecting terms:

(\lambda_{11} - \lambda_{21})\,p(x|w_1)P(w_1) = (\lambda_{22} - \lambda_{12})\,p(x|w_2)P(w_2)

59–60
Two-category Classification

• Our rule is the following: if R(α1 | x) < R(α2 | x), take action α1 (“decide w1”)
• This results in the equivalent rule: decide w1 if

  (\lambda_{21} - \lambda_{11})\,p(x|w_1)P(w_1) > (\lambda_{12} - \lambda_{22})\,p(x|w_2)P(w_2)

  and decide w2 otherwise

61
Bayesian Decision Theory – Continuous Features…
• Overall risk: R = \int R(\alpha(x) \mid x)\,p(x)\,dx, the expected loss over all decisions
• Conditional risk R(αi | x): the expected loss of taking action αi for a particular x
• Minimizing R ⟺ minimizing R(αi | x) for every x, i = 1,…, a

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x), \quad i = 1,\ldots,a

62

Bayesian Decision Theory – Continuous Features…

• Select the action αi for which R(αi | x) is minimum

• The resulting R is minimal and is called the Bayes risk: the best performance that can be achieved!

63
Likelihood Ratio

• The preceding rule is equivalent to the following rule:

• If  \frac{p(x|\omega_1)}{p(x|\omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}

• then take action α1 (decide w1); otherwise take action α2 (decide w2)

64
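A sketch of the likelihood-ratio rule; the threshold expression follows the standard two-category result above, and the numeric inputs are assumed:

```python
def likelihood_ratio_decision(px_w1, px_w2, p_w1, p_w2, lam):
    """Decide w1 iff p(x|w1)/p(x|w2) exceeds a threshold independent of x.

    lam[i][j] = loss for taking action a_{i+1} when the true class is w_{j+1}.
    Threshold: (lambda_12 - lambda_22) P(w2) / ((lambda_21 - lambda_11) P(w1)).
    """
    threshold = ((lam[0][1] - lam[1][1]) * p_w2) / ((lam[1][0] - lam[0][0]) * p_w1)
    return "w1" if px_w1 / px_w2 > threshold else "w2"

# With zero-one loss the threshold reduces to P(w2)/P(w1), i.e. comparing posteriors.
print(likelihood_ratio_decision(0.3, 0.1, 0.5, 0.5, [[0, 1], [1, 0]]))   # -> 'w1'
```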
Likelihood Ratio…
• Optimal decision property:

“If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions.”

65
Example: Checking on a course

• A student needs to decide which courses to take, based only on the impression from the first lecture
• From the student’s previous experience:

  Quality of the course:   good   fair   bad
  Probability (prior):      0.2    0.4   0.4

• These are the prior probabilities.

66
Example: Checking on a course
• The student also knows the class-conditionals:

  Pr(x | wj):             good   fair   bad
  Interesting lecture      0.8    0.5   0.1
  Boring lecture           0.2    0.5   0.9

• The loss function is given by the matrix:

  λ(ai | wj):              good course   fair course   bad course
  Taking the course             0             5            10
  Not taking the course        20             5             0

67
Example: Checking on a course

• We can make decisions according to two rules:

• Posterior probability
• Risk minimization

68
Example: Checking on a course
• Given that the first class was interesting, the
student wants to make an optimal decision.
• Pr(bad|interesting)
• Pr(fair|interesting)
• Pr(good|interesting)

• Pr(interesting|bad)
• Pr(interesting|fair)
• Pr(interesting|good)

• Pr(interesting)
69
Example: Checking on a course
• The probability of getting an “interesting lecture” (x = interesting):
• Pr(interesting)= Pr(interesting|good course)* Pr(good course)
+ Pr(interesting|fair course)* Pr(fair course)
+ Pr(interesting|bad course)* Pr(bad course)
=0.8*0.2+0.5*0.4+0.1*0.4=0.4

• Consequently, Pr(boring)=1-0.4=0.6
• Suppose the lecture was interesting. Then we want to compute the
posterior probabilities of each one of the 3 possible “states of nature”.

70
Example: Checking on a course

Pr(good | interesting) = Pr(interesting | good) Pr(good) / Pr(interesting) = (0.8 * 0.2) / 0.4 = 0.4
Pr(fair | interesting) = Pr(interesting | fair) Pr(fair) / Pr(interesting) = (0.5 * 0.4) / 0.4 = 0.5

• We can get Pr(bad | interesting) = 0.1 either by the same method, or by noting that the three posteriors must sum to 1.
• Now we have all we need to make an intelligent decision about the optimal action.

71
Example: Checking on a course

• The student needs to minimize the conditional risk

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x)

R(taking | interesting) =
  Pr(good | interesting) λ(taking | good course) +
  Pr(fair | interesting) λ(taking | fair course) +
  Pr(bad | interesting) λ(taking | bad course)
  = 0.4*0 + 0.5*5 + 0.1*10 = 3.5
72
Example: Checking on a course

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x)

R(not taking | interesting) =
  Pr(good | interesting) λ(not taking | good course) +
  Pr(fair | interesting) λ(not taking | fair course) +
  Pr(bad | interesting) λ(not taking | bad course)
  = 0.4*20 + 0.5*5 + 0.1*0 = 10.5
73
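A short script that reproduces the numbers in this example (posteriors 0.4 / 0.5 / 0.1 and conditional risks 3.5 vs. 10.5); the dictionary names are just illustrative:

```python
# Course example: verify the posteriors and conditional risks from the slides.
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
p_interesting = {"good": 0.8, "fair": 0.5, "bad": 0.1}   # Pr(interesting | quality)
loss = {"take":     {"good": 0,  "fair": 5, "bad": 10},
        "not_take": {"good": 20, "fair": 5, "bad": 0}}

evidence = sum(p_interesting[q] * priors[q] for q in priors)            # 0.4
posterior = {q: p_interesting[q] * priors[q] / evidence for q in priors}
risk = {a: sum(loss[a][q] * posterior[q] for q in priors) for a in loss}

print(posterior)   # {'good': 0.4, 'fair': 0.5, 'bad': 0.1}
print(risk)        # take: 3.5, not_take: 10.5 -> taking the course is optimal
```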
Constructing an optimal decision function

• So, if the first lecture was interesting, the student will minimize the conditional risk by taking the course.
• In order to construct the full decision function, we also need to define the risk-minimizing action for the case of a boring lecture.

74
Exercise
• Select the optimal decision where Ω = {w1, w2}
• p(x | w1) ~ N(2, 0.5) (normal distribution)
• p(x | w2) ~ N(1.5, 0.2)
• P(w1) = 2/3, P(w2) = 1/3

  λ = [ 1  2 ]
      [ 3  4 ]

75
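A numerical sketch for this exercise, assuming the second parameter of N(·, ·) is the variance (if it is the standard deviation, drop the square roots); it computes the conditional risk of each action at a given x rather than deriving the decision regions analytically:

```python
import numpy as np
from scipy.stats import norm

p1, p2 = 2/3, 1/3
f1 = norm(loc=2.0, scale=np.sqrt(0.5))     # p(x | w1), variance assumed
f2 = norm(loc=1.5, scale=np.sqrt(0.2))     # p(x | w2), variance assumed
lam = np.array([[1.0, 2.0],                # lam[i][j] = loss(a_{i+1} | w_{j+1})
                [3.0, 4.0]])

def optimal_action(x):
    post = np.array([f1.pdf(x) * p1, f2.pdf(x) * p2])
    post /= post.sum()                      # posteriors P(w_j | x)
    risks = lam @ post                      # R(a_i | x) = sum_j lam_ij P(w_j | x)
    return int(np.argmin(risks)) + 1, risks

print(optimal_action(1.8))                  # chosen action and its conditional risks
```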
Questions?

76
Discriminant Functions for the Normal Density…
• Case \Sigma_i = \Sigma (the covariances of all classes are identical but arbitrary!)

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

– Expanding the quadratic and discarding the terms that do not depend on the class gives a linear discriminant:

g_i(x) = w_i^t x + w_{i0}, where
w_i = \Sigma^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(\omega_i)

77
Discriminant Functions for the Normal Density…

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j)

• Comments about this hyperplane:
  – It passes through x_0
  – It is NOT orthogonal to the line linking the means
  – What happens when P(ωi) = P(ωj)?
  – If P(ωi) ≠ P(ωj), then x_0 shifts away from the more likely mean
78
Discriminant Functions for the Normal Density…

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

• When P(ωi) is the same for each of the c classes, this reduces to a Mahalanobis distance classifier:

g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1}(x-\mu_i)

79

Mahalanobis Distance

Euclidean distance:  d(x, y) = \sqrt{(x-y)^t (x-y)}

Mahalanobis distance:  d(x, y) = \sqrt{(x-y)^t \Sigma^{-1} (x-y)}

80
Mahalanobis Distance

Euclidean distance: all points on a circle around the reference point are equidistant from it

81
Mahalanobis Distance

[Figure: scatter plot of Class 1 and Class 2]

82
Mahalanobis Distance
Covariance matrix:

  Σ = [ 0.3  0.2 ]
      [ 0.2  0.3 ]

Points:  A = (0.5, 0.5),  B = (0, 1),  C = (1.5, 1.5)

Compute (squared versions):
  Euclid(A, B), Euclid(A, C), Mahal(A, B), Mahal(A, C)

83
Mahalanobis Distance
Covariance matrix:

  Σ = [ 0.3  0.2 ]
      [ 0.2  0.3 ]

Points:  A = (0.5, 0.5),  B = (0, 1),  C = (1.5, 1.5)

Squared distances:
  Euclid(A, B) = 0.5    Euclid(A, C) = 2
  Mahal(A, B) = 5       Mahal(A, C) = 4
84
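The squared distances above can be checked with a few lines of NumPy:

```python
import numpy as np

S = np.array([[0.3, 0.2],
              [0.2, 0.3]])
S_inv = np.linalg.inv(S)
A, B, C = np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([1.5, 1.5])

def euclid_sq(p, q):
    d = p - q
    return float(d @ d)

def mahal_sq(p, q):
    d = p - q
    return float(d @ S_inv @ d)

print(euclid_sq(A, B), euclid_sq(A, C))   # 0.5, 2.0
print(mahal_sq(A, B), mahal_sq(A, C))     # 5.0, 4.0: B is "farther" under Mahalanobis
```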
Discriminant Functions for the Normal Density…

The contour lines are elliptical because the covariance matrix is not diagonal; both densities show the same elliptical shape. The prior probabilities are equal, so the point x_0 lies halfway between the two means. The decision boundary is not orthogonal to the red line linking the means. Instead, it is tilted so that its points are at equal distance from the contour lines of w1 and those of w2.

85
Discriminant Functions for the Normal Density…

• Case \Sigma_i = arbitrary
  – The covariance matrices are different for each category

g_i(x) = x^t W_i x + w_i^t x + w_{i0}, where

W_i = -\frac{1}{2}\Sigma_i^{-1}
w_i = \Sigma_i^{-1}\mu_i
w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

88
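A sketch of the quadratic discriminant for this case, with assumed toy means and covariances (the values are illustrative, not from the slides):

```python
import numpy as np

def quadratic_discriminant(mu, sigma, prior):
    """Return (W_i, w_i, w_i0) for the case of an arbitrary class covariance."""
    mu = np.asarray(mu, float)
    inv = np.linalg.inv(sigma)
    W = -0.5 * inv
    w = inv @ mu
    w0 = -0.5 * mu @ inv @ mu - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior)
    return W, w, w0

def g(x, W, w, w0):
    x = np.asarray(x, float)
    return x @ W @ x + w @ x + w0

# Two classes with different covariances: decision by the larger g_i(x).
W1, w1, w10 = quadratic_discriminant([0, 0], np.array([[1.0, 0.0], [0.0, 0.5]]), 0.5)
W2, w2, w20 = quadratic_discriminant([2, 2], np.array([[0.5, 0.0], [0.0, 1.5]]), 0.5)
x = [1.0, 1.0]
print("class 1" if g(x, W1, w1, w10) > g(x, W2, w2, w20) else "class 2")
```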
Discriminant Functions for the Normal Density…

[Figure: with P(ω1) = P(ω2) and arbitrary covariances, the decision regions can be disconnected]

89
Discriminant Functions for the Normal
Density

92
Decision Regions for Two-Dimensional
Gaussian Data

93
Decision Regions for Two-Dimensional
Gaussian Data

94
Thanks.

95
