MACHINE LEARNING
UNIT 3
BAYESIAN LEARNING
Bayesian learning methods are relevant to our study of machine learning
for two different reasons
First, Bayesian learning algorithms that calculate explicit probabilities for
hypotheses, such as the naive Bayes classifier, are among the most
practical approaches to certain types of learning problems.
The second reason that Bayesian methods are important to our study of
machine learning is that they provide a useful perspective for
understanding many learning algorithms that do not explicitly manipulate
probabilities.
Features of Bayesian learning methods
Each observed training example can incrementally decrease or increase
the estimated probability that a hypothesis is correct
Prior knowledge can be combined with observed data to determine the
final probability of a hypothesis
Bayesian methods can accommodate hypotheses that make
probabilistic predictions
New instances can be classified by combining the predictions of multiple
hypotheses, weighted by their probabilities.
NAIVE BAYES CLASSIFIER
One highly practical Bayesian learning method is the naive Bayes
learner, often called the naive Bayes classifier.
The naive Bayes classifier applies to learning tasks where each
instance x is described by a conjunction of attribute values and
where the target function f(x) can take on any value from some
finite set V.
A set of training examples of the target function is provided, and a
new instance is presented, described by the tuple of attribute values
(a1, a2, ..., an). The learner is asked to predict the target value, or
classification, for this new instance.
Under the naive Bayes assumption that the attribute values are
conditionally independent given the target value, the classification is
v_NB = argmax over vj in V of  P(vj) * product over i of P(ai | vj)
where v_NB denotes the target value output by the naive Bayes
classifier.
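To make the decision rule concrete, here is a minimal Python sketch of a naive Bayes learner for discrete attributes, assuming simple frequency counts and no smoothing; the function and variable names (train_naive_bayes, classify, train_X, train_y) are illustrative, not from the notes.

from collections import Counter, defaultdict

def train_naive_bayes(train_X, train_y):
    # Estimate the prior P(vj) and the conditionals P(ai | vj) by frequency counts.
    class_counts = Counter(train_y)
    n = len(train_y)
    # cond_counts[vj][i][ai] = number of class-vj examples whose i-th attribute equals ai
    cond_counts = defaultdict(lambda: defaultdict(Counter))
    for x, v in zip(train_X, train_y):
        for i, a in enumerate(x):
            cond_counts[v][i][a] += 1
    priors = {v: c / n for v, c in class_counts.items()}
    return priors, cond_counts, class_counts

def classify(x, priors, cond_counts, class_counts):
    # v_NB = argmax over vj of P(vj) * product over i of P(ai | vj)
    best_v, best_score = None, -1.0
    for v, prior in priors.items():
        score = prior
        for i, a in enumerate(x):
            score *= cond_counts[v][i][a] / class_counts[v]
        if score > best_score:
            best_v, best_score = v, score
    return best_v

In practice a smoothed estimate (such as the m-estimate) would be used so that an attribute value never seen with a given class does not force the whole product to zero.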
GIBBS ALGORITHM
Although the Bayes optimal classifier obtains the best performance that
can be achieved from the given training data, it can be quite
costly to apply.
The expense is due to the fact that it computes the posterior
probability for every hypothesis in H and then combines the
predictions of each hypothesis to classify each new instance.
An alternative, less optimal method is the Gibbs algorithm
(see Opper and Haussler 1991), defined as follows:
1. Choose a hypothesis h from H at random, according to the posterior
probability distribution over H.
2. Use h to predict the classification of the next instance x.
In other words, the Gibbs algorithm simply applies a hypothesis drawn at
random according to the current posterior probability distribution.
Surprisingly, it can be shown that under certain conditions the
expected misclassification error for the Gibbs algorithm is at
most twice the expected error of the Bayes optimal classifier.
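As a minimal sketch, assuming the hypotheses are available as Python callables together with their posterior probabilities P(h | D) (the names hypotheses and posteriors are illustrative), the Gibbs algorithm can be written as:

import random

def gibbs_classify(x, hypotheses, posteriors):
    # Draw one hypothesis at random according to the posterior distribution P(h | D)
    # and use that single hypothesis to classify the new instance x.
    h = random.choices(hypotheses, weights=posteriors, k=1)[0]
    return h(x)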
AN EXAMPLE: LEARNING TO
CLASSIFY TEXT
K-NEAREST NEIGHBOR
CLASSIFICATION
K-Nearest Neighbors
K-Nearest Neighbors is one of the most basic yet essential classification
algorithms in Machine Learning
It belongs to the supervised learning domain
It has applications in pattern recognition, data mining, and intrusion detection
We are given some prior data (also called training data), which classifies
coordinates into groups identified by an attribute.
Given another set of data points (also called testing data), the task is to assign
each of these points to a group by analyzing the training set.
Given an unclassified point, we can assign it to a group by observing which
group its nearest neighbors belong to.
This means a point close to a cluster of points classified as ‘Red’ has a higher
probability of being classified as ‘Red’.
For example, given two test points (2.5, 7) and (5.5, 4.5) plotted among the
training points, intuitively the first point should be classified as ‘Green’ and
the second point as ‘Red’, as in the sketch below.
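Here is a minimal sketch of the k-nearest-neighbor rule for 2-D points, assuming Euclidean distance and an unweighted majority vote; the training points in the usage lines are made-up illustrative values, while the test points reuse (2.5, 7) and (5.5, 4.5) from above.

from collections import Counter
import math

def knn_classify(test_point, train_points, train_labels, k=3):
    # Sort training points by Euclidean distance to the test point
    # and take a majority vote among the k nearest labels.
    distances = sorted(
        (math.dist(test_point, p), label)
        for p, label in zip(train_points, train_labels)
    )
    k_nearest = [label for _, label in distances[:k]]
    return Counter(k_nearest).most_common(1)[0][0]

# Illustrative training data: a 'Green' cluster near (2, 7) and a 'Red' cluster near (6, 4.5).
train_points = [(1.5, 7.0), (2.0, 7.5), (3.0, 6.5), (5.0, 4.0), (6.0, 4.5), (6.5, 5.0)]
train_labels = ['Green', 'Green', 'Green', 'Red', 'Red', 'Red']
print(knn_classify((2.5, 7.0), train_points, train_labels, k=3))  # 'Green'
print(knn_classify((5.5, 4.5), train_points, train_labels, k=3))  # 'Red'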
Weighted K-Nearest Neighbor
Example
Minimum Description Length
Principle
BAYES OPTIMAL CLASSIFIER
The Bayes Optimal Classifier is a probabilistic model that
predicts the most likely outcome for a new situation.
Bayes' theorem is a method for calculating a hypothesis's
probability based on its prior probability, the probabilities of
observing specific data given the hypothesis, and the observed
data itself.
Maximum a Posteriori (MAP) is a probabilistic framework for
determining the most likely hypothesis given a training
dataset.
Take a hypothesis space that has 3 hypotheses h1, h2, and
h3.
The posterior probabilities of the hypotheses are as
follows:
h1 -> 0.4
h2 -> 0.3
h3 -> 0.3
Hence, h1 is the MAP (maximum a posteriori) hypothesis, since it
has the maximum posterior probability.
Suppose a new instance x is encountered, which is
classified negative by h2 and h3 but positive by h1.
Taking all hypotheses into account, the probability that x is
positive is .4 and the probability that it is negative is
therefore .6.
In this case, the classification generated by the MAP hypothesis
(positive) differs from the most probable classification,
which is negative.
The most probable classification of the new instance is
obtained by combining the predictions of all hypotheses,
weighted by their posterior probabilities.
If the new example's classification can be any value vj from a set V,
the probability P(vj|D) that the correct classification for the new
instance is vj is merely
P(vj|D) = sum over hi in H of  P(vj|hi) P(hi|D)
The denominator is omitted since we are only using this
for comparison and all the values of P(vj|D) would have the
same denominator.
The value vj for which P(vj|D) is maximum is the
best classification for the new instance.
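The h1/h2/h3 example can be checked with a short sketch; the dictionaries below simply encode the posteriors (0.4, 0.3, 0.3) and each hypothesis's prediction from the example above.

# Posterior probabilities P(hi | D) from the example.
posteriors = {'h1': 0.4, 'h2': 0.3, 'h3': 0.3}
# Prediction of each hypothesis for the new instance x.
predictions = {'h1': 'positive', 'h2': 'negative', 'h3': 'negative'}

# P(vj | D) = sum over hi of P(vj | hi) * P(hi | D); here P(vj | hi) is 1
# when hi predicts vj and 0 otherwise.
scores = {}
for h, post in posteriors.items():
    v = predictions[h]
    scores[v] = scores.get(v, 0.0) + post

print(scores)                       # {'positive': 0.4, 'negative': 0.6}
print(max(scores, key=scores.get))  # 'negative' -- the Bayes optimal classification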
Computational Learning Theory
PAC MODEL (PROBABLY
APPROXIMATELY CORRECT)
PAC (Probably Approximately Correct) learning is a framework used for the
mathematical analysis of machine learning. A PAC learner tries to learn a
concept approximately correctly by selecting, from a set of hypotheses, a
hypothesis that has low generalization error.
A hypothesis h is approximately correct if the error region h ⊕ c (the set of
instances on which h and the target concept c disagree) has probability at
most ε, i.e. Pr(h ⊕ c) <= ε, where 0 < ε < 0.5.
The learner is probably approximately correct if it outputs such a hypothesis
with probability at least 1 - δ, i.e. Pr[ Pr(h ⊕ c) <= ε ] >= 1 - δ,
where δ is the confidence parameter and 0 < δ < 0.5.
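The same definition can be stated compactly in the standard textbook form; here n denotes the instance length and size(c) the encoding length of the concept, both symbols being assumptions of this formal restatement:

C \text{ is PAC-learnable by } L \text{ using } H \iff
\forall c \in C,\ \forall \mathcal{D} \text{ over } X,\ \forall \varepsilon, \delta \in (0, \tfrac{1}{2}):\
\Pr\big[\mathrm{error}_{\mathcal{D}}(h_L) \le \varepsilon\big] \ge 1 - \delta,
\text{ with } L \text{ running in time polynomial in } 1/\varepsilon,\ 1/\delta,\ n,\ \mathrm{size}(c).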
SAMPLE COMPLEXITY FOR
FINITE HYPOTHESIS SPACES
PAC-learnability is largely determined by the number of training examples
required by the learner.
The growth in the number of required training examples with problem size is
called the sample complexity of the learning problem.
A learner is consistent if it outputs hypotheses that perfectly fit the training
data, whenever possible.
We derive a bound on the number of training examples required by
any consistent learner, independent of the specific algorithm it uses; see the
worked example below.
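As a worked example of such a bound, the standard result for a consistent learner over a finite hypothesis space is m >= (1/ε)(ln|H| + ln(1/δ)); the short sketch below just evaluates it, and the particular values |H| = 1000, ε = 0.1, δ = 0.05 are illustrative choices, not from the notes.

import math

def sample_complexity(H_size, epsilon, delta):
    # m >= (1/epsilon) * (ln|H| + ln(1/delta)): enough examples so that, with
    # probability at least 1 - delta, every consistent hypothesis has true error <= epsilon.
    return math.ceil((1.0 / epsilon) * (math.log(H_size) + math.log(1.0 / delta)))

print(sample_complexity(1000, 0.1, 0.05))  # 100 training examples suffice under this bound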
SAMPLE COMPLEXITY FOR
INFINITE HYPOTHESIS SPACES
We consider a second measure of the complexity of H, called the Vapnik-
Chervonenkis dimension of H (VC dimension). Using it, we can state bounds on
sample complexity in terms of VC(H) rather than |H|.
The VC dimension measures the complexity of the hypothesis space H not
by the number of distinct hypotheses |H|, but instead by the number of
distinct instances from X that can be completely discriminated using H.
It uses a notion called shattering, illustrated below.
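To make shattering concrete, here is a small sketch assuming H is the class of one-sided threshold hypotheses h_t(x) = (x >= t) on the real line: this H can shatter any single point but no set of two points (the labeling "left point positive, right point negative" can never be produced), so VC(H) = 1. The helper below simply enumerates all labelings and checks whether some threshold realizes each one.

from itertools import product

def shatters(points, thresholds):
    # H shatters the points if every +/- labeling (dichotomy) of them
    # is realized by at least one threshold hypothesis h_t(x) = (x >= t).
    for labeling in product([True, False], repeat=len(points)):
        realized = any(
            all((x >= t) == y for x, y in zip(points, labeling))
            for t in thresholds
        )
        if not realized:
            return False
    return True

# Candidate thresholds placed between and around the points are enough for this check.
print(shatters([3.0], thresholds=[2.0, 4.0]))            # True  -> a single point is shattered
print(shatters([3.0, 5.0], thresholds=[2.0, 4.0, 6.0]))  # False -> the (+, -) labeling is never realized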