10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University
MLE/MAP
+
Naïve Bayes
Matt Gormley
Lecture 17
Mar. 20, 2019
1
Reminders
• Homework 5: Neural Networks
– Out: Fri, Mar 1
– Due: Fri, Mar 22 at 11:59pm
• Homework 6: Learning Theory / Generative
Models
– Out: Fri, Mar 22
– Due: Fri, Mar 29 at 11:59pm (1 week)
TIP: Do the readings!
• Today’s In-Class Poll
– http://p17.mlcourse.org
2
MLE AND MAP
14
Likelihood Function One R.V.
• Suppose we have N samples D = {x(1), x(2), …, x(N)} from a
random variable X
• The likelihood function:
  – Case 1: X is discrete with pmf p(x|θ)
    L(θ) = p(x(1)|θ) p(x(2)|θ) … p(x(N)|θ)
  – Case 2: X is continuous with pdf f(x|θ)
    L(θ) = f(x(1)|θ) f(x(2)|θ) … f(x(N)|θ)
• The log-likelihood function:
  – Case 1: X is discrete with pmf p(x|θ)
    l(θ) = log p(x(1)|θ) + … + log p(x(N)|θ)
  – Case 2: X is continuous with pdf f(x|θ)
    l(θ) = log f(x(1)|θ) + … + log f(x(N)|θ)
• In both cases (discrete / continuous), the likelihood tells us how likely one sample is relative to another.
17
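To make these definitions concrete, here is a minimal Python sketch (an illustration only; the Bernoulli(θ) pmf and Exponential(θ) pdf are assumed examples) that evaluates the likelihood and log-likelihood of a handful of samples:

import numpy as np

def bernoulli_pmf(x, theta):          # p(x | theta) for x in {0, 1}
    return theta**x * (1 - theta)**(1 - x)

def exponential_pdf(x, theta):        # f(x | theta) for x >= 0
    return theta * np.exp(-theta * x)

x_disc = np.array([1, 0, 1, 1, 0])        # discrete samples
x_cont = np.array([0.3, 1.2, 0.7, 2.5])   # continuous samples
theta = 0.6

L_disc  = np.prod(bernoulli_pmf(x_disc, theta))         # L(theta): product of pmf values
ll_disc = np.sum(np.log(bernoulli_pmf(x_disc, theta)))  # l(theta): sum of log pmf values
L_cont  = np.prod(exponential_pdf(x_cont, theta))       # L(theta): product of pdf values
ll_cont = np.sum(np.log(exponential_pdf(x_cont, theta)))
print(L_disc, ll_disc, L_cont, ll_cont)

The product form and the sum-of-logs form have the same maximizer; the log form is simply easier to work with, both numerically and algebraically.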
Likelihood Function Two R.V.s
• Suppose we have N samples D = {(x(1), y(1)), …, (x(N), y(N))} from a
pair of random variables X, Y
• The conditional likelihood function:
– Case 1: Y is discrete with pmf p(y | x, θ)
L(θ) = p(y(1) | x(1), θ) …p(y(N) | x(N), θ)
– Case 2: Y is continuous with pdf f(y | x, θ)
L(θ) = f(y(1) | x(1), θ) …f(y(N) | x(N), θ)
• The joint likelihood function:
– Case 1: X and Y are discrete with pmf p(x,y|θ)
L(θ) = p(x(1), y(1)|θ) … p(x(N), y(N)|θ)
– Case 2: X and Y are continuous with pdf f(x,y|θ)
L(θ) = f(x(1), y(1)|θ) … f(x(N), y(N)|θ)
18
Likelihood Function Two R.V.s
• Suppose we have N samples D = {(x(1), y(1)), …, (x(N), y(N))} from a
pair of random variables X, Y
• The joint likelihood function:
  – Case 1: X and Y are discrete with pmf p(x,y|θ)
    L(θ) = p(x(1), y(1)|θ) … p(x(N), y(N)|θ)
  – Case 2: X and Y are continuous with pdf f(x,y|θ)
    L(θ) = f(x(1), y(1)|θ) … f(x(N), y(N)|θ)
  – Case 3 (mixed discrete/continuous!): Y is discrete with pmf p(y|β) and X is continuous with pdf f(x|y,α)
    L(α, β) = f(x(1)| y(1), α) p(y(1)|β) … f(x(N)| y(N), α) p(y(N)|β)
  – Case 4 (mixed discrete/continuous!): Y is continuous with pdf f(y|β) and X is discrete with pmf p(x|y,α)
    L(α, β) = p(x(1)| y(1), α) f(y(1)|β) … p(x(N)| y(N), α) f(y(N)|β)
19
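A minimal sketch of Case 3 above, assuming Y ~ Bernoulli(β) and X | Y = y ~ Gaussian(μ_y, 1), so that α = (μ0, μ1):

import numpy as np
from scipy.stats import norm

y = np.array([1, 0, 1, 1, 0])                 # discrete labels y(i)
x = np.array([2.1, -0.3, 1.8, 2.6, 0.1])      # continuous features x(i)
beta, mu = 0.5, np.array([0.0, 2.0])          # assumed parameters: beta and alpha = (mu_0, mu_1)

# log L(alpha, beta) = sum_i [ log f(x(i) | y(i), alpha) + log p(y(i) | beta) ]
log_L = np.sum(np.log(norm.pdf(x, loc=mu[y], scale=1.0))
               + np.log(np.where(y == 1, beta, 1 - beta)))
print(log_L)

This mixed case has the same shape as the Gaussian Naive Bayes model that appears at the end of the lecture outline.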
MLE
Suppose we have data D = {x(i)}_{i=1}^N.

Principle of Maximum Likelihood Estimation:
Choose the parameters that maximize the likelihood of the data.

  θMLE = argmax_θ ∏_{i=1}^N p(x(i) | θ)          Maximum Likelihood Estimate (MLE)

  θMAP = argmax_θ ∏_{i=1}^N p(x(i) | θ) p(θ)      Maximum a posteriori (MAP) estimate

[Figure: a 1-D likelihood curve L(θ) with its maximizer θMLE marked, and a 2-D contour plot of L(θ1, θ2) with its maximizer θMLE marked.]
20
MLE
What does maximizing likelihood accomplish?
• There is only a finite amount of probability
mass (i.e. sum-to-one constraint)
• MLE tries to allocate as much probability
mass as possible to the things we have
observed…
…at the expense of the things we have not
observed
21
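A small numeric illustration of this point, assuming a three-outcome categorical variable where one outcome never shows up in the data: the MLE (the empirical frequencies) gives that outcome zero probability.

import numpy as np

data = np.array([0, 1, 0, 0, 1])            # outcome 2 is never observed
counts = np.bincount(data, minlength=3)     # [3, 2, 0]
theta_mle = counts / counts.sum()           # all probability mass goes to observed outcomes
print(theta_mle)                            # [0.6, 0.4, 0.0]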
Recipe for Closed-form MLE
1. Assume data was generated i.i.d. from some model
(i.e. write the generative story)
x(i) ~ p(x|θ)
2. Write log-likelihood
l(θ) = log p(x(1)|θ) + … + log p(x(N)|θ)
3. Compute partial derivatives (i.e. gradient)
   ∂l(θ)/∂θ1 = …
   ∂l(θ)/∂θ2 = …
   …
   ∂l(θ)/∂θM = …
4. Set derivatives to zero and solve for θ
   ∂l(θ)/∂θm = 0 for all m ∈ {1, …, M}
   θMLE = solution to system of M equations in M variables
5. Compute the second derivative and check that l(θ) is concave down
at θMLE
22
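As a sanity check on the recipe (a sketch only, assuming x(i) ~ N(θ, 1), i.e. a Gaussian with unknown mean and unit variance): steps 3–4 give the closed form θMLE = mean of the data, and a generic numerical optimizer on the negative log-likelihood agrees.

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.3, 0.7, 2.1, 1.0, 1.9])     # assumed data

def neg_log_lik(theta):                     # -l(theta), dropping additive constants
    return 0.5 * np.sum((x - theta) ** 2)

theta_closed = x.mean()                          # step 4: set dl/dtheta = 0 and solve
theta_numeric = minimize_scalar(neg_log_lik).x   # numerical maximizer of l(theta)
print(theta_closed, theta_numeric)               # both equal the sample mean, 1.4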
MLE
Example: MLE of Exponential Distribution
Goal:
• pdf of Exponential(λ): f(x) = λ e^(−λx)
• Suppose X_i ~ Exponential(λ) for 1 ≤ i ≤ N.
• Find the MLE λMLE for data D = {x(i)}_{i=1}^N.
Steps:
• First write down the log-likelihood of the sample.
• Compute the first derivative, set it to zero, and solve for λ.
• Compute the second derivative and check that the log-likelihood is concave down at λMLE.
23
MLE
Example: MLE of Exponential Distribution
(Recall: pdf of Exponential(λ): f(x) = λ e^(−λx); X_i ~ Exponential(λ) for 1 ≤ i ≤ N; find λMLE for D = {x(i)}_{i=1}^N.)
• First write down the log-likelihood of the sample.

  l(λ) = Σ_{i=1}^N log f(x(i))                 (1)
       = Σ_{i=1}^N log( λ exp(−λ x(i)) )       (2)
       = Σ_{i=1}^N ( log(λ) − λ x(i) )         (3)
       = N log(λ) − λ Σ_{i=1}^N x(i)           (4)
24
MLE
Example: MLE of Exponential Distribution
(Recall: pdf of Exponential(λ): f(x) = λ e^(−λx); X_i ~ Exponential(λ) for 1 ≤ i ≤ N; find λMLE for D = {x(i)}_{i=1}^N.)
• Compute the first derivative, set it to zero, and solve for λ.

  dl(λ)/dλ = d/dλ [ N log(λ) − λ Σ_{i=1}^N x(i) ]   (1)
           = N/λ − Σ_{i=1}^N x(i) = 0               (2)
  ⇒ λMLE = N / Σ_{i=1}^N x(i)                       (3)
25
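A quick numerical check of this closed form, with made-up data: λMLE = N / Σ_i x(i) matches a numerical maximizer of l(λ) = N log(λ) − λ Σ_i x(i).

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.8, 2.3, 1.1, 0.4, 3.0])               # assumed samples
lam_closed = len(x) / x.sum()                          # closed-form MLE

neg_ll = lambda lam: -(len(x) * np.log(lam) - lam * x.sum())
lam_numeric = minimize_scalar(neg_ll, bounds=(1e-6, 100.0), method='bounded').x
print(lam_closed, lam_numeric)                         # both ≈ 0.66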
MLE
In-Class Exercise
Show that the MLE of parameter ɸ for N samples drawn from Bernoulli(ɸ) is ɸMLE = N1 / N, where N1 = # of (x(i) = 1).
Steps to answer:
1. Write the log-likelihood of the sample
2. Compute the derivative w.r.t. ɸ
3. Set the derivative to zero and solve for ɸ
26
MLE
Question:
Assume we have N samples x(1), x(2), …, x(N) drawn from a Bernoulli(ɸ).
What is the log-likelihood of the data l(ɸ)?
Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. l(ɸ) = N1 log(ɸ) + N0 (1 − log(ɸ))
B. l(ɸ) = N1 log(ɸ) + N0 log(1−ɸ)
C. l(ɸ) = log(ɸ)N1 + (1 − log(ɸ))N0
D. l(ɸ) = log(ɸ)N1 + log(1−ɸ)N0
E. l(ɸ) = N0 log(ɸ) + N1 (1 − log(ɸ))
F. l(ɸ) = N0 log(ɸ) + N1 log(1−ɸ)
G. l(ɸ) = log(ɸ)N0 + (1 − log(ɸ))N1
H. l(ɸ) = log(ɸ)N0 + log(1−ɸ)N1
I. l(ɸ) = the most likely answer
27
MLE
Question:
Assume we have N samples x(1), x(2), …, x(N) drawn from a Bernoulli(ɸ).
What is the derivative of the log-likelihood ∂l(ɸ)/∂ɸ?
Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. ∂l(ɸ)/∂ɸ = ɸN1 + (1 − ɸ)N0
B. ∂l(ɸ)/∂ɸ = ɸ / N1 + (1 − ɸ) / N0
C. ∂l(ɸ)/∂ɸ = N1 / ɸ + N0 / (1 − ɸ)
D. ∂l(ɸ)/∂ɸ = log(ɸ) / N1 + log(1 − ɸ) / N0
E. ∂l(ɸ)/∂ɸ = N1 / log(ɸ) + N0 / log(1 − ɸ)
28
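Tying the two polls together (a sketch with made-up data): the log-likelihood is l(ɸ) = N1 log(ɸ) + N0 log(1 − ɸ), its derivative is N1/ɸ − N0/(1 − ɸ), and that derivative is zero at ɸ = N1/N.

import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])
N1, N0 = x.sum(), len(x) - x.sum()

log_lik  = lambda phi: N1 * np.log(phi) + N0 * np.log(1 - phi)
dlog_lik = lambda phi: N1 / phi - N0 / (1 - phi)

phi_mle = N1 / len(x)
print(phi_mle, dlog_lik(phi_mle))        # the derivative is (numerically) zero at the MLE
print(log_lik(phi_mle), log_lik(0.5))    # and the MLE has higher log-likelihood than phi = 0.5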
Learning from Data (Frequentist)
Whiteboard
– Optimization for MLE
– Examples: 1D and 2D optimization
– Example: MLE of Bernoulli
– Example: MLE of Categorical
– Aside: Method of Lagrange Multipliers
29
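As a preview of where the Lagrange-multiplier aside lands for the categorical case: the constrained MLE is the vector of empirical frequencies θ_k = N_k / N. A sketch that checks this against a constrained numerical optimizer (the counts are made up):

import numpy as np
from scipy.optimize import minimize

counts = np.array([3, 5, 2])                                   # N_k for K = 3 categories
neg_ll = lambda theta: -np.sum(counts * np.log(theta))         # -l(theta)
sum_to_one = {'type': 'eq', 'fun': lambda theta: theta.sum() - 1}

res = minimize(neg_ll, x0=np.ones(3) / 3,
               bounds=[(1e-9, 1.0)] * 3, constraints=[sum_to_one])
print(counts / counts.sum())   # closed form: [0.3, 0.5, 0.2]
print(res.x)                   # numerical solution matches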
MLE vs. MAP
Suppose we have data D = {x(i)}_{i=1}^N.

Principle of Maximum Likelihood Estimation:
Choose the parameters that maximize the likelihood of the data.

  θMLE = argmax_θ ∏_{i=1}^N p(x(i) | θ)              Maximum Likelihood Estimate (MLE)

Principle of Maximum a posteriori (MAP) Estimation:
Choose the parameters that maximize the posterior of the parameters given the data.

  θMAP = argmax_θ ∏_{i=1}^N p(x(i) | θ) p(θ)          Maximum a posteriori (MAP) estimate

Here p(θ) is the prior. Important! Usually the parameters are continuous, so the prior is a probability density function.
30–32
Learning from Data (Bayesian)
Whiteboard
– maximum a posteriori (MAP) estimation
– Optimization for MAP
– Example: MAP of Bernoulli—Beta
33
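Following up on the "MAP of Bernoulli—Beta" example: with a Beta(α, β) prior on ɸ, the MAP estimate has the closed form ɸMAP = (N1 + α − 1) / (N + α + β − 2), versus the MLE ɸMLE = N1 / N. A minimal sketch with assumed hyperparameters:

import numpy as np

x = np.array([1, 1, 1, 1, 0])       # N = 5 samples, N1 = 4
N, N1 = len(x), x.sum()
alpha, beta = 2.0, 2.0              # assumed Beta prior hyperparameters

phi_mle = N1 / N                                       # 0.8
phi_map = (N1 + alpha - 1) / (N + alpha + beta - 2)    # 5/7 ≈ 0.71, pulled toward the prior mean 0.5
print(phi_mle, phi_map)

The prior acts like α − 1 pseudo-counts of ones and β − 1 pseudo-counts of zeros, which is why MAP shrinks the estimate toward the prior.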
Takeaways
• One view of what ML is trying to accomplish is
function approximation
• The principle of maximum likelihood
estimation provides an alternate view of
learning
• Synthetic data can help debug ML algorithms
• Probability distributions can be used to model
real data that occurs in the world
(don’t worry we’ll make our distributions more
interesting soon!)
34
Learning Objectives
MLE / MAP
You should be able to…
1. Recall probability basics, including but not limited to: discrete
and continuous random variables, probability mass functions,
probability density functions, events vs. random variables,
expectation and variance, joint probability distributions,
marginal probabilities, conditional probabilities, independence,
conditional independence
2. Describe common probability distributions such as the Beta,
Dirichlet, Multinomial, Categorical, Gaussian, Exponential, etc.
3. State the principle of maximum likelihood estimation and
explain what it tries to accomplish
4. State the principle of maximum a posteriori estimation and
explain why we use it
5. Derive the MLE or MAP parameters of a simple model in closed
form
35
NAÏVE BAYES
36
Naïve Bayes Outline
• Real-world Dataset
– Economist vs. Onion articles
– Document → bag-of-words → binary feature vector
• Naive Bayes: Model
– Generating synthetic "labeled documents"
– Definition of model
– Naive Bayes assumption
– Counting # of parameters with / without
NB assumption
• Naïve Bayes: Learning from Data
– Data likelihood
– MLE for Naive Bayes
– MAP for Naive Bayes
• Visualizing Gaussian Naive Bayes
37
Naïve Bayes
• Why are we talking about Naïve Bayes?
– It’s just another decision function that fits into
our “big picture” recipe from last time
– But it’s our first example of a Bayesian Network
and provides a clearer picture of probabilistic
learning
– Just like the other Bayes Nets we’ll see, it admits
a closed form solution for MLE and MAP
– So learning is extremely efficient (just counting)
38
Fake News Detector
Today’s Goal: To define a generative model of emails
of two different classes (e.g. real vs. fake news)
[Figure: example articles from CNN and The Onion]
40
Fake News Detector
[Figure: the CNN and The Onion articles represented as feature vectors]
We can pretend the natural process generating these vectors is stochastic…
41
Naive Bayes: Model
Whiteboard
– Document → bag-of-words → binary feature vector
– Generating synthetic "labeled documents"
– Definition of model
– Naive Bayes assumption
– Counting # of parameters with / without NB
assumption
42
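A minimal sketch of the "Document → bag-of-words → binary feature vector" step above, assuming a small hand-picked vocabulary:

import numpy as np

vocab = ["economy", "senate", "aliens", "report", "onion"]     # assumed vocabulary
doc = "Senate report says the economy grew; Senate leaders cheer"

words = set(doc.lower().replace(";", " ").split())
x = np.array([1 if w in words else 0 for w in vocab])          # x_m = 1 iff word m appears
print(x)   # [1 1 0 1 0]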
Model 1: Bernoulli Naïve Bayes
Flip a weighted coin. If HEADS, flip each red coin; if TAILS, flip each blue coin.
Each red coin corresponds to an xm.

  y   x1  x2  x3  …  xM
  0   1   0   1   …  1
  1   0   1   0   …  1
  1   1   1   1   …  1
  0   0   0   1   …  1
  0   1   0   1   …  0
  1   1   0   1   …  0

We can generate data in this fashion. Though in practice we never would, since our data is given. Instead, this provides an explanation of how the data was generated (albeit a terrible one).
43
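A sketch of the generative story on this slide, with made-up coin weights and assuming HEADS corresponds to y = 1 (red coins) and TAILS to y = 0 (blue coins):

import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 6                                  # number of features, number of examples
phi = 0.5                                    # P(y = 1): the weighted coin (assumed value)
theta_red  = rng.uniform(size=M)             # P(x_m = 1 | y = 1), one red coin per feature
theta_blue = rng.uniform(size=M)             # P(x_m = 1 | y = 0), one blue coin per feature

y = rng.binomial(1, phi, size=N)             # flip the weighted coin for each example
theta = np.where(y[:, None] == 1, theta_red, theta_blue)   # pick red or blue coins per row
X = rng.binomial(1, theta)                   # flip the chosen coin for each feature
print(np.column_stack([y, X]))               # rows look like the table on this slide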