Questions For Unit 4

Estimation - Classical and Bayesian Approach

Classical (Frequentist) Approach


• Focus: Treats the parameter as a fixed but unknown quantity and uses the data alone to infer its value.
• Point Estimation: Uses estimators like Maximum Likelihood Estimation (MLE) or Method of Moments (MoM).
– MLE Formula:
θ̂_MLE = arg max_θ L(θ; X)
• Interval Estimation: Constructs confidence intervals, often using normal approximations.
• Objective: Obtain point estimates for parameters without involving prior knowledge.
• Properties:
– Consistency: θ̂ → θ as n → ∞.
– Unbiasedness: E(θ̂) = θ.
– Efficiency: Minimizes variance among unbiased estimators.
– Sufficiency: Uses all information in the sample.
• Key Method: Maximum Likelihood Estimation (MLE); a minimal numerical sketch follows this list.
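To make the MLE definition concrete, here is a minimal Python sketch (not part of the original notes): the Bernoulli model, the simulated data, and the helper neg_log_lik are illustrative assumptions, and scipy's bounded scalar minimizer is used only to show that the arg max can be found numerically.

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data: 200 Bernoulli trials with success probability 0.3 (theta is fixed but unknown).
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)

# Negative log-likelihood of the Bernoulli model; minimizing it maximizes the likelihood.
def neg_log_lik(theta):
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# theta_hat_MLE = arg max L(theta; X), found here by a bounded numerical search.
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE:", res.x, " closed form (sample proportion):", x.mean())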

Bayesian Approach
• Focus: Treats parameters as random variables with prior distributions.
• Bayes’ Theorem:
p(θ|X) = p(X|θ) p(θ) / p(X)
• Posterior Distribution: Combines prior and likelihood to form the posterior,
p(θ|X) ∝ p(X|θ) · p(θ)
• Objective: Incorporate prior beliefs with sample evidence to update knowledge.
• Properties:
– Flexible, subjective.
– Allows prior updates with new data.
– Uses posterior predictive distribution for inference.

Summary:
• Philosophy: Classical estimation treats parameters as fixed values, while Bayesian estimation treats parameters
as random variables with distributions.
• Prior Information: Bayesian estimation incorporates prior beliefs through prior distributions, while classical
estimation uses data alone.
• Uncertainty Quantification: Classical methods typically use point estimates and confidence intervals, while
Bayesian methods provide a full distribution (posterior) and credible intervals.
• Computation: Classical estimation is often simpler, while Bayesian methods require more complex computational
techniques such as MCMC (a minimal sketch follows this list).
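As a small illustration of the last point, the following is a toy random-walk Metropolis sketch; the N(0, 10²) prior on a normal mean, the unit-variance likelihood, the simulated data, and the step size 0.3 are all assumptions made purely for demonstration, not a recommended sampler.

import numpy as np

# Toy posterior: prior mu ~ N(0, 10^2), likelihood X_i ~ N(mu, 1), 50 simulated observations.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

def log_posterior(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2          # N(0, 10^2) prior, up to an additive constant
    log_lik = -0.5 * np.sum((data - mu) ** 2)    # N(mu, 1) likelihood, up to an additive constant
    return log_prior + log_lik

# Random-walk Metropolis: propose a nearby value, accept with the usual log-ratio rule.
samples, mu_current = [], 0.0
for _ in range(5000):
    proposal = mu_current + rng.normal(scale=0.3)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu_current):
        mu_current = proposal
    samples.append(mu_current)

print("approximate posterior mean:", np.mean(samples[1000:]))   # discard burn-in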

Methods of Estimation
Point Estimation
• A single value estimate of a parameter.
• Methods:
– MLE: Maximizes the likelihood function;
θ̂_MLE = arg max_θ L(θ|X)

– Method of Moments (MoM): Sets sample moments equal to population moments.


– Bayesian Estimation: Uses posterior mean, median, or mode as estimates.

Interval Estimation
• Provides a range within which the parameter lies with a certain confidence.
• Confidence Interval:
[ θ̂ − z_{α/2} · σ_θ̂ , θ̂ + z_{α/2} · σ_θ̂ ]

• Method of Moments (MoM):

– Sets sample moments equal to population moments to solve for parameters.


– Example: for a parameter θ, set E(X) equal to the sample mean and solve for θ.

• Maximum Likelihood Estimation (MLE):


– Maximizes the likelihood function with respect to parameters.
• Bayesian Estimation:
– Computes posterior mean, median, or mode based on the posterior distribution.
– Posterior Mean (see the numerical sketch after this list):
E(θ|X) = ∫ θ p(θ|X) dθ
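The posterior-mean integral above can be checked numerically. The sketch below assumes, purely for illustration, that the posterior is Beta(3, 5), so the numeric answer can be compared to the exact mean 3/8.

import numpy as np
from scipy.stats import beta
from scipy.integrate import trapezoid

# Assume, for illustration, that the posterior p(theta | X) is Beta(3, 5).
theta = np.linspace(0.0, 1.0, 10001)
posterior = beta.pdf(theta, 3, 5)

# E(theta | X) = integral of theta * p(theta | X) d theta, evaluated with the trapezoidal rule.
post_mean = trapezoid(theta * posterior, theta)
print(post_mean)   # ~0.375, the exact Beta(3, 5) mean 3 / (3 + 5)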

Likelihood and EM Algorithm


• Likelihood Function:
L(θ; X) = p(X|θ)

• Log-Likelihood:
ℓ(θ; X) = log(L(θ; X))

• Properties of Likelihood:
– Consistency: θ̂MLE → θ as n → ∞.
– Asymptotic Normality:
√n (θ̂_MLE − θ) ∼ N(0, I(θ)⁻¹)
where I(θ) is the Fisher information.
– The likelihood is the objective function maximized to obtain the MLE.
– For large samples, the MLE is approximately normally distributed (by the asymptotic normality above).
• EM Algorithm:
– Purpose: Estimate parameters in models with latent variables.
– Properties:
∗ Iterative improvement.
∗ Converges to a local maximum of the likelihood.

– Iterative algorithm for finding MLE when data are incomplete or have latent variables.
– Steps:
∗ E-step: Compute the expected complete-data log-likelihood Q(θ|θ^(t)) under the current estimate θ^(t).
∗ M-step: Maximize Q(θ|θ^(t)) with respect to θ to obtain the updated estimate θ^(t+1).

Prior Distributions
Conjugate Priors
• Definition: Prior and posterior distributions are in the same family.
• Examples:
– Normal prior for normal likelihood.
– Beta prior for binomial likelihood.
• Benefit: Simplifies computation of posterior.

Informative Prior
• Reflects specific prior knowledge about the parameter.

• Example: Using expert data for priors.

Non-informative Prior
• Represents lack of prior information (e.g., uniform distribution).

• Objective: Minimize influence of prior on posterior.

Loss Functions
• Purpose: Quantify the cost of estimation errors.

• Common Loss Functions:


– Squared Error Loss:
L(θ, θ̂) = (θ − θ̂)2
– Absolute Error Loss:
L(θ, θ̂) = |θ − θ̂|
– Zero-One Loss:
L(θ, θ̂) = I(θ ̸= θ̂)

Risk Function
• Definition: Expected value of the loss function, taken over the sampling distribution of the data for a given θ.
• Formula:
R(θ, θ̂) = Eθ [L(θ, θ̂)]

• Bayes Risk: The expected posterior loss; Bayesian decision-making chooses the estimator that minimizes it.

Examples
Example (Classical): Suppose you have a sample of heights from a population and want to estimate the population
mean µ.
Let’s assume the sample heights are X = {170, 165, 180, 175, 160}.
Sample Mean (Point Estimate):
µ̂ = (1/n) Σᵢ Xᵢ = (170 + 165 + 180 + 175 + 160) / 5 = 170

Confidence Interval for Mean (assuming a normal distribution with unknown variance): Compute the sample
standard deviation s:
s = √[ (1/(n−1)) Σᵢ (Xᵢ − µ̂)² ] = √[ (1/4) Σᵢ (Xᵢ − 170)² ] = 7.91

For a 95% confidence level, with t_{0.025, 4} ≈ 2.776:

CI = [ µ̂ − t_{α/2, n−1} · s/√n , µ̂ + t_{α/2, n−1} · s/√n ]
   = (170 − 2.776 × 3.54, 170 + 2.776 × 3.54) = (160.17, 179.83)
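A quick numerical check of these figures (assuming numpy and scipy are available; the variable names are arbitrary):

import numpy as np
from scipy import stats

x = np.array([170, 165, 180, 175, 160])
n = len(x)
xbar = x.mean()                           # 170.0
s = x.std(ddof=1)                         # ~7.906
t_crit = stats.t.ppf(0.975, df=n - 1)     # ~2.776
half = t_crit * s / np.sqrt(n)            # ~9.82
print(xbar - half, xbar + half)           # ~(160.2, 179.8), matching the interval above up to rounding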


Example (Bayesian): Assume a prior belief that the population mean µ is normally distributed with µ₀ = 160 and
variance σ₀² = 25.
The likelihood (data distribution) is also normal, X ∼ N(µ, σ²), with σ² = 16.
Posterior Mean (using the conjugate normal prior, treating X̄ as a single observation with variance σ²):

µ_posterior = (σ₀² · X̄ + σ² · µ₀) / (σ₀² + σ²) = (25 · 170 + 16 · 160) / (25 + 16) ≈ 166.10
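A one-line check of the conjugate-update arithmetic above:

# Conjugate normal update from the example: prior N(160, 25), data mean 170 with variance 16.
sigma0_sq, mu0 = 25.0, 160.0
sigma_sq, xbar = 16.0, 170.0
mu_post = (sigma0_sq * xbar + sigma_sq * mu0) / (sigma0_sq + sigma_sq)
print(mu_post)   # ~166.10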

2. Methods of Estimation
Example (Maximum Likelihood Estimation): Suppose X1 , X2 , . . . , Xn are i.i.d. samples from an exponential
distribution with unknown rate λ: f (x|λ) = λe−λx .
Likelihood Function:
L(λ) = Π_{i=1}^{n} λ e^(−λXᵢ) = λ^n e^(−λ Σᵢ Xᵢ)
Log-Likelihood:
ℓ(λ) = n ln λ − λ Σᵢ Xᵢ
Maximize by taking the derivative:
dℓ/dλ = n/λ − Σᵢ Xᵢ = 0  ⇒  λ̂ = n / Σᵢ Xᵢ
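A short sketch of the closed-form estimator λ̂ = n/ΣXᵢ on simulated data (the true rate 1.5 and the sample size 500 are illustrative assumptions):

import numpy as np

# Hypothetical exponential sample with true rate lambda = 1.5 (numpy uses scale = 1/lambda).
rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 1.5, size=500)

# Closed-form MLE derived above: lambda_hat = n / sum(X_i).
lam_hat = len(x) / x.sum()
print(lam_hat)   # should be close to 1.5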
Example (Method of Moments): Suppose you have data from a distribution with unknown mean µ and variance σ².
For a normal distribution, the first moment is E[X] = µ and the second central moment is E[(X − µ)²] = σ².
Equating sample moments to population moments: sample mean X̄ = µ and sample variance S² = σ².
Thus, the method of moments estimates are µ̂ = X̄ and σ̂² = S².
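A minimal sketch of these moment estimates on a simulated normal sample (the true values µ = 10 and σ² = 4 are assumptions for illustration):

import numpy as np

# Hypothetical normal sample with true mu = 10 and sigma^2 = 4 (i.e. sigma = 2).
rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=2.0, size=1000)

# Method of moments: match the first moment and the second central moment.
mu_hat = x.mean()
sigma2_hat = x.var()     # plain (divide-by-n) sample second central moment
print(mu_hat, sigma2_hat)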

3. Likelihood and Expectation-Maximization (EM) Algorithm


Example (Likelihood): Suppose X1 , X2 , . . . , Xn are i.i.d. samples from a normal distribution N (µ, σ 2 ).
Likelihood Function:
L(µ, σ²) = Π_{i=1}^{n} (1/√(2πσ²)) e^(−(Xᵢ−µ)²/(2σ²))
Log-Likelihood:
ℓ(µ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (Xᵢ − µ)²
To find the MLEs µ̂ and σ̂ 2 , differentiate ℓ with respect to µ and σ 2 , set to zero, and solve.
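Solving those equations gives the familiar closed forms µ̂ = X̄ and σ̂² = (1/n) Σᵢ (Xᵢ − X̄)². A minimal sketch on simulated data (the true values are assumed for illustration):

import numpy as np

# Hypothetical normal sample; the MLEs are the sample mean and the (1/n)-variance.
rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=3.0, size=2000)

mu_mle = x.mean()
sigma2_mle = np.mean((x - mu_mle) ** 2)   # divides by n, not n - 1
print(mu_mle, sigma2_mle)                 # close to 5 and 9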
Example (EM Algorithm): Assume you observe data X from a mixture of two normal distributions with unknown
means µ1 , µ2 and common variance σ 2 .
E-Step: Compute the probability each observation belongs to each component, given the current parameter estimates.
M-Step: Use these probabilities to update the parameter estimates (e.g., means and variances) by maximizing the
expected log-likelihood.
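A compact sketch of this EM loop for a two-component normal mixture with common variance, following the description above; the simulated data, the starting values, and the fixed number of iterations are all assumptions made for illustration.

import numpy as np
from scipy.stats import norm

# Hypothetical data: mixture of N(0, 1) and N(5, 1) with equal weights.
rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])

# Arbitrary initial guesses.
mu1, mu2, sigma, pi1 = -1.0, 6.0, 2.0, 0.5

for _ in range(100):
    # E-step: responsibility of each component for each observation.
    p1 = pi1 * norm.pdf(x, mu1, sigma)
    p2 = (1 - pi1) * norm.pdf(x, mu2, sigma)
    gamma1 = p1 / (p1 + p2)
    gamma2 = 1 - gamma1

    # M-step: weighted updates of the means, the common variance, and the mixing weight.
    mu1 = np.sum(gamma1 * x) / np.sum(gamma1)
    mu2 = np.sum(gamma2 * x) / np.sum(gamma2)
    sigma = np.sqrt(np.sum(gamma1 * (x - mu1) ** 2 + gamma2 * (x - mu2) ** 2) / len(x))
    pi1 = gamma1.mean()

print(mu1, mu2, sigma, pi1)   # should approach roughly 0, 5, 1, 0.5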

4. Prior Distributions
Example (Conjugate Prior with Beta-Binomial): Assume X ∼ Binomial(n, θ) with a beta prior θ ∼ Beta(α, β).
Posterior Distribution: Since the beta prior is conjugate, the posterior is also a Beta distribution:
θ|X ∼ Beta(α + X, β + n − X)
Interpretation: Posterior updates based on the observed successes X and failures n − X, blending prior beliefs with
new data.
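A tiny sketch of this Beta-Binomial update, assuming for illustration a Beta(2, 2) prior and X = 7 successes out of n = 10 trials:

from scipy.stats import beta

# Prior Beta(2, 2); observe X = 7 successes in n = 10 Bernoulli trials.
a, b, n, X = 2, 2, 10, 7

# Conjugacy: the posterior is Beta(a + X, b + n - X) = Beta(9, 5).
posterior = beta(a + X, b + n - X)
print(posterior.mean())   # (a + X) / (a + b + n) = 9/14, about 0.643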

5. Loss Functions
Example (Squared Error Loss): Suppose you want to estimate the parameter θ = 5 and your estimate θ̂ = 4.
Squared Error Loss:
L(θ, θ̂) = (θ − θ̂)2 = (5 − 4)2 = 1
Example (Bayesian Decision with Loss Function): For estimating a parameter with squared error loss, the
Bayes estimator is the posterior mean.
Suppose the posterior distribution of θ after observing data is θ|X ∼ N (10, 2).
Posterior Mean: Since the Bayes estimator minimizes squared error loss, the best estimate of θ is 10.

6. Risk Function
Example (Risk Function for a Specific Estimator): Suppose X ∼ N (µ, 1) and you use µ̂ = X as the estimator for
µ.
Squared Error Risk: Since X is an unbiased estimator, R(µ, µ̂) = E[(X − µ)2 ] = Var(X) = 1.
The risk, or expected loss, is constant at 1, regardless of µ.
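A Monte Carlo check that this risk really is constant in µ (the particular µ values and the sample size are arbitrary):

import numpy as np

# Estimator mu_hat = X for X ~ N(mu, 1); its squared-error risk should be 1 for every mu.
rng = np.random.default_rng(6)
for mu in (-3.0, 0.0, 10.0):
    X = rng.normal(loc=mu, scale=1.0, size=200_000)
    print(mu, np.mean((X - mu) ** 2))   # ~1.0 in each case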

Problem:
Suppose you have a sample of n = 5 observations from a distribution with the probability density function (PDF) given
by:

f(x; θ) = θ x^(θ−1),  0 ≤ x ≤ 1,  θ > 0
where θ is the unknown parameter.

1. Method of Moments: Use the method of moments to estimate the parameter θ.


2. Maximum Likelihood Estimation (MLE): Find the Maximum Likelihood Estimate (MLE) for θ.
3. EM Algorithm: Suppose the observed data come from a mixture of two distributions with the same form (but different parameters) and the goal is to estimate the parameters of the mixture. Set up the Expectation-Maximization (EM) algorithm for this problem.

Solution:
1. Method of Moments:
The method of moments is used to estimate parameters by equating the sample moments to the population moments.
The first population moment (mean) for the given distribution is:

E[X] = ∫₀¹ x f(x; θ) dx = ∫₀¹ x · θ x^(θ−1) dx = ∫₀¹ θ x^θ dx

This simplifies to:

E[X] = θ / (θ + 1)

Now, the sample mean is:

x̄ = (1/n) Σ_{i=1}^{n} xᵢ

Equating the sample mean to the population mean:

x̄ = θ / (θ + 1)

Solving for θ:

θ = x̄ / (1 − x̄)

Thus, the method of moments estimate for θ is:

θ̂_MM = x̄ / (1 − x̄)
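A short sketch of θ̂_MM on simulated data; it assumes the true θ = 3 and uses numpy's Beta sampler, since f(x; θ) = θx^(θ−1) on [0, 1] is the Beta(θ, 1) density. With only n = 5 observations the estimate is naturally noisy.

import numpy as np

# f(x; theta) = theta * x**(theta - 1) on [0, 1] is the Beta(theta, 1) density.
rng = np.random.default_rng(7)
theta_true = 3.0
x = rng.beta(theta_true, 1.0, size=5)     # n = 5 observations, as in the problem

xbar = x.mean()
theta_mm = xbar / (1 - xbar)              # method-of-moments estimate derived above
print(theta_mm)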

2. Maximum Likelihood Estimation (MLE):


To find the MLE, we first write down the likelihood function for a sample of size n:
L(θ) = Π_{i=1}^{n} f(xᵢ; θ) = Π_{i=1}^{n} θ xᵢ^(θ−1)

This simplifies to:

L(θ) = θ^n Π_{i=1}^{n} xᵢ^(θ−1)

The log-likelihood function is:

log L(θ) = n log θ + (θ − 1) Σ_{i=1}^{n} log xᵢ

To maximize the log-likelihood, we take the derivative with respect to θ:

d/dθ log L(θ) = n/θ + Σ_{i=1}^{n} log xᵢ

Setting the derivative equal to 0:

n/θ + Σ_{i=1}^{n} log xᵢ = 0

Solving for θ, the MLE estimate is:

θ̂_MLE = −n / Σ_{i=1}^{n} log xᵢ
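The corresponding MLE on the same kind of simulated sample (again assuming θ = 3 purely for illustration):

import numpy as np

rng = np.random.default_rng(8)
x = rng.beta(3.0, 1.0, size=5)            # 5 draws from f(x; theta) with theta = 3

# MLE derived above: theta_hat = -n / sum(log x_i).
theta_mle = -len(x) / np.sum(np.log(x))
print(theta_mle)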

3. EM Algorithm:
In this case, suppose the observed data comes from a mixture of two distributions with the same form but different
parameters: f (x; θ1 ) and f (x; θ2 ). The goal is to estimate the parameters θ1 and θ2 .
E-step (Expectation step): Given the current estimates of θ1 and θ2 , compute the responsibilities, i.e., the
probabilities of each data point belonging to each distribution:

γ_{i1} = π₁ f(xᵢ; θ₁) / (π₁ f(xᵢ; θ₁) + π₂ f(xᵢ; θ₂))
γ_{i2} = π₂ f(xᵢ; θ₂) / (π₁ f(xᵢ; θ₁) + π₂ f(xᵢ; θ₂))
where π1 and π2 are the mixing coefficients.
M-step (Maximization step): Update the parameter estimates θ1 , θ2 , π1 , and π2 based on the responsibilities:
θ̂₁ = − (Σᵢ γ_{i1}) / (Σᵢ γ_{i1} log xᵢ)
θ̂₂ = − (Σᵢ γ_{i2}) / (Σᵢ γ_{i2} log xᵢ)
(these are the weighted versions of the MLE from part 2, obtained by maximizing the expected complete-data log-likelihood with respect to θ₁ and θ₂)
The mixing coefficients are updated as:
π̂₁ = (1/n) Σ_{i=1}^{n} γ_{i1}
π̂₂ = (1/n) Σ_{i=1}^{n} γ_{i2}
Repeat the E-step and M-step iteratively until convergence.
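A minimal sketch of this EM loop; the simulated mixture (θ₁ = 0.5, θ₂ = 5, equal weights), the starting values, and the fixed 200 iterations are assumptions made for illustration.

import numpy as np

def f(x, theta):
    # Component density f(x; theta) = theta * x**(theta - 1) on [0, 1].
    return theta * x ** (theta - 1)

# Hypothetical mixture data: half from theta1 = 0.5, half from theta2 = 5 (Beta(theta, 1) samples).
rng = np.random.default_rng(9)
x = np.concatenate([rng.beta(0.5, 1, 400), rng.beta(5.0, 1, 400)])

theta1, theta2, pi1 = 0.3, 3.0, 0.5       # arbitrary initial guesses
for _ in range(200):
    # E-step: responsibilities.
    p1 = pi1 * f(x, theta1)
    p2 = (1 - pi1) * f(x, theta2)
    g1 = p1 / (p1 + p2)
    g2 = 1 - g1
    # M-step: weighted MLE updates for theta1, theta2 and the mixing-weight update.
    theta1 = -g1.sum() / np.sum(g1 * np.log(x))
    theta2 = -g2.sum() / np.sum(g2 * np.log(x))
    pi1 = g1.mean()

print(theta1, theta2, pi1)   # should approach roughly 0.5, 5, 0.5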
