
Probability and Stochastic Process
PART 2

Concepts of Estimation Theory

Mohammed S. Elmusrati – University of VAASA 1


Introduction
• In estimation theory we look for the best estimate of hidden (unseen) values or parameters based on available data (measurements) or observations.
• Let's define the problem as:

y = h(x; θ) + n

where x is the hidden input, n is the random unknown noise or bias, h is a vector of mapping functions with parameters θ, and N is the number of samples (observations).
Mohammed S. Elmusrati – University of VAASA 2
Estimation Problems
• For the previous equation we have at least the following three interesting cases in estimation:
1. We know (at least partially) x and the observation y, and we are looking for the best set of functions h and their parameters θ. We call this a regression problem (or curve fitting), and it is one of the core tools in many applications such as machine learning. We may know the general form of the mapping h, for example the increasing exponential functions in population models, but we need to know (or estimate) the parameters of the exponential function. Sometimes we do not even know the mapping, and we need to test or modify some general shapes to fit the available x and y data, or use black-box modelling.

Mohammed S. Elmusrati – University of VAASA 3


Estimation Problems
2. The second case is when we have the observations y and we know (or can at least assume) h with its parameters, and we are looking to estimate x. Since the vector x is unknown to us (fully or partially) and is corrupted by noise, we should treat it as a random process.
3. The third case of interest is when we have only the observations y and we know neither the mapping functions h nor the inputs x. However, we may still build some statistical relations between the inputs and the outputs: what is the best estimate of the statistical parameters based on the available information (e.g., y)?

Mohammed S. Elmusrati – University of VAASA 4


Introduction
• We may simplify the general form of our estimation problem, mainly to cover the third case, as:

y = h(x) + n,   x ∈ ℝ^{M×1},   y, n ∈ ℝ^{N×1}

• Here we consider only one mapping function, and we are interested in estimating the deterministic vector x based on the N observations y.
Mohammed S. Elmusrati – University of VAASA 5


Unbiased Estimation
• How do we assess the estimation process? How do we decide which estimate is the best? What are our evaluation criteria?
• Usually we estimate the hidden parameter or inputs, x_est, based on the observations {y}.
• However, since the estimate x_est is based on the random variables y, the estimate will be a random variable as well.
• If the expected value of the estimate, E[x_est], equals the actual estimated value, we call the estimate unbiased.
• x_est is an unbiased estimate of x if E[x_est] = x.
• This means that the expected value of the error between the estimate and the actual value is zero.

Mohammed S. Elmusrati – University of VAASA 6


Consistent Estimator
• The estimator x_est is called consistent if increasing the number of observations reduces its variance, i.e., with N the number of observations (or samples) used in the estimation,

lim_{N→∞} E[(x_est − E[x_est])²] = 0

• Hence, it is highly desirable for the estimator to be unbiased and consistent, because then we will be close to the actual parameter when N is large enough.

Mohammed S. Elmusrati – University of VAASA 7


Efficient Estimator
• It is not possible to have an infinite number of samples in order to make the estimate extremely close to the actual parameter.
• Furthermore, it is possible to have several different unbiased and consistent estimators. Hence, which one should we select?
• We may define the efficient estimator as the one which has the minimum error variance for a certain finite number of samples N.

Mohammed S. Elmusrati – University of VAASA 8
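As a quick illustration of these three properties, here is a minimal Monte Carlo sketch (Python with NumPy; the chosen mean, noise level, and trial count are arbitrary). It compares two unbiased estimators of the mean of Gaussian data, the sample mean and the sample median: both are unbiased and consistent, but the sample mean has the smaller variance, so it is the more efficient of the two.

import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, trials = 2.0, 1.0, 20000

for N in (10, 100, 1000):
    data = rng.normal(true_mean, sigma, size=(trials, N))
    means = data.mean(axis=1)            # estimator 1: sample mean
    medians = np.median(data, axis=1)    # estimator 2: sample median
    # Both biases stay near zero (unbiased); both variances shrink with N (consistent);
    # the sample mean keeps the smaller variance (more efficient for Gaussian data).
    print(f"N={N:5d}  bias(mean)={means.mean() - true_mean:+.4f}  "
          f"bias(median)={medians.mean() - true_mean:+.4f}  "
          f"var(mean)={means.var():.5f}  var(median)={medians.var():.5f}")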


Efficient Estimator
• In order to decide whether an estimator is efficient or not, we should know the minimum variance achievable by estimators of a certain problem.
• Fortunately, the minimum variance, or more precisely the lower bound on the variance of unbiased estimators, is given by the Cramér-Rao (CR) lower bound.
• Consider the same problem as in Slide 2, where we are looking to estimate the vector of input parameters x = [x_1, …, x_M]^T based on the observations y = [y_1, …, y_N]^T.
Mohammed S. Elmusrati – University of VAASA 9
Efficient Estimator
• Define the covariance matrix of the vector of estimates as

R_{x_est} = E[(x − x_est)(x − x_est)^T]

• The element (i, j) of the Fisher information matrix I(x) is given by

[I(x)]_{ij} = −E[∂² log f(y; x) / (∂x_i ∂x_j)]

• where f(y; x) is the joint probability density function between the observations y and the parameters x.
Mohammed S. Elmusrati – University of VAASA 10
Cramer-Rao Lower Bound
• Under the following regularity condition

E[∂ log f(y; x) / ∂x_i] = 0, for all i,

• we may define the lower bound of the estimate variance as

Var[(x_est)_i] ≥ [I^{-1}(x)]_{ii}

• This means that the variance of the i-th parameter estimate is lower bounded by the corresponding diagonal element of the inverse Fisher information matrix, under the regularity condition.
Mohammed S. Elmusrati – University of VAASA 11
Cramer-Rao Lower Bound
• We can reduce the general form of the Cramér-Rao Lower Bound (CRLB) given in the previous slide to a single estimated parameter as:

Var(x_est) ≥ −1 / E[∂² log f(y; x) / ∂x²]

• The regularity condition is

E[∂ log f(y; x) / ∂x] = 0

• The proof is not difficult and can be found in many books on estimation theory, so it is not given here. Nevertheless, we will use the bound to prove the efficiency of certain estimators.
Mohammed S. Elmusrati – University of VAASA 12
Least Mean Square Error Estimate
• Let’s derive our first estimation algorithm. We
assume a single scalar parameter, i.e., we have N
observations based on yi = h( x ) + ni , i=1,..,N
• We don’t know the function h(.) and ni is additive
random measurement noise.
• One possible criteria is to find the best estimate
xest that minimizes the mean square error as:
é
( ( )) ù
2
minE ê x - x est y úû
ë
Mohammed S. Elmusrati – University of VAASA 13
Least Mean Square Error Estimate
• We may formulate the expected value as

E[(x − x_est(y))²] = ∫∫ (x − x_est(y))² f_{XY}(x, y) dx dy

• where f_{XY}(x, y) is the joint distribution between the measurements and the parameter x. We may modify this formula, since f_{XY}(x, y) = f_Y(y) f_{X|Y}(x|y):

E[(x − x_est(y))²] = ∫∫ (x − x_est(y))² f_Y(y) f_{X|Y}(x|y) dx dy

• We can find the x_est that minimizes this function from

d/dx_est E[(x − x_est(y))²] = 0
Mohammed S. Elmusrati – University of VAASA 14
Least Mean Square Error Estimate
• Hence,

d/dx_est ∫ f_Y(y) [ ∫ (x − x_est(y))² f_{X|Y}(x|y) dx ] dy = 0

• But since f_Y(y) is always positive, it suffices that

d/dx_est ∫ (x − x_est(y))² f_{X|Y}(x|y) dx = 0  ⇒  −2 ∫ (x − x_est(y)) f_{X|Y}(x|y) dx = 0

∫ x f_{X|Y}(x|y) dx − x_est(y) ∫ f_{X|Y}(x|y) dx = 0

• Therefore,

x_est(y) = ∫ x f_{X|Y}(x|y) dx = E[x | y]
Mohammed S. Elmusrati – University of VAASA 15


Least Mean Square Error Estimate
• The previous result is very important since it says that the best estimate of the parameter x is the mean of the conditional probability density function of the parameter given the observations, i.e., f(x|y).
• This function is not always available.
• Are there other ways to estimate the parameter x based on the observations?
• Yes, we can use several other norms, such as minimizing the absolute value of the error or minimizing the maximum value of the error, as in the next slides:
Mohammed S. Elmusrati – University of VAASA 16
Minimum Absolute Error
• What is the optimum parameter estimate x_est that minimizes error = E[|x − x_est(y)|]?
• Let's proceed in a similar way as we did to minimize the mean square error:

E[|x − x_est(y)|] = ∫∫ |x − x_est(y)| f_Y(y) f_{X|Y}(x|y) dx dy = ∫ f_Y(y) [ ∫ |x − x_est(y)| f_{X|Y}(x|y) dx ] dy

d/dx_est E[|x − x_est(y)|] = 0  ⇒  d/dx_est ∫ |x − x_est(y)| f_{X|Y}(x|y) dx = 0

⇒ d/dx_est [ − ∫_{−∞}^{x_est(y)} (x − x_est(y)) f_{X|Y}(x|y) dx + ∫_{x_est(y)}^{∞} (x − x_est(y)) f_{X|Y}(x|y) dx ] = 0
Mohammed S. Elmusrati – University of VAASA 17


Minimum Absolute Error
• Now we differentiate with respect to x_est using the following Leibniz integral rule:

∂/∂z ∫_{a(z)}^{b(z)} f(x, z) dx = ∫_{a(z)}^{b(z)} ∂f(x, z)/∂z dx + f(b(z), z) ∂b/∂z − f(a(z), z) ∂a/∂z

• We obtain:

∫_{−∞}^{x_est(y)} f_{X|Y}(x|y) dx − ∫_{x_est(y)}^{∞} f_{X|Y}(x|y) dx = 0  ⇒  ∫_{−∞}^{x_est(y)} f_{X|Y}(x|y) dx = ∫_{x_est(y)}^{∞} f_{X|Y}(x|y) dx

• This means that the optimum estimate is the median of the conditional probability density function f(x|y).
Mohammed S. Elmusrati – University of VAASA 18


MinMax Error Criteria
• The third common criterion for optimizing the estimated parameter is the one that minimizes the maximum error, i.e., min{ max E[|x − x_est(y)|] }.
• This could be formulated as

d/dx_est { max ∫ |x − x_est(y)| f_{X|Y}(x|y) dx }

• Roughly speaking, the maximum is achieved at the maximum of f(x|y). In other words, the best estimate in this case is the mode of f(x|y).

Mohammed S. Elmusrati – University of VAASA 19


Different Estimators
• We have seen so far three different estimators according to the chosen criterion. However, all of them are based on the conditional probability density function:
– Minimizing the error variance (L2 norm): the mean of f_{X|Y}(x|y)
– Minimizing the absolute value of the error (L1 norm): the median of f_{X|Y}(x|y)
– Minimizing the maximum of the error (L∞ norm): the mode of f_{X|Y}(x|y)

Mohammed S. Elmusrati – University of VAASA 20


Different Estimators

[Figure: a conditional density f_{X|Y}(x|y) with the three estimates x̂_MV (mean), x̂_MM (median), and x̂_MAP (mode) marked on the x-axis.]
It is interesting to note that if the conditional density is symmetric, like the Normal distribution, then all three of these estimators are identical.

Mohammed S. Elmusrati – University of VAASA 21
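To make the comparison concrete, here is a small numerical sketch (Python with NumPy; the two example densities are arbitrary illustrative choices). It evaluates a conditional density on a grid and extracts its mean, median, and mode: for a symmetric, Gaussian-shaped density the three coincide, while for a skewed density they differ.

import numpy as np

def mean_median_mode(x, fx):
    """Mean, median and mode of a density sampled on a uniform grid x."""
    dx = x[1] - x[0]
    fx = fx / (fx.sum() * dx)            # normalize so the density integrates to 1
    mean = (x * fx).sum() * dx
    cdf = np.cumsum(fx) * dx             # approximate cumulative distribution
    median = x[np.searchsorted(cdf, 0.5)]
    mode = x[np.argmax(fx)]
    return mean, median, mode

x = np.linspace(-5.0, 20.0, 25001)
symmetric = np.exp(-0.5 * (x - 3.0) ** 2)              # Gaussian-shaped density
skewed = np.where(x > 0, x ** 2 * np.exp(-x), 0.0)     # Gamma(3,1)-shaped density

print("symmetric:", mean_median_mode(x, symmetric))    # all three are close to 3.0
print("skewed   :", mean_median_mode(x, skewed))       # mode 2, median ~2.67, mean 3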


Maximum Likelihood Estimator
• It has been shown that the best estimator should be based on the conditional probability of the parameter we are looking to estimate given the observations or measurements, f_{X|Y}(x|y).
• Unfortunately, it is a real challenge to have an accurate formulation of this posterior density. However, using Bayes' rule, we may express it as:

f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y)
Mohammed S. Elmusrati – University of VAASA 22


Maximum Likelihood Estimator
• In terms of estimation theory we may interpret the terms of the previous Bayes formula as:
– The density f_{X|Y}(x|y) represents the distribution of the unknown parameter x after collecting the measurements y. Hence, it is called the posterior probability density function.
– The density f_X(x) represents our beliefs about the possible values of x before we see any observations or collect any measurements. This could be based on assumptions and/or physical behavior. It represents the prior statistical knowledge about x.
– The density f_{Y|X}(y|x) is called the likelihood density, which expresses how the measurements or observations should behave for a certain parameter x.
– Finally, the density f_Y(y) represents the general distribution of the measurements regardless of the parameter x.
Mohammed S. Elmusrati – University of VAASA 23
Maximum Likelihood Estimator
• As we have seen from the different estimation techniques, one method is to take the maximum value of the posterior probability density f_{X|Y}(x|y).
• Therefore, it is named maximum a posteriori (MAP) estimation.
• However, since f_{X|Y}(x|y) is generally very hard to know, let's see how to find some other equivalent estimator.
• Taking the logarithm of the posterior probability density we obtain
Mohammed S. Elmusrati – University of VAASA 24
Maximum Likelihood Estimator
log f_{X|Y}(x|y) = log f_{Y|X}(y|x) + log f_X(x) − log f_Y(y)

• It is clear that taking the logarithm makes the density function easier to handle. Moreover, the logarithm is a monotonically increasing function, i.e., it does not change the location of the maximum point.
• Generally speaking, if g(x) > 0 for all x and x_max = argmax g(x), then it is always true that x_max = argmax log(g(x)).
• Therefore, the MAP estimate could be formulated as

max { log f_{X|Y}(x|y) } = max { log f_{Y|X}(y|x) + log f_X(x) }

Mohammed S. Elmusrati – University of VAASA 25


Maximum Likelihood Estimator
• In the last formulation we have dropped f_Y(y) because it is not a function of the parameter x, so it has no effect on finding the point which maximizes the estimate.
• It is clear that to find the MAP point we need to know the likelihood density function as well as the prior statistical knowledge about the parameter, f_X(x).
• In case we ignore the a priori part and maximize only the likelihood density, we call this estimate the maximum likelihood estimate.
Mohammed S. Elmusrati – University of VAASA 26
Maximum Likelihood Estimator
• In other words, the maximum likelihood estimator is defined as:

x_ML = argmax f_{Y|X}(y|x) = argmax log[ f_{Y|X}(y|x) ]

• We have seen that MAP is an optimum estimator according to a certain criterion.
• Is MLE (the maximum likelihood estimator) optimum in any sense?
• The ML estimator can be the optimum solution, like the MAP, in some cases and a suboptimal estimator in other cases.
Mohammed S. Elmusrati – University of VAASA 27
Maximum Likelihood Estimator
• To see when MLE=MAP, let’s revisit the Bayes rule as
assume that all measurements (y1, y2, ..,yN) are
independent, then
max {log ( f ( x y ))} = max {log ( f ( y x ) f ( x ))} = max ílog ç Õ f ( y x ) f ( x )÷ ý
ìï æ N
ö üï
X k X
ï è øï
XY YX YX
î k=1 þ
ìN ü
ïî k=1
( ) ()
= max íå log é fY X yk x f X x ù ý
êë úû ï
þ
• It is clear that fX(x) is weighting the likelihood function.
Hence, if fX(x) is uniformly distributed over the whole
range of interest, then, it will not have any effects on the
location of the optimum x. In that case MLE=MAP.

Mohammed S. Elmusrati – University of VAASA 28


Maximum Likelihood Estimator
• The previous condition for MAP = MLE could be shown mathematically as:

d/dx [ log f_{X|Y}(x|y) ] |_{x=x_MAP} = 0 = d/dx [ ∑_{k=1}^{N} log f_{Y|X}(y_k|x) + log f_X(x) ] |_{x=x_MAP}

= ∑_{k=1}^{N} (1 / f_{Y|X}(y_k|x)) d f_{Y|X}(y_k|x)/dx + (1 / f_X(x)) d f_X(x)/dx = 0

• Hence, when f_X(x) = constant, its derivative is zero, so that

∑_{k=1}^{N} (1 / f_{Y|X}(y_k|x)) d f_{Y|X}(y_k|x)/dx |_{x=x_ML} = 0

Mohammed S. Elmusrati – University of VAASA 29


Maximum Likelihood Estimator
• Hence, the MLE is in most cases suboptimal. However, it can also be considered an optimal solution when no prior information is available about the parameter to be estimated.
• In this case, the best thing to assume is that the parameter is uniformly distributed. In other words, our pre-knowledge uncertainty is the same for every value.
• What is your pre-knowledge about a thrown coin landing heads or tails, without looking at any observations? Surely the best assumption is that each outcome has probability 0.5 (uniform). Then the MLE is optimum, like the MAP. But if you know in advance (based on some pre-knowledge) that the probability of heads is, for example, larger than 0.7, then MLE becomes just a suboptimal estimator, and one should use MAP, which will give a better estimate.
• These concepts and more will be described through some examples.
Mohammed S. Elmusrati – University of VAASA 30
Example (1)
• Assume we are looking at a process with two outcomes, (Success) or (Fail). It can represent many practical applications, for example:
– Hitting a target or missing it
– Correct or incorrect reception of a transmitted symbol or message
– Positive or negative revenue
• Based on some historical independent observations, we would like to estimate the process parameter (in this case the probability of success, p). Let's first assume that we have no pre-knowledge about the process.

Mohammed S. Elmusrati – University of VAASA 31


Example (1)
• Assume that we have N observations, where M of them were Successes (S) with y_k = 1 and (N − M) were Fails (F) with y_k = 0, e.g.:
SSFSFSSFFSFFFSSFSSFFFFSSSFFS …
• Since all observations are independent,

P(y | x = p) = ∏_{k=1}^{N} p^{y_k} (1 − p)^{1−y_k} = p^M (1 − p)^{N−M}

• Although it is easy to find the MLE for this problem by direct differentiation, in other, more complicated problems this can be tedious and lengthy. Converting the multiplications into a summation by taking the logarithm makes it much handier. We may call the result the likelihood function (not density!):

l(y; x = p) = log[ P(y | x = p) ] = M log p + (N − M) log(1 − p)
Mohammed S. Elmusrati – University of VAASA 32
Example (1)
• Now we find the estimate of p which maximizes the likelihood function:

d/dp l(y; x = p) |_{p=p̂} = 0  ⇒  M/p̂ − (N − M)/(1 − p̂) = 0

⇒ M/p̂ = (N − M)/(1 − p̂)  ⇒  Np̂ − Mp̂ = M − Mp̂  ⇒  p̂ = M/N

• Hence, the intuitively expected way to estimate the probability is exactly the MLE.
Mohammed S. Elmusrati – University of VAASA 33
Example (1)
• Actually, we may express what we have done in the previous slide mathematically as:

p̂ = (1/N) ∑_{k=1}^{N} y_k ;   P(y_k = 1) = p, and P(y_k = 0) = 1 − p

• In some applications, we may call y_k an indicator function.
• Is this MLE estimator biased or unbiased?
• Is it consistent or not?
• Is it an efficient estimator?

Mohammed S. Elmusrati – University of VAASA 34
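Before these questions are answered analytically on the next slides, a short Monte Carlo sketch (Python with NumPy; p = 0.3 is an arbitrary choice for the demonstration) can check them empirically: the average of p̂ = M/N over many trials stays close to p, and its spread shrinks as N grows, matching the variance p(1 − p)/N derived two slides later.

import numpy as np

rng = np.random.default_rng(1)
p, trials = 0.3, 50000

for N in (10, 100, 1000):
    y = rng.random((trials, N)) < p        # N independent Bernoulli(p) observations per trial
    p_hat = y.mean(axis=1)                 # MLE  p_hat = M / N  for each trial
    print(f"N={N:5d}  E[p_hat]~{p_hat.mean():.4f}  (true p={p})  "
          f"Var[p_hat]~{p_hat.var():.6f}  p(1-p)/N={p * (1 - p) / N:.6f}")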


Example (1)
• Based on the definition of unbiased estimators, we should find the expected value of the estimate:

E[y_k] = 1·P(y_k = 1) + 0·P(y_k = 0) = p

• Therefore,

E[p̂] = E[ (1/N) ∑_{k=1}^{N} y_k ] = (1/N) ∑_{k=1}^{N} E[y_k] = (1/N) ∑_{k=1}^{N} p = Np/N = p

• Hence, this estimator is unbiased.

Mohammed S. Elmusrati – University of VAASA 35


Example (1)
• Now let’s compute the variance of the estimated
value to see if it is consistent or not! The
derivation is given in step-by-step next:
éæ 1 N ö 2 ù
() (é
) ù
( )
Var p̂ = E ê p̂ - E éë p̂ ùû ú = E é p̂ - p ù = E éë p̂2 ùû - p2 = E êç å yk ÷ ú - p2
2 2

ë û êë úû êè N k=1 ø ú
ë û
éæ 1 N ö 2 ù é 1 æ N ö2ù é 1 N N ù 1 N N
Þ E êç å yk ÷ ú = E ê 2 ç å y k ÷ ú = E ê 2 å å yk yi ú = 2 å å E éë yk yi ùû
êè N k=1 ø ú ê N è k=1 ø ú ë N k=1 i=1 û N k=1 i=1
ë û ë û
ì E é y2 ù , k=i ìï p, k = i
ï ë kû
E éë yk yi ùû = í ( ) ( )
;E éë yk ùû = 1 P yk = 1 + 0 P y k = 0 = p Þ E éë yk yi ùû = í 2
2 2 2

ï E éë yk ùû E éë yi ùû , k ¹ i ïî p , k ¹ i
î
p N -1 2 ( ) p N -1 2 2 p 1- p ( ) ( )
1 N N
N k=1 i=1
é ù
N
1
( (
Þ 2 å å E ë y k y i û = 2 Np + N - N p = +
2
Mohammed
2
S. N
) )
Elmusrati – N
p ÞVar p̂ = +
University of VAASA N N
()
p -p =
36N
Example (1)
• From the previous slide's result it is clear that

lim_{N→∞} Var(p̂) = lim_{N→∞} p(1 − p)/N = 0

• Hence the estimator is also consistent ☺
• Is it possible to have a better unbiased and consistent estimator than this one? To answer this, we should find the variance lower bound (CRLB):

Var(x_est) ≥ −1 / E[∂² log f(y; x) / ∂x²]

• This is left as an exercise!
Mohammed S. Elmusrati – University of VAASA 37
Example (2)
• Suppose we have some extra prior knowledge, or a different uncertainty level, about the parameter x, and that it can be represented by the following density function:

f_X(x) = 4 e^{−αx}, 0 ≤ x ≤ 1

• Find the value of α.
• How might this knowledge affect our optimum estimate of the probability x based on the observations?
• Compare both results.
Mohammed S. Elmusrati – University of VAASA 38
Example (2)
• If we ignore our uncertainty about the parameter before looking at any measurements or observations, we will have the MLE estimator.
• However, this might not be optimum, as we have ignored important uncertainty information.
• The prior probability density function of the unknown parameter is shown in the next slide.

Mohammed S. Elmusrati – University of VAASA 39


Example (2)
∫_0^1 f_X(x) dx = 1  ⇒  4 ∫_0^1 e^{−αx} dx = 1  ⇒  α = 3.9207 (prove it!)

[Figure: the prior density f_X(x) = 4e^{−αx} on 0 ≤ x ≤ 1, decaying from 4 at x = 0.]

Looking at the prior distribution, the probability of being a Success or a Fail is not uniform. Now we have a more accurate impression of the uncertainty. Actually, we know that the probability of the Success case is less than 0.3 with a chance of about 70%. This kind of information should have an impact and improve our estimation of the parameter x.

Mohammed S. Elmusrati – University of VAASA 40


Example (2)
• From slide 29:

d/dx [ ∑_{k=1}^{N} log f_{Y|X}(y_k|x) + log f_X(x) ] |_{x=x_MAP} = 0

d/dx [ M log x + (N − M) log(1 − x) + log 4 − αx ] |_{x=x_MAP} = 0

M/x_MAP − (N − M)/(1 − x_MAP) − α = 0  ⇒  α x_MAP² − (α + N) x_MAP + M = 0

⇒ x_MAP = [ (3.92 + N) ± √((3.92 + N)² − 15.68 M) ] / 7.84,   0 ≤ x_MAP ≤ 1
Mohammed S. Elmusrati – University of VAASA 41
Example (2)
• The previous expression gives two roots, and we should always select the one which lies between 0 and 1.
• Let's assume that in our observations we have M/N = 0.5. The MLE estimate is then p = 0.5. But what will the MAP estimate be, given the availability of the a priori density function?
• The next figure shows the MAP estimate for M/N = 0.5 for several values of N.

Mohammed S. Elmusrati – University of VAASA 42


Example (2)
[Figure: the MAP estimate p̂ = x_MAP plotted versus N for the fixed ratio M/N = 1/2.]

From this figure we can easily see the impact of the pre-knowledge on the estimation of the parameter.

If we have only two observations, i.e., N = 2 with M = 1, the MLE of the probability is p = 0.5. But with MAP, we can see that the estimate is only about 0.2. However, if we repeat the experiment many times, i.e., N is very large and we still have M/N = 0.5, then we approach the belief that p = 0.5.

Mohammed S. Elmusrati – University of VAASA 43
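The curve described above can be reproduced with the following sketch (Python with NumPy). It evaluates the closed-form root derived on slide 41 for the fixed ratio M/N = 1/2 and keeps the root lying in [0, 1]; the value α = 3.9207 is the one found on slide 40.

import numpy as np

ALPHA = 3.9207                     # prior parameter found on slide 40

def x_map(M, N, alpha=ALPHA):
    """Root of alpha*x^2 - (alpha + N)*x + M = 0 that lies in [0, 1]."""
    b = alpha + N
    roots = (b + np.array([-1.0, 1.0]) * np.sqrt(b ** 2 - 4 * alpha * M)) / (2 * alpha)
    return roots[(roots >= 0) & (roots <= 1)][0]

for N in (2, 4, 10, 50, 200, 1000):
    M = N / 2                      # keep the observed ratio M/N = 0.5
    print(f"N={N:5d}  MLE=0.500  MAP={x_map(M, N):.3f}")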


Exercise
• In the previous estimation example, suppose that as a priori information we know that the probability is uniformly distributed between 0.4 and 0.8.
• Find the MAP estimate in this case.

Mohammed S. Elmusrati – University of VAASA 44


Example (3)
• Assume that we are interested in estimating the actual value of a constant x. However, the observation (or measurement) is always corrupted by zero-mean Gaussian noise with known variance σ².
• The mathematical model of this problem is

y_i = x + n_i

• where y_1, y_2, …, y_N are the N available measurements and n_1, n_2, …, n_N are independent identically distributed zero-mean Gaussian (Normal) noise samples.

Mohammed S. Elmusrati – University of VAASA 45


Example (3)
• The example presented in the previous slide is very important, as it gives the foundation for many concepts in estimation theory.
• Since the samples n_i are a zero-mean normally distributed random process, it is clear that the measurements y_i are also normally distributed, but with mean equal to the constant x and with the same variance as n_i. Therefore,

f_{Y|X}(y_i | x) = (1 / (√(2π) σ)) e^{−(y_i − x)² / (2σ²)}
Mohammed S. Elmusrati – University of VAASA 46
Example (3)
• In this example, we assume that we have no prior knowledge about the parameter x that we are looking to estimate. Therefore, the optimum estimator is the maximum likelihood estimate.
• Assume we have N measurements y_1, y_2, …, y_N. Then the likelihood density becomes (due to the independence of the n_i)

f_{Y|X}(y | x) = ∏_{i=1}^{N} (1 / (√(2π) σ)) e^{−(y_i − x)² / (2σ²)} = (2π)^{−N/2} σ^{−N} exp( −∑_{i=1}^{N} (y_i − x)² / (2σ²) )
Mohammed S. Elmusrati – University of VAASA 47


Example (3)
• Again we compute the likelihood function as

l(y; x) = log[ f_{Y|X}(y|x) ] = −(N/2) log(2π) − N log σ − ∑_{i=1}^{N} (y_i − x)² / (2σ²)

⇒ d/dx l(y; x) |_{x=x_ML} = ∑_{i=1}^{N} (y_i − x_ML)/σ² = 0  ⇒  x_ML = (1/N) ∑_{i=1}^{N} y_i

• Hence, the conventional sample mean is the MLE of the actual mean value.
Mohammed S. Elmusrati – University of VAASA 48


Example (3)
• It is quite easy to prove that the previous MLE of the mean is an unbiased and consistent estimator:

E[x_ML] = E[ (1/N) ∑_{i=1}^{N} y_i ] = (1/N) ∑_{i=1}^{N} E[y_i] = (1/N) ∑_{i=1}^{N} x = Nx/N = x  ⇒ unbiased

Var(x_ML) = E[(x_ML − E[x_ML])²] = E[ ((1/N) ∑_i y_i − x)² ] = (1/N²) E[ (∑_i y_i − Nx)² ]

= (1/N²) E[ ∑_i y_i² + ∑_i ∑_{k≠i} y_i y_k − 2Nx ∑_i y_i + N²x² ]

= (1/N²) [ N(σ² + x²) + N(N − 1)x² − 2N²x² + N²x² ] = Nσ²/N² = σ²/N  ⇒ consistent
Mohammed S. Elmusrati – University of VAASA 49
Example (3)
• Let’s check if the estimator is efficient or not:
-1
( )
Var x est ³
é ¶2 ù
E ê 2 log f y;x ú ( )
ë ¶x û
It should be quit easy for
( ) ( ) ()
fY,X y ; x = fY X y x f X x Þ log éë fY,X ( ) ë ( )
y ; x ùû = log éê fY X y x ()
ù + log é f x ù
úû ë X û you to prove the
N regularity condition:
å( y - x )
2

N
( ) ( ) ( ( ))
i
= - log 2p - N log s - i=1
+ a constant representslog éë f X x ùû
2 2s 2

(
å y i - x ¶2 ) é ¶ ù

( )
Þ log éë fY,X y ; x ùû = i=1 2
N
Þ 2 log éë fY,X y ; x ùû = - 2 ( ) ( )
E ê log f y;x ú = 0
¶x s ¶x s ë ¶x û
s2
( )
ÞVar x est ³
N
, Hence, x ML is an efficient estimator

Mohammed S. Elmusrati – University of VAASA 50
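A quick numerical cross-check of this efficiency claim (a sketch in Python with NumPy; the values of x and σ are arbitrary): the empirical variance of the sample-mean estimator over many noisy realizations should sit essentially at the bound σ²/N.

import numpy as np

rng = np.random.default_rng(2)
x_true, sigma, trials = 5.0, 2.0, 100000

for N in (5, 50, 500):
    y = x_true + rng.normal(0.0, sigma, size=(trials, N))   # y_i = x + n_i
    x_ml = y.mean(axis=1)                                    # MLE = sample mean
    print(f"N={N:4d}  empirical Var(x_ML)={x_ml.var():.5f}  "
          f"CRLB sigma^2/N={sigma ** 2 / N:.5f}")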


Example (4)
• In the same example, assume that we are also interested in estimating the noise variance.

l(y; x) = log[ f_{Y|X}(y|x) ] = −(N/2) log(2π) − N log σ − ∑_{i=1}^{N} (y_i − x)² / (2σ²)

⇒ d/dσ l(y; x) |_{σ=σ_ML} = −N/σ_ML + ∑_{i=1}^{N} (y_i − x_ML)² / σ_ML³ = 0  ⇒  σ_ML² = (1/N) ∑_{i=1}^{N} (y_i − x_ML)²
Mohammed S. Elmusrati – University of VAASA 51


Example (4)
• Is the previous estimate of the variance unbiased?

∵ E[x_ML²] = (1/N²) E[ (∑_{i=1}^{N} y_i)² ] = [ N(σ² + x²) + N(N − 1)x² ] / N² = (σ² + Nx²)/N = σ²/N + x²

and, similarly, E[y_i x_ML] = (1/N) ∑_{k=1}^{N} E[y_i y_k] = (1/N) [ (σ² + x²) + (N − 1)x² ] = σ²/N + x²

E[σ_ML²] = (1/N) E[ ∑_{i=1}^{N} (y_i − x_ML)² ] = (1/N) ∑_{i=1}^{N} E[ y_i² − 2 y_i x_ML + x_ML² ]

= (N/N) [ (σ² + x²) − 2(σ²/N + x²) + (σ²/N + x²) ] = ((N − 1)/N) σ²  ⇒  Biased

• To have an unbiased estimator:

σ_ML² = ∑_{i=1}^{N} (y_i − x_ML)² / (N − 1)

• Is it a big problem? No, especially for large N!
Mohammed S. Elmusrati – University of VAASA 52
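The bias and its N − 1 correction can be seen numerically with the following sketch (Python with NumPy; x = 1, σ² = 4 and N = 5 are arbitrary choices for the demonstration).

import numpy as np

rng = np.random.default_rng(3)
x_true, sigma2, N, trials = 1.0, 4.0, 5, 200000

y = x_true + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
x_ml = y.mean(axis=1, keepdims=True)          # sample mean of each trial
ss = ((y - x_ml) ** 2).sum(axis=1)            # sum of squared deviations from x_ML

print("average of  ss / N      :", (ss / N).mean())         # ~ (N-1)/N * sigma2 = 3.2 (biased)
print("average of  ss / (N-1)  :", (ss / (N - 1)).mean())    # ~ sigma2 = 4.0 (unbiased)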
Example (5)
• In the same previous example, let’s assume that we have a
priori knowledge about the parameter to be estimated.
• For example, assume x itself has Normal distribution with
known mean 𝜇x and variance 𝝈x.
• It might be the same problem that x is not fixed but
changing in a random manner, however, we may assume
that it is fixed during the measurement period. One
example is tracking a moving object in unpredictable way.
Hence, we will collect data to estimate its updated location
• Since in this problem we have some extra knowledge even
with high uncertainty, we should use MAP instead of MLE.

Mohammed S. Elmusrati – University of VAASA 53


Example (5)
• Using the MAP formulation given before in slide 29:

Mohammed S. Elmusrati – University of VAASA 54
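For reference, a reconstruction sketch of the result this formulation leads to (assuming the Gaussian prior N(μ_x, σ_x²), the Gaussian likelihood of Example 3 with noise variance σ², and β = σ_x²/σ² so that the limiting behaviour matches the next slide's discussion):

x_MAP = ( σ_x² ∑_{i=1}^{N} y_i + σ² μ_x ) / ( N σ_x² + σ² ) = ( β ∑_{i=1}^{N} y_i + μ_x ) / ( N β + 1 ),   β = σ_x²/σ²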


Example (5)
• The result of the previous example is rather interesting.
• If σ_x is very small, close to zero, this means that the uncertainty about x is very small and x should be very close to μ_x. Look at the MAP estimate with β close to zero: you can see that x_MAP → μ_x regardless of the number of samples N and the values of y_i.
• When σ_x is not small but N is very large, then our estimate will be close to the MLE (the sum of the measurements divided by N).
• Actually, the MAP estimate is an optimized compromise between the information gained from the measurements and the prior information carried by f_X(x).

Mohammed S. Elmusrati – University of VAASA 55


Example (6)
• Assume that we are interested in estimating a slowly changing unknown random process. We know that it follows the Normal distribution N(1, 8); however, our measurements are corrupted with zero-mean random noise distributed as N(0, 1).
• Write a simulation code to assess both the MAP and MLE estimation methods for N = 1 to 100. Compute the average error over 20 random values of the unknown parameter.
• Plot the results.
Mohammed S. Elmusrati – University of VAASA 56
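One possible solution sketch (Python with NumPy and Matplotlib; it assumes N(1, 8) denotes a prior with mean 1 and variance 8, N(0, 1) zero-mean unit-variance noise, and it reuses the Gaussian-prior MAP form sketched after slide 54):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
mu_x, var_x, var_n = 1.0, 8.0, 1.0        # assumed prior N(1, 8) and noise N(0, 1) variances
n_values = np.arange(1, 101)
n_params = 20                             # 20 random draws of the unknown parameter

err_mle = np.zeros(len(n_values))
err_map = np.zeros(len(n_values))

for _ in range(n_params):
    x = rng.normal(mu_x, np.sqrt(var_x))                      # the (slowly changing) parameter
    y = x + rng.normal(0.0, np.sqrt(var_n), size=n_values[-1])
    for j, N in enumerate(n_values):
        s = y[:N].sum()
        x_mle = s / N                                          # sample mean
        x_map = (var_x * s + var_n * mu_x) / (N * var_x + var_n)
        err_mle[j] += abs(x_mle - x) / n_params
        err_map[j] += abs(x_map - x) / n_params

plt.plot(n_values, err_mle, label="MLE")
plt.plot(n_values, err_map, label="MAP")
plt.xlabel("N"); plt.ylabel("average |error|"); plt.legend(); plt.show()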
Example (7)
• A random variable x is to be estimated on the basis of a priori information, and the i-th noisy measurement is expressed as y_i = x + n_i, where n_i is the i-th noise sample.
• Moreover, x and n_i are assumed to be independent. The distribution functions of x and n_i are shown next.
• Find the optimum estimate of x.

[Figure: f_X(x) is uniform on 0 ≤ x ≤ 2 with height 1/2; f_n(n) is uniform on −1/2 ≤ n ≤ 1/2 with height 1.]
Mohammed Salem Elmusrati 57


Example (7)
• As we have done before, let’s first construct fX|yi
fy ( y x) f (x)
f X y x yi =
i
( ) i
X i

( )
f y yi
X

• Since fy is not function in the parameter x, then we may ignore


it, also fX is fixed from 0 to 2, then it is useful only in the
determination of the range of the admissible values of x.
• Therefore, as for the MLE estimation, we may find the
optimum x by looking to one of moments of fyi|X
• It is clear that ì 1 1
fy
i
X ( ) ( ï 1 - £ yi - x £
yi x = fn yi - x = í
ï 0
2 ) 2
î eleswhere
Mohammed Salem Elmusrati 58
Example (7)
• From the previous equation, viewed as a function of x, we have

f_{y_i|X}(y_i | x) = f_n(y_i − x) = 1 for y_i − 1/2 ≤ x ≤ y_i + 1/2, and 0 elsewhere

• Therefore, the parameter x can be determined based on the measurement. For example, if we have a single measurement, say y_i = 1, the conditional density will be uniform from 0.5 to 1.5. Clearly this uniform density does not have a single mode (maximum) value. Hence, we may take the mean or the median, which are equal, giving x̂ = 1, i.e., x̂ = y_i. However, we should keep in mind that 0 ≤ x ≤ 2 as well.
Mohammed S. Elmusrati – University of VAASA 59


Example (7)
• For example, what will be our estimate if y_i = 2.2? Here, we know from our prior information that the maximum of x is 2, so x cannot be 2.2. Therefore, we should truncate the maximum to 2, while the minimum is 2.2 − 0.5 = 1.7. The average is then (1.7 + 2)/2 = 1.85.
• We may construct the optimum estimate of the parameter based on the measurement as:

x̂ = y_i,            0.5 ≤ y_i ≤ 1.5
x̂ = (y_i + 0.5)/2,   y_i ≤ 0.5
x̂ = (y_i + 1.5)/2,   y_i > 1.5

Mohammed S. Elmusrati – University of VAASA 60
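A small sketch of this piecewise rule (Python; the printed checks reproduce the cases y_i = 1 and y_i = 2.2 worked above, and the rule is written as the midpoint of the admissible interval, which is equivalent to the three cases):

def x_hat(y):
    """Midpoint of the admissible interval [max(0, y - 0.5), min(2, y + 0.5)]."""
    lo = max(0.0, y - 0.5)
    hi = min(2.0, y + 0.5)
    return (lo + hi) / 2.0

print(x_hat(1.0))   # 1.0   (interval [0.5, 1.5]: the estimate equals the measurement)
print(x_hat(2.2))   # 1.85  (interval truncated to [1.7, 2.0])
print(x_hat(0.2))   # 0.35  ( = (0.2 + 0.5) / 2, interval truncated to [0, 0.7])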
Example (8)
• In Example (3), suppose the noise samples n_i are correlated. How will this dependence affect the estimation of the parameter x?
• The conditional probability of the measurements given the estimated parameter is:
Mohammed S. Elmusrati – University of VAASA 61


Example (8)

Mohammed S. Elmusrati – University of VAASA 62


Example (9)
• In Example (8), find analytically the ML estimate for two measurements and the following three cases:

R_nn = [ σ²  0 ;  0  σ² ],   R_nn = [ σ₁²  0 ;  0  σ₂² ],   R_nn = [ σ₁²  a ;  a  σ₂² ]

• Compare and comment on the results.

Mohammed S. Elmusrati – University of VAASA 63
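Since the worked derivation is not reproduced here, the following is a hedged numerical sketch (Python with NumPy; my own formulation, offered as an assumption rather than the slides' solution). One standard way to write the Gaussian ML estimate of a constant x from y = x·1 + n with known noise covariance R_nn is the weighted average x_ML = (1ᵀ R_nn⁻¹ y) / (1ᵀ R_nn⁻¹ 1), which reduces to the plain sample mean when R_nn = σ²I; the three covariance cases above can then be compared numerically.

import numpy as np

def x_ml_correlated(y, R):
    """ML estimate of x for y = x*1 + n, n ~ N(0, R): a weighted average of y."""
    ones = np.ones(len(y))
    w = np.linalg.solve(R, ones)           # R^{-1} 1
    return (w @ y) / (w @ ones)            # (1' R^-1 y) / (1' R^-1 1)

y = np.array([2.0, 3.0])
s1, s2, a = 1.0, 4.0, 1.5                  # example variances and correlation term (assumed)

for R in (np.diag([s1, s1]),               # equal variances   -> plain average (2.5)
          np.diag([s1, s2]),               # unequal variances -> the precise sensor weighs more
          np.array([[s1, a], [a, s2]])):   # correlated noise  -> the weights change again
    print(np.round(R, 2).tolist(), "->", round(x_ml_correlated(y, R), 4))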


Example (9)

Mohammed S. Elmusrati – University of VAASA 64


Example (9)

Mohammed S. Elmusrati – University of VAASA 65


Example (9)

Mohammed S. Elmusrati – University of VAASA 66


Example 10
• Assume a system consists of complex interconnected subsystems. Those subsystems may fail independently, with exponentially distributed times to failure with parameter λ. The system is robust, so it suffers a general fault only if k of the subsystems have failed to operate.
• Therefore, the time until the system has a general failure is given by

y = ∑_{i=1}^{k} x_i

where x_i is an exponentially distributed random variable with parameter λ. Hence, the distribution of y is Gamma with the following probability density function:

f_Y(y) = y^{k−1} λ^{k} e^{−λy} / Γ(k)
Mohammed S. Elmusrati – University of VAASA 67
Example 10
• We have a database history of a certain system with several recorded failure times y = [y_1, y_2, …, y_N].
• Assume we know neither k nor λ. Based on the observations y, find the MLE of k and λ.
• Solution:
• Since we assume that all records are independent,

f_Y(y) = ∏_{i=1}^{N} y_i^{k−1} λ^{k} e^{−λ y_i} / Γ(k) = λ^{Nk} ( ∏_{i=1}^{N} y_i )^{k−1} e^{−λ ∑_{i=1}^{N} y_i} / Γ(k)^{N}

• The log-likelihood function is given by

l(y; k, λ) = Nk log λ + (k − 1) ∑_{i=1}^{N} log y_i − λ ∑_{i=1}^{N} y_i − N log Γ(k)
Mohammed S. Elmusrati – University of VAASA 68


Example 10
• Now we can find the optimum parameters that maximize the log-likelihood function:

∂l/∂λ |_{λ=λ_ML} = Nk/λ_ML − ∑_{i=1}^{N} y_i = 0  ⇒  λ_ML = N k_ML / ∑_{i=1}^{N} y_i

• Substituting this result into ∂l/∂k = N log λ + ∑_{i=1}^{N} log y_i − N Γ′(k)/Γ(k) = 0, we obtain

log( N k_ML / ∑_{i=1}^{N} y_i ) − Γ′(k_ML)/Γ(k_ML) + (1/N) ∑_{i=1}^{N} log y_i = 0

⇒ log k_ML − Γ′(k_ML)/Γ(k_ML) = log( (1/N) ∑_{i=1}^{N} y_i ) − (1/N) ∑_{i=1}^{N} log y_i,   where Γ′(k_ML) = dΓ(k)/dk |_{k=k_ML}

Mohammed S. Elmusrati – University of VAASA 69
Example 10
• It is clear from the previous result that estimating the parameters of the Gamma distribution requires solving a non-linear equation. There are many efficient numerical methods that can be used to solve it.
• If we have the following database, y = [2, 3, 7, 9, 3, 5], estimate the parameters k and λ.

Mohammed S. Elmusrati – University of VAASA 70
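One possible numerical sketch for the given data (Python with NumPy and SciPy; it solves the equation from slide 69 with Brent's method and treats k as a continuous parameter):

import numpy as np
from scipy.special import digamma          # digamma(k) = Gamma'(k) / Gamma(k)
from scipy.optimize import brentq

y = np.array([2.0, 3.0, 7.0, 9.0, 3.0, 5.0])
rhs = np.log(y.mean()) - np.log(y).mean()  # log(mean) - mean(log), the right-hand side

# Solve  log(k) - digamma(k) = rhs  for k; the left side decreases toward 0 as k grows.
k_ml = brentq(lambda k: np.log(k) - digamma(k) - rhs, 1e-6, 1e6)
lam_ml = k_ml / y.mean()                   # from d l / d lambda = 0: lambda_ML = k_ML / mean(y)

print(f"k_ML = {k_ml:.3f},  lambda_ML = {lam_ml:.3f}")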


Exercise
• You have a database history of a certain system, y = [y_1, y_2, …, y_N]. We believe that the y_i represent Chi-square random variables (slide 116 in Part 1). Find the MLE of the number of degrees of freedom and the variance.

Mohammed S. Elmusrati – University of VAASA 71


THANK YOU

Mohammed S. Elmusrati – University of VAASA 72
