
Statistical Signal Processing

Prof. Prabin Kumar Bora


Department of Electronics & Electrical Engineering
Indian Institute of Technology - Guwahati

Lecture 8
Estimation Theory 1

Hello students, welcome to lecture 8 on estimation theory 1. In this lecture, I will discuss the
basics of estimation theory and some of the important properties of estimators. Let us start with
the introduction.
(Refer Slide Time: 00:58)

Introduction
We discussed the modeling of random data in terms of their joint PDF, or PMF in the discrete case. We have to fit this joint PDF from the observed data. Suppose we know the nature of the joint PDF but it involves an unknown parameter $\theta$; then we have to determine the best value of this unknown parameter $\theta$. This is where estimation comes in. For modeling random data by a WSS random process, we have to determine the value of the autocorrelation function at different lags; therefore, the autocorrelation function has to be estimated. For fitting an ARMA model, discussed in the last lectures, we have to find the best values of the model parameters from the data; this is again a case of estimation of the model parameters. We may also have to determine the correct value of a signal from the observed noisy data. This is a case of estimation of a signal from noisy observations. All the above cases are examples of the application of estimation theory. The last one is an example of signal estimation, and the first three are examples of parameter estimation. We can also do signal estimation by using parameter estimation; for example, by fitting an ARMA model, we can estimate a signal. So, estimation includes two classes: parameter estimation and signal estimation. We will discuss the problem of parameter estimation.
(Refer Slide Time: 03:03)

Parameter estimation

We have a sequence of observed random variables $X_1, X_2, \ldots, X_N$; these $N$ random variables are represented by a random vector. This is the representation:

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} = [X_1 \; X_2 \; \ldots \; X_N]^\top$$

It is represented as a column vector, which we can also write as the transpose of a row vector. So, the observed data are modeled as random variables. A particular set of values of the observed data, the observed data vector $x = [x_1, x_2, \ldots, x_N]^\top$, is a realization of $X$: because $X$ is a random vector, whatever we observe at a particular time constitutes one realization of it. So, small $x$ is a realization and capital $X$ is a random vector. Sometimes we consider the $X_i$'s to be iid, independent and identically distributed, which is also an important concept: $X_1, X_2, \ldots, X_N$ are independent and each of them has the same distribution. $X$ is characterized by a joint PDF which depends on some unobservable parameter $\theta$:

$$f_X(x_1, x_2, \ldots, x_N; \theta) = f_X(x; \theta).$$

We can write the joint PDF in terms of the parameter $\theta$ in vector notation like this, and we will also omit the subscript $X$, so it will be simply $f(x; \theta)$.
(Refer Slide Time: 04:57)

Estimators
Now, what is an estimator? An estimator

$$\hat{\theta}(X) = \hat{\theta}(X_1, X_2, \ldots, X_N)$$

is a rule by which we guess the value of an unknown parameter $\theta$. It is a function of the random variables $X_1, X_2, \ldots, X_N$, and this functional relationship does not involve any unknown parameter. Such a function is generally called a statistic; so, an estimator is a statistic. Being a function of random variables, $\hat{\theta}(X)$ is itself random: $\hat{\theta}(X)$ is a random variable. For a particular observation $x_1, x_2, \ldots, x_N$, we get what is known as an estimate, not an estimator, of the parameter $\theta$. Thus, in terms of the random variables, $\hat{\theta}(X) = \hat{\theta}(X_1, X_2, \ldots, X_N)$ is an estimator, while $\hat{\theta}(x) = \hat{\theta}(x_1, x_2, \ldots, x_N)$ is an estimate. We have to distinguish between an estimator and an estimate.

(Refer Slide Time: 06:29)


Example 1
Let us give an example. Let $X_1, X_2, \ldots, X_N$ be a set of independent $N(\mu, 1)$ random variables with unknown mean $\mu$. For this normal distribution, $\mu$ is unknown. The joint PDF of $X_1, X_2, \ldots, X_N$ is given by the product of the individual marginal PDFs:

$$f(x_1, x_2, \ldots, x_N; \mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_i - \mu)^2}$$

Using the property of the exponential function, we can write the product in terms of a sum:

$$f(x_1, x_2, \ldots, x_N; \mu) = \frac{1}{(2\pi)^{N/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)^2}$$

So, this is the joint PDF of the data, and the unknown quantity here is $\mu$. If we take

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i,$$

it is an estimator for $\mu$: it does not involve any unknown parameter and it is a function of the random variables.
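A minimal NumPy sketch of this example (the parameter values and variable names are illustrative assumptions): it draws $N$ samples from $N(\mu, 1)$, evaluates the log of the joint PDF above on a grid of candidate values of $\mu$, and shows that the maximizing value agrees with the sample-mean estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, N = 2.0, 100                      # assumed true mean and sample size
x = rng.normal(mu_true, 1.0, size=N)       # N iid N(mu, 1) observations

def log_joint_pdf(x, mu):
    """Log of f(x_1, ..., x_N; mu) for iid N(mu, 1) data."""
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

grid = np.linspace(0.0, 4.0, 401)          # candidate values of mu
ll = np.array([log_joint_pdf(x, m) for m in grid])

print("grid maximizer:", grid[np.argmax(ll)])
print("sample mean   :", x.mean())         # the estimator mu_hat evaluated on the data
```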

(Refer Slide Time: 07:42)


Example 2
But the question arises: how can we get such an estimator? This is a practical example. Suppose we have a DC voltage $\mu$ corrupted by noise $V_i$, and the observed data $X_i$, $i = 1, 2, \ldots, N$, are given by $X_i = \mu + V_i$. Now, we assume that the $V_i$'s are independent and each is distributed as $N(0, \sigma^2)$: zero mean, with variance $\sigma^2$. Thus, the $X_i$'s are iid, because each $X_i$ is the sum $\mu + V_i$ with $V_i$ zero mean, and each is an $N(\mu, \sigma^2)$ random variable, that is, normal with mean $\mu$ and variance $\sigma^2$. Then, just as in Example 1,

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is an estimator of $\mu$.

(Refer Slide Time: 08:59)


Properties of an estimator
Let us discuss the desirable properties of an estimator. A good estimator should satisfy some properties. These properties are described in terms of the mean and the variance of the estimator. First, we will discuss the unbiased estimator.

• An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if and only if $E\hat{\theta} = \theta$. Unbiasedness means that, on the average, the estimator gives the true value of the parameter. It is a desirable property.

• If $\hat{\theta}$ is not an unbiased estimator, then $b(\hat{\theta}) = E\hat{\theta} - \theta$ is called the bias of the estimator. So, this quantity, the difference between the expected value and the true value, is the bias. It is desirable that $b(\hat{\theta})$ decreases as $N$ increases and eventually goes down to 0.

• $\hat{\theta}$ is said to be an asymptotically unbiased estimator if $\lim_{N \to \infty} E\hat{\theta} = \theta$. So, as we get a large number of data, $E\hat{\theta}$ becomes close to $\theta$. Unbiasedness is necessary, but not sufficient, to make an estimator a good one; we have to consider other properties.
(Refer Slide Time: 10:51)

Example 3: Example 2 revisited

We will revisit Example 2. Suppose we have a DC voltage $\mu$ corrupted by noise $V_i$, and the observed data $X_i$, $i = 1, 2, \ldots, N$, are given by $X_i = \mu + V_i$. If the $V_i$'s are zero mean, then $E X_i = E\mu + E V_i = \mu$.

For the estimator we are considering, $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i$, we get

$$E\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} E X_i = \frac{1}{N} \cdot N\mu = \mu,$$

since $E X_i = \mu$ for each $i$. Therefore, $\hat{\mu}$ is an unbiased estimator of $\mu$.
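Unbiasedness can be checked empirically by averaging the estimate over many independent trials; a minimal sketch, with assumed values:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, N, trials = 5.0, 2.0, 50, 20000         # assumed values
# One sample-mean estimate per row of a (trials x N) array of observations
estimates = (mu + rng.normal(0.0, sigma, size=(trials, N))).mean(axis=1)
print("average of the estimates:", estimates.mean())   # close to mu = 5.0
```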
(Refer Slide Time: 11:56)

Example 4
We will consider another example; this one is about estimators for the variance. Suppose $X_1, X_2, \ldots, X_N$ are iid, independent and identically distributed, and we have two estimators for the variance $\sigma^2$:

$$\hat{\sigma}_1^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \hat{\mu})^2 \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \hat{\mu})^2.$$

The second estimator has the same summation, but the scaling is $\frac{1}{N-1}$ instead of $\frac{1}{N}$. We can show that $\hat{\sigma}_2^2$ is an unbiased estimator, but $\hat{\sigma}_1^2$ is not. For this, we will first determine $E\sum_{i=1}^{N}(X_i - \hat{\mu})^2$, which we write in terms of $\mu$ by subtracting and adding $\mu$:

$$E\sum_{i=1}^{N}(X_i - \hat{\mu})^2 = E\sum_{i=1}^{N}(X_i - \mu + \mu - \hat{\mu})^2.$$

This quantity we can now expand as

$$E\sum_{i=1}^{N}(X_i - \mu + \mu - \hat{\mu})^2 = \sum_{i=1}^{N}\left( E(X_i - \mu)^2 + E(\mu - \hat{\mu})^2 + 2\,E(X_i - \mu)(\mu - \hat{\mu}) \right).$$

The first term is $E(X_i - \mu)^2$, the second term is $E(\mu - \hat{\mu})^2$, and then the cross terms $2\,E(X_i - \mu)(\mu - \hat{\mu})$ will be there. Since the $X_i$'s are iid, $E(X_i - \mu)^2 = \sigma^2$, and because of the summation there are $N$ such terms. So, we can write this expectation as

$$E\sum_{i=1}^{N}(X_i - \hat{\mu})^2 = N\sigma^2 + N\,E(\mu - \hat{\mu})^2 + 2\sum_{i=1}^{N}E(X_i - \mu)(\mu - \hat{\mu}).$$

So, let us see what the term $E(\mu - \hat{\mu})^2$ will be equal to; this expression is important for us. $E(\mu - \hat{\mu})^2$, that is, the variance of $\hat{\mu}$, is

$$E(\mu - \hat{\mu})^2 = E\left(\mu - \frac{\sum_{i} X_i}{N}\right)^2.$$

If we take $\frac{1}{N}$ common, then because of the square we get

$$E(\mu - \hat{\mu})^2 = \frac{E\left(N\mu - \sum_i X_i\right)^2}{N^2} = \frac{E\left(\sum_i (X_i - \mu)\right)^2}{N^2}.$$

Now, this is again a summation, and its square we can write in terms of the individual square terms and then all the cross terms:

$$\frac{E\left(\sum_i (X_i - \mu)\right)^2}{N^2} = \frac{\sum_i E(X_i - \mu)^2}{N^2} + \frac{\sum_i \sum_{j \neq i} E(X_i - \mu)(X_j - \mu)}{N^2}.$$

Because the cross terms are of the form $(X_i - \mu)(X_j - \mu)$ with $X_i$ and $X_j$ independent, the joint expectation $E(X_i - \mu)(X_j - \mu)$ of these quantities is 0; therefore, the double sum $\sum_i \sum_{j \neq i} E(X_i - \mu)(X_j - \mu)$ becomes 0. So, this is simply $\frac{\sum_i E(X_i - \mu)^2}{N^2}$, and because each $X_i$ is identically distributed, the variance is the same for all of them, so $\sum_i E(X_i - \mu)^2 = N\sigma^2$. That way, we get

$$E(\mu - \hat{\mu})^2 = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}.$$

This is an important observation: the quantity $E(\mu - \hat{\mu})^2$ is equal to $\frac{\sigma^2}{N}$. This result will be used later on also; we have to remember this result.


(Refer Slide Time: 16:12)
And similarly, we can show that $E(X_i - \mu)(\mu - \hat{\mu}) = -\frac{\sigma^2}{N}$. Now, $\hat{\mu}$ involves all the random variables, but only its $X_i$ term gives a non-zero expectation with $(X_i - \mu)$; all the other terms have expectation 0 by independence. So, this expression becomes $-\frac{\sigma^2}{N}$. Therefore, the expression we have to determine,

$$E\sum_{i=1}^{N}(X_i - \hat{\mu})^2 = N\sigma^2 + N\,E(\mu - \hat{\mu})^2 + 2\sum_{i=1}^{N}E(X_i - \mu)(\mu - \hat{\mu}),$$

becomes, on substituting the two results,

$$E\sum_{i=1}^{N}(X_i - \hat{\mu})^2 = N\sigma^2 + \sigma^2 - 2\sigma^2 = (N-1)\sigma^2,$$

because $N \cdot \frac{\sigma^2}{N} = \sigma^2$ and $2N \cdot \frac{\sigma^2}{N} = 2\sigma^2$. This implies that for $E\hat{\sigma}_1^2$, where the scaling factor $\frac{1}{N}$ multiplies the quantity $E\sum_i (X_i - \hat{\mu})^2$,

$$E\hat{\sigma}_1^2 = \frac{N-1}{N}\,\sigma^2 \neq \sigma^2.$$

So, $E\hat{\sigma}_1^2$ is not equal to $\sigma^2$; therefore, $\hat{\sigma}_1^2$ is not an unbiased estimator. But if we consider

$$E\hat{\sigma}_2^2 = E\,\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \hat{\mu})^2 = \sigma^2,$$

$\hat{\sigma}_2^2$ is an unbiased estimator. So, $\hat{\sigma}_2^2$ is an unbiased estimator of $\sigma^2$ while $\hat{\sigma}_1^2$ is not. That is why, in determining the sample variance, we always divide by $(N-1)$ to make it an unbiased estimator. But as $N$ tends to infinity, the factor $\frac{N-1}{N}$ tends to 1; therefore, $\hat{\sigma}_1^2$ is asymptotically unbiased.
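A quick Monte Carlo check of this bias, with assumed values $\sigma^2 = 4$ and $N = 10$ (a minimal sketch, not a rigorous experiment):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, N, trials = 2.0, 10, 100000                 # assumed: sigma^2 = 4
x = rng.normal(0.0, sigma, size=(trials, N))
mu_hat = x.mean(axis=1, keepdims=True)             # sample mean per trial

s = ((x - mu_hat) ** 2).sum(axis=1)                # sum of squared deviations
print("E[sigma1^2] ~", (s / N).mean())             # ~ (N-1)/N * sigma^2 = 3.6
print("E[sigma2^2] ~", (s / (N - 1)).mean())       # ~ sigma^2 = 4.0
```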
(Refer Slide Time: 18:48)
Variance of the estimator
So, we have considered the mean of the estimator; we will now consider its variance. The variance of the estimator $\hat{\theta}$ is given by

$$\mathrm{var}(\hat{\theta}) = E\left(\hat{\theta} - E(\hat{\theta})\right)^2.$$

That is the variance: $\hat{\theta}$ is the random variable and $E(\hat{\theta})$ is its mean. For the unbiased case, because $E(\hat{\theta}) = \theta$, the variance of $\hat{\theta}$ is simply $\mathrm{var}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$. The variance of an estimator should be as low as possible. An unbiased estimator $\hat{\theta}$ is called a minimum variance unbiased estimator, MVUE, if

$$E(\hat{\theta} - \theta)^2 \le E(\hat{\theta}' - \theta)^2,$$

that is, the variance of $\hat{\theta}$ is less than or equal to the variance of $\hat{\theta}'$, where $\hat{\theta}'$ is any other unbiased estimator.
(Refer Slide Time: 20:00)
Mean square error (MSE) of the estimator
So, we have discussed the mean and the variance. Another term, the mean square error, MSE, is given by

$$\mathrm{MSE} = E(\theta - \hat{\theta})^2.$$

Minimizing the MSE is an important estimation criterion: the MSE should be as small as possible. Out of all unbiased estimators, the MVUE has the minimum mean square error; so, in the case of unbiased estimators, the MVUE has the least MSE. Now, the MSE is related to the bias and the variance as shown below:

$$\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta}).$$

This we can prove easily. By definition,

$$\mathrm{MSE} = E(\hat{\theta} - \theta)^2.$$

Subtracting and adding $E\hat{\theta}$, we can write

$$E(\hat{\theta} - \theta)^2 = E(\hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta)^2.$$

Now, we expand it as

$$E(\hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta)^2 = E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2 + 2\,E\left[(\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta)\right]$$
$$= E(\hat{\theta} - E\hat{\theta})^2 + (E\hat{\theta} - \theta)^2 + 2\,(E\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta).$$

The first term is the variance of $\hat{\theta}$. In the second term, because $(E\hat{\theta} - \theta)$ is a constant quantity, the expectation leaves it as $(E\hat{\theta} - \theta)^2$, which is simply the square of the bias of $\hat{\theta}$, $b^2(\hat{\theta})$. In the third term, because $E\hat{\theta}$ is a constant, $E(E\hat{\theta}) = E\hat{\theta}$, so the factor $(E\hat{\theta} - E\hat{\theta})$ is 0 and the term $2\,(E\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta)$ becomes 0. Therefore, what we get is

$$\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta}).$$

So, this is one important relationship, and when the bias $b(\hat{\theta})$ becomes 0, the unbiased case, $\mathrm{MSE} = \mathrm{var}(\hat{\theta})$.
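This decomposition can be verified numerically for the biased variance estimator $\hat{\sigma}_1^2$ from Example 4; a minimal sketch with assumed values:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, N, trials = 4.0, 10, 200000                      # assumed values
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
est = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / N   # sigma1^2

mse  = ((est - sigma2) ** 2).mean()     # Monte Carlo MSE about the true sigma^2
var  = est.var()                        # Monte Carlo variance of the estimator
bias = est.mean() - sigma2              # Monte Carlo bias
print(f"MSE = {mse:.4f}, var + bias^2 = {var + bias**2:.4f}")  # nearly equal
```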
(Refer Slide Time: 22:19)
Consistent estimators
Now we will discuss consistent estimators. As we get more data, the quality of the estimate should become better; this idea is used in defining the consistent estimator, which is a good estimator when we have a large amount of data. An estimator $\hat{\theta}$ is called a consistent estimator if $\hat{\theta}$ converges in probability to $\theta$. Convergence in probability is defined in this way:

$$\lim_{N \to \infty} P\left(\left|\hat{\theta} - \theta\right| > \epsilon\right) = 0 \quad \text{for every } \epsilon > 0.$$

So, that means the probability of a deviation larger than $\epsilon$ should go down to 0 as $N \to \infty$, for any $\epsilon > 0$. Thus, a consistent estimator converges to the true value of $\theta$ in probability. Because here we have to determine the probability, a less rigorous test is obtained by applying the Chebyshev inequality,

$$P\left(\left|\hat{\theta} - \theta\right| > \epsilon\right) \le \frac{E(\hat{\theta} - \theta)^2}{\epsilon^2}.$$

Here $E(\hat{\theta} - \theta)^2$ is nothing but the MSE. Therefore, if this quantity goes to 0, then the probability also goes to 0. So, if $\lim_{N \to \infty} E(\hat{\theta} - \theta)^2 = 0$, then $\hat{\theta}$ is a consistent estimator. This is the test for consistency we will be using.
(Refer Slide Time: 24:15)
Now, if $\hat{\theta}$ is unbiased, the MSE is the same as the variance of $\hat{\theta}$. Thus, if the estimator $\hat{\theta}$ is unbiased and the variance of $\hat{\theta}$ goes down to 0 as $N \to \infty$, then $\hat{\theta}$ is a consistent estimator. Note that consistency is an asymptotic property: as we get more and more data, the estimate becomes better and better. That is the concept behind the consistent estimator.
(Refer Slide Time: 24:48)

Example 5
Let us consider an example. Suppose $X_1, X_2, \ldots, X_N$ are iid with unknown mean $\mu$ and known variance $\sigma^2$, and suppose $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i$. Then $E\hat{\mu}$ will be

$$E\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} E X_i = \mu,$$

so $\hat{\mu}$ is unbiased. The variance of $\hat{\mu}$ is

$$\mathrm{var}(\hat{\mu}) = E(\hat{\mu} - \mu)^2 = E\left(\frac{\sum_i X_i}{N} - \mu\right)^2.$$

We have already determined this in Example 4: because the $X_i$'s are iid, it is equal to simply $\frac{\sigma^2}{N}$. So, $\mathrm{var}(\hat{\mu}) = \frac{\sigma^2}{N}$, and as $N \to \infty$ this variance becomes 0:

$$\lim_{N \to \infty} \mathrm{var}(\hat{\mu}) = \lim_{N \to \infty} \frac{\sigma^2}{N} = 0.$$

Therefore, $\hat{\mu}$ is a consistent estimator of $\mu$. So, the sample mean is not only unbiased, it is also consistent. That means, as we get more and more data, the variance goes down to 0 and we get the true value of $\mu$. That is the idea behind consistent estimators.
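A small sketch, with assumed values, showing the variance of the sample mean shrinking like $\sigma^2/N$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, trials = 5.0, 2.0, 50000                # assumed values
for N in (10, 100, 1000):
    est = (mu + rng.normal(0.0, sigma, size=(trials, N))).mean(axis=1)
    print(f"N = {N:4d}: var(mu_hat) ~ {est.var():.5f}  (sigma^2/N = {sigma**2 / N:.5f})")
```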
(Refer Slide Time: 26:32)

Efficient Estimator
We will introduce another term: the efficient estimator. Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$, with the variance of $\hat{\theta}_1$ less than the variance of $\hat{\theta}_2$; then $\hat{\theta}_1$ is the better estimator. The relative efficiency of estimator $\hat{\theta}_2$ with respect to estimator $\hat{\theta}_1$ is defined by the ratio

$$\text{Relative Efficiency} = \frac{\mathrm{var}(\hat{\theta}_1)}{\mathrm{var}(\hat{\theta}_2)},$$

the variance of $\hat{\theta}_1$ divided by the variance of $\hat{\theta}_2$. Since $\mathrm{var}(\hat{\theta}_1)$ is the smaller one, this number is less than 1. Particularly, if $\hat{\theta}_1$ is an MVUE, then $\hat{\theta}_1$ is called an efficient estimator. So, the ratio above gives the relative efficiency; the absolute efficiency of an unbiased estimator is the efficiency determined with respect to the MVUE, the estimator with minimum variance.
(Refer Slide Time: 27:46)

Example
We will give another example. Suppose $X_1, X_2, \ldots, X_N$ are iid random variables with unknown mean $\mu$, and $\hat{\mu}$ and $\hat{\mu}_1$ are two estimators of $\mu$ given by

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i,$$

the sample mean as usual, and

$$\hat{\mu}_1 = \frac{1}{2}(X_1 + X_2).$$

Both are unbiased estimators, and we have seen that the variance of $\hat{\mu}$, the sample mean, is

$$\mathrm{var}(\hat{\mu}) = \frac{\sigma^2}{N}.$$

Using the same formula with $N = 2$, the variance of $\hat{\mu}_1$ is

$$\mathrm{var}(\hat{\mu}_1) = \frac{\sigma^2}{2}.$$

Therefore, taking the ratio of the first quantity to the second, the relative efficiency of $\hat{\mu}_1$ with respect to $\hat{\mu}$ is

$$\text{Efficiency of } \hat{\mu}_1 = \frac{\sigma^2/N}{\sigma^2/2} = \frac{2}{N}.$$
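A sketch comparing the two estimators empirically, with assumed values (here $N = 100$, so the relative efficiency should be about $2/N = 0.02$):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, N, trials = 5.0, 2.0, 100, 50000        # assumed values
x = mu + rng.normal(0.0, sigma, size=(trials, N))

mu_hat  = x.mean(axis=1)                           # sample mean of all N observations
mu_hat1 = x[:, :2].mean(axis=1)                    # (X1 + X2) / 2

print("relative efficiency ~", mu_hat.var() / mu_hat1.var())   # ~ 2/N = 0.02
```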

(Refer Slide Time: 28:54)


Let us summarize:

• The observed random data $X_1, X_2, \ldots, X_N$ are modeled as random variables and are characterized by a joint PDF which depends on some unobservable parameter $\theta$.

• An estimator $\hat{\theta}(X) = \hat{\theta}(X_1, X_2, \ldots, X_N)$ is a rule by which we guess the value of the unknown parameter $\theta$.

• An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if and only if $E\hat{\theta} = \theta$. Unbiasedness means that on the average the estimator gives the true value of the parameter. It is a desirable property.

• $\hat{\theta}$ is said to be asymptotically unbiased if $\lim_{N \to \infty} E\hat{\theta} = \theta$.

• Then, we discussed the MVUE. An unbiased estimator $\hat{\theta}$ is called a minimum variance unbiased estimator, MVUE, if the variance of $\hat{\theta}$ is less than or equal to the variance of $\hat{\theta}'$, where $\hat{\theta}'$ is any other unbiased estimator. So, in the case of the MVUE, the variance is the least.
(Refer Slide Time: 30:33)

• We also discussed the mean square error, MSE, of an estimator, given by $\mathrm{MSE} = E(\theta - \hat{\theta})^2$.

• The MSE is related to the variance and the bias by the relationship $\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta})$.

• Then, we defined the consistent estimator. An estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if $\hat{\theta}$ converges in probability to $\theta$; thus, $\lim_{N \to \infty} P(|\hat{\theta} - \theta| > \epsilon) = 0$ for any $\epsilon > 0$. So, for any $\epsilon > 0$, if the probability of this deviation goes down to 0 as $N \to \infty$, then the estimator is consistent.

• If the estimator $\hat{\theta}$ is unbiased and the variance of $\hat{\theta}$ goes down to 0 as $N \to \infty$, then $\hat{\theta}$ is a consistent estimator. This is a test for the consistency of an unbiased estimator: the variance of $\hat{\theta}$ should go down to 0 as $N \to \infty$.

• The relative efficiency of an estimator $\hat{\theta}_2$ with respect to $\hat{\theta}_1$, which has the lower variance, is given by $\text{Relative Efficiency} = \frac{\mathrm{var}(\hat{\theta}_1)}{\mathrm{var}(\hat{\theta}_2)}$.

Thank You.
