Lecture 8
Estimation Theory 1
Hello students, welcome to lecture 8 on estimation theory 1. In this lecture, I will discuss the
basics of estimation theory and some of the important properties of estimators. Let us start with
the introduction.
(Refer Slide Time: 00:58)
Introduction
We discussed the modeling of random data in terms of their joint PDF, or PMF in the discrete case. We have to fit this joint PDF from the observed data. Suppose we know the nature of the joint PDF but there is an unknown parameter $\theta$; then we have to determine the best value for this unknown parameter $\theta$. This is where estimation comes in. For modeling random data by a WSS random process, we have to determine the value of the autocorrelation function at different lags; therefore, estimation of the autocorrelation function is to be done. For fitting an ARMA model, discussed in the last lectures, we have to find the best values of the model parameters from the data. This is again a case of estimation of the model parameters. We may also have to determine the correct value of a signal from the observed noisy data, from the observed noisy signals; this is a case of estimation of a signal from noisy observations. All the above cases are examples of application of estimation theory. In particular, the last case is an example of signal estimation, and the first three are examples of parameter estimation. We can also do signal estimation by using parameter estimation; for example, by fitting an ARMA model, we can estimate a signal. So, estimation includes two classes: parameter estimation and signal estimation. We will discuss the problem of parameter estimation.
(Refer Slide Time: 03:03)
Parameter estimation
$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} = [X_1 \;\; X_2 \;\; \dots \;\; X_N]^T$$
The observed data are modeled as random variables and represented as a column vector, which we can write as the transpose of a row vector. Now, a particular set of values of the observed data, the observed data vector $\mathbf{x} = [x_1, x_2, \dots, x_N]$, is a realization of the samples of $\mathbf{X}$. $\mathbf{X}$ is a random vector, and whatever we observe at a particular time constitutes a realization of the samples of $\mathbf{X}$; so small $\mathbf{x}$ is a realization and capital $\mathbf{X}$ is a random vector. Sometimes we consider the $X_i$'s to be iid, that is, independent and identically distributed, which is also one important concept: all of $X_1, X_2, \dots, X_N$ are independent and each of them has the same distribution. $\mathbf{X}$ is characterized by a joint PDF which depends on some unobservable parameter $\theta$, given by
$$f_{\mathbf{X}}(x_1, x_2, \dots, x_N; \theta) = f_{\mathbf{X}}(\mathbf{x}; \theta).$$
We can write the joint PDF in terms of the parameter $\theta$, and in vector notation we will write it like this, $f_{\mathbf{X}}(\mathbf{x}; \theta)$; we will omit the subscript $\mathbf{X}$ also, so this will be simply $f(\mathbf{x}; \theta)$.
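As a small illustration of this setup, the following Python sketch (not from the lecture) draws one realization $\mathbf{x}$ of the random vector $\mathbf{X}$, assuming for concreteness that the $X_i$ are iid normal with the unknown parameter $\theta$ as their mean; the value `theta_true` is a made-up number used only for the simulation.

```python
import numpy as np

# Hypothetical illustration: X_1, ..., X_N are iid N(theta, 1); theta is the unknown parameter.
theta_true = 2.5      # assumed value, known here only because we are simulating
N = 10                # number of observations

rng = np.random.default_rng(0)
x = rng.normal(loc=theta_true, scale=1.0, size=N)   # one realization x of the random vector X

print("observed data vector x =", x)
```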
(Refer Slide Time: 04:57)
Estimators
Now, what is an estimator? An estimator $\hat{\theta}(\mathbf{X}) = \hat{\theta}(X_1, X_2, \dots, X_N)$ is a rule by which we guess the value of an unknown parameter $\theta$. So, we will guess the unknown parameter using this rule. It is a function of the random variables $X_1, X_2, \dots, X_N$, and it does not involve any unknown parameter; this functional relationship does not involve any unknown parameter. Such a function is generally called a statistic, so an estimator is a statistic. Being a function of random variables, $\hat{\theta}(\mathbf{X})$ is also random, so it is a random variable. A particular value of $\hat{\theta}(\mathbf{X})$, computed from the observed data $\mathbf{x}$, is called an estimate, not an estimator, of the parameter $\theta$. Thus, the estimator is defined in terms of random variables, while an estimate is a particular value. As an example, suppose $X_1, X_2, \dots, X_N$ are iid normal random variables with unknown mean $\mu$. For this normal distribution, $\mu$ is unknown. The joint PDF of $X_1, X_2, \dots, X_N$ is given by the product of the individual marginal PDFs as
$$f(x_1, x_2, \dots, x_N; \mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}.$$
So, using the property of the exponential function, we can write it in terms of a sum as
$$f(x_1, x_2, \dots, x_N; \mu) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^N}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2}.$$
So, this is the joint PDF of the data $X_i,\ i = 1, 2, \dots, N$.
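As a quick sanity check of this factorization (not part of the lecture, with made-up values of $\mu$ and $\sigma$), the following Python sketch evaluates the joint PDF both as a product of marginals and in the single-exponential form:

```python
import numpy as np

# Check that the product of marginal normal PDFs equals the single-exponential form above
mu, sigma = 1.0, 2.0                  # assumed parameter values for this check
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=5)     # a small data vector
N = len(x)

# Product of the individual marginal PDFs
marginals = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
f_product = np.prod(marginals)

# Same joint PDF with the sum moved inside one exponential
f_sum = np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma) ** N

print(f_product, f_sum)   # the two values agree up to floating-point error
```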
Suppose the observations are given by $X_i = \mu + V_i$, $i = 1, 2, \dots, N$. Now, we assume that the $V_i$'s are independent and zero mean, and each $X_i$ is $\mu$ plus $V_i$; so the $X_i$'s are iid, and each is a normal $N(\mu, \sigma^2)$ random variable, that is, a normal distribution with mean $\mu$ and variance $\sigma^2$. Then, since it is a normal distribution,
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
is an estimator of $\mu$.
$b(\hat{\theta}) = E\hat{\theta} - \theta$ is called the bias of the estimator. So, this quantity, the difference between the expected value and the true value, is called the bias of the estimator. It is desirable that the bias be zero; an estimator with $E\hat{\theta} = \theta$ is called unbiased. For an asymptotically unbiased estimator, as we have a larger number of data, $E\hat{\theta}$ will become close to $\theta$. Unbiasedness is necessary, but not sufficient, to make an estimator a good one. We have to consider other properties.
(Refer Slide Time: 10:51)
Now, $E X_i = \mu$. Therefore, for the sample mean,
$$E\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} E X_i = \frac{1}{N}\, N\mu = \mu.$$
Therefore, this $\hat{\mu}$ is an unbiased estimator of $\mu$.
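The following short Python simulation (not from the lecture, with made-up values of $\mu$ and $\sigma$) illustrates this: averaged over many independent data sets, $\hat{\mu}$ comes out close to the true mean.

```python
import numpy as np

# Monte Carlo check that the sample mean is an unbiased estimator of mu
mu, sigma = 3.0, 1.5       # assumed true values for the simulation
N = 20                     # sample size
trials = 100_000           # number of independent data sets

rng = np.random.default_rng(2)
data = rng.normal(mu, sigma, size=(trials, N))
mu_hat = data.mean(axis=1)             # sample mean of each data set

print("average of mu_hat over trials:", mu_hat.mean())   # close to mu = 3.0
print("estimated bias:", mu_hat.mean() - mu)              # close to 0
```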
(Refer Slide Time: 11:56)
Example 4
We will consider another example; this is the estimation of the variance. Suppose $X_1, X_2, \dots, X_N$ are iid, independent and identically distributed, and we have two estimators for the variance $\sigma^2$. One is
$$\hat{\sigma}_1^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \hat{\mu})^2$$
and the second estimator is
$$\hat{\sigma}_2^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \hat{\mu})^2.$$
Let us examine the unbiasedness of these two estimators, $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. We can show that the estimator $\hat{\sigma}_2^2$ is an unbiased estimator, but $\hat{\sigma}_1^2$ is biased. Consider
$$\sum_{i=1}^{N} E(X_i - \hat{\mu})^2 = \sum_{i=1}^{N} E\big((X_i - \mu) - (\hat{\mu} - \mu)\big)^2 = \sum_{i=1}^{N}\Big( E(X_i - \mu)^2 + E(\hat{\mu} - \mu)^2 - 2\,E\big((X_i - \mu)(\hat{\mu} - \mu)\big) \Big).$$
The first term is $E(X_i - \mu)^2$, the second term is $E(\hat{\mu} - \mu)^2$, and then the cross term is there, $-2\,E\big((X_i - \mu)(\hat{\mu} - \mu)\big)$. Now, $E(X_i - \mu)^2 = \sigma^2$ is the same for every $i$ because the data are iid, and there will be $N$ such terms because of the summation. So, we can write this expectation as
$$\sum_{i=1}^{N} E(X_i - \hat{\mu})^2 = N\sigma^2 + N\,E(\hat{\mu} - \mu)^2 - 2\sum_{i=1}^{N} E\big((X_i - \mu)(\hat{\mu} - \mu)\big).$$
So, let us see what this term $E(\hat{\mu} - \mu)^2$ will be equal to; this expression is important for us. So, $E(\hat{\mu} - \mu)^2$, that is, the variance of $\hat{\mu}$, is
$$E(\hat{\mu} - \mu)^2 = E\left(\frac{\sum_{i=1}^{N} X_i}{N} - \mu\right)^2.$$
So, if I take $N$ common, then because of the square we get
$$E(\hat{\mu} - \mu)^2 = \frac{E\left(\sum_{i=1}^{N} X_i - N\mu\right)^2}{N^2},$$
and this we can write as
$$\frac{E\left(\sum_{i=1}^{N}(X_i - \mu)\right)^2}{N^2}.$$
Now, this is again a summation, and this summation we can
write in terms of the individual square terms and then all cross terms as
$$\frac{E\left(\sum_{i=1}^{N}(X_i - \mu)\right)^2}{N^2} = \frac{\sum_{i=1}^{N} E(X_i - \mu)^2 + \sum_{i=1}^{N}\sum_{j \neq i} E\big((X_i - \mu)(X_j - \mu)\big)}{N^2}.$$
Because the cross terms are of the form $(X_i - \mu)(X_j - \mu)$ with $j \neq i$, the joint expectation $E\big((X_i - \mu)(X_j - \mu)\big)$ of these quantities will be 0, because $X_i$ and $X_j$ are independent. Therefore, this will be simply
$$\frac{\sum_{i=1}^{N} E(X_i - \mu)^2}{N^2},$$
and this is equal to $\frac{\sigma^2}{N}$: because each $X_i$ is identically distributed, the variance will be the same for all $X_i$'s, the term $E(X_i - \mu)^2$ will be $\sigma^2$, and the sum becomes $\frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}$. Similarly, the cross term $E\big((X_i - \mu)(\hat{\mu} - \mu)\big) = \frac{1}{N}\sum_{j=1}^{N} E\big((X_i - \mu)(X_j - \mu)\big)$ again involves products of independent random variables, but only in the case where it is $X_i$ itself, that is $j = i$, will the expectation become non-zero; otherwise the expectation will become 0. So, this expression will become $\frac{\sigma^2}{N}$.
$$\sum_{i=1}^{N} E(X_i - \hat{\mu})^2 = N\sigma^2 + N\,E(\hat{\mu} - \mu)^2 - 2\sum_{i=1}^{N} E\big((X_i - \mu)(\hat{\mu} - \mu)\big)$$
$$\sum_{i=1}^{N} E(X_i - \hat{\mu})^2 = N\sigma^2 + \sigma^2 - 2\sigma^2.$$
So, because this term is $\sigma^2$ and this one is $2\sigma^2$, the whole expression will become $(N-1)\sigma^2$. This implies that, if we have to find $E\hat{\sigma}_1^2$, the scaling factor is $\frac{1}{N}$ applied to this quantity $\sum_{i=1}^{N} E(X_i - \hat{\mu})^2$. So, it will become
$$E\hat{\sigma}_1^2 = \frac{N-1}{N}\,\sigma^2.$$
So, $E\hat{\sigma}_1^2$ is not equal to $\sigma^2$; therefore $\hat{\sigma}_1^2$ is a biased estimator. But if we consider
$$E\hat{\sigma}_2^2 = \frac{1}{N-1}\sum_{i=1}^{N} E(X_i - \hat{\mu})^2,$$
that is equal to $\sigma^2$; therefore $\hat{\sigma}_2^2$ is an unbiased estimator. But as $N$ tends to infinity, the factor $\frac{N-1}{N}$ will become 1; therefore $\hat{\sigma}_1^2$ is asymptotically unbiased.
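A short Python simulation (not part of the lecture, with made-up values of $\mu$, $\sigma$ and $N$) illustrates this result: averaged over many data sets, dividing by $N$ underestimates $\sigma^2$ by the factor $\frac{N-1}{N}$, while dividing by $N-1$ gives $\sigma^2$.

```python
import numpy as np

# Simulation comparing the two variance estimators of Example 4
mu, sigma = 0.0, 2.0       # assumed true values; the true variance is sigma**2 = 4
N = 10
trials = 200_000

rng = np.random.default_rng(3)
data = rng.normal(mu, sigma, size=(trials, N))
mu_hat = data.mean(axis=1, keepdims=True)
ss = ((data - mu_hat) ** 2).sum(axis=1)    # sum of squared deviations from the sample mean

var1 = ss / N          # sigma_hat_1^2: divide by N      (biased)
var2 = ss / (N - 1)    # sigma_hat_2^2: divide by N - 1  (unbiased)

print("mean of sigma_hat_1^2:", var1.mean(), " theory (N-1)/N*sigma^2 =", (N - 1) / N * sigma ** 2)
print("mean of sigma_hat_2^2:", var2.mean(), " theory sigma^2         =", sigma ** 2)
```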
(Refer Slide Time: 18:48)
Variance of the estimator
So, we have considered the mean of the estimator; we will now consider the variance of the estimator. The variance of the estimator $\hat{\theta}$ is given by $\mathrm{var}(\hat{\theta}) = E\big(\hat{\theta} - E(\hat{\theta})\big)^2$. That is the variance: $\hat{\theta}$ is the random variable and $E(\hat{\theta})$ is its mean. For the unbiased case, the variance of $\hat{\theta}$ will be simply $\mathrm{var}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$, because $E(\hat{\theta}) = \theta$. The variance of an estimator should be small. An unbiased estimator $\hat{\theta}$ is called the minimum variance unbiased estimator (MVUE) if
$$E(\hat{\theta} - \theta)^2 \le E(\tilde{\theta} - \theta)^2,$$
that is, the variance of $\hat{\theta}$ is less than or equal to the variance of $\tilde{\theta}$, where $\tilde{\theta}$ is any other unbiased estimator.
(Refer Slide Time: 20:00)
Mean square error (MSE) of the estimator
So, we discussed the mean and the variance. Another term, the mean square error (MSE), is given by
$$\mathrm{MSE} = E(\hat{\theta} - \theta)^2.$$
Minimizing the MSE is an important estimation criterion; the MSE should be as small as possible. Out of all unbiased estimators, the MVUE has the minimum mean square error; so, among unbiased estimators, the MVUE will have the least MSE. Now, the MSE is related to the bias and the variance as shown below:
$$\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta}).$$
This we can prove easily.
$$\mathrm{MSE} = E(\hat{\theta} - \theta)^2,$$
by definition. Now we can write it as
$$E(\hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta)^2 = E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2 + 2\,E\big((\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta)\big)$$
$$= E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2 + 2\,(E\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta).$$
Now, this first term is the variance of $\hat{\theta}$, and the second term, because $E\hat{\theta} - \theta$ is a constant quantity, will be $b^2(\hat{\theta})$. The third term, because $E(E\hat{\theta}) = E\hat{\theta}$, contains the factor $E\hat{\theta} - E\hat{\theta} = 0$, so $2\,(E\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta)$ will become 0.
So therefore, what we get is $\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta})$. This is one important relationship, and when the quantity $b^2(\hat{\theta})$ becomes 0, that is the bias is 0, the unbiased case, then $\mathrm{MSE} = \mathrm{var}(\hat{\theta})$.
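The following Python sketch (not from the lecture, with made-up parameter values) checks this decomposition numerically, using the biased variance estimator $\hat{\sigma}_1^2$ of Example 4 as the estimator:

```python
import numpy as np

# Numerical check of MSE = var + bias^2, using the biased estimator sigma_hat_1^2 of Example 4
mu, sigma = 0.0, 1.0
N = 8
trials = 300_000

rng = np.random.default_rng(4)
data = rng.normal(mu, sigma, size=(trials, N))
est = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / N   # sigma_hat_1^2

true_value = sigma ** 2
mse = np.mean((est - true_value) ** 2)
var = est.var()
bias = est.mean() - true_value

print("MSE          :", mse)
print("var + bias^2 :", var + bias ** 2)   # matches the MSE
```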
(Refer Slide Time: 22:19)
Consistent estimators
Now we will discuss consistent estimators. As we have more data, the quality of the estimate should become better; this idea is used in defining a consistent estimator. So, an estimator is a good one in this sense if we have a large amount of data. An estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if
$$\lim_{N \to \infty} P\big(\,|\hat{\theta} - \theta| > \epsilon\,\big) = 0 \quad \text{for every } \epsilon > 0.$$
So, that means the probability of a deviation larger than $\epsilon$ should go down to 0 as $N \to \infty$, for any $\epsilon > 0$. That is how we define a consistent estimator. Thus, a consistent estimator converges to the true value of $\theta$ in probability. A less rigorous test, since here we would have to determine the probability itself, is obtained by applying the Chebyshev inequality,
$$P\big(\,|\hat{\theta} - \theta| > \epsilon\,\big) \le \frac{E(\hat{\theta} - \theta)^2}{\epsilon^2}.$$
Here $E(\hat{\theta} - \theta)^2$ is nothing but the MSE. Therefore, if this quantity goes to 0, then this probability will also go to 0. Therefore, if $\lim_{N \to \infty} E(\hat{\theta} - \theta)^2 = 0$, then $\hat{\theta}$ will be a consistent estimator. This is the test for consistency that we will be using.
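As an illustration (not part of the lecture, with made-up values of $\mu$, $\sigma$ and $\epsilon$), the following Python sketch estimates the deviation probability of the sample mean and compares it with the bound $\mathrm{MSE}/\epsilon^2$ as $N$ grows; both shrink towards 0.

```python
import numpy as np

# Empirical look at P(|mu_hat - mu| > eps) and the bound MSE / eps^2 as N grows
mu, sigma = 1.0, 1.0
eps = 0.3
trials = 10_000
rng = np.random.default_rng(5)

for N in (10, 100, 1000):
    mu_hat = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
    prob = np.mean(np.abs(mu_hat - mu) > eps)        # empirical deviation probability
    bound = np.mean((mu_hat - mu) ** 2) / eps ** 2   # MSE / eps^2 (Chebyshev bound)
    print(f"N={N:5d}  P(|mu_hat-mu|>eps) ~ {prob:.4f}   MSE/eps^2 ~ {bound:.4f}")
```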
(Refer Slide Time: 24:15)
Now, if $\hat{\theta}$ is unbiased, the MSE is the same as the variance of $\hat{\theta}$. Thus, if the estimator $\hat{\theta}$ is unbiased and its variance goes to 0 as $N \to \infty$, it is a consistent estimator.
Example 5
Let us consider one example. Suppose $X_1, X_2, \dots, X_N$ are iid with unknown mean $\mu$ and known variance $\sigma^2$, and suppose $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i$. So, $E\hat{\mu}$ will be
$$E\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} E X_i = \mu.$$
Also,
$$E(\hat{\mu} - \mu)^2 = E\left(\frac{\sum_{i=1}^{N} X_i}{N} - \mu\right)^2,$$
which we have already determined in our Example 4; it is equal to simply $\frac{\sigma^2}{N}$. So, $\mathrm{var}(\hat{\mu}) = \frac{\sigma^2}{N}$ because the $X_i$'s are iid. So, in the limit $N \to \infty$, this variance will become 0.
So,
$$\lim_{N \to \infty} \mathrm{var}(\hat{\mu}) = \lim_{N \to \infty} \frac{\sigma^2}{N} = 0.$$
Therefore, $\hat{\mu}$ is a consistent estimator of $\mu$. So, the sample mean is not only unbiased, it is also consistent. That means, if we have more and more data, the variance will go down to 0 and we will get the true value of $\mu$. That is the idea behind consistent estimators.
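A small Python check of this behaviour (not from the lecture, with made-up $\mu$ and $\sigma$) shows the variance of the sample mean tracking $\sigma^2/N$ and shrinking as $N$ grows:

```python
import numpy as np

# The variance of the sample mean behaves like sigma^2 / N, so it shrinks to 0 as N grows
mu, sigma = 5.0, 2.0
trials = 20_000
rng = np.random.default_rng(6)

for N in (4, 16, 64, 256):
    mu_hat = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
    print(f"N={N:4d}  var(mu_hat) ~ {mu_hat.var():.4f}   sigma^2/N = {sigma ** 2 / N:.4f}")
```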
(Refer Slide Time: 26:32)
Efficient Estimator
We will introduce another term, the efficient estimator. Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$, with the variance of $\hat{\theta}_1$ less than the variance of $\hat{\theta}_2$; then $\hat{\theta}_1$ is the better estimator. The relative efficiency of $\hat{\theta}_2$ with respect to $\hat{\theta}_1$ is
$$\text{Relative Efficiency} = \frac{\mathrm{var}(\hat{\theta}_1)}{\mathrm{var}(\hat{\theta}_2)},$$
the variance of $\hat{\theta}_1$ divided by the variance of $\hat{\theta}_2$. This variance $\mathrm{var}(\hat{\theta}_1)$ is the smaller one; therefore, this number will be less than 1. In particular, if $\hat{\theta}_1$ is an MVUE, then $\hat{\theta}_1$ will be called an efficient estimator, and the absolute efficiency of an unbiased estimator is defined with respect to this estimator. So, this was the relative efficiency, but the absolute efficiency of an unbiased estimator is determined with respect to the MVUE, the estimator with the minimum variance; then we call this efficiency the absolute efficiency.
(Refer Slide Time: 27:46)
Example
We will give another example. Suppose $X_1, X_2, \dots, X_N$ are iid random variables with unknown mean $\mu$, and $\hat{\mu}$ and $\hat{\mu}_1$ are two estimators of $\mu$ given by
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i,$$
this is the sample mean as usual, and $\hat{\mu}_1$, which we define as
$$\hat{\mu}_1 = \frac{1}{2}(X_1 + X_2).$$
Now, both are unbiased estimators, and we see that the variance of $\hat{\mu}$, this quantity, the sample mean, is equal to
$$\mathrm{var}(\hat{\mu}) = \frac{\sigma^2}{N}.$$
And we can show, using the same formula with $N = 2$, that the variance of $\hat{\mu}_1$ will be
$$\mathrm{var}(\hat{\mu}_1) = \frac{\sigma^2}{2}.$$
Therefore, the relative efficiency of $\hat{\mu}_1$ will be
$$\text{Efficiency of } \hat{\mu}_1 = \frac{\mathrm{var}(\hat{\mu})}{\mathrm{var}(\hat{\mu}_1)} = \frac{\sigma^2/N}{\sigma^2/2} = \frac{2}{N};$$
if I consider the ratio of this quantity divided by that quantity, then we will get $\frac{2}{N}$. So, this is the relative efficiency of $\hat{\mu}_1$ with respect to the sample mean $\hat{\mu}$.
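The comparison can also be seen by simulation. The following Python sketch (not from the lecture, with made-up values of $\mu$, $\sigma$ and $N$) estimates the two variances and their ratio:

```python
import numpy as np

# Compare the sample mean with mu_hat_1 = (X_1 + X_2) / 2 from this example
mu, sigma = 0.0, 1.0
N = 50
trials = 100_000
rng = np.random.default_rng(7)

data = rng.normal(mu, sigma, size=(trials, N))
mu_hat = data.mean(axis=1)            # sample mean, variance sigma^2 / N
mu_hat1 = data[:, :2].mean(axis=1)    # (X_1 + X_2) / 2, variance sigma^2 / 2

print("var(mu_hat)  ~", mu_hat.var(), "  theory:", sigma ** 2 / N)
print("var(mu_hat1) ~", mu_hat1.var(), "  theory:", sigma ** 2 / 2)
print("relative efficiency of mu_hat1 ~", mu_hat.var() / mu_hat1.var(), "  theory 2/N =", 2 / N)
```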
To summarize, in this lecture we discussed the estimation of an unknown parameter $\theta$. An estimator $\hat{\theta}$ is unbiased if and only if $E\hat{\theta} = \theta$. Unbiasedness means that on the average the estimator gives the true value of the parameter; it is a desirable property.
The mean square error is $\mathrm{MSE} = E(\hat{\theta} - \theta)^2$. The MSE is related to the variance and the bias by the relationship $\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta})$.
For a consistent estimator, the MSE of $\hat{\theta}$ should go down to 0 as $N \to \infty$.
The relative efficiency of an estimator $\hat{\theta}_2$ with respect to $\hat{\theta}_1$, which has the lower variance, is given by
$$\text{Relative Efficiency} = \frac{\mathrm{var}(\hat{\theta}_1)}{\mathrm{var}(\hat{\theta}_2)}.$$
Thank You.