Stat CH 3 Edited 1

UNIT THREE

_____________________________________________________________________
STATISTICAL ESTIMATION
_____________________________________________________________________
3.1. DEFINITION OF STATISTICAL ESTIMATION
Statistical inference is the process of using limited information, a sample, to reach
conclusions about a larger set of data, the population. Estimation refers to any procedure
in which sample information is used to estimate or predict the numerical value of some population
measure (called a parameter), such as the population mean μ.

An estimator is a procedure or function used in estimating a population parameter.


An estimate is the numerical value determined from the estimator.
A parameter is a characteristic of an entire population; a statistic is a summary
measure that is computed to describe a characteristic for only a sample of the
population.

There are two types of estimators. A point estimator of a population parameter is a procedure
that produces a single value as an estimate. The sample mean, for example, is a statistic that may
be used as a point estimator of the population mean. An interval estimator of a population
parameter is a procedure that produces a range of values. The width of the range is useful as a
measure of the degree of error that may exist in the estimation.

3.1.1. CRITERIA FOR GOOD ESTIMATOR


In point estimation we seek the sample statistic that is the best estimator of the population
parameter. Many criteria have been developed to describe what "best" means for a point estimator.
The most general of these are the criteria of unbiasedness, efficiency, and consistency.
Unbiasedness
A statistic is an unbiased estimator of a parameter if the expected value of the statistic equals the
parameter, i.e. if
E(statistic) = parameter
Any statistic chosen as an estimator is a random variable since the value of the statistic may
differ from sample to sample. The expected value of a random variable may be interpreted as
long-run average. Therefore, the above definition indicates that a statistic is an unbiased
estimator of a parameter if the average value of the statistic is the same as the parameter value.
Thus on average the estimator will be correct.
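The long-run-average reading of unbiasedness can be checked with a small simulation. The sketch below is only an illustration: the population (the integers 1 to 100), the sample size, the number of repetitions, and the function name `average_of_sample_means` are all assumptions chosen for the demonstration, not part of the text above.

```python
import random

def average_of_sample_means(population, n, repetitions, seed=0):
    """Draw many random samples of size n and average their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(repetitions):
        sample = rng.choices(population, k=n)  # sampling with replacement
        total += sum(sample) / n               # sample mean of this sample
    return total / repetitions

# Hypothetical population, chosen only for illustration; its mean is 50.5.
population = list(range(1, 101))
population_mean = sum(population) / len(population)
long_run_average = average_of_sample_means(population, n=6, repetitions=20000)

print(population_mean, long_run_average)
```

Individual sample means vary widely from sample to sample, but their average over many samples settles close to the population mean, which is what E(statistic) = parameter expresses.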

Efficiency
Unbiasedness alone does not guarantee a good estimator. In fact, some parameters may have
more than one unbiased estimator. Selection among the unbiased estimators is made on the basis
of comparing the variances of the estimators.
If more than one unbiased estimator of a population parameter exists, the estimator with
the smaller variance is the more efficient.

Even though the average value of an unbiased estimator equals the parameter, an estimator may
yield estimates that are not particularly close to the parameter value. The efficiency of an
estimator is measured by the variance of the estimator. The minimum variance unbiased
estimator is the unbiased estimator with the smallest variance.
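As a rough illustration of comparing estimators by variance, the sketch below simulates two estimators of the centre of a normal population: the sample mean and the sample median. Both are unbiased for a symmetric population. The choice of a standard normal population, the sample size of 25, and the function name `estimator_variances` are assumptions made for this example only.

```python
import random
import statistics

def estimator_variances(n, repetitions, seed=1):
    """Simulated variances of the sample mean and the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # normal population
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = estimator_variances(n=25, repetitions=5000)
print(var_mean, var_median)
```

In this setting the sample mean shows the smaller variance, so by the criterion above it is the more efficient of the two estimators.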

Consistency
Another desirable property is that an estimator should produce estimates that have a high
probability of being close to the true value as the sample size increases. An estimator that has
this property is called a consistent estimator. The variance of a consistent estimator becomes
smaller as larger sample sizes are taken.
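Consistency can be illustrated in the same way. The sketch below, which assumes a standard normal population and three arbitrary sample sizes, estimates the variance of the sample mean by simulation for each size. The function name `variance_of_sample_mean` is hypothetical.

```python
import random
import statistics

def variance_of_sample_mean(n, repetitions=4000, seed=2):
    """Simulated variance of the sample mean for samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)

# Variance of the estimator for increasing sample sizes.
variances = {n: variance_of_sample_mean(n) for n in (5, 20, 80)}
for n, v in variances.items():
    print(n, round(v, 4))
```

The simulated variance shrinks as n grows, so estimates cluster more tightly around the true value when larger samples are taken.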

3.1.2 POINT ESTIMATOR OF THE MEAN


Assume that we have the following random sample of n= 6 elements from a population whose
parameter is not known.
1 2 4 5 7 11

The sample mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{30}{6} = 5$$
The estimator is $\bar{X}$, and 5 is the point estimate of the unknown population mean.

3.1.3 POINT ESTIMATE OF THE POPULATION PROPORTION


The above array contains two even numbers, 2 and 4. Calling the even numbers successes, the
sample proportion of successes is:
$$p = \frac{2}{6} = \frac{1}{3}$$
The statistic $p$ is an estimator of the unknown population proportion of successes, and
$\frac{1}{3}$ is a point estimate of the population proportion.

3.1.4 POINT ESTIMATE OF THE UNKNOWN POPULATION STANDARD DEVIATION

We will use the symbol $S_x$ to mean an estimate of the unknown population standard deviation $\sigma_x$.
The estimator, called the sample standard deviation, is defined by the formula
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where $\bar{X}$ = sample mean
n = sample size
For the random sample 1, 2, 4, 5, 7, 11 write the symbol for and compute the sample standard
deviation.
Solution
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
$$S_x = \sqrt{\frac{(1-5)^2 + (2-5)^2 + (4-5)^2 + (5-5)^2 + (7-5)^2 + (11-5)^2}{6 - 1}} = \sqrt{\frac{66}{5}} = 3.633$$
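The three point estimates computed so far for the sample 1, 2, 4, 5, 7, 11 can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and `statistics.stdev` is used because it divides by n - 1, matching the formula above.

```python
import statistics

sample = [1, 2, 4, 5, 7, 11]

# Point estimate of the population mean.
x_bar = statistics.mean(sample)

# Point estimate of the population proportion, with "even number" as success.
p = sum(1 for x in sample if x % 2 == 0) / len(sample)

# Point estimate of the population standard deviation (divisor n - 1).
s_x = statistics.stdev(sample)

print(x_bar, round(p, 4), round(s_x, 3))  # 5 0.3333 3.633
```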

3.1.5 POINT ESTIMATOR OF STANDARD ERROR OF THE MEAN

The standard error of the mean is computed by the formula $\sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n}}$ when the sample size is less
than 5% of the population size. In our case, the total size of the population is unknown; therefore
it is safer to assume that the sample is less than 5% of the entire population. Hence, we will use
the estimator $\frac{S_x}{\sqrt{n}}$ to estimate the standard error $\sigma_{\bar{X}}$. The symbol $S_{\bar{X}}$ is called the sample
standard error of the mean. The formula for $S_{\bar{X}}$ is
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}}$$
where $S_x$ = sample standard deviation
n = sample size

Thus, $S_x$ is the estimator for $\sigma_x$, and $S_{\bar{X}}$ is the estimator for $\sigma_{\bar{X}}$. We have calculated $S_x = 3.633$
for the random sample 1, 2, 4, 5, 7, 11. The sample standard error can be obtained using the
formula
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}} = \frac{3.633}{\sqrt{6}} = 1.483$$
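The same calculation can be written as a short function. This is a sketch only; the function name `standard_error_of_mean` is an illustrative choice, and it assumes, as the text does, that the sample is a small fraction of the population so that no finite-population correction is needed.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Sample standard error of the mean: S_x / sqrt(n)."""
    s_x = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return s_x / math.sqrt(len(sample))

se_mean = standard_error_of_mean([1, 2, 4, 5, 7, 11])
print(round(se_mean, 3))  # 1.483
```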

3.1.6 A POINT ESTIMATE OF SAMPLE STANDARD ERROR OF THE PROPORTION

The standard error of the proportion indicates how far an unknown population proportion might be
from the sample proportion. The symbol $S_p$ will be used to mean the standard error of the proportion.

$$S_p = \sqrt{\frac{pq}{n}}$$
where p = sample proportion of success
q = 1 - p
n = sample size

Example
Let an even number be a success, and suppose a random sample of 200 numbers contains
120 even numbers. Write the symbol for and compute the value of the point estimate of the
standard error of the proportion.

Here $p = 120/200 = 0.6$ and $q = 1 - p = 0.4$, so
$$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{200}} = 0.0346$$
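A minimal sketch of this calculation follows. The function name `standard_error_of_proportion` and its parameters are illustrative, and the counts used (120 successes in a sample of 200) are the ones implied by p = 0.6 and n = 200 in the example.

```python
import math

def standard_error_of_proportion(successes, n):
    """Sample standard error of the proportion: sqrt(p * q / n)."""
    p = successes / n  # sample proportion of success
    q = 1 - p          # sample proportion of failure
    return math.sqrt(p * q / n)

se_prop = standard_error_of_proportion(successes=120, n=200)
print(round(se_prop, 4))  # 0.0346
```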

The following table shows some population parameters and their estimators.

Population parameter                              Sample statistic (estimator)
Mean, $\mu$                                       $\bar{X}$
Standard deviation, $\sigma_x$                    $S_x$
Variance, $\sigma_x^2$                            $S_x^2$
Proportion, $P$                                   $p$
Standard error of the mean, $\sigma_{\bar{X}}$    $S_{\bar{X}}$

3.2 INTERVAL ESTIMATE


Point estimators of population parameters, while useful, do not convey as much information as
the interval estimators. Point estimation produces a single value as an estimate of unknown
population parameter. The estimate may or may not be close to the parameter value; in other
words, the estimate may be incorrect. Unbiasedness guarantees only that the average value of the
estimator determined from repeated samples will equal the parameter value. An interval estimate,
on the other hand, is a range of values that conveys the fact that estimation is an uncertain
process. The standard error of the point estimator is used in creating a range of values; thus, a
measure of variability is incorporated into interval estimation. Further, a measure of confidence
in the interval estimator is provided; consequently, interval estimates are also called Confidence
Intervals. For these reasons, interval estimators are considered more desirable than point
estimators.

3.2.1 INTERVAL ESTIMATE OF POPULATION MEAN


An interval estimate of μ is an interval, bounded by two values a and b, within which the unknown
population mean is expected to lie. The interval is an inference based upon:
1. The value of the mean $\bar{X}$ of a simple random sample selected from the population, and
2. Known facts about the sampling distribution of the mean.
The confidence level shows how certain we are that the interval is correct. The choice of
method used in constructing a confidence interval for μ depends upon whether or not the
population is normal and whether the population standard deviation $\sigma_x$ is known or unknown.

3.2.1.1 Confidence Interval Estimate of μ, Normal Population, and Standard Deviation Known

Suppose we have a normal population whose mean and standard deviation are $\mu$ and $\sigma_x$. The
sampling distribution of the mean is normal with mean $\mu$ and standard error
$$\sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n}}$$
For the sampling distribution of the mean, the standard normal variable is
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

If we want to be 95% confident that the population mean $\mu$ falls within the estimate, we can
calculate the range as follows.
1. Find the Z value for the 95% confidence level.
2. Use the obtained Z value to calculate the interval for the unknown population parameter.

For example, the Z value for a 95% confidence interval is 1.96. Therefore, if we want to be 95% sure
that the true population mean falls within the estimate, we can rearrange the above formula and
get:
$$\bar{X} - 1.96\,\sigma_{\bar{X}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{X}}$$

The proportion of correct estimates (0.95 in our illustration) is called the confidence coefficient
C. The number 100C (95% in our illustration) is called the confidence level. The proportion of
incorrect statements is symbolized by the Greek letter α (alpha). The sum of the proportions of
correct and incorrect statements is 1; so
C + α = 1 or α = 1 - C
We can describe C as the chance that the confidence interval is correct, and α as the chance that
the interval is incorrect.
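The reading of C as a proportion of correct intervals can be checked by simulation. The sketch below repeatedly samples from a normal population with known mean and standard deviation, builds the interval each time, and counts how often it contains the true mean. The population values (mean 50, standard deviation 10), the sample size, and the function name `coverage` are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, repetitions, seed=3):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 0.95
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions

c_observed = coverage(mu=50.0, sigma=10.0, n=25, confidence=0.95, repetitions=4000)
print(c_observed, 1 - c_observed)  # observed C and observed alpha
```

The observed fraction of correct intervals should come out close to 0.95, and the fraction of incorrect ones close to α = 0.05.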

Example 1: A normal population has a standard deviation of 10. A random sample of size 25 has a
mean of 50. Construct a 95% confidence interval estimate of the population mean.

Solution
To construct the confidence interval, we first find the Z value for the 95% confidence level and
then use the formula $\bar{X} - Z\sigma_{\bar{X}} \le \mu \le \bar{X} + Z\sigma_{\bar{X}}$ to estimate the interval. The Z value for the
95% confidence level is 1.96. Therefore, the estimate is given by
$\bar{X} - 1.96\,\sigma_{\bar{X}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{X}}$. That is:
$$50 - 1.96\left(\frac{10}{\sqrt{25}}\right) \le \mu \le 50 + 1.96\left(\frac{10}{\sqrt{25}}\right)$$
$$50 - 3.9 \le \mu \le 50 + 3.9$$
$$46.1 \le \mu \le 53.9$$
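The calculation in Example 1 can be sketched as a small function. The name `z_confidence_interval` and its parameters are illustrative. The Z value is obtained from `statistics.NormalDist` rather than a printed table, so it is 1.95996... instead of the rounded 1.96, which does not change the result at one decimal place.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided Z value
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

lower, upper = z_confidence_interval(x_bar=50, sigma=10, n=25)
print(round(lower, 1), round(upper, 1))  # 46.1 53.9
```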
Example 2: An experiment involves selecting a random sample of 256 middle managers for
studying their annual income. The sample mean is computed to be 35,420 and the sample
standard deviation is 2,050.
a. What is the estimated mean income of all middle managers (the population)?
b. What is the 95% confidence interval (rounded to the nearest 10)?
c. What are the 95% confidence limits?
Solution

a. The sample mean is 35,420, and it is the point estimate of the population mean, so μ is
estimated to be 35,420.
b. The confidence interval is between 35,170 and 35,670, found by
$$\bar{X} \pm 1.96\frac{S}{\sqrt{n}} = 35420 \pm 1.96\left(\frac{2050}{\sqrt{256}}\right) = 35168.88 \text{ and } 35671.13$$
c. The end points of the confidence interval are called the confidence limits. In this case they are
rounded to 35,170 and 35,670; 35,170 is the lower limit and 35,670 is the upper limit.

3.2.1.2 Precision, Confidence and Sample Size


The narrower the confidence interval is, the more precise it is; the wider the interval, the
less precise it is. The end points of a confidence interval for µ are:
$$\bar{X} \pm Z\frac{\sigma_x}{\sqrt{n}}$$
The smaller the value of $Z\frac{\sigma_x}{\sqrt{n}}$, the more precise (narrower) is the confidence interval.
Consequently, the smaller Z and $\sigma_x$ are, and the larger n is, the more precise will be the interval.
We conclude that the larger the sample size, the more precise is an interval estimate. It can also
be concluded that the smaller the variability the more precise the estimate. The final conclusion
that can be drawn from the above relationship is, the lower the confidence level, the more precise
is the interval estimate.
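These three conclusions can be seen numerically by computing the half-width of the interval for different inputs. The sketch below is illustrative only: the function name `margin_of_error` and the specific values of the standard deviation, sample size, and confidence level are assumptions chosen for the comparison.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Half-width of the interval: Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / math.sqrt(n)

baseline = margin_of_error(sigma=10, n=25, confidence=0.95)
larger_sample = margin_of_error(sigma=10, n=100, confidence=0.95)
less_variability = margin_of_error(sigma=5, n=25, confidence=0.95)
lower_confidence = margin_of_error(sigma=10, n=25, confidence=0.90)

print(round(baseline, 2), round(larger_sample, 2),
      round(less_variability, 2), round(lower_confidence, 2))
```

Each change narrows the interval relative to the baseline. Because n enters through its square root, quadrupling the sample size only halves the half-width.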

3.2.1.3. CONFIDENCE INTERVAL FOR A POPULATION PROPORTION

The confidence interval for a population proportion is estimated by $p \pm Z\sigma_p$, where $\sigma_p$ is the
standard error of the proportion and
$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$
Therefore the confidence interval for the population proportion is constructed by
$$p \pm Z\sqrt{\frac{p(1 - p)}{n}}$$
Example: Suppose 1600 of 2000 union members sampled said they plan to vote for the proposal
to merge with a national union. Union bylaws state that at least 75% of all members must
approve for the merger to be enacted. Using the 0.95 degree of confidence, what is the interval
estimate for the population proportion? Based on the confidence interval, what conclusion can be
drawn? The interval is computed as follows.

$$p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.80 \pm 1.96\sqrt{\frac{0.80(1 - 0.80)}{2000}} = 0.80 \pm 1.96\sqrt{0.00008}$$
= 0.78247 and 0.81753, rounded to 0.782 and 0.818.

Based on the sample results, when all union members vote, the proposal will probably pass
because 0.75 lies below the interval between 0.782 and 0.818.
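The union example can be reproduced with a short sketch. The function name `proportion_confidence_interval` is illustrative, and the Z value is taken from `statistics.NormalDist` instead of being typed in as 1.96. The final comparison with 0.75 mirrors the conclusion drawn in the example.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Interval for a population proportion: p +/- Z * sqrt(p(1 - p)/n)."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

low, high = proportion_confidence_interval(successes=1600, n=2000)
print(round(low, 3), round(high, 3))  # 0.782 0.818

# The required share of 0.75 lies below the whole interval.
print(low > 0.75)  # True
```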

3.2.1.4. CONFIDENCE ESTIMATE OF µ, NORMAL POPULATION, σx UNKNOWN

In the previous case the population was normally distributed and the population standard
deviation was known. In that case we look up the Z value and use the formula
$\bar{X} \pm Z\frac{\sigma_x}{\sqrt{n}}$ to estimate the interval within which the population mean lies with confidence
coefficient C. However, most of the time the population standard deviation $\sigma_x$ is unknown, just
as the population mean µ is. Therefore, $\sigma_x$ must be estimated by the sample standard deviation.

$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

After calculating the sample standard deviation, the standard error must be computed using the
following formula.
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}}$$

When the population standard deviation is known, the interval estimate is based on the standard
normal variable
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
However, if the population standard deviation is unknown, we need to estimate it with the sample
standard deviation, and the resulting statistic $\frac{\bar{X} - \mu}{S_{\bar{X}}}$ does not follow the normal
distribution. It instead follows Student's t-distribution, which was first identified by
W. S. Gosset in the early 1900s. There is a different t-distribution for each sample size. The
t-distribution is discussed in greater detail under hypothesis testing. In this chapter we will only
illustrate how to make an interval estimate using the t-distribution, without giving much
emphasis to the distribution's characteristics.

Tail areas for the t-distribution are presented according to a parameter called degrees of freedom.
We shall use the symbol ν for degrees of freedom. The degrees of freedom for the t-distribution are
calculated as ν = n - 1,

where
ν = degrees of freedom
n = sample size

As ν increases, the tail area beyond a given t-value decreases, and so does the t-value that cuts
off a given tail area. As the degrees of freedom increase, the t-distribution approaches the
standard normal distribution. When the degrees of freedom reach about 30, the t-distribution is
approximately the same as the normal distribution.

To construct an interval estimate for µ in this situation, we need the value of $t_{\alpha/2,\nu}$, which
is read from a statistical table, together with the formula:
$$\bar{X} - t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}}$$
where
$\bar{X}$ = sample mean
n = sample size
ν = n - 1 (degrees of freedom)
$S_x$ = sample standard deviation
μ = unknown population mean

Example: The environmental protection officer of a large industrial plant sought to determine
the mean daily amount of sulphur oxides (pollutants) emitted by the plant. Because measurement
costs were high, only a random sample of 10 days' measurements was obtained. These were, in
tons per day,
8 7 10 15 11 6 8 5 13 12
Suppose emissions per day are normally distributed. Estimate μ, the mean amount of sulphur
oxides emitted per day using the confidence interval with a confidence coefficient of 0.95.

Solution

$$\bar{X} = \frac{\sum X}{n} = \frac{95}{10} = 9.5$$
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} = 3.24$$

The confidence level is 95%. Therefore, the significance level is α = 1 - C = 1 - 0.95 = 0.05 and
α/2 = 0.025.
Next, we calculate the degrees of freedom for the observations, which are given by ν = n - 1 =
10 - 1 = 9.
We can now calculate the interval as $\bar{X} - t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}}$. In this specific
situation $t_{\alpha/2,\nu}$ means $t_{0.025,\,9} = 2.26$.
Therefore the interval can be calculated as:
$$9.5 - 2.26\left(\frac{3.24}{\sqrt{10}}\right) \le \mu \le 9.5 + 2.26\left(\frac{3.24}{\sqrt{10}}\right)$$
$$7.2 \le \mu \le 11.8$$
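The interval above can be sketched in code from the summary values reported in the solution (sample mean 9.5, sample standard deviation 3.24, n = 10). Python's standard library has no t-distribution, so the critical value 2.26 is passed in as read from a t-table for ν = 9, exactly as in the text. The function name `t_confidence_interval` and its parameters are illustrative.

```python
import math

def t_confidence_interval(x_bar, s_x, n, t_value):
    """Interval for the mean when sigma is unknown.

    t_value is the table value t(alpha/2, nu) with nu = n - 1; it is
    supplied by the caller because it is read from a statistical table.
    """
    standard_error = s_x / math.sqrt(n)  # S_x / sqrt(n)
    margin = t_value * standard_error
    return x_bar - margin, x_bar + margin

t_low, t_high = t_confidence_interval(x_bar=9.5, s_x=3.24, n=10, t_value=2.26)
print(round(t_low, 1), round(t_high, 1))  # 7.2 11.8
```

Taking the table value as an argument keeps the sketch free of third-party packages; a statistics library that provides the t-distribution could compute it instead.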
It may sometimes be difficult to know whether the population is normally distributed. Hence,
we may need to use an approximation.
The Central Limit Theorem states that as the sample size increases, the sampling distribution of
the mean approaches the normal distribution, whatever the shape of the population (provided its
variance is finite). In practice, for n greater than or equal to 30 statisticians use the normal
distribution. Hence, we can use the Central Limit Theorem to construct an interval estimate for a
mean when the sample size is greater than or equal to 30.
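A small simulation gives a feel for this approximation. The sketch below draws samples of size 30 from a clearly non-normal population and checks how often the sample mean falls within 1.96 standard errors of the population mean, which would happen about 95% of the time if the sampling distribution were exactly normal. The exponential population (mean 1, standard deviation 1) and the function name `fraction_within_normal_limits` are assumptions chosen for the illustration.

```python
import math
import random

def fraction_within_normal_limits(n, repetitions, seed=4):
    """Share of sample means within 1.96 standard errors of the true mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0                  # exponential population, rate 1
    limit = 1.96 * sigma / math.sqrt(n)   # 1.96 standard errors
    inside = 0
    for _ in range(repetitions):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(x_bar - mu) <= limit:
            inside += 1
    return inside / repetitions

fraction_n30 = fraction_within_normal_limits(n=30, repetitions=5000)
print(fraction_n30)
```

Even though the population is strongly skewed, the observed share comes out near 0.95, which is why the normal-based interval is commonly used once n reaches about 30.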
