Would You Change the Channel?
A survey by the Roper Organization found that 45% of the people who were offended by
a television program would change the channel, while 15% would turn off their television sets. The survey further stated that
the margin of error is 3 percentage points, and 4000 adults were interviewed.
Several questions arise:
1. How do these estimates compare with the true population percentages?
2. What is meant by a margin of error of 3 percentage points?
3. Is the sample of 4000 large enough to represent the population of all adults who
watch television in the United States?
Introduction
One aspect of inferential statistics is estimation, which is the process of estimating the value of a parameter from information
obtained from a sample. For example, The Book of Odds, by Michael D. Shook and Robert L. Shook (New York: Penguin Putnam,
Inc.), contains the following statements:
“One out of 4 Americans is currently dieting.” (Calorie Control Council)
“Seventy-two percent of Americans have flown on commercial airlines.” (“The Bristol
Meyers Report: Medicine in the Next Century”)
“The average kindergarten student has seen more than 5000 hours of television.” (U.S.
Department of Education)
“The average school nurse makes $32,786 a year.” (National Association of School Nurses)
Since the populations from which these values were obtained are large, these values are only estimates of the true parameters
and are derived from data collected from samples. The statistical procedures for estimating the population mean, proportion,
variance, and standard deviation will be explained in this chapter.
An important question in estimation is that of sample size. How large should the sample be in order to make an accurate
estimate? This question is not easy to answer since the size of the sample depends on several factors, such as the accuracy
desired and the probability of making a correct estimate. The question of sample size is also addressed in this
chapter.
Inferential statistical techniques have various assumptions that must be met before valid conclusions can be obtained. One
common assumption is that the samples must be randomly selected. Chapter 1 explains how to obtain a random sample. The
other common assumption is that either the sample size must be greater than or equal to 30 or the population must be
normally or approximately normally distributed if the sample size is less than 30.
Confidence Intervals for the Mean
When $\sigma$ Is Known
Suppose a college president wishes to estimate the average age of students attending classes this semester. The president could
select a random sample of 100 students and find the average age of these students, say, 22.3 years. From the sample mean, the
president could infer that the average age of all the students is 22.3 years. This type of estimate is called a point estimate.
A point estimate is a specific numerical value estimate of a parameter. The best point estimate of the population mean $\mu$ is the
sample mean $\bar{X}$. You might ask why other measures of central tendency, such as the median and
mode, are not used to estimate the population mean. The reason is that the means of samples vary less than other statistics
(such as medians and modes) when many samples are selected from the same population. Therefore, the sample mean is the
best estimate of the population mean. Sample measures (i.e., statistics) are used to estimate population measures (i.e.,
parameters). These statistics are called estimators. As previously stated, the sample mean is a better estimator of the population
mean than the sample median or sample mode.
A good estimator should satisfy the following three properties.
Three Properties of a Good Estimator
1. The estimator should be an unbiased estimator. That is, the expected value or the mean of
the estimates obtained from samples of a given size is equal to the parameter being estimated.
2. The estimator should be consistent. For a consistent estimator, as sample size increases,
the value of the estimator approaches the value of the parameter estimated.
3. The estimator should be a relatively efficient estimator. That is, of all the statistics that can
be used to estimate a parameter, the relatively efficient estimator has the smallest variance.
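The three properties can be illustrated with a small simulation. The sketch below is not from the text: it assumes a normally distributed population with a hypothetical mean of 22.3 and standard deviation of 2.0, and the function name `simulate_estimators` is invented for illustration. It draws many samples and compares how the sample mean and the sample median behave as estimators of the population mean.

```python
import random
import statistics

def simulate_estimators(n, trials=2000, mu=22.3, sigma=2.0, seed=1):
    """Draw `trials` samples of size n from an assumed normal population
    and return the lists of sample means and sample medians."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators(n=30)

# Unbiased: the average of the sample means should be close to mu.
print("average of sample means:", round(statistics.fmean(means), 3))

# Relatively efficient: for this normal population, the sample means
# vary less from sample to sample than the sample medians do.
print("variance of means:  ", round(statistics.variance(means), 4))
print("variance of medians:", round(statistics.variance(medians), 4))

# Consistent: with larger samples, the means cluster more tightly around mu.
big_means, _ = simulate_estimators(n=300)
print("variance of means, n=300:", round(statistics.variance(big_means), 4))
```

The comparison between mean and median depends on the assumed normal population; for heavily skewed or heavy-tailed populations the ranking can differ.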
Confidence Intervals
As stated in Chapter 6, the sample mean will be, for the most part, somewhat different from the population mean due to
sampling error. Therefore, you might ask a second question: How good is a point estimate? The answer is that there is no way of
knowing how close a particular point estimate is to the population mean.
This answer places some doubt on the accuracy of point estimates. For this reason,
statisticians prefer another type of estimate, called an interval estimate.
An interval estimate of a parameter is an interval or a range of values used to estimate
the parameter. This estimate may or may not contain the value of the parameter being
estimated.
In an interval estimate, the parameter is specified as being between two values. For
example, an interval estimate for the average age of all students might be $21.9 < \mu < 22.7$,
or $22.3 \pm 0.4$ years.
Either the interval contains the parameter or it does not. A degree of confidence (usually a percent) can be assigned before an
interval estimate is made. For instance, you may
wish to be 95% confident that the interval contains the true population mean. Another
question then arises. Why 95%? Why not 99 or 99.5%?
If you desire to be more confident, such as 99 or 99.5% confident, then you must
make the interval larger. For example, a 99% confidence interval for the mean age of
college students might be $21.7 < \mu < 22.9$, or $22.3 \pm 0.6$. Hence, a tradeoff occurs. To
be more confident that the interval contains the true population mean, you must make the
interval wider.
The confidence level of an interval estimate of a parameter is the probability that the
interval estimate will contain the parameter, assuming that a large number of samples are
selected and that the estimation process on the same parameter is repeated.
A confidence interval is a specific interval estimate of a parameter determined by using
data obtained from a sample and by using the specific confidence level of the estimate.
Intervals constructed in this way are called confidence intervals. Three confidence levels are in common use: 90%, 95%,
and 99%.
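The repeated-sampling meaning of a confidence level can be checked by simulation. The following sketch makes several assumptions that are not in the text: a normal population with a hypothetical mean of 22.3 and a known standard deviation of 2.0, samples of size 100, and intervals of the form sample mean plus or minus z standard errors. The function name `coverage` is invented for illustration.

```python
import math
import random
import statistics

def coverage(confidence, n=100, trials=2000, mu=22.3, sigma=2.0, seed=7):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} intervals containing mu: {coverage(level):.3f}")
```

Each printed fraction should land near its nominal level. Any single interval either contains the population mean or does not; the confidence level describes how often the procedure succeeds across many samples.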
The algebraic derivation of the formula for determining a confidence interval for a
mean will be shown later. A brief intuitive explanation will be given first.
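The central limit theorem states that when the sample size is large, approximately
95% of the sample means of the same sample size taken from a population will fall within
$\pm 1.96$ standard errors of the population mean, that is,
$$\mu \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right)$$
Now, if a specific sample mean is selected, say, $\bar{X}$, there is a 95% probability that the
interval $\mu \pm 1.96(\sigma/\sqrt{n})$ contains $\bar{X}$. Likewise, there is a 95% probability that the interval specified by
$$\bar{X} \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right)$$
will contain $\mu$, as will be shown later. Stated another way,
$$\bar{X} - 1.96\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + 1.96\left(\frac{\sigma}{\sqrt{n}}\right)$$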
Historical Notes
Point and interval estimates were known as long ago as the late 1700s. However, it
wasn’t until 1937 that a mathematician, J. Neyman, formulated practical applications
for them.
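The interval described in the intuitive explanation above, the sample mean plus or minus 1.96 standard errors, can be computed directly. The sketch below is a minimal illustration rather than a prescribed method: the function name `z_interval` is hypothetical, and the population standard deviation of 2.0 is an assumed value, since the text does not give one for the college-age example.

```python
import math
import statistics

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for xbar +/- z*sigma/sqrt(n), assuming sigma is known."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Sample mean and sample size from the college-age example; sigma = 2.0 is assumed.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(xbar=22.3, sigma=2.0, n=100, confidence=level)
    print(f"{level:.0%}: {low:.2f} < mu < {high:.2f}")
```

With these assumed inputs the 95% margin is about 0.39 years, and the interval widens as the confidence level rises, which is the tradeoff described earlier.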