STAT355 - Probability & Statistics Chapter 7: Statistical Intervals Based on a Single Sample
Fall 2011
Chapter 7 - Statistical Intervals Based on a Single Sample
7.1 Basic Properties of Confidence Intervals
7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 Intervals Based on a Normal Population Distribution
7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population
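Basic Properties of Confidence Intervals
Consider a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \ldots, x_n$ be the actual observations of the random sample.
The sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$, so
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]
and
\[ P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95. \]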
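The statement
\[ P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \]
is equivalent to
\[ P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95. \]
Thus,
\[ \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \]
is a random interval that includes or covers the true value of $\mu$ with probability 0.95.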
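The random interval
\[ \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \tag{1} \]
includes or covers the true value of $\mu$.
Definition
If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval
\[ \left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) \]
is called a 95% confidence interval for $\mu$.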
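The distinction between the random interval and a fixed, observed interval can be illustrated by simulation. The sketch below is not from the slides: the values of `mu`, `sigma`, `n`, the number of repetitions, and the seed are all arbitrary assumptions chosen for illustration. It draws many samples from a normal population with known $\sigma$ and counts how often the computed interval contains $\mu$.

```python
import random
from math import sqrt


def coverage(mu=10.0, sigma=2.0, n=25, reps=2000, seed=355):
    """Fraction of simulated 95% z intervals that contain the true mean.

    All default values are illustrative assumptions, not course data.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


print(f"Empirical coverage: {coverage():.3f}")
```

In the long run roughly 95% of the intervals should cover $\mu$; any single observed interval either contains $\mu$ or it does not.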
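Definition
A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma^2$ is known is given by
\[ \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]
or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
Common critical values:
$\alpha = 0.1$: $z_{\alpha/2} = z_{0.05} = 1.645$
$\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$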
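A minimal sketch of the interval above, using only the Python standard library. The function name `z_interval` and the numbers in the usage line are hypothetical; the critical value $z_{\alpha/2}$ is obtained from `statistics.NormalDist().inv_cdf` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """100*conf % CI for a normal mean when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs: xbar = 50, sigma = 4, n = 16
lo, hi = z_interval(50.0, 4.0, 16)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```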
Example
Exercise 1: Consider a normal population with the value of $\sigma$ known.
1. What is the confidence level for the interval $\bar{x} \pm 2.81\,\sigma/\sqrt{n}$?
2. What is the confidence level for the interval $\bar{x} \pm 1.44\,\sigma/\sqrt{n}$?
3. What is the value of $z_{\alpha/2}$ that will result in a confidence level of 99.7%?
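Questions of this kind rest on the relation between a multiplier $z$ and its confidence level, $1 - \alpha = 2\Phi(z) - 1$. The following sketch shows the conversion in both directions; the helper names are hypothetical, and the inputs 1.96 and 0.90 are chosen for illustration rather than taken from the exercise.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def level_from_z(z):
    """Confidence level of the interval xbar +/- z * sigma / sqrt(n)."""
    return 2 * STD_NORMAL.cdf(z) - 1


def z_from_level(conf):
    """Multiplier z_{alpha/2} that yields confidence level conf."""
    alpha = 1 - conf
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


print(f"z = 1.96 gives level {level_from_z(1.96):.4f}")
print(f"level 0.90 needs z = {z_from_level(0.90):.3f}")
```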
Large-Sample Confidence Intervals for a Population Mean
Consider $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$. Often, $\sigma^2$ is unknown. Let $S$ be the sample standard deviation.
Proposition
If $n$ is sufficiently large, the standardized variable
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has approximately a standard normal distribution. This implies that
\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \]
is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.
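As a rough illustration of the large-sample interval, the sketch below applies $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ to a simulated sample from a skewed (exponential) population. The data, the seed, the sample size of 200, and the function name are assumptions made for this example only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_interval(data, conf=0.95):
    """Approximate CI for the mean using s in place of sigma (large n)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    xbar = mean(data)
    half_width = z * stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor
    return xbar - half_width, xbar + half_width


# Simulated, non-normal data: exponential with true mean 5 (illustrative)
rng = random.Random(7)
sample = [rng.expovariate(1 / 5.0) for _ in range(200)]
lo, hi = large_sample_interval(sample)
print(f"Approximate 95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```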
A Confidence Interval for a Population Proportion
Let $p$ denote the proportion of successes in a population. A random sample of $n$ individuals is to be selected, and $X$ is the number of successes in the sample. Provided that $n$ is small compared to the population size, $X$ can be regarded as a binomial rv with
\[ E(X) = np \quad \text{and} \quad \sigma_X = \sqrt{np(1-p)}. \]
Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then $X$ has approximately a normal distribution.
The natural estimator of $p$ is $\hat{p} = X/n$, the sample fraction of successes. Since $\hat{p}$ is just $X$ multiplied by the constant $1/n$, $\hat{p}$ also has approximately a normal distribution. Moreover, $E(\hat{p}) = p$ (unbiasedness) and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
The standard deviation $\sigma_{\hat{p}}$ involves the unknown parameter $p$. Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that
\[ P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha. \]
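Proposition
Let
\[ \tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}. \]
Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ is
\[ \tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}. \]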
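The proposition's interval (often called the score or Wilson interval) can be sketched as follows. The function name and the counts in the usage line (50 successes in 100 trials) are hypothetical and are not taken from the exercises.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(x, n, conf=0.95):
    """Score confidence interval for a population proportion.

    x: number of successes, n: sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    p_hat = x / n
    denom = 1 + z**2 / n
    p_tilde = (p_hat + z**2 / (2 * n)) / denom  # center of the interval
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_tilde - half_width, p_tilde + half_width


# Hypothetical data: 50 successes out of 100
lo, hi = score_interval(50, 100)
print(f"95% score interval: ({lo:.4f}, {hi:.4f})")
```

Note that the interval is centered at $\tilde{p}$ rather than at $\hat{p}$; the two coincide only when $\hat{p} = 0.5$.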
Exercise (7.2) 21
In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate. Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?
Intervals Based on a Normal Population Distribution
The CI for $\mu$ presented earlier is valid provided that $n$ is large. The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked, however, when $n$ is small. In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.
Assumption
The population of interest is normal, so that $X_1, \ldots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma^2$ unknown.
The key result underlying the interval in the earlier section was that for large $n$, the rv
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has approximately a standard normal distribution. When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator. This implies that the probability distribution of $(\bar{X} - \mu)/(S/\sqrt{n})$ will be more spread out than the standard normal distribution.
The result on which inferences are based introduces a new family of probability distributions called t distributions.
Theorem
When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).
Properties of t Distributions
Although the variable of interest is still $(\bar{X} - \mu)/(S/\sqrt{n})$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.
We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution. Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.
We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on. For any fixed value of $\nu$, the density function that specifies the associated t curve is even more complicated than the normal density function. Fortunately, we need concern ourselves only with several of the more important features of these curves.
Let $t_\nu$ denote the t distribution with $\nu$ df.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal ($z$) curve.
3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the $z$ curve is often called the t curve with df $= \infty$).
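These properties can be checked numerically. The sketch below evaluates the standard t density, which the slides do not write out, so its form here is supplied as an outside fact:
$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,(1 + t^2/\nu)^{-(\nu+1)/2}$.
The function name `t_pdf` and the chosen evaluation points are illustrative.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom.

    Computed on the log scale so that large df does not overflow.
    """
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


z_pdf = NormalDist().pdf

# Tail height at t = 3 shrinks toward the z curve as df grows.
for df in (1, 5, 30):
    print(f"df = {df:>2}: f(3) = {t_pdf(3.0, df):.5f}")
print(f"z curve: f(3) = {z_pdf(3.0):.5f}")
```

Heavier tails at $t = 3$ for small $\nu$ reflect the extra spread; for large $\nu$ the density is nearly indistinguishable from the $z$ curve.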
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar{X}, \ldots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only $n - 1$ of these are freely determined. The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based.
The use of t distributions in making inferences requires notation for capturing t-curve tail areas, $t_{\alpha,\nu}$, analogous to $z_\alpha$ for the $z$ curve.
Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
For example, $t_{0.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df. Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$.
Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$. The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.
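The One-Sample t Confidence Interval
Proposition
Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
\[ \left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) \]
or, more compactly, $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$.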
Example (11): Even as traditional markets for sweetgum lumber have declined, large section solid timbers traditionally used for construction bridges and mats have become increasingly scarce. The article "Development of Novel Industrial Laminated Planks from Sweetgum Lumber" (J. of Bridge Engr., 2008: 64-66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber. Here is data on the modulus of rupture:
6807.99 7437.88 7659.50 7422.69 7637.06
6872.39 7378.61 7886.87 6663.28 7663.18
7295.54 6316.67 6165.03 6032.28 6702.76
7713.65 6991.41 6906.04 7440.17 7503.33
6992.23 6981.46 7569.75 6617.17 6984.12
7093.71 8053.26 8284.75 7347.95 7674.99
Use R software.
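The slides call for R here; as an alternative, the following is a Python sketch of the same one-sample t interval applied to the modulus of rupture data listed in Example (11). The critical value $t_{0.025,29} = 2.045$ is taken from a standard t table (an assumption, since the slides do not state it), and the function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Modulus of rupture data from Example (11), n = 30
modulus_of_rupture = [
    6807.99, 7437.88, 7659.50, 7422.69, 7637.06,
    6872.39, 7378.61, 7886.87, 6663.28, 7663.18,
    7295.54, 6316.67, 6165.03, 6032.28, 6702.76,
    7713.65, 6991.41, 6906.04, 7440.17, 7503.33,
    6992.23, 6981.46, 7569.75, 6617.17, 6984.12,
    7093.71, 8053.26, 8284.75, 7347.95, 7674.99,
]

T_CRIT = 2.045  # t_{0.025, 29} from a standard t table (assumed)


def t_interval(data, t_crit):
    """One-sample t CI: xbar +/- t_crit * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    return xbar - half_width, xbar + half_width


lo, hi = t_interval(modulus_of_rupture, T_CRIT)
print(f"n = {len(modulus_of_rupture)}, mean = {mean(modulus_of_rupture):.3f}")
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```

The interval is only as trustworthy as the normality assumption, so a normal probability plot of the data would ordinarily be checked first.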
Example (12): Consider the following sample of fat content (in percentage) of $n = 10$ randomly selected hot dogs ("Sensory and Mechanical Assessment of the Quality of Frankfurters", J. of Texture Studies, 1990: 395-409):
25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5
Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content. Use your calculator to obtain $\bar{x}$ and $s$.
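One way the computation for this example might go is sketched below. The summary statistics are computed from the ten listed values; the critical value $t_{0.025,9} = 2.262$ comes from a standard t table and is not stated in the slides.

```latex
\begin{align*}
\bar{x} &= \frac{219.0}{10} = 21.90, \qquad
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{153.82}{9}} \approx 4.134, \\[4pt]
\bar{x} \pm t_{0.025,\,9}\,\frac{s}{\sqrt{n}}
  &= 21.90 \pm 2.262 \cdot \frac{4.134}{\sqrt{10}}
   \approx 21.90 \pm 2.96
   = (18.94,\ 24.86).
\end{align*}
```

With only 10 observations the interval is fairly wide, which reflects how imprecisely the mean fat content is estimated from so small a sample.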
The Chi-Squared ($\chi^2$) Distribution
Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv
\[ \frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \]
has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.
Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.
Remark: The chi-squared distribution is not symmetric.
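Confidence Interval for $\sigma^2$
From the theorem,
\[ P\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha, \]
we get the inequalities
\[ \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}. \]
A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is
\[ \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right). \]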
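To make the formula concrete, the sketch below reuses the hot dog fat-content data from Example (12); applying it to that data set is an illustration added here, not part of the original example. The critical values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are taken from a standard chi-squared table (an assumption), and the function name is hypothetical. Taking square roots of the endpoints gives an interval for $\sigma$.

```python
from math import sqrt
from statistics import variance

# Fat content data from Example (12), n = 10 (reused for illustration)
fat_content = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

# Table values for 9 df (assumed, not given in the slides)
CHI2_UPPER = 19.023  # chi^2_{0.025, 9}
CHI2_LOWER = 2.700   # chi^2_{0.975, 9}


def variance_interval(data, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower)."""
    n = len(data)
    s2 = variance(data)  # sample variance, n - 1 divisor
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


var_lo, var_hi = variance_interval(fat_content, CHI2_UPPER, CHI2_LOWER)
print(f"95% CI for the variance: ({var_lo:.2f}, {var_hi:.2f})")
print(f"95% CI for the std dev:  ({sqrt(var_lo):.2f}, {sqrt(var_hi):.2f})")
```

Because the chi-squared distribution is not symmetric, the interval is not centered at $s^2$: the larger critical value produces the lower endpoint.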
(Suppl) 51
An April 2009 survey of 2253 American adults conducted by the Pew Research Center's Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.
1. Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.
2. What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?