Xample 8 - 1: 8.1 Hypothesis Tests For Two Population Means (Large Samples) 375
EXAMPLE 8.1  A math instructor wrote two versions of the same test, which he believed to be of equal difficulty. He gave the first version (A) to 36 students and the second version (B) to 41 students. We will consider the two groups to be random samples from the population of all community college statistics students. The 36 students who took version A had a mean score of 79.3 with a standard deviation of 11.26. The 41 students who took version B had a mean score of 84.1 with a standard deviation of 9.40. At the 0.05 level of significance, test the claim that the two versions were of equal difficulty.
Since one version was not specifically mentioned first in the claim (“the two versions were of equal difficulty”), we will let version A be population 1 and version B be population 2.
Step 1
Population 1: version A; population 2: version B
Claim in words: The two versions are of equal difficulty.
Claim: μ1 = μ2
Complement: μ1 ≠ μ2
H0: μ1 = μ2
HA: μ1 ≠ μ2
Step 2
Level of significance: α = 0.05
Step 3
Test statistic:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
Step 4
α = 0.05, two-tailed test
[Figure: standard normal curve with central area 0.9500 and area 0.0250 in each tail; critical values z = –1.96 and z = 1.96]
Step 5
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{79.3 - 84.1}{\sqrt{\dfrac{11.26^2}{36} + \dfrac{9.40^2}{41}}} = -2.01 \]
[Figure: standard normal curve illustrating the p-value, with the test statistic z = –2.01 marked]
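The arithmetic in Example 8.1 can be checked with a short script. This is only a sketch using Python's standard library: the function name `two_sample_z` is our own label rather than anything from the text, and the decision it reports follows from comparing |z| with the two-tailed critical value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic for the difference between two means."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


alpha = 0.05
# Version A is population 1, version B is population 2.
z = two_sample_z(79.3, 11.26, 36, 84.1, 9.40, 41)

# Two-tailed test: split alpha evenly between the two tails.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > critical_value

print(f"z = {z:.2f}, critical values = ±{critical_value:.2f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the test statistic falls just outside the critical values, the p-value comes out slightly below 0.05, which is why the decision is a close one.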
You may be wondering why we did not use the one-mean test developed in the last chapter. For instance, we could have tested the claim that the mean score for version A is below 82, and then tested the claim that the mean score for version B is above 82. The first problem with this approach is that it is very difficult to come up with that single number to test each mean against ahead of time. The second problem associated with using two separate tests involves the level of significance. Recall that using a level of significance of 0.05 means that the probability of rejecting a null hypothesis that is actually true is 0.05. However, if we perform two separate, independent hypothesis tests at the 0.05 level of significance and both null hypotheses are actually true, then there is a 0.0975 probability of rejecting at least one of them. This is calculated as follows.
\[ P(\text{rejecting at least one true null hypothesis}) = 1 - P(\text{rejecting neither true null hypothesis}) = 1 - (0.95)^2 = 0.0975 \]
                Not only is performing two hypothesis tests more tedious and time consuming, it also
                increases our chances of making a Type I error.
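The 0.0975 figure can be reproduced, and extended to more than two tests, with a few lines of Python. This sketch assumes the tests are independent and that every null hypothesis is true; the variable names are ours.

```python
alpha = 0.05      # level of significance for each individual test
num_tests = 2     # number of separate tests performed

# Probability that no true null hypothesis is rejected, then its complement.
p_no_rejection = (1 - alpha) ** num_tests
p_at_least_one = 1 - p_no_rejection

print(round(p_at_least_one, 4))
```

Raising `num_tests` shows how quickly the chance of at least one Type I error grows when several separate tests are run.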
                    We continue with two more examples.
EXAMPLE 8.2  A reading group claims that Americans read more as they grow older. A random sample of 115 Americans age 60 or older read for a mean length of 62.8 minutes per day, with a standard deviation of 18.3 minutes per day. A random sample of 88 Americans between the ages of 50 and 59 read for a mean length of 54.2 minutes per day, with a standard deviation of 23.1 minutes per day. At the 0.01 level of significance, test the claim that the mean time spent reading per day by Americans age 60 and older is longer than the mean time spent reading per day by Americans between the ages of 50 and 59.
                We will let Americans age 60 and older be population 1 as they were the first group
                mentioned in the claim. We will let Americans between the ages of 50 and 59 be
                population 2.
                Step 1
                Population 1: 60 or older; population 2: 50 to 59 years old
Claim in words: The mean time spent reading per day by Americans age 60 and
  older is longer than the mean time spent reading per day by Americans between
  the ages of 50 and 59.
Claim: μ1 > μ2
Complement: μ1 ≤ μ2
H0: μ1 ≤ μ2
HA: μ1 > μ2
Step 2
Level of significance: α = 0.01
Step 3
Test statistic:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
Step 4
α = 0.01, right-tailed test
Decision rule: Reject H0 if z > 2.33.
[Figure: standard normal curve with area 0.9900 to the left of the critical value z = 2.33 and area 0.01 in the right tail]
Step 5
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{62.8 - 54.2}{\sqrt{\dfrac{18.3^2}{115} + \dfrac{23.1^2}{88}}} = 2.87 \]
[Figure: standard normal curve illustrating the p-value as the area to the right of the test statistic z = 2.87]
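For a right-tailed test such as Example 8.2, the decision rule can be applied directly in code. The sketch below uses the standard library's `NormalDist` to obtain the critical value instead of reading it from a table; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.01
# Summary statistics from Example 8.2 (population 1: age 60 or older).
mean1, sd1, n1 = 62.8, 18.3, 115
mean2, sd2, n2 = 54.2, 23.1, 88

z = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)

# Right-tailed test: the critical value cuts off an area of alpha on the right.
critical_value = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > critical_value

print(f"z = {z:.2f}, critical value = {critical_value:.2f}, reject H0: {reject_h0}")
```

Here the test statistic lies to the right of the critical value, so the decision rule leads to rejecting H0.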
EXAMPLE 8.3  Many baseball fans prefer the National League to the American League because they believe that the National League has better pitchers, which produces lower scoring games. A random sample of 71 National League baseball games had a mean combined score of 9.7 runs per game, with a standard deviation of 4.94 runs per game. A random sample of 62 American League baseball games had a mean combined score of 10.2 runs per game, with a standard deviation of 4.98 runs per game. At the 0.05 level of significance, test the claim that the mean number of runs scored in a National League baseball game is less than it is in the American League.
                We will let the National League games be population 1 as they were the first group
                mentioned in the claim. We will let the American League games be population 2.
                Step 1
                Population 1: National League games; population 2: American League games
                Claim in words: The mean number of runs scored in a National League baseball
                  game is less than it is in the American League.
                Claim: μ1 < μ2
                Complement: μ1 ≥ μ2
                H0: μ1 ≥ μ2
                HA: μ1 < μ2
                Step 2
                Level of significance: α = 0.05
                Step 3
Test statistic:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
                Step 4
                α = 0.05, left-tailed test
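[Figure: standard normal curve with area 0.05 in the left tail, bounded by the critical value z = –1.645, and area 0.9500 to its right]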
Step 5
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{9.7 - 10.2}{\sqrt{\dfrac{4.94^2}{71} + \dfrac{4.98^2}{62}}} = -0.58 \]
[Figure: standard normal curve illustrating the p-value as the area to the left of the test statistic z = –0.58]
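The same decision can be reached through the p-value instead of the critical value. The following sketch does this for the left-tailed test of Example 8.3; it relies only on the standard library, and the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
# Summary statistics from Example 8.3 (population 1: National League games).
mean1, sd1, n1 = 9.7, 4.94, 71
mean2, sd2, n2 = 10.2, 4.98, 62

z = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)

# Left-tailed test: the p-value is the area to the left of the test statistic.
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value is far larger than 0.05, so the sample does not give sufficient evidence to support the claim that National League games are lower scoring.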
Confidence Intervals
Recall that the idea of hypothesis testing is based on confidence intervals. There is a
confidence interval that can be constructed for the difference between two population
means, when both sample sizes are large. We begin, as we did before, by calculating
the margin of error. Here is the formula.
\[ E = z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \]
We then subtract this margin of error from $\bar{x}_1 - \bar{x}_2$ to find the left endpoint of the confidence interval, and we add the margin of error to $\bar{x}_1 - \bar{x}_2$ to find the right endpoint of the confidence interval. (If you want to create a one-sided confidence interval, see the “Extra” material at the end of the Section 6.2 exercises.)
EXAMPLE 8.4  A math instructor wrote two versions of the same test that he believed to be of equal difficulty. He gave the first version (A) to 36 students and the second version (B) to 41 students. We will consider the two groups to be random samples from the population of all community college statistics students. The 36 students who took version A had a mean score of 79.3 with a standard deviation of 11.26. The 41 students who took version B had a mean score of 84.1 with a standard deviation of 9.40. Construct a 95% confidence interval for the difference between the mean score of version A and the mean score of version B.
We will let population 1 be the scores on version A. We begin by finding the value of $z_{\alpha/2}$. Recall that for a 95% confidence interval, $z_{0.025} = 1.96$.
[Figure: standard normal curve with central area 0.9500 and area 0.0250 in each tail; critical values z = –1.96 and z = 1.96]
\[ E = z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = 1.96 \sqrt{\dfrac{11.26^2}{36} + \dfrac{9.40^2}{41}} = 4.67 \]
The difference between the sample means is 79.3 – 84.1 = –4.8, so the endpoints of the interval are –4.8 – 4.67 = –9.47 and –4.8 + 4.67 = –0.13. We are 95% confident that the difference between the two means is between –9.47 points and –0.13 points. Note that this interval does not contain 0; this is strong evidence that the two population means are not equal. This conclusion can be affirmed by checking the hypothesis test of the first example of the section. ■
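The interval in Example 8.4 can be reproduced with a short sketch. It uses the rounded table value 1.96 for the critical value, as the example does; the variable names are ours.

```python
from math import sqrt

z_critical = 1.96  # z value for a 95% confidence interval, from the table
# Summary statistics from Example 8.4 (population 1: version A).
mean1, sd1, n1 = 79.3, 11.26, 36
mean2, sd2, n2 = 84.1, 9.40, 41

difference = mean1 - mean2
margin_of_error = z_critical * sqrt(sd1**2 / n1 + sd2**2 / n2)

left_endpoint = difference - margin_of_error
right_endpoint = difference + margin_of_error
contains_zero = left_endpoint <= 0 <= right_endpoint

print(f"E = {margin_of_error:.2f}")
print(f"interval: ({left_endpoint:.2f}, {right_endpoint:.2f}), contains 0: {contains_zero}")
```

Checking whether 0 lies inside the interval mirrors the two-tailed hypothesis test at the matching level of significance.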
                     Before
                     34       21    36    28      46    31      45        40
                     48       42    27    41      39    29      37        37
                     29       46    34    38      43    46      35        34
                     32       45    27    40      47    41
                     After
                     35       39    50    25      25    36      34        31
                     45       35    37    32      36    31      44        38
                     34       23    31    36      21    30      44        45
                     30       39    39    42      28    47      40        27
In column A, enter the scores for the semester before the change in cells A1
through A30. In column B, enter the scores for the semester after the change
in cells B1 through B32.
In cell C1, type Variance #1, and type Variance #2 in cell D1. To calculate the vari-
ance of the scores in sample 1, type the following in cell C2.
=VAR(A1:A30)
To calculate the variance of the scores in sample 2, type the following in cell D2.
=VAR(B1:B32)
Excel tells us that the variance of the first sample is 50.48 and the variance of the
second sample is 52.53.
We proceed with the test by selecting Data Analysis from the Tools menu. When
the dialog box appears, select z-Test: Two Sample for Means and click on OK.
When the z-Test dialog box opens, type A1:A30 in the box labeled Variable 1
Range and type B1:B32 in the box labeled Variable 2 Range. In the box labeled
Hypothesized Mean Difference, enter 0. (We would enter 5 if we claimed that
the first mean was 5 points higher than the second mean.) In the box labeled
Variable 1 Variance (known) enter 50.48, which was the variance that we cal-
culated earlier. In the box labeled Variable 2 Variance (known) enter 52.53. In
the box labeled Alpha, enter 0.05, which is the level of significance for this test.
Click on OK.
[Excel output: z-Test: Two Sample for Means, with columns Variable 1 and Variable 2]
                Excel lists the mean for each sample. The value labeled as z is the test statistic,
                which is 1.09. Then Excel lists the p-values and critical values for a one-tailed test
                and a two-tailed test. Since our test is a two-tailed test, the p-value is 0.2762 and
                the critical value to be used in the decision rule is 1.96.
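For readers without Excel, the same quantities can be computed from the raw scores. This is a sketch using Python's standard library rather than a reproduction of Excel's output: `statistics.variance` computes the sample variance, which plays the role of `=VAR(...)` here, and the list names `before` and `after` are ours.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

before = [34, 21, 36, 28, 46, 31, 45, 40, 48, 42, 27, 41, 39, 29, 37, 37,
          29, 46, 34, 38, 43, 46, 35, 34, 32, 45, 27, 40, 47, 41]
after = [35, 39, 50, 25, 25, 36, 34, 31, 45, 35, 37, 32, 36, 31, 44, 38,
         34, 23, 31, 36, 21, 30, 44, 45, 30, 39, 39, 42, 28, 47, 40, 27]

# Sample variances, used as estimates of the population variances.
var_before = variance(before)
var_after = variance(after)

# Test statistic for a hypothesized mean difference of 0.
standard_error = sqrt(var_before / len(before) + var_after / len(after))
z = (mean(before) - mean(after)) / standard_error

# Two-tailed p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"variances: {var_before:.2f}, {var_after:.2f}")
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```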
                Step 1
                Population 1: previous teaching style; population 2: new teaching style
                Claim in words: The mean score for both styles is the same.
                Claim: μ1 = μ2
                Complement: μ1 ≠ μ2
                H0: μ1 = μ2
                HA: μ1 ≠ μ2
                Step 2
                Level of significance: α = 0.05
                Step 3
Test statistic:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
                Step 4
                α = 0.05, two-tailed test
[Figure: standard normal curve with central area 0.9500 and area 0.0250 in each tail; critical values z = –1.96 and z = 1.96]
                Step 5
                z = 1.09 (calculated by Excel)
                Decision: Fail to reject H0.
                Conclusion: There is not sufficient sample evidence to reject the claim that the
                 mean score for both styles is the same.
[Figure: standard normal curve with the test statistic z = 1.09 marked and a tail area of 0.1381 beyond it]
p-value = 0.2762 (calculated by Excel) ■
                    Before
                    34       21    36      28       46   31     45     40
                    48       42    27      41       39   29     37     37
                    29       46    34      38       43   46     35     34
                    32       45    27      40       47   41
                    After
                    35       39    50      25       25   36     34     31
                    45       35    37      32       36   31     44     38
                    34       23    31      36       21   30     44     45
                    30       39    39      42       28   47     40     27
Enter the scores for the semester before the change in list L1. Enter the scores for the semester after the change in list L2. Before proceeding with the test, we must find both sample standard deviations, which we will use as estimates of the population standard deviations. We access the STAT CALC menu by pressing STAT and using the → key to move to the right. Highlight option 1:1-Var Stats and press ENTER. When you are brought to the main screen, enter the list L1 by pressing 2nd 1, and then press ENTER. Make note of the sample standard deviation (Sx), which is 7.10. Now repeat the process for list L2. Its standard deviation is 7.25.
Now we access the STAT TESTS menu by pressing STAT and using the → key to move to the right. Highlight option 3:2-SampZTest and press ENTER.
Since we have the sample data, highlight Data next to Inpt:. Enter the first sample standard deviation (7.10) next to σ1:. Enter the second sample standard deviation (7.25) next to σ2:. Next to List1: enter L1 by pressing 2nd 1. Next to List2: enter L2 by pressing 2nd 2. Leave Freq1: and Freq2: as 1. Scroll down to see the alternate hypothesis. This test is a two-tailed test, so highlight ≠μ2 and press ENTER. Finally, scroll down one more line to highlight Calculate and press ENTER. Here are the screens that you should see.
                The first line of output gives us our test statistic, which rounds to 1.09. The next line
                gives us our p-value, which is 0.2761. The remaining information includes the mean,
                standard deviation, and size of each sample. If we repeated the exact same proce-
                dure but chose Draw instead of Calculate, we would see the following screen.
                This screen also includes the value of the test statistic and p-value in addition to
                the graph. ■
                     We can also use the TI-83 to help when we are given only the sample statistics
                instead of all of the data.
Access the STAT TESTS menu by pressing STAT and using the → key to move to the right. Highlight option 3:2-SampZTest and press ENTER.
Since we only have the sample statistics, highlight Stats next to Inpt:. Enter the first sample standard deviation (7.10) next to σ1:. Enter the second sample standard deviation (7.25) next to σ2:. Enter the first sample mean (37.27) next to x̄1:, and the first sample size (30) next to n1:. Enter the second sample mean (35.28) next to x̄2:, and the second sample size (32) next to n2:. Scroll down to see the alternate hypothesis. This test is a two-tailed test, so highlight ≠μ2 and press ENTER. Finally, scroll down one more line to highlight Calculate and press ENTER. Here are the screens that you should see.
Note that the results are slightly off when we compare them to the previous results.
This is due to the rounding of the sample means and sample standard deviations.
The first line of output gives us our test statistic, which rounds to 1.09. The next line
gives us our p-value, which is 0.2750. The remaining information includes the mean
and size of each sample.
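The effect of rounding can be seen by repeating the calculation with only the rounded summary statistics. The sketch below is a standard-library stand-in for the calculator's Stats input mode, not a description of how the TI-83 works internally; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

# Rounded summary statistics, as they would be typed into the calculator.
mean1, sd1, n1 = 37.27, 7.10, 30
mean2, sd2, n2 = 35.28, 7.25, 32

z = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)

# Two-tailed p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The test statistic still rounds to 1.09, but the p-value shifts in the third decimal place compared with the value obtained from the full data.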
Step 1
Population 1: previous teaching style; population 2: new teaching style
Claim in words: The mean score for both styles is the same.
Claim: μ1 = μ2
Complement: μ1 ≠ μ2
H0: μ1 = μ2
HA: μ1 ≠ μ2
Step 2
Level of significance: α = 0.05
Step 3
Test statistic:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
Step 4
α = 0.05, two-tailed test
[Figure: standard normal curve with central area 0.9500 and area 0.0250 in each tail; critical values z = –1.96 and z = 1.96]
                Step 5
                z = 1.09 (calculated by TI-83)
                Decision: Fail to reject H0.
                Conclusion: There is not sufficient sample evidence to reject the claim that the
                 mean score for both styles is the same.
[Figure: standard normal curve illustrating the p-value for the test statistic z = 1.09]
EXERCISES 8.1
                Use the standard procedure for all hypothesis tests.
                     1. A college student randomly surveys 45 female students at her school and finds that
                        their mean GPA is 3.22, with a standard deviation of 0.31. She also randomly sur-
                        veys 45 male students and finds that their mean GPA is 3.12 with a standard devi-
                        ation of 0.31.
                         (a) At the 0.05 level of significance, test the claim that the mean GPA for female
                             students at her school is greater than the mean GPA of male students.
                         (b) Identify potential sources of bias for these samples.
                     2. A random sample of 88 male deaths in a county over a 1-year period had a mean
                        age at death of 68.9 years, with a standard deviation of 16.71 years. A random
                        sample of 84 female deaths in the same county over the same 1-year period had
                        a mean age at death of 77.1 years, with a standard deviation of 15.65 years. At
                        the 0.01 level of significance, test the claim that the mean age at death in this
                        county is lower for males than it is for females.
                     3. At a college, 60 male students and 63 female students were randomly surveyed.
                        The males were taking a mean of 12.37 course units, with a standard deviation of
                        3.740 units. The females were taking a mean of 12.47 units, with a standard devi-
                        ation of 3.458 units. At the 0.05 level of significance, test the claim that the mean
                        number of course units is the same for male and female students.
                     4. A random sample of 45 male college students and 50 female college students was
                        taken, and each student was asked to write down their height. The male students
                        had a mean height of 70.4 inches, with a standard deviation of 2.44 inches. The
  female students had a mean height of 64.3 inches, with a standard deviation of
  2.86 inches.
   (a) At the 0.01 level of significance, test the claim that male college students have a greater
       mean height than female college students.
   (b) How can we eliminate the bias that may be present with students self-reporting
       their heights?
5. At a community college, students seem to perform better during the summer ses-
   sion than they do during the fall and spring semesters. Many instructors and
   administrators feel that this may be because summer classes have a different stu-
   dent makeup than the classes during the fall and spring semesters. A random sam-
   ple of 100 students in the spring semester had a mean age of 23.1 years, with a
   standard deviation of 7.10 years. A random sample of 30 summer session students
   had a mean age of 26.2 years, with a standard deviation of 6.92 years. At the 0.01
   level of significance, test the claim that the mean age of spring semester students
   is not the same as the mean age of summer session students.
6. A student conducted a study of 103 men and 104 women who had been married
   at least one time. The men had a mean age of 25.4 years at their first marriage,
   with a standard deviation of 5.19 years. The women had a mean age of 22.6 years
   at their first marriage, with a standard deviation of 4.63 years.
   (a) At the 0.01 level of significance, test the claim that the mean age of men at their
       first marriage is higher than the mean age of women at their first marriage.
   (b) If you were going to test the claim, explain how you would gather data to do
       so. Be sure to list potential biases that you are seeking to eliminate.
7. The PSAT Mathematics scores of 50 high school juniors and 35 high school soph-
   omores were selected at random.
       Juniors
       24        36   72      48     28     50      44     46      38     72
       57        69   48      64     39     48      59     51      59     48
       42        37   52      72     47     55      38     58      40     60
       60        59   38      42     49     40      31     61      42     47
       56        37   62      59     59     44      69     41      60     33
       Sophomores
         29      52      42    68     38     60     58      49   45     55
         30      60      52    37     42     49     43      36   38     52
         58      61      31    61     46     58     49      36   59     49
         48      47      37    43     31
   At the 0.01 level of significance, test the claim that the mean score for juniors on
   this test is greater than the mean score of sophomores on this test.
8. Do students do as well on the PSAT verbal test as they do on the PSAT math test?
   PSAT verbal scores were obtained for a random sample of 40 high school juniors
   and PSAT math scores were obtained for a different sample of 40 high school jun-
   iors, as follows.
       Verbal
       28        41   49      41     40     41      48     47      48     37
       56        41   36      49     40     47      44     72      42     50
       64        47   56      54     58     71      38     63      40     50
       42        53   68      43     38     39      30     70      46     60
       Math
       34        52   46      56     29     51      49     41      66     42
       50        65   50      48     48     45      48     44      67     65
       45        44   52      54     46     30      32     58      39     49
       52        72   46      59     29     39      35     53      51     41
                       At the 0.01 level of significance, test the claim that the mean score for the two
                       exams is the same.
                    9. It is a widely held belief that males have better mathematical skills than verbal
                       skills. Here are the SAT math scores for 36 randomly selected male students and
                       the SAT verbal scores for 30 randomly selected male students.
                            Math
                            370      670     460      530     690      640      740     540      610
                            470      610     660      410     450      720      740     430      480
                            460      520     580      580     560      510      320     640      650
                            580      540     580      440     310      260      420     320      410
                            Verbal
                            320      280     750      450     520      460      630     580      490
                            570      450     490      590     530      320      380     520      440
                            480      610     660      740     580      330      480     590      330
                            580      500     480
                       At the 0.05 level of significance, test the claim that the mean SAT math score for
                       males is greater than the mean SAT verbal score for males.
                   10. It is a widely held belief that females have better verbal skills than mathematical
                       skills. Here are the SAT math scores for 42 randomly selected female students and
                       the SAT verbal scores for 33 randomly selected female students.
                            Math
                            320      490     470      280     390      560      520     760      480
                            460      470     490      460     520      530      580     370      490
                            550      600     310      480     460      340      400     330      700
                            570      420     570      750     730      470      640     530      470
                            310      560     570      330     570      500
                            Verbal
                            280      250     510      680     570      360      480     540      600
                            350      590     390      580     540      510      550     320      510
                            290      550     550      350     450      530      630     630      630
                            490      570     690      620     500      380
                       At the 0.05 level of significance, test the claim that the mean SAT math score for
                       females is less than the mean SAT verbal score for females.
                   11. A random sample of 56 departing flights at an airport over a 3-month period had
                       a mean wait of 14.2 minutes between boarding and takeoff, with a standard devi-
                       ation of 4.53 minutes. At the same airport, a random sample of 81 incoming
                       flights over the same 3-month period had a mean wait of 17.5 minutes between
                       the time that the plane arrived at the gate and the time that the baggage reached
                       the baggage claim area, with a standard deviation of 9.87 minutes.
                       (a) At the 0.05 level of significance, test the claim that at this airport the mean
                           wait for takeoff is less than the mean wait for baggage.
                       (b) Explain how you would gather data to test the claim that the mean wait for
                           takeoff is less than the mean wait for baggage at U.S. airports.
                   12. A survey of 65 beer drinkers ages 21–29 had a mean of 22.7 servings of beer dur-
                       ing the last month, with a standard deviation of 8.49 servings. A survey of 107 beer
                       drinkers ages 30–39 had a mean of 19.8 servings of beer during the last month,
                       with a standard deviation of 6.21 servings. At the 0.05 level of significance, test the
                       claim that the mean monthly beer consumption of beer drinkers ages 21–29 is
                       greater than the mean monthly beer consumption of beer drinkers ages 30–39.
                       (Based on the results of a study by Maritz AmeriPoll.)
13. A professional bowler claims that his game is better suited to synthetic lanes than
    to natural lanes. Here are the scores of 42 randomly selected games on synthetic
    lanes and the scores of 42 randomly selected games on natural lanes.
        Synthetic
        256       269           279        245       290      214       210           235         238
        213       279           236        218       222      247       188           249         214
        196       222           237        177       246      227       279           267         236
        237       244           217        258       264      266       279           238         257
        238       224           226        267       256      228
        Natural
        248       221           246        255       289      244       223           222         255
        192       224           203        204       179      244       234           225         218
        201       219           214        233       204      212       217           207         203
        244       163           215        182       192      203       198           210         225
        207       236           224        264       228      164
    At the 0.05 level of significance, test the bowler’s claim.
14. A math instructor is not sure that collecting homework from his class on a daily
    basis is beneficial to his students. He conducts an experiment by collecting home-
    work from one algebra class of 37 students but not collecting homework from
    another algebra class of 32 students. Here are the test scores for both classes.
        Collected Homework
        53     88       68            64     70       67       57       88        83         79
        78     55       84            78     78       62       82       76        54        74
        85     76       74            71     84       87       75       73        63         64
        73     48       82            87     78       78       75
        National
        11     11       6             1     15       19        9       11       12          13
          6    27       6         12        11        6       14       13         6         6
                           10     13      9         16         3    14    9    22    11      9
                            6      4      6          9     13       13   11     3     8     12
                            3     13      9         14         5     5    6    13     7     12
                            3      7      9         22     10        3   10     6     6      6
                            7      6     11         19     13       10    5     4    10     14
                            4
                       At the 0.01 level of significance, test the claim that the mean number of runs
                       scored in American League games is greater than the mean number of runs scored
                       in National League games.
                   16. Do American League baseball games have more hits than National League base-
                       ball games? Here are the number of hits in 62 randomly selected American League
                       games and 71 randomly selected National League games.
                           American
                           13     16     23         23     23       16   14    14    23      9
                           32     27     17         12     14       27   31    24     9     17
                           11     19     18         26     13       28   14    18    18     17
                           14     21     15         27     16       13   24    19    22     16
                           20     20     17         14     18       22   20    18    20     18
                           14     18     21         31     28       33   27    20    16     19
                           17      7
                           National
                           19     20     14         12     25       27   16    17    23     19
                           20     32     15         23     21       17   23    17    13     20
                           17     21     20         18     15       20   23    30    19     20
                           13      9     13         16     19       25   21    14    11     18
                            8     19     19         24     10       19   12    16    16     22
                           14     20     16         26     21        9   16    14    16     13
                           18     12     19         27     24       16   14    18    20     19
                           12
                       At the 0.01 level of significance, test the claim that the mean number of hits in
                       American League games is greater than the mean number of hits in National
                       League games.
                   17. Do American League baseball games take longer than National League baseball
                       games? Here are the lengths, in minutes, of 62 randomly selected American
                       League games and 71 randomly selected National League games.
                           American
                           175     168        185        183       196   151   198    148        189
                           146     225        199        143       149   156   192    230        178
                           124     167        156        185       179   185   144    158        157
                           196     190        166        176       186   153   189    180        205
                           163     177        154        172       178   186   187    163        181
                           208     142        200        150       172   166   168    182        219
                           235     224        202        171       159   153   169    171
         National
         170       187       169   129       159   209      181     183      197
         170       157       214   154       166   191      178     198      178
         161       198       179   150       265   136      143     152      190
         213       158       176   146       172   137      200     158      181
         202       152       128   161       153   164      169     194      108
         146       157       167   169       190   160      177     171      198
         159       160       146   188       149   151      172     143      189
         204       215       181   141       152   176      180     158
    At the 0.05 level of significance, test the claim that the mean time to complete an
    American League game is longer than the mean time to complete a National
    League game.
18. Here are the prices for a gallon of unleaded regular gas at 48 randomly selected
    gas stations in the Los Angeles area and 39 randomly selected gas stations in the
    San Francisco Bay area from the week before Christmas in 1999.
         Los Angeles
         $ 0.949         $ 1.019   $ 0.989     $ 1.039    $ 0.969     $ 0.999
         $ 0.979         $ 1.019   $ 1.039     $ 1.009    $ 0.939     $ 0.989
         $ 1.059         $ 0.959   $ 1.049     $ 1.029    $ 1.039     $ 0.999
         $ 1.099         $ 1.019   $ 1.019     $ 1.029    $ 1.069     $ 0.979
         $ 1.039         $ 1.089   $ 1.049     $ 0.989    $ 1.039     $ 1.019
         $ 1.019         $ 0.999   $ 1.009     $ 0.949    $ 1.019     $ 1.069
         $ 1.069         $ 0.979   $ 0.999     $ 1.029    $ 1.079     $ 0.979
         $ 0.999         $ 1.039   $ 0.969     $ 1.059    $ 1.049     $ 0.979
         San Francisco
         $ 1.049         $ 1.169   $ 1.029     $ 1.199    $ 1.059     $ 1.099
         $ 1.139         $ 1.109   $ 1.129     $ 1.139    $ 1.139     $ 1.189
         $ 1.089         $ 1.079   $ 1.149     $ 1.149    $ 1.059     $ 1.179
         $ 1.169         $ 1.069   $ 1.139     $ 1.149    $ 1.109     $ 1.099
         $ 1.099         $ 1.199   $ 1.129     $ 1.039    $ 1.109     $ 1.089
         $ 1.159         $ 1.069   $ 1.089     $ 1.129    $ 1.239     $ 1.149
         $ 1.169         $ 1.119   $ 1.169
     At the 0.01 level, test the claim that the mean price for a gallon of unleaded reg-
     ular gas in the Los Angeles area is lower than the mean price for a gallon of
     unleaded regular gas in the San Francisco area.
19. A random sample of 140 women who first married in 1970 had a mean age of 20.8
    years, with a standard deviation of 3.1 years. A similar sample of 108 women who
    first married in 1998 had a mean age of 25.0 years with a standard deviation of
    4.1 years. At the 0.05 level of significance, test the claim that the mean age at
    which women get married has increased from 1970 to 1998. (Based on data from
    the U.S. Census Bureau.)
20. Do women have more doctor visits per year than men? A random sample of 111
    men produced a mean of 3.8 visits per year, with a standard deviation of 2.1 vis-
    its. A random sample of 153 women produced a mean of 5.8 visits per year, with
    a standard deviation of 1.8 visits. At the 0.05 level of significance, test the claim
    that the mean number of doctor visits per year is higher for women than it is for
    men. (Based on the results of the March 2000 National Vital Statistics Report.)
21. Do households that have cable TV have higher incomes? A sample of 420 cable
     households had a mean annual income of $51,468 with a standard deviation of
                        $43,329. A sample of 317 households that do not have cable TV had a mean
                        annual income of $36,735 with a standard deviation of $29,071. At the 0.05 level
                        of significance, test the claim that the mean annual income of households with
                        cable TV is higher than the mean annual income of households without cable TV.
                        (Based on a study by Cablevision Advertising Bureau.)
                    22. A random sample of 163 men between the ages of 20 and 29 had a mean weight
                        of 172.1 pounds with a standard deviation of 33.88 pounds. A random sample of
                        85 men between the ages of 50 and 59 had a mean weight of 189.2 pounds with
                        a standard deviation of 32.99 pounds. At the 0.05 level of significance, test the
                        claim that the mean weight of men who are between 20 and 29 years old is less
                        than the mean weight of men who are between 50 and 59 years old. (Based on the results
                        of a study by the Centers for Disease Control and Prevention.)
                    23. A child development student is interested in the amount of time that children
                        watch television. A study of 45 children ages 2–5 at the college’s day care center
                        revealed that they watched a mean of 135.4 minutes of television per day, with a standard
                        deviation of 82.1 minutes. A study of 120 elementary school children ages 6–12
                        revealed that they watched a mean of 170.6 minutes of television per day, with a
                        standard deviation of 106.9 minutes.
                        (a) Construct a 95% confidence interval for the difference between the mean
                            amount of television watched per day by 2- to 5-year-olds and the mean
                            amount of television watched per day by 6- to 12-year-olds.
                        (b) At the 0.05 level of significance, test the claim that the mean amount of tele-
                            vision watched per day by 2- to 5-year-olds is the same as the mean amount
                            of television watched per day by 6- to 12-year-olds.
                        (c) Does your hypothesis test from part (b) reinforce what your confidence inter-
                            val from part (a) told you? Explain.
                    24. A plant produces various types of corn chips. Among the varieties of chips that this
                        plant produces are blue corn tortilla chips and jalapeno tortilla chips, both of
                        which are sold in 9-ounce bags. A random sample of 50 bags of blue corn tortilla
                        chips had a mean weight of 9.06 ounces, with a standard deviation of 0.05 ounces.
                        A random sample of 50 bags of jalapeno tortilla chips had a mean weight of 9.01 ounces,
                        with a standard deviation of 0.03 ounces.
                        (a) Construct a 95% confidence interval for the difference between the mean fill
                            of blue corn tortilla chip bags and the mean fill of jalapeno tortilla chip bags.
                        (b) At the 0.05 level of significance, test the claim that the mean fill of blue corn
                            tortilla chip bags is the same as the mean fill of jalapeno tortilla chip bags.
                        (c) Does your hypothesis test from part (b) reinforce what your confidence inter-
                            val from part (a) told you? Explain.
MINI PROJECT
   Randomly sample at least 30 males and 30 females at your school, and ask
   them how many serious relationships they have had. Use these sample data to
   test the claim that college males and college females have had the same mean
   number of serious relationships at the 0.05 level of significance. In addition to
   your complete hypothesis test, include
SECTION 8.2
Hypothesis Test for Two Population Means (Small Samples)
In this section we introduce a hypothesis test for comparing the means of two
different populations when at least one of the samples has a size of less than 30.
The two samples must be independent, and both must be drawn from populations that
are normally distributed. If either sample is drawn from a population that is not
normally distributed, we cannot use the methods presented in this section. A
hypothesis test covered in Section 11.3 may be used in that case.
     Traditionally, there have been two different test statistics that could be used. The
one that we will use does not assume the two samples come from populations that
have equal variances. (There is a second test that does assume the two samples come
from populations with the same variance, but we will leave that test as optional mate-
rial in the “Extra” section that follows Section 9.1.)
     The test statistic that we will be using is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where the degrees of freedom are equal to the smaller of n1 – 1 and n2 – 1. There is
actually a formula that we could use to calculate the degrees of freedom, but our
choice of the smaller of these two values is conservative, making it harder to reject a
null hypothesis.
    We begin with a few examples of the hypothesis test using this test statistic, and
then we look at how to construct a confidence interval for the difference between two
population means.
                EXAMPLE 8.8               A female college student majoring in math has heard many times
                                          that males have higher SAT math scores than females. She believes
                                          that this is not true, but instead that the two groups have the same
                                          mean SAT math scores. She randomly samples 20 females at her
                                          school. Their mean score is 509.5, with a standard deviation of
                                          110.62. She also randomly samples 25 males at her school. Their
                                          mean score is 536.4, with a standard deviation of 117.79. Use these
                                          data to test the claim that females and males have the same mean
                                          SAT math score at the 0.05 level of significance.
                Since both samples have sizes of less than 30, the test statistic introduced in this
                section is the appropriate one. We begin by labeling the populations. We let females
                taking the SAT be population 1, and we let males taking the SAT be population 2.
                Step 1
                Population 1: females; population 2: males
                Claim in words: Females and males have the same mean SAT math score.
                Claim: μ1 = μ2
                Complement: μ1 ≠ μ2
                H0: μ1 = μ2
                HA: μ1 ≠ μ2
                Step 2
                Level of significance: α = 0.05
                Step 3
                Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$,  d.f. = smaller of $(n_1 - 1)$ and $(n_2 - 1)$
                Step 4
                α = 0.05, two-tailed test, with 20 – 1 = 19 degrees of freedom
                Decision rule: Reject H0 if t < –2.093 or if t > 2.093.
                Step 5
                $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{509.5 - 536.4}{\sqrt{\dfrac{110.62^2}{20} + \dfrac{117.79^2}{25}}} = -0.788$$
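
The arithmetic in Step 5 can be checked with a few lines of Python. This is only a sketch: the function name `two_sample_t` is our own, and the critical value 2.093 is copied from the decision rule in Step 4 rather than computed.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df): the unpooled two-sample t statistic and the
    conservative degrees of freedom (smaller of n1 - 1 and n2 - 1)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df

# Example 8.8: females are population 1, males are population 2.
t, df = two_sample_t(509.5, 110.62, 20, 536.4, 117.79, 25)

# Two-tailed decision rule from Step 4 (alpha = 0.05, 19 d.f.).
critical_value = 2.093
reject_h0 = abs(t) > critical_value

print(f"t = {t:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

The computed statistic agrees with the hand calculation up to rounding, and it falls between -2.093 and 2.093, so this sketch does not reject H0.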
     For the 1997–98 school year, the mean SAT math score for all males was 35 points
higher than the mean SAT math score for all females—so why weren’t we able to reject
the null hypothesis that the two mean scores were equal? One potential problem
could be the small sample sizes. There was a fairly wide gap between the two sample
means (26.9 points), but the small sample sizes kept the value of the test statistic
small. Similar results based on samples of 200 and 250, rather than 20 and 25, yield
a test statistic of approximately –2.5.
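
To see how the sample sizes drive this result, the sketch below recomputes the test statistic with the same sample means and standard deviations for the two pairs of sample sizes mentioned above. The helper name `t_statistic` and the dictionary `results` are our own choices for illustration.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample test statistic."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Same sample means and standard deviations as Example 8.8,
# first with the original sample sizes, then with sizes ten times as large.
results = {}
for n1, n2 in [(20, 25), (200, 250)]:
    results[(n1, n2)] = t_statistic(509.5, 110.62, n1, 536.4, 117.79, n2)
    print(f"n1 = {n1:3d}, n2 = {n2:3d}: t = {results[(n1, n2)]:.2f}")
```

Multiplying both sample sizes by 10 divides the standard error by the square root of 10 (about 3.16), so the same 26.9-point gap produces a test statistic about 3.16 times as far from zero.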
     Of course, bias could have affected the value of the test statistic as well. One pos-
sible source of bias may come from the students self-reporting their SAT scores. Do we
know that the score that they reported was their actual score? Are there any other
potential biases?
Since both samples have sizes of less than 30, the test statistic introduced in this
section is the appropriate one. We begin by labeling the populations. We let qualify-
ing games be population 1, and we let match play games be population 2.
Step 1
Population 1: Qualifying; population 2: match play
Claim in words: The mean score for qualifying games is higher than the mean
  score for match play games.
Claim: μ1 > μ2
Complement: μ1 ≤ μ2
H0: μ1 ≤ μ2
HA: μ1 > μ2
Step 2
Level of significance: α = 0.05
Step 3
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$,  d.f. = smaller of $(n_1 - 1)$ and $(n_2 - 1)$
Step 4
α = 0.05, right-tailed test, with 18 – 1 = 17 degrees of freedom
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{230.8 - 224.8}{\sqrt{\dfrac{33.82^2}{18} + \dfrac{27.76^2}{24}}} = 0.613$$
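
As a check on this right-tailed test, the following sketch recomputes the statistic from the summary values shown in the calculation. The critical value 1.740 is an assumption on our part: it is the standard t-table entry for a one-tailed area of 0.05 with 17 degrees of freedom, and it is not stated in the text above.

```python
import math

# Summary statistics from the calculation above
# (population 1: qualifying games, population 2: match play games).
mean1, sd1, n1 = 230.8, 33.82, 18
mean2, sd2, n2 = 224.8, 27.76, 24

t = (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
df = min(n1 - 1, n2 - 1)  # conservative degrees of freedom

# Assumed table value for a right-tailed test, alpha = 0.05, 17 d.f.
critical_value = 1.740
reject_h0 = t > critical_value

print(f"t = {t:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

Under that assumed critical value, the statistic is well short of the rejection region, so the sketch does not reject H0.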
                EXAMPLE 8.10            Many people believe that male racehorses run faster than female
                                        racehorses. To test this theory, the results of 55 races at Santa
                                        Anita Park were randomly selected, all at the distance of 1
                                        mile. Thirty-four of the races were for male racehorses, and 21 of
                                        the races were restricted to fillies and mares (female racehorses).
                                        The time required by the winners of the male races to complete the
                                        1-mile race had a mean of 96.9 seconds, with a standard deviation
                                        of 1.22 seconds. The time required by the winners of the races
                                        restricted to fillies and mares to complete the 1-mile race had a
                                        mean of 97.8 seconds, with a standard deviation of 1.55 seconds.
                                        At the 0.05 level of significance, test the claim that the mean time
                                        for a male racehorse to win a 1-mile race is less than the mean
                                        time for a female racehorse to win a 1-mile race.
                Even though one of the samples (male races) has a size that is at least 30, the
                small sample test is appropriate for this two-mean test because the other sample
                has a size that is less than 30. We begin by labeling the populations. We let male
                races be population 1, and we let female races be population 2.
                Step 1
                Population 1: male races; population 2: female races
                Claim in words: The mean time for a male racehorse to win a 1-mile race is less
                  than the mean time for a female racehorse to win a 1-mile race.
                Claim: μ1 < μ2
                Complement: μ1 ≥ μ2
                H0: μ1 ≥ μ2
                HA: μ1 < μ2
                Step 2
                Level of significance: α = 0.05
Step 3
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$,  d.f. = smaller of $(n_1 - 1)$ and $(n_2 - 1)$
Step 4
α = 0.05, left-tailed test, with 21 – 1 = 20 degrees of freedom
Decision rule: Reject H0 if t < –1.725.
Step 5
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{96.9 - 97.8}{\sqrt{\dfrac{1.22^2}{34} + \dfrac{1.55^2}{21}}} = -2.263$$
Using techniques developed in Section 7.2, the best that we can say is that the
p-value is between 0.025 and 0.01. A calculator or computer would tell us that the
actual p-value of this test is 0.0150. ■
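
The table-based bracket can be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule to approximate the left-tail area for t = -2.263 with the conservative 20 degrees of freedom. The function names and the choice of Simpson's rule are ours, not the text's.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """Approximate P(T <= t) for t <= 0 using Simpson's rule on [0, |t|].
    steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_b = total * h / 3
    # By symmetry, half of the total area lies to the left of 0.
    return 0.5 - area_zero_to_b

p_value = t_lower_tail(-2.263, 20)
print(f"left-tail p-value with 20 d.f.: {p_value:.4f}")
```

With 20 degrees of freedom the result lands between 0.01 and 0.025, matching the bracket from the table. It is somewhat larger than 0.0150, presumably because calculators and software use a formula-based number of degrees of freedom that is larger than the conservative choice.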
Confidence Intervals
Recall that the idea of hypothesis testing is based on confidence intervals. There is a
confidence interval that can be constructed for the difference between two population
means, when at least one of the samples is small. We begin, as we did before, by cal-
culating the margin of error. Here is the formula.
$$E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
(degrees of freedom: the smaller of $n_1 - 1$ and $n_2 - 1$)
We then subtract this margin of error from $\bar{x}_1 - \bar{x}_2$ to find the left endpoint of the
confidence interval, and we add the margin of error to $\bar{x}_1 - \bar{x}_2$ to find the right endpoint
of the confidence interval. (If you want to create a one-sided confidence interval, see
the “Extra” material at the end of the Section 6.2 exercises.)
                                     score on this exam for these 14 students was 80.9 points out of 100,
                                     with a standard deviation of 9.62 points. The mean score for the 10
                                     students who did not attend was 45.4 points out of 100, with a stan-
                                     dard deviation of 16.36 points. Construct a 95% confidence interval
                                     for the difference between the mean scores of algebra students who
                                     attend a review session and the mean scores of algebra students
                                     who do not attend a review session. (Assume that the students in
                                     this class represent a random sample of all algebra students.)
                We let population 1 be the algebra students who attend a review session, and pop-
                ulation 2 be the algebra students who do not attend a review session. We begin by
                finding t α/2. We have 9 degrees of freedom, since the smaller sample had size 10.
                For 9 degrees of freedom, t 0.025 = 2.262. Now we calculate the margin of error.
                $$E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 2.262\sqrt{\frac{9.62^2}{14} + \frac{16.36^2}{10}} = 13.07$$
                We are 95% confident that the mean score of algebra students who attend a
                review session is between 22.43 and 48.57 points higher than the mean score
                of algebra students who do not attend a review session. ■
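
The interval can be reproduced with a short Python sketch. The function name `confidence_interval` is our own, and the value of t_crit must still be looked up in a t table; here it is the 2.262 quoted above for 9 degrees of freedom.

```python
import math

def confidence_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the unpooled standard error.
    t_crit is the table value of t_{alpha/2} for the chosen degrees of freedom.
    Returns (lower endpoint, upper endpoint, margin of error)."""
    margin = t_crit * math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin, margin

# Population 1: 14 students who attended the review session.
# Population 2: 10 students who did not attend.
lower, upper, margin = confidence_interval(80.9, 9.62, 14, 45.4, 16.36, 10, 2.262)

print(f"E = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Because the whole interval lies above 0, a two-tailed test of equal means at the 0.05 level would reject H0 for these data.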
                        Monday
                        731              542          559         592    548      629    597       692
                        542              543          530         564    560      620    564       714
                        610              690          610         592    774      689    531
                        Friday
                        689              669          613         569    579      635    622       557
                        558              759          725         736    637      785    683       602
                    At the 0.05 level of significance, test the claim that the mean vol-
                    ume on Mondays is the same as the mean volume on Fridays.
In a new Excel worksheet, type Monday in cell A1 and Friday in B1. Enter the vol-
umes for Mondays in column A, from cell A2 through A24. Enter the volumes for
Fridays in column B, from cells B2 through B17.
Select Data Analysis from the Tools menu. When the dialog box appears, select
t-Test: Two-Sample Assuming Unequal Variances and click on OK. In the
box labeled Variable 1 Range, enter A1:A24. In the box labeled Variable 2
Range, enter B1:B17. These two cell ranges include the labels Monday and Friday,
so check the box labeled Labels. In the box labeled Hypothesized Mean
Difference, enter 0. Put the level of significance, 0.05, in the box labeled Alpha.
Click on OK to see the results, which will be on a new worksheet. Here is an exam-
ple of what you should see.
[Excel output table: t-Test: Two-Sample Assuming Unequal Variances, with columns Monday and Friday]
We see the mean, variance, and size of each sample. The box labeled df contains
the degrees of freedom. Excel uses the actual formula for degrees of freedom,
rather than our conservative estimate. Here it is.
$$\text{d.f.} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
(Recall that we have been using 1 less than the smaller sample size as the number
of degrees of freedom, which in this case is 15.) Here is the calculation that pro-
duces 32 degrees of freedom.
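
The same degrees of freedom can be obtained outside Excel. The following Python sketch applies the formula to the Monday and Friday volumes listed earlier; the function name `formula_df` is our own, and `statistics.variance` computes the sample variance (dividing by n - 1).

```python
import statistics

monday = [731, 542, 559, 592, 548, 629, 597, 692,
          542, 543, 530, 564, 560, 620, 564, 714,
          610, 690, 610, 592, 774, 689, 531]
friday = [689, 669, 613, 569, 579, 635, 622, 557,
          558, 759, 725, 736, 637, 785, 683, 602]

def formula_df(sample1, sample2):
    """Degrees of freedom from the formula above (unequal variances)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df_formula = formula_df(monday, friday)
df_conservative = min(len(monday), len(friday)) - 1

print(f"formula d.f. = {df_formula:.2f}, conservative d.f. = {df_conservative}")
```

The formula gives a value close to 32, roughly twice the conservative 15, which illustrates how much harder the conservative choice makes it to reject a null hypothesis.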