Solutions to the exercises of Chapter 6
Exercise 1
Note on the representativeness of the sample: if a sample is drawn at random from a population, each subject
in the population is equally likely to be chosen. If subpopulations can be identified, it is preferable that
they are represented proportionally in the sample, but random sampling cannot guarantee this in advance. In
this case, if we sample randomly, we expect about 27% of the sampled subjects to come from subpopulation 1,
18% from subpopulation 2, etc.
With the Chi-square test we can verify whether the observed numbers for the subpopulations deviate
significantly from the expected numbers. We define the seven numbers $N_1, N_2, \dots, N_7$ as follows: $N_1$ is
the number of draws from subpopulation 1, $N_2$ is the number of draws from subpopulation 2, etc.
Pearson’s $\chi^2$-test in eight steps:
1. The numbers $N_1, N_2, \dots, N_7$ have a multinomial distribution with $n = 150$ and probabilities
   $p_1, p_2, \dots, p_7$.
2. Test $H_0$: $p_1 = 0.27$, $p_2 = 0.18$, $p_3 = 0.15$, $p_4 = 0.14$, $p_5 = 0.10$, $p_6 = 0.09$ and $p_7 = 0.07$ against
   $H_1$: “at least one of the $p_i$ deviates from its value under $H_0$”, with $\alpha = 5\%$.
3. Test statistic: $\chi^2 = \sum_{i=1}^{7} \frac{(N_i - E_0 N_i)^2}{E_0 N_i}$, with expected numbers $E_0 N_i = n p_{i0} = 150 p_{i0}$ under $H_0$.
4. If $H_0$ is true, $\chi^2$ has (approximately) a Chi-square distribution with $df = 7 - 1 = 6$ degrees of freedom.
5. Observed value of $\chi^2$:
   $\chi^2 = \frac{(43-40.5)^2}{40.5} + \frac{(27-27.0)^2}{27.0} + \frac{(31-22.5)^2}{22.5} + \frac{(20-21.0)^2}{21.0} + \frac{(11-15.0)^2}{15.0} + \frac{(10-13.5)^2}{13.5} + \frac{(8-10.5)^2}{10.5} \approx 5.982$
6. Reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.05$ the $\chi^2_6$-table gives $c = 12.59$.
7. The observed value 5.982 does not lie in the Rejection Region, so do not reject 𝐻0 .
8. At a 5% significance level we could not prove that the sample is not representative.
   (In practice this means that assuming representativeness is reasonable, because it is not rejected. Note
    that we did not “prove” representativeness; we only failed to prove the opposite.)
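The arithmetic in steps 3 to 7 can be checked with a few lines of Python. This is a minimal sketch and not part of the original solution: the variable names are illustrative, and the critical value 12.59 is copied from the Chi-square table rather than computed.

```python
# Observed counts and hypothesised proportions from Exercise 1.
observed = [43, 27, 31, 20, 11, 10, 8]
p0 = [0.27, 0.18, 0.15, 0.14, 0.10, 0.09, 0.07]

n = sum(observed)                    # total sample size
expected = [n * p for p in p0]       # E0 N_i = n * p_i0 under H0

# Pearson's statistic: sum over categories of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 12.59               # table value for df = 6, alpha = 0.05 (assumed input)
reject_h0 = chi2 >= critical_value

print(f"n = {n}, chi2 = {chi2:.3f}, reject H0: {reject_h0}")
```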
Exercise 2
The mentioned 1 × 2 table is:
   Decision by     mother   daughter   total
   number            243       157      400
The observed numbers $N_1 = 243$ and $N_2 = 157$ should be compared to the expected values under $H_0$:
   $E_0 N_1 = E_0 N_2 = 400 \cdot \frac{1}{2} = 200$, so
   $\chi^2 = \sum_{i=1}^{2} \frac{(N_i - E_0 N_i)^2}{E_0 N_i} = \frac{(243-200)^2}{200} + \frac{(157-200)^2}{200} = 18.49$
The test is upper-tailed with a critical value $c$ taken from the Chi-square table with $df = 2 - 1 = 1$:
   $c = 3.84$
Since the observed value of $\chi^2$ falls in the Rejection Region ($\chi^2 \ge 3.84$), we reject $H_0$.
Indeed, $Z^2 = \chi^2$, since $4.3^2 = 18.49$, but the squared one-sided critical value does not equal $c$:
$1.645^2 \approx 2.71 \ne 3.84$.
This is a consequence of the binomial test on $p$ being upper-tailed: $\chi^2$ “does not distinguish negative and
positive differences between the observed and expected numbers”.
Conclusion: for a two-sided test the binomial test on $p$ and the $\chi^2$-test are equivalent; for a one-sided test
on $p$ the binomial test should be preferred.
Note on the use of SPSS: SPSS reports as the p-value the “Asymptotic Significance (2-sided)”, although the
Chi-square tests in this chapter are all right-sided. SPSS (misleadingly) calls this p-value 2-sided
because it corresponds to the two-sided binomial z-test.
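The relation between the two tests can be verified numerically. The sketch below assumes that the z-statistic is the usual normal approximation of the binomial test without continuity correction, which reproduces the value 4.3 used above; the variable names are illustrative.

```python
import math

n = 400
n_mother, n_daughter = 243, 157
p0 = 0.5                                   # H0: both decisions equally likely

# Pearson's statistic for two categories with equal expected numbers.
expected = n * p0
chi2 = ((n_mother - expected) ** 2 + (n_daughter - expected) ** 2) / expected

# z-statistic of the binomial test on p (normal approximation, assumed form).
z = (n_mother - n * p0) / math.sqrt(n * p0 * (1 - p0))

print(f"z = {z:.2f}, z^2 = {z ** 2:.2f}, chi2 = {chi2:.2f}")
print(f"1.645^2 = {1.645 ** 2:.2f} (one-sided), 1.96^2 = {1.96 ** 2:.2f} (two-sided)")
```

Only the two-sided bound 1.96 squares to the Chi-square critical value 3.84, which is why the equivalence holds for the two-sided test only.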
Exercise 3
To determine 4 categories (intervals) that each have probability 0.25 under the standard normal distribution,
we use the standard normal table (see the graph).
We are looking for a value $c$ such that $\Phi(c) = 0.75$, so
$c = \Phi^{-1}(0.75) \approx 0.67$.
Interval 1: from $-\infty$ to -0.67
Interval 2: from -0.67 to 0
Interval 3: from 0 to 0.67
Interval 4: from 0.67 to $\infty$
The observed numbers $n_1, n_2, n_3$ and $n_4$ for these 4 intervals are 4, 5, 6 and 5 ($n = 20$ in total), whereas the
expected numbers are $E_0 N_i = n \times 0.25 = 5$ (which just satisfies the condition $E_0 N_i \ge 5$).
(Pearson’s) Chi-square test:
1. The numbers $N_1, N_2, N_3$ and $N_4$ are multinomially distributed with total $n = 20$ and unknown
   probabilities $p_1, p_2, p_3$ and $p_4$, respectively.
2. We test $H_0$: $p_1 = p_2 = p_3 = p_4 = 0.25$ against $H_1$: $p_i \ne 0.25$ for at least one $i$, with $\alpha = 0.05$.
3. Test statistic: $\chi^2 = \sum_{i=1}^{4} \frac{(N_i - E_0 N_i)^2}{E_0 N_i}$, with $E_0 N_i = 5$.
4. Under $H_0$, $\chi^2$ has (approximately) a Chi-square distribution with $df = k - 1 = 3$.
5. Observed value: $\chi^2 = \frac{(4-5)^2}{5} + \frac{(5-5)^2}{5} + \frac{(6-5)^2}{5} + \frac{(5-5)^2}{5} = \frac{1}{5} + \frac{1}{5} = 0.4$
6. We reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.05$ the $\chi^2_3$-table gives $c = 7.81$.
7. The observed value 0.4 does not lie in the Rejection Region, so we do not reject $H_0$.
8. At a 5% significance level we could not prove that the data are not drawn from the standard normal
   distribution. (Note that this does not prove that the distribution is standard normal.)
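The interval boundaries and the observed statistic can be reproduced with Python's standard library. This is a small sketch under the assumption that only the four interval counts are available (the raw observations are not repeated here); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal: mean 0, standard deviation 1
c = std_normal.inv_cdf(0.75)         # value with Phi(c) = 0.75, the table gives 0.67
boundaries = [-c, 0.0, c]            # split the real line into 4 intervals of probability 0.25

observed = [4, 5, 6, 5]              # counts per interval, taken from the solution
n = sum(observed)
expected = n * 0.25                  # the same expected number in every interval

chi2 = sum((o - expected) ** 2 / expected for o in observed)

print(f"c = {c:.4f}, n = {n}, expected = {expected}, chi2 = {chi2:.1f}")
```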
Exercise 4
We code the exam result as 1 if it is “low”, 2 if it is “medium” and 3 if it is “high”.
The text provides the following information, presented in a cross table:
                                          Exam result
                               1             2             3          Row total
   Level of education 1    11 (10.2)     20 (17.5)      4 (7.3)          35
   Level of education 2    15 (13.1)     18 (22.5)     12 (9.4)          45
   Level of education 3     9 (11.7)     22 (20)        9 (8.3)          40
   Column total               35            60            25           120 = n
The numbers in brackets are the estimates of the expected numbers that we will use later. They are
determined with the formula $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$, e.g. for cell (1,1):
$\hat{E}_0 N_{11} = \frac{35 \times 35}{120} \approx 10.2$.
Define: $N_{ij}$ = “number of employees with educational level $i$ and exam result $j$” ($i, j = 1, 2, 3$),
        $p_{ij}$ = “probability that an arbitrary person in the population has educational level $i$ and exam result $j$”.
We apply the test on independence, since we have only one sample in which two variables (educational
level and exam result) are scored.
1. Model: the numbers $N_{11}, N_{12}, \dots, N_{33}$ have a multinomial distribution with total $n = 120$ and
   probabilities $p_{ij}$.
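2. Test $H_0$: $p_{ij} = p_{i\cdot} p_{\cdot j}$ for all pairs $(i, j)$ against $H_1$: $p_{ij} \ne p_{i\cdot} p_{\cdot j}$ for at least one pair $(i, j)$,
   with $\alpha = 5\%$.
        (remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column probabilities belonging to the cell $(i, j)$,
         for example, $p_{1\cdot} = p_{11} + p_{12} + p_{13}$)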
3. Test statistic: $\chi^2 = \sum_{j=1}^{3} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0 N_{ij})^2}{\hat{E}_0 N_{ij}}$, with $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$.
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 4$ degrees of freedom.
5. For $\hat{E}_0 N_{ij}$ see the table above, so $\chi^2 = \frac{(11-10.2)^2}{10.2} + \frac{(20-17.5)^2}{17.5} + \dots + \frac{(9-8.3)^2}{8.3} = 4.69$.
6. We reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.05$ the $\chi^2_4$-table gives $c = 9.49$.
7. The outcome 4.69 is not in the Rejection Region, so we do not reject $H_0$.
8. At a 5% significance level we could not prove that the exam results depend on the educational level.
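The expected numbers and the statistic of a cross table follow mechanically from the row and column totals. The function below is an illustrative sketch (its name and return values are not taken from the text). Because it uses unrounded expected numbers, it gives a statistic of about 4.67 instead of the 4.69 obtained with the one-decimal values in the table; the conclusion is the same.

```python
def chi_square_independence(table):
    """Return (chi2, expected, df) for a cross table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Estimated expected number per cell: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, expected, df


# Rows: level of education 1, 2, 3; columns: exam result 1, 2, 3.
observed = [[11, 20, 4], [15, 18, 12], [9, 22, 9]]
chi2, expected, df = chi_square_independence(observed)

critical_value = 9.49    # table value for df = 4, alpha = 0.05 (assumed input)
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= critical_value}")
```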
Exercise 5
We start by producing the table with totals, with the estimates $E_{ij} = \hat{E}_0 N_{ij}$ in brackets in each cell:
 Type of company           No reply              Reply                   Row total
 Small                    102 (117.3)           98 (82.7)                  200
 Medium                   121 (117.3)           79 (82.7)                  200
 Large                    129 (117.3)           71 (82.7)                  200
 Column total                352                  248                      600
We have the results of three samples in this case. We investigate whether the three populations of
companies differ in replying to the questionnaire: a test on the homogeneity of the “Reply” variable.
(Compared to the test on independence, this only affects the first two steps of the testing procedure.)
Let $N_{11}$ and $N_{12}$ be the numbers of small companies that gave “No reply” and “Reply”, respectively. We could
state that $N_{11}$ and $N_{12}$ are multinomially distributed, but since we have only two categories we prefer to
state that $N_{11}$ is binomially distributed with $n = 200$ and success probability $p_{11}$ (implying that
$N_{12} \sim B(200, 1 - p_{11})$, with $p_{12} = 1 - p_{11}$). Instead of $p_{i1}$ and $p_{i2}$ we might use the symbols $p_i$ and $1 - p_i$.
Similarly, $N_{21}$ and $N_{22}$ are introduced for the medium companies and $N_{31}$ and $N_{32}$ for the large companies.
The 8 steps of the test on homogeneity (3 samples, all with $n = 200$) are in this case:
1. Model: the numbers $N_{11}$, $N_{21}$ and $N_{31}$ are independent and binomially distributed with $n = 200$ and
   success probabilities $p_{i1}$.
2. Test $H_0$: $p_{11} = p_{21} = p_{31}$ against $H_1$: “$p_{11} \ne p_{21}$ or $p_{21} \ne p_{31}$”, with $\alpha = 1\%$.
3. Test statistic: $\chi^2 = \sum_{j=1}^{2} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0 N_{ij})^2}{\hat{E}_0 N_{ij}}$, with estimates $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$.
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 2$.
5. Observed value: $\chi^2 = \frac{(102-117.3)^2}{117.3} + \dots + \frac{(71-82.7)^2}{82.7} \approx 7.93$ (for $E_{ij}$ and $N_{ij}$ see the table above).
6. We reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.01$ the $\chi^2_2$-table gives $c = 9.21$.
7. The observed value 7.93 does not fall in the Rejection Region, so we do not reject $H_0$.
8. We cannot state, at a 1% level of significance, that the three types of companies responded differently.
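As a cross-check of steps 6 and 7: for $df = 2$ the Chi-square distribution coincides with the exponential distribution with mean 2, so the right-tail probability is $e^{-x/2}$ and the critical value can be computed without a table. The sketch below uses this closed form, which is specific to $df = 2$; the variable names are illustrative and the observed value 7.93 is taken from step 5.

```python
import math

chi2_observed = 7.93      # observed value from step 5
alpha = 0.01

# For df = 2 only: P(chi2 >= x) = exp(-x / 2).
p_value = math.exp(-chi2_observed / 2)

# Solving exp(-c / 2) = alpha for c gives the critical value.
critical_value = -2 * math.log(alpha)

print(f"critical value = {critical_value:.2f}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

The p-value of roughly 1.9% lies between 1% and 5%, so the decision in this exercise depends on the chosen significance level.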
Exercise 6
a. We consider the observations to be drawn as one random sample from a population of PhD students,
   where for each PhD student the variables “Gender” and “Promotion in 6 years” are scored: a test on
   independence of these variables can be applied.
b. The observed numbers $N_{ij}$ and the expected numbers $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$ are the following:
                                                            Gender
   Promotion in 6 years   Female                                       Male                                         Total
   Yes                    $N_{11} = 98$,  $\hat{E}_0 N_{11} = 116.5$   $N_{12} = 423$, $\hat{E}_0 N_{12} = 404.5$     521
   No                     $N_{21} = 131$, $\hat{E}_0 N_{21} = 112.5$   $N_{22} = 372$, $\hat{E}_0 N_{22} = 390.5$     503
   Total                  229                                          795                                          1024 = n
   1. Model: the numbers $N_{11}, N_{12}, N_{21}$ and $N_{22}$ are multinomially distributed with total $n = 1024$ and
      (unknown) probabilities $p_{11}, p_{12}, p_{21}, p_{22}$.
   2. Test $H_0$: $p_{ij} = p_{i\cdot} p_{\cdot j}$ for all pairs $(i, j)$ versus $H_1$: $p_{ij} \ne p_{i\cdot} p_{\cdot j}$ for at least one pair $(i, j)$,
      with $\alpha = 5\%$.
         (remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column probabilities belonging to the cell $(i, j)$, e.g. $p_{1\cdot} = p_{11} + p_{12}$)
   3. Test statistic: $\chi^2 = \sum_{j=1}^{2} \sum_{i=1}^{2} \frac{(N_{ij} - \hat{E}_0 N_{ij})^2}{\hat{E}_0 N_{ij}}$, with estimates $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$.
   4. Under $H_0$, $\chi^2$ has a Chi-square distribution with number of degrees of freedom
      $df = (r - 1)(c - 1) = 1$.
   5. Observed value: $\chi^2 = \frac{(98-116.5)^2}{116.5} + \dots + \frac{(372-390.5)^2}{390.5} \approx 7.703$ ($\hat{E}_0 N_{ij}$ is in the table above).
   6. We reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.05$ the $\chi^2_1$-table gives $c = 3.84$.
   7. The observed value 7.703 lies in the Rejection Region, so we reject $H_0$.
   8. At a 5% level of significance we showed that the probability of a promotion within 6 years depends
      on the gender.
c. The z-score in exercise 3 of chapter 5 was $Z = -2.77$, so $Z^2 \approx 7.67$, which agrees with $\chi^2 \approx 7.70$ (the
   small difference is due to rounding).
   The test was: reject $H_0$ if $|Z| \ge 1.96 \Leftrightarrow Z^2 \ge 1.96^2 \approx 3.84$, which matches the Chi-square test $\chi^2 \ge 3.84$.
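The agreement in part c is exact when no intermediate rounding is applied. The sketch below assumes that the test in chapter 5 was the two-sample z-test for proportions with a pooled estimate of $p$ (an assumption, since that exercise is not repeated here); the variable names are illustrative.

```python
import math

# Observed 2 x 2 table: rows = promotion yes / no, columns = female / male.
table = [[98, 423], [131, 372]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson's chi-square statistic with unrounded expected numbers.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Pooled two-sample z-statistic for the proportions of promoted women and men.
n_female, n_male = col_totals
p_female = table[0][0] / n_female
p_male = table[0][1] / n_male
p_pooled = row_totals[0] / n
z = (p_female - p_male) / math.sqrt(
    p_pooled * (1 - p_pooled) * (1 / n_female + 1 / n_male)
)

print(f"z = {z:.3f}, z^2 = {z ** 2:.3f}, chi2 = {chi2:.3f}")
```

Without rounding both statistics are about 7.71, slightly above the hand-computed 7.67 and 7.70, which were obtained from rounded intermediate values.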
Exercise 7
a. We apply the Chi-square test on independence (we have one sample and 2 variables in this case).
   We add the row and column totals and, in brackets, the expected numbers (assuming independence).

                              Very high     High           Perhaps        Not considerably   Not at all     Row total
   > 3 times per week         73 (67.5)     140 (141.4)    223 (227.6)    185 (192.4)        132 (124.2)       753
   1 – 3 times per week       19 (17.7)      39 (37.0)      56 (59.5)      54 (50.3)          29 (32.5)        197
   3 times a month or less     2 (8.9)       18 (18.6)      38 (29.9)      29 (25.3)          12 (16.3)         99
   Column total                  94            197            317            268                173            1049
   1. Model: the numbers $N_{11}, N_{12}, \dots, N_{35}$ have a multinomial distribution with total $n = 1049$ and
      probabilities $p_{11}, p_{12}, \dots, p_{35}$.
   2. Test $H_0$: $p_{ij} = p_{i\cdot} p_{\cdot j}$ for all pairs $(i, j)$ against $H_1$: $p_{ij} \ne p_{i\cdot} p_{\cdot j}$ for at least one pair $(i, j)$,
      with $\alpha = 5\%$.
         (remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column probabilities, e.g. $p_{1\cdot} = p_{11} + \dots + p_{15}$)
   3. Test statistic: $\chi^2 = \sum_{j=1}^{5} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0 N_{ij})^2}{\hat{E}_0 N_{ij}}$, with estimates $\hat{E}_0 N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$.
   4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 8$.
   5. Observed value (using the values in the table above): $\chi^2 = \frac{(73-67.5)^2}{67.5} + \dots + \frac{(12-16.3)^2}{16.3} \approx 11.63$.
   6. We reject $H_0$ if $\chi^2 \ge c$; with $\alpha = 0.05$ the $\chi^2_8$-table gives $c = 15.51$.
   7. The outcome 11.63 is not in the Rejection Region (11.63 < 15.51), so we do not reject $H_0$.
   8. At a 5% significance level we could not prove that the need for automatic adjustment of the speed
      depends on the frequency of the car use.
b. The observed number $n_{31} = 2$ is small, but the condition concerns the expected numbers: the estimated
   expected value $\hat{E}_0 N_{31} = 8.9 \ge 5$ is large enough.
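The point of part b, that the rule of thumb applies to the expected and not to the observed numbers, can be checked for all cells at once. The sketch below is illustrative only. It takes the observed number in the third row, fourth column to be 29, the value implied by the printed row total 99 and column total 268, and verifies that the table then reproduces all printed totals.

```python
# Rows: > 3 times per week, 1 - 3 times per week, 3 times a month or less.
observed = [
    [73, 140, 223, 185, 132],
    [19, 39, 56, 54, 29],
    [2, 18, 38, 29, 12],     # 29 is assumed from the printed totals
]
printed_row_totals = [753, 197, 99]
printed_col_totals = [94, 197, 317, 268, 173]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
totals_consistent = (
    row_totals == printed_row_totals and col_totals == printed_col_totals
)

# Estimated expected numbers under independence and the smallest of them.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)
rule_of_thumb_ok = min_expected >= 5

print(f"totals consistent: {totals_consistent}, n = {n}")
print(f"smallest expected number: {min_expected:.1f}, rule of thumb satisfied: {rule_of_thumb_ok}")
```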
Exercise 8
The reasoning is as follows: if the options are equally attractive, the 9 + 4 = 13 persons who follow
are divided at random over the two groups of 10 and 9 persons in the two tests. We will reject “arbitrary
division” in favour of “option 1 is more attractive” if the probability of 9 or even 10 followers among the
option 1 test persons is small.
Diagram:
                Follow    Not    Total
   Persons        13       6      19
                   ↓       ↓       ↓
   Option 1        9       1      10
If $X$ = “number of followers among the 10 persons in the option 1 test”, then (see the diagram):
   p-value $= P(X \ge 9) = P(X = 9) + P(X = 10) = \frac{\binom{13}{9}\binom{6}{1}}{\binom{19}{10}} + \frac{\binom{13}{10}\binom{6}{0}}{\binom{19}{10}} \approx 4.64\% + 0.31\% = 4.95\%$
Fisher’s exact test for a 2×2 cross table with small numbers:
1. Model: the numbers 𝑁𝑖𝑗 have a multinomial distribution with total 𝑛 = 19 and success rates 𝑝𝑖𝑗
2. Test 𝐻0 : 𝑝𝑖𝑗 = 𝑝𝑖∙ 𝑝∙𝑗 (independence) against
        𝐻1 : 𝑝𝑖𝑗 ≠ 𝑝𝑖∙ 𝑝∙𝑗 , for at least one pair (i, j), with α = 5%.
3. Test statistic: 𝑋 = “the number of followers among the 10 persons in the option 1 test”
4. Under $H_0$, $X$ has a hypergeometric distribution (see the diagram for the parameters).
5. Observed: 𝑋 = 9
6. We will reject $H_0$ if the p-value $\le \alpha = 0.05$.
   p-value $= P(X \ge 9 \mid H_0) = P(X = 9) + P(X = 10) = \frac{\binom{13}{9}\binom{6}{1}}{\binom{19}{10}} + \frac{\binom{13}{10}\binom{6}{0}}{\binom{19}{10}} \approx 4.64\% + 0.31\% = 4.95\%$
7. p-value < 𝛼 = 5%, so the null hypothesis is rejected.
8. At a 5% level of significance the hypothesis “equal effect in options 1 and 2” is rejected in favour of the
   alternative “option 1 is more attractive than option 2”.
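The hypergeometric probabilities in step 6 can be evaluated exactly with binomial coefficients. This is a minimal sketch; the function name and its parameters are illustrative and not taken from the text.

```python
from math import comb


def hypergeom_pmf(k, followers, non_followers, group_size):
    """P(X = k): k followers in a random group of group_size persons."""
    total = followers + non_followers
    return (
        comb(followers, k)
        * comb(non_followers, group_size - k)
        / comb(total, group_size)
    )


# 13 followers and 6 non-followers in total; 10 persons took the option 1 test.
p_9 = hypergeom_pmf(9, 13, 6, 10)
p_10 = hypergeom_pmf(10, 13, 6, 10)
p_value = p_9 + p_10                 # upper tail P(X >= 9)

print(f"P(X = 9) = {p_9:.4f}, P(X = 10) = {p_10:.4f}, p-value = {p_value:.4f}")
```

The p-value lies only just below 5%, so the rejection in step 7 is a borderline decision.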