
Solutions Exercises Chapter 6 Exercise 1: Asymptotic Significance (2-Sided)

The document provides details and steps for performing chi-squared tests on sample data to test hypotheses about population distributions. It includes 8 exercises that apply the chi-squared test to different sample data scenarios, such as comparing observed and expected proportions, testing independence of variables, and testing homogeneity across multiple samples; the last exercise uses Fisher's exact test. The exercises show how to calculate the test statistic, determine the degrees of freedom, identify the critical value, and decide whether or not to reject the null hypothesis based on the value of the test statistic.

Uploaded by DanyValentin
Copyright © All Rights Reserved

Solutions exercises Chapter 6

Exercise 1
Note on the representativeness of the sample: if a sample is drawn at random from a population, each subject in the population is equally likely to be chosen. If subpopulations can be identified, it is preferable that they are represented proportionally in the sample, but this cannot be guaranteed in advance. In this case, if we sample randomly, we expect about 27% of the sampled subjects to come from subpopulation 1, 18% from subpopulation 2, etc.
Applying the Chi-square test, we can check whether the observed numbers for the subpopulations deviate significantly from the expected numbers. We define seven random variables $N_1, N_2, \dots, N_7$: $N_1$ is the number of draws from subpopulation 1, $N_2$ is the number of draws from subpopulation 2, etc.
Pearson’s $\chi^2$-test in eight steps:
1. The numbers $N_1, N_2, \dots, N_7$ have a multinomial distribution with $n = 150$ and probabilities $p_1, p_2, \dots, p_7$.
2. Test $H_0: p_1 = 0.27, p_2 = 0.18, p_3 = 0.15, p_4 = 0.14, p_5 = 0.10, p_6 = 0.09$ and $p_7 = 0.07$ against $H_1$: “at least one of the $p_i$ deviates from its value under $H_0$”, with $\alpha = 5\%$.
3. Test statistic: $\chi^2 = \sum_{i=1}^{7} \frac{(N_i - E_0N_i)^2}{E_0N_i}$, with expected numbers $E_0N_i = np_{i0} = 150\,p_{i0}$ under $H_0$.
4. If $H_0$ is true, $\chi^2$ has a Chi-square distribution with $df = 7 - 1 = 6$ degrees of freedom.
5. Observed value of $\chi^2$:
$$\chi^2 = \frac{(43-40.5)^2}{40.5} + \frac{(27-27.0)^2}{27.0} + \frac{(31-22.5)^2}{22.5} + \frac{(20-21.0)^2}{21.0} + \frac{(11-15.0)^2}{15.0} + \frac{(10-13.5)^2}{13.5} + \frac{(8-10.5)^2}{10.5} = 5.982$$
6. Reject $H_0$ if $\chi^2 \geq c$: with $\alpha = 0.05$ the $\chi^2_6$-table gives $c = 12.59$.
7. The observed value 5.982 does not lie in the Rejection Region, so do not reject $H_0$.
8. At a 5% significance level we did not prove that the sample is not representative.
(In practice this means that assuming representativeness is reasonable, since it is not rejected. Note that we did not “prove” representativeness; we only could not prove that the sample is not representative.)
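Steps 3 to 7 can be checked numerically. Below is a minimal Python sketch using only the standard library; the function name `chi_square_gof` is our own choice, and the critical value 12.59 is copied from the $\chi^2_6$-table rather than computed.

```python
def chi_square_gof(observed, p0):
    """Pearson's chi-square goodness-of-fit statistic.

    Returns (chi2, expected), where expected[i] = n * p0[i] under H0.
    """
    n = sum(observed)
    expected = [n * p for p in p0]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Exercise 1: observed draws per subpopulation and the proportions under H0
observed = [43, 27, 31, 20, 11, 10, 8]
p0 = [0.27, 0.18, 0.15, 0.14, 0.10, 0.09, 0.07]

chi2, expected = chi_square_gof(observed, p0)
df = len(observed) - 1   # k - 1 = 6
c = 12.59                # critical value from the chi-square table, df = 6, alpha = 0.05

print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 >= c}")
```

Running it prints `chi2 = 5.982, df = 6, reject H0: False`, in line with steps 5 to 7.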

Exercise 2
The mentioned 1 × 2 table is:

| Decision by | mother | daughter | total |
|---|---|---|---|
| number | 243 | 157 | 400 |
- The observed numbers $N_1 = 243$ and $N_2 = 157$ should be compared to the expected values under $H_0$: $E_0N_1 = E_0N_2 = 400 \cdot \frac{1}{2} = 200$, so
$$\chi^2 = \sum_{i=1}^{2} \frac{(N_i - E_0N_i)^2}{E_0N_i} = \frac{(243-200)^2}{200} + \frac{(157-200)^2}{200} = 18.49$$
- The test is upper-tailed with a critical value $c$ taken from the Chi-square table with $df = 2 - 1 = 1$: $c = 3.84$.
- Since the observed value of $\chi^2$ falls in the Rejection Region ($\chi^2 \geq 3.84$), we reject $H_0$.
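Indeed, $Z^2 = \chi^2$: the binomial test on $p$ has $Z = \frac{243 - 200}{\sqrt{400 \cdot \frac{1}{2} \cdot \frac{1}{2}}} = 4.3$, and $4.3^2 = 18.49$. The critical values, however, do not correspond: $1.645^2 \approx 2.71 \neq 3.84 = c$. This is a consequence of the binomial test on $p$ being upper-tailed, whereas $\chi^2$ “does not distinguish negative and positive differences of the observed and expected numbers”: $c = 3.84 = 1.96^2$ corresponds to the two-sided $z$-test.
Conclusion: for a two-sided test, the binomial test on $p$ and the $\chi^2$-test are equivalent; for a one-sided test on $p$, the binomial test should be preferred.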
Note on the use of SPSS: SPSS reports the p-value as the “Asymptotic Significance (2-sided)”, although we know that the Chi-square tests in this chapter are all right-sided. SPSS (falsely) calls this p-value 2-sided because it coincides with the p-value of the corresponding two-sided binomial z-test.
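The relation between the $\chi^2$-test and the binomial $z$-test can be illustrated with a short Python sketch (standard library only). It assumes the normal approximation $Z = (N_1 - n/2)/\sqrt{n \cdot \frac{1}{2} \cdot \frac{1}{2}}$ for the binomial test and uses $P(\chi^2_1 \geq x) = P(|Z| \geq \sqrt{x})$, so the $\chi^2$ p-value should equal the two-sided $z$ p-value, which is twice the one-sided one.

```python
import math
from statistics import NormalDist

n = 400
observed = [243, 157]           # decisions by mother, by daughter
expected = [n * 0.5, n * 0.5]   # E0 N_i = 200 under H0: p = 1/2

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# z-statistic of the (normal-approximated) binomial test on p, H0: p = 0.5
z = (observed[0] - n * 0.5) / math.sqrt(n * 0.5 * 0.5)

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)              # upper-tailed z-test
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided z-test
p_chi2 = math.erfc(math.sqrt(chi2 / 2))          # P(chi2_1 >= chi2), because chi2_1 = Z^2

# Critical values: the df = 1 chi-square value is 1.96^2, not 1.645^2
c_chi2 = std_normal.inv_cdf(0.975) ** 2          # about 3.84
c_one_sided_sq = std_normal.inv_cdf(0.95) ** 2   # about 2.71

print(f"chi2 = {chi2:.2f}, z = {z:.1f}, z^2 = {z * z:.2f}")
print(f"p (chi2) = {p_chi2:.3g}, p (two-sided z) = {p_two_sided:.3g}, p (one-sided z) = {p_one_sided:.3g}")
print(f"c (chi2, df = 1) = {c_chi2:.2f}, 1.645^2 = {c_one_sided_sq:.2f}")
```

The first two p-values coincide, which is also why SPSS labels the Chi-square p-value “2-sided”.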

Exercise 3
To determine 4 categories (intervals) with probability 0.25 each,
we use the standard normal table (see the graph).
We are looking for a value $c$ such that $\Phi(c) = 0.75$, so
$c = \Phi^{-1}(0.75) = 0.67$.
Interval 1: from $-\infty$ to $-0.67$
Interval 2: from $-0.67$ to $0$
Interval 3: from $0$ to $0.67$
Interval 4: from $0.67$ to $\infty$
The observed numbers $n_1, n_2, n_3$ and $n_4$ for these 4 intervals are 4, 5, 6 and 5 ($n = 20$ in total), while the expected numbers are $E_0N_i = n \times 0.25 = 5$ (this just satisfies the condition $E_0N_i \geq 5$).
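Instead of reading $\Phi^{-1}(0.75)$ from the table, the interval boundaries can be computed with `statistics.NormalDist.inv_cdf` from Python's standard library (3.8+). The helper `count_per_interval` is a hypothetical name of ours; since the 20 raw observations are not reproduced here, it is only defined, not applied to the data.

```python
from bisect import bisect_right
from statistics import NormalDist

k = 4
std_normal = NormalDist()   # mean 0, standard deviation 1

# Inner boundaries Phi^{-1}(1/4), Phi^{-1}(2/4), Phi^{-1}(3/4): about -0.674, 0 and 0.674
cuts = [std_normal.inv_cdf(i / k) for i in range(1, k)]


def count_per_interval(data, cuts):
    """Count observations in (-inf, cuts[0]), [cuts[0], cuts[1]), ..., [cuts[-1], inf)."""
    counts = [0] * (len(cuts) + 1)
    for x in data:
        counts[bisect_right(cuts, x)] += 1   # index of the interval containing x
    return counts


print([round(c, 2) for c in cuts])
```

For continuous data it does not matter to which interval a boundary value itself is assigned, since it occurs with probability 0.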

(Pearson’s) Chi-square test:


1. The numbers $N_1, N_2, N_3$ and $N_4$ are multinomially distributed with total $n = 20$ and unknown probabilities $p_1, p_2, p_3$ and $p_4$, respectively.
2. We test $H_0: p_1 = p_2 = p_3 = p_4 = 0.25$ against $H_1: p_i \neq 0.25$ for at least one $i$, with $\alpha = 0.05$.
3. Test statistic: $\chi^2 = \sum_{i=1}^{4} \frac{(N_i - E_0N_i)^2}{E_0N_i}$ with $E_0N_i = 5$.
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = k - 1 = 3$.
5. Observed value: $\chi^2 = \frac{(4-5)^2}{5} + \frac{(5-5)^2}{5} + \frac{(6-5)^2}{5} + \frac{(5-5)^2}{5} = \frac{1}{5} + \frac{1}{5} = 0.4$
6. We reject $H_0$ if $\chi^2 \geq c$. With $\alpha = 0.05$, the $\chi^2_3$-table gives $c = 7.81$.
7. The observed value 0.4 does not lie in the RR, so do not reject $H_0$.
8. At a 5% significance level we did not prove that the data are drawn from a distribution other than the standard normal distribution. (Note that we did not prove that the distribution is standard normal.)
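With equal expected numbers $e = n/k$ the statistic simplifies: $\sum_i \frac{(o_i - e)^2}{e} = \frac{1}{e}\sum_i o_i^2 - 2\sum_i o_i + \sum_i e = \frac{k}{n}\sum_i o_i^2 - n$, because $\sum_i o_i = \sum_i e = n$. The minimal Python sketch below computes the statistic both ways for the counts 4, 5, 6 and 5; the critical value 7.81 is taken from the $\chi^2_3$-table.

```python
observed = [4, 5, 6, 5]   # counts in the four intervals
k = len(observed)
n = sum(observed)         # 20
e = n / k                 # equal expected count 5.0 under H0

chi2_direct = sum((o - e) ** 2 / e for o in observed)
# With equal expected counts n/k the statistic reduces to (k/n) * sum(o^2) - n
chi2_short = k / n * sum(o * o for o in observed) - n

df = k - 1
c = 7.81   # table value for df = 3, alpha = 0.05
print(f"chi2 = {chi2_direct:.1f} (shortcut {chi2_short:.1f}), df = {df}, reject H0: {chi2_direct >= c}")
```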

Exercise 4
We code the exam result as 1 if it is “low”, 2 if “medium” and 3 if “high”.
The text provides the following information, presented in the cross table:

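| | Exam result 1 | Exam result 2 | Exam result 3 | Row total |
|---|---|---|---|---|
| Level of education 1 | 11 (10.2) | 20 (17.5) | 4 (7.3) | 35 |
| Level of education 2 | 15 (13.1) | 18 (22.5) | 12 (9.4) | 45 |
| Level of education 3 | 9 (11.7) | 22 (20) | 9 (8.3) | 40 |
| Column total | 35 | 60 | 25 | 120 = $n$ |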

The numbers between brackets are the estimates of the expected numbers that we will use later. They are determined with the formula $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$, e.g. for cell (1,1): $\hat{E}_0N_{11} = \frac{35 \times 35}{120} \approx 10.2$.
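The expected numbers between brackets follow directly from the margins of the table. A minimal Python sketch (lists of lists, no external libraries) that reproduces them:

```python
table = [
    [11, 20, 4],    # level of education 1; exam result low, medium, high
    [15, 18, 12],   # level of education 2
    [9, 22, 9],     # level of education 3
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Estimated expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```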

Define: 𝑁𝑖𝑗 = “number of employees with educational level 𝑖 and exam result 𝑗” (𝑖, 𝑗 = 1, 2, 3),
𝑝𝑖𝑗 = “probability that an arbitrary person in the population has educational level 𝑖 and exam result 𝑗”

We will apply the test on independence, since we have only one sample where two variables (educational
level and exam result) are scored.

1. Model: the numbers $N_{11}, N_{12}, \dots, N_{33}$ have a multinomial distribution with total $n = 120$ and probabilities $p_{ij}$.
2. Test $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$ against $H_1: p_{ij} \neq p_{i\cdot}\,p_{\cdot j}$ for at least one pair $(i, j)$, with $\alpha = 5\%$.
(remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column totals of the probabilities related to cell $(i, j)$, for example, $p_{1\cdot} = p_{11} + p_{12} + p_{13}$)
3. Test statistic: $\chi^2 = \sum_{j=1}^{3} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0N_{ij})^2}{\hat{E}_0N_{ij}}$, with $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 4$ degrees of freedom.
5. For $\hat{E}_0N_{ij}$ see the table above, so $\chi^2 = \frac{(11-10.2)^2}{10.2} + \frac{(20-17.5)^2}{17.5} + \dots + \frac{(9-8.3)^2}{8.3} = 4.69$
6. We reject $H_0$ if $\chi^2 \geq c$. With $\alpha = 0.05$, the $\chi^2_4$-table gives $c = 9.49$.
7. The outcome 4.69 is not in the Rejection Region, so do not reject $H_0$.
8. At a 5% significance level we could not prove that the exam results depend on the educational level.
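Steps 3 to 7 can be reproduced with a small general function for an $r \times c$ table. The name `chi2_independence` is our own, not a library function, and the critical value is again taken from the table.

```python
def chi2_independence(table):
    """Pearson's chi-square statistic for an r x c contingency table.

    Returns (chi2, df, expected) with expected[i][j] = row_i * col_j / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


table = [[11, 20, 4], [15, 18, 12], [9, 22, 9]]
chi2, df, _ = chi2_independence(table)
c = 9.49   # table value for df = 4, alpha = 0.05
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= c}")
```

Without rounding the expected numbers, the statistic comes out at about 4.67; the value 4.69 in step 5 results from using the expected numbers rounded to one decimal. The conclusion is the same.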
Exercise 5
We start by producing the table with totals, with $E_{ij} = \hat{E}_0N_{ij}$ between brackets in each cell:

| Type of company | No reply | Reply | Row total |
|---|---|---|---|
| Small | 102 (117.3) | 98 (82.7) | 200 |
| Medium | 121 (117.3) | 79 (82.7) | 200 |
| Large | 129 (117.3) | 71 (82.7) | 200 |
| Column total | 352 | 248 | 600 |

We have the results of three samples in this case. We investigate whether the three populations of companies differ in replying to the questionnaire: a test on the homogeneity of the “Reply” variable. (This only affects the first two steps of the testing procedure.)

Let $N_{11}$ and $N_{12}$ be the numbers of small companies that gave “No reply” and “Reply”, respectively. We could state that $N_{11}$ and $N_{12}$ are multinomially distributed, but since there are only two categories we prefer to state that $N_{11}$ is binomially distributed with $n = 200$ and success probability $p_{11}$ (implying that $N_{12} \sim B(200, 1 - p_{11})$, $p_{12} = 1 - p_{11}$). Instead of $p_{i1}$ and $p_{i2}$ we might use the symbols $p_i$ and $1 - p_i$.
Similarly, $N_{21}$ and $N_{22}$ are introduced for the medium companies and $N_{31}$ and $N_{32}$ for the large companies.
The 8 steps of the test on homogeneity (3 samples, each with $n = 200$) are in this case:

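1. Model: the numbers $N_{11}, N_{21}$ and $N_{31}$ are independent and binomially distributed with $n = 200$ and success probabilities $p_{11}, p_{21}$ and $p_{31}$.
2. Test $H_0: p_{11} = p_{21} = p_{31}$ against $H_1$: “$p_{11} \neq p_{21}$ or $p_{21} \neq p_{31}$”, with $\alpha = 1\%$.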
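3. Test statistic: $\chi^2 = \sum_{j=1}^{2} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0N_{ij})^2}{\hat{E}_0N_{ij}}$, with estimates $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$.
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 2$.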
5. Observed value: $\chi^2 = \frac{(102-117.3)^2}{117.3} + \dots + \frac{(71-82.7)^2}{82.7} \approx 7.93$ (for $E_{ij}$ and $N_{ij}$ see the table above)
6. We reject $H_0$ if $\chi^2 \geq c$. With $\alpha = 0.01$, the $\chi^2_2$-table gives $c = 9.21$.
7. The observed value 7.93 does not fall in the RR, so do not reject 𝐻0 .
8. We cannot state, at a 1% level of significance, that the three types of companies responded differently.
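Computationally the homogeneity test is identical to the test on independence; only the model and hypotheses (steps 1 and 2) differ. The minimal Python sketch below also uses the closed form $P(\chi^2_2 \geq x) = e^{-x/2}$, which holds because the Chi-square distribution with 2 degrees of freedom is the exponential distribution with mean 2. This p-value is an addition of ours, not part of the original solution.

```python
import math

# Rows: small, medium, large companies; columns: no reply, reply
table = [[102, 98], [121, 79], [129, 71]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Same statistic as in the test on independence: expected = row total * column total / n
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)   # (3 - 1)(2 - 1) = 2

# For df = 2 the chi-square upper tail has the closed form P(chi2_2 >= x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)
c = 9.21   # table value for df = 2, alpha = 0.01
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}, reject H0: {chi2 >= c}")
```

The p-value comes out at roughly 0.019: not significant at $\alpha = 1\%$, although it would have been at 5%.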

Exercise 6
a. We consider the observations to be drawn as one random sample from a population of PhD-students,
where for each PhD student the variables “Gender” and “Promotion in 6 years” are scored: a test on
independence of these variables can be applied.
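b. The observed numbers $N_{ij}$ and the expected numbers $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$ are the following:

| Promotion in 6 years / Gender | Female | Male | Total |
|---|---|---|---|
| Yes | $N_{11} = 98$, $\hat{E}_0N_{11} = 116.5$ | $N_{12} = 423$, $\hat{E}_0N_{12} = 404.5$ | 521 |
| No | $N_{21} = 131$, $\hat{E}_0N_{21} = 112.5$ | $N_{22} = 372$, $\hat{E}_0N_{22} = 390.5$ | 503 |
| Total | 229 | 795 | 1024 = $n$ |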

1. Model: the numbers $N_{11}, N_{12}, N_{21}$ and $N_{22}$ are multinomially distributed with total $n = 1024$ and (unknown) probabilities $p_{11}, p_{12}, p_{21}, p_{22}$.
2. Test $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$ versus $H_1: p_{ij} \neq p_{i\cdot}\,p_{\cdot j}$ for at least one pair $(i, j)$, with $\alpha = 5\%$.
(remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column totals of the probabilities related to cell $(i, j)$, e.g. $p_{1\cdot} = p_{11} + p_{12}$)
3. Test statistic: $\chi^2 = \sum_{j=1}^{2} \sum_{i=1}^{2} \frac{(N_{ij} - \hat{E}_0N_{ij})^2}{\hat{E}_0N_{ij}}$, with estimates $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with number of degrees of freedom $df = (r - 1)(c - 1) = 1$.
5. Observed value: $\chi^2 = \frac{(98-116.5)^2}{116.5} + \dots + \frac{(372-390.5)^2}{390.5} \approx 7.703$ ($\hat{E}_0N_{ij}$ is in the table above)
6. We reject $H_0$ if $\chi^2 \geq c$. With $\alpha = 0.05$, the $\chi^2_1$-table gives $c = 3.84$.
7. The observed value 7.703 lies in the Rejection Region, so reject 𝐻0 .
8. At a 5% level of significance we showed that the probability of a promotion within 6 years depends
on the gender.
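For a 2 × 2 table the statistic can also be computed with the shortcut $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, where $a, b, c, d$ are the four cell counts. The minimal Python sketch below (our own variable names) computes it both ways; the expected numbers are not rounded here.

```python
# Rows: promotion in 6 years (yes, no); columns: female, male
a, b = 98, 423
c, d = 131, 372
n = a + b + c + d   # 1024

rows = [a + b, c + d]   # 521, 503
cols = [a + c, b + d]   # 229, 795
observed = [[a, b], [c, d]]

# General Pearson statistic with expected = row total * column total / n
chi2 = sum(
    (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)

# Equivalent shortcut for a 2 x 2 table (no continuity correction)
chi2_short = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

crit = 3.84   # table value for df = 1, alpha = 0.05
print(f"chi2 = {chi2:.3f}, shortcut = {chi2_short:.3f}, reject H0: {chi2 >= crit}")
```

Both give about 7.713; the 7.703 in step 5 comes from the rounded expected numbers. If SciPy is available, `scipy.stats.chi2_contingency` computes this statistic as well, but for a 2 × 2 table it applies Yates' continuity correction unless `correction=False` is passed.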

c. The z-score in exercise 3 of chapter 5 is $Z = -2.77$, so $Z^2 \approx 7.67$, which equals $\chi^2 \approx 7.70$ (the small difference is due to rounding).
The test there was: reject $H_0$ if $|Z| \geq 1.96 \Leftrightarrow Z^2 \geq 1.96^2 \approx 3.84$, which matches the Chi-square test: reject $H_0$ if $\chi^2 \geq 3.84$.
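The identity $Z^2 = \chi^2$ can be checked numerically. This sketch assumes that the $z$-test of chapter 5 is the two-sample test on proportions with the pooled estimate $\hat{p} = 521/1024$ (that exercise is not reproduced here); with this choice, $Z^2$ equals the unrounded $\chi^2$ up to floating-point error.

```python
import math
from statistics import NormalDist

promoted = {"female": 98, "male": 423}
totals = {"female": 229, "male": 795}

p_f = promoted["female"] / totals["female"]
p_m = promoted["male"] / totals["male"]
p_pooled = (promoted["female"] + promoted["male"]) / (totals["female"] + totals["male"])

# Two-sample z-statistic for H0: p_female = p_male, using the pooled estimate (assumption)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / totals["female"] + 1 / totals["male"]))
z = (p_f - p_m) / se

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, reject H0 (|z| >= {z_crit:.2f}): {abs(z) >= z_crit}")
print(f"z_crit^2 = {z_crit ** 2:.2f}, the df = 1 chi-square critical value 3.84")
```

Without intermediate rounding it gives $Z \approx -2.777$ and $Z^2 \approx 7.713$.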

Exercise 7
a. We apply the Chi-square test on independence (we have one sample and 2 variables in this case). We add the row and column totals and, between brackets, the expected numbers assuming independence.

| Frequency of car use | Very high | High | Perhaps | Not considerably | Not at all | Row total |
|---|---|---|---|---|---|---|
| > 3 times per week | 73 (67.5) | 140 (141.4) | 223 (227.6) | 185 (192.4) | 132 (124.2) | 753 |
| 1–3 times per week | 19 (17.7) | 39 (37.0) | 56 (59.5) | 54 (50.3) | 29 (32.5) | 197 |
| 3 times a month or less | 2 (8.9) | 18 (18.6) | 38 (29.9) | 29 (25.3) | 12 (16.3) | 99 |
| Column total | 94 | 197 | 317 | 268 | 173 | 1049 |

1. Model: the numbers 𝑁11 , 𝑁12 , … , 𝑁35 have a multinomial distribution with total 𝑛 = 1049 and
probabilities 𝑝11, 𝑝12 ,…, 𝑝35 .
2. Test 𝐻0 : 𝑝𝑖𝑗 = 𝑝𝑖∙ 𝑝∙𝑗 against 𝐻1 : 𝑝𝑖𝑗 ≠ 𝑝𝑖∙ 𝑝∙𝑗 , for at least one pair (𝑖, 𝑗), with α = 5%.
(remark: $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column totals, e.g. $p_{1\cdot} = p_{11} + \dots + p_{15}$)
3. Test statistic: $\chi^2 = \sum_{j=1}^{5} \sum_{i=1}^{3} \frac{(N_{ij} - \hat{E}_0N_{ij})^2}{\hat{E}_0N_{ij}}$, with estimates $\hat{E}_0N_{ij} = \frac{\text{row total} \times \text{column total}}{n}$
4. Under $H_0$, $\chi^2$ has a Chi-square distribution with $df = (r - 1)(c - 1) = 8$.
5. Observed value (using the values in the table above): $\chi^2 = \frac{(73-67.5)^2}{67.5} + \dots + \frac{(12-16.3)^2}{16.3} \approx 11.63$.
6. We reject $H_0$ if $\chi^2 \geq c$. With $\alpha = 0.05$, the $\chi^2_8$-table gives $c = 15.51$.
7. The outcome 11.63 is not in the rejection region (11.63 < 15.51), so do not reject 𝐻0 .
8. At a 5% significance level we could not prove that the need for automatic adjustment of the speed
depends on the frequency of the car use.

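b. The observed number $n_{31} = 2$ is small, but the condition for applying the Chi-square test concerns the expected numbers: the estimated expected value $\hat{E}_0N_{31} = 8.9 \geq 5$ is large enough (and it is the smallest expected number in the table).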
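A minimal Python sketch of part a and the check in part b. The cell in row 3, column 4 is taken as 29, the value consistent with its row total 99 and column total 268. Note that the code checks the expected numbers, not the observed ones, against the bound 5.

```python
# Rows: car use (> 3 times/week, 1-3 times/week, 3 times a month or less)
# Columns: need for automatic speed adjustment
#          (very high, high, perhaps, not considerably, not at all)
table = [
    [73, 140, 223, 185, 132],
    [19, 39, 56, 54, 29],
    [2, 18, 38, 29, 12],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)   # (3 - 1)(5 - 1) = 8
c = 15.51   # table value for df = 8, alpha = 0.05

# Rule of thumb from part b: it is the EXPECTED counts that must be at least 5
min_expected = min(min(row) for row in expected)
small_observed = [(i + 1, j + 1) for i, row in enumerate(table) for j, o in enumerate(row) if o < 5]

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= c}")
print(f"smallest expected count = {min_expected:.1f}; cells with observed < 5: {small_observed}")
```

Without rounding the expected numbers, the statistic is about 11.61 (the 11.63 in step 5 uses rounded expected numbers); the only cell with an observed count below 5 is (3, 1), whose expected count of about 8.9 is also the smallest in the table.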

Exercise 8
The reasoning is as follows: if the options are equally attractive, the 9 + 4 = 13 persons who follow will be divided at random over the two groups of 10 and 9 persons in the two tests. We reject “random division” in favour of “option 1 is more attractive” if the probability of 9 or even 10 followers among the 10 option 1 test persons is small.
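The “random division” argument can be checked by brute force: enumerate every way of choosing the 10 option 1 test persons out of the 19 persons (13 followers, 6 non-followers) and count how often 9 or 10 followers end up in option 1. This is a minimal Python sketch for illustration only; enumerating all $\binom{19}{10} = 92378$ divisions is feasible only because the numbers are so small.

```python
from itertools import combinations

# 19 test persons: 13 followers (True) and 6 non-followers (False)
persons = [True] * 13 + [False] * 6

# Enumerate every way to choose the 10 persons of the option 1 test
# (the remaining 9 form the option 2 test) and count followers in option 1.
total = 0
extreme = 0
for group in combinations(range(len(persons)), 10):
    followers = sum(persons[i] for i in group)
    total += 1
    if followers >= 9:
        extreme += 1

print(f"{extreme} of {total} divisions give >= 9 followers: p = {extreme / total:.4f}")
```

The count is 4576 out of 92378, i.e. about 4.95%, the same value as the hypergeometric probabilities give.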
If $X$ = “number of followers among the 10 persons in the option 1 test”, then (see the diagram)

| | Follow | Not | Total |
|---|---|---|---|
| Persons | 13 | 6 | 19 |
| Option 1 | 9 | 1 | 10 |

the p-value $= P(X \geq 9) = P(X = 9) + P(X = 10) = \frac{\binom{13}{9}\binom{6}{1}}{\binom{19}{10}} + \frac{\binom{13}{10}\binom{6}{0}}{\binom{19}{10}} \approx 4.64\% + 0.31\% = 4.95\%$
Fisher’s exact test for a 2×2 cross table with small numbers:
1. Model: the numbers $N_{ij}$ have a multinomial distribution with total $n = 19$ and probabilities $p_{ij}$.
2. Test $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$ (independence: both options are equally attractive) against the one-sided alternative $H_1$: “option 1 is more attractive than option 2”, with $\alpha = 5\%$.
3. Test statistic: $X$ = “the number of followers among the 10 persons in the option 1 test”.
4. Under $H_0$ (and given the totals), $X$ has a hypergeometric distribution: 10 persons are drawn from 19, of whom 13 are followers (see the diagram).
5. Observed: $X = 9$.
6. We will reject $H_0$ if the p-value $\leq \alpha = 0.05$.
p-value $= P(X \geq 9 \mid H_0) = P(X = 9) + P(X = 10) = \frac{\binom{13}{9}\binom{6}{1}}{\binom{19}{10}} + \frac{\binom{13}{10}\binom{6}{0}}{\binom{19}{10}} \approx 4.64\% + 0.31\% = 4.95\%$
7. p-value < 𝛼 = 5%, so the null hypothesis is rejected.
8. At a 5% level of significance the hypothesis “equal effect in options 1 and 2” is rejected in favour of the
alternative “option 1 is more attractive than option 2”.
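The hypergeometric tail probability can be computed exactly with `math.comb` (Python 3.8+). The function name `fisher_upper_p` is our own; it returns the one-sided (upper-tail) p-value used in this exercise.

```python
from math import comb


def fisher_upper_p(x_obs, n_total, n_success, n_draw):
    """One-sided (upper-tail) Fisher exact p-value P(X >= x_obs), where X is
    hypergeometric: n_draw persons drawn from n_total, of whom n_success follow."""
    upper = min(n_draw, n_success)
    return sum(
        comb(n_success, x) * comb(n_total - n_success, n_draw - x)
        for x in range(x_obs, upper + 1)
    ) / comb(n_total, n_draw)


# Exercise 8: 19 persons, 13 followers, 10 persons in the option 1 test, 9 of them follow
p = fisher_upper_p(9, 19, 13, 10)
alpha = 0.05
print(f"p-value = {p:.4f}, reject H0: {p <= alpha}")
```

If SciPy is available, `scipy.stats.fisher_exact([[9, 1], [4, 5]], alternative="greater")` should return the same one-sided p-value.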
