Received: 30 April 2019 | Revised: 22 November 2019 | Accepted: 10 December 2019
DOI: 10.1002/sim.8463

TUTORIAL IN BIOSTATISTICS

A primer on strong vs weak control of familywise error rate

Michael A. Proschan, Erica H. Brittain

Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland

Correspondence: Michael A. Proschan, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, MD. Email: ProschaM@niaid.nih.gov

Abstract: Multiple comparison adjustments have a long history, yet confusion remains about which procedures control the type 1 error rate in a strong sense and how to show this. Part of the confusion stems from a powerful technique called the closed testing principle, whose statement is deceptively simple but is sometimes misinterpreted. This primer presents a straightforward way to think about multiplicity adjustment.

KEYWORDS: closed testing principle, familywise error rate, sequentially rejective Bonferroni, strong vs weak control of the FWER
1 INTRODUCTION
Studies designed to definitively answer controversial questions require high probability that all claimed differences are
correct. Accordingly, when there are multiple comparisons, the type 1 error rate for each comparison is often adjusted
to control the familywise error rate (FWER), the probability of rejecting at least one true null hypothesis.1,2 This article
reviews FWER control and how to show it. Although we focus on randomized clinical trials, the same principles apply
to other settings as well.
Some procedures control the FWER in only a weak sense, which we illustrate for a four-armed trial comparing pairs
of treatments with respect to a continuous outcome measure such as cholesterol change from baseline to end of study.
The null hypothesis H0ij for the comparison of arms i and j is that their means are equal: H0ij ∶ 𝜇i = 𝜇j . The global null
hypothesis is the intersection of all pairwise null hypotheses: H0 = ∩ij H0ij , which is equivalent to all means being equal.
Fisher's protected least significant difference (LSD) procedure rejects H0ij if and only if the F-statistic comparing all four
arms and the t-statistic comparing arms i and j are both statistically significant at level 𝛼.3,4 If the global null hypothesis is
true, the FWER is no greater than alpha because the procedure requires a statistically significant value for the F-statistic,
which occurs with probability 𝛼 under the global null hypothesis. Control of the FWER under the global null hypothesis
is called weak control of the FWER.
Fisher's protected LSD does not control the FWER under all possible null hypotheses. For example, suppose that the
population means of arms 1, 2 and 3 are equal, but the fourth population mean is so far from the other three that the
F-statistic is virtually guaranteed to reach statistical significance (top portion of Figure 1). Then Fisher's protected LSD
is virtually equivalent to pairwise comparisons at level 𝛼 each. The FWER will greatly exceed 𝛼. Similar inflation of the
FWER occurs in the scenario in the bottom portion of Figure 1. A procedure that controls the FWER at level 𝛼 or less
under every scenario in which at least one null hypothesis is true is said to strongly control the FWER. Figure 1 illustrates
that Fisher's protected LSD does not strongly control FWER.
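This inflation is easy to check by simulation. The following is a minimal sketch (ours, not from the article), assuming hypothetical unit-variance normal data in the configuration of the top portion of Figure 1: arms 1 through 3 share a mean while arm 4 is far away, so the global F-test is essentially always significant and the three true pairwise nulls are each effectively tested at level 𝛼.

```python
import numpy as np
from scipy import stats

def lsd_fwer(n_sims=20_000, n_per_arm=50, shift=5.0, alpha=0.05, seed=1):
    """Monte Carlo FWER of Fisher's protected LSD when mu1 = mu2 = mu3 and mu4 is far away."""
    rng = np.random.default_rng(seed)
    false_rejections = 0
    for _ in range(n_sims):
        arms = [rng.normal(0.0, 1.0, n_per_arm) for _ in range(3)]
        arms.append(rng.normal(shift, 1.0, n_per_arm))   # arm 4 is far from the other three
        if stats.f_oneway(*arms)[1] > alpha:             # protected LSD gate: global F-test
            continue
        # the true nulls are H012, H013, H023; rejecting any of them is a familywise error
        pairs = [(0, 1), (0, 2), (1, 2)]
        if any(stats.ttest_ind(arms[i], arms[j])[1] <= alpha for i, j in pairs):
            false_rejections += 1
    return false_rejections / n_sims

print(lsd_fwer())   # noticeably above 0.05 because the F gate is almost always passed
```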
Because it is not known which null hypotheses are true, we generally want strong control of the FWER in a phase III
clinical trial. In earlier phase trials, we might settle for less protection of the type 1 error rate. For instance, a phase II trial
comparing two drugs to a control with respect to an intermediate outcome might settle for weak control of the FWER.
FIGURE 1 Configurations of four population means for which Fisher's protected LSD (top and bottom portion) and Newman-Keuls (bottom portion) have inflated FWER. Means with the same horizontal position are the same. The FWER is approximately the same as that of using three t-tests at level 𝛼 for arms 1, 2, and 3 for Fisher's protected LSD in the top portion and two t-tests at level 𝛼 for the comparison of arms 1 and 2 and arms 3 and 4 for Newman-Keuls in the bottom portion
2 MORE STRINGENCY AND CLOSED TESTING
A more stringent multiple comparison procedure, the Newman-Keuls procedure,4-6 declares the population means in
arms 1 and 2 different if and only if the following statistics are all statistically significant at level 𝛼: (i) the global F-statistic
F1234 ; (ii) the F-statistic F123 comparing arms 1, 2, and 3; (iii) the F-statistic F124 comparing arms 1, 2, and 4; and (iv) the
t-statistic T12 (equivalently, the F-statistic F12) comparing arms 1 and 2. (The Newman-Keuls procedure is usually conducted with a common variance estimate pooled over all arms for each of the four tests; the version presented here is very similar and slightly easier to describe.) Even though the Newman-Keuls procedure offers more error protection
than Fisher's protected LSD, it does not control the type 1 error rate strongly. For example, suppose 𝜇1 = 𝜇2 = 𝜇 and
𝜇3 = 𝜇4 = 𝜇 + L, where L is a very large number (bottom portion of Figure 1). Then all F-statistics comparing 3 or 4 arms
are virtually guaranteed to be statistically significant. Therefore, the procedure is again nearly equivalent to performing
all pairwise t-statistics at level 𝛼. If c𝛼 denotes the critical value for a two-tailed t-test at level 𝛼, the chance of rejecting at
least one of the true nulls H012 and H034 will approach
P(|T12| > c𝛼 ∪ |T34| > c𝛼) = 1 − P(|T12| ≤ c𝛼 ∩ |T34| ≤ c𝛼)
= 1 − P(|T12| ≤ c𝛼) P(|T34| ≤ c𝛼)
= 1 − (1 − 𝛼)², (1)
as L → ∞. For 𝛼 = .05, the FWER approaches 1 − 0.95² = 0.0975.
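As a numerical check on the limit in Equation (1), here is a minimal simulation sketch (ours), assuming hypothetical unit-variance normal data in the configuration of the bottom portion of Figure 1 and the F-statistic version of Newman-Keuls described above, with per-comparison variance estimates.

```python
import numpy as np
from scipy import stats

def nk_fwer(n_sims=20_000, n=50, L=10.0, alpha=0.05, seed=2):
    """FWER of the F-statistic Newman-Keuls variant when mu1 = mu2 and mu3 = mu4 = mu + L."""
    rng = np.random.default_rng(seed)
    sig_f = lambda *groups: stats.f_oneway(*groups)[1] <= alpha
    sig_t = lambda x, y: stats.ttest_ind(x, y)[1] <= alpha
    errors = 0
    for _ in range(n_sims):
        a1, a2 = rng.normal(0, 1, n), rng.normal(0, 1, n)
        a3, a4 = rng.normal(L, 1, n), rng.normal(L, 1, n)
        # falsely reject H012: F1234, F123, F124, and T12 all significant
        rej12 = sig_f(a1, a2, a3, a4) and sig_f(a1, a2, a3) and sig_f(a1, a2, a4) and sig_t(a1, a2)
        # falsely reject H034: F1234, F134, F234, and T34 all significant
        rej34 = sig_f(a1, a2, a3, a4) and sig_f(a1, a3, a4) and sig_f(a2, a3, a4) and sig_t(a3, a4)
        errors += (rej12 or rej34)
    return errors / n_sims

print(nk_fwer())   # approaches 1 - 0.95**2 = 0.0975 as L grows
```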
For a three-armed trial, Fisher's protected LSD and the Newman-Keuls procedure are equivalent and control the
FWER strongly. To see this, we need only consider all scenarios in which at least one null hypothesis is true.
• The FWER is protected under the global null 𝜇1 = 𝜇2 = 𝜇3 because the probability that F123 is statistically significant
is 𝛼 or less.
• If the global null hypothesis is false, then at least two population means differ. That leaves two possibilities: either all
three means differ or exactly two means are equal, and the third mean is different.
– If all three means differ, then no type 1 error can be made.
– On the other hand, if exactly two means are equal, say 𝜇1 = 𝜇2 ≠ 𝜇3 , then only one type 1 error is possible. The
probability of rejecting that true null hypothesis is 𝛼 because the corresponding t-test has level 𝛼.
We have shown by direct enumeration of all possible nulls that the FWER is strongly controlled. As in this example,
there is often a simple way to enumerate the possibilities and show that a procedure strongly controls the FWER.
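The enumeration can also be checked numerically. A minimal sketch (ours), assuming hypothetical unit-variance normal data, estimates the FWER of the three-arm protected LSD under the global null (delta = 0) and under a partial null (delta = 2).

```python
import numpy as np
from scipy import stats

def lsd3_fwer(delta, n=50, n_sims=20_000, alpha=0.05, seed=3):
    """FWER of three-arm Fisher's protected LSD with mu1 = mu2 = 0 and mu3 = delta."""
    rng = np.random.default_rng(seed)
    mus = [0.0, 0.0, delta]
    errors = 0
    for _ in range(n_sims):
        arms = [rng.normal(m, 1, n) for m in mus]
        if stats.f_oneway(*arms)[1] > alpha:
            continue                                 # gate closed: nothing is rejected
        # a familywise error = rejecting any pairwise null that is actually true
        errors += any(mus[i] == mus[j] and stats.ttest_ind(arms[i], arms[j])[1] <= alpha
                      for i, j in [(0, 1), (0, 2), (1, 2)])
    return errors / n_sims

print(lsd3_fwer(0.0), lsd3_fwer(2.0))   # both at or below about 0.05
```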
A general method of constructing tests that strongly controls the FWER, known as the closed testing procedure, is
described in Marcus et al.7 Let H0i , i = 1, … , k, be a set of null hypotheses and let P be a nonempty subset of indices
of {1, 2, … , k}. Define H0P as the intersection hypothesis ∩i∈P H0i , meaning that hypothesis H0i is true for all i ∈ P,
TABLE 1 Required rejections under the closed testing principle for testing H012 in the context of pairwise comparisons

1 pairwise null:   12
2 pairwise nulls:  123, 124, 12 ∩ 34
3 pairwise nulls:  123, 124, 1234
4 pairwise nulls:  1234

Note: Shorthand notation is used for the result of intersecting 1, 2, 3, or 4 pairwise null hypotheses when one of those nulls is H012. For example, 123 denotes {𝜇 ∶ 𝜇1 = 𝜇2 = 𝜇3}.
irrespective of the configuration of other parameters. In the example comparing pairs among four means, there are six
individual null hypotheses, denoted H012 , H013 , H014 , H023 , H024 , and H034 . Null hypothesis H012 specifies only that 𝜇1 = 𝜇2 .
A valid test of this null hypothesis must have type 1 error rate 𝛼 or less regardless of the values of 𝜇3 and 𝜇4 . The inter-
section hypothesis H012 ∩ H013 means that 𝜇1 = 𝜇2 and 𝜇1 = 𝜇3 , which is equivalent to 𝜇1 = 𝜇2 = 𝜇3 . A valid test of this
intersection hypothesis must have type 1 error rate 𝛼 or less regardless of the value of 𝜇4 . Returning to the more general
setting, suppose that there is an 𝛼 level test of each intersection hypothesis H0P . The closed testing procedure rejects H0P
if and only if the level 𝛼 test of H0Q is rejected for every Q containing P.
Theorem 1. The closed testing procedure strongly controls the FWER over the collection of all nonempty intersections
H0P , P ⊂ {1, … , k} of null hypotheses H01 , … , H0k .
The proof of strong control of the FWER is short: if I is the collection of indices i, for which H0i is true, then for at
least one true null to be rejected, the intersection hypothesis ∩i∈I H0i must be rejected by its corresponding 𝛼 level test. By
definition, this has probability 𝛼 or less, completing the proof.
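The principle translates directly into a small generic routine. The following sketch (ours, not a standard library function) assumes we are handed a valid level-𝛼 p-value for every nonempty intersection of the elementary nulls.

```python
from itertools import combinations

def closed_test(intersection_pvalue, k, alpha=0.05):
    """Closed testing for elementary nulls H01, ..., H0k (indexed 0, ..., k-1 here).

    intersection_pvalue: function taking a frozenset of indices (a nonempty subset
    of {0, ..., k-1}) and returning a valid level-alpha p-value for that
    intersection hypothesis.  Returns the set of elementary hypotheses rejected.
    """
    subsets = [frozenset(s) for r in range(1, k + 1) for s in combinations(range(k), r)]
    rejected = {s for s in subsets if intersection_pvalue(s) <= alpha}
    # reject H0i iff every intersection hypothesis containing i is rejected at level alpha
    return {i for i in range(k) if all(s in rejected for s in subsets if i in s)}

# example: Bonferroni test of each intersection (|P| times its smallest p-value);
# closed testing with this choice reproduces the Holm procedure mentioned later in Section 4
p = [0.012, 0.030, 0.210]
bonf = lambda s: min(1.0, len(s) * min(p[i] for i in s))
print(closed_test(bonf, k=3))   # {0}: only the first hypothesis is rejected
```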
The closed testing principle is easy to state, but is sometimes misunderstood. For example, the Newman-Keuls
procedure might seem like an application of the closed testing principle, yet the above example shows that the
Newman-Keuls procedure does not strongly control the type 1 error rate. This seems to violate Theorem 1. However,
the closed testing procedure requires an 𝛼 level test of all intersection hypotheses. Table 1 depicts the result of inter-
secting 1, 2, 3, or 4 pairwise nulls when one of those nulls is H012 . For instance, consider intersections of two pairwise
nulls.
1. H012 ∩ H013 = {𝜇 ∶ 𝜇1 = 𝜇2 and 𝜇1 = 𝜇3 } = {𝜇 ∶ 𝜇1 = 𝜇2 = 𝜇3 }, which is denoted 123. Similarly, 123 can arise as
H012 ∩ H023 .
2. H012 ∩ H014 = {𝜇 ∶ 𝜇1 = 𝜇2 and 𝜇1 = 𝜇4 } = {𝜇 ∶ 𝜇1 = 𝜇2 = 𝜇4 }, denoted 124. Similarly, 124 can arise as H012 ∩ H024 .
3. The final intersection of two pairwise nulls is H012 ∩ H034 .
The intersection in 3 above is different from those in items 1 and 2 because it does not reduce to equality of three
means (or four means, for that matter). If 𝜇 denotes the common value of (𝜇1 , 𝜇2 ) and 𝜇′ denotes the common value
of (𝜇3 , 𝜇4 ), then H012 ∩ H034 says nothing about the value of 𝜇 − 𝜇′ . Only if 𝜇 − 𝜇′ = 0 does H012 ∩ H034 simplify to
1234. A test of H012 ∩ H034 must be valid for any value of 𝜇 − 𝜇′ . In particular, the type 1 error rate must be 𝛼 or less
even under the scenario depicted in the bottom of Figure 1. The Newman-Keuls procedure does not test the inter-
section hypothesis H012 ∩ H034 . Therefore, the Newman-Keuls procedure is not an application of the closed testing
procedure.
The Newman-Keuls procedure can be modified so that it obeys the closed testing principle and strongly controls the
FWER. Hochberg and Tamhane1 describe such a procedure for a multiple range version of the Newman-Keuls procedure.
The following is an analogous modification of the F-statistic version of the Newman-Keuls procedure for four arms.
For the comparison of arms 1 and 2, we need only include an 𝛼 level test of the intersection hypothesis H012 ∩ H034 .
Replace 𝛼 by 𝛼 ′ in Equation (1), equate to the desired alpha level, and solve for 𝛼 ′ . For instance, if we want a 0.05 level
procedure, solve 1 − (1 − 𝛼′)² = 0.05 for 𝛼′ to get 𝛼′ = 1 − √0.95 = 0.0253. Therefore, an 𝛼 level test of H012 ∩ H034 rejects
if at least one of the two P-values p12 and p34 is less than or equal to .0253. The level .05 modified Newman-Keuls procedure
declares 𝜇1 ≠ 𝜇2 precisely when all of the following conditions are satisfied: (i) p12 ≤ .05, (ii) p123 ≤ .05, (iii) p124 ≤ .05,
(iv) p1234 ≤ .05, and (v) min(p12 , p34 ) ≤ .0253. We use the analogous modification for declaring other pairs significantly
different. This modified Newman-Keuls procedure strongly controls the FWER.
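A minimal sketch of that decision rule (ours), taking the five P-values as already computed from the data; the values in the usage line are hypothetical.

```python
def modified_nk_rejects_h012(p12, p34, p123, p124, p1234, alpha=0.05):
    """Level-alpha modified Newman-Keuls decision for H012 with four arms.

    Conditions (i)-(v) of the text: the usual Newman-Keuls requirements plus an
    alpha-level test of the intersection H012 ∩ H034.
    """
    alpha_prime = 1 - (1 - alpha) ** 0.5          # 0.0253 when alpha = 0.05
    return (p12 <= alpha and p123 <= alpha and p124 <= alpha
            and p1234 <= alpha and min(p12, p34) <= alpha_prime)

# hypothetical P-values for illustration only
print(modified_nk_rejects_h012(p12=0.02, p34=0.40, p123=0.01, p124=0.03, p1234=0.004))
```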
3 EXAMPLES OF DIRECT ENUMERATION TO DEMONSTRATE FWER CONTROL
It is often possible to make a direct argument that a procedure controls the FWER strongly. The approach is to consider
all possible configurations of the parameters, and to see if the type 1 error rate is appropriately controlled regardless of
the configuration. This can often lead to a simple proof, without appealing to the closed testing principle. We saw one
example of a direct argument in connection with Fisher's protected LSD test with three arms. The following is another
example.
Example 1. Suppose we want to show noninferiority of a new treatment N compared with a standard treatment S with
respect to the probability of experiencing a harmful event. If pN and pS denote event probabilities for N and S, the noninfe-
riority trial attempts to prove that pN − pS < M, where M > 0 is a prespecified noninferiority margin selected to preserve a
certain proportion of the known effect of S relative to placebo. The null and alternative hypotheses for the noninferiority
trial are
H0M ∶ pN − pS ≥ M; H1M ∶ pN − pS < M.
The trial may be so successful that superiority of N to S is demonstrated. That is, we reject null hypothesis H00 in favor of
H10 , where
H00 ∶ pN − pS ≥ 0; H10 ∶ pN − pS < 0.
We can express these tests equivalently in terms of confidence intervals for pN − pS : noninferiority and superiority corre-
spond to the upper limit of a two-sided, 100(1 − 2𝛼)% confidence interval being less than M or 0, respectively (although
these tests and confidence intervals are equivalent only when they both use the same standard error estimate). Can we
claim that N is superior even though the original goal was to show noninferiority? What if the original goal had been
superiority, but we prespecified a backup noninferiority analysis with a given margin M? Can we claim noninferiority if
superiority fails? That the answer to both questions is yes can be observed as follows (see Figure 2).
• If pN − pS < 0, then noninferiority and superiority are correct, so neither H0M nor H00 can be falsely rejected.
• Now suppose that pN − pS ≥ 0. If 0 ≤ pN − pS < M, then noninferiority is correct, so only H00 can be falsely rejected.
Because the test of H00 has level 𝛼, the probability of making this error is 𝛼 or less.
• On the other hand, suppose that pN − pS ≥ M. Now H0M or H00 can be falsely rejected. However, at least one of these
two errors will be made if and only if H0M is rejected. Therefore, the probability of falsely rejecting at least one of H0M
and H00 is the probability of falsely rejecting H0M . Because pN − pS ≥ M, this probability is ≤ 𝛼.
We have demonstrated that, regardless of the true value of pN − pS , the probability of falsely rejecting at least one of H0M
and H00 is 𝛼 or less. That is, the FWER is strongly controlled when one allows a change from noninferiority to superiority
and vice versa (provided that the noninferiority margin was prespecified).
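A minimal sketch of the confidence interval logic (ours): the Wald standard error and the event counts below are assumptions for illustration, and in practice the test and interval should be based on the same standard error estimate, as noted above.

```python
import numpy as np
from scipy.stats import norm

def ni_and_superiority(x_n, n_n, x_s, n_s, margin, alpha=0.025):
    """Claims based on the upper limit of a 100(1 - 2*alpha)% Wald CI for pN - pS."""
    p_n, p_s = x_n / n_n, x_s / n_s
    se = np.sqrt(p_n * (1 - p_n) / n_n + p_s * (1 - p_s) / n_s)
    upper = (p_n - p_s) + norm.ppf(1 - alpha) * se
    return {"noninferior": upper < margin, "superior": upper < 0}

# hypothetical counts: 30/400 events on N, 45/400 on S, margin M = 0.05
print(ni_and_superiority(30, 400, 45, 400, margin=0.05))
```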
FIGURE 2 The three possible regions for pN − pS, and the location of the upper confidence limit required to make a false claim
Some have argued that order matters; a successful noninferiority trial allows a superiority claim if data are suf-
ficiently promising, but a failed superiority trial does not allow the possibility of a noninferiority claim. Their argu-
ment is that superiority already “uses up” all of the alpha, so allowing a noninferiority claim inflates the type 1
error rate. The reasoning in Figure 2 and accompanying text belies this argument. There are two important caveats.
One is that the backup noninferiority analysis and its margin must be prespecified and supported by convincing
evidence from prior studies on the effect of the standard drug relative to placebo. We are not allowed to observe
the treatment effect in a failed superiority trial and then select a margin for a noninferiority claim; see Section
IV.G. of https://www.fda.gov/downloads/Drugs/Guidances/UCM202140.pdf. Another important caveat was pointed
out by a referee: the primary analysis population for superiority and noninferiority must be the same. Superior-
ity trials very often use intention-to-treat as the population for the primary analysis, whereas noninferiority trials
sometimes use an as-treated population for the primary analysis. If the populations for the two types of trials are
different, or if there are other differences in the handling of missing data, for example, the above justification
breaks down.
Example 2. Another direct enumeration argument for strong control is illustrated in consideration of a design for a
more complex three-arm study with a short-term binary endpoint. This example incorporates sequential testing and
a gatekeeping procedure. In this setting, primary interest is in showing that treatment A is superior to the control,
arm C. A known efficacious, but expensive, treatment B is also being studied. Only if treatment A shows superiority
to the control would there be interest in demonstrating that the more expensive treatment B is even better than treat-
ment A. Therefore, arm A is first compared with control, and if arm A beats control, arm B is compared with arm A.
All tests are one tailed, with a desired FWER of 0.025. To monitor the study over time, group-sequential boundaries
could be established such that, for a given pairwise comparison, the probability of falsely declaring superiority at some
point during the trial is .025. Because the primary comparison is between arms A and C, if the monitoring boundary
for A vs C is not crossed, no claim can be made about arm B vs A. However, if the A vs C boundary is crossed, arm
B could be declared superior to A if its corresponding boundary is also crossed at some point. This is an example of a
gatekeeping procedure. The arm A vs C comparison represents the gate. Only if that gate is open (that comparison is
statistically significant) can the comparison of B with A “go through” (be declared statistically significant after cross-
ing its boundary). Suppose that at an interim analysis, arm B crosses its boundary with A, but arm A has not crossed
its boundary with C. Accordingly, the B vs A comparison could not yet be declared statistically significant. Nonethe-
less, one can consider the following strategy for reducing the number of participants: discontinue enrollment into arm
B, but continue to enroll into the comparison of arms A and C. If the comparison of arms A and C subsequently
crosses its boundary, the investigator would declare benefit for both comparisons, while if A never crosses its bound-
ary with C, no comparison would be declared statistically significant. Does this procedure strongly control the FWER?
One view might be that the boundaries for A vs C and B vs A must be crossed at the same time to declare statistical
significance.
The following enumeration argument shows that the FWER is controlled strongly whether the A vs C and B vs A
comparisons cross their boundaries at the same or different times.
• If the null hypothesis H0AC is true, the type 1 error rate is controlled irrespective of whether H0BA is true because the
procedure allows no claim to be made unless H0AC is rejected, and the monitoring boundary for A vs C is crossed
with probability .025 if H0AC is true.
• Suppose instead that H0AC is false.
– If H0BA is also false, then no type 1 error can be made.
– If H0BA is true, then only one type 1 error is possible, namely, for the comparison of arms B and A. However, the
monitoring boundary for B vs A is crossed with probability .025 if H0BA is true.
Therefore, the procedure controls the FWER regardless of which nulls are true. It does not matter if the two com-
parisons reach statistical significance at different times. It is true that if arm B were to continue enrolling after crossing
its boundary at an interim test, B vs A might not cross its boundary at the same time that the A vs C comparison crosses its boundary.
Some might find this unappealing, but the FWER is nonetheless protected. Note that legitimate concerns could arise if the
outcome had been time to event. In that case, the proportional hazards assumption might not hold, causing the treat-
ment effect to differ by time. Allowing the two tests to be declared significant based on different amounts of follow-up
may make interpretation difficult in the time to event setting.
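The enumeration can be checked with a simplified simulation. The sketch below (ours) replaces the binary endpoint with unit-variance normal outcomes and drops the interim monitoring, neither of which the argument depends on; it assumes a SciPy version whose ttest_ind accepts the alternative argument.

```python
import numpy as np
from scipy import stats

def gatekeeping_fwer(mu_a, mu_b, mu_c, n=200, n_sims=20_000, alpha=0.025, seed=4):
    """FWER of the gate: test A vs C one-tailed at alpha; only if rejected, test B vs A."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_sims):
        a, b, c = (rng.normal(m, 1, n) for m in (mu_a, mu_b, mu_c))
        gate_open = stats.ttest_ind(a, c, alternative="greater")[1] <= alpha
        claim_ba = gate_open and stats.ttest_ind(b, a, alternative="greater")[1] <= alpha
        # a familywise error = any claim whose corresponding null is actually true
        errors += (gate_open and mu_a <= mu_c) or (claim_ba and mu_b <= mu_a)
    return errors / n_sims

# both nulls true; then A truly better than C but B no better than A
print(gatekeeping_fwer(0.0, 0.0, 0.0))   # at or below 0.025
print(gatekeeping_fwer(0.5, 0.5, 0.0))   # at or below 0.025
```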
4 A HEURISTIC ARGUMENT
We now present a heuristic argument that sounds very similar to those of Section 3, but with a crucial difference. To show
that Fisher's protected LSD strongly controls the FWER for a three-armed trial, we argue nonrigorously as follows. If the
F-statistic comparing the three arms is statistically significant at 𝛼 = .05, then either the global null hypothesis H0123 is
true and we have already made an error, or H0123 is false. If H0123 is true, then the probability of a significant F is 0.05,
so we are protected. If H0123 is false, then at most two means are equal, say 𝜇1 and 𝜇2 . In that case, the t-statistic T12
has probability 𝛼 of being statistically significant, so we are again protected. The subtle difference between the arguments
in Section 3 and this nonrigorous argument is that the nonrigorous argument is based on already having observed a
statistically significant F value. Conditioning on F123 being statistically significant changes the distribution of T12 such
that the t-test no longer has level 𝛼. The rigorous way to argue is to consider different parameter configurations before
observing any results, and show that regardless of the true configuration, the FWER is 𝛼 or less.
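The effect of that conditioning can be seen numerically. A minimal sketch (ours), assuming hypothetical unit-variance normal data under the three-arm global null, estimates the conditional level of the t-test given a significant F.

```python
import numpy as np
from scipy import stats

def conditional_t_level(n=50, n_sims=50_000, alpha=0.05, seed=5):
    """Under mu1 = mu2 = mu3, estimate P(|T12| significant | F123 significant)."""
    rng = np.random.default_rng(seed)
    f_sig = both_sig = 0
    for _ in range(n_sims):
        a1, a2, a3 = (rng.normal(0, 1, n) for _ in range(3))
        if stats.f_oneway(a1, a2, a3)[1] <= alpha:
            f_sig += 1
            both_sig += stats.ttest_ind(a1, a2)[1] <= alpha
    return both_sig / f_sig

print(conditional_t_level())   # well above 0.05: conditioning on a significant F inflates the level
```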
Heuristic arguments of this kind nonetheless offer some intuition about why certain procedures strongly control the type 1 error rate. For instance, the Holm sequentially rejective Bonferroni procedure9 for
testing k null hypotheses H0i , i = 1, … , k, first compares the smallest P-value p(1) to 𝛼∕k. If p(1) > 𝛼∕k, testing ends.
On the other hand, if p(1) ≤ 𝛼∕k, the second smallest P-value p(2) is compared with 𝛼∕(k − 1). Similarly, if p(2) ≤
𝛼∕(k − 1), p(3) can be compared with 𝛼∕(k − 2), and so on. The heuristic argument is that either an error has already
been made in step 1 or there can be only k − 1 true hypotheses. Therefore, we can use level 𝛼∕(k − 1) at step 2,
and so on.
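A minimal sketch of the Holm procedure just described (ours):

```python
import numpy as np

def holm(pvalues, alpha=0.05):
    """Holm's sequentially rejective Bonferroni; returns a boolean rejection indicator per hypothesis."""
    p = np.asarray(pvalues, dtype=float)
    k = len(p)
    reject = np.zeros(k, dtype=bool)
    for step, idx in enumerate(np.argsort(p)):   # smallest p-value first
        if p[idx] <= alpha / (k - step):         # thresholds alpha/k, alpha/(k-1), ...
            reject[idx] = True
        else:
            break                                # testing ends at the first failure
    return reject

print(holm([0.012, 0.030, 0.210]))   # [ True False False ] at alpha = 0.05
```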
The same heuristic argument can be made with the clever graphical method of Bretz et al.8 The authors specify an
initial plan for dividing 𝛼 among multiple comparisons, and a plan for how to transfer alpha from statistically significant
results to other comparisons. Gatekeeping procedures fall into this rubric. For instance, consider the comparison of low
dose, middle dose, and high dose of a drug to placebo. The original plan spends all alpha on the high dose comparison
with placebo. If that P-value is ≤ 𝛼, all alpha is transferred to the middle dose comparison with placebo. If that P-value is
also 𝛼 or less, all alpha is transferred to the low dose comparison with placebo. The nonrigorous justification is that once a
comparison reaches statistical significance, either a type 1 error has already been made, or that null hypothesis is false. In
the latter case, no alpha was actually spent, so the alpha originally allocated to that comparison can be transferred to other
comparisons using the prespecified plan. More rigorous proofs of strong control for sequentially rejective Bonferroni and
graphical procedures were given in References 8,9. The following example illustrates how the principle of transferring
alpha to different endpoints could have been used in an HIV trial.
Example 3. The Strategic Timing of Antiretroviral Treatment (START) trial randomized HIV positive patients with CD4
T-cell counts above 500 to one of two strategies: (i) start antiretroviral treatment immediately or (ii) defer antiretroviral
treatment until the CD4 count drops to 350 or less, or an AIDS event occurs. In the past, concerns had been raised about
potential toxicity of long-term use of antiretroviral treatment. Accordingly, investigators argued that START results would
be convincing only if benefit on AIDS events was not offset by an increase in non-AIDS events. Therefore, they chose as
primary endpoint any serious AIDS or non-AIDS event or death.
One alternative (not selected) would have specified four different primary endpoints: (i) serious AIDS or non-AIDS events or
death, (ii) serious AIDS events, (iii) serious non-AIDS events, and (iv) death. Initially, allocate all .05 to outcome 1. If
that is statistically significant, transfer all .05 to outcome 2. Then, if outcome 2 is statistically significant, transfer all .05
to outcome 3, and if that is significant, transfer all .05 to outcome 4. Requiring outcome 1 to be statistically significant
maintains fidelity to the idea that any benefit on AIDS events must not be offset by detrimental effects on non-AIDS
events. If outcome 1 is statistically significant, the alpha is transferred to an outcome that is expected to show a beneficial
effect of early treatment. If that occurs, then outcome 3, which is arguably less likely to show a benefit, can be tested at
the full .05 alpha level. If outcome 3 is statistically significant at alpha level .05, then outcome 4 can also be tested at .05.
This procedure provides strong control of the FWER with four primary endpoints.
On May 15, 2015, the data and safety monitoring board recommended stopping the trial.10 At that time, the numbers
of events in the immediate vs deferred arms were: outcome 1: 42 of 2326 patients vs 96 of 2359 (P < .001); outcome 2:
14 vs 50 (P < .001); outcome 3: 29 vs 47 (P = .04); outcome 4: 12 vs 21 (P = .13). For the purpose of illustrating the Bretz
et al approach to transferring alpha, we ignore the early stopping and pretend that there was no monitoring. Then the
first three endpoints could be declared statistically significant because all .05 of the alpha is transferred from endpoint 1
to endpoint 2 to endpoint 3. The FWER remains strongly protected. With the more conservative Bonferroni method, the
third endpoint could not be declared statistically significant.
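The transfer rule just illustrated is a fixed-sequence procedure, a special case of the graphical approach. A minimal sketch (ours), applied to the reported START P-values while ignoring the interim monitoring as in the text:

```python
def fixed_sequence(pvalues, alpha=0.05):
    """Fixed-sequence (full alpha transfer) testing in a prespecified order."""
    decisions = []
    for p in pvalues:
        if p <= alpha:
            decisions.append(True)     # significant: pass the full alpha to the next endpoint
        else:
            decisions.append(False)
            break                      # later endpoints can no longer be declared significant
    return decisions + [False] * (len(pvalues) - len(decisions))

# START P-values for outcomes 1-4 (the first two were reported only as < .001)
print(fixed_sequence([0.001, 0.001, 0.04, 0.13]))   # [True, True, True, False]
```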
5 SUMMARY
Clinical trials often demand strict adjustment for multiple comparisons to protect the FWER regardless of which null
hypotheses are true (ie, strong control of the FWER). In simple settings, direct arguments can be used to demonstrate
strong control of the FWER by considering all possible configurations of parameters. Such simple arguments are often
more persuasive than invoking the closed testing principle, which is powerful but mystical to the uninitiated. In proofs
intended to be rigorous, steer clear of arguments that condition on sample results, such as: “given that the global null
hypothesis was rejected, either an error was made or the number of true hypotheses is at most k − 1.” Such arguments
are good for developing intuition about procedures, but not for rigorous proofs.
ORCID
Michael A. Proschan https://orcid.org/0000-0002-9161-3739
REFERENCES
1. Hochberg Y, Tamhane AC. Multiple Comparison Procedures. New York, NY: John Wiley & Sons; 1987.
2. Proschan MA. Multiple comparisons. In: D'Agostino R, Sullivan L, Massaro J, eds. Encyclopedia of Clinical Trials. New York, NY: John
Wiley & Sons; 2008.
3. Fisher RA. The Design of Experiments. Edinburgh, UK: Oliver & Boyd; 1935.
4. Snedecor GW, Cochran WG. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press; 1989.
5. Newman D. The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard
deviation. Biometrika. 1939;31:20-30.
6. Keuls M. The use of the “studentized range” in connection with an analysis of variance. Euphytica. 1952;1:112-122.
7. Marcus R, Peritz E, Gabriel KR. On closed testing procedures with special reference to ordered analysis of variance. Biometrika.
1976;63:655-660.
8. Bretz F, Maurer W, Brannath W, Posch M. A graphical approach to sequentially rejective multiple test procedures. Stat Med.
2009;28:586-604.
9. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65-70.
10. The INSIGHT START Study Group. Initiation of antiretroviral therapy in early asymptomatic HIV infection. N Engl J Med.
2015;373:795-807.
How to cite this article: Proschan MA, Brittain EH. A primer on strong vs weak control of familywise
error rate. Statistics in Medicine. 2020;39:1407–1413. https://doi.org/10.1002/sim.8463