ECE531 Lecture 3: Minimax Hypothesis Testing
D. Richard Brown III
Worcester Polytechnic Institute
05-February-2009
Simple Binary Bayesian Risks Under Different Priors
                                   r(δ, π0 ) = π0 R0 (δ) + (1 − π0 )R1 (δ)
[Figure: Bayes risk versus the prior π0. The fixed-rule risk r(δBπ0′, π0) is a line running from R1(δBπ0′) at π0 = 0 to R0(δBπ0′) at π0 = 1; the minimum risk r(δBπ0, π0) is its concave lower envelope, touching the line at π0 = π0′.]
Least Favorable Prior State Distribution
[Figure: the minimum Bayesian risk r(δBπ0, π0) versus π0, with its maximizer πlf marked; the equal-risk line r(δBπlf, π0) with R0(δBπlf) = R1(δBπlf) is tangent at πlf, and another line r(δBπ0′, π0) is tangent at π0′.]
Minimax Hypothesis Testing
Definition:
                                   ρmm := arg min_ρ max_j Rj(ρ)
Remarks:
  ◮   No single decision rule minimizes the weighted-average (Bayes) risk
      for every possible prior state distribution.
  ◮   A conservative approach is to minimize the worst case risk over all
      possible prior state distributions.
  ◮   Intuitively, there should be a least favorable prior. Does it always
      exist? Is it unique?
  ◮   Intuitively, the minimax decision rule should be the Bayesian decision
      rule with constant Bayesian risk over the priors. Is this always true?
Minimum Bayesian Risk as a Function of the Prior
Let V (π) := r(δBπ , π) be the minimum Bayesian risk for the prior π.
Theorem
The minimum Bayesian risk V(π) is concave and continuous over the space
of priors satisfying πj ≥ 0, j = 0, 1, . . . , N − 1, and Σj πj = 1.
Hence, there exists a unique least favorable prior

    πlf = arg max_π V(π).
Concavity of the Minimum Bayesian Risk
A function f is concave if, for any x, y in the domain of f and any
α ∈ [0, 1], f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y).
Denote a pair of priors as π and π′ and a third prior π′′ = απ + (1 − α)π′.
We can write

    V(π′′) = π′′⊤ R(δBπ′′)
           = α π⊤ R(δBπ′′) + (1 − α) π′⊤ R(δBπ′′)
           ≥ α V(π) + (1 − α) V(π′)

where the inequality holds because δBπ′′ is, in general, suboptimal for
the priors π and π′; hence V(π) is concave.
[Figure: a concave minimum-risk curve V over π0 with priors π0 < π0′′ < π0′ marked, showing V(π0′′) lying above the chord between V(π0) and V(π0′).]
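Since V is the pointwise minimum of the linear functions π ↦ π⊤R(δ), its concavity can also be checked numerically. A minimal sketch (not from the lecture), using the conditional-risk pairs of the coin-flipping example that appears later in these slides; the helper names are my own:

```python
# Conditional risk pairs (R0, R1) for the four deterministic rules of the
# coin-flipping example with cost matrix C = [[0, 100], [100, 0]].
RISKS = [(100, 0), (0, 100), (50, 0), (50, 100)]

def V(pi0):
    """Minimum Bayesian risk: lower envelope of the rules' Bayes risks."""
    return min(pi0 * r0 + (1 - pi0) * r1 for r0, r1 in RISKS)

# Midpoint concavity: V((p + q)/2) >= (V(p) + V(q))/2 for every prior pair,
# checked on a grid (small slack for floating-point rounding).
grid = [i / 100 for i in range(101)]
assert all(V((p + q) / 2) >= (V(p) + V(q)) / 2 - 1e-12
           for p in grid for q in grid)
```

The same envelope construction works for any finite set of decision rules; only the `RISKS` list changes.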
Continuity of the Minimum Bayesian Risk
Theorem (“A First Course in Optimization Theory” by R.K. Sundaram)
Let f : D → R be a concave function. Then, if D is open, f is continuous
on D. If D is not open, f is continuous on the interior of D.
Note that continuity does not imply differentiability.
The Four Possibilities
[Figure: the four possibilities for the shape of V(π0), each panel marking R0(δBπlf) and R1(δBπlf). Case 1: a differentiable interior maximum at π0lf. Case 2: a non-differentiable interior maximum at π0lf. Case 3: the maximum at the boundary π0lf = 0. Case 4: the maximum at the boundary π0lf = 1.]
Case 1: Differentiable Interior Maximum Risk
Theorem
If there exists a prior π′ such that the conditional risks satisfy
R0(δBπ′) = R1(δBπ′), then π′ is a least favorable prior and the minimax
decision rule is ρmm = δBπ′.
Proof.
Given a π′ satisfying R0(δBπ′) = R1(δBπ′). For any δ,

    max{R0(δ), R1(δ)} ≥ max_{π0 ∈ [0,1]} π0 R0(δ) + (1 − π0) R1(δ)
                      ≥ π0′ R0(δ) + (1 − π0′) R1(δ)
                      ≥ π0′ R0(δBπ′) + (1 − π0′) R1(δBπ′)
                      = R0(δBπ′) = R1(δBπ′)

Moreover, for any π0 ∈ [0, 1],

    V(π′) = π0′ R0(δBπ′) + (1 − π0′) R1(δBπ′)
          = π0 R0(δBπ′) + (1 − π0) R1(δBπ′) ≥ V(π0)
A Procedure for Finding the Minimax Decision Rule
 1. Find a Bayesian decision rule δBπ as a function of the prior π.
 2. See if Case 1 holds by solving for the unique least favorable prior πlf
    using the equalizer rule:
                                         R0 (δBπlf ) = R1 (δBπlf )
 3. If the solution exists, then set
                                                 ρmm = δBπlf
 4. If there is no solution to the equalizer rule, then see if Case 3 or 4
    holds by computing the risk at the endpoints π0lf = 0 and π0lf = 1.
 5. If neither endpoint is least favorable, then we must be in Case 2. In
    this case we must create a randomized minimax decision rule as a
    convex combination of two deterministic Bayes decision rules.
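For a binary Gaussian problem of the kind treated on the next slides, step 2 can be carried out numerically. This is an illustrative sketch, not part of the lecture, assuming parameters a0 = 0, a1 = 2, σ = 1 and my own function names:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def conditional_risks(pi0, a0=0.0, a1=2.0, sigma=1.0):
    """R0 and R1 of the threshold Bayes rule designed for prior pi0."""
    gamma = (a0 + a1) / 2 + sigma**2 / (a1 - a0) * math.log(pi0 / (1 - pi0))
    return Q((gamma - a0) / sigma), Q((a1 - gamma) / sigma)

def least_favorable_prior(tol=1e-12):
    """Bisection on R0 - R1, which decreases from +1 to -1 as pi0: 0 -> 1."""
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        r0, r1 = conditional_risks(mid)
        lo, hi = (mid, hi) if r0 > r1 else (lo, mid)
    return (lo + hi) / 2

pi_lf = least_favorable_prior()  # symmetric problem, so pi_lf = 1/2
```

The same bisection works for any binary problem whose risk difference R0 − R1 is monotone in the prior; only `conditional_risks` changes.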
Example: Coherent Detection of BPSK
Our Bayes decision rule for coherent BPSK with prior π0, π1 = 1 − π0 is

    δBπ(y) = 1    if y > γ
             0/1  if y = γ
             0    if y < γ

where γ := (a0 + a1)/2 + (σ²/(a1 − a0)) ln(π0/π1).
The conditional risks are

    R0(δBπ) = Q((γ − a0)/σ)
    R1(δBπ) = Q((a1 − γ)/σ)

where Q(x) := ∫x^∞ (1/√(2π)) e^(−t²/2) dt.
Let’s try the equalizer rule. What value of γ gives us R0 (δBπ ) = R1 (δBπ )?
Example: Coherent Detection of BPSK
Answer: R0 = R1 when γ = (a0 + a1)/2. Hence

    ρmm(y) = 1    if y > (a0 + a1)/2
             0/1  if y = (a0 + a1)/2
             0    if y < (a0 + a1)/2

[Figure: the observation axis split at γ = (a0 + a1)/2, midway between a0 and a1, into the decision regions Y0 (decide 0) and Y1 (decide 1).]
What does this imply about the least favorable prior?
Answer: π0 = π1 = 1/2.
Given a0 , a1 , and σ, the minimax rule allows you to guarantee a
worst-case risk over all priors.
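A quick Monte Carlo sanity check (illustrative, with assumed values a0 = 0, a1 = 2, σ = 1) confirms that the midpoint threshold equalizes the two conditional risks at Q((a1 − a0)/(2σ)):

```python
import math
import random

random.seed(1)
a0, a1, sigma = 0.0, 2.0, 1.0   # assumed signal levels and noise std
gamma = (a0 + a1) / 2           # minimax (equalizer) threshold
N = 200_000

# Empirical conditional error rates of the threshold rule under each hypothesis.
r0 = sum(a0 + sigma * random.gauss(0, 1) > gamma for _ in range(N)) / N
r1 = sum(a1 + sigma * random.gauss(0, 1) < gamma for _ in range(N)) / N

# Both should be close to Q((a1 - a0) / (2 * sigma)), i.e. Q(1) here.
q = 0.5 * math.erfc((a1 - a0) / (2 * sigma) / math.sqrt(2))
```

With these parameters both empirical rates land within Monte Carlo error of Q(1) ≈ 0.1587, the guaranteed worst-case risk.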
Example: Coherent Detection of BPSK
[Figure: the Bayes risk curves for the BPSK example — the minimum Bayesian risk r(δBπ0, π0), its maximizer πlf, and the constant-risk line r(δBπlf, π0) of the minimax rule with R0(δBπlf) = R1(δBπlf).]
Cases 3-4: Maximum Risk Occurs at Boundary
Let’s return to our coin flipping problem from Lecture 1 (H0 ↔ x0 = HT
and H1 ↔ x1 = HH) with a modified cost matrix

    C = [  0   100 ]
        [ 100   60 ]
[Figure: Bayes risk versus the prior π0 for deterministic rules 1–4 and their minimum (lower envelope). The envelope follows rule 3 for small π0 and rule 2 for large π0, and attains its maximum of 60 at the boundary π0 = 0.]
Cases 3-4: Maximum Risk Occurs at Boundary
Remarks:
  ◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.
  ◮ Note that the equalizer rule would give no solution to this problem:
          R0(D1) = 100   and   R1(D1) = 60
          R0(D2) = 0     and   R1(D2) = 100
          R0(D3) = 50    and   R1(D3) = 60
          R0(D4) = 50    and   R1(D4) = 100
      No decision rule gives R0 = R1 .
  ◮   In this example, the least favorable prior (maximizing the minimum
      risk) is π0 = 0, or that the coin is always HH. This should make sense.
  ◮   The minimax decision rule is Rule 3: observe T, decide the coin is
      fair; observe H, decide the coin is unfair.
  ◮   You can guarantee a worst-case risk of $60 by using Rule 3.
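These conditional risks follow directly from the cost matrix and the likelihoods P(H | HT) = 1/2 and P(H | HH) = 1. A minimal sketch reproducing them and the minimax choice (the rule encoding is my own, not from the lecture):

```python
# Cost matrix C[i][j]: cost of deciding H_i when H_j is true.
C = [[0, 100], [100, 60]]
# Likelihoods of the observations (T, H) under H0 (coin HT) and H1 (coin HH).
P = [[0.5, 0.5], [0.0, 1.0]]
# Deterministic rules as (decision on T, decision on H).
rules = {1: (1, 1), 2: (0, 0), 3: (0, 1), 4: (1, 0)}

def conditional_risk(rule, j):
    """Expected cost under H_j: sum over observations y of P(y|H_j)*cost."""
    return sum(P[j][y] * C[rule[y]][j] for y in range(2))

risks = {k: (conditional_risk(r, 0), conditional_risk(r, 1))
         for k, r in rules.items()}
# Minimax deterministic rule: minimize the worst-case conditional risk.
best = min(risks, key=lambda k: max(risks[k]))
```

Here `best` is Rule 3 with worst-case risk 60, matching the remarks above.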
Case 2: Non-Differentiable Interior Maximum
Back to our original coin flipping problem with cost matrix

    C = [  0   100 ]
        [ 100   0  ]
[Figure: Bayes risk versus the prior π0 for rules 1–4 and their minimum. The lower envelope now has a non-differentiable interior maximum where the lines for rules 2 and 3 cross.]
Case 2: Non-Differentiable Interior Maximum
Remarks:
 ◮    Rules 2 and 3, depending on the prior, minimize the Bayes risk.
 ◮    Again, the equalizer rule gives no deterministic solution since
          R0(D1) = 100   and   R1(D1) = 0
          R0(D2) = 0     and   R1(D2) = 100
          R0(D3) = 50    and   R1(D3) = 0
          R0(D4) = 50    and   R1(D4) = 100
  ◮    In this example, the least favorable prior (maximizing the minimum
       risk) is π0 = 2/3, or that the coin is HT with probability 2/3.
  ◮    The minimax decision rule is neither Rule 2 nor Rule 3.
  ◮    You can guarantee a worst-case risk of $100/3 by using a randomized
       decision rule that is a combination of Rules 2 and 3.
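A small exact-arithmetic check of these remarks (illustrative, not from the lecture): the lower envelope of the rules' Bayes risks peaks where the lines for rules 2 and 3 cross.

```python
from fractions import Fraction as F

# (R0, R1) pairs for rules 1-4 under cost matrix C = [[0, 100], [100, 0]].
RISKS = {1: (F(100), F(0)), 2: (F(0), F(100)),
         3: (F(50), F(0)), 4: (F(50), F(100))}

def bayes_risk(k, pi0):
    """Bayes risk of rule k at prior pi0 (exact rational arithmetic)."""
    r0, r1 = RISKS[k]
    return pi0 * r0 + (1 - pi0) * r1

# Rules 2 and 3 cross where 100*(1 - pi0) = 50*pi0, i.e. pi0 = 2/3.
pi_lf = F(2, 3)
assert bayes_risk(2, pi_lf) == bayes_risk(3, pi_lf) == F(100, 3)
# No other rule does better there, so V(2/3) = 100/3 = worst-case risk $100/3.
assert min(bayes_risk(k, pi_lf) for k in RISKS) == F(100, 3)
```

Exact rationals avoid any floating-point ambiguity at the kink.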
Case 2: Non-Differentiable Interior Maximum
Problem: Find a randomized decision rule that satisfies the equalizer rule
[Figure: the Bayes risk plot for rules 1–4 annotated at the kink πlf: the two Bayes rules δBπlf− (with R0(δBπlf−) = 50, R1(δBπlf−) = 0) and δBπlf+ (with R0(δBπlf+) = 0, R1(δBπlf+) = 100), and the minimax risk r(ρmm) = V(πlf).]
Case 2: Non-Differentiable Interior Maximum
Our randomized minimax decision rule is

    ρmm = α δBπlf− + (1 − α) δBπlf+

We can calculate the randomization α ∈ [0, 1] by applying the equalizer
rule:

    α R0(δBπlf−) + (1 − α) R0(δBπlf+) = α R1(δBπlf−) + (1 − α) R1(δBπlf+)

which gives the solution

    α = [R1(δBπlf+) − R0(δBπlf+)] / {[R0(δBπlf−) − R1(δBπlf−)] − [R0(δBπlf+) − R1(δBπlf+)]}
      = V′(πlf+) / [V′(πlf+) − V′(πlf−)]
Case 2: Non-Differentiable Interior Maximum
What is the minimax decision rule in our example?
    V′(πlf+) = −100
    V′(πlf−) = 50

hence α = 2/3. If H0 is the hypothesis that the coin is HT, H1 is the
hypothesis that the coin is HH, and the observations are y0 = T, y1 = H,
then our deterministic decision rules 2 and 3 can be written as

    D3 = δBπlf− = [ 1 0 ]    and    D2 = δBπlf+ = [ 1 1 ]
                  [ 0 1 ]                         [ 0 0 ]

The minimax decision rule is then given by

    ρmm(y = T) = (2/3)[1, 0]ᵀ + (1/3)[1, 0]ᵀ = [1, 0]ᵀ
                 (T → always decide HT)
    ρmm(y = H) = (2/3)[0, 1]ᵀ + (1/3)[1, 0]ᵀ = [1/3, 2/3]ᵀ
                 (H → randomize: Pr(decide HT) = 1/3, Pr(decide HH) = 2/3)
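The α formula from the previous slide can be verified for this example with a few lines (illustrative sketch, not part of the lecture):

```python
# Conditional risks of the two Bayes rules at the kink:
# delta- = Rule 3 with risks (50, 0), delta+ = Rule 2 with risks (0, 100).
r0m, r1m = 50.0, 0.0     # R0(delta-), R1(delta-)
r0p, r1p = 0.0, 100.0    # R0(delta+), R1(delta+)

# Randomization weight from the equalizer-rule solution on the slide above.
alpha = (r1p - r0p) / ((r0m - r1m) - (r0p - r1p))  # = 100/150 = 2/3

# The randomized rule's conditional risks are equal: the minimax risk.
R0 = alpha * r0m + (1 - alpha) * r0p
R1 = alpha * r1m + (1 - alpha) * r1p
```

Both conditional risks come out to 100/3, the guaranteed worst-case risk quoted earlier.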
Final Remarks on Minimax Hypothesis Testing
 1. The objective of minimax hypothesis testing is to minimize your
    worst-case (maximum) risk over all possible prior state probabilities.
 2. Conservative approach but useful in scenarios when:
          ◮   the prior is unknown and/or
          ◮   you need to provide a maximum risk guarantee.
 3. Try the equalizer rule first!
 4. The least favorable prior falls at an endpoint (Case 3 or 4) only in
    unusual problems.
 5. Finite observation space Y implies that the minimum Bayes risk curve
    V is not going to be differentiable everywhere. Randomization is
    often necessary to obtain the minimax decision rule in these cases.
 6. Composite hypotheses:
          ◮   The equalizer rule is still valid.
          ◮   Checking “endpoints” is still valid (but more points to check).
          ◮   In the case of a non-differentiable interior maximum, finding the
              randomization can be difficult.