Soundness check: a few unprovable tasks in the benchmark

Hi @stefanzetzsche,

Thanks for the great contribution of releasing this benchmark of applying Dafny in classical mathematics verification tasks. The results are very impressive and suggest that Dafny-style verification tools could become a strong alternative to traditional automated theorem provers in certain domains.

While experimenting with the benchmark, I (with the help of AI tools) noticed that a few tasks appear to be unprovable. From my investigation, several of these issues seem to stem from errors introduced during the translation from Lean to Dafny, while others may already exist in the original Lean benchmark itself. Please double check since I might also made a few mistakes while understandting the original problems.

I have not yet gone through the entire benchmark systematically, but I would be very interested in collaborating with you to identify more issues, improve the translation pipeline, and potentially help build a more robust and automated benchmark for the community.

Here are the issues I found (descriptions polished with AI assistance).

##  A. translation bugs (14)
The Lean miniF2F original is correct but the Dafny translation introduced a false statement.

### A1. [`dataset/test/imo_1962_p2.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/imo_1962_p2.dfy)
```dafny
ensures -1.0 <= x < 1.0 - sqrt(31.0/8.0)
```
**Bug:** division pulled inside the radical. Lean: `1 - real.sqrt 31 / 8` (= `√31/8 ≈ 0.696`)

**Fix:** `ensures -1.0 <= x < 1.0 - sqrt(31.0)/8.0`.

### A2. [`dataset/test/imo_1985_p6.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/imo_1985_p6.dfy)
```dafny
requires forall x: real, n: nat | x >= 0.0 && n != 0 :: f(n+1)(x) == f(n)(x) * f(n)(x) + 1.0/(n as real)
```
**Bug:** dropped parentheses. Lean: `f (n+1) x = f n x * (f n x + 1/n)`; you wrote `f·f + 1/n` instead of `f·(f + 1/n)`.

**Fix:** `f(n+1)(x) == f(n)(x) * (f(n)(x) + 1.0/(n as real))`.


### A3. [`dataset/test/imo_1981_p6.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/imo_1981_p6.dfy)
```dafny
ensures forall y: nat :: f(4)(y+1) == Int.pow(2, f(4)(y+3)) - 3
```
**Bug:** `+3` moved inside the argument. Lean: `f 4 (y+1) = 2^(f 4 y + 3) - 3`

**Fix:** `f(4)(y+1) == Int.pow(2, f(4)(y) + 3) - 3`.

### A4. [`dataset/valid/mathd_numbertheory_202.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/valid/mathd_numbertheory_202.dfy)
```dafny
ensures Int.pow(19, 19) + Int.pow(99, 99) % 10 == 8
```
**Bug:** Lean: `(19^19+99^99)%10`.

**Fix:** `ensures (Int.pow(19,19) + Int.pow(99,99)) % 10 == 8`.

### A5. [`dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy)

```dafny
lemma algebra_amgm_sum1toneqn_prod1tonleq1(a: nat -> real, n: nat)
  requires Real.sum(range(n), k => a(k)) == n as real
  ensures Real.prod(range(n), k => a(k)) <= 1.0
```
**Bug:** 1Lean types `a : ℕ → nnreal`; this one uses `real`, dropping `a(k) ≥ 0`

**Fix:** add `requires forall k :: 0 <= k < n ==> a(k) >= 0.0`.

### A6. [`dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy)
```dafny
lemma algebra_amgm_prod1toneq1_sum1tongeqn(a: nat -> real, n: nat)
  requires Real.prod(range(n), k => a(k)) == 1.0
  ensures Real.sum(range(n), k => a(k)) >= n as real
```
**Bug:** as A5.

**Fix:** add `requires forall k :: 0 <= k < n ==> a(k) >= 0.0`.

### A7. [`dataset/test/mathd_numbertheory_227.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_numbertheory_227.dfy)
```dafny
lemma mathd_numbertheory_227(x: nat, y: nat, n: nat)
  requires x/4 + y/6 == (x+y)/n
  ensures n == 5
```
**Bug:** lossy real→int coercion.  `(x,y,n)=(1,1,3)` — all integer quotients are `0`.

**Fix:** make `x, y, n` real (or cast) so the equation uses real division.

### A8. [`dataset/test/amc12b_2020_p2.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/amc12b_2020_p2.dfy)
```dafny
ensures ((100*100 - 7*7)) / (70*70 - 11*11) * ((70 - 11) * (70 + 11) / ((100 - 7) * (100 + 7))) == 1
```
**Bug:** same lossy real→int coercion. `9951 / 4779 = 2` and `4779 / 9951 = 0` in integer division, so the product is `0 ≠ 1`.

**Fix:** evaluate the expression over `real`.

### A9. [`dataset/test/mathd_numbertheory_764.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_numbertheory_764.dfy)
```dafny
ensures Real.sum(set k: int | 1 <= k <= p-2, (k: int) => if k % p == 0 then 0.0 else 1.0/((k % p) as real) * 1.0/((k % p) as real + 1.0)) == 2.0
```
**Bug:** translation bug of `zmod`

**Fix:** the sum must be taken in `zmod p`, not `real`.

### A10. [`dataset/test/mathd_numbertheory_427.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_numbertheory_427.dfy)
```dafny
requires a == Int.sum(divisors(500), k => k)
ensures Int.sum(filter(prime, divisors(a as int)), k => k) == 25
```
### A11. [`dataset/valid/mathd_numbertheory_403.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/valid/mathd_numbertheory_403.dfy)
```dafny
ensures Int.sum(divisors(198), k => k) == 270
```
### A12. [`dataset/valid/mathd_numbertheory_543.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/valid/mathd_numbertheory_543.dfy)
```dafny
ensures Int.sum(divisors(Int.pow(30, 4)), k => 1) - 2 == 123
```
**Bug (A10–A12):** the library's `divisors(n: int)` returns **signed** divisors (both `+d` and `−d`), whereas Lean's `nat.divisors` / `nat.proper_divisors` are positive-only.

**Fix:** restrict the library `divisors` to positive divisors, or define extra functions for that.

### A13. [`dataset/test/imo_2007_p6.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/imo_2007_p6.dfy)
```dafny
requires Real.sum(range(100), x => f(x+1)*f(x+1)) == 1.0
ensures Real.sum(range(99), x => f(x+1)*f(x+1) + f(x+2) + f(100)*f(100)*f(1)) < 12.0/25.0
```
**Bug:** Lean: `∑ x in finset.range 99, ((a (x+1))^2 * a (x+2)) + (a 100)^2 * a 1 < 12/25`, with `a : ℕ → nnreal`.

**Fix:** `ensures Real.sum(range(99), x => f(x+1)*f(x+1)*f(x+2)) + f(100)*f(100)*f(1) < 12.0/25.0`, and add `requires forall i :: 1 <= i <= 100 ==> f(i) >= 0.0`.

### A14. [`dataset/test/amc12a_2020_p15.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/amc12a_2020_p15.dfy)
```dafny
ensures Complex.Real.abs(Complex.sub(a, b)) <= 2.0 * sqrt(21.0)
```
**Bug:** It does not resolve.

These are false in the Lean miniF2F source too, and Dafny extend the same error.

### B1. [`dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy)
```dafny
ensures !((exists i, j :: a == 2*i && b == 2*j) <==> (exists k :: a*a + b*b == 8*k))
```
**Counterexample:** `(a,b)=(0,0)` 

### B2. [`dataset/test/mathd_numbertheory_728.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_numbertheory_728.dfy)
```dafny
ensures (Int.pow(29, 13) - Int.pow(5, 13)) % 7 == 0
```
**bug:** the Lean theorem states the identical (false) equation
`(29^13 - 5^13) % 7 = 0`. `29 ≡ 1 (mod 7)` ⇒ `29¹³ ≡ 1`; `5¹³ ≡ 5`; so
`(29¹³−5¹³) % 7 = 3 ≠ 0`.

### B3. [`dataset/valid/mathd_numbertheory_155.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/valid/mathd_numbertheory_155.dfy)
```dafny
ensures |filter(x => x % 19 == 7, set x | 100 <= x <= 999)| == 52
```
**bug:** the Lean theorem also claims the count is `52`. The actual count of `x ∈ [100,999]` with `x % 19 == 7` is **48** (`x = 102, 121, …, 995`).

### B4. [`dataset/test/mathd_algebra_276.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_algebra_276.dfy)
```dafny
requires forall x: real :: 10.0*x*x - x - 24.0 == ((a as real)*x - 8.0) * ((b as real)*x + 3.0)
ensures a + b == 12
```
**Counterexample:** `(a,b)=(5,2)`.

### B5. [`dataset/test/mathd_numbertheory_618.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/mathd_numbertheory_618.dfy)
```dafny
lemma mathd_numbertheory_618(n: nat, p: nat -> nat)
  requires forall x: nat :: p(x) == x*x - x + 41
  requires 1 < gcd(p(n), p(n+1))
  ensures 41 <= n
```
**Counterexample:** `n=0`.

### B6. [`dataset/test/amc12_2000_p1.dfy`](https://github.com/dafny-lang/miniF2F/blob/f8db1a6838dceb9067e05b72e49d62a3acb5fcaa/dataset/test/amc12_2000_p1.dfy)
```dafny
requires i != 0
requires m != 0
requires o != 0
requires i*m*o == 2001
ensures i+m+o <= 671
```
**Counterexample:** `(i,m,o)=(1,1,2001)`.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Soundness check: a few unprovable tasks in the benchmark #8

A. translation bugs (14)

A1. `dataset/test/imo_1962_p2.dfy`

A2. `dataset/test/imo_1985_p6.dfy`

A3. `dataset/test/imo_1981_p6.dfy`

A4. `dataset/valid/mathd_numbertheory_202.dfy`

A5. `dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy`

A6. `dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy`

A7. `dataset/test/mathd_numbertheory_227.dfy`

A8. `dataset/test/amc12b_2020_p2.dfy`

A9. `dataset/test/mathd_numbertheory_764.dfy`

A10. `dataset/test/mathd_numbertheory_427.dfy`

A11. `dataset/valid/mathd_numbertheory_403.dfy`

A12. `dataset/valid/mathd_numbertheory_543.dfy`

A13. `dataset/test/imo_2007_p6.dfy`

A14. `dataset/test/amc12a_2020_p15.dfy`

B1. `dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy`

B2. `dataset/test/mathd_numbertheory_728.dfy`

B3. `dataset/valid/mathd_numbertheory_155.dfy`

B4. `dataset/test/mathd_algebra_276.dfy`

B5. `dataset/test/mathd_numbertheory_618.dfy`

B6. `dataset/test/amc12_2000_p1.dfy`

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Soundness check: a few unprovable tasks in the benchmark #8

Description

A. translation bugs (14)

A1. dataset/test/imo_1962_p2.dfy

A2. dataset/test/imo_1985_p6.dfy

A3. dataset/test/imo_1981_p6.dfy

A4. dataset/valid/mathd_numbertheory_202.dfy

A5. dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy

A6. dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy

A7. dataset/test/mathd_numbertheory_227.dfy

A8. dataset/test/amc12b_2020_p2.dfy

A9. dataset/test/mathd_numbertheory_764.dfy

A10. dataset/test/mathd_numbertheory_427.dfy

A11. dataset/valid/mathd_numbertheory_403.dfy

A12. dataset/valid/mathd_numbertheory_543.dfy

A13. dataset/test/imo_2007_p6.dfy

A14. dataset/test/amc12a_2020_p15.dfy

B1. dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy

B2. dataset/test/mathd_numbertheory_728.dfy

B3. dataset/valid/mathd_numbertheory_155.dfy

B4. dataset/test/mathd_algebra_276.dfy

B5. dataset/test/mathd_numbertheory_618.dfy

B6. dataset/test/amc12_2000_p1.dfy

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

A1. `dataset/test/imo_1962_p2.dfy`

A2. `dataset/test/imo_1985_p6.dfy`

A3. `dataset/test/imo_1981_p6.dfy`

A4. `dataset/valid/mathd_numbertheory_202.dfy`

A5. `dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy`

A6. `dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy`

A7. `dataset/test/mathd_numbertheory_227.dfy`

A8. `dataset/test/amc12b_2020_p2.dfy`

A9. `dataset/test/mathd_numbertheory_764.dfy`

A10. `dataset/test/mathd_numbertheory_427.dfy`

A11. `dataset/valid/mathd_numbertheory_403.dfy`

A12. `dataset/valid/mathd_numbertheory_543.dfy`

A13. `dataset/test/imo_2007_p6.dfy`

A14. `dataset/test/amc12a_2020_p15.dfy`

B1. `dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy`

B2. `dataset/test/mathd_numbertheory_728.dfy`

B3. `dataset/valid/mathd_numbertheory_155.dfy`

B4. `dataset/test/mathd_algebra_276.dfy`

B5. `dataset/test/mathd_numbertheory_618.dfy`

B6. `dataset/test/amc12_2000_p1.dfy`