Add more wide-float functionality by Noam2Stein · Pull Request #258 · Lokathor/wide

Noam2Stein · 2026-04-17T11:18:38Z

Added functions: `trunc`, `fract`, `signum`, `midpoint`, `clamp`, `fast_clamp`, `reduce_mul`, `exp2`.

Because `min` and `max` already sacrifice performance to be consistent with `std`, `clamp` is also consistent with `std` and asserts that `min <= max`. For callers that do not want to pay for that check, there is also a `fast_clamp` function that skips it.
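To make the trade-off concrete, here is a minimal sketch of the two variants. It uses a plain `[f32; 4]` array as a hypothetical stand-in for a SIMD vector; the names `Lanes` and `clamp_checked` are assumptions for illustration and are not taken from the crate.

```rust
// Hypothetical stand-in for a 4-lane SIMD float vector; the real crate's
// types and method signatures may differ.
type Lanes = [f32; 4];

/// Lane-wise clamp without any validation of the bounds.
/// If `min > max` in some lane, the result for that lane is not meaningful.
pub fn fast_clamp(x: Lanes, min: Lanes, max: Lanes) -> Lanes {
    let mut out = x;
    for i in 0..4 {
        // Same comparison order as scalar `f32::clamp`, so a NaN input
        // lane passes through unchanged.
        if out[i] < min[i] {
            out[i] = min[i];
        }
        if out[i] > max[i] {
            out[i] = max[i];
        }
    }
    out
}

/// Lane-wise clamp that follows the `f32::clamp` contract: it panics when
/// a `min` lane exceeds the matching `max` lane or when a bound is NaN.
pub fn clamp_checked(x: Lanes, min: Lanes, max: Lanes) -> Lanes {
    for i in 0..4 {
        assert!(min[i] <= max[i], "min > max, or a bound is NaN, in lane {}", i);
    }
    fast_clamp(x, min, max)
}
```

With valid bounds both functions return the same lanes; they differ only in whether invalid bounds are detected.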

Noam2Stein · 2026-04-17T19:27:28Z

On second thought, an assertion in `clamp` is likely not useful for most use cases. If you think the assertion should be removed, I will remove it along with `fast_clamp`, which would then be identical to `clamp`.

Noam2Stein · 2026-04-23T13:31:05Z

@Lokathor I am tagging you in case you did not receive a notification from GitHub.

Lokathor · 2026-04-23T15:21:57Z

I'll try to merge this soon, but I have had limited time lately. Definitely not forgotten though.

Lokathor · 2026-04-23T15:57:09Z

This looks good, but for the midpoint methods, would it be better to multiply by 0.5 instead of dividing by 2.0? I know the output is basically the same, but then you get a single multiply instruction instead of a whole division sequence, right? I'm not sure whether that's one of those minor optimizations that LLVM does on its own or one that we should write into the code.
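As a side note on "basically the same output": because 0.5 is an exact power of two, `x * 0.5` and `x / 2.0` are both the correctly rounded value of the same real number under IEEE 754, so they agree bit for bit. The sketch below illustrates this on a plain array. The formula `(a + b) * 0.5` and the function names are assumptions for illustration, not the crate's actual implementation.

```rust
/// Hypothetical lane-wise midpoint using a multiply by 0.5.
/// Note that `a + b` can overflow for very large inputs; this sketch
/// does not try to avoid that.
pub fn midpoint_mul(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = (a[i] + b[i]) * 0.5;
    }
    out
}

/// The same computation written with a division by 2.0, for comparison.
pub fn midpoint_div(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = (a[i] + b[i]) / 2.0;
    }
    out
}
```

Comparing the two results with `to_bits()` on each lane is a simple way to confirm that switching to the multiply does not change any output.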

Noam2Stein · 2026-04-23T18:27:48Z

I had always assumed multiplication and division cost the same since each has its own instruction, but I could obviously be wrong. I will change it.

Also, please comment on what you think the behavior of `clamp` should be.

Lokathor · 2026-04-24T18:22:38Z

Looking at some Stack Overflow discussions, it seems that a floating-point multiply will never be slower than a floating-point divide, and depending on the architecture it will sometimes be faster. So let's keep the mul.

For `clamp`, I think a `debug_assert` instead of an `assert` hits the sweet spot between checking and speed. Then we don't need two versions. Does that sound good?
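A minimal sketch of what the `debug_assert` variant could look like, again on a plain `[f32; 4]` array rather than the crate's real types. The function name `clamp_debug_checked` is hypothetical.

```rust
/// Hypothetical lane-wise clamp with a debug-only bounds check.
/// `debug_assert!` is compiled out when debug assertions are disabled
/// (the default for release builds), so only debug builds pay for it.
pub fn clamp_debug_checked(x: [f32; 4], min: [f32; 4], max: [f32; 4]) -> [f32; 4] {
    let mut out = x;
    for i in 0..4 {
        debug_assert!(min[i] <= max[i], "min > max, or a bound is NaN, in lane {}", i);
        if out[i] < min[i] {
            out[i] = min[i];
        }
        if out[i] > max[i] {
            out[i] = max[i];
        }
    }
    out
}
```

In a release build this does the same work as an unchecked clamp, which is why a separate fast variant is no longer needed.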

Noam2Stein · 2026-04-25T05:49:59Z

I also added `Rem`, `div_euclid` and `rem_euclid`.

It does not seem possible to implement `Rem` with SIMD instructions. I tried to translate the implementation from libm, but it has loops. Perhaps the non-loop parts of the implementation could use SIMD instructions.

At least `div_euclid` and `rem_euclid` use SIMD instructions after calling `Rem`.
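The relationship between the three operations can be sketched as follows. This mirrors how scalar `f32::rem_euclid` and `f32::div_euclid` are defined in terms of `%`. It uses a plain array as a hypothetical stand-in for a SIMD vector, and the function names are assumptions; in a real SIMD version the per-lane `if` fix-ups would become compare-and-select operations.

```rust
type Lanes = [f32; 4];

/// Per-lane truncated remainder. This is the part that falls back to
/// scalar code in this sketch.
pub fn rem(a: Lanes, b: Lanes) -> Lanes {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] % b[i];
    }
    out
}

/// Euclidean remainder: take the truncated remainder, then add `|b|`
/// in every lane where it came out negative.
pub fn rem_euclid(a: Lanes, b: Lanes) -> Lanes {
    let r = rem(a, b);
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = if r[i] < 0.0 { r[i] + b[i].abs() } else { r[i] };
    }
    out
}

/// Euclidean quotient: truncate `a / b`, then step one further away
/// from zero in every lane whose truncated remainder is negative.
pub fn div_euclid(a: Lanes, b: Lanes) -> Lanes {
    let r = rem(a, b);
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        let q = (a[i] / b[i]).trunc();
        out[i] = if r[i] < 0.0 {
            if b[i] > 0.0 { q - 1.0 } else { q + 1.0 }
        } else {
            q
        };
    }
    out
}
```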

Noam2Stein · 2026-05-06T05:37:32Z

Is there anything to fix before this is merged? Asking just in case.

Lokathor · 2026-05-06T13:17:37Z

I just lost this a little down the pile. I'll try to release it soon.

Noam2Stein added 15 commits April 17, 2026 08:41

Apply rustfmt

99f4367

Some code was previously not formatted.

Add wide-float function trunc

981fc5c

Add wide-float function fract

c53630e

Add wide-float function signum

62636c9

Add wide-float function midpoint

8f3ec92

Add wide-float functions clamp and fast_clamp

7dbd796

Because `min` and `max` already sacrifice performance to be consistent with `std`, `clamp` is also consistent with `std` and has an assertion. Because of this there must also be a `fast_clamp` function that is faster.

Add wide-float function reduce_mul

21d075c

Add wide-float function exp2

ab691cf

Fix avx512f compilation errors

8740a31

Fix avx512f f64x8::reduce_mul test bug

caf59e0

Because the implementation swaps order of multiplication, the result is slightly different. The test now allows that. (`reduce_add` already allows slightly different outputs).
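Floating-point multiplication is not associative, so multiplying the same lanes in a different order can change the last bits of the result. The sketch below contrasts a left-to-right product with a pairwise (tree-shaped) product and compares them with a relative tolerance. The pairwise order shown is an assumption about how a SIMD reduction might group lanes, not the crate's actual code, and all names are hypothetical.

```rust
/// Left-to-right product, the order a simple scalar loop would use.
pub fn product_sequential(v: [f64; 8]) -> f64 {
    v.iter().fold(1.0, |acc, x| acc * x)
}

/// Pairwise product: halve the number of partial products each step,
/// similar to how a SIMD reduction might combine lanes.
pub fn product_pairwise(v: [f64; 8]) -> f64 {
    let a = [v[0] * v[4], v[1] * v[5], v[2] * v[6], v[3] * v[7]];
    let b = [a[0] * a[2], a[1] * a[3]];
    b[0] * b[1]
}

/// Relative comparison, so that a test accepts either evaluation order.
pub fn approx_eq(x: f64, y: f64, rel_tol: f64) -> bool {
    (x - y).abs() <= rel_tol * x.abs().max(y.abs())
}
```

A test that compares a reduction against a scalar reference with `approx_eq` rather than `==` stays valid regardless of which order the implementation picks.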

Increase exp2 test max difference

4e34278

The max allowed difference is now 1e-12 instead of 1e-15.

Add failure information to exp2 tests

be67233

Change exp2 test max difference

cda6578

Fix x86 compilation error

02dbd6f

`reduce_mul` incorrectly took an intrinsic from the `x86_64` module even though the correct module might be `x86`.

Fix import mistake

44a6fae

Use * 0.5 instead of / 2.0 for midpoint

7e55970

Noam2Stein added 3 commits April 25, 2026 07:35

Change the behaviour of clamp and remove fast_clamp

4e6181c

Implement Rem for float types

359f071

Add div_euclid and rem_euclid

9e737de

Merge branch 'main' into main

a77ce50

Lokathor merged commit e67769d into Lokathor:main May 6, 2026
25 checks passed

Lokathor merged 20 commits into Lokathor:main from Noam2Stein:main