Add more wide-float functionality #258

Merged: Lokathor merged 20 commits into Lokathor:main from Noam2Stein:main on May 6, 2026
Conversation

@Noam2Stein
Contributor

Added functions: `trunc`, `fract`, `signum`, `midpoint`, `clamp`, `fast_clamp`, `reduce_mul`, `exp2`.

For `clamp`: because `min` and `max` already sacrifice performance to be consistent with
`std`, `clamp` is also consistent with `std` and asserts that `min <= max`. Because of
this, there is also a `fast_clamp` function that skips the assertion and is faster.
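To make the trade-off concrete, here is a minimal sketch of the two variants. It uses a hypothetical `F32x4` wrapper around a plain array, not the crate's real types or intrinsics, and the per-lane loops stand in for whatever SIMD instructions the actual implementation uses.

```rust
/// Hypothetical 4-lane stand-in for a wide float type (not the crate's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// Checked clamp: like `f32::clamp`, panics if any lane has
    /// `min > max` or a NaN bound, then defers to the unchecked version.
    pub fn clamp(self, min: Self, max: Self) -> Self {
        for i in 0..4 {
            assert!(
                min.0[i] <= max.0[i],
                "clamp: lane {} has min > max or a NaN bound",
                i
            );
        }
        self.fast_clamp(min, max)
    }

    /// Unchecked clamp: no validation of the bounds.
    /// NaN lanes in `self` pass through unchanged, as with `f32::clamp`.
    pub fn fast_clamp(self, min: Self, max: Self) -> Self {
        let mut out = self.0;
        for i in 0..4 {
            if out[i] < min.0[i] {
                out[i] = min.0[i];
            }
            if out[i] > max.0[i] {
                out[i] = max.0[i];
            }
        }
        Self(out)
    }
}
```

With valid bounds the two functions return the same lanes; the only difference in this sketch is the up-front check.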

Some code was previously not formatted.

Because `min` and `max` already sacrifice performance to be consistent with
`std`, `clamp` is also consistent with `std` and has an assertion. Because of
this there must also be a `fast_clamp` function that is faster.

Because the implementation swaps order of multiplication, the result is slightly
different. The test now allows that. (`reduce_add` already allows slightly
different outputs).

The max allowed difference is now 1e-12 instead of 1e-15.

`reduce_mul` incorrectly took an intrinsic from the `x86_64` module even though
the correct module might be `x86`.
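
The `reduce_mul` tolerance change above follows from floating-point multiplication not being associative. A small sketch of the effect, using plain arrays instead of the crate's types; the pairwise order shown is only an assumed example of a reordered reduction, and treating the allowed difference as an absolute difference is likewise an assumption:

```rust
/// Multiplies the lanes left to right, as a scalar loop would.
pub fn product_sequential(v: [f64; 4]) -> f64 {
    v.iter().product()
}

/// Multiplies in a different order (pairs first), a shape that a SIMD
/// horizontal reduction might plausibly take. Hypothetical ordering.
pub fn product_pairwise(v: [f64; 4]) -> f64 {
    (v[0] * v[2]) * (v[1] * v[3])
}

/// Absolute-difference comparison, in the spirit of the relaxed test.
pub fn approx_eq(a: f64, b: f64, max_diff: f64) -> bool {
    (a - b).abs() <= max_diff
}
```

The two products may differ in their last bits because each intermediate multiplication rounds separately, so a test that compares them needs a tolerance rather than exact equality.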
@Noam2Stein
Contributor Author

On second thought, an assertion in `clamp` is likely not useful for most use cases. If you think the assertion should be removed, I will remove it along with `fast_clamp`, because without the assertion it is identical to `clamp`.

@Noam2Stein
Contributor Author

@Lokathor I am tagging you just in case you did not receive a notification from GitHub.

@Lokathor
Owner

I'll try to merge this soon, but I have had limited time lately. Definitely not forgotten though.

@Lokathor
Owner

This looks good, but for the `midpoint` methods, would it be better to multiply by 0.5 instead of dividing by 2.0? I know the output is basically the same, but then you get just one multiply instruction instead of a whole division sequence, right? I'm not sure whether that's one of those minor optimizations that LLVM does on its own or one that we should write into the code.
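
For reference, a sketch of the two formulations being compared, written on plain `[f32; 4]` arrays rather than the crate's types. The function names are hypothetical, and the naive `(a + b)` form is an assumption about the shape of the computation; it can overflow for very large inputs, which this sketch does not guard against.

```rust
use std::array::from_fn;

/// Midpoint using a division by 2.0 per lane.
pub fn midpoint_div(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    from_fn(|i| (a[i] + b[i]) / 2.0)
}

/// Midpoint using a multiplication by 0.5 per lane.
pub fn midpoint_mul(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    from_fn(|i| (a[i] + b[i]) * 0.5)
}
```

Under IEEE 754 default rounding, `x / 2.0` and `x * 0.5` are both the correctly rounded value of the same exact quantity, so for finite inputs the two versions return the same lanes and the choice only affects which instruction is emitted.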

@Noam2Stein
Contributor Author

I had always assumed multiplication and division have the same cost, since each has its own instruction, but that may well be wrong. I will change it.

Also, please comment on what you think the behavior of `clamp` should be.

@Lokathor
Owner

Looking at some Stack Overflow discussions, it seems that a floating-point multiply will never be slower than a floating-point divide, and depending on the architecture it will sometimes be faster. So let's keep the mul.

For `clamp`, I think a `debug_assert` instead of an `assert` hits the sweet spot between checking and speed. Then we don't need two versions. Does that sound good?
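
A rough sketch of what the single-version `clamp` with `debug_assert!` could look like, again on a plain `[f32; 4]` with a hypothetical function name rather than the crate's actual method. `debug_assert!` is compiled out when debug assertions are disabled, which is the default for release builds, so the check costs nothing there.

```rust
use std::array::from_fn;

/// Lane-wise clamp. The bounds are only validated when debug assertions
/// are enabled; with them disabled, invalid bounds are not detected.
pub fn clamp_lanes(v: [f32; 4], min: [f32; 4], max: [f32; 4]) -> [f32; 4] {
    debug_assert!(
        (0..4).all(|i| min[i] <= max[i]),
        "clamp: every lane needs min <= max (and no NaN bounds)"
    );
    from_fn(|i| {
        let mut x = v[i];
        if x < min[i] {
            x = min[i];
        }
        if x > max[i] {
            x = max[i];
        }
        x
    })
}
```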

@Noam2Stein
Contributor Author

I also added `Rem`, `div_euclid` and `rem_euclid`.

It seems it is not possible to implement `Rem` with SIMD instructions. I tried to translate the implementation from libm, but it has loops. Perhaps it is possible to use SIMD instructions for the non-loop parts of the implementation.

At least `div_euclid` and `rem_euclid` use SIMD instructions after calling `Rem`.
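
As an illustration of that structure, here is a sketch that computes the remainder per lane with the scalar `%` operator and then applies the Euclidean fix-ups lane-wise. It mirrors the logic of `f32::rem_euclid` and `f32::div_euclid` from `std` on plain `[f32; 4]` arrays; the function names are hypothetical, and the `if` expressions stand in for the compare-and-blend operations a SIMD version would use.

```rust
use std::array::from_fn;

/// Per-lane remainder via the scalar `%` operator (the non-SIMD part).
pub fn rem_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    from_fn(|i| a[i] % b[i])
}

/// Euclidean remainder: shift negative remainders up by |b|.
pub fn rem_euclid_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let r = rem_lanes(a, b);
    from_fn(|i| if r[i] < 0.0 { r[i] + b[i].abs() } else { r[i] })
}

/// Euclidean quotient: truncate a / b, then step one away from the
/// truncated value in lanes where the plain remainder is negative.
pub fn div_euclid_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let r = rem_lanes(a, b);
    from_fn(|i| {
        let q = (a[i] / b[i]).trunc();
        if r[i] < 0.0 {
            if b[i] > 0.0 { q - 1.0 } else { q + 1.0 }
        } else {
            q
        }
    })
}
```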

@Noam2Stein
Contributor Author

Is there anything to fix before this is merged? Asking just in case.

Lokathor merged commit e67769d into Lokathor:main on May 6, 2026 (25 checks passed)
@Lokathor
Owner

Lokathor commented May 6, 2026

This just got lost a little way down the pile. I'll try to release it soon.
