Add more wide-float functionality #258
Some code was previously not formatted.
Because `min` and `max` already sacrifice performance to be consistent with `std`, `clamp` is also consistent with `std` and asserts that `min <= max`. For callers who do not want that overhead, there is also a faster `fast_clamp` function.
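For illustration, the difference between the two variants might look like the minimal sketch below. `F32x4` here is a hypothetical array-backed stand-in, not the crate's actual type, and the bodies are an assumption about the general shape rather than the code in this PR. NaN handling is deliberately left out.

```rust
/// Hypothetical stand-in for a 4-lane float vector (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// Fills every lane with the same value.
    pub fn splat(v: f32) -> Self {
        F32x4([v; 4])
    }

    /// Lane-wise clamp without any validity check on the bounds.
    pub fn fast_clamp(self, min: Self, max: Self) -> Self {
        let mut out = self.0;
        for i in 0..4 {
            out[i] = out[i].max(min.0[i]).min(max.0[i]);
        }
        F32x4(out)
    }

    /// Lane-wise clamp that, like `f32::clamp`, panics unless `min <= max`.
    /// In this sketch the check is applied to every lane.
    pub fn clamp(self, min: Self, max: Self) -> Self {
        for i in 0..4 {
            assert!(min.0[i] <= max.0[i], "clamp requires min <= max");
        }
        self.fast_clamp(min, max)
    }
}
```

With valid bounds both functions return the same lanes; only `clamp` panics when a lower bound exceeds its upper bound.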
Because the implementation swaps the order of multiplication, the result is slightly different. The test now allows for that (`reduce_add` already allows slightly different outputs).
The max allowed difference is now 1e-12 instead of 1e-15.
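To show why a tolerance is needed at all, here is a minimal sketch comparing two multiplication orders. The function names are hypothetical, the pairwise order is only one possible reduction order, and treating the tolerance as an absolute difference is an assumption.

```rust
/// Multiplies the lanes strictly in index order.
pub fn product_sequential(lanes: [f64; 4]) -> f64 {
    ((lanes[0] * lanes[1]) * lanes[2]) * lanes[3]
}

/// Multiplies the lanes pairwise, one order a horizontal reduction might use.
pub fn product_pairwise(lanes: [f64; 4]) -> f64 {
    (lanes[0] * lanes[2]) * (lanes[1] * lanes[3])
}

/// Returns true when `a` and `b` differ by at most `max_diff` (absolute).
pub fn approx_eq(a: f64, b: f64, max_diff: f64) -> bool {
    (a - b).abs() <= max_diff
}
```

Floating-point multiplication is not associative, so the two products can differ in the last few bits. A check such as `approx_eq(a, b, 1e-12)` accepts that, while exact equality may not hold.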
`reduce_mul` incorrectly took an intrinsic from the `x86_64` module even though the correct module might be `x86`.
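As a sketch of the usual pattern for this kind of fix, the intrinsics module can be selected per target with `cfg`. The helper `reduce_mul4` and its body are hypothetical and are not the code in this PR. A scalar fallback is included so the example also builds on other targets.

```rust
// Import the intrinsics module that matches the compilation target.
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Hypothetical helper: product of four `f32` lanes using SSE.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse"))]
pub fn reduce_mul4(lanes: [f32; 4]) -> f32 {
    // SAFETY: the cfg above guarantees SSE is enabled at compile time,
    // and the pointer refers to four valid, readable f32 values.
    unsafe {
        let v = _mm_loadu_ps(lanes.as_ptr()); // [a, b, c, d]
        let swapped = _mm_shuffle_ps::<0b10_11_00_01>(v, v); // [b, a, d, c]
        let pairs = _mm_mul_ps(v, swapped); // [ab, ab, cd, cd]
        let high = _mm_movehl_ps(pairs, pairs); // [cd, cd, cd, cd]
        _mm_cvtss_f32(_mm_mul_ss(pairs, high)) // low lane: ab * cd
    }
}

/// Scalar fallback for targets without SSE, using the same pairing.
#[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse")))]
pub fn reduce_mul4(lanes: [f32; 4]) -> f32 {
    (lanes[0] * lanes[1]) * (lanes[2] * lanes[3])
}
```

Both `core::arch::x86` and `core::arch::x86_64` expose the SSE intrinsics used here under the same names, so only the `use` line needs to differ between the two targets.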

On second thought

@Lokathor I am tagging you in case you did not receive a notification from GitHub.

I'll try to merge this soon, but I have had limited time lately. Definitely not forgotten though.

This looks good, but for the

I have always thought multiplication and division have the same cost since they both have their own instructions, but I may well be wrong about that. I will change it. Also comment on what you think the behavior of

Looking at some Stack Overflow discussions, it seems that a floating-point mul will never be slower than a floating-point div, and depending on the architecture it will sometimes be faster. So let's keep the mul. For `clamp`, I think a `debug_assert` instead of an `assert` hits the sweet spot between checking and speed. Then we don't need two versions. Does that sound good?
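
A minimal sketch of the `debug_assert` idea, using a plain array as a hypothetical stand-in for the SIMD type (the name `clamp_lanes` is an assumption for illustration):

```rust
/// Lane-wise clamp whose bounds check only runs when debug assertions
/// are enabled (by default, in debug builds but not in release builds).
pub fn clamp_lanes(v: [f32; 4], min: [f32; 4], max: [f32; 4]) -> [f32; 4] {
    let mut out = v;
    for i in 0..4 {
        debug_assert!(min[i] <= max[i], "clamp requires min <= max");
        out[i] = out[i].max(min[i]).min(max[i]);
    }
    out
}
```

In a release build the check is compiled out, so a single function covers both the checked and the fast use case, at the price of no longer panicking on invalid bounds in release mode.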

I also added It seems it is not possible to implement At least

Is there something to fix before this is merged? Asking just in case.

I just lost this a little down the pile. I'll try to release it soon.

Added functions are: `trunc`, `fract`, `signum`, `midpoint`, `clamp`, `fast_clamp`, `reduce_mul`, `exp2`.
For `clamp`, because `min` and `max` already sacrifice performance to be consistent with `std`, `clamp` is also consistent with `std` and has an assertion for `min <= max`. Because of this there must also be a `fast_clamp` function that is faster.