Add functionality inconsistently missing from types #264

Open
Noam2Stein wants to merge 21 commits into Lokathor:main from Noam2Stein:add-missing-functionality

@Noam2Stein
Contributor

This adds functionality that already exists for some types but is inconsistently missing from others. I did my best to find all missing functionality and either fix it or list it below.

Because a ton of functionality is missing, this PR is quite big. I am not sure if this should have been split into multiple PRs?

This also fixes a bug I found in u64x8::simd_lt and optimizes f64xN::round_int.

Added missing trait implementations:

  • Cmp... traits for many integer types (including Cmp...<T>).
  • impl Not for &Wide for all types but f32x4 which already had it.
  • impl Sum<...> for f32x8
  • impl From<&[T]> for types where it is missing.

Added functions missing from f64xN:

  • recip
  • recip_sqrt
  • trunc_int
  • fast_trunc_int
  • fast_round_int

Added functions missing from some integer types:

  • abs
  • unsigned_abs
  • any
  • all
  • none
  • transpose
  • is_negative
  • reduce_add
  • reduce_max
  • reduce_min
  • saturating_add
  • saturating_sub
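For readers unfamiliar with some of these names, here is a minimal scalar sketch of what a few of them compute, modelling a vector as a plain array of lanes. The helper names are hypothetical and are not part of wide; the wrapping behaviour of the horizontal sum is an assumption of this sketch.

```rust
// Scalar reference model: a "vector" is a plain array of 16 `i8` lanes.
// These helpers are hypothetical illustrations, not the `wide` API.

/// Lane-wise saturating addition: results clamp to `i8::MIN..=i8::MAX`
/// instead of wrapping around.
fn saturating_add_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Lane-wise `unsigned_abs`: unlike `abs`, `i8::MIN` maps to 128
/// because the result is stored in a `u8`.
fn unsigned_abs_lanes(a: [i8; 16]) -> [u8; 16] {
    a.map(i8::unsigned_abs)
}

/// Horizontal sum of all lanes (assumed here to wrap on overflow).
fn reduce_add_lanes(a: [i8; 16]) -> i8 {
    a.iter().fold(0i8, |acc, &x| acc.wrapping_add(x))
}

/// Horizontal maximum of all lanes.
fn reduce_max_lanes(a: [i8; 16]) -> i8 {
    a.iter().fold(i8::MIN, |acc, &x| acc.max(x))
}
```

A model like this can also serve as the expected value when testing a SIMD implementation lane by lane.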

Notes

The Sum trait was already implemented via a macro for all types but f32x8. I do not know if this was intentional or a mistake, but I cannot think of any optimization that can be made for f32x8.

Currently transpose is left non-optimized for 8-bit and 16-bit integers.

Questions

For f32x8 (maybe for other types as well?) the existing implementations of recip and recip_sqrt use the intrinsics _mm256_rcp_ps and _mm256_rsqrt_ps, which are documented as having "relative error". These functions should probably be cross-platform deterministic, so would it be better to replace those intrinsics with 1 / self?

Some integer types have inherent simd_... functions in addition to Cmp... trait implementations (e.g., i32x16). This is bad because it stops you from writing wide_value.simd_eq(scalar_value) (e.g., i32x16::ZERO.simd_eq(0)), which the trait implementations support but the inherent function that shadows the trait method does not. What should be done about this? (Removing functions is a breaking change.)
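To make the shadowing concrete, here is a minimal sketch with a toy type. `Lanes4` and this `CmpEq` trait are stand-ins written for illustration, not the real wide types; only the method resolution behaviour is the point.

```rust
// Toy stand-in for a wide type; not the real `wide::i32x4`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lanes4([i32; 4]);

// Hypothetical comparison trait, generic over the right-hand side.
trait CmpEq<Rhs = Self> {
    fn simd_eq(self, rhs: Rhs) -> Lanes4;
}

impl CmpEq for Lanes4 {
    fn simd_eq(self, rhs: Lanes4) -> Lanes4 {
        let mut out = [0i32; 4];
        for i in 0..4 {
            // All bits set for "true", all bits clear for "false".
            out[i] = if self.0[i] == rhs.0[i] { -1 } else { 0 };
        }
        Lanes4(out)
    }
}

impl CmpEq<i32> for Lanes4 {
    fn simd_eq(self, rhs: i32) -> Lanes4 {
        // Splat the scalar, then reuse the vector comparison.
        <Lanes4 as CmpEq<Lanes4>>::simd_eq(self, Lanes4([rhs; 4]))
    }
}

impl Lanes4 {
    // Inherent method with the same name. Method-call syntax finds this
    // before any trait method, so `v.simd_eq(..)` only accepts a `Lanes4`.
    fn simd_eq(self, rhs: Lanes4) -> Lanes4 {
        <Self as CmpEq<Lanes4>>::simd_eq(self, rhs)
    }
}

fn compare_with_scalar(v: Lanes4, s: i32) -> Lanes4 {
    // `v.simd_eq(s)` would be a type error here, because the inherent
    // method wins. The scalar impl is only reachable via a qualified path.
    <Lanes4 as CmpEq<i32>>::simd_eq(v, s)
}
```

Removing the inherent method from this sketch makes `v.simd_eq(s)` compile, which is the behaviour the trait implementations are meant to provide.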

Inconsistencies that were not fixed

These inconsistencies were not fixed, either because I was not sure what the correct solution is, or because they are extremely small and are listed only for completeness. This list includes every inconsistency that I found and did not fix:

  • Functions unpack_lo/hi have unclear names which mean different things depending on the type. u8x16 has similarly named functions unpack_low/high.

  • mul_widen and mul_keep_high

  • dot

  • mul_scale_round and mul_scale_round_n. These are tricky because existing implementations use intrinsics specific to i16.

  • Float sign_bit (does not count because it is deprecated).

  • i8xN functions swizzle, swizzle_relaxed, swizzle_half, swizzle_half_relaxed. These use intrinsics specific to i8.

  • Weird inconsistent conversions: i8x16::from_i16x16_saturate/truncate, i/u16x8::from_u8x16_low/high, i16x8::from_i8x16, f64xN::from_i32xN, f32xN::from_i32xN (missing for f64 and are similar to i32xN::round_float), f64x2::from_i32x4_lower2, u8x16::narrow_i16x8.

  • i8x16/i16x8::from_slice_unaligned are missing from other types and behave differently from Self::from(&[...]).

  • i32x8 implements From<&[i8]> which is not done for any other type.

  • impl From<i32x4> for f64x2, which is a non-obvious conversion that is not implemented for any other type.

Tests

Adding tests separately for each type leads to bugs and takes a long time.

How do you feel about changing tests to use macros? This could make it easier to add functionality consistently in the future.

Possible syntax:

#[test]
fn example() {
    for_simd_types!(|T, N| {
        // Repeated for all types.
        // Has access to type aliases and consts `T`, `Simd`, `N`
    });
    for_simd_types!(|T: f32, N| {
        // Repeated for `f32x4`, `f32x8` and `f32x16`.
    });
    for_simd_types!(|T: Float, N| {
        // Repeated for `f32` and `f64` types.
    });
    for_simd_types!(|T: Float, N: 4| {
        // Repeated for `f32x4` and `f64x4`.
    });
}
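As a rough idea of how such a macro could be built, here is a minimal macro_rules sketch. It is hypothetical: the type list is hard-coded, plain arrays stand in for the SIMD types, and the filter forms (`T: f32`, `N: 4`) and the `Simd` alias are not modelled.

```rust
// Hypothetical sketch: expands `$body` once per (element type, lane count)
// pair, with `$T` bound to a type alias and `$N` to a `usize` constant.
macro_rules! for_simd_types {
    // Entry point: `for_simd_types!(|T, N| { ... })`.
    (|$T:ident, $N:ident| $body:block) => {
        for_simd_types!(@each $T, $N, $body,
            (f32, 4), (f32, 8), (f64, 2), (f64, 4), (i32, 4), (i32, 8));
    };
    // Internal rule: one scoped block per listed pair.
    (@each $T:ident, $N:ident, $body:block, $(($ty:ty, $n:expr)),*) => {
        $({
            #[allow(dead_code)]
            type $T = $ty;
            #[allow(dead_code)]
            const $N: usize = $n;
            $body
        })*
    };
}

// Example use: sum the byte sizes of an array stand-in for every pair.
fn total_lane_bytes() -> usize {
    let mut total = 0;
    for_simd_types!(|T, N| {
        let v = [T::default(); N];
        total += std::mem::size_of_val(&v);
    });
    total
}
```

Because the body is pasted once per pair, a test written this way fails to compile as soon as one listed type lacks a function the body calls, which is one way to catch the kind of inconsistency this PR addresses.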

I hope refactoring existing tests to use this won't take too long, but doing so will probably find bugs.

Noam2Stein added 21 commits May 8, 2026 23:34
All missing `Cmp...` trait implementations were added. Some tests for existing
functions were missing. There was previously a bug in `u64x8::simd_lt`.
Previously `Not` was only implemented for references of 128-bit types.
The `Sum` trait was already implemented via a macro for all types but `f32x8`. I
do not know if `f32x8` was left out intentionally or by mistake, since i cannot
think of any optimization that can be made to this implementation.
For now 8-bit and 16-bit types do not have optimized implementations.
@RagnarGrootKoerkamp

As another wide user, thanks so much for this PR! I've been wanting to do this for forever but never found the time.
Just some drive-by thoughts/comments from me:

  • Yes the mixed inherent and trait impls for Cmp... are currently very annoying. I'd say it's worth a breaking change to drop the inherent implementations.
  • The macro-based tests sound like a good idea indeed. Would be great to check that all integer types support the same set of functions, and similar for float types.
  • I don't know much about floats, but recip_sqrt should probably use the intrinsic; that's what it's made for and this intrinsic is provided specifically because this operation can be done efficiently.
  • mul_widen is called widening_mul in the standard library.
  • Based on e.g. a044070, it looks like some functionality tests (not just does-it-compile tests) might be good?
  • Oh this bug in u64x8::simd_lt was quite bad!
  • You're adding a bunch of pub functions (like reduce variants). Should these be documented and/or moved into traits as well?

@Lokathor
Owner

I don't have time to check closely on this today, but I will probably have time tomorrow. I like what I see so far.

I do like the macro test idea.

@Noam2Stein
Contributor Author

It's great to hear this PR is helpful! I am working on a crate that uses wide, so I am filling in all the functionality I need.

Yes the mixed inherent and trait impls for Cmp... are currently very annoying. I'd say it's worth a breaking change to drop the inherent implementations.

About simd_... functions, there could be a better solution than removing the inherent functions. Instead, it is possible to make the traits pub(crate) and add public inherent functions like:

impl ... {
    #[expect(private_bounds)]
    pub fn simd_eq<Rhs>(self, rhs: Rhs) -> Self
    where
        Self: SimdEq<Rhs>,
    {
        SimdEq::simd_eq(self, rhs)
    }
}

This solution would still support both SIMD values and scalars for the second argument. This seems like a good solution, unless there is another use for these traits that I am missing?
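A self-contained version of that idea, as a sketch only: `Lanes4` is a toy type, the trait body is invented for illustration, and `#[allow(private_bounds)]` is used here instead of `#[expect(...)]` so the snippet also builds on toolchains where the lint does not fire.

```rust
// Toy stand-in for a wide type; not the real `wide` API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Lanes4(pub [i32; 4]);

// Crate-private trait: downstream crates can neither name nor implement it.
pub(crate) trait SimdEq<Rhs> {
    fn simd_eq(self, rhs: Rhs) -> Lanes4;
}

impl SimdEq<Lanes4> for Lanes4 {
    fn simd_eq(self, rhs: Lanes4) -> Lanes4 {
        let mut out = [0i32; 4];
        for i in 0..4 {
            out[i] = if self.0[i] == rhs.0[i] { -1 } else { 0 };
        }
        Lanes4(out)
    }
}

impl SimdEq<i32> for Lanes4 {
    fn simd_eq(self, rhs: i32) -> Lanes4 {
        // Splat the scalar and reuse the vector comparison.
        SimdEq::simd_eq(self, Lanes4([rhs; 4]))
    }
}

impl Lanes4 {
    // Public generic inherent method that forwards to the private trait,
    // so method-call syntax accepts every supported right-hand side.
    #[allow(private_bounds)]
    pub fn simd_eq<Rhs>(self, rhs: Rhs) -> Self
    where
        Self: SimdEq<Rhs>,
    {
        SimdEq::simd_eq(self, rhs)
    }
}
```

With this shape, both `v.simd_eq(other)` and `v.simd_eq(0)` resolve to the single inherent method, and the right-hand side type picks the trait implementation.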

I don't know much about floats, but recip_sqrt should probably use the intrinsic; that's what it's made for and this intrinsic is provided specifically because this operation can be done efficiently.

About recip, base functionality like +-*/ and recip should probably be cross-platform deterministic and consistent with std, as use cases that require cross-platform determinism rely on these. On the other hand, I initially thought the standard library's recip always returns the correctly rounded result, but the documentation does not seem to promise this, so I should check its results against 1.0 / x.

For recip_sqrt, since std does not have it, using the intrinsic would probably be fine if documentation about rounding errors is added.
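One way to run that kind of check is to measure the distance in units in the last place (ULPs) between a candidate reciprocal and 1.0 / x. The sketch below is scalar and hypothetical: `crude_recip` only mimics a low-precision estimate by discarding mantissa bits and is not how any hardware intrinsic behaves.

```rust
/// ULP distance between two finite `f32` values of the same sign.
fn ulp_distance(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}

/// Largest ULP distance between `candidate(x)` and `1.0 / x` over `samples`.
fn max_ulp_error(candidate: impl Fn(f32) -> f32, samples: &[f32]) -> u32 {
    samples
        .iter()
        .map(|&x| ulp_distance(candidate(x), 1.0 / x))
        .max()
        .unwrap_or(0)
}

/// Deliberately crude reciprocal: clears the low 12 mantissa bits of the
/// exact quotient to imitate a reduced-precision estimate.
fn crude_recip(x: f32) -> f32 {
    f32::from_bits((1.0 / x).to_bits() & !0xFFF)
}
```

Passing `f32::recip` as the candidate reports how std compares with 1.0 / x on the sampled inputs, and the same harness could be applied lane by lane to a SIMD recip.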

Based on e.g. a044070, it looks like some functionality tests (not just does-it-compile tests) might be good?

The result of unpack_lo/hi is indirectly tested via transpose. Since these functions don't have any edge cases, I don't think direct tests are needed unless they become public.

You're adding a bunch of pub functions (like reduce variants). Should these be documented and/or moved into traits as well?

All functions added (unless I made a mistake) already existed for some types but were missing from others. For example, reduce_add already existed for i32x4 but not for i8x16.

Documentation is missing from the entire crate, so maybe fixing that should be a separate PR?

About traits, these functions should probably only be in traits if the entire API is moved into traits. That is possible, but personally I dislike the idea.

@RagnarGrootKoerkamp

About simd_... functions

Hmm. Indeed I use wide::CmpGt::simd_gt and such a lot, but it seems I do not actually use the traits themselves as bounds, so removing the traits and using (generic) inherent functions might just be fine?

Maybe it would then be better to have a uNxL trait that encapsulates the common logic of all integer types, but that's a separate thing.

@Lokathor
Owner

Possibility: If we change the existing inherent methods to take Into instead of Self, that wouldn't be a breaking change I think? Then we can have a From impl for scalar values into wide values. I think those From impls mostly already exist, but might not for all types.
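A minimal sketch of that shape, using a toy `Lanes4` type rather than the real wide types. The `From<i32>` impl here is written for the example and is an assumption about how a scalar would be splatted.

```rust
// Toy stand-in for a wide type; not the real `wide` API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lanes4([i32; 4]);

// Splat conversion from a scalar.
impl From<i32> for Lanes4 {
    fn from(scalar: i32) -> Self {
        Lanes4([scalar; 4])
    }
}

impl Lanes4 {
    // Previously `rhs: Self`; generalized to anything convertible into
    // `Self`. Passing a `Lanes4` still works via the reflexive `From` impl.
    fn simd_eq<Rhs: Into<Lanes4>>(self, rhs: Rhs) -> Lanes4 {
        let rhs: Lanes4 = rhs.into();
        let mut out = [0i32; 4];
        for i in 0..4 {
            out[i] = if self.0[i] == rhs.0[i] { -1 } else { 0 };
        }
        Lanes4(out)
    }
}
```

Existing calls that pass a vector keep compiling, although call sites that relied on the parameter type to drive inference (for example `v.simd_eq(x.into())`) would need an annotation after such a change.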

I would like it to be totally uniform where possible, but unfortunately I would also like to avoid breaking changes. Maybe we can just do as much as possible for the 1.0 version, and leave an overhaul for 2.0 (which is, ideally, some time after the stdlib simd types become stable, and wide can be rewritten as being "just" an interface over stdlib types).

@Noam2Stein
Contributor Author

Rust lists this change as minor: https://doc.rust-lang.org/cargo/reference/semver.html#fn-generalize-compatible.

Inherent functions could be added using Into without making the original traits private. New users would not need to use the traits, and previous code would be mostly unbroken (minor changes are always only "mostly non-breaking"). If this is the solution, should the traits be deprecated?

I agree that it is a bad idea to make breaking changes, especially because of std::simd. Though if the std types get stabilized, is there a point in updating wide to 2.0.0? Existing code that wants to upgrade will need a refactor anyway, so it would be better to upgrade directly to std::simd::Simd (unless the types are stabilized without all functionality).

@Noam2Stein
Contributor Author

Two more notes:

  • The bound should probably be where Self: Cmp...<Rhs> instead of Rhs: Into<Self> because, even though this is a weird case, the traits can be implemented for third party types (impl CmpEq<ThirdPartyType> for f32x4).

  • Should I add the macro-based tests into this PR?

@Lokathor
Owner

Let's put the macro tests into the PR now, and put the method adjustment and trait question on hold until a later PR. I'd rather just merge the stuff that we know is an improvement, and save the pondering.

@RRRadicalEdward
Contributor

RRRadicalEdward commented May 14, 2026

u8x16 has similarly named functions unpack_low/high.

This is the Rust equivalent of SSE2's unpack_low_i8_m128i. This is basically unpack and interleave low (unpack_low) or high (unpack_high) lanes.
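In scalar terms, and assuming the lane order of SSE2's byte-unpack instructions, the interleave looks roughly like this. The function names are hypothetical and plain arrays stand in for u8x16.

```rust
/// Interleave the low 8 lanes of `a` and `b`: a0, b0, a1, b1, ..., a7, b7.
fn unpack_low_u8(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Interleave the high 8 lanes of `a` and `b`: a8, b8, ..., a15, b15.
fn unpack_high_u8(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
    out
}
```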

mul_keep_high

This is the Rust equivalent of SSE2's mul_i16_keep_high_m128i (see https://doc.qu1x.dev/nalgebra-spacetime/safe_arch/fn.mul_i16_keep_high_m128i.html).
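As a scalar sketch, assuming the usual signed 16-bit "multiply and keep the high half" semantics: each pair of lanes is multiplied at 32-bit width and only the upper 16 bits of the product are kept. The function name is hypothetical and an array stands in for i16x8.

```rust
/// Lane-wise signed multiply that keeps the high 16 bits of each
/// 32-bit product.
fn mul_keep_high_i16(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Widen first so the full product fits without overflow.
        let product = a[i] as i32 * b[i] as i32;
        // Arithmetic shift keeps the sign of negative products.
        out[i] = (product >> 16) as i16;
    }
    out
}
```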

Hope it clarifies these functions 😄.

4 participants