@steven-johnson
Contributor

The old definitions of bool_1, bool_2, and bool_3 in simd_op_check_x86 (etc.) all referred to the same entry in in_f32. As of llvm/llvm-project#76367, the LLVM optimizer is smart enough to realize that (e.g.) bool_1 != bool_2 by construction, and it optimizes away the code that tests their conditions, such as the checks for andps and orps. Initializing them from different locations is enough to outsmart the compiler.

(The bug was only noticed in the x86 test, but I updated the other tests to guard against future optimizer improvements there too.)
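
For readers who want to see the effect outside Halide, here is a minimal plain C++ sketch of the idea. It is not the test code from this PR: the function names, the thresholds (0.3f and -0.3f), and the neighbouring-element indexing are all assumptions chosen for illustration. The first function derives both predicates from the same element, so their conjunction can never be true and a sufficiently smart optimizer is free to drop the AND entirely. The second reads each predicate from a different location, so the AND has to be computed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the old pattern: both predicates read the
// same element. Thresholds are illustrative, not taken from the PR.
int count_and_same_source(const std::vector<float> &in) {
    int count = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool b1 = in[i] > 0.3f;
        bool b2 = in[i] < -0.3f;
        // b1 and b2 are mutually exclusive by construction, so the
        // compiler may fold (b1 && b2) to false and emit no AND at all.
        if (b1 && b2) ++count;
    }
    return count;
}

// Hypothetical stand-in for the new pattern: each predicate reads a
// different location, so nothing relates b1 to b2 at compile time.
int count_and_different_sources(const std::vector<float> &in) {
    int count = 0;
    for (std::size_t i = 0; i + 1 < in.size(); ++i) {
        bool b1 = in[i] > 0.3f;
        bool b2 = in[i + 1] < -0.3f;
        if (b1 && b2) ++count;
    }
    return count;
}
```

Whether a given compiler actually removes the first AND depends on its version and flags; the point is only that it is allowed to, which is what broke the instruction checks.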

@steven-johnson steven-johnson merged commit 84fe565 into main Feb 7, 2024
@steven-johnson steven-johnson deleted the srj/llvm-fp-fix branch February 7, 2024 17:41
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024