LLVM and SPIRV-LLVM-Translator pulldown (WW20 2026)#22020
Draft
iclsrc wants to merge 2087 commits into
Move the FOR/FMinMaxNum restriction checks for epilogue vectorization to hasUnsupportedHeaderPhiRecipe and perform them directly on VPlan. This unifies the checking code and enables epilogue vectorization of VPlans with dead FORs, although the latter should be cleaned up by scalar optimizations earlier in practice. PR: llvm/llvm-project#191815
…part 53) (#194772) Convert five tests to use new HLFIR lowering instead of legacy FIR lowering: Lower/allocatable-callee.f90, Lower/allocatable-caller.f90, Lower/assignment.f90, Lower/assumed-shape-caller.f90, Lower/Intrinsics/count.f90
…prevent unbounded path length growth. (#193691) Ref #147220. ### Problem Description Bazel's use of clang modules for its `layering_check` emits `extern module` declarations relative to some base path meaning those paths usually include long sequences of `../` followed by the path to the module itself. When parsing `extern module` in the module file, we (I believe intentionally) silently ignore missing module files. Currently in the problem case if the file existence check failed for any _other_ reason it also silently ignores it. This means that `-fmodules-strict-decluse` that bazel uses for the layering_check can throw a spurious `err_undeclared_use_of_module` error which is the problem reported in #147220. Clang's `extern module` parsing chooses to concatenate these relative paths recursively meaning the growth in those paths is unbounded. In this case the file existence check fails due to the path name being too long (ENAMETOOLONG in POSIX). In summary there are possibly 2 underlying problems that contribute to #147220 that we could try to fix: 1. Silently ignoring unexpected errors (ENAMETOOLONG) meaning the ultimately reported error (undeclared use of module) doesn't really help the user understand what was wrong. 2. Unbounded path growth when recursively declaring `extern module`s in a chain. I'm choosing to focus on (2) in this PR because both fixes seem useful, and (1) seems an intentional design choice. ### Implementation Collapse `../` in relative `extern module` paths before loading those modules for parsing.
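The collapsing step described above can be sketched as a purely lexical pass over a relative path. This is a minimal illustration, not Clang's implementation: `collapseDotDot` is a hypothetical helper name, and it assumes `/` separators and no symlinks (a lexical collapse can change meaning when a path component is a symlink).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Clang's code): lexically collapse "." and "dir/.."
// segments in a relative path. Leading ".." segments that cannot be cancelled
// are kept, because they refer to directories above the starting point.
std::string collapseDotDot(const std::string &Path) {
  std::vector<std::string> Parts;
  std::stringstream SS(Path);
  std::string Seg;
  while (std::getline(SS, Seg, '/')) {
    if (Seg.empty() || Seg == ".")
      continue; // skip empty and "." segments
    if (Seg == ".." && !Parts.empty() && Parts.back() != "..")
      Parts.pop_back(); // "dir/.." cancels out
    else
      Parts.push_back(Seg);
  }
  std::string Out;
  for (std::size_t I = 0; I < Parts.size(); ++I) {
    if (I != 0)
      Out += '/';
    Out += Parts[I];
  }
  return Out.empty() ? "." : Out;
}
```

With this kind of normalization, a chain of `extern module` declarations that each add `../` segments no longer grows the stored path with every hop, which is what keeps it below the ENAMETOOLONG limit.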
This fixes 6617aac. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
This PR introduces an `erase` method to `ScopedHashTable`, designed to remove the most recent value associated with a given key within the scope stack. To support efficient deletion, the internal `ScopedHashTableVal` structure has been refactored into a doubly linked list, allowing the predecessor of a node to be identified in O(1) time during removal. Fix the MLIR CSE issue llvm/llvm-project#191135 (comment). Part of llvm/llvm-project#193778.
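To illustrate why a doubly linked per-scope list makes removal cheap, here is a much-simplified sketch. `MiniScopedTable`, `Val`, and `Scope` are hypothetical names, the value type is fixed to `int`, and none of this is the actual `ScopedHashTable` code; it only models the idea that each value knows both of its neighbours in its scope's list.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Scope;

struct Val {
  std::string Key;
  int Value;
  Val *NextForKey;  // older value shadowed by this one (may be null)
  Val *PrevInScope; // doubly linked list of values owned by one scope
  Val *NextInScope;
  Scope *Owner;
};

struct Scope {
  Val *Head = nullptr;
};

class MiniScopedTable {
  std::unordered_map<std::string, Val *> Top; // key -> most recent value

public:
  void insert(Scope &S, const std::string &K, int V) {
    auto It = Top.find(K);
    Val *Older = It == Top.end() ? nullptr : It->second;
    Val *N = new Val{K, V, Older, nullptr, S.Head, &S};
    if (S.Head)
      S.Head->PrevInScope = N;
    S.Head = N;
    Top[K] = N;
  }

  const int *lookup(const std::string &K) const {
    auto It = Top.find(K);
    return It == Top.end() ? nullptr : &It->second->Value;
  }

  // Remove the most recent value for K. An older shadowed value, if any,
  // becomes visible again. Unlinking from the scope list needs no traversal.
  bool erase(const std::string &K) {
    auto It = Top.find(K);
    if (It == Top.end())
      return false;
    Val *N = It->second;
    if (N->PrevInScope)
      N->PrevInScope->NextInScope = N->NextInScope;
    else
      N->Owner->Head = N->NextInScope;
    if (N->NextInScope)
      N->NextInScope->PrevInScope = N->PrevInScope;
    if (N->NextForKey)
      It->second = N->NextForKey;
    else
      Top.erase(It);
    delete N;
    return true;
  }

  // Drop every value still owned by S (assumes S is the innermost scope).
  void popScope(Scope &S) {
    while (S.Head) {
      std::string K = S.Head->Key; // copy: the node is freed inside erase
      erase(K);
    }
  }
};
```

With a singly linked scope list, `erase` would have to walk the list to find the predecessor of the removed node; the `PrevInScope` pointer is what removes that walk.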
…n-reduced mask (#187076) Handles the case where the mask does not need to be trimmed, i.e. it's already equal to the reduced vector type, for `XferRead/WriteDropUnitDims` patterns. Signed-off-by: Ege Beysel <beysel@roofline.ai>
…asked stores. (#194689)
…94825) For the simplifyBinaryIntrinsic interface the `Call` argument passed in may be null, which differs from other interfaces such as simplifyIntrinsic and simplifyUnaryIntrinsic which require `Call` to be non-null. See FoldBinaryIntrinsic in InstSimplifyFolder.h where the `Call` argument has a default value of null. That means for all uses of `Call` in simplifyBinaryIntrinsic we must first check the pointer is not null to avoid an invalid dereference. This PR fixes the case for the get.active.lane.mask intrinsic. There isn't currently an easy way to test this fix because the only place I can see where FoldBinaryIntrinsic is called with a null `Call` is VPlanTransforms.cpp and we don't currently invoke the function for get.active.lane.mask intrinsics.
As mentioned at llvm/llvm-project#194239 (comment) : > Not related to your PR, but it looks like we're missing checks here for bool vectors and BitInt destination types. This patch adds the missing checks for bool vectors and BitInt types in the `ConstantEmitter::emitForMemory` function.
This change makes it possible to use YAML anchors [1], [2] with YAMLTraits. All of the necessary parser machinery already exists, so the only change that is necessary is to wire it up to YAMLTraits. This is done by keeping track of all `Anchor` -> `HNode *` mappings and reusing those when an `AliasNode` is encountered. In accordance with the spec [2], anchors do not have to be unique and refer to the last occurrence in the serialization.

Example usage:

```yaml
foo: &a 42
bar: *a
```

The above would be deserialized as:

```yaml
foo: 42
bar: 42
```

Note that aliases are a serialization detail and can be discarded during composition into a Representation Graph (`HNode` hierarchy).

[1]: https://yaml.org/spec/1.2.2/#692-node-anchors
[2]: https://yaml.org/spec/1.2.2/#3222-anchors-and-aliases
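The anchor bookkeeping can be modelled in a few lines. The sketch below is an assumption-laden toy, not the YAMLTraits code: `Node`, `Entry`, and `compose` are hypothetical names, and the input is a pre-tokenized flat mapping rather than a real YAML stream. It shows only the two properties stated above: aliases reuse the anchored node, and a repeated anchor name refers to its last occurrence.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed node; not the YAMLTraits HNode type.
struct Node {
  std::string Scalar;
};

struct Entry {
  std::string Key;
  std::string Anchor; // name after '&' on the value, empty if none
  std::string Alias;  // name after '*' used as the value, empty if none
  std::string Scalar; // literal value when Alias is empty
};

// Resolve aliases by remembering the most recent node for each anchor name.
std::map<std::string, std::string> compose(const std::vector<Entry> &Doc) {
  std::map<std::string, std::shared_ptr<Node>> Anchors;
  std::map<std::string, std::string> Out;
  for (const Entry &E : Doc) {
    std::shared_ptr<Node> N;
    if (!E.Alias.empty()) {
      auto It = Anchors.find(E.Alias);
      if (It == Anchors.end())
        continue; // unknown alias: a real parser would report an error here
      N = It->second;
    } else {
      N = std::make_shared<Node>(Node{E.Scalar});
    }
    if (!E.Anchor.empty())
      Anchors[E.Anchor] = N; // a later anchor replaces an earlier one
    Out[E.Key] = N->Scalar;
  }
  return Out;
}
```

Feeding it the equivalent of `foo: &a 42` followed by `bar: *a` yields `42` for both keys, matching the deserialized form shown in the description.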
Summary: Right now it's a little difficult to use the multilibs support because the user must manually provide one. I believe that when the user configures multilibs with the LLVM CMake arguments, at a minimum we should provide one that forwards `-fmultilib-flag=<multilib>` to the created runtime. This PR makes CMake emit this by manually writing a flag. Because users could provide their own, this adds some extra complexity to prevent this from being overwritten. The desire for this change is to more easily ship this support in CMake configuration files without needing to write files manually (for the typical case).
### Summary

Part of llvm/llvm-project#185382. This is a follow-up to llvm/llvm-project#193658.

Lower the zip1 and zip2 intrinsics listed at https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements

All the intrinsics are handled inline in `llvm-project/build/lib/clang/23/include/arm_neon.h` like:

```
#ifdef __LITTLE_ENDIAN__
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  __ret = __builtin_shufflevector(__p0, __p1, 0, 8, 1, 9, 2, 10, 3, 11);
  return __ret;
}
#else
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
  int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
  __ret = __builtin_shufflevector(__rev0, __rev1, 0, 8, 1, 9, 2, 10, 3, 11);
  __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
  return __ret;
}
#endif
```

So no additional special lowering logic is needed.
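For readers unfamiliar with the shuffle mask, the following scalar model shows what the little-endian path computes. It is a plain C++ sketch with hypothetical names (`Vec8`, `shuffle`, `zip1`, `zip2`), not NEON code. The `zip1` mask is the one from the header excerpt; the `zip2` mask (upper halves interleaved) is taken from the ACLE definition of ZIP2 rather than from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8 = std::array<std::int8_t, 8>;

// Scalar model of a two-input shuffle: indices 0..7 select from A,
// indices 8..15 select from B.
Vec8 shuffle(const Vec8 &A, const Vec8 &B, const std::array<int, 8> &Idx) {
  Vec8 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = Idx[I] < 8 ? A[static_cast<std::size_t>(Idx[I])]
                      : B[static_cast<std::size_t>(Idx[I] - 8)];
  return R;
}

// Interleave the lower halves of A and B (mask from the arm_neon.h excerpt).
Vec8 zip1(const Vec8 &A, const Vec8 &B) {
  return shuffle(A, B, std::array<int, 8>{{0, 8, 1, 9, 2, 10, 3, 11}});
}

// Interleave the upper halves of A and B (mask assumed from the ACLE text).
Vec8 zip2(const Vec8 &A, const Vec8 &B) {
  return shuffle(A, B, std::array<int, 8>{{4, 12, 5, 13, 6, 14, 7, 15}});
}
```

Because each intrinsic is a single fixed shuffle, the generic shuffle lowering already covers it, which is consistent with the conclusion that no special lowering logic is needed.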
- Normalize the header syntax for ReleaseNotes (current `.md` file and `ReleaseNotesTemplate.txt`) to use `#`-based headings - Normalize indents to distinguish doc title from page headers Fixes navigation indents for Furo theme update (see llvm/llvm-project#184440).
The SetVector already ensures that there are no cycles in the collection.
…4790)
- Remove UNSUPPORTED: intelgpu from 12 passing tests:
  * mapping/data_member_ref.cpp
  * offloading/bug50022.cpp, info.c
  * offloading/target_critical_region.cpp, target_depend_nowait.cpp, target_nowait_target.cpp
  * offloading/strided_update/* (6 tests)
  * unified_shared_memory/close_member.c
- Change CUDA tests from XFAIL to UNSUPPORTED for Intel GPU:
  * offloading/CUDA/basic_launch.cu
  * offloading/CUDA/basic_launch_blocks_and_threads.cu
  * offloading/CUDA/basic_launch_multi_arg.cu
  * offloading/CUDA/launch_tu.cu
- Add Intel GPU configuration section to lit.cfg to disable USM tests by default
…#194610) This PR implements the refactorings discussed with @localspook in #193838 --------- Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
z/OS has a table of mapped names in the IR. Counting the hits for just the name leads to one more hit than expected. Search for the name with the @ char to make sure the right occurrences are being counted.
…#194648) Fixes llvm/llvm-project#194596. When the function result symbol is encountered while the compiler is already completing the function result type, flang could recursively re-enter _CompleteFunctionResultType()_ and crash on invalid code. Instead of crashing on conflicting declarations, flang now reports an “already declared” error and stops further recursion.
Handle AVX-512 VGF2P8AFFINEQB rmbi instructions in X86MCInstLower. Unlike the existing rmi forms, rmbi uses a 64-bit broadcast memory operand, so the constant pool entry may only contain the broadcast source instead of a full-width vector constant. Print that constant repeated across the destination vector width when forming the asm comment. Related: llvm/llvm-project#194572
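As a rough model of the comment-printing step, the sketch below repeats an 8-byte broadcast source across a destination of a given bit width. `splatBroadcast` is a hypothetical helper, not code from X86MCInstLower, and the actual asm comment format is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand a 64-bit broadcast source (8 bytes) to the
// bytes of a destination vector that is VecBits wide (e.g. 128, 256, 512).
std::vector<std::uint8_t>
splatBroadcast(const std::array<std::uint8_t, 8> &Src, unsigned VecBits) {
  std::vector<std::uint8_t> Out;
  for (unsigned I = 0; I < VecBits / 64; ++I)
    Out.insert(Out.end(), Src.begin(), Src.end()); // one copy per 64-bit lane
  return Out;
}
```

For a 512-bit destination this yields eight copies of the source, so the printed constant reflects what the instruction actually operates on rather than only the 8 bytes stored in the constant pool.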
…tributes (#194726) Replace `getAsInteger()` parsing of the `patchable-function-entry` and `patchable-function-prefix` function attributes with the existing `Function::getFnAttributeAsParsedInteger()` helper across AsmPrinter and all backend targets. The IR verifier already validates these attributes as unsigned base-10 integers via `checkUnsignedBaseTenFuncAttr`, so parse failure at point of use indicates a verifier bypass or IR corruption. `getFnAttributeAsParsedInteger()` returns a default of 0 on failure (matching the implicit behavior of the old code) and emits a diagnostic rather than silently continuing.
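The behaviour described for the helper (strict unsigned base-10 parse, default of 0 and a diagnostic on failure) can be approximated in standalone C++17. `parseAttrOrDefault` is a hypothetical stand-in and does not reproduce the signature or diagnostic text of `Function::getFnAttributeAsParsedInteger()`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <iostream>
#include <string>
#include <system_error>

// Hypothetical stand-in: parse Text as an unsigned base-10 integer. On any
// failure (empty string, sign, trailing characters, overflow) report a
// diagnostic and return Default instead of silently continuing.
std::uint64_t parseAttrOrDefault(const std::string &Name,
                                 const std::string &Text,
                                 std::uint64_t Default = 0) {
  std::uint64_t Value = 0;
  const char *Begin = Text.data();
  const char *End = Begin + Text.size();
  auto Res = std::from_chars(Begin, End, Value, 10);
  if (Res.ec != std::errc() || Res.ptr != End) {
    std::cerr << "cannot parse integer attribute " << Name << "\n";
    return Default;
  }
  return Value;
}
```

The design point is that a malformed value still produces the same numeric result as the old code path (0), but the failure becomes visible instead of being swallowed.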
Add operations that follow `float op(float, int)` pattern, mirroring the existing `spirv.GL.Ldexp` op
The constexpr functions in question take a scoped enum as an argument and a switch statement returns a value for each value of the enum. These are all legal statements in a constexpr function in C++14. Under constexpr rules, the evaluation of a constexpr function cannot lead to an evaluation of any prohibited forms of expressions. An evaluation of the functions being discussed with a valid argument will terminate at the switch, and any code that follows will not be evaluated. Using "llvm_unreachable" after the switch should be ok as long as the expansion of the llvm_unreachable macro does not contain any statements not allowed to appear in a constexpr function. At the same time, GCC before v9 did not tolerate any unguarded calls to non-constexpr functions after the switch. To avoid using "llvm_unreachable", which can have multiple expansions, use an assert with an explicit condition that the underlying value of the argument lies between the minimum and maximum values of the enum.
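The pattern can be shown with a small made-up enum. `Width` and `bitsFor` are hypothetical and unrelated to the functions changed by the patch; the sketch only illustrates the shape of a fully covered switch followed by a range assert.

```cpp
#include <cassert>

// Hypothetical scoped enum, used only for illustration.
enum class Width { Narrow, Medium, Wide };

constexpr int bitsFor(Width W) {
  switch (W) {
  case Width::Narrow:
    return 8;
  case Width::Medium:
    return 16;
  case Width::Wide:
    return 32;
  }
  // Reached only when W holds a value outside the enumerators. The explicit
  // range condition guards the non-constexpr failure path of assert.
  assert(static_cast<int>(W) >= static_cast<int>(Width::Narrow) &&
         static_cast<int>(W) <= static_cast<int>(Width::Wide) &&
         "invalid Width value");
  return 0;
}

// A valid argument returns inside the switch, so the call is a constant
// expression.
static_assert(bitsFor(Width::Medium) == 16, "usable at compile time");
```

Every valid argument returns from inside the switch, so constant evaluation never reaches the assert; only an out-of-range value cast into the enum would trigger it at run time.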
Pulled out of #194473 - update combineMinMaxReduction to fold to an ISD::VECREDUCE_SMAX/SMIN/UMAX/UMIN node and then perform the lowering later on. combineMinMaxReduction will go away once we can use shouldExpandReduction, rely on the middle-end to recognise reductions and not have to recreate them from the expanded patterns. I've added pre-SSE41 handling using vector unrolling - hopefully this will go away once #194672 is in place.
PR#194368 changed how line breaks are handled on Windows and it broke several libcxx tests on Windows, including libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp. This patch addresses this issue.
### Summary part of llvm/llvm-project#185382 lower part of intrinsics in : https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements Lower NEON::BI__builtin_neon_vzip_v and NEON::BI__builtin_neon_vzipq_v in CIRGenBuiltinAArch64.cpp by porting the existing incubator logic (`clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp`) onto ClangIR: two bitcasts on the input vectors, two rounds of cir.vec.shuffle generating the low/high interleave patterns, each stored through a ptr_stride of the sret base pointer. ### Test - test_vzip_mf8 - test_vzipq_mf8 I found that these two intrinsics are defined in `llvm-project/clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c`, but this file seems to be a test suite specifically for the `mfloat8` type, so I did not remove their original test cases. Some of the new CHECK lines additionally match a pair of bitcasts before the shuffle; this shape comes from arm_neon.h's inline wrappers, which re-cast typed vectors (e.g. <4 x i16>) through <8 x i8> before calling __builtin_neon_vzip_v. Variants whose element type is already i8 (s8/u8/p8/mf8) skip that round-trip and therefore have no bitcasts in the check lines.
Cache root entry and SLPCostThreshold queries once, group !ForReduction-only checks under two blocks, extract a shared benign-node predicate from the two duplicated lambdas, and skip HasSingleLoad and allConstant work when results are dead. Reviewers: Pull Request: llvm/llvm-project#194895
…els (#194754) As it turns out, even if a `ProcResGroup` consists of in-order pipes, as long as its (the group's) BufferSize is not zero, Machine Scheduler will not use in-order scheduling on instructions that consume it. Since BufferSize also defaults to -1 for `ProcResGroup`, we have been scheduling the resource consumption of SiFive7's `PipeAB` (scalar pipes) and `VA1OrVA2` (vector pipes) in an out-of-order fashion! Co-authored-by: Min Hsu <min.hsu@sifive.com>
While adding implementation status for nl_types.h, I noticed docgen resolves it to nl-types.h instead of nl_types.h. As a result, headers with underscores are not matched correctly and their implementation status is not marked. This patch fixes the handling of underscored header names in docgen so they are processed consistently.
…0607) Uses `FindAvailableLoadedValue` to resolve load instructions in call arguments to constants before inline cost analysis. This gives the inliner a more precise cost estimate and the option to inline functions which would not be inlined otherwise. `-O3` doesn't inline empty `std::set` and `std::map` because node deletion is recursive. The inliner doesn't know that `nullptr` is passed in as it is a `load` from a member. This addresses both `libstdc++` and `libc++`:
- `libstdc++` - `FindAvailableLoadedValue` requires `MaxInstToScan=0`, because the relevant store is 7 instructions away and `DefMaxInstsToScan = 6`. Benchmarking on large LLVM TUs showed no measurable compile-time difference between limit=6 and whole basic block
- `libc++` - uses `memset` to zero all members in the ctor; this patch handles only `memset` to zero (the type mismatch case), which could be generalized but seems very rare

The store-to-load pattern is created and consumed within the same CGSCC inliner invocation: the ctor is inlined first (creating stores to the object), and then the dtor's inline cost is evaluated (seeing loads from the same object). No pass has an opportunity to simplify the IR in between. The `-flto` build eliminates empty `std::set` because the IR is simplified enough in the regular optimization pass. However, when the code is not header-only in a different TU, `-flto` doesn't help. The change is much more general than just `std::set` and `std::map`. I saw several impacts of it on the LLVM codebase with `-O3`. Some functions shrink due to better dead-code elimination. Some grow due to more aggressive inlining opportunities, and some are greatly simplified. In my experiments I saw no measurable regression in compile times compiling many large LLVM TUs. I measured ~1% faster compilation due to following opt passes being faster. However, this needs more benchmarks. Closes #183994
…overage (#194009) Fixes #193500
SIFixSGPRCopies incorrectly handled inline assembly operands with
SGPR ("s") constraints when the value came from a memory load (which
produces a VGPR). The pass failed to insert the necessary
v_readfirstlane instruction and instead passed the VGPR value directly.
Example:
asm sideeffect buffer_load_dwordx4 $0, $1, $2, 0 =v,v,s,n
Previously it generated:
buffer_load_dwordx4 v[0:3], v0, v[8:11] (but sgpr is expected), 0 offen
The fix adds readfirstlanes during lowering when there is a copy from
a divergent register to an SGPR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
…le (#194398) This PR factors out the Flang/Fortran-only options from Options.td into a separate file (FlangOptions.td). Assisted-by: codex
…part 54) (#194774) Convert five tests to use new HLFIR lowering instead of legacy FIR lowering: Lower/Intrinsics/c_f_pointer.f90, Lower/Intrinsics/c_loc.f90, Lower/default-initialization-globals.f90, Lower/cray-pointer.f90, Lower/loops.f90
The new AppleClang is only available on macOS 26, so we need to update both.
…it (#194893) This replaces some SFINAE and function overloading with `if _LIBCPP_CONSTEXPR` to simplify the code a bit.
`DW_OP_addr_sect_offset4` is not a real DWARF opcode; it was a proprietary LLDB proposal that was never adopted (and has no llvm::dwarf constant). The same shared-library sliding problem is handled today by evaluating DW_OP_addr as a FileAddress and converting via Value::ConvertToLoadAddress.
The parallel DWARF linker deduplicates types across compile units using a shared TypePool. When multiple CUs define the same type, allocateTypeDie uses compare_exchange_strong to race for setting the canonical DIE. The first thread to succeed stores the DIE and clones its attributes, while subsequent threads use it as the canonical one. Which thread wins depends on OS thread scheduling, making the output non-deterministic. This PR fixes the non-determinism by assigning each CompileUnit a priority based on its position in the link order (object file index, CU index within the file). When a CU wants to mark a DIE as canonical, it acquires the spinlock and only stores its DIE if its priority is strictly lower than that of the current canonical DIE. This ensures that the canonical DIE always comes from the lowest-priority (i.e. first) CU that defines that type. The replaced DIE is leaked into the bump allocator, and the existing DebugTypeDeclFilePatch and accelerator record filters skip the orphaned DIEs via getFinalDie() checks. This PR also removes the AllowNonDeterministicOutput option, which was never set in the first place, and is now obsolete.
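The selection rule can be modelled independently of the linker. In the sketch below, `CanonicalSlot`, `Priority`, and `pickCanonical` are hypothetical names, a `std::mutex` stands in for the spinlock, and an `int` stands in for a DIE. Arrival order is simulated sequentially rather than with real threads; the point is only that the outcome does not depend on that order.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// (object file index, CU index within the file); smaller means earlier in
// the link order. std::pair compares lexicographically.
using Priority = std::pair<unsigned, unsigned>;

struct CanonicalSlot {
  std::mutex Lock; // stands in for the spinlock in the description
  bool HasDie = false;
  Priority Best{};
  int Die = -1; // hypothetical DIE handle

  // Store Candidate only if the slot is empty or P is strictly earlier.
  void offer(Priority P, int Candidate) {
    std::lock_guard<std::mutex> Guard(Lock);
    if (!HasDie || P < Best) {
      HasDie = true;
      Best = P;
      Die = Candidate;
    }
  }
};

// Offer the definitions in the given arrival order and report the winner.
int pickCanonical(const std::vector<std::pair<Priority, int>> &Arrivals) {
  CanonicalSlot Slot;
  for (const auto &A : Arrivals)
    Slot.offer(A.first, A.second);
  return Slot.Die;
}
```

A first-writer-wins scheme would return whichever definition arrived first; the comparison against `Best` is what makes the result a function of link order alone.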
CONFLICT (content): Merge conflict in llvm/lib/SYCLLowerIR/CMakeLists.txt
…95131) When debugging PExpect tests, the 60 second timeout can make that process rather tedious. For TestStatusline, I used a class variable to easily override it while iterating but the idea is applicable more generally.
…nts" (#195135) Reverts llvm/llvm-project#190607 Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/10/builds/27641
CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
CONFLICT (content): Merge conflict in clang/include/clang/Options/Options.td
CONFLICT (content): Merge conflict in clang/lib/CodeGen/TargetInfo.cpp CONFLICT (content): Merge conflict in clang/lib/Sema/SemaSYCL.cpp
XFAIL the upstream test which assumes upstream driver layout (libLLVMSYCL.so, per-target-runtime-dir, single-level include path, clang-linker-wrapper) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…36498) With de82b47, UseAllocaASForSrets is causing breakage in Clang::OpenMP/amdgcn_sret_ctor.cpp. CMPLRLLVM-75138 We should follow up to revisit UseAllocaASForSrets in CMPLRLLVM-75358
Fix ast-attr-add-ir-attributes-misc.cpp test that broke after commit fb02433 which introduced ExplicitInstantiationDecl AST node. The commit added a new ExplicitInstantiationDecl node to preserve source information for explicit template instantiations. Updated the test CHECK lines to account for this new node appearing in the AST dump between the explicit instantiation statement and the partial specialization declaration. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Fixes: CMPLRLLVM-75129 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
LLVM: llvm/llvm-project@bc325ec
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@bd774ef4