
LLVM and SPIRV-LLVM-Translator pulldown (WW20 2026)#22020

Draft
iclsrc wants to merge 2087 commits into sycl from llvmspirv_pulldown
Conversation

@iclsrc
Collaborator

@iclsrc iclsrc commented May 14, 2026

fhahn and others added 30 commits April 29, 2026 11:27
Move checking of FOR/FMinMaxNum restriction checks for epilogue
vectorization to hasUnsupportedHeaderPhiRecipe and perform checks
directly on VPlan.

This unifies the checking code and enables epilogue vectorization of
VPlans with dead FORs, although the latter should be cleaned up by
scalar optimizations earlier in practice.

PR: llvm/llvm-project#191815
…part 53) (#194772)

Convert five tests to use new HLFIR lowering instead of legacy FIR
lowering:
Lower/allocatable-callee.f90, Lower/allocatable-caller.f90,
Lower/assignment.f90, Lower/assumed-shape-caller.f90,
Lower/Intrinsics/count.f90
…prevent unbounded path length growth. (#193691)

Ref #147220.

### Problem Description
Bazel's use of clang modules for its `layering_check` emits `extern
module` declarations relative to some base path meaning those paths
usually include long sequences of `../` followed by the path to the
module itself.

When parsing `extern module` in the module file, we (I believe
intentionally) silently ignore missing module files. Currently in the
problem case if the file existence check failed for any _other_ reason
it also silently ignores it. This means that `-fmodules-strict-decluse`
that bazel uses for the layering_check can throw a spurious
`err_undeclared_use_of_module` error which is the problem reported in
#147220.

Clang's `extern module` parsing chooses to concatenate these relative
paths recursively meaning the growth in those paths is unbounded. In
this case the file existence check fails due to the path name being too
long (ENAMETOOLONG in POSIX).

In summary there are possibly 2 underlying problems that contribute to
#147220 that we could try to fix:
1. Silently ignoring unexpected errors (ENAMETOOLONG) meaning the
ultimately reported error (undeclared use of module) doesn't really help
the user understand what was wrong.
2. Unbounded path growth when recursively declaring `extern module`s in
a chain.

I'm choosing to focus on (2) in this PR because both fixes seem useful,
and (1) seems an intentional design choice.

### Implementation
Collapse `../` in relative `extern module` paths before loading those
modules for parsing.
This fixes 6617aac.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
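The collapsing step described under "Implementation" can be illustrated with a minimal sketch. It uses `std::filesystem` from C++17 as a stand-in; it is not the code Clang uses, and the helper name `collapseDotDot` is hypothetical. The collapse is purely lexical, so it assumes no symlinks are involved.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper, not part of Clang: lexically collapse "dir/.." pairs
// in a relative path so that chained `extern module` paths stop growing.
// Leading "../" components that cannot be collapsed are preserved.
std::string collapseDotDot(const std::string &RelPath) {
  return std::filesystem::path(RelPath).lexically_normal().generic_string();
}
```

For example, `collapseDotDot("a/b/../../c/module.modulemap")` yields `c/module.modulemap`, so concatenating relative paths along a chain no longer accumulates `../` sequences that eventually hit ENAMETOOLONG.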
This PR introduces an `erase` method to `ScopedHashTable`, designed to
remove the most recent value associated with a given key within the
scope stack. To support efficient deletion, the internal
`ScopedHashTableVal` structure has been refactored into a doubly linked
list, allowing the predecessor of a node to be identified in O(1) time
during removal. This fixes the MLIR CSE issue
llvm/llvm-project#191135 (comment).
Part of llvm/llvm-project#193778.
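The semantics of erasing "the most recent value" for a key can be sketched as follows. This is a simplified stand-in, not `llvm::ScopedHashTable`: the class name `ShadowingTable` is hypothetical, scopes are omitted, and a `std::list` (itself doubly linked) per key replaces the intrusive list described above.

```cpp
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only. Each key maps to a doubly linked list of
// bindings; the front element is the most recent (innermost) value.
template <typename K, typename V> class ShadowingTable {
  std::unordered_map<K, std::list<V>> Values;

public:
  // A new binding shadows any earlier binding of the same key.
  void insert(const K &Key, V Val) { Values[Key].push_front(std::move(Val)); }

  // Returns the most recent binding, if any.
  std::optional<V> lookup(const K &Key) const {
    auto It = Values.find(Key);
    if (It == Values.end() || It->second.empty())
      return std::nullopt;
    return It->second.front();
  }

  // Removes only the most recent binding; an older one becomes visible again.
  bool erase(const K &Key) {
    auto It = Values.find(Key);
    if (It == Values.end())
      return false;
    It->second.pop_front(); // O(1) unlink in a doubly linked list
    if (It->second.empty())
      Values.erase(It);
    return true;
  }
};
```

After inserting `1` and then `2` under the same key, `erase` exposes `1` again, which is the behaviour the description above implies for shadowed values.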
…n-reduced mask (#187076)

Handles the case where the mask does not need to be trimmed, i.e. it's
already equal to the reduced vector type, for
`XferRead/WriteDropUnitDims` patterns.

Signed-off-by: Ege Beysel <beysel@roofline.ai>
…94825)

For the simplifyBinaryIntrinsic interface the `Call` argument passed in
may be null, which differs from other interfaces such as
simplifyIntrinsic and simplifyUnaryIntrinsic which require `Call` to be
non-null. See FoldBinaryIntrinsic in InstSimplifyFolder.h where the
`Call` argument has a default value of null.

That means for all uses of `Call` in simplifyBinaryIntrinsic we must
first check the pointer is not null to avoid an invalid dereference.
This PR fixes the case for the get.active.lane.mask intrinsic.

There isn't currently an easy way to test this fix because the only
place I can see where FoldBinaryIntrinsic is called without a null
`Call` is VPlanTransforms.cpp and we don't currently invoke the function
for get.active.lane.mask intrinsics.
As mentioned at
llvm/llvm-project#194239 (comment)
:

> Not related to your PR, but it looks like we're missing checks here
for bool vectors and BitInt destination types.

This patch adds the missing checks for bool vectors and BitInt types in
the `ConstantEmitter::emitForMemory` function.
This change makes it possible to use YAML anchors [1], [2] with
YAMLTraits. All of the necessary parser machinery already exists, so the
only change that is necessary is to wire it up to YAMLTraits.

This is done by keeping track of all `Anchor` -> `HNode *` mappings and
reusing those when an `AliasNode` is encountered.

In accordance with the spec [2], anchors do not have to be unique and
refer to the last occurrence in the serialization.

Example usage:

```yaml
foo: &a 42
bar: *a
```

The above would be deserialized as:

```yaml
foo: 42
bar: 42
```

Note that aliases are a serialization detail and can be discarded during
composition into a Representation Graph (`HNode` hierarchy).

[1]: https://yaml.org/spec/1.2.2/#692-node-anchors
[2]: https://yaml.org/spec/1.2.2/#3222-anchors-and-aliases
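The bookkeeping described above (remember each anchor, reuse it when an alias is seen, let a later anchor with the same name win) can be sketched in isolation. This is a toy model over flat key/value entries, not the YAMLTraits implementation; `Entry` and `resolveAliases` are hypothetical names.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flat model of a YAML mapping entry.
struct Entry {
  std::string Key;
  std::string Anchor; // non-empty for `key: &name value`
  std::string Alias;  // non-empty for `key: *name`
  std::string Value;  // scalar text when the entry is not an alias
};

// Resolve aliases in document order. Anchors need not be unique; an alias
// refers to the most recent definition seen so far.
std::map<std::string, std::string>
resolveAliases(const std::vector<Entry> &Entries) {
  std::map<std::string, std::string> Anchors;
  std::map<std::string, std::string> Result;
  for (const Entry &E : Entries) {
    std::string V = E.Value;
    if (!E.Alias.empty()) {
      auto It = Anchors.find(E.Alias);
      if (It == Anchors.end())
        throw std::runtime_error("unknown anchor: " + E.Alias);
      V = It->second;
    }
    if (!E.Anchor.empty())
      Anchors[E.Anchor] = V; // a later anchor of the same name overrides
    Result[E.Key] = V;
  }
  return Result;
}
```

Feeding it the `foo`/`bar` example above gives `bar` the value `42`; redefining `&a` later changes what subsequent `*a` aliases resolve to.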
Summary:
Right now it's a little difficult to use the multilibs support because
the user must manually provide one. I believe that when the user
configures multilibs with the LLVM CMake arguments at a minimum we
should provide one that forwards `-fmultilib-flag=<multilib>` to the
created runtime.

This PR makes CMake emit this by manually writing a flag. Because users
could provide their own, this adds some extra complexity to prevent this
from being overwritten.

The desire for this change is to more easily ship this support in CMake
configuration files without needing to write files manually (for the
typical case).
### Summary

Part of llvm/llvm-project#185382

This is a follow-up to llvm/llvm-project#193658

Lower zip1 and zip2 intrinsics in
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements

All the intrinsics are handled inline in
`llvm-project/build/lib/clang/23/include/arm_neon.h` like:
```
#ifdef __LITTLE_ENDIAN__
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  __ret = __builtin_shufflevector(__p0, __p1, 0, 8, 1, 9, 2, 10, 3, 11);
  return __ret;
}
#else
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  int8x8_t __rev0;  __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
  int8x8_t __rev1;  __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
  __ret = __builtin_shufflevector(__rev0, __rev1, 0, 8, 1, 9, 2, 10, 3, 11);
  __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
  return __ret;
}
#endif
```
So no additional special lowering logic is needed.
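For readers unfamiliar with the shuffle indices above, here is a scalar sketch of what the zip operations compute. It is plain C++ over `std::array`, not NEON code; the function names mirror the intrinsics, but the templates themselves are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of zip1: interleave the low halves of A and B, matching the
// shuffle indices 0, 8, 1, 9, 2, 10, 3, 11 in the little-endian header code.
template <std::size_t N>
std::array<std::int8_t, N> zip1(const std::array<std::int8_t, N> &A,
                                const std::array<std::int8_t, N> &B) {
  std::array<std::int8_t, N> R{};
  for (std::size_t I = 0; I < N / 2; ++I) {
    R[2 * I] = A[I];
    R[2 * I + 1] = B[I];
  }
  return R;
}

// Scalar model of zip2: the same interleave applied to the high halves.
template <std::size_t N>
std::array<std::int8_t, N> zip2(const std::array<std::int8_t, N> &A,
                                const std::array<std::int8_t, N> &B) {
  std::array<std::int8_t, N> R{};
  for (std::size_t I = 0; I < N / 2; ++I) {
    R[2 * I] = A[N / 2 + I];
    R[2 * I + 1] = B[N / 2 + I];
  }
  return R;
}
```

Because the header already expresses this as a `__builtin_shufflevector` with constant indices, the lowering only has to handle the generic shuffle.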
- Normalize the header syntax for ReleaseNotes (current `.md` file and
`ReleaseNotesTemplate.txt`) to use `#`-based headings
- Normalize indents to distinguish doc title from page headers

Fixes navigation indents for Furo theme update (see
llvm/llvm-project#184440).
The SetVector already ensures that there are no cycles in the
collection.
…4790)

- Remove UNSUPPORTED: intelgpu from 12 passing tests:
  * mapping/data_member_ref.cpp
  * offloading/bug50022.cpp, info.c
  * offloading/target_critical_region.cpp, target_depend_nowait.cpp,
    target_nowait_target.cpp
  * offloading/strided_update/* (6 tests)
  * unified_shared_memory/close_member.c

- Change CUDA tests from XFAIL to UNSUPPORTED for Intel GPU:
  * offloading/CUDA/basic_launch.cu
  * offloading/CUDA/basic_launch_blocks_and_threads.cu
  * offloading/CUDA/basic_launch_multi_arg.cu
  * offloading/CUDA/launch_tu.cu

- Add Intel GPU configuration section to lit.cfg to disable USM tests by
default
…#194610)

This PR implements the refactorings discussed with @localspook in
#193838

---------

Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
z/OS has a table of mapped names in the IR. Counting the hits for just
the name leads to one more hit than expected. Search for the name with
the @ char to make sure the right occurrences are being counted.
…#194648)

Fixes llvm/llvm-project#194596.

When the function result symbol is encountered while the compiler is
already completing the function result type, flang could recursively
re-enter _CompleteFunctionResultType()_ and crash on invalid code.

Instead of crashing on conflicting declarations, flang now reports an
“already declared” error and stops further recursion.
Handle AVX-512 VGF2P8AFFINEQB rmbi instructions in X86MCInstLower.

Unlike the existing rmi forms, rmbi uses a 64-bit broadcast memory
operand, so the constant pool entry may only contain the broadcast
source instead of a full-width vector constant. Print that constant
repeated across the destination vector width when forming the asm
comment.

Related: llvm/llvm-project#194572
…tributes (#194726)

Replace `getAsInteger()` parsing of the `patchable-function-entry`
and `patchable-function-prefix` function attributes with the existing
`Function::getFnAttributeAsParsedInteger()` helper across AsmPrinter
and all backend targets.

The IR verifier already validates these attributes as unsigned base-10
integers via `checkUnsignedBaseTenFuncAttr`, so parse failure at point
of use indicates a verifier bypass or IR corruption.
`getFnAttributeAsParsedInteger()` returns a default of 0 on failure
(matching the implicit behavior of the old code) and emits a diagnostic
rather than silently continuing.
Add operations that follow `float op(float, int)` pattern, mirroring the
existing `spirv.GL.Ldexp` op
The constexpr functions in question take a scoped enum as an argument
and a switch statement returns a value for each value of the enum. These
are all legal statements in a constexpr function in C++14.

Under constexpr rules, the evaluation of a constexpr function cannot
lead to an evaluation of any prohibited forms of expressions. An
evaluation of the functions being discussed with a valid argument will
terminate at the switch, and any code that follows will not be evaluated.

Using "llvm_unreachable" after the switch should be ok as long as the
expansion of the llvm_unreachable macro does not contain any statements
not allowed to appear in a constexpr function. At the same time, GCC
before v9 did not tolerate any unguarded calls to non-constexpr
functions after the switch.

To avoid using "llvm_unreachable", which can have multiple expansions,
use an assert with an explicit condition that the underlying value of
the argument lies between the minimum and maximum values of the enum.
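The pattern described above might look like the following sketch. The enum `Color` and the helpers are hypothetical, not the functions touched by this commit; the point is only that a range-checking `assert` after the switch is never evaluated for valid arguments, so the function stays usable in constant expressions under C++14.

```cpp
#include <cassert>

// Hypothetical scoped enum standing in for the ones in the patch.
enum class Color : int { Red = 0, Green = 1, Blue = 2 };

constexpr int underlying(Color C) { return static_cast<int>(C); }

constexpr const char *colorName(Color C) {
  switch (C) {
  case Color::Red:
    return "red";
  case Color::Green:
    return "green";
  case Color::Blue:
    return "blue";
  }
  // Only reached for an out-of-range value. The explicit range condition
  // replaces llvm_unreachable, whose expansion varies between builds.
  assert(underlying(C) >= underlying(Color::Red) &&
         underlying(C) <= underlying(Color::Blue) && "invalid Color value");
  return "";
}

// Evaluation with a valid argument terminates inside the switch.
static_assert(colorName(Color::Green)[0] == 'g', "usable at compile time");
```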
Pulled out of #194473 - update combineMinMaxReduction to fold to a
ISD::VECREDUCE_SMAX/SMIN/UMAX/UMIN node and then perform the lowering
later on.

combineMinMaxReduction will go away once we can use
shouldExpandReduction, rely on the middle-end to recognise reductions
and not have to recreate them from the expanded patterns.

I've added pre-SSE41 handling using vector unrolling - hopefully this
will go away once #194672 is in place.
PR#194368 changed how line breaks are handled on Windows and it broke
several libcxx tests on Windows, including
libcxx/test/std/localization/locale.categories/facet.numpunct/
locale.numpunct.byname/thousands_sep.pass.cpp
This patch addresses this issue.
### Summary

Part of llvm/llvm-project#185382

Lowers some of the intrinsics in:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements

Lower NEON::BI__builtin_neon_vzip_v and NEON::BI__builtin_neon_vzipq_v
in CIRGenBuiltinAArch64.cpp by porting the existing incubator logic
(`clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp`) onto ClangIR:
two bitcasts on the input vectors, two rounds of cir.vec.shuffle
generating the low/high interleave patterns, each stored through a
ptr_stride of the sret base pointer.

### Test
- test_vzip_mf8
- test_vzipq_mf8

I found that these two intrinsics are defined in
`llvm-project/clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c`,
but this file seems to be a test suite specifically for the `mfloat8`
type, so I did not remove their original test cases.

Some of the new CHECK lines additionally match a pair of bitcasts before
the shuffle; this shape comes from arm_neon.h's inline wrappers, which
re-cast typed vectors (e.g. <4 x i16>) through <8 x i8> before calling
__builtin_neon_vzip_v. Variants whose element type is already i8
(s8/u8/p8/mf8) skip that round-trip and therefore have no bitcasts in
the check lines.
Cache root entry and SLPCostThreshold queries once, group
!ForReduction-only checks under two blocks, extract a shared benign-node
predicate from the two duplicated lambdas, and skip HasSingleLoad and
allConstant work when results are dead.

Reviewers: 

Pull Request: llvm/llvm-project#194895
…els (#194754)

As it turns out, even if a `ProcResGroup` consists of in-order pipes, as
long as its (the group's) BufferSize is not zero, Machine Scheduler will
not use in-order scheduling on instructions that consume it. Since
BufferSize also defaults to -1 for `ProcResGroup`, we have been
scheduling the resource consumption of SiFive7's `PipeAB` (scalar pipes)
and `VA1OrVA2` (vector pipes) in an out-of-order fashion!

Co-authored-by: Min Hsu <min.hsu@sifive.com>
petbernt and others added 22 commits April 30, 2026 10:24
While adding implementation status for nl_types.h, I noticed docgen
resolves it to nl-types.h instead of nl_types.h. As a result, headers
with underscores are not matched correctly and their implementation
status is not marked.

This patch fixes the handling of underscored header names in docgen so
they are processed consistently.
…0607)

Uses `FindAvailableLoadedValue` to resolve load instructions in call
arguments to constants before inline cost analysis. This gives the
inliner more precise cost estimate and option to inline functions which
would not be inlined otherwise.

The `-O3` doesn't inline empty `std::set` and `std::map` because node
deletion is recursive. The inliner doesn't know that `nullptr` is passed
in as it is a `load` from a member.

This addresses both `libstdc++` and `libc++`:
- `libstdc++` - `FindAvailableLoadedValue` requires `MaxInstToScan=0`,
because relevant store is 7 instructions away and `DefMaxInstsToScan =
6`. Benchmarking on large LLVM TUs showed no measurable compile-time
difference between limit=6 and whole basic block
- `libc++` - uses `memset` to zero all members in ctor, this patch
handles only `memset` to zero (the type mismatch case), which could be
generalized but seems very rare

The store-to-load pattern is created and consumed within the same CGSCC
inliner invocation: the ctor is inlined first (creating stores to the
object), and then the dtor's inline cost is evaluated (seeing loads from
the same object). No pass has an opportunity to simplify the IR in
between.

The `-flto` build eliminates empty `std::set` because the IR is
simplified enough in the regular optimization pass. However, when the
code is not header-only in a different TU, `-flto` doesn't help.

The change is much more general than just `std::set` and `std::map`. I
saw several impacts of it on the LLVM codebase with `-O3`. Some functions
shrink due to better dead-code elimination, some grow due to
more aggressive inlining opportunities, and some are greatly simplified.

In my experiments I saw no measurable regression in compile times
compiling many large LLVM TUs. I measured ~1% faster compilation due to
following opt passes being faster. However, this needs more benchmarks.

Closes #183994
SIFixSGPRCopies was incorrectly handling inline assembly operands with
SGPR ("s") constraints when the value came from a memory load (which
produces a VGPR). The pass failed to insert the necessary
v_readfirstlane instruction and instead passed the VGPR value directly.
Example:
  asm sideeffect buffer_load_dwordx4 $0, $1, $2, 0 =v,v,s,n
Previously it generated:
buffer_load_dwordx4 v[0:3], v0, v[8:11] (but sgpr is expected), 0 offen

The fix adds readfirstlanes during lowering when there is a copy from
a divergent register to an SGPR.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
…le (#194398)

This PR factors out the Flang/Fortran-only options from Options.td
into a separate file (FlangOptions.td).

Assisted-by: codex
…part 54) (#194774)

Convert five tests to use new HLFIR lowering instead of legacy FIR
lowering:
Lower/Intrinsics/c_f_pointer.f90, Lower/Intrinsics/c_loc.f90,
Lower/default-initialization-globals.f90, Lower/cray-pointer.f90,
Lower/loops.f90
The new AppleClang is only available on macOS 26, so we need to update
both.
…it (#194893)

This replaces some SFINAE and function overloading with `if
_LIBCPP_CONSTEXPR` to simplify the code a bit.
`DW_OP_addr_sect_offset4` is not a real DWARF opcode; it was a
proprietary LLDB proposal that was never adopted (and has no llvm::dwarf
constant). The same shared-library sliding problem is handled today by
evaluating DW_OP_addr as a FileAddress and converting via
Value::ConvertToLoadAddress.
The parallel DWARF linker deduplicates types across compile units using
a shared TypePool. When multiple CUs define the same type,
allocateTypeDie uses compare_exchange_strong to race for setting the
canonical DIE. The first thread to succeed stores the DIE and clones its
attributes, while subsequent threads use the canonical one. Which
thread wins depends on OS thread scheduling, making the output
non-deterministic.

This PR fixes the non-determinism by assigning each CompileUnit a
priority based on its position in the link order (object file index, CU
index within the file). When a CU wants to mark a DIE as canonical, it
acquires the spinlock, and only stores its DIE if its priority is
strictly lower than that of the current canonical DIE. This ensures that
the canonical DIE always comes from the lowest-priority (i.e. first) CU
that defines that type. The replaced DIE is leaked into the bump allocator
and the existing DebugTypeDeclFilePatch and accelerator record filters skip
the orphaned DIEs via getFinalDie() checks.

This PR also removes the AllowNonDeterministicOutput option, which was
never set in the first place, and is now obsolete.
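The priority rule can be sketched independently of the DWARF linker. Everything below is an illustrative assumption: `TypeEntry`, `offer`, and `pickCanonical` are hypothetical names, a DIE is modelled as an `int`, and a `std::atomic<bool>` spinlock stands in for whatever lock the real code uses. The sketch shows that the winner depends only on link-order priority, not on arrival order.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

// (object file index, CU index within the file); a lower pair means earlier
// in link order. std::pair compares lexicographically.
using CUPriority = std::pair<std::uint32_t, std::uint32_t>;

struct TypeEntry {
  std::atomic<bool> Locked{false};
  bool HasCanonical = false;
  CUPriority Owner{0, 0};
  int CanonicalDie = -1; // stand-in for a DIE pointer

  // Store Die only if no DIE is set yet or P is strictly lower than the
  // current owner's priority. Returns true if this call won.
  bool offer(CUPriority P, int Die) {
    while (Locked.exchange(true, std::memory_order_acquire)) {
      // spin until the lock is released
    }
    const bool Wins = !HasCanonical || P < Owner;
    if (Wins) {
      HasCanonical = true;
      Owner = P;
      CanonicalDie = Die;
    }
    Locked.store(false, std::memory_order_release);
    return Wins;
  }
};

struct Candidate {
  CUPriority Priority;
  int Die;
};

// Replays candidates in the given arrival order and reports the winner.
int pickCanonical(const std::vector<Candidate> &ArrivalOrder) {
  TypeEntry Entry;
  for (const Candidate &C : ArrivalOrder)
    Entry.offer(C.Priority, C.Die);
  return Entry.CanonicalDie;
}
```

Whichever order the candidates arrive in, the DIE from the CU with the lowest `(file, CU)` pair ends up canonical, which is what makes the output reproducible across thread schedules.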
  CONFLICT (content): Merge conflict in llvm/lib/SYCLLowerIR/CMakeLists.txt
…95131)

When debugging PExpect tests, the 60 second timeout can make that
process rather tedious. For TestStatusline, I used a class variable to
easily override it while iterating but the idea is applicable more
generally.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
  CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
  CONFLICT (content): Merge conflict in clang/include/clang/Options/Options.td
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
CONFLICT (content): Merge conflict in clang/lib/CodeGen/TargetInfo.cpp
CONFLICT (content): Merge conflict in clang/lib/Sema/SemaSYCL.cpp
@iclsrc iclsrc added the disable-lint Skip linter check step and proceed with build jobs label May 14, 2026
@jsji jsji force-pushed the llvmspirv_pulldown branch from d976fae to 453edf0 Compare May 15, 2026 01:25
againull and others added 4 commits May 14, 2026 18:50
XFAIL the upstream test which assumes upstream driver layout
(libLLVMSYCL.so, per-target-runtime-dir, single-level include path,
clang-linker-wrapper)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…36498)

With de82b47, UseAllocaASForSrets is causing breakage in
Clang::OpenMP/amdgcn_sret_ctor.cpp.
CMPLRLLVM-75138

We should follow up to revisit UseAllocaASForSrets in CMPLRLLVM-75358
Fix ast-attr-add-ir-attributes-misc.cpp test that broke after commit
fb02433 which introduced ExplicitInstantiationDecl AST node.

The commit added a new ExplicitInstantiationDecl node to preserve
source information for explicit template instantiations. Updated
the test CHECK lines to account for this new node appearing in
the AST dump between the explicit instantiation statement and
the partial specialization declaration.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fixes: CMPLRLLVM-75129

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>