Skip to content

Fix llvm.addressof export symbols#55

Merged
nihalpasham merged 4 commits into
NVlabs:mainfrom
josephglanville:jpg/fix-addressof
May 15, 2026
Merged

Fix llvm.addressof export symbols#55
nihalpasham merged 4 commits into
NVlabs:mainfrom
josephglanville:jpg/fix-addressof

Conversation

@josephglanville
Copy link
Copy Markdown
Collaborator

Summary

The LLVM exporter could emit an undefined SSA value for llvm.addressof when a block used an addressof result before the block containing the AddressOfOp was printed.

With the old exporter, a source-level SharedArray repro failed during NVVM IR compilation with:

/usr/bin/llc-21: .../addressof_sharedarray_repro.ll:15:56: error: use of undefined value '%v4'
  %v2 = getelementptr inbounds float, ptr addrspace(3) %v4, i64 0
                                                       ^

Cause

AddressOfOp is a virtual global-symbol reference in textual LLVM IR. It should print as @global; it should not define an SSA register.

The exporter naming pre-pass assigned every ordinary op result a %vN name. Since AddressOfOp was not special-cased, an earlier-printed use could emit that temporary. When the AddressOfOp block printed later, export rewrote the mapping to @global, but no instruction ever defined the earlier %vN.

Fix

Pre-register AddressOfOp results as @{global_name} in the naming pre-pass and skip assigning them SSA temporaries.

This is the right model because LLVM global addresses are symbol operands, not instruction results. It matches the existing constant handling and makes addressof uses independent of CFG block print order.

Regression Tests

Added:

  • A focused dialect-llvm exporter test where an AddressOfOp is in a later-printed block and a GEP using it is in an earlier-printed block.
  • A user-level SharedArray LTOIR repro addressof_sharedarray_repro.

With the fix, the repro emits:

%v2 = getelementptr inbounds float, ptr addrspace(3) @__shared_mem_0, i64 0

and the IR is accepted by LLVM.

AddressOfOp is a virtual LLVM global-symbol reference. When the naming pre-pass gave it a %vN temporary, an earlier-printed use block could emit that temp even though no instruction would ever define it.

Pre-register AddressOfOp results as @{global_name}, matching constants and LLVM symbol spelling. This keeps addressof uses independent of CFG print order and adds dialect plus SharedArray LTOIR regressions for the undefined-value failure.

Signed-off-by: Joseph Glanville <jglanville@nvidia.com>
The naming pre-pass added in the previous commit already registers
every AddressOfOp result as `@<global_name>`, so the op-emit-time arm
that re-inserted the same name was dead code that obscured the
contract. Replace it with a debug-only assertion that the result is
pre-registered as a global symbol, and tighten the pre-pass comment to
point at the assertion. No textual IR change.
The original assertion only checked that the GEP line contains the
global symbol. Extend the regression test for issue NVlabs#54 so it also:

- Verifies the shared global is declared at module scope.
- Walks the exported IR and asserts that every `%vN` token in operand
  position has a matching `%vN = ...` definition. This catches the bug
  class directly (a stale `%vN` for an addressof result was the exact
  symptom in issue NVlabs#54) and any future regression that introduces a
  dangling SSA reference for any virtual op, not just AddressOfOp.
The original fixture only printed `SUCCESS:` and never launched the
kernel, so the smoketest accepted a buggy `.ll` because libNVVM was
never invoked. Turn it into a real end-to-end regression for issue
NVlabs#54: the kernel takes a seed and writes `seed * repro_weight()` to an
output buffer, the host loads it via `cuda_host::ltoir::load_kernel_module`
(which compiles the cuda-oxide-emitted NVVM IR through libNVVM and
links the cubin via nvJitLink), launches it, and asserts the result is
21.0 (= 7.0 * 3.0). A regression of the addressof export bug now fails
at libNVVM's verifier before the kernel can run.

The bug-triggering shape `OUTPUT_NORM[0] = OUTPUT_NORM[0] * weight` is
preserved verbatim; the produced `.ll` still has multiple GEPs and a
bitcast referencing `@__shared_mem_0` directly, which is exactly the
chain that emitted dangling `%vN` references before the fix.
Copy link
Copy Markdown
Collaborator

@nihalpasham nihalpasham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Added a couple more pieces on top

  • Op-emit AddressOfOp arm now debug_assert!s the pre-pass invariant instead of duplicating it.
  • Test walks the IR and asserts every %vN has a definition (catches the bug class, not just the GEP).
  • Fixture now launches through libNVVM and validates on GPU; a regression fails at the verifier.

@nihalpasham nihalpasham merged commit b0fff54 into NVlabs:main May 15, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants