Fix llvm.addressof export symbols#55
Merged
Merged
Conversation
AddressOfOp is a virtual LLVM global-symbol reference. When the naming pre-pass gave it a %vN temporary, an earlier-printed use block could emit that temp even though no instruction would ever define it.
Pre-register AddressOfOp results as @{global_name}, matching constants and LLVM symbol spelling. This keeps addressof uses independent of CFG print order and adds dialect plus SharedArray LTOIR regressions for the undefined-value failure.
Signed-off-by: Joseph Glanville <jglanville@nvidia.com>
9f7ba64 to
67c6bb6
Compare
The naming pre-pass added in the previous commit already registers every AddressOfOp result as `@<global_name>`, so the op-emit-time arm that re-inserted the same name was dead code that obscured the contract. Replace it with a debug-only assertion that the result is pre-registered as a global symbol, and tighten the pre-pass comment to point at the assertion. No textual IR change.
The original assertion only checked that the GEP line contains the global symbol. Extend the regression test for issue NVlabs#54 so it also: - Verifies the shared global is declared at module scope. - Walks the exported IR and asserts that every `%vN` token in operand position has a matching `%vN = ...` definition. This catches the bug class directly (a stale `%vN` for an addressof result was the exact symptom in issue NVlabs#54) and any future regression that introduces a dangling SSA reference for any virtual op, not just AddressOfOp.
The original fixture only printed `SUCCESS:` and never launched the kernel, so the smoketest accepted a buggy `.ll` because libNVVM was never invoked. Turn it into a real end-to-end regression for issue NVlabs#54: the kernel takes a seed and writes `seed * repro_weight()` to an output buffer, the host loads it via `cuda_host::ltoir::load_kernel_module` (which compiles the cuda-oxide-emitted NVVM IR through libNVVM and links the cubin via nvJitLink), launches it, and asserts the result is 21.0 (= 7.0 * 3.0). A regression of the addressof export bug now fails at libNVVM's verifier before the kernel can run. The bug-triggering shape `OUTPUT_NORM[0] = OUTPUT_NORM[0] * weight` is preserved verbatim; the produced `.ll` still has multiple GEPs and a bitcast referencing `@__shared_mem_0` directly, which is exactly the chain that emitted dangling `%vN` references before the fix.
nihalpasham
approved these changes
May 15, 2026
Collaborator
nihalpasham
left a comment
There was a problem hiding this comment.
This looks good. Added a couple more pieces on top
- Op-emit
AddressOfOparm nowdebug_assert!s the pre-pass invariant instead of duplicating it. - Test walks the IR and asserts every
%vNhas a definition (catches the bug class, not just the GEP). - Fixture now launches through libNVVM and validates on GPU; a regression fails at the verifier.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The LLVM exporter could emit an undefined SSA value for
llvm.addressofwhen a block used an addressof result before the block containing theAddressOfOpwas printed.With the old exporter, a source-level
SharedArrayrepro failed during NVVM IR compilation with:Cause
AddressOfOpis a virtual global-symbol reference in textual LLVM IR. It should print as@global; it should not define an SSA register.The exporter naming pre-pass assigned every ordinary op result a
%vNname. SinceAddressOfOpwas not special-cased, an earlier-printed use could emit that temporary. When theAddressOfOpblock printed later, export rewrote the mapping to@global, but no instruction ever defined the earlier%vN.Fix
Pre-register
AddressOfOpresults as@{global_name}in the naming pre-pass and skip assigning them SSA temporaries.This is the right model because LLVM global addresses are symbol operands, not instruction results. It matches the existing constant handling and makes addressof uses independent of CFG block print order.
Regression Tests
Added:
dialect-llvmexporter test where anAddressOfOpis in a later-printed block and a GEP using it is in an earlier-printed block.SharedArrayLTOIR reproaddressof_sharedarray_repro.With the fix, the repro emits:
and the IR is accepted by LLVM.