Add cvt_f16x2_f32 intrinsic for f32-to-f16x2 packing by honeyspoon · Pull Request #66 · NVlabs/cuda-oxide

honeyspoon · 2026-05-16T01:12:30Z

Summary

Add cvt_f16x2_f32 — packs two f32 values into a single u32 containing two packed f16 values in a single PTX instruction.

Maps to PTX: cvt.rn.f16x2.f32 d, hi, lo;

CUDA C equivalent: __float2half2_rn()

Motivation

WMMA GEMM epilogues accumulate in f32 but need to write half-precision output. The scalar approach requires two separate f32→f16 conversions plus bit manipulation ((lo_bits) | (hi_bits << 16)). This intrinsic compiles to a single PTX instruction that does both conversions and the pack.

Implementation

Follows the same 4-crate pattern as the existing CvtF32x2Bf16x2Op (bfloat16 variant in tcgen05.rs):

cuda-device/src/convert.rs: cvt_f16x2_f32(lo: f32, hi: f32) -> u32 stub
dialect-nvvm/src/ops/convert.rs: CvtF16x2F32Op — 2 f32 operands, 1 i32 result
mir-importer/.../intrinsics/convert.rs: Standard 2-arg emit with result storage via emit_store_result_and_goto
mir-lower/.../intrinsics/convert.rs: llvm.inline_asm with constraint string "=r,f,f"

The lowering is structurally identical to the existing convert_cvt_f32x2_bf16x2 in crates/mir-lower/src/convert/intrinsics/tcgen05.rs (lines 510–539), differing only in the PTX instruction name.

Testing

cargo fmt --check           # pass
cargo clippy -p cuda-device  # pass (workspace clippy blocked by hi_sparse_bitset MSRV on rustc 1.93)

Validated in a WMMA GEMM kernel on SM_89: f32 accumulator → packed f16x2 output with correct values (cosine similarity 1.000000 vs scalar reference).

Signed-off-by: abder <bobmatt911@gmail.com>

Add cvt_f16x2_f32 intrinsic for f32-to-f16x2 packing

9b8e62a

Signed-off-by: abder <bobmatt911@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cvt_f16x2_f32 intrinsic for f32-to-f16x2 packing#66

Add cvt_f16x2_f32 intrinsic for f32-to-f16x2 packing#66
honeyspoon wants to merge 1 commit into
NVlabs:mainfrom
honeyspoon:upstream-cvt-f16x2

honeyspoon commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

honeyspoon commented May 16, 2026

Summary

Motivation

Implementation

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant