
Phase A: Vulkan Compute Backend for VSL Matrix/Vector #237

@ulises-jeremias

Description


Motivation

VSL (V Scientific Library) is the low-level linear algebra foundation for VTL. VSL provides vsl.la.Matrix (column-major, f64), BLAS/LAPACK wrappers, and VCL (OpenCL) for data transport. For GPU acceleration to work end-to-end, VSL must provide GPU-accelerated gemm, matmul, and element-wise operations that VTL's la/ module calls.

This issue is the VSL counterpart of VTL issue #58 (Phase 1: Vulkan Compute Foundation).

VSL's Role in the GPU Architecture

VTL's la/la.v converts Tensor[T] -> []f64 -> vsl.la.Matrix, calls VSL, then converts back. When VTL calls GPU-accelerated operations:

  • VTL's compute/gemm_vulkan is used directly for Tensor[T]
  • VSL's la.vulkan must also support vsl.la.Matrix for:
    • VSL's own operations (la.gemm, la.matmul)
    • VTL's la/ module when it uses VSL as the backend

Why Vulkan for VSL

VSL uses Vulkan Compute as the primary GPU backend (same as VTL). Benefits:

  • Cross-vendor: Works on NVIDIA, AMD, Intel, ARM Mali/Adreno
  • No proprietary SDK: unlike CUDA, which runs only on NVIDIA hardware
  • Same code path: VTL and VSL share the same Vulkan compute kernels
  • SPIR-V: Unified GPU IR across all vendors

Reference Repositories

The V ecosystem has existing GPU/compute infrastructure used as reference:

  • antono2/vulkan — Full raw Vulkan 1.0–1.4 bindings (~1.3 MB). Use as the pattern for VSL's own Vulkan bindings: C→V type mapping, struct layout, handle definitions, API function signatures. MIT licensed.
  • antono2/v_vulkan_bindings — Python generator: Khronos XML → V code. Fork to generate a vsl-specific Vulkan bindings subset. MIT licensed.
  • antono2/vulkan_memory_allocator — Pool-based GPU memory allocator. Use as architectural reference for VSL's memory allocator. MIT licensed.
  • vsl.vcl — Mature OpenCL wrapper. Use as pattern for VSL's compute abstraction.

Implementation Pattern: Self-Contained Wrappers

Decision: VSL maintains its own Vulkan bindings within vsl/vulkan/, following the same pattern as BLAS, LAPACK, and VCL. Use antono2/vulkan, antono2/v_vulkan_bindings, and antono2/vulkan_memory_allocator as reference implementations.

vsl/
├── vulkan/            ← Vulkan bindings (self-contained, like vcl/)
│   ├── vk.c.v         ← C function declarations
│   ├── vk.ctypes.v    ← C type definitions
│   ├── vk.device.v    ← Device, physical device, instance
│   ├── vk.buffer.v    ← Buffer creation, memory binding
│   ├── vk.memory.v    ← Memory allocation, map/unmap
│   ├── vk.shader.v    ← ShaderModule from SPIR-V
│   ├── vk.pipeline.v  ← Compute pipeline, pipeline layout
│   ├── vk.descriptor.v ← Descriptor set layout, pool
│   ├── vk.command.v   ← Command buffer, submit, wait
│   └── vk.kernels.v   ← GLSL compute shader sources
└── compute/           ← VSL compute abstraction

This keeps VSL self-contained and gives full control over the Vulkan API surface.

Scope

Files to create

  1. vsl/vulkan/ — VSL's own Vulkan bindings directory
    • vsl/vulkan/vk.c.v — C function declarations (fn C.vkCreateInstance(...), fn C.vkCmdDispatch(...), etc.)
    • vsl/vulkan/vk.ctypes.v — C type definitions (VkInstance, VkDevice, VkBuffer, VkDeviceMemory, etc.)
    • vsl/vulkan/vk.device.v — Device discovery, instance, physical device, logical device, queue
    • vsl/vulkan/vk.buffer.v — Buffer creation, memory binding
    • vsl/vulkan/vk.memory.v — Memory allocation, host<->device mapping
    • vsl/vulkan/vk.shader.v — Shader module creation from SPIR-V
    • vsl/vulkan/vk.pipeline.v — Compute pipeline, pipeline layout
    • vsl/vulkan/vk.descriptor.v — Descriptor set layout, pool, allocation
    • vsl/vulkan/vk.command.v — Command buffer, submit, wait
    • vsl/vulkan/vk.kernels.v — GLSL compute shader sources as V string constants
  2. vsl/compute/ — VSL compute abstraction directory
    • vsl/compute/gemm.v — GPU GEMM dispatcher (routes to Vulkan/CUDA/VCL/BLAS)
    • vsl/compute/elementwise.v — GPU element-wise ops dispatcher
    • vsl/compute/broadcast.v — GPU broadcast ops dispatcher
  3. vsl/la/vulkan.v — Vulkan GEMM for vsl.la.Matrix

Files to modify

  • vsl/la/la.v — Update gemm, matmul to dispatch to GPU via compile-time flags ($if vulkan ? { ... })
  • vsl/vcl/vector.c.v — Add methods to extract vcl.Vector[f64] from vsl.la.Matrix

API Contracts

// VSL Matrix GPU dispatch
pub fn gemm_gpu(a vsl.la.Matrix, b vsl.la.Matrix, alpha f64, beta f64) !vsl.la.Matrix

// Vulkan GEMM for VSL Matrix
pub fn gemm_vulkan(a vsl.la.Matrix, b vsl.la.Matrix) !vsl.la.Matrix

// Element-wise ops on VSL Matrix
pub fn relu_vulkan(x vsl.la.Matrix) !vsl.la.Matrix
pub fn sigmoid_vulkan(x vsl.la.Matrix) !vsl.la.Matrix
pub fn tanh_vulkan(x vsl.la.Matrix) !vsl.la.Matrix

// Broadcast ops
pub fn add_vulkan(a vsl.la.Matrix, b vsl.la.Matrix) !vsl.la.Matrix
pub fn mul_vulkan(a vsl.la.Matrix, b vsl.la.Matrix) !vsl.la.Matrix
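
The gemm_* contracts above must reproduce the standard BLAS semantics C = alpha·A·B + beta·C on column-major f64 storage. As a hedged sketch of that reference behavior (plain C, not VSL code; `gemm_ref` is an illustrative name), the computation a Vulkan kernel has to match is:

```c
#include <assert.h>
#include <stddef.h>

/* Reference GEMM on column-major f64 storage: C = alpha*A*B + beta*C.
 * A is m x k, B is k x n, C is m x n; element (i, j) lives at i + j*lda,
 * with lda equal to the number of rows (column-major leading dimension). */
static void gemm_ref(size_t m, size_t n, size_t k,
                     double alpha, const double *a, const double *b,
                     double beta, double *c) {
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++) {
                acc += a[i + p * m] * b[p + j * k];
            }
            c[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
}
```

The unit tests below compare gemm_vulkan output against exactly this kind of CPU reference.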

Integration with VTL

VTL's la/la.v needs a path that calls VSL GPU compute:

// vtl/la/la.v — updated for GPU
$if vulkan ? {
    return vsl_compute.gemm_vulkan(a, b)!
} $else $if cuda ? {
    return vsl_compute.gemm_cuda(a, b)!
} $else $if vcl ? {
    return vsl_compute.gemm_vcl(a, b)!
} $else {
    return vsl.la.gemm(a, b)!
}

SPIR-V Compilation

Same strategy as VTL Phase 1:

  • Kernels as V string constants in vsl/vulkan/vk.kernels.v
  • Compile to SPIR-V via glslangValidator -V <kernel>.comp -o <kernel>.spv
  • Load into Vulkan via vkCreateShaderModule
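
vkCreateShaderModule requires the code size to be a multiple of 4 bytes, and every valid SPIR-V module starts with the magic word 0x07230203. A cheap pre-flight check before handing the blob to Vulkan can catch build or embedding mistakes early (a hedged sketch; `spirv_looks_valid` is a hypothetical helper, not part of the planned vk/ API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SPIR-V magic number: the first 32-bit word of every valid module. */
#define SPIRV_MAGIC 0x07230203u

/* Minimal sanity check before vkCreateShaderModule: the blob must be a
 * whole number of 32-bit words and begin with the SPIR-V magic word. */
static int spirv_looks_valid(const uint32_t *code, size_t size_bytes) {
    if (size_bytes < sizeof(uint32_t) || size_bytes % sizeof(uint32_t) != 0) {
        return 0;
    }
    return code[0] == SPIRV_MAGIC;
}
```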

VSL-specific Considerations

  • Matrix layout: VSL vsl.la.Matrix is column-major (Fortran style). Vulkan kernels must account for this:
    • LDA = num_rows (column stride = number of rows)
    • Index: A[i + j * lda] where i is row, j is column
  • Type: VSL operates on f64 only (no generic T). Simplifies kernel variants.
  • No views/slices: Unlike VTL's Tensor[T], VSL's Matrix does not have views — all operations allocate new matrices. GPU storage model is simpler.
  • VCL conflict: VSL already has vsl.vcl module. The Vulkan backend (vk) should coexist with VCL — they are separate backends, not competing.
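
The column-major indexing rule above (A[i + j * lda] with lda = num_rows) is the one every GLSL kernel must bake into its address arithmetic. A hedged C sketch of the mapping (the helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Column-major (Fortran-style) linear index for element (i, j) of a
 * matrix whose leading dimension lda equals its number of rows,
 * matching the vsl.la.Matrix storage layout. */
static size_t col_major_index(size_t i, size_t j, size_t lda) {
    return i + j * lda;
}
```

Note that columns are contiguous: in a 3x2 matrix, element (0, 1) immediately follows (2, 0) in memory.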

Testing Plan

  • Unit tests: gemm_vulkan produces same output as la.gemm (within FP64 tolerance)
  • Integration: VTL la.matmul produces same output with Vulkan backend
  • VSL: Matrix.vulkan() / .cpu() round-trip preserves data
  • Benchmark: Vulkan GEMM achieves >= 80% of the device's theoretical FP64 peak throughput
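
The "within FP64 tolerance" check in the unit tests can use a combined absolute/relative criterion so it scales with magnitude. A hedged sketch (the helper name and thresholds are illustrative, not VSL's actual test tolerances):

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element-wise comparison with combined absolute/relative tolerance,
 * suitable for checking gemm_vulkan output against la.gemm:
 * |got - want| <= atol + rtol * |want| must hold for every element. */
static int matrices_close(const double *got, const double *want, size_t n,
                          double atol, double rtol) {
    for (size_t i = 0; i < n; i++) {
        if (fabs(got[i] - want[i]) > atol + rtol * fabs(want[i])) {
            return 0;
        }
    }
    return 1;
}
```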

Dependencies

  • vsl/vulkan/ module (new — needs to be written first)
  • glslangValidator in PATH for SPIR-V compilation
  • VSL la/ module (existing)

Checklist

  • vsl/vulkan/ directory and Vulkan bindings (vk.c.v, vk.ctypes.v)
  • vsl/vulkan/vk.device.v: device discovery, instance, physical device, queue
  • vsl/vulkan/vk.buffer.v: buffer creation, memory binding
  • vsl/vulkan/vk.memory.v: memory allocation, host<->device mapping
  • vsl/vulkan/vk.shader.v: shader module creation from SPIR-V
  • vsl/vulkan/vk.pipeline.v: compute pipeline, pipeline layout
  • vsl/vulkan/vk.descriptor.v: descriptor set layout, pool, allocation
  • vsl/vulkan/vk.command.v: command buffer, submit, wait
  • vsl/vulkan/vk.kernels.v: GLSL compute shader sources
  • vsl/compute/ directory and dispatcher modules
  • vsl/compute/gemm.v: GPU dispatch for GEMM
  • vsl/compute/elementwise.v: GPU dispatch for element-wise
  • vsl/compute/broadcast.v: GPU dispatch for broadcast
  • vsl/la/vulkan.v: Vulkan GEMM for vsl.la.Matrix
  • vsl/la/la.v: update gemm/matmul to dispatch to Vulkan
  • vsl/vcl/vector.c.v: add methods for vcl.Vector extraction
  • Tests: consistency across Vulkan / CPU
  • Integration test: VTL la.matmul with Vulkan backend
  • Benchmark: Vulkan GEMM vs CPU BLAS

Related: VTL #58 (Phase 1: Vulkan Compute Foundation)
Parent: #236
Labels: enhancement, gpu, phase-a, vulkan
