Context
VSL Role in VTL/VSL GPU Architecture
VSL (V Scientific Library) is the low-level math foundation for VTL. VSL provides:

- `vsl.la` — Linear algebra (Matrix, Vector, BLAS/LAPACK wrappers)
- `vsl.vcl` — OpenCL data transport (Device, Vector[T], Kernel, async transfers)
- `vsl.plot` — Visualization
- `vsl.random` — Random number generation
- `vsl.ml` — Machine learning primitives

For GPU acceleration, VSL's role is:

- GPU-accelerated linear algebra (`la.gemm`, `la.matmul`, etc.) on Vulkan, CUDA, and OpenCL
- Compute infrastructure for VTL — VTL's `la/` module delegates to VSL; VSL's GPU compute flows back to VTL
VTL Phase Mapping to VSL
| VTL Phase | VSL Equivalent | Scope |
|---|---|---|
| Phase 1 (#58): Vulkan foundation | Phase A (#237): VSL Vulkan backend | GPU GEMM, element-wise, broadcast for VSL Matrix |
| Phase 2 (#59): NN forward | N/A | VSL is not an NN library; VTL handles NN |
| Phase 3 (#60): VSL + CUDA | Phase B (#238): VSL CUDA backend | GPU GEMM, element-wise, broadcast for VSL Matrix |
| Phase 4 (#61): GPU autograd | N/A | VTL handles autograd; VSL needs GPU forward only |
| Phase 5 (#62): OpenCL | Phase C (#239): VSL VCL compute | Extend existing VCL from transport to compute |
| Phase 6 (#63): ARM | Covered by Phase A | Vulkan on ARM (Android) is the same code |
| Phase 7+ (#64): Performance | Covered by all | Kernel fusion, mixed precision, async |
Reference Repositories
The V ecosystem has existing GPU/compute infrastructure used as reference for this plan:
antono2/vulkan — Full Raw Vulkan Bindings (Reference)

- What: Complete, auto-generated Vulkan 1.0–1.4 API bindings (~1.3 MB, weekly auto-regeneration from Khronos XML)
- Coverage: Every struct, enum, handle, and function from Vulkan core + extensions
- Handles: `Instance`, `PhysicalDevice`, `Device`, `Queue`, `CommandBuffer`, `Buffer`, `DeviceMemory`, `ShaderModule`, `Pipeline`, `PipelineLayout`, `DescriptorSetLayout`, `DescriptorPool`, `DescriptorSet`, `Fence`, `Semaphore`, etc.
- Compute support: Full — `ComputePipelineCreateInfo`, `create_compute_pipelines`, `cmd_dispatch`, compute queue flags, `ShaderStageFlags.compute`
- C→V mapping pattern: `@[typedef] pub struct C.VkXXX`, `fn C.vkCreateInstance(...) int`, etc.
- Reference value: Use as the pattern for VSL's own Vulkan bindings (`vsl/vk/`). Study the C→V type mapping, struct layout, and API signatures.
- License: MIT
antono2/v_vulkan_bindings — Python Generator (Reference)

- What: Python tool that translates the Khronos `vk.xml` registry → V code
- Reference value: Fork to generate a VSL-specific subset of Vulkan bindings (compute-only, stripped of graphics APIs for a smaller footprint)
- License: MIT
antono2/vulkan_memory_allocator — GPU Memory Allocator (Reference)

- What: Pool-based GPU memory allocator using `dlmalloc` for CPU-side bookkeeping
- API: `Allocator.new()`, `allocate()`, `create_buffer()`, `map()`, `unmap()`, `destroy()`
- Reference value: Use as architectural reference for VSL's Vulkan memory allocator
- License: MIT
vsl.vcl — Existing OpenCL Compute (Foundation)

- What: Mature OpenCL wrapper with `Device`, `Vector[T]`, `Kernel`, async transfers
- Reference value: Use as architectural reference for the compute abstraction pattern
- License: MIT
Implementation Pattern: Self-Contained Wrappers (Like BLAS/LAPACK/VCL)
VSL already maintains self-contained wrappers for all external math libraries:
```
vsl/
├── blas/    ← Pure-V BLAS fallback (own implementation)
├── lapack/  ← Pure-V LAPACK fallback (own implementation)
├── vcl/     ← OpenCL wrapper (self-contained, ~12 .v files)
├── vk/      ← Vulkan bindings + compute (NEW — follow same pattern)
└── compute/ ← VSL compute abstraction (NEW)
```
Decision: Do NOT import `antono2/vulkan` as a dependency. Instead, maintain VSL's own Vulkan bindings within `vsl/vk/`, following the same pattern as BLAS, LAPACK, and VCL. Use `antono2/vulkan`, `antono2/v_vulkan_bindings`, and `antono2/vulkan_memory_allocator` as reference implementations for:
- C→V type mapping patterns
- Struct layout and handle definitions
- API function signatures
- Memory allocator design
This keeps VSL self-contained, avoids external dependency on a third-party module, and gives full control over the Vulkan API surface exposed to VTL.
Reference: VSL Current State
VSL has no GPU compute — all operations run on CPU:
| Component | Status | Location |
|---|---|---|
| `vsl.vcl.Vector[T]` | ✅ Data transport | `vcl/vector.c.v` |
| `vsl.vcl.Device` | ✅ Device/context/queue | `vcl/device.c.v` |
| `vsl.vcl.Kernel` | ✅ Kernel loading/execution | `vcl/kernel.c.v` |
| `vsl.vcl` async transfers | ✅ Async load/data | `vcl/vector.c.v`, `vcl/buffer.c.v` |
| `la.gemm` | ❌ CPU BLAS only | `la/*.v` (filename-dispatched BLAS) |
| `la.matmul` | ❌ CPU BLAS only | `la/*.v` |
| `la.lstsq` | ❌ CPU LAPACK only | `la/extra.v` |
VSL Matrix vs VTL Tensor
VSL uses `vsl.la.Matrix` and `vsl.la.Vector` — different from VTL's `Tensor[T]`:
```v
// VSL Matrix (column-major, f64 only)
pub struct Matrix {
	data []f64
	m    int
	n    int
}

// VTL Tensor (generic, row/col-major)
@[heap]
pub struct Tensor[T] {
	data    &storage.CpuStorage[T]
	memory  MemoryFormat
	shape   []int
	strides []int
}
```
VSL compute needs to work with VSL Matrix/Vector types, not VTL Tensor types. The bridge is VTL's `la/` module, which converts `Tensor[T]` → `[]f64` → `vsl.la.Matrix`, calls VSL, and converts the result back.
Architecture: VSL Vulkan + Compute Layer
```
vsl/
├── vk/                 ← Vulkan bindings (self-contained, like vcl/)
│   ├── vk.c.v          ← C function declarations (vkCreateInstance, vkCmdDispatch, etc.)
│   ├── vk.ctypes.v     ← C type definitions (VkInstance, VkDevice, VkBuffer, etc.)
│   ├── vk.device.v     ← Device, physical device, instance management
│   ├── vk.buffer.v     ← Buffer creation, memory binding
│   ├── vk.memory.v     ← Memory allocation, map/unmap, staging
│   ├── vk.shader.v     ← ShaderModule from SPIR-V
│   ├── vk.pipeline.v   ← Compute pipeline, pipeline layout
│   ├── vk.descriptor.v ← Descriptor set layout, pool, allocation
│   ├── vk.command.v    ← Command buffer, submit, wait
│   └── vk.kernels.v    ← GLSL compute shader sources (string constants)
├── vcl/                ← Existing OpenCL (data transport + future compute)
├── compute/            ← VSL compute abstraction (dispatch: Vulkan / CUDA / VCL / BLAS)
│   ├── gemm.v
│   ├── elementwise.v
│   └── broadcast.v
└── la/
    ├── la.v     ← Updated: dispatches to compute/ for GPU
    ├── vulkan.v ← Vulkan GEMM for vsl.la.Matrix
    ├── cuda.v   ← CUDA GEMM for vsl.la.Matrix
    └── extra.v  ← Existing: trace, norm, lstsq, qr, lu (CPU only)
```
VSL vs VTL Phases Summary
| Phase | VTL Issue | VSL Issue | VSL Scope |
|---|---|---|---|
| 1 | #58 Vulkan foundation | #237 VSL Vulkan backend | GPU GEMM/element-wise for VSL Matrix |
| 2 | #59 NN forward on GPU | N/A | Not applicable to VSL |
| 3 | #60 VSL + CUDA | #238 VSL CUDA backend | GPU GEMM/element-wise for VSL Matrix |
| 4 | #61 GPU autograd | N/A | VTL handles autograd |
| 5 | #62 OpenCL VCL | #239 VSL VCL compute | Extend VCL from transport to compute |
| 6 | #63 ARM | Covered by #237 | Vulkan on ARM is same code |
| 7+ | #64 Performance | Covered by all | Kernel fusion, mixed precision |
Task List