-
cudarc
Safe and minimal CUDA bindings
-
whisper-rs
Rust bindings for whisper.cpp
-
neptune
Poseidon hashing over BLS12-381 for Filecoin
-
bindgen_cuda
Bindgen-like interface for building CUDA kernels and interacting with them from Rust
-
dlpark
DLPack Rust bindings for Python interop
-
cuvs
RAPIDS vector search library
-
hvm
A massively parallel, optimal functional runtime in Rust
-
axonml-autograd
Automatic differentiation engine for Axonml ML framework
-
ai-hwaccel
Universal AI hardware accelerator detection, capability querying, and workload planning for Rust
-
candle-kernels
CUDA kernels for Candle
-
ringkernel-cuda
CUDA backend for RingKernel - NVIDIA GPU support via cudarc
-
shiguredo_nvcodec
Rust bindings for NVIDIA Video Codec SDK
-
mwa_hyperbeam
Primary beam code for the Murchison Widefield Array (MWA) radio telescope
-
cudaforge
Advanced CUDA kernel builder for Rust with incremental builds, auto-detection, and external dependency support
-
mamba-rs
Mamba SSM and Mamba-3 SISO in Rust with optional CUDA GPU acceleration. Inference and training (BPTT through SSM state, AdamW), CPU + GPU paths, custom CUDA kernels, CUDA Graph capture…
-
oxicuda-ptx
OxiCUDA PTX - PTX code generation DSL and IR for GPU kernel development
-
oxicuda-blas
OxiCUDA BLAS - GPU-accelerated BLAS operations (cuBLAS equivalent)
-
tl_backend
GPU Backend Trait Definitions for TL
-
ferrum-kernels
Unified compute kernels (CUDA/Metal/CPU) and model runner for Ferrum inference
-
oxicuda-driver
OxiCUDA Driver - Dynamic CUDA driver API wrapper via libloading (zero SDK dependency)
-
iron_learn
ML library with GPU-accelerated gradient descent. Supports tensors, complex numbers, linear/logistic regression, and CUDA optimization.
-
oxicuda-launch
OxiCUDA Launch - Type-safe GPU kernel launch infrastructure
-
cudaclaw
CUDA Rust bindings for GPU programming in the Cocapn fleet
-
async-cuda
Async CUDA for Rust
-
xlog-prob
Probabilistic inference engines for XLOG
-
omicsx
SIMD-accelerated sequence alignment and bioinformatics analysis for petabyte-scale genomic data
-
kn-cuda-sys
A wrapper around the CUDA APIs
-
ec-gpu
Traits for field and elliptic curve operations on GPUs
-
flash-map
GPU-native concurrent hash map with bulk-only API. Robin Hood hashing, SoA layout, CUDA kernels. Designed for blockchain state, HFT, and batch-parallel workloads.
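The entry above names Robin Hood hashing. A minimal CPU-side sketch of that probing discipline follows — illustrative only, not flash-map's API or layout; all names here are hypothetical:

```rust
// Robin Hood open-addressing insert: on collision, the key that has
// probed farther from its home slot keeps the slot, which bounds the
// variance of probe lengths.

const CAP: usize = 16; // power of two, so `hash & (CAP - 1)` is the home slot

#[derive(Clone, Copy)]
struct Slot {
    key: u64,
    val: u64,
    dist: usize, // probe distance from the key's home slot
}

struct RobinHoodMap {
    slots: Vec<Option<Slot>>,
}

impl RobinHoodMap {
    fn new() -> Self {
        Self { slots: vec![None; CAP] }
    }

    fn home(key: u64) -> usize {
        // Toy mixer; a real table would use a stronger hash.
        (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize & (CAP - 1)
    }

    fn insert(&mut self, key: u64, val: u64) {
        let mut cur = Slot { key, val, dist: 0 };
        let mut i = Self::home(key);
        loop {
            match self.slots[i] {
                None => {
                    self.slots[i] = Some(cur);
                    return;
                }
                Some(mut occ) => {
                    if occ.key == cur.key {
                        occ.val = cur.val; // update in place
                        self.slots[i] = Some(occ);
                        return;
                    }
                    if occ.dist < cur.dist {
                        // "Rob the rich": displace the closer-to-home entry
                        // and carry it forward instead.
                        self.slots[i] = Some(cur);
                        cur = occ;
                    }
                }
            }
            i = (i + 1) & (CAP - 1);
            cur.dist += 1;
        }
    }

    fn get(&self, key: u64) -> Option<u64> {
        let mut i = Self::home(key);
        let mut dist = 0;
        loop {
            match self.slots[i] {
                None => return None,
                // An incumbent closer to home than our probe distance
                // means the key cannot appear later in the chain.
                Some(occ) if occ.dist < dist => return None,
                Some(occ) if occ.key == key => return Some(occ.val),
                Some(_) => {}
            }
            i = (i + 1) & (CAP - 1);
            dist += 1;
        }
    }
}

fn main() {
    let mut m = RobinHoodMap::new();
    for k in 0..10u64 {
        m.insert(k, k * 100);
    }
    assert_eq!(m.get(7), Some(700));
    assert_eq!(m.get(42), None);
    m.insert(7, 701);
    assert_eq!(m.get(7), Some(701));
}
```

The early-exit in `get` (stop once an incumbent's distance drops below the probe distance) is what makes Robin Hood lookups cheap even near full load; a bulk-only GPU variant would apply the same invariant batch-wise.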
-
zfp-sys
Raw Rust bindings to ZFP (https://github.com/LLNL/zfp)
-
bend-lang
A high-level, massively parallel programming language
-
llama-cpp-sys-2
Low Level Bindings to llama.cpp
-
oxideav-nvidia
Linux NVIDIA NVDEC/NVENC hardware decode/encode bridge for the oxideav framework — runtime-loaded via libloading, no compile-time CUDA SDK dep
-
async-tensorrt
Async TensorRT for Rust
-
tensor-crab
Rust-native ML library. No Python. No GIL. Just speed.
-
crown
A cryptographic library
-
rf-detr-ort
High-performance RF-DETR object detection inference via ONNX Runtime (TensorRT / CUDA / CPU)
-
ringkernel-graph
GPU-accelerated graph algorithm primitives for RingKernel (CSR, BFS, SCC, Union-Find, SpMV)
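The entry above names CSR and SpMV. A CPU reference for that layout and operation — illustrative only, not ringkernel-graph's API:

```rust
// CSR (compressed sparse row) SpMV: row_ptr[i]..row_ptr[i+1] indexes
// the nonzeros of row i in col_idx/vals.

struct Csr {
    row_ptr: Vec<usize>,
    col_idx: Vec<usize>,
    vals: Vec<f64>,
}

fn spmv(a: &Csr, x: &[f64]) -> Vec<f64> {
    let n_rows = a.row_ptr.len() - 1;
    let mut y = vec![0.0; n_rows];
    for i in 0..n_rows {
        let mut acc = 0.0;
        for k in a.row_ptr[i]..a.row_ptr[i + 1] {
            acc += a.vals[k] * x[a.col_idx[k]];
        }
        y[i] = acc; // on a GPU, a thread or warp typically owns one row
    }
    y
}

fn main() {
    // Dense form of the test matrix:
    // [ 1 0 2 ]
    // [ 0 3 0 ]
    // [ 4 0 5 ]
    let a = Csr {
        row_ptr: vec![0, 2, 3, 5],
        col_idx: vec![0, 2, 1, 0, 2],
        vals: vec![1.0, 2.0, 3.0, 4.0, 5.0],
    };
    let y = spmv(&a, &[1.0, 1.0, 1.0]);
    assert_eq!(y, vec![3.0, 3.0, 9.0]);
}
```

The same row-pointer layout underlies the BFS frontier expansion such crates describe: a row's nonzeros are exactly its adjacency list.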
-
cuda-rust-wasm
CUDA to Rust transpiler with WebGPU/WASM support
-
object_detector
Object detection using ORT and the yoloe-26-seg model. The model detects multiple objects per image, each with a tag, a pixel-level mask, and a bounding box. It is pretrained, with a vocabulary of 4000+ objects.
-
xlog-cuda
CUDA kernel provider, buffers, and interop for XLOG
-
ringkernel-cuda-codegen
CUDA code generation from Rust DSL for RingKernel stencil kernels
-
wax-llm
Command-line LLM inference with Candle, safetensors, GGUF, and Metal support
-
mwa_hyperdrive
Calibration software for the Murchison Widefield Array (MWA) radio telescope
-
cuda-async
Safe Async CUDA support via Async Rust
-
gpu-scatter-gather
GPU-accelerated wordlist generator with multi-GPU support
-
guerks_image_processing
CUDA image processing
-
cudf
Safe Rust bindings for NVIDIA libcudf -- GPU-accelerated DataFrame operations
-
whisper-mcp-server
Speech-to-text MCP server powered by whisper.cpp
-
pasta-msm
Optimized multiscalar multiplication for Pasta moduli on x86_64 and aarch64
-
axonml-optim
Optimizers and learning rate schedulers for the Axonml ML framework
-
moe-gpu-dsp
MoE-routed GPU signal processing framework — batch cuFFT, kernel dispatch, zero-copy pipelines
-
ndrs
A tensor library with GPU support
-
flodl-cli
libtorch manager and GPU diagnostic tool for Rust deep learning
-
ringkernel-montecarlo
GPU-accelerated Monte Carlo primitives for RingKernel (Philox RNG, variance reduction)
-
supraseal-c2
CUDA Groth16 proof generator for Filecoin
-
nove
lightweight deep learning library wrapped around Candle Tensor
-
xlog-cli
Command-line interface for deterministic and probabilistic XLOG execution
-
oxicuda-backend
OxiCUDA Backend - Abstract compute backend trait for GPU dispatch
-
burn-cuda
CUDA backend for the Burn framework
-
infernum
CLI - From the depths, intelligence rises
-
xlog-gpu
High-level Rust API for running XLOG programs on NVIDIA GPUs
-
ferrum-cuda-kernels
Custom CUDA kernels and decode runner for Ferrum inference
-
perdix
High-performance GPU-accelerated ring buffer for AI terminal multiplexing
-
torsh-backend
Backend abstraction layer for ToRSh
-
xdl-amp
Multi-backend GPU/ML acceleration for XDL
-
pylate-rs
WebAssembly library for late interaction models
-
lumen-engine-ffmpeg
FFmpeg integration for media decode, encode, muxing, and GPU interop in Lumen
-
optirs-gpu
OptiRS GPU acceleration and multi-GPU optimization
-
torch_poetry_bootstrap
A command-line tool to detect CUDA version and install the appropriate PyTorch wheel via Poetry
-
ec-gpu-gen
Code generator for field and elliptic curve operations on GPUs
-
nvidia-video-codec-sdk
Bindings for NVIDIA Video Codec SDK
-
sass-assembler
SASS (NVIDIA GPU) assembler for Gaia project
-
llama-cpp-bindings
llama.cpp bindings for Rust
-
atomr-accel-cuda
GPU acceleration via the actor model. Wraps NVIDIA CUDA libraries (cuBLAS, cuDNN, cuFFT, cuRAND, cuSOLVER, cuSPARSE, cuTENSOR, cuBLASLt, NVRTC, NCCL) as supervised atomr actors with…
-
xlog-core
Core types, traits, and error surfaces shared across XLOG
-
burn_dragon_kernel
Fused GPU kernel crate for burn_dragon execution paths
-
cutile
lets programmers safely author and execute tile kernels directly in Rust
-
cuda-driver-sys
Rust binding to CUDA Driver APIs
-
singe-ptx
CUDA PTX parser, AST, and instruction metadata utilities
-
autd3-backend-cuda
CUDA Backend for AUTD3
-
docbert-pylate
late interaction (ColBERT) models, vendored into the docbert workspace
-
with-gpu
Intelligent GPU selection wrapper for CUDA commands
-
nam-ec-gpu-gen
Code generator for field and elliptic curve operations on GPUs
-
oxicuda-sparse
OxiCUDA Sparse - GPU-accelerated sparse matrix operations (cuSPARSE equivalent)
-
rlkit
deep reinforcement learning library based on Rust and Candle, providing complete implementations of Q-Learning and DQN algorithms, supporting custom environments, various policy choices…
-
car-memgine
Memgine — graph-based memory engine for Common Agent Runtime
-
sbv2_core
Inference library for Style-Bert-VITS
-
signinum-cuda-runtime
CUDA Driver API runtime helpers for signinum device adapters
-
tensor_frame
A PyTorch-like tensor library for Rust with CPU, WGPU, and CUDA backends
-
morok-runtime
Kernel execution runtime for the Morok ML compiler
-
singe-cuda-find
CUDA toolkit discovery and library path resolution utilities
-
baracuda-forge
Build-time CUDA kernel compiler for the baracuda ecosystem: nvcc-driven incremental builds, parallel compilation, GPU auto-detection, and CUTLASS / custom git dependency support
-
atomr-accel-flashattn
FlashAttention v2 + v3 kernel templates for atomr-accel — fp16/bf16/fp8, causal, varlen, ALiBi, sliding window, sink tokens, MQA/GQA, paged KV-cache, and chunked prefill, dispatched through NVRTC + Phase 0…
-
baracuda-types
Shared type vocabulary for the baracuda CUDA stack (Half/BFloat16/Complex, DeviceRepr, CudaVersion, Feature, CudaStatus)
-
haagenti-cuda
CUDA GPU decompression kernels for Haagenti tensor compression
-
tensorrt-infer
Safe Rust wrappers for NVIDIA TensorRT inference
-
cyanea-gpu
GPU compute abstraction (CUDA/Metal) for the Cyanea bioinformatics ecosystem
-
tesser-cortex
High-performance, hardware-agnostic AI inference engine for Tesser
-
cocapn-glue-core
Cross-tier wire protocol unifying all FLUX ISA packages for the Cocapn fleet
-
icicle-core
GPU ZK acceleration by Ingonyama
-
nove_tensor
lightweight deep learning library wrapped around Candle Tensor
-
cuda-runtime-sys
Rust binding to CUDA Runtime APIs
-
luminal_cudarc
Safe wrappers around CUDA apis
-
ct-cuda-prep
GPU-ready CUDA snap kernels — compile-verified, CPU fallback, PTX analysis
-
baracuda-cuda-sys
Raw FFI bindings and dynamic loader for the CUDA Driver and Runtime APIs (libcuda / libcudart)
-
kaio-runtime
KAIO runtime — CUDA driver API wrapper, kernel launch, and device memory management. Part of the KAIO GPU kernel authoring framework.
-
atomr-accel-cutlass
CUTLASS kernel-template instantiation via NVRTC for atomr-accel. Provides GEMM, grouped GEMM, implicit-GEMM convolution, and EVT (epilogue visitor tree) actors that JIT CUTLASS C++…
-
whisper-rs-sys
Rust bindings for whisper.cpp (FFI bindings)
-
fatbinary
manipulate CUDA fatbinary format
-
cudf-cxx
cxx-based FFI bridge between Rust and NVIDIA libcudf C++ API
-
piper-tts-rs
Piper-TTS implementation in Rust
-
oxicuda-runtime
OxiCUDA Runtime - CUDA Runtime API wrapper (cudaMalloc/cudaMemcpy/cudaLaunchKernel) built on the driver API
-
baracuda-runtime
Safe Rust wrappers for the CUDA Runtime API (devices, streams, events, managed memory, kernel launch via the library API)
-
cuvs-sys
Low-level Rust bindings to libcuvs
-
xndarray
CPU and CUDA-backed ndarray
-
infraqueue-ai-server
AI model server for INFRAQUEUE
-
abaddon
LLM inference engine - The Destroyer renders judgment
-
rcudnn
safe Rust wrapper for CUDA's cuDNN
-
RayBNN_Raytrace
Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
-
faiss-next-sys
Raw FFI bindings to Faiss (Facebook AI Similarity Search)
-
baracuda-cublas
Safe Rust wrappers for NVIDIA cuBLAS (classic BLAS, Lt, Xt)
-
tensorlogic-oxicuda-solver
OxiCUDA linear solver wrapper for TensorLogic (GPU + CPU fallback)
-
baracuda-cutensor-sys
Raw FFI bindings and dynamic loader for NVIDIA cuTENSOR (tensor contraction)
-
atomr-accel-agents
Agentic / LLM GPU actor blueprints on atomr-accel-cuda: RagPipeline, EmbeddingCache, CpuVectorIndex, SharedGpuStateCoordinator, LangGraphGpuActor
-
RayBNN_Sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
-
baracuda-cvcuda-sys
Raw FFI bindings and dynamic loader for NVIDIA CV-CUDA (computer-vision operators)
-
baracuda-cutensor
Safe Rust wrappers for NVIDIA cuTENSOR. Scaffolding at v0.1.
-
dandelion-cuda
NVIDIA CUDA backend for dandelion LLM inference engine
-
jawe-cuvs-iv
RAPIDS vector search library
-
jawe-cuvs-iii
RAPIDS vector search library
-
RayBNN_DataLoader
Read CSV, numpy, and binary files to Rust vectors of f16, f32, f64, u8, u16, u32, u64, i8, i16, i32, i64
-
baracuda-nvcomp-sys
Raw FFI bindings and dynamic loader for NVIDIA nvCOMP (GPU compression)
-
singe-nccl-sys
Low-level FFI bindings for the NVIDIA Collective Communications Library (NCCL)
-
kitsune-stt
Speech-to-Text tool using Candle and Voxtral
-
baracuda-cusolver
Safe Rust wrappers for NVIDIA cuSOLVER (dense LU factorization at v0.1)
-
blazen-llm-llamacpp
Local LLM backend for Blazen using llama.cpp inference engine
-
baracuda-cufile-sys
Raw FFI bindings and dynamic loader for NVIDIA cuFile (GPUDirect Storage, Linux-only)
-
blazen-llm-mistralrs
Local LLM backend for Blazen using mistral.rs inference engine
-
atomr-accel-train
Distributed training blueprints on atomr-accel-cuda: DataParallelTrainer, PipelineParallelTrainer, TensorParallelTrainer, AsyncParameterServer, optimizer + loss enums
-
baracuda-nvml
Safe Rust wrappers for the NVIDIA Management Library (NVML) — driver-bundled GPU monitoring
-
baracuda-cvcuda
Safe Rust wrappers for NVIDIA CV-CUDA. Scaffolding at v0.1.
-
baracuda-cusparse
Safe Rust wrappers for NVIDIA cuSPARSE (generic-API SpMV at v0.1)
-
rakka-accel-train
Distributed training blueprints on rakka-accel-cuda: DataParallelTrainer, PipelineParallelTrainer, TensorParallelTrainer, AsyncParameterServer, optimizer + loss enums
-
baracuda-nccl
Safe Rust wrappers for NVIDIA NCCL (multi-GPU collective communication)
-
iro-cuda-ffi
IRO CUDA FFI - A minimal, rigid ABI boundary for Rust to orchestrate nvcc-compiled CUDA kernels
-
crseo-sys
Cuda Engined Optics Rust Interface
-
rakka-accel-agents
Agentic / LLM GPU actor blueprints on rakka-accel-cuda: RagPipeline, EmbeddingCache, CpuVectorIndex, SharedGpuStateCoordinator, LangGraphGpuActor
-
singe
A machine learning framework that sets tensors ablaze
-
hpt-cudakernels
implements cuda kernels for hpt
-
singe-kernel
Custom CUDA kernel development framework
-
singe-onnx
ONNX model loading and execution utilities
-
baracuda-nvcomp
Safe Rust wrappers for NVIDIA nvCOMP (GPU compression). Scaffolding at v0.1.
-
crown-bin
A cryptographic library
-
rcublas
safe Rust wrapper for CUDA's cuBLAS
-
singe-cuda-sys
Low-level FFI bindings for CUDA driver, runtime, NVRTC, and related NVIDIA APIs
-
blazen-audio-whispercpp
Local speech-to-text backend for Blazen using whisper.cpp
-
cublas
safe Rust wrapper for CUDA's cuBLAS
-
baracuda-tensorrt
Safe Rust API for NVIDIA TensorRT runtime inference
-
baracuda-nvjitlink
Safe Rust wrappers for NVIDIA nvJitLink (CUDA 12.0+ JIT linker)
-
baracuda-curand
Safe Rust wrappers for NVIDIA cuRAND (pseudo- and quasi-random number generation)
-
blazen-image-diffusion
Local image generation backend for Blazen using diffusion-rs (pure Rust Stable Diffusion)
-
codemem-embeddings
Candle-based embedding service for Codemem using BAAI/bge-base-en-v1.5
-
baracuda-cudf
Safe Rust API skeleton for NVIDIA RAPIDS cuDF (GPU DataFrames)
-
blazen-embed-candle
Local embedding backend for Blazen using HuggingFace candle
-
cuda-config
Helper crate for finding CUDA libraries
-
baracuda-npp
Safe Rust wrappers for NVIDIA NPP (Performance Primitives). Core + signal subset at v0.1.
-
mnemefusion-llama-cpp-sys-2
Low Level Bindings to llama.cpp (MnemeFusion fork with build fixes)
-
ptx-builder
NVPTX build helper
-
icicle-cuda-runtime
Ingonyama's Rust wrapper of CUDA runtime
-
tropical-gemm-cuda
CUDA backend for tropical matrix multiplication
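The entry above names tropical matrix multiplication. A CPU reference for the (min, +) semiring it computes — illustrative only, not the crate's API:

```rust
// Tropical (min-plus) matrix multiply: ordinary GEMM's (+, *) becomes
// (min, +), and the additive identity becomes infinity.

fn tropical_matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![f64::INFINITY; m]; n];
    for i in 0..n {
        for j in 0..m {
            for l in 0..k {
                // c[i][j] = min over l of a[i][l] + b[l][j]
                c[i][j] = c[i][j].min(a[i][l] + b[l][j]);
            }
        }
    }
    c
}

fn main() {
    // On a graph's weight matrix, one tropical square extends all
    // shortest paths by one hop (the basis of min-plus APSP).
    let inf = f64::INFINITY;
    let w = vec![
        vec![0.0, 1.0, inf],
        vec![inf, 0.0, 2.0],
        vec![inf, inf, 0.0],
    ];
    let w2 = tropical_matmul(&w, &w);
    assert_eq!(w2[0][2], 3.0); // path 0 -> 1 -> 2 costs 1 + 2
}
```

Because the inner loop has the same shape as ordinary GEMM, a CUDA backend can reuse standard tiling and shared-memory blocking with the operators swapped.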
-
jawe-cuvs-sys-ii
Low-level Rust bindings to libcuvs
-
hodu_cuda_kernels
hodu cuda kernels
-
RayBNN_Cell
Cell Position Generator for RayBNN
-
baracuda-cutlass-sys
Header acquisition for NVIDIA CUTLASS as a baracuda workspace dependency. Sparse-checkout fetch with file-locked caching; emits cargo:include for downstream build.rs consumers.
-
zenu-cuda
CUDA bindings for Rust
-
accel
GPGPU Framework for Rust
-
blazen-llm-candle
Local LLM backend for Blazen using candle inference engine
-
cuda
CUDA bindings
-
baracuda-cudnn-sys
Raw FFI bindings and dynamic loader for NVIDIA cuDNN (classic-API subset)
-
cudnn
safe Rust wrapper for CUDA's cuDNN
-
baracuda-cusolver-sys
Raw FFI bindings and dynamic loader for NVIDIA cuSOLVER (Dn subset)
-
baracuda-cusparse-sys
Raw FFI bindings and dynamic loader for NVIDIA cuSPARSE
-
kaio-core
KAIO core — PTX IR types and emission. Part of the KAIO GPU kernel authoring framework.
-
baracuda-cublas-sys
Raw FFI bindings and dynamic loader for NVIDIA cuBLAS (classic, Lt, Xt) libraries
-
baracuda-nvml-sys
Raw FFI bindings and dynamic loader for the NVIDIA Management Library (NVML)
-
RayBNN_Optimizer
Gradient Descent Optimizers and Genetic Algorithms using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
-
ventura-cuda
cuda feature for ventura
-
emixai
Feature-gated AI helpers (audio, imaging, language, vision) for EssentialMix
-
baracuda-nccl-sys
Raw FFI bindings and dynamic loader for NVIDIA NCCL (multi-GPU collective communication)
-
baracuda-tensorrt-sys
Raw FFI bindings and dynamic loader for NVIDIA TensorRT (C API)
-
baracuda-nvjpeg-sys
Raw FFI bindings and dynamic loader for NVIDIA nvJPEG
-
baracuda-curand-sys
Raw FFI bindings and dynamic loader for NVIDIA cuRAND
-
singe-macros
Procedural macros for the Singe framework
-
baracuda-cufft-sys
Raw FFI bindings and dynamic loader for NVIDIA cuFFT
-
scir-gpu
SciR GPU foundations: device arrays and CUDA (feature-gated) elementwise/FIR kernels with CPU parity
-
luminal_cuda
Cuda compiler for luminal
-
cudarse-driver
Bindings to the CUDA Driver API that try to stay faithful to the original
-
async-cuda-npp
Async NVIDIA Performance Primitives for Rust
-
cudf-sys
Native build script for linking against NVIDIA libcudf
-
zerch-embed
Local embedding model using ONNX Runtime
-
crown-jsasm
A cryptographic library
-
cudnn-sys
FFI bindings to cuDNN
-
jawe-cuvs-sys-iii
Low-level Rust bindings to libcuvs
-
candle_embed
Text embeddings with Candle. Fast and configurable. Use any model from Hugging Face. CUDA or CPU powered.
-
jawe-cuvs-sys-iv
Low-level Rust bindings to libcuvs
-
nam-supraseal-c2
CUDA Groth16 proof generator for Filecoin
-
whisper-cpp-plus-sys
Low-level FFI bindings for whisper.cpp
-
tensorrt-infer-sys
Raw FFI bindings for NVIDIA TensorRT inference
-
cuda-oxide
high-level, rusty wrapper over CUDA, aiming for as much safety as one can get when working directly with hardware
-
gpufft-cuda-sys
Raw FFI bindings to cuFFT + CUDA Runtime. Internal plumbing for gpufft.
-
darknet-sys
-sys crate for Rust darknet wrapper
-
rcudnn-sys
FFI bindings to cuDNN
-
cmake-init
Initialize CMake project at speed
-
oxidized-transformers
Transformers library (not functional yet)
-
rcublas-sys
FFI bindings to cuBLAS
-
wgpu-cuda-interop
Vulkan and CUDA memory interop
-
rummage-sys
Raw FFI bindings to the Rummage GPU Nostr mining library (CUDA)
-
tensorgraph-sys
backbone for tensorgraph, providing memory management across devices
-
aprender-gpu
Pure Rust PTX generation for NVIDIA CUDA - no LLVM, no nvcc