Stars
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
List of papers related to neural network quantization in recent AI conferences and journals.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
NVIDIA curated collection of educational resources related to general purpose GPU programming.
A Python DSL to write NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch
GPUOcelot: A dynamic compilation framework for PTX
LLM Architecture Gallery source data
Open-source, low-cost 10.5 GHz PLFM phased array RADAR system
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
Qwen3.5-Thor — High-performance BF16/NVFP4 inference engine for Qwen3.5 model family on NVIDIA Jetson AGX Thor (SM110a Blackwell). C++17/CUDA, Ollama/OpenAI compatible API.
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.
llama.cpp fork with additional SOTA quants and improved performance
Distribute and run LLMs with a single file.
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
C inference for Qwen3-ASR 0.6B and 1.7B transcription models
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS. Supports 0.6B and 1.7B models, 9 voices, 10 languages.
On-device AI across mobile, embedded and edge for PyTorch
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Run a 1-billion-parameter LLM on a $10 board with 256 MB RAM
An awesome, curated list of the best LLMOps tools for developers
Packages related to gathering, viewing, and analyzing diagnostics data from robots.