Stars
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Minimalist developer portfolio using Next.js 14, React, TailwindCSS, Shadcn UI and Magic UI
How to optimize various algorithms in CUDA.
Search-R1: an efficient, scalable RL training framework for LLMs that interleave reasoning with search-engine calls, built on veRL
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
🐶 Kubernetes CLI To Manage Your Clusters In Style!
Ongoing research training transformer models at scale
slime is an LLM post-training framework for RL scaling.
A Survey of Reinforcement Learning for Large Reasoning Models
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
Fast and memory-efficient exact attention
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A framework for few-shot evaluation of language models.
PyTorch native quantization and sparsity for training and inference (see the dynamic-quantization sketch after this list)
AIInfra (AI infrastructure) refers to the AI system stack that supports training and inference of large AI models, from low-level hardware such as chips up through the upper software layers.
AISystem mainly refers to AI systems, covering the full stack of low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.
A minimal PyTorch re-implementation of OpenAI GPT (Generative Pretrained Transformer) training.
The simplest, fastest repository for training/finetuning medium-sized GPTs (a minimal training-loop sketch follows this list).
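
Several of the entries above (the minimal GPT re-implementation and the medium-sized GPT trainer) center on the same core loop: next-token prediction with a causal decoder. The sketch below is a toy illustration of that loop in plain PyTorch, not code from either repository; the TinyGPT class, the sizes, and the random-byte "corpus" are all made up for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    # Toy decoder-only LM: token + position embeddings, causal
    # Transformer blocks, and a linear head over the vocabulary.
    def __init__(self, vocab_size=256, block_size=64, n_embd=128,
                 n_head=4, n_layer=2):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        layer = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        t = idx.size(1)
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        return self.head(self.blocks(x, mask=mask))

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
for step in range(100):
    # Random bytes stand in for a real tokenized corpus.
    batch = torch.randint(0, 256, (8, model.block_size + 1))
    x, y = batch[:, :-1], batch[:, 1:]  # shift by one: next-token targets
    logits = model(x)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()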
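
Likewise, the quantization entries above all revolve around replacing full-precision weights with low-bit ones. As one concrete, runnable instance of that idea, stock PyTorch ships dynamic INT8 quantization; the snippet below uses that built-in API rather than any of the listed libraries, and the toy model is made up for the example.

import torch
import torch.nn as nn

# Stand-in model; real use targets an LLM's Linear layers the same way.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))

# Swap every nn.Linear for a dynamically quantized version: weights are
# stored as int8, activations are quantized on the fly (CPU inference only).
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # same interface as the fp32 model, smaller weights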