Stars
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Minimalist developer portfolio using Next.js 14, React, TailwindCSS, Shadcn UI and Magic UI
🐶 Kubernetes CLI To Manage Your Clusters In Style!
Ongoing research training transformer models at scale
slime is an LLM post-training framework for RL Scaling.
A Survey of Reinforcement Learning for Large Reasoning Models
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
Fast and memory-efficient exact attention
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A unified library of SOTA model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks …
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A framework for few-shot evaluation of language models.
Code for the NeurIPS 2024 paper on QuaRot, an end-to-end 4-bit inference method for large language models.
PyTorch native quantization and sparsity for training and inference
AIInfra (AI infrastructure) covers the AI system stack from low-level hardware such as chips up to the upper software layers that support training and inference of large AI models.
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.
This repository contains the training code for ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization"
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Code for the paper "Language Models are Unsupervised Multitask Learners"
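To ground the last three entries above (minGPT, nanoGPT, and the GPT-2 code release), here is a minimal sketch of the kind of next-token training step those repositories implement. Everything in it, the tiny model, the optimizer settings, and the random stand-in data, is invented for illustration and is not taken from any of those codebases.

```python
# Hedged sketch of a minimal GPT-style next-token training step in plain PyTorch.
# Model size, hyperparameters, and data are illustrative stand-ins only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=256, d_model=128, n_head=4, n_layer=2, block_size=64):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq) token ids; returns logits over the vocabulary.
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Boolean causal mask: True above the diagonal blocks attention to future tokens.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=idx.device), diagonal=1)
        x = self.blocks(x, mask=causal)
        return self.head(x)

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

data = torch.randint(0, 256, (8, 65))        # random stand-in for a tokenized corpus
inputs, targets = data[:, :-1], data[:, 1:]  # shift by one for next-token prediction

logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(f"training loss: {loss.item():.3f}")
```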
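Several of the quantization entries earlier in the list (for example QuaRot, LLM-QAT, ParetoQ, and the FP16xINT4 kernel) build on the same basic primitive: mapping floating-point weights onto a low-bit integer grid with a per-channel scale. The sketch below shows that primitive in plain PyTorch; the function names and settings are mine and are not the API of any library listed above.

```python
# Hedged sketch of symmetric per-channel weight quantization, the common primitive
# behind many low-bit LLM tools. Illustrative only; not any specific library's API.
import torch

def quantize_per_channel(w: torch.Tensor, n_bits: int = 4):
    """Quantize a (out_features, in_features) weight matrix with one scale per output channel."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for signed INT4
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # per-output-channel scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate FP weight; real kernels fuse this into the matmul.
    return q.float() * scale

w = torch.randn(256, 512)            # stand-in FP weight of a linear layer
q, scale = quantize_per_channel(w, n_bits=4)
w_hat = dequantize(q, scale)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```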