Starred repositories
⚡ HugoBlox: Markdown sites in minutes. Academic/resume/lab/portfolio for AI researchers & startups. Premium templates. Deploy to GitHub Pages in one click.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
A domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
An extremely fast Python package and project manager, written in Rust.
SGLang is a fast serving framework for large language models and vision language models.
verl: Volcano Engine Reinforcement Learning for LLMs
Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
Distributed Compiler based on Triton for Parallel Systems
MAGI-1: Autoregressive Video Generation at Scale
mimalloc is a compact general purpose allocator with excellent performance.
Lightweight in-process concurrent programming
A lightweight data processing framework built on DuckDB and 3FS.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Fully open reproduction of DeepSeek-R1
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
QLoRA: Efficient Finetuning of Quantized LLMs