Starred repositories
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
FlashMLA: Efficient Multi-head Latent Attention Kernels
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
A validation and profiling tool for AI infrastructure
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
A plug-in replacement for DataLoader to load ImageNet disk-sequentially in PyTorch.
Universal Python binding for the LMDB 'Lightning' Database
Vector Search Engine based on BRPC + FAISS
The official PyTorch implementation of Google's Gemma models
A simple program to calculate and visualize the FLOPs and parameters of PyTorch models, with a handy CLI and an easy-to-use Python API.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Make huge neural nets fit in memory
Provide Python access to the NVML library for GPU diagnostics
Building blocks for foundation models.
Code repository for the paper - "Matryoshka Representation Learning"
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
A curated list of reinforcement learning with human feedback resources (continually updated)
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
High-speed Large Language Model Serving for Local Deployment