Stars
A minimalist, open source online pastebin where the server has zero knowledge of pasted data. Data is encrypted/decrypted in the browser using 256-bit AES.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
SGLang is a high-performance serving framework for large language models and multimodal models.
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
A platform for community discussion. Free, open, simple.
The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Analyze computation-communication overlap in V3/R1.
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
A good-looking third-party NetEase Cloud Music player, supporting Windows / macOS / Linux
A sparse attention kernel supporting mixed sparse patterns
ASCII generator (image to text, image to image, video to video)
A file explorer tree for Neovim written in Lua
Official implementation of the paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Infinite Photorealistic Worlds using Procedural Generation
Efficient and easy multi-instance LLM serving
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Fast, Flexible and Portable Structured Generation
Efficient Triton Kernels for LLM Training
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
16-fold memory access reduction with nearly no loss
A throughput-oriented high-performance serving framework for LLMs