Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Updated Jul 3, 2026 · C++
[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Inference-native Tokenmaxxing Agent Harness for Loop Engineering
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
CacheRoute is an LLM scheduling scheme that enables flexible KV cache reuse across LLM systems to improve task performance and system efficiency.
(ACL 2025 Oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
🔥 [ICML'26] ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
Span Queries: What if we had a way to plan and optimize GenAI like we do for SQL?
A TurboQuant implementation in llama.cpp for AMD GPUs using the Vulkan runtime
High-Performance KV Cache Sharing Library
An empirical study benchmarking LLM inference with KV cache offloading, using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.
KV cache with PagedAttention vs. PagedAttention + TurboQuant: experiments across token counts comparing memory, latency, and accuracy.
KV-cache compression for LLMs: reference implementations of TurboAngle and TurboQuant codecs with Triton GPU kernels