kvcache

Here are 29 public repositories matching this topic...

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

reinforcement-learning inference rdma disaggregation llm vllm sglang kvcache trt-llm tokenspeed

Updated Jul 3, 2026
C++

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven).

ai networking hpc amd gpu collective cuda p2p nvidia broadcom moe rdma allreduce llm kvcache

Updated Jun 27, 2026
C++

Zefan-Cai / R-KV

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

llm kvcache reasoning-models

Updated Jul 2, 2026
Python

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve

Updated Jul 2, 2026
Python

agentic-in / inferoa

Inference-native Tokenmaxxing Agent Harness for Loop Engineering

agent inference token llm kvcache agentic-ai agent-harness harness-engineering tokenmaxxing loop-engineering

Updated Jun 18, 2026
TypeScript

ModelEngine-Group / unified-cache-management

Persist and reuse KV Cache to speed up your LLM.

gpu cuda nfs torch ssd dram hbm ucm npu ascend llm vllm deepseek kvcache

Updated Jul 2, 2026
Python

alibaba / tair-kvcache

Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSim), and more.

simulator kv-cache llm kvcache hisim

Updated Jun 30, 2026
C++

NoakLiu / PiKV

PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]

distributed-systems parallel-computing moe mixture-model management-system mixture-of-experts mlsystem kv-cache kvcache

Updated Jun 12, 2026
Python

rh-aiservices-bu / sardeenz

Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more models onto a GPU, until it is fully utilized.

vllm kvcache kvcached

Updated Jun 9, 2026
TypeScript

BJTU-ANT / CacheRoute

CacheRoute is an innovative LLM scheduling scheme dedicated to enabling flexible KV cache reuse across LLM systems, improving task performance and system efficiency.

network routing knowledge-injection llm vllm llm-inference kvcache lmcache llm-task-scheduling kvcache-reuse