Stars
A framework for efficient model inference with omni-modality models
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
KV cache store for distributed LLM inference
A fast, clean, responsive Hugo theme.
Cost-efficient and pluggable Infrastructure components for GenAI inference
Universal LLM Deployment Engine with ML Compilation
Analyze the inference of Large Language Models (LLMs), covering aspects like computation, storage, transmission, and the hardware roofline model in a user-friendly interface.
Command-line tool to create and query container image manifest lists/indexes
Reproduction of "Pre-warming is Not Enough" (SoCC'24)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Predict the performance of LLM inference services
A throughput-oriented high-performance serving framework for LLMs
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Efficient and easy multi-instance LLM serving
A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Serverless LLM Serving for Everyone.
Fast Distributed Inference Serving for Large Language Models
Custom controller that extends the Horizontal Pod Autoscaler
SpotServe: Serving Generative Large Language Models on Preemptible Instances
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild