Stars
30 implementations of papers from Sutskever's reading list, inspired by https://papercode.vercel.app/
LLMRouter: An Open-Source Library for LLM Routing
Agentic networking policies and governance for agents and tools in Kubernetes
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
Proposals and discussions for the AI Conformance Working Group.
🚀 Next-generation one-stop AI solution for B2B and B2C, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
SGLang is a high-performance serving framework for large language models and multimodal models.
A flexible, high-performance serving system for machine learning models
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Bootstrap Kubernetes the hard way. No scripts.
An open-source AI agent that brings the power of Gemini directly into your terminal.
llm-d benchmark scripts and tooling
Run Slurm on Kubernetes. A Slinky project.
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
Supercharge Your LLM with the Fastest KV Cache Layer
Helm charts for deploying models with llm-d
Cloud-native high-performance edge/middle/service proxy
Cost-efficient and pluggable infrastructure components for GenAI inference
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…
Text-audio foundation model from Boson AI
GenAI inference performance benchmarking tool
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
A high-throughput and memory-efficient inference and serving engine for LLMs
A Datacenter-Scale Distributed Inference Serving Framework