Starred repositories
Achieve state of the art inference performance with modern accelerators on Kubernetes
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
eBPF-based autoinstrumentation of web applications and network metrics
FlashMLA: Efficient Multi-head Latent Attention Kernels
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents and message gateway, it handles different levels of…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Machine Learning Engineering Open Book
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Lightpanda: the headless browser designed for AI and automation
A vulnerability scanner for container images and filesystems
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
ODIN [for Codex CLI as a plugin] - Outline Driven development approach for agentic INtelligence
AI agents running research on single-GPU nanochat training automatically
A high-throughput and memory-efficient inference and serving engine for LLMs
Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
🔥 xCrash gives Android apps the ability to capture Java crashes, native crashes, and ANRs. No root permission or other system permissions are required.
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
⚡ A CLI tool for code structural search, lint and rewriting. Written in Rust.
Web-based SQLite database browser written in Python
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Financial data platform for analysts, quants and AI agents.
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
Python SDK and proxy server (AI gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…