Stars
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
A framework for few-shot evaluation of language models.
🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
Learn it. Build it. Ship it for others.
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.
A curated list of awesome open source and commercial platforms for serving models in production 🚀
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
SGLang is a high-performance serving framework for large language models and multimodal models.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Hundreds of models & providers. One command to find what runs on your hardware.
Supercomputing for Artificial Intelligence
Implementation of my AI agent system for ERC 3: https://erc.timetoact-group.at
Implementation of my RAG system that won all categories in Enterprise RAG Challenge 2
Needle simplifies building RAG pipelines.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.
A curated list of materials on AI efficiency
The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
Hexagon-MLIR is a compiler toolchain for compiling and executing AI kernels and models on Qualcomm Hexagon Neural Processing Units (NPUs).
FlashInfer: Kernel Library for LLM Serving
Open-source localization engineering tools. Connects to Lingo.dev localization engineering platform for consistent, quality translations.