Stars
A simple, fast and robust program-aware agentic inference system.
rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.
MCP server and Claude Code skill for Excalidraw — programmatic canvas toolkit to create, edit, and export diagrams via AI agents with real-time canvas sync.
A low-latency & high-throughput serving engine for LLMs
RBLN Model Zoo — Compile once. Deploy anywhere.
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
Large Language Model Text Generation Inference
Learning notes and hands-on experiments for understanding modern machine learning systems.
Local, Free CharacterAI with inference on Apple Silicon and ESP32 WebSocket transport
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance in only 2k lines of code (2% of vLLM).
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
Claude Code skills that turn any codebase into an interactive knowledge graph you can explore, search, and ask questions about (multi-platform; e.g., Codex is supported).
An LLM inference engine that runs on consumer hardware
Complete solutions to Programming Massively Parallel Processors, 4th Edition
Dashboard for InferenceX™, Open Source Continuous Inference
A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Embeddings (RoPE).
A fast, helpful, and open-source document parser
Give your agents the power of the Hugging Face ecosystem
A high-performance inference system for large language models, designed for production environments.
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancing, cluster mode, guardrails, support for 1000+ models, and <100 µs overhead at 5k RPS.