- Alibaba Cloud
- Hangzhou China
- 22:22 (UTC +08:00)
- https://tonylu.dev (UNDER MAINTENANCE)
- @tonyluj
Starred repositories
A high-performance and light-weight router for vLLM large scale deployment
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
Manages Unified Access to Generative AI Services built on Envoy Gateway
A GPU-accelerated cross-platform terminal emulator and multiplexer written by @wez and implemented in Rust
A collection of 100+ specialized Claude Code subagents covering a wide range of development use cases
IronClaw is an OpenClaw-inspired implementation in Rust focused on privacy and security
Your Personal AI Assistant; easy to install, deploy on your own machine or on the cloud; supports multiple chat apps with easily extensible capabilities.
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat histor…
LLMRouter: An Open-Source Library for LLM Routing
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
React app for inspecting, building and debugging with the Realtime API
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A framework for efficient model inference with omni-modality models
A Lightweight LLM Inference Performance Simulator
SGLang is a high-performance serving framework for large language models and multimodal models.
Shared data types for building collaborative software
WebAssembly Micro Runtime (WAMR)
A highly customizable, adaptable, runtime-agnostic, and WASM/WASI-friendly Gossip protocol (SWIM) that helps manage cluster membership and member failure detection.