Stars
A Triton-only attention backend for vLLM
VUA stands for 'VAST Undivided Attention'. It's a global KVCache storage solution optimizing LLM time to first token (TTFT) and GPU utilization.
zartbot / aeron
Forked from aeron-io/aeron
Efficient reliable UDP unicast, UDP multicast, and IPC message transport
A comprehensive knowledge base for Huawei Ascend NPU development, structured as distributed Agent Skills. https://ascend-ai-coding.github.io/awesome-ascend-skills/
An Online Deep Learning Interface for HPC programs on NVIDIA GPUs
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x
mKernel: fast multi-node, multi-GPU fused kernels
A library of HTML slide templates designed so any coding agent can pick the right one and produce a beautiful deck on the user's behalf, automatically.
Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown
CLI for X/Twitter API v2 -- post, search, like, bookmark from your terminal
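A local-life Agent Bridge connected to WeChat that gives Codex / Claude Code a sense of time, a sense of whereabouts, and random and autonomous wake-up capabilities. It replaces all Pomodoro timers and productivity tools with proactive companionship, automatically keeps a diary, maintains a life timeline, sends files and stickers, and calls MCP / local tools.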
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"
TokenSpeed is a speed-of-light LLM inference engine.
A PyTorch native platform for training generative AI models
Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…
An empirical study benchmarking LLM inference with KV cache offloading using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
Utility to convert between various subscription formats
Wrap Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Grok Build as an OpenAI/Gemini/Claude/Codex compatible API service, allowing you to enjoy the free Gemini 3.1 Pro, GPT 5.5, Grok 4.3, Claud…
Clash Mihomo for iOS/macOS/Android/Windows/Linux
A rule-based tunnel for Android.
Warp is an agentic development environment, born out of the terminal.
A Codex-powered Chrome side-panel assistant for page context, tabs, voice, and image workflows.