Stars
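Sharing AI Infra knowledge & coding practice: getting started with PyTorch/vLLM/SGLang⚡️, performance acceleration🚀, LLM fundamentals🧠, AI hardware and software🔧, and more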
Persist and reuse KV Cache to speedup your LLM.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Boosting RAG on model and system performance with context reuse
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are commi…
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
Chat log tool: easily use your own chat data.
Zotero plugin to automatically move attachments and link them
⛷ Lightweight Markdown app to help you write great sentences.
😼 Elegantly use a proxy environment based on clash/mihomo
Curated collection of papers on MoE model inference
Machine Learning Engineering Open Book
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
My learning notes for ML SYS.
Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team.
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
The official implementation of "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning"
AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
A sparse attention kernel supporting mixed sparse patterns
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
A simple, cross-platform agent framework and tutorial