😄
I may be slow to respond.
  • Beijing
  • 04:11 (UTC -12:00)


A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,230 189 Updated Dec 23, 2025

Boosting RAG on model and system performance with context reuse

Python 14 1 Updated Dec 19, 2025
Python 31 2 Updated Oct 16, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,293 355 Updated Dec 23, 2025

Contexts Optical Compression

Python 21,555 1,926 Updated Oct 25, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 515 37 Updated Feb 10, 2025

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are commi…

Python 9,912 1,101 Updated Dec 23, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,493 354 Updated Jul 25, 2025

The best ChatGPT that $100 can buy.

Python 39,127 4,954 Updated Dec 9, 2025

Chat log tool: easily use your own chat data.

9,112 2,312 Updated Oct 20, 2025

Nano vLLM

Python 10,035 1,256 Updated Nov 3, 2025

Minimalist vLLM implementation in Rust

Rust 83 13 Updated Dec 23, 2025

Zotero plugin to automatically move attachments and link them

JavaScript 1,116 22 Updated Dec 12, 2025

⛷ Lightweight Markdown app to help you write great sentences.

Swift 7,188 419 Updated Dec 23, 2025

😼 Elegantly use a proxy environment based on clash/mihomo

Shell 7,165 880 Updated Dec 23, 2025

Curated collection of papers in MoE model inference

320 11 Updated Oct 20, 2025

Machine Learning Engineering Open Book

Python 16,085 987 Updated Dec 20, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,460 1,999 Updated Nov 1, 2025

My learning notes for ML SYS.

Python 4,771 303 Updated Dec 22, 2025

Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud.

Python 14,716 1,024 Updated Dec 4, 2025

[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression

Python 126 5 Updated Apr 12, 2025

The official implementation of "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning"

Python 213 33 Updated Dec 16, 2025

AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.

TypeScript 10,928 1,002 Updated Dec 23, 2025

A sparse attention kernel supporting mix sparse patterns

C++ 411 39 Updated Dec 16, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,246 1,190 Updated Dec 23, 2025

A simple, cross-platform agent framework and tutorial

Jupyter Notebook 197 42 Updated Dec 21, 2025

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

Python 261 19 Updated Jul 6, 2025

[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concentrated in low-frequency dimensions across different attentio…

Python 86 1 Updated Jun 20, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,291 159 Updated Jan 4, 2025