zhuango
🎯 Focusing
  • Peking

269 results for source starred repositories

APEX+ is an LLM Serving Simulator

Python 37 6 Updated Jun 16, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,954 700 Updated Nov 11, 2025

LLM serving cluster simulator

Jupyter Notebook 120 12 Updated Apr 25, 2024

Simulator for LLM inference on an abstract 3D AIMC-based accelerator

Python 24 4 Updated Sep 18, 2025

A large-scale simulation framework for LLM inference

Python 474 89 Updated Jul 25, 2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Python 93 10 Updated Jun 14, 2025

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

3,049 203 Updated Nov 10, 2025

Awesome LLM compression research papers and tools.

1,701 110 Updated Nov 10, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 574 70 Updated Sep 11, 2024

Official Repository of Absolute Zero Reasoner

Python 1,740 288 Updated Aug 24, 2025

[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Python 128 7 Updated Oct 29, 2025

A live-streamed development of RL tuning for LLM agents

Python 3,590 499 Updated Oct 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,712 985 Updated Nov 6, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,438 110 Updated Aug 5, 2025

Curated collection of papers in machine learning systems

452 29 Updated Nov 8, 2025

TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)

Python 407 22 Updated Sep 23, 2025

Fully open data curation for reasoning models

Python 2,140 178 Updated Sep 3, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,699 441 Updated Nov 11, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,137 3,945 Updated Nov 10, 2025

Simple RL training for reasoning

Python 3,784 279 Updated Aug 3, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,014 52 Updated Oct 25, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,411 162 Updated Mar 20, 2025

Scalable data preprocessing and curation toolkit for LLMs

Python 1,209 187 Updated Nov 10, 2025

A series of technical reports on Slow Thinking with LLM

Python 744 41 Updated Aug 13, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 677 50 Updated Jan 20, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,268 421 Updated Nov 10, 2025

Performance Estimates for Transformer AI Models in Science

Jupyter Notebook 9 1 Updated Oct 2, 2024

A recipe for online RLHF and online iterative DPO.

Python 536 49 Updated Dec 28, 2024
Python 29 2 Updated Feb 10, 2025