🎯
Focusing
  • Peking

269 results for source starred repositories

APEX+ is an LLM Serving Simulator

Python 37 6 Updated Jun 16, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,918 694 Updated Nov 7, 2025

LLM serving cluster simulator

Jupyter Notebook 119 12 Updated Apr 25, 2024

Simulator for LLM inference on an abstract 3D AIMC-based accelerator

Python 24 4 Updated Sep 18, 2025

A large-scale simulation framework for LLM inference

Python 473 89 Updated Jul 25, 2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Python 91 10 Updated Jun 14, 2025

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

3,046 202 Updated Nov 5, 2025

Awesome LLM compression research papers and tools.

1,700 109 Updated Nov 6, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 573 70 Updated Sep 11, 2024

Official Repository of Absolute Zero Reasoner

Python 1,737 290 Updated Aug 24, 2025

[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Python 125 7 Updated Oct 29, 2025

A live stream development of RL tuning for LLM agents

Python 3,580 498 Updated Oct 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,697 976 Updated Nov 6, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,432 109 Updated Aug 5, 2025

Curated collection of papers in machine learning systems

448 29 Updated Oct 4, 2025

TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)

Python 407 22 Updated Sep 23, 2025

Fully open data curation for reasoning models

Python 2,136 177 Updated Sep 3, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,685 440 Updated Nov 4, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,021 3,931 Updated Nov 7, 2025

Simple RL training for reasoning

Python 3,783 279 Updated Aug 3, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,007 52 Updated Oct 25, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,410 162 Updated Mar 20, 2025

Scalable data preprocessing and curation toolkit for LLMs

Python 1,202 187 Updated Nov 7, 2025

A series of technical reports on Slow Thinking with LLM

Python 743 41 Updated Aug 13, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 676 50 Updated Jan 20, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,263 420 Updated Nov 7, 2025

Performance Estimates for Transformer AI Models in Science

Jupyter Notebook 9 1 Updated Oct 2, 2024

A recipe for online RLHF and online iterative DPO.

Python 536 49 Updated Dec 28, 2024
Python 29 2 Updated Feb 10, 2025