🎯
Focusing
  • Peking

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,785 572 Updated May 3, 2024

APEX+ is an LLM Serving Simulator

Python 37 6 Updated Jun 16, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,917 691 Updated Nov 6, 2025

LLM serving cluster simulator

Jupyter Notebook 119 12 Updated Apr 25, 2024

Simulator for LLM inference on an abstract 3D AIMC-based accelerator

Python 24 4 Updated Sep 18, 2025

A large-scale simulation framework for LLM inference

Python 473 89 Updated Jul 25, 2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Python 91 10 Updated Jun 14, 2025

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

3,040 201 Updated Nov 5, 2025

Awesome LLM compression research papers and tools.

1,700 109 Updated Nov 6, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 573 70 Updated Sep 11, 2024

Official Repository of Absolute Zero Reasoner

Python 1,736 289 Updated Aug 24, 2025

[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Python 125 7 Updated Oct 29, 2025

A live stream development of RL tuning for LLM agents

Python 3,577 498 Updated Oct 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,696 973 Updated Nov 6, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,430 109 Updated Aug 5, 2025

Curated collection of papers in machine learning systems

448 29 Updated Oct 4, 2025

TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)

Python 405 22 Updated Sep 23, 2025

Fully open data curation for reasoning models

Python 2,135 177 Updated Sep 3, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,679 440 Updated Nov 4, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 47,985 3,925 Updated Nov 6, 2025

Simple RL training for reasoning

Python 3,782 279 Updated Aug 3, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,005 52 Updated Oct 25, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,410 163 Updated Mar 20, 2025

Scalable data preprocessing and curation toolkit for LLMs

Python 1,200 186 Updated Nov 5, 2025

A series of technical reports on Slow Thinking with LLM

Python 743 41 Updated Aug 13, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 675 50 Updated Jan 20, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,260 419 Updated Nov 6, 2025

Performance Estimates for Transformer AI Models in Science

Jupyter Notebook 9 1 Updated Oct 2, 2024

A recipe for online RLHF and online iterative DPO.

Python 536 49 Updated Dec 28, 2024
Python 29 2 Updated Feb 10, 2025