- Purdue University
- West Lafayette, Indiana
- ziansu.github.io
Stars
slime is an LLM post-training framework for RL Scaling.
Scalable toolkit for efficient model reinforcement
Ongoing research training transformer models at scale
Official code for "Self-Distilled Agentic Reinforcement Learning"
PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
Reinforcement Learning via Self-Distillation (SDPO)
Code for "Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models".
[ICML 2026 Oral] Minimalist RL for Diffusion LLMs. 89.1% on GSM8K.
Official PyTorch implementation for "Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective"
CANDI: Continuous and Discrete Diffusion
Code for paper "SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models"
A community-maintained Python framework for creating mathematical animations.
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout"
SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B)
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
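🥢 Cook like Lao Xiang Ji 🐔. Dishes newly introduced in the "Lao Xiang Ji Dish Traceability Report 2.0", released in 2026, have been added. The main part was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.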
Tongyi Deep Research, the Leading Open-source Deep Research Agent
SGLang is a high-performance serving framework for large language models and multimodal models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.
[ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series.