View ziansu's full-sized avatar
😼
Seeking truth

slime is an LLM post-training framework for RL Scaling.

Python 6,209 905 Updated Jun 18, 2026

Scalable toolkit for efficient model reinforcement

Python 1,743 428 Updated Jun 18, 2026

Ongoing research training transformer models at scale

Python 16,748 4,101 Updated Jun 18, 2026

Official code for "Self-Distilled Agentic Reinforcement Learning"

Python 227 16 Updated May 27, 2026

PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"

Python 138 8 Updated May 22, 2026

Reinforcement Learning via Self-Distillation (SDPO)

Python 957 107 Updated Feb 18, 2026

Code for "Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models".

Python 23 Updated Mar 16, 2026

[ICML 2026 Oral] Minimalist RL for Diffusion LLMs. 89.1% on GSM8K.

Python 145 5 Updated Jun 9, 2026

Official PyTorch implementation for "Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective"

Python 38 2 Updated Jan 25, 2026

Easy and Efficient dLLM Fine-Tuning

Python 259 15 Updated Mar 2, 2026

CANDI: Continuous and Discrete Diffusion

Python 27 1 Updated Oct 27, 2025

Code for paper "SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models"

Python 60 6 Updated Oct 29, 2025

A community-maintained Python framework for creating mathematical animations.

Python 39,059 2,917 Updated Jun 17, 2026
Python 12 1 Updated Oct 2, 2025

Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

Python 21 Updated May 19, 2026

Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"

Python 446 52 Updated Jan 26, 2026

Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout"

Python 62 5 Updated Feb 13, 2026

SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B)

Python 359 22 Updated Jun 2, 2026

[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Python 914 37 Updated Feb 10, 2026

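🥢 Cook like Lao Xiang Ji (老乡鸡) 🐔. Dishes newly introduced in the "Lao Xiang Ji Dish Traceability Report 2.0", published in 2026, have been added. The main part was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.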

Dockerfile 23,605 2,344 Updated May 8, 2026

EDB: The Ethereum Project Debugger

Rust 364 41 Updated May 10, 2026

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 19,453 1,490 Updated Feb 27, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,165 6,604 Updated Jun 18, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,253 18,195 Updated Jun 18, 2026

Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

C++ 100 15 Updated Jul 18, 2025

The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".

1,106 52 Updated May 29, 2026

Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.

Python 778 74 Updated Feb 15, 2026

[ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series.

Python 510 43 Updated Jan 28, 2026