Lists (26)
3D&LLM
agent
Awesome
Data selection
diffusion
diffusion LLM
efficiency
Foundation model
Framework
Generation
generation agent
LLM reasoning
LLM RL
Long context understanding
MLLM reasoning
MLLM safety
MLLM understanding
multimodal embedding
OCR
on policy distillation
other
safety
self evolving llm
streaming VLM
unified model
unified model reasoning
Stars
Code for the paper "Learning from the Self-future: On-policy Self-distillation for dLLMs"
OmniAgent (ICML 2026): the first native omni-modal agent for active video perception — a 7B agent that beats Qwen2.5-VL-72B with 73% fewer frames.
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation, The Fourteenth International Conference on Learning Representations (ICLR) 2026, Accepted
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
On-Policy Distillation built on top of Verl
Official Implementation of Trajectory-Refined Distillation
Elevate your AI research writing, no more tedious polishing ✨
Bash is all you need: a nano Claude Code-like "agent harness", built from 0 to 1
Academic Research Skills for Claude Code: research → write → review → revise → finalize
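An AI-native PPT (slides) generator based on nano banana pro🍌, moving toward "Vibe PPT": upload any template image, upload any source material with intelligent parsing, generate a PPT automatically from a single sentence, an outline, or per-page descriptions, revise specified regions with verbal instructions, and export an editable PPT in one click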
MiMo Code: Where Models and Agents Co-Evolve
Source code of paper "RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation"
Official implementation of "Constitutional On-Policy Safe Distillation"
Official code for "Self-Distilled Agentic Reinforcement Learning"
PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
The official implementation of our preprint paper "When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning"
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
GPT Image 2 prompt gallery, image prompt library, agentic skill, and CLI for OpenAI image generation/editing
UniRL is a Framework for Unified Multimodal Model Reinforcement Learning
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned perception to its full-image policy, enabling fine-grained visual u…
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
Codebase for PrismMirror: Real-Time Human Frontal View Synthesis from a Single Image
[ICML 2026] Official codebase for "Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation"
Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation