- University of Maryland
- College Park, MD, US
- https://yu-fangxu.github.io/
Stars
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
A curated list of resources (surveys, papers, benchmarks, and open-source projects) on Rubrics
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Paper list of agents for science
[ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"
[Findings of ACL 2026] ArrowGEV: Grounding Events in Video via Learning the Arrow of Time
MiroEval: A benchmark and evaluation framework for deep research agents — 100 tasks (70 text, 30 multimodal) assessed across synthesis quality, factuality, and research process. 13 systems evaluated.
Awesome Unified Multimodal Models
[ICML 2026] XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[ICLR 2025] "GraphRouter: A Graph-based Router for LLM Selections", Tao Feng, Yanzhen Shen, Jiaxuan You
Qwen3.6 is the large language model series developed by Qwen team, Alibaba Group.
ICLR 2026 (Oral) | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
DeliveryBench: Can Agents Earn Profit in Real World?
[ICML 2026] Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine inte…
Reinforcement Learning via Self-Distillation (SDPO)