-
Game-RL Public
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
-
We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference…
-
Thinking-with-Video Public
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
-
Awesome-Agent-RL Public
A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical guides on defining and collecting rewards to build more inte…
59 UpdatedSep 1, 2025 -
MathTrap Public
In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MATHTRAP‡ by introducing carefully designed log…
-