Google Scholar

Scaling RL to long videos

Y Chen, W Huang, B Shi, Q Hu, H Ye… - Advances in …, 2026 - proceedings.neurips.cc

… scale VLMs for reasoning over long videos. LongVILA-R1 encompasses a meticulously
constructed large-scale … Leveraging our curated dataset of 104K long video question-reasoning-…

Cited by 65

[PDF] aclanthology.org

Video-RTS: Rethinking reinforcement learning and test-time scaling for efficient and enhanced video reasoning

Z Wang, J Yoon, S Yu, MM Islam… - Proceedings of the …, 2025 - aclanthology.org

… large-scale supervised fine-tuning (SFT) data with long CoT … and directly utilize pure RL
training on simple video question-… pure RL training and sparse-to-dense video test-time scaling …

Cited by 20

[PDF] arxiv.org

ScaleLong: A multi-timescale benchmark for long video understanding

D Ma, H Yuan, X Wang, Q Zang, T Liu, X He… - arXiv preprint arXiv …, 2025 - arxiv.org

… distinct scale. ScaleLong includes 269 diverse long videos (averaging 86 minutes), with 4-8
questions per video (at least one per scale), across 5 major categories and 36 subcategories. …

Cited by 11

[PDF] neurips.cc

Video-R1: Reinforcing video reasoning in MLLMs

K Feng, K Gong, B Li, Z Guo, Y Wang… - Advances in …, 2026 - proceedings.neurips.cc

… based reinforcement learning (RL), we introduce Video-R1 as the … To further explore the
impact of scaling up reinforcement … strategies that allow scaling to longer videos, enabling more …

Cited by 325

[PDF] arxiv.org

VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning

Y Ding, Y Zhang, X Lai, R Chu, Y Yang - arXiv preprint arXiv:2512.22315, 2025 - arxiv.org

… for single turn, combining RL-driven reasoning and multi-turn tool use strategy for long …
The x-axis (log scale) represents the fixed frame budget for the baseline and the average …

Cited by 7

[PDF] arxiv.org

Kimi k1.5: Scaling reinforcement learning with LLMs

K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org

… By scaling up RL training, we aim to train a model that … our work is to scale long-context RL
training. Partial rollouts … handling long-CoT features by managing the rollouts of both long and …

Cited by 884

[PDF] arxiv.org

Thinking with videos: Multimodal tool-augmented reinforcement learning for long video reasoning

H Zhang, X Gu, J Li, C Ma, S Bai, C Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org

… We construct two large-scale, high-quality multi-task video … frame sampling strategy for
efficient long video understanding. … RL framework for efficient and accurate long video reasoning. …

Cited by 44

[PDF] neurips.cc

Time-R1: Post-training large vision language model for temporal video grounding

Y Wang, Z Wang, B Xu, Y Du, K Lin… - Advances in …, 2026 - proceedings.neurips.cc

… Existing benchmarks for temporal video grounding either focus on large-scale datasets
tailored for … We also compare RL and SFT strategies across TVG, short video QA, and long …

Cited by 72

[PDF] researchgate.net

[PDF] RL-VideoAlign: Reinforcement Learning for Long-Horizon Aligned, Temporally Consistent, and Interaction-Credible Video Generation

B Run, S Li, S Wang - researchgate.net

… -scale datasets like CityFlow [26] and VERI-Wild [27], which emphasize the difficulty of
cross-camera and long-… keeping subjects coherent during complex 3D rotations in RL-VideoAlign. …


[PDF] arxiv.org

EasyVideoR1: Easier RL for Video Understanding

C Qin, C Yang, Q Si, N Gu, D Yao, Z Lin, P Fu… - arXiv preprint arXiv …, 2026 - arxiv.org

… To further prevent long video sequences from … -source video-language models at this scale.
We train on approximately 100K video samples assembled from publicly available video RL …
