Scaling RL to long videos
… scale VLMs for reasoning over long videos. LongVILA-R1 encompasses a meticulously
constructed large-scale … Leveraging our curated dataset of 104K long video question-reasoning-…
Video-RTS: Rethinking reinforcement learning and test-time scaling for efficient and enhanced video reasoning
… large-scale supervised fine-tuning (SFT) data with long CoT … and directly utilize pure RL
training on simple video question-… pure RL training and sparse-to-dense video test-time scaling …
ScaleLong: A multi-timescale benchmark for long video understanding
… distinct scale. ScaleLong includes 269 diverse long videos (averaging 86 minutes), with 4-8
questions per video (at least one per scale), across 5 major categories and 36 subcategories. …
Video-R1: Reinforcing video reasoning in MLLMs
… based reinforcement learning (RL), we introduce Video-R1 as the … To further explore the
impact of scaling up reinforcement … strategies that allow scaling to longer videos, enabling more …
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
… for single turn, combining RL-driven reasoning and multi-turn tool use strategy for long …
The x-axis (log scale) represents the fixed frame budget for the baseline and the average …
Kimi k1.5: Scaling reinforcement learning with LLMs
… By scaling up RL training, we aim to train a model that … our work is to scale long-context RL
training. Partial rollouts … handling long-CoT features by managing the rollouts of both long and …
Thinking with videos: Multimodal tool-augmented reinforcement learning for long video reasoning
… We construct two large-scale, high-quality multi-task video … frame sampling strategy for
efficient long video understanding. … RL framework for efficient and accurate long video reasoning. …
Time-R1: Post-training large vision language model for temporal video grounding
… Existing benchmarks for temporal video grounding either focus on large-scale datasets
tailored for … We also compare RL and SFT strategies across TVG, short video QA, and long …
RL-VideoAlign: Reinforcement Learning for Long-Horizon Aligned, Temporally Consistent, and Interaction-Credible Video Generation
B Run, S Li, S Wang - researchgate.net
… -scale datasets like CityFlow [26] and VERI-Wild [27], which emphasize the difficulty of
cross-camera and long-… keeping subjects coherent during complex 3D rotations in RL-VideoAlign. …
EasyVideoR1: Easier RL for Video Understanding
… To further prevent long video sequences from … -source video-language models at this scale.
We train on approximately 100K video samples assembled from publicly available video RL …