Stars
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
Reference code for the Meta-Harness paper.
[ICLR 2026] Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Benchmarking physical understanding in generative video models
[ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Let's make video diffusion practical!
A framework for few-shot evaluation of language models.
Fully open data curation for reasoning models
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Explore the Multimodal “Aha Moment” on a 2B Model
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
[ICLR'25] Reconstructive Visual Instruction Tuning
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
The first behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838