Stars
Code2World: A GUI World Model via Renderable Code Generation
Official repository for “PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss”
[ICLR2026] Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Official implementation of the ICLR 2026 paper "Urban Socio-Semantic Segmentation with Vision-Language Reasoning"
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
Processed / Cleaned Data for Paper Copilot
[EMNLP25] Official code for "Position Bias Mitigates Position Bias: Mitigate Position Bias Through Inter-Position Knowledge Distillation"
[AAAI2026] ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
[ICLR2026] Advancing End-To-End Pixel-Space Generative Modeling Via Self-Supervised Pre-Training
[ICLR 2026] Tree Search for LLM Agent Reinforcement Learning
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
[ICLR2026] Implementation of "S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models"
[AAAI2026] Implementation Code for Omni-Effects
Improving Food Image Recognition with Noisy Vision Transformer
[ICLR26] NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models
Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
MAGI-1: Autoregressive Video Generation at Scale
[ICCV 25] VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model.
[ICLR26] GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Collect every awesome work about R1!
Witness the aha moment of VLM with less than $3.