A comprehensive list of papers on the definition of World Models and the use of World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related websites.

Awesome World Models for Robotics

This repository provides a curated list of papers on World Models for General Video Generation, Embodied AI, and Autonomous Driving. The template is adapted from Awesome-LLM-Robotics and Awesome-World-Model.

Contributions are welcome! Please feel free to submit pull requests or reach out via email to add papers!

If you find this repository useful, please consider citing and giving this list a star ⭐. Feel free to share it with others!


Overview


Foundational Papers on World Models

Blogs and Technical Reports

  • SIMA 2, SIMA 2: A Generalist Embodied Agent for Virtual Worlds. [Paper]
  • SimWorld, SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds. [Paper] [Website]
  • Hunyuan-GameCraft-2, Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model. [Paper] [Website]
  • GigaWorld-0, GigaWorld-0: World Models as Data Engine to Empower Embodied AI. [Paper] [Website]
  • PAN, PAN: A World Model for General, Interactable, and Long-Horizon World Simulation. [Paper]
  • Cosmos-Predict2.5, World Simulation with Video Foundation Models for Physical AI. [Paper] [Code]
  • Emu3.5, Emu3.5: Native Multimodal Models are World Learners. [Paper] [Website] [Code]
  • ODesign, ODesign: A World Model for Biomolecular Interaction Design. [Paper] [Website]
  • GigaBrain-0, GigaBrain-0: A World Model-Powered Vision-Language-Action Model. [Paper] [Website]
  • CWM, CWM: An Open-Weights LLM for Research on Code Generation with World Models. [Paper] [Website] [Code]
  • WoW, WoW: Towards a World omniscient World model Through Embodied Interaction. [Paper] [Website]
  • Matrix-Game 2.0, Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. [Paper] [Website]
  • Matrix-3D, Matrix-3D: Omnidirectional Explorable 3D World Generation. [Paper] [Website]
  • HunyuanWorld 1.0, HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. [Paper] [Website] [Code]
  • What Does it Mean for a Neural Network to Learn a "World Model"?. [Paper]
  • Matrix-Game, Matrix-Game: Interactive World Foundation Model. [Paper] [Code]
  • Cosmos-Drive-Dreams, Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models. [Paper] [Website]
  • GAIA-2, GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving. [Paper] [Website]
  • Cosmos, Cosmos World Foundation Model Platform for Physical AI. [Paper] [Website] [Code]
  • 1X Technologies, 1X World Model. [Blog]
  • Runway, Introducing General World Models. [Blog]
  • Wayve, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [Paper] [Blog]
  • Yann LeCun, A Path Towards Autonomous Machine Intelligence. [Paper]

Surveys

  • "Beyond World Models: Rethinking Understanding in AI Models", AAAI 2026. [Paper]
  • "Simulating the Visual World with Artificial Intelligence: A Roadmap", arXiv 2025.11. [Paper] [Website] [Code]
  • "A Step Toward World Models: A Survey on Robotic Manipulation", arXiv 2025.11. [Paper]
  • "World Models Should Prioritize the Unification of Physical and Social Dynamics", NeurIPS 2025. [Paper] [Website]
  • "From Masks to Worlds: A Hitchhiker's Guide to World Models", arXiv 2025.10. [Paper] [Website]
  • "A Comprehensive Survey on World Models for Embodied AI", arXiv 2025.10. [Paper] [Website]
  • "The Safety Challenge of World Models for Embodied AI Agents: A Review", arXiv 2025.10. [Paper]
  • "Embodied AI: From LLMs to World Models", IEEE CASM. [Paper]
  • "3D and 4D World Modeling: A Survey", arXiv 2025.09. [Paper]
  • "Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges", arXiv 2025.08. [Paper]
  • "A Survey: Learning Embodied Intelligence from Physical Simulators and World Models", arXiv 2025.07. [Paper] [Code]
  • "Embodied AI Agents: Modeling the World", arXiv 2025.06. [Paper]
  • "From 2D to 3D Cognition: A Brief Survey of General World Models", arXiv 2025.06. [Paper]
  • "A Survey on World Models Grounded in Acoustic Physical Information", arXiv 2025.06. [Paper]
  • "Exploring the Evolution of Physics Cognition in Video Generation: A Survey", arXiv 2025.03. [Paper] [Code]
  • "World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child", arXiv 2025.03. [Paper]
  • "Simulating the Real World: A Unified Survey of Multimodal Generative Models", arXiv 2025.03. [Paper] [Code]
  • "Four Principles for Physically Interpretable World Models", arXiv 2025.03. [Paper]
  • "The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey", arXiv 2025.02. [Paper] [Code]
  • "A Survey of World Models for Autonomous Driving", TPAMI. [Paper]
  • "Understanding World or Predicting Future? A Comprehensive Survey of World Models", arXiv 2024.11. [Paper]
  • "World Models: The Safety Perspective", ISSRE WDMD. [Paper]
  • "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", arXiv 2024.11. [Paper]
  • "From Efficient Multimodal Models to World Models: A Survey", arXiv 2024.07. [Paper]
  • "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", arXiv 2024.07. [Paper] [Code]
  • "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", arXiv 2024.05. [Paper] [Code]
  • "World Models for Autonomous Driving: An Initial Survey", TIV. [Paper]
  • "A survey on multimodal large language models for autonomous driving", WACVW 2024. [Paper] [Code]

Benchmarks & Evaluation

  • On Memory: "On Memory: A comparison of memory mechanisms in world models", World Modeling Workshop 2026. [Paper]
  • SmallWorlds: "SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments", arXiv 2025.11. [Paper]
  • 4DWorldBench: "4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models", arXiv 2025.11. [Paper]
  • Target-Bench: "Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?", arXiv 2025.11. [Paper]
  • PragWorld: "PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics", AAAI 2026. [Paper]
  • "Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark", arXiv 2025.11. [Paper] [Code]
  • "Scalable Policy Evaluation with Video World Models", arXiv 2025.11. [Paper]
  • "Expert Evaluation of LLM World Models: A High-Tc Superconductivity Case Study", ICML 2025 workshop on Assessing World Models and the Explorations in AI Today. [Paper]
  • "Benchmarking World-Model Learning", arXiv 2025.10. [Paper]
  • World-in-World: "World-in-World: World Models in a Closed-Loop World", arXiv 2025.10. [Paper] [Website]
  • VideoVerse: "VideoVerse: How Far is Your T2V Generator from a World Model?", arXiv 2025.10. [Paper]
  • OmniWorld: "OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling", arXiv 2025.09. [Paper] [Website]
  • "Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving", ICRA 2025. [Paper]
  • WM-ABench: "Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation", ACL 2025 (Findings). [Paper] [Website]
  • UNIVERSE: "Adapting Vision-Language Models for Evaluating World Models", arXiv 2025.06. [Paper]
  • WorldPrediction: "WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning", arXiv 2025.06. [Paper]
  • "Toward Memory-Aided World Models: Benchmarking via Spatial Consistency", arXiv 2025.05. [Paper] [Datasets] [Code]
  • SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", arXiv 2025.05. [Paper] [Code]
  • EWMBench: "EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models", arXiv 2025.05. [Paper] [Code]
  • "Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments", arXiv 2025.03. [Paper]
  • WorldModelBench: "WorldModelBench: Judging Video Generation Models As World Models", CVPR 2025. [Paper] [Website]
  • Text2World: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", arXiv 2025.02. [Paper] [Website]
  • ACT-Bench: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", arXiv 2024.12. [Paper]
  • WorldSimBench: "WorldSimBench: Towards Video Generation Models as World Simulators", arXiv 2024.10. [Paper] [Website]
  • EVA: "EVA: An Embodied World Model for Future Video Anticipation", ICML 2025. [Paper] [Website]
  • AeroVerse: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", arXiv 2024.08. [Paper]
  • CityBench: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", arXiv 2024.06. [Paper] [Code]
  • "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", NeurIPS 2023. [Paper]

General World Models

  • "Closing the Train-Test Gap in World Models for Gradient-Based Planning", arXiv 2025.12. [Paper]
  • WonderZoom: "WonderZoom: Multi-Scale 3D World Generation", arXiv 2025.12. [Paper] [Website]
  • Astra: "Astra: General Interactive World Model with Autoregressive Denoising", arXiv 2025.12. [Paper] [Website] [Code]
  • Visionary: "Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform", arXiv 2025.12. [Paper] [Website]
  • CLARITY: "CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space", arXiv 2025.12. [Paper]
  • UnityVideo: "UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation", arXiv 2025.12. [Paper] [Website] [Code]
  • "Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech", arXiv 2025.12. [Paper]
  • "Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling", arXiv 2025.12. [Paper] [Code]
  • ProPhy: "ProPhy: Progressive Physical Alignment for Dynamic World Simulation", arXiv 2025.12. [Paper]
  • BiTAgent: "BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models", arXiv 2025.12. [Paper]
  • RELIC: "RELIC: Interactive Video World Model with Long-Horizon Memory", arXiv 2025.12. [Paper]
  • "Better World Models Can Lead to Better Post-Training Performance", arXiv 2025.12. [Paper]
  • SeeU: "SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation", arXiv 2025.12. [Paper] [Website]
  • DynamicVerse: "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling", arXiv 2025.12. [Paper]
  • IC-World: "IC-World: In-Context Generation for Shared World Modeling", arXiv 2025.12. [Paper] [Code]
  • WorldPack: "WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling", arXiv 2025.12. [Paper]
  • GrndCtrl: "GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment", arXiv 2025.12. [Paper]
  • ChronosObserver: "ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling", arXiv 2025.12. [Paper]
  • AVWM: "Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound", arXiv 2025.12. [Paper]
  • VCWorld: "VCWorld: A Biological World Model for Virtual Cell Simulation", arXiv 2025.12. [Paper] [Code]
  • VISTAv2: "VISTAv2: World Imagination for Indoor Vision-and-Language Navigation", arXiv 2025.12. [Paper] [Website]
  • Captain Safari: "Captain Safari: A World Engine", arXiv 2025.11. [Paper] [Website]
  • WorldWander: "WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation", arXiv 2025.11. [Paper] [Code]
  • Inferix: "Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation", arXiv 2025.11. [Paper] [Code]
  • MagicWorld: "MagicWorld: Interactive Geometry-driven Video World Exploration", arXiv 2025.11. [Paper]
  • "Counterfactual World Models via Digital Twin-conditioned Video Diffusion", arXiv 2025.11. [Paper]
  • WorldGen: "WorldGen: From Text to Traversable and Interactive 3D Worlds", arXiv 2025.11. [Paper] [Website]
  • X-WIN: "X-WIN: Building Chest Radiograph World Model via Predictive Sensing", arXiv 2025.11. [Paper]
  • "Object-Centric World Models for Causality-Aware Reinforcement Learning", AAAI 2026. [Paper]
  • "Latent-Space Autoregressive World Model for Efficient and Robust Image-Goal Navigation", arXiv 2025.11. [Paper]
  • Dynamic Sparsity: "Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks", AAAI 2026. [Paper]
  • MrCoM: "MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios", AAAI 2026. [Paper]
  • "Next-Latent Prediction Transformers Learn Compact World Models", arXiv 2025.11. [Paper]
  • DR. WELL: "DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration", NeurIPS 2025 Workshop: Bridging Language, Agent, and World Models for Reasoning and Planning (LAW). [Paper] [Website]
  • "How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment", arXiv 2025.11. [Paper]
  • "From Pixels to Cooperation: Multi-Agent Reinforcement Learning Based on Multimodal World Models", arXiv 2025.11. [Paper]
  • "Bootstrap Off-policy with World Model", NeurIPS 2025. [Paper]
  • "Clone Deterministic 3D Worlds with Geometrically-Regularized World Models", arXiv 2025.10. [Paper]
  • "Semantic Communications with World Models", arXiv 2025.10. [Paper]
  • TRELLISWorld: "TRELLISWorld: Training-Free World Generation from Object Generators", arXiv 2025.10. [Paper]
  • WorldGrow: "WorldGrow: Generating Infinite 3D World", arXiv 2025.10. [Paper] [Code]
  • PhysWorld: "PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis", arXiv 2025.10. [Paper]
  • "How Hard is it to Confuse a World Model?", arXiv 2025.10. [Paper]
  • "Social World Model-Augmented Mechanism Design Policy Learning", NeurIPS 2025. [Paper]
  • VAGEN: "VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents", NeurIPS 2025. [Paper] [Website]
  • Cosmos-Surg-dVRK: "Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning", arXiv 2025.10. [Paper]
  • "Zero-shot World Models via Search in Memory", arXiv 2025.10. [Paper]
  • "Vector Quantization in the Brain: Grid-like Codes in World Models", NeurIPS 2025. [Paper]
  • Terra: "Terra: Explorable Native 3D World Model with Point Latents", arXiv 2025.10. [Paper] [Website]
  • Deep SPI: "Deep SPI: Safe Policy Improvement via World Models", arXiv 2025.10. [Paper]
  • "One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration", arXiv 2025.10. [Paper] [Code]
  • R-WoM: "R-WoM: Retrieval-augmented World Model For Computer-use Agents", arXiv 2025.10. [Paper]
  • WorldMirror: "WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting", arXiv 2025.10. [Paper]
  • Unified World Models: "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation", arXiv 2025.10. [Paper] [Code]
  • "Code World Models for General Game Playing", arXiv 2025.10. [Paper]
  • MorphoSim: "MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator", arXiv 2025.10. [Paper] [Code]
  • ChronoEdit: "ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation", arXiv 2025.10. [Paper] [Website]
  • SFP: "Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models", arXiv 2025.10. [Paper]
  • EvoWorld: "EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory", arXiv 2025.10. [Paper] [Code]
  • "World Model for AI Autonomous Navigation in Mechanical Thrombectomy", MICCAI 2025. Lecture Notes in Computer Science. [Paper]
  • DyMoDreamer: "DyMoDreamer: World Modeling with Dynamic Modulation", NeurIPS 2025. [Paper] [Code]
  • Dreamer4: "Training Agents Inside of Scalable World Models", arXiv 2025.09. [Paper] [Website]
  • "Reinforcement Learning with Inverse Rewards for World Model Post-training", arXiv 2025.09. [Paper]
  • "Context and Diversity Matter: The Emergence of In-Context Learning in World Models", arXiv 2025.09. [Paper]
  • FantasyWorld: "FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction", arXiv 2025.09. [Paper]
  • "Remote Sensing-Oriented World Model", arXiv 2025.09. [Paper]
  • "World Modeling with Probabilistic Structure Integration", arXiv 2025.09. [Paper]
  • "One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning", arXiv 2025.09. [Paper] [Code]
  • LatticeWorld: "LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation", arXiv 2025.09. [Paper]
  • "Planning with Reasoning using Vision Language World Model", arXiv 2025.09. [Paper]
  • "Social World Models", arXiv 2025.09. [Paper]
  • "Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization", arXiv 2025.08. [Paper]
  • HERO: "HERO: Hierarchical Extrapolation and Refresh for Efficient World Models", arXiv 2025.08. [Paper]
  • "Scalable RF Simulation in Generative 4D Worlds", arXiv 2025.08. [Paper]
  • "Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video", arXiv 2025.08. [Paper]
  • "Visuomotor Grasping with World Models for Surgical Robots", arXiv 2025.08. [Paper]
  • "In-Context Reinforcement Learning via Communicative World Models", arXiv 2025.08. [Paper] [Code]
  • PIGDreamer: "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning", ICML 2025. [Paper]
  • SimuRA: "SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model", arXiv 2025.07. [Paper]
  • "Back to the Features: DINO as a Foundation for Video World Models", arXiv 2025.07. [Paper]
  • Yume: "Yume: An Interactive World Generation Model", arXiv 2025.07. [Paper] [Website] [Code]
  • "LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning", arXiv 2025.07. [Paper]
  • "Safety Certification in the Latent space using Control Barrier Functions and World Models", arXiv 2025.07. [Paper]
  • "Assessing adaptive world models in machines with novel games", arXiv 2025.07. [Paper]
  • "Graph World Model", ICML 2025. [Paper] [Website]
  • MobiWorld: "MobiWorld: World Models for Mobile Wireless Network", arXiv 2025.07. [Paper]
  • "Continual Reinforcement Learning by Planning with Online World Models", ICML 2025 Spotlight. [Paper]
  • AirScape: "AirScape: An Aerial Generative World Model with Motion Controllability", arXiv 2025.07. [Paper] [Website]
  • Geometry Forcing: "Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling", arXiv 2025.07. [Paper] [Website]
  • Martian World Models: "Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions", arXiv 2025.07. [Paper] [Website]
  • "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models", ICML 2025. [Paper]
  • "Critiques of World Models", arXiv 2025.07. [Paper]
  • "When do World Models Successfully Learn Dynamical Systems?", arXiv 2025.07. [Paper]
  • WebSynthesis: "WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis", arXiv 2025.07. [Paper]
  • "Accurate and Efficient World Modeling with Masked Latent Transformers", arXiv 2025.07. [Paper]
  • Dyn-O: "Dyn-O: Building Structured World Models with Object-Centric Representations", arXiv 2025.07. [Paper]
  • NavMorph: "NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments", ICCV 2025. [Paper] [Code]
  • "A “Good” Regulator May Provide a World Model for Intelligent Systems", arXiv 2025.06. [Paper]
  • Xray2Xray: "Xray2Xray: World Model from Chest X-rays with Volumetric Context", arXiv 2025.06. [Paper]
  • MATWM: "Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning", arXiv 2025.06. [Paper]
  • "Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework", arXiv 2025.06. [Paper]
  • "Efficient Generation of Diverse Cooperative Agents with World Models", arXiv 2025.06. [Paper]
  • WorldLLM: "WorldLLM: Improving LLMs' world modeling using curiosity-driven theory-making", arXiv 2025.06. [Paper]
  • "LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment", arXiv 2025.06. [Paper]
  • "Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models", arXiv 2025.06. [Paper]
  • "Video World Models with Long-term Spatial Memory", arXiv 2025.06. [Paper] [Website]
  • DSG-World: "DSG-World: Learning a 3D Gaussian World Model from Dual State Videos", arXiv 2025.06. [Paper]
  • "Safe Planning and Policy Optimization via World Model Learning", arXiv 2025.06. [Paper]
  • FOLIAGE: "FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution", arXiv 2025.06. [Paper]
  • "Linear Spatial World Models Emerge in Large Language Models", arXiv 2025.06. [Paper] [Code]
  • Simple, Good, Fast: "Simple, Good, Fast: Self-Supervised World Models Free of Baggage", ICLR 2025. [Paper] [Code]
  • Medical World Model: "Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning", arXiv 2025.06. [Paper]
  • "General agents need world models", ICML 2025. [Paper]
  • "Learning Abstract World Models with a Group-Structured Latent Space", arXiv 2025.06. [Paper]
  • DeepVerse: "DeepVerse: 4D Autoregressive Video Generation as a World Model", arXiv 2025.06. [Paper]
  • "World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks", arXiv 2025.06. [Paper]
  • Dyna-Think: "Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents", arXiv 2025.06. [Paper]
  • StateSpaceDiffuser: "StateSpaceDiffuser: Bringing Long Context to Diffusion World Models", arXiv 2025.05. [Paper]
  • "Learning World Models for Interactive Video Generation", arXiv 2025.05. [Paper]
  • "Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective", arXiv 2025.05. [Paper]
  • "Long-Context State-Space Video World Models", arXiv 2025.05. [Paper] [Website]
  • "Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach", arXiv 2025.05. [Paper]
  • "World Models as Reference Trajectories for Rapid Motor Adaptation", arXiv 2025.05. [Paper]
  • "Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning", arXiv 2025.05. [Paper]
  • "Building spatial world models from sparse transitional episodic memories", arXiv 2025.05. [Paper]
  • PoE-World: "PoE-World: Compositional World Modeling with Products of Programmatic Experts", arXiv 2025.05. [Paper] [Website]
  • "Explainable Reinforcement Learning Agents Using World Models", arXiv 2025.05. [Paper]
  • seq-JEPA: "seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models", arXiv 2025.05. [Paper]
  • "Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning", arXiv 2025.05. [Paper]
  • "Learning Local Causal World Models with State Space Models and Attention", arXiv 2025.05. [Paper]
  • WebEvolver: "WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model", arXiv 2025.04. [Paper]
  • WALL-E 2.0: "WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents", arXiv 2025.04. [Paper] [Code]
  • ViMo: "ViMo: A Generative Visual GUI World Model for App Agent", arXiv 2025.04. [Paper]
  • "Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning", SIGIR 2025. [Paper]
  • CheXWorld: "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning", CVPR 2025. [Paper] [Code]
  • EchoWorld: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance", CVPR 2025. [Paper] [Code]
  • "Adapting a World Model for Trajectory Following in a 3D Game", ICLR 2025 Workshop on World Models. [Paper]
  • MineWorld: "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft", arXiv 2025.04. [Paper] [Website]
  • MoSim: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning", CVPR 2025. [Paper]
  • "Improving World Models using Deep Supervision with Linear Probes", ICLR 2025 Workshop on World Models. [Paper]
  • "Decentralized Collective World Model for Emergent Communication and Coordination", arXiv 2025.04. [Paper]
  • "Adapting World Models with Latent-State Dynamics Residuals", arXiv 2025.04. [Paper]
  • "Can Test-Time Scaling Improve World Foundation Model?", arXiv 2025.03. [Paper] [Code]
  • "Synthesizing world models for bilevel planning", arXiv 2025.03. [Paper]
  • "Long-context autoregressive video modeling with next-frame prediction", arXiv 2025.03. [Paper] [Code] [Website]
  • Aether: "Aether: Geometric-Aware Unified World Modeling", arXiv 2025.03. [Paper] [Website]
  • FUSDREAMER: "FUSDREAMER: Label-efficient Remote Sensing World Model for Multimodal Data Classification", arXiv 2025.03. [Paper] [Website]
  • "Inter-environmental world modeling for continuous and compositional dynamics", arXiv 2025.03. [Paper]
  • Disentangled World Models: "Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning", arXiv 2025.03. [Paper]
  • "Revisiting the Othello World Model Hypothesis", ICLR World Models Workshop. [Paper]
  • "Learning Transformer-based World Models with Contrastive Predictive Coding", arXiv 2025.03. [Paper]
  • "Surgical Vision World Model", arXiv 2025.03. [Paper]
  • "World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference", arXiv 2025.03. [Paper]
  • WMNav: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation", arXiv 2025.03. [Paper] [Website]
  • SENSEI: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models", arXiv 2025.03. [Paper] [Website]
  • "Learning Actionable World Models for Industrial Process Control", arXiv 2025.03. [Paper]
  • "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning", arXiv 2025.03. [Paper]
  • "Discrete Codebook World Models for Continuous Control", ICLR 2025. [Paper]
  • Multimodal Dreaming: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning", arXiv 2025.02. [Paper]
  • "Generalist World Model Pre-Training for Efficient Reinforcement Learning", arXiv 2025.02. [Paper]
  • "Learning To Explore With Predictive World Model Via Self-Supervised Learning", arXiv 2025.02. [Paper]
  • M^3: "M^3: A Modular World Model over Streams of Tokens", arXiv 2025.02. [Paper]
  • "When do neural networks learn world models?", arXiv 2025.02. [Paper]
  • "Pre-Trained Video Generative Models as World Simulators", arXiv 2025.02. [Paper]
  • DMWM: "DMWM: Dual-Mind World Model with Long-Term Imagination", arXiv 2025.02. [Paper]
  • EvoAgent: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks", arXiv 2025.02. [Paper]
  • "Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds", arXiv 2025.02. [Paper]
  • "Generating Symbolic World Models via Test-time Scaling of Large Language Models", arXiv 2025.02. [Paper] [Website]
  • "Improving Transformer World Models for Data-Efficient RL", arXiv 2025.02. [Paper]
  • "Trajectory World Models for Heterogeneous Environments", arXiv 2025.02. [Paper]
  • "Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling", arXiv 2025.02. [Paper]
  • "Objects matter: object-centric world models improve reinforcement learning in visually complex environments", arXiv 2025.01. [Paper]
  • GLAM: "GLAM: Global-Local Variation Awareness in Mamba-based World Model", arXiv 2025.01. [Paper]
  • GAWM: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning", arXiv 2025.01. [Paper]
  • "Generative Emergent Communication: Large Language Model is a Collective World Model", arXiv 2025.01. [Paper]
  • "Towards Unraveling and Improving Generalization in World Models", arXiv 2025.01. [Paper]
  • "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", arXiv 2024.12. [Paper]
  • "Transformers Use Causal World Models in Maze-Solving Tasks", arXiv 2024.12. [Paper]
  • "Causal World Representation in the GPT Model", NeurIPS 2024 Workshop. [Paper]
  • Owl-1: "Owl-1: Omni World Model for Consistent Long Video Generation", arXiv 2024.12. [Paper]
  • "Navigation World Models", arXiv 2024.12. [Paper] [Website]
  • "Evaluating World Models with LLM for Decision Making", arXiv 2024.11. [Paper]
  • LLMPhy: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", arXiv 2024.11. [Paper]
  • WebDreamer: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", arXiv 2024.11. [Paper] [Code]
  • "Scaling Laws for Pre-training Agents and World Models", arXiv 2024.11. [Paper]
  • DINO-WM: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", arXiv 2024.11. [Paper] [Website]
  • "Learning World Models for Unconstrained Goal Navigation", NeurIPS 2024. [Paper]
  • "How Far is Video Generation from World Model: A Physical Law Perspective", arXiv 2024.11. [Paper] [Website] [Code]
  • Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", NeurIPS 2024 Workshop on Adaptive Foundation Models. [Paper]
  • LLMCWM: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", arXiv 2024.10. [Paper] [Code]
  • "Reward-free World Models for Online Imitation Learning", arXiv 2024.10. [Paper]
  • "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", arXiv 2024.10. [Paper]
  • AVID: "AVID: Adapting Video Diffusion Models to World Models", arXiv 2024.10. [Paper] [Code]
  • SMAC: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", NeurIPS 2024. [Paper]
  • OSWM: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", arXiv 2024.09. [Paper]
  • "Making Large Language Models into World Models with Precondition and Effect Knowledge", arXiv 2024.09. [Paper]
  • "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", arXiv 2024.08. [Paper]
  • MoReFree: "World Models Increase Autonomy in Reinforcement Learning", arXiv 2024.08. [Paper] [Project]
  • UrbanWorld: "UrbanWorld: An Urban World Model for 3D City Generation", arXiv 2024.07. [Paper]
  • PWM: "PWM: Policy Learning with Large World Models", arXiv 2024.07. [Paper] [Code]
  • "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", arXiv 2024.07. [Paper]
  • GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents", arXiv 2024.06. [Paper] [Code]
  • DLLM: "World Models with Hints of Large Language Models for Goal Achieving", arXiv 2024.06. [Paper]
  • "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", arXiv 2024.06. [Paper]
  • CoDreamer: "CoDreamer: Communication-Based Decentralised World Models", arXiv 2024.06. [Paper]
  • Pandora: "Pandora: Towards General World Model with Natural Language Actions and Video States", arXiv 2024.06. [Paper] [Code]
  • EBWM: "Cognitively Inspired Energy-Based World Models", arXiv 2024.06. [Paper]
  • "Evaluating the World Model Implicit in a Generative Model", arXiv 2024.06. [Paper] [Code]
  • "Transformers and Slot Encoding for Sample Efficient Physical World Modelling", arXiv 2024.05. [Paper] [Code]
  • Puppeteer: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", arXiv 2024.05. [Paper] [Code]
  • BWArea Model: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", arXiv 2024.05. [Paper]
  • WKM: "Agent Planning with World Knowledge Model", arXiv 2024.05. [Paper] [Code]
  • Diamond: "Diffusion for World Modeling: Visual Details Matter in Atari", arXiv 2024.05. [Paper] [Code]
  • "Compete and Compose: Learning Independent Mechanisms for Modular World Models", arXiv 2024.04. [Paper]
  • "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", arXiv 2024.03. [Paper] [Code]
  • V-JEPA: "V-JEPA: Video Joint Embedding Predictive Architecture", Meta AI. [Blog] [Paper] [Code]
  • IWM: "Learning and Leveraging World Models in Visual Representation Learning", Meta AI. [Paper]
  • Genie: "Genie: Generative Interactive Environments", DeepMind. [Paper] [Blog]
  • Sora: "Video generation models as world simulators", OpenAI. [Technical report]
  • LWM: "World Model on Million-Length Video And Language With RingAttention", arXiv 2024.02. [Paper] [Code]
  • "Planning with an Ensemble of World Models", OpenReview. [Paper]
  • WorldDreamer: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", arXiv 2024.01. [Paper] [Code]
  • CWM: "Understanding Physical Dynamics with Counterfactual World Modeling", ECCV 2024. [Paper] [Code]
  • Δ-IRIS: "Efficient World Models with Context-Aware Tokenization", ICML 2024. [Paper] [Code]
  • LLM-Sim: "Can Language Models Serve as Text-Based World Simulators?", ACL. [Paper] [Code]
  • AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", ICML 2024. [Paper]
  • MAMBA: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", ICLR 2024. [Paper] [Code]
  • R2I: "Mastering Memory Tasks with World Models", ICLR 2024. [Paper] [Website] [Code]
  • HarmonyDream: "HarmonyDream: Task Harmonization Inside World Models", ICML 2024. [Paper] [Code]
  • REM: "Improving Token-Based World Models with Parallel Observation Prediction", ICML 2024. [Paper] [Code]
  • "Do Transformer World Models Give Better Policy Gradients?", ICML 2024. [Paper]
  • DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", ICLR 2024. [Paper]
  • TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control", ICLR 2024. [Paper] [Torch Code]
  • Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", ICML 2024. [Paper]
  • CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", NeurIPS 2024. [Paper]

World Models for Embodied AI

  • PRISM-WM: "Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems", arXiv 2025.12. [Paper]
  • "Learning Robot Manipulation from Audio World Models", arXiv 2025.12. [Paper]
  • "Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model", arXiv 2025.12. [Paper] [Website]
  • "World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty", arXiv 2025.12. [Paper]
  • "Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model", IEEE Robotics and Automation Letters. [Paper]
  • "Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling", arXiv 2025.12. [Paper]
  • IGen: "IGen: Scalable Data Generation for Robot Learning from Open-World Images", arXiv 2025.12. [Paper] [Website]
  • NavForesee: "NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction", arXiv 2025.12. [Paper]
  • TraceGen: "TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos", arXiv 2025.11. [Paper] [Website]
  • ENACT: "ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction", arXiv 2025.11. [Paper] [Website] [Code]
  • "Learning Massively Multitask World Models for Continuous Control", arXiv 2025.11. [Paper] [Website]
  • UNeMo: "UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model", arXiv 2025.11. [Paper]
  • "MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning", NeurIPS 2025. [Paper]
  • "Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos", arXiv 2025.11. [Paper]
  • WMPO: "WMPO: World Model-based Policy Optimization for Vision-Language-Action Models", arXiv 2025.11. [Paper] [Website]
  • "Robot Learning from a Physical World Model", arXiv 2025.11. [Paper] [Website]
  • "When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks", arXiv 2025.11. [Paper]
  • WorldPlanner: "WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models", arXiv 2025.11. [Paper]
  • "Learning Interactive World Model for Object-Centric Reinforcement Learning", NeurIPS 2025. [Paper]
  • "Scaling Cross-Embodiment World Models for Dexterous Manipulation", arXiv 2025.11. [Paper]
  • "Co-Evolving Latent Action World Models", arXiv 2025.10. [Paper]
  • "Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model", arXiv 2025.10. [Paper] [Website]
  • "Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation", arXiv 2025.10. [Paper]
  • ProTerrain: "ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling", arXiv 2025.10. [Paper]
  • "Ego-Vision World Model for Humanoid Contact Planning", arXiv 2025.10. [Paper] [Website]
  • Ctrl-World: "Ctrl-World: A Controllable Generative World Model for Robot Manipulation", arXiv 2025.10. [Paper] [Website] [Code]
  • iMoWM: "iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation", arXiv 2025.10. [Paper] [Website]
  • WristWorld: "WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation", arXiv 2025.10. [Paper]
  • "A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models", arXiv 2025.10. [Paper]
  • "Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models", RSS 2025 Workshop on Resilient Off-road Autonomous Robotics (ROAR). [Paper]
  • EMMA: "EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer", arXiv 2025.09. [Paper]
  • LongScape: "LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE", arXiv 2025.09. [Paper]
  • KeyWorld: "KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models", arXiv 2025.09. [Paper]
  • DAWM: "DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions", ICML 2025 Workshop. [Paper]
  • World4RL: "World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation", arXiv 2025.09. [Paper]
  • SAMPO: "SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models", arXiv 2025.09. [Paper]
  • PhysicalAgent: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models", arXiv 2025.09. [Paper]
  • "Empowering Multi-Robot Cooperation via Sequential World Models", arXiv 2025.09. [Paper]
  • "World Model Implanting for Test-time Adaptation of Embodied Agents", ICML 2025. [Paper]
  • "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning", arXiv 2025.08. [Paper] [Website]
  • GWM: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation", ICCV 2025. [Paper] [Website]
  • "Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation", arXiv 2025.08. [Paper]
  • "Bounding Distributional Shifts in World Modeling through Novelty Detection", arXiv 2025.08. [Paper]
  • Genie Envisioner: "Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation", arXiv 2025.08. [Paper] [Website]
  • DiWA: "DiWA: Diffusion Policy Adaptation with World Models", CoRL 2025. [Paper] [Code]
  • CoEx: "CoEx -- Co-evolving World-model and Exploration", arXiv 2025.07. [Paper]
  • "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models", arXiv 2025.07. [Paper]
  • MindJourney: "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning", arXiv 2025.07. [Paper] [Website]
  • FOUNDER: "FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making", ICML 2025. [Paper] [Website]
  • EmbodieDreamer: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling", arXiv 2025.07. [Paper] [Website]
  • World4Omni: "World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation", arXiv 2025.06. [Paper] [Website]
  • RoboScape: "RoboScape: Physics-informed Embodied World Model", arXiv 2025.06. [Paper] [Code]
  • ParticleFormer: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation", arXiv 2025.06. [Paper] [Website]
  • ManiGaussian++: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model", arXiv 2025.06. [Paper] [Code]
  • ReOI: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control", arXiv 2025.06. [Paper]
  • GAF: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation", arXiv 2025.06. [Paper] [Website]
  • "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins", RSS 2025. [Paper] [Website]
  • V-JEPA 2 and V-JEPA 2-AC: "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning", arXiv 2025.06. [Paper] [Website] [Code]
  • "Time-Aware World Model for Adaptive Prediction and Control", ICML 2025. [Paper]
  • 3DFlowAction: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model", arXiv 2025.06. [Paper]
  • ORV: "ORV: 4D Occupancy-centric Robot Video Generation", arXiv 2025.06. [Paper] [Code] [Website]
  • WoMAP: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization", arXiv 2025.06. [Paper]
  • "Sparse Imagination for Efficient Visual World Model Planning", arXiv 2025.06. [Paper]
  • Humanoid World Models: "Humanoid World Models: Open World Foundation Models for Humanoid Robotics", arXiv 2025.06. [Paper]
  • "Evaluating Robot Policies in a World Model", arXiv 2025.06. [Paper] [Website]
  • OSVI-WM: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation", arXiv 2025.05. [Paper]
  • WorldEval: "WorldEval: World Model as Real-World Robot Policies Evaluator", arXiv 2025.05. [Paper] [Website]
  • "Consistent World Models via Foresight Diffusion", arXiv 2025.05. [Paper]
  • Vid2World: "Vid2World: Crafting Video Diffusion Models to Interactive World Models", arXiv 2025.05. [Paper] [Website]
  • RLVR-World: "RLVR-World: Training World Models with Reinforcement Learning", arXiv 2025.05. [Paper] [Website] [Code]
  • LaDi-WM: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation", arXiv 2025.05. [Paper]
  • FlowDreamer: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation", arXiv 2025.05. [Paper] [Website]
  • "Occupancy World Model for Robots", arXiv 2025.05. [Paper]
  • "Learning 3D Persistent Embodied World Models", arXiv 2025.05. [Paper]
  • TesserAct: "TesserAct: Learning 4D Embodied World Models", arXiv 2025.04. [Paper] [Website]
  • PIN-WM: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation", arXiv 2025.04. [Paper]
  • "Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator", arXiv 2025.04. [Paper]
  • ManipDreamer: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance", arXiv 2025.04. [Paper]
  • UWM: "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets", arXiv 2025.04. [Paper] [Website]
  • "Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation", arXiv 2025.03. [Paper]
  • AdaWorld: "AdaWorld: Learning Adaptable World Models with Latent Actions", arXiv 2025.03. [Paper] [Website]
  • DyWA: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation", arXiv 2025.03. [Paper] [Website]
  • "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks", arXiv 2025.03. [Paper] [Website]
  • "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning", arXiv 2025.03. [Paper]
  • LUMOS: "LUMOS: Language-Conditioned Imitation Learning with World Models", ICRA 2025. [Paper] [Website]
  • "Object-Centric World Model for Language-Guided Manipulation", arXiv 2025.03. [Paper]
  • DEMO^3: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning", arXiv 2025.03. [Paper] [Website]
  • "Accelerating Model-Based Reinforcement Learning with State-Space World Models", arXiv 2025.02. [Paper]
  • "Learning Humanoid Locomotion with World Model Reconstruction", arXiv 2025.02. [Paper]
  • "Strengthening Generative Robot Policies through Predictive World Modeling", arXiv 2025.02. [Paper] [Website]
  • Robotic World Model: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics", arXiv 2025.01. [Paper]
  • RoboHorizon: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation", arXiv 2025.01. [Paper]
  • Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", arXiv 2024.12. [Paper] [Website]
  • WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", arXiv 2024.11. [Paper]
  • VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", arXiv 2024.10. [Paper]
  • "Multi-Task Interactive Robot Fleet Learning with Visual World Models", CoRL 2024. [Paper] [Code]
  • X-MOBILITY: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", arXiv 2024.10. [Paper]
  • PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", NeurIPS 2024. [Paper]
  • GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", arXiv 2024.10. [Paper]
  • EVA: "EVA: An Embodied World Model for Future Video Anticipation", arXiv 2024.10. [Paper] [Website]
  • PreLAR: "PreLAR: World Model Pre-training with Learnable Action Representation", ECCV 2024. [Paper] [Code]
  • WMP: "World Model-based Perception for Visual Legged Locomotion", arXiv 2024.09. [Paper] [Project]
  • R-AIF: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", arXiv 2024.09. [Paper]
  • "Representing Positional Information in Generative World Models for Object Manipulation", arXiv 2024.09. [Paper]
  • DexSim2Real^2: "DexSim2Real^2: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", arXiv 2024.09. [Paper]
  • DWL: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", RSS 2024 (Best Paper Award Finalist). [Paper]
  • "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", arXiv 2024.06. [Paper] [Website]
  • HRSSM: "Learning Latent Dynamic Robust Representations for World Models", ICML 2024. [Paper] [Code]
  • RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination", ICML 2024. [Paper] [Code]
  • COMBO: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", ECCV 2024. [Paper] [Website] [Code]
  • ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", arXiv 2024.03. [Paper] [Code]

World Models for VLA

  • RoboScape-R: "RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL", arXiv 2025.12. [Paper]
  • AdaPower: "AdaPower: Specializing World Foundation Models for Predictive Manipulation", arXiv 2025.12. [Paper]
  • RynnVLA-002: "RynnVLA-002: A Unified Vision-Language-Action and World Model", arXiv 2025.11. [Paper] [Code]
  • NORA-1.5: "NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards", arXiv 2025.11. [Paper] [Website] [Code]
  • "Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model", arXiv 2025.10. [Paper]
  • VLA-RFT: "VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators", arXiv 2025.10. [Paper]
  • World-Env: "World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training", arXiv 2025.09. [Paper]
  • MoWM: "MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation", arXiv 2025.09. [Paper]
  • LAWM: "Latent Action Pretraining Through World Modeling", arXiv 2025.09. [Paper] [Code]
  • PAR: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining", arXiv 2025.08. [Paper] [Website]
  • DreamVLA: "DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge", arXiv 2025.07. [Paper] [Code] [Website]
  • WorldVLA: "WorldVLA: Towards Autoregressive Action World Model", arXiv 2025.06. [Paper] [Code]
  • UniVLA: "UniVLA: Unified Vision-Language-Action Model", arXiv 2025.06. [Paper] [Code]
  • MinD: "MinD: Unified Visual Imagination and Control via Hierarchical World Models", arXiv 2025.06. [Paper] [Website]
  • FLARE: "FLARE: Robot Learning with Implicit World Modeling", arXiv 2025.05. [Paper] [Code] [Website]
  • DreamGen: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models", arXiv 2025.06. [Paper] [Code]
  • CoT-VLA: "CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models", CVPR 2025. [Paper]
  • UP-VLA: "UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent", ICML 2025. [Paper] [Code]
  • 3D-VLA: "3D-VLA: A 3D Vision-Language-Action Generative World Model", ICML 2024. [Paper]

World Models for Visual Understanding

  • "Semantic World Models", arXiv 2025.10. [Paper] [Website]
  • DyVA: "Can World Models Benefit VLMs for World Dynamics?", arXiv 2025.10. [Paper] [Website]
  • "Video models are zero-shot learners and reasoners", arXiv 2025.09. [Paper]
  • "From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models", arXiv 2025.06. [Paper]

World Models for Autonomous Driving

  • MindDrive: "MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving", arXiv 2025.12. [Paper]
  • "Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles", arXiv 2025.12. [Paper]
  • U4D: "U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences", arXiv 2025.12. [Paper]
  • "Vehicle Dynamics Embedded World Models for Autonomous Driving", arXiv 2025.12. [Paper]
  • "World Model Robustness via Surprise Recognition", arXiv 2025.12. [Paper]
  • SparseWorld-TC: "SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model", arXiv 2025.11. [Paper]
  • AD-R1: "AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models", arXiv 2025.11. [Paper]
  • Map-World: "Map-World: Masked Action planning and Path-Integral World Model for Autonomous Driving", arXiv 2025.11. [Paper]
  • WPT: "WPT: World-to-Policy Transfer via Online World Model Distillation", arXiv 2025.11. [Paper]
  • Percept-WAM: "Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving", arXiv 2025.11. [Paper]
  • Thinking Ahead: "Thinking Ahead: Foresight Intelligence in MLLMs and World Models", arXiv 2025.11. [Paper]
  • LiSTAR: "LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving", arXiv 2025.11. [Paper] [Website]
  • "Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks", arXiv 2025.10. [Paper]
  • "Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs", arXiv 2025.10. [Paper]
  • "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction", NeurIPS 2025. [Paper] [Code]
  • "Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks", arXiv 2025.10. [Paper] [Website]
  • OmniNWM: "OmniNWM: Omniscient Driving Navigation World Models", arXiv 2025.10. [Paper] [Website]
  • SparseWorld: "SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries", arXiv 2025.10. [Paper] [Code]
  • "Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models", arXiv 2025.10. [Paper]
  • DriveVLA-W0: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", arXiv 2025.10. [Paper] [Code]
  • CoIRL-AD: "CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving", arXiv 2025.10. [Paper] [Code]
  • TeraSim-World: "TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving", arXiv 2025.09. [Paper] [Website]
  • "Enhancing Physical Consistency in Lightweight World Models", arXiv 2025.09. [Paper]
  • OccTENS: "OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction", arXiv 2025.09. [Paper]
  • IRL-VLA: "IRL-VLA: Training a Vision-Language-Action Policy via Reward World Model", arXiv 2025.08. [Paper] [Website] [Code]
  • LiDARCrafter: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences", arXiv 2025.08. [Paper] [Website] [Code]
  • FASTopoWM: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models", arXiv 2025.07. [Paper] [Code]
  • Orbis: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models", arXiv 2025.07. [Paper] [Code]
  • "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving", arXiv 2025.07. [Paper]
  • NRSeg: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models", arXiv 2025.07. [Paper] [Code]
  • World4Drive: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model", ICCV 2025. [Paper] [Code]
  • Epona: "Epona: Autoregressive Diffusion World Model for Autonomous Driving", ICCV 2025. [Paper] [Code]
  • "Towards foundational LiDAR world models with efficient latent flow matching", arXiv 2025.06. [Paper]
  • SceneDiffuser++: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model", CVPR 2025. [Paper]
  • COME: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model", arXiv 2025.06. [Paper] [Code]
  • STAGE: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation", arXiv 2025.06. [Paper]
  • ReSim: "ReSim: Reliable World Simulation for Autonomous Driving", arXiv 2025.06. [Paper] [Code] [Project Page]
  • "Ego-centric Learning of Communicative World Models for Autonomous Driving", arXiv 2025.06. [Paper]
  • Dreamland: "Dreamland: Controllable World Creation with Simulator and Generative Models", arXiv 2025.06. [Paper] [Project Page]
  • LongDWM: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model", arXiv 2025.06. [Paper] [Project Page]
  • GeoDrive: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control", arXiv 2025.05. [Paper] [Code]
  • FutureSightDrive: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving", NeurIPS 2025. [Paper] [Code]
  • Raw2Drive: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)", arXiv 2025.05. [Paper]
  • VL-SAFE: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving", arXiv 2025.05. [Paper] [Project Page]
  • PosePilot: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth", arXiv 2025.05. [Paper]
  • "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks", arXiv 2025.05. [Paper]
  • "Learning to Drive from a World Model", arXiv 2025.04. [Paper]
  • DriVerse: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment", arXiv 2025.04. [Paper]
  • "End-to-End Driving with Online Trajectory Evaluation via BEV World Model", arXiv 2025.04. [Paper] [Code]
  • "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles", arXiv 2025.03. [Paper]
  • MiLA: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving", arXiv 2025.03. [Paper] [Project Page]
  • SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", arXiv 2025.03. [Paper] [Project Page]
  • UniFuture: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception", arXiv 2025.03. [Paper] [Project Page]
  • EOT-WM: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space", arXiv 2025.03. [Paper]
  • "Temporal Triplane Transformers as Occupancy World Models", arXiv 2025.03. [Paper]
  • InDRiVE: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model", arXiv 2025.02. [Paper]
  • MaskGWM: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", arXiv 2025.02. [Paper]
  • Dream to Drive: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", arXiv 2025.02. [Paper]
  • "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", ICLR 2025. [Paper]
  • "Dream to Drive with Predictive Individual World Model", IEEE TIV. [Paper] [Code]
  • HERMES: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", arXiv 2025.01. [Paper]
  • AdaWM: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", ICLR 2025. [Paper]
  • AD-L-JEPA: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", arXiv 2025.01. [Paper]
  • DrivingWorld: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", arXiv 2024.12. [Paper] [Code] [Project Page]
  • DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", arXiv 2024.12. [Paper] [Project Page]
  • "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", arXiv 2024.12. [Paper]
  • GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", arXiv 2024.12. [Paper] [Project Page]
  • GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", arXiv 2024.12. [Paper] [Code]
  • Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", arXiv 2024.12. [Paper] [Project Page] [Code]
  • "Physical Informed Driving World Model", arXiv 2024.12. [Paper] [Project Page]
  • InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", arXiv 2024.12. [Paper] [Project Page]
  • InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models", arXiv 2024.12. [Paper] [Project Page]
  • ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", arXiv 2024.11. [Paper] [Project Page]
  • Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", ICRA 2025. [Paper] [Project Page]
  • DynamicCity: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes", ICLR 2025 Spotlight. [Paper] [Project Page] [Code]
  • DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation", arXiv 2024.10. [Paper] [Project Page]
  • DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", arXiv 2024.10. [Paper] [Project Page]
  • SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", arXiv 2024.09. [Paper] [Code]
  • "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", arXiv 2024.09. [Paper]
  • LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", arXiv 2024.09. [Paper] [Code]
  • RenderWorld: "World Model with Self-Supervised 3D Label", arXiv 2024.09. [Paper]
  • OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", arXiv 2024.09. [Paper]
  • DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving", arXiv 2024.08. [Paper]
  • Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", arXiv 2024.08. [Paper]
  • CarFormer: "Self-Driving with Learned Object-Centric Representations", ECCV 2024. [Paper] [Code]
  • BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", arXiv 2024.07. [Paper] [Code]
  • TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", arXiv 2024.07. [Paper]
  • UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", arXiv 2024.06. [Paper]
  • SimGen: "Simulator-conditioned Driving Scene Generation", arXiv 2024.06. [Paper] [Code]
  • AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving", arXiv 2024.06. [Paper] [Code]
  • UnO: "Unsupervised Occupancy Fields for Perception and Forecasting", CVPR 2024. [Paper] [Code]
  • LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model", arXiv 2024.06. [Paper] [Code]
  • Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", arXiv 2024.06. [Paper] [Code]
  • OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", arXiv 2024.05. [Paper] [Code]
  • MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes", arXiv 2024.05. [Paper] [Code]
  • Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", NeurIPS 2024. [Paper] [Code]
  • CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving", arXiv 2024.05. [Paper] [Code]
  • DriveSim: "Probing Multimodal LLMs as World Models for Driving", arXiv 2024.05. [Paper] [Code]
  • DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", CVPR 2024. [Paper]
  • LidarDM: "Generative LiDAR Simulation in a Generated World", arXiv 2024.04. [Paper] [Code]
  • SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control", arXiv 2024.03. [Paper] [Project]
  • DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation", arXiv 2024.03. [Paper] [Code]
  • Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", ECCV 2024. [Paper]
  • MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", ECCV 2024. [Paper] [Code]
  • GenAD: "Generalized Predictive Model for Autonomous Driving", CVPR 2024. [Paper] [Data]
  • GenAD: "Generative End-to-End Autonomous Driving", ECCV 2024. [Paper] [Code]
  • NeMo: "Neural Volumetric World Models for Autonomous Driving", ECCV 2024. [Paper]
  • MARL-CCE: "Modelling-Competitive-Behaviors-in-Autonomous-Driving-Under-Generative-World-Model", ECCV 2024. [Code]
  • ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", CVPR 2024. [Paper] [Code]
  • Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", CVPR 2024. [Paper] [Code]
  • Cam4DOCC: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", CVPR 2024. [Paper] [Code]
  • Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving", CVPR 2024. [Paper] [Code]
  • OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving", ECCV 2024. [Paper] [Code]
  • Copilot4D: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", ICLR 2024. [Paper]
  • DrivingDiffusion: "Layout-Guided multi-view driving scene video generation with latent diffusion model", ECCV 2024. [Paper] [Code]
  • SafeDreamer: "Safe Reinforcement Learning with World Models", ICLR 2024. [Paper] [Code]
  • MagicDrive: "Street View Generation with Diverse 3D Geometry Control", ICLR 2024. [Paper] [Code]
  • DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving", ECCV 2024. [Paper] [Code]
  • SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", TITS. [Paper]

Citation

If you find this repository useful, please consider citing this list:

@misc{leo2024worldmodelspaperslist,
    title = {Awesome-World-Models},
    author = {Leo Fan},
    howpublished = {GitHub repository, \url{https://github.com/leofan90/Awesome-World-Models}},
    year = {2024},
}
