- Ant Group
- Hangzhou, China
- https://scholar.google.com/citations?hl=en&user=VRsy9v8AAAAJ
Starred repositories
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Official implementation of "MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues"
HunyuanVideo-1.5: A leading lightweight video generation model
A comprehensive list of papers on the definition of World Models and on using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related websites
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
MotionStream: Real-Time Video Generation with Interactive Motion Controls
SGLang is a fast serving framework for large language models and vision language models.
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Krea Realtime 14B. An open-source realtime AI video model.
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
Native Multimodal Models are World Learners
Official repo for the paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Official implementation of "HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives"
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets