A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, a Gemini-2.5-flash-image-based model. We also release Nano-consistent-150K openly to support the commu…

19,051 1,985 Updated Dec 12, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 21,922 3,847 Updated Dec 23, 2025

KlingTeam / CamCloneMaster

[SIGGRAPH Asia'25] Enabling Reference-based Camera Control via Context without Explicit 3D Estimation

Python 139 11 Updated Oct 17, 2025

Kunbyte-AI / DRA-Ctrl

Official Implementation of DRA-Ctrl (Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis)

Python 120 12 Updated Aug 15, 2025

Lakonik / piFlow

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Python 226 9 Updated Dec 21, 2025

ZiyuGuo99 / MME-CoF

Are Video Models Ready as Zero-shot Reasoners?

Python 84 4 Updated Nov 24, 2025

baaivision / Emu3.5

Native Multimodal Models are World Learners

Python 1,372 52 Updated Nov 28, 2025

apple / pico-banana-400k

Python 1,738 77 Updated Dec 16, 2025

meituan-longcat / LongCat-Video

Python 1,663 223 Updated Dec 20, 2025

jiyt17 / ReDiff

Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'

Python 40 Updated Oct 27, 2025

yihao-meng / HoloCine

Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Python 564 105 Updated Nov 26, 2025

bytedance / Video-As-Prompt

Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"

Python 333 19 Updated Nov 2, 2025

zcablii / ViTP

Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"

Python 134 1 Updated Nov 12, 2025

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 21,555 1,927 Updated Oct 25, 2025

Tencent-Hunyuan / HunyuanWorld-Mirror

Fast and Universal 3D reconstruction model for versatile tasks

Python 914 78 Updated Dec 18, 2025

World-In-World / world-in-world

Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"

Jupyter Notebook 118 2 Updated Dec 22, 2025

krea-ai / realtime-video

Krea Realtime 14B. An open-source realtime AI video model.

Python 428 24 Updated Nov 13, 2025

InternRobotics / PhysHSI

Official implementation of the paper: "PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System"

Python 217 13 Updated Oct 14, 2025

EzioBy / Ditto

[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Python 539 43 Updated Oct 29, 2025

Purshow / Awesome-Unified-Multimodal

📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.

339 15 Updated Oct 16, 2025

WayneJin0918 / SRUM

Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.

Python 88 6 Updated Nov 26, 2025

Qingyan Bai EzioBy
