Stars
Seedance 2.0 prompt skill; use this Skill to generate Seedance 2.0 video prompts
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
[CVPR 2025 Workshop] CatV2TON is a lightweight DiT-based visual virtual try-on model that supports try-on for both images and videos.
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
[AAAI2025] DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID preservation! MoE checkpoint released! Only 4 GB of VRAM is enough to run!
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
SkyReels-V2: Infinite-Length Film Generative Model
Let's make video diffusion practical!
[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
[ICCV 2025] DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models (official implementation)
[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
A ComfyUI custom node that integrates Google's Gemini Flash 2.0 Experimental model, enabling multimodal analysis of text, images, video frames, and audio directly within ComfyUI workflows.
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Enjoy the magic of Diffusion models!