Seedance 2.0 prompt skill: use this Skill to generate Seedance 2.0 video prompts

1,384 152 Updated Feb 12, 2026

OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871

Jupyter Notebook 4,036 22 Updated Mar 20, 2026

DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework

147 11 Updated Aug 6, 2025
HTML 1 Updated Aug 7, 2025

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,296 359 Updated Nov 27, 2025

[CVPR 2025 Workshop] CatV2TON is a lightweight DiT-based visual virtual try-on model, capable of supporting try-on for both images and videos.

Python 206 14 Updated Feb 24, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)

Python 1,618 87 Updated Feb 14, 2026

Open-source unified multimodal model

Python 5,780 511 Updated Oct 27, 2025

GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities

Python 305 8 Updated May 3, 2025

👔 An online résumé typesetting tool that supports Markdown and rich text

JavaScript 1,901 268 Updated Jul 13, 2023

Official implementation of BLIP3o-Series

Python 1,657 77 Updated Nov 29, 2025

[AAAI2025] DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder

Python 143 13 Updated May 6, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

Python 9,927 767 Updated Sep 22, 2025

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Python 1,211 109 Updated Oct 15, 2025

[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!

Python 2,085 114 Updated Dec 19, 2025

[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning

Python 1,353 77 Updated Sep 12, 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 750 44 Updated Mar 27, 2026

A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gemini 2 Flash.

Python 2,172 95 Updated Dec 29, 2025

SkyReels-V2: Infinite-length Film Generative model

Python 6,652 1,377 Updated Jan 29, 2026

Let's make video diffusion practical!

Python 16,707 1,649 Updated Oct 16, 2025

[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

Python 2,579 205 Updated Mar 17, 2026
Python 9 Updated Apr 27, 2025

[ICCV 2025] DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models (official implementation)

Jupyter Notebook 154 11 Updated May 21, 2025

[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"

Python 78 3 Updated Oct 15, 2024

A ComfyUI custom node that integrates Google's Gemini Flash 2.0 Experimental model, enabling multimodal analysis of text, images, video frames, and audio directly within ComfyUI workflows.

Python 336 26 Updated Apr 22, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,591 81 Updated Mar 16, 2025

Enjoy the magic of Diffusion models!

Python 12,117 1,179 Updated Mar 24, 2026

Official repository for LTX-Video

Python 9,778 932 Updated Jan 5, 2026