- University of Science and Technology of China
- Beijing, China
- https://zhendongwang6.github.io/
- https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN
Lists (27)
chatgpt
clip
controlnet
dataset
diffusion model
face-anti-spoofing
face-forgery-detection
flow
gan
img2img
interview
knowledge distillation
large language models
large vision model
ocr
pretrain
r1
sam series
score metrics
segmentation
subject driven generation
survey
tools
vae
video generation
vision_language
visual text generation
Stars
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
PersonaLive!: Expressive Portrait Image Animation for Live Streaming
[ICCV 2025] Official implementation for KV-Edit: Training-Free Image Editing for Precise Background Preservation
A survey for visual generation alignment
Native Multimodal Models are World Learners
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
An early exploration that introduces interleaving reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality, fine-grain…
Qwen-Image text to image lora trainer
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Official repository for the UAE paper, unified-GRPO, and unified-Bench
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Wan: Open and Advanced Large-Scale Video Generative Models
PyTorch code and models for VJEPA2 self-supervised learning from video.
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
The world's first open-source multimodal creative assistant. A privacy-first substitute for Canva and Manus that can be used locally.
The best OSS video generation models, created by Genmo
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models
A curated list of recent diffusion models for video generation, editing, and various other applications.