- University of Science and Technology of China
- Beijing, China
- https://zhendongwang6.github.io/
- https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN
Highlights
- Pro
Lists (27)
chatgpt
clip
controlnet
dataset
diffusion model
face-anti-spoofing
face-forgery-detection
flow
gan
img2img
interview
knowledge distillation
large language models
large vision model
ocr
pretrain
r1
sam series
score metrics
segmentation
subject driven generation
survey
tools
vae
video generation
vision_language
visual text generation
Stars
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
PersonaLive! : Expressive Portrait Image Animation for Live Streaming
[ICCV 2025] Official implementation for KV-Edit: Training-Free Image Editing for Precise Background Preservation
A survey for visual generation alignment
Native Multimodal Models are World Learners
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[ICLR 2026] An early exploration that introduces interleaving reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality…
Qwen-Image text to image lora trainer
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Official repository for the UAE paper, unified-GRPO, and unified-Bench
[🚀 ICLR 2026] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Wan: Open and Advanced Large-Scale Video Generative Models
PyTorch code and models for VJEPA2 self-supervised learning from video.
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
The world's first open-source multimodal creative assistant. A substitute for Canva and Manus that prioritizes privacy and is usable locally.
The best OSS video generation models, created by Genmo
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models