Zhendong Wang ZhendongWang6

🎯

Focusing

Ph.D. student focusing on computer vision and deep learning.

130 followers · 54 following

University of Science and Technology of China
Beijing, China
https://zhendongwang6.github.io/
https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN

Achievements

Highlights

Pro

Lists (27)

chatgpt

clip

controlnet

dataset

diffusion model

67 repositories

face-anti-spoofing

10 repositories

face-forgery-detection

23 repositories

flow

gan

img2img

interview

knowledge distillation

large language models

10 repositories

large vision model

ocr

pretrain

r1

sam series

score metrics

segmentation

subject driven generation

survey

tools

16 repositories

vae

10 repositories

video generation

vision_language

visual text generation

Stars

ustctug / ustcthesis

LaTeX template for USTC thesis

TeX 1,995 439 Updated Jan 13, 2026

QwenLM / Qwen-Image-Layered

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

Python 1,534 117 Updated Dec 31, 2025

GVCLab / PersonaLive

PersonaLive! : Expressive Portrait Image Animation for Live Streaming

Python 1,593 254 Updated Dec 30, 2025

meituan-longcat / LongCat-Image

Python 603 54 Updated Feb 3, 2026

MCG-NJU / DDT

DDT: Decoupled Diffusion Transformer

Python 361 17 Updated Aug 22, 2025

Xilluill / KV-Edit

[ICCV 2025] Official implementation for KV-Edit: Training-Free Image Editing for Precise Background Preservation

Python 367 17 Updated May 21, 2025

XueZeyue / Awesome-Visual-Generation-Alignment-Survey

A survey for visual generation alignment

116 8 Updated Nov 9, 2025

baaivision / Emu3.5

Native Multimodal Models are World Learners

Python 1,445 54 Updated Dec 30, 2025

apple / pico-banana-400k

Python 1,772 78 Updated Dec 16, 2025

TIGER-AI-Lab / EditReward

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]

Python 117 4 Updated Feb 4, 2026

bytetriper / RAE

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,744 64 Updated Jan 20, 2026

Osilly / Interleaving-Reasoning-Generation

[ICLR 2026] An early exploration that introduces Interleaving Reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality…

Python 86 Updated Jan 26, 2026

FlyMyAI / flymyai-lora-trainer

Qwen-Image text to image lora trainer

Python 699 62 Updated Dec 16, 2025

HorizonWind2004 / reconstruction-alignment

[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…

Python 355 13 Updated Jan 30, 2026

Tencent-Hunyuan / HunyuanImage-3.0

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,801 143 Updated Feb 3, 2026

PKU-YuanGroup / UAE

Official repository for the UAE paper, unified-GRPO, and unified-Bench

Python 156 6 Updated Sep 12, 2025

stepfun-ai / NextStep-1

[🚀 ICLR 2026] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.

Python 599 18 Updated Dec 25, 2025

QwenLM / Qwen-Image

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,218 420 Updated Dec 31, 2025

wyhlovecpp / GPT-Image-Edit

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python 244 5 Updated Aug 15, 2025

Wan-Video / Wan2.2

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,023 1,675 Updated Dec 17, 2025

facebookresearch / vjepa2

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 2,921 316 Updated Aug 28, 2025

fudan-generative-vision / PPFlow

Python 7 1 Updated Jul 20, 2025

Jiawei-Yang / DeTok

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook 172 4 Updated Dec 17, 2025

Yuanshi9815 / OminiControl

[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer

Python 1,900 143 Updated Jul 3, 2025

jy0205 / Pyramid-Flow

[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 3,153 305 Updated Dec 21, 2024

Martinser / REG

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 243 19 Updated Oct 4, 2025

11cafe / jaaz

The world's first open-source multimodal creative assistant. A substitute for Canva and Manus that prioritizes privacy and is usable locally.

TypeScript 5,842 553 Updated Nov 10, 2025

genmoai / mochi

The best OSS video generation models, created by Genmo

Python 3,589 471 Updated Nov 14, 2025

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 11,695 1,123 Updated Feb 3, 2026

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,266 2,360 Updated Dec 15, 2025