LaTeX template for USTC thesis

TeX 1,995 439 Updated Jan 13, 2026

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

Python 1,534 117 Updated Dec 31, 2025

PersonaLive! : Expressive Portrait Image Animation for Live Streaming

Python 1,593 254 Updated Dec 30, 2025

DDT: Decoupled Diffusion Transformer

Python 361 17 Updated Aug 22, 2025

[ICCV 2025] Official implementation for KV-Edit: Training-Free Image Editing for Precise Background Preservation

Python 367 17 Updated May 21, 2025

A survey for visual generation alignment

116 8 Updated Nov 9, 2025

Native Multimodal Models are World Learners

Python 1,445 54 Updated Dec 30, 2025
Python 1,772 78 Updated Dec 16, 2025

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]

Python 117 4 Updated Feb 4, 2026

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,744 64 Updated Jan 20, 2026

[ICLR 2026] An early exploration that introduces interleaving reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality…

Python 86 Updated Jan 26, 2026

Qwen-Image text to image lora trainer

Python 699 62 Updated Dec 16, 2025

[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…

Python 355 13 Updated Jan 30, 2026

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,801 143 Updated Feb 3, 2026

Official repository for the UAE paper, unified-GRPO, and unified-Bench

Python 156 6 Updated Sep 12, 2025

[🚀 ICLR 2026] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.

Python 599 18 Updated Dec 25, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,218 420 Updated Dec 31, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python 244 5 Updated Aug 15, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,023 1,675 Updated Dec 17, 2025

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 2,921 316 Updated Aug 28, 2025

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook 172 4 Updated Dec 17, 2025

[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer

Python 1,900 143 Updated Jul 3, 2025

[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 3,153 305 Updated Dec 21, 2024

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 243 19 Updated Oct 4, 2025

The world's first open-source multimodal creative assistant. A privacy-focused, locally usable alternative to Canva and Manus.

TypeScript 5,842 553 Updated Nov 10, 2025

The best OSS video generation models, created by Genmo

Python 3,589 471 Updated Nov 14, 2025

Enjoy the magic of Diffusion models!

Python 11,695 1,123 Updated Feb 3, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,266 2,360 Updated Dec 15, 2025