Stars
CVPR2026 Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers
A unified framework for easy reinforcement learning in Flow-Matching models
[ICLR 2026] "Does FLUX Already Know How to Perform Physically Plausible Image Composition?" (Official Implementation)
[CVPR2026 🎉] Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation.
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
Streaming Flux editor: live camera → editing every frame at interactive FPS, based on FLUX.2-Klein-4B. Runs on a single H100 at 15+ FPS.
The ultimate training toolkit for finetuning diffusion models
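NanoBanana PPT Skills: a powerful tool that uses AI to automatically generate high-quality PPT images and videos, with support for smart transitions and interactive playback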
Scalable group inference for generating high quality and diverse images with diffusion models.
[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
[CVPR 2026] PersonaLive! : Expressive Portrait Image Animation for Live Streaming
[ICLR 2026] Taming large-scale few-step training with self-adversarial flows! 👏🏻
[Tutorial] Few-Step Distillation for Text-to-Image Generation: A Practical Guide
[ICLR 2026] ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
[DEIMv2] Real Time Object Detection Meets DINOv3
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
TradingAgents: Multi-Agents LLM Financial Trading Framework
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25)
[ICCV 2025] Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics.
A simple screen-parsing tool towards a pure-vision-based GUI agent