- Video Coding Laboratory, Peking University
- https://lotayou.github.io
Starred repositories
"MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation"
A Modular Framework for 3D Generation and Beyond [WIP]
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
🦜🔗 The platform for reliable agents.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
The ultimate training toolkit for finetuning diffusion models
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…
🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.
This is a collection of ComfyUI workflows to upscale images to 2K, 4K or 8K. Great for general upscaling of photos and illustrations with Magnific-like results.
🎁 6,500,000+ Unsplash images made available for research and machine learning
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
[ICCV 2023] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
Illumination Drawing Tools for Text-to-Image Diffusion Models
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
InstantID-ROME: Improved Identity-Preserving Generation in Seconds 🔥
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Custom nodes pack for ComfyUI. This custom node pack helps to conveniently enhance images through Detector, Detailer, Upscaler, Pipe, and more.
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
A curated list of papers, code and resources pertaining to image composition/compositing or object insertion/addition/compositing, which aims to generate realistic composite images.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Python tools for 3D face: 3DMM, mesh processing (transform, camera, light, render), 3D face representations.
PyTorch implementation of "Stable-Hair: Real-World Hair Transfer via Diffusion Model" (AAAI 2025)
[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"