- Tongyi Lab, Alibaba Inc. | Peking University
- Hangzhou, China
- www.doublez.site
- in/doubleZ0108
- https://unsplash.com/@doublez0108
Starred repositories
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image-based models. We also release Nano-consistent-150K openly to support the commu…
[BMVC 2025] C³-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting
Ongoing research training transformer models at scale
Official inference repo for FLUX.1 models
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICME 2025] ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
[NeurIPS'23] Emergent Correspondence from Image Diffusion
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Realtime Video and Audio Streaming with WebRTC and Gradio
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
A generative world for general-purpose robotics & embodied AI learning.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Using Low-rank adaptation to quickly fine-tune diffusion models.
🐍 Geometric Computer Vision Library for Spatial AI
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also …
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
[ECCV 2024 Oral 🔥] Arc2Face: A Foundation Model for ID-Consistent Human Faces | [ICCVW 2025] ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion
[TPAMI 2025] ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
Automatically integrates all Xiaomi devices into HomeAssistant via miot-spec; supports Wi-Fi, BLE, and ZigBee devices. Xiaomi Mi Home smart home device integration for Hass.
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL