Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence. MoE checkpoint released! Only 4GB of VRAM is needed to run!
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Official PyTorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
[NeurIPS 2021] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarchy.
Repository of our CVPR2023 paper "Lana: A Language-Capable Navigator for Instruction Following and Generation"
[ICCV 2023] Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
Official implementation of “JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery”
DMAOT ranked 1st in the VOTS 2023 challenge.
[TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.
MS-AOT: Winner of VOT-STs2022 and VOT-RTs2022 (real-time)