Stars
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
[NeurIPS 2021] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
MS-AOT: Winner of VOT-STs2022 and VOT-RTs2022 (real-time)
DMAOT ranked 1st in the VOTS 2023 challenge.
Zhejiang University Graduation Thesis LaTeX Template
A list of video object segmentation (VOS) papers
[TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.
[ICCV 2023] Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
Official implementation of the paper “Inversion-Based Style Transfer with Diffusion Models” (CVPR 2023)
Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
Official implementation of “JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery”
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
[ICCV 2023] VPD is a framework that applies the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.
A LaTeX resume template designed for optimal information density and aesthetic appeal.
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
Official PyTorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
A curated list of papers, code and resources pertaining to few-shot image generation.
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
Repository of our CVPR 2023 paper "Lana: A Language-Capable Navigator for Instruction Following and Generation"