Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Zhejiang University Graduation Thesis LaTeX Template
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Official PyTorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
A LaTeX resume template designed for optimal information density and aesthetic appeal.
Official implementation of the paper “Inversion-Based Style Transfer with Diffusion Models” (CVPR 2023)
[NeurIPS 2021] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
A curated list of papers, code and resources pertaining to few-shot image generation.
A list of video object segmentation (VOS) papers
Official code for the ICCV 2023 paper "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarchy.
Repository of our CVPR2023 paper "Lana: A Language-Capable Navigator for Instruction Following and Generation"
[ICCV 2023] Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
Official implementation of “JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery”
DMAOT ranked 1st in the VOTS 2023 challenge.
[TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.