- Zhejiang University
- Hangzhou, China
Starred repositories
[ECCV 2024 & TIP] DepictQA: Depicted Image Quality Assessment with Vision Language Models
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
A unified inference and post-training framework for accelerated video generation.
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
Wan: Open and Advanced Large-Scale Video Generative Models
Zhejiang University Graduation Thesis LaTeX Template
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
This repository contains the PyTorch implementation of the CVPR'2024 paper (Highlight), IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection.
Keyframe Interpolation with CogVideoX
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
VideoGen-Eval: Agent-based System for Video Generation Evaluation
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
💫 Industrial-strength Natural Language Processing (NLP) in Python
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
State-of-the-Art Embeddings, Retrieval, and Reranking
The official repository of "Spectral Motion Alignment for Video Motion Transfer using Diffusion Models".
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Open-Sora: Democratizing Efficient Video Production for All
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
A feature-rich command-line audio/video downloader
[ICLR 2024] Code for FreeNoise based on VideoCrafter
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
[ECCV 2022] DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection.