- University of Waterloo
- Toronto
- https://congwei1230.github.io/
- @CongWei1230
- https://scholar.google.com/citations?user=y1d5C5YAAAAJ
- https://github.com/congwei1230
Stars
Consistent Autoregressive Video Generation with Long Context
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
[CVPR'26 Highlight] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
[ICLR 2026] UniVideo: Unified Understanding, Generation, and Editing for Videos
[NeurIPS 2025 Spotlight] Demo implementation of MoCha: Towards Movie-Grade Talking Character Synthesis
A version of verl to support diverse tool use [TMLR 2026]
Official Repo for MoCha: Towards Movie-Grade Talking Character Synthesis
Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" [ICLR2025]
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]
Data and Code for Program of Thoughts [TMLR 2023]
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
Generative Models by Stability AI
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
An open source implementation of CLIP.
Fashion 200K dataset used in paper "Automatic Spatially-aware Fashion Concept Discovery."
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"