- The Chinese University of Hong Kong
- Hong Kong SAR, Shatin
- zibojia.github.io
Stars
The official implementation of StereoPilot
[ACM MM 2022] UConNet: Unsupervised Controllable Network for Image and Video Deraining
The official implementation of the paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization".
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
[SIGGRAPH 2025] Official code of the paper "Cobra: Efficient Line Art COlorization with BRoAder References".
[CVPR 2026] Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny co…
[ICCV 2025] LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
Tiny AutoEncoder for Hunyuan Video (and other video models)
Generative Omnimatte (CVPR 2025)
This is the official implementation of our paper: "MiniMax-Remover: Taming Bad Noise Helps Video Object Removal"
🕹️ Explore cutting-edge techniques in game generation
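This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
2024.06.19 This project uses Chinese-CLIP to build a text-to-image and image-to-image search page, intended to help users get started quickly with cross-modal retrieval. The code uses about 190k images (189,585) from the MUGE dataset as the gallery set. The project provides feature extraction, retrieval, and UI code.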
[ICLR 2025] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
The official repository of "Video assistant towards large language model makes everything easy"
[NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence
[ECCV 2024 Oral] ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility.
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Using Low-rank adaptation to quickly fine-tune diffusion models.
QLoRA: Efficient Finetuning of Quantized LLMs
Code for ACL 2022 paper "BERT Learns to Teach: Knowledge Distillation with Meta Learning".
This repository contains datasets and baselines for benchmarking Chinese text recognition.