Stars
This is an official implementation for "Video Swin Transformers".
The agent that grows with you
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Open and efficient video and image watermarking
Python library for invisible image watermarking (blind image watermarking)
Blind & Invisible Watermark: blind image watermarking; extracting the watermark does not require the original image!
AI agents running research on single-GPU nanochat training automatically
Your Intelligent End-to-End Multi-Modal Risk Assessment AI Expert
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
AgentEvolver: Towards Efficient Self-Evolving Agent System
[ICLR 2026] AudioMCQ: A 571k audio multiple-choice question dataset for post-training Large Audio Language Models with dual CoT annotations and audio-contribution filtering. 🏆 1st place in DCASE 20…
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
A comprehensive benchmark of deepfake detection
We introduce AI-Face, the first million-scale AI-generated face dataset with demographic annotations, and conduct a comprehensive fairness benchmark. Our work has been accepted at CVPR 2025.
Real-time face swap and one-click video deepfake with only a single image
Official repository for the UAE paper, unified-GRPO, and unified-Bench
State-of-the-art 2D and 3D Face Analysis Project
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
Official inference repo for FLUX.1 models
WeChatCV / opencv_3rdparty
Forked from opencv/opencv_3rdparty
OpenCV - 3rdparty
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!