Stars
A state-of-the-art open visual language model | multimodal pretrained model
[AAAI 2025] Official implementation of "OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
TripoSR: Fast 3D Object Reconstruction from a Single Image
Official implementation of DeepLabCut: Markerless pose estimation of user-defined features with deep learning for all animals incl. humans
Use commands in English to control Blender with OpenAI's GPT-4
[CVPR'24 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Model summary in PyTorch similar to `model.summary()` in Keras
Witness the aha moment of VLMs for less than $3.
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
The best OSS video generation models, created by Genmo
Character Animation (AnimateAnyone, Face Reenactment)
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Unofficial Implementation of Animate Anyone
Isaac Gym Reinforcement Learning Environments
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
[TMM 2025] Mixture-of-Experts for Large Vision-Language Models
Lumina-T2X is a unified framework for Text to Any Modality Generation
Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral
A Unified Framework for Surface Reconstruction
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."