Highlights
- Pro
Starred repositories
A latent text-to-image diffusion model
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Google Research
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything
High-Resolution Image Synthesis with Latent Diffusion Models
PyTorch tutorials and fun projects including neural talk, neural style, poem writing, anime generation (《深度学习框架PyTorch:入门与实战》)
LAVIS - A One-stop Library for Language-Vision Intelligence
Code release for NeRF (Neural Radiance Fields)
Reference PyTorch implementation and models for DINOv3
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Using Low-rank adaptation to quickly fine-tune diffusion models.
A unified framework for 3D content generation.
Homepage for STAT 157 at UC Berkeley
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
A computer vision closed-loop learning platform where code can be run interactively online. 学习闭环《计算机视觉实战演练:算法与应用》中文电子书、源码、读者交流社区(持续更新中 ...) 📘 在线电子书 https://charmve.github.io/computer-vision-in-acti…
NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
Efficient neural feature detector and descriptor
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", Siggraph Asia 2024
[CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
An unofficial implementation of paper 3D Gaussian Splatting for Real-Time Radiance Field Rendering by taichi lang.
A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]
TartanAir dataset tools and samples