LVM
6 repositories
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
PyTorch code for "[CVPR24, TPAMI25] LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.