- University of Wisconsin - Madison
- Madison, WI
- https://scholar.google.com/citations?user=euruCPEAAAAJ
Stars
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
A video database bridging human actions and human-object relationships
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
PyTorch code and models for V-JEPA self-supervised learning from video.
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Recent LLM-based CV and related works. Welcome to comment/contribute!
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Meta-Transformer for Unified Multimodal Learning
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds (CVPR 2023)
awesome grounding: A curated list of research papers in visual grounding
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
[ICLR 2024] AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
✨✨Latest Advances on Multimodal Large Language Models
[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes
DragGAN meets GET3D for interactive mesh generation and editing.