mu-cai


Official Repository for STTS.

Python 10 2 Updated Mar 19, 2026

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,931 161 Updated Mar 3, 2026

[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Python 229 8 Updated Mar 29, 2025

[ECCV 2024 Oral] Audio-Synchronized Visual Animation

Python 60 1 Updated Mar 15, 2026

[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning"

Python 30 2 Updated Mar 5, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,384 2,481 Updated May 30, 2026

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

Python 738 57 Updated Apr 15, 2026

A video database bridging human actions and human-object relationships

Python 165 21 Updated Jun 30, 2020

[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

Python 210 15 Updated Oct 28, 2024

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 3,984 404 Updated Feb 27, 2025

Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources

2,224 142 Updated Apr 16, 2026

Recent LLM-based CV and related works. Welcome to comment/contribute!

869 39 Updated Mar 8, 2025

①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

Jupyter Notebook 287 13 Updated Aug 12, 2024

Inference code for CodeLlama models

Python 16,308 1,939 Updated Aug 12, 2024

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Python 953 38 Updated Mar 19, 2025

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,649 207 Updated Feb 16, 2025

Meta-Transformer for Unified Multimodal Learning

Python 1,651 117 Updated Dec 5, 2023

Python 32 2 Updated Jun 1, 2023

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds (CVPR 2023)

Python 124 7 Updated Apr 18, 2023

awesome grounding: A curated list of research papers in visual grounding

1,125 104 Updated Sep 21, 2025

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Jupyter Notebook 20,613 2,931 Updated Jun 1, 2026

🔥GrowSP in PyTorch (CVPR 2023)

Python 199 18 Updated May 30, 2026

[ICLR 2024] AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Python 127 10 Updated Apr 1, 2026

✨✨Latest Advances on Multimodal Large Language Models

17,901 1,129 Updated Jun 18, 2026

[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes

Python 1,371 85 Updated Apr 21, 2024

DragGAN meets GET3D for interactive mesh generation and editing.

Python 466 22 Updated Jun 5, 2023

lidar to camera projection of KITTI

Python 182 32 Updated Mar 6, 2023

Segment Any RGBD

Python 867 52 Updated May 24, 2023