Mu Cai mu-cai

Computer Sciences Ph.D. student @UW-Madison

66 followers · 3 following

University of Wisconsin - Madison
Madison, WI
https://scholar.google.com/citations?user=euruCPEAAAAJ

Achievements

Highlights

Pro

Stars

allenai / STTS

Official Repository for STTS.

Python 10 2 Updated Mar 19, 2026

microsoft / Magma

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,931 161 Updated Mar 3, 2026

LostXine / LLaRA

[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Python 229 8 Updated Mar 29, 2025

lzhangbj / ASVA

[ECCV 2024 Oral] Audio-Synchronized Visual Animation

Python 60 1 Updated Mar 15, 2026

yibingwei-1 / LatentMIM

[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning"

Python 30 2 Updated Mar 5, 2025

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,384 2,481 Updated May 30, 2026

xiaobai1217 / Awesome-Video-Datasets

Video datasets

1,651 118 Updated Mar 8, 2023

SkalskiP / top-cvpr-2024-papers

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

Python 738 57 Updated Apr 15, 2026

JingweiJ / ActionGenome

A video database bridging human actions and human-object relationships

Python 165 21 Updated Jun 30, 2020

google-deepmind / magiclens

[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

Python 210 15 Updated Oct 28, 2024

facebookresearch / jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 3,984 404 Updated Feb 27, 2025

ActiveVisionLab / Awesome-LLM-3D

Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources

2,224 142 Updated Apr 16, 2026

DirtyHarryLYL / LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

869 39 Updated Mar 8, 2025

Q-Future / Q-Bench

①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

Jupyter Notebook 287 13 Updated Aug 12, 2024

meta-llama / codellama

Inference code for CodeLlama models

Python 16,308 1,939 Updated Aug 12, 2024

allenai / mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Python 953 38 Updated Mar 19, 2025

JIA-Lab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,649 207 Updated Feb 16, 2025

invictus717 / MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Python 1,651 117 Updated Dec 5, 2023

FuchenUSTC / PointClustering

Python 32 2 Updated Jun 1, 2023

Nightmare-n / GD-MAE

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds (CVPR 2023)

Python 124 7 Updated Apr 18, 2023

TheShadow29 / awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

1,125 104 Updated Sep 21, 2025

AI4Finance-Foundation / FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Jupyter Notebook 20,613 2,931 Updated Jun 1, 2026

vLAR-group / GrowSP

🔥GrowSP in PyTorch (CVPR 2023)

Python 199 18 Updated May 30, 2026

ywyue / AGILE3D

[ICLR 2024] AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Python 127 10 Updated Apr 1, 2026

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

17,901 1,129 Updated Jun 18, 2026

runnanchen / Label-Free-Scene-Understanding

49 3 Updated Nov 19, 2023

Pointcept / SegmentAnything3D

[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes

Python 1,371 85 Updated Apr 21, 2024

ashawkey / Drag3D

DragGAN meets GET3D for interactive mesh generation and editing.

Python 466 22 Updated Jun 5, 2023

azureology / kitti-velo2cam

lidar to camera projection of KITTI

Python 182 32 Updated Mar 6, 2023

Jun-CEN / SegmentAnyRGBD

Segment Any RGBD

Python 867 52 Updated May 24, 2023