Stars
The multi-view version of MonoDETR on the nuScenes dataset
[CVPR 2022] PointCLIP: Point Cloud Understanding by CLIP
[NeurIPS 2022] Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
[CVPR 2023] Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
ZrrSkywalker / MathVista
Forked from lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
[CVPR 2023] Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
[ICCV 2023] The first DETR model for monocular 3D object detection with depth-guided transformer
duanduanduanyuchen / packnet-sfm
Forked from TRI-ML/packnet-sfm
TRI-ML Monocular Depth Estimation Repository
Vectornet for trajectory prediction, implemented in PyTorch/Torch_geometric (WIP)
Waymo Open Dataset
OpenMMLab Detection Toolbox and Benchmark
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
duanduanduanyuchen / T2I-R1
Forked from CaraJ7/T2I-R1
Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
[AAAI 2023] Zero-Shot Enhancement of CLIP with Parameter-free Attention
ZiyuGuo99 / awesome-MIM
Forked from ucasligang/awesome-MIM
Reading list for research topics in Masked Image Modeling
ZiyuGuo99 / Awesome-MIM-1
Forked from Lupin1998/Awesome-MIM
Awesome List of Masked Image Modeling (MIM) Papers for Self-supervised Visual Representation Learning
ZiyuGuo99 / LLaMA-Adapter
Forked from OpenGVLab/LLaMA-Adapter
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
ZiyuGuo99 / Awesome-Multimodal-Large-Language-Models
Forked from BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Align 3D Point Cloud with Multi-modalities for Large Language Models