Starred repositories
S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023
Official implementation of "Delving into CLIP latent space for Video Anomaly Recognition", CVIU 2024
[ICCV'23 Oral] Unmasking Anomalies in Road-Scene Segmentation
Official code for RbA: Segmenting Unknown Regions Rejected by All (ICCV 2023)
[GCPR 2023] UGainS: Uncertainty Guided Anomaly Instance Segmentation
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
A curated list of awesome LLM/VLM/VLA/World Model for Autonomous Driving (LLM4AD) resources (continually updated)
WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
[ECCV 2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV 2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
[IROS 2024 Oral Presentation] WidthFormer: Toward Efficient Transformer-based BEV View Transformation
RAPiD: Rotation-Aware People Detection in Overhead Fisheye Images (CVPR 2020 Workshops)
[ECAI 2023] MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient
[WACV'24] ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection
[CVPR 2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation
[CVPR 2024] A world model for autonomous driving.
[ECCV 2024] 3D World Model for Autonomous Driving
[CVPR 2024] NeuRAD: Neural Rendering for Autonomous Driving