Stars
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
This repository provides a valuable reference for researchers in the field of multimodality. Please start your exploratory journey in RL-based reasoning MLLMs!
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
A universal summary of current robotics simulators
[ECCV 2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
[ICCV 2023] Official implementation of the paper "DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting"
[ICCV 2023] Official implementation of the paper "Neural Interactive Keypoint Detection"
Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[CVPR 2023] Official implementation of the paper "Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes"
[ICCV 2023] Official code for the paper "HumanMAC: Masked Motion Completion for Human Motion Prediction"
[ICLR 2023] Official implementation of the paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation"