Stars
Official implementation of the [WACV 2026] paper KMOPS: Keypoint-Driven Method for Multi-Object Pose and Metric Size Estimation from Stereo Images.
[ICRA 2026] YOPO: A Minimalist’s Detection Transformer for Monocular RGB Category‑level 9D Multi‑Object Pose Estimation
Code for RFMPose: Generative Category-level Object Pose Estimation via Riemannian Flow Matching
[ICRA 2024] MESA is a fully distributed, asynchronous, and general purpose optimization algorithm for Consensus Simultaneous Localization and Mapping (CSLAM).
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
Official repository for "TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data" (AAAI 2025).
[ICLR-2025] POGEMA stands for Partially-Observable Grid Environment for Multiple Agents. This is a grid-based environment that was specifically designed to be flexible, tunable and scalable. It can…
MAPF-LNS2: Fast Repairing for Multi-Agent Path Finding via Large Neighborhood Search
Anytime Multi-Agent Path Finding via Large-Neighborhood Search
This is the code repo for the paper Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding, which won the ICRA 2025 best paper on multi-robot systems and the…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
[CVPR 2026] Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
[DEIMv2] Real Time Object Detection Meets DINOv3
[CVPR 2025] SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
The official CPLP implementation from Prioritised Planning for Continuous-time Lifelong Multi-agent Pathfinding.
DAA*: Deep Angular A Star for Image-based Path Planning, ICCV 2025
Camera calibration with sub-pixel accuracy: https://discorpy.readthedocs.io/
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
[CVPR25] Official implementation of "MobileMamba: Lightweight Multi-Receptive Visual Mamba Network".
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
Library release of the paper [TopoTag: A Robust and Scalable Topological Fiducial Marker System]
Official implementation of the paper [DeepTag: A General Framework for Fiducial Marker Design and Detection]