Stars
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
📡 Simple and ready-to-use tutorials for TensorFlow
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3x+ generation speedup on reasoning tasks
Inpaint anything using Segment Anything and inpainting models.
NVIDIA Isaac GR00T N1.5 - A Foundation Model for Generalist Robots.
Metric depth estimation from a single image
KITTI Object Visualization (Birdview, Volumetric LiDAR point cloud)
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
We extend Segment Anything to 3D perception by combining it with VoxelNeXt.
CLIPort: What and Where Pathways for Robotic Manipulation
Official implementation of "Re3Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation"