Stars
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Servo aims to empower developers with a lightweight, high-performance alternative for embedding web technologies in applications.
Trying to extract PCB schematics from images using computer vision
(ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
MCP Server for Computer Use in Windows
[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes
Resources for physics-based simulation in computer graphics
An independent extension based on IsaacLab. It provides support for robot manipulation tasks (robot arm and dexterous hand).
A large-scale benchmark and learning environment.
This is the repo of the CoRL 2024 paper "Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning"
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
Two-Finger Parallel Gripper Open Source Project
Paper list for the survey "A Survey on Vision-Language-Action Models: An Action Tokenization Perspective"
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
Write scalable load tests in plain Python 🚗💨
The most Purr-fect Image File Format for your AI workflows
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
Learn Agentic AI using Dapr Agentic Cloud Ascent (DACA) Design Pattern and Agent-Native Cloud Technologies: OpenAI Agents SDK, Memory, MCP, A2A, Knowledge Graphs, Dapr, Rancher Desktop, and Kuberne…
The development and future prospects of large multimodal reasoning models.
Solve Visual Understanding with Reinforced VLMs