- Nagano, Japan
- https://qiita.com/shinmura0
Stars
The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation". (arXiv 2601.22153)
ICRA 2026: Dexora: Open-source VLA for High-DoF Bimanual Dexterity
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
Tool to build & run portable, lightweight, self-contained virtual machines.
Welcome to GR00T Whole-Body Control (WBC)! This is a unified platform for developing and deploying advanced humanoid controllers. This includes: Decoupled WBC models used in NVIDIA Isaac-Gr00t, Gr0…
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[ICLR 2026] The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model"
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
A software framework integrating various imitation learning methods and benchmark environments for robotic manipulation
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Streamlit: A faster way to build and share data apps.
[CVPR 2024] Code for "SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation".
[CVPR 2026] Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Visual Object Tagging Tool: An Electron app for building end-to-end Object Detection Models from Images and Videos.
A PyTorch Library for Accelerating 3D Deep Learning Research
Official code repository of paper "D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping"
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Train robotic agents to learn to plan pushing and grasping actions for manipulation with deep reinforcement learning.
A prototype model for detecting the four corners of a display
This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts".
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
PyTorch implementation of Contact-GraspNet
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects