[Project page] · [Paper] · [ROS2 & Data Collection Tutorial] · [Visuo‑Tactile Gripper Guide]
Xinyue Zhu* 1, Binghao Huang* 1, Yunzhu Li1
*Equal contribution 1Columbia University
Tested on Ubuntu 22.04
- System dependencies & Docker: Follow the Universal Manipulation Interface (UMI) guide to install Docker and all required system packages.
- Conda environment: We recommend Miniforge + mamba for faster solves.
  mamba env create -f conda_environment.yaml
  mamba activate touchwild
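A quick sanity check that the environment resolved (a sketch; torch and zarr are assumptions about what conda_environment.yaml provides, so adjust the imports to your environment):

# Run inside the activated touchwild environment.
import torch
import zarr

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("zarr", zarr.__version__)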
The SLAM pipeline aligns GoPro videos with tactile logs and produces a time‑synchronised dataset.
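As a rough illustration of the time synchronisation (a conceptual sketch only, not the project's actual alignment code; the rates and shapes below are made up), each video frame timestamp is paired with the first tactile sample recorded at or after it:

import numpy as np

# Conceptual sketch: pair each video frame with the first tactile sample at or after it.
frame_ts = np.array([0.000, 0.033, 0.066, 0.100])    # video frame times [s]
tactile_ts = np.arange(0.0, 0.12, 0.005)             # tactile sample times [s]
tactile_vals = np.random.rand(len(tactile_ts), 16)   # e.g. 16 taxel readings per sample

idx = np.clip(np.searchsorted(tactile_ts, frame_ts), 0, len(tactile_ts) - 1)
aligned = tactile_vals[idx]                          # one tactile reading per video frame
print(aligned.shape)                                 # (4, 16)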
- Collect data: Follow the Touch‑in‑the‑Wild ROS 2 Guide to record GoPro streams plus tactile JSON logs. For detailed instructions on collecting demonstrations with the UMI gripper, see the Data Collection Tutorial.
- Organise files: Collect all videos recorded during the session, including
  - demo videos
  - mapping videos
  - the gripper calibration video
  and the associated tactile JSON file, then place everything in one folder:
  <YOUR_SESSION_FOLDER>/
  ├── demo_mapping.mp4
  ├── demo_gripper.mp4
  ├── demo_0001.mp4
  ├── demo_0002.mp4
  └── tactile_recording_YYYYMMDD_HHMMSS.json
- Run the pipeline:
  (touchwild)$ python run_slam_pipeline.py <YOUR_SESSION_FOLDER> --bag <YOUR_SESSION_FOLDER>/tactile_recording_YYYYMMDD_HHMMSS.json
  All SLAM outputs are written back into <YOUR_SESSION_FOLDER>/.
- Generate training dataset:
  (touchwild)$ python scripts_slam_pipeline/07_generate_replay_buffer.py <YOUR_SESSION_FOLDER> -o <YOUR_SESSION_FOLDER>/dataset.zarr.zip
run_tactile_pipeline.py builds a visuo-tactile dataset with the same Zarr layout as the full SLAM pipeline, but containing only GoPro and tactile images for self‑supervised MAE pre‑training.
(touchwild)$ python run_tactile_pipeline.py --bag /path/to/tactile_recording_YYYYMMDD_HHMMSS.json
Generate the visuo-tactile-only training dataset:
(touchwild)$ python scripts_tactile_pipeline/04_generate_replay_buffer.py <YOUR_SESSION_FOLDER> -o <YOUR_SESSION_FOLDER>/dataset.zarr.zip
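Both pipelines write a Zarr replay buffer. A minimal sketch for inspecting the generated archive (assumes zarr v2, which provides ZipStore; the group and array names it prints depend on the pipeline and are not fixed here):

import zarr

# Open the zipped replay buffer read-only and list every array with its shape.
with zarr.ZipStore("dataset.zarr.zip", mode="r") as store:
    root = zarr.open_group(store, mode="r")
    root.visititems(lambda name, node: print(name, getattr(node, "shape", "")))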
- Dataset: We provide all our demonstrations and the pretraining dataset in .zarr.zip format on Hugging Face.
- Launch training:
  (touchwild)$ python -m pretrain_mae.pretrain_mae task.dataset_path=/path/to/dataset.zarr.zip
  Checkpoints are stored in pretrain_mae/pretrain_checkpoints/.
We provide an example pretrained MAE checkpoint.
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
xinyue-zhu/pretrained_mae \
pretrain_mae.pth \
config.yaml \
--repo-type model \
--local-dir ./pretrain_checkpoints
To evaluate the pretrained checkpoint on the tactile reconstruction task:
(touchwild)$ python -m pretrain_mae.pretrain_eval --checkpoint /path/to/mae_checkpoint.pth --dataset /path/to/dataset.zarr.zip --plot_images
The script reports mean squared error (MSE) on the validation split and, with --plot_images, saves qualitative results to eval_outputs/.
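Before a long evaluation run, you can sanity-check the downloaded files. A minimal sketch (the config keys and checkpoint layout depend on the release, so treat the printed structure as informational only):

from pathlib import Path
import yaml

# Report the checkpoint size and the top-level keys of the released config.
ckpt = Path("pretrain_checkpoints/pretrain_mae.pth")
print(f"{ckpt.name}: {ckpt.stat().st_size / 1e6:.1f} MB")
with open("pretrain_checkpoints/config.yaml") as f:
    print(sorted(yaml.safe_load(f).keys()))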
We provide an example test_tube_collection dataset (~13 GB).
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
xinyue-zhu/test_tube_collection \
test_tube_collection.zarr.zip \
--repo-type dataset \
--local-dir ./dataset
Launch single-GPU training:
(touchwild)$ python train.py \
--config-name train_diffusion_unet_timm_umi_workspace \
task.dataset_path=/path/to/dataset.zarr.zip \
policy.obs_encoder.use_tactile=true \
policy.obs_encoder.tactile_model_choice=pretrain \
policy.obs_encoder.pretrain_ckpt_path=/path/to/mae_checkpoint.pth
For multi-GPU training, launch with accelerate:
(touchwild)$ accelerate launch --num_processes <NGPUS> train.py \
--config-name train_diffusion_unet_timm_umi_workspace \
task.dataset_path=/path/to/dataset.zarr.zip \
policy.obs_encoder.use_tactile=true \
policy.obs_encoder.tactile_model_choice=pretrain \
policy.obs_encoder.pretrain_ckpt_path=/path/to/mae_checkpoint.pth
Below we demonstrate deploying a trained policy on an xArm 850.
Refer to the UMI Hardware Guide for GoPro configuration.
- Physically connect both tactile sensors to the machine running the policy.
- Follow the tactile hardware guide to configure persistent port naming.
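Persistent names are usually keyed on each sensor's USB attributes (serial number, VID/PID). A small sketch to list what is currently connected (assumes pyserial is available; the tactile sensors' exact IDs are hardware-specific and covered in the tactile hardware guide):

from serial.tools import list_ports

# Print every serial device with the attributes a persistent-naming rule can match on.
for p in list_ports.comports():
    print(f"{p.device}  VID:PID={p.vid}:{p.pid}  serial={p.serial_number}  ({p.description})")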
Install the xArm Python SDK:
# From outside the repository
(touchwild)$ cd ..
(touchwild)$ git clone https://github.com/xArm-Developer/xArm-Python-SDK.git
(touchwild)$ cd xArm-Python-SDK
(touchwild)$ pip install .
Configure the robot in UFactory Studio:
- Download UFactoryStudio‑Linux‑1.0.1.AppImage from the uFactory website.
- Connect to the robot's IP address.
- Go to Settings → Motion → TCP and set the payload to:
  - Weight: 1.9 kg
  - Center of Mass (CoM): x = -2 mm, y = -6 mm, z = 37 mm
- Go to Settings → Motion → TCP and set the TCP offset to:
  (x = 0 mm, y = 0 mm, z = 270 mm, roll = 0°, pitch = 0°, yaw = 90°)
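Alternatively, the same payload and TCP offset can be applied programmatically with the xArm Python SDK installed above. A minimal sketch (the IP is a placeholder and the values mirror the UFactory Studio settings; verify the calls against your SDK version before running on hardware):

from xarm.wrapper import XArmAPI

# Placeholder IP -- replace with your robot's actual address.
arm = XArmAPI("<your_robot_ip_here>")
arm.set_tcp_load(1.9, [-2.0, -6.0, 37.0])    # payload weight [kg], CoM [mm]
arm.set_tcp_offset([0, 0, 270, 0, 0, 90])    # x, y, z [mm], roll, pitch, yaw [deg]
arm.disconnect()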
Edit the configuration file to set the robot's IP address:
# File: /example/eval_robots_config.yaml
robot_ip: <your_robot_ip_here>
# Allow access to the HDMI capture card
sudo chmod -R 777 /dev/bus/usb
# Evaluate a checkpoint
(touchwild)$ python eval_real.py --robot_config example/eval_robots_config.yaml -i /path/to/policy_checkpoint.ckpt -o /path/to/output_folder
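If eval_real.py cannot open the GoPro stream, a quick check that the HDMI capture card is readable (assumes OpenCV is available in the environment; device index 0 is a guess, so try other indices if needed):

import cv2

# Try to grab one frame from the capture card (device index is a guess).
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
print("opened:", cap.isOpened(), "| frame shape:", frame.shape if ok else None)
cap.release()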
3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing. [link]
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning. [link]
This project is released under the MIT License. See LICENSE for details.
Our Visuo-Tactile Gripper builds upon UMI Gripper. The SLAM pipeline builds upon Steffen Urban’s fork of ORB_SLAM3 and his OpenImuCameraCalibrator.
The gripper’s mechanical design is adapted from the Push/Pull Gripper by John Mulac, and the soft finger from an original design by Alex Alspach at TRI. The GoPro installation frame on the robot side is adapted from Fast-UMI.