[Project page] · [Paper] · [ROS2 & Data Collection Tutorial] · [Visuo‑Tactile Gripper Guide]
Xinyue Zhu* 1, Binghao Huang* 1, Yunzhu Li1
*Equal contribution 1Columbia University
Tested on Ubuntu 22.04
- System dependencies & Docker: Follow the Universal Manipulation Interface (UMI) guide to install Docker and all required system packages.
- Conda environment: We recommend Miniforge + mamba for faster solves.
  mamba env create -f conda_environment.yaml
  mamba activate touchwild
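A quick sanity check that the environment resolved (a sketch; torch and zarr are assumptions about what conda_environment.yaml provides, so adjust the imports to your environment):

# Run inside the activated touchwild environment.
import torch
import zarr

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("zarr", zarr.__version__)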
The SLAM pipeline aligns GoPro videos with tactile logs and produces a time‑synchronised dataset.
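As a rough illustration of the time synchronisation (a conceptual sketch only, not the project's actual alignment code; the rates and shapes below are made up), each video frame timestamp is paired with the first tactile sample recorded at or after it:

import numpy as np

# Conceptual sketch: pair each video frame with the first tactile sample at or after it.
frame_ts = np.array([0.000, 0.033, 0.066, 0.100])    # video frame times [s]
tactile_ts = np.arange(0.0, 0.12, 0.005)             # tactile sample times [s]
tactile_vals = np.random.rand(len(tactile_ts), 16)   # e.g. 16 taxel readings per sample

idx = np.clip(np.searchsorted(tactile_ts, frame_ts), 0, len(tactile_ts) - 1)
aligned = tactile_vals[idx]                          # one tactile reading per video frame
print(aligned.shape)                                 # (4, 16)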
- Collect data: Follow the Touch‑in‑the‑Wild ROS 2 Guide to record GoPro streams plus tactile JSON logs. For detailed instructions on collecting demonstrations with the UMI gripper, see the Data Collection Tutorial.
- Organise files: Collect all videos recorded during the session, including
  - demo videos
  - mapping videos
  - the gripper calibration video
  and the associated tactile JSON file, then place everything in one folder:
  <YOUR_SESSION_FOLDER>/
  ├── demo_mapping.mp4
  ├── demo_gripper.mp4
  ├── demo_0001.mp4
  ├── demo_0002.mp4
  └── tactile_recording_YYYYMMDD_HHMMSS.json
- Run the pipeline:
  (touchwild)$ python run_slam_pipeline.py <YOUR_SESSION_FOLDER> --bag <YOUR_SESSION_FOLDER>/tactile_recording_YYYYMMDD_HHMMSS.json
  All SLAM outputs are written back into <YOUR_SESSION_FOLDER>/.
- Generate training dataset:
  (touchwild)$ python scripts_slam_pipeline/07_generate_replay_buffer.py <YOUR_SESSION_FOLDER> -o <YOUR_SESSION_FOLDER>/dataset.zarr.zip
run_tactile_pipeline.py builds a visuo-tactile dataset with the same Zarr layout as the full SLAM pipeline, but containing only GoPro and tactile images for self‑supervised MAE pre‑training.
(touchwild)$ python run_tactile_pipeline.py --bag /path/to/tactile_recording_YYYYMMDD_HHMMSS.json
Generate the visuo-tactile-only training dataset:
(touchwild)$ python scripts_tactile_pipeline/04_generate_replay_buffer.py <YOUR_SESSION_FOLDER> -o <YOUR_SESSION_FOLDER>/dataset.zarr.zip
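Both pipelines write a Zarr replay buffer. A minimal sketch for inspecting the generated archive (assumes zarr v2, which provides ZipStore; the group and array names it prints depend on the pipeline and are not fixed here):

import zarr

# Open the zipped replay buffer read-only and list every array with its shape.
with zarr.ZipStore("dataset.zarr.zip", mode="r") as store:
    root = zarr.open_group(store, mode="r")
    root.visititems(lambda name, node: print(name, getattr(node, "shape", "")))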
- Dataset: We provide all our demonstrations and the pretraining dataset in .zarr.zip format on Hugging Face.
- Launch training:
  (touchwild)$ python -m pretrain_mae.pretrain_mae task.dataset_path=/path/to/dataset.zarr.zip
  Checkpoints are stored in pretrain_mae/pretrain_checkpoints/.
We provide an example pretrained MAE checkpoint.
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
xinyue-zhu/pretrained_mae \
pretrain_mae.pth \
config.yaml \
--repo-type model \
--local-dir ./pretrain_checkpoints
To evaluate the pretrained checkpoint on the tactile reconstruction task:
(touchwild)$ python -m pretrain_mae.pretrain_eval --checkpoint /path/to/mae_checkpoint.pth --dataset /path/to/dataset.zarr.zip --plot_images
The script reports mean squared error (MSE) on the validation split and, with --plot_images, saves qualitative results to eval_outputs/.
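Before a long evaluation run, you can sanity-check the downloaded files. A minimal sketch (the config keys and checkpoint layout depend on the release, so treat the printed structure as informational only):

from pathlib import Path
import yaml

# Report the checkpoint size and the top-level keys of the released config.
ckpt = Path("pretrain_checkpoints/pretrain_mae.pth")
print(f"{ckpt.name}: {ckpt.stat().st_size / 1e6:.1f} MB")
with open("pretrain_checkpoints/config.yaml") as f:
    print(sorted(yaml.safe_load(f).keys()))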
We provide an example test_tube_collection dataset (~13 GB).
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
xinyue-zhu/test_tube_collection \
test_tube_collection.zarr.zip \
--repo-type dataset \
--local-dir ./dataset
Launch single-GPU training:
(touchwild)$ python train.py \
--config-name train_diffusion_unet_timm_umi_workspace \
task.dataset_path=/path/to/dataset.zarr.zip \
policy.obs_encoder.use_tactile=true \
policy.obs_encoder.tactile_model_choice=pretrain \
policy.obs_encoder.pretrain_ckpt_path=/path/to/mae_checkpoint.pth
For multi-GPU training, launch with accelerate:
(touchwild)$ accelerate launch --num_processes <NGPUS> train.py \
--config-name train_diffusion_unet_timm_umi_workspace \
task.dataset_path=/path/to/dataset.zarr.zip \
policy.obs_encoder.use_tactile=true \
policy.obs_encoder.tactile_model_choice=pretrain \
policy.obs_encoder.pretrain_ckpt_path=/path/to/mae_checkpoint.pth
Below we demonstrate deploying a trained policy on an xArm 850.
Refer to the UMI Hardware Guide for GoPro configuration.
- Physically connect both tactile sensors to the machine running the policy.
- Follow the tactile hardware guide to configure persistent port naming.
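Persistent names are usually keyed on each sensor's USB attributes (serial number, VID/PID). A small sketch to list what is currently connected (assumes pyserial is available; the tactile sensors' exact IDs are hardware-specific and covered in the tactile hardware guide):

from serial.tools import list_ports

# Print every serial device with the attributes a persistent-naming rule can match on.
for p in list_ports.comports():
    print(f"{p.device}  VID:PID={p.vid}:{p.pid}  serial={p.serial_number}  ({p.description})")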
Install the xArm Python SDK:
# From outside the repository
(touchwild)$ cd ..
(touchwild)$ git clone https://github.com/xArm-Developer/xArm-Python-SDK.git
(touchwild)$ cd xArm-Python-SDK
(touchwild)$ pip install .
Configure the robot in UFactory Studio:
- Download UFactoryStudio‑Linux‑1.0.1.AppImage from the uFactory website.
- Connect to the robot's IP address.
- Go to Settings → Motion → TCP and set the payload to:
  - Weight: 1.9 kg
  - Center of Mass (CoM): x = -2 mm, y = -6 mm, z = 37 mm
- Go to Settings → Motion → TCP and set the TCP offset to:
  (x = 0 mm, y = 0 mm, z = 270 mm, roll = 0°, pitch = 0°, yaw = 90°)
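Alternatively, the same payload and TCP offset can be applied programmatically with the xArm Python SDK installed above. A minimal sketch (the IP is a placeholder and the values mirror the UFactory Studio settings; verify the calls against your SDK version before running on hardware):

from xarm.wrapper import XArmAPI

# Placeholder IP -- replace with your robot's actual address.
arm = XArmAPI("<your_robot_ip_here>")
arm.set_tcp_load(1.9, [-2.0, -6.0, 37.0])    # payload weight [kg], CoM [mm]
arm.set_tcp_offset([0, 0, 270, 0, 0, 90])    # x, y, z [mm], roll, pitch, yaw [deg]
arm.disconnect()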
Edit the configuration file to set the robot's IP address:
# File: /example/eval_robots_config.yaml
robot_ip: <your_robot_ip_here>
# Allow access to the HDMI capture card
sudo chmod -R 777 /dev/bus/usb
# Evaluate a checkpoint
(touchwild)$ python eval_real.py --robot_config example/eval_robots_config.yaml -i /path/to/policy_checkpoint.ckpt -o /path/to/output_folder
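If eval_real.py cannot open the GoPro stream, a quick check that the HDMI capture card is readable (assumes OpenCV is available in the environment; device index 0 is a guess, so try other indices if needed):

import cv2

# Try to grab one frame from the capture card (device index is a guess).
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
print("opened:", cap.isOpened(), "| frame shape:", frame.shape if ok else None)
cap.release()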
3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing. [link]
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning. [link]
This project is released under the MIT License. See LICENSE for details.
Our Visuo-Tactile Gripper builds upon UMI Gripper. The SLAM pipeline builds upon Steffen Urban’s fork of ORB_SLAM3 and his OpenImuCameraCalibrator.
The gripper’s mechanical design is adapted from the Push/Pull Gripper by John Mulac, and the soft finger from an original design by Alex Alspach at TRI. The GoPro installation frame on the robot side is adapted from Fast-UMI.