Skip to content

Official Implementation of "Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach"

License

Notifications You must be signed in to change notification settings

breez3young/TACO

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

TACO Test-time Anti-exploration via pseudo-COunts (TACO)

Paper Project Page License Demo Models


πŸ“‹ Overview

TACO Architecture
TACO: Test-time Anti-exploration via pseudo-COunts

TACO (Test-time Anti-exploration via pseudo-COunts) is a novel test-time scaling framework for VLAs which retains the strong generalization capabilities of pretrained VLAs while effectively constraining outputs to the success modes of specific downstream tasks, performing the Anti-Exploration principle in offline RL. By leveraging a lightweight Coin Flipping Network (CFN), TACO acquires accurate measurement of distributional shift with minimal computational overhead, significantly improving performance on out-of-distribution testcases.

🎯 Key Features

  • Principled Anti-Exploration: Mitigates inference-time fragility by constraining generated actions to the "success modes" of the downstream task, effectively handling distribution shifts.
  • Universal Compatibility: Seamlessly integrates with Flow-Matching (e.g., $\pi_0$, $\pi_{0.5}$), Diffusion (e.g., RDT), and Autoregressive (e.g., OpenVLA) architectures.
  • Gradient-Free Steering: Performs Test-Time Scaling (TTS) via a generate-then-verify pipeline without modifying the heavy VLA backbone parameters.
  • Efficient Inference: Implements KV Cache Optimization to reuse visual-language representations, reducing inference latency by ~73% compared to the original manner, during parallel sampling.
  • High-Fidelity Verification: Utilizes a lightweight Coin Flipping Network (CFN) trained on internal representations with High-Fidelity Feature Search to accurately estimate action reliability.

πŸ† Main Contributions

  1. New Perspective on VLA Instability: We diagnose the inference fragility of generative VLAs as an out-of-support problem and propose TACO, the first framework to address this via the Anti-Exploration principle from Offline RL using Test-Time Scaling.
  2. Coupled Pseudo-Count Estimator: We introduce an efficient internal representation mechanism coupled with a High-Fidelity Feature Search strategy. This allows the CFN to accurately verify action chunks for denoising-based policies (Flow/Diffusion) that never see clean actions during training.
  3. SOTA Performance: Extensive experiments across extensive simulation tasks (RoboTwin, LIBERO, SimplerEnv) and real-world dual-arm manipulation demonstrate that TACO significantly boosts success rates (e.g., +16% in real-world tasks) over strong baselines like $\pi_0$.

πŸ“° News

πŸ› οΈ Installation

Install CFN

Create a conda env:

conda create -n taco python=3.10 -y
conda activate taco

Install torch (choose the version that suits your environment):

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0

Install CFN:

cd cfn/
pip install -e .
cd ..

Install Pi0.5 (lerobot)

cd third_party/lerobot
pip install -e .

cd src/transformers
pip install -e .

cd ../../../..

(Optional) Install Robotwin2.0

cd ./third_party/Robotwin

Please follow Robotwin doc to install this local Robotwin and its requirements.

(Optional) Install Lerobot-Libero

cd ./third_party/lerobot
pip install -e ".[libero]"

Install OpenVLA

You can create a new conda environment, refer third_party/openvla to install Libero and our modified OpenVLA.

πŸš€ Quick Start

We provide pre-trained base policies and CFN checkpoints on πŸ€— Hugging Face, allowing you to directly evaluate TACO without training.

Download Pre-trained Models

# Download base policy checkpoints & CFN checkpoints
### CFN state dict is saved at `cfns` sub-directory in the repo
hf download rhodes-team-teleai/pi05_TACO_libero_finetuned --local-dir /path/to/your/dir --max-workers 16

πŸ“Š Usage

Pi0.5

Eval in Robotwin

  1. preparation

Collect Robotwin task dataset, we provide a pipline as an example:

bash ./scripts/robotwin_data/task_dataset_collection.sh
bash ./scripts/robotwin_data/data_trans/rt2-hdf5_2_hdf5_2_lerobot.sh
bash ./scripts/robotwin_data/data_trans/v21_to_v30.sh
bash ./scripts/robotwin_data/data_trans/make_sure_stats.sh

You will get lerobot dataset v3.0 at repo-id=RoboTwin2/demo_clean/${task}_v30. Please refer to Robotwin doc first if you have any questions.

You can fine-tune your own pi0.5:

bash ./third_party/lerobot/scripts/train_pi05.sh

or just use our trained pi0.5 model on Huggingface.

  1. Collect internal representation

Modify and run:

bash ./scripts/collect_inernal_representation/pi05_robotwin2/collect.sh
  1. Train CFN

Modify and run:

bash ./scripts/train_cfn/train_cfn_example.sh
  1. Eval TACO

Modify and run:

bash ./scripts/eval/eval_robotwin2_torch_pi05_taco.sh

Eval in Libero

  1. Collect internal representation

Run:

bash ./scripts/collect_inernal_representation/pi05_libero/collect.sh
  1. Train CFN

Modify and run:

bash ./scripts/train_cfn/train_cfn_example.sh
  1. Eval TACO

Modify and run:

bash ./scripts/eval/eval_libero_pi05_taco.sh

Openvla

Eval in Libero

  1. Collect internal representation

Download openvla/modified_libero_rlds from Huggingface.

Modify and run:

bash ./scripts/collect_inernal_representation/openvla_libero/collect.sh
  1. Train CFN

Modify and run:

bash ./scripts/train_cfn/train_cfn_example.sh
  1. Eval in Libero

Modify and run:

bash ./scripts/eval/libero_openvla_taco/eval_libero_openvla_taco.sh

πŸ“ˆ Results

Performance on Simpler-WindowX Benchmark

Evaluation of success rates (%) on Simpler-WindowX tasks. We compare $\pi_0$ + TACO against the base policy and other state-of-the-art methods.

Method Spoon on Towel Carrot on Plate Stack Cubes Eggplant in Basket Average
RT-1-X 0.0% 4.2% 0.0% 0.0% 1.1%
Octo 12.5% 8.3% 0.0% 43.1% 16.0%
RoboVLM 29.2% 25.0% 12.5% 58.3% 31.3%
SpatialVLA 16.7% 25.0% 29.2% 100.0% 42.7%
$\pi_0$ (Base) 36.0% 42.0% 34.0% 80.0% 48.0%
$\pi_0$ + TACO (Ours) 52.0% 52.0% 30.0% 88.0% 55.5%

Performance on Libero-Long Benchmark

Our method consistently improves performance across both Flow-Matching ($\pi_{0.5}$) and Autoregressive (OpenVLA) architectures on long-horizon tasks.

Task $\pi_{0.5}$ (Base) $\pi_{0.5}$ + TACO OpenVLA (Base) OpenVLA + TACO
Soup and Sauce in Basket 98.0% 100.0% 60.0% 66.0%
Cheese and Butter in Basket 100.0% 96.0% 76.0% 82.0%
Turn on Stove and Place Moka 98.0% 98.0% 58.0% 52.0%
Black Bowl in Drawer 98.0% 100.0% 36.0% 50.0%
Mugs on Plates 98.0% 98.0% 32.0% 50.0%
Book in Caddy 100.0% 100.0% 82.0% 90.0%
Mug and Pudding on Plate 96.0% 92.0% 60.0% 54.0%
Soup and Cheese in Basket 94.0% 100.0% 70.0% 80.0%
Moka Pots on Stove 68.0% 86.0% 20.0% 28.0%
Mug in Microwave 98.0% 96.0% 46.0% 48.0%
Average 94.8% 96.6% 54.0% 60.0%

Performance on RoboTwin 1.0 Benchmark

Task $\pi_0$ (Base) $\pi_0$ + TACO Improvement
Block Handover 41.0% 62.0% +21.0%
Bottles Adjust 31.0% 40.0% +9.0%
Container Place 25.0% 40.0% +15.0%
Diverse Bottles Pick 21.0% 27.0% +6.0%
Dual Bottles Pick Easy 60.0% 70.0% +10.0%
Dual Bottles Pick Hard 48.0% 52.0% +4.0%
Pick Apple Messy 15.0% 19.0% +4.0%
Shoe Place 42.0% 50.0% +8.0%
Mug Hanging Easy 7.0% 12.0% +5.0%
Average 32.2% 41.3% +9.1%

Performance on RoboTwin 2.0 Benchmark

Evaluation on the RoboTwin 2.0 benchmark. We report the success rate improvement of TACO over the $\pi_{0.5}$ baseline. The table highlights the top 5 tasks with the most significant gains.

Task $\pi_{0.5}$ (Base) $\pi_{0.5}$ + TACO Improvement
Move Can Pot 42.0% 57.0% +15.0%
Handover Block 24.0% 36.0% +12.0%
Place Shoe 53.0% 65.0% +12.0%
Stamp Seal 26.0% 38.0% +12.0%
Beat Block Hammer 69.0% 79.0% +10.0%
... ... ... ...
Average (41 tasks) 59.3% 64.0% +4.7%

πŸ“ Project Structure

TACO/
β”œβ”€β”€ cfn/                         # Coin Flipping implementation
β”‚   β”œβ”€β”€ cfn_net.py               # CFN model architecture
β”‚   └── feature_dataset.py       # Dataset for training CFN
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ collect_internal_representation/  # Scripts to collect features
β”‚   β”œβ”€β”€ train_cfn/               # CFN training scripts
β”‚   └── eval/                    # Evaluation scripts
β”œβ”€β”€ third_party/
β”‚   β”œβ”€β”€ lerobot/                 # Pi0.5 (LeRobot) implementation & LeRobot-Libero evaluation
β”‚   β”œβ”€β”€ openvla/                 # OpenVLA implementation
β”‚   └── Robotwin/                # Robotwin environment
└── README.md

πŸŽ“ Citation

If you find TACO useful for your research, please cite our paper:

@article{yang2025taco,
  title={Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach},
  author={Siyuan Yang, Yang Zhang, Haoran He, Ling Pan, Xiu Li, Chenjia Bai, Xuelong Li},
  journal={arXiv preprint arXiv:2512.02834},
  year={2025}
}

πŸ“„ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • LeRobot for the Pi0.5 implementation
  • OpenVLA for the OpenVLA base model
  • Robotwin for the simulation environment
  • Libero for the benchmark tasks

πŸ“§ Contact

For questions or collaborations, please contact:

About

Official Implementation of "Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published