TACO: Test-time Anti-exploration via pseudo-COunts
TACO (Test-time Anti-exploration via pseudo-COunts) is a novel test-time scaling framework for vision-language-action models (VLAs). It retains the strong generalization of pretrained VLAs while constraining their outputs to the success modes of a specific downstream task, applying the anti-exploration principle from offline RL. Using a lightweight Coin Flipping Network (CFN), TACO measures distributional shift accurately with minimal computational overhead and significantly improves performance on out-of-distribution test cases.
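The pseudo-count idea behind a Coin Flipping Network can be illustrated without any neural network. The sketch below is not the code in `cfn/`; it only simulates the regression target, assuming the standard coin-flip construction: every visit to a state is paired with a fresh random vector of ±1 flips, and a regressor trained with squared error converges to the per-coordinate mean of those flips, whose expected square is 1/n for n visits. The function name `coin_flip_pseudo_count` and its parameters are illustrative.

```python
import random


def coin_flip_pseudo_count(n_visits, dim=2048, seed=0):
    """Estimate a visit count from averaged random coin flips."""
    rng = random.Random(seed)
    # Each visit draws a fresh vector of +1/-1 coin flips. A regressor trained
    # with squared error on these targets converges to their per-coordinate mean.
    sums = [0] * dim
    for _ in range(n_visits):
        for j in range(dim):
            sums[j] += rng.choice((-1, 1))
    means = [s / n_visits for s in sums]
    # E[mean_j ** 2] = 1 / n_visits, so the mean squared output estimates 1 / n.
    inv_count = sum(m * m for m in means) / dim
    return 1.0 / inv_count


for n in (1, 10, 100):
    # The estimate is exact for n = 1 and approximate for larger counts.
    print(n, round(coin_flip_pseudo_count(n), 1))
```

Inputs that resemble frequently seen training data receive a large pseudo-count, while rarely seen or out-of-support inputs receive a small one, which is what makes the quantity usable as a verifier score.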
- Principled Anti-Exploration: Mitigates inference-time fragility by constraining generated actions to the "success modes" of the downstream task, effectively handling distribution shifts.
- Universal Compatibility: Seamlessly integrates with Flow-Matching (e.g., $\pi_0$, $\pi_{0.5}$), Diffusion (e.g., RDT), and Autoregressive (e.g., OpenVLA) architectures.
- Gradient-Free Steering: Performs Test-Time Scaling (TTS) via a generate-then-verify pipeline without modifying the heavy VLA backbone parameters.
- Efficient Inference: Implements KV Cache Optimization to reuse visual-language representations during parallel sampling, reducing inference latency by ~73% relative to the original (uncached) implementation.
- High-Fidelity Verification: Utilizes a lightweight Coin Flipping Network (CFN) trained on internal representations with High-Fidelity Feature Search to accurately estimate action reliability.
- New Perspective on VLA Instability: We diagnose the inference fragility of generative VLAs as an out-of-support problem and propose TACO, the first framework to address this via the Anti-Exploration principle from Offline RL using Test-Time Scaling.
- Coupled Pseudo-Count Estimator: We introduce an efficient internal representation mechanism coupled with a High-Fidelity Feature Search strategy. This allows the CFN to accurately verify action chunks for denoising-based policies (Flow/Diffusion) that never see clean actions during training.
- SOTA Performance: Extensive experiments across simulation benchmarks (RoboTwin, LIBERO, SimplerEnv) and real-world dual-arm manipulation demonstrate that TACO significantly boosts success rates (e.g., +16% in real-world tasks) over strong baselines like $\pi_0$.
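The generate-then-verify pipeline listed above can be sketched as a best-of-N selection. This is a minimal illustration under stated assumptions, not the repository's implementation: `encode_context`, `sample_chunk`, and `pseudo_count` are hypothetical callables standing in for the VLA's vision-language prefix, its action head, and the CFN score.

```python
import math
import random


def select_action_chunk(observation, encode_context, sample_chunk,
                        pseudo_count, num_samples=8):
    """Sample several action chunks and return the one with the highest score."""
    # Encode the observation once and reuse the result for every candidate,
    # in the spirit of the KV-cache reuse listed above (hypothetical interface).
    context = encode_context(observation)
    candidates = [sample_chunk(context) for _ in range(num_samples)]
    scores = [pseudo_count(context, chunk) for chunk in candidates]
    best = max(range(num_samples), key=scores.__getitem__)
    return candidates[best], scores[best]


# Toy stand-ins: a 1-D "action chunk" and a score that peaks at 0.5,
# playing the role of a success mode.
rng = random.Random(0)
chunk, value = select_action_chunk(
    "image + instruction",
    encode_context=lambda obs: obs,
    sample_chunk=lambda ctx: rng.uniform(-1.0, 1.0),
    pseudo_count=lambda ctx, a: math.exp(-((a - 0.5) ** 2)),
)
print(chunk, value)
```

Because selection only ranks sampled candidates, no gradient flows through the backbone and its parameters stay untouched.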
- [2025-12] Released TACO code and models. See our Hugging Face collections.
Create a conda env:
conda create -n taco python=3.10 -y
conda activate taco
Install torch (choose the version that suits your environment):
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
Install CFN:
cd cfn/
pip install -e .
cd ..
cd third_party/lerobot
pip install -e .
cd src/transformers
pip install -e .
cd ../../../..
cd ./third_party/Robotwin
Please follow the Robotwin documentation to install this local copy of Robotwin and its requirements.
cd ./third_party/lerobot
pip install -e ".[libero]"
Create a new conda environment and refer to third_party/openvla to install Libero and our modified OpenVLA.
We provide pre-trained base policies and CFN checkpoints on 🤗 Hugging Face, allowing you to directly evaluate TACO without training.
# Download base policy checkpoints & CFN checkpoints
# The CFN state dict is saved in the `cfns` sub-directory of the repo
hf download rhodes-team-teleai/pi05_TACO_libero_finetuned --local-dir /path/to/your/dir --max-workers 16
- Preparation
Collect the Robotwin task dataset. We provide a pipeline as an example:
bash ./scripts/robotwin_data/task_dataset_collection.sh
bash ./scripts/robotwin_data/data_trans/rt2-hdf5_2_hdf5_2_lerobot.sh
bash ./scripts/robotwin_data/data_trans/v21_to_v30.sh
bash ./scripts/robotwin_data/data_trans/make_sure_stats.sh
This produces a LeRobot v3.0 dataset at repo-id=RoboTwin2/demo_clean/${task}_v30. Please refer to the Robotwin documentation first if you have any questions.
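If you prefer to drive the four conversion steps from Python, a small wrapper along these lines may help. It is a sketch that assumes the scripts are run from the repository root and take no arguments, as in the commands above; `run_pipeline` and `dataset_repo_id` are hypothetical helper names, not part of the repository.

```python
import subprocess

# Script paths copied from the commands above, in execution order.
STEPS = [
    "./scripts/robotwin_data/task_dataset_collection.sh",
    "./scripts/robotwin_data/data_trans/rt2-hdf5_2_hdf5_2_lerobot.sh",
    "./scripts/robotwin_data/data_trans/v21_to_v30.sh",
    "./scripts/robotwin_data/data_trans/make_sure_stats.sh",
]


def run_pipeline(steps=STEPS, run=subprocess.run):
    """Run each script in order; check=True raises on the first non-zero exit."""
    for script in steps:
        run(["bash", script], check=True)


def dataset_repo_id(task):
    """Repo id of the converted LeRobot v3.0 dataset for one task."""
    return f"RoboTwin2/demo_clean/{task}_v30"
```

Calling `run_pipeline()` stops at the first failing step, so a later conversion never runs on incomplete output.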
You can fine-tune your own pi0.5:
bash ./third_party/lerobot/scripts/train_pi05.sh
Or just use our trained pi0.5 model on Hugging Face.
- Collect internal representation
Modify and run:
bash ./scripts/collect_inernal_representation/pi05_robotwin2/collect.sh
- Train CFN
Modify and run:
bash ./scripts/train_cfn/train_cfn_example.sh
- Eval TACO
Modify and run:
bash ./scripts/eval/eval_robotwin2_torch_pi05_taco.sh
- Collect internal representation
Run:
bash ./scripts/collect_inernal_representation/pi05_libero/collect.sh
- Train CFN
Modify and run:
bash ./scripts/train_cfn/train_cfn_example.sh
- Eval TACO
Modify and run:
bash ./scripts/eval/eval_libero_pi05_taco.sh
- Collect internal representation
Download openvla/modified_libero_rlds from Hugging Face.
Modify and run:
bash ./scripts/collect_inernal_representation/openvla_libero/collect.sh
- Train CFN
Modify and run:
bash ./scripts/train_cfn/train_cfn_example.sh
- Eval in Libero
Modify and run:
bash ./scripts/eval/libero_openvla_taco/eval_libero_openvla_taco.sh
Evaluation of success rates (%) on Simpler-WindowX tasks. We compare
| Method | Spoon on Towel | Carrot on Plate | Stack Cubes | Eggplant in Basket | Average |
|---|---|---|---|---|---|
| RT-1-X | 0.0% | 4.2% | 0.0% | 0.0% | 1.1% |
| Octo | 12.5% | 8.3% | 0.0% | 43.1% | 16.0% |
| RoboVLM | 29.2% | 25.0% | 12.5% | 58.3% | 31.3% |
| SpatialVLA | 16.7% | 25.0% | 29.2% | 100.0% | 42.7% |
| | 36.0% | 42.0% | 34.0% | 80.0% | 48.0% |
| | 52.0% | 52.0% | 30.0% | 88.0% | 55.5% |
Our method consistently improves performance across both Flow-Matching (
| Task | | | OpenVLA (Base) | OpenVLA + TACO |
|---|---|---|---|---|
| Soup and Sauce in Basket | 98.0% | 100.0% | 60.0% | 66.0% |
| Cheese and Butter in Basket | 100.0% | 96.0% | 76.0% | 82.0% |
| Turn on Stove and Place Moka | 98.0% | 98.0% | 58.0% | 52.0% |
| Black Bowl in Drawer | 98.0% | 100.0% | 36.0% | 50.0% |
| Mugs on Plates | 98.0% | 98.0% | 32.0% | 50.0% |
| Book in Caddy | 100.0% | 100.0% | 82.0% | 90.0% |
| Mug and Pudding on Plate | 96.0% | 92.0% | 60.0% | 54.0% |
| Soup and Cheese in Basket | 94.0% | 100.0% | 70.0% | 80.0% |
| Moka Pots on Stove | 68.0% | 86.0% | 20.0% | 28.0% |
| Mug in Microwave | 98.0% | 96.0% | 46.0% | 48.0% |
| Average | 94.8% | 96.6% | 54.0% | 60.0% |
| Task | | | Improvement |
|---|---|---|---|
| Block Handover | 41.0% | 62.0% | +21.0% |
| Bottles Adjust | 31.0% | 40.0% | +9.0% |
| Container Place | 25.0% | 40.0% | +15.0% |
| Diverse Bottles Pick | 21.0% | 27.0% | +6.0% |
| Dual Bottles Pick Easy | 60.0% | 70.0% | +10.0% |
| Dual Bottles Pick Hard | 48.0% | 52.0% | +4.0% |
| Pick Apple Messy | 15.0% | 19.0% | +4.0% |
| Shoe Place | 42.0% | 50.0% | +8.0% |
| Mug Hanging Easy | 7.0% | 12.0% | +5.0% |
| Average | 32.2% | 41.3% | +9.1% |
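The Average row can be recomputed from the per-task rows as a quick consistency check. The snippet assumes the two result columns are the base policy and the policy with TACO, in that order, which is how the Improvement column reads; the variable names are illustrative.

```python
# (base, with TACO) success rates in percent, copied from the table above.
results = {
    "Block Handover": (41.0, 62.0),
    "Bottles Adjust": (31.0, 40.0),
    "Container Place": (25.0, 40.0),
    "Diverse Bottles Pick": (21.0, 27.0),
    "Dual Bottles Pick Easy": (60.0, 70.0),
    "Dual Bottles Pick Hard": (48.0, 52.0),
    "Pick Apple Messy": (15.0, 19.0),
    "Shoe Place": (42.0, 50.0),
    "Mug Hanging Easy": (7.0, 12.0),
}

base_avg = sum(base for base, _ in results.values()) / len(results)
taco_avg = sum(taco for _, taco in results.values()) / len(results)

# Prints: 32.2 41.3 +9.1
print(f"{base_avg:.1f} {taco_avg:.1f} {taco_avg - base_avg:+.1f}")
```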
Evaluation on the RoboTwin 2.0 benchmark. We report the success rate improvement of TACO over the
| Task | | | Improvement |
|---|---|---|---|
| Move Can Pot | 42.0% | 57.0% | +15.0% |
| Handover Block | 24.0% | 36.0% | +12.0% |
| Place Shoe | 53.0% | 65.0% | +12.0% |
| Stamp Seal | 26.0% | 38.0% | +12.0% |
| Beat Block Hammer | 69.0% | 79.0% | +10.0% |
| ... | ... | ... | ... |
| Average (41 tasks) | 59.3% | 64.0% | +4.7% |
TACO/
├── cfn/ # Coin Flipping Network (CFN) implementation
│   ├── cfn_net.py # CFN model architecture
│   └── feature_dataset.py # Dataset for training CFN
├── scripts/
│   ├── collect_internal_representation/ # Scripts to collect features
│   ├── train_cfn/ # CFN training scripts
│   └── eval/ # Evaluation scripts
├── third_party/
│   ├── lerobot/ # Pi0.5 (LeRobot) implementation & LeRobot-Libero evaluation
│   ├── openvla/ # OpenVLA implementation
│   └── Robotwin/ # Robotwin environment
└── README.md
If you find TACO useful for your research, please cite our paper:
@article{yang2025taco,
title={Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach},
author={Siyuan Yang and Yang Zhang and Haoran He and Ling Pan and Xiu Li and Chenjia Bai and Xuelong Li},
journal={arXiv preprint arXiv:2512.02834},
year={2025}
}
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
- LeRobot for the Pi0.5 implementation
- OpenVLA for the OpenVLA base model
- Robotwin for the simulation environment
- Libero for the benchmark tasks
For questions or collaborations, please contact:
- Email: breezeyoung9470@gmail.com
- GitHub Issues: Open an issue