Using uv to manage virtual environment.
git clone https://github.com/ybpy/pistar.git
git submodule update --init --recursive
uv venv --python 3.11.9 /path/to/create/pistar/venv
source /path/to/your/pistar/venv/bin/activate
cd /path/to/pistar
GIT_LFS_SKIP_SMUDGE=1 uv sync --active
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
uv pip install -r pistar_requirements.txt
sudo -E apt update && sudo -E apt install screen -yPiStar is not a one-time training pipeline. In general, you first train an initial policy with demo data, then use the policy to generate new data through rollout, and then merge the demo and rollout data. The VLM value model learns the value function according to value_label, rewrites adv_ind for the rollout data, and finally continues training PiStar with the merged and labeled data.
Recommended order:
- Convert the demo data into the LeRobot schema used by PiStar.
- Configure the LIBERO simulation client environment for later evaluation and rollout.
- Train an initial PiStar checkpoint using the demo data.
- Start the policy server and use the initial checkpoint for evaluation / rollout.
- Merge the demo data and rollout data with
scripts/merge_datasets.py. - Train the VLM value model with the merged data.
- Run VLM inference and overwrite
adv_indin the rollout data. - Continue fine-tuning PiStar with the merged data whose
adv_indlabels have been completed.
| Field | Description |
|---|---|
image |
Main-view camera image. |
wrist_image |
Wrist camera image. |
state |
Robot state used as policy input. |
actions |
Action supervision target. |
intervention |
1 indicates human/demo/intervention frames, and 0 indicates autonomous policy rollout frames. |
value_label |
Training supervision for the VLM value model. |
reward |
Sparse success reward. In a successful episode, usually only the last frame is 1. |
reward_label |
Reward signal used when the VLM computes advantage. |
adv_ind |
PiStar's advantage condition, usually positive, negative, or none. |
scripts/merge_datasets.py only keeps the data fields above, as well as timestamp, frame_index, episode_index, index, and task_index. It is only a pure merging script: it does not fill missing fields, recompute labels, rescale images, convert image layout, or determine whether an episode is demo or rollout. If a source dataset is missing some fields, it needs to be re-converted or completed before merging.
You can download the raw LIBERO dataset (rlds format, filtering failure episodes) from here.
LIBERO demo data uses the PiStar-specific conversion script. This script fills in the fields required for training both the VLM and PiStar:
python examples/libero/pistar_rlds_demo_processing.py \
--data_dir /path/to/modified_libero_rlds \
--output_dir /path/to/lerobot_datasets \
--repo_name libero_demo_pistarFor demo data, the conversion script treats each trajectory as a successful expert trajectory by default:
- Every frame has
intervention = 1. - Every frame has
adv_ind = positive. value_labelis generated according to the successful-trajectory rule, with values in[-1, 0].reward_labelis-1 / Tfor non-terminal frames and0for the last frame.
If --output_dir is set, the output path is /path/to/lerobot_datasets/libero_demo_pistar. If it is not set, LeRobot writes the dataset under HF_LEROBOT_HOME.
It is recommended to create a separate virtual environment for simulation so that MuJoCo/LIBERO dependencies are isolated from PiStar training dependencies.
uv venv --python 3.10 /path/to/create/libero/venv
source /path/to/your/libero/venv/bin/activate
uv pip sync examples/libero/requirements.txt third_party/libero/requirements.txt \
--extra-index-url https://download.pytorch.org/whl/cu113 \
--index-strategy=unsafe-best-match
uv pip install -e packages/openpi-client
uv pip install -e third_party/libero
uv pip install --no-deps git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5
export PYTHONPATH=$PYTHONPATH:$PWD/third_party/liberoIf you want to make sure the dependencies are installed into this environment, append --python /path/to/your/libero/venv/bin/python to the uv pip install command.
Return to the PiStar environment and train the initial checkpoint with the converted demo data. The training config must point to the demo dataset. The default LIBERO config pi05_star_libero uses the data source configured in src/openpi/training/config.py, so before training you need to confirm that it points to the newly converted demo dataset.
First compute the normalization statistics:
source /path/to/your/pistar/venv/bin/activate
XLA_PYTHON_CLIENT_PREALLOCATE=false python scripts/compute_norm_stats.py --config-name pi05_star_liberoThen start training:
XLA_PYTHON_CLIENT_PREALLOCATE=true XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 python scripts/train.py pi05_star_libero --exp-name=demo_init --overwriteIf loading the pi05_base checkpoint from the cloud (gs://openpi-assets/checkpoints/pi05_base) fails, you can manually download the model weights to a local path before training:
pip install gsutil
mkdir -p /path/to/pi05_base
gsutil -m rsync -r \
gs://openpi-assets/checkpoints/pi05_base \
/path/to/pi05_baseAfter the download is complete, modify the weight_loader path of the corresponding TrainConfig in pistar/src/openpi/training/config.py to the local checkpoint path /path/to/pi05_base/params, then you can start training.
If you want to continue training an existing experiment, replace --overwrite with --resume:
XLA_PYTHON_CLIENT_PREALLOCATE=true XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 python scripts/train.py pi05_star_libero --exp-name=demo_init --resumeAfter the initial PiStar checkpoint is trained, start the server in the PiStar environment on the machine that loads the checkpoint:
python scripts/serve_policy.py policy:checkpoint \
--policy.config=pi05_star_libero_infer \
--policy.dir=checkpoints/pi05_star_libero/demo_init/10000--policy.config must match the infer config corresponding to the checkpoint. --policy.dir should point to the checkpoint directory of one specific step.
source /path/to/your/libero/venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$PWD/third_party/libero
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
python examples/libero/main.py \
--args.adv_ind_input positive \
--args.task_suite_name libero_10 \
--args.num_trials_per_task 5If EGL rendering reports an error, install the system dependencies first and try again:
sudo -E apt-get update
sudo -E apt-get install -y libegl1 libgl1 libglvnd0 libgles2 libdrm2 libgbm1If EGL still fails, use Xvfb + GLX:
export MUJOCO_GL=glx
xvfb-run -a python examples/libero/main.py \
--args.adv_ind_input positive \
--args.task_suite_name libero_10 \
--args.num_trials_per_task 5Use the same client script, but enable LeRobot rollout export:
source /path/to/your/libero/venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$PWD/third_party/libero
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
python examples/libero/main.py \
--args.adv_ind_input positive \
--args.task_suite_name libero_10 \
--args.num_trials_per_task 5 \
--args.save_lerobot_rollout true \
--args.rollout_output_dir /path/to/lerobot_datasets \
--args.rollout_repo_id libero_rollout_round1 \
--args.rollout_overwrite trueThere is one point that can easily be confusing here:
--args.adv_ind_input positiveis the condition fed into the PiStar policy during inference.- The saved rollout frames are initially written with
adv_ind = none. This is only a placeholder and needs to be overwritten aspositiveornegativelater byscripts/label_advantage_from_vlm.py.
The simulation rollout output fields are consistent with the demo data. However, because LIBERO rollout has no human intervention, intervention = 0. Successful episodes generate value_label / reward / reward_label according to the successful rule, while failed episodes generate these fields according to the failure rule.
After both demo and rollout data have been aligned to the PiStar LeRobot schema, merge them:
python scripts/merge_datasets.py \
--sources \
/path/to/lerobot_datasets/libero_demo_pistar \
/path/to/lerobot_datasets/libero_rollout_round1 \
--output /path/to/lerobot_datasets/libero_mixed_round1 \
--overwriteThe merged dataset will be used for VLM value training, VLM advantage labeling, and the next round of PiStar fine-tuning. In later multi-round iterations, you can merge the demo data, round 1 rollout, round 2 rollout, and any other data that needs to enter training together.
The VLM value model takes image observations and task text as input and outputs a value estimate for the current frame. During training, it uses value_label as supervision. During inference, it computes value for rollout frames, then combines it with reward_label to compute N-step advantage, and finally writes the result back to adv_ind.
The VLM base checkpoint weights are available from two sources: ybpy/vlm_ckpt · Hugging Face and Google Drive. You may download the checkpoint from either source as you like before running training or inference.
After downloading, place the base weights and tokenizer.model somewhere accessible from the training machine. During training, --load_pretrained loads the base weights; --tokenizer_path needs to point to the local Gemma tokenizer file.
python scripts/train_value.py \
--data_dir /path/to/lerobot_datasets/libero_mixed_round1 \
--checkpoint_dir checkpoints/value_model/libero_round1 \
--batch_size 32 \
--num_train_steps 10000 \
--save_interval 1000 \
--load_pretrained \
--tokenizer_path /path/to/gemma/tokenizer.modelThe training script reads value_label and internally maps it to the value target. If old data contains the misspelled value_lable, the script is still compatible with it; new data should consistently use value_label.
python scripts/label_advantage_from_vlm.py \
--data_dir /path/to/lerobot_datasets/libero_mixed_round1 \
--checkpoint_dir checkpoints/value_model/libero_round1 \
--lookahead 50 \
--top_percent 30 \After the VLM completes adv_ind labeling, return to the PiStar environment and continue training the next-round policy. The training config must point to the dataset that has already been merged and labeled with adv_ind, rather than the initial dataset that only contains demo data.
The real-robot scripts are under the control_your_robot directory. It is recommended to run them from this directory to avoid relative path and local import issues:
cd /path/to/pistar/control_your_robot
export PYTHONPATH=$PWD:$PWD/src:$PYTHONPATHUse software master-slave teleoperation to collect demos:
python example/collect/collect_lerobot_master_slave_teleop.pyBefore running, modify the configuration at the bottom of the script:
REPO_ID: Output LeRobot dataset name.OUTPUT_DIR: Parent directory of the output dataset.TASK_NAME: Task text instruction.MASTER_CANandSLAVE_CAN: CAN interfaces of the master arm and slave arm.FPS,NUM_EPISODES, reset joint positions, camera settings, and other hardware and collection parameters.
This script saves demo-style data. Because every frame is manually teleoperated, it should be treated as positive expert data later. The saved dataset can be merged with rollout data using scripts/merge_datasets.py.
If you need the policy to execute autonomously while allowing human intervention and saving rollout data, use the DAgger deployment script:
python example/deploy/piper_dagger_on_PI0.py \
--model-path /path/to/checkpoint/step_dir \
--task-name "put the white plug into the two-hole socket" \
--train-config pi05_star_white_plug_infer \
--repo-id white_plug_rollout_round1 \
--output-dir /path/to/lerobot_datasets \
--num-episode 50 \
--fps 10 \
--penalty-value -1.0 \
--adv-ind positiveIf you only want to run a trained checkpoint and do not need to collect DAgger rollout data, use the single-arm inference script:
python example/deploy/piper_single_on_PI0.py \
--model-path /path/to/checkpoint/step_dir \
--task-name "put the white plug into the two-hole socket" \
--train-config pi05_star_white_plug_infer \
--max-step 160 \
--num-episode 10 \
--adv-ind positiveFor a regular pi05 checkpoint, --adv-ind can be omitted. For a PiStar checkpoint, you need to pass in the condition expected by the training config; common values are positive or negative.
If you have any questions or encounter any issues, feel free to join our WeChat group for discussion.