Problem description
Using pi0.5 as the VLA, with absolute joint positions as the action space and an action chunk size of 16, GRPO-based RL training on the place_cans_plastic task is unstable: in the log below, success_once falls from 0.3 at global step 1 to 0.205 at global step 2.
Configuration YAML file
YAML config file:
defaults:
  - env/robotwin_place_cans_plasticbox@env.train
  - env/robotwin_place_cans_plasticbox@env.eval
  - model/pi0_5@actor.model
  - training_backend/fsdp@actor.fsdp_config
  - weight_syncer/patch_syncer@weight_syncer
  - override hydra/job_logging: stdout

hydra:
  run:
    dir: .
  output_subdir: null
  searchpath:
    - file://${oc.env:EMBODIED_PATH}/config/

cluster:
  num_nodes: 1
  component_placement:
    actor, env, rollout: 0,1,3,4,5,6

runner:
  task_type: embodied
  logger:
    log_path: "../results"
    project_name: rlinf
    experiment_name: "robotwin_grpo_openpi_pi05"
    logger_backends: ["tensorboard"] # wandb, swanlab
  max_epochs: 1000
  max_steps: -1
  only_eval: False
  val_check_interval: -1
  save_interval: 10
  resume_dir: null # Optional: path to a saved checkpoint directory, such as 'checkpoints/global_step_10'. If not None, it will be used to resume training.
  ckpt_path: null # Optional: path to a .pt checkpoint. If not None, it will be loaded after the model is instantiated (for evaluation).

algorithm:
  normalize_advantages: True
  kl_penalty: kl # how to estimate kl divergence: kl or kl_penalty
  group_size: 8
  reward_coef: 1.0
  rollout_epoch: 4
  eval_rollout_epoch: 1 # set eval_rollout_epoch > 0 when enable runner.only_eval or runner.val_check_interval > 0
  reward_type: chunk_level
  logprob_type: chunk_level
  entropy_type: token_level
  update_epoch: 5
  adv_type: grpo
  loss_type: actor
  loss_agg_func: "token-mean"
  kl_beta: 0.0
  entropy_bonus: 0
  clip_ratio_high: 0.2
  clip_ratio_low: 0.2
  clip_ratio_c: 3.0
  value_clip: 0.2
  huber_delta: 10.0
  gamma: 0.99
  gae_lambda: 0.95
  filter_rewards: True
  rewards_lower_bound: 0.1
  rewards_upper_bound: 0.9
  # params for generation
  sampling_params:
    do_sample: True
    temperature_train: 1.0
    temperature_eval: 0.6
    top_k: 50
    top_p: 1.0
    repetition_penalty: 1.0
    add_BOS: False
  # length argument for autoregressive sampling
  # max length means max amount of tokens to generate
  length_params:
    max_new_token: null
    max_length: 1024
    min_length: 1

env:
  group_name: "EnvGroup"
  enable_offload: True
  # Override the default values in env/robotwin_place_cans_plasticbox
  train:
    total_num_envs: 240
    reward_coef: ${algorithm.reward_coef}
    max_episode_steps: 320
    max_steps_per_rollout_epoch: 320
    group_size: ${algorithm.group_size}
    assets_path: "/data/zsq/RoboTwin"
    seeds_path: ${oc.env:REPO_PATH}/rlinf/envs/robotwin/seeds/train_seeds.json
    center_crop: False
    task_config:
      embodiment: [aloha-agilex]
      camera:
        collect_wrist_camera: true
      domain_randomization:
        random_background: false
        cluttered_table: false
        clean_background_rate: 1
        random_head_camera_dis: 0
        random_table_height: 0
        random_light: false
        crazy_random_light_rate: 0
  eval:
    total_num_envs: 240
    auto_reset: True
    ignore_terminations: True
    max_episode_steps: 320
    reward_coef: ${algorithm.reward_coef}
    max_steps_per_rollout_epoch: 320
    group_size: 1
    use_fixed_reset_state_ids: True
    is_eval: True
    assets_path: "/data/zsq/RoboTwin"
    seeds_path: ${oc.env:REPO_PATH}/rlinf/envs/robotwin/seeds/eval_seeds.json
    video_cfg:
      save_video: True
      video_base_dir: ${runner.logger.log_path}/video/eval
    center_crop: False
    task_config:
      embodiment: [aloha-agilex]
      camera:
        collect_wrist_camera: true
      domain_randomization:
        random_background: false
        cluttered_table: false
        clean_background_rate: 1
        random_head_camera_dis: 0
        random_table_height: 0
        random_light: false
        crazy_random_light_rate: 0

rollout:
  group_name: "RolloutGroup"
  backend: "huggingface"
  recompute_logprobs: False
  enable_offload: True
  pipeline_stage_num: 1
  model:
    model_path: ${actor.model.model_path}
    precision: ${actor.model.precision}

actor:
  group_name: "ActorGroup"
  training_backend: "fsdp"
  micro_batch_size: 40
  global_batch_size: 960 # 1024
  seed: 42
  enable_offload: False
  # Override the default values in model/openpi_pi05
  model:
    model_path: "/data/zsq/pi05_ckpt/robotwin_place_cans_plastic/20000_torch"
    num_action_chunks: 16 # interface for the env
    action_dim: 14
    # add_value_head: True
    num_steps: 5
    use_proprio: True
    openpi_data:
      adapt_to_pi: False
      extra_delta_transform: False
    openpi:
      config_name: pi05_robotwin
      num_images_in_input: 3
      action_chunk: ${actor.model.num_action_chunks}
      action_env_dim: ${actor.model.action_dim}
      num_steps: ${actor.model.num_steps}
      noise_method: "flow_sde"
      noise_level: 0.3
      # value_after_vlm: True
      # detach_critic_input: True
  optim:
    lr: 5.0e-06
    value_lr: 1.0e-04
    adam_beta1: 0.9
    adam_beta2: 0.95
    adam_eps: 1.0e-08
    weight_decay: 0.01
    clip_grad: 1.0
    critic_warmup_steps: 0
  # Override the default values in training_backend/fsdp
  fsdp_config:
    strategy: "fsdp"
    gradient_checkpointing: False # for openpi, gradient checkpointing is not supported, please do not change this value
    mixed_precision:
      param_dtype: ${actor.model.precision}
      reduce_dtype: ${actor.model.precision}
      buffer_dtype: ${actor.model.precision}

reward:
  use_reward_model: False

critic:
  use_critic_model: False
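For readers checking the config against the log, here is a small arithmetic sketch of the rollout shape these settings imply. The numbers are copied from the config above; the variable names are only illustrative, and the way RLinf itself slices these samples into training batches is not shown here.

```python
# Values copied from the config above.
total_num_envs = 240       # env.train.total_num_envs
group_size = 8             # algorithm.group_size
rollout_epoch = 4          # algorithm.rollout_epoch
max_episode_steps = 320    # env.train.max_episode_steps
num_action_chunks = 16     # actor.model.num_action_chunks

# Number of GRPO groups collected in one rollout epoch.
groups_per_epoch = total_num_envs // group_size

# Trajectories collected per global step.
trajectories_per_step = total_num_envs * rollout_epoch

# Policy calls per episode when each call emits one 16-step action chunk.
chunk_steps_per_episode = max_episode_steps // num_action_chunks

print(groups_per_epoch, trajectories_per_step, chunk_steps_per_episode)
```

The trajectory count agrees with num_trajectories=960 reported in the Environment section of the metric table.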
Log file
Log file:
Generating Rollout Epochs: 0%| | 0/4 [00:00<?, ?it/s]
Generating Rollout Epochs: 25%|██▌ | 1/4 [15:03<45:11, 903.68s/it]
Generating Rollout Epochs: 50%|█████ | 2/4 [30:32<30:37, 918.57s/it]
Generating Rollout Epochs: 75%|███████▌ | 3/4 [45:52<15:19, 919.35s/it]
Generating Rollout Epochs: 100%|██████████| 4/4 [1:06:31<00:00, 1045.42s/it]
Generating Rollout Epochs: 100%|██████████| 4/4 [1:06:31<00:00, 997.91s/it]
├──────────────────────────────────────────────────── Metric Table ────────────────────────────────────────────────────┤
│ Global Step: 1/1000 │ Progress: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 0.1% │
│ Elapsed: 01:15:04 │ ETA: 1250:03:29 │ Step Time: 4504.714s │
├──────────────────────────────────────────────────────── Time ────────────────────────────────────────────────────────┤
│ │
│actor/run_training=628.6 │cal_adv_and_returns=0.0054 │env/compute_bootstrap_rewards=0.0034 │
│env/env_interact_step=3529.1 │env/interact=3852.6 │env/recv_rollout_results=139.4 │
│env/run_interact_once=3852.6 │generate_rollouts=3861.6 │rollout/generate_one_epoch=3848.5 │
│rollout/predict=124.3 │step=4504.7 │sync_weights=14.564 │
│ │
├──────────────────────────────────────────────────── Environment ─────────────────────────────────────────────────────┤
│ │
│episode_len=320.0 │num_trajectories=960 │return=0.3 │
│reward=0.0009375 │success_once=0.3 │ │
│ │
├────────────────────────────────────────────────────── Rollout ───────────────────────────────────────────────────────┤
│ │
│advantages_max=2.475 │advantages_mean=-0.067 │advantages_min=-2.475 │
│rewards=9.38e-04 │ │ │
│ │
├─────────────────────────────────────────────────── Training/Actor ───────────────────────────────────────────────────┤
│ │
│actor/approx_kl=0.032 │actor/clip_fraction=0.141 │actor/clipped_ratio=0.989 │
│actor/dual_cliped_ratio=0.0000 │actor/entropy_loss=0.0000 │actor/grad_norm=10.714 │
│actor/lr=5.00e-06 │actor/policy_loss=-0.0056 │actor/policy_loss_abs=0.539 │
│actor/ratio=0.995 │actor/ratio_abs=0.146 │actor/total_loss=-0.0014 │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
├──────────────────────────────────────────────────── Metric Table ────────────────────────────────────────────────────┤
│ Global Step: 2/1000 │ Progress: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 0.2% │
│ Elapsed: 02:34:05 │ ETA: 1281:30:50 │ Step Time: 4622.696s │
├──────────────────────────────────────────────────────── Time ────────────────────────────────────────────────────────┤
│ │
│actor/run_training=629.4 │cal_adv_and_returns=0.0091 │env/compute_bootstrap_rewards=0.0033 │
│env/env_interact_step=3507.8 │env/interact=4098.7 │env/recv_rollout_results=132.9 │
│env/run_interact_once=4098.7 │generate_rollouts=4103.0 │rollout/generate_one_epoch=4093.4 │
│rollout/predict=124.0 │step=4740.7 │sync_weights=8.167 │
│ │
├──────────────────────────────────────────────────── Environment ─────────────────────────────────────────────────────┤
│ │
│episode_len=320.0 │num_trajectories=960 │return=0.20520833 │
│reward=0.00064127607 │success_once=0.20520833 │ │
│ │
├────────────────────────────────────────────────────── Rollout ───────────────────────────────────────────────────────┤
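As a sanity check on the Rollout metrics, the advantages_max=2.475 and advantages_min=-2.475 reported at global step 1 are consistent with group-normalized GRPO advantages for binary rewards in a group of 8. The sketch below is a minimal illustration, not RLinf's implementation: the function name, the eps value, and the use of the sample (n - 1) standard deviation are assumptions chosen because they reproduce the logged extremes.

```python
import math


def group_normalized_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Sample (n - 1) standard deviation; this choice is an assumption.
    var = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# One success in a group of 8 gives the largest positive advantage.
one_success = group_normalized_advantages([1, 0, 0, 0, 0, 0, 0, 0])
# One failure in a group of 8 gives the largest negative advantage.
seven_successes = group_normalized_advantages([0, 1, 1, 1, 1, 1, 1, 1])

print(round(max(one_success), 3))      # 2.475
print(round(min(seven_successes), 3))  # -2.475
```

Under these assumptions the extremes come from groups with a single success or a single failure, so the advantage range in the log reflects the group size rather than any drift in the policy.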
Environment
python -V: Python 3.11.14
uv pip list:
Package Version
absl-py 2.4.0
accelerate 1.13.0
addict 2.4.0
aiohappyeyeballs 2.6.2
aiohttp 3.14.0
aiohttp-cors 0.8.1
aiosignal 1.4.0
annotated-types 0.7.0
anyio 4.13.0
argcomplete 3.6.3
array-record 0.8.3
asttokens 3.0.1
astunparse 1.6.3
attrs 26.1.0
augmax 0.4.1
av 17.1.0
bddl 3.6.0
beartype 0.19.0
beautifulsoup4 4.14.3
blinker 1.9.0
boto 2.49.0
cachebox 5.2.3
cachetools 5.5.2
certifi 2026.5.20
cffi 2.0.0
cfgv 3.5.0
charset-normalizer 3.4.7
chex 0.1.90
click 8.4.1
cloudpickle 3.1.2
cmake 4.3.2
colorful 0.6.0a1
colorlog 6.10.1
comm 0.2.3
configargparse 1.7.5
contourpy 1.3.3
crcmod 1.7
cryptography 46.0.7
cycler 0.12.1
dash 4.3.0rc0
datasets 3.6.0
debugpy 1.8.21
decorator 5.3.1
deepdiff 9.1.0
diffusers 0.38.0
dill 0.3.8
distlib 0.4.1
distro 1.9.0
dm-control 1.0.41
dm-env 1.6
dm-tree 0.1.10
docstring-parser 0.18.0
donfig 0.8.1.post1
draccus 0.10.0
easydict 1.13
einops 0.8.2
embreex 4.4.0
equinox 0.13.8
etils 1.14.0
evdev 1.9.3
executing 2.2.1
farama-notifications 0.0.6
fasteners 0.20
fastjsonschema 2.21.2
filelock 3.29.1
flash-attn 2.7.4.post1
flask 3.1.3
flatbuffers 25.12.19
flax 0.10.2
fonttools 4.63.0
frozenlist 1.8.0
fsspec 2025.3.0
ftfy 6.3.1
future 1.0.0
gast 0.7.0
gcs-oauth2-boto-plugin 3.3
gcsfs 2025.3.0
gdown 6.1.0
gitdb 4.0.12
gitpython 3.1.50
glfw 2.10.0
google-api-core 2.31.0
google-apitools 0.5.35
google-auth 2.39.0
google-auth-httplib2 0.4.0
google-auth-oauthlib 1.4.0
google-cloud-core 2.6.0
google-cloud-storage 3.11.0
google-crc32c 1.8.0
google-pasta 0.2.0
google-reauth 0.1.1
google-resumable-media 2.10.0
googleapis-common-protos 1.75.0
grpcio 1.81.0
gsutil 5.37
gym 0.26.2
gym-aloha 0.1.3
gym-notices 0.1.0
gymnasium 0.29.1
h11 0.16.0
h5py 3.14.0
hf-transfer 0.1.9
hf-xet 1.5.1.dev1
httpcore 1.0.9
httplib2 0.20.4
httpx 0.28.1
httpx-sse 0.4.3
huggingface-hub 0.36.2
humanize 4.15.0
hydra-core 1.4.0.dev1
icmplib 3.0.4
identify 2.6.19
idna 3.18
imageio 2.37.3
imageio-ffmpeg 0.6.0
immutabledict 4.3.1
importlib-metadata 9.0.0
importlib-resources 7.1.0
iniconfig 2.3.0
inquirerpy 0.3.4
ipython 9.14.1
ipython-pygments-lexers 1.1.1
ipywidgets 8.1.8
itsdangerous 2.2.0
janus 2.0.0
jax 0.5.3
jax-cuda12-pjrt 0.5.3
jax-cuda12-plugin 0.5.3
jaxlib 0.5.3
jaxtyping 0.2.36
jedi 0.20.0
jinja2 3.1.6
jiter 0.15.0
joblib 1.5.3
jsonlines 4.0.0
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
jupyter-core 5.9.1
jupyterlab-widgets 3.0.16
jupytext 1.19.3
keras 3.14.1
kiwisolver 1.5.0
labmaze 1.0.6
lerobot 0.1.0
libclang 18.1.1
liger-kernel 0.8.0
llvmlite 0.48.0rc1
lxml 7.0.0a1
manifold3d 3.5.1
mapbox-earcut 2.0.0
markdown 3.10.2
markdown-it-py 4.2.0
markupsafe 3.0.3
matplotlib 3.11.0rc2
matplotlib-inline 0.2.2
mcp 1.27.2
mdit-py-plugins 0.6.1
mdurl 0.1.2
mergedeep 1.3.4
ml-collections 1.0.0
ml-dtypes 0.5.4
modelscope 1.37.1
monotonic 1.6
mplib 0.2.1
mpmath 1.3.0
msgpack 1.2.0rc1
msgspec 0.21.1
mujoco 3.9.0
multidict 6.7.1
multiprocess 0.70.16
mypy-extensions 1.1.0
namex 0.1.0
narwhals 2.22.1
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.6.1
ninja 1.13.0
nltk 3.9.4
nodeenv 1.10.0
numba 0.66.0rc1
numcodecs 0.16.5
numpy 1.26.4
numpy-quaternion 2024.0.13
numpydantic 1.8.1
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvcc-cu12 12.9.86
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-curobo 0.0.0
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-libnvcomp-cu12 5.2.0.13
nvidia-ml-py 13.595.45
nvidia-nccl-cu12 2.21.5
nvidia-nvcomp-cu12 5.2.0.13
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
nvitop 1.7.0
oauth2client 4.1.3
oauthlib 3.3.1
omegaconf 2.4.0.dev11
open3d 0.19.0
openai 2.41.0
opencensus 0.11.4
opencensus-context 0.2.dev0
opencv-python 4.11.0.86
opencv-python-headless 4.11.0.86
openexr 3.4.12
openpi 0.1.0
openpi-client 0.1.0
opentelemetry-api 1.42.1
opentelemetry-exporter-prometheus 0.63b1
opentelemetry-proto 1.42.1
opentelemetry-sdk 1.42.1
opentelemetry-semantic-conventions 0.63b1
opt-einsum 3.4.0
optax 0.2.8
optree 0.19.1
orbax-checkpoint 0.11.13
orderly-set 5.5.0
orjson 3.11.9
packaging 26.2
pandas 3.0.3
parso 0.8.7
peft 0.19.1
pexpect 4.9.0
pfzy 0.3.4
pillow 12.2.0
pip 26.1.2
platformdirs 4.10.0
plotly 6.8.0
pluggy 1.6.0
polars 1.41.2
polars-runtime-32 1.41.2
pre-commit 4.6.0
prettytable 3.17.0
prometheus-client 0.25.0
promise 2.3
prompt-toolkit 3.0.52
propcache 0.5.2
proto-plus 1.28.0
protobuf 6.33.6
psutil 7.2.2
ptyprocess 0.7.0
pure-eval 0.2.3
py-spy 0.4.2
pyarrow 24.0.0
pyasn1 0.6.3
pyasn1-modules 0.4.2
pybind11 3.0.4
pycollada 0.9.3
pycparser 3.0
pydantic 2.14.0a1
pydantic-core 2.47.0
pydantic-settings 2.14.1
pyecharts 2.1.0
pygments 2.20.0
pyjwt 2.13.0
pymunk 7.2.0
pynput 1.8.2
pyopengl 3.1.10
pyopenssl 26.0.0
pyparsing 3.3.2
pyperclip 1.11.0
pyquaternion 0.9.9
pyrealsense2 2.58.1.10581
pysocks 1.7.1
pytest 9.0.3
python-dateutil 2.9.0.post0
python-discovery 1.4.0
python-dotenv 1.2.2
python-multipart 0.0.32
python-xlib 0.33
pyu2f 0.1.5
pyyaml 6.0.3
pyyaml-include 1.4.1
pyzmq 27.1.0
ray 2.55.1
referencing 0.37.0
regex 2026.5.9
requests 2.34.2
requests-oauthlib 2.0.0
rerun-sdk 0.23.1
retry-decorator 2.0a1
retrying 1.4.2
rich 14.3.4
robosuite 1.4.1
rpds-py 2026.5.1
rsa 4.7.2
rtree 1.4.1
ruff 0.15.16
safetensors 0.8.0rc1
sapien 3.0.1
scikit-learn 1.9.0
scipy 1.17.1
sentencepiece 0.2.1
sentry-sdk 3.0.0a7
setuptools 75.8.2
setuptools-scm 10.0.5
shapely 2.1.2
simple-parsing 0.1.8
simplejson 4.1.1
six 1.17.0
smart-open 7.6.1
smmap 5.0.3
sniffio 1.3.1
soupsieve 2.8.4
sse-starlette 3.4.4
stack-data 0.6.3
starlette 1.2.1
svg-path 7.0
svgwrite 1.4.3
swanlab 0.8.0rc4
sympy 1.13.1
tensorboard 2.20.0
tensorboard-data-server 0.7.2
tensorflow 2.21.0
tensorflow-addons 0.23.0
tensorflow-datasets 4.9.10
tensorflow-graphics 2021.12.3
tensorflow-metadata 1.17.3
tensorstore 0.1.84
termcolor 3.3.0
threadpoolctl 3.6.0
timm 1.0.27
tokenizers 0.21.4
toml 0.10.2
toolz 1.1.0
toppra 0.6.3
torch 2.6.0
torchcodec 0.2.0
torchdata 0.11.0
torchvision 0.21.0
tqdm 4.67.3
tqdm-loggable 0.4.1
traitlets 5.15.1
transformers 4.53.2
transforms3d 0.4.2
tree 0.2.4
treescope 0.1.10
trimesh 4.12.2
triton 3.2.0
typeguard 4.5.2
typing-extensions 4.15.0
typing-inspect 0.9.0
typing-inspection 0.4.2
tyro 1.0.13
urllib3 2.7.0
uv 0.11.19
uvicorn 0.49.0
vcs-versioning 1.1.1
vhacdx 0.0.10
virtualenv 21.4.2
viser 1.0.30
wadler-lindig 0.1.7
wandb 0.25.0
warp-lang 1.11.1
watchdog 6.0.0
wcwidth 0.7.0
websockets 16.0
werkzeug 3.1.8
wheel 0.47.0
widgetsnbextension 4.0.15
wrapt 2.2.1
xxhash 3.7.0
yarl 1.24.2
yourdfpy 0.0.60
zarr 3.1.5
zipp 4.1.0
zstandard 0.25.0
nvidia-smi:
Fri Jun 12 11:51:56 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:10:00.0 Off | 0 |
| N/A 54C P0 98W / 400W | 67108MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:16:00.0 Off | 0 |
| N/A 47C P0 93W / 400W | 67417MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:2F:00.0 Off | 0 |
| N/A 63C P0 356W / 400W | 16538MiB / 81920MiB | 67% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:33:00.0 Off | 0 |
| N/A 54C P0 120W / 400W | 67353MiB / 81920MiB | 94% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:8A:00.0 Off | 0 |
| N/A 50C P0 94W / 400W | 67929MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:8F:00.0 Off | 0 |
| N/A 56C P0 144W / 400W | 67353MiB / 81920MiB | 93% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:C6:00.0 Off | 0 |
| N/A 53C P0 200W / 400W | 68109MiB / 81920MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:CA:00.0 Off | 0 |
| N/A 45C P0 70W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
RLinf version: N/A
Docker image tag: N/A
Other reproduction info
The approx_kl and clip_fraction remain relatively low (0.032 and 0.141 at global step 1), suggesting that the policy updates may be too conservative.
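To make these two diagnostics concrete, here is a minimal sketch of how approx_kl and clip_fraction are commonly computed in PPO-style trainers from old and new log-probabilities. It is an illustration under stated assumptions, not RLinf's code: the function name is hypothetical, the KL estimate uses the common `(ratio - 1) - log(ratio)` estimator, and the clip fraction simply counts ratios outside `[1 - clip_low, 1 + clip_high]`.

```python
import math


def ppo_ratio_diagnostics(old_logprobs, new_logprobs, clip_low=0.2, clip_high=0.2):
    """Return (approx_kl, clip_fraction) for paired log-probabilities."""
    log_ratios = [new - old for old, new in zip(old_logprobs, new_logprobs)]
    ratios = [math.exp(lr) for lr in log_ratios]
    # Non-negative KL estimate: mean of (ratio - 1) - log(ratio).
    approx_kl = sum((r - 1.0) - lr for r, lr in zip(ratios, log_ratios)) / len(ratios)
    # Fraction of samples whose ratio falls outside the clipping range.
    clipped = [r < 1.0 - clip_low or r > 1.0 + clip_high for r in ratios]
    clip_fraction = sum(clipped) / len(ratios)
    return approx_kl, clip_fraction


# Toy log-probabilities, not taken from the run above.
old = [-1.0, -1.0, -1.0, -1.0]
new = [-1.0, -0.7, -1.3, -0.95]
approx_kl, clip_fraction = ppo_ratio_diagnostics(old, new)
print(approx_kl, clip_fraction)
```

With clip_ratio_low and clip_ratio_high both at 0.2 as in the config, a clip_fraction of 0.141 would mean roughly one sample in seven has a ratio outside [0.8, 1.2], if the trainer uses this definition.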