[Troubleshoot]: RL training with pi0.5 and GRPO in RoboTwin is unstable. #1268

### Problem description

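Using pi0.5 as the VLA policy, with absolute joint positions as the action space and an action chunk size of 16, GRPO-based RL training in the `place_cans_plastic` scenario is unstable. In the log below, `success_once` drops from 0.3 at global step 1 to about 0.205 at global step 2.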
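
For context, here is a minimal sketch of the group-relative advantage that GRPO is generally described as using, applied to a binary success reward with `group_size: 8`. The function name `grpo_advantages`, the `eps` term, and the use of the sample (Bessel-corrected) standard deviation are assumptions for illustration, not RLinf's confirmed implementation.

```python
import statistics


def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps) within one group."""
    mean = statistics.fmean(group_rewards)
    # Sample standard deviation (ddof=1). Whether RLinf uses this or the
    # population standard deviation is an assumption here.
    std = statistics.stdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


# One success among the 8 rollouts of a group (group_size: 8 in the config).
one_success = grpo_advantages([1.0] + [0.0] * 7)
print(round(max(one_success), 3))

# A group where every rollout fails carries no learning signal.
all_fail = grpo_advantages([0.0] * 8)
print(all_fail)
```

Under these assumptions, a group with exactly one success gives a maximum advantage of about 2.475, which coincides with the `advantages_max=2.475` and `advantages_min=-2.475` values in the Rollout section of the log further down. That match is suggestive only; it does not confirm the exact formula the framework uses.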

### Configuration YAML file

YAML config file:

```yaml
defaults:
  - env/robotwin_place_cans_plasticbox@env.train
  - env/robotwin_place_cans_plasticbox@env.eval
  - model/pi0_5@actor.model
  - training_backend/fsdp@actor.fsdp_config
  - weight_syncer/patch_syncer@weight_syncer
  - override hydra/job_logging: stdout

hydra:
  run:
    dir: .
  output_subdir: null
  searchpath:
    - file://${oc.env:EMBODIED_PATH}/config/

cluster:
  num_nodes: 1
  component_placement:
    actor, env, rollout: 0,1,3,4,5,6

runner:
  task_type: embodied
  logger:
    log_path: "../results"
    project_name: rlinf
    experiment_name: "robotwin_grpo_openpi_pi05"
    logger_backends: ["tensorboard"] # wandb, swanlab

  max_epochs: 1000
  max_steps: -1

  only_eval: False
  val_check_interval: -1
  save_interval: 10

  resume_dir: null # Optional: path to a saved checkpoint directory, such as 'checkpoints/global_step_10'. If not None, it will be used to resume training.
  ckpt_path: null  # Optional: path to a .pt checkpoint. If not None, it will be loaded after the model is instantiated (for evaluation).

algorithm:
  normalize_advantages: True
  kl_penalty: kl  # how to estimate kl divergence: kl or kl_penalty
  group_size: 8
  reward_coef: 1.0

  rollout_epoch: 4
  eval_rollout_epoch: 1 # set eval_rollout_epoch > 0 when enable runner.only_eval or runner.val_check_interval > 0

  reward_type: chunk_level
  logprob_type: chunk_level
  entropy_type: token_level

  update_epoch: 5
  adv_type: grpo
  loss_type: actor
  loss_agg_func: "token-mean" 
  kl_beta: 0.0
  entropy_bonus: 0
  clip_ratio_high: 0.2
  clip_ratio_low: 0.2
  clip_ratio_c: 3.0
  value_clip: 0.2
  huber_delta: 10.0

  gamma: 0.99
  gae_lambda: 0.95

  filter_rewards: True
  rewards_lower_bound: 0.1
  rewards_upper_bound: 0.9
  # params for generation
  sampling_params:
    do_sample: True
    temperature_train: 1.0
    temperature_eval: 0.6
    top_k: 50
    top_p: 1.0
    repetition_penalty: 1.0
    add_BOS: False

  # length argument for autoregressive sampling
  # max length means max amount of tokens to generate
  length_params:
    max_new_token: null
    max_length: 1024
    min_length: 1

env:
  group_name: "EnvGroup"
  enable_offload: True
  # Override the default values in env/robotwin_place_cans_plasticbox
  train:
    total_num_envs: 240
    reward_coef: ${algorithm.reward_coef}
    max_episode_steps: 320
    max_steps_per_rollout_epoch: 320
    group_size: ${algorithm.group_size}
    assets_path: "/data/zsq/RoboTwin"
    seeds_path: ${oc.env:REPO_PATH}/rlinf/envs/robotwin/seeds/train_seeds.json
    center_crop: False
    task_config:
      embodiment: [aloha-agilex]
      camera:
        collect_wrist_camera: true
      domain_randomization:
        random_background: false
        cluttered_table: false
        clean_background_rate: 1
        random_head_camera_dis: 0
        random_table_height: 0
        random_light: false
        crazy_random_light_rate: 0
  eval:
    total_num_envs: 240
    auto_reset: True
    ignore_terminations: True
    max_episode_steps: 320
    reward_coef: ${algorithm.reward_coef}
    max_steps_per_rollout_epoch: 320
    group_size: 1
    use_fixed_reset_state_ids: True
    is_eval: True
    assets_path: "/data/zsq/RoboTwin"
    seeds_path: ${oc.env:REPO_PATH}/rlinf/envs/robotwin/seeds/eval_seeds.json
    video_cfg:
      save_video: True
      video_base_dir: ${runner.logger.log_path}/video/eval
    center_crop: False
    task_config:
      embodiment: [aloha-agilex]
      camera:
        collect_wrist_camera: true
      domain_randomization:
        random_background: false
        cluttered_table: false
        clean_background_rate: 1
        random_head_camera_dis: 0
        random_table_height: 0
        random_light: false
        crazy_random_light_rate: 0

rollout:
  group_name: "RolloutGroup"
  backend: "huggingface"
  recompute_logprobs: False
  enable_offload: True
  pipeline_stage_num: 1
  model:
    model_path: ${actor.model.model_path}
    precision: ${actor.model.precision}

actor:
  group_name: "ActorGroup"
  training_backend: "fsdp"
  micro_batch_size: 40
  global_batch_size: 960 # 1024
  seed: 42
  enable_offload: False

  # Override the default values in model/openpi_pi05
  model:
    model_path: "/data/zsq/pi05_ckpt/robotwin_place_cans_plastic/20000_torch"
    num_action_chunks: 16 # interface for the env
    action_dim: 14
    # add_value_head: True
    num_steps: 5
    use_proprio: True
    openpi_data:
      adapt_to_pi: False
      extra_delta_transform: False
    openpi:
      config_name: pi05_robotwin
      num_images_in_input: 3
      action_chunk: ${actor.model.num_action_chunks}
      action_env_dim: ${actor.model.action_dim}
      num_steps: ${actor.model.num_steps}
      noise_method: "flow_sde"
      noise_level: 0.3
      # value_after_vlm: True
      # detach_critic_input: True

  optim:
    lr: 5.0e-06
    value_lr: 1.0e-04
    adam_beta1: 0.9
    adam_beta2: 0.95
    adam_eps: 1.0e-08
    weight_decay: 0.01
    clip_grad: 1.0
    critic_warmup_steps: 0

  # Override the default values in training_backend/fsdp
  fsdp_config:
    strategy: "fsdp"
    gradient_checkpointing: False # for openpi, gradient checkpointing is not supported, please do not change this value
    mixed_precision:
      param_dtype: ${actor.model.precision}
      reduce_dtype: ${actor.model.precision}
      buffer_dtype: ${actor.model.precision}

reward:
  use_reward_model: False

critic:
  use_critic_model: False
```
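
As a sanity check on the batch settings, the arithmetic implied by this config can be worked through as below. This is a sketch under two assumptions that are not confirmed by the source: each action chunk of a trajectory is one training sample, and each global batch corresponds to one optimizer step. The variable names mirror the config keys; `num_gpus` is derived from the six device IDs in `component_placement`.

```python
# Values copied from the config above.
total_num_envs = 240
rollout_epoch = 4
max_episode_steps = 320
num_action_chunks = 16
global_batch_size = 960
micro_batch_size = 40
update_epoch = 5
num_gpus = 6  # component_placement lists GPUs 0,1,3,4,5,6

# Trajectories collected per global step.
trajectories = total_num_envs * rollout_epoch

# Policy decisions (chunks) per trajectory.
chunk_steps = max_episode_steps // num_action_chunks

# Assumption: one training sample per chunk-level step.
samples = trajectories * chunk_steps

# Assumption: one optimizer step per global batch, repeated update_epoch times.
minibatches_per_epoch = samples // global_batch_size
optimizer_steps = minibatches_per_epoch * update_epoch

# Gradient accumulation needed to reach the global batch on 6 GPUs.
grad_accum = global_batch_size // (micro_batch_size * num_gpus)

print(trajectories, chunk_steps, samples, optimizer_steps, grad_accum)
```

The 960 trajectories agree with `num_trajectories=960` in the log. If the two assumptions hold, each global step would perform 100 optimizer steps on the same rollout data, which is worth keeping in mind when reading the `approx_kl` and `clip_fraction` values.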


### Log file

You can find the log file in the `logs/` folder or in the `$output_dir/$experiment_name` folder (defined in the YAML config) if you are using our example scripts.

Log file:

If you cannot find the log, please provide the full log messages here.
```
Generating Rollout Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
(RolloutGroup(rank=0) pid=1200033)
Generating Rollout Epochs:  25%|██▌       | 1/4 [15:03<45:11, 903.68s/it]
(RolloutGroup(rank=0) pid=1200033)
Generating Rollout Epochs:  50%|█████     | 2/4 [30:32<30:37, 918.57s/it]
(RolloutGroup(rank=0) pid=1200033)
Generating Rollout Epochs:  75%|███████▌  | 3/4 [45:52<15:19, 919.35s/it]
(RolloutGroup(rank=0) pid=1200033)
Generating Rollout Epochs: 100%|██████████| 4/4 [1:06:31<00:00, 1045.42s/it]
Generating Rollout Epochs: 100%|██████████| 4/4 [1:06:31<00:00, 997.91s/it]

├──────────────────────────────────────────────────── Metric Table ────────────────────────────────────────────────────┤
│ Global Step:    1/1000 │ Progress: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   0.1%                                 │
│ Elapsed: 01:15:04 │ ETA: 1250:03:29 │ Step Time: 4504.714s                                                           │
├──────────────────────────────────────────────────────── Time ────────────────────────────────────────────────────────┤
│                                                                                                                      │
│actor/run_training=628.6               │cal_adv_and_returns=0.0054             │env/compute_bootstrap_rewards=0.0034  │
│env/env_interact_step=3529.1           │env/interact=3852.6                    │env/recv_rollout_results=139.4        │
│env/run_interact_once=3852.6           │generate_rollouts=3861.6               │rollout/generate_one_epoch=3848.5     │
│rollout/predict=124.3                  │step=4504.7                            │sync_weights=14.564                   │
│                                                                                                                      │
├──────────────────────────────────────────────────── Environment ─────────────────────────────────────────────────────┤
│                                                                                                                      │
│episode_len=320.0                      │num_trajectories=960                   │return=0.3                            │
│reward=0.0009375                       │success_once=0.3                       │                                      │
│                                                                                                                      │
├────────────────────────────────────────────────────── Rollout ───────────────────────────────────────────────────────┤
│                                                                                                                      │
│advantages_max=2.475                   │advantages_mean=-0.067                 │advantages_min=-2.475                 │
│rewards=9.38e-04                       │                                       │                                      │
│                                                                                                                      │
├─────────────────────────────────────────────────── Training/Actor ───────────────────────────────────────────────────┤
│                                                                                                                      │
│actor/approx_kl=0.032                  │actor/clip_fraction=0.141              │actor/clipped_ratio=0.989             │
│actor/dual_cliped_ratio=0.0000         │actor/entropy_loss=0.0000              │actor/grad_norm=10.714                │
│actor/lr=5.00e-06                      │actor/policy_loss=-0.0056              │actor/policy_loss_abs=0.539           │
│actor/ratio=0.995                      │actor/ratio_abs=0.146                  │actor/total_loss=-0.0014              │
│                                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
├──────────────────────────────────────────────────── Metric Table ────────────────────────────────────────────────────┤
│ Global Step:    2/1000 │ Progress: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   0.2%                                 │
│ Elapsed: 02:34:05 │ ETA: 1281:30:50 │ Step Time: 4622.696s                                                           │
├──────────────────────────────────────────────────────── Time ────────────────────────────────────────────────────────┤
│                                                                                                                      │
│actor/run_training=629.4               │cal_adv_and_returns=0.0091             │env/compute_bootstrap_rewards=0.0033  │
│env/env_interact_step=3507.8           │env/interact=4098.7                    │env/recv_rollout_results=132.9        │
│env/run_interact_once=4098.7           │generate_rollouts=4103.0               │rollout/generate_one_epoch=4093.4     │
│rollout/predict=124.0                  │step=4740.7                            │sync_weights=8.167                    │
│                                                                                                                      │
├──────────────────────────────────────────────────── Environment ─────────────────────────────────────────────────────┤
│                                                                                                                      │
│episode_len=320.0                      │num_trajectories=960                   │return=0.20520833                     │
│reward=0.00064127607                   │success_once=0.20520833                │                                      │
│                                                                                                                      │
├────────────────────────────────────────────────────── Rollout ───────────────────────────────────────────────────────┤(RolloutGroup(rank=0) pid=1200033)

```
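
To compare steps without reading the tables by eye, the `name=value` cells can be pulled into a dictionary. The helper below is a hypothetical sketch (`parse_metrics` is not part of RLinf); it assumes every metric cell starts right after a `│` separator, as in the tables above, and the sample text is copied from the global step 1 table.

```python
import re

# A metric cell looks like "│env/interact=3852.6" in the tables above.
METRIC = re.compile(r"│\s*([\w/]+)=([-+\d.eE]+)")


def parse_metrics(table_text):
    """Return {metric_name: float_value} for every cell in a metric table."""
    return {name: float(value) for name, value in METRIC.findall(table_text)}


step1 = parse_metrics("""
│actor/run_training=628.6               │cal_adv_and_returns=0.0054             │env/compute_bootstrap_rewards=0.0034  │
│env/env_interact_step=3529.1           │env/interact=3852.6                    │env/recv_rollout_results=139.4        │
│rollout/predict=124.3                  │step=4504.7                            │sync_weights=14.564                   │
│episode_len=320.0                      │num_trajectories=960                   │return=0.3                            │
│rewards=9.38e-04                       │                                       │                                      │
""")

# Share of the step spent stepping the simulator versus training the actor.
env_share = step1["env/env_interact_step"] / step1["step"]
train_share = step1["actor/run_training"] / step1["step"]
print(f"env stepping: {env_share:.0%}, actor training: {train_share:.0%}")
```

For global step 1 this shows that most of the roughly 75-minute step is spent in `env/env_interact_step`, with actor training taking a much smaller share.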


### Environment

python -V: Python 3.11.14
uv pip list:
Package                            Version
---------------------------------- ------------
absl-py                            2.4.0
accelerate                         1.13.0
addict                             2.4.0
aiohappyeyeballs                   2.6.2
aiohttp                            3.14.0
aiohttp-cors                       0.8.1
aiosignal                          1.4.0
annotated-types                    0.7.0
anyio                              4.13.0
argcomplete                        3.6.3
array-record                       0.8.3
asttokens                          3.0.1
astunparse                         1.6.3
attrs                              26.1.0
augmax                             0.4.1
av                                 17.1.0
bddl                               3.6.0
beartype                           0.19.0
beautifulsoup4                     4.14.3
blinker                            1.9.0
boto                               2.49.0
cachebox                           5.2.3
cachetools                         5.5.2
certifi                            2026.5.20
cffi                               2.0.0
cfgv                               3.5.0
charset-normalizer                 3.4.7
chex                               0.1.90
click                              8.4.1
cloudpickle                        3.1.2
cmake                              4.3.2
colorful                           0.6.0a1
colorlog                           6.10.1
comm                               0.2.3
configargparse                     1.7.5
contourpy                          1.3.3
crcmod                             1.7
cryptography                       46.0.7
cycler                             0.12.1
dash                               4.3.0rc0
datasets                           3.6.0
debugpy                            1.8.21
decorator                          5.3.1
deepdiff                           9.1.0
diffusers                          0.38.0
dill                               0.3.8
distlib                            0.4.1
distro                             1.9.0
dm-control                         1.0.41
dm-env                             1.6
dm-tree                            0.1.10
docstring-parser                   0.18.0
donfig                             0.8.1.post1
draccus                            0.10.0
easydict                           1.13
einops                             0.8.2
embreex                            4.4.0
equinox                            0.13.8
etils                              1.14.0
evdev                              1.9.3
executing                          2.2.1
farama-notifications               0.0.6
fasteners                          0.20
fastjsonschema                     2.21.2
filelock                           3.29.1
flash-attn                         2.7.4.post1
flask                              3.1.3
flatbuffers                        25.12.19
flax                               0.10.2
fonttools                          4.63.0
frozenlist                         1.8.0
fsspec                             2025.3.0
ftfy                               6.3.1
future                             1.0.0
gast                               0.7.0
gcs-oauth2-boto-plugin             3.3
gcsfs                              2025.3.0
gdown                              6.1.0
gitdb                              4.0.12
gitpython                          3.1.50
glfw                               2.10.0
google-api-core                    2.31.0
google-apitools                    0.5.35
google-auth                        2.39.0
google-auth-httplib2               0.4.0
google-auth-oauthlib               1.4.0
google-cloud-core                  2.6.0
google-cloud-storage               3.11.0
google-crc32c                      1.8.0
google-pasta                       0.2.0
google-reauth                      0.1.1
google-resumable-media             2.10.0
googleapis-common-protos           1.75.0
grpcio                             1.81.0
gsutil                             5.37
gym                                0.26.2
gym-aloha                          0.1.3
gym-notices                        0.1.0
gymnasium                          0.29.1
h11                                0.16.0
h5py                               3.14.0
hf-transfer                        0.1.9
hf-xet                             1.5.1.dev1
httpcore                           1.0.9
httplib2                           0.20.4
httpx                              0.28.1
httpx-sse                          0.4.3
huggingface-hub                    0.36.2
humanize                           4.15.0
hydra-core                         1.4.0.dev1
icmplib                            3.0.4
identify                           2.6.19
idna                               3.18
imageio                            2.37.3
imageio-ffmpeg                     0.6.0
immutabledict                      4.3.1
importlib-metadata                 9.0.0
importlib-resources                7.1.0
iniconfig                          2.3.0
inquirerpy                         0.3.4
ipython                            9.14.1
ipython-pygments-lexers            1.1.1
ipywidgets                         8.1.8
itsdangerous                       2.2.0
janus                              2.0.0
jax                                0.5.3
jax-cuda12-pjrt                    0.5.3
jax-cuda12-plugin                  0.5.3
jaxlib                             0.5.3
jaxtyping                          0.2.36
jedi                               0.20.0
jinja2                             3.1.6
jiter                              0.15.0
joblib                             1.5.3
jsonlines                          4.0.0
jsonschema                         4.26.0
jsonschema-specifications          2025.9.1
jupyter-core                       5.9.1
jupyterlab-widgets                 3.0.16
jupytext                           1.19.3
keras                              3.14.1
kiwisolver                         1.5.0
labmaze                            1.0.6
lerobot                            0.1.0
libclang                           18.1.1
liger-kernel                       0.8.0
llvmlite                           0.48.0rc1
lxml                               7.0.0a1
manifold3d                         3.5.1
mapbox-earcut                      2.0.0
markdown                           3.10.2
markdown-it-py                     4.2.0
markupsafe                         3.0.3
matplotlib                         3.11.0rc2
matplotlib-inline                  0.2.2
mcp                                1.27.2
mdit-py-plugins                    0.6.1
mdurl                              0.1.2
mergedeep                          1.3.4
ml-collections                     1.0.0
ml-dtypes                          0.5.4
modelscope                         1.37.1
monotonic                          1.6
mplib                              0.2.1
mpmath                             1.3.0
msgpack                            1.2.0rc1
msgspec                            0.21.1
mujoco                             3.9.0
multidict                          6.7.1
multiprocess                       0.70.16
mypy-extensions                    1.1.0
namex                              0.1.0
narwhals                           2.22.1
nbformat                           5.10.4
nest-asyncio                       1.6.0
networkx                           3.6.1
ninja                              1.13.0
nltk                               3.9.4
nodeenv                            1.10.0
numba                              0.66.0rc1
numcodecs                          0.16.5
numpy                              1.26.4
numpy-quaternion                   2024.0.13
numpydantic                        1.8.1
nvidia-cublas-cu12                 12.4.5.8
nvidia-cuda-cupti-cu12             12.4.127
nvidia-cuda-nvcc-cu12              12.9.86
nvidia-cuda-nvrtc-cu12             12.4.127
nvidia-cuda-runtime-cu12           12.4.127
nvidia-cudnn-cu12                  9.1.0.70
nvidia-cufft-cu12                  11.2.1.3
nvidia-curand-cu12                 10.3.5.147
nvidia-curobo                      0.0.0
nvidia-cusolver-cu12               11.6.1.9
nvidia-cusparse-cu12               12.3.1.170
nvidia-cusparselt-cu12             0.6.2
nvidia-libnvcomp-cu12              5.2.0.13
nvidia-ml-py                       13.595.45
nvidia-nccl-cu12                   2.21.5
nvidia-nvcomp-cu12                 5.2.0.13
nvidia-nvjitlink-cu12              12.4.127
nvidia-nvtx-cu12                   12.4.127
nvitop                             1.7.0
oauth2client                       4.1.3
oauthlib                           3.3.1
omegaconf                          2.4.0.dev11
open3d                             0.19.0
openai                             2.41.0
opencensus                         0.11.4
opencensus-context                 0.2.dev0
opencv-python                      4.11.0.86
opencv-python-headless             4.11.0.86
openexr                            3.4.12
openpi                             0.1.0
openpi-client                      0.1.0
opentelemetry-api                  1.42.1
opentelemetry-exporter-prometheus  0.63b1
opentelemetry-proto                1.42.1
opentelemetry-sdk                  1.42.1
opentelemetry-semantic-conventions 0.63b1
opt-einsum                         3.4.0
optax                              0.2.8
optree                             0.19.1
orbax-checkpoint                   0.11.13
orderly-set                        5.5.0
orjson                             3.11.9
packaging                          26.2
pandas                             3.0.3
parso                              0.8.7
peft                               0.19.1
pexpect                            4.9.0
pfzy                               0.3.4
pillow                             12.2.0
pip                                26.1.2
platformdirs                       4.10.0
plotly                             6.8.0
pluggy                             1.6.0
polars                             1.41.2
polars-runtime-32                  1.41.2
pre-commit                         4.6.0
prettytable                        3.17.0
prometheus-client                  0.25.0
promise                            2.3
prompt-toolkit                     3.0.52
propcache                          0.5.2
proto-plus                         1.28.0
protobuf                           6.33.6
psutil                             7.2.2
ptyprocess                         0.7.0
pure-eval                          0.2.3
py-spy                             0.4.2
pyarrow                            24.0.0
pyasn1                             0.6.3
pyasn1-modules                     0.4.2
pybind11                           3.0.4
pycollada                          0.9.3
pycparser                          3.0
pydantic                           2.14.0a1
pydantic-core                      2.47.0
pydantic-settings                  2.14.1
pyecharts                          2.1.0
pygments                           2.20.0
pyjwt                              2.13.0
pymunk                             7.2.0
pynput                             1.8.2
pyopengl                           3.1.10
pyopenssl                          26.0.0
pyparsing                          3.3.2
pyperclip                          1.11.0
pyquaternion                       0.9.9
pyrealsense2                       2.58.1.10581
pysocks                            1.7.1
pytest                             9.0.3
python-dateutil                    2.9.0.post0
python-discovery                   1.4.0
python-dotenv                      1.2.2
python-multipart                   0.0.32
python-xlib                        0.33
pyu2f                              0.1.5
pyyaml                             6.0.3
pyyaml-include                     1.4.1
pyzmq                              27.1.0
ray                                2.55.1
referencing                        0.37.0
regex                              2026.5.9
requests                           2.34.2
requests-oauthlib                  2.0.0
rerun-sdk                          0.23.1
retry-decorator                    2.0a1
retrying                           1.4.2
rich                               14.3.4
robosuite                          1.4.1
rpds-py                            2026.5.1
rsa                                4.7.2
rtree                              1.4.1
ruff                               0.15.16
safetensors                        0.8.0rc1
sapien                             3.0.1
scikit-learn                       1.9.0
scipy                              1.17.1
sentencepiece                      0.2.1
sentry-sdk                         3.0.0a7
setuptools                         75.8.2
setuptools-scm                     10.0.5
shapely                            2.1.2
simple-parsing                     0.1.8
simplejson                         4.1.1
six                                1.17.0
smart-open                         7.6.1
smmap                              5.0.3
sniffio                            1.3.1
soupsieve                          2.8.4
sse-starlette                      3.4.4
stack-data                         0.6.3
starlette                          1.2.1
svg-path                           7.0
svgwrite                           1.4.3
swanlab                            0.8.0rc4
sympy                              1.13.1
tensorboard                        2.20.0
tensorboard-data-server            0.7.2
tensorflow                         2.21.0
tensorflow-addons                  0.23.0
tensorflow-datasets                4.9.10
tensorflow-graphics                2021.12.3
tensorflow-metadata                1.17.3
tensorstore                        0.1.84
termcolor                          3.3.0
threadpoolctl                      3.6.0
timm                               1.0.27
tokenizers                         0.21.4
toml                               0.10.2
toolz                              1.1.0
toppra                             0.6.3
torch                              2.6.0
torchcodec                         0.2.0
torchdata                          0.11.0
torchvision                        0.21.0
tqdm                               4.67.3
tqdm-loggable                      0.4.1
traitlets                          5.15.1
transformers                       4.53.2
transforms3d                       0.4.2
tree                               0.2.4
treescope                          0.1.10
trimesh                            4.12.2
triton                             3.2.0
typeguard                          4.5.2
typing-extensions                  4.15.0
typing-inspect                     0.9.0
typing-inspection                  0.4.2
tyro                               1.0.13
urllib3                            2.7.0
uv                                 0.11.19
uvicorn                            0.49.0
vcs-versioning                     1.1.1
vhacdx                             0.0.10
virtualenv                         21.4.2
viser                              1.0.30
wadler-lindig                      0.1.7
wandb                              0.25.0
warp-lang                          1.11.1
watchdog                           6.0.0
wcwidth                            0.7.0
websockets                         16.0
werkzeug                           3.1.8
wheel                              0.47.0
widgetsnbextension                 4.0.15
wrapt                              2.2.1
xxhash                             3.7.0
yarl                               1.24.2
yourdfpy                           0.0.60
zarr                               3.1.5
zipp                               4.1.0
zstandard                          0.25.0
nvidia-smi:
Fri Jun 12 11:51:56 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:10:00.0 Off |                    0 |
| N/A   54C    P0             98W /  400W |   67108MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00000000:16:00.0 Off |                    0 |
| N/A   47C    P0             93W /  400W |   67417MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  |   00000000:2F:00.0 Off |                    0 |
| N/A   63C    P0            356W /  400W |   16538MiB /  81920MiB |     67%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          On  |   00000000:33:00.0 Off |                    0 |
| N/A   54C    P0            120W /  400W |   67353MiB /  81920MiB |     94%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          On  |   00000000:8A:00.0 Off |                    0 |
| N/A   50C    P0             94W /  400W |   67929MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          On  |   00000000:8F:00.0 Off |                    0 |
| N/A   56C    P0            144W /  400W |   67353MiB /  81920MiB |     93%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          On  |   00000000:C6:00.0 Off |                    0 |
| N/A   53C    P0            200W /  400W |   68109MiB /  81920MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          On  |   00000000:CA:00.0 Off |                    0 |
| N/A   45C    P0             70W /  400W |       4MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
RLinf version: N/A
Docker image tag:  N/A


### Other reproduction info

`actor/approx_kl` and `actor/clip_fraction` remain relatively low (0.032 and 0.141 at global step 1 in the log above), suggesting that the policy updates may be too conservative.

<img width="1220" height="328" alt="Image" src="https://github.com/user-attachments/assets/bedd9ad7-5af0-44b5-abcc-8dd10830a4fe" />
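
For readers interpreting these two metrics, here is a minimal sketch of how an approximate KL and a clip fraction are commonly computed from old and new log-probabilities in PPO-style updates. The function `ppo_clip_diagnostics` and the particular KL estimator are assumptions chosen for illustration; RLinf's own computation may differ. The default bounds mirror `clip_ratio_low: 0.2` and `clip_ratio_high: 0.2` from the config.

```python
import math


def ppo_clip_diagnostics(new_logprobs, old_logprobs, clip_low=0.2, clip_high=0.2):
    """Return (approx_kl, clip_fraction) for paired per-sample log-probabilities."""
    log_ratios = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    ratios = [math.exp(lr) for lr in log_ratios]
    # One common non-negative estimator: mean((ratio - 1) - log(ratio)).
    # Which estimator the framework logs is an assumption here.
    approx_kl = sum((r - 1.0) - lr for r, lr in zip(ratios, log_ratios)) / len(ratios)
    # Fraction of samples whose ratio falls outside [1 - clip_low, 1 + clip_high].
    clipped = sum(1 for r in ratios if r < 1.0 - clip_low or r > 1.0 + clip_high)
    return approx_kl, clipped / len(ratios)


# Toy chunk-level log-probabilities (made-up numbers, not taken from the run).
old = [-1.0, -1.0, -1.0, -1.0]
new = [-1.0, -1.1, -0.7, -1.0]
approx_kl, clip_fraction = ppo_clip_diagnostics(new, old)
print(f"approx_kl={approx_kl:.4f}, clip_fraction={clip_fraction:.2f}")
```

In this toy case only the third sample (ratio of about 1.35) leaves the clip range, so the clip fraction is 0.25. Read this way, the logged `actor/clip_fraction=0.141` would mean that roughly 14% of samples hit the clip bounds.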

### Before submitting a new issue...

- [x] Have you checked relevant issues and the FAQs (https://rlinf.readthedocs.io/en/latest/rst_source/faq.html), or asked the chatbot at the top right corner of the [documentation page](https://rlinf.readthedocs.io)?
