
Tool call generation doesn't stop after </tool_call> during training #126

@pspdada

Description


Hi, I'm encountering an issue with tool call parsing during training.

Sometimes the model output is correctly parsed, like this (single tool call):

[DEBUG] SUCCESS ACTION action_string='<think> No, the shirt does not appear to be striped. It is a floral pattern. </think>  \n<tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [151, 163, 205, 235], "label": "floral print"}}\n</tool_call><|im_end|>'
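For reference, my understanding of the parsing step is roughly the following (a minimal regex sketch; I don't know the exact parser DeepEyes uses, so the pattern and function name are my own assumptions):

```python
import json
import re

# Hypothetical sketch of the extraction: take the JSON payload of the
# first <tool_call>...</tool_call> block and decode it.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_first_tool_call(action_string: str) -> dict | None:
    match = TOOL_CALL_RE.search(action_string)
    if match is None:
        return None  # no well-formed tool call in the response
    return json.loads(match.group(1))
```

A parser like this would report SUCCESS on both of the strings shown here, since it only looks at the first block and ignores whatever follows.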

But other times the model response does not stop after the tool call and keeps generating past the first </tool_call>:

[DEBUG] SUCCESS ACTION action_string='<think>\nThe diagram does not explicitly show which component is responsible for thermal regulation. However, the radiator is typically used for cooling purposes. Since the "radiator" is labeled and positioned near where thermal regulation is needed, it is reasonable to assume this component provides thermal regulation for the "sample box."\n\nThe other options do not appear to directly regulate the temperature:\n- The "coax filter" seems to be an electromagnetic filter.\n- The "outer can" appears to be a protective layer.\n- The "magnetic shield" is for protecting from magnetic fields.\n\nGiven this analysis, the most logical choice is:\nD. Radiator\n</think>  \n<tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [1097, 66, 1358, 149], "label": "50 mK stage"}}\n</tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [539, 174, 738, 247], "label": "coax\\nfilter"}}\n</tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [49, 108, 326, 163], "label": "radiator"}}\n</tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [831, 635, 1107, 714], "label": "sample box"}}\n</tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [797, 680, 1274, 751], "label": "\\ncan"}}\n</tool_call>\n{"name": "image_zoom_in_tool", "arguments": {"bbox_2d": [877, 1078, 1834, 1155], "label": "magnetic shield"}}\n</tool_call>\n{"answer": "D. Radiator"}<|im_end|>'
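Note that the continuation never re-opens a <tool_call> tag: it just keeps emitting JSON payloads separated by bare </tool_call> tags, and finally a raw {"answer": ...} object. As a stopgap I'm considering truncating the response at the first closing tag before parsing (a minimal sketch, function name mine):

```python
def truncate_at_first_tool_call(text: str) -> str:
    """Drop everything after the first </tool_call> tag, if present."""
    end = text.find("</tool_call>")
    if end == -1:
        return text  # no tool call emitted; leave the response as-is
    return text[: end + len("</tool_call>")]
```

This would keep the rollout loop alive, but it doesn't fix the root cause: the sampler never stops the model when it closes the tag.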

How can this issue be solved? I just trained the Qwen 2.5 VL 7B model using the scripts you provided.
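My guess is that the rollout's sampling parameters don't include </tool_call> as a stop string. What I have in mind is something like this (a sketch against vLLM's SamplingParams; where verl/DeepEyes actually builds these parameters is an assumption on my part):

```python
from vllm import SamplingParams

# Assumed insertion point: wherever the agent rollout constructs its
# per-turn sampling parameters.
sampling_params = SamplingParams(
    max_tokens=10240,                 # matches agent.single_response_max_tokens below
    stop=["</tool_call>"],            # halt generation as soon as the tag closes
    include_stop_str_in_output=True,  # keep the tag so the parser still matches
)
```

Is there a supported config knob for this, or does it need a code change?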

The script I used:

# export VLLM_ATTENTION_BACKEND=XFORMERS # vllm + qwen2-7b with flash_attn has some issues

VISUAL_DATASET_TRAIN_0_1_2=${DATA_DIR}/data_0.1.2_visual_toolbox_v2.parquet # data_source: vstar
VISUAL_DATASET_TRAIN_0_8=${DATA_DIR}/data_v0.8_visual_toolbox_v2.parquet    # data_source: chart
EUREKA_DATASET_TRAIN=${DATA_DIR}/data_thinklite_reasoning_acc.parquet       # data_source: thinklite_eureka

# LOGGER="['console','wandb','rl_logging_board']"
LOGGER="['console', 'tensorboard', 'rl_logging_board']"

# Choose batch sizes based on WORLD_SIZE: 64 for one node, 128 for two nodes, 256 otherwise
if [ "$WORLD_SIZE" -eq 1 ]; then
    train_batch_size=64
    ppo_mini_batch_size=64
    ppo_micro_batch_size_per_gpu=1
    log_prob_micro_batch_size_per_gpu=1
elif [ "$WORLD_SIZE" -eq 2 ]; then
    train_batch_size=128
    ppo_mini_batch_size=128
    ppo_micro_batch_size_per_gpu=4
    log_prob_micro_batch_size_per_gpu=4
else
    train_batch_size=256
    ppo_mini_batch_size=256
    ppo_micro_batch_size_per_gpu=4
    log_prob_micro_batch_size_per_gpu=8
fi

CONDA_PATH="/jizhicfs/shangppeng/miniconda3"
CONDA_ENV_NAME="psp-DeepEyes"

export SAVE_CHECKPOINT_DIR="${OUTPUT_DIR}/checkpoints"
CUR_TIME=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="${OUTPUT_DIR}/logs/${PROJECT_NAME}/${EXPERIMENT_NAME}_${CUR_TIME}.log"
mkdir -p "$(dirname "$LOG_FILE")"
echo "日志文件: $LOG_FILE"

export http_proxy=""
export https_proxy=""

source ${CONDA_PATH}/bin/activate && conda activate ${CONDA_ENV_NAME} &&
    PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
        +debug=False \
        +vs_debug=False \
        data.train_files=[${VISUAL_DATASET_TRAIN_0_1_2},${VISUAL_DATASET_TRAIN_0_8},${EUREKA_DATASET_TRAIN}] \
        data.val_files=[${EUREKA_DATASET_TRAIN}] \
        data.train_batch_size=$train_batch_size \
        data.max_prompt_length=8192 \
        data.max_response_length=20480 \
        data.return_raw_chat=True \
        data.filter_overlong_prompts=True \
        algorithm.adv_estimator=grpo \
        algorithm.kl_ctrl.kl_coef=0.0 \
        actor_rollout_ref.model.path=${REF_MODEL_PATH} \
        actor_rollout_ref.model.use_remove_padding=True \
        actor_rollout_ref.actor.optim.lr=1e-6 \
        actor_rollout_ref.actor.ppo_mini_batch_size=$ppo_mini_batch_size \
        actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=$ppo_micro_batch_size_per_gpu \
        actor_rollout_ref.actor.use_kl_loss=False \
        actor_rollout_ref.actor.kl_loss_coef=0.0 \
        actor_rollout_ref.actor.kl_loss_type=low_var_kl \
        actor_rollout_ref.actor.entropy_coeff=0.0 \
        actor_rollout_ref.actor.checkpoint.contents=['model','hf_model','optimizer','extra'] \
        actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
        actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=$log_prob_micro_batch_size_per_gpu \
        actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
        actor_rollout_ref.rollout.name=vllm \
        actor_rollout_ref.rollout.n=16 \
        actor_rollout_ref.rollout.max_num_batched_tokens=32768 \
        actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \
        actor_rollout_ref.rollout.enforce_eager=False \
        actor_rollout_ref.rollout.free_cache_engine=False \
        actor_rollout_ref.rollout.enable_chunked_prefill=False \
        actor_rollout_ref.actor.fsdp_config.param_offload=True \
        actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
        actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=$log_prob_micro_batch_size_per_gpu \
        actor_rollout_ref.ref.fsdp_config.param_offload=True \
        actor_rollout_ref.rollout.agent.activate_agent=True \
        actor_rollout_ref.rollout.agent.tool_name_key=env_name \
        actor_rollout_ref.rollout.agent.single_response_max_tokens=10240 \
        actor_rollout_ref.rollout.agent.max_turns=5 \
        actor_rollout_ref.rollout.agent.concurrent_workers=1 \
        actor_rollout_ref.rollout.agent.show_tqdm=True \
        trainer.critic_warmup=0 \
        trainer.logger="${LOGGER}" \
        trainer.val_before_train=False \
        trainer.n_gpus_per_node=8 \
        trainer.nnodes="${WORLD_SIZE}" \
        trainer.save_freq=8 \
        trainer.test_freq=10000 \
        trainer.project_name=${PROJECT_NAME} \
        trainer.experiment_name=${EXPERIMENT_NAME} \
        trainer.default_local_dir="${SAVE_CHECKPOINT_DIR}"/${PROJECT_NAME}/${EXPERIMENT_NAME} \
        +trainer.tensorboard_dir="${OUTPUT_DIR}/tensorboard" \
        +trainer.rl_logging_board_dir="${OUTPUT_DIR}/rl_logging_board" \
        trainer.total_epochs=32 2>&1 | tee "$LOG_FILE"
The environment I use:
INFO 10-08 16:14:18 [__init__.py:239] Automatically detected platform cuda.
Collecting environment information...
==============================
        System Info
==============================
OS                           : Tencent tlinux 2.6 (x86_64)
GCC version                  : (GCC) 11.4.0
Clang version                : 18.1.8 (Red Hat 18.1.8-1.module+el8.10.0+703+ec7b33ba)
CMake version                : version 3.31.6
Libc version                 : glibc-2.28

==============================
       PyTorch Info
==============================
PyTorch version              : 2.6.0+cu124
Is debug build               : False
CUDA used to build PyTorch   : 12.4
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.0 | packaged by Anaconda, Inc. | (main, Oct  2 2023, 17:29:18) [GCC 11.2.0] (64-bit runtime)
Python platform              : Linux-5.4.241-1-tlinux4-0017.7-x86_64-with-glibc2.28

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA H20
GPU 1: NVIDIA H20
GPU 2: NVIDIA H20
GPU 3: NVIDIA H20
GPU 4: NVIDIA H20
GPU 5: NVIDIA H20
GPU 6: NVIDIA H20
GPU 7: NVIDIA H20

Nvidia driver version        : 535.161.08
cuDNN version                : Probably one of the following:
/usr/lib64/libcudnn.so.8.9.7
/usr/lib64/libcudnn_adv_infer.so.8.9.7
/usr/lib64/libcudnn_adv_train.so.8.9.7
/usr/lib64/libcudnn_cnn_infer.so.8.9.7
/usr/lib64/libcudnn_cnn_train.so.8.9.7
/usr/lib64/libcudnn_ops_infer.so.8.9.7
/usr/lib64/libcudnn_ops_train.so.8.9.7
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              384
On-line CPU(s) list: 0-383
Thread(s) per core:  2
Core(s) per socket:  96
Socket(s):           2
NUMA node(s):        2
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
CPU family:          25
Model:               17
Model name:          AMD EPYC 9K84 96-Core Processor
BIOS Model name:     AMD EPYC 9K84 96-Core Processor                
Stepping:            1
CPU MHz:             3699.906
CPU max MHz:         2600.0000
CPU min MHz:         1500.0000
BogoMIPS:            5199.91
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            32768K
NUMA node0 CPU(s):   0-95,192-287
NUMA node1 CPU(s):   96-191,288-383
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d

==============================
Versions of relevant libraries
==============================
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pynvml==13.0.1
[pip3] pyzmq==27.1.0
[pip3] torch==2.6.0
[pip3] torchaudio==2.6.0
[pip3] torchdata==0.11.0
[pip3] torchvision==0.21.0
[pip3] transformers==4.51.3
[pip3] triton==3.1.0
[conda] numpy                                 1.26.4           pypi_0           pypi
[conda] nvidia-cublas-cu12                    12.4.5.8         pypi_0           pypi
[conda] nvidia-cuda-cupti-cu12                12.4.127         pypi_0           pypi
[conda] nvidia-cuda-nvrtc-cu12                12.4.127         pypi_0           pypi
[conda] nvidia-cuda-runtime-cu12              12.4.127         pypi_0           pypi
[conda] nvidia-cudnn-cu12                     9.1.0.70         pypi_0           pypi
[conda] nvidia-cufft-cu12                     11.2.1.3         pypi_0           pypi
[conda] nvidia-cufile-cu12                    1.13.1.3         pypi_0           pypi
[conda] nvidia-curand-cu12                    10.3.5.147       pypi_0           pypi
[conda] nvidia-cusolver-cu12                  11.6.1.9         pypi_0           pypi
[conda] nvidia-cusparse-cu12                  12.3.1.170       pypi_0           pypi
[conda] nvidia-cusparselt-cu12                0.6.2            pypi_0           pypi
[conda] nvidia-ml-py                          13.580.82        pypi_0           pypi
[conda] nvidia-nccl-cu12                      2.21.5           pypi_0           pypi
[conda] nvidia-nvjitlink-cu12                 12.4.127         pypi_0           pypi
[conda] nvidia-nvtx-cu12                      12.4.127         pypi_0           pypi
[conda] pynvml                                13.0.1           pypi_0           pypi
[conda] pyzmq                                 27.1.0           pypi_0           pypi
[conda] torch                                 2.6.0            pypi_0           pypi
[conda] torchaudio                            2.6.0            pypi_0           pypi
[conda] torchdata                             0.11.0           pypi_0           pypi
[conda] torchvision                           0.21.0           pypi_0           pypi
[conda] transformers                          4.51.3           pypi_0           pypi
[conda] triton                                3.1.0            pypi_0           pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.8.2
