- `verl_gr/recipes/`: task-specific implementations and data/reward logic (for example, OpenOneRec runtime preparation and workers).
- `verl_gr/trainers/`: trainer-side wrappers around upstream `verl` trainer code.
- `verl_gr/workers/`: rollout-side extensions that are still useful outside a single recipe.
- `verl_gr/third_party/`: small compatibility helpers for non-`verl` dependencies such as `vllm`.
- `docs/verl_gr/openonerec_mapping.md`: maps legacy OpenOneRec runtime modules to the current `verl_gr` layout.
- `docs/verl_gr/openonerec_parity_plan.md`: tracks the current Phase B parity/smoke checklist after the cleanup refactor.
- `docs/verl_gr/minionerec_mapping.md`: MiniOneRec dataset / reward / beam contract.
- `docs/verl_gr/minionerec_pr_changes.md`: working branch vs `main` (MiniOneRec + performance).
- `docs/verl_gr/rankgrpo_mapping.md`: RankGRPO vs TRL root-cause comparison and analysis.
- `docs/verl_gr/rankgrpo_target.md`: alignment progress tracker by target item (convergence & efficiency).
- `scripts/README.md`: launcher index for GRPO / SFT / profiling scripts.
First download OpenOneRec/OpenOneRec-RecIF, then curate the RL data in one pass. The flow is OpenOneRec-RecIF -> recommendation data preprocessing -> RL data split. Before getting started, patch `verl-GR/verl_gr/recipes/openonerec/data/recif_preprocessing.sh` to set:

    RECIF_DIR=/YOUR/RECIF/DIR

Then run:

    cd verl-GR/verl_gr/recipes/openonerec/data
    bash recif_preprocessing.sh
    bash prepare_rl.sh

You will get the RL training data:

- `verl-GR/verl_gr/recipes/openonerec/output/rl_data/train.parquet` - Training set (remaining data after merging all tasks)
- `verl-GR/verl_gr/recipes/openonerec/output/rl_data/test.parquet` - Test set (1000 samples randomly sampled from merged data)
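
The split described above (1000 randomly sampled test records, with the remainder used for training) can be illustrated with a minimal sketch. This is not the code in `prepare_rl.sh`: the function name `split_train_test`, the fixed `seed`, and the in-memory record list are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], test_size: int = 1000, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hold out `test_size` random records; everything else becomes training data."""
    if test_size > len(records):
        raise ValueError("test_size exceeds the number of merged records")
    # The seed is an assumption, used here only to make the sketch reproducible.
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(records)), test_size))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


# Stand-in for the merged task data (hypothetical records).
merged = [{"task": "demo", "id": i} for i in range(5000)]
train, test = split_train_test(merged)
print(len(train), len(test))  # 4000 1000
```

Sampling indices rather than shuffling in place keeps the original record order within each split, which can make the output easier to diff between runs.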
For Rank-GRPO data, download the Reddit-V2 dataset, or simply download the preprocessed version here.
- Install base dependencies from the official script in the `requirements.txt` comments, then install the pinned packages in this repo.

      cd verl-GR
      pip install -r requirements.txt

- Run the OpenOneRec GRPO launcher (set your model path first).

      cd verl-GR
      export BASE_MODEL=/path/to/your/model
      bash scripts/run_openonerec_grpo.sh

- MiniOneRec GRPO (DDP, aligned with `MiniOneRec/rl.sh`; requires `bitsandbytes` for `paged_adamw_32bit`):

      cd verl-GR
      export BASE_MODEL=/path/to/your/checkpoint
      export PYTHON_BIN=/path/to/vllm-gr/bin/python
      bash scripts/run_minionerec_grpo_rl_aligned.sh

- Rank-GRPO (set your model path first)

      cd verl-GR
      export BASE_MODEL=/path/to/your/checkpoint
      bash scripts/run_rankgrpo.sh

- OpenOneRec `two_stage` is implemented entirely inside `verl-GR`.
- The async path uses `verl_gr/recipes/openonerec/two_stage_agent_loop.py` together with `verl_gr/workers/rollout/two_stage_vllm_async.py`.
- No local source patch to the upstream `verl` repo is required or expected.
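
Each launcher above expects its environment variables (`BASE_MODEL`, and `PYTHON_BIN` for the MiniOneRec script) to be set before it runs. A small preflight wrapper can catch a missing variable before any job starts. The sketch below is an assumption and not part of the repository: `missing_vars` and `run_launcher` are hypothetical helper names, and the default working directory `verl-GR` mirrors the `cd verl-GR` step.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def missing_vars(required: List[str], env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def run_launcher(
    script: str,
    required: List[str],
    env: Optional[Dict[str, str]] = None,
    cwd: str = "verl-GR",
) -> int:
    """Check the required variables, then run `bash <script>` from `cwd`."""
    env = dict(os.environ if env is None else env)
    missing = missing_vars(required, env)
    if missing:
        raise RuntimeError(f"set these variables first: {', '.join(missing)}")
    # check=True raises CalledProcessError if the launcher exits non-zero.
    return subprocess.run(["bash", script], cwd=cwd, env=env, check=True).returncode


# Harmless demonstration: only the check runs here, no launcher is started.
print(missing_vars(["BASE_MODEL", "PYTHON_BIN"], {"BASE_MODEL": "/path/to/your/checkpoint"}))
# ['PYTHON_BIN']
```

For example, `run_launcher("scripts/run_minionerec_grpo_rl_aligned.sh", ["BASE_MODEL", "PYTHON_BIN"])` would raise before launching if either variable is missing from the current environment.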