verl-GR

Source Code Overview

verl_gr/recipes/: task-specific implementations and data/reward logic (for example, OpenOneRec runtime preparation and workers).
verl_gr/trainers/: trainer-side wrappers around upstream verl trainer code.
verl_gr/workers/: rollout-side extensions that are reusable beyond a single recipe.
verl_gr/third_party/: small compatibility helpers for non-verl dependencies such as vllm.

Docs

docs/verl_gr/openonerec_mapping.md: maps legacy OpenOneRec runtime modules to the current verl_gr layout.
docs/verl_gr/openonerec_parity_plan.md: tracks the current Phase B parity/smoke checklist after the cleanup refactor.
docs/verl_gr/minionerec_mapping.md: MiniOneRec dataset / reward / beam contract.
docs/verl_gr/minionerec_pr_changes.md: summary of changes between workingbranch and main (MiniOneRec and performance).
docs/verl_gr/rankgrpo_mapping.md: RankGRPO vs TRL root-cause comparison and analysis.
docs/verl_gr/rankgrpo_target.md: alignment progress tracker by target item (convergence & efficiency).
scripts/README.md: launcher index for GRPO / SFT / profiling scripts.

Data preparation

First download OpenOneRec/OpenOneRec-RecIF, then build the RL data in a single pass as follows. The flow is OpenOneRec-RecIF -> recommendation data preprocessing -> RL data split. Before starting, edit verl-GR/verl_gr/recipes/openonerec/data/recif_preprocessing.sh and set RECIF_DIR to your local RecIF download:

RECIF_DIR=/YOUR/RECIF/DIR

Then run:

cd verl-GR/verl_gr/recipes/openonerec/data
bash recif_preprocessing.sh
bash prepare_rl.sh
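If you prefer to patch the script non-interactively, the steps above can be chained in a small shell function. This is a sketch, not part of the repo: set_recif_dir and prepare_openonerec_rl_data are hypothetical names, and it assumes recif_preprocessing.sh assigns the path on a line beginning with RECIF_DIR=.

```shell
# Hypothetical helpers (not shipped with verl-GR).
# Assumption: recif_preprocessing.sh contains a line starting with "RECIF_DIR=".

# set_recif_dir FILE DIR: rewrite the RECIF_DIR=... assignment in FILE to DIR.
set_recif_dir() {
  local file="$1" dir="$2" tmp
  grep -q '^RECIF_DIR=' "$file" || { echo "no RECIF_DIR= line in $file" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # awk instead of sed, so slashes in the path need no escaping;
  # cat back over the original to keep its permissions.
  awk -v d="$dir" '/^RECIF_DIR=/ { print "RECIF_DIR=" d; next } { print }' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# prepare_openonerec_rl_data REPO_ROOT RECIF_DIR: patch, preprocess, then split.
# Runs in a subshell so the caller's working directory is unchanged.
prepare_openonerec_rl_data() {
  local root="$1" recif="$2"
  (
    cd "$root/verl_gr/recipes/openonerec/data" || exit 1
    set_recif_dir recif_preprocessing.sh "$recif" || exit 1
    bash recif_preprocessing.sh || exit 1
    bash prepare_rl.sh
  )
}
```

For example: prepare_openonerec_rl_data ./verl-GR /YOUR/RECIF/DIR. Each stage stops the chain on failure, so prepare_rl.sh never runs on half-preprocessed data.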

This produces the RL data splits:

verl-GR/verl_gr/recipes/openonerec/output/rl_data/train.parquet - Training set (the merged data from all tasks, minus the test samples)
verl-GR/verl_gr/recipes/openonerec/output/rl_data/test.parquet - Test set (1000 samples randomly sampled from merged data)
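A quick sanity check before training can catch a failed or partial run. The sketch below uses a hypothetical check_rl_data helper that only verifies that both files exist, are non-empty, and begin and end with the 4-byte PAR1 magic used by the Parquet format; it does not validate the schema or the 1000-sample test size.

```shell
# Hypothetical helper (not shipped with verl-GR).
# check_rl_data OUT_DIR: confirm train/test parquet files exist and carry the
# "PAR1" magic bytes that Parquet files start and end with.
check_rl_data() {
  local out="$1" f
  for f in train.parquet test.parquet; do
    [ -s "$out/$f" ] || { echo "missing or empty: $out/$f" >&2; return 1; }
    [ "$(head -c 4 "$out/$f")" = "PAR1" ] || { echo "bad header: $out/$f" >&2; return 1; }
    [ "$(tail -c 4 "$out/$f")" = "PAR1" ] || { echo "bad footer: $out/$f" >&2; return 1; }
  done
  echo "ok: $out"
}
```

Usage: check_rl_data verl-GR/verl_gr/recipes/openonerec/output/rl_data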

For Rank-GRPO data, download the Reddit-V2 dataset, or simply download the preprocessed version here.

Launching Guide

Install the base dependencies with the official script referenced in the comments of requirements.txt, then install this repo's pinned packages:

cd verl-GR
pip install -r requirements.txt

OpenOneRec GRPO (set your model path first):

cd verl-GR
export BASE_MODEL=/path/to/your/model
bash scripts/run_openonerec_grpo.sh
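Since every launcher reads BASE_MODEL, a pre-flight check avoids failing deep inside startup. A minimal sketch with a hypothetical require_base_model helper; it assumes BASE_MODEL is a local checkpoint directory, and the config.json warning additionally assumes a Hugging Face style layout.

```shell
# Hypothetical helper (not shipped with verl-GR).
# require_base_model: fail early if BASE_MODEL is unset or not a directory.
require_base_model() {
  if [ -z "${BASE_MODEL:-}" ]; then
    echo "BASE_MODEL is not set" >&2
    return 1
  fi
  if [ ! -d "$BASE_MODEL" ]; then
    echo "BASE_MODEL is not a directory: $BASE_MODEL" >&2
    return 1
  fi
  # Assumption: Hugging Face style checkpoint; only warn, do not fail.
  if [ ! -f "$BASE_MODEL/config.json" ]; then
    echo "warning: no config.json in $BASE_MODEL" >&2
  fi
}
```

Usage: require_base_model && bash scripts/run_openonerec_grpo.sh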

MiniOneRec GRPO (DDP, aligned with MiniOneRec/rl.sh; requires bitsandbytes for paged_adamw_32bit):

cd verl-GR
export BASE_MODEL=/path/to/your/checkpoint
export PYTHON_BIN=/path/to/vllm-gr/bin/python
bash scripts/run_minionerec_grpo_rl_aligned.sh
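The MiniOneRec launcher needs bitsandbytes importable from the interpreter given by PYTHON_BIN. A hypothetical check_minionerec_env helper can confirm this before launch; it only tests that the import succeeds, not that the installed build matches your CUDA setup.

```shell
# Hypothetical helper (not shipped with verl-GR).
# check_minionerec_env: confirm PYTHON_BIN resolves to an executable and can
# import bitsandbytes (needed for the paged_adamw_32bit optimizer).
check_minionerec_env() {
  local py="${PYTHON_BIN:-}"
  if [ -z "$py" ] || ! command -v "$py" >/dev/null 2>&1; then
    echo "PYTHON_BIN is unset or not executable: '$py'" >&2
    return 1
  fi
  if ! "$py" -c 'import bitsandbytes' 2>/dev/null; then
    echo "bitsandbytes is not importable from $py (try: $py -m pip install bitsandbytes)" >&2
    return 1
  fi
}
```

Usage: check_minionerec_env && bash scripts/run_minionerec_grpo_rl_aligned.sh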

Rank-GRPO (set your model path first):

cd verl-GR
export BASE_MODEL=/path/to/your/checkpoint
bash scripts/run_rankgrpo.sh

Two-Stage Notes

OpenOneRec two_stage is implemented entirely inside verl-GR.
The async path uses verl_gr/recipes/openonerec/two_stage_agent_loop.py together with verl_gr/workers/rollout/two_stage_vllm_async.py.
No local source patch to the upstream verl repo is required or expected.
