This repository contains the official implementation for the paper "Generative Reasoning Recommendation via LLMs".
Large Language Models (LLMs) demonstrate remarkable reasoning abilities across many domains, yet they face fundamental challenges in functioning as Generative Reasoning Recommendation Models (GRRMs).
These challenges arise from the modeling gap between textual semantics and collaborative filtering signals, as well as the sparsity and stochasticity of user feedback.
To address this, we introduce GREAM — an end-to-end generative reasoning recommendation framework that unifies understanding, reasoning, and prediction for recommendation tasks.
GREAM integrates three key components:

- **Collaborative–Semantic Alignment**: fuses heterogeneous textual evidence (titles, descriptions, reviews) to construct semantically consistent discrete item indices, aligning linguistic and interaction semantics.
- **Reasoning Curriculum Activation**: builds a synthetic Chain-of-Thought (CoT) dataset and trains via a progressive curriculum of:
  - Behavioral evidence extraction
  - Latent preference modeling
  - Intent inference
  - Recommendation formulation
  - Denoised sequence rewriting
- **Sparse-Regularized Group Policy Optimization (SRPO)**: a novel reinforcement learning method that enables stable and verifiable fine-tuning under sparse signals (see the generic sketch after this list) by combining:
  - Residual-Sensitive Verifiable Reward (RSVR)
  - Bonus-Calibrated Group Advantage Estimation (BGAE)
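The paper defines RSVR and BGAE precisely; as a rough, generic illustration of the family of methods SRPO belongs to, the sketch below computes a plain group-relative (GRPO-style) advantage from per-rollout rewards, using a simple hit-based reward as a stand-in for the verifiable reward. The function names and reward definition are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hit_reward(predicted_items, target_item):
    """Stand-in verifiable reward: 1 if the ground-truth item appears in the rollout's
    predictions, else 0. (The paper's Residual-Sensitive Verifiable Reward differs.)"""
    return 1.0 if target_item in predicted_items else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """Plain group-relative advantage: normalize each rollout's reward by the mean/std
    of its sampled group (the baseline estimator that calibrated variants build on)."""
    rewards = np.asarray(rewards, dtype=np.float32)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled rollouts for one user; only one contains the target item.
rollouts = [["item_17", "item_3"], ["item_9"], ["item_42", "item_17"], ["item_8"]]
rewards = [hit_reward(preds, target_item="item_3") for preds in rollouts]
print(group_relative_advantages(rewards))  # positive for the hit, negative for the misses
```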
Set up your environment with the required packages:

```bash
bash scripts/install.sh
```

You can download the data from here. Put data.zip under this directory and sft_data.zip under LLaMA-Factory/data/, then unzip them.
You can refer to data_processing/ for instructions on how to prepare your dataset.
We use LLaMA-Factory. Please refer to their repository for more details.
Before SFT training, you need to run scripts/construct_model.py to obtain Qwen3-4B-Instruct with an extended vocabulary.
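For reference, vocabulary extension of this kind typically means adding the new item-index tokens to the tokenizer and resizing the model's embedding matrix. The snippet below is a minimal sketch using Hugging Face transformers; the base checkpoint path, token format, and token count are placeholder assumptions, and the actual logic lives in scripts/construct_model.py.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen3-4B-Instruct"  # placeholder: local path or hub id of the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical item-index tokens; the real format comes from the item indexing step.
new_tokens = [f"<a_{i}>" for i in range(256)]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new token ids have trainable rows during SFT.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size: {len(tokenizer)}")
```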
Then use the following command to train on the Instruments dataset:

```bash
llamafactory-cli train examples/train_full/qwen3-4b-mix.yaml
```

Update the configuration in scripts/run.sh, then run:
```bash
bash scripts/run.sh
```

This phase applies SRPO (Sparse-Regularized Group Policy Optimization) for verifiable post-training refinement.
To evaluate the model on Amazon datasets, run:

```bash
# For direct evaluation
torchrun --nproc_per_node=8 --master_port=23324 eval/test_ddp_direct.py \
    --ckpt_path [CKPT_PATH] \
    --dataset [DATASET_NAME] \
    --results_file [RESULTS_JSON_FILE] \
    --test_batch_size 8 \
    --num_beams 10 \
    --index_file .index.json \
    --test_task seqrec \
    --test_prompt_ids 5
```
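With --num_beams 10 the evaluator produces a ranked list of candidate items per test sequence, from which top-K metrics such as HR@K and NDCG@K are typically reported. The helper below is a generic reference implementation of those metrics, not the code in eval/; the assumed input is one ranked candidate list plus a single ground-truth item per user.

```python
import math

def hr_and_ndcg_at_k(ranked_lists, targets, k=10):
    """Generic HR@K / NDCG@K over (ranked candidates, single ground-truth item) pairs."""
    hits, ndcg = 0.0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        topk = ranked[:k]
        if target in topk:
            hits += 1.0
            rank = topk.index(target)           # 0-based position of the hit
            ndcg += 1.0 / math.log2(rank + 2)   # single relevant item, so IDCG = 1
    n = len(targets)
    return hits / n, ndcg / n

# Toy example with two users (beam lists truncated for brevity).
print(hr_and_ndcg_at_k([["i3", "i7", "i1"], ["i9", "i2", "i5"]], ["i7", "i8"], k=10))
```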
For reasoning evaluation, you need to deploy sglang servers first. You can use our deployment script; the base port is 10010:

```bash
bash scripts/deploy.sh [MODEL_PATH] [SERVE_NAME] [CUDA_DEVICES]
```
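Assuming scripts/deploy.sh starts standard sglang servers exposing the OpenAI-compatible API from the base port upward, you can sanity-check a server before launching the reasoning evaluation roughly as follows (the port and served model name are placeholders; use the [SERVE_NAME] you passed to the script):

```python
import requests

base_url = "http://127.0.0.1:10010"  # first server, assuming base port 10010

# OpenAI-compatible servers (including sglang) list their served models here.
print(requests.get(f"{base_url}/v1/models", timeout=10).json())

# Minimal chat completion against the served model.
resp = requests.post(
    f"{base_url}/v1/chat/completions",
    json={
        "model": "SERVE_NAME",  # replace with the name passed to scripts/deploy.sh
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```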
```bash
torchrun --nproc_per_node=8 --master_port=23324 eval/test_ddp_reason.py \
    --ckpt_path [CKPT_PATH] \
    --vllm_model_name [SERVE_NAME] \
    --dataset [DATASET_NAME] \
    --results_file [RESULTS_JSONL_FILE] \
    --test_batch_size 4 \
    --num_beams 10 \
    --index_file .index.json \
    --test_task seqrec-rl \
    --test_prompt_ids 5
```

If you find this work helpful, please cite:
```bibtex
@misc{hong2025generativereasoningrecommendationllms,
  title={Generative Reasoning Recommendation via LLMs},
  author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
  year={2025},
  eprint={2510.20815},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.20815},
}
```