DDRO: Direct Document Relevance Optimization for Generative Information Retrieval

Official implementation of our SIGIR 2025 paper: Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval

Motivation

Generative IR models are typically trained with next-token prediction (a cross-entropy loss) over docid tokens. This objective works well for language modeling, but it optimizes token-level generation rather than document-level ranking, which is the core requirement of an IR system.

DDRO addresses this misalignment by directly optimizing the model for document-level ranking using pairwise preference learning, without reinforcement learning or reward modeling.

Method

DDRO trains in two phases:

Phase 1 — Supervised Fine-Tuning (SFT)

The model learns to generate the correct docid sequence for a given query via autoregressive next-token prediction across three stages:

Pretraining — document content to docid (doc → docid)
Search pretraining — pseudo queries to docid (pseudoquery → docid)
Fine-tuning — real queries to docid using qrels supervision (query → docid)
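The three SFT stages share one seq2seq format and differ only in the source text paired with each target docid. A minimal sketch of assembling the three sets of (source, docid) pairs, assuming hypothetical field names (`doc_id`, `docid`, `text`, `pseudo_queries`) and a hypothetical `build_sft_pairs` helper that is not part of the repository; docids are shown as strings such as `"12-4-7"` purely for illustration:

```python
from typing import Dict, List, Tuple

# Hypothetical document record; field names are illustrative only.
Doc = Dict[str, object]


def build_sft_pairs(
    docs: List[Doc],
    queries: Dict[str, str],
    qrels: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Return (source_text, target_docid) pairs for each SFT stage."""
    docid_of = {d["doc_id"]: d["docid"] for d in docs}
    stages: Dict[str, List[Tuple[str, str]]] = {
        "pretrain": [],         # document content -> docid
        "search_pretrain": [],  # pseudo query -> docid
        "finetune": [],         # real query -> docid, supervised by qrels
    }
    for d in docs:
        stages["pretrain"].append((d["text"], d["docid"]))
        for pseudo_q in d.get("pseudo_queries", []):
            stages["search_pretrain"].append((pseudo_q, d["docid"]))
    for qid, rel_doc_ids in qrels.items():
        for doc_id in rel_doc_ids:
            # Skip judged documents that are not in the indexed collection.
            if doc_id in docid_of:
                stages["finetune"].append((queries[qid], docid_of[doc_id]))
    return stages
```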

Phase 2 — Pairwise Ranking Optimization (DDRO Loss)

The model is fine-tuned with a pairwise learning-to-rank objective inspired by Direct Preference Optimization (Rafailov et al., 2023), adapted for structured docid generation under beam decoding constraints.

The DDRO loss trains the model to prefer relevant documents (docid+) over non-relevant ones (docid-), measured relative to a frozen SFT reference policy. It uses the following notation:

| Symbol | Description |
|---|---|
| `docid+` | Relevant document for query `q` |
| `docid-` | Non-relevant document |
| `π_θ` | Current model being optimized |
| `π_ref` | Frozen SFT reference model |
| `β` | Temperature controlling preference sensitivity |
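Assuming the loss takes the standard DPO form of Rafailov et al. (2023), with the sequence-level probability of a docid in place of a free-form response, it can be written as follows, where $\mathcal{D}$ is a set of $(q, \mathrm{docid}^{+}, \mathrm{docid}^{-})$ triples and $\sigma$ is the logistic sigmoid:

$$
\mathcal{L}_{\mathrm{DDRO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(q,\, \mathrm{docid}^{+},\, \mathrm{docid}^{-}) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(\mathrm{docid}^{+} \mid q)}{\pi_{\mathrm{ref}}(\mathrm{docid}^{+} \mid q)} - \beta \log \frac{\pi_\theta(\mathrm{docid}^{-} \mid q)}{\pi_{\mathrm{ref}}(\mathrm{docid}^{-} \mid q)} \right) \right]
$$

Minimizing this pushes the policy to raise the likelihood of docid+ and lower that of docid-, each measured against the reference model, with β scaling how sharply the margin is rewarded.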

Why DDRO differs from standard DPO

| | DPO | DDRO |
|---|---|---|
| Architecture | Decoder-only | Encoder-decoder |
| Output | Free-form text | Structured docid sequences |
| Decoding | Greedy/sampling | Constrained beam search |
| Objective | Open-ended preference | Document-level ranking |
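Because DDRO scores structured docid sequences from an encoder-decoder, the sequence probability π(docid | q) is the product of per-token decoder probabilities, i.e. a sum in log space. A minimal pure-Python sketch of a DPO-style pairwise loss for a single (q, docid+, docid-) triple; the function names and the `beta=0.1` default are illustrative assumptions, not the repository's code or the paper's setting:

```python
import math
from typing import List


def sequence_logprob(token_logprobs: List[float]) -> float:
    """log pi(docid | q) for an autoregressive decoder: sum of per-token log-probs."""
    return sum(token_logprobs)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(
    policy_pos: float, policy_neg: float,
    ref_pos: float, ref_neg: float,
    beta: float = 0.1,  # illustrative value only
) -> float:
    """DPO-style loss for one triple, given sequence-level log-probabilities."""
    # Log-ratio of policy to reference for docid+ minus the same for docid-.
    margin = beta * ((policy_pos - ref_pos) - (policy_neg - ref_neg))
    return -log_sigmoid(margin)
```

When the policy still equals the reference, the margin is zero and the loss is log 2; it falls below that as the policy shifts probability toward docid+.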

Project Structure

```
src/
├── data/          # Downloading, preprocessing, and docid instance generation
├── pretrain/      # Model training and evaluation
├── scripts/       # Shell scripts for SFT, DDRO, BM25, and preprocessing
└── utils/         # Tokenization, trie, metrics, trainers

ddro_env.yml       # Conda environment for training
pyserini.yml       # Conda environment for BM25 retrieval
requirements.txt   # Python dependencies
```

Each subdirectory contains a detailed README.md with further instructions.
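`utils/` includes a trie, and evaluation decodes docids with constrained beam search. A common way to constrain decoding, shown here as a sketch rather than the repository's implementation, is a prefix trie over tokenized docids that returns the valid next tokens for any partial sequence. The `DocidTrie` class and the toy token ids are hypothetical, and the sketch assumes each docid sequence already ends with the decoder's EOS id:

```python
from typing import Dict, List, Sequence


class DocidTrie:
    """Prefix trie over tokenized docids (hypothetical helper, for illustration)."""

    def __init__(self, docids: Sequence[Sequence[int]]):
        self.root: Dict[int, dict] = {}
        for seq in docids:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Token ids that extend `prefix` toward at least one valid docid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # prefix has already left the set of valid docids
            node = node[tok]
        return sorted(node.keys())
```

With Hugging Face `transformers`, a function like `allowed_next` can be wrapped as the `prefix_allowed_tokens_fn(batch_id, input_ids)` callback of `model.generate`; for an encoder-decoder such as T5 the callback sees decoder ids that begin with the decoder start token (the pad id 0 for T5), which would need to be dropped before the trie lookup.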

Setup

1. Install environment

```bash
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
```

2. Download datasets and pretrained model

We use MS MARCO (top-300K) and Natural Questions (NQ-320K), plus a pretrained T5-base model.

```bash
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
```

See src/data/download/README.md for details.

Data Preparation

MS MARCO — sample top-300K subset

```bash
bash scripts/preprocess/sample_top_docs.sh
```

Output: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz

Expected directory structure

```
resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/
│   │   └── nq-data/
│   └── processed/
└── transformer_models/
    └── t5-base/
```

For full preprocessing instructions (docid generation and creation of training and evaluation instances), see src/data/data_prep/README.md.

Training

Phase 1 — SFT

Run all three SFT stages with a single command:

```bash
bash src/scripts/sft/launch_SFT_training.sh
```

The --encoding flag selects the docid format: pq (product-quantization docids) or url_title (title+URL docids).
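As a rough illustration of the general idea behind product-quantization docids (not the repository's encoder), a document embedding is split into equal sub-vectors and each sub-vector is replaced by the index of its nearest centroid, giving one docid token per subspace. The sketch assumes the codebooks were already learned, e.g. by k-means over document embeddings; `pq_encode` and the toy codebooks are hypothetical:

```python
from typing import List, Sequence

Vector = Sequence[float]


def pq_encode(embedding: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Map a document embedding to a PQ code: one centroid index per subspace.

    codebooks[i] holds the centroids for the i-th sub-vector; every centroid
    has dimension len(embedding) // len(codebooks).
    """
    m = len(codebooks)
    if len(embedding) % m != 0:
        raise ValueError("embedding dimension must be divisible by the number of codebooks")
    sub_dim = len(embedding) // m
    codes = []
    for i, centroids in enumerate(codebooks):
        sub = embedding[i * sub_dim:(i + 1) * sub_dim]
        # Nearest centroid by squared Euclidean distance.
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in centroids]
        codes.append(min(range(len(centroids)), key=dists.__getitem__))
    return codes
```

url_title docids, by contrast, are built from each document's URL and title text rather than from its embedding.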

Phase 2 — DDRO

After SFT, run pairwise ranking optimization:

```bash
bash scripts/ddro/slurm_submit_ddro_training.sh
```

DDRO is implemented as a customized version of the DPOTrainer from Hugging Face's TRL library.
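TRL's DPOTrainer consumes preference data as rows with `prompt`, `chosen`, and `rejected` fields. A sketch of turning relevance judgments and a ranked candidate list into such rows, assuming docids are used directly as the chosen/rejected targets and that negatives come from a first-stage ranking such as BM25; `build_preference_rows` is a hypothetical helper, and the repository's custom trainer may expect a different format:

```python
import random
from typing import Dict, List


def build_preference_rows(
    queries: Dict[str, str],
    positives: Dict[str, List[str]],
    candidates: Dict[str, List[str]],
    n_neg: int = 1,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Build prompt/chosen/rejected rows, one per (query, docid+, docid-) triple.

    positives[qid]: docids judged relevant (e.g. from qrels).
    candidates[qid]: ranked candidate docids (e.g. from BM25); relevant ones are skipped.
    """
    rng = random.Random(seed)
    rows = []
    for qid, query in queries.items():
        pos = positives.get(qid, [])
        negs = [d for d in candidates.get(qid, []) if d not in pos]
        if not pos or not negs:
            continue  # no usable preference pair for this query
        for p in pos:
            for n in rng.sample(negs, min(n_neg, len(negs))):
                rows.append({"prompt": query, "chosen": p, "rejected": n})
    return rows
```

The resulting list can be wrapped with `datasets.Dataset.from_list(rows)` before it is handed to a trainer.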

Evaluation

Option A — Quick evaluation via launcher (recommended)

```bash
# SLURM
sbatch src/pretrain/hf_eval/slurm_submit_hf_eval.sh

# Direct
python src/pretrain/hf_eval/launch_hf_eval_from_config.py \
  --dataset msmarco \
  --encoding pq \
  --scale top_300k \
  --hf_docids_repo kiyam/ddro-docids \
  --hf_tests_repo  kiyam/ddro-testsets
```

Option B — Manual evaluation with HF URIs

NQ + Title+URL

```bash
python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
  --pretrain_model_path kiyam/ddro-nq-tu \
  --docid_path "hf:dataset:kiyam/ddro-docids:tu_nq_docids.txt" \
  --test_file_path "hf:dataset:kiyam/ddro-testsets:nq/test_data/query_dev.t5_128_1.url_title_nq.json" \
  --dataset_script_dir src/data/data_scripts \
  --dataset_cache_dir ./cache \
  --num_beams 50 --add_doc_num 6144 \
  --max_seq_length 64 --max_docid_length 100 \
  --use_docid_rank True --docid_format nq \
  --lookup_fallback True --device cuda:0
```

MS MARCO + PQ

```bash
python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
  --pretrain_model_path kiyam/ddro-msmarco-pq \
  --docid_path "hf:dataset:kiyam/ddro-docids:pq_msmarco_docids.txt" \
  --test_file_path "hf:dataset:kiyam/ddro-testsets:msmarco/test_data_top_300k/query_dev.t5_128_1.pq.top_300k.json" \
  --dataset_script_dir src/data/data_scripts \
  --dataset_cache_dir ./cache \
  --num_beams 80 --add_doc_num 6144 \
  --max_seq_length 64 --max_docid_length 24 \
  --use_docid_rank True --docid_format msmarco \
  --lookup_fallback True --device cuda:0
```
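The `--docid_path` and `--test_file_path` values in these commands follow an `hf:<repo_type>:<repo_id>:<path_in_repo>` pattern. Inferring the format from these examples only (the evaluation script's own resolver may behave differently), such a URI could be resolved with `huggingface_hub.hf_hub_download`; `parse_hf_uri` and `fetch` are hypothetical helpers:

```python
from typing import NamedTuple


class HfUri(NamedTuple):
    repo_type: str  # e.g. "dataset"
    repo_id: str    # e.g. "kiyam/ddro-docids"
    filename: str   # path inside the repo


def parse_hf_uri(uri: str) -> HfUri:
    """Split 'hf:<repo_type>:<repo_id>:<path>' into its parts."""
    parts = uri.split(":", 3)  # the path itself may contain further ':' characters
    if len(parts) != 4 or parts[0] != "hf":
        raise ValueError(f"not an hf: URI: {uri!r}")
    _, repo_type, repo_id, filename = parts
    return HfUri(repo_type, repo_id, filename)


def fetch(uri: str) -> str:
    """Download the referenced file and return its local path (requires network)."""
    from huggingface_hub import hf_hub_download  # imported lazily

    p = parse_hf_uri(uri)
    return hf_hub_download(repo_id=p.repo_id, filename=p.filename, repo_type=p.repo_type)
```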

Results

| Dataset | Docid | Model | MRR@10 | R@10 |
|---|---|---|---:|---:|
| MS MARCO | PQ | `kiyam/ddro-msmarco-pq` | 45.76 | 73.02 |
| MS MARCO | TU | `kiyam/ddro-msmarco-tu` | 50.07 | 74.01 |
| NQ | PQ | `kiyam/ddro-nq-pq` | 55.51 | 67.31 |
| NQ | TU | `kiyam/ddro-nq-tu` | 45.99 | 55.98 |
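MRR@10 and R@10 are the two metrics reported in this table, which appears to scale both by 100. A minimal sketch of their usual definitions (not the metrics module in `utils/`), assuming each query maps to a ranked list of docids and a set of relevant docids, and averaging both over all ranked queries:

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant docid within the top k."""
    total = 0.0
    for qid, docids in ranked.items():
        for rank, d in enumerate(docids[:k], start=1):
            if d in relevant.get(qid, set()):
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked) if ranked else 0.0


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Fraction of each query's relevant docids found in the top k, averaged over queries."""
    total = 0.0
    for qid, docids in ranked.items():
        rel = relevant.get(qid, set())
        if rel:
            total += len(rel & set(docids[:k])) / len(rel)
    return total / len(ranked) if ranked else 0.0
```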

Notes

Recommended: transformers==4.37.2, tokenizers==0.15.2
NQ-PQ uses canonical integer docids, while NQ-TU uses lowercased url_title strings; do not mix assets across sources
Default beam counts: NQ-PQ (100), NQ-TU (50), MS MARCO-PQ (80)
Logs are saved to logs/<dataset>/dpo_*.log and logs/<dataset>/dpo_*.csv

Datasets and Checkpoints

All preprocessed datasets, docid encodings, and model checkpoints are available in the DDRO Generative IR Collection on Hugging Face.

| Resource | Link |
|---|---|
| MS MARCO Top-300K dataset | `kiyam/ddro-ms-dataset` |
| NQ-320K dataset | `kiyam/ddro-nq-dataset` |
| DocID tables | `kiyam/ddro-docids` |
| Eval test sets | `kiyam/ddro-testsets` |

Acknowledgments

License

This project is licensed under the Apache 2.0 License.

Citation

```bibtex
@inproceedings{mekonnen2025lightweight,
  title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
  author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={1327--1338},
  year={2025}
}
```

For questions, please open an issue.
