DDRO: Direct Document Relevance Optimization for Generative Information Retrieval

Official implementation of our SIGIR 2025 paper: Lightweight and Direct Document Relevance Optimization for Generative IR


Motivation

Generative IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens. While effective for language modeling, this objective optimizes token-level generation, not document-level ranking, which is the core requirement in IR systems.

DDRO addresses this misalignment by directly optimizing the model for document-level ranking using pairwise preference learning, without reinforcement learning or reward modeling.


Method

DDRO trains in two phases:

[Figure: DDRO training pipeline overview]

Phase 1 — Supervised Fine-Tuning (SFT)

The model learns to generate the correct docid sequence for a given query via autoregressive next-token prediction across three stages:

  1. Pretraining — document content to docid (doc → docid)
  2. Search pretraining — pseudo queries to docid (pseudoquery → docid)
  3. Fine-tuning — real queries to docid using qrels supervision (query → docid)

SFT loss

SFT objective
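
The rendered equation is not reproduced in this text. As a hedged reconstruction, the standard autoregressive cross-entropy objective described above can be written as follows, where the notation is an assumption: x is the input for the current stage (document content, pseudo query, or real query) and d_1, ..., d_T are the tokens of the target docid.

```latex
\mathcal{L}_{\text{SFT}}(\theta)
  = -\sum_{t=1}^{T} \log \pi_\theta\left(d_t \mid d_{<t},\, x\right)
```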

Phase 2 — Pairwise Ranking Optimization (DDRO Loss)

The model is fine-tuned with a pairwise learning-to-rank objective inspired by Direct Preference Optimization (Rafailov et al., 2023), adapted for structured docid generation under beam decoding constraints.

DDRO loss

DDRO objective
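
The rendered equation is not reproduced in this text. A hedged reconstruction, assuming the objective keeps the standard DPO pairwise form of Rafailov et al. (2023) with docids in place of free-form responses, is:

```latex
\mathcal{L}_{\text{DDRO}}(\pi_\theta; \pi_{\text{ref}})
  = -\mathbb{E}_{(q,\, \text{docid}^{+},\, \text{docid}^{-})}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(\text{docid}^{+} \mid q)}{\pi_{\text{ref}}(\text{docid}^{+} \mid q)}
      - \beta \log \frac{\pi_\theta(\text{docid}^{-} \mid q)}{\pi_{\text{ref}}(\text{docid}^{-} \mid q)}
    \right) \right]
```

Here σ is the logistic sigmoid; the expectation over (query, relevant docid, non-relevant docid) triples is an assumption about how the training pairs are sampled.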

The DDRO loss trains the model to prefer relevant documents (docid+) over non-relevant ones (docid-) relative to a frozen SFT reference policy:

| Symbol | Description |
|--------|-------------|
| docid+ | Relevant document for query q |
| docid- | Non-relevant document |
| π_θ | Current model being optimized |
| π_ref | Frozen SFT reference model |
| β | Temperature controlling preference sensitivity |
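
To make the role of each symbol concrete, here is a minimal sketch of a DPO-style pairwise loss for a single (query, docid+, docid-) triple. It is not the repository's trainer code: the function names are hypothetical, the loss form is assumed to follow standard DPO, and each log-probability is assumed to be the sum of token log-probabilities of the full docid sequence given the query.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(
    logp_pos: float,      # log pi_theta(docid+ | q), current model
    logp_neg: float,      # log pi_theta(docid- | q), current model
    ref_logp_pos: float,  # log pi_ref(docid+ | q), frozen SFT reference
    ref_logp_neg: float,  # log pi_ref(docid- | q), frozen SFT reference
    beta: float = 0.1,    # illustrative default, not taken from the paper
) -> float:
    """DPO-style loss (assumed form) for one preference pair."""
    # How much more the current model favors docid+ over docid-
    # than the reference model does.
    margin = (logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg)
    return -log_sigmoid(beta * margin)
```

When the current model matches the reference, the margin is zero and the loss equals log 2; the loss falls below that value as the model shifts probability toward docid+ relative to the reference.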

Why DDRO differs from standard DPO

| | DPO | DDRO |
|---|-----|------|
| Architecture | Decoder-only | Encoder-decoder |
| Output | Free-form text | Structured docid sequences |
| Decoding | Greedy/sampling | Constrained beam search |
| Objective | Open-ended preference | Document-level ranking |
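
Constrained beam search is commonly implemented with a prefix trie over the valid docid token sequences, so that each decoding step can only extend a prefix toward an existing docid. The sketch below illustrates that idea only; the class and method names are hypothetical and the repository's own trie implementation may differ.

```python
from typing import Dict, Iterable, List, Sequence


class DocidTrie:
    """Prefix trie over docid token-id sequences (illustrative sketch)."""

    def __init__(self, docids: Iterable[Sequence[int]]) -> None:
        self.root: Dict[int, dict] = {}
        for seq in docids:
            node = self.root
            for tok in seq:
                # Create the child node on first sight of this token.
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Return token ids that extend `prefix` toward a valid docid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # prefix matches no known docid
            node = node[tok]
        return sorted(node)
```

One possible way to plug such a trie into decoding is the `prefix_allowed_tokens_fn` argument of Hugging Face `generate`, which is called with the batch index and the tokens generated so far and returns the permitted next token ids. Whether the repository uses that hook is not stated here.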

Project Structure

src/
├── data/          # Downloading, preprocessing, and docid instance generation
├── pretrain/      # Model training and evaluation
├── scripts/       # Shell scripts for SFT, DDRO, BM25, and preprocessing
└── utils/         # Tokenization, trie, metrics, trainers

ddro_env.yml       # Conda environment for training
pyserini.yml       # Conda environment for BM25 retrieval
requirements.txt   # Python dependencies

Each subdirectory contains a detailed README.md with further instructions.


Setup

1. Install environment

git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env

2. Download datasets and pretrained model

We use MS MARCO (top-300K) and Natural Questions (NQ-320K), plus a pretrained T5-base model.

bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py

See src/data/download/README.md for details.


Data Preparation

MS MARCO — sample top-300K subset

bash src/scripts/preprocess/sample_top_docs.sh

Output: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz

Expected directory structure

resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/
│   │   └── nq-data/
│   └── processed/
└── transformer_models/
    └── t5-base/
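
Before moving on to preprocessing, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository; the helper name `missing_resource_dirs` is hypothetical and the directory list is copied from the tree above.

```python
from pathlib import Path
from typing import List

# Directories from the expected layout, relative to resources/.
EXPECTED_DIRS = [
    "datasets/raw/msmarco-data",
    "datasets/raw/nq-data",
    "datasets/processed",
    "transformer_models/t5-base",
]


def missing_resource_dirs(root: str = "resources") -> List[str]:
    """Return the expected subdirectories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

Calling `missing_resource_dirs()` from the repository root returns an empty list when every expected directory exists.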

For full preprocessing instructions (docid generation, training/eval instance creation): src/data/data_prep/README.md


Training

Phase 1 — SFT

Run all three SFT stages with a single command:

bash src/scripts/sft/launch_SFT_training.sh

The --encoding flag controls the docid format (pq or url_title).

Phase 2 — DDRO

After SFT, run pairwise ranking optimization:

bash src/scripts/ddro/slurm_submit_ddro_training.sh

DDRO is implemented using a custom version of HuggingFace's DPOTrainer.


Evaluation

Option A — Quick evaluation via launcher (recommended)

# SLURM
sbatch src/pretrain/hf_eval/slurm_submit_hf_eval.sh

# Direct
python src/pretrain/hf_eval/launch_hf_eval_from_config.py \
  --dataset msmarco \
  --encoding pq \
  --scale top_300k \
  --hf_docids_repo kiyam/ddro-docids \
  --hf_tests_repo  kiyam/ddro-testsets

Option B — Manual evaluation with HF URIs

NQ + Title+URL (TU)

python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
  --pretrain_model_path kiyam/ddro-nq-tu \
  --docid_path "hf:dataset:kiyam/ddro-docids:tu_nq_docids.txt" \
  --test_file_path "hf:dataset:kiyam/ddro-testsets:nq/test_data/query_dev.t5_128_1.url_title_nq.json" \
  --dataset_script_dir src/data/data_scripts \
  --dataset_cache_dir ./cache \
  --num_beams 50 --add_doc_num 6144 \
  --max_seq_length 64 --max_docid_length 100 \
  --use_docid_rank True --docid_format nq \
  --lookup_fallback True --device cuda:0

MS MARCO + PQ

python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
  --pretrain_model_path kiyam/ddro-msmarco-pq \
  --docid_path "hf:dataset:kiyam/ddro-docids:pq_msmarco_docids.txt" \
  --test_file_path "hf:dataset:kiyam/ddro-testsets:msmarco/test_data_top_300k/query_dev.t5_128_1.pq.top_300k.json" \
  --dataset_script_dir src/data/data_scripts \
  --dataset_cache_dir ./cache \
  --num_beams 80 --add_doc_num 6144 \
  --max_seq_length 64 --max_docid_length 24 \
  --use_docid_rank True --docid_format msmarco \
  --lookup_fallback True --device cuda:0
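
The `hf:dataset:<repo>:<path>` values passed to `--docid_path` and `--test_file_path` appear to pack a repository type, a repository id, and a file path into one string. The sketch below shows one way to split such a URI; the field interpretation is inferred from the examples above, the names `HfUri` and `parse_hf_uri` are hypothetical, and the evaluation script's own parsing may differ.

```python
from typing import NamedTuple


class HfUri(NamedTuple):
    repo_type: str  # e.g. "dataset"
    repo_id: str    # e.g. "kiyam/ddro-docids"
    filename: str   # path of the file inside the repository


def parse_hf_uri(uri: str) -> HfUri:
    """Split 'hf:<repo_type>:<repo_id>:<filename>' into its parts."""
    # Split at most three times so any further colons stay in the filename.
    parts = uri.split(":", 3)
    if len(parts) != 4 or parts[0] != "hf" or not all(parts[1:]):
        raise ValueError(f"not an hf URI: {uri!r}")
    _, repo_type, repo_id, filename = parts
    return HfUri(repo_type, repo_id, filename)
```

The three fields line up with the `repo_type`, `repo_id`, and `filename` arguments of `huggingface_hub.hf_hub_download`, which is one way to fetch the referenced file manually.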

Results

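| Dataset | Docid | Model | MRR@10 | R@10 |
|---------|-------|-------|--------|------|
| MS MARCO | PQ | kiyam/ddro-msmarco-pq | 45.76 | 73.02 |
| MS MARCO | TU | kiyam/ddro-msmarco-tu | 50.07 | 74.01 |
| NQ | PQ | kiyam/ddro-nq-pq | 55.51 | 67.31 |
| NQ | TU | kiyam/ddro-nq-tu | 45.99 | 55.98 |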
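
For readers unfamiliar with the two metrics, here is a minimal sketch of their standard definitions. It is not the repository's metrics code: the function names and input shapes are assumptions, and values are returned as fractions in [0, 1], whereas the table appears to report percentages.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant docid within the top k."""
    scores = []
    for qid, rel in relevant.items():
        if not rel:
            continue  # skip queries without relevance judgments
        score = 0.0
        for rank, docid in enumerate(ranked.get(qid, [])[:k], start=1):
            if docid in rel:
                score = 1.0 / rank
                break
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean fraction of relevant docids retrieved within the top k."""
    scores = []
    for qid, rel in relevant.items():
        if not rel:
            continue
        hits = len(rel.intersection(ranked.get(qid, [])[:k]))
        scores.append(hits / len(rel))
    return sum(scores) / len(scores) if scores else 0.0
```

With a single relevant document per query, R@10 reduces to the fraction of queries whose relevant docid appears among the top 10 generated docids.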

Notes

  • Recommended: transformers==4.37.2, tokenizers==0.15.2
  • NQ–PQ uses canonical integer docids; NQ–TU uses lowercased url_title strings — do not mix assets across sources
  • Default beam counts: NQ-PQ (100), NQ-TU (50), MS MARCO-PQ (80)
  • Logs saved to logs/<dataset>/dpo_*.log and logs/<dataset>/dpo_*.csv

Datasets and Checkpoints

All preprocessed datasets, docid encodings, and model checkpoints: DDRO Generative IR Collection on Hugging Face

| Resource | Link |
|----------|------|
| MS MARCO Top-300K dataset | kiyam/ddro-ms-dataset |
| NQ-320K dataset | kiyam/ddro-nq-dataset |
| DocID tables | kiyam/ddro-docids |
| Eval test sets | kiyam/ddro-testsets |

Acknowledgments


License

This project is licensed under the Apache 2.0 License.


Citation

@inproceedings{mekonnen2025lightweight,
  title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
  author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={1327--1338},
  year={2025}
}

For questions, please open an issue.

© 2025 Kidist Amde Mekonnen — IRLab, University of Amsterdam

