DRPO4LLM/DRPO4LLM

Doubly Robust Preference Optimization

This is the codebase for Doubly Robust Alignment for Large Language Models (DRPO).

Double robustness: DRPO requires only that either the reference policy or the preference model be correctly specified, not both.

Quickstart

git clone https://github.com/DRPO4LLM/DRPO4LLM.git && cd DRPO4LLM
pip install -r requirements.txt

Before running, configure your policy model (reference policy model), auxiliary preference model, dataset, and other hyperparameters in config.yaml or drpo.py. Then launch one of the examples:

python ./examples/tldr/drpo.py   # or: python ./examples/hh/drpo.py

A typical dataset record should take one of the following two forms:

# Standard (plain-text) format
dataset = {"prompt": "The sky is",
           "a1": " blue.",
           "a2": " green.",
           "rank": 1}

or

# Conversational format
dataset = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
           "a1": [{"role": "assistant", "content": "It is blue."}],
           "a2": [{"role": "assistant", "content": "It is green."}],
           "rank": 1}
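
To catch malformed records before training, a small check like the one below can help. It is a minimal sketch and not part of this codebase: `record_format` and `is_message_list` are hypothetical helper names, and the check covers only what the two examples above show (the four keys, and strings versus role/content message lists). The README does not define the meaning of `rank`, so the sketch only verifies that it is an integer.

```python
from typing import Any

# Keys that both example formats above contain.
REQUIRED_KEYS = ("prompt", "a1", "a2", "rank")


def is_message_list(value: Any) -> bool:
    """Return True if value is a non-empty list of {"role", "content"} dicts."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(m, dict)
            and isinstance(m.get("role"), str)
            and isinstance(m.get("content"), str)
            for m in value
        )
    )


def record_format(record: dict) -> str:
    """Classify a record as "standard" or "conversational", else raise ValueError.

    Hypothetical helper: it encodes only the structure visible in the README
    examples, not any rule enforced by the actual training scripts.
    """
    missing = [k for k in REQUIRED_KEYS if k not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    rank = record["rank"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rank, int) or isinstance(rank, bool):
        raise ValueError("rank must be an integer")
    fields = [record[k] for k in ("prompt", "a1", "a2")]
    if all(isinstance(f, str) for f in fields):
        return "standard"
    if all(is_message_list(f) for f in fields):
        return "conversational"
    raise ValueError("prompt, a1, and a2 must all be strings or all be message lists")
```

Calling `record_format` on each record while loading the dataset surfaces mixed or incomplete records early, rather than partway through a training run.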
