Reinforcement Learning from Human Feedback

A short introduction to RLHF and post-training focused on language models.

Nathan Lambert

RLHF Model Completions Library

Explore completions from instruction-tuned models and their downstream RLHF counterparts.

The models used here are OLMo and Tülu models sourced from open-source post-training pipelines at the Allen Institute for AI. Intermediate models like these are rarely released at all. For each of the 18 models (9 SFT/RLHF pairs), we generated three completions for each prompt in a static set of 16 prompts.

Feel free to use these examples for talks and other educational purposes. Please cite the book and the authors of the models. The data is available here and is licensed under ODC-BY.

Citation

If you found this useful for your research, please cite it!

@book{rlhf2026lambert,
  author = {Nathan Lambert},
  title = {Reinforcement Learning from Human Feedback},
  year = {2026},
  publisher = {Online},
  url = {https://rlhfbook.com}
}
