A short introduction to RLHF and post-training focused on language models.
Explore completions from instruction-tuned models and their downstream RLHF counterparts.
The models used here are OLMo and Tülu models from open-source post-training pipelines at the Allen Institute for AI. It is rare for intermediate models to be released at all. For each of the 18 models (9 SFT/RLHF pairs), we generated three completions for each prompt in a fixed set of 16 prompts.
Feel free to use these examples for talks and other educational purposes. Please cite the book and the authors of the models. The data is available here and is licensed under ODC-BY.
If you found this useful for your research, please cite it!
@book{rlhf2026lambert,
  author = {Nathan Lambert},
  title = {Reinforcement Learning from Human Feedback},
  year = {2026},
  publisher = {Online},
  url = {https://rlhfbook.com}
}