Align KTO with DPO: Align compute_ref_log_probs by albertvillanova · Pull Request #5852 · huggingface/trl

albertvillanova · 2026-05-26T16:27:10Z

Align KTO with DPO: Align compute_ref_log_probs.

Part of:

KTO refactoring #4786

Changes

Refactoring and Naming Consistency:

Renamed the method compute_reference_log_probs to compute_ref_log_probs and updated all internal calls accordingly for clarity and consistency.

Code Simplification and Parameter Handling:

Changed the method parameter from padded_batch to inputs in compute_ref_log_probs, and updated all usages from padded_batch[...] to inputs[...] to streamline the function interface.
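
A minimal sketch of what the two renames mean for callers. The classes, placeholder bodies, and batch keys below are hypothetical and are not the TRL implementation; only the method and parameter names come from this PR. Positional calls are unaffected, while any caller that passed the batch by the old keyword name would need updating.

```python
class BeforeSketch:
    def compute_reference_log_probs(self, padded_batch: dict) -> dict:
        # Placeholder body: report which entries the method would read.
        return {"keys_read": sorted(padded_batch)}


class AfterSketch:
    def compute_ref_log_probs(self, inputs):
        # Same placeholder body; only the method and parameter names changed.
        return {"keys_read": sorted(inputs)}


# Hypothetical batch keys, used only for illustration.
batch = {"example_input_ids": [[1, 2, 3]], "example_attention_mask": [[1, 1, 1]]}

before = BeforeSketch().compute_reference_log_probs(batch)
after = AfterSketch().compute_ref_log_probs(batch)  # positional call still works

try:
    AfterSketch().compute_ref_log_probs(padded_batch=batch)
    keyword_call_ok = True
except TypeError:
    keyword_call_ok = False  # the old keyword name is no longer accepted
```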

Note

Low Risk
Rename-only refactor in experimental KTO with no training or loss logic changes.

Overview
Renames compute_reference_log_probs to compute_ref_log_probs in the experimental KTO trainer and updates the precompute path to call the new name, matching the DPO trainer API.

The method now takes inputs instead of padded_batch; all tensor lookups use inputs[...]. Behavior is unchanged: the reference completion (and optional KL) forward passes and the get_batch_logps calls are the same.
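
The body of get_batch_logps is not shown in this PR. As a rough, dependency-free sketch of what a per-sequence log-probability helper of this kind typically computes, the example below sums the log-probabilities of the label tokens and skips masked positions. The function names, the plain-list inputs, and the ignore value of -100 are assumptions for illustration; the real helper operates on tensors, and details such as label shifting are omitted here.

```python
import math


def log_softmax(row):
    # Numerically stable log-softmax over one vocabulary row.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def batch_logps_sketch(logits, labels, ignore_index=-100):
    """Sum label-token log-probabilities per sequence, skipping ignore_index.

    Hypothetical stand-in, not the TRL function.
    """
    totals = []
    for seq_logits, seq_labels in zip(logits, labels):
        total = 0.0
        for row, label in zip(seq_logits, seq_labels):
            if label == ignore_index:
                continue  # masked position (e.g. prompt or padding)
            total += log_softmax(row)[label]
        totals.append(total)
    return totals


# One sequence, two positions, vocabulary of size 2; the second position is masked.
logps = batch_logps_sketch([[[0.0, 0.0], [0.0, 0.0]]], [[1, -100]])
```

With uniform logits over two tokens, the single unmasked position contributes log(0.5) to the sequence total.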

Reviewed by Cursor Bugbot for commit 8959f80. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

cursor · 2026-05-26T16:29:12Z

-    def compute_reference_log_probs(self, padded_batch: dict) -> dict:
-        """Computes log probabilities of the reference model for a single padded batch of a KTO specific dataset."""
+    def compute_ref_log_probs(self, inputs):
+        """Computes reference log probabilities for a single padded batch."""


Rename not propagated to BCO trainer

Low Severity

The rename from compute_reference_log_probs to compute_ref_log_probs was applied only to the KTO trainer, but the BCO trainer still uses the old name compute_reference_log_probs with the old parameter name padded_batch. The project's AGENTS.md rule states that when modifying duplicated code across trainers, the same change must be applied to all other trainers, and "not propagating a change is a bug."

Triggered by project rule: ../.ai/AGENTS.md
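
One way a missed rename like this could be caught mechanically is to scan source text for definitions that still use the old name. The sketch below is a hypothetical helper, not part of the repository's tooling, and the file names and file contents are made up for illustration.

```python
import re


def find_stale_definitions(sources, old_name):
    """Return the names of sources that still define a function called old_name."""
    pattern = re.compile(rf"\bdef\s+{re.escape(old_name)}\s*\(")
    return sorted(name for name, text in sources.items() if pattern.search(text))


# Hypothetical file contents, not the real trainer modules.
sources = {
    "kto_sketch.py": "def compute_ref_log_probs(self, inputs):\n    ...\n",
    "bco_sketch.py": "def compute_reference_log_probs(self, padded_batch):\n    ...\n",
}

stale = find_stale_definitions(sources, "compute_reference_log_probs")
```

Here stale lists only the sketch file that still carries the old method name.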

albertvillanova added 4 commits May 26, 2026 18:20

Rename compute_reference_log_probs to compute_ref_log_probs

d526d71

Rename padded_batch to inputs

049fce5

Remove type hints

9102960

Align docstring

8959f80

cursor Bot reviewed May 26, 2026

qgallouedec approved these changes May 26, 2026

albertvillanova merged commit a25c07e into main May 26, 2026
6 checks passed

albertvillanova deleted the align-kto-dpo-compute_reference_log_probs branch May 26, 2026 20:35
