Align KTO with DPO: Align compute_ref_log_probs #5852
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8959f80.
-     def compute_reference_log_probs(self, padded_batch: dict) -> dict:
-         """Computes log probabilities of the reference model for a single padded batch of a KTO specific dataset."""
+     def compute_ref_log_probs(self, inputs):
+         """Computes reference log probabilities for a single padded batch."""
Rename not propagated to BCO trainer
Low Severity
The rename from compute_reference_log_probs to compute_ref_log_probs was applied only to the KTO trainer, but the BCO trainer still uses the old name compute_reference_log_probs with the old parameter name padded_batch. The project's AGENTS.md rule states that when modifying duplicated code across trainers, the same change must be applied to all other trainers, and "not propagating a change is a bug."
Triggered by project rule: ../.ai/AGENTS.md
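A rename that was not propagated can be caught mechanically. The following is a minimal sketch, not part of this PR or of the project's tooling: `find_stale_definitions` is a hypothetical helper, and the two classes in `SAMPLE` are simplified stand-ins for the KTO and BCO trainers rather than their real source.

```python
import ast


def find_stale_definitions(source: str, old_name: str) -> list[tuple[str, int]]:
    """Return (class_name, line_number) for every method still named old_name."""
    stale = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Only inspect methods defined directly in the class body.
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and item.name == old_name:
                stale.append((node.name, item.lineno))
    return stale


# Hypothetical, heavily simplified stand-ins for the two trainers.
SAMPLE = '''
class KTOTrainer:
    def compute_ref_log_probs(self, inputs):
        pass

class BCOTrainer:
    def compute_reference_log_probs(self, padded_batch):
        pass
'''

for class_name, line in find_stale_definitions(SAMPLE, "compute_reference_log_probs"):
    print(f"{class_name} still defines the old method name (line {line})")
```

Run against the real trainer modules, a check like this would flag any class that still defines the old method name; it would not detect the leftover `padded_batch` parameter name, which needs a separate check.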
Align KTO with DPO: Align compute_ref_log_probs.
Part of:
Changes
Refactoring and Naming Consistency:
Renamed compute_reference_log_probs to compute_ref_log_probs and updated all internal calls accordingly for clarity and consistency.
Code Simplification and Parameter Handling:
Renamed the parameter padded_batch to inputs in compute_ref_log_probs, and updated all usages from padded_batch[...] to inputs[...] to streamline the function interface.
Note
Low Risk
Rename-only refactor in experimental KTO with no training or loss logic changes.
Overview
Renames compute_reference_log_probs to compute_ref_log_probs in the experimental KTO trainer and updates the precompute path to call the new name, matching the DPO trainer API.
The method now takes inputs instead of padded_batch; all tensor lookups use inputs[...]. Behavior is unchanged: the reference completion (and optional KL) forward passes and get_batch_logps are the same.
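To illustrate why a rename like this is behavior-preserving, here is a toy sketch. It is not the trainer's implementation: `ToyTrainer`, the dictionary keys `token_log_probs` and `completion_mask`, and the deprecated alias are all hypothetical, and this PR does not add any such alias. The alias only shows one option a project could take if external callers still depended on the old name.

```python
import warnings


class ToyTrainer:
    """Hypothetical stand-in; not the real KTO trainer."""

    def compute_ref_log_probs(self, inputs):
        # Toy computation: sum the per-token log probabilities of each row
        # wherever the completion mask is 1. The keys are illustrative only.
        return [
            sum(lp for lp, keep in zip(row, mask) if keep)
            for row, mask in zip(inputs["token_log_probs"], inputs["completion_mask"])
        ]

    def compute_reference_log_probs(self, padded_batch):
        # Optional compatibility alias (an assumption, not part of this PR):
        # warn, then forward to the renamed method with the same batch.
        warnings.warn(
            "compute_reference_log_probs was renamed to compute_ref_log_probs",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.compute_ref_log_probs(padded_batch)


batch = {
    "token_log_probs": [[-0.5, -1.0, -2.0]],
    "completion_mask": [[1, 1, 0]],
}
trainer = ToyTrainer()
new_result = trainer.compute_ref_log_probs(batch)
```

Because only the method name and the parameter name change, calling either name with the same batch yields the same values.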