[GRPO] update Liger-kernel grpo loss (delta, vespo, KL bias correction)#5690
Conversation
Liger Kernel v0.8.0 ships delta (two-sided clipping), use_bias_correction_kl, sapo_temperature_pos/neg, and vespo_k_pos/lambda_pos/k_neg/lambda_neg on LigerFusedLinearGRPOLoss. Forward them from GRPOConfig in compute_liger_loss, drop the now-stale delta+use_liger_kernel guard, and bump the liger-kernel pin to >=0.8.0 in the liger and dev extras. Cover the Liger paths by parametrizing test_training_loss_types, test_training_delta_clipping, and test_training_with_bias_correction_kl over use_liger_kernel (require_liger_kernel as a conditional skip mark) instead of adding parallel test functions.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36829eb776
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
The GRPO Liger path now forwards delta, use_bias_correction_kl, sapo_temperature_*, and vespo_* kwargs that only exist on LigerFusedLinearGRPOLoss in liger-kernel>=0.8.0. Without bumping the runtime gate, an installed 0.7.x would pass is_liger_kernel_available() and then fail deep in the constructor with an opaque TypeError. Raising the floor surfaces a clear ImportError at trainer init instead. This only affects TRL's own consumers of import_utils.is_liger_kernel_available (GRPO trainer, experimental KTO, env script, require_liger_kernel test mark); DPO/SFT/GKD/Gold/Distillation pull their gate from transformers.utils and are not impacted.
|
lgtm! @codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
LGTM |
Liger Kernel v0.8.0 ships delta (two-sided clipping),
use_bias_correction_kl,sapo_temperature_pos/neg, andvespo_k_pos/lambda_pos/k_neg/lambda_negonLigerFusedLinearGRPOLoss.Forward them from
GRPOConfigincompute_liger_loss, drop the now-stale delta+use_liger_kernel guard, and bump the liger-kernel pin to >=0.8.0 in the liger and dev extras.Cover the Liger paths by parametrizing
test_training_loss_types,test_training_delta_clipping, andtest_training_with_bias_correction_kloveruse_liger_kernel(require_liger_kernelas a conditional skip mark) instead of adding parallel test functions.What does this PR do?
Fixes # (issue)
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Changes GRPO training behavior when
use_liger_kernelis enabled by wiring new loss parameters (e.g.,delta, VESPO, and KL bias correction) into the fused Liger loss and removing a previous configuration guard. Risk is mainly around correctness/regressions in Liger-backed training and test coverage depending on optional kernel availability.Overview
Updates GRPO’s Liger-kernel integration to v0.8.0. The Liger optional dependency and runtime minimum version are bumped from
0.7.0to0.8.0.When
use_liger_kernelis enabled,GRPOTrainernow forwards additional config knobs intoLigerFusedLinearGRPOLoss(includingdeltatwo-sided clipping,use_bias_correction_kl, and SAPO/VESPO parameters), and theGRPOConfigvalidation that previously rejecteddeltawith Liger is removed.Tests are adjusted to parameterize key GRPO training tests over
use_liger_kernel, increasing coverage of the Liger code path under the existingrequire_liger_kernelskip mark.Reviewed by Cursor Bugbot for commit 268e3d9. Bugbot is set up for automated code reviews on this repo. Configure here.