Skip to content

[GRPO] update Liger-kernel grpo loss (delta, vespo, KL bias correction)#5690

Merged
kashif merged 4 commits into
huggingface:mainfrom
kashif:liger-grpo-v0.8.0
May 4, 2026
Merged

[GRPO] update Liger-kernel grpo loss (delta, vespo, KL bias correction)#5690
kashif merged 4 commits into
huggingface:mainfrom
kashif:liger-grpo-v0.8.0

Conversation

@kashif

@kashif kashif commented May 1, 2026

Copy link
Copy Markdown
Collaborator

Liger Kernel v0.8.0 ships delta (two-sided clipping), use_bias_correction_kl, sapo_temperature_pos/neg, and vespo_k_pos/lambda_pos/k_neg/lambda_neg on LigerFusedLinearGRPOLoss.

Forward them from GRPOConfig in compute_liger_loss, drop the now-stale delta+use_liger_kernel guard, and bump the liger-kernel pin to >=0.8.0 in the liger and dev extras.

Cover the Liger paths by parametrizing test_training_loss_types, test_training_delta_clipping, and test_training_with_bias_correction_kl over use_liger_kernel (require_liger_kernel as a conditional skip mark) instead of adding parallel test functions.

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Medium Risk
Changes GRPO training behavior when use_liger_kernel is enabled by wiring new loss parameters (e.g., delta, VESPO, and KL bias correction) into the fused Liger loss and removing a previous configuration guard. Risk is mainly around correctness/regressions in Liger-backed training and test coverage depending on optional kernel availability.

Overview
Updates GRPO’s Liger-kernel integration to v0.8.0. The Liger optional dependency and runtime minimum version are bumped from 0.7.0 to 0.8.0.

When use_liger_kernel is enabled, GRPOTrainer now forwards additional config knobs into LigerFusedLinearGRPOLoss (including delta two-sided clipping, use_bias_correction_kl, and SAPO/VESPO parameters), and the GRPOConfig validation that previously rejected delta with Liger is removed.

Tests are adjusted to parameterize key GRPO training tests over use_liger_kernel, increasing coverage of the Liger code path under the existing require_liger_kernel skip mark.

Reviewed by Cursor Bugbot for commit 268e3d9. Bugbot is set up for automated code reviews on this repo. Configure here.

kashif added 2 commits May 1, 2026 10:41
Liger Kernel v0.8.0 ships delta (two-sided clipping), use_bias_correction_kl,
sapo_temperature_pos/neg, and vespo_k_pos/lambda_pos/k_neg/lambda_neg on
LigerFusedLinearGRPOLoss. Forward them from GRPOConfig in compute_liger_loss,
drop the now-stale delta+use_liger_kernel guard, and bump the liger-kernel pin
to >=0.8.0 in the liger and dev extras.

Cover the Liger paths by parametrizing test_training_loss_types,
test_training_delta_clipping, and test_training_with_bias_correction_kl over
use_liger_kernel (require_liger_kernel as a conditional skip mark) instead of
adding parallel test functions.
@kashif kashif requested a review from albertvillanova May 1, 2026 08:45

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36829eb776

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread trl/trainer/grpo_trainer.py
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The GRPO Liger path now forwards delta, use_bias_correction_kl,
sapo_temperature_*, and vespo_* kwargs that only exist on
LigerFusedLinearGRPOLoss in liger-kernel>=0.8.0. Without bumping the runtime
gate, an installed 0.7.x would pass is_liger_kernel_available() and then fail
deep in the constructor with an opaque TypeError. Raising the floor surfaces a
clear ImportError at trainer init instead.

This only affects TRL's own consumers of import_utils.is_liger_kernel_available
(GRPO trainer, experimental KTO, env script, require_liger_kernel test mark);
DPO/SFT/GKD/Gold/Distillation pull their gate from transformers.utils and are
not impacted.
@qgallouedec

Copy link
Copy Markdown
Member

lgtm!

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@kashif kashif enabled auto-merge (squash) May 4, 2026 07:05
@AmineDiro

Copy link
Copy Markdown
Member

LGTM

@kashif kashif merged commit 5c521b7 into huggingface:main May 4, 2026
12 checks passed
@kashif kashif deleted the liger-grpo-v0.8.0 branch May 4, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants