[GRPO] update Liger-kernel grpo loss (delta, vespo, KL bias correction) by kashif · Pull Request #5690 · huggingface/trl

kashif · 2026-05-01T08:42:46Z

Liger Kernel v0.8.0 ships delta (two-sided clipping), use_bias_correction_kl, sapo_temperature_pos/neg, and vespo_k_pos/lambda_pos/k_neg/lambda_neg on LigerFusedLinearGRPOLoss.

Forward them from GRPOConfig in compute_liger_loss, drop the now-stale delta+use_liger_kernel guard, and bump the liger-kernel pin to >=0.8.0 in the liger and dev extras.
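As a rough illustration of the forwarding step, the sketch below copies the option names listed above from a config object into the keyword arguments of a loss constructor. `SketchGRPOConfig`, `SketchFusedLoss`, `build_liger_loss_kwargs`, and every default value are hypothetical stand-ins, not the TRL or Liger Kernel code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchGRPOConfig:
    """Hypothetical stand-in for GRPOConfig; all defaults are placeholders."""

    delta: Optional[float] = None
    use_bias_correction_kl: bool = False
    sapo_temperature_pos: float = 1.0
    sapo_temperature_neg: float = 1.0
    vespo_k_pos: float = 1.0
    vespo_lambda_pos: float = 1.0
    vespo_k_neg: float = 1.0
    vespo_lambda_neg: float = 1.0


# Option names taken from the PR description.
FORWARDED_FIELDS = (
    "delta",
    "use_bias_correction_kl",
    "sapo_temperature_pos",
    "sapo_temperature_neg",
    "vespo_k_pos",
    "vespo_lambda_pos",
    "vespo_k_neg",
    "vespo_lambda_neg",
)


def build_liger_loss_kwargs(config: SketchGRPOConfig) -> dict[str, Any]:
    """Collect the forwarded options into constructor keyword arguments."""
    return {name: getattr(config, name) for name in FORWARDED_FIELDS}


class SketchFusedLoss:
    """Hypothetical stand-in for the fused loss; it only records its arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


config = SketchGRPOConfig(delta=4.0, use_bias_correction_kl=True)
loss = SketchFusedLoss(**build_liger_loss_kwargs(config))
```

Keeping the forwarded names in one tuple makes it easy to see which options depend on the newer kernel release.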

Cover the Liger paths by parametrizing test_training_loss_types, test_training_delta_clipping, and test_training_with_bias_correction_kl over use_liger_kernel (require_liger_kernel as a conditional skip mark) instead of adding parallel test functions.
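The parametrization pattern described above might look roughly like the following sketch. `LIGER_AVAILABLE`, `require_liger`, and the test body are illustrative assumptions; the real suite uses TRL's own `require_liger_kernel` mark and runs an actual training step.

```python
import importlib.util

import pytest

# Assumption: "available" is approximated here by the package being importable.
LIGER_AVAILABLE = importlib.util.find_spec("liger_kernel") is not None

# Hypothetical stand-in for a require_liger_kernel-style skip mark.
require_liger = pytest.mark.skipif(
    not LIGER_AVAILABLE, reason="liger-kernel is not installed"
)

# The mark is attached only to the Liger case, so the default path always runs.
USE_LIGER_PARAMS = [
    pytest.param(False, id="no-liger"),
    pytest.param(True, marks=require_liger, id="liger"),
]


@pytest.mark.parametrize("use_liger_kernel", USE_LIGER_PARAMS)
def test_delta_clipping_sketch(use_liger_kernel):
    # A real test would build a trainer config and run a short training loop.
    config = {"delta": 2.0, "use_liger_kernel": use_liger_kernel}
    assert config["delta"] > 1.0
```

One test function then covers both code paths, and the Liger case is skipped rather than failed on machines without the kernel.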


Note

Medium Risk
Changes GRPO training behavior when use_liger_kernel is enabled by wiring new loss parameters (e.g., delta, VESPO, and KL bias correction) into the fused Liger loss and removing a previous configuration guard. Risk is mainly around correctness/regressions in Liger-backed training and test coverage depending on optional kernel availability.

Overview
Updates GRPO’s Liger-kernel integration to v0.8.0. The Liger optional dependency and runtime minimum version are bumped from 0.7.0 to 0.8.0.

When use_liger_kernel is enabled, GRPOTrainer now forwards additional config knobs into LigerFusedLinearGRPOLoss (including delta two-sided clipping, use_bias_correction_kl, and SAPO/VESPO parameters), and the GRPOConfig validation that previously rejected delta with Liger is removed.
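For readers unfamiliar with delta two-sided clipping, here is a minimal per-token sketch in plain Python. It assumes the commonly described form, in which the unclipped importance ratio is additionally capped at `delta`; the function name and the epsilon defaults are illustrative, and the fused Liger kernel may differ in detail.

```python
import math
from typing import Optional


def clipped_surrogate_loss(
    log_ratio: float,
    advantage: float,
    eps_low: float = 0.2,
    eps_high: float = 0.2,
    delta: Optional[float] = None,
) -> float:
    """Per-token clipped surrogate loss with an optional upper cap on the ratio."""
    ratio = math.exp(log_ratio)
    # Standard clipping of the ratio to [1 - eps_low, 1 + eps_high].
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    if delta is not None:
        # Two-sided variant: also cap the unclipped ratio from above.
        ratio = min(ratio, delta)
    return -min(ratio * advantage, clipped * advantage)


# With a negative advantage and a large ratio, the loss is unbounded without
# delta and capped at delta * |advantage| with it.
uncapped = clipped_surrogate_loss(math.log(10.0), advantage=-1.0)
capped = clipped_surrogate_loss(math.log(10.0), advantage=-1.0, delta=2.0)
```

The cap only matters for negative advantages: with a positive advantage the standard clipped term is already the smaller one.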

Tests are adjusted to parameterize key GRPO training tests over use_liger_kernel, increasing coverage of the Liger code path under the existing require_liger_kernel skip mark.

Reviewed by Cursor Bugbot for commit 268e3d9. Bugbot is set up for automated code reviews on this repo.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36829eb776

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

HuggingFaceDocBuilderDev · 2026-05-01T08:46:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The GRPO Liger path now forwards delta, use_bias_correction_kl, sapo_temperature_*, and vespo_* kwargs that only exist on LigerFusedLinearGRPOLoss in liger-kernel>=0.8.0. Without bumping the runtime gate, an installed 0.7.x would pass is_liger_kernel_available() and then fail deep in the constructor with an opaque TypeError. Raising the floor surfaces a clear ImportError at trainer init instead. This only affects TRL's own consumers of import_utils.is_liger_kernel_available (GRPO trainer, experimental KTO, env script, require_liger_kernel test mark); DPO/SFT/GKD/Gold/Distillation pull their gate from transformers.utils and are not impacted.
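The idea of failing early on an outdated install can be sketched with the standard library alone. The helper names below are hypothetical, and the version parsing is deliberately simplistic (it ignores pre-release ordering); this is not TRL's `is_liger_kernel_available` implementation.

```python
import re
from importlib import metadata

MIN_LIGER_VERSION = "0.8.0"


def release_tuple(version: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "0.8" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_package_at_least(name: str, minimum: str) -> bool:
    """Return True if the distribution `name` is installed at `minimum` or newer."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)


def require_liger(min_version: str = MIN_LIGER_VERSION) -> None:
    """Raise a clear ImportError up front instead of a TypeError deep in a constructor."""
    if not is_package_at_least("liger-kernel", min_version):
        raise ImportError(
            f"liger-kernel>={min_version} is required for this code path."
        )
```

A production check would more likely rely on a full version parser such as `packaging.version`, which also orders pre-releases correctly.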

qgallouedec · 2026-05-03T21:03:58Z

lgtm!

@codex review

chatgpt-codex-connector · 2026-05-03T21:06:30Z

Codex Review: Didn't find any major issues. What shall we delve into next?


AmineDiro · 2026-05-04T11:53:05Z

LGTM

kashif added 2 commits May 1, 2026 10:41

fix formatting

40d81dc

kashif requested a review from albertvillanova May 1, 2026 08:45

chatgpt-codex-connector Bot reviewed May 1, 2026


Comment thread trl/trainer/grpo_trainer.py

kashif mentioned this pull request May 2, 2026

LigerFusedLinearGRPOLoss produces ~100x larger grad_norm than TRL's non-Liger-Kernel path due to missing vLLM IS correction and other differences linkedin/Liger-Kernel#1082 (Open)

kashif enabled auto-merge (squash) May 4, 2026 07:05

Merge branch 'main' into liger-grpo-v0.8.0

268e3d9

AmineDiro approved these changes May 4, 2026


kashif merged commit 5c521b7 into huggingface:main May 4, 2026
12 checks passed

kashif deleted the liger-grpo-v0.8.0 branch May 4, 2026 11:56

kashif merged 4 commits into huggingface:main from kashif:liger-grpo-v0.8.0
