Skip to content

GKDTrainer: Fix return_outputs in Liger kernel path and update tests#4688

Merged
kashif merged 8 commits into
huggingface:mainfrom
roycho96:main
May 6, 2026
Merged

GKDTrainer: Fix return_outputs in Liger kernel path and update tests#4688
kashif merged 8 commits into
huggingface:mainfrom
roycho96:main

Conversation

@roycho96

@roycho96 roycho96 commented Dec 13, 2025

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes UnboundLocalError when return_outputs=True in GKDTrainer's Liger kernel path, and updates tests for Liger kernel compatibility.

Changes

  1. Bug fix: Handle return_outputs=True in Liger kernel path

    • student_outputs was deleted before being returned in compute_loss
    • Now properly performs forward pass when outputs are needed
  2. Test update: Remove xfail marker from test_gkd_trainer_with_liger

    • OOM issue doesn't happen in liger-kernel 0.6.4
  3. New test: Add test_compute_loss_return_outputs_with_liger

    • Ensures evaluate() works correctly with Liger kernel

Tested with liger-kernel 0.6.4. All GKD tests pass.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Touches GKDTrainer.compute_loss in the Liger fused-loss path, which is core training/eval logic; incorrect outputs could affect evaluation behavior, but the change is small and covered by targeted tests.

Overview
Fixes GKDTrainer.compute_loss when using the Liger fused JSD loss so return_outputs=True no longer fails (keeps student_outputs alive, returns a ModelOutput, and frees memory/empties cache after loss computation).

Updates Liger-related tests by removing the previous xfail on the Liger training test and adding a new regression test asserting evaluate() works on the Liger path (i.e., return_outputs=True does not raise and produces eval_loss).

Reviewed by Cursor Bugbot for commit 36e2fc9. Bugbot is set up for automated code reviews on this repo. Configure here.

Sunghyun Cho added 3 commits December 12, 2025 16:45
- Add missing vocab dimension reduction in generalized_jsd_loss (jsd.sum(dim=-1)) to properly compute KL divergence
- Fix UnboundLocalError when return_outputs=True in Liger path by handling student_outputs before deletion
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread tests/experimental/test_gkd_trainer.py Outdated
@roycho96

roycho96 commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

@kashif
Gentle ping on this. Let me know if there’s anything I can help with to get it over the line.

@kashif

kashif commented May 4, 2026

Copy link
Copy Markdown
Collaborator

@codex review

@kashif

kashif commented May 4, 2026

Copy link
Copy Markdown
Collaborator

I was thinking perhaps a simpler fix could be to store the logits before deleting them in the liger path, and then use them for returning the outputs instead of re-runing the model() forward?

Comment thread tests/experimental/test_gkd_trainer.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64a1b72680

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/experimental/test_gkd_trainer.py Outdated
Comment thread trl/experimental/gkd/gkd_trainer.py Outdated
@chatgpt-codex-connector

Copy link
Copy Markdown

💡 Codex Review

if not getattr(trainer, "use_liger_gkd_loss", False):

P1 Badge Replace getattr-based skip with explicit fused-loss check

Using getattr(trainer, "use_liger_gkd_loss", False) lets this test skip silently if the attribute is renamed or accidentally removed, which can mask regressions in the Liger path while CI still passes. In this repo, AGENTS.md explicitly requires avoiding getattr/hasattr; reading trainer.use_liger_gkd_loss directly (or checking an explicit runtime/version condition) makes contract breaks fail loudly instead of being hidden as skips.


if return_outputs:
student_outputs = model(
input_ids=inputs["input_ids"],
attention_mask=inputs["attention_mask"],
)

P2 Badge Propagate the duplicated Liger return_outputs fix consistently

This adds a special return_outputs handling branch only in GKDTrainer, but the repository’s AGENTS.md requires duplicated trainer logic to stay aligned across trainers. Leaving this as a one-off divergence in the shared Liger JSD pattern makes cross-trainer behavior inconsistent and increases the chance that equivalent paths (e.g., in other distillation trainers) keep the old bug/structure and drift further over time.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@roycho96

roycho96 commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

I was thinking perhaps a simpler fix could be to store the logits before deleting them in the liger path, and then use them for returning the outputs instead of re-runing the model() forward?

Thanks for the suggestion! Applied it in the latest commit.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 745245f. Configure here.

Comment thread trl/experimental/gkd/gkd_trainer.py Outdated
Refactor return statement to include ModelOutput when return_outputs is True.
@kashif kashif merged commit 1240ecf into huggingface:main May 6, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants