Fix generate_tiny_models for gpt-oss by albertvillanova · Pull Request #5622 · huggingface/trl

albertvillanova · 2026-04-22T08:50:38Z

Fix generate_tiny_models for gpt-oss.

This PR updates the initialization logic for tiny model weights in generate_tiny_models.py, specifically refining how expert-related configuration parameters are set for different model IDs.

Fix #5621.

Hub PR with the fix for "trl-internal-testing/tiny-GptOssForCausalLM":

https://huggingface.co/trl-internal-testing/tiny-GptOssForCausalLM/discussions/2
- See Files changed: https://huggingface.co/trl-internal-testing/tiny-GptOssForCausalLM/discussions/2/files

Fix issue introduced by:

🌺 OpenAI GPT OSS & Harmony support #3848

Motivation

When GptOss was added to generate_tiny_models in #3848, it was placed in an existing loop that already passed num_experts=4 for Qwen3MoeConfig. For Qwen3Moe, num_experts is the declared field name, but GptOss declares its expert count under a different name (num_local_experts). Reusing the same loop body for GptOss was a copy-paste oversight.

From the very first commit adding GptOssConfig (huggingface/transformers#39923), the declared field was always num_local_experts: int = 128. There was never a num_experts field.

When the generate script passed num_experts=4, it was silently stored as an unknown extra kwarg with zero effect on model construction: config.num_local_experts stayed at 128, so the checkpoint was saved with 128 experts.
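The failure mode described above can be reproduced with a toy class. This is a minimal sketch, not the transformers implementation: `ToyConfig` is a hypothetical stand-in that, like the behavior described here, stores unknown keyword arguments as plain attributes instead of rejecting them.

```python
class ToyConfig:
    """Hypothetical stand-in for a config class that accepts arbitrary kwargs."""

    def __init__(self, num_local_experts=128, **kwargs):
        # The declared field, with the default quoted in this PR description.
        self.num_local_experts = num_local_experts
        # Unknown keyword arguments are kept as extra attributes, unvalidated.
        for key, value in kwargs.items():
            setattr(self, key, value)


# Wrong field name: accepted without error, but the declared field is untouched.
broken = ToyConfig(num_experts=4)

# Correct field name: the declared field is actually overridden.
fixed = ToyConfig(num_local_experts=4)
```

In this sketch, `broken.num_local_experts` is still 128 even though `broken.num_experts` is 4, which mirrors why the tiny checkpoint ended up with 128 experts.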

Changes

Model configuration improvements:

Split the handling of "Qwen/Qwen3-30B-A3B" and "openai/gpt-oss-20b" so that each model receives the expert-count parameter its config actually declares: "num_experts" for "Qwen/Qwen3-30B-A3B" and "num_local_experts" for "openai/gpt-oss-20b".
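One way to picture the split is a per-model lookup of the expert-count field name. This is only a sketch under assumptions: `EXPERT_COUNT_FIELD` and `expert_kwargs` are hypothetical names for illustration, and the actual script may branch on the model ID differently.

```python
# Hypothetical lookup table; the real script may structure this differently.
EXPERT_COUNT_FIELD = {
    "Qwen/Qwen3-30B-A3B": "num_experts",
    "openai/gpt-oss-20b": "num_local_experts",
}


def expert_kwargs(model_id: str, count: int = 4) -> dict:
    """Return the config kwarg that sets the expert count for `model_id`."""
    # A KeyError here is deliberate: an unlisted model should fail loudly
    # rather than silently receive a field name its config does not declare.
    return {EXPERT_COUNT_FIELD[model_id]: count}


qwen_kwargs = expert_kwargs("Qwen/Qwen3-30B-A3B")
gpt_oss_kwargs = expert_kwargs("openai/gpt-oss-20b")
```

The resulting dictionaries could then be unpacked into the respective config constructors, so each model gets a keyword its config declares.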

Note

Low Risk
Small, localized change to a test-model generation script; main risk is regenerating/publishing different tiny checkpoints for GPT-OSS.

Overview
Fixes MoE tiny-model generation for openai/gpt-oss-20b by setting the correct expert-count config field.

In scripts/generate_tiny_models.py, the MoE loop now uses num_experts=4 only for Qwen/Qwen3-30B-A3B, and sets num_local_experts=4 for openai/gpt-oss-20b (keeping n_routed_experts=4 for zai-org/GLM-4.5), ensuring the generated GPT-OSS tiny checkpoint has the intended number of experts.

Reviewed by Cursor Bugbot for commit 2632c64. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-04-22T08:53:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 52276f8.

qgallouedec

CI is green, thanks for the fix!


Fix generate_tiny_models for gpt-oss

0a32fd5

Test tiny gpt-oss PR 2

52276f8

cursor Bot reviewed Apr 22, 2026


Comment thread tests/conftest.py Outdated

qgallouedec approved these changes Apr 22, 2026


Revert "Test tiny gpt-oss PR 2"

2632c64

This reverts commit 52276f8.

albertvillanova merged commit edaf6ec into main Apr 22, 2026
13 checks passed

albertvillanova deleted the fix-5621 branch April 22, 2026 14:16

albertvillanova mentioned this pull request Apr 23, 2026

Remove attribute_map from GptOssConfig huggingface/transformers#45578

Open
