Fix generate_tiny_models for gpt-oss #5622

Merged
albertvillanova merged 3 commits into main from fix-5621 on Apr 22, 2026

@albertvillanova albertvillanova commented Apr 22, 2026

Member

Fix generate_tiny_models for gpt-oss.

This PR updates the initialization logic for tiny model weights in generate_tiny_models.py, specifically refining how expert-related configuration parameters are set for different model IDs.

Fix #5621.

Fixing PR for "trl-internal-testing/tiny-GptOssForCausalLM":

Fix issue introduced by:

Motivation

When GptOss was added to generate_tiny_models in #3848, it went into an existing loop that already passed num_experts=4 for Qwen3MoeConfig. For Qwen3Moe, num_experts is the declared field name. GptOss was placed in the same loop body even though GptOssConfig declares a different field name (num_local_experts). This was a copy-paste oversight.

From the very first commit adding GptOssConfig (huggingface/transformers#39923), the declared field was always num_local_experts: int = 128. There was never a num_experts field.

When the generate script passed num_experts=4, it was silently stored as an unknown extra kwarg with zero effect on model construction: config.num_local_experts stayed at 128, so the checkpoint was saved with 128 experts.
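The failure mode described above can be illustrated with a minimal sketch. ToyMoeConfig is a hypothetical stand-in written for this example, not the real GptOssConfig; it only assumes, as the paragraph above states, that unknown keyword arguments are stored without validation while the declared field keeps its default.

```python
class ToyMoeConfig:
    """Hypothetical stand-in for a config that declares num_local_experts.

    Not the real GptOssConfig: it only mimics the behavior described in
    this PR, where unrecognized kwargs are stored silently.
    """

    def __init__(self, num_local_experts: int = 128, **kwargs):
        # Declared field: this is the value model construction would read.
        self.num_local_experts = num_local_experts
        # Unknown kwargs are kept as plain attributes with no validation.
        for key, value in kwargs.items():
            setattr(self, key, value)


# Wrong field name: accepted without error, but the declared field is untouched.
wrong = ToyMoeConfig(num_experts=4)
print(wrong.num_local_experts, wrong.num_experts)

# Correct field name: the declared field is actually overridden.
right = ToyMoeConfig(num_local_experts=4)
print(right.num_local_experts)
```

In the wrong case nothing raises, which is why the mistake only surfaced later when the saved checkpoint turned out to have 128 experts.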

Changes

Model configuration improvements:

  • Split the handling of "Qwen/Qwen3-30B-A3B" and "openai/gpt-oss-20b" models so that each receives the appropriate expert configuration parameter: "num_experts" for "Qwen/Qwen3-30B-A3B" and "num_local_experts" for "openai/gpt-oss-20b". This ensures each model is initialized with the correct configuration.

Note

Low Risk
Small, localized change to a test-model generation script; main risk is regenerating/publishing different tiny checkpoints for GPT-OSS.

Overview
Fixes MoE tiny-model generation for openai/gpt-oss-20b by setting the correct expert-count config field.

In scripts/generate_tiny_models.py, the MoE loop now uses num_experts=4 only for Qwen/Qwen3-30B-A3B, and sets num_local_experts=4 for openai/gpt-oss-20b (keeping n_routed_experts=4 for zai-org/GLM-4.5), ensuring the generated GPT-OSS tiny checkpoint has the intended number of experts.
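One way to picture the per-model split is a lookup from model ID to expert-count field name. This is only a sketch: EXPERT_FIELD_BY_MODEL_ID and expert_kwargs are hypothetical names introduced here, and the actual loop in scripts/generate_tiny_models.py may be structured differently. The model IDs and field names are the ones listed in this description.

```python
# Hypothetical lookup table; model IDs and field names come from this PR description.
EXPERT_FIELD_BY_MODEL_ID = {
    "Qwen/Qwen3-30B-A3B": "num_experts",
    "openai/gpt-oss-20b": "num_local_experts",
    "zai-org/GLM-4.5": "n_routed_experts",
}


def expert_kwargs(model_id: str, num_experts: int = 4) -> dict[str, int]:
    """Return the config override that sets the expert count for model_id.

    Hypothetical helper: raises instead of guessing a field name, so an
    unlisted MoE model cannot silently fall back to the wrong kwarg.
    """
    try:
        field = EXPERT_FIELD_BY_MODEL_ID[model_id]
    except KeyError:
        raise ValueError(f"No expert-count field known for {model_id!r}") from None
    return {field: num_experts}


print(expert_kwargs("openai/gpt-oss-20b"))
```

Raising on an unknown model ID is a design choice for the sketch: it turns the kind of silent mismatch fixed here into an immediate error.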

Reviewed by Cursor Bugbot for commit 2632c64.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 52276f8.

Comment thread tests/conftest.py Outdated

@qgallouedec qgallouedec left a comment

Member

CI is green, thanks for the fix!

@albertvillanova albertvillanova merged commit edaf6ec into main Apr 22, 2026
13 checks passed
@albertvillanova albertvillanova deleted the fix-5621 branch April 22, 2026 14:16

Development

Successfully merging this pull request may close these issues.

CI fails with dev dependencies for gpt-oss models: RuntimeError: You set ignore_mismatched_sizes to False

3 participants