Fix generate_tiny_models for gpt-oss #5622
Merged
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 52276f8.
qgallouedec (Member) approved these changes on Apr 22, 2026 and left a comment:
CI is green, thanks for the fix!
This reverts commit 52276f8.
Fix generate_tiny_models for gpt-oss.
This PR updates the initialization logic for tiny model weights in generate_tiny_models.py, specifically refining how expert-related configuration parameters are set for different model IDs.
Fix #5621.
Fixing PR for "trl-internal-testing/tiny-GptOssForCausalLM":
Fix issue introduced by:
Motivation
When GptOss was added to generate_tiny_models in #3848, it was added to an existing loop that already used num_experts=4 for Qwen3MoeConfig. For Qwen3Moe, num_experts is the declared field name. GptOss was included in the same loop body without anyone noticing that GptOss uses a different field name (num_local_experts). It was a copy-paste oversight.
From the very first commit adding GptOssConfig (huggingface/transformers#39923), the declared field was always num_local_experts: int = 128. There was never a num_experts field.
When the generate script passed num_experts=4, it was silently stored as an unknown extra kwarg with zero effect on model construction: config.num_local_experts stayed at 128, so the checkpoint was saved with 128 experts.
Changes
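The failure mode described in the Motivation can be reproduced with a toy config class. This is a minimal sketch, not the transformers implementation: ToyMoeConfig and count_experts are hypothetical names, and the only behavior assumed is the one stated above, namely that an unknown keyword argument is stored without error while the declared field keeps its default.

```python
class ToyMoeConfig:
    """Hypothetical stand-in for a config class; not the transformers code."""

    def __init__(self, num_local_experts=128, **kwargs):
        # Declared field: the only value the model-building code reads.
        self.num_local_experts = num_local_experts
        # Unknown keyword arguments are kept as plain attributes, with no error.
        for key, value in kwargs.items():
            setattr(self, key, value)


def count_experts(config):
    # Model construction consults only the declared field.
    return config.num_local_experts


buggy = ToyMoeConfig(num_experts=4)        # wrong field name, silently accepted
fixed = ToyMoeConfig(num_local_experts=4)  # declared field name

print(count_experts(buggy))  # 128
print(buggy.num_experts)     # 4 (stored, but never read)
print(count_experts(fixed))  # 4
```

Because the wrong name raises no error, the mistake is only visible by inspecting the saved checkpoint or asserting on the declared field after building the config.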
Model configuration improvements:
The script now distinguishes between the "Qwen/Qwen3-30B-A3B" and "openai/gpt-oss-20b" models so that each receives the appropriate expert configuration parameter: "num_experts" for "Qwen/Qwen3-30B-A3B" and "num_local_experts" for "openai/gpt-oss-20b". This ensures each model is initialized with the correct configuration.
Note
Low Risk
Small, localized change to a test-model generation script; main risk is regenerating/publishing different tiny checkpoints for GPT-OSS.
Overview
Fixes MoE tiny-model generation for openai/gpt-oss-20b by setting the correct expert-count config field.
In scripts/generate_tiny_models.py, the MoE loop now uses num_experts=4 only for Qwen/Qwen3-30B-A3B, and sets num_local_experts=4 for openai/gpt-oss-20b (keeping n_routed_experts=4 for zai-org/GLM-4.5), ensuring the generated GPT-OSS tiny checkpoint has the intended number of experts.
Reviewed by Cursor Bugbot for commit 2632c64.
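One way to picture the per-model selection summarized in the Overview is a lookup from model ID to expert-count field name. This is a sketch under assumptions: EXPERT_FIELD_BY_MODEL_ID and expert_kwargs are hypothetical names and do not reflect how scripts/generate_tiny_models.py is actually structured; only the three model IDs, the three field names, and the value 4 come from the PR description.

```python
# Hypothetical lookup table; the IDs and field names are taken from the PR text.
EXPERT_FIELD_BY_MODEL_ID = {
    "Qwen/Qwen3-30B-A3B": "num_experts",
    "openai/gpt-oss-20b": "num_local_experts",
    "zai-org/GLM-4.5": "n_routed_experts",
}


def expert_kwargs(model_id, num_experts=4):
    """Return the config override that shrinks the expert count for model_id."""
    try:
        field = EXPERT_FIELD_BY_MODEL_ID[model_id]
    except KeyError:
        # Fail loudly instead of passing a kwarg the config would silently ignore.
        raise ValueError(f"no expert field known for {model_id!r}") from None
    return {field: num_experts}


for model_id in EXPERT_FIELD_BY_MODEL_ID:
    print(model_id, expert_kwargs(model_id))
```

Raising on an unknown model ID is a design choice of this sketch: it turns the kind of silent mismatch this PR fixes into an immediate error when a new MoE model is added to the loop.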