
gpt-oss-20b PEFT fails on 1 GPU with DeepEP #968

@torsli

Description

Describe the bug
When we run gpt-oss-20b PEFT training on a single GPU, the script fails with an AttributeError because n_routed_experts is never set on the GroupedExpertsDeepEP module.

[rank0]:   File "/opt/Automodel/nemo_automodel/components/moe/layers.py", line 493, in forward
[rank0]:     assert self.n_routed_experts % self.ep_size == 0, (
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'GroupedExpertsDeepEP' object has no attribute 'n_routed_experts'

When we disable DeepEP, the script runs as expected.

The difference between GroupedExperts and GroupedExpertsDeepEP is that the DeepEP version only sets n_routed_experts in its init_token_dispatcher method, which appears to be called only in multi-GPU settings; vanilla GroupedExperts sets it in __init__. A sketch of one possible fix follows.
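
A minimal sketch of one possible fix, assuming the constructor already receives the expert count and EP size (the real nemo_automodel signature may differ): mirror vanilla GroupedExperts and assign n_routed_experts in __init__, so the attribute exists even when init_token_dispatcher is never called.

import torch.nn as nn

class GroupedExpertsDeepEP(nn.Module):
    # Hypothetical constructor; argument names are assumptions.
    def __init__(self, n_routed_experts: int, ep_size: int = 1):
        super().__init__()
        # Set here (as vanilla GroupedExperts does) instead of only in
        # init_token_dispatcher, which single-GPU runs never reach.
        self.n_routed_experts = n_routed_experts
        self.ep_size = ep_size

    def init_token_dispatcher(self, n_routed_experts: int, ep_size: int) -> None:
        # Multi-GPU path: overwrites the defaults, exactly as before.
        self.n_routed_experts = n_routed_experts
        self.ep_size = ep_size

With this change, the assert at layers.py line 493 sees a real attribute on a single GPU (ep_size == 1, so the divisibility check passes trivially) while the multi-GPU path is unchanged.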

Steps/Code to reproduce bug

Run an MoE config with DeepEP enabled on a single GPU. A minimal standalone reproduction of the failure mode is sketched below.
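
The class below is a stripped-down stand-in that only models the reported behavior, not the actual nemo_automodel implementation: nn.Module.__getattr__ raises AttributeError for any attribute that was never assigned, which is exactly what happens when init_token_dispatcher is skipped.

import torch
import torch.nn as nn

class GroupedExpertsDeepEP(nn.Module):
    """Stand-in that models the bug, not the real implementation."""

    def __init__(self):
        super().__init__()
        self.ep_size = 1
        # n_routed_experts is intentionally NOT set here.

    def init_token_dispatcher(self, n_routed_experts: int) -> None:
        # Only called in multi-GPU settings, so single-GPU runs skip it.
        self.n_routed_experts = n_routed_experts

    def forward(self, x):
        # Same shape as the assert at layers.py:493.
        assert self.n_routed_experts % self.ep_size == 0
        return x

experts = GroupedExpertsDeepEP()
try:
    experts(torch.zeros(1))
except AttributeError as e:
    print(e)  # 'GroupedExpertsDeepEP' object has no attribute 'n_routed_experts'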

Expected behavior

Script runs.

Labels: bug (Something isn't working)