[Model] Apply SharedFusedMoE to glm4_moe. #24849
Conversation
Code Review
This pull request refactors Glm4MoE to use SharedFusedMoE when shared experts are present, which is a good improvement for flexibility and performance optimization on Ascend hardware. However, I've found a few critical issues in the implementation that need to be addressed. There's a bug in the forward method that will cause a TypeError during runtime, and another issue in the __init__ method that leads to incorrect model behavior by ignoring the routed_scaling_factor. I've also suggested a refactoring to improve code clarity and maintainability by reducing duplication.
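For context on the routed_scaling_factor point, here is a minimal, hedged sketch with toy tensors and hypothetical names (routed_out, shared_out), not the code under review: it only illustrates why silently dropping the scaling factor changes model output even though no exception is raised.

```python
import torch


def combine_expert_outputs(
    routed_out: torch.Tensor,
    shared_out: torch.Tensor,
    routed_scaling_factor: float,
) -> torch.Tensor:
    # The routed-expert contribution is scaled before being added to the
    # shared-expert contribution; omitting the factor raises no error but
    # changes the hidden states.
    return shared_out + routed_scaling_factor * routed_out


routed = torch.randn(4, 8)
shared = torch.randn(4, 8)
with_factor = combine_expert_outputs(routed, shared, routed_scaling_factor=2.5)
without_factor = combine_expert_outputs(routed, shared, routed_scaling_factor=1.0)
print(torch.allclose(with_factor, without_factor))  # False: the factor matters
```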
I only added SharedFusedMoE support for deepseek and llama4 since adding it everywhere would have been more disruptive. There's no reason it can't be applied to other models that use shared experts.
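To make the suggested extension concrete, here is a hedged sketch of the general pattern using toy modules rather than vLLM's real FusedMoE/SharedFusedMoE classes (their actual constructor signatures are not reproduced here): a model with shared experts hands them to the fused layer, which returns both contributions; a model without them keeps the plain fused layer.

```python
import torch
import torch.nn as nn


class ToyFusedMoE(nn.Module):
    """Stand-in for a routed-experts-only MoE layer."""

    def __init__(self, hidden: int):
        super().__init__()
        self.routed = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.routed(x)


class ToySharedFusedMoE(ToyFusedMoE):
    """Stand-in for a SharedFusedMoE-style layer: it also owns the shared
    experts and returns both contributions so the layer (or its caller)
    decides how to combine and reduce them."""

    def __init__(self, hidden: int, shared_experts: nn.Module):
        super().__init__(hidden)
        self.shared_experts = shared_experts

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        return self.shared_experts(x), super().forward(x)


def build_moe(hidden: int, n_shared_experts: int) -> nn.Module:
    # Models with shared experts opt into the shared-fused variant;
    # models without them keep the plain fused layer.
    if n_shared_experts > 0:
        return ToySharedFusedMoE(hidden, shared_experts=nn.Linear(hidden, hidden))
    return ToyFusedMoE(hidden)


layer = build_moe(hidden=8, n_shared_experts=1)
shared_out, routed_out = layer(torch.randn(4, 8))
```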
LGTM!
Purpose
The class SharedFusedMoE was proposed by @bnellnm in PR #23273. The model glm4_moe has shared experts, but we don't use SharedFusedMoE for glm4_moe; I'm not sure why, please cc @bnellnm. Letting glm4_moe use SharedFusedMoE would solve two problems in vllm-ascend:
- In vllm-ascend we sometimes handle the all-reduce of shared experts inside maybe_all_reduce_tensor_model_parallel, and sometimes perform the all-reduce of shared experts independently. However, in the current glm4_moe modeling, whether the all-reduce of shared experts is performed independently is fixed at init time by must_reduce_shared_expert_outputs, which conflicts with our implementation (see the sketch below).
- … SharedFusedMoE.

In conclusion, I think we should apply SharedFusedMoE to glm4_moe. cc @bnellnm @robertgshaw2-redhat @wangxiyuan @LucasWilkinson @Yikun
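As an illustration of the first point, here is a hedged sketch (toy modules and a fake all-reduce, not vLLM's real glm4_moe or SharedFusedMoE code) contrasting a reduce decision that is baked in at construction time with one that the layer owning the shared experts can make per call.

```python
import torch
import torch.nn as nn


def fake_all_reduce(x: torch.Tensor) -> torch.Tensor:
    # Placeholder for a tensor-parallel all-reduce; a single-process no-op here.
    return x


class ReduceFixedAtInit(nn.Module):
    """glm4_moe-style today: whether shared-expert output is reduced
    independently is decided once in __init__."""

    def __init__(self, shared_experts: nn.Module, must_reduce_shared_expert_outputs: bool):
        super().__init__()
        self.shared_experts = shared_experts
        self.must_reduce = must_reduce_shared_expert_outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shared_out = self.shared_experts(x)
        if self.must_reduce:  # fixed choice; cannot adapt per forward pass
            shared_out = fake_all_reduce(shared_out)
        return shared_out


class ReduceOwnedByFusedLayer(nn.Module):
    """SharedFusedMoE-style: the fused layer owns the shared experts and can
    choose at call time whether to reduce their output independently or
    together with the routed output."""

    def __init__(self, shared_experts: nn.Module):
        super().__init__()
        self.shared_experts = shared_experts

    def forward(self, x: torch.Tensor, reduce_independently: bool) -> torch.Tensor:
        shared_out = self.shared_experts(x)
        if reduce_independently:
            shared_out = fake_all_reduce(shared_out)
        return shared_out
```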
Test Plan
No need to add new tests.
Test Result
All tests should pass.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.