
Conversation

Contributor

@whx-sjtu commented Sep 15, 2025

Purpose

The class SharedFusedMoE was proposed by @bnellnm in PR #23273. The model glm4_moe has shared experts, but we don't use SharedFusedMoE for it; I'm not sure why, cc @bnellnm.
Having glm4_moe use SharedFusedMoE solves two problems in vllm-ascend:

  1. All-reduce-related accuracy problems. In vllm-ascend we may use different MoE strategies in different situations depending on num_tokens. For example, we might use an all2all-based MoE strategy when num_tokens is small and an all-gather-based MoE strategy when num_tokens is large. This means that within the same vLLM service we sometimes perform the all-reduce of the shared experts together with the routed experts in maybe_all_reduce_tensor_model_parallel, and sometimes perform the all-reduce of the shared experts independently. However, in the current glm4_moe modeling, whether the shared-expert all-reduce is performed independently is fixed at init time by must_reduce_shared_expert_outputs, which conflicts with our implementation (see the sketch after this list).
  2. As in PR [Kernels] Overlap shared experts with send/recv #23273, we also want to implement computation-communication overlap in the Ascend backend, which can easily be done with SharedFusedMoE.
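
To make point 1 concrete, here is a minimal, self-contained sketch of the pattern (class and helper names are illustrative only, not the actual SharedFusedMoE API): the fused layer owns both expert paths and returns their outputs unreduced, so the backend can pick the reduction scheme per batch and overlap the shared-expert compute with the MoE communication.

```python
import torch
import torch.nn as nn


class SharedFusedMoESketch(nn.Module):
    """Toy stand-in for SharedFusedMoE (illustrative, not the vLLM API).

    The layer owns both the shared experts and the routed experts and returns
    their outputs without reducing them, so the backend can decide per batch
    whether to all-reduce the shared-expert output together with the routed
    output (all2all-style path) or independently (all-gather-style path), and
    can overlap the shared-expert compute with dispatch/combine communication.
    """

    def __init__(self, routed_experts: nn.Module, shared_experts: nn.Module):
        super().__init__()
        self.routed_experts = routed_experts
        self.shared_experts = shared_experts

    def forward(self, hidden_states: torch.Tensor):
        # No reduction here: both outputs are handed back to the caller.
        shared_out = self.shared_experts(hidden_states)
        routed_out = self.routed_experts(hidden_states)
        return shared_out, routed_out


if __name__ == "__main__":
    hidden = 16
    layer = SharedFusedMoESketch(
        routed_experts=nn.Linear(hidden, hidden),  # stands in for the fused routed experts
        shared_experts=nn.Linear(hidden, hidden),  # stands in for the shared-expert MLP
    )
    x = torch.randn(4, hidden)
    shared_out, routed_out = layer(x)
    # The backend combines (and reduces) the two outputs as it sees fit.
    print((shared_out + routed_out).shape)
```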

In conclusion, I think we should use SharedFusedMoE for glm4_moe. cc @bnellnm @robertgshaw2-redhat @wangxiyuan @LucasWilkinson @Yikun

Test Plan

No new tests are needed.

Test Result

All existing tests should pass.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request refactors Glm4MoE to use SharedFusedMoE when shared experts are present, which is a good improvement for flexibility and performance optimization on Ascend hardware. However, I've found a few critical issues in the implementation that need to be addressed. There's a bug in the forward method that will cause a TypeError during runtime, and another issue in the __init__ method that leads to incorrect model behavior by ignoring the routed_scaling_factor. I've also suggested a refactoring to improve code clarity and maintainability by reducing duplication.
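
For context on the routed_scaling_factor point: in the usual DeepSeek-style convention, the routed-expert output is scaled before being combined with the shared-expert output. Below is a minimal sketch assuming that convention (the helper name is hypothetical and this is not the PR's actual code):

```python
import torch


def combine_expert_outputs(shared_out: torch.Tensor,
                           routed_out: torch.Tensor,
                           routed_scaling_factor: float) -> torch.Tensor:
    # Hypothetical helper: scale the routed-expert output before adding the
    # shared-expert output. Silently dropping the scaling factor changes the
    # model's numerics, which is the kind of behavior issue the review flags.
    return shared_out + routed_scaling_factor * routed_out
```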

Contributor

bnellnm commented Sep 15, 2025

I only added SharedFusedMoE support for deepseek and llama4 since adding it everywhere would have been more disruptive. There's no reason it can't be applied to other models that use shared experts.

Contributor

@bnellnm left a comment


LGTM!

@Yikun added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Sep 16, 2025
Collaborator

Yikun commented Sep 16, 2025

Label ready to run all tests.

@DarkLight1337 merged commit c15309a into vllm-project:main on Sep 17, 2025
62 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: charlifu <charlifu@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>