Conversation
@sarckk (Collaborator) commented Sep 3, 2025

Purpose

The DeepSeek fix landed in #24119. This PR applies the same fix to the Dots1 and GLM4 MoE models, which have the same double-multiplication issue: after #24119 the fused MoE path applies `routed_scaling_factor` itself, but both model implementations still re-apply it to the fused MoE output. The offending line, present in both models:

```python
router_logits=router_logits) * self.routed_scaling_factor
```
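For context, here is a minimal sketch of the double-scaling pattern (class and attribute names are illustrative, not the actual vLLM module layout): once the fused layer applies the factor internally, the outer multiplication scales the output by the factor squared, which is what tanked accuracy before this fix.

```python
# Minimal sketch of the bug; names are illustrative, not actual vLLM code.
import torch


class FusedMoELayer(torch.nn.Module):
    """Stand-in for the fused MoE layer that, after #24119, applies
    routed_scaling_factor internally."""

    def __init__(self, routed_scaling_factor: float):
        super().__init__()
        self.routed_scaling_factor = routed_scaling_factor

    def forward(self, hidden_states: torch.Tensor,
                router_logits: torch.Tensor) -> torch.Tensor:
        out = hidden_states  # placeholder for the real expert computation
        return out * self.routed_scaling_factor  # scaling applied here


class MoEBlock(torch.nn.Module):
    def __init__(self, routed_scaling_factor: float = 2.5):
        super().__init__()
        self.routed_scaling_factor = routed_scaling_factor
        self.experts = FusedMoELayer(routed_scaling_factor)

    def forward(self, hidden_states: torch.Tensor,
                router_logits: torch.Tensor) -> torch.Tensor:
        out = self.experts(hidden_states=hidden_states,
                           router_logits=router_logits)
        # Bug: re-applying the factor here scales the output by
        # routed_scaling_factor ** 2. The fix drops this line.
        # out = out * self.routed_scaling_factor
        return out
```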

Test Plan

gsm8k:

```bash
lm_eval --model vllm --model_args pretrained=rednote-hilab/dots.vlm1.inst,tensor_parallel_size=8,max_model_len=32768,gpu_memory_utilization=0.9,enforce_eager=True --tasks gsm8k --batch_size auto
```

Test Result

Unable to test GLM4 due to a hang (#24133).

dots1

Before:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.0144|±  |0.0033|
|     |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|

After PR:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8400|±  |0.0101|
|     |       |strict-match    |     5|exact_match|↑  |0.7513|±  |0.0119|

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@sarckk sarckk changed the title [Bug] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models Sep 3, 2025
@sarckk sarckk marked this pull request as ready for review September 3, 2025 01:31
@sarckk sarckk marked this pull request as draft September 3, 2025 01:37
@sarckk sarckk marked this pull request as ready for review September 3, 2025 02:30
@sarckk sarckk requested a review from Isotr0py September 3, 2025 02:31
@Isotr0py (Member) left a comment:

Thanks for fixing!

@Isotr0py Isotr0py enabled auto-merge (squash) September 3, 2025 02:34
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2025
@facebook-github-bot commented:

@sarckk has imported this pull request. If you are a Meta employee, you can view this in D81556978.

@Isotr0py Isotr0py merged commit 426cc86 into vllm-project:main Sep 3, 2025
51 checks passed
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
@sarckk sarckk deleted the fix-moe-scaling branch September 4, 2025 00:18
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025