[Feature][EPLB] Add support for unquantized models #21168

hsliuustc · 2025-07-18T09:13:23Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This PR works toward the long-term TODO from PR #18343:
add EPLB support for unquantized MoE models.

Test Plan

GPUs: 4 * A100-80G

vllm serve /workspace/models/DeepSeek-V2-Lite \
  --gpu-memory-utilization 0.85 \
  --tensor-parallel-size 4 \
  --enforce-eager \
  --host 0.0.0.0 \
  --port 20001 \
  --enable-eplb \
  --enable-expert-parallel \
  --trust-remote-code

curl -X POST http://127.0.0.1:20001/v1/completions  \
     -H "Content-Type: application/json" \
     -d '{
         "model": "/workspace/models/DeepSeek-V2-Lite",
         "prompt": ["Explain the theory of relativity in simple terms."],
         "max_tokens": 50,
         "temperature": 0.0,
         "top_p": 1,
         "top_k": 1
         }'
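
For scripted checks, the same request can be sent from Python. The following is a minimal sketch using only the standard library; the host, port, model path, and sampling fields are copied from the curl command above, while the helper names `build_completion_request` and `send` are made up for illustration and are not part of vLLM.

```python
import json
import urllib.request

# Values copied from the curl command above; adjust for your deployment.
BASE_URL = "http://127.0.0.1:20001"
MODEL = "/workspace/models/DeepSeek-V2-Lite"


def build_completion_request(prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl command."""
    payload = {
        "model": MODEL,
        "prompt": [prompt],
        "max_tokens": max_tokens,
        "temperature": 0.0,
        "top_p": 1,
        "top_k": 1,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call, once the server from the test plan is up:
# result = send(build_completion_request("Explain the theory of relativity in simple terms."))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.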

Test Result

(Optional) Documentation Update

gemini-code-assist

Code Review

This pull request adds support for Expert Parallelism Load Balancing (EPLB) to unquantized Mixture of Experts (MoE) models, which is a great enhancement. The changes look good overall.

My main feedback is on the use of assert statements for input validation in vllm/model_executor/layers/fused_moe/layer.py. While assert is useful for internal sanity checks, it's not suitable for validating parameters in library code because assertions can be disabled in production environments. I've suggested replacing them with explicit checks that raise ValueError or TypeError to make the code more robust.

Please take a look at my detailed comments. Thanks for the contribution!
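
To illustrate the point about assertions, here is a minimal sketch of the pattern, not the code from this PR. The function `validate_expert_load` and its parameters are hypothetical names chosen for the example. The key fact is that running Python with `-O` strips `assert` statements, so a check written as an assertion silently disappears, while an explicit `raise` always runs.

```python
from typing import Optional, Sequence


def validate_expert_load(
    enable_eplb: bool,
    expert_load: Optional[Sequence[int]],
    num_experts: int,
) -> None:
    """Hypothetical validator; names are illustrative, not from the PR diff.

    An assert-based version such as
        assert expert_load is not None
    would be skipped under `python -O`. Explicit checks are not.
    """
    if not isinstance(enable_eplb, bool):
        raise TypeError(
            f"enable_eplb must be a bool, got {type(enable_eplb).__name__}"
        )
    if not enable_eplb:
        # Nothing else to validate when load balancing is off.
        return
    if expert_load is None:
        raise ValueError("expert_load is required when enable_eplb is True")
    if len(expert_load) != num_experts:
        raise ValueError(
            f"expected {num_experts} load entries, got {len(expert_load)}"
        )
```

With this shape, a caller that passes a bad argument gets a `TypeError` or `ValueError` with a descriptive message regardless of how the interpreter was started.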

vllm/model_executor/layers/fused_moe/layer.py

github-actions · 2025-07-18T09:34:02Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify · 2025-07-19T03:34:17Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hsliuustc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


abmfy · 2025-07-29T11:55:17Z

It looks like EPLB support for unquantized models has been implemented in #20775. Could you please confirm if that’s the case?
Thanks again for the contribution!

hsliuustc changed the title ~~Eplb dequant mo e~~ [Feature][EPLB] Add support for unquantized models Jul 18, 2025

gemini-code-assist bot reviewed Jul 18, 2025

vllm/model_executor/layers/fused_moe/layer.py (outdated, resolved)

vllm/model_executor/layers/fused_moe/layer.py (outdated, resolved)

[Feature][EPLB] Add support for unquantized models

fe75274

Signed-off-by: hsliu <[email protected]>

hsliuustc force-pushed the EPLB-dequant-MoE branch from 4f96014 to fe75274 on July 19, 2025 03:33

mergify bot added the needs-rebase label Jul 19, 2025

Merge branch 'main' into EPLB-dequant-MoE

d18dae8

mergify bot removed the needs-rebase label Jul 19, 2025

hsliuustc added 2 commits July 19, 2025 11:48

remove assertations

a1c1f3b

Signed-off-by: hsliu <[email protected]>

Fix line length issues in fused_moe layer.py

7d56ab5

Signed-off-by: hsliu <[email protected]>

hsliuustc force-pushed the EPLB-dequant-MoE branch from a604b32 to 7d56ab5 on July 21, 2025 05:38


2 participants
