
Conversation

Contributor

@jinzhen-lin jinzhen-lin commented Aug 16, 2025

Fixes #22881.
The original issue was introduced by #22017: the gate layer is initialized with FP8 quantization, but the original weight is bf16.
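For context, Hugging Face-style FP8 checkpoints record layers that should stay unquantized under the `modules_to_not_convert` key of the checkpoint's `quantization_config`. A minimal sketch of such a config as a Python dict; the concrete layer name is illustrative, not taken from the actual Qwen3 checkpoint:

```python
# Hypothetical sketch of a Hugging Face-style quantization_config as it
# would appear in a checkpoint's config.json; the layer name below is
# illustrative only.
quantization_config = {
    "quant_method": "fp8",
    # Layers listed here keep their original bf16 weights and must not
    # be initialized as FP8 layers at load time -- the MoE gate in this
    # PR is exactly such a layer.
    "modules_to_not_convert": ["model.layers.0.mlp.gate"],
}
```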

Signed-off-by: Jinzhen Lin <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the qwen Related to Qwen models label Aug 16, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an FP8 accuracy issue for MoE models like Qwen3. The root cause was that layers intended to be skipped were being quantized because the configuration key modules_to_not_convert was not being checked. The fix correctly adds a fallback to check for this key if ignored_layers is not found or is empty. This change ensures compatibility with Hugging Face's quantization configuration format and resolves the accuracy problem. The implementation is correct and well-targeted.
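The fallback described above can be sketched as follows. This is a hypothetical simplification, not the literal vLLM implementation: the function names and config shape here are assumptions for illustration.

```python
# Hypothetical sketch of the fix described in the review: fall back to
# the Hugging Face key "modules_to_not_convert" when "ignored_layers"
# is absent or empty, so layers meant to stay in bf16 (e.g. an MoE
# gate) are not re-initialized as FP8 layers.
def get_ignored_layers(config: dict) -> list[str]:
    ignored = config.get("ignored_layers")
    if not ignored:  # None or empty list -> fall back to the HF key
        ignored = config.get("modules_to_not_convert") or []
    return ignored


def is_layer_skipped(prefix: str, ignored_layers: list[str]) -> bool:
    # A layer keeps its original weights if any ignored pattern
    # matches its module path.
    return any(pattern in prefix for pattern in ignored_layers)
```

With this fallback, a checkpoint that only sets `modules_to_not_convert` (the Hugging Face convention) is treated the same as one that sets `ignored_layers`, so the gate layer is skipped during FP8 initialization in both cases.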

Collaborator

@yewentao256 yewentao256 left a comment


Could you also add some lm-eval results to show that the problem has been fixed?

Collaborator

@simon-mo simon-mo left a comment


verified locally as well. merging.

@mgoin mgoin added bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed labels Aug 17, 2025
Collaborator

simon-mo commented Aug 17, 2025

qwen3-30b-fp8

Before

+ curl http://localhost:8000/v1/completions -H 'Content-Type: application/json' -d '{
        "prompt": "What is the capital of France?",
        "max_tokens": 20,
        "temperature": 0
    }'
{"id":"cmpl-f570666360b0494faee6c8390c923980","object":"text_completion","created":1755391260,"model":"/mnt/localdisk/qwen3-30b-fp8","choices":[{"index":0,"text":"!!!!!!!!!!!!!!!!!!!!","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":27,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}+ echo ''

After

+ curl http://localhost:8000/v1/completions -H 'Content-Type: application/json' -d '{
        "prompt": "What is the capital of France?",
        "max_tokens": 20,
        "temperature": 0
    }'
{"id":"cmpl-9b06470dce2d4f7fb42a946bbb4d2725","object":"text_completion","created":1755391074,"model":"/mnt/localdisk/qwen3-30b-fp8","choices":[{"index":0,"text":" The capital of France is Paris. Paris has been the capital since the 3rd century and is","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":27,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}+ echo ''

@simon-mo simon-mo merged commit a258ad8 into vllm-project:main Aug 17, 2025
18 of 27 checks passed
666even666 pushed a commit to 666even666/vllm that referenced this pull request Aug 18, 2025
juuice-lee pushed a commit to juuice-lee/vllm-moe.code that referenced this pull request Aug 18, 2025
divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Aug 21, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
dumb0002 pushed a commit to dumb0002/vllm that referenced this pull request Aug 28, 2025
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Labels
bug Something isn't working qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed

Successfully merging this pull request may close these issues.

[Bug]: Model outputs are always '!!!!!!!!!!!!!!'
4 participants