
Conversation

jlonge4 (Contributor) commented Apr 30, 2025

Issue #, if available:
N/A
Description of changes:
Add Qwen3 model file and inference notebook. Tested with Qwen/Qwen3-8B

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
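
For a quick CPU-side sanity check of the reference model before tracing, here is a minimal sketch using plain Hugging Face transformers. This is an illustration only, not part of the PR; it assumes transformers>=4.51, which added Qwen3 support:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # the checkpoint tested in this PR
tok = AutoTokenizer.from_pretrained(model_id)
# bfloat16 mirrors the --torch-dtype used for the Neuron runs later in this thread.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tok("To be, or not to be", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=25, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))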

jlonge4 commented May 14, 2025

Logit Validation Benchmark Code:

!inference_demo \
    --model-type qwen3 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen/ \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --enable-bucketing \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark
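
For reference, --pad-token-id 151645 is Qwen3's <|im_end|> end-of-turn token. A quick check with the Hugging Face tokenizer (assumes Hub access and transformers>=4.51):

from transformers import AutoTokenizer  # transformers>=4.51 required for Qwen3

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
# Qwen3 uses <|im_end|> (id 151645) as its end-of-turn/eos token,
# which is the value passed above as --pad-token-id.
print(tok.convert_tokens_to_ids("<|im_end|>"))  # 151645
print(tok.eos_token, tok.eos_token_id)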

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 151936])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 151936])
Passed logits validation!

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune

Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 169.31116580963135,
        "latency_ms_p90": 172.9245901107788,
        "latency_ms_p95": 174.3390679359436,
        "latency_ms_p99": 174.82486009597778,
        "latency_ms_p100": 174.94630813598633,
        "latency_ms_avg": 169.6009874343872,
        "throughput": 188.67814677305284
    },
    "context_encoding_model": {
        "latency_ms_p50": 13.715386390686035,
        "latency_ms_p90": 13.958406448364258,
        "latency_ms_p95": 13.969480991363525,
        "latency_ms_p99": 13.981258869171143,
        "latency_ms_p100": 13.984203338623047,
        "latency_ms_avg": 13.787257671356201,
        "throughput": 1160.4918382892702
    },
    "token_generation_model": {
        "latency_ms_p50": 8.931398391723633,
        "latency_ms_p90": 9.162139892578125,
        "latency_ms_p95": 9.23851728439331,
        "latency_ms_p99": 9.780135154724094,
        "latency_ms_p100": 12.94398307800293,
        "latency_ms_avg": 9.013524055480957,
        "throughput": 118.34069117705926
    }
}
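
For readers unfamiliar with these fields: the percentiles and throughput are standard aggregates over repeated timed runs. A hedged sketch of the arithmetic (not inference_demo's actual internals; the latency values below are hypothetical):

import numpy as np

# Hypothetical per-iteration end-to-end latencies in milliseconds.
latencies_ms = np.array([169.3, 168.8, 170.1, 172.9, 174.3, 174.9])

for p in (50, 90, 95, 99, 100):
    print(f"latency_ms_p{p}: {np.percentile(latencies_ms, p):.3f}")

# Throughput = tokens processed per second of average latency. This is
# consistent with the report above: 32 (--seq-len) / 0.1696 s ≈ 188.7 for
# e2e_model, and 16 (--max-context-length) / 0.013787 s ≈ 1160.5 for
# context_encoding_model.
print("throughput:", 32 / (latencies_ms.mean() / 1000.0))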

"cell_type": "markdown",
"metadata": {},
"source": [
"# Thinking example"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EmilyWebber that should do it : )
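
For context, the notebook's thinking example relies on Qwen3's chat-template switch for reasoning traces. A hedged sketch of the usual invocation per the Qwen3 model card (not the exact notebook cell):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "To be, or not to be?"}]
# Qwen3's chat template accepts enable_thinking; when True, the model is
# prompted to emit its reasoning inside <think> ... </think> tags.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
print(prompt)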

ValkyriaLenneth commented:

@jlonge4
Thanks for your great work.
But I'm confused about the transformers version: neuronx-distributed needs transformers==4.48, while Qwen3 needs transformers>=4.51.
How did you resolve this conflict?
Thanks

jlonge4 commented May 30, 2025

@ValkyriaLenneth Thanks for the kind words. At the time of creating this PR, I got this working on an older SDK version (Neuron SDK 2.17), AMI ID ami-04faec134fd67f201.
With that version/AMI I had no issues using transformers==4.51.3.
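
A quick way to confirm the environment actually resolves Qwen3 after pinning transformers (a sketch; the pin worked on that SDK/AMI combination but may conflict on others):

# Run inside the Neuron venv after: pip install "transformers==4.51.3"
import transformers
print(transformers.__version__)  # expect 4.51.x; Qwen3 support landed in 4.51

from transformers import AutoConfig
# This only resolves to model_type "qwen3" on transformers>=4.51.
print(AutoConfig.from_pretrained("Qwen/Qwen3-8B").model_type)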

jlonge4 commented Jun 2, 2025

Updated the model file and re-ran the test notebook on the latest AMI (ami-0d0a2d26f80b645c2) with the associated package versions:

libneuronxla                  2.2.3493.0+78c3e78c
neuronx-cc                    2.18.121.0+9e31e41a
neuronx-distributed           0.12.12111+cdd84048
neuronx-distributed-inference 0.3.5591+f50feae2
torch-neuronx                 2.6.0.2.7.5413+113e6810

AMI venv used: aws_neuronx_venv_pytorch_2_6_nxd_inference (Neuron SDK v2.23.0).
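
To reproduce this version listing on another instance, one option is importlib.metadata (the package names are the pip distribution names shown above):

from importlib.metadata import version

for pkg in ("libneuronxla", "neuronx-cc", "neuronx-distributed",
            "neuronx-distributed-inference", "torch-neuronx"):
    print(pkg, version(pkg))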

jlonge4 commented Jun 2, 2025

Logit validation with seq_length=1024, context_length=512:

Result: Minimal logit divergence.

Test failed at batch 0 token 103. Top k = 5 error 0.01682760939002037 > 0.01.
Test failed at batch 0 token 108. Top k = 5 error 0.016880331560969353 > 0.01.
Divergence at index 204. Validating 1 tokens in each batch.
Divergence at index 319. Validating 115 tokens in each batch.
Test failed at batch 0 token 286. Top k = None error 0.07318327575922012 > 0.05. Top k = 1000 error 0.07318327575922012 > 0.03. Top k = 50 error 0.07318327575922012 > 0.02. Top k = 5 error 0.07318327575922012 > 0.01.
No divergence. Validating the remaining 81 tokens in each batch.
Test failed at batch 0 token 360. Top k = None error 0.06745750457048416 > 0.05. Top k = 1000 error 0.05250008776783943 > 0.03. Top k = 50 error 0.03233567625284195 > 0.02. Top k = 5 error 0.03233567625284195 > 0.01.
Test failed at batch 0 token 364. Top k = None error 0.37251684069633484 > 0.05. Top k = 1000 error 0.35812416672706604 > 0.03. Top k = 50 error 0.35812416672706604 > 0.02. Top k = 5 error 0.35812416672706604 > 0.01.
Summary: Max divergence difference = 0 at index (batch 0 token 0), Top k = None max error = 0.37251684069633484 at index (batch 0 token 364), Top k = 1000 max error = 0.35812416672706604 at index (batch 0 token 364), Top k = 50 max error = 0.35812416672706604 at index (batch 0 token 364), Top k = 5 max error = 0.35812416672706604 at index (batch 0 token 364)
Test fails logit validation.
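
For readers parsing these failures: each line reports an error over the top-k logits at a given token position, checked against per-k thresholds visible in the log (0.01 for k=5 up to 0.05 for k=None). A hedged sketch of one plausible top-k comparison; the checker's actual metric may differ:

import torch

def topk_logit_error(expected: torch.Tensor, actual: torch.Tensor, k: int) -> float:
    """Hypothetical metric: max relative error at the expected top-k token ids."""
    top_vals, top_idx = expected.topk(k)
    diff = (actual[top_idx] - top_vals).abs()
    return (diff / top_vals.abs().clamp_min(1e-6)).max().item()

# e.g. a token would fail the k=5 check above when
# topk_logit_error(expected_logits[t], actual_logits[t], k=5) > 0.01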

jlonge4 closed this Sep 4, 2025