accelerate Qwen-3-8b model with speculative decoding #3077

guybd · 2025-09-09T12:34:46Z

No description provided.

review-notebook-app · 2025-09-09T12:34:50Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

guybd · 2025-09-09T12:36:25Z

@sbalandi

sbalandi · 2025-09-10T13:26:13Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Please add links with more information. It would be nice to link to the original model (https://ollama.com/library/qwen3:8b or https://huggingface.co/Qwen/Qwen3-8B ), to Hugging Face SmolAgents and Qwen-Agent (for example https://github.com/QwenLM/Qwen-Agent ), to OpenVINO GenAI, and to some background on speculative decoding (for example the paper https://arxiv.org/pdf/2211.17192 or the post https://medium.com/openvino-toolkit/accelerating-llm-inference-with-speculative-decoding-using-openvino-genai-api-d965dfbb443e ).

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Could you please add a version range or at least a lower bound?

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Please add a link to the NNCF repo: https://github.com/openvinotoolkit/nncf
For links, please use the following markup:
[OpenVINO LLM collection](https://huggingface.co/OpenVINO/Qwen3-8B-int4-ov):

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Could you please add some information about the scheduler config options you set, or at least a link to an explanation (for example, "more information can be found here")?
Is this notebook applicable only to GPU? If so, please note it in the README and at the top of the notebook. If the notebook can also run on CPU, please add a device choice; see the section "Select inference device" in https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/speculative-sampling/speculative-sampling.ipynb for an example.
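For illustration, a minimal sketch of what a CPU/GPU fallback could look like. This is not the code from the referenced notebook; `choose_device` and its `preferred` parameter are hypothetical names, and the device list is a literal so the sketch runs without OpenVINO installed.

```python
def choose_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device that is present in `available`."""
    for want in preferred:
        for name in available:
            # Indexed names such as "GPU.0" count as a match for "GPU".
            if name == want or name.startswith(want + "."):
                return name
    raise RuntimeError(f"None of {preferred} found in {list(available)}")


# In a notebook, `available` would typically come from
# openvino.Core().available_devices (assumption: OpenVINO is installed).
print(choose_device(["CPU"]))           # CPU
print(choose_device(["CPU", "GPU.0"]))  # GPU.0
```

With a helper like this the notebook could default to GPU when one is present and still run on CPU-only machines; a dropdown widget would give the user the final say.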

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Does the # need to be deleted?

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


Could you please add some explanation of num_assistant_tokens? Something like: num_assistant_tokens defines the number of candidate tokens generated by draft_model per iteration.
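To make the role of num_assistant_tokens concrete, here is a toy sketch of one speculative decoding iteration. It is a simplified greedy-match variant, not the OpenVINO GenAI implementation and not the probabilistic acceptance rule from the paper; `speculative_step`, `draft_next` and `target_next` are hypothetical names, and the "models" are plain functions over integer tokens.

```python
def speculative_step(context, draft_next, target_next, num_assistant_tokens):
    """Run one iteration and return the tokens appended to `context`."""
    # 1. The draft model proposes num_assistant_tokens candidates.
    candidates, ctx = [], list(context)
    for _ in range(num_assistant_tokens):
        tok = draft_next(ctx)
        candidates.append(tok)
        ctx.append(tok)

    # 2. The target model verifies them. A real implementation scores all
    #    candidates in one forward pass; sequential calls keep the toy simple.
    accepted, ctx = [], list(context)
    for tok in candidates:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every candidate matched: the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def target_next(ctx):
    return (ctx[-1] + 1) % 10  # toy target: count upwards modulo 10


def draft_next(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10  # wrong after token 3


print(speculative_step([0], draft_next, target_next, 5))  # [1, 2, 3, 4]
print(speculative_step([0], draft_next, target_next, 2))  # [1, 2, 3]
```

A larger value lets more tokens be accepted per target pass when the draft is accurate, but candidates after the first mismatch are discarded, so the best setting depends on how often the draft agrees with the target.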

Reply via ReviewNB

sbalandi · 2025-09-10T13:26:14Z

supplementary_materials/notebooks/qwen-3/qwen3.ipynb

@@ -0,0 +1,598 @@
+{


I can't load that model. Will this model be added?

Reply via ReviewNB

sbalandi · 2025-09-10T14:46:26Z

supplementary_materials/notebooks/qwen-3/smolagents/qwen3_agent.ipynb

@@ -0,0 +1,349 @@
+{


Why are the requirements split between a .txt file and some other packages?
ipython, ipykernel and ipywidgets should be installed via the common file openvino_notebooks/requirements.txt.

Reply via ReviewNB

sbalandi · 2025-09-10T14:46:26Z

supplementary_materials/notebooks/qwen-3/smolagents/qwen3_agent.ipynb

@@ -0,0 +1,349 @@
+{


I got an error here when generating output: raise AgentGenerationError(f"Error while generating output:\n{e}", self.logger) from e
It seems to be connected to an error in the previous step, "Start the OpenVINO GenAI Server": ERROR: [Errno 98] error while attempting to bind on address ('127.0.0.1', 8000): address already in use. Maybe it conflicts with the Jupyter server, and another port should be used.
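One possible way to avoid a hard-coded port, sketched with the standard library only. `find_free_port` and `is_port_in_use` are hypothetical helper names; how the chosen port is passed to the server depends on the notebook's launch code, which is not shown here.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


port = find_free_port()
print(f"candidate port for the server: {port}")
```

Another process can still grab the port between the check and the server start, so the launch code should still handle a bind failure.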

Reply via ReviewNB

@ofirzaf will check

guybd · 2025-09-17T16:52:05Z

@sbalandi
We have addressed all your comments. Please review.

guybd added 5 commits September 3, 2025 09:28

adding qwen-3 notebook

979551f

remove cell outputs

42510e8

modify documentation

bba7831

Merge branch 'openvinotoolkit:latest' into latest

b394957

modify model path

34935fa

add smolagents demo

610416c

sbalandi reviewed Sep 10, 2025

View reviewed changes

guybd added 2 commits September 17, 2025 06:25

adding links, and answer Sofya's comments

ddad2d2

modify requirements and reinstall server

d420686

guybd marked this pull request as ready for review September 18, 2025 12:46
