-
Notifications
You must be signed in to change notification settings - Fork 919
accelerate Qwen-3-8b model with speculative decoding #3077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: latest
Are you sure you want to change the base?
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, add links with information, it would be nice if link is added to original model https://ollama.com/library/qwen3:8b/https://huggingface.co/Qwen/Qwen3-8B , to Hugging Face SmolAgents and QwenAgent( for example https://github.com/QwenLM/Qwen-Agent) , to OpenVINO GenAI and to some information about speculative decoding (for example paper https://arxiv.org/pdf/2211.17192 or post https://medium.com/openvino-toolkit/accelerating-llm-inference-with-speculative-decoding-using-openvino-genai-api-d965dfbb443e)
Reply via ReviewNB
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add link to repo https://github.com/openvinotoolkit/nncf
for links, please, follow next markup:
[OpenVINO LLM collection](https://huggingface.co/OpenVINO/Qwen3-8B-int4-ov):
Reply via ReviewNB
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some information about options of scheduler config which you set or just some link to explanation, for example: more information can be find here ?
Are this notebook applicable only for GPU ? If yes, you need to note it in ReadME and at the top of notebook, if the notebook also can be run on CPU, please, add chose for that - see the example here https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/speculative-sampling/speculative-sampling.ipynb in section "Select inference device"
Reply via ReviewNB
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some explanation about num_assistant_tokens , smth like num_assistant_tokense
defines numbers of candidates generated by draft_model per iteration
Reply via ReviewNB
@@ -0,0 +1,598 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,349 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you separate requirements in .txt and some other packages ?
ipython ipykernel ipywidgets - should be installed via common file from openvino_notebooks/requirements.txt
Reply via ReviewNB
@@ -0,0 +1,349 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got problem here on generation output: raise AgentGenerationError(f"Error while generating output:\n{e}", self.logger) from e
It seems to me, that it's connected with error on prev step "Start the OpenVINO GenAI Server": ERROR: [Errno 98] error while attempting to bind on address ('127.0.0.1', 8000): address already in use , maybe it conflicts with jupyter server and it should be used another port
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ofirzaf will check
@sbalandi |
No description provided.