Optimize tensor parallel execution speed #17
Merged
WoosukKwon approved these changes on Mar 31, 2023
Awesome! Thanks for the effort.
WoosukKwon reviewed on Mar 31, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
AdrianAbeyta referenced this pull request in ROCm/vllm on Mar 8, 2024
Rebase fp8_kv branch with upstream (3-07-2024)
z103cb referenced this pull request in z103cb/opendatahub_vllm on Apr 22, 2024
These Dockerfile changes:
- Update the release stage to work with the recently refactored `requirements-common.txt` / `requirements-cuda.txt` split
- Fix up the kernel compilation in the `build` stage to correctly pick up CUDA
- Install the kernels from this Docker build rather than pulling a precompiled wheel. We can swap that back once a new wheel is available with the correct PyTorch version and updated interfaces

---------

Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Joe Runde <[email protected]>
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request on May 31, 2024
[ROCm] adding a missing triton autotune config
Closed
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request on Mar 27, 2025
### What this PR does / why we need it?
Add a dispatch key for NPU so that the log is printed correctly.

Now:
```
executor_base.py:110] # CPU blocks: 220478, # CPU blocks: 21845
```
After this PR:
```
executor_base.py:110] # NPU blocks: 220478, # CPU blocks: 21845
```
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed and the log printed as above.

Signed-off-by: MengqingCao <[email protected]>
robertgshaw2-redhat added a commit that referenced this pull request on Jul 7, 2025
Load balance across multiple workers
zyongye pushed a commit to zyongye/vllm that referenced this pull request on Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request on Aug 6, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request on Sep 29, 2025
* prefill mla
  Signed-off-by: Chen Zhang <[email protected]>
* can run now
  Signed-off-by: Chen Zhang <[email protected]>
* tmp
  Signed-off-by: Chen Zhang <[email protected]>
* can output the first token
  Signed-off-by: Chen Zhang <[email protected]>
* fix bug
  Signed-off-by: Chen Zhang <[email protected]>
* remove some debug
  Signed-off-by: Chen Zhang <[email protected]>
* update
  Signed-off-by: Chen Zhang <[email protected]>
* hack through cu_seqlen_ks exploding issue
* update basic.py
  Signed-off-by: Chen Zhang <[email protected]>
* remove some unnecessary changes
  Signed-off-by: Chen Zhang <[email protected]>
* clean up
  Signed-off-by: Chen Zhang <[email protected]>

---------

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: Yongye Zhu <[email protected]>
Speed before this PR:
Speed after this PR: