Support block size 32 #35
Merged
Conversation
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request on Jun 5, 2024
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request on Jun 12, 2024

…ge_with_newer_pytorch Update base docker image with Pytorch 2.3
joerunde added a commit to joerunde/vllm that referenced this pull request on Jun 17, 2024

…ch instead. (vllm-project#35)

I tested the previous fix for the Triton cache collision issue (see: vllm-project#34) and it didn't work. I now see errors like:

```
FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/1feb415f3280ca46eea8c4407a58c23e/fused_moe_kernel.json.tmp.pid_72_c0a0033e-6147-4520-ae3a-3847d02598f8'
```

which now shows the `uuid` instead of a random integer, but the problem remains. This PR implements a different workaround, proposed by @cyang49, that tells Triton to use a custom cache manager which assigns a different directory based on the process ID. This time I have tested it and it seems to work.

---------

Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Chih-Chieh-Yang <[email protected]>
Co-authored-by: Joe Runde <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
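The workaround described above boils down to pointing Triton at a cache manager whose directory includes the process ID, so concurrent workers never write to the same cache files. A minimal sketch of that idea follows; the class name, the module path `my_pkg.cache_utils`, and the constructor signature are illustrative assumptions rather than the code from the referenced commit (Triton's `FileCacheManager` and the `TRITON_CACHE_MANAGER` environment variable do exist, but their exact interfaces vary across Triton versions).

```python
import os

from triton.runtime.cache import FileCacheManager


class PerProcessCacheManager(FileCacheManager):
    """Give each process a private Triton cache directory keyed by PID,
    so concurrent workers cannot collide on the same cache files."""

    def __init__(self, key, **kwargs):
        # Sketch only: skip super().__init__ and set the attributes
        # (key, cache_dir, lock_path) that FileCacheManager's methods use.
        self.key = key
        default_dir = os.path.join(os.path.expanduser("~"), ".triton", "cache")
        base = os.environ.get("TRITON_CACHE_DIR", default_dir)
        # Suffix the base directory with the PID: each worker gets its own tree.
        self.cache_dir = os.path.join(f"{base}_{os.getpid()}", self.key)
        self.lock_path = os.path.join(self.cache_dir, "lock")
        os.makedirs(self.cache_dir, exist_ok=True)


# Triton picks the manager up from an environment variable of the form
# "module.path:ClassName"; it must be set before any kernel is compiled.
os.environ["TRITON_CACHE_MANAGER"] = "my_pkg.cache_utils:PerProcessCacheManager"
```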
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request on Jul 30, 2024

…wq_gptq_autoround
Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request on Sep 25, 2025

Signed-off-by: dengyunyang <[email protected]>
Co-authored-by: dengyunyang <[email protected]>
This PR adds support for block size 32. It turns out that no modification to our attention kernel is required to support it.
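Since the kernel already handles this case, enabling the new size plausibly reduces to relaxing a configuration check. The sketch below illustrates that idea; the `SUPPORTED_BLOCK_SIZES` tuple and the `CacheConfig` class are assumptions made for this example, not the project's actual identifiers.

```python
# Illustrative sketch only: validating the KV-cache block size at config time.
from dataclasses import dataclass

SUPPORTED_BLOCK_SIZES = (8, 16, 32)  # 32 newly permitted (assumed set)


@dataclass
class CacheConfig:
    block_size: int = 16  # number of tokens stored per KV-cache block

    def __post_init__(self) -> None:
        if self.block_size not in SUPPORTED_BLOCK_SIZES:
            raise ValueError(
                f"block_size must be one of {SUPPORTED_BLOCK_SIZES}, "
                f"got {self.block_size}")


config = CacheConfig(block_size=32)  # valid once 32 is in the supported set
```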