Add an option to use dummy weights #33
Merged
Conversation
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request on Jun 5, 2024
* Bucketing/Warmup WIP
* Cleanup
* Revert "Fix model_output_idx on HPU (vllm-project#27)" (reverts commit 90dfa92)
* Rework selected_token_indices fix to also work with block_size padding
* Simple prompt attention POC
* Remove cumsum
* MQA/GQA support for simple prompt_attention
* Cleanup
* Fix typo
* Restore profiling runs
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request on Jul 22, 2024
…ernel tuning script for rocm. Merge pull request vllm-project#33 - tuned moe configs v2
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request on Jul 31, 2024
Enable jit for com ops
zyongye pushed a commit to zyongye/vllm that referenced this pull request on Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request on Aug 6, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request on Sep 29, 2025
…oject#26)

* indexer metadata to separate prefill and decode
* deep_gemm prefill kernel
* decode kernel, can run for single batch
* bug fixing: insert decode k into kv before gemm
* don't use tilelang quant function
* faster non-looping torch for kv cache insertion
* add chunked prefill impl
* change quant kernel back to tilelang for promotion
* fix format (vllm-project#31)
* update unit tests
* Fp8 indexer prefill (vllm-project#33)
* remove debug comment
* cleanup
* further cleanup

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
No description provided.
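The PR itself carries no description, but the idea behind a "dummy weights" option is straightforward: skip reading a real checkpoint and fill every parameter with a placeholder value, so the model's compute path can be exercised or benchmarked without downloading weights. The sketch below is a hypothetical, framework-agnostic illustration of that idea (the `load_weights` function, its `load_format` parameter, and the shapes are all invented for this example, not vLLM's actual API):

```python
# Hypothetical sketch of a "dummy weights" loading path: instead of
# deserializing a checkpoint from disk, fill each parameter array with a
# small constant so the model can run without real weights.
import numpy as np

def load_weights(shapes, load_format="auto", dummy_value=1e-3):
    """Return a name -> array mapping for the given parameter shapes.

    With load_format="dummy", skip all disk I/O and fill each array with
    dummy_value; the real checkpoint-loading path is deliberately omitted.
    """
    if load_format == "dummy":
        return {
            name: np.full(shape, dummy_value, dtype=np.float32)
            for name, shape in shapes.items()
        }
    raise NotImplementedError("real checkpoint loading not sketched here")

# Example: two fake parameters, loaded without touching disk.
weights = load_weights(
    {"embed": (4, 8), "lm_head": (8, 4)},
    load_format="dummy",
)
```

The output is numerically meaningless, but latency and memory behavior of the forward pass are preserved, which is typically all a loading/serving benchmark needs.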