Conversation

WoosukKwon
Collaborator

No description provided.

@WoosukKwon WoosukKwon merged commit ee88a7e into main Apr 9, 2023
@WoosukKwon WoosukKwon deleted the dummy branch April 9, 2023 06:36
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Bucketing/Warmup WIP

* Cleanup

* Revert "Fix model_output_idx on HPU (vllm-project#27)"

This reverts commit 90dfa92.

* Rework selected_token_indices fix to also work with block_size padding

* Simple prompt attention POC

* Remove cumsum

* MQA/GQA support for simple prompt_attention

* Cleanup

* Fix typo

* Restore profiling runs
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
…ernel tuning script for rocm.

Merge pull request vllm-project#33  - tuned moe configs v2
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
…oject#26)

* indexer metadata to separate prefill and decode

* deep_gemm prefill kernel

* decode kernel, can run for single batch

* bug fix: insert decode k into kv before gemm

* don't use tilelang quant function

* faster non-looping torch for kv cache insertion

* add chunked prefill impl

* change quant kernel back to tilelang for promotion

* fix format (vllm-project#31)

Signed-off-by: Chen Zhang <[email protected]>

* update unit tests

* Fp8 indexer prefill (vllm-project#33)

* init

Signed-off-by: Chen Zhang <[email protected]>

* can run

---------

Signed-off-by: Chen Zhang <[email protected]>

* remove debug comment

Signed-off-by: Chen Zhang <[email protected]>

* cleanup

* further cleanup

---------

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>