LucasWilkinson (Collaborator)

Improvement to address: vllm-project/vllm#18619 (comment)

When running the combine kernel on a large batch made up almost entirely of decodes plus a single prefill, the previous grid was excessively large, which made the combine kernel slow.

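Before this PR, the combine grid was cdiv(max_seqlen_q * num_heads, kBlockM) x batch_size. After this PR, it is (cdiv(total_q * num_heads, kBlockM) + batch_size) x 1. The old grid sizes every sequence as if it were as long as the longest one, so a single long prefill inflates the grid for every decode in the batch. The new grid grows with the total number of query tokens instead, which scales much better for large batches made up primarily of decodes.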
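To make the two formulas concrete, here is a small Python model that computes both grid sizes. The helper names (`cdiv`, `old_combine_grid`, `new_combine_grid`, `blocks_needed`) are hypothetical and are not identifiers from the FlashAttention sources. Our reading, which the PR text does not state, is that the `+ batch_size` term is slack for per-sequence rounding: each sequence's last row block may be partial, so the exact block count, the sum of cdiv(q_b * num_heads, kBlockM) over sequences, never exceeds cdiv(total_q * num_heads, kBlockM) + batch_size. That bound depends only on total_q and batch_size, not on the individual sequence lengths.

```python
from typing import Sequence


def cdiv(a: int, b: int) -> int:
    """Ceiling division for non-negative integers."""
    return -(-a // b)


def old_combine_grid(seqlens_q: Sequence[int], num_heads: int, block_m: int) -> int:
    # cdiv(max_seqlen_q * num_heads, kBlockM) x batch_size:
    # every sequence is sized as if it were as long as the longest one.
    return cdiv(max(seqlens_q) * num_heads, block_m) * len(seqlens_q)


def new_combine_grid(seqlens_q: Sequence[int], num_heads: int, block_m: int) -> int:
    # (cdiv(total_q * num_heads, kBlockM) + batch_size) x 1:
    # one flat dimension sized from the total token count plus per-sequence slack.
    total_q = sum(seqlens_q)
    return cdiv(total_q * num_heads, block_m) + len(seqlens_q)


def blocks_needed(seqlens_q: Sequence[int], num_heads: int, block_m: int) -> int:
    # Exact block count if each sequence's (q * num_heads) rows are tiled independently.
    return sum(cdiv(q * num_heads, block_m) for q in seqlens_q)


seqlens = [600] + [1] * 255
print(old_combine_grid(seqlens, 8, 8))  # 153600
print(new_combine_grid(seqlens, 8, 8))  # 1111
print(blocks_needed(seqlens, 8, 8))     # 855
```

The gap between `blocks_needed` and `new_combine_grid` is the price of sizing the grid without per-sequence lengths; in this model it is at most `batch_size` extra blocks, rather than a factor of `batch_size`.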

For example, take a batch of 256 sequences with q_seqlens = [600] + [1] * 255 (assuming num_heads = 8 and kBlockM = 8), so max_seqlen_q = 600 and total_q = 600 + 255 = 855.

Before this PR the grid would be:
cdiv(600 * 8, 8) x 256 = 600 x 256 = 153600

After this PR the grid is:
(cdiv(855 * 8, 8) + 256) x 1 = (855 + 256) x 1 = 1111
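With a single flat grid dimension, each thread block has to work out which sequence and which row block it owns, and blocks past the exact count have nothing to do. The Python model below sketches one way to do that mapping: a prefix sum over per-sequence block counts plus a binary search. `build_block_offsets` and `map_block` are hypothetical names, and this is not necessarily how the PR's scheduler assigns blocks on the GPU.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def cdiv(a: int, b: int) -> int:
    """Ceiling division for non-negative integers."""
    return -(-a // b)


def build_block_offsets(seqlens_q: Sequence[int], num_heads: int, block_m: int) -> List[int]:
    # offsets[b] is the first flat block id owned by sequence b;
    # offsets[-1] is the exact number of blocks that carry work.
    counts = [cdiv(q * num_heads, block_m) for q in seqlens_q]
    return [0] + list(accumulate(counts))


def map_block(block_id: int, offsets: List[int]) -> Optional[Tuple[int, int]]:
    # Surplus blocks (from the + batch_size slack in the grid) exit early.
    if block_id >= offsets[-1]:
        return None
    # Last sequence whose first block id is <= block_id; bisect_right skips
    # past empty sequences that share the same offset.
    batch_idx = bisect_right(offsets, block_id) - 1
    return batch_idx, block_id - offsets[batch_idx]


offsets = build_block_offsets([600] + [1] * 255, num_heads=8, block_m=8)
print(map_block(0, offsets))    # (0, 0)
print(map_block(600, offsets))  # (1, 0)
print(map_block(854, offsets))  # (255, 0)
print(map_block(855, offsets))  # None
```

On the example batch, blocks 0 through 599 belong to the prefill, blocks 600 through 854 each cover one decode, and blocks 855 through 1110 are surplus that return immediately.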

tlrmchlsmth (Member) left a comment

Code looks good to me. Optimization makes sense. Nice work.

Should we try to push this upstream?

LucasWilkinson (Collaborator, Author)

Ya I'm going to make an upstream PR

LucasWilkinson and others added 11 commits June 16, 2025 18:04
Signed-off-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
LucasWilkinson force-pushed the lwilkinson/varlen-combine-scheduler branch from 604050e to 566d676 on June 16, 2025 18:05
LucasWilkinson merged commit 2c6bcfc into main on Jun 16, 2025
1 check passed
zyongye pushed a commit to zyongye/flash-attention that referenced this pull request Aug 7, 2025
* varlen combine scheduler

Signed-off-by: Lucas Wilkinson <[email protected]>

* cleanup

Signed-off-by: Lucas Wilkinson <[email protected]>

* move check

Signed-off-by: Lucas Wilkinson <[email protected]>

* standard scheduling algo

Signed-off-by: Lucas Wilkinson <[email protected]>

* better heuristic

Signed-off-by: Lucas Wilkinson <[email protected]>

* better comments

Signed-off-by: Lucas Wilkinson <[email protected]>

* cleanup

Signed-off-by: Lucas Wilkinson <[email protected]>

* cleanup

Signed-off-by: Lucas Wilkinson <[email protected]>

* put in a more readable heuristic

Signed-off-by: Lucas Wilkinson <[email protected]>

* Apply suggestions from code review

Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>

* FA2 8.0 PTX (vllm-project#69)

Signed-off-by: Lucas Wilkinson <[email protected]>

---------

Signed-off-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
LucasWilkinson added a commit that referenced this pull request Aug 7, 2025
jayhshah pushed a commit that referenced this pull request Aug 8, 2025
Signed-off-by: Jay Shah <[email protected]>