
Conversation

nngokhale (Contributor)

No description provided.

nngokhale and others added 4 commits October 8, 2025 10:29

Add VLLM_CONTIGUOUS_PA=true, VLLM_DEFRAG=true to llama and granite models
Add --async_scheduling for all
Add file with output env vars to control script generation
Remove VLLM_DECODE_BLOCK_BUCKET_MAX due to crash in flat_pa when it is > hpu_blocks
Update granite settings with correct model config
Align num decode graph compute when contiguous_pa is set; add test

Signed-off-by: Neelesh Gokhale <[email protected]>
Co-authored-by: Agata Dobrzyniewicz <[email protected]>
Change granite 4k max model len to 4096
Add endpoint-type to Vision benchmark template

Signed-off-by: Neelesh Gokhale <[email protected]>
Add export VLLM_WEIGHT_LOAD_FORCE_SYNC=1

Signed-off-by: Neelesh Gokhale <[email protected]>
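Taken together, the commits above amount to an environment block in the generated benchmark launch scripts. Below is a minimal sketch of what that setup might look like, assuming a bash launch script; only the variable names, values, and the --async_scheduling flag come from the commits, while the model placeholder and the serve invocation are illustrative assumptions:

```bash
#!/bin/bash
# Sketch of the HPU environment setup described in the commits above.
# Only the exported variables and --async_scheduling come from this PR;
# the model placeholder and serve invocation are assumptions.

# Contiguous paged attention and block defragmentation (llama/granite models).
export VLLM_CONTIGUOUS_PA=true
export VLLM_DEFRAG=true

# Force synchronous weight loading (added in the last commit).
export VLLM_WEIGHT_LOAD_FORCE_SYNC=1

# VLLM_DECODE_BLOCK_BUCKET_MAX is deliberately left unset: per the commits,
# flat_pa crashes when it exceeds the number of HPU blocks.

MODEL="<your-model>"  # hypothetical placeholder, not from this PR

vllm serve "$MODEL" \
    --max-model-len 4096 \
    --async_scheduling
```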

✅ CI Passed

All checks passed successfully against the following vllm commit:
f71952c1c49fb86686b0b300b727b26282362bf4


@wpyszka (Collaborator) left a comment


align 0.10.2 with 0.11, approved

@wpyszka wpyszka merged commit 14b1f9d into vllm-project:releases/v0.11.0 Oct 16, 2025
35 checks passed