@jiminha jiminha commented Oct 14, 2025

This PR optimizes Gemma3 multimodal memory usage and performance.

  • Bucket the vision tower on the batch bucket size to reduce recompilation overhead.
  • Modify merge_multimodal to use torch.where instead of masked_scatter, which caused a performance issue.
  • Add multimodal bucket warmup to precompile the vision tower.
  • Port the PT_HPU_SDPA_QKV_SLICE_MODE_FWD feature from vllm-fork v0; this is necessary to reduce memory at longer sequence lengths.
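The first two bullets can be sketched as follows. This is a minimal illustration, not the PR's actual code: the bucket sizes, helper names, and the assumption that the vision embeddings are already laid out in the same `[batch, seq, hidden]` shape as the text embeddings are all hypothetical.

```python
import torch

# Hypothetical bucket sizes for the vision tower; the real values would
# come from the serving configuration, not this sketch.
VISION_BUCKETS = (1, 2, 4, 8, 16)

def pad_images_to_bucket(pixel_values: torch.Tensor) -> torch.Tensor:
    """Pad the image batch up to the nearest bucket so the vision tower
    compiles one HPU graph per bucket instead of one per distinct batch
    size, reducing recompilation overhead."""
    n = pixel_values.shape[0]
    bucket = next((b for b in VISION_BUCKETS if n <= b), n)
    if bucket > n:
        pad = pixel_values.new_zeros((bucket - n, *pixel_values.shape[1:]))
        pixel_values = torch.cat([pixel_values, pad], dim=0)
    return pixel_values

def merge_multimodal(inputs_embeds: torch.Tensor,
                     vision_embeds: torch.Tensor,
                     image_token_mask: torch.Tensor) -> torch.Tensor:
    """Select vision embeddings at image-token positions with a single
    elementwise torch.where rather than masked_scatter. Assumes
    vision_embeds already has the same [batch, seq, hidden] shape as
    inputs_embeds; image_token_mask is [batch, seq] and broadcasts over
    the hidden dimension."""
    return torch.where(image_token_mask.unsqueeze(-1),
                       vision_embeds, inputs_embeds)
```

Because `torch.where` is a static elementwise select, it avoids the dynamic-shape behavior of `masked_scatter`, which is the kind of op that performs poorly on compiled HPU graphs.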

jiminha and others added 6 commits October 13, 2025 08:03
Signed-off-by: Jimin Ha <[email protected]>
Reduces memory usage for long sequences by eliminating dual attention
mask creation. Improves capacity from 150 to 400 images with 8K prompts
by avoiding OOM issues.
Limitation: Only available when block_list is None.

Signed-off-by: Jimin Ha <[email protected]>
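The block_list limitation in the commit message above suggests gating of the following shape. This is a hypothetical sketch inferred from the commit message, not the ported implementation; only the environment-variable name and the "block_list is None" condition come from the source.

```python
import os

def qkv_slice_enabled(block_list) -> bool:
    """Hypothetical gating sketch: per the commit message, the SDPA QKV
    slicing path ported from vllm-fork v0 is switched on via the
    PT_HPU_SDPA_QKV_SLICE_MODE_FWD environment variable and only applies
    when block_list is None."""
    flag_on = os.environ.get("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "0") == "1"
    return flag_on and block_list is None
```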

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@jiminha jiminha marked this pull request as draft October 14, 2025 14:16
