[Bug]: Gemma3 ValueError: Attempted to assign 256 + 256 = 512 multimodal tokens to 1536 placeholders #14963

### Your current environment

Collecting environment information...
PyTorch version: 2.6.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

### 🐛 Describe the bug

Refer to this code snippet:
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language_multi_image.py#L87-L114

Then run the following script:
python tools/nemo_curator/test_vllm.py --model-type gemma3 --method generate

This raises the following error:
ValueError: Attempted to assign 256 + 256 = 512 multimodal tokens to 1536 placeholders
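
The numbers in the message can be read as two embedding blocks of 256 tokens each (512 in total) being matched against 1536 placeholder positions in the prompt, which is 1536 / 256 = 6 blocks' worth. The sketch below only reproduces that arithmetic; `check_placeholder_count` is a hypothetical helper written for illustration, not vLLM's actual implementation, and it says nothing about why the counts diverge.

```python
def check_placeholder_count(embedding_lengths, num_placeholders):
    """Hypothetical helper (not part of vLLM): mimic the consistency
    check implied by the error message."""
    total = sum(embedding_lengths)
    if total != num_placeholders:
        expr = " + ".join(str(n) for n in embedding_lengths)
        raise ValueError(
            f"Attempted to assign {expr} = {total} multimodal tokens "
            f"to {num_placeholders} placeholders"
        )
    return total


# Numbers taken from the reported error message.
try:
    check_placeholder_count([256, 256], 1536)
except ValueError as exc:
    message = str(exc)
    print(message)

# 1536 placeholders correspond to 1536 // 256 = 6 blocks of 256 tokens,
# three times the two blocks that were actually supplied.
blocks_expected = 1536 // 256
```

The check passes only when the summed embedding lengths equal the placeholder count, for example six blocks of 256 against 1536 placeholders.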

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
