Closed
Labels: bug, help wanted, multi-modality (#4194), v1
Description
In V1, we expect the output of `get_multimodal_embedding` to correspond to the `PlaceholderRange`, which is in turn constructed based on `PromptUpdateDetails.features`. However, the current V1 code doesn't validate this, causing the model to crash during inference under high load (e.g. #14897, #14963).
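As a rough illustration of the missing validation, the check could look like the sketch below. It is not vLLM code: `PlaceholderRangeSketch` and `validate_embeddings` are hypothetical names, and plain Python lists stand in for embedding tensors. The only assumption taken from the description is that each multimodal item should yield exactly as many embeddings as its placeholder range is long.

```python
from dataclasses import dataclass


@dataclass
class PlaceholderRangeSketch:
    """Hypothetical stand-in for a placeholder range in the prompt."""
    offset: int  # index of the first placeholder token
    length: int  # number of placeholder tokens


def validate_embeddings(embeddings, placeholders):
    """Fail early if an item's embedding count differs from its placeholder length.

    `embeddings` holds one entry per multimodal item; each entry is a list of
    embedding vectors (lists stand in for tensor rows).
    """
    if len(embeddings) != len(placeholders):
        raise ValueError(
            f"Got {len(embeddings)} embedding groups for "
            f"{len(placeholders)} placeholder ranges")
    for i, (item_embeds, placeholder) in enumerate(zip(embeddings, placeholders)):
        if len(item_embeds) != placeholder.length:
            raise ValueError(
                f"Item {i}: {len(item_embeds)} embeddings do not match "
                f"placeholder length {placeholder.length}")
```

Raising at this point would turn a crash deep inside inference into an error that names the offending item.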
From a quick look at the code, these models output embedding sizes which are inconsistent with the placeholder range:
- Fuyu (fixed by [Bugfix] Added `embed_is_patch` mask for fuyu model #15731)
- Gemma3 (fixed by [Bugfix] Re-enable Gemma3 for V1 #14980)
- Idefics3 (fixed by [Bugfix] `embed_is_patch` for Idefics3 #15696)
- InternVL-based models (fixed by [Bugfix] Fix embedding assignment for InternVL-based models #15086)
- MiniCPM-V (fixed by [Model] MiniCPM-V/O supports V1 #15487)
(Basically, any model whose prompt contains image newline/column tokens after the HF processor is applied needs a mask that maps image patch features to image embeddings, as described below.)
To fix this, we can follow these steps:
- Update the multi-modal processor to output a mask indicating which positions in the `PlaceholderRange`-aligned embeddings the patch features (output by the vision encoder) should be assigned to. This mask can be called `embed_is_patch`.
- Use `scatter_patch_features` to scatter the patch features into the image embedding tensor.
- When merging multimodal embeddings, use `select_patch_features` to recover the patch features from the image embeddings. The number of patch features should correspond to the number of image tokens (which is a subset of the feature tokens in `PromptUpdateDetails`).
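The scatter and select steps above can be sketched as follows. This is a simplified illustration, not the actual `scatter_patch_features` / `select_patch_features` implementation: the function names with the `_sketch` suffix are hypothetical, plain lists stand in for tensors, and `None` is an assumed filler for positions that hold newline/column tokens rather than patches.

```python
def scatter_patches_sketch(patch_features, embed_is_patch, fill=None):
    """Place patch features at the True positions of the mask.

    The result has one slot per placeholder position; non-patch positions
    (e.g. image newline tokens) receive `fill`.
    """
    if sum(embed_is_patch) != len(patch_features):
        raise ValueError(
            f"Mask selects {sum(embed_is_patch)} positions but "
            f"{len(patch_features)} patch features were given")
    patches = iter(patch_features)
    return [next(patches) if is_patch else fill for is_patch in embed_is_patch]


def select_patches_sketch(image_embeds, embed_is_patch):
    """Recover the patch features from placeholder-aligned embeddings."""
    if len(image_embeds) != len(embed_is_patch):
        raise ValueError("Embeddings and mask must have the same length")
    return [emb for emb, is_patch in zip(image_embeds, embed_is_patch) if is_patch]


# A 2x2 patch grid with a newline token after each row (assumed layout).
mask = [True, True, False, True, True, False]
patches = [[0.1], [0.2], [0.3], [0.4]]
aligned = scatter_patches_sketch(patches, mask)
recovered = select_patches_sketch(aligned, mask)
```

In this sketch `aligned` has the same length as the placeholder range (6), while `recovered` has one entry per image patch (4), which is the invariant the steps above aim to enforce.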
Follow-up work:
- [V1] Scatter and gather placeholders in the model runner #15712 (assigned to @DarkLight1337)
- Directly use individual token IDs instead of a range of IDs (assigned to @ywang96)