[Core] Remove legacy input mapper/processor from V0 #15686
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Signed-off-by: DarkLight1337 <[email protected]>
Force-pushed from a698201 to 166b551 (Compare)
Signed-off-by: DarkLight1337 <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
The blocker is gone! PTAL
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
LGTM - can we merge from main again and turn on all MM CI for this PR? Since it's updated and CI has already passed, feel free to merge.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
(#951)

### What this PR does / why we need it?
Remove legacy input mapper/processor from V0. Find more details at #673 and vllm-project/vllm#15686.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Launch online service:

```bash
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
  --dtype bfloat16 \
  --max_model_len 32768 \
  --max-num-batched-tokens 32768
```

Query the server:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
      ]}
    ]
  }'
```

Result:

```bash
{"id":"chatcmpl-619e70733ed148b3be3a0b6524ee0ef3","object":"chat.completion","created":1748226332,"model":"/home/sss/.cache/modelscope/hub/models/Qwen/Qwen2___5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"The text in the illustration reads \"TONGYI Qwen.\"","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"pro
```

Signed-off-by: shen-shanshan <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
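For reference, the same query can be issued from Python with the OpenAI-compatible client. This is a minimal sketch assuming the `vllm serve` command above is running on localhost:8000 and the `openai` package is installed; it mirrors the curl request verbatim, so any differences in your setup (host, port, model path) apply here too:

```python
# Minimal sketch: Python equivalent of the curl query above, sent to the
# OpenAI-compatible server started with `vllm serve`. Assumes the server is
# reachable at http://localhost:8000 and the `openai` package is installed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                    },
                },
                {"type": "text", "text": "What is the text in the illustrate?"},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```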
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

### Ticket
[N/A](#27285)

### Problem description
- Legacy input mappers/processors were removed from vLLM V0 (vllm-project/vllm#15686, vllm-project/vllm#10114). These changes are required to maintain compatibility of existing integrated models after pulling upstream changes in tenstorrent/vllm#172.

### What's changed
- Removed legacy vLLM input processors from Llama3, Gemma3, and Qwen2.5-VL
- Defined new multi-modal input processor classes for Llama3.2-11B-Vision (`MllamaMultiModalProcessor`) and Gemma3 / Qwen2.5-VL (`MultiModalProcessor`), and added support for multi-modal limits for each (an offline-inference sketch exercising this path follows this description)
- Moved the max seq len assertion for Llama8B to model initialization; `--max_model_len` must be set on the vLLM side for any model that supports less than the default max context length
- Fixed a bug where the `create_multimodal_model` import was removed for Llama3.2-11B-Vision and broke the model (from 87b758d)

### Checklist
- [x] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes
- [x] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) CI with demo tests passes (if applicable)
- [x] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (if applicable)
- [x] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes (if applicable)
- [x] (For models and ops writers) [Single-card demo tests](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) CI passes (if applicable). See [recommended dev flow](https://github.com/tenstorrent/tt-metal/blob/main/models/docs/MODEL_ADD.md#a-recommended-dev-flow-on-github-for-adding-new-models).
- [x] [Galaxy quick](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml) CI passes (if applicable)
- [x] [Galaxy demo tests, for Llama](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) CI passes (if applicable, because of current Llama work)
- [x] (For runtime and ops writers) [T3000 unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml) CI passes (if applicable, since this is run on push to main)
- [x] (For models and ops writers) [T3000 demo tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) CI passes (if applicable, since this is required for release)
- [x] New/existing tests provide coverage for changes

vLLM nightly tests - https://github.com/tenstorrent/tt-metal/actions/runs/17680447236

---------
Signed-off-by: Salar <[email protected]>
Co-authored-by: Igor Djuric <[email protected]>
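To show how the refactored path gets exercised end-to-end, here is a hedged offline-inference sketch using vLLM's `LLM.chat` API, which routes image inputs through the model's multi-modal processor now that the legacy input mapper is gone. It assumes a vLLM build that includes this PR, enough accelerator memory for `Qwen/Qwen2.5-VL-7B-Instruct`, and uses the `limit_mm_per_prompt` engine argument to illustrate a per-prompt multi-modal limit:

```python
# Hedged sketch: offline multi-modal inference with vLLM. Image inputs are
# handled by the model's multi-modal processor (the legacy input mapper no
# longer exists). Assumes a vLLM version that includes this PR and enough
# accelerator memory for the 7B model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    dtype="bfloat16",
    max_model_len=32768,
    limit_mm_per_prompt={"image": 1},  # cap images per prompt
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                },
            },
            {"type": "text", "text": "What is the text in the illustration?"},
        ],
    },
]

# LLM.chat applies the model's chat template and passes the image through
# the multi-modal processor before generation.
outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=64))
print(outputs[0].outputs[0].text)
```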
CLOSES #10114
Blockers: