Skip to content

Conversation

huachenheli
Copy link
Contributor

@huachenheli huachenheli commented May 21, 2025

Directly converting RGBA to RGB via convert on PIL.Image produces a strange background as demonstrated in the picture:

Test plan:

Unit test: pytest tests/multimodal/test_image.py -s

Local server:
vllm serve /home/huachenheli/local/llm/huggingface/llama4/Llama-4-Scout-17B-16E-Instruct --gpu-memory-utilization 0.5 --tensor-parallel-size 8 --max-model-len 65536

curl -0 -v -X POST localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "@-" << EOF
{
    "messages": [{
        "role": "user",
        "model": "/home/huachenheli/local/llm/huggingface/llama4/Llama-4-Scout-17B-16E-Instruct",
        "content": [
            {"type": "text", "text": "What is the background color of this image, excluding four dice?"},
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png"}}
        ]
    }]
}
EOF

Without this change:

"content":"The background color of the image, excluding the four dice, appears to be a multicolored striped pattern. The colors include blue, white, green, purple, red, and yellow. However, if we had to choose a dominant color or a color that is most representative of the background, it would be difficult to pinpoint a single color due to the striped nature of the background.\n\nHowever, if the question is asking for a color that is visible behind the dice and not obstructed by them, then the answer could be inferred as black, since there is a black border around the image. \n\nTherefore, the background color of the image, excluding the four dice, is black."

With this change:

"content":"The background color of the image, excluding the four dice, is white."

Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label May 21, 2025
@huachenheli huachenheli marked this pull request as ready for review May 21, 2025 22:29
Comment on lines 35 to 48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could just make a image_to_image_mode function that has this conditional inside of it

@huachenheli huachenheli requested a review from mgoin May 21, 2025 23:56
@DarkLight1337
Copy link
Member

Thanks for reporting and fixing this issue! Can you add a unit test to avoid future regressions?

@huachenheli
Copy link
Contributor Author

Thanks for reporting and fixing this issue! Can you add a unit test to avoid future regressions?

Done.

@DarkLight1337
Copy link
Member

Hmm actually, I see other places where .convert is used, can you update them as well?

@huachenheli
Copy link
Contributor Author

Hmm actually, I see other places where .convert is used, can you update them as well?

Let me check.

@mergify mergify bot added the documentation Improvements or additions to documentation label May 22, 2025
@huachenheli
Copy link
Contributor Author

Updated existing call sites.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 22, 2025 16:03
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 22, 2025
@DarkLight1337
Copy link
Member

Please also merge from main to fix CI failures.

auto-merge was automatically disabled May 22, 2025 20:10

Head branch was pushed to by a user without write access

Copy link

mergify bot commented May 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ChenheliHua.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 22, 2025
@mergify mergify bot removed the tpu Related to Google TPUs label May 22, 2025
huachenheli added a commit to huachenheli/vllm that referenced this pull request May 22, 2025
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Signed-off-by: Chenheli Hua <[email protected]>
@huachenheli huachenheli deleted the rgba branch July 2, 2025 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) needs-rebase ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding structured-output tool-calling v1
Projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants